draft-ietf-tcpm-accurate-ecn-01.txt   draft-ietf-tcpm-accurate-ecn-02.txt 
TCP Maintenance & Minor Extensions (tcpm) B. Briscoe TCP Maintenance & Minor Extensions (tcpm) B. Briscoe
Internet-Draft Simula Research Laboratory Internet-Draft Simula Research Laboratory
Intended status: Experimental M. Kuehlewind Intended status: Experimental M. Kuehlewind
Expires: January 1, 2017 ETH Zurich Expires: May 4, 2017 ETH Zurich
R. Scheffenegger R. Scheffenegger
NetApp, Inc. October 31, 2016
June 30, 2016
More Accurate ECN Feedback in TCP More Accurate ECN Feedback in TCP
draft-ietf-tcpm-accurate-ecn-01 draft-ietf-tcpm-accurate-ecn-02
Abstract Abstract
Explicit Congestion Notification (ECN) is a mechanism where network Explicit Congestion Notification (ECN) is a mechanism where network
nodes can mark IP packets instead of dropping them to indicate nodes can mark IP packets instead of dropping them to indicate
incipient congestion to the end-points. Receivers with an ECN- incipient congestion to the end-points. Receivers with an ECN-
capable transport protocol feed back this information to the sender. capable transport protocol feed back this information to the sender.
ECN is specified for TCP in such a way that only one feedback signal ECN is specified for TCP in such a way that only one feedback signal
can be transmitted per Round-Trip Time (RTT). Recently, new TCP can be transmitted per Round-Trip Time (RTT). Recently, new TCP
mechanisms like Congestion Exposure (ConEx) or Data Center TCP mechanisms like Congestion Exposure (ConEx) or Data Center TCP
skipping to change at page 1, line 45 skipping to change at page 1, line 44
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 1, 2017. This Internet-Draft will expire on May 4, 2017.
Copyright Notice Copyright Notice
Copyright (c) 2016 IETF Trust and the persons identified as the Copyright (c) 2016 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 17, line 52 skipping to change at page 17, line 52
main TCP header (Section 3.1) is successful, it implicitly declares main TCP header (Section 3.1) is successful, it implicitly declares
that the endpoints also support the AccECN TCP Option. that the endpoints also support the AccECN TCP Option.
If the TCP client indicated AccECN support, a TCP server that confirms If the TCP client indicated AccECN support, a TCP server that confirms
its support for AccECN (as described in Section 3.1) SHOULD also its support for AccECN (as described in Section 3.1) SHOULD also
include an AccECN TCP Option in the SYN/ACK. A TCP client that has include an AccECN TCP Option in the SYN/ACK. A TCP client that has
successfully negotiated AccECN SHOULD include an AccECN Option in the successfully negotiated AccECN SHOULD include an AccECN Option in the
first ACK at the end of the 3WHS. However, this first ACK is not first ACK at the end of the 3WHS. However, this first ACK is not
delivered reliably, so the TCP client SHOULD also include an AccECN delivered reliably, so the TCP client SHOULD also include an AccECN
Option on the first data segment it sends (if it ever sends one). A Option on the first data segment it sends (if it ever sends one). A
host need not include an AccECN Option in any of these three cases if host MAY NOT include an AccECN Option in any of these three cases if
it has cached knowledge that the packet would be likely to be blocked it has cached knowledge that the packet would be likely to be blocked
on the path to the other host if it included an AccECN Option. on the path to the other host if it included an AccECN Option.
If the TCP client has successfully negotiated AccECN but does not If the TCP client has successfully negotiated AccECN but does not
receive an AccECN Option on the SYN/ACK, it switches into a mode that receive an AccECN Option on the SYN/ACK, it switches into a mode that
assumes that the AccECN Option is not available for this half assumes that the AccECN Option is not available for this half
connection. Similarly, if the TCP server has successfully negotiated connection. Similarly, if the TCP server has successfully negotiated
AccECN but does not receive an AccECN Option on the first ACK or on AccECN but does not receive an AccECN Option on the first ACK or on
the first data segment, it switches into a mode that assumes that the the first data segment, it switches into a mode that assumes that the
AccECN Option is not available for this half connection. AccECN Option is not available for this half connection.
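The mode switch described in the paragraph above could be sketched roughly as follows. This is an illustration only, not text from the draft: the function and parameter names are hypothetical, and the server-side test assumes one reading of the text, namely that the server concludes the option is unavailable only when it arrived on neither the first ACK nor the first data segment.

```python
def client_assumes_option_available(accecn_negotiated: bool,
                                    option_on_synack: bool) -> bool:
    """Client side: treat the option as usable on this half connection
    only if AccECN was negotiated and the SYN/ACK carried the option."""
    return accecn_negotiated and option_on_synack


def server_assumes_option_available(accecn_negotiated: bool,
                                    option_on_first_ack: bool,
                                    option_on_first_data: bool) -> bool:
    """Server side (assumed reading): the option counts as available if it
    arrived on the first ACK or on the first data segment."""
    return accecn_negotiated and (option_on_first_ack or option_on_first_data)
```

Either function returning False corresponds to the host switching into the mode that assumes the AccECN Option is not available for that half connection.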
skipping to change at page 18, line 34 skipping to change at page 18, line 34
AccECN Option. To expedite connection setup, the host SHOULD fall AccECN Option. To expedite connection setup, the host SHOULD fall
back to NS=CWR=ECE=0 and no AccECN Option on the retransmission of back to NS=CWR=ECE=0 and no AccECN Option on the retransmission of
the SYN/ACK. Implementers MAY use other fall-back strategies if they the SYN/ACK. Implementers MAY use other fall-back strategies if they
are found to be more effective (e.g. retransmitting a SYN/ACK with are found to be more effective (e.g. retransmitting a SYN/ACK with
AccECN TCP flags but not the AccECN Option; attempting to retransmit AccECN TCP flags but not the AccECN Option; attempting to retransmit
a second AccECN segment before fall-back (most appropriate during a second AccECN segment before fall-back (most appropriate during
high levels of congestion); or falling back to classic ECN feedback high levels of congestion); or falling back to classic ECN feedback
rather than non-ECN). rather than non-ECN).
Similarly, if the TCP client detects that the first data segment it Similarly, if the TCP client detects that the first data segment it
sent was lost, it SHOULD fall back to no AccECN Option on the sent with the AccECN Option was lost, it SHOULD fall back to no
retransmission. Again, implementers MAY use other fall-back AccECN Option on the retransmission. Again, implementers MAY use
strategies such as attempting to retransmit a second segment with the other fall-back strategies such as attempting to retransmit a second
AccECN Option before fall-back, and/or caching the result of previous segment with the AccECN Option before fall-back, and/or caching the
attempts. result of previous attempts.
Either host MAY include the AccECN Option in a subsequent segment to Either host MAY include the AccECN Option in a subsequent segment to
retest whether the AccECN Option can traverse the path. retest whether the AccECN Option can traverse the path.
Currently the Data Sender is not required to test whether the Currently the Data Sender is not required to test whether the
arriving byte counters in the AccECN Option have been correctly arriving byte counters in the AccECN Option have been correctly
initialised. This allows different initial values to be used as an initialised. This allows different initial values to be used as an
additional signalling channel in future. If any inappropriate additional signalling channel in future. If any inappropriate
zeroing of these fields is discovered during testing, this approach zeroing of these fields is discovered during testing, this approach
will need to be reviewed. will need to be reviewed.
skipping to change at page 34, line 51 skipping to change at page 34, line 51
rather than 3, so that the division could be implemented as an rather than 3, so that the division could be implemented as an
integer right bit-shift by lg(BEACON_FREQ). integer right bit-shift by lg(BEACON_FREQ).
In certain operating systems, it might be too complex to maintain In certain operating systems, it might be too complex to maintain
acks_in_round. In others it might be possible by tagging each data acks_in_round. In others it might be possible by tagging each data
segment in the retransmit buffer with the number of ACKs sent at the segment in the retransmit buffer with the number of ACKs sent at the
point that segment was sent. This would not work well if the Data point that segment was sent. This would not work well if the Data
Receiver was not sending data itself, in which case it might be Receiver was not sending data itself, in which case it might be
necessary to beacon based on time instead, as follows: necessary to beacon based on time instead, as follows:
if (time_now > time_last_option_sent + RTT / BEACON_FREQ) if ( time_now > time_last_option_sent + (RTT / BEACON_FREQ) )
send_full_AccECN_Option() send_full_AccECN_Option()
However, this time-based approach does not work well when all the This time-based approach does not work well when all the ACKs are
ACKs are sent early in each round trip, as is the case during slow- sent early in each round trip, as is the case during slow-start. In
start. this case few options will be sent (possibly even fewer than 3 per RTT).
However, when continuously sending data, data packets as well as ACKs
{ToDo: A simple and robust beaconing algorithm for all circumstances will spread out equally over the RTT and sufficient ACKs with the
is still work-in-progress.} AccECN option will be sent.
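The time-based test above, combined with the bit-shift optimisation mentioned earlier, might look like the sketch below. It is a sketch under stated assumptions rather than the draft's algorithm: BEACON_FREQ = 4 is assumed here as an example power of two, times are assumed to be integer microseconds, and the function name beacon_due is hypothetical.

```python
BEACON_FREQ = 4  # assumed example value; a power of two so division is a shift
LG_BEACON_FREQ = BEACON_FREQ.bit_length() - 1  # lg(BEACON_FREQ), i.e. 2 for 4


def beacon_due(time_now_us: int, time_last_option_sent_us: int,
               rtt_us: int) -> bool:
    """Return True when more than RTT / BEACON_FREQ has elapsed since the
    last full AccECN Option was sent (all times in integer microseconds)."""
    interval_us = rtt_us >> LG_BEACON_FREQ  # integer RTT / BEACON_FREQ
    return time_now_us > time_last_option_sent_us + interval_us
```

A caller would send the full AccECN Option and record the current time as time_last_option_sent_us whenever beacon_due() returns True. The sketch does not address the slow-start weakness noted above.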
A.5. Example Algorithm to Count Not-ECT Bytes A.5. Example Algorithm to Count Not-ECT Bytes
A Data Sender in AccECN mode can infer the amount of TCP payload data A Data Sender in AccECN mode can infer the amount of TCP payload data
arriving at the receiver marked Not-ECT from the difference between arriving at the receiver marked Not-ECT from the difference between
the amount of newly ACKed data and the sum of the bytes with the the amount of newly ACKed data and the sum of the bytes with the
other three markings, d.ceb, d.e0b and d.e1b. Note that, because other three markings, d.ceb, d.e0b and d.e1b. Note that, because
r.e0b is initialised to 1 and the other two counters are initialised r.e0b is initialised to 1 and the other two counters are initialised
to 0, the initial sum will be 1, which matches the initial offset of to 0, the initial sum will be 1, which matches the initial offset of
the TCP sequence number on completion of the 3WHS. the TCP sequence number on completion of the 3WHS.
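The subtraction described in the paragraph above can be illustrated with a short sketch. The function name and the clamp at zero are assumptions added for illustration; the inputs are taken to be the already-decoded deltas d.ceb, d.e0b and d.e1b for the same interval as the newly ACKed data.

```python
def not_ect_bytes(newly_acked_bytes: int, d_ceb: int, d_e0b: int,
                  d_e1b: int) -> int:
    """Infer Not-ECT payload bytes as the newly ACKed bytes minus the bytes
    reported with the other three markings (CE, ECT(0), ECT(1))."""
    marked_bytes = d_ceb + d_e0b + d_e1b
    # Clamping at zero is a defensive assumption, not stated in the text.
    return max(newly_acked_bytes - marked_bytes, 0)
```

For example, if 3000 bytes are newly ACKed and the only counter that advanced is d.e0b, by 1500, the remaining 1500 bytes would be inferred to have arrived as Not-ECT.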
skipping to change at page 36, line 28 skipping to change at page 36, line 28
middlebox had stripped the option. middlebox had stripped the option.
Appendix C. Open Protocol Design Issues (To Be Removed Before Appendix C. Open Protocol Design Issues (To Be Removed Before
Publication) Publication)
1. Currently it is specified that the receiver `SHOULD' use Change- 1. Currently it is specified that the receiver `SHOULD' use Change-
Triggered ACKs. It is controversial whether this ought to be a Triggered ACKs. It is controversial whether this ought to be a
`MUST' instead. A `SHOULD' would leave the Data Sender uncertain `MUST' instead. A `SHOULD' would leave the Data Sender uncertain
whether it can rely on the timing and ordering information in whether it can rely on the timing and ordering information in
ACKs. If the sender guesses wrongly, it will probably introduce ACKs. If the sender guesses wrongly, it will probably introduce
at least 1RTT of delay before it can use this timing information. at least 1 RTT of delay before it can use this timing
Ironically it will most likely be wanting this information to information. Ironically it will most likely be wanting this
reduce ramp-up delay. A `MUST' could make it hard to implement information to reduce ramp-up delay. A `MUST' could make it hard
AccECN in offload hardware. However, it is not known whether to implement AccECN in offload hardware. However, it is not
AccECN would be hard to implement in such hardware even with a known whether AccECN would be hard to implement in such hardware
`SHOULD' here. For instance, was it hard to offload DCTCP to even with a `SHOULD' here. For instance, was it hard to offload
hardware because of change-triggered ACKs, or was this just one DCTCP to hardware because of change-triggered ACKs, or was this
of many reasons? The choice between MUST and SHOULD here is just one of many reasons? The choice between MUST and SHOULD
critical. Before that choice is made, a clear use-case for here is critical. Before that choice is made, a clear use-case
certainty of timing and ordering information is needed, plus for certainty of timing and ordering information is needed, plus
well-informed discussion about hardware offload constraints. well-informed discussion about hardware offload constraints.
2. There is possibly a concern that a receiver could deliberately 2. There is possibly a concern that a receiver could deliberately
omit the AccECN Option pretending that it had been stripped by a omit the AccECN Option pretending that it had been stripped by a
middlebox. No known way can yet be contrived to take advantage middlebox. No known way can yet be contrived to take advantage
of this downgrade attack, but it is mentioned here in case of this downgrade attack, but it is mentioned here in case
someone else can contrive one. someone else can contrive one.
3. The s.cep counter might increase even if the s.ceb counter does 3. The s.cep counter might increase even if the s.ceb counter does
not (e.g. due to a CE-marked control packet). The sender's not (e.g. due to a CE-marked control packet). The sender's
skipping to change at page 37, line 26 skipping to change at page 37, line 26
Authors' Addresses Authors' Addresses
Bob Briscoe Bob Briscoe
Simula Research Laboratory Simula Research Laboratory
EMail: ietf@bobbriscoe.net EMail: ietf@bobbriscoe.net
URI: http://bobbriscoe.net/ URI: http://bobbriscoe.net/
Mirja Kuehlewind Mirja Kuehlewind
ETH Zurich ETH Zurich
Gloriastrasse 35 Zurich
Zurich 8092
Switzerland Switzerland
EMail: mirja.kuehlewind@tik.ee.ethz.ch EMail: mirja.kuehlewind@tik.ee.ethz.ch
Richard Scheffenegger Richard Scheffenegger
NetApp, Inc. Vienna
Am Euro Platz 2
Vienna 1120
Austria Austria
Phone: +43 1 3676811 3146 EMail: rscheff@gmx.at
EMail: rs@netapp.com
 End of changes. 12 change blocks. 
33 lines changed or deleted 29 lines changed or added
