draft-ietf-tcpm-accurate-ecn-07.txt   draft-ietf-tcpm-accurate-ecn-08.txt 
TCP Maintenance & Minor Extensions (tcpm) B. Briscoe TCP Maintenance & Minor Extensions (tcpm) B. Briscoe
Internet-Draft CableLabs Internet-Draft CableLabs
Intended status: Experimental M. Kuehlewind Intended status: Experimental M. Kuehlewind
Expires: January 3, 2019 ETH Zurich Expires: September 12, 2019 ETH Zurich
R. Scheffenegger R. Scheffenegger
July 2, 2018 March 11, 2019
More Accurate ECN Feedback in TCP More Accurate ECN Feedback in TCP
draft-ietf-tcpm-accurate-ecn-07 draft-ietf-tcpm-accurate-ecn-08
Abstract Abstract
Explicit Congestion Notification (ECN) is a mechanism where network Explicit Congestion Notification (ECN) is a mechanism where network
nodes can mark IP packets instead of dropping them to indicate nodes can mark IP packets instead of dropping them to indicate
incipient congestion to the end-points. Receivers with an ECN- incipient congestion to the end-points. Receivers with an ECN-
capable transport protocol feed back this information to the sender. capable transport protocol feed back this information to the sender.
ECN is specified for TCP in such a way that only one feedback signal ECN is specified for TCP in such a way that only one feedback signal
can be transmitted per Round-Trip Time (RTT). Recently, new TCP can be transmitted per Round-Trip Time (RTT). Recent new TCP
mechanisms like Congestion Exposure (ConEx), Data Center TCP (DCTCP) mechanisms like Congestion Exposure (ConEx), Data Center TCP (DCTCP)
or Low Latency Low Loss Scalable Throughput (L4S) need more accurate or Low Latency Low Loss Scalable Throughput (L4S) need more accurate
ECN feedback information whenever more than one marking is received ECN feedback information whenever more than one marking is received
in one RTT. This document specifies an experimental scheme to in one RTT. This document specifies an experimental scheme to
provide more than one feedback signal per RTT in the TCP header. provide more than one feedback signal per RTT in the TCP header.
Given TCP header space is scarce, it allocates a reserved header bit Given TCP header space is scarce, it allocates a reserved header bit
previously used for the ECN-Nonce, which has now been previously used for the ECN-Nonce, which has now been
declared historic. It also overloads the two existing ECN flags in declared historic. It also overloads the two existing ECN flags in
the TCP header. Supplementary feedback information can optionally be the TCP header. Supplementary feedback information can optionally be
provided in a new TCP option, which is never used on the TCP SYN. provided in a new TCP option, which is never used on the TCP SYN.
skipping to change at page 1, line 47 skipping to change at page 1, line 47
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 3, 2019. This Internet-Draft will expire on September 12, 2019.
Copyright Notice Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the Copyright (c) 2019 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
skipping to change at page 2, line 32 skipping to change at page 2, line 32
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Document Roadmap . . . . . . . . . . . . . . . . . . . . 4 1.1. Document Roadmap . . . . . . . . . . . . . . . . . . . . 4
1.2. Goals . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.2. Goals . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3. Experiment Goals . . . . . . . . . . . . . . . . . . . . 5 1.3. Experiment Goals . . . . . . . . . . . . . . . . . . . . 5
1.4. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6 1.4. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6
1.5. Recap of Existing ECN feedback in IP/TCP . . . . . . . . 7 1.5. Recap of Existing ECN feedback in IP/TCP . . . . . . . . 7
2. AccECN Protocol Overview and Rationale . . . . . . . . . . . 8 2. AccECN Protocol Overview and Rationale . . . . . . . . . . . 8
2.1. Capability Negotiation . . . . . . . . . . . . . . . . . 9 2.1. Capability Negotiation . . . . . . . . . . . . . . . . . 9
2.2. Feedback Mechanism . . . . . . . . . . . . . . . . . . . 9 2.2. Feedback Mechanism . . . . . . . . . . . . . . . . . . . 9
2.3. Delayed ACKs and Resilience Against ACK Loss . . . . . . 10 2.3. Delayed ACKs and Resilience Against ACK Loss . . . . . . 10
2.4. Feedback Metrics . . . . . . . . . . . . . . . . . . . . 10 2.4. Feedback Metrics . . . . . . . . . . . . . . . . . . . . 11
2.5. Generic (Dumb) Reflector . . . . . . . . . . . . . . . . 11 2.5. Generic (Dumb) Reflector . . . . . . . . . . . . . . . . 11
3. AccECN Protocol Specification . . . . . . . . . . . . . . . . 12 3. AccECN Protocol Specification . . . . . . . . . . . . . . . . 12
3.1. Negotiating to use AccECN . . . . . . . . . . . . . . . . 12 3.1. Negotiating to use AccECN . . . . . . . . . . . . . . . . 12
3.1.1. Negotiation during the TCP handshake . . . . . . . . 12 3.1.1. Negotiation during the TCP handshake . . . . . . . . 12
3.1.2. Retransmission of the SYN . . . . . . . . . . . . . . 14 3.1.2. Forward Compatibility . . . . . . . . . . . . . . . . 14
3.1.3. Retransmission of the SYN . . . . . . . . . . . . . . 15
3.2. AccECN Feedback . . . . . . . . . . . . . . . . . . . . . 15 3.2. AccECN Feedback . . . . . . . . . . . . . . . . . . . . . 15
3.2.1. Initialization of Feedback Counters at the Data 3.2.1. Initialization of Feedback Counters at the Data
Sender . . . . . . . . . . . . . . . . . . . . . . . 16 Sender . . . . . . . . . . . . . . . . . . . . . . . 16
3.2.2. The ACE Field . . . . . . . . . . . . . . . . . . . . 16 3.2.2. The ACE Field . . . . . . . . . . . . . . . . . . . . 16
3.2.3. Testing for Zeroing of the ACE Field . . . . . . . . 18 3.2.3. Testing for Zeroing of the ACE Field . . . . . . . . 18
3.2.4. Testing for Mangling of the IP/ECN Field . . . . . . 18 3.2.4. Testing for Mangling of the IP/ECN Field . . . . . . 19
3.2.5. Safety against Ambiguity of the ACE Field . . . . . . 19 3.2.5. Safety against Ambiguity of the ACE Field . . . . . . 20
3.2.6. The AccECN Option . . . . . . . . . . . . . . . . . . 20 3.2.6. The AccECN Option . . . . . . . . . . . . . . . . . . 20
3.2.7. Path Traversal of the AccECN Option . . . . . . . . . 21 3.2.7. Path Traversal of the AccECN Option . . . . . . . . . 22
3.2.8. Usage of the AccECN TCP Option . . . . . . . . . . . 25 3.2.8. Usage of the AccECN TCP Option . . . . . . . . . . . 25
3.3. Requirements for TCP Proxies, Offload Engines and other 3.3. Requirements for TCP Proxies, Offload Engines and other
Middleboxes on AccECN Compliance . . . . . . . . . . . . 26 Middleboxes on AccECN Compliance . . . . . . . . . . . . 27
4. Interaction with Other TCP Variants . . . . . . . . . . . . . 27 4. Interaction with Other TCP Variants . . . . . . . . . . . . . 28
4.1. Compatibility with SYN Cookies . . . . . . . . . . . . . 27 4.1. Compatibility with SYN Cookies . . . . . . . . . . . . . 28
4.2. Compatibility with Other TCP Options and Experiments . . 28 4.2. Compatibility with Other TCP Options and Experiments . . 29
4.3. Compatibility with Feedback Integrity Mechanisms . . . . 28 4.3. Compatibility with Feedback Integrity Mechanisms . . . . 29
5. Protocol Properties . . . . . . . . . . . . . . . . . . . . . 29 5. Protocol Properties . . . . . . . . . . . . . . . . . . . . . 30
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 31 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 32
7. Security Considerations . . . . . . . . . . . . . . . . . . . 32 7. Security Considerations . . . . . . . . . . . . . . . . . . . 33
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 33 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 33
9. Comments Solicited . . . . . . . . . . . . . . . . . . . . . 33 9. Comments Solicited . . . . . . . . . . . . . . . . . . . . . 34
10. References . . . . . . . . . . . . . . . . . . . . . . . . . 33 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 34
10.1. Normative References . . . . . . . . . . . . . . . . . . 33 10.1. Normative References . . . . . . . . . . . . . . . . . . 34
10.2. Informative References . . . . . . . . . . . . . . . . . 34 10.2. Informative References . . . . . . . . . . . . . . . . . 35
Appendix A. Example Algorithms . . . . . . . . . . . . . . . . . 36 Appendix A. Example Algorithms . . . . . . . . . . . . . . . . . 37
A.1. Example Algorithm to Encode/Decode the AccECN Option . . 36 A.1. Example Algorithm to Encode/Decode the AccECN Option . . 37
A.2. Example Algorithm for Safety Against Long Sequences of A.2. Example Algorithm for Safety Against Long Sequences of
ACK Loss . . . . . . . . . . . . . . . . . . . . . . . . 37 ACK Loss . . . . . . . . . . . . . . . . . . . . . . . . 38
A.2.1. Safety Algorithm without the AccECN Option . . . . . 37 A.2.1. Safety Algorithm without the AccECN Option . . . . . 38
A.2.2. Safety Algorithm with the AccECN Option . . . . . . . 39 A.2.2. Safety Algorithm with the AccECN Option . . . . . . . 40
A.3. Example Algorithm to Estimate Marked Bytes from Marked A.3. Example Algorithm to Estimate Marked Bytes from Marked
Packets . . . . . . . . . . . . . . . . . . . . . . . . . 40 Packets . . . . . . . . . . . . . . . . . . . . . . . . . 41
A.4. Example Algorithm to Beacon AccECN Options . . . . . . . 41 A.4. Example Algorithm to Beacon AccECN Options . . . . . . . 42
A.5. Example Algorithm to Count Not-ECT Bytes . . . . . . . . 42 A.5. Example Algorithm to Count Not-ECT Bytes . . . . . . . . 43
Appendix B. Rationale for Usage of TCP Header Flags . . . . . . 42 Appendix B. Rationale for Usage of TCP Header Flags . . . . . . 43
B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake . . . 42 B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake . . . 43
B.2. Four Codepoints in the SYN/ACK . . . . . . . . . . . . . 43 B.2. Four Codepoints in the SYN/ACK . . . . . . . . . . . . . 44
B.3. Space for Future Evolution . . . . . . . . . . . . . . . 44 B.3. Space for Future Evolution . . . . . . . . . . . . . . . 45
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 44 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 46
1. Introduction 1. Introduction
Explicit Congestion Notification (ECN) [RFC3168] is a mechanism where Explicit Congestion Notification (ECN) [RFC3168] is a mechanism where
network nodes can mark IP packets instead of dropping them to network nodes can mark IP packets instead of dropping them to
indicate incipient congestion to the end-points. Receivers with an indicate incipient congestion to the end-points. Receivers with an
ECN-capable transport protocol feed back this information to the ECN-capable transport protocol feed back this information to the
sender. ECN is specified for TCP in such a way that only one sender. ECN is specified for TCP in such a way that only one
feedback signal can be transmitted per Round-Trip Time (RTT). feedback signal can be transmitted per Round-Trip Time (RTT).
Recently, proposed mechanisms like Congestion Exposure (ConEx Recently, proposed mechanisms like Congestion Exposure (ConEx
skipping to change at page 4, line 36 skipping to change at page 4, line 37
AccECN is solely an (experimental) change to the TCP wire protocol; AccECN is solely an (experimental) change to the TCP wire protocol;
it only specifies the negotiation and signaling of more accurate ECN it only specifies the negotiation and signaling of more accurate ECN
feedback from a TCP Data Receiver to a Data Sender. It is completely feedback from a TCP Data Receiver to a Data Sender. It is completely
independent of how TCP might respond to congestion feedback, which is independent of how TCP might respond to congestion feedback, which is
out of scope. For that we refer to [RFC3168] or any RFC that out of scope. For that we refer to [RFC3168] or any RFC that
specifies a different response to TCP ECN feedback, for example: specifies a different response to TCP ECN feedback, for example:
[RFC8257]; or the ECN experiments referred to in [RFC8311], namely: a [RFC8257]; or the ECN experiments referred to in [RFC8311], namely: a
TCP-based Low Latency Low Loss Scalable (L4S) congestion control TCP-based Low Latency Low Loss Scalable (L4S) congestion control
[I-D.ietf-tsvwg-l4s-arch]; ECN-capable TCP control packets [I-D.ietf-tsvwg-l4s-arch]; ECN-capable TCP control packets
[I-D.ietf-tcpm-generalized-ecn], or Alternative Backoff with ECN [I-D.ietf-tcpm-generalized-ecn], or Alternative Backoff with ECN
(ABE) [I-D.ietf-tcpm-alternativebackoff-ecn]. (ABE) [RFC8511].
It is likely (but not required) that the AccECN protocol will be It is recommended that the AccECN protocol is implemented alongside
implemented along with the following experimental additions to the the experimental ECN++ protocol [I-D.ietf-tcpm-generalized-ecn].
TCP-ECN protocol: ECN-capable TCP control packets and retransmissions Therefore, this specification does not discuss implementing AccECN
[I-D.ietf-tcpm-generalized-ecn], which includes the ECN-capable SYN/ alongside [RFC5562], which was an earlier experimental protocol with
ACK experiment [RFC5562]; and testing receiver non-compliance narrower scope than ECN++.
[I-D.moncaster-tcpm-rcv-cheat].
1.1. Document Roadmap 1.1. Document Roadmap
The following introductory sections outline the goals of AccECN The following introductory sections outline the goals of AccECN
(Section 1.2) and the goal of experiments with ECN (Section 1.3) so (Section 1.2) and the goal of experiments with ECN (Section 1.3) so
that it is clear what success would look like. Then terminology is that it is clear what success would look like. Then terminology is
defined (Section 1.4) and a recap of existing prerequisite technology defined (Section 1.4) and a recap of existing prerequisite technology
is given (Section 1.5). is given (Section 1.5).
Section 2 gives an informative overview of the AccECN protocol. Then Section 2 gives an informative overview of the AccECN protocol. Then
skipping to change at page 5, line 50 skipping to change at page 5, line 50
more accurate ECN feedback to the TCP protocol. The intention is to more accurate ECN feedback to the TCP protocol. The intention is to
specify the protocol sufficiently so that more than one specify the protocol sufficiently so that more than one
implementation can be built in order to test its function, robustness implementation can be built in order to test its function, robustness
and interoperability (with itself and with previous versions of ECN and interoperability (with itself and with previous versions of ECN
and TCP). and TCP).
The experimental protocol will be considered successful if testing The experimental protocol will be considered successful if testing
confirms that the proposed mechanism can be deployed at large scale. confirms that the proposed mechanism can be deployed at large scale.
Testing will mostly focus on fall-back strategies in case of Testing will mostly focus on fall-back strategies in case of
middlebox interference. Current recommended strategies are specified middlebox interference. Current recommended strategies are specified
in Sections 3.1.2, 3.2.3, 3.2.4 and 3.2.7. The effectiveness of in Sections 3.1.3, 3.2.3, 3.2.4 and 3.2.7. The effectiveness of
these strategies depends on the actual deployment situation of these strategies depends on the actual deployment situation of
middleboxes. Therefore experimental verification to confirm large- middleboxes. Therefore experimental verification to confirm large-
scale path traversal in the Internet is needed before finalizing this scale path traversal in the Internet is needed before finalizing this
specification on the Standards Track. specification on the Standards Track.
Another experimentation focus is the implementation feasibility of Another experimentation focus is the implementation feasibility of
change-triggered ACKs as described in section 3.2.8. While on change-triggered ACKs as described in section 3.2.8. While on
average this should not lead to a higher ACK rate, it changes the ACK average this should not lead to a higher ACK rate, it changes the ACK
pattern which can particularly have an impact on hardware offload. pattern which can particularly have an impact on hardware offload.
It is currently specified as a hard requirement, because the sender It is currently specified as a hard requirement, because the sender
skipping to change at page 7, line 20 skipping to change at page 7, line 20
to indicate an ECN-capable transport (ECT). If both ECN bits are to indicate an ECN-capable transport (ECT). If both ECN bits are
zero, the packet is considered to have been sent by a Not-ECN-capable zero, the packet is considered to have been sent by a Not-ECN-capable
Transport (Not-ECT). When a network node experiences congestion, it Transport (Not-ECT). When a network node experiences congestion, it
will occasionally either drop or mark a packet, with the choice will occasionally either drop or mark a packet, with the choice
depending on the packet's ECN codepoint. If the codepoint is Not- depending on the packet's ECN codepoint. If the codepoint is Not-
ECT, only drop is appropriate. If the codepoint is ECT(0) or ECT(1), ECT, only drop is appropriate. If the codepoint is ECT(0) or ECT(1),
the node can mark the packet by setting both ECN bits, which is the node can mark the packet by setting both ECN bits, which is
termed 'Congestion Experienced' (CE), or loosely a 'congestion mark'. termed 'Congestion Experienced' (CE), or loosely a 'congestion mark'.
Table 1 summarises these codepoints. Table 1 summarises these codepoints.
+-----------------------+---------------+---------------------------+
| IP-ECN codepoint      | Codepoint     | Description               |
| (binary)              | name          |                           |
+-----------------------+---------------+---------------------------+
| 00                    | Not-ECT       | Not ECN-Capable Transport |
| 01                    | ECT(1)        | ECN-Capable Transport (1) |
| 10                    | ECT(0)        | ECN-Capable Transport (0) |
| 11                    | CE            | Congestion Experienced    |
+-----------------------+---------------+---------------------------+
Table 1: The ECN Field in the IP Header
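As a rough illustration of Table 1 and of the drop-or-mark choice described before it, the codepoints could be sketched in Python as follows. The names IpEcn and mark_or_drop are hypothetical and do not appear in the draft. The handling of a packet that is already CE (left unchanged) is an assumption, since the text above only covers Not-ECT, ECT(0) and ECT(1).

```python
from enum import IntEnum
from typing import Optional


class IpEcn(IntEnum):
    """Two-bit ECN field of the IP header, as listed in Table 1."""
    NOT_ECT = 0b00  # Not ECN-Capable Transport
    ECT_1 = 0b01    # ECN-Capable Transport (1)
    ECT_0 = 0b10    # ECN-Capable Transport (0)
    CE = 0b11       # Congestion Experienced


def mark_or_drop(codepoint: IpEcn) -> Optional[IpEcn]:
    """Hypothetical helper: outcome when a congested node signals on a packet.

    Returns None when the only appropriate signal is a drop, otherwise the
    codepoint the packet carries onwards.
    """
    if codepoint == IpEcn.NOT_ECT:
        return None  # only drop is appropriate for Not-ECT
    # ECT(0) and ECT(1) get both ECN bits set, which is CE.
    # Assumption: a packet that already carries CE is forwarded unchanged.
    return IpEcn.CE
```

For example, mark_or_drop(IpEcn.ECT_0) yields IpEcn.CE, while mark_or_drop(IpEcn.NOT_ECT) yields None to stand for a drop.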
In the TCP header the first two bits in byte 14 are defined as flags In the TCP header the first two bits in byte 14 are defined as flags
for the use of ECN (CWR and ECE in Figure 1 [RFC3168]). A TCP client for the use of ECN (CWR and ECE in Figure 1 [RFC3168]). A TCP client
indicates it supports ECN by setting ECE=CWR=1 in the SYN, and an indicates it supports ECN by setting ECE=CWR=1 in the SYN, and an
ECN-enabled server confirms ECN support by setting ECE=1 and CWR=0 in ECN-enabled server confirms ECN support by setting ECE=1 and CWR=0 in
the SYN/ACK. On reception of a CE-marked packet at the IP layer, the the SYN/ACK. On reception of a CE-marked packet at the IP layer, the
Data Receiver starts to set the Echo Congestion Experienced (ECE) Data Receiver starts to set the Echo Congestion Experienced (ECE)
flag continuously in the TCP header of ACKs, which ensures the signal flag continuously in the TCP header of ACKs, which ensures the signal
skipping to change at page 11, line 31 skipping to change at page 11, line 37
private networks (e.g. data centres) set control packets to be ECN private networks (e.g. data centres) set control packets to be ECN
capable because they are precisely the packets that performance capable because they are precisely the packets that performance
depends on most. depends on most.
For this reason, AccECN is designed to be a generic reflector of For this reason, AccECN is designed to be a generic reflector of
whatever ECN markings it sees, whether or not they are compliant with whatever ECN markings it sees, whether or not they are compliant with
a current standard. Then as standards evolve, Data Senders can a current standard. Then as standards evolve, Data Senders can
upgrade unilaterally without any need for receivers to upgrade too. upgrade unilaterally without any need for receivers to upgrade too.
It is also useful to be able to rely on generic reflection behaviour It is also useful to be able to rely on generic reflection behaviour
when senders need to test for unexpected interference with markings when senders need to test for unexpected interference with markings
(for instance [I-D.kuehlewind-tcpm-ecn-fallback] and (for instance [I-D.kuehlewind-tcpm-ecn-fallback] and para 2 of
[I-D.moncaster-tcpm-rcv-cheat]). Section 20.2 of [RFC3168]).
The initial SYN is the most critical control packet, so AccECN The initial SYN is the most critical control packet, so AccECN
provides feedback on whether it is CE marked. Although RFC 3168 provides feedback on its ECN marking. Although RFC 3168 prohibits an
prohibits an ECN-capable SYN, providing feedback of CE marking on the ECN-capable SYN, providing feedback of ECN marking on the SYN
SYN supports future scenarios in which SYNs might be ECN-enabled supports future scenarios in which SYNs might be ECN-enabled (without
(without prejudging whether they ought to be). For instance, prejudging whether they ought to be). For instance, [RFC8311]
[RFC8311] updates this aspect of RFC 3168 to allow experimentation updates this aspect of RFC 3168 to allow experimentation with ECN-
with ECN-capable TCP control packets. capable TCP control packets.
Even if the TCP client (or server) has set the SYN (or SYN/ACK) to Even if the TCP client (or server) has set the SYN (or SYN/ACK) to
not-ECT in compliance with RFC 3168, feedback on the state of the ECN not-ECT in compliance with RFC 3168, feedback on the state of the ECN
field when it arrives at the receiver could still be useful, because field when it arrives at the receiver could still be useful, because
middleboxes have been known to overwrite the ECN IP field as if it is middleboxes have been known to overwrite the ECN IP field as if it is
still part of the old Type of Service (ToS) field [Mandalari18]. If still part of the old Type of Service (ToS) field [Mandalari18]. If
a TCP client has set the SYN to Not-ECT, but receives feedback that a TCP client has set the SYN to Not-ECT, but receives feedback that
the ECN field on the SYN arrived with a different codepoint, it can the ECN field on the SYN arrived with a different codepoint, it can
detect such middlebox interference and send Not-ECT for the rest of detect such middlebox interference and send Not-ECT for the rest of
the connection (see [I-D.kuehlewind-tcpm-ecn-fallback]). Today, if a the connection (see [I-D.kuehlewind-tcpm-ecn-fallback]). Today, if a
skipping to change at page 12, line 47 skipping to change at page 13, line 6
that arrived on the SYN. This applies whether or not the server that arrived on the SYN. This applies whether or not the server
itself supports setting the IP-ECN field on a SYN or SYN/ACK (see itself supports setting the IP-ECN field on a SYN or SYN/ACK (see
Section 2.5 for rationale). Section 2.5 for rationale).
Once a TCP client (A) has sent the above SYN to declare that it Once a TCP client (A) has sent the above SYN to declare that it
supports AccECN, and once it has received the above SYN/ACK segment supports AccECN, and once it has received the above SYN/ACK segment
that confirms that the TCP server supports AccECN, the TCP client that confirms that the TCP server supports AccECN, the TCP client
MUST set both its half connections into AccECN mode. MUST set both its half connections into AccECN mode.
The procedure for the client to follow if a SYN/ACK does not arrive The procedure for the client to follow if a SYN/ACK does not arrive
before its retransmission timer expires is given in Section 3.1.2. before its retransmission timer expires is given in Section 3.1.3.
The three flags set to 1 to indicate AccECN support on the SYN have The three flags set to 1 to indicate AccECN support on the SYN have
been carefully chosen to enable natural fall-back to prior stages in been carefully chosen to enable natural fall-back to prior stages in
the evolution of ECN. Table 2 tabulates all the negotiation the evolution of ECN. Table 2 tabulates all the negotiation
possibilities for ECN-related capabilities that involve at least one possibilities for ECN-related capabilities that involve at least one
AccECN-capable host. The entries in the first two columns have been AccECN-capable host. The entries in the first two columns have been
abbreviated, as follows: abbreviated, as follows:
AccECN: More Accurate ECN Feedback (the present specification) AccECN: More Accurate ECN Feedback (the present specification)
skipping to change at page 14, line 24 skipping to change at page 14, line 29
4. The fourth block displays a combination labelled 'Broken'. Some 4. The fourth block displays a combination labelled 'Broken'. Some
older TCP server implementations incorrectly set the reserved older TCP server implementations incorrectly set the reserved
flags in the SYN/ACK by reflecting those in the SYN. Such broken flags in the SYN/ACK by reflecting those in the SYN. Such broken
TCP servers (B) cannot support ECN, so as soon as an AccECN- TCP servers (B) cannot support ECN, so as soon as an AccECN-
capable TCP client (A) receives such a broken SYN/ACK it MUST capable TCP client (A) receives such a broken SYN/ACK it MUST
fall-back to Not ECN mode for both its half connections. fall-back to Not ECN mode for both its half connections.
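The 'Broken' case in item 4 could be checked along the following lines. This is only a sketch: the function names are hypothetical, and it assumes the client sent AE=CWR=ECE=1 on its SYN, so that a reflecting server returns the same three flags on the SYN/ACK. All other flag combinations are resolved by Table 2, which this sketch does not reproduce.

```python
from typing import Optional


def synack_reflects_accecn_syn(ae: int, cwr: int, ece: int) -> bool:
    """True if the SYN/ACK carries the same AE=CWR=ECE=1 as the AccECN SYN."""
    return (ae, cwr, ece) == (1, 1, 1)


def client_fallback_mode(ae: int, cwr: int, ece: int) -> Optional[str]:
    """Hypothetical helper for an AccECN client that receives a SYN/ACK.

    Returns "Not ECN" for the broken (reflected) case, applied to both half
    connections. Returns None for every other combination, which would be
    looked up in Table 2 instead.
    """
    if synack_reflects_accecn_syn(ae, cwr, ece):
        return "Not ECN"
    return None
```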
The following exceptional cases need some explanation: The following exceptional cases need some explanation:
ECN Nonce: With AccECN implementation, there is no need for the ECN ECN Nonce: With AccECN implementation, there is no need for the ECN
Nonce feedback mode [RFC3540], which has also been reclassified as Nonce feedback mode [RFC3540], which has been reclassified as
historic [RFC8311], as AccECN is compatible with an alternative historic [RFC8311], as AccECN is compatible with an alternative
ECN feedback integrity approach that does not use up the ECT(1) ECN feedback integrity approach that does not use up the ECT(1)
codepoint and can be implemented solely at the sender (see codepoint and can be implemented solely at the sender (see
Section 4.3). Section 4.3).
Simultaneous Open: An originating AccECN Host (A), having sent a SYN Simultaneous Open: An originating AccECN Host (A), having sent a SYN
with AE=1, CWR=1 and ECE=1, might receive another SYN from host B. with AE=1, CWR=1 and ECE=1, might receive another SYN from host B.
Host A MUST then enter the same feedback mode as it would have Host A MUST then enter the same feedback mode as it would have
entered had it been a responding host and received the same SYN. entered had it been a responding host and received the same SYN.
Then host A MUST send the same SYN/ACK as it would have sent had Then host A MUST send the same SYN/ACK as it would have sent had
it been a responding host. it been a responding host.
3.1.2. Retransmission of the SYN 3.1.2. Forward Compatibility
If a TCP server that implements AccECN receives a SYN with the three
TCP header flags (AE, CWR and ECE) set to any combination other than
000, 011 or 111, it MUST negotiate the use of AccECN as if they had
been set to 111. This ensures that future uses of the other
combinations on a SYN can rely on consistent behaviour from the
installed base of AccECN servers.
For the avoidance of doubt, the negotiation tabulated in Table 2
solely concerns the three TCP header flags shown (AE, CWR and ECE).
An AccECN host (client or server) MUST ignore the three remaining
reserved TCP header flags on all packets.
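The forward compatibility rule might be sketched as below. The function name and the returned labels are hypothetical. Treating 000 as "Not ECN" and 011 as "Classic ECN" is an assumption based on the RFC 3168 negotiation recapped in Section 1.5 (ECE=CWR=1 on the SYN). Only the rule that every other combination is handled as 111 comes directly from the text above.

```python
def server_view_of_syn(ae: int, cwr: int, ece: int) -> str:
    """Hypothetical classification of a SYN by an AccECN-capable server.

    Only AE, CWR and ECE are examined; the three remaining reserved TCP
    header flags are deliberately not parameters, because they are ignored.
    """
    flags = (ae, cwr, ece)
    if flags == (0, 0, 0):
        return "Not ECN"      # assumption: client offered no ECN support
    if flags == (0, 1, 1):
        return "Classic ECN"  # assumption: RFC 3168 style ECN-setup SYN
    # 111 and every currently unassigned combination negotiate AccECN,
    # as if the flags had been set to 111.
    return "AccECN"
```

With this sketch, a SYN carrying 100 or 010 is answered exactly as a SYN carrying 111 would be, which is what lets future specifications assign those combinations safely.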
3.1.3. Retransmission of the SYN
If the sender of an AccECN SYN times out before receiving the SYN/ If the sender of an AccECN SYN times out before receiving the SYN/
ACK, the sender SHOULD attempt to negotiate the use of AccECN at ACK, the sender SHOULD attempt to negotiate the use of AccECN at
least one more time by continuing to set all three TCP ECN flags on least one more time by continuing to set all three TCP ECN flags on
the first retransmitted SYN (using the usual retransmission time- the first retransmitted SYN (using the usual retransmission time-
outs). If this first retransmission also fails to be acknowledged, outs). If this first retransmission also fails to be acknowledged,
the sender SHOULD send subsequent retransmissions of the SYN without the sender SHOULD send subsequent retransmissions of the SYN without
any TCP-ECN flags set. This adds delay, in the case where a any TCP-ECN flags set. This adds delay, in the case where a
middlebox drops an AccECN (or ECN) SYN deliberately. However, middlebox drops an AccECN (or ECN) SYN deliberately. However,
current measurements imply that a drop is less likely to be due to current measurements imply that a drop is less likely to be due to
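The retransmission behaviour recommended in this section could be sketched as follows. The function name and the attempt numbering are hypothetical, and the sketch retries AccECN exactly once, whereas the text says "at least one more time", so an implementation may choose to retry more often.

```python
from typing import Tuple


def syn_ecn_flags(attempt: int) -> Tuple[int, int, int]:
    """Return (AE, CWR, ECE) for a SYN, given a hypothetical attempt index.

    attempt 0 is the original SYN, attempt 1 the first retransmission,
    attempt 2 and above are any subsequent retransmissions.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    if attempt <= 1:
        return (1, 1, 1)  # keep offering AccECN on the first retransmission
    return (0, 0, 0)      # later retransmissions carry no TCP-ECN flags
```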
skipping to change at page 17, line 13 skipping to change at page 17, line 26
retransmission of an unacknowledged SYN/ACK, or when both ends send retransmission of an unacknowledged SYN/ACK, or when both ends send
SYN/ACKs after AccECN support has been successfully negotiated during SYN/ACKs after AccECN support has been successfully negotiated during
a simultaneous open). a simultaneous open).
With only one exception, on any packet with the SYN flag cleared With only one exception, on any packet with the SYN flag cleared
(SYN=0), the Data Receiver MUST encode the three least significant (SYN=0), the Data Receiver MUST encode the three least significant
bits of its r.cep counter into the ACE field it feeds back to the bits of its r.cep counter into the ACE field it feeds back to the
Data Sender. Data Sender.
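As an illustration of the general rule above, the following minimal Python sketch shows how a Data Receiver might derive the ACE field from its r.cep counter. The function name ace_from_rcep is hypothetical and does not appear in the draft. The sketch does not cover the handshake exception described next.

```python
def ace_from_rcep(r_cep: int) -> int:
    """Return the 3-bit ACE value for a packet with SYN=0.

    Only the three least significant bits of r.cep are fed back,
    so the value seen by the Data Sender wraps modulo 8.
    """
    return r_cep & 0b111


if __name__ == "__main__":
    # Successive counter values and the ACE field each would produce.
    for r_cep in range(5, 11):
        print(r_cep, format(ace_from_rcep(r_cep), "03b"))
```

Because the field wraps, the Data Sender has to work from the change in ACE between ACKs rather than from its absolute value.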
There is only one exception to this rule: On the final ACK of the There is only one exception to this rule: On the final ACK of the
3WHS, a TCP client (A) in AccECN mode MUST use the ACE field to feed 3-way handshake (3WHS), a TCP client (A) in AccECN mode MUST use the
back which of the 4 possible values of the IP-ECN field were on the ACE field to feed back which of the 4 possible values of the IP-ECN
SYN/ACK (the binary encoding is the same as that used on the SYN/ field were on the SYN/ACK (the binary encoding is the same as that
ACK). Table 3 shows the meaning of each possible value of the ACE used on the SYN/ACK). Table 3 shows the meaning of each possible
field on the ACK of the SYN/ACK and the value that an AccECN server value of the ACE field on the ACK of the SYN/ACK and the value that
MUST set s.cep to as a result. The encoding in Table 3 is solely an AccECN server MUST set s.cep to as a result. The encoding in
applicable on a packet in the client-server direction with an Table 3 is solely applicable on a packet in the client-server
acknowledgement number 1 greater than the Initial Sequence Number direction with an acknowledgement number 1 greater than the Initial
(ISN) that was used by the server. Sequence Number (ISN) that was used by the server.
+--------------+---------------------------+------------------------+ +--------------+---------------------------+------------------------+
| ACE on ACK | IP-ECN codepoint on | Initial s.cep of | | ACE on ACK | IP-ECN codepoint on | Initial s.cep of |
| of SYN/ACK | SYN/ACK inferred by | server in AccECN mode | | of SYN/ACK | SYN/ACK inferred by | server in AccECN mode |
| | server | | | | server | |
+--------------+---------------------------+------------------------+ +--------------+---------------------------+------------------------+
| 0b000 | {Notes 1, 2} | Disable ECN | | 0b000 | {Notes 1, 2} | Disable ECN |
| 0b001 | {Notes 2, 3} | 5 | | 0b001 | {Notes 2, 3} | 5 |
| 0b010 | Not-ECT | 5 | | 0b010 | Not-ECT | 5 |
| 0b011 | ECT(1) | 5 | | 0b011 | ECT(1) | 5 |
skipping to change at page 18, line 46 skipping to change at page 19, line 10
in the initial ACE field has been initialized to a specific valid in the initial ACE field has been initialized to a specific valid
value - the above check solely tests whether the ACE fields have been value - the above check solely tests whether the ACE fields have been
incorrectly zeroed. This allows hosts to use different initial incorrectly zeroed. This allows hosts to use different initial
values as an additional signalling channel in future. values as an additional signalling channel in future.
3.2.4. Testing for Mangling of the IP/ECN Field 3.2.4. Testing for Mangling of the IP/ECN Field
The value of the ACE field on the SYN/ACK indicates the value of the The value of the ACE field on the SYN/ACK indicates the value of the
IP/ECN field when the SYN arrived at the server. The client can IP/ECN field when the SYN arrived at the server. The client can
compare this with how it originally set the IP/ECN field on the SYN. compare this with how it originally set the IP/ECN field on the SYN.
If this comparison implies an unsafe transition of the IP/ECN field, If this comparison implies an unsafe transition (see below) of the
for the remainder of the connection the client MUST NOT send ECN- IP/ECN field, for the remainder of the connection the client MUST NOT
capable packets, but it MUST continue to feed back any ECN markings send ECN-capable packets, but it MUST continue to feed back any ECN
on arriving packets. markings on arriving packets.
The value of the ACE field on the last ACK of the 3WHS indicates the The value of the ACE field on the last ACK of the 3WHS indicates the
value of the IP/ECN field when the SYN/ACK arrived at the client. value of the IP/ECN field when the SYN/ACK arrived at the client.
The server can compare this with how it originally set the IP/ECN The server can compare this with how it originally set the IP/ECN
field on the SYN/ACK. If this comparison implies an unsafe field on the SYN/ACK. If this comparison implies an unsafe
transition of the IP/ECN field, for the remainder of the connection transition of the IP/ECN field, for the remainder of the connection
the server MUST NOT send ECN-capable packets, but it MUST continue to the server MUST NOT send ECN-capable packets, but it MUST continue to
feedback any ECN markings on arriving packets. feedback any ECN markings on arriving packets.
The ACK of the SYN/ACK is not reliably delivered (nonetheless, the The ACK of the SYN/ACK is not reliably delivered (nonetheless, the
count of CE marks is still eventually delivered reliably). If this count of CE marks is still eventually delivered reliably). If this
ACK does not arrive, the server has to continue to send ECN-capable ACK does not arrive, the server can continue to send ECN-capable
packets without having tested for mangling of the IP/ECN field on the packets without having tested for mangling of the IP/ECN field on the
SYN/ACK. Experiments with AccECN deployment will assess whether this SYN/ACK. Experiments with AccECN deployment will assess whether this
limitation has any effect in practice. limitation has any effect in practice.
Invalid transitions of the IP/ECN field are defined in [RFC3168] and Invalid transitions of the IP/ECN field are defined in [RFC3168] and
repeated here for convenience: repeated here for convenience:
o the not-ECT codepoint changes; o the not-ECT codepoint changes;
o either ECT codepoint transitions to not-ECT; o either ECT codepoint transitions to not-ECT;
skipping to change at page 20, line 39 skipping to change at page 20, line 49
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Kind = TBD1 | Length = 11 | EE0B field | | Kind = TBD1 | Length = 11 | EE0B field |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| EE0B (cont'd) | ECEB field | | EE0B (cont'd) | ECEB field |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| EE1B field | | EE1B field |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 3: The AccECN Option Figure 3: The AccECN Option
The Data Receiver MUST set the Kind field to TBD1, which is When a Data Receiver sends an AccECN Option, it MUST set the Kind
registered in Section 6 as a new TCP option Kind called AccECN. An field to TBD1, which is registered in Section 6 as a new TCP option
experimental TCP option with Kind=254 MAY be used for initial Kind called AccECN. An experimental TCP option with Kind=254 MAY be
experiments, with magic number 0xACCE. used for initial experiments, with magic number 0xACCE.
Appendix A.1 gives an example algorithm for the Data Receiver to Appendix A.1 gives an example algorithm for the Data Receiver to
encode its byte counters into the AccECN Option, and for the Data encode its byte counters into the AccECN Option, and for the Data
Sender to decode the AccECN Option fields into its byte counters. Sender to decode the AccECN Option fields into its byte counters.
Note that there is no field to feedback Not-ECT bytes. Nonetheless Note that there is no field to feed back Not-ECT bytes. Nonetheless
an algorithm for the Data Sender to calculate the number of payload an algorithm for the Data Sender to calculate the number of payload
bytes received as Not-ECT is given in Appendix A.5. bytes received as Not-ECT is given in Appendix A.5.
Whenever a Data Receiver sends an AccECN Option, the rules in Whenever a Data Receiver sends an AccECN Option, the rules in
Section 3.2.8 expect it to always send a full-length option. To cope Section 3.2.8 expect it to always send a full-length option. To cope
with option space limitations, it can omit unchanged fields from the with option space limitations, it can omit unchanged fields from the
tail of the option, as long as it preserves the order of the tail of the option, as long as it preserves the order of the
remaining fields and includes any field that has changed. The length remaining fields and includes any field that has changed. The length
field MUST indicate which fields are present as follows: field MUST indicate which fields are present as follows:
skipping to change at page 21, line 26 skipping to change at page 21, line 34
Length=5: EE0B Length=5: EE0B
Length=2: (empty) Length=2: (empty)
The empty option of Length=2 is provided to allow for a case where an The empty option of Length=2 is provided to allow for a case where an
AccECN Option has to be sent (e.g. on the SYN/ACK to test the path), AccECN Option has to be sent (e.g. on the SYN/ACK to test the path),
but there is very limited space for the option. For initial but there is very limited space for the option. For initial
experiments, the Length field MUST be 2 greater to accommodate the experiments, the Length field MUST be 2 greater to accommodate the
16-bit magic number. 16-bit magic number.
All implementations of a Data Sender MUST be able to read in AccECN All implementations of a Data Sender that read any AccECN Option MUST
Options of any of the above lengths. If the AccECN Option is of any be able to read in AccECN Options of any of the above lengths. If
other length, implementations MUST use those whole 3 octet fields the AccECN Option is of any other length, implementations MUST use
that fit within the length and ignore the remainder of the option. those whole 3 octet fields that fit within the length and ignore the
remainder of the option.
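The length-handling rule above can be sketched in Python as follows. This is an illustrative reading, not the draft's own algorithm (Appendix A.1 gives that). The function name parse_accecn_option and its parameters are hypothetical. The field order is taken from Figure 3. Network byte order for each 3-octet field, and placement of the 0xACCE magic number directly after the Length octet in the experimental form, are assumptions of this sketch.

```python
FIELD_ORDER = ("EE0B", "ECEB", "EE1B")  # order as drawn in Figure 3


def parse_accecn_option(length: int, body: bytes,
                        experimental: bool = False) -> dict:
    """Decode the 24-bit fields present in an AccECN Option.

    length       -- value of the option's Length octet
    body         -- octets that follow the Kind and Length octets
    experimental -- True if the option uses Kind=254 with the 16-bit
                    magic number 0xACCE (assumed to lead the body)
    """
    header = 2  # Kind + Length
    if experimental:
        if body[:2] != b"\xac\xce":
            raise ValueError("missing 0xACCE magic number")
        body = body[2:]
        header = 4  # Kind + Length + magic number

    # Use only the whole 3-octet fields that fit; ignore any remainder.
    usable = min(length - header, len(body))
    count = max(0, min(usable // 3, len(FIELD_ORDER)))

    return {
        name: int.from_bytes(body[3 * i:3 * i + 3], "big")  # assumed big-endian
        for i, name in enumerate(FIELD_ORDER[:count])
    }
```

For example, a Length of 7 carries one whole field plus two stray octets, so the sketch returns only EE0B and discards the rest, while a Length of 2 yields an empty result.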
The use of the AccECN option is optional for the Data Receiver. If The AccECN Option has to be optional to implement, because both
the Data Receiver intents to use the AccECN option at any time during sender and receiver have to be able to cope without the option anyway
the rest of the connection it strongly recommended to also test its - in cases where it does not traverse a network path. It is
path traversal by including it in the SYN/ACK as specified in the RECOMMENDED to implement both sending and receiving of the AccECN
next section. By default the use of the AccECN option is Option. If sending of the AccECN Option is implemented, the fall-
RECOMMENDED. backs described in this document will need to be implemented as well
(unless solely for a controlled environment where path traversal is
not considered a problem). Even if a developer does not implement
sending of the AccECN Option, it is RECOMMENDED that they still
implement logic to receive and understand any AccECN Options sent by
remote peers.
If a Data Receiver intends to send the AccECN Option at any time
during the rest of the connection it is strongly recommended to also
test path traversal of the AccECN Option as specified in the next
section.
3.2.7. Path Traversal of the AccECN Option 3.2.7. Path Traversal of the AccECN Option
3.2.7.1. Testing the AccECN Option during the Handshake 3.2.7.1. Testing the AccECN Option during the Handshake
The TCP client MUST NOT include the AccECN TCP Option on the SYN. The TCP client MUST NOT include the AccECN TCP Option on the SYN. A
Nonetheless, if the AccECN negotiation using the ECN flags in the fall-back strategy for the loss of the SYN (possibly due to middlebox
main TCP header (Section 3.1) is successful, it implicitly declares interference) is specified in Section 3.1.3.
that the endpoints also support the AccECN TCP Option. A fall-back
strategy for the loss of the SYN (possibly due to middlebox
interference) is specified in Section 3.1.2.
A TCP server that confirms its support for AccECN (in response to an A TCP server that confirms its support for AccECN (in response to an
AccECN SYN from the client as described in Section 3.1) SHOULD AccECN SYN from the client as described in Section 3.1) SHOULD
include an AccECN TCP Option in the SYN/ACK. include an AccECN TCP Option in the SYN/ACK.
A TCP client that has successfully negotiated AccECN SHOULD include A TCP client that has successfully negotiated AccECN SHOULD include
an AccECN Option in the first ACK at the end of the 3WHS. However, an AccECN Option in the first ACK at the end of the 3WHS. However,
this first ACK is not delivered reliably, so the TCP client SHOULD this first ACK is not delivered reliably, so the TCP client SHOULD
also include an AccECN Option on the first data segment it sends (if also include an AccECN Option on the first data segment it sends (if
it ever sends one). it ever sends one).
skipping to change at page 22, line 22 skipping to change at page 22, line 39
if it has cached knowledge that the packet would be likely to be if it has cached knowledge that the packet would be likely to be
blocked on the path to the other host if it included an AccECN blocked on the path to the other host if it included an AccECN
Option. Option.
3.2.7.2. Testing for Loss of Packets Carrying the AccECN Option 3.2.7.2. Testing for Loss of Packets Carrying the AccECN Option
If after the normal TCP timeout the TCP server has not received an If after the normal TCP timeout the TCP server has not received an
ACK to acknowledge its SYN/ACK, the SYN/ACK might just have been ACK to acknowledge its SYN/ACK, the SYN/ACK might just have been
lost, e.g. due to congestion, or a middlebox might be blocking the lost, e.g. due to congestion, or a middlebox might be blocking the
AccECN Option. To expedite connection setup, the TCP server SHOULD AccECN Option. To expedite connection setup, the TCP server SHOULD
retransmit the SYN/ACK with the same TCP flags (AE, CWR and ECE) but retransmit the SYN/ACK repeating the AE, CWR and ECE TCP flags on the
with no AccECN Option. If this retransmission times out, to expedite original SYN/ACK but with no AccECN Option. If this retransmission
connection setup, the TCP server SHOULD disable AccECN and ECN for times out, to expedite connection setup, the TCP server SHOULD
this connection by retransmitting the SYN/ACK with AE=CWR=ECE=0 and disable AccECN and ECN for this connection by retransmitting the SYN/
no AccECN Option. Implementers MAY use other fall-back strategies if ACK with AE=CWR=ECE=0 and no AccECN Option. Implementers MAY use
they are found to be more effective (e.g. falling back to classic other fall-back strategies if they are found to be more effective
ECN feedback on the first retransmission; retrying the AccECN Option (e.g. falling back to classic ECN feedback on the first
for a second time before fall-back (most appropriate during high retransmission; retrying the AccECN Option for a second time before
levels of congestion); or falling back to classic ECN feedback rather fall-back (most appropriate during high levels of congestion); or
than non-ECN on the third retransmission). falling back to classic ECN feedback rather than non-ECN on the third
retransmission).
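The default SYN/ACK fall-back sequence described above can be summarised by the following Python sketch. The names SynAckPlan and synack_plan are hypothetical. The sketch assumes the server chose to include the AccECN Option on its original SYN/ACK and that any retransmission after the second timeout keeps the same fully disabled settings. It models only the SHOULD behaviour, not the alternative strategies that implementers MAY use.

```python
from typing import NamedTuple


class SynAckPlan(NamedTuple):
    keep_ecn_flags: bool   # repeat AE, CWR and ECE from the original SYN/ACK
    include_option: bool   # carry the AccECN Option


def synack_plan(timeouts: int) -> SynAckPlan:
    """Return the default plan for a SYN/ACK sent after `timeouts` timeouts."""
    if timeouts <= 0:
        # Original SYN/ACK: AccECN flags plus the option (assumed included).
        return SynAckPlan(keep_ecn_flags=True, include_option=True)
    if timeouts == 1:
        # First retransmission: same flags, but drop the option.
        return SynAckPlan(keep_ecn_flags=True, include_option=False)
    # Later retransmissions: AE=CWR=ECE=0 and no option.
    return SynAckPlan(keep_ecn_flags=False, include_option=False)
```

Dropping the option before dropping the flags separates the two likely causes of blocking: an option-sensitive middlebox is detected after one timeout, without giving up AccECN feedback in the main TCP header.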
If the TCP client detects that the first data segment it sent with If the TCP client detects that the first data segment it sent with
the AccECN Option was lost, it SHOULD fall back to no AccECN Option the AccECN Option was lost, it SHOULD fall back to no AccECN Option
on the retransmission. Again, implementers MAY use other fall-back on the retransmission. Again, implementers MAY use other fall-back
strategies such as attempting to retransmit a second segment with the strategies such as attempting to retransmit a second segment with the
AccECN Option before fall-back, and/or caching whether the AccECN AccECN Option before fall-back, and/or caching whether the AccECN
Option is blocked for subsequent connections. Option is blocked for subsequent connections.
Either host MAY include the AccECN Option in a subsequent segment to Either host MAY include the AccECN Option in a subsequent segment to
retest whether the AccECN Option can traverse the path. retest whether the AccECN Option can traverse the path.
skipping to change at page 27, line 25 skipping to change at page 27, line 43
Hardware to offload certain TCP processing represents another large Hardware to offload certain TCP processing represents another large
class of middleboxes, even though it is often a function of a host's class of middleboxes, even though it is often a function of a host's
network interface and rarely in its own 'box'. Leeway has been network interface and rarely in its own 'box'. Leeway has been
allowed in the present AccECN specification in the expectation that allowed in the present AccECN specification in the expectation that
offload hardware could comply and still serve its function. offload hardware could comply and still serve its function.
Nonetheless, such hardware SHOULD also preserve the timing of each Nonetheless, such hardware SHOULD also preserve the timing of each
ACK (for example, if it coalesced ACKs it would not be AccECN- ACK (for example, if it coalesced ACKs it would not be AccECN-
compliant). compliant).
The ACE field changes with every received CE marking, so today's
receive offloading could lead to many interrupts in high congestion
situations. Although that would be useful (because congestion
information is received sooner), it could also significantly increase
processor load, particularly in scenarios such as DCTCP or L4S where
the marking rate is generally higher.
In data centres it has been fortunate for offload hardware that
DCTCP-style feedback changes less often when there are long sequences
of CE marks, which is more common with a step marking threshold. In
order to enable DCTCP to improve its responsiveness, DCs will need to
move beyond step marking. Before this can happen, offload hardware
will have to explicitly address the variability of ECN feedback.
ECN encodes a varying signal in the ACK stream, so it is inevitable
that offload hardware will ultimately need to handle any form of ECN
feedback exceptionally. The purpose of working towards standardized
TCP ECN feedback is to reduce the risk for hardware developers, who
will have to choose which scheme is likely to become dominant.
4. Interaction with Other TCP Variants 4. Interaction with Other TCP Variants
This section is informative, not normative. This section is informative, not normative.
4.1. Compatibility with SYN Cookies 4.1. Compatibility with SYN Cookies
A TCP server can use SYN Cookies (see Appendix A of [RFC4987]) to A TCP server can use SYN Cookies (see Appendix A of [RFC4987]) to
protect itself from SYN flooding attacks. It places minimal commonly protect itself from SYN flooding attacks. It places minimal commonly
used connection state in the SYN/ACK, and deliberately does not hold used connection state in the SYN/ACK, and deliberately does not hold
any state while waiting for the subsequent ACK (e.g. it closes the any state while waiting for the subsequent ACK (e.g. it closes the
skipping to change at page 28, line 37 skipping to change at page 29, line 29
4.3. Compatibility with Feedback Integrity Mechanisms 4.3. Compatibility with Feedback Integrity Mechanisms
Three alternative mechanisms are available to assure the integrity of Three alternative mechanisms are available to assure the integrity of
ECN and/or loss signals. AccECN is compatible with any of these ECN and/or loss signals. AccECN is compatible with any of these
approaches: approaches:
o The Data Sender can test the integrity of the receiver's ECN (or o The Data Sender can test the integrity of the receiver's ECN (or
loss) feedback by occasionally setting the IP-ECN field to a value loss) feedback by occasionally setting the IP-ECN field to a value
normally only set by the network (and/or deliberately leaving a normally only set by the network (and/or deliberately leaving a
sequence number gap). Then it can test whether the Data sequence number gap). Then it can test whether the Data
Receiver's feedback faithfully reports what it expects Receiver's feedback faithfully reports what it expects (similar to
[I-D.moncaster-tcpm-rcv-cheat]. Unlike the ECN Nonce [RFC3540], para 2 of Section 20.2 of [RFC3168]). Unlike the ECN Nonce
this approach does not waste the ECT(1) codepoint in the IP [RFC3540], this approach does not waste the ECT(1) codepoint in
header, it does not require standardisation and it does not rely the IP header, it does not require standardisation and it does not
on misbehaving receivers volunteering to reveal feedback rely on misbehaving receivers volunteering to reveal feedback
information that allows them to be detected. However, setting the information that allows them to be detected. However, setting the
CE mark by the sender might conceal actual congestion feedback CE mark by the sender might conceal actual congestion feedback
from the network and should therefore only be done sparsely. from the network and should therefore only be done sparsely.
o Networks generate congestion signals when they are becoming o Networks generate congestion signals when they are becoming
congested, so networks are more likely than Data Senders to be congested, so networks are more likely than Data Senders to be
concerned about the integrity of the receiver's feedback of these concerned about the integrity of the receiver's feedback of these
signals. A network can enforce a congestion response to its ECN signals. A network can enforce a congestion response to its ECN
markings (or packet losses) using congestion exposure (ConEx) markings (or packet losses) using congestion exposure (ConEx)
audit [RFC7713]. Whether the receiver or a downstream network is audit [RFC7713]. Whether the receiver or a downstream network is
skipping to change at page 31, line 29 skipping to change at page 32, line 23
Forward Compatibility: The behaviour of endpoints and middleboxes is Forward Compatibility: The behaviour of endpoints and middleboxes is
carefully defined for all reserved or currently unused codepoints carefully defined for all reserved or currently unused codepoints
in the scheme, to ensure that any blocking of anomalous values is in the scheme, to ensure that any blocking of anomalous values is
always at least under reversible policy control. always at least under reversible policy control.
6. IANA Considerations 6. IANA Considerations
This document reassigns bit 7 of the TCP header flags to the AccECN This document reassigns bit 7 of the TCP header flags to the AccECN
experiment. This bit was previously called the Nonce Sum (NS) flag experiment. This bit was previously called the Nonce Sum (NS) flag
[RFC3540], but RFC 3540 is being reclassified as historic [RFC8311]. [RFC3540], but RFC 3540 has been reclassified as historic [RFC8311].
The flag will now be defined as: The flag will now be defined as:
+-----+-------------------+-----------+ +-----+-------------------+-----------+
| Bit | Name | Reference | | Bit | Name | Reference |
+-----+-------------------+-----------+ +-----+-------------------+-----------+
| 7 | AE (Accurate ECN) | RFC XXXX | | 7 | AE (Accurate ECN) | RFC XXXX |
+-----+-------------------+-----------+ +-----+-------------------+-----------+
[TO BE REMOVED: IANA is requested to update the existing entry in the [TO BE REMOVED: IANA is requested to update the existing entry in the
Transmission Control Protocol (TCP) Header Flags registration Transmission Control Protocol (TCP) Header Flags registration
skipping to change at page 34, line 19 skipping to change at page 35, line 7
[RFC6994] Touch, J., "Shared Use of Experimental TCP Options", [RFC6994] Touch, J., "Shared Use of Experimental TCP Options",
RFC 6994, DOI 10.17487/RFC6994, August 2013, RFC 6994, DOI 10.17487/RFC6994, August 2013,
<https://www.rfc-editor.org/info/rfc6994>. <https://www.rfc-editor.org/info/rfc6994>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>. May 2017, <https://www.rfc-editor.org/info/rfc8174>.
10.2. Informative References 10.2. Informative References
[I-D.ietf-tcpm-alternativebackoff-ecn]
Khademi, N., Welzl, M., Armitage, G., and G. Fairhurst,
"TCP Alternative Backoff with ECN (ABE)", draft-ietf-tcpm-
alternativebackoff-ecn-07 (work in progress), March 2018.
[I-D.ietf-tcpm-generalized-ecn] [I-D.ietf-tcpm-generalized-ecn]
Bagnulo, M. and B. Briscoe, "ECN++: Adding Explicit Bagnulo, M. and B. Briscoe, "ECN++: Adding Explicit
Congestion Notification (ECN) to TCP Control Packets", Congestion Notification (ECN) to TCP Control Packets",
draft-ietf-tcpm-generalized-ecn-02 (work in progress), draft-ietf-tcpm-generalized-ecn-03 (work in progress),
October 2017. October 2018.
[I-D.ietf-tsvwg-l4s-arch] [I-D.ietf-tsvwg-l4s-arch]
Briscoe, B., Schepper, K., and M. Bagnulo, "Low Latency, Briscoe, B., Schepper, K., and M. Bagnulo, "Low Latency,
Low Loss, Scalable Throughput (L4S) Internet Service: Low Loss, Scalable Throughput (L4S) Internet Service:
Architecture", draft-ietf-tsvwg-l4s-arch-02 (work in Architecture", draft-ietf-tsvwg-l4s-arch-03 (work in
progress), March 2018. progress), October 2018.
[I-D.kuehlewind-tcpm-ecn-fallback] [I-D.kuehlewind-tcpm-ecn-fallback]
Kuehlewind, M. and B. Trammell, "A Mechanism for ECN Path Kuehlewind, M. and B. Trammell, "A Mechanism for ECN Path
Probing and Fallback", draft-kuehlewind-tcpm-ecn- Probing and Fallback", draft-kuehlewind-tcpm-ecn-
fallback-01 (work in progress), September 2013. fallback-01 (work in progress), September 2013.
[I-D.moncaster-tcpm-rcv-cheat]
Moncaster, T., Briscoe, B., and A. Jacquet, "A TCP Test to
Allow Senders to Identify Receiver Non-Compliance", draft-
moncaster-tcpm-rcv-cheat-03 (work in progress), July 2014.
[Mandalari18] [Mandalari18]
Mandalari, A., Lutu, A., Briscoe, B., Bagnulo, M., and Oe. Mandalari, A., Lutu, A., Briscoe, B., Bagnulo, M., and Oe.
Alay, "Measuring ECN++: Good News for ++, Bad News for ECN Alay, "Measuring ECN++: Good News for ++, Bad News for ECN
over Mobile", IEEE Communications Magazine, March 2018. over Mobile", IEEE Communications Magazine, March 2018.
(to appear) (to appear)
[RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit
Congestion Notification (ECN) Signaling with Nonces", Congestion Notification (ECN) Signaling with Nonces",
RFC 3540, DOI 10.17487/RFC3540, June 2003, RFC 3540, DOI 10.17487/RFC3540, June 2003,
skipping to change at page 36, line 5 skipping to change at page 36, line 35
[RFC8257] Bensley, S., Thaler, D., Balasubramanian, P., Eggert, L., [RFC8257] Bensley, S., Thaler, D., Balasubramanian, P., Eggert, L.,
and G. Judd, "Data Center TCP (DCTCP): TCP Congestion and G. Judd, "Data Center TCP (DCTCP): TCP Congestion
Control for Data Centers", RFC 8257, DOI 10.17487/RFC8257, Control for Data Centers", RFC 8257, DOI 10.17487/RFC8257,
October 2017, <https://www.rfc-editor.org/info/rfc8257>. October 2017, <https://www.rfc-editor.org/info/rfc8257>.
[RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion [RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion
Notification (ECN) Experimentation", RFC 8311, Notification (ECN) Experimentation", RFC 8311,
DOI 10.17487/RFC8311, January 2018, DOI 10.17487/RFC8311, January 2018,
<https://www.rfc-editor.org/info/rfc8311>. <https://www.rfc-editor.org/info/rfc8311>.
[RFC8511] Khademi, N., Welzl, M., Armitage, G., and G. Fairhurst,
"TCP Alternative Backoff with ECN (ABE)", RFC 8511,
DOI 10.17487/RFC8511, December 2018,
<https://www.rfc-editor.org/info/rfc8511>.
Appendix A. Example Algorithms Appendix A. Example Algorithms
This appendix is informative, not normative. It gives example This appendix is informative, not normative. It gives example
algorithms that would satisfy the normative requirements of the algorithms that would satisfy the normative requirements of the
AccECN protocol. However, implementers are free to choose other ways AccECN protocol. However, implementers are free to choose other ways
to implement the requirements. to implement the requirements.
A.1. Example Algorithm to Encode/Decode the AccECN Option A.1. Example Algorithm to Encode/Decode the AccECN Option
The example algorithms below show how a Data Receiver in AccECN mode The example algorithms below show how a Data Receiver in AccECN mode
skipping to change at page 44, line 16 skipping to change at page 45, line 16
B.3. Space for Future Evolution B.3. Space for Future Evolution
Despite availability of usable TCP header space being extremely Despite availability of usable TCP header space being extremely
scarce, the AccECN protocol has taken all possible steps to ensure scarce, the AccECN protocol has taken all possible steps to ensure
that there is space to negotiate possible future variants of the that there is space to negotiate possible future variants of the
protocol, either if the experiment proves that a variant of AccECN is protocol, either if the experiment proves that a variant of AccECN is
required, or if a completely different ECN feedback approach is required, or if a completely different ECN feedback approach is
needed: needed:
Future AccECN variants: The requirement not to reject unexpected Future AccECN variants: When the AccECN capability is negotiated
initial values of the ACE counter (in the main TCP header) in the during TCP's 3WHS, the rows in Table 2 tagged as 'Nonce' and
last para of Section 3.2.3 ensures that 5 unused codepoints on the 'Broken' in the column for the capability of node B are unused by
final ACK of the 3-way handshake and 7 unused values on the first any current protocol in the RFC series. These could be used by
data packet from the server could be used to negotiate future TCP servers in future to indicate a variant of the AccECN
variants of the AccECN protocol between the endpoints. Also, a protocol. In recent measurement studies in which the response of
similar requirement not to reject unexpected initial values in the large numbers of servers to an AccECN SYN has been tested, e.g.
TCP option is for the same purpose. If traversal of the TCP [Mandalari18], a very small number of SYN/ACKs arrive with the
option were reliable, this would have enabled a far wider range of pattern tagged as 'Nonce', and a small but more significant number
future variation. arrive with the pattern tagged as 'Broken'. The 'Nonce' pattern
could be a sign that a few servers have implemented the ECN Nonce
[RFC3540], which has now been reclassified as historic [RFC8311],
or it could be the random result of some unknown middlebox
behaviour. The greater prevalence of the 'Broken' pattern
suggests that some instances still exist of the broken code that
reflects the reserved flags on the SYN.
The requirement not to reject unexpected initial values of the ACE
counter (in the main TCP header) in the last para of Section 3.2.3
ensures that 5 unused codepoints on the final ACK of the 3WHS and
7 unused values on the first data packet from the server could be
used to declare future variants of the AccECN protocol. The word
'declare' is used rather than 'negotiate' because, at this late
stage in the 3WHS, it would be too late for a negotiation between
the endpoints to be completed. A similar requirement not to
reject unexpected initial values in the TCP option
(Section 3.2.7.4) is for the same purpose. If traversal of the
TCP option were reliable, this would have enabled a far wider
range of future variation of the whole AccECN protocol.
Nonetheless, it could be used to reliably negotiate a wide range
of variation in the semantics of the AccECN Option.
Old (draft-ietf-tcpm-accurate-ecn-07):
Future non-AccECN variants: Five codepoints out of the 8 possible in
the 3 TCP header flags used by AccECN are unused on the initial
SYN (in the order AE,CWR,ECE): 001, 010, 100, 101, 110. All
possible combinations of SYN/ACK coiuld be used in response except
000 and reflection of the same values sent on the SYN. These
would not allow fall-back to Classic ECN support for a server that
did not understand them, but they are available, perhaps for uses
other than ECN in future.
New (draft-ietf-tcpm-accurate-ecn-08):
Future non-AccECN variants: Five codepoints out of the 8 possible in
the 3 TCP header flags used by AccECN are unused on the initial
SYN (in the order AE,CWR,ECE): 001, 010, 100, 101, 110.
Section 3.1.2 ensures that the installed base of AccECN servers
will all assume these are equivalent to AccECN negotiation with
111 on the SYN. These codepoints would not allow fall-back to
Classic ECN support for a server that did not understand them, but
this approach ensures they are available in future, perhaps for
uses other than ECN alongside the AccECN scheme. All possible
combinations of SYN/ACK could be used in response except either
000 or reflection of the same values sent on the SYN.
Of course, other ways could be resorted to in order to extend
AccECN or ECN in future, although their traversal properties are
likely to be inferior. They include a new TCP option; using the
remaining reserved flags in the main TCP header (preferably
extending the 3-bit combinations used by AccECN to 4-bit
combinations, rather than burning one bit for just one state); a
non-zero urgent pointer in combination with the URG flag cleared;
or some other unexpected combination of fields yet to be invented.
 End of changes. 42 change blocks. 
145 lines changed or deleted 211 lines changed or added
