draft-ietf-tcpm-accurate-ecn-09.txt   draft-ietf-tcpm-accurate-ecn-10.txt 
TCP Maintenance & Minor Extensions (tcpm) B. Briscoe TCP Maintenance & Minor Extensions (tcpm) B. Briscoe
Internet-Draft CableLabs Internet-Draft Independent
Intended status: Experimental M. Kuehlewind Intended status: Experimental M. Kuehlewind
Expires: January 9, 2020 Ericsson Expires: September 6, 2020 Ericsson
R. Scheffenegger R. Scheffenegger
NetApp NetApp
July 8, 2019 March 5, 2020
More Accurate ECN Feedback in TCP More Accurate ECN Feedback in TCP
draft-ietf-tcpm-accurate-ecn-09 draft-ietf-tcpm-accurate-ecn-10
Abstract Abstract
Explicit Congestion Notification (ECN) is a mechanism where network Explicit Congestion Notification (ECN) is a mechanism where network
nodes can mark IP packets instead of dropping them to indicate nodes can mark IP packets instead of dropping them to indicate
incipient congestion to the end-points. Receivers with an ECN- incipient congestion to the end-points. Receivers with an ECN-
capable transport protocol feed back this information to the sender. capable transport protocol feed back this information to the sender.
ECN is specified for TCP in such a way that only one feedback signal ECN is specified for TCP in such a way that only one feedback signal
can be transmitted per Round-Trip Time (RTT). Recent new TCP can be transmitted per Round-Trip Time (RTT). Recent new TCP
mechanisms like Congestion Exposure (ConEx), Data Center TCP (DCTCP) mechanisms like Congestion Exposure (ConEx), Data Center TCP (DCTCP)
or Low Latency Low Loss Scalable Throughput (L4S) need more accurate or Low Latency Low Loss Scalable Throughput (L4S) need more accurate
ECN feedback information whenever more than one marking is received ECN feedback information whenever more than one marking is received
in one RTT. This document specifies an experimental scheme to in one RTT. This document specifies an experimental scheme to
provide more than one feedback signal per RTT in the TCP header. provide more than one feedback signal per RTT in the TCP header.
Given TCP header space is scarce, it allocates a reserved header bit, Given TCP header space is scarce, it allocates a reserved header bit,
that was previously used for the ECN-Nonce which has now been that was previously used for the ECN-Nonce which has now been
declared historic. It also overloads the two existing ECN flags in declared historic. It also overloads the two existing ECN flags in
the TCP header. Supplementary feedback information can optionally be the TCP header. The resulting extra space is exploited to feed back
provided in a new TCP option, which is never used on the TCP SYN. the IP-ECN field received during the 3-way handshake as well.
Supplementary feedback information can optionally be provided in a
new TCP option, which is never used on the TCP SYN.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 6, 2020.
This Internet-Draft will expire on January 9, 2020.
Copyright Notice Copyright Notice
Copyright (c) 2019 IETF Trust and the persons identified as the Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
skipping to change at page 2, line 37 skipping to change at page 2, line 38
1.5. Recap of Existing ECN feedback in IP/TCP . . . . . . . . 7 1.5. Recap of Existing ECN feedback in IP/TCP . . . . . . . . 7
2. AccECN Protocol Overview and Rationale . . . . . . . . . . . 8 2. AccECN Protocol Overview and Rationale . . . . . . . . . . . 8
2.1. Capability Negotiation . . . . . . . . . . . . . . . . . 9 2.1. Capability Negotiation . . . . . . . . . . . . . . . . . 9
2.2. Feedback Mechanism . . . . . . . . . . . . . . . . . . . 9 2.2. Feedback Mechanism . . . . . . . . . . . . . . . . . . . 9
2.3. Delayed ACKs and Resilience Against ACK Loss . . . . . . 10 2.3. Delayed ACKs and Resilience Against ACK Loss . . . . . . 10
2.4. Feedback Metrics . . . . . . . . . . . . . . . . . . . . 11 2.4. Feedback Metrics . . . . . . . . . . . . . . . . . . . . 11
2.5. Generic (Dumb) Reflector . . . . . . . . . . . . . . . . 11 2.5. Generic (Dumb) Reflector . . . . . . . . . . . . . . . . 11
3. AccECN Protocol Specification . . . . . . . . . . . . . . . . 12 3. AccECN Protocol Specification . . . . . . . . . . . . . . . . 12
3.1. Negotiating to use AccECN . . . . . . . . . . . . . . . . 12 3.1. Negotiating to use AccECN . . . . . . . . . . . . . . . . 12
3.1.1. Negotiation during the TCP handshake . . . . . . . . 12 3.1.1. Negotiation during the TCP handshake . . . . . . . . 12
3.1.2. Forward Compatibility . . . . . . . . . . . . . . . . 14 3.1.2. Backward Compatibility . . . . . . . . . . . . . . . 13
3.1.3. Retransmission of the SYN . . . . . . . . . . . . . . 15 3.1.3. Forward Compatibility . . . . . . . . . . . . . . . . 15
3.2. AccECN Feedback . . . . . . . . . . . . . . . . . . . . . 15 3.1.4. Retransmission of the SYN . . . . . . . . . . . . . . 16
3.2.1. Initialization of Feedback Counters at the Data 3.1.5. Implications of AccECN Mode . . . . . . . . . . . . . 17
Sender . . . . . . . . . . . . . . . . . . . . . . . 16 3.2. AccECN Feedback . . . . . . . . . . . . . . . . . . . . . 18
3.2.2. The ACE Field . . . . . . . . . . . . . . . . . . . . 16 3.2.1. Initialization of Feedback Counters . . . . . . . . . 19
3.2.3. Testing for Zeroing of the ACE Field . . . . . . . . 18 3.2.2. The ACE Field . . . . . . . . . . . . . . . . . . . . 19
3.2.4. Testing for Mangling of the IP/ECN Field . . . . . . 19 3.2.3. The AccECN Option . . . . . . . . . . . . . . . . . . 27
3.2.5. Safety against Ambiguity of the ACE Field . . . . . . 20
3.2.6. The AccECN Option . . . . . . . . . . . . . . . . . . 20
3.2.7. Path Traversal of the AccECN Option . . . . . . . . . 22
3.2.8. Usage of the AccECN TCP Option . . . . . . . . . . . 25
3.3. Requirements for TCP Proxies, Offload Engines and other 3.3. Requirements for TCP Proxies, Offload Engines and other
Middleboxes on AccECN Compliance . . . . . . . . . . . . 27 Middleboxes on AccECN Compliance . . . . . . . . . . . . 36
4. Interaction with Other TCP Variants . . . . . . . . . . . . . 28 4. Interaction with Other TCP Variants . . . . . . . . . . . . . 37
4.1. Compatibility with SYN Cookies . . . . . . . . . . . . . 28 4.1. Compatibility with SYN Cookies . . . . . . . . . . . . . 37
4.2. Compatibility with Other TCP Options and Experiments . . 29 4.2. Compatibility with Other TCP Options and Experiments . . 38
4.3. Compatibility with Feedback Integrity Mechanisms . . . . 29 4.3. Compatibility with Feedback Integrity Mechanisms . . . . 38
5. Protocol Properties . . . . . . . . . . . . . . . . . . . . . 30
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 32 5. Protocol Properties . . . . . . . . . . . . . . . . . . . . . 40
7. Security Considerations . . . . . . . . . . . . . . . . . . . 33 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 42
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 34 7. Security Considerations . . . . . . . . . . . . . . . . . . . 43
9. Comments Solicited . . . . . . . . . . . . . . . . . . . . . 34 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 43
10. References . . . . . . . . . . . . . . . . . . . . . . . . . 34 9. Comments Solicited . . . . . . . . . . . . . . . . . . . . . 44
10.1. Normative References . . . . . . . . . . . . . . . . . . 34 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 44
10.2. Informative References . . . . . . . . . . . . . . . . . 35 10.1. Normative References . . . . . . . . . . . . . . . . . . 44
Appendix A. Example Algorithms . . . . . . . . . . . . . . . . . 37 10.2. Informative References . . . . . . . . . . . . . . . . . 45
A.1. Example Algorithm to Encode/Decode the AccECN Option . . 37 Appendix A. Example Algorithms . . . . . . . . . . . . . . . . . 47
A.1. Example Algorithm to Encode/Decode the AccECN Option . . 47
A.2. Example Algorithm for Safety Against Long Sequences of A.2. Example Algorithm for Safety Against Long Sequences of
ACK Loss . . . . . . . . . . . . . . . . . . . . . . . . 38 ACK Loss . . . . . . . . . . . . . . . . . . . . . . . . 48
A.2.1. Safety Algorithm without the AccECN Option . . . . . 38 A.2.1. Safety Algorithm without the AccECN Option . . . . . 48
A.2.2. Safety Algorithm with the AccECN Option . . . . . . . 40 A.2.2. Safety Algorithm with the AccECN Option . . . . . . . 50
A.3. Example Algorithm to Estimate Marked Bytes from Marked A.3. Example Algorithm to Estimate Marked Bytes from Marked
Packets . . . . . . . . . . . . . . . . . . . . . . . . . 41 Packets . . . . . . . . . . . . . . . . . . . . . . . . . 52
A.4. Example Algorithm to Beacon AccECN Options . . . . . . . 42 A.4. Example Algorithm to Beacon AccECN Options . . . . . . . 52
A.5. Example Algorithm to Count Not-ECT Bytes . . . . . . . . 43 A.5. Example Algorithm to Count Not-ECT Bytes . . . . . . . . 53
Appendix B. Rationale for Usage of TCP Header Flags . . . . . . 43 Appendix B. Rationale for Usage of TCP Header Flags . . . . . . 54
B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake . . . 43 B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake . . . 54
B.2. Four Codepoints in the SYN/ACK . . . . . . . . . . . . . 44 B.2. Four Codepoints in the SYN/ACK . . . . . . . . . . . . . 55
B.3. Space for Future Evolution . . . . . . . . . . . . . . . 45 B.3. Space for Future Evolution . . . . . . . . . . . . . . . 55
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 46 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 57
1. Introduction 1. Introduction
Explicit Congestion Notification (ECN) [RFC3168] is a mechanism where Explicit Congestion Notification (ECN) [RFC3168] is a mechanism where
network nodes can mark IP packets instead of dropping them to network nodes can mark IP packets instead of dropping them to
indicate incipient congestion to the end-points. Receivers with an indicate incipient congestion to the end-points. Receivers with an
ECN-capable transport protocol feed back this information to the ECN-capable transport protocol feed back this information to the
sender. ECN is specified for TCP in such a way that only one sender. ECN is specified for TCP in such a way that only one
feedback signal can be transmitted per Round-Trip Time (RTT). feedback signal can be transmitted per Round-Trip Time (RTT).
Recently, proposed mechanisms like Congestion Exposure (ConEx Recently, proposed mechanisms like Congestion Exposure (ConEx
skipping to change at page 4, line 33 skipping to change at page 4, line 31
three bits in the TCP header to negotiate the most advanced feedback three bits in the TCP header to negotiate the most advanced feedback
protocol that they can both support, in a way that is backward protocol that they can both support, in a way that is backward
compatible with [RFC3168]. compatible with [RFC3168].
AccECN is solely an (experimental) change to the TCP wire protocol; AccECN is solely an (experimental) change to the TCP wire protocol;
it only specifies the negotiation and signaling of more accurate ECN it only specifies the negotiation and signaling of more accurate ECN
feedback from a TCP Data Receiver to a Data Sender. It is completely feedback from a TCP Data Receiver to a Data Sender. It is completely
independent of how TCP might respond to congestion feedback, which is independent of how TCP might respond to congestion feedback, which is
out of scope. For that we refer to [RFC3168] or any RFC that out of scope. For that we refer to [RFC3168] or any RFC that
specifies a different response to TCP ECN feedback, for example: specifies a different response to TCP ECN feedback, for example:
[RFC8257]; or the ECN experiments referred to in [RFC8311], namely: a [RFC8257]; or ECN experiments such as those referred to in [RFC8311],
TCP-based Low Latency Low Loss Scalable (L4S) congestion control namely: a TCP-based Low Latency Low Loss Scalable (L4S) congestion
[I-D.ietf-tsvwg-l4s-arch]; ECN-capable TCP control packets control [I-D.ietf-tsvwg-l4s-arch]; ECN-capable TCP control packets
[I-D.ietf-tcpm-generalized-ecn], or Alternative Backoff with ECN [I-D.ietf-tcpm-generalized-ecn], or Alternative Backoff with ECN
(ABE) [RFC8511]. (ABE) [RFC8511].
It is recommended that the AccECN protocol is implemented alongside It is recommended that the AccECN protocol is implemented alongside
the experimental ECN++ protocol [I-D.ietf-tcpm-generalized-ecn]. SACK [RFC2018] and the experimental ECN++ protocol
Therefore, this specification does not discuss implementing AccECN [I-D.ietf-tcpm-generalized-ecn], which allows the ECN capability to
alongside [RFC5562], which was an earlier experimental protocol with be used on TCP control packets. Therefore, this specification does
narrower scope than ECN++. not discuss implementing AccECN alongside [RFC5562], which was an
earlier experimental protocol with narrower scope than ECN++.
1.1. Document Roadmap 1.1. Document Roadmap
The following introductory sections outline the goals of AccECN The following introductory sections outline the goals of AccECN
(Section 1.2) and the goal of experiments with ECN (Section 1.3) so (Section 1.2) and the goal of experiments with ECN (Section 1.3) so
that it is clear what success would look like. Then terminology is that it is clear what success would look like. Then terminology is
defined (Section 1.4) and a recap of existing prerequisite technology defined (Section 1.4) and a recap of existing prerequisite technology
is given (Section 1.5). is given (Section 1.5).
Section 2 gives an informative overview of the AccECN protocol. Then Section 2 gives an informative overview of the AccECN protocol. Then
Section 3 gives the normative protocol specification. Section 4 Section 3 gives the normative protocol specification. Section 4
assesses the interaction of AccECN with commonly used variants of assesses the interaction of AccECN with commonly used variants of
TCP, whether standardised or not. Section 5 summarises the features TCP, whether standardized or not. Section 5 summarizes the features
and properties of AccECN. and properties of AccECN.
Section 6 summarises the protocol fields and numbers that IANA will Section 6 summarizes the protocol fields and numbers that IANA will
need to assign and Section 7 points to the aspects of the protocol need to assign and Section 7 points to the aspects of the protocol
that will be of interest to the security community. that will be of interest to the security community.
Appendix A gives pseudocode examples for the various algorithms that Appendix A gives pseudocode examples for the various algorithms that
AccECN uses. AccECN uses.
1.2. Goals 1.2. Goals
[RFC7560] enumerates requirements that a candidate feedback scheme [RFC7560] enumerates requirements that a candidate feedback scheme
will need to satisfy, under the headings: resilience, timeliness, will need to satisfy, under the headings: resilience, timeliness,
integrity, accuracy (including ordering and lack of bias), integrity, accuracy (including ordering and lack of bias),
complexity, overhead and compatibility (both backward and forward). complexity, overhead and compatibility (both backward and forward).
It recognises that a perfect scheme that fully satisfies all the It recognizes that a perfect scheme that fully satisfies all the
requirements is unlikely and trade-offs between requirements are requirements is unlikely and trade-offs between requirements are
likely. Section 5 presents the properties of AccECN against these likely. Section 5 presents the properties of AccECN against these
requirements and discusses the trade-offs made. requirements and discusses the trade-offs made.
The requirements document recognises that a protocol as ubiquitous as The requirements document recognizes that a protocol as ubiquitous as
TCP needs to be able to serve as-yet-unspecified requirements. TCP needs to be able to serve as-yet-unspecified requirements.
Therefore an AccECN receiver aims to act as a generic (dumb) Therefore an AccECN receiver aims to act as a generic (dumb)
reflector of congestion information so that in future new sender reflector of congestion information so that in future new sender
behaviours can be deployed unilaterally. behaviours can be deployed unilaterally.
1.3. Experiment Goals 1.3. Experiment Goals
TCP is critical to the robust functioning of the Internet, therefore TCP is critical to the robust functioning of the Internet, therefore
any proposed modifications to TCP need to be thoroughly tested. The any proposed modifications to TCP need to be thoroughly tested. The
present specification describes an experimental protocol that adds present specification describes an experimental protocol that adds
more accurate ECN feedback to the TCP protocol. The intention is to more accurate ECN feedback to the TCP protocol. The intention is to
specify the protocol sufficiently so that more than one specify the protocol sufficiently so that more than one
implementation can be built in order to test its function, robustness implementation can be built in order to test its function, robustness
and interoperability (with itself and with previous version of ECN and interoperability (with itself and with previous version of ECN
and TCP). and TCP).
The experimental protocol will be considered successful if testing The experimental protocol will be considered successful if testing
confirms that the proposed mechanism can be deployed at large scale. confirms that the proposed mechanism can be deployed at large scale.
Testing will mostly focus on fall-back strategies in case of Testing will mostly focus on fall-back strategies in case of
middlebox interference. Current recommended strategies are specified middlebox interference. Current recommended strategies are specified
in Sections 3.1.3, 3.2.3, 3.2.4 and 3.2.7. The effectiveness of in Sections 3.1.4, 3.2.2.3, 3.2.2.4 and 3.2.3.2. The effectiveness
these strategies depends on the actual deployment situation of of these strategies depends on the actual deployment situation of
middleboxes. Therefore experimental verification to confirm large- middleboxes. Therefore experimental verification to confirm large-
scale path traversal in the Internet is needed before finalizing this scale path traversal in the Internet is needed before finalizing this
specification on the Standards Track. specification on the Standards Track.
Another experimentation focus is the implementation feasibility of Another experimentation focus is the implementation feasibility of
change-triggered ACKs as described in section 3.2.8. While on change-triggered ACKs as described in section 3.2.3.3. While on
average this should not lead to a higher ACK rate, it changes the ACK average this should not lead to a higher ACK rate, it changes the ACK
pattern which can particularly have an impact on hardware offload. pattern which can particularly have an impact on hardware offload.
It is currently specified as a hard requirement, because the sender It is currently specified as a hard requirement, because the sender
can exploit the predictability of the receiver's behaviour. However, can exploit the predictability of the receiver's behaviour. However,
further experimentation is needed to advise if it will have to become further experimentation is needed to advise if it will have to become
just preferred behavior. just preferred behavior.
1.4. Terminology 1.4. Terminology
AccECN: The more accurate ECN feedback scheme will be called AccECN AccECN: The more accurate ECN feedback scheme will be called AccECN
for short. for short.
Classic ECN: the ECN protocol specified in [RFC3168]. Classic ECN: the ECN protocol specified in [RFC3168].
Classic ECN feedback: the feedback aspect of the ECN protocol Classic ECN feedback: the feedback aspect of the ECN protocol
specified in [RFC3168], including generation, encoding, specified in [RFC3168], including generation, encoding,
transmission and decoding of feedback, but not the Data Sender's transmission and decoding of feedback, but not the Data Sender's
subsequent response to that feedback. subsequent response to that feedback.
ACK: A TCP acknowledgement, with or without a data payload. ACK: A TCP acknowledgement, with or without a data payload (ACK=1).
Pure ACK: A TCP acknowledgement without a data payload. Pure ACK: A TCP acknowledgement without a data payload.
Acceptable packet / segment: A packet or segment that passes the
acceptability tests in [RFC0793] and [RFC5961].
TCP client: The TCP stack that originates a connection. TCP client: The TCP stack that originates a connection.
TCP server: The TCP stack that responds to a connection request. TCP server: The TCP stack that responds to a connection request.
Data Receiver: The endpoint of a TCP half-connection that receives Data Receiver: The endpoint of a TCP half-connection that receives
data and sends AccECN feedback. data and sends AccECN feedback.
Data Sender: The endpoint of a TCP half-connection that sends data Data Sender: The endpoint of a TCP half-connection that sends data
and receives AccECN feedback. and receives AccECN feedback.
skipping to change at page 9, line 31 skipping to change at page 9, line 31
therefore it can only be used if both endpoints have been upgraded to therefore it can only be used if both endpoints have been upgraded to
understand it. The TCP client signals support for AccECN on the understand it. The TCP client signals support for AccECN on the
initial SYN of a connection and the TCP server signals whether it initial SYN of a connection and the TCP server signals whether it
supports AccECN on the SYN/ACK. The TCP flags on the SYN that the supports AccECN on the SYN/ACK. The TCP flags on the SYN that the
client uses to signal AccECN support have been carefully chosen so client uses to signal AccECN support have been carefully chosen so
that a TCP server will interpret them as a request to support the that a TCP server will interpret them as a request to support the
most recent variant of ECN feedback that it supports. Then the most recent variant of ECN feedback that it supports. Then the
client falls back to the same variant of ECN feedback. client falls back to the same variant of ECN feedback.
An AccECN TCP client does not send the new AccECN Option on the SYN An AccECN TCP client does not send the new AccECN Option on the SYN
as SYN option space is limited and successful negotiation using the as SYN option space is limited. The TCP server sends the AccECN
flags in the main header is taken as sufficient evidence that both
ends also support the AccECN Option. The TCP server sends the AccECN
Option on the SYN/ACK and the client sends it on the first ACK to Option on the SYN/ACK and the client sends it on the first ACK to
test whether the network path forwards the option correctly. test whether the network path forwards the option correctly.
2.2. Feedback Mechanism 2.2. Feedback Mechanism
A Data Receiver maintains four counters initialised at the start of A Data Receiver maintains four counters initialized at the start of
the half-connection. Three count the number of arriving payload the half-connection. Three count the number of arriving payload
bytes marked CE, ECT(1) and ECT(0) respectively. The fourth counts bytes marked CE, ECT(1) and ECT(0) respectively. The fourth counts
the number of packets arriving marked with a CE codepoint (including the number of packets arriving marked with a CE codepoint (including
control packets without payload if they are CE-marked). control packets without payload if they are CE-marked).
The Data Sender maintains four equivalent counters for the half The Data Sender maintains four equivalent counters for the half
connection, and the AccECN protocol is designed to ensure they will connection, and the AccECN protocol is designed to ensure they will
match the values in the Data Receiver's counters, albeit after a match the values in the Data Receiver's counters, albeit after a
little delay. little delay.
Each ACK carries the three least significant bits (LSBs) of the Each ACK carries the three least significant bits (LSBs) of the
packet-based CE counter using the ECN bits in the TCP header, now packet-based CE counter using the ECN bits in the TCP header, now
renamed the Accurate ECN (ACE) field (see Figure 2 later). The LSBs renamed the Accurate ECN (ACE) field (see Figure 3 later). The 24
of each of the three byte counters are carried in the AccECN Option. LSBs of each byte counter are carried in the AccECN Option.
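The counters described in Section 2.2 can be sketched as follows.  This is
only an illustration under stated assumptions, not the normative behaviour:
the class name, method names and string labels are hypothetical, and the
counters start at zero purely as a placeholder (the actual initial values
are set in Section 3.2.1 and are not reproduced here).

```python
from dataclasses import dataclass

ACE_BITS = 3       # width of the ACE field in the TCP header
OPTION_BITS = 24   # width of each byte-counter field in the AccECN Option


@dataclass
class ReceiverCounters:
    """Hypothetical Data Receiver state: one packet counter, three byte counters."""
    ce_packets: int = 0   # placeholder initial value, see lead-in
    ce_bytes: int = 0
    ect0_bytes: int = 0
    ect1_bytes: int = 0

    def on_packet(self, ecn: str, payload_len: int) -> None:
        """Account for one arriving packet with the given IP-ECN codepoint."""
        if ecn == "CE":
            # The packet counter includes CE-marked control packets
            # (payload_len == 0); the byte counter only sees payload.
            self.ce_packets += 1
            self.ce_bytes += payload_len
        elif ecn == "ECT(0)":
            self.ect0_bytes += payload_len
        elif ecn == "ECT(1)":
            self.ect1_bytes += payload_len
        # Not-ECT packets change none of these four counters.

    def ace_field(self) -> int:
        """Three least significant bits of the CE packet counter."""
        return self.ce_packets & ((1 << ACE_BITS) - 1)

    def option_fields(self) -> dict:
        """24 least significant bits of each byte counter."""
        mask = (1 << OPTION_BITS) - 1
        return {
            "ect0_bytes": self.ect0_bytes & mask,
            "ce_bytes": self.ce_bytes & mask,
            "ect1_bytes": self.ect1_bytes & mask,
        }
```

The dictionary returned by option_fields() deliberately says nothing about
the order of fields on the wire; that is defined by the option format in
Section 3.2.3.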
2.3. Delayed ACKs and Resilience Against ACK Loss 2.3. Delayed ACKs and Resilience Against ACK Loss
With both the ACE and the AccECN Option mechanisms, the Data Receiver With both the ACE and the AccECN Option mechanisms, the Data Receiver
continually repeats the current LSBs of each of its respective continually repeats the current LSBs of each of its respective
counters. There is no need to acknowledge these continually repeated counters. There is no need to acknowledge these continually repeated
counters, so the congestion window reduced (CWR) mechanism is no counters, so the congestion window reduced (CWR) mechanism is no
longer used. Even if some ACKs are lost, the Data Sender should be longer used. Even if some ACKs are lost, the Data Sender should be
able to infer how much to increment its own counters, even if the able to infer how much to increment its own counters, even if the
protocol field has wrapped. protocol field has wrapped.
The 3-bit ACE field can wrap fairly frequently. Therefore, even if The 3-bit ACE field can wrap fairly frequently. Therefore, even if
it appears to have incremented by one (say), the field might have it appears to have incremented by one (say), the field might have
actually cycled completely then incremented by one. The Data actually cycled completely then incremented by one. The Data
Receiver is required not to delay sending an ACK to such an extent Receiver is not allowed to delay sending an ACK to such an extent
that the ACE field would cycle. However cyling is still a that the ACE field would cycle. However cycling is still a
possibility at the Data Sender because a whole sequence of ACKs possibility at the Data Sender because a whole sequence of ACKs
carrying intervening values of the field might all be lost or delayed carrying intervening values of the field might all be lost or delayed
in transit. in transit.
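The ambiguity described above can be illustrated with modulo-8 arithmetic.
The sketch below is not the safety algorithm of Appendix A.2; it only
enumerates the increments that are consistent with two successive ACE
values.  The function names are hypothetical, and bounding the candidates
by the number of newly acknowledged packets is an assumption made for the
example.

```python
ACE_MODULUS = 8  # a 3-bit field wraps every 8 CE-marked packets


def min_ace_delta(prev_ace: int, new_ace: int) -> int:
    """Smallest non-negative increment that explains the change in ACE."""
    return (new_ace - prev_ace) % ACE_MODULUS


def candidate_deltas(prev_ace: int, new_ace: int, newly_acked_packets: int) -> list:
    """All increments consistent with the observed ACE values.

    Each extra full cycle of the field adds 8.  Assumption for this sketch:
    no more packets can have been CE-marked than were newly acknowledged.
    """
    smallest = min_ace_delta(prev_ace, new_ace)
    return list(range(smallest, newly_acked_packets + 1, ACE_MODULUS))
```

For instance, if ACE moves from 7 to 0 while 20 packets are newly
acknowledged, candidate_deltas(7, 0, 20) yields 1, 9 and 17: the field may
have incremented once, or wrapped once or twice before doing so.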
The fields in the AccECN Option are larger, but they will increment The fields in the AccECN Option are larger, but they will increment
in larger steps because they count bytes not packets. Nonetheless, in larger steps because they count bytes not packets. Nonetheless,
their size has been chosen such that a whole cycle of the field would their size has been chosen such that a whole cycle of the field would
never occur between ACKs unless there had been an infeasibly long never occur between ACKs unless there had been an infeasibly long
sequence of ACK losses. Therefore, as long as the AccECN Option is sequence of ACK losses. Therefore, as long as the AccECN Option is
available, it can be treated as a dependable feedback channel. available, it can be treated as a dependable feedback channel.
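Because a whole cycle of a 24-bit field between ACKs is treated as
infeasible, a Data Sender could update its full-width copy of a byte
counter as sketched below.  The function name is hypothetical, and the
sketch assumes at most one wrap of the field since the previous update.

```python
FIELD_BITS = 24
FIELD_MODULUS = 1 << FIELD_BITS


def update_byte_counter(full_counter: int, field: int) -> int:
    """Advance the sender's full-width counter from a 24-bit option field.

    Assumes the receiver's counter has advanced by less than 2**24 bytes
    since the sender's last update, so the modulo difference is the true
    increment.
    """
    if not 0 <= field < FIELD_MODULUS:
        raise ValueError("field must fit in 24 bits")
    delta = (field - full_counter) % FIELD_MODULUS
    return full_counter + delta
```

With a stored counter of 0xFFFFF0 and a received field of 0x000010, the
increment is 32 bytes and the stored counter becomes 0x1000010.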
If the AccECN Option is not available, e.g. it is being stripped by a If the AccECN Option is not available, e.g. it is being stripped by a
middlebox, the AccECN protocol will only feed back information on CE middlebox, the AccECN protocol will only feed back information on CE
markings (using the ACE field). Although not ideal, this will be markings (using the ACE field). Although not ideal, this will be
sufficient, because it is envisaged that neither ECT(0) nor ECT(1) sufficient, because it is envisaged that neither ECT(0) nor ECT(1)
will ever indicate more severe congestion than CE, even though future will ever indicate more severe congestion than CE, even though future
uses for ECT(0) or ECT(1) are still unclear [RFC8311]. Because the uses for ECT(0) or ECT(1) are still unclear [RFC8311]. Because the
3-bit ACE field is so small, when it is the only field available the 3-bit ACE field is so small, when it is the only field available the
Data Sender has to interpret it conservatively assuming the worst Data Sender has to interpret it assuming the most likely wrap, but
possible wrap. with a degree of conservatism.
Certain specified events trigger the Data Receiver to include an Certain specified events trigger the Data Receiver to include an
AccECN Option on an ACK. The rules are designed to ensure that the AccECN Option on an ACK. The rules are designed to ensure that the
order in which different markings arrive at the receiver is order in which different markings arrive at the receiver is
communicated to the sender (as long as there is no ACK loss). communicated to the sender (as long as options are reaching the
Implementations are encouraged to send an AccECN Option more sender and as long as there is no ACK loss). Implementations are
frequently, but this is left up to the implementer. encouraged to send an AccECN Option more frequently, but this is left
up to the implementer.
2.4. Feedback Metrics 2.4. Feedback Metrics
The CE packet counter in the ACE field and the CE byte counter in the The CE packet counter in the ACE field and the CE byte counter in the
AccECN Option both provide feedback on received CE-marks. The CE AccECN Option both provide feedback on received CE-marks. The CE
packet counter includes control packets that do not have payload packet counter includes control packets that do not have payload
data, while the CE byte counter solely includes marked payload bytes. data, while the CE byte counter solely includes marked payload bytes.
If both are present, the byte counter in the option will provide the If both are present, the byte counter in the option will provide the
more accurate information needed for modern congestion control and more accurate information needed for modern congestion control and
policing schemes, such as L4S, DCTCP or ConEx. If the option is policing schemes, such as L4S, DCTCP or ConEx. If the option is
skipping to change at page 11, line 37 skipping to change at page 11, line 37
private networks (e.g. data centres) set control packets to be ECN private networks (e.g. data centres) set control packets to be ECN
capable because they are precisely the packets that performance capable because they are precisely the packets that performance
depends on most. depends on most.
For this reason, AccECN is designed to be a generic reflector of For this reason, AccECN is designed to be a generic reflector of
whatever ECN markings it sees, whether or not they are compliant with whatever ECN markings it sees, whether or not they are compliant with
a current standard. Then as standards evolve, Data Senders can a current standard. Then as standards evolve, Data Senders can
upgrade unilaterally without any need for receivers to upgrade too. upgrade unilaterally without any need for receivers to upgrade too.
It is also useful to be able to rely on generic reflection behaviour It is also useful to be able to rely on generic reflection behaviour
when senders need to test for unexpected interference with markings when senders need to test for unexpected interference with markings
(for instance [I-D.kuehlewind-tcpm-ecn-fallback] and para 2 of (for instance Section 3.2.2.3, Section 3.2.2.4 and Section 3.2.3.2 of
Section 20.2 of [RFC3168]). the present document, para 2 of Section 20.2 of [RFC3168]) and
[I-D.kuehlewind-tcpm-ecn-fallback].
The initial SYN is the most critical control packet, so AccECN The initial SYN is the most critical control packet, so AccECN
provides feedback on its ECN marking. Although RFC 3168 prohibits an provides feedback on its ECN marking. Although RFC 3168 prohibits an
ECN-capable SYN, providing feedback of ECN marking on the SYN ECN-capable SYN, providing feedback of ECN marking on the SYN
supports future scenarios in which SYNs might be ECN-enabled (without supports future scenarios in which SYNs might be ECN-enabled (without
prejudging whether they ought to be). For instance, [RFC8311] prejudging whether they ought to be). For instance, [RFC8311]
updates this aspect of RFC 3168 to allow experimentation with ECN- updates this aspect of RFC 3168 to allow experimentation with ECN-
capable TCP control packets. capable TCP control packets.
Even if the TCP client (or server) has set the SYN (or SYN/ACK) to Even if the TCP client (or server) has set the SYN (or SYN/ACK) to
skipping to change at page 12, line 26 skipping to change at page 12, line 27
3. AccECN Protocol Specification 3. AccECN Protocol Specification
3.1. Negotiating to use AccECN 3.1. Negotiating to use AccECN
3.1.1. Negotiation during the TCP handshake 3.1.1. Negotiation during the TCP handshake
Given the ECN Nonce [RFC3540] has been reclassified as historic Given the ECN Nonce [RFC3540] has been reclassified as historic
[RFC8311], the present specification re-allocates the TCP flag at bit [RFC8311], the present specification re-allocates the TCP flag at bit
7 of the TCP header, which was previously called NS (Nonce Sum), as 7 of the TCP header, which was previously called NS (Nonce Sum), as
the AE (Accurate ECN) flag (see IANA Considerations in Section 6). the AE (Accurate ECN) flag (see IANA Considerations in Section 6) as
shown below.
  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|               |           | A | C | E | U | A | P | R | S | F |
| Header Length | Reserved  | E | W | C | R | C | S | S | Y | I |
|               |           |   | R | E | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
Figure 2: The (post-AccECN) definition of the TCP header flags during
the TCP handshake
During the TCP handshake at the start of a connection, to request During the TCP handshake at the start of a connection, to request
more accurate ECN feedback the TCP client (host A) MUST set the TCP more accurate ECN feedback the TCP client (host A) MUST set the TCP
flags AE=1, CWR=1 and ECE=1 in the initial SYN segment. flags AE=1, CWR=1 and ECE=1 in the initial SYN segment.
If a TCP server (B) that is AccECN-enabled receives a SYN with the If a TCP server (B) that is AccECN-enabled receives a SYN with the
above three flags set, it MUST set both its half connections into above three flags set, it MUST set both its half connections into
AccECN mode. Then it MUST set the TCP flags on the SYN/ACK to one of AccECN mode. Then it MUST set the TCP flags on the SYN/ACK to one of
the 4 values shown in the top block of Table 2 to confirm that it the 4 values shown in the top block of Table 2 to confirm that it
supports AccECN. The TCP server MUST NOT set one of these 4 supports AccECN. The TCP server MUST NOT set one of these 4
skipping to change at page 13, line 5 skipping to change at page 13, line 16
the SYN/ACK to the value in Table 2 that feeds back the IP-ECN field the SYN/ACK to the value in Table 2 that feeds back the IP-ECN field
that arrived on the SYN. This applies whether or not the server that arrived on the SYN. This applies whether or not the server
itself supports setting the IP-ECN field on a SYN or SYN/ACK (see itself supports setting the IP-ECN field on a SYN or SYN/ACK (see
Section 2.5 for rationale). Section 2.5 for rationale).
Once a TCP client (A) has sent the above SYN to declare that it Once a TCP client (A) has sent the above SYN to declare that it
supports AccECN, and once it has received the above SYN/ACK segment supports AccECN, and once it has received the above SYN/ACK segment
that confirms that the TCP server supports AccECN, the TCP client that confirms that the TCP server supports AccECN, the TCP client
MUST set both its half connections into AccECN mode. MUST set both its half connections into AccECN mode.
Once in AccECN mode, a TCP client or server has the rights and
obligations to participate in the ECN protocol defined in
Section 3.1.5.
The procedure for the client to follow if a SYN/ACK does not arrive The procedure for the client to follow if a SYN/ACK does not arrive
before its retransmission timer expires is given in Section 3.1.3. before its retransmission timer expires is given in Section 3.1.4.
3.1.2. Backward Compatibility
The three flags set to 1 to indicate AccECN support on the SYN have The three flags set to 1 to indicate AccECN support on the SYN have
been carefully chosen to enable natural fall-back to prior stages in been carefully chosen to enable natural fall-back to prior stages in
the evolution of ECN. Table 2 tabulates all the negotiation the evolution of ECN, as above. Table 2 tabulates all the
possibilities for ECN-related capabilities that involve at least one negotiation possibilities for ECN-related capabilities that involve
AccECN-capable host. The entries in the first two columns have been at least one AccECN-capable host. The entries in the first two
abbreviated, as follows: columns have been abbreviated, as follows:
AccECN: More Accurate ECN Feedback (the present specification) AccECN: More Accurate ECN Feedback (the present specification)
Nonce: ECN Nonce feedback [RFC3540] Nonce: ECN Nonce feedback [RFC3540]
ECN: 'Classic' ECN feedback [RFC3168] ECN: 'Classic' ECN feedback [RFC3168]
No ECN: Not-ECN-capable. Implicit congestion notification using No ECN: Not-ECN-capable. Implicit congestion notification using
packet drop. packet drop.
+--------+--------+------------+-------------+----------------------+ +--------+--------+------------+-----------+------------------------+
| A | B | SYN A->B | SYN/ACK | Feedback Mode | | A | B | SYN A->B | SYN/ACK | Feedback Mode |
| | | | B->A | | | | | | B->A | |
+--------+--------+------------+-------------+----------------------+ +--------+--------+------------+-----------+------------------------+
| | | AE CWR ECE | AE CWR ECE | | | | | AE CWR ECE | AE CWR | |
| AccECN | AccECN | 1 1 1 | 0 1 0 | AccECN (Not-ECT on | | | | | ECE | |
| | | | | SYN) | | AccECN | AccECN | 1 1 1 | 0 1 0 | AccECN (no ECT on SYN) |
| AccECN | AccECN | 1 1 1 | 0 1 1 | AccECN (ECT1 on SYN) | | AccECN | AccECN | 1 1 1 | 0 1 1 | AccECN (ECT1 on SYN) |
| AccECN | AccECN | 1 1 1 | 1 0 0 | AccECN (ECT0 on SYN) | | AccECN | AccECN | 1 1 1 | 1 0 0 | AccECN (ECT0 on SYN) |
| AccECN | AccECN | 1 1 1 | 1 1 0 | AccECN (CE on SYN) | | AccECN | AccECN | 1 1 1 | 1 1 0 | AccECN (CE on SYN) |
| | | | | | | | | | | |
| AccECN | Nonce | 1 1 1 | 1 0 1 | classic ECN | | AccECN | Nonce | 1 1 1 | 1 0 1 | (Reserved) |
| AccECN | ECN | 1 1 1 | 0 0 1 | classic ECN | | AccECN | ECN | 1 1 1 | 0 0 1 | classic ECN |
| AccECN | No ECN | 1 1 1 | 0 0 0 | Not ECN | | AccECN | No ECN | 1 1 1 | 0 0 0 | Not ECN |
| | | | | | | | | | | |
| Nonce | AccECN | 0 1 1 | 0 0 1 | classic ECN | | Nonce | AccECN | 0 1 1 | 0 0 1 | classic ECN |
| ECN | AccECN | 0 1 1 | 0 0 1 | classic ECN | | ECN | AccECN | 0 1 1 | 0 0 1 | classic ECN |
| No ECN | AccECN | 0 0 0 | 0 0 0 | Not ECN | | No ECN | AccECN | 0 0 0 | 0 0 0 | Not ECN |
| | | | | | | | | | | |
| AccECN | Broken | 1 1 1 | 1 1 1 | Not ECN | | AccECN | Broken | 1 1 1 | 1 1 1 | Not ECN |
+--------+--------+------------+-------------+----------------------+ +--------+--------+------------+-----------+------------------------+
Table 2: ECN capability negotiation between Client (A) and Server (B) Table 2: ECN capability negotiation between Client (A) and Server (B)
Table 2 is divided into blocks each separated by an empty row. Table 2 is divided into blocks each separated by an empty row.
1. The top block shows the case already described where both 1. The top block shows the case already described in Section 3.1
endpoints support AccECN and how the TCP server (B) indicates where both endpoints support AccECN and how the TCP server (B)
congestion feedback. indicates congestion feedback.
2. The second block shows the cases where the TCP client (A) 2. The second block shows the cases where the TCP client (A)
supports AccECN but the TCP server (B) supports some earlier supports AccECN but the TCP server (B) supports some earlier
variant of TCP feedback, indicated in its SYN/ACK. Therefore, as variant of TCP feedback, indicated in its SYN/ACK. Therefore, as
soon as an AccECN-capable TCP client (A) receives the SYN/ACK soon as an AccECN-capable TCP client (A) receives the SYN/ACK
shown it MUST set both its half connections into the feedback shown it MUST set both its half connections into the feedback
mode shown in the rightmost column. mode shown in the rightmost column. If it has set itself into
classic ECN feedback mode it MUST then comply with [RFC3168].
The server response called 'Nonce' in the table is now historic.
For an AccECN implementation, there is no need to recognize or
support ECN Nonce feedback [RFC3540], which has been reclassified
as historic [RFC8311]. AccECN is compatible with alternative ECN
feedback integrity approaches (see Section 4.3).
3. The third block shows the cases where the TCP server (B) supports 3. The third block shows the cases where the TCP server (B) supports
AccECN but the TCP client (A) supports some earlier variant of AccECN but the TCP client (A) supports some earlier variant of
TCP feedback, indicated in its SYN. Therefore, as soon as an TCP feedback, indicated in its SYN.
AccECN-enabled TCP server (B) receives the SYN shown, it MUST set
both its half connections into the feedback mode shown in the
rightmost column.
4. The fourth block displays a combination labelled `Broken' . Some When an AccECN-enabled TCP server (B) receives a SYN with
AE,CWR,ECE = 0,1,1 it MUST do one of the following:
* set both its half connections into the classic ECN feedback
mode and return a SYN/ACK with AE, CWR, ECE = 0,0,1 as shown.
Then it MUST comply with [RFC3168].
* set both its half-connections into No ECN mode and return a
SYN/ACK with AE,CWR,ECE = 0,0,0, then continue with ECN
disabled. This latter case is unlikely to be desirable, but
it is allowed as a possibility, e.g. for minimal TCP
implementations.
When an AccECN-enabled TCP server (B) receives a SYN with
AE,CWR,ECE = 0,0,0 it MUST set both its half connections into the
Not ECN feedback mode, return a SYN/ACK with AE,CWR,ECE = 0,0,0
as shown and continue with ECN disabled.
4. The fourth block displays a combination labelled `Broken'. Some
older TCP server implementations incorrectly set the reserved older TCP server implementations incorrectly set the reserved
flags in the SYN/ACK by reflecting those in the SYN. Such broken flags in the SYN/ACK by reflecting those in the SYN. Such broken
TCP servers (B) cannot support ECN, so as soon as an AccECN- TCP servers (B) cannot support ECN, so as soon as an AccECN-
capable TCP client (A) receives such a broken SYN/ACK it MUST capable TCP client (A) receives such a broken SYN/ACK it MUST
fall-back to Not ECN mode for both its half connections. fall back to Not ECN mode for both its half connections and
continue with ECN disabled.
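As an illustration only, the client side of Table 2 (as given in the newer column of this diff) can be sketched as a lookup from the (AE, CWR, ECE) flags on the SYN/ACK to the feedback mode the client enters. The function name and the string labels are hypothetical, and the sketch assumes the client sent a SYN with AE=CWR=ECE=1. The text shown here does not say what a client does on the reserved 1 0 1 response, so the sketch only reports it.

```python
# Hypothetical helper: names and return labels are illustrative, not from the draft.
# Keys are (AE, CWR, ECE) as received on the SYN/ACK.
SYNACK_TO_MODE = {
    (0, 1, 0): ("AccECN", "Not-ECT"),   # server saw Not-ECT on the SYN
    (0, 1, 1): ("AccECN", "ECT(1)"),    # server saw ECT(1) on the SYN
    (1, 0, 0): ("AccECN", "ECT(0)"),    # server saw ECT(0) on the SYN
    (1, 1, 0): ("AccECN", "CE"),        # server saw CE on the SYN
    (1, 0, 1): ("reserved", None),      # formerly the ECN Nonce response
    (0, 0, 1): ("classic ECN", None),   # RFC 3168 server
    (0, 0, 0): ("Not ECN", None),       # server without ECN
    (1, 1, 1): ("Not ECN", None),       # 'Broken' server reflecting the SYN flags
}


def client_feedback_mode(ae, cwr, ece):
    """Return (mode, IP-ECN codepoint fed back for the SYN, or None)."""
    return SYNACK_TO_MODE[(ae, cwr, ece)]
```

For example, `client_feedback_mode(1, 1, 1)` yields Not ECN mode, matching the fall-back required for a broken server.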
The following exceptional cases need some explanation:
ECN Nonce: With AccECN implementation, there is no need for the ECN The following additional rules do not fit the structure of the table,
Nonce feedback mode [RFC3540], which has been reclassified as but they complement it:
historic [RFC8311], as AccECN is compatible with an alternative
ECN feedback integrity approach that does not use up the ECT(1)
codepoint and can be implemented solely at the sender (see
Section 4.3).
Simultaneous Open: An originating AccECN Host (A), having sent a SYN Simultaneous Open: An originating AccECN Host (A), having sent a SYN
with AE=1, CWR=1 and ECE=1, might receive another SYN from host B. with AE=1, CWR=1 and ECE=1, might receive another SYN from host B.
Host A MUST then enter the same feedback mode as it would have Host A MUST then enter the same feedback mode as it would have
entered had it been a responding host and received the same SYN. entered had it been a responding host and received the same SYN.
Then host A MUST send the same SYN/ACK as it would have sent had Then host A MUST send the same SYN/ACK as it would have sent had
it been a responding host. it been a responding host.
3.1.2. Forward Compatibility In-window SYN during TIME-WAIT: Many TCP implementations create a
new TCP connection if they receive an in-window SYN packet during
TIME-WAIT state. When a TCP host enters TIME-WAIT or CLOSED
state, it should ignore any previous state about the negotiation
of AccECN for that connection and renegotiate the feedback mode
according to Table 2.
3.1.3. Forward Compatibility
If a TCP server that implements AccECN receives a SYN with the three If a TCP server that implements AccECN receives a SYN with the three
TCP header flags (AE, CWR and ECE) set to any combination other than TCP header flags (AE, CWR and ECE) set to any combination other than
000, 011 or 111, it MUST negotiate the use of AccECN as if they had 000, 011 or 111, it MUST negotiate the use of AccECN as if they had
been set to 111. This ensures that future uses of the other been set to 111. This ensures that future uses of the other
combinations on a SYN can rely on consistent behaviour from the combinations on a SYN can rely on consistent behaviour from the
installed base of AccECN servers. installed base of AccECN servers.
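A minimal sketch of the server side of the negotiation, combining the forward compatibility rule above with the SYN/ACK encodings in Table 2, might look as follows. The function name, the `support_classic` parameter and the string labels are assumptions made for illustration; they do not appear in the draft.

```python
# Hypothetical sketch; identifiers are illustrative, not from the draft.
# (AE, CWR, ECE) to set on the SYN/ACK, keyed by the IP-ECN field seen on the SYN.
ACCECN_SYNACK_FLAGS = {
    "Not-ECT": (0, 1, 0),
    "ECT(1)": (0, 1, 1),
    "ECT(0)": (1, 0, 0),
    "CE": (1, 1, 0),
}


def server_response(syn_flags, syn_ip_ecn, support_classic=True):
    """Return (feedback mode, (AE, CWR, ECE) for the SYN/ACK).

    syn_flags is the (AE, CWR, ECE) tuple received on the SYN.
    """
    if syn_flags == (0, 0, 0):
        return "Not ECN", (0, 0, 0)
    if syn_flags == (0, 1, 1):
        # Classic ECN client: either negotiate RFC 3168 ECN or disable ECN.
        if support_classic:
            return "classic ECN", (0, 0, 1)
        return "Not ECN", (0, 0, 0)
    # 1 1 1, and every other combination, is negotiated as if it were 1 1 1.
    return "AccECN", ACCECN_SYNACK_FLAGS[syn_ip_ecn]
```

Treating all five unassigned combinations like 1 1 1 is what lets a future specification assign them without changing deployed servers.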
For the avoidance of doubt, the negotiation tabulated in Table 2 For the avoidance of doubt, the behaviour described in the present
solely concerns the three TCP header flags shown (AE, CWR and ECE). specification applies whether or not the three remaining reserved TCP
header flags are zero.
An AccECN host (client or server) MUST ignore the three remaining
reserved TCP header flags on all packets.
3.1.3. Retransmission of the SYN 3.1.4. Retransmission of the SYN
If the sender of an AccECN SYN times out before receiving the SYN/ If the sender of an AccECN SYN times out before receiving the SYN/
ACK, the sender SHOULD attempt to negotiate the use of AccECN at ACK, the sender SHOULD attempt to negotiate the use of AccECN at
least one more time by continuing to set all three TCP ECN flags on least one more time by continuing to set all three TCP ECN flags on
the first retransmitted SYN (using the usual retransmission time- the first retransmitted SYN (using the usual retransmission time-
outs). If this first retransmission also fails to be acknowledged, outs). If this first retransmission also fails to be acknowledged,
the sender SHOULD send subsequent retransmissions of the SYN without the sender SHOULD send subsequent retransmissions of the SYN with the
any TCP-ECN flags set. This adds delay, in the case where a three TCP-ECN flags cleared (AE=CWR=ECE=0). A retransmitted SYN MUST
use the same ISN as the original SYN.
Retrying once before fall-back adds delay in the case where a
middlebox drops an AccECN (or ECN) SYN deliberately. However, middlebox drops an AccECN (or ECN) SYN deliberately. However,
current measurements imply that a drop is less likely to be due to current measurements imply that a drop is less likely to be due to
middlebox interference than other intermittent causes of loss, e.g. middlebox interference than other intermittent causes of loss, e.g.
congestion, wireless interference, etc. congestion, wireless interference, etc.
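The default (SHOULD) retransmission behaviour described above can be sketched as a function of the transmission count. This is only an illustration; the function name and the zero-based numbering of transmissions are assumptions, and other fall-back strategies are permitted by the text.

```python
# Hypothetical sketch of the default SYN fall-back schedule.
def syn_ecn_flags(transmission):
    """Return (AE, CWR, ECE) for a SYN.

    transmission 0 is the original SYN, 1 is the first retransmission,
    2 and above are subsequent retransmissions.
    """
    if transmission < 0:
        raise ValueError("transmission count cannot be negative")
    if transmission <= 1:
        return (1, 1, 1)   # keep requesting AccECN at least one more time
    return (0, 0, 0)       # then retry with the three ECN flags cleared
```

Every SYN produced by this schedule would carry the same ISN as the original, as the newer text requires.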
Implementers MAY use other fall-back strategies if they are found to Implementers MAY use other fall-back strategies if they are found to
be more effective (e.g. attempting to negotiate AccECN on the SYN be more effective (e.g. attempting to negotiate AccECN on the SYN
only once or more than twice (most appropriate during high levels of only once or more than twice (most appropriate during high levels of
congestion); or falling back to classic ECN feedback rather than non- congestion). However, other fall-back strategies will need to follow
ECN). Further it may make sense to also remove any other all the rules in Section 3.1.5, which concern behaviour when SYNs or
SYN/ACKs negotiating different types of feedback have been sent
within the same connection.
Further it may make sense to also remove any other new or
experimental fields or options on the SYN in case a middlebox might experimental fields or options on the SYN in case a middlebox might
be blocking them, although the required behaviour will depend on the be blocking them, although the required behaviour will depend on the
specification of the other option(s) and any attempt to co-ordinate specification of the other option(s) and any attempt to co-ordinate
fall-back between different modules of the stack. In any case, the fall-back between different modules of the stack.
TCP initiator SHOULD cache failed connection attempts. If it does,
it SHOULD NOT give up attempting to negotiate AccECN on the SYN of Whichever fall-back strategy is used, the TCP initiator SHOULD cache
subsequent connection attempts until it is clear that the blockage is failed connection attempts. If it does, it SHOULD NOT give up
persistently and specifically due to AccECN. The cache should be attempting to negotiate AccECN on the SYN of subsequent connection
arranged to expire so that the initiator will infrequently attempt to attempts until it is clear that the blockage is persistently and
check whether the problem has been resolved. specifically due to AccECN. The cache should be arranged to expire
so that the initiator will infrequently attempt to check whether the
problem has been resolved.
The fall-back procedure if the TCP server receives no ACK to The fall-back procedure if the TCP server receives no ACK to
acknowledge a SYN/ACK that tried to negotiate AccECN is specified in acknowledge a SYN/ACK that tried to negotiate AccECN is specified in
Section 3.2.7. Section 3.2.3.2.
3.1.5. Implications of AccECN Mode
Section 3.1.1 describes the only ways that a host can enter AccECN
mode, whether as a client or as a server.
As a Data Sender, a host in AccECN mode has the rights and
obligations concerning the use of ECN defined below, which build on
those in [RFC3168] as updated by [RFC8311]:
o Using ECT:
* It can set an ECT codepoint in the IP header of packets to
indicate to the network that the transport is capable and
willing to participate in ECN for this packet.
* It does not have to set ECT on any packet (for instance if it
has reason to believe such a packet would be blocked).
* If for any reason it is not willing to provide ECN feedback on
a particular TCP connection, to indicate this unwillingness it
SHOULD clear the AE, CWR and ECE flags in all SYN and/or SYN/
ACK packets that it sends.
o Switching feedback negotiation (e.g. fall-back):
* It SHOULD NOT set ECT on any packet if it has received at least
one valid SYN or Acceptable SYN/ACK with AE=CWR=ECE=0. A
"valid SYN" has the same port numbers and the same ISN as the
SYN that caused the server to enter AccECN mode.
* It MUST NOT send an ECN-setup SYN [RFC3168] within the same
connection as it has sent a SYN requesting AccECN feedback.
* It MUST NOT send an ECN-setup SYN/ACK [RFC3168] within the same
connection as it has sent a SYN/ACK agreeing to use AccECN
feedback.
The above rules are necessary because, when one peer negotiates
the feedback mode in two different types of handshake, it is not
possible for the other peer to know for certain which handshake
packet(s) the other end eventually receives or in which order it
receives them. So the two peers can end up using different
feedback modes without knowing it.
o Congestion response:
* It is still obliged to respond appropriately to AccECN feedback
with congestion indications on packets it had previously sent,
as defined in Section 6.1 of [RFC3168] and updated by Sections
2.1 and 4.1 of [RFC8311].
* The commitment to respond appropriately to incoming indications
of congestion remains even if it sends a SYN packet with
AE=CWR=ECE=0, in a later transmission within the same TCP
connection.
* Unlike an RFC 3168 data sender, it MUST NOT set CWR to indicate
it has received and responded to indications of congestion (for
the avoidance of doubt, this does not preclude it from setting
the bits of the ACE counter field, which includes an overloaded
use of the same bit).
As a Data Receiver:
o a host in AccECN mode MUST feed back the information in the IP-ECN
field on incoming packets using Accurate ECN feedback, as
specified in Section 3.2 below.
o if it receives an ECN-setup SYN or ECN-setup SYN/ACK [RFC3168]
during the same connection as it receives a SYN requesting AccECN
feedback or a SYN/ACK agreeing to use AccECN feedback, it MUST
reset the connection with a RST packet.
o it MUST NOT use reception of packets with ECT set in the IP-ECN
field as an implicit signal that the peer is ECN-capable. Reason:
ECT at the IP layer does not explicitly confirm the peer has the
correct ECN feedback logic, and the packets could have been
mangled at the IP layer.
3.2. AccECN Feedback 3.2. AccECN Feedback
Each Data Receiver of each half connection maintains four counters, Each Data Receiver of each half connection maintains four counters,
r.cep, r.ceb, r.e0b and r.e1b. The CE packet counter (r.cep), counts r.cep, r.ceb, r.e0b and r.e1b:
the number of packets the host receives with the CE code point in the
IP ECN field, including CE marks on control packets without data.
r.ceb, r.e0b and r.e1b count the number of TCP payload bytes in
packets marked respectively with the CE, ECT(0) and ECT(1) codepoint
in their IP-ECN field. When a host first enters AccECN mode, it
initializes its counters to r.cep = 5, r.e0b = 1 and r.ceb = r.e1b.=
0 (see Appendix A.5). Non-zero initial values are used to support a
stateless handshake (see Section 4.1) and to be distinct from cases
where the fields are incorrectly zeroed (e.g. by middleboxes - see
Section 3.2.7.4).
A host feeds back the CE packet counter using the Accurate ECN (ACE) o The Data Receiver MUST increment the CE packet counter (r.cep),
field, as explained in the next section. And it feeds back all the for every Acceptable packet that it receives with the CE code
byte counters using the AccECN TCP Option, as specified in point in the IP ECN field, including CE marked control packets but
Section 3.2.6. Whenever a host feeds back the value of any counter, excluding CE on SYN packets (SYN=1; ACK=0).
it MUST report the most recent value, no matter whether it is in a
pure ACK, an ACK with new payload data or a retransmission.
Therefore the feedback carried on a retransmitted packet is unlikely
to be the same as the feedback on the original packet.
3.2.1. Initialization of Feedback Counters at the Data Sender o The Data Receiver MUST increment the r.ceb, r.e0b or r.e1b byte
counters by the number of TCP payload octets in Acceptable packets
marked respectively with the CE, ECT(0) and ECT(1) codepoint in
their IP-ECN field, including any payload octets on control
packets, but not including any payload octets on SYN packets
(SYN=1; ACK=0).
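The two counting rules in the newer text can be sketched as below. The class and function names are hypothetical, the caller is assumed to have already checked that the packet is Acceptable, and the counters are kept as unbounded integers here (any truncation to the width of a feedback field is assumed to happen when the value is encoded).

```python
from dataclasses import dataclass


@dataclass
class ReceiverCounters:
    """Hypothetical container for r.cep, r.ceb, r.e0b and r.e1b."""
    cep: int
    ceb: int
    e0b: int
    e1b: int


def on_acceptable_packet(c, ip_ecn, payload_len, syn, ack):
    """Update the counters for one Acceptable packet.

    ip_ecn is one of "Not-ECT", "ECT(0)", "ECT(1)" or "CE".
    """
    if syn and not ack:
        return  # markings on a SYN (SYN=1; ACK=0) are not counted
    if ip_ecn == "CE":
        c.cep += 1              # packets, including control packets
        c.ceb += payload_len    # payload octets only
    elif ip_ecn == "ECT(0)":
        c.e0b += payload_len
    elif ip_ecn == "ECT(1)":
        c.e1b += payload_len
```

A CE-marked pure ACK therefore increments the packet counter but leaves the CE byte counter unchanged, which is the difference between the two metrics noted in Section 2.4.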
Each Data Sender of each half connection maintains four counters, Each Data Sender of each half connection maintains four counters,
s.cep, s.ceb, s.e0b and s.e1b intended to track the equivalent s.cep, s.ceb, s.e0b and s.e1b intended to track the equivalent
counters at the Data Receiver. When a host enters AccECN mode, it counters at the Data Receiver.
initializes them to s.cep = 5, s.e0b = 1 and s.ceb = s.e1b = 0.
If a TCP client (A) in AccECN mode receives a SYN/ACK with CE A Data Receiver feeds back the CE packet counter using the Accurate
feedback, i.e. AE=1, CWR=1, ECE=0, it increments s.cep to 6. ECN (ACE) field, as explained in Section 3.2.2. And it feeds back
Otherwise, for any of the 3 other combinations of the 3 ECN TCP flags all the byte counters using the AccECN TCP Option, as specified in
(the top 3 rows in Table 2), s.cep remains initialized to 5. Section 3.2.3.
Whenever a host feeds back the value of any counter, it MUST report
the most recent value, no matter whether it is in a pure ACK, an ACK
with new payload data or a retransmission. Therefore the feedback
carried on a retransmitted packet is unlikely to be the same as the
feedback on the original packet.
3.2.1. Initialization of Feedback Counters
When a host first enters AccECN mode, in its role as a Data Receiver
it initializes its counters to r.cep = 5 and r.ceb = 0. The initial
values of the other two byte counters depend on the Data Receiver's
choice of the order of fields it will use in the AccECN TCP Option
(see Section 3.2.3). If the field order is 0, it will initialize the
remaining counters to r.e0b = 1 and r.e1b = 0. If the field order is
1, it will initialize them to r.e0b = 0 and r.e1b = 0x800001.
Non-zero initial values are used to support a stateless handshake
(see Section 4.1) and to be distinct from cases where the fields are
incorrectly zeroed (e.g. by middleboxes - see Section 3.2.3.2.4).
When a host enters AccECN mode, in its role as a Data Sender it
initializes its counters to s.cep = 5 and s.ceb = 0. The initial
values of the other two byte counters depend on the peer's choice of
the order of fields it will use in the AccECN TCP Option (see
Section 3.2.3). If the field order is 0, it will initialize the
remaining counters to s.e0b = 1 and s.e1b = 0. If the field order is
1, it will initialize them to s.e0b = 0 and s.e1b = 0x800001.
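The initial values in the newer text are the same for the Data Receiver (r.*) and the Data Sender (s.*), and depend only on the field order of the AccECN TCP Option. A small sketch, in which the function name and the dictionary keys are hypothetical:

```python
# Hypothetical helper returning the initial counter values for either role.
def initial_counters(field_order):
    """Return initial cep, ceb, e0b and e1b for field order 0 or 1."""
    if field_order == 0:
        e0b, e1b = 1, 0
    elif field_order == 1:
        e0b, e1b = 0, 0x800001
    else:
        raise ValueError("field order must be 0 or 1")
    return {"cep": 5, "ceb": 0, "e0b": e0b, "e1b": e1b}
```

In both field orders at least one byte counter and the packet counter start non-zero, so that a field zeroed in transit can be told apart from a genuine initial value.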
3.2.2. The ACE Field 3.2.2. The ACE Field
After AccECN has been negotiated on the SYN and SYN/ACK, both hosts After AccECN has been negotiated on the SYN and SYN/ACK, both hosts
overload the three TCP flags (AE, CWR and ECE) in the main TCP header overload the three TCP flags (AE, CWR and ECE) in the main TCP header
as one 3-bit field. Then the field is given a new name, ACE, as as one 3-bit field. Then the field is given a new name, ACE, as
shown in Figure 2. shown in Figure 3.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| | | | U | A | P | R | S | F | | | | | U | A | P | R | S | F |
| Header Length | Reserved | ACE | R | C | S | S | Y | I | | Header Length | Reserved | ACE | R | C | S | S | Y | I |
| | | | G | K | H | T | N | N | | | | | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
Figure 2: Definition of the ACE field within bytes 13 and 14 of the Figure 3: Definition of the ACE field within bytes 13 and 14 of the
TCP Header (when AccECN has been negotiated and SYN=0). TCP Header (when AccECN has been negotiated and SYN=0).
The original definition of these three flags in the TCP header, The original definition of these three flags in the TCP header,
including the addition of support for the ECN Nonce, is shown for including the addition of support for the ECN Nonce, is shown for
comparison in Figure 1. This specification does not rename these comparison in Figure 1. This specification does not rename these
three TCP flags to ACE unconditionally; it merely overloads them with three TCP flags to ACE unconditionally; it merely overloads them with
another name and definition once an AccECN connection has been another name and definition once an AccECN connection has been
established. established.
A host MUST interpret the AE, CWR and ECE flags as the 3-bit ACE With one exception (Section 3.2.2.1), a host with both of its half-
counter on a segment with the SYN flag cleared (SYN=0) that it sends connections in AccECN mode MUST interpret the AE, CWR and ECE flags
or receives if both of its half-connections are set into AccECN mode as the 3-bit ACE counter on a segment with the SYN flag cleared
having successfully negotiated AccECN (see Section 3.1). A host MUST (SYN=0). On such a packet, a Data Receiver MUST encode the three
NOT interpret the 3 flags as a 3-bit ACE field on any segment with least significant bits of its r.cep counter into the ACE field that
SYN=1 (whether ACK is 0 or 1), or if AccECN negotiation is incomplete it feeds back to the Data Sender. A host MUST NOT interpret the 3
or has not succeeded. flags as a 3-bit ACE field on any segment with SYN=1 (whether ACK is
0 or 1), or if AccECN negotiation is incomplete or has not succeeded.
Both parts of each of these conditions are equally important. For Both parts of each of these conditions are equally important. For
instance, even if AccECN negotiation has been successful, the ACE instance, even if AccECN negotiation has been successful, the ACE
field is not defined on any segments with SYN=1 (e.g. a field is not defined on any segments with SYN=1 (e.g. a
retransmission of an unacknowledged SYN/ACK, or when both ends send retransmission of an unacknowledged SYN/ACK, or when both ends send
SYN/ACKs after AccECN support has been successfully negotiated during SYN/ACKs after AccECN support has been successfully negotiated during
a simultaneous open). a simultaneous open).
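As a sketch of the general rule (ignoring the handshake exception for the ACK of the SYN/ACK), the ACE field carries r.cep modulo 8, with AE as the most significant bit. All function names below are hypothetical. The last helper is deliberately naive: it returns the smallest increase consistent with the field, which is an assumption made for illustration and not the wrap-handling procedure the draft specifies for the Data Sender.

```python
# Hypothetical helpers; not the draft's normative algorithm.
def ace_field(r_cep, syn):
    """Return the 3-bit ACE value a Data Receiver feeds back."""
    if syn:
        raise ValueError("the ACE field is not defined on segments with SYN=1")
    return r_cep & 0b111  # three least significant bits of r.cep


def ace_to_flags(ace):
    """Split a 3-bit ACE value into its (AE, CWR, ECE) bit positions."""
    return ((ace >> 2) & 1, (ace >> 1) & 1, ace & 1)


def naive_cep_delta(ace, s_cep):
    """Smallest number of new CE marks consistent with ace, given s.cep."""
    return (ace - s_cep) % 8
```

With the initial value r.cep = 5, the first feedback is 0b101, that is AE=1, CWR=0, ECE=1. Because the field wraps every 8 CE marks, a sender that loses several ACKs cannot rely on the naive delta alone.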
With only one exception, on any packet with the SYN flag cleared 3.2.2.1. ACE Field on the ACK of the SYN/ACK
(SYN=0), the Data Receiver MUST encode the three least significant
bits of its r.cep counter into the ACE field it feeds back to the
Data Sender.
There is only one exception to this rule: On the final ACK of the A TCP client (A) in AccECN mode MUST feed back which of the 4
3-way handshake (3WHS), a TCP client (A) in AccECN mode MUST use the possible values of the IP-ECN field was on the SYN/ACK by writing it
appropriate values of the ACE field in Table 3 to feed back which of into the ACE field of a pure ACK with no SACK blocks using the binary
the 4 possible values of the IP-ECN field were on the SYN/ACK (the encoding in Table 3 (which is the same as that used on the SYN/ACK in
binary encoding is the same as that used on the SYN/ACK). Table 3 Table 2). This shall be called the handshake encoding of the ACE
shows the meaning of each possible value of the ACE field on the ACK field, and it is the only exception to the rule that the ACE field
of the SYN/ACK and the value that an AccECN server MUST set s.cep to carries the 3 least significant bits of the r.cep counter on packets
as a result. The encoding in Table 3 is solely applicable on a with SYN=0.
packet in the client-server direction with an acknowledgement number
1 greater than the Initial Sequence Number (ISN) that was used by the
server.
+--------------+---------------------------+------------------------+ Normally, a TCP client acknowledges a SYN/ACK with an ACK that
| ACE on ACK | IP-ECN codepoint on | Initial s.cep of | satisfies the above conditions anyway (SYN=0, no data, no SACK
| of SYN/ACK | SYN/ACK inferred by | server in AccECN mode | blocks). If an AccECN TCP client intends to acknowledge the SYN/ACK
| | server | | with a packet that does not satisfy these conditions (e.g. it has
+--------------+---------------------------+------------------------+ data to include on the ACK), it SHOULD first send a pure ACK that
| 0b000 | {Notes 1, 3} | Disable ECN | does satisfy these conditions (see Section 4.2), so that it can feed
| 0b001 | {Notes 2, 3} | 5 | back which of the four values of the IP-ECN field arrived on the SYN/
| 0b010 | Not-ECT | 5 | ACK. A valid exception to this "SHOULD" would be where the
| 0b011 | ECT(1) | 5 | implementation will only be used in an environment where mangling of
| 0b100 | ECT(0) | 5 | the ECN field is unlikely.
| 0b101 | Currently Unused {Note 2} | 5 |
| 0b110 | CE | 6 |
| 0b111 | Currently Unused {Note 2} | 5 |
+--------------+---------------------------+------------------------+
Table 3: Meaning of the ACE field on the ACK of the SYN/ACK +---------------------+---------------------+-----------------------+
| IP-ECN codepoint on | ACE on pure ACK of | r.cep of client in |
| SYN/ACK | SYN/ACK | AccECN mode |
+---------------------+---------------------+-----------------------+
| Not-ECT | 0b010 | 5 |
| ECT(1) | 0b011 | 5 |
| ECT(0) | 0b100 | 5 |
| CE | 0b110 | 6 |
+---------------------+---------------------+-----------------------+
Table 3: The encoding of the ACE field in the ACK of the SYN/ACK to
reflect the SYN/ACK's IP-ECN field
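The client side of Table 3 can be sketched as a simple lookup. This is a minimal illustration, not part of the specification: the function name and the 2-bit constants for the IP-ECN codepoints (taken from the usual IP header encoding) are assumptions made for the example.

```python
# IP-ECN codepoints as the 2-bit values carried in the IP header (assumed here).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11

# Table 3: IP-ECN codepoint seen on the SYN/ACK ->
#          (ACE value for the pure ACK of the SYN/ACK, client r.cep)
HANDSHAKE_ENCODING = {
    NOT_ECT: (0b010, 5),
    ECT1:    (0b011, 5),
    ECT0:    (0b100, 5),
    CE:      (0b110, 6),
}

def client_handshake_ace(ip_ecn_on_synack):
    """Return (ACE field, r.cep) for the pure ACK of the SYN/ACK.

    Hypothetical helper: only valid for a pure ACK with SYN=0 and no
    SACK blocks, as required for the handshake encoding.
    """
    return HANDSHAKE_ENCODING[ip_ecn_on_synack]
```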
When an AccECN server in SYN-RCVD state receives a pure ACK with
SYN=0 and no SACK blocks, instead of treating the ACE field as a
counter, it MUST infer the meaning of each possible value of the ACE
field from Table 4, which also shows the value that an AccECN server
MUST set s.cep to as a result.
Given this encoding of the ACE field on the ACK of a SYN/ACK is
exceptional, an AccECN server using large receive offload (LRO) might
prefer to disable LRO until such an ACK has transitioned it out of
SYN-RCVD state.
+---------------+-----------------------------+---------------------+
| ACE on ACK of | IP-ECN codepoint on SYN/ACK | s.cep of server in |
| SYN/ACK | inferred by server | AccECN mode |
+---------------+-----------------------------+---------------------+
| 0b000 | {Notes 1, 3} | Disable ECN |
| 0b001 | {Notes 2, 3} | 5 |
| 0b010 | Not-ECT | 5 |
| 0b011 | ECT(1) | 5 |
| 0b100 | ECT(0) | 5 |
| 0b101 | Currently Unused {Note 2} | 5 |
| 0b110 | CE | 6 |
| 0b111 | Currently Unused {Note 2} | 5 |
+---------------+-----------------------------+---------------------+
Table 4: Meaning of the ACE field on the ACK of the SYN/ACK
{Note 1}: If the server is in AccECN mode, the value of zero raises {Note 1}: If the server is in AccECN mode, the value of zero raises
suspicion of zeroing of the ACE field on the path (see suspicion of zeroing of the ACE field on the path (see
Section 3.2.3). Section 3.2.2.3).
{Note 2}: If the server is in AccECN mode, these values are Currently {Note 2}: If the server is in AccECN mode, these values are Currently
Unused but the AccECN server's behaviour is still defined for forward Unused but the AccECN server's behaviour is still defined for forward
compatibility. Then the designer of a future protocol can know for compatibility. Then the designer of a future protocol can know for
certain what AccECN servers will do with these codepoints. certain what AccECN servers will do with these codepoints.
{Note 3}: In the case where a server that implements AccECN is also {Note 3}: In the case where a server that implements AccECN is also
using a stateless handshake (termed a SYN cookie) it will not using a stateless handshake (termed a SYN cookie) it will not
remember whether it entered AccECN mode. The values 0b000 or 0b001 remember whether it entered AccECN mode. The values 0b000 or 0b001
will remind it that it did not enter AccECN mode, because AccECN does will remind it that it did not enter AccECN mode, because AccECN does
not use them (see Section 4.1 for details). not use them (see Section 4.1 for details). If a stateless server
that implements AccECN receives either of these two values in the
ACK, its action is implementation-dependent and outside the scope of
this spec. It will certainly not take the action in the third column
because, after it receives either of these values, it is not in
AccECN mode. I.e., it will not disable ECN (at least not just
because ACE is 0b000) and it will not set s.cep.
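The server's interpretation in Table 4 can be sketched as follows. This is an illustrative sketch only, assuming a stateful server that is in AccECN mode and in SYN-RCVD state; per Note 3, a stateless (SYN cookie) server that sees 0b000 or 0b001 is not in AccECN mode and would not use this table. The function name and return shape are assumptions.

```python
# Table 4: ACE on the ACK of the SYN/ACK ->
#          (IP-ECN codepoint inferred by the server, s.cep, disable_ecn)
# None for the codepoint means the table gives no inferred codepoint.
TABLE_4 = {
    0b000: (None,      None, True),   # Note 1: suspected zeroing, disable ECN
    0b001: (None,      5,    False),  # Note 2: currently unused
    0b010: ("Not-ECT", 5,    False),
    0b011: ("ECT(1)",  5,    False),
    0b100: ("ECT(0)",  5,    False),
    0b101: (None,      5,    False),  # Note 2: currently unused
    0b110: ("CE",      6,    False),
    0b111: (None,      5,    False),  # Note 2: currently unused
}

def server_decode_handshake_ace(ace):
    """Return (inferred IP-ECN on SYN/ACK, s.cep, disable_ecn).

    Hypothetical helper for a server in AccECN mode and SYN-RCVD state
    that has received a pure ACK with SYN=0 and no SACK blocks.
    """
    return TABLE_4[ace & 0b111]
```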
If a stateless server that implements AccECN receives either of these 3.2.2.2. Encoding and Decoding Feedback in the ACE Field
two values in the ACK, its action is implementation-dependent and
outside the scope of this spec. It will certainly not take the action
in the third column because, after it receives either of these
values, it is not in AccECN mode. I.e., it will not disable ECN (at
least not just because ACE is 0b000) and it will not set s.cep.
3.2.3. Testing for Zeroing of the ACE Field Whenever the Data Receiver sends an ACK with SYN=0 (with or without
data), unless the handshake encoding in Section 3.2.2.1 applies, the
Data Receiver MUST encode the least significant 3 bits of its r.cep
counter into the ACE field (see Appendix A.2).
Whenever the Data Sender receives an ACK with SYN=0 (with or without
data), it first checks whether it has already been superseded by
another ACK in which case it ignores the ECN feedback. If the ACK
has not been superseded, and if the special handshake encoding in
Section 3.2.2.1 does not apply, the Data Sender decodes the ACE field
as follows (see Appendix A.2 for examples).
o It takes the least significant 3 bits of its local s.cep counter
and subtracts them from the incoming ACE counter to work out the
minimum positive increment it could apply to s.cep (assuming the
ACE field only wrapped at most once).
o It then follows the safety procedures in Section 3.2.2.5.2 to
calculate or estimate how many packets the ACK could have
acknowledged under the prevailing conditions to determine whether
the ACE field might have wrapped more than once.
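The first decoding step above is modulo-8 arithmetic. The sketch below shows the encode and decode steps under the assumption that the ACE field wrapped at most once. The function names are illustrative, and the wrap-detection step of Section 3.2.2.5.2 is deliberately left out.

```python
ACE_BITS = 3
ACE_MASK = (1 << ACE_BITS) - 1   # 0b111

def encode_ace(r_cep):
    """Data Receiver: the 3 least significant bits of r.cep."""
    return r_cep & ACE_MASK

def ace_min_increment(s_cep, ace):
    """Data Sender: smallest non-negative increment to s.cep that is
    consistent with the incoming ACE value (at most one wrap assumed)."""
    return (ace - (s_cep & ACE_MASK)) % (1 << ACE_BITS)
```

For instance, with s.cep = 7 and an incoming ACE of 0b001, the minimum increment is 2, because the counter wrapped from 7 through 0 to 1.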
The encode/decode procedures during the three-way handshake are
exceptions to the general rules given so far, so they are spelled out
step by step below for clarity:
o If a TCP server in AccECN mode receives a CE mark in the IP-ECN
field of a SYN (SYN=1, ACK=0), it MUST NOT increment r.cep (it
remains at its initial value of 5).
Reason: It would be redundant for the server to include CE-marked
SYNs in its r.cep counter, because it already reliably delivers
feedback of any CE marking on the SYN/ACK using the encoding in
Table 2. This also ensures that, when the server starts using the
ACE field, it has not unnecessarily consumed more than one initial
value, given they can be used to negotiate variants of the AccECN
protocol (see Appendix B.3).
o If a TCP client in AccECN mode receives CE feedback in the TCP
flags of a SYN/ACK, it MUST NOT increment s.cep (it remains at its
initial value of 5), so that it stays in step with r.cep on the
server. Nonetheless, the TCP client still triggers the congestion
control actions necessary to respond to the CE feedback.
o If a TCP client in AccECN mode receives a CE mark in the IP-ECN
field of a SYN/ACK, it MUST increment r.cep, but no more than once
no matter how many CE-marked SYN/ACKs it receives (i.e.
incremented from 5 to 6, but no further).
Reason: Incrementing r.cep ensures the client will eventually
deliver any CE marking to the server reliably when it starts using
the ACE field. Even though the client also feeds back any CE
marking on the ACK of the SYN/ACK using the encoding in Table 3,
this ACK is not delivered reliably, so it can be considered as a
timely notification that is redundant but unreliable. The client
does not increment r.cep more than once, because the server can
only increment s.cep once (see next bullet). Also, this limits
the unnecessarily consumed initial values of the ACE field to two.
o If a TCP server in AccECN mode and in SYN-RCVD state receives CE
feedback in the TCP flags of a pure ACK with no SACK blocks, it
MUST increment s.cep (from 5 to 6). The TCP server then triggers
the congestion control actions necessary to respond to the CE
feedback.
Reasoning: The TCP server can only increment s.cep once, because
the first ACK it receives will cause it to transition out of SYN-
RCVD state. The server's congestion response would be no
different even if it could receive feedback of more than one CE-
marked SYN/ACK.
Once the TCP server transitions to ESTABLISHED state, it might
later receive other pure ACK(s) with the handshake encoding in the
ACE field. The conditions for this to occur are quite unusual,
but not impossible, e.g. a SYN/ACK (or ACK of the SYN/ACK) that is
delayed for longer than the server's retransmission timeout; or
packet duplication by the network. Nonetheless, once in the
ESTABLISHED state, the server will consider the ACE field to be
encoded as the normal ACE counter on all packets with SYN=0 (given
it will be following the above rule in this bullet). The server
MAY include a test to avoid this case.
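The handshake rules in the bullets above can be summarised in a small state sketch. This is a simplified illustration under stated assumptions: the class and method names are hypothetical, the server object is assumed to be created at the point it enters SYN-RCVD, and only the CE case of the handshake encoding is handled (the other Table 4 values and the normal ACE counter processing in ESTABLISHED state are omitted).

```python
INITIAL_CEP = 5
ACE_HANDSHAKE_CE = 0b110  # handshake encoding meaning "SYN/ACK was CE-marked"

class ClientCounters:
    def __init__(self):
        self.r_cep = INITIAL_CEP
        self.s_cep = INITIAL_CEP

    def on_synack(self, ip_ecn_is_ce, flags_feed_back_ce):
        # A CE mark on the SYN/ACK increments r.cep, but only once
        # (5 -> 6), however many CE-marked SYN/ACKs arrive.
        if ip_ecn_is_ce and self.r_cep == INITIAL_CEP:
            self.r_cep += 1
        # CE feedback in the SYN/ACK's TCP flags never increments s.cep,
        # but the caller still has to apply a congestion response.
        return flags_feed_back_ce

class ServerCounters:
    def __init__(self):
        self.r_cep = INITIAL_CEP
        self.s_cep = INITIAL_CEP
        self.state = "SYN-RCVD"

    def on_syn(self, ip_ecn_is_ce):
        # A CE mark on the SYN is fed back on the SYN/ACK instead,
        # so r.cep is deliberately left unchanged.
        return None

    def on_pure_ack(self, ace):
        """Return True if a congestion response is needed."""
        if self.state != "SYN-RCVD":
            return False  # normal ACE counter handling is not shown here
        self.state = "ESTABLISHED"
        if ace == ACE_HANDSHAKE_CE:
            self.s_cep += 1  # 5 -> 6, possible only once
            return True
        return False
```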
3.2.2.3. Testing for Zeroing of the ACE Field
Section 3.2.2 required the Data Receiver to initialize the r.cep Section 3.2.2 required the Data Receiver to initialize the r.cep
counter to a non-zero value. Therefore, in either direction the counter to a non-zero value. Therefore, in either direction the
initial value of the ACE field ought to be non-zero. initial value of the ACE counter ought to be non-zero.
If AccECN has been successfully negotiated, the Data Sender SHOULD If AccECN has been successfully negotiated, the Data Sender SHOULD
check the initial value of the ACE field in the first arriving check the value of the ACE counter in the first packet (with or
segment with SYN=0. If the initial value of the ACE field is zero without data) that arrives with SYN=0. If the value of this ACE
(0b000), the Data Sender MUST disable sending ECN-capable packets for field is zero (0b000), the Data Sender disables sending ECN-capable
the remainder of the half-connection by setting the IP/ECN field in packets for the remainder of the half-connection by setting the IP/
all subsequent packets to Not-ECT. ECN field in all subsequent packets to Not-ECT.
For example, the server checks the ACK of the SYN/ACK or the first Usually, the server checks the ACK of the SYN/ACK from the client,
data segment from the client, while the client checks the first data while the client checks the first data segment from the server.
segment from the server. More precisely, the "first segment with However, if reordering occurs, "the first packet ... that arrives"
SYN=0" is defined as: the segment with SYN=0 that i) acknowledges will not necessarily be the same as the first packet in sequence
sequence space at least covering the initial sequence number (ISN) order. The test has been specified loosely like this to simplify
plus 1; and ii) arrives before any other segments with SYN=0 so it is implementation, and because it would not have been any more precise
unlikely to be a retransmission. If no such segment arrives (e.g. to have specified the first packet in sequence order, which would not
because it is lost and the ISN is first acknowledged by a subsequent necessarily be the first ACE counter that the Data Receiver fed back
segment), no test for invalid initialization can be conducted, and anyway, given it might have been a retransmission.
the half-connection will continue in AccECN mode.
The possibility of re-ordering means that there is a small chance
that the ACE field on the first packet to arrive is genuinely zero
(without middlebox interference). This would cause a host to
unnecessarily disable ECN for a half connection. Therefore, in
environments where there is no evidence of the ACE field being
zeroed, implementations can skip this test.
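The zeroing test amounts to a single comparison on the first packet to arrive with SYN=0. The sketch below is illustrative only; the function name and the `skip_test` flag (modelling an environment with no evidence of zeroing) are assumptions.

```python
def may_keep_sending_ect(first_ace, skip_test=False):
    """Return False if the Data Sender should set Not-ECT on all
    subsequent packets of this half-connection.

    first_ace: ACE field of the first packet (with or without data)
               that arrives with SYN=0.
    skip_test: True where the implementation chooses to skip the test.
    """
    if skip_test:
        return True
    # Only an all-zero field is treated as evidence of zeroing; no
    # specific non-zero initial value is required.
    return (first_ace & 0b111) != 0b000
```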
Note that the Data Sender MUST NOT test whether the arriving counter Note that the Data Sender MUST NOT test whether the arriving counter
in the initial ACE field has been initialized to a specific valid in the initial ACE field has been initialized to a specific valid
value - the above check solely tests whether the ACE fields have been value - the above check solely tests whether the ACE fields have been
incorrectly zeroed. This allows hosts to use different initial incorrectly zeroed. This allows hosts to use different initial
values as an additional signalling channel in future. values as an additional signalling channel in future.
3.2.4. Testing for Mangling of the IP/ECN Field 3.2.2.4. Testing for Mangling of the IP/ECN Field
The value of the ACE field on the SYN/ACK indicates the value of the The value of the ACE field on the SYN/ACK indicates the value of the
IP/ECN field when the SYN arrived at the server. The client can IP/ECN field when the SYN arrived at the server. The client can
compare this with how it originally set the IP/ECN field on the SYN. compare this with how it originally set the IP/ECN field on the SYN.
If this comparison implies an unsafe transition (see below) of the If this comparison implies an unsafe transition (see below) of the
IP/ECN field, for the remainder of the connection the client MUST NOT IP/ECN field, for the remainder of the connection the client MUST NOT
send ECN-capable packets, but it MUST continue to feed back any ECN send ECN-capable packets, but it MUST continue to feed back any ECN
markings on arriving packets. markings on arriving packets.
The value of the ACE field on the last ACK of the 3WHS indicates the The value of the ACE field on the last ACK of the 3WHS indicates the
skipping to change at page 20, line 8 skipping to change at page 26, line 5
This scenario could well happen where an ECN-enabled home router This scenario could well happen where an ECN-enabled home router
congests its upstream mobile broadband bottleneck link, then the congests its upstream mobile broadband bottleneck link, then the
ingress to the mobile network clears the ECN field [Mandalari18]. ingress to the mobile network clears the ECN field [Mandalari18].
The above fall-back behaviours are necessary in case mangling of the The above fall-back behaviours are necessary in case mangling of the
IP/ECN field is asymmetric, which is currently common over some IP/ECN field is asymmetric, which is currently common over some
mobile networks [Mandalari18]. Then one end might see no unsafe mobile networks [Mandalari18]. Then one end might see no unsafe
transition and continue sending ECN-capable packets, while the other transition and continue sending ECN-capable packets, while the other
end sees an unsafe transition and stops sending ECN-capable packets. end sees an unsafe transition and stops sending ECN-capable packets.
3.2.5. Safety against Ambiguity of the ACE Field 3.2.2.5. Safety against Ambiguity of the ACE Field
If too many CE-marked segments are acknowledged at once, or if a long If too many CE-marked segments are acknowledged at once, or if a long
run of ACKs is lost, the 3-bit counter in the ACE field might have run of ACKs is lost or thinned out, the 3-bit counter in the ACE
cycled between two ACKs arriving at the Data Sender. field might have cycled between two ACKs arriving at the Data Sender.
The following safety procedures minimize this ambiguity.
Therefore an AccECN Data Receiver SHOULD immediately send an ACK once 3.2.2.5.1. Data Receiver Safety Procedures
'n' CE marks have arrived since the previous ACK, where 'n' SHOULD be
2 and MUST be no greater than 6. An AccECN Data Receiver:
o SHOULD immediately send an ACK whenever a data packet marked CE
arrives after the previous data packet was not CE.
o MUST immediately send an ACK once 'n' CE marks have arrived since
the previous ACK, where 'n' SHOULD be 2 and MUST be no greater
than 6.
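The two ACK-triggering rules above could be combined as in the following sketch. It rests on assumptions that the text does not settle: the class and method names are hypothetical, the state before the first data packet is treated as "not CE", and only CE marks on data packets are counted. Any ACK sent for other reasons (e.g. a delayed-ACK timer) would also need to call `ack_sent()`.

```python
class CeAckTrigger:
    def __init__(self, n=2):
        if not 1 <= n <= 6:
            raise ValueError("'n' MUST be no greater than 6")
        self.n = n
        self.ce_since_ack = 0     # CE marks since the previous ACK
        self.prev_was_ce = False  # assumption: no CE before the first packet

    def ack_sent(self):
        self.ce_since_ack = 0

    def on_data_packet(self, is_ce):
        """Return True if an ACK should be sent immediately."""
        if is_ce:
            self.ce_since_ack += 1
        # Change-triggered ACK: CE arrives after a non-CE data packet.
        changed_to_ce = is_ce and not self.prev_was_ce
        self.prev_was_ce = is_ce
        if changed_to_ce or self.ce_since_ack >= self.n:
            self.ack_sent()
            return True
        return False
```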
These rules for when to send an ACK are designed to be complemented
by those in Section 3.2.3.3, which concern whether the AccECN TCP
Option ought to be included on ACKs.
For the avoidance of doubt, the change-triggered ACK mechanism is
deliberately worded to solely apply to data packets, and to ignore
the arrival of a control packet with no payload, because it is
important that TCP does not acknowledge pure ACKs. The change-
triggered ACK approach can lead to some additional ACKs but it feeds
back the timing and the order in which ECN marks are received with
minimal additional complexity. If CE marks are infrequent, or
there are multiple marks in a row, the additional load will be low.
Other marking patterns could increase the load significantly.
Investigating the additional load is a goal of the proposed
experiment.
Even though the first bullet is stated as a "SHOULD", it is important
for a transition to immediately trigger an ACK if at all possible, so
that the Data Sender can rely on change-triggered ACKs to detect
queue growth as soon as possible, e.g. at the start of a flow. This
requirement can only be relaxed if certain offload hardware needed
for high performance cannot support change-triggered ACKs (although
high performance protocols such as DCTCP already successfully use
change-triggered ACKs). One possible experimental compromise would
be for the receiver to heuristically detect whether the sender is in
slow-start, then to implement change-triggered ACKs while the sender
is in slow-start, and offload otherwise.
3.2.2.5.2. Data Sender Safety Procedures
If the Data Sender has not received AccECN TCP Options to give it If the Data Sender has not received AccECN TCP Options to give it
more dependable information, and it detects that the ACE field could more dependable information, and it detects that the ACE field could
have cycled under the prevailing conditions, it SHOULD conservatively have cycled, it SHOULD deem whether it cycled by taking the safest
assume that the counter did cycle. It can detect if the counter likely case under the prevailing conditions. It can detect if the
could have cycled by using the jump in the acknowledgement number counter could have cycled by using the jump in the acknowledgement
since the last ACK to calculate or estimate how many segments could number since the last ACK to calculate or estimate how many segments
have been acknowledged. An example algorithm to implement this could have been acknowledged. An example algorithm to implement this
policy is given in Appendix A.2. An implementer MAY develop an policy is given in Appendix A.2. An implementer MAY develop an
alternative algorithm as long as it satisfies these requirements. alternative algorithm as long as it satisfies these requirements.
If missing acknowledgement numbers arrive later (reordering) and If missing acknowledgement numbers arrive later (reordering) and
prove that the counter did not cycle, the Data Sender MAY attempt to prove that the counter did not cycle, the Data Sender MAY attempt to
neutralise the effect of any action it took based on a conservative neutralize the effect of any action it took based on a conservative
assumption that it later found to be incorrect. assumption that it later found to be incorrect.
3.2.6. The AccECN Option The Data Sender can estimate how many packets (of any marking) an ACK
acknowledges. If the ACE counter on an ACK seems to imply that the
minimum number of newly CE-marked packets is greater than the number
of newly acknowledged packets, the Data Sender SHOULD believe the ACE
counter, unless it can be sure that it is counting all control
packets correctly.
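One conservative reading of these sender-side procedures is sketched below. It is not the Appendix A.2 algorithm, which is not reproduced in this section. It assumes the sender already has an estimate of the number of newly acknowledged packets, and it takes "safest" to mean the largest CE count that does not exceed that estimate and is still consistent with the ACE field modulo 8. The function and parameter names are illustrative.

```python
ACE_MODULUS = 8

def safer_cep_increment(d_cep, newly_acked_pkt):
    """Estimate the increment to apply to s.cep.

    d_cep:           minimum increment implied by the ACE field (0..7)
    newly_acked_pkt: estimated number of packets this ACK acknowledges
    """
    if d_cep > newly_acked_pkt:
        # The ACE counter implies more CE marks than packets acknowledged:
        # believe the ACE counter, since control packets may be uncounted.
        return d_cep
    # Largest value <= newly_acked_pkt that is congruent to d_cep mod 8,
    # i.e. assume the counter wrapped as often as it possibly could have.
    return newly_acked_pkt - ((newly_acked_pkt - d_cep) % ACE_MODULUS)
```

With this reading, an ACK that covers 10 packets and shows a minimum increment of 1 is treated as reporting 9 CE marks, whereas an ACK that covers 5 packets with a minimum increment of 3 is treated as reporting 3.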
The AccECN Option is defined as shown below in Figure 3. It consists 3.2.3. The AccECN Option
of three 24-bit fields that provide the 24 least significant bits of
the r.e0b, r.ceb and r.e1b counters, respectively. The initial 'E' The AccECN Option is defined as shown in Figure 4. The initial 'E'
of each field name stands for 'Echo'. of each field name stands for 'Echo'.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Kind = TBD1 | Length = 11 | EE0B field | | Kind = TBD1 | Length = 11 | EE0B field |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| EE0B (cont'd) | ECEB field | | EE0B (cont'd) | ECEB field |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| EE1B field | | EE1B field | Order 0
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 3: The AccECN Option 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Kind = TBD1 | Length = 11 | EE1B field |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| EE1B (cont'd) | ECEB field |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| EE0B field | Order 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 4: The AccECN TCP Option
When a Data Receiver sends an AccECN Option, it MUST set the Kind When a Data Receiver sends an AccECN Option, it MUST set the Kind
field to TBD1, which is registered in Section 6 as a new TCP option field to TBD1, which is registered in Section 6 as a new TCP option
Kind called AccECN. An experimental TCP option with Kind=254 MAY be Kind called AccECN. An experimental TCP option with Kind=254 MAY be
used for initial experiments, with magic number 0xACCE. used for initial experiments, with magic number 0xACCE.
Appendix A.1 gives an example algorithm for the Data Receiver to Figure 4 shows two option field orders; order 0 and order 1. They
encode its byte counters into the AccECN Option, and for the Data both consist of three 24-bit fields. Order 0 provides the 24 least
Sender to decode the AccECN Option fields into its byte counters. significant bits of the r.e0b, r.ceb and r.e1b counters,
respectively. Order 1 provides the same fields, but in the opposite
order. Each half-connection can use a different field order, but a
Data Receiver MUST consistently send the same field order within the
same half-connection.
The field order to use for each half-connection is up to the Data
Receiver implementation. It might use the same hard-coded order for
all half-connections, or it might make a different choice for each
half-connection. For instance, the implementation of a Data Receiver
might default to using order 0, unless the ECN field in the IP header
of the packet it received during the 3WHS is ECT(1). A Data Receiver
just starts using its chosen field order and the field immediately
after the length field in the first AccECN TCP Option of a half-
connection will intrinsically indicate which order it is using,
because the initial counter values that it is required to use depend
on its chosen field order (see Section 3.2.1).
A Data Sender can know which field order the Data Receiver is using
for a half-connection from the most significant bit (MSB) of the
counter in the field immediately after the length field in the first
non-empty AccECN TCP Option to arrive. If this MSB = 0, field order
0 is being used, and if MSB = 1, field order 1 is being used. Note
that the Data Sender only tests the most significant bit, not the
value of the whole field, because the counters in the first packet to
arrive might have started to increment (e.g. if the first packet to
arrive is not the first packet sent due to loss or reordering).
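The field-order test reduces to inspecting one bit. The sketch below assumes the first 24-bit field has already been extracted from the first non-empty AccECN TCP Option as an integer; the function name is illustrative, and the sample values in the usage note are arbitrary rather than the initial counter values of Section 3.2.1.

```python
def field_order_from_first_field(first_field):
    """Return 0 or 1: the field order used by the Data Receiver.

    first_field: the 24-bit counter immediately after the Length field.
    Only the most significant bit is tested, because the counter may
    already have incremented by the time the first option arrives.
    """
    return (first_field >> 23) & 1
```

For example, `field_order_from_first_field(0x000123)` yields 0 and `field_order_from_first_field(0x800123)` yields 1.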
Note that there is no field to feed back Not-ECT bytes. Nonetheless Note that there is no field to feed back Not-ECT bytes. Nonetheless
an algorithm for the Data Sender to calculate the number of payload an algorithm for the Data Sender to calculate the number of payload
bytes received as Not-ECT is given in Appendix A.5. bytes received as Not-ECT is given in Appendix A.5.
Whenever a Data Receiver sends an AccECN Option, the rules in Whenever a Data Receiver sends an AccECN Option, the rules in
Section 3.2.8 expect it to always send a full-length option. To cope Section 3.2.3.3 expect it to usually send a full-length option. To
with option space limitations, it can omit unchanged fields from the cope with option space limitations, it can omit unchanged fields from
tail of the option, as long as it preserves the order of the the tail of the option, as long as it preserves the order of the
remaining fields and includes any field that has changed. The length remaining fields and includes any field that has changed. The length
field MUST indicate which fields are present as follows: field MUST indicate which fields are present as follows:
Length=11: EE0B, ECEB, EE1B +--------+------------------+------------------+
| Length | Order 0 | Order 1 |
Length=8: EE0B, ECEB +--------+------------------+------------------+
| 11 | EE0B, ECEB, EE1B | EE1B, ECEB, EE0B |
Length=5: EE0B | 8 | EE0B, ECEB | EE1B, ECEB |
| 5 | EE0B | EE1B |
Length=2: (empty) | 2 | (empty) | (empty) |
+--------+------------------+------------------+
The empty option of Length=2 is provided to allow for a case where an The empty option of Length=2 is provided to allow for a case where an
AccECN Option has to be sent (e.g. on the SYN/ACK to test the path), AccECN Option has to be sent (e.g. on the SYN/ACK to test the path),
but there is very limited space for the option. For initial but there is very limited space for the option. For initial
experiments, the Length field MUST be 2 greater to accommodate the experiments, the Length field MUST be 2 greater to accommodate the
16-bit magic number. 16-bit magic number.
All implementations of a Data Sender that read any AccECN Option MUST All implementations of a Data Sender that read any AccECN Option MUST
be able to read in AccECN Options of any of the above lengths. If be able to read in AccECN Options of any of the above lengths. For
the AccECN Option is of any other length, implementations MUST use forward compatibility, if the AccECN Option is of any other length,
those whole 3 octet fields that fit within the length and ignore the implementations MUST use those whole 3 octet fields that fit within
remainder of the option. the length and ignore the remainder of the option.
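The length rules, including the forward-compatibility rule for unexpected lengths, can be sketched as a small parser. This is an illustration under assumptions: the function name is hypothetical, the input is the option body after the Kind and Length octets (so its size is Length minus 2), fields are taken to be in network byte order, the field order is already known, and the experimental variant with a 16-bit magic number is not handled.

```python
FIELDS_BY_ORDER = {
    0: ("EE0B", "ECEB", "EE1B"),
    1: ("EE1B", "ECEB", "EE0B"),
}

def parse_accecn_fields(body, order):
    """Return a dict of the 24-bit fields present in the option body.

    Only whole 3-octet fields that fit are used (at most three); any
    remaining octets are ignored.
    """
    names = FIELDS_BY_ORDER[order]
    count = min(len(body) // 3, len(names))
    return {
        names[i]: int.from_bytes(body[3 * i:3 * i + 3], "big")
        for i in range(count)
    }
```

A Length=8 option in order 1 therefore yields EE1B and ECEB only, and an empty option (Length=2) yields no fields.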
The AccECN Option has to be optional to implement, because both The AccECN Option has to be optional to implement, because both
sender and receiver have to be able to cope without the option anyway sender and receiver have to be able to cope without the option anyway
- in cases where it does not traverse a network path. It is - in cases where it does not traverse a network path. It is
RECOMMENDED to implement both sending and receiving of the AccECN RECOMMENDED to implement both sending and receiving of the AccECN
Option. If sending of the AccECN Option is implemented, the fall- Option. If sending of the AccECN Option is implemented, the fall-
backs described in this document will need to be implemented as well backs described in this document will need to be implemented as well
(unless solely for a controlled environment where path traversal is (unless solely for a controlled environment where path traversal is
not considered a problem). Even if a developer does not implement not considered a problem). Even if a developer does not implement
sending of the AccECN Option, it is RECOMMENDED that they still sending of the AccECN Option, it is RECOMMENDED that they still
implement logic to receive and understand any AccECN Options sent by implement logic to receive and understand any AccECN Options sent by
remote peers. remote peers.
If a Data Receiver intends to send the AccECN Option at any time If a Data Receiver intends to send the AccECN Option at any time
during the rest of the connection it is strongly recommended to also during the rest of the connection it is strongly recommended to also
test path traversal of the AccECN Option as specified in the next test path traversal of the AccECN Option as specified in
section. Section 3.2.3.2.
3.2.7. Path Traversal of the AccECN Option 3.2.3.1. Encoding and Decoding Feedback in the AccECN Option Fields
3.2.7.1. Testing the AccECN Option during the Handshake Whenever the Data Receiver includes any of the counter fields (ECEB,
EE0B, EE1B) in an AccECN Option, it MUST encode the 24 least
significant bits of the current value of the associated counter into
the field (respectively r.ceb, r.e0b, r.e1b).
The TCP client MUST NOT include the AccECN TCP Option on the SYN. A Whenever the Data Sender receives an ACK carrying an AccECN Option, it
first checks whether the ACK has already been superseded by another
ACK in which case it ignores the ECN feedback. If the ACK has not
been superseded, the Data Sender MUST decode the fields in the AccECN
Option as follows. For each field, it takes the least significant 24
bits of its associated local counter (s.ceb, s.e0b or s.e1b) and
subtracts them from the counter in the associated field of the
incoming AccECN Option (respectively ECEB, EE0B, EE1B), to work out
the minimum positive increment it could apply to s.ceb, s.e0b or
s.e1b (assuming the field in the option only wrapped at most once).
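The byte-counter encoding and decoding described above is the same modular arithmetic as for the ACE field, but over 24 bits. The sketch below assumes each field wrapped at most once and that the ACK has not been superseded; the function names are illustrative and this is not the Appendix A.1 algorithm.

```python
FIELD_BITS = 24
FIELD_MOD = 1 << FIELD_BITS
FIELD_MASK = FIELD_MOD - 1

def encode_field(r_counter):
    """Data Receiver: 24 least significant bits of r.ceb, r.e0b or r.e1b."""
    return r_counter & FIELD_MASK

def decode_increment(s_counter, field):
    """Data Sender: smallest non-negative increment to s.ceb, s.e0b or
    s.e1b that is consistent with the incoming 24-bit field."""
    return (field - (s_counter & FIELD_MASK)) % FIELD_MOD
```

For example, if the local counter's low 24 bits are 0xFFFFF0 and the incoming field is 0x000010, the increment is 32 bytes, because the field wrapped once.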
Appendix A.1 gives an example algorithm for the Data Receiver to
encode its byte counters into the AccECN Option, and for the Data
Sender to decode the AccECN Option fields into its byte counters.
Note that, as specified in Section 3.2, any data on the SYN (SYN=1,
ACK=0) is not included in any of the locally held octet counters nor
in the AccECN Option on the wire.
3.2.3.2. Path Traversal of the AccECN Option
3.2.3.2.1. Testing the AccECN Option during the Handshake
The TCP client MUST NOT include the AccECN TCP Option on the SYN. (A
fall-back strategy for the loss of the SYN (possibly due to middlebox fall-back strategy for the loss of the SYN (possibly due to middlebox
interference) is specified in Section 3.1.3. interference) is specified in Section 3.1.4.)
A TCP server that confirms its support for AccECN (in response to an A TCP server that confirms its support for AccECN (in response to an
AccECN SYN from the client as described in Section 3.1) SHOULD AccECN SYN from the client as described in Section 3.1) SHOULD
include an AccECN TCP Option in the SYN/ACK. include an AccECN TCP Option on the SYN/ACK.
A TCP client that has successfully negotiated AccECN SHOULD include A TCP client that has successfully negotiated AccECN SHOULD include
an AccECN Option in the first ACK at the end of the 3WHS. However, an AccECN Option in the first ACK at the end of the 3WHS. However,
this first ACK is not delivered reliably, so the TCP client SHOULD this first ACK is not delivered reliably, so the TCP client SHOULD
also include an AccECN Option on the first data segment it sends (if also include an AccECN Option on the first data segment it sends (if
it ever sends one). it ever sends one).
A host MAY omit the AccECN Option in any of these three cases A host MAY omit the AccECN Option in any of these three cases
if it has cached knowledge that the packet would be likely to be if it has cached knowledge that the packet would be likely to be
blocked on the path to the other host if it included an AccECN blocked on the path to the other host if it included an AccECN
Option. Option.
3.2.7.2. Testing for Loss of Packets Carrying the AccECN Option 3.2.3.2.2. Testing for Loss of Packets Carrying the AccECN Option
If after the normal TCP timeout the TCP server has not received an If after the normal TCP timeout the TCP server has not received an
ACK to acknowledge its SYN/ACK, the SYN/ACK might just have been ACK to acknowledge its SYN/ACK, the SYN/ACK might just have been
lost, e.g. due to congestion, or a middlebox might be blocking the lost, e.g. due to congestion, or a middlebox might be blocking the
AccECN Option. To expedite connection setup, the TCP server SHOULD AccECN Option. To expedite connection setup, the TCP server SHOULD
retransmit the SYN/ACK repeating the AE, CWR and ECE TCP flags on the retransmit the SYN/ACK repeating the same AE, CWR and ECE TCP flags
original SYN/ACK but with no AccECN Option. If this retransmission as on the original SYN/ACK but with no AccECN Option. If this
times out, to expedite connection setup, the TCP server SHOULD retransmission times out, to expedite connection setup, the TCP
disable AccECN and ECN for this connection by retransmitting the SYN/ server SHOULD disable AccECN and ECN for this connection by
ACK with AE=CWR=ECE=0 and no AccECN Option. Implementers MAY use retransmitting the SYN/ACK with AE=CWR=ECE=0 and no AccECN Option.
other fall-back strategies if they are found to be more effective
(e.g. falling back to classic ECN feedback on the first Implementers MAY use other fall-back strategies if they are found to
retransmission; retrying the AccECN Option for a second time before be more effective (e.g. retrying the AccECN Option for a second time
fall-back (most appropriate during high levels of congestion); or before fall-back - most appropriate during high levels of
falling back to classic ECN feedback rather than non-ECN on the third congestion). However, other fall-back strategies will need to follow
retransmission). all the rules in Section 3.1.5, which concern behaviour when SYNs or
SYN/ACKs negotiating different types of feedback have been sent
within the same connection.
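The default server fall-back sequence described above could be sketched as follows. This is a minimal Python illustration, not a normative algorithm. The SynAck type, the function name synack_for_attempt and the attempt numbering are hypothetical, and the flag values in the usage note are purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynAck:
    ae: int                # AE TCP flag
    cwr: int               # CWR TCP flag
    ece: int               # ECE TCP flag
    accecn_option: bool    # whether the AccECN Option is included


def synack_for_attempt(attempt: int, original: SynAck) -> SynAck:
    """Return the SYN/ACK to send on a given attempt.

    attempt 0 is the original SYN/ACK, 1 is the first retransmission
    after a timeout, and 2 or more are later retransmissions.
    """
    if attempt == 0:
        return original
    if attempt == 1:
        # Same AE, CWR and ECE flags as the original, but no AccECN Option.
        return SynAck(original.ae, original.cwr, original.ece, False)
    # Disable AccECN and ECN for this connection.
    return SynAck(0, 0, 0, False)
```

With an original SYN/ACK of SynAck(0, 1, 0, True), attempt 1 yields SynAck(0, 1, 0, False) and attempt 2 yields SynAck(0, 0, 0, False). An implementation that chooses a different strategy would replace the body of this function, subject to the rules on mixed feedback types within one connection.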
If the TCP client detects that the first data segment it sent with If the TCP client detects that the first data segment it sent with
the AccECN Option was lost, it SHOULD fall back to no AccECN Option the AccECN Option was lost, it SHOULD fall back to no AccECN Option
on the retransmission. Again, implementers MAY use other fall-back on the retransmission. Again, implementers MAY use other fall-back
strategies such as attempting to retransmit a second segment with the strategies such as attempting to retransmit a second segment with the
AccECN Option before fall-back, and/or caching whether the AccECN AccECN Option before fall-back, and/or caching whether the AccECN
Option is blocked for subsequent connections. Option is blocked for subsequent connections.
[I-D.ietf-tcpm-2140bis] further discusses caching of TCP parameters
and status information.
If a host falls back to not sending the AccECN Option, it will
continue to process any incoming AccECN Options as normal.
Either host MAY include the AccECN Option in a subsequent segment to Either host MAY include the AccECN Option in a subsequent segment to
retest whether the AccECN Option can traverse the path. retest whether the AccECN Option can traverse the path.
If the TCP server receives a second SYN with a request for AccECN If the TCP server receives a second SYN with a request for AccECN
support, it should resend the SYN/ACK, again confirming its support support, it should resend the SYN/ACK, again confirming its support
for AccECN, but this time without the AccECN Option. This approach for AccECN, but this time without the AccECN Option. This approach
rules out any interference by middleboxes that may drop packets with rules out any interference by middleboxes that may drop packets with
unknown options, even though it is more likely that the SYN/ACK would unknown options, even though it is more likely that the SYN/ACK would
have been lost due to congestion. The TCP server MAY try to send have been lost due to congestion. The TCP server MAY try to send
another packet with the AccECN Option at a later point during the another packet with the AccECN Option at a later point during the
connection but should monitor if that packet got lost as well, in connection but should monitor if that packet got lost as well, in
which case it SHOULD disable the sending of the AccECN Option for which case it SHOULD disable the sending of the AccECN Option for
this half-connection. this half-connection.
Similarly, an AccECN end-point MAY separately memorize which data Similarly, an AccECN end-point MAY separately memorize which data
packets carried an AccECN Option and disable the sending of AccECN packets carried an AccECN Option and disable the sending of AccECN
Options if the loss probability of those packets is significantly Options if the loss probability of those packets is significantly
higher than that of all other data packets in the same connection. higher than that of all other data packets in the same connection.
3.2.7.3. Testing for Absence of the AccECN Option 3.2.3.2.3. Testing for Absence of the AccECN Option
If the TCP client has successfully negotiated AccECN but does not If the TCP client has successfully negotiated AccECN but does not
receive an AccECN Option on the SYN/ACK (e.g. because it has been receive an AccECN Option on the SYN/ACK (e.g. because it has been
stripped by a middlebox or not sent by the server), the client stripped by a middlebox or not sent by the server), the client
switches into a mode that assumes that the AccECN Option is not switches into a mode that assumes that the AccECN Option is not
available for this half connection. available for this half connection.
Similarly, if the TCP server has successfully negotiated AccECN but Similarly, if the TCP server has successfully negotiated AccECN but
does not receive an AccECN Option on the first segment that does not receive an AccECN Option on the first segment that
acknowledges sequence space at least covering the ISN, it switches acknowledges sequence space at least covering the ISN, it switches
into a mode that assumes that the AccECN Option is not available for into a mode that assumes that the AccECN Option is not available for
this half connection. this half connection.
While a host is in this mode that assumes incoming AccECN Options are While a host is in this mode that assumes incoming AccECN Options are
not available, it MUST adopt the conservative interpretation of the not available, it MUST adopt the conservative interpretation of the
ACE field discussed in Section 3.2.5. However, it cannot make any ACE field discussed in Section 3.2.2.5. However, it cannot make any
assumption about support of outgoing AccECN Options on the other half assumption about support of outgoing AccECN Options on the other half
connection, so it SHOULD continue to send the AccECN Option itself connection, so it SHOULD continue to send the AccECN Option itself
(unless it has established that sending the AccECN Option is causing (unless it has established that sending the AccECN Option is causing
packets to be blocked as in Section 3.2.7.2). packets to be blocked as in Section 3.2.3.2.2).
If a host is in the mode that assumes incoming AccECN Options are not If a host is in the mode that assumes incoming AccECN Options are not
available, but it receives an AccECN Option at any later point during available, but it receives an AccECN Option at any later point during
the connection, this clearly indicates that the AccECN Option is not the connection, this clearly indicates that the AccECN Option is not
blocked on the respective path, and the AccECN endpoint MAY switch blocked on the respective path, and the AccECN endpoint MAY switch
out of the mode that assumes the AccECN Option is not available for out of the mode that assumes the AccECN Option is not available for
this half connection. this half connection.
3.2.7.4. Test for Zeroing of the AccECN Option 3.2.3.2.4. Test for Zeroing of the AccECN Option
For a related test for invalid initialization of the ACE field, see For a related test for invalid initialization of the ACE field, see
Section 3.2.3. Section 3.2.2.3.
Section 3.2 required the Data Receiver to initialize the r.e0b Section 3.2 required the Data Receiver to initialize the r.e0b
counter to a non-zero value. Therefore, in either direction the counter to a non-zero value. Therefore, in either direction the
initial value of the EE0B field in the AccECN Option (if one exists) initial value of the EE0B field in the AccECN Option (if one exists)
ought to be non-zero. If AccECN has been negotiated: ought to be non-zero. If AccECN has been negotiated:
o the TCP server MAY check the initial value of the EE0B field in o the TCP server MAY check the initial value of the EE0B field in
the first segment that acknowledges sequence space that at least the first segment that acknowledges sequence space that at least
covers the ISN plus 1. If the initial value of the EE0B field is covers the ISN plus 1. If the initial value of the EE0B field is
zero, the server will switch into a mode that ignores the AccECN zero, the server will switch into a mode that ignores the AccECN
Option for this half connection. Option for this half connection.
o the TCP client MAY check the initial value of the EE0B field on o the TCP client MAY check the initial value of the EE0B field on
the SYN/ACK. If the initial value of the EE0B field is zero, the the SYN/ACK. If the initial value of the EE0B field is zero, the
client will switch into a mode that ignores the AccECN Option for client will switch into a mode that ignores the AccECN Option for
this half connection. this half connection.
While a host is in the mode that ignores the AccECN Option it MUST While a host is in the mode that ignores the AccECN Option it MUST
adopt the conservative interpretation of the ACE field discussed in adopt the conservative interpretation of the ACE field discussed in
Section 3.2.5. Section 3.2.2.5.
Note that the Data Sender MUST NOT test whether the arriving byte Note that the Data Sender MUST NOT test whether the arriving byte
counters in the initial AccECN Option have been initialized to counters in the initial AccECN Option have been initialized to
specific valid values - the above checks solely test whether these specific valid values - the above checks solely test whether these
fields have been incorrectly zeroed. This allows hosts to use fields have been incorrectly zeroed. This allows hosts to use
different initial values as an additional signalling channel in different initial values as an additional signalling channel in
future. Also note that the initial value of either field might be future. Also note that the initial value of either field might be
greater than its expected initial value, because the counters might greater than its expected initial value, because the counters might
already have been incremented. Nonetheless, the initial values of already have been incremented. Nonetheless, the initial values of
the counters have been chosen so that they cannot wrap to zero on the counters have been chosen so that they cannot wrap to zero on
these initial segments. these initial segments.
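The optional zeroing test above amounts to a single comparison. The following Python sketch is one possible reading. The function name and the use of None to represent "no AccECN Option on the initial segment" are assumptions of the sketch.

```python
from typing import Optional


def should_ignore_accecn_option(initial_ee0b: Optional[int]) -> bool:
    """Decide whether to enter the mode that ignores the AccECN Option.

    initial_ee0b is the EE0B field from the first relevant segment, or
    None if that segment carried no AccECN Option (in which case the
    separate test for absence of the option applies instead).
    """
    # Only an incorrectly zeroed field triggers the mode. The value is
    # deliberately not compared against any expected initial value.
    return initial_ee0b is not None and initial_ee0b == 0
```

Any non-zero value passes, including one larger than the expected initial value, which matches the note that the counters might already have been incremented.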
3.2.7.5. Consistency between AccECN Feedback Fields 3.2.3.2.5. Consistency between AccECN Feedback Fields
When the AccECN Option is available it supplements but does not When the AccECN Option is available it supplements but does not
replace the ACE field. An endpoint using AccECN feedback MUST always replace the ACE field. An endpoint using AccECN feedback MUST always
consider the information provided in the ACE field whether or not the consider the information provided in the ACE field whether or not the
AccECN Option is also available. AccECN Option is also available.
If the AccECN option is present, the s.cep counter might increase If the AccECN option is present, the s.cep counter might increase
while the s.ceb counter does not (e.g. due to a CE-marked control while the s.ceb counter does not (e.g. due to a CE-marked control
packet). The sender's response to such a situation is out of scope, packet). The sender's response to such a situation is out of scope,
and needs to be dealt with in a specification that uses ECN-capable and needs to be dealt with in a specification that uses ECN-capable
skipping to change at page 25, line 30 skipping to change at page 34, line 13
and optionally other integrity tests (Section 4.3). and optionally other integrity tests (Section 4.3).
If either end-point detects that the s.ceb counter has increased but If either end-point detects that the s.ceb counter has increased but
the s.cep has not (and by testing ACK coverage it is certain how much the s.cep has not (and by testing ACK coverage it is certain how much
the ACE field has wrapped), this invalid protocol transition has to the ACE field has wrapped), this invalid protocol transition has to
be due to some form of feedback mangling. So, the Data Sender MUST be due to some form of feedback mangling. So, the Data Sender MUST
disable sending ECN-capable packets for the remainder of the half- disable sending ECN-capable packets for the remainder of the half-
connection by setting the IP/ECN field in all subsequent packets to connection by setting the IP/ECN field in all subsequent packets to
Not-ECT. Not-ECT.
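A minimal Python sketch of the mangling check in the preceding paragraph is given below. The function names and the parameters delta_cep, delta_ceb and ace_wrap_certain are hypothetical. The IP-ECN codepoint values are those of [RFC3168].

```python
NOT_ECT = 0b00   # IP-ECN codepoints as defined in RFC 3168
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11


def feedback_is_mangled(delta_cep: int, delta_ceb: int,
                        ace_wrap_certain: bool) -> bool:
    """Detect the invalid transition: s.ceb increased but s.cep did not.

    delta_cep and delta_ceb are the increases in s.cep and s.ceb derived
    from one ACK. ace_wrap_certain is True only if ACK coverage makes it
    certain how many times the ACE field wrapped.
    """
    return ace_wrap_certain and delta_ceb > 0 and delta_cep == 0


def ip_ecn_for_next_packet(mangling_detected: bool, wanted: int) -> int:
    """Once mangling has been detected, send only Not-ECT packets."""
    return NOT_ECT if mangling_detected else wanted
```

The opposite case, where s.cep increases while s.ceb does not, is not flagged, because a CE-marked control packet can legitimately cause it.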
3.2.8. Usage of the AccECN TCP Option 3.2.3.3. Usage of the AccECN TCP Option
The following rules determine when a Data Receiver in AccECN mode If the Data Receiver intends to use the AccECN TCP Option to provide
sends the AccECN TCP Option, and which fields to include: feedback, the following rules determine when a Data Receiver in
AccECN mode sends an ACK with the AccECN TCP Option, and which fields
to include:
Change-Triggered ACKs: If an arriving packet increments a different Change-Triggered ACKs: If an arriving packet increments a different
byte counter to that incremented by the previous packet, the Data byte counter to that incremented by the previous packet, the Data
Receiver MUST immediately send an ACK with an AccECN Option, Receiver SHOULD immediately send an ACK with an AccECN Option,
without waiting for the next delayed ACK (this is in addition to without waiting for the next delayed ACK (this is in addition to
the safety recommendation in Section 3.2.5 against ambiguity of the safety recommendation in Section 3.2.2.5 against ambiguity of
the ACE field). the ACE field).
This is stated as a "MUST" so that the data sender can rely on Even though this bullet is stated as a "SHOULD", it is important
change-triggered ACKs to detect transitions right from the very for a transition to immediately trigger an ACK if at all possible,
start of a flow, without first having to detect whether the as already argued when specifying change-triggered ACKs for the
receiver complies. A concern has been raised that certain offload ACE.
hardware needed for high performance might not be able to support
change-triggered ACKs, although high performance protocols such as
DCTCP successfully use change-triggered ACKs. One possible
experimental compromise would be for the receiver to heuristically
detect whether the sender is in slow-start, then to implement
change-triggered ACKs in software while the sender is in slow-
start, and offload to hardware otherwise. If the operator
disables change-triggered ACKs, whether partially like this or
otherwise, the operator will also be responsible for ensuring a
co-ordinated sender algorithm is deployed;
Continual Repetition: Otherwise, if arriving packets continue to Continual Repetition: Otherwise, if arriving packets continue to
increment the same byte counter, the Data Receiver can include an increment the same byte counter, the Data Receiver can include an
AccECN Option on most or all (delayed) ACKs, but it does not have AccECN Option on most or all (delayed) ACKs, but it does not have
to. If option space is limited on a particular ACK, the Data to.
Receiver MUST give precedence to SACK information about loss. It
SHOULD include an AccECN Option if the r.ceb counter has
incremented and it MAY include an AccECN Option if r.e0b or
r.e1b has incremented;
Full-Length Options Preferred: It SHOULD always use full-length * It SHOULD include a counter that has continued to increment on
AccECN Options. It MAY use shorter AccECN Options if space is the next scheduled ACK following a change-triggered ACK;
limited, but it MUST include the counter(s) that have incremented
since the previous AccECN Option and it MUST only truncate fields * while the same counter continues to increment, it SHOULD
from the right-hand tail of the option to preserve the order of include the counter every n ACKs as consistently as possible,
the remaining fields (see Section 3.2.6); where n can be chosen by the implementer;
* It SHOULD always include an AccECN Option if the r.ceb counter
is incrementing and it MAY include an AccECN Option if r.e0b
or r.e1b is incrementing;
* It SHOULD include each counter at least once for every 2^22
bytes incremented to prevent overflow during continual
repetition.
If the smallest allowed AccECN Option would leave insufficient
space for two SACK blocks on a particular ACK, the Data Receiver
MUST give precedence to the SACK option (total 18 octets), because
loss feedback is more critical.
Necessary Option Length: It MAY exclude counter(s) that have not
changed for the whole connection (but beacons still include all
fields - see below). It SHOULD include counter(s) that have
incremented at some time during the connection. It MUST include
the counter(s) that have incremented since the previous AccECN
Option and it MUST only truncate fields from the right-hand tail
of the option to preserve the order of the remaining fields (see
Section 3.2.3);
Beaconing Full-Length Options: Nonetheless, it MUST include a full- Beaconing Full-Length Options: Nonetheless, it MUST include a full-
length AccECN TCP Option on at least three ACKs per RTT, or on all length AccECN TCP Option on at least three ACKs per RTT, or on all
ACKs if there are less than three per RTT (see Appendix A.4 for an ACKs if there are less than three per RTT (see Appendix A.4 for an
example algorithm that satisfies this requirement). example algorithm that satisfies this requirement).
The above rules complement those in Section 3.2.2.5, which determine
when to generate an ACK irrespective of whether an AccECN TCP Option
is to be included.
The following example series of arriving IP/ECN fields illustrates The following example series of arriving IP/ECN fields illustrates
when a Data Receiver will emit an ACK if it is using a delayed ACK when a Data Receiver will emit an ACK with an AccECN Option if it is
factor of 2 segments and change-triggered ACKs: 01 -> ACK, 01, 01 -> using a delayed ACK factor of 2 segments and change-triggered ACKs:
ACK, 10 -> ACK, 10, 01 -> ACK, 01, 11 -> ACK, 01 -> ACK. 01 -> ACK, 01, 01 -> ACK, 10 -> ACK, 10, 01 -> ACK, 01, 11 -> ACK, 01
-> ACK.
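The example series above can be reproduced with a short Python simulation. The sketch assumes that every arriving packet carries data, that each IP/ECN codepoint increments a different byte counter, and that the first packet counts as a change. The function name acks_emitted is hypothetical.

```python
from typing import List, Optional


def acks_emitted(ip_ecn_fields: List[str],
                 delayed_ack_factor: int = 2) -> List[bool]:
    """Return, per arriving packet, whether the Data Receiver sends an ACK."""
    acks = []
    previous: Optional[str] = None   # codepoint of the previous packet
    unacked = 0                      # packets since the last ACK
    for field in ip_ecn_fields:
        if field != previous:
            # Change-triggered ACK: a different byte counter incremented.
            acks.append(True)
            unacked = 0
        else:
            unacked += 1
            if unacked >= delayed_ack_factor:
                # Ordinary delayed ACK.
                acks.append(True)
                unacked = 0
            else:
                acks.append(False)
        previous = field
    return acks
```

Feeding in the series 01, 01, 01, 10, 10, 01, 01, 11, 01 gives ACKs on the first, third, fourth, sixth, eighth and ninth packets, as in the text.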
For the avoidance of doubt, the change-triggered ACK mechanism is Even though the first bullet is stated as a "SHOULD", it is important for
a transition to immediately trigger an ACK if at all possible, so
that the Data Sender can rely on change-triggered ACKs to detect
queue growth as soon as possible, e.g. at the start of a flow. This
requirement can only be relaxed if certain offload hardware needed
for high performance cannot support change-triggered ACKs (although
high performance protocols such as DCTCP already successfully use
change-triggered ACKs). One possible experimental compromise would
be for the receiver to heuristically detect whether the sender is in
slow-start, then to implement change-triggered ACKs while the sender
is in slow-start, and offload otherwise.
For the avoidance of doubt, this change-triggered ACK mechanism is
deliberately worded to ignore the arrival of a control packet with no deliberately worded to ignore the arrival of a control packet with no
payload, which therefore does not alter any byte counters, because it payload, which therefore does not alter any byte counters, because it
is important that TCP does not acknowledge pure ACKs. The change- is important that TCP does not acknowledge pure ACKs. The change-
triggered ACK approach can lead to some additional ACKs but it feeds triggered ACK approach can lead to some additional ACKs but it feeds
back the timing and the order in which ECN marks are received with back the timing and the order in which ECN marks are received with
minimal additional complexity. If only CE marks are infrequent, or minimal additional complexity. If only CE marks are infrequent, or
there are multiple marks in a row, the additional load will be low. there are multiple marks in a row, the additional load will be low.
Other marking patterns could increase the load significantly. Other marking patterns could increase the load significantly.
Investigating the additional load is a goal of the proposed Investigating the additional load is a goal of the proposed
experiment. experiment.
skipping to change at page 27, line 24 skipping to change at page 36, line 34
Middleboxes on AccECN Compliance Middleboxes on AccECN Compliance
A large class of middleboxes split TCP connections. Such a middlebox A large class of middleboxes split TCP connections. Such a middlebox
would be compliant with the AccECN protocol if the TCP implementation would be compliant with the AccECN protocol if the TCP implementation
on each side complied with the present AccECN specification and each on each side complied with the present AccECN specification and each
side negotiated AccECN independently of the other side. side negotiated AccECN independently of the other side.
Another large class of middleboxes intervenes to some degree at the Another large class of middleboxes intervenes to some degree at the
transport layer, but attempts to be transparent (invisible) to the transport layer, but attempts to be transparent (invisible) to the
end-to-end connection. A subset of this class of middleboxes end-to-end connection. A subset of this class of middleboxes
attempts to `normalise' the TCP wire protocol by checking that all attempts to `normalize' the TCP wire protocol by checking that all
values in header fields comply with a rather narrow interpretation of values in header fields comply with a rather narrow interpretation of
the TCP specifications. To comply with the present AccECN the TCP specifications. To comply with the present AccECN
specification, such a middlebox MUST NOT change the ACE field or the specification, such a middlebox MUST NOT change the ACE field or the
AccECN Option and it SHOULD preserve the timing of each ACK (for AccECN Option and it SHOULD preserve the timing of each ACK (for
example, if it coalesced ACKs it would not be AccECN-compliant) as example, if it coalesced ACKs it would not be AccECN-compliant) as
these can be used by the Data Sender to infer further information these can be used by the Data Sender to infer further information
about the path congestion level. A middlebox claiming to be about the path congestion level. A middlebox claiming to be
transparent at the transport layer MUST forward the AccECN TCP Option transparent at the transport layer MUST forward the AccECN TCP Option
unaltered, whether or not the length value matches one of those unaltered, whether or not the length value matches one of those
specified in Section 3.2.6, and whether or not the initial values of specified in Section 3.2.3, and whether or not the initial values of
the byte-counter fields are correct. This is because blocking the byte-counter fields are correct. This is because blocking
apparently invalid values does not improve security (because AccECN apparently invalid values does not improve security (because AccECN
hosts are required to ignore invalid values anyway), while it hosts are required to ignore invalid values anyway), while it
prevents the standardised set of values being extended in future prevents the standardized set of values being extended in future
(because outdated normalisers would block updated hosts from using (because outdated normalizers would block updated hosts from using
the extended AccECN standard). the extended AccECN standard).
Hardware to offload certain TCP processing represents another large Hardware to offload certain TCP processing represents another large
class of middleboxes, even though it is often a function of a host's class of middleboxes, even though it is often a function of a host's
network interface and rarely in its own 'box'. Leeway has been network interface and rarely in its own 'box'. Leeway has been
allowed in the present AccECN specification in the expectation that allowed in the present AccECN specification in the expectation that
offload hardware could comply and still serve its function. offload hardware could comply and still serve its function.
Nonetheless, such hardware SHOULD also preserve the timing of each Nonetheless, such hardware SHOULD also preserve the timing of each
ACK (for example, if it coalesced ACKs it would not be AccECN- ACK (for example, if it coalesced ACKs it would not be AccECN-
compliant). compliant).
skipping to change at page 28, line 20 skipping to change at page 37, line 32
DCTCP-style feedback changes less often when there are long sequences DCTCP-style feedback changes less often when there are long sequences
of CE marks, which is more common with a step marking threshold. In of CE marks, which is more common with a step marking threshold. In
order to enable DCTCP to improve its responsiveness, DCs will need to order to enable DCTCP to improve its responsiveness, DCs will need to
move beyond step marking. Before this can happen, offload hardware move beyond step marking. Before this can happen, offload hardware
will have to explicitly address the variability of ECN feedback. will have to explicitly address the variability of ECN feedback.
ECN encodes a varying signal in the ACK stream, so it is inevitable ECN encodes a varying signal in the ACK stream, so it is inevitable
that offload hardware will ultimately need to handle any form of ECN that offload hardware will ultimately need to handle any form of ECN
feedback exceptionally. The purpose of working towards standardized feedback exceptionally. The purpose of working towards standardized
TCP ECN feedback is to reduce the risk for hardware developers, who TCP ECN feedback is to reduce the risk for hardware developers, who
will have to choose which scheme is likely to become dominant. would otherwise have to guess which scheme is likely to become
dominant.
4. Interaction with Other TCP Variants 4. Interaction with Other TCP Variants
This section is informative, not normative. This section is informative, not normative.
4.1. Compatibility with SYN Cookies 4.1. Compatibility with SYN Cookies
A TCP server can use SYN Cookies (see Appendix A of [RFC4987]) to A TCP server can use SYN Cookies (see Appendix A of [RFC4987]) to
protect itself from SYN flooding attacks. It places minimal commonly protect itself from SYN flooding attacks. It places minimal commonly
used connection state in the SYN/ACK, and deliberately does not hold used connection state in the SYN/ACK, and deliberately does not hold
skipping to change at page 29, line 20 skipping to change at page 38, line 33
4.2. Compatibility with Other TCP Options and Experiments 4.2. Compatibility with Other TCP Options and Experiments
AccECN is compatible (at least on paper) with the most commonly used AccECN is compatible (at least on paper) with the most commonly used
TCP options: MSS, time-stamp, window scaling, SACK and TCP-AO. It is TCP options: MSS, time-stamp, window scaling, SACK and TCP-AO. It is
also compatible with the recent promising experimental TCP options also compatible with the recent promising experimental TCP options
TCP Fast Open (TFO [RFC7413]) and Multipath TCP (MPTCP [RFC6824]). TCP Fast Open (TFO [RFC7413]) and Multipath TCP (MPTCP [RFC6824]).
AccECN is friendly to all these protocols, because space for TCP AccECN is friendly to all these protocols, because space for TCP
options is particularly scarce on the SYN, where AccECN consumes zero options is particularly scarce on the SYN, where AccECN consumes zero
additional header space. additional header space.
When option space is under pressure from other options, Section 3.2.8 When option space is under pressure from other options,
provides guidance on how important it is to send an AccECN Option and Section 3.2.3.3 provides guidance on how important it is to send an
whether it needs to be a full-length option. AccECN Option and whether it needs to be a full-length option.
Implementers of TFO need to take careful note of the recommendation
in Section 3.2.2.1. That section recommends that, if the client has
successfully negotiated AccECN, when acknowledging the SYN/ACK, even
if it has data to send, it sends a pure ACK immediately before the
data. Then it can reflect the IP-ECN field of the SYN/ACK on this
pure ACK, which allows the server to detect ECN mangling.
4.3. Compatibility with Feedback Integrity Mechanisms 4.3. Compatibility with Feedback Integrity Mechanisms
Three alternative mechanisms are available to assure the integrity of Three alternative mechanisms are available to assure the integrity of
ECN and/or loss signals. AccECN is compatible with any of these ECN and/or loss signals. AccECN is compatible with any of these
approaches: approaches:
o The Data Sender can test the integrity of the receiver's ECN (or o The Data Sender can test the integrity of the receiver's ECN (or
loss) feedback by occasionally setting the IP-ECN field to a value loss) feedback by occasionally setting the IP-ECN field to a value
normally only set by the network (and/or deliberately leaving a normally only set by the network (and/or deliberately leaving a
sequence number gap). Then it can test whether the Data sequence number gap). Then it can test whether the Data
Receiver's feedback faithfully reports what it expects (similar to Receiver's feedback faithfully reports what it expects (similar to
para 2 of Section 20.2 of [RFC3168]). Unlike the ECN Nonce para 2 of Section 20.2 of [RFC3168]). Unlike the ECN Nonce
[RFC3540], this approach does not waste the ECT(1) codepoint in [RFC3540], this approach does not waste the ECT(1) codepoint in
the IP header, it does not require standardisation and it does not the IP header, it does not require standardization and it does not
rely on misbehaving receivers volunteering to reveal feedback rely on misbehaving receivers volunteering to reveal feedback
information that allows them to be detected. However, setting the information that allows them to be detected. However, setting the
CE mark by the sender might conceal actual congestion feedback CE mark by the sender might conceal actual congestion feedback
from the network and should therefore only be done sparsely. from the network and should therefore only be done sparingly.
o Networks generate congestion signals when they are becoming o Networks generate congestion signals when they are becoming
congested, so networks are more likely than Data Senders to be congested, so networks are more likely than Data Senders to be
concerned about the integrity of the receiver's feedback of these concerned about the integrity of the receiver's feedback of these
signals. A network can enforce a congestion response to its ECN signals. A network can enforce a congestion response to its ECN
markings (or packet losses) using congestion exposure (ConEx) markings (or packet losses) using congestion exposure (ConEx)
audit [RFC7713]. Whether the receiver or a downstream network is audit [RFC7713]. Whether the receiver or a downstream network is
suppressing congestion feedback or the sender is unresponsive to suppressing congestion feedback or the sender is unresponsive to
the feedback, or both, ConEx audit can neutralise any advantage the feedback, or both, ConEx audit can neutralize any advantage
that any of these three parties would otherwise gain. that any of these three parties would otherwise gain.
ConEx is a change to the Data Sender that is most useful when ConEx is a change to the Data Sender that is most useful when
combined with AccECN. Without AccECN, the ConEx behaviour of a combined with AccECN. Without AccECN, the ConEx behaviour of a
Data Sender would have to be more conservative than would be Data Sender would have to be more conservative than would be
necessary if it had the accurate feedback of AccECN. necessary if it had the accurate feedback of AccECN.
o The TCP authentication option (TCP-AO [RFC5925]) can be used to o The TCP authentication option (TCP-AO [RFC5925]) can be used to
detect any tampering with AccECN feedback between the Data detect any tampering with AccECN feedback between the Data
Receiver and the Data Sender (whether malicious or accidental). Receiver and the Data Sender (whether malicious or accidental).
The AccECN fields are immutable end-to-end, so they are amenable The AccECN fields are immutable end-to-end, so they are amenable
to TCP-AO protection, which covers TCP options by default. to TCP-AO protection, which covers TCP options by default.
However, TCP-AO is often too brittle to use on many end-to-end However, TCP-AO is often too brittle to use on many end-to-end
paths, where middleboxes can make verification fail in their paths, where middleboxes can make verification fail in their
attempts to improve performance or security, e.g. by attempts to improve performance or security, e.g. by
resegmentation or shifting the sequence space. resegmentation or shifting the sequence space.
Originally the ECN Nonce [RFC3540] was proposed to ensure integrity Originally the ECN Nonce [RFC3540] was proposed to ensure integrity
of congestion feedback. With minor changes AccECN could be optimised of congestion feedback. With minor changes AccECN could be optimized
for the possibility that the ECT(1) codepoint might be used as an ECN for the possibility that the ECT(1) codepoint might be used as an ECN
Nonce. However, given RFC 3540 has been reclassified as historic, Nonce. However, given RFC 3540 has been reclassified as historic,
the AccECN design has been generalised so that it ought to be able to the AccECN design has been generalized so that it ought to be able to
support other possible uses of the ECT(1) codepoint, such as a lower support other possible uses of the ECT(1) codepoint, such as a lower
severity or a more instant congestion signal than CE. severity or a more instant congestion signal than CE.
5. Protocol Properties 5. Protocol Properties
This section is informative not normative. It describes how well the This section is informative not normative. It describes how well the
protocol satisfies the agreed requirements for a more accurate ECN protocol satisfies the agreed requirements for a more accurate ECN
feedback protocol [RFC7560]. feedback protocol [RFC7560].
Accuracy: From each ACK, the Data Sender can infer the number of new Accuracy: From each ACK, the Data Sender can infer the number of new
skipping to change at page 31, line 10 skipping to change at page 40, line 36
arrives. arrives.
Timeliness: While the same ECN markings are arriving continually at Timeliness: While the same ECN markings are arriving continually at
the Data Receiver, it can defer ACKs as TCP does normally, but it the Data Receiver, it can defer ACKs as TCP does normally, but it
will immediately send an ACK as soon as a different ECN marking will immediately send an ACK as soon as a different ECN marking
arrives. arrives.
Timeliness vs Overhead: Change-Triggered ACKs are intended to enable Timeliness vs Overhead: Change-Triggered ACKs are intended to enable
latency-sensitive uses of ECN feedback by capturing the timing of latency-sensitive uses of ECN feedback by capturing the timing of
transitions but not wasting resources while the state of the transitions but not wasting resources while the state of the
signalling system is stable. The receiver can control how signalling system is stable. Within the constraints of the
frequently it sends the AccECN TCP Option and therefore it can change-triggered ACK rules, the receiver can control how
control the overhead induced by AccECN. frequently it sends the AccECN TCP Option and therefore to some
extent it can control the overhead induced by AccECN.
Resilience: All information is provided based on counters. Resilience: All information is provided based on counters.
Therefore if ACKs are lost, the counters on the first ACK Therefore if ACKs are lost, the counters on the first ACK
following the losses allow the Data Sender to immediately recover following the losses allow the Data Sender to immediately recover
the number of the ECN markings that it missed. the number of the ECN markings that it missed. And if data or
ACKs are reordered, stale congestion information can be identified
and ignored.
Resilience against Bias: Because feedback is based on repetition of Resilience against Bias: Because feedback is based on repetition of
counters, random losses do not remove any information, they only counters, random losses do not remove any information, they only
delay it. Therefore, even though some ACKs are change-triggered, delay it. Therefore, even though some ACKs are change-triggered,
random losses will not alter the proportions of the different ECN random losses will not alter the proportions of the different ECN
markings in the feedback. markings in the feedback.
Resilience vs Overhead: If space is limited in some segments (e.g. Resilience vs Overhead: If space is limited in some segments (e.g.
because more option are need on some segments, such as the SACK because more options are needed on some segments, such as the SACK
option after loss), the Data Receiver can send AccECN Options less option after loss), the Data Receiver can send AccECN Options less
frequently or truncate fields that have not changed, usually down frequently or truncate fields that have not changed, usually down
to as little as 5 bytes. However, it has to send a full-sized to as little as 5 bytes. However, it has to send a full-sized
AccECN Option at least three times per RTT, which the Data Sender AccECN Option at least three times per RTT, which the Data Sender
can rely on as a regular beacon or checkpoint. can rely on as a regular beacon or checkpoint.
Resilience vs Timeliness and Ordering: Ordering information and the Resilience vs Timeliness and Ordering: Ordering information and the
timing of transitions cannot be communicated in three cases: i) timing of transitions cannot be communicated in three cases: i)
during ACK loss; ii) if something on the path strips the AccECN during ACK loss; ii) if something on the path strips the AccECN
Option; or iii) if the Data Receiver is unable to support Change- Option; or iii) if the Data Receiver is unable to support Change-
Triggered ACKs. Triggered ACKs. Following ACK reordering, the Data Sender can
reconstruct the order in which feedback was sent, but not until
all the missing feedback has arrived.
Complexity: An AccECN implementation solely involves simple counter Complexity: An AccECN implementation solely involves simple counter
increments, some modulo arithmetic to communicate the least increments, some modulo arithmetic to communicate the least
significant bits and allow for wrap, and some heuristics for significant bits and allow for wrap, and some heuristics for
safety against fields cycling due to prolonged periods of ACK safety against fields cycling due to prolonged periods of ACK
loss. Each host needs to maintain eight additional counters. The loss. Each host needs to maintain eight additional counters. The
hosts have to apply some additional tests to detect tampering by hosts have to apply some additional tests to detect tampering by
middleboxes, but in general the protocol is simple to understand, middleboxes, but in general the protocol is simple to understand,
simple to implement and requires few cycles per packet to execute. simple to implement and requires few cycles per packet to execute.
skipping to change at page 32, line 20 skipping to change at page 41, line 50
middlebox, AccECN still provides basic congestion feedback in the middlebox, AccECN still provides basic congestion feedback in the
ACE field. Further, AccECN can be used to detect mangling of the ACE field. Further, AccECN can be used to detect mangling of the
IP ECN field; mangling of the TCP ECN flags; blocking of ECT- IP ECN field; mangling of the TCP ECN flags; blocking of ECT-
marked segments; and blocking of segments carrying the AccECN marked segments; and blocking of segments carrying the AccECN
Option. It can detect these conditions during TCP's 3WHS so that Option. It can detect these conditions during TCP's 3WHS so that
it can fall back to operation without ECN and/or operation without it can fall back to operation without ECN and/or operation without
the AccECN Option. the AccECN Option.
Forward Compatibility: The behaviour of endpoints and middleboxes is Forward Compatibility: The behaviour of endpoints and middleboxes is
carefully defined for all reserved or currently unused codepoints carefully defined for all reserved or currently unused codepoints
in the scheme, to ensure that any blocking of anomalous values is in the scheme. Then, the designers of security devices can
always at least under reversible policy control. understand which currently unused values might appear in future.
So, even if they choose to treat such values as anomalous while
they are not widely used, any blocking will at least be under
policy control not hard-coded. Then, if previously unused values
start to appear on the Internet (or in standards), such policies
could be quickly reversed.
6. IANA Considerations 6. IANA Considerations
This document reassigns bit 7 of the TCP header flags to the AccECN This document reassigns bit 7 of the TCP header flags to the AccECN
experiment. This bit was previously called the Nonce Sum (NS) flag experiment. This bit was previously called the Nonce Sum (NS) flag
[RFC3540], but RFC 3540 has been reclassified as historic [RFC8311]. [RFC3540], but RFC 3540 has been reclassified as historic [RFC8311].
The flag will now be defined as: The flag will now be defined as:
+-----+-------------------+-----------+ +-----+-------------------+-----------+
| Bit | Name | Reference | | Bit | Name | Reference |
skipping to change at page 33, line 18 skipping to change at page 43, line 10
Early implementation before the IANA allocation MUST follow [RFC6994] Early implementation before the IANA allocation MUST follow [RFC6994]
and use experimental option 254 and magic number 0xACCE (16 bits), and use experimental option 254 and magic number 0xACCE (16 bits),
then migrate to the new option after the allocation. then migrate to the new option after the allocation.
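The pre-allocation encoding described above can be sketched in Python. This is a minimal illustration, assuming the shared experimental option layout of [RFC6994] (Kind, Length, then a 16-bit experiment identifier in network byte order). The function names are hypothetical, and the payload is treated as opaque bytes because the AccECN field layout is specified elsewhere in the draft.

```python
import struct

EXP_KIND = 254        # experimental TCP option Kind named in the text
ACCECN_EXID = 0xACCE  # 16-bit magic number named in the text
MAX_TCP_OPTIONS = 40  # upper bound on TCP option space in one header


def build_exp_option(payload: bytes) -> bytes:
    """Wrap an opaque payload as Kind, Length, ExID, payload."""
    length = 4 + len(payload)  # Kind + Length + 2-byte ExID + payload
    if length > MAX_TCP_OPTIONS:
        raise ValueError("option does not fit in TCP option space")
    return struct.pack("!BBH", EXP_KIND, length, ACCECN_EXID) + payload


def is_accecn_exp_option(option: bytes) -> bool:
    """Check Kind, Length and ExID of a single, already-delimited option."""
    if len(option) < 4:
        return False
    kind, length, exid = struct.unpack("!BBH", option[:4])
    return kind == EXP_KIND and length == len(option) and exid == ACCECN_EXID


opt = build_exp_option(b"\x00\x00\x01")
print(opt.hex())  # prints feaacce000001 preceded by the length byte: fe07acce000001
```

After the IANA allocation, an implementation following the text would emit the newly assigned option Kind instead and drop the magic number.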
7. Security Considerations 7. Security Considerations
If ever the supplementary part of AccECN based on the new AccECN TCP If ever the supplementary part of AccECN based on the new AccECN TCP
Option is unusable (due for example to middlebox interference) the Option is unusable (due for example to middlebox interference) the
essential part of AccECN's congestion feedback offers only limited essential part of AccECN's congestion feedback offers only limited
resilience to long runs of ACK loss (see Section 3.2.5). These resilience to long runs of ACK loss (see Section 3.2.2.5). These
problems are unlikely to be due to malicious intervention (because if problems are unlikely to be due to malicious intervention (because if
an attacker could strip a TCP option or discard a long run of ACKs it an attacker could strip a TCP option or discard a long run of ACKs it
could wreak other arbitrary havoc). However, it would be of concern could wreak other arbitrary havoc). However, it would be of concern
if AccECN's resilience could be indirectly compromised during a if AccECN's resilience could be indirectly compromised during a
flooding attack. AccECN is still considered safe though, because if flooding attack. AccECN is still considered safe though, because if
the option is not presented, the AccECN Data Sender is then required the option is not presented, the AccECN Data Sender is then required
to switch to more conservative assumptions about wrap of congestion to switch to more conservative assumptions about wrap of congestion
indication counters (see Section 3.2.5 and Appendix A.2). indication counters (see Section 3.2.2.5 and Appendix A.2).
Section 4.1 describes how a TCP server can negotiate AccECN and use Section 4.1 describes how a TCP server can negotiate AccECN and use
the SYN cookie method for mitigating SYN flooding attacks. the SYN cookie method for mitigating SYN flooding attacks.
There is concern that ECN markings could be altered or suppressed, There is concern that ECN markings could be altered or suppressed,
particularly because a misbehaving Data Receiver could increase its particularly because a misbehaving Data Receiver could increase its
own throughput at the expense of others. AccECN is compatible with own throughput at the expense of others. AccECN is compatible with
the three schemes known to assure the integrity of ECN feedback (see the three schemes known to assure the integrity of ECN feedback (see
Section 4.3 for details). If the AccECN Option is stripped by an Section 4.3 for details). If the AccECN Option is stripped by an
incorrectly implemented middlebox, the resolution of the feedback incorrectly implemented middlebox, the resolution of the feedback
skipping to change at page 34, line 9 skipping to change at page 43, line 46
can contrive one. can contrive one.
The AccECN protocol is not believed to introduce any new privacy The AccECN protocol is not believed to introduce any new privacy
concerns, because it merely counts and feeds back signals at the concerns, because it merely counts and feeds back signals at the
transport layer that had already been visible at the IP layer. transport layer that had already been visible at the IP layer.
8. Acknowledgements 8. Acknowledgements
We want to thank Koen De Schepper, Praveen Balasubramanian, Michael We want to thank Koen De Schepper, Praveen Balasubramanian, Michael
Welzl, Gorry Fairhurst, David Black, Spencer Dawkins, Michael Scharf, Welzl, Gorry Fairhurst, David Black, Spencer Dawkins, Michael Scharf,
Michael Tuexen and Yuchung Cheng for their input and discussion. The Michael Tuexen, Yuchung Cheng, Kenjiro Cho, Olivier Tilmans and Ilpo
idea of using the three ECN-related TCP flags as one field for more Jaervinen for their input and discussion. The idea of using the
accurate TCP-ECN feedback was first introduced in the re-ECN protocol three ECN-related TCP flags as one field for more accurate TCP-ECN
that was the ancestor of ConEx. feedback was first introduced in the re-ECN protocol that was the
ancestor of ConEx.
Bob Briscoe was part-funded by the European Community under its Bob Briscoe was part-funded by the Comcast Innovation Fund, the
Seventh Framework Programme through the Reducing Internet Transport European Community under its Seventh Framework Programme through the
Latency (RITE) project (ICT-317700) and through the Trilogy 2 project Reducing Internet Transport Latency (RITE) project (ICT-317700) and
(ICT-317756). He was also part-funded by the Research Council of through the Trilogy 2 project (ICT-317756), and the Research Council
Norway through the TimeIn project. The views expressed here are of Norway through the TimeIn project. The views expressed here are
solely those of the authors. solely those of the authors.
Mirja Kuehlewind was partly supported by the European Commission Mirja Kuehlewind was partly supported by the European Commission
under Horizon 2020 grant agreement no. 688421 Measurement and under Horizon 2020 grant agreement no. 688421 Measurement and
Architecture for a Middleboxed Internet (MAMI), and by the Swiss Architecture for a Middleboxed Internet (MAMI), and by the Swiss
State Secretariat for Education, Research, and Innovation under State Secretariat for Education, Research, and Innovation under
contract no. 15.0268. This support does not imply endorsement. contract no. 15.0268. This support does not imply endorsement.
9. Comments Solicited 9. Comments Solicited
Comments and questions are encouraged and very welcome. They can be Comments and questions are encouraged and very welcome. They can be
addressed to the IETF TCP maintenance and minor modifications working addressed to the IETF TCP maintenance and minor modifications working
group mailing list <tcpm@ietf.org>, and/or to the authors. group mailing list <tcpm@ietf.org>, and/or to the authors.
10. References 10. References
10.1. Normative References 10.1. Normative References
[RFC0793] Postel, J., "Transmission Control Protocol", STD 7,
RFC 793, DOI 10.17487/RFC0793, September 1981,
<https://www.rfc-editor.org/info/rfc793>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997, DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>. <https://www.rfc-editor.org/info/rfc2119>.
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP", of Explicit Congestion Notification (ECN) to IP",
RFC 3168, DOI 10.17487/RFC3168, September 2001, RFC 3168, DOI 10.17487/RFC3168, September 2001,
<https://www.rfc-editor.org/info/rfc3168>. <https://www.rfc-editor.org/info/rfc3168>.
[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
<https://www.rfc-editor.org/info/rfc5681>. <https://www.rfc-editor.org/info/rfc5681>.
[RFC6994] Touch, J., "Shared Use of Experimental TCP Options",
RFC 6994, DOI 10.17487/RFC6994, August 2013,
<https://www.rfc-editor.org/info/rfc6994>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>. May 2017, <https://www.rfc-editor.org/info/rfc8174>.
10.2. Informative References 10.2. Informative References
[I-D.ietf-tcpm-2140bis]
Touch, J., Welzl, M., and S. Islam, "TCP Control Block
Interdependence", draft-ietf-tcpm-2140bis-02 (work in
progress), February 2020.
[I-D.ietf-tcpm-generalized-ecn] [I-D.ietf-tcpm-generalized-ecn]
Bagnulo, M. and B. Briscoe, "ECN++: Adding Explicit Bagnulo, M. and B. Briscoe, "ECN++: Adding Explicit
Congestion Notification (ECN) to TCP Control Packets", Congestion Notification (ECN) to TCP Control Packets",
draft-ietf-tcpm-generalized-ecn-03 (work in progress), draft-ietf-tcpm-generalized-ecn-05 (work in progress),
October 2018. November 2019.
[I-D.ietf-tsvwg-l4s-arch] [I-D.ietf-tsvwg-l4s-arch]
Briscoe, B., Schepper, K., and M. Bagnulo, "Low Latency, Briscoe, B., Schepper, K., Bagnulo, M., and G. White, "Low
Low Loss, Scalable Throughput (L4S) Internet Service: Latency, Low Loss, Scalable Throughput (L4S) Internet
Architecture", draft-ietf-tsvwg-l4s-arch-03 (work in Service: Architecture", draft-ietf-tsvwg-l4s-arch-05 (work
progress), October 2018. in progress), February 2020.
[I-D.kuehlewind-tcpm-ecn-fallback] [I-D.kuehlewind-tcpm-ecn-fallback]
Kuehlewind, M. and B. Trammell, "A Mechanism for ECN Path Kuehlewind, M. and B. Trammell, "A Mechanism for ECN Path
Probing and Fallback", draft-kuehlewind-tcpm-ecn- Probing and Fallback", draft-kuehlewind-tcpm-ecn-
fallback-01 (work in progress), September 2013. fallback-01 (work in progress), September 2013.
[Mandalari18] [Mandalari18]
Mandalari, A., Lutu, A., Briscoe, B., Bagnulo, M., and Oe. Mandalari, A., Lutu, A., Briscoe, B., Bagnulo, M., and Oe.
Alay, "Measuring ECN++: Good News for ++, Bad News for ECN Alay, "Measuring ECN++: Good News for ++, Bad News for ECN
over Mobile", IEEE Communications Magazine, March 2018. over Mobile", IEEE Communications Magazine, March 2018.
[RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
Selective Acknowledgment Options", RFC 2018,
DOI 10.17487/RFC2018, October 1996,
<https://www.rfc-editor.org/info/rfc2018>.
[RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit
Congestion Notification (ECN) Signaling with Nonces", Congestion Notification (ECN) Signaling with Nonces",
RFC 3540, DOI 10.17487/RFC3540, June 2003, RFC 3540, DOI 10.17487/RFC3540, June 2003,
<https://www.rfc-editor.org/info/rfc3540>. <https://www.rfc-editor.org/info/rfc3540>.
[RFC4987] Eddy, W., "TCP SYN Flooding Attacks and Common [RFC4987] Eddy, W., "TCP SYN Flooding Attacks and Common
Mitigations", RFC 4987, DOI 10.17487/RFC4987, August 2007, Mitigations", RFC 4987, DOI 10.17487/RFC4987, August 2007,
<https://www.rfc-editor.org/info/rfc4987>. <https://www.rfc-editor.org/info/rfc4987>.
[RFC5562] Kuzmanovic, A., Mondal, A., Floyd, S., and K. [RFC5562] Kuzmanovic, A., Mondal, A., Floyd, S., and K.
Ramakrishnan, "Adding Explicit Congestion Notification Ramakrishnan, "Adding Explicit Congestion Notification
(ECN) Capability to TCP's SYN/ACK Packets", RFC 5562, (ECN) Capability to TCP's SYN/ACK Packets", RFC 5562,
DOI 10.17487/RFC5562, June 2009, DOI 10.17487/RFC5562, June 2009,
<https://www.rfc-editor.org/info/rfc5562>. <https://www.rfc-editor.org/info/rfc5562>.
[RFC5925] Touch, J., Mankin, A., and R. Bonica, "The TCP [RFC5925] Touch, J., Mankin, A., and R. Bonica, "The TCP
Authentication Option", RFC 5925, DOI 10.17487/RFC5925, Authentication Option", RFC 5925, DOI 10.17487/RFC5925,
June 2010, <https://www.rfc-editor.org/info/rfc5925>. June 2010, <https://www.rfc-editor.org/info/rfc5925>.
[RFC5961] Ramaiah, A., Stewart, R., and M. Dalal, "Improving TCP's
Robustness to Blind In-Window Attacks", RFC 5961,
DOI 10.17487/RFC5961, August 2010,
<https://www.rfc-editor.org/info/rfc5961>.
[RFC6824] Ford, A., Raiciu, C., Handley, M., and O. Bonaventure, [RFC6824] Ford, A., Raiciu, C., Handley, M., and O. Bonaventure,
"TCP Extensions for Multipath Operation with Multiple "TCP Extensions for Multipath Operation with Multiple
Addresses", RFC 6824, DOI 10.17487/RFC6824, January 2013, Addresses", RFC 6824, DOI 10.17487/RFC6824, January 2013,
<https://www.rfc-editor.org/info/rfc6824>. <https://www.rfc-editor.org/info/rfc6824>.
[RFC6994] Touch, J., "Shared Use of Experimental TCP Options",
RFC 6994, DOI 10.17487/RFC6994, August 2013,
<https://www.rfc-editor.org/info/rfc6994>.
[RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP [RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP
Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014, Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014,
<https://www.rfc-editor.org/info/rfc7413>. <https://www.rfc-editor.org/info/rfc7413>.
[RFC7560] Kuehlewind, M., Ed., Scheffenegger, R., and B. Briscoe, [RFC7560] Kuehlewind, M., Ed., Scheffenegger, R., and B. Briscoe,
"Problem Statement and Requirements for Increased Accuracy "Problem Statement and Requirements for Increased Accuracy
in Explicit Congestion Notification (ECN) Feedback", in Explicit Congestion Notification (ECN) Feedback",
RFC 7560, DOI 10.17487/RFC7560, August 2015, RFC 7560, DOI 10.17487/RFC7560, August 2015,
<https://www.rfc-editor.org/info/rfc7560>. <https://www.rfc-editor.org/info/rfc7560>.
skipping to change at page 37, line 34 skipping to change at page 47, line 34
DIVOPT = 2^24 DIVOPT = 2^24
Every time a CE marked data segment arrives, the Data Receiver Every time a CE marked data segment arrives, the Data Receiver
increments its local value of r.ceb by the size of the TCP Data. increments its local value of r.ceb by the size of the TCP Data.
Whenever it sends an ACK with the AccECN Option, the value it writes Whenever it sends an ACK with the AccECN Option, the value it writes
into the ECEB field is into the ECEB field is
ECEB = r.ceb % DIVOPT ECEB = r.ceb % DIVOPT
where '%' is the modulo operator. where '%' is the remainder operator.
On the arrival of an AccECN Option, the Data Sender uses the TCP On the arrival of an AccECN Option, the Data Sender first makes sure
acknowledgement number and any SACK options to calculate newlyAckedB, the ACK has not been superseded in order to avoid winding the s.ceb
the amount of new data that the ACK acknowledges in bytes. If counter backwards. It uses the TCP acknowledgement number and any
newlyAckedB is negative it means that a more up to date ACK has SACK options to calculate newlyAckedB, the amount of new data that
already been processed, so this ACK has been superseded and the Data the ACK acknowledges in bytes (newlyAckedB can be zero but not
Sender has to ignore the AccECN Option. Otherwise, the Data Sender negative). If newlyAckedB is zero, either the ACK has been
calculates the minimum difference d.ceb between the ECEB field and superseded or CE-marked packet(s) without data could have arrived.
its local s.ceb counter, using modulo arithmetic as follows: To break the tie for the latter case, the Data Sender could use
timestamps (if present) to work out newlyAckedT, the amount of new
time that the ACK acknowledges. If the Data Sender determines that
the ACK has been superseded it ignores the AccECN Option. Otherwise,
the Data Sender calculates the minimum non-negative difference d.ceb
between the ECEB field and its local s.ceb counter, using modulo
arithmetic as follows:
if (newlyAckedB >= 0) { if ((newlyAckedB > 0) || (newlyAckedT > 0)) {
d.ceb = (ECEB + DIVOPT - (s.ceb % DIVOPT)) % DIVOPT d.ceb = (ECEB + DIVOPT - (s.ceb % DIVOPT)) % DIVOPT
s.ceb += d.ceb s.ceb += d.ceb
} }
For example, if s.ceb is 33,554,433 and ECEB is 1461 (both decimal), For example, if s.ceb is 33,554,433 and ECEB is 1461 (both decimal),
then then
s.ceb % DIVOPT = 1 s.ceb % DIVOPT = 1
d.ceb = (1461 + 2^24 - 1) % 2^24 d.ceb = (1461 + 2^24 - 1) % 2^24
= 1460 = 1460
s.ceb = 33,554,433 + 1460 s.ceb = 33,554,433 + 1460
= 33,555,893 = 33,555,893
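The encoding and decoding steps of this appendix, as given in the newer column, can be sketched in Python. This is an illustrative sketch only, not text from either draft version. The names encode_eceb and decode_eceb are hypothetical, and newly_acked_t stands for the optional timestamp-based tie-break, assumed to be 0 when timestamps are unavailable.

```python
DIVOPT = 2 ** 24  # the ECEB field wraps modulo 2^24


def encode_eceb(r_ceb: int) -> int:
    """Data Receiver: value written into the ECEB field."""
    return r_ceb % DIVOPT


def decode_eceb(s_ceb: int, eceb: int,
                newly_acked_b: int, newly_acked_t: int = 0) -> int:
    """Data Sender: return the updated s.ceb counter.

    The counter only advances if the ACK acknowledges new data or
    new time; otherwise the ACK is treated as superseded and the
    counter is returned unchanged.
    """
    if newly_acked_b > 0 or newly_acked_t > 0:
        # Minimum non-negative difference under modulo arithmetic.
        d_ceb = (eceb + DIVOPT - (s_ceb % DIVOPT)) % DIVOPT
        s_ceb += d_ceb
    return s_ceb


# Worked example from the text: s.ceb = 33,554,433 and ECEB = 1461.
print(decode_eceb(33_554_433, 1461, newly_acked_b=1460))  # prints 33555893
```

Because the difference is always taken as non-negative, feeding a stale ECEB value into the update would wind the counter forward by almost 2^24, which is why the superseded-ACK check comes first.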
A.2. Example Algorithm for Safety Against Long Sequences of ACK Loss A.2. Example Algorithm for Safety Against Long Sequences of ACK Loss
The example algorithms below show how a Data Receiver in AccECN mode The example algorithms below show how a Data Receiver in AccECN mode
could encode its CE packet counter r.cep into the ACE field, and how could encode its CE packet counter r.cep into the ACE field, and how
the Data Sender in AccECN mode could decode the ACE field into its the Data Sender in AccECN mode could decode the ACE field into its
s.cep counter. The Data Sender's algorithm includes code to s.cep counter. The Data Sender's algorithm includes code to
heuristically detect a long enough unbroken string of ACK losses that heuristically detect a long enough unbroken string of ACK losses that
could have concealed a cycle of the congestion counter in the ACE could have concealed a cycle of the congestion counter in the ACE
field of the next ACK to arrive. field of the next ACK to arrive.
Two variants of the algorithm are given: i) a more conservative Two variants of the algorithm are given: i) a more conservative
variant for a Data Sender to use if it detects that the AccECN Option variant for a Data Sender to use if it detects that the AccECN Option
is not available (see Section 3.2.5 and Section 3.2.7); and ii) a is not available (see Section 3.2.2.5 and Section 3.2.3.2); and ii) a
less conservative variant that is feasible when complementary less conservative variant that is feasible when complementary
information is available from the AccECN Option. information is available from the AccECN Option.
A.2.1. Safety Algorithm without the AccECN Option A.2.1. Safety Algorithm without the AccECN Option
It is assumed that each local packet counter is a sufficiently sized It is assumed that each local packet counter is a sufficiently sized
unsigned integer (probably 32b) and that the following constant has unsigned integer (probably 32b) and that the following constant has
been assigned: been assigned:
DIVACE = 2^3 DIVACE = 2^3
Every time a CE marked packet arrives, the Data Receiver increments Every time an Acceptable CE marked packet arrives (Section 3.2.2.2),
its local value of r.cep by 1. It repeats the same value of ACE in the Data Receiver increments its local value of r.cep by 1. It
every subsequent ACK until the next CE marking arrives, where repeats the same value of ACE in every subsequent ACK until the next
CE marking arrives, where
ACE = r.cep % DIVACE. ACE = r.cep % DIVACE.
If the Data Sender received an earlier value of the counter that had If the Data Sender received an earlier value of the counter that had
been delayed due to ACK reordering, it might incorrectly calculate been delayed due to ACK reordering, it might incorrectly calculate
that the ACE field had wrapped. Therefore, on the arrival of every that the ACE field had wrapped. Therefore, on the arrival of every
ACK, the Data Sender uses the TCP acknowledgement number and any SACK ACK, the Data Sender ensures the ACK has not been superseded using
options to calculate newlyAckedB, the amount of new data that the ACK the TCP acknowledgement number, any SACK options and timestamps (if
acknowledges. If newlyAckedB is negative it means that a more up to available) to calculate newlyAckedB, as in Appendix A.1. If the ACK
date ACK has already been processed, so this ACK has been superseded has not been superseded, the Data Sender calculates the minimum
and the Data Sender has to ignore the AccECN Option. If newlyAckedB difference d.cep between the ACE field and its local s.cep counter,
is zero, to break the tie the Data Sender could use timestamps (if using modulo arithmetic as follows:
present) to work out newlyAckedT, the amount of new time that the ACK
acknowledges. Then the Data Sender calculates the minimum difference
d.cep between the ACE field and its local s.cep counter, using modulo
arithmetic as follows:
if ((newlyAckedB > 0) || (newlyAckedB == 0 && newlyAckedT > 0)) if ((newlyAckedB > 0) || (newlyAckedT > 0))
d.cep = (ACE + DIVACE - (s.cep % DIVACE)) % DIVACE d.cep = (ACE + DIVACE - (s.cep % DIVACE)) % DIVACE
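The ACE encoding and the d.cep calculation can be sketched in Python in the same way. This is a hedged illustration of the newer column's condition, not a complete safety algorithm. The function names are hypothetical, and returning None for a superseded ACK is a choice made for this sketch only.

```python
from typing import Optional

DIVACE = 2 ** 3  # the ACE field is 3 bits wide


def encode_ace(r_cep: int) -> int:
    """Data Receiver: value written into the ACE field."""
    return r_cep % DIVACE


def min_cep_delta(s_cep: int, ace: int,
                  newly_acked_b: int, newly_acked_t: int = 0) -> Optional[int]:
    """Data Sender: minimum difference d.cep, or None if the ACK
    acknowledges neither new data nor new time (superseded)."""
    if newly_acked_b > 0 or newly_acked_t > 0:
        return (ace + DIVACE - (s_cep % DIVACE)) % DIVACE
    return None


# Sender believes 13 CE marks so far; receiver has actually counted 15.
print(min_cep_delta(13, encode_ace(15), newly_acked_b=2920))  # prints 2
```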
Section 3.2.5 requires the Data Sender to assume that the ACE field Section 3.2.2.5 expects the Data Sender to assume that the ACE field
did cycle if it could have cycled under prevailing conditions. The cycled if it is the safest likely case under prevailing conditions.
3-bit ACE field in an arriving ACK could have cycled and become The 3-bit ACE field in an arriving ACK could have cycled and become
ambiguous to the Data Sender if a row of ACKs goes missing that ambiguous to the Data Sender if a row of ACKs goes missing that
covers a stream of data long enough to contain 8 or more CE marks. covers a stream of data long enough to contain 8 or more CE marks.
We use the word `missing' rather than `lost', because some or all the We use the word `missing' rather than `lost', because some or all the
missing ACKs might arrive eventually, but out of order. Even if some missing ACKs might arrive eventually, but out of order. Even if some
of the lost ACKs are piggy-backed on data (i.e. not pure ACKs) of the missing ACKs were piggy-backed on data (i.e. not pure ACKs)
retransmissions will not repair the lost AccECN information, because retransmissions will not repair the lost AccECN information, because
AccECN requires retransmissions to carry the latest AccECN counters, AccECN requires retransmissions to carry the latest AccECN counters,
not the original ones. not the original ones.
The phrase `under prevailing conditions' allows the Data Sender to The phrase `under prevailing conditions' allows for implementation-
take account of the prevailing size of data segments and the dependent interpretation. A Data Sender might take account of the
prevailing CE marking rate just before the sequence of ACK losses. prevailing size of data segments and the prevailing CE marking rate
However, we shall start with the simplest algorithm, which assumes just before the sequence of missing ACKs. However, we shall start
segments are all full-sized and ultra-conservatively it assumes that with the simplest algorithm, which assumes segments are all full-
ECN marking was 100% on the forward path when ACKs on the reverse sized and ultra-conservatively it assumes that ECN marking was 100%
path started to all be dropped. Specifically, if newlyAckedB is the on the forward path when ACKs on the reverse path started to all be
amount of data that an ACK acknowledges since the previous ACK, then dropped. Specifically, if newlyAckedB is the amount of data that an
the Data Sender could assume that this acknowledges newlyAckedPkt ACK acknowledges since the previous ACK, then the Data Sender could
full-sized segments, where newlyAckedPkt = newlyAckedB/MSS. Then it assume that this acknowledges newlyAckedPkt full-sized segments,
could assume that the ACE field incremented by where newlyAckedPkt = newlyAckedB/MSS. Then it could assume that the
ACE field incremented by
dSafer.cep = newlyAckedPkt - ((newlyAckedPkt - d.cep) % DIVACE), dSafer.cep = newlyAckedPkt - ((newlyAckedPkt - d.cep) % DIVACE),
For example, imagine an ACK acknowledges newlyAckedPkt=9 more full- For example, imagine an ACK acknowledges newlyAckedPkt=9 more full-
size segments than any previous ACK, and that ACE increments by a size segments than any previous ACK, and that ACE increments by a
minimum of 2 CE marks (d.cep=2). The above formula works out that it minimum of 2 CE marks (d.cep=2). The above formula works out that it
would still be safe to assume 2 CE marks (because 9 - ((9-2) % 8) = would still be safe to assume 2 CE marks (because 9 - ((9-2) % 8) =
2). However, if ACE increases by a minimum of 2 but acknowledges 10 2). However, if ACE increases by a minimum of 2 but acknowledges 10
full-sized segments, then it would be necessary to assume that there full-sized segments, then it would be necessary to assume that there
could have been 10 CE marks (because 10 - ((10-2) % 8) = 10). could have been 10 CE marks (because 10 - ((10-2) % 8) = 10).
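The two worked examples above can be checked with a short Python sketch. It assumes DIVACE = 8, and the function name safer_ace_delta is hypothetical. The max() guard is also an assumption that is not in the draft: it covers the case newlyAckedPkt < d.cep, where Python's % on a negative operand would otherwise give a negative result.

```python
DIVACE = 8  # assumption: 2**3, the number of values of the 3-bit ACE field


def safer_ace_delta(newly_acked_pkt, d_cep):
    """Return dSafer.cep: the largest increase in CE-marked packets that is
    congruent to d.cep modulo DIVACE and no larger than newlyAckedPkt."""
    d_safer = newly_acked_pkt - ((newly_acked_pkt - d_cep) % DIVACE)
    # Assumed guard (not in the draft): never report less than d.cep.
    return max(d_cep, d_safer)


if __name__ == "__main__":
    print(safer_ace_delta(9, 2))   # 9 - ((9 - 2) % 8) = 2
    print(safer_ace_delta(10, 2))  # 10 - ((10 - 2) % 8) = 10
```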
ACKs that acknowledge a large stretch of packets might be common in
data centres to achieve a high packet rate or might be due to ACK
thinning by a middlebox. In these cases, cycling of the ACE field
would often appear to have been possible, so the above algorithm
would be over-conservative, leading to a false high marking rate and
poor performance. Therefore it would be reasonable to only use
dSafer.cep rather than d.cep if the moving average of newlyAckedPkt
was well below 8.
Implementers could build in more heuristics to estimate prevailing Implementers could build in more heuristics to estimate prevailing
average segment size and prevailing ECN marking. For instance, average segment size and prevailing ECN marking. For instance,
newlyAckedPkt in the above formula could be replaced with newlyAckedPkt in the above formula could be replaced with
newlyAckedPktHeur = newlyAckedPkt*p*MSS/s, where s is the prevailing newlyAckedPktHeur = newlyAckedPkt*p*MSS/s, where s is the prevailing
segment size and p is the prevailing ECN marking probability. segment size and p is the prevailing ECN marking probability.
However, ultimately, if TCP's ECN feedback becomes inaccurate it However, ultimately, if TCP's ECN feedback becomes inaccurate it
still has loss detection to fall back on. Therefore, it would seem still has loss detection to fall back on. Therefore, it would seem
safe to implement a simple algorithm, rather than a perfect one. safe to implement a simple algorithm, rather than a perfect one.
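As a rough sketch of the heuristic just described, the Python function below scales newlyAckedPkt by p*MSS/s. The function name is hypothetical, and how the result would be rounded before it is used in the dSafer.cep formula is left open here, because the text does not say.

```python
def heuristic_newly_acked_pkt(newly_acked_pkt, p, mss, s):
    """Return newlyAckedPktHeur = newlyAckedPkt * p * MSS / s, where s is the
    prevailing segment size and p the prevailing ECN marking probability."""
    return newly_acked_pkt * p * mss / s


if __name__ == "__main__":
    # Full-sized segments and 50% marking halve the packet count assumed.
    print(heuristic_newly_acked_pkt(20, 0.5, 1460, 1460))
```

Whenever p*MSS/s is 1 or less, the heuristic value does not exceed newlyAckedPkt, so using plain newlyAckedPkt is at least as conservative.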
The simple algorithm for dSafer.cep above requires no monitoring of The simple algorithm for dSafer.cep above requires no monitoring of
prevailing conditions and it would still be safe if, for example, prevailing conditions and it would still be safe if, for example,
segments were on average at least 5% of full-sized as long as ECN segments were on average at least 5% of full-sized as long as ECN
marking was 5% or less. Assuming it was used, the Data Sender would marking was 5% or less. Assuming it was used, the Data Sender would
increment its packet counter as follows: increment its packet counter as follows:
s.cep += dSafer.cep s.cep += dSafer.cep
If missing acknowledgement numbers arrive later (due to reordering), If missing acknowledgement numbers arrive later (due to reordering),
Section 3.2.5 says "the Data Sender MAY attempt to neutralise the Section 3.2.2.5 says "the Data Sender MAY attempt to neutralize the
effect of any action it took based on a conservative assumption that effect of any action it took based on a conservative assumption that
it later found to be incorrect". To do this, the Data Sender would it later found to be incorrect". To do this, the Data Sender would
have to store the values of all the relevant variables whenever it have to store the values of all the relevant variables whenever it
made assumptions, so that it could re-evaluate them later. Given made assumptions, so that it could re-evaluate them later. Given
this could become complex and it is not required, we do not attempt this could become complex and it is not required, we do not attempt
to provide an example of how to do this. to provide an example of how to do this.
A.2.2. Safety Algorithm with the AccECN Option A.2.2. Safety Algorithm with the AccECN Option
When the AccECN Option is available on the ACKs before and after the When the AccECN Option is available on the ACKs before and after the
possible sequence of ACK losses, if the Data Sender only needs CE- possible sequence of ACK losses, if the Data Sender only needs CE-
marked bytes, it will have sufficient information in the AccECN marked bytes, it will have sufficient information in the AccECN
Option without needing to process the ACE field. However, if for Option without needing to process the ACE field. If for some reason
some reason it needs CE-marked packets, if dSafer.cep is different it needs CE-marked packets, if dSafer.cep is different from d.cep, it
from d.cep, it can calculate the average marked segment size that can determine whether d.cep is likely to be a safe enough estimate by
each implies to determine whether d.cep is likely to be a safe enough checking whether the average marked segment size (s = d.ceb/d.cep) is
estimate. Specifically, it could use the following algorithm, where less than the MSS (where d.ceb is the amount of newly CE-marked bytes
d.ceb is the amount of newly CE-marked bytes (see Appendix A.1): - see Appendix A.1). Specifically, it could use the following
algorithm:
SAFETY_FACTOR = 2 SAFETY_FACTOR = 2
if (dSafer.cep > d.cep) { if (dSafer.cep > d.cep) {
s = d.ceb/d.cep if (d.ceb <= MSS * d.cep) { % Same as (s <= MSS), but avoids divide-by-zero
if (s <= MSS) {
sSafer = d.ceb/dSafer.cep sSafer = d.ceb/dSafer.cep
if (sSafer < MSS/SAFETY_FACTOR) if (sSafer < MSS/SAFETY_FACTOR)
dSafer.cep = d.cep % d.cep is a safe enough estimate dSafer.cep = d.cep % d.cep is a safe enough estimate
} % else } % else
% No need for else; dSafer.cep is already correct, % No need for else; dSafer.cep is already correct,
% because d.cep must have been too small % because d.cep must have been too small
} }
The chart below shows when the above algorithm will consider d.cep The chart below shows when the above algorithm will consider d.cep
can replace dSafer.cep as a safe enough estimate of the number of CE- can replace dSafer.cep as a safe enough estimate of the number of CE-
marked packets: marked packets:
      ^                                          ^
sSafer|                                    sSafer|
      |                                          |
   MSS+                                       MSS+
      |                                          |
      |         dSafer.cep                       |         dSafer.cep
      |                is                        |                is
 MSS/2+--------------+ safest   MSS/SAFETY_FACTOR+--------------+ safest
      |              |                           |              |
      | d.cep is safe|                           | d.cep is safe|
      |    enough    |                           |    enough    |
      +-------------------->                     +-------------------->
                    MSS   s                                    MSS   s
The following examples give the reasoning behind the algorithm, The following examples give the reasoning behind the algorithm,
assuming MSS=1,460 [B]: assuming MSS=1460 [B]:
o if d.cep=0, dSafer.cep=8 and d.ceb=1,460, then s=infinity and o if d.cep=0, dSafer.cep=8 and d.ceb=1460, then s=infinity and
sSafer=182.5. sSafer=182.5.
Therefore even though the average size of 8 data segments is Therefore even though the average size of 8 data segments is
unlikely to have been as small as MSS/8, d.cep cannot have been unlikely to have been as small as MSS/8, d.cep cannot have been
correct, because it would imply an average segment size greater correct, because it would imply an average segment size greater
than the MSS. than the MSS.
o if d.cep=2, dSafer.cep=10 and d.ceb=1,460, then s=730 and o if d.cep=2, dSafer.cep=10 and d.ceb=1460, then s=730 and
sSafer=146. sSafer=146.
Therefore d.cep is safe enough, because the average size of 10 Therefore d.cep is safe enough, because the average size of 10
data segments is unlikely to have been as small as MSS/10. data segments is unlikely to have been as small as MSS/10.
o if d.cep=7, dSafer.cep=15 and d.ceb=10,200, then s=1,457 and o if d.cep=7, dSafer.cep=15 and d.ceb=10200, then s=1457 and
sSafer=680. sSafer=680.
Therefore d.cep is safe enough, because the average data segment Therefore d.cep is safe enough, because the average data segment
size is more likely to have been just less than one MSS, rather size is more likely to have been just less than one MSS, rather
than below MSS/2. than below MSS/2.
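The three examples above can be reproduced with the following Python sketch of the safety algorithm. The function name choose_cep_delta is hypothetical. The test d.ceb <= MSS * d.cep follows the -10 form of the pseudocode, which avoids dividing by d.cep when it is zero.

```python
SAFETY_FACTOR = 2


def choose_cep_delta(d_cep, d_safer_cep, d_ceb, mss):
    """Return d.cep when it is a safe enough estimate of newly CE-marked
    packets, otherwise return dSafer.cep."""
    if d_safer_cep > d_cep:
        # Same as (d.ceb / d.cep <= MSS), written without a division.
        if d_ceb <= mss * d_cep:
            # d_safer_cep > d_cep >= 0 here, so this division is safe.
            s_safer = d_ceb / d_safer_cep
            if s_safer < mss / SAFETY_FACTOR:
                return d_cep  # d.cep is a safe enough estimate
    return d_safer_cep


if __name__ == "__main__":
    MSS = 1460
    print(choose_cep_delta(0, 8, 1460, MSS))    # d.cep=0 implies s > MSS
    print(choose_cep_delta(2, 10, 1460, MSS))   # sSafer = 146
    print(choose_cep_delta(7, 15, 10200, MSS))  # sSafer = 680
```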
If pure ACKs were allowed to be ECN-capable, missing ACKs would be If pure ACKs were allowed to be ECN-capable, missing ACKs would be
far less likely. However, because [RFC3168] currently precludes far less likely. However, because [RFC3168] currently precludes
this, the above algorithm assumes that pure ACKs are not ECN-capable. this, the above algorithm assumes that pure ACKs are not ECN-capable.
A.3. Example Algorithm to Estimate Marked Bytes from Marked Packets A.3. Example Algorithm to Estimate Marked Bytes from Marked Packets
If the AccECN Option is not available, the Data Sender can only If the AccECN Option is not available, the Data Sender can only
decode CE-marking from the ACE field in packets. Every time an ACK decode CE-marking from the ACE field in packets. Every time an ACK
arrives, to convert this into an estimate of CE-marked bytes, it arrives, to convert this into an estimate of CE-marked bytes, it
needs an average of the segment size, s_ave. Then it can add or needs an average of the segment size, s_ave. Then it can add or
subtract s_ave from the value of d.ceb as the value of d.cep subtract s_ave from the value of d.ceb as the value of d.cep
increments or decrements. increments or decrements. Some possible ways to calculate s_ave are
outlined below. The precise details will depend on why an estimate
of marked bytes is needed.
To calculate s_ave, it could keep a record of the byte numbers of all The implementation could keep a record of the byte numbers of all the
the boundaries between packets in flight (including control packets), boundaries between packets in flight (including control packets), and
and recalculate s_ave on every ACK. However it would be simpler to recalculate s_ave on every ACK. However it would be simpler to
merely maintain a counter packets_in_flight for the number of packets merely maintain a counter packets_in_flight for the number of packets
in flight (including control packets), which it could update once per in flight (including control packets), which is reset once per RTT.
RTT. Either way, it would estimate s_ave as: Either way, it would estimate s_ave as:
s_ave ~= flightsize / packets_in_flight, s_ave ~= flightsize / packets_in_flight,
where flightsize is the variable that TCP already maintains for the where flightsize is the variable that TCP already maintains for the
number of bytes in flight. To avoid floating point arithmetic, it number of bytes in flight. To avoid floating point arithmetic, it
could right-bit-shift by lg(packets_in_flight), where lg() means log could right-bit-shift by lg(packets_in_flight), where lg() means log
base 2. base 2.
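A minimal Python sketch of this estimate is given below, with hypothetical function names. It assumes lg() is rounded down to an integer, which the text does not specify. Under that assumption the shift equals the division only when packets_in_flight is a power of two; otherwise it overestimates s_ave.

```python
def s_ave_exact(flightsize, packets_in_flight):
    """Integer estimate s_ave ~= flightsize / packets_in_flight."""
    if packets_in_flight <= 0:
        return 0  # assumed behaviour when nothing is in flight
    return flightsize // packets_in_flight


def s_ave_shift(flightsize, packets_in_flight):
    """Approximate s_ave with a right shift by floor(lg(packets_in_flight))."""
    if packets_in_flight <= 0:
        return 0  # assumed behaviour when nothing is in flight
    shift = packets_in_flight.bit_length() - 1  # floor(log2(n)) for n >= 1
    return flightsize >> shift


if __name__ == "__main__":
    print(s_ave_exact(11680, 8), s_ave_shift(11680, 8))    # equal: 8 is 2**3
    print(s_ave_exact(14600, 10), s_ave_shift(14600, 10))  # shift overestimates
```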
An alternative would be to maintain an exponentially weighted moving An alternative would be to maintain an exponentially weighted moving
average (EWMA) of the segment size: average (EWMA) of the segment size:
s_ave = a * s + (1-a) * s_ave, s_ave = a * s + (1-a) * s_ave,
where a is the decay constant for the EWMA. However, then it is where a is the decay constant for the EWMA. However, then it is
necessary to choose a good value for this constant, which ought to necessary to choose a good value for this constant, which ought to
depend on the number of packets in flight. Also the decay constant depend on the number of packets in flight. Also the decay constant
needs to be a power of two to avoid floating point arithmetic. needs to be a power of two to avoid floating point arithmetic.
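One way to keep the EWMA in integer arithmetic is sketched below in Python. It assumes a decay constant of the form a = 1/2^k, so that multiplying by a becomes a right shift by k. The name EWMA_SHIFT and its value of 3 (a = 1/8) are illustrative assumptions, not values from the draft.

```python
EWMA_SHIFT = 3  # assumption: a = 1 / 2**3 = 1/8


def update_s_ave(s_ave, s, shift=EWMA_SHIFT):
    """Integer form of s_ave = a * s + (1 - a) * s_ave with a = 1 / 2**shift,
    rewritten as s_ave + a * (s - s_ave)."""
    # Python's >> on a negative int rounds towards minus infinity.
    return s_ave + ((s - s_ave) >> shift)


if __name__ == "__main__":
    s_ave = 1000
    for seg in (1460, 1460, 1460):
        s_ave = update_s_ave(s_ave, seg)
    print(s_ave)  # moves gradually from 1000 towards 1460
```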
A.4. Example Algorithm to Beacon AccECN Options A.4. Example Algorithm to Beacon AccECN Options
Section 3.2.8 requires a Data Receiver to beacon a full-length AccECN Section 3.2.3.3 requires a Data Receiver to beacon a full-length
Option at least 3 times per RTT. This could be implemented by AccECN Option at least 3 times per RTT. This could be implemented by
maintaining a variable to store the number of ACKs (pure and data maintaining a variable to store the number of ACKs (pure and data
ACKs) since a full AccECN Option was last sent and another for the ACKs) since a full AccECN Option was last sent and another for the
approximate number of ACKs sent in the last round trip time: approximate number of ACKs sent in the last round trip time:
if (acks_since_full_last_sent > acks_in_round / BEACON_FREQ) if (acks_since_full_last_sent > acks_in_round / BEACON_FREQ)
send_full_AccECN_Option() send_full_AccECN_Option()
For optimised integer arithmetic, BEACON_FREQ = 4 could be used, For optimized integer arithmetic, BEACON_FREQ = 4 could be used,
rather than 3, so that the division could be implemented as an rather than 3, so that the division could be implemented as an
integer right bit-shift by lg(BEACON_FREQ). integer right bit-shift by lg(BEACON_FREQ).
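The beacon test with BEACON_FREQ = 4 could look like the following Python sketch. The function name and the BEACON_SHIFT constant are hypothetical; the counters are the two variables described above.

```python
BEACON_FREQ = 4
BEACON_SHIFT = 2  # lg(BEACON_FREQ)


def should_send_full_option(acks_since_full_last_sent, acks_in_round):
    """Return True when a full-length AccECN Option is due, using a right
    shift in place of the division by BEACON_FREQ."""
    return acks_since_full_last_sent > (acks_in_round >> BEACON_SHIFT)


if __name__ == "__main__":
    # With 12 ACKs per round, a full option is due after more than 3 ACKs.
    print(should_send_full_option(3, 12), should_send_full_option(4, 12))
```

With very few ACKs per round the shifted threshold is zero, so the test is true on every ACK sent after a full option.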
In certain operating systems, it might be too complex to maintain In certain operating systems, it might be too complex to maintain
acks_in_round. In others it might be possible by tagging each data acks_in_round. In others it might be possible by tagging each data
segment in the retransmit buffer with the number of ACKs sent at the segment in the retransmit buffer with the number of ACKs sent at the
point that segment was sent. This would not work well if the Data point that segment was sent. This would not work well if the Data
Receiver was not sending data itself, in which case it might be Receiver was not sending data itself, in which case it might be
necessary to beacon based on time instead, as follows: necessary to beacon based on time instead, as follows:
skipping to change at page 43, line 26 skipping to change at page 53, line 45
the amount of newly ACKed data and the sum of the bytes with the the amount of newly ACKed data and the sum of the bytes with the
other three markings, d.ceb, d.e0b and d.e1b. Note that, because other three markings, d.ceb, d.e0b and d.e1b. Note that, because
r.e0b is initialized to 1 and the other two counters are initialized r.e0b is initialized to 1 and the other two counters are initialized
to 0, the initial sum will be 1, which matches the initial offset of to 0, the initial sum will be 1, which matches the initial offset of
the TCP sequence number on completion of the 3WHS. the TCP sequence number on completion of the 3WHS.
For this approach to be precise, it has to be assumed that spurious For this approach to be precise, it has to be assumed that spurious
(unnecessary) retransmissions do not lead to double counting. This (unnecessary) retransmissions do not lead to double counting. This
assumption is currently correct, given that RFC 3168 requires that assumption is currently correct, given that RFC 3168 requires that
the Data Sender marks retransmitted segments as Not-ECT. However, the Data Sender marks retransmitted segments as Not-ECT. However,
the converse is not true; necessary transmissions will result in the converse is not true; necessary retransmissions will result in
under-counting. under-counting.
However, such precision is unlikely to be necessary. The only known However, such precision is unlikely to be necessary. The only known
use of a count of Not-ECT marked bytes is to test whether equipment use of a count of Not-ECT marked bytes is to test whether equipment
on the path is clearing the ECN field (perhaps due to an out-dated on the path is clearing the ECN field (perhaps due to an out-dated
attempt to clear, or bleach, what used to be the ToS field). To attempt to clear, or bleach, what used to be the ToS field). To
detect bleaching it will be sufficient to detect whether nearly all detect bleaching it will be sufficient to detect whether nearly all
bytes arrive marked as Not-ECT. Therefore there should be no need to bytes arrive marked as Not-ECT. Therefore there should be no need to
keep track of the details of retransmissions. keep track of the details of retransmissions.
Appendix B. Rationale for Usage of TCP Header Flags Appendix B. Rationale for Usage of TCP Header Flags
B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake
AccECN uses a rather unorthodox but justified approach to negotiate AccECN uses a rather unorthodox approach to negotiate the highest
the highest version TCP ECN feedback scheme that both ends support. version TCP ECN feedback scheme that both ends support, as justified
It follows from the original TCP ECN capability negotiation below. It follows from the original TCP ECN capability negotiation
[RFC3168], in which the client set the 2 least significant reserved [RFC3168], in which the client set the 2 least significant of the
flags in the TCP header, and fell back to no ECN support if the original reserved flags in the TCP header, and fell back to no ECN
server responded with the 2 flags cleared, which had previously been support if the server responded with the 2 flags cleared, which had
the default. It is not recorded why ECN originally used this previously been the default.
approach instead of the more orthodox use of a TCP option.
ECN originally used header flags rather than a TCP option because it
was considered more efficient to use a header flag for 1 bit of
feedback per ACK, and this bit could be overloaded to indicate
support for ECN during the handshake. During the development of ECN,
1 bit crept up to 2, in order to deliver the feedback reliably and to
work round some broken hosts that reflected the reserved flags during
the handshake.
In order to be backward compatible with RFC 3168, AccECN continues In order to be backward compatible with RFC 3168, AccECN continues
this approach, using the 3rd least significant TCP header flag that this approach, using the 3rd least significant TCP header flag that
had previously been allocated for the ECN nonce (now historic). had previously been allocated for the ECN nonce (now historic).
Then, whatever form of server an AccECN client encounters, the Then, whatever form of server an AccECN client encounters, the
connection can fall back to the highest version of feedback protocol connection can fall back to the highest version of feedback protocol
that both ends support, as explained in Section 3.1. that both ends support, as explained in Section 3.1.
If AccECN had used the more orthodox approach of a TCP option, it If AccECN had used the more orthodox approach of a TCP option, it
would still have had to set the two ECN flags in the main TCP header, would still have had to set the two ECN flags in the main TCP header,
in order to be able to fall back to Classic RFC 3168 ECN, or to in order to be able to fall back to Classic RFC 3168 ECN, or to
disable ECN support, without another round of negotiation. Then disable ECN support, without another round of negotiation. Then
AccECN would also have had to handle all the different ways that AccECN would also have had to handle all the different ways that
servers currently respond to settings of the ECN flags in the main servers currently respond to settings of the ECN flags in the main
skipping to change at page 45, line 34 skipping to change at page 56, line 13
pattern tagged as 'Nonce', and a small but more significant number pattern tagged as 'Nonce', and a small but more significant number
arrive with the pattern tagged as 'Broken'. The 'Nonce' pattern arrive with the pattern tagged as 'Broken'. The 'Nonce' pattern
could be a sign that a few servers have implemented the ECN Nonce could be a sign that a few servers have implemented the ECN Nonce
[RFC3540], which has now been reclassified as historic [RFC8311], [RFC3540], which has now been reclassified as historic [RFC8311],
or it could be the random result of some unknown middlebox or it could be the random result of some unknown middlebox
behaviour. The greater prevalence of the 'Broken' pattern behaviour. The greater prevalence of the 'Broken' pattern
suggests that some instances still exist of the broken code that suggests that some instances still exist of the broken code that
reflects the reserved flags on the SYN. reflects the reserved flags on the SYN.
The requirement not to reject unexpected initial values of the ACE The requirement not to reject unexpected initial values of the ACE
counter (in the main TCP header) in the last para of Section 3.2.3 counter (in the main TCP header) in the last para of
ensures that 3 unused codepoints on the final ACK of the 3WHS and Section 3.2.2.3 ensures that 3 unused codepoints on the ACK of the
7 unused values on the first data packet from the server could be SYN/ACK, 6 unused values on the first SYN=0 data packet from the
used to declare future variants of the AccECN protocol. The word client and 7 unused values on the first SYN=0 data packet from the
'declare' is used rather than 'negotiate' because, at this late server could be used to declare future variants of the AccECN
stage in the 3WHS, it would be too late for a negotiation between protocol. The word 'declare' is used rather than 'negotiate'
the endpoints to be completed. A similar requirement not to because, at this late stage in the 3WHS, it would be too late for
reject unexpected initial values in the TCP option a negotiation between the endpoints to be completed. A similar
(Section 3.2.7.4) is for the same purpose. If traversal of the requirement not to reject unexpected initial values in the TCP
TCP option were reliable, this would have enabled a far wider option (Section 3.2.3.2.4) is for the same purpose. If traversal
range of future variation of the whole AccECN protocol. of the TCP option were reliable, this would have enabled a far
wider range of future variation of the whole AccECN protocol.
Nonetheless, it could be used to reliably negotiate a wide range Nonetheless, it could be used to reliably negotiate a wide range
of variation in the semantics of the AccECN Option. of variation in the semantics of the AccECN Option.
Future non-AccECN variants: Five codepoints out of the 8 possible in Future non-AccECN variants: Five codepoints out of the 8 possible in
the 3 TCP header flags used by AccECN are unused on the initial the 3 TCP header flags used by AccECN are unused on the initial
SYN (in the order AE,CWR,ECE): 001, 010, 100, 101, 110. SYN (in the order AE,CWR,ECE): 001, 010, 100, 101, 110.
Section 3.1.2 ensures that the installed base of AccECN servers Section 3.1.3 ensures that the installed base of AccECN servers
will all assume these are equivalent to AccECN negotiation with will all assume these are equivalent to AccECN negotiation with
111 on the SYN. These codepoints would not allow fall-back to 111 on the SYN. These codepoints would not allow fall-back to
Classic ECN support for a server that did not understand them, but Classic ECN support for a server that did not understand them, but
this approach ensures they are available in future, perhaps for this approach ensures they are available in future, perhaps for
uses other than ECN alongside the AccECN scheme. All possible uses other than ECN alongside the AccECN scheme. All possible
combinations of SYN/ACK could be used in response except either combinations of SYN/ACK could be used in response except either
000 or reflection of the same values sent on the SYN. 000 or reflection of the same values sent on the SYN.
Of course, other ways could be resorted to in order to extend Of course, other ways could be resorted to in order to extend
AccECN or ECN in future, although their traversal properties are AccECN or ECN in future, although their traversal properties are
likely to be inferior. They include a new TCP option; using the likely to be inferior. They include a new TCP option; using the
remaining reserved flags in the main TCP header (preferably remaining reserved flags in the main TCP header (preferably
extending the 3-bit combinations used by AccECN to 4-bit extending the 3-bit combinations used by AccECN to 4-bit
combinations, rather than burning one bit for just one state); a combinations, rather than burning one bit for just one state); a
non-zero urgent pointer in combination with the URG flag cleared; non-zero urgent pointer in combination with the URG flag cleared;
or some other unexpected combination of fields yet to be invented. or some other unexpected combination of fields yet to be invented.
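The five unused SYN codepoints listed under "Future non-AccECN variants" above can be derived by enumeration, as in the Python sketch below. It assumes that three codepoints already have a meaning on the initial SYN, in the order AE,CWR,ECE: 000 (no ECN), 011 (Classic RFC 3168 ECN) and 111 (AccECN). The names used are hypothetical.

```python
from itertools import product

# Assumption: SYN codepoints (AE, CWR, ECE) that already have a meaning.
USED_ON_SYN = {
    "000": "no ECN support requested",
    "011": "Classic ECN (RFC 3168) requested",
    "111": "AccECN requested",
}


def unused_syn_codepoints():
    """List the 3-bit (AE, CWR, ECE) codepoints with no meaning on the SYN."""
    all_codepoints = ("".join(bits) for bits in product("01", repeat=3))
    return [cp for cp in all_codepoints if cp not in USED_ON_SYN]


if __name__ == "__main__":
    print(unused_syn_codepoints())
```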
Authors' Addresses Authors' Addresses
Bob Briscoe Bob Briscoe
CableLabs Independent
UK UK
EMail: ietf@bobbriscoe.net EMail: ietf@bobbriscoe.net
URI: http://bobbriscoe.net/ URI: http://bobbriscoe.net/
Mirja Kuehlewind Mirja Kuehlewind
Ericsson Ericsson
Germany Germany
EMail: ietf@kuehlewind.net EMail: ietf@kuehlewind.net
 End of changes. 164 change blocks. 
468 lines changed or deleted 934 lines changed or added
