draft-ietf-tcpm-accurate-ecn-03.txt   draft-ietf-tcpm-accurate-ecn-04.txt 
TCP Maintenance & Minor Extensions (tcpm) B. Briscoe TCP Maintenance & Minor Extensions (tcpm) B. Briscoe
Internet-Draft Simula Research Laboratory Internet-Draft CableLabs
Intended status: Experimental M. Kuehlewind Intended status: Experimental M. Kuehlewind
Expires: December 1, 2017 ETH Zurich Expires: May 3, 2018 ETH Zurich
R. Scheffenegger R. Scheffenegger
May 30, 2017 October 30, 2017
More Accurate ECN Feedback in TCP More Accurate ECN Feedback in TCP
draft-ietf-tcpm-accurate-ecn-03 draft-ietf-tcpm-accurate-ecn-04
Abstract Abstract
Explicit Congestion Notification (ECN) is a mechanism where network Explicit Congestion Notification (ECN) is a mechanism where network
nodes can mark IP packets instead of dropping them to indicate nodes can mark IP packets instead of dropping them to indicate
incipient congestion to the end-points. Receivers with an ECN- incipient congestion to the end-points. Receivers with an ECN-
capable transport protocol feed back this information to the sender. capable transport protocol feed back this information to the sender.
ECN is specified for TCP in such a way that only one feedback signal ECN is specified for TCP in such a way that only one feedback signal
can be transmitted per Round-Trip Time (RTT). Recently, new TCP can be transmitted per Round-Trip Time (RTT). Recently, new TCP
mechanisms like Congestion Exposure (ConEx) or Data Center TCP mechanisms like Congestion Exposure (ConEx) or Data Center TCP
skipping to change at page 1, line 37 skipping to change at page 1, line 37
additional information in a new TCP option. additional information in a new TCP option.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on December 1, 2017. This Internet-Draft will expire on May 3, 2018.
Copyright Notice Copyright Notice
Copyright (c) 2017 IETF Trust and the persons identified as the Copyright (c) 2017 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Document Roadmap . . . . . . . . . . . . . . . . . . . . 4 1.1. Document Roadmap . . . . . . . . . . . . . . . . . . . . 4
1.2. Goals . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.2. Goals . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3. Experiment Goals . . . . . . . . . . . . . . . . . . . . 5 1.3. Experiment Goals . . . . . . . . . . . . . . . . . . . . 5
1.4. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6 1.4. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6
1.5. Recap of Existing ECN feedback in IP/TCP . . . . . . . . 6 1.5. Recap of Existing ECN feedback in IP/TCP . . . . . . . . 6
2. AccECN Protocol Overview and Rationale . . . . . . . . . . . 7 2. AccECN Protocol Overview and Rationale . . . . . . . . . . . 7
2.1. Capability Negotiation . . . . . . . . . . . . . . . . . 8 2.1. Capability Negotiation . . . . . . . . . . . . . . . . . 8
2.2. Feedback Mechanism . . . . . . . . . . . . . . . . . . . 9 2.2. Feedback Mechanism . . . . . . . . . . . . . . . . . . . 9
2.3. Delayed ACKs and Resilience Against ACK Loss . . . . . . 9 2.3. Delayed ACKs and Resilience Against ACK Loss . . . . . . 9
2.4. Feedback Metrics . . . . . . . . . . . . . . . . . . . . 10 2.4. Feedback Metrics . . . . . . . . . . . . . . . . . . . . 10
2.5. Generic (Dumb) Reflector . . . . . . . . . . . . . . . . 10 2.5. Generic (Dumb) Reflector . . . . . . . . . . . . . . . . 11
3. AccECN Protocol Specification . . . . . . . . . . . . . . . . 11 3. AccECN Protocol Specification . . . . . . . . . . . . . . . . 11
3.1. Negotiating to use AccECN . . . . . . . . . . . . . . . . 11 3.1. Negotiating to use AccECN . . . . . . . . . . . . . . . . 12
3.1.1. Negotiation during the TCP handshake . . . . . . . . 11 3.1.1. Negotiation during the TCP handshake . . . . . . . . 12
3.1.2. Retransmission of the SYN . . . . . . . . . . . . . . 14 3.1.2. Retransmission of the SYN . . . . . . . . . . . . . . 14
3.2. AccECN Feedback . . . . . . . . . . . . . . . . . . . . . 15 3.2. AccECN Feedback . . . . . . . . . . . . . . . . . . . . . 15
3.2.1. The ACE Field . . . . . . . . . . . . . . . . . . . . 15 3.2.1. Initialization of Feedback Counters at the Data
3.2.2. Testing for Zeroing of the ACE Field . . . . . . . . 16 Sender . . . . . . . . . . . . . . . . . . . . . . . 15
3.2.3. Safety against Ambiguity of the ACE Field . . . . . . 17 3.2.2. The ACE Field . . . . . . . . . . . . . . . . . . . . 16
3.2.4. The AccECN Option . . . . . . . . . . . . . . . . . . 17 3.2.3. Testing for Zeroing of the ACE Field . . . . . . . . 17
3.2.5. Path Traversal of the AccECN Option . . . . . . . . . 19 3.2.4. Testing for Mangling of the IP/ECN Field . . . . . . 18
3.2.6. Usage of the AccECN TCP Option . . . . . . . . . . . 22 3.2.5. Safety against Ambiguity of the ACE Field . . . . . . 19
3.2.6. The AccECN Option . . . . . . . . . . . . . . . . . . 19
3.2.7. Path Traversal of the AccECN Option . . . . . . . . . 21
3.2.8. Usage of the AccECN TCP Option . . . . . . . . . . . 24
3.3. AccECN Compliance by TCP Proxies, Offload Engines and 3.3. AccECN Compliance by TCP Proxies, Offload Engines and
other Middleboxes . . . . . . . . . . . . . . . . . . . . 23 other Middleboxes . . . . . . . . . . . . . . . . . . . . 26
4. Interaction with Other TCP Variants . . . . . . . . . . . . . 24 4. Interaction with Other TCP Variants . . . . . . . . . . . . . 26
4.1. Compatibility with SYN Cookies . . . . . . . . . . . . . 24 4.1. Compatibility with SYN Cookies . . . . . . . . . . . . . 26
4.2. Compatibility with Other TCP Options and Experiments . . 25 4.2. Compatibility with Other TCP Options and Experiments . . 27
4.3. Compatibility with Feedback Integrity Mechanisms . . . . 25 4.3. Compatibility with Feedback Integrity Mechanisms . . . . 27
5. Protocol Properties . . . . . . . . . . . . . . . . . . . . . 26 5. Protocol Properties . . . . . . . . . . . . . . . . . . . . . 28
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 28 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 30
7. Security Considerations . . . . . . . . . . . . . . . . . . . 29 7. Security Considerations . . . . . . . . . . . . . . . . . . . 31
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 29 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 32
9. Comments Solicited . . . . . . . . . . . . . . . . . . . . . 30 9. Comments Solicited . . . . . . . . . . . . . . . . . . . . . 32
10. References . . . . . . . . . . . . . . . . . . . . . . . . . 30 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 32
10.1. Normative References . . . . . . . . . . . . . . . . . . 30 10.1. Normative References . . . . . . . . . . . . . . . . . . 33
10.2. Informative References . . . . . . . . . . . . . . . . . 30 10.2. Informative References . . . . . . . . . . . . . . . . . 33
Appendix A. Example Algorithms . . . . . . . . . . . . . . . . . 33 Appendix A. Example Algorithms . . . . . . . . . . . . . . . . . 36
A.1. Example Algorithm to Encode/Decode the AccECN Option . . 33 A.1. Example Algorithm to Encode/Decode the AccECN Option . . 36
A.2. Example Algorithm for Safety Against Long Sequences of A.2. Example Algorithm for Safety Against Long Sequences of
ACK Loss . . . . . . . . . . . . . . . . . . . . . . . . 34 ACK Loss . . . . . . . . . . . . . . . . . . . . . . . . 37
A.2.1. Safety Algorithm without the AccECN Option . . . . . 34 A.2.1. Safety Algorithm without the AccECN Option . . . . . 37
A.2.2. Safety Algorithm with the AccECN Option . . . . . . . 36 A.2.2. Safety Algorithm with the AccECN Option . . . . . . . 39
A.3. Example Algorithm to Estimate Marked Bytes from Marked A.3. Example Algorithm to Estimate Marked Bytes from Marked
Packets . . . . . . . . . . . . . . . . . . . . . . . . . 37 Packets . . . . . . . . . . . . . . . . . . . . . . . . . 40
A.4. Example Algorithm to Beacon AccECN Options . . . . . . . 38 A.4. Example Algorithm to Beacon AccECN Options . . . . . . . 41
A.5. Example Algorithm to Count Not-ECT Bytes . . . . . . . . 39 A.5. Example Algorithm to Count Not-ECT Bytes . . . . . . . . 42
Appendix B. Alternative Design Choices (To Be Removed Before Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 42
Publication) . . . . . . . . . . . . . . . . . . . . 39
Appendix C. Open Protocol Design Issues (To Be Removed Before
Publication) . . . . . . . . . . . . . . . . . . . . 40
Appendix D. Changes in This Version (To Be Removed Before
Publication) . . . . . . . . . . . . . . . . . . . . 40
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 40
1. Introduction 1. Introduction
Explicit Congestion Notification (ECN) [RFC3168] is a mechanism where Explicit Congestion Notification (ECN) [RFC3168] is a mechanism where
network nodes can mark IP packets instead of dropping them to network nodes can mark IP packets instead of dropping them to
indicate incipient congestion to the end-points. Receivers with an indicate incipient congestion to the end-points. Receivers with an
ECN-capable transport protocol feed back this information to the ECN-capable transport protocol feed back this information to the
sender. ECN is specified for TCP in such a way that only one sender. ECN is specified for TCP in such a way that only one
feedback signal can be transmitted per Round-Trip Time (RTT). feedback signal can be transmitted per Round-Trip Time (RTT).
Recently, proposed mechanisms like Congestion Exposure (ConEx Recently, proposed mechanisms like Congestion Exposure (ConEx
[RFC7713]), DCTCP [I-D.ietf-tcpm-dctcp] or L4S [RFC7713]), DCTCP [RFC8257] or L4S [I-D.ietf-tsvwg-l4s-arch] need
[I-D.ietf-tsvwg-l4s-arch] need more accurate ECN feedback information more accurate ECN feedback information whenever more than one marking
whenever more than one marking is received in one RTT. A fuller is received in one RTT. A fuller treatment of the motivation for
treatment of the motivation for this specification is given in the this specification is given in the associated requirements document
associated requirements document [RFC7560]. [RFC7560].
This document specifies an experimental scheme for ECN feedback in This document specifies an experimental scheme for ECN feedback in
the TCP header to provide more than one feedback signal per RTT. It the TCP header to provide more than one feedback signal per RTT. It
will be called the more accurate ECN feedback scheme, or AccECN for will be called the more accurate ECN feedback scheme, or AccECN for
short. If AccECN progresses from experimental to the standards short. If AccECN progresses from experimental to the standards
track, it is intended to be a complete replacement for classic ECN track, it is intended to be a complete replacement for classic ECN
feedback, not a fork in the design of TCP. Thus, the applicability feedback, not a fork in the design of TCP. Thus, the applicability
of AccECN is intended to include all public and private IP networks of AccECN is intended to include all public and private IP networks
(and even any non-IP networks over which TCP is used today). Until (and even any non-IP networks over which TCP is used today). Until
the AccECN experiment succeeds, [RFC3168] will remain as the the AccECN experiment succeeds, [RFC3168] will remain as the
skipping to change at page 4, line 21 skipping to change at page 4, line 18
the two ends use the three ECN-related flags in the TCP header to the two ends use the three ECN-related flags in the TCP header to
negotiate the most advanced feedback protocol that they can both negotiate the most advanced feedback protocol that they can both
support. support.
AccECN is solely an (experimental) change to the TCP wire protocol; AccECN is solely an (experimental) change to the TCP wire protocol;
it only specifies the negotiation and signaling of more accurate ECN it only specifies the negotiation and signaling of more accurate ECN
feedback from a TCP Data Receiver to a Data Sender. It is completely feedback from a TCP Data Receiver to a Data Sender. It is completely
independent of how TCP might respond to congestion feedback, which is independent of how TCP might respond to congestion feedback, which is
out of scope. For that we refer to [RFC3168] or any RFC that out of scope. For that we refer to [RFC3168] or any RFC that
specifies a different response to TCP ECN feedback, for example: specifies a different response to TCP ECN feedback, for example:
[I-D.ietf-tcpm-dctcp]; or the ECN experiments referred to in [RFC8257]; or the ECN experiments referred to in
[I-D.ietf-tsvwg-ecn-experimentation], namely: a TCP-based Low Latency [I-D.ietf-tsvwg-ecn-experimentation], namely: a TCP-based Low Latency
Low Loss Scalable (L4S) congestion control [I-D.ietf-tsvwg-l4s-arch]; Low Loss Scalable (L4S) congestion control [I-D.ietf-tsvwg-l4s-arch];
ECN-capable TCP control packets [I-D.bagnulo-tcpm-generalized-ecn], ECN-capable TCP control packets [I-D.ietf-tcpm-generalized-ecn], or
or Alternative Backoff with ECN (ABE) Alternative Backoff with ECN (ABE)
[I-D.ietf-tcpm-alternativebackoff-ecn]. [I-D.ietf-tcpm-alternativebackoff-ecn].
It is likely (but not required) that the AccECN protocol will be It is likely (but not required) that the AccECN protocol will be
implemented along with the following experimental additions to the implemented along with the following experimental additions to the
TCP-ECN protocol: ECN-capable TCP control packets and retransmissions TCP-ECN protocol: ECN-capable TCP control packets and retransmissions
[I-D.bagnulo-tcpm-generalized-ecn], which includes the ECN-capable [I-D.ietf-tcpm-generalized-ecn], which includes the ECN-capable SYN/
SYN-ACK experiment [RFC5562]; and testing receiver non-compliance ACK experiment [RFC5562]; and testing receiver non-compliance
[I-D.moncaster-tcpm-rcv-cheat]. [I-D.moncaster-tcpm-rcv-cheat].
1.1. Document Roadmap 1.1. Document Roadmap
The following introductory sections outline the goals of AccECN The following introductory sections outline the goals of AccECN
(Section 1.2) and the goal of experiments with ECN (Section 1.3) so (Section 1.2) and the goal of experiments with ECN (Section 1.3) so
that it is clear what success would look like. Then terminology is that it is clear what success would look like. Then terminology is
defined (Section 1.4) and a recap of existing prerequisite technology defined (Section 1.4) and a recap of existing prerequisite technology
is given (Section 1.5). is given (Section 1.5).
skipping to change at page 5, line 36 skipping to change at page 5, line 33
TCP is critical to the robust functioning of the Internet, therefore TCP is critical to the robust functioning of the Internet, therefore
any proposed modifications to TCP need to be thoroughly tested. The any proposed modifications to TCP need to be thoroughly tested. The
present specification describes an experimental protocol that adds present specification describes an experimental protocol that adds
more accurate ECN feedback to the TCP protocol. The intention is to more accurate ECN feedback to the TCP protocol. The intention is to
specify the protocol sufficiently so that more than one specify the protocol sufficiently so that more than one
implementation can be built in order to test its function, robustness implementation can be built in order to test its function, robustness
and interoperability (with itself and with previous versions of ECN and interoperability (with itself and with previous versions of ECN
and TCP). and TCP).
The experimental protocol will be considered successful if it The experimental protocol will be considered successful if it is
satisfies the requirements of [RFC7560] in the consensus opinion of deployed and if it satisfies the requirements of [RFC7560] in the
the IETF tcpm working group. In short, this requires that it consensus opinion of the IETF tcpm working group. In short, this
improves the accuracy and timeliness of TCP's ECN feedback, as requires that it improves the accuracy and timeliness of TCP's ECN
claimed in Section 5, while striking a balance between the feedback, as claimed in Section 5, while striking a balance between
conflicting requirements of resilience, integrity and minimisation of the conflicting requirements of resilience, integrity and
overhead. It also requires that it is not unduly complex, and that minimisation of overhead. It also requires that it is not unduly
it is compatible with prevalent equipment behaviours in the current complex, and that it is compatible with prevalent equipment
Internet, whether or not they comply with standards. behaviours in the current Internet (e.g. hardware offloading and
middleboxes), whether or not they comply with standards.
Testing will mostly focus on fall-back strategies in case of Testing will mostly focus on fall-back strategies in case of
middlebox interference. Current recommended strategies are specified middlebox interference. Current recommended strategies are specified
in Sections 3.1.2, 3.2.2 and 3.2.5. The effectiveness of these in Sections 3.1.2, 3.2.3, 3.2.4 and 3.2.7. The effectiveness of
strategies depends on the actual deployment situation of middleboxes. these strategies depends on the actual deployment situation of
Therefore experimental verification to confirm large-scale path middleboxes. Therefore experimental verification to confirm large-
traversal in the Internet is needed to finalize this specification on scale path traversal in the Internet is needed before finalizing this
Standards Track. specification on the Standards Track.
1.4. Terminology 1.4. Terminology
AccECN: The more accurate ECN feedback scheme will be called AccECN AccECN: The more accurate ECN feedback scheme will be called AccECN
for short. for short.
Classic ECN: the ECN protocol specified in [RFC3168]. Classic ECN: the ECN protocol specified in [RFC3168].
Classic ECN feedback: the feedback aspect of the ECN protocol Classic ECN feedback: the feedback aspect of the ECN protocol
specified in [RFC3168], including generation, encoding, specified in [RFC3168], including generation, encoding,
skipping to change at page 9, line 40 skipping to change at page 9, line 40
Each ACK carries the three least significant bits (LSBs) of the Each ACK carries the three least significant bits (LSBs) of the
packet-based CE counter using the ECN bits in the TCP header, now packet-based CE counter using the ECN bits in the TCP header, now
renamed the Accurate ECN (ACE) field (see Figure 2 later). The LSBs renamed the Accurate ECN (ACE) field (see Figure 2 later). The LSBs
of each of the three byte counters are carried in the AccECN Option. of each of the three byte counters are carried in the AccECN Option.
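As a minimal sketch of the ACE encoding described above, the following Python fragment masks a packet-based CE counter down to its three least significant bits. The function and constant names are hypothetical, chosen only for illustration; the initial counter value and the byte counters of the AccECN Option are deliberately left out.

```python
# Illustrative sketch only: names are hypothetical, not taken from the draft.
ACE_BITS = 3
ACE_MASK = (1 << ACE_BITS) - 1  # 0b111


def ace_field(ce_packet_counter: int) -> int:
    """Return the 3 LSBs of the Data Receiver's packet-based CE counter."""
    return ce_packet_counter & ACE_MASK
```

Because only three bits are carried, counter values 5 and 13 produce the same ACE field, which is the ambiguity that Section 2.3 addresses.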
2.3. Delayed ACKs and Resilience Against ACK Loss 2.3. Delayed ACKs and Resilience Against ACK Loss
With both the ACE and the AccECN Option mechanisms, the Data Receiver With both the ACE and the AccECN Option mechanisms, the Data Receiver
continually repeats the current LSBs of each of its respective continually repeats the current LSBs of each of its respective
counters. Then, even if some ACKs are lost, the Data Sender should counters. There is no need to acknowledge these continually repeated
be able to infer how much to increment its own counters, even if the counters, so the congestion window reduced (CWR) mechanism is no
longer used. Even if some ACKs are lost, the Data Sender should be
able to infer how much to increment its own counters, even if the
protocol field has wrapped. protocol field has wrapped.
The 3-bit ACE field can wrap fairly frequently. Therefore, even if The 3-bit ACE field can wrap fairly frequently. Therefore, even if
it appears to have incremented by one (say), the field might have it appears to have incremented by one (say), the field might have
actually cycled completely then incremented by one. The Data actually cycled completely then incremented by one. The Data
Receiver is required not to delay sending an ACK to such an extent Receiver is required not to delay sending an ACK to such an extent
that the ACE field would cycle. However, cycling is still a that the ACE field would cycle. However, cycling is still a
possibility at the Data Sender because a whole sequence of ACKs possibility at the Data Sender because a whole sequence of ACKs
carrying intervening values of the field might all be lost or delayed carrying intervening values of the field might all be lost or delayed
in transit. in transit.
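The wrap ambiguity can be illustrated with a short Python sketch. It assumes the Data Sender remembers the last ACE value it saw; the helper names and the max_cycles parameter are hypothetical. It shows only the modulo arithmetic, not the safety algorithms that the draft places in Appendix A.2.

```python
# Illustrative sketch only: names and max_cycles are assumptions.
ACE_MODULUS = 8  # a 3-bit field wraps after 8 increments


def ace_delta(prev_ace: int, new_ace: int) -> int:
    """Smallest non-negative increment consistent with two ACE values."""
    return (new_ace - prev_ace) % ACE_MODULUS


def candidate_increments(prev_ace: int, new_ace: int, max_cycles: int = 2):
    """All increments that would look identical on the wire, assuming the
    field cycled at most max_cycles times while ACKs were lost."""
    base = ace_delta(prev_ace, new_ace)
    return [base + k * ACE_MODULUS for k in range(max_cycles + 1)]
```

For example, an ACE field that moves from 3 to 4 is consistent with 1, 9 or 17 newly CE-marked packets, so the Data Sender needs other information, such as how much data the ACK newly acknowledges, to choose between them.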
skipping to change at page 11, line 24 skipping to change at page 11, line 31
[I-D.moncaster-tcpm-rcv-cheat]). [I-D.moncaster-tcpm-rcv-cheat]).
The initial SYN is the most critical control packet, so AccECN The initial SYN is the most critical control packet, so AccECN
provides feedback on whether it is CE marked. Although RFC 3168 provides feedback on whether it is CE marked. Although RFC 3168
prohibits an ECN-capable SYN, providing feedback of CE marking on the prohibits an ECN-capable SYN, providing feedback of CE marking on the
SYN supports future scenarios in which SYNs might be ECN-enabled SYN supports future scenarios in which SYNs might be ECN-enabled
(without prejudging whether they ought to be). For instance, (without prejudging whether they ought to be). For instance,
[I-D.ietf-tsvwg-ecn-experimentation] updates this aspect of RFC 3168 [I-D.ietf-tsvwg-ecn-experimentation] updates this aspect of RFC 3168
to allow experimentation with ECN-capable TCP control packets. to allow experimentation with ECN-capable TCP control packets.
Even if the TCP client has set the SYN to not-ECT in compliance with Even if the TCP client (or server) has set the SYN (or SYN/ACK) to
RFC 3168, feedback on whether it has been CE-marked could still be not-ECT in compliance with RFC 3168, feedback on the state of the ECN
useful, because middleboxes have been known to overwrite the ECN IP field when it arrives at the receiver could still be useful, because
field as if it is still part of the old Type of Service (ToS) field. middleboxes have been known to overwrite the ECN IP field as if it is
If a TCP client has set the SYN to Not-ECT, but receives CE feedback, still part of the old Type of Service (ToS) field [Mandalari18]. If
it can detect such middlebox interference and send Not-ECT for the a TCP client has set the SYN to Not-ECT, but receives CE feedback, it
rest of the connection (see [I-D.kuehlewind-tcpm-ecn-fallback]). can detect such middlebox interference and send Not-ECT for the rest
Today, if a TCP server receives CE on a SYN, it cannot know whether of the connection (see [I-D.kuehlewind-tcpm-ecn-fallback]). Today,
if a TCP server receives ECT or CE on a SYN, it cannot know whether
it is invalid (or valid) because only the TCP client knows whether it it is invalid (or valid) because only the TCP client knows whether it
originally marked the SYN as Not-ECT (or ECT). Therefore, prior to originally marked the SYN as Not-ECT (or ECT). Therefore, prior to
AccECN, the server's only safe course of action was to disable ECN AccECN, the server's only safe course of action was to disable ECN
for the connection. Instead, the AccECN protocol allows the server for the connection. Instead, the AccECN protocol allows the server
to feed back the CE marking to the client, which then has all the to feed back the received ECN field to the client, which then has all
information to decide whether the connection has to fall-back from the information to decide whether the connection has to fall-back
supporting ECN (or not). from supporting ECN (or not).
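The client-side check described above might be sketched as follows in Python. The enum values are the IP-ECN codepoints of RFC 3168; the function names and the preferred parameter are hypothetical, and the sketch covers only the case stated in the text, where the client sent Not-ECT on the SYN but a different value is fed back.

```python
# Illustrative sketch only: function names and 'preferred' are assumptions.
from enum import Enum


class IpEcn(Enum):
    """IP-ECN field codepoints as defined in RFC 3168."""
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


def syn_was_mangled(sent: IpEcn, fed_back: IpEcn) -> bool:
    """True if a SYN sent as Not-ECT arrived with any other ECN value."""
    return sent is IpEcn.NOT_ECT and fed_back is not IpEcn.NOT_ECT


def ecn_for_rest_of_connection(sent: IpEcn, fed_back: IpEcn,
                               preferred: IpEcn = IpEcn.ECT0) -> IpEcn:
    """Fall back to Not-ECT when interference is detected, else keep ECN."""
    return IpEcn.NOT_ECT if syn_was_mangled(sent, fed_back) else preferred
```

The decision stays with the client because, as noted above, only the client knows which value it originally placed in the IP-ECN field of the SYN.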
3. AccECN Protocol Specification 3. AccECN Protocol Specification
3.1. Negotiating to use AccECN 3.1. Negotiating to use AccECN
3.1.1. Negotiation during the TCP handshake 3.1.1. Negotiation during the TCP handshake
Given the ECN Nonce [RFC3540] is being reclassified as historic, the Given the ECN Nonce [RFC3540] is being reclassified as historic, the
present specification renames the TCP flag at bit 7 of the TCP header present specification renames the TCP flag at bit 7 of the TCP header
flags from NS (Nonce Sum) to AE (Accurate ECN) (see IANA flags from NS (Nonce Sum) to AE (Accurate ECN) (see IANA
Considerations in Section 6). Considerations in Section 6).
During the TCP handshake at the start of a connection, to request During the TCP handshake at the start of a connection, to request
more accurate ECN feedback the TCP client (host A) MUST set the TCP more accurate ECN feedback the TCP client (host A) MUST set the TCP
flags AE=1, CWR=1 and ECE=1 in the initial SYN segment. flags AE=1, CWR=1 and ECE=1 in the initial SYN segment.
If a TCP server (B) that is AccECN-enabled receives a SYN with the If a TCP server (B) that is AccECN-enabled receives a SYN with the
above three flags set, it MUST set both its half connections into above three flags set, it MUST set both its half connections into
AccECN mode. Then it MUST set the TCP flags CWR=1 and ECE=0 on its AccECN mode. Then it MUST set the TCP flags on the SYN/ACK to one of
response in the SYN/ACK segment to confirm that it supports AccECN. the 4 values shown in the top block of Table 2 to confirm that it
The TCP server MUST NOT set this combination of flags unless the supports AccECN. The TCP server MUST NOT set one of these 4
preceding SYN requested support for AccECN as above. combinations of flags on the SYN/ACK unless the preceding SYN
requested support for AccECN as above.
A TCP server in AccECN mode MUST additionally set the TCP flag AE=1 A TCP server in AccECN mode MUST set the AE, CWR and ECE TCP flags on
on the SYN/ACK if the IP/ECN field of the SYN was CE-marked (see the SYN/ACK to the value in Table 2 that feeds back the IP-ECN field
Section 2.5 for rationale). If the IP/ECN field of the received SYN that arrived on the SYN. This applies whether or not the server
was Not-ECT, ECT(0) or ECT(1), it MUST clear the TCP AE flag (AE=0) itself supports setting the IP-ECN field on a SYN or SYN/ACK (see
on the SYN/ACK. Section 2.5 for rationale).
Once a TCP client (A) has sent the above SYN to declare that it Once a TCP client (A) has sent the above SYN to declare that it
supports AccECN, and once it has received the above SYN/ACK segment supports AccECN, and once it has received the above SYN/ACK segment
that confirms that the TCP server supports AccECN, the TCP client that confirms that the TCP server supports AccECN, the TCP client
MUST set both its half connections into AccECN mode. MUST set both its half connections into AccECN mode.
The procedure for the client to follow if a SYN/ACK does not arrive The procedure for the client to follow if a SYN/ACK does not arrive
before its retransmission timer expires is given in Section 3.1.2. before its retransmission timer expires is given in Section 3.1.2.
The three flags set to 1 to indicate AccECN support on the SYN have The three flags set to 1 to indicate AccECN support on the SYN have
skipping to change at page 12, line 42 skipping to change at page 13, line 4
the evolution of ECN. Table 2 tabulates all the negotiation the evolution of ECN. Table 2 tabulates all the negotiation
possibilities for ECN-related capabilities that involve at least one possibilities for ECN-related capabilities that involve at least one
AccECN-capable host. The entries in the first two columns have been AccECN-capable host. The entries in the first two columns have been
abbreviated, as follows: abbreviated, as follows:
AccECN: More Accurate ECN Feedback (the present specification) AccECN: More Accurate ECN Feedback (the present specification)
Nonce: ECN Nonce feedback [RFC3540] Nonce: ECN Nonce feedback [RFC3540]
ECN: 'Classic' ECN feedback [RFC3168] ECN: 'Classic' ECN feedback [RFC3168]
No ECN: Not-ECN-capable. Implicit congestion notification using No ECN: Not-ECN-capable. Implicit congestion notification using
packet drop. packet drop.
draft-ietf-tcpm-accurate-ecn-03:

+--------+---------+------------+--------------+--------------------+
| A      | B       | SYN A->B   | SYN/ACK B->A | Feedback Mode      |
+--------+---------+------------+--------------+--------------------+
|        |         | AE CWR ECE | AE CWR ECE   |                    |
| AccECN | AccECN  | 1   1   1  | 0   1   0    | AccECN             |
| AccECN | AccECN  | 1   1   1  | 1   1   0    | AccECN (CE on SYN) |
|        |         |            |              |                    |
| AccECN | Nonce   | 1   1   1  | 1   0   1    | classic ECN        |
| AccECN | ECN     | 1   1   1  | 0   0   1    | classic ECN        |
| AccECN | No ECN  | 1   1   1  | 0   0   0    | Not ECN            |
|        |         |            |              |                    |
| Nonce  | AccECN  | 0   1   1  | 0   0   1    | classic ECN        |
| ECN    | AccECN  | 0   1   1  | 0   0   1    | classic ECN        |
| No ECN | AccECN  | 0   0   0  | 0   0   0    | Not ECN            |
|        |         |            |              |                    |
| AccECN | Broken  | 1   1   1  | 1   1   1    | Not ECN            |
| AccECN | AccECN+ | 1   1   1  | 0   1   1    | AccECN (CU)        |
| AccECN | AccECN+ | 1   1   1  | 1   0   0    | AccECN (CU)        |
+--------+---------+------------+--------------+--------------------+

draft-ietf-tcpm-accurate-ecn-04:

+--------+--------+------------+-------------+----------------------+
| A      | B      | SYN A->B   | SYN/ACK     | Feedback Mode        |
|        |        |            | B->A        |                      |
+--------+--------+------------+-------------+----------------------+
|        |        | AE CWR ECE | AE CWR ECE  |                      |
| AccECN | AccECN | 1   1   1  | 0   1   0   | AccECN (Not-ECT on   |
|        |        |            |             | SYN)                 |
| AccECN | AccECN | 1   1   1  | 0   1   1   | AccECN (ECT1 on SYN) |
| AccECN | AccECN | 1   1   1  | 1   0   0   | AccECN (ECT0 on SYN) |
| AccECN | AccECN | 1   1   1  | 1   1   0   | AccECN (CE on SYN)   |
|        |        |            |             |                      |
| AccECN | Nonce  | 1   1   1  | 1   0   1   | classic ECN          |
| AccECN | ECN    | 1   1   1  | 0   0   1   | classic ECN          |
| AccECN | No ECN | 1   1   1  | 0   0   0   | Not ECN              |
|        |        |            |             |                      |
| Nonce  | AccECN | 0   1   1  | 0   0   1   | classic ECN          |
| ECN    | AccECN | 0   1   1  | 0   0   1   | classic ECN          |
| No ECN | AccECN | 0   0   0  | 0   0   0   | Not ECN              |
|        |        |            |             |                      |
| AccECN | Broken | 1   1   1  | 1   1   1   | Not ECN              |
+--------+--------+------------+-------------+----------------------+
Table 2: ECN capability negotiation between Client (A) and Server (B) Table 2: ECN capability negotiation between Client (A) and Server (B)
Table 2 is divided into blocks each separated by an empty row. Table 2 is divided into blocks each separated by an empty row.
1. The top block shows the case already described where both 1. The top block shows the case already described where both
endpoints support AccECN and how the TCP server (B) indicates endpoints support AccECN and how the TCP server (B) indicates
congestion feedback. congestion feedback.
2. The second block shows the cases where the TCP client (A) 2. The second block shows the cases where the TCP client (A)
skipping to change at page 13, line 47 skipping to change at page 14, line 5
shown it MUST set both its half connections into the feedback shown it MUST set both its half connections into the feedback
mode shown in the rightmost column. mode shown in the rightmost column.
3. The third block shows the cases where the TCP server (B) supports 3. The third block shows the cases where the TCP server (B) supports
AccECN but the TCP client (A) supports some earlier variant of AccECN but the TCP client (A) supports some earlier variant of
TCP feedback, indicated in its SYN. Therefore, as soon as an TCP feedback, indicated in its SYN. Therefore, as soon as an
AccECN-enabled TCP server (B) receives the SYN shown, it MUST set AccECN-enabled TCP server (B) receives the SYN shown, it MUST set
both its half connections into the feedback mode shown in the both its half connections into the feedback mode shown in the
rightmost column. rightmost column.
4. The fourth block displays combinations that are not valid or 4. The fourth block displays a combination labelled `Broken'. Some
currently unused. The first case (labelled `Broken') is where all older TCP server implementations incorrectly set the reserved
bits set in the SYN are reflected by the receiver in the SYN/ACK, flags in the SYN/ACK by reflecting those in the SYN. Such broken
which happens quite often if the TCP connection is proxied. In TCP servers (B) cannot support ECN, so as soon as an AccECN-
this case, both ends MUST fall-back to Not ECN for both half capable TCP client (A) receives such a broken SYN/ACK it MUST
connections. The other two cases (labelled 'AccECN (CU)') are fall-back to Not ECN mode for both its half connections.
currently unassigned and available for an RFC to extend TCP in
future, tagged as 'AccECN+' (see Appendix B for possible uses).
For forward compatibility, as soon as an AccECN-capable TCP
client (A) receives either of these SYN/ACKs it MUST set both its
half connections into AccECN mode, as if the SYN/ACK had been
AE=0, CWR=1, ECE=0.
The following exceptional cases need some explanation: The following exceptional cases need some explanation:
ECN Nonce: An AccECN implementation, whether client or server, ECN Nonce: An AccECN implementation, whether client or server,
sender or receiver, does not need to implement the ECN Nonce sender or receiver, does not need to implement the ECN Nonce
feedback mode [RFC3540], which is being reclassified as historic feedback mode [RFC3540], which is being reclassified as historic
[I-D.ietf-tsvwg-ecn-experimentation]. AccECN is compatible with [I-D.ietf-tsvwg-ecn-experimentation]. AccECN is compatible with
an alternative ECN feedback integrity approach that does not use an alternative ECN feedback integrity approach that does not use
up the ECT(1) codepoint and can be implemented solely at the up the ECT(1) codepoint and can be implemented solely at the
sender (see Section 4.3). sender (see Section 4.3).
Simultaneous Open: An originating AccECN Host (A), having sent a SYN Simultaneous Open: An originating AccECN Host (A), having sent a SYN
with AE=1, CWR=1 and ECE=1, might receive another SYN from host B. with AE=1, CWR=1 and ECE=1, might receive another SYN from host B.
Host A MUST then enter the same feedback mode as it would have Host A MUST then enter the same feedback mode as it would have
entered had it been a responding host and received the same SYN. entered had it been a responding host and received the same SYN.
Then host A MUST send the same SYN/ACK as it would have sent had Then host A MUST send the same SYN/ACK as it would have sent had
it been a responding host (see the third block above). it been a responding host.
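As an illustration only, the right-hand (-04) column of Table 2 can be read as a lookup from the three flags on the SYN/ACK to the feedback mode an AccECN client (A) enters after sending a SYN with AE=CWR=ECE=1. The table and function names below are hypothetical and not part of either draft revision; the -03 column differs in that it treats 0 1 1 and 1 0 0 as 'AccECN (CU)'.

```python
# Hypothetical lookup keyed by the (AE, CWR, ECE) flags on the SYN/ACK that an
# AccECN client receives after sending a SYN with AE=CWR=ECE=1.
# Each value is (feedback mode, IP-ECN codepoint the server reports for the SYN).
# Rows follow the -04 (right-hand) column of Table 2.
SYN_ACK_TABLE = {
    (0, 1, 0): ("AccECN", "Not-ECT"),
    (0, 1, 1): ("AccECN", "ECT(1)"),
    (1, 0, 0): ("AccECN", "ECT(0)"),
    (1, 1, 0): ("AccECN", "CE"),
    (1, 0, 1): ("classic ECN", None),  # server B is Nonce-capable
    (0, 0, 1): ("classic ECN", None),  # server B is classic-ECN-capable
    (0, 0, 0): ("Not ECN", None),      # server B is not ECN-capable
    (1, 1, 1): ("Not ECN", None),      # 'Broken': flags reflected from the SYN
}


def client_mode_from_syn_ack(ae, cwr, ece):
    """Return (mode, codepoint_on_syn) for the flags on a received SYN/ACK."""
    return SYN_ACK_TABLE[(ae, cwr, ece)]
```

All eight flag combinations are covered, so the lookup cannot fail for valid one-bit inputs. Both half connections are set into the returned mode.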
3.1.2. Retransmission of the SYN 3.1.2. Retransmission of the SYN
If the sender of an AccECN SYN times out before receiving the SYN/ If the sender of an AccECN SYN times out before receiving the SYN/
ACK, the sender SHOULD attempt to negotiate the use of AccECN at ACK, the sender SHOULD attempt to negotiate the use of AccECN at
least one more time by continuing to set all three TCP ECN flags on least one more time by continuing to set all three TCP ECN flags on
the first retransmitted SYN (using the usual retransmission time- the first retransmitted SYN (using the usual retransmission time-
outs). If this first retransmission also fails to be acknowledged, outs). If this first retransmission also fails to be acknowledged,
the sender SHOULD send subsequent retransmissions of the SYN without the sender SHOULD send subsequent retransmissions of the SYN without
any ECN flags set. This adds delay, in the case where a middlebox any TCP-ECN flags set. This adds delay, in the case where a
drops an AccECN (or ECN) SYN deliberately. However, current middlebox drops an AccECN (or ECN) SYN deliberately. However,
measurements imply that a drop is less likely to be due to middlebox current measurements imply that a drop is less likely to be due to
interference than other intermittent causes of loss, e.g. congestion, middlebox interference than other intermittent causes of loss, e.g.
wireless interference, etc. congestion, wireless interference, etc.
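A minimal sketch of the default (SHOULD) schedule described above, assuming transmissions are numbered from 0 for the original SYN. The function name and numbering are hypothetical. An implementation remains free to use a different fall-back strategy.

```python
def syn_ecn_flags(transmission):
    """Return the (AE, CWR, ECE) flags for the given transmission of the SYN.

    transmission 0 is the original SYN, 1 is the first retransmission,
    2 and above are subsequent retransmissions.
    """
    if transmission < 0:
        raise ValueError("transmission index must be non-negative")
    if transmission <= 1:
        # Original SYN and first retransmission still try to negotiate AccECN.
        return (1, 1, 1)
    # Later retransmissions carry no TCP-ECN flags at all.
    return (0, 0, 0)
```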
Implementers MAY use other fall-back strategies if they are found to Implementers MAY use other fall-back strategies if they are found to
be more effective (e.g. attempting to retransmit an AccECN SYN only be more effective (e.g. attempting to negotiate AccECN on the SYN
once or more than twice (most appropriate during high levels of only once or more than twice (most appropriate during high levels of
congestion); or falling back to classic ECN feedback rather than non- congestion); or falling back to classic ECN feedback rather than non-
ECN). Further it may make sense to also remove any other ECN). Further it may make sense to also remove any other
experimental fields or options on the SYN in case a middlebox might experimental fields or options on the SYN in case a middlebox might
be blocking them, although the required behaviour will depend on the be blocking them, although the required behaviour will depend on the
specification of the other option(s) and any attempt to co-ordinate specification of the other option(s) and any attempt to co-ordinate
fall-back between different modules of the stack. In any case, the fall-back between different modules of the stack. In any case, the
TCP initiator SHOULD cache failed connection attempts. If it does, TCP initiator SHOULD cache failed connection attempts. If it does,
it SHOULD NOT give up attempting to negotiate AccECN on the SYN of it SHOULD NOT give up attempting to negotiate AccECN on the SYN of
subsequent connection attempts until it is clear that the blockage is subsequent connection attempts until it is clear that the blockage is
persistently and specifically due to AccECN. The cache should be persistently and specifically due to AccECN. The cache should be
arranged to expire so that the initiator will infrequently attempt to arranged to expire so that the initiator will infrequently attempt to
check whether the problem has been resolved. check whether the problem has been resolved.
The fall-back procedure if the TCP server receives no ACK to The fall-back procedure if the TCP server receives no ACK to
acknowledge a SYN/ACK that tried to negotiate AccECN is specified in acknowledge a SYN/ACK that tried to negotiate AccECN is specified in
Section 3.2.5. Section 3.2.7.
3.2. AccECN Feedback 3.2. AccECN Feedback
Each Data Receiver maintains four counters, r.cep, r.ceb, r.e0b and Each Data Receiver of each half connection maintains four counters,
r.e1b. The CE packet counter (r.cep), counts the number of packets r.cep, r.ceb, r.e0b and r.e1b. The CE packet counter (r.cep), counts
the host receives with the CE code point in the IP ECN field, the number of packets the host receives with the CE code point in the
including CE marks on control packets without data. r.ceb, r.e0b and IP ECN field, including CE marks on control packets without data.
r.e1b count the number of TCP payload bytes in packets marked r.ceb, r.e0b and r.e1b count the number of TCP payload bytes in
respectively with the CE, ECT(0) and ECT(1) codepoint in their IP-ECN packets marked respectively with the CE, ECT(0) and ECT(1) codepoint
field. When a host first enters AccECN mode, it initialises its in their IP-ECN field. When a host first enters AccECN mode, it
counters to r.cep = 6, r.e0b = 1 and r.ceb = r.e1b = 0 (see initializes its counters to r.cep = 5, r.e0b = 1 and r.ceb = r.e1b =
Appendix A.5). Non-zero initial values are used to support a 0 (see Appendix A.5). Non-zero initial values are used to support a
stateless handshake (see Section 4.1) and to be distinct from cases stateless handshake (see Section 4.1) and to be distinct from cases
where the fields are incorrectly zeroed (e.g. by middleboxes - see where the fields are incorrectly zeroed (e.g. by middleboxes - see
Section 3.2.5.4). Section 3.2.7.4).
A host feeds back the CE packet counter using the Accurate ECN (ACE) A host feeds back the CE packet counter using the Accurate ECN (ACE)
field, as explained in the next section. And it feeds back all the field, as explained in the next section. And it feeds back all the
byte counters using the AccECN TCP Option, as specified in byte counters using the AccECN TCP Option, as specified in
Section 3.2.4. Whenever a host feeds back the value of any counter, Section 3.2.6. Whenever a host feeds back the value of any counter,
it MUST report the most recent value, no matter whether it is in a it MUST report the most recent value, no matter whether it is in a
pure ACK, an ACK with new payload data or a retransmission. pure ACK, an ACK with new payload data or a retransmission.
Therefore the feedback carried on a retransmitted packet is unlikely
to be the same as the feedback on the original packet.
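The Data Receiver's bookkeeping can be sketched as follows. This is an illustration under stated assumptions, not the drafts' reference algorithm (see Appendix A for that). It uses the -04 initial value r.cep = 5 (the -03 text uses 6), the IP-ECN codepoint values from [RFC3168], and hypothetical class and method names. The ace() method returns the three least significant bits of r.cep, which is what a segment with SYN=0 carries in the ACE field, apart from the handshake exception in the -04 text.

```python
# IP-ECN codepoints as defined in RFC 3168.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


class ReceiverCounters:
    """Hypothetical holder for the r.cep, r.ceb, r.e0b and r.e1b counters."""

    def __init__(self):
        # Initial values from the -04 text; non-zero so that zeroing on the
        # path can be detected.
        self.cep = 5
        self.e0b = 1
        self.ceb = 0
        self.e1b = 0

    def on_packet(self, ip_ecn, payload_len):
        """Update the counters for one arriving packet."""
        if ip_ecn == CE:
            self.cep += 1              # counts packets, even without payload
            self.ceb += payload_len    # counts TCP payload bytes
        elif ip_ecn == ECT0:
            self.e0b += payload_len
        elif ip_ecn == ECT1:
            self.e1b += payload_len
        # Not-ECT packets are not counted; no field feeds them back.

    def ace(self):
        """Three least significant bits of r.cep."""
        return self.cep & 0b111
```

For example, a CE-marked pure ACK increments r.cep but leaves r.ceb unchanged, because it carries no payload bytes.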
3.2.1. The ACE Field 3.2.1. Initialization of Feedback Counters at the Data Sender
Each Data Sender of each half connection maintains four counters,
s.cep, s.ceb, s.e0b and s.e1b intended to track the equivalent
counters at the Data Receiver. When a host enters AccECN mode, it
initializes them to s.cep = 5, s.e0b = 1 and s.ceb = s.e1b = 0.
If a TCP client (A) in AccECN mode receives a SYN/ACK with CE
feedback, i.e. AE=1, CWR=1, ECE=0, it increments s.cep to 6.
Otherwise, for any of the 3 other combinations of the 3 ECN TCP flags
(the top 3 rows in Table 2), s.cep remains initialized to 5.
3.2.2. The ACE Field
After AccECN has been negotiated on the SYN and SYN/ACK, both hosts After AccECN has been negotiated on the SYN and SYN/ACK, both hosts
overload the three TCP flags (AE, CWR and ECE) in the main TCP header overload the three TCP flags (AE, CWR and ECE) in the main TCP header
as one 3-bit field. Then the field is given a new name, ACE, as as one 3-bit field. Then the field is given a new name, ACE, as
shown in Figure 2. shown in Figure 2.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| | | | U | A | P | R | S | F | | | | | U | A | P | R | S | F |
| Header Length | Reserved | ACE | R | C | S | S | Y | I | | Header Length | Reserved | ACE | R | C | S | S | Y | I |
| | | | G | K | H | T | N | N | | | | | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
Figure 2: Definition of the ACE field within bytes 13 and 14 of the Figure 2: Definition of the ACE field within bytes 13 and 14 of the
TCP Header (when AccECN has been negotiated and SYN=0). TCP Header (when AccECN has been negotiated and SYN=0).
The original definition of these three flags in the TCP header, The original definition of these three flags in the TCP header,
including the addition of support for the ECN Nonce, is shown for including the addition of support for the ECN Nonce, is shown for
comparison in Figure 1. This specification does not rename these comparison in Figure 1. This specification does not rename these
three TCP flags to ACE for always; it merely overloads them with three TCP flags to ACE unconditionally; it merely overloads them with
another name and definition once an AccECN connection has been another name and definition once an AccECN connection has been
established. established.
A host MUST interpret the AE, CWR and ECE flags as the 3-bit ACE A host MUST interpret the AE, CWR and ECE flags as the 3-bit ACE
counter on a segment with the SYN flag cleared (SYN=0) that it sends counter on a segment with the SYN flag cleared (SYN=0) that it sends
or receives if both of its half-connections are set into AccECN mode or receives if both of its half-connections are set into AccECN mode
having successfully negotiated AccECN (see Section 3.1). A host MUST having successfully negotiated AccECN (see Section 3.1). A host MUST
NOT interpret the 3 flags as a 3-bit ACE field on any segment with NOT interpret the 3 flags as a 3-bit ACE field on any segment with
SYN=1 (whether ACK is 0 or 1), or if AccECN negotiation is incomplete SYN=1 (whether ACK is 0 or 1), or if AccECN negotiation is incomplete
or has not succeeded. or has not succeeded.
Both parts of each of these conditions are equally important. For Both parts of each of these conditions are equally important. For
instance, even if AccECN negotiation has been successful, the ACE instance, even if AccECN negotiation has been successful, the ACE
field is not defined on any segments with SYN=1 (e.g. a field is not defined on any segments with SYN=1 (e.g. a
retransmission of an unacknowledged SYN/ACK, or when both ends send retransmission of an unacknowledged SYN/ACK, or when both ends send
SYN/ACKs after AccECN support has been successfully negotiated during SYN/ACKs after AccECN support has been successfully negotiated during
a simultaneous open). a simultaneous open).
The ACE field encodes the three least significant bits of the r.cep With only one exception, on any packet with the SYN flag cleared
counter, therefore its initial value will be 0b110 (decimal 6). If (SYN=0), the Data Receiver MUST encode the three least significant
the SYN/ACK was CE marked, the client MUST increase its r.cep counter bits of its r.cep counter into the ACE field it feeds back to the
before it sends its first ACK, therefore the initial value of the ACE Data Sender.
field will be 0b111 (decimal 7). To support a stateless handshake
(see Section 4.1), these values have been chosen deliberately so that
they are distinct from [RFC5562] behaviour, where the TCP client
would set ECE on the first ACK as feedback for a CE mark on the SYN/
ACK.
3.2.2. Testing for Zeroing of the ACE Field There is only one exception to this rule: On the final ACK of the
3WHS, a TCP client (A) in AccECN mode MUST use the ACE field to feed
back which of the 4 possible values of the IP-ECN field were on the
SYN/ACK (the binary encoding is the same as that used on the SYN/
ACK). Table 3 shows the meaning of each possible value of the ACE
field on the ACK of the SYN/ACK and the value that an AccECN server
MUST set s.cep to as a result.
Section 3.2.1 required the Data Receiver to initialize the r.cep +--------------+---------------------------+------------------------+
| ACE on ACK | IP-ECN codepoint on | Initial s.cep of |
| of SYN/ACK | SYN/ACK inferred by | server in AccECN mode |
| | server | |
+--------------+---------------------------+------------------------+
| 0b000 | {Notes 1, 2} | Disable ECN |
| 0b001 | {Notes 2, 3} | 5 |
| 0b010 | Not-ECT | 5 |
| 0b011 | ECT(1) | 5 |
| 0b100 | ECT(0) | 5 |
| 0b101 | Currently Unused {Note 3} | 5 |
| 0b110 | CE | 6 |
| 0b111 | Currently Unused {Note 3} | 5 |
+--------------+---------------------------+------------------------+
Table 3: Meaning of the ACE field on the ACK of the SYN/ACK
{Note 1}: If the server is in AccECN mode, the value of zero raises
suspicion of zeroing of the ACE field on the path (see
Section 3.2.3).
{Note 2}: If a server is in AccECN mode, there ought to be no valid
case where the ACE field on the last ACK of the 3WHS has a value of
0b000 or 0b001.
However, in the case where a server that implements AccECN is also
using a stateless handshake (termed a SYN cookie) it will not
remember whether it entered AccECN mode. Then these two values
remind it that it did not enter AccECN mode (see Section 4.1 for
details).
{Note 3}: If the server is in AccECN mode, these values are Currently
Unused but the AccECN server's behaviour is still defined for forward
compatibility.
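Table 3 (present only in the -04 column) can be expressed as a small function for a server that is already in AccECN mode. The names below are hypothetical, and returning None for the initial s.cep is merely this sketch's way of representing 'Disable ECN'.

```python
# ACE values on the ACK of the SYN/ACK that map to a known IP-ECN codepoint.
ACE_TO_CODEPOINT = {
    0b010: "Not-ECT",
    0b011: "ECT(1)",
    0b100: "ECT(0)",
    0b110: "CE",
}


def server_handle_handshake_ace(ace):
    """Return (inferred codepoint on the SYN/ACK, initial s.cep).

    The codepoint is None for values that carry no inferred codepoint.
    The initial s.cep is None when the server has to disable ECN.
    """
    if not 0 <= ace <= 0b111:
        raise ValueError("ACE is a 3-bit field")
    if ace == 0b000:
        # Suspected zeroing of the ACE field on the path: disable ECN.
        return (None, None)
    codepoint = ACE_TO_CODEPOINT.get(ace)  # None for 0b001, 0b101, 0b111
    return (codepoint, 6 if ace == 0b110 else 5)
```

Only the CE row raises the initial s.cep to 6. Every other non-zero value, including the currently unused ones, leaves it at 5.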
3.2.3. Testing for Zeroing of the ACE Field
Section 3.2.2 required the Data Receiver to initialize the r.cep
counter to a non-zero value. Therefore, in either direction the counter to a non-zero value. Therefore, in either direction the
initial value of the ACE field ought to be non-zero. initial value of the ACE field ought to be non-zero.
If AccECN has been successfully negotiated, the Data Sender SHOULD If AccECN has been successfully negotiated, the Data Sender SHOULD
check the initial value of the ACE field in the first arriving check the initial value of the ACE field in the first arriving
segment with SYN=0. If the initial value of the ACE field is zero segment with SYN=0. If the initial value of the ACE field is zero
(0b000), the Data Sender MUST disable sending ECN-capable packets for (0b000), the Data Sender MUST disable sending ECN-capable packets for
the remainder of the half-connection by setting the IP/ECN field in the remainder of the half-connection by setting the IP/ECN field in
all subsequent packets to Not-ECT. all subsequent packets to Not-ECT.
skipping to change at page 17, line 18 skipping to change at page 18, line 25
because it is lost and the ISN is first acknowledged by a subsequent because it is lost and the ISN is first acknowledged by a subsequent
segment), no test for invalid initialization can be conducted, and segment), no test for invalid initialization can be conducted, and
the half-connection will continue in AccECN mode. the half-connection will continue in AccECN mode.
Note that the Data Sender MUST NOT test whether the arriving counter Note that the Data Sender MUST NOT test whether the arriving counter
in the initial ACE field has been initialized to a specific valid in the initial ACE field has been initialized to a specific valid
value - the above check solely tests whether the ACE fields have been value - the above check solely tests whether the ACE fields have been
incorrectly zeroed. This allows hosts to use different initial incorrectly zeroed. This allows hosts to use different initial
values as an additional signalling channel in future. values as an additional signalling channel in future.
3.2.3. Safety against Ambiguity of the ACE Field 3.2.4. Testing for Mangling of the IP/ECN Field
The value of the ACE field on the SYN/ACK indicates the value of the
IP/ECN field when the SYN arrived at the server. The client can
compare this with how it originally set the IP/ECN field on the SYN.
If this comparison implies an unsafe transition of the IP/ECN field,
for the remainder of the connection the client MUST NOT send ECN-
capable packets, but it MUST continue to feed back any ECN markings
on arriving packets.
The value of the ACE field on the last ACK of the 3WHS indicates the
value of the IP/ECN field when the SYN/ACK arrived at the client.
The server can compare this with how it originally set the IP/ECN
field on the SYN/ACK. If this comparison implies an unsafe
transition of the IP/ECN field, for the remainder of the connection
the server MUST NOT send ECN-capable packets, but it MUST continue to
feed back any ECN markings on arriving packets.
Invalid transitions of the IP/ECN field are defined in [RFC3168] and
repeated here for convenience:
o the not-ECT codepoint changes;
o either ECT codepoint transitions to not-ECT;
o the CE codepoint changes.
RFC 3168 says that it is invalid but safe for a router to change ECT
to not-ECT. However, from a host's viewpoint, this transition is
unsafe because it could be the result of two transitions at different
routers on the path: ECT to CE (safe) then CE to not-ECT (unsafe).
This scenario could well happen where an ECN-enabled home router
congests its upstream mobile broadband bottleneck link, then the
ingress to the mobile network clears the ECN field [Mandalari18].
The above fall-back behaviours are necessary in case mangling of the
IP/ECN field is asymmetric, which is currently common over some
mobile networks [Mandalari18]. Then one end might see no unsafe
transition and continue sending ECN-capable packets, while the other
end sees an unsafe transition and stops sending ECN-capable packets.
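The three transitions listed above can be checked with a few comparisons. The sketch below is illustrative only: the function name and the string labels are hypothetical, and it treats exactly the listed transitions as unsafe, nothing more.

```python
NOT_ECT, ECT1, ECT0, CE = "Not-ECT", "ECT(1)", "ECT(0)", "CE"


def is_unsafe_transition(sent, received):
    """True if the IP/ECN field changed in one of the three listed ways.

    sent is the codepoint the host set on its SYN or SYN/ACK, and received
    is the codepoint the peer fed back for that packet.
    """
    if sent == NOT_ECT:
        return received != NOT_ECT   # the not-ECT codepoint changes
    if sent in (ECT0, ECT1):
        return received == NOT_ECT   # either ECT codepoint becomes not-ECT
    if sent == CE:
        return received != CE        # the CE codepoint changes
    raise ValueError("unknown codepoint: %r" % (sent,))
```

A host that sees True here stops sending ECN-capable packets for the rest of the connection but keeps feeding back the ECN markings it receives.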
3.2.5. Safety against Ambiguity of the ACE Field
If too many CE-marked segments are acknowledged at once, or if a long If too many CE-marked segments are acknowledged at once, or if a long
run of ACKs is lost, the 3-bit counter in the ACE field might have run of ACKs is lost, the 3-bit counter in the ACE field might have
cycled between two ACKs arriving at the Data Sender. cycled between two ACKs arriving at the Data Sender.
Therefore an AccECN Data Receiver SHOULD immediately send an ACK once Therefore an AccECN Data Receiver SHOULD immediately send an ACK once
'n' CE marks have arrived since the previous ACK, where 'n' SHOULD be 'n' CE marks have arrived since the previous ACK, where 'n' SHOULD be
2 and MUST be no greater than 6. 2 and MUST be no greater than 6.
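One way a Data Receiver might track this trigger is sketched below. The class is hypothetical. It only models the 'n CE marks since the previous ACK' rule and ignores every other reason a receiver might have for sending an ACK.

```python
class CeAckTrigger:
    """Hypothetical helper: signal when an immediate ACK is due."""

    def __init__(self, n=2):
        # n SHOULD be 2 and MUST be no greater than 6.
        if not 1 <= n <= 6:
            raise ValueError("n must be between 1 and 6")
        self.n = n
        self.ce_since_ack = 0

    def on_packet(self, ce_marked):
        """Record one arriving packet; return True if an ACK is due now."""
        if ce_marked:
            self.ce_since_ack += 1
        return self.ce_since_ack >= self.n

    def on_ack_sent(self):
        """Reset the count whenever any ACK is sent."""
        self.ce_since_ack = 0
```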
If the Data Sender has not received AccECN TCP Options to give it If the Data Sender has not received AccECN TCP Options to give it
skipping to change at page 17, line 43 skipping to change at page 19, line 44
since the last ACK to calculate or estimate how many segments could since the last ACK to calculate or estimate how many segments could
have been acknowledged. An example algorithm to implement this have been acknowledged. An example algorithm to implement this
policy is given in Appendix A.2. An implementer MAY develop an policy is given in Appendix A.2. An implementer MAY develop an
alternative algorithm as long as it satisfies these requirements. alternative algorithm as long as it satisfies these requirements.
If missing acknowledgement numbers arrive later (reordering) and If missing acknowledgement numbers arrive later (reordering) and
prove that the counter did not cycle, the Data Sender MAY attempt to prove that the counter did not cycle, the Data Sender MAY attempt to
neutralise the effect of any action it took based on a conservative neutralise the effect of any action it took based on a conservative
assumption that it later found to be incorrect. assumption that it later found to be incorrect.
3.2.4. The AccECN Option 3.2.6. The AccECN Option
The AccECN Option is defined as shown below in Figure 3. It consists The AccECN Option is defined as shown below in Figure 3. It consists
of three 24-bit fields that provide the 24 least significant bits of of three 24-bit fields that provide the 24 least significant bits of
the r.e0b, r.ceb and r.e1b counters, respectively. The initial 'E' the r.e0b, r.ceb and r.e1b counters, respectively. The initial 'E'
of each field name stands for 'Echo'. of each field name stands for 'Echo'.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Kind = TBD1 | Length = 11 | EE0B field | | Kind = TBD1 | Length = 11 | EE0B field |
skipping to change at page 18, line 31 skipping to change at page 20, line 31
Appendix A.1 gives an example algorithm for the Data Receiver to Appendix A.1 gives an example algorithm for the Data Receiver to
encode its byte counters into the AccECN Option, and for the Data encode its byte counters into the AccECN Option, and for the Data
Sender to decode the AccECN Option fields into its byte counters. Sender to decode the AccECN Option fields into its byte counters.
Note that there is no field to feed back Not-ECT bytes. Nonetheless Note that there is no field to feed back Not-ECT bytes. Nonetheless
an algorithm for the Data Sender to calculate the number of payload an algorithm for the Data Sender to calculate the number of payload
bytes received as Not-ECT is given in Appendix A.5. bytes received as Not-ECT is given in Appendix A.5.
Whenever a Data Receiver sends an AccECN Option, the rules in Whenever a Data Receiver sends an AccECN Option, the rules in
Section 3.2.6 expect it to always send a full-length option. To cope Section 3.2.8 expect it to always send a full-length option. To cope
with option space limitations, it can omit unchanged fields from the with option space limitations, it can omit unchanged fields from the
tail of the option, as long as it preserves the order of the tail of the option, as long as it preserves the order of the
remaining fields and includes any field that has changed. The length remaining fields and includes any field that has changed. The length
field MUST indicate which fields are present as follows: field MUST indicate which fields are present as follows:
Length=11: EE0B, ECEB, EE1B Length=11: EE0B, ECEB, EE1B
Length=8: EE0B, ECEB Length=8: EE0B, ECEB
Length=5: EE0B Length=5: EE0B
Length=2: (empty) Length=2: (empty)
The empty option of Length=2 is provided to allow for a case where an The empty option of Length=2 is provided to allow for a case where an
AccECN Option has to be sent (e.g. on the SYN/ACK to test the path), AccECN Option has to be sent (e.g. on the SYN/ACK to test the path),
but there is very limited space for the option. For initial but there is very limited space for the option. For initial
experiments, the Length field MUST be 2 greater to accommodate the experiments, the Length field MUST be 2 greater to accommodate the
16-bit magic number. 16-bit magic number.
All implementations of a Data Sender MUST be able to read in AccECN All implementations of a Data Sender MUST be able to read in AccECN
Options of any of the above lengths. They MUST ignore an AccECN Options of any of the above lengths. If the AccECN Option is of any
Option of any other length. other length, implementations MUST use those whole 3 octet fields
that fit within the length and ignore the remainder of the option.
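The length rules can be illustrated with a short parser that follows the -04 behaviour: it reads as many whole 3-octet fields as fit within the stated length and ignores the rest. The -03 text would instead ignore an option of unexpected length altogether. The function name is hypothetical. The sketch assumes network (big-endian) byte order, uses a placeholder Kind value because TBD1 is unassigned, and leaves out the 2-octet magic number used in initial experiments.

```python
FIELD_ORDER = ("ee0b", "eceb", "ee1b")  # order of the fields in the option


def parse_accecn_option(option):
    """Parse one AccECN Option given as bytes: Kind, Length, then the fields.

    Returns a dict holding only the fields that are present.
    """
    if len(option) < 2:
        raise ValueError("option shorter than Kind and Length octets")
    length = option[1]
    if length < 2 or length > len(option):
        raise ValueError("inconsistent Length field")
    payload = option[2:length]
    # Use only whole 3-octet fields, and never more than the three defined.
    count = min(len(payload) // 3, len(FIELD_ORDER))
    return {
        name: int.from_bytes(payload[3 * i:3 * i + 3], "big")
        for i, name in enumerate(FIELD_ORDER[:count])
    }
```

With Length=8 the result holds EE0B and ECEB only. With Length=6 the single whole field is used and the trailing octet is ignored.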
3.2.5. Path Traversal of the AccECN Option 3.2.7. Path Traversal of the AccECN Option
3.2.5.1. Testing the AccECN Option during the Handshake 3.2.7.1. Testing the AccECN Option during the Handshake
The TCP client MUST NOT include the AccECN TCP Option on the SYN. The TCP client MUST NOT include the AccECN TCP Option on the SYN.
Nonetheless, if the AccECN negotiation using the ECN flags in the Nonetheless, if the AccECN negotiation using the ECN flags in the
main TCP header (Section 3.1) is successful, it implicitly declares main TCP header (Section 3.1) is successful, it implicitly declares
that the endpoints also support the AccECN TCP Option. A fall-back that the endpoints also support the AccECN TCP Option. A fall-back
strategy for the loss of the SYN (possibly due to middlebox strategy for the loss of the SYN (possibly due to middlebox
interference) is specified in Section 3.1.2. interference) is specified in Section 3.1.2.
A TCP server that confirms its support for AccECN (in response to an A TCP server that confirms its support for AccECN (in response to an
AccECN SYN from the client as described in Section 3.1) SHOULD also AccECN SYN from the client as described in Section 3.1) SHOULD also
skipping to change at page 19, line 35 skipping to change at page 21, line 33
an AccECN Option in the first ACK at the end of the 3WHS. However, an AccECN Option in the first ACK at the end of the 3WHS. However,
this first ACK is not delivered reliably, so the TCP client SHOULD this first ACK is not delivered reliably, so the TCP client SHOULD
also include an AccECN Option on the first data segment it sends (if also include an AccECN Option on the first data segment it sends (if
it ever sends one). it ever sends one).
A host MAY omit the AccECN Option in any of these three cases A host MAY omit the AccECN Option in any of these three cases
if it has cached knowledge that the packet would be likely to be if it has cached knowledge that the packet would be likely to be
blocked on the path to the other host if it included an AccECN blocked on the path to the other host if it included an AccECN
Option. Option.
3.2.5.2. Testing for Loss of Packets Carrying the AccECN Option 3.2.7.2. Testing for Loss of Packets Carrying the AccECN Option
If after the normal TCP timeout the TCP server has not received an If after the normal TCP timeout the TCP server has not received an
ACK to acknowledge its SYN/ACK, the SYN/ACK might just have been ACK to acknowledge its SYN/ACK, the SYN/ACK might just have been
lost, e.g. due to congestion, or a middlebox might be blocking the lost, e.g. due to congestion, or a middlebox might be blocking the
AccECN Option. To expedite connection setup, the TCP server SHOULD AccECN Option. To expedite connection setup, the TCP server SHOULD
retransmit the SYN/ACK with the same TCP flags (AE, CWR and ECE) but retransmit the SYN/ACK with the same TCP flags (AE, CWR and ECE) but
with no AccECN Option. If this retransmission times out, to expedite with no AccECN Option. If this retransmission times out, to expedite
connection setup, the TCP server SHOULD disable AccECN and ECN for connection setup, the TCP server SHOULD disable AccECN and ECN for
this connection by retransmitting the SYN/ACK with AE=CWR=ECE=0 and this connection by retransmitting the SYN/ACK with AE=CWR=ECE=0 and
no AccECN Option. Implementers MAY use other fall-back strategies if no AccECN Option. Implementers MAY use other fall-back strategies if
skipping to change at page 20, line 31 skipping to change at page 22, line 28
another packet with the AccECN Option at a later point during the another packet with the AccECN Option at a later point during the
connection but should monitor if that packet got lost as well, in connection but should monitor if that packet got lost as well, in
which case it SHOULD disable the sending of the AccECN Option for which case it SHOULD disable the sending of the AccECN Option for
this half-connection. this half-connection.
Similarly, an AccECN end-point MAY separately memorize which data Similarly, an AccECN end-point MAY separately memorize which data
packets carried an AccECN Option and disable the sending of AccECN packets carried an AccECN Option and disable the sending of AccECN
Options if the loss probability of those packets is significantly Options if the loss probability of those packets is significantly
higher than that of all other data packets in the same connection. higher than that of all other data packets in the same connection.
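The SYN/ACK fall-back sequence described at the start of this section can be sketched as follows. This is only an illustration of the two SHOULD steps, not a definitive implementation: the `SynAck` type, the function name and the idea of counting timeouts are assumptions made for the example, and the flag values passed in are purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynAck:
    """Hypothetical summary of the AccECN-relevant parts of a SYN/ACK."""
    ae: int
    cwr: int
    ece: int
    accecn_option: bool


def synack_for_attempt(first: SynAck, timeouts: int) -> SynAck:
    """Return the SYN/ACK to send after `timeouts` retransmission timeouts."""
    if timeouts == 0:
        # Initial transmission: flags and AccECN Option as negotiated.
        return first
    if timeouts == 1:
        # First timeout: same TCP flags (AE, CWR, ECE), no AccECN Option.
        return SynAck(first.ae, first.cwr, first.ece, accecn_option=False)
    # Further timeout: disable AccECN and ECN for this connection.
    return SynAck(0, 0, 0, accecn_option=False)
```

An implementation using a different fall-back strategy, which the text permits, would replace the body of this function.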
3.2.5.3. Testing for Stripping of the AccECN Option 3.2.7.3. Testing for Stripping of the AccECN Option
If the TCP client has successfully negotiated AccECN but does not If the TCP client has successfully negotiated AccECN but does not
receive an AccECN Option on the SYN/ACK, it switches into a mode that receive an AccECN Option on the SYN/ACK, it switches into a mode that
assumes that the AccECN Option is not available for this half assumes that the AccECN Option is not available for this half
connection. connection.
Similarly, if the TCP server has successfully negotiated AccECN but Similarly, if the TCP server has successfully negotiated AccECN but
does not receive an AccECN Option on the first segment that does not receive an AccECN Option on the first segment that
acknowledges sequence space at least covering the ISN, it switches acknowledges sequence space at least covering the ISN, it switches
into a mode that assumes that the AccECN Option is not available for into a mode that assumes that the AccECN Option is not available for
this half connection. this half connection.
While a host is in this mode that assumes incoming AccECN Options are While a host is in this mode that assumes incoming AccECN Options are
not available, it MUST adopt the conservative interpretation of the not available, it MUST adopt the conservative interpretation of the
ACE field discussed in Section 3.2.3. However, it cannot make any ACE field discussed in Section 3.2.5. However, it cannot make any
assumption about support of outgoing AccECN Options on the other half assumption about support of outgoing AccECN Options on the other half
connection, so it SHOULD continue to send the AccECN Option itself connection, so it SHOULD continue to send the AccECN Option itself
(unless it has established that sending the AccECN Option is causing (unless it has established that sending the AccECN Option is causing
packets to be blocked as in Section 3.2.5.2). packets to be blocked as in Section 3.2.7.2).
If a host is in the mode that assumes incoming AccECN Options are not If a host is in the mode that assumes incoming AccECN Options are not
available, but it receives an AccECN Option at any later point during available, but it receives an AccECN Option at any later point during
the connection, this clearly indicates that the AccECN Option is not the connection, this clearly indicates that the AccECN Option is not
blocked on the respective path, and the AccECN endpoint MAY switch blocked on the respective path, and the AccECN endpoint MAY switch
out of the mode that assumes the AccECN Option is not available for out of the mode that assumes the AccECN Option is not available for
this half connection. this half connection.
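The mode switching described in this section can be summarised in a small state update. This is a minimal sketch under stated assumptions: the function and parameter names are hypothetical, and the sketch always takes the optional (MAY) step of leaving the mode when an AccECN Option later arrives.

```python
def update_option_unavailable(currently_unavailable: bool,
                              is_first_feedback_segment: bool,
                              has_accecn_option: bool) -> bool:
    """Return True if the host should assume that incoming AccECN
    Options are not available on this half connection.

    is_first_feedback_segment is True for the SYN/ACK (at the client)
    or for the first segment acknowledging sequence space that at
    least covers the ISN (at the server).
    """
    if has_accecn_option:
        # An option got through, so it is not blocked on this path.
        return False
    if is_first_feedback_segment:
        # Option missing where it was first expected: assume stripped.
        return True
    # Later segments without an option do not change the mode.
    return currently_unavailable
```

While the returned value is True, the host would use the conservative interpretation of the ACE field, but it would normally keep sending its own AccECN Option, because the mode says nothing about the other half connection.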
3.2.5.4. Test for Zeroing of the AccECN Option 3.2.7.4. Test for Zeroing of the AccECN Option
For a related test for invalid initialization of the ACE field, see For a related test for invalid initialization of the ACE field, see
Section 3.2.2 Section 3.2.3
Section 3.2 required the Data Receiver to initialize the r.e0b Section 3.2 required the Data Receiver to initialize the r.e0b
counter to a non-zero value. Therefore, in either direction the counter to a non-zero value. Therefore, in either direction the
initial value of the EE0B field in the AccECN Option (if one exists) initial value of the EE0B field in the AccECN Option (if one exists)
ought to be non-zero. If AccECN has been negotiated: ought to be non-zero. If AccECN has been negotiated:
o the TCP server MAY check the initial value of the EE0B field in o the TCP server MAY check the initial value of the EE0B field in
the first segment that acknowledges sequence space that at least the first segment that acknowledges sequence space that at least
covers the ISN plus 1. If the initial value of the EE0B field is covers the ISN plus 1. If the initial value of the EE0B field is
zero, the server will switch into a mode that ignores the AccECN zero, the server will switch into a mode that ignores the AccECN
Option for this half connection. Option for this half connection.
o the TCP client MAY check the initial value of the EE0B field on o the TCP client MAY check the initial value of the EE0B field on
the SYN/ACK. If the initial value of the EE0B field is zero, the the SYN/ACK. If the initial value of the EE0B field is zero, the
client will switch into a mode that ignores the AccECN Option for client will switch into a mode that ignores the AccECN Option for
this half connection. this half connection.
While a host is in the mode that ignores the AccECN Option it MUST While a host is in the mode that ignores the AccECN Option it MUST
adopt the conservative interpretation of the ACE field discussed in adopt the conservative interpretation of the ACE field discussed in
Section 3.2.3. Section 3.2.5.
Note that the Data Sender MUST NOT test whether the arriving byte Note that the Data Sender MUST NOT test whether the arriving byte
counters in the initial AccECN Option have been initialized to counters in the initial AccECN Option have been initialized to
specific valid values - the above checks solely test whether these specific valid values - the above checks solely test whether these
fields have been incorrectly zeroed. This allows hosts to use fields have been incorrectly zeroed. This allows hosts to use
different initial values as an additional signalling channel in different initial values as an additional signalling channel in
future. Also note that the initial value of either field might be future. Also note that the initial value of either field might be
greater than its expected initial value, because the counters might greater than its expected initial value, because the counters might
already have been incremented. Nonetheless, the initial values of already have been incremented. Nonetheless, the initial values of
the counters have been chosen so that they cannot wrap to zero on the counters have been chosen so that they cannot wrap to zero on
these initial segments. these initial segments.
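The zeroing test above amounts to a single comparison. The sketch below is illustrative only; the function name is hypothetical and `None` is used here to represent a segment that carried no AccECN Option. Note that it deliberately does not compare the field against any expected initial value, in line with the MUST NOT above.

```python
from typing import Optional


def should_ignore_accecn_option(initial_ee0b: Optional[int]) -> bool:
    """Return True if the initial EE0B field shows the option was zeroed.

    initial_ee0b is the EE0B value from the SYN/ACK (at the client) or
    from the first segment acknowledging sequence space that at least
    covers the ISN plus 1 (at the server), or None if no option was
    present on that segment.
    """
    if initial_ee0b is None:
        # No option at all: the stripping test applies, not this one.
        return False
    # Any non-zero value passes, including values above the expected
    # initial value, because the counter may already have incremented.
    return initial_ee0b == 0
```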
3.2.5.5. Consistency between AccECN Feedback Fields 3.2.7.5. Consistency between AccECN Feedback Fields
When the AccECN Option is available it supplements but does not When the AccECN Option is available it supplements but does not
replace the ACE field. An endpoint using AccECN feedback MUST always replace the ACE field. An endpoint using AccECN feedback MUST always
consider the information provided in the ACE field whether or not the consider the information provided in the ACE field whether or not the
AccECN Option is also available. AccECN Option is also available.
If the AccECN Option is present, the s.cep counter might increase If the AccECN Option is present, the s.cep counter might increase
while the s.ceb counter does not (e.g. due to a CE-marked control while the s.ceb counter does not (e.g. due to a CE-marked control
packet). The sender's response to such a situation is out of scope, packet). The sender's response to such a situation is out of scope,
and needs to be dealt with in a specification that uses ECN-capable and needs to be dealt with in a specification that uses ECN-capable
skipping to change at page 22, line 25 skipping to change at page 24, line 23
and optionally other integrity tests (Section 4.3). and optionally other integrity tests (Section 4.3).
If either end-point detects that the s.ceb counter has increased but If either end-point detects that the s.ceb counter has increased but
the s.cep has not (and by testing ACK coverage it is certain how much the s.cep has not (and by testing ACK coverage it is certain how much
the ACE field has wrapped), this invalid protocol transition has to the ACE field has wrapped), this invalid protocol transition has to
be due to some form of feedback mangling. So, the Data Sender MUST be due to some form of feedback mangling. So, the Data Sender MUST
disable sending ECN-capable packets for the remainder of the half- disable sending ECN-capable packets for the remainder of the half-
connection by setting the IP/ECN field in all subsequent packets to connection by setting the IP/ECN field in all subsequent packets to
Not-ECT. Not-ECT.
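The invalid transition described in the last paragraph can be expressed as a predicate over the changes in the two counters. This is a sketch, assuming the caller has already computed the increase in s.cep and s.ceb since the previous ACK and has established, by testing ACK coverage, whether the amount of ACE wrap is certain; the function and parameter names are hypothetical.

```python
def must_disable_ect(delta_cep: int, delta_ceb: int,
                     ace_wrap_certain: bool) -> bool:
    """Return True if the Data Sender has to fall back to Not-ECT.

    delta_cep: increase in the s.cep (CE packet) counter.
    delta_ceb: increase in the s.ceb (CE byte) counter.
    ace_wrap_certain: True only if ACK coverage shows exactly how much
        the ACE field has wrapped, so delta_cep is trustworthy.
    """
    # CE bytes without any CE packet cannot happen legitimately.
    return ace_wrap_certain and delta_ceb > 0 and delta_cep == 0
```

The opposite case (s.cep increases while s.ceb does not) is not flagged, because a CE-marked control packet carrying no data produces exactly that pattern.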
3.2.6. Usage of the AccECN TCP Option 3.2.8. Usage of the AccECN TCP Option
The following rules determine when a Data Receiver in AccECN mode The following rules determine when a Data Receiver in AccECN mode
sends the AccECN TCP Option, and which fields to include: sends the AccECN TCP Option, and which fields to include:
Change-Triggered ACKs: If an arriving packet increments a different Change-Triggered ACKs: If an arriving packet increments a different
byte counter to that incremented by the previous packet, the Data byte counter to that incremented by the previous packet, the Data
Receiver SHOULD immediately send an ACK with an AccECN Option, Receiver MUST immediately send an ACK with an AccECN Option,
without waiting for the next delayed ACK (this is in addition to without waiting for the next delayed ACK (this is in addition to
the safety recommendation in Section 3.2.3 against ambiguity of the safety recommendation in Section 3.2.5 against ambiguity of
the ACE field). Certain offload hardware might not be able to the ACE field).
support change-triggered ACKs, but otherwise it is important to
keep exceptions to this rule to a minimum so that Data Senders can This is stated as a "MUST" so that the data sender can rely on
generally rely on this behaviour; change-triggered ACKs to detect transitions right from the very
start of a flow, without first having to detect whether the
receiver complies. A concern has been raised that certain offload
hardware needed for high performance might not be able to support
change-triggered ACKs, although high performance protocols such as
DCTCP successfully use change-triggered ACKs. One possible
compromise would be for the receiver to heuristically detect
whether the sender is in slow-start, then to implement change-
triggered ACKs in software while the sender is in slow-start, and
offload to hardware otherwise. If the operator disables change-
triggered ACKs, whether partially like this or otherwise, the
operator will also be responsible for ensuring a co-ordinated
sender algorithm is deployed;
Continual Repetition: Otherwise, if arriving packets continue to Continual Repetition: Otherwise, if arriving packets continue to
increment the same byte counter, the Data Receiver can include an increment the same byte counter, the Data Receiver can include an
AccECN Option on most or all (delayed) ACKs, but it does not have AccECN Option on most or all (delayed) ACKs, but it does not have
to. If option space is limited on a particular ACK, the Data to. If option space is limited on a particular ACK, the Data
Receiver MUST give precedence to SACK information about loss. It Receiver MUST give precedence to SACK information about loss. It
SHOULD include an AccECN Option if the r.ceb counter has SHOULD include an AccECN Option if the r.ceb counter has
incremented and it MAY include an AccECN Option if r.e0b or incremented and it MAY include an AccECN Option if r.e0b or
r.e1b has incremented; r.e1b has incremented;
Full-Length Options Preferred: It SHOULD always use full-length Full-Length Options Preferred: It SHOULD always use full-length
AccECN Options. It MAY use shorter AccECN Options if space is AccECN Options. It MAY use shorter AccECN Options if space is
limited, but it MUST include the counter(s) that have incremented limited, but it MUST include the counter(s) that have incremented
since the previous AccECN Option and it MUST only truncate fields since the previous AccECN Option and it MUST only truncate fields
from the right-hand tail of the option to preserve the order of from the right-hand tail of the option to preserve the order of
the remaining fields (see Section 3.2.4); the remaining fields (see Section 3.2.6);
Beaconing Full-Length Options: Nonetheless, it MUST include a full- Beaconing Full-Length Options: Nonetheless, it MUST include a full-
length AccECN TCP Option on at least three ACKs per RTT, or on all length AccECN TCP Option on at least three ACKs per RTT, or on all
ACKs if there are fewer than three per RTT (see Appendix A.4 for an ACKs if there are fewer than three per RTT (see Appendix A.4 for an
example algorithm that satisfies this requirement). example algorithm that satisfies this requirement).
The following example series of arriving IP/ECN fields illustrates The following example series of arriving IP/ECN fields illustrates
when a Data Receiver will emit an ACK if it is using a delayed ACK when a Data Receiver will emit an ACK if it is using a delayed ACK
factor of 2 segments and change-triggered ACKs: 01 -> ACK, 01, 01 -> factor of 2 segments and change-triggered ACKs: 01 -> ACK, 01, 01 ->
ACK, 10 -> ACK, 10, 01 -> ACK, 01, 11 -> ACK, 01 -> ACK. ACK, 10 -> ACK, 10, 01 -> ACK, 01, 11 -> ACK, 01 -> ACK.
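The example series above can be reproduced with a short simulation. This is a sketch under stated assumptions: every arriving packet is taken to carry an ECN-capable codepoint (so each one increments some byte counter), the first packet is treated as a change, and a change-triggered ACK is assumed to reset the delayed-ACK count. The function name is hypothetical.

```python
from typing import List


def ack_decisions(ecn_fields: List[str], delack_factor: int = 2) -> List[bool]:
    """For each arriving IP/ECN field, return True if an ACK is emitted."""
    decisions = []
    previous = None   # ECN field of the previous packet, if any
    unacked = 0       # segments received since the last ACK
    for field in ecn_fields:
        unacked += 1
        changed = field != previous              # change-triggered ACK
        delayed_due = unacked >= delack_factor   # ordinary delayed ACK
        if changed or delayed_due:
            decisions.append(True)
            unacked = 0
        else:
            decisions.append(False)
        previous = field
    return decisions
```

Calling `ack_decisions(["01", "01", "01", "10", "10", "01", "01", "11", "01"])` emits ACKs on the 1st, 3rd, 4th, 6th, 8th and 9th packets, matching the series in the text.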
skipping to change at page 24, line 11 skipping to change at page 26, line 24
transport layer, but attempts to be transparent (invisible) to the transport layer, but attempts to be transparent (invisible) to the
end-to-end connection. A subset of this class of middleboxes end-to-end connection. A subset of this class of middleboxes
attempts to `normalise' the TCP wire protocol by checking that all attempts to `normalise' the TCP wire protocol by checking that all
values in header fields comply with a rather narrow interpretation of values in header fields comply with a rather narrow interpretation of
the TCP specifications. To comply with the present AccECN the TCP specifications. To comply with the present AccECN
specification, such a middlebox MUST NOT change the ACE field or the specification, such a middlebox MUST NOT change the ACE field or the
AccECN Option and it MUST attempt to preserve the timing of each ACK AccECN Option and it MUST attempt to preserve the timing of each ACK
(for example, if it coalesced ACKs it would not be AccECN-compliant). (for example, if it coalesced ACKs it would not be AccECN-compliant).
A middlebox claiming to be transparent at the transport layer MUST A middlebox claiming to be transparent at the transport layer MUST
forward the AccECN TCP Option unaltered, whether or not the length forward the AccECN TCP Option unaltered, whether or not the length
value matches one of those specified in Section 3.2.4, and whether or value matches one of those specified in Section 3.2.6, and whether or
not the initial values of the byte-counter fields are correct. This not the initial values of the byte-counter fields are correct. This
is because blocking apparently invalid values does not improve is because blocking apparently invalid values does not improve
security (because AccECN hosts are required to ignore invalid values security (because AccECN hosts are required to ignore invalid values
anyway), while it prevents the standardised set of values being anyway), while it prevents the standardised set of values being
extended in future (because outdated normalisers would block updated extended in future (because outdated normalisers would block updated
hosts from using the extended AccECN standard). hosts from using the extended AccECN standard).
Hardware to offload certain TCP processing represents another large Hardware to offload certain TCP processing represents another large
class of middleboxes, even though it is often a function of a host's class of middleboxes, even though it is often a function of a host's
network interface and rarely in its own 'box'. Leeway has been network interface and rarely in its own 'box'. Leeway has been
skipping to change at page 24, line 43 skipping to change at page 27, line 8
A TCP server can use SYN Cookies (see Appendix A of [RFC4987]) to A TCP server can use SYN Cookies (see Appendix A of [RFC4987]) to
protect itself from SYN flooding attacks. It places minimal commonly protect itself from SYN flooding attacks. It places minimal commonly
used connection state in the SYN/ACK, and deliberately does not hold used connection state in the SYN/ACK, and deliberately does not hold
any state while waiting for the subsequent ACK (e.g. it closes the any state while waiting for the subsequent ACK (e.g. it closes the
thread). Therefore it cannot record the fact that it entered AccECN thread). Therefore it cannot record the fact that it entered AccECN
mode for both half-connections. Indeed, it cannot even remember mode for both half-connections. Indeed, it cannot even remember
whether it negotiated the use of classic ECN [RFC3168]. whether it negotiated the use of classic ECN [RFC3168].
Nonetheless, such a server can determine that it negotiated AccECN as Nonetheless, such a server can determine that it negotiated AccECN as
follows. If a TCP server using SYN Cookies supports AccECN and if follows. If a TCP server using SYN Cookies supports AccECN and if it
the first segment it receives that at least covers the ISN contains receives a pure ACK that acknowledges an ISN that is a valid SYN
an ACE field with the value 0b110 or 0b111, it can assume that: cookie, and if the ACK contains an ACE field with the value 0b010 to
0b111 (decimal 2 to 7), it can assume that:
o the TCP client must have requested AccECN support on the SYN o the TCP client must have requested AccECN support on the SYN
o it (the server) must have confirmed that it supported AccECN o it (the server) must have confirmed that it supported AccECN
Therefore the server can switch itself into AccECN mode, and continue Therefore the server can switch itself into AccECN mode, and continue
as if it had never forgotten that it switched itself into AccECN mode as if it had never forgotten that it switched itself into AccECN mode
earlier. For other values of ACE field, heuristics to infer what earlier.
other type of ECN the client supports are out of scope.
If the pure ACK that acknowledges a SYN cookie contains an ACE field
with the value 0b000 or 0b001, these values indicate that the client
did not request support for AccECN and therefore the server does not
enter AccECN mode for this connection. Further, 0b001 on the ACK
implies that the server sent an ECN-capable SYN/ACK, which was marked
CE in the network, and the non-AccECN client fed this back by setting
ECE on the ACK of the SYN/ACK.
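The inference a SYN-cookie server makes from the ACE field can be sketched as below, following the wording in the -04 column. The function name and the returned tuple are assumptions made for the example; validation of the SYN cookie itself is taken to have happened already.

```python
from typing import Tuple


def infer_from_cookie_ack(ace: int) -> Tuple[bool, bool]:
    """Interpret the 3-bit ACE field on the pure ACK that acknowledges
    a valid SYN cookie.

    Returns (enter_accecn_mode, classic_ce_feedback), where
    classic_ce_feedback means a non-AccECN client set ECE to report
    that the server's ECN-capable SYN/ACK was CE-marked.
    """
    if not 0 <= ace <= 0b111:
        raise ValueError("ACE is a 3-bit field")
    if ace >= 0b010:
        # 0b010..0b111: client requested AccECN and server confirmed it.
        return (True, False)
    # 0b000 or 0b001: client did not request AccECN.
    return (False, ace == 0b001)
```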
4.2. Compatibility with Other TCP Options and Experiments 4.2. Compatibility with Other TCP Options and Experiments
AccECN is compatible (at least on paper) with the most commonly used AccECN is compatible (at least on paper) with the most commonly used
TCP options: MSS, time-stamp, window scaling, SACK and TCP-AO. It is TCP options: MSS, time-stamp, window scaling, SACK and TCP-AO. It is
also compatible with the recent promising experimental TCP options also compatible with the recent promising experimental TCP options
TCP Fast Open (TFO [RFC7413]) and Multipath TCP (MPTCP [RFC6824]). TCP Fast Open (TFO [RFC7413]) and Multipath TCP (MPTCP [RFC6824]).
AccECN is friendly to all these protocols, because space for TCP AccECN is friendly to all these protocols, because space for TCP
options is particularly scarce on the SYN, where AccECN consumes zero options is particularly scarce on the SYN, where AccECN consumes zero
additional header space. additional header space.
When option space is under pressure from other options, Section 3.2.6 When option space is under pressure from other options, Section 3.2.8
provides guidance on how important it is to send an AccECN Option and provides guidance on how important it is to send an AccECN Option and
whether it needs to be a full-length option. whether it needs to be a full-length option.
4.3. Compatibility with Feedback Integrity Mechanisms 4.3. Compatibility with Feedback Integrity Mechanisms
Three alternative mechanisms are available to assure the integrity of Three alternative mechanisms are available to assure the integrity of
ECN and/or loss signals. AccECN is compatible with any of these ECN and/or loss signals. AccECN is compatible with any of these
approaches: approaches:
o The Data Sender can test the integrity of the receiver's ECN (or o The Data Sender can test the integrity of the receiver's ECN (or
skipping to change at page 28, line 27 skipping to change at page 30, line 47
Forward Compatibility: The behaviour of endpoints and middleboxes is Forward Compatibility: The behaviour of endpoints and middleboxes is
carefully defined for all reserved or currently unused codepoints carefully defined for all reserved or currently unused codepoints
in the scheme, to ensure that any blocking of anomalous values is in the scheme, to ensure that any blocking of anomalous values is
always at least under reversible policy control. always at least under reversible policy control.
6. IANA Considerations 6. IANA Considerations
This document reassigns bit 7 of the TCP header flags to the AccECN This document reassigns bit 7 of the TCP header flags to the AccECN
experiment. This bit was previously called the Nonce Sum (NS) flag experiment. This bit was previously called the Nonce Sum (NS) flag
[RFC3540], but RFC 3540 is being reclassified as historic. The flag [RFC3540], but RFC 3540 is being reclassified as historic
will now be defined as: [I-D.ietf-tsvwg-ecn-experimentation]. The flag will now be defined
as:
+-----+-------------------+-----------+ +-----+-------------------+-----------+
| Bit | Name | Reference | | Bit | Name | Reference |
+-----+-------------------+-----------+ +-----+-------------------+-----------+
| 7 | AE (Accurate ECN) | RFC XXXX | | 7 | AE (Accurate ECN) | RFC XXXX |
+-----+-------------------+-----------+ +-----+-------------------+-----------+
[TO BE REMOVED: This registration should take place at the following [TO BE REMOVED: This registration should take place at the following
location: https://www.iana.org/assignments/tcp-header-flags/tcp- location: https://www.iana.org/assignments/tcp-header-flags/tcp-
header-flags.xhtml#tcp-header-flags-1 ] header-flags.xhtml#tcp-header-flags-1 ]
skipping to change at page 29, line 4 skipping to change at page 31, line 28
+------+--------+-----------------------+-----------+ +------+--------+-----------------------+-----------+
| Kind | Length | Meaning | Reference | | Kind | Length | Meaning | Reference |
+------+--------+-----------------------+-----------+ +------+--------+-----------------------+-----------+
| TBD1 | N | Accurate ECN (AccECN) | RFC XXXX | | TBD1 | N | Accurate ECN (AccECN) | RFC XXXX |
+------+--------+-----------------------+-----------+ +------+--------+-----------------------+-----------+
[TO BE REMOVED: This registration should take place at the following [TO BE REMOVED: This registration should take place at the following
location: http://www.iana.org/assignments/tcp-parameters/tcp- location: http://www.iana.org/assignments/tcp-parameters/tcp-
parameters.xhtml#tcp-parameters-1 ] parameters.xhtml#tcp-parameters-1 ]
Early implementation before the IANA allocation MUST follow [RFC6994] Early implementation before the IANA allocation MUST follow [RFC6994]
and use experimental option 254 and magic number 0xACCE (16 bits), and use experimental option 254 and magic number 0xACCE (16 bits),
then migrate to the new option after the allocation. then migrate to the new option after the allocation.
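As an illustration of the early-implementation encoding, the sketch below wraps an AccECN Option body in the shared experimental option format of [RFC6994], using Kind 254 and the 16-bit magic number 0xACCE. The function name is hypothetical, and the body is treated as opaque bytes here; its field layout is the one defined for the AccECN Option elsewhere in this document.

```python
import struct

EXPERIMENTAL_KIND = 254     # shared experimental TCP option kind
ACCECN_MAGIC = 0xACCE       # 16-bit experiment identifier
MAX_TCP_OPTION_SPACE = 40   # bytes available for all TCP options


def experimental_accecn_option(body: bytes) -> bytes:
    """Prefix an opaque AccECN Option body with Kind, Length and ExID."""
    length = 4 + len(body)  # Kind (1) + Length (1) + ExID (2) + body
    if length > MAX_TCP_OPTION_SPACE:
        raise ValueError("option does not fit in TCP option space")
    return struct.pack("!BBH", EXPERIMENTAL_KIND, length, ACCECN_MAGIC) + body
```

After the IANA allocation, only the header would change: the allocated Kind replaces 254 and the two magic-number bytes are dropped.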
7. Security Considerations 7. Security Considerations
If ever the supplementary part of AccECN based on the new AccECN TCP If ever the supplementary part of AccECN based on the new AccECN TCP
Option is unusable (due for example to middlebox interference) the Option is unusable (due for example to middlebox interference) the
essential part of AccECN's congestion feedback offers only limited essential part of AccECN's congestion feedback offers only limited
resilience to long runs of ACK loss (see Section 3.2.3). These resilience to long runs of ACK loss (see Section 3.2.5). These
problems are unlikely to be due to malicious intervention (because if problems are unlikely to be due to malicious intervention (because if
an attacker could strip a TCP option or discard a long run of ACKs it an attacker could strip a TCP option or discard a long run of ACKs it
could wreak other arbitrary havoc). However, it would be of concern could wreak other arbitrary havoc). However, it would be of concern
if AccECN's resilience could be indirectly compromised during a if AccECN's resilience could be indirectly compromised during a
flooding attack. AccECN is still considered safe though, because if flooding attack. AccECN is still considered safe though, because if
the option is not presented, the AccECN Data Sender is then required the option is not presented, the AccECN Data Sender is then required
to switch to more conservative assumptions about wrap of congestion to switch to more conservative assumptions about wrap of congestion
indication counters (see Section 3.2.3 and Appendix A.2). indication counters (see Section 3.2.5 and Appendix A.2).
Section 4.1 describes how a TCP server can negotiate AccECN and use Section 4.1 describes how a TCP server can negotiate AccECN and use
the SYN cookie method for mitigating SYN flooding attacks. the SYN cookie method for mitigating SYN flooding attacks.
There is concern that ECN markings could be altered or suppressed, There is concern that ECN markings could be altered or suppressed,
particularly because a misbehaving Data Receiver could increase its particularly because a misbehaving Data Receiver could increase its
own throughput at the expense of others. AccECN is compatible with own throughput at the expense of others. AccECN is compatible with
the three schemes known to assure the integrity of ECN feedback (see the three schemes known to assure the integrity of ECN feedback (see
Section 4.3 for details). If the AccECN Option is stripped by an Section 4.3 for details). If the AccECN Option is stripped by an
incorrectly implemented middlebox, the resolution of the feedback incorrectly implemented middlebox, the resolution of the feedback
will be degraded, but the integrity of this degraded information can will be degraded, but the integrity of this degraded information can
still be assured. still be assured.
There is a potential concern that a receiver could deliberately omit
the AccECN Option, pretending that it had been stripped by a
middlebox. No way of exploiting this downgrade attack is currently
known, but it is mentioned here in case someone else can contrive
one.
The AccECN protocol is not believed to introduce any new privacy The AccECN protocol is not believed to introduce any new privacy
concerns, because it merely counts and feeds back signals at the concerns, because it merely counts and feeds back signals at the
transport layer that had already been visible at the IP layer. transport layer that had already been visible at the IP layer.
8. Acknowledgements 8. Acknowledgements
We want to thank Koen De Schepper, Praveen Balasubramanian and We want to thank Koen De Schepper, Praveen Balasubramanian and
Michael Welzl for their input and discussion. The idea of using the Michael Welzl for their input and discussion. The idea of using the
three ECN-related TCP flags as one field for more accurate TCP-ECN three ECN-related TCP flags as one field for more accurate TCP-ECN
feedback was first introduced in the re-ECN protocol that was the feedback was first introduced in the re-ECN protocol that was the
skipping to change at page 30, line 18 skipping to change at page 33, line 4
for Education, Research, and Innovation under contract no. 15.0268. for Education, Research, and Innovation under contract no. 15.0268.
This support does not imply endorsement. This support does not imply endorsement.
9. Comments Solicited 9. Comments Solicited
Comments and questions are encouraged and very welcome. They can be Comments and questions are encouraged and very welcome. They can be
addressed to the IETF TCP maintenance and minor modifications working addressed to the IETF TCP maintenance and minor modifications working
group mailing list <tcpm@ietf.org>, and/or to the authors. group mailing list <tcpm@ietf.org>, and/or to the authors.
10. References 10. References
10.1. Normative References 10.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997, DOI 10.17487/RFC2119, March 1997,
<http://www.rfc-editor.org/info/rfc2119>. <https://www.rfc-editor.org/info/rfc2119>.
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP", of Explicit Congestion Notification (ECN) to IP",
RFC 3168, DOI 10.17487/RFC3168, September 2001, RFC 3168, DOI 10.17487/RFC3168, September 2001,
<http://www.rfc-editor.org/info/rfc3168>. <https://www.rfc-editor.org/info/rfc3168>.
[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
<http://www.rfc-editor.org/info/rfc5681>. <https://www.rfc-editor.org/info/rfc5681>.
[RFC6994] Touch, J., "Shared Use of Experimental TCP Options", [RFC6994] Touch, J., "Shared Use of Experimental TCP Options",
RFC 6994, DOI 10.17487/RFC6994, August 2013, RFC 6994, DOI 10.17487/RFC6994, August 2013,
<http://www.rfc-editor.org/info/rfc6994>. <https://www.rfc-editor.org/info/rfc6994>.
10.2. Informative References 10.2. Informative References
[I-D.bagnulo-tcpm-generalized-ecn]
Bagnulo, M. and B. Briscoe, "ECN++: Adding Explicit
Congestion Notification (ECN) to TCP Control Packets",
draft-bagnulo-tcpm-generalized-ecn-04 (work in progress),
May 2017.
[I-D.ietf-tcpm-alternativebackoff-ecn] [I-D.ietf-tcpm-alternativebackoff-ecn]
Khademi, N., Welzl, M., Armitage, G., and G. Fairhurst, Khademi, N., Welzl, M., Armitage, G., and G. Fairhurst,
"TCP Alternative Backoff with ECN (ABE)", draft-ietf-tcpm- "TCP Alternative Backoff with ECN (ABE)", draft-ietf-tcpm-
alternativebackoff-ecn-01 (work in progress), May 2017. alternativebackoff-ecn-02 (work in progress), October
2017.
[I-D.ietf-tcpm-dctcp] [I-D.ietf-tcpm-generalized-ecn]
Bensley, S., Thaler, D., Balasubramanian, P., Eggert, L., Bagnulo, M. and B. Briscoe, "ECN++: Adding Explicit
and G. Judd, "Datacenter TCP (DCTCP): TCP Congestion Congestion Notification (ECN) to TCP Control Packets",
Control for Datacenters", draft-ietf-tcpm-dctcp-06 (work draft-ietf-tcpm-generalized-ecn-01 (work in progress),
in progress), May 2017. September 2017.
[I-D.ietf-tsvwg-ecn-experimentation] [I-D.ietf-tsvwg-ecn-experimentation]
Black, D., "Explicit Congestion Notification (ECN) Black, D., "Relaxing Restrictions on Explicit Congestion
Experimentation", draft-ietf-tsvwg-ecn-experimentation-02 Notification (ECN) Experimentation", draft-ietf-tsvwg-ecn-
(work in progress), April 2017. experimentation-07 (work in progress), October 2017.
[I-D.ietf-tsvwg-l4s-arch] [I-D.ietf-tsvwg-l4s-arch]
Briscoe, B., Schepper, K., and M. Bagnulo, "Low Latency, Briscoe, B., Schepper, K., and M. Bagnulo, "Low Latency,
Low Loss, Scalable Throughput (L4S) Internet Service: Low Loss, Scalable Throughput (L4S) Internet Service:
Architecture", draft-ietf-tsvwg-l4s-arch-00 (work in Architecture", draft-ietf-tsvwg-l4s-arch-00 (work in
progress), May 2017. progress), May 2017.
[I-D.kuehlewind-tcpm-ecn-fallback] [I-D.kuehlewind-tcpm-ecn-fallback]
Kuehlewind, M. and B. Trammell, "A Mechanism for ECN Path Kuehlewind, M. and B. Trammell, "A Mechanism for ECN Path
Probing and Fallback", draft-kuehlewind-tcpm-ecn- Probing and Fallback", draft-kuehlewind-tcpm-ecn-
fallback-01 (work in progress), September 2013. fallback-01 (work in progress), September 2013.
[I-D.moncaster-tcpm-rcv-cheat] [I-D.moncaster-tcpm-rcv-cheat]
Moncaster, T., Briscoe, B., and A. Jacquet, "A TCP Test to Moncaster, T., Briscoe, B., and A. Jacquet, "A TCP Test to
Allow Senders to Identify Receiver Non-Compliance", draft- Allow Senders to Identify Receiver Non-Compliance", draft-
moncaster-tcpm-rcv-cheat-03 (work in progress), July 2014. moncaster-tcpm-rcv-cheat-03 (work in progress), July 2014.
[Mandalari18]
Mandalari, A., Lutu, A., Briscoe, B., Bagnulo, M., and Oe.
Alay, "Measuring ECN++: Good News for ++, Bad News for ECN
over Mobile", IEEE Communications Magazine , March 2018.
(to appear)
[RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit
Congestion Notification (ECN) Signaling with Nonces", Congestion Notification (ECN) Signaling with Nonces",
RFC 3540, DOI 10.17487/RFC3540, June 2003, RFC 3540, DOI 10.17487/RFC3540, June 2003,
<http://www.rfc-editor.org/info/rfc3540>. <https://www.rfc-editor.org/info/rfc3540>.
[RFC4987] Eddy, W., "TCP SYN Flooding Attacks and Common [RFC4987] Eddy, W., "TCP SYN Flooding Attacks and Common
Mitigations", RFC 4987, DOI 10.17487/RFC4987, August 2007, Mitigations", RFC 4987, DOI 10.17487/RFC4987, August 2007,
<http://www.rfc-editor.org/info/rfc4987>. <https://www.rfc-editor.org/info/rfc4987>.
[RFC5562] Kuzmanovic, A., Mondal, A., Floyd, S., and K. [RFC5562] Kuzmanovic, A., Mondal, A., Floyd, S., and K.
Ramakrishnan, "Adding Explicit Congestion Notification Ramakrishnan, "Adding Explicit Congestion Notification
(ECN) Capability to TCP's SYN/ACK Packets", RFC 5562, (ECN) Capability to TCP's SYN/ACK Packets", RFC 5562,
DOI 10.17487/RFC5562, June 2009, DOI 10.17487/RFC5562, June 2009,
<http://www.rfc-editor.org/info/rfc5562>. <https://www.rfc-editor.org/info/rfc5562>.
[RFC5925] Touch, J., Mankin, A., and R. Bonica, "The TCP [RFC5925] Touch, J., Mankin, A., and R. Bonica, "The TCP
Authentication Option", RFC 5925, DOI 10.17487/RFC5925, Authentication Option", RFC 5925, DOI 10.17487/RFC5925,
June 2010, <http://www.rfc-editor.org/info/rfc5925>. June 2010, <https://www.rfc-editor.org/info/rfc5925>.
[RFC6824] Ford, A., Raiciu, C., Handley, M., and O. Bonaventure, [RFC6824] Ford, A., Raiciu, C., Handley, M., and O. Bonaventure,
"TCP Extensions for Multipath Operation with Multiple "TCP Extensions for Multipath Operation with Multiple
Addresses", RFC 6824, DOI 10.17487/RFC6824, January 2013, Addresses", RFC 6824, DOI 10.17487/RFC6824, January 2013,
<http://www.rfc-editor.org/info/rfc6824>. <https://www.rfc-editor.org/info/rfc6824>.
[RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP [RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP
Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014, Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014,
<http://www.rfc-editor.org/info/rfc7413>. <https://www.rfc-editor.org/info/rfc7413>.
[RFC7560] Kuehlewind, M., Ed., Scheffenegger, R., and B. Briscoe, [RFC7560] Kuehlewind, M., Ed., Scheffenegger, R., and B. Briscoe,
"Problem Statement and Requirements for Increased Accuracy "Problem Statement and Requirements for Increased Accuracy
in Explicit Congestion Notification (ECN) Feedback", in Explicit Congestion Notification (ECN) Feedback",
RFC 7560, DOI 10.17487/RFC7560, August 2015, RFC 7560, DOI 10.17487/RFC7560, August 2015,
<http://www.rfc-editor.org/info/rfc7560>. <https://www.rfc-editor.org/info/rfc7560>.
[RFC7713] Mathis, M. and B. Briscoe, "Congestion Exposure (ConEx) [RFC7713] Mathis, M. and B. Briscoe, "Congestion Exposure (ConEx)
Concepts, Abstract Mechanism, and Requirements", RFC 7713, Concepts, Abstract Mechanism, and Requirements", RFC 7713,
DOI 10.17487/RFC7713, December 2015, DOI 10.17487/RFC7713, December 2015,
<http://www.rfc-editor.org/info/rfc7713>. <https://www.rfc-editor.org/info/rfc7713>.
[RFC8257] Bensley, S., Thaler, D., Balasubramanian, P., Eggert, L.,
and G. Judd, "Data Center TCP (DCTCP): TCP Congestion
Control for Data Centers", RFC 8257, DOI 10.17487/RFC8257,
October 2017, <https://www.rfc-editor.org/info/rfc8257>.
Appendix A. Example Algorithms Appendix A. Example Algorithms
This appendix is informative, not normative. It gives example This appendix is informative, not normative. It gives example
algorithms that would satisfy the normative requirements of the algorithms that would satisfy the normative requirements of the
AccECN protocol. However, implementers are free to choose other ways AccECN protocol. However, implementers are free to choose other ways
to implement the requirements. to implement the requirements.
A.1. Example Algorithm to Encode/Decode the AccECN Option A.1. Example Algorithm to Encode/Decode the AccECN Option
skipping to change at page 34, line 22 skipping to change at page 37, line 22
The example algorithms below show how a Data Receiver in AccECN mode The example algorithms below show how a Data Receiver in AccECN mode
could encode its CE packet counter r.cep into the ACE field, and how could encode its CE packet counter r.cep into the ACE field, and how
the Data Sender in AccECN mode could decode the ACE field into its the Data Sender in AccECN mode could decode the ACE field into its
s.cep counter. The Data Sender's algorithm includes code to s.cep counter. The Data Sender's algorithm includes code to
heuristically detect a long enough unbroken string of ACK losses that heuristically detect a long enough unbroken string of ACK losses that
could have concealed a cycle of the congestion counter in the ACE could have concealed a cycle of the congestion counter in the ACE
field of the next ACK to arrive. field of the next ACK to arrive.
Two variants of the algorithm are given: i) a more conservative Two variants of the algorithm are given: i) a more conservative
variant for a Data Sender to use if it detects that the AccECN Option variant for a Data Sender to use if it detects that the AccECN Option
is not available (see Section 3.2.3 and Section 3.2.5); and ii) a is not available (see Section 3.2.5 and Section 3.2.7); and ii) a
less conservative variant that is feasible when complementary less conservative variant that is feasible when complementary
information is available from the AccECN Option. information is available from the AccECN Option.
A.2.1. Safety Algorithm without the AccECN Option A.2.1. Safety Algorithm without the AccECN Option
It is assumed that each local packet counter is a sufficiently sized It is assumed that each local packet counter is a sufficiently sized
unsigned integer (probably 32b) and that the following constant has unsigned integer (probably 32b) and that the following constant has
been assigned: been assigned:
DIVACE = 2^3 DIVACE = 2^3
skipping to change at page 35, line 10 skipping to change at page 38, line 10
and the Data Sender has to ignore the AccECN Option. If newlyAckedB and the Data Sender has to ignore the AccECN Option. If newlyAckedB
is zero, to break the tie the Data Sender could use timestamps (if is zero, to break the tie the Data Sender could use timestamps (if
present) to work out newlyAckedT, the amount of new time that the ACK present) to work out newlyAckedT, the amount of new time that the ACK
acknowledges. Then the Data Sender calculates the minimum difference acknowledges. Then the Data Sender calculates the minimum difference
d.cep between the ACE field and its local s.cep counter, using modulo d.cep between the ACE field and its local s.cep counter, using modulo
arithmetic as follows: arithmetic as follows:
if ((newlyAckedB > 0) || (newlyAckedB == 0 && newlyAckedT > 0)) if ((newlyAckedB > 0) || (newlyAckedB == 0 && newlyAckedT > 0))
d.cep = (ACE + DIVACE - (s.cep % DIVACE)) % DIVACE d.cep = (ACE + DIVACE - (s.cep % DIVACE)) % DIVACE
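The modulo arithmetic above can be sketched in Python as follows. This is only an illustration: the function name min_cep_delta and the use of plain integers for ACE and s.cep are assumptions made for the example, not part of the draft.

```python
DIVACE = 2 ** 3  # the ACE field is 3 bits wide, so it wraps every 8 CE marks


def min_cep_delta(ace, s_cep):
    """Smallest non-negative d.cep with (s_cep + d.cep) % DIVACE == ace.

    Hypothetical helper; it mirrors the d.cep expression in the text.
    """
    return (ace + DIVACE - (s_cep % DIVACE)) % DIVACE


# The low 3 bits of s.cep = 13 are 5, so an arriving ACE of 1 implies
# that at least 4 further CE marks have been counted by the receiver.
print(min_cep_delta(1, 13))
```

The result is only the minimum consistent with the ACE field; whether the field might also have wrapped is the subject of the safety discussion that follows.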
Section 3.2.3 requires the Data Sender to assume that the ACE field Section 3.2.5 requires the Data Sender to assume that the ACE field
did cycle if it could have cycled under prevailing conditions. The did cycle if it could have cycled under prevailing conditions. The
3-bit ACE field in an arriving ACK could have cycled and become 3-bit ACE field in an arriving ACK could have cycled and become
ambiguous to the Data Sender if a row of ACKs goes missing that ambiguous to the Data Sender if a row of ACKs goes missing that
covers a stream of data long enough to contain 8 or more CE marks. covers a stream of data long enough to contain 8 or more CE marks.
We use the word `missing' rather than `lost', because some or all the We use the word `missing' rather than `lost', because some or all the
missing ACKs might arrive eventually, but out of order. Even if some missing ACKs might arrive eventually, but out of order. Even if some
of the lost ACKs are piggy-backed on data (i.e. not pure ACKs), of the lost ACKs are piggy-backed on data (i.e. not pure ACKs),
retransmissions will not repair the lost AccECN information, because retransmissions will not repair the lost AccECN information, because
AccECN requires retransmissions to carry the latest AccECN counters, AccECN requires retransmissions to carry the latest AccECN counters,
not the original ones. not the original ones.
skipping to change at page 36, line 14 skipping to change at page 39, line 14
The simple algorithm for dSafer.cep above requires no monitoring of The simple algorithm for dSafer.cep above requires no monitoring of
prevailing conditions and it would still be safe if, for example, prevailing conditions and it would still be safe if, for example,
segments were on average at least 5% of full-sized as long as ECN segments were on average at least 5% of full-sized as long as ECN
marking was 5% or less. Assuming it was used, the Data Sender would marking was 5% or less. Assuming it was used, the Data Sender would
increment its packet counter as follows: increment its packet counter as follows:
s.cep += dSafer.cep s.cep += dSafer.cep
If missing acknowledgement numbers arrive later (due to reordering), If missing acknowledgement numbers arrive later (due to reordering),
Section 3.2.3 says "the Data Sender MAY attempt to neutralise the Section 3.2.5 says "the Data Sender MAY attempt to neutralise the
effect of any action it took based on a conservative assumption that effect of any action it took based on a conservative assumption that
it later found to be incorrect". To do this, the Data Sender would it later found to be incorrect". To do this, the Data Sender would
have to store the values of all the relevant variables whenever it have to store the values of all the relevant variables whenever it
made assumptions, so that it could re-evaluate them later. Given made assumptions, so that it could re-evaluate them later. Given
this could become complex and it is not required, we do not attempt this could become complex and it is not required, we do not attempt
to provide an example of how to do this. to provide an example of how to do this.
A.2.2. Safety Algorithm with the AccECN Option A.2.2. Safety Algorithm with the AccECN Option
When the AccECN Option is available on the ACKs before and after the When the AccECN Option is available on the ACKs before and after the
skipping to change at page 38, line 31 skipping to change at page 41, line 31
s_ave = a * s + (1-a) * s_ave, s_ave = a * s + (1-a) * s_ave,
where a is the decay constant for the EWMA. However, then it is where a is the decay constant for the EWMA. However, then it is
necessary to choose a good value for this constant, which ought to necessary to choose a good value for this constant, which ought to
depend on the number of packets in flight. Also the decay constant depend on the number of packets in flight. Also the decay constant
needs to be a power of two to avoid floating point arithmetic. needs to be a power of two to avoid floating point arithmetic.
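One way to see why a power-of-two decay constant avoids floating point is the sketch below. It assumes a = 1 / 2^shift and rewrites the update as s_ave + a * (s - s_ave), so the multiplication becomes a right shift. The function name ewma_update and the value shift = 3 are assumptions chosen for illustration; the draft does not pick a value.

```python
def ewma_update(s_ave, s, shift):
    """One EWMA step with decay constant a = 1 / 2**shift, integers only.

    Algebraically the same as s_ave = a * s + (1 - a) * s_ave, except that
    the shifted difference is rounded down (Python's >> floors negatives).
    """
    return s_ave + ((s - s_ave) >> shift)


s_ave = 1000
for s in (1400, 1400, 600):  # hypothetical segment-size samples
    s_ave = ewma_update(s_ave, s, shift=3)
print(s_ave)
```

Because the shifted difference is rounded down, differences smaller than 2^shift are handled coarsely; an implementation that needs finer resolution could keep s_ave in a scaled fixed-point form.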
A.4. Example Algorithm to Beacon AccECN Options A.4. Example Algorithm to Beacon AccECN Options
Section 3.2.6 requires a Data Receiver to beacon a full-length AccECN Section 3.2.8 requires a Data Receiver to beacon a full-length AccECN
Option at least 3 times per RTT. This could be implemented by Option at least 3 times per RTT. This could be implemented by
maintaining a variable to store the number of ACKs (pure and data maintaining a variable to store the number of ACKs (pure and data
ACKs) since a full AccECN Option was last sent and another for the ACKs) since a full AccECN Option was last sent and another for the
approximate number of ACKs sent in the last round trip time: approximate number of ACKs sent in the last round trip time:
if (acks_since_full_last_sent > acks_in_round / BEACON_FREQ) if (acks_since_full_last_sent > acks_in_round / BEACON_FREQ)
send_full_AccECN_Option() send_full_AccECN_Option()
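The test above could sit inside per-ACK bookkeeping along the following lines. This is a sketch under stated assumptions: the class name BeaconState, the method on_ack_sent, and the choice to increment the counter before the comparison and reset it afterwards are all illustrative, and the point at which the counter is updated affects the exact spacing of full options.

```python
class BeaconState:
    """Hypothetical per-connection state for beaconing full AccECN Options."""

    def __init__(self, acks_in_round, beacon_freq):
        self.acks_in_round = acks_in_round    # approx. ACKs sent in the last RTT
        self.beacon_freq = beacon_freq        # divisor used in the test
        self.acks_since_full_last_sent = 0

    def on_ack_sent(self):
        """Return True if this ACK should carry a full-length AccECN Option."""
        self.acks_since_full_last_sent += 1
        if self.acks_since_full_last_sent > self.acks_in_round // self.beacon_freq:
            self.acks_since_full_last_sent = 0
            return True
        return False


state = BeaconState(acks_in_round=12, beacon_freq=4)
full_on = [n for n in range(1, 13) if state.on_ack_sent()]
print(full_on)
```

With these particular assumptions, 12 ACKs per round and a divisor of 4, the full option is sent on every fourth ACK.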
For optimised integer arithmetic, BEACON_FREQ = 4 could be used, For optimised integer arithmetic, BEACON_FREQ = 4 could be used,
rather than 3, so that the division could be implemented as an rather than 3, so that the division could be implemented as an
skipping to change at page 39, line 37 skipping to change at page 42, line 37
under-counting. under-counting.
However, such precision is unlikely to be necessary. The only known However, such precision is unlikely to be necessary. The only known
use of a count of Not-ECT marked bytes is to test whether equipment use of a count of Not-ECT marked bytes is to test whether equipment
on the path is clearing the ECN field (perhaps due to an out-dated on the path is clearing the ECN field (perhaps due to an out-dated
attempt to clear, or bleach, what used to be the ToS field). To attempt to clear, or bleach, what used to be the ToS field). To
detect bleaching it will be sufficient to detect whether nearly all detect bleaching it will be sufficient to detect whether nearly all
bytes arrive marked as Not-ECT. Therefore there should be no need to bytes arrive marked as Not-ECT. Therefore there should be no need to
keep track of the details of retransmissions. keep track of the details of retransmissions.
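A coarse bleaching check of the kind described could look like the sketch below. The function name looks_bleached, the byte-counter arguments and the 95% threshold are assumptions for illustration only; the draft says "nearly all" without giving a figure.

```python
def looks_bleached(not_ect_bytes, total_bytes, threshold_percent=95):
    """Heuristic: True if nearly all received bytes arrived marked Not-ECT.

    threshold_percent is a hypothetical tuning value, not taken from the draft.
    """
    if total_bytes == 0:
        return False  # nothing received yet, so no evidence either way
    # Integer comparison, equivalent to not_ect_bytes / total_bytes >= threshold
    return not_ect_bytes * 100 >= total_bytes * threshold_percent


print(looks_bleached(not_ect_bytes=990, total_bytes=1000))
print(looks_bleached(not_ect_bytes=500, total_bytes=1000))
```

Since only a gross proportion is tested, small counting errors caused by retransmissions would not change the outcome.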
Appendix B. Alternative Design Choices (To Be Removed Before
Publication)
This appendix is informative, not normative. It records alternative
designs that the authors chose not to include in the normative
specification, but which the IETF might wish to consider for
inclusion:
Feedback all four ECN codepoints on the SYN/ACK: The last two
negotiation combinations in Table 2 could be used to indicate
AccECN support while also feeding back that the arriving SYN was
ECT(0) or ECT(1). This could be used to probe the client to
server path for incorrect forwarding of the ECN field
[I-D.kuehlewind-tcpm-ecn-fallback].
Feedback all four ECN codepoints on the First ACK: To probe the
server to client path for incorrect ECN forwarding, it could be
useful to have four feedback states on the first ACK from the TCP
client. This could be achieved by assigning four combinations of
the ECN flags in the main TCP header, and only initializing the
ACE field on subsequent segments.
Appendix C. Open Protocol Design Issues (To Be Removed Before
Publication)
1. Currently it is specified that the receiver `SHOULD' use Change-
Triggered ACKs. It is controversial whether this ought to be a
`MUST' instead. A `SHOULD' would leave the Data Sender uncertain
whether it can rely on the timing and ordering information in
ACKs. If the sender guesses wrongly, it will probably introduce
at least 1 RTT of delay before it can use this timing
information. Ironically it will most likely be wanting this
information to reduce ramp-up delay. A `MUST' could make it hard
to implement AccECN in offload hardware. However, it is not
known whether AccECN would be hard to implement in such hardware
even with a `SHOULD' here. For instance, was it hard to offload
DCTCP to hardware because of change-triggered ACKs, or was this
just one of many reasons? The choice between MUST and SHOULD
here is critical. Before that choice is made, a clear use-case
for certainty of timing and ordering information is needed, plus
well-informed discussion about hardware offload constraints.
2. There is possibly a concern that a receiver could deliberately
omit the AccECN Option pretending that it had been stripped by a
middlebox. No known way can yet be contrived to take advantage
of this downgrade attack, but it is mentioned here in case
someone else can contrive one.
Appendix D. Changes in This Version (To Be Removed Before Publication)
The difference between any pair of versions can be displayed at
http://datatracker.ietf.org/doc/draft-kuehlewind-tcpm-accurate-ecn/
history/
Authors' Addresses Authors' Addresses
Bob Briscoe Bob Briscoe
Simula Research Laboratory CableLabs
UK
EMail: ietf@bobbriscoe.net EMail: ietf@bobbriscoe.net
URI: http://bobbriscoe.net/ URI: http://bobbriscoe.net/
Mirja Kuehlewind Mirja Kuehlewind
ETH Zurich ETH Zurich
Zurich Zurich
Switzerland Switzerland
EMail: mirja.kuehlewind@tik.ee.ethz.ch EMail: mirja.kuehlewind@tik.ee.ethz.ch
Richard Scheffenegger Richard Scheffenegger
Vienna Vienna
Austria Austria
EMail: rscheff@gmx.at EMail: rscheff@gmx.at
 End of changes. 95 change blocks. 
278 lines changed or deleted 350 lines changed or added
