draft-ietf-tsvwg-ecn-encap-guidelines-03.txt   draft-ietf-tsvwg-ecn-encap-guidelines-04.txt 
Transport Area Working Group B. Briscoe Transport Area Working Group B. Briscoe
Internet-Draft Simula Research Laboratory Internet-Draft Simula Research Laboratory
Updates: 3819 (if approved) J. Kaippallimalil Updates: 3819 (if approved) J. Kaippallimalil
Intended status: Best Current Practice Huawei Intended status: Best Current Practice Huawei
Expires: March 28, 2016 P. Thaler Expires: April 10, 2016 P. Thaler
Broadcom Corporation Broadcom Corporation
September 25, 2015 October 8, 2015
Guidelines for Adding Congestion Notification to Protocols that Guidelines for Adding Congestion Notification to Protocols that
Encapsulate IP Encapsulate IP
draft-ietf-tsvwg-ecn-encap-guidelines-03 draft-ietf-tsvwg-ecn-encap-guidelines-04
Abstract Abstract
The purpose of this document is to guide the design of congestion The purpose of this document is to guide the design of congestion
notification in any lower layer or tunnelling protocol that notification in any lower layer or tunnelling protocol that
encapsulates IP. The aim is for explicit congestion signals to encapsulates IP. The aim is for explicit congestion signals to
propagate consistently from lower layer protocols into IP. Then the propagate consistently from lower layer protocols into IP. Then the
IP internetwork layer can act as a portability layer to carry IP internetwork layer can act as a portability layer to carry
congestion notification from non-IP-aware congested nodes up to the congestion notification from non-IP-aware congested nodes up to the
transport layer (L4). Following these guidelines should assure transport layer (L4). Following these guidelines should assure
skipping to change at page 1, line 42 skipping to change at page 1, line 42
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on March 28, 2016. This Internet-Draft will expire on April 10, 2016.
Copyright Notice Copyright Notice
Copyright (c) 2015 IETF Trust and the persons identified as the Copyright (c) 2015 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 29 skipping to change at page 2, line 29
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 6 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 6
3. Guidelines in All Cases . . . . . . . . . . . . . . . . . . . 7 3. Guidelines in All Cases . . . . . . . . . . . . . . . . . . . 7
4. Modes of Operation . . . . . . . . . . . . . . . . . . . . . 7 4. Modes of Operation . . . . . . . . . . . . . . . . . . . . . 7
4.1. Feed-Forward-and-Up Mode . . . . . . . . . . . . . . . . 8 4.1. Feed-Forward-and-Up Mode . . . . . . . . . . . . . . . . 8
4.2. Feed-Up-and-Forward Mode . . . . . . . . . . . . . . . . 10 4.2. Feed-Up-and-Forward Mode . . . . . . . . . . . . . . . . 10
4.3. Feed-Backward Mode . . . . . . . . . . . . . . . . . . . 10 4.3. Feed-Backward Mode . . . . . . . . . . . . . . . . . . . 10
4.4. Null Mode . . . . . . . . . . . . . . . . . . . . . . . . 12 4.4. Null Mode . . . . . . . . . . . . . . . . . . . . . . . . 12
5. Feed-Forward-and-Up Mode: Guidelines for Adding Congestion 5. Feed-Forward-and-Up Mode: Guidelines for Adding Congestion
Notification . . . . . . . . . . . . . . . . . . . . . . . . 12 Notification . . . . . . . . . . . . . . . . . . . . . . . . 12
5.1. IP-in-IP Tunnels with Tightly Coupled Shim Headers . . . 13 5.1. IP-in-IP Tunnels with Tightly Coupled Shim Headers . . . 13
5.2. Wire Protocol Design: Indication of ECN Support . . . . . 13 5.2. Wire Protocol Design: Indication of ECN Support . . . . . 14
5.3. Encapsulation Guidelines . . . . . . . . . . . . . . . . 15 5.3. Encapsulation Guidelines . . . . . . . . . . . . . . . . 15
5.4. Decapsulation Guidelines . . . . . . . . . . . . . . . . 17 5.4. Decapsulation Guidelines . . . . . . . . . . . . . . . . 17
5.5. Sequences of Similar Tunnels or Subnets . . . . . . . . . 18 5.5. Sequences of Similar Tunnels or Subnets . . . . . . . . . 18
5.6. Reframing and Congestion Markings . . . . . . . . . . . . 19 5.6. Reframing and Congestion Markings . . . . . . . . . . . . 19
6. Feed-Up-and-Forward Mode: Guidelines for Adding Congestion 6. Feed-Up-and-Forward Mode: Guidelines for Adding Congestion
Notification . . . . . . . . . . . . . . . . . . . . . . . . 19 Notification . . . . . . . . . . . . . . . . . . . . . . . . 20
7. Feed-Backward Mode: Guidelines for Adding Congestion 7. Feed-Backward Mode: Guidelines for Adding Congestion
Notification . . . . . . . . . . . . . . . . . . . . . . . . 21 Notification . . . . . . . . . . . . . . . . . . . . . . . . 21
8. IANA Considerations (to be removed by RFC Editor) . . . . . . 22 8. IANA Considerations (to be removed by RFC Editor) . . . . . . 22
9. Security Considerations . . . . . . . . . . . . . . . . . . . 22 9. Security Considerations . . . . . . . . . . . . . . . . . . . 22
10. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 22 10. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 23
11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 23 11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 23
12. Comments Solicited . . . . . . . . . . . . . . . . . . . . . 23 12. Comments Solicited . . . . . . . . . . . . . . . . . . . . . 23
13. References . . . . . . . . . . . . . . . . . . . . . . . . . 23 13. References . . . . . . . . . . . . . . . . . . . . . . . . . 24
13.1. Normative References . . . . . . . . . . . . . . . . . . 23 13.1. Normative References . . . . . . . . . . . . . . . . . . 24
13.2. Informative References . . . . . . . . . . . . . . . . . 24 13.2. Informative References . . . . . . . . . . . . . . . . . 24
Appendix A. Outstanding Document Issues . . . . . . . . . . . . 28 Appendix A. Outstanding Document Issues . . . . . . . . . . . . 28
Appendix B. Changes in This Version (to be removed by RFC Appendix B. Changes in This Version (to be removed by RFC
Editor) . . . . . . . . . . . . . . . . . . . . . . 28 Editor) . . . . . . . . . . . . . . . . . . . . . . 28
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 30 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 30
1. Introduction 1. Introduction
The benefits of Explicit Congestion Notification (ECN) described The benefits of Explicit Congestion Notification (ECN) described
below can only be fully realised if support for ECN is added to the below can only be fully realised if support for ECN is added to the
skipping to change at page 3, line 27 skipping to change at page 3, line 27
at any layer is not ECN-aware, or if the ultimate receiver or sender at any layer is not ECN-aware, or if the ultimate receiver or sender
is not ECN-aware, congestion needs to be indicated by dropping a is not ECN-aware, congestion needs to be indicated by dropping a
packet, not marking it. packet, not marking it.
The purpose of this document is to guide the addition of congestion The purpose of this document is to guide the addition of congestion
notification to any subnet technology or tunnelling protocol, so that notification to any subnet technology or tunnelling protocol, so that
lower layer equipment can signal congestion explicitly and it will lower layer equipment can signal congestion explicitly and it will
propagate consistently into encapsulated (higher layer) headers, propagate consistently into encapsulated (higher layer) headers,
otherwise the signals will not reach their ultimate destination. otherwise the signals will not reach their ultimate destination.
ECN is defined in the IP header (v4 & v6) [RFC3168] to allow a ECN is defined in the IP header (v4 and v6) [RFC3168] to allow a
resource to notify the onset of queue build-up without having to drop resource to notify the onset of queue build-up without having to drop
packets, by explicitly marking a proportion of packets with the packets, by explicitly marking a proportion of packets with the
congestion experienced (CE) codepoint. congestion experienced (CE) codepoint.
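As a minimal sketch of the marking behaviour just described, the two-bit ECN field can be read from the low-order bits of the IPv4 TOS octet or IPv6 Traffic Class octet, using the codepoint values of [RFC3168]. The function names and the 'congested' flag are illustrative assumptions; a real queue management algorithm would mark only a proportion of packets rather than every packet seen while congested.

```python
# Two-bit ECN codepoints as defined in RFC 3168.
NOT_ECT = 0b00  # Not ECN-Capable Transport
ECT_1 = 0b01    # ECN-Capable Transport (1)
ECT_0 = 0b10    # ECN-Capable Transport (0)
CE = 0b11       # Congestion Experienced


def ecn_field(octet: int) -> int:
    """Extract the ECN field (low-order two bits) from the TOS / Traffic Class octet."""
    return octet & 0b11


def mark_or_drop(octet: int, congested: bool):
    """Hypothetical helper: return the octet to forward, or None to drop the packet.

    A Not-ECT packet cannot carry a congestion mark, so congestion has to be
    signalled by dropping it. An ECN-capable packet is re-marked to CE instead,
    leaving the upper six (Diffserv) bits untouched.
    """
    if not congested:
        return octet
    if ecn_field(octet) == NOT_ECT:
        return None
    return octet | CE
```

For example, mark_or_drop(0b10111010, True) keeps the upper six bits and sets the ECN field to CE, whereas the same call on an octet whose ECN field is Not-ECT returns None.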
Given a suitable marking scheme, ECN removes nearly all congestion Given a suitable marking scheme, ECN removes nearly all congestion
loss and it cuts delays for two main reasons: loss and it cuts delays for two main reasons:
o It avoids the delay when recovering from congestion losses, which o It avoids the delay when recovering from congestion losses, which
particularly benefits small flows or real-time flows, making their particularly benefits small flows or real-time flows, making their
delivery time predictably short [RFC2884]; delivery time predictably short [RFC2884];
skipping to change at page 3, line 50 skipping to change at page 3, line 50
remove the need to configure a degree of delay into buffers before remove the need to configure a degree of delay into buffers before
they start to notify congestion (the cause of bufferbloat). This they start to notify congestion (the cause of bufferbloat). This
is because drop involves a trade-off between sending a timely is because drop involves a trade-off between sending a timely
signal and trying to avoid impairment, whereas ECN is solely a signal and trying to avoid impairment, whereas ECN is solely a
signal not an impairment, so there is no harm triggering it signal not an impairment, so there is no harm triggering it
earlier. earlier.
Some lower layer technologies (e.g. MPLS, Ethernet) are used to form Some lower layer technologies (e.g. MPLS, Ethernet) are used to form
subnetworks with IP-aware nodes only at the edges. These networks subnetworks with IP-aware nodes only at the edges. These networks
are often sized so that it is rare for interior queues to overflow. are often sized so that it is rare for interior queues to overflow.
However, this has often be more due to the inability of the original However, until recently this was more due to the inability of TCP to
TCP protocol to saturate the links. For many years, fixes such as saturate the links. For many years, fixes such as window scaling
window scaling [RFC1323] proved hard to deploy. But now that modern [RFC1323] proved hard to deploy. And the New Reno variant of TCP has
operating systems are finally capable of saturating interior links, remained in widespread use despite its inability to scale to high
even the buffers of well-provisioned interior switches will need to flow rates. However, now that modern operating systems are finally
signal episodes of queuing. capable of saturating interior links, even the buffers of well-
provisioned interior switches will need to signal episodes of
queuing.
Propagation of ECN is defined for MPLS [RFC5129], and is being Propagation of ECN is defined for MPLS [RFC5129], and is being
defined for TRILL [I-D.ietf-trill-rfc7180bis], but it remains to be defined for TRILL [I-D.ietf-trill-rfc7180bis], but it remains to be
defined for a number of other subnetwork technologies. defined for a number of other subnetwork technologies.
Similarly, ECN propagation is yet to be defined for many tunnelling Similarly, ECN propagation is yet to be defined for many tunnelling
protocols. [RFC6040] defines how ECN should be propagated for IP-in- protocols. [RFC6040] defines how ECN should be propagated for IP-in-
IP [RFC2003] and IPsec [RFC4301] tunnels. However, as Section 9.3 of IP [RFC2003] and IPsec [RFC4301] tunnels. However, as Section 9.3 of
RFC3168 pointed out, ECN support will need to be defined for other RFC3168 pointed out, ECN support will need to be defined for other
tunnelling protocols, e.g. L2TP [RFC2661], GRE [RFC1701], [RFC2784], tunnelling protocols, e.g. L2TP [RFC2661], GRE [RFC1701], [RFC2784],
PPTP [RFC2637] and GTP [GTPv1], [GTPv1-U], [GTPv2-C]. PPTP [RFC2637] and GTP [GTPv1], [GTPv1-U], [GTPv2-C].
Incremental deployment is the most tricky aspect when adding support Incremental deployment is the most delicate aspect when adding
for ECN. The original ECN protocol in IP [RFC3168] was carefully support for ECN. The original ECN protocol in IP [RFC3168] was
designed so that a congested buffer would not mark a packet (rather carefully designed so that a congested buffer would not mark a packet
than drop it) unless both source and destination hosts were ECN- (rather than drop it) unless both source and destination hosts were
capable. Otherwise its congestion markings would never be detected ECN-capable. Otherwise its congestion markings would never be
and congestion would just deteriorate further. However, to support detected and congestion would just build up further. However, to
congestion marking below the IP layer, it is not sufficient to only support congestion marking below the IP layer, it is not sufficient
check that the two end-points support ECN; correct operation also to only check that the two end-points support ECN; correct operation
depends on the decapsulator at each subnet egress faithfully also depends on the decapsulator at each subnet egress faithfully
propagating congestion notifications to the higher layer. Otherwise, propagating congestion notifications to the higher layer. Otherwise,
a legacy decapsulator might silently fail to propagate any ECN a legacy decapsulator might silently fail to propagate any ECN
signals from the outer to the forwarded header. Then the lost signals from the outer to the forwarded header. Then the lost
signals would never be detected and again congestion would signals would never be detected and again congestion would build up
deteriorate further. The guidelines given later require protocol further. The guidelines given later require protocol designers to
designers to carefully consider incremental deployment, and suggest carefully consider incremental deployment, and suggest various safe
various safe approaches for different circumstances. approaches for different circumstances.
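The difference between a legacy decapsulator and an ECN-aware one can be sketched as follows. The second function is intended to follow the decapsulation rules of [RFC6040] for IP-in-IP tunnels; it is a simplified illustration and not a substitute for that specification. Both function names are hypothetical, and None is used here to stand for "drop the packet".

```python
# Two-bit ECN codepoints as defined in RFC 3168.
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def legacy_decap(inner: int, outer: int) -> int:
    """Legacy behaviour: the outer header is discarded, so any CE mark
    applied within the tunnel or subnet is silently lost."""
    return inner


def ecn_aware_decap(inner: int, outer: int):
    """Sketch of ECN propagation at decapsulation, modelled on RFC 6040.

    Returns the ECN codepoint of the outgoing header, or None to drop.
    """
    if inner == NOT_ECT:
        # The endpoints are not ECN-capable: a CE mark in the outer header
        # can only be conveyed by dropping the packet.
        return None if outer == CE else NOT_ECT
    if outer == CE:
        return CE      # propagate congestion experienced upwards
    if outer == ECT_1 and inner == ECT_0:
        return ECT_1   # outer ECT(1) overrides inner ECT(0)
    return inner       # otherwise the inner codepoint is preserved
```

With an inner ECT(0) packet and an outer header marked CE, legacy_decap forwards ECT(0) and the signal is lost, while ecn_aware_decap forwards CE.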
Of course, the IETF does not have standards authority over every link Of course, the IETF does not have standards authority over every link
layer protocol. So this document gives guidelines for designing layer protocol. So this document gives guidelines for designing
propagation of congestion notification across the interface between propagation of congestion notification across the interface between
IP and protocols that may encapsulate IP (i.e. that can be layered IP and protocols that may encapsulate IP (i.e. that can be layered
beneath IP). Each lower layer technology will exhibit different beneath IP). Each lower layer technology will exhibit different
issues and compromises, so the IETF or the relevant standards body issues and compromises, so the IETF or the relevant standards body
must be free to define the specifics of each lower layer congestion must be free to define the specifics of each lower layer congestion
notification scheme. Nonetheless, if the guidelines are followed, notification scheme. Nonetheless, if the guidelines are followed,
congestion notification should interwork between different congestion notification should interwork between different
skipping to change at page 6, line 25 skipping to change at page 6, line 25
Further terminology used within this document: Further terminology used within this document:
Protocol data unit (PDU): Information that is delivered as a unit Protocol data unit (PDU): Information that is delivered as a unit
among peer entities of a layered network consisting of protocol among peer entities of a layered network consisting of protocol
control information (typically a header) and possibly user data control information (typically a header) and possibly user data
(payload) of that layer. The scope of this document includes (payload) of that layer. The scope of this document includes
layer 2 and layer 3 networks, where the PDU is respectively termed layer 2 and layer 3 networks, where the PDU is respectively termed
a frame or a packet (or a cell in ATM). PDU is a general term for a frame or a packet (or a cell in ATM). PDU is a general term for
any of these. This definition also includes a payload with a shim any of these. This definition also includes a payload with a shim
header lying somewhere between layer 2 & 3. header lying somewhere between layer 2 and 3.
Transport: The end-to-end transmission control function, Transport: The end-to-end transmission control function,
conventionally considered at layer-4 in the OSI reference model. conventionally considered at layer-4 in the OSI reference model.
Given the audience for this document will often use the word Given the audience for this document will often use the word
transport to mean low level bit carriage, whenever the term is transport to mean low level bit carriage, whenever the term is
used it will be qualified, e.g. 'L4 transport'. used it will be qualified, e.g. 'L4 transport'.
Encapsulator: The link or tunnel endpoint function that adds an Encapsulator: The link or tunnel endpoint function that adds an
outer header to a PDU (also termed the 'link ingress', the 'subnet outer header to a PDU (also termed the 'link ingress', the 'subnet
ingress', the 'ingress tunnel endpoint' or just the 'ingress' ingress', the 'ingress tunnel endpoint' or just the 'ingress'
skipping to change at page 7, line 8 skipping to change at page 7, line 8
Inner header: The header encapsulated by the outer header. Inner header: The header encapsulated by the outer header.
Outgoing header: The header forwarded by the decapsulator. Outgoing header: The header forwarded by the decapsulator.
CE: Congestion Experienced [RFC3168] CE: Congestion Experienced [RFC3168]
ECT: ECN-Capable Transport [RFC3168] ECT: ECN-Capable Transport [RFC3168]
Not-ECT: Not ECN-Capable Transport [RFC3168] Not-ECT: Not ECN-Capable Transport [RFC3168]
Load Regulator: For each flow of PDUs, the transport function that
is capable of controlling the data rate. Typically located at the
data source, but in-path nodes can regulate load in some
congestion control arrangements (e.g. admission control, policing
nodes or transport circuit-breakers
[I-D.ietf-tsvwg-circuit-breaker]). Note the term "a function
capable of controlling the load" deliberately includes a transport
that doesn't actually control the load responsively but ideally it
ought to (e.g. a sending application without congestion control
that uses UDP).
ECN-PDU: A PDU that is part of a feedback loop within which all the ECN-PDU: A PDU that is part of a feedback loop within which all the
nodes that need to propagate explicit congestion notifications nodes that need to propagate explicit congestion notifications
back to the Load Regulator are ECN-capable. An IP packet with a back to the Load Regulator are ECN-capable. An IP packet with a
non-zero ECN field implies that the endpoints are ECN-capable, so non-zero ECN field implies that the endpoints are ECN-capable, so
this would be an ECN-PDU. However, ECN-PDU is intended to be a this would be an ECN-PDU. However, ECN-PDU is intended to be a
general term for a PDU at any layer, not just IP. general term for a PDU at any layer, not just IP.
Not-ECN-PDU: A PDU that is part of a feedback-loop within which some Not-ECN-PDU: A PDU that is part of a feedback-loop within which some
nodes necessary to propagate explicit congestion notifications nodes necessary to propagate explicit congestion notifications
back to the load regulator are not ECN-capable. back to the load regulator are not ECN-capable.
Load Regulator: For each flow of PDUs, the transport function that
is capable of controlling the data rate. Typically located at the
data source, but in-path nodes can regulate load in some
congestion control arrangements (e.g. admission control or
policing nodes). Note the term "a function capable of controlling
the load" deliberately includes a transport that doesn't actually
control the load but ideally it ought to (e.g. a sending
application without congestion control that uses UDP).
Congestion Baseline: The location of the function on the path that Congestion Baseline: The location of the function on the path that
initialised the values of all congestion notification fields in a initialised the values of all congestion notification fields in a
sequence of packets, before any are set to the congestion sequence of packets, before any are set to the congestion
experienced (CE) codepoint if they experience congestion further experienced (CE) codepoint if they experience congestion further
downstream. Typically the original data source at layer-4. downstream. Typically the original data source at layer-4.
3. Guidelines in All Cases 3. Guidelines in All Cases
RFC 3168 specifies that the ECN field in the IP header is intended to RFC 3168 specifies that the ECN field in the IP header is intended to
be marked by active queue management algorithms. Any congestion be marked by active queue management algorithms. Any congestion
skipping to change at page 8, line 42 skipping to change at page 8, line 42
the data forwards. It will then also be necessary to define how the the data forwards. It will then also be necessary to define how the
egress of the lower layer subnet propagates this explicit signal into egress of the lower layer subnet propagates this explicit signal into
the forwarded upper layer (IP) header. It can then continue forwards the forwarded upper layer (IP) header. It can then continue forwards
until it finally reaches the destination transport (at L4). Then until it finally reaches the destination transport (at L4). Then
typically the destination will feed this congestion notification back typically the destination will feed this congestion notification back
to the source transport using an end-to-end protocol (e.g. TCP). to the source transport using an end-to-end protocol (e.g. TCP).
This is the arrangement that has already been used to add ECN to IP- This is the arrangement that has already been used to add ECN to IP-
in-IP tunnels [RFC6040], IP-in-MPLS and MPLS-in-MPLS [RFC5129]. in-IP tunnels [RFC6040], IP-in-MPLS and MPLS-in-MPLS [RFC5129].
This mode is illustrated in Figure 1. Along the middle of the This mode is illustrated in Figure 1. Along the middle of the
figure, layers 2, 3 & 4 of the protocol stack are shown, and one figure, layers 2, 3 and 4 of the protocol stack are shown, and one
packet is shown along the bottom as it progresses across the network packet is shown along the bottom as it progresses across the network
from source to destination, crossing two subnets connected by a from source to destination, crossing two subnets connected by a
router, and crossing two switches on the path across each subnet. router, and crossing two switches on the path across each subnet.
Congestion at the output of the first switch (shown as *) leads to a Congestion at the output of the first switch (shown as *) leads to a
congestion marking in the L2 header (shown as C in the illustration congestion marking in the L2 header (shown as C in the illustration
of the packet). The chevrons show the progress of the resulting of the packet). The chevrons show the progress of the resulting
congestion indication. It is propagated from link to link across the congestion indication. It is propagated from link to link across the
subnet in the L2 header, then when the router removes the marked L2 subnet in the L2 header, then when the router removes the marked L2
header, it propagates the marking up into the L3 (IP) header. The header, it propagates the marking up into the L3 (IP) header. The
router forwards the marked L3 header into subnet 2, and when it adds router forwards the marked L3 header into subnet 2, and when it adds
skipping to change at page 9, line 46 skipping to change at page 9, line 46
Of course, modern networks are rarely as simple as this text-book Of course, modern networks are rarely as simple as this text-book
example, often involving multiple nested layers. For example, a 3GPP example, often involving multiple nested layers. For example, a 3GPP
mobile network may have two IP-in-IP (GTP) tunnels in series and an mobile network may have two IP-in-IP (GTP) tunnels in series and an
MPLS backhaul between the base station and the first router. MPLS backhaul between the base station and the first router.
Nonetheless, the example illustrates the general idea of feeding Nonetheless, the example illustrates the general idea of feeding
congestion notification forward then upward whenever a header is congestion notification forward then upward whenever a header is
removed at the egress of a subnet. removed at the egress of a subnet.
Note that the FECN (forward ECN) bit in Frame Relay and the explicit Note that the FECN (forward ECN) bit in Frame Relay and the explicit
forward congestion indication (EFCI [ITU-T.I.371]) bit in ATM user forward congestion indication (EFCI [ITU-T.I.371]) bit in ATM user
data cells follow a feed-forward pattern. However, in ATM, this is data cells follow a feed-forward pattern. However, in ATM, this
only as part of a feed-forward-and-backward pattern at the lower arrangement is only part of a feed-forward-and-backward pattern at
layer, not feed-forward-and-up out of the lower layer--the intention the lower layer, not feed-forward-and-up out of the lower layer--the
was never to interface to IP ECN at the subnet egress. To our intention was never to interface to IP ECN at the subnet egress. To
knowledge, Frame Relay FECN is solely used to detect where more our knowledge, Frame Relay FECN is solely used to detect where more
capacity should be provisioned [Buck00]. capacity should be provisioned [Buck00].
4.2. Feed-Up-and-Forward Mode 4.2. Feed-Up-and-Forward Mode
Ethernet is particularly difficult to extend incrementally to support Ethernet is particularly difficult to extend incrementally to support
explicit congestion notification. One way to support ECN in such explicit congestion notification. One way to support ECN in such
cases has been to use so called 'layer-3 switches'. These are cases has been to use so called 'layer-3 switches'. These are
Ethernet switches that bury into the Ethernet payload to find an IP Ethernet switches that bury into the Ethernet payload to find an IP
header and manipulate or act on certain IP fields (specifically header and manipulate or act on certain IP fields (specifically
Diffserv & ECN). For instance, in Data Center TCP [DCTCP], layer-3 Diffserv & ECN). For instance, in Data Center TCP [DCTCP], layer-3
skipping to change at page 11, line 11 skipping to change at page 11, line 11
been defined for use internally within the subnet with its own been defined for use internally within the subnet with its own
feedback and load regulation, but typically the interface with IP for feedback and load regulation, but typically the interface with IP for
ECN has not been defined. ECN has not been defined.
For instance, for the available bit-rate (ABR) service in ATM, the For instance, for the available bit-rate (ABR) service in ATM, the
relative rate mechanism was one of the more popular mechanisms for relative rate mechanism was one of the more popular mechanisms for
managing traffic, tending to supersede earlier designs. In this managing traffic, tending to supersede earlier designs. In this
approach ATM switches send special resource management (RM) cells in approach ATM switches send special resource management (RM) cells in
both the forward and backward directions to control the ingress rate both the forward and backward directions to control the ingress rate
of user data into a virtual circuit. If a switch buffer is of user data into a virtual circuit. If a switch buffer is
approaching congestion or congested it sends an RM cell back towards approaching congestion or is congested it sends an RM cell back
the ingress with respectively the No Increase (NI) or Congestion towards the ingress with respectively the No Increase (NI) or
Indication (CI) bit set in its message type field [ATM-TM-ABR]. The Congestion Indication (CI) bit set in its message type field
ingress then holds or decreases its sending bit-rate accordingly. [ATM-TM-ABR]. The ingress then holds or decreases its sending bit-
rate accordingly.
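The ingress reaction to a backward RM cell can be sketched roughly as below. Only the "hold on NI, decrease on CI" behaviour comes from the text above; the multiplicative decrease, the additive increase when neither bit is set, the precedence of CI over NI, and all parameter names and default values are assumptions made for illustration, not the ATM Forum algorithm.

```python
def next_rate(rate: float, ci: bool, ni: bool, peak_rate: float,
              decrease_factor: float = 0.0625,
              increase_step: float = 1.0) -> float:
    """Hypothetical ingress rate update on receipt of a backward RM cell.

    ci -- Congestion Indication bit: a switch buffer is congested.
    ni -- No Increase bit: a switch buffer is approaching congestion.
    """
    if ci:
        # Congested: decrease the sending rate (assumed multiplicative).
        return rate * (1.0 - decrease_factor)
    if ni:
        # Approaching congestion: hold the current rate.
        return rate
    # Neither bit set: assumed additive increase, capped at the peak rate.
    return min(peak_rate, rate + increase_step)
```

Note that this loop is closed entirely within the subnet: nothing in it touches the ECN field of the IP header, which is why the interface between this mode and IP is left undefined.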
_ _ _ _ _ _
/_______ | | |C| ACK packet (X) /_______ | | |C| ACK packet (X)
\ |_|_|_| \ |_|_|_|
+---+ layer: 2 3 4 header +---+ +---+ layer: 2 3 4 header +---+
| <|<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Packet X <<<<<<<<<<<<<|<< |L4 | <|<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Packet X <<<<<<<<<<<<<|<< |L4
| | +---+ | ^ | | | +---+ | ^ |
| | | *|>>> Packet W >>>>>>>>>>>>|>^ |L3 | | | *|>>> Packet W >>>>>>>>>>>>|>^ |L3
| | +---+ +---+ | | +---+ +---+ | | | | +---+ +---+ | | +---+ +---+ | |
| | | | | | | <|<<<<<|<<<|<(V)<|<<<| | |L2 | | | | | | | <|<<<<<|<<<|<(V)<|<<<| | |L2
skipping to change at page 12, line 32 skipping to change at page 12, line 32
Often link and physical layer resources are 'non-blocking' by design. Often link and physical layer resources are 'non-blocking' by design.
In these cases congestion notification may be implemented but it does In these cases congestion notification may be implemented but it does
not need to be deployed at the lower layer; ECN in IP would be not need to be deployed at the lower layer; ECN in IP would be
sufficient. sufficient.
A degenerate example is a point-to-point Ethernet link. Excess A degenerate example is a point-to-point Ethernet link. Excess
loading of the link merely causes the queue from the higher layer to loading of the link merely causes the queue from the higher layer to
back up, while the lower layer remains immune to congestion. Even a back up, while the lower layer remains immune to congestion. Even a
whole meshed subnetwork can be made immune to interior congestion by whole meshed subnetwork can be made immune to interior congestion by
limiting ingress capacity and careful sizing of links, particularly limiting ingress capacity and sufficient sizing of interior links,
if multi-path routing is used to ensure even worst-case patterns of e.g. a non-blocking fat-tree network. An alternative to fat links
load cannot congest any link. near the root is numerous thin links with multi-path routing to
ensure even worst-case patterns of load cannot congest any link, e.g.
a Clos network.
5. Feed-Forward-and-Up Mode: Guidelines for Adding Congestion 5. Feed-Forward-and-Up Mode: Guidelines for Adding Congestion
Notification Notification
Feed-forward-and-up is the mode already used for signalling ECN up Feed-forward-and-up is the mode already used for signalling ECN up
the layers through MPLS into IP [RFC5129] and through IP-in-IP the layers through MPLS into IP [RFC5129] and through IP-in-IP
tunnels [RFC6040]. These RFCs take a consistent approach and the tunnels [RFC6040]. These RFCs take a consistent approach and the
following guidelines are designed to ensure this consistency following guidelines are designed to ensure this consistency
continues as ECN support is added to other protocols that encapsulate continues as ECN support is added to other protocols that encapsulate
IP. The guidelines are also designed to ensure compliance with the IP. The guidelines are also designed to ensure compliance with the
skipping to change at page 21, line 21 skipping to change at page 21, line 33
directly coupled with IP layer congestion notification. The subnet directly coupled with IP layer congestion notification. The subnet
attempts to minimise congestion internally, and if the incoming load attempts to minimise congestion internally, and if the incoming load
at the ingress exceeds the capacity somewhere through the subnet, the at the ingress exceeds the capacity somewhere through the subnet, the
layer 3 buffer into the ingress backs up. Thus, a feed-backward mode layer 3 buffer into the ingress backs up. Thus, a feed-backward mode
subnet is in some sense similar to a null mode subnet, in that there subnet is in some sense similar to a null mode subnet, in that there
is no need for any direct interaction between the subnet and higher is no need for any direct interaction between the subnet and higher
layer congestion notification. Therefore no detailed protocol design layer congestion notification. Therefore no detailed protocol design
guidelines are appropriate. Nonetheless, a more general guideline is guidelines are appropriate. Nonetheless, a more general guideline is
appropriate: appropriate:
1. A subnetwork technology intended to eventually interface to IP A subnetwork technology intended to eventually interface to IP
SHOULD NOT be designed using only the feed-backward mode, which SHOULD NOT be designed using only the feed-backward mode, which is
is certainly best for a stand-alone subnet, but would need to be certainly best for a stand-alone subnet, but would need to be
modified to work efficiently as part of the wider Internet, modified to work efficiently as part of the wider Internet,
because IP uses feed-forward-and-up mode. because IP uses feed-forward-and-up mode.
The feed-backward approach at least works beneath IP, where the term The feed-backward approach at least works beneath IP, where the term
'works' is used only in a narrow functional sense because feed- 'works' is used only in a narrow functional sense because feed-
backward can result in very inefficient and sluggish congestion backward can result in very inefficient and sluggish congestion
control--except if it is confined to the subnet directly connected to control--except if it is confined to the subnet directly connected to
the original data source, when it is faster than feed-forward. It the original data source, when it is faster than feed-forward. It
would be valid to design a protocol that could work in feed-backward would be valid to design a protocol that could work in feed-backward
mode for paths that only cross one subnet, and in feed-forward-and-up mode for paths that only cross one subnet, and in feed-forward-and-up
mode for paths that cross subnets. mode for paths that cross subnets.
skipping to change at page 23, line 25 skipping to change at page 23, line 34
technologies to pass ECN signals into the IP layer, even if they do technologies to pass ECN signals into the IP layer, even if they do
not support ECN natively. not support ECN natively.
Finally, attempting to add ECN to a subnet technology in feed- Finally, attempting to add ECN to a subnet technology in feed-
backward mode is deprecated except in special cases, due to its backward mode is deprecated except in special cases, due to its
likely sluggish response to congestion. likely sluggish response to congestion.
11. Acknowledgements 11. Acknowledgements
Thanks to Gorry Fairhurst for extensive reviews. Thanks also to the Thanks to Gorry Fairhurst for extensive reviews. Thanks also to the
following reviewers: Ingemar Johansson and Piers O'Hanlon and Michael following reviewers: Richard Scheffenegger, Ingemar Johansson, Piers
Welzl, who pointed out that lower layer congestion notification O'Hanlon and Michael Welzl, who pointed out that lower layer
signals may have different semantics to those in IP. congestion notification signals may have different semantics to those
in IP.
Bob Briscoe was part-funded by the European Community under its Bob Briscoe was part-funded by the European Community under its
Seventh Framework Programme through the Trilogy project (ICT-216372) Seventh Framework Programme through the Trilogy project (ICT-216372)
for initial drafts and through the Reducing Internet Transport for initial drafts and through the Reducing Internet Transport
Latency (RITE) project (ICT-317700) subsequently. The views Latency (RITE) project (ICT-317700) subsequently. The views
expressed here are solely those of the authors. expressed here are solely those of the authors.
12. Comments Solicited 12. Comments Solicited
Comments and questions are encouraged and very welcome. They can be Comments and questions are encouraged and very welcome. They can be
skipping to change at page 25, line 19 skipping to change at page 25, line 30
[I-D.ietf-conex-abstract-mech] [I-D.ietf-conex-abstract-mech]
Mathis, M. and B. Briscoe, "Congestion Exposure (ConEx) Mathis, M. and B. Briscoe, "Congestion Exposure (ConEx)
Concepts, Abstract Mechanism and Requirements", draft- Concepts, Abstract Mechanism and Requirements", draft-
ietf-conex-abstract-mech-13 (work in progress), October ietf-conex-abstract-mech-13 (work in progress), October
2014. 2014.
[I-D.ietf-trill-rfc7180bis] [I-D.ietf-trill-rfc7180bis]
Eastlake, D., Zhang, M., Perlman, R., Banerjee, A., Eastlake, D., Zhang, M., Perlman, R., Banerjee, A.,
Ghanwani, A., and S. Gupta, "TRILL: Clarifications, Ghanwani, A., and S. Gupta, "TRILL: Clarifications,
Corrections, and Updates", draft-ietf-trill-rfc7180bis-05 Corrections, and Updates", draft-ietf-trill-rfc7180bis-06
(work in progress), June 2015. (work in progress), October 2015.
[I-D.ietf-tsvwg-circuit-breaker]
Fairhurst, G., "Network Transport Circuit Breakers",
draft-ietf-tsvwg-circuit-breaker-05 (work in progress),
October 2015.
[I-D.moncaster-tcpm-rcv-cheat] [I-D.moncaster-tcpm-rcv-cheat]
Moncaster, T., Briscoe, B., and A. Jacquet, "A TCP Test to Moncaster, T., Briscoe, B., and A. Jacquet, "A TCP Test to
Allow Senders to Identify Receiver Non-Compliance", draft- Allow Senders to Identify Receiver Non-Compliance", draft-
moncaster-tcpm-rcv-cheat-03 (work in progress), July 2014. moncaster-tcpm-rcv-cheat-03 (work in progress), July 2014.
[IEEE802.1Qah] [IEEE802.1Qah]
IEEE, "IEEE Standard for Local and Metropolitan Area IEEE, "IEEE Standard for Local and Metropolitan Area
Networks--Virtual Bridged Local Area Networks--Amendment Networks--Virtual Bridged Local Area Networks--Amendment
6: Provider Backbone Bridges", IEEE Std 802.1Qah-2008, 6: Provider Backbone Bridges", IEEE Std 802.1Qah-2008,
skipping to change at page 28, line 18 skipping to change at page 28, line 18
than a SHOULD (NOT). Given the guidelines say that if any SHOULD than a SHOULD (NOT). Given the guidelines say that if any SHOULD
(NOT)s are not followed, a strong justification will be needed, (NOT)s are not followed, a strong justification will be needed,
they have been left as SHOULD (NOT) pending further list they have been left as SHOULD (NOT) pending further list
discussion. In particular: discussion. In particular:
* If inner is a Not-ECN-PDU and Outer is CE (or highest severity * If inner is a Not-ECN-PDU and Outer is CE (or highest severity
congestion level), MUST (not SHOULD) drop? congestion level), MUST (not SHOULD) drop?
2. Consider whether an IETF Standard Track doc will be needed to 2. Consider whether an IETF Standard Track doc will be needed to
Update the IP-in-IP protocols listed in Section 5.1--at least Update the IP-in-IP protocols listed in Section 5.1--at least
those that the IET those that the IETF controls--and which Area it should sit under.
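For context on the open question in item 1, the following sketch shows how a decapsulator might combine the inner and outer ECN fields. It follows the normal-mode decapsulation behaviour of RFC 6040 as understood here and is an illustration only, not text from either draft version. The function name and the use of None to mean "drop the packet" are assumptions. The case of a Not-ECT inner with a CE outer is the one whose requirement level (MUST versus SHOULD) is under discussion.

```python
NOT_ECT, ECT0, ECT1, CE = "Not-ECT", "ECT(0)", "ECT(1)", "CE"


def decapsulate(inner, outer):
    """Return the ECN codepoint to forward after decapsulation,
    or None if the packet is to be dropped (hypothetical convention)."""
    if outer == CE:
        # Congestion was signalled on the outer header. If the inner
        # transport does not understand ECN, the only signal it will
        # understand is a drop.
        return None if inner == NOT_ECT else CE
    if inner == NOT_ECT:
        # Never turn a non-ECN packet into an ECN-capable one.
        return NOT_ECT
    if inner == CE:
        # Congestion marked before encapsulation is preserved.
        return CE
    if outer == ECT1:
        # Inner is ECT(0) or ECT(1): an outer ECT(1) is propagated.
        return ECT1
    return inner


print(decapsulate(NOT_ECT, CE))  # None: packet dropped
print(decapsulate(ECT0, CE))     # CE
```

Returning a drop in the Not-ECT/CE case means that a congestion signal is never silently lost when the inner packet cannot carry it.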
Appendix B. Changes in This Version (to be removed by RFC Editor) Appendix B. Changes in This Version (to be removed by RFC Editor)
From ietf-03 to ietf-04:
* Addressed Richard Scheffenegger's review comments: primarily
editorial corrections, and addition of examples for clarity.
From ietf-02 to ietf-03: From ietf-02 to ietf-03:
* Updated references, and cited RFC4774. * Updated references, and cited RFC4774.
From ietf-01 to ietf-02: From ietf-01 to ietf-02:
* Added Section for guidelines that are applicable in all cases. * Added Section for guidelines that are applicable in all cases.
* Updated references. * Updated references.
 End of changes. 24 change blocks. 
63 lines changed or deleted 81 lines changed or added
