draft-ietf-tsvwg-aqm-dualq-coupled-12.txt   draft-ietf-tsvwg-aqm-dualq-coupled-13.txt 
Transport Area working group (tsvwg) K. De Schepper Transport Area working group (tsvwg) K. De Schepper
Internet-Draft Nokia Bell Labs Internet-Draft Nokia Bell Labs
Intended status: Experimental B. Briscoe, Ed. Intended status: Experimental B. Briscoe, Ed.
Expires: January 28, 2021 Independent Expires: May 19, 2021 Independent
G. White G. White
CableLabs CableLabs
July 27, 2020 November 15, 2020
DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput
(L4S) (L4S)
draft-ietf-tsvwg-aqm-dualq-coupled-12 draft-ietf-tsvwg-aqm-dualq-coupled-13
Abstract Abstract
The Low Latency Low Loss Scalable Throughput (L4S) architecture The Low Latency Low Loss Scalable Throughput (L4S) architecture
allows data flows over the public Internet to achieve consistent low allows data flows over the public Internet to achieve consistent low
queuing latency, generally zero congestion loss and scaling of per- queuing latency, generally zero congestion loss and scaling of per-
flow throughput without the scaling problems of standard TCP Reno- flow throughput without the scaling problems of standard TCP Reno-
friendly congestion controls. To achieve this, L4S data flows have friendly congestion controls. To achieve this, L4S data flows have
to use one of the family of 'Scalable' congestion controls (TCP to use one of the family of 'Scalable' congestion controls (TCP
Prague and Data Center TCP are examples) and a form of Explicit Prague and Data Center TCP are examples) and a form of Explicit
Congestion Notification (ECN) with modified behaviour. However, Congestion Notification (ECN) with modified behaviour. However,
until now, Scalable congestion controls did not co-exist with until now, Scalable congestion controls did not co-exist with
existing Reno/Cubic traffic --- Scalable controls are so aggressive existing Reno/Cubic traffic --- Scalable controls are so aggressive
that 'Classic' (e.g. Reno-friendly) algorithms sharing an ECN- that 'Classic' (e.g. Reno-friendly) algorithms sharing an ECN-capable
capable queue would drive themselves to a small capacity share. queue would drive themselves to a small capacity share. Therefore,
Therefore, until now, L4S controls could only be deployed where a until now, L4S controls could only be deployed where a clean-slate
clean-slate environment could be arranged, such as in private data environment could be arranged, such as in private data centres (hence
centres (hence the name DCTCP). This specification defines `DualQ the name DCTCP). This specification defines `DualQ Coupled Active
Coupled Active Queue Management (AQM)', which enables Scalable Queue Management (AQM)', which enables Scalable congestion controls
congestion controls that comply with the Prague L4S requirements to that comply with the Prague L4S requirements to co-exist safely with
co-exist safely with Classic Internet traffic. Classic Internet traffic.
Analytical study and implementation testing of the Coupled AQM have Analytical study and implementation testing of the Coupled AQM have
shown that Scalable and Classic flows competing under similar shown that Scalable and Classic flows competing under similar
conditions run at roughly the same rate. It achieves this conditions run at roughly the same rate. It achieves this
indirectly, without having to inspect transport layer flow indirectly, without having to inspect transport layer flow
identifiers. When tested in a residential broadband setting, DCTCP identifiers. When tested in a residential broadband setting, DCTCP
also achieves sub-millisecond average queuing delay and zero also achieves sub-millisecond average queuing delay and zero
congestion loss under a wide range of mixes of DCTCP and `Classic' congestion loss under a wide range of mixes of DCTCP and `Classic'
broadband Internet traffic, without compromising the performance of broadband Internet traffic, without compromising the performance of
the Classic traffic. The solution has low complexity and requires no the Classic traffic. The solution has low complexity and requires no
skipping to change at page 2, line 20 skipping to change at page 2, line 20
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 28, 2021. This Internet-Draft will expire on May 19, 2021.
Copyright Notice Copyright Notice
Copyright (c) 2020 IETF Trust and the persons identified as the Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 3, line 5 skipping to change at page 3, line 5
2. DualQ Coupled AQM . . . . . . . . . . . . . . . . . . . . . . 10 2. DualQ Coupled AQM . . . . . . . . . . . . . . . . . . . . . . 10
2.1. Coupled AQM . . . . . . . . . . . . . . . . . . . . . . . 10 2.1. Coupled AQM . . . . . . . . . . . . . . . . . . . . . . . 10
2.2. Dual Queue . . . . . . . . . . . . . . . . . . . . . . . 12 2.2. Dual Queue . . . . . . . . . . . . . . . . . . . . . . . 12
2.3. Traffic Classification . . . . . . . . . . . . . . . . . 12 2.3. Traffic Classification . . . . . . . . . . . . . . . . . 12
2.4. Overall DualQ Coupled AQM Structure . . . . . . . . . . . 13 2.4. Overall DualQ Coupled AQM Structure . . . . . . . . . . . 13
2.5. Normative Requirements for a DualQ Coupled AQM . . . . . 16 2.5. Normative Requirements for a DualQ Coupled AQM . . . . . 16
2.5.1. Functional Requirements . . . . . . . . . . . . . . . 16 2.5.1. Functional Requirements . . . . . . . . . . . . . . . 16
2.5.1.1. Requirements in Unexpected Cases . . . . . . . . 17 2.5.1.1. Requirements in Unexpected Cases . . . . . . . . 17
2.5.2. Management Requirements . . . . . . . . . . . . . . . 18 2.5.2. Management Requirements . . . . . . . . . . . . . . . 18
2.5.2.1. Configuration . . . . . . . . . . . . . . . . . . 18 2.5.2.1. Configuration . . . . . . . . . . . . . . . . . . 18
2.5.2.2. Monitoring . . . . . . . . . . . . . . . . . . . 19 2.5.2.2. Monitoring . . . . . . . . . . . . . . . . . . . 20
2.5.2.3. Anomaly Detection . . . . . . . . . . . . . . . . 20 2.5.2.3. Anomaly Detection . . . . . . . . . . . . . . . . 20
2.5.2.4. Deployment, Coexistence and Scaling . . . . . . . 20 2.5.2.4. Deployment, Coexistence and Scaling . . . . . . . 21
3. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 21 3. IANA Considerations (to be removed by RFC Editor) . . . . . . 21
4. Security Considerations . . . . . . . . . . . . . . . . . . . 21 4. Security Considerations . . . . . . . . . . . . . . . . . . . 21
4.1. Overload Handling . . . . . . . . . . . . . . . . . . . . 21 4.1. Overload Handling . . . . . . . . . . . . . . . . . . . . 21
4.1.1. Avoiding Classic Starvation: Sacrifice L4S Throughput 4.1.1. Avoiding Classic Starvation: Sacrifice L4S Throughput
or Delay? . . . . . . . . . . . . . . . . . . . . . . 21 or Delay? . . . . . . . . . . . . . . . . . . . . . . 22
4.1.2. Congestion Signal Saturation: Introduce L4S Drop or 4.1.2. Congestion Signal Saturation: Introduce L4S Drop or
Delay? . . . . . . . . . . . . . . . . . . . . . . . 23 Delay? . . . . . . . . . . . . . . . . . . . . . . . 23
4.1.3. Protecting against Unresponsive ECN-Capable Traffic . 24 4.1.3. Protecting against Unresponsive ECN-Capable Traffic . 24
5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 24 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 24
6. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 24 6. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 25
7. References . . . . . . . . . . . . . . . . . . . . . . . . . 25 7. References . . . . . . . . . . . . . . . . . . . . . . . . . 25
7.1. Normative References . . . . . . . . . . . . . . . . . . 25 7.1. Normative References . . . . . . . . . . . . . . . . . . 25
7.2. Informative References . . . . . . . . . . . . . . . . . 26 7.2. Informative References . . . . . . . . . . . . . . . . . 26
Appendix A. Example DualQ Coupled PI2 Algorithm . . . . . . . . 30 Appendix A. Example DualQ Coupled PI2 Algorithm . . . . . . . . 30
A.1. Pass #1: Core Concepts . . . . . . . . . . . . . . . . . 31 A.1. Pass #1: Core Concepts . . . . . . . . . . . . . . . . . 31
A.2. Pass #2: Overload Details . . . . . . . . . . . . . . . . 39 A.2. Pass #2: Overload Details . . . . . . . . . . . . . . . . 40
Appendix B. Example DualQ Coupled Curvy RED Algorithm . . . . . 43 Appendix B. Example DualQ Coupled Curvy RED Algorithm . . . . . 44
B.1. Curvy RED in Pseudocode . . . . . . . . . . . . . . . . . 43 B.1. Curvy RED in Pseudocode . . . . . . . . . . . . . . . . . 44
B.2. Efficient Implementation of Curvy RED . . . . . . . . . . 49 B.2. Efficient Implementation of Curvy RED . . . . . . . . . . 50
Appendix C. Choice of Coupling Factor, k . . . . . . . . . . . . 51 Appendix C. Choice of Coupling Factor, k . . . . . . . . . . . . 52
C.1. RTT-Dependence . . . . . . . . . . . . . . . . . . . . . 51 C.1. RTT-Dependence . . . . . . . . . . . . . . . . . . . . . 52
C.2. Guidance on Controlling Throughput Equivalence . . . . . 52 C.2. Guidance on Controlling Throughput Equivalence . . . . . 53
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 53 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 54
1. Introduction 1. Introduction
This document specifies a framework for DualQ Coupled AQMs, which is This document specifies a framework for DualQ Coupled AQMs, which is
the network part of the L4S architecture [I-D.ietf-tsvwg-l4s-arch]. the network part of the L4S architecture [I-D.ietf-tsvwg-l4s-arch].
L4S enables both ultra-low queuing latency (sub-millisecond on L4S enables both ultra-low queuing latency (sub-millisecond on
average) and high throughput at the same time, for ad hoc numbers of average) and high throughput at the same time, for ad hoc numbers of
capacity-seeking applications all sharing the same capacity. capacity-seeking applications all sharing the same capacity.
1.1. Outline of the Problem 1.1. Outline of the Problem
skipping to change at page 4, line 10 skipping to change at page 4, line 10
machinery and industrial processes. In the developed world, further machinery and industrial processes. In the developed world, further
increases in access network bit-rate offer diminishing returns, increases in access network bit-rate offer diminishing returns,
whereas latency is still a multi-faceted problem. In the last decade whereas latency is still a multi-faceted problem. In the last decade
or so, much has been done to reduce propagation time by placing or so, much has been done to reduce propagation time by placing
caches or servers closer to users. However, queuing remains a major caches or servers closer to users. However, queuing remains a major
intermittent component of latency. intermittent component of latency.
Traditionally ultra-low latency has only been available for a few Traditionally ultra-low latency has only been available for a few
selected low rate applications, that confine their sending rate selected low rate applications, that confine their sending rate
within a specially carved-off portion of capacity, which is within a specially carved-off portion of capacity, which is
prioritized over other traffic, e.g. Diffserv EF [RFC3246]. Up to prioritized over other traffic, e.g. Diffserv EF [RFC3246]. Up to
now it has not been possible to allow any number of low latency, high now it has not been possible to allow any number of low latency, high
throughput applications to seek to fully utilize available capacity, throughput applications to seek to fully utilize available capacity,
because the capacity-seeking process itself causes too much queuing because the capacity-seeking process itself causes too much queuing
delay. delay.
To reduce this queuing delay caused by the capacity-seeking process, To reduce this queuing delay caused by the capacity-seeking process,
changes either to the network alone or to end-systems alone are in changes either to the network alone or to end-systems alone are in
progress. L4S involves a recognition that both approaches are progress. L4S involves a recognition that both approaches are
yielding diminishing returns: yielding diminishing returns:
o Recent state-of-the-art active queue management (AQM) in the o Recent state-of-the-art active queue management (AQM) in the
network (e.g. fq_CoDel [RFC8290], PIE [RFC8033], Adaptive network (e.g. FQ-CoDel [RFC8290], PIE [RFC8033], Adaptive
RED [ARED01]) has reduced queuing delay for all traffic, not just RED [ARED01]) has reduced queuing delay for all traffic, not just
a select few applications. However, no matter how good the AQM, a select few applications. However, no matter how good the AQM,
the capacity-seeking (sawtoothing) rate of TCP-like congestion the capacity-seeking (sawtoothing) rate of TCP-like congestion
controls represents a lower limit that will either cause queuing controls represents a lower limit that will either cause queuing
delay to vary or cause the link to be under-utilized. These AQMs delay to vary or cause the link to be under-utilized. These AQMs
are tuned to allow a typical capacity-seeking Reno-friendly flow are tuned to allow a typical capacity-seeking Reno-friendly flow
to induce an average queue that roughly doubles the base RTT, to induce an average queue that roughly doubles the base RTT,
adding 5-15 ms of queuing on average (cf. 500 microseconds with adding 5-15 ms of queuing on average (cf. 500 microseconds with
L4S for the same mix of long-running and web traffic). However, L4S for the same mix of long-running and web traffic). However,
for many applications low delay is not useful unless it is for many applications low delay is not useful unless it is
consistently low. With these AQMs, 99th percentile queuing delay consistently low. With these AQMs, 99th percentile queuing delay
is 20-30 ms (cf. 2 ms with the same traffic over L4S). is 20-30 ms (cf. 2 ms with the same traffic over L4S).
o Similarly, recent research into using e2e congestion control o Similarly, recent research into using e2e congestion control
without needing an AQM in the network (e.g. BBRv1 [BBRv1]) seems to without needing an AQM in the network (e.g. BBR [BBRv1],
have hit a similar lower limit to queuing delay of about 20ms on [I-D.cardwell-iccrg-bbr-congestion-control]) seems to have hit a
average (and any additional BBRv1 flow adds another 20ms of similar lower limit to queuing delay of about 20ms on average (and
queuing) but there are also regular 25ms delay spikes due to any additional BBRv1 flow adds another 20ms of queuing) but there
bandwidth probes and 60ms spikes due to flow-starts. are also regular 25ms delay spikes due to bandwidth probes and
60ms spikes due to flow-starts.
L4S learns from the experience of Data Center TCP [RFC8257], which L4S learns from the experience of Data Center TCP [RFC8257], which
shows the power of complementary changes both in the network and on shows the power of complementary changes both in the network and on
end-systems. DCTCP teaches us that two small but radical changes to end-systems. DCTCP teaches us that two small but radical changes to
congestion control are needed to cut the two major outstanding causes congestion control are needed to cut the two major outstanding causes
of queuing delay variability: of queuing delay variability:
1. Far smaller rate variations (sawteeth) than Reno-friendly 1. Far smaller rate variations (sawteeth) than Reno-friendly
congestion controls; congestion controls;
2. A shift of smoothing and hence smoothing delay from network to 2. A shift of smoothing and hence smoothing delay from network to
sender. sender.
Without the former, a 'Classic' (e.g. Reno-friendly) flow's round Without the former, a 'Classic' (e.g. Reno-friendly) flow's round
trip time (RTT) varies between roughly 1 and 2 times the base RTT trip time (RTT) varies between roughly 1 and 2 times the base RTT
between the machines in question. Without the latter a 'Classic' between the machines in question. Without the latter a 'Classic'
flow's response to changing events is delayed by a worst-case flow's response to changing events is delayed by a worst-case
(transcontinental) RTT, which could be hundreds of times the actual (transcontinental) RTT, which could be hundreds of times the actual
smoothing delay needed for the RTT of typical traffic from localized smoothing delay needed for the RTT of typical traffic from localized
CDNs. CDNs.
These changes are the two main features of the family of so-called These changes are the two main features of the family of so-called
'Scalable' congestion controls (which includes DCTCP). Both these 'Scalable' congestion controls (which includes DCTCP). Both these
changes only reduce delay in combination with a complementary change changes only reduce delay in combination with a complementary change
skipping to change at page 6, line 26 skipping to change at page 6, line 26
deployed incrementally, because they both identify L4S packets deployed incrementally, because they both identify L4S packets
using the experimentally assigned explicit congestion notification using the experimentally assigned explicit congestion notification
(ECN) codepoints in the IP header: ECT(1) and CE [RFC8311] (ECN) codepoints in the IP header: ECT(1) and CE [RFC8311]
[I-D.ietf-tsvwg-ecn-l4s-id]. [I-D.ietf-tsvwg-ecn-l4s-id].
Data Center TCP (DCTCP [RFC8257]) is an example of a Scalable Data Center TCP (DCTCP [RFC8257]) is an example of a Scalable
congestion control that has been deployed for some time in Linux, congestion control that has been deployed for some time in Linux,
Windows and FreeBSD operating systems and Relentless TCP [Mathis09] Windows and FreeBSD operating systems and Relentless TCP [Mathis09]
is another example. During the progress of this document through the is another example. During the progress of this document through the
IETF a number of other Scalable congestion controls were implemented, IETF a number of other Scalable congestion controls were implemented,
e.g. TCP Prague [PragueLinux], QUIC Prague and the L4S variant of e.g. TCP Prague [PragueLinux], QUIC Prague and the L4S variant of
SCREAM for real-time media [RFC8298]. (Note: after the v3.19 Linux SCREAM for real-time media [RFC8298]. (Note: after the v3.19 Linux
kernel, bugs were introduced into DCTCP's scalable behaviour and not kernel, bugs were introduced into DCTCP's scalable behaviour and not
all the patches applied for L4S evaluation had been applied to the all the patches applied for L4S evaluation had been applied to the
mainline Linux kernel, which was at v5.5 at the time of writing. TCP mainline Linux kernel, which was at v5.5 at the time of writing. TCP
Prague includes these patches and is available for all these Linux Prague includes these patches and is available for all these Linux
kernels). kernels).
The focus of this specification is to enable deployment of the The focus of this specification is to enable deployment of the
network part of the L4S service. Then, without any management network part of the L4S service. Then, without any management
intervention, applications can exploit this new network capability as intervention, applications can exploit this new network capability as
skipping to change at page 7, line 49 skipping to change at page 7, line 49
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119] when, and document are to be interpreted as described in [RFC2119] when, and
only when, they appear in all capitals, as shown here. only when, they appear in all capitals, as shown here.
The DualQ Coupled AQM uses two queues for two services. Each of the The DualQ Coupled AQM uses two queues for two services. Each of the
following terms identifies both the service and the queue that following terms identifies both the service and the queue that
provides the service: provides the service:
Classic service/queue: The Classic service is intended for all the Classic service/queue: The Classic service is intended for all the
congestion control behaviours that co-exist with Reno [RFC5681] congestion control behaviours that co-exist with Reno [RFC5681]
(e.g. Reno itself, Cubic [RFC8312], TFRC [RFC5348]). (e.g. Reno itself, Cubic [RFC8312], TFRC [RFC5348]).
Low-Latency, Low-Loss Scalable throughput (L4S) service/queue: The Low-Latency, Low-Loss Scalable throughput (L4S) service/queue: The
'L4S' service is intended for traffic from scalable congestion 'L4S' service is intended for traffic from scalable congestion
control algorithms, such as Data Center TCP [RFC8257]. The L4S control algorithms, such as Data Center TCP [RFC8257]. The L4S
service is for more general traffic than just DCTCP--it allows the service is for more general traffic than just DCTCP--it allows the
set of congestion controls with similar scaling properties to set of congestion controls with similar scaling properties to
DCTCP to evolve (e.g. Relentless TCP [Mathis09], TCP Prague DCTCP to evolve (e.g. Relentless TCP [Mathis09], TCP
[PragueLinux] and the L4S variant of SCREAM for real-time media Prague [PragueLinux] and the L4S variant of SCREAM for real-time
[RFC8298]). media [RFC8298]).
Classic Congestion Control: A congestion control behaviour that can Classic Congestion Control: A congestion control behaviour that can
co-exist with standard TCP Reno [RFC5681] without causing co-exist with standard TCP Reno [RFC5681] without causing
significantly negative impact on its flow rate [RFC5033]. With significantly negative impact on its flow rate [RFC5033]. With
Classic congestion controls, as flow rate scales, the number of Classic congestion controls, as flow rate scales, the number of
round trips between congestion signals (losses or ECN marks) rises round trips between congestion signals (losses or ECN marks) rises
with the flow rate. So it takes longer and longer to recover with the flow rate. So it takes longer and longer to recover
after each congestion event. Therefore control of queuing and after each congestion event. Therefore control of queuing and
utilization becomes very slack, and the slightest disturbance utilization becomes very slack, and the slightest disturbance
prevents a high rate from being attained [RFC3649]. prevents a high rate from being attained [RFC3649].
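As a rough illustration of why recovery takes longer as the flow rate scales, the sketch below assumes an idealized Reno sawtooth in which the window halves on each congestion signal and then grows by one segment per round trip. The function name, the 1500-byte segment size and the 20 ms RTT are illustrative assumptions, not values taken from this document.

```python
def reno_recovery(rate_bps, rtt_s, mss_bytes=1500):
    """Round trips (and seconds) an idealized Reno flow needs to regain
    the half of its window that it gives up on a congestion signal."""
    # Segments in flight when the flow just fills the given rate.
    peak_window = rate_bps * rtt_s / (mss_bytes * 8)
    # Additive increase of one segment per round trip regains window/2.
    rounds = peak_window / 2
    return rounds, rounds * rtt_s


if __name__ == "__main__":
    for rate in (10e6, 100e6, 1e9):
        rounds, seconds = reno_recovery(rate, rtt_s=0.02)
        print(f"{rate / 1e6:.0f} Mb/s: about {rounds:.0f} round trips "
              f"({seconds:.1f} s) between congestion signals")
```

Under these assumptions the number of round trips between signals grows in direct proportion to the flow rate, which is the slack control that the definition above describes.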
skipping to change at page 8, line 45 skipping to change at page 8, line 45
C: Abbreviation for Classic, e.g. when used as a subscript. C: Abbreviation for Classic, e.g. when used as a subscript.
L: Abbreviation for L4S, e.g. when used as a subscript. L: Abbreviation for L4S, e.g. when used as a subscript.
The terms Classic or L4S can also qualify other nouns, such as The terms Classic or L4S can also qualify other nouns, such as
'codepoint', 'identifier', 'classification', 'packet', 'flow'. 'codepoint', 'identifier', 'classification', 'packet', 'flow'.
For example: an L4S packet means a packet with an L4S identifier For example: an L4S packet means a packet with an L4S identifier
sent from an L4S congestion control. sent from an L4S congestion control.
Both Classic and L4S queues can cope with a proportion of Both Classic and L4S queues can cope with a proportion of
unresponsive or less-responsive traffic as well (e.g. DNS, VoIP, unresponsive or less-responsive traffic as well (e.g. DNS, VoIP,
game sync datagrams), just as a single queue AQM can if this game sync datagrams), just as a single queue AQM can if this
traffic makes minimal contribution to queuing. The DualQ Coupled traffic makes minimal contribution to queuing. The DualQ Coupled
AQM behaviour is defined to be similar to a single FIFO queue with AQM behaviour is defined to be similar to a single FIFO queue with
respect to unresponsive and overload traffic. respect to unresponsive and overload traffic.
Reno-friendly: The subset of Classic traffic that excludes Reno-friendly: The subset of Classic traffic that excludes
unresponsive traffic and excludes experimental congestion controls unresponsive traffic and excludes experimental congestion controls
intended to coexist with Reno but without always being strictly intended to coexist with Reno but without always being strictly
friendly to it (as allowed by [RFC5033]). Reno-friendly is used friendly to it (as allowed by [RFC5033]). Reno-friendly is used
in place of 'TCP-friendly', given that the TCP protocol is used in place of 'TCP-friendly', given that friendliness is a property
with many different congestion control behaviours. of the congestion controller (Reno), not the wire protocol (TCP),
which is used with many different congestion control behaviours.
Classic ECN: The original Explicit Congestion Notification (ECN) Classic ECN: The original Explicit Congestion Notification (ECN)
protocol [RFC3168], which requires ECN signals to be treated the protocol [RFC3168], which requires ECN signals to be treated the
same as drops, both when generated in the network and when same as drops, both when generated in the network and when
responded to by the sender. responded to by the sender.
The names used for the four codepoints of the 2-bit IP-ECN field The names used for the four codepoints of the 2-bit IP-ECN field
are as defined in [RFC3168]: Not ECT, ECT(0), ECT(1) and CE, where are as defined in [RFC3168]: Not ECT, ECT(0), ECT(1) and CE, where
ECT stands for ECN-Capable Transport and CE stands for Congestion ECT stands for ECN-Capable Transport and CE stands for Congestion
Experienced. Experienced.
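For readers who want the bit values, a minimal sketch of decoding these four names follows. It relies on the [RFC3168] layout, where the ECN field is the two least significant bits of the IPv4 TOS or IPv6 Traffic Class octet. The function and table names are hypothetical.

```python
# IP-ECN field values as assigned in RFC 3168.
ECN_NAMES = {
    0b00: "Not ECT",
    0b01: "ECT(1)",
    0b10: "ECT(0)",
    0b11: "CE",
}


def ecn_codepoint(tos_byte):
    """Return the name of the ECN codepoint carried in a TOS /
    Traffic Class octet (the upper six bits are the DSCP)."""
    return ECN_NAMES[tos_byte & 0b11]
```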
skipping to change at page 10, line 5 skipping to change at page 10, line 6
application, each user could use finger gestures to pan or zoom their application, each user could use finger gestures to pan or zoom their
own high definition (HD) sub-window of a larger video scene generated own high definition (HD) sub-window of a larger video scene generated
on the fly in 'the cloud' from a football match. Another user on the fly in 'the cloud' from a football match. Another user
wearing VR goggles was remotely receiving a feed from a 360-degree wearing VR goggles was remotely receiving a feed from a 360-degree
camera in a racing car, again with the sub-window in their field of camera in a racing car, again with the sub-window in their field of
vision generated on the fly in 'the cloud' dependent on their head vision generated on the fly in 'the cloud' dependent on their head
movements. Even though other users were also downloading large movements. Even though other users were also downloading large
amounts of L4S and Classic data, playing a gaming benchmark and amounts of L4S and Classic data, playing a gaming benchmark and
watching videos over the same 40Mb/s downstream broadband link, watching videos over the same 40Mb/s downstream broadband link,
latency was so low that the football picture appeared to stick to the latency was so low that the football picture appeared to stick to the
user's finger on the touchpad and the experience fed from the remote user's finger on the touch pad and the experience fed from the remote
camera did not noticeably lag head movements. All the L4S data (even camera did not noticeably lag head movements. All the L4S data (even
including the downloads) achieved the same ultra-low latency. With including the downloads) achieved the same ultra-low latency. With
an alternative AQM, the video noticeably lagged behind the finger an alternative AQM, the video noticeably lagged behind the finger
gestures and head movements. gestures and head movements.
Unlike Diffserv Expedited Forwarding, the L4S queue does not have to Unlike Diffserv Expedited Forwarding, the L4S queue does not have to
be limited to a small proportion of the link capacity in order to be limited to a small proportion of the link capacity in order to
achieve low delay. The L4S queue can be filled with a heavy load of achieve low delay. The L4S queue can be filled with a heavy load of
capacity-seeking flows (TCP Prague etc.) and still achieve low delay. capacity-seeking flows (TCP Prague etc.) and still achieve low delay.
The L4S queue does not rely on the presence of other traffic in the The L4S queue does not rely on the presence of other traffic in the
skipping to change at page 10, line 39 skipping to change at page 10, line 40
The L4S queue has latency priority, but the coupling from the Classic The L4S queue has latency priority, but the coupling from the Classic
to the L4S AQM (explained below) ensures that it does not have to the L4S AQM (explained below) ensures that it does not have
bandwidth priority over the Classic queue. bandwidth priority over the Classic queue.
2. DualQ Coupled AQM 2. DualQ Coupled AQM
There are two main aspects to the approach: There are two main aspects to the approach:
o the Coupled AQM that addresses throughput equivalence between o the Coupled AQM that addresses throughput equivalence between
Classic (e.g. Reno, Cubic) flows and L4S flows (that satisfy the Classic (e.g. Reno, Cubic) flows and L4S flows (that satisfy the
Prague L4S requirements). Prague L4S requirements).
o the Dual Queue structure that provides latency separation for L4S o the Dual Queue structure that provides latency separation for L4S
flows to isolate them from the typically large Classic queue. flows to isolate them from the typically large Classic queue.
2.1. Coupled AQM 2.1. Coupled AQM
In the 1990s, the `TCP formula' was derived for the relationship In the 1990s, the `TCP formula' was derived for the relationship
between the steady-state congestion window, cwnd, and the drop between the steady-state congestion window, cwnd, and the drop
probability, p, of standard Reno congestion control [RFC5681]. To a probability, p, of standard Reno congestion control [RFC5681]. To a
skipping to change at page 13, line 4 skipping to change at page 13, line 8
other operators downstream from making their own choices on how to other operators downstream from making their own choices on how to
treat L4S traffic. treat L4S traffic.
In addition, an operator could use other identifiers to classify In addition, an operator could use other identifiers to classify
certain additional packet types into the L queue that it deems will certain additional packet types into the L queue that it deems will
not risk harm to the L4S service. For instance addresses of specific not risk harm to the L4S service. For instance addresses of specific
applications or hosts (see [I-D.ietf-tsvwg-ecn-l4s-id]), specific applications or hosts (see [I-D.ietf-tsvwg-ecn-l4s-id]), specific
Diffserv codepoints such as EF (Expedited Forwarding) and Voice-Admit Diffserv codepoints such as EF (Expedited Forwarding) and Voice-Admit
service classes (see [I-D.briscoe-tsvwg-l4s-diffserv]), the Non- service classes (see [I-D.briscoe-tsvwg-l4s-diffserv]), the Non-
Queue-Building (NQB) per-hop behaviour [I-D.ietf-tsvwg-nqb] or Queue-Building (NQB) per-hop behaviour [I-D.ietf-tsvwg-nqb] or
certain protocols (e.g. ARP, DNS). Note that the mechanism only certain protocols (e.g. ARP, DNS). Note that the mechanism only
reads these identifiers. [I-D.ietf-tsvwg-ecn-l4s-id] says it "MUST reads these identifiers. [I-D.ietf-tsvwg-ecn-l4s-id] says it "MUST
NOT alter these non-ECN identifiers". Thus, the L queue is not soley NOT alter these non-ECN identifiers". Thus, the L queue is not
an L4S queue, it can be considered more generally as a low latency solely an L4S queue, it can be considered more generally as a low
queue. latency queue.
2.4. Overall DualQ Coupled AQM Structure 2.4. Overall DualQ Coupled AQM Structure
Figure 1 shows the overall structure that any DualQ Coupled AQM is Figure 1 shows the overall structure that any DualQ Coupled AQM is
likely to have. This schematic is intended to aid understanding of likely to have. This schematic is intended to aid understanding of
the current designs of DualQ Coupled AQMs. However, it is not the current designs of DualQ Coupled AQMs. However, it is not
intended to preclude other innovative ways of satisfying the intended to preclude other innovative ways of satisfying the
normative requirements in Section 2.5 that minimally define a DualQ normative requirements in Section 2.5 that minimally define a DualQ
Coupled AQM. Coupled AQM.
skipping to change at page 17, line 9 skipping to change at page 17, line 11
Assuming Scalable congestion controls for the Internet will be as Assuming Scalable congestion controls for the Internet will be as
aggressive as DCTCP, this will ensure their congestion window will be aggressive as DCTCP, this will ensure their congestion window will be
roughly the same as that of a standards track TCP Reno congestion roughly the same as that of a standards track TCP Reno congestion
control (Reno) [RFC5681] and other Reno-friendly controls, such as control (Reno) [RFC5681] and other Reno-friendly controls, such as
TCP Cubic in its Reno-compatibility mode. TCP Cubic in its Reno-compatibility mode.
The choice of k is a matter of operator policy, and operators MAY The choice of k is a matter of operator policy, and operators MAY
choose a different value using Table 1 and the guidelines in choose a different value using Table 1 and the guidelines in
Appendix C.2. Appendix C.2.
If multiple customers or users share capacity at a bottleneck (e.g. If multiple customers or users share capacity at a bottleneck
in the Internet access link of a campus network), the operator's (e.g. in the Internet access link of a campus network), the
choice of k will determine capacity sharing between the flows of operator's choice of k will determine capacity sharing between the
different customers. However, on the public Internet, access network flows of different customers. However, on the public Internet,
operators typically isolate customers from each other with some form access network operators typically isolate customers from each other
of layer-2 multiplexing (OFDM(A) in DOCSIS3.1, CDMA in 3G, SC-FDMA in with some form of layer-2 multiplexing (OFDM(A) in DOCSIS3.1, CDMA in
LTE) or L3 scheduling (WRR in DSL), rather than relying on host 3G, SC-FDMA in LTE) or L3 scheduling (WRR in DSL), rather than
congestion controls to share capacity between customers [RFC0970]. relying on host congestion controls to share capacity between
In such cases, the choice of k will solely affect relative flow rates customers [RFC0970]. In such cases, the choice of k will solely
within each customer's access capacity, not between customers. Also, affect relative flow rates within each customer's access capacity,
k will not affect relative flow rates at any times when all flows are not between customers. Also, k will not affect relative flow rates
Classic or all flows are L4S, and it will not affect the relative at any times when all flows are Classic or all flows are L4S, and it
throughput of small flows. will not affect the relative throughput of small flows.
2.5.1.1. Requirements in Unexpected Cases 2.5.1.1. Requirements in Unexpected Cases
The flexibility to allow operator-specific classifiers (Section 2.3) The flexibility to allow operator-specific classifiers (Section 2.3)
leads to the need to specify what the AQM in each queue ought to do leads to the need to specify what the AQM in each queue ought to do
with packets that do not carry the ECN field expected for that queue. with packets that do not carry the ECN field expected for that queue.
It is expected that the AQM in each queue will inspect the ECN field It is expected that the AQM in each queue will inspect the ECN field
to determine what sort of congestion notification to signal, then it to determine what sort of congestion notification to signal, then it
will decide whether to apply congestion notification to this will decide whether to apply congestion notification to this
particular packet, as follows: particular packet, as follows:
o If a packet that does not carry an ECT(1) or CE codepoint is o If a packet that does not carry an ECT(1) or CE codepoint is
classified into the L queue: classified into the L queue:
* if the packet is ECT(0), the L AQM SHOULD apply CE-marking * if the packet is ECT(0), the L AQM SHOULD apply CE-marking
using a probability appropriate to Classic congestion control using a probability appropriate to Classic congestion control
and appropriate to the target delay in the L queue and appropriate to the target delay in the L queue
* if the packet is Not-ECT, the appropriate action depends on * if the packet is Not-ECT, the appropriate action depends on
whether some other function is protecting the L queue from whether some other function is protecting the L queue from
misbehaving flows (e.g. per-flow queue protection misbehaving flows (e.g. per-flow queue
[I-D.briscoe-docsis-q-protection] or latency policing): protection [I-D.briscoe-docsis-q-protection] or latency
policing):
+ If separate queue protection is provided, the L AQM SHOULD + If separate queue protection is provided, the L AQM SHOULD
ignore the packet and forward it unchanged, meaning it ignore the packet and forward it unchanged, meaning it
should not calculate whether to apply congestion should not calculate whether to apply congestion
notification and it should neither drop nor CE-mark the notification and it should neither drop nor CE-mark the
packet (for instance, the operator might classify EF traffic packet (for instance, the operator might classify EF traffic
that is unresponsive to drop into the L queue, alongside that is unresponsive to drop into the L queue, alongside
responsive L4S-ECN traffic) responsive L4S-ECN traffic)
+ if separate queue protection is not provided, the L AQM + if separate queue protection is not provided, the L AQM
skipping to change at page 18, line 23 skipping to change at page 18, line 26
* the C AQM SHOULD apply CE-marking using the coupled AQM * the C AQM SHOULD apply CE-marking using the coupled AQM
probability p_CL (= k*p'). probability p_CL (= k*p').
The above requirements are worded as "SHOULDs", because operator- The above requirements are worded as "SHOULDs", because operator-
specific classifiers are for flexibility, by definition. Therefore, specific classifiers are for flexibility, by definition. Therefore,
alternative actions might be appropriate in the operator's specific alternative actions might be appropriate in the operator's specific
circumstances. An example would be where the operator knows that circumstances. An example would be where the operator knows that
certain legacy traffic marked with one codepoint actually has a certain legacy traffic marked with one codepoint actually has a
congestion response associated with another codepoint. congestion response associated with another codepoint.
If the DualQ Coupled AQM has detected overload, it SHOULD signal If the DualQ Coupled AQM has detected overload, it MUST begin using
congestion solely using drop, irrespective of the ECN field. Classic drop, and continue until the overload episode has subsided.
Switching to drop if ECN marking is persistently high is required by Switching to drop if ECN marking is persistently high is required by
Section 7 of [RFC3168] and Section 4.2.1 of [RFC7567]. Section 7 of [RFC3168] and Section 4.2.1 of [RFC7567].
2.5.2. Management Requirements 2.5.2. Management Requirements
2.5.2.1. Configuration 2.5.2.1. Configuration
By default, a DualQ Coupled AQM SHOULD NOT need any configuration for By default, a DualQ Coupled AQM SHOULD NOT need any configuration for
use at a bottleneck on the public Internet [RFC7567]. The following use at a bottleneck on the public Internet [RFC7567]. The following
parameters MAY be operator-configurable, e.g. to tune for non- parameters MAY be operator-configurable, e.g. to tune for non-
skipping to change at page 21, line 4 skipping to change at page 21, line 14
but no new report is generated until the first time the AQM is out but no new report is generated until the first time the AQM is out
of overload once the timer has expired. of overload once the timer has expired.
2.5.2.4. Deployment, Coexistence and Scaling 2.5.2.4. Deployment, Coexistence and Scaling
[RFC5706] suggests that deployment, coexistence and scaling should [RFC5706] suggests that deployment, coexistence and scaling should
also be covered as management requirements. The raison d'etre of the also be covered as management requirements. The raison d'etre of the
DualQ Coupled AQM is to enable deployment and coexistence of Scalable DualQ Coupled AQM is to enable deployment and coexistence of Scalable
congestion controls - as incremental replacements for today's Reno- congestion controls - as incremental replacements for today's Reno-
friendly controls that do not scale with bandwidth-delay product. friendly controls that do not scale with bandwidth-delay product.
Therefore there is no need to repeat these motivating issues here Therefore there is no need to repeat these motivating issues here
given they are already explained in the Introduction and detailed in given they are already explained in the Introduction and detailed in
the L4S architecture [I-D.ietf-tsvwg-l4s-arch]. the L4S architecture [I-D.ietf-tsvwg-l4s-arch].
The descriptions of specific DualQ Coupled AQM algorithms in the The descriptions of specific DualQ Coupled AQM algorithms in the
appendices cover scaling of their configuration parameters, e.g. with appendices cover scaling of their configuration parameters, e.g. with
respect to RTT and sampling frequency. respect to RTT and sampling frequency.
3. IANA Considerations 3. IANA Considerations (to be removed by RFC Editor)
This specification contains no IANA considerations. This specification contains no IANA considerations.
4. Security Considerations 4. Security Considerations
4.1. Overload Handling 4.1. Overload Handling
Where the interests of users or flows might conflict, it could be Where the interests of users or flows might conflict, it could be
necessary to police traffic to isolate any harm to the performance of necessary to police traffic to isolate any harm to the performance of
individual flows. However it is hard to avoid unintended side- individual flows. However it is hard to avoid unintended side-
effects with policing, and in a trusted environment policing is not effects with policing, and in a trusted environment policing is not
necessary. Therefore per-flow policing (e.g. necessary. Therefore per-flow policing
[I-D.briscoe-docsis-q-protection]) needs to be separable from a basic (e.g. [I-D.briscoe-docsis-q-protection]) needs to be separable from a
AQM, as an option under policy control. basic AQM, as an option under policy control.
However, a basic DualQ AQM does at least need to handle overload. A However, a basic DualQ AQM does at least need to handle overload. A
useful objective would be for the overload behaviour of the DualQ AQM useful objective would be for the overload behaviour of the DualQ AQM
to be at least no worse than a single queue AQM. However, a trade- to be at least no worse than a single queue AQM. However, a trade-
off needs to be made between complexity and the risk of either off needs to be made between complexity and the risk of either
traffic class harming the other. In each of the following three traffic class harming the other. In each of the following three
subsections, an overload issue specific to the DualQ is described, subsections, an overload issue specific to the DualQ is described,
followed by proposed solution(s). followed by proposed solution(s).
Under overload the higher priority L4S service will have to sacrifice Under overload the higher priority L4S service will have to sacrifice
skipping to change at page 22, line 8 skipping to change at page 22, line 19
of whether to sacrifice L4S throughput or L4S delay (or some other of whether to sacrifice L4S throughput or L4S delay (or some other
policy) to mitigate starvation of Classic: policy) to mitigate starvation of Classic:
Sacrifice L4S throughput: By using weighted round robin as the Sacrifice L4S throughput: By using weighted round robin as the
conditional priority scheduler, the L4S service can sacrifice some conditional priority scheduler, the L4S service can sacrifice some
throughput during overload. This can either be thought of as throughput during overload. This can either be thought of as
guaranteeing a minimum throughput service for Classic traffic, or guaranteeing a minimum throughput service for Classic traffic, or
as guaranteeing a maximum delay for a packet at the head of the as guaranteeing a maximum delay for a packet at the head of the
Classic queue. Classic queue.
The scheduling weight of the Classic queue should be small (e.g. The scheduling weight of the Classic queue should be small
1/16). Then, in most traffic scenarios the scheduler will not (e.g. 1/16). Then, in most traffic scenarios the scheduler will
interfere and it will not need to - the coupling mechanism and the not interfere and it will not need to - the coupling mechanism and
end-systems will share out the capacity across both queues as if the end-systems will share out the capacity across both queues as
it were a single pool. However, because the congestion coupling if it were a single pool. However, because the congestion
only applies in one direction (from C to L), if L4S traffic is coupling only applies in one direction (from C to L), if L4S
over-aggressive or unresponsive, the scheduler weight for Classic traffic is over-aggressive or unresponsive, the scheduler weight
traffic will at least be large enough to ensure it does not for Classic traffic will at least be large enough to ensure it
starve. does not starve.
In cases where the ratio of L4S to Classic flows (e.g. 19:1) is In cases where the ratio of L4S to Classic flows (e.g. 19:1) is
greater than the ratio of their scheduler weights (e.g. 15:1), the greater than the ratio of their scheduler weights (e.g. 15:1), the
L4S flows will get less than an equal share of the capacity, but L4S flows will get less than an equal share of the capacity, but
only slightly. For instance, with the example numbers given, each only slightly. For instance, with the example numbers given, each
L4S flow will get (15/16)/19 = 4.9% when ideally each would get L4S flow will get (15/16)/19 = 4.9% when ideally each would get
1/20=5%. In the rather specific case of an unresponsive flow 1/20=5%. In the rather specific case of an unresponsive flow
taking up just less than the capacity set aside for L4S (e.g. taking up just less than the capacity set aside for L4S
14/16 in the above example), using WRR could significantly reduce (e.g. 14/16 in the above example), using WRR could significantly
the capacity left for any responsive L4S flows. reduce the capacity left for any responsive L4S flows.
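The arithmetic in the example above can be checked with a short sketch. It is illustrative only: the function and variable names are hypothetical, and it assumes that the flows within a queue split that queue's scheduler share equally.

```python
from fractions import Fraction


def per_flow_share(queue_weight, n_flows):
    """Fraction of link capacity per flow when n_flows split a queue's weight equally."""
    return queue_weight / n_flows


classic_weight = Fraction(1, 16)         # example Classic scheduling weight from the text
l4s_weight = 1 - classic_weight          # 15/16 left for the L queue
n_l4s, n_classic = 19, 1                 # the 19:1 flow ratio from the text

wrr_share = per_flow_share(l4s_weight, n_l4s)      # (15/16)/19
ideal_share = Fraction(1, n_l4s + n_classic)       # 1/20

# Capacity left for responsive L4S flows if one unresponsive L flow takes 14/16.
left_for_responsive = l4s_weight - Fraction(14, 16)

print(f"WRR-limited share per L4S flow: {float(wrr_share):.1%}")
print(f"Ideal equal share per flow:     {float(ideal_share):.1%}")
print(f"Left for responsive L4S flows:  {left_for_responsive}")
```

Exact fractions are used so that the small gap between the two shares is not blurred by floating-point rounding.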
The scheduling weight of the Classic queue should not be too The scheduling weight of the Classic queue should not be too
small, otherwise a C packet at the head of the queue could be small, otherwise a C packet at the head of the queue could be
excessively delayed by a continually busy L queue. For instance excessively delayed by a continually busy L queue. For instance
if the Classic weight is 1/16, the maximum that a Classic packet if the Classic weight is 1/16, the maximum that a Classic packet
at the head of the queue can be delayed by L traffic is the at the head of the queue can be delayed by L traffic is the
serialization delay of 15 MTU-sized packets. serialization delay of 15 MTU-sized packets.
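As a rough illustration of the bound above, the following sketch converts a Classic weight into a worst-case head-of-queue delay. The MTU of 1500 bytes and the link rate of 100 Mb/s are assumptions chosen for the example and are not taken from this specification; the function names are likewise hypothetical.

```python
from fractions import Fraction


def max_l_packets_ahead(classic_weight):
    """L packets that can be served before the head C packet, for a Classic weight 1/N."""
    return int(1 / classic_weight - 1)


def serialization_delay(packet_bytes, link_rate_bps):
    """Time in seconds to serialize one packet onto the link."""
    return packet_bytes * 8 / link_rate_bps


classic_weight = Fraction(1, 16)   # example weight from the text
mtu_bytes = 1500                   # assumed MTU, for illustration only
link_rate_bps = 100e6              # assumed link rate, for illustration only

packets_ahead = max_l_packets_ahead(classic_weight)
worst_case_delay = packets_ahead * serialization_delay(mtu_bytes, link_rate_bps)

print(f"{packets_ahead} MTU-sized packets, about {worst_case_delay * 1e3:.2f} ms")
```

Under these assumed values the bound is small, but it scales inversely with link rate, so a very small Classic weight matters most on slow links.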
Sacrifice L4S Delay: To control milder overload of responsive Sacrifice L4S Delay: To control milder overload of responsive
traffic, particularly when close to the maximum congestion signal, traffic, particularly when close to the maximum congestion signal,
skipping to change at page 23, line 49 skipping to change at page 24, line 11
Delay on Saturation: When L4S marking saturates, instead of Delay on Saturation: When L4S marking saturates, instead of
switching to drop, the drop and marking probabilities could be switching to drop, the drop and marking probabilities could be
capped. Beyond that, delay will grow either solely in the queue capped. Beyond that, delay will grow either solely in the queue
with unresponsive traffic (if WRR is used), or in both queues (if with unresponsive traffic (if WRR is used), or in both queues (if
time-shifted FIFO is used). In either case, the higher delay time-shifted FIFO is used). In either case, the higher delay
ought to control temporary high congestion. If the overload is ought to control temporary high congestion. If the overload is
more persistent, eventually the combined DualQ will overflow and more persistent, eventually the combined DualQ will overflow and
tail drop will control congestion. tail drop will control congestion.
The example implementation in Appendix A solely applies the "drop on The example implementation in Appendix A solely applies the "drop on
saturation" policy. The DOCSIS specification of a DualQ Coupled AQM saturation" policy. The DOCSIS specification of a DualQ Coupled
[DOCSIS3.1] also implements the 'drop on saturation' policy with a AQM [DOCSIS3.1] also implements the 'drop on saturation' policy with
very shallow L buffer. However, the addition of DOCSIS per-flow a very shallow L buffer. However, the addition of DOCSIS per-flow
Queue Protection [I-D.briscoe-docsis-q-protection] turns this into Queue Protection [I-D.briscoe-docsis-q-protection] turns this into
'delay on saturation' by redirecting some packets of the flow(s) most 'delay on saturation' by redirecting some packets of the flow(s) most
responsible for L queue overload into the C queue, which has a higher responsible for L queue overload into the C queue, which has a higher
delay target. If overload continues, this again becomes 'drop on delay target. If overload continues, this again becomes 'drop on
saturation' as the level of drop in the C queue rises to maintain the saturation' as the level of drop in the C queue rises to maintain the
target delay of the C queue. target delay of the C queue.
4.1.3. Protecting against Unresponsive ECN-Capable Traffic 4.1.3. Protecting against Unresponsive ECN-Capable Traffic
Unresponsive traffic has a greater advantage if it is also ECN- Unresponsive traffic has a greater advantage if it is also ECN-
skipping to change at page 25, line 15 skipping to change at page 25, line 24
Olga Albisser <olga@albisser.org> of Simula Research Lab, Norway Olga Albisser <olga@albisser.org> of Simula Research Lab, Norway
(Olga Bondarenko during early drafts) implemented the prototype (Olga Bondarenko during early drafts) implemented the prototype
DualPI2 AQM for Linux with Koen De Schepper and conducted DualPI2 AQM for Linux with Koen De Schepper and conducted
extensive evaluations as well as implementing the live performance extensive evaluations as well as implementing the live performance
visualization GUI [L4Sdemo16]. visualization GUI [L4Sdemo16].
Olivier Tilmans <olivier.tilmans@nokia-bell-labs.com> of Nokia Olivier Tilmans <olivier.tilmans@nokia-bell-labs.com> of Nokia
Bell Labs, Belgium prepared and maintains the Linux implementation Bell Labs, Belgium prepared and maintains the Linux implementation
of DualPI2 for upstreaming. of DualPI2 for upstreaming.
Tom Henderson <tomh@tomh.org> of CableLabs, US implemented various Shravya K.S. wrote a model for the ns-3 simulator based on the -01
DualQ Coupled AQMs for ns3, including DualPI2 and DualPIE over version of this Internet-Draft. Based on this initial work, Tom
point to point and DOCSIS 3.1 link models and conducted extensive Henderson <tomh@tomh.org> updated that earlier model and created a
evaluations. model for the DualQ variant specified as part of the Low Latency
DOCSIS specification, as well as conducting extensive evaluations.
Ing Jyh (Inton) Tsang of Nokia, Belgium built the End-to-End Data Ing Jyh (Inton) Tsang of Nokia, Belgium built the End-to-End Data
Centre to the Home broadband testbed on which DualQ Coupled AQM Centre to the Home broadband testbed on which DualQ Coupled AQM
implementations were tested. implementations were tested.
7. References 7. References
7.1. Normative References 7.1. Normative References
[I-D.ietf-tsvwg-ecn-l4s-id] [I-D.ietf-tsvwg-ecn-l4s-id]
Schepper, K. and B. Briscoe, "Identifying Modified Schepper, K. and B. Briscoe, "Identifying Modified
Explicit Congestion Notification (ECN) Semantics for Explicit Congestion Notification (ECN) Semantics for
Ultra-Low Queuing Delay (L4S)", draft-ietf-tsvwg-ecn-l4s- Ultra-Low Queuing Delay (L4S)", draft-ietf-tsvwg-ecn-l4s-
id-10 (work in progress), March 2020. id-11 (work in progress), November 2020.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997, DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>. <https://www.rfc-editor.org/info/rfc2119>.
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP", of Explicit Congestion Notification (ECN) to IP",
RFC 3168, DOI 10.17487/RFC3168, September 2001, RFC 3168, DOI 10.17487/RFC3168, September 2001,
<https://www.rfc-editor.org/info/rfc3168>. <https://www.rfc-editor.org/info/rfc3168>.
skipping to change at page 27, line 28 skipping to change at page 27, line 40
Briscoe, B. and G. White, "Queue Protection to Preserve Briscoe, B. and G. White, "Queue Protection to Preserve
Low Latency", draft-briscoe-docsis-q-protection-00 (work Low Latency", draft-briscoe-docsis-q-protection-00 (work
in progress), July 2019. in progress), July 2019.
[I-D.briscoe-tsvwg-l4s-diffserv] [I-D.briscoe-tsvwg-l4s-diffserv]
Briscoe, B., "Interactions between Low Latency, Low Loss, Briscoe, B., "Interactions between Low Latency, Low Loss,
Scalable Throughput (L4S) and Differentiated Services", Scalable Throughput (L4S) and Differentiated Services",
draft-briscoe-tsvwg-l4s-diffserv-02 (work in progress), draft-briscoe-tsvwg-l4s-diffserv-02 (work in progress),
November 2018. November 2018.
[I-D.cardwell-iccrg-bbr-congestion-control]
Cardwell, N., Cheng, Y., Yeganeh, S., and V. Jacobson,
"BBR Congestion Control", draft-cardwell-iccrg-bbr-
congestion-control-00 (work in progress), July 2017.
[I-D.ietf-tsvwg-l4s-arch] [I-D.ietf-tsvwg-l4s-arch]
Briscoe, B., Schepper, K., Bagnulo, M., and G. White, "Low Briscoe, B., Schepper, K., Bagnulo, M., and G. White, "Low
Latency, Low Loss, Scalable Throughput (L4S) Internet Latency, Low Loss, Scalable Throughput (L4S) Internet
Service: Architecture", draft-ietf-tsvwg-l4s-arch-06 (work Service: Architecture", draft-ietf-tsvwg-l4s-arch-07 (work
in progress), March 2020. in progress), October 2020.
[I-D.ietf-tsvwg-nqb] [I-D.ietf-tsvwg-nqb]
White, G. and T. Fossati, "A Non-Queue-Building Per-Hop White, G. and T. Fossati, "A Non-Queue-Building Per-Hop
Behavior (NQB PHB) for Differentiated Services", draft- Behavior (NQB PHB) for Differentiated Services", draft-
ietf-tsvwg-nqb-01 (work in progress), March 2020. ietf-tsvwg-nqb-03 (work in progress), November 2020.
[L4Sdemo16] [L4Sdemo16]
Bondarenko, O., De Schepper, K., Tsang, I., and B. Bondarenko, O., De Schepper, K., Tsang, I., and B.
Briscoe, "Ultra-Low Delay for All: Live Experience, Live Briscoe, "Ultra-Low Delay for All: Live Experience, Live
Analysis", Proc. MMSYS'16 pp33:1--33:4, May 2016, Analysis", Proc. MMSYS'16 pp33:1--33:4, May 2016,
<http://dl.acm.org/citation.cfm?doid=2910017.2910633 <http://dl.acm.org/citation.cfm?doid=2910017.2910633
(videos of demos: (videos of demos:
https://riteproject.eu/dctth/#1511dispatchwg )>. https://riteproject.eu/dctth/#1511dispatchwg )>.
[LLD] White, G., Sundaresan, K., and B. Briscoe, "Low Latency [LLD] White, G., Sundaresan, K., and B. Briscoe, "Low Latency
skipping to change at page 31, line 38 skipping to change at page 32, line 5
well as being coupled across to the L4S queue. well as being coupled across to the L4S queue.
It also uses the following functions that are not shown in full here: It also uses the following functions that are not shown in full here:
o scheduler(), which selects between the head packets of the two o scheduler(), which selects between the head packets of the two
queues; the choice of scheduler technology is discussed later; queues; the choice of scheduler technology is discussed later;
o cq.len() or lq.len() returns the current length (aka. backlog) of o cq.len() or lq.len() returns the current length (aka. backlog) of
the relevant queue in bytes; the relevant queue in bytes;
o cq.time() or lq.time() returns the current queuing delay (aka. o cq.time() or lq.time() returns the current queuing delay
sojourn time or service time) of the relevant queue in units of (aka. sojourn time or service time) of the relevant queue in units
time (see Note a); of time (see Note a);
o mark(pkt) and drop(pkt) for ECN-marking and dropping a packet; o mark(pkt) and drop(pkt) for ECN-marking and dropping a packet;
In experiments so far (building on experiments with PIE) on broadband In experiments so far (building on experiments with PIE) on broadband
access links ranging from 4 Mb/s to 200 Mb/s with base RTTs from 5 ms access links ranging from 4 Mb/s to 200 Mb/s with base RTTs from 5 ms
to 100 ms, DualPI2 achieves good results with the default parameters to 100 ms, DualPI2 achieves good results with the default parameters
in Figure 2. The parameters are categorised by whether they relate in Figure 2. The parameters are categorised by whether they relate
to the Base PI2 AQM, the L4S AQM or the framework coupling them to the Base PI2 AQM, the L4S AQM or the framework coupling them
together. Constants and variables derived from these parameters are together. Constants and variables derived from these parameters are
also included at the end of each category. Each parameter is also included at the end of each category. Each parameter is
skipping to change at page 38, line 22 skipping to change at page 39, line 15
Nonetheless, an implementer might wish to add selected heuristics to Nonetheless, an implementer might wish to add selected heuristics to
either AQM. For instance the Linux reference DualPI2 implementation either AQM. For instance the Linux reference DualPI2 implementation
includes the following: includes the following:
o Prior to enqueuing an L4S packet, if the L queue contains <2 o Prior to enqueuing an L4S packet, if the L queue contains <2
packets, the packet is flagged to suppress any native L4S AQM packets, the packet is flagged to suppress any native L4S AQM
marking at dequeue (which depends on sojourn time); marking at dequeue (which depends on sojourn time);
o Classic and coupled marking or dropping (i.e. based on p_C and o Classic and coupled marking or dropping (i.e. based on p_C and
p_CL from the PI controller) is only applied to a packet if the p_CL from the PI controller) is only applied to a packet if the
respective queue length in bytes is > 2 MTU (prior to enqueueing respective queue length in bytes is > 2 MTU (prior to enqueuing
the packet or after dequeuing it, depending on whether the AQM is the packet or after dequeuing it, depending on whether the AQM is
configured to be applied at enqueue or dequeue); configured to be applied at enqueue or dequeue);
o In the WRR scheduler, the 'credit' indicating which queue should o In the WRR scheduler, the 'credit' indicating which queue should
transmit is only changed if there are packets in both queues (i.e. transmit is only changed if there are packets in both queues
if there is actual resource contention). This means that a (i.e. if there is actual resource contention). This means that a
properly paced L flow might never be delayed by the WRR. The WRR properly paced L flow might never be delayed by the WRR. The WRR
credit is reset in favour of the L queue when the link is idle. credit is reset in favour of the L queue when the link is idle.
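The third heuristic can be sketched as follows. This is not the Linux implementation: it is a simplified, packet-counting WRR with hypothetical names (WrrCreditSketch, l_burst), written only to show credit that changes solely under contention and is reset when the link is idle. A real scheduler would more likely account in bytes.

```python
from collections import deque


class WrrCreditSketch:
    """Packet-count WRR between an L queue and a C queue (illustrative only)."""

    def __init__(self, l_burst=15):
        self.lq = deque()
        self.cq = deque()
        self.l_burst = l_burst   # L packets served per C packet under contention
        self.credit = l_burst    # remaining L turns before C is served

    def dequeue(self):
        if not self.lq and not self.cq:
            # Link idle: reset the credit in favour of the L queue.
            self.credit = self.l_burst
            return None
        if not self.cq:
            # Only L is backlogged: no contention, so credit is untouched.
            return self.lq.popleft()
        if not self.lq:
            # Only C is backlogged: no contention, so credit is untouched.
            return self.cq.popleft()
        # Both queues are backlogged: this is the only case that spends credit.
        if self.credit > 0:
            self.credit -= 1
            return self.lq.popleft()
        self.credit = self.l_burst
        return self.cq.popleft()
```

With this structure, an L flow that never overlaps with Classic backlog never spends credit, which matches the observation that a properly paced L flow might never be delayed by the WRR.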
An implementer might also wish to add other heuristics, e.g. burst An implementer might also wish to add other heuristics, e.g. burst
protection [RFC8033] or enhanced burst protection [RFC8034]. protection [RFC8033] or enhanced burst protection [RFC8034].
Notes: Notes:
a. The drain rate of the queue can vary if it is scheduled relative a. The drain rate of the queue can vary if it is scheduled relative
to other queues, or to cater for fluctuations in a wireless to other queues, or to cater for fluctuations in a wireless
medium. To auto-adjust to changes in drain rate, the queue needs medium. To auto-adjust to changes in drain rate, the queue needs
to be measured in time, not bytes or packets [AQMmetrics] to be measured in time, not bytes or packets [AQMmetrics],
[CoDel]. Queuing delay could be measured directly by storing a [CoDel]. Queuing delay could be measured directly by storing a
per-packet time-stamp as each packet is enqueued, and subtracting per-packet time-stamp as each packet is enqueued, and subtracting
this from the system time when the packet is dequeued. If time- this from the system time when the packet is dequeued. If time-
stamping is not easy to introduce with certain hardware, queuing stamping is not easy to introduce with certain hardware, queuing
delay could be predicted indirectly by dividing the size of the delay could be predicted indirectly by dividing the size of the
queue by the predicted departure rate, which might be known queue by the predicted departure rate, which might be known
precisely for some link technologies (see for example [RFC8034]). precisely for some link technologies (see for example [RFC8034]).
b. Line 2 of the dualpi2_enqueue() function (Figure 3) assumes an b. Line 2 of the dualpi2_enqueue() function (Figure 3) assumes an
implementation where lq and cq share common buffer memory. An implementation where lq and cq share common buffer memory. An
skipping to change at page 39, line 51 skipping to change at page 40, line 46
protects the queues against both temporary overload from responsive protects the queues against both temporary overload from responsive
flows and more persistent overload from any unresponsive traffic that flows and more persistent overload from any unresponsive traffic that
falsely claims to be responsive to ECN. falsely claims to be responsive to ECN.
When the Classic ECN marking probability reaches the p_Cmax threshold When the Classic ECN marking probability reaches the p_Cmax threshold
(1/k^2), the marking probability coupled to the L4S queue, p_CL will (1/k^2), the marking probability coupled to the L4S queue, p_CL will
always be 100% for any k (by equation (1) in Section 2). So, for always be 100% for any k (by equation (1) in Section 2). So, for
readability, the constant p_Lmax is defined as 1 in line 22 of the readability, the constant p_Lmax is defined as 1 in line 22 of the
initialization function (Figure 2). This is intended to ensure that initialization function (Figure 2). This is intended to ensure that
the L4S queue starts to introduce dropping once ECN-marking saturates the L4S queue starts to introduce dropping once ECN-marking saturates
at 100% and can rise no further. The 'Prague L4S' requirements at 100% and can rise no further. The 'Prague L4S'
requirements [I-D.ietf-tsvwg-ecn-l4s-id] state that, when an L4S
[I-D.ietf-tsvwg-ecn-l4s-id] state that, when an L4S congestion congestion control detects a drop, it falls back to a response that
control detects a drop, it falls back to a response that coexists coexists with 'Classic' Reno congestion control. So it is correct
with 'Classic' Reno congestion control. So it is correct that, when that, when the L4S queue drops packets, it drops them proportional to
the L4S queue drops packets, it drops them proportional to p'^2, as p'^2, as if they are Classic packets.
if they are Classic packets.
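The saturation point can be checked with a short worked derivation. It is a sketch that assumes equation (1) in Section 2 takes the form p_C = (p_CL / k)^2, with the internal variable p' defined so that p_C = p'^2 and p_CL = k * p'; neither definition is restated in this part of the text.

```latex
\begin{align}
  p_C &= p'^2, \qquad p_{CL} = k\,p'
      \quad\Longrightarrow\quad p_C = \left(\frac{p_{CL}}{k}\right)^{2} \\
  p_C &= p_{Cmax} = \frac{1}{k^{2}}
      \quad\Longrightarrow\quad p' = \frac{1}{k}
      \quad\Longrightarrow\quad p_{CL} = k \cdot \frac{1}{k} = 1
\end{align}
```

Under that assumption the result is independent of k, which is why p_Lmax can be written as the constant 1.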
Both these switch-overs are triggered by the tests for overload Both these switch-overs are triggered by the tests for overload
introduced in lines 4b and 12b of the dequeue function (Figure 7). introduced in lines 4b and 12b of the dequeue function (Figure 7).
Lines 8c to 8g drop L4S packets with probability p'^2. Lines 8h to Lines 8c to 8g drop L4S packets with probability p'^2. Lines 8h to
8i mark the remaining packets with probability p_CL. Given p_Lmax = 8i mark the remaining packets with probability p_CL. Given p_Lmax =
1, all remaining packets will be marked because, to have reached the 1, all remaining packets will be marked because, to have reached the
else block at line 8b, p_CL >= 1. else block at line 8b, p_CL >= 1.
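The overload branch described above might be sketched as follows. This is an illustration only, not the pseudocode of Figure 7: the function name, the return values and the injectable random source rng are hypothetical, and p_CL is assumed to be k * p'.

```python
import random

P_LMAX = 1.0  # the text defines p_Lmax as 1


def l4s_overload_action(p_prime, k, rng=random.random):
    """Decide the fate of one L4S packet once ECN marking has saturated.

    Hypothetical helper: returns 'drop', 'mark' or 'forward'.
    p_prime is the internal probability p'; k is the coupling factor.
    """
    p_cl = k * p_prime  # assumed coupling, p_CL = k * p'
    if p_cl < P_LMAX:
        raise ValueError("not overloaded: the normal marking branch applies")
    # Drop with probability p'^2, as if the packet were Classic.
    if rng() < p_prime ** 2:
        return "drop"
    # Mark the remaining packets with probability p_CL. Because
    # p_CL >= 1 here and rng() is below 1, every survivor is marked.
    if rng() < p_cl:
        return "mark"
    return "forward"
```

With p' = 0.5 and k = 2, roughly a quarter of L4S packets would be dropped and all the rest marked.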
Lines 2c to 2d in the core PI algorithm (Figure 8) deal with overload Lines 2c to 2d in the core PI algorithm (Figure 8) deal with overload
of the L4S queue when there is no Classic traffic. This is of the L4S queue when there is no Classic traffic. This is
skipping to change at page 43, line 8 skipping to change at page 44, line 8
* TS-FIFO is only appropriate if time-stamping of packets is * TS-FIFO is only appropriate if time-stamping of packets is
feasible; feasible;
* Even if time-stamping is supported, the sojourn time of the * Even if time-stamping is supported, the sojourn time of the
head packet is always stale. For instance, if a burst arrives head packet is always stale. For instance, if a burst arrives
at an empty queue, the sojourn time will only measure the delay at an empty queue, the sojourn time will only measure the delay
of the burst once the burst is over, even though the queue knew of the burst once the burst is over, even though the queue knew
about it from the start. At the cost of more operations and about it from the start. At the cost of more operations and
more storage, a 'scaled sojourn time' metric of queue delay can more storage, a 'scaled sojourn time' metric of queue delay can
be used, which is the sojourn time of a packet scaled by the be used, which is the sojourn time of a packet scaled by the
ratio of the queue sizes when the packet departed and arrived ratio of the queue sizes when the packet departed and
[SigQ-Dyn]. arrived [SigQ-Dyn].
o A strict priority scheduler would be inappropriate, because it o A strict priority scheduler would be inappropriate, because it
would starve Classic if L4S was overloaded. would starve Classic if L4S was overloaded.
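The 'scaled sojourn time' metric mentioned above can be sketched as follows. This is a minimal illustration, not taken from [SigQ-Dyn]: the class and method names are hypothetical, and the backlog is assumed to be counted in bytes including the packet itself, so that the ratio is defined even for a packet arriving at an empty queue.

```python
from collections import deque


class ScaledSojournQueue:
    """FIFO that reports plain and scaled sojourn time at dequeue.

    Hypothetical sketch: the backlog is measured in bytes and includes
    the packet being enqueued or dequeued.
    """

    def __init__(self):
        self._pkts = deque()  # entries: (size, enqueue_time, backlog_at_enqueue)
        self._backlog = 0

    def enqueue(self, size, now):
        self._backlog += size
        # Extra per-packet storage: the backlog seen on arrival.
        self._pkts.append((size, now, self._backlog))

    def dequeue(self, now):
        size, t_enq, backlog_in = self._pkts.popleft()
        backlog_out = self._backlog  # still includes the departing packet
        self._backlog -= size
        sojourn = now - t_enq
        # Scale by the ratio of queue sizes at departure and at arrival.
        scaled = sojourn * backlog_out / backlog_in
        return sojourn, scaled
```

If three equal-sized packets arrive together at an empty queue, the head packet's plain sojourn time reflects only its own wait, whereas its scaled value is three times larger because the backlog has tripled since it arrived.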
Appendix B. Example DualQ Coupled Curvy RED Algorithm Appendix B. Example DualQ Coupled Curvy RED Algorithm
As another example of a DualQ Coupled AQM algorithm, the pseudocode As another example of a DualQ Coupled AQM algorithm, the pseudocode
below gives the Curvy RED based algorithm. Although the AQM was below gives the Curvy RED based algorithm. Although the AQM was
designed to be efficient in integer arithmetic, to aid understanding designed to be efficient in integer arithmetic, to aid understanding
it is first given using floating point arithmetic (Figure 10). Then, it is first given using floating point arithmetic (Figure 10). Then,
skipping to change at page 44, line 5 skipping to change at page 45, line 5
or not shown in full here: or not shown in full here:
o the enqueue function, which is identical to that used for DualPI2, o the enqueue function, which is identical to that used for DualPI2,
dualpi2_enqueue(lq, cq, pkt) in Figure 3; dualpi2_enqueue(lq, cq, pkt) in Figure 3;
o mark(pkt) and drop(pkt) for ECN-marking and dropping a packet; o mark(pkt) and drop(pkt) for ECN-marking and dropping a packet;
o cq.len() or lq.len() returns the current length (aka. backlog) of o cq.len() or lq.len() returns the current length (aka. backlog) of
the relevant queue in bytes; the relevant queue in bytes;
o cq.time() or lq.time() returns the current queuing delay (aka. o cq.time() or lq.time() returns the current queuing delay
sojourn time or service time) of the relevant queue in units of (aka. sojourn time or service time) of the relevant queue in units
time (see Note a in Appendix A.1). of time (see Note a in Appendix A.1).
Because Curvy RED was evaluated before DualPI2, certain improvements Because Curvy RED was evaluated before DualPI2, certain improvements
introduced for DualPI2 were not evaluated for Curvy RED. In the introduced for DualPI2 were not evaluated for Curvy RED. In the
pseudocode below, the straightforward improvements have been added on pseudocode below, the straightforward improvements have been added on
the assumption they will provide similar benefits, but that has not the assumption they will provide similar benefits, but that has not
been proven experimentally. They are: i) a conditional priority been proven experimentally. They are: i) a conditional priority
scheduler instead of strict priority ii) a time-based threshold for scheduler instead of strict priority ii) a time-based threshold for
the native L4S AQM; iii) ECN support for the Classic AQM. A recent the native L4S AQM; iii) ECN support for the Classic AQM. A recent
evaluation has proved that a minimum ECN-marking threshold (minTh) evaluation has proved that a minimum ECN-marking threshold (minTh)
greatly improves performance, so this is also included in the greatly improves performance, so this is also included in the
skipping to change at page 44, line 38 skipping to change at page 45, line 38
same degree as the DualPI2 algorithm. In initial experiments on same degree as the DualPI2 algorithm. In initial experiments on
broadband access links ranging from 4 Mb/s to 200 Mb/s with base RTTs broadband access links ranging from 4 Mb/s to 200 Mb/s with base RTTs
from 5 ms to 100 ms, Curvy RED achieved good results with the default from 5 ms to 100 ms, Curvy RED achieved good results with the default
parameters in Figure 9. parameters in Figure 9.
The parameters are categorised by whether they relate to the Classic The parameters are categorised by whether they relate to the Classic
AQM, the L4S AQM or the framework coupling them together. Constants AQM, the L4S AQM or the framework coupling them together. Constants
and variables derived from these parameters are also included at the and variables derived from these parameters are also included at the
end of each category. These are the raw input parameters for the end of each category. These are the raw input parameters for the
algorithm. A configuration front-end could accept more meaningful algorithm. A configuration front-end could accept more meaningful
parameters (e.g. RTT_max and RTT_typ) and convert them into these parameters (e.g. RTT_max and RTT_typ) and convert them into these raw
raw parameters, as has been done for DualPI2 in Appendix A. Where parameters, as has been done for DualPI2 in Appendix A. Where
necessary, parameters are explained further in the walk-through of necessary, parameters are explained further in the walk-through of
the pseudocode below. the pseudocode below.
1: cred_params_init(...) { % Set input parameter defaults 1: cred_params_init(...) { % Set input parameter defaults
2: % DualQ Coupled framework parameters 2: % DualQ Coupled framework parameters
3: limit = MAX_LINK_RATE * 250 ms % Dual buffer size 3: limit = MAX_LINK_RATE * 250 ms % Dual buffer size
4: k' = 1 % Coupling factor as a power of 2 4: k' = 1 % Coupling factor as a power of 2
5: tshift = 50 ms % Time shift of TS-FIFO scheduler 5: tshift = 50 ms % Time shift of TS-FIFO scheduler
6: % Constants derived from Classic AQM parameters 6: % Constants derived from Classic AQM parameters
7: k = 2^k' % Coupling factor from Equation (1) 7: k = 2^k' % Coupling factor from Equation (1)
skipping to change at page 48, line 22 skipping to change at page 49, line 22
compute Classic drop probability so, before it is squared, it is compute Classic drop probability so, before it is squared, it is
effectively the square root of the drop probability, hence it is effectively the square root of the drop probability, hence it is
given the variable name sqrt_p_C. The squaring is done by given the variable name sqrt_p_C. The squaring is done by
comparing it with the maximum out of two random numbers (assuming comparing it with the maximum out of two random numbers (assuming
U=1). Comparing it with the maximum out of two is the same as the U=1). Comparing it with the maximum out of two is the same as the
logical `AND' of two tests, which ensures drop probability rises logical `AND' of two tests, which ensures drop probability rises
with the square of queuing time. with the square of queuing time.
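The squaring trick can be illustrated with a short sketch. It assumes U=1 and uniform random numbers in [0, 1); the function names and the seeded generator are hypothetical and are not part of the Curvy RED pseudocode.

```python
import random


def classic_drop(sqrt_p_c, r1, r2):
    """Drop test against the maximum of two random numbers (U=1)."""
    return sqrt_p_c > max(r1, r2)


def classic_drop_and(sqrt_p_c, r1, r2):
    """Equivalent form: the logical AND of two independent tests."""
    return sqrt_p_c > r1 and sqrt_p_c > r2


def drop_frequency(sqrt_p_c, trials, seed):
    """Empirical drop frequency; it should approach sqrt_p_c squared."""
    rng = random.Random(seed)
    drops = sum(
        classic_drop(sqrt_p_c, rng.random(), rng.random())
        for _ in range(trials)
    )
    return drops / trials
```

Each test passes with probability sqrt_p_C, so both pass with probability sqrt_p_C^2. For example, drop_frequency(0.5, 100000, seed=1) should come out close to 0.25.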
The AQM functions in each queue (lines 5c & 10b) are two cases of a The AQM functions in each queue (lines 5c & 10b) are two cases of a
new generalization of RED called Curvy RED, motivated as follows. new generalization of RED called Curvy RED, motivated as follows.
When the performance of this AQM was compared with fq_CoDel and PIE, When the performance of this AQM was compared with FQ-CoDel and PIE,
their goal of holding queuing delay to a fixed target seemed their goal of holding queuing delay to a fixed target seemed
misguided [CRED_Insights]. As the number of flows increases, if the misguided [CRED_Insights]. As the number of flows increases, if the
AQM does not allow host congestion controllers to increase queuing AQM does not allow host congestion controllers to increase queuing
delay, it has to introduce abnormally high levels of loss. Then loss delay, it has to introduce abnormally high levels of loss. Then loss
rather than queuing becomes the dominant cause of delay for short rather than queuing becomes the dominant cause of delay for short
flows, due to timeouts and tail losses. flows, due to timeouts and tail losses.
Curvy RED constrains delay with a softened target that allows some Curvy RED constrains delay with a softened target that allows some
increase in delay as load increases. This is achieved by increasing increase in delay as load increases. This is achieved by increasing
drop probability on a convex curve relative to queue growth (the drop probability on a convex curve relative to queue growth (the
skipping to change at page 49, line 18 skipping to change at page 50, line 18
U=1 has been used in experiments so far, but results might be even U=1 has been used in experiments so far, but results might be even
better with U=2 or higher. better with U=2 or higher.
Notes: Notes:
1. The alternative of applying the AQMs at enqueue would shift some 1. The alternative of applying the AQMs at enqueue would shift some
processing from the critical time when each packet is dequeued. processing from the critical time when each packet is dequeued.
However, it would also add a whole queue of delay to the control However, it would also add a whole queue of delay to the control
signals, making the control loop sloppier (for a typical RTT it signals, making the control loop sloppier (for a typical RTT it
would double the Classic queue's feedback delay). On a platform would double the Classic queue's feedback delay). On a platform
where packet timestamping is feasible, e.g. Linux, it is also where packet timestamping is feasible, e.g. Linux, it is also
easiest to apply the AQMs at dequeue because that is where easiest to apply the AQMs at dequeue because that is where
queuing time is also measured. queuing time is also measured.
2. WRR better isolates the L4S queue from large delay bursts in the 2. WRR better isolates the L4S queue from large delay bursts in the
Classic queue, but it is slightly less simple than TS-FIFO. If Classic queue, but it is slightly less simple than TS-FIFO. If
WRR were used, a low default Classic weight (e.g. 1/16) would WRR were used, a low default Classic weight (e.g. 1/16) would
need to be configured in place of the time shift in line 5 of the need to be configured in place of the time shift in line 5 of the
initialization function (Figure 9). initialization function (Figure 9).
3. A step function is shown for simplicity. A ramp function (see 3. A step function is shown for simplicity. A ramp function (see
Figure 5 and the discussion around it in Appendix A.1) is Figure 5 and the discussion around it in Appendix A.1) is
recommended, because it is more general than a step and has the recommended, because it is more general than a step and has the
potential to enable L4S congestion controls to converge more potential to enable L4S congestion controls to converge more
rapidly. rapidly.
4. An EWMA is only one possible way to filter bursts; other more 4. An EWMA is only one possible way to filter bursts; other more
adaptive smoothing methods could be valid and it might be adaptive smoothing methods could be valid and it might be
appropriate to decrease the EWMA faster than it increases, e.g. appropriate to decrease the EWMA faster than it increases,
by using the minimum of the smoothed and instantaneous queue e.g. by using the minimum of the smoothed and instantaneous queue
delays, min(Q_C, qc.time()). delays, min(Q_C, qc.time()).
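The asymmetric smoothing suggested in Note 4 might look like the sketch below. The EWMA gain alpha and both function names are hypothetical; only the use of min(Q_C, qc.time()) comes from the note.

```python
def ewma_step(q_smoothed, q_inst, alpha):
    """One EWMA update of the smoothed Classic queue delay Q_C.

    alpha is a hypothetical gain in (0, 1]; smaller values smooth more.
    """
    return q_smoothed + alpha * (q_inst - q_smoothed)


def effective_delay(q_smoothed, q_inst):
    """Delay fed to the AQM: rises slowly with the EWMA, falls at once."""
    return min(q_smoothed, q_inst)
```

When a burst arrives, the instantaneous delay exceeds Q_C, so the smoothed value is used and the burst is filtered. When the queue drains, the instantaneous delay falls below Q_C and takes over immediately.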
B.2. Efficient Implementation of Curvy RED B.2. Efficient Implementation of Curvy RED
Although code optimization depends on the platform, the following Although code optimization depends on the platform, the following
notes explain where the design of Curvy RED was particularly notes explain where the design of Curvy RED was particularly
motivated by efficient implementation. motivated by efficient implementation.
The Classic AQM at line 10b calls maxrand(2*U), which gives twice as The Classic AQM at line 10b calls maxrand(2*U), which gives twice as
much curviness as the call to maxrand(U) in the marking function at much curviness as the call to maxrand(U) in the marking function at
skipping to change at page 53, line 6 skipping to change at page 54, line 6
as Classic flows, to compensate for Classic flows slowing themselves as Classic flows, to compensate for Classic flows slowing themselves
down by causing themselves extra queuing delay. down by causing themselves extra queuing delay.
The values for k' in the table are derived from the formulae below, The values for k' in the table are derived from the formulae below,
which were developed in [DCttH15]: which were developed in [DCttH15]:
2^k' = 1.64 (RTT_reno / RTT_dc) (5) 2^k' = 1.64 (RTT_reno / RTT_dc) (5)
2^k' = 1.19 (RTT_cubic / RTT_dc ) (6) 2^k' = 1.19 (RTT_cubic / RTT_dc ) (6)
For localized traffic from a particular ISP's data centre, using the For localized traffic from a particular ISP's data centre, using the
measured RTTs, it was calculated that a value of k'=3 (equivalant to measured RTTs, it was calculated that a value of k'=3 (equivalent to
k=8) would achieve throughput equivalence, and experiments verified k=8) would achieve throughput equivalence, and experiments verified
the formula very closely. the formula very closely.
For a typical mix of RTTs from local data centres and across the For a typical mix of RTTs from local data centres and across the
general Internet, a value of k'=1 (equivalent to k=2) is recommended general Internet, a value of k'=1 (equivalent to k=2) is recommended
as a good workable compromise. as a good workable compromise.
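Equations (5) and (6) can be evaluated with a small helper such as the one below. It is a sketch under stated assumptions: the function names are hypothetical, the RTT arguments only need to share a unit, and rounding k' to the nearest integer is an illustrative choice rather than something the text prescribes.

```python
import math

RENO_FACTOR = 1.64   # constant in equation (5)
CUBIC_FACTOR = 1.19  # constant in equation (6)


def coupling_factor(rtt_classic, rtt_dc, factor):
    """k = 2^k' from equation (5) or (6), for the given RTT ratio."""
    return factor * rtt_classic / rtt_dc


def nearest_k_prime(k):
    """Nearest integer exponent k' such that 2^k' approximates k."""
    return round(math.log2(k))
```

For instance, with equal Reno and DCTCP RTTs equation (5) gives k = 1.64, for which the nearest exponent is k' = 1 (k = 2). An RTT ratio of about 4.9 gives k close to 8, that is, k' = 3.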
Authors' Addresses Authors' Addresses
Koen De Schepper Koen De Schepper
 End of changes. 50 change blocks. 
119 lines changed or deleted 126 lines changed or added
