draft-ietf-rmcat-sbd-00.txt | draft-ietf-rmcat-sbd-01.txt | |||
---|---|---|---|---|
RTP Media Congestion Avoidance D. Hayes, Ed. | RTP Media Congestion Avoidance D. Hayes, Ed. | |||
Techniques University of Oslo | Techniques University of Oslo | |||
Internet-Draft S. Ferlin | Internet-Draft S. Ferlin | |||
Intended status: Experimental Simula Research Laboratory | Intended status: Experimental Simula Research Laboratory | |||
Expires: November 9, 2015 M. Welzl | Expires: January 2, 2016 M. Welzl | |||
University of Oslo | University of Oslo | |||
May 8, 2015 | July 1, 2015 | |||
Shared Bottleneck Detection for Coupled Congestion Control for RTP | Shared Bottleneck Detection for Coupled Congestion Control for RTP | |||
Media. | Media. | |||
draft-ietf-rmcat-sbd-00 | draft-ietf-rmcat-sbd-01 | |||
Abstract | Abstract | |||
This document describes a mechanism to detect whether end-to-end data | This document describes a mechanism to detect whether end-to-end data | |||
flows share a common bottleneck. It relies on summary statistics | flows share a common bottleneck. It relies on summary statistics | |||
that are calculated by a data receiver based on continuous | that are calculated by a data receiver based on continuous | |||
measurements and regularly fed to a grouping algorithm that runs | measurements and regularly fed to a grouping algorithm that runs | |||
wherever the knowledge is needed. This mechanism complements the | wherever the knowledge is needed. This mechanism complements the | |||
coupled congestion control mechanism in draft-welzl-rmcat-coupled-cc. | coupled congestion control mechanism in draft-welzl-rmcat-coupled-cc. | |||
skipping to change at page 1, line 39 | skipping to change at page 1, line 39 | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on November 9, 2015. | This Internet-Draft will expire on January 2, 2016. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2015 IETF Trust and the persons identified as the | Copyright (c) 2015 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
skipping to change at page 2, line 18 | skipping to change at page 2, line 18 | |||
described in the Simplified BSD License. | described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
1.1. The signals . . . . . . . . . . . . . . . . . . . . . . . 3 | 1.1. The signals . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
1.1.1. Packet Loss . . . . . . . . . . . . . . . . . . . . . 3 | 1.1.1. Packet Loss . . . . . . . . . . . . . . . . . . . . . 3 | |||
1.1.2. Packet Delay . . . . . . . . . . . . . . . . . . . . . 3 | 1.1.2. Packet Delay . . . . . . . . . . . . . . . . . . . . . 3 | |||
1.1.3. Path Lag . . . . . . . . . . . . . . . . . . . . . . . 4 | 1.1.3. Path Lag . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
2.1. Parameter Values . . . . . . . . . . . . . . . . . . . . . 5 | 2.1. Parameters and their Effect . . . . . . . . . . . . . . . 5 | |||
3. Mechanism . . . . . . . . . . . . . . . . . . . . . . . . . . 6 | 2.2. Recommended Parameter Values . . . . . . . . . . . . . . . 7 | |||
3.1. Key metrics and their calculation . . . . . . . . . . . . 7 | 3. Mechanism . . . . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
3.1.1. Mean delay . . . . . . . . . . . . . . . . . . . . . . 7 | 3.1. Key metrics and their calculation . . . . . . . . . . . . 9 | |||
3.1.2. Skewness Estimate . . . . . . . . . . . . . . . . . . 8 | 3.1.1. Mean delay . . . . . . . . . . . . . . . . . . . . . . 9 | |||
3.1.3. Variance Estimate . . . . . . . . . . . . . . . . . . 9 | 3.1.2. Skewness Estimate . . . . . . . . . . . . . . . . . . 9 | |||
3.1.4. Oscillation Estimate . . . . . . . . . . . . . . . . . 9 | 3.1.3. Variability Estimate . . . . . . . . . . . . . . . . . 10 | |||
3.1.5. Packet loss . . . . . . . . . . . . . . . . . . . . . 10 | 3.1.4. Oscillation Estimate . . . . . . . . . . . . . . . . . 11 | |||
3.2. Flow Grouping . . . . . . . . . . . . . . . . . . . . . . 10 | 3.1.5. Packet loss . . . . . . . . . . . . . . . . . . . . . 11 | |||
3.2.1. Flow Grouping Algorithm . . . . . . . . . . . . . . . 10 | 3.2. Flow Grouping . . . . . . . . . . . . . . . . . . . . . . 12 | |||
3.2.2. Using the flow group signal . . . . . . . . . . . . . 12 | 3.2.1. Flow Grouping Algorithm . . . . . . . . . . . . . . . 12 | |||
3.3. Removing Noise from the Estimates . . . . . . . . . . . . 12 | 3.2.2. Using the flow group signal . . . . . . . . . . . . . 13 | |||
3.3.1. Oscillation noise . . . . . . . . . . . . . . . . . . 12 | 3.3. Removing Noise from the Estimates . . . . . . . . . . . . 13 | |||
3.3.2. Clock drift . . . . . . . . . . . . . . . . . . . . . 13 | 3.3.1. PDV noise . . . . . . . . . . . . . . . . . . . . . . 14 | |||
3.3.3. Bias in the skewness measure . . . . . . . . . . . . . 14 | 3.3.2. Oscillation noise . . . . . . . . . . . . . . . . . . 14 | |||
3.4. Reducing lag and Improving Responsiveness . . . . . . . . 14 | 3.3.3. Clock skew . . . . . . . . . . . . . . . . . . . . . . 15 | |||
3.4.1. Improving the response of the skewness estimate . . . 15 | 3.4. Reducing lag and Improving Responsiveness . . . . . . . . 15 | |||
3.4.2. Improving the response of the variance estimate . . . 15 | 3.4.1. Improving the response of the skewness estimate . . . 16 | |||
4. Measuring OWD . . . . . . . . . . . . . . . . . . . . . . . . 16 | 3.4.2. Improving the response of the variability estimate . . 16 | |||
4.1. Time stamp resolution . . . . . . . . . . . . . . . . . . 16 | 4. Measuring OWD . . . . . . . . . . . . . . . . . . . . . . . . 17 | |||
5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 16 | 4.1. Time stamp resolution . . . . . . . . . . . . . . . . . . 17 | |||
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 16 | 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 17 | |||
7. Security Considerations . . . . . . . . . . . . . . . . . . . 16 | 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 17 | |||
8. Change history . . . . . . . . . . . . . . . . . . . . . . . . 17 | 7. Security Considerations . . . . . . . . . . . . . . . . . . . 17 | |||
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 17 | 8. Change history . . . . . . . . . . . . . . . . . . . . . . . . 18 | |||
9.1. Normative References . . . . . . . . . . . . . . . . . . . 17 | 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 18 | |||
9.2. Informative References . . . . . . . . . . . . . . . . . . 17 | 9.1. Normative References . . . . . . . . . . . . . . . . . . . 18 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 18 | 9.2. Informative References . . . . . . . . . . . . . . . . . . 18 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 19 | ||||
1. Introduction | 1. Introduction | |||
In the Internet, it is not normally known if flows (e.g., TCP | In the Internet, it is not normally known if flows (e.g., TCP | |||
connections or UDP data streams) traverse the same bottlenecks. Even | connections or UDP data streams) traverse the same bottlenecks. Even | |||
flows that have the same sender and receiver may take different paths | flows that have the same sender and receiver may take different paths | |||
and share a bottleneck or not. Flows that share a bottleneck link | and share a bottleneck or not. Flows that share a bottleneck link | |||
usually compete with one another for their share of the capacity. | usually compete with one another for their share of the capacity. | |||
This competition has the potential to increase packet loss and | This competition has the potential to increase packet loss and | |||
delays. This is especially relevant for interactive applications | delays. This is especially relevant for interactive applications | |||
skipping to change at page 3, line 47 | skipping to change at page 3, line 47 | |||
End-to-end delay measurements include noise from every device along | End-to-end delay measurements include noise from every device along | |||
the path in addition to the delay perturbation at the bottleneck | the path in addition to the delay perturbation at the bottleneck | |||
device. The noise is often significantly increased if the round-trip | device. The noise is often significantly increased if the round-trip | |||
time is used. The cleanest signal is obtained by using One-Way-Delay | time is used. The cleanest signal is obtained by using One-Way-Delay | |||
(OWD). | (OWD). | |||
Measuring absolute OWD is difficult since it requires both the sender | Measuring absolute OWD is difficult since it requires both the sender | |||
and receiver clocks to be synchronised. However, since the | and receiver clocks to be synchronised. However, since the | |||
statistics being collected are relative to the mean OWD, a relative | statistics being collected are relative to the mean OWD, a relative | |||
OWD measurement is sufficient. Clock drift is not usually | OWD measurement is sufficient. Clock skew is not usually significant | |||
significant over the time intervals used by this SBD mechanism (see | over the time intervals used by this SBD mechanism (see [RFC6817] A.2 | |||
[RFC6817] A.2 for a discussion on clock drift and OWD measurements). | for a discussion on clock skew and OWD measurements). However, in | |||
However, in circumstances where it is significant, Section 3.3.2 | circumstances where it is significant, Section 3.3.3 outlines a way | |||
outlines a way of adjusting the calculations to cater for it. | of adjusting the calculations to cater for it. | |||
Each packet arriving at the bottleneck buffer may experience very | Each packet arriving at the bottleneck buffer may experience very | |||
different queue lengths, and therefore different waiting times. A | different queue lengths, and therefore different waiting times. A | |||
single OWD sample does not, therefore, characterize the path well. | single OWD sample does not, therefore, characterize the path well. | |||
However, multiple OWD measurements do reflect the distribution of | However, multiple OWD measurements do reflect the distribution of | |||
delays experienced at the bottleneck. | delays experienced at the bottleneck. | |||
1.1.3. Path Lag | 1.1.3. Path Lag | |||
Flows that share a common bottleneck may traverse different paths, | Flows that share a common bottleneck may traverse different paths, | |||
skipping to change at page 4, line 31 | skipping to change at page 4, line 31 | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
document are to be interpreted as described in RFC 2119 [RFC2119]. | document are to be interpreted as described in RFC 2119 [RFC2119]. | |||
Acronyms used in this document: | Acronyms used in this document: | |||
OWD -- One Way Delay | OWD -- One Way Delay | |||
PDV -- Packet Delay Variation | PDV -- Packet Delay Variation | |||
MAD -- Mean Absolute Deviation | ||||
RTT -- Round Trip Time | RTT -- Round Trip Time | |||
SBD -- Shared Bottleneck Detection | SBD -- Shared Bottleneck Detection | |||
Conventions used in this document: | Conventions used in this document: | |||
T -- the base time interval over which measurements are | T -- the base time interval over which measurements are | |||
made. | made. | |||
N -- the number of base time, T, intervals used in some | N -- the number of base time, T, intervals used in some | |||
calculations. | calculations. | |||
sum_T(...) -- summation of all the measurements of the variable | sum_T(...) -- summation of all the measurements of the variable | |||
in parentheses taken over the interval T | in parentheses taken over the interval T | |||
sum(...) -- summation of terms of the variable in parentheses | ||||
sum(...) -- summation of terms of the variable in parentheses | ||||
sum_N(...) -- summation of N terms of the variable in parentheses | sum_N(...) -- summation of N terms of the variable in parentheses | |||
sum_NT(...) -- summation of all measurements taken over the | ||||
interval N*T | ||||
E_T(...) -- the expectation or mean of the measurements of the | sum_NT(...) -- summation of all measurements taken over the | |||
variable in parentheses over T | interval N*T | |||
E_N(...) -- The expectation or mean of the last N values of the | E_T(...) -- the expectation or mean of the measurements of the | |||
variable in parentheses | variable in parentheses over T | |||
E_M(...) -- The expectation or mean of the last M values of the | E_N(...) -- the expectation or mean of the last N values of the | |||
variable in parentheses, where M <= N. | variable in parentheses | |||
E_M(...) -- the expectation or mean of the last M values of the | ||||
variable in parentheses, where M <= N. | ||||
max_T(...) -- the maximum recorded measurement of the variable in | max_T(...) -- the maximum recorded measurement of the variable in | |||
parentheses taken over the interval T | parentheses taken over the interval T | |||
min_T(...) -- the minimum recorded measurement of the variable in | min_T(...) -- the minimum recorded measurement of the variable in | |||
parentheses taken over the interval T | parentheses taken over the interval T | |||
num_T(...) -- the count of measurements of the variable in | num_T(...) -- the count of measurements of the variable in | |||
parentheses taken in the interval T | parentheses taken in the interval T | |||
num_VM(...) -- the count of valid values of the variable in | num_VM(...) -- the count of valid values of the variable in | |||
parentheses given M records | parentheses given M records | |||
PC -- a boolean variable indicating the particular flow was | PC -- a boolean variable indicating the particular flow | |||
identified as experiencing congestion in the previous | was identified as experiencing congestion in the | |||
interval T (i.e. Previously Congested) | previous interval T (i.e. Previously Congested) | |||
CD_T -- an estimate of the effect of Clock Drift on the mean | skew_est -- a measure of skewness in an OWD distribution. | |||
OWD per T | ||||
CD_Adj(...) -- Mean OWD adjusted for clock drift | var_est -- a measure of variability in OWD measurements. | |||
p_l, p_f, p_pdv, c_s, c_h, p_s, p_d, p_v -- various thresholds | freq_est -- a measure of low frequency oscillation in the OWD | |||
used in the mechanism. | measurements. | |||
N, M, and F -- number of values (calculated over T). | p_l, p_f, p_pdv, p_mad, c_s, c_h, p_s, p_d, p_v -- various | |||
thresholds used in the mechanism | ||||
2.1. Parameter Values | M and F -- number of values related to N | |||
2.1. Parameters and their Effect | ||||
T T should be long enough so that there are enough packets | ||||
received during T for a useful estimate of short term mean | ||||
OWD and variation statistics. Making T too large can limit | ||||
the efficacy of PDV and freq_est. It will also increase the | ||||
response time of the mechanism. Making T too small will make | ||||
the metrics noisier. | ||||
N & M N should be large enough to provide a stable estimate of | ||||
oscillations in OWD and average PDV. Usually M=N, though | ||||
having M<N may be beneficial in certain circumstances. M*T | ||||
needs to be long enough to provide stable estimates of skewness | ||||
and MAD (if used). | ||||
F F determines the number of intervals over which statistics | ||||
are considered to be equally weighted. When F=M, recent and | ||||
older measurements are considered equal. Making F<M can | ||||
increase the responsiveness of the SBD mechanism. If F is | ||||
too small, statistics will be too noisy. | ||||
c_s c_s is the threshold in skew_est used for determining whether | ||||
a flow is experiencing congestion or not. It should be | ||||
slightly negative so that a very lightly loaded path does not | ||||
give a false indication. Setting c_s more negative makes the | ||||
SBD mechanism less sensitive to transient and light | ||||
congestion episodes. | ||||
c_h c_h adds hysteresis to the congestion determination. It | ||||
should be large enough to avoid constant switching in the | ||||
determination, but low enough to ensure that grouping is not | ||||
attempted when there is no congestion and the delay and loss | ||||
signals cannot be relied upon. | ||||
p_v p_v determines the sensitivity of freq_est to noise. Making | ||||
it smaller will yield higher but noisier values for freq_est. | ||||
Making it too large will render it ineffective for | ||||
determining groups. | ||||
p_* Flows are separated when the skew_est|var_est|freq_est | ||||
measure is greater than p_s|p_f|p_d|(p_pdv|p_mad). Adjusting | ||||
these is a compromise between false grouping of flows that do | ||||
not share a bottleneck and false splitting of flows that do. | ||||
Making them larger can help if the measures are very noisy, | ||||
but reducing the noise in the statistical measures by | ||||
adjusting T and N|M may be a better solution. | ||||
2.2. Recommended Parameter Values | ||||
Reference [Hayes-LCN14] uses T=350ms, N=50, p_l = 0.1. The other | Reference [Hayes-LCN14] uses T=350ms, N=50, p_l = 0.1. The other | |||
parameters have been tightened to reflect minor enhancements to the | parameters have been tightened to reflect minor enhancements to the | |||
algorithm outlined in Section 3.3: c_s = -0.01, p_f = p_s = p_d = | algorithm outlined in Section 3.3: c_s = -0.01, p_f = p_s = p_d = | |||
0.1, p_pdv = 0.2, p_v = 0.2. M=50, F=10, and c_h = 0.3 are | 0.1, p_pdv = 0.2, p_v = 0.2 (or p_mad=0.1, p_v=0.7). M=50, F=25, and | |||
additional parameters defined in the document. These are values that | c_h = 0.3 are additional parameters defined in the document. These | |||
seem to work well over a wide range of practical Internet conditions, | are values that seem to work well over a wide range of practical | |||
but are the subject of ongoing tests. | Internet conditions. | |||
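The recommended values above can be collected in one place. The following is a minimal sketch using the values from the right-hand (-01) column; the class name `SbdParams` and the consistency check are illustrative assumptions and are not part of the draft. The alternative pair p_mad=0.1, p_v=0.7 is not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SbdParams:
    """Hypothetical container for the recommended SBD parameter values."""
    T: float = 0.350     # base interval in seconds (350 ms)
    N: int = 50          # number of T intervals kept
    M: int = 50          # M <= N
    F: int = 25          # equally weighted intervals
    p_l: float = 0.1
    c_s: float = -0.01   # slightly negative skew_est threshold
    c_h: float = 0.3     # hysteresis
    p_f: float = 0.1
    p_s: float = 0.1
    p_d: float = 0.1
    p_pdv: float = 0.2
    p_v: float = 0.2

    def __post_init__(self):
        # M <= N is stated in the text; F <= M is an assumption drawn
        # from the description of F ("F=M" and "F<M" are the cases given).
        if not (0 < self.F <= self.M <= self.N):
            raise ValueError("expected 0 < F <= M <= N")
```

With these defaults the statistics span M*T = 17.5 seconds of measurements.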
3. Mechanism | 3. Mechanism | |||
The mechanism described in this document is based on the observation | The mechanism described in this document is based on the observation | |||
that the distribution of delay measurements of packets from flows | that the distribution of delay measurements of packets that traverse | |||
that share a common bottleneck have similar shape characteristics. | a common bottleneck have similar shape characteristics. These shape | |||
These shape characteristics are described using 3 key summary | characteristics are described using 3 key summary statistics: | |||
statistics: | ||||
variance (estimate var_est, see Section 3.1.3) | variability (estimate var_est, see Section 3.1.3) | |||
skewness (estimate skew_est, see Section 3.1.2) | skewness (estimate skew_est, see Section 3.1.2) | |||
oscillation (estimate freq_est, see Section 3.1.4) | oscillation (estimate freq_est, see Section 3.1.4) | |||
with packet loss (estimate pkt_loss, see Section 3.1.5) used as a | with packet loss (estimate pkt_loss, see Section 3.1.5) used as a | |||
supplementary statistic. | supplementary statistic. | |||
Summary statistics help to address both the noise and the path lag | Summary statistics help to address both the noise and the path lag | |||
problems by describing the general shape over a relatively long | problems by describing the general shape over a relatively long | |||
period of time. This is sufficient for their application in coupled | period of time. This is sufficient for their application in coupled | |||
congestion control for RTP Media. They can be signalled from a | congestion control for RTP Media. They can be signalled from a | |||
receiver, which measures the OWD and calculates the summary | receiver, which measures the OWD and calculates the summary | |||
statistics, to a sender, which is the entity that is transmitting the | statistics, to a sender, which is the entity that is transmitting the | |||
media stream. An RTP Media device may be both a sender and a | media stream. An RTP Media device may be both a sender and a | |||
receiver. SBD can be performed at either Sender or receiver or both. | receiver. SBD can be performed at either a sender or a receiver or | |||
both. | ||||
+----+ | +----+ | |||
| H2 | | | H2 | | |||
+----+ | +----+ | |||
| | | | |||
| L2 | | L2 | |||
| | | | |||
+----+ L1 | L3 +----+ | +----+ L1 | L3 +----+ | |||
| H1 |------|------| H3 | | | H1 |------|------| H3 | | |||
+----+ +----+ | +----+ +----+ | |||
skipping to change at page 7, line 50 | skipping to change at page 9, line 28 | |||
necessarily be synchronized. However, it is a base measure for the 3 | necessarily be synchronized. However, it is a base measure for the 3 | |||
summary statistics. The mean delay, E_T(OWD), is the average one way | summary statistics. The mean delay, E_T(OWD), is the average one way | |||
delay measured over T. | delay measured over T. | |||
To facilitate the other calculations, the last N E_T(OWD) values will | To facilitate the other calculations, the last N E_T(OWD) values will | |||
need to be stored in a cyclic buffer along with the moving average of | need to be stored in a cyclic buffer along with the moving average of | |||
E_T(OWD): | E_T(OWD): | |||
mean_delay = E_M(E_T(OWD)) = sum_M(E_T(OWD)) / M | mean_delay = E_M(E_T(OWD)) = sum_M(E_T(OWD)) / M | |||
where M <= N. Generally M=N, setting M to be less than N allows the | where M <= N. Generally M=N: setting M to be less than N allows the | |||
mechanism to be more responsive to changes, but potentially at the | mechanism to be more responsive to changes, but potentially at the | |||
expense of a higher error rate (see Section 3.4 for a discussion on | expense of a higher error rate (see Section 3.4 for a discussion on | |||
improving the responsiveness of the mechanism.) | improving the responsiveness of the mechanism.) | |||
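As an illustration of the cyclic buffer and moving average described above, here is a minimal sketch. The class name `MeanDelayTracker` and its methods are hypothetical. Averaging over fewer than M values while the buffer is still filling is an assumption; the text does not specify start-up behaviour.

```python
from collections import deque


class MeanDelayTracker:
    """Keeps the last N values of E_T(OWD) and their M-term average."""

    def __init__(self, n=50, m=50):
        if not 0 < m <= n:
            raise ValueError("require 0 < M <= N")
        self.m = m
        # deque with maxlen behaves as a cyclic buffer of N entries
        self.history = deque(maxlen=n)

    def end_interval(self, owd_samples):
        """Close one interval T: store E_T(OWD), return mean_delay."""
        if owd_samples:
            self.history.append(sum(owd_samples) / len(owd_samples))
        return self.mean_delay()

    def mean_delay(self):
        # E_M(E_T(OWD)): mean of the most recent M stored values
        # (assumption: fewer values are used during start-up)
        recent = list(self.history)[-self.m:]
        return sum(recent) / len(recent) if recent else None
```

Setting `m` smaller than `n` corresponds to the M<N case discussed above, trading stability for responsiveness.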
3.1.2. Skewness Estimate | 3.1.2. Skewness Estimate | |||
Skewness is difficult to calculate efficiently and accurately. | Skewness is difficult to calculate efficiently and accurately. | |||
Ideally it should be calculated over the entire period (M * T) from | Ideally it should be calculated over the entire period (M * T) from | |||
the mean OWD over that period. However this would require storing | the mean OWD over that period. However this would require storing | |||
every delay measurement over the period. Instead, an estimate is | every delay measurement over the period. Instead, an estimate is | |||
made over T using the previous calculation of mean_delay. | made over M * T based on a calculation every T using the previous T's | |||
Comparisons are made using the mean of M skew estimates (an | calculation of mean_delay. | |||
alternative that removes bias in the mean is given in Section 3.3.3). | ||||
The skewness is estimated using two counters, counting the number of | The skewness is estimated using two counters, counting the number of | |||
one way delay samples (OWD) above and below the mean: | one way delay samples (OWD) above and below the mean: | |||
skew_est_T = (sum_T(OWD < mean_delay) | skew_base_T = sum_T(OWD < mean_delay) - sum_T(OWD > mean_delay) | |||
- sum_T(OWD > mean_delay)) / num_T(OWD) | ||||
where | where | |||
if (OWD < mean_delay) 1 else 0 | if (OWD < mean_delay) 1 else 0 | |||
if (OWD > mean_delay) 1 else 0 | if (OWD > mean_delay) 1 else 0 | |||
skew_est_T is a number between -1 and 1 | and mean_delay does not include the mean of the current T | |||
interval. | ||||
skew_est = E_M(skew_est_T) = sum_M(skew_est_T) / M | skew_est = sum_MT(skew_base_T)/num_MT(OWD) | |||
For implementation ease, mean_delay does not include the mean of the | where skew_est is a number between -1 and 1 | |||
current T interval. | ||||
Note: Care must be taken when implementing the comparisons to ensure | Note: Care must be taken when implementing the comparisons to ensure | |||
that rounding does not bias skew_est. It is important that the mean | that rounding does not bias skew_est. It is important that the mean | |||
is calculated with a higher precision than the samples. | is calculated with a higher precision than the samples. | |||
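The two-counter estimate can be sketched as follows. This follows the right-hand (-01) formulation, skew_est = sum_MT(skew_base_T)/num_MT(OWD); the names `skew_base` and `SkewEstimator` are hypothetical, and returning None when no samples have been seen is an assumption.

```python
from collections import deque


def skew_base(owd_samples, mean_delay):
    """skew_base_T: samples below the mean minus samples above it."""
    below = sum(1 for d in owd_samples if d < mean_delay)
    above = sum(1 for d in owd_samples if d > mean_delay)
    return below - above


class SkewEstimator:
    def __init__(self, m=50):
        # one (skew_base_T, num_T(OWD)) record per interval, last M kept
        self.records = deque(maxlen=m)

    def end_interval(self, owd_samples, prev_mean_delay):
        """prev_mean_delay must exclude the current interval's mean."""
        self.records.append(
            (skew_base(owd_samples, prev_mean_delay), len(owd_samples)))
        total = sum(n for _, n in self.records)
        if total == 0:
            return None  # assumption: undefined until samples arrive
        return sum(b for b, _ in self.records) / total
```

Because the counters are integers, only the comparison against mean_delay is sensitive to rounding, which is why the mean should be held at higher precision than the samples.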
3.1.3. Variance Estimate | 3.1.3. Variability Estimate | |||
Packet Delay Variation (PDV) ([RFC5481] and [ITU-Y1540]) is used as | Packet Delay Variation (PDV) ([RFC5481] and [ITU-Y1540]) is used as | |||
an estimator of the variance of the delay signal. We define PDV as | an estimator of the variability of the delay signal. We define PDV | |||
follows: | as follows: | |||
PDV = PDV_max = max_T(OWD) - E_T(OWD) | PDV = PDV_max = max_T(OWD) - E_T(OWD) | |||
var_est = E_M(PDV) = sum_M(PDV) / M | var_est = E_M(PDV) = sum_M(PDV) / M | |||
This modifies PDV as outlined in [RFC5481] to provide a summary | This modifies PDV as outlined in [RFC5481] to provide a summary | |||
statistic version that best aids the grouping decisions of the | statistic version that best aids the grouping decisions of the | |||
algorithm (see [Hayes-LCN14] section IVB). | algorithm (see [Hayes-LCN14] section IVB). | |||
The use of PDV = PDV_min = E_T(OWD) - min_T(OWD) is currently being | Generally the maximum is sampled well during congestion, though it is | |||
investigated as an alternative that is less sensitive to noise. The | more sensitive to path and operating system noise. The use of PDV = | |||
drawback of using PDV_min is that it does not distinguish between | PDV_min = E_T(OWD) - min_T(OWD) would be less sensitive to this | |||
groups of flows with similar values of skew_est as well as PDV_max | noise, but is not well sampled during congestion at the bottleneck | |||
(see [Hayes-LCN14] section IVB). | and therefore not recommended. | |||
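A minimal sketch of the PDV_max based estimate is given below. The class name `VariabilityEstimator` is hypothetical, and skipping intervals that contain no samples is an assumption not addressed in the text.

```python
from collections import deque


class VariabilityEstimator:
    def __init__(self, m=50):
        self.pdv = deque(maxlen=m)  # last M values of PDV

    def end_interval(self, owd_samples):
        """Close one interval T and return var_est = E_M(PDV)."""
        if owd_samples:
            mean_t = sum(owd_samples) / len(owd_samples)   # E_T(OWD)
            self.pdv.append(max(owd_samples) - mean_t)     # PDV_max
        return sum(self.pdv) / len(self.pdv) if self.pdv else None
```

Only the per-interval maximum and mean are needed, so individual delay samples do not have to be retained beyond the current T.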
3.1.4. Oscillation Estimate | 3.1.4. Oscillation Estimate | |||
An estimate of the low frequency oscillation of the delay signal is | An estimate of the low frequency oscillation of the delay signal is | |||
calculated by counting and normalising the significant mean, | calculated by counting and normalising the significant mean, | |||
E_T(OWD), crossings of mean_delay: | E_T(OWD), crossings of mean_delay: | |||
freq_est = number_of_crossings / N | freq_est = number_of_crossings / N | |||
Where | where we define a significant mean crossing as a crossing that | |||
we define a significant mean crossing as a crossing that | ||||
extends p_v * var_est from mean_delay. In our experiments we | extends p_v * var_est from mean_delay. In our experiments we | |||
have found that p_v = 0.2 is a good value. | have found that p_v = 0.2 is a good value. | |||
Freq_est is a number between 0 and 1. Freq_est can be approximated | Freq_est is a number between 0 and 1. Freq_est can be approximated | |||
incrementally as follows: | incrementally as follows: | |||
With each new calculation of E_T(OWD) a decision is made as to | With each new calculation of E_T(OWD) a decision is made as to | |||
whether this value of E_T(OWD) significantly crosses the current | whether this value of E_T(OWD) significantly crosses the current | |||
long term mean, mean_delay, with respect to the previous | long term mean, mean_delay, with respect to the previous | |||
significant mean crossing. | significant mean crossing. | |||
A cyclic buffer, last_N_crossings, records a 1 if there is a | A cyclic buffer, last_N_crossings, records a 1 if there is a | |||
significant mean crossing, otherwise a 0. | significant mean crossing, otherwise a 0. | |||
The counter, number_of_crossings, is incremented when there is a | The counter, number_of_crossings, is incremented when there is a | |||
significant mean crossing and subtracted from when a non-zero | significant mean crossing and decremented when a non-zero value is | |||
value is removed from the last_N_crossings. | removed from the last_N_crossings. | |||
This approximation of freq_est was not used in [Hayes-LCN14], which | This approximation of freq_est was not used in [Hayes-LCN14], which | |||
calculated freq_est every T using the current E_N(E_T(OWD)). Our | calculated freq_est every T using the current E_N(E_T(OWD)). Our | |||
tests show that this approximation of freq_est yields results that | tests show that this approximation of freq_est yields results that | |||
are almost identical to when the full calculation is performed every | are almost identical to when the full calculation is performed every | |||
T. | T. | |||
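The incremental approximation described above might be implemented along the following lines. This is a sketch under stated assumptions: the class name `OscillationEstimator` is hypothetical, a crossing is counted when E_T(OWD) lies more than p_v * var_est on the opposite side of mean_delay from the previous significant excursion, and the first significant excursion only establishes the side without being counted.

```python
from collections import deque


class OscillationEstimator:
    def __init__(self, n=50, p_v=0.2):
        self.n = n
        self.p_v = p_v
        self.last_n_crossings = deque()   # cyclic buffer of 0/1 flags
        self.number_of_crossings = 0
        self.last_side = 0                # +1 above, -1 below, 0 unknown

    def end_interval(self, mean_t, mean_delay, var_est):
        """mean_t is the new E_T(OWD); returns freq_est."""
        margin = self.p_v * var_est
        if mean_t > mean_delay + margin:
            side = 1
        elif mean_t < mean_delay - margin:
            side = -1
        else:
            side = 0                      # not a significant excursion
        crossed = int(side != 0 and self.last_side != 0
                      and side != self.last_side)
        if side != 0:
            self.last_side = side
        # drop the oldest flag once N flags are stored
        if len(self.last_n_crossings) == self.n:
            self.number_of_crossings -= self.last_n_crossings.popleft()
        self.last_n_crossings.append(crossed)
        self.number_of_crossings += crossed
        return self.number_of_crossings / self.n
```

The counter is updated in constant time per interval, which is the point of the approximation compared with recomputing all crossings every T.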
3.1.5. Packet loss | 3.1.5. Packet loss | |||
The proportion of packets lost is used as a supplementary measure: | The proportion of packets lost is used as a supplementary measure: | |||
pkt_loss = sum_NT(lost packets) / sum_NT(total packets) | pkt_loss = sum_NT(lost packets) / sum_NT(total packets) | |||
Note: When pkt_loss is small it is very variable; however, when | Note: When pkt_loss is small it is very variable; however, when | |||
pkt_loss is high it becomes a stable measure for making grouping | pkt_loss is high it becomes a stable measure for making grouping | |||
decisions. | decisions. | |||
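The loss proportion over N*T can be kept with per-interval counters, as in this minimal sketch. The class name `LossEstimator` is hypothetical; treating "total packets" as received plus lost, and returning 0.0 before any packets are counted, are assumptions.

```python
from collections import deque


class LossEstimator:
    def __init__(self, n=50):
        # one (lost, total) record per interval T, last N kept
        self.records = deque(maxlen=n)

    def end_interval(self, lost, total):
        """total is assumed to include the lost packets."""
        self.records.append((lost, total))
        total_pkts = sum(t for _, t in self.records)
        if total_pkts == 0:
            return 0.0
        return sum(l for l, _ in self.records) / total_pkts
```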
3.2. Flow Grouping | 3.2. Flow Grouping | |||
3.2.1. Flow Grouping Algorithm | 3.2.1. Flow Grouping Algorithm | |||
The following grouping algorithm is RECOMMENDED for SBD in the RMCAT | The following grouping algorithm is RECOMMENDED for SBD in the RMCAT | |||
context and is sufficient and efficient for small to moderate numbers | context and is sufficient and efficient for small to moderate numbers | |||
of flows. For very large numbers of flows (e.g. hundreds), a more | of flows. For very large numbers of flows (e.g. hundreds), a more | |||
complex clustering algorithm may be substituted. | complex clustering algorithm may be substituted. | |||
skipping to change at page 11, line 51 | skipping to change at page 13, line 30 | |||
diff(skew_est) < p_s | diff(skew_est) < p_s | |||
otherwise | otherwise | |||
diff(pkt_loss) < (p_d * pkt_loss) | diff(pkt_loss) < (p_d * pkt_loss) | |||
The threshold, (p_d * pkt_loss), is with respect to the | The threshold, (p_d * pkt_loss), is with respect to the | |||
highest value in the difference. | highest value in the difference. | |||
This procedure involves sorting estimates from highest to lowest. It | This procedure involves sorting estimates from highest to lowest. It | |||
is simple to implement, and efficient for small numbers of flows, | is simple to implement, and efficient for small numbers of flows (up | |||
such as are expected in RTCWEB. | to 10-20). | |||
3.2.2. Using the flow group signal | 3.2.2. Using the flow group signal | |||
Grouping decisions are made every T from the second T, though they | Grouping decisions are made every T from the second T, though they | |||
will not attain their full design accuracy until after the N'th T | will not attain their full design accuracy until after the N'th T | |||
interval. | interval. | |||
Network conditions, and even the congestion controllers, can cause | Network conditions, and even the congestion controllers, can cause | |||
bottlenecks to fluctuate. A coupled congestion controller MAY decide | bottlenecks to fluctuate. A coupled congestion controller MAY decide | |||
only to couple groups that remain stable, say grouped together 90% of | only to couple groups that remain stable, say grouped together 90% of | |||
skipping to change at page 12, line 26 | skipping to change at page 14, line 5 | |||
coupled congestion controller's objectives. | coupled congestion controller's objectives. | |||
3.3. Removing Noise from the Estimates | 3.3. Removing Noise from the Estimates | |||
The following describes small changes to the calculation of the key | The following describes small changes to the calculation of the key | |||
metrics that help remove noise from them. Currently these "tweaks" | metrics that help remove noise from them. Currently these "tweaks" | |||
are described separately to keep the main description succinct. In | are described separately to keep the main description succinct. In | |||
future revisions of the draft these enhancements may replace the | future revisions of the draft these enhancements may replace the | |||
original key metric calculations. | original key metric calculations. | |||
3.3.1. Oscillation noise | 3.3.1. PDV noise | |||
When a path has no congestion, the PDV will be very small and the | Usually during congestion the max_T(OWD) is quite well sampled as the | |||
delay distribution is skewed toward the maximum. However max_T(OWD) | ||||
is subject to delay noise from other queues along the path as well as | ||||
the host operating system. Min_T(OWD) is less prone to noise along | ||||
the path and from the host operating system, but is not well sampled | ||||
during congestion (i.e. when there is a bottleneck). Flows with very | ||||
different packet send rates exacerbate the problem. | ||||
An alternative delay variation measure that is less sensitive to | ||||
extreme values and different send rates is Mean Absolute Deviation | ||||
(MAD). It can be implemented in an online manner as follows: | ||||
var_base_T = sum_T(|OWD - E_T(OWD)|) | ||||
where | ||||
|x| is the absolute value of x | ||||
E_T(OWD) is the mean OWD calculated in the previous T | ||||
var_est = MAD_MT = sum_MT(var_base_T)/num_MT(OWD) | ||||
For calculation of freq_est p_v=0.7 (MAD is a smaller number than | ||||
PDV) | ||||
For the grouping threshold p_mad=0.1 instead of p_pdv (MAD is less | ||||
noisy so the test can be tighter) | ||||
Note that the method for improving responsiveness of MAD_MT is the | ||||
same as that described in Section 3.4.1 for skew_est. | ||||
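The online MAD calculation above might be implemented along the following lines. This is a minimal Python sketch, not the draft's reference code. It assumes that the first interval only seeds the mean (there is no previous E_T(OWD) to compare against) and that an interval with no samples leaves the state unchanged. All names are illustrative.

```python
from collections import deque


class MadEstimator:
    """Online MAD_MT over the last M intervals (illustrative names)."""

    def __init__(self, m_intervals):
        self.var_base = deque(maxlen=m_intervals)  # var_base_T per interval
        self.num = deque(maxlen=m_intervals)       # num_T(OWD) per interval
        self.prev_mean = None                      # E_T(OWD) of the previous T

    def end_interval(self, owd_samples):
        if not owd_samples:
            return  # assumption: nothing received in this T, keep old state
        if self.prev_mean is not None:
            # Deviations are taken against the mean of the previous interval.
            self.var_base.append(
                sum(abs(x - self.prev_mean) for x in owd_samples))
            self.num.append(len(owd_samples))
        self.prev_mean = sum(owd_samples) / len(owd_samples)

    def var_est(self):
        total = sum(self.num)
        return sum(self.var_base) / total if total else None


est = MadEstimator(m_intervals=2)
est.end_interval([10.0, 12.0])        # seeds the mean (11.0) only
est.end_interval([11.0, 13.0, 15.0])  # deviations from 11.0: 0 + 2 + 4
est.end_interval([12.0, 14.0])        # deviations from 13.0: 1 + 1
mad = est.var_est()                   # (6 + 2) / (3 + 2)
```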
3.3.2. Oscillation noise | ||||
When a path has no congestion, var_est will be very small and the | ||||
recorded significant mean crossings will be the result of path noise. | recorded significant mean crossings will be the result of path noise. | |||
Thus up to N-1 meaningless mean crossings can be a source of error at | Thus up to N-1 meaningless mean crossings can be a source of error at | |||
the point a link becomes a bottleneck and flows traversing it begin | the point a link becomes a bottleneck and flows traversing it begin | |||
to be grouped. | to be grouped. | |||
To remove this source of noise from freq_est: | To remove this source of noise from freq_est: | |||
1. Set the current PDV to PDV = NaN (a value representing an invalid | 1. Set the current PDV to PDV = NaN (a value representing an invalid | |||
record, ie Not a Number) for flows that are deemed to not be | record, i.e. Not a Number) for flows that are deemed to not be | |||
experiencing congestion by the first skew_est based grouping test | experiencing congestion by the first skew_est based grouping test | |||
(see Section 3.2.1). | (see Section 3.2.1). | |||
2. Then var_est = sum_M(PDV != NaN) / num_VM(PDV) | 2. Then var_est = sum_M(PDV != NaN) / num_VM(PDV) | |||
3. For freq_est, only record a significant mean crossing if flow is | 3. For freq_est, only record a significant mean crossing if flow is | |||
experiencing congestion. | experiencing congestion. | |||
These three changes will remove the non-congestion noise from | These three changes will remove the non-congestion noise from | |||
freq_est. | freq_est. A similar adjustment can be made for MAD based var_est. | |||
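Steps 1 and 2 above can be illustrated with a short Python sketch. It reads num_VM(PDV) as the number of valid (non-NaN) PDV records in the last M intervals, which is an interpretation rather than something the draft states explicitly. The function names are hypothetical.

```python
import math


def mark_pdv(pdv, congested):
    """Step 1: keep the PDV only if the flow is deemed congested."""
    return pdv if congested else math.nan


def var_est_ignoring_invalid(pdv_window):
    """Step 2: average only the valid PDV records in the window."""
    valid = [p for p in pdv_window if not math.isnan(p)]
    # Assumption: with no valid record, the estimate itself is invalid.
    return sum(valid) / len(valid) if valid else math.nan


pdv_window = [
    mark_pdv(4.0, congested=True),
    mark_pdv(0.2, congested=False),  # path noise, excluded from the mean
    mark_pdv(6.0, congested=True),
]
var_est = var_est_ignoring_invalid(pdv_window)
```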
3.3.2. Clock drift | 3.3.3. Clock skew | |||
Generally sender and receiver clock drift will be too small to cause | Generally sender and receiver clock skew will be too small to cause | |||
significant errors in the estimators. Skew_est is most sensitive to | significant errors in the estimators. Skew_est is most sensitive to | |||
this type of noise. In circumstances where clock drift is high, | this type of noise. In circumstances where clock skew is high, | |||
making M < N can reduce this error. | making M < N can reduce this error. | |||
A better method is to estimate the effect the clock drift is having | A better method is to estimate the effect the clock skew is having on | |||
on the E_N(E_T(OWD)), and then adjust mean_delay accordingly. A | the summary statistics, and then adjust statistics accordingly. A | |||
simple method of doing this follows: | simple online method of doing this based on min_T(OWD) will be | |||
described here in a subsequent version of the draft. | ||||
First divide the N E_T(OWD) values into two halves (N/2 in each) | ||||
-- old and new. | ||||
Calculate a mean of the old half: | ||||
Older_mean = E_old(E_T(OWD)) / N/2 | ||||
Calculate a mean of the new (most recent) half: | ||||
Newer_mean = E_new(E_T(OWD)) / N/2 | ||||
A linear estimate of the Clock Drift per T estimates is: | ||||
CD_T = (Newer_mean - Older_mean)/N/2 | ||||
An adjusted mean estimate then is: | ||||
mean_delay = CD_Adj(E_M(E_T(OWD))) = E_M(E_T(OWD)) + CD_T * | ||||
(M/2 + 0.5) | ||||
CD_Adj can be thought of as a prediction of what the long term mean | ||||
will be in the current measurement period T. It is used as the basis | ||||
for skew_est and freq_est. | ||||
3.3.3. Bias in the skewness measure | ||||
If successive calculations of skew_est are made with very different | ||||
numbers of samples (num_T(OWD)), the simple calculation of | ||||
E_M(skew_est) used for grouping decisions will be biased by the | ||||
intervals that have few samples. This bias can be corrected | ||||
if necessary as follows. | ||||
skew_base_T = sum_T(OWD < mean_delay) - sum_T(OWD > mean_delay) | ||||
skew_est = sum_MT(skew_base_T)/num_MT(OWD) | ||||
This calculation requires slightly more state, since an | ||||
implementation will need to maintain two cyclic buffers storing | ||||
skew_base_T and num_T(OWD) respectively to manage the rolling | ||||
summations (note only one cyclic buffer is needed for the calculation | ||||
of skew_est outlined previously). | ||||
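The two cyclic buffers mentioned above could be kept as in the following Python sketch. It assumes mean_delay is supplied by the caller for each interval. The class and method names are illustrative only.

```python
from collections import deque


class SkewEstimator:
    """skew_est weighted by sample count over the last M intervals."""

    def __init__(self, m_intervals):
        self.skew_base = deque(maxlen=m_intervals)  # skew_base_T per interval
        self.num = deque(maxlen=m_intervals)        # num_T(OWD) per interval

    def end_interval(self, owd_samples, mean_delay):
        below = sum(1 for x in owd_samples if x < mean_delay)
        above = sum(1 for x in owd_samples if x > mean_delay)
        self.skew_base.append(below - above)
        self.num.append(len(owd_samples))

    def skew_est(self):
        total = sum(self.num)
        return sum(self.skew_base) / total if total else None


skew = SkewEstimator(m_intervals=2)
skew.end_interval([1.0, 2.0, 3.0, 10.0], mean_delay=4.0)  # 3 below, 1 above
skew.end_interval([5.0, 3.0], mean_delay=4.0)             # 1 below, 1 above
result = skew.skew_est()                                  # (2 + 0) / (4 + 2)
```

Dividing by the total sample count, rather than averaging per-interval ratios, is what stops sparsely sampled intervals from dominating the estimate.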
3.4. Reducing lag and Improving Responsiveness | 3.4. Reducing lag and Improving Responsiveness | |||
Measurement based shared bottleneck detection makes decisions in the | Measurement based shared bottleneck detection makes decisions in the | |||
present based on what has been measured in the past. This means that | present based on what has been measured in the past. This means that | |||
there is always a lag in responding to changing conditions. This | there is always a lag in responding to changing conditions. This | |||
mechanism is based on summary statistics taken over (N*T) seconds. | mechanism is based on summary statistics taken over (N*T) seconds. | |||
This mechanism can be made more responsive to changing conditions by: | This mechanism can be made more responsive to changing conditions by: | |||
1. Reducing N and/or M -- but at the expense of less accurate | 1. Reducing N and/or M -- but at the expense of having less accurate | |||
metrics, and/or | metrics, and/or | |||
2. Exploiting the fact that more recent measurements are more | 2. Exploiting the fact that more recent measurements are more | |||
valuable than older measurements and weighting them accordingly. | valuable than older measurements and weighting them accordingly. | |||
Although more recent measurements are more valuable, older | Although more recent measurements are more valuable, older | |||
measurements are still needed to gain an accurate estimate of the | measurements are still needed to gain an accurate estimate of the | |||
distribution descriptor we are measuring. Unfortunately, the simple | distribution descriptor we are measuring. Unfortunately, the simple | |||
exponentially weighted moving average weights drop off too quickly | exponentially weighted moving average weights drop off too quickly | |||
for our requirements and have an infinite tail. A simple linearly | for our requirements and have an infinite tail. A simple linearly | |||
skipping to change at page 15, line 8 | skipping to change at page 16, line 8 | |||
to the most recent measurements. We propose a piecewise linear | to the most recent measurements. We propose a piecewise linear | |||
distribution of weights, such that the first section (samples 1:F) is | distribution of weights, such that the first section (samples 1:F) is | |||
flat as in a simple moving average, and the second section (samples | flat as in a simple moving average, and the second section (samples | |||
F+1:M) is linearly declining weights to the end of the averaging | F+1:M) is linearly declining weights to the end of the averaging | |||
window. We choose integer weights, which allows incremental | window. We choose integer weights, which allows incremental | |||
calculation without introducing rounding errors. | calculation without introducing rounding errors. | |||
3.4.1. Improving the response of the skewness estimate | 3.4.1. Improving the response of the skewness estimate | |||
The weighted moving average for skew_est, based on skew_est in | The weighted moving average for skew_est, based on skew_est in | |||
Section 3.3.3, can be calculated as follows: | Section 3.1.2, can be calculated as follows: | |||
skew_est = ((M-F+1)*sum(skew_base_T(1:F)) | skew_est = ((M-F+1)*sum(skew_base_T(1:F)) | |||
+ sum([(M-F):1].*skew_base_T(F+1:M))) | + sum([(M-F):1].*skew_base_T(F+1:M))) | |||
/ ((M-F+1)*sum(numsampT(1:F)) | / ((M-F+1)*sum(numsampT(1:F)) | |||
+ sum([(M-F):1].*numsampT(F+1:M))) | + sum([(M-F):1].*numsampT(F+1:M))) | |||
where numsampT is an array of the number of OWD samples in each T (ie | where numsampT is an array of the number of OWD samples in each T | |||
num_T(OWD)), and numsampT(1) is the most recent; skew_base_T(1) is | (i.e. num_T(OWD)), and numsampT(1) is the most recent; skew_base_T(1) | |||
the most recent calculation of skew_base_T; 1:F refers to the integer | is the most recent calculation of skew_base_T; 1:F refers to the | |||
values 1 through to F, and [(M-F):1] refers to an array of the | integer values 1 through to F, and [(M-F):1] refers to an array of | |||
integer values (M-F) declining through to 1; and ".*" is the array | the integer values (M-F) declining through to 1; and ".*" is the | |||
scalar dot product operator. | array scalar dot product operator. | |||
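As an illustration of the weighted skew_est expression above, the following Python sketch builds the piecewise linear integer weights and applies them to both numerator and denominator. It assumes index 0 holds the most recent interval, that both lists have length M, and that 1 <= F <= M. The function names are hypothetical.

```python
def piecewise_weights(m, f):
    """Flat weight (M-F+1) for samples 1..F, then M-F declining to 1."""
    return [m - f + 1] * f + list(range(m - f, 0, -1))


def weighted_skew_est(skew_base_t, numsamp_t, f):
    """Weighted skew_est; lists are ordered most recent first."""
    weights = piecewise_weights(len(skew_base_t), f)
    num = sum(w * s for w, s in zip(weights, skew_base_t))
    den = sum(w * n for w, n in zip(weights, numsamp_t))
    return num / den if den else None


# M = 5, F = 2 gives the weights [4, 4, 3, 2, 1].
skew_base_t = [2, 0, -1, 3, 1]
numsamp_t = [4, 2, 5, 6, 3]
skew_est = weighted_skew_est(skew_base_t, numsamp_t, f=2)  # 12 / 54
```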
3.4.2. Improving the response of the variance estimate | 3.4.2. Improving the response of the variability estimate | |||
The weighted moving average for var_est can be calculated as follows: | The weighted moving average for var_est can be calculated as follows: | |||
var_est = ((M-F+1)*sum(PDV(1:F)) + sum([(M-F):1].*PDV(F+1:M))) | var_est = ((M-F+1)*sum(PDV(1:F)) + sum([(M-F):1].*PDV(F+1:M))) | |||
/ (F*(M-F+1) + sum([(M-F):1])) | / (F*(M-F+1) + sum([(M-F):1])) | |||
where 1:F refers to the integer values 1 through to F, and [(M-F):1] | where 1:F refers to the integer values 1 through to F, and [(M-F):1] | |||
refers to an array of the integer values (M-F) declining through to | refers to an array of the integer values (M-F) declining through to | |||
1; and ".*" is the array scalar dot product operator. When removing | 1; and ".*" is the array scalar dot product operator. When removing | |||
oscillation noise (see Section 3.3.1) this calculation must be | oscillation noise (see Section 3.3.2) this calculation must be | |||
adjusted to allow for invalid PDV records. | adjusted to allow for invalid PDV records. | |||
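A minimal Python sketch of the weighted var_est expression above follows. It assumes index 0 holds the most recent PDV and that 1 <= F <= M. It does not handle invalid (NaN) PDV records, so the adjustment mentioned in the text is left out. The function name is illustrative.

```python
def weighted_var_est(pdv, f):
    """Piecewise linear weighted mean of PDV; pdv[0] is the most recent."""
    m = len(pdv)
    flat = m - f + 1                     # weight for the F newest samples
    tail = list(range(m - f, 0, -1))     # (M-F) declining to 1
    num = flat * sum(pdv[:f]) + sum(w * p for w, p in zip(tail, pdv[f:]))
    den = f * flat + sum(tail)
    return num / den


# M = 5, F = 2: numerator 4*(2+4) + (3*6 + 2*8 + 1*10) = 68, denominator 14.
var_est = weighted_var_est([2.0, 4.0, 6.0, 8.0, 10.0], f=2)
```

Because the weights are integers, the numerator and denominator can also be updated incrementally each T without accumulating rounding error in the weights themselves.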
4. Measuring OWD | 4. Measuring OWD | |||
This section discusses the OWD measurements required for this | This section discusses the OWD measurements required for this | |||
algorithm to detect shared bottlenecks. | algorithm to detect shared bottlenecks. | |||
The SBD mechanism described in this draft relies on differences | The SBD mechanism described in this draft relies on differences | |||
between OWD measurements to avoid the practical problems with | between OWD measurements to avoid the practical problems with | |||
measuring absolute OWD (see [Hayes-LCN14] section IIIC). Since all | measuring absolute OWD (see [Hayes-LCN14] section IIIC). Since all | |||
skipping to change at page 17, line 9 | skipping to change at page 18, line 9 | |||
Non-authenticated RTCP packets carrying shared bottleneck indications | Non-authenticated RTCP packets carrying shared bottleneck indications | |||
and summary statistics could allow attackers to alter the bottleneck | and summary statistics could allow attackers to alter the bottleneck | |||
sharing characteristics for private gain or disruption of other | sharing characteristics for private gain or disruption of other | |||
parties' communication. | parties' communication. | |||
8. Change history | 8. Change history | |||
Changes made to this document: | Changes made to this document: | |||
02->WG-00 : Fixed missing 0.5 in 3.3.2 and missing brace in 3.3.3 | WG-00->WG-01 : Moved unbiased skew section to replace skew | |||
estimate, more robust variability estimator, the | ||||
term variance replaced with variability, clock | ||||
drift term corrected to clock skew, revision to | ||||
clock skew section with a place holder, description | ||||
of parameters. | ||||
01->02 : New section describing improvements to the key metric | 02->WG-00 : Fixed missing 0.5 in 3.3.2 and missing brace in | |||
calculations that help to remove noise, bias, and | 3.3.3 | |||
reduce lag. Some revisions to the notation to make | ||||
it clearer. Some tightening of the thresholds. | ||||
00->01 : Revisions to terminology for clarity | 01->02 : New section describing improvements to the key | |||
metric calculations that help to remove noise, | ||||
bias, and reduce lag. Some revisions to the | ||||
notation to make it clearer. Some tightening of | ||||
the thresholds. | ||||
00->01 : Revisions to terminology for clarity | ||||
9. References | 9. References | |||
9.1. Normative References | 9.1. Normative References | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, March 1997. | Requirement Levels", BCP 14, RFC 2119, March 1997. | |||
9.2. Informative References | 9.2. Informative References | |||
End of changes. 61 change blocks. | ||||
175 lines changed or deleted | 222 lines changed or added | |||
This html diff was produced by rfcdiff 1.42. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ |