draft-ietf-tcpm-tcp-dcr-06.txt | draft-ietf-tcpm-tcp-dcr-07.txt | |||
---|---|---|---|---|
Internet Engineering Task Force Sumitha Bhandarkar | Internet Engineering Task Force Sumitha Bhandarkar | |||
INTERNET DRAFT A. L. Narasimha Reddy | INTERNET DRAFT A. L. Narasimha Reddy | |||
draft-ietf-tcpm-tcp-dcr-06.txt Texas A&M University | draft-ietf-tcpm-tcp-dcr-07.txt Texas A&M University | |||
Expires: May 2006 Mark Allman | Expires: July 2006 Mark Allman | |||
ICIR/ICSI | ICIR/ICSI | |||
Ethan Blanton | Ethan Blanton | |||
Purdue University | Purdue University | |||
November 2005 | January 2006 | |||
Improving the Robustness of TCP to Non-Congestion Events | Improving the Robustness of TCP to Non-Congestion Events | |||
Status of this Memo | Status of this Memo | |||
By submitting this Internet-Draft, each author represents that any | By submitting this Internet-Draft, each author represents that any | |||
applicable patent or other IPR claims of which he or she is aware | applicable patent or other IPR claims of which he or she is aware | |||
have been or will be disclosed, and any of which he or she becomes | have been or will be disclosed, and any of which he or she becomes | |||
aware will be disclosed, in accordance with Section 6 of BCP 79. | aware will be disclosed, in accordance with Section 6 of BCP 79. | |||
skipping to change at page 1, line 39 | skipping to change at page 1, line 39 | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
Copyright Notice | Copyright Notice | |||
Copyright (C) The Internet Society (2005). | Copyright (C) The Internet Society (2006). | |||
Abstract: | Abstract | |||
This document specifies Non-Congestion Robustness (NCR) for TCP. In | This document specifies Non-Congestion Robustness (NCR) for TCP. In | |||
the absence of explicit congestion notification from the network, | the absence of explicit congestion notification from the network TCP | |||
TCP's loss recovery algorithms treat the receipt of three duplicate | uses loss as an indication of congestion. One of the ways TCP | |||
acknowledgments as an implicit indication of congestion in the | detects loss is using the arrival of three duplicate acknowledgments. | |||
network. This is not always correct, notably in the case when | However, this heuristic is not always correct, notably in the case | |||
network paths reorder segments (for whatever reason), resulting in | when network paths reorder segments (for whatever reason), resulting | |||
degraded performance. TCP-NCR is designed to mitigate this degraded | in degraded performance. TCP-NCR is designed to mitigate this | |||
performance by increasing the number of duplicate acknowledgments | degraded performance by increasing the number of duplicate | |||
required to trigger loss recovery, based on the current state of the | acknowledgments required to trigger loss recovery, based on the | |||
connection, in an effort to better disambiguate true segment loss | current state of the connection, in an effort to better disambiguate | |||
from segment reordering. This document specifies the changes to TCP, | true segment loss from segment reordering. This document specifies | |||
as well as the costs and benefits of these modifications. | the changes to TCP, as well as the costs and benefits of these | |||
| | modifications. | |||
| | Table of Contents | |||
| | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . 2 | |||
| | 2. NCR Description . . . . . . . . . . . . . . . . . . . . 5 | |||
| | 3. Algorithm . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
| | 3.1 Initialization . . . . . . . . . . . . . . . . . . . 8 | |||
| | 3.2 Terminating Extended Limited Transmit and | |||
| | Preventing Bursts . . . . . . . . . . . . . . . . . . 9 | |||
| | 3.3 Extended Limited Transmit . . . . . . . . . . . . . . 10 | |||
| | 3.4 Entering Loss Recovery . . . . . . . . . . . . . . . 11 | |||
| | 4. Advantages . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
| | 5. Disadvantages . . . . . . . . . . . . . . . . . . . . . 12 | |||
| | 6. Related Work . . . . . . . . . . . . . . . . . . . . . . 13 | |||
| | 7. Security Considerations . . . . . . . . . . . . . . . . 13 | |||
| | 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . 14 | |||
| | 9. IANA Considerations . . . . . . . . . . . . . . . . . . 14 | |||
| | 10. Normative References . . . . . . . . . . . . . . . . . . 14 | |||
| | 11. Informative References . . . . . . . . . . . . . . . . . 14 | |||
| | 12. Author's Addresses . . . . . . . . . . . . . . . . . . . 16 | |||
Terminology | Terminology | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL | |||
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and | NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and | |||
"OPTIONAL" in this document are to be interpreted as described | "OPTIONAL" in this document are to be interpreted as described | |||
in [RFC2119]. | in [RFC2119]. | |||
Readers should be familiar with the TCP terminology (e.g., | Readers should be familiar with the TCP terminology (e.g., | |||
FlightSize, Pipe, etc.) given in [RFC2581] and [RFC3517]. | FlightSize, Pipe, etc.) given in [RFC2581] and [RFC3517]. | |||
skipping to change at page 2, line 36 | skipping to change at page 3, line 8 | |||
of congestion (i.e., assuming queue overflow). TCP receivers send | of congestion (i.e., assuming queue overflow). TCP receivers send | |||
cumulative acknowledgments (ACKs) indicating the next sequence number | cumulative acknowledgments (ACKs) indicating the next sequence number | |||
expected from the sender for arriving segments [RFC793]. When | expected from the sender for arriving segments [RFC793]. When | |||
segments arrive out-of-order, duplicate ACKs are generated. As | segments arrive out-of-order, duplicate ACKs are generated. As | |||
specified in [RFC2581], a TCP sender uses the arrival of three | specified in [RFC2581], a TCP sender uses the arrival of three | |||
duplicate ACKs as an indication of segment loss. The TCP sender | duplicate ACKs as an indication of segment loss. The TCP sender | |||
retransmits the lost segment and reduces the load imposed on the | retransmits the lost segment and reduces the load imposed on the | |||
network, assuming the segment loss was caused by resource contention | network, assuming the segment loss was caused by resource contention | |||
within the network path. The TCP sender does not assume loss on the | within the network path. The TCP sender does not assume loss on the | |||
first or second duplicate ACK, but waits for three duplicate ACKs to | first or second duplicate ACK, but waits for three duplicate ACKs to | |||
account for mild reordering. However, the use of this constant | account for minor packet reordering. However, the use of this | |||
threshold of duplicate ACKs has several problems that can be | constant threshold of duplicate ACKs has several problems that can be | |||
mitigated with a dynamic threshold. | mitigated with a dynamic threshold. | |||
The following is an example of TCP's behavior: | The following is an example of TCP's behavior: | |||
+ TCP A is the data sender and TCP B is the data receiver. | + TCP A is the data sender and TCP B is the data receiver. | |||
+ TCP A sends 10 segments each consisting of a single data byte | + TCP A sends 10 segments each consisting of a single data byte | |||
(i.e., transmits bytes 1-10 in segments 1-10). | (i.e., transmits bytes 1-10 in segments 1-10). | |||
+ Assume segment 3 is dropped in the network. | + Assume segment 3 is dropped in the network. | |||
skipping to change at page 3, line 19 | skipping to change at page 3, line 39 | |||
[RFC2581] recommends that delayed ACKs not be used when the ACK | [RFC2581] recommends that delayed ACKs not be used when the ACK | |||
is triggered by an out-of-order segment.) | is triggered by an out-of-order segment.) | |||
+ When TCP A receives the third duplicate ACK (or fourth ACK | + When TCP A receives the third duplicate ACK (or fourth ACK | |||
overall) for sequence number 3, TCP A will retransmit | overall) for sequence number 3, TCP A will retransmit | |||
segment 3 and reduce the sending rate by roughly half (see | segment 3 and reduce the sending rate by roughly half (see | |||
[RFC2581] for specifics on the congestion control state | [RFC2581] for specifics on the congestion control state | |||
adjustments). | adjustments). | |||
Alternatively, suppose segment 3 was not dropped by the network, but | Alternatively, suppose segment 3 was not dropped by the network, but | |||
rather delayed such that segment 3 arrives after segment 10. The | rather delayed such that segment 3 arrives at TCP B after segment 10. | |||
above scenario will play out in precisely the same manner insomuch as | The above scenario will play out in precisely the same manner | |||
a retransmission of segment 3 will be triggered. In other words, TCP | insomuch as a retransmission of segment 3 will be triggered. In | |||
is not capable of disambiguating this reordering event from a segment | other words, TCP is not capable of disambiguating this reordering | |||
loss. | event from a segment loss, resulting in an unnecessary retransmission | |||
| | and rate reduction. | |||
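The loss and reordering cases described above can be reproduced with a toy model. The following Python sketch is illustrative only and is not part of either draft revision: `cumulative_acks` and `first_fast_retransmit` are hypothetical helper names, segments are numbered as in the example, and the receiver is assumed to ACK every arriving segment immediately.

```python
def cumulative_acks(arrivals):
    """Return the cumulative ACK (next expected segment) sent for each arrival."""
    received = set()
    next_expected = 1
    acks = []
    for seg in arrivals:
        received.add(seg)
        # Advance past every segment that has now arrived in order.
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected)
    return acks


def first_fast_retransmit(acks, dupthresh=3):
    """Return the segment retransmitted on the dupthresh-th duplicate ACK, or None."""
    last_ack = None
    dups = 0
    for ack in acks:
        if ack == last_ack:
            dups += 1
            if dups == dupthresh:
                return ack
        else:
            last_ack = ack
            dups = 0
    return None


lost = [1, 2, 4, 5, 6, 7, 8, 9, 10]           # segment 3 dropped
reordered = [1, 2, 4, 5, 6, 7, 8, 9, 10, 3]   # segment 3 arrives after segment 10

loss_acks = cumulative_acks(lost)
reorder_acks = cumulative_acks(reordered)
```

In this model both ACK streams cause `first_fast_retransmit` to return segment 3 with the fixed threshold of three. The streams differ only in the final ACK of the reordered case, which arrives after the retransmission has already been triggered.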
The following is the specific motivation behind making TCP robust to | The following is the specific motivation behind making TCP robust to | |||
reordered segments: | reordered segments: | |||
* A number of Internet measurement studies have shown that packet | * A number of Internet measurement studies have shown that packet | |||
reordering is not a rare phenomenon [Pax97,BPS99,JIDKT03,GPL04]. | reordering is not a rare phenomenon [Pax97,BPS99,JIDKT03,GPL04]. | |||
Further, the reordering can be well beyond that required for | Further, the reordering can be well beyond that required for | |||
fast retransmit to be falsely triggered. | fast retransmit to be falsely triggered. | |||
* [BA02,ZKFP03] show the negative performance implications that | * [BA02,ZKFP03] show the negative performance implications that | |||
skipping to change at page 3, line 45 | skipping to change at page 4, line 18 | |||
* The requirement imposed by TCP for almost in-order packet | * The requirement imposed by TCP for almost in-order packet | |||
delivery places a constraint on the design of future technology. | delivery places a constraint on the design of future technology. | |||
Novel routing algorithms, network components, link-layer | Novel routing algorithms, network components, link-layer | |||
retransmission mechanisms and applications could all be looked | retransmission mechanisms and applications could all be looked | |||
at with a fresh perspective if TCP were to be more robust to | at with a fresh perspective if TCP were to be more robust to | |||
segment reordering. For instance, high speed packet switches | segment reordering. For instance, high speed packet switches | |||
could cause resequencing of packets if TCP were more robust. | could cause resequencing of packets if TCP were more robust. | |||
There has been work proposed in the literature explicitly to | There has been work proposed in the literature explicitly to | |||
ensure that packet ordering is maintained in such switches | ensure that packet ordering is maintained in such switches | |||
[KM02]. Also, link-layer mechanisms that attempt to recover | (e.g., [KM02]). Also, link-layer mechanisms that attempt to | |||
from packet corruption by retransmitting could be allowed to | recover from packet corruption by retransmitting could be | |||
reorder packets and, hence, increase the chances of local loss | allowed to reorder packets and, hence, increase the chances of | |||
repair rather than relying on TCP to repair the loss (and, | local loss repair rather than relying on TCP to repair the loss | |||
needlessly reduce its sending rate). Additional examples | (and, needlessly reduce its sending rate). Additional examples | |||
include multi-path routing, high-delay satellite links and some | include multi-path routing, high-delay satellite links and some | |||
of the schemes proposed for differentiated services | of the schemes proposed for a differentiated services | |||
architecture. By making TCP more robust to non-congestion | architecture. By making TCP more robust to non-congestion | |||
events, TCP-NCR may open the design space of the future Internet | events, TCP-NCR may open the design space of the future Internet | |||
components. | components. | |||
In this document we specify a set of TCP sender modifications to | In this document we specify a set of TCP sender modifications to | |||
provide Non-Congestion Robustness (NCR) to TCP. In particular, these | provide Non-Congestion Robustness (NCR) to TCP. In particular, these | |||
changes are built on top of TCP with selective acknowledgments | changes are built on top of TCP with selective acknowledgments | |||
(SACKs) [RFC2018] and the SACK-based loss recovery scheme given in | (SACKs) [RFC2018] and the SACK-based loss recovery scheme given in | |||
[RFC3517], since SACK is widely deployed at this point ([MAF05] | [RFC3517], since SACK is widely deployed at this point ([MAF05] | |||
indicates that 68% of web servers and 88% of web clients utilize SACK | indicates that 68% of web servers and 88% of web clients utilize SACK | |||
as of spring, 2004). | as of spring, 2004). | |||
Finally, we note that the TCP-NCR algorithm provided in this document | We note that the TCP-NCR algorithm provided in this document could be | |||
could be easily adapted to SCTP [RFC2960] since SCTP uses congestion | easily adapted to SCTP [RFC2960] since SCTP uses congestion control | |||
control algorithms similar to TCP's (and, hence, has the same | algorithms similar to TCP's (and, hence, has the same reordering | |||
reordering robustness issues). | robustness issues). | |||
| | As we note in several places in the remainder of this document, we | |||
| | consider TCP-NCR to be experimental in that more experience with the | |||
| | techniques is required before TCP-NCR should be used on a large scale | |||
| | on the Internet. We encourage implementation and experimentation | |||
| | with TCP-NCR in the hopes of gaining an understanding of its | |||
| | suitability for wide-scale deployment. | |||
The remainder of this document is organized as follows. Section 2 | The remainder of this document is organized as follows. Section 2 | |||
provides a high-level description of the TCP-NCR mechanisms. In | provides a high-level description of the TCP-NCR mechanisms. In | |||
Section 3, we specify the TCP-NCR algorithm. Section 4 provides a | Section 3, we specify the TCP-NCR algorithm. Section 4 provides a | |||
brief overview of the benefits of TCP-NCR, while Section 5 discusses | brief overview of the benefits of TCP-NCR, while Section 5 discusses | |||
the drawbacks of TCP-NCR. Section 6 discusses related work. Section | the drawbacks of TCP-NCR. Section 6 discusses related work. Section | |||
7 discusses security concerns. | 7 discusses security concerns. | |||
2. NCR Description | 2. NCR Description | |||
As discussed above, in the face of packet reordering three duplicate | As discussed above, in the face of packet reordering, three duplicate | |||
ACKs may not be enough to disambiguate loss from reordering. In this | ACKs may not be enough to disambiguate loss from reordering. In this | |||
section we provide a non-normative sketch of TCP-NCR. The detailed | section we provide a non-normative sketch of TCP-NCR. The detailed | |||
algorithms for implementing Non-Congestion Robustness for TCP are | algorithms for implementing Non-Congestion Robustness for TCP are | |||
presented in the next section. | presented in the next section. | |||
The general idea behind TCP-NCR is to increase the threshold used to | The general idea behind TCP-NCR is to increase the threshold used to | |||
trigger a fast retransmission from the current fixed value of three | trigger a fast retransmission from the current fixed value of three | |||
duplicate ACKs [RFC2581] to approximately a congestion window of data | duplicate ACKs [RFC2581] to approximately a congestion window of data | |||
having left the network (but, not less than the currently | having left the network (but, not less than the currently | |||
standardized value of three duplicate ACKs). Since cwnd represents | standardized value of three duplicate ACKs). Since cwnd represents | |||
the amount of data a TCP flow can transmit in one round-trip time | the amount of data a TCP flow can transmit in one round-trip time | |||
(RTT), waiting to receive notice that cwnd bytes have left the | (RTT), waiting to receive notice that cwnd bytes have left the | |||
network before deciding whether the root cause is loss or reordering | network before deciding whether the root cause is loss or reordering | |||
imposes a delay of roughly one RTT. The appropriate choice for a new | imposes a delay of roughly one RTT on both the retransmission and the | |||
value of the threshold is essentially a tradeoff between making the | congestion control response. The appropriate choice for a new value | |||
best decision regarding the cause of the duplicate ACKs and | of the threshold is essentially a tradeoff between making the best | |||
| | decision regarding the cause of the duplicate ACKs and | |||
responsiveness. The choice to trigger a retransmission only after a | responsiveness. The choice to trigger a retransmission only after a | |||
cwnd's worth of data is known to have left the network represents | cwnd's worth of data is known to have left the network represents | |||
roughly the largest amount of time a TCP can wait before the (often | roughly the largest amount of time a TCP can wait before the (often | |||
costly) retransmission timeout may be triggered. Therefore, the | costly) retransmission timeout may be triggered. Therefore, the | |||
algorithm described in this document attempts to make the best root | algorithm described in this document attempts to make the best | |||
cause decision possible. | decision possible at the expense of timeliness. | |||
Simply increasing the threshold before retransmitting a segment can | Simply increasing the threshold before retransmitting a segment can | |||
make TCP brittle to packet loss or ACK loss since such loss reduces | make TCP brittle to packet loss or ACK loss since such loss reduces | |||
the number of duplicate ACKs that will arrive at the sender from the | the number of duplicate ACKs that will arrive at the sender from the | |||
receiver. For instance, if the cwnd is 10 segments and one segment | receiver. For instance, if the cwnd is 10 segments and one segment | |||
is lost, a duplicate ACK threshold of 10 will never be met because | is lost, a duplicate ACK threshold of 10 will never be met because | |||
duplicate ACKs corresponding to at most 9 segments will arrive at the | duplicate ACKs corresponding to at most 9 segments will arrive at the | |||
sender. To offset the issue of loss, we extend TCP's Limited | sender. To offset the issue of loss, we extend TCP's Limited | |||
Transmit [RFC3042] scheme to allow for the sending of new data during | Transmit [RFC3042] scheme to allow for the sending of new data during | |||
the period when the TCP sender is disambiguating loss and reordering. | the period when the TCP sender is disambiguating loss and reordering. | |||
skipping to change at page 7, line 4 | skipping to change at page 7, line 33 | |||
A constant MUST be set depending on which variant of extended Limited | A constant MUST be set depending on which variant of extended Limited | |||
Transmit is used, as follows: | Transmit is used, as follows: | |||
Careful Limited Transmit: | Careful Limited Transmit: | |||
LT_F = 2/3 | LT_F = 2/3 | |||
Aggressive Limited Transmit: | Aggressive Limited Transmit: | |||
LT_F = 1/2 | LT_F = 1/2 | |||
This constant reflects the fraction of outstanding data that must be | ||||
SACKed before a retransmission is triggered. Since Aggressive | This constant reflects the fraction of outstanding data (including | |||
Limited Transmit sends a new segment for every segment known to have | data sent during Extended Limited Transmit) that must be SACKed | |||
left the network, a total of roughly cwnd segments will be sent | before a retransmission is triggered. Since Aggressive Limited | |||
during Aggressive Limited Transmit and therefore ideally a total of | Transmit sends a new segment for every segment known to have left the | |||
2*cwnd segments will be outstanding. The duplicate ACK threshold is | network, a total of roughly cwnd segments will be sent during | |||
then set to LT_F = 1/2 of 2*cwnd (or about 1 RTT worth of data). The | Aggressive Limited Transmit and therefore ideally a total of roughly | |||
factor is different for Careful Limited Transmit because the sender | 2*cwnd segments will be outstanding when a retransmission is | |||
only transmits one new segment for every two segments that are SACKed | triggered. The duplicate ACK threshold is then set to LT_F = 1/2 of | |||
and therefore will ideally have a total of 1.5*cwnd segments | 2*cwnd (or about 1 RTT worth of data). The factor is different for | |||
outstanding when the retransmission is to be triggered. Hence, the | Careful Limited Transmit because the sender only transmits one new | |||
required threshold is LT_F=2/3 of 1.5*cwnd to delay the | segment for every two segments that are SACKed and therefore will | |||
retransmission by roughly 1 RTT. | ideally have a total of 1.5*cwnd segments outstanding when the | |||
| | retransmission is to be triggered. Hence, the required threshold is | |||
| | LT_F=2/3 of 1.5*cwnd to delay the retransmission by roughly 1 RTT. | |||
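The arithmetic behind the two LT_F values can be checked directly. This is a minimal sketch, assuming an idealized window counted in whole segments with no losses; the names `SEND_RATIO` and `retransmit_threshold` and the example window of 20 segments are illustrative and do not appear in the draft.

```python
from fractions import Fraction

# LT_F values as given in the draft.
LT_F = {"careful": Fraction(2, 3), "aggressive": Fraction(1, 2)}

# New segments sent per SACKed segment during Extended Limited Transmit
# (assumption derived from the prose: one per two for Careful, one per one
# for Aggressive).
SEND_RATIO = {"careful": Fraction(1, 2), "aggressive": Fraction(1, 1)}


def retransmit_threshold(variant, cwnd_segments):
    """Return (ideal outstanding segments, duplicate ACK threshold)."""
    outstanding = cwnd_segments * (1 + SEND_RATIO[variant])
    return outstanding, LT_F[variant] * outstanding


careful = retransmit_threshold("careful", 20)
aggressive = retransmit_threshold("aggressive", 20)
```

For a 20-segment window the Careful variant ideally has 30 segments outstanding and the Aggressive variant 40, yet both thresholds come out to 20 segments, i.e. one cwnd (roughly one RTT) of data.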
There are situations whereby the sender cannot transmit new data | There are situations whereby the sender cannot transmit new data | |||
during Extended Limited Transmit (e.g., lack of data from the | during Extended Limited Transmit (e.g., lack of data from the | |||
application, receiver's advertised window limit). These situations | application, receiver's advertised window limit). These situations | |||
can lead to the problems discussed in the last section when a TCP | can lead to the problems discussed in the last section when a TCP | |||
does not employ Extended Limited Transmit and is starved for ACKs. | does not employ Extended Limited Transmit and is starved for ACKs. | |||
Therefore, TCP-NCR adapts the duplicate ACK threshold on each SACK | Therefore, TCP-NCR adapts the duplicate ACK threshold on each SACK | |||
arrival to be as robust as possible given the actual amount of data | arrival to be as robust as possible given the actual amount of data | |||
that has been transmitted, or roughly LT_F times the number of | that has been transmitted, or roughly LT_F times the number of | |||
outstanding segments. | outstanding segments. | |||
The TCP-NCR modifications specified in this document lend themselves | The TCP-NCR modifications specified in this document lend themselves | |||
to incremental deployment. Only the TCP implementation on the sender | to incremental deployment. Only the TCP implementation on the sender | |||
side requires modification. The changes themselves are modest. | side requires modification (assuming both hosts support SACK). The | |||
However, as will be discussed below, availability of additional | changes themselves are modest. However, as will be discussed below, | |||
buffer space at the receiver will help maximize the benefits of using | availability of additional buffer space at the receiver will help | |||
TCP-NCR but is not strictly necessary. | maximize the benefits of using TCP-NCR but is not strictly | |||
| | necessary. | |||
The following algorithms depend on the notions provided by [RFC3517] | The following algorithms depend on the notions provided by [RFC3517] | |||
and we assume the reader is familiar with the terminology given in | and we assume the reader is familiar with the terminology given in | |||
[RFC3517]. The TCP-NCR algorithm can be adapted to alternate SACK- | [RFC3517]. The TCP-NCR algorithm can be adapted to alternate SACK- | |||
based loss recovery schemes. [BR04,BSRV04] outline non-SACK-based | based loss recovery schemes. [BR04,BSRV04] outline non-SACK-based | |||
algorithms; however, we do not specify those algorithms in this | algorithms; however, we do not specify those algorithms in this | |||
document and do not recommend them due to both the complexity and | document and do not recommend them due to both the complexity and | |||
security implications of having only a gross understanding of the | security implications of having only a gross understanding of the | |||
number of outstanding segments in the network. | number of outstanding segments in the network. | |||
skipping to change at page 8, line 18 | skipping to change at page 8, line 49 | |||
3.1. Initialization | 3.1. Initialization | |||
When entering a period of loss / reordering detection and Extended | When entering a period of loss / reordering detection and Extended | |||
Limited Transmit a TCP-NCR MUST initialize several state variables. | Limited Transmit a TCP-NCR MUST initialize several state variables. | |||
A TCP MUST enter Extended Limited Transmit upon receiving the first | A TCP MUST enter Extended Limited Transmit upon receiving the first | |||
ACK with a SACK block after the reception of an ACK that (a) did not | ACK with a SACK block after the reception of an ACK that (a) did not | |||
contain SACK information and (b) did increase the connection's | contain SACK information and (b) did increase the connection's | |||
cumulative ACK point. The initializations are: | cumulative ACK point. The initializations are: | |||
(I.1) Save the current FlightSize. | (I.1) The TCP MUST save the current FlightSize. | |||
FlightSizePrev = FlightSize | FlightSizePrev = FlightSize | |||
| | (I.2) The TCP MUST set a variable for tracking the number of | |||
(I.2) Set a variable for tracking the number of segments for which | segments for which an ACK does not trigger a transmission | |||
an ACK does not trigger a transmission during Careful Limited | during Careful Limited Transmit. | |||
Transmit. | ||||
Skipped = 0 | Skipped = 0 | |||
(Note: Skipped is not used during Aggressive Limited | (Note: Skipped is not used during Aggressive Limited | |||
Transmit.) | Transmit.) | |||
(I.3) Set DupThresh (from [RFC3517]) based on the size of the | (I.3) The TCP MUST set DupThresh (from [RFC3517]) based on the | |||
current FlightSize. | current FlightSize. | |||
DupThresh = max (LT_F * (FlightSize / SMSS),3) | DupThresh = max (LT_F * (FlightSize / SMSS),3) | |||
Note: We keep the lower bound of DupThresh = 3 from | Note: We keep the lower bound of DupThresh = 3 from | |||
[RFC2581,RFC3517]. | [RFC2581,RFC3517]. | |||
In addition to the above steps, the incoming ACK MUST be processed | In addition to the above steps, the incoming ACK MUST be processed | |||
with the E series of steps in section 3.3. | with the E series of steps in section 3.3. | |||
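Steps (I.1) through (I.3) can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: the `NcrState` class and `enter_extended_limited_transmit` function are hypothetical names, FlightSize and SMSS are in bytes, and DupThresh is left as an exact fraction because the text above does not specify how to round it.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class NcrState:
    flight_size_prev: int   # bytes, saved in (I.1)
    skipped: int            # bytes, used only by Careful Limited Transmit
    dup_thresh: Fraction    # in segments; rounding is not specified here


def enter_extended_limited_transmit(flight_size, smss, lt_f):
    """Initialize TCP-NCR state on entering Extended Limited Transmit."""
    return NcrState(
        flight_size_prev=flight_size,                       # (I.1)
        skipped=0,                                          # (I.2)
        dup_thresh=max(lt_f * Fraction(flight_size, smss),  # (I.3)
                       Fraction(3)),
    )


aggressive = Fraction(1, 2)
large = enter_extended_limited_transmit(14600, 1460, aggressive)
small = enter_extended_limited_transmit(4380, 1460, aggressive)
```

With ten 1460-byte segments outstanding and the Aggressive LT_F, DupThresh is 5. With only three segments outstanding the product is 1.5, so the lower bound of 3 applies.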
skipping to change at page 9, line 12 | skipping to change at page 9, line 42 | |||
not congestion. Therefore, the receipt of an ACK that extends the | not congestion. Therefore, the receipt of an ACK that extends the | |||
cumulative ACK point MUST terminate Extended Limited Transmit. As | cumulative ACK point MUST terminate Extended Limited Transmit. As | |||
described below (in (T.4)), an ACK that extends the cumulative ACK | described below (in (T.4)), an ACK that extends the cumulative ACK | |||
point and *also* contains SACK information will also trigger the | point and *also* contains SACK information will also trigger the | |||
beginning of a new Extended Limited Transmit phase. | beginning of a new Extended Limited Transmit phase. | |||
Upon the termination of Extended Limited Transmit, and especially | Upon the termination of Extended Limited Transmit, and especially | |||
when using the Careful variant, TCP-NCR may be in a situation where | when using the Careful variant, TCP-NCR may be in a situation where | |||
the entire cwnd is not being utilized and therefore TCP-NCR will be | the entire cwnd is not being utilized and therefore TCP-NCR will be | |||
prone to transmitting a burst of segments into the network. | prone to transmitting a burst of segments into the network. | |||
Therefore, when a TCP-NCR in the Extended Limited Transmit phase | Therefore, to mitigate this bursting when a TCP-NCR in the Extended | |||
receives an ACK that updates the cumulative ACK point (regardless of | Limited Transmit phase receives an ACK that updates the cumulative | |||
whether the ACK contains SACK information), the following steps MUST | ACK point (regardless of whether the ACK contains SACK information), | |||
be taken: | the following steps MUST be taken: | |||
(T.1) cwnd = min (FlightSize + SMSS,FlightSizePrev) | (T.1) A TCP MUST reset cwnd to: | |||
| | cwnd = min (FlightSize + SMSS,FlightSizePrev) | |||
This step ensures that cwnd is not grossly larger than the | This step ensures that cwnd is not grossly larger than the | |||
amount of data outstanding --- a situation that would cause a | amount of data outstanding --- a situation that would cause a | |||
line rate burst. | line rate burst. | |||
(T.2) ssthresh = FlightSizePrev | (T.2) A TCP MUST set ssthresh to: | |||
| | ssthresh = FlightSizePrev | |||
This step provides TCP-NCR with a sense of "history". If step | This step provides TCP-NCR with a sense of "history". If step | |||
(T.1) reduces cwnd below FlightSizePrev this step ensures that | (T.1) reduces cwnd below FlightSizePrev this step ensures that | |||
TCP-NCR will slow start back to the operating point in effect | TCP-NCR will slow start back to the operating point in effect | |||
before Extended Limited Transmit. | before Extended Limited Transmit. | |||
(T.3) Transmit previously unsent data as allowed by cwnd, | (T.3) A TCP is now permitted to transmit previously unsent data as | |||
FlightSize, application data availability and the receiver's | allowed by cwnd, FlightSize, application data availability and | |||
advertised window. | the receiver's advertised window. | |||
(T.4) When the ACK extends the cumulative ACK point and also | (T.4) When an incoming ACK extends the cumulative ACK point and also | |||
contains SACK information, the initializations in steps (I.2) | contains SACK information, the initializations in steps (I.2) | |||
and (I.3) from section 3.1 MUST be taken (but, not step (I.1)) | and (I.3) from section 3.1 MUST be taken (but, step (I.1) MUST | |||
to re-start Extended Limited Transmit. In addition, the | NOT be executed) to re-start Extended Limited Transmit. In | |||
series of steps in section 3.3 (the "E" steps) MUST be taken. | addition, the series of steps in section 3.3 (the "E" steps) | |||
| | MUST be taken. | |||
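The window adjustments in steps (T.1) and (T.2) can be sketched as follows. The function name `terminate_extended_limited_transmit` is hypothetical, all quantities are assumed to be in bytes, and the sketch models neither the transmission permitted by (T.3) nor the re-initialization in (T.4).

```python
def terminate_extended_limited_transmit(flight_size, flight_size_prev, smss):
    """Return (cwnd, ssthresh) after an ACK advances the cumulative ACK point."""
    cwnd = min(flight_size + smss, flight_size_prev)   # (T.1) limit the burst
    ssthresh = flight_size_prev                        # (T.2) remember history
    return cwnd, ssthresh


# Half of the original window is still outstanding: cwnd is clamped.
clamped = terminate_extended_limited_transmit(7300, 14600, 1460)

# The window is still full: cwnd stays at the previous operating point.
unchanged = terminate_extended_limited_transmit(14600, 14600, 1460)
```

In the first case cwnd becomes 8760 bytes while ssthresh stays at 14600, so the sender slow starts back to its earlier operating point instead of emitting a line-rate burst. In the second case cwnd is simply restored to 14600.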
3.3. Extended Limited Transmit | 3.3. Extended Limited Transmit | |||
On each ACK containing SACK information that arrives after TCP-NCR | On each ACK containing SACK information that arrives after TCP-NCR | |||
has entered the Extended Limited Transmit phase (as outlined in | has entered the Extended Limited Transmit phase (as outlined in | |||
section 3.1) and before Extended Limited Transmit terminates, the | section 3.1) and before Extended Limited Transmit terminates, the | |||
sender MUST use the following procedure. | sender MUST use the following procedure. | |||
(E.1) Use the SetPipe () procedure from [RFC3517] to set the "pipe" | (E.1) The SetPipe () procedure from [RFC3517] MUST be used to set | |||
variable (which represents the number of bytes still considered | the "pipe" variable (which represents the number of bytes | |||
"in the network"). Note: the current value of DupThresh MUST | still considered "in the network"). Note: the current value | |||
be used by SetPipe () to produce an accurate assessment of the | of DupThresh MUST be used by SetPipe () to produce an accurate | |||
amount of data still considered in the network. | assessment of the amount of data still considered in the | |||
network. | ||||
(E.2) If the comparison in equation (1) below holds and there are | (E.2) If the comparison in equation (1) below holds and there are | |||
SMSS bytes of previously unsent data available for | SMSS bytes of previously unsent data available for | |||
transmission then transmit one segment of SMSS bytes. | transmission then the sender MUST transmit one segment of SMSS | |||
bytes. | ||||
(pipe + Skipped) <= (FlightSizePrev - SMSS) (1) | (pipe + Skipped) <= (FlightSizePrev - SMSS) (1) | |||
If the comparison in equation (1) does not hold or no new data | If the comparison in equation (1) does not hold or no new data | |||
can be transmitted (due to lack of data from the application | can be transmitted (due to lack of data from the application | |||
or the advertised window limit), skip to step (E.6). | or the advertised window limit), skip to step (E.6). | |||
(E.3) Increment pipe by SMSS bytes. | (E.3) Pipe MUST be incremented by SMSS bytes. | |||
(E.4) If using Careful Limited Transmit, increment Skipped by SMSS | (E.4) If using Careful Limited Transmit, Skipped MUST be incremented | |||
bytes to ensure that the next SMSS bytes of SACKed data | by SMSS bytes to ensure that the next SMSS bytes of SACKed data | |||
processed do not trigger a Limited Transmit transmission (since | processed does not trigger a Limited Transmit transmission | |||
the goal of Careful Limited Transmit is to send upon the | (since the goal of Careful Limited Transmit is to send upon | |||
reception of every second duplicate ACK). | the reception of every second duplicate ACK). | |||
(E.5) Return to step (E.2) to ensure that as many bytes as | (E.5) A TCP MUST return to step (E.2) to ensure that as many bytes | |||
appropriate are transmitted. This provides robustness to ACK | as appropriate are transmitted. This provides robustness to | |||
loss that can be (largely) compensated for using SACK | ACK loss that can be (largely) compensated for using SACK | |||
information. | information. | |||
(E.6) Reset DupThresh via: | (E.6) DupThresh MUST be reset via: | |||
DupThresh = max (LT_F * (FlightSize / SMSS),3) | DupThresh = max (LT_F * (FlightSize / SMSS),3) | |||
where FlightSize is the total number of bytes that have not | where FlightSize is the total number of bytes that have not | |||
been cumulatively acknowledged (which is different from | been cumulatively acknowledged (which is different from | |||
"pipe"). | "pipe"). | |||
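The (E) steps above can be sketched as follows. This is a minimal illustration, not the draft's reference implementation. The names NcrState, extended_limited_transmit and sendable_bytes are hypothetical. The "pipe" value is assumed to have already been computed by SetPipe () from [RFC3517] using the current DupThresh (step E.1), and sendable_bytes is assumed to already reflect both application data availability and the receiver's advertised window. The sketch also assumes that FlightSize grows by SMSS for each new segment sent.

```python
from dataclasses import dataclass


@dataclass
class NcrState:
    flight_size_prev: int  # FlightSizePrev, in bytes
    skipped: int           # Skipped, in bytes
    dup_thresh: float      # DupThresh, in segments
    lt_f: float            # LT_F, the Limited Transmit fraction
    careful: bool          # True for Careful, False for Aggressive


def extended_limited_transmit(state, pipe, flight_size, sendable_bytes, smss):
    """Process one SACK-bearing ACK; return the number of new segments to send.

    `pipe` is assumed to be the result of SetPipe () (step E.1).
    """
    sent = 0
    # (E.2) Send while equation (1) holds and a full segment can be sent.
    while ((pipe + state.skipped) <= (state.flight_size_prev - smss)
           and sendable_bytes >= smss):
        sent += 1
        sendable_bytes -= smss
        flight_size += smss        # assumption: new data adds to FlightSize
        pipe += smss               # (E.3)
        if state.careful:
            state.skipped += smss  # (E.4)
        # (E.5) Loop back to the test in (E.2).
    # (E.6) Reset DupThresh from the current FlightSize.
    state.dup_thresh = max(state.lt_f * (flight_size / smss), 3)
    return sent
```

With Careful Limited Transmit, each transmission also raises Skipped, so the left-hand side of equation (1) grows by 2 * SMSS per segment sent, which is what limits the sender to roughly one new segment per two duplicate ACKs.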
3.4 Entering Loss Recovery | 3.4 Entering Loss Recovery | |||
When a segment is deemed lost via the algorithms in [RFC3517], | When a segment is deemed lost via the algorithms in [RFC3517], | |||
Extended Limited Transmit MUST be terminated, leaving the | Extended Limited Transmit MUST be terminated, leaving the | |||
algoritms in [RFC3517] to govern TCP's behavior. One slight | algorithms in [RFC3517] to govern TCP's behavior. One slight | |||
change to [RFC3517] MUST be made, however. In section 5, step | change to [RFC3517] MUST be made, however. In section 5, step | |||
(2) of [RFC3517] MUST be changed to: | (2) of [RFC3517] MUST be changed to: | |||
(2) ssthresh = cwnd = (FlightSizePrev / 2) | (2) ssthresh = cwnd = (FlightSizePrev / 2) | |||
This ensures that the congestion control modifications are made | This ensures that the congestion control modifications are made | |||
with respect to the amount of data in the network before | with respect to the amount of data in the network before | |||
FlightSize was increased by Extended Limited Transmit. | FlightSize was increased by Extended Limited Transmit. | |||
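As a worked illustration of the modified step (2), the sketch below contrasts it with a reduction based on the current FlightSize, which is what step (2) of [RFC3517] uses unmodified. The function names and the sample numbers are hypothetical, and integer division is an assumption made for the example.

```python
def modified_step_2(flight_size_prev):
    """Return (ssthresh, cwnd) as set by the modified step (2)."""
    half = flight_size_prev // 2
    return half, half


def unmodified_step_2(flight_size):
    """Return (ssthresh, cwnd) if the current FlightSize were used instead."""
    half = flight_size // 2
    return half, half


# Example: 20000 bytes were in flight before Extended Limited Transmit,
# and new transmissions raised FlightSize to 30000 bytes.
flight_size_prev = 20000
flight_size = 30000
print(modified_step_2(flight_size_prev))  # (10000, 10000)
print(unmodified_step_2(flight_size))     # (15000, 15000)
```

In this example the modified step halves the pre-existing operating point, whereas basing the reduction on the inflated FlightSize would leave cwnd at three quarters of FlightSizePrev.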
Note: Once the algorithm in [RFC3517] takes over from Extended | Note: Once the algorithm in [RFC3517] takes over from Extended | |||
skipping to change at page 11, line 47 | skipping to change at page 12, line 37 | |||
5. Disadvantages | 5. Disadvantages | |||
While we note that all of the changes outlined above are implemented | While we note that all of the changes outlined above are implemented | |||
in the sender, the receiver also potentially has a part to play. In | in the sender, the receiver also potentially has a part to play. In | |||
particular, TCP-NCR increases the receiver's buffering requirement by | particular, TCP-NCR increases the receiver's buffering requirement by | |||
up to an extra cwnd -- in the case of the TCP sender using Aggressive | up to an extra cwnd -- in the case of the TCP sender using Aggressive | |||
Limited Transmit and actual loss occurring in the network. | Limited Transmit and actual loss occurring in the network. | |||
Therefore, to maximize the benefits from TCP-NCR, receivers should | Therefore, to maximize the benefits from TCP-NCR, receivers should | |||
advertise a large window to absorb the extra out-of-order traffic. In | advertise a large window to absorb the extra out-of-order traffic. In | |||
the case that the additonal buffer requirements are not met, the use | the case that the additional buffer requirements are not met, the use | |||
of the above algorithm takes into account the reduced advertised | of the above algorithm takes into account the reduced advertised | |||
window. | window---with a corresponding loss in robustness to packet | |||
reordering. | ||||
In addition, using TCP-NCR could delay the delivery of data to the | In addition, using TCP-NCR could delay the delivery of data to the | |||
application by up to one RTT because the fast retransmission point is | application by up to one RTT because the fast retransmission point is | |||
delayed by roughly one RTT in TCP-NCR. Applications that are | delayed by roughly one RTT in TCP-NCR. Applications that are | |||
sensitive to such delays should turn off the TCP-NCR option. For | sensitive to such delays should turn off the TCP-NCR option. For | |||
instance, a socket option could be introduced to allow applications | instance, a socket option could be introduced to allow applications | |||
to control whether NCR would be used for a particular connection. | to control whether NCR would be used for a particular connection. | |||
Finally, the use of TCP-NCR makes the recovery from congestion events | Finally, the use of TCP-NCR makes the recovery from congestion events | |||
sluggish in comparison to the standard reaction in [RFC2581]. [BR04, | sluggish in comparison to the standard reaction in [RFC2581]. [BR04, | |||
skipping to change at page 12, line 35 | skipping to change at page 13, line 27 | |||
happening and mechanisms that try to detect spurious retransmits and | happening and mechanisms that try to detect spurious retransmits and | |||
"undo" the needless congestion control state changes that have been | "undo" the needless congestion control state changes that have been | |||
taken. | taken. | |||
[BA02,ZKFP03] attempt to prevent segment reordering from triggering | [BA02,ZKFP03] attempt to prevent segment reordering from triggering | |||
spurious retransmits by using various algorithms to approximate the | spurious retransmits by using various algorithms to approximate the | |||
duplicate ACK threshold required to disambiguate loss and reordering | duplicate ACK threshold required to disambiguate loss and reordering | |||
over a given network path at a given time. TCP-NCR similarly tries | over a given network path at a given time. TCP-NCR similarly tries | |||
to prevent spurious retransmits. However, TCP-NCR takes a simplified | to prevent spurious retransmits. However, TCP-NCR takes a simplified | |||
approach compared to those in [BA02,ZKFP03] in that TCP-NCR simply | approach compared to those in [BA02,ZKFP03] in that TCP-NCR simply | |||
delays retransmission by a fixed amount (in comparison to standard | delays retransmission by an amount based on the current cwnd (in | |||
TCP), while the other schemes use relatively complex algorithms in an | comparison to standard TCP), while the other schemes use relatively | |||
attempt to derive a more precise value for DupThresh that depends on | complex algorithms in an attempt to derive a more precise value for | |||
the network conditions. While TCP-NCR offers simplicity the other | DupThresh that depends on the current patterns of packet reordering. | |||
schemes may offer more precision such that applications would not be | While TCP-NCR offers simplicity the other schemes may offer more | |||
forced to wait as long for their retransmissions. Future work could | precision such that applications would not be forced to wait as long | |||
be undertaken to achieve robustness without needless delay. | for their retransmissions. Future work could be undertaken to | |||
achieve robustness without needless delay. | ||||
On the other hand, several schemes have been developed to detect and | On the other hand, several schemes have been developed to detect and | |||
mitigate needless retransmissions after the fact. | mitigate needless retransmissions after the fact. | |||
[RFC3522,RFC3708,BA02,RFC4015,SK04] present algorithms to detect | [RFC3522,RFC3708,BA02,RFC4015,SK04] present algorithms to detect | |||
spurious retransmits and mitigate the changes these events made to | spurious retransmits and mitigate the changes these events made to | |||
the congestion control state. TCP-NCR could be used in conjunction | the congestion control state. TCP-NCR could be used in conjunction | |||
with these algorithms, with TCP-NCR attempting to prevent spurious | with these algorithms, with TCP-NCR attempting to prevent spurious | |||
retransmits and some other scheme kicking in if the prevention | retransmits and some other scheme kicking in if the prevention | |||
failed. In addition, we note that TCP-NCR is concentrated on | failed. In addition, we note that TCP-NCR is concentrated on | |||
preventing spurious fast retransmits and some of the above algorithms | preventing spurious fast retransmits and some of the above algorithms | |||
skipping to change at page 13, line 18 | skipping to change at page 14, line 11 | |||
We do not believe there are security implications involved with TCP- | We do not believe there are security implications involved with TCP- | |||
NCR over and above those for general TCP congestion control | NCR over and above those for general TCP congestion control | |||
[RFC2581]. In particular, the Extended Limited Transmit algorithms | [RFC2581]. In particular, the Extended Limited Transmit algorithms | |||
specified in this document have been specifically designed not to be | specified in this document have been specifically designed not to be | |||
susceptible to the sorts of ACK splitting attacks that TCP's general | susceptible to the sorts of ACK splitting attacks that TCP's general | |||
congestion control is vulnerable to (as discussed in [RFC3465]). | congestion control is vulnerable to (as discussed in [RFC3465]). | |||
8. Acknowledgements | 8. Acknowledgements | |||
Ted Faber, Wesley Eddy, Gorry Fairhurst, Sally Floyd, Sara Landstrom, | Feedback from Lars Eggert, Ted Faber, Wesley Eddy, Gorry Fairhurst, | |||
Nauzad Sadry, Pasi Sarolahti, Joe Touch and Nitin Vaidya as well as | Sally Floyd, Sara Landstrom, Nauzad Sadry, Pasi Sarolahti, Joe Touch | |||
feedback from the TCPM working group have contributed significantly | and Nitin Vaidya and the TCPM working group have contributed | |||
to this document. Our thanks to all! | significantly to this document. Our thanks to all! | |||
9. Normative References | 9. IANA Considerations | |||
This document requires no IANA assignments. The RFC Editor can | ||||
safely remove this section. | ||||
10. Normative References | ||||
[RFC793] J. Postel, "Transmission Control Protocol", RFC 793, | [RFC793] J. Postel, "Transmission Control Protocol", RFC 793, | |||
September 1981. | September 1981. | |||
[RFC2018] M. Mathis, J. Mahdavi, S. Floyd and A. Romanow, "TCP | [RFC2018] M. Mathis, J. Mahdavi, S. Floyd and A. Romanow, "TCP | |||
selective acknowledgment options," Internet RFC 2018. | selective acknowledgment options," Internet RFC 2018. | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, March 1997. | Requirement Levels", BCP 14, RFC 2119, March 1997. | |||
[RFC2581] M. Allman, V. Paxson, and W. Stevens, "TCP Congestion | [RFC2581] M. Allman, V. Paxson, and W. Stevens, "TCP Congestion | |||
Control", RFC 2581, April 1999. | Control", RFC 2581, April 1999. | |||
[RFC3042] M. Allman, H. Balakrishnan and S. Floyd, "Enhancing TCP's | [RFC3042] M. Allman, H. Balakrishnan and S. Floyd, "Enhancing TCP's | |||
Loss Recovery Using Limited Transmit", RFC 3042, January 2001. | Loss Recovery Using Limited Transmit", RFC 3042, January 2001. | |||
[RFC3517] E. Blanton, M. Allman, K. Fall and L. Wang, "A Conservative | [RFC3517] E. Blanton, M. Allman, K. Fall and L. Wang, "A Conservative | |||
Selective Acknowledgment (SACK)-based Loss Recovery Algorithm for | Selective Acknowledgment (SACK)-based Loss Recovery Algorithm for | |||
TCP", RFC 3517, April 2003. | TCP", RFC 3517, April 2003. | |||
10. Informative References | 11. Informative References | |||
[BA02] E. Blanton and M. Allman, "On Making TCP More Robust to Packet | [BA02] E. Blanton and M. Allman, "On Making TCP More Robust to Packet | |||
Reordering," ACM Computer Communication Review, January 2002. | Reordering," ACM Computer Communication Review, January 2002. | |||
[BBFS01] D. Bansal, H. Balakrishnan, S. Floyd and S. Shenker, | [BBFS01] D. Bansal, H. Balakrishnan, S. Floyd and S. Shenker, | |||
"Dynamic Behavior of Slowly Responsive Congestion Control | "Dynamic Behavior of Slowly Responsive Congestion Control | |||
Algorithms", Proceedings of ACM SIGCOMM, Sep. 2001. | Algorithms", Proceedings of ACM SIGCOMM, Sep. 2001. | |||
[BPS99] J. Bennett, C. Partridge, and N. Shectman, "Packet reordering | [BPS99] J. Bennett, C. Partridge, and N. Shectman, "Packet reordering | |||
is not pathological network behavior," IEEE/ACM Transactions on | is not pathological network behavior," IEEE/ACM Transactions on | |||
skipping to change at page 15, line 35 | skipping to change at page 16, line 34 | |||
[SK04] P. Sarolahti, M. Kojo, "Forward RTO-Recovery (F-RTO): An | [SK04] P. Sarolahti, M. Kojo, "Forward RTO-Recovery (F-RTO): An | |||
Algorithm for Detecting Spurious Retransmission Timeouts with TCP and | Algorithm for Detecting Spurious Retransmission Timeouts with TCP and | |||
SCTP", Internet-Draft draft-ietf-tcpm-frto-02.txt (work in progress). | SCTP", Internet-Draft draft-ietf-tcpm-frto-02.txt (work in progress). | |||
November 2004. | November 2004. | |||
[ZKFP03] M. Zhang, B. Karp, S. Floyd, L. Peterson, "RR-TCP: A | [ZKFP03] M. Zhang, B. Karp, S. Floyd, L. Peterson, "RR-TCP: A | |||
Reordering-Robust TCP with DSACK", in Proceedings of the Eleventh | Reordering-Robust TCP with DSACK", in Proceedings of the Eleventh | |||
IEEE International Conference on Networking Protocols (ICNP 2003), | IEEE International Conference on Networking Protocols (ICNP 2003), | |||
Atlanta, GA, November, 2003. | Atlanta, GA, November, 2003. | |||
11. Author's Addresses | 12. Author's Addresses | |||
Sumitha Bhandarkar | Sumitha Bhandarkar | |||
Dept. of Elec. Engg. | Dept. of Elec. Engg. | |||
214 ZACH | 214 ZACH | |||
College Station, TX 77843-3128 | College Station, TX 77843-3128 | |||
Phone: (512) 468-8078 | Phone: (512) 468-8078 | |||
Email: sumitha@tamu.edu | Email: sumitha@tamu.edu | |||
URL : http://students.cs.tamu.edu/sumitha/ | URL : http://students.cs.tamu.edu/sumitha/ | |||
A. L. Narasimha Reddy | A. L. Narasimha Reddy | |||
skipping to change at page 17, line 7 | skipping to change at page 18, line 7 | |||
This document and the information contained herein are provided on an | This document and the information contained herein are provided on an | |||
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS | "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS | |||
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET | OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET | |||
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, | ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, | |||
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE | INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE | |||
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED | INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED | |||
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. | WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. | |||
Copyright Statement | Copyright Statement | |||
Copyright (C) The Internet Society (2005). This document is subject | Copyright (C) The Internet Society (2006). This document is subject | |||
to the rights, licenses and restrictions contained in BCP 78, and | to the rights, licenses and restrictions contained in BCP 78, and | |||
except as set forth therein, the authors retain all their rights. | except as set forth therein, the authors retain all their rights. | |||
Acknowledgment | Acknowledgment | |||
Funding for the RFC Editor function is currently provided by the | Funding for the RFC Editor function is currently provided by the | |||
Internet Society. | Internet Society. | |||
End of changes. 39 change blocks. | ||||
109 lines changed or deleted | 155 lines changed or added | |||