draft-ietf-tcpm-tcp-dcr-06.txt   draft-ietf-tcpm-tcp-dcr-07.txt 
Internet Engineering Task Force Sumitha Bhandarkar Internet Engineering Task Force Sumitha Bhandarkar
INTERNET DRAFT A. L. Narasimha Reddy INTERNET DRAFT A. L. Narasimha Reddy
draft-ietf-tcpm-tcp-dcr-06.txt Texas A&M University draft-ietf-tcpm-tcp-dcr-07.txt Texas A&M University
Expires: May 2006 Mark Allman Expires: July 2006 Mark Allman
ICIR/ICSI ICIR/ICSI
Ethan Blanton Ethan Blanton
Purdue University Purdue University
November 2005 January 2006
Improving the Robustness of TCP to Non-Congestion Events Improving the Robustness of TCP to Non-Congestion Events
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
skipping to change at page 1, line 39 skipping to change at page 1, line 39
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (2005). Copyright (C) The Internet Society (2006).
Abstract: Abstract
This document specifies Non-Congestion Robustness (NCR) for TCP. In This document specifies Non-Congestion Robustness (NCR) for TCP. In
the absence of explicit congestion notification from the network, the absence of explicit congestion notification from the network TCP
TCP's loss recovery algorithms treat the receipt of three duplicate uses loss as an indication of congestion. One of the ways TCP
acknowledgments as an implicit indication of congestion in the detects loss is using the arrival of three duplicate acknowledgments.
network. This is not always correct, notably in the case when However, this heuristic is not always correct, notably in the case
network paths reorder segments (for whatever reason), resulting in when network paths reorder segments (for whatever reason), resulting
degraded performance. TCP-NCR is designed to mitigate this degraded in degraded performance. TCP-NCR is designed to mitigate this
performance by increasing the number of duplicate acknowledgments degraded performance by increasing the number of duplicate
required to trigger loss recovery, based on the current state of the acknowledgments required to trigger loss recovery, based on the
connection, in an effort to better disambiguate true segment loss current state of the connection, in an effort to better disambiguate
from segment reordering. This document specifies the changes to TCP, true segment loss from segment reordering. This document specifies
as well as the costs and benefits of these modifications. the changes to TCP, as well as the costs and benefits of these
modifications.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . 2
2. NCR Description . . . . . . . . . . . . . . . . . . . . 5
3. Algorithm . . . . . . . . . . . . . . . . . . . . . . . 6
3.1 Initialization . . . . . . . . . . . . . . . . . . . 8
3.2 Terminating Extended Limited Transmit and
Preventing Bursts . . . . . . . . . . . . . . . . . . 9
3.3 Extended Limited Transmit . . . . . . . . . . . . . . 10
3.4 Entering Loss Recovery . . . . . . . . . . . . . . . 11
4. Advantages . . . . . . . . . . . . . . . . . . . . . . . 11
5. Disadvantages . . . . . . . . . . . . . . . . . . . . . 12
6. Related Work . . . . . . . . . . . . . . . . . . . . . . 13
7. Security Considerations . . . . . . . . . . . . . . . . 13
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . 14
9. IANA Considerations . . . . . . . . . . . . . . . . . . 14
10. Normative References . . . . . . . . . . . . . . . . . . 14
11. Informative References . . . . . . . . . . . . . . . . . 14
12. Author's Addresses . . . . . . . . . . . . . . . . . . . 16
Terminology Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described "OPTIONAL" in this document are to be interpreted as described
in [RFC2119]. in [RFC2119].
Readers should be familiar with the TCP terminology (e.g., Readers should be familiar with the TCP terminology (e.g.,
FlightSize, Pipe, etc.) given in [RFC2581] and [RFC3517]. FlightSize, Pipe, etc.) given in [RFC2581] and [RFC3517].
skipping to change at page 2, line 36 skipping to change at page 3, line 8
of congestion (i.e., assuming queue overflow). TCP receivers send of congestion (i.e., assuming queue overflow). TCP receivers send
cumulative acknowledgments (ACKs) indicating the next sequence number cumulative acknowledgments (ACKs) indicating the next sequence number
expected from the sender for arriving segments [RFC793]. When expected from the sender for arriving segments [RFC793]. When
segments arrive out-of-order, duplicate ACKs are generated. As segments arrive out-of-order, duplicate ACKs are generated. As
specified in [RFC2581], a TCP sender uses the arrival of three specified in [RFC2581], a TCP sender uses the arrival of three
duplicate ACKs as an indication of segment loss. The TCP sender duplicate ACKs as an indication of segment loss. The TCP sender
retransmits the lost segment and reduces the load imposed on the retransmits the lost segment and reduces the load imposed on the
network, assuming the segment loss was caused by resource contention network, assuming the segment loss was caused by resource contention
within the network path. The TCP sender does not assume loss on the within the network path. The TCP sender does not assume loss on the
first or second duplicate ACK, but waits for three duplicate ACKs to first or second duplicate ACK, but waits for three duplicate ACKs to
account for mild reordering. However, the use of this constant account for minor packet reordering. However, the use of this
threshold of duplicate ACKs has several problems that can be constant threshold of duplicate ACKs has several problems that can be
mitigated with a dynamic threshold. mitigated with a dynamic threshold.
The following is an example of TCP's behavior: The following is an example of TCP's behavior:
+ TCP A is the data sender and TCP B is the data receiver. + TCP A is the data sender and TCP B is the data receiver.
+ TCP A sends 10 segments each consisting of a single data byte + TCP A sends 10 segments each consisting of a single data byte
(i.e., transmits bytes 1-10 in segments 1-10). (i.e., transmits bytes 1-10 in segments 1-10).
+ Assume segment 3 is dropped in the network. + Assume segment 3 is dropped in the network.
skipping to change at page 3, line 19 skipping to change at page 3, line 39
[RFC2581] recommends that delayed ACKs not be used when the ACK [RFC2581] recommends that delayed ACKs not be used when the ACK
is triggered by an out-of-order segment.) is triggered by an out-of-order segment.)
+ When TCP A receives the third duplicate ACK (or fourth ACK + When TCP A receives the third duplicate ACK (or fourth ACK
overall) for sequence number 3, TCP A will retransmit overall) for sequence number 3, TCP A will retransmit
segment 3 and reduce the sending rate by roughly half (see segment 3 and reduce the sending rate by roughly half (see
[RFC2581] for specifics on the congestion control state [RFC2581] for specifics on the congestion control state
adjustments). adjustments).
Alternatively, suppose segment 3 was not dropped by the network, but Alternatively, suppose segment 3 was not dropped by the network, but
rather delayed such that segment 3 arrives after segment 10. The rather delayed such that segment 3 arrives at TCP B after segment 10.
above scenario will play out in precisely the same manner insomuch as The above scenario will play out in precisely the same manner
a retransmission of segment 3 will be triggered. In other words, TCP insomuch as a retransmission of segment 3 will be triggered. In
is not capable of disambiguating this reordering event from a segment other words, TCP is not capable of disambiguating this reordering
loss. event from a segment loss, resulting in an unnecessary retransmission
and rate reduction.
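The ambiguity described above can be illustrated with a small simulation. The following is only a sketch under simplifying assumptions that are not part of this document: the receiver sends one cumulative ACK per arriving segment (no delayed ACKs), ACKs are never lost, and the helper names cumulative_acks and fast_retransmit_index are hypothetical.

```python
def cumulative_acks(arrivals):
    """Return the cumulative ACK (next expected segment) sent for each arrival."""
    received = set()
    next_expected = 1
    acks = []
    for seg in arrivals:
        received.add(seg)
        # Advance the cumulative ACK point over all in-order data held so far.
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected)
    return acks


def fast_retransmit_index(acks, dup_thresh=3):
    """Return the index of the ACK that is the dup_thresh-th duplicate, or None."""
    dups = 0
    for i in range(1, len(acks)):
        if acks[i] == acks[i - 1]:
            dups += 1
            if dups == dup_thresh:
                return i
        else:
            dups = 0
    return None


# Segment 3 is dropped by the network.
loss_acks = cumulative_acks([1, 2, 4, 5, 6, 7, 8, 9, 10])
# Segment 3 is delayed and arrives at TCP B after segment 10.
reorder_acks = cumulative_acks([1, 2, 4, 5, 6, 7, 8, 9, 10, 3])

# In both cases the third duplicate ACK for sequence number 3 arrives at
# the same point, so a fixed threshold of three cannot tell them apart.
print(fast_retransmit_index(loss_acks), fast_retransmit_index(reorder_acks))
```

In the reordering case the final ACK jumps to 11 once segment 3 arrives, but by then the sender has already retransmitted and reduced its rate.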
The following is the specific motivation behind making TCP robust to The following is the specific motivation behind making TCP robust to
reordered segments: reordered segments:
* A number of Internet measurement studies have shown that packet * A number of Internet measurement studies have shown that packet
reordering is not a rare phenomenon [Pax97,BPS99,JIDKT03,GPL04]. reordering is not a rare phenomenon [Pax97,BPS99,JIDKT03,GPL04].
Further, the reordering can be well beyond that required for Further, the reordering can be well beyond that required for
fast retransmit to be falsely triggered. fast retransmit to be falsely triggered.
* [BA02,ZKFP03] show the negative performance implications that * [BA02,ZKFP03] show the negative performance implications that
skipping to change at page 3, line 45 skipping to change at page 4, line 18
* The requirement imposed by TCP for almost in-order packet * The requirement imposed by TCP for almost in-order packet
delivery places a constraint on the design of future technology. delivery places a constraint on the design of future technology.
Novel routing algorithms, network components, link-layer Novel routing algorithms, network components, link-layer
retransmission mechanisms and applications could all be looked retransmission mechanisms and applications could all be looked
at with a fresh perspective if TCP were to be more robust to at with a fresh perspective if TCP were to be more robust to
segment reordering. For instance, high speed packet switches segment reordering. For instance, high speed packet switches
could cause resequencing of packets if TCP were more robust. could cause resequencing of packets if TCP were more robust.
There has been work proposed in the literature explicitly to There has been work proposed in the literature explicitly to
ensure that packet ordering is maintained in such switches ensure that packet ordering is maintained in such switches
[KM02]. Also, link-layer mechanisms that attempt to recover (e.g., [KM02]). Also, link-layer mechanisms that attempt to
from packet corruption by retransmitting could be allowed to recover from packet corruption by retransmitting could be
reorder packets and, hence, increase the chances of local loss allowed to reorder packets and, hence, increase the chances of
repair rather than relying on TCP to repair the loss (and, local loss repair rather than relying on TCP to repair the loss
needlessly reduce its sending rate). Additional examples (and, needlessly reduce its sending rate). Additional examples
include multi-path routing, high-delay satellite links and some include multi-path routing, high-delay satellite links and some
of the schemes proposed for differentiated services of the schemes proposed for a differentiated services
architecture. By making TCP more robust to non-congestion architecture. By making TCP more robust to non-congestion
events, TCP-NCR may open the design space of the future Internet events, TCP-NCR may open the design space of the future Internet
components. components.
In this document we specify a set of TCP sender modifications to In this document we specify a set of TCP sender modifications to
provide Non-Congestion Robustness (NCR) to TCP. In particular, these provide Non-Congestion Robustness (NCR) to TCP. In particular, these
changes are built on top of TCP with selective acknowledgments changes are built on top of TCP with selective acknowledgments
(SACKs) [RFC2018] and the SACK-based loss recovery scheme given in (SACKs) [RFC2018] and the SACK-based loss recovery scheme given in
[RFC3517], since SACK is widely deployed at this point ([MAF05] [RFC3517], since SACK is widely deployed at this point ([MAF05]
indicates that 68% of web servers and 88% of web clients utilize SACK indicates that 68% of web servers and 88% of web clients utilize SACK
as of spring, 2004). as of spring, 2004).
Finally, we note that the TCP-NCR algorithm provided in this document We note that the TCP-NCR algorithm provided in this document could be
could be easily adapted to SCTP [RFC2960] since SCTP uses congestion easily adapted to SCTP [RFC2960] since SCTP uses congestion control
control algorithms similar to TCP's (and, hence, has the same algorithms similar to TCP's (and, hence, has the same reordering
reordering robustness issues). robustness issues).
As we note in several places in the remainder of this document, we
consider TCP-NCR to be experimental in that more experience with the
techniques is required before TCP-NCR should be used on a large scale
on the Internet. We encourage implementation and experimentation
with TCP-NCR in the hopes of gaining an understanding of its
suitability for wide-scale deployment.
The remainder of this document is organized as follows. Section 2 The remainder of this document is organized as follows. Section 2
provides a high-level description of the TCP-NCR mechanisms. In provides a high-level description of the TCP-NCR mechanisms. In
Section 3, we specify the TCP-NCR algorithm. Section 4 provides a Section 3, we specify the TCP-NCR algorithm. Section 4 provides a
brief overview of the benefits of TCP-NCR, while Section 5 discusses brief overview of the benefits of TCP-NCR, while Section 5 discusses
the drawbacks of TCP-NCR. Section 6 discusses related work. Section the drawbacks of TCP-NCR. Section 6 discusses related work. Section
7 discusses security concerns. 7 discusses security concerns.
2. NCR Description 2. NCR Description
As discussed above, in the face of packet reordering three duplicate As discussed above, in the face of packet reordering, three duplicate
ACKs may not be enough to disambiguate loss from reordering. In this ACKs may not be enough to disambiguate loss from reordering. In this
section we provide a non-normative sketch of TCP-NCR. The detailed section we provide a non-normative sketch of TCP-NCR. The detailed
algorithms for implementing Non-Congestion Robustness for TCP are algorithms for implementing Non-Congestion Robustness for TCP are
presented in the next section. presented in the next section.
The general idea behind TCP-NCR is to increase the threshold used to The general idea behind TCP-NCR is to increase the threshold used to
trigger a fast retransmission from the current fixed value of three trigger a fast retransmission from the current fixed value of three
duplicate ACKs [RFC2581] to approximately a congestion window of data duplicate ACKs [RFC2581] to approximately a congestion window of data
having left the network (but, not less than the currently having left the network (but, not less than the currently
standardized value of three duplicate ACKs). Since cwnd represents standardized value of three duplicate ACKs). Since cwnd represents
the amount of data a TCP flow can transmit in one round-trip time the amount of data a TCP flow can transmit in one round-trip time
(RTT), waiting to receive notice that cwnd bytes have left the (RTT), waiting to receive notice that cwnd bytes have left the
network before deciding whether the root cause is loss or reordering network before deciding whether the root cause is loss or reordering
imposes a delay of roughly one RTT. The appropriate choice for a new imposes a delay of roughly one RTT on both the retransmission and the
value of the threshold is essentially a tradeoff between making the congestion control response. The appropriate choice for a new value
best decision regarding the cause of the duplicate ACKs and of the threshold is essentially a tradeoff between making the best
decision regarding the cause of the duplicate ACKs and
responsiveness. The choice to trigger a retransmission only after a responsiveness. The choice to trigger a retransmission only after a
cwnd's worth of data is known to have left the network represents cwnd's worth of data is known to have left the network represents
roughly the largest amount of time a TCP can wait before the (often roughly the largest amount of time a TCP can wait before the (often
costly) retransmission timeout may be triggered. Therefore, the costly) retransmission timeout may be triggered. Therefore, the
algorithm described in this document attempts to make the best root algorithm described in this document attempts to make the best
cause decision possible. decision possible at the expense of timeliness.
Simply increasing the threshold before retransmitting a segment can Simply increasing the threshold before retransmitting a segment can
make TCP brittle to packet loss or ACK loss since such loss reduces make TCP brittle to packet loss or ACK loss since such loss reduces
the number of duplicate ACKs that will arrive at the sender from the the number of duplicate ACKs that will arrive at the sender from the
receiver. For instance, if the cwnd is 10 segments and one segment receiver. For instance, if the cwnd is 10 segments and one segment
is lost, a duplicate ACK threshold of 10 will never be met because is lost, a duplicate ACK threshold of 10 will never be met because
duplicate ACKs corresponding to at most 9 segments will arrive at the duplicate ACKs corresponding to at most 9 segments will arrive at the
sender. To offset the issue of loss, we extend TCP's Limited sender. To offset the issue of loss, we extend TCP's Limited
Transmit [RFC3042] scheme to allow for the sending of new data during Transmit [RFC3042] scheme to allow for the sending of new data during
the period when the TCP sender is disambiguating loss and reordering. the period when the TCP sender is disambiguating loss and reordering.
skipping to change at page 7, line 4 skipping to change at page 7, line 33
A constant MUST be set depending on which variant of extended Limited A constant MUST be set depending on which variant of extended Limited
Transmit is used, as follows: Transmit is used, as follows:
Careful Limited Transmit: Careful Limited Transmit:
LT_F = 2/3 LT_F = 2/3
Aggressive Limited Transmit: Aggressive Limited Transmit:
LT_F = 1/2 LT_F = 1/2
This constant reflects the fraction of outstanding data that must be
SACKed before a retransmission is triggered. Since Aggressive This constant reflects the fraction of outstanding data (including
Limited Transmit sends a new segment for every segment known to have data sent during Extended Limited Transmit) that must be SACKed
left the network, a total of roughly cwnd segments will be sent before a retransmission is triggered. Since Aggressive Limited
during Aggressive Limited Transmit and therefore ideally a total of Transmit sends a new segment for every segment known to have left the
2*cwnd segments will be outstanding. The duplicate ACK threshold is network, a total of roughly cwnd segments will be sent during
then set to LT_F = 1/2 of 2*cwnd (or about 1 RTT worth of data). The Aggressive Limited Transmit and therefore ideally a total of roughly
factor is different for Careful Limited Transmit because the sender 2*cwnd segments will be outstanding when a retransmission is
only transmits one new segment for every two segments that are SACKed triggered. The duplicate ACK threshold is then set to LT_F = 1/2 of
and therefore will ideally have a total of 1.5*cwnd segments 2*cwnd (or about 1 RTT worth of data). The factor is different for
outstanding when the retransmission is to be triggered. Hence, the Careful Limited Transmit because the sender only transmits one new
required threshold is LT_F=2/3 of 1.5*cwnd to delay the segment for every two segments that are SACKed and therefore will
retransmission by roughly 1 RTT. ideally have a total of 1.5*cwnd segments outstanding when the
retransmission is to be triggered. Hence, the required threshold is
LT_F=2/3 of 1.5*cwnd to delay the retransmission by roughly 1 RTT.
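The arithmetic behind the two constants can be checked with a short calculation. This is a sketch of the idealized case described above (no loss, data always available); SEND_RATIO and ideal_threshold are hypothetical names introduced only for illustration.

```python
from fractions import Fraction

# LT_F values given above for each variant of Extended Limited Transmit.
LT_F = {"careful": Fraction(2, 3), "aggressive": Fraction(1, 2)}

# New segments sent per SACKed segment: one for every two SACKed segments
# (Careful) or one for every SACKed segment (Aggressive).
SEND_RATIO = {"careful": Fraction(1, 2), "aggressive": Fraction(1, 1)}


def ideal_threshold(cwnd_segments, variant):
    """Duplicate ACK threshold, in segments, in the idealized case."""
    # Roughly 1.5*cwnd (Careful) or 2*cwnd (Aggressive) segments are
    # outstanding when the retransmission is to be triggered.
    outstanding = cwnd_segments * (1 + SEND_RATIO[variant])
    return LT_F[variant] * outstanding


for variant in ("careful", "aggressive"):
    print(variant, ideal_threshold(10, variant))
```

For a cwnd of 10 segments both variants yield a threshold of 10 segments, i.e., about one cwnd (roughly 1 RTT) worth of data.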
There are situations whereby the sender cannot transmit new data There are situations whereby the sender cannot transmit new data
during Extended Limited Transmit (e.g., lack of data from the during Extended Limited Transmit (e.g., lack of data from the
application, receiver's advertised window limit). These situations application, receiver's advertised window limit). These situations
can lead to the problems discussed in the last section when a TCP can lead to the problems discussed in the last section when a TCP
does not employ Extended Limited Transmit and is starved for ACKs. does not employ Extended Limited Transmit and is starved for ACKs.
Therefore, TCP-NCR adapts the duplicate ACK threshold on each SACK Therefore, TCP-NCR adapts the duplicate ACK threshold on each SACK
arrival to be as robust as possible given the actual amount of data arrival to be as robust as possible given the actual amount of data
that has been transmitted, or roughly LT_F times the number of that has been transmitted, or roughly LT_F times the number of
outstanding segments. outstanding segments.
The TCP-NCR modifications specified in this document lend themselves The TCP-NCR modifications specified in this document lend themselves
to incremental deployment. Only the TCP implementation on the sender to incremental deployment. Only the TCP implementation on the sender
side requires modification. The changes themselves are modest. side requires modification (assuming both hosts support SACK). The
However, as will be discussed below, availability of additional changes themselves are modest. However, as will be discussed below,
buffer space at the receiver will help maximize the benefits of using availability of additional buffer space at the receiver will help
TCP-NCR but is not strictly necessary. maximize the benefits of using TCP-NCR but is not strictly
necessary.
The following algorithms depend on the notions provided by [RFC3517] The following algorithms depend on the notions provided by [RFC3517]
and we assume the reader is familiar with the terminology given in and we assume the reader is familiar with the terminology given in
[RFC3517]. The TCP-NCR algorithm can be adapted to alternate SACK- [RFC3517]. The TCP-NCR algorithm can be adapted to alternate SACK-
based loss recovery schemes. [BR04,BSRV04] outline non-SACK-based based loss recovery schemes. [BR04,BSRV04] outline non-SACK-based
algorithms; however, we do not specify those algorithms in this algorithms; however, we do not specify those algorithms in this
document and do not recommend them due to both the complexity and document and do not recommend them due to both the complexity and
security implications of having only a gross understanding of the security implications of having only a gross understanding of the
number of outstanding segments in the network. number of outstanding segments in the network.
skipping to change at page 8, line 18 skipping to change at page 8, line 49
3.1. Initialization 3.1. Initialization
When entering a period of loss / reordering detection and Extended When entering a period of loss / reordering detection and Extended
Limited Transmit a TCP-NCR MUST initialize several state variables. Limited Transmit a TCP-NCR MUST initialize several state variables.
A TCP MUST enter Extended Limited Transmit upon receiving the first A TCP MUST enter Extended Limited Transmit upon receiving the first
ACK with a SACK block after the reception of an ACK that (a) did not ACK with a SACK block after the reception of an ACK that (a) did not
contain SACK information and (b) did increase the connection's contain SACK information and (b) did increase the connection's
cumulative ACK point. The initializations are: cumulative ACK point. The initializations are:
(I.1) Save the current FlightSize. (I.1) The TCP MUST save the current FlightSize.
FlightSizePrev = FlightSize FlightSizePrev = FlightSize
(I.2) The TCP MUST set a variable for tracking the number of
(I.2) Set a variable for tracking the number of segments for which segments for which an ACK does not trigger a transmission
an ACK does not trigger a transmission during Careful Limited during Careful Limited Transmit.
Transmit.
Skipped = 0 Skipped = 0
(Note: Skipped is not used during Aggressive Limited (Note: Skipped is not used during Aggressive Limited
Transmit.) Transmit.)
(I.3) Set DupThresh (from [RFC3517]) based on the size of the (I.3) The TCP MUST set DupThresh (from [RFC3517]) based on the
current FlightSize. current FlightSize.
DupThresh = max (LT_F * (FlightSize / SMSS),3) DupThresh = max (LT_F * (FlightSize / SMSS),3)
Note: We keep the lower bound of DupThresh = 3 from Note: We keep the lower bound of DupThresh = 3 from
[RFC2581,RFC3517]. [RFC2581,RFC3517].
In addition to the above steps, the incoming ACK MUST be processed In addition to the above steps, the incoming ACK MUST be processed
with the E series of steps in section 3.3. with the E series of steps in section 3.3.
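Steps (I.1) through (I.3) can be sketched as follows. This is a minimal illustration, not a definitive implementation: the NcrState container and the function names are hypothetical, byte counts are plain integers, and DupThresh is kept as an exact fraction because this document does not state how a non-integer value is to be rounded.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class NcrState:
    flight_size_prev: int   # (I.1) FlightSizePrev, in bytes
    skipped: int            # (I.2) Skipped, in bytes
    dup_thresh: Fraction    # (I.3) DupThresh, in segments


def compute_dup_thresh(lt_f, flight_size, smss):
    """DupThresh = max (LT_F * (FlightSize / SMSS),3)."""
    return max(lt_f * Fraction(flight_size, smss), Fraction(3))


def enter_extended_limited_transmit(flight_size, smss, lt_f):
    """Initialize the state used during Extended Limited Transmit."""
    return NcrState(
        flight_size_prev=flight_size,
        skipped=0,
        dup_thresh=compute_dup_thresh(lt_f, flight_size, smss),
    )


# Aggressive Limited Transmit (LT_F = 1/2) with 10 segments in flight.
state = enter_extended_limited_transmit(10 * 1460, 1460, Fraction(1, 2))
print(state)
```

With only four segments in flight, LT_F * (FlightSize / SMSS) is 2 and the lower bound of 3 applies.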
skipping to change at page 9, line 12 skipping to change at page 9, line 42
not congestion. Therefore, the receipt of an ACK that extends the not congestion. Therefore, the receipt of an ACK that extends the
cumulative ACK point MUST terminate Extended Limited Transmit. As cumulative ACK point MUST terminate Extended Limited Transmit. As
described below (in (T.4)), an ACK that extends the cumulative ACK described below (in (T.4)), an ACK that extends the cumulative ACK
point and *also* contains SACK information will also trigger the point and *also* contains SACK information will also trigger the
beginning of a new Extended Limited Transmit phase. beginning of a new Extended Limited Transmit phase.
Upon the termination of Extended Limited Transmit, and especially Upon the termination of Extended Limited Transmit, and especially
when using the Careful variant, TCP-NCR may be in a situation where when using the Careful variant, TCP-NCR may be in a situation where
the entire cwnd is not being utilized and therefore TCP-NCR will be the entire cwnd is not being utilized and therefore TCP-NCR will be
prone to transmitting a burst of segments into the network. prone to transmitting a burst of segments into the network.
Therefore, when a TCP-NCR in the Extended Limited Transmit phase Therefore, to mitigate this bursting when a TCP-NCR in the Extended
receives an ACK that updates the cumulative ACK point (regardless of Limited Transmit phase receives an ACK that updates the cumulative
whether the ACK contains SACK information), the following steps MUST ACK point (regardless of whether the ACK contains SACK information),
be taken: the following steps MUST be taken:
(T.1) cwnd = min (FlightSize + SMSS,FlightSizePrev) (T.1) A TCP MUST reset cwnd to:
cwnd = min (FlightSize + SMSS,FlightSizePrev)
This step ensures that cwnd is not grossly larger than the This step ensures that cwnd is not grossly larger than the
amount of data outstanding --- a situation that would cause a amount of data outstanding --- a situation that would cause a
line rate burst. line rate burst.
(T.2) ssthresh = FlightSizePrev (T.2) A TCP MUST set ssthresh to:
ssthresh = FlightSizePrev
This step provides TCP-NCR with a sense of "history". If step This step provides TCP-NCR with a sense of "history". If step
(T.1) reduces cwnd below FlightSizePrev this step ensures that (T.1) reduces cwnd below FlightSizePrev this step ensures that
TCP-NCR will slow start back to the operating point in effect TCP-NCR will slow start back to the operating point in effect
before Extended Limited Transmit. before Extended Limited Transmit.
(T.3) Transmit previously unsent data as allowed by cwnd, (T.3) A TCP is now permitted to transmit previously unsent data as
FlightSize, application data availability and the receiver's allowed by cwnd, FlightSize, application data availability and
advertised window. the receiver's advertised window.
(T.4) When the ACK extends the cumulative ACK point and also (T.4) When an incoming ACK extends the cumulative ACK point and also
contains SACK information, the initializations in steps (I.2) contains SACK information, the initializations in steps (I.2)
and (I.3) from section 3.1 MUST be taken (but, not step (I.1)) and (I.3) from section 3.1 MUST be taken (but, step (I.1) MUST
to re-start Extended Limited Transmit. In addition, the NOT be executed) to re-start Extended Limited Transmit. In
series of steps in section 3.3 (the "E" steps) MUST be taken. addition, the series of steps in section 3.3 (the "E" steps)
MUST be taken.
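The window adjustments in steps (T.1) and (T.2) can be sketched as follows. The function name is hypothetical and all quantities are assumed to be byte counts; steps (T.3) and (T.4) are not modeled here.

```python
def terminate_extended_limited_transmit(flight_size, flight_size_prev, smss):
    """Return (cwnd, ssthresh) after an ACK advances the cumulative ACK point."""
    # (T.1) Keep cwnd close to the data actually outstanding to avoid a burst.
    cwnd = min(flight_size + smss, flight_size_prev)
    # (T.2) Remember the previous operating point so that the sender can
    # slow start back to it.
    ssthresh = flight_size_prev
    return cwnd, ssthresh


# Little data is outstanding: cwnd is clamped to FlightSize + SMSS.
print(terminate_extended_limited_transmit(4000, 10000, 1000))
# Much data is outstanding: cwnd is capped at FlightSizePrev.
print(terminate_extended_limited_transmit(12000, 10000, 1000))
```

In the first case cwnd (5000) is below ssthresh (10000), so the sender slow starts back toward the window in effect before Extended Limited Transmit.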
3.3. Extended Limited Transmit 3.3. Extended Limited Transmit
On each ACK containing SACK information that arrives after TCP-NCR On each ACK containing SACK information that arrives after TCP-NCR
has entered the Extended Limited Transmit phase (as outlined in has entered the Extended Limited Transmit phase (as outlined in
section 3.1) and before Extended Limited Transmit terminates, the section 3.1) and before Extended Limited Transmit terminates, the
sender MUST use the following procedure. sender MUST use the following procedure.
(E.1) Use the SetPipe () procedure from [RFC3517] to set the "pipe" (E.1) The SetPipe () procedure from [RFC3517] MUST be used to set
variable (which represents the number of bytes still considered the "pipe" variable (which represents the number of bytes
"in the network"). Note: the current value of DupThresh MUST still considered "in the network"). Note: the current value
be used by SetPipe () to produce an accurate assessment of the of DupThresh MUST be used by SetPipe () to produce an accurate
amount of data still considered in the network. assessment of the amount of data still considered in the
network.
(E.2) If the comparison in equation (1) below holds and there are (E.2) If the comparison in equation (1) below holds and there are
SMSS bytes of previously unsent data available for SMSS bytes of previously unsent data available for
transmission then transmit one segment of SMSS bytes. transmission then the sender MUST transmit one segment of SMSS
bytes.
(pipe + Skipped) <= (FlightSizePrev - SMSS) (1) (pipe + Skipped) <= (FlightSizePrev - SMSS) (1)
If the comparison in equation (1) does not hold or no new data If the comparison in equation (1) does not hold or no new data
can be transmitted (due to lack of data from the application can be transmitted (due to lack of data from the application
or the advertised window limit), skip to step (E.6). or the advertised window limit), skip to step (E.6).
(E.3) Increment pipe by SMSS bytes. (E.3) Pipe MUST be incremented by SMSS bytes.
(E.4) If using Careful Limited Transmit, increment Skipped by SMSS (E.4) If using Careful Limited Transmit, Skipped MUST be incremented
bytes to ensure that the next SMSS bytes of SACKed data by SMSS bytes to ensure that the next SMSS bytes of SACKed data
processed do not trigger a Limited Transmit transmission (since processed do not trigger a Limited Transmit transmission
the goal of Careful Limited Transmit is to send upon the (since the goal of Careful Limited Transmit is to send upon
reception of every second duplicate ACK). the reception of every second duplicate ACK).
(E.5) Return to step (E.2) to ensure that as many bytes as (E.5) A TCP MUST return to step (E.2) to ensure that as many bytes
appropriate are transmitted. This provides robustness to ACK as appropriate are transmitted. This provides robustness to
loss that can be (largely) compensated for using SACK ACK loss that can be (largely) compensated for using SACK
information. information.
(E.6) Reset DupThresh via: (E.6) DupThresh MUST be reset via:
DupThresh = max (LT_F * (FlightSize / SMSS),3) DupThresh = max (LT_F * (FlightSize / SMSS),3)
where FlightSize is the total number of bytes that have not where FlightSize is the total number of bytes that have not
been cumulatively acknowledged (which is different from been cumulatively acknowledged (which is different from
"pipe"). "pipe").
3.4 Entering Loss Recovery 3.4 Entering Loss Recovery
When a segment is deemed lost via the algorithms in [RFC3517], When a segment is deemed lost via the algorithms in [RFC3517],
Extended Limited Transmit MUST be terminated, leaving the Extended Limited Transmit MUST be terminated, leaving the
algoritms in [RFC3517] to govern TCP's behavior. One slight algorithms in [RFC3517] to govern TCP's behavior. One slight
change to [RFC3517] MUST be made, however. In section 5, step change to [RFC3517] MUST be made, however. In section 5, step
(2) of [RFC3517] MUST be changed to: (2) of [RFC3517] MUST be changed to:
(2) ssthresh = cwnd = (FlightSizePrev / 2) (2) ssthresh = cwnd = (FlightSizePrev / 2)
This ensures that the congestion control modifications are made This ensures that the congestion control modifications are made
with respect to the amount of data in the network before with respect to the amount of data in the network before
FlightSize was increased by Extended Limited Transmit. FlightSize was increased by Extended Limited Transmit.
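As a small worked illustration of the modified step (2), the sketch below computes the window reduction from FlightSizePrev. The function name is hypothetical, and integer division is an assumption made here to keep the values in whole bytes.

```python
def enter_loss_recovery(flight_size_prev):
    """Modified step (2): halve the pre-Extended-Limited-Transmit flight size."""
    # Assumption: byte counts are integers, so integer division is used.
    ssthresh = cwnd = flight_size_prev // 2
    return ssthresh, cwnd
```

For example, if FlightSizePrev was 10000 bytes and Extended Limited Transmit has since raised FlightSize to 13000 bytes, the modified step yields ssthresh = cwnd = 5000 bytes, whereas halving the inflated FlightSize would yield 6500 bytes.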
Note: Once the algorithm in [RFC3517] takes over from Extended Note: Once the algorithm in [RFC3517] takes over from Extended
skipping to change at page 11, line 47 skipping to change at page 12, line 37
5. Disadvantages 5. Disadvantages
While we note that all of the changes outlined above are implemented While we note that all of the changes outlined above are implemented
in the sender, the receiver also potentially has a part to play. In in the sender, the receiver also potentially has a part to play. In
particular, TCP-NCR increases the receiver's buffering requirement by particular, TCP-NCR increases the receiver's buffering requirement by
up to an extra cwnd -- in the case of the TCP sender using Aggressive up to an extra cwnd -- in the case of the TCP sender using Aggressive
Limited Transmit and actual loss occurring in the network. Limited Transmit and actual loss occurring in the network.
Therefore, to maximize the benefits from TCP-NCR, receivers should Therefore, to maximize the benefits from TCP-NCR, receivers should
advertise a large window to absorb the extra out-of-order traffic. In advertise a large window to absorb the extra out-of-order traffic. In
the case that the additonal buffer requirements are not met, the use the case that the additional buffer requirements are not met, the use
of the above algorithm takes into account the reduced advertised of the above algorithm takes into account the reduced advertised
window. window---with a corresponding loss in robustness to packet
reordering.
In addition, using TCP-NCR could delay the delivery of data to the In addition, using TCP-NCR could delay the delivery of data to the
application by up to one RTT because the fast retransmission point is application by up to one RTT because the fast retransmission point is
delayed by roughly one RTT in TCP-NCR. Applications that are delayed by roughly one RTT in TCP-NCR. Applications that are
sensitive to such delays should turn off the TCP-NCR option. For sensitive to such delays should turn off the TCP-NCR option. For
instance, a socket option could be introduced to allow applications instance, a socket option could be introduced to allow applications
to control whether NCR would be used for a particular connection. to control whether NCR would be used for a particular connection.
Finally, the use of TCP-NCR makes the recovery from congestion events Finally, the use of TCP-NCR makes the recovery from congestion events
sluggish in comparison to the standard reaction in [RFC2581]. [BR04, sluggish in comparison to the standard reaction in [RFC2581]. [BR04,
skipping to change at page 12, line 35 skipping to change at page 13, line 27
happening and mechanisms that try to detect spurious retransmits and happening and mechanisms that try to detect spurious retransmits and
"undo" the needless congestion control state changes that have been "undo" the needless congestion control state changes that have been
taken. taken.
[BA02,ZKFP03] attempt to prevent segment reordering from triggering [BA02,ZKFP03] attempt to prevent segment reordering from triggering
spurious retransmits by using various algorithms to approximate the spurious retransmits by using various algorithms to approximate the
duplicate ACK threshold required to disambiguate loss and reordering duplicate ACK threshold required to disambiguate loss and reordering
over a given network path at a given time. TCP-NCR similarly tries over a given network path at a given time. TCP-NCR similarly tries
to prevent spurious retransmits. However, TCP-NCR takes a simplified to prevent spurious retransmits. However, TCP-NCR takes a simplified
approach compared to those in [BA02,ZKFP03] in that TCP-NCR simply approach compared to those in [BA02,ZKFP03] in that TCP-NCR simply
delays retransmission by a fixed amount (in comparison to standard delays retransmission by an amount based on the current cwnd (in
TCP), while the other schemes use relatively complex algorithms in an comparison to standard TCP), while the other schemes use relatively
attempt to derive a more precise value for DupThresh that depends on complex algorithms in an attempt to derive a more precise value for
the network conditions. While TCP-NCR offers simplicity the other DupThresh that depends on the current patterns of packet reordering.
schemes may offer more precision such that applications would not be While TCP-NCR offers simplicity the other schemes may offer more
forced to wait as long for their retransmissions. Future work could precision such that applications would not be forced to wait as long
be undertaken to achieve robustness without needless delay. for their retransmissions. Future work could be undertaken to
achieve robustness without needless delay.
On the other hand, several schemes have been developed to detect and On the other hand, several schemes have been developed to detect and
mitigate needless retransmissions after the fact. mitigate needless retransmissions after the fact.
[RFC3522,RFC3708,BA02,RFC4015,SK04] present algorithms to detect [RFC3522,RFC3708,BA02,RFC4015,SK04] present algorithms to detect
spurious retransmits and mitigate the changes these events made to spurious retransmits and mitigate the changes these events made to
the congestion control state. TCP-NCR could be used in conjunction the congestion control state. TCP-NCR could be used in conjunction
with these algorithms, with TCP-NCR attempting to prevent spurious with these algorithms, with TCP-NCR attempting to prevent spurious
retransmits and some other scheme kicking in if the prevention retransmits and some other scheme kicking in if the prevention
failed. In addition, we note that TCP-NCR is concentrated on failed. In addition, we note that TCP-NCR is concentrated on
preventing spurious fast retransmits and some of the above algorithms preventing spurious fast retransmits and some of the above algorithms
skipping to change at page 13, line 18 skipping to change at page 14, line 11
We do not believe there are security implications involved with TCP- We do not believe there are security implications involved with TCP-
NCR over and above those for general TCP congestion control NCR over and above those for general TCP congestion control
[RFC2581]. In particular, the Extended Limited Transmit algorithms [RFC2581]. In particular, the Extended Limited Transmit algorithms
specified in this document have been specifically designed not to be specified in this document have been specifically designed not to be
susceptible to the sorts of ACK splitting attacks that TCP's general susceptible to the sorts of ACK splitting attacks that TCP's general
congestion control is vulnerable to (as discussed in [RFC3465]). congestion control is vulnerable to (as discussed in [RFC3465]).
8. Acknowledgements 8. Acknowledgements
Ted Faber, Wesley Eddy, Gorry Fairhurst, Sally Floyd, Sara Landstrom, Feedback from Lars Eggert, Ted Faber, Wesley Eddy, Gorry Fairhurst,
Nauzad Sadry, Pasi Sarolahti, Joe Touch and Nitin Vaidya as well as Sally Floyd, Sara Landstrom, Nauzad Sadry, Pasi Sarolahti, Joe Touch
feedback from the TCPM working group have contributed significantly and Nitin Vaidya and the TCPM working group have contributed
to this document. Our thanks to all! significantly to this document. Our thanks to all!
9. Normative References 9. IANA Considerations
This document requires no IANA assignments. The RFC Editor can
safely remove this section.
10. Normative References
[RFC793] J. Postel, "Transmission Control Protocol", RFC 793, [RFC793] J. Postel, "Transmission Control Protocol", RFC 793,
September 1981. September 1981.
[RFC2018] M. Mathis, J. Mahdavi, S. Floyd and A. Romanow, "TCP [RFC2018] M. Mathis, J. Mahdavi, S. Floyd and A. Romanow, "TCP
selective acknowledgment options," Internet RFC 2018. selective acknowledgment options," Internet RFC 2018.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2581] M. Allman, V. Paxson, and W. Stevens, "TCP Congestion [RFC2581] M. Allman, V. Paxson, and W. Stevens, "TCP Congestion
Control", RFC 2581, April 1999. Control", RFC 2581, April 1999.
[RFC3042] M. Allman, H. Balakrishnan and S. Floyd, "Enhancing TCP's [RFC3042] M. Allman, H. Balakrishnan and S. Floyd, "Enhancing TCP's
Loss Recovery Using Limited Transmit", RFC 3042, January 2001. Loss Recovery Using Limited Transmit", RFC 3042, January 2001.
[RFC3517] E. Blanton, M. Allman, K. Fall and L. Wang, "A Conservative [RFC3517] E. Blanton, M. Allman, K. Fall and L. Wang, "A Conservative
Selective Acknowledgment (SACK)-based Loss Recovery Algorithm for Selective Acknowledgment (SACK)-based Loss Recovery Algorithm for
TCP", RFC 3517, April 2003. TCP", RFC 3517, April 2003.
10. Informative References 11. Informative References
[BA02] E. Blanton and M. Allman, "On Making TCP More Robust to Packet [BA02] E. Blanton and M. Allman, "On Making TCP More Robust to Packet
Reordering," ACM Computer Communication Review, January 2002. Reordering," ACM Computer Communication Review, January 2002.
[BBFS01] D. Bansal, H. Balakrishnan, S. Floyd and S. Shenker, [BBFS01] D. Bansal, H. Balakrishnan, S. Floyd and S. Shenker,
"Dynamic Behavior of Slowly Responsive Congestion Control "Dynamic Behavior of Slowly Responsive Congestion Control
Algorithms", Proceedings of ACM SIGCOMM, Sep. 2001. Algorithms", Proceedings of ACM SIGCOMM, Sep. 2001.
[BPS99] J. Bennett, C. Partridge, and N. Shectman, "Packet reordering [BPS99] J. Bennett, C. Partridge, and N. Shectman, "Packet reordering
is not pathological network behavior," IEEE/ACM Transactions on is not pathological network behavior," IEEE/ACM Transactions on
skipping to change at page 15, line 35 skipping to change at page 16, line 34
[SK04] P. Sarolahti, M. Kojo, "Forward RTO-Recovery (F-RTO): An [SK04] P. Sarolahti, M. Kojo, "Forward RTO-Recovery (F-RTO): An
Algorithm for Detecting Spurious Retransmission Timeouts with TCP and Algorithm for Detecting Spurious Retransmission Timeouts with TCP and
SCTP", Internet-Draft draft-ietf-tcpm-frto-02.txt (work in progress). SCTP", Internet-Draft draft-ietf-tcpm-frto-02.txt (work in progress).
November 2004. November 2004.
[ZKFP03] M. Zhang, B. Karp, S. Floyd, L. Peterson, "RR-TCP: A [ZKFP03] M. Zhang, B. Karp, S. Floyd, L. Peterson, "RR-TCP: A
Reordering-Robust TCP with DSACK", in Proceedings of the Eleventh Reordering-Robust TCP with DSACK", in Proceedings of the Eleventh
IEEE International Conference on Networking Protocols (ICNP 2003), IEEE International Conference on Networking Protocols (ICNP 2003),
Atlanta, GA, November, 2003. Atlanta, GA, November, 2003.
11. Authors' Addresses 12. Authors' Addresses
Sumitha Bhandarkar Sumitha Bhandarkar
Dept. of Elec. Engg. Dept. of Elec. Engg.
214 ZACH 214 ZACH
College Station, TX 77843-3128 College Station, TX 77843-3128
Phone: (512) 468-8078 Phone: (512) 468-8078
Email: sumitha@tamu.edu Email: sumitha@tamu.edu
URL : http://students.cs.tamu.edu/sumitha/ URL : http://students.cs.tamu.edu/sumitha/
A. L. Narasimha Reddy A. L. Narasimha Reddy
skipping to change at page 17, line 7 skipping to change at page 18, line 7
This document and the information contained herein are provided on an This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright Statement Copyright Statement
Copyright (C) The Internet Society (2005). This document is subject Copyright (C) The Internet Society (2006). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights. except as set forth therein, the authors retain all their rights.
Acknowledgment Acknowledgment
Funding for the RFC Editor function is currently provided by the Funding for the RFC Editor function is currently provided by the
Internet Society. Internet Society.
 End of changes. 39 change blocks. 
109 lines changed or deleted 155 lines changed or added

This html diff was produced by rfcdiff 1.29, available from http://www.levkowetz.com/ietf/tools/rfcdiff/