draft-ietf-tcpm-rfc2581bis-07.txt   rfc5681.txt 
Network Working Group M. Allman Network Working Group M. Allman
Internet-Draft V. Paxson Request for Comments: 5681 V. Paxson
Obsoletes: 2581 ICSI Obsoletes: 2581 ICSI
Intended status: Draft Standard E. Blanton Category: Standards Track E. Blanton
Expires: January 27 2010 Purdue University Purdue University
July 27 2009 September 2009
TCP Congestion Control TCP Congestion Control
draft-ietf-tcpm-rfc2581bis-07.txt
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with
the provisions of BCP 78 and BCP 79. This document may contain
material from IETF Documents or IETF Contributions published or made
publicly available before November 10, 2008. The person(s)
controlling the copyright in some of this material may not have
granted the IETF Trust the right to allow modifications of such
material outside the IETF Standards Process. Without obtaining an
adequate license from the person(s) controlling the copyright in
such materials, this document may not be modified outside the IETF
Standards Process, and derivative works of it may not be created
outside the IETF Standards Process, except to format it for
publication as an RFC or to translate it into languages other than
English.
Internet-Drafts are working documents of the Internet Engineering Abstract
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six This document defines TCP's four intertwined congestion control
months and may be updated, replaced, or obsoleted by other documents algorithms: slow start, congestion avoidance, fast retransmit, and
at any time. It is inappropriate to use Internet-Drafts as fast recovery. In addition, the document specifies how TCP should
reference material or to cite them other than as "work in progress." begin transmission after a relatively long idle period, as well as
discussing various acknowledgment generation methods. This document
obsoletes RFC 2581.
The list of current Internet-Drafts can be accessed at Status of This Memo
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at This document specifies an Internet standards track protocol for the
http://www.ietf.org/shadow.html. Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.
Copyright Statement Copyright Notice
Copyright (c) 2009 IETF Trust and the persons identified as the Copyright (c) 2009 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents in effect on the date of Provisions Relating to IETF Documents in effect on the date of
publication of this document (http://trustee.ietf.org/license-info). publication of this document (http://trustee.ietf.org/license-info).
Please review these documents carefully, as they describe your Please review these documents carefully, as they describe your rights
rights and restrictions with respect to this document. and restrictions with respect to this document.
This document may contain material from IETF Documents or IETF This document may contain material from IETF Documents or IETF
Contributions published or made publicly available before November Contributions published or made publicly available before November
10, 2008. The person(s) controlling the copyright in some of this 10, 2008. The person(s) controlling the copyright in some of this
material may not have granted the IETF Trust the right to allow material may not have granted the IETF Trust the right to allow
modifications of such material outside the IETF Standards Process. modifications of such material outside the IETF Standards Process.
Without obtaining an adequate license from the person(s) controlling Without obtaining an adequate license from the person(s) controlling
the copyright in such materials, this document may not be modified the copyright in such materials, this document may not be modified
outside the IETF Standards Process, and derivative works of it may outside the IETF Standards Process, and derivative works of it may
not be created outside the IETF Standards Process, except to format not be created outside the IETF Standards Process, except to format
it for publication as an RFC or to translate it into languages other it for publication as an RFC or to translate it into languages other
than English. than English.
Abstract
This document defines TCP's four intertwined congestion control
algorithms: slow start, congestion avoidance, fast retransmit, and
fast recovery. In addition, the document specifies how TCP should
begin transmission after a relatively long idle period, as well as
discussing various acknowledgment generation methods. This document
obsoletes RFC 2581.
Table Of Contents Table Of Contents
1. Introduction. . . . . . . . . . . . . . . . . 2 1. Introduction ....................................................2
2. Definitions . . . . . . . . . . . . . . . . . 3 2. Definitions .....................................................3
3. Congestion Control Algorithms . . . . . . . . 4 3. Congestion Control Algorithms ...................................4
3.1 Slow Start and Congestion Avoidance . . . . . 4 3.1. Slow Start and Congestion Avoidance ........................4
3.2 Fast Retransmit/Fast Recovery . . . . . . . . 7 3.2. Fast Retransmit/Fast Recovery ..............................8
4. Additional Considerations . . . . . . . . . . 9 4. Additional Considerations ......................................10
4.1 Re-starting Idle Connections. . . . . . . . . 9 4.1. Restarting Idle Connections ...............................10
4.2 Generating Acknowledgments. . . . . . . . . . 10 4.2. Generating Acknowledgments ................................11
4.3 Loss Recovery Mechanisms. . . . . . . . . . . 11 4.3. Loss Recovery Mechanisms ..................................12
5. Security Considerations . . . . . . . . . . . 12 5. Security Considerations ........................................13
6. Changes Between RFC 2001 and RFC 2581 . . . . 12 6. Changes between RFC 2001 and RFC 2581 ..........................13
7. Changes Relative to RFC 2581. . . . . . . . . 12 7. Changes Relative to RFC 2581 ...................................14
8. IANA Considerations . . . . . . . . . . . . . 13 8. Acknowledgments ................................................15
9. References .....................................................15
9.1. Normative References ......................................15
9.2. Informative References ....................................16
1. Introduction 1. Introduction
This document specifies four TCP [RFC793] congestion control This document specifies four TCP [RFC793] congestion control
algorithms: slow start, congestion avoidance, fast retransmit and algorithms: slow start, congestion avoidance, fast retransmit and
fast recovery. These algorithms were devised in [Jac88] and fast recovery. These algorithms were devised in [Jac88] and [Jac90].
[Jac90]. Their use with TCP is standardized in [RFC1122]. Their use with TCP is standardized in [RFC1122]. Additional early
Additional early work in additive-increase, multiplicative-decrease work in additive-increase, multiplicative-decrease congestion control
congestion control is given in [CJ89]. is given in [CJ89].
Note that [Ste94] provides examples of these algorithms in action Note that [Ste94] provides examples of these algorithms in action and
and [WS95] provides an explanation of the source code for the BSD [WS95] provides an explanation of the source code for the BSD
implementation of these algorithms. implementation of these algorithms.
In addition to specifying these congestion control algorithms, this In addition to specifying these congestion control algorithms, this
document specifies what TCP connections should do after a relatively document specifies what TCP connections should do after a relatively
long idle period, as well as specifying and clarifying some of the long idle period, as well as specifying and clarifying some of the
issues pertaining to TCP ACK generation. issues pertaining to TCP ACK generation.
This document obsoletes [RFC2581], which in turn obsoleted This document obsoletes [RFC2581], which in turn obsoleted [RFC2001].
[RFC2001].
This document is organized as follows. Section 2 provides various This document is organized as follows. Section 2 provides various
definitions which will be used throughout the document. Section 3 definitions that will be used throughout the document. Section 3
provides a specification of the congestion control provides a specification of the congestion control algorithms.
algorithms. Section 4 outlines concerns related to the congestion Section 4 outlines concerns related to the congestion control
control algorithms and finally, section 5 outlines security algorithms and finally, section 5 outlines security considerations.
considerations.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119]. document are to be interpreted as described in [RFC2119].
2. Definitions 2. Definitions
This section provides the definition of several terms that will be This section provides the definition of several terms that will be
used throughout the remainder of this document. used throughout the remainder of this document.
skipping to change at page 3, line 37 skipping to change at page 3, line 27
SENDER MAXIMUM SEGMENT SIZE (SMSS): The SMSS is the size of the SENDER MAXIMUM SEGMENT SIZE (SMSS): The SMSS is the size of the
largest segment that the sender can transmit. This value can be largest segment that the sender can transmit. This value can be
based on the maximum transmission unit of the network, the path based on the maximum transmission unit of the network, the path
MTU discovery [RFC1191,RFC4821] algorithm, RMSS (see next item), MTU discovery [RFC1191,RFC4821] algorithm, RMSS (see next item),
or other factors. The size does not include the TCP/IP headers or other factors. The size does not include the TCP/IP headers
and options. and options.
RECEIVER MAXIMUM SEGMENT SIZE (RMSS): The RMSS is the size of the RECEIVER MAXIMUM SEGMENT SIZE (RMSS): The RMSS is the size of the
largest segment the receiver is willing to accept. This is the largest segment the receiver is willing to accept. This is the
value specified in the MSS option sent by the receiver during value specified in the MSS option sent by the receiver during
connection startup. Or, if the MSS option is not used, 536 connection startup. Or, if the MSS option is not used, it is 536
bytes [RFC1122]. The size does not include the TCP/IP headers bytes [RFC1122]. The size does not include the TCP/IP headers and
and options. options.
FULL-SIZED SEGMENT: A segment that contains the maximum number of FULL-SIZED SEGMENT: A segment that contains the maximum number of
data bytes permitted (i.e., a segment containing SMSS bytes of data bytes permitted (i.e., a segment containing SMSS bytes of
data). data).
RECEIVER WINDOW (rwnd): The most recently advertised receiver RECEIVER WINDOW (rwnd): The most recently advertised receiver window.
window.
CONGESTION WINDOW (cwnd): A TCP state variable that limits the CONGESTION WINDOW (cwnd): A TCP state variable that limits the amount
amount of data a TCP can send. At any given time, a TCP MUST of data a TCP can send. At any given time, a TCP MUST NOT send
NOT send data with a sequence number higher than the sum of the data with a sequence number higher than the sum of the highest
highest acknowledged sequence number and the minimum of cwnd and acknowledged sequence number and the minimum of cwnd and rwnd.
rwnd.
INITIAL WINDOW (IW): The initial window is the size of the sender's INITIAL WINDOW (IW): The initial window is the size of the sender's
congestion window after the three-way handshake is completed. congestion window after the three-way handshake is completed.
LOSS WINDOW (LW): The loss window is the size of the congestion LOSS WINDOW (LW): The loss window is the size of the congestion
window after a TCP sender detects loss using its retransmission window after a TCP sender detects loss using its retransmission
timer. timer.
RESTART WINDOW (RW): The restart window is the size of the RESTART WINDOW (RW): The restart window is the size of the congestion
congestion window after a TCP restarts transmission after an window after a TCP restarts transmission after an idle period (if
idle period (if the slow start algorithm is used; see section the slow start algorithm is used; see section 4.1 for more
4.1 for more discussion). discussion).
FLIGHT SIZE: The amount of data that has been sent but not yet FLIGHT SIZE: The amount of data that has been sent but not yet
cumulatively acknowledged. cumulatively acknowledged.
DUPLICATE ACKNOWLEDGMENT: An acknowledgment is considered a DUPLICATE ACKNOWLEDGMENT: An acknowledgment is considered a
"duplicate" in the following algorithms when (a) the receiver of "duplicate" in the following algorithms when (a) the receiver of
the ACK has outstanding data, (b) the incoming acknowledgment the ACK has outstanding data, (b) the incoming acknowledgment
carries no data, (c) the SYN and FIN bits are both off, (d) the carries no data, (c) the SYN and FIN bits are both off, (d) the
acknowledgment number is equal to the greatest acknowledgment acknowledgment number is equal to the greatest acknowledgment
received on the given connection (TCP.UNA from [RFC793]) and (e) received on the given connection (TCP.UNA from [RFC793]) and (e)
the advertised window in the incoming acknowledgment equals the the advertised window in the incoming acknowledgment equals the
advertised window in the last incoming acknowledgment. advertised window in the last incoming acknowledgment.
Alternatively, a TCP that utilizes selective acknowledgments Alternatively, a TCP that utilizes selective acknowledgments
[RFC2018,RFC2883] can leverage the SACK information to determine (SACKs) [RFC2018, RFC2883] can leverage the SACK information to
when an incoming ACK is a "duplicate" (e.g., if the ACK contains determine when an incoming ACK is a "duplicate" (e.g., if the ACK
previously unknown SACK information). contains previously unknown SACK information).
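As an illustration of the definitions above, the following minimal sketch expresses the cwnd/rwnd sending limit and the five duplicate-ACK conditions (a) through (e) as code. The SenderState fields and function names are hypothetical, chosen for this example only, and sequence-number wraparound is deliberately ignored to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass
class SenderState:
    """Hypothetical per-connection sender state (wraparound ignored)."""
    snd_una: int  # highest cumulatively acknowledged sequence number
    snd_nxt: int  # next sequence number to be sent
    cwnd: int     # congestion window, in bytes
    rwnd: int     # most recently advertised receiver window, in bytes


def flight_size(s: SenderState) -> int:
    """FLIGHT SIZE: data sent but not yet cumulatively acknowledged."""
    return s.snd_nxt - s.snd_una


def usable_window(s: SenderState) -> int:
    """Bytes that may still be sent without exceeding
    snd_una + min(cwnd, rwnd)."""
    limit = s.snd_una + min(s.cwnd, s.rwnd)
    return max(0, limit - s.snd_nxt)


def is_duplicate_ack(s: SenderState, ack_no: int, payload_len: int,
                     syn: bool, fin: bool, adv_window: int) -> bool:
    """Conditions (a)-(e) of the DUPLICATE ACKNOWLEDGMENT definition."""
    return (flight_size(s) > 0         # (a) sender has outstanding data
            and payload_len == 0       # (b) the ACK carries no data
            and not syn and not fin    # (c) SYN and FIN are both off
            and ack_no == s.snd_una    # (d) ACK number equals greatest ACK seen
            and adv_window == s.rwnd)  # (e) advertised window is unchanged


state = SenderState(snd_una=1000, snd_nxt=3000, cwnd=4000, rwnd=8000)
print(usable_window(state))                                   # 2000
print(is_duplicate_ack(state, 1000, 0, False, False, 8000))   # True
```

The SACK-based alternative mentioned in the definition is not covered by this sketch.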
3. Congestion Control Algorithms 3. Congestion Control Algorithms
This section defines the four congestion control algorithms: slow This section defines the four congestion control algorithms: slow
start, congestion avoidance, fast retransmit and fast recovery, start, congestion avoidance, fast retransmit, and fast recovery,
developed in [Jac88] and [Jac90]. In some situations it may be developed in [Jac88] and [Jac90]. In some situations, it may be
beneficial for a TCP sender to be more conservative than the beneficial for a TCP sender to be more conservative than the
algorithms allow, however a TCP MUST NOT be more aggressive than the algorithms allow; however, a TCP MUST NOT be more aggressive than the
following algorithms allow (that is, MUST NOT send data when the following algorithms allow (that is, MUST NOT send data when the
value of cwnd computed by the following algorithms would not allow value of cwnd computed by the following algorithms would not allow
the data to be sent). the data to be sent).
Also note that the algorithms specified in this document work in Also, note that the algorithms specified in this document work in
terms of using loss as the signal of congestion. Explicit terms of using loss as the signal of congestion. Explicit Congestion
Congestion Notification (ECN) could also be used as specified in Notification (ECN) could also be used as specified in [RFC3168].
[RFC3168].
3.1 Slow Start and Congestion Avoidance 3.1. Slow Start and Congestion Avoidance
The slow start and congestion avoidance algorithms MUST be used by a The slow start and congestion avoidance algorithms MUST be used by a
TCP sender to control the amount of outstanding data being injected TCP sender to control the amount of outstanding data being injected
into the network. To implement these algorithms, two variables are into the network. To implement these algorithms, two variables are
added to the TCP per-connection state. The congestion window (cwnd) added to the TCP per-connection state. The congestion window (cwnd)
is a sender-side limit on the amount of data the sender can transmit is a sender-side limit on the amount of data the sender can transmit
into the network before receiving an acknowledgment (ACK), while the into the network before receiving an acknowledgment (ACK), while the
receiver's advertised window (rwnd) is a receiver-side limit on the receiver's advertised window (rwnd) is a receiver-side limit on the
amount of outstanding data. The minimum of cwnd and rwnd governs amount of outstanding data. The minimum of cwnd and rwnd governs
data transmission. data transmission.
Another state variable, the slow start threshold (ssthresh), is used Another state variable, the slow start threshold (ssthresh), is used
to determine whether the slow start or congestion avoidance to determine whether the slow start or congestion avoidance algorithm
algorithm is used to control data transmission, as discussed below. is used to control data transmission, as discussed below.
Beginning transmission into a network with unknown conditions Beginning transmission into a network with unknown conditions
requires TCP to slowly probe the network to determine the available requires TCP to slowly probe the network to determine the available
capacity, in order to avoid congesting the network with an capacity, in order to avoid congesting the network with an
inappropriately large burst of data. The slow start algorithm is inappropriately large burst of data. The slow start algorithm is
used for this purpose at the beginning of a transfer, or after used for this purpose at the beginning of a transfer, or after
repairing loss detected by the retransmission timer. Slow start repairing loss detected by the retransmission timer. Slow start
additionally serves to start the "ACK clock" used by the TCP sender additionally serves to start the "ACK clock" used by the TCP sender
to release data into the network in the slow start, congestion to release data into the network in the slow start, congestion
avoidance, and loss recovery algorithms. avoidance, and loss recovery algorithms.
skipping to change at page 5, line 45 skipping to change at page 5, line 44
When initial congestion windows of more than one segment are When initial congestion windows of more than one segment are
implemented along with Path MTU Discovery [RFC1191], and the MSS implemented along with Path MTU Discovery [RFC1191], and the MSS
being used is found to be too large, the congestion window cwnd being used is found to be too large, the congestion window cwnd
SHOULD be reduced to prevent large bursts of smaller segments. SHOULD be reduced to prevent large bursts of smaller segments.
Specifically, cwnd SHOULD be reduced by the ratio of the old segment Specifically, cwnd SHOULD be reduced by the ratio of the old segment
size to the new segment size. size to the new segment size.
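One possible reading of this reduction is that cwnd is scaled so that it covers the same number of segments at the new, smaller segment size. The sketch below assumes that reading; the function name and the choice to leave cwnd unchanged when the segment size does not shrink are assumptions of this example, not part of the specification.

```python
def reduce_cwnd_for_smaller_mss(cwnd: int, old_mss: int, new_mss: int) -> int:
    """Scale cwnd (in bytes) down by old_mss/new_mss, so the window
    holds the same number of segments at the new segment size."""
    if new_mss >= old_mss:
        return cwnd  # assumption: only shrink when the segment size went down
    return cwnd * new_mss // old_mss


# A 3-segment window at 1460 bytes per segment, after the MSS drops to 536.
cwnd = reduce_cwnd_for_smaller_mss(3 * 1460, old_mss=1460, new_mss=536)
print(cwnd)  # 1608, i.e., 3 segments of 536 bytes
```

Without the reduction, the same 4380-byte window would permit a burst of eight 536-byte segments rather than three.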
The initial value of ssthresh SHOULD be set arbitrarily high (e.g., The initial value of ssthresh SHOULD be set arbitrarily high (e.g.,
to the size of the largest possible advertised window), but ssthresh to the size of the largest possible advertised window), but ssthresh
MUST be reduced in response to congestion. Setting ssthresh as high MUST be reduced in response to congestion. Setting ssthresh as high
as possible allows the network conditions, rather than some as possible allows the network conditions, rather than some arbitrary
arbitrary host limit, to dictate the sending rate. In cases where host limit, to dictate the sending rate. In cases where the end
the end systems have a solid understanding of the network path, more systems have a solid understanding of the network path, more
carefully setting the initial ssthresh value may have merit (e.g., carefully setting the initial ssthresh value may have merit (e.g.,
such that the end host does not create congestion along the path). such that the end host does not create congestion along the path).
The slow start algorithm is used when cwnd < ssthresh, while the The slow start algorithm is used when cwnd < ssthresh, while the
congestion avoidance algorithm is used when cwnd > ssthresh. When congestion avoidance algorithm is used when cwnd > ssthresh. When
cwnd and ssthresh are equal the sender may use either slow start or cwnd and ssthresh are equal, the sender may use either slow start or
congestion avoidance. congestion avoidance.
During slow start, a TCP increments cwnd by at most SMSS bytes for During slow start, a TCP increments cwnd by at most SMSS bytes for
each ACK received that cumulatively acknowledges new data. Slow each ACK received that cumulatively acknowledges new data. Slow
start ends when cwnd exceeds ssthresh (or, optionally, when it start ends when cwnd exceeds ssthresh (or, optionally, when it
reaches it, as noted above) or when congestion is observed. While reaches it, as noted above) or when congestion is observed. While
traditionally TCP implementations have increased cwnd by precisely traditionally TCP implementations have increased cwnd by precisely
SMSS bytes upon receipt of an ACK covering new data, we RECOMMEND SMSS bytes upon receipt of an ACK covering new data, we RECOMMEND
that TCP implementations increase cwnd, per: that TCP implementations increase cwnd, per:
cwnd += min (N, SMSS) (2) cwnd += min (N, SMSS) (2)
where N is the number of previously unacknowledged bytes where N is the number of previously unacknowledged bytes acknowledged
acknowledged in the incoming ACK. This adjustment is part of in the incoming ACK. This adjustment is part of Appropriate Byte
Appropriate Byte Counting [RFC3465] and provides robustness against Counting [RFC3465] and provides robustness against misbehaving
misbehaving receivers which may attempt to induce a sender to receivers that may attempt to induce a sender to artificially inflate
artificially inflate cwnd using a mechanism known as "ACK Division" cwnd using a mechanism known as "ACK Division" [SCWA99]. ACK
[SCWA99]. ACK Division consists of a receiver sending multiple ACKs Division consists of a receiver sending multiple ACKs for a single
for a single TCP data segment, each acknowledging only a portion of TCP data segment, each acknowledging only a portion of its data. A
its data. A TCP that increments cwnd by SMSS for each such ACK will TCP that increments cwnd by SMSS for each such ACK will
inappropriately inflate the amount of data injected into the inappropriately inflate the amount of data injected into the network.
network.
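The effect of equation (2) under ACK Division can be sketched as follows. The example compares the increase per equation (2) with the traditional "SMSS per ACK" increase when a receiver acknowledges a single 1460-byte segment in three pieces. The function names and the specific byte counts are illustrative assumptions.

```python
def slow_start_increase(cwnd: int, bytes_newly_acked: int, smss: int) -> int:
    """Equation (2): cwnd += min(N, SMSS)."""
    return cwnd + min(bytes_newly_acked, smss)


def traditional_increase(cwnd: int, smss: int) -> int:
    """Traditional behavior: cwnd += SMSS for every ACK covering new data."""
    return cwnd + smss


SMSS = 1460
start = 2 * SMSS
partial_acks = [500, 500, 460]  # one segment acknowledged in three pieces

with_eq2 = traditional = start
for n in partial_acks:
    with_eq2 = slow_start_increase(with_eq2, n, SMSS)
    traditional = traditional_increase(traditional, SMSS)

print(with_eq2 - start)     # 1460: growth bounded by the bytes actually acked
print(traditional - start)  # 4380: three segments of growth for one segment of data
```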
During congestion avoidance, cwnd is incremented by roughly 1 During congestion avoidance, cwnd is incremented by roughly 1 full-
full-sized segment per round-trip time (RTT). Congestion avoidance sized segment per round-trip time (RTT). Congestion avoidance
continues until congestion is detected. The basic guidelines for continues until congestion is detected. The basic guidelines for
incrementing cwnd during congestion avoidance are: incrementing cwnd during congestion avoidance are:
* MAY increment cwnd by SMSS bytes * MAY increment cwnd by SMSS bytes
* SHOULD increment cwnd per equation (2) once per RTT * SHOULD increment cwnd per equation (2) once per RTT
* MUST NOT increment cwnd by more than SMSS bytes * MUST NOT increment cwnd by more than SMSS bytes
We note that [RFC3465] allows for cwnd increases of more than SMSS We note that [RFC3465] allows for cwnd increases of more than SMSS
bytes for incoming acknowledgments during slow start on an bytes for incoming acknowledgments during slow start on an
experimental basis, however such behavior is not allowed as part of experimental basis; however, such behavior is not allowed as part of
the standard. the standard.
The RECOMMENDED way to increase cwnd during congestion avoidance is The RECOMMENDED way to increase cwnd during congestion avoidance is
to count the number of bytes that have been acknowledged by ACKs for to count the number of bytes that have been acknowledged by ACKs for
new data. (A drawback of this implementation is that it requires new data. (A drawback of this implementation is that it requires
maintaining an additional state variable.) When the number of bytes maintaining an additional state variable.) When the number of bytes
acknowledged reaches cwnd, then cwnd can be incremented by up to acknowledged reaches cwnd, then cwnd can be incremented by up to SMSS
SMSS bytes. Note that during congestion avoidance, cwnd MUST NOT be bytes. Note that during congestion avoidance, cwnd MUST NOT be
increased by more than SMSS bytes per RTT. This method both allows increased by more than SMSS bytes per RTT. This method both allows
TCPs to increase cwnd by one segment per RTT in the face of delayed TCPs to increase cwnd by one segment per RTT in the face of delayed
ACKs and provides robustness against ACK Division attacks. ACKs and provides robustness against ACK Division attacks.
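A minimal sketch of the byte-counting approach described above follows. The class and attribute names are hypothetical. Subtracting cwnd from the counter (rather than resetting it to zero) when the threshold is reached is an assumption of this sketch, modeled on the description of Appropriate Byte Counting in [RFC3465].

```python
class ByteCountingAvoidance:
    """Congestion avoidance via a bytes-acknowledged counter (sketch)."""

    def __init__(self, cwnd: int, smss: int) -> None:
        self.cwnd = cwnd
        self.smss = smss
        self.bytes_acked = 0  # the additional state variable

    def on_new_ack(self, newly_acked: int) -> None:
        """Process an ACK that cumulatively acknowledges new data."""
        self.bytes_acked += newly_acked
        if self.bytes_acked >= self.cwnd:
            # Roughly one window (one RTT) of data has been acknowledged.
            self.bytes_acked -= self.cwnd  # assumption: carry the remainder
            self.cwnd += self.smss         # at most SMSS bytes per RTT


ca = ByteCountingAvoidance(cwnd=10 * 1460, smss=1460)
for _ in range(5):       # five delayed ACKs, each covering two segments
    ca.on_new_ack(2 * 1460)
print(ca.cwnd)           # 16060: one SMSS added after a full window is acked
```

Because the counter tracks bytes rather than ACKs, the five delayed ACKs in the usage example produce the same growth as ten per-segment ACKs would.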
Another common formula that a TCP MAY use to update cwnd during Another common formula that a TCP MAY use to update cwnd during
congestion avoidance is given in equation 3: congestion avoidance is given in equation (3):
cwnd += SMSS*SMSS/cwnd (3) cwnd += SMSS*SMSS/cwnd (3)
This adjustment is executed on every incoming ACK that acknowledges This adjustment is executed on every incoming ACK that acknowledges
new data. Equation (3) provides an acceptable approximation to the new data. Equation (3) provides an acceptable approximation to the
underlying principle of increasing cwnd by 1 full-sized segment per underlying principle of increasing cwnd by 1 full-sized segment per
RTT. (Note that for a connection in which the receiver is RTT. (Note that for a connection in which the receiver is
acknowledging every-other packet, (3) is less aggressive than acknowledging every-other packet, (3) is less aggressive than allowed
allowed -- roughly increasing cwnd every second RTT.) -- roughly increasing cwnd every second RTT.)
Implementation Note: Since integer arithmetic is usually used in TCP Implementation Note: Since integer arithmetic is usually used in TCP
implementations, the formula given in equation 3 can fail to implementations, the formula given in equation (3) can fail to
increase cwnd when the congestion window is larger than SMSS*SMSS. increase cwnd when the congestion window is larger than SMSS*SMSS.
If the above formula yields 0, the result SHOULD be rounded up to 1 If the above formula yields 0, the result SHOULD be rounded up to 1
byte. byte.
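The integer-arithmetic issue noted above can be illustrated with a short sketch. The function name is hypothetical; the rounding to a minimum of 1 byte follows the SHOULD in the implementation note.

```python
def avoidance_increment(cwnd: int, smss: int) -> int:
    """Per-ACK increment from equation (3) using integer division,
    rounded up to 1 byte when the division yields 0."""
    return max(1, smss * smss // cwnd)


SMSS = 1460
print(avoidance_increment(10 * SMSS, SMSS))   # 146
print(SMSS * SMSS // 3_000_000)               # 0: cwnd exceeds SMSS*SMSS
print(avoidance_increment(3_000_000, SMSS))   # 1: rounded up to 1 byte
```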
Implementation Note: Older implementations have an additional Implementation Note: Older implementations have an additional
additive constant on the right-hand side of equation (3). This is additive constant on the right-hand side of equation (3). This is
incorrect and can actually lead to diminished performance [RFC2525]. incorrect and can actually lead to diminished performance [RFC2525].
Implementation Note: Some implementations maintain cwnd in units of Implementation Note: Some implementations maintain cwnd in units of
bytes, while others in units of full-sized segments. The latter bytes, while others in units of full-sized segments. The latter will
will find equation (3) difficult to use, and may prefer to use the find equation (3) difficult to use, and may prefer to use the
counting approach discussed in the previous paragraph. counting approach discussed in the previous paragraph.
When a TCP sender detects segment loss using the retransmission When a TCP sender detects segment loss using the retransmission timer
timer and the given segment has not yet been resent by way of the and the given segment has not yet been resent by way of the
retransmission timer, the value of ssthresh MUST be set to no more retransmission timer, the value of ssthresh MUST be set to no more
than the value given in equation 4: than the value given in equation (4):
ssthresh = max (FlightSize / 2, 2*SMSS) (4) ssthresh = max (FlightSize / 2, 2*SMSS) (4)
where, as discussed above, FlightSize is the amount of outstanding where, as discussed above, FlightSize is the amount of outstanding
data in the network. data in the network.
On the other hand, when a TCP sender detects segment loss using the On the other hand, when a TCP sender detects segment loss using the
retransmission timer and the given segment has already been retransmission timer and the given segment has already been
retransmitted by way of the retransmission timer at least once, the retransmitted by way of the retransmission timer at least once, the
value of ssthresh is held constant. value of ssthresh is held constant.
Implementation Note: An easy mistake to make is to simply use cwnd, Implementation Note: An easy mistake to make is to simply use cwnd,
rather than FlightSize, which in some implementations may rather than FlightSize, which in some implementations may
incidentally increase well beyond rwnd. incidentally increase well beyond rwnd.
Furthermore, upon a timeout (as specified in [RFC2988]) cwnd MUST be Furthermore, upon a timeout (as specified in [RFC2988]) cwnd MUST be
set to no more than the loss window, LW, which equals 1 full-sized set to no more than the loss window, LW, which equals 1 full-sized
segment (regardless of the value of IW). Therefore, after segment (regardless of the value of IW). Therefore, after
retransmitting the dropped segment the TCP sender uses the slow retransmitting the dropped segment the TCP sender uses the slow start
start algorithm to increase the window from 1 full-sized segment to algorithm to increase the window from 1 full-sized segment to the new
the new value of ssthresh, at which point congestion avoidance again value of ssthresh, at which point congestion avoidance again takes
takes over. over.
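The response to a retransmission timeout described above can be sketched as follows. The function and parameter names are hypothetical, and the sketch sets ssthresh and cwnd to the largest values the text permits; an implementation may choose smaller values.

```python
def on_retransmission_timeout(flight_size: int, smss: int, ssthresh: int,
                              already_retransmitted: bool) -> tuple[int, int]:
    """Return (cwnd, ssthresh) after the retransmission timer fires."""
    if not already_retransmitted:
        # First timeout for this segment: equation (4), using FlightSize
        # (not cwnd) as the basis for the reduction.
        ssthresh = max(flight_size // 2, 2 * smss)
    # Otherwise ssthresh is held constant.
    cwnd = smss  # loss window (LW): 1 full-sized segment
    return cwnd, ssthresh


SMSS = 1460
cwnd, ssthresh = on_retransmission_timeout(10 * SMSS, SMSS, 65535, False)
print(cwnd, ssthresh)   # 1460 7300

# The same segment times out again: ssthresh stays at 7300.
cwnd, ssthresh = on_retransmission_timeout(SMSS, SMSS, ssthresh, True)
print(cwnd, ssthresh)   # 1460 7300
```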
As shown in [FF96,RFC3782], slow start-based loss recovery after a As shown in [FF96] and [RFC3782], slow-start-based loss recovery
timeout can cause spurious retransmissions that trigger duplicate after a timeout can cause spurious retransmissions that trigger
acknowledgments. The reaction to the arrival of these duplicate duplicate acknowledgments. The reaction to the arrival of these
ACKs in TCP implementations varies widely. This document does not duplicate ACKs in TCP implementations varies widely. This document
specify how to treat such acknowledgments, but does note this as an does not specify how to treat such acknowledgments, but does note
area that may benefit from additional attention, experimentation and this as an area that may benefit from additional attention,
specification. experimentation and specification.
3.2 Fast Retransmit/Fast Recovery 3.2. Fast Retransmit/Fast Recovery

A TCP receiver SHOULD send an immediate duplicate ACK when an
out-of-order segment arrives. The purpose of this ACK is to inform
the sender that a segment was received out-of-order and which
sequence number is expected. From the sender's perspective, duplicate
ACKs can be caused by a number of network problems. First, they can
be caused by dropped segments. In this case, all segments after the
dropped segment will trigger duplicate ACKs until the loss is
repaired. Second, duplicate ACKs can be caused by the re-ordering of
data segments by the network (not a rare event along some network
paths [Pax97]). Finally, duplicate ACKs can be caused by replication
of ACK or data segments by the network. In addition, a TCP receiver
SHOULD send an immediate ACK when the incoming segment fills in all
or part of a gap in the sequence space. This will generate more
timely information for a sender recovering from a loss through a
retransmission timeout, a fast retransmit, or an advanced loss
recovery algorithm, as outlined in section 4.3.

The TCP sender SHOULD use the "fast retransmit" algorithm to detect
and repair loss, based on incoming duplicate ACKs. The fast
retransmit algorithm uses the arrival of 3 duplicate ACKs (as defined
in section 2, without any intervening ACKs which move SND.UNA) as an
indication that a segment has been lost. After receiving 3 duplicate
ACKs, TCP performs a retransmission of what appears to be the missing
segment, without waiting for the retransmission timer to expire.

After the fast retransmit algorithm sends what appears to be the
missing segment, the "fast recovery" algorithm governs the
transmission of new data until a non-duplicate ACK arrives. The
reason for not performing slow start is that the receipt of the
duplicate ACKs not only indicates that a segment has been lost, but
also that segments are most likely leaving the network (although a
massive segment duplication by the network can invalidate this
conclusion). In other words, since the receiver can only generate a
duplicate ACK when a segment has arrived, that segment has left the
network and is in the receiver's buffer, so we know it is no longer
consuming network resources. Furthermore, since the ACK "clock"
[Jac88] is preserved, the TCP sender can continue to transmit new
segments (although transmission must continue using a reduced cwnd,
since loss is an indication of congestion).

The fast retransmit and fast recovery algorithms are implemented
together as follows.

1.  On the first and second duplicate ACKs received at a sender, a
    TCP SHOULD send a segment of previously unsent data per [RFC3042]
    provided that the receiver's advertised window allows, the total
    FlightSize would remain less than or equal to cwnd plus 2*SMSS,
    and that new data is available for transmission. Further, the
    TCP sender MUST NOT change cwnd to reflect these two segments
    [RFC3042]. Note that a sender using SACK [RFC2018] MUST NOT send
    new data unless the incoming duplicate acknowledgment contains
    new SACK information.

2.  When the third duplicate ACK is received, a TCP MUST set ssthresh
    to no more than the value given in equation (4). When [RFC3042]
    is in use, additional data sent in limited transmit MUST NOT be
    included in this calculation.

3.  The lost segment starting at SND.UNA MUST be retransmitted and
    cwnd set to ssthresh plus 3*SMSS. This artificially "inflates"
    the congestion window by the number of segments (three) that have
    left the network and which the receiver has buffered.

4.  For each additional duplicate ACK received (after the third),
    cwnd MUST be incremented by SMSS. This artificially inflates the
    congestion window in order to reflect the additional segment that
    has left the network.

    Note: [SCWA99] discusses a receiver-based attack whereby many
    bogus duplicate ACKs are sent to the data sender in order to
    artificially inflate cwnd and cause a higher than appropriate
    sending rate to be used. A TCP MAY therefore limit the number of
    times cwnd is artificially inflated during loss recovery to the
    number of outstanding segments (or, an approximation thereof).

    Note: When an advanced loss recovery mechanism (such as outlined
    in section 4.3) is not in use, this increase in FlightSize can
    cause equation (4) to slightly inflate cwnd and ssthresh, as some
    of the segments between SND.UNA and SND.NXT are assumed to have
    left the network but are still reflected in FlightSize.

5.  When previously unsent data is available and the new value of
    cwnd and the receiver's advertised window allow, a TCP SHOULD
    send 1*SMSS bytes of previously unsent data.

6.  When the next ACK arrives that acknowledges previously
    unacknowledged data, a TCP MUST set cwnd to ssthresh (the value
    set in step 2). This is termed "deflating" the window.

    [...]

    Additionally, this ACK should acknowledge all the intermediate
    segments sent between the lost segment and the receipt of the
    third duplicate ACK, if none of these were lost.

Note: This algorithm is known to generally not recover efficiently
from multiple losses in a single flight of packets [FF96]. Section
4.3 below addresses such cases.
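
The six steps above can be sketched as a small sender-side state
machine. This is an illustrative sketch under stated assumptions, not
a definitive implementation: the class name, the log list, and the
1460-byte default SMSS are hypothetical, new data is assumed to be
always available, and the receiver's advertised window is assumed
never to be the limiting factor. SACK and the optional limit on
window inflation are not modeled.

```python
class FastRecoverySender:
    """Hypothetical sender-side model of fast retransmit/fast recovery."""

    def __init__(self, cwnd, flight_size, smss=1460):
        self.cwnd = cwnd                  # congestion window, bytes
        self.flight_size = flight_size    # bytes sent but not yet ACKed
        self.smss = smss                  # assumed segment size, bytes
        self.ssthresh = None              # set when loss is detected
        self.dup_acks = 0                 # consecutive duplicate ACKs
        self.limited_transmit_bytes = 0   # bytes sent in step 1
        self.log = []                     # record of transmissions

    def _send_new_segment(self):
        self.flight_size += self.smss
        self.log.append("send new data")

    def on_duplicate_ack(self):
        self.dup_acks += 1
        if self.dup_acks <= 2:
            # Step 1: limited transmit; cwnd itself is not changed.
            if self.flight_size + self.smss <= self.cwnd + 2 * self.smss:
                self._send_new_segment()
                self.limited_transmit_bytes += self.smss
            return
        if self.dup_acks == 3:
            # Step 2: equation (4), excluding limited-transmit data.
            flight = self.flight_size - self.limited_transmit_bytes
            self.ssthresh = max(flight // 2, 2 * self.smss)
            # Step 3: retransmit and inflate by three segments.
            self.log.append("retransmit segment at SND.UNA")
            self.cwnd = self.ssthresh + 3 * self.smss
        else:
            # Step 4: inflate by one segment per additional duplicate ACK.
            self.cwnd += self.smss
        # Step 5: send new data if the inflated cwnd allows it.
        if self.flight_size + self.smss <= self.cwnd:
            self._send_new_segment()

    def on_new_ack(self, bytes_acked):
        # Step 6: deflate the window when new data is acknowledged.
        if self.dup_acks >= 3:
            self.cwnd = self.ssthresh
        self.flight_size = max(self.flight_size - bytes_acked, 0)
        self.dup_acks = 0
        self.limited_transmit_bytes = 0
```

Starting from a cwnd and FlightSize of 10 segments, the first two
duplicate ACKs each release one new segment, the third sets ssthresh
to 5 segments and cwnd to 8 segments, and the next ACK for new data
deflates cwnd back to 5 segments.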

4. Additional Considerations

4.1. Restarting Idle Connections

A known problem with the TCP congestion control algorithms described
above is that they allow a potentially inappropriate burst of traffic
to be transmitted after TCP has been idle for a relatively long
period of time. After an idle period, TCP cannot use the ACK clock
to strobe new segments into the network, as all the ACKs have drained
from the network. Therefore, as specified above, TCP can potentially
send a cwnd-size line-rate burst into the network after an idle
period. In addition, changing network conditions may have rendered
TCP's notion of the available end-to-end network capacity between two
endpoints, as estimated by cwnd, inaccurate during the course of a
long idle period.

[Jac88] recommends that a TCP use slow start to restart transmission
after a relatively long idle period. Slow start serves to restart
the ACK clock, just as it does at the beginning of a transfer. This
mechanism has been widely deployed in the following manner. When TCP
has not received a segment for more than one retransmission timeout,
cwnd is reduced to the value of the restart window (RW) before
transmission begins.

For the purposes of this standard, we define RW = min(IW,cwnd).

Using the last time a segment was received to determine whether or
not to decrease cwnd can fail to deflate cwnd in the common case of
persistent HTTP connections [HTH98]. In this case, a Web server
receives a request before transmitting data to the Web client. The
reception of the request makes the test for an idle connection fail,
and allows the TCP to begin transmission with a possibly
inappropriately large cwnd.

Therefore, a TCP SHOULD set cwnd to no more than RW before beginning
transmission if the TCP has not sent data in an interval exceeding
the retransmission timeout.
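
A minimal sketch of the idle-restart check follows. The function and
parameter names are hypothetical, and times are assumed to be in
seconds on a common clock. Note that the idle test uses the time data
was last sent, not the time a segment was last received.

```python
def restart_window(iw, cwnd):
    """RW = min(IW, cwnd), as defined in the text."""
    return min(iw, cwnd)


def cwnd_before_sending(cwnd, iw, now, last_send_time, rto):
    """Return the cwnd to use when transmission is about to begin.

    If no data has been sent for longer than the retransmission
    timeout, cwnd is reduced to the restart window.
    """
    if now - last_send_time > rto:
        return restart_window(iw, cwnd)
    return cwnd
```

For a persistent HTTP connection, an incoming request does not reset
last_send_time, so a server that has been silent for longer than the
RTO restarts from RW even though it has just received a segment.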

4.2. Generating Acknowledgments

The delayed ACK algorithm specified in [RFC1122] SHOULD be used by a
TCP receiver. When using delayed ACKs, a TCP receiver MUST NOT
excessively delay acknowledgments. Specifically, an ACK SHOULD be
generated for at least every second full-sized segment, and MUST be
generated within 500 ms of the arrival of the first unacknowledged
packet.

The requirement that an ACK "SHOULD" be generated for at least every
second full-sized segment is listed in [RFC1122] in one place as a
SHOULD and another as a MUST. Here we unambiguously state it is a
SHOULD. We also emphasize that this is a SHOULD, meaning that an
implementor should indeed only deviate from this requirement after
careful consideration of the implications. See the discussion of
"Stretch ACK violation" in [RFC2525] and the references therein for a
discussion of the possible performance problems with generating ACKs
less frequently than every second full-sized segment.

In some cases, the sender and receiver may not agree on what
constitutes a full-sized segment. An implementation is deemed to
comply with this requirement if it sends at least one acknowledgment
every time it receives 2*RMSS bytes of new data from the sender,
where RMSS is the Maximum Segment Size specified by the receiver to
the sender (or the default value of 536 bytes, per [RFC1122], if the
receiver does not specify an MSS option during connection
establishment). The sender may be forced to use a segment size less
than RMSS due to the maximum transmission unit (MTU), the path MTU
discovery algorithm or other factors. For instance, consider the
case when the receiver announces an RMSS of X bytes but the sender
ends up using a segment size of Y bytes (Y < X) due to path MTU
discovery (or the sender's MTU size). The receiver will generate
stretch ACKs if it waits for 2*X bytes to arrive before an ACK is
sent. Clearly this will take more than 2 segments of size Y bytes.
Therefore, while a specific algorithm is not defined, it is desirable
for receivers to attempt to prevent this situation, for example, by
acknowledging at least every second segment, regardless of size.
Finally, we repeat that an ACK MUST NOT be delayed for more than 500
ms waiting on a second full-sized segment to arrive.

Out-of-order data segments SHOULD be acknowledged immediately, in
order to accelerate loss recovery. To trigger the fast retransmit
algorithm, the receiver SHOULD send an immediate duplicate ACK when
it receives a data segment above a gap in the sequence space. To
provide feedback to senders recovering from losses, the receiver
SHOULD send an immediate ACK when it receives a data segment that
fills in all or part of a gap in the sequence space.

A TCP receiver MUST NOT generate more than one ACK for every incoming
segment, other than to update the offered window as the receiving
application consumes new data (see [RFC813] and page 42 of [RFC793]).
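
The receiver-side rules above can be combined into one decision
function, sketched below. This is one possible policy, not a
specified algorithm: the function name and parameters are
hypothetical, and the "every second segment, regardless of size" rule
is the example remedy mentioned in the text for the case where the
sender's segments are smaller than RMSS. A real implementation would
also arm a timer so that the 500 ms bound is enforced even when no
further segments arrive.

```python
MAX_ACK_DELAY_MS = 500  # upper bound on ACK delay stated in the text


def should_ack_now(above_gap, fills_gap, unacked_bytes, unacked_segments,
                   rmss, ms_since_first_unacked):
    """Decide whether the receiver should send an ACK immediately.

    above_gap: the segment arrived above a gap (out of order).
    fills_gap: the segment fills all or part of a gap.
    unacked_bytes / unacked_segments: new data received but not yet
        acknowledged, counting the segment that just arrived.
    rmss: the MSS the receiver announced to the sender.
    ms_since_first_unacked: time since the first unacknowledged
        packet arrived, in milliseconds.
    """
    if above_gap or fills_gap:
        return True   # immediate (duplicate) ACK to speed loss recovery
    if unacked_bytes >= 2 * rmss:
        return True   # at least one ACK per 2*RMSS bytes of new data
    if unacked_segments >= 2:
        return True   # every second segment, regardless of size
    return ms_since_first_unacked >= MAX_ACK_DELAY_MS
```

With an RMSS of 1460 bytes and a sender limited to 1000-byte
segments, the byte test alone would not fire until the third segment;
the segment-count test avoids that stretch ACK.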

4.3. Loss Recovery Mechanisms

A number of loss recovery algorithms that augment fast retransmit and
fast recovery have been suggested by TCP researchers and specified in
the RFC series. While some of these algorithms are based on the TCP
selective acknowledgment (SACK) option [RFC2018], such as [FF96],
[MM96a], [MM96b], and [RFC3517], others do not require SACKs, such as
[Hoe96], [FF96], and [RFC3782]. The non-SACK algorithms use "partial
acknowledgments" (ACKs that cover previously unacknowledged data, but
not all the data outstanding when loss was detected) to trigger
retransmissions. While this document does not standardize any of the
specific algorithms that may improve fast retransmit/fast recovery,
these enhanced algorithms are implicitly allowed, as long as they
follow the general principles of the basic four algorithms outlined
above.

That is, when the first loss in a window of data is detected,
ssthresh MUST be set to no more than the value given by equation (4).
Second, until all lost segments in the window of data in question are
repaired, the number of segments transmitted in each RTT MUST be no
more than half the number of outstanding segments when the loss was
detected. Finally, after all loss in the given window of segments
has been successfully retransmitted, cwnd MUST be set to no more than
ssthresh and congestion avoidance MUST be used to further increase
cwnd. Loss in two successive windows of data, or the loss of a
retransmission, should be taken as two indications of congestion and,
therefore, cwnd (and ssthresh) MUST be lowered twice in this case.

We RECOMMEND that TCP implementors employ some form of advanced loss
recovery that can cope with multiple losses in a window of data. The
algorithms detailed in [RFC3782] and [RFC3517] conform to the general
principles outlined above. We note that while these are not the only
two algorithms that conform to the above general principles, these
two algorithms have been vetted by the community and are currently on
the Standards Track.

5. Security Considerations

This document requires a TCP to diminish its sending rate in the
presence of retransmission timeouts and the arrival of duplicate
acknowledgments. An attacker can therefore impair the performance of
a TCP connection by either causing data packets or their
acknowledgments to be lost, or by forging excessive duplicate
acknowledgments.

In response to the ACK division attack outlined in [SCWA99], this
document RECOMMENDS increasing the congestion window based on the
number of bytes newly acknowledged in each arriving ACK rather than
by a particular constant on each arriving ACK (as outlined in section
3.1).
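
The effect of byte counting on ACK division can be illustrated with a
short sketch of slow-start growth. The function name and the example
numbers are assumptions made for illustration. The byte-counting
branch follows the slow-start increase of section 3.1, where cwnd
grows by min(N, SMSS) for an ACK covering N newly acknowledged bytes;
the other branch models a sender that adds a constant SMSS per ACK.

```python
def slow_start_growth(acked_byte_counts, cwnd, smss, byte_counting=True):
    """Return cwnd after processing a series of ACKs during slow start.

    acked_byte_counts: bytes newly acknowledged by each arriving ACK.
    byte_counting: if True, grow by min(N, SMSS) per ACK; if False,
        grow by a constant SMSS per ACK.
    """
    for newly_acked in acked_byte_counts:
        if byte_counting:
            cwnd += min(newly_acked, smss)
        else:
            cwnd += smss
    return cwnd
```

If a receiver splits the acknowledgment of one 1460-byte segment into
four ACKs of 365 bytes each, a constant-increase sender grows cwnd by
four segments, while a byte-counting sender grows it by one segment,
the same as for a single honest ACK.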

The Internet, to a considerable degree, relies on the correct
implementation of these algorithms in order to preserve network
stability and avoid congestion collapse. An attacker could cause TCP
endpoints to respond more aggressively in the face of congestion by
forging excessive duplicate acknowledgments or excessive
acknowledgments for new data. Conceivably, such an attack could
drive a portion of the network into congestion collapse.

6. Changes between RFC 2001 and RFC 2581

[RFC2001] was extensively rewritten editorially and it is not
feasible to itemize the list of changes between [RFC2001] and
[RFC2581]. The intention of [RFC2581] was to not change any of the
recommendations given in [RFC2001], but to further clarify cases that
were not discussed in detail in [RFC2001]. Specifically, [RFC2581]
suggested what TCP connections should do after a relatively long idle
period, as well as specified and clarified some of the issues
pertaining to TCP ACK generation. Finally, the allowable upper bound
for the initial congestion window was raised from one to two
segments.

7. Changes Relative to RFC 2581

A specific definition for "duplicate acknowledgment" has been added,
based on the definition used by BSD TCP.

The document now notes that what to do with duplicate ACKs after the
retransmission timer has fired is future work and explicitly
unspecified in this document.

The initial window requirements were changed to allow Larger Initial
Windows as standardized in [RFC3390]. Additionally, the steps to
take when an initial window is discovered to be too large due to Path
MTU Discovery [RFC1191] are detailed.

The recommended initial value for ssthresh has been changed to say
that it SHOULD be arbitrarily high, where it was previously MAY.
This is to provide additional guidance to implementors on the matter.

During slow start, the usage of Appropriate Byte Counting [RFC3465]
with L=1*SMSS is explicitly recommended. The method of increasing
cwnd given in [RFC2581] is still explicitly allowed. Byte counting
during congestion avoidance is also recommended, while the method
from [RFC2581] and other safe methods are still allowed.

The treatment of ssthresh on retransmission timeout was clarified.
In particular, ssthresh must be set to half the FlightSize on the
first retransmission of a given segment and then is held constant on

[...]

TCPs now MAY limit the number of duplicate ACKs that artificially
inflate cwnd during loss recovery to the number of segments
outstanding to avoid the duplicate ACK spoofing attack described in
[SCWA99].

The restart window has been changed to min(IW,cwnd) from IW. This
behavior was described as "experimental" in [RFC2581].

It is now recommended that TCP implementors implement an advanced
loss recovery algorithm conforming to the principles outlined in this
document.

The security considerations have been updated to discuss ACK division
and recommend byte counting as a counter to this attack.

8. Acknowledgments

The core algorithms we describe were developed by Van Jacobson
[Jac88, Jac90]. In addition, Limited Transmit [RFC3042] was
developed in conjunction with Hari Balakrishnan and Sally Floyd. The
initial congestion window size specified in this document is a result
of work with Sally Floyd and Craig Partridge [RFC2414, RFC3390].

W. Richard ("Rich") Stevens wrote the first version of this document
[RFC2001] and co-authored the second version [RFC2581]. This present
version much benefits from his clarity and thoughtfulness of
description, and we are grateful for Rich's contributions in
elucidating TCP congestion control, as well as in more broadly
helping us understand numerous issues relating to networking.

We wish to emphasize that the shortcomings and mistakes of this
document are solely the responsibility of the current authors.

Some of the text from this document is taken from "TCP/IP
Illustrated, Volume 1: The Protocols" by W. Richard Stevens
(Addison-Wesley, 1994) and "TCP/IP Illustrated, Volume 2: The
Implementation" by Gary R. Wright and W. Richard Stevens
(Addison-Wesley, 1995). This material is used with the permission of
Addison-Wesley.

Anil Agarwal, Steve Arden, Neal Cardwell, Noritoshi Demizu, Gorry
Fairhurst, Kevin Fall, John Heffner, Alfred Hoenes, Sally Floyd,
Reiner Ludwig, Matt Mathis, Craig Partridge, and Joe Touch
contributed a number of helpful suggestions.
Normative References 9. References
9.1. Normative References
[RFC793] Postel, J., "Transmission Control Protocol", STD 7, RFC [RFC793] Postel, J., "Transmission Control Protocol", STD 7, RFC
793, September 1981. 793, September 1981.
[RFC1122] Braden, R., "Requirements for Internet Hosts -- [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts -
Communication Layers", STD 3, RFC 1122, October 1989. Communication Layers", STD 3, RFC 1122, October 1989.
[RFC1191] Mogul, J. and S. Deering, "Path MTU Discovery", RFC 1191, [RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
November 1990. November 1990.
Informative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
9.2. Informative References
[CJ89] Chiu, D. and R. Jain, "Analysis of the Increase/Decrease [CJ89] Chiu, D. and R. Jain, "Analysis of the Increase/Decrease
Algorithms for Congestion Avoidance in Computer Networks", Algorithms for Congestion Avoidance in Computer Networks",
Journal of Computer Networks and ISDN Systems, vol. 17, no. 1, Journal of Computer Networks and ISDN Systems, vol. 17, no.
pp. 1-14, June 1989. 1, pp. 1-14, June 1989.
[FF96] Fall, K. and S. Floyd, "Simulation-based Comparisons of [FF96] Fall, K. and S. Floyd, "Simulation-based Comparisons of
Tahoe, Reno and SACK TCP", Computer Communication Review, July Tahoe, Reno and SACK TCP", Computer Communication Review,
1996. ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z. July 1996, ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z.
[Hoe96] Hoe, J., "Improving the Start-up Behavior of a Congestion [Hoe96] Hoe, J., "Improving the Start-up Behavior of a Congestion
Control Scheme for TCP", In ACM SIGCOMM, August 1996. Control Scheme for TCP", In ACM SIGCOMM, August 1996.
[HTH98] Hughes, A., Touch, J. and J. Heidemann, "Issues in TCP [HTH98] Hughes, A., Touch, J., and J. Heidemann, "Issues in TCP
Slow-Start Restart After Idle", Work in Progress. Slow-Start Restart After Idle", Work in Progress, March
1998.
[Jac88] Jacobson, V., "Congestion Avoidance and Control", Computer [Jac88] Jacobson, V., "Congestion Avoidance and Control", Computer
Communication Review, vol. 18, no. 4, pp. 314-329, Aug. 1988. Communication Review, vol. 18, no. 4, pp. 314-329, Aug.
ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z. 1988. ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z.
[Jac90] Jacobson, V., "Modified TCP Congestion Avoidance Algorithm", [Jac90] Jacobson, V., "Modified TCP Congestion Avoidance
end2end-interest mailing list, April 30, 1990. Algorithm", end2end-interest mailing list, April 30, 1990.
ftp://ftp.isi.edu/end2end/end2end-interest-1990.mail. ftp://ftp.isi.edu/end2end/end2end-interest-1990.mail.
[MM96a] Mathis, M. and J. Mahdavi, "Forward Acknowledgment: Refining [MM96a] Mathis, M. and J. Mahdavi, "Forward Acknowledgment:
TCP Congestion Control", Proceedings of SIGCOMM'96, August, Refining TCP Congestion Control", Proceedings of
1996, Stanford, CA. Available SIGCOMM'96, August, 1996, Stanford, CA. Available from
from http://www.psc.edu/networking/papers/papers.html http://www.psc.edu/networking/papers/papers.html
[MM96b] Mathis, M. and J. Mahdavi, "TCP Rate-Halving with Bounding [MM96b] Mathis, M. and J. Mahdavi, "TCP Rate-Halving with Bounding
Parameters", Technical report. Available from Parameters", Technical report. Available from
http://www.psc.edu/networking/papers/FACKnotes/current. http://www.psc.edu/networking/papers/FACKnotes/current.
[Pax97] Paxson, V., "End-to-End Internet Packet Dynamics", [Pax97] Paxson, V., "End-to-End Internet Packet Dynamics",
Proceedings of SIGCOMM '97, Cannes, France, Sep. 1997. Proceedings of SIGCOMM '97, Cannes, France, Sep. 1997.
[RFC813] Clark, D., "Window and Acknowledgment Strategy in TCP", RFC [RFC813] Clark, D., "Window and Acknowledgement Strategy in TCP",
813, July 1982. RFC 813, July 1982.
[RFC2001] Stevens, W., "TCP Slow Start, Congestion Avoidance, Fast [RFC2001] Stevens, W., "TCP Slow Start, Congestion Avoidance, Fast
Retransmit, and Fast Recovery Algorithms", RFC 2001, January Retransmit, and Fast Recovery Algorithms", RFC 2001,
1997. January 1997.
[RFC2018] Mathis, M., Mahdavi, J., Floyd, S. and A. Romanow, "TCP
Selective Acknowledgement Options", RFC 2018, October 1996.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
Requirement Levels", BCP 14, RFC 2119, March 1997. Selective Acknowledgment Options", RFC 2018, October 1996.
[RFC2414] Allman, M., Floyd, S. and C. Partridge, "Increasing TCP's [RFC2414] Allman, M., Floyd, S., and C. Partridge, "Increasing TCP's
Initial Window Size", RFC 2414, September 1998. Initial Window", RFC 2414, September 1998.
[RFC2525] Paxson, V., Allman, M., Dawson, S., Fenner, W., Griner, [RFC2525] Paxson, V., Allman, M., Dawson, S., Fenner, W., Griner, J.,
J., Heavens, I., Lahey, K., Semke, J. and B. Volz, "Known TCP Heavens, I., Lahey, K., Semke, J., and B. Volz, "Known TCP
Implementation Problems", RFC 2525, March 1999. Implementation Problems", RFC 2525, March 1999.
[RFC2581] Allman, M., Paxson, V., W. Stevens, TCP Congestion [RFC2581] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
Control, RFC 2581, April 1999. Control", RFC 2581, April 1999.
[RFC2883] Floyd, S., J. Mahdavi, M. Mathis, M. Podolsky, An [RFC2883] Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An
Extension to the Selective Acknowledgement (SACK) Option for Extension to the Selective Acknowledgement (SACK) Option
TCP, RFC 2883, July 2000. for TCP", RFC 2883, July 2000.
[RFC2988] V. Paxson and M. Allman, "Computing TCP's Retransmission [RFC2988] Paxson, V. and M. Allman, "Computing TCP's Retransmission
Timer", RFC 2988, November 2000. Timer", RFC 2988, November 2000.
[RFC3042] Allman, M., Balakrishnan, H. and S. Floyd, "Enhancing [RFC3042] Allman, M., Balakrishnan, H., and S. Floyd, "Enhancing
TCP's Loss Recovery Using Limited Transmit", RFC 3042, January TCP's Loss Recovery Using Limited Transmit", RFC 3042,
2001. January 2001.
[RFC3168] K. Ramakrishnan, S. Floyd, D. Black, "The Addition of [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of
Explicit Congestion Notification (ECN) to IP", RFC 3168, Explicit Congestion Notification (ECN) to IP", RFC 3168,
September 2001. September 2001.
[RFC3390] Allman, M., Floyd, S., C. Partridge, "Increasing TCP's [RFC3390] Allman, M., Floyd, S., and C. Partridge, "Increasing TCP's
Initial Window", RFC 3390, October 2002. Initial Window", RFC 3390, October 2002.
[RFC3465] Mark Allman, TCP Congestion Control with Appropriate Byte [RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte
Counting (ABC), RFC 3465, February 2003. Counting (ABC)", RFC 3465, February 2003.
[RFC3517] Ethan Blanton, Mark Allman, Kevin Fall, Lili Wang, A [RFC3517] Blanton, E., Allman, M., Fall, K., and L. Wang, "A
Conservative Selective Acknowledgment (SACK)-based Loss Recovery Conservative Selective Acknowledgment (SACK)-based Loss
Algorithm for TCP, RFC 3517, April 2003. Recovery Algorithm for TCP", RFC 3517, April 2003.
[RFC3782] Sally Floyd, Tom Henderson, Andrei Gurtov, The NewReno [RFC3782] Floyd, S., Henderson, T., and A. Gurtov, "The NewReno
Modification to TCP's Fast Recovery Algorithm, RFC 3782, April Modification to TCP's Fast Recovery Algorithm", RFC 3782,
2004. April 2004.
[RFC4821] Matt Mathis, John Heffner, Packetization Layer Path MTU [RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path MTU
Discovery, RFC 4821, March 2007. Discovery", RFC 4821, March 2007.
[SCWA99] Savage, S., Cardwell, N., Wetherall, D., and T. Anderson, [SCWA99] Savage, S., Cardwell, N., Wetherall, D., and T. Anderson,
"TCP Congestion Control With a Misbehaving Receiver", ACM "TCP Congestion Control With a Misbehaving Receiver", ACM
Computer Communication Review, 29(5), October 1999. Computer Communication Review, 29(5), October 1999.
[Ste94] Stevens, W., "TCP/IP Illustrated, Volume 1: The Protocols", [Ste94] Stevens, W., "TCP/IP Illustrated, Volume 1: The Protocols",
Addison-Wesley, 1994. Addison-Wesley, 1994.
[WS95] Wright, G. and W. Stevens, "TCP/IP Illustrated, Volume 2: The [WS95] Wright, G. and W. Stevens, "TCP/IP Illustrated, Volume 2:
Implementation", Addison-Wesley, 1995. The Implementation", Addison-Wesley, 1995.
Authors' Addresses Authors' Addresses
Mark Allman Mark Allman
International Computer Science Institute (ICSI) International Computer Science Institute (ICSI)
1947 Center Street 1947 Center Street
Suite 600 Suite 600
Berkeley, CA 94704-1198 Berkeley, CA 94704-1198
Phone: +1 440 235 1792 Phone: +1 440 235 1792
EMail: mallman@icir.org EMail: mallman@icir.org
http://www.icir.org/mallman/ http://www.icir.org/mallman/
Vern Paxson Vern Paxson
International Computer Science Institute (ICSI) International Computer Science Institute (ICSI)
1947 Center Street 1947 Center Street
Suite 600 Suite 600
Berkeley, CA 94704-1198 Berkeley, CA 94704-1198
Phone: +1 510/642-4274 x302 Phone: +1 510/642-4274 x302
EMail: vern@icir.org EMail: vern@icir.org
http://www.icir.org/vern/ http://www.icir.org/vern/
Ethan Blanton Ethan Blanton
Purdue University Computer Sciences Purdue University Computer Sciences
305 North University Street 305 North University Street
West Lafayette, IN 47907 West Lafayette, IN 47907
EMail: eblanton@cs.purdue.edu EMail: eblanton@cs.purdue.edu
http://www.cs.purdue.edu/homes/eblanton/ http://www.cs.purdue.edu/homes/eblanton/
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.
 End of changes. 103 change blocks. 
333 lines changed or deleted 301 lines changed or added
