draft-ietf-tcpm-rfc2581bis-04.txt   draft-ietf-tcpm-rfc2581bis-05.txt 
Network Working Group M. Allman Network Working Group M. Allman
Internet-Draft V. Paxson Internet-Draft V. Paxson
Expires: October 2008 ICSI Expires: October 2009 ICSI
E. Blanton E. Blanton
Purdue University Purdue University
TCP Congestion Control TCP Congestion Control
draft-ietf-tcpm-rfc2581bis-04.txt draft-ietf-tcpm-rfc2581bis-05.txt
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any This Internet-Draft is submitted to IETF in full conformance with
applicable patent or other IPR claims of which he or she is aware the provisions of BCP 78 and BCP 79. This document may contain
have been or will be disclosed, and any of which he or she becomes material from IETF Documents or IETF Contributions published or made
aware will be disclosed, in accordance with Section 6 of BCP 79. publicly available before November 10, 2008. The person(s)
controlling the copyright in some of this material may not have
granted the IETF Trust the right to allow modifications of such
material outside the IETF Standards Process. Without obtaining an
adequate license from the person(s) controlling the copyright in
such materials, this document may not be modified outside the IETF
Standards Process, and derivative works of it may not be created
outside the IETF Standards Process, except to format it for
publication as an RFC or to translate it into languages other than
English.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as other groups may also distribute working documents as
Internet-Drafts. Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress." reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
Copyright Statement
Copyright (c) 2009 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents in effect on the date of
publication of this document (http://trustee.ietf.org/license-info).
Please review these documents carefully, as they describe your
rights and restrictions with respect to this document.
This document may contain material from IETF Documents or IETF
Contributions published or made publicly available before November
10, 2008. The person(s) controlling the copyright in some of this
material may not have granted the IETF Trust the right to allow
modifications of such material outside the IETF Standards Process.
Without obtaining an adequate license from the person(s) controlling
the copyright in such materials, this document may not be modified
outside the IETF Standards Process, and derivative works of it may
not be created outside the IETF Standards Process, except to format
it for publication as an RFC or to translate it into languages other
than English.
Abstract Abstract
This document defines TCP's four intertwined congestion control This document defines TCP's four intertwined congestion control
algorithms: slow start, congestion avoidance, fast retransmit, and algorithms: slow start, congestion avoidance, fast retransmit, and
fast recovery. In addition, the document specifies how TCP should fast recovery. In addition, the document specifies how TCP should
begin transmission after a relatively long idle period, as well as begin transmission after a relatively long idle period, as well as
discussing various acknowledgment generation methods. discussing various acknowledgment generation methods.
1. Introduction 1. Introduction
This document specifies four TCP [RFC793] congestion control This document specifies four TCP [RFC793] congestion control
algorithms: slow start, congestion avoidance, fast retransmit and algorithms: slow start, congestion avoidance, fast retransmit and
fast recovery. These algorithms were devised in [Jac88] and fast recovery. These algorithms were devised in [Jac88] and
[Jac90]. Their use with TCP is standardized in [RFC1122]. [Jac90]. Their use with TCP is standardized in [RFC1122].
Additional early work in additive-increase, multiplicative-decrease Additional early work in additive-increase, multiplicative-decrease
congestion control is given in [CJ89]. congestion control is given in [CJ89].
This document obsoletes [RFC2581] which in turned obsoleted Note that [Ste94] provides examples of these algorithms in action
[RFC2001]. and [WS95] provides an explanation of the source code for the BSD
implementation of these algorithms.
In addition to specifying the congestion control algorithms, this In addition to specifying these congestion control algorithms, this
document specifies what TCP connections should do after a relatively document specifies what TCP connections should do after a relatively
long idle period, as well as specifying and clarifying some of the long idle period, as well as specifying and clarifying some of the
issues pertaining to TCP ACK generation. issues pertaining to TCP ACK generation.
Note that [Ste94] provides examples of these algorithms in action This document obsoletes [RFC2581], which in turn obsoleted
and [WS95] provides an explanation of the source code for the BSD [RFC2001].
implementation of these algorithms.
This document is organized as follows. Section 2 provides various This document is organized as follows. Section 2 provides various
definitions which will be used throughout the document. Section 3 definitions which will be used throughout the document. Section 3
provides a specification of the congestion control provides a specification of the congestion control
algorithms. Section 4 outlines concerns related to the congestion algorithms. Section 4 outlines concerns related to the congestion
control algorithms and finally, section 5 outlines security control algorithms and finally, section 5 outlines security
considerations. considerations.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
skipping to change at page 6, line 27 skipping to change at page 7, line 6
Implementation Note: Older implementations have an additional Implementation Note: Older implementations have an additional
additive constant on the right-hand side of equation (3). This is additive constant on the right-hand side of equation (3). This is
incorrect and can actually lead to diminished performance [RFC2525]. incorrect and can actually lead to diminished performance [RFC2525].
Implementation Note: Some implementations maintain cwnd in units of Implementation Note: Some implementations maintain cwnd in units of
bytes, while others in units of full-sized segments. The latter bytes, while others in units of full-sized segments. The latter
will find equation (3) difficult to use, and may prefer to use the will find equation (3) difficult to use, and may prefer to use the
counting approach discussed in the previous paragraph. counting approach discussed in the previous paragraph.
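The two bookkeeping styles in the note above can be contrasted with a minimal sketch. It assumes equation (3), which is not reproduced in this excerpt, has the form cwnd += SMSS*SMSS/cwnd, and that the counting approach increments cwnd by one segment once a full window of segments has been acknowledged. The function names and the round-up-to-one-byte guard are illustrative choices, not text taken from either draft revision.

```python
def ca_increase_bytes(cwnd, smss):
    """Per-ACK congestion avoidance increase with cwnd kept in bytes.

    Assumes equation (3) is cwnd += SMSS*SMSS/cwnd. With integer
    arithmetic the increment can reach 0 for very large windows, so
    this sketch rounds it up to 1 byte (an assumption of the sketch).
    """
    increment = (smss * smss) // cwnd
    return cwnd + max(increment, 1)


def ca_increase_segments(cwnd_segments, acked_count, newly_acked):
    """Counting approach with cwnd kept in full-sized segments.

    acked_count is a hypothetical running total of segments
    acknowledged since the last increase. Once it reaches cwnd,
    cwnd grows by one segment and the counter is reduced by the
    old window size.
    """
    acked_count += newly_acked
    if acked_count >= cwnd_segments:
        acked_count -= cwnd_segments
        cwnd_segments += 1
    return cwnd_segments, acked_count
```

With cwnd held in segments, the assumed form of equation (3) would add only 1/cwnd of a segment per ACK, which whole-segment counters cannot represent. That is why the counting variant accumulates acknowledgments instead.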
When a TCP sender detects segment loss using the retransmission When a TCP sender detects segment loss using the retransmission
timer and the given segment has not yet been retransmitted, the timer and the given segment has not yet been resent by way of the
value of ssthresh MUST be set to no more than the value given in retransmission timer, the value of ssthresh MUST be set to no more
equation 4: than the value given in equation 4:
ssthresh = max (FlightSize / 2, 2*SMSS) (4) ssthresh = max (FlightSize / 2, 2*SMSS) (4)
where, as discussed above, FlightSize is the amount of outstanding where, as discussed above, FlightSize is the amount of outstanding
data in the network. data in the network.
On the other hand, when a TCP sender detects segment loss using the On the other hand, when a TCP sender detects segment loss using the
retransmission timer and the given segment has already been retransmission timer and the given segment has already been
retransmitted by way of the retransmission timer at least once, the retransmitted by way of the retransmission timer at least once, the
value of ssthresh is held constant. value of ssthresh is held constant.
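The two timeout cases above (whose substance is the same in both revisions, apart from wording) can be sketched as follows. This is a minimal illustration, not an implementation from either draft: the SenderState fields, the rto_retransmits counter, and the use of integer division for FlightSize / 2 are assumptions of the sketch. It sets ssthresh to exactly the equation 4 value, which is the largest value the text permits.

```python
from dataclasses import dataclass


@dataclass
class SenderState:
    smss: int         # sender maximum segment size, in bytes
    flight_size: int  # bytes sent but not yet acknowledged
    ssthresh: int     # slow start threshold, in bytes


def on_rto_loss(state, rto_retransmits):
    """Update ssthresh when the retransmission timer signals loss.

    rto_retransmits is a hypothetical count of how many times the
    segment has already been resent by the retransmission timer.
    """
    if rto_retransmits == 0:
        # First timeout for this segment: apply equation 4.
        state.ssthresh = max(state.flight_size // 2, 2 * state.smss)
    # Otherwise the segment was already retransmitted by the timer
    # at least once, and ssthresh is held constant.
    return state.ssthresh
```

For example, with SMSS = 1460 and FlightSize = 14600 the first timeout yields ssthresh = 7300 bytes, and a second timeout on the same segment leaves that value untouched.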
skipping to change at page 8, line 33 skipping to change at page 9, line 11
that has left the network. that has left the network.
Note: [SCWA99] discusses a receiver-based attack whereby many Note: [SCWA99] discusses a receiver-based attack whereby many
bogus duplicate ACKs are sent to the data sender in order to bogus duplicate ACKs are sent to the data sender in order to
artificially inflate cwnd and cause a higher than appropriate artificially inflate cwnd and cause a higher than appropriate
sending rate to be used. A TCP MAY therefore limit the number sending rate to be used. A TCP MAY therefore limit the number
of times cwnd is artificially inflated during loss recovery of times cwnd is artificially inflated during loss recovery
to the number of outstanding segments (or, an approximation to the number of outstanding segments (or, an approximation
thereof). thereof).
Note: When an advanced loss recovery mechanism (such as outlined
in section 4.3) is not in use, this increase in FlightSize can
cause equation 4 to slightly inflate cwnd and ssthresh, as some
of the segments between SND.UNA and SND.NXT are assumed to have
left the network but are still reflected in FlightSize.
5. When previously unsent data is available and the new value of 5. When previously unsent data is available and the new value of
cwnd and the receiver's advertised window allow, a TCP SHOULD cwnd and the receiver's advertised window allow, a TCP SHOULD
send 1*SMSS bytes of previously unsent data. send 1*SMSS bytes of previously unsent data.
6. When the next ACK arrives that acknowledges previously 6. When the next ACK arrives that acknowledges previously
unacknowledged data, a TCP MUST set cwnd to ssthresh (the value unacknowledged data, a TCP MUST set cwnd to ssthresh (the value
set in step 2). This is termed "deflating" the window. set in step 2). This is termed "deflating" the window.
This ACK should be the acknowledgment elicited by the This ACK should be the acknowledgment elicited by the
retransmission from step 3, one RTT after the retransmission retransmission from step 3, one RTT after the retransmission
skipping to change at page 11, line 19 skipping to change at page 12, line 4
We RECOMMEND that TCP implementers employ some form of advanced loss We RECOMMEND that TCP implementers employ some form of advanced loss
recovery that can cope with multiple losses in a window of data. recovery that can cope with multiple losses in a window of data.
The algorithms detailed in [RFC3782] and [RFC3517] conform to the The algorithms detailed in [RFC3782] and [RFC3517] conform to the
general principles outlined above. We note that while these are not general principles outlined above. We note that while these are not
the only two algorithms that conform to the above general principles the only two algorithms that conform to the above general principles
these two algorithms have been vetted by the community and are these two algorithms have been vetted by the community and are
currently on the standards track. currently on the standards track.
5. Security Considerations 5. Security Considerations
This document requires a TCP to diminish its sending rate in the This document requires a TCP to diminish its sending rate in the
presence of retransmission timeouts and the arrival of duplicate presence of retransmission timeouts and the arrival of duplicate
acknowledgments. An attacker can therefore impair the performance acknowledgments. An attacker can therefore impair the performance
of a TCP connection by either causing data packets or their of a TCP connection by either causing data packets or their
acknowledgments to be lost, or by forging excessive duplicate acknowledgments to be lost, or by forging excessive duplicate
acknowledgments. Causing two congestion control events back-to-back acknowledgments.
will often cut ssthresh to its minimum value of 2*SMSS, causing the
connection to immediately enter the slower-performing congestion
avoidance phase.
In response to the ACK division attack outlined in [SCWA99] this In response to the ACK division attack outlined in [SCWA99] this
document RECOMMENDS increasing the congestion window based on the document RECOMMENDS increasing the congestion window based on the
number of bytes newly acknowledged in each arriving ACK rather than number of bytes newly acknowledged in each arriving ACK rather than
by a particular constant on each arriving ACK (as outlined in by a particular constant on each arriving ACK (as outlined in
section 3.1). section 3.1).
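The effect of this recommendation can be illustrated with a small sketch comparing the two increase rules during slow start. It assumes the byte-counting rule of section 3.1 (not reproduced in this excerpt) is cwnd += min(N, SMSS), where N is the number of bytes newly acknowledged. The function names and the example ACK pattern are hypothetical.

```python
def increase_by_bytes_acked(cwnd, bytes_acked, smss):
    """Byte-counting increase: grow by at most the bytes newly ACKed."""
    return cwnd + min(bytes_acked, smss)


def increase_by_constant(cwnd, smss):
    """Per-ACK constant increase, which ACK division can exploit."""
    return cwnd + smss


def apply_acks(cwnd, ack_sizes, smss, byte_counting):
    """Apply a sequence of ACKs, each covering ack_sizes[i] new bytes."""
    for acked in ack_sizes:
        if byte_counting:
            cwnd = increase_by_bytes_acked(cwnd, acked, smss)
        else:
            cwnd = increase_by_constant(cwnd, smss)
    return cwnd


# A receiver splits the acknowledgment of one 1460-byte segment
# into four ACKs of 365 bytes each.
divided_acks = [365, 365, 365, 365]
fair = apply_acks(2920, divided_acks, 1460, byte_counting=True)
inflated = apply_acks(2920, divided_acks, 1460, byte_counting=False)
```

Under byte counting the window grows by one SMSS in total regardless of how the receiver divides its ACKs, whereas the per-ACK constant rule grows it by four SMSS for the same amount of delivered data.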
The Internet to a considerable degree relies on the correct The Internet to a considerable degree relies on the correct
implementation of these algorithms in order to preserve network implementation of these algorithms in order to preserve network
stability and avoid congestion collapse. An attacker could cause stability and avoid congestion collapse. An attacker could cause
TCP endpoints to respond more aggressively in the face of congestion TCP endpoints to respond more aggressively in the face of congestion
by forging excessive duplicate acknowledgments or excessive by forging excessive duplicate acknowledgments or excessive
acknowledgments for new data. Conceivably, such an attack could acknowledgments for new data. Conceivably, such an attack could
drive a portion of the network into congestion collapse. drive a portion of the network into congestion collapse.
6. Changes Between RFC 2001 and RFC 2581 6. Changes Between RFC 2001 and RFC 2581
[RFC2001] has been extensively rewritten editorially and it is not [RFC2001] was extensively rewritten editorially and it is not
feasible to itemize the list of changes between [RFC2001] and feasible to itemize the list of changes between [RFC2001] and
[RFC2581]. The intention of [RFC2581] is to not change any of the [RFC2581]. The intention of [RFC2581] was to not change any of the
recommendations given in [RFC2001], but to further clarify cases recommendations given in [RFC2001], but to further clarify cases
that were not discussed in detail in [RFC2001]. Specifically, that were not discussed in detail in [RFC2001]. Specifically,
[RFC2581] suggests what TCP connections should do after a relatively [RFC2581] suggested what TCP connections should do after a
long idle period, as well as specifying and clarifying some of the relatively long idle period, as well as specified and clarified
issues pertaining to TCP ACK generation. Finally, the allowable some of the issues pertaining to TCP ACK generation. Finally, the
upper bound for the initial congestion window has also been raised allowable upper bound for the initial congestion window was raised
from one to two segments. from one to two segments.
7. Changes Relative to RFC 2581 7. Changes Relative to RFC 2581
A specific definition for "duplicate acknowledgment" has been A specific definition for "duplicate acknowledgment" has been
added, based on the definition used by BSD TCP. added, based on the definition used by BSD TCP.
The document now notes that what to do with duplicate ACKs after the The document now notes that what to do with duplicate ACKs after the
retransmission timer has fired is future work and explicitly retransmission timer has fired is future work and explicitly
unspecified in this document. unspecified in this document.
The initial window requirements were changed to allow Larger The initial window requirements were changed to allow Larger
Initial Windows as standardized in [RFC3390]. Additionally, the Initial Windows as standardized in [RFC3390]. Additionally, the
steps to take when an initial window is discovered to be too large steps to take when an initial window is discovered to be too large
skipping to change at page 13, line 55 skipping to change at page 14, line 37
[CJ89] Chiu, D. and R. Jain, "Analysis of the Increase/Decrease [CJ89] Chiu, D. and R. Jain, "Analysis of the Increase/Decrease
Algorithms for Congestion Avoidance in Computer Networks", Algorithms for Congestion Avoidance in Computer Networks",
Journal of Computer Networks and ISDN Systems, vol. 17, no. 1, Journal of Computer Networks and ISDN Systems, vol. 17, no. 1,
pp. 1-14, June 1989. pp. 1-14, June 1989.
[FF96] Fall, K. and S. Floyd, "Simulation-based Comparisons of [FF96] Fall, K. and S. Floyd, "Simulation-based Comparisons of
Tahoe, Reno and SACK TCP", Computer Communication Review, July Tahoe, Reno and SACK TCP", Computer Communication Review, July
1996. ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z. 1996. ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z.
[Flo94] Floyd, S., "TCP and Successive Fast Retransmits. Technical
report", October 1994.
ftp://ftp.ee.lbl.gov/papers/fastretrans.ps.
[Hoe96] Hoe, J., "Improving the Start-up Behavior of a Congestion [Hoe96] Hoe, J., "Improving the Start-up Behavior of a Congestion
Control Scheme for TCP", In ACM SIGCOMM, August 1996. Control Scheme for TCP", In ACM SIGCOMM, August 1996.
[HTH98] Hughes, A., Touch, J. and J. Heidemann, "Issues in TCP [HTH98] Hughes, A., Touch, J. and J. Heidemann, "Issues in TCP
Slow-Start Restart After Idle", Work in Progress. Slow-Start Restart After Idle", Work in Progress.
[Jac88] Jacobson, V., "Congestion Avoidance and Control", Computer [Jac88] Jacobson, V., "Congestion Avoidance and Control", Computer
Communication Review, vol. 18, no. 4, pp. 314-329, Aug. 1988. Communication Review, vol. 18, no. 4, pp. 314-329, Aug. 1988.
ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z. ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z.
skipping to change at page 16, line 12 skipping to change at page 16, line 42
International Computer Science Institute (ICSI) International Computer Science Institute (ICSI)
1947 Center Street 1947 Center Street
Suite 600 Suite 600
Berkeley, CA 94704-1198 Berkeley, CA 94704-1198
Phone: +1 510/642-4274 x302 Phone: +1 510/642-4274 x302
EMail: vern@icir.org EMail: vern@icir.org
http://www.icir.org/vern/ http://www.icir.org/vern/
Ethan Blanton Ethan Blanton
Purdue University Computer Sciences Purdue University Computer Sciences
1398 Computer Science Building 305 North University Street
West Lafayette, IN 47907 West Lafayette, IN 47907
EMail: eblanton@cs.purdue.edu EMail: eblanton@cs.purdue.edu
http://www.cs.purdue.edu/homes/eblanton/ http://www.cs.purdue.edu/homes/eblanton/
Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed
to pertain to the implementation or use of the technology described
in this document or the extent to which any license under such
rights might or might not be available; nor does it represent that
it has made any independent effort to identify any such rights.
Information on the procedures with respect to rights in RFC
documents can be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use
of such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository
at http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Disclaimer of Validity
This document and the information contained herein are provided
on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE.
Copyright Statement
Copyright (C) The IETF Trust (2008). This document is subject to
the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
Acknowledgment Acknowledgment
Funding for the RFC Editor function is currently provided by the Funding for the RFC Editor function is currently provided by the
Internet Society. Internet Society.
 End of changes. 18 change blocks. 
73 lines changed or deleted 62 lines changed or added
