draft-ietf-tsvwg-newreno-01.txt   draft-ietf-tsvwg-newreno-02.txt 
Internet Engineering Task Force S. Floyd Internet Engineering Task Force S. Floyd
INTERNET DRAFT ICSI INTERNET DRAFT ICSI
draft-ietf-tsvwg-newreno-01.txt T. Henderson draft-ietf-tsvwg-newreno-02.txt T. Henderson
Boeing Boeing
A. Gurtov A. Gurtov
U. Helsinki U. Helsinki
September 2003 November 2003
The NewReno Modification to TCP's Fast Recovery Algorithm The NewReno Modification to TCP's Fast Recovery Algorithm
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 40 skipping to change at page 1, line 40
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
Abstract Abstract
RFC 2581 [RFC2581] documents the following four intertwined TCP RFC 2581 [RFC2581] documents the following four intertwined TCP
congestion control algorithms: Slow Start, Congestion Avoidance, Fast congestion control algorithms: Slow Start, Congestion Avoidance, Fast
Retransmit, and Fast Recovery. RFC 2581 [RFC2581] explicitly allows Retransmit, and Fast Recovery. RFC 2581 [RFC2581] explicitly allows
certain modifications of these algorithms, including modifications certain modifications of these algorithms, including modifications
that use the TCP Selective Acknowledgement (SACK) option [RFC2018], that use the TCP Selective Acknowledgement (SACK) option
and modifications that respond to "partial acknowledgments" (ACKs [RFC2018,RFC3517], and modifications that respond to "partial
which cover new data, but not all the data outstanding when loss was acknowledgments" (ACKs which cover new data, but not all the data
detected) in the absence of SACK. The NewReno mechanism uses an outstanding when loss was detected) in the absence of SACK. The
algorithm for responding to partial acknowledgments that was first NewReno mechanism uses an algorithm for responding to partial
proposed by Janey Hoe in [Hoe95]. acknowledgments that was first proposed by Janey Hoe in [Hoe95].
RFC 2582 [RFC2582] specified the NewReno mechanisms as Experimental RFC 2582 [RFC2582] specified the NewReno mechanisms as Experimental
in 1999. This document is a small revision of RFC 2582 intended to in 1999. This document is a small revision of RFC 2582 intended to
advance the NewReno mechanisms to Proposed Standard. RFC 2581 notes advance the NewReno mechanisms to Proposed Standard. RFC 2581 notes
that the Fast Retransmit/Fast Recovery algorithm specified in that that the Fast Retransmit/Fast Recovery algorithm specified in that
document does not recover very efficiently from multiple losses in a document does not recover very efficiently from multiple losses in a
single flight of packets, and that RFC 2582 contains one set of single flight of packets, and that RFC 2582 contains one set of
modifications to address this problem. modifications to address this problem.
NOTE TO THE RFC EDITOR: PLEASE REMOVE THIS SECTION UPON PUBLICATION. NOTE TO THE RFC EDITOR: PLEASE REMOVE THIS SECTION UPON PUBLICATION.
Changes from draft-ietf-tsvwg-newreno-01.txt:
* Some improvements in phrasing suggested by Mark Allman.
Changes from draft-ietf-tsvwg-newreno-00.txt: Changes from draft-ietf-tsvwg-newreno-00.txt:
* In Section 8, added a cautionary note about using the duplicate * In Section 8, added a cautionary note about using the duplicate
acknowledgment counter as a flag for whether Fast Recovery is in acknowledgment counter as a flag for whether Fast Recovery is in
effect. effect.
* In Section 8, added a note about pulling along "recover" with * In Section 8, added a note about pulling along "recover" with
"snd_una" when Fast Recovery is not in effect. "snd_una" when Fast Recovery is not in effect.
* Added a discussion in Section 6 about heuristics for distinguishing * Added a discussion in Section 6 about heuristics for distinguishing
skipping to change at page 6, line 43 skipping to change at page 6, line 45
"recover", then the ACK acknowledges all the intermediate "recover", then the ACK acknowledges all the intermediate
segments sent between the original transmission of the lost segments sent between the original transmission of the lost
segment and the receipt of the third duplicate ACK. Set cwnd to segment and the receipt of the third duplicate ACK. Set cwnd to
either (1) min (ssthresh, FlightSize + SMSS); or (2) ssthresh, either (1) min (ssthresh, FlightSize + SMSS); or (2) ssthresh,
where ssthresh is the value set in step 1; this is termed where ssthresh is the value set in step 1; this is termed
"deflating" the window. (We note that "FlightSize" in step 1 "deflating" the window. (We note that "FlightSize" in step 1
referred to the amount of data outstanding in step 1, when Fast referred to the amount of data outstanding in step 1, when Fast
Recovery was entered, while "FlightSize" in step 5 refers to the Recovery was entered, while "FlightSize" in step 5 refers to the
amount of data outstanding in step 5, when Fast Recovery is amount of data outstanding in step 5, when Fast Recovery is
exited.) If the second option is selected, the implementation exited.) If the second option is selected, the implementation
should take measures to avoid a possible burst of data, in case is encouraged to take measures to avoid a possible burst of
the amount of data outstanding in the network was much less than data, in case the amount of data outstanding in the network was
the new congestion window allows. A simple mechanism is to limit much less than the new congestion window allows. A simple
the number of data packets that can be sent in response to a mechanism is to limit the number of data packets that can be
single acknowledgement. (This is known as "maxburst_" in the NS sent in response to a single acknowledgement. (This is known
simulator). Exit the Fast Recovery procedure. as "maxburst_" in the NS simulator). Exit the Fast Recovery
procedure.
Partial acknowledgements: Partial acknowledgements:
If this ACK does *not* acknowledge all of the data up to and If this ACK does *not* acknowledge all of the data up to and
including "recover", then this is a partial ACK. In this case, including "recover", then this is a partial ACK. In this case,
retransmit the first unacknowledged segment. Deflate the retransmit the first unacknowledged segment. Deflate the
congestion window by the amount of new data acknowledged, then congestion window by the amount of new data acknowledged by the
add back one SMSS (if the partial ACK acknowledges at least one cumulative acknowledgement field. If the partial ACK
SMSS of new data) and send a new segment if permitted by the new acknowledges at least one SMSS of new data, then add back SMSS
value of cwnd. This "partial window deflation" attempts to bytes to the congestion window. As in Step 3, this artificially
ensure that, when Fast Recovery eventually ends, approximately inflates the congestion window in order to reflect the additional
ssthresh amount of data will be outstanding in the network. Do segment that has left the network. Send a new segment if
not exit the Fast Recovery procedure (i.e., if any duplicate ACKs permitted by the new value of cwnd. This "partial window
subsequently arrive, execute Steps 3 and 4 above). deflation" attempts to ensure that, when Fast Recovery eventually
ends, approximately ssthresh amount of data will be outstanding
in the network. Do not exit the Fast Recovery procedure (i.e.,
if any duplicate ACKs subsequently arrive, execute Steps 3 and
4 above).
For the first partial ACK that arrives during Fast Recovery, also For the first partial ACK that arrives during Fast Recovery, also
reset the retransmit timer. reset the retransmit timer. Timer management is discussed in
more detail in Section 4.
6) Retransmit timeouts: 6) Retransmit timeouts:
After a retransmit timeout, record the highest sequence number After a retransmit timeout, record the highest sequence number
transmitted in the variable "recover" and exit the Fast transmitted in the variable "recover" and exit the Fast
Recovery procedure if applicable. Recovery procedure if applicable.
Step 1 specifies a check that the Cumulative Acknowledgement field Step 1 specifies a check that the Cumulative Acknowledgement field
covers more than "recover". Because the acknowledgement field covers more than "recover". Because the acknowledgement field
contains the sequence number that the sender next expects to receive, contains the sequence number that the sender next expects to receive,
the acknowledgement "ack_number" covers more than "recover" when: the acknowledgement "ack_number" covers more than "recover" when:
ack_number - one > recover. ack_number - 1 > recover.
Note that in Step 5, the congestion window is deflated after a Note that in Step 5, the congestion window is deflated after a
partial acknowledgement is received. The congestion window was partial acknowledgement is received. The congestion window was
likely to have been inflated considerably when the partial likely to have been inflated considerably when the partial
acknowledgement was received. In addition, depending on the original acknowledgement was received. In addition, depending on the original
pattern of packet losses, the partial acknowledgement might pattern of packet losses, the partial acknowledgement might
acknowledge nearly a window of data. In this case, if the congestion acknowledge nearly a window of data. In this case, if the congestion
window was not deflated, the data sender might be able to send nearly window was not deflated, the data sender might be able to send nearly
a window of data back-to-back. a window of data back-to-back.
This document does not specify the sender's response to duplicate This document does not specify the sender's response to duplicate
ACKs when the Fast Retransmit/Fast Recovery algorithm is not invoked. ACKs when the Fast Retransmit/Fast Recovery algorithm is not invoked.
This is addressed in other documents, such as those describing the This is addressed in other documents, such as those describing the
Limited Transmit procedure [RFC3042]. This document also does not Limited Transmit procedure [RFC3042]. This document also does not
address issues of adjusting the duplicate acknowledgement threshold, address issues of adjusting the duplicate acknowledgement threshold,
but assumes the threshold of three duplicate acknowledgements but assumes the threshold specified in the IETF standards; the
currently specified in RFC 2581. current standard is RFC 2581, which specifies a threshold of three
duplicate acknowledgements.
As a final note, we would observe that in the absence of the SACK As a final note, we would observe that in the absence of the SACK
option, the data sender is working from limited information. When option, the data sender is working from limited information. When
the issue of recovery from multiple dropped packets from a single the issue of recovery from multiple dropped packets from a single
window of data is of particular importance, the best alternative window of data is of particular importance, the best alternative
would be to use the SACK option. would be to use the SACK option.
4. Resetting the Retransmit Timer in Response to Partial 4. Resetting the Retransmit Timer in Response to Partial
Acknowledgements. Acknowledgements.
skipping to change at page 9, line 47 skipping to change at page 10, line 8
SACK, recovering as quickly as slow-start introduces the likelihood SACK, recovering as quickly as slow-start introduces the likelihood
of unnecessarily retransmitting packets, and this could significantly of unnecessarily retransmitting packets, and this could significantly
complicate the recovery mechanisms. complicate the recovery mechanisms.
We note that the response to partial acknowledgements specified in We note that the response to partial acknowledgements specified in
Section 3 of this document and in RFC 2582 differs from the response Section 3 of this document and in RFC 2582 differs from the response
in [FF96], even though both approaches only retransmit one packet in in [FF96], even though both approaches only retransmit one packet in
response to a partial acknowledgement. Step 5 of Section 3 specifies response to a partial acknowledgement. Step 5 of Section 3 specifies
that the TCP sender responds to a partial ACK by deflating the that the TCP sender responds to a partial ACK by deflating the
congestion window by the amount of new data acknowledged, then adding congestion window by the amount of new data acknowledged, then adding
back one SMSS if the partial ACK acknowledges at least one SMSS of back SMSS bytes if the partial ACK acknowledges at least SMSS bytes
new data, and sending a new segment if permitted by the new value of of new data, and sending a new segment if permitted by the new value
cwnd. Thus, only one previously-sent packet is retransmitted in of cwnd. Thus, only one previously-sent packet is retransmitted in
response to each partial acknowledgement, but additional new packets response to each partial acknowledgement, but additional new packets
might be transmitted as well, depending on the amount of new data might be transmitted as well, depending on the amount of new data
acknowledged by the partial acknowledgement. In contrast, the acknowledged by the partial acknowledgement. In contrast, the
variant of NewReno illustrated in [FF96] simply set the congestion variant of NewReno illustrated in [FF96] simply set the congestion
window to ssthresh when a partial acknowledgement was received. The window to ssthresh when a partial acknowledgement was received. The
approach in [FF96] is more conservative, and does not attempt to approach in [FF96] is more conservative, and does not attempt to
accurately track the actual number of outstanding packets after a accurately track the actual number of outstanding packets after a
partial acknowledgement is received. While either of these partial acknowledgement is received. While either of these
approaches gives acceptable performance, the variant specified in approaches gives acceptable performance, the variant specified in
Section 3 recovers more smoothly when multiple packets are dropped Section 3 recovers more smoothly when multiple packets are dropped
skipping to change at page 12, line 25 skipping to change at page 12, line 34
If the ACK-based heuristic is used, then following the advancement of If the ACK-based heuristic is used, then following the advancement of
the cumulative acknowledgement field, the sender stores the value of the cumulative acknowledgement field, the sender stores the value of
previous cumulative acknowledgement as prev_highest_ack and stores previous cumulative acknowledgement as prev_highest_ack and stores
the latest cumulative ACK as highest_ack. In addition, the following the latest cumulative ACK as highest_ack. In addition, the following
step is performed if Step 1 in Section 3 fails, before proceeding to step is performed if Step 1 in Section 3 fails, before proceeding to
Step 1B. Step 1B.
1*) If the Cumulative Acknowledgement field didn't cover more than 1*) If the Cumulative Acknowledgement field didn't cover more than
"recover", check to see if the congestion window is greater "recover", check to see if the congestion window is greater
than one SMSS and the difference between highest_ack and than SMSS bytes and the difference between highest_ack and
prev_highest_ack is at most four SMSS. If true, duplicate prev_highest_ack is at most 4*SMSS bytes. If true, duplicate
ACKs indicate a lost segment (proceed to Step 1A in Section ACKs indicate a lost segment (proceed to Step 1A in Section
3). Otherwise, duplicate ACKs likely result from unnecessary 3). Otherwise, duplicate ACKs likely result from unnecessary
retransmissions (proceed to Step 1B in Section 3). retransmissions (proceed to Step 1B in Section 3).
The congestion window check serves to protect against fast retransmit The congestion window check serves to protect against fast retransmit
immediately after a retransmit timeout, similar to the immediately after a retransmit timeout, similar to the
"exitFastRetrans_" variable in NS. Examples of applying the ACK "exitFastRetrans_" variable in NS. Examples of applying the ACK
heuristic are in validation tests "./test-all-newreno heuristic are in validation tests "./test-all-newreno
newreno_rto_loss_ack" and "./test-all-newreno newreno_rto_dup_ack" in newreno_rto_loss_ack" and "./test-all-newreno newreno_rto_dup_ack" in
directory "tcl/test" of the NS simulator. directory "tcl/test" of the NS simulator.
If several ACKs are lost, the sender can see a jump in the cumulative If several ACKs are lost, the sender can see a jump in the cumulative
ACK of more than three segments and the heuristic can fail. A ACK of more than three segments and the heuristic can fail. A
validation test for this scenario is "./test-all-newreno validation test for this scenario is "./test-all-newreno
newreno_rto_loss_ackf". The ACK heuristic is more likely to fail if newreno_rto_loss_ackf". RFC 2581 recommends that a receiver should
the receiver uses delayed ACKs, because then a smaller number of ACK send duplicate ACKs for every out-of-order data packet, such as a
losses are needed to produce a sufficient jump in the cumulative ACK. data packet received during Fast Recovery. The ACK heuristic is more
likely to fail if the receiver does not follow this advice, because
then a smaller number of ACK losses are needed to produce a
sufficient jump in the cumulative ACK.
6.2. Timestamp Heuristic. 6.2. Timestamp Heuristic.
If this heuristic is used, the sender stores the timestamp of the If this heuristic is used, the sender stores the timestamp of the
last acknowledged segment. In addition, the second paragraph of step last acknowledged segment. In addition, the second paragraph of step
1 in Section 3 is replaced as follows: 1 in Section 3 is replaced as follows:
1**) If the Cumulative Acknowledgement field didn't cover more than 1**) If the Cumulative Acknowledgement field didn't cover more than
"recover", check to see if the echoed timestamp equals the "recover", check to see if the echoed timestamp equals the
stored timestamp. If true, duplicate ACKs indicate a lost stored timestamp. If true, duplicate ACKs indicate a lost
skipping to change at page 13, line 29 skipping to change at page 13, line 41
7. Implementation Issues for the Data Receiver 7. Implementation Issues for the Data Receiver
[RFC2581] specifies that "Out-of-order data segments SHOULD be [RFC2581] specifies that "Out-of-order data segments SHOULD be
acknowledged immediately, in order to accelerate loss recovery." acknowledged immediately, in order to accelerate loss recovery."
Neal Cardwell has noted that some data receivers do not send an Neal Cardwell has noted that some data receivers do not send an
immediate acknowledgement when they send a partial acknowledgment, immediate acknowledgement when they send a partial acknowledgment,
but instead wait first for their delayed acknowledgement timer to but instead wait first for their delayed acknowledgement timer to
expire [C98]. As [C98] notes, this severely limits the potential expire [C98]. As [C98] notes, this severely limits the potential
benefit from NewReno by delaying the receipt of the partial benefit from NewReno by delaying the receipt of the partial
acknowledgement at the data sender. Our recommendation is that the acknowledgement at the data sender. Echoing RFC 2581, our
data receiver send an immediate acknowledgement for an out-of-order recommendation is that the data receiver send an immediate
segment, even when that out-of-order segment fills a hole in the acknowledgement for an out-of-order segment, even when that out-of-
buffer. order segment fills a hole in the buffer.
8. Implementation Issues for the Data Sender 8. Implementation Issues for the Data Sender
In Section 3, Step 5 above, it is noted that implementations should In Section 3, Step 5 above, it is noted that implementations should
take measures to avoid a possible burst of data when leaving Fast take measures to avoid a possible burst of data when leaving Fast
Recovery, in case the amount of new data that the sender is eligible Recovery, in case the amount of new data that the sender is eligible
to send due to the new value of the congestion window is large. This to send due to the new value of the congestion window is large. This
can arise during NewReno when ACKs are lost or treated as pure window can arise during NewReno when ACKs are lost or treated as pure window
updates, thereby causing the sender to underestimate the number of updates, thereby causing the sender to underestimate the number of
new segments that can be sent during the recovery procedure. new segments that can be sent during the recovery procedure.
skipping to change at page 19, line 31 skipping to change at page 19, line 31
"http://www.acm.org/sigcomm/sigcomm97/program.html". "http://www.acm.org/sigcomm/sigcomm97/program.html".
[NS] The Network Simulator (NS). URL "http://www.isi.edu/nsnam/ns/". [NS] The Network Simulator (NS). URL "http://www.isi.edu/nsnam/ns/".
[PF01] J. Padhye and S. Floyd, "Identifying the TCP Behavior of Web [PF01] J. Padhye and S. Floyd, "Identifying the TCP Behavior of Web
Servers", June 2001, SIGCOMM 2001. Servers", June 2001, SIGCOMM 2001.
[RFC1323] V. Jacobson, R. Braden, and D. Borman, "TCP Extensions for [RFC1323] V. Jacobson, R. Braden, and D. Borman, "TCP Extensions for
High Performance,", RFC 1323, May 1992. High Performance,", RFC 1323, May 1992.
[RFC3517] E. Blanton, M. Allman, K. Fall, and L. Wang, "A
Conservative Selective Acknowledgment (SACK)-based Loss Recovery
Algorithm for TCP", RFC 3517, April 2003.
[RFC3522] R. Ludwig and M. Meyer, The Eifel Detection Algorithm for [RFC3522] R. Ludwig and M. Meyer, The Eifel Detection Algorithm for
TCP, RFC 3522, April 2003. TCP, RFC 3522, April 2003.
15. Security Considerations 15. Security Considerations
RFC 2581 discusses general security considerations concerning TCP RFC 2581 discusses general security considerations concerning TCP
congestion control. This document describes a specific algorithm congestion control. This document describes a specific algorithm
that conforms with the congestion control requirements of RFC 2581, that conforms with the congestion control requirements of RFC 2581,
and so those considerations apply to this algorithm, too. There are and so those considerations apply to this algorithm, too. There are
no known additional security concerns for this specific algorithm. no known additional security concerns for this specific algorithm.
 End of changes. 

This html diff was produced by rfcdiff 1.23, available from http://www.levkowetz.com/ietf/tools/rfcdiff/