draft-ietf-tcpm-1323bis-12.txt   draft-ietf-tcpm-1323bis-13.txt 
TCP Maintenance (TCPM) D. Borman TCP Maintenance (TCPM) D. Borman
Internet-Draft Quantum Corporation Internet-Draft Quantum Corporation
Intended status: Standards Track B. Braden Intended status: Standards Track B. Braden
Expires: November 15, 2013 University of Southern Expires: November 19, 2013 University of Southern
California California
V. Jacobson V. Jacobson
Packet Design Packet Design
R. Scheffenegger, Ed. R. Scheffenegger, Ed.
NetApp, Inc. NetApp, Inc.
May 14, 2013 May 18, 2013
TCP Extensions for High Performance TCP Extensions for High Performance
draft-ietf-tcpm-1323bis-12 draft-ietf-tcpm-1323bis-13
Abstract Abstract
This document specifies a set of TCP extensions to improve This document specifies a set of TCP extensions to improve
performance over paths with a large bandwidth * delay product and to performance over paths with a large bandwidth * delay product and to
provide reliable operation over very high-speed paths. It defines provide reliable operation over very high-speed paths. It defines
TCP options for scaled windows and timestamps. The timestamps are TCP options for scaled windows and timestamps. The timestamps are
used for two distinct mechanisms, RTTM (Round Trip Time Measurement) used for two distinct mechanisms, RTTM (Round Trip Time Measurement)
and PAWS (Protection Against Wrapped Sequences). and PAWS (Protection Against Wrapped Sequences).
skipping to change at page 1, line 43 skipping to change at page 1, line 43
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on November 15, 2013. This Internet-Draft will expire on November 19, 2013.
Copyright Notice Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as the Copyright (c) 2013 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 3, line 17 skipping to change at page 3, line 17
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1. TCP Performance . . . . . . . . . . . . . . . . . . . . . 4 1.1. TCP Performance . . . . . . . . . . . . . . . . . . . . . 4
1.2. TCP Reliability . . . . . . . . . . . . . . . . . . . . . 5 1.2. TCP Reliability . . . . . . . . . . . . . . . . . . . . . 5
1.3. Using TCP options . . . . . . . . . . . . . . . . . . . . 6 1.3. Using TCP options . . . . . . . . . . . . . . . . . . . . 6
1.4. Terminology . . . . . . . . . . . . . . . . . . . . . . . 7 1.4. Terminology . . . . . . . . . . . . . . . . . . . . . . . 7
2. TCP Window Scale Option . . . . . . . . . . . . . . . . . . . 8 2. TCP Window Scale Option . . . . . . . . . . . . . . . . . . . 8
2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 8 2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 8
2.2. Window Scale Option . . . . . . . . . . . . . . . . . . . 8 2.2. Window Scale Option . . . . . . . . . . . . . . . . . . . 8
2.3. Using the Window Scale Option . . . . . . . . . . . . . . 9 2.3. Using the Window Scale Option . . . . . . . . . . . . . . 9
2.4. Addressing Window Retraction . . . . . . . . . . . . . . . 10 2.4. Addressing Window Retraction . . . . . . . . . . . . . . . 10
3. TCP Timestamp Option . . . . . . . . . . . . . . . . . . . . . 12 3. TCP Timestamps option . . . . . . . . . . . . . . . . . . . . 12
3.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 12 3.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 12
3.2. Timestamp Option . . . . . . . . . . . . . . . . . . . . . 12 3.2. Timestamps option . . . . . . . . . . . . . . . . . . . . 12
3.3. The RTTM Mechanism . . . . . . . . . . . . . . . . . . . . 13 3.3. The RTTM Mechanism . . . . . . . . . . . . . . . . . . . . 13
3.4. Updating the RTO value . . . . . . . . . . . . . . . . . . 15 3.4. Updating the RTO value . . . . . . . . . . . . . . . . . . 15
3.5. Which Timestamp to Echo . . . . . . . . . . . . . . . . . 15 3.5. Which Timestamp to Echo . . . . . . . . . . . . . . . . . 15
4. PAWS -- Protection Against Wrapped Sequence Numbers . . . . . 18 4. PAWS - Protection Against Wrapped Sequence Numbers . . . . . . 18
4.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 18 4.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 18
4.2. The PAWS Mechanism . . . . . . . . . . . . . . . . . . . . 18 4.2. The PAWS Mechanism . . . . . . . . . . . . . . . . . . . . 18
4.3. Basic PAWS Algorithm . . . . . . . . . . . . . . . . . . . 19 4.3. Basic PAWS Algorithm . . . . . . . . . . . . . . . . . . . 19
4.4. Timestamp Clock . . . . . . . . . . . . . . . . . . . . . 21 4.4. Timestamp Clock . . . . . . . . . . . . . . . . . . . . . 21
4.5. Outdated Timestamps . . . . . . . . . . . . . . . . . . . 23 4.5. Outdated Timestamps . . . . . . . . . . . . . . . . . . . 23
4.6. Header Prediction . . . . . . . . . . . . . . . . . . . . 23 4.6. Header Prediction . . . . . . . . . . . . . . . . . . . . 23
4.7. IP Fragmentation . . . . . . . . . . . . . . . . . . . . . 25 4.7. IP Fragmentation . . . . . . . . . . . . . . . . . . . . . 25
4.8. Duplicates from Earlier Incarnations of Connection . . . . 25 4.8. Duplicates from Earlier Incarnations of Connection . . . . 25
5. Conclusions and Acknowledgements . . . . . . . . . . . . . . . 25 5. Conclusions and Acknowledgements . . . . . . . . . . . . . . . 25
6. Security Considerations . . . . . . . . . . . . . . . . . . . 26 6. Security Considerations . . . . . . . . . . . . . . . . . . . 26
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 27 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 27
8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 28 8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 28
8.1. Normative References . . . . . . . . . . . . . . . . . . . 28 8.1. Normative References . . . . . . . . . . . . . . . . . . . 28
8.2. Informative References . . . . . . . . . . . . . . . . . . 28 8.2. Informative References . . . . . . . . . . . . . . . . . . 28
Appendix A. Implementation Suggestions . . . . . . . . . . . . . 31 Appendix A. Implementation Suggestions . . . . . . . . . . . . . 31
Appendix B. Duplicates from Earlier Connection Incarnations . . . 32 Appendix B. Duplicates from Earlier Connection Incarnations . . . 32
B.1. System Crash with Loss of State . . . . . . . . . . . . . 32 B.1. System Crash with Loss of State . . . . . . . . . . . . . 32
B.2. Closing and Reopening a Connection . . . . . . . . . . . . 33 B.2. Closing and Reopening a Connection . . . . . . . . . . . . 33
Appendix C. Summary of Notation . . . . . . . . . . . . . . . . . 34 Appendix C. Summary of Notation . . . . . . . . . . . . . . . . . 34
Appendix D. Event Processing Summary . . . . . . . . . . . . . . 35 Appendix D. Event Processing Summary . . . . . . . . . . . . . . 35
Appendix E. Timestamps Edge Cases . . . . . . . . . . . . . . . . 40 Appendix E. Timestamps Edge Cases . . . . . . . . . . . . . . . . 41
Appendix F. Window Retraction Example . . . . . . . . . . . . . . 41 Appendix F. Window Retraction Example . . . . . . . . . . . . . . 41
Appendix G. Changes from RFC 1323 . . . . . . . . . . . . . . . . 42 Appendix G. RTO calculation modification . . . . . . . . . . . . 42
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 44 Appendix H. Changes from RFC 1323 . . . . . . . . . . . . . . . . 43
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 45
1. Introduction 1. Introduction
The TCP protocol [RFC0793] was designed to operate reliably over The TCP protocol [RFC0793] was designed to operate reliably over
almost any transmission medium regardless of transmission rate, almost any transmission medium regardless of transmission rate,
delay, corruption, duplication, or reordering of segments. Over the delay, corruption, duplication, or reordering of segments. Over the
years, advances in networking technology have resulted in ever-higher years, advances in networking technology have resulted in ever-higher
transmission speeds, and the fastest paths are well beyond the domain transmission speeds, and the fastest paths are well beyond the domain
for which TCP was originally engineered. for which TCP was originally engineered.
This document defines a set of modest extensions to TCP to extend the This document defines a set of modest extensions to TCP to extend the
domain of its application to match the increasing network capability. domain of its application to match the increasing network capability.
It is an update to and obsoletes [RFC1323], which in turn is based It is an update to and obsoletes [RFC1323], which in turn is based
upon and obsoletes [RFC1072] and [RFC1185]. upon and obsoletes [RFC1072] and [RFC1185].
Changes between [RFC1323] and this document are detailed in Changes between [RFC1323] and this document are detailed in
Appendix G. Appendix H.
For brevity, the full discussions of the merits and history behind For brevity, the full discussions of the merits and history behind
the TCP options defined within this document have been omitted. the TCP options defined within this document have been omitted.
[RFC1323] should be consulted for reference. It is recommended that [RFC1323] should be consulted for reference. It is recommended that
a modern TCP stack implement and make use of the extensions a modern TCP stack implement and make use of the extensions
described in this document. described in this document.
1.1. TCP Performance 1.1. TCP Performance
TCP performance problems arise when the bandwidth * delay product is TCP performance problems arise when the bandwidth * delay product is
skipping to change at page 6, line 15 skipping to change at page 6, line 15
A possible fix for the problem of cycling the sequence space would be A possible fix for the problem of cycling the sequence space would be
to increase the size of the TCP sequence number field. For example, to increase the size of the TCP sequence number field. For example,
the sequence number field (and also the acknowledgment field) could the sequence number field (and also the acknowledgment field) could
be expanded to 64 bits. This could be done either by changing the be expanded to 64 bits. This could be done either by changing the
TCP header or by means of an additional option. TCP header or by means of an additional option.
Section 4 presents a different mechanism, which we call PAWS Section 4 presents a different mechanism, which we call PAWS
(Protection Against Wrapped Sequence numbers), to extend TCP (Protection Against Wrapped Sequence numbers), to extend TCP
reliability to transfer rates well beyond the foreseeable upper limit reliability to transfer rates well beyond the foreseeable upper limit
of network bandwidths. PAWS uses the TCP timestamp option defined in of network bandwidths. PAWS uses the TCP Timestamps option defined
Section 3.2 to protect against old duplicates from the same in Section 3.2 to protect against old duplicates from the same
connection. connection.
1.3. Using TCP options 1.3. Using TCP options
The extensions defined in this document all use TCP options. The extensions defined in this document all use TCP options.
When [RFC1323] was published, there was concern that some buggy TCP When [RFC1323] was published, there was concern that some buggy TCP
implementation might be crashed by the first appearance of an option implementation might be crashed by the first appearance of an option
on a non-<SYN> segment. However, bugs like that can lead to DOS on a non-<SYN> segment. However, bugs like that can lead to DOS
attacks against a TCP. Research has shown that most TCP attacks against a TCP. Research has shown that most TCP
skipping to change at page 6, line 38 skipping to change at page 6, line 38
segments ([Medina04], [Medina05]). But it is still prudent to be segments ([Medina04], [Medina05]). But it is still prudent to be
conservative in what you send, and avoiding buggy TCP implementations conservative in what you send, and avoiding buggy TCP implementations
is not the only reason for negotiating TCP options on <SYN> segments. is not the only reason for negotiating TCP options on <SYN> segments.
The window scale option negotiates fundamental parameters of the TCP The window scale option negotiates fundamental parameters of the TCP
session. Therefore, it is only sent during the initial handshake. session. Therefore, it is only sent during the initial handshake.
Furthermore, the window scale option will be sent in a <SYN,ACK> Furthermore, the window scale option will be sent in a <SYN,ACK>
segment only if the corresponding option was received in the initial segment only if the corresponding option was received in the initial
<SYN> segment. <SYN> segment.
The timestamp option may appear in any data or <ACK> segment, adding The Timestamps option may appear in any data or <ACK> segment, adding
12 bytes to the 20-byte TCP header. It is required that this TCP 12 bytes to the 20-byte TCP header. It is required that this TCP
option will be sent on all non-<SYN> segments after an exchange of option will be sent on all non-<SYN> segments after an exchange of
options on the <SYN> segments has indicated that both sides options on the <SYN> segments has indicated that both sides
understand this extension. understand this extension.
Research has shown that the use of the Timestamp option to arrive at Research has shown that the use of the Timestamps option to arrive at
an optimal retransmission timeout value has only limited benefit an optimal retransmission timeout value has only limited benefit
([Allman99]). However, there are other uses of the Timestamp option, ([Allman99]). However, there are other uses of the Timestamps option,
such as the Eifel mechanism [RFC3522], [RFC4015], and PAWS (see such as the Eifel mechanism [RFC3522], [RFC4015], and PAWS (see
Section 4) which improve overall TCP security and performance. The Section 4) which improve overall TCP security and performance. The
extra header bandwidth used by this option should be evaluated for extra header bandwidth used by this option should be evaluated for
the gains in performance and security in an actual deployment. the gains in performance and security in an actual deployment.
Appendix A contains a recommended layout of the options in TCP Appendix A contains a recommended layout of the options in TCP
headers to achieve reasonable data field alignment. headers to achieve reasonable data field alignment.
Finally, we observe that most of the mechanisms defined in this Finally, we observe that most of the mechanisms defined in this
document are important for LFN's and/or very high-speed networks. document are important for LFN's and/or very high-speed networks.
skipping to change at page 10, line 9 skipping to change at page 10, line 9
SND.WND = SEG.WND << Snd.Wind.Shift SND.WND = SEG.WND << Snd.Wind.Shift
(assuming the other conditions of [RFC0793] are met, and using the (assuming the other conditions of [RFC0793] are met, and using the
"C" notation "<<" for left-shift). "C" notation "<<" for left-shift).
o The window field (SEG.WND) of every outgoing segment, with the o The window field (SEG.WND) of every outgoing segment, with the
exception of <SYN> segments, is right-shifted by Rcv.Wind.Shift exception of <SYN> segments, is right-shifted by Rcv.Wind.Shift
bits: bits:
SND.WND = RCV.WND >> Rcv.Wind.Shift SEG.WND = RCV.WND >> Rcv.Wind.Shift
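The two shift rules above can be sketched in C as follows. This is a minimal illustration and not part of either draft revision: the function names are hypothetical, the limit of 14 on the shift count comes from the window scale option definition, and the clamp on the outgoing value is a defensive assumption added here.

```c
#include <assert.h>
#include <stdint.h>

/* Largest shift count allowed by the window scale option. */
#define TCP_MAX_WINSHIFT 14

/* Incoming segment: SND.WND = SEG.WND << Snd.Wind.Shift.
 * (Hypothetical helper name; not taken from the draft.) */
static uint32_t scale_incoming_window(uint16_t seg_wnd, unsigned snd_wind_shift)
{
    assert(snd_wind_shift <= TCP_MAX_WINSHIFT);
    return (uint32_t)seg_wnd << snd_wind_shift;
}

/* Outgoing non-<SYN> segment: SEG.WND = RCV.WND >> Rcv.Wind.Shift.
 * The clamp to 16 bits is an assumption of this sketch; it only matters
 * if RCV.WND exceeds what the negotiated shift can represent. */
static uint16_t scale_outgoing_window(uint32_t rcv_wnd, unsigned rcv_wind_shift)
{
    assert(rcv_wind_shift <= TCP_MAX_WINSHIFT);
    uint32_t scaled = rcv_wnd >> rcv_wind_shift;
    return scaled > 0xFFFFu ? (uint16_t)0xFFFFu : (uint16_t)scaled;
}
```

Note that the right shift discards the low-order Rcv.Wind.Shift bits, so the window actually advertised can be smaller than RCV.WND by up to 2^Rcv.Wind.Shift - 1 bytes.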
TCP determines if a data segment is "old" or "new" by testing whether TCP determines if a data segment is "old" or "new" by testing whether
its sequence number is within 2^31 bytes of the left edge of the its sequence number is within 2^31 bytes of the left edge of the
window, and if it is not, discarding the data as "old". To ensure window, and if it is not, discarding the data as "old". To ensure
that new data is never mistakenly considered old and vice versa, the that new data is never mistakenly considered old and vice versa, the
left edge of the sender's window has to be at most 2^31 away from the left edge of the sender's window has to be at most 2^31 away from the
right edge of the receiver's window. Similarly with the sender's right edge of the receiver's window. Similarly with the sender's
right edge and receiver's left edge. Since the right and left edges right edge and receiver's left edge. Since the right and left edges
of either the sender's or receiver's window differ by the window of either the sender's or receiver's window differ by the window
size, and since the sender and receiver windows can be out of phase size, and since the sender and receiver windows can be out of phase
skipping to change at page 12, line 5 skipping to change at page 12, line 5
the most recent <ACK>. the most recent <ACK>.
4) On first retransmission, or if the sequence number is out-of- 4) On first retransmission, or if the sequence number is out-of-
window by less than 2^Rcv.Wind.Shift then do normal window by less than 2^Rcv.Wind.Shift then do normal
retransmission(s) without regard to receiver window as long as retransmission(s) without regard to receiver window as long as
the original segment was in window when it was sent. the original segment was in window when it was sent.
5) Subsequent retransmissions MAY only be sent, if they are within 5) Subsequent retransmissions MAY only be sent, if they are within
the window announced by the most recent <ACK>. the window announced by the most recent <ACK>.
3. TCP Timestamp Option 3. TCP Timestamps option
3.1. Introduction 3.1. Introduction
TCP measures the round trip time (RTT), primarily for the purpose of TCP measures the round trip time (RTT), primarily for the purpose of
arriving at a reasonable value for the Retransmission Timeout (RTO) arriving at a reasonable value for the Retransmission Timeout (RTO)
timer interval. Accurate and current RTT estimates are necessary to timer interval. Accurate and current RTT estimates are necessary to
adapt to changing traffic conditions, while a conservative estimate adapt to changing traffic conditions, while a conservative estimate
of the RTO interval is necessary to minimize spurious RTOs. of the RTO interval is necessary to minimize spurious RTOs.
When [RFC1323] was originally written, it was perceived that taking When [RFC1323] was originally written, it was perceived that taking
skipping to change at page 12, line 27 skipping to change at page 12, line 27
would contribute to reduce spurious RTOs, while maintaining the would contribute to reduce spurious RTOs, while maintaining the
timeliness of necessary RTOs. At the time, RTO was also the only timeliness of necessary RTOs. At the time, RTO was also the only
mechanism to make use of the measured RTT. It has been shown that mechanism to make use of the measured RTT. It has been shown that
taking more RTT samples has only a very limited effect on optimizing taking more RTT samples has only a very limited effect on optimizing
RTOs [Allman99]. RTOs [Allman99].
This document makes a clear distinction between the round trip time This document makes a clear distinction between the round trip time
measurement (RTTM) mechanism, and subsequent mechanisms using the RTT measurement (RTTM) mechanism, and subsequent mechanisms using the RTT
signal as input, such as RTO (see Section 3.4). signal as input, such as RTO (see Section 3.4).
The timestamp option is important when large receive windows are The Timestamps option is important when large receive windows are
used, to allow the use of the PAWS mechanism (see Section 4). used, to allow the use of the PAWS mechanism (see Section 4).
Furthermore, the option is useful for all TCP's, since it simplifies Furthermore, the option is useful for all TCP's, since it simplifies
the sender and allows the use of additional optimizations such as the sender and allows the use of additional optimizations such as
Eifel ([RFC3522], [RFC4015]) and others. Eifel ([RFC3522], [RFC4015]) and others.
3.2. Timestamp Option 3.2. Timestamps option
TCP is a symmetric protocol, allowing data to be sent at any time in TCP is a symmetric protocol, allowing data to be sent at any time in
either direction, and therefore timestamp echoing may occur in either either direction, and therefore timestamp echoing may occur in either
direction. For simplicity and symmetry, we specify that timestamps direction. For simplicity and symmetry, we specify that timestamps
always be sent and echoed in both directions. For efficiency, we always be sent and echoed in both directions. For efficiency, we
combine the timestamp and timestamp reply fields into a single TCP combine the timestamp and timestamp reply fields into a single TCP
Timestamp Option. Timestamps option.
TCP Timestamp Option (TSopt): TCP Timestamps option (TSopt):
Kind: 8 Kind: 8
Length: 10 bytes Length: 10 bytes
+-------+-------+---------------------+---------------------+ +-------+-------+---------------------+---------------------+
|Kind=8 | 10 | TS Value (TSval) |TS Echo Reply (TSecr)| |Kind=8 | 10 | TS Value (TSval) |TS Echo Reply (TSecr)|
+-------+-------+---------------------+---------------------+ +-------+-------+---------------------+---------------------+
1 1 4 4 1 1 4 4
The Timestamp Option carries two four-byte timestamp fields. The The Timestamps option carries two four-byte timestamp fields. The
Timestamp Value field (TSval) contains the current value of the Timestamp Value field (TSval) contains the current value of the
timestamp clock of the TCP sending the option. timestamp clock of the TCP sending the option.
The Timestamp Echo Reply field (TSecr) is valid if the ACK bit is set The Timestamp Echo Reply field (TSecr) is valid if the ACK bit is set
in the TCP header; if it is valid, it echoes a timestamp value that in the TCP header; if it is valid, it echoes a timestamp value that
was sent by the remote TCP in the TSval field of a Timestamp option. was sent by the remote TCP in the TSval field of a Timestamps option.
When TSecr is not valid, its value MUST be zero. However, a value of When TSecr is not valid, its value MUST be zero. However, a value of
zero does not imply TSecr being invalid. The TSecr value will zero does not imply TSecr being invalid. The TSecr value will
generally be from the most recent Timestamp Option that was received; generally be from the most recent Timestamps option that was
however, there are exceptions that are explained below. received; however, there are exceptions that are explained below.
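The option layout described above can be sketched in C as follows. The helper names are hypothetical and the code is only an illustration of the 10-byte format; it assumes the usual network byte order for the two 32-bit fields and leaves any padding to the caller.

```c
#include <stddef.h>
#include <stdint.h>

#define TCPOPT_TIMESTAMP  8   /* Kind */
#define TCPOLEN_TIMESTAMP 10  /* Length in bytes, including Kind and Length */

/* Store and load a 32-bit value in network byte order. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Write a Timestamps option into buf and return the number of bytes used.
 * When the ACK bit is not set, TSecr is not valid and is sent as zero. */
static size_t tsopt_encode(uint8_t buf[TCPOLEN_TIMESTAMP],
                           uint32_t tsval, uint32_t tsecr, int ack_set)
{
    buf[0] = TCPOPT_TIMESTAMP;
    buf[1] = TCPOLEN_TIMESTAMP;
    put_be32(buf + 2, tsval);
    put_be32(buf + 6, ack_set ? tsecr : 0);
    return TCPOLEN_TIMESTAMP;
}

/* Return 1 and fill tsval/tsecr if buf starts with a well-formed
 * Timestamps option, otherwise return 0. */
static int tsopt_decode(const uint8_t *buf, size_t len,
                        uint32_t *tsval, uint32_t *tsecr)
{
    if (len < TCPOLEN_TIMESTAMP ||
        buf[0] != TCPOPT_TIMESTAMP || buf[1] != TCPOLEN_TIMESTAMP)
        return 0;
    *tsval = get_be32(buf + 2);
    *tsecr = get_be32(buf + 6);
    return 1;
}
```

A decoder cannot tell from a zero TSecr alone whether the field is valid; as stated above, validity follows from the ACK bit in the TCP header.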
A TCP MAY send the Timestamp option (TSopt) in an initial <SYN> A TCP MAY send the Timestamps option (TSopt) in an initial <SYN>
segment (i.e., segment containing a SYN bit and no ACK bit), and MAY segment (i.e., segment containing a SYN bit and no ACK bit), and MAY
send a TSopt in other segments only if it received a TSopt in the send a TSopt in other segments only if it received a TSopt in the
initial <SYN> or <SYN,ACK> segment for the connection. initial <SYN> or <SYN,ACK> segment for the connection.
Once TSopt has been successfully negotiated (sent and received) Once TSopt has been successfully negotiated (sent and received)
during the <SYN>, <SYN,ACK> exchange, TSopt MUST be sent in every during the <SYN>, <SYN,ACK> exchange, TSopt MUST be sent in every
non-<RST> segment for the duration of the connection, and SHOULD be non-<RST> segment for the duration of the connection, and SHOULD be
sent in a <RST> segment (see Section 4.2 for details). If a non- sent in a <RST> segment (see Section 4.2 for details). If a non-
<RST> segment is received without a TSopt, a TCP MAY drop the segment <RST> segment is received without a TSopt, a TCP MAY drop the segment
and send an <ACK> for the last in-sequence segment. A TCP MUST NOT and send an <ACK> for the last in-sequence segment. A TCP MUST NOT
skipping to change at page 13, line 49 skipping to change at page 13, line 49
TSopt is required for the two mechanisms described in sections 3.3 TSopt is required for the two mechanisms described in sections 3.3
and 4.2. There are also other mechanisms that rely on the presence and 4.2. There are also other mechanisms that rely on the presence
of the TSopt, e.g. [RFC3522]. If a TCP stopped sending TSopt at any of the TSopt, e.g. [RFC3522]. If a TCP stopped sending TSopt at any
time during an established session, it interferes with these time during an established session, it interferes with these
mechanisms. This update to [RFC1323] describes explicitly the mechanisms. This update to [RFC1323] describes explicitly the
previous assumption (see Section 4.2), that each TCP segment must previous assumption (see Section 4.2), that each TCP segment must
have TSopt, once negotiated. have TSopt, once negotiated.
3.3. The RTTM Mechanism 3.3. The RTTM Mechanism
RTTM places a Timestamp Option in every segment, with a TSval that is RTTM places a Timestamps option in every segment, with a TSval that
obtained from a (virtual) "timestamp clock". Values of this clock is obtained from a (virtual) "timestamp clock". Values of this clock
MUST be at least approximately proportional to real time, in order to MUST be at least approximately proportional to real time, in order to
measure actual RTT. measure actual RTT.
These TSval values are echoed in TSecr values in the reverse These TSval values are echoed in TSecr values in the reverse
direction. The difference between a received TSecr value and the direction. The difference between a received TSecr value and the
current timestamp clock value provides a RTT measurement. current timestamp clock value provides an RTT measurement.
When timestamps are used, every segment that is received will contain When timestamps are used, every segment that is received will contain
a TSecr value. However, these values cannot all be used to update a TSecr value. However, these values cannot all be used to update
the measured RTT. The following example illustrates why. It shows a the measured RTT. The following example illustrates why. It shows a
one-way data flow with segments arriving in sequence without loss. one-way data flow with segments arriving in sequence without loss.
Here A, B, C... represent data blocks occupying successive blocks of Here A, B, C... represent data blocks occupying successive blocks of
sequence numbers, and ACK(A),... represent the corresponding sequence numbers, and ACK(A),... represent the corresponding
cumulative acknowledgments. The two timestamp fields of the cumulative acknowledgments. The two timestamp fields of the
Timestamp Option are shown symbolically as <TSval=x,TSecr=y>. Each Timestamps option are shown symbolically as <TSval=x,TSecr=y>. Each
TSecr field contains the value most recently received in a TSval TSecr field contains the value most recently received in a TSval
field. field.
TCP A TCP B TCP A TCP B
<A,TSval=1,TSecr=120> -----> <A,TSval=1,TSecr=120> ----->
<---- <ACK(A),TSval=127,TSecr=1> <---- <ACK(A),TSval=127,TSecr=1>
<B,TSval=5,TSecr=127> -----> <B,TSval=5,TSecr=127> ----->
skipping to change at page 15, line 7 skipping to change at page 15, line 7
the averaged RTT measurement only if the segment advances the averaged RTT measurement only if the segment advances
the left edge of the send window, i.e. SND.UNA is the left edge of the send window, i.e. SND.UNA is
increased. increased.
Since TCP B is not sending data, the data segment C does not Since TCP B is not sending data, the data segment C does not
acknowledge any new data when it arrives at B. Thus, the inflated acknowledge any new data when it arrives at B. Thus, the inflated
RTTM measurement is not used to update B's RTTM measurement. RTTM measurement is not used to update B's RTTM measurement.
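The measurement and the rule for when to use it can be sketched in C as follows. This is an illustration only: the function and parameter names are hypothetical, the clock is assumed to be a 32-bit tick counter, and the result is in ticks of that clock.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if sequence number a is later than b in modulo-2^32 arithmetic. */
static bool seq_gt(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

/* Derive an RTT sample from a received segment.
 * now_ticks: current value of the local timestamp clock
 * tsecr:     TSecr field of the received segment
 * seg_ack:   acknowledgment number of the received segment
 * snd_una:   SND.UNA before processing the segment
 * Returns true and stores the sample only if the segment advances the
 * left edge of the send window (SND.UNA would increase). */
static bool rttm_sample(uint32_t now_ticks, uint32_t tsecr,
                        uint32_t seg_ack, uint32_t snd_una,
                        uint32_t *rtt_ticks)
{
    if (!seq_gt(seg_ack, snd_una))
        return false;               /* no new data acknowledged: skip */
    *rtt_ticks = now_ticks - tsecr; /* unsigned subtraction handles wrap */
    return true;
}
```

Because the subtraction is performed on unsigned 32-bit values, the sample remains correct when the timestamp clock wraps between sending a segment and receiving its echo.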
3.4. Updating the RTO value 3.4. Updating the RTO value
[Ekstroem04] and [Floyd05] have highlighted the problem that an [Ludwig00] and [Floyd05] have highlighted the problem that an
unmodified RTO calculation, which is updated with per-packet RTT unmodified RTO calculation, which is updated with per-packet RTT
samples, will truncate the path history too soon. This can lead to samples, will truncate the path history too soon. This can lead to
an increase in spurious retransmissions, when the path properties an increase in spurious retransmissions, when the path properties
vary in the order of a few RTTs, but a high number of RTT samples are vary in the order of a few RTTs, but a high number of RTT samples are
taken on a much shorter timescale. taken on a much shorter timescale.
Implementers should note that with timestamps multiple RTTMs can be Implementers should note that with timestamps multiple RTTMs can be
taken per RTT. The [RFC6298] RTO estimator has weighting factors, taken per RTT. The [RFC6298] RTO estimator has weighting factors,
alpha and beta, based on an implicit assumption that at most one RTTM alpha and beta, based on an implicit assumption that at most one RTTM
will be sampled per RTT. When using multiple RTTMs per RTT to update will be sampled per RTT. When multiple RTTMs per RTT are available
the RTO estimator, the weighting factor SHOULD be decreased to take to update the RTO estimator, this implicit assumption must be
into account the more frequent RTTMs. considered. An implementation suggestion is detailed in Appendix G.
For example, an implementation could choose to
o just use one sample per RTT to update the RTO estimator, or
o vary the gain based on the congestion window, or
o take an average of all the RTT measurements (and the maximum of
the variance) received over one RTT,
and then use that value to update the RTO estimator. This document
does not prescribe any particular method for modifying the RTO
estimator.
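One way to reduce the weighting factors when several RTTMs arrive per RTT is sketched below in C. This is an assumption-laden illustration, not a method prescribed by either draft revision: the structure, the function names, and the choice to divide alpha and beta by a caller-supplied `samples_per_rtt` are all hypothetical. The base constants (alpha = 1/8, beta = 1/4, K = 4, and the one-second lower bound) are those of the [RFC6298] estimator; times are in seconds.

```c
/* State of a smoothed RTT estimator in the style of [RFC6298]. */
struct rto_est {
    double srtt;        /* smoothed RTT */
    double rttvar;      /* RTT variation */
    int    have_sample; /* zero until the first measurement arrives */
};

/* Feed one RTT measurement r into the estimator. samples_per_rtt is the
 * caller's estimate of how many measurements arrive per RTT; the gains
 * are divided by it (an assumption of this sketch). */
static void rto_update(struct rto_est *e, double r, unsigned samples_per_rtt)
{
    if (samples_per_rtt == 0)
        samples_per_rtt = 1;
    if (!e->have_sample) {
        e->srtt = r;
        e->rttvar = r / 2;
        e->have_sample = 1;
        return;
    }
    double alpha = 0.125 / samples_per_rtt;
    double beta  = 0.25  / samples_per_rtt;
    double err = e->srtt - r;
    if (err < 0)
        err = -err;
    /* RTTVAR is updated before SRTT, using the old SRTT. */
    e->rttvar = (1 - beta) * e->rttvar + beta * err;
    e->srtt   = (1 - alpha) * e->srtt + alpha * r;
}

/* RTO = SRTT + max(G, 4 * RTTVAR), with a lower bound of one second.
 * G is the clock granularity. */
static double rto_value(const struct rto_est *e, double clock_granularity)
{
    double k = 4 * e->rttvar;
    if (k < clock_granularity)
        k = clock_granularity;
    double rto = e->srtt + k;
    return rto < 1.0 ? 1.0 : rto;
}
```

With `samples_per_rtt` set to 1 the update reduces to the unmodified estimator; larger values lengthen the history that the estimator retains, which is the effect the paragraph above asks for.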
3.5. Which Timestamp to Echo 3.5. Which Timestamp to Echo
If more than one Timestamp Option is received before a reply segment If more than one Timestamps option is received before a reply segment
is sent, the TCP must choose only one of the TSvals to echo, ignoring is sent, the TCP must choose only one of the TSvals to echo, ignoring
the others. To minimize the state kept in the receiver (i.e., the the others. To minimize the state kept in the receiver (i.e., the
number of unprocessed TSvals), the receiver should be required to number of unprocessed TSvals), the receiver should be required to
retain at most one timestamp in the connection control block. retain at most one timestamp in the connection control block.
There are three situations to consider: There are three situations to consider:
(A) Delayed ACKs. (A) Delayed ACKs.
Many TCP's acknowledge only every second segment out of a group Many TCP's acknowledge only every second segment out of a group
skipping to change at page 16, line 25 skipping to change at page 16, line 12
SHOULD therefore contain the timestamp from the most recent SHOULD therefore contain the timestamp from the most recent
segment that advanced the window. segment that advanced the window.
The same situation occurs if segments are re-ordered by the The same situation occurs if segments are re-ordered by the
network. network.
(C) A filled hole in the sequence space. (C) A filled hole in the sequence space.
The segment that fills the hole and advances the window The segment that fills the hole and advances the window
represents the most recent measurement of the network represents the most recent measurement of the network
characteristics. A RTT computed from an earlier segment would characteristics. An RTT computed from an earlier segment would
probably include the sender's retransmit time-out, badly biasing probably include the sender's retransmit time-out, badly biasing
the sender's average RTT estimate. Thus, the timestamp from the the sender's average RTT estimate. Thus, the timestamp from the
latest segment (which filled the hole) MUST be echoed. latest segment (which filled the hole) MUST be echoed.
An algorithm that covers all three cases is described in the An algorithm that covers all three cases is described in the
following rules for Timestamp Option processing on a synchronized following rules for Timestamps option processing on a synchronized
connection: connection:
(1) The connection state is augmented with two 32-bit slots: (1) The connection state is augmented with two 32-bit slots:
TS.Recent holds a timestamp to be echoed in TSecr whenever a TS.Recent holds a timestamp to be echoed in TSecr whenever a
segment is sent, and Last.ACK.sent holds the ACK field from the segment is sent, and Last.ACK.sent holds the ACK field from the
last segment sent. Last.ACK.sent will equal RCV.NXT except when last segment sent. Last.ACK.sent will equal RCV.NXT except when
<ACK>s have been delayed. <ACK>s have been delayed.
(2) If: (2) If:
skipping to change at page 18, line 5 skipping to change at page 18, line 5
2 2
<E, TSval=5> -------------------> <E, TSval=5> ------------------->
2 2
<---- <ACK(C), TSecr=2> <---- <ACK(C), TSecr=2>
2 2
<D, TSval=4> -------------------> <D, TSval=4> ------------------->
4 4
<---- <ACK(E), TSecr=4> <---- <ACK(E), TSecr=4>
(etc) (etc)
4. PAWS -- Protection Against Wrapped Sequence Numbers 4. PAWS - Protection Against Wrapped Sequence Numbers
4.1. Introduction 4.1. Introduction
Section 4.2 describes a simple mechanism to reject old duplicate Section 4.2 describes a simple mechanism to reject old duplicate
segments that might corrupt an open TCP connection; we call this segments that might corrupt an open TCP connection; we call this
mechanism PAWS (Protection Against Wrapped Sequence numbers). PAWS mechanism PAWS (Protection Against Wrapped Sequence numbers). PAWS
operates within a single TCP connection, using state that is saved in operates within a single TCP connection, using state that is saved in
the connection control block. Section 4.8 and Appendix G discuss the the connection control block. Section 4.8 and Appendix H discuss the
implications of the PAWS mechanism for avoiding old duplicates from implications of the PAWS mechanism for avoiding old duplicates from
previous incarnations of the same connection. previous incarnations of the same connection.
4.2. The PAWS Mechanism 4.2. The PAWS Mechanism
PAWS uses the same TCP Timestamp Option as the RTTM mechanism PAWS uses the same TCP Timestamps option as the RTTM mechanism
described earlier, and assumes that every received TCP segment described earlier, and assumes that every received TCP segment
(including data and <ACK> segments) contains a timestamp SEG.TSval (including data and <ACK> segments) contains a timestamp SEG.TSval
whose values are monotonically non-decreasing in time. The basic whose values are monotonically non-decreasing in time. The basic
idea is that a segment can be discarded as an old duplicate if it is idea is that a segment can be discarded as an old duplicate if it is
received with a timestamp SEG.TSval less than some timestamp recently received with a timestamp SEG.TSval less than some timestamp recently
received on this connection. received on this connection.
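
The comparison this mechanism relies on can be illustrated with a short sketch. It is not taken from the draft: the function names are hypothetical, and the half-space rule (s is less than t when 0 < (t - s) mod 2^32 < 2^31) is the usual reading of comparing timestamps the same way as TCP sequence numbers. The validity of TS.Recent and the exemption for <RST> segments are deliberately left out.

```python
# Illustrative sketch only; helper names are hypothetical, not from the draft.
TS_MODULUS = 1 << 32      # timestamps live in a modular 32-bit space
HALF_SPACE = 1 << 31      # "less than" only holds within half the space


def ts_less_than(s: int, t: int) -> bool:
    """Return True if timestamp s is 'less than' t in modular 32-bit space."""
    diff = (t - s) % TS_MODULUS
    return 0 < diff < HALF_SPACE


def is_old_duplicate(seg_tsval: int, ts_recent: int) -> bool:
    """Basic PAWS idea: SEG.TSval older than a recently received timestamp."""
    return ts_less_than(seg_tsval, ts_recent)
```

Because the subtraction is modular, a timestamp just before the 32-bit wrap still compares as older than one just after it, which is the property the mechanism needs.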
In both the PAWS and the RTTM mechanism, the "timestamps" are 32-bit In both the PAWS and the RTTM mechanism, the "timestamps" are 32-bit
unsigned integers in a modular 32-bit space. Thus, "less than" is unsigned integers in a modular 32-bit space. Thus, "less than" is
defined the same way it is for TCP sequence numbers, and the same defined the same way it is for TCP sequence numbers, and the same
skipping to change at page 19, line 23 skipping to change at page 19, line 23
synchronized connection. Duplicate <SYN> and <SYN,ACK> segments synchronized connection. Duplicate <SYN> and <SYN,ACK> segments
received when there is no connection will be discarded by the normal received when there is no connection will be discarded by the normal
3-way handshake and sequence number checks of TCP. 3-way handshake and sequence number checks of TCP.
[RFC1323] recommended that <RST> segments NOT carry timestamps, and [RFC1323] recommended that <RST> segments NOT carry timestamps, and
that they be acceptable regardless of their timestamp. At that time, that they be acceptable regardless of their timestamp. At that time,
the thinking was that old duplicate <RST> segments should be the thinking was that old duplicate <RST> segments should be
exceedingly unlikely, and their cleanup function should take exceedingly unlikely, and their cleanup function should take
precedence over timestamps. More recently, discussions about various precedence over timestamps. More recently, discussions about various
blind attacks on TCP connections have raised the suggestion that if blind attacks on TCP connections have raised the suggestion that if
the timestamp option is present, SEG.TSecr could be used to provide the Timestamps option is present, SEG.TSecr could be used to provide
stricter acceptance tests for <RST> segments. While still under stricter acceptance tests for <RST> segments. While still under
discussion, to enable research into this area it is now RECOMMENDED discussion, to enable research into this area it is now RECOMMENDED
that when generating a <RST>, that if the segment causing the <RST> that when generating a <RST>, that if the segment causing the <RST>
to be generated contained a timestamp option, that the <RST> also to be generated contained a Timestamps option, that the <RST> also
contain a timestamp option. In the <RST> segment, SEG.TSecr SHOULD contain a Timestamps option. In the <RST> segment, SEG.TSecr SHOULD
be set to SEG.TSval from the incoming segment and SEG.TSval SHOULD be be set to SEG.TSval from the incoming segment and SEG.TSval SHOULD be
set to zero. If a <RST> is being generated because of a user abort, set to zero. If a <RST> is being generated because of a user abort,
and Snd.TS.OK is set, then a timestamp option SHOULD be included in and Snd.TS.OK is set, then a Timestamps option SHOULD be included in
the <RST>. When a <RST> segment is received, it MUST NOT be the <RST>. When a <RST> segment is received, it MUST NOT be
subjected to PAWS checks, and information from the timestamp option subjected to PAWS checks, and information from the Timestamps option
MUST NOT be used to update connection state information. SEG.TSecr MUST NOT be used to update connection state information. SEG.TSecr
MAY be used to provide stricter <RST> acceptance checks. MAY be used to provide stricter <RST> acceptance checks.
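
As a rough illustration of the recommendation above for an <RST> generated in response to an incoming segment, the sketch below picks the timestamp fields to send. The function name and the use of None to mean "no Timestamps option" are assumptions made for this example; the user-abort case is not covered.

```python
# Illustrative sketch only; the function name and None convention are assumptions.
from typing import Optional, Tuple


def rst_timestamps(incoming_tsval: Optional[int]) -> Optional[Tuple[int, int]]:
    """Return (TSval, TSecr) for an <RST> answering an incoming segment.

    incoming_tsval is SEG.TSval of the segment that caused the <RST>,
    or None if that segment carried no Timestamps option.
    """
    if incoming_tsval is None:
        # No Timestamps option on the incoming segment: send none either.
        return None
    # TSval is set to zero; TSecr echoes the incoming SEG.TSval.
    return (0, incoming_tsval & 0xFFFFFFFF)
```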
4.3. Basic PAWS Algorithm 4.3. Basic PAWS Algorithm
The PAWS algorithm REQUIRES the following processing to be performed The PAWS algorithm REQUIRES the following processing to be performed
on all incoming segments for a synchronized connection. Also, PAWS on all incoming segments for a synchronized connection. Also, PAWS
processing MUST take precedence over the regular TCP acceptability processing MUST take precedence over the regular TCP acceptability
check (Section 3.3 in [RFC0793]), which is performed after check (Section 3.3 in [RFC0793]), which is performed after
verification of the received timestamp option: verification of the received Timestamps option:
R1) If there is a Timestamp Option in the arriving segment, R1) If there is a Timestamps option in the arriving segment,
SEG.TSval < TS.Recent, TS.Recent is valid (see later discussion) SEG.TSval < TS.Recent, TS.Recent is valid (see later discussion)
and the RST bit is not set, then treat the arriving segment as and the RST bit is not set, then treat the arriving segment as
not acceptable: not acceptable:
Send an acknowledgement in reply as specified in [RFC0793] Send an acknowledgement in reply as specified in [RFC0793]
page 69 and drop the segment. page 69 and drop the segment.
Note: it is necessary to send an <ACK> segment in order to Note: it is necessary to send an <ACK> segment in order to
retain TCP's mechanisms for detecting and recovering from retain TCP's mechanisms for detecting and recovering from
half-open connections. For example, see Figure 10 of half-open connections. For example, see Figure 10 of
skipping to change at page 20, line 47 skipping to change at page 20, line 47
retransmitted as segment B.2 (using the latest timestamp), it retransmitted as segment B.2 (using the latest timestamp), it
fills the hole and causes all the segments through Z to be fills the hole and causes all the segments through Z to be
acknowledged and passed to the user. The timestamps of the queued acknowledged and passed to the user. The timestamps of the queued
segments are *not* inspected again at this time, since they have segments are *not* inspected again at this time, since they have
already been accepted. When B.2 is accepted, TS.Recent is set to already been accepted. When B.2 is accepted, TS.Recent is set to
2. 2.
This rule allows reasonable performance under loss. A full window of This rule allows reasonable performance under loss. A full window of
data is in transit at all times, and after a loss a full window less data is in transit at all times, and after a loss a full window less
one segment will show up out-of-sequence to be queued at the receiver one segment will show up out-of-sequence to be queued at the receiver
(e.g., up to ~2^30 bytes of data); the timestamp option must not (e.g., up to ~2^30 bytes of data); the Timestamps option must not
result in discarding this data. result in discarding this data.
In certain unlikely circumstances, the algorithm of rules R1-R5 could In certain unlikely circumstances, the algorithm of rules R1-R5 could
lead to discarding some segments unnecessarily, as shown in the lead to discarding some segments unnecessarily, as shown in the
following example: following example:
Suppose again that segments: A.1, B.1, C.1, ..., Z.1 have been Suppose again that segments: A.1, B.1, C.1, ..., Z.1 have been
sent in sequence and that segment B.1 has been lost. Furthermore, sent in sequence and that segment B.1 has been lost. Furthermore,
suppose delivery of some of C.1, ... Z.1 is delayed until *after* suppose delivery of some of C.1, ... Z.1 is delayed until *after*
the retransmission B.2 arrives at the receiver. These delayed the retransmission B.2 arrives at the receiver. These delayed
skipping to change at page 27, line 28 skipping to change at page 27, line 28
* If a Window Scale option is removed from a <SYN,ACK> segment, * If a Window Scale option is removed from a <SYN,ACK> segment,
the end hosts will not negotiate the window scaling factor the end hosts will not negotiate the window scaling factor
correctly. Middleboxes must not remove or modify the Window correctly. Middleboxes must not remove or modify the Window
Scale option from <SYN,ACK> segments. Scale option from <SYN,ACK> segments.
* If a stateful firewall uses the window field to detect whether * If a stateful firewall uses the window field to detect whether
a received segment is inside the current window, and does not a received segment is inside the current window, and does not
support the Window Scale option, it will not be able to support the Window Scale option, it will not be able to
correctly determine whether or not a packet is in the window. correctly determine whether or not a packet is in the window.
These middle boxes must also support the Window Scale option These middle boxes must also support the Window Scale option
and apply the scaling when processing segments. If the window and apply the scale factor when processing segments. If the
scale cannot be determined, it must not do window based window scale factor cannot be determined, it must not do window
processing. based processing.
* If the Timestamp option is removed from the <SYN> or <SYN,ACK> * If the Timestamps option is removed from the <SYN> or <SYN,ACK>
segment, high speed connections that need PAWS would not have segment, high speed connections that need PAWS would not have
that protection. Middleboxes should not remove the Timestamp that protection. Middleboxes should not remove the Timestamps
option. option.
Implementations that depend on PAWS could provide a mechanism for the Implementations that depend on PAWS could provide a mechanism for the
application to determine whether or not PAWS is in use on the application to determine whether or not PAWS is in use on the
connection, and choose to terminate the connection if that protection connection, and choose to terminate the connection if that protection
doesn't exist. This is not just to protect the connection against doesn't exist. This is not just to protect the connection against
middleboxes that might remove the Timestamp option, but also against middleboxes that might remove the Timestamps option, but also against
remote hosts that do not have Timestamp support. remote hosts that do not have Timestamp support.
7. IANA Considerations 7. IANA Considerations
This document has no actions for IANA. This document has no actions for IANA.
8. References 8. References
8.1. Normative References 8.1. Normative References
[RFC0793] Postel, J., "Transmission Control Protocol", STD 7, [RFC0793] Postel, J., "Transmission Control Protocol", STD 7,
RFC 793, September 1981. RFC 793, September 1981.
[RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, [RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
November 1990. November 1990.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
skipping to change at page 29, line 21 skipping to change at page 29, line 24
[Jain86] Jain, R., "Divergence of Timeout Algorithms for Packet [Jain86] Jain, R., "Divergence of Timeout Algorithms for Packet
Retransmissions", Proc. Fifth Phoenix Conf. on Comp. and Retransmissions", Proc. Fifth Phoenix Conf. on Comp. and
Comm., Scottsdale, Arizona, March 1986, Comm., Scottsdale, Arizona, March 1986,
<http://arxiv.org/ftp/cs/papers/9809/9809097.pdf>. <http://arxiv.org/ftp/cs/papers/9809/9809097.pdf>.
[Karn87] Karn, P. and C. Partridge, "Estimating Round-Trip Times in [Karn87] Karn, P. and C. Partridge, "Estimating Round-Trip Times in
Reliable Transport Protocols", Proc. SIGCOMM '87, Reliable Transport Protocols", Proc. SIGCOMM '87,
August 1987. August 1987.
[Ludwig00]
Ludwig, R. and K. Sklower, "The Eifel Retransmission
Timer", ACM SIGCOMM Computer Communication Review Volume
30 Issue 3, July 2000, <http://ccr.sigcomm.org/archive/
2000/july00/LudwigFinal.pdf>.
[Martin03] [Martin03]
Martin, D., "[Tsvwg] RFC 1323.bis", Message to the tsvwg Martin, D., "[Tsvwg] RFC 1323.bis", Message to the tsvwg
mailing list, September 2003, <http://www.ietf.org/ mailing list, September 2003, <http://www.ietf.org/
mail-archive/web/tsvwg/current/msg04435.html>. mail-archive/web/tsvwg/current/msg04435.html>.
[Mathis08] [Mathis08]
Mathis, M., "[tcpm] Example of 1323 window retraction Mathis, M., "[tcpm] Example of 1323 window retraction
problem", Message to the tcpm mailing list, March 2008, problem", Message to the tcpm mailing list, March 2008,
<http://www.ietf.org/mail-archive/web/tcpm/current/ <http://www.ietf.org/mail-archive/web/tcpm/current/
msg03564.html>. msg03564.html>.
skipping to change at page 31, line 22 skipping to change at page 31, line 29
Protocol Connection Management", Computer Networks, Vol. Protocol Connection Management", Computer Networks, Vol.
5, 1981. 5, 1981.
[Zhang86] Zhang, L., "Why TCP Timers Don't Work Well", Proc. SIGCOMM [Zhang86] Zhang, L., "Why TCP Timers Don't Work Well", Proc. SIGCOMM
'86, Stowe, VT, August 1986. '86, Stowe, VT, August 1986.
Appendix A. Implementation Suggestions Appendix A. Implementation Suggestions
TCP Option Layout TCP Option Layout
The following layouts are recommended for sending options on non- The following layout is recommended for sending options on non-
<SYN> segments, to achieve maximum feasible alignment of 32-bit <SYN> segments, to achieve maximum feasible alignment of 32-bit
and 64-bit machines. and 64-bit machines.
+--------+--------+--------+--------+ +--------+--------+--------+--------+
| NOP | NOP | TSopt | 10 | | NOP | NOP | TSopt | 10 |
+--------+--------+--------+--------+ +--------+--------+--------+--------+
| TSval timestamp | | TSval timestamp |
+--------+--------+--------+--------+ +--------+--------+--------+--------+
| TSecr timestamp | | TSecr timestamp |
+--------+--------+--------+--------+ +--------+--------+--------+--------+
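
The 12-byte layout drawn above could be built and parsed as in the following sketch. The function names are hypothetical; the option kind values (NOP = 1, Timestamps = 8) and the length of 10 are the assigned TCP option values, and network byte order is assumed throughout.

```python
# Illustrative sketch only; function names are hypothetical.
import struct

TCPOPT_NOP = 1           # No-Operation option kind
TCPOPT_TIMESTAMP = 8     # Timestamps option kind
TCPOLEN_TIMESTAMP = 10   # kind + length + TSval + TSecr

_LAYOUT = "!BBBBII"      # NOP, NOP, kind, length, TSval, TSecr (big-endian)


def pack_tsopt(tsval: int, tsecr: int) -> bytes:
    """Build the padded 12-byte option block: NOP NOP TSopt 10 TSval TSecr."""
    return struct.pack(_LAYOUT, TCPOPT_NOP, TCPOPT_NOP, TCPOPT_TIMESTAMP,
                       TCPOLEN_TIMESTAMP, tsval & 0xFFFFFFFF,
                       tsecr & 0xFFFFFFFF)


def unpack_tsopt(block: bytes) -> tuple:
    """Parse a block in exactly this layout and return (TSval, TSecr)."""
    nop1, nop2, kind, length, tsval, tsecr = struct.unpack(_LAYOUT, block)
    expected = (TCPOPT_NOP, TCPOPT_NOP, TCPOPT_TIMESTAMP, TCPOLEN_TIMESTAMP)
    if (nop1, nop2, kind, length) != expected:
        raise ValueError("not in the recommended NOP/NOP/TSopt layout")
    return tsval, tsecr
```

The two leading NOPs pad the 10-byte option to 12 bytes, so TSval and TSecr each start on a 4-byte boundary within the option area.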
skipping to change at page 32, line 39 skipping to change at page 32, line 46
TCP's quiet time of one MSL upon system startup handles the loss of TCP's quiet time of one MSL upon system startup handles the loss of
connection state in a system crash/restart. For an explanation, see connection state in a system crash/restart. For an explanation, see
for example "When to Keep Quiet" in the TCP protocol specification for example "When to Keep Quiet" in the TCP protocol specification
[RFC0793]. The MSL that is required here does not depend upon the [RFC0793]. The MSL that is required here does not depend upon the
transfer speed. The current TCP MSL of 2 minutes seemed acceptable transfer speed. The current TCP MSL of 2 minutes seemed acceptable
as an operational compromise, when many host systems used to take as an operational compromise, when many host systems used to take
this long to boot after a crash. Current host systems can boot this long to boot after a crash. Current host systems can boot
considerably faster. considerably faster.
The timestamp option may be used to ease the MSL requirements (or to The Timestamps option may be used to ease the MSL requirements (or to
provide additional security against data corruption). If timestamps provide additional security against data corruption). If timestamps
are being used and if the timestamp clock can be guaranteed to be are being used and if the timestamp clock can be guaranteed to be
monotonic over a system crash/restart, i.e., if the first value of monotonic over a system crash/restart, i.e., if the first value of
the sender's timestamp clock after a crash/restart can be guaranteed the sender's timestamp clock after a crash/restart can be guaranteed
to be greater than the last value before the restart, then a quiet to be greater than the last value before the restart, then a quiet
time is unnecessary. time is unnecessary.
To dispense totally with the quiet time would require that the host To dispense totally with the quiet time would require that the host
clock be synchronized to a time source that is stable over the crash/ clock be synchronized to a time source that is stable over the crash/
restart period, with an accuracy of one timestamp clock tick or restart period, with an accuracy of one timestamp clock tick or
skipping to change at page 34, line 30 skipping to change at page 34, line 37
only necessary to keep one quantity per remote host, regardless only necessary to keep one quantity per remote host, regardless
of the number of simultaneous connections to that host. of the number of simultaneous connections to that host.
Appendix C. Summary of Notation Appendix C. Summary of Notation
The following notation has been used in this document. The following notation has been used in this document.
Options Options
WSopt: TCP Window Scale Option WSopt: TCP Window Scale Option
TSopt: TCP Timestamp Option TSopt: TCP Timestamps option
Option Fields Option Fields
shift.cnt: Window scale byte in WSopt shift.cnt: Window scale byte in WSopt
TSval: 32-bit Timestamp Value field in TSopt TSval: 32-bit Timestamp Value field in TSopt
TSecr: 32-bit Timestamp Reply field in TSopt TSecr: 32-bit Timestamp Reply field in TSopt
Option Fields in Current Segment Option Fields in Current Segment
SEG.TSval: TSval field from TSopt in current segment SEG.TSval: TSval field from TSopt in current segment
skipping to change at page 36, line 18 skipping to change at page 36, line 26
... ...
ESTABLISHED STATE ESTABLISHED STATE
CLOSE-WAIT STATE CLOSE-WAIT STATE
Segmentize the buffer and send it with a piggybacked Segmentize the buffer and send it with a piggybacked
acknowledgment (acknowledgment value = RCV.NXT). ... acknowledgment (acknowledgment value = RCV.NXT). ...
If the urgent flag is set ... If the urgent flag is set ...
If the Snd.TS.OK flag is set, then include the TCP Timestamp If the Snd.TS.OK flag is set, then include the TCP Timestamps
Option <TSval=Snd.TSclock,TSecr=TS.Recent> in each data option <TSval=Snd.TSclock,TSecr=TS.Recent> in each data
segment. segment.
Scale the receive window for transmission in the segment Scale the receive window for transmission in the segment
header: header:
SEG.WND = (RCV.WND >> Rcv.Wind.Shift). SEG.WND = (RCV.WND >> Rcv.Wind.Shift).
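
A small worked example of this scaling step, as a sketch with hypothetical helper names: the sender shifts its receive window right before placing it in the 16-bit header field, and the peer shifts the received value left by the same amount. The maximum shift of 14 is the limit for the Window Scale option; the clamp to 0xFFFF is a defensive assumption of this example, not something stated in the text.

```python
# Illustrative sketch only; helper names and the clamp are assumptions.
MAX_WINDOW_SHIFT = 14    # largest shift.cnt permitted for Window Scale


def window_for_header(rcv_wnd: int, rcv_wind_shift: int) -> int:
    """SEG.WND = RCV.WND >> Rcv.Wind.Shift, kept within the 16-bit field."""
    if not 0 <= rcv_wind_shift <= MAX_WINDOW_SHIFT:
        raise ValueError("shift count out of range")
    return min(rcv_wnd >> rcv_wind_shift, 0xFFFF)


def window_from_header(seg_wnd: int, shift: int) -> int:
    """What the peer reconstructs from the header field."""
    return seg_wnd << shift
```

With a 1,000,000-byte window and a shift of 7, the header carries 7812 and the peer reconstructs 999,936 bytes: the low-order 7 bits are lost, so the advertised window has a granularity of 128 bytes.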
SEGMENT ARRIVES SEGMENT ARRIVES
... ...
skipping to change at page 39, line 43 skipping to change at page 40, line 6
... ...
If an incoming segment is not acceptable, an acknowledgment If an incoming segment is not acceptable, an acknowledgment
should be sent in reply (unless the RST bit is set, if so should be sent in reply (unless the RST bit is set, if so
drop the segment and return): drop the segment and return):
<SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>
Last.ACK.sent is set to SEG.ACK of the acknowledgment. If Last.ACK.sent is set to SEG.ACK of the acknowledgment. If
the Snd.Echo.OK bit is on, include the Timestamp Option the Snd.Echo.OK bit is on, include the Timestamps option
<TSval=Snd.TSclock,TSecr=TS.Recent> in this <ACK> segment. <TSval=Snd.TSclock,TSecr=TS.Recent> in this <ACK> segment.
Set Last.ACK.sent to SEG.ACK and send the <ACK> segment. Set Last.ACK.sent to SEG.ACK and send the <ACK> segment.
After sending the acknowledgment, drop the unacceptable After sending the acknowledgment, drop the unacceptable
segment and return. segment and return.
... ...
fifth check the ACK field. fifth check the ACK field.
if the ACK bit is off drop the segment and return. if the ACK bit is off drop the segment and return.
skipping to change at page 42, line 16 skipping to change at page 42, line 27
In most stacks it is at least partially obscured when the window size In most stacks it is at least partially obscured when the window size
is larger than some small number of segments because the stacks is larger than some small number of segments because the stacks
prefer to announce windows that are an integral number of segments, prefer to announce windows that are an integral number of segments,
rounded up to the next scale factor. This plus silly window rounded up to the next scale factor. This plus silly window
suppression tends to cause less frequent, larger window updates. If suppression tends to cause less frequent, larger window updates. If
the window was rounded down to a segment size there is more the window was rounded down to a segment size there is more
opportunity to advance the window, the BEYOND BUFFER case above, opportunity to advance the window, the BEYOND BUFFER case above,
rather than retracting it. rather than retracting it.
Appendix G. Changes from RFC 1323 Appendix G. RTO calculation modification
Taking multiple RTT samples per window would shorten the history
calculated by the RTO mechanism in [RFC6298], and the below algorithm
aims to maintain a similar history as originally intended by
[RFC6298].
It is roughly known how many samples a congestion window worth of
data will yield, not accounting for ACK compression, and ACK losses.
Such events will result in more history of the path being reflected
in the final value for RTO, and are uncritical. This modification
will ensure that a similar amount of time is taken into account for
the RTO estimation, regardless of how many samples are taken per
window:
ExpectedSamples = ceiling(FlightSize / (SMSS * 2))
alpha' = alpha / ExpectedSamples
beta' = beta / ExpectedSamples
Note that the factor 2 in ExpectedSamples is due to "Delayed ACKs".
Instead of using alpha and beta in the algorithm of [RFC6298], use
alpha' and beta' instead:
RTTVAR <- (1 - beta') * RTTVAR + beta' * |SRTT - R'|
SRTT <- (1 - alpha') * SRTT + alpha' * R'
(for each sample R')
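
The modification above might be coded along the following lines. This is a sketch rather than a reference implementation: the function names are hypothetical, alpha = 1/8 and beta = 1/4 are the values given in [RFC6298], and clamping ExpectedSamples to at least 1 (to avoid dividing by zero when FlightSize is 0) is an assumption of this example.

```python
# Illustrative sketch only; names and the clamp to 1 are assumptions.
import math

ALPHA = 1 / 8   # SRTT gain from RFC 6298
BETA = 1 / 4    # RTTVAR gain from RFC 6298


def expected_samples(flight_size: int, smss: int) -> int:
    """ExpectedSamples = ceiling(FlightSize / (SMSS * 2)), at least 1."""
    return max(1, math.ceil(flight_size / (smss * 2)))


def update_estimator(srtt: float, rttvar: float, sample: float,
                     flight_size: int, smss: int) -> tuple:
    """Apply one RTT sample R' using the scaled gains alpha' and beta'."""
    n = expected_samples(flight_size, smss)
    alpha_prime = ALPHA / n
    beta_prime = BETA / n
    # RTTVAR is updated first, using the SRTT from before this sample.
    rttvar = (1 - beta_prime) * rttvar + beta_prime * abs(srtt - sample)
    srtt = (1 - alpha_prime) * srtt + alpha_prime * sample
    return srtt, rttvar
```

When FlightSize is two segments or less, ExpectedSamples is 1 and the update reduces to the unmodified [RFC6298] computation; with 16 segments in flight the gains are divided by 8, so eight samples in one window weigh roughly as much as one sample did before.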
Appendix H. Changes from RFC 1323
Several important updates and clarifications to the specification in Several important updates and clarifications to the specification in
RFC 1323 are made in this document. The technical changes are RFC 1323 are made in this document. The technical changes are
summarized below: summarized below:
(a) Section 2.4 was added describing the unavoidable window (a) A wrong reference to SND.WND was corrected to SEG.WND in
Section 2.3
(b) Section 2.4 was added describing the unavoidable window
retraction issue, and explicitly describing the mitigation steps retraction issue, and explicitly describing the mitigation steps
necessary. necessary.
(b) In Section 3.2 the wording how timestamp option negotiation is (c) In Section 3.2 the wording how the Timestamps option negotiation
to be performed was updated with RFC2119 wording. Further, a is to be performed was updated with RFC2119 wording. Further, a
number of paragraphs were added to clarify the expected behavior number of paragraphs were added to clarify the expected behavior
with a compliant implementation using TSopt, as RFC1323 left with a compliant implementation using TSopt, as RFC1323 left
room for interpretation - e.g. potential late enablement of room for interpretation - e.g. potential late enablement of
TSopt. TSopt.
(c) The description of which TSecr values can be used to update the (d) The description of which TSecr values can be used to update the
measured RTT has been clarified. Specifically, with timestamps, measured RTT has been clarified. Specifically, with timestamps,
the Karn algorithm [Karn87] is disabled. The Karn algorithm the Karn algorithm [Karn87] is disabled. The Karn algorithm
disables all RTT measurements during retransmission, since it is disables all RTT measurements during retransmission, since it is
ambiguous whether the <ACK> is for the original segment, or the ambiguous whether the <ACK> is for the original segment, or the
retransmitted segment. With timestamps, that ambiguity is retransmitted segment. With timestamps, that ambiguity is
removed since the TSecr in the <ACK> will contain the TSval from removed since the TSecr in the <ACK> will contain the TSval from
whichever data segment made it to the destination. whichever data segment made it to the destination.
(d) RTTM update processing explicitly excludes segments not updating (e) RTTM update processing explicitly excludes segments not updating
SND.UNA. The original text could be interpreted to allow taking SND.UNA. The original text could be interpreted to allow taking
RTT samples when SACK acknowledges some new, non-continuous RTT samples when SACK acknowledges some new, non-continuous
data. data.
(e) In RFC1323, section 3.4, step (2) of the algorithm to control (f) In RFC1323, section 3.4, step (2) of the algorithm to control
which timestamp is echoed was incorrect in two regards: which timestamp is echoed was incorrect in two regards:
(1) It failed to update TS.recent for a retransmitted segment (1) It failed to update TS.recent for a retransmitted segment
that resulted from a lost <ACK>. that resulted from a lost <ACK>.
(2) It failed if SEG.LEN = 0. (2) It failed if SEG.LEN = 0.
In the new algorithm, the case of SEG.TSval >= TS.recent is In the new algorithm, the case of SEG.TSval >= TS.recent is
included for consistency with the PAWS test. included for consistency with the PAWS test.
(f) It is now recommended that Timestamp Options be included in (g) It is now recommended that the Timestamps option is included in
<RST> segments if the incoming segment contained a Timestamp <RST> segments if the incoming segment contained a Timestamps
Option. option.
(g) <RST> segments are explicitly excluded from PAWS processing. (h) <RST> segments are explicitly excluded from PAWS processing.
(h) Added text to clarify the precedence between regular TCP (i) Added text to clarify the precedence between regular TCP
[RFC0793] and timestamp/PAWS [RFCxxxx] processing. Discussion [RFC0793] and this document Timestamps option / PAWS processing.
about combined acceptability checks are ongoing. Discussion about combined acceptability checks are ongoing.
(i) Snd.TSoffset and Snd.TSclock variables have been added. (j) Snd.TSoffset and Snd.TSclock variables have been added.
Snd.TSclock is the sum of my.TSclock and Snd.TSoffset. This Snd.TSclock is the sum of my.TSclock and Snd.TSoffset. This
allows the starting points for timestamp values to be randomized allows the starting points for timestamp values to be randomized
on a per-connection basis. Setting Snd.TSoffset to zero yields on a per-connection basis. Setting Snd.TSoffset to zero yields
the same results as [RFC1323]. the same results as [RFC1323].
(j) Appendix A has been expanded with information about the TCP (k) Appendix A has been expanded with information about the TCP
Urgent Pointer. An earlier revision contained text around the Urgent Pointer. An earlier revision contained text around the
TCP MSS option, which was split off into [RFC6691]. TCP MSS option, which was split off into [RFC6691].
(k) One correction was made to the Event Processing Summary in (l) One correction was made to the Event Processing Summary in
Appendix D. In SEND CALL/ESTABLISHED STATE, RCV.WND is used to Appendix D. In SEND CALL/ESTABLISHED STATE, RCV.WND is used to
fill in the SEG.WND value, not SND.WND. fill in the SEG.WND value, not SND.WND.
(m) Appendix G was added to exemplify how an RTO calculation might
be updated to properly take the much higher RTT sampling
frequency enabled by the Timestamps option into account.
Editorial changes of the document, that don't impact the Editorial changes of the document, that don't impact the
implementation or function of the mechanisms described in this implementation or function of the mechanisms described in this
document include: document include:
(a) Removed much of the discussion in Section 1 to streamline the (a) Removed much of the discussion in Section 1 to streamline the
document. However, detailed examples and discussions in document. However, detailed examples and discussions in
Section 2, Section 3 and Section 4 are kept as guideline for Section 2, Section 3 and Section 4 are kept as guideline for
implementers. implementers.
(b) Removed references to "new" options, as the options were (b) Removed references to "new" options, as the options were
introduced in [RFC1323] already. Changed the text in introduced in [RFC1323] already. Changed the text in
Section 1.3 to specifically address TS and WS options. Section 1.3 to specifically address TS and WS options.
(c) Section 1.4 was added for RFC2119 wording. Normative text was (c) Section 1.4 was added for RFC2119 wording. Normative text was
updated with the appropriate phrases. updated with the appropriate phrases.
(d) Added < > brackets to mark specific types of segments, and (d) Added < > brackets to mark specific types of segments, and
replaced most occurrences of "packet" with "segment", where TCP replaced most occurrences of "packet" with "segment", where TCP
segments are referred. segments are referred to.
(e) Updated the text in section 3 to take into account what has been (e) Updated the text in Section 3 to take into account what has been
learned since [RFC1323]. learned since [RFC1323].
(f) Removed the list of changes between RFC 1323 and prior versions. (f) Removed the list of changes between [RFC1323] and prior
These changes are mentioned in Appendix C of RFC 1323. versions. These changes are mentioned in Appendix C of
[RFC1323].
(g) Moved Appendix "Changes" at the end of the appendices for easier (g) Moved Appendix Changes from RFC 1323 at the end of the
lookup. In addition, the entries were split into a technical appendices for easier lookup. In addition, the entries were
and an editorial part, and sorted to roughly correspond with the split into a technical and an editorial part, and sorted to
sections in the text where they apply. roughly correspond with the sections in the text where they
apply.
Authors' Addresses Authors' Addresses
David Borman David Borman
Quantum Corporation Quantum Corporation
Mendota Heights MN 55120 Mendota Heights MN 55120
USA USA
Email: david.borman@quantum.com Email: david.borman@quantum.com
 End of changes. 70 change blocks. 
99 lines changed or deleted 135 lines changed or added
