draft-ietf-tcpm-1323bis-15.txt   draft-ietf-tcpm-1323bis-16.txt 
TCP Maintenance (TCPM) D. Borman TCP Maintenance (TCPM) D. Borman
Internet-Draft Quantum Corporation Internet-Draft Quantum Corporation
Intended status: Standards Track B. Braden Intended status: Standards Track B. Braden
Expires: February 7, 2014 University of Southern Expires: May 16, 2014 University of Southern
California California
V. Jacobson V. Jacobson
Google, Inc. Google, Inc.
R. Scheffenegger, Ed. R. Scheffenegger, Ed.
NetApp, Inc. NetApp, Inc.
August 6, 2013 November 12, 2013
TCP Extensions for High Performance TCP Extensions for High Performance
draft-ietf-tcpm-1323bis-15 draft-ietf-tcpm-1323bis-16
Abstract Abstract
This document specifies a set of TCP extensions to improve This document specifies a set of TCP extensions to improve
performance over paths with a large bandwidth * delay product and to performance over paths with a large bandwidth * delay product and to
provide reliable operation over very high-speed paths. It defines provide reliable operation over very high-speed paths. It defines
TCP options for scaled windows and timestamps. The timestamps are TCP options for scaled windows and timestamps. The timestamps can be
used for two distinct mechanisms, RTTM (Round Trip Time Measurement) used for two distinct mechanisms, PAWS (Protection Against Wrapped
and PAWS (Protection Against Wrapped Sequences). Sequences) and RTTM (Round Trip Time Measurement).
This document obsoletes RFC 1323 and describes changes from it. This document obsoletes RFC 1323 and describes changes from it.
Status of this Memo Status of this Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on February 7, 2014. This Internet-Draft will expire on May 16, 2014.
Copyright Notice Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as the Copyright (c) 2013 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 3, line 20 skipping to change at page 3, line 20
1.3. Using TCP options . . . . . . . . . . . . . . . . . . . . 6 1.3. Using TCP options . . . . . . . . . . . . . . . . . . . . 6
1.4. Terminology . . . . . . . . . . . . . . . . . . . . . . . 7 1.4. Terminology . . . . . . . . . . . . . . . . . . . . . . . 7
2. TCP Window Scale Option . . . . . . . . . . . . . . . . . . . 8 2. TCP Window Scale Option . . . . . . . . . . . . . . . . . . . 8
2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 8 2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 8
2.2. Window Scale Option . . . . . . . . . . . . . . . . . . . 8 2.2. Window Scale Option . . . . . . . . . . . . . . . . . . . 8
2.3. Using the Window Scale Option . . . . . . . . . . . . . . 9 2.3. Using the Window Scale Option . . . . . . . . . . . . . . 9
2.4. Addressing Window Retraction . . . . . . . . . . . . . . . 10 2.4. Addressing Window Retraction . . . . . . . . . . . . . . . 10
3. TCP Timestamps option . . . . . . . . . . . . . . . . . . . . 12 3. TCP Timestamps option . . . . . . . . . . . . . . . . . . . . 12
3.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 12 3.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 12
3.2. Timestamps option . . . . . . . . . . . . . . . . . . . . 12 3.2. Timestamps option . . . . . . . . . . . . . . . . . . . . 12
3.3. The RTTM Mechanism . . . . . . . . . . . . . . . . . . . . 14 4. The RTTM Mechanism . . . . . . . . . . . . . . . . . . . . . . 15
3.4. Updating the RTO value . . . . . . . . . . . . . . . . . . 15 4.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 15
3.5. Which Timestamp to Echo . . . . . . . . . . . . . . . . . 16 4.2. Updating the RTO value . . . . . . . . . . . . . . . . . . 16
4. PAWS - Protection Against Wrapped Sequence Numbers . . . . . . 18 4.3. Which Timestamp to Echo . . . . . . . . . . . . . . . . . 16
4.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 18 5. PAWS - Protection Against Wrapped Sequence Numbers . . . . . . 20
4.2. The PAWS Mechanism . . . . . . . . . . . . . . . . . . . . 19 5.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 20
4.3. Basic PAWS Algorithm . . . . . . . . . . . . . . . . . . . 20 5.2. The PAWS Mechanism . . . . . . . . . . . . . . . . . . . . 20
4.4. Timestamp Clock . . . . . . . . . . . . . . . . . . . . . 22 5.3. Basic PAWS Algorithm . . . . . . . . . . . . . . . . . . . 21
4.5. Outdated Timestamps . . . . . . . . . . . . . . . . . . . 23 5.4. Timestamp Clock . . . . . . . . . . . . . . . . . . . . . 23
4.6. Header Prediction . . . . . . . . . . . . . . . . . . . . 24 5.5. Outdated Timestamps . . . . . . . . . . . . . . . . . . . 25
4.7. IP Fragmentation . . . . . . . . . . . . . . . . . . . . . 25 5.6. Header Prediction . . . . . . . . . . . . . . . . . . . . 25
4.8. Duplicates from Earlier Incarnations of Connection . . . . 26 5.7. IP Fragmentation . . . . . . . . . . . . . . . . . . . . . 27
5. Conclusions and Acknowledgements . . . . . . . . . . . . . . . 26 5.8. Duplicates from Earlier Incarnations of Connection . . . . 27
6. Security Considerations . . . . . . . . . . . . . . . . . . . 27 6. Conclusions and Acknowledgements . . . . . . . . . . . . . . . 28
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 28 7. Security Considerations . . . . . . . . . . . . . . . . . . . 28
8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 29 7.1. Privacy Considerations . . . . . . . . . . . . . . . . . . 30
8.1. Normative References . . . . . . . . . . . . . . . . . . . 29 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 30
8.2. Informative References . . . . . . . . . . . . . . . . . . 29 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Appendix A. Implementation Suggestions . . . . . . . . . . . . . 32 9.1. Normative References . . . . . . . . . . . . . . . . . . . 30
Appendix B. Duplicates from Earlier Connection Incarnations . . . 33 9.2. Informative References . . . . . . . . . . . . . . . . . . 31
B.1. System Crash with Loss of State . . . . . . . . . . . . . 33 Appendix A. Implementation Suggestions . . . . . . . . . . . . . 34
B.2. Closing and Reopening a Connection . . . . . . . . . . . . 34 Appendix B. Duplicates from Earlier Connection Incarnations . . . 35
Appendix C. Summary of Notation . . . . . . . . . . . . . . . . . 35 B.1. System Crash with Loss of State . . . . . . . . . . . . . 36
Appendix D. Event Processing Summary . . . . . . . . . . . . . . 36 B.2. Closing and Reopening a Connection . . . . . . . . . . . . 36
Appendix E. Timestamps Edge Cases . . . . . . . . . . . . . . . . 42 Appendix C. Summary of Notation . . . . . . . . . . . . . . . . . 37
Appendix F. Window Retraction Example . . . . . . . . . . . . . . 42 Appendix D. Event Processing Summary . . . . . . . . . . . . . . 38
Appendix G. RTO calculation modification . . . . . . . . . . . . 43 Appendix E. Timestamps Edge Cases . . . . . . . . . . . . . . . . 44
Appendix H. Changes from RFC 1323 . . . . . . . . . . . . . . . . 44 Appendix F. Window Retraction Example . . . . . . . . . . . . . . 45
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 46 Appendix G. RTO calculation modification . . . . . . . . . . . . 45
Appendix H. Changes from RFC 1323 . . . . . . . . . . . . . . . . 46
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 48
1. Introduction 1. Introduction
The TCP protocol [RFC0793] was designed to operate reliably over The TCP protocol [RFC0793] was designed to operate reliably over
almost any transmission medium regardless of transmission rate, almost any transmission medium regardless of transmission rate,
delay, corruption, duplication, or reordering of segments. Over the delay, corruption, duplication, or reordering of segments. Over the
years, advances in networking technology has resulted in ever-higher years, advances in networking technology have resulted in ever-higher
transmission speeds, and the fastest paths are well beyond the domain transmission speeds, and the fastest paths are well beyond the domain
for which TCP was originally engineered. for which TCP was originally engineered.
This document defines a set of modest extensions to TCP to extend the This document defines a set of modest extensions to TCP to extend the
domain of its application to match the increasing network capability. domain of its application to match the increasing network capability.
It is an update to and obsoletes [RFC1323], which in turn is based It is an update to and obsoletes [RFC1323], which in turn is based
upon and obsoletes [RFC1072] and [RFC1185]. upon and obsoletes [RFC1072] and [RFC1185].
Changes between [RFC1323] and this document are detailed in Changes between [RFC1323] and this document are detailed in
Appendix H. Appendix H. These changes are partly due to errata in [RFC1323], and
partly due to the improved understanding of how the involved
components interact.
For brevity, the full discussions of the merits and history behind For brevity, the full discussions of the merits and history behind
the TCP options defined within this document have been omitted. the TCP options defined within this document have been omitted.
[RFC1323] should be consulted for reference. It is recommended that [RFC1323] should be consulted for reference. It is recommended that
a modern TCP stack implement and make use of the extensions a modern TCP stack implement and make use of the extensions
described in this document. described in this document.
1.1. TCP Performance 1.1. TCP Performance
TCP performance problems arise when the bandwidth * delay product is TCP performance problems arise when the bandwidth * delay product is
skipping to change at page 5, line 5 skipping to change at page 5, line 6
maximum throughput of the TCP connection over the path, i.e., maximum throughput of the TCP connection over the path, i.e.,
the amount of unacknowledged data that TCP can send in order to the amount of unacknowledged data that TCP can send in order to
keep the pipeline full. keep the pipeline full.
To circumvent this problem, Section 2 of this memo defines a TCP To circumvent this problem, Section 2 of this memo defines a TCP
option, "Window Scale", to allow windows larger than 2^16. This option, "Window Scale", to allow windows larger than 2^16. This
option defines an implicit scale factor, which is used to option defines an implicit scale factor, which is used to
multiply the window size value found in a TCP header to obtain multiply the window size value found in a TCP header to obtain
the true window size. the true window size.
It must be noted that the use of large receive windows
increases the chance of too quickly wrapping sequence numbers,
as described below in Section 1.2, (1).
(2) Recovery from Losses (2) Recovery from Losses
Packet losses in an LFN can have a catastrophic effect on Packet losses in an LFN can have a catastrophic effect on
throughput. throughput.
To generalize the Fast Retransmit / Fast Recovery mechanism to To generalize the Fast Retransmit / Fast Recovery mechanism to
handle multiple packets dropped per window, Selective handle multiple packets dropped per window, Selective
Acknowledgments are required. Unlike the normal cumulative Acknowledgments are required. Unlike the normal cumulative
acknowledgments of TCP, Selective Acknowledgments give the acknowledgments of TCP, Selective Acknowledgments give the
sender a complete picture of which segments are queued at the sender a complete picture of which segments are queued at the
skipping to change at page 5, line 36 skipping to change at page 5, line 41
An especially serious kind of error may result from an accidental An especially serious kind of error may result from an accidental
reuse of TCP sequence numbers in data segments. TCP reliability reuse of TCP sequence numbers in data segments. TCP reliability
depends upon the existence of a bound on the lifetime of a segment: depends upon the existence of a bound on the lifetime of a segment:
the "Maximum Segment Lifetime" or MSL. the "Maximum Segment Lifetime" or MSL.
Duplication of sequence numbers might happen in either of two ways: Duplication of sequence numbers might happen in either of two ways:
(1) Sequence number wrap-around on the current connection (1) Sequence number wrap-around on the current connection
A TCP sequence number contains 32 bits. At a high enough A TCP sequence number contains 32 bits. At a high enough
transfer rate, the 32-bit sequence space may be "wrapped" transfer rate of large volumes of data (at least 4 GiB in the
same session), the 32-bit sequence space may be "wrapped"
(cycled) within the time that a segment is delayed in queues. (cycled) within the time that a segment is delayed in queues.
(2) Earlier incarnation of the connection (2) Earlier incarnation of the connection
Suppose that a connection terminates, either by a proper close Suppose that a connection terminates, either by a proper close
sequence or due to a host crash, and the same connection (i.e., sequence or due to a host crash, and the same connection (i.e.,
using the same pair of port numbers) is immediately reopened. A using the same pair of port numbers) is immediately reopened. A
delayed segment from the terminated connection could fall within delayed segment from the terminated connection could fall within
the current window for the new incarnation and be accepted as the current window for the new incarnation and be accepted as
valid. valid.
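To make case (1) concrete, the following is a minimal sketch of how quickly the 32-bit sequence space cycles at a given transfer rate. The function name seq_wrap_seconds and the use of a double for the rate are illustrative assumptions, not part of either draft; the only fact taken from the text is that the sequence number contains 32 bits.

```c
#include <stdint.h>

/* Hypothetical helper: time in seconds needed to send 2^32 bytes, i.e. to
 * cycle the whole 32-bit TCP sequence space once, at a sustained transfer
 * rate given in bits per second. */
static double seq_wrap_seconds(double bits_per_second)
{
    const double seq_space_bytes = 4294967296.0;  /* 2^32 */
    return seq_space_bytes * 8.0 / bits_per_second;
}
```

Under these assumptions, a sustained 1 Gb/s transfer cycles the sequence space in roughly 34 seconds and a 10 Gb/s transfer in roughly 3.4 seconds, which gives a feel for why the wrap time can fall below the time a segment may be delayed in queues.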
Duplicates from earlier incarnations, case (2), are avoided by Duplicates from earlier incarnations, case (2), are avoided by
enforcing the current fixed MSL of the TCP specification, as enforcing the current fixed MSL of the TCP specification, as
explained in Section 4.8 and Appendix B. However, case (1), avoiding explained in Section 5.8 and Appendix B. In addition, the randomizing
the reuse of sequence numbers within the same connection, requires an of ephemeral ports can also help to probabilistically reduce the
upper bound on MSL that depends upon the transfer rate, and at high chances of duplicates from earlier connections. However, case (1),
enough rates, a dedicated mechanism is required. avoiding the reuse of sequence numbers within the same connection,
requires an upper bound on MSL that depends upon the transfer rate,
and at high enough rates, a dedicated mechanism is required.
A possible fix for the problem of cycling the sequence space would be A possible fix for the problem of cycling the sequence space would be
to increase the size of the TCP sequence number field. For example, to increase the size of the TCP sequence number field. For example,
the sequence number field (and also the acknowledgment field) could the sequence number field (and also the acknowledgment field) could
be expanded to 64 bits. This could be done either by changing the be expanded to 64 bits. This could be done either by changing the
TCP header or by means of an additional option. TCP header or by means of an additional option.
Section 4 presents a different mechanism, which we call PAWS Section 5 presents a different mechanism, which we call PAWS
(Protection Against Wrapped Sequence numbers), to extend TCP (Protection Against Wrapped Sequence numbers), to extend TCP
reliability to transfer rates well beyond the foreseeable upper limit reliability to transfer rates well beyond the foreseeable upper limit
of network bandwidths. PAWS uses the TCP Timestamps option defined of network bandwidths. PAWS uses the TCP Timestamps option defined
in Section 3.2 to protect against old duplicates from the same in Section 3.2 to protect against old duplicates from the same
connection. connection.
1.3. Using TCP options 1.3. Using TCP options
The extensions defined in this document all use TCP options. The extensions defined in this document all use TCP options.
When [RFC1323] was published, there was concern that some buggy TCP When [RFC1323] was published, there was concern that some buggy TCP
implementation might be crashed by the first appearance of an option implementation might crash on the first appearance of an option on a
on a non-<SYN> segment. However, bugs like that can lead to DOS non-<SYN> segment. However, bugs like that can lead to DOS attacks
attacks against a TCP. Research has shown that most TCP against a TCP. Research has shown that most TCP implementations will
implementations will properly handle unknown options on non-<SYN> properly handle unknown options on non-<SYN> segments ([Medina04],
segments ([Medina04], [Medina05]). But it is still prudent to be [Medina05]). But it is still prudent to be conservative in what you
conservative in what you send, and avoiding buggy TCP implementation send, and avoiding buggy TCP implementation is not the only reason
is not the only reason for negotiating TCP options on <SYN> segments. for negotiating TCP options on <SYN> segments.
The window scale option negotiates fundamental parameters of the TCP The window scale option negotiates fundamental parameters of the TCP
session. Therefore, it is only sent during the initial handshake. session. Therefore, it is only sent during the initial handshake.
Furthermore, the window scale option will be sent in a <SYN,ACK> Furthermore, the window scale option will be sent in a <SYN,ACK>
segment only if the corresponding option was received in the initial segment only if the corresponding option was received in the initial
<SYN> segment. <SYN> segment.
The Timestamps option may appear in any data or <ACK> segment, adding The Timestamps option may appear in any data or <ACK> segment, adding
12 bytes to the 20-byte TCP header. It is required that this TCP 10 bytes (up to 12 bytes including padding) to the 20-byte TCP
option will be sent on all non-<SYN> segments after an exchange of header. It is required that this TCP option will be sent on all non-
options on the <SYN> segments has indicated that both sides <SYN> segments after an exchange of options on the <SYN> segments has
understand this extension. indicated that both sides understand this extension.
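The "10 bytes (up to 12 bytes including padding)" figure can be illustrated with a minimal sketch of encoding the option. It assumes the well-known option kinds (NOP = 1, Timestamps = 8, length 10) and a layout with two leading NOPs for 32-bit alignment; the function name tsopt_write and the constant names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed option constants: NOP kind 1, Timestamps kind 8, length 10. */
enum { TCPOPT_NOP = 1, TCPOPT_TIMESTAMP = 8, TCPOLEN_TIMESTAMP = 10 };

/* Hypothetical encoder: writes NOP, NOP, kind, length, TSval, TSecr into
 * out[] in network byte order and returns the number of bytes written. */
static size_t tsopt_write(uint8_t out[12], uint32_t tsval, uint32_t tsecr)
{
    size_t i = 0;

    out[i++] = TCPOPT_NOP;          /* padding byte 1 */
    out[i++] = TCPOPT_NOP;          /* padding byte 2 */
    out[i++] = TCPOPT_TIMESTAMP;    /* option kind */
    out[i++] = TCPOLEN_TIMESTAMP;   /* option length, excluding the NOPs */
    for (int s = 24; s >= 0; s -= 8)
        out[i++] = (uint8_t)(tsval >> s);   /* TSval, big-endian */
    for (int s = 24; s >= 0; s -= 8)
        out[i++] = (uint8_t)(tsecr >> s);   /* TSecr, big-endian */
    return i;
}
```

The option proper occupies 10 bytes (kind, length, and two 4-byte fields); the two NOPs bring the total to 12 so that the timestamp fields fall on 4-byte boundaries.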
Research has shown that the use of the Timestamps option to arrive at Research has shown that the use of the Timestamps option to take
an optimal retransmission timeout value has only limited benefit additional RTT samples within each RTT has little effect on the
([Allman99]). However, there are other uses of the Timestamps option, ultimate retransmission timeout value [Allman99]. However, there are
such as the Eifel mechanism [RFC3522], [RFC4015], and PAWS (see other uses of the Timestamps option, such as the Eifel mechanism
Section 4) which improve overall TCP security and performance. The [RFC3522], [RFC4015], and PAWS (see Section 5) which improve overall
extra header bandwidth used by this option should be evaluated for TCP security and performance. The extra header bandwidth used by
the gains in performance and security in an actual deployment. this option should be evaluated for the gains in performance and
security in an actual deployment.
Appendix A contains a recommended layout of the options in TCP Appendix A contains a recommended layout of the options in TCP
headers to achieve reasonable data field alignment. headers to achieve reasonable data field alignment.
Finally, we observe that most of the mechanisms defined in this Finally, we observe that most of the mechanisms defined in this
document are important for LFN's and/or very high-speed networks. document are important for LFN's and/or very high-speed networks.
For low-speed networks, it might be a performance optimization to NOT For low-speed networks, it might be a performance optimization to NOT
use these mechanisms. A TCP vendor concerned about optimal use these mechanisms. A TCP vendor concerned about optimal
performance over low-speed paths might consider turning these performance over low-speed paths might consider turning these
extensions off for low-speed paths, or allow a user or installation extensions off for low-speed paths, or allow a user or installation
skipping to change at page 8, line 28 skipping to change at page 8, line 28
determined by the maximum receive buffer space. In a typical modern determined by the maximum receive buffer space. In a typical modern
implementation, this maximum buffer space is set by default but can implementation, this maximum buffer space is set by default but can
be overridden by a user program before a TCP connection is opened. be overridden by a user program before a TCP connection is opened.
This determines the scale factor, and therefore no new user interface This determines the scale factor, and therefore no new user interface
is needed for window scaling. is needed for window scaling.
2.2. Window Scale Option 2.2. Window Scale Option
The three-byte Window Scale option MAY be sent in a <SYN> segment by The three-byte Window Scale option MAY be sent in a <SYN> segment by
a TCP. It has two purposes: (1) indicate that the TCP is prepared to a TCP. It has two purposes: (1) indicate that the TCP is prepared to
do both send and receive window scaling, and (2) communicate the both send and receive window scaling, and (2) communicate the
exponent of a scale factor to be applied to its receive window. exponent of a scale factor to be applied to its receive window.
Thus, a TCP that is prepared to scale windows SHOULD send the option, Thus, a TCP that is prepared to scale windows SHOULD send the option,
even if its own scale factor is 1 and the exponent 0. The scale even if its own scale factor is 1 and the exponent 0. The scale
factor is limited to a power of two and encoded logarithmically, so factor is limited to a power of two and encoded logarithmically, so
it may be implemented by binary shift operations. The maximum scale it may be implemented by binary shift operations. The maximum scale
exponent is limited to 14 for a maximum permissible receive window exponent is limited to 14 for a maximum permissible receive window
size of 1 GiB (2^(14+16)). size of 1 GiB (2^(14+16)).
TCP Window Scale Option (WSopt): TCP Window Scale Option (WSopt):
skipping to change at page 9, line 14 skipping to change at page 9, line 14
MAY be zero (offering to scale, while applying a scale factor of 1 to MAY be zero (offering to scale, while applying a scale factor of 1 to
the receive window). the receive window).
This option MAY be sent in an initial <SYN> segment (i.e., a segment This option MAY be sent in an initial <SYN> segment (i.e., a segment
with the SYN bit on and the ACK bit off). It MAY also be sent in a with the SYN bit on and the ACK bit off). It MAY also be sent in a
<SYN,ACK> segment, but only if a Window Scale option was received in <SYN,ACK> segment, but only if a Window Scale option was received in
the initial <SYN> segment. A Window Scale option in a segment the initial <SYN> segment. A Window Scale option in a segment
without a SYN bit MUST be ignored. without a SYN bit MUST be ignored.
The window field in a segment where the SYN bit is set (i.e., a <SYN> The window field in a segment where the SYN bit is set (i.e., a <SYN>
or <SYN,ACK>) is never scaled. or <SYN,ACK>) MUST NOT be scaled.
2.3. Using the Window Scale Option 2.3. Using the Window Scale Option
A model implementation of window scaling is as follows, using the A model implementation of window scaling is as follows, using the
notation of [RFC0793]: notation of [RFC0793]:
o All windows are treated as 32-bit quantities for storage in the o The connection state MUST be augmented by two window shift
connection control block and for local calculations. This counters, Snd.Wind.Shift and Rcv.Wind.Shift, to be applied to the
includes the send-window (SND.WND) and the receive-window incoming and outgoing window fields, respectively.
(RCV.WND) values, as well as the congestion window.
o The connection state is augmented by two window shift counters,
Snd.Wind.Shift and Rcv.Wind.Shift, to be applied to the incoming
and outgoing window fields, respectively.
o If a TCP receives a <SYN> segment containing a Window Scale o If a TCP receives a <SYN> segment containing a Window Scale
option, it sends its own Window Scale option in the <SYN,ACK> option, it SHOULD send its own Window Scale option in the
segment. <SYN,ACK> segment.
o The Window Scale option is sent with shift.cnt = R, where R is the o The Window Scale option MUST be sent with shift.cnt = R, where R
value that the TCP would like to use for its receive window. is the value that the TCP would like to use for its receive
window.
o Upon receiving a <SYN> segment with a Window Scale option o Upon receiving a <SYN> segment with a Window Scale option
containing shift.cnt = S, a TCP sets Snd.Wind.Shift to S and sets containing shift.cnt = S, a TCP MUST set Snd.Wind.Shift to S and
Rcv.Wind.Shift to R; otherwise, it sets both Snd.Wind.Shift and MUST set Rcv.Wind.Shift to R; otherwise, it MUST set both
Rcv.Wind.Shift to zero. Snd.Wind.Shift and Rcv.Wind.Shift to zero.
o The window field (SEG.WND) in the header of every incoming o The window field (SEG.WND) in the header of every incoming
segment, with the exception of <SYN> segments, is left-shifted by segment, with the exception of <SYN> segments, MUST be left-
Snd.Wind.Shift bits before updating SND.WND: shifted by Snd.Wind.Shift bits before updating SND.WND:
SND.WND = SEG.WND << Snd.Wind.Shift SND.WND = SEG.WND << Snd.Wind.Shift
(assuming the other conditions of [RFC0793] are met, and using the (assuming the other conditions of [RFC0793] are met, and using the
"C" notation "<<" for left-shift). "C" notation "<<" for left-shift).
o The window field (SEG.WND) of every outgoing segment, with the o The window field (SEG.WND) of every outgoing segment, with the
exception of <SYN> segments, is right-shifted by Rcv.Wind.Shift exception of <SYN> segments, MUST be right-shifted by
bits: Rcv.Wind.Shift bits:
SEG.WND = RCV.WND >> Rcv.Wind.Shift SEG.WND = RCV.WND >> Rcv.Wind.Shift
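The model implementation above can be sketched in C as follows. The struct and function names are hypothetical; only the two shift counters and the two shift formulas come from the text. The clamp to 0xFFFF on the outgoing side is a defensive assumption added here, not something the draft states.

```c
#include <stdint.h>

/* Hypothetical per-connection state. Field names mirror the draft's
 * Snd.Wind.Shift, Rcv.Wind.Shift, SND.WND and RCV.WND. */
struct tcp_wscale {
    uint8_t  snd_wind_shift;  /* applied to the window field of incoming segments */
    uint8_t  rcv_wind_shift;  /* applied to the window field of outgoing segments */
    uint32_t snd_wnd;         /* SND.WND, kept as a 32-bit quantity */
    uint32_t rcv_wnd;         /* RCV.WND, kept as a 32-bit quantity */
};

/* Incoming segment: SND.WND = SEG.WND << Snd.Wind.Shift.
 * The window field of a segment with the SYN bit set is not scaled. */
static void wscale_on_receive(struct tcp_wscale *c, uint16_t seg_wnd, int is_syn)
{
    c->snd_wnd = is_syn ? (uint32_t)seg_wnd
                        : (uint32_t)seg_wnd << c->snd_wind_shift;
}

/* Outgoing segment: SEG.WND = RCV.WND >> Rcv.Wind.Shift.
 * Assumption: values that do not fit in 16 bits are clamped to 0xFFFF. */
static uint16_t wscale_on_send(const struct tcp_wscale *c, int is_syn)
{
    uint32_t w = is_syn ? c->rcv_wnd : c->rcv_wnd >> c->rcv_wind_shift;
    return (uint16_t)(w > 0xFFFFu ? 0xFFFFu : w);
}
```

With both shift counters set to 7, for example, an incoming window field of 512 yields SND.WND = 65536, and a 1 MiB RCV.WND is advertised as 8192.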
TCP determines if a data segment is "old" or "new" by testing whether TCP determines if a data segment is "old" or "new" by testing whether
its sequence number is within 2^31 bytes of the left edge of the its sequence number is within 2^31 bytes of the left edge of the
window, and if it is not, discarding the data as "old". To ensure window, and if it is not, discarding the data as "old". To ensure
that new data is never mistakenly considered old and vice versa, the that new data is never mistakenly considered old and vice versa, the
left edge of the sender's window has to be at most 2^31 away from the left edge of the sender's window has to be at most 2^31 away from the
right edge of the receiver's window. Similarly with the sender's right edge of the receiver's window. Similarly with the sender's
right edge and receiver's left edge. Since the right and left edges right edge and receiver's left edge. Since the right and left edges
skipping to change at page 10, line 32 skipping to change at page 10, line 26
max window < 2^30 max window < 2^30
Since the max window is 2^S (where S is the scaling shift count) Since the max window is 2^S (where S is the scaling shift count)
times at most 2^16 - 1 (the maximum unscaled window), the maximum times at most 2^16 - 1 (the maximum unscaled window), the maximum
window is guaranteed to be < 2^30 if S <= 14. Thus, the shift count window is guaranteed to be < 2^30 if S <= 14. Thus, the shift count
MUST be limited to 14 (which allows windows of 2^30 = 1 GiB). If a MUST be limited to 14 (which allows windows of 2^30 = 1 GiB). If a
Window Scale option is received with a shift.cnt value larger than Window Scale option is received with a shift.cnt value larger than
14, the TCP SHOULD log the error but MUST use 14 instead of the 14, the TCP SHOULD log the error but MUST use 14 instead of the
specified value. This is safe as a sender can always choose to only specified value. This is safe as a sender can always choose to only
partially use any signaled receive window. partially use any signaled receive window. If the receiver is
scaling by a factor larger than 14 and the sender is only scaling by
14 then the receive window used by the sender will appear smaller
than it is in reality.
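The limit on the shift count and the resulting bound on the window can be checked with a short sketch. The names TCP_MAX_WINSHIFT, wscale_clamp and wscale_max_window are illustrative assumptions; the value 14 and the "use 14 instead of the specified value" rule are taken from the text.

```c
#include <stdint.h>

#define TCP_MAX_WINSHIFT 14u  /* largest permitted shift.cnt */

/* A received shift.cnt larger than 14 is replaced by 14. */
static uint8_t wscale_clamp(uint8_t shift_cnt)
{
    return shift_cnt > TCP_MAX_WINSHIFT ? (uint8_t)TCP_MAX_WINSHIFT : shift_cnt;
}

/* Largest window expressible with a given (clamped) shift count:
 * (2^16 - 1) * 2^S, which stays below 2^30 for S <= 14. */
static uint32_t wscale_max_window(uint8_t shift_cnt)
{
    return (uint32_t)0xFFFFu << wscale_clamp(shift_cnt);
}
```

For S = 14 this gives 65535 * 16384 = 1073725440, which is 16384 bytes short of 2^30 and therefore satisfies the max window < 2^30 condition.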
The scale factor applies only to the Window field as transmitted in The scale factor applies only to the Window field as transmitted in
the TCP header; each TCP using extended windows will maintain the the TCP header; each TCP using extended windows will maintain the
window values locally as 32-bit numbers. For example, the window values locally as 32-bit numbers. For example, the
"congestion window" computed by Slow Start and Congestion Avoidance "congestion window" computed by Slow Start and Congestion Avoidance
(see [RFC5681]) is not affected by the scale factor, so window (see [RFC5681]) is not affected by the scale factor, so window
scaling will not introduce quantization into the congestion window. scaling will not introduce quantization into the congestion window.
2.4. Addressing Window Retraction 2.4. Addressing Window Retraction
skipping to change at page 12, line 9 skipping to change at page 12, line 9
retransmission(s) without regard to receiver window as long as retransmission(s) without regard to receiver window as long as
the original segment was in window when it was sent. the original segment was in window when it was sent.
5) Subsequent retransmissions MAY only be sent, if they are within 5) Subsequent retransmissions MAY only be sent, if they are within
the window announced by the most recent <ACK>. the window announced by the most recent <ACK>.
3. TCP Timestamps option 3. TCP Timestamps option
3.1. Introduction 3.1. Introduction
TCP measures the round trip time (RTT), primarily for the purpose of The Timestamps option is introduced to address some of the issues
arriving at a reasonable value for the Retransmission Timeout (RTO) mentioned in Section 1.1 and Section 1.2. The Timestamps option is
timer interval. Accurate and current RTT estimates are necessary to specified in a symmetrical manner, so that TSval timestamps are
adapt to changing traffic conditions, while a conservative estimate carried in both data and <ACK> segments and are echoed in TSecr
of the RTO interval is necessary to minimize spurious RTOs. fields carried in returning <ACK> or data segments. Originally used
primarily for timestamping individual segments, the properties of the
When [RFC1323] was originally written, it was perceived that taking Timestamps option allow not only the use for taking time measurements
RTT measurements for each segment, and also during retransmissions, (Section 4), but additional uses as well (xref target="sec4"/>).
would contribute to reduce spurious RTOs, while maintaining the
timeliness of necessary RTOs. At the time, RTO was also the only
mechanism to make use of the measured RTT. It has been shown, that
taking more RTT samples has only a very limited effect to optimize
RTOs [Allman99].
This document makes a clear distinction between the round trip time It is necessary to remember that there is a distinction between the
measurement (RTTM) mechanism, and subsequent mechanisms using the RTT Timestamps option conveying timestamp information, and the use of
signal as input, such as RTO (see Section 3.4). that information. In particular, the Round Trip Time Measurement
(RTTM) mechanism must be viewed independently from updating the
Retransmission Timeout (RTO) (see Section 4.2). In this case, the
sample granularity also needs to be taken into account. Other
mechanisms, such as PAWS, or Eifel, are not built upon the timestamp
information itself, but are based on the intrinsic property of
monotonically increasing values.
The Timestamps option is important when large receive windows are The Timestamps option is important when large receive windows are
used, to allow the use of the PAWS mechanism (see Section 4). used, to allow the use of the PAWS mechanism (see Section 5).
Furthermore, the option is useful for all TCP's, since it simplifies Furthermore, the option may be useful for all TCP's, since it
the sender and allows the use of additional optimizations such as simplifies the sender and allows the use of additional optimizations
Eifel ([RFC3522], [RFC4015]) and others. such as Eifel ([RFC3522], [RFC4015]) and others ([RFC6817],
[Kuzmanovic03], [Kuehlewind10]).
3.2. Timestamps option 3.2. Timestamps option
TCP is a symmetric protocol, allowing data to be sent at any time in TCP is a symmetric protocol, allowing data to be sent at any time in
either direction, and therefore timestamp echoing may occur in either either direction, and therefore timestamp echoing may occur in either
direction. For simplicity and symmetry, we specify that timestamps direction. For simplicity and symmetry, we specify that timestamps
always be sent and echoed in both directions. For efficiency, we always be sent and echoed in both directions. For efficiency, we
combine the timestamp and timestamp reply fields into a single TCP combine the timestamp and timestamp reply fields into a single TCP
Timestamps option. Timestamps option.
skipping to change at page 13, line 19 skipping to change at page 13, line 30
The Timestamp Echo Reply field (TSecr) is valid if the ACK bit is set The Timestamp Echo Reply field (TSecr) is valid if the ACK bit is set
in the TCP header; if it is valid, it echoes a timestamp value that in the TCP header; if it is valid, it echoes a timestamp value that
was sent by the remote TCP in the TSval field of a Timestamps option. was sent by the remote TCP in the TSval field of a Timestamps option.
When TSecr is not valid, its value MUST be zero. However, a value of When TSecr is not valid, its value MUST be zero. However, a value of
zero does not imply TSecr being invalid. The TSecr value will zero does not imply TSecr being invalid. The TSecr value will
generally be from the most recent Timestamps option that was generally be from the most recent Timestamps option that was
received; however, there are exceptions that are explained below. received; however, there are exceptions that are explained below.
A TCP MAY send the Timestamps option (TSopt) in an initial <SYN> A TCP MAY send the Timestamps option (TSopt) in an initial <SYN>
segment (i.e., segment containing a SYN bit and no ACK bit), and MAY segment (i.e., segment containing a SYN bit and no ACK bit), and MAY
send a TSopt in other segments only if it received a TSopt in the send a TSopt in <SYN,ACK> only if it received a TSopt in the initial
initial <SYN> or <SYN,ACK> segment for the connection. <SYN> segment for the connection.
Once TSopt has been successfully negotiated (sent and received) Once TSopt has been successfully negotiated, that is both <SYN>, and
during the <SYN>, <SYN,ACK> exchange, TSopt MUST be sent in every <SYN,ACK> contain TSopt, the TSopt MUST be sent in every non-<RST>
non- <RST> segment for the duration of the connection, and SHOULD be segment for the duration of the connection, and SHOULD be sent in an
sent in an <RST> segment (see Section 4.2 for details). If a non- <RST> segment (see Section 5.2 for details). The TCP SHOULD remember
<RST> segment is received without a TSopt, a TCP SHOULD silently drop this state by setting a flag, referred to as Snd.TS.OK, to one. If a
the segment. A TCP MUST NOT abort a TCP connection because any non-<RST> segment is received without a TSopt, a TCP SHOULD silently
drop the segment. A TCP MUST NOT abort a TCP connection because any
segment lacks an expected TSopt. segment lacks an expected TSopt.
Implementations are strongly encouraged to follow the above rules for Implementations are strongly encouraged to follow the above rules for
handling a missing Timestamps option, and the order of precedence handling a missing Timestamps option, and the order of precedence
mentioned in Section 4.3 when deciding on the acceptance of a mentioned in Section 5.3 when deciding on the acceptance of a
segment. segment.
If a receiver chooses to accept a segment without an expected If a receiver chooses to accept a segment without an expected
Timestamps option, it must be clear that undetectable data corruption Timestamps option, it must be clear that undetectable data corruption
may occur. may occur.
Such a TCP receiver may experience undetectable wrapped-sequence Such a TCP receiver may experience undetectable wrapped-sequence
effects, such as data (payload) corruption or session stalls. In effects, such as data (payload) corruption or session stalls. In
order to maintain the integrity of the payload data, in particular on order to maintain the integrity of the payload data, in particular on
high speed networks, it is paramount to follow the described high speed networks, it is paramount to follow the described
processing rules. processing rules.
However, it has been mentioned that under some circumstances, the However, it has been mentioned that under some circumstances, the
above guidelines are too strict, and some paths sporadically suppress above guidelines are too strict, and some paths sporadically suppress
the Timestamps option, while maintaining payload integrity. A path the Timestamps option, while maintaining payload integrity. A path
behaving in this manner should be deemed unacceptable, but it has behaving in this manner should be deemed unacceptable, but it has
been noted that some implementations relax the acceptance rules as a been noted that some implementations relax the acceptance rules as a
workaround, and allow TCP to run across such paths. workaround, and allow TCP to run across such paths [Oppermann13].
If a TSopt is received on a connection where TSopt was not negotiated If a TSopt is received on a connection where TSopt was not negotiated
in the initial three-way handshake, the TSopt MUST be ignored and the in the initial three-way handshake, the TSopt MUST be ignored and the
packet processed normally. packet processed normally.
In the case of crossing <SYN> segments where one <SYN> contains a In the case of crossing <SYN> segments where one <SYN> contains a
TSopt and the other doesn't, both sides MAY send a TSopt in the TSopt and the other doesn't, both sides MAY send a TSopt in the
<SYN,ACK> segment. <SYN,ACK> segment.
TSopt is required for the two mechanisms described in sections 3.3 TSopt is required for the two mechanisms described in sections 4 and
and 4.2. There are also other mechanisms that rely on the presence 5. There are also other mechanisms that rely on the presence of the
of the TSopt, e.g. [RFC3522]. If a TCP stopped sending TSopt at any TSopt, e.g. [RFC3522]. If a TCP stopped sending TSopt at any time
time during an established session, it interferes with these during an established session, it interferes with these mechanisms.
mechanisms. This update to [RFC1323] describes explicitly the This update to [RFC1323] describes explicitly the previous assumption
previous assumption (see Section 4.2), that each TCP segment must (see Section 5.2), that each TCP segment must have TSopt, once
have TSopt, once negotiated. negotiated.
3.3. The RTTM Mechanism 4. The RTTM Mechanism
RTTM places a Timestamps option in every segment, with a TSval that 4.1. Introduction
is obtained from a (virtual) "timestamp clock". Values of this clock
MUST be at least approximately proportional to real time, in order to One use of the Timestamps option is to measure the round trip time of
measure actual RTT. virtually every packet acknowledged. The Round Trip Time Measurement
(RTTM) mechanism requires a Timestamps option in every measured
segment, with a TSval that is obtained from a (virtual) "timestamp
clock". Values of this clock MUST be at least approximately
proportional to real time, in order to measure actual RTT.
TCP measures the round trip time (RTT), primarily for the purpose of
arriving at a reasonable value for the Retransmission Timeout (RTO)
timer interval. Accurate and current RTT estimates are necessary to
adapt to changing traffic conditions, while a conservative estimate
of the RTO interval is necessary to minimize spurious RTOs.
These TSval values are echoed in TSecr values in the reverse These TSval values are echoed in TSecr values in the reverse
direction. The difference between a received TSecr value and the direction. The difference between a received TSecr value and the
current timestamp clock value provides an RTT measurement. current timestamp clock value provides an RTT measurement.
When timestamps are used, every segment that is received will contain When timestamps are used, every segment that is received will contain
a TSecr value. However, these values cannot all be used to update a TSecr value. However, these values cannot all be used to update
the measured RTT. The following example illustrates why. It shows a the measured RTT. The following example illustrates why. It shows a
one-way data flow with segments arriving in sequence without loss. one-way data flow with segments arriving in sequence without loss.
Here A, B, C... represent data blocks occupying successive blocks of Here A, B, C... represent data blocks occupying successive blocks of
skipping to change at page 15, line 39 skipping to change at page 16, line 22
RTTM Rule: A TSecr value received in a segment MAY be used to update RTTM Rule: A TSecr value received in a segment MAY be used to update
the averaged RTT measurement only if the segment advances the averaged RTT measurement only if the segment advances
the left edge of the send window, i.e. SND.UNA is the left edge of the send window, i.e. SND.UNA is
increased. increased.
Since TCP B is not sending data, the data segment C does not Since TCP B is not sending data, the data segment C does not
acknowledge any new data when it arrives at B. Thus, the inflated acknowledge any new data when it arrives at B. Thus, the inflated
RTTM measurement is not used to update B's RTTM measurement. RTTM measurement is not used to update B's RTTM measurement.
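The RTTM Rule can be sketched in C as follows. This is a minimal illustration under stated assumptions, not text from either draft: the structure rttm_state, its fields, and the functions mod_lt() and rttm_on_ack() are hypothetical names, the sample is kept in raw timestamp clock ticks, and the usual check that the acknowledgment does not exceed SND.NXT is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection state; names are illustrative only. */
struct rttm_state {
    uint32_t snd_una;      /* SND.UNA: oldest unacknowledged sequence number */
    uint32_t last_sample;  /* most recent RTT sample, in timestamp clock ticks */
    bool     have_sample;  /* true once at least one sample was taken */
};

/* Modular "a < b" in 32-bit space: 0 < (b - a) < 2^31. */
static bool mod_lt(uint32_t a, uint32_t b)
{
    uint32_t diff = b - a;             /* wraps modulo 2^32 */
    return diff != 0 && diff < UINT32_C(0x80000000);
}

/*
 * Apply the RTTM Rule to an arriving segment: the echoed timestamp is
 * used only if the segment advances the left edge of the send window.
 * Returns true if an RTT sample was taken.
 */
static bool rttm_on_ack(struct rttm_state *st, uint32_t seg_ack,
                        uint32_t seg_tsecr, uint32_t now)
{
    if (!mod_lt(st->snd_una, seg_ack))
        return false;                  /* SND.UNA not increased: ignore TSecr */

    st->snd_una = seg_ack;
    st->last_sample = now - seg_tsecr; /* difference of clock and TSecr */
    st->have_sample = true;
    return true;
}
```

A segment such as C in the example above carries no new acknowledgment, so rttm_on_ack() returns false and the inflated value never reaches the estimator.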
3.4. Updating the RTO value 4.2. Updating the RTO value
[Ludwig00] and [Floyd05] have highlighted the problem that an When [RFC1323] was originally written, it was perceived that taking
unmodified RTO calculation, which is updated with per- packet RTT RTT measurements for each segment, and also during retransmissions,
samples, will truncate the path history too soon. This can lead to would contribute to reduce spurious RTOs, while maintaining the
an increase in spurious retransmissions, when the path properties timeliness of necessary RTOs. At the time, RTO was also the only
vary in the order of a few RTTs, but a high number of RTT samples are mechanism to make use of the measured RTT. It has been shown, that
taken on a much shorter timescale. taking more RTT samples has only a very limited effect to optimize
RTOs [Allman99].
Implementers should note that with timestamps multiple RTTMs can be Implementers should note that with timestamps multiple RTTMs can be
taken per RTT. The [RFC6298] RTO estimator has weighting factors, taken per RTT. The [RFC6298] RTO estimator has weighting factors,
alpha and beta, based on an implicit assumption that at most one RTTM alpha and beta, based on an implicit assumption that at most one RTTM
will be sampled per RTT. When multiple RTTMs per RTT are available will be sampled per RTT. When multiple RTTMs per RTT are available
to update the RTO estimator, this implicit assumption must be to update the RTO estimator, an implementation SHOULD try to adhere
considered. An implementation suggestion is detailed in Appendix G. to the spirit of the history specified in [RFC6298]. An
implementation suggestion is detailed in Appendix G.
3.5. Which Timestamp to Echo [Ludwig00] and [Floyd05] have highlighted the problem that an
unmodified RTO calculation, which is updated with per-packet RTT
samples, will truncate the path history too soon. This can lead to
an increase in spurious retransmissions, when the path properties
vary in the order of a few RTTs, but a high number of RTT samples are
taken on a much shorter timescale.
4.3. Which Timestamp to Echo
If more than one Timestamps option is received before a reply segment If more than one Timestamps option is received before a reply segment
is sent, the TCP must choose only one of the TSvals to echo, ignoring is sent, the TCP must choose only one of the TSvals to echo, ignoring
the others. To minimize the state kept in the receiver (i.e., the the others. To minimize the state kept in the receiver (i.e., the
number of unprocessed TSvals), the receiver should be required to number of unprocessed TSvals), the receiver should be required to
retain at most one timestamp in the connection control block. retain at most one timestamp in the connection control block.
There are three situations to consider: There are three situations to consider:
(A) Delayed ACKs. (A) Delayed ACKs.
skipping to change at page 18, line 27 skipping to change at page 20, line 5
2 2
<E, TSval=5> -------------------> <E, TSval=5> ------------------->
2 2
<---- <ACK(C), TSecr=2> <---- <ACK(C), TSecr=2>
2 2
<D, TSval=4> -------------------> <D, TSval=4> ------------------->
4 4
<---- <ACK(E), TSecr=4> <---- <ACK(E), TSecr=4>
(etc) (etc)
4. PAWS - Protection Against Wrapped Sequence Numbers 5. PAWS - Protection Against Wrapped Sequence Numbers
4.1. Introduction 5.1. Introduction
Section 4.2 describes a simple mechanism to reject old duplicate Another use for the Timestamps options is the mechanism to Protect
segments that might corrupt an open TCP connection; we call this Against Wrapped Sequence numbers (PAWS). Section 5.2 describes a
mechanism PAWS (Protection Against Wrapped Sequence numbers). PAWS simple mechanism to reject old duplicate segments that might corrupt
operates within a single TCP connection, using state that is saved in an open TCP connection. PAWS operates within a single TCP
the connection control block. Section 4.8 and Appendix H discuss the connection, using state that is saved in the connection control
implications of the PAWS mechanism for avoiding old duplicates from block. Section 5.8 and Appendix H discuss the implications of the
previous incarnations of the same connection. PAWS mechanism for avoiding old duplicates from previous incarnations
of the same connection.
4.2. The PAWS Mechanism 5.2. The PAWS Mechanism
PAWS uses the same TCP Timestamps option as the RTTM mechanism PAWS uses the TCP Timestamps option described earlier, and assumes
described earlier, and assumes that every received TCP segment that every received TCP segment (including data and <ACK> segments)
(including data and <ACK> segments) contains a timestamp SEG.TSval contains a timestamp SEG.TSval whose values are monotonically non-
whose values are monotonically non-decreasing in time. The basic decreasing in time. The basic idea is that a segment can be
idea is that a segment can be discarded as an old duplicate if it is discarded as an old duplicate if it is received with a timestamp
received with a timestamp SEG.TSval less than some timestamp recently SEG.TSval less than some timestamp recently received on this
received on this connection. connection.
In both the PAWS and the RTTM mechanism, the "timestamps" are 32-bit In the PAWS mechanism, the "timestamps" are 32-bit unsigned integers
unsigned integers in a modular 32-bit space. Thus, "less than" is in a modular 32-bit space. Thus, "less than" is defined the same way
defined the same way it is for TCP sequence numbers, and the same it is for TCP sequence numbers, and the same implementation
implementation techniques apply. If s and t are timestamp values, techniques apply. If s and t are timestamp values,
s < t if 0 < (t - s) < 2^31, s < t if 0 < (t - s) < 2^31,
computed in unsigned 32-bit arithmetic. computed in unsigned 32-bit arithmetic.
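As an illustration, the comparison above could be written in C as shown below. The function name ts_before() is an assumption made for this sketch and does not appear in either draft.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if timestamp s is "less than" timestamp t in the
 * modular 32-bit space, i.e. 0 < (t - s) < 2^31, with the
 * subtraction carried out in unsigned 32-bit arithmetic.
 */
static bool ts_before(uint32_t s, uint32_t t)
{
    uint32_t diff = t - s;             /* wraps modulo 2^32 */
    return diff != 0 && diff < UINT32_C(0x80000000);
}
```

With this definition ts_before(0xFFFFFFF0, 0x10) is true, because the difference wraps to 0x20. Two values exactly 2^31 apart compare as "not less than" in both directions.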
The choice of incoming timestamps to be saved for this comparison The choice of incoming timestamps to be saved for this comparison
MUST guarantee a value that is monotonically increasing. For MUST guarantee a value that is monotonically non-decreasing. For
example, we might save the timestamp from the segment that last example, an implementation might save the timestamp from the segment
advanced the left edge of the receive window, i.e., the most recent that last advanced the left edge of the receive window, i.e., the
in-sequence segment. Instead, we choose the value TS.Recent most recent in-sequence segment. For simplicity, the value TS.Recent
introduced in Section 3.5 for the RTTM mechanism, since using a introduced in Section 4.3 is used instead, as using a common value
common value for both PAWS and RTTM simplifies the implementation of for both PAWS and RTTM simplifies the implementation. As Section 4.3
both. As Section 3.5 explained, TS.Recent differs from the timestamp explained, TS.Recent differs from the timestamp from the last in-
from the last in-sequence segment only in the case of delayed <ACK>s, sequence segment only in the case of delayed <ACK>s, and therefore by
and therefore by less than one window. Either choice will therefore less than one window. Either choice will therefore protect against
protect against sequence number wrap-around. sequence number wrap-around.
RTTM was specified in a symmetrical manner, so that TSval timestamps PAWS submits all incoming segments to the same test, and therefore
are carried in both data and <ACK> segments and are echoed in TSecr protects against duplicate <ACK> segments as well as data segments.
fields carried in returning <ACK> or data segments. PAWS submits all (An alternative non-symmetric algorithm would protect against old
incoming segments to the same test, and therefore protects against duplicate <ACK>s: the sender of data would reject incoming <ACK>
duplicate <ACK> segments as well as data segments. (An alternative segments whose TSecr values were less than the TSecr saved from the
non-symmetric algorithm would protect against old duplicate <ACK>s: last segment whose ACK field advanced the left edge of the send
the sender of data would reject incoming <ACK> segments whose TSecr window. This algorithm was deemed to lack economy of mechanism and
values were less than the TSecr saved from the last segment whose ACK symmetry.)
field advanced the left edge of the send window. This algorithm was
deemed to lack economy of mechanism and symmetry.)
TSval timestamps sent on <SYN> and <SYN,ACK> segments are used to TSval timestamps sent on <SYN> and <SYN,ACK> segments are used to
initialize PAWS. PAWS protects against old duplicate non-<SYN> initialize PAWS. PAWS protects against old duplicate non-<SYN>
segments, and duplicate <SYN> segments received while there is a segments, and duplicate <SYN> segments received while there is a
synchronized connection. Duplicate <SYN> and <SYN,ACK> segments synchronized connection. Duplicate <SYN> and <SYN,ACK> segments
received when there is no connection will be discarded by the normal received when there is no connection will be discarded by the normal
3-way handshake and sequence number checks of TCP. 3-way handshake and sequence number checks of TCP.
[RFC1323] recommended that <RST> segments NOT carry timestamps, and [RFC1323] recommended that <RST> segments NOT carry timestamps, and
that they be acceptable regardless of their timestamp. At that time, that they be acceptable regardless of their timestamp. At that time,
the thinking was that old duplicate <RST> segments should be the thinking was that old duplicate <RST> segments should be
exceedingly unlikely, and their cleanup function should take exceedingly unlikely, and their cleanup function should take
precedence over timestamps. More recently, discussions about various precedence over timestamps. More recently, discussions about various
blind attacks on TCP connections have raised the suggestion that if blind attacks on TCP connections have raised the suggestion that if
the Timestamps option is present, SEG.TSecr could be used to provide the Timestamps option is present, SEG.TSecr could be used to provide
stricter acceptance tests for <RST> segments. While still under stricter acceptance tests for <RST> segments.
discussion, to enable research into this area it is now RECOMMENDED
that when generating an <RST>, that if the segment causing the <RST>
to be generated contained a Timestamps option, that the <RST> also
contain a Timestamps option. In the <RST> segment, SEG.TSecr SHOULD
be set to SEG.TSval from the incoming segment and SEG.TSval SHOULD be
set to zero. If an <RST> is being generated because of a user abort,
and Snd.TS.OK is set, then a Timestamps option SHOULD be included in
the <RST>. When an <RST> segment is received, it MUST NOT be
subjected to PAWS checks, and information from the Timestamps option
MUST NOT be used to update connection state information. SEG.TSecr
MAY be used to provide stricter <RST> acceptance checks.
4.3. Basic PAWS Algorithm While still under discussion, to enable research into this area it is
now RECOMMENDED that when generating an <RST>, that if the segment
causing the <RST> to be generated contained a Timestamps option, that
the <RST> also contain a Timestamps option. In the <RST> segment,
SEG.TSecr SHOULD be set to SEG.TSval from the incoming segment and
SEG.TSval SHOULD be set to zero. If an <RST> is being generated
because of a user abort, and Snd.TS.OK is set, then a Timestamps
option SHOULD be included in the <RST>. When an <RST> segment is
received, it MUST NOT be subjected to the PAWS check by verifying an
acceptable value in SEG.TSval, and information from the Timestamps
option MUST NOT be used to update connection state information.
SEG.TSecr MAY be used to provide stricter <RST> acceptance checks.
5.3. Basic PAWS Algorithm
If the PAWS algorithm is used, the following processing MUST be If the PAWS algorithm is used, the following processing MUST be
performed on all incoming segments for a synchronized connection. performed on all incoming segments for a synchronized connection.
Also, PAWS processing MUST take precedence over the regular TCP Also, PAWS processing MUST take precedence over the regular TCP
acceptability check (Section 3.3 in [RFC0793]), which is performed acceptability check (Section 3.3 in [RFC0793]), which is performed
after verification of the received Timestamps option: after verification of the received Timestamps option:
R1) If there is a Timestamps option in the arriving segment, R1) If there is a Timestamps option in the arriving segment,
SEG.TSval < TS.Recent, TS.Recent is valid (see later discussion) SEG.TSval < TS.Recent, TS.Recent is valid (see later discussion)
and the RST bit is not set, then treat the arriving segment as and the RST bit is not set, then treat the arriving segment as
skipping to change at page 20, line 50 skipping to change at page 22, line 17
Note: it is necessary to send an <ACK> segment in order to Note: it is necessary to send an <ACK> segment in order to
retain TCP's mechanisms for detecting and recovering from retain TCP's mechanisms for detecting and recovering from
half-open connections. For example, see Figure 10 of half-open connections. For example, see Figure 10 of
[RFC0793]. [RFC0793].
R2) If the segment is outside the window, reject it (normal TCP R2) If the segment is outside the window, reject it (normal TCP
processing) processing)
R3) If an arriving segment satisfies: SEG.SEQ <= Last.ACK.sent (see R3) If an arriving segment satisfies: SEG.SEQ <= Last.ACK.sent (see
Section 3.5), then record its timestamp in TS.Recent. Section 4.3), then record its timestamp in TS.Recent.
R4) If an arriving segment is in-sequence (i.e., at the left window R4) If an arriving segment is in-sequence (i.e., at the left window
edge), then accept it normally. edge), then accept it normally.
R5) Otherwise, treat the segment as a normal in-window, out-of- R5) Otherwise, treat the segment as a normal in-window, out-of-
sequence TCP segment (e.g., queue it for later delivery to the sequence TCP segment (e.g., queue it for later delivery to the
user). user).
Steps R2, R4, and R5 are the normal TCP processing steps specified by Steps R2, R4, and R5 are the normal TCP processing steps specified by
[RFC0793]. [RFC0793].
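The two PAWS-specific steps, R1 and R3, might be sketched in C as below. This is an illustration only: the type and function names (paws_state, paws_check, paws_record) are hypothetical, the validity of TS.Recent is modeled as a simple flag supplied by the caller, and the window check of R2 is assumed to happen between the two calls.

```c
#include <stdbool.h>
#include <stdint.h>

enum paws_verdict { PAWS_ACCEPT, PAWS_DROP_AND_ACK };

/* Hypothetical per-connection state; names are illustrative only. */
struct paws_state {
    uint32_t ts_recent;        /* TS.Recent */
    bool     ts_recent_valid;  /* false if TS.Recent is considered outdated */
    uint32_t last_ack_sent;    /* Last.ACK.sent */
};

/* Modular "a < b" in 32-bit space: 0 < (b - a) < 2^31. */
static bool mod_lt(uint32_t a, uint32_t b)
{
    uint32_t diff = b - a;     /* wraps modulo 2^32 */
    return diff != 0 && diff < UINT32_C(0x80000000);
}

/* R1: reject a non-<RST> segment whose SEG.TSval is older than a valid
 * TS.Recent. The caller sends an <ACK> before dropping the segment. */
static enum paws_verdict paws_check(const struct paws_state *st,
                                    bool has_tsopt, bool rst_bit,
                                    uint32_t seg_tsval)
{
    if (has_tsopt && !rst_bit && st->ts_recent_valid &&
        mod_lt(seg_tsval, st->ts_recent))
        return PAWS_DROP_AND_ACK;
    return PAWS_ACCEPT;
}

/* R3: record the timestamp if SEG.SEQ <= Last.ACK.sent. Intended to be
 * called only for segments that already passed R1 and R2. */
static void paws_record(struct paws_state *st, uint32_t seg_seq,
                        uint32_t seg_tsval)
{
    if (!mod_lt(st->last_ack_sent, seg_seq)) {
        st->ts_recent = seg_tsval;
        st->ts_recent_valid = true;
    }
}
```

Keeping the check and the update in separate functions mirrors the ordering of the rules: a segment rejected by R1 or R2 never reaches the update in R3.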
skipping to change at page 22, line 18 skipping to change at page 23, line 33
Even if a segment were delayed past the RTO, the Fast Retransmit Even if a segment were delayed past the RTO, the Fast Retransmit
mechanism [Jacobson90c] will cause the delayed segments to be mechanism [Jacobson90c] will cause the delayed segments to be
retransmitted at the same time as B.2, avoiding an extra RTT and retransmitted at the same time as B.2, avoiding an extra RTT and
therefore causing a very small performance penalty. therefore causing a very small performance penalty.
We know of no case with a significant probability of occurrence in We know of no case with a significant probability of occurrence in
which timestamps will cause performance degradation by unnecessarily which timestamps will cause performance degradation by unnecessarily
discarding segments. discarding segments.
4.4. Timestamp Clock 5.4. Timestamp Clock
It is important to understand that the PAWS algorithm does not It is important to understand that the PAWS algorithm does not
require clock synchronization between sender and receiver. The require clock synchronization between sender and receiver. The
sender's timestamp clock is used to stamp the segments, and the sender's timestamp clock is used as a source of monotonic non-
sender uses the echoed timestamp to measure RTTs. However, the decreasing values to stamp the segments. The receiver treats the
receiver treats the timestamp as simply a monotonically increasing timestamp value as simply a monotonically non-decreasing serial
serial number, without any necessary connection to its clock. From number, without any connection to time. From the receiver's
the receiver's viewpoint, the timestamp is acting as a logical viewpoint, the timestamp is acting as a logical extension of the
extension of the high-order bits of the sequence number. high-order bits of the sequence number.
The receiver algorithm does place some requirements on the frequency The receiver algorithm does place some requirements on the frequency
of the timestamp clock. of the timestamp clock.
(a) The timestamp clock must not be "too slow". (a) The timestamp clock must not be "too slow".
It MUST tick at least once for each 2^31 bytes sent. In fact, It MUST tick at least once for each 2^31 bytes sent. In fact,
in order to be useful to the sender for round trip timing, the in order to be useful to the sender for round trip timing, the
clock SHOULD tick at least once per window's worth of data, and clock SHOULD tick at least once per window's worth of data, and
even with the window extension defined in Section 2.2, 2^31 even with the window extension defined in Section 2.2, 2^31
skipping to change at page 23, line 7 skipping to change at page 24, line 20
(b) The timestamp clock must not be "too fast". (b) The timestamp clock must not be "too fast".
The recycling time of the timestamp clock MUST be greater than The recycling time of the timestamp clock MUST be greater than
MSL seconds. Since the clock (timestamp) is 32 bits and the MSL seconds. Since the clock (timestamp) is 32 bits and the
worst-case MSL is 255 seconds, the maximum acceptable clock worst-case MSL is 255 seconds, the maximum acceptable clock
frequency is one tick every 59 ns. frequency is one tick every 59 ns.
However, it is desirable to establish a much longer recycle However, it is desirable to establish a much longer recycle
period, in order to handle outdated timestamps on idle period, in order to handle outdated timestamps on idle
connections (see Section 4.5), and to relax the MSL requirement connections (see Section 5.5), and to relax the MSL requirement
for preventing sequence number wrap-around. With a 1 ms for preventing sequence number wrap-around. With a 1 ms
timestamp clock, the 32-bit timestamp will wrap its sign bit in timestamp clock, the 32-bit timestamp will wrap its sign bit in
24.8 days. Thus, it will reject old duplicates on the same 24.8 days. Thus, it will reject old duplicates on the same
connection if MSL is 24.8 days or less. This appears to be a connection if MSL is 24.8 days or less. This appears to be a
very safe figure; an MSL of 24.8 days or longer can probably be very safe figure; an MSL of 24.8 days or longer can probably be
assumed in the internet without requiring precise MSL assumed in the Internet without requiring precise MSL
enforcement. enforcement.
Based upon these considerations, we choose a timestamp clock Based upon these considerations, we choose a timestamp clock
frequency in the range 1 ms to 1 sec per tick. This range also frequency in the range 1 ms to 1 sec per tick. This range also
matches the requirements of the RTTM mechanism, which does not need matches the requirements of the RTTM mechanism, which does not need
much more resolution than the granularity of the retransmit timer, much more resolution than the granularity of the retransmit timer,
e.g., tens or hundreds of milliseconds. e.g., tens or hundreds of milliseconds.
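The figures quoted in (b) can be checked with a few lines of C. The helper names min_tick_ns() and sign_wrap_days() are assumptions made for this sketch; the constants are 2^32 and 2^31 written out as doubles.

```c
/*
 * Shortest tick period, in nanoseconds, for which a full cycle of the
 * 32-bit timestamp clock (2^32 ticks) still takes msl_seconds.
 */
static double min_tick_ns(double msl_seconds)
{
    return msl_seconds * 1e9 / 4294967296.0;          /* 2^32 ticks */
}

/*
 * Time, in days, until the timestamp clock wraps its sign bit
 * (2^31 ticks) for a given tick period in milliseconds.
 */
static double sign_wrap_days(double tick_ms)
{
    return 2147483648.0 * tick_ms / 1000.0 / 86400.0; /* 2^31 ticks */
}
```

For an MSL of 255 seconds min_tick_ns() yields a little over 59 ns, and for a 1 ms tick sign_wrap_days() yields roughly 24.8 days, which agrees with the values given above.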
The PAWS mechanism also puts a strong monotonicity requirement on the The PAWS mechanism also puts a strong monotonicity requirement on the
sender's timestamp clock. The method of implementation of the sender's timestamp clock. The method of implementation of the
skipping to change at page 23, line 40 skipping to change at page 25, line 5
o A clock interrupt may be used to simply increment a binary integer o A clock interrupt may be used to simply increment a binary integer
by 1 periodically. by 1 periodically.
o The timestamp clock may be derived from a system clock that is o The timestamp clock may be derived from a system clock that is
subject to being abruptly changed, by adding a variable offset subject to being abruptly changed, by adding a variable offset
value. This offset is initialized to zero. When a new timestamp value. This offset is initialized to zero. When a new timestamp
clock value is needed, the offset can be adjusted as necessary to clock value is needed, the offset can be adjusted as necessary to
make the new value equal to or larger than the previous value make the new value equal to or larger than the previous value
(which was saved for this purpose). (which was saved for this purpose).
4.5. Outdated Timestamps o A random offset may be added to the timestamp clock on a per
connection basis. See [RFC6528], section 3, on randomizing the
initial sequence number (ISN). The same function with a different
secret key can be used to generate the per connection timestamp
offset.
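The last two options in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the draft's method: the class and function names are hypothetical, the system clock is passed in as a callable, and HMAC-SHA-256 over the connection 4-tuple stands in for the keyed function that the text borrows from [RFC6528]; the draft does not prescribe a particular hash.

```python
import hashlib
import hmac
import struct

MASK32 = 0xFFFFFFFF  # TSval is a 32-bit field


class MonotonicTimestampClock:
    """Derive a non-decreasing tick count from a system clock that may be
    abruptly changed, by maintaining a variable offset (initially zero)."""

    def __init__(self, read_system_ticks):
        self._read = read_system_ticks  # hypothetical callable returning ticks
        self._offset = 0
        self._last = None               # previous value, saved for comparison

    def now(self) -> int:
        value = self._read() + self._offset
        if self._last is not None and value < self._last:
            # The system clock stepped backwards: grow the offset so the
            # new value equals the previous one instead of regressing.
            self._offset += self._last - value
            value = self._last
        self._last = value
        return value


def connection_offset(secret: bytes, local_ip: str, local_port: int,
                      remote_ip: str, remote_port: int) -> int:
    """Per-connection offset from a keyed hash of the 4-tuple (assumed
    construction; the secret should differ from the one used for the ISN)."""
    material = f"{local_ip}|{local_port}|{remote_ip}|{remote_port}".encode()
    digest = hmac.new(secret, material, hashlib.sha256).digest()
    return struct.unpack("!I", digest[:4])[0]


def tsval(clock: MonotonicTimestampClock, offset: int) -> int:
    """Timestamp value to send: clock plus per-connection offset, mod 2**32."""
    return (clock.now() + offset) & MASK32
```

Because the offset depends only on the secret and the 4-tuple, a new incarnation of the same connection keeps the same offset, so timestamps remain monotonic across incarnations as long as the underlying clock is.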
5.5. Outdated Timestamps
If a connection remains idle long enough for the timestamp clock of If a connection remains idle long enough for the timestamp clock of
the other TCP to wrap its sign bit, then the value saved in TS.Recent the other TCP to wrap its sign bit, then the value saved in TS.Recent
will become too old; as a result, the PAWS mechanism will cause all will become too old; as a result, the PAWS mechanism will cause all
subsequent segments to be rejected, freezing the connection (until subsequent segments to be rejected, freezing the connection (until
the timestamp clock wraps its sign bit again). the timestamp clock wraps its sign bit again).
With the chosen range of timestamp clock frequencies (1 sec to 1 ms), With the chosen range of timestamp clock frequencies (1 sec to 1 ms),
the time to wrap the sign bit will be between 24.8 days and 24800 the time to wrap the sign bit will be between 24.8 days and 24800
days. A TCP connection that is idle for more than 24 days and then days. A TCP connection that is idle for more than 24 days and then
skipping to change at page 24, line 27 skipping to change at page 25, line 47
timestamp check fails, i.e., only if SEG.TSval < TS.Recent. If timestamp check fails, i.e., only if SEG.TSval < TS.Recent. If
TS.Recent is found to be invalid, then the segment is accepted, TS.Recent is found to be invalid, then the segment is accepted,
regardless of the failure of the timestamp check, and rule R3 updates regardless of the failure of the timestamp check, and rule R3 updates
TS.Recent with the TSval from the new segment. TS.Recent with the TSval from the new segment.
To detect how long the connection has been idle, the TCP MAY update a To detect how long the connection has been idle, the TCP MAY update a
clock or timestamp value associated with the connection whenever clock or timestamp value associated with the connection whenever
TS.Recent is updated, for example. The details will be TS.Recent is updated, for example. The details will be
implementation-dependent. implementation-dependent.
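The idle-time handling described above might look like the sketch below. It is a simplified illustration: the names are hypothetical, local time is passed in as seconds, and TS.Recent is updated on every accepted segment, whereas the full rule R3 in the draft applies additional conditions that are not reproduced here.

```python
MASK32 = 0xFFFFFFFF
IDLE_LIMIT_SECONDS = 24 * 86_400  # 24 days, as in the text


def ts_before(a: int, b: int) -> bool:
    """True if 32-bit timestamp a is older than b in modular arithmetic,
    i.e. the signed 32-bit difference (a - b) is negative."""
    return ((a - b) & MASK32) > 0x7FFFFFFF


class PawsState:
    """Per-connection PAWS state with an age stamp for TS.Recent."""

    def __init__(self, ts_recent: int, now: float):
        self.ts_recent = ts_recent
        self.ts_recent_age = now  # local time when TS.Recent was last updated

    def accept(self, seg_tsval: int, now: float) -> bool:
        """Return True if the segment passes the (simplified) PAWS check."""
        if ts_before(seg_tsval, self.ts_recent):
            # Timestamp check failed; only now test whether TS.Recent
            # is too old to be trusted.
            if now - self.ts_recent_age <= IDLE_LIMIT_SECONDS:
                return False  # TS.Recent is still valid: reject the segment
            # TS.Recent is outdated: accept despite the failed check.
        self.ts_recent = seg_tsval
        self.ts_recent_age = now
        return True
```

Recording the age only when TS.Recent changes keeps the extra cost off the common path, since the idle test runs only after the timestamp comparison has already failed.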
4.6. Header Prediction 5.6. Header Prediction
"Header prediction" [Jacobson90a] is a high-performance transport "Header prediction" [Jacobson90a] is a high-performance transport
protocol implementation technique that is most important for high- protocol implementation technique that is most important for high-
speed links. This technique optimizes the code for the most common speed links. This technique optimizes the code for the most common
case, receiving a segment correctly and in order. Using header case, receiving a segment correctly and in order. Using header
prediction, the receiver asks the question, "Is this segment the next prediction, the receiver asks the question, "Is this segment the next
in sequence?" This question can be answered in fewer machine in sequence?" This question can be answered in fewer machine
instructions than the question, "Is this segment within the window?" instructions than the question, "Is this segment within the window?"
Adding header prediction to our timestamp procedure leads to the Adding header prediction to our timestamp procedure leads to the
skipping to change at page 25, line 43 skipping to change at page 27, line 15
enough", i.e., it won't contribute significantly to the overall enough", i.e., it won't contribute significantly to the overall
error rate. We therefore believe we can ignore the problem of an error rate. We therefore believe we can ignore the problem of an
old duplicate being accepted by doing header prediction before old duplicate being accepted by doing header prediction before
checking the timestamp. checking the timestamp.
However, this probabilistic argument is not universally accepted, and However, this probabilistic argument is not universally accepted, and
the consensus at present is that the performance gain does not the consensus at present is that the performance gain does not
justify the hazard in the general case. It is therefore recommended justify the hazard in the general case. It is therefore recommended
that H2 follow H1. that H2 follow H1.
4.7. IP Fragmentation 5.7. IP Fragmentation
At high data rates, the protection against old segments provided by At high data rates, the protection against old segments provided by
PAWS can be circumvented by errors in IP fragment reassembly (see PAWS can be circumvented by errors in IP fragment reassembly (see
[RFC4963]). The only way to protect against incorrect IP fragment [RFC4963]). The only way to protect against incorrect IP fragment
reassembly is to not allow the segments to be fragmented. This is reassembly is to not allow the segments to be fragmented. This is
done by setting the Don't Fragment (DF) bit in the IP header. done by setting the Don't Fragment (DF) bit in the IP header.
Setting the DF bit implies the use of Path MTU Discovery as described Setting the DF bit implies the use of Path MTU Discovery as described
in [RFC1191], [RFC1981], and [RFC4821], thus any TCP implementation in [RFC1191], [RFC1981], and [RFC4821], thus any TCP implementation
that implements PAWS MUST also implement Path MTU Discovery. that implements PAWS MUST also implement Path MTU Discovery.
4.8. Duplicates from Earlier Incarnations of Connection 5.8. Duplicates from Earlier Incarnations of Connection
The PAWS mechanism protects against errors due to sequence number The PAWS mechanism protects against errors due to sequence number
wrap-around on high-speed connections. Segments from an earlier wrap-around on high-speed connections. Segments from an earlier
incarnation of the same connection are also a potential cause of old incarnation of the same connection are also a potential cause of old
duplicate errors. In both cases, the TCP mechanisms to prevent such duplicate errors. In both cases, the TCP mechanisms to prevent such
errors depend upon the enforcement of a maximum segment lifetime errors depend upon the enforcement of a maximum segment lifetime
(MSL) by the Internet (IP) layer (see Appendix of RFC 1185 for a (MSL) by the Internet (IP) layer (see Appendix of RFC 1185 for a
detailed discussion). Unlike the case of sequence space wrap-around, detailed discussion). Unlike the case of sequence space wrap-around,
the MSL required to prevent old duplicate errors from earlier the MSL required to prevent old duplicate errors from earlier
incarnations does not depend upon the transfer rate. If the IP layer incarnations does not depend upon the transfer rate. If the IP layer
skipping to change at page 26, line 29 skipping to change at page 28, line 5
no matter how high the network speed. Thus, the PAWS mechanism is no matter how high the network speed. Thus, the PAWS mechanism is
not required for this case. not required for this case.
We may still ask whether the PAWS mechanism can provide additional We may still ask whether the PAWS mechanism can provide additional
security against old duplicates from earlier connections, allowing us security against old duplicates from earlier connections, allowing us
to relax the enforcement of MSL by the IP layer. Appendix B explores to relax the enforcement of MSL by the IP layer. Appendix B explores
this question, showing that further assumptions and/or mechanisms are this question, showing that further assumptions and/or mechanisms are
required, beyond those of PAWS. This is not part of the current required, beyond those of PAWS. This is not part of the current
extension. extension.
5. Conclusions and Acknowledgements 6. Conclusions and Acknowledgements
This memo presented a set of extensions to TCP to provide efficient This memo presented a set of extensions to TCP to provide efficient
operation over large bandwidth * delay product paths and reliable operation over large bandwidth * delay product paths and reliable
operation over very high-speed paths. These extensions are designed operation over very high-speed paths. These extensions are designed
to provide compatible interworking with TCP stacks that do not to provide compatible interworking with TCP stacks that do not
implement the extensions. implement the extensions.
These mechanisms are implemented using TCP options for scaled windows These mechanisms are implemented using TCP options for scaled windows
and timestamps. The timestamps are used for two distinct mechanisms: and timestamps. The timestamps are used for two distinct mechanisms:
RTTM (Round Trip Time Measurement) and PAWS (Protection Against RTTM (Round Trip Time Measurement) and PAWS (Protection Against
skipping to change at page 27, line 12 skipping to change at page 28, line 35
within the End-to-End Task Force on the theoretical limitations of within the End-to-End Task Force on the theoretical limitations of
transport protocols in general and TCP in particular. Task force transport protocols in general and TCP in particular. Task force
members and others on the end2end-interest list have made valuable members and others on the end2end-interest list have made valuable
contributions by pointing out flaws in the algorithms and the contributions by pointing out flaws in the algorithms and the
documentation. Continued discussion and development since the documentation. Continued discussion and development since the
publication of [RFC1323] originally occurred in the IETF TCP Large publication of [RFC1323] originally occurred in the IETF TCP Large
Windows Working Group, later on in the End-to-End Task Force, and Windows Working Group, later on in the End-to-End Task Force, and
most recently in the IETF TCP Maintenance Working Group. The authors most recently in the IETF TCP Maintenance Working Group. The authors
are grateful for all these contributions. are grateful for all these contributions.
6. Security Considerations 7. Security Considerations
The TCP sequence space is a fixed size, and as the window becomes The TCP sequence space is a fixed size, and as the window becomes
larger it becomes easier for an attacker to generate forged packets larger it becomes easier for an attacker to generate forged packets
that can fall within the TCP window, and be accepted as valid that can fall within the TCP window, and be accepted as valid
segments. While use of timestamps and PAWS can help to mitigate segments. While use of timestamps and PAWS can help to mitigate
this, when using PAWS, if an attacker is able to forge a packet that this, when using PAWS, if an attacker is able to forge a packet that
is acceptable to the TCP connection, a timestamp that is in the is acceptable to the TCP connection, a timestamp that is in the
future would cause valid segments to be dropped due to PAWS checks. future would cause valid segments to be dropped due to PAWS checks.
Hence, implementers should take care to not open the TCP window Hence, implementers should take care to not open the TCP window
drastically beyond the requirements of the connection. drastically beyond the requirements of the connection.
skipping to change at page 28, line 42 skipping to change at page 30, line 15
cause some broken middlebox behavior to be detected cause some broken middlebox behavior to be detected
(potentially unresponsive TCP sessions). (potentially unresponsive TCP sessions).
Implementations that depend on PAWS could provide a mechanism for the Implementations that depend on PAWS could provide a mechanism for the
application to determine whether or not PAWS is in use on the application to determine whether or not PAWS is in use on the
connection, and choose to terminate the connection if that protection connection, and choose to terminate the connection if that protection
doesn't exist. This is not just to protect the connection against doesn't exist. This is not just to protect the connection against
middleboxes that might remove the Timestamps option, but also against middleboxes that might remove the Timestamps option, but also against
remote hosts that do not have Timestamp support. remote hosts that do not have Timestamp support.
7. IANA Considerations 7.1. Privacy Considerations
This document has no actions for IANA. The TCP options described in this document do not expose individual
users' data. However, a naive implementation simply using the system
clock as source for the Timestamps option will reveal characteristics
of the TCP, potentially allowing more targeted attacks. It is
therefore RECOMMENDED to generate a random, per-connection offset to
be used with the clock source when generating the Timestamps option
value (see Section 5.4).
8. References Furthermore, the combination, relative ordering and padding of the
TCP options described in Section 2.2 and Section 3.2 will reveal
additional clues to allow the fingerprinting of the system.
8.1. Normative References 8. IANA Considerations
This document has no actions for IANA. The described TCP options are
well known from the superseded [RFC1323].
9. References
9.1. Normative References
[RFC0793] Postel, J., "Transmission Control Protocol", STD 7, [RFC0793] Postel, J., "Transmission Control Protocol", STD 7,
RFC 793, September 1981. RFC 793, September 1981.
[RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, [RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
November 1990. November 1990.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
8.2. Informative References 9.2. Informative References
[Allman99] [Allman99]
Allman, M. and V. Paxson, "On Estimating End-to-End Allman, M. and V. Paxson, "On Estimating End-to-End
Network Path Properties", Proc. ACM SIGCOMM Technical Network Path Properties", Proc. ACM SIGCOMM Technical
Symposium, Cambridge, MA, September 1999, Symposium, Cambridge, MA, September 1999,
<http://aciri.org/mallman/papers/estimation-la.pdf>. <http://aciri.org/mallman/papers/estimation-la.pdf>.
[Ekstroem04] [Ekstroem04]
Ekstroem, H. and R. Ludwig, "The Peak-Hopper: A New End- Ekstroem, H. and R. Ludwig, "The Peak-Hopper: A New End-
to-End Retransmission Timer for Reliable Unicast to-End Retransmission Timer for Reliable Unicast
skipping to change at page 30, line 29 skipping to change at page 32, line 17
[Jain86] Jain, R., "Divergence of Timeout Algorithms for Packet [Jain86] Jain, R., "Divergence of Timeout Algorithms for Packet
Retransmissions", Proc. Fifth Phoenix Conf. on Comp. and Retransmissions", Proc. Fifth Phoenix Conf. on Comp. and
Comm., Scottsdale, Arizona, March 1986, Comm., Scottsdale, Arizona, March 1986,
<http://arxiv.org/ftp/cs/papers/9809/9809097.pdf>. <http://arxiv.org/ftp/cs/papers/9809/9809097.pdf>.
[Karn87] Karn, P. and C. Partridge, "Estimating Round-Trip Times in [Karn87] Karn, P. and C. Partridge, "Estimating Round-Trip Times in
Reliable Transport Protocols", Proc. SIGCOMM '87, Reliable Transport Protocols", Proc. SIGCOMM '87,
August 1987. August 1987.
[Kuehlewind10]
Kuehlewind, M. and B. Briscoe, "Chirping for Congestion
Control - Implementation Feasibility", November 2010,
<bobbriscoe.net/projects/netsvc_i-f/chirp_pfldnet10.pdf>.
[Kuzmanovic03]
Kuzmanovic, A. and E. Knightly, "TCP-LP: Low-Priority
Service via End-Point Congestion Control", 2003,
<www.cs.northwestern.edu/~akuzma/doc/TCP-LP-ToN.pdf>.
[Ludwig00] [Ludwig00]
Ludwig, R. and K. Sklower, "The Eifel Retransmission Ludwig, R. and K. Sklower, "The Eifel Retransmission
Timer", ACM SIGCOMM Computer Communication Review Volume Timer", ACM SIGCOMM Computer Communication Review Volume
30 Issue 3, July 2000, <http://ccr.sigcomm.org/archive/ 30 Issue 3, July 2000, <http://ccr.sigcomm.org/archive/
2000/july00/LudwigFinal.pdf>. 2000/july00/LudwigFinal.pdf>.
[Martin03] [Martin03]
Martin, D., "[Tsvwg] RFC 1323.bis", Message to the tsvwg Martin, D., "[Tsvwg] RFC 1323.bis", Message to the tsvwg
mailing list, September 2003, <http://www.ietf.org/ mailing list, September 2003, <http://www.ietf.org/
mail-archive/web/tsvwg/current/msg04435.html>. mail-archive/web/tsvwg/current/msg04435.html>.
skipping to change at page 31, line 11 skipping to change at page 33, line 9
Proc. ACM SIGCOMM/USENIX Internet Measurement Conference. Proc. ACM SIGCOMM/USENIX Internet Measurement Conference.
October 2004, August 2004, October 2004, August 2004,
<http://www.icir.net/tbit/tbit-Aug2004.pdf>. <http://www.icir.net/tbit/tbit-Aug2004.pdf>.
[Medina05] [Medina05]
Medina, A., Allman, M., and S. Floyd, "Measuring the Medina, A., Allman, M., and S. Floyd, "Measuring the
Evolution of Transport Protocols in the Internet", ACM Evolution of Transport Protocols in the Internet", ACM
Computer Communication Review 35(2), April 2005, Computer Communication Review 35(2), April 2005,
<http://icir.net/floyd/papers/TCPevolution-Mar2005.pdf>. <http://icir.net/floyd/papers/TCPevolution-Mar2005.pdf>.
[Oppermann13]
Oppermann, A., "[tcpm] Explanation to the relaxation of
TSopt acceptance rules", Message to the tcpm mailing list,
Jun 2013, <http://www.ietf.org/mail-archive/web/tcpm/
current/msg08001.html>.
[RFC0896] Nagle, J., "Congestion control in IP/TCP internetworks", [RFC0896] Nagle, J., "Congestion control in IP/TCP internetworks",
RFC 896, January 1984. RFC 896, January 1984.
[RFC1072] Jacobson, V. and R. Braden, "TCP extensions for long-delay [RFC1072] Jacobson, V. and R. Braden, "TCP extensions for long-delay
paths", RFC 1072, October 1988. paths", RFC 1072, October 1988.
[RFC1110] McKenzie, A., "Problem with the TCP big window option", [RFC1110] McKenzie, A., "Problem with the TCP big window option",
RFC 1110, August 1989. RFC 1110, August 1989.
[RFC1122] Braden, R., "Requirements for Internet Hosts - [RFC1122] Braden, R., "Requirements for Internet Hosts -
skipping to change at page 32, line 15 skipping to change at page 34, line 21
[RFC4963] Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly [RFC4963] Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly
Errors at High Data Rates", RFC 4963, July 2007. Errors at High Data Rates", RFC 4963, July 2007.
[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
Control", RFC 5681, September 2009. Control", RFC 5681, September 2009.
[RFC6298] Paxson, V., Allman, M., Chu, J., and M. Sargent, [RFC6298] Paxson, V., Allman, M., Chu, J., and M. Sargent,
"Computing TCP's Retransmission Timer", RFC 6298, "Computing TCP's Retransmission Timer", RFC 6298,
June 2011. June 2011.
[RFC6528] Gont, F. and S. Bellovin, "Defending against Sequence
Number Attacks", RFC 6528, February 2012.
[RFC6675] Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M., [RFC6675] Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M.,
and Y. Nishida, "A Conservative Loss Recovery Algorithm and Y. Nishida, "A Conservative Loss Recovery Algorithm
Based on Selective Acknowledgment (SACK) for TCP", Based on Selective Acknowledgment (SACK) for TCP",
RFC 6675, August 2012. RFC 6675, August 2012.
[RFC6691] Borman, D., "TCP Options and Maximum Segment Size (MSS)", [RFC6691] Borman, D., "TCP Options and Maximum Segment Size (MSS)",
RFC 6691, July 2012. RFC 6691, July 2012.
[RFC6817] Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind,
"Low Extra Delay Background Transport (LEDBAT)", RFC 6817,
December 2012.
[Watson81] [Watson81]
Watson, R., "Timer-based Mechanisms in Reliable Transport Watson, R., "Timer-based Mechanisms in Reliable Transport
Protocol Connection Management", Computer Networks, Vol. Protocol Connection Management", Computer Networks, Vol.
5, 1981. 5, 1981.
[Zhang86] Zhang, L., "Why TCP Timers Don't Work Well", Proc. SIGCOMM [Zhang86] Zhang, L., "Why TCP Timers Don't Work Well", Proc. SIGCOMM
'86, Stowe, VT, August 1986. '86, Stowe, VT, August 1986.
Appendix A. Implementation Suggestions Appendix A. Implementation Suggestions
skipping to change at page 45, line 27 skipping to change at page 47, line 36
(h) <RST> segments are explicitly excluded from PAWS processing. (h) <RST> segments are explicitly excluded from PAWS processing.
(i) Added text to clarify the precedence between regular TCP (i) Added text to clarify the precedence between regular TCP
[RFC0793] and this document's Timestamps option / PAWS processing. [RFC0793] and this document's Timestamps option / PAWS processing.
Discussion about combined acceptability checks is ongoing. Discussion about combined acceptability checks is ongoing.
(j) Snd.TSoffset and Snd.TSclock variables have been added. (j) Snd.TSoffset and Snd.TSclock variables have been added.
Snd.TSclock is the sum of my.TSclock and Snd.TSoffset. This Snd.TSclock is the sum of my.TSclock and Snd.TSoffset. This
allows the starting points for timestamp values to be randomized allows the starting points for timestamp values to be randomized
on a per-connection basis. Setting Snd.TSoffset to zero yields on a per-connection basis. Setting Snd.TSoffset to zero yields
the same results as [RFC1323]. the same results as [RFC1323]. Text was added to guide
implementers to the proper selection of these offsets, as
entirely random offsets for each new connection will conflict
with PAWS.
(k) Appendix A has been expanded with information about the TCP (k) Appendix A has been expanded with information about the TCP
Urgent Pointer. An earlier revision contained text around the Urgent Pointer. An earlier revision contained text around the
TCP MSS option, which was split off into [RFC6691]. TCP MSS option, which was split off into [RFC6691].
(l) One correction was made to the Event Processing Summary in (l) One correction was made to the Event Processing Summary in
Appendix D. In SEND CALL/ESTABLISHED STATE, RCV.WND is used to Appendix D. In SEND CALL/ESTABLISHED STATE, RCV.WND is used to
fill in the SEG.WND value, not SND.WND. fill in the SEG.WND value, not SND.WND.
(m) Appendix G was added to exemplify how an RTO calculation might (m) Appendix G was added to exemplify how an RTO calculation might
be updated to properly take the much higher RTT sampling be updated to properly take the much higher RTT sampling
frequency enabled by the Timestamps option into account. frequency enabled by the Timestamps option into account.
Editorial changes to the document that don't impact the Editorial changes to the document that don't impact the
implementation or function of the mechanisms described in this implementation or function of the mechanisms described in this
document include: document include:
(a) Removed much of the discussion in Section 1 to streamline the (a) Removed much of the discussion in Section 1 to streamline the
document. However, detailed examples and discussions in document. However, detailed examples and discussions in
Section 2, Section 3 and Section 4 are kept as a guideline for Section 2, Section 3 and Section 5 are kept as a guideline for
implementers. implementers.
(b) Removed references to "new" options, as the options were (b) Added short text that the use of WS increases the chances of
sequence number wrap, thus the PAWS mechanism is required in
certain environments.
(c) Removed references to "new" options, as the options were
introduced in [RFC1323] already. Changed the text in introduced in [RFC1323] already. Changed the text in
Section 1.3 to specifically address TS and WS options. Section 1.3 to specifically address TS and WS options.
(c) Section 1.4 was added for [RFC2119] wording. Normative text was (d) Section 1.4 was added for [RFC2119] wording. Normative text was
updated with the appropriate phrases. updated with the appropriate phrases.
(d) Added < > brackets to mark specific types of segments, and (e) Added < > brackets to mark specific types of segments, and
replaced most occurrences of "packet" with "segment", where TCP replaced most occurrences of "packet" with "segment", where TCP
segments are referred to. segments are referred to.
(e) Updated the text in Section 3 to take into account what has been (f) Updated the text in Section 3 to take into account what has been
learned since [RFC1323]. learned since [RFC1323].
(f) Removed the list of changes between [RFC1323] and prior (g) Removed the list of changes between [RFC1323] and prior
versions. These changes are mentioned in Appendix C of versions. These changes are mentioned in Appendix C of
[RFC1323]. [RFC1323].
(g) Moved Appendix Changes from RFC 1323 to the end of the (h) Moved Appendix Changes from RFC 1323 to the end of the
appendices for easier lookup. In addition, the entries were appendices for easier lookup. In addition, the entries were
split into a technical and an editorial part, and sorted to split into a technical and an editorial part, and sorted to
roughly correspond with the sections in the text where they roughly correspond with the sections in the text where they
apply. apply.
Authors' Addresses Authors' Addresses
David Borman David Borman
Quantum Corporation Quantum Corporation
Mendota Heights MN 55120 Mendota Heights MN 55120
 End of changes. 76 change blocks. 
219 lines changed or deleted 304 lines changed or added
