draft-ietf-tcpm-1323bis-14.txt   draft-ietf-tcpm-1323bis-15.txt 
TCP Maintenance (TCPM) D. Borman TCP Maintenance (TCPM) D. Borman
Internet-Draft Quantum Corporation Internet-Draft Quantum Corporation
Intended status: Standards Track B. Braden Intended status: Standards Track B. Braden
Expires: November 24, 2013 University of Southern Expires: February 7, 2014 University of Southern
California California
V. Jacobson V. Jacobson
Packet Design Google, Inc.
R. Scheffenegger, Ed. R. Scheffenegger, Ed.
NetApp, Inc. NetApp, Inc.
May 23, 2013 August 6, 2013
TCP Extensions for High Performance TCP Extensions for High Performance
draft-ietf-tcpm-1323bis-14 draft-ietf-tcpm-1323bis-15
Abstract Abstract
This document specifies a set of TCP extensions to improve This document specifies a set of TCP extensions to improve
performance over paths with a large bandwidth * delay product and to performance over paths with a large bandwidth * delay product and to
provide reliable operation over very high-speed paths. It defines provide reliable operation over very high-speed paths. It defines
TCP options for scaled windows and timestamps. The timestamps are TCP options for scaled windows and timestamps. The timestamps are
used for two distinct mechanisms, RTTM (Round Trip Time Measurement) used for two distinct mechanisms, RTTM (Round Trip Time Measurement)
and PAWS (Protection Against Wrapped Sequences). and PAWS (Protection Against Wrapped Sequences).
This document updates and obsoletes RFC 1323. This document obsoletes RFC 1323 and describes changes from it.
Status of this Memo Status of this Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on November 24, 2013. This Internet-Draft will expire on February 7, 2014.
Copyright Notice Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as the Copyright (c) 2013 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 3, line 20 skipping to change at page 3, line 20
1.3. Using TCP options . . . . . . . . . . . . . . . . . . . . 6 1.3. Using TCP options . . . . . . . . . . . . . . . . . . . . 6
1.4. Terminology . . . . . . . . . . . . . . . . . . . . . . . 7 1.4. Terminology . . . . . . . . . . . . . . . . . . . . . . . 7
2. TCP Window Scale Option . . . . . . . . . . . . . . . . . . . 8 2. TCP Window Scale Option . . . . . . . . . . . . . . . . . . . 8
2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 8 2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 8
2.2. Window Scale Option . . . . . . . . . . . . . . . . . . . 8 2.2. Window Scale Option . . . . . . . . . . . . . . . . . . . 8
2.3. Using the Window Scale Option . . . . . . . . . . . . . . 9 2.3. Using the Window Scale Option . . . . . . . . . . . . . . 9
2.4. Addressing Window Retraction . . . . . . . . . . . . . . . 10 2.4. Addressing Window Retraction . . . . . . . . . . . . . . . 10
3. TCP Timestamps option . . . . . . . . . . . . . . . . . . . . 12 3. TCP Timestamps option . . . . . . . . . . . . . . . . . . . . 12
3.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 12 3.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 12
3.2. Timestamps option . . . . . . . . . . . . . . . . . . . . 12 3.2. Timestamps option . . . . . . . . . . . . . . . . . . . . 12
3.3. The RTTM Mechanism . . . . . . . . . . . . . . . . . . . . 13 3.3. The RTTM Mechanism . . . . . . . . . . . . . . . . . . . . 14
3.4. Updating the RTO value . . . . . . . . . . . . . . . . . . 15 3.4. Updating the RTO value . . . . . . . . . . . . . . . . . . 15
3.5. Which Timestamp to Echo . . . . . . . . . . . . . . . . . 15 3.5. Which Timestamp to Echo . . . . . . . . . . . . . . . . . 16
4. PAWS - Protection Against Wrapped Sequence Numbers . . . . . . 18 4. PAWS - Protection Against Wrapped Sequence Numbers . . . . . . 18
4.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 18 4.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 18
4.2. The PAWS Mechanism . . . . . . . . . . . . . . . . . . . . 18 4.2. The PAWS Mechanism . . . . . . . . . . . . . . . . . . . . 19
4.3. Basic PAWS Algorithm . . . . . . . . . . . . . . . . . . . 19 4.3. Basic PAWS Algorithm . . . . . . . . . . . . . . . . . . . 20
4.4. Timestamp Clock . . . . . . . . . . . . . . . . . . . . . 21 4.4. Timestamp Clock . . . . . . . . . . . . . . . . . . . . . 22
4.5. Outdated Timestamps . . . . . . . . . . . . . . . . . . . 23 4.5. Outdated Timestamps . . . . . . . . . . . . . . . . . . . 23
4.6. Header Prediction . . . . . . . . . . . . . . . . . . . . 23 4.6. Header Prediction . . . . . . . . . . . . . . . . . . . . 24
4.7. IP Fragmentation . . . . . . . . . . . . . . . . . . . . . 25 4.7. IP Fragmentation . . . . . . . . . . . . . . . . . . . . . 25
4.8. Duplicates from Earlier Incarnations of Connection . . . . 25 4.8. Duplicates from Earlier Incarnations of Connection . . . . 26
5. Conclusions and Acknowledgements . . . . . . . . . . . . . . . 25 5. Conclusions and Acknowledgements . . . . . . . . . . . . . . . 26
6. Security Considerations . . . . . . . . . . . . . . . . . . . 26 6. Security Considerations . . . . . . . . . . . . . . . . . . . 27
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 27 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 28
8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 28 8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 29
8.1. Normative References . . . . . . . . . . . . . . . . . . . 28 8.1. Normative References . . . . . . . . . . . . . . . . . . . 29
8.2. Informative References . . . . . . . . . . . . . . . . . . 28 8.2. Informative References . . . . . . . . . . . . . . . . . . 29
Appendix A. Implementation Suggestions . . . . . . . . . . . . . 31 Appendix A. Implementation Suggestions . . . . . . . . . . . . . 32
Appendix B. Duplicates from Earlier Connection Incarnations . . . 32 Appendix B. Duplicates from Earlier Connection Incarnations . . . 33
B.1. System Crash with Loss of State . . . . . . . . . . . . . 32 B.1. System Crash with Loss of State . . . . . . . . . . . . . 33
B.2. Closing and Reopening a Connection . . . . . . . . . . . . 33 B.2. Closing and Reopening a Connection . . . . . . . . . . . . 34
Appendix C. Summary of Notation . . . . . . . . . . . . . . . . . 34 Appendix C. Summary of Notation . . . . . . . . . . . . . . . . . 35
Appendix D. Event Processing Summary . . . . . . . . . . . . . . 35 Appendix D. Event Processing Summary . . . . . . . . . . . . . . 36
Appendix E. Timestamps Edge Cases . . . . . . . . . . . . . . . . 41 Appendix E. Timestamps Edge Cases . . . . . . . . . . . . . . . . 42
Appendix F. Window Retraction Example . . . . . . . . . . . . . . 41 Appendix F. Window Retraction Example . . . . . . . . . . . . . . 42
Appendix G. RTO calculation modification . . . . . . . . . . . . 42 Appendix G. RTO calculation modification . . . . . . . . . . . . 43
Appendix H. Changes from RFC 1323 . . . . . . . . . . . . . . . . 43 Appendix H. Changes from RFC 1323 . . . . . . . . . . . . . . . . 44
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 45 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 46
1. Introduction 1. Introduction
The TCP protocol [RFC0793] was designed to operate reliably over The TCP protocol [RFC0793] was designed to operate reliably over
almost any transmission medium regardless of transmission rate, almost any transmission medium regardless of transmission rate,
delay, corruption, duplication, or reordering of segments. Over the delay, corruption, duplication, or reordering of segments. Over the
years, advances in networking technology has resulted in ever-higher years, advances in networking technology has resulted in ever-higher
transmission speeds, and the fastest paths are well beyond the domain transmission speeds, and the fastest paths are well beyond the domain
for which TCP was originally engineered. for which TCP was originally engineered.
skipping to change at page 5, line 10 skipping to change at page 5, line 10
option, "Window Scale", to allow windows larger than 2^16. This option, "Window Scale", to allow windows larger than 2^16. This
option defines an implicit scale factor, which is used to option defines an implicit scale factor, which is used to
multiply the window size value found in a TCP header to obtain multiply the window size value found in a TCP header to obtain
the true window size. the true window size.
(2) Recovery from Losses (2) Recovery from Losses
Packet losses in an LFN can have a catastrophic effect on Packet losses in an LFN can have a catastrophic effect on
throughput. throughput.
To generalize the Fast Retransmit/Fast Recovery mechanism to To generalize the Fast Retransmit / Fast Recovery mechanism to
handle multiple packets dropped per window, selective handle multiple packets dropped per window, Selective
acknowledgments are required. Unlike the normal cumulative Acknowledgments are required. Unlike the normal cumulative
acknowledgments of TCP, selective acknowledgments give the acknowledgments of TCP, Selective Acknowledgments give the
sender a complete picture of which segments are queued at the sender a complete picture of which segments are queued at the
receiver and which have not yet arrived. receiver and which have not yet arrived.
Selective acknowledgements and their use are specified in Selective acknowledgements and their use are specified in
separate documents, "TCP Selective Acknowledgment Options" separate documents, "TCP Selective Acknowledgment Options"
[RFC2018], "An Extension to the Selective Acknowledgement (SACK) [RFC2018], "An Extension to the Selective Acknowledgement (SACK)
Option for TCP" [RFC2883], and "A Conservative Selective Option for TCP" [RFC2883], and "A Conservative Selective
Acknowledgment (SACK)-based Loss Recovery Algorithm for TCP" Acknowledgment (SACK)-based Loss Recovery Algorithm for TCP"
[RFC6675], and not further discussed in this document. [RFC6675], and not further discussed in this document.
skipping to change at page 7, line 13 skipping to change at page 7, line 13
the gains in performance and security in an actual deployment. the gains in performance and security in an actual deployment.
Appendix A contains a recommended layout of the options in TCP Appendix A contains a recommended layout of the options in TCP
headers to achieve reasonable data field alignment. headers to achieve reasonable data field alignment.
Finally, we observe that most of the mechanisms defined in this Finally, we observe that most of the mechanisms defined in this
document are important for LFN's and/or very high-speed networks. document are important for LFN's and/or very high-speed networks.
For low-speed networks, it might be a performance optimization to NOT For low-speed networks, it might be a performance optimization to NOT
use these mechanisms. A TCP vendor concerned about optimal use these mechanisms. A TCP vendor concerned about optimal
performance over low-speed paths might consider turning these performance over low-speed paths might consider turning these
extensions off for low-speed paths, or allow a user or installation extensions off for low-speed paths, or allow a user or installation
manager to disable them. manager to disable them.
1.4. Terminology 1.4. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119]. document are to be interpreted as described in [RFC2119].
In this document, these words will appear with that interpretation In this document, these words will appear with that interpretation
only when in UPPER CASE. Lower case uses of these words are not to only when in UPPER CASE. Lower case uses of these words are not to
skipping to change at page 12, line 13 skipping to change at page 12, line 13
the window announced by the most recent <ACK>. the window announced by the most recent <ACK>.
3. TCP Timestamps option 3. TCP Timestamps option
3.1. Introduction 3.1. Introduction
TCP measures the round trip time (RTT), primarily for the purpose of TCP measures the round trip time (RTT), primarily for the purpose of
arriving at a reasonable value for the Retransmission Timeout (RTO) arriving at a reasonable value for the Retransmission Timeout (RTO)
timer interval. Accurate and current RTT estimates are necessary to timer interval. Accurate and current RTT estimates are necessary to
adapt to changing traffic conditions, while a conservative estimate adapt to changing traffic conditions, while a conservative estimate
of the RTO inveral is necessary to minimize spurious RTOs. of the RTO interval is necessary to minimize spurious RTOs.
When [RFC1323] was originally written, it was perceived that taking When [RFC1323] was originally written, it was perceived that taking
RTT measurements for each segment, and also during retransmissions, RTT measurements for each segment, and also during retransmissions,
would contribute to reduce spurious RTOs, while maintaining the would contribute to reduce spurious RTOs, while maintaining the
timeliness of necessary RTOs. At the time, RTO was also the only timeliness of necessary RTOs. At the time, RTO was also the only
mechanism to make use of the measured RTT. It has been shown, that mechanism to make use of the measured RTT. It has been shown, that
taking more RTT samples has only a very limited effect to optimize taking more RTT samples has only a very limited effect to optimize
RTOs [Allman99]. RTOs [Allman99].
This document makes a clear distinction between the round trip time This document makes a clear distinction between the round trip time
skipping to change at page 13, line 24 skipping to change at page 13, line 24
generally be from the most recent Timestamps option that was generally be from the most recent Timestamps option that was
received; however, there are exceptions that are explained below. received; however, there are exceptions that are explained below.
A TCP MAY send the Timestamps option (TSopt) in an initial <SYN> A TCP MAY send the Timestamps option (TSopt) in an initial <SYN>
segment (i.e., segment containing a SYN bit and no ACK bit), and MAY segment (i.e., segment containing a SYN bit and no ACK bit), and MAY
send a TSopt in other segments only if it received a TSopt in the send a TSopt in other segments only if it received a TSopt in the
initial <SYN> or <SYN,ACK> segment for the connection. initial <SYN> or <SYN,ACK> segment for the connection.
Once TSopt has been successfully negotiated (sent and received) Once TSopt has been successfully negotiated (sent and received)
during the <SYN>, <SYN,ACK> exchange, TSopt MUST be sent in every during the <SYN>, <SYN,ACK> exchange, TSopt MUST be sent in every
non-<RST> segment for the duration of the connection, and SHOULD be non-<RST> segment for the duration of the connection, and SHOULD be
sent in an <RST> segment (see Section 4.2 for details). If a non- sent in an <RST> segment (see Section 4.2 for details). If a non-
<RST> segment is received without a TSopt, a TCP MUST drop the <RST> segment is received without a TSopt, a TCP SHOULD silently drop
segment and MAY also send an <ACK> for the last in-sequence segment. the segment. A TCP MUST NOT abort a TCP connection because any
A TCP MUST NOT abort a TCP connection because any segment lacks an segment lacks an expected TSopt.
expected TSopt.
Implementations are strongly encouraged to follow the above rules for
handling a missing Timestamps option, and the order of precedence
mentioned in Section 4.3 when deciding on the acceptance of a
segment.
If a receiver chooses to accept a segment without an expected
Timestamps option, it must be clear that undetectable data corruption
may occur.
Such a TCP receiver may experience undetectable wrapped-sequence
effects, such as data (payload) corruption or session stalls. In
order to maintain the integrity of the payload data, in particular on
high speed networks, it is paramount to follow the described
processing rules.
However, it has been mentioned that under some circumstances, the
above guidelines are too strict, and some paths sporadically suppress
the Timestamps option, while maintaining payload integrity. A path
behaving in this manner should be deemed unacceptable, but it has
been noted that some implementations relax the acceptance rules as a
workaround, and allow TCP to run across such paths.
If a TSopt is received on a connection where TSopt was not negotiated If a TSopt is received on a connection where TSopt was not negotiated
in the initial three-way handshake, the TSopt MUST be ignored and the in the initial three-way handshake, the TSopt MUST be ignored and the
packet processed normally. packet processed normally.
In the case of crossing <SYN> segments where one <SYN> contains a In the case of crossing <SYN> segments where one <SYN> contains a
TSopt and the other doesn't, both sides MAY send a TSopt in the TSopt and the other doesn't, both sides MAY send a TSopt in the
<SYN,ACK> segment. <SYN,ACK> segment.
TSopt is required for the two mechanisms described in sections 3.3 TSopt is required for the two mechanisms described in sections 3.3
skipping to change at page 14, line 7 skipping to change at page 14, line 28
3.3. The RTTM Mechanism 3.3. The RTTM Mechanism
RTTM places a Timestamps option in every segment, with a TSval that RTTM places a Timestamps option in every segment, with a TSval that
is obtained from a (virtual) "timestamp clock". Values of this clock is obtained from a (virtual) "timestamp clock". Values of this clock
MUST be at least approximately proportional to real time, in order to MUST be at least approximately proportional to real time, in order to
measure actual RTT. measure actual RTT.
These TSval values are echoed in TSecr values in the reverse These TSval values are echoed in TSecr values in the reverse
direction. The difference between a received TSecr value and the direction. The difference between a received TSecr value and the
current timestamp clock value provides an RTTmeasurement. current timestamp clock value provides an RTT measurement.
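As an illustration of the subtraction described above, the following is a minimal sketch rather than anything taken from the draft. The function names and the `tick_seconds` parameter are hypothetical, and the sketch assumes the timestamp clock is an unsigned 32-bit counter, so the difference is taken modulo 2^32.

```python
MOD32 = 1 << 32  # TSval and TSecr are 32-bit fields


def rtt_ticks(now_ticks: int, seg_tsecr: int) -> int:
    """Ticks elapsed between sending a TSval and receiving its echo.

    The subtraction is done modulo 2^32 so that a wrap of the
    timestamp clock between send and echo still gives a small
    positive difference.
    """
    return (now_ticks - seg_tsecr) % MOD32


def rtt_seconds(now_ticks: int, seg_tsecr: int, tick_seconds: float) -> float:
    """Convert the tick difference to seconds.

    tick_seconds is the (assumed, hypothetical) period of the local
    timestamp clock, e.g. 0.001 for a 1 ms clock.
    """
    return rtt_ticks(now_ticks, seg_tsecr) * tick_seconds
```

For example, with a 1 ms clock, an echo of 1000 that arrives when the local clock reads 1050 corresponds to an RTT measurement of about 50 ms.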
When timestamps are used, every segment that is received will contain When timestamps are used, every segment that is received will contain
a TSecr value. However, these values cannot all be used to update a TSecr value. However, these values cannot all be used to update
the measured RTT. The following example illustrates why. It shows a the measured RTT. The following example illustrates why. It shows a
one-way data flow with segments arriving in sequence without loss. one-way data flow with segments arriving in sequence without loss.
Here A, B, C... represent data blocks occupying successive blocks of Here A, B, C... represent data blocks occupying successive blocks of
sequence numbers, and ACK(A),... represent the corresponding sequence numbers, and ACK(A),... represent the corresponding
cumulative acknowledgments. The two timestamp fields of the cumulative acknowledgments. The two timestamp fields of the
Timestamps option are shown symbolically as <TSval=x,TSecr=y>. Each Timestamps option are shown symbolically as <TSval=x,TSecr=y>. Each
TSecr field contains the value most recently received in a TSval TSecr field contains the value most recently received in a TSval
skipping to change at page 15, line 8 skipping to change at page 15, line 42
the left edge of the send window, i.e. SND.UNA is the left edge of the send window, i.e. SND.UNA is
increased. increased.
Since TCP B is not sending data, the data segment C does not Since TCP B is not sending data, the data segment C does not
acknowledge any new data when it arrives at B. Thus, the inflated acknowledge any new data when it arrives at B. Thus, the inflated
RTTM measurement is not used to update B's RTTM measurement. RTTM measurement is not used to update B's RTTM measurement.
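The condition that a TSecr value is used only when SND.UNA is increased can be sketched as follows. This is an illustrative sketch under stated assumptions, not the draft's event processing: `seq_gt` and `rttm_sample` are hypothetical names, and sequence numbers are compared modulo 2^32.

```python
MOD32 = 1 << 32
HALF = 1 << 31


def seq_gt(a: int, b: int) -> bool:
    """True if 32-bit sequence number a is after b (modulo 2^32)."""
    return 0 < (a - b) % MOD32 < HALF


def rttm_sample(snd_una: int, seg_ack: int, seg_tsecr: int, now_ticks: int):
    """Return an RTT sample in clock ticks, or None if the segment
    must not be used to update the measured RTT.

    A sample is taken only when the segment's ACK field advances the
    left edge of the send window (SND.UNA would be increased).
    """
    if not seq_gt(seg_ack, snd_una):
        return None  # no new data acknowledged: ignore this TSecr
    return (now_ticks - seg_tsecr) % MOD32
```

A segment that acknowledges no new data, such as data segment C arriving at B in the example above, yields `None`, so its inflated measurement is not used.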
3.4. Updating the RTO value 3.4. Updating the RTO value
[Ludwig00] and [Floyd05] have highlighted the problem that an [Ludwig00] and [Floyd05] have highlighted the problem that an
unmodified RTO calculation, which is updated with per-packet RTT unmodified RTO calculation, which is updated with per-packet RTT
samples, will truncate the path history too soon. This can lead to samples, will truncate the path history too soon. This can lead to
an increase in spurious retransmissions, when the path properties an increase in spurious retransmissions, when the path properties
vary in the order of a few RTTs, but a high number of RTT samples are vary in the order of a few RTTs, but a high number of RTT samples are
taken on a much shorter timescale. taken on a much shorter timescale.
Implementers should note that with timestamps multiple RTTMs can be Implementers should note that with timestamps multiple RTTMs can be
taken per RTT. The [RFC6298] RTO estimator has weighting factors, taken per RTT. The [RFC6298] RTO estimator has weighting factors,
alpha and beta, based on an implicit assumption that at most one RTTM alpha and beta, based on an implicit assumption that at most one RTTM
will be sampled per RTT. When multiple RTTMs per RTT are available will be sampled per RTT. When multiple RTTMs per RTT are available
to update the RTO estimator, this implicit assumption must be to update the RTO estimator, this implicit assumption must be
skipping to change at page 15, line 52 skipping to change at page 16, line 37
The sender will continue sending until the window is filled, and The sender will continue sending until the window is filled, and
the receiver may be generating <ACK>s as these out-of-order the receiver may be generating <ACK>s as these out-of-order
segments arrive (e.g., to aid "fast retransmit"). segments arrive (e.g., to aid "fast retransmit").
The lost segment is probably a sign of congestion, and in that The lost segment is probably a sign of congestion, and in that
situation the sender should be conservative about situation the sender should be conservative about
retransmission. Furthermore, it is better to overestimate than retransmission. Furthermore, it is better to overestimate than
underestimate the RTT. An <ACK> for an out-of-order segment underestimate the RTT. An <ACK> for an out-of-order segment
SHOULD therefore contain the timestamp from the most recent SHOULD therefore contain the timestamp from the most recent
segment that advanced the window. segment that advanced RCV.NXT.
The same situation occurs if segments are re-ordered by the The same situation occurs if segments are re-ordered by the
network. network.
(C) A filled hole in the sequence space. (C) A filled hole in the sequence space.
The segment that fills the hole and advances the window The segment that fills the hole and advances the window
represents the most recent measurement of the network represents the most recent measurement of the network
characteristics. An RTT computed from an earlier segment would characteristics. An RTT computed from an earlier segment would
probably include the sender's retransmit time-out, badly biasing probably include the sender's retransmit time-out, badly biasing
skipping to change at page 19, line 11 skipping to change at page 19, line 48
fields carried in returning <ACK> or data segments. PAWS submits all fields carried in returning <ACK> or data segments. PAWS submits all
incoming segments to the same test, and therefore protects against incoming segments to the same test, and therefore protects against
duplicate <ACK> segments as well as data segments. (An alternative duplicate <ACK> segments as well as data segments. (An alternative
non-symmetric algorithm would protect against old duplicate <ACK>s: non-symmetric algorithm would protect against old duplicate <ACK>s:
the sender of data would reject incoming <ACK> segments whose TSecr the sender of data would reject incoming <ACK> segments whose TSecr
values were less than the TSecr saved from the last segment whose ACK values were less than the TSecr saved from the last segment whose ACK
field advanced the left edge of the send window. This algorithm was field advanced the left edge of the send window. This algorithm was
deemed to lack economy of mechanism and symmetry.) deemed to lack economy of mechanism and symmetry.)
TSval timestamps sent on <SYN> and <SYN,ACK> segments are used to TSval timestamps sent on <SYN> and <SYN,ACK> segments are used to
initialize PAWS. PAWS protects against old duplicate non-<SYN> initialize PAWS. PAWS protects against old duplicate non-<SYN>
segments, and duplicate <SYN> segments received while there is a segments, and duplicate <SYN> segments received while there is a
synchronized connection. Duplicate <SYN> and <SYN,ACK> segments synchronized connection. Duplicate <SYN> and <SYN,ACK> segments
received when there is no connection will be discarded by the normal received when there is no connection will be discarded by the normal
3-way handshake and sequence number checks of TCP. 3-way handshake and sequence number checks of TCP.
[RFC1323] recommended that <RST> segments NOT carry timestamps, and [RFC1323] recommended that <RST> segments NOT carry timestamps, and
that they be acceptable regardless of their timestamp. At that time, that they be acceptable regardless of their timestamp. At that time,
the thinking was that old duplicate <RST> segments should be the thinking was that old duplicate <RST> segments should be
exceedingly unlikely, and their cleanup function should take exceedingly unlikely, and their cleanup function should take
precedence over timestamps. More recently, discussions about various precedence over timestamps. More recently, discussions about various
skipping to change at page 19, line 39 skipping to change at page 20, line 27
be set to SEG.TSval from the incoming segment and SEG.TSval SHOULD be be set to SEG.TSval from the incoming segment and SEG.TSval SHOULD be
set to zero. If an <RST> is being generated because of a user abort, set to zero. If an <RST> is being generated because of a user abort,
and Snd.TS.OK is set, then a Timestamps option SHOULD be included in and Snd.TS.OK is set, then a Timestamps option SHOULD be included in
the <RST>. When an <RST> segment is received, it MUST NOT be the <RST>. When an <RST> segment is received, it MUST NOT be
subjected to PAWS checks, and information from the Timestamps option subjected to PAWS checks, and information from the Timestamps option
MUST NOT be used to update connection state information. SEG.TSecr MUST NOT be used to update connection state information. SEG.TSecr
MAY be used to provide stricter <RST> acceptance checks. MAY be used to provide stricter <RST> acceptance checks.
4.3. Basic PAWS Algorithm 4.3. Basic PAWS Algorithm
The PAWS algorithm REQUIRES the following processing to be performed If the PAWS algorithm is used, the following processing MUST be
on all incoming segments for a synchronized connection. Also, PAWS performed on all incoming segments for a synchronized connection.
processing MUST take precedence over the regular TCP acceptability Also, PAWS processing MUST take precedence over the regular TCP
check (Section 3.3 in [RFC0793]), which is performed after acceptability check (Section 3.3 in [RFC0793]), which is performed
verification of the received Timestamps option: after verification of the received Timestamps option:
R1) If there is a Timestamps option in the arriving segment, R1) If there is a Timestamps option in the arriving segment,
SEG.TSval < TS.Recent, TS.Recent is valid (see later discussion) SEG.TSval < TS.Recent, TS.Recent is valid (see later discussion)
and the RST bit is not set, then treat the arriving segment as and the RST bit is not set, then treat the arriving segment as
not acceptable: not acceptable:
Send an acknowledgement in reply as specified in [RFC0793] Send an acknowledgement in reply as specified in [RFC0793]
page 69 and drop the segment. page 69 and drop the segment.
Note: it is necessary to send an <ACK> segment in order to Note: it is necessary to send an <ACK> segment in order to
retain TCP's mechanisms for detecting and recovering from retain TCP's mechanisms for detecting and recovering from
half-open connections. For example, see Figure 10 of half-open connections. For example, see Figure 10 of
[RFC0793]. [RFC0793].
R2) If the segment is outside the window, reject it (normal TCP R2) If the segment is outside the window, reject it (normal TCP
processing) processing)
R3) If an arriving segment satisfies: SEG.SEQ <= Last.ACK.sent (see R3) If an arriving segment satisfies: SEG.SEQ <= Last.ACK.sent (see
Section 3.5), then record its timestamp in TS.Recent. Section 3.5), then record its timestamp in TS.Recent.
R4) If an arriving segment is in-sequence (i.e., at the left window R4) If an arriving segment is in-sequence (i.e., at the left window
edge), then accept it normally. edge), then accept it normally.
R5) Otherwise, treat the segment as a normal in-window, out-of- R5) Otherwise, treat the segment as a normal in-window, out-of-
sequence TCP segment (e.g., queue it for later delivery to the sequence TCP segment (e.g., queue it for later delivery to the
user). user).
Steps R2, R4, and R5 are the normal TCP processing steps specified by Steps R2, R4, and R5 are the normal TCP processing steps specified by
[RFC0793]. [RFC0793].
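The ordering of steps R1 through R5 can be sketched as below. This is a simplified illustration, not the draft's normative processing. The `Conn` and `Segment` types, the `paws_receive` function, and the string verdicts are hypothetical. The R2 window test looks only at the first octet of the segment, which is an assumption made for brevity; the full acceptability test is in [RFC0793]. The sketch also skips R3 for <RST> segments, following the rule that their Timestamps option must not update connection state.

```python
from dataclasses import dataclass
from typing import Optional

MOD32 = 1 << 32
HALF = 1 << 31


def mod_lt(a: int, b: int) -> bool:
    """a < b for 32-bit values compared modulo 2^32."""
    return 0 < (b - a) % MOD32 < HALF


@dataclass
class Conn:
    rcv_nxt: int          # RCV.NXT, left window edge
    rcv_wnd: int          # RCV.WND, receive window size
    last_ack_sent: int    # Last.ACK.sent
    ts_recent: int        # TS.Recent
    ts_recent_valid: bool = True


@dataclass
class Segment:
    seq: int                     # SEG.SEQ
    tsval: Optional[int] = None  # SEG.TSval, None if no Timestamps option
    rst: bool = False


def paws_receive(conn: Conn, seg: Segment) -> str:
    # R1: old timestamp on a non-RST segment: ACK it and drop it.
    if (seg.tsval is not None and conn.ts_recent_valid and not seg.rst
            and mod_lt(seg.tsval, conn.ts_recent)):
        return "ack-and-drop"
    # R2: outside the window (simplified: first octet only).
    if (seg.seq - conn.rcv_nxt) % MOD32 >= conn.rcv_wnd:
        return "reject"
    # R3: SEG.SEQ <= Last.ACK.sent: record the timestamp in TS.Recent.
    if (seg.tsval is not None and not seg.rst
            and (seg.seq == conn.last_ack_sent
                 or mod_lt(seg.seq, conn.last_ack_sent))):
        conn.ts_recent = seg.tsval
    # R4: in-sequence segment at the left window edge.
    if seg.seq == conn.rcv_nxt:
        return "accept"
    # R5: in-window but out-of-sequence: queue for later delivery.
    return "queue"
```

Note that the timestamp test runs first, so a segment with an old timestamp is rejected even if its sequence number falls inside the window.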
It is important to note that the timestamp MUST be checked only when It is important to note that the timestamp MUST be checked only when
a segment first arrives at the receiver, regardless of whether it is a segment first arrives at the receiver, regardless of whether it is
in-sequence or it must be queued for later delivery. in-sequence or it must be queued for later delivery.
Consider the following example. Consider the following example.
Suppose the segment sequence: A.1, B.1, C.1, ..., Z.1 has been Suppose the segment sequence: A.1, B.1, C.1, ..., Z.1 has been
sent, where the letter indicates the sequence number and the digit sent, where the letter indicates the sequence number and the digit
represents the timestamp. Suppose also that segment B.1 has been represents the timestamp. Suppose also that segment B.1 has been
lost. The timestamp in TS.Recent is 1 (from A.1), so C.1, ..., lost. The timestamp in TS.Recent is 1 (from A.1), so C.1, ...,
Z.1 are considered acceptable and are queued. When B is Z.1 are considered acceptable and are queued. When B is
retransmitted as segment B.2 (using the latest timestamp), it retransmitted as segment B.2 (using the latest timestamp), it
fills the hole and causes all the segments through Z to be fills the hole and causes all the segments through Z to be
skipping to change at page 27, line 11 skipping to change at page 27, line 44
[RFC2675] to be used when the local network supports packets larger [RFC2675] to be used when the local network supports packets larger
than 64 KiB. When larger TCP segments are used, the TCP checksum than 64 KiB. When larger TCP segments are used, the TCP checksum
becomes weaker. becomes weaker.
Mechanisms to protect the TCP header from modification should also Mechanisms to protect the TCP header from modification should also
protect the TCP options. protect the TCP options.
Middleboxes and TCP options: Middleboxes and TCP options:
Some middleboxes have been known to remove the TCP options Some middleboxes have been known to remove the TCP options
described in this document from the <SYN> segment. Middleboxes described in this document from TCP segments [Honda11].
that remove TCP options described in this document from the <SYN> Middleboxes that remove TCP options described in this document
segment interfere with the selection of parameters appropriate for from the <SYN> segment interfere with the selection of parameters
the session. Removing any of these options in a <SYN,ACK> segment appropriate for the session. Removing any of these options in a
will leave the end hosts in a state that destroys the proper <SYN,ACK> segment will leave the end hosts in a state that
operation of the protocol. destroys the proper operation of the protocol.
* If a Window Scale option is removed from a <SYN,ACK> segment, * If a Window Scale option is removed from a <SYN,ACK> segment,
the end hosts will not negotiate the window scaling factor the end hosts will not negotiate the window scaling factor
correctly. Middleboxes must not remove or modify the Window correctly. Middleboxes must not remove or modify the Window
Scale option from <SYN,ACK> segments. Scale option from <SYN,ACK> segments.
* If a stateful firewall uses the window field to detect whether * If a stateful firewall uses the window field to detect whether
a received segment is inside the current window, and does not a received segment is inside the current window, and does not
support the Window Scale option, it will not be able to support the Window Scale option, it will not be able to
correctly determine whether or not a packet is in the window. correctly determine whether or not a packet is in the window.
These middleboxes must also support the Window Scale option These middleboxes must also support the Window Scale option
and apply the scale factor when processing segments. If the and apply the scale factor when processing segments. If the
window scale factor cannot be determined, they must not do window scale factor cannot be determined, they must not do
window-based processing. window-based processing.
* If the Timestamps option is removed from the <SYN> or <SYN,ACK> * If the Timestamps option is removed from the <SYN> or <SYN,ACK>
segment, high speed connections that need PAWS would not have segment, high speed connections that need PAWS would not have
that protection. Middleboxes should not remove the Timestamps that protection. Successful negotiation of Timestamps option
option. enforces a stricter verification of incoming segments at the
receiver. If the Timestamps option was removed from a
subsequent data segment after a successful negotiation (e.g. as
part of re-segmentation), the segment is discarded by the
receiver without further processing. Middleboxes should not
remove the Timestamps option.
* It must be noted that [RFC1323] doesn't address the case of the
Timestamps option being dropped or selectively omitted after
being negotiated, and that the update in this document may
cause some broken middlebox behavior to be detected
(potentially unresponsive TCP sessions).
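
The stateful-firewall bullet above can be illustrated with a small sketch of a window check that applies the scale factor. The function names and the use of `None` for "scale factor unknown" are assumptions made for this example. The 14-bit limit on the shift count comes from the Window Scale option definition, where a larger received value is treated as 14.

```python
MAX_SHIFT = 14   # largest shift count the Window Scale option allows


def effective_window(raw_window: int, shift: int) -> int:
    """Scale the 16-bit window field by the negotiated shift count."""
    return raw_window << min(shift, MAX_SHIFT)


def segment_in_window(seg_seq: int, rcv_nxt: int, raw_window: int, shift):
    """Return True/False, or None when no window-based decision is allowed.

    shift is the scale factor the receiver announced in its <SYN> or
    <SYN,ACK>; pass None if the middlebox did not observe the handshake.
    """
    if shift is None:
        # Scale factor cannot be determined: skip window-based processing.
        return None
    offset = (seg_seq - rcv_nxt) % (1 << 32)
    return offset < effective_window(raw_window, shift)
```

A segment 100000 bytes past the left edge falls inside a window field of 0xFFFF scaled by 7, but a firewall that ignores the scale factor would wrongly judge it to be outside the window.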
Implementations that depend on PAWS could provide a mechanism for the Implementations that depend on PAWS could provide a mechanism for the
application to determine whether or not PAWS is in use on the application to determine whether or not PAWS is in use on the
connection, and choose to terminate the connection if that protection connection, and choose to terminate the connection if that protection
doesn't exist. This is not just to protect the connection against doesn't exist. This is not just to protect the connection against
middleboxes that might remove the Timestamps option, but also against middleboxes that might remove the Timestamps option, but also against
remote hosts that do not have Timestamp support. remote hosts that do not have Timestamp support.
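
As one possible illustration of such a mechanism, the sketch below checks on Linux whether the Timestamps option was negotiated, using the `TCP_INFO` socket option. It is Linux-specific and is not part of the draft. It assumes that `tcpi_options` is the sixth byte of `struct tcp_info` and that `TCPI_OPT_TIMESTAMPS` is 1, as in `<linux/tcp.h>`. The function names are hypothetical. A negotiated Timestamps option is used here as a proxy for PAWS being available.

```python
import socket

TCPI_OPT_TIMESTAMPS = 0x01   # flag value from <linux/tcp.h> (assumed)
TCPI_OPTIONS_OFFSET = 5      # byte index of tcpi_options in struct tcp_info


def timestamps_in_use(tcp_info: bytes) -> bool:
    """True if the Timestamps flag is set in a raw struct tcp_info."""
    return bool(tcp_info[TCPI_OPTIONS_OFFSET] & TCPI_OPT_TIMESTAMPS)


def require_timestamps(sock: socket.socket) -> None:
    """Close a connected TCP socket if Timestamps were not negotiated."""
    # Only the first 8 bytes of struct tcp_info are needed here.
    info = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 8)
    if not timestamps_in_use(info):
        sock.close()
        raise ConnectionError("Timestamps option not negotiated")
```

An application would call `require_timestamps(sock)` once the connection is established, and treat the exception as a signal that neither PAWS nor RTTM is available on that connection.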
7. IANA Considerations 7. IANA Considerations
skipping to change at page 28, line 48 skipping to change at page 29, line 48
[Garlick77] [Garlick77]
Garlick, L., Rom, R., and J. Postel, "Issues in Reliable Garlick, L., Rom, R., and J. Postel, "Issues in Reliable
Host-to-Host Protocols", Proc. Second Berkeley Workshop on Host-to-Host Protocols", Proc. Second Berkeley Workshop on
Distributed Data Management and Computer Networks, Distributed Data Management and Computer Networks,
May 1977, <http://www.rfc-editor.org/ien/ien12.txt>. May 1977, <http://www.rfc-editor.org/ien/ien12.txt>.
[Hamming77] [Hamming77]
Hamming, R., "Digital Filters", Prentice Hall, Englewood Hamming, R., "Digital Filters", Prentice Hall, Englewood
Cliffs, N.J. ISBN 0-13-212571-4, 1977. Cliffs, N.J. ISBN 0-13-212571-4, 1977.
[Honda11] Honda, M., Nishida, Y., Raiciu, C., Greenhalgh, A.,
Handley, M., and H. Tokuda, "Is it still possible to
extend TCP?", Proc. of ACM Internet Measurement
Conference (IMC) '11, November 2011.
[Jacobson88a] [Jacobson88a]
Jacobson, V., "Congestion Avoidance and Control", SIGCOMM Jacobson, V., "Congestion Avoidance and Control", SIGCOMM
'88, Stanford, CA., August 1988, '88, Stanford, CA., August 1988,
<http://ee.lbl.gov/papers/congavoid.pdf>. <http://ee.lbl.gov/papers/congavoid.pdf>.
[Jacobson90a] [Jacobson90a]
Jacobson, V., "4BSD Header Prediction", ACM Computer Jacobson, V., "4BSD Header Prediction", ACM Computer
Communication Review, April 1990. Communication Review, April 1990.
[Jacobson90c] [Jacobson90c]
skipping to change at page 46, line 5 skipping to change at page 47, line 5
Email: david.borman@quantum.com Email: david.borman@quantum.com
Bob Braden Bob Braden
University of Southern California University of Southern California
4676 Admiralty Way 4676 Admiralty Way
Marina del Rey CA 90292 Marina del Rey CA 90292
USA USA
Email: braden@isi.edu Email: braden@isi.edu
Van Jacobson Van Jacobson
Packet Design Google, Inc.
2465 Latham Street 1600 Amphitheatre Parkway
Mountain View CA 94040 Mountain View CA 94043
USA USA
Email: van@packetdesign.com Email: vanj@google.com
Richard Scheffenegger (editor) Richard Scheffenegger (editor)
NetApp, Inc. NetApp, Inc.
Am Euro Platz 2 Am Euro Platz 2
Vienna, 1120 Vienna, 1120
Austria Austria
Email: rs@netapp.com Email: rs@netapp.com
 End of changes. 28 change blocks. 
64 lines changed or deleted 101 lines changed or added
