draft-ietf-tcpm-2140bis-00.txt   draft-ietf-tcpm-2140bis-01.txt 
TCPM WG J. Touch TCPM WG J. Touch
Internet Draft Independent Internet Draft Independent
Intended status: Informational M. Welzl Intended status: Informational M. Welzl
Obsoletes: 2140 S. Islam Obsoletes: 2140 S. Islam
Expires: October 2019 University of Oslo Expires: May 2020 University of Oslo
April 15, 2019 November 19, 2019
TCP Control Block Interdependence TCP Control Block Interdependence
draft-ietf-tcpm-2140bis-00.txt draft-ietf-tcpm-2140bis-01.txt
Status of this Memo Status of this Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
This document may contain material from IETF Documents or IETF This document may contain material from IETF Documents or IETF
Contributions published or made publicly available before November Contributions published or made publicly available before November
10, 2008. The person(s) controlling the copyright in some of this 10, 2008. The person(s) controlling the copyright in some of this
material may not have granted the IETF Trust the right to allow material may not have granted the IETF Trust the right to allow
skipping to change at page 1, line 45 skipping to change at page 1, line 45
months and may be updated, replaced, or obsoleted by other documents months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress." reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html http://www.ietf.org/shadow.html
This Internet-Draft will expire on October 15, 2019. This Internet-Draft will expire on May 19, 2020.
Copyright Notice Copyright Notice
Copyright (c) 2019 IETF Trust and the persons identified as the Copyright (c) 2019 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 42 skipping to change at page 2, line 42
across connections to the same host. Such sharing is intended to across connections to the same host. Such sharing is intended to
improve overall transient transport performance, while maintaining improve overall transient transport performance, while maintaining
backward-compatibility with existing implementations. The sharing backward-compatibility with existing implementations. The sharing
described herein is limited to only the TCB initialization and so described herein is limited to only the TCB initialization and so
has no effect on the long-term behavior of TCP after a connection has no effect on the long-term behavior of TCP after a connection
has been established. has been established.
Table of Contents Table of Contents
1. Introduction...................................................3 1. Introduction...................................................3
2. Conventions used in this document..............................3 2. Conventions used in this document..............................4
3. Terminology....................................................4 3. Terminology....................................................4
4. The TCP Control Block (TCB)....................................4 4. The TCP Control Block (TCB)....................................4
5. TCB Interdependence............................................5 5. TCB Interdependence............................................5
6. An Example of Temporal Sharing.................................5 6. An Example of Temporal Sharing.................................6
7. An Example of Ensemble Sharing.................................9 7. An Example of Ensemble Sharing.................................9
8. Compatibility Issues..........................................11 8. Compatibility Issues..........................................11
9. Implications..................................................13 9. Implications..................................................13
10. Implementation Observations..................................14 10. Implementation Observations..................................14
11. Updates to RFC 2140..........................................15 11. Updates to RFC 2140..........................................15
12. Security Considerations......................................16 12. Security Considerations......................................16
13. IANA Considerations..........................................16 13. IANA Considerations..........................................16
14. References...................................................16 14. References...................................................16
14.1. Normative References....................................16 14.1. Normative References....................................16
14.2. Informative References..................................17 14.2. Informative References..................................17
15. Acknowledgments..............................................19 15. Acknowledgments..............................................19
16. Change log...................................................19 16. Change log...................................................20
17. Appendix A: TCB sharing history..............................21 Appendix A : TCB sharing history.................................22
18. Appendix B: Options..........................................22 Appendix B : TCP Option Sharing and Caching......................22
Appendix C : Automating the Initial Window in TCP over Long
Timescales.......................................................25
C.1. Introduction.............................................25
C.2. Design Considerations....................................25
C.3. Proposed IW Algorithm....................................26
C.4. Discussion...............................................29
C.5. Observations.............................................30
1. Introduction 1. Introduction
TCP is a connection-oriented reliable transport protocol layered TCP is a connection-oriented reliable transport protocol layered
over IP [RFC793]. Each TCP connection maintains state, usually in a over IP [RFC793]. Each TCP connection maintains state, usually in a
data structure called the TCP Control Block (TCB). The TCB contains data structure called the TCP Control Block (TCB). The TCB contains
information about the connection state, its associated local information about the connection state, its associated local
process, and feedback parameters about the connection's transmission process, and feedback parameters about the connection's transmission
properties. As originally specified and usually implemented, most properties. As originally specified and usually implemented, most
TCB information is maintained on a per-connection basis. Some TCB information is maintained on a per-connection basis. Some
skipping to change at page 7, line 14 skipping to change at page 7, line 22
(SYN-ACK) from the server at all, i.e., connection timeout." [RFC (SYN-ACK) from the server at all, i.e., connection timeout." [RFC
7413]. TFOinfo is cached when a connection is established. 7413]. TFOinfo is cached when a connection is established.
Other TCP option state might not be as readily cached. E.g., TCP-AO Other TCP option state might not be as readily cached. E.g., TCP-AO
[RFC5925] success or failure between a host pair for a single SYN [RFC5925] success or failure between a host pair for a single SYN
destination port might be usefully cached. TCP-AO success or failure destination port might be usefully cached. TCP-AO success or failure
to other SYN destination ports on that host pair is never useful to to other SYN destination ports on that host pair is never useful to
cache because TCP-AO security parameters can vary per service. cache because TCP-AO security parameters can vary per service.
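
As a rough illustration of the per-port caching described above, the following is a minimal sketch and not part of the draft. It assumes the cache key is the host pair plus the SYN destination port; the class and method names (TcpAoOutcomeCache, record, lookup) are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical cache key: (local address, remote address, SYN destination port).
AoKey = Tuple[str, str, int]


class TcpAoOutcomeCache:
    """Remembers whether TCP-AO succeeded, per host pair and destination port."""

    def __init__(self) -> None:
        self._outcomes: Dict[AoKey, bool] = {}

    def record(self, local: str, remote: str, dst_port: int, succeeded: bool) -> None:
        # Store the outcome only under the exact port that was contacted.
        self._outcomes[(local, remote, dst_port)] = succeeded

    def lookup(self, local: str, remote: str, dst_port: int) -> Optional[bool]:
        # Returns None when nothing is known for this port; outcomes seen on
        # other ports of the same host pair are deliberately not consulted.
        return self._outcomes.get((local, remote, dst_port))


cache = TcpAoOutcomeCache()
cache.record("192.0.2.1", "198.51.100.7", 179, succeeded=True)
```

Keying on the destination port keeps an outcome learned for one service from being applied to another service on the same host pair.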
The table below gives an overview of option-specific information The table below gives an overview of option-specific information
that can be shared. that can be shared. Additional information on TCP options and
sharing is provided in Appendix B.
TEMPORAL SHARING - Option info TEMPORAL SHARING - Option info
Cached New Cached New
---------------------------------------- ----------------------------------------
old_TFO_Cookie old_TFO_Cookie old_TFO_Cookie old_TFO_Cookie
old_TFO_Failure old_TFO_Failure old_TFO_Failure old_TFO_Failure
TEMPORAL SHARING - Cache Updates TEMPORAL SHARING - Cache Updates
skipping to change at page 11, line 39 skipping to change at page 11, line 39
There are several ways to initialize the congestion window in a new There are several ways to initialize the congestion window in a new
TCB among an ensemble of current connections to a host. Current TCP TCB among an ensemble of current connections to a host. Current TCP
implementations initialize it to four segments as standard [RFC3390] implementations initialize it to four segments as standard [RFC3390]
and 10 segments experimentally [RFC6928]. These approaches assume and 10 segments experimentally [RFC6928]. These approaches assume
that new connections should behave as conservatively as possible. that new connections should behave as conservatively as possible.
The algorithm described in [Ba12] adjusts the initial cwnd depending The algorithm described in [Ba12] adjusts the initial cwnd depending
on the cwnd values of ongoing connections. There have also been on the cwnd values of ongoing connections. There have also been
suggestions to use the kind of sharing mechanisms described in this suggestions to use the kind of sharing mechanisms described in this
document over long timescales to adapt TCP's initial window document over long timescales to adapt TCP's initial window
automatically [To13]. automatically, as described further in Appendix C [To12].
8. Compatibility Issues 8. Compatibility Issues
For the congestion and current window information, the initial For the congestion and current window information, the initial
values computed by TCB interdependence may not be consistent with values computed by TCB interdependence may not be consistent with
the long-term aggregate behavior of a set of concurrent connections the long-term aggregate behavior of a set of concurrent connections
between the same endpoints. Under conventional TCP congestion between the same endpoints. Under conventional TCP congestion
control, if a single existing connection has converged to a control, if a single existing connection has converged to a
congestion window of 40 segments, two newly joining concurrent congestion window of 40 segments, two newly joining concurrent
connections assume initial windows of 10 segments [RFC6928], and the connections assume initial windows of 10 segments [RFC6928], and the
skipping to change at page 12, line 34 skipping to change at page 12, line 34
shared only within connections to the same SYN destination port. In shared only within connections to the same SYN destination port. In
case of Temporal Sharing, TCB information could also become invalid case of Temporal Sharing, TCB information could also become invalid
over time. Because this is similar to the case when a connection over time. Because this is similar to the case when a connection
becomes idle, mechanisms that address idle TCP connections (e.g., becomes idle, mechanisms that address idle TCP connections (e.g.,
[RFC7661]) could also be applied to TCB cache management, especially [RFC7661]) could also be applied to TCB cache management, especially
when TCP Fast Open is used [RFC7413]. when TCP Fast Open is used [RFC7413].
There may be additional considerations to the way in which TCB There may be additional considerations to the way in which TCB
interdependence rebalances congestion feedback among the current interdependence rebalances congestion feedback among the current
connections, e.g., it may be appropriate to consider the impact of a connections, e.g., it may be appropriate to consider the impact of a
connection being in Fast Recovery [RFC5861] or some other similar connection being in Fast Recovery [RFC5681] or some other similar
unusual feedback state, e.g., as inhibiting or affecting the unusual feedback state, e.g., as inhibiting or affecting the
calculations described herein. calculations described herein.
TCP is sometimes used in situations where packets of the same host- TCP is sometimes used in situations where packets of the same host-
pair do not always take the same path. Multipath routing that relies pair do not always take the same path. Multipath routing that relies
on examining transport headers, such as ECMP and LAG, may not result on examining transport headers, such as ECMP and LAG, may not result
in repeatable path selection when TCP segments are encapsulated, in repeatable path selection when TCP segments are encapsulated,
encrypted, or altered - for example, in some Virtual Private Network encrypted, or altered - for example, in some Virtual Private Network
(VPN) tunnels that rely on proprietary encapsulation. Similarly, (VPN) tunnels that rely on proprietary encapsulation. Similarly,
such approaches cannot operate deterministically when the TCP header such approaches cannot operate deterministically when the TCP header
skipping to change at page 16, line 4 skipping to change at page 16, line 4
multipath TCP, fast open, PLPMTUD, NAT, and the TCP Authentication multipath TCP, fast open, PLPMTUD, NAT, and the TCP Authentication
Option. Option.
The detailed impact on TCB state addresses TCB parameters in greater The detailed impact on TCB state addresses TCB parameters in greater
detail, addressing RSS in both the send and receive direction, MSS detail, addressing RSS in both the send and receive direction, MSS
and send-MSS separately, adds path MTU and ssthresh, and addresses and send-MSS separately, adds path MTU and ssthresh, and addresses
the impact on TCP option state. the impact on TCP option state.
New sections have been added to address compatibility issues and New sections have been added to address compatibility issues and
implementation observations. The relation of this work to T/TCP has implementation observations. The relation of this work to T/TCP has
been moved to an appendix discussion on history, partly to reflect been moved to Appendix A on history, partly to reflect the
the deprecation of that protocol. deprecation of that protocol.
Appendix C has been added to discuss the potential to use temporal
sharing over long timescales to adapt TCP's initial window
automatically, largely imported from [To12].
Finally, this document updates and significantly expands the Finally, this document updates and significantly expands the
referenced literature. referenced literature.
12. Security Considerations 12. Security Considerations
These presented implementation methods do not have additional These presented implementation methods do not have additional
ramifications for explicit attacks. They may be susceptible to ramifications for explicit attacks. They may be susceptible to
denial-of-service attacks if not otherwise secured. For example, an denial-of-service attacks if not otherwise secured. For example, an
application can open a connection and set its window size to zero, application can open a connection and set its window size to zero,
skipping to change at page 16, line 45 skipping to change at page 16, line 49
13. IANA Considerations 13. IANA Considerations
There are no IANA implications or requests in this document. There are no IANA implications or requests in this document.
This section should be removed upon final publication as an RFC. This section should be removed upon final publication as an RFC.
14. References 14. References
14.1. Normative References 14.1. Normative References
This document has no normative references. [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", RFC 8174, May 2017.
14.2. Informative References 14.2. Informative References
[Br02] Brownlee, N. and K. Claffy, "Understanding Internet [Al10] Allman, M., "Initial Congestion Window Specification",
Traffic Streams: Dragonflies and Tortoises", IEEE (work in progress), draft-allman-tcpm-bump-initcwnd-00,
Communications Magazine p110-117, 2002. Nov. 2010.
[Ba12] Barik, R., Welzl, M., Ferlin, S., Alay, O., " LISA: A
Linked Slow-Start Algorithm for MPTCP", IEEE ICC, Kuala
Lumpur, Malaysia, May 23-27 2016.
[Be94] Berners-Lee, T., et al., "The World-Wide Web," [Be94] Berners-Lee, T., et al., "The World-Wide Web,"
Communications of the ACM, V37, Aug. 1994, pp. 76-82. Communications of the ACM, V37, Aug. 1994, pp. 76-82.
[Br94] Braden, B., "T/TCP -- Transaction TCP: Source Changes for [Br94] Braden, B., "T/TCP -- Transaction TCP: Source Changes for
Sun OS 4.1.3,", Release 1.0, USC/ISI, September 14, 1994. Sun OS 4.1.3,", Release 1.0, USC/ISI, September 14, 1994.
[Br02] Brownlee, N. and K. Claffy, "Understanding Internet
Traffic Streams: Dragonflies and Tortoises", IEEE
Communications Magazine p110-117, 2002.
[Co91] Comer, D., Stevens, D., Internetworking with TCP/IP, V2, [Co91] Comer, D., Stevens, D., Internetworking with TCP/IP, V2,
Prentice-Hall, NJ, 1991. Prentice-Hall, NJ, 1991.
[FreeBSD] FreeBSD source code, Release 2.10, http://www.freebsd.org/
[Du16] Dukkipati, N., Yuchung C., and Amin V., "Research [Du16] Dukkipati, N., Yuchung C., and Amin V., "Research
Impacting the Practice of Congestion Control." ACM SIGCOMM Impacting the Practice of Congestion Control." ACM SIGCOMM
CCR (editorial), on-line post, July 2016. CCR (editorial), on-line post, July 2016.
[FreeBSD] FreeBSD source code, Release 2.10, http://www.freebsd.org/
[Hu01] Hughes, A., Touch, J., Heidemann, J., "Issues in Slow- [Hu01] Hughes, A., Touch, J., Heidemann, J., "Issues in Slow-
Start Restart After Idle", draft-hughes-restart-00 Start Restart After Idle", draft-hughes-restart-00
(expired), Dec. 2001. (expired), Dec. 2001.
[Hu12] Hurtig, P., Brunstrom, A., "Enhanced metric caching for [Hu12] Hurtig, P., Brunstrom, A., "Enhanced metric caching for
short TCP flows," 2012 IEEE International Conference on short TCP flows," 2012 IEEE International Conference on
Communications (ICC), Ottawa, ON, 2012, pp. 1209-1213. Communications (ICC), Ottawa, ON, 2012, pp. 1209-1213.
[Ba12] Barik, R., Welzl, M., Ferlin, S., Alay, O., " LISA: A [Ja88] Jacobson, V., M. Karels, "Congestion Avoidance and
Linked Slow-Start Algorithm for MPTCP", IEEE ICC, Kuala Control", Proc. Sigcomm 1988.
Lumpur, Malaysia, May 23-27 2016.
[RFC793] Postel, Jon, "Transmission Control Protocol," Network [RFC793] Postel, Jon, "Transmission Control Protocol," Network
Working Group RFC-793/STD-7, ISI, Sept. 1981. Working Group RFC-793/STD-7, ISI, Sept. 1981.
[RFC1122] Braden, R. (ed), "Requirements for Internet Hosts -- [RFC1122] Braden, R. (ed), "Requirements for Internet Hosts --
Communication Layers", RFC-1122, Oct. 1989. Communication Layers", RFC-1122, Oct. 1989.
[RFC1191] Mogul, J., Deering, S., "Path MTU Discovery," RFC 1191, [RFC1191] Mogul, J., Deering, S., "Path MTU Discovery," RFC 1191,
Nov. 1990. Nov. 1990.
[RFC1644] Braden, R., "T/TCP -- TCP Extensions for Transactions [RFC1644] Braden, R., "T/TCP -- TCP Extensions for Transactions
Functional Specification," RFC-1644, July 1994. Functional Specification," RFC-1644, July 1994.
[RFC1379] Braden, R., "Transaction TCP -- Concepts," RFC-1379, [RFC1379] Braden, R., "Transaction TCP -- Concepts," RFC-1379,
September 1992. September 1992.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2001] Stevens, W., "TCP Slow Start, Congestion Avoidance, Fast
Requirement Levels", BCP 14, RFC 2119, March 1997. Retransmit, and Fast Recovery Algorithms", RFC2001
(Standards Track), Jan. 1997.
[RFC2140] Touch, J., "TCP Control Block Interdependence", RFC 2140, [RFC2140] Touch, J., "TCP Control Block Interdependence", RFC 2140,
April 1997. April 1997.
[RFC2414] Allman, M., Floyd, S., Partridge, C., "Increasing TCP's
Initial Window", RFC 2414 (Experimental), Sept. 1998.
[RFC2581] Allman, M., Paxson, V., Stevens, W., "TCP Congestion
Control," RFC2581 (Standards Track), Apr. 1999.
[RFC2663] Srisuresh, P., Holdrege, M., "IP Network Address [RFC2663] Srisuresh, P., Holdrege, M., "IP Network Address
Translator (NAT) Terminology and Considerations", RFC- Translator (NAT) Terminology and Considerations", RFC-
2663, August 1999. 2663, August 1999.
[RFC2861] Handley, M., Padhye, J., Floyd, S., "TCP Congestion Window
Validation", RFC2861 (Experimental), June 2000.
[RFC3390] Allman, M., Floyd, S., Partridge, C., "Increasing TCP's [RFC3390] Allman, M., Floyd, S., Partridge, C., "Increasing TCP's
Initial Window," RFC 3390, Oct. 2002. Initial Window," RFC 3390, Oct. 2002.
[RFC7231] Fielding, R., J. Reschke, Eds., "HTTP/1.1 Semantics and
Content," RFC-7231, June 2014.
[RFC3124] Balakrishnan, H., Seshan, S., "The Congestion Manager," [RFC3124] Balakrishnan, H., Seshan, S., "The Congestion Manager,"
RFC 3124, June 2001. RFC 3124, June 2001.
[RFC4340] Kohler, E., Handley, M., Floyd, S., "Datagram Congestion [RFC4340] Kohler, E., Handley, M., Floyd, S., "Datagram Congestion
Control Protocol (DCCP)," RFC 4340, Mar. 2006. Control Protocol (DCCP)," RFC 4340, Mar. 2006.
[RFC4821] Mathis, M., Heffner, J., "Packetization Layer Path MTU [RFC4821] Mathis, M., Heffner, J., "Packetization Layer Path MTU
Discovery," RFC 4821, Mar. 2007. Discovery," RFC 4821, Mar. 2007.
[RFC4960] Stewart, R., (Ed.), "Stream Control Transmission [RFC4960] Stewart, R., (Ed.), "Stream Control Transmission
Protocol," RFC4960, Sept. 2007. Protocol," RFC4960, Sept. 2007.
[RFC5861] Allman, M., Paxson, V., Blanton, E., "TCP Congestion [RFC5681] Allman, M., Paxson, V., Blanton, E., "TCP Congestion
Control," RFC 5861, Sept. 2009. Control," RFC 5681 (Standards Track), Sep. 2009.
[RFC5925] Touch, J., Mankin, A., Bonica, R., "The TCP Authentication [RFC5925] Touch, J., Mankin, A., Bonica, R., "The TCP Authentication
Option," RFC 5925, June 2010. Option," RFC 5925, June 2010.
[RFC6824] Ford, A., Raiciu, C., Handley, M., Bonaventure, O., "TCP [RFC6824] Ford, A., Raiciu, C., Handley, M., Bonaventure, O., "TCP
Extensions for Multipath Operation with Multiple Extensions for Multipath Operation with Multiple
Addresses," RFC 6824, Jan. 2013. Addresses," RFC 6824, Jan. 2013.
[RFC6928] Chu, J., Dukkipati, N., Cheng, Y., Mathis, M., "Increasing [RFC6928] Chu, J., Dukkipati, N., Cheng, Y., Mathis, M., "Increasing
TCP's Initial Window," RFC 6928, Apr. 2013. TCP's Initial Window," RFC 6928, Apr. 2013.
[RFC7231] Fielding, R., J. Reschke, Eds., "HTTP/1.1 Semantics and
Content," RFC-7231, June 2014.
[RFC7323] Borman, D., B. Braden, V. Jacobson, R. Scheffenegger
(Ed.), "TCP Extensions for High Performance," RFC 7323,
Sept. 2014.
[RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., Jain, A., "TCP Fast [RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., Jain, A., "TCP Fast
Open", RFC 7413, Dec. 2014. Open", RFC 7413, Dec. 2014.
[RFC7424] Krishnan, R., Yong, L., Ghanwani, A., So, N., Khasnabish, [RFC7424] Krishnan, R., Yong, L., Ghanwani, A., So, N., Khasnabish,
B., "Mechanisms for Optimizing Link Aggregation Group B., "Mechanisms for Optimizing Link Aggregation Group
(LAG) and Equal-Cost Multipath (ECMP) Component Link (LAG) and Equal-Cost Multipath (ECMP) Component Link
Utilization in Networks", RFC 7424, Jan. 2015 Utilization in Networks", RFC 7424, Jan. 2015
[RFC7540] Belshe, M., Peon, R., Thomson, M., "Hypertext Transfer [RFC7540] Belshe, M., Peon, R., Thomson, M., "Hypertext Transfer
Protocol Version 2 (HTTP/2)", RFC 7540, May 2015. Protocol Version 2 (HTTP/2)", RFC 7540, May 2015.
[RFC7661] Fairhurst, G., Sathiaseelan, A., Secchi, R., "Updating TCP [RFC7661] Fairhurst, G., Sathiaseelan, A., Secchi, R., "Updating TCP
to Support Rate-Limited Traffic", RFC 7661, Oct. 2015. to Support Rate-Limited Traffic", RFC 7661, Oct. 2015.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", RFC 8174, May 2017.
[RFC8201] McCann, J., Deering. S., Mogul, J., Hinden, R. (Ed.), [RFC8201] McCann, J., Deering. S., Mogul, J., Hinden, R. (Ed.),
"Path MTU Discovery for IP version 6," RFC 8201, Jul. "Path MTU Discovery for IP version 6," RFC 8201, Jul.
2017. 2017.
[To13] Touch, J., "Automating the Initial Window in TCP," draft- [To12] Touch, J., "Automating the Initial Window in TCP," draft-
touch-tcpm-automatic-iw-03 (expired), Jan. 2013. touch-tcpm-automatic-iw-03 (expired), July 2012.
15. Acknowledgments 15. Acknowledgments
The authors would like to thank Praveen Balasubramanian for The authors would like to thank Praveen Balasubramanian for
information regarding TCB sharing in Windows, and Yuchung Cheng, information regarding TCB sharing in Windows, and Yuchung Cheng,
Lars Eggert, Ilpo Jarvinen and Michael Scharf for comments on Lars Eggert, Ilpo Jarvinen and Michael Scharf for comments on
earlier versions of the draft. Earlier revisions of this work earlier versions of the draft. Earlier revisions of this work
received funding from a collaborative research project between the received funding from a collaborative research project between the
University of Oslo and Huawei Technologies Co., Ltd. and were partly University of Oslo and Huawei Technologies Co., Ltd. and were partly
supported by USC/ISI's Postel Center. supported by USC/ISI's Postel Center.
This document was prepared using 2-Word-v2.0.template.dot. This document was prepared using 2-Word-v2.0.template.dot.
16. Change log 16. Change log
This section should be removed upon final publication as an RFC. This section should be removed upon final publication as an RFC.
ietf-01:
- Added Appendix C to address long-timescale temporal adaptation.
ietf-00: ietf-00:
- Re-issued as draft-ietf-tcpm-2140bis due to WG adoption. - Re-issued as draft-ietf-tcpm-2140bis due to WG adoption.
- Cleaned orphan references to T/TCP, removed incomplete refs - Cleaned orphan references to T/TCP, removed incomplete refs
- Moved references to informative section and updated Sec 2 - Moved references to informative section and updated Sec 2
- Updated to clarify no impact to interoperability - Updated to clarify no impact to interoperability
- Updated appendix B to avoid 2119 language - Updated appendix B to avoid 2119 language
06: 06:
skipping to change at page 20, line 37 skipping to change at page 21, line 14
- Marked entries that are considered safe to share with an - Marked entries that are considered safe to share with an
asterisk (suggestion was to split the table) asterisk (suggestion was to split the table)
- Discussed correct host identification: NATs may make IP - Discussed correct host identification: NATs may make IP
addresses the wrong input, could e.g. use HTTP cookie. addresses the wrong input, could e.g. use HTTP cookie.
- Included MMS_S and MMS_R from RFC1122; fixed the use of MSS and - Included MMS_S and MMS_R from RFC1122; fixed the use of MSS and
MTU MTU
- Added information about option sharing, listed options in the - Added information about option sharing, listed options in
appendix Appendix B
Authors' Addresses Authors' Addresses
Joe Touch Joe Touch
Manhattan Beach, CA 90266 Manhattan Beach, CA 90266
USA USA
Phone: +1 (310) 560-0334 Phone: +1 (310) 560-0334
Email: touch@strayalpha.com Email: touch@strayalpha.com
Michael Welzl Michael Welzl
University of Oslo University of Oslo
PO Box 1080 Blindern PO Box 1080 Blindern
Oslo N-0316 Oslo N-0316
Norway Norway
Phone: +47 22 85 24 20 Phone: +47 22 85 24 20
Email: michawe@ifi.uio.no Email: michawe@ifi.uio.no
Safiqul Islam Safiqul Islam
skipping to change at page 21, line 22 skipping to change at page 22, line 5
Safiqul Islam Safiqul Islam
University of Oslo University of Oslo
PO Box 1080 Blindern PO Box 1080 Blindern
Oslo N-0316 Oslo N-0316
Norway Norway
Phone: +47 22 84 08 37 Phone: +47 22 84 08 37
Email: safiquli@ifi.uio.no Email: safiquli@ifi.uio.no
17. Appendix A: TCB sharing history Appendix A: TCB sharing history
T/TCP proposed using caches to maintain TCB information across T/TCP proposed using caches to maintain TCB information across
instances (temporal sharing), e.g., smoothed RTT, RTT variance, instances (temporal sharing), e.g., smoothed RTT, RTT variance,
congestion avoidance threshold, and MSS [RFC1644]. These values were congestion avoidance threshold, and MSS [RFC1644]. These values were
in addition to connection counts used by T/TCP to accelerate data in addition to connection counts used by T/TCP to accelerate data
delivery prior to the full three-way handshake during an OPEN. The delivery prior to the full three-way handshake during an OPEN. The
goal was to aggregate TCB components where they reflect one goal was to aggregate TCB components where they reflect one
association - that of the host-pair, rather than artificially association - that of the host-pair, rather than artificially
separating those components by connection. separating those components by connection.
skipping to change at page 22, line 7 skipping to change at page 22, line 34
sessions. sessions.
Temporal sharing of cached TCB data was originally implemented in Temporal sharing of cached TCB data was originally implemented in
the SunOS 4.1.3 T/TCP extensions [Br94] and the FreeBSD port of same the SunOS 4.1.3 T/TCP extensions [Br94] and the FreeBSD port of same
[FreeBSD]. As mentioned before, only the MSS and RTT parameters were [FreeBSD]. As mentioned before, only the MSS and RTT parameters were
cached, as originally specified in [RFC1379]. Later discussion of cached, as originally specified in [RFC1379]. Later discussion of
T/TCP suggested including congestion control parameters in this T/TCP suggested including congestion control parameters in this
cache; for example, [RFC1644] (Section 3.1) hints at initializing cache; for example, [RFC1644] (Section 3.1) hints at initializing
the congestion window to the old window size. the congestion window to the old window size.
18. Appendix B: Options Appendix B: TCP Option Sharing and Caching
In addition to the options that can be cached and shared, this memo In addition to the options that can be cached and shared, this memo
also lists known options for which state is unsafe to be kept. This also lists known options for which state is unsafe to be kept. This
list is meant to avoid work duplication and should be removed upon list is meant to avoid work duplication and should be removed upon
publication. publication.
Obsolete (unsafe to keep state): Obsolete (unsafe to keep state):
ECHO ECHO
skipping to change at line 1002 skipping to change at page 25, line 4
Safe but optional to keep state: Safe but optional to keep state:
MSS MSS
TFO failure (so we don't try again, since it's optional) TFO failure (so we don't try again, since it's optional)
Safe and necessary to keep state: Safe and necessary to keep state:
TFO cookie (if TFO succeeded in the past)
Appendix C: Automating the Initial Window in TCP over Long Timescales
Note: this section is taken verbatim from [To12], updated to refer
to itself as an appendix.
C.1. Introduction
TCP's congestion control algorithm uses an initial window value
(IW), both as a starting point for new connections and after one RTO
or more [RFC2581][RFC2861]. This value has evolved over time,
originally one maximum segment size (MSS), and increased to the
lesser of four MSS or 4,380 bytes [RFC3390][RFC5681]. For typical
Internet connections with a maximum transmission unit (MTU) of
1500 bytes, this permits three segments of 1,460 bytes each.
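As a worked check of the arithmetic above, the following minimal
sketch evaluates the RFC 3390 upper bound, min(4*MSS, max(2*MSS,
4380 bytes)). The function name and the 40-byte header allowance
(IPv4 and TCP headers without options) are assumptions made for
illustration and do not come from [To12].

```python
def rfc3390_initial_window(mss: int) -> int:
    """Upper bound on the initial window, in bytes, per RFC 3390."""
    return min(4 * mss, max(2 * mss, 4380))


# Assumption: 1500-byte MTU, 20-byte IPv4 header, 20-byte TCP header.
mss = 1500 - 40
iw = rfc3390_initial_window(mss)
print(iw, iw // mss)  # 4380 bytes, which is 3 full segments
```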
The IW value was implied in the original TCP congestion
control description, and documented as a standard in 1997
[RFC2001][Ja88]. The value was last updated in 1998 experimentally,
and moved to the standards track in 2002 [RFC2414][RFC3390]. There
have been recent proposals to update the IW based on further
increases in host and router capabilities and network capacity, some
focusing on specific values (e.g., IW=10), and others prescribing a
schedule for increases over time (e.g., IW=6 for 2011, increasing by
1-2 MSS per year).
This appendix discusses how TCP can objectively measure when an IW
is too large, and proposes that such feedback be used over long
timescales to adjust the IW automatically. The result should be
safer to deploy and might avoid the need to repeatedly revisit IW
size over time.
Note that this mechanism attempts to make the IW more adaptive over
time. It can increase the IW beyond that which is currently
recommended for widescale deployment, and so its use should be
carefully monitored.
C.2. Design Considerations
TCP's IW value has existed statically for over two decades, so any
solution to adjusting the IW dynamically should have similarly
stable, non-invasive effects on the performance and complexity of
TCP. In order to be fair, the IW should be similar for most machines
on the public Internet. Finally, a desirable goal is to develop a
self-correcting algorithm, so that IW values that cause network
problems can be avoided. To that end, we propose the following list
of design goals:
o Have little to no impact on TCP in the absence of loss, i.e.,
it should not increase the complexity of default packet
processing in the normal case.
o Adapt to network feedback over long timescales, avoiding values
that persistently cause network problems.
o Decrease the IW in the presence of sustained loss of IW segments,
as determined over a number of different connections.
o Increase the IW in the absence of sustained loss of IW segments,
as determined over a number of different connections.
o Operate conservatively, i.e., tend towards leaving the IW the
same in the absence of sufficient information, and give greater
consideration to IW segment loss than IW segment success.
We expect that, without other context, a good IW algorithm will
converge to a single value, but this is not required. An endpoint
with additional context or information, or deployed in a constrained
environment, can always use a different value. Specifically,
information from previous connections, or sets of connections with a
similar path, can already be used as context for such decisions (as
noted in the core of this document).
However, if a given IW value persistently causes packet loss during
the initial burst of packets, it is clearly inappropriate and could
be inducing unnecessary loss in other competing connections. This
might happen for sites behind very slow boxes with small buffers,
which may or may not be the first hop.
C.3. Proposed IW Algorithm
Below is a simple description of the proposed IW algorithm. It
relies on the following parameters:
o MinIW = 3 MSS or 4,380 bytes (as per [RFC3390])
o MaxIW = 10 MSS
o MulDecr = 0.5
o AddIncr = 2 MSS
o Threshold = 0.05
We assume that the minimum IW (MinIW) should be as currently
specified [RFC3390]. The maximum IW can be set to a fixed value
[RFC6928], or set based on a schedule if trusted time references are
available [Al10]; here we prefer a fixed value. We also propose to
use an AIMD algorithm, with increases and decreases as noted.
Although these parameters are somewhat arbitrary, their initial
values are not important except that the algorithm is AIMD and the
MaxIW should not exceed that recommended for other systems on the
Internet. Current proposals, including default current operation,
are degenerate cases of the algorithm below for given parameters -
notably MulDecr = 1.0 and AddIncr = 0 MSS, thus disabling the
automatic part of the algorithm.
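To illustrate the degenerate case, here is a minimal sketch of a
single AIMD evaluation step. The function name adjust_iw, its
arguments, and the use of bytes as the unit are assumptions made for
illustration; they do not appear in [To12].

```python
def adjust_iw(iw, loss_ratio, mul_decr, add_incr, threshold=0.05):
    """One AIMD evaluation step; iw and add_incr are in bytes."""
    if loss_ratio > threshold:
        return int(iw * mul_decr)   # multiplicative decrease
    return iw + add_incr            # additive increase


MSS = 1460                # assumed segment size in bytes
fixed_iw = 10 * MSS

# Degenerate parameters (MulDecr = 1.0, AddIncr = 0): the IW never
# moves, whatever the measured loss ratio is.
print(adjust_iw(fixed_iw, 0.50, mul_decr=1.0, add_incr=0) == fixed_iw)
print(adjust_iw(fixed_iw, 0.00, mul_decr=1.0, add_incr=0) == fixed_iw)

# Proposed parameters (MulDecr = 0.5, AddIncr = 2 MSS): the IW reacts.
print(adjust_iw(fixed_iw, 0.50, mul_decr=0.5, add_incr=2 * MSS))
```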
The proposed algorithm is as follows:
1. On boot:
IW = MaxIW; # assume this is in bytes, and an even number of MSS
2. Upon starting a new connection
CWND = IW;
conncount++;
IWnotchecked = 1; # true
3. During a connection's SYN-ACK processing, if SYN-ACK includes
ECN, treat as if the IW is too large
if (IWnotchecked && (synackecn == 1)) {
losscount++;
IWnotchecked = 0; # never check again
}
4. During a connection, if retransmission occurs, check the seqno of
the outgoing packet (in bytes) to see if the resent segment fixes
an IW loss:
if (Retransmitting && IWnotchecked && ((seqno - ISN) < IW)) {
losscount++;
IWnotchecked = 0; # never do this entire "if" again
} else {
IWnotchecked = 0; # you're beyond the IW so stop checking
}
5. Once every 1000 connections, as a separate process (i.e., not as
part of processing a given connection):
if (conncount > 1000) {
if (losscount/conncount > Threshold) {
# the number of connections with errors is too high
IW = IW * MulDecr;
} else {
IW = IW + AddIncr;
}
}
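The five steps above can be sketched as one small, runnable model.
This is an illustration under stated assumptions, not a reference
implementation: the class and method names are hypothetical, the
offset of a retransmitted byte is taken as seqno minus ISN modulo
2^32, and the sketch additionally clamps the IW to [MinIW, MaxIW] and
resets both counters after each evaluation, neither of which is
spelled out in the pseudocode.

```python
from dataclasses import dataclass

MSS = 1460  # assumed segment size in bytes


@dataclass
class Connection:
    isn: int                     # initial sequence number
    cwnd: int                    # congestion window, in bytes
    iw_not_checked: bool = True  # IWnotchecked in the pseudocode


class IWTuner:
    def __init__(self, min_iw=3 * MSS, max_iw=10 * MSS, mul_decr=0.5,
                 add_incr=2 * MSS, threshold=0.05, interval=1000):
        self.min_iw, self.max_iw = min_iw, max_iw
        self.mul_decr, self.add_incr = mul_decr, add_incr
        self.threshold, self.interval = threshold, interval
        self.iw = max_iw                       # step 1: on boot
        self.conncount = 0
        self.losscount = 0

    def new_connection(self, isn):             # step 2
        self.conncount += 1
        return Connection(isn=isn, cwnd=self.iw)

    def on_synack(self, conn, ecn_marked):     # step 3
        if conn.iw_not_checked and ecn_marked:
            self.losscount += 1
            conn.iw_not_checked = False

    def on_retransmit(self, conn, seqno):      # step 4
        offset = (seqno - conn.isn) % 2**32    # bytes past the ISN
        if conn.iw_not_checked and offset < self.iw:
            self.losscount += 1
        conn.iw_not_checked = False            # check at most once

    def evaluate(self):                        # step 5
        if self.conncount <= self.interval:
            return                             # too few connections
        if self.losscount / self.conncount > self.threshold:
            self.iw = int(self.iw * self.mul_decr)
        else:
            self.iw += self.add_incr
        # Assumptions of this sketch: clamp the result, reset counters.
        self.iw = max(self.min_iw, min(self.max_iw, self.iw))
        self.conncount = self.losscount = 0
```

A caller would invoke new_connection() once per connection, the two
on_* hooks from the corresponding TCP events, and evaluate() from a
separate periodic task.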
We recognize that this algorithm can yield a false positive when the
sequence number wraps around. This can be avoided using either PAWS
[RFC7323] context or 64-bit internal sequence numbers (as in TCP-AO
[RFC5925]). Alternately, false positives can be allowed since they
are expected to be infrequent and thus will not affect the overall
statistics of the algorithm.
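The wraparound case can be made concrete with a short sketch. The
helper names are hypothetical, and the 64-bit variant simply assumes
the stack keeps an unwrapped internal byte counter. Note that, because
IWnotchecked is cleared at the first retransmission, the false
positive can only arise when a connection sends at least 2^32 bytes
before its first retransmission.

```python
SEQ_SPACE = 2**32  # TCP sequence numbers are 32 bits wide


def offset_32(isn, seqno):
    """Offset of seqno from the ISN using 32-bit modular arithmetic."""
    return (seqno - isn) % SEQ_SPACE


def offset_64(isn64, seqno64):
    """Offset when both values are unwrapped 64-bit counters."""
    return seqno64 - isn64


iw = 10 * 1460
isn = SEQ_SPACE - 1000           # ISN close to the top of the space

# A segment 1460 bytes in wraps to seqno 460; the modular difference
# still yields the true offset.
assert offset_32(isn, (isn + 1460) % SEQ_SPACE) == 1460

# After 2^32 + 100 bytes the 32-bit offset is 100 again, so a first
# retransmission there would be miscounted as an IW loss.
sent = SEQ_SPACE + 100
assert offset_32(isn, (isn + sent) % SEQ_SPACE) < iw

# An unwrapped 64-bit counter preserves the true offset.
assert offset_64(isn, isn + sent) >= iw
```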
The following additional constraints are imposed:
>> The automatic IW algorithm MUST initialize to MaxIW, in the
absence of other context information.
If there are too few connections to make a decision or if there is
otherwise insufficient information to increase the IW, then the
MaxIW defaults to the current recommended value.
>> An implementation may allow the MaxIW to grow beyond the
currently recommended Internet default, but not more than 2 segments
per calendar year.
If an endpoint has a persistent history of successfully transmitting
IW segments without loss, then it is allowed to probe the Internet
to determine if larger IW values have similar success. This probing
is limited and requires a trusted time source, otherwise the MaxIW
remains constant.
>> An implementation MUST adjust the IW based on loss statistics at
least once every 1000 connections.
An endpoint needs to be sufficiently reactive to IW loss.
>> An implementation MUST decrease the IW by at least one MSS when
indicated during an evaluation interval.
An endpoint that detects loss needs to decrease its IW by at least
one MSS, otherwise it is not participating in an automatic reactive
algorithm.
>> An implementation MUST increase by no more than 2 MSS per
evaluation interval.
An endpoint that does not experience IW loss needs to probe the
network incrementally.
>> An implementation SHOULD use an IW that is an integer multiple of
2 MSS.
The IW should remain a multiple of 2 MSS segments, to enable
efficient ACK compression without incurring unnecessary timeouts.
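One way to keep the IW a multiple of 2 MSS after a multiplicative
decrease is to round the result. The sketch below rounds down, which
is an assumption chosen here because it is the more conservative
direction; the function name is hypothetical.

```python
def round_down_to_even_segments(iw_bytes, mss):
    """Round iw_bytes down to a multiple of 2*MSS, but not below 2*MSS."""
    unit = 2 * mss
    return max(unit, (iw_bytes // unit) * unit)


MSS = 1460                               # assumed segment size in bytes
halved = int(10 * MSS * 0.5)             # 5 MSS, an odd segment count
print(round_down_to_even_segments(halved, MSS) // MSS)  # 4 segments
```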
>> An implementation MUST decrease the IW if more than 95% of
connections have IW losses.
Again, this is to ensure an implementation is sufficiently reactive.
>> An implementation MAY group IW values and statistics within
subsets of connections. Such grouping MAY use any information about
connections to form groups except loss statistics.
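As an illustration of such grouping, the sketch below keeps a
separate IW and separate counters per group. The grouping rule used
here (the remote IPv4 /24 prefix) and all names are assumptions made
for the example; any rule is acceptable as long as it does not depend
on loss statistics.

```python
import ipaddress
from collections import defaultdict

MSS = 1460           # assumed segment size in bytes
MAX_IW = 10 * MSS


class GroupStats:
    def __init__(self):
        self.iw = MAX_IW       # each group starts at MaxIW
        self.conncount = 0
        self.losscount = 0


def group_key(remote_ip, prefix_len=24):
    """Hypothetical grouping rule: the remote IPv4 /24 prefix."""
    return ipaddress.ip_network(f"{remote_ip}/{prefix_len}", strict=False)


stats = defaultdict(GroupStats)


def record_connection(remote_ip, iw_loss):
    group = stats[group_key(remote_ip)]
    group.conncount += 1
    if iw_loss:
        group.losscount += 1


record_connection("192.0.2.10", iw_loss=True)
record_connection("192.0.2.77", iw_loss=False)
record_connection("198.51.100.5", iw_loss=False)
print(len(stats))  # two groups: 192.0.2.0/24 and 198.51.100.0/24
```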
There are some TCP connections which might not be counted at all,
such as those to/from loopback addresses, or those within the same
subnet as that of a local interface (for which congestion control is
sometimes disabled anyway). This may also include connections that
terminate before the IW is full, i.e., as a separate check at the
time of the connection closing.
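The exclusions described above could be expressed as a filter applied
when a connection closes. This is a sketch only: the function name,
its arguments, and the choice to compare total bytes sent against the
IW are assumptions made for illustration.

```python
import ipaddress


def counts_toward_iw_stats(remote_ip, local_interfaces, bytes_sent, iw):
    """Return True if a closed connection should be counted."""
    addr = ipaddress.ip_address(remote_ip)
    if addr.is_loopback:
        return False                      # loopback traffic
    for iface in local_interfaces:        # e.g., "192.0.2.1/24"
        if addr in ipaddress.ip_interface(iface).network:
            return False                  # same subnet as an interface
    return bytes_sent >= iw               # the IW was actually filled


local = ["192.0.2.1/24"]
iw = 10 * 1460
print(counts_toward_iw_stats("127.0.0.1", local, 20000, iw))    # False
print(counts_toward_iw_stats("192.0.2.50", local, 20000, iw))   # False
print(counts_toward_iw_stats("203.0.113.9", local, 20000, iw))  # True
print(counts_toward_iw_stats("203.0.113.9", local, 1000, iw))   # False
```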
The period over which the IW is updated is intended to be a long
timescale, e.g., a month or so, or 1,000 connections, whichever is
longer. An implementation might check the IW once a month, and
simply not update the IW or clear the connection counts in months
where the number of connections is too small.
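A monthly check of this kind might look like the following sketch,
which leaves both the IW and the counters untouched when too few
connections have been observed. The function name, the tuple return
value, and the default parameter values are assumptions made for
illustration.

```python
MSS = 1460  # assumed segment size in bytes


def monthly_check(iw, conncount, losscount, min_conns=1000,
                  threshold=0.05, mul_decr=0.5, add_incr=2 * MSS):
    """Return (iw, conncount, losscount) after one monthly evaluation."""
    if conncount <= min_conns:
        # Too few connections: keep the IW and keep accumulating.
        return iw, conncount, losscount
    if losscount / conncount > threshold:
        iw = int(iw * mul_decr)
    else:
        iw = iw + add_incr
    return iw, 0, 0   # counters restart after an actual evaluation


print(monthly_check(10 * MSS, 400, 100))    # unchanged: too few samples
print(monthly_check(10 * MSS, 1500, 150))   # 10% loss: IW is halved
```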
C.4. Discussion
There are numerous parameters to the above algorithm that are
compliant with the given requirements; this is intended to allow
variation in configuration and implementation while ensuring that
all such algorithms are reactive and safe.
This algorithm continues to assume segments because that is the
basis of most TCP implementations. It might be useful to consider
revising the specifications to allow byte-based congestion given
sufficient experience.
The algorithm checks for IW losses only during the first IW after a
connection start; it does not check for IW losses at other points
where the IW is used, e.g., during slow-start restarts.
>> An implementation MAY detect IW losses during slow-start restarts
in addition to losses during the first IW of a connection. In this
case, the implementation MUST count each restart as a "connection"
for the purposes of connection counts and periodic rechecking of the
IW value.
False positives can occur during some kinds of segment reordering,
e.g., reordering that triggers spurious retransmissions even without
a true segment loss. These are not expected to be sufficiently common
to dominate the algorithm and its conclusions.
This mechanism does require additional per-connection state, which is
currently common in some implementations and is useful for other
reasons (e.g., the ISN is used in TCP-AO [RFC5925]). The mechanism
also benefits from persistent state kept across reboots, as would
other state-sharing mechanisms (e.g., TCP Control Block Sharing
[RFC2140]). The mechanism is inspired by RFC 2140's use of
information across connections.
The receive window (RWIN) is not involved in this calculation. The
size of RWIN is determined by receiver resources, and provides space
to accommodate segment reordering. It is not involved with
congestion control, which is the focus of this document and its
management of the IW.
C.5. Observations
The IW may not converge to a single, global value. It also may not
converge at all, but rather may oscillate by a few MSS as it
repeatedly probes the Internet for larger IWs and fails. Both
properties are consistent with TCP behavior during each individual
connection.
This mechanism assumes that losses during the IW are due to IW size.
Persistent errors that drop packets for other reasons (e.g., OS
bugs) can cause false positives. Again, this is consistent with
TCP's basic assumption that loss is caused by congestion and
requires backoff. This algorithm treats the IW of new connections as
a long-timescale backoff system.