--- 1/draft-ietf-mptcp-rfc6824bis-12.txt 2019-02-17 12:13:11.643822455 -0800 +++ 2/draft-ietf-mptcp-rfc6824bis-13.txt 2019-02-17 12:13:11.815826706 -0800 @@ -1,71 +1,71 @@ Internet Engineering Task Force A. Ford Internet-Draft Pexip Obsoletes: 6824 (if approved) C. Raiciu Intended status: Standards Track U. Politechnica of Bucharest -Expires: April 6, 2019 M. Handley +Expires: August 21, 2019 M. Handley U. College London O. Bonaventure U. catholique de Louvain C. Paasch Apple, Inc. - October 3, 2018 + February 17, 2019 TCP Extensions for Multipath Operation with Multiple Addresses - draft-ietf-mptcp-rfc6824bis-12 + draft-ietf-mptcp-rfc6824bis-13 Abstract TCP/IP communication is currently restricted to a single path per connection, yet multiple paths often exist between peers. The simultaneous use of these multiple paths for a TCP/IP session would improve resource usage within the network and, thus, improve user experience through higher throughput and improved resilience to network failure. Multipath TCP provides the ability to simultaneously use multiple paths between peers. This document presents a set of extensions to traditional TCP to support multipath operation. The protocol offers the same type of service to applications as TCP (i.e., reliable bytestream), and it provides the components necessary to establish and use multiple TCP flows across potentially disjoint paths. This document specifies v1 of Multipath TCP, obsoleting v0 as - specified in RFC6824 [RFC6824] through clarifications and - modifications primarily driven by deployment experience. + specified in RFC6824, through clarifications and modifications + primarily driven by deployment experience. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. 
The list of current Internet- - Drafts is at https://datatracker.ietf.org/drafts/current/. + Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on April 6, 2019. + This Internet-Draft will expire on August 21, 2019. Copyright Notice - Copyright (c) 2018 IETF Trust and the persons identified as the + Copyright (c) 2019 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents - (https://trustee.ietf.org/license-info) in effect on the date of + (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 @@ -83,62 +83,63 @@ 2.5. Requesting a Change in a Path's Priority . . . . . . . . 13 2.6. Closing an MPTCP Connection . . . . . . . . . . . . . . . 13 2.7. Notable Features . . . . . . . . . . . . . . . . . . . . 14 3. MPTCP Protocol . . . . . . . . . . . . . . . . . . . . . . . 15 3.1. Connection Initiation . . . . . . . . . . . . . . . . . . 16 3.2. Starting a New Subflow . . . . . . . . . . . . . . . . . 23 3.3. General MPTCP Operation . . . . . . . . . . . . . . . . . 28 3.3.1. Data Sequence Mapping . . . . . . . . . . . . . . . . 30 3.3.2. Data Acknowledgments . . . . . . . . . . . . . . . . 
33 3.3.3. Closing a Connection . . . . . . . . . . . . . . . . 34 - 3.3.4. Receiver Considerations . . . . . . . . . . . . . . . 36 + 3.3.4. Receiver Considerations . . . . . . . . . . . . . . . 35 3.3.5. Sender Considerations . . . . . . . . . . . . . . . . 37 - 3.3.6. Reliability and Retransmissions . . . . . . . . . . . 38 + 3.3.6. Reliability and Retransmissions . . . . . . . . . . . 37 3.3.7. Congestion Control Considerations . . . . . . . . . . 39 3.3.8. Subflow Policy . . . . . . . . . . . . . . . . . . . 39 3.4. Address Knowledge Exchange (Path Management) . . . . . . 41 3.4.1. Address Advertisement . . . . . . . . . . . . . . . . 42 3.4.2. Remove Address . . . . . . . . . . . . . . . . . . . 45 3.5. Fast Close . . . . . . . . . . . . . . . . . . . . . . . 46 3.6. Subflow Reset . . . . . . . . . . . . . . . . . . . . . . 48 - 3.7. Fallback . . . . . . . . . . . . . . . . . . . . . . . . 50 + 3.7. Fallback . . . . . . . . . . . . . . . . . . . . . . . . 49 3.8. Error Handling . . . . . . . . . . . . . . . . . . . . . 53 - 3.9. Heuristics . . . . . . . . . . . . . . . . . . . . . . . 54 + 3.9. Heuristics . . . . . . . . . . . . . . . . . . . . . . . 53 3.9.1. Port Usage . . . . . . . . . . . . . . . . . . . . . 54 3.9.2. Delayed Subflow Start and Subflow Symmetry . . . . . 54 3.9.3. Failure Handling . . . . . . . . . . . . . . . . . . 55 4. Semantic Issues . . . . . . . . . . . . . . . . . . . . . . . 56 5. Security Considerations . . . . . . . . . . . . . . . . . . . 57 6. Interactions with Middleboxes . . . . . . . . . . . . . . . . 60 7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 63 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 64 8.1. MPTCP Option Subtypes . . . . . . . . . . . . . . . . . . 64 8.2. MPTCP Handshake Algorithms . . . . . . . . . . . . . . . 65 8.3. MP_TCPRST Reason Codes . . . . . . . . . . . . . . . . . 66 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 67 9.1. Normative References . . 
. . . . . . . . . . . . . . . . 67 9.2. Informative References . . . . . . . . . . . . . . . . . 67 Appendix A. Notes on Use of TCP Options . . . . . . . . . . . . 71 Appendix B. TCP Fast Open and MPTCP . . . . . . . . . . . . . . 72 - B.1. TFO cookie request with MPTCP . . . . . . . . . . . . . . 73 + B.1. TFO cookie request with MPTCP . . . . . . . . . . . . . . 72 B.2. Data sequence mapping under TFO . . . . . . . . . . . . . 73 B.3. Connection establishment examples . . . . . . . . . . . . 74 Appendix C. Control Blocks . . . . . . . . . . . . . . . . . . . 76 C.1. MPTCP Control Block . . . . . . . . . . . . . . . . . . . 76 C.1.1. Authentication and Metadata . . . . . . . . . . . . . 76 C.1.2. Sending Side . . . . . . . . . . . . . . . . . . . . 77 C.1.3. Receiving Side . . . . . . . . . . . . . . . . . . . 77 C.2. TCP Control Blocks . . . . . . . . . . . . . . . . . . . 77 C.2.1. Sending Side . . . . . . . . . . . . . . . . . . . . 78 C.2.2. Receiving Side . . . . . . . . . . . . . . . . . . . 78 Appendix D. Finite State Machine . . . . . . . . . . . . . . . . 78 - Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 79 + Appendix E. Changes from RFC6184 . . . . . . . . . . . . . . . . 79 + Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 81 1. Introduction Multipath TCP (MPTCP) is a set of extensions to regular TCP [RFC0793] to provide a Multipath TCP [RFC6182] service, which enables a transport connection to operate across multiple paths simultaneously. This document presents the protocol changes required to add multipath capability to TCP; specifically, those for signaling and setting up multiple paths ("subflows"), managing these subflows, reassembly of data, and termination of sessions. This is not the only information @@ -553,31 +555,31 @@ Host A Host B ------ ------ MP_PRIO -> 2.6. 
Closing an MPTCP Connection When a host wants to close an existing subflow, but not the whole connection, it can initiate a regular TCP FIN/ACK exchange. When Host A wants to inform Host B that it has no more data to send, - it signals this "Data FIN" as part of the Data Sequence Signal (see + it signals this "DATA_FIN" as part of the Data Sequence Signal (see above). It has the same semantics and behavior as a regular TCP FIN, but at the connection level. Once all the data on the MPTCP connection has been successfully received, then this message is acknowledged at the connection level with a DATA_ACK. Further details are in Section 3.3.3. Host A Host B ------ ------ DATA_SEQUENCE_SIGNAL -> - [Data FIN] + [DATA_FIN] <- (MPTCP DATA_ACK) There is an additional method of connection closure, referred to as "Fast Close", which is analogous to closing a single-path TCP connection with a RST signal. The MP_FASTCLOSE signal is used to indicate to the peer that the connection will be abruptly closed and no data will be accepted anymore. This can be used on an ACK (ensuring reliability of the signal), or a RST (which is not). Both examples are shown in the following diagrams. Further details are in Section 3.5. @@ -901,25 +903,25 @@ C: The third bit, labeled "C", is set to "1" to indicate that the sender of this option will not accept additional MPTCP subflows to the source address and port, and therefore the receiver MUST NOT try to open any additional subflows towards this address and port. This is an efficiency improvement for situations where the sender knows a restriction is in place, for example if the sender is behind a strict NAT, or operating behind a legacy Layer 4 load balancer. D through H: The remaining bits, labeled "D" through "H", are used - for crypto algorithm negotiation. Currently only the rightmost - bit, labeled "H", is assigned. Bit "H" indicates the use of HMAC- - SHA256 (as defined in Section 3.2). 
An implementation that only - supports this method MUST set bit "H" to 1, and bits "D" through - "G" to 0. + for crypto algorithm negotiation. In this specification only the + rightmost bit, labeled "H", is assigned. Bit "H" indicates the + use of HMAC-SHA256 (as defined in Section 3.2). An implementation + that only supports this method MUST set bit "H" to 1, and bits "D" + through "G" to 0. A crypto algorithm MUST be specified. If flag bits D through H are all 0, the MP_CAPABLE option MUST be treated as invalid and ignored (that is, it must be treated as a regular TCP handshake). The selection of the authentication algorithm also impacts the algorithm used to generate the token and the Initial Data Sequence Number (IDSN). In this specification, with only the SHA-256 algorithm (bit "H") specified and selected, the token MUST be a truncated (most significant 32 bits) SHA-256 hash ([SHS], [RFC6234]) @@ -1035,29 +1037,31 @@ send Token-B (which is generated from Key-B). Note that the hash generation algorithm can be overridden by the choice of cryptographic handshake algorithm, as defined in Section 3.1. The MP_JOIN SYN sends not only the token (which is static for a connection) but also random numbers (nonces) that are used to prevent replay attacks on the authentication method. Recommendations for the generation of random numbers for this purpose are given in [RFC4086]. The MP_JOIN option includes an "Address ID". This is an identifier - that only has significance within a single connection, where it - identifies the source address of this packet, even if the IP header - has been changed in transit by a middlebox. The Address ID allows - address removal (Section 3.4.2) without needing to know what the - source address at the receiver is, thus allowing address removal - through NATs. 
The Address ID also allows correlation between new - subflow setup attempts and address signaling (Section 3.4.1), to - prevent setting up duplicate subflows on the same path, if an MP_JOIN - and ADD_ADDR are sent at the same time. + generated by the sender of the option, used to identify the source + address of this packet, even if the IP header has been changed in + transit by a middlebox. The numeric value of this field is generated + by the sender and must map uniquely to a source IP address for the + sending host. The Address ID allows address removal (Section 3.4.2) + without needing to know what the source address at the receiver is, + thus allowing address removal through NATs. The Address ID also + allows correlation between new subflow setup attempts and address + signaling (Section 3.4.1), to prevent setting up duplicate subflows + on the same path, if an MP_JOIN and ADD_ADDR are sent at the same + time. The Address IDs of the subflow used in the initial SYN exchange of the first subflow in the connection are implicit, and have the value zero. A host MUST store the mappings between Address IDs and addresses both for itself and the remote host. An implementation will also need to know which local and remote Address IDs are associated with which established subflows, for when addresses are removed from a local or remote host. The MP_JOIN option on packets with the SYN flag set also includes 4 @@ -1161,21 +1165,21 @@ Figure 7: Join Connection (MP_JOIN) Option (for Third ACK) These various MPTCP options fit together to enable authenticated subflow setup as illustrated in Figure 8. 
Host A Host B ------------------------ ---------- Address A1 Address A2 Address B1 ---------- ---------- ---------- | | | - | SYN + MP_CAPABLE(Key-A) | + | | SYN + MP_CAPABLE | |--------------------------------------------->| |<---------------------------------------------| | SYN/ACK + MP_CAPABLE(Key-B) | | | | | ACK + MP_CAPABLE(Key-A, Key-B) | |--------------------------------------------->| | | | | | SYN + MP_JOIN(Token-B, R-A) | | |------------------------------->| | |<-------------------------------| @@ -1197,32 +1201,29 @@ "Administratively prohibited" reason code (Section 3.6) should be included. If the token is accepted at Host B, but the HMAC returned to Host A does not match the one expected, Host A MUST close the subflow with a TCP RST. In this, and all following cases of sending a RST in this section, the sender SHOULD send a MP_TCPRST option (Section 3.6) on this RST packet with the reason code for a "MPTCP specific error". If Host B does not receive the expected HMAC, or the MP_JOIN option - is missing from the ACK, it MUST close the subflow with a TCP RST - with a MP_TCPRST (Section 3.6) option with the reason code for "MPTCP - specific error". + is missing from the ACK, it MUST close the subflow with a TCP RST. If the HMACs are verified as correct, then both hosts have authenticated each other as being the same peers as existed at the start of the connection, and they have agreed of which connection this subflow will become a part. If the SYN/ACK as received at Host A does not have an MP_JOIN option, - Host A MUST close the subflow with a TCP RST with a MP_TCPRST - (Section 3.6) option with the reason code for "MPTCP specific error". + Host A MUST close the subflow with a TCP RST. This covers all cases of the loss of an MP_JOIN. In more detail, if MP_JOIN is stripped from the SYN on the path from A to B, and Host B does not have a listener on the relevant port, it will respond with a RST in the normal way. 
If in response to a SYN with an MP_JOIN option, a SYN/ACK is received without the MP_JOIN option (either since it was stripped on the return path, or it was stripped on the outgoing path but Host B responded as if it were a new regular TCP session), then the subflow is unusable and Host A MUST close it with a RST. @@ -1426,21 +1427,21 @@ numbers is not required, then an implementation MAY include just the lower 32 bits of the data sequence number in the data sequence mapping and/or Data ACK as an optimization, and an implementation can make this choice independently for each packet. An implementation MUST be able to receive and process both 64-bit or 32-bit sequence number values, but it is not required that an implementation is able to send both. An implementation MUST send the full 64-bit data sequence number if it is transmitting at a sufficiently high rate that the 32-bit value - could wrap within the Maximum Segment Lifetime (MSL) [RFC1323]. The + could wrap within the Maximum Segment Lifetime (MSL) [RFC7323]. The lengths of the DSNs used in these values (which may be different) are declared with flags in the DSS option. Implementations MUST accept a 32-bit DSN and implicitly promote it to a 64-bit quantity by incrementing the upper 32 bits of sequence number each time the lower 32 bits wrap. A sanity check MUST be implemented to ensure that a wrap occurs at an expected time (e.g., the sequence number jumps from a very high number to a very low number) and is not triggered by out- of-order packets. As with the standard TCP sequence number, the data sequence number @@ -1802,21 +1802,22 @@ In the event that the available set of paths changes, a host may wish to signal a change in priority of subflows to the peer (e.g., a subflow that was previously set as backup should now take priority over all remaining subflows). Therefore, the MP_PRIO option, shown in Figure 11, can be used to change the 'B' flag of the subflow on which it is sent. 
Another use of the MP_PRIO option is to set the 'B' flag on a subflow to cleanly retire its use before closing it and removing it with REMOVE_ADDR Section 3.4.2, for example to support make-before-break - session continuity. + session continuity, where new subflows are added before the + previously used ones are closed. 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +---------------+---------------+-------+-----+-+ | Kind | Length |Subtype| |B| +---------------+---------------+-------+-----+-+ Figure 11: Change Subflow Priority (MP_PRIO) Option It should be noted that the backup flag is a request from a data @@ -1944,42 +1945,42 @@ ADD_ADDR option. If the port is not present in the ADD_ADDR option, the HMAC message will nevertheless include two octets of value zero. The rationale for the HMAC is to prevent unauthorized entities from injecting ADD_ADDR signals in an attempt to hijack a connection. Note that additionally the presence of this HMAC prevents the address being changed in flight unless the key is known by an intermediary. If a host receives an ADD_ADDR option for which it cannot validate the HMAC, it SHOULD silently ignore the option. A set of four flags are present after the subtype and before the - Address ID. Only the rightmost bit - labelled 'E' - is assigned - today. The other bits are currently unassigned and MUST be set to - zero by a sender and MUST be ignored by the receiver. + Address ID. Only the rightmost bit - labelled 'E' - is assigned in + this specification. The other bits are currently unassigned and MUST + be set to zero by a sender and MUST be ignored by the receiver. The 'E' flag exists to provide reliability for this option. Because this option will often be sent on pure ACKs, there is no guarantee of reliability. Therefore, a receiver receiving a fresh ADD_ADDR option (where E=0), will send the same option back to the sender, but not - including the HMAC, and with E=1. 
The lack of this echo can be used - by the initial ADD_ADDR sender to retransmit the ADD_ADDR according - to local policy. + including the HMAC, and with E=1, to indicate receipt. The lack of + this echo can be used by the initial ADD_ADDR sender to retransmit + the ADD_ADDR according to local policy. 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +---------------+---------------+-------+-------+---------------+ | Kind | Length |Subtype|(rsv)|E| Address ID | +---------------+---------------+-------+-------+---------------+ | Address (IPv4 - 4 octets / IPv6 - 16 octets) | +-------------------------------+-------------------------------+ | Port (2 octets, optional) | | +-------------------------------+ | - | Truncated HMAC (8 octets, if length > 10 octets) | + | Truncated HMAC (8 octets, if E=0) | | +-------------------------------+ | | +-------------------------------+ Figure 12: Add Address (ADD_ADDR) Option Due to the proliferation of NATs, it is reasonably likely that one host may attempt to advertise private addresses [RFC1918]. It is not desirable to prohibit this, since there may be cases where both hosts have additional interfaces on the same private network, and a host @@ -1989,36 +1990,20 @@ uniquely identifies the connection to the receiving host. If the token is unknown, the host will return with a RST. In the unlikely event that the token is valid at the receiving host, subflow setup will continue, but the HMAC exchange must occur for authentication. This will fail, and will provide sufficient protection against two unconnected hosts accidentally setting up a new subflow upon the signal of a private address. Further security considerations around the issue of ADD_ADDR messages that accidentally misdirect, or maliciously direct, new MP_JOIN attempts are discussed in Section 5. - Ideally, ADD_ADDR and REMOVE_ADDR options would be sent reliably, and - in order, to the other end. 
This would ensure that this address - management does not unnecessarily cause an outage in the connection - when remove/add addresses are processed in reverse order, and also to - ensure that all possible paths are used. Note, however, that losing - reliability and ordering will not break the multipath connections, it - will just reduce the opportunity to open multipath paths and to - survive different patterns of path failures. - - Therefore, implementing reliability signals for these MPTCP options - is not necessary. In order to minimize the impact of the loss of - these options, however, it is RECOMMENDED that a sender should send - these options on all available subflows. If these options need to be - received in order, an implementation SHOULD only send one ADD_ADDR/ - REMOVE_ADDR option per RTT, to minimize the risk of misordering. - A host that receives an ADD_ADDR but finds a connection set up to that IP address and port number is unsuccessful SHOULD NOT perform further connection attempts to this address/port combination for this connection. A sender that wants to trigger a new incoming connection attempt on a previously advertised address/port combination can therefore refresh ADD_ADDR information by sending the option again. A host can therefore send an ADD_ADDR message with an already assigned Address ID, but the Address MUST be the same as previously assigned to this Address ID. A new ADD_ADDR may have the same, or @@ -2664,24 +2649,25 @@ The use of crypto capability bits in the initial connection handshake to negotiate use of a particular algorithm allows the deployment of additional crypto mechanisms in the future. Note that this would be susceptible to bid-down attacks only if the attacker was on-path (and thus would be able to modify the data anyway). The security mechanism presented in this document should therefore protect against all forms of flooding and hijacking attacks discussed in [RFC6181]. 
The version negotiation specified in Section 3.1, if differing MPTCP versions shared a common negotiation format, would allow an on-path - attacker to apply a theoretical bid-down attack. However, since the - v1 and v0 protocols have a different handshake, this is not an attack - that can be applied here. Furthermore, an on-path attacker would - have access to the raw data, negating any other TCP-level security + attacker to apply a theoretical bid-down attack. Since the v1 and v0 + protocols have a different handshake, such an attack would require + the client to re-establish the connection using v0, and this being + supported by the server. Note that an on-path attacker would have + access to the raw data, negating any other TCP-level security mechanisms. Also a change from [RFC6824] has removed the subflow identifier from the MP_PRIO option (Section 3.3.8), to remove the theoretical attack where a subflow could be placed in "backup" mode by an attacker. During normal operation, regular TCP protection mechanisms (such as ensuring sequence numbers are in-window) will provide the same level of protection against attacks on individual TCP subflows as exists for regular TCP today. Implementations will introduce additional buffers compared to regular TCP, to reassemble data at the connection @@ -2912,21 +2898,21 @@ The authors gratefully acknowledge significant input into this document from Sebastien Barre and Andrew McDonald. The authors also wish to acknowledge reviews and contributions from Iljitsch van Beijnum, Lars Eggert, Marcelo Bagnulo, Robert Hancock, Pasi Sarolahti, Toby Moncaster, Philip Eardley, Sergio Lembo, Lawrence Conroy, Yoshifumi Nishida, Bob Briscoe, Stein Gjessing, Andrew McGregor, Georg Hampel, Anumita Biswas, Wes Eddy, Alexey Melnikov, Francis Dupont, Adrian Farrel, Barry Leiba, Robert Sparks, Sean Turner, Stephen Farrell, Martin Stiemerling, Gregory Detal, - Fabien Duchene, Xavier de Foy, and Rahul Jadhav. 
+ Fabien Duchene, Xavier de Foy, Rahul Jadhav, and Klemens Schragel. 8. IANA Considerations This document obsoletes [RFC6824] and as such IANA is requested to update the TCP option space registry to point to this document for Multipath TCP, as follows: +------+--------+-----------------------+---------------+ | Kind | Length | Meaning | Reference | +------+--------+-----------------------+---------------+ @@ -3006,21 +2992,21 @@ | | | Section 3.2 | +-------+----------------------------------------+------------------+ Table 3: MPTCP Handshake Algorithms Note that the meanings of bits D through H can be dependent upon bit B, depending on how Extensibility is defined in future specifications; see Section 3.1 for more information. Future assignments in this registry are also to be defined by - Standards Action as defined by [RFC5226]. Assignments consist of the + Standards Action as defined by [RFC8126]. Assignments consist of the value of the flags, a symbolic name for the algorithm, and a reference to its specification. 8.3. MP_TCPRST Reason Codes IANA is requested to create a further sub-registry, "MP_TCPRST Reason Codes" under the "Transmission Control Protocol (TCP) Parameters" registry, based on the reason code in MP_TCPRST (Section 3.6): +------+-----------------------------+----------------------------+ @@ -3040,163 +3026,174 @@ 9. References 9.1. Normative References [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, DOI 10.17487/RFC0793, September 1981, . [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, - DOI 10.17487/RFC2119, March 1997, - . + DOI 10.17487/RFC2119, March 1997, . [RFC6182] Ford, A., Raiciu, C., Handley, M., Barre, S., and J. Iyengar, "Architectural Guidelines for Multipath TCP Development", RFC 6182, DOI 10.17487/RFC6182, March 2011, . [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, . 
[SHS] National Institute of Science and Technology, "Secure Hash Standard", Federal Information Processing Standard (FIPS) 180-4, August 2015, . 9.2. Informative References + [deployments] + Bonaventure, O. and S. Seo, "Multipath TCP Deployments", + IETF Journal 2016, November 2016, + . + [howhard] Raiciu, C., Paasch, C., Barre, S., Ford, A., Honda, M., Duchene, F., Bonaventure, O., and M. Handley, "How Hard Can It Be? Designing and Implementing a Deployable Multipath TCP", Usenix Symposium on Networked Systems Design and Implementation 2012, 2012, . [norm] Handley, M., Paxson, V., and C. Kreibich, "Network Intrusion Detection: Evasion, Traffic Normalization, and End-to-End Protocol Semantics", Usenix Security 2001, 2001, . [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts - Communication Layers", STD 3, RFC 1122, - DOI 10.17487/RFC1122, October 1989, - . - - [RFC1323] Jacobson, V., Braden, R., and D. Borman, "TCP Extensions - for High Performance", RFC 1323, DOI 10.17487/RFC1323, May - 1992, . + DOI 10.17487/RFC1122, October 1989, . [RFC1918] Rekhter, Y., Moskowitz, B., Karrenberg, D., de Groot, G., and E. Lear, "Address Allocation for Private Internets", BCP 5, RFC 1918, DOI 10.17487/RFC1918, February 1996, . [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP Selective Acknowledgment Options", RFC 2018, - DOI 10.17487/RFC2018, October 1996, - . + DOI 10.17487/RFC2018, October 1996, . [RFC2104] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed- Hashing for Message Authentication", RFC 2104, - DOI 10.17487/RFC2104, February 1997, - . + DOI 10.17487/RFC2104, February 1997, . [RFC2979] Freed, N., "Behavior of and Requirements for Internet Firewalls", RFC 2979, DOI 10.17487/RFC2979, October 2000, . [RFC2992] Hopps, C., "Analysis of an Equal-Cost Multi-Path Algorithm", RFC 2992, DOI 10.17487/RFC2992, November 2000, . [RFC3022] Srisuresh, P. and K. 
Egevang, "Traditional IP Network Address Translator (Traditional NAT)", RFC 3022, - DOI 10.17487/RFC3022, January 2001, - . + DOI 10.17487/RFC3022, January 2001, . [RFC3135] Border, J., Kojo, M., Griner, J., Montenegro, G., and Z. Shelby, "Performance Enhancing Proxies Intended to Mitigate Link-Related Degradations", RFC 3135, - DOI 10.17487/RFC3135, June 2001, - . + DOI 10.17487/RFC3135, June 2001, . [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, DOI 10.17487/RFC3168, September 2001, . [RFC4086] Eastlake 3rd, D., Schiller, J., and S. Crocker, "Randomness Requirements for Security", BCP 106, RFC 4086, - DOI 10.17487/RFC4086, June 2005, - . + DOI 10.17487/RFC4086, June 2005, . [RFC4987] Eddy, W., "TCP SYN Flooding Attacks and Common Mitigations", RFC 4987, DOI 10.17487/RFC4987, August 2007, . - [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an - IANA Considerations Section in RFCs", RFC 5226, - DOI 10.17487/RFC5226, May 2008, - . - [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, . [RFC5961] Ramaiah, A., Stewart, R., and M. Dalal, "Improving TCP's Robustness to Blind In-Window Attacks", RFC 5961, - DOI 10.17487/RFC5961, August 2010, - . + DOI 10.17487/RFC5961, August 2010, . [RFC6181] Bagnulo, M., "Threat Analysis for TCP Extensions for Multipath Operation with Multiple Addresses", RFC 6181, - DOI 10.17487/RFC6181, March 2011, - . + DOI 10.17487/RFC6181, March 2011, . [RFC6234] Eastlake 3rd, D. and T. Hansen, "US Secure Hash Algorithms (SHA and SHA-based HMAC and HKDF)", RFC 6234, - DOI 10.17487/RFC6234, May 2011, - . + DOI 10.17487/RFC6234, May 2011, . [RFC6356] Raiciu, C., Handley, M., and D. Wischik, "Coupled Congestion Control for Multipath Transport Protocols", RFC 6356, DOI 10.17487/RFC6356, October 2011, . [RFC6528] Gont, F. and S. 
Bellovin, "Defending against Sequence Number Attacks", RFC 6528, DOI 10.17487/RFC6528, February 2012, . [RFC6824] Ford, A., Raiciu, C., Handley, M., and O. Bonaventure, "TCP Extensions for Multipath Operation with Multiple Addresses", RFC 6824, DOI 10.17487/RFC6824, January 2013, . [RFC6897] Scharf, M. and A. Ford, "Multipath TCP (MPTCP) Application Interface Considerations", RFC 6897, DOI 10.17487/RFC6897, March 2013, . + [RFC7323] Borman, D., Braden, B., Jacobson, V., and R. + Scheffenegger, Ed., "TCP Extensions for High Performance", + RFC 7323, DOI 10.17487/RFC7323, September 2014, + . + [RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014, . [RFC7430] Bagnulo, M., Paasch, C., Gont, F., Bonaventure, O., and C. Raiciu, "Analysis of Residual Threats and Possible Fixes for Multipath TCP (MPTCP)", RFC 7430, - DOI 10.17487/RFC7430, July 2015, - . + DOI 10.17487/RFC7430, July 2015, . + + [RFC8041] Bonaventure, O., Paasch, C., and G. Detal, "Use Cases and + Operational Experience with Multipath TCP", RFC 8041, + DOI 10.17487/RFC8041, January 2017, . + + [RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for + Writing an IANA Considerations Section in RFCs", BCP 26, + RFC 8126, DOI 10.17487/RFC8126, June 2017, + . [TCPLO] Ramaiah, A., "TCP option space extension", Work in Progress, March 2012. Appendix A. Notes on Use of TCP Options The TCP option space is limited due to the length of the Data Offset field in the TCP header (4 bits), which defines the TCP header length in 32-bit words. With the standard TCP header being 20 bytes, this leaves a maximum of 40 bytes for options, and many of these may @@ -3211,21 +3208,21 @@ bytes) options. Together these sum to 19 bytes. Some operating systems appear to pad each option up to a word boundary, thus using 24 bytes (a brief survey suggests Windows XP and Mac OS X do this, whereas Linux does not). 
Optimistically, therefore, we have 21 bytes spare, or 16 if it has to be word-aligned. In either case, however, the SYN versions of Multipath Capable (12 bytes) and Join (12 or 16 bytes) options will fit in this remaining space. Note that due to the use of a 64-bit data-level sequence space, it is feasible that MPTCP will not require the timestamp option for - protection against wrapped sequence numbers (PAWS [RFC1323]), since + protection against wrapped sequence numbers (PAWS [RFC7323]), since the data-level sequence space has far less chance of wrapping. Confirmation of the validity of this optimisation is for further study. TCP data packets typically carry timestamp options in every packet, taking 10 bytes (or 12 with padding). That leaves 30 bytes (or 28, if word-aligned). The Data Sequence Signal (DSS) option varies in length depending on whether the data sequence mapping and DATA_ACK are included, and whether the sequence numbers in use are 4 or 8 octets. The maximum size of the DSS option is 28 bytes, so even that @@ -3260,29 +3257,20 @@ The ADD_ADDR option can be between 16 and 30 bytes, depending on whether IPv4 or IPv6 is used, and whether or not the port number is present. It is unlikely that such signaling would fit in a data packet (although if there is space, it is fine to include it). It is recommended to use duplicate ACKs with no other payload or options in order to transmit these rare signals. Note this is the reason for mandating that duplicate ACKs with MPTCP options are not taken as a signal of congestion. - Finally, there are issues with reliable delivery of options. As - options can also be sent on pure ACKs, these are not reliably sent. - This is not an issue for DATA_ACK due to their cumulative nature, but - may be an issue for ADD_ADDR/REMOVE_ADDR options. 
Here, it is - recommended to send these options redundantly (whether on multiple - paths or on the same path on a number of ACKs -- but interspersed - with data in order to avoid interpretation as congestion). The cases - where options are stripped by middleboxes are discussed in Section 6. - Appendix B. TCP Fast Open and MPTCP TCP Fast Open (TFO) is an experimental TCP extension, described in [RFC7413], which has been introduced to allow sending data one RTT earlier than with regular TCP. This is considered a valuable gain as very short connections are very common, especially for HTTP request/ response schemes. It achieves this by sending the SYN-segment together with the application's data and allowing the listener to reply immediately with data after the SYN/ACK. [RFC7413] secures this mechanism, by using a new TCP option that includes a cookie @@ -3491,35 +3479,35 @@ C.1.2. Sending Side SND.UNA (64 bits): This is the data sequence number of the next byte to be acknowledged, at the MPTCP connection level. This variable is updated upon reception of a DSS option containing a DATA_ACK. SND.NXT (64 bits): This is the data sequence number of the next byte to be sent. SND.NXT is used to determine the value of the DSN in the DSS option. - SND.WND (32 bits with RFC 1323, 16 bits otherwise): This is the + SND.WND (32 bits with RFC 7323, 16 bits otherwise): This is the sending window. MPTCP maintains the sending window at the MPTCP connection level and the same window is shared by all subflows. All subflows use the MPTCP connection level SND.WND to compute the SEQ.WND value that is sent in each transmitted segment. C.1.3. Receiving Side RCV.NXT (64 bits): This is the data sequence number of the next byte that is expected on the MPTCP connection. This state variable is modified upon reception of in-order data. The value of RCV.NXT is used to specify the DATA_ACK that is sent in the DSS option on all subflows. 
- RCV.WND (32 bits with RFC 1323, 16 bits otherwise): This is the + RCV.WND (32 bits with RFC 7323, 16 bits otherwise): This is the connection-level receive window, which is the maximum of the RCV.WND on all the subflows. C.2. TCP Control Blocks The MPTCP control block also contains a list of the TCP control blocks that are associated with the MPTCP connection. Note that the TCP control block on the TCP subflows does not contain the RCV.WND and SND.WND state variables as these are maintained at @@ -3538,31 +3526,31 @@ be sent on the subflow. SND.NXT is used to set the value of SEG.SEQ upon transmission of the next segment. C.2.2. Receiving Side RCV.NXT (32 bits): This is the sequence number of the next byte that is expected on the subflow. This state variable is modified upon reception of in-order segments. The value of RCV.NXT is copied to the SEG.ACK field of the next segments transmitted on the subflow. - RCV.WND (32 bits with RFC 1323, 16 bits otherwise): This is the + RCV.WND (32 bits with RFC 7323, 16 bits otherwise): This is the subflow-level receive window that is updated with the window field from the segments received on this subflow. Appendix D. Finite State Machine The diagram in Figure 22 shows the Finite State Machine for connection-level closure. This illustrates how the DATA_FIN - connection-level signal (indicated as the DFIN flag on a DATA_ACK) - interacts with subflow-level FINs, and permits "break-before-make" - handover between subflows. + connection-level signal (indicated in the diagram as the DFIN flag on + a DATA_ACK) interacts with subflow-level FINs, and permits "break- + before-make" handover between subflows. 
+---------+ | M_ESTAB | +---------+ M_CLOSE | | rcv DATA_FIN ------- | | ------- +---------+ snd DATA_FIN / \ snd DATA_ACK[DFIN] +---------+ | M_FIN |<----------------- ------------------->| M_CLOSE | | WAIT-1 |--------------------------- | WAIT | +---------+ rcv DATA_FIN \ +---------+ @@ -3579,20 +3567,91 @@ | snd DATA_ACK[DFIN] V delete MPTCP PCB V \ +-----------+ +---------+ ------------------------>|M_TIME WAIT|----------------->| M_CLOSED| +-----------+ +---------+ All subflows in CLOSED ------------ delete MPTCP PCB Figure 22: Finite State Machine for Connection Closure +Appendix E. Changes from RFC6824 + + This section lists the key technical changes between RFC6824 + [RFC6824], specifying MPTCP v0, and this document, which obsoletes + RFC6824 and specifies MPTCP v1. Note that this specification is not + backwards compatible with RFC6824. + + o The document incorporates lessons learnt from the various + implementations, deployments and experiments gathered in the + documents "Use Cases and Operational Experience with Multipath + TCP" [RFC8041] and the IETF Journal article "Multipath TCP + Deployments" [deployments]. + + o Connection initiation, through the exchange of the MP_CAPABLE + MPTCP option, is different from RFC6824. In order to permit + servers to act statelessly, the SYN doesn't include A's key (it is + still sent in the ACK). + + o This requires MP_CAPABLE to also be sent reliably on the third + ACK. If safe receipt of the third ACK cannot be inferred, the + MP_CAPABLE option must be repeated on the first data packet. + + o In the Flags field of MP_CAPABLE, C is now assigned to mean that + the sender of this option will not accept additional MPTCP + subflows to the source address and port. This is an efficiency + improvement, for example where the sender is behind a strict NAT. + + o In the Flags field of MP_CAPABLE, H now indicates the use of HMAC- + SHA256 (rather than HMAC-SHA1).
+ + o Connection initiation also defines the procedure for version + negotiation, for implementations that support both v0 (RFC6824) + and v1 (this document). + + o The HMAC-SHA256 (rather than HMAC-SHA1) algorithm is used, as the + algorithm provides better security. It is used to generate the + token in the MP_JOIN and ADD_ADDR messages, and to set the initial + data sequence number. + + o A new subflow-level option exists to signal reasons for sending a + RST on a subflow (MP_TCPRST, Section 3.6), which can help an + implementation decide whether to attempt later re-connection. + + o The MP_PRIO option (Section 3.3.8), which is used to signal a + change of priority for a subflow, no longer includes the AddrID + field. Its purpose was to allow the changed priority to be + applied on a subflow other than the one it was sent on. However, + it has been realised that this could be used by a man-in-the- + middle to divert all traffic on to its own path, and MP_PRIO does + not include a token or other security mechanism. + + o The ADD_ADDR option (Section 3.4.1), which is used to inform the + other host about another potential address, is different in + several ways. It now includes an HMAC of the added address, for + enhanced security. In addition, reliability for the ADD_ADDR + option has been added: the IPVer field is replaced with a flag + field, and one flag is assigned (E) which is used as an 'Echo' so + a host can indicate that it has received the option. + + o An additional way of performing a Fast Close is described, by + sending an MP_FASTCLOSE option on a RST on all subflows. This + allows the host to tear down the subflows and the connection + immediately. + + o In the IANA registry a new MPTCP subtype option, MP_EXPERIMENTAL, + is reserved for private experiments. However, the document + doesn't define how to use the subtype option. + + o A new Appendix discusses the use of both MPTCP and TCP Fast Open + on the same packet (Appendix B).
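The list above notes that HMAC-SHA256 replaces HMAC-SHA1 and that ADD_ADDR now carries an HMAC. As a rough illustration only, the following Python sketch shows the general technique of computing and checking a truncated HMAC-SHA256 value. It is not taken from the specification: the function names, the way the key and message are assembled (two 8-byte keys concatenated, two 4-byte nonces concatenated), the 8-byte truncation length, and the choice to keep the leading bytes are all assumptions made for the example. The actual inputs and the bits retained for each option are defined in the main body of the document (the sections on MP_JOIN and ADD_ADDR), not in this appendix.

```python
import hashlib
import hmac


def truncated_hmac_sha256(key: bytes, message: bytes, length: int) -> bytes:
    """Return the first `length` bytes of HMAC-SHA256(key, message).

    Keeping the leading bytes is an assumption for this sketch; a real
    option format may keep a different part of the digest.
    """
    if not 1 <= length <= hashlib.sha256().digest_size:
        raise ValueError("length must be between 1 and 32 bytes")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return digest[:length]


def macs_match(expected: bytes, received: bytes) -> bool:
    """Compare two MAC values in constant time."""
    return hmac.compare_digest(expected, received)


# Hypothetical inputs, chosen only to make the example self-contained.
key_a = bytes(range(8))        # stand-in for one host's 64-bit key
key_b = bytes(range(8, 16))    # stand-in for the peer's 64-bit key
nonce_a = b"\x01\x02\x03\x04"  # stand-in for one host's random number
nonce_b = b"\x05\x06\x07\x08"  # stand-in for the peer's random number

# Assumed layout: keys concatenated as the HMAC key, nonces as the message.
mac = truncated_hmac_sha256(key_a + key_b, nonce_a + nonce_b, 8)
```

A receiver would recompute the value from its own copy of the keys and pass both values to macs_match(), rather than comparing with ==, so that the comparison time does not depend on how many leading bytes agree.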
+ Authors' Addresses Alan Ford Pexip EMail: alan.ford@gmail.com Costin Raiciu University Politehnica of Bucharest Splaiul Independentei 313