--- 1/draft-ietf-mptcp-rfc6824bis-05.txt 2016-07-06 01:16:14.421037656 -0700 +++ 2/draft-ietf-mptcp-rfc6824bis-06.txt 2016-07-06 01:16:14.573041473 -0700 @@ -1,62 +1,62 @@ Internet Engineering Task Force A. Ford Internet-Draft Pexip Obsoletes: 6824 (if approved) C. Raiciu Intended status: Experimental U. Politechnica of Bucharest -Expires: July 15, 2016 M. Handley +Expires: January 7, 2017 M. Handley U. College London O. Bonaventure U. catholique de Louvain C. Paasch Apple, Inc. - January 12, 2016 + July 6, 2016 TCP Extensions for Multipath Operation with Multiple Addresses - draft-ietf-mptcp-rfc6824bis-05 + draft-ietf-mptcp-rfc6824bis-06 Abstract TCP/IP communication is currently restricted to a single path per connection, yet multiple paths often exist between peers. The simultaneous use of these multiple paths for a TCP/IP session would improve resource usage within the network and, thus, improve user experience through higher throughput and improved resilience to network failure. Multipath TCP provides the ability to simultaneously use multiple paths between peers. This document presents a set of extensions to traditional TCP to support multipath operation. The protocol offers the same type of service to applications as TCP (i.e., reliable bytestream), and it provides the components necessary to establish and use multiple TCP flows across potentially disjoint paths. This document specifies v1 of Multipath TCP, obsoleting v0 as - specified in RFC6824 [5] through clarifications and modifications - primarily driven by deployment experience. + specified in RFC6824 [RFC6824] through clarifications and + modifications primarily driven by deployment experience. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. 
The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on July 15, 2016. + This Internet-Draft will expire on January 7, 2017. Copyright Notice Copyright (c) 2016 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents @@ -106,64 +106,68 @@ 3.9. Error Handling . . . . . . . . . . . . . . . . . . . . . . 50 3.10. Heuristics . . . . . . . . . . . . . . . . . . . . . . . . 50 3.10.1. Port Usage . . . . . . . . . . . . . . . . . . . . . . 51 3.10.2. Delayed Subflow Start and Subflow Symmetry . . . . . . 51 3.10.3. Failure Handling . . . . . . . . . . . . . . . . . . . 52 4. Semantic Issues . . . . . . . . . . . . . . . . . . . . . . . 53 5. Security Considerations . . . . . . . . . . . . . . . . . . . 54 6. Interactions with Middleboxes . . . . . . . . . . . . . . . . 57 7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 60 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 60 - 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 62 - 9.1. Normative References . . . . . . . . . . . . . . . . . . . 62 - 9.2. Informative References . . . . . . . . . . . . . . . . . . 63 - Appendix A. Notes on Use of TCP Options . . . . . . . . . . . . . 65 - Appendix B. Control Blocks . . . . . . . . . . . . . . . . . . . 67 - B.1. MPTCP Control Block . . . . . . . . . . . . . . . . . . . 67 - B.1.1. Authentication and Metadata . . . . . . . . . . . . . 67 - B.1.2. 
Sending Side . . . . . . . . . . . . . . . . . . . . . 68 - B.1.3. Receiving Side . . . . . . . . . . . . . . . . . . . . 68 - B.2. TCP Control Blocks . . . . . . . . . . . . . . . . . . . . 68 + 8.1. MPTCP Option Subtypes . . . . . . . . . . . . . . . . . . 61 + 8.2. MPTCP Handshake Algorithms . . . . . . . . . . . . . . . . 62 + 8.3. MP_TCPRST Reason Codes . . . . . . . . . . . . . . . . . . 62 + 8.4. Experimental option registry . . . . . . . . . . . . . . . 63 + 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 63 + 9.1. Normative References . . . . . . . . . . . . . . . . . . . 63 + 9.2. Informative References . . . . . . . . . . . . . . . . . . 64 + Appendix A. Notes on Use of TCP Options . . . . . . . . . . . . . 66 + Appendix B. Control Blocks . . . . . . . . . . . . . . . . . . . 68 + B.1. MPTCP Control Block . . . . . . . . . . . . . . . . . . . 68 + B.1.1. Authentication and Metadata . . . . . . . . . . . . . 68 + B.1.2. Sending Side . . . . . . . . . . . . . . . . . . . . . 69 + B.1.3. Receiving Side . . . . . . . . . . . . . . . . . . . . 69 + B.2. TCP Control Blocks . . . . . . . . . . . . . . . . . . . . 69 B.2.1. Sending Side . . . . . . . . . . . . . . . . . . . . . 69 - B.2.2. Receiving Side . . . . . . . . . . . . . . . . . . . . 69 - Appendix C. Finite State Machine . . . . . . . . . . . . . . . . 69 + B.2.2. Receiving Side . . . . . . . . . . . . . . . . . . . . 70 + Appendix C. Finite State Machine . . . . . . . . . . . . . . . . 70 1. Introduction - Multipath TCP (MPTCP) is a set of extensions to regular TCP [1] to - provide a Multipath TCP [2] service, which enables a transport - connection to operate across multiple paths simultaneously. This - document presents the protocol changes required to add multipath + Multipath TCP (MPTCP) is a set of extensions to regular TCP [RFC0793] + to provide a Multipath TCP [RFC6182] service, which enables a + transport connection to operate across multiple paths simultaneously. 
+ This document presents the protocol changes required to add multipath capability to TCP; specifically, those for signaling and setting up multiple paths ("subflows"), managing these subflows, reassembly of data, and termination of sessions. This is not the only information required to create a Multipath TCP implementation, however. This document is complemented by three others: - o Architecture [2], which explains the motivations behind Multipath - TCP, contains a discussion of high-level design decisions on which - this design is based, and an explanation of a functional - separation through which an extensible MPTCP implementation can be - developed. + o Architecture [RFC6182], which explains the motivations behind + Multipath TCP, contains a discussion of high-level design + decisions on which this design is based, and an explanation of a + functional separation through which an extensible MPTCP + implementation can be developed. - o Congestion control [6] presents a safe congestion control + o Congestion control [RFC6356] presents a safe congestion control algorithm for coupling the behavior of the multiple paths in order to "do no harm" to other network users. - o Application considerations [7] discusses what impact MPTCP will - have on applications, what applications will want to do with + o Application considerations [RFC6897] discusses what impact MPTCP + will have on applications, what applications will want to do with MPTCP, and as a consequence of these factors, what API extensions an MPTCP implementation should present. This document is an update to, and obsoletes, the v0 specification of - Multipath TCP [5]. This document specifies MPTCP v1, which is not - backward compatible with MPTCP v0. This document additionally + Multipath TCP [RFC6824]. This document specifies MPTCP v1, which is + not backward compatible with MPTCP v0. This document additionally defines version negotiation procedures for implementations that support both versions. 1.1. 
Design Assumptions In order to limit the potentially huge design space, the working group imposed two key constraints on the Multipath TCP design presented in this document: o It must be backwards-compatible with current, regular TCP, to @@ -171,67 +175,67 @@ o It can be assumed that one or both hosts are multihomed and multiaddressed. To simplify the design, we assume that the presence of multiple addresses at a host is sufficient to indicate the existence of multiple paths. These paths need not be entirely disjoint: they may share one or many routers between them. Even in such a situation, making use of multiple paths is beneficial, improving resource utilization and resilience to a subset of node failures. The - congestion control algorithms defined in [6] ensure this does not act - detrimentally. Furthermore, there may be some scenarios where - different TCP ports on a single host can provide disjoint paths (such - as through certain Equal-Cost Multipath (ECMP) implementations [8]), - and so the MPTCP design also supports the use of ports in path - identifiers. + congestion control algorithms defined in [RFC6356] ensure this does + not act detrimentally. Furthermore, there may be some scenarios + where different TCP ports on a single host can provide disjoint paths + (such as through certain Equal-Cost Multipath (ECMP) implementations + [RFC2992]), and so the MPTCP design also supports the use of ports in + path identifiers. There are three aspects to the backwards-compatibility listed above - (discussed in more detail in [2]): + (discussed in more detail in [RFC6182]): External Constraints: The protocol must function through the vast majority of existing middleboxes such as NATs, firewalls, and proxies, and as such must resemble existing TCP as far as possible on the wire. Furthermore, the protocol must not assume the segments it sends on the wire arrive unmodified at the destination: they may be split or coalesced; TCP options may be removed or duplicated. 
Application Constraints: The protocol must be usable with no change to existing applications that use the common TCP API (although it is reasonable that not all features would be available to such legacy applications). Furthermore, the protocol must provide the same service model as regular TCP to the application. Fallback: The protocol should be able to fall back to standard TCP with no interference from the user, to be able to communicate with legacy hosts. - The complementary application considerations document [7] discusses - the necessary features of an API to provide backwards-compatibility, - as well as API extensions to convey the behavior of MPTCP at a level - of control and information equivalent to that available with regular, - single-path TCP. + The complementary application considerations document [RFC6897] + discusses the necessary features of an API to provide backwards- + compatibility, as well as API extensions to convey the behavior of + MPTCP at a level of control and information equivalent to that + available with regular, single-path TCP. Further discussion of the design constraints and associated design - decisions are given in the MPTCP Architecture document [2] and in - [9]. + decisions are given in the MPTCP Architecture document [RFC6182] and + in [howhard]. 1.2. Multipath TCP in the Networking Stack MPTCP operates at the transport layer and aims to be transparent to both higher and lower layers. It is a set of additional features on top of standard TCP; Figure 1 illustrates this layering. MPTCP is designed to be usable by legacy applications with no changes; detailed discussion of its interactions with applications is given in - [7]. + [RFC6897]. 
+-------------------------------+ | Application | +---------------+ +-------------------------------+ | Application | | MPTCP | +---------------+ + - - - - - - - + - - - - - - - + | TCP | | Subflow (TCP) | Subflow (TCP) | +---------------+ +-------------------------------+ | IP | | IP | IP | +---------------+ +-------------------------------+ @@ -272,23 +276,23 @@ Section 4. 1.4. MPTCP Concept This section provides a high-level summary of normal operation of MPTCP, and is illustrated by the scenario shown in Figure 2. A detailed description of operation is given in Section 3. o To a non-MPTCP-aware application, MPTCP will behave the same as normal TCP. Extended APIs could provide additional control to - MPTCP-aware applications [7]. An application begins by opening a - TCP socket in the normal way. MPTCP signaling and operation are - handled by the MPTCP implementation. + MPTCP-aware applications [RFC6897]. An application begins by + opening a TCP socket in the normal way. MPTCP signaling and + operation are handled by the MPTCP implementation. o An MPTCP connection begins similarly to a regular TCP connection. This is illustrated in Figure 2 where an MPTCP connection is established between addresses A1 and B1 on Hosts A and B, respectively. o If extra paths are available, additional TCP sessions (termed MPTCP "subflows") are created on these paths, and are combined with the existing session, which continues to appear as a single connection to the applications at both ends. The creation of the @@ -330,21 +334,21 @@ | |<---------------------| | | | | | | | | | Figure 2: Example MPTCP Usage Scenario 1.5. Requirements Language The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this - document are to be interpreted as described in RFC 2119 [3]. + document are to be interpreted as described in RFC 2119 [RFC2119]. 2. 
Operation Overview This section presents a single description of common MPTCP operation, with reference to the protocol operation. This is a high-level overview of the key functions; the full specification follows in Section 3. Extensibility and negotiated features are not discussed here. Considerable reference is made to symbolic names of MPTCP options throughout this section -- these are subtypes of the IANA-assigned MPTCP option (see Section 8), and their formats are defined @@ -527,25 +531,26 @@ o To cope with NATs on the path, addresses are referred to by Address IDs, in case the IP packet's source address gets changed by a NAT. Setting up a new TCP flow is not possible if the passive opener is behind a NAT; to allow subflows to be created when either end is behind a NAT, MPTCP uses the ADD_ADDR message. o MPTCP falls back to ordinary TCP if MPTCP operation is not possible, for example, if one host is not MPTCP capable or if a middlebox alters the payload. - o To meet the threats identified in [10], the following steps are - taken: keys are sent in the clear in the MP_CAPABLE messages; - MP_JOIN messages are secured with HMAC-SHA1 ([11], [4]) using - those keys; and standard TCP validity checks are made on the other - messages (ensuring sequence numbers are in-window [12]). + o To meet the threats identified in [RFC6181], the following steps + are taken: keys are sent in the clear in the MP_CAPABLE messages; + MP_JOIN messages are secured with HMAC-SHA1 ([RFC2104], [sha1]) + using those keys; and standard TCP validity checks are made on the + other messages (ensuring sequence numbers are in-window + [RFC5961]). 3. MPTCP Protocol This section describes the operation of the MPTCP protocol, and is subdivided into sections for each key part of the protocol operation. All MPTCP operations are signaled using optional TCP header fields. 
A single TCP option number ("Kind") has been assigned by IANA for MPTCP (see Section 8), and then individual messages will be determined by a "subtype", the values of which are also stored in an @@ -572,51 +577,51 @@ Those MPTCP options associated with subflow initiation are used on packets with the SYN flag set. Additionally, there is one MPTCP option for signaling metadata to ensure segmented data can be recombined for delivery to the application. The remaining options, however, are signals that do not need to be on a specific packet, such as those for signaling additional addresses. Whilst an implementation may desire to send MPTCP options as soon as possible, it may not be possible to combine all desired options (both those for MPTCP and for regular TCP, such as SACK (selective - acknowledgment) [13]) on a single packet. Therefore, an + acknowledgment) [RFC2018]) on a single packet. Therefore, an implementation may choose to send duplicate ACKs containing the additional signaling information. This changes the semantics of a duplicate ACK; these are usually only sent as a signal of a lost - segment [14] in regular TCP. Therefore, an MPTCP implementation + segment [RFC5681] in regular TCP. Therefore, an MPTCP implementation receiving a duplicate ACK that contains an MPTCP option MUST NOT treat it as a signal of congestion. Additionally, an MPTCP implementation SHOULD NOT send more than two duplicate ACKs in a row for the purposes of sending MPTCP options alone, in order to ensure no middleboxes misinterpret this as a sign of congestion. Furthermore, standard TCP validity checks (such as ensuring the sequence number and acknowledgment number are within window) MUST be - undertaken before processing any MPTCP signals, as described in [12], - and initial subflow sequence numbers SHOULD be generated according to - the recommendations in [15]. 
+ undertaken before processing any MPTCP signals, as described in + [RFC5961], and initial subflow sequence numbers SHOULD be generated + according to the recommendations in [RFC6528]. 3.1. Connection Initiation Connection initiation begins with a SYN, SYN/ACK, ACK exchange on a single path. Each packet contains the Multipath Capable (MP_CAPABLE) MPTCP option (Figure 4). This option declares its sender is capable of performing Multipath TCP and wishes to do so on this particular connection. The MP_CAPABLE exchange in this specification (v1) is different to - that specified in v0 [5]. If a host supports multiple versions of - MPTCP, the sender of the MP_CAPABLE option SHOULD signal the highest - version number it supports. The passive opener, on receipt of this, - will signal the version number it wishes to use, which MUST be equal - to or lower than the version number indicated in the initial + that specified in v0 [RFC6824]. If a host supports multiple versions + of MPTCP, the sender of the MP_CAPABLE option SHOULD signal the + highest version number it supports. The passive opener, on receipt + of this, will signal the version number it wishes to use, which MUST + be equal to or lower than the version number indicated in the initial MP_CAPABLE. Given the SYN exchange is different between v1 and v0 the exchange cannot be immediately downgraded, and therefore if the far end has requested a lower version then the initiator SHOULD respond with an ACK without any MP_CAPABLE option, to fall back to regular TCP. If the initiator supports the requested version, on future connections to the target host, the initiator MAY cache the version preference. Alternatively, the initiator MAY close the connection with a TCP RST and immediately re-establish with the requested version of MPTCP. @@ -674,22 +679,22 @@ identify the connection using a 32-bit "token". This token is a cryptographic hash of this key. 
The algorithm for this process is dependent on the authentication algorithm selected; the method of selection is defined later in this section. Upon reception of the initial SYN-segment, a stateful server generates a random key and replies with a SYN/ACK. The key's method of generation is implementation specific. The key MUST be hard to guess, and it MUST be unique for the sending host at any one time. Recommendations for generating random numbers for use in keys are - given in [16]. Connections will be indexed at each host by the token - (a one-way hash of the key). Therefore, an implementation will + given in [RFC4086]. Connections will be indexed at each host by the + token (a one-way hash of the key). Therefore, an implementation will require a mapping from each token to the corresponding connection, and in turn to the keys for the connection. There is a risk that two different keys will hash to the same token. The risk of hash collisions is usually small, unless the host is handling many tens of thousands of connections. Therefore, an implementation SHOULD check its list of connection tokens to ensure there is not a collision before sending its key, and if there is, then it should generate a new key. This would, however, be costly for a server with thousands of connections. The subflow handshake @@ -705,23 +710,23 @@ free to exchange cryptographic material out-of-band and generate these keys from this, in order to provide additional mechanisms by which to verify the identity of the communicating entities. For example, an implementation could choose to link its MPTCP keys to those used in higher-layer TLS or SSH connections. If the server behaves in a stateless manner, it has to generate its own key in a verifiable fashion. This verifiable way of generating the key can be done by using a hash of the 4-tuple, sequence number and a local secret (similar to what is done for the TCP-sequence - number [17]). 
It will thus be able to verify whether it is indeed - the originator of the key echoed back in the later MP_CAPABLE option. - As for a stateful server, the tokens SHOULD be checked for + number [RFC4987]). It will thus be able to verify whether it is + indeed the originator of the key echoed back in the later MP_CAPABLE + option. As for a stateful server, the tokens SHOULD be checked for uniqueness, however if uniqueness is not met, and there is no way to generate an alternative verifiable key, then the connection MUST fall back to using regular TCP by not sending a MP_CAPABLE in the SYN/ACK. The ACK carries both A's key and B's key. This is the first time that A's key is seen on the wire, although it is expected that A will have generated a key locally before the initial SYN. The echoing of B's key allows B to operate statelessly, as described above. Therefore, A's key must be delivered reliably to B, and in order to do this, the transmission of this packet must be made reliable. @@ -786,27 +791,27 @@ "G" to 0. A crypto algorithm MUST be specified. If flag bits C through H are all 0, the MP_CAPABLE option MUST be treated as invalid and ignored (that is, it must be treated as a regular TCP handshake). The selection of the authentication algorithm also impacts the algorithm used to generate the token and the Initial Data Sequence Number (IDSN). In this specification, with only the SHA-1 algorithm (bit "H") specified and selected, the token MUST be a truncated (most - significant 32 bits) SHA-1 hash ([4], [18]) of the key. A different, - 64-bit truncation (the least significant 64 bits) of the SHA-1 hash - of the key MUST be used as the IDSN. Note that the key MUST be - hashed in network byte order. Also note that the "least significant" - bits MUST be the rightmost bits of the SHA-1 digest, as per [4]. - Future specifications of the use of the crypto bits may choose to - specify different algorithms for token and IDSN generation. 
+ significant 32 bits) SHA-1 hash ([sha1], [RFC6234]) of the key. A + different, 64-bit truncation (the least significant 64 bits) of the + SHA-1 hash of the key MUST be used as the IDSN. Note that the key + MUST be hashed in network byte order. Also note that the "least + significant" bits MUST be the rightmost bits of the SHA-1 digest, as + per [sha1]. Future specifications of the use of the crypto bits may + choose to specify different algorithms for token and IDSN generation. Both the crypto and checksum bits negotiate capabilities in similar ways. For the Checksum Required bit (labeled "A"), if either host requires the use of checksums, checksums MUST be used. In other words, the only way for checksums not to be used is if both hosts in their SYNs set A=0. This decision is confirmed by the setting of the "A" bit in the third packet (the ACK) of the handshake. For example, if the initiator sets A=0 in the SYN, but the responder sets A=1 in the SYN/ACK, checksums MUST be used in both directions, and the initiator will set A=1 in the ACK. The decision whether to use @@ -891,32 +896,32 @@ algorithm. An MP_JOIN option is present in the SYN, SYN/ACK, and ACK of the three-way handshake, although in each case with a different format. In the first MP_JOIN on the SYN packet, illustrated in Figure 5, the initiator sends a token, random number, and address ID. The token is used to identify the MPTCP connection and is a cryptographic hash of the receiver's key, as exchanged in the initial MP_CAPABLE handshake (Section 3.1). In this specification, the - tokens presented in this option are generated by the SHA-1 ([4], - [18]) algorithm, truncated to the most significant 32 bits. The + tokens presented in this option are generated by the SHA-1 ([sha1], + [RFC6234]) algorithm, truncated to the most significant 32 bits. 
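The token and IDSN truncations described above can be sketched in a few lines of Python. This is a minimal illustration, not text from the draft: the function name derive_token_and_idsn is hypothetical, and the 8-octet key length is taken from RFC 6824 rather than from the passage above.

```python
import hashlib
from typing import Tuple


def derive_token_and_idsn(key: bytes) -> Tuple[int, int]:
    """Derive (token, idsn) from a key given in network byte order.

    Hypothetical helper for illustration only. The 8-octet key length is
    an assumption carried over from RFC 6824.
    """
    if len(key) != 8:
        raise ValueError("expected a 64-bit (8-octet) key")
    digest = hashlib.sha1(key).digest()  # 160-bit / 20-octet SHA-1 digest
    # Token: the most significant (leftmost) 32 bits of the digest.
    token = int.from_bytes(digest[:4], "big")
    # IDSN: the least significant (rightmost) 64 bits of the digest.
    idsn = int.from_bytes(digest[-8:], "big")
    return token, idsn
```

Because both values come from the same digest, a host that knows the peer's key can compute the peer's token and IDSN without any further exchange. In an MP_JOIN SYN, Host A would send the token derived from Key-B.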
The token included in the MP_JOIN option is the token that the receiver of the packet uses to identify this connection; i.e., Host A will send Token-B (which is generated from Key-B). Note that the hash generation algorithm can be overridden by the choice of cryptographic handshake algorithm, as defined in Section 3.1. The MP_JOIN SYN sends not only the token (which is static for a connection) but also random numbers (nonces) that are used to prevent replay attacks on the authentication method. Recommendations for the - generation of random numbers for this purpose are given in [16]. + generation of random numbers for this purpose are given in [RFC4086]. The MP_JOIN option includes an "Address ID". This is an identifier that only has significance within a single connection, where it identifies the source address of this packet, even if the IP header has been changed in transit by a middlebox. The Address ID allows address removal (Section 3.4.2) without needing to know what the source address at the receiver is, thus allowing address removal through NATs. The Address ID also allows correlation between new subflow setup attempts and address signaling (Section 3.4.1), to prevent setting up duplicate subflows on the same path, if an MP_JOIN @@ -967,28 +972,28 @@ that the 32-bit token in the MP_JOIN SYN gives sufficient protection against blind state exhaustion attacks; therefore, there is no need to provide mechanisms to allow a responder to operate statelessly at the MP_JOIN stage. An HMAC is sent by both hosts -- by the initiator (Host A) in the third packet (the ACK) and by the responder (Host B) in the second packet (the SYN/ACK). Doing the HMAC exchange at this stage allows both hosts to have first exchanged random data (in the first two SYN packets) that is used as the "message". 
This specification defines - that HMAC as defined in [11] is used, along with the SHA-1 hash - algorithm [4] (potentially implemented as in [18]), thus generating a - 160-bit / 20-octet HMAC. Due to option space limitations, the HMAC - included in the SYN/ACK is truncated to the leftmost 64 bits, but - this is acceptable since random numbers are used; thus, an attacker - only has one chance to guess the HMAC correctly (if the HMAC is - incorrect, the TCP connection is closed, so a new MP_JOIN negotiation - with a new random number is required). + that HMAC as defined in [RFC2104] is used, along with the SHA-1 hash + algorithm [sha1] (potentially implemented as in [RFC6234]), thus + generating a 160-bit / 20-octet HMAC. Due to option space + limitations, the HMAC included in the SYN/ACK is truncated to the + leftmost 64 bits, but this is acceptable since random numbers are + used; thus, an attacker only has one chance to guess the HMAC + correctly (if the HMAC is incorrect, the TCP connection is closed, so + a new MP_JOIN negotiation with a new random number is required). The initiator's authentication information is sent in its first ACK (the third packet of the handshake), as shown in Figure 7. This data needs to be sent reliably, since it is the only time this HMAC is sent; therefore, receipt of this packet MUST trigger a regular TCP ACK in response, and the packet MUST be retransmitted if this ACK is not received. In other words, sending the ACK/MP_JOIN packet places the subflow in the PRE_ESTABLISHED state, and it moves to the ESTABLISHED state only on receipt of an ACK from the receiver. It is not permitted to send data while in the PRE_ESTABLISHED state. The @@ -1183,21 +1188,21 @@ 3.3.1. Data Sequence Mapping The data stream as a whole can be reassembled through the use of the data sequence mapping components of the DSS option (Figure 9), which define the mapping from the subflow sequence number to the data sequence number. 
This is used by the receiver to ensure in-order delivery to the application layer. Meanwhile, the subflow-level sequence numbers (i.e., the regular sequence numbers in the TCP header) have subflow-only relevance. It is expected (but not - mandated) that SACK [13] is used at the subflow level to improve + mandated) that SACK [RFC2018] is used at the subflow level to improve efficiency. The data sequence mapping specifies a mapping from subflow sequence space to data sequence space. This is expressed in terms of starting sequence numbers for the subflow and the data level, and a length of bytes for which this mapping is valid. This explicit mapping for a range of data was chosen rather than per-packet signaling to assist with compatibility with situations where TCP/IP segmentation or coalescing is undertaken separately from the stack that is generating the data flow (e.g., through the use of TCP segmentation offloading @@ -1226,21 +1231,21 @@ The data sequence mapping also contains a checksum of the data that this mapping covers, if use of checksums has been negotiated at the MP_CAPABLE exchange. Checksums are used to detect if the payload has been adjusted in any way by a non-MPTCP-aware middlebox. If this checksum fails, it will trigger a failure of the subflow, or a fallback to regular TCP, as documented in Section 3.8, since MPTCP can no longer reliably know the subflow sequence space at the receiver to build data sequence mappings. - The checksum algorithm used is the standard TCP checksum [1], + The checksum algorithm used is the standard TCP checksum [RFC0793], operating over the data covered by this mapping, along with a pseudo-header as shown in Figure 10. 
 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +--------------------------------------------------------------+ | | | Data Sequence Number (8 octets) | | | +--------------------------------------------------------------+ @@ -1291,21 +1296,21 @@ numbers is not required, then an implementation MAY include just the lower 32 bits of the data sequence number in the data sequence mapping and/or Data ACK as an optimization, and an implementation can make this choice independently for each packet. An implementation MUST be able to receive and process both 64-bit and 32-bit sequence number values, but it is not required that an implementation is able to send both. An implementation MUST send the full 64-bit data sequence number if it is transmitting at a sufficiently high rate that the 32-bit value - could wrap within the Maximum Segment Lifetime (MSL) [19]. The + could wrap within the Maximum Segment Lifetime (MSL) [RFC1323]. The lengths of the DSNs used in these values (which may be different) are declared with flags in the DSS option. Implementations MUST accept a 32-bit DSN and implicitly promote it to a 64-bit quantity by incrementing the upper 32 bits of sequence number each time the lower 32 bits wrap. A sanity check MUST be implemented to ensure that a wrap occurs at an expected time (e.g., the sequence number jumps from a very high number to a very low number) and is not triggered by out-of-order packets. As with the standard TCP sequence number, the data sequence number @@ -1599,21 +1604,21 @@ For example, a highly asymmetric path may be misdiagnosed as underperforming. A RST for this purpose SHOULD be accompanied with an appropriate MP_TCPRST option (Section 3.6). 3.3.7. Congestion Control Considerations Different subflows in an MPTCP connection have different congestion windows. 
To achieve fairness at bottlenecks and resource pooling, it is necessary to couple the congestion windows in use on each subflow, in order to push most traffic to uncongested links. One algorithm - for achieving this is presented in [6]; the algorithm does not + for achieving this is presented in [RFC6356]; the algorithm does not achieve perfect resource pooling but is "safe" in that it is readily deployable in the current Internet. By this, we mean that it does not take up more capacity on any one path than if it was a single path flow using only that route, so this ensures fair coexistence with single-path TCP at shared bottlenecks. It is foreseeable that different congestion controllers will be implemented for MPTCP, each aiming to achieve different properties in the resource pooling/fairness/stability design space, as well as those for achieving different properties in quality of service, @@ -1625,49 +1630,49 @@ for each subflow, which packets were lost and when. 3.3.8. Subflow Policy Within a local MPTCP implementation, a host may use any local policy it wishes to decide how to share the traffic to be sent over the available paths. In the typical use case, where the goal is to maximize throughput, all available paths will be used simultaneously for data transfer, - using coupled congestion control as described in [6]. It is + using coupled congestion control as described in [RFC6356]. It is expected, however, that other use cases will appear. For instance, a possibility is an 'all-or-nothing' approach, i.e., have a second path ready for use in the event of failure of the first path, but alternatives could include entirely saturating one path before using an additional path (the 'overflow' case). Such choices would be most likely based on the monetary cost of links, but may also be based on properties such as the delay or jitter of links, where stability (of delay or bandwidth) is more important than throughput. 
Application requirements such as these are discussed in - detail in [7]. + detail in [RFC6897]. The ability to make effective choices at the sender requires full knowledge of the path "cost", which is unlikely to be the case. It would be desirable for a receiver to be able to signal their own preferences for paths, since they will often be the multihomed party, and may have to pay for metered incoming bandwidth. Whilst fine-grained control may be the most powerful solution, that would require some mechanism such as overloading the Explicit - Congestion Notification (ECN) signal [20], which is undesirable, and - it is felt that there would not be sufficient benefit to justify an - entirely new signal. Therefore, the MP_JOIN option (see Section 3.2) - contains the 'B' bit, which allows a host to indicate to its peer - that this path should be treated as a backup path to use only in the - event of failure of other working subflows (i.e., a subflow where the - receiver has indicated B=1 SHOULD NOT be used to send data unless - there are no usable subflows where B=0). + Congestion Notification (ECN) signal [RFC3168], which is undesirable, + and it is felt that there would not be sufficient benefit to justify + an entirely new signal. Therefore, the MP_JOIN option (see + Section 3.2) contains the 'B' bit, which allows a host to indicate to + its peer that this path should be treated as a backup path to use + only in the event of failure of other working subflows (i.e., a + subflow where the receiver has indicated B=1 SHOULD NOT be used to + send data unless there are no usable subflows where B=0). In the event that the available set of paths changes, a host may wish to signal a change in priority of subflows to the peer (e.g., a subflow that was previously set as backup should now take priority over all remaining subflows). Therefore, the MP_PRIO option, shown in Figure 11, can be used to change the 'B' flag of the subflow on which it is sent. 
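The backup rule above (a subflow with B=1 is used only when no usable subflow has B=0) can be illustrated with a short non-normative Python sketch. The Subflow record and the eligible_subflows function are hypothetical names introduced here for illustration; how a real implementation decides that a subflow is "usable" is a local matter and is reduced to a boolean in this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subflow:
    name: str     # illustrative label only
    usable: bool  # local judgment that the subflow can currently carry data
    backup: bool  # the 'B' flag the peer has signalled for this subflow


def eligible_subflows(subflows: List[Subflow]) -> List[Subflow]:
    """Return the subflows a sender should consider for new data."""
    usable = [s for s in subflows if s.usable]
    regular = [s for s in usable if not s.backup]
    # Backup subflows are considered only when no regular subflow is usable.
    return regular if regular else usable
```

Receiving MP_PRIO on a subflow would, in this model, simply update that subflow's backup field before the next scheduling decision.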
1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 @@ -1696,21 +1701,21 @@ to its peer that an address is temporarily unavailable (for example, if it has radio coverage issues) and the peer should therefore drop to backup state on all subflows using that Address ID. 3.4. Address Knowledge Exchange (Path Management) We use the term "path management" to refer to the exchange of information about additional paths between hosts, which in this design is managed by multiple addresses at hosts. For more detail of the architectural thinking behind this design, see the MPTCP - Architecture document [2]. + Architecture document [RFC6182]. This design makes use of two methods of sharing such information, and both can be used on a connection. The first is the direct setup of new subflows, already described in Section 3.2, where the initiator has an additional address. The second method, described in the following subsections, signals addresses explicitly to the other host to allow it to initiate new subflows. The two mechanisms are complementary: the first is implicit and simple, while the explicit is more complex but is more robust. Together, the mechanisms allow addresses to change in flight (and thus support operation through @@ -1753,22 +1758,22 @@ instance, signaling addresses in other address families can only be done explicitly using the Add Address option. 3.4.1. Address Advertisement The Add Address (ADD_ADDR) MPTCP option announces additional addresses (and optionally, ports) on which a host can be reached (Figure 12). This option can be used at any time during a connection, depending on when the sender wishes to enable multiple paths and/or when paths become available. As with all MPTCP signals, - the receiver MUST undertake standard TCP validity checks, e.g. [12], - before acting upon it. + the receiver MUST undertake standard TCP validity checks, e.g. + [RFC5961], before acting upon it. 
Every address has an Address ID that can be used for uniquely identifying the address within a connection for address removal. This is also used to identify MP_JOIN options (see Section 3.2) relating to the same address, even when address translators are in use. The Address ID MUST uniquely identify the address to the sender (within the scope of the connection), but the mechanism for allocating such IDs is implementation specific. All address IDs learned via either MP_JOIN or ADD_ADDR SHOULD be @@ -1796,35 +1801,35 @@ the explicit specification of a different port is required. If no port is specified, MPTCP SHOULD attempt to connect to the specified address on the same port as is already in use by the subflow on which the ADD_ADDR signal was sent; this is discussed in more detail in Section 3.10. The Truncated HMAC present in this Option is the rightmost 64 bits of an HMAC, negotiated and calculated in the same way as for MP_JOIN as described in Section 3.2. For this specification of MPTCP, as there is only one hash algorithm option specified, this will be HMAC as - defined in [11], using the SHA-1 hash algorithm [4], implemented as - in [18]. In the same way as for MP_JOIN, the key for the HMAC - algorithm, in the case of the message transmitted by Host A, will be - Key-A followed by Key-B, and in the case of Host B, Key-B followed by - Key-A. These are the keys that were exchanged in the original - MP_CAPABLE handshake. The message for the HMAC is the Address ID, IP - Address, and Port which precede the HMAC in the ADD_ADDR option. If - the port is not present in the ADD_ADDR option, the HMAC message will - nevertheless include two octets of value zero. The rationale for the - HMAC is to prevent unauthorized entities from injecting ADD_ADDR - signals in an attempt to hijack a connection. Note that additionally - the presence of this HMAC prevents the address being changed in - flight unless the key is known by an intermediary. 
If a host - receives an ADD_ADDR option for which it cannot validate the HMAC, it - SHOULD silently ignore the option. + defined in [RFC2104], using the SHA-1 hash algorithm [sha1], + implemented as in [RFC6234]. In the same way as for MP_JOIN, the key + for the HMAC algorithm, in the case of the message transmitted by + Host A, will be Key-A followed by Key-B, and in the case of Host B, + Key-B followed by Key-A. These are the keys that were exchanged in + the original MP_CAPABLE handshake. The message for the HMAC is the + Address ID, IP Address, and Port which precede the HMAC in the + ADD_ADDR option. If the port is not present in the ADD_ADDR option, + the HMAC message will nevertheless include two octets of value zero. + The rationale for the HMAC is to prevent unauthorized entities from + injecting ADD_ADDR signals in an attempt to hijack a connection. + Note that additionally the presence of this HMAC prevents the address + being changed in flight unless the key is known by an intermediary. + If a host receives an ADD_ADDR option for which it cannot validate + the HMAC, it SHOULD silently ignore the option. A set of four flags are present after the subtype and before the Address ID. These are currently unassigned and MUST be set to zero by a sender and MUST be ignored by the receiver. 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +---------------+---------------+-------+-------+---------------+ | Kind | Length |Subtype|(resvd)| Address ID | +---------------+---------------+-------+-------+---------------+ @@ -1833,21 +1838,21 @@ | Port (2 octets, optional) | | +-------------------------------+ | | Truncated HMAC (8 octets) | | +-------------------------------+ | | +-------------------------------+ Figure 12: Add Address (ADD_ADDR) Option Due to the proliferation of NATs, it is reasonably likely that one - host may attempt to advertise private addresses [21]. It is not + host may attempt to advertise private addresses [RFC1918]. 
It is not desirable to prohibit this, since there may be cases where both hosts have additional interfaces on the same private network, and a host MAY want to advertise such addresses. The MP_JOIN handshake to create a new subflow (Section 3.2) provides mechanisms to minimize security risks. The MP_JOIN message contains a 32-bit token that uniquely identifies the connection to the receiving host. If the token is unknown, the host will return with a RST. In the unlikely event that the token is known, subflow setup will continue, but the HMAC exchange must occur for authentication. This will fail, and will provide sufficient protection against two unconnected hosts @@ -1887,43 +1892,44 @@ attempt on a previously advertised address/port combination can therefore refresh ADD_ADDR information by sending the option again. During normal MPTCP operation, it is unlikely that there will be sufficient TCP option space for ADD_ADDR to be included along with those for data sequence numbering (Section 3.3.1). Therefore, it is expected that an MPTCP implementation will send the ADD_ADDR option on separate ACKs. As discussed earlier, however, an MPTCP implementation MUST NOT treat duplicate ACKs with any MPTCP option, with the exception of the DSS option, as indications of congestion - [14], and an MPTCP implementation SHOULD NOT send more than two + [RFC5681], and an MPTCP implementation SHOULD NOT send more than two duplicate ACKs in a row for signaling purposes. 3.4.2. Remove Address If, during the lifetime of an MPTCP connection, a previously announced address becomes invalid (e.g., if the interface disappears), the affected host SHOULD announce this so that the peer can remove subflows related to this address. This is achieved through the Remove Address (REMOVE_ADDR) option (Figure 13), which will remove a previously added address (or list of addresses) from a connection and terminate any subflows currently using that address. 
For security purposes, if a host receives a REMOVE_ADDR option, it must ensure the affected path(s) are no longer in use before it instigates closure. The receipt of REMOVE_ADDR SHOULD first trigger - the sending of a TCP keepalive [22] on the path, and if a response is - received the path SHOULD NOT be removed. Typical TCP validity tests - on the subflow (e.g., ensuring sequence and ACK numbers are correct) - MUST also be undertaken. An implementation can use indications of - these test failures as part of intrusion detection or error logging. + the sending of a TCP keepalive [RFC1122] on the path, and if a + response is received the path SHOULD NOT be removed. Typical TCP + validity tests on the subflow (e.g., ensuring sequence and ACK + numbers are correct) MUST also be undertaken. An implementation can + use indications of these test failures as part of intrusion detection + or error logging. The sending and receipt (if no keepalive response was received) of this message SHOULD trigger the sending of RSTs by both hosts on the affected subflow(s) (if possible), as a courtesy to cleaning up middlebox state, before cleaning up any local state. Address removal is undertaken by ID, so as to permit the use of NATs and other middleboxes that rewrite source addresses. If there is no address at the requested ID, the receiver will silently ignore the request. @@ -1949,24 +1955,24 @@ remaining subflows. MPTCP's connection will stay alive at the data level, in order to permit break-before-make handover between subflows. It is therefore necessary to provide an MPTCP-level "reset" to allow the abrupt closure of the whole MPTCP connection, and this is the MP_FASTCLOSE option. MP_FASTCLOSE is used to indicate to the peer that the connection will be abruptly closed and no data will be accepted anymore. The reasons for triggering an MP_FASTCLOSE are implementation specific. Regular TCP does not allow sending a RST while the connection is in a - synchronized state [1]. 
Nevertheless, implementations allow the - sending of a RST in this state, if, for example, the operating system - is running out of resources. In these cases, MPTCP should send the - MP_FASTCLOSE. This option is illustrated in Figure 14. + synchronized state [RFC0793]. Nevertheless, implementations allow + the sending of a RST in this state, if, for example, the operating + system is running out of resources. In these cases, MPTCP should + send the MP_FASTCLOSE. This option is illustrated in Figure 14. 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +---------------+---------------+-------+-----------------------+ | Kind | Length |Subtype| (reserved) | +---------------+---------------+-------+-----------------------+ | Option Receiver's Key | | (64 bits) | | | +---------------------------------------------------------------+ @@ -2096,21 +2102,22 @@ +---------------+---------------+-------+-------+---------------+ | Kind | Length |Subtype|S|U|rsv| Experiment | +---------------+---------------+-------+-------+---------------+ | Id. (16 bits) | Subtype-specific data (variable length) ... +----------------------------------------------------------- ... Figure 16: MPTCP Experimental (MP_EXPERIMENTAL) Option Figure 16 shows the format of the experimental option. The Experiment identifier is a 16-bit integer that shall be assigned by - using the same procedure as defined in [23]. + using the same procedure as defined in [RFC6994]; a request to IANA + is made in Section 8.4. The two high order flags that are included in the MPTCP Experimental option have the following semantics: o "S" flag (highest order bit) : This is the synchronising bit. When set to 1, it indicates that the host sending this option expects a reply from the remote host with an option having the same experiment identifier, but possibly containing other data.
o "U" flag (second highest order bit) : When set to 1, this flag @@ -2371,21 +2379,22 @@ This strategy is intended to maximize the probability of the SYN being permitted by a firewall or NAT at the recipient and to avoid confusing any network monitoring software. There may also be cases, however, where the passive opener wishes to signal to the other host that a specific port should be used, and this facility is provided in the Add Address option as documented in Section 3.4.1. It is therefore feasible to allow multiple subflows between the same two addresses but using different port pairs, and such a facility could be used to allow load balancing within the - network based on 5-tuples (e.g., some ECMP implementations [8]). + network based on 5-tuples (e.g., some ECMP implementations + [RFC2992]). 3.10.2. Delayed Subflow Start and Subflow Symmetry Many TCP connections are short-lived and consist only of a few segments, and so the overheads of using MPTCP outweigh any benefits. A heuristic is required, therefore, to decide when to start using additional subflows in an MPTCP connection. We expect that experience gathered from deployments will provide further guidance on this, and will be affected by particular application characteristics (which are likely to change over time). However, a suggested @@ -2522,30 +2531,30 @@ per-connection local policy. Adding an address to one connection (either explicitly through an Add Address message, or implicitly through a Join) has no implication for other connections between the same pair of hosts. 5-tuple: The 5-tuple (protocol, local address, local port, remote address, remote port) presented by kernel APIs to the application layer in a non-multipath-aware application is that of the first subflow, even if the subflow has since been closed and removed from the connection. This decision, and other related API issues, - are discussed in more detail in [7]. + are discussed in more detail in [RFC6897]. 5. 
Security Considerations - As identified in [10], the addition of multipath capability to TCP - will bring with it a number of new classes of threat. In order to - prevent these, [2] presents a set of requirements for a security - solution for MPTCP. The fundamental goal is for the security of - MPTCP to be "no worse" than regular TCP today, and the key security - requirements are: + As identified in [RFC6181], the addition of multipath capability to + TCP will bring with it a number of new classes of threat. In order + to prevent these, [RFC6182] presents a set of requirements for a + security solution for MPTCP. The fundamental goal is for the + security of MPTCP to be "no worse" than regular TCP today, and the + key security requirements are: o Provide a mechanism to confirm that the parties in a subflow handshake are the same as in the original connection setup. o Provide verification that the peer can receive traffic at a new address before using it as part of a connection. o Provide replay protection, i.e., ensure that a request to add/ remove a subflow is 'fresh'. @@ -2559,30 +2568,30 @@ cryptographic material, future subflows use a truncated cryptographic hash of this key as the connection identification "token". The keys are concatenated and used as keys for creating Hash-based Message Authentication Codes (HMACs) used on subflow setup, in order to verify that the parties in the handshake are the same as in the original connection setup. It also provides verification that the peer can receive traffic at this new address. Replay attacks would still be possible when only keys are used; therefore, the handshakes use single-use random numbers (nonces) at both ends -- this ensures the HMAC will never be the same on two handshakes. Guidance on - generating random numbers suitable for use as keys is given in [16] - and discussed in Section 3.1. + generating random numbers suitable for use as keys is given in + [RFC4086] and discussed in Section 3.1. 
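The key concatenation described above can be illustrated with the ADD_ADDR Truncated HMAC of Section 3.4.1, which is the rightmost 64 bits of an HMAC-SHA1 keyed with the sender's key followed by the peer's key. The following non-normative Python sketch assumes the HMAC message is the 1-octet Address ID, the packed IPv4 or IPv6 address, and a 2-octet port in network byte order (zero when no port is present). This octet layout and the function name add_addr_hmac are illustrative assumptions, not definitions from this specification.

```python
import hashlib
import hmac
import ipaddress
import struct
from typing import Optional


def add_addr_hmac(local_key: bytes, remote_key: bytes, address_id: int,
                  address: str, port: Optional[int] = None) -> bytes:
    """Sketch of the 8-octet Truncated HMAC carried in ADD_ADDR."""
    packed_ip = ipaddress.ip_address(address).packed  # 4 or 16 octets
    # An absent port still contributes two zero octets to the message.
    message = (struct.pack("!B", address_id) + packed_ip
               + struct.pack("!H", port or 0))
    # Sender's key first, then the peer's key (Key-A||Key-B for Host A).
    digest = hmac.new(local_key + remote_key, message, hashlib.sha1).digest()
    return digest[-8:]  # rightmost 64 bits of the 160-bit HMAC-SHA1 output
```

Because each host places its own key first, the value computed by Host A for a given address differs from the value Host B would compute for the same address, and an on-path party without the keys cannot rewrite the advertised address without invalidating the HMAC.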
The use of crypto capability bits in the initial connection handshake to negotiate use of a particular algorithm allows the deployment of additional crypto mechanisms in the future. Note that this would be susceptible to bid-down attacks only if the attacker was on-path (and thus would be able to modify the data anyway). The security mechanism presented in this document should therefore protect against - all forms of flooding and hijacking attacks discussed in [10]. + all forms of flooding and hijacking attacks discussed in [RFC6181]. During normal operation, regular TCP protection mechanisms (such as ensuring sequence numbers are in-window) will provide the same level of protection against attacks on individual TCP subflows as exists for regular TCP today. Implementations will introduce additional buffers compared to regular TCP, to reassemble data at the connection level. The application of window sizing will minimize the risk of denial-of-service attacks consuming resources. As discussed in Section 3.4.1, a host may advertise its private @@ -2596,42 +2605,42 @@ implementations should consider heuristics (Section 3.10) at both the sender and receiver to reduce the impact of this. A small security risk could theoretically exist with key reuse, but in order to accomplish a replay attack, both the sender and receiver keys, and the sender and receiver random numbers, in the MP_JOIN handshake (Section 3.2) would have to match. Whilst this specification defines a "medium" security solution, meeting the criteria specified at the start of this section and the - threat analysis ([10]), since attacks only ever get worse, it is + threat analysis ([RFC6181]), since attacks only ever get worse, it is likely that a future Standards Track version of MPTCP would need to be able to support stronger security. 
There are several ways the security of MPTCP could potentially be improved; some of these would be compatible with MPTCP as defined in this document, whilst others may not be. For now, the best approach is to get experience with the current approach, establish what might work, and check that the threat analysis is still accurate. Possible ways of improving MPTCP security could include: o defining a new MPTCP cryptographic algorithm, as negotiated in MP_CAPABLE. A sub-case could be to include an additional deployment assumption, such as stateful servers, in order to allow a more powerful algorithm to be used. o defining how to secure data transfer with MPTCP, whilst not changing the signaling part of the protocol. o defining security that requires more option space, perhaps in conjunction with a "long options" proposal for extending the TCP - options space (such as those surveyed in [24]), or perhaps + options space (such as those surveyed in [TCPLO]), or perhaps building on the current approach with a second stage of MPTCP- option-based security. o revisiting the working group's decision to exclusively use TCP options for MPTCP signaling, and instead look at also making use of the TCP payloads. MPTCP has been designed with several methods available to indicate a new security mechanism, including: @@ -2721,51 +2730,51 @@ of some Data ACKs, but performance will degrade as the fraction of stripped options increases. We do not expect such cases to appear in practice, though: most middleboxes will either strip all options or let them all through. We end this section with a list of middlebox classes, their behavior, and the elements in the MPTCP design that allow operation through such middleboxes.
Issues surrounding dropping packets with options or stripping options were discussed above, and are not included here: - o NATs [25] (Network Address (and Port) Translators) change the + o NATs [RFC3022] (Network Address (and Port) Translators) change the source address (and often source port) of packets. This means that a host will not know its public-facing address for signaling in MPTCP. Therefore, MPTCP permits implicit address addition via the MP_JOIN option, and the handshake mechanism ensures that - connection attempts to private addresses [21] do not cause + connection attempts to private addresses [RFC1918] do not cause problems. Explicit address removal is undertaken by an Address ID to allow no knowledge of the source address. - o Performance Enhancing Proxies (PEPs) [26] might proactively ACK - data to increase performance. MPTCP, however, relies on accurate - congestion control signals from the end host, and non-MPTCP-aware - PEPs will not be able to provide such signals. MPTCP will, - therefore, fall back to single-path TCP, or close the problematic - subflow (see Section 3.8). + o Performance Enhancing Proxies (PEPs) [RFC3135] might proactively + ACK data to increase performance. MPTCP, however, relies on + accurate congestion control signals from the end host, and non- + MPTCP-aware PEPs will not be able to provide such signals. MPTCP + will, therefore, fall back to single-path TCP, or close the + problematic subflow (see Section 3.8). - o Traffic Normalizers [27] may not allow holes in sequence numbers, - and may cache packets and retransmit the same data. MPTCP looks - like standard TCP on the wire, and will not retransmit different - data on the same subflow sequence number. In the event of a - retransmission, the same data will be retransmitted on the + o Traffic Normalizers [norm] may not allow holes in sequence + numbers, and may cache packets and retransmit the same data. 
+ MPTCP looks like standard TCP on the wire, and will not retransmit + different data on the same subflow sequence number. In the event + of a retransmission, the same data will be retransmitted on the original TCP subflow even if it is additionally retransmitted at the connection level on a different subflow. - o Firewalls [28] might perform initial sequence number randomization - on TCP connections. MPTCP uses relative sequence numbers in data - sequence mapping to cope with this. Like NATs, firewalls will not - permit many incoming connections, so MPTCP supports address - signaling (ADD_ADDR) so that a multiaddressed host can invite its - peer behind the firewall/NAT to connect out to its additional - interface. + o Firewalls [RFC2979] might perform initial sequence number + randomization on TCP connections. MPTCP uses relative sequence + numbers in data sequence mapping to cope with this. Like NATs, + firewalls will not permit many incoming connections, so MPTCP + supports address signaling (ADD_ADDR) so that a multiaddressed + host can invite its peer behind the firewall/NAT to connect out to + its additional interface. o Intrusion Detection Systems look out for traffic patterns and content that could threaten a network. Multipath will mean that such data is potentially spread, so it is more difficult for an IDS to analyze the whole traffic, and potentially increases the risk of false positives. However, for an MPTCP-aware IDS, tokens can be read by such systems to correlate multiple subflows and reassemble for analysis. 
o Application-level middleboxes such as content-aware firewalls may @@ -2805,35 +2814,37 @@ The authors also wish to acknowledge reviews and contributions from Iljitsch van Beijnum, Lars Eggert, Marcelo Bagnulo, Robert Hancock, Pasi Sarolahti, Toby Moncaster, Philip Eardley, Sergio Lembo, Lawrence Conroy, Yoshifumi Nishida, Bob Briscoe, Stein Gjessing, Andrew McGregor, Georg Hampel, Anumita Biswas, Wes Eddy, Alexey Melnikov, Francis Dupont, Adrian Farrel, Barry Leiba, Robert Sparks, Sean Turner, Stephen Farrell, Martin Stiemerling, and Gregory Detal. 8. IANA Considerations - This document updates [5] and as such IANA is requested to update the - TCP option space registry to point to this document for Multipath - TCP, as follows: + This document updates [RFC6824] and as such IANA is requested to + update the TCP option space registry to point to this document for + Multipath TCP, as follows: +------+--------+-----------------------+---------------+ | Kind | Length | Meaning | Reference | +------+--------+-----------------------+---------------+ | 30 | N | Multipath TCP (MPTCP) | This document | +------+--------+-----------------------+---------------+ Table 1: TCP Option Kind Numbers +8.1. MPTCP Option Subtypes + The 4-bit MPTCP subtype sub-registry ("MPTCP Option Subtypes" under the "Transmission Control Protocol (TCP) Parameters" registry) was - defined in [5]. This document defines one additional subtype + defined in [RFC6824]. This document defines one additional subtype (ADD_ADDR) and updates the references to this document for all sub- types except ADD_ADDR, which is deprecated. The updates are listed in the following table. 
+-------+-----------------+-------------------------+---------------+ | Value | Symbol | Name | Reference | +-------+-----------------+-------------------------+---------------+ | 0x0 | MP_CAPABLE | Multipath Capable | This | | | | | document, | | | | | Section 3.1 | @@ -2863,20 +2874,22 @@ | | | | Section 3.6 | | 0xf | MP_EXPERIMENTAL | MPTCP Experimental | This | | | | Option | document, | | | | | Section 3.7 | +-------+-----------------+-------------------------+---------------+ Table 2: MPTCP Option Subtypes Values 0x9 through 0xe are currently unassigned. +8.2. MPTCP Handshake Algorithms + IANA has created another sub-registry, "MPTCP Handshake Algorithms" under the "Transmission Control Protocol (TCP) Parameters" registry, based on the flags in MP_CAPABLE (Section 3.1). IANA is requested to update the references of this table to this document, as follows: +----------+-------------------+----------------------------+ | Flag Bit | Meaning | Reference | +----------+-------------------+----------------------------+ | A | Checksum required | This document, Section 3.1 | | B | Extensibility | This document, Section 3.1 | @@ -2884,177 +2897,213 @@ | H | HMAC-SHA1 | This document, Section 3.2 | +----------+-------------------+----------------------------+ Table 3: MPTCP Handshake Algorithms Note that the meanings of bits C through H can be dependent upon bit B, depending on how Extensibility is defined in future specifications; see Section 3.1 for more information. Future assignments in this registry are also to be defined by - Standards Action as defined by [29]. Assignments consist of the + Standards Action as defined by [RFC5226]. Assignments consist of the value of the flags, a symbolic name for the algorithm, and a reference to its specification. +8.3. 
MP_TCPRST Reason Codes + IANA is requested to create a further sub-registry, "MP_TCPRST Reason Codes" under the "Transmission Control Protocol (TCP) Parameters" registry, based on the reason code in MP_TCPRST (Section 3.6). The contents of this sub-registry are to refer to this document, as follows: +------+-----------------------------+----------------------------+ | Code | Meaning | Reference | +------+-----------------------------+----------------------------+ | 0x00 | Unspecified TCP error | This document, Section 3.6 | | 0x01 | MPTCP specific error | This document, Section 3.6 | | 0x02 | Lack of resources | This document, Section 3.6 | | 0x03 | Administratively prohibited | This document, Section 3.6 | | 0x04 | Too much outstanding data | This document, Section 3.6 | | 0x05 | Unacceptable performance | This document, Section 3.6 | | 0x06 | Middlebox interference | This document, Section 3.6 | +------+-----------------------------+----------------------------+ Table 4: MPTCP MP_TCPRST Reason Codes +8.4. Experimental option registry + + Section 3.7 has defined the MP_EXPERIMENTAL option for private, + experimental MPTCP options, and the same considerations as for + [RFC6994] apply. IANA should create a "Multipath TCP Experimental + Option Identifiers (MPTCP ExIDs)" sub-registry. This registry + contains the 16-bit ExIDs and a reference (description, document + pointer, or assignee name and e-mail contact) for each entry. MPTCP + ExIDs are assigned on a First Come, First Served (FCFS) basis + [RFC5226]. + + IANA will advise applicants of duplicate entries to select an + alternate value, as per typical FCFS processing. + + IANA will record known duplicate uses to assist the community in both + debugging assigned uses as well as correcting unauthorized duplicate + uses. + + IANA should impose no requirement on making a registration other than + indicating the desired codepoint and providing a point of contact.
A + short description or acronym for the use is desired but should not be + required. + 9. References 9.1. Normative References - [1] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, - DOI 10.17487/RFC0793, September 1981, + [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, + RFC 793, DOI 10.17487/RFC0793, September 1981, . - [2] Ford, A., Raiciu, C., Handley, M., Barre, S., and J. Iyengar, - "Architectural Guidelines for Multipath TCP Development", - RFC 6182, DOI 10.17487/RFC6182, March 2011, - . - - [3] Bradner, S., "Key words for use in RFCs to Indicate Requirement - Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/ + RFC2119, March 1997, . - [4] National Institute of Science and Technology, "Secure Hash + [RFC6182] Ford, A., Raiciu, C., Handley, M., Barre, S., and J. + Iyengar, "Architectural Guidelines for Multipath TCP + Development", RFC 6182, DOI 10.17487/RFC6182, March 2011, + . + + [sha1] National Institute of Science and Technology, "Secure Hash Standard", Federal Information Processing Standard - (FIPS) 180-3, October 2008, . + (FIPS) 180-3, October 2008, . 9.2. Informative References - [5] Ford, A., Raiciu, C., Handley, M., and O. Bonaventure, "TCP - Extensions for Multipath Operation with Multiple Addresses", - RFC 6824, DOI 10.17487/RFC6824, January 2013, - . - - [6] Raiciu, C., Handley, M., and D. Wischik, "Coupled Congestion - Control for Multipath Transport Protocols", RFC 6356, - DOI 10.17487/RFC6356, October 2011, - . + [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts - + Communication Layers", STD 3, RFC 1122, DOI 10.17487/ + RFC1122, October 1989, + . - [7] Scharf, M. and A. Ford, "Multipath TCP (MPTCP) Application - Interface Considerations", RFC 6897, DOI 10.17487/RFC6897, - March 2013, . + [RFC1323] Jacobson, V., Braden, R., and D. 
Borman, "TCP Extensions + for High Performance", RFC 1323, DOI 10.17487/RFC1323, + May 1992, . - [8] Hopps, C., "Analysis of an Equal-Cost Multi-Path Algorithm", - RFC 2992, DOI 10.17487/RFC2992, November 2000, - . + [RFC1918] Rekhter, Y., Moskowitz, B., Karrenberg, D., de Groot, G., + and E. Lear, "Address Allocation for Private Internets", + BCP 5, RFC 1918, DOI 10.17487/RFC1918, February 1996, + . - [9] Raiciu, C., Paasch, C., Barre, S., Ford, A., Honda, M., - Duchene, F., Bonaventure, O., and M. Handley, "How Hard Can It - Be? Designing and Implementing a Deployable Multipath TCP", - Usenix Symposium on Networked Systems Design and - Implementation 2012, . + [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP + Selective Acknowledgment Options", RFC 2018, DOI 10.17487/ + RFC2018, October 1996, + . - [10] Bagnulo, M., "Threat Analysis for TCP Extensions for Multipath - Operation with Multiple Addresses", RFC 6181, DOI 10.17487/ - RFC6181, March 2011, . + [RFC2104] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed- + Hashing for Message Authentication", RFC 2104, + DOI 10.17487/RFC2104, February 1997, + . - [11] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-Hashing - for Message Authentication", RFC 2104, DOI 10.17487/RFC2104, - February 1997, . + [RFC2979] Freed, N., "Behavior of and Requirements for Internet + Firewalls", RFC 2979, DOI 10.17487/RFC2979, October 2000, + . - [12] Ramaiah, A., Stewart, R., and M. Dalal, "Improving TCP's - Robustness to Blind In-Window Attacks", RFC 5961, DOI 10.17487/ - RFC5961, August 2010, . + [RFC2992] Hopps, C., "Analysis of an Equal-Cost Multi-Path + Algorithm", RFC 2992, DOI 10.17487/RFC2992, November 2000, + . - [13] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP - Selective Acknowledgment Options", RFC 2018, DOI 10.17487/ - RFC2018, October 1996, - . + [RFC3022] Srisuresh, P. and K. 
Egevang, "Traditional IP Network + Address Translator (Traditional NAT)", RFC 3022, + DOI 10.17487/RFC3022, January 2001, + . - [14] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion - Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, - . + [RFC3135] Border, J., Kojo, M., Griner, J., Montenegro, G., and Z. + Shelby, "Performance Enhancing Proxies Intended to + Mitigate Link-Related Degradations", RFC 3135, + DOI 10.17487/RFC3135, June 2001, + . - [15] Gont, F. and S. Bellovin, "Defending against Sequence Number - Attacks", RFC 6528, DOI 10.17487/RFC6528, February 2012, - . + [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition + of Explicit Congestion Notification (ECN) to IP", + RFC 3168, DOI 10.17487/RFC3168, September 2001, + . - [16] Eastlake 3rd, D., Schiller, J., and S. Crocker, "Randomness - Requirements for Security", BCP 106, RFC 4086, DOI 10.17487/ - RFC4086, June 2005, . + [RFC4086] Eastlake 3rd, D., Schiller, J., and S. Crocker, + "Randomness Requirements for Security", BCP 106, RFC 4086, + DOI 10.17487/RFC4086, June 2005, + . - [17] Eddy, W., "TCP SYN Flooding Attacks and Common Mitigations", - RFC 4987, DOI 10.17487/RFC4987, August 2007, + [RFC4987] Eddy, W., "TCP SYN Flooding Attacks and Common + Mitigations", RFC 4987, DOI 10.17487/RFC4987, August 2007, . - [18] Eastlake 3rd, D. and T. Hansen, "US Secure Hash Algorithms (SHA - and SHA-based HMAC and HKDF)", RFC 6234, DOI 10.17487/RFC6234, - May 2011, . + [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an + IANA Considerations Section in RFCs", BCP 26, RFC 5226, + DOI 10.17487/RFC5226, May 2008, + . - [19] Jacobson, V., Braden, R., and D. Borman, "TCP Extensions for - High Performance", RFC 1323, DOI 10.17487/RFC1323, May 1992, - . + [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion + Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, + . - [20] Ramakrishnan, K., Floyd, S., and D. 
Black, "The Addition of - Explicit Congestion Notification (ECN) to IP", RFC 3168, - DOI 10.17487/RFC3168, September 2001, - . + [RFC5961] Ramaiah, A., Stewart, R., and M. Dalal, "Improving TCP's + Robustness to Blind In-Window Attacks", RFC 5961, + DOI 10.17487/RFC5961, August 2010, + . - [21] Rekhter, Y., Moskowitz, B., Karrenberg, D., de Groot, G., and - E. Lear, "Address Allocation for Private Internets", BCP 5, - RFC 1918, DOI 10.17487/RFC1918, February 1996, - . + [RFC6181] Bagnulo, M., "Threat Analysis for TCP Extensions for + Multipath Operation with Multiple Addresses", RFC 6181, + DOI 10.17487/RFC6181, March 2011, + . - [22] Braden, R., Ed., "Requirements for Internet Hosts - - Communication Layers", STD 3, RFC 1122, DOI 10.17487/RFC1122, - October 1989, . + [RFC6234] Eastlake 3rd, D. and T. Hansen, "US Secure Hash Algorithms + (SHA and SHA-based HMAC and HKDF)", RFC 6234, + DOI 10.17487/RFC6234, May 2011, + . - [23] Touch, J., "Shared Use of Experimental TCP Options", RFC 6994, - DOI 10.17487/RFC6994, August 2013, - . + [RFC6356] Raiciu, C., Handley, M., and D. Wischik, "Coupled + Congestion Control for Multipath Transport Protocols", + RFC 6356, DOI 10.17487/RFC6356, October 2011, + . - [24] Ramaiah, A., "TCP option space extension", Work in Progress, - March 2012. + [RFC6528] Gont, F. and S. Bellovin, "Defending against Sequence + Number Attacks", RFC 6528, DOI 10.17487/RFC6528, + February 2012, . - [25] Srisuresh, P. and K. Egevang, "Traditional IP Network Address - Translator (Traditional NAT)", RFC 3022, DOI 10.17487/RFC3022, - January 2001, . + [RFC6824] Ford, A., Raiciu, C., Handley, M., and O. Bonaventure, + "TCP Extensions for Multipath Operation with Multiple + Addresses", RFC 6824, DOI 10.17487/RFC6824, January 2013, + . - [26] Border, J., Kojo, M., Griner, J., Montenegro, G., and Z. - Shelby, "Performance Enhancing Proxies Intended to Mitigate - Link-Related Degradations", RFC 3135, DOI 10.17487/RFC3135, - June 2001, . + [RFC6897] Scharf, M. 
and A. Ford, "Multipath TCP (MPTCP) Application + Interface Considerations", RFC 6897, DOI 10.17487/RFC6897, + March 2013, . - [27] Handley, M., Paxson, V., and C. Kreibich, "Network Intrusion - Detection: Evasion, Traffic Normalization, and End-to-End - Protocol Semantics", Usenix Security 2001, 2001, . + [RFC6994] Touch, J., "Shared Use of Experimental TCP Options", + RFC 6994, DOI 10.17487/RFC6994, August 2013, + . - [28] Freed, N., "Behavior of and Requirements for Internet - Firewalls", RFC 2979, DOI 10.17487/RFC2979, October 2000, - . + [TCPLO] Ramaiah, A., "TCP option space extension", Work + in Progress, March 2012. - [29] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA - Considerations Section in RFCs", BCP 26, RFC 5226, - DOI 10.17487/RFC5226, May 2008, - . + [howhard] Raiciu, C., Paasch, C., Barre, S., Ford, A., Honda, M., + Duchene, F., Bonaventure, O., and M. Handley, "How Hard + Can It Be? Designing and Implementing a Deployable + Multipath TCP", Usenix Symposium on Networked Systems + Design and Implementation 2012, . + + [norm] Handley, M., Paxson, V., and C. Kreibich, "Network + Intrusion Detection: Evasion, Traffic Normalization, and + End-to-End Protocol Semantics", Usenix Security 2001, + 2001, . Appendix A. Notes on Use of TCP Options The TCP option space is limited due to the length of the Data Offset field in the TCP header (4 bits), which defines the TCP header length in 32-bit words. With the standard TCP header being 20 bytes, this leaves a maximum of 40 bytes for options, and many of these may already be used by options such as timestamp and SACK. We have performed a brief study on the commonly used TCP options in @@ -3066,22 +3115,22 @@ bytes) options. Together these sum to 19 bytes. Some operating systems appear to pad each option up to a word boundary, thus using 24 bytes (a brief survey suggests Windows XP and Mac OS X do this, whereas Linux does not). 
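The option-space budget discussed in this appendix can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-option lengths used below (MSS 4, window scale 3, SACK permitted 2, timestamp 10 bytes) are the standard TCP option sizes and are assumed here because this excerpt starts after they are listed, and the names OPTION_SPACE, SYN_OPTIONS, pad_to_word and spare_bytes are hypothetical.

```python
# Sketch of the TCP option-space budget from Appendix A.
# Assumption: option lengths are the standard TCP option sizes;
# every name in this block is illustrative, not taken from the draft.

MAX_HEADER_BYTES = 15 * 4   # 4-bit Data Offset counts 32-bit words
BASE_HEADER_BYTES = 20      # standard TCP header without options
OPTION_SPACE = MAX_HEADER_BYTES - BASE_HEADER_BYTES  # 40 bytes

# Options commonly seen on SYN packets (assumed lengths, in bytes).
SYN_OPTIONS = {
    "MSS": 4,
    "window scale": 3,
    "SACK permitted": 2,
    "timestamp": 10,
}


def pad_to_word(length):
    """Round an option length up to the next 32-bit word boundary."""
    return (length + 3) // 4 * 4


def spare_bytes(options, pad_each=False):
    """Return the option space left once the given options are placed.

    With pad_each=True every option is padded to a word boundary,
    as some operating systems appear to do.
    """
    used = sum(pad_to_word(n) if pad_each else n for n in options.values())
    return OPTION_SPACE - used


if __name__ == "__main__":
    print("SYN, unpadded:", spare_bytes(SYN_OPTIONS))
    print("SYN, padded:  ", spare_bytes(SYN_OPTIONS, pad_each=True))
    print("data, unpadded:", spare_bytes({"timestamp": 10}))
    print("data, padded:  ", spare_bytes({"timestamp": 10}, pad_each=True))
```

The same helper covers data packets by passing only the timestamp option, which is the case the appendix compares against the maximum DSS option size.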
    Optimistically, therefore, we have 21 bytes
    spare, or 16 if it has to be word-aligned. In either case, however,
    the SYN versions of Multipath Capable (12 bytes) and Join (12 or 16
    bytes) options will fit in this remaining space.

    Note that due to the use of a 64-bit data-level sequence space, it is
    feasible that MPTCP will not require the timestamp option for
-   protection against wrapped sequence numbers (PAWS [19]), since the
-   data-level sequence space has far less chance of wrapping.
+   protection against wrapped sequence numbers (PAWS [RFC1323]), since
+   the data-level sequence space has far less chance of wrapping.
    Confirmation of the validity of this optimisation is for further
    study.

    TCP data packets typically carry timestamp options in every packet,
    taking 10 bytes (or 12 with padding). That leaves 30 bytes (or 28, if
    word-aligned). The Data Sequence Signal (DSS) option varies in length
    depending on whether the data sequence mapping and DATA_ACK are
    included, and whether the sequence numbers in use are 4 or 8 octets.
    The maximum size of the DSS option is 28 bytes, so even that will fit
    in the available space. But unless a connection is both
@@ -3131,23 +3180,23 @@
    with data in order to avoid interpretation as congestion). The cases
    where options are stripped by middleboxes are discussed in Section 6.

 Appendix B. Control Blocks

    Conceptually, an MPTCP connection can be represented as an MPTCP
    control block that contains several variables that track the progress
    and the state of the MPTCP connection and a set of linked TCP control
    blocks that correspond to the subflows that have been established.

-   RFC 793 [1] specifies several state variables. Whenever possible, we
-   reuse the same terminology as RFC 793 to describe the state variables
-   that are maintained by MPTCP.
+   RFC 793 [RFC0793] specifies several state variables. Whenever
+   possible, we reuse the same terminology as RFC 793 to describe the
+   state variables that are maintained by MPTCP.

 B.1. MPTCP Control Block

    The MPTCP control block contains the following variables per
    connection.

 B.1.1. Authentication and Metadata

    Local.Token (32 bits): This is the token chosen by the local host on
    this MPTCP connection. The token MUST be unique among all