--- 1/draft-ietf-mptcp-rfc6824bis-10.txt 2018-05-15 12:13:15.149613087 -0700
+++ 2/draft-ietf-mptcp-rfc6824bis-11.txt 2018-05-15 12:13:15.309616923 -0700
@@ -1,25 +1,25 @@
 Internet Engineering Task Force                                  A. Ford
 Internet-Draft                                                     Pexip
 Obsoletes: 6824 (if approved)                                  C. Raiciu
 Intended status: Standards Track            U. Politechnica of Bucharest
-Expires: September 5, 2018                                    M. Handley
+Expires: November 16, 2018                                    M. Handley
                                                        U. College London
                                                           O. Bonaventure
                                                 U. catholique de Louvain
                                                                C. Paasch
                                                              Apple, Inc.
-                                                           March 4, 2018
+                                                            May 15, 2018


      TCP Extensions for Multipath Operation with Multiple Addresses
-                     draft-ietf-mptcp-rfc6824bis-10
+                     draft-ietf-mptcp-rfc6824bis-11

 Abstract

    TCP/IP communication is currently restricted to a single path per
    connection, yet multiple paths often exist between peers.  The
    simultaneous use of these multiple paths for a TCP/IP session would
    improve resource usage within the network and, thus, improve user
    experience through higher throughput and improved resilience to
    network failure.

@@ -42,21 +42,21 @@
    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF).  Note that other groups may also distribute
    working documents as Internet-Drafts.  The list of current Internet-
    Drafts is at https://datatracker.ietf.org/drafts/current/.

    Internet-Drafts are draft documents valid for a maximum of six months
    and may be updated, replaced, or obsoleted by other documents at any
    time.  It is inappropriate to use Internet-Drafts as reference
    material or to cite them other than as "work in progress."

-   This Internet-Draft will expire on September 5, 2018.
+   This Internet-Draft will expire on November 16, 2018.

 Copyright Notice

    Copyright (c) 2018 IETF Trust and the persons identified as the
    document authors.  All rights reserved.

    This document is subject to BCP 78 and the IETF Trust's Legal
    Provisions Relating to IETF Documents
    (https://trustee.ietf.org/license-info) in effect on the date of
    publication of this document.  Please review these documents
@@ -93,54 +93,52 @@
        3.3.4.  Receiver Considerations . . . . . . . . . . . . . . .  32
        3.3.5.  Sender Considerations . . . . . . . . . . . . . . . .  33
        3.3.6.  Reliability and Retransmissions . . . . . . . . . . .  34
        3.3.7.  Congestion Control Considerations . . . . . . . . . .  35
        3.3.8.  Subflow Policy  . . . . . . . . . . . . . . . . . . .  36
      3.4.  Address Knowledge Exchange (Path Management)  . . . . . .  37
        3.4.1.  Address Advertisement . . . . . . . . . . . . . . . .  38
        3.4.2.  Remove Address  . . . . . . . . . . . . . . . . . . .  42
      3.5.  Fast Close  . . . . . . . . . . . . . . . . . . . . . . .  43
      3.6.  Subflow Reset . . . . . . . . . . . . . . . . . . . . . .  44
-     3.7.  MPTCP Experimental Option . . . . . . . . . . . . . . . .  46
-     3.8.  Fallback  . . . . . . . . . . . . . . . . . . . . . . . .  47
-     3.9.  Error Handling  . . . . . . . . . . . . . . . . . . . . .  51
-     3.10. Heuristics  . . . . . . . . . . . . . . . . . . . . . . .  52
-       3.10.1.  Port Usage . . . . . . . . . . . . . . . . . . . . .  52
-       3.10.2.  Delayed Subflow Start and Subflow Symmetry . . . . .  52
-       3.10.3.  Failure Handling . . . . . . . . . . . . . . . . . .  53
-     3.11. TCP Fast Open . . . . . . . . . . . . . . . . . . . . . .  54
-       3.11.1.  TFO cookie request with MPTCP  . . . . . . . . . . .  54
-       3.11.2.  Data sequence mapping under TFO  . . . . . . . . . .  55
-       3.11.3.  Connection establishment examples  . . . . . . . . .  56
-   4.  Semantic Issues . . . . . . . . . . . . . . . . . . . . . . .  58
-   5.  Security Considerations . . . . . . . . . . . . . . . . . . .  59
-   6.  Interactions with Middleboxes . . . . . . . . . . . . . . . .  62
-   7.  Acknowledgments . . . . . . . . . . . . . . . . . . . . . . .  65
-   8.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  65
-     8.1.  MPTCP Option Subtypes . . . . . . . . . . . . . . . . . .  66
-     8.2.  MPTCP Handshake Algorithms  . . . . . . . . . . . . . . .  67
-     8.3.  MP_TCPRST Reason Codes  . . . . . . . . . . . . . . . . .  67
-     8.4.  Experimental option registry  . . . . . . . . . . . . . .  68
-   9.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  68
-     9.1.  Normative References  . . . . . . . . . . . . . . . . . .  68
-     9.2.  Informative References  . . . . . . . . . . . . . . . . .  69
-   Appendix A.  Notes on Use of TCP Options  . . . . . . . . . . . .  72
-   Appendix B.  Control Blocks . . . . . . . . . . . . . . . . . . .  73
-     B.1.  MPTCP Control Block . . . . . . . . . . . . . . . . . . .  74
-       B.1.1.  Authentication and Metadata . . . . . . . . . . . . .  74
-       B.1.2.  Sending Side  . . . . . . . . . . . . . . . . . . . .  74
-       B.1.3.  Receiving Side  . . . . . . . . . . . . . . . . . . .  74
-     B.2.  TCP Control Blocks  . . . . . . . . . . . . . . . . . . .  75
-       B.2.1.  Sending Side  . . . . . . . . . . . . . . . . . . . .  75
-       B.2.2.  Receiving Side  . . . . . . . . . . . . . . . . . . .  75
-   Appendix C.  Finite State Machine . . . . . . . . . . . . . . . .  75
-   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  76
+     3.7.  Fallback  . . . . . . . . . . . . . . . . . . . . . . . .  46
+     3.8.  Error Handling  . . . . . . . . . . . . . . . . . . . . .  50
+     3.9.  Heuristics  . . . . . . . . . . . . . . . . . . . . . . .  50
+       3.9.1.  Port Usage  . . . . . . . . . . . . . . . . . . . . .  51
+       3.9.2.  Delayed Subflow Start and Subflow Symmetry  . . . . .  51
+       3.9.3.  Failure Handling  . . . . . . . . . . . . . . . . . .  52
+   4.  Semantic Issues . . . . . . . . . . . . . . . . . . . . . . .  53
+   5.  Security Considerations . . . . . . . . . . . . . . . . . . .  54
+   6.  Interactions with Middleboxes . . . . . . . . . . . . . . . .  57
+   7.  Acknowledgments . . . . . . . . . . . . . . . . . . . . . . .  60
+   8.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  60
+     8.1.  MPTCP Option Subtypes . . . . . . . . . . . . . . . . . .  61
+     8.2.  MPTCP Handshake Algorithms  . . . . . . . . . . . . . . .  62
+     8.3.  MP_TCPRST Reason Codes  . . . . . . . . . . . . . . . . .  62
+   9.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  63
+     9.1.  Normative References  . . . . . . . . . . . . . . . . . .  63
+     9.2.  Informative References  . . . . . . . . . . . . . . . . .  63
+   Appendix A.  Notes on Use of TCP Options  . . . . . . . . . . . .  67
+   Appendix B.  TCP Fast Open  . . . . . . . . . . . . . . . . . . .  68
+     B.1.  TFO cookie request with MPTCP . . . . . . . . . . . . . .  69
+     B.2.  Data sequence mapping under TFO . . . . . . . . . . . . .  69
+     B.3.  Connection establishment examples . . . . . . . . . . . .  70
+   Appendix C.  Control Blocks . . . . . . . . . . . . . . . . . . .  72
+     C.1.  MPTCP Control Block . . . . . . . . . . . . . . . . . . .  72
+       C.1.1.  Authentication and Metadata . . . . . . . . . . . . .  72
+       C.1.2.  Sending Side  . . . . . . . . . . . . . . . . . . . .  73
+       C.1.3.  Receiving Side  . . . . . . . . . . . . . . . . . . .  73
+     C.2.  TCP Control Blocks  . . . . . . . . . . . . . . . . . . .  73
+       C.2.1.  Sending Side  . . . . . . . . . . . . . . . . . . . .  74
+       C.2.2.  Receiving Side  . . . . . . . . . . . . . . . . . . .  74
+   Appendix D.  Finite State Machine . . . . . . . . . . . . . . . .  74
+   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  75

 1.  Introduction

    Multipath TCP (MPTCP) is a set of extensions to regular TCP [RFC0793]
    to provide a Multipath TCP [RFC6182] service, which enables a
    transport connection to operate across multiple paths simultaneously.
    This document presents the protocol changes required to add multipath
    capability to TCP; specifically, those for signaling and setting up
    multiple paths ("subflows"), managing these subflows, reassembly of
    data, and termination of sessions.  This is not the only information
@@ -540,21 +537,21 @@

    o  MPTCP falls back to ordinary TCP if MPTCP operation is not
       possible, for example, if one host is not MPTCP capable or if a
       middlebox alters the payload.

    o  To meet the threats identified in [RFC6181], the following steps
       are taken: keys are sent in the clear in the MP_CAPABLE messages;
       MP_JOIN messages are secured with HMAC-SHA256 ([RFC2104], [SHS])
       using those keys; and standard TCP validity checks are made on the
       other messages (ensuring sequence numbers are in-window
-      [RFC5961]).
+      [RFC5961]).  Further information can be found in Section 5.

 3.  MPTCP Protocol

    This section describes the operation of the MPTCP protocol, and is
    subdivided into sections for each key part of the protocol operation.

    All MPTCP operations are signaled using optional TCP header fields.
    A single TCP option number ("Kind") has been assigned by IANA for
    MPTCP (see Section 8), and then individual messages will be
    determined by a "subtype", the values of which are also stored in an
@@ -771,24 +768,30 @@
    Similar situations could occur when the MP_CAPABLE with data is lost
    and retransmitted.  Furthermore, in the case of TCP Segmentation
    Offloading, the MP_CAPABLE with data parameters may be duplicated
    across multiple packets, and implementations must also be able to
    cope with duplicate MP_CAPABLE mappings as well as duplicate DSS
    mappings.

    Additionally, the MP_CAPABLE exchange allows the safe passage of
    MPTCP options on SYN packets to be determined.  If any of these
    options are dropped, MPTCP will gracefully fall back to regular
-   single-path TCP, as documented in Section 3.8.  Note that new
-   subflows MUST NOT be established (using the process documented in
-   Section 3.2) until a Data Sequence Signal (DSS) option has been
-   successfully received across the path (as documented in Section 3.3).
+   single-path TCP, as documented in Section 3.7.  If at any point in
+   the handshake either party thinks the MPTCP negotiation is
+   compromised, for example by a middlebox corrupting the TCP options,
+   or unexpected ACK numbers being present, the host MUST stop using
+   MPTCP and no longer include MPTCP options in future TCP packets.  The
+   other host will then also fall back to regular TCP using the fall
+   back mechanism.  Note that new subflows MUST NOT be established
+   (using the process documented in Section 3.2) until a Data Sequence
+   Signal (DSS) option has been successfully received across the path
+   (as documented in Section 3.3).

    The first 4 bits of the first octet in the MP_CAPABLE option
    (Figure 4) define the MPTCP option subtype (see Section 8; for
    MP_CAPABLE, this is 0), and the remaining 4 bits of this octet
    specify the MPTCP version in use (for this specification, this is 1).

    The second octet is reserved for flags, allocated as follows:

    A:  The leftmost bit, labeled "A", SHOULD be set to 1 to indicate
        "Checksum Required", unless the system administrator has decided
@@ -873,33 +876,33 @@
    If a SYN contains an MP_CAPABLE option but the SYN/ACK does not, it
    is assumed that the passive opener is not multipath capable; thus,
    the MPTCP session MUST operate as a regular, single-path TCP.  If a
    SYN does not contain a MP_CAPABLE option, the SYN/ACK MUST NOT
    contain one in response.  If the third packet (the ACK) does not
    contain the MP_CAPABLE option, then the session MUST fall back to
    operating as a regular, single-path TCP.  This is to maintain
    compatibility with middleboxes on the path that drop some or all TCP
    options.  Note that an implementation MAY choose to attempt sending
    MPTCP options more than one time before making this decision to
-   operate as regular TCP (see Section 3.10).
+   operate as regular TCP (see Section 3.9).

    If the SYN packets are unacknowledged, it is up to local policy to
    decide how to respond.  It is expected that a sender will eventually
    fall back to single-path TCP (i.e., without the MP_CAPABLE option) in
    order to work around middleboxes that may drop packets with unknown
    options; however, the number of multipath-capable attempts that are
    made first will be up to local policy.  It is possible that MPTCP and
    non-MPTCP SYNs could get reordered in the network.  Therefore, the
    final state is inferred from the presence or absence of the
    MP_CAPABLE option in the third packet of the TCP handshake.  If this
    option is not present, the connection SHOULD fall back to regular
-   TCP, as documented in Section 3.8.
+   TCP, as documented in Section 3.7.

    The initial data sequence number on an MPTCP connection is generated
    from the key.  The algorithm for IDSN generation is also determined
    from the negotiated authentication algorithm.  In this specification,
    with only the SHA-256 algorithm specified and selected, the IDSN of a
    host MUST be the least significant 64 bits of the SHA-256 hash of its
    key, i.e., IDSN-A = Hash(Key-A) and IDSN-B = Hash(Key-B).  This
    deterministic generation of the IDSN allows a receiver to ensure that
    there are no gaps in sequence space at the start of the connection.
    The SYN with MP_CAPABLE occupies the first octet of data sequence
@@ -909,21 +912,21 @@
 3.2.  Starting a New Subflow

    Once an MPTCP connection has begun with the MP_CAPABLE exchange,
    further subflows can be added to the connection.  Hosts have
    knowledge of their own address(es), and can become aware of the other
    host's addresses through signaling exchanges as described in
    Section 3.4.  Using this knowledge, a host can initiate a new subflow
    over a currently unused pair of addresses.  It is permitted for
    either host in a connection to initiate the creation of a new
    subflow, but it is expected that this will normally be the original
-   connection initiator (see Section 3.10 for heuristics).
+   connection initiator (see Section 3.9 for heuristics).

    A new subflow is started as a normal TCP SYN/ACK exchange.  The Join
    Connection (MP_JOIN) MPTCP option is used to identify the connection
    to be joined by the new subflow.  It uses keying material that was
    exchanged in the initial MP_CAPABLE handshake (Section 3.1), and that
    handshake also negotiates the crypto algorithm in use for the MP_JOIN
    handshake.

    This section specifies the behavior of MP_JOIN using the HMAC-SHA256
    algorithm.  An MP_JOIN option is present in the SYN, SYN/ACK, and ACK
@@ -1128,21 +1131,21 @@
    MP_JOIN is stripped from the SYN on the path from A to B, and Host B
    does not have a passive opener on the relevant port, it will respond
    with a RST in the normal way.  If in response to a SYN with an
    MP_JOIN option, a SYN/ACK is received without the MP_JOIN option
    (either since it was stripped on the return path, or it was stripped
    on the outgoing path but the passive opener on Host B responded as if
    it were a new regular TCP session), then the subflow is unusable and
    Host A MUST close it with a RST.

    Note that additional subflows can be created between any pair of
-   ports (but see Section 3.10 for heuristics); no explicit application-
+   ports (but see Section 3.9 for heuristics); no explicit application-
    level accept calls or bind calls are required to open additional
    subflows.  To associate a new subflow with an existing connection,
    the token supplied in the subflow's SYN exchange is used for
    demultiplexing.  This then binds the 5-tuple of the TCP subflow to
    the local token of the connection.  A consequence is that it is
    possible to allow any port pairs to be used for a connection.

    Demultiplexing subflow SYNs MUST be done using the token; this is
    unlike traditional TCP, where the destination port is used for
    demultiplexing SYN packets.  Once a subflow is set up, demultiplexing
@@ -1261,21 +1264,21 @@
    the subflow sequence numbering is relative (the SYN at the start of
    the subflow has relative subflow sequence number 0).  This is to
    allow middleboxes to change the initial sequence number of a subflow,
    such as firewalls that undertake ISN randomization.

    The data sequence mapping also contains a checksum of the data that
    this mapping covers, if use of checksums has been negotiated at the
    MP_CAPABLE exchange.  Checksums are used to detect if the payload has
    been adjusted in any way by a non-MPTCP-aware middlebox.  If this
    checksum fails, it will trigger a failure of the subflow, or a
-   fallback to regular TCP, as documented in Section 3.8, since MPTCP
+   fallback to regular TCP, as documented in Section 3.7, since MPTCP
    can no longer reliably know the subflow sequence space at the
    receiver to build data sequence mappings.

    The checksum algorithm used is the standard TCP checksum [RFC0793],
    operating over the data covered by this mapping, along with a pseudo-
    header as shown in Figure 10.

                            1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +--------------------------------------------------------------+
@@ -1360,21 +1363,21 @@
    A data sequence mapping does not need to be included in every MPTCP
    packet, as long as the subflow sequence space in that packet is
    covered by a mapping known at the receiver.  This can be used to
    reduce overhead in cases where the mapping is known in advance; one
    such case is when there is a single subflow between the hosts,
    another is when segments of data are scheduled in larger than packet-
    sized chunks.

    An "infinite" mapping can be used to fall back to regular TCP by
    mapping the subflow-level data to the connection-level data for the
-   remainder of the connection (see Section 3.8).  This is achieved by
+   remainder of the connection (see Section 3.7).  This is achieved by
    setting the Data-Level Length field of the DSS option to the reserved
    value of 0.  The checksum, in such a case, will also be set to zero.

 3.3.2.  Data Acknowledgments

    To provide full end-to-end resilience, MPTCP provides a connection-
    level acknowledgment, to act as a cumulative ACK for the connection
    as a whole.  This is the "Data ACK" field of the DSS option
    (Figure 9).  The Data ACK is analogous to the behavior of the
    standard TCP cumulative ACK -- indicating how much data has been
@@ -1481,21 +1484,21 @@

    A connection is considered closed once both hosts' DATA_FINs have
    been acknowledged by DATA_ACKs.

    As specified above, a standard TCP FIN on an individual subflow only
    shuts down the subflow on which it was sent.  If all subflows have
    been closed with a FIN exchange, but no DATA_FIN has been received
    and acknowledged, the MPTCP connection is treated as closed only
    after a timeout.  This implies that an implementation will have
    TIME_WAIT states at both the subflow and connection levels (see
-   Appendix C).  This permits "break-before-make" scenarios where
+   Appendix D).  This permits "break-before-make" scenarios where
    connectivity is lost on all subflows before a new one can be re-
    established.

 3.3.4.  Receiver Considerations

    Regular TCP advertises a receive window in each packet, telling the
    sender how much data the receiver is willing to accept past the
    cumulative ack.  The receive window is used to implement flow
    control, throttling down fast senders when receivers cannot keep up.

@@ -1818,21 +1821,21 @@
    The 2 octets that specify the TCP port number to use are optional and
    their presence can be inferred from the length of the option.
    Although it is expected that the majority of use cases will use the
    same port pairs as used for the initial subflow (e.g., port 80
    remains port 80 on all subflows, as does the ephemeral port at the
    client), there may be cases (such as port-based load balancing) where
    the explicit specification of a different port is required.  If no
    port is specified, MPTCP SHOULD attempt to connect to the specified
    address on the same port as is already in use by the subflow on which
    the ADD_ADDR signal was sent; this is discussed in more detail in
-   Section 3.10.
+   Section 3.9.

    The Truncated HMAC present in this Option is the rightmost 64 bits of
    an HMAC, negotiated and calculated in the same way as for MP_JOIN as
    described in Section 3.2.  For this specification of MPTCP, as there
    is only one hash algorithm option specified, this will be HMAC as
    defined in [RFC2104], using the SHA-256 hash algorithm [SHS],
    implemented as in [RFC6234].  In the same way as for MP_JOIN, the key
    for the HMAC algorithm, in the case of the message transmitted by
    Host A, will be Key-A followed by Key-B, and in the case of Host B,
    Key-B followed by Key-A.  These are the keys that were exchanged in
@@ -2131,85 +2134,21 @@
       reset and start again than it is to retransmit the queued data.

    o  Unacceptable performance (code 0x05).  This code indicates that
       the performance of this subflow was too low compared to the other
       subflows of this Multipath TCP connection.

    o  Middlebox interference (code 0x06).  Middlebox interference has
       been detected over this subflow making MPTCP signaling invalid.
       For example, this may be sent if the checksum does not validate.

-3.7.  MPTCP Experimental Option
-
-   In order to provide a structured identity and negotiation mechanism
-   for private experimental MPTCP extensions, the MP_EXPERIMENTAL option
-   has been reserved.
-
-                           1                   2                   3
-       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
-      +---------------+---------------+-------+-------+---------------+
-      |     Kind      |     Length    |Subtype|S|U|rsv|  Experiment   |
-      +---------------+---------------+-------+-------+---------------+
-      | Id. (16 bits) |  Subtype-specific data (variable length)  ...
-      +----------------------------------------------------------- ...
-
-         Figure 16: MPTCP Experimental (MP_EXPERIMENTAL) Option
-
-   Figure 16 shows the format of the experimental option.  The
-   Experiment identifier is a 16 bits integer that shall be assigned by
-   using the same procedure as defined in [RFC6994]; a request to IANA
-   is made in Section 8.4.
-
-   The two high order flags that are included in the MPTCP Experimental
-   option have the following semantics:
-
-   o  "S" flag (highest order bit) : This is the synchronising bit.
-      When set to 1, it indicates that the host sending this option
-      expects a reply from the remote host with an option having the
-      same experiment identifier, but possibly containing other data.
-
-   o  "U" flag (second highest order bit) : When set to 1, this flag
-      indicates that the experimental option was received by the sending
-      host but it was unable to parse it.
-
-   The two low order flags are currently reserved for further use.  They
-   MUST be set to zero when sending and ignored upon reception.
-
-   To use the Experimental MPTCP option with a given experiment
-   identifier over a MPTCP connection, the sending host must first
-   verify the ability of the remote host to support this particular
-   Experimental option.  For this, it first sends in any valid TCP
-   segment, including a duplicate acknowledgement, an Experimental MPTCP
-   option with the "S" flag set.  Upon reception of this option, the
-   receiving host will verify whether it supports it.  If yes, it shall
-   return a TCP segment that contains the experimental option with the
-   same identifier and the "S" and the "U" flags both set to 1.  This
-   option may contain additional data depending on the semantics of the
-   extension.  If the receiving host does not recognise the experimental
-   option that it has received, it shall return a TCP segment that
-   contains the received experimental option with the "S" flag set to 0
-   and the "U" flag set to 1.
-
-   If a host receives an Experimental MPTCP option with the "U" flag set
-   to 0 which it does not support, or which contains information that
-   the host cannot parse, it shall return the exact option that it
-   received with the "U" flag set to 1 to indicate the error to the
-   remote host.  If an invalid option is received with the "U" flag set
-   to 0, it must be silently discarded.
-
-   Future documents specifying new experimental MPTCP options should
-   specify the extract semantic of the Subtype-specific data and whether
-   additional validation operations are to be followed at both sides.
-   It should be noted that data can be included in an experimental
-   option concurrently with the capability check (S/U).
-
-3.8.  Fallback
+3.7.  Fallback

    Sometimes, middleboxes will exist on a path that could prevent the
    operation of MPTCP.  MPTCP has been designed in order to cope with
    many middlebox modifications (see Section 6), but there are still
    some cases where a subflow could fail to operate within the MPTCP
    requirements.  These cases are notably the following: the loss of
    MPTCP options on a path and the modification of payload data.  If
    such an event occurs, it is necessary to "fall back" to the previous,
    safe operation.  This may be either falling back to regular TCP or
    removing a problematic subflow.
@@ -2303,37 +2242,37 @@
    tampered with.

    When multiple subflows are in use, the data in flight on a subflow
    will likely involve data that is not contiguously part of the
    connection-level stream, since segments will be spread across the
    multiple subflows.  Due to the problems identified above, it is not
    possible to determine what the adjustment has done to the data
    (notably, any changes to the subflow sequence numbering).  Therefore,
    it is not possible to recover the subflow, and the affected subflow
    must be immediately closed with a RST, featuring an MP_FAIL option
-   (Figure 17), which defines the data sequence number at the start of
+   (Figure 16), which defines the data sequence number at the start of
    the segment (defined by the data sequence mapping) that had the
    checksum failure.  Note that the MP_FAIL option requires the use of
    the full 64-bit sequence number, even if 32-bit sequence numbers are
    normally in use in the DSS signals on the path.

                            1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +---------------+---------------+-------+----------------------+
       |     Kind      |   Length=12   |Subtype|      (reserved)      |
       +---------------+---------------+-------+----------------------+
       |                                                              |
       |               Data Sequence Number (8 octets)                |
       |                                                              |
       +--------------------------------------------------------------+

-                  Figure 17: Fallback (MP_FAIL) Option
+                  Figure 16: Fallback (MP_FAIL) Option

    The receiver MUST discard all data following the data sequence number
    specified.  Failed data MUST NOT be DATA_ACKed and so will be
    retransmitted on other subflows (Section 3.3.6).

    A special case is when there is a single subflow and it fails with a
    checksum error.  If it is known that all unacknowledged data in
    flight is contiguous (which will usually be the case with a single
    subflow), an infinite mapping can be applied to the subflow without
    the need to close it first, and essentially turn off all further
@@ -2377,50 +2316,50 @@
    otherwise, the receiver would not know how to reorder the data.  In
    practice, this means that all MPTCP subflows will have to be
    terminated except one.  Once MPTCP falls back to regular TCP, it MUST
    NOT revert to MPTCP later in the connection.

    It should be emphasized that we are not attempting to prevent the use
    of middleboxes that want to adjust the payload.  An MPTCP-aware
    middlebox could provide such functionality by also rewriting
    checksums.

-3.9.  Error Handling
+3.8.  Error Handling

    In addition to the fallback mechanism as described above, the
    standard classes of TCP errors may need to be handled in an MPTCP-
    specific way.  Note that changing semantics -- such as the relevance
    of a RST -- are covered in Section 4.  Where possible, we do not want
    to deviate from regular TCP behavior.

    The following list covers possible errors and the appropriate MPTCP
    behavior:

    o  Unknown token in MP_JOIN (or HMAC failure in MP_JOIN ACK, or
       missing MP_JOIN in SYN/ACK response): send RST (analogous to TCP's
       behavior on an unknown port)

    o  DSN out of window (during normal operation): drop the data, do not
       send Data ACKs

    o  Remove request for unknown address ID: silently ignore

-3.10.  Heuristics
+3.9.  Heuristics

    There are a number of heuristics that are needed for performance or
    deployment but that are not required for protocol correctness.  In
    this section, we detail such heuristics.  Note that discussion of
    buffering and certain sender and receiver window behaviors are
    presented in Sections 3.3.4 and 3.3.5, as well as retransmission in
    Section 3.3.6.

-3.10.1.  Port Usage
+3.9.1.  Port Usage

    Under typical operation, an MPTCP implementation SHOULD use the same
    ports as already in use.  In other words, the destination port of a
    SYN containing an MP_JOIN option SHOULD be the same as the remote
    port of the first subflow in the connection.  The local port for such
    SYNs SHOULD also be the same as for the first subflow (and as such,
    an implementation SHOULD reserve ephemeral ports across all local IP
    addresses), although there may be cases where this is infeasible.
    This strategy is intended to maximize the probability of the SYN
    being permitted by a firewall or NAT at the recipient and to avoid
@@ -2428,21 +2367,21 @@

    There may also be cases, however, where the passive opener wishes to
    signal to the other host that a specific port should be used, and
    this facility is provided in the Add Address option as documented in
    Section 3.4.1.  It is therefore feasible to allow multiple subflows
    between the same two addresses but using different port pairs, and
    such a facility could be used to allow load balancing within the
    network based on 5-tuples (e.g., some ECMP implementations
    [RFC2992]).

-3.10.2.  Delayed Subflow Start and Subflow Symmetry
+3.9.2.  Delayed Subflow Start and Subflow Symmetry

    Many TCP connections are short-lived and consist only of a few
    segments, and so the overheads of using MPTCP outweigh any benefits.
    A heuristic is required, therefore, to decide when to start using
    additional subflows in an MPTCP connection.  We expect that
    experience gathered from deployments will provide further guidance on
    this, and will be affected by particular application characteristics
    (which are likely to change over time).  However, a suggested
    general-purpose heuristic that an implementation MAY choose to employ
    is as follows.  Results from experimental deployments are needed in
@@ -2482,222 +2421,52 @@
    is RECOMMENDED that some element of randomization is applied to the
    time waited before opening new subflows, so that only one subflow
    exists between a given address pair.  If, however, hosts signal
    additional ports to use (for example, for leveraging ECMP on-path),
    this heuristic need not apply.

    This section has shown some of the considerations that an implementer
    should give when developing MPTCP heuristics, but is not intended to
    be prescriptive.

-3.10.3.  Failure Handling
+3.9.3.  Failure Handling

    Requirements for MPTCP's handling of unexpected signals have been
-   given in Section 3.9.  There are other failure cases, however, where
+   given in Section 3.8.  There are other failure cases, however, where
    a hosts can choose appropriate behavior.  For example, Section 3.1
    suggests that a host SHOULD fall back to trying regular TCP SYNs
    after one or more failures of MPTCP SYNs for a connection.  A host
    may keep a system-wide cache of such information, so that it can back
    off from using MPTCP, firstly for that particular destination host,
    and eventually on a whole interface, if MPTCP connections continue
    failing.

    Another failure could occur when the MP_JOIN handshake fails.
- Section 3.9 specifies that an incorrect handshake MUST lead to the + Section 3.8 specifies that an incorrect handshake MUST lead to the subflow being closed with a RST. A host operating an active intrusion detection system may choose to start blocking MP_JOIN packets from the source host if multiple failed MP_JOIN attempts are seen. From the connection initiator's point of view, if an MP_JOIN fails, it SHOULD NOT attempt to connect to the same IP address and port during the lifetime of the connection, unless the other host refreshes the information with another ADD_ADDR option. Note that the ADD_ADDR option is informational only, and does not guarantee the other host will attempt a connection. In addition, an implementation may learn, over a number of connections, that certain interfaces or destination addresses consistently fail and may default to not trying to use MPTCP for these. Behavior could also be learned for particularly badly performing subflows or subflows that regularly fail during use, in order to temporarily choose not to use these paths. -3.11. TCP Fast Open - - TCP Fast Open, described in [RFC7413], has been introduced with the - objective of gaining one RTT before transmitting data. This is - considered a valuable gain as very short connections are very common, - especially for HTTP request/response schemes. It achieves this by - sending the SYN-segment together with data and allowing the server to - reply immediately with data after the SYN/ACK. [RFC7413] secures - this mechanism, by using a new TCP option that includes a cookie - which is negotiated in a preceding connection. - - When using TCP Fast Open in conjunction with MPTCP, there are two key - points to take into account, detailed hereafter. - -3.11.1. TFO cookie request with MPTCP - - When a TFO client first connects to a server, it cannot immediately - include data in the SYN for security reasons [RFC7413]. Instead, it - requests a cookie that will be used in subsequent connections. 
This - is done with the TCP cookie request/response options, of resp. 2 - bytes and 6-18 bytes (depending on the chosen cookie length). - - TFO and MPTCP can be combined provided that the total length of their - options does not exceed the maximum 40 bytes possible in TCP: - - o In the SYN: MPTCP uses a 4-bytes long MP_CAPABLE option. The - MPTCP and TFO options sum up to 6 bytes. With typical TCP-options - using up to 19 bytes in the SYN (24 bytes if options are padded at - a word boundary), there is enough space to combine the MP_CAPABLE - with the TFO Cookie Request. - - o In the SYN+ACK: MPTCP uses a 12-bytes long MP_CAPABLE option, but - now TFO can be as long as 18 bytes. Since the maximum option - length may be exceeded, it is up to the server to solve this by - using a shorter cookie. As an example, if we consider that 19 - bytes are used for classical TCP options, the maximum possible - cookie length would be of 7 bytes. Note that the same limitation - applies to subsequent connections, for the SYN packet (because the - client then echoes back the cookie to the server). Finally, if - the security impact of reducing the cookie size is not deemed - acceptable, the server can reduce the amount of other TCP-options - by omitting the TCP timestamps (as outlined in Appendix A). - -3.11.2. Data sequence mapping under TFO - - MPTCP uses, in the TCP establishment phase, a key exchange that is - used to generate the Initial Data Sequence Numbers (IDSNs). In - particular, the SYN with MP_CAPABLE occupies the first octet of the - data sequence space. With TFO, one way to handle the data sent - together with the SYN would be to consider an implicit DSS mapping - that covers that SYN segment (since there is not enough space in the - SYN to include a DSS option). The problem with that approach is that - if a middlebox modifies the TFO data, this will not be noticed by - MPTCP because of the absence of a DSS-checksum. 
For example, a TCP - (but not MPTCP)-aware middlebox could insert bytes at the beginning - of the stream and adapt the TCP checksum and sequence numbers - accordingly. With an implicit mapping, this would give the client and - server different views of the DSS-mapping, with no way to detect - this inconsistency as the DSS checksum is not present. - - To solve this, the TFO data should not be considered part of the Data - Sequence Number space: the SYN with MP_CAPABLE still occupies the - first octet of data sequence space, but then the first non-TFO data - byte occupies the second octet. This guarantees that, if the use of - DSS-checksum is negotiated, all data in the data sequence number - space is checksummed. We also note that this does not entail a loss - of functionality, because TFO-data is always sent when only one path - is active. - -3.11.3. Connection establishment examples - - The following shows a few examples of possible TFO+MPTCP - establishment scenarios. - - Before a client can send data together with the SYN, it must request - a cookie from the server, as shown in Figure 18. This is done - by simply combining the TFO and MPTCP options. - - client server - | | - | S 0(0) , | - | -----------------------------------------------------------> | - | | - | S. 0(0) ack 1 , | - | <----------------------------------------------------------- | - | | - | . 0(0) ack 1 | - | -----------------------------------------------------------> | - | | - - Figure 18: Cookie request - - Once this is done, the received cookie can be used for TFO, as shown - in Figure 19. In this example, the client first sends 20 - bytes in the SYN. The server immediately replies with 100 bytes - following the SYN-ACK upon which the client replies with 20 more - bytes.
Note that the last segment in the figure has a TCP sequence - number of 21, while the DSS subflow sequence number is 1 (because the - TFO data is not part of the data sequence number space, as explained - in Section 3.11.2). - - client server - | | - | S 0(20) , | - | -----------------------------------------------------------> | - | | - | S. 0(0) ack 21 | - | <----------------------------------------------------------- | - | | - | . 1(100) ack 21 | - | <----------------------------------------------------------- | - | | - | . 21(0) ack 1 | - | -----------------------------------------------------------> | - | | - | . 21(20) ack 101 | - | -----------------------------------------------------------> | - | | - - Figure 19: The server supports TFO - - In Figure 20, the server does not support TFO. The client - detects that no state is created in the server (as no data is acked), - and now sends the MP_CAPABLE in the third ack, in order for the - server to build its MPTCP context at the end of the establishment. - Now, the TFO data, retransmitted, becomes part of the data sequence - mapping because it is effectively sent (in fact re-sent) after the - establishment. - - client server - | | - | S 0(20) , | - | -----------------------------------------------------------> | - | | - | S. 0(0) ack 1 | - | <----------------------------------------------------------- | - | | - | . 1(0) ack 1 | - | -----------------------------------------------------------> | - | | - | . 1(20) ack 1 | - | -----------------------------------------------------------> | - | | - | . 0(0) ack 21 | - | <----------------------------------------------------------- | - | | - - Figure 20: The server does not support TFO - - It is also possible that the server acknowledges only part of the TFO - data, as illustrated in Figure 21. The client will simply - retransmit the missing data together with a DSS-mapping.
- - client server - | | - | S 0(1000) , | - | -----------------------------------------------------------> | - | | - | S. 0(0) ack 501 | - | <----------------------------------------------------------- | - | | - | . 501(0) ack 1 | - | -----------------------------------------------------------> | - | | - | . 501(500) ack 1 | - | -----------------------------------------------------------> | - | | - - Figure 21: Partial data acknowledgement - 4. Semantic Issues In order to support multipath operation, the semantics of some TCP components have changed. To aid clarity, this section collects these semantic changes as a reference. Sequence number: The (in-header) TCP sequence number is specific to the subflow. To allow the receiver to reorder application data, an additional data-level sequence space is used. In this data- level sequence space, the initial SYN and the final DATA_FIN @@ -2811,21 +2580,21 @@ denial-of-service attacks consuming resources. As discussed in Section 3.4.1, a host may advertise its private addresses, but these might point to different hosts in the receiver's network. The MP_JOIN handshake (Section 3.2) will ensure that this does not succeed in setting up a subflow to the incorrect host. However, it could still create unwanted TCP handshake traffic. This feature of MPTCP could be a target for denial-of-service exploits, with malicious participants in MPTCP connections encouraging the recipient to target other hosts in the network. Therefore, - implementations should consider heuristics (Section 3.10) at both the + implementations should consider heuristics (Section 3.9) at both the sender and receiver to reduce the impact of this. A small security risk could theoretically exist with key reuse, but in order to accomplish a replay attack, both the sender and receiver keys, and the sender and receiver random numbers, in the MP_JOIN handshake (Section 3.2) would have to match. 
Whilst this specification defines a "medium" security solution, meeting the criteria specified at the start of this section and the threat analysis ([RFC6181]), since attacks only ever get worse, it is @@ -2888,21 +2658,21 @@ presence of the SYN flag. MPTCP SYN packets on the first subflow of a connection contain the MP_CAPABLE option (Section 3.1). If this is dropped, MPTCP SHOULD fall back to regular TCP. If packets with the MP_JOIN option (Section 3.2) are dropped, the paths will simply not be used. If a middlebox strips options but otherwise passes the packets unchanged, MPTCP will behave safely. If an MP_CAPABLE option is dropped on either the outgoing or the return path, the initiating - host can fall back to regular TCP, as illustrated in Figure 22 and + host can fall back to regular TCP, as illustrated in Figure 17 and discussed in Section 3.1. Subflow SYNs contain the MP_JOIN option. If this option is stripped on the outgoing path, the SYN will appear to be a regular SYN to Host B. Depending on whether there is a listening socket on the target port, Host B will reply either with SYN/ACK or RST (subflow connection fails). When Host A receives the SYN/ACK it sends a RST because the SYN/ACK does not contain the MP_JOIN option and its token. Either way, the subflow setup fails, but otherwise does not affect the MPTCP connection as a whole. @@ -2918,37 +2688,37 @@ Host A Host B | SYN(MP_CAPABLE) | |------------------------------------>| | Middlebox M | | | | | SYN/ACK |SYN/ACK(MP_CAPABLE)| |<----------------|-------------------| b) MP_CAPABLE option stripped on return path - Figure 22: Connection Setup with Middleboxes that Strip Options from + Figure 17: Connection Setup with Middleboxes that Strip Options from Packets We now examine data flow with MPTCP, assuming the flow is correctly set up, which implies the options in the SYN packets were allowed through by the relevant middleboxes. 
If options are allowed through and there is no resegmentation or coalescing to TCP segments, Multipath TCP flows can proceed without problems. The case when options get stripped on data packets has been discussed in the Fallback section. If a fraction of options are stripped, behavior is not deterministic. If some data sequence mappings are lost, the connection can continue so long as mappings exist for the subflow-level data (e.g., if multiple maps have been sent that reinforce each other). If some subflow-level space is left unmapped, however, the subflow is treated as broken and is closed, through the - process described in Section 3.8. MPTCP should survive with a loss + process described in Section 3.7. MPTCP should survive with a loss of some Data ACKs, but performance will degrade as the fraction of stripped options increases. We do not expect such cases to appear in practice, though: most middleboxes will either strip all options or let them all through. We end this section with a list of middlebox classes, their behavior, and the elements in the MPTCP design that allow operation through such middleboxes. Issues surrounding dropping packets with options or stripping options were discussed above, and are not included here: @@ -2959,21 +2729,21 @@ the MP_JOIN option, and the handshake mechanism ensures that connection attempts to private addresses [RFC1918] do not cause problems. Explicit address removal is undertaken by an Address ID to allow no knowledge of the source address. o Performance Enhancing Proxies (PEPs) [RFC3135] might proactively ACK data to increase performance. MPTCP, however, relies on accurate congestion control signals from the end host, and non- MPTCP-aware PEPs will not be able to provide such signals. MPTCP will, therefore, fall back to single-path TCP, or close the - problematic subflow (see Section 3.8). + problematic subflow (see Section 3.7). 
o Traffic Normalizers [norm] may not allow holes in sequence numbers, and may cache packets and retransmit the same data. MPTCP looks like standard TCP on the wire, and will not retransmit different data on the same subflow sequence number. In the event of a retransmission, the same data will be retransmitted on the original TCP subflow even if it is additionally retransmitted at the connection level on a different subflow. o Firewalls [RFC2979] might perform initial sequence number @@ -3074,35 +2844,36 @@ | | | | document, | | | | | Section 3.4.1 | | 0x4 | REMOVE_ADDR | Remove Address | This | | | | | document, | | | | | Section 3.4.2 | | 0x5 | MP_PRIO | Change Subflow Priority | This | | | | | document, | | | | | Section 3.3.8 | | 0x6 | MP_FAIL | Fallback | This | | | | | document, | - | | | | Section 3.8 | + | | | | Section 3.7 | | 0x7 | MP_FASTCLOSE | Fast Close | This | | | | | document, | | | | | Section 3.5 | | 0x8 | MP_TCPRST | Subflow Reset | This | | | | | document, | | | | | Section 3.6 | - | 0xf | MP_EXPERIMENTAL | MPTCP Experimental | This | - | | | Option | document, | - | | | | Section 3.7 | + | 0xf | MP_EXPERIMENTAL | Reserved for private | | + | | | experiments | | +-------+-----------------+-------------------------+---------------+ Table 2: MPTCP Option Subtypes - Values 0x9 through 0xe are currently unassigned. + Values 0x9 through 0xe are currently unassigned. Option 0xf is + reserved for use by private experiments. Its use may be formalized + in a future specification. 8.2. MPTCP Handshake Algorithms IANA has created another sub-registry, "MPTCP Handshake Algorithms" under the "Transmission Control Protocol (TCP) Parameters" registry, based on the flags in MP_CAPABLE (Section 3.1). 
IANA is requested to update the references of this table to this document, as follows: +---------+----------------------------------+----------------------+ | Flag | Meaning | Reference | @@ -3144,43 +2915,20 @@ | 0x01 | MPTCP specific error | This document, Section 3.6 | | 0x02 | Lack of resources | This document, Section 3.6 | | 0x03 | Administratively prohibited | This document, Section 3.6 | | 0x04 | Too much outstanding data | This document, Section 3.6 | | 0x05 | Unacceptable performance | This document, Section 3.6 | | 0x06 | Middlebox interference | This document, Section 3.6 | +------+-----------------------------+----------------------------+ Table 4: MPTCP MP_TCPRST Reason Codes -8.4. Experimental option registry - - Section 3.7 has defined the MP_EXPERIMENTAL option for private, - experimental MPTCP options, and the same considerations as for - [RFC6994] apply. IANA should create a "Multipath TCP Experimental - Option Identifiers (MPTCP ExIDs)" sub-registry. This registry - contains the 16 bits ExIDs and a reference (description, document - pointer, or assignee name and e-mail contact) for each entry. MPTCP - ExIDs are assigned on a First Come, First Served (FCFS) basis - [RFC5226]. - - IANA will advise applicants of duplicate entries to select an - alternate value, as per typical FCFS processing. - - IANA will record known duplicate uses to assist the community in both - debugging assigned uses as well as correcting unauthorized duplicate - uses. - - IANA should impose no requirement on making a registration other than - indicating the desired codepoint and providing a point of contact. A - short description or acronym for the use is desired but should not be - required. - 9. References 9.1. Normative References [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, DOI 10.17487/RFC0793, September 1981, . 
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, @@ -3307,24 +3055,20 @@ [RFC6824] Ford, A., Raiciu, C., Handley, M., and O. Bonaventure, "TCP Extensions for Multipath Operation with Multiple Addresses", RFC 6824, DOI 10.17487/RFC6824, January 2013, . [RFC6897] Scharf, M. and A. Ford, "Multipath TCP (MPTCP) Application Interface Considerations", RFC 6897, DOI 10.17487/RFC6897, March 2013, . - [RFC6994] Touch, J., "Shared Use of Experimental TCP Options", - RFC 6994, DOI 10.17487/RFC6994, August 2013, - . - [RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014, . [TCPLO] Ramaiah, A., "TCP option space extension", Work in Progress, March 2012. Appendix A. Notes on Use of TCP Options The TCP option space is limited due to the length of the Data Offset @@ -3400,121 +3144,292 @@ Finally, there are issues with reliable delivery of options. As options can also be sent on pure ACKs, these are not reliably sent. This is not an issue for DATA_ACK due to their cumulative nature, but may be an issue for ADD_ADDR/REMOVE_ADDR options. Here, it is recommended to send these options redundantly (whether on multiple paths or on the same path on a number of ACKs -- but interspersed with data in order to avoid interpretation as congestion). The cases where options are stripped by middleboxes are discussed in Section 6. -Appendix B. Control Blocks +Appendix B. TCP Fast Open + + TCP Fast Open (TFO) is an experimental TCP extension, described in + [RFC7413], which has been introduced with the objective of gaining + one RTT before transmitting data. This is considered a valuable gain + as very short connections are very common, especially for HTTP + request/response schemes. It achieves this by sending the SYN- + segment together with data and allowing the server to reply + immediately with data after the SYN/ACK. 
[RFC7413] secures this + mechanism, by using a new TCP option that includes a cookie which is + negotiated in a preceding connection. + + When using TCP Fast Open in conjunction with MPTCP, there are two key + points to take into account, detailed hereafter. + +B.1. TFO cookie request with MPTCP + + When a TFO client first connects to a server, it cannot immediately + include data in the SYN for security reasons [RFC7413]. Instead, it + requests a cookie that will be used in subsequent connections. This + is done with the TCP cookie request/response options, of resp. 2 + bytes and 6-18 bytes (depending on the chosen cookie length). + + TFO and MPTCP can be combined provided that the total length of their + options does not exceed the maximum 40 bytes possible in TCP: + + o In the SYN: MPTCP uses a 4-bytes long MP_CAPABLE option. The + MPTCP and TFO options sum up to 6 bytes. With typical TCP-options + using up to 19 bytes in the SYN (24 bytes if options are padded at + a word boundary), there is enough space to combine the MP_CAPABLE + with the TFO Cookie Request. + + o In the SYN+ACK: MPTCP uses a 12-bytes long MP_CAPABLE option, but + now TFO can be as long as 18 bytes. Since the maximum option + length may be exceeded, it is up to the server to solve this by + using a shorter cookie. As an example, if we consider that 19 + bytes are used for classical TCP options, the maximum possible + cookie length would be of 7 bytes. Note that the same limitation + applies to subsequent connections, for the SYN packet (because the + client then echoes back the cookie to the server). Finally, if + the security impact of reducing the cookie size is not deemed + acceptable, the server can reduce the amount of other TCP-options + by omitting the TCP timestamps (as outlined in Appendix A). + +B.2. Data sequence mapping under TFO + + MPTCP uses, in the TCP establishment phase, a key exchange that is + used to generate the Initial Data Sequence Numbers (IDSNs). 
In + particular, the SYN with MP_CAPABLE occupies the first octet of the + data sequence space. With TFO, one way to handle the data sent + together with the SYN would be to consider an implicit DSS mapping + that covers that SYN segment (since there is not enough space in the + SYN to include a DSS option). The problem with that approach is that + if a middlebox modifies the TFO data, this will not be noticed by + MPTCP because of the absence of a DSS-checksum. For example, a TCP + (but not MPTCP)-aware middlebox could insert bytes at the beginning + of the stream and adapt the TCP checksum and sequence numbers + accordingly. With an implicit mapping, this would give the client and + server different views of the DSS-mapping, with no way to detect + this inconsistency as the DSS checksum is not present. + + To solve this, the TFO data should not be considered part of the Data + Sequence Number space: the SYN with MP_CAPABLE still occupies the + first octet of data sequence space, but then the first non-TFO data + byte occupies the second octet. This guarantees that, if the use of + DSS-checksum is negotiated, all data in the data sequence number + space is checksummed. We also note that this does not entail a loss + of functionality, because TFO-data is always sent when only one path + is active. + +B.3. Connection establishment examples + + The following shows a few examples of possible TFO+MPTCP + establishment scenarios. + + Before a client can send data together with the SYN, it must request + a cookie from the server, as shown in Figure 18. This is done + by simply combining the TFO and MPTCP options. + + client server + | | + | S 0(0) , | + | -----------------------------------------------------------> | + | | + | S. 0(0) ack 1 , | + | <----------------------------------------------------------- | + | | + | .
0(0) ack 1 | + | -----------------------------------------------------------> | + | | + + Figure 18: Cookie request + + Once this is done, the received cookie can be used for TFO, as shown + in Figure 19. In this example, the client first sends 20 + bytes in the SYN. The server immediately replies with 100 bytes + following the SYN-ACK upon which the client replies with 20 more + bytes. Note that the last segment in the figure has a TCP sequence + number of 21, while the DSS subflow sequence number is 1 (because the + TFO data is not part of the data sequence number space, as explained + in Appendix B.2). + + client server + | | + | S 0(20) , | + | -----------------------------------------------------------> | + | | + | S. 0(0) ack 21 | + | <----------------------------------------------------------- | + | | + | . 1(100) ack 21 | + | <----------------------------------------------------------- | + | | + | . 21(0) ack 1 | + | -----------------------------------------------------------> | + | | + | . 21(20) ack 101 | + | -----------------------------------------------------------> | + | | + + Figure 19: The server supports TFO + + In Figure 20, the server does not support TFO. The client + detects that no state is created in the server (as no data is acked), + and now sends the MP_CAPABLE in the third ack, in order for the + server to build its MPTCP context at the end of the establishment. + Now, the TFO data, retransmitted, becomes part of the data sequence + mapping because it is effectively sent (in fact re-sent) after the + establishment. + + client server + | | + | S 0(20) , | + | -----------------------------------------------------------> | + | | + | S. 0(0) ack 1 | + | <----------------------------------------------------------- | + | | + | . 1(0) ack 1 | + | -----------------------------------------------------------> | + | | + | . 1(20) ack 1 | + | -----------------------------------------------------------> | + | | + | .
0(0) ack 21 | + | <----------------------------------------------------------- | + | | + + Figure 20: The server does not support TFO + + It is also possible that the server acknowledges only part of the TFO + data, as illustrated in Figure 21. The client will simply + retransmit the missing data together with a DSS-mapping. + + client server + | | + | S 0(1000) , | + | -----------------------------------------------------------> | + | | + | S. 0(0) ack 501 | + | <----------------------------------------------------------- | + | | + | . 501(0) ack 1 | + | -----------------------------------------------------------> | + | | + | . 501(500) ack 1 | + | -----------------------------------------------------------> | + | | + + Figure 21: Partial data acknowledgement + +Appendix C. Control Blocks Conceptually, an MPTCP connection can be represented as an MPTCP control block that contains several variables that track the progress and the state of the MPTCP connection and a set of linked TCP control blocks that correspond to the subflows that have been established. RFC 793 [RFC0793] specifies several state variables. Whenever possible, we reuse the same terminology as RFC 793 to describe the state variables that are maintained by MPTCP. -B.1. MPTCP Control Block +C.1. MPTCP Control Block The MPTCP control block contains the following variables per connection. -B.1.1. Authentication and Metadata +C.1.1. Authentication and Metadata Local.Token (32 bits): This is the token chosen by the local host on this MPTCP connection. The token MUST be unique among all established MPTCP connections, generated from the local key. Local.Key (64 bits): This is the key sent by the local host on this MPTCP connection. Remote.Token (32 bits): This is the token chosen by the remote host on this MPTCP connection, generated from the remote key.
Remote.Key (64 bits): This is the key chosen by the remote host on this MPTCP connection. MPTCP.Checksum (flag): This flag is set to true if at least one of the hosts has set the A bit in the MP_CAPABLE options exchanged during connection establishment, and is set to false otherwise. If this flag is set, the checksum must be computed in all DSS options. -B.1.2. Sending Side +C.1.2. Sending Side SND.UNA (64 bits): This is the data sequence number of the next byte to be acknowledged, at the MPTCP connection level. This variable is updated upon reception of a DSS option containing a DATA_ACK. SND.NXT (64 bits): This is the data sequence number of the next byte to be sent. SND.NXT is used to determine the value of the DSN in the DSS option. SND.WND (32 bits with RFC 1323, 16 bits otherwise): This is the sending window. MPTCP maintains the sending window at the MPTCP connection level and the same window is shared by all subflows. All subflows use the MPTCP connection level SND.WND to compute the SEQ.WND value that is sent in each transmitted segment. -B.1.3. Receiving Side +C.1.3. Receiving Side RCV.NXT (64 bits): This is the data sequence number of the next byte that is expected on the MPTCP connection. This state variable is modified upon reception of in-order data. The value of RCV.NXT is used to specify the DATA_ACK that is sent in the DSS option on all subflows. RCV.WND (32 bits with RFC 1323, 16 bits otherwise): This is the connection-level receive window, which is the maximum of the RCV.WND on all the subflows. -B.2. TCP Control Blocks +C.2. TCP Control Blocks The MPTCP control block also contains a list of the TCP control blocks that are associated with the MPTCP connection. Note that the TCP control block on the TCP subflows does not contain the RCV.WND and SND.WND state variables as these are maintained at the MPTCP connection level and not at the subflow level. Inside each TCP control block, the following state variables are defined. -B.2.1. Sending Side +C.2.1.
Sending Side SND.UNA (32 bits): This is the sequence number of the next byte to be acknowledged on the subflow. This variable is updated upon reception of each TCP acknowledgment on the subflow. SND.NXT (32 bits): This is the sequence number of the next byte to be sent on the subflow. SND.NXT is used to set the value of SEG.SEQ upon transmission of the next segment. -B.2.2. Receiving Side +C.2.2. Receiving Side RCV.NXT (32 bits): This is the sequence number of the next byte that is expected on the subflow. This state variable is modified upon reception of in-order segments. The value of RCV.NXT is copied to the SEG.ACK field of the next segments transmitted on the subflow. RCV.WND (32 bits with RFC 1323, 16 bits otherwise): This is the subflow-level receive window that is updated with the window field from the segments received on this subflow. -Appendix C. Finite State Machine +Appendix D. Finite State Machine - The diagram in Figure 23 shows the Finite State Machine for + The diagram in Figure 22 shows the Finite State Machine for connection-level closure. This illustrates how the DATA_FIN connection-level signal (indicated as the DFIN flag on a DATA_ACK) interacts with subflow-level FINs, and permits "break-before-make" handover between subflows. +---------+ | M_ESTAB | +---------+ M_CLOSE | | rcv DATA_FIN ------- | | ------- @@ -3533,21 +3448,21 @@ | rcv DATA_FIN -------------- | -------------- | | ------- CLOSE all subflows | CLOSE all subflows | | snd DATA_ACK[DFIN] V delete MPTCP PCB V \ +-----------+ +---------+ ------------------------>|M_TIME WAIT|----------------->| M_CLOSED| +-----------+ +---------+ All subflows in CLOSED ------------ delete MPTCP PCB - Figure 23: Finite State Machine for Connection Closure + Figure 22: Finite State Machine for Connection Closure Authors' Addresses Alan Ford Pexip EMail: alan.ford@gmail.com Costin Raiciu University Politehnica of Bucharest