draft-ietf-mptcp-rfc6824bis-10.txt   draft-ietf-mptcp-rfc6824bis-11.txt 
Internet Engineering Task Force A. Ford Internet Engineering Task Force A. Ford
Internet-Draft Pexip Internet-Draft Pexip
Obsoletes: 6824 (if approved) C. Raiciu Obsoletes: 6824 (if approved) C. Raiciu
Intended status: Standards Track U. Politechnica of Bucharest Intended status: Standards Track U. Politechnica of Bucharest
Expires: September 5, 2018 M. Handley Expires: November 16, 2018 M. Handley
U. College London U. College London
O. Bonaventure O. Bonaventure
U. catholique de Louvain U. catholique de Louvain
C. Paasch C. Paasch
Apple, Inc. Apple, Inc.
March 4, 2018 May 15, 2018
TCP Extensions for Multipath Operation with Multiple Addresses TCP Extensions for Multipath Operation with Multiple Addresses
draft-ietf-mptcp-rfc6824bis-10 draft-ietf-mptcp-rfc6824bis-11
Abstract Abstract
TCP/IP communication is currently restricted to a single path per TCP/IP communication is currently restricted to a single path per
connection, yet multiple paths often exist between peers. The connection, yet multiple paths often exist between peers. The
simultaneous use of these multiple paths for a TCP/IP session would simultaneous use of these multiple paths for a TCP/IP session would
improve resource usage within the network and, thus, improve user improve resource usage within the network and, thus, improve user
experience through higher throughput and improved resilience to experience through higher throughput and improved resilience to
network failure. network failure.
skipping to change at page 2, line 7 skipping to change at page 2, line 7
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 5, 2018. This Internet-Draft will expire on November 16, 2018.
Copyright Notice Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 3, line 10 skipping to change at page 3, line 10
3.3.4. Receiver Considerations . . . . . . . . . . . . . . . 32 3.3.4. Receiver Considerations . . . . . . . . . . . . . . . 32
3.3.5. Sender Considerations . . . . . . . . . . . . . . . . 33 3.3.5. Sender Considerations . . . . . . . . . . . . . . . . 33
3.3.6. Reliability and Retransmissions . . . . . . . . . . . 34 3.3.6. Reliability and Retransmissions . . . . . . . . . . . 34
3.3.7. Congestion Control Considerations . . . . . . . . . . 35 3.3.7. Congestion Control Considerations . . . . . . . . . . 35
3.3.8. Subflow Policy . . . . . . . . . . . . . . . . . . . 36 3.3.8. Subflow Policy . . . . . . . . . . . . . . . . . . . 36
3.4. Address Knowledge Exchange (Path Management) . . . . . . 37 3.4. Address Knowledge Exchange (Path Management) . . . . . . 37
3.4.1. Address Advertisement . . . . . . . . . . . . . . . . 38 3.4.1. Address Advertisement . . . . . . . . . . . . . . . . 38
3.4.2. Remove Address . . . . . . . . . . . . . . . . . . . 42 3.4.2. Remove Address . . . . . . . . . . . . . . . . . . . 42
3.5. Fast Close . . . . . . . . . . . . . . . . . . . . . . . 43 3.5. Fast Close . . . . . . . . . . . . . . . . . . . . . . . 43
3.6. Subflow Reset . . . . . . . . . . . . . . . . . . . . . . 44 3.6. Subflow Reset . . . . . . . . . . . . . . . . . . . . . . 44
3.7. MPTCP Experimental Option . . . . . . . . . . . . . . . . 46 3.7. Fallback . . . . . . . . . . . . . . . . . . . . . . . . 46
3.8. Fallback . . . . . . . . . . . . . . . . . . . . . . . . 47 3.8. Error Handling . . . . . . . . . . . . . . . . . . . . . 50
3.9. Error Handling . . . . . . . . . . . . . . . . . . . . . 51 3.9. Heuristics . . . . . . . . . . . . . . . . . . . . . . . 50
3.10. Heuristics . . . . . . . . . . . . . . . . . . . . . . . 52 3.9.1. Port Usage . . . . . . . . . . . . . . . . . . . . . 51
3.10.1. Port Usage . . . . . . . . . . . . . . . . . . . . . 52 3.9.2. Delayed Subflow Start and Subflow Symmetry . . . . . 51
3.10.2. Delayed Subflow Start and Subflow Symmetry . . . . . 52 3.9.3. Failure Handling . . . . . . . . . . . . . . . . . . 52
3.10.3. Failure Handling . . . . . . . . . . . . . . . . . . 53 4. Semantic Issues . . . . . . . . . . . . . . . . . . . . . . . 53
3.11. TCP Fast Open . . . . . . . . . . . . . . . . . . . . . . 54 5. Security Considerations . . . . . . . . . . . . . . . . . . . 54
3.11.1. TFO cookie request with MPTCP . . . . . . . . . . . 54 6. Interactions with Middleboxes . . . . . . . . . . . . . . . . 57
3.11.2. Data sequence mapping under TFO . . . . . . . . . . 55 7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 60
3.11.3. Connection establishment examples . . . . . . . . . 56 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 60
4. Semantic Issues . . . . . . . . . . . . . . . . . . . . . . . 58 8.1. MPTCP Option Subtypes . . . . . . . . . . . . . . . . . . 61
5. Security Considerations . . . . . . . . . . . . . . . . . . . 59 8.2. MPTCP Handshake Algorithms . . . . . . . . . . . . . . . 62
6. Interactions with Middleboxes . . . . . . . . . . . . . . . . 62 8.3. MP_TCPRST Reason Codes . . . . . . . . . . . . . . . . . 62
7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 65 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 63
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 65 9.1. Normative References . . . . . . . . . . . . . . . . . . 63
8.1. MPTCP Option Subtypes . . . . . . . . . . . . . . . . . . 66 9.2. Informative References . . . . . . . . . . . . . . . . . 63
8.2. MPTCP Handshake Algorithms . . . . . . . . . . . . . . . 67 Appendix A. Notes on Use of TCP Options . . . . . . . . . . . . 67
8.3. MP_TCPRST Reason Codes . . . . . . . . . . . . . . . . . 67 Appendix B. TCP Fast Open . . . . . . . . . . . . . . . . . . . 68
8.4. Experimental option registry . . . . . . . . . . . . . . 68 B.1. TFO cookie request with MPTCP . . . . . . . . . . . . . . 69
9. References . . . . . . . . . . . . . . . . . . . . . . . . . 68 B.2. Data sequence mapping under TFO . . . . . . . . . . . . . 69
9.1. Normative References . . . . . . . . . . . . . . . . . . 68 B.3. Connection establishment examples . . . . . . . . . . . . 70
9.2. Informative References . . . . . . . . . . . . . . . . . 69 Appendix C. Control Blocks . . . . . . . . . . . . . . . . . . . 72
Appendix A. Notes on Use of TCP Options . . . . . . . . . . . . 72 C.1. MPTCP Control Block . . . . . . . . . . . . . . . . . . . 72
Appendix B. Control Blocks . . . . . . . . . . . . . . . . . . . 73 C.1.1. Authentication and Metadata . . . . . . . . . . . . . 72
B.1. MPTCP Control Block . . . . . . . . . . . . . . . . . . . 74 C.1.2. Sending Side . . . . . . . . . . . . . . . . . . . . 73
B.1.1. Authentication and Metadata . . . . . . . . . . . . . 74 C.1.3. Receiving Side . . . . . . . . . . . . . . . . . . . 73
B.1.2. Sending Side . . . . . . . . . . . . . . . . . . . . 74 C.2. TCP Control Blocks . . . . . . . . . . . . . . . . . . . 73
B.1.3. Receiving Side . . . . . . . . . . . . . . . . . . . 74 C.2.1. Sending Side . . . . . . . . . . . . . . . . . . . . 74
B.2. TCP Control Blocks . . . . . . . . . . . . . . . . . . . 75 C.2.2. Receiving Side . . . . . . . . . . . . . . . . . . . 74
B.2.1. Sending Side . . . . . . . . . . . . . . . . . . . . 75 Appendix D. Finite State Machine . . . . . . . . . . . . . . . . 74
B.2.2. Receiving Side . . . . . . . . . . . . . . . . . . . 75 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 75
Appendix C. Finite State Machine . . . . . . . . . . . . . . . . 75
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 76
1. Introduction 1. Introduction
Multipath TCP (MPTCP) is a set of extensions to regular TCP [RFC0793] Multipath TCP (MPTCP) is a set of extensions to regular TCP [RFC0793]
to provide a Multipath TCP [RFC6182] service, which enables a to provide a Multipath TCP [RFC6182] service, which enables a
transport connection to operate across multiple paths simultaneously. transport connection to operate across multiple paths simultaneously.
This document presents the protocol changes required to add multipath This document presents the protocol changes required to add multipath
capability to TCP; specifically, those for signaling and setting up capability to TCP; specifically, those for signaling and setting up
multiple paths ("subflows"), managing these subflows, reassembly of multiple paths ("subflows"), managing these subflows, reassembly of
data, and termination of sessions. This is not the only information data, and termination of sessions. This is not the only information
skipping to change at page 12, line 42 skipping to change at page 12, line 42
o MPTCP falls back to ordinary TCP if MPTCP operation is not o MPTCP falls back to ordinary TCP if MPTCP operation is not
possible, for example, if one host is not MPTCP capable or if a possible, for example, if one host is not MPTCP capable or if a
middlebox alters the payload. middlebox alters the payload.
o To meet the threats identified in [RFC6181], the following steps o To meet the threats identified in [RFC6181], the following steps
are taken: keys are sent in the clear in the MP_CAPABLE messages; are taken: keys are sent in the clear in the MP_CAPABLE messages;
MP_JOIN messages are secured with HMAC-SHA256 ([RFC2104], [SHS]) MP_JOIN messages are secured with HMAC-SHA256 ([RFC2104], [SHS])
using those keys; and standard TCP validity checks are made on the using those keys; and standard TCP validity checks are made on the
other messages (ensuring sequence numbers are in-window other messages (ensuring sequence numbers are in-window
[RFC5961]). [RFC5961]). Further information can be found in Section 5.
3. MPTCP Protocol 3. MPTCP Protocol
This section describes the operation of the MPTCP protocol, and is This section describes the operation of the MPTCP protocol, and is
subdivided into sections for each key part of the protocol operation. subdivided into sections for each key part of the protocol operation.
All MPTCP operations are signaled using optional TCP header fields. All MPTCP operations are signaled using optional TCP header fields.
A single TCP option number ("Kind") has been assigned by IANA for A single TCP option number ("Kind") has been assigned by IANA for
MPTCP (see Section 8), and then individual messages will be MPTCP (see Section 8), and then individual messages will be
determined by a "subtype", the values of which are also stored in an determined by a "subtype", the values of which are also stored in an
skipping to change at page 17, line 33 skipping to change at page 17, line 33
Similar situations could occur when the MP_CAPABLE with data is lost Similar situations could occur when the MP_CAPABLE with data is lost
and retransmitted. Furthermore, in the case of TCP Segmentation and retransmitted. Furthermore, in the case of TCP Segmentation
Offloading, the MP_CAPABLE with data parameters may be duplicated Offloading, the MP_CAPABLE with data parameters may be duplicated
across multiple packets, and implementations must also be able to across multiple packets, and implementations must also be able to
cope with duplicate MP_CAPABLE mappings as well as duplicate DSS cope with duplicate MP_CAPABLE mappings as well as duplicate DSS
mappings. mappings.
Additionally, the MP_CAPABLE exchange allows the safe passage of Additionally, the MP_CAPABLE exchange allows the safe passage of
MPTCP options on SYN packets to be determined. If any of these MPTCP options on SYN packets to be determined. If any of these
options are dropped, MPTCP will gracefully fall back to regular options are dropped, MPTCP will gracefully fall back to regular
single-path TCP, as documented in Section 3.8. Note that new single-path TCP, as documented in Section 3.7. If at any point in
subflows MUST NOT be established (using the process documented in the handshake either party thinks the MPTCP negotiation is
Section 3.2) until a Data Sequence Signal (DSS) option has been compromised, for example by a middlebox corrupting the TCP options,
successfully received across the path (as documented in Section 3.3). or unexpected ACK numbers being present, the host MUST stop using
MPTCP and no longer include MPTCP options in future TCP packets. The
other host will then also fall back to regular TCP using the fallback
mechanism. Note that new subflows MUST NOT be established
(using the process documented in Section 3.2) until a Data Sequence
Signal (DSS) option has been successfully received across the path
(as documented in Section 3.3).
The first 4 bits of the first octet in the MP_CAPABLE option The first 4 bits of the first octet in the MP_CAPABLE option
(Figure 4) define the MPTCP option subtype (see Section 8; for (Figure 4) define the MPTCP option subtype (see Section 8; for
MP_CAPABLE, this is 0), and the remaining 4 bits of this octet MP_CAPABLE, this is 0), and the remaining 4 bits of this octet
specify the MPTCP version in use (for this specification, this is 1). specify the MPTCP version in use (for this specification, this is 1).
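As an illustration of the subtype/version split described above, the following Python sketch packs and unpacks that octet. The function names are hypothetical, and the sketch assumes the usual RFC bit numbering, in which the "first" 4 bits are the most significant ones.

```python
from typing import Tuple

MP_CAPABLE_SUBTYPE = 0  # subtype value given in the text for MP_CAPABLE
MPTCP_VERSION = 1       # version value given in the text for this specification


def build_first_octet(subtype: int, version: int) -> int:
    """Pack a 4-bit subtype and a 4-bit version into one octet."""
    if not 0 <= subtype <= 0xF or not 0 <= version <= 0xF:
        raise ValueError("subtype and version must each fit in 4 bits")
    # Assumption: the first 4 bits are the high-order nibble.
    return (subtype << 4) | version


def parse_first_octet(octet: int) -> Tuple[int, int]:
    """Split one octet into (subtype, version)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must fit in 8 bits")
    return octet >> 4, octet & 0x0F
```

Under these assumptions, an MP_CAPABLE option for this version of the protocol would carry the octet value returned by `build_first_octet(MP_CAPABLE_SUBTYPE, MPTCP_VERSION)`.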
The second octet is reserved for flags, allocated as follows: The second octet is reserved for flags, allocated as follows:
A: The leftmost bit, labeled "A", SHOULD be set to 1 to indicate A: The leftmost bit, labeled "A", SHOULD be set to 1 to indicate
"Checksum Required", unless the system administrator has decided "Checksum Required", unless the system administrator has decided
skipping to change at page 19, line 40 skipping to change at page 19, line 46
If a SYN contains an MP_CAPABLE option but the SYN/ACK does not, it If a SYN contains an MP_CAPABLE option but the SYN/ACK does not, it
is assumed that the passive opener is not multipath capable; thus, is assumed that the passive opener is not multipath capable; thus,
the MPTCP session MUST operate as a regular, single-path TCP. If a the MPTCP session MUST operate as a regular, single-path TCP. If a
SYN does not contain an MP_CAPABLE option, the SYN/ACK MUST NOT SYN does not contain an MP_CAPABLE option, the SYN/ACK MUST NOT
contain one in response. If the third packet (the ACK) does not contain one in response. If the third packet (the ACK) does not
contain the MP_CAPABLE option, then the session MUST fall back to contain the MP_CAPABLE option, then the session MUST fall back to
operating as a regular, single-path TCP. This is to maintain operating as a regular, single-path TCP. This is to maintain
compatibility with middleboxes on the path that drop some or all TCP compatibility with middleboxes on the path that drop some or all TCP
options. Note that an implementation MAY choose to attempt sending options. Note that an implementation MAY choose to attempt sending
MPTCP options more than one time before making this decision to MPTCP options more than one time before making this decision to
operate as regular TCP (see Section 3.10). operate as regular TCP (see Section 3.9).
If the SYN packets are unacknowledged, it is up to local policy to If the SYN packets are unacknowledged, it is up to local policy to
decide how to respond. It is expected that a sender will eventually decide how to respond. It is expected that a sender will eventually
fall back to single-path TCP (i.e., without the MP_CAPABLE option) in fall back to single-path TCP (i.e., without the MP_CAPABLE option) in
order to work around middleboxes that may drop packets with unknown order to work around middleboxes that may drop packets with unknown
options; however, the number of multipath-capable attempts that are options; however, the number of multipath-capable attempts that are
made first will be up to local policy. It is possible that MPTCP and made first will be up to local policy. It is possible that MPTCP and
non-MPTCP SYNs could get reordered in the network. Therefore, the non-MPTCP SYNs could get reordered in the network. Therefore, the
final state is inferred from the presence or absence of the final state is inferred from the presence or absence of the
MP_CAPABLE option in the third packet of the TCP handshake. If this MP_CAPABLE option in the third packet of the TCP handshake. If this
option is not present, the connection SHOULD fall back to regular option is not present, the connection SHOULD fall back to regular
TCP, as documented in Section 3.8. TCP, as documented in Section 3.7.
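The handshake rules above can be summarized, as a rough sketch only, by a function that inspects which of the three handshake packets carried MP_CAPABLE. The names `Mode` and `negotiated_mode` are hypothetical. The sketch deliberately ignores retries, SYN reordering, and local policy, all of which the text leaves to the implementation.

```python
from enum import Enum


class Mode(Enum):
    MPTCP = "mptcp"
    REGULAR_TCP = "regular-tcp"


def negotiated_mode(syn_has_mpc: bool, synack_has_mpc: bool,
                    ack_has_mpc: bool) -> Mode:
    """Infer the final connection mode from MP_CAPABLE presence.

    Each flag says whether the corresponding handshake packet, as
    observed by the endpoint, carried an MP_CAPABLE option.
    """
    if not syn_has_mpc:
        # No MP_CAPABLE on the SYN: the SYN/ACK must not carry one either,
        # so the connection is a regular TCP connection.
        return Mode.REGULAR_TCP
    if not synack_has_mpc:
        # The passive opener is assumed not to be multipath capable
        # (or a middlebox stripped the option).
        return Mode.REGULAR_TCP
    if not ack_has_mpc:
        # The final state is inferred from the third packet of the handshake.
        return Mode.REGULAR_TCP
    return Mode.MPTCP
```

Only a handshake in which all three packets carry the option ends in multipath operation; every other combination resolves to single-path TCP in this simplified model.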
The initial data sequence number on an MPTCP connection is generated The initial data sequence number on an MPTCP connection is generated
from the key. The algorithm for IDSN generation is also determined from the key. The algorithm for IDSN generation is also determined
from the negotiated authentication algorithm. In this specification, from the negotiated authentication algorithm. In this specification,
with only the SHA-256 algorithm specified and selected, the IDSN of a with only the SHA-256 algorithm specified and selected, the IDSN of a
host MUST be the least significant 64 bits of the SHA-256 hash of its host MUST be the least significant 64 bits of the SHA-256 hash of its
key, i.e., IDSN-A = Hash(Key-A) and IDSN-B = Hash(Key-B). This key, i.e., IDSN-A = Hash(Key-A) and IDSN-B = Hash(Key-B). This
deterministic generation of the IDSN allows a receiver to ensure that deterministic generation of the IDSN allows a receiver to ensure that
there are no gaps in sequence space at the start of the connection. there are no gaps in sequence space at the start of the connection.
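A minimal Python sketch of the IDSN derivation follows. It assumes the SHA-256 digest is read as a big-endian integer, so that its least significant 64 bits are the final 8 octets of the digest. The function name `idsn_from_key` is hypothetical.

```python
import hashlib


def idsn_from_key(key: bytes) -> int:
    """Return the least significant 64 bits of SHA-256(key) as an integer."""
    digest = hashlib.sha256(key).digest()  # 32 octets
    # Assumption: big-endian interpretation, so the low-order 64 bits
    # are the last 8 octets of the digest.
    return int.from_bytes(digest[-8:], "big")
```

Because the derivation is deterministic, both hosts can compute IDSN-A and IDSN-B from the exchanged keys without any further signaling.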
The SYN with MP_CAPABLE occupies the first octet of data sequence The SYN with MP_CAPABLE occupies the first octet of data sequence
skipping to change at page 20, line 29 skipping to change at page 20, line 33
3.2. Starting a New Subflow 3.2. Starting a New Subflow
Once an MPTCP connection has begun with the MP_CAPABLE exchange, Once an MPTCP connection has begun with the MP_CAPABLE exchange,
further subflows can be added to the connection. Hosts have further subflows can be added to the connection. Hosts have
knowledge of their own address(es), and can become aware of the other knowledge of their own address(es), and can become aware of the other
host's addresses through signaling exchanges as described in host's addresses through signaling exchanges as described in
Section 3.4. Using this knowledge, a host can initiate a new subflow Section 3.4. Using this knowledge, a host can initiate a new subflow
over a currently unused pair of addresses. It is permitted for over a currently unused pair of addresses. It is permitted for
either host in a connection to initiate the creation of a new either host in a connection to initiate the creation of a new
subflow, but it is expected that this will normally be the original subflow, but it is expected that this will normally be the original
connection initiator (see Section 3.10 for heuristics). connection initiator (see Section 3.9 for heuristics).
A new subflow is started as a normal TCP SYN/ACK exchange. The Join A new subflow is started as a normal TCP SYN/ACK exchange. The Join
Connection (MP_JOIN) MPTCP option is used to identify the connection Connection (MP_JOIN) MPTCP option is used to identify the connection
to be joined by the new subflow. It uses keying material that was to be joined by the new subflow. It uses keying material that was
exchanged in the initial MP_CAPABLE handshake (Section 3.1), and that exchanged in the initial MP_CAPABLE handshake (Section 3.1), and that
handshake also negotiates the crypto algorithm in use for the MP_JOIN handshake also negotiates the crypto algorithm in use for the MP_JOIN
handshake. handshake.
This section specifies the behavior of MP_JOIN using the HMAC-SHA256 This section specifies the behavior of MP_JOIN using the HMAC-SHA256
algorithm. An MP_JOIN option is present in the SYN, SYN/ACK, and ACK algorithm. An MP_JOIN option is present in the SYN, SYN/ACK, and ACK
skipping to change at page 25, line 20 skipping to change at page 25, line 20
MP_JOIN is stripped from the SYN on the path from A to B, and Host B MP_JOIN is stripped from the SYN on the path from A to B, and Host B
does not have a passive opener on the relevant port, it will respond does not have a passive opener on the relevant port, it will respond
with a RST in the normal way. If in response to a SYN with an with a RST in the normal way. If in response to a SYN with an
MP_JOIN option, a SYN/ACK is received without the MP_JOIN option MP_JOIN option, a SYN/ACK is received without the MP_JOIN option
(either since it was stripped on the return path, or it was stripped (either since it was stripped on the return path, or it was stripped
on the outgoing path but the passive opener on Host B responded as if on the outgoing path but the passive opener on Host B responded as if
it were a new regular TCP session), then the subflow is unusable and it were a new regular TCP session), then the subflow is unusable and
Host A MUST close it with a RST. Host A MUST close it with a RST.
Note that additional subflows can be created between any pair of Note that additional subflows can be created between any pair of
ports (but see Section 3.10 for heuristics); no explicit application- ports (but see Section 3.9 for heuristics); no explicit application-
level accept calls or bind calls are required to open additional level accept calls or bind calls are required to open additional
subflows. To associate a new subflow with an existing connection, subflows. To associate a new subflow with an existing connection,
the token supplied in the subflow's SYN exchange is used for the token supplied in the subflow's SYN exchange is used for
demultiplexing. This then binds the 5-tuple of the TCP subflow to demultiplexing. This then binds the 5-tuple of the TCP subflow to
the local token of the connection. A consequence is that it is the local token of the connection. A consequence is that it is
possible to allow any port pairs to be used for a connection. possible to allow any port pairs to be used for a connection.
Demultiplexing subflow SYNs MUST be done using the token; this is Demultiplexing subflow SYNs MUST be done using the token; this is
unlike traditional TCP, where the destination port is used for unlike traditional TCP, where the destination port is used for
demultiplexing SYN packets. Once a subflow is set up, demultiplexing demultiplexing SYN packets. Once a subflow is set up, demultiplexing
skipping to change at page 28, line 7 skipping to change at page 28, line 7
the subflow sequence numbering is relative (the SYN at the start of the subflow sequence numbering is relative (the SYN at the start of
the subflow has relative subflow sequence number 0). This is to the subflow has relative subflow sequence number 0). This is to
allow middleboxes to change the initial sequence number of a subflow, allow middleboxes to change the initial sequence number of a subflow,
such as firewalls that undertake ISN randomization. such as firewalls that undertake ISN randomization.
The data sequence mapping also contains a checksum of the data that The data sequence mapping also contains a checksum of the data that
this mapping covers, if use of checksums has been negotiated at the this mapping covers, if use of checksums has been negotiated at the
MP_CAPABLE exchange. Checksums are used to detect if the payload has MP_CAPABLE exchange. Checksums are used to detect if the payload has
been adjusted in any way by a non-MPTCP-aware middlebox. If this been adjusted in any way by a non-MPTCP-aware middlebox. If this
checksum fails, it will trigger a failure of the subflow, or a checksum fails, it will trigger a failure of the subflow, or a
fallback to regular TCP, as documented in Section 3.8, since MPTCP fallback to regular TCP, as documented in Section 3.7, since MPTCP
can no longer reliably know the subflow sequence space at the can no longer reliably know the subflow sequence space at the
receiver to build data sequence mappings. receiver to build data sequence mappings.
The checksum algorithm used is the standard TCP checksum [RFC0793], The checksum algorithm used is the standard TCP checksum [RFC0793],
operating over the data covered by this mapping, along with a pseudo- operating over the data covered by this mapping, along with a pseudo-
header as shown in Figure 10. header as shown in Figure 10.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+--------------------------------------------------------------+ +--------------------------------------------------------------+
skipping to change at page 30, line 10 skipping to change at page 30, line 10
A data sequence mapping does not need to be included in every MPTCP A data sequence mapping does not need to be included in every MPTCP
packet, as long as the subflow sequence space in that packet is packet, as long as the subflow sequence space in that packet is
covered by a mapping known at the receiver. This can be used to covered by a mapping known at the receiver. This can be used to
reduce overhead in cases where the mapping is known in advance; one reduce overhead in cases where the mapping is known in advance; one
such case is when there is a single subflow between the hosts, such case is when there is a single subflow between the hosts,
another is when segments of data are scheduled in larger than packet- another is when segments of data are scheduled in larger than packet-
sized chunks. sized chunks.
An "infinite" mapping can be used to fall back to regular TCP by An "infinite" mapping can be used to fall back to regular TCP by
mapping the subflow-level data to the connection-level data for the mapping the subflow-level data to the connection-level data for the
remainder of the connection (see Section 3.8). This is achieved by remainder of the connection (see Section 3.7). This is achieved by
setting the Data-Level Length field of the DSS option to the reserved setting the Data-Level Length field of the DSS option to the reserved
value of 0. The checksum, in such a case, will also be set to zero. value of 0. The checksum, in such a case, will also be set to zero.
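The reserved-value convention for the "infinite" mapping can be sketched as follows. The class and field names are hypothetical and do not reflect the wire layout of the DSS option. Only the two rules stated above (Data-Level Length of 0, checksum of zero) are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSequenceMapping:
    data_seq: int           # connection-level (data) sequence number
    subflow_seq: int        # relative subflow sequence number
    data_level_length: int  # 0 is reserved for the "infinite" mapping
    checksum: int = 0

    def is_infinite(self) -> bool:
        """True if this mapping covers the remainder of the connection."""
        return self.data_level_length == 0


def make_infinite_mapping(data_seq: int, subflow_seq: int) -> DataSequenceMapping:
    """Build a mapping with the reserved length 0 and a zero checksum."""
    return DataSequenceMapping(data_seq, subflow_seq,
                               data_level_length=0, checksum=0)
```

A receiver written along these lines would test `is_infinite()` before applying any length-based bookkeeping, since a length of 0 does not mean "no data" here.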
3.3.2. Data Acknowledgments 3.3.2. Data Acknowledgments
To provide full end-to-end resilience, MPTCP provides a connection- To provide full end-to-end resilience, MPTCP provides a connection-
level acknowledgment, to act as a cumulative ACK for the connection level acknowledgment, to act as a cumulative ACK for the connection
as a whole. This is the "Data ACK" field of the DSS option as a whole. This is the "Data ACK" field of the DSS option
(Figure 9). The Data ACK is analogous to the behavior of the (Figure 9). The Data ACK is analogous to the behavior of the
standard TCP cumulative ACK -- indicating how much data has been standard TCP cumulative ACK -- indicating how much data has been
skipping to change at page 32, line 36 skipping to change at page 32, line 36
A connection is considered closed once both hosts' DATA_FINs have A connection is considered closed once both hosts' DATA_FINs have
been acknowledged by DATA_ACKs. been acknowledged by DATA_ACKs.
As specified above, a standard TCP FIN on an individual subflow only As specified above, a standard TCP FIN on an individual subflow only
shuts down the subflow on which it was sent. If all subflows have shuts down the subflow on which it was sent. If all subflows have
been closed with a FIN exchange, but no DATA_FIN has been received been closed with a FIN exchange, but no DATA_FIN has been received
and acknowledged, the MPTCP connection is treated as closed only and acknowledged, the MPTCP connection is treated as closed only
after a timeout. This implies that an implementation will have after a timeout. This implies that an implementation will have
TIME_WAIT states at both the subflow and connection levels (see TIME_WAIT states at both the subflow and connection levels (see
Appendix C). This permits "break-before-make" scenarios where Appendix D). This permits "break-before-make" scenarios where
connectivity is lost on all subflows before a new one can be re- connectivity is lost on all subflows before a new one can be re-
established. established.
3.3.4. Receiver Considerations 3.3.4. Receiver Considerations
Regular TCP advertises a receive window in each packet, telling the Regular TCP advertises a receive window in each packet, telling the
sender how much data the receiver is willing to accept past the sender how much data the receiver is willing to accept past the
cumulative ACK. The receive window is used to implement flow cumulative ACK. The receive window is used to implement flow
control, throttling down fast senders when receivers cannot keep up. control, throttling down fast senders when receivers cannot keep up.
skipping to change at page 39, line 35 skipping to change at page 39, line 35
The 2 octets that specify the TCP port number to use are optional and The 2 octets that specify the TCP port number to use are optional and
their presence can be inferred from the length of the option. their presence can be inferred from the length of the option.
Although it is expected that the majority of use cases will use the Although it is expected that the majority of use cases will use the
same port pairs as used for the initial subflow (e.g., port 80 same port pairs as used for the initial subflow (e.g., port 80
remains port 80 on all subflows, as does the ephemeral port at the remains port 80 on all subflows, as does the ephemeral port at the
client), there may be cases (such as port-based load balancing) where client), there may be cases (such as port-based load balancing) where
the explicit specification of a different port is required. If no the explicit specification of a different port is required. If no
port is specified, MPTCP SHOULD attempt to connect to the specified port is specified, MPTCP SHOULD attempt to connect to the specified
address on the same port as is already in use by the subflow on which address on the same port as is already in use by the subflow on which
the ADD_ADDR signal was sent; this is discussed in more detail in the ADD_ADDR signal was sent; this is discussed in more detail in
Section 3.10. Section 3.9.
The Truncated HMAC present in this Option is the rightmost 64 bits of The Truncated HMAC present in this Option is the rightmost 64 bits of
an HMAC, negotiated and calculated in the same way as for MP_JOIN as an HMAC, negotiated and calculated in the same way as for MP_JOIN as
described in Section 3.2. For this specification of MPTCP, as there described in Section 3.2. For this specification of MPTCP, as there
is only one hash algorithm option specified, this will be HMAC as is only one hash algorithm option specified, this will be HMAC as
defined in [RFC2104], using the SHA-256 hash algorithm [SHS], defined in [RFC2104], using the SHA-256 hash algorithm [SHS],
implemented as in [RFC6234]. In the same way as for MP_JOIN, the key implemented as in [RFC6234]. In the same way as for MP_JOIN, the key
for the HMAC algorithm, in the case of the message transmitted by for the HMAC algorithm, in the case of the message transmitted by
Host A, will be Key-A followed by Key-B, and in the case of Host B, Host A, will be Key-A followed by Key-B, and in the case of Host B,
Key-B followed by Key-A. These are the keys that were exchanged in Key-B followed by Key-A. These are the keys that were exchanged in
skipping to change at page 46, line 24 skipping to change at page 46, line 24
reset and start again than it is to retransmit the queued data. reset and start again than it is to retransmit the queued data.
o Unacceptable performance (code 0x05). This code indicates that o Unacceptable performance (code 0x05). This code indicates that
the performance of this subflow was too low compared to the other the performance of this subflow was too low compared to the other
subflows of this Multipath TCP connection. subflows of this Multipath TCP connection.
o Middlebox interference (code 0x06). Middlebox interference has o Middlebox interference (code 0x06). Middlebox interference has
been detected over this subflow making MPTCP signaling invalid. been detected over this subflow making MPTCP signaling invalid.
For example, this may be sent if the checksum does not validate. For example, this may be sent if the checksum does not validate.
3.7. MPTCP Experimental Option 3.7. Fallback
In order to provide a structured identity and negotiation mechanism
for private experimental MPTCP extensions, the MP_EXPERIMENTAL option
has been reserved.
1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+-------+---------------+
| Kind | Length |Subtype|S|U|rsv| Experiment |
+---------------+---------------+-------+-------+---------------+
| Id. (16 bits) | Subtype-specific data (variable length) ...
+----------------------------------------------------------- ...
Figure 16: MPTCP Experimental (MP_EXPERIMENTAL) Option
Figure 16 shows the format of the experimental option. The
Experiment identifier is a 16-bit integer that shall be assigned by
using the same procedure as defined in [RFC6994]; a request to IANA
is made in Section 8.4.
The two high order flags that are included in the MPTCP Experimental
option have the following semantics:
o "S" flag (highest order bit) : This is the synchronising bit.
When set to 1, it indicates that the host sending this option
expects a reply from the remote host with an option having the
same experiment identifier, but possibly containing other data.
o "U" flag (second highest order bit) : When set to 1, this flag
indicates that the experimental option was received by the sending
host but it was unable to parse it.
The two low order flags are currently reserved for further use. They
MUST be set to zero when sending and ignored upon reception.
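As an illustration of the layout in Figure 16, the sketch below decodes the fixed part of the option. It assumes the bit positions shown in the figure (4-bit subtype, then S, U, and two reserved bits in the third octet, followed by a 16-bit Experiment Id in network byte order). The class and function names are hypothetical, and the sketch does not implement the S/U negotiation itself.

```python
import struct
from typing import NamedTuple


class MpExperimental(NamedTuple):
    kind: int
    length: int
    subtype: int
    sync: bool           # "S" flag
    unparsed: bool       # "U" flag
    experiment_id: int
    data: bytes          # subtype-specific data, left uninterpreted


def parse_mp_experimental(option: bytes) -> MpExperimental:
    """Decode an MP_EXPERIMENTAL option laid out as in Figure 16."""
    if len(option) < 5:
        raise ValueError("option shorter than the fixed 5-octet header")
    kind, length, sub_flags, experiment_id = struct.unpack_from("!BBBH", option)
    if length != len(option):
        raise ValueError("Length field does not match the option size")
    return MpExperimental(
        kind=kind,
        length=length,
        subtype=sub_flags >> 4,
        sync=bool(sub_flags & 0x8),      # highest-order flag bit
        unparsed=bool(sub_flags & 0x4),  # second-highest flag bit
        experiment_id=experiment_id,
        data=option[5:],                 # reserved bits (0x3) are ignored
    )
```

The two reserved bits are masked out rather than checked, matching the requirement that they be ignored upon reception.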
To use the Experimental MPTCP option with a given experiment
identifier over an MPTCP connection, the sending host must first
verify the ability of the remote host to support this particular
Experimental option. For this, it first sends in any valid TCP
segment, including a duplicate acknowledgement, an Experimental MPTCP
option with the "S" flag set. Upon reception of this option, the
receiving host will verify whether it supports it. If yes, it shall
return a TCP segment that contains the experimental option with the
same identifier and the "S" and the "U" flags both set to 1. This
option may contain additional data depending on the semantics of the
extension. If the receiving host does not recognise the experimental
option that it has received, it shall return a TCP segment that
contains the received experimental option with the "S" flag set to 0
and the "U" flag set to 1.
If a host receives an Experimental MPTCP option with the "U" flag set
to 0 which it does not support, or which contains information that
the host cannot parse, it shall return the exact option that it
received with the "U" flag set to 1 to indicate the error to the
remote host. If an invalid option is received with the "U" flag set
to 0, it must be silently discarded.
Future documents specifying new experimental MPTCP options should
specify the exact semantics of the Subtype-specific data and whether
additional validation operations are to be followed at both sides.
It should be noted that data can be included in an experimental
option concurrently with the capability check (S/U).
3.8. Fallback
Sometimes, middleboxes will exist on a path that could prevent the Sometimes, middleboxes will exist on a path that could prevent the
operation of MPTCP. MPTCP has been designed in order to cope with operation of MPTCP. MPTCP has been designed in order to cope with
many middlebox modifications (see Section 6), but there are still many middlebox modifications (see Section 6), but there are still
some cases where a subflow could fail to operate within the MPTCP some cases where a subflow could fail to operate within the MPTCP
requirements. These cases are notably the following: the loss of requirements. These cases are notably the following: the loss of
MPTCP options on a path and the modification of payload data. If MPTCP options on a path and the modification of payload data. If
such an event occurs, it is necessary to "fall back" to the previous, such an event occurs, it is necessary to "fall back" to the previous,
safe operation. This may be either falling back to regular TCP or safe operation. This may be either falling back to regular TCP or
removing a problematic subflow. removing a problematic subflow.
skipping to change at page 50, line 4 skipping to change at page 48, line 36
tampered with. tampered with.
When multiple subflows are in use, the data in flight on a subflow When multiple subflows are in use, the data in flight on a subflow
will likely involve data that is not contiguously part of the will likely involve data that is not contiguously part of the
connection-level stream, since segments will be spread across the connection-level stream, since segments will be spread across the
multiple subflows. Due to the problems identified above, it is not multiple subflows. Due to the problems identified above, it is not
possible to determine what the adjustment has done to the data possible to determine what the adjustment has done to the data
(notably, any changes to the subflow sequence numbering). Therefore, (notably, any changes to the subflow sequence numbering). Therefore,
it is not possible to recover the subflow, and the affected subflow it is not possible to recover the subflow, and the affected subflow
must be immediately closed with a RST, featuring an MP_FAIL option must be immediately closed with a RST, featuring an MP_FAIL option
(Figure 17), which defines the data sequence number at the start of (Figure 16), which defines the data sequence number at the start of
the segment (defined by the data sequence mapping) that had the the segment (defined by the data sequence mapping) that had the
checksum failure. Note that the MP_FAIL option requires the use of checksum failure. Note that the MP_FAIL option requires the use of
the full 64-bit sequence number, even if 32-bit sequence numbers are the full 64-bit sequence number, even if 32-bit sequence numbers are
normally in use in the DSS signals on the path. normally in use in the DSS signals on the path.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+----------------------+ +---------------+---------------+-------+----------------------+
| Kind | Length=12 |Subtype| (reserved) | | Kind | Length=12 |Subtype| (reserved) |
+---------------+---------------+-------+----------------------+ +---------------+---------------+-------+----------------------+
| | | |
| Data Sequence Number (8 octets) | | Data Sequence Number (8 octets) |
| | | |
+--------------------------------------------------------------+ +--------------------------------------------------------------+
Figure 17: Fallback (MP_FAIL) Option Figure 16: Fallback (MP_FAIL) Option
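The MP_FAIL layout in the figure above can be serialized as in the following sketch. The option Kind (30) and the MP_FAIL subtype (0x6) are not stated in this excerpt; they are taken here as assumptions from the MPTCP option registry and are exposed as parameters. The function name is hypothetical.

```python
import struct

MPTCP_KIND = 30        # assumption: TCP option kind assigned to MPTCP
MP_FAIL_SUBTYPE = 0x6  # assumption: MP_FAIL subtype value


def build_mp_fail(dsn: int,
                  kind: int = MPTCP_KIND,
                  subtype: int = MP_FAIL_SUBTYPE) -> bytes:
    """Serialize an MP_FAIL option carrying a full 64-bit DSN."""
    if not 0 <= dsn < (1 << 64):
        raise ValueError("data sequence number must fit in 64 bits")
    if not 0 <= subtype <= 0xF:
        raise ValueError("subtype is a 4-bit field")
    # Kind (1) + Length (1) + Subtype/reserved (2) + DSN (8) = 12 octets.
    # The 12 reserved bits after the subtype are sent as zero.
    return struct.pack("!BBHQ", kind, 12, subtype << 12, dsn)
```

The DSN is always written as 8 octets, reflecting the requirement that MP_FAIL use the full 64-bit sequence number even when 32-bit numbers are otherwise in use.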
The receiver MUST discard all data following the data sequence number The receiver MUST discard all data following the data sequence number
specified. Failed data MUST NOT be DATA_ACKed and so will be specified. Failed data MUST NOT be DATA_ACKed and so will be
retransmitted on other subflows (Section 3.3.6). retransmitted on other subflows (Section 3.3.6).
A special case is when there is a single subflow and it fails with a A special case is when there is a single subflow and it fails with a
checksum error. If it is known that all unacknowledged data in checksum error. If it is known that all unacknowledged data in
flight is contiguous (which will usually be the case with a single flight is contiguous (which will usually be the case with a single
subflow), an infinite mapping can be applied to the subflow without subflow), an infinite mapping can be applied to the subflow without
the need to close it first, and essentially turn off all further the need to close it first, and essentially turn off all further
skipping to change at page 51, line 33 skipping to change at page 50, line 25
otherwise, the receiver would not know how to reorder the data. In otherwise, the receiver would not know how to reorder the data. In
practice, this means that all MPTCP subflows will have to be practice, this means that all MPTCP subflows will have to be
terminated except one. Once MPTCP falls back to regular TCP, it MUST terminated except one. Once MPTCP falls back to regular TCP, it MUST
NOT revert to MPTCP later in the connection. NOT revert to MPTCP later in the connection.
It should be emphasized that we are not attempting to prevent the use It should be emphasized that we are not attempting to prevent the use
of middleboxes that want to adjust the payload. An MPTCP-aware of middleboxes that want to adjust the payload. An MPTCP-aware
middlebox could provide such functionality by also rewriting middlebox could provide such functionality by also rewriting
checksums. checksums.
3.9. Error Handling 3.8. Error Handling
In addition to the fallback mechanism as described above, the In addition to the fallback mechanism as described above, the
standard classes of TCP errors may need to be handled in an MPTCP- standard classes of TCP errors may need to be handled in an MPTCP-
specific way. Note that changing semantics -- such as the relevance specific way. Note that changing semantics -- such as the relevance
of a RST -- are covered in Section 4. Where possible, we do not want of a RST -- are covered in Section 4. Where possible, we do not want
to deviate from regular TCP behavior. to deviate from regular TCP behavior.
The following list covers possible errors and the appropriate MPTCP The following list covers possible errors and the appropriate MPTCP
behavior: behavior:
o Unknown token in MP_JOIN (or HMAC failure in MP_JOIN ACK, or o Unknown token in MP_JOIN (or HMAC failure in MP_JOIN ACK, or
missing MP_JOIN in SYN/ACK response): send RST (analogous to TCP's missing MP_JOIN in SYN/ACK response): send RST (analogous to TCP's
behavior on an unknown port) behavior on an unknown port)
o DSN out of window (during normal operation): drop the data, do not o DSN out of window (during normal operation): drop the data, do not
send Data ACKs send Data ACKs
o Remove request for unknown address ID: silently ignore o Remove request for unknown address ID: silently ignore
3.10. Heuristics 3.9. Heuristics
There are a number of heuristics that are needed for performance or There are a number of heuristics that are needed for performance or
deployment but that are not required for protocol correctness. In deployment but that are not required for protocol correctness. In
this section, we detail such heuristics. Note that discussion of this section, we detail such heuristics. Note that discussion of
buffering and certain sender and receiver window behaviors are buffering and certain sender and receiver window behaviors are
presented in Sections 3.3.4 and 3.3.5, as well as retransmission in presented in Sections 3.3.4 and 3.3.5, as well as retransmission in
Section 3.3.6. Section 3.3.6.
3.10.1. Port Usage 3.9.1. Port Usage
Under typical operation, an MPTCP implementation SHOULD use the same Under typical operation, an MPTCP implementation SHOULD use the same
ports as already in use. In other words, the destination port of a ports as already in use. In other words, the destination port of a
SYN containing an MP_JOIN option SHOULD be the same as the remote SYN containing an MP_JOIN option SHOULD be the same as the remote
port of the first subflow in the connection. The local port for such port of the first subflow in the connection. The local port for such
SYNs SHOULD also be the same as for the first subflow (and as such, SYNs SHOULD also be the same as for the first subflow (and as such,
an implementation SHOULD reserve ephemeral ports across all local IP an implementation SHOULD reserve ephemeral ports across all local IP
addresses), although there may be cases where this is infeasible. addresses), although there may be cases where this is infeasible.
This strategy is intended to maximize the probability of the SYN This strategy is intended to maximize the probability of the SYN
being permitted by a firewall or NAT at the recipient and to avoid being permitted by a firewall or NAT at the recipient and to avoid
skipping to change at page 52, line 36 skipping to change at page 51, line 29
There may also be cases, however, where the passive opener wishes to There may also be cases, however, where the passive opener wishes to
signal to the other host that a specific port should be used, and signal to the other host that a specific port should be used, and
this facility is provided in the Add Address option as documented in this facility is provided in the Add Address option as documented in
Section 3.4.1. It is therefore feasible to allow multiple subflows Section 3.4.1. It is therefore feasible to allow multiple subflows
between the same two addresses but using different port pairs, and between the same two addresses but using different port pairs, and
such a facility could be used to allow load balancing within the such a facility could be used to allow load balancing within the
network based on 5-tuples (e.g., some ECMP implementations network based on 5-tuples (e.g., some ECMP implementations
[RFC2992]). [RFC2992]).
3.10.2. Delayed Subflow Start and Subflow Symmetry 3.9.2. Delayed Subflow Start and Subflow Symmetry
Many TCP connections are short-lived and consist only of a few Many TCP connections are short-lived and consist only of a few
segments, and so the overheads of using MPTCP outweigh any benefits. segments, and so the overheads of using MPTCP outweigh any benefits.
A heuristic is required, therefore, to decide when to start using A heuristic is required, therefore, to decide when to start using
additional subflows in an MPTCP connection. We expect that additional subflows in an MPTCP connection. We expect that
experience gathered from deployments will provide further guidance on experience gathered from deployments will provide further guidance on
this, and will be affected by particular application characteristics this, and will be affected by particular application characteristics
(which are likely to change over time). However, a suggested (which are likely to change over time). However, a suggested
general-purpose heuristic that an implementation MAY choose to employ general-purpose heuristic that an implementation MAY choose to employ
is as follows. Results from experimental deployments are needed in is as follows. Results from experimental deployments are needed in
skipping to change at page 53, line 42 skipping to change at page 52, line 37
is RECOMMENDED that some element of randomization is applied to the is RECOMMENDED that some element of randomization is applied to the
time waited before opening new subflows, so that only one subflow time waited before opening new subflows, so that only one subflow
exists between a given address pair. If, however, hosts signal exists between a given address pair. If, however, hosts signal
additional ports to use (for example, for leveraging ECMP on-path), additional ports to use (for example, for leveraging ECMP on-path),
this heuristic need not apply. this heuristic need not apply.
This section has shown some of the considerations that an implementer This section has shown some of the considerations that an implementer
should give when developing MPTCP heuristics, but is not intended to should give when developing MPTCP heuristics, but is not intended to
be prescriptive. be prescriptive.
3.10.3. Failure Handling 3.9.3. Failure Handling
Requirements for MPTCP's handling of unexpected signals have been Requirements for MPTCP's handling of unexpected signals have been
given in Section 3.9. There are other failure cases, however, where given in Section 3.8. There are other failure cases, however, where
a host can choose appropriate behavior. a host can choose appropriate behavior.
For example, Section 3.1 suggests that a host SHOULD fall back to For example, Section 3.1 suggests that a host SHOULD fall back to
trying regular TCP SYNs after one or more failures of MPTCP SYNs for trying regular TCP SYNs after one or more failures of MPTCP SYNs for
a connection. A host may keep a system-wide cache of such a connection. A host may keep a system-wide cache of such
information, so that it can back off from using MPTCP, firstly for information, so that it can back off from using MPTCP, firstly for
that particular destination host, and eventually on a whole that particular destination host, and eventually on a whole
interface, if MPTCP connections continue failing. interface, if MPTCP connections continue failing.
Another failure could occur when the MP_JOIN handshake fails. Another failure could occur when the MP_JOIN handshake fails.
Section 3.9 specifies that an incorrect handshake MUST lead to the Section 3.8 specifies that an incorrect handshake MUST lead to the
subflow being closed with a RST. A host operating an active subflow being closed with a RST. A host operating an active
intrusion detection system may choose to start blocking MP_JOIN intrusion detection system may choose to start blocking MP_JOIN
packets from the source host if multiple failed MP_JOIN attempts are packets from the source host if multiple failed MP_JOIN attempts are
seen. From the connection initiator's point of view, if an MP_JOIN seen. From the connection initiator's point of view, if an MP_JOIN
fails, it SHOULD NOT attempt to connect to the same IP address and fails, it SHOULD NOT attempt to connect to the same IP address and
port during the lifetime of the connection, unless the other host port during the lifetime of the connection, unless the other host
refreshes the information with another ADD_ADDR option. Note that refreshes the information with another ADD_ADDR option. Note that
the ADD_ADDR option is informational only, and does not guarantee the the ADD_ADDR option is informational only, and does not guarantee the
other host will attempt a connection. other host will attempt a connection.
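One way an initiator might track the MP_JOIN rule above is sketched below. This is an illustrative per-connection structure, not something defined by the draft; the class and method names are hypothetical.

```python
from typing import Set, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


class JoinTargets:
    """Per-connection record of remote endpoints where MP_JOIN has failed."""

    def __init__(self) -> None:
        self._failed: Set[Endpoint] = set()

    def record_join_failure(self, addr: str, port: int) -> None:
        # After a failed MP_JOIN, avoid this endpoint for the rest of the
        # connection's lifetime.
        self._failed.add((addr, port))

    def on_add_addr(self, addr: str, port: int) -> None:
        # A fresh ADD_ADDR from the peer refreshes the information, so the
        # endpoint becomes eligible again.
        self._failed.discard((addr, port))

    def may_attempt_join(self, addr: str, port: int) -> bool:
        return (addr, port) not in self._failed
```

Because the state lives in a per-connection object, it is discarded when the connection ends, which matches the "lifetime of the connection" scope in the text.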
In addition, an implementation may learn, over a number of In addition, an implementation may learn, over a number of
connections, that certain interfaces or destination addresses connections, that certain interfaces or destination addresses
consistently fail and may default to not trying to use MPTCP for consistently fail and may default to not trying to use MPTCP for
these. Behavior could also be learned for particularly badly these. Behavior could also be learned for particularly badly
performing subflows or subflows that regularly fail during use, in performing subflows or subflows that regularly fail during use, in
order to temporarily choose not to use these paths. order to temporarily choose not to use these paths.
3.11. TCP Fast Open
TCP Fast Open, described in [RFC7413], has been introduced with the
objective of gaining one RTT before transmitting data. This is
considered a valuable gain as very short connections are very common,
especially for HTTP request/response schemes. It achieves this by
sending the SYN-segment together with data and allowing the server to
reply immediately with data after the SYN/ACK. [RFC7413] secures
this mechanism, by using a new TCP option that includes a cookie
which is negotiated in a preceding connection.
When using TCP Fast Open in conjunction with MPTCP, there are two key
points to take into account, detailed hereafter.
3.11.1. TFO cookie request with MPTCP
When a TFO client first connects to a server, it cannot immediately
include data in the SYN for security reasons [RFC7413]. Instead, it
requests a cookie that will be used in subsequent connections. This
is done with the TCP cookie request and response options, of 2 bytes
and 6-18 bytes respectively (depending on the chosen cookie length).
TFO and MPTCP can be combined provided that the total length of their
options does not exceed the maximum 40 bytes possible in TCP:
o In the SYN: MPTCP uses a 4-byte MP_CAPABLE option. The
MPTCP and TFO options sum up to 6 bytes. With typical TCP-options
using up to 19 bytes in the SYN (24 bytes if options are padded at
a word boundary), there is enough space to combine the MP_CAPABLE
with the TFO Cookie Request.
o In the SYN+ACK: MPTCP uses a 12-byte MP_CAPABLE option, but
now TFO can be as long as 18 bytes. Since the maximum option
length may be exceeded, it is up to the server to solve this by
using a shorter cookie. As an example, if we consider that 19
bytes are used for classical TCP options, the maximum possible
cookie length would be 7 bytes. Note that the same limitation
applies to subsequent connections, for the SYN packet (because the
client then echoes back the cookie to the server). Finally, if
the security impact of reducing the cookie size is not deemed
acceptable, the server can reduce the amount of other TCP-options
by omitting the TCP timestamps (as outlined in Appendix A).
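The option-space arithmetic in the two bullets above can be checked with a short sketch. The function names are hypothetical; the 2-byte TFO option header is inferred from the 6-18 byte option size quoted above (which corresponds to 4-16 byte cookies).

```python
TCP_OPTION_SPACE = 40   # maximum bytes of TCP options, as stated above
TFO_OPTION_HEADER = 2   # kind + length octets of the TFO option


def options_fit(*option_lengths: int) -> bool:
    """True if the given option lengths fit in the TCP option space."""
    return sum(option_lengths) <= TCP_OPTION_SPACE


def max_tfo_cookie_len(other_options_len: int, mp_capable_len: int) -> int:
    """Largest cookie (in bytes) that still fits next to the other options.

    A result below 4 means no valid cookie fits at all.
    """
    return (TCP_OPTION_SPACE - other_options_len
            - mp_capable_len - TFO_OPTION_HEADER)


# SYN: 24 bytes of padded classical options, 4-byte MP_CAPABLE,
# 2-byte TFO cookie request.
syn_ok = options_fit(24, 4, 2)

# SYN+ACK: 19 bytes of classical options and a 12-byte MP_CAPABLE.
synack_cookie = max_tfo_cookie_len(19, 12)
```

With these inputs, `syn_ok` is true and `synack_cookie` is 7, matching the figures given in the text.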
3.11.2. Data sequence mapping under TFO
MPTCP uses, in the TCP establishment phase, a key exchange that is
used to generate the Initial Data Sequence Numbers (IDSNs). In
particular, the SYN with MP_CAPABLE occupies the first octet of the
data sequence space. With TFO, one way to handle the data sent
together with the SYN would be to consider an implicit DSS mapping
that covers that SYN segment (since there is not enough space in the
SYN to include a DSS option). The problem with that approach is that
if a middlebox modifies the TFO data, this will not be noticed by
MPTCP because of the absence of a DSS-checksum. For example, a TCP
(but not MPTCP)-aware middlebox could insert bytes at the beginning
of the stream and adapt the TCP checksum and sequence numbers
accordingly. With an implicit mapping, this would give the client and
server different views of the DSS mapping, with no way to detect
this inconsistency as the DSS checksum is not present.
To solve this, the TFO data should not be considered part of the Data
Sequence Number space: the SYN with MP_CAPABLE still occupies the
first octet of data sequence space, but then the first non-TFO data
byte occupies the second octet. This guarantees that, if the use of
DSS-checksum is negotiated, all data in the data sequence number
space is checksummed. We also note that this does not entail a loss
of functionality, because TFO-data is always sent when only one path
is active.
3.11.3. Connection establishment examples
The following shows a few examples of possible TFO+MPTCP
establishment scenarios.
Before a client can send data together with the SYN, it must request
a cookie from the server, as shown in Figure 18. This is done
by simply combining the TFO and MPTCP options.
client server
| |
| S 0(0) <MP_CAPABLE>, <TFO cookie request> |
| -----------------------------------------------------------> |
| |
| S. 0(0) ack 1 <MP_CAPABLE>, <TFO cookie> |
| <----------------------------------------------------------- |
| |
| . 0(0) ack 1 <MP_CAPABLE> |
| -----------------------------------------------------------> |
| |
Figure 18: Cookie request
Once this is done, the received cookie can be used for TFO, as shown
in Figure 19. In this example, the client first sends 20
bytes in the SYN. The server immediately replies with 100 bytes
following the SYN-ACK upon which the client replies with 20 more
bytes. Note that the last segment in the figure has a TCP sequence
number of 21, while the DSS subflow sequence number is 1 (because the
TFO data is not part of the data sequence number space, as explained
in Section 3.11.2).
client server
| |
| S 0(20) <MP_CAPABLE>, <TFO cookie> |
| -----------------------------------------------------------> |
| |
| S. 0(0) ack 21 <MP_CAPABLE> |
| <----------------------------------------------------------- |
| |
| . 1(100) ack 21 <DSS ack=1 seq=1 ssn=1 dlen=100> |
| <----------------------------------------------------------- |
| |
| . 21(0) ack 1 <MP_CAPABLE> |
| -----------------------------------------------------------> |
| |
| . 21(20) ack 101 <DSS ack=101 seq=1 ssn=1 dlen=20> |
| -----------------------------------------------------------> |
| |
Figure 19: The server supports TFO
In Figure 20, the server does not support TFO. The client
detects that no state is created in the server (as no data is acked),
and now sends the MP_CAPABLE in the third ack, in order for the
server to build its MPTCP context at the end of the establishment.
Now, the TFO data, retransmitted, becomes part of the data sequence
mapping because it is effectively sent (in fact re-sent) after the
establishment.
client server
| |
| S 0(20) <MP_CAPABLE>, <TFO cookie> |
| -----------------------------------------------------------> |
| |
| S. 0(0) ack 1 <MP_CAPABLE> |
| <----------------------------------------------------------- |
| |
| . 1(0) ack 1 <MP_CAPABLE> |
| -----------------------------------------------------------> |
| |
| . 1(20) ack 1 <DSS ack=1 seq=1 ssn=1 dlen=20> |
| -----------------------------------------------------------> |
| |
| . 0(0) ack 21 <DSS ack=21 seq=1 ssn=1 dlen=0> |
| <----------------------------------------------------------- |
| |
Figure 20: The server does not support TFO
It is also possible that the server acknowledges only part of the TFO
data, as illustrated in Figure 21. The client will simply
retransmit the missing data together with a DSS-mapping.
client server
| |
| S 0(1000) <MP_CAPABLE>, <TFO cookie> |
| -----------------------------------------------------------> |
| |
| S. 0(0) ack 501 <MP_CAPABLE> |
| <----------------------------------------------------------- |
| |
| . 501(0) ack 1 <MP_CAPABLE> |
| -----------------------------------------------------------> |
| |
| . 501(500) ack 1 <DSS ack=1 seq=1 ssn=1 dlen=500> |
| -----------------------------------------------------------> |
| |
Figure 21: Partial data acknowledgement
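One way to read the three TFO examples is that the subflow sequence number (ssn) in the client's first DSS mapping equals its relative TCP sequence number minus the number of TFO bytes the server acknowledged in the SYN/ACK. The helper below is a hypothetical sketch of that reading, checked only against the values in Figures 19 to 21; it is not a rule stated by the draft.

```python
def first_dss_ssn(relative_tcp_seq: int, tfo_bytes_acked: int) -> int:
    """Subflow sequence number for the first mapped segment after a TFO SYN.

    relative_tcp_seq: TCP sequence number of the segment, relative to the
        client's initial sequence number (as used in the figures).
    tfo_bytes_acked: SYN payload bytes the SYN/ACK acknowledged; these stay
        outside the data sequence space.
    """
    if tfo_bytes_acked < 0 or relative_tcp_seq <= tfo_bytes_acked:
        raise ValueError("segment must follow the acknowledged TFO data")
    return relative_tcp_seq - tfo_bytes_acked


ssn_full_ack = first_dss_ssn(21, 20)       # Figure 19: all 20 bytes acked
ssn_no_tfo = first_dss_ssn(1, 0)           # Figure 20: no TFO data acked
ssn_partial_ack = first_dss_ssn(501, 500)  # Figure 21: 500 of 1000 acked
```

In all three cases the result is 1, which is the ssn value shown in the corresponding DSS options.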
4. Semantic Issues 4. Semantic Issues
In order to support multipath operation, the semantics of some TCP In order to support multipath operation, the semantics of some TCP
components have changed. To aid clarity, this section collects these components have changed. To aid clarity, this section collects these
semantic changes as a reference. semantic changes as a reference.
Sequence number: The (in-header) TCP sequence number is specific to Sequence number: The (in-header) TCP sequence number is specific to
the subflow. To allow the receiver to reorder application data, the subflow. To allow the receiver to reorder application data,
an additional data-level sequence space is used. In this data- an additional data-level sequence space is used. In this data-
level sequence space, the initial SYN and the final DATA_FIN level sequence space, the initial SYN and the final DATA_FIN
skipping to change at page 61, line 9 skipping to change at page 56, line 4
denial-of-service attacks consuming resources. denial-of-service attacks consuming resources.
As discussed in Section 3.4.1, a host may advertise its private As discussed in Section 3.4.1, a host may advertise its private
addresses, but these might point to different hosts in the receiver's addresses, but these might point to different hosts in the receiver's
network. The MP_JOIN handshake (Section 3.2) will ensure that this network. The MP_JOIN handshake (Section 3.2) will ensure that this
does not succeed in setting up a subflow to the incorrect host. does not succeed in setting up a subflow to the incorrect host.
However, it could still create unwanted TCP handshake traffic. This However, it could still create unwanted TCP handshake traffic. This
feature of MPTCP could be a target for denial-of-service exploits, feature of MPTCP could be a target for denial-of-service exploits,
with malicious participants in MPTCP connections encouraging the with malicious participants in MPTCP connections encouraging the
recipient to target other hosts in the network. Therefore, recipient to target other hosts in the network. Therefore,
implementations should consider heuristics (Section 3.10) at both the implementations should consider heuristics (Section 3.9) at both the
sender and receiver to reduce the impact of this. sender and receiver to reduce the impact of this.
A small security risk could theoretically exist with key reuse, but A small security risk could theoretically exist with key reuse, but
in order to accomplish a replay attack, both the sender and receiver in order to accomplish a replay attack, both the sender and receiver
keys, and the sender and receiver random numbers, in the MP_JOIN keys, and the sender and receiver random numbers, in the MP_JOIN
handshake (Section 3.2) would have to match. handshake (Section 3.2) would have to match.
Whilst this specification defines a "medium" security solution, Whilst this specification defines a "medium" security solution,
meeting the criteria specified at the start of this section and the meeting the criteria specified at the start of this section and the
threat analysis ([RFC6181]), since attacks only ever get worse, it is threat analysis ([RFC6181]), since attacks only ever get worse, it is
skipping to change at page 62, line 38 skipping to change at page 57, line 35
presence of the SYN flag. presence of the SYN flag.
MPTCP SYN packets on the first subflow of a connection contain the MPTCP SYN packets on the first subflow of a connection contain the
MP_CAPABLE option (Section 3.1). If this is dropped, MPTCP SHOULD MP_CAPABLE option (Section 3.1). If this is dropped, MPTCP SHOULD
fall back to regular TCP. If packets with the MP_JOIN option fall back to regular TCP. If packets with the MP_JOIN option
(Section 3.2) are dropped, the paths will simply not be used. (Section 3.2) are dropped, the paths will simply not be used.
If a middlebox strips options but otherwise passes the packets If a middlebox strips options but otherwise passes the packets
unchanged, MPTCP will behave safely. If an MP_CAPABLE option is unchanged, MPTCP will behave safely. If an MP_CAPABLE option is
dropped on either the outgoing or the return path, the initiating dropped on either the outgoing or the return path, the initiating
host can fall back to regular TCP, as illustrated in Figure 22 and host can fall back to regular TCP, as illustrated in Figure 17 and
discussed in Section 3.1. discussed in Section 3.1.
Subflow SYNs contain the MP_JOIN option. If this option is stripped Subflow SYNs contain the MP_JOIN option. If this option is stripped
on the outgoing path, the SYN will appear to be a regular SYN to Host on the outgoing path, the SYN will appear to be a regular SYN to Host
B. Depending on whether there is a listening socket on the target B. Depending on whether there is a listening socket on the target
port, Host B will reply either with SYN/ACK or RST (subflow port, Host B will reply either with SYN/ACK or RST (subflow
connection fails). When Host A receives the SYN/ACK it sends a RST connection fails). When Host A receives the SYN/ACK it sends a RST
because the SYN/ACK does not contain the MP_JOIN option and its because the SYN/ACK does not contain the MP_JOIN option and its
token. Either way, the subflow setup fails, but otherwise does not token. Either way, the subflow setup fails, but otherwise does not
affect the MPTCP connection as a whole. affect the MPTCP connection as a whole.
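The reaction to stripped options during setup, as described in the paragraphs above, can be summarized in a small decision function. This is only a sketch: the names `Action` and `on_syn_ack` are hypothetical and not defined by this specification, and the sketch models only the initiator's view of a received SYN/ACK.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical outcomes; the names are illustrative only."""
    CONTINUE_MPTCP = "continue as MPTCP"
    FALL_BACK_TO_TCP = "fall back to regular TCP"
    RESET_SUBFLOW = "send RST; the path is simply not used"


def on_syn_ack(initial_subflow: bool, mptcp_option_present: bool) -> Action:
    """Decide how the initiating host reacts to a SYN/ACK.

    initial_subflow      -- True for the first subflow (MP_CAPABLE),
                            False for an additional subflow (MP_JOIN).
    mptcp_option_present -- True if the expected MPTCP option survived
                            the middleboxes on both directions.
    """
    if mptcp_option_present:
        return Action.CONTINUE_MPTCP
    if initial_subflow:
        # MP_CAPABLE stripped on either path: regular TCP is used.
        return Action.FALL_BACK_TO_TCP
    # MP_JOIN stripped: the subflow fails, the MPTCP connection survives.
    return Action.RESET_SUBFLOW
```

For example, `on_syn_ack(False, False)` models Host A receiving a SYN/ACK without MP_JOIN and resetting the subflow, leaving the rest of the connection untouched.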
skipping to change at page 63, line 23 skipping to change at page 58, line 23
Host A Host B Host A Host B
| SYN(MP_CAPABLE) | | SYN(MP_CAPABLE) |
|------------------------------------>| |------------------------------------>|
| Middlebox M | | Middlebox M |
| | | | | |
| SYN/ACK |SYN/ACK(MP_CAPABLE)| | SYN/ACK |SYN/ACK(MP_CAPABLE)|
|<----------------|-------------------| |<----------------|-------------------|
b) MP_CAPABLE option stripped on return path b) MP_CAPABLE option stripped on return path
Figure 22: Connection Setup with Middleboxes that Strip Options from Figure 17: Connection Setup with Middleboxes that Strip Options from
Packets Packets
We now examine data flow with MPTCP, assuming the flow is correctly We now examine data flow with MPTCP, assuming the flow is correctly
set up, which implies the options in the SYN packets were allowed set up, which implies the options in the SYN packets were allowed
through by the relevant middleboxes. If options are allowed through through by the relevant middleboxes. If options are allowed through
and there is no resegmentation or coalescing to TCP segments, and there is no resegmentation or coalescing to TCP segments,
Multipath TCP flows can proceed without problems. Multipath TCP flows can proceed without problems.
The case when options get stripped on data packets has been discussed The case when options get stripped on data packets has been discussed
in the Fallback section. If a fraction of options are stripped, in the Fallback section. If a fraction of options are stripped,
behavior is not deterministic. If some data sequence mappings are behavior is not deterministic. If some data sequence mappings are
lost, the connection can continue so long as mappings exist for the lost, the connection can continue so long as mappings exist for the
subflow-level data (e.g., if multiple maps have been sent that subflow-level data (e.g., if multiple maps have been sent that
reinforce each other). If some subflow-level space is left unmapped, reinforce each other). If some subflow-level space is left unmapped,
however, the subflow is treated as broken and is closed, through the however, the subflow is treated as broken and is closed, through the
process described in Section 3.8. MPTCP should survive with a loss process described in Section 3.7. MPTCP should survive with a loss
of some Data ACKs, but performance will degrade as the fraction of of some Data ACKs, but performance will degrade as the fraction of
stripped options increases. We do not expect such cases to appear in stripped options increases. We do not expect such cases to appear in
practice, though: most middleboxes will either strip all options or practice, though: most middleboxes will either strip all options or
let them all through. let them all through.
We end this section with a list of middlebox classes, their behavior, We end this section with a list of middlebox classes, their behavior,
and the elements in the MPTCP design that allow operation through and the elements in the MPTCP design that allow operation through
such middleboxes. Issues surrounding dropping packets with options such middleboxes. Issues surrounding dropping packets with options
or stripping options were discussed above, and are not included here: or stripping options were discussed above, and are not included here:
skipping to change at page 64, line 15 skipping to change at page 59, line 15
the MP_JOIN option, and the handshake mechanism ensures that the MP_JOIN option, and the handshake mechanism ensures that
connection attempts to private addresses [RFC1918] do not cause connection attempts to private addresses [RFC1918] do not cause
problems. Explicit address removal is undertaken by an Address ID problems. Explicit address removal is undertaken by an Address ID
to allow no knowledge of the source address. to allow no knowledge of the source address.
o Performance Enhancing Proxies (PEPs) [RFC3135] might proactively o Performance Enhancing Proxies (PEPs) [RFC3135] might proactively
ACK data to increase performance. MPTCP, however, relies on ACK data to increase performance. MPTCP, however, relies on
accurate congestion control signals from the end host, and non- accurate congestion control signals from the end host, and non-
MPTCP-aware PEPs will not be able to provide such signals. MPTCP MPTCP-aware PEPs will not be able to provide such signals. MPTCP
will, therefore, fall back to single-path TCP, or close the will, therefore, fall back to single-path TCP, or close the
problematic subflow (see Section 3.8). problematic subflow (see Section 3.7).
o Traffic Normalizers [norm] may not allow holes in sequence o Traffic Normalizers [norm] may not allow holes in sequence
numbers, and may cache packets and retransmit the same data. numbers, and may cache packets and retransmit the same data.
MPTCP looks like standard TCP on the wire, and will not retransmit MPTCP looks like standard TCP on the wire, and will not retransmit
different data on the same subflow sequence number. In the event different data on the same subflow sequence number. In the event
of a retransmission, the same data will be retransmitted on the of a retransmission, the same data will be retransmitted on the
original TCP subflow even if it is additionally retransmitted at original TCP subflow even if it is additionally retransmitted at
the connection level on a different subflow. the connection level on a different subflow.
o Firewalls [RFC2979] might perform initial sequence number o Firewalls [RFC2979] might perform initial sequence number
skipping to change at page 66, line 37 skipping to change at page 61, line 37
| | | | document, | | | | | document, |
| | | | Section 3.4.1 | | | | | Section 3.4.1 |
| 0x4 | REMOVE_ADDR | Remove Address | This | | 0x4 | REMOVE_ADDR | Remove Address | This |
| | | | document, | | | | | document, |
| | | | Section 3.4.2 | | | | | Section 3.4.2 |
| 0x5 | MP_PRIO | Change Subflow Priority | This | | 0x5 | MP_PRIO | Change Subflow Priority | This |
| | | | document, | | | | | document, |
| | | | Section 3.3.8 | | | | | Section 3.3.8 |
| 0x6 | MP_FAIL | Fallback | This | | 0x6 | MP_FAIL | Fallback | This |
| | | | document, | | | | | document, |
| | | | Section 3.8 | | | | | Section 3.7 |
| 0x7 | MP_FASTCLOSE | Fast Close | This | | 0x7 | MP_FASTCLOSE | Fast Close | This |
| | | | document, | | | | | document, |
| | | | Section 3.5 | | | | | Section 3.5 |
| 0x8 | MP_TCPRST | Subflow Reset | This | | 0x8 | MP_TCPRST | Subflow Reset | This |
| | | | document, | | | | | document, |
| | | | Section 3.6 | | | | | Section 3.6 |
| 0xf | MP_EXPERIMENTAL | MPTCP Experimental | This | | 0xf | MP_EXPERIMENTAL | Reserved for private | |
| | | Option | document, | | | | experiments | |
| | | | Section 3.7 |
+-------+-----------------+-------------------------+---------------+ +-------+-----------------+-------------------------+---------------+
Table 2: MPTCP Option Subtypes Table 2: MPTCP Option Subtypes
Values 0x9 through 0xe are currently unassigned. Values 0x9 through 0xe are currently unassigned. Option 0xf is
reserved for use by private experiments. Its use may be formalized
in a future specification.
8.2. MPTCP Handshake Algorithms 8.2. MPTCP Handshake Algorithms
IANA has created another sub-registry, "MPTCP Handshake Algorithms" IANA has created another sub-registry, "MPTCP Handshake Algorithms"
under the "Transmission Control Protocol (TCP) Parameters" registry, under the "Transmission Control Protocol (TCP) Parameters" registry,
based on the flags in MP_CAPABLE (Section 3.1). IANA is requested to based on the flags in MP_CAPABLE (Section 3.1). IANA is requested to
update the references of this table to this document, as follows: update the references of this table to this document, as follows:
+---------+----------------------------------+----------------------+ +---------+----------------------------------+----------------------+
| Flag | Meaning | Reference | | Flag | Meaning | Reference |
skipping to change at page 68, line 19 skipping to change at page 63, line 19
| 0x01 | MPTCP specific error | This document, Section 3.6 | | 0x01 | MPTCP specific error | This document, Section 3.6 |
| 0x02 | Lack of resources | This document, Section 3.6 | | 0x02 | Lack of resources | This document, Section 3.6 |
| 0x03 | Administratively prohibited | This document, Section 3.6 | | 0x03 | Administratively prohibited | This document, Section 3.6 |
| 0x04 | Too much outstanding data | This document, Section 3.6 | | 0x04 | Too much outstanding data | This document, Section 3.6 |
| 0x05 | Unacceptable performance | This document, Section 3.6 | | 0x05 | Unacceptable performance | This document, Section 3.6 |
| 0x06 | Middlebox interference | This document, Section 3.6 | | 0x06 | Middlebox interference | This document, Section 3.6 |
+------+-----------------------------+----------------------------+ +------+-----------------------------+----------------------------+
Table 4: MPTCP MP_TCPRST Reason Codes Table 4: MPTCP MP_TCPRST Reason Codes
8.4. Experimental option registry
Section 3.7 has defined the MP_EXPERIMENTAL option for private,
experimental MPTCP options, and the same considerations as for
[RFC6994] apply. IANA should create a "Multipath TCP Experimental
Option Identifiers (MPTCP ExIDs)" sub-registry. This registry
contains the 16 bits ExIDs and a reference (description, document
pointer, or assignee name and e-mail contact) for each entry. MPTCP
ExIDs are assigned on a First Come, First Served (FCFS) basis
[RFC5226].
IANA will advise applicants of duplicate entries to select an
alternate value, as per typical FCFS processing.
IANA will record known duplicate uses to assist the community in both
debugging assigned uses as well as correcting unauthorized duplicate
uses.
IANA should impose no requirement on making a registration other than
indicating the desired codepoint and providing a point of contact. A
short description or acronym for the use is desired but should not be
required.
9. References 9. References
9.1. Normative References 9.1. Normative References
[RFC0793] Postel, J., "Transmission Control Protocol", STD 7, [RFC0793] Postel, J., "Transmission Control Protocol", STD 7,
RFC 793, DOI 10.17487/RFC0793, September 1981, RFC 793, DOI 10.17487/RFC0793, September 1981,
<https://www.rfc-editor.org/info/rfc793>. <https://www.rfc-editor.org/info/rfc793>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
skipping to change at page 71, line 42 skipping to change at page 66, line 18
[RFC6824] Ford, A., Raiciu, C., Handley, M., and O. Bonaventure, [RFC6824] Ford, A., Raiciu, C., Handley, M., and O. Bonaventure,
"TCP Extensions for Multipath Operation with Multiple "TCP Extensions for Multipath Operation with Multiple
Addresses", RFC 6824, DOI 10.17487/RFC6824, January 2013, Addresses", RFC 6824, DOI 10.17487/RFC6824, January 2013,
<https://www.rfc-editor.org/info/rfc6824>. <https://www.rfc-editor.org/info/rfc6824>.
[RFC6897] Scharf, M. and A. Ford, "Multipath TCP (MPTCP) Application [RFC6897] Scharf, M. and A. Ford, "Multipath TCP (MPTCP) Application
Interface Considerations", RFC 6897, DOI 10.17487/RFC6897, Interface Considerations", RFC 6897, DOI 10.17487/RFC6897,
March 2013, <https://www.rfc-editor.org/info/rfc6897>. March 2013, <https://www.rfc-editor.org/info/rfc6897>.
[RFC6994] Touch, J., "Shared Use of Experimental TCP Options",
RFC 6994, DOI 10.17487/RFC6994, August 2013,
<https://www.rfc-editor.org/info/rfc6994>.
[RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP [RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP
Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014, Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014,
<https://www.rfc-editor.org/info/rfc7413>. <https://www.rfc-editor.org/info/rfc7413>.
[TCPLO] Ramaiah, A., "TCP option space extension", Work [TCPLO] Ramaiah, A., "TCP option space extension", Work
in Progress, March 2012. in Progress, March 2012.
Appendix A. Notes on Use of TCP Options Appendix A. Notes on Use of TCP Options
The TCP option space is limited due to the length of the Data Offset The TCP option space is limited due to the length of the Data Offset
skipping to change at page 73, line 38 skipping to change at page 68, line 38
Finally, there are issues with reliable delivery of options. As Finally, there are issues with reliable delivery of options. As
options can also be sent on pure ACKs, these are not reliably sent. options can also be sent on pure ACKs, these are not reliably sent.
This is not an issue for DATA_ACK due to their cumulative nature, but This is not an issue for DATA_ACK due to their cumulative nature, but
may be an issue for ADD_ADDR/REMOVE_ADDR options. Here, it is may be an issue for ADD_ADDR/REMOVE_ADDR options. Here, it is
recommended to send these options redundantly (whether on multiple recommended to send these options redundantly (whether on multiple
paths or on the same path on a number of ACKs -- but interspersed paths or on the same path on a number of ACKs -- but interspersed
with data in order to avoid interpretation as congestion). The cases with data in order to avoid interpretation as congestion). The cases
where options are stripped by middleboxes are discussed in Section 6. where options are stripped by middleboxes are discussed in Section 6.
Appendix B. Control Blocks Appendix B. TCP Fast Open
TCP Fast Open (TFO) is an experimental TCP extension, described in
[RFC7413], which has been introduced with the objective of saving one
RTT before transmitting data. This is considered a valuable gain as
very short connections are very common, especially for HTTP
request/response schemes. It achieves this by sending the SYN segment
together with data and allowing the server to reply immediately with
data after the SYN/ACK. [RFC7413] secures this mechanism by using a
new TCP option that includes a cookie which is negotiated in a
preceding connection.
When using TCP Fast Open in conjunction with MPTCP, there are two key
points to take into account, detailed hereafter.
B.1. TFO cookie request with MPTCP
When a TFO client first connects to a server, it cannot immediately
include data in the SYN for security reasons [RFC7413]. Instead, it
requests a cookie that will be used in subsequent connections. This
is done with the TCP cookie request/response options, which are 2
bytes and 6-18 bytes long, respectively (the latter depending on the
chosen cookie length).
TFO and MPTCP can be combined provided that the total length of their
options does not exceed the maximum 40 bytes possible in TCP:
o In the SYN: MPTCP uses a 4-byte MP_CAPABLE option. The
MPTCP and TFO options sum up to 6 bytes. With typical TCP-options
using up to 19 bytes in the SYN (24 bytes if options are padded at
a word boundary), there is enough space to combine the MP_CAPABLE
with the TFO Cookie Request.
o In the SYN+ACK: MPTCP uses a 12-byte MP_CAPABLE option, but now
the TFO option can be as long as 18 bytes. Since the maximum
option length may be exceeded, it is up to the server to solve
this by using a shorter cookie. As an example, if 19 bytes are
used for classical TCP options, 9 bytes remain for the TFO option,
so the maximum possible cookie length is 7 bytes (the TFO option
carries 2 bytes of kind and length). Note that the same limitation
applies to subsequent connections, for the SYN packet (because the
client then echoes back the cookie to the server). Finally, if
the security impact of reducing the cookie size is not deemed
acceptable, the server can reduce the amount of other TCP-options
by omitting the TCP timestamps (as outlined in Appendix A).
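The option-space arithmetic in the two bullets above can be checked with a few lines of code. This is a minimal sketch: the constant and function names are hypothetical, the 2-byte TFO overhead is taken from the size of the cookie request option given above, and no constraint on valid cookie sizes from [RFC7413] is modelled.

```python
TCP_MAX_OPTION_BYTES = 40  # maximum TCP option space, as stated above
TFO_OPTION_OVERHEAD = 2    # kind + length; a bare cookie request is 2 bytes


def options_fit(*option_lengths: int) -> bool:
    """Return True if the given option lengths fit in one TCP header."""
    return sum(option_lengths) <= TCP_MAX_OPTION_BYTES


def max_tfo_cookie_len(mp_capable_len: int, other_options_len: int) -> int:
    """Largest cookie (in bytes) that still fits next to the other options.

    Returns 0 when no cookie bytes fit.
    """
    remaining = TCP_MAX_OPTION_BYTES - mp_capable_len - other_options_len
    return max(0, remaining - TFO_OPTION_OVERHEAD)
```

With the figures used in the text, `options_fit(4, 2, 24)` confirms that the SYN has room for MP_CAPABLE plus a cookie request even with padded options, and `max_tfo_cookie_len(12, 19)` reproduces the 7-byte limit for the SYN+ACK.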
B.2. Data sequence mapping under TFO
MPTCP uses, in the TCP establishment phase, a key exchange that is
used to generate the Initial Data Sequence Numbers (IDSNs). In
particular, the SYN with MP_CAPABLE occupies the first octet of the
data sequence space. With TFO, one way to handle the data sent
together with the SYN would be to consider an implicit DSS mapping
that covers that SYN segment (since there is not enough space in the
SYN to include a DSS option). The problem with that approach is that
if a middlebox modifies the TFO data, this will not be noticed by
MPTCP because of the absence of a DSS-checksum. For example, a TCP
(but not MPTCP)-aware middlebox could insert bytes at the beginning
of the stream and adapt the TCP checksum and sequence numbers
accordingly. With an implicit mapping, this would give the client and
server different views of the DSS mapping, with no way to detect
this inconsistency as the DSS checksum is not present.
To solve this, the TFO data should not be considered part of the Data
Sequence Number space: the SYN with MP_CAPABLE still occupies the
first octet of data sequence space, but then the first non-TFO data
byte occupies the second octet. This guarantees that, if the use of
DSS-checksum is negotiated, all data in the data sequence number
space is checksummed. We also note that this does not entail a loss
of functionality, because TFO-data is always sent when only one path
is active.
B.3. Connection establishment examples
The following shows a few examples of possible TFO+MPTCP
establishment scenarios.
Before a client can send data together with the SYN, it must request
a cookie from the server, as shown in Figure 18. This is done by
simply combining the TFO and MPTCP options.
client server
| |
| S 0(0) <MP_CAPABLE>, <TFO cookie request> |
| -----------------------------------------------------------> |
| |
| S. 0(0) ack 1 <MP_CAPABLE>, <TFO cookie> |
| <----------------------------------------------------------- |
| |
| . 0(0) ack 1 <MP_CAPABLE> |
| -----------------------------------------------------------> |
| |
Figure 18: Cookie request
Once this is done, the received cookie can be used for TFO, as shown
in Figure 19. In this example, the client first sends 20 bytes in
the SYN. The server immediately replies with 100 bytes following the
SYN-ACK, upon which the client replies with 20 more bytes. Note that
the last segment in the figure has a TCP sequence number of 21, while
the DSS subflow sequence number is 1 (because the TFO data is not
part of the data sequence number space, as explained in
Appendix B.2).
client server
| |
| S 0(20) <MP_CAPABLE>, <TFO cookie> |
| -----------------------------------------------------------> |
| |
| S. 0(0) ack 21 <MP_CAPABLE> |
| <----------------------------------------------------------- |
| |
| . 1(100) ack 21 <DSS ack=1 seq=1 ssn=1 dlen=100> |
| <----------------------------------------------------------- |
| |
| . 21(0) ack 1 <MP_CAPABLE> |
| -----------------------------------------------------------> |
| |
| . 21(20) ack 101 <DSS ack=101 seq=1 ssn=1 dlen=20> |
| -----------------------------------------------------------> |
| |
Figure 19: The server supports TFO
In Figure 20, the server does not support TFO. The client detects
that no state is created in the server (as no data is acked), and now
sends the MP_CAPABLE in the third ack, in order for the server to
build its MPTCP context at the end of the establishment. Now, the
retransmitted TFO data becomes part of the data sequence mapping
because it is effectively sent (in fact re-sent) after the
establishment.
client server
| |
| S 0(20) <MP_CAPABLE>, <TFO cookie> |
| -----------------------------------------------------------> |
| |
| S. 0(0) ack 1 <MP_CAPABLE> |
| <----------------------------------------------------------- |
| |
| . 1(0) ack 1 <MP_CAPABLE> |
| -----------------------------------------------------------> |
| |
| . 1(20) ack 1 <DSS ack=1 seq=1 ssn=1 dlen=20> |
| -----------------------------------------------------------> |
| |
| . 0(0) ack 21 <DSS ack=21 seq=1 ssn=1 dlen=0> |
| <----------------------------------------------------------- |
| |
Figure 20: The server does not support TFO
It is also possible that the server acknowledges only part of the TFO
data, as illustrated in Figure 21. The client will simply retransmit
the missing data together with a DSS-mapping.
client server
| |
| S 0(1000) <MP_CAPABLE>, <TFO cookie> |
| -----------------------------------------------------------> |
| |
| S. 0(0) ack 501 <MP_CAPABLE> |
| <----------------------------------------------------------- |
| |
| . 501(0) ack 1 <MP_CAPABLE> |
| -----------------------------------------------------------> |
| |
| . 501(500) ack 1 <DSS ack=1 seq=1 ssn=1 dlen=500> |
| -----------------------------------------------------------> |
| |
Figure 21: Partial data acknowledgement
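The three data-carrying scenarios above follow one pattern: acknowledged TFO bytes advance the TCP sequence number but stay outside the data sequence space, so the client's first DSS mapping always starts at relative data and subflow sequence number 1. The sketch below captures that pattern for the client side only; the function name and the dictionary keys are hypothetical, and all numbers are relative, as in the figures.

```python
def first_client_mapping(tfo_len: int, tfo_acked: int) -> dict:
    """Describe the client's first segment carrying a DSS mapping.

    tfo_len   -- bytes of data the client placed in the SYN
    tfo_acked -- bytes of that data acknowledged by the SYN/ACK
    """
    if not 0 <= tfo_acked <= tfo_len:
        raise ValueError("tfo_acked must be between 0 and tfo_len")
    return {
        # The SYN consumes one sequence number; acked TFO data follows it.
        "tcp_seq": 1 + tfo_acked,
        # TFO data accepted by the server is not part of the DSN space.
        "dsn": 1,
        "ssn": 1,
        # Unacknowledged TFO bytes are re-sent under a DSS mapping.
        "tfo_bytes_to_resend": tfo_len - tfo_acked,
    }
```

Calling it with `(20, 20)`, `(20, 0)` and `(1000, 500)` gives TCP sequence numbers 21, 1 and 501, matching the client segments in Figures 19, 20 and 21 respectively.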
Appendix C. Control Blocks
Conceptually, an MPTCP connection can be represented as an MPTCP Conceptually, an MPTCP connection can be represented as an MPTCP
control block that contains several variables that track the progress control block that contains several variables that track the progress
and the state of the MPTCP connection and a set of linked TCP control and the state of the MPTCP connection and a set of linked TCP control
blocks that correspond to the subflows that have been established. blocks that correspond to the subflows that have been established.
RFC 793 [RFC0793] specifies several state variables. Whenever RFC 793 [RFC0793] specifies several state variables. Whenever
possible, we reuse the same terminology as RFC 793 to describe the possible, we reuse the same terminology as RFC 793 to describe the
state variables that are maintained by MPTCP. state variables that are maintained by MPTCP.
B.1. MPTCP Control Block C.1. MPTCP Control Block
The MPTCP control block contains the following variable per The MPTCP control block contains the following variable per
connection. connection.
B.1.1. Authentication and Metadata C.1.1. Authentication and Metadata
Local.Token (32 bits): This is the token chosen by the local host on Local.Token (32 bits): This is the token chosen by the local host on
this MPTCP connection. The token MUST be unique among all this MPTCP connection. The token MUST be unique among all
established MPTCP connections, generated from the local key. established MPTCP connections, generated from the local key.
Local.Key (64 bits): This is the key sent by the local host on this Local.Key (64 bits): This is the key sent by the local host on this
MPTCP connection. MPTCP connection.
Remote.Token (32 bits): This is the token chosen by the remote host Remote.Token (32 bits): This is the token chosen by the remote host
on this MPTCP connection, generated from the remote key. on this MPTCP connection, generated from the remote key.
Remote.Key (64 bits): This is the key chosen by the remote host on Remote.Key (64 bits): This is the key chosen by the remote host on
this MPTCP connection this MPTCP connection
MPTCP.Checksum (flag): This flag is set to true if at least one of MPTCP.Checksum (flag): This flag is set to true if at least one of
the hosts has set the A bit in the MP_CAPABLE options exchanged the hosts has set the A bit in the MP_CAPABLE options exchanged
during connection establishment, and is set to false otherwise. during connection establishment, and is set to false otherwise.
If this flag is set, the checksum must be computed in all DSS If this flag is set, the checksum must be computed in all DSS
options. options.
B.1.2. Sending Side C.1.2. Sending Side
SND.UNA (64 bits): This is the data sequence number of the next byte SND.UNA (64 bits): This is the data sequence number of the next byte
to be acknowledged, at the MPTCP connection level. This variable to be acknowledged, at the MPTCP connection level. This variable
is updated upon reception of a DSS option containing a DATA_ACK. is updated upon reception of a DSS option containing a DATA_ACK.
SND.NXT (64 bits): This is the data sequence number of the next byte SND.NXT (64 bits): This is the data sequence number of the next byte
to be sent. SND.NXT is used to determine the value of the DSN in to be sent. SND.NXT is used to determine the value of the DSN in
the DSS option. the DSS option.
SND.WND (32 bits with RFC 1323, 16 bits otherwise): This is the SND.WND (32 bits with RFC 1323, 16 bits otherwise): This is the
sending window. MPTCP maintains the sending window at the MPTCP sending window. MPTCP maintains the sending window at the MPTCP
connection level and the same window is shared by all subflows. connection level and the same window is shared by all subflows.
All subflows use the MPTCP connection level SND.WND to compute the All subflows use the MPTCP connection level SND.WND to compute the
SEQ.WND value that is sent in each transmitted segment. SEQ.WND value that is sent in each transmitted segment.
B.1.3. Receiving Side C.1.3. Receiving Side
RCV.NXT (64 bits): This is the data sequence number of the next byte RCV.NXT (64 bits): This is the data sequence number of the next byte
that is expected on the MPTCP connection. This state variable is that is expected on the MPTCP connection. This state variable is
modified upon reception of in-order data. The value of RCV.NXT is modified upon reception of in-order data. The value of RCV.NXT is
used to specify the DATA_ACK that is sent in the DSS option on all used to specify the DATA_ACK that is sent in the DSS option on all
subflows. subflows.
RCV.WND (32 bits with RFC 1323, 16 bits otherwise): This is the RCV.WND (32 bits with RFC 1323, 16 bits otherwise): This is the
connection-level receive window, which is the maximum of the connection-level receive window, which is the maximum of the
RCV.WND on all the subflows. RCV.WND on all the subflows.
B.2. TCP Control Blocks C.2. TCP Control Blocks
The MPTCP control block also contains a list of the TCP control The MPTCP control block also contains a list of the TCP control
blocks that are associated to the MPTCP connection. blocks that are associated to the MPTCP connection.
Note that the TCP control block on the TCP subflows does not contain Note that the TCP control block on the TCP subflows does not contain
the RCV.WND and SND.WND state variables as these are maintained at the RCV.WND and SND.WND state variables as these are maintained at
the MPTCP connection level and not at the subflow level. the MPTCP connection level and not at the subflow level.
Inside each TCP control block, the following state variables are Inside each TCP control block, the following state variables are
defined. defined.
B.2.1. Sending Side C.2.1. Sending Side
SND.UNA (32 bits): This is the sequence number of the next byte to SND.UNA (32 bits): This is the sequence number of the next byte to
be acknowledged on the subflow. This variable is updated upon be acknowledged on the subflow. This variable is updated upon
reception of each TCP acknowledgment on the subflow. reception of each TCP acknowledgment on the subflow.
SND.NXT (32 bits): This is the sequence number of the next byte to SND.NXT (32 bits): This is the sequence number of the next byte to
be sent on the subflow. SND.NXT is used to set the value of be sent on the subflow. SND.NXT is used to set the value of
SEG.SEQ upon transmission of the next segment. SEG.SEQ upon transmission of the next segment.
B.2.2. Receiving Side C.2.2. Receiving Side
RCV.NXT (32 bits): This is the sequence number of the next byte that RCV.NXT (32 bits): This is the sequence number of the next byte that
is expected on the subflow. This state variable is modified upon is expected on the subflow. This state variable is modified upon
reception of in-order segments. The value of RCV.NXT is copied to reception of in-order segments. The value of RCV.NXT is copied to
the SEG.ACK field of the next segments transmitted on the subflow. the SEG.ACK field of the next segments transmitted on the subflow.
RCV.WND (32 bits with RFC 1323, 16 bits otherwise): This is the RCV.WND (32 bits with RFC 1323, 16 bits otherwise): This is the
subflow-level receive window that is updated with the window field subflow-level receive window that is updated with the window field
from the segments received on this subflow. from the segments received on this subflow.
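As an illustration only, the state variables listed in this appendix could be grouped as follows. The class and field names are hypothetical, the bit widths are noted in comments rather than enforced, and deriving the connection-level RCV.WND as a property is one possible reading of the definition above, not a mandated design.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubflowTCB:
    """Per-subflow TCP state (32-bit sequence space)."""
    snd_una: int = 0  # next byte to be acknowledged on the subflow
    snd_nxt: int = 0  # next byte to be sent on the subflow
    rcv_nxt: int = 0  # next byte expected on the subflow
    rcv_wnd: int = 0  # subflow-level receive window


@dataclass
class MPTCPControlBlock:
    """Connection-level MPTCP state (64-bit data sequence space)."""
    local_token: int   # 32 bits, generated from local_key
    local_key: int     # 64 bits
    remote_token: int  # 32 bits, generated from remote_key
    remote_key: int    # 64 bits
    checksum: bool = False  # True if either host set the A bit
    snd_una: int = 0   # updated on reception of a DATA_ACK
    snd_nxt: int = 0   # determines the DSN in the DSS option
    snd_wnd: int = 0   # send window shared by all subflows
    rcv_nxt: int = 0   # value sent as DATA_ACK on all subflows
    subflows: List[SubflowTCB] = field(default_factory=list)

    @property
    def rcv_wnd(self) -> int:
        """Connection-level receive window: maximum over the subflows."""
        return max((sf.rcv_wnd for sf in self.subflows), default=0)
```

A connection with two subflows advertising windows of 1000 and 4000 bytes would therefore report a connection-level `rcv_wnd` of 4000.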
Appendix C. Finite State Machine Appendix D. Finite State Machine
The diagram in Figure 23 shows the Finite State Machine for The diagram in Figure 22 shows the Finite State Machine for
connection-level closure. This illustrates how the DATA_FIN connection-level closure. This illustrates how the DATA_FIN
connection-level signal (indicated as the DFIN flag on a DATA_ACK) connection-level signal (indicated as the DFIN flag on a DATA_ACK)
interacts with subflow-level FINs, and permits "break-before-make" interacts with subflow-level FINs, and permits "break-before-make"
handover between subflows. handover between subflows.
+---------+ +---------+
| M_ESTAB | | M_ESTAB |
+---------+ +---------+
M_CLOSE | | rcv DATA_FIN M_CLOSE | | rcv DATA_FIN
------- | | ------- ------- | | -------
skipping to change at page 76, line 32 skipping to change at page 75, line 32
| rcv DATA_FIN -------------- | -------------- | | rcv DATA_FIN -------------- | -------------- |
| ------- CLOSE all subflows | CLOSE all subflows | | ------- CLOSE all subflows | CLOSE all subflows |
| snd DATA_ACK[DFIN] V delete MPTCP PCB V | snd DATA_ACK[DFIN] V delete MPTCP PCB V
\ +-----------+ +---------+ \ +-----------+ +---------+
------------------------>|M_TIME WAIT|----------------->| M_CLOSED| ------------------------>|M_TIME WAIT|----------------->| M_CLOSED|
+-----------+ +---------+ +-----------+ +---------+
All subflows in CLOSED All subflows in CLOSED
------------ ------------
delete MPTCP PCB delete MPTCP PCB
Figure 23: Finite State Machine for Connection Closure Figure 22: Finite State Machine for Connection Closure
Authors' Addresses Authors' Addresses
Alan Ford Alan Ford
Pexip Pexip
EMail: alan.ford@gmail.com EMail: alan.ford@gmail.com
Costin Raiciu Costin Raiciu
University Politehnica of Bucharest University Politehnica of Bucharest
 End of changes. 47 change blocks. 
343 lines changed or deleted 258 lines changed or added
