draft-ietf-mptcp-rfc6824bis-11.txt   draft-ietf-mptcp-rfc6824bis-12.txt 
Internet Engineering Task Force A. Ford Internet Engineering Task Force A. Ford
Internet-Draft Pexip Internet-Draft Pexip
Obsoletes: 6824 (if approved) C. Raiciu Obsoletes: 6824 (if approved) C. Raiciu
Intended status: Standards Track U. Politechnica of Bucharest Intended status: Standards Track U. Politechnica of Bucharest
Expires: November 16, 2018 M. Handley Expires: April 6, 2019 M. Handley
U. College London U. College London
O. Bonaventure O. Bonaventure
U. catholique de Louvain U. catholique de Louvain
C. Paasch C. Paasch
Apple, Inc. Apple, Inc.
May 15, 2018 October 3, 2018
TCP Extensions for Multipath Operation with Multiple Addresses TCP Extensions for Multipath Operation with Multiple Addresses
draft-ietf-mptcp-rfc6824bis-11 draft-ietf-mptcp-rfc6824bis-12
Abstract Abstract
TCP/IP communication is currently restricted to a single path per TCP/IP communication is currently restricted to a single path per
connection, yet multiple paths often exist between peers. The connection, yet multiple paths often exist between peers. The
simultaneous use of these multiple paths for a TCP/IP session would simultaneous use of these multiple paths for a TCP/IP session would
improve resource usage within the network and, thus, improve user improve resource usage within the network and, thus, improve user
experience through higher throughput and improved resilience to experience through higher throughput and improved resilience to
network failure. network failure.
skipping to change at page 2, line 7 skipping to change at page 2, line 7
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on November 16, 2018. This Internet-Draft will expire on April 6, 2019.
Copyright Notice Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 35 skipping to change at page 2, line 35
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Design Assumptions . . . . . . . . . . . . . . . . . . . 4 1.1. Design Assumptions . . . . . . . . . . . . . . . . . . . 4
1.2. Multipath TCP in the Networking Stack . . . . . . . . . . 5 1.2. Multipath TCP in the Networking Stack . . . . . . . . . . 5
1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6 1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6
1.4. MPTCP Concept . . . . . . . . . . . . . . . . . . . . . . 7 1.4. MPTCP Concept . . . . . . . . . . . . . . . . . . . . . . 7
1.5. Requirements Language . . . . . . . . . . . . . . . . . . 8 1.5. Requirements Language . . . . . . . . . . . . . . . . . . 8
2. Operation Overview . . . . . . . . . . . . . . . . . . . . . 8 2. Operation Overview . . . . . . . . . . . . . . . . . . . . . 8
2.1. Initiating an MPTCP Connection . . . . . . . . . . . . . 9 2.1. Initiating an MPTCP Connection . . . . . . . . . . . . . 9
2.2. Associating a New Subflow with an Existing MPTCP 2.2. Associating a New Subflow with an Existing MPTCP
Connection . . . . . . . . . . . . . . . . . . . . . . . 9 Connection . . . . . . . . . . . . . . . . . . . . . . . 10
2.3. Informing the Other Host about Another Potential Address 10 2.3. Informing the Other Host about Another Potential Address 11
2.4. Data Transfer Using MPTCP . . . . . . . . . . . . . . . . 11 2.4. Data Transfer Using MPTCP . . . . . . . . . . . . . . . . 12
2.5. Requesting a Change in a Path's Priority . . . . . . . . 11 2.5. Requesting a Change in a Path's Priority . . . . . . . . 13
2.6. Closing an MPTCP Connection . . . . . . . . . . . . . . . 12 2.6. Closing an MPTCP Connection . . . . . . . . . . . . . . . 13
2.7. Notable Features . . . . . . . . . . . . . . . . . . . . 12 2.7. Notable Features . . . . . . . . . . . . . . . . . . . . 14
3. MPTCP Protocol . . . . . . . . . . . . . . . . . . . . . . . 12 3. MPTCP Protocol . . . . . . . . . . . . . . . . . . . . . . . 15
3.1. Connection Initiation . . . . . . . . . . . . . . . . . . 14 3.1. Connection Initiation . . . . . . . . . . . . . . . . . . 16
3.2. Starting a New Subflow . . . . . . . . . . . . . . . . . 20 3.2. Starting a New Subflow . . . . . . . . . . . . . . . . . 23
3.3. General MPTCP Operation . . . . . . . . . . . . . . . . . 25 3.3. General MPTCP Operation . . . . . . . . . . . . . . . . . 28
3.3.1. Data Sequence Mapping . . . . . . . . . . . . . . . . 27 3.3.1. Data Sequence Mapping . . . . . . . . . . . . . . . . 30
3.3.2. Data Acknowledgments . . . . . . . . . . . . . . . . 30 3.3.2. Data Acknowledgments . . . . . . . . . . . . . . . . 33
3.3.3. Closing a Connection . . . . . . . . . . . . . . . . 31 3.3.3. Closing a Connection . . . . . . . . . . . . . . . . 34
3.3.4. Receiver Considerations . . . . . . . . . . . . . . . 32 3.3.4. Receiver Considerations . . . . . . . . . . . . . . . 36
3.3.5. Sender Considerations . . . . . . . . . . . . . . . . 33 3.3.5. Sender Considerations . . . . . . . . . . . . . . . . 37
3.3.6. Reliability and Retransmissions . . . . . . . . . . . 34 3.3.6. Reliability and Retransmissions . . . . . . . . . . . 38
3.3.7. Congestion Control Considerations . . . . . . . . . . 35 3.3.7. Congestion Control Considerations . . . . . . . . . . 39
3.3.8. Subflow Policy . . . . . . . . . . . . . . . . . . . 36 3.3.8. Subflow Policy . . . . . . . . . . . . . . . . . . . 39
3.4. Address Knowledge Exchange (Path Management) . . . . . . 37 3.4. Address Knowledge Exchange (Path Management) . . . . . . 41
3.4.1. Address Advertisement . . . . . . . . . . . . . . . . 38 3.4.1. Address Advertisement . . . . . . . . . . . . . . . . 42
3.4.2. Remove Address . . . . . . . . . . . . . . . . . . . 42 3.4.2. Remove Address . . . . . . . . . . . . . . . . . . . 45
3.5. Fast Close . . . . . . . . . . . . . . . . . . . . . . . 43 3.5. Fast Close . . . . . . . . . . . . . . . . . . . . . . . 46
3.6. Subflow Reset . . . . . . . . . . . . . . . . . . . . . . 44 3.6. Subflow Reset . . . . . . . . . . . . . . . . . . . . . . 48
3.7. Fallback . . . . . . . . . . . . . . . . . . . . . . . . 46 3.7. Fallback . . . . . . . . . . . . . . . . . . . . . . . . 50
3.8. Error Handling . . . . . . . . . . . . . . . . . . . . . 50 3.8. Error Handling . . . . . . . . . . . . . . . . . . . . . 53
3.9. Heuristics . . . . . . . . . . . . . . . . . . . . . . . 50 3.9. Heuristics . . . . . . . . . . . . . . . . . . . . . . . 54
3.9.1. Port Usage . . . . . . . . . . . . . . . . . . . . . 51 3.9.1. Port Usage . . . . . . . . . . . . . . . . . . . . . 54
3.9.2. Delayed Subflow Start and Subflow Symmetry . . . . . 51 3.9.2. Delayed Subflow Start and Subflow Symmetry . . . . . 54
3.9.3. Failure Handling . . . . . . . . . . . . . . . . . . 52 3.9.3. Failure Handling . . . . . . . . . . . . . . . . . . 55
4. Semantic Issues . . . . . . . . . . . . . . . . . . . . . . . 53 4. Semantic Issues . . . . . . . . . . . . . . . . . . . . . . . 56
5. Security Considerations . . . . . . . . . . . . . . . . . . . 54 5. Security Considerations . . . . . . . . . . . . . . . . . . . 57
6. Interactions with Middleboxes . . . . . . . . . . . . . . . . 57 6. Interactions with Middleboxes . . . . . . . . . . . . . . . . 60
7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 60 7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 63
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 60 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 64
8.1. MPTCP Option Subtypes . . . . . . . . . . . . . . . . . . 61 8.1. MPTCP Option Subtypes . . . . . . . . . . . . . . . . . . 64
8.2. MPTCP Handshake Algorithms . . . . . . . . . . . . . . . 62 8.2. MPTCP Handshake Algorithms . . . . . . . . . . . . . . . 65
8.3. MP_TCPRST Reason Codes . . . . . . . . . . . . . . . . . 62 8.3. MP_TCPRST Reason Codes . . . . . . . . . . . . . . . . . 66
9. References . . . . . . . . . . . . . . . . . . . . . . . . . 63 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 67
9.1. Normative References . . . . . . . . . . . . . . . . . . 63 9.1. Normative References . . . . . . . . . . . . . . . . . . 67
9.2. Informative References . . . . . . . . . . . . . . . . . 63 9.2. Informative References . . . . . . . . . . . . . . . . . 67
Appendix A. Notes on Use of TCP Options . . . . . . . . . . . . 67 Appendix A. Notes on Use of TCP Options . . . . . . . . . . . . 71
Appendix B. TCP Fast Open . . . . . . . . . . . . . . . . . . . 68 Appendix B. TCP Fast Open and MPTCP . . . . . . . . . . . . . . 72
B.1. TFO cookie request with MPTCP . . . . . . . . . . . . . . 69 B.1. TFO cookie request with MPTCP . . . . . . . . . . . . . . 73
B.2. Data sequence mapping under TFO . . . . . . . . . . . . . 69 B.2. Data sequence mapping under TFO . . . . . . . . . . . . . 73
B.3. Connection establishment examples . . . . . . . . . . . . 70 B.3. Connection establishment examples . . . . . . . . . . . . 74
Appendix C. Control Blocks . . . . . . . . . . . . . . . . . . . 72 Appendix C. Control Blocks . . . . . . . . . . . . . . . . . . . 76
C.1. MPTCP Control Block . . . . . . . . . . . . . . . . . . . 72 C.1. MPTCP Control Block . . . . . . . . . . . . . . . . . . . 76
C.1.1. Authentication and Metadata . . . . . . . . . . . . . 72 C.1.1. Authentication and Metadata . . . . . . . . . . . . . 76
C.1.2. Sending Side . . . . . . . . . . . . . . . . . . . . 73 C.1.2. Sending Side . . . . . . . . . . . . . . . . . . . . 77
C.1.3. Receiving Side . . . . . . . . . . . . . . . . . . . 73 C.1.3. Receiving Side . . . . . . . . . . . . . . . . . . . 77
C.2. TCP Control Blocks . . . . . . . . . . . . . . . . . . . 73 C.2. TCP Control Blocks . . . . . . . . . . . . . . . . . . . 77
C.2.1. Sending Side . . . . . . . . . . . . . . . . . . . . 74 C.2.1. Sending Side . . . . . . . . . . . . . . . . . . . . 78
C.2.2. Receiving Side . . . . . . . . . . . . . . . . . . . 74 C.2.2. Receiving Side . . . . . . . . . . . . . . . . . . . 78
Appendix D. Finite State Machine . . . . . . . . . . . . . . . . 74 Appendix D. Finite State Machine . . . . . . . . . . . . . . . . 78
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 75 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 79
1. Introduction 1. Introduction
Multipath TCP (MPTCP) is a set of extensions to regular TCP [RFC0793] Multipath TCP (MPTCP) is a set of extensions to regular TCP [RFC0793]
to provide a Multipath TCP [RFC6182] service, which enables a to provide a Multipath TCP [RFC6182] service, which enables a
transport connection to operate across multiple paths simultaneously. transport connection to operate across multiple paths simultaneously.
This document presents the protocol changes required to add multipath This document presents the protocol changes required to add multipath
capability to TCP; specifically, those for signaling and setting up capability to TCP; specifically, those for signaling and setting up
multiple paths ("subflows"), managing these subflows, reassembly of multiple paths ("subflows"), managing these subflows, reassembly of
data, and termination of sessions. This is not the only information data, and termination of sessions. This is not the only information
skipping to change at page 8, line 25 skipping to change at page 8, line 25
| |--------------------->| | | |--------------------->| |
| |<---------------------| | | |<---------------------| |
| | | | | | | |
| | | | | | | |
Figure 2: Example MPTCP Usage Scenario Figure 2: Example MPTCP Usage Scenario
1.5. Requirements Language 1.5. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
document are to be interpreted as described in RFC 2119 [RFC2119]. "OPTIONAL" in this document are to be interpreted as described in
BCP 14 [RFC2119] [RFC8174] when, and only when,
they appear in all capitals, as shown here.
2. Operation Overview 2. Operation Overview
This section presents a single description of common MPTCP operation, This section presents a single description of common MPTCP operation,
with reference to the protocol operation. This is a high-level with reference to the protocol operation. This is a high-level
overview of the key functions; the full specification follows in overview of the key functions; the full specification follows in
Section 3. Extensibility and negotiated features are not discussed Section 3. Extensibility and negotiated features are not discussed
here. Considerable reference is made to symbolic names of MPTCP here. Considerable reference is made to symbolic names of MPTCP
options throughout this section -- these are subtypes of the IANA- options throughout this section -- these are subtypes of the IANA-
assigned MPTCP option (see Section 8), and their formats are defined assigned MPTCP option (see Section 8), and their formats are defined
skipping to change at page 9, line 13 skipping to change at page 9, line 13
during the lifetime of the Multipath TCP connection. during the lifetime of the Multipath TCP connection.
All MPTCP operations are signaled with a TCP option -- a single All MPTCP operations are signaled with a TCP option -- a single
numerical type for MPTCP, with "sub-types" for each MPTCP message. numerical type for MPTCP, with "sub-types" for each MPTCP message.
What follows is a summary of the purpose and rationale of these What follows is a summary of the purpose and rationale of these
messages. messages.
2.1. Initiating an MPTCP Connection 2.1. Initiating an MPTCP Connection
This is the same signaling as for initiating a normal TCP connection, This is the same signaling as for initiating a normal TCP connection,
but the SYN, SYN/ACK, and initial ACK packets also carry the but the SYN, SYN/ACK, and initial ACK (and data) packets also carry
MP_CAPABLE option. This option is variable length and serves the MP_CAPABLE option. This option has a variable length and serves
multiple purposes. Firstly, it verifies whether the remote host multiple purposes. Firstly, it verifies whether the remote host
supports Multipath TCP; secondly, this option allows the hosts to supports Multipath TCP; secondly, this option allows the hosts to
exchange some information to authenticate the establishment of exchange some information to authenticate the establishment of
additional subflows. Further details are given in Section 3.1. additional subflows. Further details are given in Section 3.1.
Host A Host B Host A Host B
------ ------ ------ ------
MP_CAPABLE -> MP_CAPABLE ->
[flags] [flags]
<- MP_CAPABLE <- MP_CAPABLE
[B's key, flags] [B's key, flags]
ACK + MP_CAPABLE (+ data) -> ACK + MP_CAPABLE (+ data) ->
[A's key, B's key, flags, (data-level details)] [A's key, B's key, flags, (data-level details)]
Retransmission of the ACK + MP_CAPABLE can occur if it is not known
whether it has been received. The following diagrams show all
possible exchanges for the initial subflow setup to ensure this
reliability.
Host A (with data to send immediately) Host B
------ ------
MP_CAPABLE ->
[flags]
<- MP_CAPABLE
[B's key, flags]
ACK + MP_CAPABLE + data ->
[A's key, B's key, flags, data-level details]
Host A (with data to send later) Host B
------ ------
MP_CAPABLE ->
[flags]
<- MP_CAPABLE
[B's key, flags]
ACK + MP_CAPABLE ->
[A's key, B's key, flags]
ACK + MP_CAPABLE + data ->
[A's key, B's key, flags, data-level details]
Host A Host B (sending first)
------ ------
MP_CAPABLE ->
[flags]
<- MP_CAPABLE
[B's key, flags]
ACK + MP_CAPABLE ->
[A's key, B's key, flags]
<- ACK + DSS + data
[data-level details]
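As an informal illustration of the exchanges above, the following
Python sketch lists the fields carried by each packet of the initial
handshake when Host A sends first. The names Packet and
initial_handshake are hypothetical and chosen for this example only;
the case where Host B sends first is not modelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    """One packet of the initial subflow handshake (illustrative only)."""
    sender: str          # "A" or "B"
    tcp_flags: str       # "SYN", "SYN/ACK", or "ACK"
    option: str          # MPTCP option carried by the packet
    fields: List[str]    # option contents, named as in the diagrams
    has_data: bool = False


def initial_handshake(a_has_data_now: bool) -> List[Packet]:
    """Return the packets exchanged when Host A is the first to send data."""
    packets = [
        Packet("A", "SYN", "MP_CAPABLE", ["flags"]),
        Packet("B", "SYN/ACK", "MP_CAPABLE", ["B's key", "flags"]),
    ]
    ack_fields = ["A's key", "B's key", "flags"]
    if not a_has_data_now:
        # Host A has nothing to send yet: the ACK carries the keys only.
        packets.append(Packet("A", "ACK", "MP_CAPABLE", ack_fields))
    # The first data segment from Host A carries MP_CAPABLE again,
    # this time with the data-level details.
    packets.append(Packet("A", "ACK", "MP_CAPABLE",
                          ack_fields + ["data-level details"],
                          has_data=True))
    return packets
```

Calling initial_handshake(True) yields the three-packet exchange of
the first diagram, and initial_handshake(False) yields the four-packet
exchange of the second.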
2.2. Associating a New Subflow with an Existing MPTCP Connection 2.2. Associating a New Subflow with an Existing MPTCP Connection
The exchange of keys in the MP_CAPABLE handshake provides material The exchange of keys in the MP_CAPABLE handshake provides material
that can be used to authenticate the endpoints when new subflows will that can be used to authenticate the endpoints when new subflows will
be set up. Additional subflows begin in the same way as initiating a be set up. Additional subflows begin in the same way as initiating a
normal TCP connection, but the SYN, SYN/ACK, and ACK packets also normal TCP connection, but the SYN, SYN/ACK, and ACK packets also
carry the MP_JOIN option. carry the MP_JOIN option.
Host A initiates a new subflow between one of its addresses and one Host A initiates a new subflow between one of its addresses and one
of Host B's addresses. The token -- generated from the key -- is of Host B's addresses. The token -- generated from the key -- is
skipping to change at page 10, line 34 skipping to change at page 11, line 38
port pair IP#-A1 and wants to open a second subflow starting at port pair IP#-A1 and wants to open a second subflow starting at
address/port pair IP#-A2, it simply initiates the establishment of address/port pair IP#-A2, it simply initiates the establishment of
the subflow as explained above. The remote host will then be the subflow as explained above. The remote host will then be
implicitly informed about the new address. implicitly informed about the new address.
In some circumstances, a host may want to advertise to the remote In some circumstances, a host may want to advertise to the remote
host the availability of an address without establishing a new host the availability of an address without establishing a new
subflow, for example, when a NAT prevents setup in one direction. In subflow, for example, when a NAT prevents setup in one direction. In
the example below, Host A informs Host B about its alternative IP the example below, Host A informs Host B about its alternative IP
address/port pair (IP#-A2). Host B may later send an MP_JOIN to this address/port pair (IP#-A2). Host B may later send an MP_JOIN to this
new address. This option contains an HMAC to authenticate the address new address. The ADD_ADDR option contains an HMAC to authenticate the
as having been sent from the originator of the connection. Further address as having been sent from the originator of the connection.
details are in Section 3.4.1. The receiver of this option echoes it back to the client to indicate
successful reception. Further details are in Section 3.4.1.
Host A Host B Host A Host B
------ ------ ------ ------
ADD_ADDR -> ADD_ADDR ->
[IP#-A2, [Echo-flag=0,
IP#-A2,
IP#-A2's Address ID, IP#-A2's Address ID,
HMAC of IP#-A2] HMAC of IP#-A2]
<- ADD_ADDR
[Echo-flag=1,
IP#-A2,
IP#-A2's Address ID,
HMAC of IP#-A2]
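The advertise-and-echo exchange can be sketched in Python as follows.
This is a simplified illustration, not the wire format: the use of a
single shared key, the message layout (Address ID followed by the
packed IP address), and the untruncated HMAC-SHA256 output are
assumptions made for the example, and the names AddAddr,
make_add_addr, and verify_and_echo are hypothetical. Section 3.4.1
defines the actual computation.

```python
import hashlib
import hmac
import ipaddress
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class AddAddr:
    echo: int          # 0 when advertised, 1 when echoed back
    address: str       # the advertised IP address
    address_id: int    # Address ID (assumed to fit in one byte)
    mac: bytes         # HMAC over the advertised address


def make_add_addr(key: bytes, address: str, address_id: int) -> AddAddr:
    """Build an ADD_ADDR advertisement (Echo-flag=0)."""
    # Assumed message layout: Address ID byte, then the packed IP address.
    msg = bytes([address_id]) + ipaddress.ip_address(address).packed
    mac = hmac.new(key, msg, hashlib.sha256).digest()
    return AddAddr(0, address, address_id, mac)


def verify_and_echo(key: bytes, opt: AddAddr) -> Optional[AddAddr]:
    """Check the HMAC; on success return the same option with Echo-flag=1."""
    expected = make_add_addr(key, opt.address, opt.address_id).mac
    if not hmac.compare_digest(expected, opt.mac):
        return None    # not from the connection's peer: ignore it
    return replace(opt, echo=1)
```

The echoed option repeats the address, Address ID, and HMAC unchanged,
so the advertising host can match it against what it sent.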
There is a corresponding signal for address removal, making use of There is a corresponding signal for address removal, making use of
the Address ID that is signaled in the add address handshake. the Address ID that is signaled in the add address handshake.
Further details are in Section 3.4.2. Further details are in Section 3.4.2.
Host A Host B Host A Host B
------ ------ ------ ------
REMOVE_ADDR -> REMOVE_ADDR ->
[IP#-A2's Address ID] [IP#-A2's Address ID]
2.4. Data Transfer Using MPTCP 2.4. Data Transfer Using MPTCP
To ensure reliable, in-order delivery of data over subflows that may To ensure reliable, in-order delivery of data over subflows that may
appear and disappear at any time, MPTCP uses a 64-bit data sequence appear and disappear at any time, MPTCP uses a 64-bit data sequence
number (DSN) to number all data sent over the MPTCP connection. Each number (DSN) to number all data sent over the MPTCP connection. Each
subflow has its own 32-bit sequence number space, utilising the subflow has its own 32-bit sequence number space, utilising the
regular TCP sequence number header, and an MPTCP option maps the regular TCP sequence number header, and an MPTCP option maps the
subflow sequence space to the data sequence space. In this way, data subflow sequence space to the data sequence space. In this way, data
can be retransmitted on different subflows (mapped to the same DSN) can be retransmitted on different subflows (mapped to the same DSN)
in the event of failure. in the event of failure.
The "Data Sequence Signal" carries the "Data Sequence Mapping". The The Data Sequence Signal (DSS) carries the Data Sequence Mapping.
data sequence mapping consists of the subflow sequence number, data The Data Sequence Mapping consists of the subflow sequence number,
sequence number, and length for which this mapping is valid. This data sequence number, and length for which this mapping is valid.
option can also carry a connection-level acknowledgment (the "Data This option can also carry a connection-level acknowledgment (the
ACK") for the received DSN. "Data ACK") for the received DSN.
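A minimal Python sketch of such a mapping is given below. It shows
how a 32-bit subflow sequence number is translated into the 64-bit
data sequence space over the length for which the mapping is valid.
The class and method names are hypothetical, and details of the
on-the-wire encoding (Section 3.3.1) are deliberately left out.

```python
from dataclasses import dataclass

DSN_MOD = 1 << 64   # data sequence numbers are 64 bits wide
SSN_MOD = 1 << 32   # subflow sequence numbers are 32 bits wide


@dataclass(frozen=True)
class DataSequenceMapping:
    subflow_seq: int   # first subflow sequence number covered
    data_seq: int      # DSN that subflow_seq maps to
    length: int        # number of bytes for which the mapping is valid

    def covers(self, ssn: int) -> bool:
        """True if subflow sequence number ssn falls inside this mapping."""
        return (ssn - self.subflow_seq) % SSN_MOD < self.length

    def to_dsn(self, ssn: int) -> int:
        """Translate a subflow sequence number into a data sequence number."""
        if not self.covers(ssn):
            raise ValueError("subflow sequence number outside mapping")
        offset = (ssn - self.subflow_seq) % SSN_MOD
        return (self.data_seq + offset) % DSN_MOD
```

Retransmitting data on a different subflow then amounts to creating a
second mapping with the same data_seq and length but with that
subflow's own subflow_seq.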
With MPTCP, all subflows share the same receive buffer and advertise With MPTCP, all subflows share the same receive buffer and advertise
the same receive window. There are two levels of acknowledgment in the same receive window. There are two levels of acknowledgment in
MPTCP. Regular TCP acknowledgments are used on each subflow to MPTCP. Regular TCP acknowledgments are used on each subflow to
acknowledge the reception of the segments sent over the subflow acknowledge the reception of the segments sent over the subflow
independently of their DSN. In addition, there are connection-level independently of their DSN. In addition, there are connection-level
acknowledgments for the data sequence space. These acknowledgments acknowledgments for the data sequence space. These acknowledgments
track the advancement of the bytestream and slide the receiving track the advancement of the bytestream and slide the receiving
window. window.
Further details are in Section 3.3. Further details are in Section 3.3.
Host A Host B Host A Host B
------ ------ ------ ------
DATA_SEQUENCE_SIGNAL -> DSS ->
[Data Sequence Mapping] [Data Sequence Mapping]
[Data ACK] [Data ACK]
[Checksum] [Checksum]
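The connection-level acknowledgment described in this section can be
illustrated with the following Python sketch, in which segments
arriving on any subflow are placed in the shared data sequence space
and the cumulative Data ACK advances only over contiguous data. The
class name ConnectionReceiver is hypothetical, and 64-bit wraparound
and receive-window limits are ignored for brevity.

```python
class ConnectionReceiver:
    """Tracks the connection-level cumulative acknowledgment (Data ACK)."""

    def __init__(self, initial_dsn: int) -> None:
        self.data_ack = initial_dsn   # next DSN expected in order
        self._pending = {}            # out-of-order chunks: start DSN -> length

    def on_segment(self, dsn: int, length: int) -> int:
        """Record bytes [dsn, dsn + length) from any subflow.

        Returns the Data ACK to report after this segment.
        """
        if dsn + length > self.data_ack:
            self._pending[dsn] = max(length, self._pending.get(dsn, 0))
        # Advance over every chunk that now touches the in-order point.
        progressed = True
        while progressed:
            progressed = False
            for start in sorted(self._pending):
                if start <= self.data_ack:
                    end = start + self._pending.pop(start)
                    self.data_ack = max(self.data_ack, end)
                    progressed = True
                    break
        return self.data_ack
```

Each subflow would still acknowledge the segment at the TCP level as
soon as it arrives; only the Data ACK waits for the gap to be filled.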
2.5. Requesting a Change in a Path's Priority 2.5. Requesting a Change in a Path's Priority
Hosts can indicate at initial subflow setup whether they wish the Hosts can indicate at initial subflow setup whether they wish the
subflow to be used as a regular or backup path -- a backup path only subflow to be used as a regular or backup path -- a backup path only
being used if there are no regular paths available. During a being used if there are no regular paths available. During a
connection, Host A can request a change in the priority of a subflow connection, Host A can request a change in the priority of a subflow
through the MP_PRIO signal to Host B. Further details are in through the MP_PRIO signal to Host B. Further details are in
Section 3.3.8. Section 3.3.8.
Host A Host B Host A Host B
------ ------ ------ ------
MP_PRIO -> MP_PRIO ->
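The backup rule stated above (a backup path is used only when no
regular path is available) can be sketched in Python as follows. The
Subflow class and usable_subflows function are hypothetical names for
this example, and how an implementation schedules data among the
returned subflows is outside its scope.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subflow:
    name: str
    backup: bool = False     # set at subflow setup or changed via MP_PRIO
    available: bool = True   # False once the path has failed


def usable_subflows(subflows: List[Subflow]) -> List[Subflow]:
    """Return regular subflows if any are available, else backup ones."""
    regular = [s for s in subflows if s.available and not s.backup]
    if regular:
        return regular
    return [s for s in subflows if s.available and s.backup]
```

Receiving an MP_PRIO signal would correspond to changing the backup
attribute of the affected subflow before the next call.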
2.6. Closing an MPTCP Connection 2.6. Closing an MPTCP Connection
When a host wants to close an existing subflow, but not the whole
connection, it can initiate a regular TCP FIN/ACK exchange.
When Host A wants to inform Host B that it has no more data to send, When Host A wants to inform Host B that it has no more data to send,
it signals this "Data FIN" as part of the Data Sequence Signal (see it signals this "Data FIN" as part of the Data Sequence Signal (see
above). It has the same semantics and behavior as a regular TCP FIN, above). It has the same semantics and behavior as a regular TCP FIN,
but at the connection level. Once all the data on the MPTCP but at the connection level. Once all the data on the MPTCP
connection has been successfully received, then this message is connection has been successfully received, then this message is
acknowledged at the connection level with a DATA_ACK. Further acknowledged at the connection level with a DATA_ACK. Further
details are in Section 3.3.3. details are in Section 3.3.3.
Host A Host B Host A Host B
------ ------ ------ ------
DATA_SEQUENCE_SIGNAL -> DATA_SEQUENCE_SIGNAL ->
[Data FIN] [Data FIN]
<- (MPTCP DATA_ACK) <- (MPTCP DATA_ACK)
There is an additional method of connection closure, referred to as
"Fast Close", which is analogous to closing a single-path TCP
connection with a RST signal. The MP_FASTCLOSE signal is used to
indicate to the peer that the connection will be abruptly closed and
no data will be accepted anymore. This can be sent on an ACK (which
ensures reliable delivery of the signal) or on a RST (which does
not). Both examples are shown in the following diagrams. Further
details are in Section 3.5.
Host A Host B
------ ------
ACK + MP_FASTCLOSE ->
[B's key]
[RST on all other subflows] ->
<- [RST on all subflows]
Host A Host B
------ ------
RST + MP_FASTCLOSE ->
[B's key] [on all subflows]
<- [RST on all subflows]
2.7. Notable Features 2.7. Notable Features
It is worth highlighting that MPTCP's signaling has been designed It is worth highlighting that MPTCP's signaling has been designed
with several key requirements in mind: with several key requirements in mind:
o To cope with NATs on the path, addresses are referred to by o To cope with NATs on the path, addresses are referred to by
Address IDs, in case the IP packet's source address gets changed Address IDs, in case the IP packet's source address gets changed
by a NAT. Setting up a new TCP flow is not possible if the by a NAT. Setting up a new TCP flow is not possible if the
passive opener is behind a NAT; to allow subflows to be created receiver of the SYN is behind a NAT; to allow subflows to be
when either end is behind a NAT, MPTCP uses the ADD_ADDR message. created when either end is behind a NAT, MPTCP uses the ADD_ADDR
message.
o MPTCP falls back to ordinary TCP if MPTCP operation is not o MPTCP falls back to ordinary TCP if MPTCP operation is not
possible, for example, if one host is not MPTCP capable or if a possible, for example, if one host is not MPTCP capable or if a
middlebox alters the payload. middlebox alters the payload. This is discussed in Section 3.7.
o To meet the threats identified in [RFC6181], the following steps o To address the threats identified in [RFC6181], the following
are taken: keys are sent in the clear in the MP_CAPABLE messages; steps are taken: keys are sent in the clear in the MP_CAPABLE
MP_JOIN messages are secured with HMAC-SHA256 ([RFC2104], [SHS]) messages; MP_JOIN messages are secured with HMAC-SHA256
using those keys; and standard TCP validity checks are made on the ([RFC2104], [SHS]) using those keys; and standard TCP validity
other messages (ensuring sequence numbers are in-window checks are made on the other messages (ensuring sequence numbers
[RFC5961]). Further information can be found in Section 5. are in-window [RFC5961]). Residual threats to MPTCP v0 [RFC6824]
were identified in [RFC7430], and those affecting the protocol
(i.e., modification to ADD_ADDR) have been incorporated in this
document. Further discussion of security can be found in
Section 5.
3. MPTCP Protocol 3. MPTCP Protocol
This section describes the operation of the MPTCP protocol, and is This section describes the operation of the MPTCP protocol, and is
subdivided into sections for each key part of the protocol operation. subdivided into sections for each key part of the protocol operation.
All MPTCP operations are signaled using optional TCP header fields. All MPTCP operations are signaled using optional TCP header fields.
A single TCP option number ("Kind") has been assigned by IANA for A single TCP option number ("Kind") has been assigned by IANA for
MPTCP (see Section 8), and then individual messages will be MPTCP (see Section 8), and then individual messages will be
determined by a "subtype", the values of which are also stored in an determined by a "subtype", the values of which are also stored in an
IANA registry (and are also listed in Section 8). IANA registry (and are also listed in Section 8). As with all TCP
options, the Length field is specified in bytes, and includes the 2
bytes of Kind and Length.
Throughout this document, when reference is made to an MPTCP option Throughout this document, when reference is made to an MPTCP option
by symbolic name, such as "MP_CAPABLE", this refers to a TCP option by symbolic name, such as "MP_CAPABLE", this refers to a TCP option
with the single MPTCP option type, and with the subtype value of the with the single MPTCP option type, and with the subtype value of the
symbolic name as defined in Section 8. This subtype is a 4-bit field symbolic name as defined in Section 8. This subtype is a 4-bit field
-- the first 4 bits of the option payload, as shown in Figure 3. The -- the first 4 bits of the option payload, as shown in Figure 3. The
MPTCP messages are defined in the following sections. MPTCP messages are defined in the following sections.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
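The layout described above (Kind, Length, then a 4-bit subtype at the
start of the payload) can be read with a few lines of Python. This is
a sketch only: the Kind value 30 is an assumption not stated in this
section (see Section 8 for the assigned value), and the function name
parse_mptcp_option is hypothetical.

```python
MPTCP_KIND = 30  # assumption: the IANA-assigned TCP option Kind for MPTCP


def parse_mptcp_option(option: bytes):
    """Return (subtype, payload) for one MPTCP TCP option, or None."""
    if len(option) < 3:
        return None                 # need Kind, Length, and one payload byte
    kind, length = option[0], option[1]
    if kind != MPTCP_KIND or length < 3 or length > len(option):
        return None
    payload = option[2:length]      # Length counts the Kind and Length bytes
    subtype = payload[0] >> 4       # first 4 bits of the option payload
    return subtype, payload
```

The remaining 4 bits of the first payload byte, and everything after
it, are specific to each subtype and are left uninterpreted here.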
skipping to change at page 13, line 49 skipping to change at page 16, line 11
segment [RFC5681] in regular TCP. Therefore, an MPTCP implementation segment [RFC5681] in regular TCP. Therefore, an MPTCP implementation
receiving a duplicate ACK that contains an MPTCP option MUST NOT receiving a duplicate ACK that contains an MPTCP option MUST NOT
treat it as a signal of congestion. Additionally, an MPTCP treat it as a signal of congestion. Additionally, an MPTCP
implementation SHOULD NOT send more than two duplicate ACKs in a row implementation SHOULD NOT send more than two duplicate ACKs in a row
for the purposes of sending MPTCP options alone, in order to ensure for the purposes of sending MPTCP options alone, in order to ensure
no middleboxes misinterpret this as a sign of congestion. no middleboxes misinterpret this as a sign of congestion.
Furthermore, standard TCP validity checks (such as ensuring the Furthermore, standard TCP validity checks (such as ensuring the
sequence number and acknowledgment number are within window) MUST be sequence number and acknowledgment number are within window) MUST be
undertaken before processing any MPTCP signals, as described in undertaken before processing any MPTCP signals, as described in
[RFC5961], and initial subfow sequence numbers SHOULD be generated [RFC5961], and initial subflow sequence numbers SHOULD be generated
according to the recommendations in [RFC6528]. according to the recommendations in [RFC6528].
3.1. Connection Initiation 3.1. Connection Initiation
Connection initiation begins with a SYN, SYN/ACK, ACK exchange on a Connection initiation begins with a SYN, SYN/ACK, ACK exchange on a
single path. Each packet contains the Multipath Capable (MP_CAPABLE) single path. Each packet contains the Multipath Capable (MP_CAPABLE)
MPTCP option (Figure 4). This option declares its sender is capable MPTCP option (Figure 4). This option declares its sender is capable
of performing Multipath TCP and wishes to do so on this particular of performing Multipath TCP and wishes to do so on this particular
connection. connection.
The MP_CAPABLE exchange in this specification (v1) is different to The MP_CAPABLE exchange in this specification (v1) is different to
that specified in v0 [RFC6824]. If a host supports multiple versions that specified in v0 [RFC6824]. If a host supports multiple versions
of MPTCP, the sender of the MP_CAPABLE option SHOULD signal the of MPTCP, the sender of the MP_CAPABLE option SHOULD signal the
highest version number it supports. The passive opener, on receipt highest version number it supports. In return, in its MP_CAPABLE
of this, will signal the version number it wishes to use, which MUST option, the receiver will signal the version number it wishes to use,
be equal to or lower than the version number indicated in the initial which MUST be equal to or lower than the version number indicated in
MP_CAPABLE. Given the SYN exchange is different between v1 and v0 the initial MP_CAPABLE. There is a caveat though with respect to
the exchange cannot be immediately downgraded, and therefore if the this version negotiation with old listeners that only support v0. A
far end has requested a lower version then the initiator SHOULD listener that supports v0 expects that the MP_CAPABLE option in the
respond with an ACK without any MP_CAPABLE option, to fall back to SYN-segment includes the initiator's key. If the initiator however
regular TCP. If the initiator supports the requsted version, on already upgraded to v1, it won't include the key in the SYN-segment.
future connections to the target host, the initiator MAY cache the Thus, the listener will ignore the MP_CAPABLE of this SYN-segment and
version preference. Alternatively, the initiator MAY close the reply with a SYN/ACK that does not include an MP_CAPABLE, thus
connection with a TCP RST and immediately re-establish with the leading to a fallback to regular TCP. An initiator MAY cache this
requested version of MPTCP. information about a peer and for future connections, MAY choose to
attempt using MPTCP v0, if supported, before recording the host as
not supporting MPTCP.
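The receiver-side version choice described above can be sketched as follows. This is a minimal illustration, not part of the specification: the function name `choose_version`, the policy of picking the highest acceptable version, and the use of `None` to mean "reply without MP_CAPABLE and fall back to regular TCP" are all assumptions made for the example.

```python
from typing import Iterable, Optional


def choose_version(requested: int, supported: Iterable[int]) -> Optional[int]:
    """Pick the version a receiver signals in its own MP_CAPABLE.

    The chosen version must be equal to or lower than the version in the
    initial MP_CAPABLE. Returning None is a hypothetical convention meaning
    "no acceptable version; omit MP_CAPABLE so both ends use regular TCP".
    """
    candidates = [v for v in supported if v <= requested]
    # Assumed policy: prefer the highest version both sides can use.
    return max(candidates) if candidates else None
```

For instance, a receiver supporting versions 0 and 1 that sees a request for version 1 would answer with 1, while a receiver supporting only version 1 that sees a request for version 0 has nothing acceptable to offer.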
The MP_CAPABLE option is variable-length, with different fields The MP_CAPABLE option is variable-length, with different fields
included depending on which packet the option is used on. The full included depending on which packet the option is used on. The full
MP_CAPABLE option is shown in Figure 4. MP_CAPABLE option is shown in Figure 4.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+-------+---------------+ +---------------+---------------+-------+-------+---------------+
| Kind | Length |Subtype|Version|A|B|C|D|E|F|G|H| | Kind | Length |Subtype|Version|A|B|C|D|E|F|G|H|
+---------------+---------------+-------+-------+---------------+ +---------------+---------------+-------+-------+---------------+
skipping to change at page 15, line 4 skipping to change at page 17, line 25
| (if option Length > 12) | | (if option Length > 12) |
| | | |
+-------------------------------+-------------------------------+ +-------------------------------+-------------------------------+
| Data-Level Length (16 bits) | Checksum (16 bits, optional) | | Data-Level Length (16 bits) | Checksum (16 bits, optional) |
+-------------------------------+-------------------------------+ +-------------------------------+-------------------------------+
Figure 4: Multipath Capable (MP_CAPABLE) Option Figure 4: Multipath Capable (MP_CAPABLE) Option
The MP_CAPABLE option is carried on the SYN, SYN/ACK, and ACK packets The MP_CAPABLE option is carried on the SYN, SYN/ACK, and ACK packets
that start the first subflow of an MPTCP connection, as well as the that start the first subflow of an MPTCP connection, as well as the
first packet that carries data, if the initiator wishs to send first. first packet that carries data, if the initiator wishes to send
The data carried by each option is as follows, where A = initiator first. The data carried by each option is as follows, where A =
and B = listener. initiator and B = listener.
o SYN (A->B): only the first four octets (Length = 4). o SYN (A->B): only the first four octets (Length = 4).
o SYN/ACK (B->A): B's Key for this connection (Length = 12). o SYN/ACK (B->A): B's Key for this connection (Length = 12).
o ACK (no data) (A->B): A's Key followed by B's Key (Length = 20). o ACK (no data) (A->B): A's Key followed by B's Key (Length = 20).
o ACK (with first data) (A->B): A's Key followed by B's Key followed o ACK (with first data) (A->B): A's Key followed by B's Key followed
by Data-Level Length, and optional Checksum (Length = 22 or 24). by Data-Level Length, and optional Checksum (Length = 22 or 24).
The contents of the option are determined by the SYN and ACK flags of The contents of the option are determined by the SYN and ACK flags of
the packet, along with the option's length field. For the diagram the packet, along with the option's length field. For the diagram
shown in Figure 4, "sender" and "receiver" refer to the sender or shown in Figure 4, "sender" and "receiver" refer to the sender or
receiver of the TCP packet (which can be either host). receiver of the TCP packet (which can be either host).
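As a rough sketch of how the SYN and ACK flags plus the option Length select the contents listed above, the following classifier may help. The function name `mp_capable_contents` and the returned strings are illustrative only; the Length values (4, 12, 20, 22, 24) are the ones given in the list.

```python
def mp_capable_contents(syn: bool, ack: bool, length: int) -> str:
    """Describe what an MP_CAPABLE option carries, given the TCP flags of
    the packet and the option's Length field (in octets)."""
    if syn and not ack and length == 4:
        return "SYN: header only"
    if syn and ack and length == 12:
        return "SYN/ACK: listener key"
    if ack and not syn:
        if length == 20:
            return "ACK: initiator key, listener key"
        if length == 22:
            return "ACK with data: keys, Data-Level Length"
        if length == 24:
            return "ACK with data: keys, Data-Level Length, Checksum"
    # Any other combination is not one of the forms listed above.
    raise ValueError(f"unexpected MP_CAPABLE: syn={syn} ack={ack} length={length}")
```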
The initial SYN, containing just the MP_CAPABLE header, is used to The initial SYN, containing just the MP_CAPABLE header, is used to
define the version of MPTCP being requested, as well as exchanging define the version of MPTCP being requested, as well as exchanging
flags to negotiate connection features, described later. flags to negotiate connection features, described later.
This option is used to declare the 64-bit keys that the end hosts This option is used to declare the 64-bit keys that the end hosts
have generated for this MPTCP connection. This key is used to have generated for this MPTCP connection. These keys are used to
authenticate the addition of future subflows to this connection. authenticate the addition of future subflows to this connection.
This is the only time the key will be sent in clear on the wire This is the only time the key will be sent in clear on the wire
(unless "fast close", Section 3.5, is used); all future subflows will (unless "fast close", Section 3.5, is used); all future subflows will
identify the connection using a 32-bit "token". This token is a identify the connection using a 32-bit "token". This token is a
cryptographic hash of this key. The algorithm for this process is cryptographic hash of this key. The algorithm for this process is
dependent on the authentication algorithm selected; the method of dependent on the authentication algorithm selected; the method of
selection is defined later in this section. selection is defined later in this section.
Upon reception of the initial SYN-segment, a stateful server Upon reception of the initial SYN-segment, a stateful server
generates a random key and replies with a SYN/ACK. The key's method generates a random key and replies with a SYN/ACK. The key's method
of generation is implementation specific. The key MUST be hard to of generation is implementation specific. The key MUST be hard to
guess, and it MUST be unique for the sending host at any one time. guess, and it MUST be unique for the sending host across all its
Recommendations for generating random numbers for use in keys are current MPTCP connections. Recommendations for generating random
given in [RFC4086]. Connections will be indexed at each host by the numbers for use in keys are given in [RFC4086]. Connections will be
token (a one-way hash of the key). Therefore, an implementation will indexed at each host by the token (a one-way hash of the key).
require a mapping from each token to the corresponding connection, Therefore, an implementation will require a mapping from each token
and in turn to the keys for the connection. to the corresponding connection, and in turn to the keys for the
connection.
There is a risk that two different keys will hash to the same token. There is a risk that two different keys will hash to the same token.
The risk of hash collisions is usually small, unless the host is The risk of hash collisions is usually small, unless the host is
handling many tens of thousands of connections. Therefore, an handling many tens of thousands of connections. Therefore, an
implementation SHOULD check its list of connection tokens to ensure implementation SHOULD check its list of connection tokens to ensure
there is not a collision before sending its key, and if there is, there is no collision before sending its key, and if there is, then
then it should generate a new key. This would, however, be costly it should generate a new key. This would, however, be costly for a
for a server with thousands of connections. The subflow handshake server with thousands of connections. The subflow handshake
mechanism (Section 3.2) will ensure that new subflows only join the mechanism (Section 3.2) will ensure that new subflows only join the
correct connection, however, through the cryptographic handshake, as correct connection, however, through the cryptographic handshake, as
well as checking the connection tokens in both directions, and well as checking the connection tokens in both directions, and
ensuring sequence numbers are in-window. So in the worst case if ensuring sequence numbers are in-window. So in the worst case if
there was a token collision, the new subflow would not succeed, but there was a token collision, the new subflow would not succeed, but
the MPTCP connection would continue to provide a regular TCP service. the MPTCP connection would continue to provide a regular TCP service.
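The key generation and token collision check described above might look like the sketch below. It assumes the token is the most significant 32 bits of a SHA-256 hash of the key; the text here only says the token is a cryptographic hash whose algorithm depends on the negotiated authentication algorithm, so treat that choice as an assumption. The names `token_from_key` and `register_connection`, and the dictionary used as the token table, are hypothetical.

```python
import hashlib
import secrets


def token_from_key(key: bytes) -> int:
    """Derive a 32-bit token from a 64-bit key.

    Assumption: the token is the leftmost 32 bits of SHA-256(key).
    """
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big")


def register_connection(table: dict, conn: object) -> bytes:
    """Generate a hard-to-guess 64-bit key whose token is not already in
    use, and record token -> (connection, key) in the table."""
    while True:
        key = secrets.token_bytes(8)   # 64 random bits from the OS CSPRNG
        token = token_from_key(key)
        if token not in table:         # collision check before sending the key
            table[token] = (conn, key)
            return key
```

The table gives the mapping from token to connection, and in turn to the key, that an implementation needs when a later subflow presents the token.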
Since key generation is implementation-specific, there is no Since key generation is implementation-specific, there is no
requirement that they be simply random numbers. An implemention is requirement that they be simply random numbers. An implementation is
free to exchange cryptographic material out-of-band and generate free to exchange cryptographic material out-of-band and generate
these keys from this, in order to provide additional mechanisms by these keys from this, in order to provide additional mechanisms by
which to verify the identity of the communicating entities. For which to verify the identity of the communicating entities. For
example, an implementation could choose to link its MPTCP keys to example, an implementation could choose to link its MPTCP keys to
those used in higher-layer TLS or SSH connections. those used in higher-layer TLS or SSH connections.
If the server behaves in a stateless manner, it has to generate its If the server behaves in a stateless manner, it has to generate its
own key in a verifiable fashion. This verifiable way of generating own key in a verifiable fashion. This verifiable way of generating
the key can be done by using a hash of the 4-tuple, sequence number the key can be done by using a hash of the 4-tuple, sequence number
and a local secret (similar to what is done for the TCP-sequence and a local secret (similar to what is done for the TCP-sequence
skipping to change at page 16, line 38 skipping to change at page 19, line 12
generate an alternative verifiable key, then the connection MUST fall generate an alternative verifiable key, then the connection MUST fall
back to using regular TCP by not sending a MP_CAPABLE in the SYN/ACK. back to using regular TCP by not sending a MP_CAPABLE in the SYN/ACK.
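One possible shape for the verifiable key generation used by a stateless server is sketched below. It uses HMAC-SHA256 keyed with the local secret as the "hash of the 4-tuple, sequence number and a local secret"; that particular construction, the string encoding of the inputs, and the names `stateless_key` and `verify_echoed_key` are assumptions for illustration, not something the text mandates.

```python
import hashlib
import hmac


def stateless_key(secret: bytes, src_ip: str, src_port: int,
                  dst_ip: str, dst_port: int, seq: int) -> bytes:
    """Derive a 64-bit key from the 4-tuple, a sequence number and a
    local secret, so it can be recomputed later without stored state."""
    msg = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{seq}".encode()
    # Assumed construction: keyed hash, truncated to 8 octets.
    return hmac.new(secret, msg, hashlib.sha256).digest()[:8]


def verify_echoed_key(echoed: bytes, secret: bytes, src_ip: str, src_port: int,
                      dst_ip: str, dst_port: int, seq: int) -> bool:
    """Check a key echoed back in the third ACK by recomputing it."""
    expected = stateless_key(secret, src_ip, src_port, dst_ip, dst_port, seq)
    return hmac.compare_digest(expected, echoed)
```

A server using this approach would still need the token collision handling described in the text, for example by deriving an alternative key or omitting MP_CAPABLE from the SYN/ACK.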
The ACK carries both A's key and B's key. This is the first time The ACK carries both A's key and B's key. This is the first time
that A's key is seen on the wire, although it is expected that A will that A's key is seen on the wire, although it is expected that A will
have generated a key locally before the initial SYN. The echoing of have generated a key locally before the initial SYN. The echoing of
B's key allows B to operate statelessly, as described above. B's key allows B to operate statelessly, as described above.
Therefore, A's key must be delivered reliably to B, and in order to Therefore, A's key must be delivered reliably to B, and in order to
do this, the transmission of this packet must be made reliable. do this, the transmission of this packet must be made reliable.
If B has data to send first, then the reliable delivery of the ACK If B has data to send first, then the reliable delivery of the
can be inferred by the receipt of this data with a MPTCP Data ACK+MP_CAPABLE can be inferred by the receipt of this data with a
Sequence Signal (DSS) option (Section 3.3). If, however, A wishes to MPTCP Data Sequence Signal (DSS) option (Section 3.3). If, however,
send data first, it would not know whether the ACK has successfully A wishes to send data first, it has two options to ensure the
been received, and thus whether the MPTCP is successfully reliable delivery of the ACK+MP_CAPABLE. If it immediately has data
established. Therefore, on the first data A has to send (if it has to send, then the third ACK (with data) would also contain an
not received any data from B), it MUST also include a MP_CAPABLE MP_CAPABLE option with additional data parameters (the Data-Level
option, with additional data parameters (the Data-Level Length and Length and optional Checksum as shown in Figure 4). If A does not
optional Checksum as shown in Figure 4). This packet may be the immediately have data to send, it MUST include the MP_CAPABLE on the
third ACK if data is ready to be sent by the application, or may be a third ACK, but without the additional data parameters. When A does
later packet if the application only later has data to send. This have data to send, it must repeat the sending of the MP_CAPABLE
option from the third ACK, with additional data parameters. This
MP_CAPABLE option is in place of the DSS, and simply specifies the MP_CAPABLE option is in place of the DSS, and simply specifies the
data-level length of the payload, and the checksum (if the use of data-level length of the payload, and the checksum (if the use of
checksums is negotiated). This is the minimal data required to checksums is negotiated). This is the minimal data required to
establish a MPTCP connection - it allows validation of the payload, establish a MPTCP connection - it allows validation of the payload,
and given it is the first data, the Initial Data Sequence Number and given it is the first data, the Initial Data Sequence Number
(IDSN) is also known (as it is generated from the key, as described (IDSN) is also known (as it is generated from the key, as described
below). Conveying the keys on the first data packet allows the TCP below). Conveying the keys on the first data packet allows the TCP
reliability mechanisms to ensure the packet is successfully reliability mechanisms to ensure the packet is successfully
delivered. The receiver will acknowledge this data a the connection delivered. The receiver will acknowledge this data at the connection
level with a Data ACK, as if a DSS option has been received. level with a Data ACK, as if a DSS option has been received.
There could be situations where both A and B attempt to transmit There could be situations where both A and B attempt to transmit
initial data at the same time. For example, if A did not initially initial data at the same time. For example, if A did not initially
have data to send, but then needed to transmit data before it had have data to send, but then needed to transmit data before it had
received anything from B, it would use a MP_CAPABLE option with data received anything from B, it would use a MP_CAPABLE option with data
parameters (since it would not know if the MP_CAPABLE on the ACK was parameters (since it would not know if the MP_CAPABLE on the ACK was
received). In such a situation, B may also have transmitted data received). In such a situation, B may also have transmitted data
with a DSS option, but it had not yet been received at A. Therefore, with a DSS option, but it had not yet been received at A. Therefore,
B has received data with a MP_CAPABLE mapping after it has sent data B has received data with a MP_CAPABLE mapping after it has sent data
skipping to change at page 17, line 44 skipping to change at page 20, line 19
the handshake either party thinks the MPTCP negotiation is the handshake either party thinks the MPTCP negotiation is
compromised, for example by a middlebox corrupting the TCP options, compromised, for example by a middlebox corrupting the TCP options,
or unexpected ACK numbers being present, the host MUST stop using or unexpected ACK numbers being present, the host MUST stop using
MPTCP and no longer include MPTCP options in future TCP packets. The MPTCP and no longer include MPTCP options in future TCP packets. The
other host will then also fall back to regular TCP using the fall other host will then also fall back to regular TCP using the fall
back mechanism. Note that new subflows MUST NOT be established back mechanism. Note that new subflows MUST NOT be established
(using the process documented in Section 3.2) until a Data Sequence (using the process documented in Section 3.2) until a Data Sequence
Signal (DSS) option has been successfully received across the path Signal (DSS) option has been successfully received across the path
(as documented in Section 3.3). (as documented in Section 3.3).
The first 4 bits of the first octet in the MP_CAPABLE option Like all MPTCP options, the MP_CAPABLE option starts with the Kind
(Figure 4) define the MPTCP option subtype (see Section 8; for and Length to specify the TCP-option kind and its length. Followed
MP_CAPABLE, this is 0), and the remaining 4 bits of this octet by that is the MP_CAPABLE option. The first 4 bits of the first
specify the MPTCP version in use (for this specification, this is 1). octet in the MP_CAPABLE option (Figure 4) define the MPTCP option
subtype (see Section 8; for MP_CAPABLE, this is 0x0), and the
remaining 4 bits of this octet specify the MPTCP version in use (for
this specification, this is 1).
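The layout of the first four octets can be illustrated with a small parser. This is a sketch under the assumption that the caller hands over exactly one TCP option, starting at its Kind octet; the function name `parse_mp_capable_header` is hypothetical.

```python
from typing import Tuple


def parse_mp_capable_header(option: bytes) -> Tuple[int, int, int, int, int]:
    """Split the first four octets of an MP_CAPABLE option into
    (kind, length, subtype, version, flags)."""
    if len(option) < 4:
        raise ValueError("MP_CAPABLE needs at least 4 octets")
    kind, length = option[0], option[1]
    # The Length field counts the Kind and Length octets too.
    if length != len(option):
        raise ValueError("Length field does not match option size")
    subtype = option[2] >> 4     # first 4 bits after Kind and Length
    version = option[2] & 0x0F   # remaining 4 bits of that octet
    flags = option[3]            # bits A (leftmost) through H
    return kind, length, subtype, version, flags
```

With this layout, flag "A" would be tested as `flags & 0x80`, since it is the leftmost bit of the flags octet.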
The second octet is reserved for flags, allocated as follows: The second octet is reserved for flags, allocated as follows:
A: The leftmost bit, labeled "A", SHOULD be set to 1 to indicate A: The leftmost bit, labeled "A", SHOULD be set to 1 to indicate
"Checksum Required", unless the system administrator has decided "Checksum Required", unless the system administrator has decided
that checksums are not required (for example, if the environment that checksums are not required (for example, if the environment
is controlled and no middleboxes exist that might adjust the is controlled and no middleboxes exist that might adjust the
payload). payload).
B: The second bit, labeled "B", is an extensibility flag, and MUST be B: The second bit, labeled "B", is an extensibility flag, and MUST be
set to 0 for current implementations. This will be used for an set to 0 for current implementations. This will be used for an
extensibility mechanism in a future specification, and the impact extensibility mechanism in a future specification, and the impact
of this flag will be defined at a later date. If receiving a of this flag will be defined at a later date. If receiving a
message with the 'B' flag set to 1, and this is not understood, message with the 'B' flag set to 1, and this is not understood,
then this SYN MUST be silently ignored; the sender is expected to then the MP_CAPABLE in this SYN MUST be silently ignored, which
triggers a fallback to regular TCP; the sender is expected to
retry with a format compatible with this legacy specification. retry with a format compatible with this legacy specification.
Note that the length of the MP_CAPABLE option, and the meanings of Note that the length of the MP_CAPABLE option, and the meanings of
bits "C" through "H", may be altered by setting B=1. bits "C" through "H", may be altered by setting B=1.
C: The third bit, labeled "C", is set to "1" to indicate that the C: The third bit, labeled "C", is set to "1" to indicate that the
sender of this option will not accept additional MPTCP subflows to sender of this option will not accept additional MPTCP subflows to
the source address and port, and therefore the receiver MUST NOT the source address and port, and therefore the receiver MUST NOT
try to open any additional subflows towards this address and port. try to open any additional subflows towards this address and port.
This is an efficiency improvement for situations where the sender This is an efficiency improvement for situations where the sender
knows a restriction is in place, for example if the sender is knows a restriction is in place, for example if the sender is
skipping to change at page 19, line 37 skipping to change at page 22, line 15
load. If a responder does not support (or does not want to support) load. If a responder does not support (or does not want to support)
any of the initiator's proposals, it can respond without an any of the initiator's proposals, it can respond without an
MP_CAPABLE option, thus forcing a fallback to regular TCP. MP_CAPABLE option, thus forcing a fallback to regular TCP.
The MP_CAPABLE option is only used in the first subflow of a The MP_CAPABLE option is only used in the first subflow of a
connection, in order to identify the connection; all following connection, in order to identify the connection; all following
subflows will use the "Join" option (see Section 3.2) to join the subflows will use the "Join" option (see Section 3.2) to join the
existing connection. existing connection.
If a SYN contains an MP_CAPABLE option but the SYN/ACK does not, it If a SYN contains an MP_CAPABLE option but the SYN/ACK does not, it
is assumed that the passive opener is not multipath capable; thus, is assumed that sender of the SYN/ACK is not multipath capable; thus,
the MPTCP session MUST operate as a regular, single-path TCP. If a the MPTCP session MUST operate as a regular, single-path TCP. If a
SYN does not contain a MP_CAPABLE option, the SYN/ACK MUST NOT SYN does not contain a MP_CAPABLE option, the SYN/ACK MUST NOT
contain one in response. If the third packet (the ACK) does not contain one in response. If the third packet (the ACK) does not
contain the MP_CAPABLE option, then the session MUST fall back to contain the MP_CAPABLE option, then the session MUST fall back to
operating as a regular, single-path TCP. This is to maintain operating as a regular, single-path TCP. This is to maintain
compatibility with middleboxes on the path that drop some or all TCP compatibility with middleboxes on the path that drop some or all TCP
options. Note that an implementation MAY choose to attempt sending options. Note that an implementation MAY choose to attempt sending
MPTCP options more than one time before making this decision to MPTCP options more than one time before making this decision to
operate as regular TCP (see Section 3.9). operate as regular TCP (see Section 3.9).
skipping to change at page 22, line 26 skipping to change at page 24, line 50
When receiving a SYN with an MP_JOIN option that contains a valid When receiving a SYN with an MP_JOIN option that contains a valid
token for an existing MPTCP connection, the recipient SHOULD respond token for an existing MPTCP connection, the recipient SHOULD respond
with a SYN/ACK also containing an MP_JOIN option containing a random with a SYN/ACK also containing an MP_JOIN option containing a random
number and a truncated (leftmost 64 bits) Hash-based Message number and a truncated (leftmost 64 bits) Hash-based Message
Authentication Code (HMAC). This version of the option is shown in Authentication Code (HMAC). This version of the option is shown in
Figure 6. If the token is unknown, or the host wants to refuse Figure 6. If the token is unknown, or the host wants to refuse
subflow establishment (for example, due to a limit on the number of subflow establishment (for example, due to a limit on the number of
subflows it will permit), the receiver will send back a reset (RST) subflows it will permit), the receiver will send back a reset (RST)
signal, analogous to an unknown port in TCP, containing a MP_TCPRST signal, analogous to an unknown port in TCP, containing a MP_TCPRST
option (Section 3.6) with an appropriate reason code. Although option (Section 3.6) with a "MPTCP specific error" reason code.
calculating an HMAC requires cryptographic operations, it is believed Although calculating an HMAC requires cryptographic operations, it is
that the 32-bit token in the MP_JOIN SYN gives sufficient protection believed that the 32-bit token in the MP_JOIN SYN gives sufficient
against blind state exhaustion attacks; therefore, there is no need protection against blind state exhaustion attacks; therefore, there
to provide mechanisms to allow a responder to operate statelessly at is no need to provide mechanisms to allow a responder to operate
the MP_JOIN stage. statelessly at the MP_JOIN stage.
An HMAC is sent by both hosts -- by the initiator (Host A) in the An HMAC is sent by both hosts -- by the initiator (Host A) in the
third packet (the ACK) and by the responder (Host B) in the second third packet (the ACK) and by the responder (Host B) in the second
packet (the SYN/ACK). Doing the HMAC exchange at this stage allows packet (the SYN/ACK). Doing the HMAC exchange at this stage allows
both hosts to have first exchanged random data (in the first two SYN both hosts to have first exchanged random data (in the first two SYN
packets) that is used as the "message". This specification defines packets) that is used as the "message". This specification defines
that HMAC as defined in [RFC2104] is used, along with the SHA-256 that HMAC as defined in [RFC2104] is used, along with the SHA-256
hash algorithm [SHS] (potentially implemented as in [RFC6234]), thus hash algorithm [SHS] (potentially implemented as in [RFC6234]), thus
generating a 160-bit / 20-octet HMAC. Due to option space generating a 160-bit / 20-octet HMAC. Due to option space
limitations, the HMAC included in the SYN/ACK is truncated to the limitations, the HMAC included in the SYN/ACK is truncated to the
skipping to change at page 24, line 35 skipping to change at page 27, line 35
| |<-------------------------------| | |<-------------------------------|
| | ACK | | | ACK |
HMAC-A = HMAC(Key=(Key-A+Key-B), Msg=(R-A+R-B)) HMAC-A = HMAC(Key=(Key-A+Key-B), Msg=(R-A+R-B))
HMAC-B = HMAC(Key=(Key-B+Key-A), Msg=(R-B+R-A)) HMAC-B = HMAC(Key=(Key-B+Key-A), Msg=(R-B+R-A))
Figure 8: Example Use of MPTCP Authentication Figure 8: Example Use of MPTCP Authentication
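The two HMAC computations in Figure 8 can be sketched with a standard HMAC-SHA256 routine. The sketch reads "+" in the figure as byte concatenation, which is an interpretation made here; the function name `join_hmac` and the helper `truncate_64` are hypothetical.

```python
import hashlib
import hmac


def join_hmac(local_key: bytes, remote_key: bytes,
              local_nonce: bytes, remote_nonce: bytes) -> bytes:
    """HMAC(Key=(local_key+remote_key), Msg=(local_nonce+remote_nonce)),
    computed with SHA-256; "+" is taken to mean concatenation."""
    return hmac.new(local_key + remote_key,
                    local_nonce + remote_nonce,
                    hashlib.sha256).digest()


def truncate_64(mac: bytes) -> bytes:
    """Leftmost 64 bits, as carried in the SYN/ACK's MP_JOIN option."""
    return mac[:8]
```

Host A would compute `join_hmac(key_a, key_b, r_a, r_b)` and Host B `join_hmac(key_b, key_a, r_b, r_a)`; because the key and message orderings differ, the two values differ, and each host can recompute the other's value to verify it (ideally using a constant-time comparison such as `hmac.compare_digest`).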
If the token received at Host B is unknown or local policy prohibits If the token received at Host B is unknown or local policy prohibits
the acceptance of the new subflow, the recipient MUST respond with a the acceptance of the new subflow, the recipient MUST respond with a
TCP RST for the subflow, with a MP_TCPRST option (Section 3.6) with TCP RST for the subflow. If appropriate, a MP_TCPRST option with a
an appropriate reason code. "Administratively prohibited" reason code (Section 3.6) should be
included.
If the token is accepted at Host B, but the HMAC returned to Host A If the token is accepted at Host B, but the HMAC returned to Host A
does not match the one expected, Host A MUST close the subflow with a does not match the one expected, Host A MUST close the subflow with a
TCP RST. In this, and all following cases of sending a RST in this TCP RST. In this, and all following cases of sending a RST in this
section, the sender SHOULD send a MP_TCPRST option (Section 3.6) on section, the sender SHOULD send a MP_TCPRST option (Section 3.6) on
this RST packet with the reason code for a "MPTCP specific error". this RST packet with the reason code for a "MPTCP specific error".
If Host B does not receive the expected HMAC, or the MP_JOIN option If Host B does not receive the expected HMAC, or the MP_JOIN option
is missing from the ACK, it MUST close the subflow with a TCP RST is missing from the ACK, it MUST close the subflow with a TCP RST
with a MP_TCPRST (Section 3.6) option with the reason code for "MPTCP with a MP_TCPRST (Section 3.6) option with the reason code for "MPTCP
skipping to change at page 25, line 11 skipping to change at page 28, line 13
authenticated each other as being the same peers as existed at the authenticated each other as being the same peers as existed at the
start of the connection, and they have agreed of which connection start of the connection, and they have agreed of which connection
this subflow will become a part. this subflow will become a part.
If the SYN/ACK as received at Host A does not have an MP_JOIN option, If the SYN/ACK as received at Host A does not have an MP_JOIN option,
Host A MUST close the subflow with a TCP RST with a MP_TCPRST Host A MUST close the subflow with a TCP RST with a MP_TCPRST
(Section 3.6) option with the reason code for "MPTCP specific error". (Section 3.6) option with the reason code for "MPTCP specific error".
This covers all cases of the loss of an MP_JOIN. In more detail, if This covers all cases of the loss of an MP_JOIN. In more detail, if
MP_JOIN is stripped from the SYN on the path from A to B, and Host B MP_JOIN is stripped from the SYN on the path from A to B, and Host B
does not have a passive opener on the relevant port, it will respond does not have a listener on the relevant port, it will respond with a
with a RST in the normal way. If in response to a SYN with an RST in the normal way. If in response to a SYN with an MP_JOIN
MP_JOIN option, a SYN/ACK is received without the MP_JOIN option option, a SYN/ACK is received without the MP_JOIN option (either
(either since it was stripped on the return path, or it was stripped since it was stripped on the return path, or it was stripped on the
on the outgoing path but the passive opener on Host B responded as if outgoing path but Host B responded as if it were a new regular TCP
it were a new regular TCP session), then the subflow is unusable and session), then the subflow is unusable and Host A MUST close it with
Host A MUST close it with a RST. a RST.
Note that additional subflows can be created between any pair of Note that additional subflows can be created between any pair of
ports (but see Section 3.9 for heuristics); no explicit application- ports (but see Section 3.9 for heuristics); no explicit application-
level accept calls or bind calls are required to open additional level accept calls or bind calls are required to open additional
subflows. To associate a new subflow with an existing connection, subflows. To associate a new subflow with an existing connection,
the token supplied in the subflow's SYN exchange is used for the token supplied in the subflow's SYN exchange is used for
demultiplexing. This then binds the 5-tuple of the TCP subflow to demultiplexing. This then binds the 5-tuple of the TCP subflow to
the local token of the connection. A consequence is that it is the local token of the connection. A consequence is that it is
possible to allow any port pairs to be used for a connection. possible to allow any port pairs to be used for a connection.
skipping to change at page 26, line 28 skipping to change at page 29, line 31
Figure 9: Data Sequence Signal (DSS) Option Figure 9: Data Sequence Signal (DSS) Option
The flags, when set, define the contents of this option, as follows: The flags, when set, define the contents of this option, as follows:
o A = Data ACK present o A = Data ACK present
o a = Data ACK is 8 octets (if not set, Data ACK is 4 octets) o a = Data ACK is 8 octets (if not set, Data ACK is 4 octets)
o M = Data Sequence Number (DSN), Subflow Sequence Number (SSN), o M = Data Sequence Number (DSN), Subflow Sequence Number (SSN),
Data-Level Length, and Checksum present Data-Level Length, and Checksum (if negotiated) present
o m = Data sequence number is 8 octets (if not set, DSN is 4 octets) o m = Data sequence number is 8 octets (if not set, DSN is 4 octets)
The flags 'a' and 'm' only have meaning if the corresponding 'A' or The flags 'a' and 'm' only have meaning if the corresponding 'A' or
'M' flags are set; otherwise, they will be ignored. The maximum 'M' flags are set; otherwise, they will be ignored. The maximum
length of this option, with all flags set, is 28 octets. length of this option, with all flags set, is 28 octets.
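As an illustration of how the flags determine the size of the option, the following minimal sketch computes the DSS option length from a flag octet. The bit positions assigned to the flag constants and the function name dss_option_length are assumptions made for this example (they follow the layout of Figure 9, which is not reproduced here); the field sizes are those listed above.

```python
# Assumed bit positions for the DSS flags (Figure 9 is not reproduced here).
DSS_FLAG_A = 0x01  # Data ACK present
DSS_FLAG_a = 0x02  # Data ACK is 8 octets (otherwise 4)
DSS_FLAG_M = 0x04  # DSN, SSN, Data-Level Length (and Checksum) present
DSS_FLAG_m = 0x08  # DSN is 8 octets (otherwise 4)
DSS_FLAG_F = 0x10  # DATA_FIN


def dss_option_length(flags: int, checksum_negotiated: bool) -> int:
    """Return the DSS option length in octets for the given flags."""
    length = 4  # Kind, Length, Subtype plus reserved bits, flags
    if flags & DSS_FLAG_A:
        # 'a' only has meaning when 'A' is set.
        length += 8 if flags & DSS_FLAG_a else 4
    if flags & DSS_FLAG_M:
        # 'm' only has meaning when 'M' is set.
        length += 8 if flags & DSS_FLAG_m else 4  # Data Sequence Number
        length += 4  # Subflow Sequence Number
        length += 2  # Data-Level Length
        if checksum_negotiated:
            length += 2  # Checksum
    return length
```

With A, a, M, and m all set and checksums negotiated, the function returns 28, matching the maximum length stated above.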
The 'F' flag indicates "DATA_FIN". If present, this means that this The 'F' flag indicates "DATA_FIN". If present, this means that this
mapping covers the final data from the sender. This is the mapping covers the final data from the sender. This is the
connection-level equivalent to the FIN flag in single-path TCP. A connection-level equivalent to the FIN flag in single-path TCP. A
connection is not closed unless there has been a DATA_FIN exchange or connection is not closed unless there has been a DATA_FIN exchange,
a timeout. The purpose of the DATA_FIN and the interactions between or an implementation-specific, connection-level timeout. The purpose
this flag, the subflow-level FIN flag, and the data sequence mapping of the DATA_FIN and the interactions between this flag, the subflow-
are described in Section 3.3.3. The remaining reserved bits MUST be level FIN flag, and the data sequence mapping are described in
set to zero by an implementation of this specification. Section 3.3.3. The remaining reserved bits MUST be set to zero by an
implementation of this specification.
Note that the checksum is only present in this option if the use of Note that the checksum is only present in this option if the use of
MPTCP checksumming has been negotiated at the MP_CAPABLE handshake MPTCP checksumming has been negotiated at the MP_CAPABLE handshake
(see Section 3.1). The presence of the checksum can be inferred from (see Section 3.1). The presence of the checksum can be inferred from
the length of the option. If a checksum is present, but its use had the length of the option. If a checksum is present, but its use had
not been negotiated in the MP_CAPABLE handshake, the checksum field not been negotiated in the MP_CAPABLE handshake, the checksum field
MUST be ignored. If a checksum is not present when its use has been MUST be ignored. If a checksum is not present when its use has been
negotiated, the receiver MUST close the subflow with a RST as it is negotiated, the receiver MUST close the subflow with a RST as it is
considered broken. This RST SHOULD be accompanied with a MP_TCPRST considered broken. This RST SHOULD be accompanied with a MP_TCPRST
option (Section 3.6) with the reason code for a "MPTCP specific option (Section 3.6) with the reason code for a "MPTCP specific
skipping to change at page 27, line 42 skipping to change at page 30, line 47
the data sequence number after the mapping has been processed. A the data sequence number after the mapping has been processed. A
sender MUST NOT change this mapping after it has been declared; sender MUST NOT change this mapping after it has been declared;
however, the same data sequence number can be mapped to by different however, the same data sequence number can be mapped to by different
subflows for retransmission purposes (see Section 3.3.6). This would subflows for retransmission purposes (see Section 3.3.6). This would
also permit the same data to be sent simultaneously on multiple also permit the same data to be sent simultaneously on multiple
subflows for resilience or efficiency purposes, especially in the subflows for resilience or efficiency purposes, especially in the
case of lossy links. Although the detailed specification of such case of lossy links. Although the detailed specification of such
operation is outside the scope of this document, an implementation operation is outside the scope of this document, an implementation
SHOULD treat the first data that is received at a subflow for the SHOULD treat the first data that is received at a subflow for the
data sequence space as that which should be delivered to the data sequence space as that which should be delivered to the
application, and any later data for that sequence space ignored. application, and any later data for that sequence space should be
ignored.
The data sequence number is specified as an absolute value, whereas The data sequence number is specified as an absolute value, whereas
the subflow sequence numbering is relative (the SYN at the start of the subflow sequence numbering is relative (the SYN at the start of
the subflow has relative subflow sequence number 0). This is to the subflow has relative subflow sequence number 0). This is to
allow middleboxes to change the initial sequence number of a subflow, allow middleboxes to change the initial sequence number of a subflow,
such as firewalls that undertake ISN randomization. such as firewalls that undertake Initial Sequence Number (ISN)
randomization.
The data sequence mapping also contains a checksum of the data that The data sequence mapping also contains a checksum of the data that
this mapping covers, if use of checksums has been negotiated at the this mapping covers, if use of checksums has been negotiated at the
MP_CAPABLE exchange. Checksums are used to detect if the payload has MP_CAPABLE exchange. Checksums are used to detect if the payload has
been adjusted in any way by a non-MPTCP-aware middlebox. If this been adjusted in any way by a non-MPTCP-aware middlebox. If this
checksum fails, it will trigger a failure of the subflow, or a checksum fails, it will trigger a failure of the subflow, or a
fallback to regular TCP, as documented in Section 3.7, since MPTCP fallback to regular TCP, as documented in Section 3.7, since MPTCP
can no longer reliably know the subflow sequence space at the can no longer reliably know the subflow sequence space at the
receiver to build data sequence mappings. receiver to build data sequence mappings.
skipping to change at page 29, line 21 skipping to change at page 32, line 29
not arrive within a receive window of data, that subflow SHOULD be not arrive within a receive window of data, that subflow SHOULD be
treated as broken, closed with a RST, and any unmapped data silently treated as broken, closed with a RST, and any unmapped data silently
discarded. discarded.
Data sequence numbers are always 64-bit quantities, and MUST be Data sequence numbers are always 64-bit quantities, and MUST be
maintained as such in implementations. If a connection is maintained as such in implementations. If a connection is
progressing at a slow rate, so protection against wrapped sequence progressing at a slow rate, so protection against wrapped sequence
numbers is not required, then an implementation MAY include just the numbers is not required, then an implementation MAY include just the
lower 32 bits of the data sequence number in the data sequence lower 32 bits of the data sequence number in the data sequence
mapping and/or Data ACK as an optimization, and an implementation can mapping and/or Data ACK as an optimization, and an implementation can
make this choice independently for each packet. An implementaton make this choice independently for each packet. An implementation
MUST be able to receive and process both 64-bit and 32-bit sequence MUST be able to receive and process both 64-bit and 32-bit sequence
number values, but it is not required that an implementation is able number values, but it is not required that an implementation is able
to send both. to send both.
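The "slow rate" condition can be sketched as a comparison between the time taken to send 2^32 octets and the Maximum Segment Lifetime. This is a non-normative illustration only: the function names and the default MSL of 120 seconds are assumptions made for the example, and a real implementation would likely apply a safety margin.

```python
def seconds_until_32bit_wrap(rate_bytes_per_sec: float) -> float:
    """Time needed to consume the whole 32-bit data sequence space."""
    return float(2 ** 32) / rate_bytes_per_sec


def may_send_32bit_dsn(rate_bytes_per_sec: float,
                       msl_seconds: float = 120.0) -> bool:
    """True if the lower 32 bits cannot wrap within one MSL.

    msl_seconds is an assumed value; it is not taken from this document.
    """
    if rate_bytes_per_sec <= 0:
        return True  # nothing is being sent, so nothing can wrap
    return seconds_until_32bit_wrap(rate_bytes_per_sec) > msl_seconds
```

At 1 MB/s the 32-bit space takes over an hour to wrap, so the short form is safe under this assumption; at 10 Gbit/s (1.25e9 octets per second) it wraps in under four seconds, so the full 64-bit value would be needed.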
An implementation MUST send the full 64-bit data sequence number if An implementation MUST send the full 64-bit data sequence number if
it is transmitting at a sufficiently high rate that the 32-bit value it is transmitting at a sufficiently high rate that the 32-bit value
could wrap within the Maximum Segment Lifetime (MSL) [RFC1323]. The could wrap within the Maximum Segment Lifetime (MSL) [RFC1323]. The
lengths of the DSNs used in these values (which may be different) are lengths of the DSNs used in these values (which may be different) are
declared with flags in the DSS option. Implementations MUST accept a declared with flags in the DSS option. Implementations MUST accept a
32-bit DSN and implicitly promote it to a 64-bit quantity by 32-bit DSN and implicitly promote it to a 64-bit quantity by
skipping to change at page 32, line 21 skipping to change at page 35, line 29
necessary to retransmit data on different subflows. Essentially, a necessary to retransmit data on different subflows. Essentially, a
host MUST NOT close all functioning subflows unless it is safe to do host MUST NOT close all functioning subflows unless it is safe to do
so, i.e., until all outstanding data has been DATA_ACKed, or until so, i.e., until all outstanding data has been DATA_ACKed, or until
the segment with the DATA_FIN flag set is the only outstanding the segment with the DATA_FIN flag set is the only outstanding
segment. segment.
Once a DATA_FIN has been acknowledged, all remaining subflows MUST be Once a DATA_FIN has been acknowledged, all remaining subflows MUST be
closed with standard FIN exchanges. Both hosts SHOULD send FINs on closed with standard FIN exchanges. Both hosts SHOULD send FINs on
all subflows, as a courtesy to allow middleboxes to clean up state all subflows, as a courtesy to allow middleboxes to clean up state
even if an individual subflow has failed. It is also encouraged to even if an individual subflow has failed. It is also encouraged to
reduce the timeouts (Maximum Segment Life) on subflows at end hosts. reduce the timeouts (Maximum Segment Lifetime) on subflows at end
In particular, any subflows where there is still outstanding data hosts after receiving a DATA_FIN. In particular, any subflows where
queued (which has been retransmitted on other subflows in order to there is still outstanding data queued (which has been retransmitted
get the DATA_FIN acknowledged) MAY be closed with a RST with on other subflows in order to get the DATA_FIN acknowledged) MAY be
MP_TCPRST (Section 3.6) error code for "too much outstanding data". closed with a RST with MP_TCPRST (Section 3.6) error code for "too
much outstanding data".
A connection is considered closed once both hosts' DATA_FINs have A connection is considered closed once both hosts' DATA_FINs have
been acknowledged by DATA_ACKs. been acknowledged by DATA_ACKs.
As specified above, a standard TCP FIN on an individual subflow only As specified above, a standard TCP FIN on an individual subflow only
shuts down the subflow on which it was sent. If all subflows have shuts down the subflow on which it was sent. If all subflows have
been closed with a FIN exchange, but no DATA_FIN has been received been closed with a FIN exchange, but no DATA_FIN has been received
and acknowledged, the MPTCP connection is treated as closed only and acknowledged, the MPTCP connection is treated as closed only
after a timeout. This implies that an implementation will have after a timeout. This implies that an implementation will have
TIME_WAIT states at both the subflow and connection levels (see TIME_WAIT states at both the subflow and connection levels (see
skipping to change at page 35, line 39 skipping to change at page 39, line 4
and will keep trying to retransmit the data on the failed subflow and will keep trying to retransmit the data on the failed subflow
too. The sender will declare the subflow failed after a predefined too. The sender will declare the subflow failed after a predefined
upper bound on retransmissions is reached (which MAY be lower than upper bound on retransmissions is reached (which MAY be lower than
the usual TCP limits of the Maximum Segment Life), or on the receipt the usual TCP limits of the Maximum Segment Life), or on the receipt
of an ICMP error, and only then delete the outstanding data segments. of an ICMP error, and only then delete the outstanding data segments.
Multiple retransmissions are triggers that will indicate that a Multiple retransmissions are triggers that will indicate that a
subflow performs badly and could lead to a host resetting the subflow subflow performs badly and could lead to a host resetting the subflow
with a RST. However, additional research is required to understand with a RST. However, additional research is required to understand
the heuristics of how and when to reset underperforming subflows. the heuristics of how and when to reset underperforming subflows.
For example, a highly asymmetric path may be misdiagnosed as For example, a highly asymmetric path may be misdiagnosed as
underperforming. A RST for this purpose SHOULD be accompanied with underperforming. A RST for this purpose SHOULD be accompanied with
an appropriate MP_TCPRST option (Section 3.6). an "Unacceptable performance" MP_TCPRST option (Section 3.6).
3.3.7. Congestion Control Considerations 3.3.7. Congestion Control Considerations
Different subflows in an MPTCP connection have different congestion Different subflows in an MPTCP connection have different congestion
windows. To achieve fairness at bottlenecks and resource pooling, it windows. To achieve fairness at bottlenecks and resource pooling, it
is necessary to couple the congestion windows in use on each subflow, is necessary to couple the congestion windows in use on each subflow,
in order to push most traffic to uncongested links. One algorithm in order to push most traffic to uncongested links. One algorithm
for achieving this is presented in [RFC6356]; the algorithm does not for achieving this is presented in [RFC6356]; the algorithm does not
achieve perfect resource pooling but is "safe" in that it is readily achieve perfect resource pooling but is "safe" in that it is readily
deployable in the current Internet. By this, we mean that it does deployable in the current Internet. By this, we mean that it does
skipping to change at page 37, line 15 skipping to change at page 40, line 29
subflow where the receiver has indicated B=1 SHOULD NOT be used to subflow where the receiver has indicated B=1 SHOULD NOT be used to
send data unless there are no usable subflows where B=0). send data unless there are no usable subflows where B=0).
In the event that the available set of paths changes, a host may wish In the event that the available set of paths changes, a host may wish
to signal a change in priority of subflows to the peer (e.g., a to signal a change in priority of subflows to the peer (e.g., a
subflow that was previously set as backup should now take priority subflow that was previously set as backup should now take priority
over all remaining subflows). Therefore, the MP_PRIO option, shown over all remaining subflows). Therefore, the MP_PRIO option, shown
in Figure 11, can be used to change the 'B' flag of the subflow on in Figure 11, can be used to change the 'B' flag of the subflow on
which it is sent. which it is sent.
Another use of the MP_PRIO option is to set the 'B' flag on a subflow
to cleanly retire its use before closing it and removing it with
REMOVE_ADDR (Section 3.4.2), for example to support make-before-break
session continuity.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+-----+-+ +---------------+---------------+-------+-----+-+
| Kind | Length |Subtype| |B| | Kind | Length |Subtype| |B|
+---------------+---------------+-------+-----+-+ +---------------+---------------+-------+-----+-+
Figure 11: Change Subflow Priority (MP_PRIO) Option Figure 11: Change Subflow Priority (MP_PRIO) Option
It should be noted that the backup flag is a request from a data It should be noted that the backup flag is a request from a data
receiver to a data sender only, and the data sender SHOULD adhere to receiver to a data sender only, and the data sender SHOULD adhere to
skipping to change at page 38, line 51 skipping to change at page 42, line 23
The Add Address (ADD_ADDR) MPTCP option announces additional The Add Address (ADD_ADDR) MPTCP option announces additional
addresses (and optionally, ports) on which a host can be reached addresses (and optionally, ports) on which a host can be reached
(Figure 12). This option can be used at any time during a (Figure 12). This option can be used at any time during a
connection, depending on when the sender wishes to enable multiple connection, depending on when the sender wishes to enable multiple
paths and/or when paths become available. As with all MPTCP signals, paths and/or when paths become available. As with all MPTCP signals,
the receiver MUST undertake standard TCP validity checks, e.g. the receiver MUST undertake standard TCP validity checks, e.g.
[RFC5961], before acting upon it. [RFC5961], before acting upon it.
Every address has an Address ID that can be used for uniquely Every address has an Address ID that can be used for uniquely
identifying the address within a connection for address removal. identifying the address within a connection for address removal. The
This is also used to identify MP_JOIN options (see Section 3.2) Address ID is also used to identify MP_JOIN options (see Section 3.2)
relating to the same address, even when address translators are in relating to the same address, even when address translators are in
use. The Address ID MUST uniquely identify the address to the sender use. The Address ID MUST uniquely identify the address for the
(within the scope of the connection), but the mechanism for sender of the option (within the scope of the connection), but the
allocating such IDs is implementation specific. mechanism for allocating such IDs is implementation specific.
All address IDs learned via either MP_JOIN or ADD_ADDR SHOULD be All address IDs learned via either MP_JOIN or ADD_ADDR SHOULD be
stored by the receiver in a data structure that gathers all the stored by the receiver in a data structure that gathers all the
Address ID to address mappings for a connection (identified by a Address ID to address mappings for a connection (identified by a
token pair). In this way, there is a stored mapping between Address token pair). In this way, there is a stored mapping between Address
ID, observed source address, and token pair for future processing of ID, observed source address, and token pair for future processing of
control information for a connection. Note that an implementation control information for a connection. Note that an implementation
MAY discard incoming address advertisements at will, for example, for MAY discard incoming address advertisements at will, for example, for
avoiding the required mapping state, or because advertised addresses avoiding updating mapping state, or because advertised addresses are
are of no use to it (for example, IPv6 addresses when it has IPv4 of no use to it (for example, IPv6 addresses when it has IPv4 only).
only). Therefore, a host MUST treat address advertisements as soft Therefore, a host MUST treat address advertisements as soft state,
state, and it MAY choose to refresh advertisements periodically. and it MAY choose to refresh advertisements periodically.
This option is shown in Figure 12. The illustration is sized for This option is shown in Figure 12. The illustration is sized for
IPv4 addresses. For IPv6, the length of the address will be 16 IPv4 addresses. For IPv6, the length of the address will be 16
octets (instead of 4). octets (instead of 4).
The 2 octets that specify the TCP port number to use are optional and The 2 octets that specify the TCP port number to use are optional and
their presence can be inferred from the length of the option. their presence can be inferred from the length of the option.
Although it is expected that the majority of use cases will use the Although it is expected that the majority of use cases will use the
same port pairs as used for the initial subflow (e.g., port 80 same port pairs as used for the initial subflow (e.g., port 80
remains port 80 on all subflows, as does the ephemeral port at the remains port 80 on all subflows, as does the ephemeral port at the
skipping to change at page 40, line 4 skipping to change at page 43, line 25
implemented as in [RFC6234]. In the same way as for MP_JOIN, the key implemented as in [RFC6234]. In the same way as for MP_JOIN, the key
for the HMAC algorithm, in the case of the message transmitted by for the HMAC algorithm, in the case of the message transmitted by
Host A, will be Key-A followed by Key-B, and in the case of Host B, Host A, will be Key-A followed by Key-B, and in the case of Host B,
Key-B followed by Key-A. These are the keys that were exchanged in Key-B followed by Key-A. These are the keys that were exchanged in
the original MP_CAPABLE handshake. The message for the HMAC is the the original MP_CAPABLE handshake. The message for the HMAC is the
Address ID, IP Address, and Port which precede the HMAC in the Address ID, IP Address, and Port which precede the HMAC in the
ADD_ADDR option. If the port is not present in the ADD_ADDR option, ADD_ADDR option. If the port is not present in the ADD_ADDR option,
the HMAC message will nevertheless include two octets of value zero. the HMAC message will nevertheless include two octets of value zero.
The rationale for the HMAC is to prevent unauthorized entities from The rationale for the HMAC is to prevent unauthorized entities from
injecting ADD_ADDR signals in an attempt to hijack a connection. injecting ADD_ADDR signals in an attempt to hijack a connection.
Note that additionally the presence of this HMAC prevents the address Note that additionally the presence of this HMAC prevents the address
being changed in flight unless the key is known by an intermediary. being changed in flight unless the key is known by an intermediary.
If a host receives an ADD_ADDR option for which it cannot validate If a host receives an ADD_ADDR option for which it cannot validate
the HMAC, it SHOULD silently ignore the option. the HMAC, it SHOULD silently ignore the option.
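The construction of the HMAC input described above can be sketched as follows. This is an illustrative example under stated assumptions: SHA-256 is assumed as the hash function, the function names add_addr_hmac and add_addr_hmac_valid are hypothetical, and any truncation of the digest to fit the option field is not shown. The key ordering (sender's key followed by the peer's key) and the two zero octets for an absent port follow the text above.

```python
import hashlib
import hmac
import ipaddress
import struct
from typing import Optional


def add_addr_hmac(sender_key: bytes, peer_key: bytes, address_id: int,
                  address: str, port: Optional[int] = None) -> bytes:
    """Compute the full HMAC over Address ID, IP Address, and Port."""
    ip_octets = ipaddress.ip_address(address).packed  # 4 or 16 octets
    # An absent port is represented by two octets of value zero.
    port_octets = struct.pack("!H", port if port is not None else 0)
    message = struct.pack("!B", address_id) + ip_octets + port_octets
    # Key is the sender's key followed by the peer's key.
    return hmac.new(sender_key + peer_key, message, hashlib.sha256).digest()


def add_addr_hmac_valid(received: bytes, sender_key: bytes, peer_key: bytes,
                        address_id: int, address: str,
                        port: Optional[int] = None) -> bool:
    """Constant-time comparison against the locally computed value."""
    expected = add_addr_hmac(sender_key, peer_key, address_id, address, port)
    return hmac.compare_digest(received, expected)
```

A receiver for which add_addr_hmac_valid returns False would silently ignore the option, as described above.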
A set of four flags are present after the subtype and before the A set of four flags are present after the subtype and before the
Address ID. Only the rightmost bit - labelled 'E' - is assinged Address ID. Only the rightmost bit - labelled 'E' - is assigned
today. The other bits are currently unassigned and MUST be set to today. The other bits are currently unassigned and MUST be set to
zero by a sender and MUST be ignored by the receiver. zero by a sender and MUST be ignored by the receiver.
The 'E' bit exists to provide reliability for this option. Because The 'E' flag exists to provide reliability for this option. Because
this option will often be sent on pure ACKs, there is no guarantee of this option will often be sent on pure ACKs, there is no guarantee of
reliability. Therefore, a receiver receiving a fresh ADD_ADDR option reliability. Therefore, a receiver receiving a fresh ADD_ADDR option
(where E=0), will send the same option back to the sender, but not (where E=0), will send the same option back to the sender, but not
including the HMAC, and with E=1. The lack of this echo can be used including the HMAC, and with E=1. The lack of this echo can be used
by the initial ADD_ADDR sender to retransmit the ADD_ADDR according by the initial ADD_ADDR sender to retransmit the ADD_ADDR according
to local policy. to local policy.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+-------+---------------+ +---------------+---------------+-------+-------+---------------+
skipping to change at page 40, line 48 skipping to change at page 44, line 30
Due to the proliferation of NATs, it is reasonably likely that one Due to the proliferation of NATs, it is reasonably likely that one
host may attempt to advertise private addresses [RFC1918]. It is not host may attempt to advertise private addresses [RFC1918]. It is not
desirable to prohibit this, since there may be cases where both hosts desirable to prohibit this, since there may be cases where both hosts
have additional interfaces on the same private network, and a host have additional interfaces on the same private network, and a host
MAY want to advertise such addresses. The MP_JOIN handshake to MAY want to advertise such addresses. The MP_JOIN handshake to
create a new subflow (Section 3.2) provides mechanisms to minimize create a new subflow (Section 3.2) provides mechanisms to minimize
security risks. The MP_JOIN message contains a 32-bit token that security risks. The MP_JOIN message contains a 32-bit token that
uniquely identifies the connection to the receiving host. If the uniquely identifies the connection to the receiving host. If the
token is unknown, the host will return with a RST. In the unlikely token is unknown, the host will return with a RST. In the unlikely
event that the token is known, subflow setup will continue, but the event that the token is valid at the receiving host, subflow setup
HMAC exchange must occur for authentication. This will fail, and will continue, but the HMAC exchange must occur for authentication.
will provide sufficient protection against two unconnected hosts This will fail, and will provide sufficient protection against two
accidentally setting up a new subflow upon the signal of a private unconnected hosts accidentally setting up a new subflow upon the
address. Further security considerations around the issue of signal of a private address. Further security considerations around
ADD_ADDR messages that accidentally misdirect, or maliciously direct, the issue of ADD_ADDR messages that accidentally misdirect, or
new MP_JOIN attempts are discussed in Section 5. maliciously direct, new MP_JOIN attempts are discussed in Section 5.
Ideally, ADD_ADDR and REMOVE_ADDR options would be sent reliably, and Ideally, ADD_ADDR and REMOVE_ADDR options would be sent reliably, and
in order, to the other end. This would ensure that this address in order, to the other end. This would ensure that this address
management does not unnecessarily cause an outage in the connection management does not unnecessarily cause an outage in the connection
when remove/add addresses are processed in reverse order, and also to when remove/add addresses are processed in reverse order, and also to
ensure that all possible paths are used. Note, however, that losing ensure that all possible paths are used. Note, however, that losing
reliability and ordering will not break the multipath connections, it reliability and ordering will not break the multipath connections, it
will just reduce the opportunity to open multipath paths and to will just reduce the opportunity to open multipath paths and to
survive different patterns of path failures. survive different patterns of path failures.
Therefore, implementing reliability signals for these MPTCP options Therefore, implementing reliability signals for these MPTCP options
is not necessary. In order to minimize the impact of the loss of is not necessary. In order to minimize the impact of the loss of
these options, however, it is RECOMMENDED that a sender should send these options, however, it is RECOMMENDED that a sender should send
these options on all available subflows. If these options need to be these options on all available subflows. If these options need to be
received in order, an implementation SHOULD only send one ADD_ADDR/ received in order, an implementation SHOULD only send one ADD_ADDR/
REMOVE_ADDR option per RTT, to minimize the risk of misordering. REMOVE_ADDR option per RTT, to minimize the risk of misordering.
A host can send an ADD_ADDR message with an already assigned Address
ID, but the Address MUST be the same as previously assigned to this
Address ID, and the Port MUST be different from one already in use
for this Address ID. If these conditions are not met, the receiver
SHOULD silently ignore the ADD_ADDR. A host wishing to replace an
existing Address ID MUST first remove the existing one
(Section 3.4.2).
A host that receives an ADD_ADDR but finds a connection set up to A host that receives an ADD_ADDR but finds a connection set up to
that IP address and port number is unsuccessful SHOULD NOT perform that IP address and port number is unsuccessful SHOULD NOT perform
further connection attempts to this address/port combination for this further connection attempts to this address/port combination for this
connection. A sender that wants to trigger a new incoming connection connection. A sender that wants to trigger a new incoming connection
attempt on a previously advertised address/port combination can attempt on a previously advertised address/port combination can
therefore refresh ADD_ADDR information by sending the option again. therefore refresh ADD_ADDR information by sending the option again.
A host can therefore send an ADD_ADDR message with an already
assigned Address ID, but the Address MUST be the same as previously
assigned to this Address ID. A new ADD_ADDR may have the same, or
different, port number. If the port number is different, the
receiving host SHOULD try to set up a new subflow to this new
address/port combination.
A host wishing to replace an existing Address ID MUST first remove
the existing one (Section 3.4.2).
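One possible way for a receiver to apply the Address ID rules of the newer text is sketched below. The data structure, the function name handle_add_addr, and the returned action strings are all hypothetical. Ignoring an ADD_ADDR whose address conflicts with the stored one is an assumption carried over from the earlier wording, since the newer text does not state the receiver's behavior in that case.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class AddressEntry:
    address: str
    ports: Set[Optional[int]] = field(default_factory=set)


def handle_add_addr(table: Dict[int, AddressEntry], address_id: int,
                    address: str, port: Optional[int]) -> str:
    """Update the per-connection Address ID table and return an action."""
    entry = table.get(address_id)
    if entry is None:
        table[address_id] = AddressEntry(address, {port})
        return "new"  # first advertisement for this Address ID
    if entry.address != address:
        # The address must match the one already assigned to this ID;
        # replacement requires a REMOVE_ADDR first (assumed: ignore).
        return "ignore"
    if port in entry.ports:
        return "refresh"  # same address and port advertised again
    entry.ports.add(port)
    return "new-port"  # same address, different port: try a new subflow
```

A caller might react to "new" and "new-port" by attempting a subflow to the advertised address/port combination, and to "refresh" by retrying a previously unsuccessful attempt.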
During normal MPTCP operation, it is unlikely that there will be During normal MPTCP operation, it is unlikely that there will be
sufficient TCP option space for ADD_ADDR to be included along with sufficient TCP option space for ADD_ADDR to be included along with
those for data sequence numbering (Section 3.3.1). Therefore, it is those for data sequence numbering (Section 3.3.1). Therefore, it is
expected that an MPTCP implementation will send the ADD_ADDR option expected that an MPTCP implementation will send the ADD_ADDR option
on separate ACKs. As discussed earlier, however, an MPTCP on separate ACKs. As discussed earlier, however, an MPTCP
implementation MUST NOT treat duplicate ACKs with any MPTCP option, implementation MUST NOT treat duplicate ACKs with any MPTCP option,
with the exception of the DSS option, as indications of congestion with the exception of the DSS option, as indications of congestion
[RFC5681], and an MPTCP implementation SHOULD NOT send more than two [RFC5681], and an MPTCP implementation SHOULD NOT send more than two
duplicate ACKs in a row for signaling purposes. duplicate ACKs in a row for signaling purposes.
3.4.2. Remove Address 3.4.2. Remove Address
If, during the lifetime of an MPTCP connection, a previously If, during the lifetime of an MPTCP connection, a previously
announced address becomes invalid (e.g., if the interface announced address becomes invalid (e.g., if the interface
disappears), the affected host SHOULD announce this so that the peer disappears), the affected host SHOULD announce this so that the peer
can remove subflows related to this address. can remove subflows related to this address. A host MAY also choose
to announce that a valid IP address should not be used any longer,
for example for make-before-break session continuity.
This is achieved through the Remove Address (REMOVE_ADDR) option This is achieved through the Remove Address (REMOVE_ADDR) option
(Figure 13), which will remove a previously added address (or list of (Figure 13), which will remove a previously added address (or list of
addresses) from a connection and terminate any subflows currently addresses) from a connection and terminate any subflows currently
using that address. using that address.
For security purposes, if a host receives a REMOVE_ADDR option, it For security purposes, if a host receives a REMOVE_ADDR option, it
must ensure the affected path(s) are no longer in use before it must ensure the affected path(s) are no longer in use before it
instigates closure. The receipt of REMOVE_ADDR SHOULD first trigger instigates closure. The receipt of REMOVE_ADDR SHOULD first trigger
the sending of a TCP keepalive [RFC1122] on the path, and if a the sending of a TCP keepalive [RFC1122] on the path, and if a
response is received the path SHOULD NOT be removed. Typical TCP response is received the path SHOULD NOT be removed. If the path is
validity tests on the subflow (e.g., ensuring sequence and ACK found to still be alive, the receiving host SHOULD no longer use the
numbers are correct) MUST also be undertaken. An implementation can specified address for future connections, but it is the
use indications of these test failures as part of intrusion detection responsibility of the host which sent the REMOVE_ADDR to shut down
or error logging. the subflow. The requesting host MAY also use MP_PRIO
(Section 3.3.8) to request that a path no longer be used, before removal.
Typical TCP validity tests on the subflow (e.g., ensuring sequence
and ACK numbers are correct) MUST also be undertaken. An
implementation can use indications of these test failures as part of
intrusion detection or error logging.
The sending and receipt (if no keepalive response was received) of The sending and receipt (if no keepalive response was received) of
this message SHOULD trigger the sending of RSTs by both hosts on the this message SHOULD trigger the sending of RSTs by both hosts on the
affected subflow(s) (if possible), as a courtesy to cleaning up affected subflow(s) (if possible), as a courtesy to cleaning up
middlebox state, before cleaning up any local state. middlebox state, before cleaning up any local state.
Address removal is undertaken by ID, so as to permit the use of NATs Address removal is undertaken by ID, so as to permit the use of NATs
and other middleboxes that rewrite source addresses. If there is no and other middleboxes that rewrite source addresses. If there is no
address at the requested ID, the receiver will silently ignore the address at the requested ID, the receiver will silently ignore the
request. request.
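The receive-side handling of REMOVE_ADDR described above can be
sketched as follows. This is a minimal sketch, not a definitive
implementation: the function name, the shape of the subflow table, and
the keepalive_answered callback (standing in for a real TCP keepalive
probe) are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def handle_remove_addr(
    addr_id: int,
    subflows_by_addr_id: Dict[int, List[str]],
    keepalive_answered: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Return (subflows to reset, subflows left for the peer to shut down)."""
    subflows = subflows_by_addr_id.get(addr_id)
    if subflows is None:
        # No address at the requested ID: silently ignore the request.
        return [], []
    to_reset: List[str] = []
    still_alive: List[str] = []
    for subflow in subflows:
        if keepalive_answered(subflow):
            # A keepalive response arrived, so the path SHOULD NOT be
            # removed by the receiver of REMOVE_ADDR.
            still_alive.append(subflow)
        else:
            # No response: send a RST as a courtesy, then clean up.
            to_reset.append(subflow)
    return to_reset, still_alive
```

Subflows in the second list stay open until the host that sent the
REMOVE_ADDR shuts them down itself.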
skipping to change at page 43, line 51 skipping to change at page 47, line 33
option on one subflow, containing the key of Host B as declared in option on one subflow, containing the key of Host B as declared in
the initial connection handshake. On all the other subflows, Host the initial connection handshake. On all the other subflows, Host
A sends a regular TCP RST to close these subflows, and tears them A sends a regular TCP RST to close these subflows, and tears them
down. Host A now enters FASTCLOSE_WAIT state. down. Host A now enters FASTCLOSE_WAIT state.
o Option R (RST) : Host A sends a RST containing the MP_FASTCLOSE o Option R (RST) : Host A sends a RST containing the MP_FASTCLOSE
option on all subflows, containing the key of Host B as declared option on all subflows, containing the key of Host B as declared
in the initial connection handshake. Host A can tear the subflows in the initial connection handshake. Host A can tear the subflows
and the connection down immediately. and the connection down immediately.
If a host receives a packet with a valid MP_FASTCLOSE option, it If Host A decides to force the closure by using Option A and sending
shall process it as follows : an ACK with the MP_FASTCLOSE option, the connection shall proceed as
follows:
o Upon receipt of an ACK with MP_FASTCLOSE, containing the valid o Upon receipt of an ACK with MP_FASTCLOSE by Host B, containing the
key, Host B answers on the same subflow with a TCP RST and tears valid key, Host B answers on the same subflow with a TCP RST and
down all subflows. Host B can now close the whole MPTCP tears down all subflows, also by sending TCP RSTs.
connection (it transitions directly to CLOSED state). Host B can now close the whole MPTCP connection (it transitions
directly to CLOSED state).
o As soon as Host A has received the TCP RST on the remaining o As soon as Host A has received the TCP RST on the remaining
subflow, it can close this subflow and tear down the whole subflow, it can close this subflow and tear down the whole
connection (transition from FASTCLOSE_WAIT to CLOSED states). If connection (transition from FASTCLOSE_WAIT to CLOSED states). If
Host A receives an MP_FASTCLOSE instead of a TCP RST, both hosts Host A receives an MP_FASTCLOSE instead of a TCP RST, both hosts
attempted fast closure simultaneously. Host A should reply with a attempted fast closure simultaneously. Host A should reply with a
TCP RST and tear down the connection. TCP RST and tear down the connection.
o If Host A does not receive a TCP RST in reply to its MP_FASTCLOSE o If Host A does not receive a TCP RST in reply to its MP_FASTCLOSE
after one retransmission timeout (RTO) (the RTO of the subflow after one retransmission timeout (RTO) (the RTO of the subflow
where the MP_FASTCLOSE has been sent), it SHOULD retransmit the where the MP_FASTCLOSE has been sent), it SHOULD retransmit the
MP_FASTCLOSE. The number of retransmissions SHOULD be limited to MP_FASTCLOSE. The number of retransmissions SHOULD be limited to
prevent this connection from being retained for a long time, but prevent this connection from being retained for a long time, but
this limit is implementation specific. A RECOMMENDED number is 3. this limit is implementation specific. A RECOMMENDED number is 3.
If no TCP RST is received in response, Host A SHOULD send a TCP If no TCP RST is received in response, Host A SHOULD send a TCP
RST with the MP_FASTCLOSE option itself when it releases state in RST with the MP_FASTCLOSE option itself when it releases state in
order to clear any remaining state at middleboxes. order to clear any remaining state at middleboxes.
o Upon receipt of a RST with MP_FASTCLOSE, containing the valid key, If, however, Host A decides to force the closure by using Option R and
Host B tears down all subflows. Host B can now close the whole sending a RST with the MP_FASTCLOSE option, Host B will act as
MPTCP connection (it transitions directly to CLOSED state). follows: Upon receipt of a RST with MP_FASTCLOSE, containing the
valid key, Host B tears down all subflows by sending a TCP RST. Host
B can now close the whole MPTCP connection (it transitions directly
to CLOSED state).
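The Option A exchange, seen from Host A, can be sketched as a small
state machine. This is an illustrative sketch under stated
assumptions: the class, its method names, and the string labels for
segments are hypothetical, and the "sent" list merely records what a
real stack would transmit. The limit of 3 retransmissions is the
RECOMMENDED value from the text.

```python
from enum import Enum
from typing import List


class State(Enum):
    ESTABLISHED = "ESTABLISHED"
    FASTCLOSE_WAIT = "FASTCLOSE_WAIT"
    CLOSED = "CLOSED"


MAX_FASTCLOSE_RETRANSMITS = 3  # RECOMMENDED number from the text


class FastCloseInitiator:
    """Hypothetical model of Host A using Option A (ACK + MP_FASTCLOSE)."""

    def __init__(self) -> None:
        self.state = State.ESTABLISHED
        self.retransmits = 0
        self.sent: List[str] = []  # log of segments a real stack would send

    def start(self) -> None:
        self.sent.append("ACK+MP_FASTCLOSE")
        self.state = State.FASTCLOSE_WAIT

    def on_segment(self, kind: str) -> None:
        if self.state is not State.FASTCLOSE_WAIT:
            return
        if kind == "RST":
            # Host B answered: close the subflow and the connection.
            self.state = State.CLOSED
        elif kind == "MP_FASTCLOSE":
            # Simultaneous fast close: reply with a RST and tear down.
            self.sent.append("RST")
            self.state = State.CLOSED

    def on_rto(self) -> None:
        if self.state is not State.FASTCLOSE_WAIT:
            return
        if self.retransmits < MAX_FASTCLOSE_RETRANSMITS:
            self.retransmits += 1
            self.sent.append("ACK+MP_FASTCLOSE")
        else:
            # Give up: clear middlebox state while releasing local state.
            self.sent.append("RST+MP_FASTCLOSE")
            self.state = State.CLOSED
```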
3.6. Subflow Reset 3.6. Subflow Reset
As discussed in Section 3.5 above, the MP_FASTCLOSE option provides a An implementation of MPTCP may also need to send a regular TCP RST to
connection-level reset roughly analagous to a TCP RST. Regular TCP force the closure of a subflow. A host sends a TCP RST in order to
RST options remain used to at the subflow-level to indicate the close a subflow or reject an attempt to open a subflow (MP_JOIN). In
receiving host has no knowledge of the MPTCP subflow or TCP order to inform the receiving host why a subflow is being closed or
connection to which the packet belongs. rejected, the TCP RST packet MAY include the MP_TCPRST Option. The
host MAY use this information to decide, for example, whether it
However, in MPTCP, there may be many reasons for rejecting the tries to re-establish the subflow immediately, later, or never.
opening of a subflow, but these semantics cannot be carried in a
standard TCP RST. It would be beneficial for a host to the reasons
why its subflow has been closed with a RST, and thus whether it
should try to re-establish the subflow immediately, later, or never
again. These semantics are carried in the MP_TCPRST option that can
be included on a TCP RST packet.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+-----------------------+ +---------------+---------------+-------+-----------------------+
| Kind | Length |Subtype|U|V|W|T| Reason | | Kind | Length |Subtype|U|V|W|T| Reason |
+---------------+---------------+-------+-----------------------+ +---------------+---------------+-------+-----------------------+
Figure 15: TCP RST Reason (MP_TCPRST) Option Figure 15: TCP RST Reason (MP_TCPRST) Option
The MP_TCPRST option contains a reason code that allows the sender of The MP_TCPRST option contains a reason code that allows the sender of
skipping to change at page 45, line 32 skipping to change at page 49, line 8
condition that is reported is Transient (T bit set to 1) or Permanent condition that is reported is Transient (T bit set to 1) or Permanent
(T bit set to 0). If the error condition is considered to be (T bit set to 0). If the error condition is considered to be
Transient by the sender of the RST segment, the recipient of this Transient by the sender of the RST segment, the recipient of this
segment MAY try to reestablish a subflow for this connection over the segment MAY try to reestablish a subflow for this connection over the
failed path. The time at which a receiver may try to re-establish failed path. The time at which a receiver may try to re-establish
this is implementation-specific, but SHOULD take into account the this is implementation-specific, but SHOULD take into account the
properties of the failure defined by the following reason code. If properties of the failure defined by the following reason code. If
the error condition is considered to be permanent, the receiver of the error condition is considered to be permanent, the receiver of
the RST segment SHOULD NOT try to reestablish a subflow for this the RST segment SHOULD NOT try to reestablish a subflow for this
connection over this path. The "U", "V" and "W" flags are not connection over this path. The "U", "V" and "W" flags are not
defined by this specification and are reserved for future use. defined by this specification and are reserved for future use. An
implementation of this specification MUST set these flags to 0, and a
receiver MUST ignore them.
The "Reason" code is an 8-bit field that indicates the reason for the The "Reason" code is an 8-bit field that indicates the reason for the
termination of the subflow. The following codes are defined in this termination of the subflow. The following codes are defined in this
document: document:
o Unspecified error (code 0x0). This is the default error implying o Unspecified error (code 0x0). This is the default error implying
the subflow is not longer available. The receiving host SHOULD the subflow is no longer available. The presence of this option
take account of the 'T' bit in deciding whether to re-estbalish shows that the RST was generated by a MPTCP-aware device.
this subflow. The presence of this option shows that the RST was
generated by a MPTCP-aware device.
o MPTCP specific error (code 0x01). An error has been detected in o MPTCP specific error (code 0x01). An error has been detected in
the processing of MPTCP options. This is the usual reason code to the processing of MPTCP options. This is the usual reason code to
return in the cases where a RST is being sent to close a subflow return in the cases where a RST is being sent to close a subflow
for reasons of an invalid response. for reasons of an invalid response.
o Lack of resources (code 0x02). This code indicates that the o Lack of resources (code 0x02). This code indicates that the
sending host does not have enough ressources to support the sending host does not have enough resources to support the
terminated subflow. terminated subflow.
o Administratively prohibited (code 0x03). This code indicates that o Administratively prohibited (code 0x03). This code indicates that
the requested subflow is prohibited by the policies of the sending the requested subflow is prohibited by the policies of the sending
host. host.
o Too much outstanding data (code 0x04). This code indicates that o Too much outstanding data (code 0x04). This code indicates that
there is an excessive amount of data that need to be transmitted there is an excessive amount of data that need to be transmitted
over the terminated subflow while having already been acknowledged over the terminated subflow while having already been acknowledged
over one or more other subflows. This may occur if a path has over one or more other subflows. This may occur if a path has
skipping to change at page 46, line 31 skipping to change at page 50, line 12
been detected over this subflow making MPTCP signaling invalid. been detected over this subflow making MPTCP signaling invalid.
For example, this may be sent if the checksum does not validate. For example, this may be sent if the checksum does not validate.
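The MP_TCPRST layout in Figure 15 can be encoded and decoded as
sketched below. Several values are assumptions not stated in this
section: the option Kind (30, as assigned to MPTCP by RFC 6824), the
MP_TCPRST subtype value (0x8), and the Length of 4, which is inferred
from the 32-bit figure. Function names are hypothetical, and only the
reason codes whose values appear in the text above are listed.

```python
import struct
from typing import Tuple

TCPOPT_MPTCP = 30        # assumption: MPTCP option Kind (RFC 6824)
MP_TCPRST_SUBTYPE = 0x8  # assumption: subtype is not given in this section
MP_TCPRST_LENGTH = 4     # inferred from the 32-bit layout in Figure 15

REASONS = {
    0x00: "Unspecified error",
    0x01: "MPTCP specific error",
    0x02: "Lack of resources",
    0x03: "Administratively prohibited",
    0x04: "Too much outstanding data",
}


def build_mp_tcprst(reason: int, transient: bool) -> bytes:
    # U, V and W MUST be sent as 0; T is the lowest bit of the flag nibble.
    flags = 0x1 if transient else 0x0
    return struct.pack("!BBBB", TCPOPT_MPTCP, MP_TCPRST_LENGTH,
                       (MP_TCPRST_SUBTYPE << 4) | flags, reason)


def parse_mp_tcprst(option: bytes) -> Tuple[bool, int]:
    """Return (transient, reason); U, V and W are ignored on receipt."""
    if len(option) != MP_TCPRST_LENGTH:
        raise ValueError("unexpected option length")
    kind, length, sub_flags, reason = struct.unpack("!BBBB", option)
    if (kind != TCPOPT_MPTCP or length != MP_TCPRST_LENGTH
            or sub_flags >> 4 != MP_TCPRST_SUBTYPE):
        raise ValueError("not an MP_TCPRST option")
    return bool(sub_flags & 0x1), reason
```

A receiver could use the returned T bit to decide whether a retry over
the same path is worthwhile, and the reason code to decide when.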
3.7. Fallback 3.7. Fallback
Sometimes, middleboxes will exist on a path that could prevent the Sometimes, middleboxes will exist on a path that could prevent the
operation of MPTCP. MPTCP has been designed in order to cope with operation of MPTCP. MPTCP has been designed in order to cope with
many middlebox modifications (see Section 6), but there are still many middlebox modifications (see Section 6), but there are still
some cases where a subflow could fail to operate within the MPTCP some cases where a subflow could fail to operate within the MPTCP
requirements. These cases are notably the following: the loss of requirements. These cases are notably the following: the loss of
MPTCP options on a path and the modification of payload data. If MPTCP options on a path, and the modification of payload data. If
such an event occurs, it is necessary to "fall back" to the previous, such an event occurs, it is necessary to "fall back" to the previous,
safe operation. This may be either falling back to regular TCP or safe operation. This may be either falling back to regular TCP or
removing a problematic subflow. removing a problematic subflow.
At the start of an MPTCP connection (i.e., the first subflow), it is At the start of an MPTCP connection (i.e., the first subflow), it is
important to ensure that the path is fully MPTCP capable and the important to ensure that the path is fully MPTCP capable and the
necessary MPTCP options can reach each host. The handshake as necessary MPTCP options can reach each host. The handshake as
described in Section 3.1 SHOULD fall back to regular TCP if either of described in Section 3.1 SHOULD fall back to regular TCP if either of
the SYN messages do not have the MPTCP options: this is the same, and the SYN messages do not have the MPTCP options: this is the same, and
desired, behavior in the case where a host is not MPTCP capable, or desired, behavior in the case where a host is not MPTCP capable, or
skipping to change at page 47, line 16 skipping to change at page 50, line 45
every segment until one of the sent segments has been acknowledged every segment until one of the sent segments has been acknowledged
with a DSS option containing a Data ACK. Upon reception of the with a DSS option containing a Data ACK. Upon reception of the
acknowledgment, the sender has the confirmation that the DSS option acknowledgment, the sender has the confirmation that the DSS option
passes in both directions and may choose to send fewer DSS options passes in both directions and may choose to send fewer DSS options
than once per segment. than once per segment.
If, however, an ACK is received for data (not just for the SYN) If, however, an ACK is received for data (not just for the SYN)
without a DSS option containing a Data ACK, the sender determines the without a DSS option containing a Data ACK, the sender determines the
path is not MPTCP capable. In the case of this occurring on an path is not MPTCP capable. In the case of this occurring on an
additional subflow (i.e., one started with MP_JOIN), the host MUST additional subflow (i.e., one started with MP_JOIN), the host MUST
close the subflow with a RST. In the case of the first subflow close the subflow with a RST, which SHOULD contain a MP_TCPRST option
(i.e., that started with MP_CAPABLE), it MUST drop out of an MPTCP (Section 3.6) with a "Middlebox interference" reason code.
mode back to regular TCP. The sender will send one final data
sequence mapping, with the Data-Level Length value of 0 indicating an
infinite mapping (in case the path drops options in one direction
only), and then revert to sending data on the single subflow without
any MPTCP options.
Note that this rule essentially prohibits the sending of data on the In the case of such an ACK being received on the first subflow (i.e.,
third packet of an MP_CAPABLE or MP_JOIN handshake, since both that that started with MP_CAPABLE), before any additional subflows are
option and a DSS cannot fit in TCP option space. If the initiator is added, the implementation MUST drop out of an MPTCP mode, back to
to send first, another segment must be sent that contains the data regular TCP. The sender will send one final data sequence mapping,
and DSS. Note also that an additional subflow cannot be used until with the Data-Level Length value of 0 indicating an infinite mapping
the initial path has been verified as MPTCP capable. (to inform the other end in case the path drops options in one
direction only), and then revert to sending data on the single
subflow without any MPTCP options.
If a subflow breaks during operation, e.g. if it is re-routed and If a subflow breaks during operation, e.g. if it is re-routed and
MPTCP options are no longer permitted, then once this is detected (by MPTCP options are no longer permitted, then once this is detected (by
the subflow-level receive buffer filling up), the subflow SHOULD be the subflow-level receive buffer filling up), the subflow SHOULD be
treated as broken and closed with a RST, since no data can be treated as broken and closed with a RST, since no data can be
delivered to the application layer, and no fallback signal can be delivered to the application layer, and no fallback signal can be
reliably sent. This RST SHOULD include the MP_TCPRST option reliably sent. This RST SHOULD include the MP_TCPRST option
(Section 3.6) with an appropriate reason code. (Section 3.6) with a "Middlebox interference" reason code.
These rules should cover all cases where such a failure could happen: These rules should cover all cases where such a failure could happen:
whether it's on the forward or reverse path and whether the server or whether it's on the forward or reverse path and whether the server or
the client first sends data. If lost options on data packets occur the client first sends data. If lost options on data packets occur
on any other subflow apart from the initial subflow, it should be on any other subflow apart from the initial subflow, it should be
treated as a standard path failure. The data would not be DATA_ACKed treated as a standard path failure. The data would not be DATA_ACKed
(since there is no mapping for the data), and the subflow can be (since there is no mapping for the data), and the subflow can be
closed with a RST, containing a MP_TCPRST option (Section 3.6) with closed with a RST, containing a MP_TCPRST option (Section 3.6) with a
an appropriate reason code. "Middlebox interference" reason code.
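The reaction to an ACK for data that lacks a DSS option with a Data
ACK can be summarised as a short decision function. This is a sketch
only: the function name, parameters, and return strings are
hypothetical. The last branch (initial subflow, after other subflows
have been added) is one reading of the broken-subflow rule above, not
an explicit statement in the text.

```python
def on_data_ack_without_dss(is_initial_subflow: bool,
                            additional_subflows_exist: bool) -> str:
    """Pick the reaction when a path turns out not to carry MPTCP options."""
    if not is_initial_subflow:
        # Subflow started with MP_JOIN: MUST close it with a RST, which
        # SHOULD carry MP_TCPRST with the middlebox reason code.
        return "rst-with-mp-tcprst"
    if not additional_subflows_exist:
        # Initial subflow, nothing else added yet: send one final
        # mapping with Data-Level Length 0 (infinite mapping), then
        # continue as regular TCP without MPTCP options.
        return "fallback-to-regular-tcp"
    # Assumption: the initial subflow is treated like any other broken
    # subflow once the connection has additional subflows.
    return "rst-with-mp-tcprst"
```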
The case described above is a specialized case of fallback, for when
the lack of MPTCP support is detected before any data is acknowledged
at the connection level on a subflow. More generally, fallback
(either closing a subflow, or to regular TCP) can become necessary at
any point during a connection if a non-MPTCP-aware middlebox changes
the data stream.
As described in Section 3.3, each portion of data for which there is So far this section has discussed the loss of MPTCP options, either
a mapping is protected by a checksum, if checksums have been initially, or during the course of the connection. As described in
negotiated. This mechanism is used to detect if middleboxes have Section 3.3, each portion of data for which there is a mapping is
made any adjustments to the payload (added, removed, or changed protected by a checksum, if checksums have been negotiated. This
data). A checksum will fail if the data has been changed in any way. mechanism is used to detect if middleboxes have made any adjustments
This will also detect if the length of data on the subflow is to the payload (added, removed, or changed data). A checksum will
increased or decreased, and this means the data sequence mapping is fail if the data has been changed in any way. This will also detect
no longer valid. The sender no longer knows what subflow-level if the length of data on the subflow is increased or decreased, and
sequence number the receiver is genuinely operating at (the middlebox this means the data sequence mapping is no longer valid. The sender
will be faking ACKs in return), and it cannot signal any further no longer knows what subflow-level sequence number the receiver is
mappings. Furthermore, in addition to the possibility of payload genuinely operating at (the middlebox will be faking ACKs in return),
modifications that are valid at the application layer, there is the and it cannot signal any further mappings. Furthermore, in addition
possibility that false positives could be hit across MPTCP segment to the possibility of payload modifications that are valid at the
boundaries, corrupting the data. Therefore, all data from the start application layer, there is the possibility that such modifications
of the segment that failed the checksum onwards is not trustworthy. could be triggered across MPTCP segment boundaries, corrupting the
data. Therefore, all data from the start of the segment that failed
the checksum onwards is not trustworthy.
Note that if checksum usage has not been negotiated, this fallback Note that if checksum usage has not been negotiated, this fallback
mechanism cannot be used unless there is some higher or lower layer mechanism cannot be used unless there is some higher or lower layer
signal to inform the MPTCP implementation that the payload has been signal to inform the MPTCP implementation that the payload has been
tampered with. tampered with.
When multiple subflows are in use, the data in flight on a subflow When multiple subflows are in use, the data in flight on a subflow
will likely involve data that is not contiguously part of the will likely involve data that is not contiguously part of the
connection-level stream, since segments will be spread across the connection-level stream, since segments will be spread across the
multiple subflows. Due to the problems identified above, it is not multiple subflows. Due to the problems identified above, it is not
possible to determine what the adjustment has done to the data possible to determine what the adjustment has done to the data (notably,
(notably, any changes to the subflow sequence numbering). Therefore, any changes to the subflow sequence numbering). Therefore, it is not
it is not possible to recover the subflow, and the affected subflow possible to recover the subflow, and the affected subflow must be
must be immediately closed with a RST, featuring an MP_FAIL option immediately closed with a RST, featuring an MP_FAIL option
(Figure 16), which defines the data sequence number at the start of (Figure 16), which defines the data sequence number at the start of
the segment (defined by the data sequence mapping) that had the the segment (defined by the data sequence mapping) that had the
checksum failure. Note that the MP_FAIL option requires the use of checksum failure. Note that the MP_FAIL option requires the use of
the full 64-bit sequence number, even if 32-bit sequence numbers are the full 64-bit sequence number, even if 32-bit sequence numbers are
normally in use in the DSS signals on the path. normally in use in the DSS signals on the path.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+----------------------+ +---------------+---------------+-------+----------------------+
| Kind | Length=12 |Subtype| (reserved) | | Kind | Length=12 |Subtype| (reserved) |
+---------------+---------------+-------+----------------------+ +---------------+---------------+-------+----------------------+
| | | |
| Data Sequence Number (8 octets) | | Data Sequence Number (8 octets) |
| | | |
+--------------------------------------------------------------+ +--------------------------------------------------------------+
Figure 16: Fallback (MP_FAIL) Option Figure 16: Fallback (MP_FAIL) Option
The receiver MUST discard all data following the data sequence number The receiver of this option MUST discard all data following the data
specified. Failed data MUST NOT be DATA_ACKed and so will be sequence number specified. Failed data MUST NOT be DATA_ACKed and so
retransmitted on other subflows (Section 3.3.6). will be retransmitted on other subflows (Section 3.3.6).
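The MP_FAIL layout in Figure 16 maps directly onto a 12-octet
structure: Kind, Length, a 4-bit Subtype followed by 12 reserved
bits, and the full 64-bit data sequence number. The sketch below
assumes Kind 30 and subtype 0x6, neither of which is stated in this
section, and uses hypothetical function names.

```python
import struct

TCPOPT_MPTCP = 30      # assumption: MPTCP option Kind (RFC 6824)
MP_FAIL_SUBTYPE = 0x6  # assumption: subtype is not given in this section
MP_FAIL_LENGTH = 12    # from Figure 16


def build_mp_fail(data_seq: int) -> bytes:
    # Subtype occupies the top 4 bits of the 16-bit field; the 12
    # reserved bits are sent as zero. The DSN is always 64 bits wide.
    return struct.pack("!BBHQ", TCPOPT_MPTCP, MP_FAIL_LENGTH,
                       MP_FAIL_SUBTYPE << 12, data_seq)


def parse_mp_fail(option: bytes) -> int:
    """Return the data sequence number at the start of the failed segment."""
    if len(option) != MP_FAIL_LENGTH:
        raise ValueError("unexpected option length")
    kind, length, sub_resv, data_seq = struct.unpack("!BBHQ", option)
    if (kind != TCPOPT_MPTCP or length != MP_FAIL_LENGTH
            or sub_resv >> 12 != MP_FAIL_SUBTYPE):
        raise ValueError("not an MP_FAIL option")
    return data_seq
```

The receiver of the option would discard data from the returned
sequence number onwards and leave it un-DATA_ACKed.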
A special case is when there is a single subflow and it fails with a A special case is when there is a single subflow and it fails with a
checksum error. If it is known that all unacknowledged data in checksum error. If it is known that all unacknowledged data in
flight is contiguous (which will usually be the case with a single flight is contiguous (which will usually be the case with a single
subflow), an infinite mapping can be applied to the subflow without subflow), an infinite mapping can be applied to the subflow without
the need to close it first, and essentially turn off all further the need to close it first, and essentially turn off all further
MPTCP signaling. In this case, if a receiver identifies a checksum MPTCP signaling. In this case, if a receiver identifies a checksum
failure when there is only one path, it will send back an MP_FAIL failure when there is only one path, it will send back an MP_FAIL
option on the subflow-level ACK, referring to the data-level sequence option on the subflow-level ACK, referring to the data-level sequence
number of the start of the segment on which the checksum error was number of the start of the segment on which the checksum error was
detected. The sender will receive this, and if all unacknowledged detected. The sender will receive this, and if all unacknowledged
data in flight is contiguous, will signal an infinite mapping. This data in flight is contiguous, will signal an infinite mapping. This
infinite mapping will be a DSS option (Section 3.3) on the first new infinite mapping will be a DSS option (Section 3.3) on the first new
packet, containing a data sequence mapping that acts retroactively, packet, containing a data sequence mapping that acts retroactively,
referring to the start of the subflow sequence number of the most referring to the start of the subflow sequence number of the most
recent segment that was known to be delivered intact (i.e. was recent segment that was known to be delivered intact (i.e. was
successfully DATA_ACKed). From that point onwards, data can be successfully DATA_ACKed). From that point onwards, data can be
altered by a middlebox without affecting MPTCP, as the data stream is altered by a middlebox without affecting MPTCP, as the data stream is
equivalent to a regular, legacy TCP session. The MP_FAIL signal equivalent to a regular, legacy TCP session. Whilst in theory paths
affects only one direction of traffic. It is not mandatory for the may only be damaged in one direction, and the MP_FAIL signal affects
reciever of an MP_FAIL to also respond with an MP_FAIL, since the only one direction of traffic, for implementation simplicity, the
paths may only be damaged in one direction. However, implementations receiver of an MP_FAIL MUST also respond with an MP_FAIL in the
MAY choose to send a MP_FAIL in the reverse direction and entirely reverse direction and entirely revert to a regular TCP session.
revert to a regular TCP session.
In the rare case that the data is not contiguous (which could happen In the rare case that the data is not contiguous (which could happen
when there is only one subflow but it is retransmitting data from a when there is only one subflow but it is retransmitting data from a
subflow that has recently been uncleanly closed), the receiver MUST subflow that has recently been uncleanly closed), the receiver MUST
close the subflow with a RST with MP_FAIL. The receiver MUST discard close the subflow with a RST with MP_FAIL. The receiver MUST discard
all data that follows the data sequence number specified. The sender all data that follows the data sequence number specified. The sender
MAY attempt to create a new subflow belonging to the same connection, MAY attempt to create a new subflow belonging to the same connection,
and, if it chooses to do so, SHOULD place the single subflow and, if it chooses to do so, SHOULD place the single subflow
immediately in single-path mode by setting an infinite data sequence immediately in single-path mode by setting an infinite data sequence
mapping. This mapping will begin from the data-level sequence number mapping. This mapping will begin from the data-level sequence number
that was declared in the MP_FAIL. that was declared in the MP_FAIL.
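The three checksum-failure cases described in this section can be
condensed into one decision, as sketched below. The function name,
parameters, and return strings are hypothetical; the sketch only
restates which signal the receiver that detected the failure sends.

```python
def on_checksum_failure(subflow_count: int,
                        in_flight_contiguous: bool) -> str:
    """Choose the receiver's signal after a DSS checksum failure."""
    if subflow_count > 1:
        # The subflow cannot be recovered: close it immediately with a
        # RST carrying MP_FAIL; data is retransmitted on other subflows.
        return "rst+mp_fail"
    if in_flight_contiguous:
        # Single subflow, contiguous data: keep the subflow, send
        # MP_FAIL on the subflow-level ACK, and expect the sender to
        # answer with an infinite mapping.
        return "ack+mp_fail"
    # Single subflow but non-contiguous data: MUST close with RST and
    # MP_FAIL; the sender MAY then open a new subflow.
    return "rst+mp_fail"
```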
After a sender signals an infinite mapping, it MUST only use subflow After a sender signals an infinite mapping, it MUST only use subflow
ACKs to clear its send buffer. This is because Data ACKs may become ACKs to clear its send buffer. This is because Data ACKs may become
misaligned with the subflow ACKs when middleboxes insert or delete misaligned with the subflow ACKs when middleboxes insert or delete
data. The receiver SHOULD stop generating Data ACKs after it receives data. The receiver SHOULD stop generating Data ACKs after it receives
an infinite mapping. an infinite mapping.
When a connection has fallen back, only one subflow can send data; When a connection has fallen back with an infinite mapping, only one
otherwise, the receiver would not know how to reorder the data. In subflow can send data; otherwise, the receiver would not know how to
practice, this means that all MPTCP subflows will have to be reorder the data. In practice, this means that all MPTCP subflows
terminated except one. Once MPTCP falls back to regular TCP, it MUST will have to be terminated except one. Once MPTCP falls back to
NOT revert to MPTCP later in the connection. regular TCP, it MUST NOT revert to MPTCP later in the connection.
It should be emphasized that we are not attempting to prevent the use It should be emphasized that MPTCP is not attempting to prevent the
of middleboxes that want to adjust the payload. An MPTCP-aware use of middleboxes that want to adjust the payload. An MPTCP-aware
middlebox could provide such functionality by also rewriting middlebox could provide such functionality by also rewriting
checksums. checksums.
3.8. Error Handling 3.8. Error Handling
In addition to the fallback mechanism as described above, the In addition to the fallback mechanism as described above, the
standard classes of TCP errors may need to be handled in an MPTCP- standard classes of TCP errors may need to be handled in an MPTCP-
specific way. Note that changing semantics -- such as the relevance specific way. Note that changing semantics -- such as the relevance
of a RST -- are covered in Section 4. Where possible, we do not want of a RST -- are covered in Section 4. Where possible, we do not want
to deviate from regular TCP behavior. to deviate from regular TCP behavior.
skipping to change at page 51, line 20 skipping to change at page 54, line 29
ports as already in use. In other words, the destination port of a ports as already in use. In other words, the destination port of a
SYN containing an MP_JOIN option SHOULD be the same as the remote SYN containing an MP_JOIN option SHOULD be the same as the remote
port of the first subflow in the connection. The local port for such port of the first subflow in the connection. The local port for such
SYNs SHOULD also be the same as for the first subflow (and as such, SYNs SHOULD also be the same as for the first subflow (and as such,
an implementation SHOULD reserve ephemeral ports across all local IP an implementation SHOULD reserve ephemeral ports across all local IP
addresses), although there may be cases where this is infeasible. addresses), although there may be cases where this is infeasible.
This strategy is intended to maximize the probability of the SYN This strategy is intended to maximize the probability of the SYN
being permitted by a firewall or NAT at the recipient and to avoid being permitted by a firewall or NAT at the recipient and to avoid
confusing any network monitoring software. confusing any network monitoring software.
There may also be cases, however, where the passive opener wishes to There may also be cases, however, where a host wishes to signal that
signal to the other host that a specific port should be used, and a specific port should be used, and this facility is provided in the
this facility is provided in the Add Address option as documented in ADD_ADDR option as documented in Section 3.4.1. It is therefore
Section 3.4.1. It is therefore feasible to allow multiple subflows feasible to allow multiple subflows between the same two addresses
between the same two addresses but using different port pairs, and but using different port pairs, and such a facility could be used to
such a facility could be used to allow load balancing within the allow load balancing within the network based on 5-tuples (e.g., some
network based on 5-tuples (e.g., some ECMP implementations ECMP implementations [RFC2992]).
[RFC2992]).
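As a rough illustration of why different port pairs between the same two addresses can help, the following Python sketch models a 5-tuple load balancer. It is a minimal sketch under stated assumptions: the function name ecmp_next_hop, the use of SHA-256, and the simple modulo-N path selection are all hypothetical choices made for illustration, and are not the hash-threshold method analyzed in [RFC2992] nor the behavior of any particular router.

```python
import hashlib


def ecmp_next_hop(src_ip, dst_ip, src_port, dst_port, n_paths, proto=6):
    """Pick a path index from a 5-tuple (hypothetical modulo-N balancer)."""
    # Serialize the 5-tuple; proto=6 is TCP.
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 32 bits of the digest as the flow hash.
    return int.from_bytes(digest[:4], "big") % n_paths


if __name__ == "__main__":
    # Same address pair, different local ports: each subflow is hashed
    # independently, so the subflows may land on different paths.
    for port in (50000, 50001, 50002, 50003):
        print(port, ecmp_next_hop("192.0.2.1", "198.51.100.1", port, 443, 4))
```

A given 5-tuple always maps to the same path, so a subflow is not reordered across paths, while a subflow with a different source port may be hashed onto a different path.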
3.9.2. Delayed Subflow Start and Subflow Symmetry 3.9.2. Delayed Subflow Start and Subflow Symmetry
Many TCP connections are short-lived and consist only of a few Many TCP connections are short-lived and consist only of a few
segments, and so the overheads of using MPTCP outweigh any benefits. segments, and so the overheads of using MPTCP outweigh any benefits.
A heuristic is required, therefore, to decide when to start using A heuristic is required, therefore, to decide when to start using
additional subflows in an MPTCP connection. We expect that additional subflows in an MPTCP connection. We expect that
experience gathered from deployments will provide further guidance on experience gathered from deployments will provide further guidance on
this, and will be affected by particular application characteristics this, and will be affected by particular application characteristics
(which are likely to change over time). However, a suggested (which are likely to change over time). However, a suggested
skipping to change at page 52, line 8 skipping to change at page 55, line 13
subflow for each initial window's worth of data that is buffered. subflow for each initial window's worth of data that is buffered.
Consideration should also be given to limiting the rate of adding new Consideration should also be given to limiting the rate of adding new
subflows, as well as limiting the total number of subflows open for a subflows, as well as limiting the total number of subflows open for a
particular connection. A host may choose to vary these values based particular connection. A host may choose to vary these values based
on its load or knowledge of traffic and path characteristics. on its load or knowledge of traffic and path characteristics.
Note that this heuristic alone is probably insufficient. Traffic for Note that this heuristic alone is probably insufficient. Traffic for
many common applications, such as downloads, is highly asymmetric and many common applications, such as downloads, is highly asymmetric and
the host that is multihomed may well be the client that will never the host that is multihomed may well be the client that will never
fill its buffers, and thus never use MPTCP. Advanced APIs that allow fill its buffers, and thus never use MPTCP according to this
an application to signal its traffic requirements would aid in these heuristic. Advanced APIs that allow an application to signal its
decisions. traffic requirements would aid in these decisions.
An additional time-based heuristic could be applied, opening An additional time-based heuristic could be applied, opening
additional subflows after a given period of time has passed. This additional subflows after a given period of time has passed. This
would alleviate the above issue, and also provide resilience for low- would alleviate the above issue, and also provide resilience for low-
bandwidth but long-lived applications. bandwidth but long-lived applications.
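The buffer-based and time-based heuristics above, together with the rate and count limits, could be combined along the following lines. This is only a sketch of one possible reading of the text: the SubflowPolicy fields, their default values, and the function name should_open_subflow are assumptions made for illustration and are not values recommended by this document.

```python
from dataclasses import dataclass


@dataclass
class SubflowPolicy:
    # All defaults below are illustrative assumptions, not recommendations.
    initial_window_bytes: int = 10 * 1460  # assumed initial window size
    max_subflows: int = 4                  # cap on subflows per connection
    min_interval_s: float = 1.0            # minimum spacing between opens
    long_lived_after_s: float = 30.0       # time-based trigger


def should_open_subflow(policy, open_subflows, buffered_bytes,
                        conn_age_s, since_last_open_s):
    """Decide whether to open one more subflow right now."""
    # Limit the total number of subflows for the connection.
    if open_subflows >= policy.max_subflows:
        return False
    # Limit the rate at which new subflows are added.
    if since_last_open_s < policy.min_interval_s:
        return False
    # Buffer-based: one subflow per initial window's worth of buffered data.
    wanted = 1 + buffered_bytes // policy.initial_window_bytes
    if wanted > open_subflows:
        return True
    # Time-based: a long-lived connection still on one subflow gets a second.
    return open_subflows == 1 and conn_age_s >= policy.long_lived_after_s
```

The time-based branch covers the asymmetric case noted above, where a multihomed client never fills its send buffer and so would never trigger the buffer-based rule.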
If the two communicating hosts immediately try to set up subflows Another issue is that both communicating hosts may simultaneously try
from all available addresses to all available addresses on the other to set up a subflow between the same pair of addresses. This leads
host, this could end up creating two subflows per path. This is an to an inefficient use of resources.
inefficient use of resources.
If the same ports are used on all subflows, as recommended above, If the same ports are used on all subflows, as recommended above,
then standard TCP simultaneous open logic should take care of this then standard TCP simultaneous open logic should take care of this
situation and only one subflow will be established between the situation and only one subflow will be established between the
address pairs. However, this relies on the same ports being used at address pairs. However, this relies on the same ports being used at
both end hosts. If a host does not support TCP simultaneous open, it both end hosts. If a host does not support TCP simultaneous open, it
is RECOMMENDED that some element of randomization is applied to the is RECOMMENDED that some element of randomization is applied to the
time waited before opening new subflows, so that only one subflow time to wait before opening new subflows, so that only one subflow is
exists between a given address pair. If, however, hosts signal created between a given address pair. If, however, hosts signal
additional ports to use (for example, for leveraging ECMP on-path), additional ports to use (for example, for leveraging ECMP on-path),
this heuristic need not apply. this heuristic is not appropriate.
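The randomization recommended above might be sketched as follows. The function name subflow_open_delay, the uniform jitter, and the default values are assumptions chosen for illustration; the text only asks for "some element of randomization" and does not specify a distribution.

```python
import random


def subflow_open_delay(base_delay_s=1.0, jitter_s=1.0,
                       peer_signalled_ports=False, rng=None):
    """Return how long to wait before opening a new subflow.

    A uniform random jitter makes it less likely that both hosts open a
    subflow between the same address pair at the same instant.
    """
    if peer_signalled_ports:
        # Extra ports were signalled (e.g., for ECMP), so several subflows
        # per address pair are intended and no jitter is applied.
        return base_delay_s
    rng = rng or random
    return base_delay_s + rng.uniform(0.0, jitter_s)
```

A caller could pass its own random.Random instance as rng to make the delays reproducible in tests.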
This section has shown some of the considerations that an implementer This section has shown some of the considerations that an implementer
should give when developing MPTCP heuristics, but is not intended to should give when developing MPTCP heuristics, but is not intended to
be prescriptive. be prescriptive.
3.9.3. Failure Handling 3.9.3. Failure Handling
Requirements for MPTCP's handling of unexpected signals have been Requirements for MPTCP's handling of unexpected signals have been
given in Section 3.8. There are other failure cases, however, where given in Section 3.8. There are other failure cases, however, where
a host can choose appropriate behavior. a host can choose appropriate behavior.
skipping to change at page 53, line 30 skipping to change at page 56, line 34
4. Semantic Issues 4. Semantic Issues
In order to support multipath operation, the semantics of some TCP In order to support multipath operation, the semantics of some TCP
components have changed. To aid clarity, this section collects these components have changed. To aid clarity, this section collects these
semantic changes as a reference. semantic changes as a reference.
Sequence number: The (in-header) TCP sequence number is specific to Sequence number: The (in-header) TCP sequence number is specific to
the subflow. To allow the receiver to reorder application data, the subflow. To allow the receiver to reorder application data,
an additional data-level sequence space is used. In this data- an additional data-level sequence space is used. In this data-
level sequence space, the initial SYN and the final DATA_FIN level sequence space, the initial SYN and the final DATA_FIN
occupy 1 octet of sequence space. There is an explicit mapping of occupy 1 octet of sequence space. This is to ensure these signals
data sequence space to subflow sequence space, which is signaled are acknowledged at the connection level. There is an explicit
through TCP options in data packets. mapping of data sequence space to subflow sequence space, which is
signaled through TCP options in data packets.
ACK: The ACK field in the TCP header acknowledges only the subflow ACK: The ACK field in the TCP header acknowledges only the subflow
sequence number, not the data-level sequence space. sequence number, not the data-level sequence space.
Implementations SHOULD NOT attempt to infer a data-level Implementations SHOULD NOT attempt to infer a data-level
acknowledgment from the subflow ACKs. This separates subflow- and acknowledgment from the subflow ACKs. This separates subflow- and
connection-level processing at an end host. connection-level processing at an end host.
Duplicate ACK: A duplicate ACK that includes any MPTCP signaling Duplicate ACK: A duplicate ACK that includes any MPTCP signaling
(with the exception of the DSS option) MUST NOT be treated as a (with the exception of the DSS option) MUST NOT be treated as a
signal of congestion. To limit the chances of non-MPTCP-aware signal of congestion. To limit the chances of non-MPTCP-aware
skipping to change at page 55, line 26 skipping to change at page 58, line 32
hash of this key as the connection identification "token". The keys hash of this key as the connection identification "token". The keys
are concatenated and used as keys for creating Hash-based Message are concatenated and used as keys for creating Hash-based Message
Authentication Codes (HMACs) used on subflow setup, in order to Authentication Codes (HMACs) used on subflow setup, in order to
verify that the parties in the handshake are the same as in the verify that the parties in the handshake are the same as in the
original connection setup. It also provides verification that the original connection setup. It also provides verification that the
peer can receive traffic at this new address. Replay attacks would peer can receive traffic at this new address. Replay attacks would
still be possible when only keys are used; therefore, the handshakes still be possible when only keys are used; therefore, the handshakes
use single-use random numbers (nonces) at both ends -- this ensures use single-use random numbers (nonces) at both ends -- this ensures
the HMAC will never be the same on two handshakes. Guidance on the HMAC will never be the same on two handshakes. Guidance on
generating random numbers suitable for use as keys is given in generating random numbers suitable for use as keys is given in
[RFC4086] and discussed in Section 3.1. [RFC4086] and discussed in Section 3.1. HMAC is also used to secure
the ADD_ADDR option, due to the threats identified in [RFC7430].
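The role of the keys and nonces can be illustrated with a short Python sketch. It assumes HMAC-SHA256 (the algorithm registered for flag bit H), a token formed from the most significant 32 bits of the SHA-256 hash of a key, and a key and message ordering of local-then-remote; these details are defined in Sections 3.1 and 3.2, which are not part of this excerpt, so they should be read as assumptions here. The function names are hypothetical.

```python
import hashlib
import hmac


def token(key: bytes) -> bytes:
    """Connection token: assumed to be the top 32 bits of SHA-256(key)."""
    return hashlib.sha256(key).digest()[:4]


def join_hmac(local_key, remote_key, local_nonce, remote_nonce) -> bytes:
    """HMAC over both nonces, keyed with the concatenated keys."""
    return hmac.new(local_key + remote_key,
                    local_nonce + remote_nonce,
                    hashlib.sha256).digest()


def verify_join_hmac(received, sender_key, receiver_key,
                     sender_nonce, receiver_nonce) -> bool:
    """Recompute the sender's HMAC and compare in constant time."""
    expected = join_hmac(sender_key, receiver_key,
                         sender_nonce, receiver_nonce)
    return hmac.compare_digest(received, expected)
```

Because fresh nonces enter the message on every handshake, a captured HMAC does not verify against a later handshake, which is the replay protection described above.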
The use of crypto capability bits in the initial connection handshake The use of crypto capability bits in the initial connection handshake
to negotiate use of a particular algorithm allows the deployment of to negotiate use of a particular algorithm allows the deployment of
additional crypto mechanisms in the future. Note that this would be additional crypto mechanisms in the future. Note that this would be
susceptible to bid-down attacks only if the attacker was on-path (and susceptible to bid-down attacks only if the attacker was on-path (and
thus would be able to modify the data anyway). The security thus would be able to modify the data anyway). The security
mechanism presented in this document should therefore protect against mechanism presented in this document should therefore protect against
all forms of flooding and hijacking attacks discussed in [RFC6181]. all forms of flooding and hijacking attacks discussed in [RFC6181].
The version negotiation specified in Section 3.1, if differing MPTCP
versions shared a common negotiation format, would allow an on-path
attacker to apply a theoretical bid-down attack. However, since the
v1 and v0 protocols have a different handshake, this is not an attack
that can be applied here. Furthermore, an on-path attacker would
have access to the raw data, negating any other TCP-level security
mechanisms. Also a change from [RFC6824] has removed the subflow
identifier from the MP_PRIO option (Section 3.3.8), to remove the
theoretical attack where a subflow could be placed in "backup" mode
by an attacker.
During normal operation, regular TCP protection mechanisms (such as During normal operation, regular TCP protection mechanisms (such as
ensuring sequence numbers are in-window) will provide the same level ensuring sequence numbers are in-window) will provide the same level
of protection against attacks on individual TCP subflows as exists of protection against attacks on individual TCP subflows as exists
for regular TCP today. Implementations will introduce additional for regular TCP today. Implementations will introduce additional
buffers compared to regular TCP, to reassemble data at the connection buffers compared to regular TCP, to reassemble data at the connection
level. The application of window sizing will minimize the risk of level. The application of window sizing will minimize the risk of
denial-of-service attacks consuming resources. denial-of-service attacks consuming resources.
As discussed in Section 3.4.1, a host may advertise its private As discussed in Section 3.4.1, a host may advertise its private
addresses, but these might point to different hosts in the receiver's addresses, but these might point to different hosts in the receiver's
network. The MP_JOIN handshake (Section 3.2) will ensure that this network. The MP_JOIN handshake (Section 3.2) will ensure that this
does not succeed in setting up a subflow to the incorrect host. does not succeed in setting up a subflow to the incorrect host.
However, it could still create unwanted TCP handshake traffic. This However, it could still create unwanted TCP handshake traffic. This
feature of MPTCP could be a target for denial-of-service exploits, feature of MPTCP could be a target for denial-of-service exploits,
with malicious participants in MPTCP connections encouraging the with malicious participants in MPTCP connections encouraging the
recipient to target other hosts in the network. Therefore, recipient to target other hosts in the network. Therefore,
implementations should consider heuristics (Section 3.9) at both the implementations should consider heuristics (Section 3.9) at both the
sender and receiver to reduce the impact of this. sender and receiver to reduce the impact of this.
To further protect against malicious ADD_ADDR messages sent by an
off-path attacker, the ADD_ADDR includes an HMAC using the keys
negotiated during the handshake. This effectively prevents an
attacker from diverting an MPTCP connection through an off-path
ADD_ADDR injection into the stream.
A small security risk could theoretically exist with key reuse, but A small security risk could theoretically exist with key reuse, but
in order to accomplish a replay attack, both the sender and receiver in order to accomplish a replay attack, both the sender and receiver
keys, and the sender and receiver random numbers, in the MP_JOIN keys, and the sender and receiver random numbers, in the MP_JOIN
handshake (Section 3.2) would have to match. handshake (Section 3.2) would have to match.
Whilst this specification defines a "medium" security solution, Whilst this specification defines a "medium" security solution,
meeting the criteria specified at the start of this section and the meeting the criteria specified at the start of this section and the
threat analysis ([RFC6181]), since attacks only ever get worse, it is threat analysis ([RFC6181]), since attacks only ever get worse, it is
likely that a future Standards Track version of MPTCP would need to likely that a future Standards Track version of MPTCP would need to
be able to support stronger security. There are several ways the be able to support stronger security. There are several ways the
skipping to change at page 58, line 33 skipping to change at page 61, line 48
Figure 17: Connection Setup with Middleboxes that Strip Options from Figure 17: Connection Setup with Middleboxes that Strip Options from
Packets Packets
We now examine data flow with MPTCP, assuming the flow is correctly We now examine data flow with MPTCP, assuming the flow is correctly
set up, which implies the options in the SYN packets were allowed set up, which implies the options in the SYN packets were allowed
through by the relevant middleboxes. If options are allowed through through by the relevant middleboxes. If options are allowed through
and there is no resegmentation or coalescing to TCP segments, and there is no resegmentation or coalescing to TCP segments,
Multipath TCP flows can proceed without problems. Multipath TCP flows can proceed without problems.
The case when options get stripped on data packets has been discussed The case when options get stripped on data packets has been discussed
in the Fallback section. If a fraction of options are stripped, in the Fallback section. If only some MPTCP options are stripped,
behavior is not deterministic. If some data sequence mappings are behavior is not deterministic. If some data sequence mappings are
lost, the connection can continue so long as mappings exist for the lost, the connection can continue so long as mappings exist for the
subflow-level data (e.g., if multiple maps have been sent that subflow-level data (e.g., if multiple maps have been sent that
reinforce each other). If some subflow-level space is left unmapped, reinforce each other). If some subflow-level space is left unmapped,
however, the subflow is treated as broken and is closed, through the however, the subflow is treated as broken and is closed, through the
process described in Section 3.7. MPTCP should survive with a loss process described in Section 3.7. MPTCP should survive with a loss
of some Data ACKs, but performance will degrade as the fraction of of some Data ACKs, but performance will degrade as the fraction of
stripped options increases. We do not expect such cases to appear in stripped options increases. We do not expect such cases to appear in
practice, though: most middleboxes will either strip all options or practice, though: most middleboxes will either strip all options or
let them all through. let them all through.
skipping to change at page 59, line 6 skipping to change at page 62, line 21
We end this section with a list of middlebox classes, their behavior, We end this section with a list of middlebox classes, their behavior,
and the elements in the MPTCP design that allow operation through and the elements in the MPTCP design that allow operation through
such middleboxes. Issues surrounding dropping packets with options such middleboxes. Issues surrounding dropping packets with options
or stripping options were discussed above, and are not included here: or stripping options were discussed above, and are not included here:
o NATs [RFC3022] (Network Address (and Port) Translators) change the o NATs [RFC3022] (Network Address (and Port) Translators) change the
source address (and often source port) of packets. This means source address (and often source port) of packets. This means
that a host will not know its public-facing address for signaling that a host will not know its public-facing address for signaling
in MPTCP. Therefore, MPTCP permits implicit address addition via in MPTCP. Therefore, MPTCP permits implicit address addition via
the MP_JOIN option, and the handshake mechanism ensures that the MP_JOIN option, and the handshake mechanism ensures that
connection attempts to private addresses [RFC1918] do not cause connection attempts to private addresses [RFC1918], since they are
problems. Explicit address removal is undertaken by an Address ID authenticated, will only set up subflows to the correct hosts.
to allow no knowledge of the source address. Explicit address removal is undertaken by an Address ID to allow
no knowledge of the source address.
o Performance Enhancing Proxies (PEPs) [RFC3135] might proactively o Performance Enhancing Proxies (PEPs) [RFC3135] might proactively
ACK data to increase performance. MPTCP, however, relies on ACK data to increase performance. MPTCP, however, relies on
accurate congestion control signals from the end host, and non- accurate congestion control signals from the end host, and non-
MPTCP-aware PEPs will not be able to provide such signals. MPTCP MPTCP-aware PEPs will not be able to provide such signals. MPTCP
will, therefore, fall back to single-path TCP, or close the will, therefore, fall back to single-path TCP, or close the
problematic subflow (see Section 3.7). problematic subflow (see Section 3.7).
o Traffic Normalizers [norm] may not allow holes in sequence o Traffic Normalizers [norm] may not allow holes in sequence
numbers, and may cache packets and retransmit the same data. numbers, and may cache packets and retransmit the same data.
skipping to change at page 59, line 37 skipping to change at page 63, line 5
numbers in data sequence mapping to cope with this. Like NATs, numbers in data sequence mapping to cope with this. Like NATs,
firewalls will not permit many incoming connections, so MPTCP firewalls will not permit many incoming connections, so MPTCP
supports address signaling (ADD_ADDR) so that a multiaddressed supports address signaling (ADD_ADDR) so that a multiaddressed
host can invite its peer behind the firewall/NAT to connect out to host can invite its peer behind the firewall/NAT to connect out to
its additional interface. its additional interface.
o Intrusion Detection Systems look out for traffic patterns and o Intrusion Detection Systems look out for traffic patterns and
content that could threaten a network. Multipath will mean that content that could threaten a network. Multipath will mean that
such data is potentially spread, so it is more difficult for an such data is potentially spread, so it is more difficult for an
IDS to analyze the whole traffic, and potentially increases the IDS to analyze the whole traffic, and potentially increases the
risk of false positives. However, for an MPTCP-aware IDS, tokens risk of false positives. However, an MPTCP-aware IDS can read
can be read by such systems to correlate multiple subflows and tokens to correlate multiple subflows and reassemble them for
reassemble for analysis. analysis.
o Application-level middleboxes such as content-aware firewalls may o Application-level middleboxes such as content-aware firewalls may
alter the payload within a subflow, such as rewriting URIs in HTTP alter the payload within a subflow, such as rewriting URIs in HTTP
traffic. MPTCP will detect these using the checksum and close the traffic. MPTCP will detect these using the checksum and close the
affected subflow(s), if there are other subflows that can be used. affected subflow(s), if there are other subflows that can be used.
If all subflows are affected, multipath will fall back to TCP, If all subflows are affected, multipath will fall back to TCP,
allowing such middleboxes to change the payload. MPTCP-aware allowing such middleboxes to change the payload. MPTCP-aware
middleboxes should be able to adjust the payload and MPTCP middleboxes should be able to adjust the payload and MPTCP
metadata in order not to break the connection. metadata in order not to break the connection.
skipping to change at page 60, line 33 skipping to change at page 63, line 49
The authors gratefully acknowledge significant input into this The authors gratefully acknowledge significant input into this
document from Sebastien Barre and Andrew McDonald. document from Sebastien Barre and Andrew McDonald.
The authors also wish to acknowledge reviews and contributions from The authors also wish to acknowledge reviews and contributions from
Iljitsch van Beijnum, Lars Eggert, Marcelo Bagnulo, Robert Hancock, Iljitsch van Beijnum, Lars Eggert, Marcelo Bagnulo, Robert Hancock,
Pasi Sarolahti, Toby Moncaster, Philip Eardley, Sergio Lembo, Pasi Sarolahti, Toby Moncaster, Philip Eardley, Sergio Lembo,
Lawrence Conroy, Yoshifumi Nishida, Bob Briscoe, Stein Gjessing, Lawrence Conroy, Yoshifumi Nishida, Bob Briscoe, Stein Gjessing,
Andrew McGregor, Georg Hampel, Anumita Biswas, Wes Eddy, Alexey Andrew McGregor, Georg Hampel, Anumita Biswas, Wes Eddy, Alexey
Melnikov, Francis Dupont, Adrian Farrel, Barry Leiba, Robert Sparks, Melnikov, Francis Dupont, Adrian Farrel, Barry Leiba, Robert Sparks,
Sean Turner, Stephen Farrell, Martin Stiemerling, Gregory Detal, and Sean Turner, Stephen Farrell, Martin Stiemerling, Gregory Detal,
Fabien Duchene. Fabien Duchene, Xavier de Foy, and Rahul Jadhav.
8. IANA Considerations 8. IANA Considerations
This document updates [RFC6824] and as such IANA is requested to This document obsoletes [RFC6824] and as such IANA is requested to
update the TCP option space registry to point to this document for update the TCP option space registry to point to this document for
Multipath TCP, as follows: Multipath TCP, as follows:
+------+--------+-----------------------+---------------+ +------+--------+-----------------------+---------------+
| Kind | Length | Meaning | Reference | | Kind | Length | Meaning | Reference |
+------+--------+-----------------------+---------------+ +------+--------+-----------------------+---------------+
| 30 | N | Multipath TCP (MPTCP) | This document | | 30 | N | Multipath TCP (MPTCP) | This document |
+------+--------+-----------------------+---------------+ +------+--------+-----------------------+---------------+
Table 1: TCP Option Kind Numbers Table 1: TCP Option Kind Numbers
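As an illustration of how the Kind value in Table 1 is used, the following sketch scans the options area of a TCP header for Kind 30. It relies only on the standard TCP option encoding (Kind 0 ends the list, Kind 1 is a one-octet no-op, all other options carry a Length octet). The function name mptcp_options is hypothetical, and the payload is returned uninterpreted.

```python
MPTCP_KIND = 30  # TCP option Kind for Multipath TCP (Table 1)


def mptcp_options(options: bytes):
    """Return the payload of every Kind-30 option in a TCP options area."""
    found = []
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:          # End of Option List
            break
        if kind == 1:          # No-Operation, a single octet
            i += 1
            continue
        if i + 1 >= len(options):
            break              # truncated: no Length octet
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break              # malformed Length, stop parsing
        if kind == MPTCP_KIND:
            # Payload excludes the Kind and Length octets.
            found.append(options[i + 2:i + length])
        i += length
    return found
```

For example, an options area holding an MSS option followed by a Kind-30 option yields a one-element list containing only the MPTCP payload.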
skipping to change at page 62, line 12 skipping to change at page 66, line 5
reserved for use by private experiments. Its use may be formalized reserved for use by private experiments. Its use may be formalized
in a future specification. in a future specification.
8.2. MPTCP Handshake Algorithms 8.2. MPTCP Handshake Algorithms
IANA has created another sub-registry, "MPTCP Handshake Algorithms" IANA has created another sub-registry, "MPTCP Handshake Algorithms"
under the "Transmission Control Protocol (TCP) Parameters" registry, under the "Transmission Control Protocol (TCP) Parameters" registry,
based on the flags in MP_CAPABLE (Section 3.1). IANA is requested to based on the flags in MP_CAPABLE (Section 3.1). IANA is requested to
update the references of this table to this document, as follows: update the references of this table to this document, as follows:
+---------+----------------------------------+----------------------+ +-------+----------------------------------------+------------------+
| Flag | Meaning | Reference | | Flag | Meaning | Reference |
| Bit | | | | Bit | | |
+---------+----------------------------------+----------------------+ +-------+----------------------------------------+------------------+
| A | Checksum required | This document, | | A | Checksum required | This document, |
| | | Section 3.1 | | | | Section 3.1 |
| B | Extensibility | This document, | | B | Extensibility | This document, |
| | | Section 3.1 | | | | Section 3.1 |
| C | Do not attempt to connect to | This document, | | C | Do not attempt to establish new | This document, |
| | source address | Section 3.1 | | | subflows to the source address. | Section 3.1 |
| D-G | Unassigned | | | D-G | Unassigned | |
| H | HMAC-SHA256 | This document, | | H | HMAC-SHA256 | This document, |
| | | Section 3.2 | | | | Section 3.2 |
+---------+----------------------------------+----------------------+ +-------+----------------------------------------+------------------+
Table 3: MPTCP Handshake Algorithms Table 3: MPTCP Handshake Algorithms
Note that the meanings of bits D through H can be dependent upon bit Note that the meanings of bits D through H can be dependent upon bit
B, depending on how Extensibility is defined in future B, depending on how Extensibility is defined in future
specifications; see Section 3.1 for more information. specifications; see Section 3.1 for more information.
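The flag assignments in Table 3 could be decoded as sketched below. The bit positions are an assumption: the sketch takes A as the most significant bit of the flags octet and H as the least significant, which should be checked against the option layout in Section 3.1. The names FLAG_MEANINGS, describe_flags, and unassigned_bits are hypothetical.

```python
# Assumed layout: A is the most significant bit, H the least significant.
FLAG_MEANINGS = [
    (0x80, "A", "Checksum required"),
    (0x40, "B", "Extensibility"),
    (0x20, "C", "Do not attempt to establish new subflows to the source address"),
    (0x01, "H", "HMAC-SHA256"),
]

UNASSIGNED_MASK = 0x1E  # bits D through G


def describe_flags(flags: int):
    """Return (bit letter, meaning) for each assigned flag that is set."""
    return [(name, meaning)
            for mask, name, meaning in FLAG_MEANINGS
            if flags & mask]


def unassigned_bits(flags: int) -> int:
    """Return the subset of set bits that Table 3 lists as Unassigned."""
    return flags & UNASSIGNED_MASK
```

Note that if bit B is set, the meanings of the remaining bits may differ, so a decoder along these lines would only be valid while Extensibility is not in use.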
Future assignments in this registry are also to be defined by Future assignments in this registry are also to be defined by
Standards Action as defined by [RFC5226]. Assignments consist of the Standards Action as defined by [RFC5226]. Assignments consist of the
value of the flags, a symbolic name for the algorithm, and a value of the flags, a symbolic name for the algorithm, and a
reference to its specification. reference to its specification.
8.3. MP_TCPRST Reason Codes 8.3. MP_TCPRST Reason Codes
IANA is requested to create a further sub-registry, "MP_TCPRST Reason IANA is requested to create a further sub-registry, "MP_TCPRST Reason
Codes" under the "Transmission Control Protocol (TCP) Parameters" Codes" under the "Transmission Control Protocol (TCP) Parameters"
registry, based on the reason code in MP_TCPRST (Section 3.6). The registry, based on the reason code in MP_TCPRST (Section 3.6):
contents of this sub-registry are to to this document, as follows:
+------+-----------------------------+----------------------------+ +------+-----------------------------+----------------------------+
| Code | Meaning | Reference | | Code | Meaning | Reference |
+------+-----------------------------+----------------------------+ +------+-----------------------------+----------------------------+
| 0x00 | Unspecified TCP error | This document, Section 3.6 | | 0x00 | Unspecified TCP error | This document, Section 3.6 |
| 0x01 | MPTCP specific error | This document, Section 3.6 | | 0x01 | MPTCP specific error | This document, Section 3.6 |
| 0x02 | Lack of resources | This document, Section 3.6 | | 0x02 | Lack of resources | This document, Section 3.6 |
| 0x03 | Administratively prohibited | This document, Section 3.6 | | 0x03 | Administratively prohibited | This document, Section 3.6 |
| 0x04 | Too much outstanding data | This document, Section 3.6 | | 0x04 | Too much outstanding data | This document, Section 3.6 |
| 0x05 | Unacceptable performance | This document, Section 3.6 | | 0x05 | Unacceptable performance | This document, Section 3.6 |
skipping to change at page 63, line 37 skipping to change at page 67, line 23
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997, DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>. <https://www.rfc-editor.org/info/rfc2119>.
[RFC6182] Ford, A., Raiciu, C., Handley, M., Barre, S., and J. [RFC6182] Ford, A., Raiciu, C., Handley, M., Barre, S., and J.
Iyengar, "Architectural Guidelines for Multipath TCP Iyengar, "Architectural Guidelines for Multipath TCP
Development", RFC 6182, DOI 10.17487/RFC6182, March 2011, Development", RFC 6182, DOI 10.17487/RFC6182, March 2011,
<https://www.rfc-editor.org/info/rfc6182>. <https://www.rfc-editor.org/info/rfc6182>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>.
[SHS] National Institute of Standards and Technology, "Secure Hash [SHS] National Institute of Standards and Technology, "Secure Hash
Standard", Federal Information Processing Standard Standard", Federal Information Processing Standard
(FIPS) 180-4, August 2015, (FIPS) 180-4, August 2015,
<http://nvlpubs.nist.gov/nistpubs/FIPS/ <http://nvlpubs.nist.gov/nistpubs/FIPS/
NIST.FIPS.180-4.pdf>. NIST.FIPS.180-4.pdf>.
9.2. Informative References 9.2. Informative References
[howhard] Raiciu, C., Paasch, C., Barre, S., Ford, A., Honda, M., [howhard] Raiciu, C., Paasch, C., Barre, S., Ford, A., Honda, M.,
Duchene, F., Bonaventure, O., and M. Handley, "How Hard Duchene, F., Bonaventure, O., and M. Handley, "How Hard
skipping to change at page 66, line 22 skipping to change at page 70, line 13
<https://www.rfc-editor.org/info/rfc6824>. <https://www.rfc-editor.org/info/rfc6824>.
[RFC6897] Scharf, M. and A. Ford, "Multipath TCP (MPTCP) Application [RFC6897] Scharf, M. and A. Ford, "Multipath TCP (MPTCP) Application
Interface Considerations", RFC 6897, DOI 10.17487/RFC6897, Interface Considerations", RFC 6897, DOI 10.17487/RFC6897,
March 2013, <https://www.rfc-editor.org/info/rfc6897>. March 2013, <https://www.rfc-editor.org/info/rfc6897>.
[RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP [RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP
Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014, Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014,
<https://www.rfc-editor.org/info/rfc7413>. <https://www.rfc-editor.org/info/rfc7413>.
[RFC7430] Bagnulo, M., Paasch, C., Gont, F., Bonaventure, O., and C.
Raiciu, "Analysis of Residual Threats and Possible Fixes
for Multipath TCP (MPTCP)", RFC 7430,
DOI 10.17487/RFC7430, July 2015,
<https://www.rfc-editor.org/info/rfc7430>.
[TCPLO] Ramaiah, A., "TCP option space extension", Work [TCPLO] Ramaiah, A., "TCP option space extension", Work
in Progress, March 2012. in Progress, March 2012.
Appendix A. Notes on Use of TCP Options Appendix A. Notes on Use of TCP Options
The TCP option space is limited due to the length of the Data Offset The TCP option space is limited due to the length of the Data Offset
field in the TCP header (4 bits), which defines the TCP header length field in the TCP header (4 bits), which defines the TCP header length
in 32-bit words. With the standard TCP header being 20 bytes, this in 32-bit words. With the standard TCP header being 20 bytes, this
leaves a maximum of 40 bytes for options, and many of these may leaves a maximum of 40 bytes for options, and many of these may
already be used by options such as timestamp and SACK. already be used by options such as timestamp and SACK.
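The 40-byte limit follows directly from the header arithmetic above. The following is a minimal sketch of that calculation; the constant and function names are illustrative only and do not come from the draft.

```python
# Sketch of the option-space arithmetic described in Appendix A.
# All names here are hypothetical helpers, not part of any MPTCP API.

DATA_OFFSET_BITS = 4      # width of the Data Offset field in the TCP header
WORD_BYTES = 4            # Data Offset counts 32-bit words
BASE_HEADER_BYTES = 20    # standard TCP header without options


def max_header_bytes() -> int:
    """Largest TCP header expressible with a 4-bit Data Offset."""
    max_words = (1 << DATA_OFFSET_BITS) - 1   # 15 words
    return max_words * WORD_BYTES             # 60 bytes


def max_option_bytes() -> int:
    """Bytes left for options once the fixed header is subtracted."""
    return max_header_bytes() - BASE_HEADER_BYTES


def remaining_option_bytes(already_used: int) -> int:
    """Space still free after other options (e.g. timestamp, SACK)."""
    if not 0 <= already_used <= max_option_bytes():
        raise ValueError("option bytes out of range")
    return max_option_bytes() - already_used
```

For instance, remaining_option_bytes(10) reports the space left once a 10-byte timestamp option is present.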
skipping to change at page 68, line 38 skipping to change at page 72, line 38
Finally, there are issues with reliable delivery of options. As Finally, there are issues with reliable delivery of options. As
options can also be sent on pure ACKs, these are not reliably sent. options can also be sent on pure ACKs, these are not reliably sent.
This is not an issue for DATA_ACK due to their cumulative nature, but This is not an issue for DATA_ACK due to their cumulative nature, but
may be an issue for ADD_ADDR/REMOVE_ADDR options. Here, it is may be an issue for ADD_ADDR/REMOVE_ADDR options. Here, it is
recommended to send these options redundantly (whether on multiple recommended to send these options redundantly (whether on multiple
paths or on the same path on a number of ACKs -- but interspersed paths or on the same path on a number of ACKs -- but interspersed
with data in order to avoid interpretation as congestion). The cases with data in order to avoid interpretation as congestion). The cases
where options are stripped by middleboxes are discussed in Section 6. where options are stripped by middleboxes are discussed in Section 6.
Appendix B. TCP Fast Open Appendix B. TCP Fast Open and MPTCP
TCP Fast Open (TFO) is an experimental TCP extension, described in TCP Fast Open (TFO) is an experimental TCP extension, described in
[RFC7413], which has been introduced with the objective of gaining [RFC7413], which has been introduced to allow sending data one RTT
one RTT before transmitting data. This is considered a valuable gain earlier than with regular TCP. This is considered a valuable gain as
as very short connections are very common, especially for HTTP very short connections are very common, especially for HTTP request/
request/response schemes. It achieves this by sending the SYN- response schemes. It achieves this by sending the SYN-segment
segment together with data and allowing the server to reply together with the application's data and allowing the listener to
immediately with data after the SYN/ACK. [RFC7413] secures this reply immediately with data after the SYN/ACK. [RFC7413] secures
mechanism, by using a new TCP option that includes a cookie which is this mechanism, by using a new TCP option that includes a cookie
negotiated in a preceding connection. which is negotiated in a preceding connection.
When using TCP Fast Open in conjunction with MPTCP, there are two key When using TCP Fast Open in conjunction with MPTCP, there are two key
points to take into account, detailed hereafter. points to take into account, detailed hereafter.
B.1. TFO cookie request with MPTCP B.1. TFO cookie request with MPTCP
When a TFO client first connects to a server, it cannot immediately When a TFO initiator first connects to a listener, it cannot
include data in the SYN for security reasons [RFC7413]. Instead, it immediately include data in the SYN for security reasons [RFC7413].
requests a cookie that will be used in subsequent connections. This Instead, it requests a cookie that will be used in subsequent
is done with the TCP cookie request/response options, of resp. 2 connections. This is done with the TCP cookie request/response
bytes and 6-18 bytes (depending on the chosen cookie length). options, of respectively 2 bytes and 6-18 bytes (depending on the
chosen cookie length).
TFO and MPTCP can be combined provided that the total length of their TFO and MPTCP can be combined provided that the total length of all
options does not exceed the maximum 40 bytes possible in TCP: the options does not exceed the maximum 40 bytes possible in TCP:
o In the SYN: MPTCP uses a 4-byte MP_CAPABLE option. The o In the SYN: MPTCP uses a 4-byte MP_CAPABLE option. The
MPTCP and TFO options sum up to 6 bytes. With typical TCP-options MPTCP and TFO options sum up to 6 bytes. With typical TCP-options
using up to 19 bytes in the SYN (24 bytes if options are padded at using up to 19 bytes in the SYN (24 bytes if options are padded at
a word boundary), there is enough space to combine the MP_CAPABLE a word boundary), there is enough space to combine the MP_CAPABLE
with the TFO Cookie Request. with the TFO Cookie Request.
o In the SYN+ACK: MPTCP uses a 12-byte MP_CAPABLE option, but o In the SYN+ACK: MPTCP uses a 12-byte MP_CAPABLE option, but
now TFO can be as long as 18 bytes. Since the maximum option now TFO can be as long as 18 bytes. Since the maximum option
length may be exceeded, it is up to the server to solve this by length may be exceeded, it is up to the listener to solve this by
using a shorter cookie. As an example, if we consider that 19 using a shorter cookie. As an example, if we consider that 19
bytes are used for classical TCP options, the maximum possible bytes are used for classical TCP options, the maximum possible
cookie length would be 7 bytes. Note that the same limitation cookie length would be 7 bytes. Note that the same limitation
applies to subsequent connections, for the SYN packet (because the applies to subsequent connections, for the SYN packet (because the
client then echoes back the cookie to the server). Finally, if initiator then echoes back the cookie to the listener). Finally,
the security impact of reducing the cookie size is not deemed if the security impact of reducing the cookie size is not deemed
acceptable, the server can reduce the amount of other TCP-options acceptable, the listener can reduce the amount of other TCP-
by omitting the TCP timestamps (as outlined in Appendix A). options by omitting the TCP timestamps (as outlined in
Appendix A).
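The byte budgets in the two bullets above can be reproduced with a short calculation. This is a sketch under stated assumptions: the breakdown of "typical" SYN options (MSS, window scale, SACK-permitted, timestamps) is one combination that yields the 19 and 24 byte figures quoted in the text, and all names are hypothetical. The result is raw space only; any further constraints that [RFC7413] places on cookie lengths are not modelled.

```python
# Sketch: option-space budget when combining TFO and MPTCP.
# Names are illustrative; the option breakdown is an assumption that
# happens to match the 19-byte / 24-byte figures in the text.

TCP_MAX_OPTION_BYTES = 40
TFO_HEADER_BYTES = 2        # kind + length; a cookie request is header only
TFO_MAX_COOKIE_BYTES = 16   # an 18-byte option minus the 2-byte header

TYPICAL_SYN_OPTIONS = {
    "mss": 4,
    "window_scale": 3,
    "sack_permitted": 2,
    "timestamps": 10,
}


def pad_to_word(length: int) -> int:
    """Round an option length up to the next 32-bit boundary."""
    return (length + 3) // 4 * 4


def syn_fits(other_options: int, mp_capable: int = 4) -> bool:
    """True if MP_CAPABLE plus a TFO cookie request fit in the SYN."""
    total = other_options + mp_capable + TFO_HEADER_BYTES
    return total <= TCP_MAX_OPTION_BYTES


def max_cookie_len(other_options: int, mp_capable: int = 12) -> int:
    """Largest cookie that fits next to MP_CAPABLE and other options."""
    free = TCP_MAX_OPTION_BYTES - other_options - mp_capable - TFO_HEADER_BYTES
    return max(0, min(TFO_MAX_COOKIE_BYTES, free))


unpadded = sum(TYPICAL_SYN_OPTIONS.values())
padded = sum(pad_to_word(n) for n in TYPICAL_SYN_OPTIONS.values())
without_timestamps = unpadded - TYPICAL_SYN_OPTIONS["timestamps"]
```

With 19 bytes of other options, max_cookie_len(19) gives the 7-byte cookie from the example, and dropping the timestamps option frees enough space for a full-length cookie.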
B.2. Data sequence mapping under TFO B.2. Data sequence mapping under TFO
MPTCP uses, in the TCP establishment phase, a key exchange that is MPTCP uses, in the TCP establishment phase, a key exchange that is
used to generate the Initial Data Sequence Numbers (IDSNs). In used to generate the Initial Data Sequence Numbers (IDSNs). In
particular, the SYN with MP_CAPABLE occupies the first octet of the particular, the SYN with MP_CAPABLE occupies the first octet of the
data sequence space. With TFO, one way to handle the data sent data sequence space. With TFO, one way to handle the data sent
together with the SYN would be to consider an implicit DSS mapping together with the SYN would be to consider an implicit DSS mapping
that covers that SYN segment (since there is not enough space in the that covers that SYN segment (since there is not enough space in the
SYN to include a DSS option). The problem with that approach is that SYN to include a DSS option). The problem with that approach is that
if a middlebox modifies the TFO data, this will not be noticed by if a middlebox modifies the TFO data, this will not be noticed by
MPTCP because of the absence of a DSS-checksum. For example, a TCP MPTCP because of the absence of a DSS-checksum. For example, a TCP
(but not MPTCP)-aware middlebox could insert bytes at the beginning (but not MPTCP)-aware middlebox could insert bytes at the beginning
of the stream and adapt the TCP checksum and sequence numbers of the stream and adapt the TCP checksum and sequence numbers
accordingly. With an implicit mapping, this would give to client and accordingly. With an implicit mapping, this would give to initiator
server a different view on the DSS-mapping, with no way to detect and listener a different view on the DSS-mapping, with no way to
this inconsistency as the DSS checksum is not present. detect this inconsistency as the DSS checksum is not present.
To solve this, the TFO data should not be considered part of the Data To solve this, the TFO data must not be considered part of the Data
Sequence Number space: the SYN with MP_CAPABLE still occupies the Sequence Number space: the SYN with MP_CAPABLE still occupies the
first octet of data sequence space, but then the first non-TFO data first octet of data sequence space, but then the first non-TFO data
byte occupies the second octet. This guarantees that, if the use of byte occupies the second octet. This guarantees that, if the use of
DSS-checksum is negotiated, all data in the data sequence number DSS-checksum is negotiated, all data in the data sequence number
space is checksummed. We also note that this does not entail a loss space is checksummed. We also note that this does not entail a loss
of functionality, because TFO-data is always sent when only one path of functionality, because TFO-data is always only sent on the initial
is active. subflow before any attempt to create additional subflows.
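The numbering rule above can be illustrated with a small sketch. It assigns each payload byte an offset relative to the IDSN, with TFO bytes receiving none. The function name and the use of None to mark unmapped bytes are choices made for this illustration only.

```python
# Sketch: data sequence numbering when the SYN carries TFO data.
# Offsets are relative to the IDSN. Offset 0 is taken by the SYN with
# MP_CAPABLE, and TFO bytes are outside the data sequence space.
from typing import List, Optional

SYN_DSN_OFFSET = 0


def dsn_offsets(tfo_len: int, post_len: int) -> List[Optional[int]]:
    """Return one entry per payload byte sent by a host.

    The first tfo_len entries (bytes carried in the SYN) are None.
    The following post_len entries start at offset 1, directly after
    the octet occupied by the SYN.
    """
    if tfo_len < 0 or post_len < 0:
        raise ValueError("lengths must be non-negative")
    unmapped: List[Optional[int]] = [None] * tfo_len
    mapped: List[Optional[int]] = [
        SYN_DSN_OFFSET + 1 + i for i in range(post_len)
    ]
    return unmapped + mapped


offsets = dsn_offsets(tfo_len=20, post_len=3)
```

Here the 20 TFO bytes carry no data sequence number, and the first byte sent after the handshake sits at offset 1.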
B.3. Connection establishment examples B.3. Connection establishment examples
The following shows a few examples of possible TFO+MPTCP The following shows a few examples of possible TFO+MPTCP
establishment scenarios. establishment scenarios.
Before a client can send data together with the SYN, it must request Before an initiator can send data together with the SYN, it must
a cookie from the server, as shown in Figure 18. This is done request a cookie from the listener, as shown in Figure 18. This
by simply combining the TFO and MPTCP options. is done by simply combining the TFO and MPTCP options.
client server initiator listener
| | | |
| S 0(0) <MP_CAPABLE>, <TFO cookie request> | | S Seq=0(Length=0) <MP_CAPABLE>, <TFO cookie request> |
| -----------------------------------------------------------> | | -----------------------------------------------------------> |
| | | |
| S. 0(0) ack 1 <MP_CAPABLE>, <TFO cookie> | | S. 0(0) ack 1 <MP_CAPABLE>, <TFO cookie> |
| <----------------------------------------------------------- | | <----------------------------------------------------------- |
| | | |
| . 0(0) ack 1 <MP_CAPABLE> | | . 0(0) ack 1 <MP_CAPABLE> |
| -----------------------------------------------------------> | | -----------------------------------------------------------> |
| | | |
Figure 18: Cookie request Figure 18: Cookie request - sequence number and length are annotated
as Seq(Length) and used hereafter in the figures.
Once this is done, the received cookie can be used for TFO, as shown Once this is done, the received cookie can be used for TFO, as shown
in Figure 19. In this example, the client first sends 20 in Figure 19. In this example, the initiator first sends 20
bytes in the SYN. The server immediately replies with 100 bytes bytes in the SYN. The listener immediately replies with 100 bytes
following the SYN-ACK upon which the client replies with 20 more following the SYN-ACK upon which the initiator replies with 20 more
bytes. Note that the last segment in the figure has a TCP sequence bytes. Note that the last segment in the figure has a TCP sequence
number of 21, while the DSS subflow sequence number is 1 (because the number of 21, while the DSS subflow sequence number is 1 (because the
TFO data is not part of the data sequence number space, as explained TFO data is not part of the data sequence number space, as explained
in Appendix B.2). in Appendix B.2).
client server initiator listener
| | | |
| S 0(20) <MP_CAPABLE>, <TFO cookie> | | S 0(20) <MP_CAPABLE>, <TFO cookie> |
| -----------------------------------------------------------> | | -----------------------------------------------------------> |
| | | |
| S. 0(0) ack 21 <MP_CAPABLE> | | S. 0(0) ack 21 <MP_CAPABLE> |
| <----------------------------------------------------------- | | <----------------------------------------------------------- |
| | | |
| . 1(100) ack 21 <DSS ack=1 seq=1 ssn=1 dlen=100> | | . 1(100) ack 21 <DSS ack=1 seq=1 ssn=1 dlen=100> |
| <----------------------------------------------------------- | | <----------------------------------------------------------- |
| | | |
| . 21(0) ack 1 <MP_CAPABLE> | | . 21(0) ack 1 <MP_CAPABLE> |
| -----------------------------------------------------------> | | -----------------------------------------------------------> |
| | | |
| . 21(20) ack 101 <DSS ack=101 seq=1 ssn=1 dlen=20> | | . 21(20) ack 101 <DSS ack=101 seq=1 ssn=1 dlen=20> |
| -----------------------------------------------------------> | | -----------------------------------------------------------> |
| | | |
Figure 19: The server supports TFO Figure 19: The listener supports TFO
In Figure 20, the server does not support TFO. The client In Figure 20, the listener does not support TFO. The
detects that no state is created in the server (as no data is acked), initiator detects that no state is created in the listener (as no
and now sends the MP_CAPABLE in the third ack, in order for the data is acked), and now sends the MP_CAPABLE in the third ack, in
server to build its MPTCP context at the end of the establishment. order for the listener to build its MPTCP context at the end of the
Now, the TFO data, retransmitted, becomes part of the data sequence establishment. Now, the TFO data, retransmitted, becomes part of the
mapping because it is effectively sent (in fact re-sent) after the data sequence mapping because it is effectively sent (in fact re-
establishment. sent) after the establishment.
client server initiator listener
| | | |
| S 0(20) <MP_CAPABLE>, <TFO cookie> | | S 0(20) <MP_CAPABLE>, <TFO cookie> |
| -----------------------------------------------------------> | | -----------------------------------------------------------> |
| | | |
| S. 0(0) ack 1 <MP_CAPABLE> | | S. 0(0) ack 1 <MP_CAPABLE> |
| <----------------------------------------------------------- | | <----------------------------------------------------------- |
| | | |
| . 1(0) ack 1 <MP_CAPABLE> | | . 1(0) ack 1 <MP_CAPABLE> |
| -----------------------------------------------------------> | | -----------------------------------------------------------> |
| | | |
| . 1(20) ack 1 <DSS ack=1 seq=1 ssn=1 dlen=20> | | . 1(20) ack 1 <DSS ack=1 seq=1 ssn=1 dlen=20> |
| -----------------------------------------------------------> | | -----------------------------------------------------------> |
| | | |
| . 0(0) ack 21 <DSS ack=21 seq=1 ssn=1 dlen=0> | | . 0(0) ack 21 <DSS ack=21 seq=1 ssn=1 dlen=0> |
| <----------------------------------------------------------- | | <----------------------------------------------------------- |
| | | |
Figure 20: The server does not support TFO Figure 20: The listener does not support TFO
It is also possible that the server acknowledges only part of the TFO It is also possible that the listener acknowledges only part of the
data, as illustrated in Figure 21. The client will simply TFO data, as illustrated in Figure 21. The initiator will
retransmit the missing data together with a DSS-mapping. simply retransmit the missing data together with a DSS-mapping.
client server initiator listener
| | | |
| S 0(1000) <MP_CAPABLE>, <TFO cookie> | | S 0(1000) <MP_CAPABLE>, <TFO cookie> |
| -----------------------------------------------------------> | | -----------------------------------------------------------> |
| | | |
| S. 0(0) ack 501 <MP_CAPABLE> | | S. 0(0) ack 501 <MP_CAPABLE> |
| <----------------------------------------------------------- | | <----------------------------------------------------------- |
| | | |
| . 501(0) ack 1 <MP_CAPABLE> | | . 501(0) ack 1 <MP_CAPABLE> |
| -----------------------------------------------------------> | | -----------------------------------------------------------> |
| | | |
| . 501(500) ack 1 <DSS ack=1 seq=1 ssn=1 dlen=500> | | . 501(500) ack 1 <DSS ack=1 seq=1 ssn=1 dlen=500> |
| -----------------------------------------------------------> | | -----------------------------------------------------------> |
| | | |
Figure 21: Partial data acknowledgement Figure 21: Partial data acknowledgement
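The three data-carrying scenarios in Figures 19, 20 and 21 differ only in how much of the TFO data the SYN/ACK acknowledges. The sketch below derives, from that acknowledgement number, what the initiator's first mapped segment looks like. It uses the relative sequence numbers shown in the figures; the class and function names are hypothetical.

```python
# Sketch: what the initiator sends after the SYN/ACK, given how much of
# the TFO data was acknowledged. Sequence numbers are relative, as in
# the figures. Names are illustrative only.
from dataclasses import dataclass


@dataclass
class FirstMappedSegment:
    tcp_seq: int        # relative TCP sequence number of the segment
    dss_seq: int        # relative data sequence number in the DSS option
    dss_ssn: int        # relative subflow sequence number in the DSS option
    retransmit: bytes   # TFO bytes that must be re-sent under a mapping


def after_synack(tfo_data: bytes, synack_ack: int) -> FirstMappedSegment:
    """Plan the first DSS-mapped segment from the initiator."""
    accepted = synack_ack - 1   # the SYN itself consumes one sequence number
    if not 0 <= accepted <= len(tfo_data):
        raise ValueError("acknowledgement outside the data that was sent")
    return FirstMappedSegment(
        tcp_seq=synack_ack,     # continue right after the accepted bytes
        dss_seq=1,              # accepted TFO bytes are outside the DSN space
        dss_ssn=1,
        retransmit=tfo_data[accepted:],
    )


fig19 = after_synack(b"x" * 20, synack_ack=21)     # all TFO data accepted
fig20 = after_synack(b"x" * 20, synack_ack=1)      # listener without TFO
fig21 = after_synack(b"x" * 1000, synack_ack=501)  # partial acknowledgement
```

In the Figure 19 case nothing is retransmitted and new data starts at TCP sequence number 21 with a DSS mapping starting at 1. In the other two cases the unacknowledged bytes are re-sent and thereby become part of the data sequence mapping.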
Appendix C. Control Blocks Appendix C. Control Blocks
Conceptually, an MPTCP connection can be represented as an MPTCP Conceptually, an MPTCP connection can be represented as an MPTCP
control block that contains several variables that track the progress protocol control block (PCB) that contains several variables that
and the state of the MPTCP connection and a set of linked TCP control track the progress and the state of the MPTCP connection and a set of
blocks that correspond to the subflows that have been established. linked TCP control blocks that correspond to the subflows that have
been established.
RFC 793 [RFC0793] specifies several state variables. Whenever RFC 793 [RFC0793] specifies several state variables. Whenever
possible, we reuse the same terminology as RFC 793 to describe the possible, we reuse the same terminology as RFC 793 to describe the
state variables that are maintained by MPTCP. state variables that are maintained by MPTCP.
C.1. MPTCP Control Block C.1. MPTCP Control Block
The MPTCP control block contains the following variables per The MPTCP control block contains the following variables per
connection. connection.
C.1.1. Authentication and Metadata C.1.1. Authentication and Metadata
Local.Token (32 bits): This is the token chosen by the local host on Local.Token (32 bits): This is the token chosen by the local host on
this MPTCP connection. The token MUST be unique among all this MPTCP connection. The token must be unique among all
established MPTCP connections, generated from the local key. established MPTCP connections, generated from the local key.
Local.Key (64 bits): This is the key sent by the local host on this Local.Key (64 bits): This is the key sent by the local host on this
MPTCP connection. MPTCP connection.
Remote.Token (32 bits): This is the token chosen by the remote host Remote.Token (32 bits): This is the token chosen by the remote host
on this MPTCP connection, generated from the remote key. on this MPTCP connection, generated from the remote key.
Remote.Key (64 bits): This is the key chosen by the remote host on Remote.Key (64 bits): This is the key chosen by the remote host on
this MPTCP connection. this MPTCP connection.
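The relationship between the keys and tokens listed above can be sketched as follows. This is an illustration under a loudly stated assumption: the text here only says that a token is generated from the corresponding key, so the use of SHA-256 with the most significant 32 bits as the token (and the least significant 64 bits as the IDSN) is assumed for this example and should be checked against the normative sections of the document. All function names are hypothetical.

```python
# Sketch: deriving a 32-bit token (and a 64-bit IDSN) from a 64-bit key.
# ASSUMPTION: truncated SHA-256 is used; this excerpt does not state the
# hash function, so treat this as illustrative only.
import hashlib
import os
from typing import Set, Tuple


def token_and_idsn(key: bytes) -> Tuple[int, int]:
    """Return (token, idsn) derived from an 8-byte key."""
    if len(key) != 8:
        raise ValueError("key must be 64 bits")
    digest = hashlib.sha256(key).digest()
    token = int.from_bytes(digest[:4], "big")    # most significant 32 bits
    idsn = int.from_bytes(digest[-8:], "big")    # least significant 64 bits
    return token, idsn


def pick_local_key(tokens_in_use: Set[int]) -> Tuple[bytes, int]:
    """Draw random keys until the derived token is locally unique."""
    while True:
        key = os.urandom(8)
        token, _ = token_and_idsn(key)
        if token not in tokens_in_use:
            return key, token


example_key = bytes(range(8))
example_token, example_idsn = token_and_idsn(example_key)
```

The retry loop in pick_local_key reflects the uniqueness requirement on Local.Token: a colliding token is avoided by choosing a different key.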
skipping to change at page 73, line 45 skipping to change at page 77, line 48
used to specify the DATA_ACK that is sent in the DSS option on all used to specify the DATA_ACK that is sent in the DSS option on all
subflows. subflows.
RCV.WND (32 bits with RFC 1323, 16 bits otherwise): This is the RCV.WND (32 bits with RFC 1323, 16 bits otherwise): This is the
connection-level receive window, which is the maximum of the connection-level receive window, which is the maximum of the
RCV.WND on all the subflows. RCV.WND on all the subflows.
C.2. TCP Control Blocks C.2. TCP Control Blocks
The MPTCP control block also contains a list of the TCP control The MPTCP control block also contains a list of the TCP control
blocks that are associated to the MPTCP connection. blocks that are associated with the MPTCP connection.
Note that the TCP control block on the TCP subflows does not contain Note that the TCP control block on the TCP subflows does not contain
the RCV.WND and SND.WND state variables as these are maintained at the RCV.WND and SND.WND state variables as these are maintained at
the MPTCP connection level and not at the subflow level. the MPTCP connection level and not at the subflow level.
Inside each TCP control block, the following state variables are Inside each TCP control block, the following state variables are
defined. defined.
C.2.1. Sending Side C.2.1. Sending Side
 End of changes. 114 change blocks. 
437 lines changed or deleted 566 lines changed or added
