draft-ietf-mptcp-rfc6824bis-04.txt   draft-ietf-mptcp-rfc6824bis-05.txt 
Internet Engineering Task Force A. Ford Internet Engineering Task Force A. Ford
Internet-Draft Pexip Internet-Draft Pexip
Obsoletes: 6824 (if approved) C. Raiciu Obsoletes: 6824 (if approved) C. Raiciu
Intended status: Experimental U. Politechnica of Bucharest Intended status: Experimental U. Politechnica of Bucharest
Expires: September 8, 2015 M. Handley Expires: July 15, 2016 M. Handley
U. College London U. College London
O. Bonaventure O. Bonaventure
U. catholique de Louvain U. catholique de Louvain
March 7, 2015 C. Paasch
Apple, Inc.
January 12, 2016
TCP Extensions for Multipath Operation with Multiple Addresses TCP Extensions for Multipath Operation with Multiple Addresses
draft-ietf-mptcp-rfc6824bis-04 draft-ietf-mptcp-rfc6824bis-05
Abstract Abstract
TCP/IP communication is currently restricted to a single path per TCP/IP communication is currently restricted to a single path per
connection, yet multiple paths often exist between peers. The connection, yet multiple paths often exist between peers. The
simultaneous use of these multiple paths for a TCP/IP session would simultaneous use of these multiple paths for a TCP/IP session would
improve resource usage within the network and, thus, improve user improve resource usage within the network and, thus, improve user
experience through higher throughput and improved resilience to experience through higher throughput and improved resilience to
network failure. network failure.
Multipath TCP provides the ability to simultaneously use multiple Multipath TCP provides the ability to simultaneously use multiple
paths between peers. This document presents a set of extensions to paths between peers. This document presents a set of extensions to
traditional TCP to support multipath operation. The protocol offers traditional TCP to support multipath operation. The protocol offers
the same type of service to applications as TCP (i.e., reliable the same type of service to applications as TCP (i.e., reliable
bytestream), and it provides the components necessary to establish bytestream), and it provides the components necessary to establish
and use multiple TCP flows across potentially disjoint paths. and use multiple TCP flows across potentially disjoint paths.
This document obsoletes RFC6824 [5] through clarifications and This document specifies v1 of Multipath TCP, obsoleting v0 as
modifications, primarily driven by deployment experience. specified in RFC6824 [5] through clarifications and modifications
primarily driven by deployment experience.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 8, 2015.
This Internet-Draft will expire on July 15, 2016.
Copyright Notice Copyright Notice
Copyright (c) 2015 IETF Trust and the persons identified as the Copyright (c) 2016 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
skipping to change at page 2, line 41 skipping to change at page 2, line 44
2.2. Associating a New Subflow with an Existing MPTCP 2.2. Associating a New Subflow with an Existing MPTCP
Connection . . . . . . . . . . . . . . . . . . . . . . . . 9 Connection . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3. Informing the Other Host about Another Potential 2.3. Informing the Other Host about Another Potential
Address . . . . . . . . . . . . . . . . . . . . . . . . . 10 Address . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.4. Data Transfer Using MPTCP . . . . . . . . . . . . . . . . 11 2.4. Data Transfer Using MPTCP . . . . . . . . . . . . . . . . 11
2.5. Requesting a Change in a Path's Priority . . . . . . . . . 11 2.5. Requesting a Change in a Path's Priority . . . . . . . . . 11
2.6. Closing an MPTCP Connection . . . . . . . . . . . . . . . 12 2.6. Closing an MPTCP Connection . . . . . . . . . . . . . . . 12
2.7. Notable Features . . . . . . . . . . . . . . . . . . . . . 12 2.7. Notable Features . . . . . . . . . . . . . . . . . . . . . 12
3. MPTCP Protocol . . . . . . . . . . . . . . . . . . . . . . . . 12 3. MPTCP Protocol . . . . . . . . . . . . . . . . . . . . . . . . 12
3.1. Connection Initiation . . . . . . . . . . . . . . . . . . 14 3.1. Connection Initiation . . . . . . . . . . . . . . . . . . 14
3.2. Starting a New Subflow . . . . . . . . . . . . . . . . . . 18 3.2. Starting a New Subflow . . . . . . . . . . . . . . . . . . 19
3.3. General MPTCP Operation . . . . . . . . . . . . . . . . . 23 3.3. General MPTCP Operation . . . . . . . . . . . . . . . . . 24
3.3.1. Data Sequence Mapping . . . . . . . . . . . . . . . . 25 3.3.1. Data Sequence Mapping . . . . . . . . . . . . . . . . 26
3.3.2. Data Acknowledgments . . . . . . . . . . . . . . . . . 28 3.3.2. Data Acknowledgments . . . . . . . . . . . . . . . . . 29
3.3.3. Closing a Connection . . . . . . . . . . . . . . . . . 29 3.3.3. Closing a Connection . . . . . . . . . . . . . . . . . 30
3.3.4. Receiver Considerations . . . . . . . . . . . . . . . 30 3.3.4. Receiver Considerations . . . . . . . . . . . . . . . 31
3.3.5. Sender Considerations . . . . . . . . . . . . . . . . 32 3.3.5. Sender Considerations . . . . . . . . . . . . . . . . 33
3.3.6. Reliability and Retransmissions . . . . . . . . . . . 32 3.3.6. Reliability and Retransmissions . . . . . . . . . . . 33
3.3.7. Congestion Control Considerations . . . . . . . . . . 34 3.3.7. Congestion Control Considerations . . . . . . . . . . 35
3.3.8. Subflow Policy . . . . . . . . . . . . . . . . . . . . 34 3.3.8. Subflow Policy . . . . . . . . . . . . . . . . . . . . 35
3.4. Address Knowledge Exchange (Path Management) . . . . . . . 36 3.4. Address Knowledge Exchange (Path Management) . . . . . . . 37
3.4.1. Address Advertisement . . . . . . . . . . . . . . . . 37 3.4.1. Address Advertisement . . . . . . . . . . . . . . . . 38
3.4.2. Remove Address . . . . . . . . . . . . . . . . . . . . 40 3.4.2. Remove Address . . . . . . . . . . . . . . . . . . . . 41
3.5. Fast Close . . . . . . . . . . . . . . . . . . . . . . . . 41 3.5. Fast Close . . . . . . . . . . . . . . . . . . . . . . . . 42
3.6. Subflow Reset . . . . . . . . . . . . . . . . . . . . . . 42 3.6. Subflow Reset . . . . . . . . . . . . . . . . . . . . . . 43
3.7. Fallback . . . . . . . . . . . . . . . . . . . . . . . . . 44 3.7. MPTCP Experimental Option . . . . . . . . . . . . . . . . 45
3.8. Error Handling . . . . . . . . . . . . . . . . . . . . . . 47 3.8. Fallback . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.9. Heuristics . . . . . . . . . . . . . . . . . . . . . . . . 48 3.9. Error Handling . . . . . . . . . . . . . . . . . . . . . . 50
3.9.1. Port Usage . . . . . . . . . . . . . . . . . . . . . . 48 3.10. Heuristics . . . . . . . . . . . . . . . . . . . . . . . . 50
3.9.2. Delayed Subflow Start and Subflow Symmetry . . . . . . 48 3.10.1. Port Usage . . . . . . . . . . . . . . . . . . . . . . 51
3.9.3. Failure Handling . . . . . . . . . . . . . . . . . . . 50 3.10.2. Delayed Subflow Start and Subflow Symmetry . . . . . . 51
4. Semantic Issues . . . . . . . . . . . . . . . . . . . . . . . 50 3.10.3. Failure Handling . . . . . . . . . . . . . . . . . . . 52
5. Security Considerations . . . . . . . . . . . . . . . . . . . 52 4. Semantic Issues . . . . . . . . . . . . . . . . . . . . . . . 53
6. Interactions with Middleboxes . . . . . . . . . . . . . . . . 54 5. Security Considerations . . . . . . . . . . . . . . . . . . . 54
7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 57 6. Interactions with Middleboxes . . . . . . . . . . . . . . . . 57
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 57 7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 60
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 60 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 60
9.1. Normative References . . . . . . . . . . . . . . . . . . . 60 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 62
9.2. Informative References . . . . . . . . . . . . . . . . . . 60 9.1. Normative References . . . . . . . . . . . . . . . . . . . 62
Appendix A. Notes on Use of TCP Options . . . . . . . . . . . . . 62 9.2. Informative References . . . . . . . . . . . . . . . . . . 63
Appendix B. Control Blocks . . . . . . . . . . . . . . . . . . . 63 Appendix A. Notes on Use of TCP Options . . . . . . . . . . . . . 65
B.1. MPTCP Control Block . . . . . . . . . . . . . . . . . . . 64 Appendix B. Control Blocks . . . . . . . . . . . . . . . . . . . 67
B.1.1. Authentication and Metadata . . . . . . . . . . . . . 64 B.1. MPTCP Control Block . . . . . . . . . . . . . . . . . . . 67
B.1.2. Sending Side . . . . . . . . . . . . . . . . . . . . . 64 B.1.1. Authentication and Metadata . . . . . . . . . . . . . 67
B.1.3. Receiving Side . . . . . . . . . . . . . . . . . . . . 65 B.1.2. Sending Side . . . . . . . . . . . . . . . . . . . . . 68
B.2. TCP Control Blocks . . . . . . . . . . . . . . . . . . . . 65 B.1.3. Receiving Side . . . . . . . . . . . . . . . . . . . . 68
B.2.1. Sending Side . . . . . . . . . . . . . . . . . . . . . 65 B.2. TCP Control Blocks . . . . . . . . . . . . . . . . . . . . 68
B.2.2. Receiving Side . . . . . . . . . . . . . . . . . . . . 65 B.2.1. Sending Side . . . . . . . . . . . . . . . . . . . . . 69
Appendix C. Finite State Machine . . . . . . . . . . . . . . . . 66 B.2.2. Receiving Side . . . . . . . . . . . . . . . . . . . . 69
Appendix C. Finite State Machine . . . . . . . . . . . . . . . . 69
1. Introduction 1. Introduction
Multipath TCP (MPTCP) is a set of extensions to regular TCP [1] to Multipath TCP (MPTCP) is a set of extensions to regular TCP [1] to
provide a Multipath TCP [2] service, which enables a transport provide a Multipath TCP [2] service, which enables a transport
connection to operate across multiple paths simultaneously. This connection to operate across multiple paths simultaneously. This
document presents the protocol changes required to add multipath document presents the protocol changes required to add multipath
capability to TCP; specifically, those for signaling and setting up capability to TCP; specifically, those for signaling and setting up
multiple paths ("subflows"), managing these subflows, reassembly of multiple paths ("subflows"), managing these subflows, reassembly of
data, and termination of sessions. This is not the only information data, and termination of sessions. This is not the only information
skipping to change at page 4, line 32 skipping to change at page 4, line 32
o Congestion control [6] presents a safe congestion control o Congestion control [6] presents a safe congestion control
algorithm for coupling the behavior of the multiple paths in order algorithm for coupling the behavior of the multiple paths in order
to "do no harm" to other network users. to "do no harm" to other network users.
o Application considerations [7] discusses what impact MPTCP will o Application considerations [7] discusses what impact MPTCP will
have on applications, what applications will want to do with have on applications, what applications will want to do with
MPTCP, and as a consequence of these factors, what API extensions MPTCP, and as a consequence of these factors, what API extensions
an MPTCP implementation should present. an MPTCP implementation should present.
This document is an update to, and obsoletes, the first specification This document is an update to, and obsoletes, the v0 specification of
of Multipath TCP [5]. Changes are limited to behavioural Multipath TCP [5]. This document specifies MPTCP v1, which is not
clarifications and new messages that can coexist with earlier backward compatible with MPTCP v0. This document additionally
implementations. defines version negotiation procedures for implementations that
support both versions.
1.1. Design Assumptions 1.1. Design Assumptions
In order to limit the potentially huge design space, the working In order to limit the potentially huge design space, the working
group imposed two key constraints on the Multipath TCP design group imposed two key constraints on the Multipath TCP design
presented in this document: presented in this document:
o It must be backwards-compatible with current, regular TCP, to o It must be backwards-compatible with current, regular TCP, to
increase its chances of deployment. increase its chances of deployment.
skipping to change at page 9, line 11 skipping to change at page 9, line 11
during the lifetime of the Multipath TCP connection. during the lifetime of the Multipath TCP connection.
All MPTCP operations are signaled with a TCP option -- a single All MPTCP operations are signaled with a TCP option -- a single
numerical type for MPTCP, with "sub-types" for each MPTCP message. numerical type for MPTCP, with "sub-types" for each MPTCP message.
What follows is a summary of the purpose and rationale of these What follows is a summary of the purpose and rationale of these
messages. messages.
2.1. Initiating an MPTCP Connection 2.1. Initiating an MPTCP Connection
This is the same signaling as for initiating a normal TCP connection, This is the same signaling as for initiating a normal TCP connection,
but the SYN, SYN/ACK, and ACK packets also carry the MP_CAPABLE but the SYN, SYN/ACK, and initial ACK packets also carry the
option. This is variable length and serves multiple purposes. MP_CAPABLE option. This option is variable length and serves
Firstly, it verifies whether the remote host supports Multipath TCP; multiple purposes. Firstly, it verifies whether the remote host
secondly, this option allows the hosts to exchange some information supports Multipath TCP; secondly, this option allows the hosts to
to authenticate the establishment of additional subflows. Further exchange some information to authenticate the establishment of
details are given in Section 3.1. additional subflows. Further details are given in Section 3.1.
Host A Host B Host A Host B
------ ------ ------ ------
MP_CAPABLE -> MP_CAPABLE ->
[A's key, flags] [flags]
<- MP_CAPABLE <- MP_CAPABLE
[B's key, flags] [B's key, flags]
ACK + MP_CAPABLE -> ACK + MP_CAPABLE (+ data) ->
[A's key, B's key, flags] [A's key, B's key, flags, (data-level details)]
2.2. Associating a New Subflow with an Existing MPTCP Connection 2.2. Associating a New Subflow with an Existing MPTCP Connection
The exchange of keys in the MP_CAPABLE handshake provides material The exchange of keys in the MP_CAPABLE handshake provides material
that can be used to authenticate the endpoints when new subflows will that can be used to authenticate the endpoints when new subflows will
be set up. Additional subflows begin in the same way as initiating a be set up. Additional subflows begin in the same way as initiating a
normal TCP connection, but the SYN, SYN/ACK, and ACK packets also normal TCP connection, but the SYN, SYN/ACK, and ACK packets also
carry the MP_JOIN option. carry the MP_JOIN option.
Host A initiates a new subflow between one of its addresses and one Host A initiates a new subflow between one of its addresses and one
skipping to change at page 10, line 40 skipping to change at page 10, line 40
host the availability of an address without establishing a new host the availability of an address without establishing a new
subflow, for example, when a NAT prevents setup in one direction. In subflow, for example, when a NAT prevents setup in one direction. In
the example below, Host A informs Host B about its alternative IP the example below, Host A informs Host B about its alternative IP
address/port pair (IP#-A2). Host B may later send an MP_JOIN to this address/port pair (IP#-A2). Host B may later send an MP_JOIN to this
new address. This option contains a HMAC to authenticate the address new address. This option contains a HMAC to authenticate the address
as having been sent from the originator of the connection. Further as having been sent from the originator of the connection. Further
details are in Section 3.4.1. details are in Section 3.4.1.
Host A Host B Host A Host B
------ ------ ------ ------
ADD_ADDR2 -> ADD_ADDR ->
[IP#-A2, [IP#-A2,
IP#-A2's Address ID, IP#-A2's Address ID,
HMAC of IP#-A2] HMAC of IP#-A2]
There is a corresponding signal for address removal, making use of There is a corresponding signal for address removal, making use of
the Address ID that is signaled in the add address handshake. the Address ID that is signaled in the add address handshake.
Further details in Section 3.4.2. Further details in Section 3.4.2.
Host A Host B Host A Host B
------ ------ ------ ------
skipping to change at page 12, line 31 skipping to change at page 12, line 31
2.7. Notable Features 2.7. Notable Features
It is worth highlighting that MPTCP's signaling has been designed It is worth highlighting that MPTCP's signaling has been designed
with several key requirements in mind: with several key requirements in mind:
o To cope with NATs on the path, addresses are referred to by o To cope with NATs on the path, addresses are referred to by
Address IDs, in case the IP packet's source address gets changed Address IDs, in case the IP packet's source address gets changed
by a NAT. Setting up a new TCP flow is not possible if the by a NAT. Setting up a new TCP flow is not possible if the
passive opener is behind a NAT; to allow subflows to be created passive opener is behind a NAT; to allow subflows to be created
when either end is behind a NAT, MPTCP uses the ADD_ADDR2 message. when either end is behind a NAT, MPTCP uses the ADD_ADDR message.
o MPTCP falls back to ordinary TCP if MPTCP operation is not o MPTCP falls back to ordinary TCP if MPTCP operation is not
possible, for example, if one host is not MPTCP capable or if a possible, for example, if one host is not MPTCP capable or if a
middlebox alters the payload. middlebox alters the payload.
o To meet the threats identified in [10], the following steps are o To meet the threats identified in [10], the following steps are
taken: keys are sent in the clear in the MP_CAPABLE messages; taken: keys are sent in the clear in the MP_CAPABLE messages;
MP_JOIN messages are secured with HMAC-SHA1 ([11], [4]) using MP_JOIN messages are secured with HMAC-SHA1 ([11], [4]) using
those keys; and standard TCP validity checks are made on the other those keys; and standard TCP validity checks are made on the other
messages (ensuring sequence numbers are in-window [12]). messages (ensuring sequence numbers are in-window [12]).
skipping to change at page 14, line 13 skipping to change at page 14, line 13
the recommendations in [15]. the recommendations in [15].
3.1. Connection Initiation 3.1. Connection Initiation
Connection initiation begins with a SYN, SYN/ACK, ACK exchange on a Connection initiation begins with a SYN, SYN/ACK, ACK exchange on a
single path. Each packet contains the Multipath Capable (MP_CAPABLE) single path. Each packet contains the Multipath Capable (MP_CAPABLE)
MPTCP option (Figure 4). This option declares its sender is capable MPTCP option (Figure 4). This option declares its sender is capable
of performing Multipath TCP and wishes to do so on this particular of performing Multipath TCP and wishes to do so on this particular
connection. connection.
This option is used to declare the 64-bit key that the sender has The MP_CAPABLE exchange in this specification (v1) is different to
generated for this MPTCP connection. This key is used to that specified in v0 [5]. If a host supports multiple versions of
MPTCP, the sender of the MP_CAPABLE option SHOULD signal the highest
version number it supports. The passive opener, on receipt of this,
will signal the version number it wishes to use, which MUST be equal
to or lower than the version number indicated in the initial
MP_CAPABLE. Given the SYN exchange is different between v1 and v0
the exchange cannot be immediately downgraded, and therefore if the
far end has requested a lower version then the initiator SHOULD
respond with an ACK without any MP_CAPABLE option, to fall back to
regular TCP. If the initiator supports the requested version, on
future connections to the target host, the initiator MAY cache the
version preference. Alternatively, the initiator MAY close the
connection with a TCP RST and immediately re-establish with the
requested version of MPTCP.
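
The version negotiation rules above can be sketched as follows. This is an illustrative sketch only, not part of the specification: the names `Action`, `offered_version`, `listener_version`, `initiator_action`, and the `retry_now` and `cache` parameters are hypothetical. The handling of a reply that carries a higher version than was offered is an assumption (the text only says the reply MUST be equal or lower), and is treated here as a fallback to regular TCP.

```python
from enum import Enum
from typing import Dict, Optional, Set


class Action(Enum):
    CONTINUE_MPTCP = "continue with the offered MPTCP version"
    FALLBACK_TCP = "send ACK without MP_CAPABLE (regular TCP)"
    RST_AND_RETRY = "send TCP RST and reconnect with the requested version"


def offered_version(supported: Set[int]) -> int:
    """The initiator SHOULD signal the highest version it supports."""
    return max(supported)


def listener_version(offered: int, supported: Set[int]) -> Optional[int]:
    """Pick a version equal to or lower than the offered one, or None."""
    usable = [v for v in supported if v <= offered]
    return max(usable) if usable else None


def initiator_action(offered: int, answered: int, supported: Set[int],
                     retry_now: bool = False,
                     cache: Optional[Dict[str, int]] = None,
                     peer: Optional[str] = None) -> Action:
    """Decide what the initiator does with the listener's answer."""
    if answered == offered:
        return Action.CONTINUE_MPTCP
    if answered < offered and answered in supported:
        # MAY cache the preference for future connections to this peer.
        if cache is not None and peer is not None:
            cache[peer] = answered
        # MAY reset and retry at once; otherwise SHOULD fall back to TCP.
        return Action.RST_AND_RETRY if retry_now else Action.FALLBACK_TCP
    # Unsupported lower version, or an invalid higher one (assumption).
    return Action.FALLBACK_TCP
```

In this sketch the default path follows the SHOULD (fall back to regular TCP), while the immediate RST-and-retry behaviour is opt-in through `retry_now`, since the text only permits it with MAY.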
The MP_CAPABLE option is variable-length, with different fields
included depending on which packet the option is used on. The full
MP_CAPABLE option is shown in Figure 4.
1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+-------+---------------+
| Kind | Length |Subtype|Version|A|B|C|D|E|F|G|H|
+---------------+---------------+-------+-------+---------------+
| Option Sender's Key (64 bits) |
| (if option Length > 4) |
| |
+---------------------------------------------------------------+
| Option Receiver's Key (64 bits) |
| (if option Length > 12) |
| |
+-------------------------------+-------------------------------+
| Data-Level Length (16 bits) | Checksum (16 bits, optional) |
+-------------------------------+-------------------------------+
Figure 4: Multipath Capable (MP_CAPABLE) Option
The MP_CAPABLE option is carried on the SYN, SYN/ACK, and ACK packets
that start the first subflow of an MPTCP connection, as well as the
first packet that carries data, if the initiator wishes to send first.
The data carried by each option is as follows, where A = initiator
and B = listener.
o SYN (A->B): only the first four octets (Length = 4).
o SYN/ACK (B->A): B's Key for this connection (Length = 12).
o ACK (no data) (A->B): A's Key followed by B's Key (Length = 20).
o ACK (with first data) (A->B): A's Key followed by B's Key followed
by Data-Level Length, and optional Checksum (Length = 22 or 24).
The contents of the option are determined by the SYN and ACK flags of
the packet, along with the option's length field. For the diagram
shown in Figure 4, "sender" and "receiver" refer to the sender or
receiver of the TCP packet (which can be either host).
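
As an illustration of how the option length selects which fields are present, the following sketch encodes and decodes the v1 MP_CAPABLE layout shown in Figure 4. It is a sketch under stated assumptions rather than a reference implementation: the function names and the dictionary keys are hypothetical, the option Kind value 30 and the MP_CAPABLE subtype 0x0 are taken from the existing IANA assignments for MPTCP, and no validation against the packet's SYN and ACK flags is attempted.

```python
import struct
from typing import Optional

MPTCP_KIND = 30            # TCP option kind assigned to MPTCP
MP_CAPABLE_SUBTYPE = 0x0   # MPTCP option subtype for MP_CAPABLE
VALID_LENGTHS = (4, 12, 20, 22, 24)


def build_mp_capable(version: int, flags: int,
                     sender_key: Optional[int] = None,
                     receiver_key: Optional[int] = None,
                     data_len: Optional[int] = None,
                     checksum: Optional[int] = None) -> bytes:
    """Serialize an MP_CAPABLE option; later fields require earlier ones."""
    body = b""
    if sender_key is not None:
        body += struct.pack("!Q", sender_key)
    if receiver_key is not None:
        if sender_key is None:
            raise ValueError("receiver key requires sender key")
        body += struct.pack("!Q", receiver_key)
    if data_len is not None:
        if receiver_key is None:
            raise ValueError("data-level length requires both keys")
        body += struct.pack("!H", data_len)
        if checksum is not None:
            body += struct.pack("!H", checksum)
    length = 4 + len(body)
    sub_ver = (MP_CAPABLE_SUBTYPE << 4) | (version & 0x0F)
    return struct.pack("!BBBB", MPTCP_KIND, length, sub_ver, flags & 0xFF) + body


def parse_mp_capable(opt: bytes) -> dict:
    """Decode an MP_CAPABLE option, using Length to find optional fields."""
    kind, length, sub_ver, flags = struct.unpack("!BBBB", opt[:4])
    if kind != MPTCP_KIND or (sub_ver >> 4) != MP_CAPABLE_SUBTYPE:
        raise ValueError("not an MP_CAPABLE option")
    if length != len(opt) or length not in VALID_LENGTHS:
        raise ValueError("unexpected MP_CAPABLE length")
    fields = {"version": sub_ver & 0x0F, "flags": flags}
    if length > 4:
        fields["sender_key"] = struct.unpack("!Q", opt[4:12])[0]
    if length > 12:
        fields["receiver_key"] = struct.unpack("!Q", opt[12:20])[0]
    if length > 20:
        fields["data_len"] = struct.unpack("!H", opt[20:22])[0]
    if length > 22:
        fields["checksum"] = struct.unpack("!H", opt[22:24])[0]
    return fields
```

For example, `build_mp_capable(1, 0)` yields the four-octet form used on the initial SYN, while passing both keys plus `data_len` yields the 22-octet form used on the first data packet.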
The initial SYN, containing just the MP_CAPABLE header, is used to
define the version of MPTCP being requested, as well as exchanging
flags to negotiate connection features, described later.
This option is used to declare the 64-bit keys that the end hosts
have generated for this MPTCP connection. This key is used to
authenticate the addition of future subflows to this connection. authenticate the addition of future subflows to this connection.
This is the only time the key will be sent in clear on the wire This is the only time the key will be sent in clear on the wire
(unless "fast close", Section 3.5, is used); all future subflows will (unless "fast close", Section 3.5, is used); all future subflows will
identify the connection using a 32-bit "token". This token is a identify the connection using a 32-bit "token". This token is a
cryptographic hash of this key. The algorithm for this process is cryptographic hash of this key. The algorithm for this process is
dependent on the authentication algorithm selected; the method of dependent on the authentication algorithm selected; the method of
selection is defined later in this section. selection is defined later in this section.
This key is generated by its sender, and its method of generation is Upon reception of the initial SYN-segment, a stateful server
implementation specific. The key MUST be hard to guess, and it MUST generates a random key and replies with a SYN/ACK. The key's method
be unique for the sending host at any one time. Recommendations for of generation is implementation specific. The key MUST be hard to
generating random numbers for use in keys are given in [16]. guess, and it MUST be unique for the sending host at any one time.
Connections will be indexed at each host by the token (a one-way hash Recommendations for generating random numbers for use in keys are
of the key). Therefore, an implementation will require a mapping given in [16]. Connections will be indexed at each host by the token
from each token to the corresponding connection, and in turn to the (a one-way hash of the key). Therefore, an implementation will
keys for the connection. require a mapping from each token to the corresponding connection,
and in turn to the keys for the connection.
There is a risk that two different keys will hash to the same token. There is a risk that two different keys will hash to the same token.
The risk of hash collisions is usually small, unless the host is The risk of hash collisions is usually small, unless the host is
handling many tens of thousands of connections. Therefore, an handling many tens of thousands of connections. Therefore, an
implementation SHOULD check its list of connection tokens to ensure implementation SHOULD check its list of connection tokens to ensure
there is not a collision before sending its key in the SYN/ACK. This there is not a collision before sending its key, and if there is,
would, however, be costly for a server with thousands of connections. then it should generate a new key. This would, however, be costly
The subflow handshake mechanism (Section 3.2) will ensure that new for a server with thousands of connections. The subflow handshake
subflows only join the correct connection, however, through the mechanism (Section 3.2) will ensure that new subflows only join the
cryptographic handshake, as well as checking the connection tokens in correct connection, however, through the cryptographic handshake, as
both directions, and ensuring sequence numbers are in-window. So in well as checking the connection tokens in both directions, and
the worst case if there was a token collision, the new subflow would ensuring sequence numbers are in-window. So in the worst case if
not succeed, but the MPTCP connection would continue to provide a there was a token collision, the new subflow would not succeed, but
regular TCP service. the MPTCP connection would continue to provide a regular TCP service.
Since key generation is implementation-specific, there is no Since key generation is implementation-specific, there is no
requirement that they be simply random numbers. An implementation is requirement that they be simply random numbers. An implementation is
free to exchange cryptographic material out-of-band and generate free to exchange cryptographic material out-of-band and generate
these keys from this, in order to provide additional mechanisms by these keys from this, in order to provide additional mechanisms by
which to verify the identity of the communicating entities. For which to verify the identity of the communicating entities. For
example, an implementation could choose to link its MPTCP keys to example, an implementation could choose to link its MPTCP keys to
those used in higher-layer TLS or SSH connections. those used in higher-layer TLS or SSH connections.
The MP_CAPABLE option is carried on the SYN, SYN/ACK, and ACK packets If the server behaves in a stateless manner, it has to generate its
that start the first subflow of an MPTCP connection. The data own key in a verifiable fashion. This verifiable way of generating
carried by each packet is as follows, where A = initiator and B = the key can be done by using a hash of the 4-tuple, sequence number
listener. and a local secret (similar to what is done for the TCP-sequence
number [17]). It will thus be able to verify whether it is indeed
o SYN (A->B): A's Key for this connection. the originator of the key echoed back in the later MP_CAPABLE option.
As for a stateful server, the tokens SHOULD be checked for
o SYN/ACK (B->A): B's Key for this connection. uniqueness, however if uniqueness is not met, and there is no way to
generate an alternative verifiable key, then the connection MUST fall
o ACK (A->B): A's Key followed by B's Key. back to using regular TCP by not sending a MP_CAPABLE in the SYN/ACK.
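
A stateless listener's verifiable key could be produced along the following lines. This is only a sketch: the text calls for "a hash of the 4-tuple, sequence number and a local secret" without naming a function, so the use of HMAC-SHA256 truncated to 64 bits is an assumption, as are the function names, the byte encoding of the inputs, and the choice of which sequence number is fed in (the caller must supply the same value at generation and verification time).

```python
import hashlib
import hmac


def stateless_key(secret: bytes, src_ip: str, src_port: int,
                  dst_ip: str, dst_port: int, seq: int) -> bytes:
    """Derive a 64-bit key from the 4-tuple, a sequence number and a secret."""
    msg = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{seq}".encode()
    # Assumption: HMAC-SHA256 truncated to 8 octets; the text only says "a hash".
    return hmac.new(secret, msg, hashlib.sha256).digest()[:8]


def verify_echoed_key(secret: bytes, echoed_key: bytes, src_ip: str,
                      src_port: int, dst_ip: str, dst_port: int,
                      seq: int) -> bool:
    """Recompute the key and compare it with the one echoed in MP_CAPABLE."""
    expected = stateless_key(secret, src_ip, src_port, dst_ip, dst_port, seq)
    return hmac.compare_digest(expected, echoed_key)
```

Because the key is recomputed from packet fields and the local secret, the listener need not store anything between sending the SYN/ACK and receiving the echoed key; a key that fails verification indicates the listener did not originate it.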
The contents of the option are determined by the SYN and ACK flags of
the packet, verified by the option's length field. For the diagram
shown in Figure 4, "sender" and "receiver" refer to the sender or
receiver of the TCP packet (which can be either host). If the SYN
flag is set, a single key is included; if only an ACK flag is set,
both keys are present.
B's Key is echoed in the ACK in order to allow the listener (Host B)
to act statelessly until the TCP connection reaches the ESTABLISHED
state. If the listener acts in this way, however, it MUST generate
its key in a way that would allow it to verify that it generated the
key when it is echoed in the ACK.
This exchange allows the safe passage of MPTCP options on SYN packets The ACK carries both A's key and B's key. This is the first time
to be determined. If any of these options are dropped, MPTCP will that A's key is seen on the wire, although it is expected that A will
gracefully fall back to regular single-path TCP, as documented in have generated a key locally before the initial SYN. The echoing of
Section 3.7. Note that new subflows MUST NOT be established (using B's key allows B to operate statelessly, as described above.
the process documented in Section 3.2) until a Data Sequence Signal Therefore, A's key must be delivered reliably to B, and in order to
(DSS) option has been successfully received across the path (as do this, the transmission of this packet must be made reliable.
documented in Section 3.3).
1 2 3 If B has data to send first, then the reliable delivery of the ACK
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 can be inferred by the receipt of this data with an appropriate MPTCP
+---------------+---------------+-------+-------+---------------+ Data Sequence Signal (DSS) option (Section 3.3). If, however, A
| Kind | Length |Subtype|Version|A|B|C|D|E|F|G|H| wishes to send data first, it would not know whether the ACK has
+---------------+---------------+-------+-------+---------------+ successfully been received, and thus whether the MPTCP is
| Option Sender's Key (64 bits) | successfully established. Therefore, on the first data A has to send
| | (if it has not received any data from B), it MUST also include a
| | MP_CAPABLE option, with additional data parameters. This packet may
+---------------------------------------------------------------+ be the third ACK if data is ready to be sent by the application, or
| Option Receiver's Key (64 bits) | may be a later packet if the application only later has data to send.
| (if option Length == 20) | This MP_CAPABLE option is in place of the DSS, and simply specifies
| | the data-level length of the payload, and the checksum (if the use of
+---------------------------------------------------------------+ checksums is negotiated). This is the minimal data required to
establish an MPTCP connection - it allows validation of the payload,
and given it is the first data, the Initial Data Sequence Number
(IDSN) is also known (as it is generated from the key, as described
below). Conveying the keys on the first data packet allows the TCP
reliability mechanisms to ensure the packet is successfully
delivered. The receiver will acknowledge this data at the connection
level with a Data ACK, as if a DSS option has been received.
Figure 4: Multipath Capable (MP_CAPABLE) Option Additionally, the MP_CAPABLE exchange allows the safe passage of
MPTCP options on SYN packets to be determined. If any of these
options are dropped, MPTCP will gracefully fall back to regular
single-path TCP, as documented in Section 3.8. Note that new
subflows MUST NOT be established (using the process documented in
Section 3.2) until a Data Sequence Signal (DSS) option has been
successfully received across the path (as documented in Section 3.3).
The first 4 bits of the first octet in the MP_CAPABLE option The first 4 bits of the first octet in the MP_CAPABLE option
(Figure 4) define the MPTCP option subtype (see Section 8; for (Figure 4) define the MPTCP option subtype (see Section 8; for
MP_CAPABLE, this is 0), and the remaining 4 bits of this octet MP_CAPABLE, this is 0), and the remaining 4 bits of this octet
specify the MPTCP version in use (for this specification, this is 0). specify the MPTCP version in use (for this specification, this is 1).
The second octet is reserved for flags, allocated as follows: The second octet is reserved for flags, allocated as follows:
A: The leftmost bit, labeled "A", SHOULD be set to 1 to indicate A: The leftmost bit, labeled "A", SHOULD be set to 1 to indicate
"Checksum Required", unless the system administrator has decided "Checksum Required", unless the system administrator has decided
that checksums are not required (for example, if the environment that checksums are not required (for example, if the environment
is controlled and no middleboxes exist that might adjust the is controlled and no middleboxes exist that might adjust the
payload). payload).
B: The second bit, labeled "B", is an extensibility flag, and MUST be B: The second bit, labeled "B", is an extensibility flag, and MUST be
skipping to change at page 16, line 40 skipping to change at page 18, line 6
bit, labeled "H", is assigned. Bit "H" indicates the use of HMAC- bit, labeled "H", is assigned. Bit "H" indicates the use of HMAC-
SHA1 (as defined in Section 3.2). An implementation that only SHA1 (as defined in Section 3.2). An implementation that only
supports this method MUST set bit "H" to 1, and bits "C" through supports this method MUST set bit "H" to 1, and bits "C" through
"G" to 0. "G" to 0.
A crypto algorithm MUST be specified. If flag bits C through H are A crypto algorithm MUST be specified. If flag bits C through H are
all 0, the MP_CAPABLE option MUST be treated as invalid and ignored all 0, the MP_CAPABLE option MUST be treated as invalid and ignored
(that is, it must be treated as a regular TCP handshake). (that is, it must be treated as a regular TCP handshake).
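The field layout and the validity rule above can be illustrated with a minimal parsing sketch. It assumes the option is passed in as raw bytes beginning at the Kind octet, and it handles only the 12-octet (one key) and 20-octet (two keys) forms shown in Figure 4. The names MpCapable and parse_mp_capable are hypothetical, and returning None stands for "ignore the option and continue as regular TCP".

```python
import struct
from typing import NamedTuple, Optional


class MpCapable(NamedTuple):
    version: int
    checksum_required: bool
    sender_key: bytes
    receiver_key: Optional[bytes]


def parse_mp_capable(option: bytes) -> Optional[MpCapable]:
    """Parse an MP_CAPABLE option; None means the option is ignored."""
    if len(option) < 4:
        return None
    _kind, length, sub_ver, flags = struct.unpack("!BBBB", option[:4])
    if length != len(option) or length not in (12, 20):
        return None
    if sub_ver >> 4 != 0:        # upper 4 bits: subtype, 0 for MP_CAPABLE
        return None
    if flags & 0x3F == 0:        # bits C..H all zero: no crypto algorithm
        return None
    return MpCapable(
        version=sub_ver & 0x0F,              # lower 4 bits: MPTCP version
        checksum_required=bool(flags & 0x80),  # bit A is the leftmost bit
        sender_key=option[4:12],
        receiver_key=option[12:20] if length == 20 else None,
    )
```

The sketch does not validate the Kind value or the extensibility bit "B"; both checks would be needed in a complete parser.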
The selection of the authentication algorithm also impacts the The selection of the authentication algorithm also impacts the
algorithm used to generate the token and the initial data sequence algorithm used to generate the token and the Initial Data Sequence
number (IDSN). In this specification, with only the SHA-1 algorithm Number (IDSN). In this specification, with only the SHA-1 algorithm
(bit "H") specified and selected, the token MUST be a truncated (most (bit "H") specified and selected, the token MUST be a truncated (most
significant 32 bits) SHA-1 hash ([4], [17]) of the key. A different, significant 32 bits) SHA-1 hash ([4], [18]) of the key. A different,
64-bit truncation (the least significant 64 bits) of the SHA-1 hash 64-bit truncation (the least significant 64 bits) of the SHA-1 hash
of the key MUST be used as the initial data sequence number. Note of the key MUST be used as the IDSN. Note that the key MUST be
that the key MUST be hashed in network byte order. Also note that hashed in network byte order. Also note that the "least significant"
the "least significant" bits MUST be the rightmost bits of the SHA-1 bits MUST be the rightmost bits of the SHA-1 digest, as per [4].
digest, as per [4]. Future specifications of the use of the crypto Future specifications of the use of the crypto bits may choose to
bits may choose to specify different algorithms for token and IDSN specify different algorithms for token and IDSN generation.
generation.
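The two truncations described above can be sketched as follows, assuming the key is held as 8 octets already in network byte order. The function name token_and_idsn is illustrative.

```python
import hashlib
from typing import Tuple


def token_and_idsn(key: bytes) -> Tuple[int, int]:
    """Return (token, IDSN) derived from a 64-bit key with SHA-1."""
    if len(key) != 8:
        raise ValueError("key must be exactly 8 octets")
    digest = hashlib.sha1(key).digest()           # 20 octets
    token = int.from_bytes(digest[:4], "big")     # most significant 32 bits
    idsn = int.from_bytes(digest[-8:], "big")     # rightmost 64 bits
    return token, idsn
```

Both values come from a single digest, so a host needs to hash each key only once per connection.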
Both the crypto and checksum bits negotiate capabilities in similar Both the crypto and checksum bits negotiate capabilities in similar
ways. For the Checksum Required bit (labeled "A"), if either host ways. For the Checksum Required bit (labeled "A"), if either host
requires the use of checksums, checksums MUST be used. In other requires the use of checksums, checksums MUST be used. In other
words, the only way for checksums not to be used is if both hosts in words, the only way for checksums not to be used is if both hosts in
their SYNs set A=0. This decision is confirmed by the setting of the their SYNs set A=0. This decision is confirmed by the setting of the
"A" bit in the third packet (the ACK) of the handshake. For example, "A" bit in the third packet (the ACK) of the handshake. For example,
if the initiator sets A=0 in the SYN, but the responder sets A=1 in if the initiator sets A=0 in the SYN, but the responder sets A=1 in
the SYN/ACK, checksums MUST be used in both directions, and the the SYN/ACK, checksums MUST be used in both directions, and the
initiator will set A=1 in the ACK. The decision whether to use initiator will set A=1 in the ACK. The decision whether to use
skipping to change at page 17, line 44 skipping to change at page 19, line 9
If a SYN contains an MP_CAPABLE option but the SYN/ACK does not, it If a SYN contains an MP_CAPABLE option but the SYN/ACK does not, it
is assumed that the passive opener is not multipath capable; thus, is assumed that the passive opener is not multipath capable; thus,
the MPTCP session MUST operate as a regular, single-path TCP. If a the MPTCP session MUST operate as a regular, single-path TCP. If a
SYN does not contain a MP_CAPABLE option, the SYN/ACK MUST NOT SYN does not contain a MP_CAPABLE option, the SYN/ACK MUST NOT
contain one in response. If the third packet (the ACK) does not contain one in response. If the third packet (the ACK) does not
contain the MP_CAPABLE option, then the session MUST fall back to contain the MP_CAPABLE option, then the session MUST fall back to
operating as a regular, single-path TCP. This is to maintain operating as a regular, single-path TCP. This is to maintain
compatibility with middleboxes on the path that drop some or all TCP compatibility with middleboxes on the path that drop some or all TCP
options. Note that an implementation MAY choose to attempt sending options. Note that an implementation MAY choose to attempt sending
MPTCP options more than one time before making this decision to MPTCP options more than one time before making this decision to
operate as regular TCP (see Section 3.9). operate as regular TCP (see Section 3.10).
If the SYN packets are unacknowledged, it is up to local policy to If the SYN packets are unacknowledged, it is up to local policy to
decide how to respond. It is expected that a sender will eventually decide how to respond. It is expected that a sender will eventually
fall back to single-path TCP (i.e., without the MP_CAPABLE option) in fall back to single-path TCP (i.e., without the MP_CAPABLE option) in
order to work around middleboxes that may drop packets with unknown order to work around middleboxes that may drop packets with unknown
options; however, the number of multipath-capable attempts that are options; however, the number of multipath-capable attempts that are
made first will be up to local policy. It is possible that MPTCP and made first will be up to local policy. It is possible that MPTCP and
non-MPTCP SYNs could get reordered in the network. Therefore, the non-MPTCP SYNs could get reordered in the network. Therefore, the
final state is inferred from the presence or absence of the final state is inferred from the presence or absence of the
MP_CAPABLE option in the third packet of the TCP handshake. If this MP_CAPABLE option in the third packet of the TCP handshake. If this
option is not present, the connection SHOULD fall back to regular option is not present, the connection SHOULD fall back to regular
TCP, as documented in Section 3.7. TCP, as documented in Section 3.8.
If a host supports multiple versions of MPTCP, the sender of the
MP_CAPABLE option SHOULD signal the highest version number it
supports. The passive opener, on receipt of this, will signal the
version number it wishes to use, which MUST be equal to or lower than
the version number indicated in the initial MP_CAPABLE. The
connection initiator, when sending the third packet (the ACK with
MP_CAPABLE), will either echo this version number as given in the
SYN/ACK (if it supports it), or will cancel the use of MPTCP and fall
back to regular TCP by not including the MP_CAPABLE option, if it
does not support, or does not wish to use, this requested version.
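The version exchange can be sketched as two small decisions. Picking the highest mutually supported version at the passive opener is a policy assumption; the text only requires that the chosen version not exceed the offered one. The function names are hypothetical, and a result of None or False stands for omitting MP_CAPABLE and falling back to regular TCP.

```python
from typing import Optional, Set


def responder_choose_version(offered: int,
                             supported: Set[int]) -> Optional[int]:
    """Passive opener: choose a version equal to or lower than the offer."""
    candidates = [v for v in supported if v <= offered]
    return max(candidates) if candidates else None


def initiator_accepts(selected: int, offered: int,
                      supported: Set[int]) -> bool:
    """Initiator: echo the selected version only if it is usable."""
    return selected <= offered and selected in supported
```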
The initial data sequence number on an MPTCP connection is generated The initial data sequence number on an MPTCP connection is generated
from the key. The algorithm for IDSN generation is also determined from the key. The algorithm for IDSN generation is also determined
from the negotiated authentication algorithm. In this specification, from the negotiated authentication algorithm. In this specification,
with only the SHA-1 algorithm specified and selected, the IDSN of a with only the SHA-1 algorithm specified and selected, the IDSN of a
host MUST be the least significant 64 bits of the SHA-1 hash of its host MUST be the least significant 64 bits of the SHA-1 hash of its
key, i.e., IDSN-A = Hash(Key-A) and IDSN-B = Hash(Key-B). This key, i.e., IDSN-A = Hash(Key-A) and IDSN-B = Hash(Key-B). This
deterministic generation of the IDSN allows a receiver to ensure that deterministic generation of the IDSN allows a receiver to ensure that
there are no gaps in sequence space at the start of the connection. there are no gaps in sequence space at the start of the connection.
The SYN with MP_CAPABLE occupies the first octet of data sequence The SYN with MP_CAPABLE occupies the first octet of data sequence
skipping to change at page 18, line 43 skipping to change at page 19, line 45
3.2. Starting a New Subflow 3.2. Starting a New Subflow
Once an MPTCP connection has begun with the MP_CAPABLE exchange, Once an MPTCP connection has begun with the MP_CAPABLE exchange,
further subflows can be added to the connection. Hosts have further subflows can be added to the connection. Hosts have
knowledge of their own address(es), and can become aware of the other knowledge of their own address(es), and can become aware of the other
host's addresses through signaling exchanges as described in host's addresses through signaling exchanges as described in
Section 3.4. Using this knowledge, a host can initiate a new subflow Section 3.4. Using this knowledge, a host can initiate a new subflow
over a currently unused pair of addresses. It is permitted for over a currently unused pair of addresses. It is permitted for
either host in a connection to initiate the creation of a new either host in a connection to initiate the creation of a new
subflow, but it is expected that this will normally be the original subflow, but it is expected that this will normally be the original
connection initiator (see Section 3.9 for heuristics). connection initiator (see Section 3.10 for heuristics).
A new subflow is started as a normal TCP SYN/ACK exchange. The Join A new subflow is started as a normal TCP SYN/ACK exchange. The Join
Connection (MP_JOIN) MPTCP option is used to identify the connection Connection (MP_JOIN) MPTCP option is used to identify the connection
to be joined by the new subflow. It uses keying material that was to be joined by the new subflow. It uses keying material that was
exchanged in the initial MP_CAPABLE handshake (Section 3.1), and that exchanged in the initial MP_CAPABLE handshake (Section 3.1), and that
handshake also negotiates the crypto algorithm in use for the MP_JOIN handshake also negotiates the crypto algorithm in use for the MP_JOIN
handshake. handshake.
This section specifies the behavior of MP_JOIN using the HMAC-SHA1 This section specifies the behavior of MP_JOIN using the HMAC-SHA1
algorithm. An MP_JOIN option is present in the SYN, SYN/ACK, and ACK algorithm. An MP_JOIN option is present in the SYN, SYN/ACK, and ACK
of the three-way handshake, although in each case with a different of the three-way handshake, although in each case with a different
format. format.
In the first MP_JOIN on the SYN packet, illustrated in Figure 5, the In the first MP_JOIN on the SYN packet, illustrated in Figure 5, the
initiator sends a token, random number, and address ID. initiator sends a token, random number, and address ID.
The token is used to identify the MPTCP connection and is a The token is used to identify the MPTCP connection and is a
cryptographic hash of the receiver's key, as exchanged in the initial cryptographic hash of the receiver's key, as exchanged in the initial
MP_CAPABLE handshake (Section 3.1). In this specification, the MP_CAPABLE handshake (Section 3.1). In this specification, the
tokens presented in this option are generated by the SHA-1 ([4], tokens presented in this option are generated by the SHA-1 ([4],
[17]) algorithm, truncated to the most significant 32 bits. The [18]) algorithm, truncated to the most significant 32 bits. The
token included in the MP_JOIN option is the token that the receiver token included in the MP_JOIN option is the token that the receiver
of the packet uses to identify this connection; i.e., Host A will of the packet uses to identify this connection; i.e., Host A will
send Token-B (which is generated from Key-B). Note that the hash send Token-B (which is generated from Key-B). Note that the hash
generation algorithm can be overridden by the choice of cryptographic generation algorithm can be overridden by the choice of cryptographic
handshake algorithm, as defined in Section 3.1. handshake algorithm, as defined in Section 3.1.
The MP_JOIN SYN sends not only the token (which is static for a The MP_JOIN SYN sends not only the token (which is static for a
connection) but also random numbers (nonces) that are used to prevent connection) but also random numbers (nonces) that are used to prevent
replay attacks on the authentication method. Recommendations for the replay attacks on the authentication method. Recommendations for the
generation of random numbers for this purpose are given in [16]. generation of random numbers for this purpose are given in [16].
The MP_JOIN option includes an "Address ID". This is an identifier The MP_JOIN option includes an "Address ID". This is an identifier
that only has significance within a single connection, where it that only has significance within a single connection, where it
identifies the source address of this packet, even if the IP header identifies the source address of this packet, even if the IP header
has been changed in transit by a middlebox. The Address ID allows has been changed in transit by a middlebox. The Address ID allows
address removal (Section 3.4.2) without needing to know what the address removal (Section 3.4.2) without needing to know what the
source address at the receiver is, thus allowing address removal source address at the receiver is, thus allowing address removal
through NATs. The Address ID also allows correlation between new through NATs. The Address ID also allows correlation between new
subflow setup attempts and address signaling (Section 3.4.1), to subflow setup attempts and address signaling (Section 3.4.1), to
prevent setting up duplicate subflows on the same path, if an MP_JOIN prevent setting up duplicate subflows on the same path, if an MP_JOIN
and ADD_ADDR2 are sent at the same time. and ADD_ADDR are sent at the same time.
The Address IDs of the subflow used in the initial SYN exchange of The Address IDs of the subflow used in the initial SYN exchange of
the first subflow in the connection are implicit, and have the value the first subflow in the connection are implicit, and have the value
zero. A host MUST store the mappings between Address IDs and zero. A host MUST store the mappings between Address IDs and
addresses both for itself and the remote host. An implementation addresses both for itself and the remote host. An implementation
will also need to know which local and remote Address IDs are will also need to know which local and remote Address IDs are
associated with which established subflows, for when addresses are associated with which established subflows, for when addresses are
removed from a local or remote host. removed from a local or remote host.
The MP_JOIN option on packets with the SYN flag set also includes 4 The MP_JOIN option on packets with the SYN flag set also includes 4
skipping to change at page 20, line 44 skipping to change at page 21, line 46
against blind state exhaustion attacks; therefore, there is no need against blind state exhaustion attacks; therefore, there is no need
to provide mechanisms to allow a responder to operate statelessly at to provide mechanisms to allow a responder to operate statelessly at
the MP_JOIN stage. the MP_JOIN stage.
An HMAC is sent by both hosts -- by the initiator (Host A) in the An HMAC is sent by both hosts -- by the initiator (Host A) in the
third packet (the ACK) and by the responder (Host B) in the second third packet (the ACK) and by the responder (Host B) in the second
packet (the SYN/ACK). Doing the HMAC exchange at this stage allows packet (the SYN/ACK). Doing the HMAC exchange at this stage allows
both hosts to have first exchanged random data (in the first two SYN both hosts to have first exchanged random data (in the first two SYN
packets) that is used as the "message". This specification defines packets) that is used as the "message". This specification defines
that HMAC as defined in [11] is used, along with the SHA-1 hash that HMAC as defined in [11] is used, along with the SHA-1 hash
algorithm [4] (potentially implemented as in [17]), thus generating a algorithm [4] (potentially implemented as in [18]), thus generating a
160-bit / 20-octet HMAC. Due to option space limitations, the HMAC 160-bit / 20-octet HMAC. Due to option space limitations, the HMAC
included in the SYN/ACK is truncated to the leftmost 64 bits, but included in the SYN/ACK is truncated to the leftmost 64 bits, but
this is acceptable since random numbers are used; thus, an attacker this is acceptable since random numbers are used; thus, an attacker
only has one chance to guess the HMAC correctly (if the HMAC is only has one chance to guess the HMAC correctly (if the HMAC is
incorrect, the TCP connection is closed, so a new MP_JOIN negotiation incorrect, the TCP connection is closed, so a new MP_JOIN negotiation
with a new random number is required). with a new random number is required).
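The HMAC computation and the SYN/ACK truncation can be sketched as below. How the HMAC key and message are composed from the hosts' keys and random numbers is defined elsewhere in the specification, not in this passage, so key and message are left as opaque placeholder parameters here. The function names are illustrative.

```python
import hashlib
import hmac


def join_hmac(key: bytes, message: bytes) -> bytes:
    """Full HMAC-SHA1 value: 160 bits (20 octets)."""
    return hmac.new(key, message, hashlib.sha1).digest()


def truncated_join_hmac(key: bytes, message: bytes) -> bytes:
    """Leftmost 64 bits of the HMAC, as carried in the SYN/ACK."""
    return join_hmac(key, message)[:8]
```

A receiver comparing HMAC values would normally use a constant-time comparison such as hmac.compare_digest.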
The initiator's authentication information is sent in its first ACK The initiator's authentication information is sent in its first ACK
(the third packet of the handshake), as shown in Figure 7. This data (the third packet of the handshake), as shown in Figure 7. This data
needs to be sent reliably, since it is the only time this HMAC is needs to be sent reliably, since it is the only time this HMAC is
skipping to change at page 23, line 23 skipping to change at page 24, line 25
MP_JOIN is stripped from the SYN on the path from A to B, and Host B MP_JOIN is stripped from the SYN on the path from A to B, and Host B
does not have a passive opener on the relevant port, it will respond does not have a passive opener on the relevant port, it will respond
with a RST in the normal way. If in response to a SYN with an with a RST in the normal way. If in response to a SYN with an
MP_JOIN option, a SYN/ACK is received without the MP_JOIN option MP_JOIN option, a SYN/ACK is received without the MP_JOIN option
(either since it was stripped on the return path, or it was stripped (either since it was stripped on the return path, or it was stripped
on the outgoing path but the passive opener on Host B responded as if on the outgoing path but the passive opener on Host B responded as if
it were a new regular TCP session), then the subflow is unusable and it were a new regular TCP session), then the subflow is unusable and
Host A MUST close it with a RST. Host A MUST close it with a RST.
Note that additional subflows can be created between any pair of Note that additional subflows can be created between any pair of
ports (but see Section 3.9 for heuristics); no explicit application- ports (but see Section 3.10 for heuristics); no explicit application-
level accept calls or bind calls are required to open additional level accept calls or bind calls are required to open additional
subflows. To associate a new subflow with an existing connection, subflows. To associate a new subflow with an existing connection,
the token supplied in the subflow's SYN exchange is used for the token supplied in the subflow's SYN exchange is used for
demultiplexing. This then binds the 5-tuple of the TCP subflow to demultiplexing. This then binds the 5-tuple of the TCP subflow to
the local token of the connection. A consequence is that it is the local token of the connection. A consequence is that it is
possible to allow any port pairs to be used for a connection. possible to allow any port pairs to be used for a connection.
Demultiplexing subflow SYNs MUST be done using the token; this is Demultiplexing subflow SYNs MUST be done using the token; this is
unlike traditional TCP, where the destination port is used for unlike traditional TCP, where the destination port is used for
demultiplexing SYN packets. Once a subflow is set up, demultiplexing demultiplexing SYN packets. Once a subflow is set up, demultiplexing
skipping to change at page 23, line 51 skipping to change at page 25, line 4
This section discusses operation of MPTCP for data transfer. At a This section discusses operation of MPTCP for data transfer. At a
high level, an MPTCP implementation will take one input data stream high level, an MPTCP implementation will take one input data stream
from an application, and split it into one or more subflows, with from an application, and split it into one or more subflows, with
sufficient control information to allow it to be reassembled and sufficient control information to allow it to be reassembled and
delivered reliably and in order to the recipient application. The delivered reliably and in order to the recipient application. The
following subsections define this behavior in detail. following subsections define this behavior in detail.
The data sequence mapping and the Data ACK are signaled in the Data The data sequence mapping and the Data ACK are signaled in the Data
Sequence Signal (DSS) option (Figure 9). Either or both can be Sequence Signal (DSS) option (Figure 9). Either or both can be
signaled in one DSS, dependent on the flags set. The data sequence signaled in one DSS, depending on the flags set. The data sequence
mapping defines how the sequence space on the subflow maps to the mapping defines how the sequence space on the subflow maps to the
connection level, and the Data ACK acknowledges receipt of data at connection level, and the Data ACK acknowledges receipt of data at
the connection level. These functions are described in more detail the connection level. These functions are described in more detail
in the following two subsections. in the following two subsections.
Either or both the data sequence mapping and the Data ACK can be
signaled in the DSS option, dependent on the flags set.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+----------------------+ +---------------+---------------+-------+----------------------+
| Kind | Length |Subtype| (reserved) |F|m|M|a|A| | Kind | Length |Subtype| (reserved) |F|m|M|a|A|
+---------------+---------------+-------+----------------------+ +---------------+---------------+-------+----------------------+
| Data ACK (4 or 8 octets, depending on flags) | | Data ACK (4 or 8 octets, depending on flags) |
+--------------------------------------------------------------+ +--------------------------------------------------------------+
| Data sequence number (4 or 8 octets, depending on flags) | | Data sequence number (4 or 8 octets, depending on flags) |
+--------------------------------------------------------------+ +--------------------------------------------------------------+
| Subflow Sequence Number (4 octets) | | Subflow Sequence Number (4 octets) |
skipping to change at page 26, line 16 skipping to change at page 27, line 13
the subflow sequence numbering is relative (the SYN at the start of the subflow sequence numbering is relative (the SYN at the start of
the subflow has relative subflow sequence number 0). This is to the subflow has relative subflow sequence number 0). This is to
allow middleboxes to change the initial sequence number of a subflow, allow middleboxes to change the initial sequence number of a subflow,
such as firewalls that undertake ISN randomization. such as firewalls that undertake ISN randomization.
The data sequence mapping also contains a checksum of the data that The data sequence mapping also contains a checksum of the data that
this mapping covers, if use of checksums has been negotiated at the this mapping covers, if use of checksums has been negotiated at the
MP_CAPABLE exchange. Checksums are used to detect if the payload has MP_CAPABLE exchange. Checksums are used to detect if the payload has
been adjusted in any way by a non-MPTCP-aware middlebox. If this been adjusted in any way by a non-MPTCP-aware middlebox. If this
checksum fails, it will trigger a failure of the subflow, or a checksum fails, it will trigger a failure of the subflow, or a
fallback to regular TCP, as documented in Section 3.7, since MPTCP fallback to regular TCP, as documented in Section 3.8, since MPTCP
can no longer reliably know the subflow sequence space at the can no longer reliably know the subflow sequence space at the
receiver to build data sequence mappings. receiver to build data sequence mappings.
The checksum algorithm used is the standard TCP checksum [1], The checksum algorithm used is the standard TCP checksum [1],
operating over the data covered by this mapping, along with a pseudo- operating over the data covered by this mapping, along with a pseudo-
header as shown in Figure 10. header as shown in Figure 10.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+--------------------------------------------------------------+ +--------------------------------------------------------------+
skipping to change at page 27, line 26 skipping to change at page 28, line 24
window because this is relative to the data sequence numbers, so if window because this is relative to the data sequence numbers, so if
the receiver runs out of memory to hold this data, it will have to be the receiver runs out of memory to hold this data, it will have to be
discarded. If a mapping for that subflow-level sequence space does discarded. If a mapping for that subflow-level sequence space does
not arrive within a receive window of data, that subflow SHOULD be not arrive within a receive window of data, that subflow SHOULD be
treated as broken, closed with a RST, and any unmapped data silently treated as broken, closed with a RST, and any unmapped data silently
discarded. discarded.
Data sequence numbers are always 64-bit quantities, and MUST be Data sequence numbers are always 64-bit quantities, and MUST be
maintained as such in implementations. If a connection is maintained as such in implementations. If a connection is
progressing at a slow rate, so protection against wrapped sequence progressing at a slow rate, so protection against wrapped sequence
numbers is not required, then it is permissible to include just the numbers is not required, then an implementation MAY include just the
lower 32 bits of the data sequence number in the data sequence lower 32 bits of the data sequence number in the data sequence
mapping and/or Data ACK as an optimization, and an implementation can mapping and/or Data ACK as an optimization, and an implementation can
make this choice independently for each packet. make this choice independently for each packet. An implementation
MUST be able to receive and process both 64-bit and 32-bit sequence
number values, but it is not required that an implementation is able
to send both.
An implementation MUST send the full 64-bit data sequence number if An implementation MUST send the full 64-bit data sequence number if
it is transmitting at a sufficiently high rate that the 32-bit value it is transmitting at a sufficiently high rate that the 32-bit value
could wrap within the Maximum Segment Lifetime (MSL) [18]. The could wrap within the Maximum Segment Lifetime (MSL) [19]. The
lengths of the DSNs used in these values (which may be different) are lengths of the DSNs used in these values (which may be different) are
declared with flags in the DSS option. Implementations MUST accept a declared with flags in the DSS option. Implementations MUST accept a
32-bit DSN and implicitly promote it to a 64-bit quantity by 32-bit DSN and implicitly promote it to a 64-bit quantity by
incrementing the upper 32 bits of sequence number each time the lower incrementing the upper 32 bits of sequence number each time the lower
32 bits wrap. A sanity check MUST be implemented to ensure that a 32 bits wrap. A sanity check MUST be implemented to ensure that a
wrap occurs at an expected time (e.g., the sequence number jumps from wrap occurs at an expected time (e.g., the sequence number jumps from
a very high number to a very low number) and is not triggered by out- a very high number to a very low number) and is not triggered by out-
of-order packets. of-order packets.
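One possible way to promote a 32-bit DSN while guarding against out-of-order packets is to pick the 64-bit value closest to the sequence number the receiver currently expects. This is a sketch under that assumption, not the only valid approach; promote_dsn32 and expected64 are hypothetical names.

```python
MOD32 = 1 << 32
MOD64 = 1 << 64
HALF32 = 1 << 31


def promote_dsn32(dsn32: int, expected64: int) -> int:
    """Expand a 32-bit DSN to 64 bits relative to the expected DSN.

    The result has dsn32 as its lower 32 bits and lies within half a
    32-bit window of expected64, so a late packet from before a wrap
    is not mistaken for a new wrap.
    """
    candidate = ((expected64 >> 32) << 32) | (dsn32 & (MOD32 - 1))
    if candidate > expected64 + HALF32:
        candidate -= MOD32       # belongs to the previous 32-bit epoch
    elif candidate < expected64 - HALF32:
        candidate += MOD32       # the lower 32 bits have wrapped
    return candidate % MOD64     # keep the result in the 64-bit space
```

For example, with an expected value of 0x1FFFFFFF0, a received 32-bit DSN of 0x10 is interpreted as a wrap into the next epoch, while a received 0xFFFFFFF0 arriving just after the wrap is placed back in the previous epoch.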
As with the standard TCP sequence number, the data sequence number As with the standard TCP sequence number, the data sequence number
skipping to change at page 28, line 15 skipping to change at page 29, line 15
A data sequence mapping does not need to be included in every MPTCP A data sequence mapping does not need to be included in every MPTCP
packet, as long as the subflow sequence space in that packet is packet, as long as the subflow sequence space in that packet is
covered by a mapping known at the receiver. This can be used to covered by a mapping known at the receiver. This can be used to
reduce overhead in cases where the mapping is known in advance; one reduce overhead in cases where the mapping is known in advance; one
such case is when there is a single subflow between the hosts, such case is when there is a single subflow between the hosts,
another is when segments of data are scheduled in larger than packet- another is when segments of data are scheduled in larger than packet-
sized chunks. sized chunks.
An "infinite" mapping can be used to fall back to regular TCP by An "infinite" mapping can be used to fall back to regular TCP by
mapping the subflow-level data to the connection-level data for the mapping the subflow-level data to the connection-level data for the
remainder of the connection (see Section 3.7). This is achieved by remainder of the connection (see Section 3.8). This is achieved by
setting the Data-Level Length field of the DSS option to the reserved setting the Data-Level Length field of the DSS option to the reserved
value of 0. The checksum, in such a case, will also be set to zero. value of 0. The checksum, in such a case, will also be set to zero.
3.3.2. Data Acknowledgments 3.3.2. Data Acknowledgments
To provide full end-to-end resilience, MPTCP provides a connection- To provide full end-to-end resilience, MPTCP provides a connection-
level acknowledgment, to act as a cumulative ACK for the connection level acknowledgment, to act as a cumulative ACK for the connection
as a whole. This is the "Data ACK" field of the DSS option as a whole. This is the "Data ACK" field of the DSS option
(Figure 9). The Data ACK is analogous to the behavior of the (Figure 9). The Data ACK is analogous to the behavior of the
standard TCP cumulative ACK -- indicating how much data has been standard TCP cumulative ACK -- indicating how much data has been
skipping to change at page 35, line 9 skipping to change at page 36, line 9
detail in [7]. detail in [7].
The ability to make effective choices at the sender requires full The ability to make effective choices at the sender requires full
knowledge of the path "cost", which is unlikely to be the case. It knowledge of the path "cost", which is unlikely to be the case. It
would be desirable for a receiver to be able to signal their own would be desirable for a receiver to be able to signal their own
preferences for paths, since they will often be the multihomed party, preferences for paths, since they will often be the multihomed party,
and may have to pay for metered incoming bandwidth. and may have to pay for metered incoming bandwidth.
Whilst fine-grained control may be the most powerful solution, that Whilst fine-grained control may be the most powerful solution, that
would require some mechanism such as overloading the Explicit would require some mechanism such as overloading the Explicit
Congestion Notification (ECN) signal [19], which is undesirable, and Congestion Notification (ECN) signal [20], which is undesirable, and
it is felt that there would not be sufficient benefit to justify an it is felt that there would not be sufficient benefit to justify an
entirely new signal. Therefore, the MP_JOIN option (see Section 3.2) entirely new signal. Therefore, the MP_JOIN option (see Section 3.2)
contains the 'B' bit, which allows a host to indicate to its peer contains the 'B' bit, which allows a host to indicate to its peer
that this path should be treated as a backup path to use only in the that this path should be treated as a backup path to use only in the
event of failure of other working subflows (i.e., a subflow where the event of failure of other working subflows (i.e., a subflow where the
receiver has indicated B=1 SHOULD NOT be used to send data unless receiver has indicated B=1 SHOULD NOT be used to send data unless
there are no usable subflows where B=0). there are no usable subflows where B=0).
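As an illustration of the 'B' bit rule, a sender-side selection step might look like the sketch below. The Subflow record and the eligible_subflows helper are hypothetical names introduced here; the draft does not define a scheduler.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subflow:
    name: str      # label used only for this example
    backup: bool   # True if the peer signalled B=1 for this subflow
    usable: bool   # True if the subflow is currently able to carry data


def eligible_subflows(subflows: List[Subflow]) -> List[Subflow]:
    """Return the subflows a sender may use for data under the B-bit rule."""
    usable = [s for s in subflows if s.usable]
    regular = [s for s in usable if not s.backup]
    # Backup subflows are used only when no usable B=0 subflow remains.
    return regular if regular else usable
```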
In the event that the available set of paths changes, a host may wish In the event that the available set of paths changes, a host may wish
to signal a change in priority of subflows to the peer (e.g., a to signal a change in priority of subflows to the peer (e.g., a
skipping to change at page 36, line 38 skipping to change at page 37, line 38
o An MPTCP connection is initially set up between address/port A1 of o An MPTCP connection is initially set up between address/port A1 of
Host A and address/port B1 of Host B. If Host A is multihomed and Host A and address/port B1 of Host B. If Host A is multihomed and
multiaddressed, it can start an additional subflow from its multiaddressed, it can start an additional subflow from its
address A2 to B1, by sending a SYN with a Join option from A2 to address A2 to B1, by sending a SYN with a Join option from A2 to
B1, using B's previously declared token for this connection. B1, using B's previously declared token for this connection.
Alternatively, if B is multihomed, it can try to set up a new Alternatively, if B is multihomed, it can try to set up a new
subflow from B2 to A1, using A's previously declared token. In subflow from B2 to A1, using A's previously declared token. In
either case, the SYN will be sent to the port already in use for either case, the SYN will be sent to the port already in use for
the original subflow on the receiving host. the original subflow on the receiving host.
o Simultaneously (or after a timeout), an ADD_ADDR2 option o Simultaneously (or after a timeout), an ADD_ADDR option
(Section 3.4.1) is sent on an existing subflow, informing the (Section 3.4.1) is sent on an existing subflow, informing the
receiver of the sender's alternative address(es). The recipient receiver of the sender's alternative address(es). The recipient
can use this information to open a new subflow to the sender's can use this information to open a new subflow to the sender's
additional address. In our example, A will send ADD_ADDR2 option additional address. In our example, A will send ADD_ADDR option
informing B of address/port A2. The mix of using the SYN-based informing B of address/port A2. The mix of using the SYN-based
option and the ADD_ADDR2 option, including timeouts, is option and the ADD_ADDR option, including timeouts, is
implementation specific and can be tailored to agree with local implementation specific and can be tailored to agree with local
policy. policy.
o If subflow A2-B1 is successfully set up, Host B can use the o If subflow A2-B1 is successfully set up, Host B can use the
Address ID in the Join option to correlate this with the ADD_ADDR2 Address ID in the Join option to correlate this with the ADD_ADDR
option that will also arrive on an existing subflow; now B knows option that will also arrive on an existing subflow; now B knows
not to open A2-B1, ignoring the ADD_ADDR2. Otherwise, if B has not to open A2-B1, ignoring the ADD_ADDR. Otherwise, if B has not
not received the A2-B1 MP_JOIN SYN but received the ADD_ADDR2, it received the A2-B1 MP_JOIN SYN but received the ADD_ADDR, it can
can try to initiate a new subflow from one or more of its try to initiate a new subflow from one or more of its addresses to
addresses to address A2. This permits new sessions to be opened address A2. This permits new sessions to be opened if one host is
if one host is behind a NAT. behind a NAT.
Other ways of using the two signaling mechanisms are possible; for Other ways of using the two signaling mechanisms are possible; for
instance, signaling addresses in other address families can only be instance, signaling addresses in other address families can only be
done explicitly using the Add Address option. done explicitly using the Add Address option.
3.4.1. Address Advertisement 3.4.1. Address Advertisement
The Add Address (ADD_ADDR2) MPTCP option announces additional The Add Address (ADD_ADDR) MPTCP option announces additional
addresses (and optionally, ports) on which a host can be reached addresses (and optionally, ports) on which a host can be reached
(Figure 12). This option can be used at any time during a (Figure 12). This option can be used at any time during a
connection, depending on when the sender wishes to enable multiple connection, depending on when the sender wishes to enable multiple
paths and/or when paths become available. As with all MPTCP signals, paths and/or when paths become available. As with all MPTCP signals,
the receiver MUST undertake standard TCP validity checks, e.g. [12], the receiver MUST undertake standard TCP validity checks, e.g. [12],
before acting upon it. before acting upon it.
Every address has an Address ID that can be used for uniquely Every address has an Address ID that can be used for uniquely
identifying the address within a connection for address removal. identifying the address within a connection for address removal.
This is also used to identify MP_JOIN options (see Section 3.2) This is also used to identify MP_JOIN options (see Section 3.2)
relating to the same address, even when address translators are in relating to the same address, even when address translators are in
use. The Address ID MUST uniquely identify the address to the sender use. The Address ID MUST uniquely identify the address to the sender
(within the scope of the connection), but the mechanism for (within the scope of the connection), but the mechanism for
allocating such IDs is implementation specific. allocating such IDs is implementation specific.
All address IDs learned via either MP_JOIN or ADD_ADDR2 SHOULD be All address IDs learned via either MP_JOIN or ADD_ADDR SHOULD be
stored by the receiver in a data structure that gathers all the stored by the receiver in a data structure that gathers all the
Address ID to address mappings for a connection (identified by a Address ID to address mappings for a connection (identified by a
token pair). In this way, there is a stored mapping between Address token pair). In this way, there is a stored mapping between Address
ID, observed source address, and token pair for future processing of ID, observed source address, and token pair for future processing of
control information for a connection. Note that an implementation control information for a connection. Note that an implementation
MAY discard incoming address advertisements at will, for example, for MAY discard incoming address advertisements at will, for example, for
avoiding the required mapping state, or because advertised addresses avoiding the required mapping state, or because advertised addresses
are of no use to it (for example, IPv6 addresses when it has IPv4 are of no use to it (for example, IPv6 addresses when it has IPv4
only). Therefore, a host MUST treat address advertisements as soft only). Therefore, a host MUST treat address advertisements as soft
state, and it MAY choose to refresh advertisements periodically. state, and it MAY choose to refresh advertisements periodically.
This option is shown in Figure 12. The illustration is sized for This option is shown in Figure 12. The illustration is sized for
IPv4 addresses (IPVer = 4). For IPv6, the IPVer field will read 6, IPv4 addresses. For IPv6, the length of the address will be 16
and the length of the address will be 16 octets (instead of 4). octets (instead of 4).
The 2 octets that specify the TCP port number to use are optional and The 2 octets that specify the TCP port number to use are optional and
their presence can be inferred from the length of the option. their presence can be inferred from the length of the option.
Although it is expected that the majority of use cases will use the Although it is expected that the majority of use cases will use the
same port pairs as used for the initial subflow (e.g., port 80 same port pairs as used for the initial subflow (e.g., port 80
remains port 80 on all subflows, as does the ephemeral port at the remains port 80 on all subflows, as does the ephemeral port at the
client), there may be cases (such as port-based load balancing) where client), there may be cases (such as port-based load balancing) where
the explicit specification of a different port is required. If no the explicit specification of a different port is required. If no
port is specified, MPTCP SHOULD attempt to connect to the specified port is specified, MPTCP SHOULD attempt to connect to the specified
address on the same port as is already in use by the subflow on which address on the same port as is already in use by the subflow on which
the ADD_ADDR2 signal was sent; this is discussed in more detail in the ADD_ADDR signal was sent; this is discussed in more detail in
Section 3.9. Section 3.10.
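The following sketch shows how a receiver could infer the address family and the presence of the port from the option length alone, as the -05 text requires once the IPVer field is gone. The four accepted lengths (16, 18, 28, 30) are derived here from the field sizes in Figure 12 and are an assumption of this example, as is the function name parse_add_addr.

```python
import ipaddress
import struct

# Total option length -> (address length in octets, port present?).
# Derived from Figure 12: 4 header octets + address + optional port + 8 HMAC.
_LAYOUTS = {16: (4, False), 18: (4, True), 28: (16, False), 30: (16, True)}


def parse_add_addr(option: bytes):
    """Split a raw ADD_ADDR option into (address_id, address, port, hmac)."""
    if len(option) < 2 or option[1] != len(option) or option[1] not in _LAYOUTS:
        raise ValueError("unexpected ADD_ADDR length")
    addr_len, has_port = _LAYOUTS[option[1]]
    address_id = option[3]
    pos = 4
    address = ipaddress.ip_address(option[pos:pos + addr_len])
    pos += addr_len
    port = None
    if has_port:
        (port,) = struct.unpack("!H", option[pos:pos + 2])
        pos += 2
    truncated_hmac = option[pos:pos + 8]
    return address_id, address, port, truncated_hmac
```

When port comes back as None, the caller would fall back to the port already in use on the subflow that carried the option.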
The Truncated HMAC present in this Option is the rightmost 64 bits of The Truncated HMAC present in this Option is the rightmost 64 bits of
an HMAC, negotiated and calculated in the same way as for MP_JOIN as an HMAC, negotiated and calculated in the same way as for MP_JOIN as
described in Section 3.2. For this specification of MPTCP, as there described in Section 3.2. For this specification of MPTCP, as there
is only one hash algorithm option specified, this will be HMAC as is only one hash algorithm option specified, this will be HMAC as
defined in [11], using the SHA-1 hash algorithm [4], implemented as defined in [11], using the SHA-1 hash algorithm [4], implemented as
in [17]. The key used in the HMAC calculation is that of the sender, in [18]. In the same way as for MP_JOIN, the key for the HMAC
as originally declared in the MP_CAPABLE handshake. The message for algorithm, in the case of the message transmitted by Host A, will be
the HMAC is the Address ID, IP Address, and Port which precede the Key-A followed by Key-B, and in the case of Host B, Key-B followed by
HMAC in the ADD_ADDR2 option. The rationale for the HMAC is to Key-A. These are the keys that were exchanged in the original
prevent unauthorized entities from injecting ADD_ADDR2 signals in an MP_CAPABLE handshake. The message for the HMAC is the Address ID, IP
attempt to hijack a connection. Note that additionally the presence Address, and Port which precede the HMAC in the ADD_ADDR option. If
of this HMAC prevents the address being changed in flight unless the the port is not present in the ADD_ADDR option, the HMAC message will
key is known by an intermediary. If a host receives an ADD_ADDR2 nevertheless include two octets of value zero. The rationale for the
option for which it cannot validate the HMAC, it SHOULD silently HMAC is to prevent unauthorized entities from injecting ADD_ADDR
ignore the option. signals in an attempt to hijack a connection. Note that additionally
the presence of this HMAC prevents the address being changed in
flight unless the key is known by an intermediary. If a host
receives an ADD_ADDR option for which it cannot validate the HMAC, it
SHOULD silently ignore the option.
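A minimal sketch of the Truncated HMAC computation, following the -05 (right-hand) text: the key is the sender's key followed by the peer's key, and the message is the Address ID, IP address, and port, with two zero octets standing in for an absent port. The function name and argument layout are assumptions made for this example, and the one-octet encoding of the Address ID is taken from Figure 12.

```python
import hashlib
import hmac
import ipaddress
import struct
from typing import Optional


def add_addr_truncated_hmac(local_key: bytes, remote_key: bytes,
                            address_id: int, address: str,
                            port: Optional[int] = None) -> bytes:
    """Return the rightmost 64 bits of HMAC-SHA1 over the ADD_ADDR fields."""
    ip = ipaddress.ip_address(address)
    # Message: Address ID (1 octet), address (4 or 16 octets), port (2 octets,
    # zero if the option carries no port).
    message = struct.pack("!B", address_id) + ip.packed + struct.pack("!H", port or 0)
    # Key: sender's key followed by the peer's key (Key-A || Key-B for Host A).
    digest = hmac.new(local_key + remote_key, message, hashlib.sha1).digest()
    return digest[-8:]
```

A receiver would compute the same value with the key order seen from the sender's side and compare it with hmac.compare_digest before acting on the option.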
A set of four flags are present after the subtype and before the
Address ID. These are currently unassigned and MUST be set to zero
by a sender and MUST be ignored by the receiver.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+-------+---------------+ +---------------+---------------+-------+-------+---------------+
| Kind | Length |Subtype| IPVer | Address ID | | Kind | Length |Subtype|(resvd)| Address ID |
+---------------+---------------+-------+-------+---------------+ +---------------+---------------+-------+-------+---------------+
| Address (IPv4 - 4 octets / IPv6 - 16 octets) | | Address (IPv4 - 4 octets / IPv6 - 16 octets) |
+-------------------------------+-------------------------------+ +-------------------------------+-------------------------------+
| Port (2 octets, optional) | | | Port (2 octets, optional) | |
+-------------------------------+ | +-------------------------------+ |
| Truncated HMAC (8 octets) | | Truncated HMAC (8 octets) |
| +-------------------------------+ | +-------------------------------+
| | | |
+-------------------------------+ +-------------------------------+
Figure 12: Add Address (ADD_ADDR2) Option Figure 12: Add Address (ADD_ADDR) Option
Due to the proliferation of NATs, it is reasonably likely that one Due to the proliferation of NATs, it is reasonably likely that one
host may attempt to advertise private addresses [20]. It is not host may attempt to advertise private addresses [21]. It is not
desirable to prohibit this, since there may be cases where both hosts desirable to prohibit this, since there may be cases where both hosts
have additional interfaces on the same private network, and a host have additional interfaces on the same private network, and a host
MAY want to advertise such addresses. The MP_JOIN handshake to MAY want to advertise such addresses. The MP_JOIN handshake to
create a new subflow (Section 3.2) provides mechanisms to minimize create a new subflow (Section 3.2) provides mechanisms to minimize
security risks. The MP_JOIN message contains a 32-bit token that security risks. The MP_JOIN message contains a 32-bit token that
uniquely identifies the connection to the receiving host. If the uniquely identifies the connection to the receiving host. If the
token is unknown, the host will return with a RST. In the unlikely token is unknown, the host will return with a RST. In the unlikely
event that the token is known, subflow setup will continue, but the event that the token is known, subflow setup will continue, but the
HMAC exchange must occur for authentication. This will fail, and HMAC exchange must occur for authentication. This will fail, and
will provide sufficient protection against two unconnected hosts will provide sufficient protection against two unconnected hosts
accidentally setting up a new subflow upon the signal of a private accidentally setting up a new subflow upon the signal of a private
address. Further security considerations around the issue of address. Further security considerations around the issue of
ADD_ADDR2 messages that accidentally misdirect, or maliciously ADD_ADDR messages that accidentally misdirect, or maliciously direct,
direct, new MP_JOIN attempts are discussed in Section 5. new MP_JOIN attempts are discussed in Section 5.
Ideally, ADD_ADDR2 and REMOVE_ADDR options would be sent reliably, Ideally, ADD_ADDR and REMOVE_ADDR options would be sent reliably, and
and in order, to the other end. This would ensure that this address in order, to the other end. This would ensure that this address
management does not unnecessarily cause an outage in the connection management does not unnecessarily cause an outage in the connection
when remove/add addresses are processed in reverse order, and also to when remove/add addresses are processed in reverse order, and also to
ensure that all possible paths are used. Note, however, that losing ensure that all possible paths are used. Note, however, that losing
reliability and ordering will not break the multipath connections, it reliability and ordering will not break the multipath connections, it
will just reduce the opportunity to open multipath paths and to will just reduce the opportunity to open multipath paths and to
survive different patterns of path failures. survive different patterns of path failures.
Therefore, implementing reliability signals for these MPTCP options Therefore, implementing reliability signals for these MPTCP options
is not necessary. In order to minimize the impact of the loss of is not necessary. In order to minimize the impact of the loss of
these options, however, it is RECOMMENDED that a sender should send these options, however, it is RECOMMENDED that a sender should send
these options on all available subflows. If these options need to be these options on all available subflows. If these options need to be
received in order, an implementation SHOULD only send one ADD_ADDR2/ received in order, an implementation SHOULD only send one ADD_ADDR/
REMOVE_ADDR option per RTT, to minimize the risk of misordering. REMOVE_ADDR option per RTT, to minimize the risk of misordering.
A host can send an ADD_ADDR2 message with an already assigned Address A host can send an ADD_ADDR message with an already assigned Address
ID, but the Address MUST be the same as previously assigned to this ID, but the Address MUST be the same as previously assigned to this
Address ID, and the Port MUST be different from one already in use Address ID, and the Port MUST be different from one already in use
for this Address ID. If these conditions are not met, the receiver for this Address ID. If these conditions are not met, the receiver
SHOULD silently ignore the ADD_ADDR2. A host wishing to replace an SHOULD silently ignore the ADD_ADDR. A host wishing to replace an
existing Address ID MUST first remove the existing one existing Address ID MUST first remove the existing one
(Section 3.4.2). (Section 3.4.2).
A host that receives an ADD_ADDR2 but finds a connection set up to A host that receives an ADD_ADDR but finds a connection set up to
that IP address and port number is unsuccessful SHOULD NOT perform that IP address and port number is unsuccessful SHOULD NOT perform
further connection attempts to this address/port combination for this further connection attempts to this address/port combination for this
connection. A sender that wants to trigger a new incoming connection connection. A sender that wants to trigger a new incoming connection
attempt on a previously advertised address/port combination can attempt on a previously advertised address/port combination can
therefore refresh ADD_ADDR2 information by sending the option again. therefore refresh ADD_ADDR information by sending the option again.
During normal MPTCP operation, it is unlikely that there will be During normal MPTCP operation, it is unlikely that there will be
sufficient TCP option space for ADD_ADDR2 to be included along with sufficient TCP option space for ADD_ADDR to be included along with
those for data sequence numbering (Section 3.3.1). Therefore, it is those for data sequence numbering (Section 3.3.1). Therefore, it is
expected that an MPTCP implementation will send the ADD_ADDR2 option expected that an MPTCP implementation will send the ADD_ADDR option
on separate ACKs. As discussed earlier, however, an MPTCP on separate ACKs. As discussed earlier, however, an MPTCP
implementation MUST NOT treat duplicate ACKs with any MPTCP option, implementation MUST NOT treat duplicate ACKs with any MPTCP option,
with the exception of the DSS option, as indications of congestion with the exception of the DSS option, as indications of congestion
[14], and an MPTCP implementation SHOULD NOT send more than two [14], and an MPTCP implementation SHOULD NOT send more than two
duplicate ACKs in a row for signaling purposes. duplicate ACKs in a row for signaling purposes.
3.4.2. Remove Address 3.4.2. Remove Address
If, during the lifetime of an MPTCP connection, a previously If, during the lifetime of an MPTCP connection, a previously
announced address becomes invalid (e.g., if the interface announced address becomes invalid (e.g., if the interface
skipping to change at page 40, line 20 skipping to change at page 41, line 27
can remove subflows related to this address. can remove subflows related to this address.
This is achieved through the Remove Address (REMOVE_ADDR) option This is achieved through the Remove Address (REMOVE_ADDR) option
(Figure 13), which will remove a previously added address (or list of (Figure 13), which will remove a previously added address (or list of
addresses) from a connection and terminate any subflows currently addresses) from a connection and terminate any subflows currently
using that address. using that address.
For security purposes, if a host receives a REMOVE_ADDR option, it For security purposes, if a host receives a REMOVE_ADDR option, it
must ensure the affected path(s) are no longer in use before it must ensure the affected path(s) are no longer in use before it
instigates closure. The receipt of REMOVE_ADDR SHOULD first trigger instigates closure. The receipt of REMOVE_ADDR SHOULD first trigger
the sending of a TCP keepalive [21] on the path, and if a response is the sending of a TCP keepalive [22] on the path, and if a response is
received the path SHOULD NOT be removed. Typical TCP validity tests received the path SHOULD NOT be removed. Typical TCP validity tests
on the subflow (e.g., ensuring sequence and ACK numbers are correct) on the subflow (e.g., ensuring sequence and ACK numbers are correct)
MUST also be undertaken. An implementation can use indications of MUST also be undertaken. An implementation can use indications of
these test failures as part of intrusion detection or error logging. these test failures as part of intrusion detection or error logging.
The sending and receipt (if no keepalive response was received) of The sending and receipt (if no keepalive response was received) of
this message SHOULD trigger the sending of RSTs by both hosts on the this message SHOULD trigger the sending of RSTs by both hosts on the
affected subflow(s) (if possible), as a courtesy to cleaning up affected subflow(s) (if possible), as a courtesy to cleaning up
middlebox state, before cleaning up any local state. middlebox state, before cleaning up any local state.
skipping to change at page 44, line 5 skipping to change at page 45, line 13
reset and start again than it is to retransmit the queued data. reset and start again than it is to retransmit the queued data.
o Unacceptable performance (code 0x05). This code indicates that o Unacceptable performance (code 0x05). This code indicates that
the performance of this subflow was too low compared to the other the performance of this subflow was too low compared to the other
subflows of this Multipath TCP connection. subflows of this Multipath TCP connection.
o Middlebox interference (code 0x06). Middlebox interference has o Middlebox interference (code 0x06). Middlebox interference has
been detected over this subflow making MPTCP signaling invalid. been detected over this subflow making MPTCP signaling invalid.
For example, this may be sent if the checksum does not validate. For example, this may be sent if the checksum does not validate.
3.7. Fallback 3.7. MPTCP Experimental Option
In order to provide a structured identity and negotiation mechanism
for private experimental MPTCP extensions, the MP_EXPERIMENTAL option
has been reserved.
1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+-------+---------------+
| Kind | Length |Subtype|S|U|rsv| Experiment |
+---------------+---------------+-------+-------+---------------+
| Id. (16 bits) | Subtype-specific data (variable length) ...
+----------------------------------------------------------- ...
Figure 16: MPTCP Experimental (MP_EXPERIMENTAL) Option
Figure 16 shows the format of the experimental option. The
Experiment identifier is a 16-bit integer that shall be assigned by
using the same procedure as defined in [23].
The two high order flags that are included in the MPTCP Experimental
option have the following semantics:
o "S" flag (highest order bit) : This is the synchronising bit.
When set to 1, it indicates that the host sending this option
expects a reply from the remote host with an option having the
same experiment identifier, but possibly containing other data.
o "U" flag (second highest order bit) : When set to 1, this flag
indicates that the experimental option was received by the sending
host but it was unable to parse it.
The two low order flags are currently reserved for further use. They
MUST be set to zero when sending and ignored upon reception.
To use the Experimental MPTCP option with a given experiment
identifier over an MPTCP connection, the sending host must first
verify the ability of the remote host to support this particular
Experimental option. For this, it first sends in any valid TCP
segment, including a duplicate acknowledgement, an Experimental MPTCP
option with the "S" flag set. Upon reception of this option, the
receiving host will verify whether it supports it. If yes, it shall
return a TCP segment that contains the experimental option with the
same identifier and the "S" and the "U" flags both set to 1. This
option may contain additional data depending on the semantics of the
extension. If the receiving host does not recognise the experimental
option that it has received, it shall return a TCP segment that
contains the received experimental option with the "S" flag set to 0
and the "U" flag set to 1.
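The capability check can be sketched from the responder's side as below. This only covers the case described in the paragraph above (an incoming option with "S" set and "U" clear). Returning None for anything else is an assumption of this sketch, made to avoid reply loops, and the name capability_reply is hypothetical. The subtype nibble is copied from the received option rather than assumed.

```python
from typing import Iterable, Optional

S_FLAG = 0x8  # highest-order bit of the 4-bit flags field
U_FLAG = 0x4  # second-highest-order bit; the two low bits are reserved


def capability_reply(option: bytes, supported_ids: Iterable[int],
                     reply_data: bytes = b"") -> Optional[bytes]:
    """Build the reply to an MP_EXPERIMENTAL capability check, if one is due."""
    kind, length, sub_flags = option[0], option[1], option[2]
    flags = sub_flags & 0x0F
    experiment_id = int.from_bytes(option[3:5], "big")
    if not (flags & S_FLAG) or (flags & U_FLAG):
        return None  # not a fresh capability check; outside this sketch
    subtype_bits = sub_flags & 0xF0
    if experiment_id in supported_ids:
        # Supported: same identifier, S=1 and U=1, optional extension data.
        body = option[3:5] + reply_data
        return bytes([kind, 3 + len(body), subtype_bits | S_FLAG | U_FLAG]) + body
    # Not recognised: echo the received option with S=0 and U=1.
    return bytes([kind, length, subtype_bits | U_FLAG]) + option[3:]
```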
If a host receives an Experimental MPTCP option with the "U" flag set
to 0 which it does not support, or which contains information that
the host cannot parse, it shall return the exact option that it
received with the "U" flag set to 1 to indicate the error to the
remote host. If an invalid option is received with the "U" flag set
to 1, it must be silently discarded.
Future documents specifying new experimental MPTCP options should
specify the exact semantics of the Subtype-specific data and whether
additional validation operations are to be followed at both sides.
It should be noted that data can be included in an experimental
option concurrently with the capability check (S/U).
3.8. Fallback
Sometimes, middleboxes will exist on a path that could prevent the Sometimes, middleboxes will exist on a path that could prevent the
operation of MPTCP. MPTCP has been designed in order to cope with operation of MPTCP. MPTCP has been designed in order to cope with
many middlebox modifications (see Section 6), but there are still many middlebox modifications (see Section 6), but there are still
some cases where a subflow could fail to operate within the MPTCP some cases where a subflow could fail to operate within the MPTCP
requirements. These cases are notably the following: the loss of requirements. These cases are notably the following: the loss of
MPTCP options on a path and the modification of payload data. If MPTCP options on a path and the modification of payload data. If
such an event occurs, it is necessary to "fall back" to the previous, such an event occurs, it is necessary to "fall back" to the previous,
safe operation. This may be either falling back to regular TCP or safe operation. This may be either falling back to regular TCP or
removing a problematic subflow. removing a problematic subflow.
skipping to change at page 46, line 16 skipping to change at page 48, line 39
tampered with. tampered with.
When multiple subflows are in use, the data in flight on a subflow When multiple subflows are in use, the data in flight on a subflow
will likely involve data that is not contiguously part of the will likely involve data that is not contiguously part of the
connection-level stream, since segments will be spread across the connection-level stream, since segments will be spread across the
multiple subflows. Due to the problems identified above, it is not multiple subflows. Due to the problems identified above, it is not
possible to determine what the adjustment has done to the data possible to determine what the adjustment has done to the data
(notably, any changes to the subflow sequence numbering). Therefore, (notably, any changes to the subflow sequence numbering). Therefore,
it is not possible to recover the subflow, and the affected subflow it is not possible to recover the subflow, and the affected subflow
must be immediately closed with a RST, featuring an MP_FAIL option must be immediately closed with a RST, featuring an MP_FAIL option
(Figure 16), which defines the data sequence number at the start of (Figure 17), which defines the data sequence number at the start of
the segment (defined by the data sequence mapping) that had the the segment (defined by the data sequence mapping) that had the
checksum failure. Note that the MP_FAIL option requires the use of checksum failure. Note that the MP_FAIL option requires the use of
the full 64-bit sequence number, even if 32-bit sequence numbers are the full 64-bit sequence number, even if 32-bit sequence numbers are
normally in use in the DSS signals on the path. normally in use in the DSS signals on the path.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+----------------------+ +---------------+---------------+-------+----------------------+
| Kind | Length=12 |Subtype| (reserved) | | Kind | Length=12 |Subtype| (reserved) |
+---------------+---------------+-------+----------------------+ +---------------+---------------+-------+----------------------+
| | | |
| Data Sequence Number (8 octets) | | Data Sequence Number (8 octets) |
| | | |
+--------------------------------------------------------------+ +--------------------------------------------------------------+
Figure 16: Fallback (MP_FAIL) Option Figure 17: Fallback (MP_FAIL) Option
The receiver MUST discard all data following the data sequence number The receiver MUST discard all data following the data sequence number
specified. Failed data MUST NOT be DATA_ACKed and so will be specified. Failed data MUST NOT be DATA_ACKed and so will be
retransmitted on other subflows (Section 3.3.6). retransmitted on other subflows (Section 3.3.6).
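As an illustration of the layout in the MP_FAIL figure above, the following minimal sketch packs and parses the 12-octet option. The function names are hypothetical. The Kind value (30) and the MP_FAIL subtype (0x6) are taken from the IANA tables later in this document, and the 12 reserved bits are assumed to be sent as zero.

```python
import struct

MPTCP_KIND = 30         # TCP option kind for MPTCP (Table 1)
MP_FAIL_SUBTYPE = 0x6   # MP_FAIL subtype (Table 2)
MP_FAIL_LENGTH = 12     # fixed length shown in the figure


def build_mp_fail(dsn: int) -> bytes:
    """Encode an MP_FAIL option carrying a full 64-bit data sequence number."""
    if not 0 <= dsn < 2**64:
        raise ValueError("data sequence number must fit in 64 bits")
    # The 4-bit subtype occupies the top of a 16-bit field; the remaining
    # 12 bits are reserved and assumed to be zero in this sketch.
    return struct.pack("!BBHQ", MPTCP_KIND, MP_FAIL_LENGTH,
                       MP_FAIL_SUBTYPE << 12, dsn)


def parse_mp_fail(option: bytes) -> int:
    """Return the data sequence number from an MP_FAIL option."""
    if len(option) != MP_FAIL_LENGTH:
        raise ValueError("MP_FAIL must be exactly 12 octets")
    kind, length, subtype_and_reserved, dsn = struct.unpack("!BBHQ", option)
    if kind != MPTCP_KIND or length != MP_FAIL_LENGTH:
        raise ValueError("not an MPTCP option of length 12")
    if subtype_and_reserved >> 12 != MP_FAIL_SUBTYPE:
        raise ValueError("not an MP_FAIL option")
    return dsn
```

The 64-bit field is always used here, matching the requirement that MP_FAIL carries the full data sequence number even when 32-bit values are in use in DSS.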
A special case is when there is a single subflow and it fails with a A special case is when there is a single subflow and it fails with a
checksum error. If it is known that all unacknowledged data in checksum error. If it is known that all unacknowledged data in
flight is contiguous (which will usually be the case with a single flight is contiguous (which will usually be the case with a single
subflow), an infinite mapping can be applied to the subflow without subflow), an infinite mapping can be applied to the subflow without
the need to close it first, and essentially turn off all further the need to close it first, and essentially turn off all further
skipping to change at page 47, line 43 skipping to change at page 50, line 25
otherwise, the receiver would not know how to reorder the data. In otherwise, the receiver would not know how to reorder the data. In
practice, this means that all MPTCP subflows will have to be practice, this means that all MPTCP subflows will have to be
terminated except one. Once MPTCP falls back to regular TCP, it MUST terminated except one. Once MPTCP falls back to regular TCP, it MUST
NOT revert to MPTCP later in the connection. NOT revert to MPTCP later in the connection.
It should be emphasized that we are not attempting to prevent the use It should be emphasized that we are not attempting to prevent the use
of middleboxes that want to adjust the payload. An MPTCP-aware of middleboxes that want to adjust the payload. An MPTCP-aware
middlebox could provide such functionality by also rewriting middlebox could provide such functionality by also rewriting
checksums. checksums.
3.8. Error Handling 3.9. Error Handling
In addition to the fallback mechanism as described above, the In addition to the fallback mechanism as described above, the
standard classes of TCP errors may need to be handled in an MPTCP- standard classes of TCP errors may need to be handled in an MPTCP-
specific way. Note that changing semantics -- such as the relevance specific way. Note that changing semantics -- such as the relevance
of a RST -- are covered in Section 4. Where possible, we do not want of a RST -- are covered in Section 4. Where possible, we do not want
to deviate from regular TCP behavior. to deviate from regular TCP behavior.
The following list covers possible errors and the appropriate MPTCP The following list covers possible errors and the appropriate MPTCP
behavior: behavior:
o Unknown token in MP_JOIN (or HMAC failure in MP_JOIN ACK, or o Unknown token in MP_JOIN (or HMAC failure in MP_JOIN ACK, or
missing MP_JOIN in SYN/ACK response): send RST (analogous to TCP's missing MP_JOIN in SYN/ACK response): send RST (analogous to TCP's
behavior on an unknown port) behavior on an unknown port)
o DSN out of window (during normal operation): drop the data, do not o DSN out of window (during normal operation): drop the data, do not
send Data ACKs send Data ACKs
o Remove request for unknown address ID: silently ignore o Remove request for unknown address ID: silently ignore
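The list above can be read as a simple mapping from error condition to reaction. The sketch below expresses it that way; the enum and function names are hypothetical and are not part of the protocol.

```python
from enum import Enum, auto


class MptcpError(Enum):
    UNKNOWN_JOIN_TOKEN = auto()         # unknown token in MP_JOIN
    JOIN_HMAC_FAILURE = auto()          # HMAC failure in MP_JOIN ACK
    MISSING_JOIN_IN_SYNACK = auto()     # no MP_JOIN in SYN/ACK response
    DSN_OUT_OF_WINDOW = auto()          # during normal operation
    REMOVE_UNKNOWN_ADDRESS_ID = auto()  # remove request for unknown ID


class Action(Enum):
    SEND_RST = auto()
    DROP_DATA_NO_DATA_ACK = auto()
    SILENTLY_IGNORE = auto()


# Direct transcription of the list above.
_ACTIONS = {
    MptcpError.UNKNOWN_JOIN_TOKEN: Action.SEND_RST,
    MptcpError.JOIN_HMAC_FAILURE: Action.SEND_RST,
    MptcpError.MISSING_JOIN_IN_SYNACK: Action.SEND_RST,
    MptcpError.DSN_OUT_OF_WINDOW: Action.DROP_DATA_NO_DATA_ACK,
    MptcpError.REMOVE_UNKNOWN_ADDRESS_ID: Action.SILENTLY_IGNORE,
}


def action_for(error: MptcpError) -> Action:
    """Look up the reaction listed for a given error condition."""
    return _ACTIONS[error]
```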
3.9. Heuristics 3.10. Heuristics
There are a number of heuristics that are needed for performance or There are a number of heuristics that are needed for performance or
deployment but that are not required for protocol correctness. In deployment but that are not required for protocol correctness. In
this section, we detail such heuristics. Note that discussion of this section, we detail such heuristics. Note that discussion of
buffering and certain sender and receiver window behaviors are buffering and certain sender and receiver window behaviors are
presented in Sections 3.3.4 and 3.3.5, as well as retransmission in presented in Sections 3.3.4 and 3.3.5, as well as retransmission in
Section 3.3.6. Section 3.3.6.
3.9.1. Port Usage 3.10.1. Port Usage
Under typical operation, an MPTCP implementation SHOULD use the same Under typical operation, an MPTCP implementation SHOULD use the same
ports as already in use. In other words, the destination port of a ports as already in use. In other words, the destination port of a
SYN containing an MP_JOIN option SHOULD be the same as the remote SYN containing an MP_JOIN option SHOULD be the same as the remote
port of the first subflow in the connection. The local port for such port of the first subflow in the connection. The local port for such
SYNs SHOULD also be the same as for the first subflow (and as such, SYNs SHOULD also be the same as for the first subflow (and as such,
an implementation SHOULD reserve ephemeral ports across all local IP an implementation SHOULD reserve ephemeral ports across all local IP
addresses), although there may be cases where this is infeasible. addresses), although there may be cases where this is infeasible.
This strategy is intended to maximize the probability of the SYN This strategy is intended to maximize the probability of the SYN
being permitted by a firewall or NAT at the recipient and to avoid being permitted by a firewall or NAT at the recipient and to avoid
confusing any network monitoring software. confusing any network monitoring software.
There may also be cases, however, where the passive opener wishes to There may also be cases, however, where the passive opener wishes to
signal to the other host that a specific port should be used, and signal to the other host that a specific port should be used, and
this facility is provided in the Add Address option as documented in this facility is provided in the Add Address option as documented in
Section 3.4.1. It is therefore feasible to allow multiple subflows Section 3.4.1. It is therefore feasible to allow multiple subflows
between the same two addresses but using different port pairs, and between the same two addresses but using different port pairs, and
such a facility could be used to allow load balancing within the such a facility could be used to allow load balancing within the
network based on 5-tuples (e.g., some ECMP implementations [8]). network based on 5-tuples (e.g., some ECMP implementations [8]).
3.9.2. Delayed Subflow Start and Subflow Symmetry 3.10.2. Delayed Subflow Start and Subflow Symmetry
Many TCP connections are short-lived and consist only of a few Many TCP connections are short-lived and consist only of a few
segments, and so the overheads of using MPTCP outweigh any benefits. segments, and so the overheads of using MPTCP outweigh any benefits.
A heuristic is required, therefore, to decide when to start using A heuristic is required, therefore, to decide when to start using
additional subflows in an MPTCP connection. We expect that additional subflows in an MPTCP connection. We expect that
experience gathered from deployments will provide further guidance on experience gathered from deployments will provide further guidance on
this, and will be affected by particular application characteristics this, and will be affected by particular application characteristics
(which are likely to change over time). However, a suggested (which are likely to change over time). However, a suggested
general-purpose heuristic that an implementation MAY choose to employ general-purpose heuristic that an implementation MAY choose to employ
is as follows. Results from experimental deployments are needed in is as follows. Results from experimental deployments are needed in
skipping to change at page 50, line 5 skipping to change at page 52, line 33
is RECOMMENDED that some element of randomization is applied to the is RECOMMENDED that some element of randomization is applied to the
time waited before opening new subflows, so that only one subflow time waited before opening new subflows, so that only one subflow
exists between a given address pair. If, however, hosts signal exists between a given address pair. If, however, hosts signal
additional ports to use (for example, for leveraging ECMP on-path), additional ports to use (for example, for leveraging ECMP on-path),
this heuristic need not apply. this heuristic need not apply.
This section has shown some of the considerations that an implementer This section has shown some of the considerations that an implementer
should give when developing MPTCP heuristics, but is not intended to should give when developing MPTCP heuristics, but is not intended to
be prescriptive. be prescriptive.
3.9.3. Failure Handling 3.10.3. Failure Handling
Requirements for MPTCP's handling of unexpected signals have been Requirements for MPTCP's handling of unexpected signals have been
given in Section 3.8. There are other failure cases, however, where given in Section 3.9. There are other failure cases, however, where
a host can choose appropriate behavior. a host can choose appropriate behavior.
For example, Section 3.1 suggests that a host SHOULD fall back to For example, Section 3.1 suggests that a host SHOULD fall back to
trying regular TCP SYNs after one or more failures of MPTCP SYNs for trying regular TCP SYNs after one or more failures of MPTCP SYNs for
a connection. A host may keep a system-wide cache of such a connection. A host may keep a system-wide cache of such
information, so that it can back off from using MPTCP, firstly for information, so that it can back off from using MPTCP, firstly for
that particular destination host, and eventually on a whole that particular destination host, and eventually on a whole
interface, if MPTCP connections continue failing. interface, if MPTCP connections continue failing.
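One possible shape for such a system-wide cache is sketched below. The class name, method names, and both thresholds are assumptions made for illustration; this document does not specify when a host should stop trying MPTCP.

```python
from collections import Counter


class MptcpFallbackCache:
    """Tracks MPTCP SYN failures per destination and per interface."""

    def __init__(self, host_threshold: int = 3, interface_threshold: int = 10):
        # Both thresholds are illustrative values, not taken from the text.
        self.host_threshold = host_threshold
        self.interface_threshold = interface_threshold
        self._host_failures = Counter()
        self._interface_failures = Counter()

    def record_failure(self, interface: str, destination: str) -> None:
        self._host_failures[destination] += 1
        self._interface_failures[interface] += 1

    def record_success(self, interface: str, destination: str) -> None:
        # A working MPTCP connection clears the history for both keys.
        self._host_failures.pop(destination, None)
        self._interface_failures.pop(interface, None)

    def should_try_mptcp(self, interface: str, destination: str) -> bool:
        # Back off for the whole interface once failures keep accumulating,
        # otherwise only for the particular destination host.
        if self._interface_failures[interface] >= self.interface_threshold:
            return False
        return self._host_failures[destination] < self.host_threshold
```

A real implementation would probably also age entries out over time, so that a path that starts supporting MPTCP is eventually retried.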
Another failure could occur when the MP_JOIN handshake fails. Another failure could occur when the MP_JOIN handshake fails.
Section 3.8 specifies that an incorrect handshake MUST lead to the Section 3.9 specifies that an incorrect handshake MUST lead to the
subflow being closed with a RST. A host operating an active subflow being closed with a RST. A host operating an active
intrusion detection system may choose to start blocking MP_JOIN intrusion detection system may choose to start blocking MP_JOIN
packets from the source host if multiple failed MP_JOIN attempts are packets from the source host if multiple failed MP_JOIN attempts are
seen. From the connection initiator's point of view, if an MP_JOIN seen. From the connection initiator's point of view, if an MP_JOIN
fails, it SHOULD NOT attempt to connect to the same IP address and fails, it SHOULD NOT attempt to connect to the same IP address and
port during the lifetime of the connection, unless the other host port during the lifetime of the connection, unless the other host
refreshes the information with another ADD_ADDR2 option. Note that refreshes the information with another ADD_ADDR option. Note that
the ADD_ADDR2 option is informational only, and does not guarantee the ADD_ADDR option is informational only, and does not guarantee the
the other host will attempt a connection. other host will attempt a connection.
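The initiator-side rule in the paragraph above (no retry to a failed address and port until the peer re-advertises it) could be tracked per connection roughly as follows. The class and method names are hypothetical; "advertised" refers to the ADD_ADDR option (named ADD_ADDR2 in the -04 text).

```python
class JoinTargets:
    """Per-connection record of address/port pairs where MP_JOIN failed."""

    def __init__(self) -> None:
        self._failed = set()

    def join_failed(self, address: str, port: int) -> None:
        # Remember the pair for the lifetime of the connection.
        self._failed.add((address, port))

    def address_advertised(self, address: str, port: int) -> None:
        # A fresh advertisement from the peer refreshes the information,
        # so the pair becomes eligible again.
        self._failed.discard((address, port))

    def may_attempt_join(self, address: str, port: int) -> bool:
        return (address, port) not in self._failed
```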
In addition, an implementation may learn, over a number of In addition, an implementation may learn, over a number of
connections, that certain interfaces or destination addresses connections, that certain interfaces or destination addresses
consistently fail and may default to not trying to use MPTCP for consistently fail and may default to not trying to use MPTCP for
these. Behavior could also be learned for particularly badly these. Behavior could also be learned for particularly badly
performing subflows or subflows that regularly fail during use, in performing subflows or subflows that regularly fail during use, in
order to temporarily choose not to use these paths. order to temporarily choose not to use these paths.
4. Semantic Issues 4. Semantic Issues
skipping to change at page 53, line 21 skipping to change at page 55, line 49
denial-of-service attacks consuming resources. denial-of-service attacks consuming resources.
As discussed in Section 3.4.1, a host may advertise its private As discussed in Section 3.4.1, a host may advertise its private
addresses, but these might point to different hosts in the receiver's addresses, but these might point to different hosts in the receiver's
network. The MP_JOIN handshake (Section 3.2) will ensure that this network. The MP_JOIN handshake (Section 3.2) will ensure that this
does not succeed in setting up a subflow to the incorrect host. does not succeed in setting up a subflow to the incorrect host.
However, it could still create unwanted TCP handshake traffic. This However, it could still create unwanted TCP handshake traffic. This
feature of MPTCP could be a target for denial-of-service exploits, feature of MPTCP could be a target for denial-of-service exploits,
with malicious participants in MPTCP connections encouraging the with malicious participants in MPTCP connections encouraging the
recipient to target other hosts in the network. Therefore, recipient to target other hosts in the network. Therefore,
implementations should consider heuristics (Section 3.9) at both the implementations should consider heuristics (Section 3.10) at both the
sender and receiver to reduce the impact of this. sender and receiver to reduce the impact of this.
A small security risk could theoretically exist with key reuse, but A small security risk could theoretically exist with key reuse, but
in order to accomplish a replay attack, both the sender and receiver in order to accomplish a replay attack, both the sender and receiver
keys, and the sender and receiver random numbers, in the MP_JOIN keys, and the sender and receiver random numbers, in the MP_JOIN
handshake (Section 3.2) would have to match. handshake (Section 3.2) would have to match.
Whilst this specification defines a "medium" security solution, Whilst this specification defines a "medium" security solution,
meeting the criteria specified at the start of this section and the meeting the criteria specified at the start of this section and the
threat analysis ([10]), since attacks only ever get worse, it is threat analysis ([10]), since attacks only ever get worse, it is
skipping to change at page 53, line 52 skipping to change at page 56, line 31
o defining a new MPTCP cryptographic algorithm, as negotiated in o defining a new MPTCP cryptographic algorithm, as negotiated in
MP_CAPABLE. A sub-case could be to include an additional MP_CAPABLE. A sub-case could be to include an additional
deployment assumption, such as stateful servers, in order to allow deployment assumption, such as stateful servers, in order to allow
a more powerful algorithm to be used. a more powerful algorithm to be used.
o defining how to secure data transfer with MPTCP, whilst not o defining how to secure data transfer with MPTCP, whilst not
changing the signaling part of the protocol. changing the signaling part of the protocol.
o defining security that requires more option space, perhaps in o defining security that requires more option space, perhaps in
conjunction with a "long options" proposal for extending the TCP conjunction with a "long options" proposal for extending the TCP
options space (such as those surveyed in [22]), or perhaps options space (such as those surveyed in [24]), or perhaps
building on the current approach with a second stage of MPTCP- building on the current approach with a second stage of MPTCP-
option-based security. option-based security.
o revisiting the working group's decision to exclusively use TCP o revisiting the working group's decision to exclusively use TCP
options for MPTCP signaling, and instead look at also making use options for MPTCP signaling, and instead look at also making use
of the TCP payloads. of the TCP payloads.
MPTCP has been designed with several methods available to indicate a MPTCP has been designed with several methods available to indicate a
new security mechanism, including: new security mechanism, including:
skipping to change at page 54, line 50 skipping to change at page 57, line 35
presence of the SYN flag. presence of the SYN flag.
MPTCP SYN packets on the first subflow of a connection contain the MPTCP SYN packets on the first subflow of a connection contain the
MP_CAPABLE option (Section 3.1). If this is dropped, MPTCP SHOULD MP_CAPABLE option (Section 3.1). If this is dropped, MPTCP SHOULD
fall back to regular TCP. If packets with the MP_JOIN option fall back to regular TCP. If packets with the MP_JOIN option
(Section 3.2) are dropped, the paths will simply not be used. (Section 3.2) are dropped, the paths will simply not be used.
If a middlebox strips options but otherwise passes the packets If a middlebox strips options but otherwise passes the packets
unchanged, MPTCP will behave safely. If an MP_CAPABLE option is unchanged, MPTCP will behave safely. If an MP_CAPABLE option is
dropped on either the outgoing or the return path, the initiating dropped on either the outgoing or the return path, the initiating
host can fall back to regular TCP, as illustrated in Figure 17 and host can fall back to regular TCP, as illustrated in Figure 18 and
discussed in Section 3.1. discussed in Section 3.1.
Subflow SYNs contain the MP_JOIN option. If this option is stripped Subflow SYNs contain the MP_JOIN option. If this option is stripped
on the outgoing path, the SYN will appear to be a regular SYN to Host on the outgoing path, the SYN will appear to be a regular SYN to Host
B. Depending on whether there is a listening socket on the target B. Depending on whether there is a listening socket on the target
port, Host B will reply either with SYN/ACK or RST (subflow port, Host B will reply either with SYN/ACK or RST (subflow
connection fails). When Host A receives the SYN/ACK it sends a RST connection fails). When Host A receives the SYN/ACK it sends a RST
because the SYN/ACK does not contain the MP_JOIN option and its because the SYN/ACK does not contain the MP_JOIN option and its
token. Either way, the subflow setup fails, but otherwise does not token. Either way, the subflow setup fails, but otherwise does not
affect the MPTCP connection as a whole. affect the MPTCP connection as a whole.
skipping to change at page 55, line 32 skipping to change at page 58, line 23
Host A Host B Host A Host B
| SYN(MP_CAPABLE) | | SYN(MP_CAPABLE) |
|------------------------------------>| |------------------------------------>|
| Middlebox M | | Middlebox M |
| | | | | |
| SYN/ACK |SYN/ACK(MP_CAPABLE)| | SYN/ACK |SYN/ACK(MP_CAPABLE)|
|<----------------|-------------------| |<----------------|-------------------|
b) MP_CAPABLE option stripped on return path b) MP_CAPABLE option stripped on return path
Figure 17: Connection Setup with Middleboxes that Strip Options from Figure 18: Connection Setup with Middleboxes that Strip Options from
Packets Packets
We now examine data flow with MPTCP, assuming the flow is correctly We now examine data flow with MPTCP, assuming the flow is correctly
set up, which implies the options in the SYN packets were allowed set up, which implies the options in the SYN packets were allowed
through by the relevant middleboxes. If options are allowed through through by the relevant middleboxes. If options are allowed through
and there is no resegmentation or coalescing to TCP segments, and there is no resegmentation or coalescing to TCP segments,
Multipath TCP flows can proceed without problems. Multipath TCP flows can proceed without problems.
The case when options get stripped on data packets has been discussed The case when options get stripped on data packets has been discussed
in the Fallback section. If a fraction of options are stripped, in the Fallback section. If a fraction of options are stripped,
behavior is not deterministic. If some data sequence mappings are behavior is not deterministic. If some data sequence mappings are
lost, the connection can continue so long as mappings exist for the lost, the connection can continue so long as mappings exist for the
subflow-level data (e.g., if multiple maps have been sent that subflow-level data (e.g., if multiple maps have been sent that
reinforce each other). If some subflow-level space is left unmapped, reinforce each other). If some subflow-level space is left unmapped,
however, the subflow is treated as broken and is closed, through the however, the subflow is treated as broken and is closed, through the
process described in Section 3.7. MPTCP should survive with a loss process described in Section 3.8. MPTCP should survive with a loss
of some Data ACKs, but performance will degrade as the fraction of of some Data ACKs, but performance will degrade as the fraction of
stripped options increases. We do not expect such cases to appear in stripped options increases. We do not expect such cases to appear in
practice, though: most middleboxes will either strip all options or practice, though: most middleboxes will either strip all options or
let them all through. let them all through.
We end this section with a list of middlebox classes, their behavior, We end this section with a list of middlebox classes, their behavior,
and the elements in the MPTCP design that allow operation through and the elements in the MPTCP design that allow operation through
such middleboxes. Issues surrounding dropping packets with options such middleboxes. Issues surrounding dropping packets with options
or stripping options were discussed above, and are not included here: or stripping options were discussed above, and are not included here:
o NATs [23] (Network Address (and Port) Translators) change the o NATs [25] (Network Address (and Port) Translators) change the
source address (and often source port) of packets. This means source address (and often source port) of packets. This means
that a host will not know its public-facing address for signaling that a host will not know its public-facing address for signaling
in MPTCP. Therefore, MPTCP permits implicit address addition via in MPTCP. Therefore, MPTCP permits implicit address addition via
the MP_JOIN option, and the handshake mechanism ensures that the MP_JOIN option, and the handshake mechanism ensures that
connection attempts to private addresses [20] do not cause connection attempts to private addresses [21] do not cause
problems. Explicit address removal is undertaken by an Address ID problems. Explicit address removal is undertaken by an Address ID
to allow no knowledge of the source address. to allow no knowledge of the source address.
o Performance Enhancing Proxies (PEPs) [24] might proactively ACK o Performance Enhancing Proxies (PEPs) [26] might proactively ACK
data to increase performance. MPTCP, however, relies on accurate data to increase performance. MPTCP, however, relies on accurate
congestion control signals from the end host, and non-MPTCP-aware congestion control signals from the end host, and non-MPTCP-aware
PEPs will not be able to provide such signals. MPTCP will, PEPs will not be able to provide such signals. MPTCP will,
therefore, fall back to single-path TCP, or close the problematic therefore, fall back to single-path TCP, or close the problematic
subflow (see Section 3.7). subflow (see Section 3.8).
o Traffic Normalizers [25] may not allow holes in sequence numbers, o Traffic Normalizers [27] may not allow holes in sequence numbers,
and may cache packets and retransmit the same data. MPTCP looks and may cache packets and retransmit the same data. MPTCP looks
like standard TCP on the wire, and will not retransmit different like standard TCP on the wire, and will not retransmit different
data on the same subflow sequence number. In the event of a data on the same subflow sequence number. In the event of a
retransmission, the same data will be retransmitted on the retransmission, the same data will be retransmitted on the
original TCP subflow even if it is additionally retransmitted at original TCP subflow even if it is additionally retransmitted at
the connection level on a different subflow. the connection level on a different subflow.
o Firewalls [26] might perform initial sequence number randomization o Firewalls [28] might perform initial sequence number randomization
on TCP connections. MPTCP uses relative sequence numbers in data on TCP connections. MPTCP uses relative sequence numbers in data
sequence mapping to cope with this. Like NATs, firewalls will not sequence mapping to cope with this. Like NATs, firewalls will not
permit many incoming connections, so MPTCP supports address permit many incoming connections, so MPTCP supports address
signaling (ADD_ADDR2) so that a multiaddressed host can invite its signaling (ADD_ADDR) so that a multiaddressed host can invite its
peer behind the firewall/NAT to connect out to its additional peer behind the firewall/NAT to connect out to its additional
interface. interface.
o Intrusion Detection Systems look out for traffic patterns and o Intrusion Detection Systems look out for traffic patterns and
content that could threaten a network. Multipath will mean that content that could threaten a network. Multipath will mean that
such data is potentially spread, so it is more difficult for an such data is potentially spread, so it is more difficult for an
IDS to analyze the whole traffic, and potentially increases the IDS to analyze the whole traffic, and potentially increases the
risk of false positives. However, for an MPTCP-aware IDS, tokens risk of false positives. However, for an MPTCP-aware IDS, tokens
can be read by such systems to correlate multiple subflows and can be read by such systems to correlate multiple subflows and
reassemble for analysis. reassemble for analysis.
skipping to change at page 57, line 33 skipping to change at page 60, line 25
segment. In this way, the mapping is independent of the packets segment. In this way, the mapping is independent of the packets
that carry it. that carry it.
o The receive window may be shrunk by some middleboxes at the o The receive window may be shrunk by some middleboxes at the
subflow level. MPTCP will use the maximum window at data level, subflow level. MPTCP will use the maximum window at data level,
but will also obey subflow-specific windows. but will also obey subflow-specific windows.
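The receive-window rule in the last item can be sketched as follows, under one possible reading: the data-level window is the largest window seen on any subflow, and each subflow is additionally limited by the window seen on that subflow. The function and parameter names are hypothetical.

```python
def sendable_bytes(subflow_windows: dict, subflow_id: str,
                   conn_in_flight: int, subflow_in_flight: int) -> int:
    """Bytes that may be sent on one subflow right now.

    subflow_windows maps a subflow identifier to the receive window most
    recently seen on that subflow (possibly shrunk by a middlebox).
    """
    # Data level: use the maximum window advertised on any subflow.
    conn_window = max(subflow_windows.values())
    conn_room = max(0, conn_window - conn_in_flight)
    # Subflow level: still obey the window seen on this particular subflow.
    subflow_room = max(0, subflow_windows[subflow_id] - subflow_in_flight)
    return min(conn_room, subflow_room)
```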
7. Acknowledgments 7. Acknowledgments
The authors gratefully acknowledge significant input into this The authors gratefully acknowledge significant input into this
document from Sebastien Barre, Christoph Paasch, and Andrew McDonald. document from Sebastien Barre and Andrew McDonald.
The authors also wish to acknowledge reviews and contributions from The authors also wish to acknowledge reviews and contributions from
Iljitsch van Beijnum, Lars Eggert, Marcelo Bagnulo, Robert Hancock, Iljitsch van Beijnum, Lars Eggert, Marcelo Bagnulo, Robert Hancock,
Pasi Sarolahti, Toby Moncaster, Philip Eardley, Sergio Lembo, Pasi Sarolahti, Toby Moncaster, Philip Eardley, Sergio Lembo,
Lawrence Conroy, Yoshifumi Nishida, Bob Briscoe, Stein Gjessing, Lawrence Conroy, Yoshifumi Nishida, Bob Briscoe, Stein Gjessing,
Andrew McGregor, Georg Hampel, Anumita Biswas, Wes Eddy, Alexey Andrew McGregor, Georg Hampel, Anumita Biswas, Wes Eddy, Alexey
Melnikov, Francis Dupont, Adrian Farrel, Barry Leiba, Robert Sparks, Melnikov, Francis Dupont, Adrian Farrel, Barry Leiba, Robert Sparks,
Sean Turner, Stephen Farrell, Martin Stiemerling, and Gregory Detal. Sean Turner, Stephen Farrell, Martin Stiemerling, and Gregory Detal.
8. IANA Considerations 8. IANA Considerations
skipping to change at page 58, line 16 skipping to change at page 60, line 52
| Kind | Length | Meaning | Reference | | Kind | Length | Meaning | Reference |
+------+--------+-----------------------+---------------+ +------+--------+-----------------------+---------------+
| 30 | N | Multipath TCP (MPTCP) | This document | | 30 | N | Multipath TCP (MPTCP) | This document |
+------+--------+-----------------------+---------------+ +------+--------+-----------------------+---------------+
Table 1: TCP Option Kind Numbers Table 1: TCP Option Kind Numbers
The 4-bit MPTCP subtype sub-registry ("MPTCP Option Subtypes" under The 4-bit MPTCP subtype sub-registry ("MPTCP Option Subtypes" under
the "Transmission Control Protocol (TCP) Parameters" registry) was the "Transmission Control Protocol (TCP) Parameters" registry) was
defined in [5]. This document defines one additional subtype defined in [5]. This document defines one additional subtype
(ADD_ADDR2) and updates the references to this document for all sub- (ADD_ADDR) and updates the references to this document for all sub-
types except ADD_ADDR, which is deprecated. The updates are listed types except ADD_ADDR, which is deprecated. The updates are listed
in the following table. in the following table.
+-------+--------------+----------------------------+---------------+ +-------+-----------------+-------------------------+---------------+
| Value | Symbol | Name | Reference | | Value | Symbol | Name | Reference |
+-------+--------------+----------------------------+---------------+ +-------+-----------------+-------------------------+---------------+
| 0x0 | MP_CAPABLE | Multipath Capable | This | | 0x0 | MP_CAPABLE | Multipath Capable | This |
| | | | document, | | | | | document, |
| | | | Section 3.1 | | | | | Section 3.1 |
| 0x1 | MP_JOIN | Join Connection | This | | 0x1 | MP_JOIN | Join Connection | This |
| | | | document, | | | | | document, |
| | | | Section 3.2 | | | | | Section 3.2 |
| 0x2 | DSS | Data Sequence Signal (Data | This | | 0x2 | DSS | Data Sequence Signal | This |
| | | ACK and data sequence | document, | | | | (Data ACK and data | document, |
| | | mapping) | Section 3.3 | | | | sequence mapping) | Section 3.3 |
| 0x4 | REMOVE_ADDR | Remove Address | This | | 0x3 | ADD_ADDR | Add Address | This |
| | | | document, | | | | | document, |
| | | | Section 3.4.2 | | | | | Section 3.4.1 |
| 0x5 | MP_PRIO | Change Subflow Priority | This | | 0x4 | REMOVE_ADDR | Remove Address | This |
| | | | document, | | | | | document, |
| | | | Section 3.3.8 | | | | | Section 3.4.2 |
| 0x6 | MP_FAIL | Fallback | This | | 0x5 | MP_PRIO | Change Subflow Priority | This |
| | | | document, | | | | | document, |
| | | | Section 3.7 | | | | | Section 3.3.8 |
| 0x7 | MP_FASTCLOSE | Fast Close | This | | 0x6 | MP_FAIL | Fallback | This |
| | | | document, | | | | | document, |
| | | | Section 3.5 | | | | | Section 3.8 |
| 0x8 | ADD_ADDR2 | Add Address | This | | 0x7 | MP_FASTCLOSE | Fast Close | This |
| | | | document, | | | | | document, |
| | | | Section 3.4.1 | | | | | Section 3.5 |
| 0x9 | MP_TCPRST | TCP Reset | This | | 0x8 | MP_TCPRST | Subflow Reset | This |
| | | | document, | | | | | document, |
| | | | Section 3.6 | | | | | Section 3.6 |
+-------+--------------+----------------------------+---------------+ | 0xf | MP_EXPERIMENTAL | MPTCP Experimental | This |
| | | Option | document, |
| | | | Section 3.7 |
+-------+-----------------+-------------------------+---------------+
Table 2: MPTCP Option Subtypes Table 2: MPTCP Option Subtypes
Values 0xa through 0xe are currently unassigned. The value 0xf is Values 0x9 through 0xe are currently unassigned.
reserved for Private Use within controlled testbeds. The value 0x3
was assigned to the deprecated ADD_ADDR option ([5]) and SHOULD be
silently ignored.
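The subtype assignments in the -05 column of Table 2 can be sketched as a lookup table. This is only an illustration: the class name `MptcpSubtype` and the helper `describe_subtype` are hypothetical and do not come from the draft, and the code reflects the -05 values only (the -04 column assigns several of these differently).

```python
from enum import IntEnum


class MptcpSubtype(IntEnum):
    """MPTCP option subtypes as listed in the -05 column of Table 2."""
    MP_CAPABLE = 0x0
    MP_JOIN = 0x1
    DSS = 0x2
    ADD_ADDR = 0x3
    REMOVE_ADDR = 0x4
    MP_PRIO = 0x5
    MP_FAIL = 0x6
    MP_FASTCLOSE = 0x7
    MP_TCPRST = 0x8
    MP_EXPERIMENTAL = 0xF


def describe_subtype(value: int) -> str:
    """Return the registry name for a subtype value, or 'unassigned'."""
    if not 0x0 <= value <= 0xF:
        raise ValueError("subtype values in Table 2 range from 0x0 to 0xf")
    try:
        return MptcpSubtype(value).name
    except ValueError:
        # 0x9 through 0xe are currently unassigned in the -05 text.
        return "unassigned"
```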
IANA has created another sub-registry, "MPTCP Handshake Algorithms" IANA has created another sub-registry, "MPTCP Handshake Algorithms"
under the "Transmission Control Protocol (TCP) Parameters" registry, under the "Transmission Control Protocol (TCP) Parameters" registry,
based on the flags in MP_CAPABLE (Section 3.1). IANA is requested to based on the flags in MP_CAPABLE (Section 3.1). IANA is requested to
update the references of this table to this document, as follows: update the references of this table to this document, as follows:
+----------+-------------------+----------------------------+ +----------+-------------------+----------------------------+
| Flag Bit | Meaning | Reference | | Flag Bit | Meaning | Reference |
+----------+-------------------+----------------------------+ +----------+-------------------+----------------------------+
| A | Checksum required | This document, Section 3.1 | | A | Checksum required | This document, Section 3.1 |
skipping to change at page 59, line 31 skipping to change at page 62, line 21
| H | HMAC-SHA1 | This document, Section 3.2 | | H | HMAC-SHA1 | This document, Section 3.2 |
+----------+-------------------+----------------------------+ +----------+-------------------+----------------------------+
Table 3: MPTCP Handshake Algorithms Table 3: MPTCP Handshake Algorithms
Note that the meanings of bits C through H can be dependent upon bit Note that the meanings of bits C through H can be dependent upon bit
B, depending on how Extensibility is defined in future B, depending on how Extensibility is defined in future
specifications; see Section 3.1 for more information. specifications; see Section 3.1 for more information.
Future assignments in this registry are also to be defined by Future assignments in this registry are also to be defined by
Standards Action as defined by [27]. Assignments consist of the Standards Action as defined by [29]. Assignments consist of the
value of the flags, a symbolic name for the algorithm, and a value of the flags, a symbolic name for the algorithm, and a
reference to its specification. reference to its specification.
IANA is requested to create a further sub-registry, "MP_TCPRST Reason IANA is requested to create a further sub-registry, "MP_TCPRST Reason
Codes" under the "Transmission Control Protocol (TCP) Parameters" Codes" under the "Transmission Control Protocol (TCP) Parameters"
registry, based on the reason code in MP_TCPRST (Section 3.6). The registry, based on the reason code in MP_TCPRST (Section 3.6). The
contents of this sub-registry are to refer to this document, as follows: contents of this sub-registry are to refer to this document, as follows:
+------+-----------------------------+----------------------------+ +------+-----------------------------+----------------------------+
| Code | Meaning | Reference | | Code | Meaning | Reference |
skipping to change at page 60, line 10 skipping to change at page 62, line 49
| 0x06 | Middlebox interference | This document, Section 3.6 | | 0x06 | Middlebox interference | This document, Section 3.6 |
+------+-----------------------------+----------------------------+ +------+-----------------------------+----------------------------+
Table 4: MPTCP MP_TCPRST Reason Codes Table 4: MPTCP MP_TCPRST Reason Codes
9. References 9. References
9.1. Normative References 9.1. Normative References
[1] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, [1] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
September 1981. DOI 10.17487/RFC0793, September 1981,
<http://www.rfc-editor.org/info/rfc793>.
[2] Ford, A., Raiciu, C., Handley, M., Barre, S., and J. Iyengar, [2] Ford, A., Raiciu, C., Handley, M., Barre, S., and J. Iyengar,
"Architectural Guidelines for Multipath TCP Development", "Architectural Guidelines for Multipath TCP Development",
RFC 6182, March 2011. RFC 6182, DOI 10.17487/RFC6182, March 2011,
<http://www.rfc-editor.org/info/rfc6182>.
[3] Bradner, S., "Key words for use in RFCs to Indicate Requirement [3] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997. Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997,
<http://www.rfc-editor.org/info/rfc2119>.
[4] National Institute of Standards and Technology, "Secure Hash [4] National Institute of Standards and Technology, "Secure Hash
Standard", Federal Information Processing Standard Standard", Federal Information Processing Standard
(FIPS) 180-3, October 2008, <http://csrc.nist.gov/publications/ (FIPS) 180-3, October 2008, <http://csrc.nist.gov/publications/
fips/fips180-3/fips180-3_final.pdf>. fips/fips180-3/fips180-3_final.pdf>.
9.2. Informative References 9.2. Informative References
[5] Ford, A., Raiciu, C., Handley, M., and O. Bonaventure, "TCP [5] Ford, A., Raiciu, C., Handley, M., and O. Bonaventure, "TCP
Extensions for Multipath Operation with Multiple Addresses", Extensions for Multipath Operation with Multiple Addresses",
RFC 6824, January 2013. RFC 6824, DOI 10.17487/RFC6824, January 2013,
<http://www.rfc-editor.org/info/rfc6824>.
[6] Raiciu, C., Handley, M., and D. Wischik, "Coupled Congestion [6] Raiciu, C., Handley, M., and D. Wischik, "Coupled Congestion
Control for Multipath Transport Protocols", RFC 6356, Control for Multipath Transport Protocols", RFC 6356,
October 2011. DOI 10.17487/RFC6356, October 2011,
<http://www.rfc-editor.org/info/rfc6356>.
[7] Scharf, M. and A. Ford, "Multipath TCP (MPTCP) Application [7] Scharf, M. and A. Ford, "Multipath TCP (MPTCP) Application
Interface Considerations", RFC 6897, March 2013. Interface Considerations", RFC 6897, DOI 10.17487/RFC6897,
March 2013, <http://www.rfc-editor.org/info/rfc6897>.
[8] Hopps, C., "Analysis of an Equal-Cost Multi-Path Algorithm", [8] Hopps, C., "Analysis of an Equal-Cost Multi-Path Algorithm",
RFC 2992, November 2000. RFC 2992, DOI 10.17487/RFC2992, November 2000,
<http://www.rfc-editor.org/info/rfc2992>.
[9] Raiciu, C., Paasch, C., Barre, S., Ford, A., Honda, M., [9] Raiciu, C., Paasch, C., Barre, S., Ford, A., Honda, M.,
Duchene, F., Bonaventure, O., and M. Handley, "How Hard Can It Duchene, F., Bonaventure, O., and M. Handley, "How Hard Can It
Be? Designing and Implementing a Deployable Multipath TCP", Be? Designing and Implementing a Deployable Multipath TCP",
Usenix Symposium on Networked Systems Design and Usenix Symposium on Networked Systems Design and
Implementation 2012, <https://www.usenix.org/conference/nsdi12/ Implementation 2012, <https://www.usenix.org/conference/nsdi12/
how-hard-can-it-be-designing-and-implementing-deployable- how-hard-can-it-be-designing-and-implementing-deployable-
multipath-tcp>. multipath-tcp>.
[10] Bagnulo, M., "Threat Analysis for TCP Extensions for Multipath [10] Bagnulo, M., "Threat Analysis for TCP Extensions for Multipath
Operation with Multiple Addresses", RFC 6181, March 2011. Operation with Multiple Addresses", RFC 6181, DOI 10.17487/
RFC6181, March 2011, <http://www.rfc-editor.org/info/rfc6181>.
[11] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-Hashing [11] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-Hashing
for Message Authentication", RFC 2104, February 1997. for Message Authentication", RFC 2104, DOI 10.17487/RFC2104,
February 1997, <http://www.rfc-editor.org/info/rfc2104>.
[12] Ramaiah, A., Stewart, R., and M. Dalal, "Improving TCP's [12] Ramaiah, A., Stewart, R., and M. Dalal, "Improving TCP's
Robustness to Blind In-Window Attacks", RFC 5961, August 2010. Robustness to Blind In-Window Attacks", RFC 5961, DOI 10.17487/
RFC5961, August 2010, <http://www.rfc-editor.org/info/rfc5961>.
[13] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP [13] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
Selective Acknowledgment Options", RFC 2018, October 1996. Selective Acknowledgment Options", RFC 2018, DOI 10.17487/
RFC2018, October 1996,
<http://www.rfc-editor.org/info/rfc2018>.
[14] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion [14] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
Control", RFC 5681, September 2009. Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
<http://www.rfc-editor.org/info/rfc5681>.
[15] Gont, F. and S. Bellovin, "Defending against Sequence Number [15] Gont, F. and S. Bellovin, "Defending against Sequence Number
Attacks", RFC 6528, February 2012. Attacks", RFC 6528, DOI 10.17487/RFC6528, February 2012,
<http://www.rfc-editor.org/info/rfc6528>.
[16] Eastlake, D., Schiller, J., and S. Crocker, "Randomness [16] Eastlake 3rd, D., Schiller, J., and S. Crocker, "Randomness
Requirements for Security", BCP 106, RFC 4086, June 2005. Requirements for Security", BCP 106, RFC 4086, DOI 10.17487/
RFC4086, June 2005, <http://www.rfc-editor.org/info/rfc4086>.
[17] Eastlake, D. and T. Hansen, "US Secure Hash Algorithms (SHA and [17] Eddy, W., "TCP SYN Flooding Attacks and Common Mitigations",
SHA-based HMAC and HKDF)", RFC 6234, May 2011. RFC 4987, DOI 10.17487/RFC4987, August 2007,
<http://www.rfc-editor.org/info/rfc4987>.
[18] Jacobson, V., Braden, B., and D. Borman, "TCP Extensions for [18] Eastlake 3rd, D. and T. Hansen, "US Secure Hash Algorithms (SHA
High Performance", RFC 1323, May 1992. and SHA-based HMAC and HKDF)", RFC 6234, DOI 10.17487/RFC6234,
May 2011, <http://www.rfc-editor.org/info/rfc6234>.
[19] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of [19] Jacobson, V., Braden, R., and D. Borman, "TCP Extensions for
High Performance", RFC 1323, DOI 10.17487/RFC1323, May 1992,
<http://www.rfc-editor.org/info/rfc1323>.
[20] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of
Explicit Congestion Notification (ECN) to IP", RFC 3168, Explicit Congestion Notification (ECN) to IP", RFC 3168,
September 2001. DOI 10.17487/RFC3168, September 2001,
<http://www.rfc-editor.org/info/rfc3168>.
[20] Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G., and E. [21] Rekhter, Y., Moskowitz, B., Karrenberg, D., de Groot, G., and
Lear, "Address Allocation for Private Internets", BCP 5, E. Lear, "Address Allocation for Private Internets", BCP 5,
RFC 1918, February 1996. RFC 1918, DOI 10.17487/RFC1918, February 1996,
<http://www.rfc-editor.org/info/rfc1918>.
[21] Braden, R., "Requirements for Internet Hosts - Communication [22] Braden, R., Ed., "Requirements for Internet Hosts -
Layers", STD 3, RFC 1122, October 1989. Communication Layers", STD 3, RFC 1122, DOI 10.17487/RFC1122,
October 1989, <http://www.rfc-editor.org/info/rfc1122>.
[22] Ramaiah, A., "TCP option space extension", Work in Progress, [23] Touch, J., "Shared Use of Experimental TCP Options", RFC 6994,
DOI 10.17487/RFC6994, August 2013,
<http://www.rfc-editor.org/info/rfc6994>.
[24] Ramaiah, A., "TCP option space extension", Work in Progress,
March 2012. March 2012.
[23] Srisuresh, P. and K. Egevang, "Traditional IP Network Address [25] Srisuresh, P. and K. Egevang, "Traditional IP Network Address
Translator (Traditional NAT)", RFC 3022, January 2001. Translator (Traditional NAT)", RFC 3022, DOI 10.17487/RFC3022,
January 2001, <http://www.rfc-editor.org/info/rfc3022>.
[24] Border, J., Kojo, M., Griner, J., Montenegro, G., and Z. [26] Border, J., Kojo, M., Griner, J., Montenegro, G., and Z.
Shelby, "Performance Enhancing Proxies Intended to Mitigate Shelby, "Performance Enhancing Proxies Intended to Mitigate
Link-Related Degradations", RFC 3135, June 2001. Link-Related Degradations", RFC 3135, DOI 10.17487/RFC3135,
June 2001, <http://www.rfc-editor.org/info/rfc3135>.
[25] Handley, M., Paxson, V., and C. Kreibich, "Network Intrusion [27] Handley, M., Paxson, V., and C. Kreibich, "Network Intrusion
Detection: Evasion, Traffic Normalization, and End-to-End Detection: Evasion, Traffic Normalization, and End-to-End
Protocol Semantics", Usenix Security 2001, 2001, <http:// Protocol Semantics", Usenix Security 2001, 2001, <http://
www.usenix.org/events/sec01/full_papers/handley/handley.pdf>. www.usenix.org/events/sec01/full_papers/handley/handley.pdf>.
[26] Freed, N., "Behavior of and Requirements for Internet [28] Freed, N., "Behavior of and Requirements for Internet
Firewalls", RFC 2979, October 2000. Firewalls", RFC 2979, DOI 10.17487/RFC2979, October 2000,
<http://www.rfc-editor.org/info/rfc2979>.
[27] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA [29] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA
Considerations Section in RFCs", BCP 26, RFC 5226, May 2008. Considerations Section in RFCs", BCP 26, RFC 5226,
DOI 10.17487/RFC5226, May 2008,
<http://www.rfc-editor.org/info/rfc5226>.
Appendix A. Notes on Use of TCP Options Appendix A. Notes on Use of TCP Options
The TCP option space is limited due to the length of the Data Offset The TCP option space is limited due to the length of the Data Offset
field in the TCP header (4 bits), which defines the TCP header length field in the TCP header (4 bits), which defines the TCP header length
in 32-bit words. With the standard TCP header being 20 bytes, this in 32-bit words. With the standard TCP header being 20 bytes, this
leaves a maximum of 40 bytes for options, and many of these may leaves a maximum of 40 bytes for options, and many of these may
already be used by options such as timestamp and SACK. already be used by options such as timestamp and SACK.
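The 40-byte limit follows directly from the width of the Data Offset field. The short sketch below reproduces that arithmetic; the constant and function names are illustrative assumptions, not taken from the draft.

```python
# Illustrative arithmetic only; all names here are assumptions.
DATA_OFFSET_BITS = 4     # width of the Data Offset field in the TCP header
WORD_BYTES = 4           # Data Offset counts 32-bit words
BASE_HEADER_BYTES = 20   # standard TCP header without options


def max_option_bytes() -> int:
    """Return the largest number of option bytes a TCP header can carry."""
    # The largest 4-bit value is 15, so the header is at most 15 words (60 bytes).
    max_header_bytes = (2 ** DATA_OFFSET_BITS - 1) * WORD_BYTES
    return max_header_bytes - BASE_HEADER_BYTES


print(max_option_bytes())  # 40
```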
We have performed a brief study on the commonly used TCP options in We have performed a brief study on the commonly used TCP options in
skipping to change at page 62, line 33 skipping to change at page 66, line 9
bytes) options. Together these sum to 19 bytes. Some operating bytes) options. Together these sum to 19 bytes. Some operating
systems appear to pad each option up to a word boundary, thus using systems appear to pad each option up to a word boundary, thus using
24 bytes (a brief survey suggests Windows XP and Mac OS X do this, 24 bytes (a brief survey suggests Windows XP and Mac OS X do this,
whereas Linux does not). Optimistically, therefore, we have 21 bytes whereas Linux does not). Optimistically, therefore, we have 21 bytes
spare, or 16 if it has to be word-aligned. In either case, however, spare, or 16 if it has to be word-aligned. In either case, however,
the SYN versions of Multipath Capable (12 bytes) and Join (12 or 16 the SYN versions of Multipath Capable (12 bytes) and Join (12 or 16
bytes) options will fit in this remaining space. bytes) options will fit in this remaining space.
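The space budget for SYN segments described above can be checked with a few lines of arithmetic. This is a minimal sketch using only the byte counts quoted in the text (19 bytes of existing options unpadded, 24 when padded, and 12 or 16 bytes for the MPTCP SYN options); the dictionary and function names are hypothetical.

```python
OPTION_SPACE = 40  # maximum bytes available for TCP options

# Bytes consumed by the commonly seen SYN options, per the text above.
EXISTING_SYN_OPTIONS = {"unpadded": 19, "word-aligned": 24}

# Sizes of the SYN versions of the MPTCP options, per the text above.
MPTCP_SYN_OPTIONS = {"MP_CAPABLE": 12, "MP_JOIN (short)": 12, "MP_JOIN (long)": 16}


def spare_bytes(used: int) -> int:
    """Return the option bytes left after `used` bytes are taken."""
    return OPTION_SPACE - used


def all_syn_options_fit() -> bool:
    """Check every MPTCP SYN option against both padding behaviours."""
    return all(
        size <= spare_bytes(used)
        for used in EXISTING_SYN_OPTIONS.values()
        for size in MPTCP_SYN_OPTIONS.values()
    )


print(spare_bytes(19), spare_bytes(24), all_syn_options_fit())  # 21 16 True
```

The tightest case is the 16-byte Join option on a stack that word-aligns its options, which exactly fills the 16 spare bytes.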
Note that due to the use of a 64-bit data-level sequence space, it is Note that due to the use of a 64-bit data-level sequence space, it is
feasible that MPTCP will not require the timestamp option for feasible that MPTCP will not require the timestamp option for
protection against wrapped sequence numbers (PAWS [18]), since the protection against wrapped sequence numbers (PAWS [19]), since the
data-level sequence space has far less chance of wrapping. data-level sequence space has far less chance of wrapping.
Confirmation of the validity of this optimisation is for further Confirmation of the validity of this optimisation is for further
study. study.
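To give a feel for why the 64-bit data-level sequence space has far less chance of wrapping, the sketch below estimates how long a sender would take to cycle through a byte-granular sequence space. The 10 Gbit/s rate is an arbitrary assumption chosen for illustration, and the function name is hypothetical; the calculation does not by itself confirm that the timestamp option can be omitted.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def wrap_time_seconds(seq_bits: int, rate_bits_per_s: float) -> float:
    """Time to send 2**seq_bits bytes at the given line rate."""
    total_bits = (2 ** seq_bits) * 8  # sequence numbers count bytes
    return total_bits / rate_bits_per_s


RATE = 10e9  # assumed line rate: 10 Gbit/s

subflow_wrap = wrap_time_seconds(32, RATE)  # 32-bit subflow sequence space
data_wrap = wrap_time_seconds(64, RATE)     # 64-bit data-level sequence space

print(f"32-bit space wraps after about {subflow_wrap:.1f} seconds")
print(f"64-bit space wraps after about {data_wrap / SECONDS_PER_YEAR:.0f} years")
```

At this assumed rate the 32-bit space wraps in a few seconds, while the 64-bit space takes several centuries.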
TCP data packets typically carry timestamp options in every packet, TCP data packets typically carry timestamp options in every packet,
taking 10 bytes (or 12 with padding). That leaves 30 bytes (or 28, taking 10 bytes (or 12 with padding). That leaves 30 bytes (or 28,
if word-aligned). The Data Sequence Signal (DSS) option varies in if word-aligned). The Data Sequence Signal (DSS) option varies in
length depending on whether the data sequence mapping and DATA_ACK length depending on whether the data sequence mapping and DATA_ACK
are included, and whether the sequence numbers in use are 4 or 8 are included, and whether the sequence numbers in use are 4 or 8
octets. The maximum size of the DSS option is 28 bytes, so even that octets. The maximum size of the DSS option is 28 bytes, so even that
skipping to change at page 63, line 25 skipping to change at page 66, line 49
Depending on the number of lost packets, SACK may utilize the entire Depending on the number of lost packets, SACK may utilize the entire
option space. If a DATA_ACK had to be included, then it is probably option space. If a DATA_ACK had to be included, then it is probably
necessary to reduce the number of SACK blocks to accommodate the necessary to reduce the number of SACK blocks to accommodate the
DATA_ACK. However, the presence of the DATA_ACK is unlikely to be DATA_ACK. However, the presence of the DATA_ACK is unlikely to be
necessary in a case where SACK is in use, since until at least some necessary in a case where SACK is in use, since until at least some
of the SACK blocks have been retransmitted, the cumulative data-level of the SACK blocks have been retransmitted, the cumulative data-level
ACK will not be moving forward (or if it does, due to retransmissions ACK will not be moving forward (or if it does, due to retransmissions
on another path, then that path can also be used to transmit the new on another path, then that path can also be used to transmit the new
DATA_ACK). DATA_ACK).
The ADD_ADDR2 option can be between 16 and 30 bytes, depending on The ADD_ADDR option can be between 16 and 30 bytes, depending on
whether IPv4 or IPv6 is used, and whether or not the port number is whether IPv4 or IPv6 is used, and whether or not the port number is
present. It is unlikely that such signaling would fit in a data present. It is unlikely that such signaling would fit in a data
packet (although if there is space, it is fine to include it). It is packet (although if there is space, it is fine to include it). It is
recommended to use duplicate ACKs with no other payload or options in recommended to use duplicate ACKs with no other payload or options in
order to transmit these rare signals. Note this is the reason for order to transmit these rare signals. Note this is the reason for
mandating that duplicate ACKs with MPTCP options are not taken as a mandating that duplicate ACKs with MPTCP options are not taken as a
signal of congestion. signal of congestion.
Finally, there are issues with reliable delivery of options. As Finally, there are issues with reliable delivery of options. As
options can also be sent on pure ACKs, these are not reliably sent. options can also be sent on pure ACKs, these are not reliably sent.
This is not an issue for DATA_ACK due to their cumulative nature, but This is not an issue for DATA_ACK due to their cumulative nature, but
may be an issue for ADD_ADDR2/REMOVE_ADDR options. Here, it is may be an issue for ADD_ADDR/REMOVE_ADDR options. Here, it is
recommended to send these options redundantly (whether on multiple recommended to send these options redundantly (whether on multiple
paths or on the same path on a number of ACKs -- but interspersed paths or on the same path on a number of ACKs -- but interspersed
with data in order to avoid interpretation as congestion). The cases with data in order to avoid interpretation as congestion). The cases
where options are stripped by middleboxes are discussed in Section 6. where options are stripped by middleboxes are discussed in Section 6.
Appendix B. Control Blocks Appendix B. Control Blocks
Conceptually, an MPTCP connection can be represented as an MPTCP Conceptually, an MPTCP connection can be represented as an MPTCP
control block that contains several variables that track the progress control block that contains several variables that track the progress
and the state of the MPTCP connection and a set of linked TCP control and the state of the MPTCP connection and a set of linked TCP control
skipping to change at page 66, line 7 skipping to change at page 69, line 28
is expected on the subflow. This state variable is modified upon is expected on the subflow. This state variable is modified upon
reception of in-order segments. The value of RCV.NXT is copied to reception of in-order segments. The value of RCV.NXT is copied to
the SEG.ACK field of the next segments transmitted on the subflow. the SEG.ACK field of the next segments transmitted on the subflow.
RCV.WND (32 bits with RFC 1323, 16 bits otherwise): This is the RCV.WND (32 bits with RFC 1323, 16 bits otherwise): This is the
subflow-level receive window that is updated with the window field subflow-level receive window that is updated with the window field
from the segments received on this subflow. from the segments received on this subflow.
Appendix C. Finite State Machine Appendix C. Finite State Machine
The diagram in Figure 18 shows the Finite State Machine for The diagram in Figure 19 shows the Finite State Machine for
connection-level closure. This illustrates how the DATA_FIN connection-level closure. This illustrates how the DATA_FIN
connection-level signal (indicated as the DFIN flag on a DATA_ACK) connection-level signal (indicated as the DFIN flag on a DATA_ACK)
interacts with subflow-level FINs, and permits "break-before-make" interacts with subflow-level FINs, and permits "break-before-make"
handover between subflows. handover between subflows.
+---------+ +---------+
| M_ESTAB | | M_ESTAB |
+---------+ +---------+
M_CLOSE | | rcv DATA_FIN M_CLOSE | | rcv DATA_FIN
------- | | ------- ------- | | -------
skipping to change at page 66, line 40 skipping to change at page 70, line 32
| rcv DATA_FIN -------------- | -------------- | | rcv DATA_FIN -------------- | -------------- |
| ------- CLOSE all subflows | CLOSE all subflows | | ------- CLOSE all subflows | CLOSE all subflows |
| snd DATA_ACK[DFIN] V delete MPTCP PCB V | snd DATA_ACK[DFIN] V delete MPTCP PCB V
\ +-----------+ +---------+ \ +-----------+ +---------+
------------------------>|M_TIME WAIT|----------------->| M_CLOSED| ------------------------>|M_TIME WAIT|----------------->| M_CLOSED|
+-----------+ +---------+ +-----------+ +---------+
All subflows in CLOSED All subflows in CLOSED
------------ ------------
delete MPTCP PCB delete MPTCP PCB
Figure 18: Finite State Machine for Connection Closure Figure 19: Finite State Machine for Connection Closure
Authors' Addresses Authors' Addresses
Alan Ford Alan Ford
Pexip Pexip
EMail: alan.ford@gmail.com EMail: alan.ford@gmail.com
Costin Raiciu Costin Raiciu
University Politehnica of Bucharest University Politehnica of Bucharest
Splaiul Independentei 313 Splaiul Independentei 313
Bucharest Bucharest
Romania Romania
EMail: costin.raiciu@cs.pub.ro EMail: costin.raiciu@cs.pub.ro
Mark Handley Mark Handley
University College London University College London
Gower Street Gower Street
London WC1E 6BT London WC1E 6BT
UK UK
EMail: m.handley@cs.ucl.ac.uk EMail: m.handley@cs.ucl.ac.uk
Olivier Bonaventure Olivier Bonaventure
Universite catholique de Louvain Universite catholique de Louvain
Pl. Ste Barbe, 2 Pl. Ste Barbe, 2
Louvain-la-Neuve 1348 Louvain-la-Neuve 1348
Belgium Belgium
EMail: olivier.bonaventure@uclouvain.be EMail: olivier.bonaventure@uclouvain.be
Christoph Paasch
Apple, Inc.
Cupertino
US
EMail: cpaasch@apple.com
 End of changes. 125 change blocks. 
309 lines changed or deleted 469 lines changed or added
