draft-ietf-mptcp-rfc6824bis-02.txt   draft-ietf-mptcp-rfc6824bis-03.txt 
Internet Engineering Task Force A. Ford Internet Engineering Task Force A. Ford
Internet-Draft Pexip Internet-Draft Pexip
Obsoletes: 6824 (if approved) C. Raiciu Obsoletes: 6824 (if approved) C. Raiciu
Intended status: Experimental U. Politechnica of Bucharest Intended status: Experimental U. Politechnica of Bucharest
Expires: July 27, 2014 M. Handley Expires: April 30, 2015 M. Handley
U. College London U. College London
O. Bonaventure O. Bonaventure
U. catholique de Louvain U. catholique de Louvain
January 23, 2014 October 27, 2014
TCP Extensions for Multipath Operation with Multiple Addresses TCP Extensions for Multipath Operation with Multiple Addresses
draft-ietf-mptcp-rfc6824bis-02 draft-ietf-mptcp-rfc6824bis-03
Abstract Abstract
TCP/IP communication is currently restricted to a single path per TCP/IP communication is currently restricted to a single path per
connection, yet multiple paths often exist between peers. The connection, yet multiple paths often exist between peers. The
simultaneous use of these multiple paths for a TCP/IP session would simultaneous use of these multiple paths for a TCP/IP session would
improve resource usage within the network and, thus, improve user improve resource usage within the network and, thus, improve user
experience through higher throughput and improved resilience to experience through higher throughput and improved resilience to
network failure. network failure.
Multipath TCP provides the ability to simultaneously use multiple Multipath TCP provides the ability to simultaneously use multiple
paths between peers. This document presents a set of extensions to paths between peers. This document presents a set of extensions to
traditional TCP to support multipath operation. The protocol offers traditional TCP to support multipath operation. The protocol offers
the same type of service to applications as TCP (i.e., reliable the same type of service to applications as TCP (i.e., reliable
bytestream), and it provides the components necessary to establish bytestream), and it provides the components necessary to establish
and use multiple TCP flows across potentially disjoint paths. and use multiple TCP flows across potentially disjoint paths.
This document obsoletes RFC6824 [5] through clarifications and
modifications, primarily driven by deployment experience.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 30, 2015.
This Internet-Draft will expire on July 27, 2014.
Copyright Notice Copyright Notice
Copyright (c) 2014 IETF Trust and the persons identified as the Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
skipping to change at page 2, line 48 skipping to change at page 3, line 4
3.3. General MPTCP Operation . . . . . . . . . . . . . . . . . 23 3.3. General MPTCP Operation . . . . . . . . . . . . . . . . . 23
3.3.1. Data Sequence Mapping . . . . . . . . . . . . . . . . 25 3.3.1. Data Sequence Mapping . . . . . . . . . . . . . . . . 25
3.3.2. Data Acknowledgments . . . . . . . . . . . . . . . . . 28 3.3.2. Data Acknowledgments . . . . . . . . . . . . . . . . . 28
3.3.3. Closing a Connection . . . . . . . . . . . . . . . . . 29 3.3.3. Closing a Connection . . . . . . . . . . . . . . . . . 29
3.3.4. Receiver Considerations . . . . . . . . . . . . . . . 30 3.3.4. Receiver Considerations . . . . . . . . . . . . . . . 30
3.3.5. Sender Considerations . . . . . . . . . . . . . . . . 31 3.3.5. Sender Considerations . . . . . . . . . . . . . . . . 31
3.3.6. Reliability and Retransmissions . . . . . . . . . . . 32 3.3.6. Reliability and Retransmissions . . . . . . . . . . . 32
3.3.7. Congestion Control Considerations . . . . . . . . . . 33 3.3.7. Congestion Control Considerations . . . . . . . . . . 33
3.3.8. Subflow Policy . . . . . . . . . . . . . . . . . . . . 34 3.3.8. Subflow Policy . . . . . . . . . . . . . . . . . . . . 34
3.4. Address Knowledge Exchange (Path Management) . . . . . . . 35 3.4. Address Knowledge Exchange (Path Management) . . . . . . . 35
3.4.1. Address Advertisement . . . . . . . . . . . . . . . . 36 3.4.1. Address Advertisement . . . . . . . . . . . . . . . . 37
3.4.2. Remove Address . . . . . . . . . . . . . . . . . . . . 39 3.4.2. Remove Address . . . . . . . . . . . . . . . . . . . . 39
3.5. Fast Close . . . . . . . . . . . . . . . . . . . . . . . . 40 3.5. Fast Close . . . . . . . . . . . . . . . . . . . . . . . . 40
3.6. Fallback . . . . . . . . . . . . . . . . . . . . . . . . . 41 3.6. Subflow Reset . . . . . . . . . . . . . . . . . . . . . . 42
3.7. Error Handling . . . . . . . . . . . . . . . . . . . . . . 45 3.7. Fallback . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.8. Heuristics . . . . . . . . . . . . . . . . . . . . . . . . 46 3.8. Error Handling . . . . . . . . . . . . . . . . . . . . . . 47
3.8.1. Port Usage . . . . . . . . . . . . . . . . . . . . . . 46 3.9. Heuristics . . . . . . . . . . . . . . . . . . . . . . . . 48
3.8.2. Delayed Subflow Start . . . . . . . . . . . . . . . . 46 3.9.1. Port Usage . . . . . . . . . . . . . . . . . . . . . . 48
3.8.3. Failure Handling . . . . . . . . . . . . . . . . . . . 47 3.9.2. Delayed Subflow Start and Subflow Symmetry . . . . . . 48
4. Semantic Issues . . . . . . . . . . . . . . . . . . . . . . . 48 3.9.3. Failure Handling . . . . . . . . . . . . . . . . . . . 49
5. Security Considerations . . . . . . . . . . . . . . . . . . . 49 4. Semantic Issues . . . . . . . . . . . . . . . . . . . . . . . 50
6. Interactions with Middleboxes . . . . . . . . . . . . . . . . 51 5. Security Considerations . . . . . . . . . . . . . . . . . . . 51
7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 55 6. Interactions with Middleboxes . . . . . . . . . . . . . . . . 54
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 55 7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 57
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 57 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 57
9.1. Normative References . . . . . . . . . . . . . . . . . . . 57 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 59
9.2. Informative References . . . . . . . . . . . . . . . . . . 57 9.1. Normative References . . . . . . . . . . . . . . . . . . . 59
Appendix A. Notes on Use of TCP Options . . . . . . . . . . . . . 59 9.2. Informative References . . . . . . . . . . . . . . . . . . 60
Appendix B. Control Blocks . . . . . . . . . . . . . . . . . . . 60 Appendix A. Notes on Use of TCP Options . . . . . . . . . . . . . 61
B.1. MPTCP Control Block . . . . . . . . . . . . . . . . . . . 61 Appendix B. Control Blocks . . . . . . . . . . . . . . . . . . . 63
B.1.1. Authentication and Metadata . . . . . . . . . . . . . 61 B.1. MPTCP Control Block . . . . . . . . . . . . . . . . . . . 63
B.1.2. Sending Side . . . . . . . . . . . . . . . . . . . . . 61 B.1.1. Authentication and Metadata . . . . . . . . . . . . . 64
B.1.3. Receiving Side . . . . . . . . . . . . . . . . . . . . 61 B.1.2. Sending Side . . . . . . . . . . . . . . . . . . . . . 64
B.2. TCP Control Blocks . . . . . . . . . . . . . . . . . . . . 62 B.1.3. Receiving Side . . . . . . . . . . . . . . . . . . . . 64
B.2.1. Sending Side . . . . . . . . . . . . . . . . . . . . . 62 B.2. TCP Control Blocks . . . . . . . . . . . . . . . . . . . . 65
B.2.2. Receiving Side . . . . . . . . . . . . . . . . . . . . 62 B.2.1. Sending Side . . . . . . . . . . . . . . . . . . . . . 65
Appendix C. Finite State Machine . . . . . . . . . . . . . . . . 62 B.2.2. Receiving Side . . . . . . . . . . . . . . . . . . . . 65
Appendix C. Finite State Machine . . . . . . . . . . . . . . . . 65
1. Introduction 1. Introduction
Multipath TCP (MPTCP) is a set of extensions to regular TCP [1] to Multipath TCP (MPTCP) is a set of extensions to regular TCP [1] to
provide a Multipath TCP [2] service, which enables a transport provide a Multipath TCP [2] service, which enables a transport
connection to operate across multiple paths simultaneously. This connection to operate across multiple paths simultaneously. This
document presents the protocol changes required to add multipath document presents the protocol changes required to add multipath
capability to TCP; specifically, those for signaling and setting up capability to TCP; specifically, those for signaling and setting up
multiple paths ("subflows"), managing these subflows, reassembly of multiple paths ("subflows"), managing these subflows, reassembly of
data, and termination of sessions. This is not the only information data, and termination of sessions. This is not the only information
required to create a Multipath TCP implementation, however. This required to create a Multipath TCP implementation, however. This
document is complemented by three others: document is complemented by three others:
o Architecture [2], which explains the motivations behind Multipath o Architecture [2], which explains the motivations behind Multipath
TCP, contains a discussion of high-level design decisions on which TCP, contains a discussion of high-level design decisions on which
this design is based, and an explanation of a functional this design is based, and an explanation of a functional
separation through which an extensible MPTCP implementation can be separation through which an extensible MPTCP implementation can be
developed. developed.
o Congestion control [5] presents a safe congestion control o Congestion control [6] presents a safe congestion control
algorithm for coupling the behavior of the multiple paths in order algorithm for coupling the behavior of the multiple paths in order
to "do no harm" to other network users. to "do no harm" to other network users.
o Application considerations [6] discusses what impact MPTCP will o Application considerations [7] discusses what impact MPTCP will
have on applications, what applications will want to do with have on applications, what applications will want to do with
MPTCP, and as a consequence of these factors, what API extensions MPTCP, and as a consequence of these factors, what API extensions
an MPTCP implementation should present. an MPTCP implementation should present.
This document is an update to, and obsoletes, the first specification This document is an update to, and obsoletes, the first specification
of Multipath TCP [7]. Changes are limited to behavioural of Multipath TCP [5]. Changes are limited to behavioural
clarifications and new messages that can coexist with earlier clarifications and new messages that can coexist with earlier
implementations. implementations.
1.1. Design Assumptions 1.1. Design Assumptions
In order to limit the potentially huge design space, the working In order to limit the potentially huge design space, the working
group imposed two key constraints on the Multipath TCP design group imposed two key constraints on the Multipath TCP design
presented in this document: presented in this document:
o It must be backwards-compatible with current, regular TCP, to o It must be backwards-compatible with current, regular TCP, to
skipping to change at page 5, line 6 skipping to change at page 5, line 6
o It can be assumed that one or both hosts are multihomed and o It can be assumed that one or both hosts are multihomed and
multiaddressed. multiaddressed.
To simplify the design, we assume that the presence of multiple To simplify the design, we assume that the presence of multiple
addresses at a host is sufficient to indicate the existence of addresses at a host is sufficient to indicate the existence of
multiple paths. These paths need not be entirely disjoint: they may multiple paths. These paths need not be entirely disjoint: they may
share one or many routers between them. Even in such a situation, share one or many routers between them. Even in such a situation,
making use of multiple paths is beneficial, improving resource making use of multiple paths is beneficial, improving resource
utilization and resilience to a subset of node failures. The utilization and resilience to a subset of node failures. The
congestion control algorithms defined in [5] ensure this does not act congestion control algorithms defined in [6] ensure this does not act
detrimentally. Furthermore, there may be some scenarios where detrimentally. Furthermore, there may be some scenarios where
different TCP ports on a single host can provide disjoint paths (such different TCP ports on a single host can provide disjoint paths (such
as through certain Equal-Cost Multipath (ECMP) implementations [8]), as through certain Equal-Cost Multipath (ECMP) implementations [8]),
and so the MPTCP design also supports the use of ports in path and so the MPTCP design also supports the use of ports in path
identifiers. identifiers.
There are three aspects to the backwards-compatibility listed above There are three aspects to the backwards-compatibility listed above
(discussed in more detail in [2]): (discussed in more detail in [2]):
External Constraints: The protocol must function through the vast External Constraints: The protocol must function through the vast
skipping to change at page 5, line 34 skipping to change at page 5, line 34
Application Constraints: The protocol must be usable with no change Application Constraints: The protocol must be usable with no change
to existing applications that use the common TCP API (although it to existing applications that use the common TCP API (although it
is reasonable that not all features would be available to such is reasonable that not all features would be available to such
legacy applications). Furthermore, the protocol must provide the legacy applications). Furthermore, the protocol must provide the
same service model as regular TCP to the application. same service model as regular TCP to the application.
Fallback: The protocol should be able to fall back to standard TCP Fallback: The protocol should be able to fall back to standard TCP
with no interference from the user, to be able to communicate with with no interference from the user, to be able to communicate with
legacy hosts. legacy hosts.
The complementary application considerations document [6] discusses The complementary application considerations document [7] discusses
the necessary features of an API to provide backwards-compatibility, the necessary features of an API to provide backwards-compatibility,
as well as API extensions to convey the behavior of MPTCP at a level as well as API extensions to convey the behavior of MPTCP at a level
of control and information equivalent to that available with regular, of control and information equivalent to that available with regular,
single-path TCP. single-path TCP.
Further discussion of the design constraints and associated design Further discussion of the design constraints and associated design
decisions is given in the MPTCP Architecture document [2] and in decisions is given in the MPTCP Architecture document [2] and in
[9]. [9].
1.2. Multipath TCP in the Networking Stack 1.2. Multipath TCP in the Networking Stack
MPTCP operates at the transport layer and aims to be transparent to MPTCP operates at the transport layer and aims to be transparent to
both higher and lower layers. It is a set of additional features on both higher and lower layers. It is a set of additional features on
top of standard TCP; Figure 1 illustrates this layering. MPTCP is top of standard TCP; Figure 1 illustrates this layering. MPTCP is
designed to be usable by legacy applications with no changes; designed to be usable by legacy applications with no changes;
detailed discussion of its interactions with applications is given in detailed discussion of its interactions with applications is given in
[6]. [7].
+-------------------------------+ +-------------------------------+
| Application | | Application |
+---------------+ +-------------------------------+ +---------------+ +-------------------------------+
| Application | | MPTCP | | Application | | MPTCP |
+---------------+ + - - - - - - - + - - - - - - - + +---------------+ + - - - - - - - + - - - - - - - +
| TCP | | Subflow (TCP) | Subflow (TCP) | | TCP | | Subflow (TCP) | Subflow (TCP) |
+---------------+ +-------------------------------+ +---------------+ +-------------------------------+
| IP | | IP | IP | | IP | | IP | IP |
+---------------+ +-------------------------------+ +---------------+ +-------------------------------+
skipping to change at page 7, line 13 skipping to change at page 7, line 13
Section 4. Section 4.
1.4. MPTCP Concept 1.4. MPTCP Concept
This section provides a high-level summary of normal operation of This section provides a high-level summary of normal operation of
MPTCP, and is illustrated by the scenario shown in Figure 2. A MPTCP, and is illustrated by the scenario shown in Figure 2. A
detailed description of operation is given in Section 3. detailed description of operation is given in Section 3.
o To a non-MPTCP-aware application, MPTCP will behave the same as o To a non-MPTCP-aware application, MPTCP will behave the same as
normal TCP. Extended APIs could provide additional control to normal TCP. Extended APIs could provide additional control to
MPTCP-aware applications [6]. An application begins by opening a MPTCP-aware applications [7]. An application begins by opening a
TCP socket in the normal way. MPTCP signaling and operation are TCP socket in the normal way. MPTCP signaling and operation are
handled by the MPTCP implementation. handled by the MPTCP implementation.
o An MPTCP connection begins similarly to a regular TCP connection. o An MPTCP connection begins similarly to a regular TCP connection.
This is illustrated in Figure 2 where an MPTCP connection is This is illustrated in Figure 2 where an MPTCP connection is
established between addresses A1 and B1 on Hosts A and B, established between addresses A1 and B1 on Hosts A and B,
respectively. respectively.
o If extra paths are available, additional TCP sessions (termed o If extra paths are available, additional TCP sessions (termed
MPTCP "subflows") are created on these paths, and are combined MPTCP "subflows") are created on these paths, and are combined
skipping to change at page 14, line 9 skipping to change at page 14, line 9
Furthermore, standard TCP validity checks (such as ensuring the Furthermore, standard TCP validity checks (such as ensuring the
sequence number and acknowledgment number are within window) MUST be sequence number and acknowledgment number are within window) MUST be
undertaken before processing any MPTCP signals, as described in [12], undertaken before processing any MPTCP signals, as described in [12],
and initial subflow sequence numbers SHOULD be generated according to and initial subflow sequence numbers SHOULD be generated according to
the recommendations in [15]. the recommendations in [15].
3.1. Connection Initiation 3.1. Connection Initiation
Connection initiation begins with a SYN, SYN/ACK, ACK exchange on a Connection initiation begins with a SYN, SYN/ACK, ACK exchange on a
single path. Each packet contains the Multipath Capable (MP_CAPABLE) single path. Each packet contains the Multipath Capable (MP_CAPABLE)
TCP option (Figure 4). This option declares its sender is capable of MPTCP option (Figure 4). This option declares its sender is capable
performing Multipath TCP and wishes to do so on this particular of performing Multipath TCP and wishes to do so on this particular
connection. connection.
This option is used to declare the 64-bit key that the sender has This option is used to declare the 64-bit key that the sender has
generated for this MPTCP connection. This key is used to generated for this MPTCP connection. This key is used to
authenticate the addition of future subflows to this connection. authenticate the addition of future subflows to this connection.
This is the only time the key will be sent in clear on the wire This is the only time the key will be sent in clear on the wire
(unless "fast close", Section 3.5, is used); all future subflows will (unless "fast close", Section 3.5, is used); all future subflows will
identify the connection using a 32-bit "token". This token is a identify the connection using a 32-bit "token". This token is a
cryptographic hash of this key. The algorithm for this process is cryptographic hash of this key. The algorithm for this process is
dependent on the authentication algorithm selected; the method of dependent on the authentication algorithm selected; the method of
skipping to change at page 15, line 32 skipping to change at page 15, line 32
B's Key is echoed in the ACK in order to allow the listener (Host B) B's Key is echoed in the ACK in order to allow the listener (Host B)
to act statelessly until the TCP connection reaches the ESTABLISHED to act statelessly until the TCP connection reaches the ESTABLISHED
state. If the listener acts in this way, however, it MUST generate state. If the listener acts in this way, however, it MUST generate
its key in a way that would allow it to verify that it generated the its key in a way that would allow it to verify that it generated the
key when it is echoed in the ACK. key when it is echoed in the ACK.
This exchange allows the safe passage of MPTCP options on SYN packets This exchange allows the safe passage of MPTCP options on SYN packets
to be determined. If any of these options are dropped, MPTCP will to be determined. If any of these options are dropped, MPTCP will
gracefully fall back to regular single-path TCP, as documented in gracefully fall back to regular single-path TCP, as documented in
Section 3.6. Note that new subflows MUST NOT be established (using Section 3.7. Note that new subflows MUST NOT be established (using
the process documented in Section 3.2) until a Data Sequence Signal the process documented in Section 3.2) until a Data Sequence Signal
(DSS) option has been successfully received across the path (as (DSS) option has been successfully received across the path (as
documented in Section 3.3). documented in Section 3.3).
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+-------+---------------+ +---------------+---------------+-------+-------+---------------+
| Kind | Length |Subtype|Version|A|B|C|D|E|F|G|H| | Kind | Length |Subtype|Version|A|B|C|D|E|F|G|H|
+---------------+---------------+-------+-------+---------------+ +---------------+---------------+-------+-------+---------------+
| Option Sender's Key (64 bits) | | Option Sender's Key (64 bits) |
skipping to change at page 17, line 44 skipping to change at page 17, line 44
If a SYN contains an MP_CAPABLE option but the SYN/ACK does not, it If a SYN contains an MP_CAPABLE option but the SYN/ACK does not, it
is assumed that the passive opener is not multipath capable; thus, is assumed that the passive opener is not multipath capable; thus,
the MPTCP session MUST operate as a regular, single-path TCP. If a the MPTCP session MUST operate as a regular, single-path TCP. If a
SYN does not contain an MP_CAPABLE option, the SYN/ACK MUST NOT SYN does not contain an MP_CAPABLE option, the SYN/ACK MUST NOT
contain one in response. If the third packet (the ACK) does not contain one in response. If the third packet (the ACK) does not
contain the MP_CAPABLE option, then the session MUST fall back to contain the MP_CAPABLE option, then the session MUST fall back to
operating as a regular, single-path TCP. This is to maintain operating as a regular, single-path TCP. This is to maintain
compatibility with middleboxes on the path that drop some or all TCP compatibility with middleboxes on the path that drop some or all TCP
options. Note that an implementation MAY choose to attempt sending options. Note that an implementation MAY choose to attempt sending
MPTCP options more than one time before making this decision to MPTCP options more than one time before making this decision to
operate as regular TCP (see Section 3.8). operate as regular TCP (see Section 3.9).
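The fallback rules in the paragraph above can be sketched as a small decision function. This is an illustrative sketch only, not part of either draft: the names Mode, synack_may_carry_mp_capable, and negotiated_mode are hypothetical, and each boolean argument simply records whether the corresponding handshake packet carried the MP_CAPABLE option.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical label for the outcome of the three-way handshake."""
    MPTCP = "mptcp"
    REGULAR_TCP = "regular-tcp"


def synack_may_carry_mp_capable(syn_has_mp_capable: bool) -> bool:
    # A SYN/ACK MUST NOT carry MP_CAPABLE unless the SYN carried one.
    return syn_has_mp_capable


def negotiated_mode(syn: bool, synack: bool, ack: bool) -> Mode:
    """Return the operating mode given which packets carried MP_CAPABLE.

    Multipath operation is kept only when the option survived on all
    three packets; any missing option results in regular, single-path TCP.
    """
    if syn and synack and ack:
        return Mode.MPTCP
    return Mode.REGULAR_TCP
```

The sketch does not model the retry policy mentioned above (an implementation MAY attempt MPTCP options more than once before giving up); that choice is left to local policy.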
If the SYN packets are unacknowledged, it is up to local policy to If the SYN packets are unacknowledged, it is up to local policy to
decide how to respond. It is expected that a sender will eventually decide how to respond. It is expected that a sender will eventually
fall back to single-path TCP (i.e., without the MP_CAPABLE option) in fall back to single-path TCP (i.e., without the MP_CAPABLE option) in
order to work around middleboxes that may drop packets with unknown order to work around middleboxes that may drop packets with unknown
options; however, the number of multipath-capable attempts that are options; however, the number of multipath-capable attempts that are
made first will be up to local policy. It is possible that MPTCP and made first will be up to local policy. It is possible that MPTCP and
non-MPTCP SYNs could get reordered in the network. Therefore, the non-MPTCP SYNs could get reordered in the network. Therefore, the
final state is inferred from the presence or absence of the final state is inferred from the presence or absence of the
MP_CAPABLE option in the third packet of the TCP handshake. If this MP_CAPABLE option in the third packet of the TCP handshake. If this
option is not present, the connection SHOULD fall back to regular option is not present, the connection SHOULD fall back to regular
TCP, as documented in Section 3.6. TCP, as documented in Section 3.7.
The initial data sequence number on an MPTCP connection is generated The initial data sequence number on an MPTCP connection is generated
from the key. The algorithm for IDSN generation is also determined from the key. The algorithm for IDSN generation is also determined
from the negotiated authentication algorithm. In this specification, from the negotiated authentication algorithm. In this specification,
with only the SHA-1 algorithm specified and selected, the IDSN of a with only the SHA-1 algorithm specified and selected, the IDSN of a
host MUST be the least significant 64 bits of the SHA-1 hash of its host MUST be the least significant 64 bits of the SHA-1 hash of its
key, i.e., IDSN-A = Hash(Key-A) and IDSN-B = Hash(Key-B). This key, i.e., IDSN-A = Hash(Key-A) and IDSN-B = Hash(Key-B). This
deterministic generation of the IDSN allows a receiver to ensure that deterministic generation of the IDSN allows a receiver to ensure that
there are no gaps in sequence space at the start of the connection. there are no gaps in sequence space at the start of the connection.
The SYN with MP_CAPABLE occupies the first octet of data sequence The SYN with MP_CAPABLE occupies the first octet of data sequence
skipping to change at page 18, line 32 skipping to change at page 18, line 32
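As an illustration of the key-derived values described in Section 3.1, the following sketch derives the IDSN and the token from a 64-bit key using SHA-1. It is not taken from either draft. The function names are hypothetical, the digest is interpreted as a 160-bit integer in network byte order (an assumption about what "least significant" means here), and the choice of the most significant 32 bits for the token follows RFC 6824 rather than any text visible in this excerpt.

```python
import hashlib


def sha1_of_key(key: bytes) -> bytes:
    """Return the 20-byte SHA-1 digest of a 64-bit MPTCP key."""
    if len(key) != 8:
        raise ValueError("MPTCP keys are 64 bits (8 bytes)")
    return hashlib.sha1(key).digest()


def idsn_from_key(key: bytes) -> int:
    # Least significant 64 bits: the last 8 bytes of the digest when it
    # is read as a big-endian (network byte order) integer.
    return int.from_bytes(sha1_of_key(key)[-8:], "big")


def token_from_key(key: bytes) -> int:
    # Most significant 32 bits of the same digest (per RFC 6824; assumed
    # here because the relevant text is elided from this excerpt).
    return int.from_bytes(sha1_of_key(key)[:4], "big")
```

Because both values are deterministic functions of the key, each host can compute its peer's IDSN and token locally once the MP_CAPABLE exchange has completed.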
3.2. Starting a New Subflow 3.2. Starting a New Subflow
Once an MPTCP connection has begun with the MP_CAPABLE exchange, Once an MPTCP connection has begun with the MP_CAPABLE exchange,
further subflows can be added to the connection. Hosts have further subflows can be added to the connection. Hosts have
knowledge of their own address(es), and can become aware of the other knowledge of their own address(es), and can become aware of the other
host's addresses through signaling exchanges as described in host's addresses through signaling exchanges as described in
Section 3.4. Using this knowledge, a host can initiate a new subflow Section 3.4. Using this knowledge, a host can initiate a new subflow
over a currently unused pair of addresses. It is permitted for over a currently unused pair of addresses. It is permitted for
either host in a connection to initiate the creation of a new either host in a connection to initiate the creation of a new
subflow, but it is expected that this will normally be the original subflow, but it is expected that this will normally be the original
connection initiator (see Section 3.8 for heuristics). connection initiator (see Section 3.9 for heuristics).
A new subflow is started as a normal TCP SYN/ACK exchange. The Join A new subflow is started as a normal TCP SYN/ACK exchange. The Join
Connection (MP_JOIN) TCP option is used to identify the connection to Connection (MP_JOIN) MPTCP option is used to identify the connection
be joined by the new subflow. It uses keying material that was to be joined by the new subflow. It uses keying material that was
exchanged in the initial MP_CAPABLE handshake (Section 3.1), and that exchanged in the initial MP_CAPABLE handshake (Section 3.1), and that
handshake also negotiates the crypto algorithm in use for the MP_JOIN handshake also negotiates the crypto algorithm in use for the MP_JOIN
handshake. handshake.
This section specifies the behavior of MP_JOIN using the HMAC-SHA1 This section specifies the behavior of MP_JOIN using the HMAC-SHA1
algorithm. An MP_JOIN option is present in the SYN, SYN/ACK, and ACK algorithm. An MP_JOIN option is present in the SYN, SYN/ACK, and ACK
of the three-way handshake, although in each case with a different of the three-way handshake, although in each case with a different
format. format.
In the first MP_JOIN on the SYN packet, illustrated in Figure 5, the In the first MP_JOIN on the SYN packet, illustrated in Figure 5, the
skipping to change at page 20, line 25 skipping to change at page 20, line 25
Figure 5: Join Connection (MP_JOIN) Option (for Initial SYN) Figure 5: Join Connection (MP_JOIN) Option (for Initial SYN)
When receiving a SYN with an MP_JOIN option that contains a valid When receiving a SYN with an MP_JOIN option that contains a valid
token for an existing MPTCP connection, the recipient SHOULD respond token for an existing MPTCP connection, the recipient SHOULD respond
with a SYN/ACK also containing an MP_JOIN option containing a random with a SYN/ACK also containing an MP_JOIN option containing a random
number and a truncated (leftmost 64 bits) Hash-based Message number and a truncated (leftmost 64 bits) Hash-based Message
Authentication Code (HMAC). This version of the option is shown in Authentication Code (HMAC). This version of the option is shown in
Figure 6. If the token is unknown, or the host wants to refuse Figure 6. If the token is unknown, or the host wants to refuse
subflow establishment (for example, due to a limit on the number of subflow establishment (for example, due to a limit on the number of
subflows it will permit), the receiver will send back a reset (RST) subflows it will permit), the receiver will send back a reset (RST)
signal, analogous to an unknown port in TCP. Although calculating an signal, analogous to an unknown port in TCP, containing an MP_TCPRST
HMAC requires cryptographic operations, it is believed that the 32- option (Section 3.6) with an appropriate reason code. Although
bit token in the MP_JOIN SYN gives sufficient protection against calculating an HMAC requires cryptographic operations, it is believed
blind state exhaustion attacks; therefore, there is no need to that the 32-bit token in the MP_JOIN SYN gives sufficient protection
provide mechanisms to allow a responder to operate statelessly at the against blind state exhaustion attacks; therefore, there is no need
MP_JOIN stage. to provide mechanisms to allow a responder to operate statelessly at
the MP_JOIN stage.
An HMAC is sent by both hosts -- by the initiator (Host A) in the An HMAC is sent by both hosts -- by the initiator (Host A) in the
third packet (the ACK) and by the responder (Host B) in the second third packet (the ACK) and by the responder (Host B) in the second
packet (the SYN/ACK). Doing the HMAC exchange at this stage allows packet (the SYN/ACK). Doing the HMAC exchange at this stage allows
both hosts to have first exchanged random data (in the first two SYN both hosts to have first exchanged random data (in the first two SYN
packets) that is used as the "message". This specification defines packets) that is used as the "message". This specification defines
that HMAC as defined in [11] is used, along with the SHA-1 hash that HMAC as defined in [11] is used, along with the SHA-1 hash
algorithm [4] (potentially implemented as in [17]), thus generating a algorithm [4] (potentially implemented as in [17]), thus generating a
160-bit / 20-octet HMAC. Due to option space limitations, the HMAC 160-bit / 20-octet HMAC. Due to option space limitations, the HMAC
included in the SYN/ACK is truncated to the leftmost 64 bits, but included in the SYN/ACK is truncated to the leftmost 64 bits, but
skipping to change at page 21, line 44 skipping to change at page 21, line 45
+---------------+---------------+-------+-----------------------+ +---------------+---------------+-------+-----------------------+
| | | |
| | | |
| Sender's HMAC (160 bits) | | Sender's HMAC (160 bits) |
| | | |
| | | |
+---------------------------------------------------------------+ +---------------------------------------------------------------+
Figure 7: Join Connection (MP_JOIN) Option (for Third ACK) Figure 7: Join Connection (MP_JOIN) Option (for Third ACK)
These various TCP options fit together to enable authenticated These various MPTCP options fit together to enable authenticated
subflow setup as illustrated in Figure 8. subflow setup as illustrated in Figure 8.
Host A Host B Host A Host B
------------------------ ---------- ------------------------ ----------
Address A1 Address A2 Address B1 Address A1 Address A2 Address B1
---------- ---------- ---------- ---------- ---------- ----------
| | | | | |
| SYN + MP_CAPABLE(Key-A) | | SYN + MP_CAPABLE(Key-A) |
|--------------------------------------------->| |--------------------------------------------->|
|<---------------------------------------------| |<---------------------------------------------|
skipping to change at page 22, line 35 skipping to change at page 22, line 35
| |<-------------------------------| | |<-------------------------------|
| | ACK | | | ACK |
HMAC-A = HMAC(Key=(Key-A+Key-B), Msg=(R-A+R-B)) HMAC-A = HMAC(Key=(Key-A+Key-B), Msg=(R-A+R-B))
HMAC-B = HMAC(Key=(Key-B+Key-A), Msg=(R-B+R-A)) HMAC-B = HMAC(Key=(Key-B+Key-A), Msg=(R-B+R-A))
Figure 8: Example Use of MPTCP Authentication Figure 8: Example Use of MPTCP Authentication
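The HMAC construction in Figure 8 can be sketched in a few lines of Python. This is an illustrative sketch, not normative text: it reads "+" as octet-string concatenation, and it assumes 64-bit keys and 32-bit random numbers; the function names and sample values are hypothetical.

```python
import hashlib
import hmac


def mp_join_hmac(local_key: bytes, remote_key: bytes,
                 local_rand: bytes, remote_rand: bytes) -> bytes:
    """Return the full 160-bit HMAC-SHA1 that the local host sends.

    Key = local key followed by remote key; Msg = local random number
    followed by remote random number (the ordering shown in Figure 8).
    """
    return hmac.new(local_key + remote_key,
                    local_rand + remote_rand,
                    hashlib.sha1).digest()


def truncate_for_synack(full_hmac: bytes) -> bytes:
    """Keep the leftmost 64 bits, as carried in the SYN/ACK."""
    return full_hmac[:8]


# Sample values (assumed sizes: 8-octet keys, 4-octet random numbers).
key_a = bytes.fromhex("0102030405060708")
key_b = bytes.fromhex("1112131415161718")
r_a = bytes.fromhex("a1a2a3a4")
r_b = bytes.fromhex("b1b2b3b4")

hmac_a = mp_join_hmac(key_a, key_b, r_a, r_b)   # sent in full in the ACK
hmac_b = mp_join_hmac(key_b, key_a, r_b, r_a)   # truncated in the SYN/ACK
synack_field = truncate_for_synack(hmac_b)

# Host A recomputes HMAC-B locally and compares in constant time.
host_a_accepts = hmac.compare_digest(
    synack_field, truncate_for_synack(mp_join_hmac(key_b, key_a, r_b, r_a)))
```

Because the key and message orderings are swapped between the two hosts, HMAC-A and HMAC-B differ even though both are derived from the same four values.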
If the token received at Host B is unknown or local policy prohibits If the token received at Host B is unknown or local policy prohibits
the acceptance of the new subflow, the recipient MUST respond with a the acceptance of the new subflow, the recipient MUST respond with a
TCP RST for the subflow. TCP RST for the subflow, with an MP_TCPRST option (Section 3.6) with
an appropriate reason code.
If the token is accepted at Host B, but the HMAC returned to Host A If the token is accepted at Host B, but the HMAC returned to Host A
does not match the one expected, Host A MUST close the subflow with a does not match the one expected, Host A MUST close the subflow with a
TCP RST. TCP RST. In this, and all following cases of sending a RST in this
section, the sender SHOULD send an MP_TCPRST option (Section 3.6) on
this RST packet with the reason code for an "MPTCP specific error".
If Host B does not receive the expected HMAC, or the MP_JOIN option If Host B does not receive the expected HMAC, or the MP_JOIN option
is missing from the ACK, it MUST close the subflow with a TCP RST. is missing from the ACK, it MUST close the subflow with a TCP RST
with an MP_TCPRST (Section 3.6) option with the reason code for "MPTCP
specific error".
If the HMACs are verified as correct, then both hosts have If the HMACs are verified as correct, then both hosts have
authenticated each other as being the same peers as existed at the authenticated each other as being the same peers as existed at the
start of the connection, and they have agreed of which connection start of the connection, and they have agreed of which connection
this subflow will become a part. this subflow will become a part.
If the SYN/ACK as received at Host A does not have an MP_JOIN option, If the SYN/ACK as received at Host A does not have an MP_JOIN option,
Host A MUST close the subflow with a RST. Host A MUST close the subflow with a TCP RST with an MP_TCPRST
(Section 3.6) option with the reason code for "MPTCP specific error".
This covers all cases of the loss of an MP_JOIN. In more detail, if This covers all cases of the loss of an MP_JOIN. In more detail, if
MP_JOIN is stripped from the SYN on the path from A to B, and Host B MP_JOIN is stripped from the SYN on the path from A to B, and Host B
does not have a passive opener on the relevant port, it will respond does not have a passive opener on the relevant port, it will respond
with a RST in the normal way. If in response to a SYN with an with a RST in the normal way. If in response to a SYN with an
MP_JOIN option, a SYN/ACK is received without the MP_JOIN option MP_JOIN option, a SYN/ACK is received without the MP_JOIN option
(either since it was stripped on the return path, or it was stripped (either since it was stripped on the return path, or it was stripped
on the outgoing path but the passive opener on Host B responded as if on the outgoing path but the passive opener on Host B responded as if
it were a new regular TCP session), then the subflow is unusable and it were a new regular TCP session), then the subflow is unusable and
Host A MUST close it with a RST. Host A MUST close it with a RST.
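The accept/reset rules for the three MP_JOIN handshake packets can be summarized as a small decision sketch. The function and enum names are hypothetical, and the inputs are assumed to have been computed already (token lookup, local policy, HMAC comparison); the reset action stands for a TCP RST, which in the -03 text also carries an MP_TCPRST option.

```python
from enum import Enum


class Action(Enum):
    SEND_SYNACK = "reply with SYN/ACK + MP_JOIN (random number, truncated HMAC)"
    SEND_ACK = "reply with ACK + MP_JOIN (full HMAC)"
    ACCEPT = "subflow becomes part of the connection"
    RESET = "close the subflow with a RST"


def responder_on_syn(token_known: bool, policy_allows: bool) -> Action:
    """Host B receives a SYN carrying MP_JOIN."""
    if not token_known or not policy_allows:
        return Action.RESET
    return Action.SEND_SYNACK


def initiator_on_synack(has_mp_join: bool, hmac_matches: bool) -> Action:
    """Host A receives the SYN/ACK."""
    # A missing MP_JOIN (stripped, or a plain TCP listener answered)
    # makes the subflow unusable; so does an HMAC mismatch.
    if not has_mp_join or not hmac_matches:
        return Action.RESET
    return Action.SEND_ACK


def responder_on_ack(has_mp_join: bool, hmac_matches: bool) -> Action:
    """Host B receives the third ACK."""
    if not has_mp_join or not hmac_matches:
        return Action.RESET
    return Action.ACCEPT
```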
Note that additional subflows can be created between any pair of Note that additional subflows can be created between any pair of
ports (but see Section 3.8 for heuristics); no explicit application- ports (but see Section 3.9 for heuristics); no explicit application-
level accept calls or bind calls are required to open additional level accept calls or bind calls are required to open additional
subflows. To associate a new subflow with an existing connection, subflows. To associate a new subflow with an existing connection,
the token supplied in the subflow's SYN exchange is used for the token supplied in the subflow's SYN exchange is used for
demultiplexing. This then binds the 5-tuple of the TCP subflow to demultiplexing. This then binds the 5-tuple of the TCP subflow to
the local token of the connection. A consequence is that it is the local token of the connection. A consequence is that it is
possible to allow any port pairs to be used for a connection. possible to allow any port pairs to be used for a connection.
Demultiplexing subflow SYNs MUST be done using the token; this is Demultiplexing subflow SYNs MUST be done using the token; this is
unlike traditional TCP, where the destination port is used for unlike traditional TCP, where the destination port is used for
demultiplexing SYN packets. Once a subflow is set up, demultiplexing demultiplexing SYN packets. Once a subflow is set up, demultiplexing
skipping to change at page 24, line 52 skipping to change at page 25, line 6
are described in Section 3.3.3. The remaining reserved bits MUST be are described in Section 3.3.3. The remaining reserved bits MUST be
set to zero by an implementation of this specification. set to zero by an implementation of this specification.
Note that the checksum is only present in this option if the use of Note that the checksum is only present in this option if the use of
MPTCP checksumming has been negotiated at the MP_CAPABLE handshake MPTCP checksumming has been negotiated at the MP_CAPABLE handshake
(see Section 3.1). The presence of the checksum can be inferred from (see Section 3.1). The presence of the checksum can be inferred from
the length of the option. If a checksum is present, but its use had the length of the option. If a checksum is present, but its use had
not been negotiated in the MP_CAPABLE handshake, the checksum field not been negotiated in the MP_CAPABLE handshake, the checksum field
MUST be ignored. If a checksum is not present when its use has been MUST be ignored. If a checksum is not present when its use has been
negotiated, the receiver MUST close the subflow with a RST as it is negotiated, the receiver MUST close the subflow with a RST as it is
considered broken. considered broken. This RST SHOULD be accompanied by an MP_TCPRST
option (Section 3.6) with the reason code for an "MPTCP specific
error".
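The four combinations of "checksum negotiated" and "checksum present" described above reduce to a short decision function. This is only a sketch: the function name and the returned labels are hypothetical, and whether a checksum is present is assumed to have been inferred from the option length beforehand.

```python
def handle_dss_checksum(negotiated: bool, present: bool) -> str:
    """Decide what a receiver does with the DSS checksum field."""
    if negotiated and not present:
        # The subflow is considered broken.
        return "reset"
    if present and not negotiated:
        # The field MUST be ignored.
        return "ignore"
    if negotiated and present:
        return "verify"
    # Checksums are not in use and none was sent.
    return "none"
```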
3.3.1. Data Sequence Mapping 3.3.1. Data Sequence Mapping
The data stream as a whole can be reassembled through the use of the The data stream as a whole can be reassembled through the use of the
data sequence mapping components of the DSS option (Figure 9), which data sequence mapping components of the DSS option (Figure 9), which
define the mapping from the subflow sequence number to the data define the mapping from the subflow sequence number to the data
sequence number. This is used by the receiver to ensure in-order sequence number. This is used by the receiver to ensure in-order
delivery to the application layer. Meanwhile, the subflow-level delivery to the application layer. Meanwhile, the subflow-level
sequence numbers (i.e., the regular sequence numbers in the TCP sequence numbers (i.e., the regular sequence numbers in the TCP
header) have subflow-only relevance. It is expected (but not header) have subflow-only relevance. It is expected (but not
skipping to change at page 26, line 4 skipping to change at page 26, line 10
the subflow sequence numbering is relative (the SYN at the start of the subflow sequence numbering is relative (the SYN at the start of
the subflow has relative subflow sequence number 0). This is to the subflow has relative subflow sequence number 0). This is to
allow middleboxes to change the initial sequence number of a subflow, allow middleboxes to change the initial sequence number of a subflow,
such as firewalls that undertake ISN randomization. such as firewalls that undertake ISN randomization.
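A short sketch shows why relative numbering survives ISN rewriting. The helper name and the sample sequence numbers are hypothetical; arithmetic is done modulo 2^32 because TCP sequence numbers are 32 bits wide.

```python
MOD32 = 1 << 32


def relative_ssn(absolute_seq: int, isn: int) -> int:
    """Offset of a TCP sequence number from the subflow's SYN (SYN = 0)."""
    return (absolute_seq - isn) % MOD32


# The sender picks ISN 1000; a middlebox rewrites it to 77000 on the wire.
sender_isn, receiver_isn = 1000, 77000

# First data octet as numbered at each end of the path.
at_sender = relative_ssn(sender_isn + 1, sender_isn)
at_receiver = relative_ssn(receiver_isn + 1, receiver_isn)

# Both ends derive the same relative value, so the mapping still applies.
same_view = at_sender == at_receiver
```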
The data sequence mapping also contains a checksum of the data that The data sequence mapping also contains a checksum of the data that
this mapping covers, if use of checksums has been negotiated at the this mapping covers, if use of checksums has been negotiated at the
MP_CAPABLE exchange. Checksums are used to detect if the payload has MP_CAPABLE exchange. Checksums are used to detect if the payload has
been adjusted in any way by a non-MPTCP-aware middlebox. If this been adjusted in any way by a non-MPTCP-aware middlebox. If this
checksum fails, it will trigger a failure of the subflow, or a checksum fails, it will trigger a failure of the subflow, or a
fallback to regular TCP, as documented in Section 3.6, since MPTCP fallback to regular TCP, as documented in Section 3.7, since MPTCP
can no longer reliably know the subflow sequence space at the can no longer reliably know the subflow sequence space at the
receiver to build data sequence mappings. receiver to build data sequence mappings.
The checksum algorithm used is the standard TCP checksum [1], The checksum algorithm used is the standard TCP checksum [1],
operating over the data covered by this mapping, along with a pseudo- operating over the data covered by this mapping, along with a pseudo-
header as shown in Figure 10. header as shown in Figure 10.
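For illustration, the standard TCP (Internet) checksum is the 16-bit ones' complement of the ones' complement sum of the covered octets. The sketch below treats the pseudo-header as an opaque octet string supplied by the caller, since its field layout is given by Figure 10; the function names are hypothetical.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of `data`."""
    if len(data) % 2:
        data += b"\x00"          # pad odd-length input with a zero octet
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:           # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def dss_checksum(pseudo_header: bytes, payload: bytes) -> int:
    """Checksum over the pseudo-header followed by the mapped data."""
    return internet_checksum(pseudo_header + payload)
```

A receiver can recompute `dss_checksum` over the same pseudo-header and the payload it received, and compare the result with the value carried in the option; any payload modification by a middlebox that changes the sum will cause a mismatch.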
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+--------------------------------------------------------------+ +--------------------------------------------------------------+
skipping to change at page 28, line 4 skipping to change at page 28, line 9
A data sequence mapping does not need to be included in every MPTCP A data sequence mapping does not need to be included in every MPTCP
packet, as long as the subflow sequence space in that packet is packet, as long as the subflow sequence space in that packet is
covered by a mapping known at the receiver. This can be used to covered by a mapping known at the receiver. This can be used to
reduce overhead in cases where the mapping is known in advance; one reduce overhead in cases where the mapping is known in advance; one
such case is when there is a single subflow between the hosts, such case is when there is a single subflow between the hosts,
another is when segments of data are scheduled in larger than packet- another is when segments of data are scheduled in larger than packet-
sized chunks. sized chunks.
An "infinite" mapping can be used to fall back to regular TCP by An "infinite" mapping can be used to fall back to regular TCP by
mapping the subflow-level data to the connection-level data for the mapping the subflow-level data to the connection-level data for the
remainder of the connection (see Section 3.6). This is achieved by remainder of the connection (see Section 3.7). This is achieved by
setting the Data-Level Length field of the DSS option to the reserved setting the Data-Level Length field of the DSS option to the reserved
value of 0. The checksum, in such a case, will also be set to zero. value of 0. The checksum, in such a case, will also be set to zero.
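The way a receiver applies a mapping, including the "infinite" case, can be sketched as follows. The class and function names are hypothetical, a 64-bit data sequence number is assumed, and wraparound of the subflow sequence space is ignored for brevity.

```python
from dataclasses import dataclass
from typing import Optional

MOD64 = 1 << 64


@dataclass
class Mapping:
    dsn: int      # data sequence number of the first mapped octet
    ssn: int      # relative subflow sequence number of that octet
    length: int   # Data-Level Length; 0 means an "infinite" mapping


def to_dsn(mapping: Mapping, rel_ssn: int) -> Optional[int]:
    """Translate a relative subflow sequence number to a data sequence
    number, or return None if the mapping does not cover it."""
    offset = rel_ssn - mapping.ssn
    if offset < 0:
        return None
    if mapping.length != 0 and offset >= mapping.length:
        return None
    return (mapping.dsn + offset) % MOD64


finite = Mapping(dsn=5000, ssn=1, length=100)
infinite = Mapping(dsn=5000, ssn=1, length=0)
```

With the finite mapping, `to_dsn(finite, 101)` falls outside the 100 mapped octets and returns None; with the infinite mapping, every later subflow octet keeps mapping onto the data stream for the rest of the connection.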
3.3.2. Data Acknowledgments 3.3.2. Data Acknowledgments
To provide full end-to-end resilience, MPTCP provides a connection- To provide full end-to-end resilience, MPTCP provides a connection-
level acknowledgment, to act as a cumulative ACK for the connection level acknowledgment, to act as a cumulative ACK for the connection
as a whole. This is the "Data ACK" field of the DSS option as a whole. This is the "Data ACK" field of the DSS option
(Figure 9). The Data ACK is analogous to the behavior of the (Figure 9). The Data ACK is analogous to the behavior of the
standard TCP cumulative ACK -- indicating how much data has been standard TCP cumulative ACK -- indicating how much data has been
skipping to change at page 28, line 49 skipping to change at page 29, line 6
An MPTCP sender MUST NOT free data from the send buffer until it has An MPTCP sender MUST NOT free data from the send buffer until it has
been acknowledged by both a Data ACK received on any subflow and at been acknowledged by both a Data ACK received on any subflow and at
the subflow level by all subflows on which the data was sent. The the subflow level by all subflows on which the data was sent. The
former condition ensures liveness of the connection and the latter former condition ensures liveness of the connection and the latter
condition ensures liveness and self-consistence of a subflow when condition ensures liveness and self-consistence of a subflow when
data needs to be retransmitted. Note, however, that if some data data needs to be retransmitted. Note, however, that if some data
needs to be retransmitted multiple times over a subflow, there is a needs to be retransmitted multiple times over a subflow, there is a
risk of blocking the sending window. In this case, the MPTCP sender risk of blocking the sending window. In this case, the MPTCP sender
can decide to terminate the subflow that is behaving badly by sending can decide to terminate the subflow that is behaving badly by sending
a RST. a RST, using an appropriate MP_TCPRST (Section 3.6) error code.
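The two-part condition for freeing send-buffer data can be expressed as a single predicate. This is a simplified sketch with hypothetical names: the Data ACK is treated as a cumulative value naming the next expected data sequence number, sequence-number wraparound is ignored, and per-subflow acknowledgment state is assumed to be tracked elsewhere.

```python
from typing import Dict


def can_free(chunk_end_dsn: int, data_ack: int,
             subflow_acked: Dict[str, bool]) -> bool:
    """True once a chunk may be removed from the send buffer.

    chunk_end_dsn: data sequence number just past the chunk's last octet.
    data_ack:      highest cumulative Data ACK received on any subflow.
    subflow_acked: for every subflow the chunk was sent on, whether that
                   subflow has acknowledged it at the TCP level.
    """
    acked_at_connection_level = data_ack >= chunk_end_dsn
    acked_on_all_subflows = all(subflow_acked.values())
    return acked_at_connection_level and acked_on_all_subflows
```

For example, a chunk retransmitted on a second subflow and covered by a Data ACK still has to stay buffered while the first subflow has not acknowledged it, which is the window-blocking risk noted above.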
The Data ACK MAY be included in all segments; however, optimizations The Data ACK MAY be included in all segments; however, optimizations
SHOULD be considered in more advanced implementations, where the Data SHOULD be considered in more advanced implementations, where the Data
ACK is present in segments only when the Data ACK value advances, and ACK is present in segments only when the Data ACK value advances, and
this behavior MUST be treated as valid. This behavior ensures the this behavior MUST be treated as valid. This behavior ensures the
sender buffer is freed, while reducing overhead when the data sender buffer is freed, while reducing overhead when the data
transfer is unidirectional. transfer is unidirectional.
3.3.3. Closing a Connection 3.3.3. Closing a Connection
skipping to change at page 30, line 17 skipping to change at page 30, line 22
the segment with the DATA_FIN flag set is the only outstanding the segment with the DATA_FIN flag set is the only outstanding
segment. segment.
Once a DATA_FIN has been acknowledged, all remaining subflows MUST be Once a DATA_FIN has been acknowledged, all remaining subflows MUST be
closed with standard FIN exchanges. Both hosts SHOULD send FINs on closed with standard FIN exchanges. Both hosts SHOULD send FINs on
all subflows, as a courtesy to allow middleboxes to clean up state all subflows, as a courtesy to allow middleboxes to clean up state
even if an individual subflow has failed. It is also encouraged to even if an individual subflow has failed. It is also encouraged to
reduce the timeouts (Maximum Segment Life) on subflows at end hosts. reduce the timeouts (Maximum Segment Life) on subflows at end hosts.
In particular, any subflows where there is still outstanding data In particular, any subflows where there is still outstanding data
queued (which has been retransmitted on other subflows in order to queued (which has been retransmitted on other subflows in order to
get the DATA_FIN acknowledged) MAY be closed with a RST. get the DATA_FIN acknowledged) MAY be closed with a RST with
MP_TCPRST (Section 3.6) error code for "too much outstanding data".
A connection is considered closed once both hosts' DATA_FINs have A connection is considered closed once both hosts' DATA_FINs have
been acknowledged by DATA_ACKs. been acknowledged by DATA_ACKs.
As specified above, a standard TCP FIN on an individual subflow only As specified above, a standard TCP FIN on an individual subflow only
shuts down the subflow on which it was sent. If all subflows have shuts down the subflow on which it was sent. If all subflows have
been closed with a FIN exchange, but no DATA_FIN has been received been closed with a FIN exchange, but no DATA_FIN has been received
and acknowledged, the MPTCP connection is treated as closed only and acknowledged, the MPTCP connection is treated as closed only
after a timeout. This implies that an implementation will have after a timeout. This implies that an implementation will have
TIME_WAIT states at both the subflow and connection levels (see TIME_WAIT states at both the subflow and connection levels (see
skipping to change at page 33, line 32 skipping to change at page 33, line 38
too. The sender will declare the subflow failed after a predefined too. The sender will declare the subflow failed after a predefined
upper bound on retransmissions is reached (which MAY be lower than upper bound on retransmissions is reached (which MAY be lower than
the usual TCP limits of the Maximum Segment Life), or on the receipt the usual TCP limits of the Maximum Segment Life), or on the receipt
of an ICMP error, and only then delete the outstanding data segments. of an ICMP error, and only then delete the outstanding data segments.
Multiple retransmissions are triggers that will indicate that a Multiple retransmissions are triggers that will indicate that a
subflow performs badly and could lead to a host resetting the subflow subflow performs badly and could lead to a host resetting the subflow
with a RST. However, additional research is required to understand with a RST. However, additional research is required to understand
the heuristics of how and when to reset underperforming subflows. the heuristics of how and when to reset underperforming subflows.
For example, a highly asymmetric path may be misdiagnosed as For example, a highly asymmetric path may be misdiagnosed as
underperforming. underperforming. A RST for this purpose SHOULD be accompanied by
an appropriate MP_TCPRST option (Section 3.6).
3.3.7. Congestion Control Considerations 3.3.7. Congestion Control Considerations
Different subflows in an MPTCP connection have different congestion Different subflows in an MPTCP connection have different congestion
windows. To achieve fairness at bottlenecks and resource pooling, it windows. To achieve fairness at bottlenecks and resource pooling, it
is necessary to couple the congestion windows in use on each subflow, is necessary to couple the congestion windows in use on each subflow,
in order to push most traffic to uncongested links. One algorithm in order to push most traffic to uncongested links. One algorithm
for achieving this is presented in [5]; the algorithm does not for achieving this is presented in [6]; the algorithm does not
achieve perfect resource pooling but is "safe" in that it is readily achieve perfect resource pooling but is "safe" in that it is readily
deployable in the current Internet. By this, we mean that it does deployable in the current Internet. By this, we mean that it does
not take up more capacity on any one path than if it was a single not take up more capacity on any one path than if it was a single
path flow using only that route, so this ensures fair coexistence path flow using only that route, so this ensures fair coexistence
with single-path TCP at shared bottlenecks. with single-path TCP at shared bottlenecks.
It is foreseeable that different congestion controllers will be It is foreseeable that different congestion controllers will be
implemented for MPTCP, each aiming to achieve different properties in implemented for MPTCP, each aiming to achieve different properties in
the resource pooling/fairness/stability design space, as well as the resource pooling/fairness/stability design space, as well as
those for achieving different properties in quality of service, those for achieving different properties in quality of service,
skipping to change at page 34, line 18 skipping to change at page 34, line 25
for each subflow, which packets were lost and when. for each subflow, which packets were lost and when.
3.3.8. Subflow Policy 3.3.8. Subflow Policy
Within a local MPTCP implementation, a host may use any local policy Within a local MPTCP implementation, a host may use any local policy
it wishes to decide how to share the traffic to be sent over the it wishes to decide how to share the traffic to be sent over the
available paths. available paths.
In the typical use case, where the goal is to maximize throughput, In the typical use case, where the goal is to maximize throughput,
all available paths will be used simultaneously for data transfer, all available paths will be used simultaneously for data transfer,
using coupled congestion control as described in [5]. It is using coupled congestion control as described in [6]. It is
expected, however, that other use cases will appear. expected, however, that other use cases will appear.
For instance, a possibility is an 'all-or-nothing' approach, i.e., For instance, a possibility is an 'all-or-nothing' approach, i.e.,
have a second path ready for use in the event of failure of the first have a second path ready for use in the event of failure of the first
path, but alternatives could include entirely saturating one path path, but alternatives could include entirely saturating one path
before using an additional path (the 'overflow' case). Such choices before using an additional path (the 'overflow' case). Such choices
would be most likely based on the monetary cost of links, but may would be most likely based on the monetary cost of links, but may
also be based on properties such as the delay or jitter of links, also be based on properties such as the delay or jitter of links,
where stability (of delay or bandwidth) is more important than where stability (of delay or bandwidth) is more important than
throughput. Application requirements such as these are discussed in throughput. Application requirements such as these are discussed in
detail in [6]. detail in [7].
The ability to make effective choices at the sender requires full The ability to make effective choices at the sender requires full
knowledge of the path "cost", which is unlikely to be the case. It knowledge of the path "cost", which is unlikely to be the case. It
would be desirable for a receiver to be able to signal their own would be desirable for a receiver to be able to signal their own
preferences for paths, since they will often be the multihomed party, preferences for paths, since they will often be the multihomed party,
and may have to pay for metered incoming bandwidth. and may have to pay for metered incoming bandwidth.
Whilst fine-grained control may be the most powerful solution, that Whilst fine-grained control may be the most powerful solution, that
would require some mechanism such as overloading the Explicit would require some mechanism such as overloading the Explicit
Congestion Notification (ECN) signal [19], which is undesirable, and Congestion Notification (ECN) signal [19], which is undesirable, and
skipping to change at page 36, line 44 skipping to change at page 37, line 7
can try to initiate a new subflow from one or more of its can try to initiate a new subflow from one or more of its
addresses to address A2. This permits new sessions to be opened addresses to address A2. This permits new sessions to be opened
if one host is behind a NAT. if one host is behind a NAT.
Other ways of using the two signaling mechanisms are possible; for Other ways of using the two signaling mechanisms are possible; for
instance, signaling addresses in other address families can only be instance, signaling addresses in other address families can only be
done explicitly using the Add Address option. done explicitly using the Add Address option.
3.4.1. Address Advertisement 3.4.1. Address Advertisement
The Add Address (ADD_ADDR2) TCP option announces additional addresses The Add Address (ADD_ADDR2) MPTCP option announces additional
(and optionally, ports) on which a host can be reached (Figure 12). addresses (and optionally, ports) on which a host can be reached
This option can be used at any time during a connection, depending on (Figure 12). This option can be used at any time during a
when the sender wishes to enable multiple paths and/or when paths connection, depending on when the sender wishes to enable multiple
become available. As with all MPTCP signals, the receiver MUST paths and/or when paths become available. As with all MPTCP signals,
undertake standard TCP validity checks, e.g. [12], before acting upon the receiver MUST undertake standard TCP validity checks, e.g. [12],
it. before acting upon it.
Every address has an Address ID that can be used for uniquely Every address has an Address ID that can be used for uniquely
identifying the address within a connection for address removal. identifying the address within a connection for address removal.
This is also used to identify MP_JOIN options (see Section 3.2) This is also used to identify MP_JOIN options (see Section 3.2)
relating to the same address, even when address translators are in relating to the same address, even when address translators are in
use. The Address ID MUST uniquely identify the address to the sender use. The Address ID MUST uniquely identify the address to the sender
(within the scope of the connection), but the mechanism for (within the scope of the connection), but the mechanism for
allocating such IDs is implementation specific. allocating such IDs is implementation specific.
All address IDs learned via either MP_JOIN or ADD_ADDR2 SHOULD be All address IDs learned via either MP_JOIN or ADD_ADDR2 SHOULD be
skipping to change at page 37, line 39 skipping to change at page 37, line 49
The 2 octets that specify the TCP port number to use are optional and The 2 octets that specify the TCP port number to use are optional and
their presence can be inferred from the length of the option. their presence can be inferred from the length of the option.
Although it is expected that the majority of use cases will use the Although it is expected that the majority of use cases will use the
same port pairs as used for the initial subflow (e.g., port 80 same port pairs as used for the initial subflow (e.g., port 80
remains port 80 on all subflows, as does the ephemeral port at the remains port 80 on all subflows, as does the ephemeral port at the
client), there may be cases (such as port-based load balancing) where client), there may be cases (such as port-based load balancing) where
the explicit specification of a different port is required. If no the explicit specification of a different port is required. If no
port is specified, MPTCP SHOULD attempt to connect to the specified port is specified, MPTCP SHOULD attempt to connect to the specified
address on the same port as is already in use by the subflow on which address on the same port as is already in use by the subflow on which
the ADD_ADDR2 signal was sent; this is discussed in more detail in the ADD_ADDR2 signal was sent; this is discussed in more detail in
Section 3.8. Section 3.9.
The Truncated HMAC present in this Option is the rightmost 64 bits of The Truncated HMAC present in this Option is the rightmost 64 bits of
an HMAC, negotiated and calculated in the same way as for MP_JOIN as an HMAC, negotiated and calculated in the same way as for MP_JOIN as
described in Section 3.2. For this specification of MPTCP, as there described in Section 3.2. For this specification of MPTCP, as there
is only one hash algorithm option specified, this will be HMAC as is only one hash algorithm option specified, this will be HMAC as
defined in [11], using the SHA-1 hash algorithm [4], implemented as defined in [11], using the SHA-1 hash algorithm [4], implemented as
in [17]. The key used in the HMAC calculation is that of the sender, in [17]. The key used in the HMAC calculation is that of the sender,
as originally declared in the MP_CAPABLE handshake. The message for as originally declared in the MP_CAPABLE handshake. The message for
the HMAC is the Address ID, IP Address, and Port which precede the the HMAC is the Address ID, IP Address, and Port which precede the
HMAC in the ADD_ADDR2 option. The rationale for the HMAC is to HMAC in the ADD_ADDR2 option. The rationale for the HMAC is to
skipping to change at page 38, line 50 skipping to change at page 39, line 12
Ideally, ADD_ADDR2 and REMOVE_ADDR options would be sent reliably, Ideally, ADD_ADDR2 and REMOVE_ADDR options would be sent reliably,
and in order, to the other end. This would ensure that this address and in order, to the other end. This would ensure that this address
management does not unnecessarily cause an outage in the connection management does not unnecessarily cause an outage in the connection
when remove/add addresses are processed in reverse order, and also to when remove/add addresses are processed in reverse order, and also to
ensure that all possible paths are used. Note, however, that losing ensure that all possible paths are used. Note, however, that losing
reliability and ordering will not break the multipath connections, it reliability and ordering will not break the multipath connections, it
will just reduce the opportunity to open multipath paths and to will just reduce the opportunity to open multipath paths and to
survive different patterns of path failures. survive different patterns of path failures.
Therefore, implementing reliability signals for these TCP options is Therefore, implementing reliability signals for these MPTCP options
not necessary. In order to minimize the impact of the loss of these is not necessary. In order to minimize the impact of the loss of
options, however, it is RECOMMENDED that a sender should send these these options, however, it is RECOMMENDED that a sender should send
options on all available subflows. If these options need to be these options on all available subflows. If these options need to be
received in order, an implementation SHOULD only send one ADD_ADDR2/ received in order, an implementation SHOULD only send one ADD_ADDR2/
REMOVE_ADDR option per RTT, to minimize the risk of misordering. REMOVE_ADDR option per RTT, to minimize the risk of misordering.
A host can send an ADD_ADDR2 message with an already assigned Address A host can send an ADD_ADDR2 message with an already assigned Address
ID, but the Address MUST be the same as previously assigned to this ID, but the Address MUST be the same as previously assigned to this
Address ID, and the Port MUST be different from one already in use Address ID, and the Port MUST be different from one already in use
for this Address ID. If these conditions are not met, the receiver for this Address ID. If these conditions are not met, the receiver
SHOULD silently ignore the ADD_ADDR2. A host wishing to replace an SHOULD silently ignore the ADD_ADDR2. A host wishing to replace an
existing Address ID MUST first remove the existing one existing Address ID MUST first remove the existing one
(Section 3.4.2). (Section 3.4.2).
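The receiver-side check described above can be sketched as follows. This is a minimal illustration rather than a normative algorithm: the table layout and the name accept_add_addr2 are hypothetical, the port is assumed to have been resolved by the caller when the option carried none, and verification of the Truncated HMAC is assumed to have happened already.

```python
from typing import Dict, Set, Tuple

# Hypothetical per-connection state: Address ID -> (address, ports in use).
AddrTable = Dict[int, Tuple[str, Set[int]]]


def accept_add_addr2(table: AddrTable, addr_id: int,
                     address: str, port: int) -> bool:
    """Record an ADD_ADDR2 and return True, or return False when the
    option has to be silently ignored."""
    entry = table.get(addr_id)
    if entry is None:
        # First announcement for this Address ID.
        table[addr_id] = (address, {port})
        return True
    known_address, known_ports = entry
    # Reusing an Address ID requires the same address and a new port.
    if address != known_address or port in known_ports:
        return False
    known_ports.add(port)
    return True
```

A host that wants to bind the Address ID to a different address would first have to remove the existing entry, which this sketch does not model.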
skipping to change at page 41, line 45 skipping to change at page 42, line 5
attempted fast closure simultaneously. Host A should reply with a attempted fast closure simultaneously. Host A should reply with a
TCP RST and tear down the connection. TCP RST and tear down the connection.
o If Host A does not receive a TCP RST in reply to its MP_FASTCLOSE o If Host A does not receive a TCP RST in reply to its MP_FASTCLOSE
after one retransmission timeout (RTO) (the RTO of the subflow after one retransmission timeout (RTO) (the RTO of the subflow
where the MPTCP_RST has been sent), it SHOULD retransmit the where the MPTCP_RST has been sent), it SHOULD retransmit the
MP_FASTCLOSE. The number of retransmissions SHOULD be limited to MP_FASTCLOSE. The number of retransmissions SHOULD be limited to
avoid this connection from being retained for a long time, but avoid this connection from being retained for a long time, but
this limit is implementation specific. A RECOMMENDED number is 3. this limit is implementation specific. A RECOMMENDED number is 3.
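The retransmission rule in the last bullet can be sketched as a small loop. The callables send_fastclose and wait_for_rst are hypothetical hooks into an implementation, and the sketch assumes a fixed RTO between attempts; whether the timer should back off is not stated here.

```python
from typing import Callable


def fast_close(send_fastclose: Callable[[], None],
               wait_for_rst: Callable[[float], bool],
               rto: float,
               max_retransmissions: int = 3) -> bool:
    """Send MP_FASTCLOSE and retransmit it after each RTO without a RST.

    Returns True if a TCP RST was seen, False if the retransmission
    limit was reached first.
    """
    send_fastclose()
    for _ in range(max_retransmissions):
        if wait_for_rst(rto):
            return True
        send_fastclose()  # no RST within one RTO: retransmit
    # Give the final retransmission one RTO to be answered.
    return wait_for_rst(rto)
```

With the default limit of 3, the option is sent at most four times in total (one original transmission plus three retransmissions).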
3.6. Fallback 3.6. Subflow Reset
As discussed in Section 3.5 above, the MP_FASTCLOSE option provides a
connection-level reset roughly analogous to a TCP RST. Regular TCP
RSTs remain in use at the subflow level to indicate that the
receiving host has no knowledge of the MPTCP subflow or TCP
connection to which the packet belongs.
However, in MPTCP, there may be many reasons for rejecting the
opening of a subflow, but these semantics cannot be carried in a
standard TCP RST. It would be beneficial for a host to know the
reasons why its subflow has been closed with a RST, and thus whether
it should try to re-establish the subflow immediately, later, or
never again. These semantics are carried in the MP_TCPRST option
that can be included on a TCP RST packet.
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+-----------------------+
|     Kind      |    Length     |Subtype|U|V|W|T|    Reason     |
+---------------+---------------+-------+-----------------------+
Figure 15: TCP RST Reason (MP_TCPRST) Option
The MP_TCPRST option contains a reason code that allows the sender of
the option to provide more information about the reason for the
termination of the subflow. Using 12 bits of option space, the first
four bits are reserved for flags (only one of which is currently
defined), and the remaining octet is used to express a reason code
for this subflow termination, from which a receiver MAY infer
information about the usability of this path.
The "T" flag is used by the sender to indicate whether the error
condition that is reported is Transient (T bit set to 1) or Permanent
(T bit set to 0). If the error condition is considered to be
Transient by the sender of the RST segment, the recipient of this
segment MAY try to re-establish a subflow for this connection over
the failed path. The time at which a receiver may try to
re-establish this is implementation-specific, but SHOULD take into
account the properties of the failure defined by the following reason
code. If the error condition is considered to be Permanent, the
receiver of the RST segment SHOULD NOT try to re-establish a subflow
for this connection over this path. The "U", "V" and "W" flags are
not defined by this specification and are reserved for future use.
The "Reason" code is an 8-bit field that indicates the reason for the
termination of the subflow. The following codes are defined in this
document:
o Unspecified error (code 0x00). This is the default error implying
the subflow is no longer available. The receiving host SHOULD
take account of the 'T' bit in deciding whether to re-establish
this subflow. The presence of this option shows that the RST was
generated by an MPTCP-aware device.
o MPTCP specific error (code 0x01). An error has been detected in
the processing of MPTCP options. This is the usual reason code to
return in the cases where a RST is being sent to close a subflow
for reasons of an invalid response.
o Lack of resources (code 0x02). This code indicates that the
sending host does not have enough resources to support the
terminated subflow.
o Administratively prohibited (code 0x03). This code indicates that
the requested subflow is prohibited by the policies of the sending
host.
o Too much outstanding data (code 0x04). This code indicates that
there is an excessive amount of data that needs to be transmitted
over the terminated subflow while having already been acknowledged
over one or more other subflows. This may occur if a path has
been unavailable for a short period and it is more efficient to
reset and start again than it is to retransmit the queued data.
o Unacceptable performance (code 0x05). This code indicates that
the performance of this subflow was too low compared to the other
subflows of this Multipath TCP connection.
o Middlebox interference (code 0x06). Middlebox interference has
been detected over this subflow making MPTCP signaling invalid.
For example, this may be sent if the checksum does not validate.
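The option layout in Figure 15 and the handling of the "T" flag can be sketched as below. This is an illustrative encoding only. It assumes TCP option kind 30 (the kind assigned to MPTCP), a length of 4 octets as implied by the figure, and a subtype value of 0x8, which is an assumption because this section does not state the subtype. The helper names are hypothetical.

```python
import struct
from typing import NamedTuple

MPTCP_KIND = 30          # TCP option kind assigned to MPTCP
MP_TCPRST_LENGTH = 4     # one 32-bit word, as drawn in Figure 15
MP_TCPRST_SUBTYPE = 0x8  # assumption: not specified in this section
T_FLAG = 0x1             # lowest of the four flag bits U, V, W, T


class TcpRst(NamedTuple):
    transient: bool
    reason: int


def build_mp_tcprst(reason: int, transient: bool,
                    subtype: int = MP_TCPRST_SUBTYPE) -> bytes:
    """Encode Kind, Length, Subtype|U|V|W|T and the 8-bit Reason."""
    if not 0 <= reason <= 0xFF:
        raise ValueError("reason code must fit in 8 bits")
    flags = T_FLAG if transient else 0  # U, V and W stay zero (reserved)
    return struct.pack("!BBBB", MPTCP_KIND, MP_TCPRST_LENGTH,
                       (subtype << 4) | flags, reason)


def parse_mp_tcprst(option: bytes,
                    subtype: int = MP_TCPRST_SUBTYPE) -> TcpRst:
    """Decode an MP_TCPRST option, ignoring the reserved U, V, W bits."""
    if len(option) != MP_TCPRST_LENGTH:
        raise ValueError("unexpected option length")
    kind, length, sub_flags, reason = struct.unpack("!BBBB", option)
    if (kind != MPTCP_KIND or length != MP_TCPRST_LENGTH
            or sub_flags >> 4 != subtype):
        raise ValueError("not an MP_TCPRST option")
    return TcpRst(bool(sub_flags & T_FLAG), reason)


def may_retry(rst: TcpRst) -> bool:
    """A Transient error MAY be retried; a Permanent one SHOULD NOT be."""
    return rst.transient
```

When a retry is allowed, the time at which it happens is left to the implementation, which would normally take the reason code into account.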
3.7. Fallback
Sometimes, middleboxes will exist on a path that could prevent the Sometimes, middleboxes will exist on a path that could prevent the
operation of MPTCP. MPTCP has been designed in order to cope with operation of MPTCP. MPTCP has been designed in order to cope with
many middlebox modifications (see Section 6), but there are still many middlebox modifications (see Section 6), but there are still
some cases where a subflow could fail to operate within the MPTCP some cases where a subflow could fail to operate within the MPTCP
requirements. These cases are notably the following: the loss of TCP requirements. These cases are notably the following: the loss of
options on a path and the modification of payload data. If such an MPTCP options on a path and the modification of payload data. If
event occurs, it is necessary to "fall back" to the previous, safe such an event occurs, it is necessary to "fall back" to the previous,
operation. This may be either falling back to regular TCP or safe operation. This may be either falling back to regular TCP or
removing a problematic subflow. removing a problematic subflow.
At the start of an MPTCP connection (i.e., the first subflow), it is At the start of an MPTCP connection (i.e., the first subflow), it is
important to ensure that the path is fully MPTCP capable and the important to ensure that the path is fully MPTCP capable and the
necessary TCP options can reach each host. The handshake as necessary MPTCP options can reach each host. The handshake as
described in Section 3.1 SHOULD fall back to regular TCP if either of described in Section 3.1 SHOULD fall back to regular TCP if either of
the SYN messages do not have the MPTCP options: this is the same, and the SYN messages do not have the MPTCP options: this is the same, and
desired, behavior in the case where a host is not MPTCP capable, or desired, behavior in the case where a host is not MPTCP capable, or
the path does not support the MPTCP options. When attempting to join the path does not support the MPTCP options. When attempting to join
an existing MPTCP connection (Section 3.2), if a path is not MPTCP an existing MPTCP connection (Section 3.2), if a path is not MPTCP
capable and the TCP options do not get through on the SYNs, the capable and the MPTCP options do not get through on the SYNs, the
subflow will be closed according to the MP_JOIN logic. subflow will be closed according to the MP_JOIN logic.
There is, however, another corner case that should be addressed. There is, however, another corner case that should be addressed.
That is one of MPTCP options getting through on the SYN, but not on That is one of MPTCP options getting through on the SYN, but not on
regular packets. This can be resolved if the subflow is the first regular packets. This can be resolved if the subflow is the first
subflow, and thus all data in flight is contiguous, using the subflow, and thus all data in flight is contiguous, using the
following rules. following rules.
A sender MUST include a DSS option with data sequence mapping in A sender MUST include a DSS option with data sequence mapping in
every segment until one of the sent segments has been acknowledged every segment until one of the sent segments has been acknowledged
skipping to change at page 43, line 6 skipping to change at page 44, line 48
Note that this rule essentially prohibits the sending of data on the Note that this rule essentially prohibits the sending of data on the
third packet of an MP_CAPABLE or MP_JOIN handshake, since both that third packet of an MP_CAPABLE or MP_JOIN handshake, since both that
option and a DSS cannot fit in TCP option space. If the initiator is option and a DSS cannot fit in TCP option space. If the initiator is
to send first, another segment must be sent that contains the data to send first, another segment must be sent that contains the data
and DSS. Note also that an additional subflow cannot be used until and DSS. Note also that an additional subflow cannot be used until
the initial path has been verified as MPTCP capable. the initial path has been verified as MPTCP capable.
If a subflow breaks during operation, e.g. if it is re-routed and If a subflow breaks during operation, e.g. if it is re-routed and
MPTCP options are no longer permitted, then once this is detected (by MPTCP options are no longer permitted, then once this is detected (by
the subflow-level receive buffer filling up), the subflow SHOULD be the subflow-level receive buffer filling up), the subflow SHOULD be
treated as broken as closed with a RST, since no data can be treated as broken and closed with a RST, since no data can be
delivered to the application layer, and no fallback signal can be delivered to the application layer, and no fallback signal can be
reliably sent. reliably sent. This RST SHOULD include the MP_TCPRST option
(Section 3.6) with an appropriate reason code.
These rules should cover all cases where such a failure could happen: These rules should cover all cases where such a failure could happen:
whether it's on the forward or reverse path and whether the server or whether it's on the forward or reverse path and whether the server or
the client first sends data. If lost options on data packets occur the client first sends data. If lost options on data packets occur
on any other subflow apart from the initial subflow, it should be on any other subflow apart from the initial subflow, it should be
treated as a standard path failure. The data would not be DATA_ACKed treated as a standard path failure. The data would not be DATA_ACKed
(since there is no mapping for the data), and the subflow can be (since there is no mapping for the data), and the subflow can be
closed with a RST. closed with a RST, containing a MP_TCPRST option (Section 3.6) with
an appropriate reason code.
The case described above is a specialized case of fallback, for when The case described above is a specialized case of fallback, for when
the lack of MPTCP support is detected before any data is acknowledged the lack of MPTCP support is detected before any data is acknowledged
at the connection level on a subflow. More generally, fallback at the connection level on a subflow. More generally, fallback
(either closing a subflow, or to regular TCP) can become necessary at (either closing a subflow, or to regular TCP) can become necessary at
any point during a connection if a non-MPTCP-aware middlebox changes any point during a connection if a non-MPTCP-aware middlebox changes
the data stream. the data stream.
As described in Section 3.3, each portion of data for which there is As described in Section 3.3, each portion of data for which there is
a mapping is protected by a checksum, if checksums have been a mapping is protected by a checksum, if checksums have been
skipping to change at page 44, line 6 skipping to change at page 45, line 50
tampered with. tampered with.
When multiple subflows are in use, the data in flight on a subflow When multiple subflows are in use, the data in flight on a subflow
will likely involve data that is not contiguously part of the will likely involve data that is not contiguously part of the
connection-level stream, since segments will be spread across the connection-level stream, since segments will be spread across the
multiple subflows. Due to the problems identified above, it is not multiple subflows. Due to the problems identified above, it is not
possible to determine what the adjustment has done to the data possible to determine what the adjustment has done to the data
(notably, any changes to the subflow sequence numbering). Therefore, (notably, any changes to the subflow sequence numbering). Therefore,
it is not possible to recover the subflow, and the affected subflow it is not possible to recover the subflow, and the affected subflow
must be immediately closed with a RST, featuring an MP_FAIL option must be immediately closed with a RST, featuring an MP_FAIL option
(Figure 15), which defines the data sequence number at the start of (Figure 16), which defines the data sequence number at the start of
the segment (defined by the data sequence mapping) that had the the segment (defined by the data sequence mapping) that had the
checksum failure. Note that the MP_FAIL option requires the use of checksum failure. Note that the MP_FAIL option requires the use of
the full 64-bit sequence number, even if 32-bit sequence numbers are the full 64-bit sequence number, even if 32-bit sequence numbers are
normally in use in the DSS signals on the path. normally in use in the DSS signals on the path.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+----------------------+ +---------------+---------------+-------+----------------------+
| Kind | Length=12 |Subtype| (reserved) | | Kind | Length=12 |Subtype| (reserved) |
+---------------+---------------+-------+----------------------+ +---------------+---------------+-------+----------------------+
| | | |
| Data Sequence Number (8 octets) | | Data Sequence Number (8 octets) |
| | | |
+--------------------------------------------------------------+ +--------------------------------------------------------------+
Figure 15: Fallback (MP_FAIL) Option Figure 16: Fallback (MP_FAIL) Option
The receiver MUST discard all data following the data sequence number The receiver MUST discard all data following the data sequence number
specified. Failed data MUST NOT be DATA_ACKed and so will be specified. Failed data MUST NOT be DATA_ACKed and so will be
retransmitted on other subflows (Section 3.3.6). retransmitted on other subflows (Section 3.3.6).
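The MP_FAIL layout in the figure above can be sketched as a fixed 12-octet encoding. The sketch assumes TCP option kind 30 and the MP_FAIL subtype value 0x6 from the RFC 6824 subtype list, neither of which is restated in this section. The function name is hypothetical.

```python
import struct

MPTCP_KIND = 30        # TCP option kind assigned to MPTCP
MP_FAIL_LENGTH = 12    # as shown in the figure
MP_FAIL_SUBTYPE = 0x6  # assumption: value taken from RFC 6824


def build_mp_fail(data_sequence_number: int) -> bytes:
    """Encode MP_FAIL with the full 64-bit data sequence number."""
    if not 0 <= data_sequence_number < 2 ** 64:
        raise ValueError("DSN must fit in 64 bits")
    # Subtype in the high nibble, followed by 12 reserved (zero) bits.
    return struct.pack("!BBBBQ", MPTCP_KIND, MP_FAIL_LENGTH,
                       MP_FAIL_SUBTYPE << 4, 0, data_sequence_number)
```

The 8-octet field is always used, even on paths where the DSS option normally carries 32-bit sequence numbers.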
A special case is when there is a single subflow and it fails with a A special case is when there is a single subflow and it fails with a
checksum error. If it is known that all unacknowledged data in checksum error. If it is known that all unacknowledged data in
flight is contiguous (which will usually be the case with a single flight is contiguous (which will usually be the case with a single
subflow), an infinite mapping can be applied to the subflow without subflow), an infinite mapping can be applied to the subflow without
the need to close it first, and essentially turn off all further the need to close it first, and essentially turn off all further
skipping to change at page 45, line 33 skipping to change at page 47, line 28
otherwise, the receiver would not know how to reorder the data. In otherwise, the receiver would not know how to reorder the data. In
practice, this means that all MPTCP subflows will have to be practice, this means that all MPTCP subflows will have to be
terminated except one. Once MPTCP falls back to regular TCP, it MUST terminated except one. Once MPTCP falls back to regular TCP, it MUST
NOT revert to MPTCP later in the connection. NOT revert to MPTCP later in the connection.
It should be emphasized that we are not attempting to prevent the use It should be emphasized that we are not attempting to prevent the use
of middleboxes that want to adjust the payload. An MPTCP-aware of middleboxes that want to adjust the payload. An MPTCP-aware
middlebox could provide such functionality by also rewriting middlebox could provide such functionality by also rewriting
checksums. checksums.
3.7. Error Handling 3.8. Error Handling
In addition to the fallback mechanism as described above, the In addition to the fallback mechanism as described above, the
standard classes of TCP errors may need to be handled in an MPTCP- standard classes of TCP errors may need to be handled in an MPTCP-
specific way. Note that changing semantics -- such as the relevance specific way. Note that changing semantics -- such as the relevance
of a RST -- are covered in Section 4. Where possible, we do not want of a RST -- are covered in Section 4. Where possible, we do not want
to deviate from regular TCP behavior. to deviate from regular TCP behavior.
The following list covers possible errors and the appropriate MPTCP The following list covers possible errors and the appropriate MPTCP
behavior: behavior:
o Unknown token in MP_JOIN (or HMAC failure in MP_JOIN ACK, or o Unknown token in MP_JOIN (or HMAC failure in MP_JOIN ACK, or
missing MP_JOIN in SYN/ACK response): send RST (analogous to TCP's missing MP_JOIN in SYN/ACK response): send RST (analogous to TCP's
behavior on an unknown port) behavior on an unknown port)
o DSN out of window (during normal operation): drop the data, do not o DSN out of window (during normal operation): drop the data, do not
send Data ACKs send Data ACKs
o Remove request for unknown address ID: silently ignore o Remove request for unknown address ID: silently ignore
3.8. Heuristics 3.9. Heuristics
There are a number of heuristics that are needed for performance or There are a number of heuristics that are needed for performance or
deployment but that are not required for protocol correctness. In deployment but that are not required for protocol correctness. In
this section, we detail such heuristics. Note that discussion of this section, we detail such heuristics. Note that discussion of
buffering and certain sender and receiver window behaviors are buffering and certain sender and receiver window behaviors are
presented in Sections 3.3.4 and 3.3.5, as well as retransmission in presented in Sections 3.3.4 and 3.3.5, as well as retransmission in
Section 3.3.6. Section 3.3.6.
3.8.1. Port Usage 3.9.1. Port Usage
Under typical operation, an MPTCP implementation SHOULD use the same Under typical operation, an MPTCP implementation SHOULD use the same
ports as already in use. In other words, the destination port of a ports as already in use. In other words, the destination port of a
SYN containing an MP_JOIN option SHOULD be the same as the remote SYN containing an MP_JOIN option SHOULD be the same as the remote
port of the first subflow in the connection. The local port for such port of the first subflow in the connection. The local port for such
SYNs SHOULD also be the same as for the first subflow (and as such, SYNs SHOULD also be the same as for the first subflow (and as such,
an implementation SHOULD reserve ephemeral ports across all local IP an implementation SHOULD reserve ephemeral ports across all local IP
addresses), although there may be cases where this is infeasible. addresses), although there may be cases where this is infeasible.
This strategy is intended to maximize the probability of the SYN This strategy is intended to maximize the probability of the SYN
being permitted by a firewall or NAT at the recipient and to avoid being permitted by a firewall or NAT at the recipient and to avoid
confusing any network monitoring software. confusing any network monitoring software.
There may also be cases, however, where the passive opener wishes to There may also be cases, however, where the passive opener wishes to
signal to the other host that a specific port should be used, and signal to the other host that a specific port should be used, and
this facility is provided in the Add Address option as documented in this facility is provided in the Add Address option as documented in
Section 3.4.1. It is therefore feasible to allow multiple subflows Section 3.4.1. It is therefore feasible to allow multiple subflows
between the same two addresses but using different port pairs, and between the same two addresses but using different port pairs, and
such a facility could be used to allow load balancing within the such a facility could be used to allow load balancing within the
network based on 5-tuples (e.g., some ECMP implementations [8]). network based on 5-tuples (e.g., some ECMP implementations [8]).
3.8.2. Delayed Subflow Start 3.9.2. Delayed Subflow Start and Subflow Symmetry
Many TCP connections are short-lived and consist only of a few Many TCP connections are short-lived and consist only of a few
segments, and so the overheads of using MPTCP outweigh any benefits. segments, and so the overheads of using MPTCP outweigh any benefits.
A heuristic is required, therefore, to decide when to start using A heuristic is required, therefore, to decide when to start using
additional subflows in an MPTCP connection. We expect that additional subflows in an MPTCP connection. We expect that
experience gathered from deployments will provide further guidance on experience gathered from deployments will provide further guidance on
this, and will be affected by particular application characteristics this, and will be affected by particular application characteristics
(which are likely to change over time). However, a suggested (which are likely to change over time). However, a suggested
general-purpose heuristic that an implementation MAY choose to employ general-purpose heuristic that an implementation MAY choose to employ
is as follows. Results from experimental deployments are needed in is as follows. Results from experimental deployments are needed in
skipping to change at page 47, line 20 skipping to change at page 49, line 20
the host that is multihomed may well be the client that will never the host that is multihomed may well be the client that will never
fill its buffers, and thus never use MPTCP. Advanced APIs that allow fill its buffers, and thus never use MPTCP. Advanced APIs that allow
an application to signal its traffic requirements would aid in these an application to signal its traffic requirements would aid in these
decisions. decisions.
An additional time-based heuristic could be applied, opening An additional time-based heuristic could be applied, opening
additional subflows after a given period of time has passed. This additional subflows after a given period of time has passed. This
would alleviate the above issue, and also provide resilience for low- would alleviate the above issue, and also provide resilience for low-
bandwidth but long-lived applications. bandwidth but long-lived applications.
If the two communicating hosts immediately try to set up subflows
from all available addresses to all available addresses on the other
host, this could end up creating two subflows per path. This is an
inefficient use of resources.
If the same ports are used on all subflows, as recommended above,
then standard TCP simultaneous open logic should take care of this
situation and only one subflow will be established between the
address pairs. However, this relies on the same ports being used at
both end hosts. If a host does not support TCP simultaneous open, it
is RECOMMENDED that some element of randomization is applied to the
time waited before opening new subflows, so that only one subflow
exists between a given address pair. If, however, hosts signal
additional ports to use (for example, for leveraging ECMP on-path),
this heuristic need not apply.
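The randomization suggested above could look like the following sketch. The function name and both parameters are hypothetical, and the uniform distribution is only one possible choice; no particular distribution or range is specified here.

```python
import random
from typing import Optional


def subflow_open_delay(base_delay: float, max_jitter: float,
                       rng: Optional[random.Random] = None) -> float:
    """Return how long to wait, in seconds, before opening a new subflow.

    A random component is added so that two hosts are unlikely to open
    a subflow between the same address pair at the same moment.
    """
    if rng is None:
        rng = random.Random()
    return base_delay + rng.uniform(0.0, max_jitter)
```

A host that signals additional ports for its peer to use would not need this delay, since several subflows between one address pair are then intended.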
This section has shown some of the considerations that an implementer This section has shown some of the considerations that an implementer
should give when developing MPTCP heuristics, but is not intended to should give when developing MPTCP heuristics, but is not intended to
be prescriptive. be prescriptive.
3.8.3. Failure Handling 3.9.3. Failure Handling
Requirements for MPTCP's handling of unexpected signals have been Requirements for MPTCP's handling of unexpected signals have been
given in Section 3.7. There are other failure cases, however, where given in Section 3.8. There are other failure cases, however, where
a hosts can choose appropriate behavior. a hosts can choose appropriate behavior.
For example, Section 3.1 suggests that a host SHOULD fall back to For example, Section 3.1 suggests that a host SHOULD fall back to
trying regular TCP SYNs after one or more failures of MPTCP SYNs for trying regular TCP SYNs after one or more failures of MPTCP SYNs for
a connection. A host may keep a system-wide cache of such a connection. A host may keep a system-wide cache of such
information, so that it can back off from using MPTCP, firstly for information, so that it can back off from using MPTCP, firstly for
that particular destination host, and eventually on a whole that particular destination host, and eventually on a whole
interface, if MPTCP connections continue failing. interface, if MPTCP connections continue failing.
Another failure could occur when the MP_JOIN handshake fails. Another failure could occur when the MP_JOIN handshake fails.
Section 3.7 specifies that an incorrect handshake MUST lead to the Section 3.8 specifies that an incorrect handshake MUST lead to the
subflow being closed with a RST. A host operating an active subflow being closed with a RST. A host operating an active
intrusion detection system may choose to start blocking MP_JOIN intrusion detection system may choose to start blocking MP_JOIN
packets from the source host if multiple failed MP_JOIN attempts are packets from the source host if multiple failed MP_JOIN attempts are
seen. From the connection initiator's point of view, if an MP_JOIN seen. From the connection initiator's point of view, if an MP_JOIN
fails, it SHOULD NOT attempt to connect to the same IP address and fails, it SHOULD NOT attempt to connect to the same IP address and
port during the lifetime of the connection, unless the other host port during the lifetime of the connection, unless the other host
refreshes the information with another ADD_ADDR2 option. Note that refreshes the information with another ADD_ADDR2 option. Note that
the ADD_ADDR2 option is informational only, and does not guarantee the ADD_ADDR2 option is informational only, and does not guarantee
the other host will attempt a connection. the other host will attempt a connection.
skipping to change at page 49, line 24 skipping to change at page 51, line 41
per-connection local policy. Adding an address to one connection per-connection local policy. Adding an address to one connection
(either explicitly through an Add Address message, or implicitly (either explicitly through an Add Address message, or implicitly
through a Join) has no implication for other connections between through a Join) has no implication for other connections between
the same pair of hosts. the same pair of hosts.
5-tuple: The 5-tuple (protocol, local address, local port, remote 5-tuple: The 5-tuple (protocol, local address, local port, remote
address, remote port) presented by kernel APIs to the application address, remote port) presented by kernel APIs to the application
layer in a non-multipath-aware application is that of the first layer in a non-multipath-aware application is that of the first
subflow, even if the subflow has since been closed and removed subflow, even if the subflow has since been closed and removed
from the connection. This decision, and other related API issues, from the connection. This decision, and other related API issues,
are discussed in more detail in [6]. are discussed in more detail in [7].
5. Security Considerations 5. Security Considerations
As identified in [10], the addition of multipath capability to TCP As identified in [10], the addition of multipath capability to TCP
will bring with it a number of new classes of threat. In order to will bring with it a number of new classes of threat. In order to
prevent these, [2] presents a set of requirements for a security prevent these, [2] presents a set of requirements for a security
solution for MPTCP. The fundamental goal is for the security of solution for MPTCP. The fundamental goal is for the security of
MPTCP to be "no worse" than regular TCP today, and the key security MPTCP to be "no worse" than regular TCP today, and the key security
requirements are: requirements are:
skipping to change at page 50, line 39 skipping to change at page 53, line 9
denial-of-service attacks consuming resources. denial-of-service attacks consuming resources.
As discussed in Section 3.4.1, a host may advertise its private As discussed in Section 3.4.1, a host may advertise its private
addresses, but these might point to different hosts in the receiver's addresses, but these might point to different hosts in the receiver's
network. The MP_JOIN handshake (Section 3.2) will ensure that this network. The MP_JOIN handshake (Section 3.2) will ensure that this
does not succeed in setting up a subflow to the incorrect host. does not succeed in setting up a subflow to the incorrect host.
However, it could still create unwanted TCP handshake traffic. This However, it could still create unwanted TCP handshake traffic. This
feature of MPTCP could be a target for denial-of-service exploits, feature of MPTCP could be a target for denial-of-service exploits,
with malicious participants in MPTCP connections encouraging the with malicious participants in MPTCP connections encouraging the
recipient to target other hosts in the network. Therefore, recipient to target other hosts in the network. Therefore,
implementations should consider heuristics (Section 3.8) at both the implementations should consider heuristics (Section 3.9) at both the
sender and receiver to reduce the impact of this. sender and receiver to reduce the impact of this.
A small security risk could theoretically exist with key reuse, but A small security risk could theoretically exist with key reuse, but
in order to accomplish a replay attack, both the sender and receiver in order to accomplish a replay attack, both the sender and receiver
keys, and the sender and receiver random numbers, in the MP_JOIN keys, and the sender and receiver random numbers, in the MP_JOIN
handshake (Section 3.2) would have to match. handshake (Section 3.2) would have to match.
Whilst this specification defines a "medium" security solution, Whilst this specification defines a "medium" security solution,
meeting the criteria specified at the start of this section and the meeting the criteria specified at the start of this section and the
threat analysis ([10]), since attacks only ever get worse, it is threat analysis ([10]), since attacks only ever get worse, it is
skipping to change at page 52, line 21 skipping to change at page 54, line 41
presence of the SYN flag. presence of the SYN flag.
MPTCP SYN packets on the first subflow of a connection contain the MPTCP SYN packets on the first subflow of a connection contain the
MP_CAPABLE option (Section 3.1). If this is dropped, MPTCP SHOULD MP_CAPABLE option (Section 3.1). If this is dropped, MPTCP SHOULD
fall back to regular TCP. If packets with the MP_JOIN option fall back to regular TCP. If packets with the MP_JOIN option
(Section 3.2) are dropped, the paths will simply not be used. (Section 3.2) are dropped, the paths will simply not be used.
If a middlebox strips options but otherwise passes the packets If a middlebox strips options but otherwise passes the packets
unchanged, MPTCP will behave safely. If an MP_CAPABLE option is unchanged, MPTCP will behave safely. If an MP_CAPABLE option is
dropped on either the outgoing or the return path, the initiating dropped on either the outgoing or the return path, the initiating
host can fall back to regular TCP, as illustrated in Figure 16 and host can fall back to regular TCP, as illustrated in Figure 17 and
discussed in Section 3.1. discussed in Section 3.1.
Subflow SYNs contain the MP_JOIN option. If this option is stripped Subflow SYNs contain the MP_JOIN option. If this option is stripped
on the outgoing path, the SYN will appear to be a regular SYN to Host on the outgoing path, the SYN will appear to be a regular SYN to Host
B. Depending on whether there is a listening socket on the target B. Depending on whether there is a listening socket on the target
port, Host B will reply either with SYN/ACK or RST (subflow port, Host B will reply either with SYN/ACK or RST (subflow
connection fails). When Host A receives the SYN/ACK it sends a RST connection fails). When Host A receives the SYN/ACK it sends a RST
because the SYN/ACK does not contain the MP_JOIN option and its because the SYN/ACK does not contain the MP_JOIN option and its
token. Either way, the subflow setup fails, but otherwise does not token. Either way, the subflow setup fails, but otherwise does not
affect the MPTCP connection as a whole. affect the MPTCP connection as a whole.
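The initiator's reaction to a SYN/ACK whose MPTCP option has been stripped, as described in the paragraphs above, can be sketched as follows. This is a minimal illustration only: the names SynAckAction and on_syn_ack are hypothetical and are not defined by this specification, and the sketch assumes the caller has already parsed the SYN/ACK and knows whether the MP_CAPABLE or MP_JOIN option was present.

```python
from enum import Enum


class SynAckAction(Enum):
    """Hypothetical outcomes for the initiating host (Host A)."""
    CONTINUE_MPTCP = "continue as MPTCP"
    FALL_BACK_TO_TCP = "fall back to regular TCP"
    RESET_SUBFLOW = "send RST on this subflow only"


def on_syn_ack(is_first_subflow: bool,
               has_mp_capable: bool,
               has_mp_join: bool) -> SynAckAction:
    """Decide what Host A does with a received SYN/ACK.

    Assumption: option parsing has already happened elsewhere.
    """
    if is_first_subflow:
        # MP_CAPABLE stripped on either path: fall back to regular TCP.
        if has_mp_capable:
            return SynAckAction.CONTINUE_MPTCP
        return SynAckAction.FALL_BACK_TO_TCP
    # Additional subflow: a SYN/ACK without MP_JOIN is answered with a RST.
    # Only this subflow fails; the MPTCP connection as a whole is unaffected.
    if has_mp_join:
        return SynAckAction.CONTINUE_MPTCP
    return SynAckAction.RESET_SUBFLOW
```

The two branches are kept separate because the consequences differ: a missing MP_CAPABLE changes the mode of the whole connection, whereas a missing MP_JOIN only removes one candidate path.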
skipping to change at page 52, line 51 skipping to change at page 55, line 23
Host A Host B Host A Host B
| SYN(MP_CAPABLE) | | SYN(MP_CAPABLE) |
|------------------------------------>| |------------------------------------>|
| Middlebox M | | Middlebox M |
| | | | | |
| SYN/ACK |SYN/ACK(MP_CAPABLE)| | SYN/ACK |SYN/ACK(MP_CAPABLE)|
|<----------------|-------------------| |<----------------|-------------------|
b) MP_CAPABLE option stripped on return path b) MP_CAPABLE option stripped on return path
Figure 16: Connection Setup with Middleboxes that Strip Options from Figure 17: Connection Setup with Middleboxes that Strip Options from
Packets Packets
We now examine data flow with MPTCP, assuming the flow is correctly We now examine data flow with MPTCP, assuming the flow is correctly
set up, which implies the options in the SYN packets were allowed set up, which implies the options in the SYN packets were allowed
through by the relevant middleboxes. If options are allowed through through by the relevant middleboxes. If options are allowed through
and there is no resegmentation or coalescing to TCP segments, and there is no resegmentation or coalescing to TCP segments,
Multipath TCP flows can proceed without problems. Multipath TCP flows can proceed without problems.
The case when options get stripped on data packets has been discussed The case when options get stripped on data packets has been discussed
in the Fallback section. If a fraction of options are stripped, in the Fallback section. If a fraction of options are stripped,
behavior is not deterministic. If some data sequence mappings are behavior is not deterministic. If some data sequence mappings are
lost, the connection can continue so long as mappings exist for the lost, the connection can continue so long as mappings exist for the
subflow-level data (e.g., if multiple maps have been sent that subflow-level data (e.g., if multiple maps have been sent that
reinforce each other). If some subflow-level space is left unmapped, reinforce each other). If some subflow-level space is left unmapped,
however, the subflow is treated as broken and is closed, through the however, the subflow is treated as broken and is closed, through the
process described in Section 3.6. MPTCP should survive with a loss process described in Section 3.7. MPTCP should survive with a loss
of some Data ACKs, but performance will degrade as the fraction of of some Data ACKs, but performance will degrade as the fraction of
stripped options increases. We do not expect such cases to appear in stripped options increases. We do not expect such cases to appear in
practice, though: most middleboxes will either strip all options or practice, though: most middleboxes will either strip all options or
let them all through. let them all through.
We end this section with a list of middlebox classes, their behavior, We end this section with a list of middlebox classes, their behavior,
and the elements in the MPTCP design that allow operation through and the elements in the MPTCP design that allow operation through
such middleboxes. Issues surrounding dropping packets with options such middleboxes. Issues surrounding dropping packets with options
or stripping options were discussed above, and are not included here: or stripping options were discussed above, and are not included here:
skipping to change at page 53, line 44 skipping to change at page 56, line 15
the MP_JOIN option, and the handshake mechanism ensures that the MP_JOIN option, and the handshake mechanism ensures that
connection attempts to private addresses [20] do not cause connection attempts to private addresses [20] do not cause
problems. Explicit address removal is undertaken by an Address ID problems. Explicit address removal is undertaken by an Address ID
to allow no knowledge of the source address. to allow no knowledge of the source address.
o Performance Enhancing Proxies (PEPs) [24] might proactively ACK o Performance Enhancing Proxies (PEPs) [24] might proactively ACK
data to increase performance. MPTCP, however, relies on accurate data to increase performance. MPTCP, however, relies on accurate
congestion control signals from the end host, and non-MPTCP-aware congestion control signals from the end host, and non-MPTCP-aware
PEPs will not be able to provide such signals. MPTCP will, PEPs will not be able to provide such signals. MPTCP will,
therefore, fall back to single-path TCP, or close the problematic therefore, fall back to single-path TCP, or close the problematic
subflow (see Section 3.6). subflow (see Section 3.7).
o Traffic Normalizers [25] may not allow holes in sequence numbers, o Traffic Normalizers [25] may not allow holes in sequence numbers,
and may cache packets and retransmit the same data. MPTCP looks and may cache packets and retransmit the same data. MPTCP looks
like standard TCP on the wire, and will not retransmit different like standard TCP on the wire, and will not retransmit different
data on the same subflow sequence number. In the event of a data on the same subflow sequence number. In the event of a
retransmission, the same data will be retransmitted on the retransmission, the same data will be retransmitted on the
original TCP subflow even if it is additionally retransmitted at original TCP subflow even if it is additionally retransmitted at
the connection level on a different subflow. the connection level on a different subflow.
o Firewalls [26] might perform initial sequence number randomization o Firewalls [26] might perform initial sequence number randomization
skipping to change at page 55, line 16 skipping to change at page 57, line 33
The authors gratefully acknowledge significant input into this The authors gratefully acknowledge significant input into this
document from Sebastien Barre, Christoph Paasch, and Andrew McDonald. document from Sebastien Barre, Christoph Paasch, and Andrew McDonald.
The authors also wish to acknowledge reviews and contributions from The authors also wish to acknowledge reviews and contributions from
Iljitsch van Beijnum, Lars Eggert, Marcelo Bagnulo, Robert Hancock, Iljitsch van Beijnum, Lars Eggert, Marcelo Bagnulo, Robert Hancock,
Pasi Sarolahti, Toby Moncaster, Philip Eardley, Sergio Lembo, Pasi Sarolahti, Toby Moncaster, Philip Eardley, Sergio Lembo,
Lawrence Conroy, Yoshifumi Nishida, Bob Briscoe, Stein Gjessing, Lawrence Conroy, Yoshifumi Nishida, Bob Briscoe, Stein Gjessing,
Andrew McGregor, Georg Hampel, Anumita Biswas, Wes Eddy, Alexey Andrew McGregor, Georg Hampel, Anumita Biswas, Wes Eddy, Alexey
Melnikov, Francis Dupont, Adrian Farrel, Barry Leiba, Robert Sparks, Melnikov, Francis Dupont, Adrian Farrel, Barry Leiba, Robert Sparks,
Sean Turner, Stephen Farrell, and Martin Stiemerling. Sean Turner, Stephen Farrell, Martin Stiemerling, and Gregory Detal.
8. IANA Considerations 8. IANA Considerations
This document updates [7] and as such IANA is requested to update the This document updates [5] and as such IANA is requested to update the
TCP option space registry to point to this document for Multipath TCP option space registry to point to this document for Multipath
TCP, as follows: TCP, as follows:
+------+--------+-----------------------+---------------+ +------+--------+-----------------------+---------------+
| Kind | Length | Meaning | Reference | | Kind | Length | Meaning | Reference |
+------+--------+-----------------------+---------------+ +------+--------+-----------------------+---------------+
| 30 | N | Multipath TCP (MPTCP) | This document | | 30 | N | Multipath TCP (MPTCP) | This document |
+------+--------+-----------------------+---------------+ +------+--------+-----------------------+---------------+
Table 1: TCP Option Kind Numbers Table 1: TCP Option Kind Numbers
The 4-bit MPTCP subtype sub-registry ("MPTCP Option Subtypes" under The 4-bit MPTCP subtype sub-registry ("MPTCP Option Subtypes" under
the "Transmission Control Protocol (TCP) Parameters" registry) was the "Transmission Control Protocol (TCP) Parameters" registry) was
defined in [7]. This document defines one additional subtype defined in [5]. This document defines one additional subtype
(ADD_ADDR22) and updates the references to this document for all sub- (ADD_ADDR2) and updates the references to this document for all sub-
types except ADD_ADDR, which is deprecated. The updates are listed types except ADD_ADDR, which is deprecated. The updates are listed
in the following table. in the following table.
+-------+--------------+----------------------------+---------------+ +-------+--------------+----------------------------+---------------+
| Value | Symbol | Name | Reference | | Value | Symbol | Name | Reference |
+-------+--------------+----------------------------+---------------+ +-------+--------------+----------------------------+---------------+
| 0x0 | MP_CAPABLE | Multipath Capable | This | | 0x0 | MP_CAPABLE | Multipath Capable | This |
| | | | document, | | | | | document, |
| | | | Section 3.1 | | | | | Section 3.1 |
| 0x1 | MP_JOIN | Join Connection | This | | 0x1 | MP_JOIN | Join Connection | This |
skipping to change at page 56, line 12 skipping to change at page 58, line 27
| | | ACK and data sequence | document, | | | | ACK and data sequence | document, |
| | | mapping) | Section 3.3 | | | | mapping) | Section 3.3 |
| 0x4 | REMOVE_ADDR | Remove Address | This | | 0x4 | REMOVE_ADDR | Remove Address | This |
| | | | document, | | | | | document, |
| | | | Section 3.4.2 | | | | | Section 3.4.2 |
| 0x5 | MP_PRIO | Change Subflow Priority | This | | 0x5 | MP_PRIO | Change Subflow Priority | This |
| | | | document, | | | | | document, |
| | | | Section 3.3.8 | | | | | Section 3.3.8 |
| 0x6 | MP_FAIL | Fallback | This | | 0x6 | MP_FAIL | Fallback | This |
| | | | document, | | | | | document, |
| | | | Section 3.6 | | | | | Section 3.7 |
| 0x7 | MP_FASTCLOSE | Fast Close | This | | 0x7 | MP_FASTCLOSE | Fast Close | This |
| | | | document, | | | | | document, |
| | | | Section 3.5 | | | | | Section 3.5 |
| 0x8 | ADD_ADDR22 | Add Address | This | | 0x8 | ADD_ADDR2 | Add Address | This |
| | | | document, | | | | | document, |
| | | | Section 3.4.1 | | | | | Section 3.4.1 |
| 0x9 | MP_TCPRST | TCP Reset | This |
| | | | document, |
| | | | Section 3.6 |
+-------+--------------+----------------------------+---------------+ +-------+--------------+----------------------------+---------------+
Table 2: MPTCP Option Subtypes Table 2: MPTCP Option Subtypes
Values 0x9 through 0xe are currently unassigned. The value 0xf is Values 0xa through 0xe are currently unassigned. The value 0xf is
reserved for Private Use within controlled testbeds. reserved for Private Use within controlled testbeds. The value 0x3
was assigned to the deprecated ADD_ADDR option ([5]) and SHOULD be
silently ignored.
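As an illustration of how a receiver might apply the subtype assignments in Table 2, the sketch below follows the revised (-03) column, in which 0x9 is MP_TCPRST and 0x3 is the deprecated ADD_ADDR. It assumes the option layout of Kind (1 octet), Length (1 octet), then the subtype in the upper 4 bits of the third octet; the value 0x2 for DSS is taken from RFC 6824, since that row is elided in the table above. The names MptcpSubtype and classify_option are hypothetical.

```python
from enum import IntEnum
from typing import Optional

MPTCP_OPTION_KIND = 30  # Table 1: TCP option Kind for Multipath TCP


class MptcpSubtype(IntEnum):
    MP_CAPABLE = 0x0
    MP_JOIN = 0x1
    DSS = 0x2           # value from RFC 6824; row elided in the table above
    ADD_ADDR = 0x3      # deprecated; SHOULD be silently ignored
    REMOVE_ADDR = 0x4
    MP_PRIO = 0x5
    MP_FAIL = 0x6
    MP_FASTCLOSE = 0x7
    ADD_ADDR2 = 0x8
    MP_TCPRST = 0x9


def classify_option(option: bytes) -> Optional[MptcpSubtype]:
    """Return the subtype to process, or None if the option is ignored.

    Assumption: 'option' holds one complete TCP option starting at Kind.
    """
    if len(option) < 3 or option[0] != MPTCP_OPTION_KIND:
        return None
    value = option[2] >> 4  # subtype is the upper 4 bits of the third octet
    try:
        subtype = MptcpSubtype(value)
    except ValueError:
        return None  # 0xa-0xe unassigned, 0xf reserved for Private Use
    if subtype is MptcpSubtype.ADD_ADDR:
        return None  # deprecated subtype is silently ignored
    return subtype
```

Returning None for unassigned, private-use, and deprecated values is a design choice of this sketch; an implementation could equally log such options before discarding them.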
IANA has created another sub-registry, "MPTCP Handshake Algorithms" IANA has created another sub-registry, "MPTCP Handshake Algorithms"
under the "Transmission Control Protocol (TCP) Parameters" registry, under the "Transmission Control Protocol (TCP) Parameters" registry,
based on the flags in MP_CAPABLE (Section 3.1). IANA is requested to based on the flags in MP_CAPABLE (Section 3.1). IANA is requested to
update the references of this table to this document, as follows: update the references of this table to this document, as follows:
+----------+-------------------+----------------------------+ +----------+-------------------+----------------------------+
| Flag Bit | Meaning | Reference | | Flag Bit | Meaning | Reference |
+----------+-------------------+----------------------------+ +----------+-------------------+----------------------------+
| A | Checksum required | This document, Section 3.1 | | A | Checksum required | This document, Section 3.1 |
skipping to change at page 56, line 51 skipping to change at page 59, line 25
Note that the meanings of bits C through H can be dependent upon bit Note that the meanings of bits C through H can be dependent upon bit
B, depending on how Extensibility is defined in future B, depending on how Extensibility is defined in future
specifications; see Section 3.1 for more information. specifications; see Section 3.1 for more information.
Future assignments in this registry are also to be defined by Future assignments in this registry are also to be defined by
Standards Action as defined by [27]. Assignments consist of the Standards Action as defined by [27]. Assignments consist of the
value of the flags, a symbolic name for the algorithm, and a value of the flags, a symbolic name for the algorithm, and a
reference to its specification. reference to its specification.
IANA is requested to create a further sub-registry, "MP_TCPRST Reason
Codes" under the "Transmission Control Protocol (TCP) Parameters"
registry, based on the reason code in MP_TCPRST (Section 3.6). The
contents of this sub-registry are to reference this document, as follows:
+------+-----------------------------+----------------------------+
| Code | Meaning | Reference |
+------+-----------------------------+----------------------------+
| 0x00 | Unspecified TCP error | This document, Section 3.6 |
| 0x01 | MPTCP specific error | This document, Section 3.6 |
| 0x02 | Lack of resources | This document, Section 3.6 |
| 0x03 | Administratively prohibited | This document, Section 3.6 |
| 0x04 | Too much outstanding data | This document, Section 3.6 |
| 0x05 | Unacceptable performance | This document, Section 3.6 |
| 0x06 | Middlebox interference | This document, Section 3.6 |
+------+-----------------------------+----------------------------+
Table 4: MPTCP MP_TCPRST Reason Codes
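For diagnostic output, the reason codes in Table 4 might be held in a simple lookup table. The sketch below copies the codes and meanings from the table; the names MP_TCPRST_REASONS and describe_reason, and the wording used for codes outside the table, are hypothetical.

```python
# Reason codes and meanings copied from Table 4.
MP_TCPRST_REASONS = {
    0x00: "Unspecified TCP error",
    0x01: "MPTCP specific error",
    0x02: "Lack of resources",
    0x03: "Administratively prohibited",
    0x04: "Too much outstanding data",
    0x05: "Unacceptable performance",
    0x06: "Middlebox interference",
}


def describe_reason(code: int) -> str:
    """Map an MP_TCPRST reason code to text for logging.

    Codes not listed in Table 4 get a generic label (an assumption of
    this sketch, not behavior defined by the specification).
    """
    return MP_TCPRST_REASONS.get(code, f"Unassigned reason code 0x{code:02x}")
```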
9. References 9. References
9.1. Normative References 9.1. Normative References
[1] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, [1] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
September 1981. September 1981.
[2] Ford, A., Raiciu, C., Handley, M., Barre, S., and J. Iyengar, [2] Ford, A., Raiciu, C., Handley, M., Barre, S., and J. Iyengar,
"Architectural Guidelines for Multipath TCP Development", "Architectural Guidelines for Multipath TCP Development",
RFC 6182, March 2011. RFC 6182, March 2011.
[3] Bradner, S., "Key words for use in RFCs to Indicate Requirement [3] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997. Levels", BCP 14, RFC 2119, March 1997.
[4] National Institute of Standards and Technology, "Secure Hash [4] National Institute of Standards and Technology, "Secure Hash
Standard", Federal Information Processing Standard Standard", Federal Information Processing Standard
(FIPS) 180-3, October 2008, <http://csrc.nist.gov/publications/ (FIPS) 180-3, October 2008, <http://csrc.nist.gov/publications/
fips/fips180-3/fips180-3_final.pdf>. fips/fips180-3/fips180-3_final.pdf>.
9.2. Informative References 9.2. Informative References
[5] Raiciu, C., Handley, M., and D. Wischik, "Coupled Congestion [5] Ford, A., Raiciu, C., Handley, M., and O. Bonaventure, "TCP
Extensions for Multipath Operation with Multiple Addresses",
RFC 6824, January 2013.
[6] Raiciu, C., Handley, M., and D. Wischik, "Coupled Congestion
Control for Multipath Transport Protocols", RFC 6356, Control for Multipath Transport Protocols", RFC 6356,
October 2011. October 2011.
[6] Scharf, M. and A. Ford, "Multipath TCP (MPTCP) Application [7] Scharf, M. and A. Ford, "Multipath TCP (MPTCP) Application
Interface Considerations", RFC 6897, March 2013. Interface Considerations", RFC 6897, March 2013.
[7] Ford, A., Raiciu, C., Handley, M., and O. Bonaventure, "TCP
Extensions for Multipath Operation with Multiple Addresses",
RFC 6824, January 2013.
[8] Hopps, C., "Analysis of an Equal-Cost Multi-Path Algorithm", [8] Hopps, C., "Analysis of an Equal-Cost Multi-Path Algorithm",
RFC 2992, November 2000. RFC 2992, November 2000.
[9] Raiciu, C., Paasch, C., Barre, S., Ford, A., Honda, M., [9] Raiciu, C., Paasch, C., Barre, S., Ford, A., Honda, M.,
Duchene, F., Bonaventure, O., and M. Handley, "How Hard Can It Duchene, F., Bonaventure, O., and M. Handley, "How Hard Can It
Be? Designing and Implementing a Deployable Multipath TCP", Be? Designing and Implementing a Deployable Multipath TCP",
Usenix Symposium on Networked Systems Design and Usenix Symposium on Networked Systems Design and
Implementation 2012, <https://www.usenix.org/conference/nsdi12/ Implementation 2012, <https://www.usenix.org/conference/nsdi12/
how-hard-can-it-be-designing-and-implementing-deployable- how-hard-can-it-be-designing-and-implementing-deployable-
multipath-tcp>. multipath-tcp>.
skipping to change at page 61, line 26 skipping to change at page 64, line 21
Local.Key (64 bits): This is the key sent by the local host on this Local.Key (64 bits): This is the key sent by the local host on this
MPTCP connection. MPTCP connection.
Remote.Token (32 bits): This is the token chosen by the remote host Remote.Token (32 bits): This is the token chosen by the remote host
on this MPTCP connection, generated from the remote key. on this MPTCP connection, generated from the remote key.
Remote.Key (64 bits): This is the key chosen by the remote host on Remote.Key (64 bits): This is the key chosen by the remote host on
this MPTCP connection. this MPTCP connection.
MPTCP.Checksum (flag): This flag is set to true if at least one of MPTCP.Checksum (flag): This flag is set to true if at least one of
the hosts has set the C bit in the MP_CAPABLE options exchanged the hosts has set the A bit in the MP_CAPABLE options exchanged
during connection establishment, and is set to false otherwise. during connection establishment, and is set to false otherwise.
If this flag is set, the checksum must be computed in all DSS If this flag is set, the checksum must be computed in all DSS
options. options.
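The derivation of MPTCP.Checksum can be sketched as below, following the revised (-03) text, which names the A bit ("Checksum required" in the handshake algorithm table). The sketch assumes that bit A is the leftmost bit (0x80) of the MP_CAPABLE flags octet, as laid out in Section 3.1; the names CHECKSUM_REQUIRED_BIT and checksum_in_use are hypothetical.

```python
# Assumption: bit "A" is the most significant bit of the MP_CAPABLE flags octet.
CHECKSUM_REQUIRED_BIT = 0x80


def checksum_in_use(local_flags: int, remote_flags: int) -> bool:
    """Return True if at least one host set the A bit during the handshake.

    When True, the checksum must be computed in all DSS options.
    """
    return bool((local_flags | remote_flags) & CHECKSUM_REQUIRED_BIT)
```

Combining the two flag octets with OR before masking reflects the "at least one of the hosts" wording: either side can require checksums on its own.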
B.1.2. Sending Side B.1.2. Sending Side
SND.UNA (64 bits): This is the data sequence number of the next byte SND.UNA (64 bits): This is the data sequence number of the next byte
to be acknowledged, at the MPTCP connection level. This variable to be acknowledged, at the MPTCP connection level. This variable
is updated upon reception of a DSS option containing a DATA_ACK. is updated upon reception of a DSS option containing a DATA_ACK.
skipping to change at page 62, line 49 skipping to change at page 65, line 44
is expected on the subflow. This state variable is modified upon is expected on the subflow. This state variable is modified upon
reception of in-order segments. The value of RCV.NXT is copied to reception of in-order segments. The value of RCV.NXT is copied to
the SEG.ACK field of the next segments transmitted on the subflow. the SEG.ACK field of the next segments transmitted on the subflow.
RCV.WND (32 bits with RFC 1323, 16 bits otherwise): This is the RCV.WND (32 bits with RFC 1323, 16 bits otherwise): This is the
subflow-level receive window that is updated with the window field subflow-level receive window that is updated with the window field
from the segments received on this subflow. from the segments received on this subflow.
Appendix C. Finite State Machine Appendix C. Finite State Machine
The diagram in Figure 17 shows the Finite State Machine for The diagram in Figure 18 shows the Finite State Machine for
connection-level closure. This illustrates how the DATA_FIN connection-level closure. This illustrates how the DATA_FIN
connection-level signal (indicated as the DFIN flag on a DATA_ACK) connection-level signal (indicated as the DFIN flag on a DATA_ACK)
interacts with subflow-level FINs, and permits "break-before-make" interacts with subflow-level FINs, and permits "break-before-make"
handover between subflows. handover between subflows.
+---------+ +---------+
| M_ESTAB | | M_ESTAB |
+---------+ +---------+
M_CLOSE | | rcv DATA_FIN M_CLOSE | | rcv DATA_FIN
------- | | ------- ------- | | -------
skipping to change at page 63, line 34 skipping to change at page 66, line 32
| rcv DATA_FIN -------------- | -------------- | | rcv DATA_FIN -------------- | -------------- |
| ------- CLOSE all subflows | CLOSE all subflows | | ------- CLOSE all subflows | CLOSE all subflows |
| snd DATA_ACK[DFIN] V delete MPTCP PCB V | snd DATA_ACK[DFIN] V delete MPTCP PCB V
\ +-----------+ +---------+ \ +-----------+ +---------+
------------------------>|M_TIME WAIT|----------------->| M_CLOSED| ------------------------>|M_TIME WAIT|----------------->| M_CLOSED|
+-----------+ +---------+ +-----------+ +---------+
All subflows in CLOSED All subflows in CLOSED
------------ ------------
delete MPTCP PCB delete MPTCP PCB
Figure 17: Finite State Machine for Connection Closure Figure 18: Finite State Machine for Connection Closure
Authors' Addresses Authors' Addresses
Alan Ford Alan Ford
Pexip Pexip
EMail: alan.ford@gmail.com EMail: alan.ford@gmail.com
Costin Raiciu Costin Raiciu
University Politehnica of Bucharest University Politehnica of Bucharest