draft-ietf-mptcp-rfc6824bis-13.txt   draft-ietf-mptcp-rfc6824bis-14.txt 
Internet Engineering Task Force A. Ford Internet Engineering Task Force A. Ford
Internet-Draft Pexip Internet-Draft Pexip
Obsoletes: 6824 (if approved) C. Raiciu Obsoletes: 6824 (if approved) C. Raiciu
Intended status: Standards Track U. Politechnica of Bucharest Intended status: Standards Track U. Politechnica of Bucharest
Expires: August 21, 2019 M. Handley Expires: November 4, 2019 M. Handley
U. College London U. College London
O. Bonaventure O. Bonaventure
U. catholique de Louvain U. catholique de Louvain
C. Paasch C. Paasch
Apple, Inc. Apple, Inc.
February 17, 2019 May 3, 2019
TCP Extensions for Multipath Operation with Multiple Addresses TCP Extensions for Multipath Operation with Multiple Addresses
draft-ietf-mptcp-rfc6824bis-13 draft-ietf-mptcp-rfc6824bis-14
Abstract Abstract
TCP/IP communication is currently restricted to a single path per TCP/IP communication is currently restricted to a single path per
connection, yet multiple paths often exist between peers. The connection, yet multiple paths often exist between peers. The
simultaneous use of these multiple paths for a TCP/IP session would simultaneous use of these multiple paths for a TCP/IP session would
improve resource usage within the network and, thus, improve user improve resource usage within the network and, thus, improve user
experience through higher throughput and improved resilience to experience through higher throughput and improved resilience to
network failure. network failure.
skipping to change at page 2, line 7 skipping to change at page 2, line 7
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 21, 2019. This Internet-Draft will expire on November 4, 2019.
Copyright Notice Copyright Notice
Copyright (c) 2019 IETF Trust and the persons identified as the Copyright (c) 2019 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 3, line 5 skipping to change at page 3, line 5
3.2. Starting a New Subflow . . . . . . . . . . . . . . . . . 23 3.2. Starting a New Subflow . . . . . . . . . . . . . . . . . 23
3.3. General MPTCP Operation . . . . . . . . . . . . . . . . . 28 3.3. General MPTCP Operation . . . . . . . . . . . . . . . . . 28
3.3.1. Data Sequence Mapping . . . . . . . . . . . . . . . . 30 3.3.1. Data Sequence Mapping . . . . . . . . . . . . . . . . 30
3.3.2. Data Acknowledgments . . . . . . . . . . . . . . . . 33 3.3.2. Data Acknowledgments . . . . . . . . . . . . . . . . 33
3.3.3. Closing a Connection . . . . . . . . . . . . . . . . 34 3.3.3. Closing a Connection . . . . . . . . . . . . . . . . 34
3.3.4. Receiver Considerations . . . . . . . . . . . . . . . 35 3.3.4. Receiver Considerations . . . . . . . . . . . . . . . 35
3.3.5. Sender Considerations . . . . . . . . . . . . . . . . 37 3.3.5. Sender Considerations . . . . . . . . . . . . . . . . 37
3.3.6. Reliability and Retransmissions . . . . . . . . . . . 37 3.3.6. Reliability and Retransmissions . . . . . . . . . . . 37
3.3.7. Congestion Control Considerations . . . . . . . . . . 39 3.3.7. Congestion Control Considerations . . . . . . . . . . 39
3.3.8. Subflow Policy . . . . . . . . . . . . . . . . . . . 39 3.3.8. Subflow Policy . . . . . . . . . . . . . . . . . . . 39
3.4. Address Knowledge Exchange (Path Management) . . . . . . 41 3.4. Address Knowledge Exchange (Path Management) . . . . . . 40
3.4.1. Address Advertisement . . . . . . . . . . . . . . . . 42 3.4.1. Address Advertisement . . . . . . . . . . . . . . . . 42
3.4.2. Remove Address . . . . . . . . . . . . . . . . . . . 45 3.4.2. Remove Address . . . . . . . . . . . . . . . . . . . 45
3.5. Fast Close . . . . . . . . . . . . . . . . . . . . . . . 46 3.5. Fast Close . . . . . . . . . . . . . . . . . . . . . . . 46
3.6. Subflow Reset . . . . . . . . . . . . . . . . . . . . . . 48 3.6. Subflow Reset . . . . . . . . . . . . . . . . . . . . . . 47
3.7. Fallback . . . . . . . . . . . . . . . . . . . . . . . . 49 3.7. Fallback . . . . . . . . . . . . . . . . . . . . . . . . 49
3.8. Error Handling . . . . . . . . . . . . . . . . . . . . . 53 3.8. Error Handling . . . . . . . . . . . . . . . . . . . . . 53
3.9. Heuristics . . . . . . . . . . . . . . . . . . . . . . . 53 3.9. Heuristics . . . . . . . . . . . . . . . . . . . . . . . 53
3.9.1. Port Usage . . . . . . . . . . . . . . . . . . . . . 54 3.9.1. Port Usage . . . . . . . . . . . . . . . . . . . . . 53
3.9.2. Delayed Subflow Start and Subflow Symmetry . . . . . 54 3.9.2. Delayed Subflow Start and Subflow Symmetry . . . . . 54
3.9.3. Failure Handling . . . . . . . . . . . . . . . . . . 55 3.9.3. Failure Handling . . . . . . . . . . . . . . . . . . 55
4. Semantic Issues . . . . . . . . . . . . . . . . . . . . . . . 56 4. Semantic Issues . . . . . . . . . . . . . . . . . . . . . . . 55
5. Security Considerations . . . . . . . . . . . . . . . . . . . 57 5. Security Considerations . . . . . . . . . . . . . . . . . . . 57
6. Interactions with Middleboxes . . . . . . . . . . . . . . . . 60 6. Interactions with Middleboxes . . . . . . . . . . . . . . . . 60
7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 63 7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 63
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 64 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 63
8.1. MPTCP Option Subtypes . . . . . . . . . . . . . . . . . . 64 8.1. MPTCP Option Subtypes . . . . . . . . . . . . . . . . . . 64
8.2. MPTCP Handshake Algorithms . . . . . . . . . . . . . . . 65 8.2. MPTCP Handshake Algorithms . . . . . . . . . . . . . . . 65
8.3. MP_TCPRST Reason Codes . . . . . . . . . . . . . . . . . 66 8.3. MP_TCPRST Reason Codes . . . . . . . . . . . . . . . . . 66
9. References . . . . . . . . . . . . . . . . . . . . . . . . . 67 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 66
9.1. Normative References . . . . . . . . . . . . . . . . . . 67 9.1. Normative References . . . . . . . . . . . . . . . . . . 66
9.2. Informative References . . . . . . . . . . . . . . . . . 67 9.2. Informative References . . . . . . . . . . . . . . . . . 67
Appendix A. Notes on Use of TCP Options . . . . . . . . . . . . 71 Appendix A. Notes on Use of TCP Options . . . . . . . . . . . . 70
Appendix B. TCP Fast Open and MPTCP . . . . . . . . . . . . . . 72 Appendix B. TCP Fast Open and MPTCP . . . . . . . . . . . . . . 71
B.1. TFO cookie request with MPTCP . . . . . . . . . . . . . . 72 B.1. TFO cookie request with MPTCP . . . . . . . . . . . . . . 71
B.2. Data sequence mapping under TFO . . . . . . . . . . . . . 73 B.2. Data sequence mapping under TFO . . . . . . . . . . . . . 72
B.3. Connection establishment examples . . . . . . . . . . . . 74 B.3. Connection establishment examples . . . . . . . . . . . . 73
Appendix C. Control Blocks . . . . . . . . . . . . . . . . . . . 76 Appendix C. Control Blocks . . . . . . . . . . . . . . . . . . . 75
C.1. MPTCP Control Block . . . . . . . . . . . . . . . . . . . 76 C.1. MPTCP Control Block . . . . . . . . . . . . . . . . . . . 75
C.1.1. Authentication and Metadata . . . . . . . . . . . . . 76 C.1.1. Authentication and Metadata . . . . . . . . . . . . . 75
C.1.2. Sending Side . . . . . . . . . . . . . . . . . . . . 77 C.1.2. Sending Side . . . . . . . . . . . . . . . . . . . . 76
C.1.3. Receiving Side . . . . . . . . . . . . . . . . . . . 77 C.1.3. Receiving Side . . . . . . . . . . . . . . . . . . . 76
C.2. TCP Control Blocks . . . . . . . . . . . . . . . . . . . 77 C.2. TCP Control Blocks . . . . . . . . . . . . . . . . . . . 76
C.2.1. Sending Side . . . . . . . . . . . . . . . . . . . . 78 C.2.1. Sending Side . . . . . . . . . . . . . . . . . . . . 77
C.2.2. Receiving Side . . . . . . . . . . . . . . . . . . . 78 C.2.2. Receiving Side . . . . . . . . . . . . . . . . . . . 77
Appendix D. Finite State Machine . . . . . . . . . . . . . . . . 78 Appendix D. Finite State Machine . . . . . . . . . . . . . . . . 77
Appendix E. Changes from RFC6824 . . . . . . . . . . . . . . . . 79 Appendix E. Changes from RFC6824 . . . . . . . . . . . . . . . . 78
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 81 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 80
1. Introduction 1. Introduction
Multipath TCP (MPTCP) is a set of extensions to regular TCP [RFC0793] Multipath TCP (MPTCP) is a set of extensions to regular TCP [RFC0793]
to provide a Multipath TCP [RFC6182] service, which enables a to provide a Multipath TCP [RFC6182] service, which enables a
transport connection to operate across multiple paths simultaneously. transport connection to operate across multiple paths simultaneously.
This document presents the protocol changes required to add multipath This document presents the protocol changes required to add multipath
capability to TCP; specifically, those for signaling and setting up capability to TCP; specifically, those for signaling and setting up
multiple paths ("subflows"), managing these subflows, reassembly of multiple paths ("subflows"), managing these subflows, reassembly of
data, and termination of sessions. This is not the only information data, and termination of sessions. This is not the only information
skipping to change at page 4, line 24 skipping to change at page 4, line 24
o Congestion control [RFC6356] presents a safe congestion control o Congestion control [RFC6356] presents a safe congestion control
algorithm for coupling the behavior of the multiple paths in order algorithm for coupling the behavior of the multiple paths in order
to "do no harm" to other network users. to "do no harm" to other network users.
o Application considerations [RFC6897] discusses what impact MPTCP o Application considerations [RFC6897] discusses what impact MPTCP
will have on applications, what applications will want to do with will have on applications, what applications will want to do with
MPTCP, and as a consequence of these factors, what API extensions MPTCP, and as a consequence of these factors, what API extensions
an MPTCP implementation should present. an MPTCP implementation should present.
This document is an update to, and obsoletes, the v0 specification of This document is an update to, and obsoletes, the v0 specification of
Multipath TCP [RFC6824]. This document specifies MPTCP v1, which is Multipath TCP (RFC6824). This document specifies MPTCP v1, which is
not backward compatible with MPTCP v0. This document additionally not backward compatible with MPTCP v0. This document additionally
defines version negotiation procedures for implementations that defines version negotiation procedures for implementations that
support both versions. support both versions.
1.1. Design Assumptions 1.1. Design Assumptions
In order to limit the potentially huge design space, the working In order to limit the potentially huge design space, the mptcp
group imposed two key constraints on the Multipath TCP design working group imposed two key constraints on the Multipath TCP design
presented in this document: presented in this document:
o It must be backwards-compatible with current, regular TCP, to o It must be backwards-compatible with current, regular TCP, to
increase its chances of deployment. increase its chances of deployment.
o It can be assumed that one or both hosts are multihomed and o It can be assumed that one or both hosts are multihomed and
multiaddressed. multiaddressed.
To simplify the design, we assume that the presence of multiple To simplify the design, we assume that the presence of multiple
addresses at a host is sufficient to indicate the existence of addresses at a host is sufficient to indicate the existence of
skipping to change at page 7, line 34 skipping to change at page 7, line 34
with the existing session, which continues to appear as a single with the existing session, which continues to appear as a single
connection to the applications at both ends. The creation of the connection to the applications at both ends. The creation of the
additional TCP session is illustrated between Address A2 on Host A additional TCP session is illustrated between Address A2 on Host A
and Address B1 on Host B. and Address B1 on Host B.
o MPTCP identifies multiple paths by the presence of multiple o MPTCP identifies multiple paths by the presence of multiple
addresses at hosts. Combinations of these multiple addresses addresses at hosts. Combinations of these multiple addresses
equate to the additional paths. In the example, other potential equate to the additional paths. In the example, other potential
paths that could be set up are A1<->B2 and A2<->B2. Although this paths that could be set up are A1<->B2 and A2<->B2. Although this
additional session is shown as being initiated from A2, it could additional session is shown as being initiated from A2, it could
equally have been initiated from B1. equally have been initiated from B1 or B2.
o The discovery and setup of additional subflows will be achieved o The discovery and setup of additional subflows will be achieved
through a path management method; this document describes a through a path management method; this document describes a
mechanism by which a host can initiate new subflows by using its mechanism by which a host can initiate new subflows by using its
own additional addresses, or by signaling its available addresses own additional addresses, or by signaling its available addresses
to the other host. to the other host.
o MPTCP adds connection-level sequence numbers to allow the o MPTCP adds connection-level sequence numbers to allow the
reassembly of segments arriving on multiple subflows with reassembly of segments arriving on multiple subflows with
differing network delays. differing network delays.
skipping to change at page 14, line 44 skipping to change at page 14, line 44
o MPTCP falls back to ordinary TCP if MPTCP operation is not o MPTCP falls back to ordinary TCP if MPTCP operation is not
possible, for example, if one host is not MPTCP capable or if a possible, for example, if one host is not MPTCP capable or if a
middlebox alters the payload. This is discussed in Section 3.7. middlebox alters the payload. This is discussed in Section 3.7.
o To address the threats identified in [RFC6181], the following o To address the threats identified in [RFC6181], the following
steps are taken: keys are sent in the clear in the MP_CAPABLE steps are taken: keys are sent in the clear in the MP_CAPABLE
messages; MP_JOIN messages are secured with HMAC-SHA256 messages; MP_JOIN messages are secured with HMAC-SHA256
([RFC2104], [SHS]) using those keys; and standard TCP validity ([RFC2104], [SHS]) using those keys; and standard TCP validity
checks are made on the other messages (ensuring sequence numbers checks are made on the other messages (ensuring sequence numbers
are in-window [RFC5961]). Residual threats to MPTCP v0 [RFC6824] are in-window [RFC5961]). Residual threats to MPTCP v0 were
were identified in [RFC7430], and those affecting the protocol identified in [RFC7430], and those affecting the protocol (i.e.
(i.e. modification to ADD_ADDR) have been incorporated in this modification to ADD_ADDR) have been incorporated in this document.
document. Further discussion of security can be found in Further discussion of security can be found in Section 5.
Section 5.
3. MPTCP Protocol 3. MPTCP Protocol
This section describes the operation of the MPTCP protocol, and is This section describes the operation of the MPTCP protocol, and is
subdivided into sections for each key part of the protocol operation. subdivided into sections for each key part of the protocol operation.
All MPTCP operations are signaled using optional TCP header fields. All MPTCP operations are signaled using optional TCP header fields.
A single TCP option number ("Kind") has been assigned by IANA for A single TCP option number ("Kind") has been assigned by IANA for
MPTCP (see Section 8), and then individual messages will be MPTCP (see Section 8), and then individual messages will be
determined by a "subtype", the values of which are also stored in an determined by a "subtype", the values of which are also stored in an
skipping to change at page 16, line 23 skipping to change at page 16, line 23
3.1. Connection Initiation 3.1. Connection Initiation
Connection initiation begins with a SYN, SYN/ACK, ACK exchange on a Connection initiation begins with a SYN, SYN/ACK, ACK exchange on a
single path. Each packet contains the Multipath Capable (MP_CAPABLE) single path. Each packet contains the Multipath Capable (MP_CAPABLE)
MPTCP option (Figure 4). This option declares its sender is capable MPTCP option (Figure 4). This option declares its sender is capable
of performing Multipath TCP and wishes to do so on this particular of performing Multipath TCP and wishes to do so on this particular
connection. connection.
The MP_CAPABLE exchange in this specification (v1) is different to The MP_CAPABLE exchange in this specification (v1) is different to
that specified in v0 [RFC6824]. If a host supports multiple versions that specified in v0. If a host supports multiple versions of MPTCP,
of MPTCP, the sender of the MP_CAPABLE option SHOULD signal the the sender of the MP_CAPABLE option SHOULD signal the highest version
highest version number it supports. In return, in its MP_CAPABLE number it supports. In return, in its MP_CAPABLE option, the
option, the receiver will signal the version number it wishes to use, receiver will signal the version number it wishes to use, which MUST
which MUST be equal to or lower than the version number indicated in be equal to or lower than the version number indicated in the initial
the initial MP_CAPABLE. There is a caveat though with respect to MP_CAPABLE. There is a caveat though with respect to this version
this version negotiation with old listeners that only support v0. A negotiation with old listeners that only support v0. A listener that
listener that supports v0 expects that the MP_CAPABLE option in the supports v0 expects that the MP_CAPABLE option in the SYN-segment
SYN-segment includes the initiator's key. If the initiator however includes the initiator's key. If the initiator however already
already upgraded to v1, it won't include the key in the SYN-segment. upgraded to v1, it won't include the key in the SYN-segment. Thus,
Thus, the listener will ignore the MP_CAPABLE of this SYN-segment and the listener will ignore the MP_CAPABLE of this SYN-segment and reply
reply with a SYN/ACK that does not include an MP_CAPABLE, thus with a SYN/ACK that does not include an MP_CAPABLE. The initiator
leading to a fallback to regular TCP. An initiator MAY cache this MAY choose to immediately fall back to TCP or MAY choose to attempt a
information about a peer and for future connections, MAY choose to connection using MPTCP v0 (if the initiator supports v0), in order to
attempt using MPTCP v0, if supported, before recording the host as discover whether the listener supports the earlier version of MPTCP.
not supporting MPTCP. In general an MPTCP v0 connection is likely to be preferred to a TCP
one; however, in a particular deployment scenario it may be known that
the listener is unlikely to support MPTCP v0 and so the initiator may
prefer not to attempt a v0 connection. An initiator MAY cache
information for a peer about what version of MPTCP it supports if
any, and use this information for future connection attempts.
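The initiator side of the version negotiation described above can be
sketched as follows. This is a minimal illustration, not part of
either draft: the function name, the returned strings, and the
per-peer cache are hypothetical, and the retry with v0 is only one of
the choices the text permits.

```python
from typing import Dict, Optional

# Hypothetical per-peer cache: peer -> negotiated version (None = plain TCP).
version_cache: Dict[str, Optional[int]] = {}


def next_action(offered: int, synack_version: Optional[int],
                supports_v0: bool, retry_v0: bool = True) -> str:
    """Decide what the initiator does after seeing the SYN/ACK.

    offered        -- version sent in the initial MP_CAPABLE
    synack_version -- version in the SYN/ACK's MP_CAPABLE, or None if
                      the SYN/ACK carried no MP_CAPABLE option
    """
    if synack_version is not None:
        # The selected version must not exceed the offered one.
        if synack_version > offered:
            raise ValueError("peer selected a version higher than offered")
        return "use-mptcp-v%d" % synack_version
    # No MP_CAPABLE in the SYN/ACK: the peer may be a v0-only listener.
    if offered >= 1 and supports_v0 and retry_v0:
        return "retry-with-v0"
    return "fallback-tcp"


def remember(peer: str, version: Optional[int]) -> None:
    """Cache the outcome for future connection attempts (optional)."""
    version_cache[peer] = version
```

Passing retry_v0=False models a deployment in which the listener is
known to be unlikely to support v0.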
The MP_CAPABLE option is variable-length, with different fields The MP_CAPABLE option is variable-length, with different fields
included depending on which packet the option is used on. The full included depending on which packet the option is used on. The full
MP_CAPABLE option is shown in Figure 4. MP_CAPABLE option is shown in Figure 4.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+-------+---------------+ +---------------+---------------+-------+-------+---------------+
| Kind | Length |Subtype|Version|A|B|C|D|E|F|G|H| | Kind | Length |Subtype|Version|A|B|C|D|E|F|G|H|
+---------------+---------------+-------+-------+---------------+ +---------------+---------------+-------+-------+---------------+
skipping to change at page 22, line 6 skipping to change at page 22, line 6
For crypto negotiation, the responder has the choice. The initiator For crypto negotiation, the responder has the choice. The initiator
creates a proposal setting a bit for each algorithm it supports to 1 creates a proposal setting a bit for each algorithm it supports to 1
(in this version of the specification, there is only one proposal, so (in this version of the specification, there is only one proposal, so
bit "H" will be always set to 1). The responder responds with only 1 bit "H" will be always set to 1). The responder responds with only 1
bit set -- this is the chosen algorithm. The rationale for this bit set -- this is the chosen algorithm. The rationale for this
behavior is that the responder will typically be a server with behavior is that the responder will typically be a server with
potentially many thousands of connections, so it may wish to choose potentially many thousands of connections, so it may wish to choose
an algorithm with minimal computational complexity, depending on the an algorithm with minimal computational complexity, depending on the
load. If a responder does not support (or does not want to support) load. If a responder does not support (or does not want to support)
any of the initiator's proposals, it can respond without an any of the initiator's proposals, it MUST respond without an
MP_CAPABLE option, thus forcing a fallback to regular TCP. MP_CAPABLE option, thus forcing a fallback to regular TCP.
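A responder's choice among the proposed algorithms might look like the
sketch below. It assumes the flag layout of Figure 4, with bit "H" as
the least significant bit of the flags octet and bits "D" through "H"
reserved for crypto negotiation; the preference list and the names
used are illustrative assumptions, not defined by this document.

```python
from typing import List, Optional

CRYPTO_MASK = 0x1F   # bits D..H of the flags octet (assumed layout)
HMAC_SHA256 = 0x01   # bit H


def choose_algorithm(proposal_flags: int,
                     preference: List[int]) -> Optional[int]:
    """Return the single flag bit the responder selects, or None.

    preference lists the responder's supported algorithm bits in the
    order it would like to use them (a local policy, assumed here).
    None means: reply without MP_CAPABLE, forcing regular TCP.
    """
    proposed = proposal_flags & CRYPTO_MASK
    for bit in preference:
        if proposed & bit:
            return bit
    return None
```

With only one algorithm defined, the preference list contains just
HMAC_SHA256, and the reply either sets bit "H" alone or omits the
option.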
The MP_CAPABLE option is only used in the first subflow of a The MP_CAPABLE option is only used in the first subflow of a
connection, in order to identify the connection; all following connection, in order to identify the connection; all following
subflows will use the "Join" option (see Section 3.2) to join the subflows will use the "Join" option (see Section 3.2) to join the
existing connection. existing connection.
If a SYN contains an MP_CAPABLE option but the SYN/ACK does not, it If a SYN contains an MP_CAPABLE option but the SYN/ACK does not, it
is assumed that the sender of the SYN/ACK is not multipath capable; thus, is assumed that the sender of the SYN/ACK is not multipath capable; thus,
the MPTCP session MUST operate as a regular, single-path TCP. If a the MPTCP session MUST operate as a regular, single-path TCP. If a
skipping to change at page 24, line 34 skipping to change at page 24, line 34
path (B=1) in the event of failure of other paths, or whether it path (B=1) in the event of failure of other paths, or whether it
wants it to be used as part of the connection immediately. By wants it to be used as part of the connection immediately. By
setting B=1, the sender of the option is requesting the other host to setting B=1, the sender of the option is requesting the other host to
only send data on this subflow if there are no available subflows only send data on this subflow if there are no available subflows
where B=0. Subflow policy is discussed in more detail in where B=0. Subflow policy is discussed in more detail in
Section 3.3.8. Section 3.3.8.
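The effect of the backup flag on a sender's choice of subflows can be
illustrated with a short sketch. The Subflow record and its
"available" field are hypothetical; how an implementation decides that
a subflow is available is a local matter.

```python
from typing import List, NamedTuple


class Subflow(NamedTuple):
    name: str
    backup: bool      # B flag requested by the peer for this subflow
    available: bool   # local view: established and usable right now


def eligible_subflows(subflows: List[Subflow]) -> List[Subflow]:
    """Return the subflows on which data may currently be sent.

    Backup subflows (B=1) are used only when no available subflow
    with B=0 exists.
    """
    usable = [s for s in subflows if s.available]
    regular = [s for s in usable if not s.backup]
    return regular if regular else usable
```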
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+-----+-+---------------+ +---------------+---------------+-------+-----+-+---------------+
| Kind | Length = 12 |Subtype| |B| Address ID | | Kind | Length = 12 |Subtype|(rsv)|B| Address ID |
+---------------+---------------+-------+-----+-+---------------+ +---------------+---------------+-------+-----+-+---------------+
| Receiver's Token (32 bits) | | Receiver's Token (32 bits) |
+---------------------------------------------------------------+ +---------------------------------------------------------------+
| Sender's Random Number (32 bits) | | Sender's Random Number (32 bits) |
+---------------------------------------------------------------+ +---------------------------------------------------------------+
Figure 5: Join Connection (MP_JOIN) Option (for Initial SYN) Figure 5: Join Connection (MP_JOIN) Option (for Initial SYN)
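As an illustration of the layout in Figure 5, the following sketch
decodes the 12-octet MP_JOIN option carried on an initial SYN. The
Kind value 30 and the MP_JOIN subtype 0x1 are taken from the IANA
assignments rather than from the text shown here; the type and
function names are hypothetical.

```python
import struct
from typing import NamedTuple

MPTCP_KIND = 30        # TCP option Kind assigned to MPTCP
SUBTYPE_MP_JOIN = 0x1  # MPTCP option subtype for MP_JOIN


class MpJoinSyn(NamedTuple):
    backup: bool
    address_id: int
    receiver_token: int
    sender_random: int


def parse_mp_join_syn(option: bytes) -> MpJoinSyn:
    """Decode an MP_JOIN option as sent on the initial SYN."""
    if len(option) != 12:
        raise ValueError("MP_JOIN on a SYN is exactly 12 octets")
    kind, length, sub_flags, addr_id, token, rand = struct.unpack(
        "!BBBBII", option)
    if kind != MPTCP_KIND or length != 12:
        raise ValueError("not an MPTCP option of length 12")
    if sub_flags >> 4 != SUBTYPE_MP_JOIN:
        raise ValueError("not an MP_JOIN subtype")
    # Low nibble: three reserved bits followed by the B (backup) flag.
    return MpJoinSyn(bool(sub_flags & 0x01), addr_id, token, rand)
```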
When receiving a SYN with an MP_JOIN option that contains a valid When receiving a SYN with an MP_JOIN option that contains a valid
token for an existing MPTCP connection, the recipient SHOULD respond token for an existing MPTCP connection, the recipient SHOULD respond
skipping to change at page 26, line 8 skipping to change at page 26, line 8
transmitted by Host A, will be Key-A followed by Key-B, and in the transmitted by Host A, will be Key-A followed by Key-B, and in the
case of Host B, Key-B followed by Key-A. These are the keys that case of Host B, Key-B followed by Key-A. These are the keys that
were exchanged in the original MP_CAPABLE handshake. The "message" were exchanged in the original MP_CAPABLE handshake. The "message"
for the HMAC algorithm in each case is the concatenation of the random for the HMAC algorithm in each case is the concatenation of the random
numbers of each host (denoted by R): for Host A, R-A followed by R-B; numbers of each host (denoted by R): for Host A, R-A followed by R-B;
and for Host B, R-B followed by R-A. and for Host B, R-B followed by R-A.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+-----+-+---------------+ +---------------+---------------+-------+-----+-+---------------+
| Kind | Length = 16 |Subtype| |B| Address ID | | Kind | Length = 16 |Subtype|(rsv)|B| Address ID |
+---------------+---------------+-------+-----+-+---------------+ +---------------+---------------+-------+-----+-+---------------+
| | | |
| Sender's Truncated HMAC (64 bits) | | Sender's Truncated HMAC (64 bits) |
| | | |
+---------------------------------------------------------------+ +---------------------------------------------------------------+
| Sender's Random Number (32 bits) | | Sender's Random Number (32 bits) |
+---------------------------------------------------------------+ +---------------------------------------------------------------+
Figure 6: Join Connection (MP_JOIN) Option (for Responding SYN/ACK) Figure 6: Join Connection (MP_JOIN) Option (for Responding SYN/ACK)
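The key and message ordering described above can be made concrete with
a short sketch using HMAC-SHA256. It assumes keys and random numbers
are handled as byte strings and that the truncated value in Figure 6
is the leftmost 64 bits of the HMAC output; the function names are
illustrative only.

```python
import hashlib
import hmac


def mp_join_hmac(local_key: bytes, remote_key: bytes,
                 local_random: bytes, remote_random: bytes) -> bytes:
    """Full HMAC-SHA256 as computed by the sending host.

    Key:     sender's key followed by the peer's key.
    Message: sender's random number followed by the peer's.
    """
    return hmac.new(local_key + remote_key,
                    local_random + remote_random,
                    hashlib.sha256).digest()


def synack_truncated_hmac(key_b: bytes, key_a: bytes,
                          r_b: bytes, r_a: bytes) -> bytes:
    """64-bit value Host B places in the MP_JOIN SYN/ACK."""
    return mp_join_hmac(key_b, key_a, r_b, r_a)[:8]
```

Host A validates the SYN/ACK by recomputing the value with the same
ordering (Key-B, Key-A, R-B, R-A) and comparing, ideally with
hmac.compare_digest.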
skipping to change at page 30, line 43 skipping to change at page 30, line 43
the data sequence number after the mapping has been processed. A the data sequence number after the mapping has been processed. A
sender MUST NOT change this mapping after it has been declared; sender MUST NOT change this mapping after it has been declared;
however, the same data sequence number can be mapped to by different however, the same data sequence number can be mapped to by different
subflows for retransmission purposes (see Section 3.3.6). This would subflows for retransmission purposes (see Section 3.3.6). This would
also permit the same data to be sent simultaneously on multiple also permit the same data to be sent simultaneously on multiple
subflows for resilience or efficiency purposes, especially in the subflows for resilience or efficiency purposes, especially in the
case of lossy links. Although the detailed specification of such case of lossy links. Although the detailed specification of such
operation is outside the scope of this document, an implementation operation is outside the scope of this document, an implementation
SHOULD treat the first data that is received at a subflow for the SHOULD treat the first data that is received at a subflow for the
data sequence space as that which should be delivered to the data sequence space as that which should be delivered to the
application, and any later data for that sequence space should be application, and any later data for that sequence space SHOULD be
ignored. ignored.
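One way a receiver could implement the "first data wins" rule is
sketched below. This is an assumption-laden toy: it tracks data per
octet in a dictionary, ignores sequence number wrap-around, and omits
mapping and checksum validation; the class name is hypothetical.

```python
from typing import Dict


class Reassembler:
    """Connection-level reassembly that keeps the first copy of each octet."""

    def __init__(self, initial_dsn: int) -> None:
        self.next_dsn = initial_dsn          # next octet owed to the application
        self.pending: Dict[int, int] = {}    # data sequence number -> octet

    def receive(self, dsn: int, data: bytes) -> bytes:
        """Accept data from any subflow; return octets now deliverable in order."""
        for offset, octet in enumerate(data):
            pos = dsn + offset
            if pos < self.next_dsn or pos in self.pending:
                continue  # already delivered or already held: ignore later copy
            self.pending[pos] = octet
        out = bytearray()
        while self.next_dsn in self.pending:
            out.append(self.pending.pop(self.next_dsn))
            self.next_dsn += 1
        return bytes(out)
```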
The data sequence number is specified as an absolute value, whereas The data sequence number is specified as an absolute value, whereas
the subflow sequence numbering is relative (the SYN at the start of the subflow sequence numbering is relative (the SYN at the start of
the subflow has relative subflow sequence number 0). This is to the subflow has relative subflow sequence number 0). This is to
allow middleboxes to change the initial sequence number of a subflow, allow middleboxes to change the initial sequence number of a subflow,
such as firewalls that undertake Initial Sequence Number (ISN) such as firewalls that undertake Initial Sequence Number (ISN)
randomization. randomization.
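The benefit of relative subflow sequence numbers can be seen in a
small worked example. The helper below is a sketch under the
assumption of 32-bit modular TCP sequence arithmetic; its name is
hypothetical.

```python
MOD32 = 1 << 32


def relative_ssn(absolute_seq: int, subflow_isn: int) -> int:
    """Subflow sequence number relative to the SYN (which maps to 0)."""
    return (absolute_seq - subflow_isn) % MOD32


# A middlebox that randomizes the ISN shifts every sequence number on
# the subflow by the same offset, so the relative value is unchanged.
isn = 0xF0000000
seq = (isn + 10) % MOD32
delta = 0x90000000
rewritten_isn = (isn + delta) % MOD32
rewritten_seq = (seq + delta) % MOD32
same = relative_ssn(seq, isn) == relative_ssn(rewritten_seq, rewritten_isn)
```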
The data sequence mapping also contains a checksum of the data that The data sequence mapping also contains a checksum of the data that
skipping to change at page 33, line 32 skipping to change at page 33, line 32
standard TCP cumulative ACK -- indicating how much data has been standard TCP cumulative ACK -- indicating how much data has been
successfully received (with no holes). This is in comparison to the successfully received (with no holes). This is in comparison to the
subflow-level ACK, which acts analogous to TCP SACK, given that there subflow-level ACK, which acts analogous to TCP SACK, given that there
may still be holes in the data stream at the connection level. The may still be holes in the data stream at the connection level. The
Data ACK specifies the next data sequence number it expects to Data ACK specifies the next data sequence number it expects to
receive. receive.
The Data ACK, as for the DSN, can be sent as the full 64-bit value, The Data ACK, as for the DSN, can be sent as the full 64-bit value,
or as the lower 32 bits. If data is received with a 64-bit DSN, it or as the lower 32 bits. If data is received with a 64-bit DSN, it
MUST be acknowledged with a 64-bit Data ACK. If the DSN received is MUST be acknowledged with a 64-bit Data ACK. If the DSN received is
32 bits, it is valid for the implementation to choose whether to send 32 bits, an implementation can choose whether to send a 32-bit or
a 32-bit or 64-bit Data ACK. 64-bit Data ACK, and an implementation MUST accept either in this
situation.
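The width rule for the Data ACK can be expressed compactly. The sketch
below encodes only the Data ACK field itself, not the surrounding DSS
option; the function name and the prefer_64 policy knob are
assumptions made for illustration.

```python
import struct


def encode_data_ack(next_expected_dsn: int, received_dsn_bits: int,
                    prefer_64: bool = False) -> bytes:
    """Encode the Data ACK field as 4 or 8 octets in network byte order.

    A 64-bit DSN must be acknowledged with a 64-bit Data ACK; after a
    32-bit DSN the sender of the ACK may use either width.
    """
    if received_dsn_bits not in (32, 64):
        raise ValueError("DSN width must be 32 or 64 bits")
    if received_dsn_bits == 64 or prefer_64:
        return struct.pack("!Q", next_expected_dsn & 0xFFFFFFFFFFFFFFFF)
    # 32-bit form carries the lower 32 bits of the value.
    return struct.pack("!I", next_expected_dsn & 0xFFFFFFFF)
```

A receiver of Data ACKs has to be prepared for both lengths when it
has been sending 32-bit DSNs.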
The Data ACK proves that the data, and all required MPTCP signaling, The Data ACK proves that the data, and all required MPTCP signaling,
has been received and accepted by the remote end. One key use of the has been received and accepted by the remote end. One key use of the
Data ACK signal is that it is used to indicate the left edge of the Data ACK signal is that it is used to indicate the left edge of the
advertised receive window. As explained in Section 3.3.4, the advertised receive window. As explained in Section 3.3.4, the
receive window is shared by all subflows and is relative to the Data receive window is shared by all subflows and is relative to the Data
ACK. Because of this, an implementation MUST NOT use the RCV.WND ACK. Because of this, an implementation MUST NOT use the RCV.WND
field of a TCP segment at the connection level if it does not also field of a TCP segment at the connection level if it does not also
carry a DSS option with a Data ACK field. Furthermore, separating carry a DSS option with a Data ACK field. Furthermore, separating
the connection-level acknowledgments from the subflow level allows the connection-level acknowledgments from the subflow level allows
skipping to change at page 38, line 8 skipping to change at page 38, line 8
The data sequence mapping allows senders to resend data with the same The data sequence mapping allows senders to resend data with the same
data sequence number on a different subflow. When doing this, a host data sequence number on a different subflow. When doing this, a host
MUST still retransmit the original data on the original subflow, in MUST still retransmit the original data on the original subflow, in
order to preserve the subflow integrity (middleboxes could replay old order to preserve the subflow integrity (middleboxes could replay old
data, and/or could reject holes in subflows), and a receiver will data, and/or could reject holes in subflows), and a receiver will
ignore these retransmissions. While this is clearly suboptimal, for ignore these retransmissions. While this is clearly suboptimal, for
compatibility reasons this is sensible behavior. Optimizations could compatibility reasons this is sensible behavior. Optimizations could
be negotiated in future versions of this protocol. Note also that be negotiated in future versions of this protocol. Note also that
this property would also permit a sender to always send the same this property would also permit a sender to always send the same
data, with the same data sequence number, on multiple subflows, if it data, with the same data sequence number, on multiple subflows, if
so desired for reliability reasons. desired for reliability reasons.
This protocol specification does not mandate any mechanisms for This protocol specification does not mandate any mechanisms for
handling retransmissions, and much will be dependent upon local handling retransmissions, and much will be dependent upon local
policy (as discussed in Section 3.3.8). One can imagine aggressive policy (as discussed in Section 3.3.8). One can imagine aggressive
connection-level retransmission policies where every packet lost at connection-level retransmission policies where every packet lost at
subflow level is retransmitted on a different subflow (hence, wasting subflow level is retransmitted on a different subflow (hence, wasting
bandwidth but possibly reducing application-to-application delays), bandwidth but possibly reducing application-to-application delays),
or conservative retransmission policies where connection-level or conservative retransmission policies where connection-level
retransmits are only used after a few subflow-level retransmission retransmits are only used after a few subflow-level retransmission
timeouts occur. timeouts occur.
skipping to change at page 38, line 41 skipping to change at page 38, line 41
which it has been sent. In this way, the sender can always which it has been sent. In this way, the sender can always
retransmit the data if needed, on the same subflow or on a different retransmit the data if needed, on the same subflow or on a different
one. A special case is when a subflow fails: the sender will one. A special case is when a subflow fails: the sender will
typically resend the data on other working subflows after a timeout, typically resend the data on other working subflows after a timeout,
and will keep trying to retransmit the data on the failed subflow and will keep trying to retransmit the data on the failed subflow
too. The sender will declare the subflow failed after a predefined too. The sender will declare the subflow failed after a predefined
upper bound on retransmissions is reached (which MAY be lower than upper bound on retransmissions is reached (which MAY be lower than
the usual TCP limits of the Maximum Segment Life), or on the receipt the usual TCP limits of the Maximum Segment Life), or on the receipt
of an ICMP error, and only then delete the outstanding data segments. of an ICMP error, and only then delete the outstanding data segments.
Multiple retransmissions are triggers that will indicate that a If multiple retransmissions are triggered that indicate that a
subflow performs badly and could lead to a host resetting the subflow subflow performs badly, this MAY lead to a host resetting the subflow
with a RST. However, additional research is required to understand with a RST. However, additional research is required to understand
the heuristics of how and when to reset underperforming subflows. the heuristics of how and when to reset underperforming subflows.
For example, a highly asymmetric path may be misdiagnosed as For example, a highly asymmetric path may be misdiagnosed as
underperforming. A RST for this purpose SHOULD be accompanied with underperforming. A RST for this purpose SHOULD be accompanied with
an "Unacceptable performance" MP_TCPRST option (Section 3.6). an "Unacceptable performance" MP_TCPRST option (Section 3.6).
3.3.7. Congestion Control Considerations 3.3.7. Congestion Control Considerations
Different subflows in an MPTCP connection have different congestion Different subflows in an MPTCP connection have different congestion
windows. To achieve fairness at bottlenecks and resource pooling, it windows. To achieve fairness at bottlenecks and resource pooling, it
skipping to change at page 40, line 7 skipping to change at page 40, line 7
where stability (of delay or bandwidth) is more important than where stability (of delay or bandwidth) is more important than
throughput. Application requirements such as these are discussed in throughput. Application requirements such as these are discussed in
detail in [RFC6897]. detail in [RFC6897].
The ability to make effective choices at the sender requires full The ability to make effective choices at the sender requires full
knowledge of the path "cost", which is unlikely to be the case. It knowledge of the path "cost", which is unlikely to be the case. It
would be desirable for a receiver to be able to signal their own would be desirable for a receiver to be able to signal their own
preferences for paths, since they will often be the multihomed party, preferences for paths, since they will often be the multihomed party,
and may have to pay for metered incoming bandwidth. and may have to pay for metered incoming bandwidth.
Whilst fine-grained control may be the most powerful solution, that To enable this, the MP_JOIN option (see Section 3.2) contains the 'B'
would require some mechanism such as overloading the Explicit bit, which allows a host to indicate to its peer that this path
Congestion Notification (ECN) signal [RFC3168], which is undesirable, should be treated as a backup path to use only in the event of
and it is felt that there would not be sufficient benefit to justify failure of other working subflows (i.e., a subflow where the receiver
an entirely new signal. Therefore, the MP_JOIN option (see has indicated B=1 SHOULD NOT be used to send data unless there are no
Section 3.2) contains the 'B' bit, which allows a host to indicate to usable subflows where B=0).
its peer that this path should be treated as a backup path to use
only in the event of failure of other working subflows (i.e., a
subflow where the receiver has indicated B=1 SHOULD NOT be used to
send data unless there are no usable subflows where B=0).
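The backup-path rule can be sketched as a filter over the available subflows. This is a minimal illustration under stated assumptions: the Subflow record and its fields are hypothetical, and "usable" stands in for whatever local test an implementation applies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subflow:
    name: str
    usable: bool   # local view: the subflow is currently able to carry data
    backup: bool   # True when the peer has indicated B=1 for this subflow


def eligible_subflows(subflows: List[Subflow]) -> List[Subflow]:
    """Return the subflows a sender should consider for new data."""
    usable = [s for s in subflows if s.usable]
    regular = [s for s in usable if not s.backup]
    # Backup subflows are considered only when no usable B=0 subflow remains.
    return regular if regular else usable
```

Because the rule is a SHOULD NOT, a real scheduler could still override this filter for local policy reasons.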
In the event that the available set of paths changes, a host may wish In the event that the available set of paths changes, a host may wish
to signal a change in priority of subflows to the peer (e.g., a to signal a change in priority of subflows to the peer (e.g., a
subflow that was previously set as backup should now take priority subflow that was previously set as backup should now take priority
over all remaining subflows). Therefore, the MP_PRIO option, shown over all remaining subflows). Therefore, the MP_PRIO option, shown
in Figure 11, can be used to change the 'B' flag of the subflow on in Figure 11, can be used to change the 'B' flag of the subflow on
which it is sent. which it is sent.
Another use of the MP_PRIO option is to set the 'B' flag on a subflow Another use of the MP_PRIO option is to set the 'B' flag on a subflow
to cleanly retire its use before closing it and removing it with to cleanly retire its use before closing it and removing it with
REMOVE_ADDR (Section 3.4.2), for example to support make-before-break REMOVE_ADDR (Section 3.4.2), for example to support make-before-break
session continuity, where new subflows are added before the session continuity, where new subflows are added before the
previously used ones are closed. previously used ones are closed.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+-----+-+ +---------------+---------------+-------+-----+-+
| Kind | Length |Subtype| |B| | Kind | Length |Subtype|(rsv)|B|
+---------------+---------------+-------+-----+-+ +---------------+---------------+-------+-----+-+
Figure 11: Change Subflow Priority (MP_PRIO) Option Figure 11: Change Subflow Priority (MP_PRIO) Option
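The layout in Figure 11 can be sketched as a three-octet encoder and parser. This is an illustration only. Kind 30 comes from Table 1; the subtype value 0x5 for MP_PRIO is an assumption taken from the subtype registry, whose rows are not shown in this excerpt; the function names are hypothetical.

```python
import struct

TCP_OPTION_KIND_MPTCP = 30  # Table 1
MP_PRIO_SUBTYPE = 0x5       # assumption: MP_PRIO row of the subtype registry
MP_PRIO_LENGTH = 3          # Kind + Length + (Subtype | rsv | B), per Figure 11


def build_mp_prio(backup: bool) -> bytes:
    """Encode an MP_PRIO option; the three reserved bits are left as zero."""
    third = (MP_PRIO_SUBTYPE << 4) | (1 if backup else 0)
    return struct.pack("!BBB", TCP_OPTION_KIND_MPTCP, MP_PRIO_LENGTH, third)


def parse_mp_prio(option: bytes) -> bool:
    """Return the 'B' flag from an MP_PRIO option, or raise ValueError."""
    if len(option) != MP_PRIO_LENGTH:
        raise ValueError("unexpected MP_PRIO length")
    kind, length, third = struct.unpack("!BBB", option)
    if kind != TCP_OPTION_KIND_MPTCP or length != MP_PRIO_LENGTH:
        raise ValueError("not an MPTCP option of MP_PRIO size")
    if (third >> 4) != MP_PRIO_SUBTYPE:
        raise ValueError("not an MP_PRIO subtype")
    return bool(third & 0x1)  # B is the least significant bit
```

Since the option carries no subflow identifier, the flag applies to the subflow on which the option is received.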
It should be noted that the backup flag is a request from a data It should be noted that the backup flag is a request from a data
receiver to a data sender only, and the data sender SHOULD adhere to receiver to a data sender only, and the data sender SHOULD adhere to
these requests. A host cannot assume that the data sender will do these requests. A host cannot assume that the data sender will do
so, however, since local policies -- or technical difficulties -- may so, however, since local policies -- or technical difficulties -- may
override MP_PRIO requests. Note also that this signal applies to a override MP_PRIO requests. Note also that this signal applies to a
single direction, and so the sender of this option could choose to single direction, and so the sender of this option could choose to
skipping to change at page 50, line 38 skipping to change at page 50, line 20
with a DSS option containing a Data ACK. Upon reception of the with a DSS option containing a Data ACK. Upon reception of the
acknowledgment, the sender has the confirmation that the DSS option acknowledgment, the sender has the confirmation that the DSS option
passes in both directions and may choose to send fewer DSS options passes in both directions and may choose to send fewer DSS options
than once per segment. than once per segment.
If, however, an ACK is received for data (not just for the SYN) If, however, an ACK is received for data (not just for the SYN)
without a DSS option containing a Data ACK, the sender determines the without a DSS option containing a Data ACK, the sender determines the
path is not MPTCP capable. In the case of this occurring on an path is not MPTCP capable. In the case of this occurring on an
additional subflow (i.e., one started with MP_JOIN), the host MUST additional subflow (i.e., one started with MP_JOIN), the host MUST
close the subflow with a RST, which SHOULD contain a MP_TCPRST option close the subflow with a RST, which SHOULD contain a MP_TCPRST option
(Section 3.6) with a "Middlebox interferance" reason code. (Section 3.6) with a "Middlebox interference" reason code.
In the case of such an ACK being received on the first subflow (i.e., In the case of such an ACK being received on the first subflow (i.e.,
that started with MP_CAPABLE), before any additional subflows are that started with MP_CAPABLE), before any additional subflows are
added, the implementation MUST drop out of an MPTCP mode, back to added, the implementation MUST drop out of an MPTCP mode, back to
regular TCP. The sender will send one final data sequence mapping, regular TCP. The sender will send one final data sequence mapping,
with the Data-Level Length value of 0 indicating an infinite mapping with the Data-Level Length value of 0 indicating an infinite mapping
(to inform the other end in case the path drops options in one (to inform the other end in case the path drops options in one
direction only), and then revert to sending data on the single direction only), and then revert to sending data on the single
subflow without any MPTCP options. subflow without any MPTCP options.
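The two cases above can be summarized as a small decision function. This is a sketch and not a normative state machine; the Action names and the function signature are hypothetical. The case of the initial subflow after additional subflows have been added is not spelled out in the two paragraphs above, and the sketch assumes it is handled like a broken subflow.

```python
from enum import Enum


class Action(Enum):
    CONTINUE_MPTCP = "continue"
    RESET_SUBFLOW = "RST with MP_TCPRST (Middlebox interference)"
    FALLBACK_TO_TCP = "send infinite mapping, then regular TCP"


def on_ack_for_data(has_dss_data_ack: bool,
                    is_initial_subflow: bool,
                    additional_subflows: int) -> Action:
    """Decide what to do when an ACK covering data (not just the SYN) arrives."""
    if has_dss_data_ack:
        # DSS options pass in both directions on this path.
        return Action.CONTINUE_MPTCP
    if not is_initial_subflow:
        # Subflow started with MP_JOIN: MUST be closed with a RST.
        return Action.RESET_SUBFLOW
    if additional_subflows == 0:
        # Initial subflow, nothing else added yet: drop back to regular TCP.
        return Action.FALLBACK_TO_TCP
    # Assumption: treat the remaining case like a broken subflow.
    return Action.RESET_SUBFLOW
```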
If a subflow breaks during operation, e.g. if it is re-routed and If a subflow breaks during operation, e.g. if it is re-routed and
MPTCP options are no longer permitted, then once this is detected (by MPTCP options are no longer permitted, then once this is detected (by
the subflow-level receive buffer filling up), the subflow SHOULD be the subflow-level receive buffer filling up, since there is no
treated as broken and closed with a RST, since no data can be mapping available in order to DATA_ACK this data), the subflow SHOULD
be treated as broken and closed with a RST, since no data can be
delivered to the application layer, and no fallback signal can be delivered to the application layer, and no fallback signal can be
reliably sent. This RST SHOULD include the MP_TCPRST option reliably sent. This RST SHOULD include the MP_TCPRST option
(Section 3.6) with a "Middlebox interferance" reason code. (Section 3.6) with a "Middlebox interference" reason code.
These rules should cover all cases where such a failure could happen: These rules should cover all cases where such a failure could happen:
whether it's on the forward or reverse path and whether the server or whether it's on the forward or reverse path and whether the server or
the client first sends data. If lost options on data packets occur the client first sends data.
on any other subflow apart from the initial subflow, it should be
treated as a standard path failure. The data would not be DATA_ACKed
(since there is no mapping for the data), and the subflow can be
closed with a RST, containing a MP_TCPRST option (Section 3.6) with a
"Middlebox interferance" reason code.
So far this section has discussed the loss of MPTCP options, either So far this section has discussed the loss of MPTCP options, either
initially, or during the course of the connection. As described in initially, or during the course of the connection. As described in
Section 3.3, each portion of data for which there is a mapping is Section 3.3, each portion of data for which there is a mapping is
protected by a checksum, if checksums have been negotiated. This protected by a checksum, if checksums have been negotiated. This
mechanism is used to detect if middleboxes have made any adjustments mechanism is used to detect if middleboxes have made any adjustments
to the payload (added, removed, or changed data). A checksum will to the payload (added, removed, or changed data). A checksum will
fail if the data has been changed in any way. This will also detect fail if the data has been changed in any way. This will also detect
if the length of data on the subflow is increased or decreased, and if the length of data on the subflow is increased or decreased, and
this means the data sequence mapping is no longer valid. The sender this means the data sequence mapping is no longer valid. The sender
skipping to change at page 54, line 34 skipping to change at page 54, line 12
feasible to allow multiple subflows between the same two addresses feasible to allow multiple subflows between the same two addresses
but using different port pairs, and such a facility could be used to but using different port pairs, and such a facility could be used to
allow load balancing within the network based on 5-tuples (e.g., some allow load balancing within the network based on 5-tuples (e.g., some
ECMP implementations [RFC2992]). ECMP implementations [RFC2992]).
3.9.2. Delayed Subflow Start and Subflow Symmetry 3.9.2. Delayed Subflow Start and Subflow Symmetry
Many TCP connections are short-lived and consist only of a few Many TCP connections are short-lived and consist only of a few
segments, and so the overheads of using MPTCP outweigh any benefits. segments, and so the overheads of using MPTCP outweigh any benefits.
A heuristic is required, therefore, to decide when to start using A heuristic is required, therefore, to decide when to start using
additional subflows in an MPTCP connection. We expect that additional subflows in an MPTCP connection. Experimental deployments
experience gathered from deployments will provide further guidance on have shown that MPTCP can be applied in a range of scenarios so an
this, and will be affected by particular application characteristics implementation is likely to need to take into account factors
(which are likely to change over time). However, a suggested including the type of traffic being sent and duration of session, and
general-purpose heuristic that an implementation MAY choose to employ this information MAY be signalled by the application layer.
is as follows. Results from experimental deployments are needed in
order to verify the correctness of this proposal. However, for standard TCP traffic, a suggested general-purpose
heuristic that an implementation MAY choose to employ is as follows.
If a host has data buffered for its peer (which implies that the If a host has data buffered for its peer (which implies that the
application has received a request for data), the host opens one application has received a request for data), the host opens one
subflow for each initial window's worth of data that is buffered. subflow for each initial window's worth of data that is buffered.
Consideration should also be given to limiting the rate of adding new Consideration should also be given to limiting the rate of adding new
subflows, as well as limiting the total number of subflows open for a subflows, as well as limiting the total number of subflows open for a
particular connection. A host may choose to vary these values based particular connection. A host may choose to vary these values based
on its load or knowledge of traffic and path characteristics. on its load or knowledge of traffic and path characteristics.
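One possible reading of this heuristic is sketched below. It is an interpretation, not the draft's algorithm: the function name, the per-connection cap, and the choice to count already-open subflows against the target are all assumptions. Rate limiting of subflow creation is left out.

```python
def subflows_to_open(buffered_bytes: int,
                     initial_window_bytes: int,
                     open_subflows: int,
                     max_subflows: int) -> int:
    """Return how many additional subflows to open right now."""
    if initial_window_bytes <= 0:
        raise ValueError("initial window must be positive")
    # One subflow per initial window's worth of buffered data...
    target = buffered_bytes // initial_window_bytes
    # ...bounded by a local limit on subflows per connection.
    target = min(target, max_subflows)
    return max(0, target - open_subflows)
```

For example, with an assumed initial window of 14600 bytes, 45000 buffered bytes, one subflow already open, and a cap of 8, the sketch returns 2.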
skipping to change at page 55, line 21 skipping to change at page 54, line 46
An additional time-based heuristic could be applied, opening An additional time-based heuristic could be applied, opening
additional subflows after a given period of time has passed. This additional subflows after a given period of time has passed. This
would alleviate the above issue, and also provide resilience for low- would alleviate the above issue, and also provide resilience for low-
bandwidth but long-lived applications. bandwidth but long-lived applications.
Another issue is that both communicating hosts may simultaneously try Another issue is that both communicating hosts may simultaneously try
to set up a subflow between the same pair of addresses. This leads to set up a subflow between the same pair of addresses. This leads
to an inefficient use of resources. to an inefficient use of resources.
If the the same ports are used on all subflows, as recommended above, If the same ports are used on all subflows, as recommended above,
then standard TCP simultaneous open logic should take care of this then standard TCP simultaneous open logic should take care of this
situation and only one subflow will be established between the situation and only one subflow will be established between the
address pairs. However, this relies on the same ports being used at address pairs. However, this relies on the same ports being used at
both end hosts. If a host does not support TCP simultaneous open, it both end hosts. If a host does not support TCP simultaneous open, it
is RECOMMENDED that some element of randomization is applied to the is RECOMMENDED that some element of randomization is applied to the
time to wait before opening new subflows, so that only one subflow is time to wait before opening new subflows, so that only one subflow is
created between a given address pair. If, however, hosts signal created between a given address pair. If, however, hosts signal
additional ports to use (for example, for leveraging ECMP on-path), additional ports to use (for example, for leveraging ECMP on-path),
this heuristic is not appropriate. this heuristic is not appropriate.
skipping to change at page 55, line 47 skipping to change at page 55, line 24
Requirements for MPTCP's handling of unexpected signals have been Requirements for MPTCP's handling of unexpected signals have been
given in Section 3.8. There are other failure cases, however, where given in Section 3.8. There are other failure cases, however, where
a host can choose appropriate behavior. a host can choose appropriate behavior.
For example, Section 3.1 suggests that a host SHOULD fall back to For example, Section 3.1 suggests that a host SHOULD fall back to
trying regular TCP SYNs after one or more failures of MPTCP SYNs for trying regular TCP SYNs after one or more failures of MPTCP SYNs for
a connection. A host may keep a system-wide cache of such a connection. A host may keep a system-wide cache of such
information, so that it can back off from using MPTCP, firstly for information, so that it can back off from using MPTCP, firstly for
that particular destination host, and eventually on a whole that particular destination host, and eventually on a whole
interface, if MPTCP connections continue failing. interface, if MPTCP connections continue failing. The duration of
such a cache would be implementation-specific.
Another failure could occur when the MP_JOIN handshake fails. Another failure could occur when the MP_JOIN handshake fails.
Section 3.8 specifies that an incorrect handshake MUST lead to the Section 3.8 specifies that an incorrect handshake MUST lead to the
subflow being closed with a RST. A host operating an active subflow being closed with a RST. A host operating an active
intrusion detection system may choose to start blocking MP_JOIN intrusion detection system may choose to start blocking MP_JOIN
packets from the source host if multiple failed MP_JOIN attempts are packets from the source host if multiple failed MP_JOIN attempts are
seen. From the connection initiator's point of view, if an MP_JOIN seen. From the connection initiator's point of view, if an MP_JOIN
fails, it SHOULD NOT attempt to connect to the same IP address and fails, it SHOULD NOT attempt to connect to the same IP address and
port during the lifetime of the connection, unless the other host port during the lifetime of the connection, unless the other host
refreshes the information with another ADD_ADDR option. Note that refreshes the information with another ADD_ADDR option. Note that
skipping to change at page 58, line 44 skipping to change at page 58, line 23
mechanism presented in this document should therefore protect against mechanism presented in this document should therefore protect against
all forms of flooding and hijacking attacks discussed in [RFC6181]. all forms of flooding and hijacking attacks discussed in [RFC6181].
The version negotiation specified in Section 3.1, if differing MPTCP The version negotiation specified in Section 3.1, if differing MPTCP
versions shared a common negotiation format, would allow an on-path versions shared a common negotiation format, would allow an on-path
attacker to apply a theoretical bid-down attack. Since the v1 and v0 attacker to apply a theoretical bid-down attack. Since the v1 and v0
protocols have a different handshake, such an attack would require protocols have a different handshake, such an attack would require
the client to re-establish the connection using v0, and this being the client to re-establish the connection using v0, and this being
supported by the server. Note that an on-path attacker would have supported by the server. Note that an on-path attacker would have
access to the raw data, negating any other TCP-level security access to the raw data, negating any other TCP-level security
mechanisms. Also a change from [RFC6824] has removed the subflow mechanisms. Also a change from RFC6824 has removed the subflow
identifier from the MP_PRIO option (Section 3.3.8), to remove the identifier from the MP_PRIO option (Section 3.3.8), to remove the
theoretical attack where a subflow could be placed in "backup" mode theoretical attack where a subflow could be placed in "backup" mode
by an attacker. by an attacker.
During normal operation, regular TCP protection mechanisms (such as During normal operation, regular TCP protection mechanisms (such as
ensuring sequence numbers are in-window) will provide the same level ensuring sequence numbers are in-window) will provide the same level
of protection against attacks on individual TCP subflows as exists of protection against attacks on individual TCP subflows as exists
for regular TCP today. Implementations will introduce additional for regular TCP today. Implementations will introduce additional
buffers compared to regular TCP, to reassemble data at the connection buffers compared to regular TCP, to reassemble data at the connection
level. The application of window sizing will minimize the risk of level. The application of window sizing will minimize the risk of
skipping to change at page 64, line 7 skipping to change at page 63, line 41
Iljitsch van Beijnum, Lars Eggert, Marcelo Bagnulo, Robert Hancock, Iljitsch van Beijnum, Lars Eggert, Marcelo Bagnulo, Robert Hancock,
Pasi Sarolahti, Toby Moncaster, Philip Eardley, Sergio Lembo, Pasi Sarolahti, Toby Moncaster, Philip Eardley, Sergio Lembo,
Lawrence Conroy, Yoshifumi Nishida, Bob Briscoe, Stein Gjessing, Lawrence Conroy, Yoshifumi Nishida, Bob Briscoe, Stein Gjessing,
Andrew McGregor, Georg Hampel, Anumita Biswas, Wes Eddy, Alexey Andrew McGregor, Georg Hampel, Anumita Biswas, Wes Eddy, Alexey
Melnikov, Francis Dupont, Adrian Farrel, Barry Leiba, Robert Sparks, Melnikov, Francis Dupont, Adrian Farrel, Barry Leiba, Robert Sparks,
Sean Turner, Stephen Farrell, Martin Stiemerling, Gregory Detal, Sean Turner, Stephen Farrell, Martin Stiemerling, Gregory Detal,
Fabien Duchene, Xavier de Foy, Rahul Jadhav, and Klemens Schragel. Fabien Duchene, Xavier de Foy, Rahul Jadhav, and Klemens Schragel.
8. IANA Considerations 8. IANA Considerations
This document obsoletes [RFC6824] and as such IANA is requested to This document obsoletes RFC6824 and as such IANA is requested to
update the TCP option space registry to point to this document for update the TCP option space registry to point to this document for
Multipath TCP, as follows: Multipath TCP, as follows:
+------+--------+-----------------------+---------------+ +------+--------+-----------------------+---------------+
| Kind | Length | Meaning | Reference | | Kind | Length | Meaning | Reference |
+------+--------+-----------------------+---------------+ +------+--------+-----------------------+---------------+
| 30 | N | Multipath TCP (MPTCP) | This document | | 30 | N | Multipath TCP (MPTCP) | This document |
+------+--------+-----------------------+---------------+ +------+--------+-----------------------+---------------+
Table 1: TCP Option Kind Numbers Table 1: TCP Option Kind Numbers
8.1. MPTCP Option Subtypes 8.1. MPTCP Option Subtypes
The 4-bit MPTCP subtype sub-registry ("MPTCP Option Subtypes" under The 4-bit MPTCP subtype sub-registry ("MPTCP Option Subtypes" under
the "Transmission Control Protocol (TCP) Parameters" registry) was the "Transmission Control Protocol (TCP) Parameters" registry) was
defined in [RFC6824]. This document defines one additional subtype defined in RFC6824. Since RFC6824 was an Experimental not Standards
(ADD_ADDR) and updates the references to this document for all sub- Track RFC, and since no further entries have occurred beyond those
types except ADD_ADDR, which is deprecated. The updates are listed pointing to RFC6824, IANA is requested to replace the existing
in the following table. registry with Table 2 and with the following explanatory note.
Note: This registry specifies the MPTCP Option Subtypes for MPTCP v1,
which obsoletes the Experimental MPTCP v0. For the MPTCP v0
subtypes, please refer to RFC6824.
+-------+-----------------+-------------------------+---------------+ +-------+-----------------+-------------------------+---------------+
| Value | Symbol | Name | Reference | | Value | Symbol | Name | Reference |
+-------+-----------------+-------------------------+---------------+ +-------+-----------------+-------------------------+---------------+
| 0x0 | MP_CAPABLE | Multipath Capable | This | | 0x0 | MP_CAPABLE | Multipath Capable | This |
| | | | document, | | | | | document, |
| | | | Section 3.1 | | | | | Section 3.1 |
| 0x1 | MP_JOIN | Join Connection | This | | 0x1 | MP_JOIN | Join Connection | This |
| | | | document, | | | | | document, |
| | | | Section 3.2 | | | | | Section 3.2 |
skipping to change at page 65, line 43 skipping to change at page 65, line 7
| | | | document, | | | | | document, |
| | | | Section 3.6 | | | | | Section 3.6 |
| 0xf | MP_EXPERIMENTAL | Reserved for private | | | 0xf | MP_EXPERIMENTAL | Reserved for private | |
| | | experiments | | | | | experiments | |
+-------+-----------------+-------------------------+---------------+ +-------+-----------------+-------------------------+---------------+
Table 2: MPTCP Option Subtypes Table 2: MPTCP Option Subtypes
Values 0x9 through 0xe are currently unassigned. Option 0xf is Values 0x9 through 0xe are currently unassigned. Option 0xf is
reserved for use by private experiments. Its use may be formalized reserved for use by private experiments. Its use may be formalized
in a future specification. in a future specification. Future assignments in this registry are
to be defined by Standards Action as defined by [RFC8126].
Assignments consist of the MPTCP subtype's symbolic name and its
associated value, and a reference to its specification.
8.2. MPTCP Handshake Algorithms 8.2. MPTCP Handshake Algorithms
IANA has created another sub-registry, "MPTCP Handshake Algorithms" The "MPTCP Handshake Algorithms" sub-registry under the "Transmission
under the "Transmission Control Protocol (TCP) Parameters" registry, Control Protocol (TCP) Parameters" registry was defined in RFC6824.
based on the flags in MP_CAPABLE (Section 3.1). IANA is requested to Since RFC6824 was an Experimental not Standards Track RFC, and since
update the references of this table to this document, as follows: no further entries have occurred beyond those pointing to RFC6824,
IANA is requested to replace the existing registry with Table 3 and
with the following explanatory note.
Note: This registry specifies the MPTCP Handshake Algorithms for
MPTCP v1, which obsoletes the Experimental MPTCP v0. For the MPTCP
v0 subtypes, please refer to RFC6824.
+-------+----------------------------------------+------------------+ +-------+----------------------------------------+------------------+
| Flag | Meaning | Reference | | Flag | Meaning | Reference |
| Bit | | | | Bit | | |
+-------+----------------------------------------+------------------+ +-------+----------------------------------------+------------------+
| A | Checksum required | This document, | | A | Checksum required | This document, |
| | | Section 3.1 | | | | Section 3.1 |
| B | Extensibility | This document, | | B | Extensibility | This document, |
| | | Section 3.1 | | | | Section 3.1 |
| C | Do not attempt to establish new | This document, | | C | Do not attempt to establish new | This document, |
skipping to change at page 66, line 33 skipping to change at page 66, line 7
B, depending on how Extensibility is defined in future B, depending on how Extensibility is defined in future
specifications; see Section 3.1 for more information. specifications; see Section 3.1 for more information.
Future assignments in this registry are also to be defined by Future assignments in this registry are also to be defined by
Standards Action as defined by [RFC8126]. Assignments consist of the Standards Action as defined by [RFC8126]. Assignments consist of the
value of the flags, a symbolic name for the algorithm, and a value of the flags, a symbolic name for the algorithm, and a
reference to its specification. reference to its specification.
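As a rough illustration of how an implementation might consume this registry, the sketch below decodes the three flag bits visible in the table above. It assumes the flags occupy a single octet with bit A as the most significant bit, following the layout in Section 3.1. The names FLAG_MASKS and decode_flags are hypothetical, and flags beyond C are deliberately not modelled.

```python
# Hypothetical masks: assumes a one-octet flags field in MP_CAPABLE with
# bit A as the most significant bit (see Section 3.1). Only the flags
# shown in the table excerpt (A, B, C) are modelled here.
FLAG_MASKS = (
    (0x80, "A"),  # Checksum required
    (0x40, "B"),  # Extensibility
    (0x20, "C"),  # No new subflows to the source address and port
)


def decode_flags(octet):
    """Return the letters of the modelled flag bits set in the octet."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("flags field is a single octet")
    return [name for mask, name in FLAG_MASKS if octet & mask]


if __name__ == "__main__":
    print(decode_flags(0xA0))
```

Keeping the masks in a table rather than in scattered constants means a future Standards Action assignment would only need one new row.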
8.3. MP_TCPRST Reason Codes 8.3. MP_TCPRST Reason Codes
IANA is requested to create a further sub-registry, "MP_TCPRST Reason IANA is requested to create a further sub-registry, "MPTCP MP_TCPRST
Codes" under the "Transmission Control Protocol (TCP) Parameters" Reason Codes" under the "Transmission Control Protocol (TCP)
registry, based on the reason code in MP_TCPRST (Section 3.6): Parameters" registry, based on the reason code in MP_TCPRST
(Section 3.6) message. Initial values for this registry are given in
Table 4; future assignments are to be defined by Specification
Required as defined by [RFC8126]. Assignments consist of the value
of the code, a short description of its meaning, and a reference to
its specification. The maximum value is 0xff.
+------+-----------------------------+----------------------------+ +------+-----------------------------+----------------------------+
| Code | Meaning | Reference | | Code | Meaning | Reference |
+------+-----------------------------+----------------------------+ +------+-----------------------------+----------------------------+
| 0x00 | Unspecified TCP error | This document, Section 3.6 | | 0x00 | Unspecified TCP error | This document, Section 3.6 |
| 0x01 | MPTCP specific error | This document, Section 3.6 | | 0x01 | MPTCP specific error | This document, Section 3.6 |
| 0x02 | Lack of resources | This document, Section 3.6 | | 0x02 | Lack of resources | This document, Section 3.6 |
| 0x03 | Administratively prohibited | This document, Section 3.6 | | 0x03 | Administratively prohibited | This document, Section 3.6 |
| 0x04 | Too much outstanding data | This document, Section 3.6 | | 0x04 | Too much outstanding data | This document, Section 3.6 |
| 0x05 | Unacceptable performance | This document, Section 3.6 | | 0x05 | Unacceptable performance | This document, Section 3.6 |
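The reason codes listed so far lend themselves to a simple lookup. The following is a minimal sketch, not a definitive implementation: REASON_CODES and describe_reason are hypothetical names, only the codes visible in this excerpt (0x00 through 0x05) are included, and the upper bound of 0xff is taken from the registry description in the newer draft.

```python
# Hypothetical lookup for MP_TCPRST reason codes. Only codes shown in
# this excerpt are listed; any other value is reported as not listed.
MAX_REASON_CODE = 0xFF  # maximum value stated in the registry text

REASON_CODES = {
    0x00: "Unspecified TCP error",
    0x01: "MPTCP specific error",
    0x02: "Lack of resources",
    0x03: "Administratively prohibited",
    0x04: "Too much outstanding data",
    0x05: "Unacceptable performance",
}


def describe_reason(code):
    """Map a reason code to its registry meaning, if listed here."""
    if not 0 <= code <= MAX_REASON_CODE:
        raise ValueError("reason code must fit in one octet")
    return REASON_CODES.get(code, "not listed in this excerpt")


if __name__ == "__main__":
    print(describe_reason(0x02))
```

Treating unknown codes as a non-fatal "not listed" result reflects that the registry is open to future Specification Required assignments.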
skipping to change at page 67, line 18 skipping to change at page 66, line 43
[RFC0793] Postel, J., "Transmission Control Protocol", STD 7, [RFC0793] Postel, J., "Transmission Control Protocol", STD 7,
RFC 793, DOI 10.17487/RFC0793, September 1981, RFC 793, DOI 10.17487/RFC0793, September 1981,
<https://www.rfc-editor.org/info/rfc793>. <https://www.rfc-editor.org/info/rfc793>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997, <https://www.rfc- DOI 10.17487/RFC2119, March 1997, <https://www.rfc-
editor.org/info/rfc2119>. editor.org/info/rfc2119>.
[RFC6182] Ford, A., Raiciu, C., Handley, M., Barre, S., and J.
Iyengar, "Architectural Guidelines for Multipath TCP
Development", RFC 6182, DOI 10.17487/RFC6182, March 2011,
<https://www.rfc-editor.org/info/rfc6182>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>. May 2017, <https://www.rfc-editor.org/info/rfc8174>.
[SHS] National Institute of Standards and Technology, "Secure Hash [SHS] National Institute of Standards and Technology, "Secure Hash
Standard", Federal Information Processing Standard Standard", Federal Information Processing Standard
(FIPS) 180-4, August 2015, (FIPS) 180-4, August 2015,
<http://nvlpubs.nist.gov/nistpubs/FIPS/ <http://nvlpubs.nist.gov/nistpubs/FIPS/
NIST.FIPS.180-4.pdf>. NIST.FIPS.180-4.pdf>.
skipping to change at page 69, line 5 skipping to change at page 68, line 20
Address Translator (Traditional NAT)", RFC 3022, Address Translator (Traditional NAT)", RFC 3022,
DOI 10.17487/RFC3022, January 2001, <https://www.rfc- DOI 10.17487/RFC3022, January 2001, <https://www.rfc-
editor.org/info/rfc3022>. editor.org/info/rfc3022>.
[RFC3135] Border, J., Kojo, M., Griner, J., Montenegro, G., and Z. [RFC3135] Border, J., Kojo, M., Griner, J., Montenegro, G., and Z.
Shelby, "Performance Enhancing Proxies Intended to Shelby, "Performance Enhancing Proxies Intended to
Mitigate Link-Related Degradations", RFC 3135, Mitigate Link-Related Degradations", RFC 3135,
DOI 10.17487/RFC3135, June 2001, <https://www.rfc- DOI 10.17487/RFC3135, June 2001, <https://www.rfc-
editor.org/info/rfc3135>. editor.org/info/rfc3135>.
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP",
RFC 3168, DOI 10.17487/RFC3168, September 2001,
<https://www.rfc-editor.org/info/rfc3168>.
[RFC4086] Eastlake 3rd, D., Schiller, J., and S. Crocker, [RFC4086] Eastlake 3rd, D., Schiller, J., and S. Crocker,
"Randomness Requirements for Security", BCP 106, RFC 4086, "Randomness Requirements for Security", BCP 106, RFC 4086,
DOI 10.17487/RFC4086, June 2005, <https://www.rfc- DOI 10.17487/RFC4086, June 2005, <https://www.rfc-
editor.org/info/rfc4086>. editor.org/info/rfc4086>.
[RFC4987] Eddy, W., "TCP SYN Flooding Attacks and Common [RFC4987] Eddy, W., "TCP SYN Flooding Attacks and Common
Mitigations", RFC 4987, DOI 10.17487/RFC4987, August 2007, Mitigations", RFC 4987, DOI 10.17487/RFC4987, August 2007,
<https://www.rfc-editor.org/info/rfc4987>. <https://www.rfc-editor.org/info/rfc4987>.
[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
skipping to change at page 69, line 33 skipping to change at page 68, line 43
[RFC5961] Ramaiah, A., Stewart, R., and M. Dalal, "Improving TCP's [RFC5961] Ramaiah, A., Stewart, R., and M. Dalal, "Improving TCP's
Robustness to Blind In-Window Attacks", RFC 5961, Robustness to Blind In-Window Attacks", RFC 5961,
DOI 10.17487/RFC5961, August 2010, <https://www.rfc- DOI 10.17487/RFC5961, August 2010, <https://www.rfc-
editor.org/info/rfc5961>. editor.org/info/rfc5961>.
[RFC6181] Bagnulo, M., "Threat Analysis for TCP Extensions for [RFC6181] Bagnulo, M., "Threat Analysis for TCP Extensions for
Multipath Operation with Multiple Addresses", RFC 6181, Multipath Operation with Multiple Addresses", RFC 6181,
DOI 10.17487/RFC6181, March 2011, <https://www.rfc- DOI 10.17487/RFC6181, March 2011, <https://www.rfc-
editor.org/info/rfc6181>. editor.org/info/rfc6181>.
[RFC6182] Ford, A., Raiciu, C., Handley, M., Barre, S., and J.
Iyengar, "Architectural Guidelines for Multipath TCP
Development", RFC 6182, DOI 10.17487/RFC6182, March 2011,
<https://www.rfc-editor.org/info/rfc6182>.
[RFC6234] Eastlake 3rd, D. and T. Hansen, "US Secure Hash Algorithms [RFC6234] Eastlake 3rd, D. and T. Hansen, "US Secure Hash Algorithms
(SHA and SHA-based HMAC and HKDF)", RFC 6234, (SHA and SHA-based HMAC and HKDF)", RFC 6234,
DOI 10.17487/RFC6234, May 2011, <https://www.rfc- DOI 10.17487/RFC6234, May 2011, <https://www.rfc-
editor.org/info/rfc6234>. editor.org/info/rfc6234>.
[RFC6356] Raiciu, C., Handley, M., and D. Wischik, "Coupled [RFC6356] Raiciu, C., Handley, M., and D. Wischik, "Coupled
Congestion Control for Multipath Transport Protocols", Congestion Control for Multipath Transport Protocols",
RFC 6356, DOI 10.17487/RFC6356, October 2011, RFC 6356, DOI 10.17487/RFC6356, October 2011,
<https://www.rfc-editor.org/info/rfc6356>. <https://www.rfc-editor.org/info/rfc6356>.
[RFC6528] Gont, F. and S. Bellovin, "Defending against Sequence [RFC6528] Gont, F. and S. Bellovin, "Defending against Sequence
Number Attacks", RFC 6528, DOI 10.17487/RFC6528, February Number Attacks", RFC 6528, DOI 10.17487/RFC6528, February
2012, <https://www.rfc-editor.org/info/rfc6528>. 2012, <https://www.rfc-editor.org/info/rfc6528>.
[RFC6824] Ford, A., Raiciu, C., Handley, M., and O. Bonaventure,
"TCP Extensions for Multipath Operation with Multiple
Addresses", RFC 6824, DOI 10.17487/RFC6824, January 2013,
<https://www.rfc-editor.org/info/rfc6824>.
[RFC6897] Scharf, M. and A. Ford, "Multipath TCP (MPTCP) Application [RFC6897] Scharf, M. and A. Ford, "Multipath TCP (MPTCP) Application
Interface Considerations", RFC 6897, DOI 10.17487/RFC6897, Interface Considerations", RFC 6897, DOI 10.17487/RFC6897,
March 2013, <https://www.rfc-editor.org/info/rfc6897>. March 2013, <https://www.rfc-editor.org/info/rfc6897>.
[RFC7323] Borman, D., Braden, B., Jacobson, V., and R. [RFC7323] Borman, D., Braden, B., Jacobson, V., and R.
Scheffenegger, Ed., "TCP Extensions for High Performance", Scheffenegger, Ed., "TCP Extensions for High Performance",
RFC 7323, DOI 10.17487/RFC7323, September 2014, RFC 7323, DOI 10.17487/RFC7323, September 2014,
<https://www.rfc-editor.org/info/rfc7323>. <https://www.rfc-editor.org/info/rfc7323>.
[RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP [RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP
skipping to change at page 79, line 36 skipping to change at page 78, line 36
------------------------>|M_TIME WAIT|----------------->| M_CLOSED| ------------------------>|M_TIME WAIT|----------------->| M_CLOSED|
+-----------+ +---------+ +-----------+ +---------+
All subflows in CLOSED All subflows in CLOSED
------------ ------------
delete MPTCP PCB delete MPTCP PCB
Figure 22: Finite State Machine for Connection Closure Figure 22: Finite State Machine for Connection Closure
Appendix E. Changes from RFC6824 Appendix E. Changes from RFC6824
This section lists the key technical changes between RFC6824 This section lists the key technical changes between RFC6824,
[RFC6824], specifying MPTCP v0, and this document, which obsoletes specifying MPTCP v0, and this document, which obsoletes RFC6824 and
RFC6824 and specifies MPTCP v1. Note that this specification is not specifies MPTCP v1. Note that this specification is not backwards
backwards compatible with RFC6824. compatible with RFC6824.
o The document incorporates lessons learnt from the various o The document incorporates lessons learnt from the various
implementations, deployments and experiments gathered in the implementations, deployments and experiments gathered in the
documents "Use Cases and Operational Experience with Multipath documents "Use Cases and Operational Experience with Multipath
TCP" [RFC8041] and the IETF Journal article "Multipath TCP TCP" [RFC8041] and the IETF Journal article "Multipath TCP
Deployments" [deployments]. Deployments" [deployments].
o Connection initiation, through the exchange of the MP_CAPABLE o Connection initiation, through the exchange of the MP_CAPABLE
MPTCP option, is different from RFC6824. In order to permit MPTCP option, is different from RFC6824. The SYN no longer
servers to act statelessly, the SYN doesn't include A's key (it is includes the initiator's key, allowing the MP_CAPABLE option on
still sent in the ACK). the SYN to be shorter in length, and to avoid duplicating the
sending of keying material.
o This requires MP_CAPABLE to also be sent reliably on the third o This requires MP_CAPABLE to also be sent reliably on the third
ACK. If safe receipt of the third ACK cannot be inferred, the ACK. If safe receipt of the third ACK cannot be inferred, the
MP_CAPABLE option must be repeated on the first data packet. MP_CAPABLE option must be repeated on the first data packet.
o In the Flags field of MP_CAPABLE, C is now assigned to mean that o In the Flags field of MP_CAPABLE, C is now assigned to mean that
the sender of this option will not accept additional MPTCP the sender of this option will not accept additional MPTCP
subflows to the source address and port. This is an efficiency subflows to the source address and port. This is an efficiency
improvement, for example where the sender is behind a strict NAT. improvement, for example where the sender is behind a strict NAT.
 End of changes. 44 change blocks. 
128 lines changed or deleted 136 lines changed or added
