draft-ietf-mptcp-rfc6824bis-12.txt   draft-ietf-mptcp-rfc6824bis-13.txt 
Internet Engineering Task Force A. Ford Internet Engineering Task Force A. Ford
Internet-Draft Pexip Internet-Draft Pexip
Obsoletes: 6824 (if approved) C. Raiciu Obsoletes: 6824 (if approved) C. Raiciu
Intended status: Standards Track U. Politechnica of Bucharest Intended status: Standards Track U. Politechnica of Bucharest
Expires: April 6, 2019 M. Handley Expires: August 21, 2019 M. Handley
U. College London U. College London
O. Bonaventure O. Bonaventure
U. catholique de Louvain U. catholique de Louvain
C. Paasch C. Paasch
Apple, Inc. Apple, Inc.
October 3, 2018 February 17, 2019
TCP Extensions for Multipath Operation with Multiple Addresses TCP Extensions for Multipath Operation with Multiple Addresses
draft-ietf-mptcp-rfc6824bis-12 draft-ietf-mptcp-rfc6824bis-13
Abstract Abstract
TCP/IP communication is currently restricted to a single path per TCP/IP communication is currently restricted to a single path per
connection, yet multiple paths often exist between peers. The connection, yet multiple paths often exist between peers. The
simultaneous use of these multiple paths for a TCP/IP session would simultaneous use of these multiple paths for a TCP/IP session would
improve resource usage within the network and, thus, improve user improve resource usage within the network and, thus, improve user
experience through higher throughput and improved resilience to experience through higher throughput and improved resilience to
network failure. network failure.
Multipath TCP provides the ability to simultaneously use multiple Multipath TCP provides the ability to simultaneously use multiple
paths between peers. This document presents a set of extensions to paths between peers. This document presents a set of extensions to
traditional TCP to support multipath operation. The protocol offers traditional TCP to support multipath operation. The protocol offers
the same type of service to applications as TCP (i.e., reliable the same type of service to applications as TCP (i.e., reliable
bytestream), and it provides the components necessary to establish bytestream), and it provides the components necessary to establish
and use multiple TCP flows across potentially disjoint paths. and use multiple TCP flows across potentially disjoint paths.
This document specifies v1 of Multipath TCP, obsoleting v0 as This document specifies v1 of Multipath TCP, obsoleting v0 as
specified in RFC6824 [RFC6824] through clarifications and specified in RFC6824, through clarifications and modifications
modifications primarily driven by deployment experience. primarily driven by deployment experience.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 6, 2019. This Internet-Draft will expire on August 21, 2019.
Copyright Notice Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the Copyright (c) 2019 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
skipping to change at page 2, line 48 skipping to change at page 2, line 48
2.5. Requesting a Change in a Path's Priority . . . . . . . . 13 2.5. Requesting a Change in a Path's Priority . . . . . . . . 13
2.6. Closing an MPTCP Connection . . . . . . . . . . . . . . . 13 2.6. Closing an MPTCP Connection . . . . . . . . . . . . . . . 13
2.7. Notable Features . . . . . . . . . . . . . . . . . . . . 14 2.7. Notable Features . . . . . . . . . . . . . . . . . . . . 14
3. MPTCP Protocol . . . . . . . . . . . . . . . . . . . . . . . 15 3. MPTCP Protocol . . . . . . . . . . . . . . . . . . . . . . . 15
3.1. Connection Initiation . . . . . . . . . . . . . . . . . . 16 3.1. Connection Initiation . . . . . . . . . . . . . . . . . . 16
3.2. Starting a New Subflow . . . . . . . . . . . . . . . . . 23 3.2. Starting a New Subflow . . . . . . . . . . . . . . . . . 23
3.3. General MPTCP Operation . . . . . . . . . . . . . . . . . 28 3.3. General MPTCP Operation . . . . . . . . . . . . . . . . . 28
3.3.1. Data Sequence Mapping . . . . . . . . . . . . . . . . 30 3.3.1. Data Sequence Mapping . . . . . . . . . . . . . . . . 30
3.3.2. Data Acknowledgments . . . . . . . . . . . . . . . . 33 3.3.2. Data Acknowledgments . . . . . . . . . . . . . . . . 33
3.3.3. Closing a Connection . . . . . . . . . . . . . . . . 34 3.3.3. Closing a Connection . . . . . . . . . . . . . . . . 34
3.3.4. Receiver Considerations . . . . . . . . . . . . . . . 36 3.3.4. Receiver Considerations . . . . . . . . . . . . . . . 35
3.3.5. Sender Considerations . . . . . . . . . . . . . . . . 37 3.3.5. Sender Considerations . . . . . . . . . . . . . . . . 37
3.3.6. Reliability and Retransmissions . . . . . . . . . . . 38 3.3.6. Reliability and Retransmissions . . . . . . . . . . . 37
3.3.7. Congestion Control Considerations . . . . . . . . . . 39 3.3.7. Congestion Control Considerations . . . . . . . . . . 39
3.3.8. Subflow Policy . . . . . . . . . . . . . . . . . . . 39 3.3.8. Subflow Policy . . . . . . . . . . . . . . . . . . . 39
3.4. Address Knowledge Exchange (Path Management) . . . . . . 41 3.4. Address Knowledge Exchange (Path Management) . . . . . . 41
3.4.1. Address Advertisement . . . . . . . . . . . . . . . . 42 3.4.1. Address Advertisement . . . . . . . . . . . . . . . . 42
3.4.2. Remove Address . . . . . . . . . . . . . . . . . . . 45 3.4.2. Remove Address . . . . . . . . . . . . . . . . . . . 45
3.5. Fast Close . . . . . . . . . . . . . . . . . . . . . . . 46 3.5. Fast Close . . . . . . . . . . . . . . . . . . . . . . . 46
3.6. Subflow Reset . . . . . . . . . . . . . . . . . . . . . . 48 3.6. Subflow Reset . . . . . . . . . . . . . . . . . . . . . . 48
3.7. Fallback . . . . . . . . . . . . . . . . . . . . . . . . 50 3.7. Fallback . . . . . . . . . . . . . . . . . . . . . . . . 49
3.8. Error Handling . . . . . . . . . . . . . . . . . . . . . 53 3.8. Error Handling . . . . . . . . . . . . . . . . . . . . . 53
3.9. Heuristics . . . . . . . . . . . . . . . . . . . . . . . 54 3.9. Heuristics . . . . . . . . . . . . . . . . . . . . . . . 53
3.9.1. Port Usage . . . . . . . . . . . . . . . . . . . . . 54 3.9.1. Port Usage . . . . . . . . . . . . . . . . . . . . . 54
3.9.2. Delayed Subflow Start and Subflow Symmetry . . . . . 54 3.9.2. Delayed Subflow Start and Subflow Symmetry . . . . . 54
3.9.3. Failure Handling . . . . . . . . . . . . . . . . . . 55 3.9.3. Failure Handling . . . . . . . . . . . . . . . . . . 55
4. Semantic Issues . . . . . . . . . . . . . . . . . . . . . . . 56 4. Semantic Issues . . . . . . . . . . . . . . . . . . . . . . . 56
5. Security Considerations . . . . . . . . . . . . . . . . . . . 57 5. Security Considerations . . . . . . . . . . . . . . . . . . . 57
6. Interactions with Middleboxes . . . . . . . . . . . . . . . . 60 6. Interactions with Middleboxes . . . . . . . . . . . . . . . . 60
7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 63 7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 63
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 64 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 64
8.1. MPTCP Option Subtypes . . . . . . . . . . . . . . . . . . 64 8.1. MPTCP Option Subtypes . . . . . . . . . . . . . . . . . . 64
8.2. MPTCP Handshake Algorithms . . . . . . . . . . . . . . . 65 8.2. MPTCP Handshake Algorithms . . . . . . . . . . . . . . . 65
8.3. MP_TCPRST Reason Codes . . . . . . . . . . . . . . . . . 66 8.3. MP_TCPRST Reason Codes . . . . . . . . . . . . . . . . . 66
9. References . . . . . . . . . . . . . . . . . . . . . . . . . 67 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 67
9.1. Normative References . . . . . . . . . . . . . . . . . . 67 9.1. Normative References . . . . . . . . . . . . . . . . . . 67
9.2. Informative References . . . . . . . . . . . . . . . . . 67 9.2. Informative References . . . . . . . . . . . . . . . . . 67
Appendix A. Notes on Use of TCP Options . . . . . . . . . . . . 71 Appendix A. Notes on Use of TCP Options . . . . . . . . . . . . 71
Appendix B. TCP Fast Open and MPTCP . . . . . . . . . . . . . . 72 Appendix B. TCP Fast Open and MPTCP . . . . . . . . . . . . . . 72
B.1. TFO cookie request with MPTCP . . . . . . . . . . . . . . 73 B.1. TFO cookie request with MPTCP . . . . . . . . . . . . . . 72
B.2. Data sequence mapping under TFO . . . . . . . . . . . . . 73 B.2. Data sequence mapping under TFO . . . . . . . . . . . . . 73
B.3. Connection establishment examples . . . . . . . . . . . . 74 B.3. Connection establishment examples . . . . . . . . . . . . 74
Appendix C. Control Blocks . . . . . . . . . . . . . . . . . . . 76 Appendix C. Control Blocks . . . . . . . . . . . . . . . . . . . 76
C.1. MPTCP Control Block . . . . . . . . . . . . . . . . . . . 76 C.1. MPTCP Control Block . . . . . . . . . . . . . . . . . . . 76
C.1.1. Authentication and Metadata . . . . . . . . . . . . . 76 C.1.1. Authentication and Metadata . . . . . . . . . . . . . 76
C.1.2. Sending Side . . . . . . . . . . . . . . . . . . . . 77 C.1.2. Sending Side . . . . . . . . . . . . . . . . . . . . 77
C.1.3. Receiving Side . . . . . . . . . . . . . . . . . . . 77 C.1.3. Receiving Side . . . . . . . . . . . . . . . . . . . 77
C.2. TCP Control Blocks . . . . . . . . . . . . . . . . . . . 77 C.2. TCP Control Blocks . . . . . . . . . . . . . . . . . . . 77
C.2.1. Sending Side . . . . . . . . . . . . . . . . . . . . 78 C.2.1. Sending Side . . . . . . . . . . . . . . . . . . . . 78
C.2.2. Receiving Side . . . . . . . . . . . . . . . . . . . 78 C.2.2. Receiving Side . . . . . . . . . . . . . . . . . . . 78
Appendix D. Finite State Machine . . . . . . . . . . . . . . . . 78 Appendix D. Finite State Machine . . . . . . . . . . . . . . . . 78
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 79 Appendix E. Changes from RFC6824 . . . . . . . . . . . . . . . . 79
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 81
1. Introduction 1. Introduction
Multipath TCP (MPTCP) is a set of extensions to regular TCP [RFC0793] Multipath TCP (MPTCP) is a set of extensions to regular TCP [RFC0793]
to provide a Multipath TCP [RFC6182] service, which enables a to provide a Multipath TCP [RFC6182] service, which enables a
transport connection to operate across multiple paths simultaneously. transport connection to operate across multiple paths simultaneously.
This document presents the protocol changes required to add multipath This document presents the protocol changes required to add multipath
capability to TCP; specifically, those for signaling and setting up capability to TCP; specifically, those for signaling and setting up
multiple paths ("subflows"), managing these subflows, reassembly of multiple paths ("subflows"), managing these subflows, reassembly of
data, and termination of sessions. This is not the only information data, and termination of sessions. This is not the only information
skipping to change at page 13, line 33 skipping to change at page 13, line 33
Host A Host B Host A Host B
------ ------ ------ ------
MP_PRIO -> MP_PRIO ->
2.6. Closing an MPTCP Connection 2.6. Closing an MPTCP Connection
When a host wants to close an existing subflow, but not the whole When a host wants to close an existing subflow, but not the whole
connection, it can initiate a regular TCP FIN/ACK exchange. connection, it can initiate a regular TCP FIN/ACK exchange.
When Host A wants to inform Host B that it has no more data to send, When Host A wants to inform Host B that it has no more data to send,
it signals this "Data FIN" as part of the Data Sequence Signal (see it signals this "DATA_FIN" as part of the Data Sequence Signal (see
above). It has the same semantics and behavior as a regular TCP FIN, above). It has the same semantics and behavior as a regular TCP FIN,
but at the connection level. Once all the data on the MPTCP but at the connection level. Once all the data on the MPTCP
connection has been successfully received, then this message is connection has been successfully received, then this message is
acknowledged at the connection level with a DATA_ACK. Further acknowledged at the connection level with a DATA_ACK. Further
details are in Section 3.3.3. details are in Section 3.3.3.
Host A Host B Host A Host B
------ ------ ------ ------
DATA_SEQUENCE_SIGNAL -> DATA_SEQUENCE_SIGNAL ->
[Data FIN] [DATA_FIN]
<- (MPTCP DATA_ACK) <- (MPTCP DATA_ACK)
There is an additional method of connection closure, referred to as There is an additional method of connection closure, referred to as
"Fast Close", which is analogous to closing a single-path TCP "Fast Close", which is analogous to closing a single-path TCP
connection with a RST signal. The MP_FASTCLOSE signal is used to connection with a RST signal. The MP_FASTCLOSE signal is used to
indicate to the peer that the connection will be abruptly closed and indicate to the peer that the connection will be abruptly closed and
no data will be accepted anymore. This can be used on an ACK no data will be accepted anymore. This can be used on an ACK
(ensuring reliability of the signal), or a RST (which is not reliable). Both (ensuring reliability of the signal), or a RST (which is not reliable). Both
examples are shown in the following diagrams. Further details are in examples are shown in the following diagrams. Further details are in
Section 3.5. Section 3.5.
skipping to change at page 21, line 8 skipping to change at page 21, line 8
C: The third bit, labeled "C", is set to "1" to indicate that the C: The third bit, labeled "C", is set to "1" to indicate that the
sender of this option will not accept additional MPTCP subflows to sender of this option will not accept additional MPTCP subflows to
the source address and port, and therefore the receiver MUST NOT the source address and port, and therefore the receiver MUST NOT
try to open any additional subflows towards this address and port. try to open any additional subflows towards this address and port.
This is an efficiency improvement for situations where the sender This is an efficiency improvement for situations where the sender
knows a restriction is in place, for example if the sender is knows a restriction is in place, for example if the sender is
behind a strict NAT, or operating behind a legacy Layer 4 load behind a strict NAT, or operating behind a legacy Layer 4 load
balancer. balancer.
D through H: The remaining bits, labeled "D" through "H", are used D through H: The remaining bits, labeled "D" through "H", are used
for crypto algorithm negotiation. Currently only the rightmost for crypto algorithm negotiation. In this specification only the
bit, labeled "H", is assigned. Bit "H" indicates the use of HMAC- rightmost bit, labeled "H", is assigned. Bit "H" indicates the
SHA256 (as defined in Section 3.2). An implementation that only use of HMAC-SHA256 (as defined in Section 3.2). An implementation
supports this method MUST set bit "H" to 1, and bits "D" through that only supports this method MUST set bit "H" to 1, and bits "D"
"G" to 0. through "G" to 0.
A crypto algorithm MUST be specified. If flag bits D through H are A crypto algorithm MUST be specified. If flag bits D through H are
all 0, the MP_CAPABLE option MUST be treated as invalid and ignored all 0, the MP_CAPABLE option MUST be treated as invalid and ignored
(that is, it must be treated as a regular TCP handshake). (that is, it must be treated as a regular TCP handshake).
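The flag rules above can be sketched as a small check on the MP_CAPABLE flags octet. This is a minimal illustration, not part of the specification: it assumes the flags "A" through "H" occupy one octet with "A" as the most significant bit, and the function names are hypothetical. Treating "only unassigned algorithm bits set" as unsupported is also an assumption of this sketch.

```python
# Assumed layout: flags A..H in one octet, "A" = most significant bit.
FLAG_C_NO_MORE_SUBFLOWS = 0x20  # third bit from the left ("C")
CRYPTO_MASK = 0x1F              # bits "D" through "H"
FLAG_H_HMAC_SHA256 = 0x01       # rightmost bit ("H")


def select_crypto(flags):
    """Return the algorithm name, or None if MP_CAPABLE must be ignored."""
    crypto_bits = flags & CRYPTO_MASK
    if crypto_bits == 0:
        # No algorithm specified: the option is invalid and the
        # handshake is treated as a regular TCP handshake.
        return None
    if crypto_bits & FLAG_H_HMAC_SHA256:
        return "HMAC-SHA256"
    # Only unassigned bits are set; this sketch treats that as unsupported.
    return None


def may_open_subflow_to_source(flags):
    """False when the sender set "C" and refuses subflows to its source."""
    return not (flags & FLAG_C_NO_MORE_SUBFLOWS)
```

A receiver would call `select_crypto` first and fall back to regular TCP when it returns `None`.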
The selection of the authentication algorithm also impacts the The selection of the authentication algorithm also impacts the
algorithm used to generate the token and the Initial Data Sequence algorithm used to generate the token and the Initial Data Sequence
Number (IDSN). In this specification, with only the SHA-256 Number (IDSN). In this specification, with only the SHA-256
algorithm (bit "H") specified and selected, the token MUST be a algorithm (bit "H") specified and selected, the token MUST be a
truncated (most significant 32 bits) SHA-256 hash ([SHS], [RFC6234]) truncated (most significant 32 bits) SHA-256 hash ([SHS], [RFC6234])
skipping to change at page 23, line 49 skipping to change at page 23, line 49
send Token-B (which is generated from Key-B). Note that the hash send Token-B (which is generated from Key-B). Note that the hash
generation algorithm can be overridden by the choice of cryptographic generation algorithm can be overridden by the choice of cryptographic
handshake algorithm, as defined in Section 3.1. handshake algorithm, as defined in Section 3.1.
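As a rough illustration of the token derivation referred to here (Section 3.1 describes the token as the most significant 32 bits of a SHA-256 hash when bit "H" is selected), the sketch below assumes the hash input is the host's key as a byte string. The function name is hypothetical.

```python
import hashlib


def mptcp_token(key):
    """Most significant 32 bits of SHA-256(key), as an unsigned integer."""
    digest = hashlib.sha256(key).digest()
    # The digest is big-endian, so the first four octets are the
    # most significant 32 bits.
    return int.from_bytes(digest[:4], "big")
```

Host A would compute `mptcp_token(key_b)` to obtain Token-B for the MP_JOIN SYN.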
The MP_JOIN SYN sends not only the token (which is static for a The MP_JOIN SYN sends not only the token (which is static for a
connection) but also random numbers (nonces) that are used to prevent connection) but also random numbers (nonces) that are used to prevent
replay attacks on the authentication method. Recommendations for the replay attacks on the authentication method. Recommendations for the
generation of random numbers for this purpose are given in [RFC4086]. generation of random numbers for this purpose are given in [RFC4086].
The MP_JOIN option includes an "Address ID". This is an identifier The MP_JOIN option includes an "Address ID". This is an identifier
that only has significance within a single connection, where it generated by the sender of the option, used to identify the source
identifies the source address of this packet, even if the IP header address of this packet, even if the IP header has been changed in
has been changed in transit by a middlebox. The Address ID allows transit by a middlebox. The numeric value of this field is generated
address removal (Section 3.4.2) without needing to know what the by the sender and must map uniquely to a source IP address for the
source address at the receiver is, thus allowing address removal sending host. The Address ID allows address removal (Section 3.4.2)
through NATs. The Address ID also allows correlation between new without needing to know what the source address at the receiver is,
subflow setup attempts and address signaling (Section 3.4.1), to thus allowing address removal through NATs. The Address ID also
prevent setting up duplicate subflows on the same path, if an MP_JOIN allows correlation between new subflow setup attempts and address
and ADD_ADDR are sent at the same time. signaling (Section 3.4.1), to prevent setting up duplicate subflows
on the same path, if an MP_JOIN and ADD_ADDR are sent at the same
time.
The Address IDs of the subflow used in the initial SYN exchange of The Address IDs of the subflow used in the initial SYN exchange of
the first subflow in the connection are implicit, and have the value the first subflow in the connection are implicit, and have the value
zero. A host MUST store the mappings between Address IDs and zero. A host MUST store the mappings between Address IDs and
addresses both for itself and the remote host. An implementation addresses both for itself and the remote host. An implementation
will also need to know which local and remote Address IDs are will also need to know which local and remote Address IDs are
associated with which established subflows, for when addresses are associated with which established subflows, for when addresses are
removed from a local or remote host. removed from a local or remote host.
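One possible shape for the bookkeeping described above is sketched below. The class and method names are hypothetical, and the 8-bit Address ID range is an assumption taken from the one-octet field shown in the option figures.

```python
class AddressIdTable:
    """Per-connection mapping between Address IDs, addresses, and subflows."""

    def __init__(self, local_addr, remote_addr):
        # The addresses of the initial subflow implicitly have Address ID 0.
        self.local = {0: local_addr}
        self.remote = {0: remote_addr}
        self.subflows = {"initial": (0, 0)}  # name -> (local ID, remote ID)

    def add_local(self, addr):
        """Return the ID for a local address, allocating one if needed."""
        for addr_id, known in self.local.items():
            if known == addr:
                return addr_id  # an address maps to exactly one ID
        addr_id = next((i for i in range(256) if i not in self.local), None)
        if addr_id is None:
            raise ValueError("no free Address ID")
        self.local[addr_id] = addr
        return addr_id

    def learn_remote(self, addr_id, addr):
        """Record an Address ID announced by the peer."""
        self.remote[addr_id] = addr

    def attach_subflow(self, name, local_id, remote_id):
        self.subflows[name] = (local_id, remote_id)

    def subflows_using_remote(self, remote_id):
        """Subflows to close when the peer removes this Address ID."""
        return [n for n, (_, r) in self.subflows.items() if r == remote_id]
```

Keeping the subflow association alongside the ID maps lets address removal find the affected subflows without inspecting IP headers, which may have been rewritten by a NAT.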
The MP_JOIN option on packets with the SYN flag set also includes 4 The MP_JOIN option on packets with the SYN flag set also includes 4
skipping to change at page 27, line 10 skipping to change at page 27, line 10
Figure 7: Join Connection (MP_JOIN) Option (for Third ACK) Figure 7: Join Connection (MP_JOIN) Option (for Third ACK)
These various MPTCP options fit together to enable authenticated These various MPTCP options fit together to enable authenticated
subflow setup as illustrated in Figure 8. subflow setup as illustrated in Figure 8.
Host A Host B Host A Host B
------------------------ ---------- ------------------------ ----------
Address A1 Address A2 Address B1 Address A1 Address A2 Address B1
---------- ---------- ---------- ---------- ---------- ----------
| | | | | |
| SYN + MP_CAPABLE(Key-A) | | | SYN + MP_CAPABLE |
|--------------------------------------------->| |--------------------------------------------->|
|<---------------------------------------------| |<---------------------------------------------|
| SYN/ACK + MP_CAPABLE(Key-B) | | SYN/ACK + MP_CAPABLE(Key-B) |
| | | | | |
| ACK + MP_CAPABLE(Key-A, Key-B) | | ACK + MP_CAPABLE(Key-A, Key-B) |
|--------------------------------------------->| |--------------------------------------------->|
| | | | | |
| | SYN + MP_JOIN(Token-B, R-A) | | | SYN + MP_JOIN(Token-B, R-A) |
| |------------------------------->| | |------------------------------->|
| |<-------------------------------| | |<-------------------------------|
skipping to change at page 27, line 46 skipping to change at page 27, line 46
"Administratively prohibited" reason code (Section 3.6) should be "Administratively prohibited" reason code (Section 3.6) should be
included. included.
If the token is accepted at Host B, but the HMAC returned to Host A If the token is accepted at Host B, but the HMAC returned to Host A
does not match the one expected, Host A MUST close the subflow with a does not match the one expected, Host A MUST close the subflow with a
TCP RST. In this, and all following cases of sending a RST in this TCP RST. In this, and all following cases of sending a RST in this
section, the sender SHOULD send an MP_TCPRST option (Section 3.6) on section, the sender SHOULD send an MP_TCPRST option (Section 3.6) on
this RST packet with the reason code for an "MPTCP specific error". this RST packet with the reason code for an "MPTCP specific error".
If Host B does not receive the expected HMAC, or the MP_JOIN option If Host B does not receive the expected HMAC, or the MP_JOIN option
is missing from the ACK, it MUST close the subflow with a TCP RST is missing from the ACK, it MUST close the subflow with a TCP RST.
with a MP_TCPRST (Section 3.6) option with the reason code for "MPTCP
specific error".
If the HMACs are verified as correct, then both hosts have If the HMACs are verified as correct, then both hosts have
authenticated each other as being the same peers as existed at the authenticated each other as being the same peers as existed at the
start of the connection, and they have agreed on which connection start of the connection, and they have agreed on which connection
this subflow will become a part. this subflow will become a part.
If the SYN/ACK as received at Host A does not have an MP_JOIN option, If the SYN/ACK as received at Host A does not have an MP_JOIN option,
Host A MUST close the subflow with a TCP RST with a MP_TCPRST Host A MUST close the subflow with a TCP RST.
(Section 3.6) option with the reason code for "MPTCP specific error".
This covers all cases of the loss of an MP_JOIN. In more detail, if This covers all cases of the loss of an MP_JOIN. In more detail, if
MP_JOIN is stripped from the SYN on the path from A to B, and Host B MP_JOIN is stripped from the SYN on the path from A to B, and Host B
does not have a listener on the relevant port, it will respond with a does not have a listener on the relevant port, it will respond with a
RST in the normal way. If in response to a SYN with an MP_JOIN RST in the normal way. If in response to a SYN with an MP_JOIN
option, a SYN/ACK is received without the MP_JOIN option (either option, a SYN/ACK is received without the MP_JOIN option (either
since it was stripped on the return path, or it was stripped on the since it was stripped on the return path, or it was stripped on the
outgoing path but Host B responded as if it were a new regular TCP outgoing path but Host B responded as if it were a new regular TCP
session), then the subflow is unusable and Host A MUST close it with session), then the subflow is unusable and Host A MUST close it with
a RST. a RST.
skipping to change at page 32, line 36 skipping to change at page 32, line 30
numbers is not required, then an implementation MAY include just the numbers is not required, then an implementation MAY include just the
lower 32 bits of the data sequence number in the data sequence lower 32 bits of the data sequence number in the data sequence
mapping and/or Data ACK as an optimization, and an implementation can mapping and/or Data ACK as an optimization, and an implementation can
make this choice independently for each packet. An implementation make this choice independently for each packet. An implementation
MUST be able to receive and process both 64-bit and 32-bit sequence MUST be able to receive and process both 64-bit and 32-bit sequence
number values, but it is not required that an implementation is able number values, but it is not required that an implementation is able
to send both. to send both.
An implementation MUST send the full 64-bit data sequence number if An implementation MUST send the full 64-bit data sequence number if
it is transmitting at a sufficiently high rate that the 32-bit value it is transmitting at a sufficiently high rate that the 32-bit value
could wrap within the Maximum Segment Lifetime (MSL) [RFC1323]. The could wrap within the Maximum Segment Lifetime (MSL) [RFC7323]. The
lengths of the DSNs used in these values (which may be different) are lengths of the DSNs used in these values (which may be different) are
declared with flags in the DSS option. Implementations MUST accept a declared with flags in the DSS option. Implementations MUST accept a
32-bit DSN and implicitly promote it to a 64-bit quantity by 32-bit DSN and implicitly promote it to a 64-bit quantity by
incrementing the upper 32 bits of sequence number each time the lower incrementing the upper 32 bits of sequence number each time the lower
32 bits wrap. A sanity check MUST be implemented to ensure that a 32 bits wrap. A sanity check MUST be implemented to ensure that a
wrap occurs at an expected time (e.g., the sequence number jumps from wrap occurs at an expected time (e.g., the sequence number jumps from
a very high number to a very low number) and is not triggered by out- a very high number to a very low number) and is not triggered by out-
of-order packets. of-order packets.
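The promotion of a 32-bit DSN can be sketched as follows. Rather than keeping an explicit wrap counter, this illustration picks the 64-bit value closest to the next expected DSN, which is one way (an assumption of this sketch, not the specified method) to avoid counting a wrap when an old, out-of-order packet arrives. The function name is hypothetical.

```python
MOD32 = 1 << 32
HALF32 = 1 << 31
MASK64 = (1 << 64) - 1


def promote_dsn(dsn32, expected64):
    """Expand a 32-bit DSN to 64 bits, relative to the next expected DSN."""
    # Start with the upper 32 bits of the expected DSN.
    candidate = (expected64 & ~(MOD32 - 1)) | dsn32
    if candidate + HALF32 < expected64:
        # Far behind the expected value: the lower 32 bits have wrapped.
        candidate += MOD32
    elif candidate >= expected64 + HALF32 and candidate >= MOD32:
        # Far ahead: an out-of-order packet from before the last wrap.
        candidate -= MOD32
    return candidate & MASK64
```

For example, with an expected DSN of `0x1_FFFF_FFF0`, a received 32-bit value of `0x10` is interpreted as having wrapped into the next 2^32 block.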
As with the standard TCP sequence number, the data sequence number As with the standard TCP sequence number, the data sequence number
skipping to change at page 40, line 32 skipping to change at page 40, line 28
In the event that the available set of paths changes, a host may wish In the event that the available set of paths changes, a host may wish
to signal a change in priority of subflows to the peer (e.g., a to signal a change in priority of subflows to the peer (e.g., a
subflow that was previously set as backup should now take priority subflow that was previously set as backup should now take priority
over all remaining subflows). Therefore, the MP_PRIO option, shown over all remaining subflows). Therefore, the MP_PRIO option, shown
in Figure 11, can be used to change the 'B' flag of the subflow on in Figure 11, can be used to change the 'B' flag of the subflow on
which it is sent. which it is sent.
Another use of the MP_PRIO option is to set the 'B' flag on a subflow Another use of the MP_PRIO option is to set the 'B' flag on a subflow
to cleanly retire its use before closing it and removing it with to cleanly retire its use before closing it and removing it with
REMOVE_ADDR (Section 3.4.2), for example to support make-before-break REMOVE_ADDR (Section 3.4.2), for example to support make-before-break
session continuity. session continuity, where new subflows are added before the
previously used ones are closed.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+-----+-+ +---------------+---------------+-------+-----+-+
| Kind | Length |Subtype| |B| | Kind | Length |Subtype| |B|
+---------------+---------------+-------+-----+-+ +---------------+---------------+-------+-----+-+
Figure 11: Change Subflow Priority (MP_PRIO) Option Figure 11: Change Subflow Priority (MP_PRIO) Option
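The three-octet layout in Figure 11 can be packed and parsed as sketched below. The numeric values are not shown in this excerpt and are taken as assumptions from the wider MPTCP specification: TCP option Kind 30, MP_PRIO subtype 0x5, and a Length of 3. The function names are hypothetical.

```python
import struct

MPTCP_KIND = 30        # assumed: TCP option kind assigned to MPTCP
MP_PRIO_SUBTYPE = 0x5  # assumed: MP_PRIO subtype value
MP_PRIO_LENGTH = 3     # Kind + Length + (Subtype | reserved | B)


def encode_mp_prio(backup):
    """Build an MP_PRIO option; reserved bits are sent as zero."""
    third = (MP_PRIO_SUBTYPE << 4) | (1 if backup else 0)
    return struct.pack("!BBB", MPTCP_KIND, MP_PRIO_LENGTH, third)


def decode_mp_prio(option):
    """Return the 'B' flag of an MP_PRIO option, ignoring reserved bits."""
    kind, length, third = struct.unpack("!BBB", option)
    if (kind != MPTCP_KIND or length != MP_PRIO_LENGTH
            or third >> 4 != MP_PRIO_SUBTYPE):
        raise ValueError("not an MP_PRIO option")
    return bool(third & 0x01)
```

Sending `encode_mp_prio(True)` on a subflow requests that the peer treat that subflow as a backup.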
It should be noted that the backup flag is a request from a data It should be noted that the backup flag is a request from a data
skipping to change at page 43, line 31 skipping to change at page 43, line 31
ADD_ADDR option. If the port is not present in the ADD_ADDR option, ADD_ADDR option. If the port is not present in the ADD_ADDR option,
the HMAC message will nevertheless include two octets of value zero. the HMAC message will nevertheless include two octets of value zero.
The rationale for the HMAC is to prevent unauthorized entities from The rationale for the HMAC is to prevent unauthorized entities from
injecting ADD_ADDR signals in an attempt to hijack a connection. injecting ADD_ADDR signals in an attempt to hijack a connection.
Note that additionally the presence of this HMAC prevents the address Note that additionally the presence of this HMAC prevents the address
being changed in flight unless the key is known by an intermediary. being changed in flight unless the key is known by an intermediary.
If a host receives an ADD_ADDR option for which it cannot validate If a host receives an ADD_ADDR option for which it cannot validate
the HMAC, it SHOULD silently ignore the option. the HMAC, it SHOULD silently ignore the option.
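To illustrate the zero-padding rule for an absent port, the sketch below builds an HMAC input from the Address ID, the address, and the port. The field order, the use of HMAC-SHA256, and the function names are assumptions of this sketch; key selection and truncation of the result are defined elsewhere in the specification and are left to the caller here.

```python
import hashlib
import hmac
import ipaddress


def add_addr_hmac_message(address_id, address, port=None):
    """Assumed HMAC input: Address ID, packed address, 2-octet port."""
    packed = ipaddress.ip_address(address).packed  # 4 or 16 octets
    # An absent port still contributes two octets of value zero.
    port_octets = (port or 0).to_bytes(2, "big")
    return bytes([address_id]) + packed + port_octets


def add_addr_hmac(key, address_id, address, port=None):
    """Full (untruncated) HMAC-SHA256 over the message above."""
    message = add_addr_hmac_message(address_id, address, port)
    return hmac.new(key, message, hashlib.sha256).digest()
```

A receiver that recomputes a different value for a received option would silently ignore that option.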
A set of four flags is present after the subtype and before the A set of four flags is present after the subtype and before the
Address ID. Only the rightmost bit - labeled 'E' - is assigned Address ID. Only the rightmost bit - labeled 'E' - is assigned in
today. The other bits are currently unassigned and MUST be set to this specification. The other bits are currently unassigned and MUST
zero by a sender and MUST be ignored by the receiver. be set to zero by a sender and MUST be ignored by the receiver.
The 'E' flag exists to provide reliability for this option. Because The 'E' flag exists to provide reliability for this option. Because
this option will often be sent on pure ACKs, there is no guarantee of this option will often be sent on pure ACKs, there is no guarantee of
reliability. Therefore, a receiver receiving a fresh ADD_ADDR option reliability. Therefore, a receiver receiving a fresh ADD_ADDR option
(where E=0), will send the same option back to the sender, but not (where E=0), will send the same option back to the sender, but not
including the HMAC, and with E=1. The lack of this echo can be used including the HMAC, and with E=1, to indicate receipt. The lack of
by the initial ADD_ADDR sender to retransmit the ADD_ADDR according this echo can be used by the initial ADD_ADDR sender to retransmit
to local policy. the ADD_ADDR according to local policy.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+-------+---------------+ +---------------+---------------+-------+-------+---------------+
| Kind | Length |Subtype|(rsv)|E| Address ID | | Kind | Length |Subtype|(rsv)|E| Address ID |
+---------------+---------------+-------+-------+---------------+ +---------------+---------------+-------+-------+---------------+
| Address (IPv4 - 4 octets / IPv6 - 16 octets) | | Address (IPv4 - 4 octets / IPv6 - 16 octets) |
+-------------------------------+-------------------------------+ +-------------------------------+-------------------------------+
| Port (2 octets, optional) | | | Port (2 octets, optional) | |
+-------------------------------+ | +-------------------------------+ |
| Truncated HMAC (8 octets, if length > 10 octets) | | Truncated HMAC (8 octets, if E=0) |
| +-------------------------------+ | +-------------------------------+
| | | |
+-------------------------------+ +-------------------------------+
Figure 12: Add Address (ADD_ADDR) Option Figure 12: Add Address (ADD_ADDR) Option
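The layout in Figure 12 and the echo behaviour of the 'E' flag can be illustrated with a short sketch. It assumes the option Kind value 30 and the ADD_ADDR subtype value 0x3, which are defined elsewhere in the specification and not in this excerpt. The helper names pack_add_addr and make_echo are hypothetical, and the truncated HMAC is taken as an opaque 8-octet input.

```python
import ipaddress
import struct
from typing import Optional

MPTCP_KIND = 30          # assumed: TCP option Kind used by MPTCP
ADD_ADDR_SUBTYPE = 0x3   # assumed: subtype value for ADD_ADDR
E_FLAG = 0x1             # rightmost of the four flag bits
HMAC_LEN = 8


def pack_add_addr(address_id: int, address: str,
                  port: Optional[int] = None,
                  truncated_hmac: Optional[bytes] = None) -> bytes:
    """Encode ADD_ADDR. With no HMAC supplied, an E=1 echo is produced."""
    body = ipaddress.ip_address(address).packed   # 4 or 16 octets
    if port is not None:
        body += struct.pack("!H", port)
    flags = E_FLAG
    if truncated_hmac is not None:
        if len(truncated_hmac) != HMAC_LEN:
            raise ValueError("truncated HMAC must be 8 octets")
        body += truncated_hmac
        flags = 0                                  # fresh option: E=0
    length = 4 + len(body)
    header = struct.pack("!BBBB", MPTCP_KIND, length,
                         (ADD_ADDR_SUBTYPE << 4) | flags, address_id)
    return header + body


def make_echo(option: bytes) -> bytes:
    """Turn a received E=0 ADD_ADDR into its echo: drop the HMAC, set E=1."""
    kind, length, sub_flags, address_id = struct.unpack("!BBBB", option[:4])
    if sub_flags & E_FLAG:
        raise ValueError("option is already an echo")
    body = option[4:length - HMAC_LEN]
    return struct.pack("!BBBB", kind, length - HMAC_LEN,
                       sub_flags | E_FLAG, address_id) + body
```

Under these assumptions a fresh option is 16 octets for IPv4 without a port and 30 octets for IPv6 with a port, and each echo is 8 octets shorter than the option it acknowledges.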
Due to the proliferation of NATs, it is reasonably likely that one Due to the proliferation of NATs, it is reasonably likely that one
host may attempt to advertise private addresses [RFC1918]. It is not host may attempt to advertise private addresses [RFC1918]. It is not
desirable to prohibit this, since there may be cases where both hosts desirable to prohibit this, since there may be cases where both hosts
have additional interfaces on the same private network, and a host have additional interfaces on the same private network, and a host
skipping to change at page 44, line 38 skipping to change at page 44, line 38
uniquely identifies the connection to the receiving host. If the uniquely identifies the connection to the receiving host. If the
token is unknown, the host will return with a RST. In the unlikely token is unknown, the host will return with a RST. In the unlikely
event that the token is valid at the receiving host, subflow setup event that the token is valid at the receiving host, subflow setup
will continue, but the HMAC exchange must occur for authentication. will continue, but the HMAC exchange must occur for authentication.
This will fail, and will provide sufficient protection against two This will fail, and will provide sufficient protection against two
unconnected hosts accidentally setting up a new subflow upon the unconnected hosts accidentally setting up a new subflow upon the
signal of a private address. Further security considerations around signal of a private address. Further security considerations around
the issue of ADD_ADDR messages that accidentally misdirect, or the issue of ADD_ADDR messages that accidentally misdirect, or
maliciously direct, new MP_JOIN attempts are discussed in Section 5. maliciously direct, new MP_JOIN attempts are discussed in Section 5.
Ideally, ADD_ADDR and REMOVE_ADDR options would be sent reliably, and
in order, to the other end. This would ensure that this address
management does not unnecessarily cause an outage in the connection
when remove/add addresses are processed in reverse order, and also to
ensure that all possible paths are used. Note, however, that losing
reliability and ordering will not break the multipath connections, it
will just reduce the opportunity to open multipath paths and to
survive different patterns of path failures.
Therefore, implementing reliability signals for these MPTCP options
is not necessary. In order to minimize the impact of the loss of
these options, however, it is RECOMMENDED that a sender should send
these options on all available subflows. If these options need to be
received in order, an implementation SHOULD only send one ADD_ADDR/
REMOVE_ADDR option per RTT, to minimize the risk of misordering.
A host that receives an ADD_ADDR but finds a connection set up to A host that receives an ADD_ADDR but finds a connection set up to
that IP address and port number is unsuccessful SHOULD NOT perform that IP address and port number is unsuccessful SHOULD NOT perform
further connection attempts to this address/port combination for this further connection attempts to this address/port combination for this
connection. A sender that wants to trigger a new incoming connection connection. A sender that wants to trigger a new incoming connection
attempt on a previously advertised address/port combination can attempt on a previously advertised address/port combination can
therefore refresh ADD_ADDR information by sending the option again. therefore refresh ADD_ADDR information by sending the option again.
A host can therefore send an ADD_ADDR message with an already A host can therefore send an ADD_ADDR message with an already
assigned Address ID, but the Address MUST be the same as previously assigned Address ID, but the Address MUST be the same as previously
assigned to this Address ID. A new ADD_ADDR may have the same, or assigned to this Address ID. A new ADD_ADDR may have the same, or
skipping to change at page 58, line 45 skipping to change at page 58, line 39
The use of crypto capability bits in the initial connection handshake The use of crypto capability bits in the initial connection handshake
to negotiate use of a particular algorithm allows the deployment of to negotiate use of a particular algorithm allows the deployment of
additional crypto mechanisms in the future. Note that this would be additional crypto mechanisms in the future. Note that this would be
susceptible to bid-down attacks only if the attacker was on-path (and susceptible to bid-down attacks only if the attacker was on-path (and
thus would be able to modify the data anyway). The security thus would be able to modify the data anyway). The security
mechanism presented in this document should therefore protect against mechanism presented in this document should therefore protect against
all forms of flooding and hijacking attacks discussed in [RFC6181]. all forms of flooding and hijacking attacks discussed in [RFC6181].
The version negotiation specified in Section 3.1, if differing MPTCP The version negotiation specified in Section 3.1, if differing MPTCP
versions shared a common negotiation format, would allow an on-path versions shared a common negotiation format, would allow an on-path
attacker to apply a theoretical bid-down attack. However, since the attacker to apply a theoretical bid-down attack. Since the v1 and v0
v1 and v0 protocols have a different handshake, this is not an attack protocols have a different handshake, such an attack would require
that can be applied here. Furthermore, an on-path attacker would the client to re-establish the connection using v0, and this being
have access to the raw data, negating any other TCP-level security supported by the server. Note that an on-path attacker would have
access to the raw data, negating any other TCP-level security
mechanisms. Also a change from [RFC6824] has removed the subflow mechanisms. Also a change from [RFC6824] has removed the subflow
identifier from the MP_PRIO option (Section 3.3.8), to remove the identifier from the MP_PRIO option (Section 3.3.8), to remove the
theoretical attack where a subflow could be placed in "backup" mode theoretical attack where a subflow could be placed in "backup" mode
by an attacker. by an attacker.
During normal operation, regular TCP protection mechanisms (such as During normal operation, regular TCP protection mechanisms (such as
ensuring sequence numbers are in-window) will provide the same level ensuring sequence numbers are in-window) will provide the same level
of protection against attacks on individual TCP subflows as exists of protection against attacks on individual TCP subflows as exists
for regular TCP today. Implementations will introduce additional for regular TCP today. Implementations will introduce additional
buffers compared to regular TCP, to reassemble data at the connection buffers compared to regular TCP, to reassemble data at the connection
skipping to change at page 63, line 50 skipping to change at page 63, line 48
The authors gratefully acknowledge significant input into this The authors gratefully acknowledge significant input into this
document from Sebastien Barre and Andrew McDonald. document from Sebastien Barre and Andrew McDonald.
The authors also wish to acknowledge reviews and contributions from The authors also wish to acknowledge reviews and contributions from
Iljitsch van Beijnum, Lars Eggert, Marcelo Bagnulo, Robert Hancock, Iljitsch van Beijnum, Lars Eggert, Marcelo Bagnulo, Robert Hancock,
Pasi Sarolahti, Toby Moncaster, Philip Eardley, Sergio Lembo, Pasi Sarolahti, Toby Moncaster, Philip Eardley, Sergio Lembo,
Lawrence Conroy, Yoshifumi Nishida, Bob Briscoe, Stein Gjessing, Lawrence Conroy, Yoshifumi Nishida, Bob Briscoe, Stein Gjessing,
Andrew McGregor, Georg Hampel, Anumita Biswas, Wes Eddy, Alexey Andrew McGregor, Georg Hampel, Anumita Biswas, Wes Eddy, Alexey
Melnikov, Francis Dupont, Adrian Farrel, Barry Leiba, Robert Sparks, Melnikov, Francis Dupont, Adrian Farrel, Barry Leiba, Robert Sparks,
Sean Turner, Stephen Farrell, Martin Stiemerling, Gregory Detal, Sean Turner, Stephen Farrell, Martin Stiemerling, Gregory Detal,
Fabien Duchene, Xavier de Foy, and Rahul Jadhav. Fabien Duchene, Xavier de Foy, Rahul Jadhav, and Klemens Schragel.
8. IANA Considerations 8. IANA Considerations
This document obsoletes [RFC6824] and as such IANA is requested to This document obsoletes [RFC6824] and as such IANA is requested to
update the TCP option space registry to point to this document for update the TCP option space registry to point to this document for
Multipath TCP, as follows: Multipath TCP, as follows:
+------+--------+-----------------------+---------------+ +------+--------+-----------------------+---------------+
| Kind | Length | Meaning | Reference | | Kind | Length | Meaning | Reference |
+------+--------+-----------------------+---------------+ +------+--------+-----------------------+---------------+
skipping to change at page 66, line 27 skipping to change at page 66, line 27
| | | Section 3.2 | | | | Section 3.2 |
+-------+----------------------------------------+------------------+ +-------+----------------------------------------+------------------+
Table 3: MPTCP Handshake Algorithms Table 3: MPTCP Handshake Algorithms
Note that the meanings of bits D through H can be dependent upon bit Note that the meanings of bits D through H can be dependent upon bit
B, depending on how Extensibility is defined in future B, depending on how Extensibility is defined in future
specifications; see Section 3.1 for more information. specifications; see Section 3.1 for more information.
Future assignments in this registry are also to be defined by Future assignments in this registry are also to be defined by
Standards Action as defined by [RFC5226]. Assignments consist of the Standards Action as defined by [RFC8126]. Assignments consist of the
value of the flags, a symbolic name for the algorithm, and a value of the flags, a symbolic name for the algorithm, and a
reference to its specification. reference to its specification.
8.3. MP_TCPRST Reason Codes 8.3. MP_TCPRST Reason Codes
IANA is requested to create a further sub-registry, "MP_TCPRST Reason IANA is requested to create a further sub-registry, "MP_TCPRST Reason
Codes" under the "Transmission Control Protocol (TCP) Parameters" Codes" under the "Transmission Control Protocol (TCP) Parameters"
registry, based on the reason code in MP_TCPRST (Section 3.6): registry, based on the reason code in MP_TCPRST (Section 3.6):
+------+-----------------------------+----------------------------+ +------+-----------------------------+----------------------------+
skipping to change at page 67, line 15 skipping to change at page 67, line 15
9. References 9. References
9.1. Normative References 9.1. Normative References
[RFC0793] Postel, J., "Transmission Control Protocol", STD 7, [RFC0793] Postel, J., "Transmission Control Protocol", STD 7,
RFC 793, DOI 10.17487/RFC0793, September 1981, RFC 793, DOI 10.17487/RFC0793, September 1981,
<https://www.rfc-editor.org/info/rfc793>. <https://www.rfc-editor.org/info/rfc793>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-
<https://www.rfc-editor.org/info/rfc2119>. editor.org/info/rfc2119>.
[RFC6182] Ford, A., Raiciu, C., Handley, M., Barre, S., and J. [RFC6182] Ford, A., Raiciu, C., Handley, M., Barre, S., and J.
Iyengar, "Architectural Guidelines for Multipath TCP Iyengar, "Architectural Guidelines for Multipath TCP
Development", RFC 6182, DOI 10.17487/RFC6182, March 2011, Development", RFC 6182, DOI 10.17487/RFC6182, March 2011,
<https://www.rfc-editor.org/info/rfc6182>. <https://www.rfc-editor.org/info/rfc6182>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>. May 2017, <https://www.rfc-editor.org/info/rfc8174>.
[SHS] National Institute of Science and Technology, "Secure Hash [SHS] National Institute of Science and Technology, "Secure Hash
Standard", Federal Information Processing Standard Standard", Federal Information Processing Standard
(FIPS) 180-4, August 2015, (FIPS) 180-4, August 2015,
<http://nvlpubs.nist.gov/nistpubs/FIPS/ <http://nvlpubs.nist.gov/nistpubs/FIPS/
NIST.FIPS.180-4.pdf>. NIST.FIPS.180-4.pdf>.
9.2. Informative References 9.2. Informative References
[deployments]
Bonaventure, O. and S. Seo, "Multipath TCP Deployments",
IETF Journal 2016, November 2016,
<https://www.ietfjournal.org/multipath-tcp-deployments/>.
[howhard] Raiciu, C., Paasch, C., Barre, S., Ford, A., Honda, M., [howhard] Raiciu, C., Paasch, C., Barre, S., Ford, A., Honda, M.,
Duchene, F., Bonaventure, O., and M. Handley, "How Hard Duchene, F., Bonaventure, O., and M. Handley, "How Hard
Can It Be? Designing and Implementing a Deployable Can It Be? Designing and Implementing a Deployable
Multipath TCP", Usenix Symposium on Networked Systems Multipath TCP", Usenix Symposium on Networked Systems
Design and Implementation 2012, 2012, Design and Implementation 2012, 2012,
<https://www.usenix.org/conference/nsdi12/how-hard-can-it- <https://www.usenix.org/conference/nsdi12/how-hard-can-it-
be-designing-and-implementing-deployable-multipath-tcp>. be-designing-and-implementing-deployable-multipath-tcp>.
[norm] Handley, M., Paxson, V., and C. Kreibich, "Network [norm] Handley, M., Paxson, V., and C. Kreibich, "Network
Intrusion Detection: Evasion, Traffic Normalization, and Intrusion Detection: Evasion, Traffic Normalization, and
End-to-End Protocol Semantics", Usenix Security 2001, End-to-End Protocol Semantics", Usenix Security 2001,
2001, 2001,
<http://www.usenix.org/events/sec01/full_papers/handley/ <http://www.usenix.org/events/sec01/full_papers/handley/
handley.pdf>. handley.pdf>.
[RFC1122] Braden, R., Ed., "Requirements for Internet Hosts - [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts -
Communication Layers", STD 3, RFC 1122, Communication Layers", STD 3, RFC 1122,
DOI 10.17487/RFC1122, October 1989, DOI 10.17487/RFC1122, October 1989, <https://www.rfc-
<https://www.rfc-editor.org/info/rfc1122>. editor.org/info/rfc1122>.
[RFC1323] Jacobson, V., Braden, R., and D. Borman, "TCP Extensions
for High Performance", RFC 1323, DOI 10.17487/RFC1323, May
1992, <https://www.rfc-editor.org/info/rfc1323>.
[RFC1918] Rekhter, Y., Moskowitz, B., Karrenberg, D., de Groot, G., [RFC1918] Rekhter, Y., Moskowitz, B., Karrenberg, D., de Groot, G.,
and E. Lear, "Address Allocation for Private Internets", and E. Lear, "Address Allocation for Private Internets",
BCP 5, RFC 1918, DOI 10.17487/RFC1918, February 1996, BCP 5, RFC 1918, DOI 10.17487/RFC1918, February 1996,
<https://www.rfc-editor.org/info/rfc1918>. <https://www.rfc-editor.org/info/rfc1918>.
[RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
Selective Acknowledgment Options", RFC 2018, Selective Acknowledgment Options", RFC 2018,
DOI 10.17487/RFC2018, October 1996, DOI 10.17487/RFC2018, October 1996, <https://www.rfc-
<https://www.rfc-editor.org/info/rfc2018>. editor.org/info/rfc2018>.
[RFC2104] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed- [RFC2104] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-
Hashing for Message Authentication", RFC 2104, Hashing for Message Authentication", RFC 2104,
DOI 10.17487/RFC2104, February 1997, DOI 10.17487/RFC2104, February 1997, <https://www.rfc-
<https://www.rfc-editor.org/info/rfc2104>. editor.org/info/rfc2104>.
[RFC2979] Freed, N., "Behavior of and Requirements for Internet [RFC2979] Freed, N., "Behavior of and Requirements for Internet
Firewalls", RFC 2979, DOI 10.17487/RFC2979, October 2000, Firewalls", RFC 2979, DOI 10.17487/RFC2979, October 2000,
<https://www.rfc-editor.org/info/rfc2979>. <https://www.rfc-editor.org/info/rfc2979>.
[RFC2992] Hopps, C., "Analysis of an Equal-Cost Multi-Path [RFC2992] Hopps, C., "Analysis of an Equal-Cost Multi-Path
Algorithm", RFC 2992, DOI 10.17487/RFC2992, November 2000, Algorithm", RFC 2992, DOI 10.17487/RFC2992, November 2000,
<https://www.rfc-editor.org/info/rfc2992>. <https://www.rfc-editor.org/info/rfc2992>.
[RFC3022] Srisuresh, P. and K. Egevang, "Traditional IP Network [RFC3022] Srisuresh, P. and K. Egevang, "Traditional IP Network
Address Translator (Traditional NAT)", RFC 3022, Address Translator (Traditional NAT)", RFC 3022,
DOI 10.17487/RFC3022, January 2001, DOI 10.17487/RFC3022, January 2001, <https://www.rfc-
<https://www.rfc-editor.org/info/rfc3022>. editor.org/info/rfc3022>.
[RFC3135] Border, J., Kojo, M., Griner, J., Montenegro, G., and Z. [RFC3135] Border, J., Kojo, M., Griner, J., Montenegro, G., and Z.
Shelby, "Performance Enhancing Proxies Intended to Shelby, "Performance Enhancing Proxies Intended to
Mitigate Link-Related Degradations", RFC 3135, Mitigate Link-Related Degradations", RFC 3135,
DOI 10.17487/RFC3135, June 2001, DOI 10.17487/RFC3135, June 2001, <https://www.rfc-
<https://www.rfc-editor.org/info/rfc3135>. editor.org/info/rfc3135>.
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP", of Explicit Congestion Notification (ECN) to IP",
RFC 3168, DOI 10.17487/RFC3168, September 2001, RFC 3168, DOI 10.17487/RFC3168, September 2001,
<https://www.rfc-editor.org/info/rfc3168>. <https://www.rfc-editor.org/info/rfc3168>.
[RFC4086] Eastlake 3rd, D., Schiller, J., and S. Crocker, [RFC4086] Eastlake 3rd, D., Schiller, J., and S. Crocker,
"Randomness Requirements for Security", BCP 106, RFC 4086, "Randomness Requirements for Security", BCP 106, RFC 4086,
DOI 10.17487/RFC4086, June 2005, DOI 10.17487/RFC4086, June 2005, <https://www.rfc-
<https://www.rfc-editor.org/info/rfc4086>. editor.org/info/rfc4086>.
[RFC4987] Eddy, W., "TCP SYN Flooding Attacks and Common [RFC4987] Eddy, W., "TCP SYN Flooding Attacks and Common
Mitigations", RFC 4987, DOI 10.17487/RFC4987, August 2007, Mitigations", RFC 4987, DOI 10.17487/RFC4987, August 2007,
<https://www.rfc-editor.org/info/rfc4987>. <https://www.rfc-editor.org/info/rfc4987>.
[RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an
IANA Considerations Section in RFCs", RFC 5226,
DOI 10.17487/RFC5226, May 2008,
<https://www.rfc-editor.org/info/rfc5226>.
[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
<https://www.rfc-editor.org/info/rfc5681>. <https://www.rfc-editor.org/info/rfc5681>.
[RFC5961] Ramaiah, A., Stewart, R., and M. Dalal, "Improving TCP's [RFC5961] Ramaiah, A., Stewart, R., and M. Dalal, "Improving TCP's
Robustness to Blind In-Window Attacks", RFC 5961, Robustness to Blind In-Window Attacks", RFC 5961,
DOI 10.17487/RFC5961, August 2010, DOI 10.17487/RFC5961, August 2010, <https://www.rfc-
<https://www.rfc-editor.org/info/rfc5961>. editor.org/info/rfc5961>.
[RFC6181] Bagnulo, M., "Threat Analysis for TCP Extensions for [RFC6181] Bagnulo, M., "Threat Analysis for TCP Extensions for
Multipath Operation with Multiple Addresses", RFC 6181, Multipath Operation with Multiple Addresses", RFC 6181,
DOI 10.17487/RFC6181, March 2011, DOI 10.17487/RFC6181, March 2011, <https://www.rfc-
<https://www.rfc-editor.org/info/rfc6181>. editor.org/info/rfc6181>.
[RFC6234] Eastlake 3rd, D. and T. Hansen, "US Secure Hash Algorithms [RFC6234] Eastlake 3rd, D. and T. Hansen, "US Secure Hash Algorithms
(SHA and SHA-based HMAC and HKDF)", RFC 6234, (SHA and SHA-based HMAC and HKDF)", RFC 6234,
DOI 10.17487/RFC6234, May 2011, DOI 10.17487/RFC6234, May 2011, <https://www.rfc-
<https://www.rfc-editor.org/info/rfc6234>. editor.org/info/rfc6234>.
[RFC6356] Raiciu, C., Handley, M., and D. Wischik, "Coupled [RFC6356] Raiciu, C., Handley, M., and D. Wischik, "Coupled
Congestion Control for Multipath Transport Protocols", Congestion Control for Multipath Transport Protocols",
RFC 6356, DOI 10.17487/RFC6356, October 2011, RFC 6356, DOI 10.17487/RFC6356, October 2011,
<https://www.rfc-editor.org/info/rfc6356>. <https://www.rfc-editor.org/info/rfc6356>.
[RFC6528] Gont, F. and S. Bellovin, "Defending against Sequence [RFC6528] Gont, F. and S. Bellovin, "Defending against Sequence
Number Attacks", RFC 6528, DOI 10.17487/RFC6528, February Number Attacks", RFC 6528, DOI 10.17487/RFC6528, February
2012, <https://www.rfc-editor.org/info/rfc6528>. 2012, <https://www.rfc-editor.org/info/rfc6528>.
[RFC6824] Ford, A., Raiciu, C., Handley, M., and O. Bonaventure, [RFC6824] Ford, A., Raiciu, C., Handley, M., and O. Bonaventure,
"TCP Extensions for Multipath Operation with Multiple "TCP Extensions for Multipath Operation with Multiple
Addresses", RFC 6824, DOI 10.17487/RFC6824, January 2013, Addresses", RFC 6824, DOI 10.17487/RFC6824, January 2013,
<https://www.rfc-editor.org/info/rfc6824>. <https://www.rfc-editor.org/info/rfc6824>.
[RFC6897] Scharf, M. and A. Ford, "Multipath TCP (MPTCP) Application [RFC6897] Scharf, M. and A. Ford, "Multipath TCP (MPTCP) Application
Interface Considerations", RFC 6897, DOI 10.17487/RFC6897, Interface Considerations", RFC 6897, DOI 10.17487/RFC6897,
March 2013, <https://www.rfc-editor.org/info/rfc6897>. March 2013, <https://www.rfc-editor.org/info/rfc6897>.
[RFC7323] Borman, D., Braden, B., Jacobson, V., and R.
Scheffenegger, Ed., "TCP Extensions for High Performance",
RFC 7323, DOI 10.17487/RFC7323, September 2014,
<https://www.rfc-editor.org/info/rfc7323>.
[RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP [RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP
Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014, Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014,
<https://www.rfc-editor.org/info/rfc7413>. <https://www.rfc-editor.org/info/rfc7413>.
[RFC7430] Bagnulo, M., Paasch, C., Gont, F., Bonaventure, O., and C. [RFC7430] Bagnulo, M., Paasch, C., Gont, F., Bonaventure, O., and C.
Raiciu, "Analysis of Residual Threats and Possible Fixes Raiciu, "Analysis of Residual Threats and Possible Fixes
for Multipath TCP (MPTCP)", RFC 7430, for Multipath TCP (MPTCP)", RFC 7430,
DOI 10.17487/RFC7430, July 2015, DOI 10.17487/RFC7430, July 2015, <https://www.rfc-
<https://www.rfc-editor.org/info/rfc7430>. editor.org/info/rfc7430>.
[RFC8041] Bonaventure, O., Paasch, C., and G. Detal, "Use Cases and
Operational Experience with Multipath TCP", RFC 8041,
DOI 10.17487/RFC8041, January 2017, <https://www.rfc-
editor.org/info/rfc8041>.
[RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for
Writing an IANA Considerations Section in RFCs", BCP 26,
RFC 8126, DOI 10.17487/RFC8126, June 2017,
<https://www.rfc-editor.org/info/rfc8126>.
[TCPLO] Ramaiah, A., "TCP option space extension", Work [TCPLO] Ramaiah, A., "TCP option space extension", Work
in Progress, March 2012. in Progress, March 2012.
Appendix A. Notes on Use of TCP Options Appendix A. Notes on Use of TCP Options
The TCP option space is limited due to the length of the Data Offset The TCP option space is limited due to the length of the Data Offset
field in the TCP header (4 bits), which defines the TCP header length field in the TCP header (4 bits), which defines the TCP header length
in 32-bit words. With the standard TCP header being 20 bytes, this in 32-bit words. With the standard TCP header being 20 bytes, this
leaves a maximum of 40 bytes for options, and many of these may leaves a maximum of 40 bytes for options, and many of these may
skipping to change at page 71, line 29 skipping to change at page 71, line 29
bytes) options. Together these sum to 19 bytes. Some operating bytes) options. Together these sum to 19 bytes. Some operating
systems appear to pad each option up to a word boundary, thus using systems appear to pad each option up to a word boundary, thus using
24 bytes (a brief survey suggests Windows XP and Mac OS X do this, 24 bytes (a brief survey suggests Windows XP and Mac OS X do this,
whereas Linux does not). Optimistically, therefore, we have 21 bytes whereas Linux does not). Optimistically, therefore, we have 21 bytes
spare, or 16 if it has to be word-aligned. In either case, however, spare, or 16 if it has to be word-aligned. In either case, however,
the SYN versions of Multipath Capable (12 bytes) and Join (12 or 16 the SYN versions of Multipath Capable (12 bytes) and Join (12 or 16
bytes) options will fit in this remaining space. bytes) options will fit in this remaining space.
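The option-space arithmetic above can be checked with a few lines of code. This is only a worked example using the figures quoted in this appendix (a 4-bit Data Offset, a 20-byte base header, and 19 or 24 bytes already used in the SYN); the function names are hypothetical.

```python
DATA_OFFSET_BITS = 4      # width of the Data Offset field
WORD_BYTES = 4            # Data Offset counts 32-bit words
BASE_HEADER_BYTES = 20    # standard TCP header without options

# Largest header the Data Offset field can describe, minus the fixed part.
MAX_HEADER_BYTES = (2 ** DATA_OFFSET_BITS - 1) * WORD_BYTES
OPTION_SPACE = MAX_HEADER_BYTES - BASE_HEADER_BYTES


def spare_bytes(already_used: int) -> int:
    """Option bytes left after the options already present in the segment."""
    return OPTION_SPACE - already_used


def fits(option_len: int, already_used: int,
         word_aligned: bool = False) -> bool:
    """Check whether an option fits, optionally padding it to a word."""
    if word_aligned:
        option_len = -(-option_len // WORD_BYTES) * WORD_BYTES
    return option_len <= spare_bytes(already_used)


if __name__ == "__main__":
    print(OPTION_SPACE)         # total option space in bytes
    print(spare_bytes(19))      # SYN options packed without padding
    print(spare_bytes(24))      # SYN options padded to word boundaries
    print(fits(16, 24, True))   # largest SYN Join in the padded case
```

With these inputs the spare space is 21 bytes unpadded and 16 bytes when word-aligned, so the 12-byte and 16-byte SYN options both fit in either case.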
Note that due to the use of a 64-bit data-level sequence space, it is Note that due to the use of a 64-bit data-level sequence space, it is
feasible that MPTCP will not require the timestamp option for feasible that MPTCP will not require the timestamp option for
protection against wrapped sequence numbers (PAWS [RFC1323]), since protection against wrapped sequence numbers (PAWS [RFC7323]), since
the data-level sequence space has far less chance of wrapping. the data-level sequence space has far less chance of wrapping.
Confirmation of the validity of this optimisation is for further Confirmation of the validity of this optimisation is for further
study. study.
TCP data packets typically carry timestamp options in every packet, TCP data packets typically carry timestamp options in every packet,
taking 10 bytes (or 12 with padding). That leaves 30 bytes (or 28, taking 10 bytes (or 12 with padding). That leaves 30 bytes (or 28,
if word-aligned). The Data Sequence Signal (DSS) option varies in if word-aligned). The Data Sequence Signal (DSS) option varies in
length depending on whether the data sequence mapping and DATA_ACK length depending on whether the data sequence mapping and DATA_ACK
are included, and whether the sequence numbers in use are 4 or 8 are included, and whether the sequence numbers in use are 4 or 8
octets. The maximum size of the DSS option is 28 bytes, so even that octets. The maximum size of the DSS option is 28 bytes, so even that
skipping to change at page 72, line 29 skipping to change at page 72, line 29
The ADD_ADDR option can be between 16 and 30 bytes, depending on The ADD_ADDR option can be between 16 and 30 bytes, depending on
whether IPv4 or IPv6 is used, and whether or not the port number is whether IPv4 or IPv6 is used, and whether or not the port number is
present. It is unlikely that such signaling would fit in a data present. It is unlikely that such signaling would fit in a data
packet (although if there is space, it is fine to include it). It is packet (although if there is space, it is fine to include it). It is
recommended to use duplicate ACKs with no other payload or options in recommended to use duplicate ACKs with no other payload or options in
order to transmit these rare signals. Note this is the reason for order to transmit these rare signals. Note this is the reason for
mandating that duplicate ACKs with MPTCP options are not taken as a mandating that duplicate ACKs with MPTCP options are not taken as a
signal of congestion. signal of congestion.
Finally, there are issues with reliable delivery of options. As
options can also be sent on pure ACKs, these are not reliably sent.
This is not an issue for DATA_ACK due to their cumulative nature, but
may be an issue for ADD_ADDR/REMOVE_ADDR options. Here, it is
recommended to send these options redundantly (whether on multiple
paths or on the same path on a number of ACKs -- but interspersed
with data in order to avoid interpretation as congestion). The cases
where options are stripped by middleboxes are discussed in Section 6.
Appendix B. TCP Fast Open and MPTCP Appendix B. TCP Fast Open and MPTCP
TCP Fast Open (TFO) is an experimental TCP extension, described in TCP Fast Open (TFO) is an experimental TCP extension, described in
[RFC7413], which has been introduced to allow sending data one RTT [RFC7413], which has been introduced to allow sending data one RTT
earlier than with regular TCP. This is considered a valuable gain as earlier than with regular TCP. This is considered a valuable gain as
very short connections are very common, especially for HTTP request/ very short connections are very common, especially for HTTP request/
response schemes. It achieves this by sending the SYN-segment response schemes. It achieves this by sending the SYN-segment
together with the application's data and allowing the listener to together with the application's data and allowing the listener to
reply immediately with data after the SYN/ACK. [RFC7413] secures reply immediately with data after the SYN/ACK. [RFC7413] secures
this mechanism, by using a new TCP option that includes a cookie this mechanism, by using a new TCP option that includes a cookie
skipping to change at page 77, line 27 skipping to change at page 77, line 27
C.1.2. Sending Side C.1.2. Sending Side
SND.UNA (64 bits): This is the data sequence number of the next byte SND.UNA (64 bits): This is the data sequence number of the next byte
to be acknowledged, at the MPTCP connection level. This variable to be acknowledged, at the MPTCP connection level. This variable
is updated upon reception of a DSS option containing a DATA_ACK. is updated upon reception of a DSS option containing a DATA_ACK.
SND.NXT (64 bits): This is the data sequence number of the next byte SND.NXT (64 bits): This is the data sequence number of the next byte
to be sent. SND.NXT is used to determine the value of the DSN in to be sent. SND.NXT is used to determine the value of the DSN in
the DSS option. the DSS option.
SND.WND (32 bits with RFC 1323, 16 bits otherwise): This is the SND.WND (32 bits with RFC 7323, 16 bits otherwise): This is the
sending window. MPTCP maintains the sending window at the MPTCP sending window. MPTCP maintains the sending window at the MPTCP
connection level and the same window is shared by all subflows. connection level and the same window is shared by all subflows.
All subflows use the MPTCP connection level SND.WND to compute the All subflows use the MPTCP connection level SND.WND to compute the
SEQ.WND value that is sent in each transmitted segment. SEQ.WND value that is sent in each transmitted segment.
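The connection-level sending variables above can be illustrated with a
minimal sketch. It is not taken from the draft: the class name, the
method names, and the rule for accepting a DATA_ACK (it must fall
after SND.UNA and no later than SND.NXT, compared modulo 2^64) are
assumptions made for illustration only.

```python
from dataclasses import dataclass

MOD64 = 1 << 64  # data sequence numbers are 64 bits wide and wrap around


def seq_between(low: int, x: int, high: int) -> bool:
    """True if low < x <= high in 64-bit modular sequence space."""
    return 0 < (x - low) % MOD64 <= (high - low) % MOD64


@dataclass
class ConnSendState:
    """Hypothetical holder for the connection-level sending variables."""
    snd_una: int  # DSN of the next byte to be acknowledged
    snd_nxt: int  # DSN of the next byte to be sent
    snd_wnd: int  # connection-level window shared by all subflows

    def on_data_ack(self, data_ack: int) -> bool:
        """Advance SND.UNA when a DATA_ACK covers new, already-sent data."""
        if seq_between(self.snd_una, data_ack, self.snd_nxt):
            self.snd_una = data_ack
            return True
        return False  # old or out-of-range DATA_ACK: state is unchanged

    def usable_window(self) -> int:
        """Bytes that may still be sent, summed over all subflows."""
        in_flight = (self.snd_nxt - self.snd_una) % MOD64
        return max(0, self.snd_wnd - in_flight)
```

Because the window is held once per connection, every subflow would
consult the same usable_window() value before sending, rather than
keeping a window of its own.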
C.1.3. Receiving Side C.1.3. Receiving Side
RCV.NXT (64 bits): This is the data sequence number of the next byte RCV.NXT (64 bits): This is the data sequence number of the next byte
that is expected on the MPTCP connection. This state variable is that is expected on the MPTCP connection. This state variable is
modified upon reception of in-order data. The value of RCV.NXT is modified upon reception of in-order data. The value of RCV.NXT is
used to specify the DATA_ACK that is sent in the DSS option on all used to specify the DATA_ACK that is sent in the DSS option on all
subflows. subflows.
RCV.WND (32 bits with RFC 1323, 16 bits otherwise): This is the RCV.WND (32 bits with RFC 7323, 16 bits otherwise): This is the
connection-level receive window, which is the maximum of the connection-level receive window, which is the maximum of the
RCV.WND on all the subflows. RCV.WND on all the subflows.
C.2. TCP Control Blocks C.2. TCP Control Blocks
The MPTCP control block also contains a list of the TCP control The MPTCP control block also contains a list of the TCP control
blocks that are associated with the MPTCP connection. blocks that are associated with the MPTCP connection.
Note that the TCP control block on the TCP subflows does not contain Note that the TCP control block on the TCP subflows does not contain
the RCV.WND and SND.WND state variables as these are maintained at the RCV.WND and SND.WND state variables as these are maintained at
skipping to change at page 78, line 25 skipping to change at page 78, line 25
be sent on the subflow. SND.NXT is used to set the value of be sent on the subflow. SND.NXT is used to set the value of
SEG.SEQ upon transmission of the next segment. SEG.SEQ upon transmission of the next segment.
C.2.2. Receiving Side C.2.2. Receiving Side
RCV.NXT (32 bits): This is the sequence number of the next byte that RCV.NXT (32 bits): This is the sequence number of the next byte that
is expected on the subflow. This state variable is modified upon is expected on the subflow. This state variable is modified upon
reception of in-order segments. The value of RCV.NXT is copied to reception of in-order segments. The value of RCV.NXT is copied to
the SEG.ACK field of the next segments transmitted on the subflow. the SEG.ACK field of the next segments transmitted on the subflow.
RCV.WND (32 bits with RFC 1323, 16 bits otherwise): This is the RCV.WND (32 bits with RFC 7323, 16 bits otherwise): This is the
subflow-level receive window that is updated with the window field subflow-level receive window that is updated with the window field
from the segments received on this subflow. from the segments received on this subflow.
Appendix D. Finite State Machine Appendix D. Finite State Machine
The diagram in Figure 22 shows the Finite State Machine for The diagram in Figure 22 shows the Finite State Machine for
connection-level closure. This illustrates how the DATA_FIN connection-level closure. This illustrates how the DATA_FIN
connection-level signal (indicated as the DFIN flag on a DATA_ACK) connection-level signal (indicated in the diagram as the DFIN flag on
interacts with subflow-level FINs, and permits "break-before-make" a DATA_ACK) interacts with subflow-level FINs, and permits "break-
handover between subflows. before-make" handover between subflows.
+---------+ +---------+
| M_ESTAB | | M_ESTAB |
+---------+ +---------+
M_CLOSE | | rcv DATA_FIN M_CLOSE | | rcv DATA_FIN
------- | | ------- ------- | | -------
+---------+ snd DATA_FIN / \ snd DATA_ACK[DFIN] +---------+ +---------+ snd DATA_FIN / \ snd DATA_ACK[DFIN] +---------+
| M_FIN |<----------------- ------------------->| M_CLOSE | | M_FIN |<----------------- ------------------->| M_CLOSE |
| WAIT-1 |--------------------------- | WAIT | | WAIT-1 |--------------------------- | WAIT |
+---------+ rcv DATA_FIN \ +---------+ +---------+ rcv DATA_FIN \ +---------+
skipping to change at page 79, line 34 skipping to change at page 79, line 34
| snd DATA_ACK[DFIN] V delete MPTCP PCB V | snd DATA_ACK[DFIN] V delete MPTCP PCB V
\ +-----------+ +---------+ \ +-----------+ +---------+
------------------------>|M_TIME WAIT|----------------->| M_CLOSED| ------------------------>|M_TIME WAIT|----------------->| M_CLOSED|
+-----------+ +---------+ +-----------+ +---------+
All subflows in CLOSED All subflows in CLOSED
------------ ------------
delete MPTCP PCB delete MPTCP PCB
Figure 22: Finite State Machine for Connection Closure Figure 22: Finite State Machine for Connection Closure
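As a rough illustration, the transitions that are visible in this
excerpt of Figure 22 can be written as a lookup table. This is a
sketch only: the enum, the event strings, and the step() helper are
hypothetical names, and the transitions that the diff elides are
deliberately left out rather than guessed.

```python
from enum import Enum, auto


class State(Enum):
    M_ESTAB = auto()
    M_FIN_WAIT_1 = auto()
    M_CLOSE_WAIT = auto()
    M_TIME_WAIT = auto()
    M_CLOSED = auto()


# (current state, event) -> (action, next state).
# Only the transitions shown in the excerpt above are listed.
TRANSITIONS = {
    (State.M_ESTAB, "M_CLOSE"): ("snd DATA_FIN", State.M_FIN_WAIT_1),
    (State.M_ESTAB, "rcv DATA_FIN"): ("snd DATA_ACK[DFIN]", State.M_CLOSE_WAIT),
    (State.M_TIME_WAIT, "all subflows CLOSED"): ("delete MPTCP PCB", State.M_CLOSED),
}


def step(state: State, event: str):
    """Return (action, next_state), or raise if the pair is not in the table."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition for {event!r} in {state.name}") from None
```

A table keyed on (state, event) keeps the connection-level closure
logic separate from the per-subflow TCP state machines, which matches
the way the figure treats DATA_FIN independently of subflow FINs.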
Appendix E. Changes from RFC6824
This section lists the key technical changes between RFC6824
[RFC6824], specifying MPTCP v0, and this document, which obsoletes
RFC6824 and specifies MPTCP v1. Note that this specification is not
backwards compatible with RFC6824.
o The document incorporates lessons learnt from the various
implementations, deployments and experiments gathered in the
documents "Use Cases and Operational Experience with Multipath
TCP" [RFC8041] and the IETF Journal article "Multipath TCP
Deployments" [deployments].
o Connection initiation, through the exchange of the MP_CAPABLE
MPTCP option, is different from RFC6824. In order to permit
servers to act statelessly, the SYN doesn't include A's key (it is
still sent in the ACK).
o This requires MP_CAPABLE to also be sent reliably on the third
ACK. If safe receipt of the third ACK cannot be inferred, the
MP_CAPABLE option must be repeated on the first data packet.
o In the Flags field of MP_CAPABLE, C is now assigned to mean that
the sender of this option will not accept additional MPTCP
subflows to the source address and port. This is an efficiency
improvement, for example where the sender is behind a strict NAT.
o In the Flags field of MP_CAPABLE, H now indicates the use of HMAC-
SHA256 (rather than HMAC-SHA1).
o Connection initiation also defines the procedure for version
negotiation, for implementations that support both v0 (RFC6824)
and v1 (this document).
o The HMAC-SHA256 algorithm is used in place of HMAC-SHA1, as it
provides better security. It is used to generate the token in the
MP_JOIN and ADD_ADDR messages, and to set the initial data sequence
number.
o A new subflow-level option exists to signal reasons for sending a
RST on a subflow (MP_TCPRST, Section 3.6), which can help an
implementation decide whether to attempt later re-connection.
o The MP_PRIO option (Section 3.3.8), which is used to signal a
change of priority for a subflow, no longer includes the AddrID
field. Its purpose was to allow the changed priority to be
applied on a subflow other than the one it was sent on. However,
it has been realised that this could be used by a man-in-the-
middle to divert all traffic on to its own path, and MP_PRIO does
not include a token or other security mechanism.
o The ADD_ADDR option (Section 3.4.1), which is used to inform the
other host about another potential address, is different in
several ways. It now includes an HMAC of the added address, for
enhanced security. In addition, reliability for the ADD_ADDR
option has been added: the IPVer field is replaced with a flag
field, and one flag is assigned (E) which is used as an 'Echo' so
a host can indicate that it has received the option.
o An additional way of performing a Fast Close is described, by
sending an MP_FASTCLOSE option on a RST on all subflows. This
allows the host to tear down the subflows and the connection
immediately.
o In the IANA registry a new MPTCP subtype option, MP_EXPERIMENTAL,
is reserved for private experiments. However, the document
doesn't define how to use the subtype option.
o A new Appendix discusses the use of both MPTCP and TCP Fast Open
on the same packet (Appendix B).
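The ADD_ADDR change listed above (an HMAC of the added address) can be
sketched as follows. This is a hedged illustration, not the draft's
normative procedure: the function names are hypothetical, and the
details assumed here (the HMAC key is the sender's key followed by the
receiver's key, the message is the Address ID, address and port, an
absent port is encoded as zero, and the result is truncated to 64
bits) should be checked against Section 3.4.1, which is not part of
this excerpt. In particular, which 64 bits of the digest are kept is
an assumption.

```python
import hashlib
import hmac


def add_addr_hmac(sender_key: bytes, receiver_key: bytes,
                  addr_id: int, address: bytes, port: int = 0) -> bytes:
    """Truncated HMAC-SHA256 over the advertised address (sketch only)."""
    # Assumed message layout: 1-octet Address ID, raw address, 2-octet port.
    msg = bytes([addr_id]) + address + port.to_bytes(2, "big")
    digest = hmac.new(sender_key + receiver_key, msg, hashlib.sha256).digest()
    return digest[-8:]  # assumption: keep 64 bits of the 256-bit digest


def check_add_addr(sender_key: bytes, receiver_key: bytes, addr_id: int,
                   address: bytes, port: int, received: bytes) -> bool:
    """Recompute the HMAC on the receiving host and compare in constant time."""
    expected = add_addr_hmac(sender_key, receiver_key, addr_id, address, port)
    return hmac.compare_digest(expected, received)
```

A receiver that holds both keys can recompute the value, so an
on-path device that does not know the keys cannot advertise an
address of its own choosing without the check failing.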
Authors' Addresses Authors' Addresses
Alan Ford Alan Ford
Pexip Pexip
EMail: alan.ford@gmail.com EMail: alan.ford@gmail.com
Costin Raiciu Costin Raiciu
University Politehnica of Bucharest University Politehnica of Bucharest
Splaiul Independentei 313 Splaiul Independentei 313
 End of changes. 51 change blocks. 
115 lines changed or deleted 174 lines changed or added
