draft-ietf-mptcp-rfc6824bis-05.txt   draft-ietf-mptcp-rfc6824bis-06.txt 
Internet Engineering Task Force A. Ford Internet Engineering Task Force A. Ford
Internet-Draft Pexip Internet-Draft Pexip
Obsoletes: 6824 (if approved) C. Raiciu Obsoletes: 6824 (if approved) C. Raiciu
Intended status: Experimental U. Politechnica of Bucharest Intended status: Experimental U. Politechnica of Bucharest
Expires: July 15, 2016 M. Handley Expires: January 7, 2017 M. Handley
U. College London U. College London
O. Bonaventure O. Bonaventure
U. catholique de Louvain U. catholique de Louvain
C. Paasch C. Paasch
Apple, Inc. Apple, Inc.
January 12, 2016 July 6, 2016
TCP Extensions for Multipath Operation with Multiple Addresses TCP Extensions for Multipath Operation with Multiple Addresses
draft-ietf-mptcp-rfc6824bis-05 draft-ietf-mptcp-rfc6824bis-06
Abstract Abstract
TCP/IP communication is currently restricted to a single path per TCP/IP communication is currently restricted to a single path per
connection, yet multiple paths often exist between peers. The connection, yet multiple paths often exist between peers. The
simultaneous use of these multiple paths for a TCP/IP session would simultaneous use of these multiple paths for a TCP/IP session would
improve resource usage within the network and, thus, improve user improve resource usage within the network and, thus, improve user
experience through higher throughput and improved resilience to experience through higher throughput and improved resilience to
network failure. network failure.
Multipath TCP provides the ability to simultaneously use multiple Multipath TCP provides the ability to simultaneously use multiple
paths between peers. This document presents a set of extensions to paths between peers. This document presents a set of extensions to
traditional TCP to support multipath operation. The protocol offers traditional TCP to support multipath operation. The protocol offers
the same type of service to applications as TCP (i.e., reliable the same type of service to applications as TCP (i.e., reliable
bytestream), and it provides the components necessary to establish bytestream), and it provides the components necessary to establish
and use multiple TCP flows across potentially disjoint paths. and use multiple TCP flows across potentially disjoint paths.
This document specifies v1 of Multipath TCP, obsoleting v0 as This document specifies v1 of Multipath TCP, obsoleting v0 as
specified in RFC6824 [5] through clarifications and modifications specified in RFC6824 [RFC6824] through clarifications and
primarily driven by deployment experience. modifications primarily driven by deployment experience.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on July 15, 2016. This Internet-Draft will expire on January 7, 2017.
Copyright Notice Copyright Notice
Copyright (c) 2016 IETF Trust and the persons identified as the Copyright (c) 2016 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 3, line 23 skipping to change at page 3, line 23
3.9. Error Handling . . . . . . . . . . . . . . . . . . . . . . 50 3.9. Error Handling . . . . . . . . . . . . . . . . . . . . . . 50
3.10. Heuristics . . . . . . . . . . . . . . . . . . . . . . . . 50 3.10. Heuristics . . . . . . . . . . . . . . . . . . . . . . . . 50
3.10.1. Port Usage . . . . . . . . . . . . . . . . . . . . . . 51 3.10.1. Port Usage . . . . . . . . . . . . . . . . . . . . . . 51
3.10.2. Delayed Subflow Start and Subflow Symmetry . . . . . . 51 3.10.2. Delayed Subflow Start and Subflow Symmetry . . . . . . 51
3.10.3. Failure Handling . . . . . . . . . . . . . . . . . . . 52 3.10.3. Failure Handling . . . . . . . . . . . . . . . . . . . 52
4. Semantic Issues . . . . . . . . . . . . . . . . . . . . . . . 53 4. Semantic Issues . . . . . . . . . . . . . . . . . . . . . . . 53
5. Security Considerations . . . . . . . . . . . . . . . . . . . 54 5. Security Considerations . . . . . . . . . . . . . . . . . . . 54
6. Interactions with Middleboxes . . . . . . . . . . . . . . . . 57 6. Interactions with Middleboxes . . . . . . . . . . . . . . . . 57
7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 60 7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 60
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 60 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 60
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 62 8.1. MPTCP Option Subtypes . . . . . . . . . . . . . . . . . . 61
9.1. Normative References . . . . . . . . . . . . . . . . . . . 62 8.2. MPTCP Handshake Algorithms . . . . . . . . . . . . . . . . 62
9.2. Informative References . . . . . . . . . . . . . . . . . . 63 8.3. MP_TCPRST Reason Codes . . . . . . . . . . . . . . . . . . 62
Appendix A. Notes on Use of TCP Options . . . . . . . . . . . . . 65 8.4. Experimental option registry . . . . . . . . . . . . . . . 63
Appendix B. Control Blocks . . . . . . . . . . . . . . . . . . . 67 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 63
B.1. MPTCP Control Block . . . . . . . . . . . . . . . . . . . 67 9.1. Normative References . . . . . . . . . . . . . . . . . . . 63
B.1.1. Authentication and Metadata . . . . . . . . . . . . . 67 9.2. Informative References . . . . . . . . . . . . . . . . . . 64
B.1.2. Sending Side . . . . . . . . . . . . . . . . . . . . . 68 Appendix A. Notes on Use of TCP Options . . . . . . . . . . . . . 66
B.1.3. Receiving Side . . . . . . . . . . . . . . . . . . . . 68 Appendix B. Control Blocks . . . . . . . . . . . . . . . . . . . 68
B.2. TCP Control Blocks . . . . . . . . . . . . . . . . . . . . 68 B.1. MPTCP Control Block . . . . . . . . . . . . . . . . . . . 68
B.1.1. Authentication and Metadata . . . . . . . . . . . . . 68
B.1.2. Sending Side . . . . . . . . . . . . . . . . . . . . . 69
B.1.3. Receiving Side . . . . . . . . . . . . . . . . . . . . 69
B.2. TCP Control Blocks . . . . . . . . . . . . . . . . . . . . 69
B.2.1. Sending Side . . . . . . . . . . . . . . . . . . . . . 69 B.2.1. Sending Side . . . . . . . . . . . . . . . . . . . . . 69
B.2.2. Receiving Side . . . . . . . . . . . . . . . . . . . . 69 B.2.2. Receiving Side . . . . . . . . . . . . . . . . . . . . 70
Appendix C. Finite State Machine . . . . . . . . . . . . . . . . 69 Appendix C. Finite State Machine . . . . . . . . . . . . . . . . 70
1. Introduction 1. Introduction
Multipath TCP (MPTCP) is a set of extensions to regular TCP [1] to Multipath TCP (MPTCP) is a set of extensions to regular TCP [RFC0793]
provide a Multipath TCP [2] service, which enables a transport to provide a Multipath TCP [RFC6182] service, which enables a
connection to operate across multiple paths simultaneously. This transport connection to operate across multiple paths simultaneously.
document presents the protocol changes required to add multipath This document presents the protocol changes required to add multipath
capability to TCP; specifically, those for signaling and setting up capability to TCP; specifically, those for signaling and setting up
multiple paths ("subflows"), managing these subflows, reassembly of multiple paths ("subflows"), managing these subflows, reassembly of
data, and termination of sessions. This is not the only information data, and termination of sessions. This is not the only information
required to create a Multipath TCP implementation, however. This required to create a Multipath TCP implementation, however. This
document is complemented by three others: document is complemented by three others:
o Architecture [2], which explains the motivations behind Multipath o Architecture [RFC6182], which explains the motivations behind
TCP, contains a discussion of high-level design decisions on which Multipath TCP, contains a discussion of high-level design
this design is based, and an explanation of a functional decisions on which this design is based, and an explanation of a
separation through which an extensible MPTCP implementation can be functional separation through which an extensible MPTCP
developed. implementation can be developed.
o Congestion control [6] presents a safe congestion control o Congestion control [RFC6356] presents a safe congestion control
algorithm for coupling the behavior of the multiple paths in order algorithm for coupling the behavior of the multiple paths in order
to "do no harm" to other network users. to "do no harm" to other network users.
o Application considerations [7] discusses what impact MPTCP will o Application considerations [RFC6897] discusses what impact MPTCP
have on applications, what applications will want to do with will have on applications, what applications will want to do with
MPTCP, and as a consequence of these factors, what API extensions MPTCP, and as a consequence of these factors, what API extensions
an MPTCP implementation should present. an MPTCP implementation should present.
This document is an update to, and obsoletes, the v0 specification of This document is an update to, and obsoletes, the v0 specification of
Multipath TCP [5]. This document specifies MPTCP v1, which is not Multipath TCP [RFC6824]. This document specifies MPTCP v1, which is
backward compatible with MPTCP v0. This document additionally not backward compatible with MPTCP v0. This document additionally
defines version negotiation procedures for implementations that defines version negotiation procedures for implementations that
support both versions. support both versions.
1.1. Design Assumptions 1.1. Design Assumptions
In order to limit the potentially huge design space, the working In order to limit the potentially huge design space, the working
group imposed two key constraints on the Multipath TCP design group imposed two key constraints on the Multipath TCP design
presented in this document: presented in this document:
o It must be backwards-compatible with current, regular TCP, to o It must be backwards-compatible with current, regular TCP, to
skipping to change at page 5, line 7 skipping to change at page 5, line 7
o It can be assumed that one or both hosts are multihomed and o It can be assumed that one or both hosts are multihomed and
multiaddressed. multiaddressed.
To simplify the design, we assume that the presence of multiple To simplify the design, we assume that the presence of multiple
addresses at a host is sufficient to indicate the existence of addresses at a host is sufficient to indicate the existence of
multiple paths. These paths need not be entirely disjoint: they may multiple paths. These paths need not be entirely disjoint: they may
share one or many routers between them. Even in such a situation, share one or many routers between them. Even in such a situation,
making use of multiple paths is beneficial, improving resource making use of multiple paths is beneficial, improving resource
utilization and resilience to a subset of node failures. The utilization and resilience to a subset of node failures. The
congestion control algorithms defined in [6] ensure this does not act congestion control algorithms defined in [RFC6356] ensure this does
detrimentally. Furthermore, there may be some scenarios where not act detrimentally. Furthermore, there may be some scenarios
different TCP ports on a single host can provide disjoint paths (such where different TCP ports on a single host can provide disjoint paths
as through certain Equal-Cost Multipath (ECMP) implementations [8]), (such as through certain Equal-Cost Multipath (ECMP) implementations
and so the MPTCP design also supports the use of ports in path [RFC2992]), and so the MPTCP design also supports the use of ports in
identifiers. path identifiers.
There are three aspects to the backwards-compatibility listed above There are three aspects to the backwards-compatibility listed above
(discussed in more detail in [2]): (discussed in more detail in [RFC6182]):
External Constraints: The protocol must function through the vast External Constraints: The protocol must function through the vast
majority of existing middleboxes such as NATs, firewalls, and majority of existing middleboxes such as NATs, firewalls, and
proxies, and as such must resemble existing TCP as far as possible proxies, and as such must resemble existing TCP as far as possible
on the wire. Furthermore, the protocol must not assume the on the wire. Furthermore, the protocol must not assume the
segments it sends on the wire arrive unmodified at the segments it sends on the wire arrive unmodified at the
destination: they may be split or coalesced; TCP options may be destination: they may be split or coalesced; TCP options may be
removed or duplicated. removed or duplicated.
Application Constraints: The protocol must be usable with no change Application Constraints: The protocol must be usable with no change
to existing applications that use the common TCP API (although it to existing applications that use the common TCP API (although it
is reasonable that not all features would be available to such is reasonable that not all features would be available to such
legacy applications). Furthermore, the protocol must provide the legacy applications). Furthermore, the protocol must provide the
same service model as regular TCP to the application. same service model as regular TCP to the application.
Fallback: The protocol should be able to fall back to standard TCP Fallback: The protocol should be able to fall back to standard TCP
with no interference from the user, to be able to communicate with with no interference from the user, to be able to communicate with
legacy hosts. legacy hosts.
The complementary application considerations document [7] discusses The complementary application considerations document [RFC6897]
the necessary features of an API to provide backwards-compatibility, discusses the necessary features of an API to provide backwards-
as well as API extensions to convey the behavior of MPTCP at a level compatibility, as well as API extensions to convey the behavior of
of control and information equivalent to that available with regular, MPTCP at a level of control and information equivalent to that
single-path TCP. available with regular, single-path TCP.
Further discussion of the design constraints and associated design Further discussion of the design constraints and associated design
decisions is given in the MPTCP Architecture document [2] and in decisions is given in the MPTCP Architecture document [RFC6182] and
[9]. in [howhard].
1.2. Multipath TCP in the Networking Stack 1.2. Multipath TCP in the Networking Stack
MPTCP operates at the transport layer and aims to be transparent to MPTCP operates at the transport layer and aims to be transparent to
both higher and lower layers. It is a set of additional features on both higher and lower layers. It is a set of additional features on
top of standard TCP; Figure 1 illustrates this layering. MPTCP is top of standard TCP; Figure 1 illustrates this layering. MPTCP is
designed to be usable by legacy applications with no changes; designed to be usable by legacy applications with no changes;
detailed discussion of its interactions with applications is given in detailed discussion of its interactions with applications is given in
[7]. [RFC6897].
+-------------------------------+ +-------------------------------+
| Application | | Application |
+---------------+ +-------------------------------+ +---------------+ +-------------------------------+
| Application | | MPTCP | | Application | | MPTCP |
+---------------+ + - - - - - - - + - - - - - - - + +---------------+ + - - - - - - - + - - - - - - - +
| TCP | | Subflow (TCP) | Subflow (TCP) | | TCP | | Subflow (TCP) | Subflow (TCP) |
+---------------+ +-------------------------------+ +---------------+ +-------------------------------+
| IP | | IP | IP | | IP | | IP | IP |
+---------------+ +-------------------------------+ +---------------+ +-------------------------------+
skipping to change at page 7, line 13 skipping to change at page 7, line 13
Section 4. Section 4.
1.4. MPTCP Concept 1.4. MPTCP Concept
This section provides a high-level summary of normal operation of This section provides a high-level summary of normal operation of
MPTCP, and is illustrated by the scenario shown in Figure 2. A MPTCP, and is illustrated by the scenario shown in Figure 2. A
detailed description of operation is given in Section 3. detailed description of operation is given in Section 3.
o To a non-MPTCP-aware application, MPTCP will behave the same as o To a non-MPTCP-aware application, MPTCP will behave the same as
normal TCP. Extended APIs could provide additional control to normal TCP. Extended APIs could provide additional control to
MPTCP-aware applications [7]. An application begins by opening a MPTCP-aware applications [RFC6897]. An application begins by
TCP socket in the normal way. MPTCP signaling and operation are opening a TCP socket in the normal way. MPTCP signaling and
handled by the MPTCP implementation. operation are handled by the MPTCP implementation.
o An MPTCP connection begins similarly to a regular TCP connection. o An MPTCP connection begins similarly to a regular TCP connection.
This is illustrated in Figure 2 where an MPTCP connection is This is illustrated in Figure 2 where an MPTCP connection is
established between addresses A1 and B1 on Hosts A and B, established between addresses A1 and B1 on Hosts A and B,
respectively. respectively.
o If extra paths are available, additional TCP sessions (termed o If extra paths are available, additional TCP sessions (termed
MPTCP "subflows") are created on these paths, and are combined MPTCP "subflows") are created on these paths, and are combined
with the existing session, which continues to appear as a single with the existing session, which continues to appear as a single
connection to the applications at both ends. The creation of the connection to the applications at both ends. The creation of the
skipping to change at page 8, line 26 skipping to change at page 8, line 26
| |<---------------------| | | |<---------------------| |
| | | | | | | |
| | | | | | | |
Figure 2: Example MPTCP Usage Scenario Figure 2: Example MPTCP Usage Scenario
1.5. Requirements Language 1.5. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [3]. document are to be interpreted as described in RFC 2119 [RFC2119].
2. Operation Overview 2. Operation Overview
This section presents a single description of common MPTCP operation, This section presents a single description of common MPTCP operation,
with reference to the protocol operation. This is a high-level with reference to the protocol operation. This is a high-level
overview of the key functions; the full specification follows in overview of the key functions; the full specification follows in
Section 3. Extensibility and negotiated features are not discussed Section 3. Extensibility and negotiated features are not discussed
here. Considerable reference is made to symbolic names of MPTCP here. Considerable reference is made to symbolic names of MPTCP
options throughout this section -- these are subtypes of the IANA- options throughout this section -- these are subtypes of the IANA-
assigned MPTCP option (see Section 8), and their formats are defined assigned MPTCP option (see Section 8), and their formats are defined
skipping to change at page 12, line 37 skipping to change at page 12, line 37
o To cope with NATs on the path, addresses are referred to by o To cope with NATs on the path, addresses are referred to by
Address IDs, in case the IP packet's source address gets changed Address IDs, in case the IP packet's source address gets changed
by a NAT. Setting up a new TCP flow is not possible if the by a NAT. Setting up a new TCP flow is not possible if the
passive opener is behind a NAT; to allow subflows to be created passive opener is behind a NAT; to allow subflows to be created
when either end is behind a NAT, MPTCP uses the ADD_ADDR message. when either end is behind a NAT, MPTCP uses the ADD_ADDR message.
o MPTCP falls back to ordinary TCP if MPTCP operation is not o MPTCP falls back to ordinary TCP if MPTCP operation is not
possible, for example, if one host is not MPTCP capable or if a possible, for example, if one host is not MPTCP capable or if a
middlebox alters the payload. middlebox alters the payload.
o To meet the threats identified in [10], the following steps are o To meet the threats identified in [RFC6181], the following steps
taken: keys are sent in the clear in the MP_CAPABLE messages; are taken: keys are sent in the clear in the MP_CAPABLE messages;
MP_JOIN messages are secured with HMAC-SHA1 ([11], [4]) using MP_JOIN messages are secured with HMAC-SHA1 ([RFC2104], [sha1])
those keys; and standard TCP validity checks are made on the other using those keys; and standard TCP validity checks are made on the
messages (ensuring sequence numbers are in-window [12]). other messages (ensuring sequence numbers are in-window
[RFC5961]).
3. MPTCP Protocol 3. MPTCP Protocol
This section describes the operation of the MPTCP protocol, and is This section describes the operation of the MPTCP protocol, and is
subdivided into sections for each key part of the protocol operation. subdivided into sections for each key part of the protocol operation.
All MPTCP operations are signaled using optional TCP header fields. All MPTCP operations are signaled using optional TCP header fields.
A single TCP option number ("Kind") has been assigned by IANA for A single TCP option number ("Kind") has been assigned by IANA for
MPTCP (see Section 8), and then individual messages will be MPTCP (see Section 8), and then individual messages will be
determined by a "subtype", the values of which are also stored in an determined by a "subtype", the values of which are also stored in an
skipping to change at page 13, line 33 skipping to change at page 13, line 34
Those MPTCP options associated with subflow initiation are used on Those MPTCP options associated with subflow initiation are used on
packets with the SYN flag set. Additionally, there is one MPTCP packets with the SYN flag set. Additionally, there is one MPTCP
option for signaling metadata to ensure segmented data can be option for signaling metadata to ensure segmented data can be
recombined for delivery to the application. recombined for delivery to the application.
The remaining options, however, are signals that do not need to be on The remaining options, however, are signals that do not need to be on
a specific packet, such as those for signaling additional addresses. a specific packet, such as those for signaling additional addresses.
Whilst an implementation may desire to send MPTCP options as soon as Whilst an implementation may desire to send MPTCP options as soon as
possible, it may not be possible to combine all desired options (both possible, it may not be possible to combine all desired options (both
those for MPTCP and for regular TCP, such as SACK (selective those for MPTCP and for regular TCP, such as SACK (selective
acknowledgment) [13]) on a single packet. Therefore, an acknowledgment) [RFC2018]) on a single packet. Therefore, an
implementation may choose to send duplicate ACKs containing the implementation may choose to send duplicate ACKs containing the
additional signaling information. This changes the semantics of a additional signaling information. This changes the semantics of a
duplicate ACK; these are usually only sent as a signal of a lost duplicate ACK; these are usually only sent as a signal of a lost
segment [14] in regular TCP. Therefore, an MPTCP implementation segment [RFC5681] in regular TCP. Therefore, an MPTCP implementation
receiving a duplicate ACK that contains an MPTCP option MUST NOT receiving a duplicate ACK that contains an MPTCP option MUST NOT
treat it as a signal of congestion. Additionally, an MPTCP treat it as a signal of congestion. Additionally, an MPTCP
implementation SHOULD NOT send more than two duplicate ACKs in a row implementation SHOULD NOT send more than two duplicate ACKs in a row
for the purposes of sending MPTCP options alone, in order to ensure for the purposes of sending MPTCP options alone, in order to ensure
no middleboxes misinterpret this as a sign of congestion. no middleboxes misinterpret this as a sign of congestion.
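The two duplicate-ACK rules above can be sketched as follows. This is an illustrative sketch only, not part of either draft revision: the class names, method names, and the decision to leave the loss counter untouched when an MPTCP option is present are assumptions. The threshold of three duplicate ACKs is the conventional fast-retransmit value from RFC 5681.

```python
from dataclasses import dataclass

DUPACK_THRESHOLD = 3          # conventional fast-retransmit threshold (RFC 5681)
MAX_OPTION_ONLY_DUPACKS = 2   # "no more than two duplicate ACKs in a row"


@dataclass
class DupAckReceiver:
    """Hypothetical receiver-side counter of loss-indicating duplicate ACKs."""
    loss_dupacks: int = 0

    def on_duplicate_ack(self, carries_mptcp_option: bool) -> bool:
        """Return True once enough duplicate ACKs suggest a lost segment."""
        if carries_mptcp_option:
            # MUST NOT be treated as a signal of congestion, so it is
            # not counted (assumption: the counter is left unchanged).
            return False
        self.loss_dupacks += 1
        return self.loss_dupacks >= DUPACK_THRESHOLD

    def on_new_ack(self) -> None:
        """An ACK that advances the window clears the counter."""
        self.loss_dupacks = 0


@dataclass
class OptionOnlyAckSender:
    """Hypothetical sender-side limiter for option-only duplicate ACKs."""
    sent_in_a_row: int = 0

    def may_send_option_only_dupack(self) -> bool:
        """Allow at most MAX_OPTION_ONLY_DUPACKS consecutive option-only ACKs."""
        if self.sent_in_a_row >= MAX_OPTION_ONLY_DUPACKS:
            return False
        self.sent_in_a_row += 1
        return True

    def on_other_segment_sent(self) -> None:
        """Any other segment breaks the run of option-only duplicate ACKs."""
        self.sent_in_a_row = 0
```

A sender that is refused by may_send_option_only_dupack() would, in this sketch, hold the pending option until it can be carried on another segment.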
Furthermore, standard TCP validity checks (such as ensuring the Furthermore, standard TCP validity checks (such as ensuring the
sequence number and acknowledgment number are within window) MUST be sequence number and acknowledgment number are within window) MUST be
undertaken before processing any MPTCP signals, as described in [12], undertaken before processing any MPTCP signals, as described in
and initial subflow sequence numbers SHOULD be generated [RFC5961], and initial subflow sequence numbers SHOULD be generated
the recommendations in [15]. according to the recommendations in [RFC6528].
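As a rough illustration of gating MPTCP option processing on the standard window checks, the sketch below uses the basic RFC 793 acceptability tests with 32-bit wraparound. It is deliberately simplified: it ignores the zero-window special case and the stricter checks the cited reference adds, and all function and parameter names are assumptions made for this example.

```python
MOD = 1 << 32  # TCP sequence numbers wrap at 2**32


def seq_in_window(seg_seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """True if RCV.NXT <= SEG.SEQ < RCV.NXT + RCV.WND (modulo 2**32)."""
    return (seg_seq - rcv_nxt) % MOD < rcv_wnd


def ack_in_range(seg_ack: int, snd_una: int, snd_nxt: int) -> bool:
    """True if SND.UNA <= SEG.ACK <= SND.NXT (modulo 2**32)."""
    return (seg_ack - snd_una) % MOD <= (snd_nxt - snd_una) % MOD


def may_process_mptcp_options(seg_seq: int, seg_ack: int,
                              rcv_nxt: int, rcv_wnd: int,
                              snd_una: int, snd_nxt: int) -> bool:
    """Only look at MPTCP signals once both basic checks pass."""
    return (seq_in_window(seg_seq, rcv_nxt, rcv_wnd)
            and ack_in_range(seg_ack, snd_una, snd_nxt))
```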
3.1. Connection Initiation 3.1. Connection Initiation
Connection initiation begins with a SYN, SYN/ACK, ACK exchange on a Connection initiation begins with a SYN, SYN/ACK, ACK exchange on a
single path. Each packet contains the Multipath Capable (MP_CAPABLE) single path. Each packet contains the Multipath Capable (MP_CAPABLE)
MPTCP option (Figure 4). This option declares its sender is capable MPTCP option (Figure 4). This option declares its sender is capable
of performing Multipath TCP and wishes to do so on this particular of performing Multipath TCP and wishes to do so on this particular
connection. connection.
The MP_CAPABLE exchange in this specification (v1) is different to The MP_CAPABLE exchange in this specification (v1) is different to
that specified in v0 [5]. If a host supports multiple versions of that specified in v0 [RFC6824]. If a host supports multiple versions
MPTCP, the sender of the MP_CAPABLE option SHOULD signal the highest of MPTCP, the sender of the MP_CAPABLE option SHOULD signal the
version number it supports. The passive opener, on receipt of this, highest version number it supports. The passive opener, on receipt
will signal the version number it wishes to use, which MUST be equal of this, will signal the version number it wishes to use, which MUST
to or lower than the version number indicated in the initial be equal to or lower than the version number indicated in the initial
MP_CAPABLE. Given the SYN exchange is different between v1 and v0 MP_CAPABLE. Given the SYN exchange is different between v1 and v0
the exchange cannot be immediately downgraded, and therefore if the the exchange cannot be immediately downgraded, and therefore if the
far end has requested a lower version then the initiator SHOULD far end has requested a lower version then the initiator SHOULD
respond with an ACK without any MP_CAPABLE option, to fall back to respond with an ACK without any MP_CAPABLE option, to fall back to
regular TCP. If the initiator supports the requested version, on regular TCP. If the initiator supports the requested version, on
future connections to the target host, the initiator MAY cache the future connections to the target host, the initiator MAY cache the
version preference. Alternatively, the initiator MAY close the version preference. Alternatively, the initiator MAY close the
connection with a TCP RST and immediately re-establish with the connection with a TCP RST and immediately re-establish with the
requested version of MPTCP. requested version of MPTCP.
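The version negotiation described above might be sketched as below. This is a hedged illustration, not the drafts' algorithm: the function names, the Action enum, the cache keyed by peer address, and the policy of picking the highest mutually usable version are all assumptions. Returning None from the passive opener is used here to mean "reply without MP_CAPABLE", which is also an assumption.

```python
from enum import Enum
from typing import Dict, Optional, Set


class Action(Enum):
    PROCEED = "continue MPTCP handshake"
    FALL_BACK = "send ACK without MP_CAPABLE (regular TCP)"


def passive_choose_version(offered: int, supported: Set[int]) -> Optional[int]:
    """Pick a version that is equal to or lower than the offered one.

    Assumed policy: the highest locally supported version that does not
    exceed the offer; None if there is no such version.
    """
    usable = [v for v in supported if v <= offered]
    return max(usable) if usable else None


def initiator_react(offered: int, replied: int, supported: Set[int],
                    version_cache: Dict[str, int], peer: str) -> Action:
    """Decide what the initiator does with the version in the SYN/ACK."""
    if replied > offered:
        # The reply MUST NOT exceed the version in the initial MP_CAPABLE.
        raise ValueError("peer replied with a higher version than offered")
    if replied == offered:
        return Action.PROCEED
    # A lower version was requested: the exchange cannot be downgraded
    # in place, so fall back to regular TCP on this connection.
    if replied in supported:
        # MAY cache the preference for future connections to this host.
        version_cache[peer] = replied
    return Action.FALL_BACK
```

The alternative the text allows, closing with a TCP RST and re-establishing at the requested version, is left out of this sketch; it would replace the FALL_BACK branch when the replied version is locally supported.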
skipping to change at page 15, line 41 skipping to change at page 15, line 41
identify the connection using a 32-bit "token". This token is a identify the connection using a 32-bit "token". This token is a
cryptographic hash of this key. The algorithm for this process is cryptographic hash of this key. The algorithm for this process is
dependent on the authentication algorithm selected; the method of dependent on the authentication algorithm selected; the method of
selection is defined later in this section. selection is defined later in this section.
Upon reception of the initial SYN-segment, a stateful server Upon reception of the initial SYN-segment, a stateful server
generates a random key and replies with a SYN/ACK. The key's method generates a random key and replies with a SYN/ACK. The key's method
of generation is implementation specific. The key MUST be hard to of generation is implementation specific. The key MUST be hard to
guess, and it MUST be unique for the sending host at any one time. guess, and it MUST be unique for the sending host at any one time.
Recommendations for generating random numbers for use in keys are Recommendations for generating random numbers for use in keys are
given in [16]. Connections will be indexed at each host by the token given in [RFC4086]. Connections will be indexed at each host by the
(a one-way hash of the key). Therefore, an implementation will token (a one-way hash of the key). Therefore, an implementation will
require a mapping from each token to the corresponding connection, require a mapping from each token to the corresponding connection,
and in turn to the keys for the connection. and in turn to the keys for the connection.
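A minimal sketch of key generation, token derivation, and the token-to-connection mapping follows. The text above says only that the token is a 32-bit cryptographic hash of the key and that the exact algorithm depends on the negotiated authentication algorithm. This sketch assumes the RFC 6824 construction (a 64-bit key, with the token taken as the most significant 32 bits of its SHA-1 hash). The TokenTable class and its methods are hypothetical names introduced for illustration.

```python
import hashlib
import secrets
from typing import Dict, Optional


def generate_key() -> bytes:
    """Draw a hard-to-guess key (assumed 64 bits, as in RFC 6824)."""
    return secrets.token_bytes(8)


def token_from_key(key: bytes) -> int:
    """Assumed derivation: most significant 32 bits of SHA-1(key)."""
    return int.from_bytes(hashlib.sha1(key).digest()[:4], "big")


class TokenTable:
    """Hypothetical per-host index: token -> connection state (keys)."""

    def __init__(self) -> None:
        self._by_token: Dict[int, dict] = {}

    def register(self, local_key: bytes,
                 remote_key: Optional[bytes] = None) -> Optional[int]:
        """Index a connection by its token; None if the token is taken."""
        token = token_from_key(local_key)
        if token in self._by_token:
            # Token already in use: the caller can draw a fresh key.
            return None
        self._by_token[token] = {"local_key": local_key,
                                 "remote_key": remote_key}
        return token

    def lookup(self, token: int) -> Optional[dict]:
        """Map a received token back to the connection and its keys."""
        return self._by_token.get(token)
```

A caller would typically loop on generate_key() until register() returns a token, then send that key in its MP_CAPABLE option.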
There is a risk that two different keys will hash to the same token. There is a risk that two different keys will hash to the same token.
The risk of hash collisions is usually small, unless the host is The risk of hash collisions is usually small, unless the host is
handling many tens of thousands of connections. Therefore, an handling many tens of thousands of connections. Therefore, an
implementation SHOULD check its list of connection tokens to ensure implementation SHOULD check its list of connection tokens to ensure
there is not a collision before sending its key, and if there is, there is not a collision before sending its key, and if there is,
then it should generate a new key. This would, however, be costly then it should generate a new key. This would, however, be costly
for a server with thousands of connections. The subflow handshake for a server with thousands of connections. The subflow handshake
skipping to change at page 16, line 24 skipping to change at page 16, line 24
free to exchange cryptographic material out-of-band and generate free to exchange cryptographic material out-of-band and generate
these keys from this, in order to provide additional mechanisms by these keys from this, in order to provide additional mechanisms by
which to verify the identity of the communicating entities. For which to verify the identity of the communicating entities. For
example, an implementation could choose to link its MPTCP keys to example, an implementation could choose to link its MPTCP keys to
those used in higher-layer TLS or SSH connections. those used in higher-layer TLS or SSH connections.
If the server behaves in a stateless manner, it has to generate its If the server behaves in a stateless manner, it has to generate its
own key in a verifiable fashion. This verifiable way of generating own key in a verifiable fashion. This verifiable way of generating
the key can be done by using a hash of the 4-tuple, sequence number the key can be done by using a hash of the 4-tuple, sequence number
and a local secret (similar to what is done for the TCP-sequence and a local secret (similar to what is done for the TCP-sequence
number [17]). It will thus be able to verify whether it is indeed number [RFC4987]). It will thus be able to verify whether it is
the originator of the key echoed back in the later MP_CAPABLE option. indeed the originator of the key echoed back in the later MP_CAPABLE
As for a stateful server, the tokens SHOULD be checked for option. As for a stateful server, the tokens SHOULD be checked for
uniqueness; however, if uniqueness is not met and there is no way to uniqueness; however, if uniqueness is not met and there is no way to
generate an alternative verifiable key, then the connection MUST fall generate an alternative verifiable key, then the connection MUST fall
back to using regular TCP by not sending an MP_CAPABLE in the SYN/ACK. back to using regular TCP by not sending an MP_CAPABLE in the SYN/ACK.
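The stateless key derivation described above can be sketched as follows. This is only an illustration under stated assumptions: the draft does not mandate a particular hash, so the use of HMAC-SHA-256, the IPv4-only encoding of the 4-tuple, and the function names stateless_key and is_own_key are all hypothetical choices made for this example. The 8-octet output matches the 64-bit key carried in MP_CAPABLE.

```python
import hashlib
import hmac
import socket
import struct


def stateless_key(local_secret, src_ip, src_port, dst_ip, dst_port, seq):
    """Derive a 64-bit key from the 4-tuple, a sequence number, and a secret.

    Assumption: HMAC-SHA-256 keyed with the local secret; the draft only
    asks for "a hash" of these inputs. IPv4 addresses only, for brevity.
    """
    msg = (
        socket.inet_aton(src_ip) + struct.pack("!H", src_port)
        + socket.inet_aton(dst_ip) + struct.pack("!H", dst_port)
        + struct.pack("!I", seq)
    )
    # Truncate the digest to 8 octets, the size of an MPTCP key.
    return hmac.new(local_secret, msg, hashlib.sha256).digest()[:8]


def is_own_key(local_secret, src_ip, src_port, dst_ip, dst_port, seq, echoed):
    """Recompute the key and compare it with the one echoed in the ACK."""
    expected = stateless_key(local_secret, src_ip, src_port,
                             dst_ip, dst_port, seq)
    # Constant-time comparison avoids leaking how many octets matched.
    return hmac.compare_digest(expected, echoed)
```

Because the key is recomputed from values present in the third packet, the server need not store anything between the SYN/ACK and the ACK. The token uniqueness check still applies to the derived key.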
The ACK carries both A's key and B's key. This is the first time The ACK carries both A's key and B's key. This is the first time
that A's key is seen on the wire, although it is expected that A will that A's key is seen on the wire, although it is expected that A will
have generated a key locally before the initial SYN. The echoing of have generated a key locally before the initial SYN. The echoing of
B's key allows B to operate statelessly, as described above. B's key allows B to operate statelessly, as described above.
Therefore, A's key must be delivered reliably to B, and in order to Therefore, A's key must be delivered reliably to B, and in order to
do this, the transmission of this packet must be made reliable. do this, the transmission of this packet must be made reliable.
skipping to change at page 18, line 9 skipping to change at page 18, line 9
"G" to 0. "G" to 0.
A crypto algorithm MUST be specified. If flag bits C through H are A crypto algorithm MUST be specified. If flag bits C through H are
all 0, the MP_CAPABLE option MUST be treated as invalid and ignored all 0, the MP_CAPABLE option MUST be treated as invalid and ignored
(that is, it must be treated as a regular TCP handshake). (that is, it must be treated as a regular TCP handshake).
The selection of the authentication algorithm also impacts the The selection of the authentication algorithm also impacts the
algorithm used to generate the token and the Initial Data Sequence algorithm used to generate the token and the Initial Data Sequence
Number (IDSN). In this specification, with only the SHA-1 algorithm Number (IDSN). In this specification, with only the SHA-1 algorithm
(bit "H") specified and selected, the token MUST be a truncated (most (bit "H") specified and selected, the token MUST be a truncated (most
significant 32 bits) SHA-1 hash ([4], [18]) of the key. A different, significant 32 bits) SHA-1 hash ([sha1], [RFC6234]) of the key. A
64-bit truncation (the least significant 64 bits) of the SHA-1 hash different, 64-bit truncation (the least significant 64 bits) of the
of the key MUST be used as the IDSN. Note that the key MUST be SHA-1 hash of the key MUST be used as the IDSN. Note that the key
hashed in network byte order. Also note that the "least significant" MUST be hashed in network byte order. Also note that the "least
bits MUST be the rightmost bits of the SHA-1 digest, as per [4]. significant" bits MUST be the rightmost bits of the SHA-1 digest, as
Future specifications of the use of the crypto bits may choose to per [sha1]. Future specifications of the use of the crypto bits may
specify different algorithms for token and IDSN generation. choose to specify different algorithms for token and IDSN generation.
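As an illustration of the two truncations above, the following minimal sketch derives the token and the IDSN from a key with SHA-1. The function name token_and_idsn is hypothetical, and the key is assumed to be supplied as octets already in network byte order.

```python
import hashlib


def token_and_idsn(key: bytes):
    """Return (token, idsn) for a key given in network byte order."""
    digest = hashlib.sha1(key).digest()  # 160-bit / 20-octet digest
    # Token: most significant (leftmost) 32 bits of the digest.
    token = int.from_bytes(digest[:4], "big")
    # IDSN: least significant (rightmost) 64 bits of the digest.
    idsn = int.from_bytes(digest[-8:], "big")
    return token, idsn
```

Both values come from a single digest computation, so an implementation can compute them together when the key is generated or received.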
Both the crypto and checksum bits negotiate capabilities in similar Both the crypto and checksum bits negotiate capabilities in similar
ways. For the Checksum Required bit (labeled "A"), if either host ways. For the Checksum Required bit (labeled "A"), if either host
requires the use of checksums, checksums MUST be used. In other requires the use of checksums, checksums MUST be used. In other
words, the only way for checksums not to be used is if both hosts in words, the only way for checksums not to be used is if both hosts in
their SYNs set A=0. This decision is confirmed by the setting of the their SYNs set A=0. This decision is confirmed by the setting of the
"A" bit in the third packet (the ACK) of the handshake. For example, "A" bit in the third packet (the ACK) of the handshake. For example,
if the initiator sets A=0 in the SYN, but the responder sets A=1 in if the initiator sets A=0 in the SYN, but the responder sets A=1 in
the SYN/ACK, checksums MUST be used in both directions, and the the SYN/ACK, checksums MUST be used in both directions, and the
initiator will set A=1 in the ACK. The decision whether to use initiator will set A=1 in the ACK. The decision whether to use
skipping to change at page 20, line 17 skipping to change at page 20, line 17
algorithm. An MP_JOIN option is present in the SYN, SYN/ACK, and ACK algorithm. An MP_JOIN option is present in the SYN, SYN/ACK, and ACK
of the three-way handshake, although in each case with a different of the three-way handshake, although in each case with a different
format. format.
In the first MP_JOIN on the SYN packet, illustrated in Figure 5, the In the first MP_JOIN on the SYN packet, illustrated in Figure 5, the
initiator sends a token, random number, and address ID. initiator sends a token, random number, and address ID.
The token is used to identify the MPTCP connection and is a The token is used to identify the MPTCP connection and is a
cryptographic hash of the receiver's key, as exchanged in the initial cryptographic hash of the receiver's key, as exchanged in the initial
MP_CAPABLE handshake (Section 3.1). In this specification, the MP_CAPABLE handshake (Section 3.1). In this specification, the
tokens presented in this option are generated by the SHA-1 ([4], tokens presented in this option are generated by the SHA-1 ([sha1],
[18]) algorithm, truncated to the most significant 32 bits. The [RFC6234]) algorithm, truncated to the most significant 32 bits. The
token included in the MP_JOIN option is the token that the receiver token included in the MP_JOIN option is the token that the receiver
of the packet uses to identify this connection; i.e., Host A will of the packet uses to identify this connection; i.e., Host A will
send Token-B (which is generated from Key-B). Note that the hash send Token-B (which is generated from Key-B). Note that the hash
generation algorithm can be overridden by the choice of cryptographic generation algorithm can be overridden by the choice of cryptographic
handshake algorithm, as defined in Section 3.1. handshake algorithm, as defined in Section 3.1.
The MP_JOIN SYN sends not only the token (which is static for a The MP_JOIN SYN sends not only the token (which is static for a
connection) but also random numbers (nonces) that are used to prevent connection) but also random numbers (nonces) that are used to prevent
replay attacks on the authentication method. Recommendations for the replay attacks on the authentication method. Recommendations for the
generation of random numbers for this purpose are given in [16]. generation of random numbers for this purpose are given in [RFC4086].
The MP_JOIN option includes an "Address ID". This is an identifier The MP_JOIN option includes an "Address ID". This is an identifier
that only has significance within a single connection, where it that only has significance within a single connection, where it
identifies the source address of this packet, even if the IP header identifies the source address of this packet, even if the IP header
has been changed in transit by a middlebox. The Address ID allows has been changed in transit by a middlebox. The Address ID allows
address removal (Section 3.4.2) without needing to know what the address removal (Section 3.4.2) without needing to know what the
source address at the receiver is, thus allowing address removal source address at the receiver is, thus allowing address removal
through NATs. The Address ID also allows correlation between new through NATs. The Address ID also allows correlation between new
subflow setup attempts and address signaling (Section 3.4.1), to subflow setup attempts and address signaling (Section 3.4.1), to
prevent setting up duplicate subflows on the same path, if an MP_JOIN prevent setting up duplicate subflows on the same path, if an MP_JOIN
skipping to change at page 21, line 45 skipping to change at page 21, line 45
that the 32-bit token in the MP_JOIN SYN gives sufficient protection that the 32-bit token in the MP_JOIN SYN gives sufficient protection
against blind state exhaustion attacks; therefore, there is no need against blind state exhaustion attacks; therefore, there is no need
to provide mechanisms to allow a responder to operate statelessly at to provide mechanisms to allow a responder to operate statelessly at
the MP_JOIN stage. the MP_JOIN stage.
An HMAC is sent by both hosts -- by the initiator (Host A) in the An HMAC is sent by both hosts -- by the initiator (Host A) in the
third packet (the ACK) and by the responder (Host B) in the second third packet (the ACK) and by the responder (Host B) in the second
packet (the SYN/ACK). Doing the HMAC exchange at this stage allows packet (the SYN/ACK). Doing the HMAC exchange at this stage allows
both hosts to have first exchanged random data (in the first two SYN both hosts to have first exchanged random data (in the first two SYN
packets) that is used as the "message". This specification defines packets) that is used as the "message". This specification defines
that HMAC as defined in [11] is used, along with the SHA-1 hash that HMAC as defined in [RFC2104] is used, along with the SHA-1 hash
algorithm [4] (potentially implemented as in [18]), thus generating a algorithm [sha1] (potentially implemented as in [RFC6234]), thus
160-bit / 20-octet HMAC. Due to option space limitations, the HMAC generating a 160-bit / 20-octet HMAC. Due to option space
included in the SYN/ACK is truncated to the leftmost 64 bits, but limitations, the HMAC included in the SYN/ACK is truncated to the
this is acceptable since random numbers are used; thus, an attacker leftmost 64 bits, but this is acceptable since random numbers are
only has one chance to guess the HMAC correctly (if the HMAC is used; thus, an attacker only has one chance to guess the HMAC
incorrect, the TCP connection is closed, so a new MP_JOIN negotiation correctly (if the HMAC is incorrect, the TCP connection is closed, so
with a new random number is required). a new MP_JOIN negotiation with a new random number is required).
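The HMAC computation and truncation described above can be sketched as follows. The function names are hypothetical. The key is the sender's key followed by the peer's key, as this document states for MP_JOIN and ADD_ADDR. The message is assumed here to be the sender's random number followed by the peer's random number; that ordering is taken from the full handshake description and is not spelled out in the paragraph above.

```python
import hashlib
import hmac


def mp_join_hmac(local_key, remote_key, local_nonce, remote_nonce):
    """Full 160-bit HMAC-SHA-1 as sent by the local host.

    Key: local key followed by remote key.
    Message (assumed ordering): local nonce followed by remote nonce.
    """
    return hmac.new(local_key + remote_key,
                    local_nonce + remote_nonce,
                    hashlib.sha1).digest()


def synack_hmac(key_b, key_a, nonce_b, nonce_a):
    """HMAC for the SYN/ACK, truncated to the leftmost 64 bits."""
    return mp_join_hmac(key_b, key_a, nonce_b, nonce_a)[:8]
```

The initiator's ACK would carry the untruncated 20-octet value from mp_join_hmac, computed with its own key and random number first.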
The initiator's authentication information is sent in its first ACK The initiator's authentication information is sent in its first ACK
(the third packet of the handshake), as shown in Figure 7. This data (the third packet of the handshake), as shown in Figure 7. This data
needs to be sent reliably, since it is the only time this HMAC is needs to be sent reliably, since it is the only time this HMAC is
sent; therefore, receipt of this packet MUST trigger a regular TCP sent; therefore, receipt of this packet MUST trigger a regular TCP
ACK in response, and the packet MUST be retransmitted if this ACK is ACK in response, and the packet MUST be retransmitted if this ACK is
not received. In other words, sending the ACK/MP_JOIN packet places not received. In other words, sending the ACK/MP_JOIN packet places
the subflow in the PRE_ESTABLISHED state, and it moves to the the subflow in the PRE_ESTABLISHED state, and it moves to the
ESTABLISHED state only on receipt of an ACK from the receiver. It is ESTABLISHED state only on receipt of an ACK from the receiver. It is
not permitted to send data while in the PRE_ESTABLISHED state. The not permitted to send data while in the PRE_ESTABLISHED state. The
skipping to change at page 26, line 22 skipping to change at page 26, line 22
3.3.1. Data Sequence Mapping 3.3.1. Data Sequence Mapping
The data stream as a whole can be reassembled through the use of the The data stream as a whole can be reassembled through the use of the
data sequence mapping components of the DSS option (Figure 9), which data sequence mapping components of the DSS option (Figure 9), which
define the mapping from the subflow sequence number to the data define the mapping from the subflow sequence number to the data
sequence number. This is used by the receiver to ensure in-order sequence number. This is used by the receiver to ensure in-order
delivery to the application layer. Meanwhile, the subflow-level delivery to the application layer. Meanwhile, the subflow-level
sequence numbers (i.e., the regular sequence numbers in the TCP sequence numbers (i.e., the regular sequence numbers in the TCP
header) have subflow-only relevance. It is expected (but not header) have subflow-only relevance. It is expected (but not
mandated) that SACK [13] is used at the subflow level to improve mandated) that SACK [RFC2018] is used at the subflow level to improve
efficiency. efficiency.
The data sequence mapping specifies a mapping from subflow sequence The data sequence mapping specifies a mapping from subflow sequence
space to data sequence space. This is expressed in terms of starting space to data sequence space. This is expressed in terms of starting
sequence numbers for the subflow and the data level, and a length of sequence numbers for the subflow and the data level, and a length of
bytes for which this mapping is valid. This explicit mapping for a bytes for which this mapping is valid. This explicit mapping for a
range of data was chosen rather than per-packet signaling to assist range of data was chosen rather than per-packet signaling to assist
with compatibility with situations where TCP/IP segmentation or with compatibility with situations where TCP/IP segmentation or
coalescing is undertaken separately from the stack that is generating coalescing is undertaken separately from the stack that is generating
the data flow (e.g., through the use of TCP segmentation offloading the data flow (e.g., through the use of TCP segmentation offloading
skipping to change at page 27, line 17 skipping to change at page 27, line 17
The data sequence mapping also contains a checksum of the data that The data sequence mapping also contains a checksum of the data that
this mapping covers, if use of checksums has been negotiated at the this mapping covers, if use of checksums has been negotiated at the
MP_CAPABLE exchange. Checksums are used to detect if the payload has MP_CAPABLE exchange. Checksums are used to detect if the payload has
been adjusted in any way by a non-MPTCP-aware middlebox. If this been adjusted in any way by a non-MPTCP-aware middlebox. If this
checksum fails, it will trigger a failure of the subflow, or a checksum fails, it will trigger a failure of the subflow, or a
fallback to regular TCP, as documented in Section 3.8, since MPTCP fallback to regular TCP, as documented in Section 3.8, since MPTCP
can no longer reliably know the subflow sequence space at the can no longer reliably know the subflow sequence space at the
receiver to build data sequence mappings. receiver to build data sequence mappings.
The checksum algorithm used is the standard TCP checksum [1], The checksum algorithm used is the standard TCP checksum [RFC0793],
operating over the data covered by this mapping, along with a pseudo- operating over the data covered by this mapping, along with a pseudo-
header as shown in Figure 10. header as shown in Figure 10.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+--------------------------------------------------------------+ +--------------------------------------------------------------+
| | | |
| Data Sequence Number (8 octets) | | Data Sequence Number (8 octets) |
| | | |
+--------------------------------------------------------------+ +--------------------------------------------------------------+
skipping to change at page 28, line 34 skipping to change at page 28, line 34
numbers is not required, then an implementation MAY include just the numbers is not required, then an implementation MAY include just the
lower 32 bits of the data sequence number in the data sequence lower 32 bits of the data sequence number in the data sequence
mapping and/or Data ACK as an optimization, and an implementation can mapping and/or Data ACK as an optimization, and an implementation can
make this choice independently for each packet. An implementation make this choice independently for each packet. An implementation
MUST be able to receive and process both 64-bit and 32-bit sequence MUST be able to receive and process both 64-bit and 32-bit sequence
number values, but it is not required that an implementation is able number values, but it is not required that an implementation is able
to send both. to send both.
An implementation MUST send the full 64-bit data sequence number if An implementation MUST send the full 64-bit data sequence number if
it is transmitting at a sufficiently high rate that the 32-bit value it is transmitting at a sufficiently high rate that the 32-bit value
could wrap within the Maximum Segment Lifetime (MSL) [19]. The could wrap within the Maximum Segment Lifetime (MSL) [RFC1323]. The
lengths of the DSNs used in these values (which may be different) are lengths of the DSNs used in these values (which may be different) are
declared with flags in the DSS option. Implementations MUST accept a declared with flags in the DSS option. Implementations MUST accept a
32-bit DSN and implicitly promote it to a 64-bit quantity by 32-bit DSN and implicitly promote it to a 64-bit quantity by
incrementing the upper 32 bits of sequence number each time the lower incrementing the upper 32 bits of sequence number each time the lower
32 bits wrap. A sanity check MUST be implemented to ensure that a 32 bits wrap. A sanity check MUST be implemented to ensure that a
wrap occurs at an expected time (e.g., the sequence number jumps from wrap occurs at an expected time (e.g., the sequence number jumps from
a very high number to a very low number) and is not triggered by out- a very high number to a very low number) and is not triggered by out-
of-order packets. of-order packets.
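One way to promote a 32-bit DSN while guarding against out-of-order packets is to pick the 64-bit value closest to the last known 64-bit DSN. The sketch below assumes that approach; it is not mandated by this document, and the function name promote_dsn is hypothetical.

```python
MOD32 = 1 << 32
HALF32 = 1 << 31


def promote_dsn(last_dsn64: int, dsn32: int) -> int:
    """Expand a 32-bit DSN to the 64-bit value nearest to last_dsn64."""
    # Start from the upper 32 bits of the last known 64-bit DSN.
    candidate = (last_dsn64 & ~(MOD32 - 1)) | dsn32
    if candidate + HALF32 < last_dsn64:
        # Lower bits jumped from very high to very low: a forward wrap.
        candidate += MOD32
    elif candidate - HALF32 > last_dsn64 and candidate >= MOD32:
        # Lower bits jumped from very low to very high: an old,
        # out-of-order packet from before the most recent wrap.
        candidate -= MOD32
    return candidate
```

With this rule, a late packet that arrives just after a wrap is mapped back into the previous 32-bit epoch instead of incrementing the upper 32 bits a second time.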
As with the standard TCP sequence number, the data sequence number As with the standard TCP sequence number, the data sequence number
skipping to change at page 35, line 11 skipping to change at page 35, line 11
For example, a highly asymmetric path may be misdiagnosed as For example, a highly asymmetric path may be misdiagnosed as
underperforming. A RST for this purpose SHOULD be accompanied with underperforming. A RST for this purpose SHOULD be accompanied with
an appropriate MP_TCPRST option (Section 3.6). an appropriate MP_TCPRST option (Section 3.6).
3.3.7. Congestion Control Considerations 3.3.7. Congestion Control Considerations
Different subflows in an MPTCP connection have different congestion Different subflows in an MPTCP connection have different congestion
windows. To achieve fairness at bottlenecks and resource pooling, it windows. To achieve fairness at bottlenecks and resource pooling, it
is necessary to couple the congestion windows in use on each subflow, is necessary to couple the congestion windows in use on each subflow,
in order to push most traffic to uncongested links. One algorithm in order to push most traffic to uncongested links. One algorithm
for achieving this is presented in [6]; the algorithm does not for achieving this is presented in [RFC6356]; the algorithm does not
achieve perfect resource pooling but is "safe" in that it is readily achieve perfect resource pooling but is "safe" in that it is readily
deployable in the current Internet. By this, we mean that it does deployable in the current Internet. By this, we mean that it does
not take up more capacity on any one path than if it was a single not take up more capacity on any one path than if it was a single
path flow using only that route, so this ensures fair coexistence path flow using only that route, so this ensures fair coexistence
with single-path TCP at shared bottlenecks. with single-path TCP at shared bottlenecks.
It is foreseeable that different congestion controllers will be It is foreseeable that different congestion controllers will be
implemented for MPTCP, each aiming to achieve different properties in implemented for MPTCP, each aiming to achieve different properties in
the resource pooling/fairness/stability design space, as well as the resource pooling/fairness/stability design space, as well as
those for achieving different properties in quality of service, those for achieving different properties in quality of service,
skipping to change at page 35, line 37 skipping to change at page 35, line 37
for each subflow, which packets were lost and when. for each subflow, which packets were lost and when.
3.3.8. Subflow Policy 3.3.8. Subflow Policy
Within a local MPTCP implementation, a host may use any local policy Within a local MPTCP implementation, a host may use any local policy
it wishes to decide how to share the traffic to be sent over the it wishes to decide how to share the traffic to be sent over the
available paths. available paths.
In the typical use case, where the goal is to maximize throughput, In the typical use case, where the goal is to maximize throughput,
all available paths will be used simultaneously for data transfer, all available paths will be used simultaneously for data transfer,
using coupled congestion control as described in [6]. It is using coupled congestion control as described in [RFC6356]. It is
expected, however, that other use cases will appear. expected, however, that other use cases will appear.
For instance, a possibility is an 'all-or-nothing' approach, i.e., For instance, a possibility is an 'all-or-nothing' approach, i.e.,
have a second path ready for use in the event of failure of the first have a second path ready for use in the event of failure of the first
path, but alternatives could include entirely saturating one path path, but alternatives could include entirely saturating one path
before using an additional path (the 'overflow' case). Such choices before using an additional path (the 'overflow' case). Such choices
would be most likely based on the monetary cost of links, but may would be most likely based on the monetary cost of links, but may
also be based on properties such as the delay or jitter of links, also be based on properties such as the delay or jitter of links,
where stability (of delay or bandwidth) is more important than where stability (of delay or bandwidth) is more important than
throughput. Application requirements such as these are discussed in throughput. Application requirements such as these are discussed in
detail in [7]. detail in [RFC6897].
The ability to make effective choices at the sender requires full The ability to make effective choices at the sender requires full
knowledge of the path "cost", which is unlikely to be the case. It knowledge of the path "cost", which is unlikely to be the case. It
would be desirable for a receiver to be able to signal their own would be desirable for a receiver to be able to signal their own
preferences for paths, since they will often be the multihomed party, preferences for paths, since they will often be the multihomed party,
and may have to pay for metered incoming bandwidth. and may have to pay for metered incoming bandwidth.
Whilst fine-grained control may be the most powerful solution, that Whilst fine-grained control may be the most powerful solution, that
would require some mechanism such as overloading the Explicit would require some mechanism such as overloading the Explicit
Congestion Notification (ECN) signal [20], which is undesirable, and Congestion Notification (ECN) signal [RFC3168], which is undesirable,
it is felt that there would not be sufficient benefit to justify an and it is felt that there would not be sufficient benefit to justify
entirely new signal. Therefore, the MP_JOIN option (see Section 3.2) an entirely new signal. Therefore, the MP_JOIN option (see
contains the 'B' bit, which allows a host to indicate to its peer Section 3.2) contains the 'B' bit, which allows a host to indicate to
that this path should be treated as a backup path to use only in the its peer that this path should be treated as a backup path to use
event of failure of other working subflows (i.e., a subflow where the only in the event of failure of other working subflows (i.e., a
receiver has indicated B=1 SHOULD NOT be used to send data unless subflow where the receiver has indicated B=1 SHOULD NOT be used to
there are no usable subflows where B=0). send data unless there are no usable subflows where B=0).
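The backup rule above can be expressed as a simple filter applied before a scheduler picks a subflow. This is a minimal sketch; the Subflow class, its fields, and the function eligible_subflows are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Subflow:
    name: str
    backup: bool         # True if the peer signaled B=1 for this subflow
    usable: bool = True  # False if the subflow is currently failed


def eligible_subflows(subflows):
    """Return the subflows on which data should currently be sent."""
    usable = [s for s in subflows if s.usable]
    regular = [s for s in usable if not s.backup]
    # Backup subflows are used only when no usable B=0 subflow remains.
    return regular if regular else usable
```

How traffic is shared among the returned subflows remains a matter of local policy.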
In the event that the available set of paths changes, a host may wish In the event that the available set of paths changes, a host may wish
to signal a change in priority of subflows to the peer (e.g., a to signal a change in priority of subflows to the peer (e.g., a
subflow that was previously set as backup should now take priority subflow that was previously set as backup should now take priority
over all remaining subflows). Therefore, the MP_PRIO option, shown over all remaining subflows). Therefore, the MP_PRIO option, shown
in Figure 11, can be used to change the 'B' flag of the subflow on in Figure 11, can be used to change the 'B' flag of the subflow on
which it is sent. which it is sent.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
skipping to change at page 37, line 11 skipping to change at page 37, line 11
to its peer that an address is temporarily unavailable (for example, to its peer that an address is temporarily unavailable (for example,
if it has radio coverage issues) and the peer should therefore drop if it has radio coverage issues) and the peer should therefore drop
to backup state on all subflows using that Address ID. to backup state on all subflows using that Address ID.
3.4. Address Knowledge Exchange (Path Management) 3.4. Address Knowledge Exchange (Path Management)
We use the term "path management" to refer to the exchange of We use the term "path management" to refer to the exchange of
information about additional paths between hosts, which in this information about additional paths between hosts, which in this
design is managed by multiple addresses at hosts. For more detail of design is managed by multiple addresses at hosts. For more detail of
the architectural thinking behind this design, see the MPTCP the architectural thinking behind this design, see the MPTCP
Architecture document [2]. Architecture document [RFC6182].
This design makes use of two methods of sharing such information, and This design makes use of two methods of sharing such information, and
both can be used on a connection. The first is the direct setup of both can be used on a connection. The first is the direct setup of
new subflows, already described in Section 3.2, where the initiator new subflows, already described in Section 3.2, where the initiator
has an additional address. The second method, described in the has an additional address. The second method, described in the
following subsections, signals addresses explicitly to the other host following subsections, signals addresses explicitly to the other host
to allow it to initiate new subflows. The two mechanisms are to allow it to initiate new subflows. The two mechanisms are
complementary: the first is implicit and simple, while the explicit complementary: the first is implicit and simple, while the explicit
is more complex but is more robust. Together, the mechanisms allow is more complex but is more robust. Together, the mechanisms allow
addresses to change in flight (and thus support operation through addresses to change in flight (and thus support operation through
skipping to change at page 38, line 19 skipping to change at page 38, line 19
instance, signaling addresses in other address families can only be instance, signaling addresses in other address families can only be
done explicitly using the Add Address option. done explicitly using the Add Address option.
3.4.1. Address Advertisement 3.4.1. Address Advertisement
The Add Address (ADD_ADDR) MPTCP option announces additional The Add Address (ADD_ADDR) MPTCP option announces additional
addresses (and optionally, ports) on which a host can be reached addresses (and optionally, ports) on which a host can be reached
(Figure 12). This option can be used at any time during a (Figure 12). This option can be used at any time during a
connection, depending on when the sender wishes to enable multiple connection, depending on when the sender wishes to enable multiple
paths and/or when paths become available. As with all MPTCP signals, paths and/or when paths become available. As with all MPTCP signals,
the receiver MUST undertake standard TCP validity checks, e.g. [12], the receiver MUST undertake standard TCP validity checks, e.g.
before acting upon it. [RFC5961], before acting upon it.
Every address has an Address ID that can be used for uniquely Every address has an Address ID that can be used for uniquely
identifying the address within a connection for address removal. identifying the address within a connection for address removal.
This is also used to identify MP_JOIN options (see Section 3.2) This is also used to identify MP_JOIN options (see Section 3.2)
relating to the same address, even when address translators are in relating to the same address, even when address translators are in
use. The Address ID MUST uniquely identify the address to the sender use. The Address ID MUST uniquely identify the address to the sender
(within the scope of the connection), but the mechanism for (within the scope of the connection), but the mechanism for
allocating such IDs is implementation specific. allocating such IDs is implementation specific.
All address IDs learned via either MP_JOIN or ADD_ADDR SHOULD be All address IDs learned via either MP_JOIN or ADD_ADDR SHOULD be
skipping to change at page 39, line 14 skipping to change at page 39, line 14
the explicit specification of a different port is required. If no the explicit specification of a different port is required. If no
port is specified, MPTCP SHOULD attempt to connect to the specified port is specified, MPTCP SHOULD attempt to connect to the specified
address on the same port as is already in use by the subflow on which address on the same port as is already in use by the subflow on which
the ADD_ADDR signal was sent; this is discussed in more detail in the ADD_ADDR signal was sent; this is discussed in more detail in
Section 3.10. Section 3.10.
The Truncated HMAC present in this Option is the rightmost 64 bits of The Truncated HMAC present in this Option is the rightmost 64 bits of
an HMAC, negotiated and calculated in the same way as for MP_JOIN as an HMAC, negotiated and calculated in the same way as for MP_JOIN as
described in Section 3.2. For this specification of MPTCP, as there described in Section 3.2. For this specification of MPTCP, as there
is only one hash algorithm option specified, this will be HMAC as is only one hash algorithm option specified, this will be HMAC as
defined in [11], using the SHA-1 hash algorithm [4], implemented as defined in [RFC2104], using the SHA-1 hash algorithm [sha1],
in [18]. In the same way as for MP_JOIN, the key for the HMAC implemented as in [RFC6234]. In the same way as for MP_JOIN, the key
algorithm, in the case of the message transmitted by Host A, will be for the HMAC algorithm, in the case of the message transmitted by
Key-A followed by Key-B, and in the case of Host B, Key-B followed by Host A, will be Key-A followed by Key-B, and in the case of Host B,
Key-A. These are the keys that were exchanged in the original Key-B followed by Key-A. These are the keys that were exchanged in
MP_CAPABLE handshake. The message for the HMAC is the Address ID, IP the original MP_CAPABLE handshake. The message for the HMAC is the
Address, and Port which precede the HMAC in the ADD_ADDR option. If Address ID, IP Address, and Port which precede the HMAC in the
the port is not present in the ADD_ADDR option, the HMAC message will ADD_ADDR option. If the port is not present in the ADD_ADDR option,
nevertheless include two octets of value zero. The rationale for the the HMAC message will nevertheless include two octets of value zero.
HMAC is to prevent unauthorized entities from injecting ADD_ADDR The rationale for the HMAC is to prevent unauthorized entities from
signals in an attempt to hijack a connection. Note that additionally injecting ADD_ADDR signals in an attempt to hijack a connection.
the presence of this HMAC prevents the address being changed in Note that additionally the presence of this HMAC prevents the address
flight unless the key is known by an intermediary. If a host being changed in flight unless the key is known by an intermediary.
receives an ADD_ADDR option for which it cannot validate the HMAC, it If a host receives an ADD_ADDR option for which it cannot validate
SHOULD silently ignore the option. the HMAC, it SHOULD silently ignore the option.
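
As an illustration of the HMAC computation described above, the following Python sketch may help. It assumes the message is the one-octet Address ID, the packed IP address, and a two-octet port in network byte order (zero when the port is absent), and that the key is the sender's key followed by the receiver's key. The function names add_addr_hmac and verify_add_addr are hypothetical and are not defined by this specification.

```python
import hashlib
import hmac
import ipaddress
import struct


def add_addr_hmac(sender_key: bytes, receiver_key: bytes,
                  address_id: int, ip: str, port: int = 0) -> bytes:
    """Return the 8-octet Truncated HMAC for an ADD_ADDR option (sketch).

    Assumption: sender_key and receiver_key are the keys exchanged in the
    MP_CAPABLE handshake (Key-A then Key-B when Host A sends).
    """
    key = sender_key + receiver_key
    # Message: Address ID, IP Address, Port; port is two zero octets if absent.
    message = (struct.pack("!B", address_id)
               + ipaddress.ip_address(ip).packed
               + struct.pack("!H", port))
    digest = hmac.new(key, message, hashlib.sha1).digest()  # 20 octets
    return digest[-8:]  # rightmost 64 bits


def verify_add_addr(sender_key: bytes, receiver_key: bytes,
                    address_id: int, ip: str, port: int,
                    received_hmac: bytes) -> bool:
    """Recompute the HMAC at the receiver; False means ignore the option."""
    expected = add_addr_hmac(sender_key, receiver_key, address_id, ip, port)
    return hmac.compare_digest(expected, received_hmac)
```

A receiver that obtains False from verify_add_addr would, per the text above, silently ignore the option. The constant-time comparison is a design choice of this sketch, not a requirement stated in the text.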
A set of four flags is present after the subtype and before the A set of four flags is present after the subtype and before the
Address ID. These are currently unassigned and MUST be set to zero Address ID. These are currently unassigned and MUST be set to zero
by a sender and MUST be ignored by the receiver. by a sender and MUST be ignored by the receiver.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+-------+---------------+ +---------------+---------------+-------+-------+---------------+
| Kind | Length |Subtype|(resvd)| Address ID | | Kind | Length |Subtype|(resvd)| Address ID |
+---------------+---------------+-------+-------+---------------+ +---------------+---------------+-------+-------+---------------+
skipping to change at page 39, line 51 skipping to change at page 39, line 51
| Port (2 octets, optional) | | | Port (2 octets, optional) | |
+-------------------------------+ | +-------------------------------+ |
| Truncated HMAC (8 octets) | | Truncated HMAC (8 octets) |
| +-------------------------------+ | +-------------------------------+
| | | |
+-------------------------------+ +-------------------------------+
Figure 12: Add Address (ADD_ADDR) Option Figure 12: Add Address (ADD_ADDR) Option
Due to the proliferation of NATs, it is reasonably likely that one Due to the proliferation of NATs, it is reasonably likely that one
host may attempt to advertise private addresses [21]. It is not host may attempt to advertise private addresses [RFC1918]. It is not
desirable to prohibit this, since there may be cases where both hosts desirable to prohibit this, since there may be cases where both hosts
have additional interfaces on the same private network, and a host have additional interfaces on the same private network, and a host
MAY want to advertise such addresses. The MP_JOIN handshake to MAY want to advertise such addresses. The MP_JOIN handshake to
create a new subflow (Section 3.2) provides mechanisms to minimize create a new subflow (Section 3.2) provides mechanisms to minimize
security risks. The MP_JOIN message contains a 32-bit token that security risks. The MP_JOIN message contains a 32-bit token that
uniquely identifies the connection to the receiving host. If the uniquely identifies the connection to the receiving host. If the
token is unknown, the host will return with a RST. In the unlikely token is unknown, the host will return with a RST. In the unlikely
event that the token is known, subflow setup will continue, but the event that the token is known, subflow setup will continue, but the
HMAC exchange must occur for authentication. This will fail, and HMAC exchange must occur for authentication. This will fail, and
will provide sufficient protection against two unconnected hosts will provide sufficient protection against two unconnected hosts
skipping to change at page 41, line 9 skipping to change at page 41, line 9
attempt on a previously advertised address/port combination can attempt on a previously advertised address/port combination can
therefore refresh ADD_ADDR information by sending the option again. therefore refresh ADD_ADDR information by sending the option again.
During normal MPTCP operation, it is unlikely that there will be During normal MPTCP operation, it is unlikely that there will be
sufficient TCP option space for ADD_ADDR to be included along with sufficient TCP option space for ADD_ADDR to be included along with
those for data sequence numbering (Section 3.3.1). Therefore, it is those for data sequence numbering (Section 3.3.1). Therefore, it is
expected that an MPTCP implementation will send the ADD_ADDR option expected that an MPTCP implementation will send the ADD_ADDR option
on separate ACKs. As discussed earlier, however, an MPTCP on separate ACKs. As discussed earlier, however, an MPTCP
implementation MUST NOT treat duplicate ACKs with any MPTCP option, implementation MUST NOT treat duplicate ACKs with any MPTCP option,
with the exception of the DSS option, as indications of congestion with the exception of the DSS option, as indications of congestion
[14], and an MPTCP implementation SHOULD NOT send more than two [RFC5681], and an MPTCP implementation SHOULD NOT send more than two
duplicate ACKs in a row for signaling purposes. duplicate ACKs in a row for signaling purposes.
3.4.2. Remove Address 3.4.2. Remove Address
If, during the lifetime of an MPTCP connection, a previously If, during the lifetime of an MPTCP connection, a previously
announced address becomes invalid (e.g., if the interface announced address becomes invalid (e.g., if the interface
disappears), the affected host SHOULD announce this so that the peer disappears), the affected host SHOULD announce this so that the peer
can remove subflows related to this address. can remove subflows related to this address.
This is achieved through the Remove Address (REMOVE_ADDR) option This is achieved through the Remove Address (REMOVE_ADDR) option
(Figure 13), which will remove a previously added address (or list of (Figure 13), which will remove a previously added address (or list of
addresses) from a connection and terminate any subflows currently addresses) from a connection and terminate any subflows currently
using that address. using that address.
For security purposes, if a host receives a REMOVE_ADDR option, it For security purposes, if a host receives a REMOVE_ADDR option, it
must ensure the affected path(s) are no longer in use before it must ensure the affected path(s) are no longer in use before it
instigates closure. The receipt of REMOVE_ADDR SHOULD first trigger instigates closure. The receipt of REMOVE_ADDR SHOULD first trigger
the sending of a TCP keepalive [22] on the path, and if a response is the sending of a TCP keepalive [RFC1122] on the path, and if a
received the path SHOULD NOT be removed. Typical TCP validity tests response is received the path SHOULD NOT be removed. Typical TCP
on the subflow (e.g., ensuring sequence and ACK numbers are correct) validity tests on the subflow (e.g., ensuring sequence and ACK
MUST also be undertaken. An implementation can use indications of numbers are correct) MUST also be undertaken. An implementation can
these test failures as part of intrusion detection or error logging. use indications of these test failures as part of intrusion detection
or error logging.
The sending and receipt (if no keepalive response was received) of The sending and receipt (if no keepalive response was received) of
this message SHOULD trigger the sending of RSTs by both hosts on the this message SHOULD trigger the sending of RSTs by both hosts on the
affected subflow(s) (if possible), as a courtesy to help clean up affected subflow(s) (if possible), as a courtesy to help clean up
middlebox state, before cleaning up any local state. middlebox state, before cleaning up any local state.
Address removal is undertaken by ID, so as to permit the use of NATs Address removal is undertaken by ID, so as to permit the use of NATs
and other middleboxes that rewrite source addresses. If there is no and other middleboxes that rewrite source addresses. If there is no
address at the requested ID, the receiver will silently ignore the address at the requested ID, the receiver will silently ignore the
request. request.
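
The receiver-side handling of REMOVE_ADDR described above could be sketched roughly as follows. This is a minimal illustration only: the subflow table, the keepalive_answered callback, and the function name handle_remove_addr are all hypothetical, and a real implementation would also perform the TCP validity tests mentioned in the text.

```python
from typing import Callable, Dict, List


def handle_remove_addr(address_id: int,
                       subflows_by_id: Dict[int, List[str]],
                       keepalive_answered: Callable[[str], bool]) -> List[str]:
    """Return the subflows that should be reset after a REMOVE_ADDR (sketch).

    Assumption: subflows_by_id maps an Address ID to the subflows using it,
    and keepalive_answered(subflow) sends a TCP keepalive on that subflow
    and reports whether a response arrived.
    """
    subflows = subflows_by_id.get(address_id)
    if subflows is None:
        # No address at the requested ID: silently ignore the request.
        return []
    # A subflow that still answers keepalives SHOULD NOT be removed.
    return [sf for sf in subflows if not keepalive_answered(sf)]
```

The caller would then send a RST on each returned subflow, if possible, before cleaning up local state.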
skipping to change at page 42, line 29 skipping to change at page 42, line 29
remaining subflows. MPTCP's connection will stay alive at the data remaining subflows. MPTCP's connection will stay alive at the data
level, in order to permit break-before-make handover between level, in order to permit break-before-make handover between
subflows. It is therefore necessary to provide an MPTCP-level subflows. It is therefore necessary to provide an MPTCP-level
"reset" to allow the abrupt closure of the whole MPTCP connection, "reset" to allow the abrupt closure of the whole MPTCP connection,
and this is the MP_FASTCLOSE option. and this is the MP_FASTCLOSE option.
MP_FASTCLOSE is used to indicate to the peer that the connection will MP_FASTCLOSE is used to indicate to the peer that the connection will
be abruptly closed and no data will be accepted anymore. The reasons be abruptly closed and no data will be accepted anymore. The reasons
for triggering an MP_FASTCLOSE are implementation specific. Regular for triggering an MP_FASTCLOSE are implementation specific. Regular
TCP does not allow sending a RST while the connection is in a TCP does not allow sending a RST while the connection is in a
synchronized state [1]. Nevertheless, implementations allow the synchronized state [RFC0793]. Nevertheless, implementations allow
sending of a RST in this state, if, for example, the operating system the sending of a RST in this state, if, for example, the operating
is running out of resources. In these cases, MPTCP should send the system is running out of resources. In these cases, MPTCP should
MP_FASTCLOSE. This option is illustrated in Figure 14. send the MP_FASTCLOSE. This option is illustrated in Figure 14.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+-----------------------+ +---------------+---------------+-------+-----------------------+
| Kind | Length |Subtype| (reserved) | | Kind | Length |Subtype| (reserved) |
+---------------+---------------+-------+-----------------------+ +---------------+---------------+-------+-----------------------+
| Option Receiver's Key | | Option Receiver's Key |
| (64 bits) | | (64 bits) |
| | | |
+---------------------------------------------------------------+ +---------------------------------------------------------------+
skipping to change at page 45, line 31 skipping to change at page 45, line 31
+---------------+---------------+-------+-------+---------------+ +---------------+---------------+-------+-------+---------------+
| Kind | Length |Subtype|S|U|rsv| Experiment | | Kind | Length |Subtype|S|U|rsv| Experiment |
+---------------+---------------+-------+-------+---------------+ +---------------+---------------+-------+-------+---------------+
| Id. (16 bits) | Subtype-specific data (variable length) ... | Id. (16 bits) | Subtype-specific data (variable length) ...
+----------------------------------------------------------- ... +----------------------------------------------------------- ...
Figure 16: MPTCP Experimental (MP_EXPERIMENTAL) Option Figure 16: MPTCP Experimental (MP_EXPERIMENTAL) Option
Figure 16 shows the format of the experimental option. The Figure 16 shows the format of the experimental option. The
Experiment identifier is a 16-bit integer that shall be assigned by Experiment identifier is a 16-bit integer that shall be assigned by
using the same procedure as defined in [23]. using the same procedure as defined in [RFC6994]; a request to IANA
is made in Section 8.4.
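
The layout in Figure 16 can be decoded along the following lines. This Python sketch reflects one reading of the figure: a 4-bit subtype followed by a 4-bit flags field whose two highest-order bits are S and U, then a 16-bit Experiment identifier in network byte order. The names ExperimentalOption and parse_mp_experimental are hypothetical.

```python
import struct
from typing import NamedTuple


class ExperimentalOption(NamedTuple):
    kind: int
    length: int
    subtype: int
    s_flag: bool
    u_flag: bool
    experiment_id: int
    data: bytes


def parse_mp_experimental(option: bytes) -> ExperimentalOption:
    """Decode the fixed header of an MP_EXPERIMENTAL option (sketch)."""
    if len(option) < 5:
        raise ValueError("option too short for the fixed header")
    kind, length, sub_flags, experiment_id = struct.unpack("!BBBH", option[:5])
    if length != len(option):
        raise ValueError("Length field does not match option size")
    return ExperimentalOption(
        kind=kind,
        length=length,
        subtype=sub_flags >> 4,           # upper four bits
        s_flag=bool(sub_flags & 0x08),    # highest-order flag bit
        u_flag=bool(sub_flags & 0x04),    # second highest-order flag bit
        experiment_id=experiment_id,
        data=option[5:],                  # subtype-specific data
    )
```

The sketch does not check the Kind or Subtype values, since those depend on the assignments made elsewhere in the document.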
The two high order flags that are included in the MPTCP Experimental The two high order flags that are included in the MPTCP Experimental
option have the following semantics: option have the following semantics:
o "S" flag (highest order bit) : This is the synchronising bit. o "S" flag (highest order bit) : This is the synchronising bit.
When set to 1, it indicates that the host sending this option When set to 1, it indicates that the host sending this option
expects a reply from the remote host with an option having the expects a reply from the remote host with an option having the
same experiment identifier, but possibly containing other data. same experiment identifier, but possibly containing other data.
o "U" flag (second highest order bit) : When set to 1, this flag o "U" flag (second highest order bit) : When set to 1, this flag
skipping to change at page 51, line 25 skipping to change at page 51, line 25
This strategy is intended to maximize the probability of the SYN This strategy is intended to maximize the probability of the SYN
being permitted by a firewall or NAT at the recipient and to avoid being permitted by a firewall or NAT at the recipient and to avoid
confusing any network monitoring software. confusing any network monitoring software.
There may also be cases, however, where the passive opener wishes to There may also be cases, however, where the passive opener wishes to
signal to the other host that a specific port should be used, and signal to the other host that a specific port should be used, and
this facility is provided in the Add Address option as documented in this facility is provided in the Add Address option as documented in
Section 3.4.1. It is therefore feasible to allow multiple subflows Section 3.4.1. It is therefore feasible to allow multiple subflows
between the same two addresses but using different port pairs, and between the same two addresses but using different port pairs, and
such a facility could be used to allow load balancing within the such a facility could be used to allow load balancing within the
network based on 5-tuples (e.g., some ECMP implementations [8]). network based on 5-tuples (e.g., some ECMP implementations
[RFC2992]).
3.10.2. Delayed Subflow Start and Subflow Symmetry 3.10.2. Delayed Subflow Start and Subflow Symmetry
Many TCP connections are short-lived and consist only of a few Many TCP connections are short-lived and consist only of a few
segments, and so the overheads of using MPTCP outweigh any benefits. segments, and so the overheads of using MPTCP outweigh any benefits.
A heuristic is required, therefore, to decide when to start using A heuristic is required, therefore, to decide when to start using
additional subflows in an MPTCP connection. We expect that additional subflows in an MPTCP connection. We expect that
experience gathered from deployments will provide further guidance on experience gathered from deployments will provide further guidance on
this, and will be affected by particular application characteristics this, and will be affected by particular application characteristics
(which are likely to change over time). However, a suggested (which are likely to change over time). However, a suggested
skipping to change at page 54, line 32 skipping to change at page 54, line 33
per-connection local policy. Adding an address to one connection per-connection local policy. Adding an address to one connection
(either explicitly through an Add Address message, or implicitly (either explicitly through an Add Address message, or implicitly
through a Join) has no implication for other connections between through a Join) has no implication for other connections between
the same pair of hosts. the same pair of hosts.
5-tuple: The 5-tuple (protocol, local address, local port, remote 5-tuple: The 5-tuple (protocol, local address, local port, remote
address, remote port) presented by kernel APIs to the application address, remote port) presented by kernel APIs to the application
layer in a non-multipath-aware application is that of the first layer in a non-multipath-aware application is that of the first
subflow, even if the subflow has since been closed and removed subflow, even if the subflow has since been closed and removed
from the connection. This decision, and other related API issues, from the connection. This decision, and other related API issues,
are discussed in more detail in [7]. are discussed in more detail in [RFC6897].
5. Security Considerations 5. Security Considerations
As identified in [10], the addition of multipath capability to TCP As identified in [RFC6181], the addition of multipath capability to
will bring with it a number of new classes of threat. In order to TCP will bring with it a number of new classes of threat. In order
prevent these, [2] presents a set of requirements for a security to prevent these, [RFC6182] presents a set of requirements for a
solution for MPTCP. The fundamental goal is for the security of security solution for MPTCP. The fundamental goal is for the
MPTCP to be "no worse" than regular TCP today, and the key security security of MPTCP to be "no worse" than regular TCP today, and the
requirements are: key security requirements are:
o Provide a mechanism to confirm that the parties in a subflow o Provide a mechanism to confirm that the parties in a subflow
handshake are the same as in the original connection setup. handshake are the same as in the original connection setup.
o Provide verification that the peer can receive traffic at a new o Provide verification that the peer can receive traffic at a new
address before using it as part of a connection. address before using it as part of a connection.
o Provide replay protection, i.e., ensure that a request to add/ o Provide replay protection, i.e., ensure that a request to add/
remove a subflow is 'fresh'. remove a subflow is 'fresh'.
skipping to change at page 55, line 22 skipping to change at page 55, line 22
cryptographic material, future subflows use a truncated cryptographic cryptographic material, future subflows use a truncated cryptographic
hash of this key as the connection identification "token". The keys hash of this key as the connection identification "token". The keys
are concatenated and used as keys for creating Hash-based Message are concatenated and used as keys for creating Hash-based Message
Authentication Codes (HMACs) used on subflow setup, in order to Authentication Codes (HMACs) used on subflow setup, in order to
verify that the parties in the handshake are the same as in the verify that the parties in the handshake are the same as in the
original connection setup. It also provides verification that the original connection setup. It also provides verification that the
peer can receive traffic at this new address. Replay attacks would peer can receive traffic at this new address. Replay attacks would
still be possible when only keys are used; therefore, the handshakes still be possible when only keys are used; therefore, the handshakes
use single-use random numbers (nonces) at both ends -- this ensures use single-use random numbers (nonces) at both ends -- this ensures
the HMAC will never be the same on two handshakes. Guidance on the HMAC will never be the same on two handshakes. Guidance on
generating random numbers suitable for use as keys is given in [16] generating random numbers suitable for use as keys is given in
and discussed in Section 3.1. [RFC4086] and discussed in Section 3.1.
The use of crypto capability bits in the initial connection handshake The use of crypto capability bits in the initial connection handshake
to negotiate use of a particular algorithm allows the deployment of to negotiate use of a particular algorithm allows the deployment of
additional crypto mechanisms in the future. Note that this would be additional crypto mechanisms in the future. Note that this would be
susceptible to bid-down attacks only if the attacker was on-path (and susceptible to bid-down attacks only if the attacker was on-path (and
thus would be able to modify the data anyway). The security thus would be able to modify the data anyway). The security
mechanism presented in this document should therefore protect against mechanism presented in this document should therefore protect against
all forms of flooding and hijacking attacks discussed in [10]. all forms of flooding and hijacking attacks discussed in [RFC6181].
During normal operation, regular TCP protection mechanisms (such as During normal operation, regular TCP protection mechanisms (such as
ensuring sequence numbers are in-window) will provide the same level ensuring sequence numbers are in-window) will provide the same level
of protection against attacks on individual TCP subflows as exists of protection against attacks on individual TCP subflows as exists
for regular TCP today. Implementations will introduce additional for regular TCP today. Implementations will introduce additional
buffers compared to regular TCP, to reassemble data at the connection buffers compared to regular TCP, to reassemble data at the connection
level. The application of window sizing will minimize the risk of level. The application of window sizing will minimize the risk of
denial-of-service attacks consuming resources. denial-of-service attacks consuming resources.
As discussed in Section 3.4.1, a host may advertise its private As discussed in Section 3.4.1, a host may advertise its private
skipping to change at page 56, line 10 skipping to change at page 56, line 10
implementations should consider heuristics (Section 3.10) at both the implementations should consider heuristics (Section 3.10) at both the
sender and receiver to reduce the impact of this. sender and receiver to reduce the impact of this.
A small security risk could theoretically exist with key reuse, but A small security risk could theoretically exist with key reuse, but
in order to accomplish a replay attack, both the sender and receiver in order to accomplish a replay attack, both the sender and receiver
keys, and the sender and receiver random numbers, in the MP_JOIN keys, and the sender and receiver random numbers, in the MP_JOIN
handshake (Section 3.2) would have to match. handshake (Section 3.2) would have to match.
Whilst this specification defines a "medium" security solution, Whilst this specification defines a "medium" security solution,
meeting the criteria specified at the start of this section and the meeting the criteria specified at the start of this section and the
threat analysis ([10]), since attacks only ever get worse, it is threat analysis ([RFC6181]), since attacks only ever get worse, it is
likely that a future Standards Track version of MPTCP would need to likely that a future Standards Track version of MPTCP would need to
be able to support stronger security. There are several ways the be able to support stronger security. There are several ways the
security of MPTCP could potentially be improved; some of these would security of MPTCP could potentially be improved; some of these would
be compatible with MPTCP as defined in this document, whilst others be compatible with MPTCP as defined in this document, whilst others
may not be. For now, the best approach is to get experience with the may not be. For now, the best approach is to get experience with the
current approach, establish what might work, and check that the current approach, establish what might work, and check that the
threat analysis is still accurate. threat analysis is still accurate.
Possible ways of improving MPTCP security could include: Possible ways of improving MPTCP security could include:
o defining a new MPTCP cryptographic algorithm, as negotiated in o defining a new MPTCP cryptographic algorithm, as negotiated in
MP_CAPABLE. A sub-case could be to include an additional MP_CAPABLE. A sub-case could be to include an additional
deployment assumption, such as stateful servers, in order to allow deployment assumption, such as stateful servers, in order to allow
a more powerful algorithm to be used. a more powerful algorithm to be used.
o defining how to secure data transfer with MPTCP, whilst not o defining how to secure data transfer with MPTCP, whilst not
changing the signaling part of the protocol. changing the signaling part of the protocol.
o defining security that requires more option space, perhaps in o defining security that requires more option space, perhaps in
conjunction with a "long options" proposal for extending the TCP conjunction with a "long options" proposal for extending the TCP
options space (such as those surveyed in [24]), or perhaps options space (such as those surveyed in [TCPLO]), or perhaps
building on the current approach with a second stage of MPTCP- building on the current approach with a second stage of MPTCP-
option-based security. option-based security.
o revisiting the working group's decision to exclusively use TCP o revisiting the working group's decision to exclusively use TCP
options for MPTCP signaling, and instead look at also making use options for MPTCP signaling, and instead look at also making use
of the TCP payloads. of the TCP payloads.
MPTCP has been designed with several methods available to indicate a MPTCP has been designed with several methods available to indicate a
new security mechanism, including: new security mechanism, including:
skipping to change at page 58, line 50 skipping to change at page 58, line 50
of some Data ACKs, but performance will degrade as the fraction of of some Data ACKs, but performance will degrade as the fraction of
stripped options increases. We do not expect such cases to appear in stripped options increases. We do not expect such cases to appear in
practice, though: most middleboxes will either strip all options or practice, though: most middleboxes will either strip all options or
let them all through. let them all through.
We end this section with a list of middlebox classes, their behavior, We end this section with a list of middlebox classes, their behavior,
and the elements in the MPTCP design that allow operation through and the elements in the MPTCP design that allow operation through
such middleboxes. Issues surrounding dropping packets with options such middleboxes. Issues surrounding dropping packets with options
or stripping options were discussed above, and are not included here: or stripping options were discussed above, and are not included here:
o NATs [25] (Network Address (and Port) Translators) change the o NATs [RFC3022] (Network Address (and Port) Translators) change the
source address (and often source port) of packets. This means source address (and often source port) of packets. This means
that a host will not know its public-facing address for signaling that a host will not know its public-facing address for signaling
in MPTCP. Therefore, MPTCP permits implicit address addition via in MPTCP. Therefore, MPTCP permits implicit address addition via
the MP_JOIN option, and the handshake mechanism ensures that the MP_JOIN option, and the handshake mechanism ensures that
connection attempts to private addresses [21] do not cause connection attempts to private addresses [RFC1918] do not cause
problems. Explicit address removal is undertaken by an Address ID problems. Explicit address removal is undertaken by an Address ID
so that no knowledge of the source address is needed. so that no knowledge of the source address is needed.
o Performance Enhancing Proxies (PEPs) [26] might proactively ACK o Performance Enhancing Proxies (PEPs) [RFC3135] might proactively
data to increase performance. MPTCP, however, relies on accurate ACK data to increase performance. MPTCP, however, relies on
congestion control signals from the end host, and non-MPTCP-aware accurate congestion control signals from the end host, and non-
PEPs will not be able to provide such signals. MPTCP will, MPTCP-aware PEPs will not be able to provide such signals. MPTCP
therefore, fall back to single-path TCP, or close the problematic will, therefore, fall back to single-path TCP, or close the
subflow (see Section 3.8). problematic subflow (see Section 3.8).
o Traffic Normalizers [27] may not allow holes in sequence numbers, o Traffic Normalizers [norm] may not allow holes in sequence
and may cache packets and retransmit the same data. MPTCP looks numbers, and may cache packets and retransmit the same data.
like standard TCP on the wire, and will not retransmit different MPTCP looks like standard TCP on the wire, and will not retransmit
data on the same subflow sequence number. In the event of a different data on the same subflow sequence number. In the event
retransmission, the same data will be retransmitted on the of a retransmission, the same data will be retransmitted on the
original TCP subflow even if it is additionally retransmitted at original TCP subflow even if it is additionally retransmitted at
the connection level on a different subflow. the connection level on a different subflow.
o Firewalls [28] might perform initial sequence number randomization o Firewalls [RFC2979] might perform initial sequence number
on TCP connections. MPTCP uses relative sequence numbers in data randomization on TCP connections. MPTCP uses relative sequence
sequence mapping to cope with this. Like NATs, firewalls will not numbers in data sequence mapping to cope with this. Like NATs,
permit many incoming connections, so MPTCP supports address firewalls will not permit many incoming connections, so MPTCP
signaling (ADD_ADDR) so that a multiaddressed host can invite its supports address signaling (ADD_ADDR) so that a multiaddressed
peer behind the firewall/NAT to connect out to its additional host can invite its peer behind the firewall/NAT to connect out to
interface. its additional interface.
o Intrusion Detection Systems look out for traffic patterns and o Intrusion Detection Systems look out for traffic patterns and
content that could threaten a network. Multipath will mean that content that could threaten a network. Multipath will mean that
such data is potentially spread, so it is more difficult for an such data is potentially spread, so it is more difficult for an
IDS to analyze the whole traffic, and potentially increases the IDS to analyze the whole traffic, and potentially increases the
risk of false positives. However, for an MPTCP-aware IDS, tokens risk of false positives. However, for an MPTCP-aware IDS, tokens
can be read by such systems to correlate multiple subflows and can be read by such systems to correlate multiple subflows and
reassemble for analysis. reassemble for analysis.
o Application-level middleboxes such as content-aware firewalls may o Application-level middleboxes such as content-aware firewalls may
skipping to change at page 60, line 37 skipping to change at page 60, line 37
The authors also wish to acknowledge reviews and contributions from The authors also wish to acknowledge reviews and contributions from
Iljitsch van Beijnum, Lars Eggert, Marcelo Bagnulo, Robert Hancock, Iljitsch van Beijnum, Lars Eggert, Marcelo Bagnulo, Robert Hancock,
Pasi Sarolahti, Toby Moncaster, Philip Eardley, Sergio Lembo, Pasi Sarolahti, Toby Moncaster, Philip Eardley, Sergio Lembo,
Lawrence Conroy, Yoshifumi Nishida, Bob Briscoe, Stein Gjessing, Lawrence Conroy, Yoshifumi Nishida, Bob Briscoe, Stein Gjessing,
Andrew McGregor, Georg Hampel, Anumita Biswas, Wes Eddy, Alexey Andrew McGregor, Georg Hampel, Anumita Biswas, Wes Eddy, Alexey
Melnikov, Francis Dupont, Adrian Farrel, Barry Leiba, Robert Sparks, Melnikov, Francis Dupont, Adrian Farrel, Barry Leiba, Robert Sparks,
Sean Turner, Stephen Farrell, Martin Stiemerling, and Gregory Detal. Sean Turner, Stephen Farrell, Martin Stiemerling, and Gregory Detal.
8. IANA Considerations 8. IANA Considerations
This document updates [5] and as such IANA is requested to update the This document updates [RFC6824] and as such IANA is requested to
TCP option space registry to point to this document for Multipath update the TCP option space registry to point to this document for
TCP, as follows: Multipath TCP, as follows:
+------+--------+-----------------------+---------------+ +------+--------+-----------------------+---------------+
| Kind | Length | Meaning | Reference | | Kind | Length | Meaning | Reference |
+------+--------+-----------------------+---------------+ +------+--------+-----------------------+---------------+
| 30 | N | Multipath TCP (MPTCP) | This document | | 30 | N | Multipath TCP (MPTCP) | This document |
+------+--------+-----------------------+---------------+ +------+--------+-----------------------+---------------+
Table 1: TCP Option Kind Numbers Table 1: TCP Option Kind Numbers
8.1. MPTCP Option Subtypes
The 4-bit MPTCP subtype sub-registry ("MPTCP Option Subtypes" under The 4-bit MPTCP subtype sub-registry ("MPTCP Option Subtypes" under
the "Transmission Control Protocol (TCP) Parameters" registry) was the "Transmission Control Protocol (TCP) Parameters" registry) was
defined in [5]. This document defines one additional subtype defined in [RFC6824]. This document defines one additional subtype
(ADD_ADDR) and updates the references to this document for all sub- (ADD_ADDR) and updates the references to this document for all sub-
types except ADD_ADDR, which is deprecated. The updates are listed types except ADD_ADDR, which is deprecated. The updates are listed
in the following table. in the following table.
+-------+-----------------+-------------------------+---------------+ +-------+-----------------+-------------------------+---------------+
| Value | Symbol | Name | Reference | | Value | Symbol | Name | Reference |
+-------+-----------------+-------------------------+---------------+ +-------+-----------------+-------------------------+---------------+
| 0x0 | MP_CAPABLE | Multipath Capable | This | | 0x0 | MP_CAPABLE | Multipath Capable | This |
| | | | document, | | | | | document, |
| | | | Section 3.1 | | | | | Section 3.1 |
skipping to change at page 61, line 46 skipping to change at page 62, line 5
| | | | Section 3.6 | | | | | Section 3.6 |
| 0xf | MP_EXPERIMENTAL | MPTCP Experimental | This | | 0xf | MP_EXPERIMENTAL | MPTCP Experimental | This |
| | | Option | document, | | | | Option | document, |
| | | | Section 3.7 | | | | | Section 3.7 |
+-------+-----------------+-------------------------+---------------+ +-------+-----------------+-------------------------+---------------+
Table 2: MPTCP Option Subtypes Table 2: MPTCP Option Subtypes
Values 0x9 through 0xe are currently unassigned. Values 0x9 through 0xe are currently unassigned.
8.2. MPTCP Handshake Algorithms
IANA has created another sub-registry, "MPTCP Handshake Algorithms" IANA has created another sub-registry, "MPTCP Handshake Algorithms"
under the "Transmission Control Protocol (TCP) Parameters" registry, under the "Transmission Control Protocol (TCP) Parameters" registry,
based on the flags in MP_CAPABLE (Section 3.1). IANA is requested to based on the flags in MP_CAPABLE (Section 3.1). IANA is requested to
update the references of this table to this document, as follows: update the references of this table to this document, as follows:
+----------+-------------------+----------------------------+ +----------+-------------------+----------------------------+
| Flag Bit | Meaning | Reference | | Flag Bit | Meaning | Reference |
+----------+-------------------+----------------------------+ +----------+-------------------+----------------------------+
| A | Checksum required | This document, Section 3.1 | | A | Checksum required | This document, Section 3.1 |
| B | Extensibility | This document, Section 3.1 | | B | Extensibility | This document, Section 3.1 |
skipping to change at page 62, line 21 skipping to change at page 62, line 28
| H | HMAC-SHA1 | This document, Section 3.2 | | H | HMAC-SHA1 | This document, Section 3.2 |
+----------+-------------------+----------------------------+ +----------+-------------------+----------------------------+
Table 3: MPTCP Handshake Algorithms Table 3: MPTCP Handshake Algorithms
Note that the meanings of bits C through H can be dependent upon bit Note that the meanings of bits C through H can be dependent upon bit
B, depending on how Extensibility is defined in future B, depending on how Extensibility is defined in future
specifications; see Section 3.1 for more information. specifications; see Section 3.1 for more information.
Future assignments in this registry are also to be defined by Future assignments in this registry are also to be defined by
Standards Action as defined by [29]. Assignments consist of the Standards Action as defined by [RFC5226]. Assignments consist of the
value of the flags, a symbolic name for the algorithm, and a value of the flags, a symbolic name for the algorithm, and a
reference to its specification. reference to its specification.
8.3. MP_TCPRST Reason Codes
IANA is requested to create a further sub-registry, "MP_TCPRST Reason IANA is requested to create a further sub-registry, "MP_TCPRST Reason
Codes" under the "Transmission Control Protocol (TCP) Parameters" Codes" under the "Transmission Control Protocol (TCP) Parameters"
registry, based on the reason code in MP_TCPRST (Section 3.6). The registry, based on the reason code in MP_TCPRST (Section 3.6). The
contents of this sub-registry are as follows: contents of this sub-registry are as follows:
+------+-----------------------------+----------------------------+ +------+-----------------------------+----------------------------+
| Code | Meaning | Reference | | Code | Meaning | Reference |
+------+-----------------------------+----------------------------+ +------+-----------------------------+----------------------------+
| 0x00 | Unspecified TCP error | This document, Section 3.6 | | 0x00 | Unspecified TCP error | This document, Section 3.6 |
| 0x01 | MPTCP specific error | This document, Section 3.6 | | 0x01 | MPTCP specific error | This document, Section 3.6 |
| 0x02 | Lack of resources | This document, Section 3.6 | | 0x02 | Lack of resources | This document, Section 3.6 |
| 0x03 | Administratively prohibited | This document, Section 3.6 | | 0x03 | Administratively prohibited | This document, Section 3.6 |
| 0x04 | Too much outstanding data | This document, Section 3.6 | | 0x04 | Too much outstanding data | This document, Section 3.6 |
| 0x05 | Unacceptable performance | This document, Section 3.6 | | 0x05 | Unacceptable performance | This document, Section 3.6 |
| 0x06 | Middlebox interference | This document, Section 3.6 | | 0x06 | Middlebox interference | This document, Section 3.6 |
+------+-----------------------------+----------------------------+ +------+-----------------------------+----------------------------+
Table 4: MPTCP MP_TCPRST Reason Codes Table 4: MPTCP MP_TCPRST Reason Codes
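
As a non-normative illustration, an implementation might carry the reason codes of Table 4 as a simple lookup table. The sketch below is in Python; the names MP_TCPRST_REASONS and describe_reason are hypothetical and do not appear in this document, and the handling of unassigned codes is an assumption made only for the example.

```python
# Non-normative sketch: MP_TCPRST reason codes as listed in Table 4.
# The dictionary and function names are illustrative assumptions.
MP_TCPRST_REASONS = {
    0x00: "Unspecified TCP error",
    0x01: "MPTCP specific error",
    0x02: "Lack of resources",
    0x03: "Administratively prohibited",
    0x04: "Too much outstanding data",
    0x05: "Unacceptable performance",
    0x06: "Middlebox interference",
}


def describe_reason(code: int) -> str:
    """Return the Table 4 meaning for a reason code, or mark it unassigned."""
    return MP_TCPRST_REASONS.get(code, f"Unassigned (0x{code:02x})")
```

A receiver could call describe_reason() when logging a received MP_TCPRST; codes outside Table 4 are reported as unassigned rather than rejected.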
8.4. Experimental option registry
Section 3.7 has defined the MP_EXPERIMENTAL option for private,
experimental MPTCP options, and the same considerations as for
[RFC6994] apply. IANA should create a "Multipath TCP Experimental
Option Identifiers (MPTCP ExIDs)" sub-registry. This registry
contains the 16-bit ExIDs and a reference (description, document
pointer, or assignee name and e-mail contact) for each entry. MPTCP
ExIDs are assigned on a First Come, First Served (FCFS) basis
[RFC5226].
IANA will advise applicants of duplicate entries to select an
alternate value, as per typical FCFS processing.
IANA will record known duplicate uses to assist the community in both
debugging assigned uses as well as correcting unauthorized duplicate
uses.
IANA should impose no requirement on making a registration other than
indicating the desired codepoint and providing a point of contact. A
short description or acronym for the use is desired but should not be
required.
9. References 9. References
9.1. Normative References 9.1. Normative References
[1] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, [RFC0793] Postel, J., "Transmission Control Protocol", STD 7,
DOI 10.17487/RFC0793, September 1981, RFC 793, DOI 10.17487/RFC0793, September 1981,
<http://www.rfc-editor.org/info/rfc793>. <http://www.rfc-editor.org/info/rfc793>.
[2] Ford, A., Raiciu, C., Handley, M., Barre, S., and J. Iyengar, [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
"Architectural Guidelines for Multipath TCP Development", Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/
RFC 6182, DOI 10.17487/RFC6182, March 2011, RFC2119, March 1997,
<http://www.rfc-editor.org/info/rfc6182>. <http://www.rfc-editor.org/info/rfc2119>.
[3] Bradner, S., "Key words for use in RFCs to Indicate Requirement [RFC6182] Ford, A., Raiciu, C., Handley, M., Barre, S., and J.
Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, Iyengar, "Architectural Guidelines for Multipath TCP
<http://www.rfc-editor.org/info/rfc2119>. Development", RFC 6182, DOI 10.17487/RFC6182, March 2011,
<http://www.rfc-editor.org/info/rfc6182>.
[4] National Institute of Standards and Technology, "Secure Hash [sha1] National Institute of Standards and Technology, "Secure Hash
Standard", Federal Information Processing Standard Standard", Federal Information Processing Standard
(FIPS) 180-3, October 2008, <http://csrc.nist.gov/publications/ (FIPS) 180-3, October 2008, <http://csrc.nist.gov/
fips/fips180-3/fips180-3_final.pdf>. publications/fips/fips180-3/fips180-3_final.pdf>.
9.2. Informative References 9.2. Informative References
[5] Ford, A., Raiciu, C., Handley, M., and O. Bonaventure, "TCP [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts -
Extensions for Multipath Operation with Multiple Addresses", Communication Layers", STD 3, RFC 1122, DOI 10.17487/
RFC 6824, DOI 10.17487/RFC6824, January 2013, RFC1122, October 1989,
<http://www.rfc-editor.org/info/rfc6824>. <http://www.rfc-editor.org/info/rfc1122>.
[6] Raiciu, C., Handley, M., and D. Wischik, "Coupled Congestion [RFC1323] Jacobson, V., Braden, R., and D. Borman, "TCP Extensions
Control for Multipath Transport Protocols", RFC 6356, for High Performance", RFC 1323, DOI 10.17487/RFC1323,
DOI 10.17487/RFC6356, October 2011, May 1992, <http://www.rfc-editor.org/info/rfc1323>.
<http://www.rfc-editor.org/info/rfc6356>.
[7] Scharf, M. and A. Ford, "Multipath TCP (MPTCP) Application [RFC1918] Rekhter, Y., Moskowitz, B., Karrenberg, D., de Groot, G.,
Interface Considerations", RFC 6897, DOI 10.17487/RFC6897, and E. Lear, "Address Allocation for Private Internets",
March 2013, <http://www.rfc-editor.org/info/rfc6897>. BCP 5, RFC 1918, DOI 10.17487/RFC1918, February 1996,
<http://www.rfc-editor.org/info/rfc1918>.
[8] Hopps, C., "Analysis of an Equal-Cost Multi-Path Algorithm", [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
RFC 2992, DOI 10.17487/RFC2992, November 2000, Selective Acknowledgment Options", RFC 2018, DOI 10.17487/
<http://www.rfc-editor.org/info/rfc2992>. RFC2018, October 1996,
<http://www.rfc-editor.org/info/rfc2018>.
[9] Raiciu, C., Paasch, C., Barre, S., Ford, A., Honda, M., [RFC2104] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-
Duchene, F., Bonaventure, O., and M. Handley, "How Hard Can It Hashing for Message Authentication", RFC 2104,
Be? Designing and Implementing a Deployable Multipath TCP", DOI 10.17487/RFC2104, February 1997,
Usenix Symposium on Networked Systems Design and <http://www.rfc-editor.org/info/rfc2104>.
Implementation 2012, <https://www.usenix.org/conference/nsdi12/
how-hard-can-it-be-designing-and-implementing-deployable-
multipath-tcp>.
[10] Bagnulo, M., "Threat Analysis for TCP Extensions for Multipath [RFC2979] Freed, N., "Behavior of and Requirements for Internet
Operation with Multiple Addresses", RFC 6181, DOI 10.17487/ Firewalls", RFC 2979, DOI 10.17487/RFC2979, October 2000,
RFC6181, March 2011, <http://www.rfc-editor.org/info/rfc6181>. <http://www.rfc-editor.org/info/rfc2979>.
[11] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-Hashing [RFC2992] Hopps, C., "Analysis of an Equal-Cost Multi-Path
for Message Authentication", RFC 2104, DOI 10.17487/RFC2104, Algorithm", RFC 2992, DOI 10.17487/RFC2992, November 2000,
February 1997, <http://www.rfc-editor.org/info/rfc2104>. <http://www.rfc-editor.org/info/rfc2992>.
[12] Ramaiah, A., Stewart, R., and M. Dalal, "Improving TCP's [RFC3022] Srisuresh, P. and K. Egevang, "Traditional IP Network
Robustness to Blind In-Window Attacks", RFC 5961, DOI 10.17487/ Address Translator (Traditional NAT)", RFC 3022,
RFC5961, August 2010, <http://www.rfc-editor.org/info/rfc5961>. DOI 10.17487/RFC3022, January 2001,
<http://www.rfc-editor.org/info/rfc3022>.
[13] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP [RFC3135] Border, J., Kojo, M., Griner, J., Montenegro, G., and Z.
Selective Acknowledgment Options", RFC 2018, DOI 10.17487/ Shelby, "Performance Enhancing Proxies Intended to
RFC2018, October 1996, Mitigate Link-Related Degradations", RFC 3135,
<http://www.rfc-editor.org/info/rfc2018>. DOI 10.17487/RFC3135, June 2001,
<http://www.rfc-editor.org/info/rfc3135>.
[14] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, of Explicit Congestion Notification (ECN) to IP",
<http://www.rfc-editor.org/info/rfc5681>. RFC 3168, DOI 10.17487/RFC3168, September 2001,
<http://www.rfc-editor.org/info/rfc3168>.
[15] Gont, F. and S. Bellovin, "Defending against Sequence Number [RFC4086] Eastlake 3rd, D., Schiller, J., and S. Crocker,
Attacks", RFC 6528, DOI 10.17487/RFC6528, February 2012, "Randomness Requirements for Security", BCP 106, RFC 4086,
<http://www.rfc-editor.org/info/rfc6528>. DOI 10.17487/RFC4086, June 2005,
<http://www.rfc-editor.org/info/rfc4086>.
[16] Eastlake 3rd, D., Schiller, J., and S. Crocker, "Randomness [RFC4987] Eddy, W., "TCP SYN Flooding Attacks and Common
Requirements for Security", BCP 106, RFC 4086, DOI 10.17487/ Mitigations", RFC 4987, DOI 10.17487/RFC4987, August 2007,
RFC4086, June 2005, <http://www.rfc-editor.org/info/rfc4086>. <http://www.rfc-editor.org/info/rfc4987>.
[17] Eddy, W., "TCP SYN Flooding Attacks and Common Mitigations", [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an
RFC 4987, DOI 10.17487/RFC4987, August 2007, IANA Considerations Section in RFCs", BCP 26, RFC 5226,
<http://www.rfc-editor.org/info/rfc4987>. DOI 10.17487/RFC5226, May 2008,
<http://www.rfc-editor.org/info/rfc5226>.
[18] Eastlake 3rd, D. and T. Hansen, "US Secure Hash Algorithms (SHA [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
and SHA-based HMAC and HKDF)", RFC 6234, DOI 10.17487/RFC6234, Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
May 2011, <http://www.rfc-editor.org/info/rfc6234>. <http://www.rfc-editor.org/info/rfc5681>.
[19] Jacobson, V., Braden, R., and D. Borman, "TCP Extensions for [RFC5961] Ramaiah, A., Stewart, R., and M. Dalal, "Improving TCP's
High Performance", RFC 1323, DOI 10.17487/RFC1323, May 1992, Robustness to Blind In-Window Attacks", RFC 5961,
<http://www.rfc-editor.org/info/rfc1323>. DOI 10.17487/RFC5961, August 2010,
<http://www.rfc-editor.org/info/rfc5961>.
[20] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of [RFC6181] Bagnulo, M., "Threat Analysis for TCP Extensions for
Explicit Congestion Notification (ECN) to IP", RFC 3168, Multipath Operation with Multiple Addresses", RFC 6181,
DOI 10.17487/RFC3168, September 2001, DOI 10.17487/RFC6181, March 2011,
<http://www.rfc-editor.org/info/rfc3168>. <http://www.rfc-editor.org/info/rfc6181>.
[21] Rekhter, Y., Moskowitz, B., Karrenberg, D., de Groot, G., and [RFC6234] Eastlake 3rd, D. and T. Hansen, "US Secure Hash Algorithms
E. Lear, "Address Allocation for Private Internets", BCP 5, (SHA and SHA-based HMAC and HKDF)", RFC 6234,
RFC 1918, DOI 10.17487/RFC1918, February 1996, DOI 10.17487/RFC6234, May 2011,
<http://www.rfc-editor.org/info/rfc1918>. <http://www.rfc-editor.org/info/rfc6234>.
[22] Braden, R., Ed., "Requirements for Internet Hosts - [RFC6356] Raiciu, C., Handley, M., and D. Wischik, "Coupled
Communication Layers", STD 3, RFC 1122, DOI 10.17487/RFC1122, Congestion Control for Multipath Transport Protocols",
October 1989, <http://www.rfc-editor.org/info/rfc1122>. RFC 6356, DOI 10.17487/RFC6356, October 2011,
<http://www.rfc-editor.org/info/rfc6356>.
[23] Touch, J., "Shared Use of Experimental TCP Options", RFC 6994, [RFC6528] Gont, F. and S. Bellovin, "Defending against Sequence
DOI 10.17487/RFC6994, August 2013, Number Attacks", RFC 6528, DOI 10.17487/RFC6528,
<http://www.rfc-editor.org/info/rfc6994>. February 2012, <http://www.rfc-editor.org/info/rfc6528>.
[24] Ramaiah, A., "TCP option space extension", Work in Progress, [RFC6824] Ford, A., Raiciu, C., Handley, M., and O. Bonaventure,
March 2012. "TCP Extensions for Multipath Operation with Multiple
Addresses", RFC 6824, DOI 10.17487/RFC6824, January 2013,
<http://www.rfc-editor.org/info/rfc6824>.
[25] Srisuresh, P. and K. Egevang, "Traditional IP Network Address [RFC6897] Scharf, M. and A. Ford, "Multipath TCP (MPTCP) Application
Translator (Traditional NAT)", RFC 3022, DOI 10.17487/RFC3022, Interface Considerations", RFC 6897, DOI 10.17487/RFC6897,
January 2001, <http://www.rfc-editor.org/info/rfc3022>. March 2013, <http://www.rfc-editor.org/info/rfc6897>.
[26] Border, J., Kojo, M., Griner, J., Montenegro, G., and Z. [RFC6994] Touch, J., "Shared Use of Experimental TCP Options",
Shelby, "Performance Enhancing Proxies Intended to Mitigate RFC 6994, DOI 10.17487/RFC6994, August 2013,
Link-Related Degradations", RFC 3135, DOI 10.17487/RFC3135, <http://www.rfc-editor.org/info/rfc6994>.
June 2001, <http://www.rfc-editor.org/info/rfc3135>.
[27] Handley, M., Paxson, V., and C. Kreibich, "Network Intrusion [TCPLO] Ramaiah, A., "TCP option space extension", Work
Detection: Evasion, Traffic Normalization, and End-to-End in Progress, March 2012.
Protocol Semantics", Usenix Security 2001, 2001, <http://
www.usenix.org/events/sec01/full_papers/handley/handley.pdf>.
[28] Freed, N., "Behavior of and Requirements for Internet [howhard] Raiciu, C., Paasch, C., Barre, S., Ford, A., Honda, M.,
Firewalls", RFC 2979, DOI 10.17487/RFC2979, October 2000, Duchene, F., Bonaventure, O., and M. Handley, "How Hard
<http://www.rfc-editor.org/info/rfc2979>. Can It Be? Designing and Implementing a Deployable
Multipath TCP", Usenix Symposium on Networked Systems
Design and Implementation 2012, <https://www.usenix.org/
conference/nsdi12/
how-hard-can-it-be-designing-and-implementing-deployable-
multipath-tcp>.
[29] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA [norm] Handley, M., Paxson, V., and C. Kreibich, "Network
Considerations Section in RFCs", BCP 26, RFC 5226, Intrusion Detection: Evasion, Traffic Normalization, and
DOI 10.17487/RFC5226, May 2008, End-to-End Protocol Semantics", Usenix Security 2001,
<http://www.rfc-editor.org/info/rfc5226>. 2001, <http://www.usenix.org/events/sec01/full_papers/
handley/handley.pdf>.
Appendix A. Notes on Use of TCP Options Appendix A. Notes on Use of TCP Options
The TCP option space is limited due to the length of the Data Offset The TCP option space is limited due to the length of the Data Offset
field in the TCP header (4 bits), which defines the TCP header length field in the TCP header (4 bits), which defines the TCP header length
in 32-bit words. With the standard TCP header being 20 bytes, this in 32-bit words. With the standard TCP header being 20 bytes, this
leaves a maximum of 40 bytes for options, and many of these may leaves a maximum of 40 bytes for options, and many of these may
already be used by options such as timestamp and SACK. already be used by options such as timestamp and SACK.
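
The 40-byte figure follows directly from the field widths given above. The following non-normative Python sketch reproduces the arithmetic; the constant and function names are illustrative assumptions, not terms defined in this document.

```python
# Non-normative sketch of the option-space arithmetic described above.
DATA_OFFSET_BITS = 4      # width of the Data Offset field
WORD_BYTES = 4            # Data Offset counts 32-bit words
BASE_HEADER_BYTES = 20    # standard TCP header without options


def max_option_bytes() -> int:
    """Largest option area that the Data Offset field can describe."""
    max_header_bytes = (2 ** DATA_OFFSET_BITS - 1) * WORD_BYTES  # 15 words
    return max_header_bytes - BASE_HEADER_BYTES
```

The largest Data Offset value is 15 words, i.e., a 60-byte header, of which 20 bytes are the fixed header.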
We have performed a brief study on the commonly used TCP options in We have performed a brief study on the commonly used TCP options in
skipping to change at page 66, line 9 skipping to change at page 67, line 7
bytes) options. Together these sum to 19 bytes. Some operating bytes) options. Together these sum to 19 bytes. Some operating
systems appear to pad each option up to a word boundary, thus using systems appear to pad each option up to a word boundary, thus using
24 bytes (a brief survey suggests Windows XP and Mac OS X do this, 24 bytes (a brief survey suggests Windows XP and Mac OS X do this,
whereas Linux does not). Optimistically, therefore, we have 21 bytes whereas Linux does not). Optimistically, therefore, we have 21 bytes
spare, or 16 if it has to be word-aligned. In either case, however, spare, or 16 if it has to be word-aligned. In either case, however,
the SYN versions of Multipath Capable (12 bytes) and Join (12 or 16 the SYN versions of Multipath Capable (12 bytes) and Join (12 or 16
bytes) options will fit in this remaining space. bytes) options will fit in this remaining space.
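
The byte counts in this paragraph can be checked with a short calculation. The sketch below is non-normative Python; it assumes the usual SYN option lengths (MSS 4, Window Scale 3, SACK Permitted 2, Timestamp 10), and every name in it is hypothetical.

```python
# Non-normative sketch: spare SYN option space with and without padding.
MAX_OPTION_BYTES = 40

# Typical SYN options and their lengths in bytes (assumed for this example).
SYN_OPTION_BYTES = {
    "MSS": 4,
    "Window Scale": 3,
    "SACK Permitted": 2,
    "Timestamp": 10,
}

# SYN forms of the MPTCP options and their possible lengths.
MPTCP_SYN_OPTION_BYTES = {"MP_CAPABLE": [12], "MP_JOIN": [12, 16]}


def pad_to_word(length: int) -> int:
    """Round an option length up to the next 32-bit word boundary."""
    return (length + 3) // 4 * 4


def spare_bytes(word_aligned: bool) -> int:
    """Option bytes left after the typical SYN options are placed."""
    used = sum(
        pad_to_word(n) if word_aligned else n
        for n in SYN_OPTION_BYTES.values()
    )
    return MAX_OPTION_BYTES - used


def fits(option: str, word_aligned: bool) -> bool:
    """True if the largest form of one MPTCP SYN option fits on its own."""
    return max(MPTCP_SYN_OPTION_BYTES[option]) <= spare_bytes(word_aligned)
```

Each MPTCP option is checked on its own, since MP_CAPABLE and MP_JOIN are not carried together on the same SYN.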
Note that due to the use of a 64-bit data-level sequence space, it is Note that due to the use of a 64-bit data-level sequence space, it is
feasible that MPTCP will not require the timestamp option for feasible that MPTCP will not require the timestamp option for
protection against wrapped sequence numbers (PAWS [19]), since the protection against wrapped sequence numbers (PAWS [RFC1323]), since
data-level sequence space has far less chance of wrapping. the data-level sequence space has far less chance of wrapping.
Confirmation of the validity of this optimisation is for further Confirmation of the validity of this optimisation is for further
study. study.
TCP data packets typically carry timestamp options in every packet, TCP data packets typically carry timestamp options in every packet,
taking 10 bytes (or 12 with padding). That leaves 30 bytes (or 28, taking 10 bytes (or 12 with padding). That leaves 30 bytes (or 28,
if word-aligned). The Data Sequence Signal (DSS) option varies in if word-aligned). The Data Sequence Signal (DSS) option varies in
length depending on whether the data sequence mapping and DATA_ACK length depending on whether the data sequence mapping and DATA_ACK
are included, and whether the sequence numbers in use are 4 or 8 are included, and whether the sequence numbers in use are 4 or 8
octets. The maximum size of the DSS option is 28 bytes, so even that octets. The maximum size of the DSS option is 28 bytes, so even that
will fit in the available space. But unless a connection is both will fit in the available space. But unless a connection is both
skipping to change at page 67, line 26 skipping to change at page 68, line 23
with data in order to avoid interpretation as congestion). The cases with data in order to avoid interpretation as congestion). The cases
where options are stripped by middleboxes are discussed in Section 6. where options are stripped by middleboxes are discussed in Section 6.
Appendix B. Control Blocks Appendix B. Control Blocks
Conceptually, an MPTCP connection can be represented as an MPTCP Conceptually, an MPTCP connection can be represented as an MPTCP
control block that contains several variables that track the progress control block that contains several variables that track the progress
and the state of the MPTCP connection and a set of linked TCP control and the state of the MPTCP connection and a set of linked TCP control
blocks that correspond to the subflows that have been established. blocks that correspond to the subflows that have been established.
RFC 793 [1] specifies several state variables. Whenever possible, we RFC 793 [RFC0793] specifies several state variables. Whenever
reuse the same terminology as RFC 793 to describe the state variables possible, we reuse the same terminology as RFC 793 to describe the
that are maintained by MPTCP. state variables that are maintained by MPTCP.
B.1. MPTCP Control Block B.1. MPTCP Control Block
The MPTCP control block contains the following variables per The MPTCP control block contains the following variables per
connection. connection.
B.1.1. Authentication and Metadata B.1.1. Authentication and Metadata
Local.Token (32 bits): This is the token chosen by the local host on Local.Token (32 bits): This is the token chosen by the local host on
this MPTCP connection. The token MUST be unique among all this MPTCP connection. The token MUST be unique among all
 End of changes. 95 change blocks. 
276 lines changed or deleted 324 lines changed or added
