draft-ietf-mptcp-architecture-01.txt   draft-ietf-mptcp-architecture-02.txt 
Internet Engineering Task Force A. Ford, Ed. Internet Engineering Task Force A. Ford, Ed.
Internet-Draft Roke Manor Research Internet-Draft Roke Manor Research
Intended status: Informational C. Raiciu Intended status: Informational C. Raiciu
Expires: December 24, 2010 University College London Expires: April 19, 2011 M. Handley
S. Barre University College London
Universite catholique de
Louvain
J. Iyengar J. Iyengar
Franklin and Marshall College Franklin and Marshall College
June 22, 2010 October 16, 2010
Architectural Guidelines for Multipath TCP Development Architectural Guidelines for Multipath TCP Development
draft-ietf-mptcp-architecture-01 draft-ietf-mptcp-architecture-02
Abstract Abstract
Endpoints are often connected by multiple paths, but TCP restricts Hosts are often connected by multiple paths, but TCP restricts
communications to a single path per transport connection. Resource communications to a single path per transport connection. Resource
usage within the network would be more efficient were these multiple usage within the network would be more efficient were these multiple
paths able to be used concurrently. This should enhance user paths able to be used concurrently. This should enhance user
experience through improved resilience to network failure and higher experience through improved resilience to network failure and higher
throughput. throughput.
This document outlines architectural guidelines for the development This document outlines architectural guidelines for the development
of a Multipath Transport Protocol, with references to how these of a Multipath Transport Protocol, with references to how these
architectural components come together in the Multipath TCP (MPTCP) architectural components come together in the development of a
protocol. This document also lists certain high level design Multipath TCP protocol. This document lists certain high level
decisions that provide foundations for the MPTCP design, based upon design decisions that provide foundations for the design of the MPTCP
these architectural requirements. protocol, based upon these architectural requirements.
Status of this Memo Status of this Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on December 24, 2010. This Internet-Draft will expire on April 19, 2011.
Copyright Notice Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the Copyright (c) 2010 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
skipping to change at page 3, line 16 skipping to change at page 3, line 16
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1. Requirements Language . . . . . . . . . . . . . . . . . . 5 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 5
1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 5 1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 5
1.3. Reference Scenario . . . . . . . . . . . . . . . . . . . . 5 1.3. Reference Scenario . . . . . . . . . . . . . . . . . . . . 5
2. Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2. Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1. Functional Goals . . . . . . . . . . . . . . . . . . . . . 6 2.1. Functional Goals . . . . . . . . . . . . . . . . . . . . . 6
2.2. Compatibility Goals . . . . . . . . . . . . . . . . . . . 7 2.2. Compatibility Goals . . . . . . . . . . . . . . . . . . . 7
2.2.1. Application Compatibility . . . . . . . . . . . . . . 7 2.2.1. Application Compatibility . . . . . . . . . . . . . . 7
2.2.2. Network Compatibility . . . . . . . . . . . . . . . . 7 2.2.2. Network Compatibility . . . . . . . . . . . . . . . . 7
2.2.3. Compatibility with other network users . . . . . . . . 8 2.2.3. Compatibility with other network users . . . . . . . . 9
2.2.4. Security Goals . . . . . . . . . . . . . . . . . . . . 9
3. An Architectural Basis For MPTCP . . . . . . . . . . . . . . . 9 3. An Architectural Basis For MPTCP . . . . . . . . . . . . . . . 9
4. A Functional Decomposition of MPTCP . . . . . . . . . . . . . 10 4. A Functional Decomposition of MPTCP . . . . . . . . . . . . . 11
5. High-Level Design Decisions . . . . . . . . . . . . . . . . . 12 5. High-Level Design Decisions . . . . . . . . . . . . . . . . . 13
5.1. Sequence Numbering . . . . . . . . . . . . . . . . . . . . 12 5.1. Sequence Numbering . . . . . . . . . . . . . . . . . . . . 13
5.2. Reliability . . . . . . . . . . . . . . . . . . . . . . . 13 5.2. Reliability and Retransmissions . . . . . . . . . . . . . 14
5.3. Buffers . . . . . . . . . . . . . . . . . . . . . . . . . 14 5.3. Buffers . . . . . . . . . . . . . . . . . . . . . . . . . 15
5.4. Signalling . . . . . . . . . . . . . . . . . . . . . . . . 15 5.4. Signalling . . . . . . . . . . . . . . . . . . . . . . . . 16
5.5. Path Management . . . . . . . . . . . . . . . . . . . . . 15 5.5. Path Management . . . . . . . . . . . . . . . . . . . . . 17
5.6. Connection Identification . . . . . . . . . . . . . . . . 16 5.6. Connection Identification . . . . . . . . . . . . . . . . 18
5.7. Network Layer Compatibility . . . . . . . . . . . . . . . 16 5.7. Congestion Control . . . . . . . . . . . . . . . . . . . . 18
5.8. Congestion Control . . . . . . . . . . . . . . . . . . . . 17 5.8. Security . . . . . . . . . . . . . . . . . . . . . . . . . 19
6. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 6. Interactions with Applications . . . . . . . . . . . . . . . . 20
7. Security Considerations . . . . . . . . . . . . . . . . . . . 17 7. Interactions with Middleboxes . . . . . . . . . . . . . . . . 20
8. Interactions with Applications . . . . . . . . . . . . . . . . 17 8. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 22
9. Interactions with Middleboxes . . . . . . . . . . . . . . . . 18 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 22
10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 19 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 22
11. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 19 11. Security Considerations . . . . . . . . . . . . . . . . . . . 22
12. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19 12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 22
13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 20 12.1. Normative References . . . . . . . . . . . . . . . . . . . 22
13.1. Normative References . . . . . . . . . . . . . . . . . . . 20 12.2. Informative References . . . . . . . . . . . . . . . . . . 23
13.2. Informative References . . . . . . . . . . . . . . . . . . 20 Appendix A. Changelog . . . . . . . . . . . . . . . . . . . . . . 24
Appendix A. Implementation Architecture . . . . . . . . . . . . . 21 A.1. Changes since draft-ietf-mptcp-architecture-01 . . . . . . 24
A.1. Functional Separation . . . . . . . . . . . . . . . . . . 21 A.2. Changes since draft-ietf-mptcp-architecture-00 . . . . . . 24
A.1.1. Application to default MPTCP protocol . . . . . . . . 21 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 24
A.1.2. Generic architecture for MPTCP . . . . . . . . . . . . 24
A.2. PM/MPS interface . . . . . . . . . . . . . . . . . . . . . 25
Appendix B. Changelog . . . . . . . . . . . . . . . . . . . . . . 26
B.1. Changes since draft-ietf-mptcp-architecture-00 . . . . . . 26
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 27
1. Introduction 1. Introduction
As the Internet evolves, demands on Internet resources are ever- As the Internet evolves, demands on Internet resources are ever-
increasing, but often these resources (in particular, bandwidth) increasing, but often these resources (in particular, bandwidth)
cannot be fully utilised due to protocol constraints both on the end- cannot be fully utilised due to protocol constraints both on the end-
systems and within the network. If these resources could instead be systems and within the network. If these resources could instead be
used concurrently, end user experience could be greatly improved. used concurrently, end user experience could be greatly improved.
Such enhancements would also reduce the necessary expenditure on Such enhancements would also reduce the necessary expenditure on
network infrastructure which would otherwise be needed to create an network infrastructure that would otherwise be needed to create an
equivalent improvement in user experience. equivalent improvement in user experience.
By the application of resource pooling[2], these available resources By the application of resource pooling[3], these available resources
can be 'pooled' such that they appear as a single logical resource to can be 'pooled' such that they appear as a single logical resource to
the user. The purpose of a multipath transport, therefore, is to the user. The purpose of a multipath transport, therefore, is to
make use of multiple available paths, through resource pooling, to make use of multiple available paths, through resource pooling, to
bring two key benefits: bring two key benefits:
o To increase the resilience of the connectivity by providing o To increase the resilience of the connectivity by providing
multiple paths, protecting end hosts from the failure of one. multiple paths, protecting end hosts from the failure of one.
o To increase the efficiency of the resource usage, and thus o To increase the efficiency of the resource usage, and thus
increase the network capacity available to end hosts. increase the network capacity available to end hosts.
Multipath TCP (MPTCP)[3] is a set of extensions for TCP[4] that MPTCP [4] is a set of extensions for TCP[1] that implements a
implements a multipath transport and achieves these goals by pooling multipath transport and achieves these goals by pooling multiple
multiple paths within a transport connection, transparent to the paths within a transport connection, transparent to the application.
application. While multihoming and multipath functions have been Although multihoming and multipath functions are not new to transport
implemented in transport protocols previously, notably SCTP[5], MPTCP protocols, MPTCP aims to gain wide-scale deployment by recognising
is distinct in recognizing application and network compatibility the importance of application and network compatibility goals. These
goals that we believe are important for deployability of a multipath goals, discussed in detail in Section 2, relate to the appearance of
transport; we discuss these goals in more detail later in Section 2. MPTCP to the network (so non-MPTCP-aware entities see it as TCP) and
to the application (through providing an equivalent service to TCP to
non-MPTCP-aware applications).
This document makes three contributions: (i) it describes goals for a This document has three key purposes: (i) it describes goals for a
multipath transport - goals that MPTCP is designed to meet; (ii) it multipath transport - goals that MPTCP is designed to meet; (ii) it
lays out an architectural basis for MPTCP's design - a discussion lays out an architectural basis for MPTCP's design - a discussion
that applies to other multipath transports as well; and (iii) it that applies to other multipath transports as well; and (iii) it
discusses and documents high-level design decisions made in MPTCP's discusses and documents high-level design decisions made in MPTCP's
development, and considers their implications. development, and considers their implications.
Companion documents to this architectural overview are those which Companion documents to this architectural overview are those which
provide details of the protocol extensions[3], congestion control provide details of the protocol extensions[4], congestion control
algorithms[6], and application-level considerations[7]. Put algorithms[5], and application-level considerations[6]. Put
together, these components specify a complete Multipath TCP design. together, these components specify a complete Multipath TCP design.
We note that specific components are replaceable with other protocols We note that specific components are replaceable with other protocols
in accordance with the layer and functional decompositions discussed in accordance with the layer and functional decompositions discussed
in this document. in this document.
Please note this document is a work-in-progress and covers several
topics, some of which may be more appropriately moved to separate
documents as this work evolves.
1.1. Requirements Language 1.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [1]. document are to be interpreted as described in RFC 2119 [2].
1.2. Terminology 1.2. Terminology
Path: A sequence of links between a sender and a receiver, defined Path: A sequence of links between a sender and a receiver, defined
in this context by a source and destination address pair. in this context by a source and destination address pair.
Endpoint: A host either initiating or terminating an MPTCP Path Identifier: Within the context of a multi-addressed multipath
TCP, a path is defined by the source and destination (address,
port) pairs (i.e. a 4-tuple).
Host: An end host either initiating or terminating an MPTCP
connection. connection.
Multipath TCP (MPTCP): A modified version of the TCP [4] protocol Multipath TCP: A modified version of the TCP [1] protocol that
that supports the simultaneous use of multiple paths between supports the simultaneous use of multiple paths between hosts.
endpoints.
MPTCP: The proposed protocol extensions specified in [4] to provide
a Multipath TCP implementation.
Subflow: A flow of TCP packets operating over an individual path, Subflow: A flow of TCP packets operating over an individual path,
which forms part of a larger MPTCP connection. which forms part of a larger MPTCP connection.
MPTCP Connection: A set of one or more subflows combined to provide MPTCP Connection: A set of one or more subflows combined to provide
a single Multipath TCP service to an application at an endpoint. a single Multipath TCP service to an application at a host.
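The terminology above can be read as a small data model. The following Python sketch is illustrative only and is not part of either draft revision. The class names PathId, Subflow and MptcpConnection, their fields, and the rule that each subflow uses a distinct 4-tuple are assumptions chosen to mirror the definitions: a path identified by a 4-tuple, a subflow operating over one path, and a connection formed from one or more subflows.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class PathId:
    """Hypothetical 4-tuple identifying a path: source and destination (address, port)."""
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int


@dataclass
class Subflow:
    """A flow of TCP packets over one path, forming part of a larger connection."""
    path: PathId


@dataclass
class MptcpConnection:
    """One or more subflows presented to the application as a single service."""
    subflows: List[Subflow] = field(default_factory=list)

    def add_subflow(self, path: PathId) -> Subflow:
        # Assumption for this sketch: each subflow uses a distinct 4-tuple.
        if any(sf.path == path for sf in self.subflows):
            raise ValueError("a subflow already exists on this path")
        subflow = Subflow(path)
        self.subflows.append(subflow)
        return subflow


# Two subflows from different local addresses to the same remote address.
conn = MptcpConnection()
conn.add_subflow(PathId("A1", 50000, "B1", 80))
conn.add_subflow(PathId("A2", 50001, "B1", 80))
```

The address labels reuse A1, A2 and B1 from the reference scenario; the port numbers are arbitrary placeholders.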
1.3. Reference Scenario 1.3. Reference Scenario
The diagram shown in Figure 1 illustrates a typical usage scenario The diagram shown in Figure 1 illustrates a typical usage scenario
for MPTCP. Two hosts, A and B, are communicating with each other. for MPTCP. Two hosts, A and B, are communicating with each other.
These endpoints are multi-homed and multi-addressed, providing two These hosts are multi-homed and multi-addressed, providing two
disjoint connections to the Internet. The addresses on each endpoint disjoint connections to the Internet. The addresses on each host are
are referred to as A1, A2, B1 and B2. There are therefore up to four referred to as A1, A2, B1 and B2. There are therefore up to four
different paths between the two endpoints: A1-B1, A1-B2, A2-B1, different paths between the two hosts: A1-B1, A1-B2, A2-B1, A2-B2.
A2-B2.
+------+ __________ +------+ +------+ __________ +------+
| |A1 ______ ( ) ______ B1| | | |A1 ______ ( ) ______ B1| |
| Host |--/ ( ) \--| Host | | Host |--/ ( ) \--| Host |
| | ( Internet ) | | | | ( Internet ) | |
| A |--\______( )______/--| B | | A |--\______( )______/--| B |
| |A2 (__________) B2| | | |A2 (__________) B2| |
+------+ +------+ +------+ +------+
Figure 1: Simple MPTCP Usage Scenario Figure 1: Simple MPTCP Usage Scenario
The scenario could have any number of addresses (1 or more) on each The scenario could have any number of addresses (1 or more) on each
endpoint, so long as the number of paths available between the two host, so long as the number of paths available between the two hosts
endpoints is 2 or more (i.e. num_addr(A) * num_addr(B) > 1). The is 2 or more (i.e. num_addr(A) * num_addr(B) > 1). The paths created
paths created by these address combinations through the Internet need by these address combinations through the Internet need not be
not be entirely disjoint - shared bottlenecks will be addressed by entirely disjoint - shared bottlenecks will be addressed by the MPTCP
the MPTCP congestion controller. Furthermore, the paths through the congestion controller. Furthermore, the paths through the Internet
Internet may be interrupted by any number of middleboxes including may be interrupted by any number of middleboxes including NATs and
NATs and Firewalls. Finally, although the diagram refers to the Firewalls. Finally, although the diagram refers to the Internet,
Internet, MPTCP may be used over any network where there are multiple MPTCP may be used over any network where there are multiple paths
paths that could be used concurrently. that could be used concurrently.
TBD - what further detail here would be useful?
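The path count in the reference scenario can be checked with a short sketch. The Python below is a minimal illustration, assuming that every local address can reach every remote address; the function names enumerate_paths and multipath_possible are hypothetical and do not come from the draft. It applies the condition num_addr(A) * num_addr(B) > 1 stated above.

```python
from itertools import product
from typing import List, Sequence, Tuple


def enumerate_paths(addrs_a: Sequence[str], addrs_b: Sequence[str]) -> List[Tuple[str, str]]:
    """Return every (source, destination) address pair between the two hosts."""
    return list(product(addrs_a, addrs_b))


def multipath_possible(addrs_a: Sequence[str], addrs_b: Sequence[str]) -> bool:
    """True when at least two address pairs exist, i.e. num_addr(A) * num_addr(B) > 1."""
    return len(addrs_a) * len(addrs_b) > 1


# The scenario of Figure 1: two addresses per host gives up to four paths.
paths = enumerate_paths(["A1", "A2"], ["B1", "B2"])
```

With two addresses on each host this yields A1-B1, A1-B2, A2-B1 and A2-B2. A host pair with one address each yields a single pair, so the condition is not met.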
2. Goals 2. Goals
This section outlines primary goals that Multipath TCP aims to meet. This section outlines primary goals that Multipath TCP aims to meet.
These are broadly broken down into functional goals, which steer These are broadly broken down into: functional goals, which steer
services and features that MPTCP must provide, and compatibility services and features that Multipath TCP must provide; and
goals, which determine how MPTCP should appear to entities that compatibility goals, which determine how Multipath TCP should appear
interact with it. to entities that interact with it.
2.1. Functional Goals 2.1. Functional Goals
In providing the use of multiple paths, MPTCP has the following two In supporting the use of multiple paths, Multipath TCP has the
functional goals. following two functional goals.
o Improve Throughput: MPTCP MUST support the concurrent use of o Improve Throughput: Multipath TCP MUST support the concurrent use
multiple paths. To meet the minimum performance incentives for of multiple paths. To meet the minimum performance incentives for
deployment, an MPTCP connection over multiple paths SHOULD achieve deployment, a Multipath TCP connection over multiple paths SHOULD
no lesser throughput than a single TCP connection over the best achieve no lesser throughput than a single TCP connection over the
constituent path. best constituent path.
o Improve Resilience: MPTCP MUST support the use of multiple paths o Improve Resilience: Multipath TCP MUST support the use of multiple
interchangeably for resilience purposes, by permitting packets to paths interchangeably for resilience purposes, by permitting
be sent and re-sent on any available path. It follows that, in packets to be sent and re-sent on any available path. It follows
the worst case, the protocol MUST be no less resilient than legacy that, in the worst case, the protocol MUST be no less resilient
TCP. than regular single-path TCP.
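The throughput goal can be phrased as a simple comparison. The following Python sketch is only an illustration of that comparison, assuming that per-path single-TCP throughput figures are available from measurement; the function name and the example figures are hypothetical.

```python
from typing import Sequence


def meets_throughput_goal(multipath_throughput: float,
                          single_path_throughputs: Sequence[float]) -> bool:
    """Check the SHOULD-level goal: the multipath connection achieves no less
    throughput than a single TCP connection over the best constituent path."""
    if not single_path_throughputs:
        raise ValueError("at least one constituent path is required")
    return multipath_throughput >= max(single_path_throughputs)


# Hypothetical measurements in Mbit/s for two constituent paths.
ok = meets_throughput_goal(12.0, [10.0, 4.0])
not_ok = meets_throughput_goal(8.0, [10.0, 4.0])
```

The comparison is against the best single path, not the sum of all paths; the goal sets a floor and does not promise full aggregation.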
As distribution of traffic among available paths and responses to As distribution of traffic among available paths and responses to
congestion are done in accordance with resource pooling congestion are done in accordance with resource pooling
principles[2], a secondary effect of meeting these goals is that principles[3], a secondary effect of meeting these goals is that
widespread use of MPTCP over the Internet should optimize overall widespread use of Multipath TCP over the Internet should optimize
network utility by shifting load away from congested bottlenecks and overall network utility by shifting load away from congested
by taking advantage of spare capacity wherever possible. bottlenecks and by taking advantage of spare capacity wherever
possible.
Furthermore, MPTCP SHOULD feature automatic negotiation of its use. Furthermore, Multipath TCP SHOULD feature automatic negotiation of
A host supporting Multipath TCP that requires the other endpoint to its use. A host supporting Multipath TCP that requires the other
do so too must be able to detect reliably whether this endpoint does host to do so too must be able to detect reliably whether this host
in fact support the next-generation protocol, using it if so, and does in fact support the required extensions, using them if so, and
otherwise automatically falling back to the legacy protocol. otherwise automatically falling back to single-path TCP.
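The negotiation behaviour described above amounts to a two-way capability check with a safe default. The Python sketch below illustrates only that decision logic, assuming each side's capability is already known as a boolean; the function name and the returned mode strings are hypothetical, and the actual signalling is left to the protocol specification.

```python
def select_transport_mode(local_supports_multipath: bool,
                          peer_advertised_multipath: bool) -> str:
    """Use the multipath extensions only when both hosts support them;
    otherwise fall back to single-path TCP."""
    if local_supports_multipath and peer_advertised_multipath:
        return "multipath"
    # Any missing or undetected capability results in regular TCP behaviour.
    return "single-path"


mode_both = select_transport_mode(True, True)
mode_legacy_peer = select_transport_mode(True, False)
```

Defaulting to single-path TCP whenever support cannot be confirmed keeps the fallback automatic, as the goal requires.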
2.2. Compatibility Goals 2.2. Compatibility Goals
In addition to the functional goals listed above, a Multipath TCP In addition to the functional goals listed above, a Multipath TCP
must meet a number of compatibility goals in order to support must meet a number of compatibility goals in order to support
deployment in today's Internet. These goals fall into the following deployment in today's Internet. These goals fall into the following
categories: categories:
2.2.1. Application Compatibility 2.2.1. Application Compatibility
Application compatibility refers to the appearance of MPTCP to the Application compatibility refers to the appearance of Multipath TCP
application both in terms of the API that can be used and the to the application both in terms of the API that can be used and the
expected service model that is provided. expected service model that is provided.
MPTCP MUST follow the same service model as TCP [4]: in-order, Multipath TCP MUST follow the same service model as TCP [1]: in-
reliable, and byte-oriented delivery. Furthermore, an MPTCP order, reliable, and byte-oriented delivery. Furthermore, an
connection SHOULD provide the application with no worse throughput Multipath TCP connection SHOULD provide the application with no worse
than it would expect from running a single TCP connection over any throughput than it would expect from running a single TCP connection
one of its available paths. over any one of its available paths.
A multipath-capable equivalent of TCP SHOULD retain backward A multipath-capable equivalent of TCP SHOULD retain backward
compatibility with existing TCP APIs, so that existing applications compatibility with existing TCP APIs, so that existing applications
can use the newer transport merely by upgrading the operating systems can use the newer transport merely by upgrading the operating systems
of the end-hosts. This does not preclude the use of an advanced API of the end-hosts. This does not preclude the use of an advanced API
to permit multipath-aware applications to specify preferences, nor to permit multipath-aware applications to specify preferences, nor
for users to configure their systems in a different way from the for users to configure their systems in a different way from the
default, for example switching on or off the automatic use of MPTCP. default, for example switching on or off the automatic use of
multipath extensions.
2.2.2. Network Compatibility 2.2.2. Network Compatibility
Traditional Internet architecture slots network devices in the Traditional Internet architecture slots network devices in the
network layer and lower layers of the OSI 7-layer stack, where the network layer and lower layers, where the layers above the network
layers above the network layer - the transport layer and upper layers layer are instantiated only at the end-hosts. While this
- are instantiated only at the end-hosts. While this architecture, architecture, shown in Figure 2, was initially largely adhered to,
shown in Figure 2, was largely adhered to earlier, this layering no this layering no longer reflects the "ground truth" in the Internet
longer reflects the "ground truth" in the Internet with the with the proliferation of middleboxes[7]. Middleboxes routinely
proliferation of middleboxes[8]. Middleboxes routinely interpose on interpose on the transport layer; sometimes even completely
the transport layer; sometimes even completely terminating transport terminating transport connections, thus leaving the application layer
connections, thus leaving the application layer as the first real as the first real end-to-end layer, as shown in Figure 3.
end-to-end layer, as shown in Figure 3.
+-------------+ +-------------+ +-------------+ +-------------+
| Application |<------------ end-to-end ------------->| Application | | Application |<------------ end-to-end ------------->| Application |
+-------------+ +-------------+ +-------------+ +-------------+
| Transport |<------------ end-to-end ------------->| Transport | | Transport |<------------ end-to-end ------------->| Transport |
+-------------+ +-------------+ +-------------+ +-------------+ +-------------+ +-------------+ +-------------+ +-------------+
| Network |<->| Network |<->| Network |<->| Network | | Network |<->| Network |<->| Network |<->| Network |
+-------------+ +-------------+ +-------------+ +-------------+ +-------------+ +-------------+ +-------------+ +-------------+
End Host Router Router End Host End Host Router Router End Host
skipping to change at page 8, line 29 skipping to change at page 8, line 29
| Transport |<------------------->| Transport |<->| Transport | | Transport |<------------------->| Transport |<->| Transport |
+-------------+ +-------------+ +-------------+ +-------------+ +-------------+ +-------------+ +-------------+ +-------------+
| Network |<->| Network |<->| Network |<->| Network | | Network |<->| Network |<->| Network |<->| Network |
+-------------+ +-------------+ +-------------+ +-------------+ +-------------+ +-------------+ +-------------+ +-------------+
Firewall, Firewall,
End Host Router NAT, or Proxy End Host End Host Router NAT, or Proxy End Host
Figure 3: Internet Reality Figure 3: Internet Reality
Middleboxes that interpose on the transport layer result in loss of Middleboxes that interpose on the transport layer result in loss of
"fate-sharing"[9], that is, they often hold "hard" state that, when "fate-sharing"[8], that is, they often hold "hard" state that, when
lost or corrupted, results in loss or corruption of the end-to-end lost or corrupted, results in loss or corruption of the end-to-end
transport connection. transport connection.
MPTCP MUST remain backward compatible with the Internet as it exists The network compatibility goal requires that the multipath extension
today, including being able to traverse predominant middleboxes such to TCP retains compatibility with the Internet as it exists today,
as firewalls, NATs, and performance enhancing proxies[8]. This including making reasonable efforts to be able to traverse
requirement comes from recognizing middleboxes as a significant predominant middleboxes such as firewalls, NATs, and performance
deployment bottleneck for any transport that is not TCP, and enhancing proxies[7]. This requirement comes from recognizing
constrains MPTCP to appear as TCP does on the wire and to use middleboxes as a significant deployment bottleneck for any transport
established TCP extensions where necessary. To ensure end-to-endness that is not TCP, and constrains Multipath TCP to appear as TCP does
of the transport, we further require MPTCP to preserve fate-sharing on the wire and to use established TCP extensions where necessary.
without making any assumptions about middlebox behavior. To ensure end-to-endness of the transport, we further require
Multipath TCP to preserve fate-sharing without making any assumptions
about middlebox behavior.
A detailed analysis of middlebox behaviour and the impact on the
Multipath TCP architecture is presented in Section 7. In addition,
network compatibility must be retained to the extent that Multipath
TCP MUST fall back to regular TCP if there are insurmountable
incompatibilities for the multipath extension on a path.
MPTCP's modifications remain at the transport layer, although some
knowledge of the underlying network layer is required. MPTCP SHOULD
work with IPv4 and IPv6 interchangeably, i.e. one MPTCP connection
may operate over both IPv4 and IPv6 networks.
2.2.3. Compatibility with other network users 2.2.3. Compatibility with other network users
As a corollary to both network and application compatibility, the As a corollary to both network and application compatibility, the
architecture must enable new Multipath TCP flows to coexist architecture must enable new Multipath TCP flows to coexist
gracefully with existing legacy TCP flows, competing for bandwidth gracefully with existing single-path TCP flows, competing for
neither unduly aggressively nor unduly timidly (unless low-precedence bandwidth neither unduly aggressively nor unduly timidly (unless
operation is specifically requested by the application, such as with low-precedence operation is specifically requested by the application,
LEDBAT). The use of multiple paths MUST not unduly harm users using such as with LEDBAT). The use of multiple paths MUST NOT unduly harm
single path TCP at shared bottlenecks, beyond the impact that would users using single-path TCP at shared bottlenecks, beyond the impact
occur from another single legacy TCP flow. that would occur from another single-path TCP flow. Multiple
Multipath TCP flows on a shared bottleneck MUST share bandwidth
between each other with fairness similar to that which occurs
among single-path TCP flows at a shared bottleneck.
2.2.4. Security Goals
The extension of TCP with multipath capabilities will bring with it a
number of new threats, analysed in detail in [9]. The security goal
for Multipath TCP is to provide a service no less secure than
regular, single-path TCP. This will be achieved through a
combination of existing TCP security mechanisms (potentially modified
to align with the Multipath TCP extensions) and of protection against
the new multipath threats identified. The design decisions derived
from this goal are presented in Section 5.8.
3. An Architectural Basis For MPTCP 3. An Architectural Basis For MPTCP
We now present one possible transport architecture that we believe We now present one possible transport architecture that we believe
can effectively support MPTCP's goals. The new Internet model can effectively support MPTCP's goals. The new Internet model
described here is based on ideas proposed earlier in Tng ("Transport described here is based on ideas proposed earlier in Tng ("Transport
next-generation") [10]. While by no means the only possible next-generation") [10]. While by no means the only possible
architecture supporting multipath transport, Tng incorporates many architecture supporting multipath transport, Tng incorporates many
lessons learned from previous transport research and development lessons learned from previous transport research and development
practice, and offers a strong starting point from which to consider practice, and offers a strong starting point from which to consider
skipping to change at page 10, line 20 skipping to change at page 10, line 50
|Flow+Endpoint|<->|Flow+Endpoint|<->|Flow+Endpoint|<->|Flow+Endpoint| |Flow+Endpoint|<->|Flow+Endpoint|<->|Flow+Endpoint|<->|Flow+Endpoint|
+-------------+ +-------------+ +-------------+ +-------------+ +-------------+ +-------------+ +-------------+ +-------------+
| Network |<->| Network |<->| Network |<->| Network | | Network |<->| Network |<->| Network |<->| Network |
+-------------+ +-------------+ +-------------+ +-------------+ +-------------+ +-------------+ +-------------+ +-------------+
Firewall Performance Firewall Performance
End Host or NAT Enhancing Proxy End Host End Host or NAT Enhancing Proxy End Host
Figure 5: Middleboxes in the new Internet model Figure 5: Middleboxes in the new Internet model
MPTCP's architectural design follows Tng's decomposition as shown in MPTCP's architectural design follows Tng's decomposition as shown in
Figure 6. The MPTCP component, which provides application Figure 6. MPTCP, which provides application compatibility through
compatibility through the preservation of TCP-like semantics of the preservation of TCP-like semantics of global ordering of
global ordering of application data and reliability, is an application data and reliability, is an instantiation of the
instantiation of the "application-oriented" Semantic layer; whereas "application-oriented" Semantic layer; whereas the subflow TCP
the legacy-TCP component, which provides network compatibility by component, which provides network compatibility by appearing and
appearing and behaving as a TCP flow in the network, is an instantiation behaving as a TCP flow in the network, is an instantiation of the
of the "network-oriented" Flow+Endpoint layer. "network-oriented" Flow+Endpoint layer.
+--------------------------+ +-------------------------+ +--------------------------+ +-------------------------------+
| Application | | Application | | Application | | Application |
+--------------------------+ +-------------------------+ +--------------------------+ +-------------------------------+
| Semantic | | MPTCP | | Semantic | | MPTCP |
|--------------------------| + - - - - - + - - - - - + |------------+-------------| + - - - - - - - + - - - - - - - +
| Flow+Endpt | Flow+Endpt | | TCP | TCP | | Flow+Endpt | Flow+Endpt | | Subflow (TCP) | Subflow (TCP) |
+--------------------------+ +-------------------------+ +------------+-------------+ +---------------+---------------+
| Network | Network | | IP | IP | | Network | Network | | IP | IP |
+--------------------------+ +-------------------------+ +------------+-------------+ +---------------+---------------+
Figure 6: MPTCP mapping to Tng Figure 6: MPTCP mapping to Tng
As a protocol extension to TCP, MPTCP thus explicitly acknowledges As a protocol extension to TCP, MPTCP thus explicitly acknowledges
middleboxes in its design, and specifies a protocol that operates at middleboxes in its design, and specifies a protocol that operates at
two scales: the MPTCP component operates end-to-end, while it allows two scales: the MPTCP component operates end-to-end, while it allows
the TCP component to operate segment-by-segment. the TCP component to operate segment-by-segment.
4. A Functional Decomposition of MPTCP 4. A Functional Decomposition of MPTCP
Having laid out the goals to be met and the architectural basis for MPTCP, as described in [4], makes use of (what appear to the network
MPTCP, we now provide a functional decomposition MPTCP's design. to be) standard TCP sessions, termed "subflows", to provide the
underlying transport per path, and as such these retain the network
The MPTCP component relies upon (what appear to the network to be) compatibility desired. MPTCP-specific information is carried in a
standard TCP sessions, termed "subflows", to provide the underlying TCP-compatible manner, although this mechanism is separate from the
transport per path, and as such these retain the network actual information being transferred so could evolve in future
compatibility desired. MPTCP as described in [3] carries MPTCP- revisions. Figure 7 illustrates the layered architecture.
specific information in a TCP-compatible manner, although this
mechanism is separate from the actual information being transferred
so could evolve in future revisions. Figure 7 illustrates the
layered architecture.
+-------------------------------+ +-------------------------------+
| Application | | Application |
+---------------+ +-------------------------------+ +---------------+ +-------------------------------+
| Application | | MPTCP | | Application | | MPTCP |
+---------------+ + - - - - - - - + - - - - - - - + +---------------+ + - - - - - - - + - - - - - - - +
| TCP | | Subflow (TCP) | Subflow (TCP) | | TCP | | Subflow (TCP) | Subflow (TCP) |
+---------------+ +-------------------------------+ +---------------+ +-------------------------------+
| IP | | IP | IP | | IP | | IP | IP |
+---------------+ +-------------------------------+ +---------------+ +-------------------------------+
Figure 7: Comparison of Standard TCP and MPTCP Protocol Stacks Figure 7: Comparison of Standard TCP and MPTCP Protocol Stacks
Situated below the application, the MPTCP extension manages multiple Situated below the application, the MPTCP extension in turn manages
TCP subflows below it and must implement the following functions: multiple TCP subflows below it. In order to do this, it must
implement the following functions:
o Path Management: This is the function to detect and use multiple o Path Management: This is the function to detect and use multiple
paths between two endpoints. In the case of the MPTCP design [3], paths between two hosts. In the case of the MPTCP design [4],
this feature is implemented using multiple IP addresses at least this feature is implemented using multiple IP addresses at one or
one of the endpoints. Although this does not guarantee path both of the hosts. The path management features of the MPTCP
diversity, and there may be shared bottlenecks, this is a simple protocol are the mechanisms to signal alternative addresses to
mechanism that can be used with no additional features in the hosts, and mechanisms to set up new subflows joined to an existing
network. The path management features of the MPTCP protocol are MPTCP connection.
the mechanisms to signal alternative addresses to endpoints, and
mechanisms to set up new subflows attached to an existing MPTCP
connection.
o Packet Scheduling: This function breaks the bytestream received o Packet Scheduling: This function breaks the bytestream received
from the application into segments which are transmitted on one of from the application into segments to be transmitted on one of the
the available lower subflows. The MPTCP design makes use of a available lower subflows. The MPTCP design makes use of a data
data sequence mapping, associating packets sent on different sequence mapping, associating segments sent on different subflows
subflows to a connection-level sequence numbering, thus allowing to a connection-level sequence numbering, thus allowing segments
packets sent on different subflows to be correctly re-ordered at sent on different subflows to be correctly re-ordered at the
the receiver. The packet scheduler is dependent upon information receiver. The packet scheduler is dependent upon information
about the availability of paths exposed by the path management about the availability of paths exposed by the path management
component, and then makes use of the subflows to transmit these component, and then makes use of the subflows to transmit queued
packets. segments.
o Subflow (single-path TCP) Interface: A subflow component takes o Subflow (single-path TCP) Interface: A subflow component takes
segments from the packet-scheduling component and transmits them segments from the packet-scheduling component and transmits them
over the specified path, ensuring detectable delivery to the over the specified path, ensuring detectable delivery to the host.
endpoint. Detection of delivery is necessary to allow the MPTCP uses TCP underneath for network compatibility; TCP ensures
congestion control protocol to attribute packet delivery or loss in-order, reliable delivery. TCP adds its own sequence numbers to
to the right path. Note that the packet scheduling component does the segments; these are used to detect and retransmit lost packets
not embed enough information in packets to allow this to happen: at the subflow layer. The connection-level sequence numbering
segments with the same connection-level sequence number can be from the packet scheduling component allows re-ordering of the
transmitted over multiple paths, i.e. as retransmissions or just entire bytestream.
to increase redundancy. MPTCP uses TCP underneath for network
compatibility; TCP ensures in-order, reliable delivery. TCP adds
its of sequence numbers to the segments; these are used to detect
and retransmit lost packets.
o Congestion Control: This function manages congestion control o Congestion Control: This function coordinates congestion control
across the subflows. As specified, this congestion control across the subflows. As specified, this congestion control
algorithm must ensure that an MPTCP connection does not unfairly algorithm MUST ensure that an MPTCP connection does not unfairly
take more bandwidth than a single-path TCP flow would take at a take more bandwidth than a single-path TCP flow would take at a
shared bottleneck. An algorithm to support this is specified in shared bottleneck. An algorithm to support this is specified in
[6]. [5].
These functions fit together as follows. The Path Management looks These functions fit together as follows. The Path Management looks
after the discovery (and if necessary, initialisation) of multiple after the discovery (and if necessary, initialisation) of multiple
paths between two endpoints. The Packet Scheduler then receives paths between two hosts. The Packet Scheduler then receives a stream
packets from the application for the network and does the necessary of data from the application destined for the network, and undertakes
operations on them (such as adding a data-level sequence number) the necessary operations on it (such as segmenting the data into
before sending to a subflow. The subflow then adds its own sequence connection-level segments, and adding a connection-level sequence
number, acks, and passes them to the network. The receiving subflow re- number) before sending it on to a subflow. The subflow then adds its
orders data and passes it to the MPTCP component, which performs own sequence number, acks, and passes them to the network. The receiving
connection level re-ordering, removes the segment boundaries and subflow re-orders data (if necessary) and passes it to the packet
sends it to the application. Finally, the congestion control scheduling component, which performs connection level re-ordering,
component exists as part of the packet scheduling, in order to and sends the data stream to the application. Finally, the
schedule which packets should be sent at what rate on which subflow. congestion control component exists as part of the packet scheduling,
in order to schedule which packets should be sent at what rate on
which subflow.
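
The way these functions fit together can be illustrated with a minimal
sketch. It is not the MPTCP design [4]: the round-robin policy, the
function names (schedule, reassemble) and the fixed segment size are
assumptions made purely for illustration, and congestion control and
subflow-level sequence numbers are omitted. It only shows segments
being tagged with a connection-level sequence number, spread over
subflows, and re-ordered at the receiver.

```python
from typing import Dict, List, Tuple

# (connection-level sequence number, payload); hypothetical representation
Segment = Tuple[int, bytes]


def schedule(data: bytes, num_subflows: int, mss: int) -> List[List[Segment]]:
    """Split the bytestream into segments and assign them round-robin.

    Round-robin is an assumption; a real scheduler would consult path
    management and congestion control before choosing a subflow.
    """
    subflows: List[List[Segment]] = [[] for _ in range(num_subflows)]
    for i, start in enumerate(range(0, len(data), mss)):
        subflows[i % num_subflows].append((start, data[start:start + mss]))
    return subflows


def reassemble(subflows: List[List[Segment]]) -> bytes:
    """Connection-level re-ordering by connection-level sequence number."""
    received: Dict[int, bytes] = {}
    for subflow in subflows:
        for data_seq, payload in subflow:
            received[data_seq] = payload  # a duplicate simply overwrites
    stream = bytearray()
    next_seq = 0
    # Deliver only the contiguous prefix; a gap stops delivery.
    while next_seq in received:
        payload = received[next_seq]
        stream += payload
        next_seq += len(payload)
    return bytes(stream)


data = b"abcdefghij"
flows = schedule(data, num_subflows=2, mss=3)
restored = reassemble(flows)
```

In this sketch the receiver delivers only a contiguous prefix of the
connection-level sequence space, which is why a missing segment on
one subflow holds back data already received on the others.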
5. High-Level Design Decisions 5. High-Level Design Decisions
There is seemingly a wide range of choices when designing a multipath There is seemingly a wide range of choices when designing a multipath
extension to TCP. However, the goals as discussed earlier in this extension to TCP. However, the goals as discussed earlier in this
document constrain the possible solutions, leaving relatively little document constrain the possible solutions, leaving relatively little
choice in many areas. Here, we outline high-level design choices choice in many areas. Here, we outline high-level design choices
that draw from the architectural basis discussed earlier in that draw from the architectural basis discussed earlier in
Section 3, and their implications for the MPTCP design. Section 3, and their implications for the MPTCP design [4].
5.1. Sequence Numbering 5.1. Sequence Numbering
MPTCP uses two levels of sequence spaces: a connection level sequence MPTCP uses two levels of sequence spaces: a connection level sequence
number, and another sequence number for each subflow. This permits number, and another sequence number for each subflow. This permits
connection-level segmentation and reassembly, and retransmission of connection-level segmentation and reassembly, and retransmission of
the same part of connection-level sequence space on different the same part of connection-level sequence space on different
subflow-level sequence space. subflow-level sequence space.
The alternative approach would be to use a single connection level The alternative approach would be to use a single connection level
sequence number, which gets sent on multiple subflows. This has two sequence number, which gets sent on multiple subflows. This has two
problems: first, the individual subflows will appear to the network problems: first, the individual subflows will appear to the network
as TCP sessions with gaps in the sequence space; this in turn may as TCP sessions with gaps in the sequence space; this in turn may
upset certain middleboxes such as intrusion detection systems, or upset certain middleboxes such as intrusion detection systems, or
certain transparent proxies, and would go against the network certain transparent proxies, and would thus go against the network
compatibility goal. Second, the sender cannot attribute packet compatibility goal. Second, the sender would not be able to
losses or receptions to the correct path when the same packet is sent attribute packet losses or receptions to the correct path when the
on multiple paths, in the case of retransmissions. same packet is sent on multiple paths (i.e. in the case of
retransmissions).
The sender must be able to tell the receiver how to reorder the data, The sender must be able to tell the receiver how to reassemble the
for delivery to the application. The sender does so by telling the data, for delivery to the application. In order to achieve this, the
receiver how subflow-level data (carrying subflow sequence numbers) receiver must determine how subflow-level data (carrying subflow
maps at connection level, which we refer to as Data Sequence Mapping. sequence numbers) maps at the connection level. We refer to this as
This mapping takes the form (data seq, subflow seq, length), i.e. for the Data Sequence Mapping. This mapping takes the form (data seq,
a given number of bytes (the length), the subflow sequence space subflow seq, length), i.e. for a given number of bytes (the length),
beginning at the given sequence number maps to the connection-level the subflow sequence space beginning at the given sequence number
sequence space (beginning at the given data seq number). maps to the connection-level sequence space (beginning at the given
data seq number).
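
As a hedged illustration of the (data seq, subflow seq, length) form
described above, the following sketch translates a subflow-level
sequence number into the connection-level sequence space. The class
and method names are hypothetical, and sequence number wrap-around is
deliberately ignored to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataSequenceMapping:
    """One (data seq, subflow seq, length) triple (hypothetical type)."""
    data_seq: int     # start in the connection-level sequence space
    subflow_seq: int  # start in the subflow-level sequence space
    length: int       # number of bytes covered by this mapping

    def to_data_seq(self, subflow_seq: int) -> Optional[int]:
        """Return the connection-level sequence number for a subflow
        sequence number, or None if it lies outside this mapping.
        Wrap-around of the sequence space is not handled here."""
        offset = subflow_seq - self.subflow_seq
        if 0 <= offset < self.length:
            return self.data_seq + offset
        return None


# 1000 bytes starting at subflow seq 100 map to data seq 5000 onwards.
mapping = DataSequenceMapping(data_seq=5000, subflow_seq=100, length=1000)
first = mapping.to_data_seq(100)
last = mapping.to_data_seq(1099)
outside = mapping.to_data_seq(1100)
```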
This architecture does not mandate a mechanism for signalling such This architecture does not mandate a mechanism for signalling the
information, and it could conceivably have various sources. Data Sequence Mapping, and it could conceivably have various sources.
One option would be to use existing fields in the TCP segment (such One option would be to use existing fields in the TCP segment (such
as subflow seqno, length) and only add the data sequence number to as subflow seqno, length) and only add the data sequence number to
each segment, for instance as a TCP option. This is, however, each segment, for instance as a TCP option. This is, however,
vulnerable to middleboxes that resegment or assemble data, since vulnerable to middleboxes that resegment or assemble data, since
there is no specified behaviour for coalescing TCP options. If one there is no specified behaviour for coalescing TCP options. If one
signalled (data seqno, length), this would still be vulnerable to signalled (data seqno, length), this would still be vulnerable to
middleboxes that coalesce segments and do not correctly coalesce the middleboxes that coalesce segments and do not understand MPTCP
options. Because of these potential issues, the current signalling so do not correctly rewrite the options.
specification of MPTCP mandates that the full mapping should be sent
to the other end.
To reduce the overhead, it would be permissible for the mapping to be Because of these potential issues, the design decision taken in the
sent periodically and cover more than a single segment. It could MPTCP protocol [4] is that whenever a mapping for subflow data needs
also be excluded entirely in the case of a connection before more to be conveyed to the other host, all three pieces of data (data seq,
than one subflow is used, where the data-level and subflow-level subflow seq, length) must be sent. To reduce the overhead, it would
sequence space is the same. be permissible for the mapping to be sent periodically and cover more
than a single segment. Further experimentation is required to
determine what tradeoffs exist regarding the frequency at which
mappings should be sent. It could also be excluded entirely in the
case of a connection before more than one subflow is used, where the
data-level and subflow-level sequence space is the same.
5.2. Reliability 5.2. Reliability and Retransmissions
MPTCP features acknowledgements at connection-level as well as
subflow-level acknowledgements, in order to provide a robust service
to the application.
Under normal behaviour, MPTCP can use the data sequence mapping and Under normal behaviour, MPTCP can use the data sequence mapping and
subflow ACKs to decide when a connection-level segment was received. subflow ACKs to decide when a connection-level segment was received.
This has certain implications on end-to-end semantics. It means that The transmission of TCP ACKs for a subflow is handled entirely at
once a packet is acked at subflow level it cannot be discarded in the the subflow level, in order to maintain TCP semantics and trigger
re-order buffer at the connection level. Secondly, unlike in subflow-level retransmissions. This has certain implications on end-
standard TCP, a receiver cannot simply drop out-of-order segments if to-end semantics. It means that once a packet is acked at the
needed (for instance, due to memory pressure). subflow level it cannot be discarded in the re-order buffer at the
connection level. Secondly, unlike in standard TCP, a receiver
cannot simply drop out-of-order segments if needed (for instance, due
to memory pressure). Under certain circumstances, therefore, it may
be desirable to be able to drop packets after acknowledgement on the
subflow but before delivery to the application, and this can be
facilitated by a connection-level acknowledgement.
Furthermore, it is possible to conceive of some cases where Furthermore, it is possible to conceive of some cases where
connection-level acknowledgements could improve robustness. Consider connection-level acknowledgements could improve robustness. Consider
a subflow traversing a transparent proxy: if the proxy acks a segment a subflow traversing a transparent proxy: if the proxy acks a segment
and then crashes, the sender will not retransmit the lost segment on and then crashes, the sender will not retransmit the lost segment on
another subflow, as it thinks the segment has been received. The another subflow, as it thinks the segment has been received. The
connection grinds to a halt despite having other working subflows, connection grinds to a halt despite having other working subflows,
and the sender would be unable to determine the cause of the problem. and the sender would be unable to determine the cause of the problem.
Finally, as an optimisation, it may be feasible for a connection- An example situation where this may occur would be mobility between
level acknowledgement to be transmitted over the shortest RTT path, wireless access points, each of which operates a transport-level
potentially reducing send buffer requirements (see Section 5.3). proxy. Finally, as an optimisation, it may be feasible for a
connection-level acknowledgement to be transmitted over the shortest
RTT path, potentially reducing send buffer requirements (see
Section 5.3).
Therefore, to provide a fully robust multipath TCP solution, MPTCP Therefore, to provide a fully robust multipath TCP solution, MPTCP
SHOULD feature explicit connection-level acknowledgements. SHOULD feature explicit connection-level acknowledgements, in
addition to subflow-level acknowledgements. A connection-level
acknowledgement would only be required in order to signal when the
receive window moves forward; the heuristics for using such a signal
are discussed in more detail in the protocol specification [4].
Regarding retransmissions, it MUST be possible for a packet to be Regarding retransmissions, it MUST be possible for a packet to be
retransmitted on a different subflow to that on which it was retransmitted on a different subflow to that on which it was
originally sent. This is one of MPTCP's core goals, in order to originally sent. This is one of MPTCP's core goals, in order to
maintain integrity during temporary or permanent subflow failure, and maintain integrity during temporary or permanent subflow failure, and
this is enabled by the dual sequence number space. this is enabled by the dual sequence number space.
The scheduling of retransmissions will have significant impact on The scheduling of retransmissions will have significant impact on
MPTCP user experience. The current MPTCP specification suggests that MPTCP user experience. The current MPTCP specification suggests that
data outstanding on subflows that have timed out should be data outstanding on subflows that have timed out should be
rescheduled for transmission on different subflows. This behaviour rescheduled for transmission on different subflows. This behaviour
aims to minimize disruption when a path breaks, and uses the first aims to minimize disruption when a path breaks, and uses the first
timeout as an indicator. More conservative versions would be to use timeout as an indicator. More conservative versions would be to use
second or third timeouts for the same packet. second or third timeouts for the same packet.
When packet loss is detected and corrected with fast retransmit, Typically, fast retransmit on an individual subflow will not trigger
retransmission on different subflows may still be desirable in retransmission on another subflow, although this may still be
certain cases, for instance to reduce the receive buffer desirable in certain cases, for instance to reduce the receive buffer
requirements. However, in all cases with retransmissions on requirements. However, in all cases with retransmissions on
different subflows, the lost packets SHOULD still be sent on the path different subflows, the lost packets SHOULD still be sent on the path
that lost them. This is currently believed to be necessary to that lost them. This is currently believed to be necessary to
maintain subflow integrity, as per the network compatibility goal. By maintain subflow integrity, as per the network compatibility goal. By
doing this, throughput will be wasted, and it is unclear at this doing this, throughput will be wasted, and it is unclear at this
point what the optimal retransmit strategy is. point what the optimal retransmit strategy is.
Large-scale experiments are therefore required in order to determine
the most appropriate retransmission strategy, and recommendations
will be refined once more information is available.
5.3. Buffers 5.3. Buffers
Receive Buffer: ideally, a subflow failing should not affect the To ensure in-order delivery, Multipath TCP must use a connection
throughput of other working subflows. However, the receive buffer level receive buffer, where segments are placed until they are in
has limited size: if a flow times out, the other subflows will order and can be read by the application.
quickly fill the receive buffer with out-of-order data, and will
stall. Hence, receive buffer sizing is important for both robustness
and throughput.
The smallest receive buffer we need to avoid stalling under any In regular, single-path TCP, it is usually recommended to set the
circumstances is max(RTO)*sum(BW). This is, for most multipath receive buffer to 2*BDP (Bandwidth-Delay Product, i.e. BDP = BW*RTT,
connections, too expensive. A more reasonable size is proportional where BW = Bandwidth and RTT = Round-Trip Time). One BDP allows
to max(RTT)*sum(BW) which ensures subflows don't stall when fast supporting reordering of segments by the network. The other BDP
retransmit works. Also, depending on how the implementation behaves, allows the connection to continue during fast retransmit: when a
an additional sum(RTT*BW) might be needed for the individual re-order segment is fast retransmitted, the receiver must be able to store
buffers of the TCP subflows. incoming data during one more RTT.
Send Buffer: the smallest send buffer we need is sum(BDP) across all For Multipath TCP, the story is a bit more complicated. The ultimate
paths; this is to hold data until it's acked at subflow level. If we goal is that a subflow packet loss or subflow failure should not
didn't use a subflow level ack, and relied on a data-level ack, the affect the throughput of other working subflows; the receiver should
send buffer would need to be as big as the receive buffer of the have enough buffering to store all data until the missing packet is
connection, max(RTT)*sum(BW). In practice, the senders will be web re-transmitted and reaches the destination.
servers and receivers will be desktops or mobile servers. The send
buffer size matters particularly for servers, which must be able to The worst case scenario would be when the subflow with the highest
maintain a large number of ongoing connections. RTT/RTO (Round-Trip Time or Retransmission TimeOut) experiences a
timeout; in that case the receiver has to buffer data from all
subflows for the duration of the RTO. Thus, the smallest connection-
level receive buffer that would be needed to avoid stalling with
subflow failures is sum(BW_i)*RTO_max, where BW_i = Bandwidth for
each subflow and RTO_max is the largest RTO across all subflows.
This is an order of magnitude more than the receive buffer required
for a single connection, and is probably too expensive for practical
purposes. A more sensible requirement is to avoid stalls in the
absence of timeouts. Therefore, the RECOMMENDED receive buffer is
2*sum(BW_i)*RTT_max, where RTT_max is the largest RTT across all
subflows. This buffer sizing ensures subflows do not stall when fast
retransmit is triggered on any subflow.
The resulting buffer size should be small enough for practical use.
However, there may be extreme cases where fast, high throughput paths
(e.g. 100Mb/s, 10ms RTT) are used in conjunction with slow paths
(e.g. 1Mb/s, 1000ms RTT). In that case the required receive buffer
would be 12.5MB, which is likely too big. In these cases a Multipath
TCP scheduler SHOULD use only the fast path, potentially falling back
to the slow path if the fast path fails.
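
The sizing rule can be checked with a short calculation. The sketch
below is illustrative only; the function name and the multiplier
parameter are assumptions. For the example paths above,
sum(BW_i)*RTT_max evaluates to roughly 12.6 MB, which is close to the
12.5MB figure quoted, whereas applying the factor of 2 from the
RECOMMENDED formula gives roughly 25 MB.

```python
from typing import List, Tuple


def receive_buffer_bytes(
    subflows: List[Tuple[float, float]], multiplier: float = 2.0
) -> float:
    """Compute multiplier * sum(BW_i) * RTT_max in bytes.

    Each subflow is (bandwidth in bits per second, RTT in seconds).
    The default multiplier of 2 matches the RECOMMENDED sizing.
    """
    total_bw = sum(bw for bw, _ in subflows)
    rtt_max = max(rtt for _, rtt in subflows)
    return multiplier * total_bw * rtt_max / 8  # bits to bytes


# Fast path: 100 Mb/s, 10 ms RTT. Slow path: 1 Mb/s, 1000 ms RTT.
paths = [(100e6, 0.010), (1e6, 1.0)]
recommended = receive_buffer_bytes(paths)
single_bdp = receive_buffer_bytes(paths, multiplier=1.0)
fast_only = receive_buffer_bytes(paths[:1])
```

Using only the fast path, the same formula yields 250 kB, which
illustrates why restricting the scheduler to the fast path keeps the
buffer requirement practical.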
Send Buffer: The RECOMMENDED send buffer is the same size as the
recommended receive buffer i.e., 2*sum(BW_i)*RTT_max. This is
because the sender must store locally the segments sent but
unacknowledged by the connection level ACK. The send buffer size
matters particularly for hosts that maintain a large number of
ongoing connections. If the required send buffer is too large, a
host can choose to only send data on the fast subflows, using the
slow subflows only in cases of failure.
5.4. Signalling 5.4. Signalling
Since MPTCP will use regular TCP streams as its transport mechanism, Since MPTCP uses TCP as its subflow transport mechanism, an MPTCP
an MPTCP connection will also begin as a single TCP stream. connection will also begin as a single TCP connection. Nevertheless,
Nevertheless, it must signal to the peer that it supports MPTCP and it must signal to the peer that it supports MPTCP and wishes to use
wishes to use it on this connection. As such, a TCP Option will be it on this connection. As such, a TCP Option will be used to
used to transmit this information, since this is the established transmit this information, since this is the established mechanism
mechanism for indicating additional functionality on a TCP session. for indicating additional functionality on a TCP session.
On top of this, however, is signalling required during the operation In addition, further signalling is required during the operation of
of an MPTCP session, such as that for reassembly for multiple an MPTCP session, such as that for reassembly for multiple subflows,
subflows, and for informing the other endpoint about potential other and for informing the other host about potential other available
available addresses. It is not mandated by the architecture in what addresses. It is not mandated by the architecture in what format
format this signalling should be transmitted. this signalling should be transmitted.
The current MPTCP protocol proposal suggests the use of TCP options The MPTCP protocol design [4] continues to use TCP Options for this
for this signalling, however another approach would be to embed such signalling. This has been chosen as the mechanism most fitting in
information in the payload, and use type-length-value (TLV) encoding with the goals as specified in Section 2. With this mechanism, the
to separate signalling and payload data. signalling required to operate MPTCP is transported separately from
the data, allowing it to be created and processed separately from the
data stream, and retaining architectural compatibility with network
entities.
5.5. Path Management 5.5. Path Management
Currently, the network does not expose multiple paths between Currently, the network does not expose multiple paths between hosts.
endpoints. Multipath TCP will use multiple addresses at one or both Multipath TCP will use multiple addresses at one or both hosts to
endpoints to get different paths to the destination. The hope is infer different paths across the network. The hope is that these
that these paths, whilst not necessarily entirely non-overlapping, paths, whilst not necessarily entirely non-overlapping, will be
will be sufficiently disjoint to allow multipath achieve improved sufficiently disjoint to allow multipath to achieve improved
throughput and robustness. throughput and robustness. The use of multiple IP addresses is a
simple mechanism that requires no additional features in the network.
Multiple different (source, destination) address pairs will thus be Multiple different (source, destination) address pairs will thus be
used as path selectors. Each path will be identified by a TCP used as path selectors. Each path will be identified by a TCP
4-tuple (i.e. source address, destination address, source port, 4-tuple (i.e. source address, destination address, source port,
destination port), thus allowing the extension of MPTCP to use such destination port), thus allowing the extension of MPTCP to use such
4-tuples as path selectors if the network will route different ports 4-tuples as path selectors if the network will route different ports
over different paths (which may be the case with technologies such as over different paths (which may be the case with technologies such as
ECMP). Equal Cost MultiPath (ECMP) routing, e.g. [14]).
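The use of address pairs as path selectors can be illustrated with a minimal sketch. It assumes each host already knows its own and its peer's addresses; the function name candidate_paths and the placeholder addresses A1, A2, B1, B2 are hypothetical and not part of any MPTCP specification. Each candidate is written as the 4-tuple described above (source address, destination address, source port, destination port).

```python
from itertools import product


def candidate_paths(local_addrs, remote_addrs, src_port, dst_port):
    """Enumerate candidate subflow 4-tuples from the known addresses.

    Hypothetical helper: every (source, destination) address pair is
    treated as one path selector, as the text above describes.
    """
    return [
        (src, dst, src_port, dst_port)
        for src, dst in product(local_addrs, remote_addrs)
    ]


# Two addresses at each host give up to four candidate paths.
paths = candidate_paths(["A1", "A2"], ["B1", "B2"], 5000, 80)
```

Whether two such candidates actually traverse disjoint paths is not something the hosts can know from the addresses alone; the sketch only enumerates the selectors.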
For increased chance of successfully setting up additional subflows For increased chance of successfully setting up additional subflows
(such as when one end is behind a firewall, NAT, or other restrictive (such as when one end is behind a firewall, NAT, or other restrictive
middlebox), either endpoint should be able to add new subflows to a middlebox), either host SHOULD be able to add new subflows to a MPTCP
MPTCP connection. connection. MPTCP MUST be able to handle paths that appear and
disappear during the lifetime of a connection (for example, through
the activation of an additional network interface).
The modularity of path management will permit alternative mechanisms The modularity of path management will permit alternative mechanisms
to be employed if appropriate in the future. to be employed if appropriate in the future.
5.6. Connection Identification 5.6. Connection Identification
Therefore, each MPTCP connection should have a connection identifier Since an MPTCP connection may not be bound to a traditional 5-tuple
at each endpoint, which is locally unique within that endpoint. In (source addr and port, destination addr and port, protocol number)
many ways, this is analogous to a port number in regular TCP. The for the entirety of its existence, it is desirable to provide a new
manifestation and purpose of such an identifier is out of the scope mechanism for connection identification. This will be useful for
of this architecture document. MPTCP-aware applications, and for the MPTCP implementation (and
MPTCP-aware middleboxes) to have a unique identifier with which to
associate the multiple subflows.
Therefore, each MPTCP connection requires a connection identifier at
each host, which is locally unique within that host. In many ways,
this is analogous to a port number in regular TCP. The manifestation
and purpose of such an identifier is out of the scope of this
architecture document.
Legacy applications will not, however, have access to this identifier Legacy applications will not, however, have access to this identifier
and in such cases a MPTCP connection will be identified by the and in such cases a MPTCP connection will be identified by the
5-tuple of the first TCP subflow. It is out of the scope of this 5-tuple of the first TCP subflow. It is out of the scope of this
document, however, to define the behaviour of the MPTCP document, however, to define the behaviour of the MPTCP
implementation if the first TCP subflow later fails. If there are implementation if the first TCP subflow later fails. If there are
legacy applications that make assumptions about continued existence MPTCP-unaware applications that make assumptions about continued
of the initial address pair, their behaviour could be disrupted by existence of the initial address pair, their behaviour could be
carrying on regardless. It is expected that this is a very small, disrupted by carrying on regardless. It is expected that this is a
possibly negligible, set of applications, however. In the case of very small, possibly negligible, set of applications, however. In
applications that have specifically asked to be bound to a particular the case of applications that have used an existing API call to bind
address or interface, MPTCP will not be used. to a specific address or interface, the MPTCP extension MUST NOT be
used, since the applications are indicating a clear choice of path to
use and thus will have expectations of behaviour that must be
maintained, in order to adhere to the application compatibility
goals.
Since the requirements of applications are not clear at this stage, Since the requirements of applications are not clear at this stage,
however, it is as yet unconfirmed what the best behaviour is. It however, it is as yet unconfirmed what the best behaviour is. It
will be an implementation-specific solution, however, and as such the will be an implementation-specific solution, however, and as such the
behaviour is expected to be chosen by implementors once more research behaviour is expected to be chosen by implementors once more research
has been undertaken to determine its impact. has been undertaken to determine its impact.
5.7. Network Layer Compatibility 5.7. Congestion Control
MPTCP's modifications remain at the transport layer, although some As discussed in network-layer compatibility requirements
knowledge of the underlying network layer is required. MPTCP MUST Section 2.2.3, there are three goals for the congestion control
work with IPv4 and IPv6 interchangeably, i.e. one MPTCP connection algorithms used by an MPTCP implementation: improve throughput (at
may operate over both IPv4 and IPv6 networks. least as well as a single-path TCP connection would perform); do no
harm to other network users (do not take up more capacity on any one
path than if it was a single path flow using only that route - this
is particularly relevant for shared bottlenecks); and balance
congestion by moving traffic away from the most congested paths. To
achieve these goals, the congestion control algorithms in use on each
subflow must be coupled in some way. A proposal for a suitable
congestion control algorithm is given in [5].
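What "coupled in some way" could mean is sketched below. This is an illustrative rule chosen for this example only and is not taken from [5]: the per-ACK window increase on a subflow depends on the total window across all subflows, and is capped at the increase a single-path TCP would apply on that path, which reflects the "do no harm" goal. The function name coupled_increase and the parameter alpha are assumptions.

```python
def coupled_increase(windows, index, alpha=1.0):
    """Per-ACK congestion window increase, in packets, for one subflow.

    windows: current congestion windows of all subflows (packets).
    index:   the subflow on which the ACK arrived.
    alpha:   hypothetical aggressiveness factor for the whole connection.

    The first term couples the subflows through the total window; the
    second term is the increase of a regular single-path TCP, used as a cap.
    """
    total = sum(windows)
    return min(alpha / total, 1.0 / windows[index])


# Two equal subflows: each grows at half the rate of a single-path flow.
step = coupled_increase([10, 10], 0)

# A large alpha cannot push a subflow beyond the single-path increase.
capped = coupled_increase([10, 30], 0, alpha=8.0)
```

Because the coupled term shrinks as the total window grows, a connection with many subflows does not become more aggressive on a shared bottleneck than one regular TCP flow under this rule.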
5.8. Congestion Control 5.8. Security
As already documented in network-layer compatibility requirements, A detailed threat analysis for Multipath TCP is presented in a
the congestion control algorithms used by an MPTCP implementation separate document [9]. This focuses on flooding attacks and
must not harm other legacy users on shared bottlenecks. To achieve hijacking attacks that can be launched against a Multipath TCP
this, the congestion control algorithms in use on each subflow must connection.
be coupled in some way - a proposal for this is given in [6].
6. Summary The basic security goal of Multipath TCP, as introduced in
Section 2.2.4, can be stated as: "provide a solution that is no worse
than standard TCP".
This document has provided a summary of the components that have been From the threat analysis, and with this goal in mind, three key
identified to provide a Multipath TCP solution, and described the security requirements can be identified. A multi-addressed Multipath
high-level design decisions that have been used as a basis of the TCP SHOULD be able to:
MPTCP specification.
The suite of drafts that specify a complete MPTCP implementation, on o Provide a mechanism to confirm that the parties in a subflow
top of this architectural overview, are as follows: handshake are the same as in the original connection setup (e.g.
require use of a key exchanged in the initial handshake in the
subflow handshake, to limit the scope for hijacking attacks).
o A specification of the MPTCP protocol [3], describing the on- and o Provide verification that the peer can receive traffic at a new
off-the-wire differences to regular TCP. address before adding it (i.e. verify that the address belongs to
the other host, to prevent flooding attacks).
o A specification of a coupled congestion control algorithm [6], o Provide replay protection, i.e. ensure that a request to add/
that can be applied to the above protocol while meeting the goals remove a subflow is 'fresh'.
for such an algorithm as specified in this document.
o A document [7] that builds upon the application compatibility Additional mechanisms have been deployed as part of standard TCP
issues discussed in this document, explaining in more detail what stacks to provide resistance to Denial-of-Service attacks. For
if any changes an application may experience through the use of example, there are various mechanisms to protect against TCP reset
MPTCP. This document also provides a proposed API through which attacks [15], and Multipath TCP should continue to support similar
an application can influence the behaviour of the MPTCP protocol, protection. In addition, TCP SYN Cookies [16] were developed to
as specified in the above drafts. allow a TCP server to defer the creation of session state in the
SYN_RCVD state, and remain stateless until the ESTABLISHED state had
been reached. Multipath TCP should, ideally, continue to provide
such functionality and, at a minimum, avoid significant computational
burden prior to reaching the ESTABLISHED state (of the Multipath TCP
connection as a whole).
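The first and third security requirements listed above (confirming that a subflow handshake involves the same parties as the original connection, and ensuring that the request is fresh) can be sketched with a keyed hash. This is a minimal sketch under stated assumptions: a key was exchanged in the initial handshake, the verifying host chooses the nonce, and the names join_proof and verify_join are hypothetical. It does not describe the mechanism in [4].

```python
import hashlib
import hmac


def join_proof(key: bytes, nonce: bytes) -> bytes:
    """Proof that the sender holds the key from the initial handshake."""
    return hmac.new(key, nonce, hashlib.sha256).digest()


def verify_join(key: bytes, nonce: bytes, proof: bytes, seen_nonces: set) -> bool:
    """Accept a subflow request only if the proof is valid and fresh."""
    if nonce in seen_nonces:
        return False  # replayed request
    if not hmac.compare_digest(join_proof(key, nonce), proof):
        return False  # peer does not hold the connection key
    seen_nonces.add(nonce)
    return True


key = b"k" * 16          # assumed to come from the initial handshake
seen = set()
proof = join_proof(key, b"nonce-1")
first = verify_join(key, b"nonce-1", proof, seen)    # valid and fresh
replay = verify_join(key, b"nonce-1", proof, seen)   # same nonce again
```

The limited space in TCP options noted in this section would constrain the size of any such proof in practice; the full SHA-256 output here is for illustration only.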
7. Security Considerations It should be noted that aspects of the Multipath TCP design space
place constraints on the security solution:
Please see [14] for a threat analysis of Multipath TCP. The threats o The use of TCP options significantly limits the amount of
analysed in this companion document are addressed as appropriate in information that can be carried in the handshake.
the protocol design [3].
8. Interactions with Applications o The need to work through middleboxes results in the need to handle
mutability of packets.
o The desire to support a 'break-before-make' approach to adding
subflows removes the ability to actively use a pre-existing
subflow to support the addition of a new one.
The MPTCP protocol design [4] aims to meet these security
requirements, and the protocol specification will document how these
are met.
6. Interactions with Applications
Interactions with applications - including, but not limited to, Interactions with applications - including, but not limited to,
performance changes that may be expected, semantic changes, and new performance changes that may be expected, semantic changes, and new
features that may be requested of an API, are presented in [7]. features that may be requested of an API, are presented in [6].
9. Interactions with Middleboxes 7. Interactions with Middleboxes
As discussed in Section 2.2, it is a goal of MPTCP to be deployable As discussed in Section 2.2, it is a goal of MPTCP to be deployable
today and thus compatible with the majority of middleboxes. This today and thus compatible with the majority of middleboxes. This
section summarises the issues that may arise with NATs, firewalls, section summarises the issues that may arise with NATs, firewalls,
proxies, intrusion detection systems, and other middleboxes that, if proxies, intrusion detection systems, and other middleboxes that, if
not considered in the protocol design, may hinder its deployment. not considered in the protocol design, may hinder its deployment.
This section is intended primarily as a description of options and This section is intended primarily as a description of options and
considerations only. Protocol-specific solutions to these issues considerations only. Protocol-specific solutions to these issues
will be given in the companion documents. will be given in the companion documents.
Multipath TCP will be deployed in a network that no longer provides Multipath TCP will be deployed in a network that no longer provides
just basic datagram delivery. A myriad of middleboxes are deployed just basic datagram delivery. A myriad of middleboxes are deployed
to optimize various perceived problems with the Internet protocols: to optimize various perceived problems with the Internet protocols:
NATs primarily address space shortage [11], Performance Enhancing NATs primarily address space shortage [11], Performance Enhancing
Proxies (PEPs) optimize TCP for different link characteristics [13], Proxies (PEPs) optimize TCP for different link characteristics [13],
firewalls [12] and intrusion detection systems try to block malicious firewalls [12] and intrusion detection systems try to block malicious
content from reaching a host, and traffic normalizers [15] ensure a content from reaching a host, and traffic normalizers [17] ensure a
consistent view of the traffic stream to IDSes and hosts. consistent view of the traffic stream to IDSes and hosts.
All these middleboxes optimize current applications at the expense of All these middleboxes optimize current applications at the expense of
future applications. In effect, future applications must mimic future applications. In effect, future applications will often need
existing ones if they want to be deployed. Further, the precise to behave in a similar fashion to existing ones, in order to increase
behaviour of all these middleboxes is not clearly specified, and the chances of successful deployment. Further, the precise behaviour
implementation errors make matters worse, raising the bar for the of all these middleboxes is not clearly specified, and implementation
deployment of new technologies. errors make matters worse, raising the bar for the deployment of new
technologies.
The following list of middlebox classes documents behaviour that The following list of middlebox classes documents behaviour that
could impact the use of MPTCP. This list is used in [3] to describe could impact the use of MPTCP. This list is used in [4] to describe
the features of the MPTCP protocol that are used to mitigate the the features of the MPTCP protocol that are used to mitigate the
impact of these middlebox behaviours. impact of these middlebox behaviours.
o NATs: Network Address Translators decouple the endpoint's local IP o NATs: Network Address Translators decouple the host's local IP
address from that which is seen in the wider Internet when the address from that which is seen in the wider Internet when the
packets are transmitted through a NAT. This adds complexity, and packets are transmitted through a NAT. This adds complexity, and
reduces the chances of success, when signalling IP addresses. reduces the chances of success, when signalling IP addresses.
o PEPs: Performance Enhancing Proxies, which aim to improve the o PEPs: Performance Enhancing Proxies, which aim to improve the
performance of protocols over low-performance (e.g. high latency performance of protocols over low-performance (e.g. high latency
or high error rate) links. As such, they may "split" a TCP or high error rate) links. As such, they may "split" a TCP
connection and behaviour such as proactive ACKing may occur. As connection and behaviour such as proactive ACKing may occur. As
with NATs, it is no longer guaranteed that one endpoint is with NATs, it is no longer guaranteed that one host is
communicating directly with another. communicating directly with another.
o Traffic Normalizers: These aim to eliminate ambiguities and o Traffic Normalizers: These aim to eliminate ambiguities and
potential attacks at the network level, and amongst other things potential attacks at the network level, and amongst other things
are unlikely to permit holes in sequence space. are unlikely to permit holes in TCP-level sequence space.
o Firewalls: on top of preventing incoming connections, firewalls
may also attempt additional protection such as sequence number
randomization.
o Intrusion Detection Systems: IDSs may look for traffic patterns to
protect a network, and may have false positives with MPTCP and
drop the connections during normal operation. For future MPTCP-
aware middleboxes, they will require the ability to correlate the
various paths in use.
In addition, all classes of middleboxes may affect TCP traffic in the
following ways:
o TCP Options: many middleboxes are in a position to drop packets o TCP Options: many middleboxes are in a position to drop packets
with unknown TCP options, or strip those options from the packets. with unknown TCP options, or strip those options from the packets.
o Segmentation/Coalescing: middleboxes (or even something as close to o Segmentation/Coalescing: middleboxes (or even something as close to
the end host as TCP Segmentation Offloading) may change the packet the end host as TCP Segmentation Offloading) may change the packet
boundaries from those which the sender intended. It may do this boundaries from those which the sender intended. It may do this
by splitting packets, or coalescing them together. This leads to by splitting packets, or coalescing them together. This leads to
two major impacts: we cannot guarantee where a packet boundary two major impacts: we cannot guarantee where a packet boundary
will be, and we cannot say for sure what a middlebox will do with will be, and we cannot say for sure what a middlebox will do with
TCP options in these cases (they may be repeated, dropped, or sent TCP options in these cases (they may be repeated, dropped, or sent
only once). only once).
o Firewalls: on top of preventing incoming connections, firewalls 8. Contributors
may also attempt additional protection such as sequence number
randomization.
o Intrusion Detection Systems: IDSs may look for traffic patterns to The authors would like to acknowledge the contributions of Sebastien
protect a network, and may have false positives with MPTCP and Barre, Andrew McDonald, and Bryan Ford to this document.
drop the connections during normal operation. For future MPTCP-
aware middleboxes, they will require the ability to correlate the
various paths in use.
10. Acknowledgements The authors would also like to thank the following people for
detailed reviews: Olivier Bonaventure, Gorry Fairhurst, Iljitsch van
Beijnum, and Philip Eardley.
Alan Ford, Costin Raiciu and Sebastien Barre are supported by Trilogy 9. Acknowledgements
Alan Ford, Costin Raiciu and Mark Handley are supported by Trilogy
(http://www.trilogy-project.org), a research project (ICT-216372) (http://www.trilogy-project.org), a research project (ICT-216372)
partially funded by the European Community under its Seventh partially funded by the European Community under its Seventh
Framework Program. The views expressed here are those of the Framework Program. The views expressed here are those of the
author(s) only. The European Commission is not liable for any use author(s) only. The European Commission is not liable for any use
that may be made of the information in this document. that may be made of the information in this document.
11. Contributors 10. IANA Considerations
The authors would like to acknowledge the contributions of Mark None.
Handley and Bryan Ford to this document.
12. IANA Considerations 11. Security Considerations
None. This informational document provides an architectural overview for
Multipath TCP and so does not, in itself, raise any security issues.
A separate threat analysis [9] lists threats that can exist with a
Multipath TCP. However, a protocol based on the architecture in this
document will have a number of security requirements. The high level
goals for such a protocol are identified in Section 2.2.4, whilst
Section 5.8 provides more detailed discussion of security
requirements and design decisions which are applied in the MPTCP
protocol design [4].
13. References 12. References
13.1. Normative References
[1] Bradner, S., "Key words for use in RFCs to Indicate Requirement 12.1. Normative References
[1] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
September 1981.
[2] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997. Levels", BCP 14, RFC 2119, March 1997.
13.2. Informative References 12.2. Informative References
[2] Wischik, D., Handley, M., and M. Bagnulo Braun, "The Resource [3] Wischik, D., Handley, M., and M. Bagnulo Braun, "The Resource
Pooling Principle", ACM SIGCOMM CCR vol. 38 num. 5, pp. 47-52, Pooling Principle", ACM SIGCOMM CCR vol. 38 num. 5, pp. 47-52,
October 2008, October 2008,
<http://ccr.sigcomm.org/online/files/p47-handleyA4.pdf>. <http://ccr.sigcomm.org/online/files/p47-handleyA4.pdf>.
[3] Ford, A., Raiciu, C., and M. Handley, "TCP Extensions for [4] Ford, A., Raiciu, C., and M. Handley, "TCP Extensions for
Multipath Operation with Multiple Addresses", Multipath Operation with Multiple Addresses",
draft-ietf-mptcp-multiaddressed-00 (work in progress), draft-ietf-mptcp-multiaddressed-01 (work in progress),
June 2010. July 2010.
[4] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
September 1981.
[5] Stewart, R., "Stream Control Transmission Protocol", RFC 4960,
September 2007.
[6] Raiciu, C., Handley, M., and D. Wischik, "Coupled Multipath- [5] Raiciu, C., Handley, M., and D. Wischik, "Coupled Multipath-
Aware Congestion Control", draft-raiciu-mptcp-congestion-01 Aware Congestion Control", draft-ietf-mptcp-congestion-00 (work
(work in progress), March 2010. in progress), July 2010.
[7] Scharf, M. and A. Ford, "MPTCP Application Interface [6] Scharf, M. and A. Ford, "MPTCP Application Interface
Considerations", draft-scharf-mptcp-api-01 (work in progress), Considerations", draft-scharf-mptcp-api-02 (work in progress),
March 2010. July 2010.
[8] Carpenter, B. and S. Brim, "Middleboxes: Taxonomy and Issues", [7] Carpenter, B. and S. Brim, "Middleboxes: Taxonomy and Issues",
RFC 3234, February 2002. RFC 3234, February 2002.
[9] Carpenter, B., "Internet Transparency", RFC 2775, [8] Carpenter, B., "Internet Transparency", RFC 2775,
February 2000. February 2000.
[9] Bagnulo, M., "Threat Analysis for Multi-addressed/Multi-path
TCP", draft-ietf-mptcp-threat-02 (work in progress),
March 2010.
[10] Ford, B. and J. Iyengar, "Breaking Up the Transport Logjam", [10] Ford, B. and J. Iyengar, "Breaking Up the Transport Logjam",
ACM HotNets, October 2008. ACM HotNets, October 2008.
[11] Srisuresh, P. and K. Egevang, "Traditional IP Network Address [11] Srisuresh, P. and K. Egevang, "Traditional IP Network Address
Translator (Traditional NAT)", RFC 3022, January 2001. Translator (Traditional NAT)", RFC 3022, January 2001.
[12] Freed, N., "Behavior of and Requirements for Internet [12] Freed, N., "Behavior of and Requirements for Internet
Firewalls", RFC 2979, October 2000. Firewalls", RFC 2979, October 2000.
[13] Border, J., Kojo, M., Griner, J., Montenegro, G., and Z. [13] Border, J., Kojo, M., Griner, J., Montenegro, G., and Z.
Shelby, "Performance Enhancing Proxies Intended to Mitigate Shelby, "Performance Enhancing Proxies Intended to Mitigate
Link-Related Degradations", RFC 3135, June 2001. Link-Related Degradations", RFC 3135, June 2001.
[14] Bagnulo, M., "Threat Analysis for Multi-addressed/Multi-path [14] Hopps, C., "Analysis of an Equal-Cost Multi-Path Algorithm",
TCP", draft-ietf-mptcp-threat-02 (work in progress), RFC 2992, November 2000.
March 2010.
[15] Handley, M., Paxson, V., and C. Kreibich, "Network Intrusion [15] Ramaiah, A., Stewart, R., and M. Dalal, "Improving TCP's
Robustness to Blind In-Window Attacks", RFC 5961, August 2010.
[16] Eddy, W., "TCP SYN Flooding Attacks and Common Mitigations",
RFC 4987, August 2007.
[17] Handley, M., Paxson, V., and C. Kreibich, "Network Intrusion
Detection: Evasion, Traffic Normalization, and End-to-End Detection: Evasion, Traffic Normalization, and End-to-End
Protocol Semantics", Usenix Security 2001, 2001, <http:// Protocol Semantics", Usenix Security 2001, 2001, <http://
www.usenix.org/events/sec01/full_papers/handley/handley.pdf>. www.usenix.org/events/sec01/full_papers/handley/handley.pdf>.
Appendix A. Implementation Architecture Appendix A. Changelog
This section provides suggestions for an architecture to implement an
extensible, modular multipath transport protocol.
A.1. Functional Separation
This section describes a generic view of the internal implementation
of a Multipath TCP, through which the technical components specified
in the companion documents can fit together. It shows how an
implementation could be built that permits extensibility between
components without changing the external representation.
We first show the functional decomposition of an MPTCP solution that
is completely contained in the transport layer. That solution is
described in more detail in [3]. Then we generalize the approach to
allow good extensibility of that solution.
A.1.1. Application to default MPTCP protocol
Although, in the default approach, MPTCP is fully contained in the
transport layer, it can still be divided into two main modules. One
manages the scheduling of packets as well as congestion control. The
other one manages the control of paths. The interface between the
two is dealt with thanks to a Path Index. As shown in Figure 8, the
Path Manager announces to the MultiPath Scheduler what paths can be
used through path indices, and maintains the mapping between that
value and the particular action that it must apply to use the path
(an example of such a mapping is in Table 1). In the case of the
built-in Path Manager, the action is to replace an address/port pair
with another one, in such a way that another path is used across the
Internet to forward that packet.
Control plane <-- | --> Data plane
+---------------------------------------------------------------+
| Multipath Scheduler (MPS) |
+---------------------------------------------------------------+
^ | |
| | [A1,B1,|pA1,pB1]
|For conn_id | |
|<A1,B1,pA1,pB1> | +-------------+
|Paths 1->4 can be | | Data packet |<--Path idx:3
|used. | +-------------+ attached
| | | by MPS
| | V
+--------------------------------------------\------------------+
| Path Manager (PM) \[A1,B1]->[A1,B2] |
+--------------------------------------------------\------------+
/ \ | \
/-----------------------------\ | /"\ /"\ /"\ /"\
| rewriting table: || | | | | | | | |
| Subflow id <--> network_id || | | | | | | | |
| || | | | | | | | |
| [see table below] || | | | | | | | |
| || \./ \./ \./ \./
+------------------------------+| path1 path2 path3 path4
Figure 8: Functional separation of MPTCP in the transport layer
The MultiPath Scheduler only deals with abstract paths, represented
by numbers. It only sees one address pair throughout the
communication, that we call the connection identifier. However, the
MultiPath Scheduler must be able to perform per-subflow congestion
control, and thus to distinguish between the subflows. This leads to
the definition of a subflow identifier, which consists of the usual
transport identifier extended with the path index:
<addr_src,psrc,addr_dst,pdst,path_index>. The following options,
described in [3], are managed by the MultiPath Scheduler.
o MULTIPATH CAPABLE (MPC): Tell the peer that we support MPTCP.
Note that the MPC option also holds a token, which is necessary
only if the built-in Path Manager is used. In the next section we
describe the generalized case, where the token can be ignored by
the receiver if another path manager is used.
o DATA SEQUENCE NUMBER (DSN): Identify the position of a set of
bytes in the meta-flow.
o DATA FIN (DFIN): Terminate a meta-flow.
An implementation MUST use those options even if another Path Manager
than the default one is implemented.
The Path manager applies a particular technology to give the MPS the
possibility to use several paths. The built-in MPTCP Path Manager
uses multiple IPv4 addresses as its means to influence the forwarding
of packets through the Internet.
When the MPS starts a new connection, the PM chooses a token that
will be used to identify the connection. This is necessary to allow
the PM to apply the correct path index to incoming packets. An
example mapping table is given hereafter:
+-----------------+---------------+---------+-----------------+
| connection id | subflow id | token | Network id |
+-----------------+---------------+---------+-----------------+
| <A1,B1,pA1,pB1> | <conn_id,pi1> | token_1 | <A1,B1,pA1,pB1> |
| <A1,B1,pA1,pB1> | <conn_id,pi2> | token_1 | <A2,B2,pA1,pB2> |
| <A1,B1,pA1,pB1> | <conn_id,pi3> | token_1 | <A1,B2,pA1,pB2> |
| <A1,B1,pA1,pB1> | <conn_id,pi4> | token_1 | <A2,B1,pA1,pB1> |
| <A1,B1,pA1,pB3> | <conn_id,pi1> | token_2 | <A1,B1,pA1,pB3> |
| <A1,B1,pA1,pB3> | <conn_id,pi2> | token_2 | <A2,B1,pA1,pB3> |
+-----------------+---------------+---------+-----------------+
Table 1: Example mapping table for built-in PM
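The mapping in Table 1 can be sketched as a small lookup structure. The rows below are copied from the table; the class name Row and the functions outgoing and incoming are hypothetical names chosen for this example, and the lookup logic is one possible reading of how a built-in Path Manager could use the table.

```python
from typing import NamedTuple, Tuple


class Row(NamedTuple):
    conn_id: Tuple[str, str, str, str]     # exposed to the application
    path_index: int                        # second half of the subflow id
    token: str
    network_id: Tuple[str, str, str, str]  # written into packets


C1 = ("A1", "B1", "pA1", "pB1")
C2 = ("A1", "B1", "pA1", "pB3")

TABLE = [
    Row(C1, 1, "token_1", ("A1", "B1", "pA1", "pB1")),
    Row(C1, 2, "token_1", ("A2", "B2", "pA1", "pB2")),
    Row(C1, 3, "token_1", ("A1", "B2", "pA1", "pB2")),
    Row(C1, 4, "token_1", ("A2", "B1", "pA1", "pB1")),
    Row(C2, 1, "token_2", ("A1", "B1", "pA1", "pB3")),
    Row(C2, 2, "token_2", ("A2", "B1", "pA1", "pB3")),
]


def outgoing(conn_id, path_index):
    """Network id to write into a packet tagged with path_index."""
    for row in TABLE:
        if row.conn_id == conn_id and row.path_index == path_index:
            return row.network_id
    raise KeyError((conn_id, path_index))


def incoming(token, network_id):
    """Connection id and path index for a subflow joined with token."""
    for row in TABLE:
        if row.token == token and row.network_id == network_id:
            return row.conn_id, row.path_index
    raise KeyError((token, network_id))
```

For example, outgoing(C1, 3) yields the third row's network id, and incoming("token_2", ("A2", "B1", "pA1", "pB3")) attaches that address combination to the second connection.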
Table 1 shows an example where two connections are ongoing. One is
identified by token_1, the other one with token_2. Since addresses
are rewritten by the path manager, the attachment to the right
connection is achieved thanks to the token, which is used at
connection establishment and subflow establishment. It is then
remembered. The first column holds the information that is exposed
to the applications, while the last column shows the information that
is actually written in packets that will fly through the network. We
note that, in addition to the addresses, ports can be rewritten,
which contributes to supporting NATs. The table also shows the role
of the token, which is to attach various combinations of ports and
addresses to a single connection. The token is specific to the
built-in path manager, and can be ignored if another path manager is
used. An implementation of the built-in path manager MUST implement
the following options (defined in more detail in [3]):
o Add Address (ADDR): Announce a new address we own
o Remove Address (REMADDR): Withdraw a previously announced address
o Join Connection (JOIN): Attach a new subflow to the current
connection
Those options form the default MPTCP Path Manager, based on declaring
IP addresses, and carry control information in TCP options. An
implementation of Multipath TCP can use any Path Manager, but it MUST
be able to fall back to the default PM in case the other end does not
support the custom PM. Alternative Path Managers may be specified in
separate documents in the future.
A.1.2. Generic architecture for MPTCP
Now that the functional decomposition has been shown for MPTCP with
the built-in Path Manager, we show how that architecture can be
generalized to allow the implementation of other Path Managers for
MPTCP. A general overview of the architecture is provided in
Figure 9. The Multipath Scheduler (MPS) learns about the number of
available paths through notifications received from the Path Manager
(PM). From the point of view of the Multipath Scheduler, a path is
just a number, called a Path Index. Notifications from the PM to the
MPS MAY contain supporting information about the paths, if relevant,
so that the MPS can make more intelligent decisions about where to
route traffic. When the Multipath Scheduler initiates a
communication to a new host, it can only send the packets to the
default path. But since the Path manager is layered below the MPS,
it can detect that a new communication is happening, and tell the MPS
about the other paths it knows about.
Control plane <-- | --> Data plane
+---------------------------------------------------------------+
| Multipath Scheduler (MPS) |
+---------------------------------------------------------------+
^ | |
| | [A1,B1,|pA1,pB1]
| | |
|Announcing new | +-------------+
|paths. (referred | | Data packet |<--Path idx:3
|to as path indices) | +-------------+ attached
| | | by MPS
| | V
+--------------------------------------------\------------------+
| Path Manager (PM) \__________zzzzz |
+--------------------------------------------------------\------+
/ \ | \
/---------------------------\ | /"\ /"\ /"\
| subflow_id Action | | | | | | | |
|<A1,B1,pA1,pB1,1> xxxxx | | | | | | | |
|<A1,B1,pA1,pB1,2> yyyyy | | \./ \./ \./
|<A1,B1,pA1,pB1,3> zzzzz | | path1 path2 path3
+---------------------------+
Figure 9: Overview of MPTCP architecture
From then on, it is possible for the MPS to associate a Path Index
with its packets, so that the Path Manager can map this Path Index to
a particular action (see table in the lower left part of Figure 9).
The particular action depends on the network mechanism used to select
a path. Examples are address rewriting, tunnelling or setting a path
selector value inside the packet. Note that the Path Index is not
supposed to be written inside the packet, but instead associated with
it, internally to the implementation.
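As a rough illustration of the point that the Path Index travels with the packet as local metadata rather than inside it, the following Python sketch keeps the index in a field next to the payload and lets a PM-side table turn it into an action. The names Packet, rewrite_to, actions and pm_transmit are hypothetical and do not come from the draft; the three table rows loosely mirror the table in Figure 9, and the action is only a stand-in for address rewriting, tunnelling or setting a path selector.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Packet:
    payload: bytes                     # the bytes that would go on the wire
    path_index: Optional[int] = None   # local metadata only, never serialized


# An action takes the payload and returns (name of the path used, wire bytes).
Action = Callable[[bytes], Tuple[str, bytes]]


def rewrite_to(path_name: str) -> Action:
    """Hypothetical stand-in for address rewriting, tunnelling, etc."""
    def action(payload: bytes) -> Tuple[str, bytes]:
        # A real action would transform or route the packet; here the
        # payload is returned unchanged, so the index never reaches the wire.
        return path_name, payload
    return action


# Hypothetical PM table: Path Index -> action (assumed, for illustration).
actions: Dict[int, Action] = {
    1: rewrite_to("path1"),
    2: rewrite_to("path2"),
    3: rewrite_to("path3"),
}


def pm_transmit(pkt: Packet) -> Tuple[str, bytes]:
    """Look up the action for the packet's Path Index and apply it."""
    return actions[pkt.path_index](pkt.payload)


pkt = Packet(payload=b"segment", path_index=3)   # index attached by the MPS
path, wire = pm_transmit(pkt)
print(path, wire)
```

Keeping the index outside the payload means the same MPS can be paired with different path-selection mechanisms without changing what is sent on the wire.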
The applicability of the architecture is not limited to the MPTCP
protocol. While we define in this document an MPTCP MPS (MPTCP
Multipath Scheduler), other Multipath Schedulers can be defined. For
example, if an appropriate socket interface is designed, applications
could behave as a Multipath Scheduler and decide where to send any
particular data. In this document we concentrate on the MPTCP case,
however.
A.2. PM/MPS interface
The minimal set of requirements for a Path Manager is as follows:
o Outgoing untagged packets: Any outgoing packet flowing through the
Path Manager is either tagged or untagged (by the MPS) with a path
index. If it is untagged, the packet is sent normally to the
Internet, as if no multi-path support were present. Untagged
packets can be used to trigger a path discovery procedure: a Path
Manager can watch for untagged packets and decide at some point to
check whether any path other than the default one is usable for
the corresponding host pair. Note that any other
criteria could be used to decide when to start discovering
available paths. Note also that MPS scheduling will not be
possible until the Path Manager has notified the available paths.
The PM is thus the first entity coming into action.
o Outgoing tagged packets: The Path Manager maintains a table (For removal by the RFC Editor)
mapping path indices to actions. The action is the operation that
allows using a particular path. Examples of possible actions are
route selection, interface selection or packet transformation.
When the PM sees a packet tagged with a path index, it looks up
its table to find the appropriate action for that packet. The tag
is purely local. It is removed before the packet is transmitted.
o Incoming packets: A Path Manager MUST ensure that each incoming A.1. Changes since draft-ietf-mptcp-architecture-01
path is mapped unambiguously to exactly one outgoing path. Note
that this requirement implies that the same number of incoming/
outgoing paths must be established. Moreover, a PM MUST tag any
incoming path with the same Path Index as the one used for the
corresponding outgoing path. This is necessary for MPTCP to know
which outgoing path is acknowledged by an incoming packet.
o Module interface: A PM MUST be able to notify the MPS about the o Responded to review comments.
number of available paths. Such notifications MUST contain the
path indices that are legal for use by the MPS. In case the PM
decides to stop providing service for one path, it MUST notify the
MPS about path removal. Additionally, a PM MAY provide
complementary path information when available, such as link
quality or preference level.
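The four requirements above can be sketched as a small Python model. This is only one possible reading under stated assumptions: the class and method names (Scheduler, PathManager, add_path, remove_path, outgoing, incoming) are hypothetical, actions are represented as plain strings, and an incoming path is identified by an arbitrary hashable key such as an address pair. None of these choices are prescribed by the draft.

```python
from typing import Dict, Hashable, List, Optional, Set, Tuple


class Scheduler:
    """Hypothetical MPS side: it only tracks which path indices are legal."""

    def __init__(self) -> None:
        self.legal: Set[int] = set()
        self.info: Dict[int, dict] = {}

    def paths_available(self, indices: List[int],
                        info: Optional[Dict[int, dict]] = None) -> None:
        self.legal.update(indices)
        self.info.update(info or {})

    def path_removed(self, index: int) -> None:
        self.legal.discard(index)
        self.info.pop(index, None)


class PathManager:
    def __init__(self, mps: Scheduler) -> None:
        self.mps = mps
        self.out_action: Dict[int, str] = {}     # path index -> action
        self.in_index: Dict[Hashable, int] = {}  # incoming path -> path index

    def add_path(self, index: int, action: str, incoming_key: Hashable,
                 info: Optional[dict] = None) -> None:
        # Keep the incoming/outgoing mapping one-to-one.
        if index in self.out_action or incoming_key in self.in_index:
            raise ValueError("path index or incoming path already mapped")
        self.out_action[index] = action
        self.in_index[incoming_key] = index
        # Module interface: announce the new legal index (plus optional info).
        self.mps.paths_available([index], {index: info} if info else None)

    def remove_path(self, index: int) -> None:
        del self.out_action[index]
        self.in_index = {k: v for k, v in self.in_index.items() if v != index}
        self.mps.path_removed(index)

    def outgoing(self, payload: bytes,
                 path_index: Optional[int] = None) -> Tuple[str, bytes]:
        if path_index is None:
            return "default", payload            # untagged: sent normally
        # Tagged: look up the action; the tag itself is not transmitted.
        return self.out_action[path_index], payload

    def incoming(self, incoming_key: Hashable,
                 payload: bytes) -> Tuple[int, bytes]:
        # Tag with the index of the corresponding outgoing path.
        return self.in_index[incoming_key], payload


mps = Scheduler()
pm = PathManager(mps)
pm.add_path(1, "rewrite:A1->B1", incoming_key=("B1", "A1"))
pm.add_path(2, "rewrite:A2->B1", incoming_key=("B1", "A2"),
            info={"preference": 1})
print(sorted(mps.legal))
print(pm.outgoing(b"data"), pm.outgoing(b"data", 2))
print(pm.incoming(("B1", "A2"), b"ack"))
```

In this sketch the MPS cannot schedule on anything but the default path until add_path has run, which reflects the statement that the PM is the first entity to come into action.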
Appendix B. Changelog o Added security sections.
B.1. Changes since draft-ietf-mptcp-architecture-00 A.2. Changes since draft-ietf-mptcp-architecture-00
o Added middlebox compatibility discussion (Section 9). o Added middlebox compatibility discussion (Section 7).
o Clarified path identification (TCP 4-tuple) in Section 5.5. o Clarified path identification (TCP 4-tuple) in Section 5.5.
o Added brief scenario and diagram to Section 1.3. o Added brief scenario and diagram to Section 1.3.
Authors' Addresses Authors' Addresses
Alan Ford (editor) Alan Ford (editor)
Roke Manor Research Roke Manor Research
Old Salisbury Lane Old Salisbury Lane
skipping to change at page 27, line 23 skipping to change at page 25, line 4
Phone: +44 1794 833 465 Phone: +44 1794 833 465
Email: alan.ford@roke.co.uk Email: alan.ford@roke.co.uk
Costin Raiciu Costin Raiciu
University College London University College London
Gower Street Gower Street
London WC1E 6BT London WC1E 6BT
UK UK
Email: c.raiciu@cs.ucl.ac.uk Email: c.raiciu@cs.ucl.ac.uk
Mark Handley
University College London
Gower Street
London WC1E 6BT
UK
Sebastien Barre Email: m.handley@cs.ucl.ac.uk
Universite catholique de Louvain
Pl. Ste Barbe, 2
Louvain-la-Neuve 1348
Belgium
Phone: +32 10 47 91 03
Email: sebastien.barre@uclouvain.be
Janardhan Iyengar Janardhan Iyengar
Franklin and Marshall College Franklin and Marshall College
Mathematics and Computer Science Mathematics and Computer Science
PO Box 3003 PO Box 3003
Lancaster, PA 17604-3003 Lancaster, PA 17604-3003
USA USA
Phone: 717-358-4774 Phone: 717-358-4774
Email: jiyengar@fandm.edu Email: jiyengar@fandm.edu
 End of changes. 120 change blocks. 
633 lines changed or deleted 539 lines changed or added
