Internet Engineering Task Force                             A. Ford, Ed.
Internet-Draft                                       Roke Manor Research
Intended status: Informational                                 C. Raiciu
Expires: April 19, 2011                                       M. Handley
                                               University College London
                                                                S. Barre
                                                Universite catholique de
                                                                 Louvain
                                                              J. Iyengar
                                           Franklin and Marshall College
                                                        October 16, 2010

         Architectural Guidelines for Multipath TCP Development
                    draft-ietf-mptcp-architecture-02

Abstract

   Hosts are often connected by multiple paths, but TCP restricts
   communications to a single path per transport connection.  Resource
   usage within the network would be more efficient were these multiple
   paths able to be used concurrently.  This should enhance user
   experience through improved resilience to network failure and higher
   throughput.

   This document outlines architectural guidelines for the development
   of a Multipath Transport Protocol, with references to how these
   architectural components come together in the development of a
   Multipath TCP (MPTCP) protocol.  This document also lists certain
   high level design decisions that provide foundations for the design
   of the MPTCP protocol, based upon these architectural requirements.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 19, 2011.

Copyright Notice
   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
     1.2.  Terminology
     1.3.  Reference Scenario
   2.  Goals
     2.1.  Functional Goals
     2.2.  Compatibility Goals
       2.2.1.  Application Compatibility
       2.2.2.  Network Compatibility
       2.2.3.  Compatibility with other network users
       2.2.4.  Security Goals
   3.  An Architectural Basis For MPTCP
   4.  A Functional Decomposition of MPTCP
   5.  High-Level Design Decisions
     5.1.  Sequence Numbering
     5.2.  Reliability and Retransmissions
     5.3.  Buffers
     5.4.  Signalling
     5.5.  Path Management
     5.6.  Connection Identification
     5.7.  Congestion Control
     5.8.  Security
   6.  Interactions with Applications
   7.  Interactions with Middleboxes
   8.  Contributors
   9.  Acknowledgements
   10. IANA Considerations
   11. Security Considerations
   12. References
     12.1. Normative References
     12.2. Informative References
   Appendix A.  Implementation Architecture
     A.1.  Functional Separation
       A.1.1.  Application to default MPTCP protocol
       A.1.2.  Generic architecture for MPTCP
     A.2.  PM/MPS interface
   Appendix B.  Changelog
     B.1.  Changes since draft-ietf-mptcp-architecture-01
     B.2.  Changes since draft-ietf-mptcp-architecture-00
   Authors' Addresses

1.  Introduction

   As the Internet evolves, demands on Internet resources are ever-
   increasing, but often these resources (in particular, bandwidth)
   cannot be fully utilised due to protocol constraints both on the end-
   systems and within the network.  If these resources could instead be
   used concurrently, end user experience could be greatly improved.
   Such enhancements would also reduce the necessary expenditure on
   network infrastructure that would otherwise be needed to create an
   equivalent improvement in user experience.

   By the application of resource pooling[3], these available resources
   can be 'pooled' such that they appear as a single logical resource to
   the user.  The purpose of a multipath transport, therefore, is to
   make use of multiple available paths, through resource pooling, to
   bring two key benefits:

   o  To increase the resilience of the connectivity by providing
      multiple paths, protecting end hosts from the failure of one.

   o  To increase the efficiency of the resource usage, and thus
      increase the network capacity available to end hosts.

   MPTCP [4] is a set of extensions for TCP[1] that implements a
   multipath transport and achieves these goals by pooling multiple
   paths within a transport connection, transparent to the application.
   Although multihoming and multipath functions are not new to transport
   protocols, MPTCP aims to gain wide-scale deployment by recognising
   the importance of application and network compatibility goals.  These
   goals, discussed in more detail in Section 2, relate to the
   appearance of MPTCP to the network (so non-MPTCP-aware entities see
   it as TCP) and to the application (through providing an equivalent
   service to TCP to non-MPTCP-aware applications).

   This document has three key purposes: (i) it describes goals for a
   multipath transport - goals that MPTCP is designed to meet; (ii) it
   lays out an architectural basis for MPTCP's design - a discussion
   that applies to other multipath transports as well; and (iii) it
   discusses and documents high-level design decisions made in MPTCP's
   development, and considers their implications.

   Companion documents to this architectural overview are those which
   provide details of the protocol extensions[4], congestion control
   algorithms[5], and application-level considerations[6].  Put
   together, these components specify a complete Multipath TCP design.
   We note that specific components are replaceable with other protocols
   in accordance with the layer and functional decompositions discussed
   in this document.

   Please note this document is a work-in-progress and covers several
   topics, some of which may be more appropriately moved to separate
   documents as this work evolves.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [2].

1.2.  Terminology

   Path:  A sequence of links between a sender and a receiver, defined
      in this context by a source and destination address pair.

   Path Identifier:  Within the context of a multi-addressed multipath
      TCP, a path is defined by the source and destination (address,
      port) pairs (i.e. a 4-tuple).

   Host:  An end host either initiating or terminating a MPTCP
      connection.

   Multipath TCP:  A modified version of the TCP [1] protocol that
      supports the simultaneous use of multiple paths between hosts.

   MPTCP:  The proposed protocol extensions specified in [4] to provide
      a Multipath TCP implementation.

   Subflow:  A flow of TCP packets operating over an individual path,
      which forms part of a larger MPTCP connection.

   MPTCP Connection:  A set of one or more subflows combined to provide
      a single Multipath TCP service to an application at a host.

1.3.  Reference Scenario

   The diagram shown in Figure 1 illustrates a typical usage scenario
   for MPTCP.  Two hosts, A and B, are communicating with each other.
   These hosts are multi-homed and multi-addressed, providing two
   disjoint connections to the Internet.  The addresses on each host are
   referred to as A1, A2, B1 and B2.  There are therefore up to four
   different paths between the two hosts: A1-B1, A1-B2, A2-B1, A2-B2.

     +------+           __________           +------+
     |      |A1 ______ (          ) ______ B1|      |
     | Host |--/      (            )      \--| Host |
     |      |        (   Internet   )        |      |
     |  A   |--\______(            )______/--|   B  |
     |      |A2        (__________)        B2|      |
     +------+                                +------+

                   Figure 1: Simple MPTCP Usage Scenario

   The scenario could have any number of addresses (1 or more) on each
   host, so long as the number of paths available between the two hosts
   is 2 or more (i.e. num_addr(A) * num_addr(B) > 1).  The paths created
   by these address combinations through the Internet need not be
   entirely disjoint - shared bottlenecks will be addressed by the MPTCP
   congestion controller.  Furthermore, the paths through the Internet
   may traverse any number of middleboxes, including NATs and
   firewalls.  Finally, although the diagram refers to the Internet,
   MPTCP may be used over any network where there are multiple paths
   that could be used concurrently.
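
   As an illustration only, the candidate paths in the reference
   scenario can be enumerated as the cross product of the two hosts'
   address sets.  The Python sketch below uses hypothetical function
   names and treats a path purely as an address pair.  It ignores ports
   and does not detect whether pairs share a bottleneck.

```python
from itertools import product

def candidate_paths(addrs_a, addrs_b):
    """Return every (local, remote) address pair; each is one candidate path."""
    return list(product(addrs_a, addrs_b))

def multipath_possible(addrs_a, addrs_b):
    """True if num_addr(A) * num_addr(B) > 1, i.e. at least two paths exist."""
    return len(addrs_a) * len(addrs_b) > 1

# Reference scenario of Figure 1: two addresses on each host.
print(candidate_paths(["A1", "A2"], ["B1", "B2"]))
# [('A1', 'B1'), ('A1', 'B2'), ('A2', 'B1'), ('A2', 'B2')]
print(multipath_possible(["A1"], ["B1"]))  # False: only a single path
```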

   TBD - what further detail here would be useful?

2.  Goals

   This section outlines primary goals that Multipath TCP aims to meet.
   These are broadly broken down into: functional goals, which steer
   services and features that Multipath TCP must provide; and
   compatibility goals, which determine how Multipath TCP should appear
   to entities that interact with it.

2.1.  Functional Goals

   In supporting the use of multiple paths, Multipath TCP has the
   following two functional goals.

   o  Improve Throughput: Multipath TCP MUST support the concurrent use
      of multiple paths.  To meet the minimum performance incentives for
      deployment, a Multipath TCP connection over multiple paths SHOULD
      achieve no lesser throughput than a single TCP connection over the
      best constituent path.

   o  Improve Resilience: Multipath TCP MUST support the use of multiple
      paths interchangeably for resilience purposes, by permitting
      packets to be sent and re-sent on any available path.  It follows
      that, in the worst case, the protocol MUST be no less resilient
      than regular single-path TCP.

   As distribution of traffic among available paths and responses to
   congestion are done in accordance with resource pooling
   principles[3], a secondary effect of meeting these goals is that
   widespread use of Multipath TCP over the Internet should optimize
   overall network utility by shifting load away from congested
   bottlenecks and by taking advantage of spare capacity wherever
   possible.

   Furthermore, Multipath TCP SHOULD feature automatic negotiation of
   its use.  A host supporting Multipath TCP that requires the other
   host to do so too must be able to detect reliably whether this host
   does in fact support the required extensions, using them if so, and
   otherwise automatically falling back to single-path TCP.
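
   The negotiation-and-fallback rule can be reduced to a single
   decision.  This minimal Python sketch rests on stated assumptions:
   the Handshake fields are hypothetical stand-ins for whatever
   signalling the protocol specification [4] defines, and the
   signal_intact field models a middlebox stripping or altering that
   signalling.

```python
from dataclasses import dataclass

@dataclass
class Handshake:
    # Hypothetical: the peer indicated support for the multipath extensions.
    peer_supports_multipath: bool
    # Hypothetical: the signalling arrived unmodified (not stripped or
    # rewritten by a middlebox on the path).
    signal_intact: bool = True

def select_mode(local_supports_multipath: bool, hs: Handshake) -> str:
    """Use the multipath extensions only when both hosts support them and
    the negotiation was observed intact; otherwise fall back."""
    if local_supports_multipath and hs.peer_supports_multipath and hs.signal_intact:
        return "multipath"
    return "single-path TCP"

print(select_mode(True, Handshake(peer_supports_multipath=True)))   # multipath
print(select_mode(True, Handshake(peer_supports_multipath=False)))  # single-path TCP
```

   In this sketch, falling back is the default.  Any doubt about the
   peer or the path yields ordinary single-path TCP behaviour.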

2.2.  Compatibility Goals

   In addition to the functional goals listed above, a Multipath TCP
   must meet a number of compatibility goals in order to support
   deployment in today's Internet.  These goals fall into the following
   categories:

2.2.1.  Application Compatibility

   Application compatibility refers to the appearance of Multipath TCP
   to the application both in terms of the API that can be used and the
   expected service model that is provided.

   Multipath TCP MUST follow the same service model as TCP [1]:
   in-order, reliable, and byte-oriented delivery.  Furthermore, a
   Multipath TCP connection SHOULD provide the application with no worse
   throughput than it would expect from running a single TCP connection
   over any one of its available paths.

   A multipath-capable equivalent of TCP SHOULD retain backward
   compatibility with existing TCP APIs, so that existing applications
   can use the newer transport merely by upgrading the operating systems
   of the end-hosts.  This does not preclude the use of an advanced API
   to permit multipath-aware applications to specify preferences, nor
   for users to configure their systems in a different way from the
   default, for example switching on or off the automatic use of
   multipath extensions.

2.2.2.  Network Compatibility

   Traditional Internet architecture slots network devices in the
   network layer and lower layers, where the layers above the network
   layer - the transport layer and upper layers - are instantiated only
   at the end-hosts.  While this architecture, shown in Figure 2, was
   initially largely adhered to, this layering no longer reflects the
   "ground truth" in the Internet with the proliferation of
   middleboxes[7].  Middleboxes routinely interpose on the transport
   layer, sometimes even completely terminating transport connections,
   thus leaving the application layer as the first real end-to-end
   layer, as shown in Figure 3.

   +-------------+                                       +-------------+
   | Application |<------------ end-to-end ------------->| Application |
   +-------------+                                       +-------------+
   |  Transport  |<------------ end-to-end ------------->|  Transport  |
   +-------------+   +-------------+   +-------------+   +-------------+
   |   Network   |<->|   Network   |<->|   Network   |<->|   Network   |
   +-------------+   +-------------+   +-------------+   +-------------+
      End Host           Router             Router          End Host

                Figure 2: Traditional Internet Architecture

   +-------------+                                       +-------------+
   | Application |<------------ end-to-end ------------->| Application |
   +-------------+                     +-------------+   +-------------+
   |  Transport  |<------------------->|  Transport  |<->|  Transport  |
   +-------------+   +-------------+   +-------------+   +-------------+
   |   Network   |<->|   Network   |<->|   Network   |<->|   Network   |
   +-------------+   +-------------+   +-------------+   +-------------+
                                          Firewall,
      End Host           Router         NAT, or Proxy      End Host

                        Figure 3: Internet Reality

   Middleboxes that interpose on the transport layer result in loss of
   "fate-sharing"[8], that is, they often hold "hard" state that, when
   lost or corrupted, results in loss or corruption of the end-to-end
   transport connection.

   The network compatibility goal requires that the multipath extension
   to TCP retains compatibility with the Internet as it exists today,
   including making reasonable efforts to be able to traverse
   predominant middleboxes such as firewalls, NATs, and performance
   enhancing proxies[7].  This requirement comes from recognizing
   middleboxes as a significant deployment bottleneck for any transport
   that is not TCP, and constrains Multipath TCP to appear as TCP does
   on the wire and to use established TCP extensions where necessary.
   To ensure end-to-endness of the transport, we further require
   Multipath TCP to preserve fate-sharing without making any assumptions
   about middlebox behavior.

   A detailed analysis of middlebox behaviour and the impact on the
   Multipath TCP architecture is presented in Section 7.  In addition,
   network compatibility must be retained to the extent that Multipath
   TCP MUST fall back to regular TCP if there are insurmountable
   incompatibilities for the multipath extension on a path.

   MPTCP's modifications remain at the transport layer, although some
   knowledge of the underlying network layer is required.  MPTCP SHOULD
   work with IPv4 and IPv6 interchangeably, i.e. one MPTCP connection
   may operate over both IPv4 and IPv6 networks.
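
   To illustrate what operating over IPv4 and IPv6 interchangeably
   means, the following Python sketch (class names hypothetical) allows
   one connection to hold subflows of either family, while each
   individual subflow's 4-tuple stays within a single family.  It uses
   the standard ipaddress module and documentation address prefixes.
   It does not model how such subflows would be set up.

```python
import ipaddress
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Subflow:
    src: str
    sport: int
    dst: str
    dport: int

    def family(self) -> int:
        """Return 4 or 6; a single subflow cannot mix address families."""
        s = ipaddress.ip_address(self.src)
        d = ipaddress.ip_address(self.dst)
        if s.version != d.version:
            raise ValueError("source and destination families differ")
        return s.version

@dataclass
class Connection:
    subflows: list = field(default_factory=list)

    def add_subflow(self, sf: Subflow) -> None:
        sf.family()  # raises ValueError for a mixed-family 4-tuple
        self.subflows.append(sf)

    def families(self) -> set:
        return {sf.family() for sf in self.subflows}

conn = Connection()
conn.add_subflow(Subflow("192.0.2.1", 40000, "198.51.100.1", 80))
conn.add_subflow(Subflow("2001:db8::1", 40001, "2001:db8:1::1", 80))
print(conn.families())  # {4, 6}
```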

2.2.3.  Compatibility with other network users

   As a corollary to both network and application compatibility, the
   architecture must enable new Multipath TCP flows to coexist
   gracefully with existing single-path TCP flows, competing for
   bandwidth neither unduly aggressively nor unduly timidly (unless
   low-precedence operation is specifically requested by the
   application, such as with LEDBAT).  The use of multiple paths MUST
   NOT unduly harm users using single-path TCP at shared bottlenecks,
   beyond the impact that would occur from another single-path TCP
   flow.  Multiple Multipath TCP flows on a shared bottleneck MUST share
   bandwidth between each other with similar fairness to that which
   occurs at a shared bottleneck with single-path TCP.

2.2.4.  Security Goals

   The extension of TCP with multipath capabilities will bring with it a
   number of new threats, analysed in detail in [9].  The security goal
   for Multipath TCP is to provide a service no less secure than
   regular, single-path TCP.  This will be achieved through a
   combination of existing TCP security mechanisms (potentially modified
   to align with the Multipath TCP extensions) and of protection against
   the new multipath threats identified.  The design decisions derived
   from this goal are presented in Section 5.8.

3.  An Architectural Basis For MPTCP

   We now present one possible transport architecture that we believe
   can effectively support MPTCP's goals.  The new Internet model
   described here is based on ideas proposed earlier in Tng ("Transport
   next-generation") [10].  While by no means the only possible
   architecture supporting multipath transport, Tng incorporates many
   lessons learned from previous transport research and development
   practice, and offers a strong starting point from which to consider
   the extant Internet architecture and its bearing on the design of any
   new Internet transports or transport extensions.

          +------------------+
          |    Application   |
          +------------------+  ^ Application-oriented transport
          |                  |  | functions (Semantic Layer)
          + - - Transport - -+ ----------------------------------
          |                  |  | Network-oriented transport
          +------------------+  v functions (Flow+Endpoint Layer)
          |      Network     |
          +------------------+
            Existing Layers             Tng Decomposition

              Figure 4: Decomposition of Transport Functions

   Tng loosely splits the transport layer into "application-oriented"
   and "network-oriented" layers, as shown in Figure 4.  The
   application-oriented "Semantic" layer implements functions driven
   primarily by concerns of supporting and protecting the application's
   end-to-end communication, while the network-oriented "Flow+Endpoint"
   layer implements functions such as endpoint identification (using
   port numbers) and congestion control.  These network-oriented
   functions, while traditionally located in the ostensibly "end-to-end"
   Transport layer, have proven in practice to be of great concern to
   network operators and the middleboxes they deploy in the network to
   enforce network usage policies[11] [12] or optimize communication
   performance[13].  Figure 5 shows how middleboxes interact with
   different layers in this decomposed model of the transport layer: the
   application-oriented layer operates end-to-end, while the network-
   oriented layer operates "segment-by-segment" and can be interposed
   upon by middleboxes.

   +-------------+                                       +-------------+
   | Application |<------------ end-to-end ------------->| Application |
   +-------------+                                       +-------------+
   |  Semantic   |<------------ end-to-end ------------->|  Semantic   |
   +-------------+   +-------------+   +-------------+   +-------------+
   |Flow+Endpoint|<->|Flow+Endpoint|<->|Flow+Endpoint|<->|Flow+Endpoint|
   +-------------+   +-------------+   +-------------+   +-------------+
   |   Network   |<->|   Network   |<->|   Network   |<->|   Network   |
   +-------------+   +-------------+   +-------------+   +-------------+
                        Firewall         Performance
      End Host           or NAT        Enhancing Proxy      End Host

              Figure 5: Middleboxes in the new Internet model

   MPTCP's architectural design follows Tng's decomposition as shown in
   Figure 6.  MPTCP, which provides application compatibility through
   the preservation of TCP-like semantics of global ordering of
   application data and reliability, is an instantiation of the
   "application-oriented" Semantic layer; whereas the subflow TCP
   component, which provides network compatibility by appearing and
   behaving as a TCP flow in the network, is an instantiation of the
   "network-oriented" Flow+Endpoint layer.

     +--------------------------+    +-------------------------------+
     |      Application         |    |          Application          |
     +--------------------------+    +-------------------------------+
     |        Semantic          |    |             MPTCP             |
     |------------+-------------|    + - - - - - - - + - - - - - - - +
     | Flow+Endpt | Flow+Endpt  |    | Subflow (TCP) | Subflow (TCP) |
     +------------+-------------+    +---------------+---------------+
     |   Network  |   Network   |    |       IP      |       IP      |
     +------------+-------------+    +---------------+---------------+

                      Figure 6: MPTCP mapping to Tng

   As a protocol extension to TCP, MPTCP thus explicitly acknowledges
   middleboxes in its design, and specifies a protocol that operates at
   two scales: the MPTCP component operates end-to-end, while it allows
   the TCP component to operate segment-by-segment.

4.  A Functional Decomposition of MPTCP

   Having laid out the goals to be met and the architectural basis for
   MPTCP, we now provide a functional decomposition of MPTCP's design.

   MPTCP, as described in [4], makes use of (what appear to the network
   to be) standard TCP sessions, termed "subflows", to provide the
   underlying transport per path, and as such these retain the network
   compatibility desired.  MPTCP-specific information is carried in a
   TCP-compatible manner, although this mechanism is separate from the
   actual information being transferred so could evolve in future
   revisions.  Figure 7 illustrates the layered architecture.

                                   +-------------------------------+
                                   |           Application         |
      +---------------+            +-------------------------------+
      |  Application  |            |             MPTCP             |
      +---------------+            + - - - - - - - + - - - - - - - +
      |      TCP      |            | Subflow (TCP) | Subflow (TCP) |
      +---------------+            +-------------------------------+
      |      IP       |            |       IP      |      IP       |
      +---------------+            +-------------------------------+

      Figure 7: Comparison of Standard TCP and MPTCP Protocol Stacks

   Situated below the application, the MPTCP extension in turn manages
   multiple TCP subflows below it.  In order to do this, it must
   implement the following functions:

   o  Path Management: This is the function to detect and use multiple
      paths between two hosts.  In the case of the MPTCP design [4],
      this feature is implemented using multiple IP addresses at one or
      both of the hosts.  Although this does not guarantee path
      diversity, and there may be shared bottlenecks, this is a simple
      mechanism that can be used with no additional features in the
      network.  The path management features of the MPTCP protocol are
      the mechanisms to signal alternative addresses to hosts, and
      mechanisms to set up new subflows joined to an existing MPTCP
      connection.

   o  Packet Scheduling: This function breaks the bytestream received
      from the application into segments to be transmitted on one of the
      available subflows.  The MPTCP design makes use of a data
      sequence mapping, associating segments sent on different subflows
      to a connection-level sequence numbering, thus allowing segments
      sent on different subflows to be correctly re-ordered at the
      receiver.  The packet scheduler is dependent upon information
      about the availability of paths exposed by the path management
      component, and then makes use of the subflows to transmit queued
      segments.

   o  Subflow (single-path TCP) Interface: A subflow component takes
      segments from the packet-scheduling component and transmits them
      over the specified path, ensuring detectable delivery to the host.
      Detection of delivery is necessary to allow the congestion control
      protocol to attribute packet delivery or loss to the right path.
      Note that the packet scheduling component does not embed enough
      information in packets to allow this to happen: segments with the
      same connection-level sequence number can be transmitted over
      multiple paths, i.e. as retransmissions or just to increase
      redundancy.  MPTCP uses TCP underneath for network compatibility;
      TCP ensures in-order, reliable delivery.  TCP adds its own
      sequence numbers to the segments; these are used to detect and
      retransmit lost packets at the subflow layer.  The connection-level
      sequence numbering from the packet scheduling component allows
      re-ordering of the entire bytestream.

   o  Congestion Control: This function coordinates congestion control
      across the subflows.  As specified, this congestion control
      algorithm MUST ensure that an MPTCP connection does not unfairly
      take more bandwidth than a single path TCP flow would take at a
      shared bottleneck.  An algorithm to support this is specified in
      [5].
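
   One concrete example of such coupling is the Linked Increases
   Algorithm, a coupled algorithm later published as RFC 6356.  The
   sketch below follows it.  Whether it matches [5] in every detail is
   an assumption.  Windows are counted in packets, and slow start, byte
   counting and the MSS are ignored.  The coupling factor alpha scales
   down the aggregate increase so that the connection does not take more
   than a single TCP flow would at a shared bottleneck.  The increase on
   any one subflow is also capped at what regular TCP would get on that
   path.

```python
def lia_alpha(cwnd, rtt):
    """Coupling factor: total * max(w_i / rtt_i^2) / (sum(w_i / rtt_i))^2."""
    total = sum(cwnd)
    best = max(w / r ** 2 for w, r in zip(cwnd, rtt))
    denom = sum(w / r for w, r in zip(cwnd, rtt)) ** 2
    return total * best / denom

def on_ack(cwnd, rtt, i):
    """Congestion-avoidance increase on subflow i for one acknowledged packet."""
    alpha = lia_alpha(cwnd, rtt)
    # Coupled increase, never more than regular TCP's 1/w_i on this subflow.
    cwnd[i] += min(alpha / sum(cwnd), 1.0 / cwnd[i])

def on_loss(cwnd, i):
    """Loss on subflow i halves only that subflow's window."""
    cwnd[i] = max(cwnd[i] / 2.0, 1.0)

# One subflow: alpha is 1 and the increase equals regular TCP's 1/w.
single = [10.0]
on_ack(single, [0.1], 0)

# Two equal subflows: alpha is 0.5 and each ACK adds 0.5/20 = 0.025,
# a quarter of the 0.1 that a lone TCP flow with w = 10 would add.
pair = [10.0, 10.0]
on_ack(pair, [0.1, 0.1], 0)
```

   The min() in on_ack ensures that the coupled increase on a subflow is
   never more aggressive than a regular TCP flow on that subflow's path.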

   These functions fit together as follows.  The Path Management looks
   after the discovery (and if necessary, initialisation) of multiple
   paths between two hosts.  The Packet Scheduler then receives a stream
   of data from the application destined for the network, and undertakes
   the necessary operations on it (such as segmenting the data into
   connection-level segments, and adding a connection-level sequence
   number) before sending it on to a subflow.  The subflow then adds its
   own sequence numbers and acknowledgements, and passes the segments to
   the network.  The receiving subflow re-orders data (if necessary) and
   passes it to the packet scheduling component, which performs
   connection-level re-ordering and sends the data stream to the
   application.  Finally, the congestion control component exists as
   part of the packet scheduling, in order to schedule which packets
   should be sent at what rate on which subflow.
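
   The division of labour described above can be sketched end to end in
   Python.  Everything in this sketch is hypothetical and deliberately
   simplified.  A round-robin policy stands in for a real,
   congestion-aware packet scheduler, and there is no loss or
   retransmission.  The receiver reassembles by sorting on the
   connection-level sequence number rather than by buffering
   out-of-order data.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    data_seq: int     # connection-level sequence number of the first byte
    subflow_seq: int  # subflow-level sequence number of the first byte
    payload: bytes

class Subflow:
    """Each subflow numbers its own bytes contiguously from zero."""
    def __init__(self):
        self.next_seq = 0

    def send(self, data_seq: int, payload: bytes) -> Segment:
        seg = Segment(data_seq, self.next_seq, payload)
        self.next_seq += len(payload)
        return seg

def schedule(stream: bytes, subflows, mss: int):
    """Cut the bytestream into connection-level segments and hand them
    to the subflows in round-robin order (a hypothetical policy)."""
    sent = [[] for _ in subflows]
    for k, offset in enumerate(range(0, len(stream), mss)):
        i = k % len(subflows)
        sent[i].append(subflows[i].send(offset, stream[offset:offset + mss]))
    return sent

def reassemble(per_subflow) -> bytes:
    """Merge segments from all subflows by connection-level sequence number."""
    segments = sorted((s for flow in per_subflow for s in flow),
                      key=lambda s: s.data_seq)
    return b"".join(s.payload for s in segments)

data = b"hello multipath world"
sent = schedule(data, [Subflow(), Subflow()], mss=4)
# The order in which subflows deliver does not affect reassembly.
print(reassemble(list(reversed(sent))) == data)  # True
```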

5.  High-Level Design Decisions

   There is seemingly a wide range of choices when designing a multipath
   extension to TCP.  However, the goals as discussed earlier in this
   document constrain the possible solutions, leaving relatively little
   choice in many areas.  Here, we outline high-level design choices
   that draw from the architectural basis discussed earlier in
   Section 3, and their implications for the MPTCP design [4].

5.1.  Sequence Numbering

   MPTCP uses two levels of sequence spaces: a connection level sequence
   number, and another sequence number for each subflow.  This permits
   connection-level segmentation and reassembly, and retransmission of
   the same part of the connection-level sequence space on a different
   subflow, under that subflow's own sequence numbers.

   The alternative approach would be to use a single connection level
   sequence number, which gets sent on multiple subflows.  This has two
   problems: first, the individual subflows will appear to the network
   as TCP sessions with gaps in the sequence space; this in turn may
   upset certain middleboxes such as intrusion detection systems, or
   certain transparent proxies, and would thus go against the network
   compatibility goal.  Second, the sender would not be able to
   attribute packet losses or receptions to the correct path when the
   same packet is sent on multiple paths (i.e. in the case of
   retransmissions).

   The sender must be able to tell the receiver how to reassemble the
   data, for delivery to the application.  In order to achieve this, the
   receiver must determine how subflow-level data (carrying subflow
   sequence numbers) maps at the connection level.  We refer to this as
   the Data Sequence Mapping.  This mapping takes the form (data seq,
   subflow seq, length), i.e. for a given number of bytes (the length),
   the subflow sequence space beginning at the given sequence number
   maps to the connection-level sequence space (beginning at the given
   data seq number).
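   Applying a Data Sequence Mapping is simple arithmetic.  The following
   minimal Python sketch uses illustrative names (DataSeqMapping,
   to_data_seq) that are not defined by any specification, and ignores
   sequence-number wrap-around.

```python
from typing import NamedTuple, Optional


class DataSeqMapping(NamedTuple):
    data_seq: int     # connection-level sequence number of the first mapped byte
    subflow_seq: int  # subflow sequence number of the first mapped byte
    length: int       # number of bytes covered


def to_data_seq(m: DataSeqMapping, subflow_seq: int) -> Optional[int]:
    """Connection-level sequence number for one subflow byte, or None if unmapped.

    Sequence number wrap-around is ignored to keep the sketch short.
    """
    offset = subflow_seq - m.subflow_seq
    if 0 <= offset < m.length:
        return m.data_seq + offset
    return None
```

   For the mapping (1000, 5000, 100), subflow byte 5099 maps to data
   sequence number 1099, while subflow byte 5100 falls outside it.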

   This architecture does not mandate a mechanism for signalling the
   Data Sequence Mapping, and it could conceivably have various sources.

   One option would be to use existing fields in the TCP segment (such
   as subflow seqno, length) and only add the data sequence number to
   each segment, for instance as a TCP option.  This is, however,
   vulnerable to middleboxes that resegment or assemble data, since
   there is no specified behaviour for coalescing TCP options.  If one
   signalled (data seqno, length), this would still be vulnerable to
   middleboxes that coalesce segments and do not understand MPTCP
   signalling so do not correctly rewrite the options.

   Because of these potential issues, the design decision taken in the
   MPTCP protocol [4] is that whenever a mapping for subflow data needs
   to be conveyed to the other host, all three pieces of data (data seq,
   subflow seq, length) must be sent.  To reduce the overhead, it would
   be permissible for the mapping to be sent periodically and cover more
   than a single segment.  Further experimentation is required to
   determine what tradeoffs exist regarding the frequency at which
   mappings should be sent.  It could also be excluded entirely in the
   case of a connection before more than one subflow is used, where the
   data-level and subflow-level sequence spaces are the same.

5.2.  Reliability and Retransmissions

   MPTCP features acknowledgements at connection-level as well as
   subflow-level acknowledgements, in order to provide a robust service
   to the application.

   Under normal behaviour, MPTCP can use the data sequence mapping and
   subflow ACKs to decide when a connection-level segment was received.
   The transmission of TCP ACKs for a subflow is handled entirely at
   the subflow level, in order to maintain TCP semantics and trigger
   subflow-level retransmissions.  This has certain implications for
   end-to-end semantics.  First, once a packet is acked at the subflow
   level it cannot be discarded in the re-order buffer at the
   connection level.  Secondly, unlike in standard TCP, a receiver
   cannot simply drop out-of-order segments if needed (for instance, due
   to memory pressure).  Under certain circumstances, therefore, it may
   be desirable to be able to drop packets after acknowledgement on the
   subflow but before delivery to the application, and this can be
   facilitated by a connection-level acknowledgement.

   Furthermore, it is possible to conceive of some cases where
   connection-level acknowledgements could improve robustness.  Consider
   a subflow traversing a transparent proxy: if the proxy acks a segment
   and then crashes, the sender will not retransmit the lost segment on
   another subflow, as it thinks the segment has been received.  The
   connection grinds to a halt despite having other working subflows,
   and the sender would be unable to determine the cause of the problem.
   An example situation where this may occur would be mobility between
   wireless access points, each of which operates a transport-level
   proxy.  Finally, as an optimisation, it may be feasible for a
   connection-level acknowledgement to be transmitted over the shortest
   RTT path, potentially reducing send buffer requirements (see
   Section 5.3).

   Therefore, to provide a fully robust multipath TCP solution, MPTCP
   SHOULD feature explicit connection-level acknowledgements, in
   addition to subflow-level acknowledgements.  A connection-level
   acknowledgement would only be required in order to signal when the
   receive window moves forward; the heuristics for using such a signal
   are discussed in more detail in the protocol specification [4].
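   A connection-level acknowledgement can be derived from the data
   already placed in the connection-level receive buffer.  The Python
   sketch below assumes TCP-like cumulative semantics (acknowledge up to
   the first gap in data sequence space); the function name and the
   (data_seq, length) representation are illustrative assumptions.

```python
from typing import Iterable, Tuple


def data_ack(received: Iterable[Tuple[int, int]], initial: int) -> int:
    """Next expected connection-level sequence number (cumulative data ACK).

    `received` holds (data_seq, length) ranges already placed in the
    connection-level receive buffer, possibly out of order or overlapping.
    """
    ack = initial
    for start, length in sorted(received):
        if start > ack:
            break  # first gap: the cumulative ACK cannot advance past it
        ack = max(ack, start + length)
    return ack
```

   For ranges (0, 10), (10, 10) and (30, 5), the acknowledgement stops
   at 20, because bytes 20 to 29 have not yet arrived on any subflow.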

   Regarding retransmissions, it MUST be possible for a packet to be
   retransmitted on a different subflow to that on which it was
   originally sent.  This is one of MPTCP's core goals, in order to
   maintain integrity during temporary or permanent subflow failure, and
   this is enabled by the dual sequence number space.

   The scheduling of retransmissions will have significant impact on
   MPTCP user experience.  The current MPTCP specification suggests that
   data outstanding on subflows that have timed out should be
   rescheduled for transmission on different subflows.  This behaviour
   aims to minimize disruption when a path breaks, and uses the first
   timeout as an indicator.  More conservative approaches would use the
   second or third timeout for the same packet.

   Typically, fast retransmit on an individual subflow will not trigger
   retransmission on another subflow, although this may still be
   desirable in certain cases, for instance to reduce the receive buffer
   requirements.  However, in all cases with retransmissions on
   different subflows, the lost packets SHOULD still be sent on the path
   that lost them.  This is currently believed to be necessary to
   maintain subflow integrity, as per the network compatibility goal.
   By doing this, throughput will be wasted, and it is unclear at this
   point what the optimal retransmit strategy is.

   Large-scale experiments are therefore required in order to determine
   the most appropriate retransmission strategy, and recommendations
   will be refined once more information is available.
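   The timeout-driven rescheduling described above might be sketched as
   follows.  The round-robin spread over the other live subflows is an
   arbitrary, hypothetical choice, and the function name is
   illustrative; as noted above, the best strategy is still an open
   question.

```python
from typing import Dict, List


def reschedule_on_timeout(outstanding: Dict[str, List[int]], timed_out: str,
                          alive: List[str]) -> Dict[str, List[int]]:
    """Plan extra transmissions after a timeout on subflow `timed_out`.

    `outstanding` maps each subflow to the data sequence numbers it has sent
    but not yet had acknowledged.  Data stuck on the timed-out subflow is
    spread round-robin over the other live subflows.  The timed-out subflow
    still retransmits the same data itself, as ordinary TCP would, so that
    its own sequence space stays intact.
    """
    others = [s for s in alive if s != timed_out]
    plan: Dict[str, List[int]] = {s: [] for s in others}
    if not others:
        return plan  # nowhere else to go: rely on the subflow's own retransmission
    for i, dsn in enumerate(outstanding.get(timed_out, [])):
        plan[others[i % len(others)]].append(dsn)
    return plan
```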

5.3.  Buffers

   To ensure in-order delivery, Multipath TCP must use a connection
   level receive buffer, where segments are placed until they are in
   order and can be read by the application.

   In regular, single-path TCP, it is usually recommended to set the
   receive buffer to 2*BDP (Bandwidth-Delay Product, i.e.  BDP = BW*RTT,
   where BW = Bandwidth and RTT = Round-Trip Time).  One BDP allows
   supporting reordering of segments by the network.  The other BDP
   allows the connection to continue during fast retransmit: when a
   segment is fast retransmitted, the receiver must be able to store
   incoming data during one more RTT.

   For Multipath TCP, the story is a bit more complicated.  The ultimate
   goal is that a packet loss or subflow failure should not affect the
   throughput of other working subflows; the receiver should have enough
   buffering to store all data until the missing packet is
   re-transmitted and reaches the destination.

   The worst case scenario would be when the subflow with the highest
   RTT/RTO (Round-Trip Time or Retransmission TimeOut) experiences a
   timeout; in that case the receiver has to buffer data from all
   subflows for the duration of the RTO.  Thus, the smallest
   connection-level receive buffer that would be needed to avoid
   stalling with subflow failures is sum(BW_i)*RTO_max, where BW_i is
   the bandwidth of subflow i and RTO_max is the largest RTO across all
   subflows.

   This is an order of magnitude more than the receive buffer required
   for a single connection, and is probably too expensive for practical
   purposes.  A more sensible requirement is to avoid stalls in the
   absence of timeouts.  Therefore, the RECOMMENDED receive buffer is
   2*sum(BW_i)*RTT_max, where RTT_max is the largest RTT across all
   subflows.  This buffer sizing ensures subflows do not stall when fast
   retransmit is triggered on any subflow.

   The resulting buffer size should be small enough for practical use.
   However, there may be extreme cases where fast, high throughput paths
   (e.g. 100Mb/s, 10ms RTT) are used in conjunction with slow paths
   (e.g. 1Mb/s, 1000ms RTT).  In that case the required receive buffer
   would be roughly 25MB (2 * 101Mb/s * 1s), which is likely too big.
   In these cases a Multipath TCP scheduler SHOULD use only the fast
   path, potentially falling back to the slow path if the fast path
   fails.
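   The sizing rules above can be checked with a few lines of arithmetic.
   In this Python sketch, bandwidths are in bits per second and times in
   milliseconds, the function names are illustrative, and integer
   division keeps the results as exact byte counts.

```python
from typing import Sequence


def worst_case_rcvbuf(bw_bps: Sequence[int], rto_max_ms: int) -> int:
    """sum(BW_i) * RTO_max in bytes: survives a timeout on any subflow."""
    return sum(bw_bps) * rto_max_ms // (8 * 1000)


def recommended_rcvbuf(bw_bps: Sequence[int], rtt_max_ms: int) -> int:
    """2 * sum(BW_i) * RTT_max in bytes: no stalls while fast retransmit works."""
    return 2 * sum(bw_bps) * rtt_max_ms // (8 * 1000)


# The extreme example from the text: (bandwidth in bit/s, RTT in ms).
paths = {"fast": (100_000_000, 10), "slow": (1_000_000, 1000)}
bws = [bw for bw, _ in paths.values()]
rtt_max = max(rtt for _, rtt in paths.values())
print(recommended_rcvbuf(bws, rtt_max))  # 25250000 bytes, about 25MB
print(recommended_rcvbuf([paths["fast"][0]], paths["fast"][1]))  # 250000 bytes
```

   The second figure shows why restricting the connection to the fast
   path is attractive here: on its own it needs only about 250KB.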

   The RECOMMENDED send buffer is the same size as the recommended
   receive buffer, i.e. 2*sum(BW_i)*RTT_max.  This is because the sender
   must store locally the segments sent but unacknowledged by the
   connection-level ACK.  The send buffer size matters particularly for
   hosts that maintain a large number of ongoing connections.  If the
   required send buffer is too large, a host can choose to only send
   data on the fast subflows, using the slow subflows only in cases of
   failure.
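   The fallback to fast subflows can be expressed as a simple admission
   policy.  The greedy, fastest-first rule below is a hypothetical
   illustration rather than a recommendation of this document, and the
   function names are illustrative; it reuses the RECOMMENDED sizing of
   2*sum(BW_i)*RTT_max, with bandwidth in bits per second and RTT in
   milliseconds.

```python
from typing import List, Tuple

Path = Tuple[str, int, int]  # (name, bandwidth in bit/s, RTT in ms)


def required_buffer(paths: List[Path]) -> int:
    """2 * sum(BW_i) * RTT_max, in bytes."""
    if not paths:
        return 0
    return 2 * sum(bw for _, bw, _ in paths) * max(rtt for _, _, rtt in paths) // 8000


def subflows_within_budget(paths: List[Path], budget_bytes: int) -> List[str]:
    """Hypothetical policy: add subflows fastest-first while the buffer fits."""
    chosen: List[Path] = []
    for path in sorted(paths, key=lambda p: p[1], reverse=True):
        if required_buffer(chosen + [path]) <= budget_bytes:
            chosen.append(path)
    return [name for name, _, _ in chosen]
```

   With a 1MB budget and the paths from the extreme example above, only
   the fast subflow is admitted; adding the slow path would raise the
   requirement to about 25MB.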

5.4.  Signalling

   Since MPTCP uses TCP as its subflow transport mechanism, an MPTCP
   connection will also begin as a single TCP connection.  Nevertheless,
   it must signal to the peer that it supports MPTCP and wishes to use
   it on this connection.  As such, a TCP Option will be used to
   transmit this information, since this is the established mechanism
   for indicating additional functionality on a TCP session.

   In addition, further signalling is required during the operation of
   an MPTCP session, such as that for reassembly for multiple subflows,
   and for informing the other host about potential other available
   addresses.  The architecture does not mandate the format in which
   this signalling should be transmitted.

   The MPTCP protocol design [4] continues to use TCP Options for this
   signalling.  This has been chosen as the mechanism most fitting in
   with the goals as specified in Section 2.  With this mechanism, the
   signalling required to operate MPTCP is transported separately from
   the data, allowing it to be created and processed separately from the
   data stream, and retaining architectural compatibility with network
   entities.
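   Because this signalling travels in TCP options, a receiver has to walk
   the option list using the kind/length/data layout that [1] defines
   for every option other than End of Option List (kind 0) and
   No-Operation (kind 1); the option space it walks is at most 40 bytes
   per segment.  The Python sketch below is illustrative only.  The kind
   value 253 is one of the experimental option kinds reserved by RFC
   4727 and is used here purely as a placeholder for an MPTCP option.

```python
from typing import List, Tuple

EOL, NOP = 0, 1                 # single-byte options
HYPOTHETICAL_MPTCP_KIND = 253   # placeholder: experimental kind, not an MPTCP assignment


def parse_tcp_options(raw: bytes) -> List[Tuple[int, bytes]]:
    """Return (kind, data) pairs from the options area of a TCP header."""
    opts = []
    i = 0
    while i < len(raw):
        kind = raw[i]
        if kind == EOL:
            break                       # end of option list
        if kind == NOP:
            i += 1                      # padding between options
            continue
        if i + 1 >= len(raw):
            raise ValueError("truncated option header")
        length = raw[i + 1]             # includes the kind and length bytes
        if length < 2 or i + length > len(raw):
            raise ValueError(f"bad option length {length} at offset {i}")
        opts.append((kind, raw[i + 2:i + length]))
        i += length
    return opts
```

   For example, two NOPs followed by an MSS option (kind 2, value 1460)
   and a three-byte placeholder option parse to [(2, b"\x05\xb4"),
   (253, b"\x7f")].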

5.5.  Path Management

   Currently, the network does not expose multiple paths between hosts.
   Multipath TCP will use multiple addresses at one or both hosts to
   infer different paths across the network.  The hope is that these
   paths, whilst not necessarily entirely non-overlapping, will be
   sufficiently disjoint to allow multipath to achieve improved
   throughput and robustness.  The use of multiple IP addresses is a
   simple mechanism that requires no additional features in the network.

   Multiple different (source, destination) address pairs will thus be
   used as path selectors.  Each path will be identified by a TCP
   4-tuple (i.e. source address, destination address, source port,
   destination port), thus allowing the extension of MPTCP to use such
   4-tuples as path selectors if the network will route different ports
   over different paths (which may be the case with technologies such as
   Equal Cost MultiPath (ECMP) routing, e.g. [14]).
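   Enumerating candidate path selectors from the available addresses
   might look like the following hypothetical Python sketch.  It pairs
   every local address with every remote address of the same IP version
   (a single subflow cannot mix IPv4 and IPv6, although one connection
   may use both), and gives each subflow a distinct source port starting
   at an arbitrary ephemeral value; that port allocation scheme and the
   names FourTuple and candidate_paths are assumptions.

```python
from ipaddress import ip_address
from itertools import product
from typing import List, NamedTuple, Sequence


class FourTuple(NamedTuple):
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int


def candidate_paths(local_addrs: Sequence[str], remote_addrs: Sequence[str],
                    dst_port: int, first_src_port: int = 49152) -> List[FourTuple]:
    """One 4-tuple per usable (source, destination) address pair."""
    paths: List[FourTuple] = []
    for src, dst in product(local_addrs, remote_addrs):
        if ip_address(src).version != ip_address(dst).version:
            continue  # an IPv4 source cannot reach an IPv6 destination directly
        paths.append(FourTuple(src, dst, first_src_port + len(paths), dst_port))
    return paths
```

   With two IPv4 and one IPv6 local address, and one remote address of
   each family, this yields three candidate subflows.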

   For increased chance of successfully setting up additional subflows
   (such as when one end is behind a firewall, NAT, or other restrictive
   middlebox), either host SHOULD be able to add new subflows to an
   MPTCP connection.  MPTCP MUST be able to handle paths that appear and
   disappear during the lifetime of a connection (for example, through
   the activation of an additional network interface).

   The modularity of path management will permit alternative mechanisms
   to be employed if appropriate in the future.

5.6.  Connection Identification

   Since an MPTCP connection may not be bound to a traditional 5-tuple
   (source addr and port, destination addr and port, protocol number)
   for the entirety of its existence, it is desirable to provide a new
   mechanism for connection identification.  This will be useful for
   MPTCP-aware applications, and for the MPTCP implementation (and
   MPTCP-aware middleboxes) to have a unique identifier with which to
   associate the multiple subflows.

   Therefore, each MPTCP connection requires a connection identifier at
   each host, which is locally unique within that host.  In many ways,
   this is analogous to a port number in regular TCP.  The manifestation
   and purpose of such an identifier is out of the scope of this
   architecture document.
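   One way a host could keep such identifiers locally unique is sketched
   below.  The choice of random 32-bit identifiers and the
   ConnectionTable class with its methods are assumptions for
   illustration only; the architecture leaves the form of the identifier
   open.

```python
import secrets
from typing import Dict, List, Optional, Tuple

Subflow4Tuple = Tuple[str, int, str, int]  # (src addr, src port, dst addr, dst port)


class ConnectionTable:
    """Hypothetical per-host table of MPTCP connections and their subflows."""

    def __init__(self, id_bits: int = 32) -> None:
        self.id_bits = id_bits
        self.conns: Dict[int, List[Subflow4Tuple]] = {}

    def new_connection(self) -> int:
        """Allocate an identifier not currently used on this host."""
        while True:
            cid = secrets.randbits(self.id_bits)
            if cid not in self.conns:  # enforce local uniqueness
                self.conns[cid] = []
                return cid

    def add_subflow(self, cid: int, subflow: Subflow4Tuple) -> None:
        self.conns[cid].append(subflow)

    def lookup(self, subflow: Subflow4Tuple) -> Optional[int]:
        """Find the connection a subflow belongs to, if any."""
        for cid, subflows in self.conns.items():
            if subflow in subflows:
                return cid
        return None
```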

   Legacy applications will not, however, have access to this identifier
   and in such cases a MPTCP connection will be identified by the
   5-tuple of the first TCP subflow.  It is out of the scope of this
   document, however, to define the behaviour of the MPTCP
   implementation if the first TCP subflow later fails.  If there are
   MPTCP-unaware applications that make assumptions about continued
   existence of the initial address pair, their behaviour could be
   disrupted by carrying on regardless.  It is expected that this is a
   very small, possibly negligible, set of applications, however.  In
   the case of applications that have used an existing API call to bind
   to a specific address or interface, the MPTCP extension MUST NOT be
   used, since the applications are indicating a clear choice of path to
   use and thus will have expectations of behaviour that must be
   maintained, in order to adhere to the application compatibility
   goals.

   Since the requirements of applications are not clear at this stage,
   however, it is as yet unconfirmed what the best behaviour is.  It
   will be an implementation-specific solution, however, and as such the
   behaviour is expected to be chosen by implementors once more research
   has been undertaken to determine its impact.

5.7.  Network Layer Compatibility

   MPTCP's modifications remain at the transport layer, although some
   knowledge of the underlying network layer is required.  MPTCP MUST
   work with IPv4 and IPv6 interchangeably, i.e. one MPTCP connection
   may operate over both IPv4 and IPv6 networks.

5.8.  Congestion Control

   As discussed in the network-layer compatibility requirements in
   Section 2.2.3, there are three goals for the congestion control
   algorithms used by an MPTCP implementation: improve throughput (at
   least as well as a single-path TCP connection would perform); do no
   harm to other network users (do not take up more capacity on any one
   path than if it was a single path flow using only that route, which
   is particularly relevant for shared bottlenecks); and balance
   congestion by moving traffic away from the most congested paths.  To
   achieve these goals, the congestion control algorithms in use on each
   subflow must be coupled in some way.  A proposal for a suitable
   congestion control algorithm is given in [5].
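   The coupling can be illustrated with the "linked increases" rule as
   we understand it from [5]: each subflow's window grows by
   min(alpha/cwnd_total, 1/cwnd_i) per acknowledged packet, where alpha
   = cwnd_total * max_i(cwnd_i/rtt_i^2) / (sum_i cwnd_i/rtt_i)^2, and a
   loss halves only the affected subflow's window.  The Python sketch
   below works in packets rather than bytes, omits slow start, and uses
   illustrative names; it is not a conforming implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubflowCC:
    cwnd: float  # congestion window, in packets
    rtt: float   # smoothed round-trip time, in seconds


def lia_alpha(flows: List[SubflowCC]) -> float:
    """Aggressiveness factor coupling the subflows."""
    total = sum(f.cwnd for f in flows)
    best = max(f.cwnd / f.rtt ** 2 for f in flows)
    denom = sum(f.cwnd / f.rtt for f in flows) ** 2
    return total * best / denom


def on_ack(flows: List[SubflowCC], i: int) -> None:
    """Congestion-avoidance increase for one acknowledged packet on subflow i."""
    total = sum(f.cwnd for f in flows)
    # Never grow faster than an uncoupled TCP flow would on this subflow.
    flows[i].cwnd += min(lia_alpha(flows) / total, 1.0 / flows[i].cwnd)


def on_loss(flows: List[SubflowCC], i: int) -> None:
    """Loss on subflow i: halve only that subflow's window, as regular TCP does."""
    flows[i].cwnd = max(flows[i].cwnd / 2.0, 1.0)
```

   With two identical subflows, alpha is 1/2 and the pair's combined
   window grows by about half a packet per RTT, half as fast as a single
   TCP flow with the same total window.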

5.9.  Security

   A detailed threat analysis for Multipath TCP is presented in a
   separate document [9].  This focuses on flooding attacks and
   hijacking attacks that can be launched against a Multipath TCP
   connection.

   The basic security goal of Multipath TCP, as introduced in
   Section 2.2.4, can be stated as: "provide a solution that is no worse
   than standard TCP".

   From the threat analysis, and with this goal in mind, three key
   security requirements can be identified.  A multi-addressed Multipath
   TCP SHOULD be able to:

   o  Provide a mechanism to confirm that the parties in a subflow
      handshake are the same as in the original connection setup (e.g.
      require use of a key exchanged in the initial handshake in the
      subflow handshake, to limit the scope for hijacking attacks).

   o  Provide verification that the peer can receive traffic at a new
      address before adding it (i.e. verify that the address belongs to
      the other host, to prevent flooding attacks).

   o  Provide replay protection, i.e. ensure that a request to add/
      remove a subflow is 'fresh'.
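   The first and third requirements can be illustrated with a keyed-MAC
   exchange.  The sketch below is hypothetical: it assumes each host
   sent a random key in the initial handshake and that each subflow
   handshake carries fresh nonces from both sides; HMAC-SHA256, the
   8-byte keys, the 4-byte nonces and the function names are arbitrary
   choices, not taken from [4].  Because the MAC covers the nonces of
   this particular handshake, a recorded MAC fails verification when
   replayed against new nonces.  It protects only against attackers who
   did not observe the initial handshake, in line with the "no worse
   than standard TCP" goal.

```python
import hashlib
import hmac
import secrets


def join_mac(key_sender: bytes, key_receiver: bytes,
             nonce_sender: bytes, nonce_receiver: bytes) -> bytes:
    """MAC proving the sender knows both connection keys, bound to these nonces."""
    return hmac.new(key_sender + key_receiver, nonce_sender + nonce_receiver,
                    hashlib.sha256).digest()


def verify_join(key_sender: bytes, key_receiver: bytes,
                nonce_sender: bytes, nonce_receiver: bytes, mac: bytes) -> bool:
    """Receiver recomputes the MAC and compares in constant time."""
    expected = join_mac(key_sender, key_receiver, nonce_sender, nonce_receiver)
    return hmac.compare_digest(expected, mac)


# Hypothetical keys exchanged during the initial (first subflow) handshake.
key_a, key_b = secrets.token_bytes(8), secrets.token_bytes(8)

# A later subflow handshake: both sides contribute a fresh nonce.
nonce_a, nonce_b = secrets.token_bytes(4), secrets.token_bytes(4)
mac_a = join_mac(key_a, key_b, nonce_a, nonce_b)  # sent by A
print(verify_join(key_a, key_b, nonce_a, nonce_b, mac_a))  # True
```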

   Additional mechanisms have been deployed as part of standard TCP
   stacks to provide resistance to Denial-of-Service attacks.  For
   example, there are various mechanisms to protect against TCP reset
   attacks [15], and Multipath TCP should continue to support similar
   protection.  In addition, TCP SYN Cookies [16] were developed to
   allow a TCP server to defer the creation of session state in the
   SYN_RCVD state, and remain stateless until the ESTABLISHED state has
   been reached.  Multipath TCP should, ideally, continue to provide
   such functionality and, at a minimum, avoid significant computational
   burden prior to reaching the ESTABLISHED state (of the Multipath TCP
   connection as a whole).

   It should be noted that aspects of the Multipath TCP design space
   place constraints on the security solution:

   o  The use of TCP options significantly limits the amount of
      information that can be carried in the handshake.

   o  The need to work through middleboxes results in the need to handle
      mutability of packets.

   o  The desire to support a 'break-before-make' approach to adding
      subflows removes the ability to actively use a pre-existing
      subflow to support the addition of a new one.

   The MPTCP protocol design [4] aims to meet these security
   requirements, and the protocol specification will document how these
   are met.

6.  Interactions with Applications

   Interactions with applications, including, but not limited to,
   performance changes that may be expected, semantic changes, and new
   features that may be requested of an API, are presented in [6].

7.  Interactions with Middleboxes

   As discussed in Section 2.2, it is a goal of MPTCP to be deployable
   today and thus compatible with the majority of middleboxes.  This
   section summarises the issues that may arise with NATs, firewalls,
   proxies, intrusion detection systems, and other middleboxes that, if
   not considered in the protocol design, may hinder its deployment.

   This section is intended primarily as a description of options and
   considerations only.  Protocol-specific solutions to these issues
   will be given in the companion documents.

   Multipath TCP will be deployed in a network that no longer provides
   just basic datagram delivery.  A myriad of middleboxes are deployed
   to optimize various perceived problems with the Internet protocols:
   NATs primarily address space shortage [11], Performance Enhancing
   Proxies (PEPs) optimize TCP for different link characteristics [13],
   firewalls [12] and intrusion detection systems try to block malicious
   content from reaching a host, and traffic normalizers [17] ensure a
   consistent view of the traffic stream to IDSes and hosts.

   All these middleboxes optimize current applications at the expense of
   future applications.  In effect, future applications will often need
   to behave in a similar fashion to existing ones, in order to increase
   the chances of successful deployment.  Further, the precise behaviour
   of all these middleboxes is not clearly specified, and implementation
   errors make matters worse, raising the bar for the deployment of new
   technologies.

   The following list of middlebox classes documents behaviour that
   could impact the use of MPTCP.  This list is used in [4] to describe
   the features of the MPTCP protocol that are used to mitigate the
   impact of these middlebox behaviours.

   o  NATs: Network Address Translators decouple the host's local IP
      address from that which is seen in the wider Internet when the
      packets are transmitted through a NAT.  This adds complexity, and
      reduces the chances of success, when signalling IP addresses.

   o  PEPs: Performance Enhancing Proxies, which aim to improve the
      performance of protocols over low-performance (e.g. high latency
      or high error rate) links.  As such, they may "split" a TCP
      connection and behaviour such as proactive ACKing may occur.  As
      with NATs, it is no longer guaranteed that one host is
      communicating directly with another.

   o  Traffic Normalizers: These aim to eliminate ambiguities and
      potential attacks at the network level, and amongst other things
      are unlikely to permit holes in TCP-level sequence space.

   o  Firewalls: on top of preventing incoming connections, firewalls
      may also attempt additional protection such as sequence number
      randomization.

   o  Intrusion Detection Systems: IDSs may look for traffic patterns to
      protect a network, and may have false positives with MPTCP and
      drop the connections during normal operation.  Future MPTCP-aware
      middleboxes will require the ability to correlate the various
      paths in use.

   In addition, all classes of middleboxes may affect TCP traffic in the
   following ways:

   o  TCP Options: many middleboxes are in a position to drop packets
      with unknown TCP options, or strip those options from the packets.

   o  Segmentation/Coalescing: middleboxes (or even something as close to
      the end host as TCP Segmentation Offloading) may change the packet
      boundaries from those which the sender intended.  They may do this
      by splitting packets, or coalescing them together.  This leads to
      two major impacts: we cannot guarantee where a packet boundary
      will be, and we cannot say for sure what a middlebox will do with
      TCP options in these cases (they may be repeated, dropped, or sent
      only once).
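   The effect of coalescing on the Data Sequence Mapping (Section 5.1)
   can be illustrated with a small, hypothetical Python model.  Two
   segments on one subflow carry bytes that belong at data sequence
   numbers 100 and 200 (the gap being carried on another subflow); a
   middlebox merges them and keeps only the first segment's option.
   With a data-sequence-only option the receiver silently misplaces the
   second half, whereas with explicit (data seq, subflow seq, length)
   mappings the unmapped bytes are at least detectable.  The names
   Mapping, naive_map and map_bytes are illustrative.

```python
from typing import List, NamedTuple, Optional


class Mapping(NamedTuple):
    data_seq: int
    subflow_seq: int
    length: int


def naive_map(option_dsn: int, payload: bytes) -> List[int]:
    """Data-seq-only option: assume the whole segment is contiguous from option_dsn."""
    return [option_dsn + k for k in range(len(payload))]


def map_bytes(subflow_seq: int, payload: bytes,
              mappings: List[Mapping]) -> List[Optional[int]]:
    """Data sequence number of each received byte, or None where no mapping covers it."""
    out: List[Optional[int]] = []
    for k in range(len(payload)):
        s = subflow_seq + k
        dsn = None
        for m in mappings:
            if m.subflow_seq <= s < m.subflow_seq + m.length:
                dsn = m.data_seq + (s - m.subflow_seq)
                break
        out.append(dsn)
    return out


# Segments "abcd" (subflow seq 0, data seq 100) and "efgh" (subflow seq 4,
# data seq 200) coalesced by a middlebox into a single segment.
coalesced = b"abcdefgh"
print(naive_map(100, coalesced)[4])                        # 104: wrong, silently
print(map_bytes(0, coalesced, [Mapping(100, 0, 4)])[4])    # None: gap detected
```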

8.  Contributors

   The authors would like to acknowledge the contributions of Sebastien
   Barre, Andrew McDonald, and Bryan Ford to this document.

   The authors would also like to thank the following people for
   detailed reviews: Olivier Bonaventure, Gorry Fairhurst, Iljitsch van
   Beijnum, and Philip Eardley.

9.  Acknowledgements

   Alan Ford, Costin Raiciu and Mark Handley are supported by Trilogy
   (http://www.trilogy-project.org), a research project (ICT-216372)
   partially funded by the European Community under its Seventh
   Framework Program.  The views expressed here are those of the
   author(s) only.  The European Commission is not liable for any use
   that may be made of the information in this document.

10.  IANA Considerations

   None.

11.  Security Considerations

   This informational document provides an architectural overview for
   Multipath TCP and so does not, in itself, raise any security issues.
   A separate threat analysis [9] lists threats that can exist with a
   Multipath TCP.  However, a protocol based on the architecture in this
   document will have a number of security requirements.  The high level
   goals for such a protocol are identified in Section 2.2.4, whilst the
   security discussion in Section 5 provides more detail on the security
   requirements and design decisions which are applied in the MPTCP
   protocol design [4].

12.  References

12.1.  Normative References

   [1]   Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
         September 1981.

   [2]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
         Levels", BCP 14, RFC 2119, March 1997.

12.2.  Informative References

   [3]   Wischik, D., Handley, M., and M. Bagnulo Braun, "The Resource
         Pooling Principle", ACM SIGCOMM CCR vol. 38 num. 5, pp. 47-52,
         October 2008,
         <http://ccr.sigcomm.org/online/files/p47-handleyA4.pdf>.

   [4]   Ford, A., Raiciu, C., and M. Handley, "TCP Extensions for
         Multipath Operation with Multiple Addresses",
         draft-ietf-mptcp-multiaddressed-01 (work in progress),
         July 2010.

   [5]   Raiciu, C., Handley, M., and D. Wischik, "Coupled Multipath-
         Aware Congestion Control", draft-ietf-mptcp-congestion-00 (work
         in progress), July 2010.

   [6]   Scharf, M. and A. Ford, "MPTCP Application Interface
         Considerations", draft-scharf-mptcp-api-02 (work in progress),
         July 2010.

   [7]   Carpenter, B. and S. Brim, "Middleboxes: Taxonomy and Issues",
         RFC 3234, February 2002.

   [8]   Carpenter, B., "Internet Transparency", RFC 2775,
         February 2000.

   [9]   Bagnulo, M., "Threat Analysis for Multi-addressed/Multi-path
         TCP", draft-ietf-mptcp-threat-02 (work in progress),
         March 2010.

   [10]  Ford, B. and J. Iyengar, "Breaking Up the Transport Logjam",
         ACM HotNets, October 2008.

   [11]  Srisuresh, P. and K. Egevang, "Traditional IP Network Address
         Translator (Traditional NAT)", RFC 3022, January 2001.

   [12]  Freed, N., "Behavior of and Requirements for Internet
         Firewalls", RFC 2979, October 2000.

   [13]  Border, J., Kojo, M., Griner, J., Montenegro, G., and Z.
         Shelby, "Performance Enhancing Proxies Intended to Mitigate
         Link-Related Degradations", RFC 3135, June 2001.

   [17]  Handley, M., Paxson, V., and C. Kreibich, "Network Intrusion
         Detection: Evasion, Traffic Normalization, and End-to-End
         Protocol Semantics", Usenix Security 2001, 2001, <http://
         www.usenix.org/events/sec01/full_papers/handley/handley.pdf>.

Appendix A.  Implementation Architecture

   This section provides suggestions for an architecture to implement an
   extensible, modular multipath transport protocol.

A.1.  Functional Separation

   This section describes a generic view of the internal implementation
   of a Multipath TCP, through which the technical components specified
   in the companion documents can fit together.  It shows how an
   implementation could be built that permits extensibility between
   components without changing the external representation.

   We first show the functional decomposition of an MPTCP solution that
   is completely contained in the transport layer.  That solution is
   described in more detail in [4].  Then we generalize the approach to
   allow good extensibility of that solution.

A.1.1.  Application to default MPTCP protocol

   Although, in the default approach, MPTCP is fully contained in the
   transport layer, it can still be divided into two main modules.  One
   manages the scheduling of packets as well as congestion control.  The
   other one manages the control of paths.  The interface between the
   two is handled through a Path Index.  As shown in Figure 8, the
   Path Manager announces to the MultiPath Scheduler what paths can be
   used through path indices, and maintains the mapping between that
   value and the particular action that it must apply to use the path
   (an example of such a mapping is in Table 1).  In the case of the
   built-in Path Manager, the action is to replace an address/port pair
   with another one, in such a way that another path is used across the
   Internet to forward that packet.

            Control plane    <--     |     -->    Data plane
   +---------------------------------------------------------------+
   |                     Multipath Scheduler (MPS)                 |
   +---------------------------------------------------------------+
                ^                    |          |
                |                    |   [A1,B1,|pA1,pB1]
                |For conn_id         |          |
                |<A1,B1,pA1,pB1>     |   +-------------+
                |Paths 1->4 can be   |   | Data packet |<--Path idx:3
                |used.               |   +-------------+   attached
                |                    |          |          by MPS
                |                    |          V
   +--------------------------------------------\------------------+
   |                         Path Manager (PM)   \[A1,B1]->[A1,B2] |
   +--------------------------------------------------\------------+
      /                           \  |                 \
     /-----------------------------\ |   /"\    /"\    /"\   /"\
     | rewriting table:             ||   | |    | |    | |   | |
     | Subflow id  <-->  network_id ||   | |    | |    | |   | |
     |                              ||   | |    | |    | |   | |
     |    [see table below]         ||   | |    | |    | |   | |
     |                              ||   \./    \./    \./   \./
     +------------------------------+|  path1  path2  path3 path4

      Figure 8: Functional separation of MPTCP in the transport layer
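   As a non-normative illustration of this split, the Python sketch
   below models the built-in Path Manager's side of Figure 8.  The class
   and method names (BuiltinPathManager, add_path, rewrite) are
   hypothetical, and the "action" is reduced to returning the address/
   port quadruple that would be written into the packet; actual header
   rewriting, checksum updates and NAT interaction are left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkId:
    """An <addr_A, addr_B, port_A, port_B> quadruple, as in Table 1."""
    src_addr: str
    dst_addr: str
    src_port: str
    dst_port: str


class BuiltinPathManager:
    """Hypothetical built-in PM whose per-path action is header rewriting."""

    def __init__(self) -> None:
        # path index -> identifiers actually written into outgoing packets
        self._actions: dict[int, NetworkId] = {}

    def add_path(self, path_index: int, network_id: NetworkId) -> None:
        self._actions[path_index] = network_id

    def usable_paths(self) -> list[int]:
        """The path indices announced to the Multipath Scheduler."""
        return sorted(self._actions)

    def rewrite(self, path_index: int) -> NetworkId:
        """Return the addresses/ports for a packet tagged with path_index.

        Raises KeyError if the MPS uses an index the PM never announced.
        """
        return self._actions[path_index]
```

   With path 3 registered as <A1,B2,pA1,pB2>, a packet the MPS tags
   with index 3 leaves with B2 as its destination, which matches the
   [A1,B1]->[A1,B2] rewrite shown in Figure 8.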

   The MultiPath Scheduler only deals with abstract paths, represented
   by numbers.  It only sees one address pair throughout the
   communication, which we call the connection identifier.  However, the
   MultiPath Scheduler must be able to perform per-subflow congestion
   control, and thus to distinguish between the subflows.  This leads us
   to define a subflow identifier, which consists of the usual transport
   identifier extended with the path index:
   <addr_src,psrc,addr_dst,pdst,path_index>.  The following options,
   described in [3], are managed by the MultiPath Scheduler.

   o  MULTIPATH CAPABLE (MPC): Tell the peer that we support MPTCP.
      Note that the MPC option also holds a token, which is necessary
      only if the built-in Path Manager is used.  In the next section we
      describe the generalized case, where the token can be ignored by
      the receiver if another path manager is used.

   o  DATA SEQUENCE NUMBER (DSN): Identify the position of a set of
      bytes in the meta-flow.

   o  DATA FIN (DFIN): Terminate a meta-flow.

   An implementation MUST use these options even if a Path Manager other
   than the default one is implemented.
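   The Python sketch below shows a receiver that keys per-subflow state
   by the subflow identifier defined above and uses the DSN alone to
   rebuild the meta-flow.  SubflowId and MetaFlowReceiver are
   hypothetical names.  The byte counter stands in for real per-subflow
   congestion control state.  Subflow sequence numbers, DATA FIN and
   acknowledgements are omitted, and segments are assumed not to overlap
   partially.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import NamedTuple


class SubflowId(NamedTuple):
    """<addr_src, psrc, addr_dst, pdst, path_index>."""
    addr_src: str
    psrc: int
    addr_dst: str
    pdst: int
    path_index: int


@dataclass
class MetaFlowReceiver:
    """Reassembles the meta-flow from segments labelled with a DSN."""
    next_dsn: int = 0                       # next meta-flow byte expected
    pending: dict[int, bytes] = field(default_factory=dict)
    # Per-subflow state keyed by subflow identifier (here: bytes received).
    per_subflow: dict[SubflowId, int] = field(default_factory=dict)

    def receive(self, subflow: SubflowId, dsn: int, data: bytes) -> bytes:
        """Buffer one segment; return any newly in-order meta-flow bytes."""
        self.per_subflow[subflow] = self.per_subflow.get(subflow, 0) + len(data)
        if dsn >= self.next_dsn:            # ignore already-delivered data
            self.pending[dsn] = data
        delivered = bytearray()
        while self.next_dsn in self.pending:
            chunk = self.pending.pop(self.next_dsn)
            delivered += chunk
            self.next_dsn += len(chunk)
        return bytes(delivered)
```

   Because ordering depends only on the DSN, data sent on path 2 can
   arrive before data sent on path 1 and still be delivered to the
   application in the correct order.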

   The Path Manager applies a particular technology to give the MPS the
   ability to use several paths.  The built-in MPTCP Path Manager uses
   multiple IPv4 addresses as its means of influencing the forwarding of
   packets through the Internet.

   When the MPS starts a new connection, the PM chooses a token that
   will be used to identify the connection.  This is necessary to allow
   the PM to apply the correct path index to incoming packets.  An
   example mapping table is given below:

      +-----------------+---------------+---------+-----------------+
      |  connection id  |   subflow id  |  token  |    Network id   |
      +-----------------+---------------+---------+-----------------+
      | <A1,B1,pA1,pB1> | <conn_id,pi1> | token_1 | <A1,B1,pA1,pB1> |
      | <A1,B1,pA1,pB1> | <conn_id,pi2> | token_1 | <A2,B2,pA1,pB2> |
      | <A1,B1,pA1,pB1> | <conn_id,pi3> | token_1 | <A1,B2,pA1,pB2> |
      | <A1,B1,pA1,pB1> | <conn_id,pi4> | token_1 | <A2,B1,pA1,pB1> |
      | <A1,B1,pA1,pB3> | <conn_id,pi1> | token_2 | <A1,B1,pA1,pB3> |
      | <A1,B1,pA1,pB3> | <conn_id,pi2> | token_2 | <A2,B1,pA1,pB3> |
      +-----------------+---------------+---------+-----------------+

              Table 1: Example mapping table for built-in PM
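   To make the role of the table concrete, here is a hypothetical Python
   transcription of Table 1.  It includes the two lookups the built-in
   PM needs: classifying an incoming packet by its network identifier,
   and finding a connection by token.  The function names are
   illustrative only.

```python
# Table 1 as (connection id, path index, token, network id) rows.
# Symbolic names (A1, pB2, ...) are kept exactly as in the table.
TABLE_1 = [
    (("A1", "B1", "pA1", "pB1"), 1, "token_1", ("A1", "B1", "pA1", "pB1")),
    (("A1", "B1", "pA1", "pB1"), 2, "token_1", ("A2", "B2", "pA1", "pB2")),
    (("A1", "B1", "pA1", "pB1"), 3, "token_1", ("A1", "B2", "pA1", "pB2")),
    (("A1", "B1", "pA1", "pB1"), 4, "token_1", ("A2", "B1", "pA1", "pB1")),
    (("A1", "B1", "pA1", "pB3"), 1, "token_2", ("A1", "B1", "pA1", "pB3")),
    (("A1", "B1", "pA1", "pB3"), 2, "token_2", ("A2", "B1", "pA1", "pB3")),
]


def build_indexes(rows):
    """Return (network id -> (conn id, path index), token -> conn id)."""
    by_network = {}
    by_token = {}
    for conn_id, path_index, token, network_id in rows:
        # Each network id must identify exactly one subflow.
        if network_id in by_network:
            raise ValueError(f"network id {network_id} mapped twice")
        by_network[network_id] = (conn_id, path_index)
        by_token.setdefault(token, conn_id)  # first connection seen per token
    return by_network, by_token


def classify_incoming(by_network, src, dst, sport, dport):
    """Map an incoming packet's header to (connection id, path index).

    The packet comes from the peer, so its header is the reverse of the
    network id stored for outgoing packets.  Returns None if unknown.
    """
    return by_network.get((dst, src, dport, sport))
```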

   Table 1 shows an example where two connections are ongoing.  One is
   identified by token_1, the other by token_2.  Since addresses are
   rewritten by the Path Manager, attaching a subflow to the right
   connection is achieved by means of the token, which is exchanged at
   connection establishment and subflow establishment and then
   remembered.  The first column holds the information that is exposed
   to the applications, while the last column shows the information that
   is actually written in packets that travel through the network.  Note
   that, in addition to the addresses, ports can be rewritten, which
   contributes to NAT support.  The table also shows the role of the
   token, which is to attach various combinations of ports and addresses
   to a single connection.  The token is specific to the built-in Path
   Manager, and can be ignored if another Path Manager is used.  An
   implementation of the built-in Path Manager MUST implement the
   following options (defined in more detail in [3]):

   o  Add Address (ADDR): Announce a new address we own

   o  Remove Address (REMADDR): Withdraw a previously announced address

   o  Join Connection (JOIN): Attach a new subflow to the current
      connection
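   The following Python sketch shows one way a receiver might maintain
   per-connection state in response to these options.  It makes several
   assumptions: option encoding is abstracted away, ports are ignored,
   and OptionHandler and its methods are hypothetical.  In particular,
   no authentication of ADDR, REMADDR or JOIN is performed, which a real
   implementation must not omit (see the threat analysis in [14]).

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    token: str
    peer_addrs: set = field(default_factory=set)
    # Subflows as (local addr, peer addr) pairs; ports omitted for brevity.
    subflows: set = field(default_factory=set)


class OptionHandler:
    """Hypothetical handler for the ADDR, REMADDR and JOIN options."""

    def __init__(self):
        self.by_token = {}

    def new_connection(self, token, local, peer):
        conn = Connection(token, {peer}, {(local, peer)})
        self.by_token[token] = conn
        return conn

    def on_add_addr(self, token, addr):
        # ADDR: remember an additional address announced by the peer.
        self.by_token[token].peer_addrs.add(addr)

    def on_remove_addr(self, token, addr):
        # REMADDR: forget the address and drop subflows that use it.
        conn = self.by_token[token]
        conn.peer_addrs.discard(addr)
        conn.subflows = {sf for sf in conn.subflows if sf[1] != addr}

    def on_join(self, token, local, peer):
        """JOIN: attach a new subflow; return False for an unknown token."""
        conn = self.by_token.get(token)
        if conn is None:
            return False
        conn.subflows.add((local, peer))
        return True
```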

   These options form the default MPTCP Path Manager, which is based on
   declaring IP addresses and carries control information in TCP
   options.  An implementation of Multipath TCP can use any Path
   Manager, but it MUST be able to fall back to the default PM in case
   the other end does not support the custom PM.  Alternative Path
   Managers may be specified in separate documents in the future.
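   This document does not specify how support for an alternative Path
   Manager is signalled.  If we assume each side can learn the set of
   PMs the other supports, the fallback rule reduces to the Python
   sketch below.  select_path_manager and the PM names are hypothetical.

```python
DEFAULT_PM = "default"  # every MPTCP implementation supports this one


def select_path_manager(local_preferences, peer_supported):
    """Pick the first locally preferred PM that the peer also supports.

    local_preferences: PM names in order of local preference.
    peer_supported: set of PM names the peer is assumed to support.
    Falls back to the default PM when no custom PM is shared.
    """
    for pm in local_preferences:
        if pm in peer_supported:
            return pm
    return DEFAULT_PM
```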

A.1.2.  Generic architecture for MPTCP

   Now that the functional decomposition has been shown for MPTCP with
   the built-in Path Manager, we show how that architecture can be
   generalized to allow the implementation of other Path Managers for
   MPTCP.  A general overview of the architecture is provided in
   Figure 9.  The Multipath Scheduler (MPS) learns about the number of
   available paths through notifications received from the Path Manager
   (PM).  From the point of view of the Multipath Scheduler, a path is
   just a number, called a Path Index.  Notifications from the PM to the
   MPS MAY contain supporting information about the paths, if relevant,
   so that the MPS can make more intelligent decisions about where to
   route traffic.  When the Multipath Scheduler initiates a
   communication to a new host, it can only send the packets to the
   default path.  However, since the Path Manager is layered below the
   MPS, it can detect that a new communication is happening, and tell
   the MPS about the other paths it knows about.
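   The notification flow just described can be sketched in Python under
   the following assumptions.  A hypothetical discover() callback stands
   in for whatever path discovery the PM performs.  The MPS uses plain
   round-robin over the announced path indices, a placeholder policy
   chosen only for brevity.  ToyScheduler and ToyPathManager are
   illustrative names.

```python
from itertools import cycle


class ToyScheduler:
    """Toy MPS: round-robin over the path indices announced by the PM."""

    def __init__(self):
        self._paths = {}   # host pair -> list of path indices
        self._next = {}    # host pair -> round-robin iterator

    def notify_paths(self, host_pair, indices, info=None):
        # 'info' stands for the optional supporting information the PM
        # MAY supply; this toy policy ignores it.
        self._paths[host_pair] = list(indices)
        self._next[host_pair] = cycle(self._paths[host_pair])

    def pick_path(self, host_pair):
        """Return a path index, or None: send untagged on the default path."""
        it = self._next.get(host_pair)
        return next(it) if it is not None else None


class ToyPathManager:
    def __init__(self, mps, discover):
        self.mps = mps
        self.discover = discover   # hypothetical: host pair -> path indices
        self._seen = set()

    def on_untagged_packet(self, host_pair):
        """First untagged packet to a new host pair triggers discovery."""
        if host_pair in self._seen:
            return
        self._seen.add(host_pair)
        indices = self.discover(host_pair)
        if indices:
            self.mps.notify_paths(host_pair, indices)
```

   Until the PM has announced paths, pick_path() returns None and the
   MPS sends untagged packets on the default path.  This matches the
   statement that the MPS can only use the default path when it first
   contacts a new host.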

            Control plane    <--     |     -->    Data plane
   +---------------------------------------------------------------+
   |                     Multipath Scheduler (MPS)                 |
   +---------------------------------------------------------------+
                ^                    |          |
                |                    |   [A1,B1,|pA1,pB1]
                |                    |          |
                |Announcing new      |   +-------------+
                |paths. (referred    |   | Data packet |<--Path idx:3
                |to as path indices) |   +-------------+   attached
                |                    |          |          by MPS
                |                    |          V
   +--------------------------------------------\------------------+
   |                         Path Manager (PM)   \__________zzzzz  |
   +--------------------------------------------------------\------+
      /                         \    |                       \
     /---------------------------\   |   /"\       /"\       /"\
     | subflow_id        Action  |   |   | |       | |       | |
     |<A1,B1,pA1,pB1,1>  xxxxx   |   |   | |       | |       | |
     |<A1,B1,pA1,pB1,2>  yyyyy   |   |   \./       \./       \./
     |<A1,B1,pA1,pB1,3>  zzzzz   |   |  path1     path2     path3
     +---------------------------+

                 Figure 9: Overview of MPTCP architecture

   From then on, it is possible for the MPS to associate a Path Index
   with its packets, so that the Path Manager can map this Path Index to
   a particular action (see table in the lower left part of Figure 9).
   The particular action depends on the network mechanism used to select
   a path.  Examples are address rewriting, tunnelling or setting a path
   selector value inside the packet.  Note that the Path Index is not
   supposed to be written inside the packet, but instead associated with
   it, internally to the implementation.
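   The following Python sketch illustrates these points under stated
   assumptions.  Actions are modelled as callables, with address
   rewriting and tunnelling as two example mechanisms.  The Path Index
   lives in a local metadata field that the PM clears before
   transmission.  The sketch also applies the same index to the matching
   incoming path, as required in Appendix A.2.  All names (Packet,
   PathManagerSketch, rewrite_to, tunnel_via) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Packet:
    src: str
    dst: str
    payload: bytes
    # Local metadata only; never serialised onto the wire.
    path_index: Optional[int] = None
    # Outer (tunnel) header, set only by a tunnelling action.
    outer: Optional[tuple[str, str]] = None


Action = Callable[[Packet], None]


def rewrite_to(src: str, dst: str) -> Action:
    """Action: address rewriting, as done by the built-in PM."""
    def action(p: Packet) -> None:
        p.src, p.dst = src, dst
    return action


def tunnel_via(entry: str, exit_: str) -> Action:
    """Action: encapsulate towards a hypothetical tunnel endpoint."""
    def action(p: Packet) -> None:
        p.outer = (entry, exit_)
    return action


class PathManagerSketch:
    def __init__(self) -> None:
        self._actions: dict[int, Action] = {}
        self._incoming: dict[tuple[str, str], int] = {}

    def register(self, index: int, action: Action,
                 incoming_key: tuple[str, str]) -> None:
        """Bind a path index to an outgoing action and an incoming path."""
        if incoming_key in self._incoming:
            raise ValueError("incoming path already mapped to a path index")
        self._actions[index] = action
        self._incoming[incoming_key] = index

    def on_output(self, p: Packet) -> Packet:
        if p.path_index is not None:       # tagged by the MPS
            self._actions[p.path_index](p)
            p.path_index = None            # the tag is local: strip it
        return p                           # untagged: sent unchanged

    def on_input(self, p: Packet, incoming_key: tuple[str, str]) -> Packet:
        # Same index as the matching outgoing path (None if unknown).
        p.path_index = self._incoming.get(incoming_key)
        return p
```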

   The applicability of the architecture is not limited to the MPTCP
   protocol.  While we define in this document an MPTCP MPS (MPTCP
   Multipath Scheduler), other Multipath Schedulers can be defined.  For
   example, if an appropriate socket interface is designed, applications
   could behave as a Multipath Scheduler and decide where to send any
   particular data.  In this document we concentrate on the MPTCP case,
   however.

A.2.  PM/MPS interface

   The minimal set of requirements for a Path Manager is as follows:

   o  Outgoing untagged packets: Any outgoing packet flowing through the
      Path Manager is either tagged by the MPS with a path index or
      untagged.  If it is untagged, the packet is sent normally to the
      Internet, as if no multipath support were present.  Untagged
      packets can be used to trigger a path discovery procedure; that
      is, a Path Manager can listen to untagged packets and decide at
      some point to check whether any path other than the default one is
      usable for the corresponding host pair.  Note that any other
      criterion could be used to decide when to start discovering
      available paths.  Note also that MPS scheduling will not be
      possible until the Path Manager has notified the MPS of the
      available paths.  The PM is thus the first entity to come into
      action.

   o  Outgoing tagged packets: The Path Manager maintains a table
      mapping path indices to actions.  The action is the operation that
      allows a particular path to be used.  Examples of possible actions
      are route selection, interface selection or packet transformation.
      When the PM sees a packet tagged with a path index, it looks up
      its table to find the appropriate action for that packet.  The tag
      is purely local: it is removed before the packet is transmitted.

   o  Incoming packets: A Path Manager MUST ensure that each incoming
      path is mapped unambiguously to exactly one outgoing path.  Note
      that this requirement implies that the same number of incoming and
      outgoing paths must be established.  Moreover, a PM MUST tag any
      incoming path with the same Path Index as the one used for the
      corresponding outgoing path.  This is necessary for MPTCP to know
      which outgoing path is acknowledged by an incoming packet.

   o  Module interface: A PM MUST be able to notify the MPS about the
      number of available paths.  Such notifications MUST contain the
      path indices that are legal for use by the MPS.  If the PM decides
      to stop providing service for a path, it MUST notify the MPS about
      the path's removal.  Additionally, a PM MAY provide complementary
      path information when available, such as link quality or
      preference level.

Appendix B.  Changelog

   (For removal by the RFC Editor)

B.1.  Changes since draft-ietf-mptcp-architecture-01

   o  Responded to review comments.

   o  Added security sections.

B.2.  Changes since draft-ietf-mptcp-architecture-00

   o  Added middlebox compatibility discussion (Section 7).

   o  Clarified path identification (TCP 4-tuple) in Section 5.5.

   o  Added brief scenario and diagram to Section 1.3.

Authors' Addresses

   Alan Ford (editor)
   Roke Manor Research
   Old Salisbury Lane
   Romsey, Hampshire  SO51 0ZN
   UK

   Phone: +44 1794 833 465
   Email: alan.ford@roke.co.uk

   Costin Raiciu
   University College London
   Gower Street
   London  WC1E 6BT
   UK

   Email: c.raiciu@cs.ucl.ac.uk

   Sebastien Barre
   Universite catholique de Louvain
   Pl. Ste Barbe, 2
   Louvain-la-Neuve  1348
   Belgium

   Phone: +32 10 47 91 03
   Email: sebastien.barre@uclouvain.be

   Mark Handley
   University College London
   Gower Street
   London  WC1E 6BT
   UK

   Email: m.handley@cs.ucl.ac.uk

   Janardhan Iyengar
   Franklin and Marshall College
   Mathematics and Computer Science
   PO Box 3003
   Lancaster, PA  17604-3003
   USA

   Phone: 717-358-4774
   Email: jiyengar@fandm.edu