--- 1/draft-ietf-taps-impl-03.txt 2019-07-08 11:14:45.449144761 -0700 +++ 2/draft-ietf-taps-impl-04.txt 2019-07-08 11:14:45.561147614 -0700 @@ -1,31 +1,31 @@ TAPS Working Group A. Brunstrom, Ed. Internet-Draft Karlstad University Intended status: Informational T. Pauly, Ed. -Expires: September 12, 2019 Apple Inc. +Expires: January 9, 2020 Apple Inc. T. Enghardt TU Berlin K-J. Grinnemo Karlstad University T. Jones University of Aberdeen P. Tiesel TU Berlin C. Perkins University of Glasgow M. Welzl University of Oslo - March 11, 2019 + July 08, 2019 Implementing Interfaces to Transport Services - draft-ietf-taps-impl-03 + draft-ietf-taps-impl-04 Abstract The Transport Services architecture [I-D.ietf-taps-arch] defines a system that allows applications to use transport networking protocols flexibly. This document serves as a guide to implementation on how to build such a system. Status of This Memo @@ -35,21 +35,21 @@ Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on September 12, 2019. + This Internet-Draft will expire on January 9, 2020. Copyright Notice Copyright (c) 2019 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents @@ -61,67 +61,69 @@ Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 
. 3 2. Implementing Basic Objects . . . . . . . . . . . . . . . . . 3 3. Implementing Pre-Establishment . . . . . . . . . . . . . . . 4 3.1. Configuration-time errors . . . . . . . . . . . . . . . . 5 3.2. Role of system policy . . . . . . . . . . . . . . . . . . 5 4. Implementing Connection Establishment . . . . . . . . . . . . 6 4.1. Candidate Gathering . . . . . . . . . . . . . . . . . . . 7 - 4.1.1. Structuring Options as a Tree . . . . . . . . . . . . 7 - 4.1.2. Branch Types . . . . . . . . . . . . . . . . . . . . 9 - 4.2. Branching Order-of-Operations . . . . . . . . . . . . . . 11 - 4.3. Sorting Branches . . . . . . . . . . . . . . . . . . . . 12 - 4.4. Candidate Racing . . . . . . . . . . . . . . . . . . . . 13 - 4.4.1. Delayed . . . . . . . . . . . . . . . . . . . . . . . 14 - 4.4.2. Failover . . . . . . . . . . . . . . . . . . . . . . 15 - 4.5. Completing Establishment . . . . . . . . . . . . . . . . 15 - 4.5.1. Determining Successful Establishment . . . . . . . . 16 - 4.6. Establishing multiplexed connections . . . . . . . . . . 17 - 4.7. Handling racing with "unconnected" protocols . . . . . . 17 - 4.8. Implementing listeners . . . . . . . . . . . . . . . . . 18 - 4.8.1. Implementing listeners for Connected Protocols . . . 18 - 4.8.2. Implementing listeners for Unconnected Protocols . . 18 - 4.8.3. Implementing listeners for Multiplexed Protocols . . 18 - 5. Implementing Data Transfer . . . . . . . . . . . . . . . . . 19 - 5.1. Data transfer for streams, datagrams, and frames . . . . 19 - 5.1.1. Sending Messages . . . . . . . . . . . . . . . . . . 19 - 5.1.2. Receiving Messages . . . . . . . . . . . . . . . . . 21 - 5.2. Handling of data for fast-open protocols . . . . . . . . 22 - 6. Implementing Maintenance . . . . . . . . . . . . . . . . . . 23 - 6.1. Managing Connections . . . . . . . . . . . . . . . . . . 23 - 6.2. Handling Path Changes . . . . . . . . . . . . . . . . . . 24 - 7. Implementing Termination . . . . . . . . . . . . . . . . . . 24 - 8. 
Cached State . . . . . . . . . . . . . . . . . . . . . . . . 25 - 8.1. Protocol state caches . . . . . . . . . . . . . . . . . . 26 - 8.2. Performance caches . . . . . . . . . . . . . . . . . . . 26 - 9. Specific Transport Protocol Considerations . . . . . . . . . 27 - 9.1. TCP . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 - 9.2. UDP . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 - 9.3. SCTP . . . . . . . . . . . . . . . . . . . . . . . . . . 28 - 9.4. TLS . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 - 9.5. HTTP . . . . . . . . . . . . . . . . . . . . . . . . . . 29 - 9.6. QUIC . . . . . . . . . . . . . . . . . . . . . . . . . . 29 - 9.7. HTTP/2 transport . . . . . . . . . . . . . . . . . . . . 30 - 10. Rendezvous and Environment Discovery . . . . . . . . . . . . 30 - 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 32 - 12. Security Considerations . . . . . . . . . . . . . . . . . . . 32 - 12.1. Considerations for Candidate Gathering . . . . . . . . . 32 - 12.2. Considerations for Candidate Racing . . . . . . . . . . 32 - 13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 33 - 14. References . . . . . . . . . . . . . . . . . . . . . . . . . 33 - 14.1. Normative References . . . . . . . . . . . . . . . . . . 33 - 14.2. Informative References . . . . . . . . . . . . . . . . . 34 - Appendix A. Additional Properties . . . . . . . . . . . . . . . 35 - A.1. Properties Affecting Sorting of Branches . . . . . . . . 35 - Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 35 + 4.1.1. Gathering Endpoint Candidates . . . . . . . . . . . . 7 + 4.1.2. Structuring Options as a Tree . . . . . . . . . . . . 9 + 4.1.3. Branch Types . . . . . . . . . . . . . . . . . . . . 10 + 4.2. Branching Order-of-Operations . . . . . . . . . . . . . . 13 + 4.3. Sorting Branches . . . . . . . . . . . . . . . . . . . . 14 + 4.4. Candidate Racing . . . . . . . . . . . . . . . . . . . . 15 + 4.4.1. Delayed . . . . . . . . 
. . . . . . . . . . . . . . . 16 + 4.4.2. Failover . . . . . . . . . . . . . . . . . . . . . . 16 + 4.5. Completing Establishment . . . . . . . . . . . . . . . . 17 + 4.5.1. Determining Successful Establishment . . . . . . . . 17 + 4.6. Establishing multiplexed connections . . . . . . . . . . 18 + 4.7. Handling racing with "unconnected" protocols . . . . . . 19 + 4.8. Implementing listeners . . . . . . . . . . . . . . . . . 19 + 4.8.1. Implementing listeners for Connected Protocols . . . 20 + 4.8.2. Implementing listeners for Unconnected Protocols . . 20 + 4.8.3. Implementing listeners for Multiplexed Protocols . . 20 + 5. Implementing Data Transfer . . . . . . . . . . . . . . . . . 20 + 5.1. Data transfer for streams, datagrams, and frames . . . . 20 + 5.1.1. Sending Messages . . . . . . . . . . . . . . . . . . 21 + 5.1.2. Receiving Messages . . . . . . . . . . . . . . . . . 23 + 5.2. Handling of data for fast-open protocols . . . . . . . . 23 + 6. Implementing Maintenance . . . . . . . . . . . . . . . . . . 24 + 6.1. Managing Connections . . . . . . . . . . . . . . . . . . 24 + 6.2. Handling Path Changes . . . . . . . . . . . . . . . . . . 26 + + 7. Implementing Termination . . . . . . . . . . . . . . . . . . 26 + 8. Cached State . . . . . . . . . . . . . . . . . . . . . . . . 27 + 8.1. Protocol state caches . . . . . . . . . . . . . . . . . . 27 + 8.2. Performance caches . . . . . . . . . . . . . . . . . . . 28 + 9. Specific Transport Protocol Considerations . . . . . . . . . 29 + 9.1. TCP . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 + 9.2. UDP . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 + 9.3. TLS . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 + 9.4. DTLS . . . . . . . . . . . . . . . . . . . . . . . . . . 34 + 9.5. HTTP . . . . . . . . . . . . . . . . . . . . . . . . . . 34 + 9.6. QUIC . . . . . . . . . . . . . . . . . . . . . . . . . . 35 + 9.7. HTTP/2 transport . . . . . . . . . . . . . . . . . . . . 36 + 9.8. SCTP . . . . 
. . . . . . . . . . . . . . . . . . . . . . 36 + 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 37 + 11. Security Considerations . . . . . . . . . . . . . . . . . . . 37 + 11.1. Considerations for Candidate Gathering . . . . . . . . . 37 + 11.2. Considerations for Candidate Racing . . . . . . . . . . 37 + 12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 38 + 13. References . . . . . . . . . . . . . . . . . . . . . . . . . 38 + 13.1. Normative References . . . . . . . . . . . . . . . . . . 38 + 13.2. Informative References . . . . . . . . . . . . . . . . . 39 + Appendix A. Additional Properties . . . . . . . . . . . . . . . 40 + A.1. Properties Affecting Sorting of Branches . . . . . . . . 40 + Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 40 1. Introduction The Transport Services architecture [I-D.ietf-taps-arch] defines a system that allows applications to use transport networking protocols flexibly. The interface such a system exposes to applications is defined as the Transport Services API [I-D.ietf-taps-interface]. This API is designed to be generic across multiple transport protocols and sets of protocol features. @@ -268,21 +270,22 @@ calling Initiate. (At this point, any constraints or requirements the application may have on the connection are available from pre-establishment.) The process can be considered complete once there is at least one Protocol Stack that has completed any required setup to the point that it can transmit and receive the application's data. Connection establishment is divided into two top-level steps: Candidate Gathering, to identify the paths, protocols, and endpoints to use, and Candidate Racing, in which the necessary protocol handshakes are conducted so that the transport system can select - which set to use. + which set to use. This document structures candidates for racing as + a tree. 
The simplest example of this process might involve identifying the single IP address to which the implementation wishes to connect, using the system's current default interface or path, and starting a TCP handshake to establish a stream to the specified IP address. However, each step may also vary depending on the requirements of the connection: if the endpoint is defined as a hostname and port, then there may be multiple resolved addresses that are available; there may also be multiple interfaces or paths available, other than the default system interface; and some protocols may not need any @@ -312,21 +315,77 @@ section is the algorithm defining which of these options to try, when, and in what order. 4.1. Candidate Gathering The step of gathering candidates involves identifying which paths, protocols, and endpoints may be used for a given Connection. This list is determined by the requirements, prohibitions, and preferences of the application as specified in the Selection Properties. -4.1.1. Structuring Options as a Tree +4.1.1. Gathering Endpoint Candidates + + Both Local and Remote Endpoint Candidates must be discovered during + connection establishment. To support ICE, or similar protocols, that + involve out-of-band indirect signalling to exchange candidates with + the Remote Endpoint, it's important to be able to query the set of + candidate Local Endpoints, and give the protocol stack a set of + candidate Remote Endpoints, before it attempts to establish + connections. + +4.1.1.1. Local Endpoint candidates + + The set of possible Local Endpoints is gathered. In the simple case, + this merely enumerates the local interfaces and protocols and allocates + ephemeral source ports. For example, a system that has WiFi and + Ethernet and supports IPv4 and IPv6 might gather four candidate + locals (IPv4 on Ethernet, IPv6 on Ethernet, IPv4 on WiFi, and IPv6 on + WiFi) that can form the source for a transient. 
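As a rough illustration of the simple case described above, the following Python sketch enumerates one candidate per interface and IP version pair. The LocalCandidate type and the gather_local_candidates function are hypothetical names introduced only for this example; they are not part of the Transport Services API, and ephemeral port allocation and NAT traversal are deliberately left out.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class LocalCandidate:
    """Hypothetical record for one candidate Local Endpoint."""
    interface: str   # e.g. "Ethernet" or "WiFi"
    ip_version: str  # "IPv4" or "IPv6"


def gather_local_candidates(interfaces, ip_versions):
    """Return one candidate per (interface, IP version) pair.

    Ephemeral port allocation and any STUN/TURN exchanges are
    intentionally omitted from this sketch.
    """
    return [LocalCandidate(i, v) for i, v in product(interfaces, ip_versions)]


candidates = gather_local_candidates(["Ethernet", "WiFi"], ["IPv4", "IPv6"])
for c in candidates:
    print(f"{c.ip_version} on {c.interface}")
```

With two interfaces and two IP versions this yields the four candidate locals named in the text. A real implementation would presumably also filter the list by the application's Selection Properties and system policy.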
+ + If NAT traversal is required, the process of gathering Local + Endpoints becomes broadly equivalent to the ICE candidate gathering + phase [RFC5245]. The endpoint determines its server reflexive Local + Endpoints (i.e., the translated address of a local, on the other side + of a NAT) and relayed locals (e.g., via a TURN server or other + relay), for each interface and network protocol. These are added to + the set of candidate Local Endpoints for this connection. + + Gathering Local Endpoints is primarily a local operation, although it + might involve exchanges with a STUN server to derive server reflexive + locals, or with a TURN server or other relay to derive relayed + locals. It does not involve communication with the Remote Endpoint. + +4.1.1.2. Remote Endpoint Candidates + + The Remote Endpoint is typically a name that needs to be resolved + into a set of possible addresses that can be used for communication. + Resolving the Remote Endpoint is the process of recursively + performing such name lookups, until fully resolved, to return the set + of candidates for the remote of this connection. + + How this is done will depend on the type of the Remote Endpoint, and + can also be specific to each Local Endpoint. A common case is when + the Remote Endpoint is a DNS name, in which case it is resolved to + give a set of IPv4 and IPv6 addresses representing that name. Some + types of remote might require more complex resolution. Resolving the + Remote Endpoint for a peer-to-peer connection might involve + communication with a rendezvous server, which in turn contacts the + peer to gain consent to communicate and retrieve its set of candidate + locals, which are returned and form the candidate remote addresses + for contacting that peer. + + Resolving the remote is not a local operation. It will involve a + directory service, and can require communication with the remote to + rendezvous and exchange peer addresses. 
This can expose some or all + of the candidate locals to the remote. + +4.1.2. Structuring Options as a Tree When an implementation responsible for connection establishment needs to consider multiple options, it should logically structure these options as a hierarchical tree. Each leaf node of the tree represents a single, coherent connection attempt, with an Endpoint, a Path, and a set of protocols that can directly negotiate and send data on the network. Each node in the tree that is not a leaf represents a connection attempt that is either underspecified, or else includes multiple distinct options. For example, when connecting on an IP network, a connection attempt to a hostname and @@ -386,27 +445,27 @@ a single interface with a single protocol. 1 [192.0.2.1:80, Wi-Fi, TCP] A parent node may also only have one child (or leaf) node, such as when a hostname resolves to only a single IP address. 1 [www.example.com:80, Wi-Fi, TCP] 1.1 [192.0.2.1:80, Wi-Fi, TCP] -4.1.2. Branch Types +4.1.3. Branch Types There are three types of branching from a parent node into one or more child nodes. Any parent node of the tree must only use one type of branching. -4.1.2.1. Derived Endpoints +4.1.3.1. Derived Endpoints If a connection originally targets a single endpoint, there may be multiple endpoints of different types that can be derived from the original. The connection library should order the derived endpoints according to application preference, system policy and expected performance. DNS hostname-to-address resolution is the most common method of endpoint derivation. When trying to connect to a hostname endpoint on a traditional IP network, the implementation should send DNS @@ -419,29 +478,28 @@ 1.1 [2001:DB8::1.80, Wi-Fi, TCP] 1.2 [192.0.2.1:80, Wi-Fi, TCP] 1.3 [2001:DB8::2.80, Wi-Fi, TCP] 1.4 [2001:DB8::3.80, Wi-Fi, TCP] DNS-Based Service Discovery can also provide an endpoint derivation step. 
When trying to connect to a named service, the client may discover one or more hostname and port pairs on the local network using multicast DNS. These hostnames should each be treated as a branch which can be attempted independently from other hostnames. - Each of these hostnames may also resolve to one or more addresses, thus creating multiple layers of branching. 1 [term-printer._ipp._tcp.meeting.ietf.org, Wi-Fi, TCP] 1.1 [term-printer.meeting.ietf.org:631, Wi-Fi, TCP] 1.1.1 [31.133.160.18.631, Wi-Fi, TCP] -4.1.2.2. Alternate Paths +4.1.3.2. Alternate Paths If a client has multiple network interfaces available to it, such as a mobile client with both Wi-Fi and Cellular connectivity, it can attempt a connection over either interface. This represents a branch point in the connection establishment. As with derived endpoints, the interfaces should be ranked based on preference, system policy, and performance. Attempts should be started on one interface, and then on other interfaces successively after delays based on expected round-trip-time or other available metrics. @@ -451,21 +509,21 @@ This same approach applies to any situation in which the client is aware of multiple links or views of the network. Multiple Paths, each with a coherent set of addresses, routes, DNS server, and more, may share a single interface. A path may also represent a virtual interface service such as a Virtual Private Network (VPN). The list of available paths should be constrained by any requirements or prohibitions the application sets, as well as system policy. -4.1.2.3. Protocol Options +4.1.3.3. Protocol Options Differences in possible protocol compositions and options can also provide a branching point in connection establishment. This allows clients to be resilient to situations in which a certain protocol is not functioning on a server or network. This approach is commonly used for connections with optional proxy server configurations. 
A single connection may be allowed to use an HTTP-based proxy, a SOCKS-based proxy, or connect directly. These options should be ranked and attempted in succession. @@ -1260,310 +1321,465 @@ given protocol stack, can be stored for a long period of time (hours or longer), since it is expected that the capabilities of the Remote Endpoint are not changing very quickly. On the other hand, Round Trip Time observed by TCP over a particular network path may vary over a relatively short time interval. For such values, the implementation should remove them from the cache more quickly, or treat older values with less confidence/weight. 9. Specific Transport Protocol Considerations + Each protocol that can run as part of a Transport Services + implementation defines both its API mapping as well as implementation + details. + + API mappings for a protocol apply most to Connections in which the + given protocol is the "top" of the Protocol Stack. For example, the + mapping of the "Send" function for TCP applies to Connections in + which the application directly sends over TCP. If HTTP/2 is used on + top of TCP, the HTTP/2 mappings take precedence. + + Each protocol has a notion of Connectedness. Possible values for + Connectedness are: + + o Unconnected. Unconnected protocols do not establish explicit + state between endpoints, and do not perform a handshake during + Connection establishment. + + o Connected. Connected protocols establish state between endpoints, + and perform a handshake during Connection establishment. The + handshake may be 0-RTT to send data or resume a session, but + bidirectional traffic is required to confirm connectedness. + + o Multiplexing Connected. Multiplexing Connected protocols share + properties with Connected protocols, but also explicitly support + opening multiple application-level flows. This means that they + can support cloning new Connection objects without a new explicit + handshake. + + Protocols also define a notion of Data Unit. 
Possible values for + Data Unit are: + + o Byte-stream. Byte-stream protocols do not define any Message + boundaries of their own apart from the end of a stream in each + direction. + + o Datagram. Datagram protocols define Message boundaries at the + same level of transmission, such that only complete (not partial) + Messages are supported. + + o Message. Message protocols support Message boundaries that can be + sent and received either as complete or partial Messages. Maximum + Message lengths can be defined, and Messages can be partially + reliable. + 9.1. TCP - Connection lifetime for TCP translates fairly simply into the the - abstraction presented to an application. When the TCP three-way - handshake is complete, its layer of the Protocol Stack can be - considered Ready (established). This event will cause racing of - Protocol Stack options to complete if TCP is the top-level protocol, - at which point the application can be notified that the Connection is - Ready to send and receive. + Connectedness: Connected - If the application sends a Close, that can translate to a graceful - termination of the TCP connection, which is performed by sending a - FIN to the remote endpoint. If the application sends an Abort, then - the TCP state can be closed abruptly, leading to a RST being sent to - the peer. + Data Unit: Byte-stream - Without a layer of framing (a top-level protocol in the established - Protocol Stack that preserves message boundaries, or an application- - supplied deframer) on top of TCP, the receiver side of the transport - system implementation can only treat the incoming stream of bytes as - a single Message, terminated by a FIN when the Remote Endpoint closes - the Connection. + API mappings for TCP are as follows: + + Connection Object: TCP connections between two hosts map directly to + Connection objects. + + Initiate: Calling "Initiate" on a TCP Connection causes it to + reserve a local port, and send a SYN to the Remote Endpoint. 
+ + InitiateWithSend: Early idempotent data is sent on a TCP Connection + in the SYN, as TCP Fast Open data. + + Ready: A TCP Connection is ready once the three-way handshake is + complete. + + InitiateError: TCP can throw various errors during connection setup. + Specifically, it is important to handle a RST being sent by the + peer during the handshake. + + ConnectionError: Once established, TCP throws errors whenever the + connection is disconnected, such as upon receiving a RST from the + peer or hitting a TCP retransmission timeout. + + Listen: Calling "Listen" for TCP binds a local port and prepares it + to receive inbound SYN packets from peers. + + ConnectionReceived: TCP Listeners will deliver new connections once + they have replied to an inbound SYN with a SYN-ACK. + + Clone: Calling "Clone" on a TCP Connection creates a new Connection + with equivalent parameters. The two Connections are otherwise + independent. + + Send: TCP does not on its own preserve Message boundaries. Calling + "Send" on a TCP connection lays out the bytes on the TCP send + stream without any other delineation. Any Message marked as Final + will cause TCP to send a FIN once the Message has been completely + written. + + Receive: TCP delivers a stream of bytes without any Message + delineation. All data delivered in the "Received" or + "ReceivedPartial" event will be part of a single stream-wide + Message that is marked Final (unless a MessageFramer is used). + EndOfMessage will be delivered when the TCP Connection has + received a FIN from the peer. + + Close: Calling "Close" on a TCP Connection indicates that the + Connection should be gracefully closed by sending a FIN to the + peer and waiting for a FIN-ACK before delivering the "Closed" + event. + + Abort: Calling "Abort" on a TCP Connection indicates that the + Connection should be immediately closed by sending a RST to the + peer. 9.2. 
UDP - UDP as a direct transport does not provide any handshake or - connectivity state, so the notion of the transport protocol becoming - Ready or established is degenerate. Once the system has validated - that there is a route on which to send and receive UDP datagrams, the - protocol is considered Ready. Similarly, a Close or Abort has no - meaning to the on-the-wire protocol, but simply leads to the local - state being torn down. + Connectedness: Unconnected - When sending and receiving messages over UDP, each Message should - correspond to a single UDP datagram. The Message can contain - metadata about the packet, such as the ECN bits applied to the - packet. + Data Unit: Datagram -9.3. SCTP + API mappings for UDP are as follows: - To support sender-side stream schedulers (which are implemented on - the sender side), a receiver-side Transport System should always - support message interleaving [RFC8260]. + Connection Object: UDP connections represent a pair of specific IP + addresses and ports on two hosts. - SCTP messages can be very large. To allow the reception of large - messages in pieces, a "partial flag" can be used to inform a (native - SCTP) receiving application that a message is incomplete. After - receiving the "partial flag", this application would know that the - next receive calls will only deliver remaining parts of the same - message (i.e., no messages or partial messages will arrive on other - streams until the message is complete) (see Section 8.1.20 in - [RFC6458]). The "partial flag" can therefore facilitate the - implementation of the receiver buffer in the receiving application, - at the cost of limiting multiplexing and temporarily creating head- - of-line blocking delay at the receiver. + Initiate: Calling "Initiate" on a UDP Connection causes it to + reserve a local port, but does not generate any traffic. 
- When a Transport System transfers a Message, it seems natural to map - the Message object to SCTP messages in order to support properties - such as "Ordered" or "Lifetime" (which maps onto partially reliable - delivery with a SCTP_PR_SCTP_TTL policy [RFC6458]). However, since - multiplexing of Connections onto SCTP streams may happen, and would - be hidden from the application, the Transport System requires a per- - stream receiver buffer anyway, so this potential benefit is lost and - the "partial flag" becomes unnecessary for the system. + InitiateWithSend: Early data on a UDP Connection does not have any + special meaning. The data is sent whenever the Connection is + Ready. - The problem of long messages either requiring large receiver-side - buffers or getting in the way of multiplexing is addressed by message - interleaving [RFC8260], which is yet another reason why a receivers- - side transport system supporting SCTP should implement this - mechanism. + Ready: A UDP Connection is ready once the system has reserved a + local port and has a path to send to the Remote Endpoint. -9.4. TLS + InitiateError: UDP Connections can only generate errors on + initiation due to port conflicts on the local system. + + ConnectionError: Once in use, UDP throws errors upon receiving ICMP + notifications indicating failures in the network. + + Listen: Calling "Listen" for UDP binds a local port and prepares it + to receive inbound UDP datagrams from peers. + + ConnectionReceived: UDP Listeners will deliver new connections once + they have received traffic from a new Remote Endpoint. + + Clone: Calling "Clone" on a UDP Connection creates a new Connection + with equivalent parameters. The two Connections are otherwise + independent. + + Send: Calling "Send" on a UDP connection sends the data as the + payload of a complete UDP datagram. Marking Messages as Final + does not change anything in the datagram's contents. 
+ + Receive: UDP only delivers complete Messages to "Received", each of + which represents a single datagram received in a UDP packet. + + Close: Calling "Close" on a UDP Connection releases the local port + reservation. + + Abort: Calling "Abort" on a UDP Connection is identical to calling + "Close". + +9.3. TLS The mapping of a TLS stream abstraction into the application is - equivalent to the contract provided by TCP (see Section 9.1). The - Ready state should be determined by the completion of the TLS - handshake, which involves potentially several more round trips beyond - the TCP handshake. The application should not be notified that the - Connection is Ready until TLS is established. + equivalent to the contract provided by TCP (see Section 9.1), and + builds upon many of the actions of TCP connections. + + Connectedness: Connected + + Data Unit: Byte-stream + + Connection Object: Connection objects represent a single TLS + connection running over a TCP connection between two hosts. + + Initiate: Calling "Initiate" on a TLS Connection causes it to first + initiate a TCP connection. Once the TCP protocol is Ready, the + TLS handshake will be performed as a client (starting by sending a + "client_hello", and so on). + + InitiateWithSend: Early idempotent data is supported by TLS 1.3, and + sends encrypted application data in the first TLS message when + performing session resumption. For older versions of TLS, or if a + session is not being resumed, the initial data will be delayed + until the TLS handshake is complete. TCP Fast Open can also be + enabled automatically. + + Ready: A TLS Connection is ready once the underlying TCP connection + is Ready, and TLS handshake is also complete and keys have been + established to encrypt application data. + + InitiateError: In addition to TCP initiation errors, TLS can + generate errors during its handshake. 
Examples of errors include a + failure of the peer to successfully authenticate, the peer + rejecting the local authentication, or a failure to match versions + or algorithms. + + ConnectionError: TLS connections will generate TCP errors, or errors + due to failures to rekey or decrypt received messages. + + Listen: Calling "Listen" for TLS listens on TCP, and sets up + received connections to perform server-side TLS handshakes. + + ConnectionReceived: TLS Listeners will deliver new connections once + they have successfully completed both TCP and TLS handshakes. + + Clone: As with TCP, calling "Clone" on a TLS Connection creates a + new Connection with equivalent parameters. The two Connections + are otherwise independent. + + Send: Like TCP, TLS does not preserve message boundaries. Although + application data is framed natively in TLS, there is not a general + guarantee that these TLS messages represent semantically + meaningful application stream boundaries. Rather, sending data on + a TLS Connection only guarantees that the application data will be + transmitted in an encrypted form. Marking Messages as Final + causes a "close_notify" to be generated once the data has been + written. + + Receive: Like TCP, TLS delivers a stream of bytes without any + Message delineation. The data is decrypted prior to being + delivered to the application. If a "close_notify" is received, + the stream-wide Message will be delivered with EndOfMessage set. + + Close: Calling "Close" on a TLS Connection indicates that the + Connection should be gracefully closed by sending a "close_notify" + to the peer and waiting for a corresponding "close_notify" before + delivering the "Closed" event. + + Abort: Calling "Abort" on a TLS Connection indicates that the + Connection should be immediately closed by sending a + "close_notify", optionally preceded by "user_canceled", to the + peer. Implementations do not need to wait to receive + "close_notify" before delivering the "Closed" event. 
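As an illustration only, the Connectedness and Data Unit values given for TCP, UDP, and TLS in the sections above could be collected into a small lookup table that an implementation consults when deciding how to treat a Protocol Stack. The Python names used here (Connectedness, DataUnit, PROTOCOLS, performs_handshake, preserves_message_boundaries) are assumptions made for this sketch and are not defined by this document or by the Transport Services API.

```python
from enum import Enum


class Connectedness(Enum):
    UNCONNECTED = "Unconnected"
    CONNECTED = "Connected"
    MULTIPLEXING_CONNECTED = "Multiplexing Connected"


class DataUnit(Enum):
    BYTE_STREAM = "Byte-stream"
    DATAGRAM = "Datagram"
    MESSAGE = "Message"


# Values copied from the per-protocol entries for TCP, UDP, and TLS.
PROTOCOLS = {
    "TCP": (Connectedness.CONNECTED, DataUnit.BYTE_STREAM),
    "UDP": (Connectedness.UNCONNECTED, DataUnit.DATAGRAM),
    "TLS": (Connectedness.CONNECTED, DataUnit.BYTE_STREAM),
}


def performs_handshake(protocol):
    """Unconnected protocols skip the handshake; all others perform one."""
    connectedness, _ = PROTOCOLS[protocol]
    return connectedness is not Connectedness.UNCONNECTED


def preserves_message_boundaries(protocol):
    """Byte-stream protocols define no Message boundaries of their own."""
    _, data_unit = PROTOCOLS[protocol]
    return data_unit is not DataUnit.BYTE_STREAM


for name in PROTOCOLS:
    print(name, performs_handshake(name), preserves_message_boundaries(name))
```

Keeping these two properties as data rather than as per-protocol code paths is one possible design; it lets generic logic, such as deciding when a Connection can be reported Ready, branch on the property instead of on the protocol name.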
+ +9.4. DTLS + + DTLS follows the same behavior as TLS (Section 9.3), with the notable + exception of not inheriting behavior directly from TCP. Differences + from TLS are detailed below, and all cases not explicitly mentioned + should be considered the same as TLS. + + Connectedness: Connected + + Data Unit: Datagram + + Connection Object: Connection objects represent a single DTLS + connection running over a set of UDP ports between two hosts. + + Initiate: Calling "Initiate" on a DTLS Connection causes it to reserve + a UDP local port, and begin sending handshake messages to the peer + over UDP. These messages are reliable, and will be automatically + retransmitted. + + Ready: A DTLS Connection is ready once the DTLS handshake is complete + and keys have been established to encrypt application data. + + Send: Sending over DTLS does preserve message boundaries in the same + way that UDP datagrams do. Marking a Message as Final does send a + "close_notify" like TLS. + + Receive: Receiving over DTLS delivers one decrypted Message for each + received DTLS datagram. If a "close_notify" is received, a + Message will be delivered that is marked as Final. 9.5. HTTP HTTP requests and responses map naturally into Messages, since they are delineated chunks of data with metadata that can be sent over a transport. To that end, HTTP can be seen as the most prevalent framing protocol that runs on top of streams like TCP, TLS, etc. In order to use a transport Connection that provides HTTP Message support, the establishment and closing of the connection can be treated as it would without the framing protocol. Sending and receiving of Messages, however, changes to treat each Message as a well-delineated HTTP request or response, with the content of the Message representing the body, and the Headers being provided in Message metadata. 
+   Connectedness: Multiplexing Connected
+
+   Data Unit: Message

+   Connection Object: Connection objects represent a flow of HTTP
+      messages between a client and a server, which may be an HTTP/1.1
+      connection over TCP, or a single stream in an HTTP/2 connection.
+
+   Initiate: Calling "Initiate" on an HTTP connection initiates a TCP or
+      TLS connection as a client.
+
+   Clone: Calling "Clone" on an HTTP Connection opens a new stream on
+      an existing HTTP/2 connection when possible. If the underlying
+      version does not support multiplexed streams, calling "Clone"
+      simply creates a new parallel connection.
+
+   Send: When an application sends an HTTP Message, it is expected to
+      provide HTTP header values as a MessageContext in a canonical
+      form, along with any associated HTTP message body as the Message
+      data. The HTTP header values are encoded in the specific version
+      format upon sending.
+
+   Receive: HTTP Connections deliver Messages in which HTTP header
+      values are attached to MessageContexts, and HTTP bodies in Message
+      data.
+
+   Close: Calling "Close" on an HTTP Connection will only close the
+      underlying TLS or TCP connection if the HTTP version does not
+      support multiplexing. For HTTP/2, for example, closing the
+      connection only closes a specific stream.
+

 9.6. QUIC

    QUIC provides a multi-streaming interface to an encrypted transport.
    Each stream can be viewed as equivalent to a TLS stream over TCP, so
    a natural mapping is to present each QUIC stream as an individual
    Connection. The protocol for the stream will be considered Ready
    whenever the underlying QUIC connection is established to the point
    that this stream's data can be sent. For streams after the first
    stream, this will likely be an immediate operation.

    Closing a single QUIC stream, presented to the application as a
    Connection, does not imply closing the underlying QUIC connection
    itself.
    Rather, the implementation may choose to close the QUIC
-   connection once all streams have been closed (possibly after some
+   connection once all streams have been closed (often after some
    timeout), or after an individual stream Connection sends an Abort.

-   Messages over a direct QUIC stream should be represented similarly to
-   the TCP stream (one Message per direction, see Section 9.1), unless a
-   framing mapping is used on top of QUIC.
+   Connectedness: Multiplexing Connected
+
+   Data Unit: Stream

+   Connection Object: Connection objects represent a single QUIC stream
+      on a QUIC connection.

 9.7. HTTP/2 transport

    Similar to QUIC (Section 9.6), HTTP/2 provides a multi-streaming
    interface. This will generally use HTTP as the unit of Messages over
    the streams, in which each stream can be represented as a transport
    Connection. The lifetime of streams and the HTTP/2 connection should
    be managed as described for QUIC.

    It is possible to treat each HTTP/2 stream as a raw byte-stream
    instead of a carrier for HTTP messages, in which case the Messages
    over the streams can be represented similarly to the TCP stream (one
    Message per direction, see Section 9.1).

-10. Rendezvous and Environment Discovery
-
-   The connection establishment process outlined in Section 4 is
-   appropriate for client-server connections, but needs to be expanded
-   in peer-to-peer Rendezvous scenarios, as follows:
-
-   o Gathering Local Endpoint candidates
-
-      The set of possible Local Endpoints is gathered. In the simple
-      case, this merely enumerates the local interfaces and protocols,
-      allocates ephemeral source ports. For example, a system that has
-      WiFi and Ethernet and supports IPv4 and IPv6 might gather four
-      candidate locals (IPv4 on Ethernet, IPv6 on Ethernet, IPv4 on
-      WiFi, and IPv6 on WiFi) that can form the source for a transient.
-
-      If NAT traversal is required, the process of gathering Local
-      Endpoints becomes broadly equivalent to the ICE candidate
-      gathering phase [RFC5245]. The endpoint determines its server
-      reflexive Local Endpoints (i.e., the translated address of a
-      local, on the other side of a NAT) and relayed locals (e.g., via a
-      TURN server or other relay), for each interface and network
-      protocol. These are added to the set of candidate Local Endpoints
-      for this connection.
-
-      Gathering Local Endpoints is primarily a local operation, although
-      it might involve exchanges with a STUN server to derive server
-      reflexive locals, or with a TURN server or other relay to derive
-      relayed locals. It does not involve communication with the Remote
-      Endpoint.
-
-   o Gathering Remote Endpoint Candidates
-
-      The Remote Endpoint is typically a name that needs to be resolved
-      into a set of possible addresses that can be used for
-      communication. Resolving the Remote Endpoint is the process of
-      recursively performing such name lookups, until fully resolved, to
-      return the set of candidates for the remote of this connection.
-
-      How this is done will depend on the type of the Remote Endpoint,
-      and can also be specific to each Local Endpoint. A common case is
-      when the Remote Endpoint is a DNS name, in which case it is
-      resolved to give a set of IPv4 and IPv6 addresses representing
-      that name. Some types of remote might require more complex
-      resolution. Resolving the Remote Endpoint for a peer-to-peer
-      connection might involve communication with a rendezvous server,
-      which in turn contacts the peer to gain consent to communicate and
-      retrieve its set of candidate locals, which are returned and form
-      the candidate remote addresses for contacting that peer.
-
-      Resolving the remote is _not_ a local operation. It will involve
-      a directory service, and can require communication with the remote
-      to rendezvous and exchange peer addresses. This can expose some
-      or all of the candidate locals to the remote.
+   Connectedness: Multiplexing Connected

-   o Establishing Connections
+   Data Unit: Stream

-      The set of candidate Local Endpoints and the set of candidate
-      Remote Endpoints are paired, to derive a priority ordered set of
-      Candidate Paths that can potentially be used to establish a
-      Connection.
+   Connection Object: Connection objects represent a single HTTP/2
+      stream on an HTTP/2 connection.

-      Then, communication is attempted over each candidate path, in
-      priority order. If there are multiple candidates with the same
-      priority, then connection establishment proceeds simultaneously
-      and uses the transient that wins the race to be established.
-      Otherwise, connection establishment is sequential, paced at a rate
-      that should not congest the network. Depending on the chosen
-      transport, this phase might involve racing TCP connections to a
-      server over IPv4 and IPv6 [RFC8305], or it could involve a STUN
-      exchange to establish peer-to-peer UDP connectivity [RFC5245], or
-      some other means.
+9.8. SCTP

-   o Confirming and Maintaining Connections
+   To support sender-side stream schedulers, a receiver-side Transport
+   System should always support message interleaving [RFC8260].

-      Once connectivity has been established, unused resources can be
-      released and the chosen path can be confirmed. This is primarily
-      required when establishing peer-to-peer connectivity, where
-      connections supporting relayed locals that were not required can
-      be closed, and where an associated signalling operation might be
-      needed to inform middleboxes and proxies of the chosen path.
-      Keep-alive messages may also be sent, as appropriate, to ensure
-      NAT and firewall state is maintained, so the Connection remains
-      operational.
+   SCTP messages can be very large. To allow the reception of large
+   messages in pieces, a "partial flag" can be used to inform a (native
+   SCTP) receiving application that a message is incomplete. After
+   receiving the "partial flag", this application would know that the
+   next receive calls will only deliver remaining parts of the same
+   message (i.e., no messages or partial messages will arrive on other
+   streams until the message is complete) (see Section 8.1.20 in
+   [RFC6458]). The "partial flag" can therefore facilitate the
+   implementation of the receiver buffer in the receiving application,
+   at the cost of limiting multiplexing and temporarily creating
+   head-of-line blocking delay at the receiver.

-   To support ICE, or similar protocols, that involve an out-of-band
-   indirect signalling exchange to exchange candidates with the Remote
-   Endpoint, it's important to be able to query the set of candidate
-   Local Endpoints, and give the protocol stack a set of candidate
-   Remote Endpoints, before it attempts to establish connections.
+   When a Transport System transfers a Message, it seems natural to map
+   the Message object to SCTP messages in order to support properties
+   such as "Ordered" or "Lifetime" (which maps onto partially reliable
+   delivery with an SCTP_PR_SCTP_TTL policy [RFC6458]). However, since
+   multiplexing of Connections onto SCTP streams may happen, and would
+   be hidden from the application, the Transport System requires a
+   per-stream receiver buffer anyway, so this potential benefit is lost
+   and the "partial flag" becomes unnecessary for the system.

-   (TO-DO: It is expected that a single abstract algorithm can be
-   identified that supports both the peer-to-peer and client-server
-   connection racing, allowing this text to be merged with Section 4)
+   The problem of long messages either requiring large receiver-side
+   buffers or getting in the way of multiplexing is addressed by message
+   interleaving [RFC8260], which is yet another reason why a
+   receiver-side Transport System supporting SCTP should implement this
+   mechanism.

-11. IANA Considerations
+10. IANA Considerations

    RFC-EDITOR: Please remove this section before publication.

    This document has no actions for IANA.

-12. Security Considerations
+11. Security Considerations

-12.1. Considerations for Candidate Gathering
+11.1. Considerations for Candidate Gathering

    Implementations should avoid downgrade attacks that allow network
    interference to cause the implementation to select less secure, or
    entirely insecure, combinations of paths and protocols.

-12.2. Considerations for Candidate Racing
+11.2. Considerations for Candidate Racing

    See Section 5.2 for security considerations around racing with 0-RTT
    data.

    An attacker that knows a particular device is racing several options
    during connection establishment may be able to block packets for the
    first connection attempt, thus inducing the device to fall back to a
    secondary attempt. This is a problem if the secondary attempts have
    worse security properties that enable further attacks.
    Implementations should ensure that all options have equivalent
    security properties to avoid incentivizing attacks.

    Since results from the network can determine how a connection attempt
    tree is built, such as when DNS returns a list of resolved endpoints,
    it is possible for the network to cause an implementation to consume
    significant on-device resources. Implementations should limit the
    maximum amount of state allowed for any given node, including the
    number of child nodes, especially when the state is based on results
    from the network.

-13. Acknowledgements
+12. Acknowledgements

    This work has received funding from the European Union's Horizon 2020
    research and innovation programme under grant agreement No. 644334
    (NEAT).

    This work has been supported by Leibniz Prize project funds of DFG -
    German Research Foundation: Gottfried Wilhelm Leibniz-Preis 2011 (FKZ
    FE 570/4-1).

    This work has been supported by the UK Engineering and Physical
    Sciences Research Council under grant EP/R04144X/1.

    Thanks to Stuart Cheshire, Josh Graessley, David Schinazi, and Eric
    Kinnear for their implementation and design efforts, including Happy
    Eyeballs, that heavily influenced this work.

-14. References
+13. References

-14.1. Normative References
+13.1. Normative References

    [I-D.ietf-taps-arch]
               Pauly, T., Trammell, B., Brunstrom, A., Fairhurst, G.,
               Perkins, C., Tiesel, P., and C. Wood, "An Architecture for
-              Transport Services", draft-ietf-taps-arch-02 (work in
-              progress), October 2018.
+              Transport Services", draft-ietf-taps-arch-03 (work in
+              progress), March 2019.

    [I-D.ietf-taps-interface]
               Trammell, B., Welzl, M., Enghardt, T., Fairhurst, G.,
               Kuehlewind, M., Perkins, C., Tiesel, P., and C. Wood, "An
               Abstract Application Layer Interface to Transport
-              Services", draft-ietf-taps-interface-02 (work in
-              progress), October 2018.
+              Services", draft-ietf-taps-interface-03 (work in
+              progress), March 2019.

    [I-D.ietf-taps-minset]
               Welzl, M. and S. Gjessing, "A Minimal Set of Transport
               Services for End Systems", draft-ietf-taps-minset-11 (work
               in progress), September 2018.

    [RFC6458]  Stewart, R., Tuexen, M., Poon, K., Lei, P., and V.
               Yasevich, "Sockets API Extensions for the Stream Control
               Transmission Protocol (SCTP)", RFC 6458,
               DOI 10.17487/RFC6458, December 2011,
@@ -1596,26 +1812,26 @@
    [RFC8305]  Schinazi, D. and T. Pauly, "Happy Eyeballs Version 2:
               Better Connectivity Using Concurrency", RFC 8305,
               DOI 10.17487/RFC8305, December 2017.

    [RFC8446]  Rescorla, E., "The Transport Layer Security (TLS) Protocol
               Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018.

-14.2. Informative References
+13.2. Informative References

    [I-D.ietf-quic-transport]
               Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
-              and Secure Transport", draft-ietf-quic-transport-18 (work
-              in progress), January 2019.
+              and Secure Transport", draft-ietf-quic-transport-20 (work
+              in progress), April 2019.

    [NEAT-flow-mapping]
               "Transparent Flow Mapping for NEAT (in Workshop on Future
               of Internet Transport (FIT 2017))", n.d.

    [RFC5245]  Rosenberg, J., "Interactive Connectivity Establishment
               (ICE): A Protocol for Network Address Translator (NAT)
               Traversal for Offer/Answer Protocols", RFC 5245,
               DOI 10.17487/RFC5245, April 2010.
@@ -1690,25 +1906,25 @@
    Tom Jones
    University of Aberdeen
    Fraser Noble Building
    Aberdeen, AB24 3UE
    UK

    Email: tom@erg.abdn.ac.uk

    Philipp S. Tiesel
    TU Berlin
-   Marchstrasse 23
+   Einsteinufer 25
    10587 Berlin
    Germany

-   Email: philipp@inet.tu-berlin.de
+   Email: philipp@tiesel.net

    Colin Perkins
    University of Glasgow
    School of Computing Science
    Glasgow G12 8QQ
    United Kingdom

    Email: csp@csperkins.org

    Michael Welzl
    University of Oslo