--- 1/draft-ietf-taps-impl-05.txt 2020-03-09 11:14:44.333871513 -0700 +++ 2/draft-ietf-taps-impl-06.txt 2020-03-09 11:14:44.445874358 -0700 @@ -1,31 +1,31 @@ TAPS Working Group A. Brunstrom, Ed. Internet-Draft Karlstad University Intended status: Informational T. Pauly, Ed. -Expires: May 7, 2020 Apple Inc. +Expires: 10 September 2020 Apple Inc. T. Enghardt TU Berlin K-J. Grinnemo Karlstad University T. Jones University of Aberdeen P. Tiesel TU Berlin C. Perkins University of Glasgow M. Welzl University of Oslo - November 04, 2019 + 9 March 2020 Implementing Interfaces to Transport Services - draft-ietf-taps-impl-05 + draft-ietf-taps-impl-06 Abstract The Transport Services architecture [I-D.ietf-taps-arch] defines a system that allows applications to use transport networking protocols flexibly. This document serves as a guide to implementation on how to build such a system. Status of This Memo @@ -35,102 +35,102 @@ Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on May 7, 2020. + This Internet-Draft will expire on 10 September 2020. Copyright Notice - Copyright (c) 2019 IETF Trust and the persons identified as the + Copyright (c) 2020 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal - Provisions Relating to IETF Documents - (https://trustee.ietf.org/license-info) in effect on the date of - publication of this document. 
Please review these documents - carefully, as they describe your rights and restrictions with respect - to this document. Code Components extracted from this document must - include Simplified BSD License text as described in Section 4.e of - the Trust Legal Provisions and are provided without warranty as - described in the Simplified BSD License. + Provisions Relating to IETF Documents (https://trustee.ietf.org/ + license-info) in effect on the date of publication of this document. + Please review these documents carefully, as they describe your rights + and restrictions with respect to this document. Code Components + extracted from this document must include Simplified BSD License text + as described in Section 4.e of the Trust Legal Provisions and are + provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Implementing Connection Objects . . . . . . . . . . . . . . . 4 - 3. Implementing Pre-Establishment . . . . . . . . . . . . . . . 4 + 3. Implementing Pre-Establishment . . . . . . . . . . . . . . . 5 3.1. Configuration-time errors . . . . . . . . . . . . . . . . 5 3.2. Role of system policy . . . . . . . . . . . . . . . . . . 6 - 4. Implementing Connection Establishment . . . . . . . . . . . . 6 + 4. Implementing Connection Establishment . . . . . . . . . . . . 7 4.1. Candidate Gathering . . . . . . . . . . . . . . . . . . . 8 4.1.1. Gathering Endpoint Candidates . . . . . . . . . . . . 8 4.1.2. Structuring Options as a Tree . . . . . . . . . . . . 9 4.1.3. Branch Types . . . . . . . . . . . . . . . . . . . . 11 4.2. Branching Order-of-Operations . . . . . . . . . . . . . . 13 4.3. Sorting Branches . . . . . . . . . . . . . . . . . . . . 14 - 4.4. Candidate Racing . . . . . . . . . . . . . . . . . . . . 15 + 4.4. Candidate Racing . . . . . . . . . . . . . . . . . . . . 16 4.4.1. Delayed . . . . . . . . . . . . . . . . . . . . . . . 16 4.4.2. Failover . . . . 
. . . . . . . . . . . . . . . . . . 17 4.5. Completing Establishment . . . . . . . . . . . . . . . . 17 4.5.1. Determining Successful Establishment . . . . . . . . 18 - 4.6. Establishing multiplexed connections . . . . . . . . . . 18 + 4.6. Establishing multiplexed connections . . . . . . . . . . 19 4.7. Handling racing with "unconnected" protocols . . . . . . 19 - 4.8. Implementing listeners . . . . . . . . . . . . . . . . . 19 + 4.8. Implementing listeners . . . . . . . . . . . . . . . . . 20 4.8.1. Implementing listeners for Connected Protocols . . . 20 - 4.8.2. Implementing listeners for Unconnected Protocols . . 20 - 4.8.3. Implementing listeners for Multiplexed Protocols . . 20 + 4.8.2. Implementing listeners for Unconnected Protocols . . 21 + 4.8.3. Implementing listeners for Multiplexed Protocols . . 21 5. Implementing Sending and Receiving Data . . . . . . . . . . . 21 - 5.1. Sending Messages . . . . . . . . . . . . . . . . . . . . 21 - 5.1.1. Message Properties . . . . . . . . . . . . . . . . . 21 + 5.1. Sending Messages . . . . . . . . . . . . . . . . . . . . 22 + 5.1.1. Message Properties . . . . . . . . . . . . . . . . . 22 5.1.2. Send Completion . . . . . . . . . . . . . . . . . . . 23 5.1.3. Batching Sends . . . . . . . . . . . . . . . . . . . 23 - 5.2. Receiving Messages . . . . . . . . . . . . . . . . . . . 23 + 5.2. Receiving Messages . . . . . . . . . . . . . . . . . . . 24 5.3. Handling of data for fast-open protocols . . . . . . . . 24 - 6. Implementing Message Framers . . . . . . . . . . . . . . . . 24 - 6.1. Defining Message Framers . . . . . . . . . . . . . . . . 25 - 6.2. Sender-side Message Framing . . . . . . . . . . . . . . . 26 - 6.3. Receiver-side Message Framing . . . . . . . . . . . . . . 26 - 7. Implementing Connection Management . . . . . . . . . . . . . 27 - 7.1. Pooled Connection . . . . . . . . . . . . . . . . . . . . 28 - 7.2. Handling Path Changes . . . . . . . . . . . . . . . . . . 28 - 8. 
Implementing Connection Termination . . . . . . . . . . . . . 29 - 9. Cached State . . . . . . . . . . . . . . . . . . . . . . . . 30 - 9.1. Protocol state caches . . . . . . . . . . . . . . . . . . 30 - 9.2. Performance caches . . . . . . . . . . . . . . . . . . . 31 - 10. Specific Transport Protocol Considerations . . . . . . . . . 32 - 10.1. TCP . . . . . . . . . . . . . . . . . . . . . . . . . . 33 - 10.2. UDP . . . . . . . . . . . . . . . . . . . . . . . . . . 34 - 10.3. TLS . . . . . . . . . . . . . . . . . . . . . . . . . . 35 - 10.4. DTLS . . . . . . . . . . . . . . . . . . . . . . . . . . 37 - 10.5. HTTP . . . . . . . . . . . . . . . . . . . . . . . . . . 37 - 10.6. QUIC . . . . . . . . . . . . . . . . . . . . . . . . . . 38 - 10.7. HTTP/2 transport . . . . . . . . . . . . . . . . . . . . 39 - 10.8. SCTP . . . . . . . . . . . . . . . . . . . . . . . . . . 39 - 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 42 - 12. Security Considerations . . . . . . . . . . . . . . . . . . . 42 - 12.1. Considerations for Candidate Gathering . . . . . . . . . 42 - 12.2. Considerations for Candidate Racing . . . . . . . . . . 42 - 13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 42 - 14. References . . . . . . . . . . . . . . . . . . . . . . . . . 43 - 14.1. Normative References . . . . . . . . . . . . . . . . . . 43 - 14.2. Informative References . . . . . . . . . . . . . . . . . 44 - Appendix A. Additional Properties . . . . . . . . . . . . . . . 45 - A.1. Properties Affecting Sorting of Branches . . . . . . . . 45 - Appendix B. Reasons for errors . . . . . . . . . . . . . . . . . 45 - Appendix C. Existing Implementations . . . . . . . . . . . . . . 46 - Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 47 + 6. Implementing Message Framers . . . . . . . . . . . . . . . . 25 + 6.1. Defining Message Framers . . . . . . . . . . . . . . . . 26 + 6.2. Sender-side Message Framing . . . . . . . . . . . . . . . 27 + 6.3. 
Receiver-side Message Framing . . . . . . . . . . . . . . 27 + 7. Implementing Connection Management . . . . . . . . . . . . . 28 + 7.1. Pooled Connection . . . . . . . . . . . . . . . . . . . . 29 + 7.2. Handling Path Changes . . . . . . . . . . . . . . . . . . 29 + 8. Implementing Connection Termination . . . . . . . . . . . . . 30 + 9. Cached State . . . . . . . . . . . . . . . . . . . . . . . . 31 + 9.1. Protocol state caches . . . . . . . . . . . . . . . . . . 31 + 9.2. Performance caches . . . . . . . . . . . . . . . . . . . 32 + 10. Specific Transport Protocol Considerations . . . . . . . . . 33 + 10.1. TCP . . . . . . . . . . . . . . . . . . . . . . . . . . 34 + 10.2. UDP . . . . . . . . . . . . . . . . . . . . . . . . . . 35 + 10.3. UDP Multicast Receive . . . . . . . . . . . . . . . . . 36 + 10.4. TLS . . . . . . . . . . . . . . . . . . . . . . . . . . 38 + 10.5. DTLS . . . . . . . . . . . . . . . . . . . . . . . . . . 39 + 10.6. HTTP . . . . . . . . . . . . . . . . . . . . . . . . . . 40 + 10.7. QUIC . . . . . . . . . . . . . . . . . . . . . . . . . . 41 + 10.8. HTTP/2 transport . . . . . . . . . . . . . . . . . . . . 41 + 10.9. SCTP . . . . . . . . . . . . . . . . . . . . . . . . . . 42 + 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 44 + 12. Security Considerations . . . . . . . . . . . . . . . . . . . 44 + 12.1. Considerations for Candidate Gathering . . . . . . . . . 44 + 12.2. Considerations for Candidate Racing . . . . . . . . . . 44 + 13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 45 + 14. References . . . . . . . . . . . . . . . . . . . . . . . . . 45 + 14.1. Normative References . . . . . . . . . . . . . . . . . . 45 + 14.2. Informative References . . . . . . . . . . . . . . . . . 46 + Appendix A. Additional Properties . . . . . . . . . . . . . . . 47 + A.1. Properties Affecting Sorting of Branches . . . . . . . . 47 + Appendix B. Reasons for errors . . . . . . . . . . . . . . . . . 47 + Appendix C. 
Existing Implementations . . . . . . . . . . . . . . 48 + Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 49 1. Introduction The Transport Services architecture [I-D.ietf-taps-arch] defines a system that allows applications to use transport networking protocols flexibly. The interface such a system exposes to applications is defined as the Transport Services API [I-D.ietf-taps-interface]. This API is designed to be generic across multiple transport protocols and sets of protocol features. @@ -140,27 +140,27 @@ an application into decisions on how to establish connections, and how to transfer data over those connections once established. The terminology used in this document is based on the Architecture [I-D.ietf-taps-arch]. 2. Implementing Connection Objects The connection objects that are exposed to applications for Transport Services are: - o the Preconnection, the bundle of properties that describes the + * the Preconnection, the bundle of properties that describes the application constraints on the transport; - o the Connection, the basic object that represents a flow of data in + * the Connection, the basic object that represents a flow of data in either direction between the Local and Remote Endpoints; - o and the Listener, a passive waiting object that delivers new + * and the Listener, a passive waiting object that delivers new Connections. Preconnection objects should be implemented as bundles of properties that an application can both read and write. Once a Preconnection has been used to create an outbound Connection or a Listener, the implementation should ensure that the copy of the properties held by the Connection or Listener is immutable. This may involve performing a deep-copy if the application is still able to modify properties on the original Preconnection object.
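The copy-on-use rule for Preconnection properties can be sketched as follows. This is a minimal Python illustration under stated assumptions, not an API defined by [I-D.ietf-taps-interface]: the classes `Preconnection` and `Connection`, the methods `set_property`, `get_property`, and `initiate`, and the helper `_freeze` are all hypothetical. The sketch deep-copies the property bundle when a Connection is created and then exposes the copy only through read-only containers, so later writes to the Preconnection cannot reach the Connection.

```python
import copy
from types import MappingProxyType
from typing import Any, Dict, Mapping


def _freeze(value: Any) -> Any:
    """Hypothetical helper: recursively replace mutable containers with read-only ones."""
    if isinstance(value, dict):
        return MappingProxyType({key: _freeze(item) for key, item in value.items()})
    if isinstance(value, (list, tuple)):
        return tuple(_freeze(item) for item in value)
    if isinstance(value, set):
        return frozenset(_freeze(item) for item in value)
    return value


class Connection:
    """Hypothetical Connection holding an immutable snapshot of its properties."""

    def __init__(self, properties: Mapping[str, Any]) -> None:
        self.properties = properties


class Preconnection:
    """Hypothetical Preconnection: a property bundle the application can read and write."""

    def __init__(self) -> None:
        self._properties: Dict[str, Any] = {}

    def set_property(self, name: str, value: Any) -> None:
        self._properties[name] = value

    def get_property(self, name: str) -> Any:
        return self._properties[name]

    def initiate(self) -> Connection:
        # Deep-copy first so the snapshot shares no objects with the
        # Preconnection, then freeze it so the Connection cannot change it either.
        snapshot = _freeze(copy.deepcopy(self._properties))
        return Connection(snapshot)


if __name__ == "__main__":
    pre = Preconnection()
    pre.set_property("interface-type", ["wifi"])
    conn = pre.initiate()
    # Modifying the original Preconnection after Initiate ...
    pre.get_property("interface-type").append("cellular")
    # ... leaves the Connection's copy unchanged.
    print(conn.properties["interface-type"])  # ('wifi',)
```

Freezing nested containers matters because a read-only top-level mapping alone would still let a caller mutate a nested list in place. Creating a Listener would take the same kind of snapshot.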
@@ -206,29 +206,29 @@ The transport system should have a list of supported protocols available, which each have transport features reflecting the capabilities of the protocol. Once an application specifies its Transport Parameters, the transport system should match the required and prohibited properties against the transport features of the available protocols. In the following cases, failure should be detected during pre- establishment: - o The application requested Protocol Properties that include + * The application requested Protocol Properties that include requirements or prohibitions that cannot be satisfied by any of the available protocols. For example, if an application requires "Configure Reliability per Message", but no such protocol is available on the host running the transport system, e.g., because SCTP is not supported by the operating system, this should result in an error. - o The application requested Protocol Properties that are in conflict + * The application requested Protocol Properties that are in conflict with each other, i.e., the required and prohibited properties cannot be satisfied by the same protocol. For example, if an application prohibits "Reliable Data Transfer" but then requires "Configure Reliability per Message", this mismatch should result in an error. It is important to fail as early as possible in such cases in order to avoid allocating resources, e.g., to endpoint resolution, only to find out later that there is no protocol that satisfies the requirements. @@ -314,26 +314,31 @@ hostname + port endpoint, and has two valid interfaces available (Wi- Fi and LTE). The hostname resolves to a single IPv4 address on the Wi-Fi network, and resolves to the same IPv4 address on the LTE network, as well as a single IPv6 address. 
The aggregate set of connection establishment options can be viewed as follows: Aggregate [Endpoint: www.example.com:80] [Interface: Any] [Protocol: TCP] |-> [Endpoint: 192.0.2.1:80] [Interface: Wi-Fi] [Protocol: TCP] |-> [Endpoint: 192.0.2.1:80] [Interface: LTE] [Protocol: TCP] |-> [Endpoint: 2001:DB8::1.80] [Interface: LTE] [Protocol: TCP] - Any one of these sub-entries on the aggregate connection attempt would satisfy the original application intent. The concern of this section is the algorithm defining which of these options to try, when, and in what order. + During Candidate Gathering, an implementation first excludes all + protocols and paths that match a Prohibit or do not match all Require + properties. Then, the implementation will sort branches according to + Preferred properties, Avoided properties, and possibly other + criteria. + 4.1. Candidate Gathering The step of gathering candidates involves identifying which paths, protocols, and endpoints may be used for a given Connection. This list is determined by the requirements, prohibitions, and preferences of the application as specified in the Selection Properties. 4.1.1. Gathering Endpoint Candidates Both Local and Remote Endpoint Candidates must be discovered during @@ -644,49 +648,59 @@ including Selection Properties, which are specified in [I-D.ietf-taps-interface]. In addition to the properties provided by the application, an implementation may include additional criteria such as cached performance estimates, see Section 9.2, or system policy, see Section 3.2, in the ranking. Two examples of how Selection and Connection Properties may be used to sort branches are provided below: - o "Interface Instance or Type": If the application specifies an + * "Interface Instance or Type": If the application specifies an interface type to be preferred or avoided, implementations should rank paths accordingly. 
If the application specifies an interface type to be required or prohibited, we expect an implementation to not include the non-conforming paths in the tree. - o "Capacity Profile": An implementation may use the Capacity Profile + * "Capacity Profile": An implementation may use the Capacity Profile to prefer paths optimized for the application's expected traffic pattern according to cached performance estimates, see Section 9.2: - * Scavenger: Prefer paths with the highest expected available + - Scavenger: Prefer paths with the highest expected available bandwidth, based on observed maximum throughput - * Low Latency/Interactive: Prefer paths with the lowest expected + - Low Latency/Interactive: Prefer paths with the lowest expected Round Trip Time - * Constant-Rate Streaming: Prefer paths that can satisfy the + - Constant-Rate Streaming: Prefer paths that can satisfy the requested Stream Send or Stream Receive Bitrate, based on observed maximum throughput Implementations should process properties in the following order: Prohibit, Require, Prefer, Avoid. If Selection Properties contain any prohibited properties, the implementation should first purge branches containing nodes with these properties. For required properties, it should only keep branches that satisfy these requirements. Finally, it should order branches according to preferred properties, and use avoided properties as a - tiebreaker. + tiebreaker. When ordering branches, an implementation may give more + weight to properties that the application has explicitly set than to + properties left at their default values. + + As the available protocols and paths on a specific system and in a + specific context may vary, the result of sorting and the outcome of + racing may vary even given the same Selection and Connection + Properties.
However, an implementation ought to aim to provide a + consistent outcome to applications, e.g., by preferring protocols and + paths that existing Connections with similar Properties are already + using. 4.4. Candidate Racing The primary goal of the Candidate Racing process is to successfully negotiate a protocol stack to an endpoint over an interface--to connect a single leaf node of the tree--with as little delay and as few unnecessary connection attempts as possible. Optimizing these two factors improves the user experience, while minimizing network load. @@ -963,71 +978,71 @@ The effect of the application sending a Message is determined by the top-level protocol in the established Protocol Stack. That is, if the top-level protocol provides an abstraction of framed messages over a connection, the receiving application will be able to obtain multiple Messages on that connection, even if the framing protocol is built on a byte-stream protocol like TCP. 5.1.1. Message Properties - o Lifetime: this should be implemented by removing the Message from + * Lifetime: this should be implemented by removing the Message from its queue of pending Messages after the Lifetime has expired. A queue of pending Messages within the transport system implementation that have yet to be handed to the Protocol Stack can always support this property, but once a Message has been sent into the send buffer of a protocol, only certain protocols may support de-queueing a message. For example, TCP cannot remove bytes from its send buffer, while in the case of SCTP, such control over the SCTP send buffer can be exercised using the partial reliability extension [RFC8303]. When there is no standing queue of Messages within the system, and the Protocol Stack does not support removing a Message from its buffer, this property may be ignored. - o Priority: this represents the ability to prioritize a Message over + * Priority: this represents the ability to prioritize a Message over other Messages.
This can be implemented by the system re-ordering Messages that have yet to be handed to the Protocol Stack, or by giving relative priority hints to protocols that support priorities per Message. For example, an implementation of HTTP/2 could choose to send Messages of different Priority on streams of different priority. - o Ordered: when this is false, it disables the requirement of in- + * Ordered: when this is false, it disables the requirement of in- order-delivery for protocols that support configurable ordering. - o Idempotent: when this is true, it means that the Message can be + * Idempotent: when this is true, it means that the Message can be used by mechanisms that might transfer it multiple times - e.g., as a result of racing multiple transports or as part of TCP Fast Open. - o Final: when this is true, it means that a transport connection can + * Final: when this is true, it means that a transport connection can be closed immediately after its transmission. - o Corruption Protection Length: when this is set to any value other + * Corruption Protection Length: when this is set to any value other than -1, it limits the required checksum in protocols that allow limiting the checksum length (e.g. UDP-Lite). - o Transmission Profile: TBD - because it's not final in the API yet. + * Transmission Profile: TBD - because it's not final in the API yet. Old text follows: when this is set to "Interactive/Low Latency", the Message should be sent immediately, even when this comes at the cost of using the network capacity less efficiently. For example, small messages can sometimes be bundled to fit into a single data packet for the sake of reducing header overhead; such bundling should not be used. For example, in case of TCP, the Nagle algorithm should be disabled when Interactive/Low Latency is selected as the capacity profile. 
Scavenger/Bulk can translate into usage of a congestion control mechanism such as LEDBAT, and/ or the capacity profile can lead to a choice of a DSCP value as described in [I-D.ietf-taps-minset]). - o Singular Transmission: when this is true, the application requests + * Singular Transmission: when this is true, the application requests to avoid transport-layer segmentation or network-layer fragmentation. Some transports implement network-layer fragmentation avoidance (Path MTU Discovery) without exposing this functionality to the application; in this case, only transport- layer segmentation should be avoided, by fitting the message into a single transport-layer segment or otherwise failing. Otherwise, network-layer fragmentation should be avoided--e.g. by requesting the IP Don't Fragment bit to be set in case of UDP(-Lite) and IPv4 (SET_DF in [RFC8304]). @@ -1129,27 +1144,27 @@ While many protocols can be represented as Message Framers, for the purposes of the Transport Services interface these are ways for applications or application frameworks to define their own Message parsing to be included within a Connection's Protocol Stack. As an example, TLS can serve the purpose of framing data over TCP, but is exposed as a protocol natively supported by the Transport Services interface. Most Message Framers fall into one of two categories: - o Header-prefixed record formats, such as a basic Type-Length-Value + * Header-prefixed record formats, such as a basic Type-Length-Value (TLV) structure - o Delimiter-separated formats, such as HTTP/1.1. + * Delimiter-separated formats, such as HTTP/1.1. Common Message Framers can be provided by the Transport Services - implementation, but an implemention ought to allow custom Message + implementation, but an implementation ought to allow custom Message Framers to be defined by the application or some other piece of software. This section describes one possible interface for defining Message Framers as an example. 6.1. 
Defining Message Framers A Message Framer is primarily defined by the set of code that handles events for a framer implementation, specifically how it handles inbound and outbound data parsing. The piece of code that implements custom framing logic will be referred to as the "framer @@ -1170,44 +1185,60 @@ MessageFramer -> Stop(Connection) When a Message Framer generates a "Start" event, the framer implementation has the opportunity to start writing some data prior to the Connection delivering its "Ready" event. This allows the implementation to communicate control data to the remote endpoint that can be used to parse Messages. MessageFramer.MakeConnectionReady(Connection) + Similarly, when a Message Framer generates a "Stop" event, the framer + implementation has the opportunity to write some final data or clean + up its local state before the "Closed" event is delivered to the + Application. The framer implementation can then indicate that it has + finished. + + MessageFramer.MakeConnectionClosed(Connection) + If the implementation encounters a fatal error at any time, it can also cause the Connection to fail and provide an error. MessageFramer.FailConnection(Connection, Error) + Should the framer implementation deem the candidate selected during + racing unsuitable, it can signal this by failing the Connection prior + to marking it as ready. If there are no other candidates available, + the Connection will fail. Otherwise, the Connection will select a + different candidate and the Message Framer will generate a new + "Start" event. + Before an implementation marks a Message Framer as ready, it can also dynamically add a protocol or framer above it in the stack. This allows protocols like STARTTLS, which need to add TLS conditionally, to modify the Protocol Stack based on a handshake result. otherFramer := NewMessageFramer() MessageFramer.PrependFramer(Connection, otherFramer) 6.2.
Sender-side Message Framing Message Framers generate an event whenever a Connection sends a new Message. MessageFramer -> NewSentMessage Upon receiving this event, a framer implementation is responsible for performing any necessary transformations and sending the resulting - data to the next protocol. Implementations SHOULD ensure that there - is a way to pass the original data through without copying to improve + data back to the Message Framer, which will in turn send it to the + next protocol. Implementations SHOULD ensure that there is a way to + pass the original data through without copying to improve performance. MessageFramer.Send(Connection, Data) To provide an example, a simple protocol that adds a length as a header would receive the "NewSentMessage" event, create a data representation of the length of the Message data, and then send a block of data that is the concatenation of the length header and the original Message data. @@ -1219,29 +1250,28 @@ MessageFramer -> HandleReceivedData Upon receiving this event, the framer implementation can inspect the inbound data. The data is parsed from a particular cursor representing the unprocessed data. The application requests a specific amount of data it needs to have available in order to parse. If the data is not available, the parse fails. MessageFramer.Parse(Connection, MinimumIncompleteLength, MaximumLength) -> (Data, MessageContext, IsEndOfMessage) - The framer implementation can directly advance the receive cursor once it has parsed data to effectively discard data (for example, discard a header once the content has been parsed). To deliver a Message to the application, the framer implementation - can either directly deliever data that it has allocated, or deliver a + can either directly deliver data that it has allocated, or deliver a range of data directly from the underlying transport and - simulatenously advance the receive cursor. + simultaneously advance the receive cursor. 
MessageFramer.AdvanceReceiveCursor(Connection, Length) MessageFramer.DeliverAndAdvanceReceiveCursor(Connection, MessageContext, Length, IsEndOfMessage) MessageFramer.Deliver(Connection, MessageContext, Data, IsEndOfMessage) Note that "MessageFramer.DeliverAndAdvanceReceiveCursor" allows the framer implementation to earmark bytes as part of a Message even before they are received by the transport. This allows the delivery of very large Messages without requiring the implementation to directly inspect all of the bytes. @@ -1381,29 +1411,29 @@ 9.1. Protocol state caches Some protocols will have long-term state to be cached in association with Endpoints. This state often expires after some time, so the implementation should allow each protocol to specify an expiration for cached content. Examples of cached protocol state include: - o The DNS protocol can cache resolution answers (A and AAAA queries, + * The DNS protocol can cache resolution answers (A and AAAA queries, for example), associated with a Time To Live (TTL) to be used for future hostname resolutions without querying the DNS resolver again. - o TLS caches session state and tickets based on a hostname, which + * TLS caches session state and tickets based on a hostname, which can be used for resuming sessions with a server. - o TCP can cache cookies for use in TCP Fast Open. + * TCP can cache cookies for use in TCP Fast Open. Cached protocol state is primarily used during Connection establishment for a single Protocol Stack, but may be used to influence an implementation's preference between several candidate Protocol Stacks. For example, if two IP address Endpoints are otherwise equally preferred, an implementation may choose to attempt a connection to an address for which it has a TCP Fast Open cookie. Applications must have a way to flush protocol cache state if desired. This may be necessary, for example, if application-layer
This may be necessary, for example, if application-layer @@ -1411,25 +1441,25 @@ trackable TLS tickets or TFO cookies. 9.2. Performance caches In addition to protocol state, Protocol Instances should provide data into a performance-oriented cache to help guide future protocol and path selection. Some performance information can be gathered generically across several protocols to allow predictive comparisons between protocols on given paths: - o Observed Round Trip Time + * Observed Round Trip Time - o Connection Establishment latency + * Connection Establishment latency - o Connection Establishment success rate + * Connection Establishment success rate These items can be cached on a per-address and per-subnet granularity, and averaged between different values. The information should be cached on a per-network basis, since it is expected that different network attachments will have different performance characteristics. Besides Protocol Instances, other system entities may also provide data into performance-oriented caches. This could for instance be signal strength information reported by radio modems like Wi-Fi and mobile broadband or information about the battery- level of the device. Furthermore, the system may cache the observed @@ -1460,61 +1490,61 @@ implementation defines both its API mapping as well as implementation details. API mappings for a protocol apply most to Connections in which the given protocol is the "top" of the Protocol Stack. For example, the mapping of the "Send" function for TCP applies to Connections in which the application directly sends over TCP. If HTTP/2 is used on top of TCP, the HTTP/2 mappings take precendence. Each protocol has a notion of Connectedness. Possible values for Connectedness are: - o Unconnected. Unconnected protocols do not establish explicit + * Unconnected. Unconnected protocols do not establish explicit state between endpoints, and do not perform a handshake during Connection establishment. - o Connected. 
Connected protocols establish state between endpoints, + * Connected. Connected protocols establish state between endpoints, and perform a handshake during Connection establishment. The handshake may be 0-RTT to send data or resume a session, but bidirectional traffic is required to confirm connectedness. - o Multiplexing Connected. Multiplexing Connected protocols share + * Multiplexing Connected. Multiplexing Connected protocols share properties with Connected protocols, but also explicitly support opening multiple application-level flows. This means that they can support cloning new Connection objects without a new explicit handshake. Protocols also define a notion of Data Unit. Possible values for Data Unit are: - o Byte-stream. Byte-stream protocols do not define any Message + * Byte-stream. Byte-stream protocols do not define any Message boundaries of their own apart from the end of a stream in each direction. - o Datagram. Datagram protocols define Message boundaries at the + * Datagram. Datagram protocols define Message boundaries at the same level of transmission, such that only complete (not partial) Messages are supported. - o Message. Message protocols support Message boundaries that can be + * Message. Message protocols support Message boundaries that can be sent and received either as complete or partial Messages. Maximum Message lengths can be defined, and Messages can be partially reliable. Below, primitives in the style of "CATEGORY.[SUBCATEGORY].PRIMITIVENAME.PROTOCOL" (e.g., "CONNECT.SCTP") refer to the primitives with the same name in section 4 of [RFC8303]. For further implementation details, the description of these primitives in [RFC8303] points to section 3, which refers - back the specifications for each protocol. This back-tracking method - applies to all elements of [I-D.ietf-taps-minset] (see appendix D of - [I-D.ietf-taps-interface]): they are listed in appendix A of - [I-D.ietf-taps-minset] with an implementation hint in the same style, - pointing back to section 4 of [RFC8303]. + back to the specifications for each protocol. This back-tracking + method applies to all elements of [I-D.ietf-taps-minset] (see + appendix D of [I-D.ietf-taps-interface]): they are listed in appendix + A of [I-D.ietf-taps-minset] with an implementation hint in the same + style, pointing back to section 4 of [RFC8303].
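The Connectedness and Data Unit classifications above can be captured in a small data structure that a transport system might consult when mapping API calls onto a protocol. This is a hypothetical Python sketch: the names `Connectedness`, `DataUnit`, `ProtocolTraits`, and its methods are illustrative assumptions, not part of [RFC8303] or [I-D.ietf-taps-interface]. The TCP and UDP Multicast Receive entries use the values given in Sections 10.1 and 10.3; UDP is listed as Unconnected with a Datagram Data Unit.

```python
from dataclasses import dataclass
from enum import Enum


class Connectedness(Enum):
    UNCONNECTED = "Unconnected"
    CONNECTED = "Connected"
    MULTIPLEXING_CONNECTED = "Multiplexing Connected"


class DataUnit(Enum):
    BYTE_STREAM = "Byte-stream"
    DATAGRAM = "Datagram"
    MESSAGE = "Message"


@dataclass(frozen=True)
class ProtocolTraits:
    """Hypothetical per-protocol record of Connectedness and Data Unit."""

    name: str
    connectedness: Connectedness
    data_unit: DataUnit

    def performs_handshake(self) -> bool:
        # Connected and Multiplexing Connected protocols handshake during establishment.
        return self.connectedness is not Connectedness.UNCONNECTED

    def clone_needs_new_handshake(self) -> bool:
        # Multiplexing Connected protocols can open a new flow for a cloned
        # Connection without a new handshake; Unconnected ones never handshake.
        return self.connectedness is Connectedness.CONNECTED

    def has_message_boundaries(self) -> bool:
        # Byte-stream protocols only mark the end of the stream, so preserving
        # application Messages over them needs a framing layer above them.
        return self.data_unit is not DataUnit.BYTE_STREAM

    def supports_partial_messages(self) -> bool:
        # Datagram protocols deliver only complete Messages.
        return self.data_unit is DataUnit.MESSAGE


TCP = ProtocolTraits("TCP", Connectedness.CONNECTED, DataUnit.BYTE_STREAM)
UDP = ProtocolTraits("UDP", Connectedness.UNCONNECTED, DataUnit.DATAGRAM)
UDP_MULTICAST_RECEIVE = ProtocolTraits(
    "UDP Multicast Receive", Connectedness.UNCONNECTED, DataUnit.DATAGRAM
)
```

For example, `TCP.has_message_boundaries()` returns False, indicating that a framer (such as a Message Framer) would be needed above TCP to preserve Message boundaries, while `UDP.supports_partial_messages()` returns False because each UDP datagram is delivered as one complete Message.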
This back-tracking method - applies to all elements of [I-D.ietf-taps-minset] (see appendix D of - [I-D.ietf-taps-interface]): they are listed in appendix A of - [I-D.ietf-taps-minset] with an implementation hint in the same style, - pointing back to section 4 of [RFC8303]. + back to the specifications for each protocol. This back-tracking + method applies to all elements of [I-D.ietf-taps-minset] (see + appendix D of [I-D.ietf-taps-interface]): they are listed in appendix + A of [I-D.ietf-taps-minset] with an implementation hint in the same + style, pointing back to section 4 of [RFC8303]. 10.1. TCP Connectedness: Connected Data Unit: Byte-stream API mappings for TCP are as follows: Connection Object: TCP connections between two hosts map directly to @@ -1622,21 +1652,87 @@ "Received", each of which represents a single datagram received in a UDP packet. Upon receiving a UDP datagram, the ECN flag from the IP header can be obtained (GET_ECN.UDP(-Lite)). Close: Calling "Close" on a UDP Connection (ABORT.UDP(-Lite)) releases the local port reservation. Abort: Calling "Abort" on a UDP Connection (ABORT.UDP(-Lite)) is identical to calling "Close". -10.3. TLS +10.3. UDP Multicast Receive + + Connectedness: Unconnected + + Data Unit: Datagram + + API mappings for Receiving Multicast UDP are as follows: + + Connection Object: Established UDP Multicast Receive connections + represent a pair of specific IP addresses and ports. The + "unidirectional receive" transport property is required, and the + local endpoint must be configured with a group IP address and a + port. + + Initiate: Calling "Initiate" on a UDP Multicast Receive Connection + causes an immediate InitiateError. This is an unsupported + operation. + + InitiateWithSend: Calling "InitiateWithSend" on a UDP Multicast + Receive Connection causes an immediate InitiateError. This is an + unsupported operation. 
+ + Ready: A UDP Multicast Receive Connection is ready once the system + has received traffic for the appropriate group and port. + + InitiateError: UDP Multicast Receive Connections generate an + InitiateError if Initiate is called. + + ConnectionError: Once in use, UDP throws "soft errors" (ERROR.UDP(- + Lite)) upon receiving ICMP notifications indicating failures in + the network. + + Listen: LISTEN.UDP. Calling "Listen" for UDP Multicast Receive + binds a local port, prepares it to receive inbound UDP datagrams + from peers, and issues a multicast host join. If a remote + endpoint with an address is supplied, the join is Source-specific + Multicast, and the path selection is based on the route to the + remote endpoint. If a remote endpoint is not supplied, the join + is Any-source Multicast, and the path selection is based on the + outbound route to the group supplied in the local endpoint. + + ConnectionReceived: UDP Multicast Receive Listeners will deliver new + connections once they have received traffic from a new Remote + Endpoint. + + Clone: Calling "Clone" on a UDP Multicast Receive Connection creates + a new Connection with equivalent parameters. The two Connections + are otherwise independent. + + Send: SEND.UDP(-Lite). Calling "Send" on a UDP Multicast Receive + connection causes an immediate SendError. This is an unsupported + operation. + + Receive: RECEIVE.UDP(-Lite). The Receive operation in a UDP + Multicast Receive connection only delivers complete Messages to + "Received", each of which represents a single datagram received in + a UDP packet. Upon receiving a UDP datagram, the ECN flag from + the IP header can be obtained (GET_ECN.UDP(-Lite)). + + Close: Calling "Close" on a UDP Multicast Receive Connection + (ABORT.UDP(-Lite)) releases the local port reservation and leaves + the group. + + Abort: Calling "Abort" on a UDP Multicast Receive Connection + (ABORT.UDP(-Lite)) is identical to calling "Close". + +10.4. 
TLS The mapping of a TLS stream abstraction into the application is equivalent to the contract provided by TCP (see Section 10.1), and builds upon many of the actions of TCP connections. Connectedness: Connected Data Unit: Byte-stream Connection Object: Connection objects represent a single TLS @@ -1695,23 +1791,23 @@ Connection should be gracefully closed by sending a "close_notify" to the peer and waiting for a corresponding "close_notify" before delivering the "Closed" event. Abort: Calling "Abort" on a TLS Connection indicates that the Connection should be immediately closed by sending a "close_notify", optionally preceded by "user_canceled", to the peer. Implementations do not need to wait to receive "close_notify" before delivering the "Closed" event. -10.4. DTLS +10.5. DTLS - DTLS follows the same behavior as TLS (Section 10.3), with the + DTLS follows the same behavior as TLS (Section 10.4), with the notable exception of not inheriting behavior directly from TCP. Differences from TLS are detailed below, and all cases not explicitly mentioned should be considered the same as TLS. Connectedness: Connected Data Unit: Datagram Connection Object: Connection objects represent a single DTLS connection running over a set of UDP ports between two hosts. @@ -1725,21 +1821,21 @@ and keys have been established to encrypt application data. Send: Sending over DTLS does preserve message boundaries in the same way that UDP datagrams do. Marking a Message as Final does send a "close_notify" like TLS. Receive: Receiving over DTLS delivers one decrypted Message for each received DTLS datagram. If a "close_notify" is received, a Message will be delivered that is marked as Final. -10.5. HTTP +10.6. HTTP HTTP requests and responses map naturally into Messages, since they are delineated chunks of data with metadata that can be sent over a transport. To that end, HTTP can be seen as the most prevalent framing protocol that runs on top of streams like TCP, TLS, etc.
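The following Python sketch illustrates HTTP acting as a framing protocol on top of a byte stream. It deframes HTTP/1.1-style Messages delimited by a Content-Length header, places header values in a per-Message context, and puts the body in the Message data. The `Message` type and `deframe_http1` function are hypothetical names chosen for this example and are not part of any TAPS interface. The parser deliberately ignores chunked transfer coding, trailers, and header validation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical Message: HTTP headers go into the context, the body into data."""
    start_line: str
    context: dict[str, str] = field(default_factory=dict)
    data: bytes = b""


def deframe_http1(buffer: bytes) -> tuple[Message | None, bytes]:
    """Extract one Content-Length-delimited HTTP/1.1 message from stream bytes.

    Returns (message, remaining_bytes), or (None, buffer) if more bytes are
    needed. Simplification: chunked transfer coding and trailers are not handled.
    """
    head_end = buffer.find(b"\r\n\r\n")
    if head_end < 0:
        return None, buffer  # header block not complete yet
    lines = buffer[:head_end].decode("iso-8859-1").split("\r\n")
    headers: dict[str, str] = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    length = int(headers.get("content-length", "0"))
    body_start = head_end + 4  # skip the blank line ending the header block
    if len(buffer) < body_start + length:
        return None, buffer  # body not complete yet
    body = buffer[body_start:body_start + length]
    return Message(lines[0], headers, body), buffer[body_start + length:]
```

A framer like this would run whenever new bytes arrive from the underlying TCP or TLS Connection. It delivers one Message per complete request or response and keeps any leftover bytes for the next call.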
In order to use a transport Connection that provides HTTP Message support, the establishment and closing of the connection can be treated as it would be without the framing protocol. Sending and receiving of Messages, however, changes to treat each Message as a @@ -1771,21 +1867,21 @@ Receive: HTTP Connections deliver Messages in which HTTP header values are attached to MessageContexts, and HTTP bodies in Message data. Close: Calling "Close" on an HTTP Connection will only close the underlying TLS or TCP connection if the HTTP version does not support multiplexing. For HTTP/2, for example, closing the connection only closes a specific stream. -10.6. QUIC +10.7. QUIC QUIC provides a multi-streaming interface to an encrypted transport. Each stream can be viewed as equivalent to a TLS stream over TCP, so a natural mapping is to present each QUIC stream as an individual Connection. The protocol for the stream will be considered Ready whenever the underlying QUIC connection is established to the point that this stream's data can be sent. For streams after the first stream, this will likely be an immediate operation. Closing a single QUIC stream, presented to the application as a @@ -1794,23 +1890,23 @@ connection once all streams have been closed (often after some timeout), or after an individual stream Connection sends an Abort. Connectedness: Multiplexing Connected Data Unit: Stream Connection Object: Connection objects represent a single QUIC stream on a QUIC connection. -10.7. HTTP/2 transport +10.8. HTTP/2 transport - Similar to QUIC (Section 10.6), HTTP/2 provides a multi-streaming + Similar to QUIC (Section 10.7), HTTP/2 provides a multi-streaming interface. This will generally use HTTP as the unit of Messages over the streams, in which each stream can be represented as a transport Connection. The lifetime of streams and the HTTP/2 connection should be managed as described for QUIC.
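The stream lifetime rules described for QUIC, and reused for HTTP/2, can be sketched as follows. The sketch assumes one policy the text allows: the shared connection is torn down once its last stream Connection closes, or immediately when any stream Connection sends an Abort. `MultiplexedSession` and its methods are hypothetical. A real implementation would typically wait for an idle timeout rather than closing at once.

```python
from __future__ import annotations


class MultiplexedSession:
    """Hypothetical model of one QUIC or HTTP/2 connection carrying stream Connections."""

    def __init__(self) -> None:
        self.open_streams: set[int] = set()
        self.next_stream_id = 0
        self.closed = False

    def open_stream(self) -> int:
        # Each stream is presented to the application as its own Connection and
        # needs no new handshake while the shared connection is up.
        if self.closed:
            raise RuntimeError("underlying connection is already closed")
        stream_id = self.next_stream_id
        self.next_stream_id += 1
        self.open_streams.add(stream_id)
        return stream_id

    def close_stream(self, stream_id: int) -> None:
        # Closing one stream Connection leaves its siblings untouched; the shared
        # connection closes only when no streams remain (no idle timer modeled).
        self.open_streams.discard(stream_id)
        if not self.open_streams:
            self.closed = True

    def abort_stream(self, stream_id: int) -> None:
        # Under the policy assumed here, an Abort on any one stream Connection
        # tears down the shared connection and every remaining stream with it.
        self.open_streams.clear()
        self.closed = True
```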
It is possible to treat each HTTP/2 stream as a raw byte-stream instead of a carrier for HTTP messages, in which case the Messages over the streams can be represented similarly to the TCP stream (one Message per direction, see Section 10.1). @@ -1810,25 +1906,24 @@ be managed as described for QUIC. It is possible to treat each HTTP/2 stream as a raw byte-stream instead of a carrier for HTTP messages, in which case the Messages over the streams can be represented similarly to the TCP stream (one Message per direction, see Section 10.1). Connectedness: Multiplexing Connected Data Unit: Stream - Connection Object: Connection objects represent a single HTTP/2 stream on a HTTP/2 connection. -10.8. SCTP +10.9. SCTP Connectedness: Connected Data Unit: Message API mappings for SCTP are as follows: Connection Object: Connection objects represent a flow of SCTP messages between a client and a server, which may be an SCTP association or a stream in a SCTP association. How to map @@ -1985,34 +2080,40 @@ Kinnear for their implementation and design efforts, including Happy Eyeballs, that heavily influenced this work. 14. References 14.1. Normative References [I-D.ietf-taps-arch] Pauly, T., Trammell, B., Brunstrom, A., Fairhurst, G., Perkins, C., Tiesel, P., and C. Wood, "An Architecture for - Transport Services", draft-ietf-taps-arch-04 (work in - progress), July 2019. + Transport Services", Work in Progress, Internet-Draft, + draft-ietf-taps-arch-06, 23 December 2019, + . [I-D.ietf-taps-interface] Trammell, B., Welzl, M., Enghardt, T., Fairhurst, G., Kuehlewind, M., Perkins, C., Tiesel, P., Wood, C., and T. Pauly, "An Abstract Application Layer Interface to - Transport Services", draft-ietf-taps-interface-04 (work in - progress), July 2019. + Transport Services", Work in Progress, Internet-Draft, + draft-ietf-taps-interface-05, 4 November 2019, + . [I-D.ietf-taps-minset] Welzl, M. and S. 
Gjessing, "A Minimal Set of Transport - Services for End Systems", draft-ietf-taps-minset-11 (work - in progress), September 2018. + Services for End Systems", Work in Progress, Internet- + Draft, draft-ietf-taps-minset-11, 27 September 2018, + . [RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014, . [RFC7540] Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext Transfer Protocol Version 2 (HTTP/2)", RFC 7540, DOI 10.17487/RFC7540, May 2015, . @@ -2038,172 +2139,167 @@ . [RFC8446] Rescorla, E., "The Transport Layer Security (TLS) Protocol Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018, . 14.2. Informative References [I-D.ietf-quic-transport] Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed - and Secure Transport", draft-ietf-quic-transport-23 (work - in progress), September 2019. + and Secure Transport", Work in Progress, Internet-Draft, + draft-ietf-quic-transport-27, 21 February 2020, + . [NEAT-flow-mapping] "Transparent Flow Mapping for NEAT (in Workshop on Future - of Internet Transport (FIT 2017))", n.d.. + of Internet Transport (FIT 2017))", 2017. [RFC5245] Rosenberg, J., "Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal for Offer/Answer Protocols", RFC 5245, DOI 10.17487/RFC5245, April 2010, . - [Trickle] "Trickle - Rate Limiting YouTube Video Streaming (ATC - 2012)", n.d.. - -14.3. URIs - - [1] https://developer.apple.com/documentation/network - - [2] https://github.com/NEAT-project/neat - - [3] https://www.neat-project.org - - [4] https://github.com/fg-inet/python-asyncio-taps - Appendix A. Additional Properties This appendix discusses implementation considerations for additional parameters and properties that could be used to enhance transport protocol and/or path selection, or the transmission of messages given a Protocol Stack that implements them. 
These are not part of the interface, and may be removed from the final document, but are presented here to support discussion within the TAPS working group as to whether they should be added to a future revision of the base specification. A.1. Properties Affecting Sorting of Branches In addition to the Protocol and Path Selection Properties discussed in Section 4.3, the following properties under discussion can influence branch sorting: - o Bounds on Send or Receive Rate: If the application indicates a + * Bounds on Send or Receive Rate: If the application indicates a bound on the expected Send or Receive bitrate, an implementation may prefer a path that can likely provide the desired bandwidth, based on cached maximum throughput, see Section 9.2. The application may know the Send or Receive Bitrate from metadata in adaptive HTTP streaming, such as MPEG-DASH. - o Cost Preferences: If the application indicates a preference to + * Cost Preferences: If the application indicates a preference to avoid expensive paths, and some paths are associated with a monetary cost, an implementation should decrease the ranking of such paths. If the application indicates that it prohibits using expensive paths, paths that are associated with a cost should be purged from the decision tree. Appendix B. Reasons for errors The Transport Services API [I-D.ietf-taps-interface] allows several generic error types to specify a more detailed reason as to why an error occurred. This appendix lists some of the possible reasons. - o InvalidConfiguration: The transport properties and endpoints + * InvalidConfiguration: The transport properties and endpoints provided by the application are either contradictory or incomplete. Examples include the lack of a remote endpoint on an active open or using a multicast group address while not requesting a unidirectional receive.
- o NoCandidates: The configuration is valid, but none of the + * NoCandidates: The configuration is valid, but none of the available transport protocols can satisfy the transport properties provided by the application. - o ResolutionFailed: The remote or local specifier provided by the + * ResolutionFailed: The remote or local specifier provided by the application can not be resolved. - o EstablishmentFailed: The TAPS system was unable to establish a + * EstablishmentFailed: The TAPS system was unable to establish a transport-layer connection to the remote endpoint specified by the application. - o PolicyProhibited: The system policy prevents the transport system + * PolicyProhibited: The system policy prevents the transport system from performing the action requested by the application. - o NotCloneable: The protocol stack is not capable of being cloned. + * NotCloneable: The protocol stack is not capable of being cloned. - o MessageTooLarge: The message size is too big for the transport + * MessageTooLarge: The message size is too big for the transport system to handle. - o ProtocolFailed: The underlying protocol stack failed. + * ProtocolFailed: The underlying protocol stack failed. - o InvalidMessageProperties: The message properties are either + * InvalidMessageProperties: The message properties are either contradictory to the transport properties or they can not be satisfied by the transport system. - o DeframingFailed: The data that was received by the underlying + * DeframingFailed: The data that was received by the underlying protocol stack could not be deframed. - o ConnectionAborted: The connection was aborted by the peer. + * ConnectionAborted: The connection was aborted by the peer. - o Timeout: Delivery of a message was not possible after a timeout. + * Timeout: Delivery of a message was not possible after a timeout. Appendix C. 
Existing Implementations This appendix gives an overview of existing implementations, at the time of writing, of transport systems that are (to some degree) in line with this document. - o Apple's Network.framework: + * Apple's Network.framework: - * [A very brief introduction should be added] + - Network.framework is a transport-level API built for C, + Objective-C, and Swift. It is a connect-by-name API that supports + transport security protocols. It provides userspace + implementations of TCP, UDP, TLS, DTLS, proxy protocols, and + allows extension via custom framers. - * Documentation: https://developer.apple.com/documentation/ - network [1] + - Documentation: https://developer.apple.com/documentation/ + network (https://developer.apple.com/documentation/network) - o NEAT: + * NEAT: - * NEAT is the output of the European H2020 research project + - NEAT is the output of the European H2020 research project "NEAT"; it is a user-space library for protocol-independent communication on top of TCP, UDP and SCTP, with many more features such as a policy manager. - * Code: https://github.com/NEAT-project/neat [2] + - Code: https://github.com/NEAT-project/neat (https://github.com/ + NEAT-project/neat) - * NEAT project: https://www.neat-project.org [3] + - NEAT project: https://www.neat-project.org (https://www.neat- + project.org) - o PyTAPS: + * PyTAPS: - * A TAPS implementation based on Python asyncio, offering + - A TAPS implementation based on Python asyncio, offering protocol-independent communication to applications on top of TCP, UDP and TLS, with support for multicast. - * Code: https://github.com/fg-inet/python-asyncio-taps [4] + - Code: https://github.com/fg-inet/python-asyncio-taps + (https://github.com/fg-inet/python-asyncio-taps) Authors' Addresses Anna Brunstrom (editor) Karlstad University Universitetsgatan 2 - 651 88 Karlstad + SE-651 88 Karlstad Sweden Email: anna.brunstrom@kau.se Tommy Pauly (editor) Apple Inc.
One Apple Park Way - Cupertino, California 95014 + Cupertino, California 95014, United States of America Email: tpauly@apple.com - Theresa Enghardt TU Berlin Marchstrasse 23 10587 Berlin Germany Email: theresa@inet.tu-berlin.de Karl-Johan Grinnemo Karlstad University @@ -2201,29 +2297,30 @@ TU Berlin Marchstrasse 23 10587 Berlin Germany Email: theresa@inet.tu-berlin.de Karl-Johan Grinnemo Karlstad University Universitetsgatan 2 - 651 88 Karlstad + SE-651 88 Karlstad Sweden Email: karl-johan.grinnemo@kau.se + Tom Jones University of Aberdeen Fraser Noble Building Aberdeen, AB24 3UE - UK + United Kingdom Email: tom@erg.abdn.ac.uk Philipp S. Tiesel TU Berlin Einsteinufer 25 10587 Berlin Germany Email: philipp@tiesel.net