--- 1/draft-ietf-taps-impl-04.txt 2019-11-04 13:18:30.977978823 -0800 +++ 2/draft-ietf-taps-impl-05.txt 2019-11-04 13:18:31.069981155 -0800 @@ -1,31 +1,31 @@ TAPS Working Group A. Brunstrom, Ed. Internet-Draft Karlstad University Intended status: Informational T. Pauly, Ed. -Expires: January 9, 2020 Apple Inc. +Expires: May 7, 2020 Apple Inc. T. Enghardt TU Berlin K-J. Grinnemo Karlstad University T. Jones University of Aberdeen P. Tiesel TU Berlin C. Perkins University of Glasgow M. Welzl University of Oslo - July 08, 2019 + November 04, 2019 Implementing Interfaces to Transport Services - draft-ietf-taps-impl-04 + draft-ietf-taps-impl-05 Abstract The Transport Services architecture [I-D.ietf-taps-arch] defines a system that allows applications to use transport networking protocols flexibly. This document serves as a guide to implementation on how to build such a system. Status of This Memo @@ -35,121 +35,133 @@ Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on January 9, 2020. + This Internet-Draft will expire on May 7, 2020. Copyright Notice Copyright (c) 2019 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. 
Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 - 2. Implementing Basic Objects . . . . . . . . . . . . . . . . . 3 + 2. Implementing Connection Objects . . . . . . . . . . . . . . . 4 3. Implementing Pre-Establishment . . . . . . . . . . . . . . . 4 3.1. Configuration-time errors . . . . . . . . . . . . . . . . 5 - 3.2. Role of system policy . . . . . . . . . . . . . . . . . . 5 + 3.2. Role of system policy . . . . . . . . . . . . . . . . . . 6 4. Implementing Connection Establishment . . . . . . . . . . . . 6 - 4.1. Candidate Gathering . . . . . . . . . . . . . . . . . . . 7 - 4.1.1. Gathering Endpoint Candidates . . . . . . . . . . . . 7 + 4.1. Candidate Gathering . . . . . . . . . . . . . . . . . . . 8 + 4.1.1. Gathering Endpoint Candidates . . . . . . . . . . . . 8 4.1.2. Structuring Options as a Tree . . . . . . . . . . . . 9 - 4.1.3. Branch Types . . . . . . . . . . . . . . . . . . . . 10 + 4.1.3. Branch Types . . . . . . . . . . . . . . . . . . . . 11 4.2. Branching Order-of-Operations . . . . . . . . . . . . . . 13 4.3. Sorting Branches . . . . . . . . . . . . . . . . . . . . 14 4.4. Candidate Racing . . . . . . . . . . . . . . . . . . . . 15 4.4.1. Delayed . . . . . . . . . . . . . . . . . . . . . . . 16 - 4.4.2. Failover . . . . . . . . . . . . . . . . . . . . . . 16 + 4.4.2. Failover . . . . . . . . . . . . . . . . . . . . . . 17 4.5. Completing Establishment . . . . . . . . . . . . . . . . 17 - 4.5.1. Determining Successful Establishment . . . . . . . . 17 + 4.5.1. Determining Successful Establishment . . . . . . . . 18 4.6. Establishing multiplexed connections . . . . . . . . . . 18 4.7. Handling racing with "unconnected" protocols . . . . . . 19 4.8. Implementing listeners . . . . . . . . . . . . 
. . . . . 19 4.8.1. Implementing listeners for Connected Protocols . . . 20 4.8.2. Implementing listeners for Unconnected Protocols . . 20 4.8.3. Implementing listeners for Multiplexed Protocols . . 20 - 5. Implementing Data Transfer . . . . . . . . . . . . . . . . . 20 - 5.1. Data transfer for streams, datagrams, and frames . . . . 20 - 5.1.1. Sending Messages . . . . . . . . . . . . . . . . . . 21 - 5.1.2. Receiving Messages . . . . . . . . . . . . . . . . . 23 - 5.2. Handling of data for fast-open protocols . . . . . . . . 23 - 6. Implementing Maintenance . . . . . . . . . . . . . . . . . . 24 - 6.1. Managing Connections . . . . . . . . . . . . . . . . . . 24 - 6.2. Handling Path Changes . . . . . . . . . . . . . . . . . . 26 - - 7. Implementing Termination . . . . . . . . . . . . . . . . . . 26 - 8. Cached State . . . . . . . . . . . . . . . . . . . . . . . . 27 - 8.1. Protocol state caches . . . . . . . . . . . . . . . . . . 27 - 8.2. Performance caches . . . . . . . . . . . . . . . . . . . 28 - 9. Specific Transport Protocol Considerations . . . . . . . . . 29 - 9.1. TCP . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 - 9.2. UDP . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 - 9.3. TLS . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 - 9.4. DTLS . . . . . . . . . . . . . . . . . . . . . . . . . . 34 - 9.5. HTTP . . . . . . . . . . . . . . . . . . . . . . . . . . 34 - 9.6. QUIC . . . . . . . . . . . . . . . . . . . . . . . . . . 35 - 9.7. HTTP/2 transport . . . . . . . . . . . . . . . . . . . . 36 - 9.8. SCTP . . . . . . . . . . . . . . . . . . . . . . . . . . 36 - 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 37 - 11. Security Considerations . . . . . . . . . . . . . . . . . . . 37 - 11.1. Considerations for Candidate Gathering . . . . . . . . . 37 - 11.2. Considerations for Candidate Racing . . . . . . . . . . 37 - 12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 38 - 13. References . . . . . . . 
. . . . . . . . . . . . . . . . . . 38 - 13.1. Normative References . . . . . . . . . . . . . . . . . . 38 - 13.2. Informative References . . . . . . . . . . . . . . . . . 39 - Appendix A. Additional Properties . . . . . . . . . . . . . . . 40 - A.1. Properties Affecting Sorting of Branches . . . . . . . . 40 - Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 40 + 5. Implementing Sending and Receiving Data . . . . . . . . . . . 21 + 5.1. Sending Messages . . . . . . . . . . . . . . . . . . . . 21 + 5.1.1. Message Properties . . . . . . . . . . . . . . . . . 21 + 5.1.2. Send Completion . . . . . . . . . . . . . . . . . . . 23 + 5.1.3. Batching Sends . . . . . . . . . . . . . . . . . . . 23 + 5.2. Receiving Messages . . . . . . . . . . . . . . . . . . . 23 + 5.3. Handling of data for fast-open protocols . . . . . . . . 24 + 6. Implementing Message Framers . . . . . . . . . . . . . . . . 24 + 6.1. Defining Message Framers . . . . . . . . . . . . . . . . 25 + 6.2. Sender-side Message Framing . . . . . . . . . . . . . . . 26 + 6.3. Receiver-side Message Framing . . . . . . . . . . . . . . 26 + 7. Implementing Connection Management . . . . . . . . . . . . . 27 + 7.1. Pooled Connection . . . . . . . . . . . . . . . . . . . . 28 + 7.2. Handling Path Changes . . . . . . . . . . . . . . . . . . 28 + 8. Implementing Connection Termination . . . . . . . . . . . . . 29 + 9. Cached State . . . . . . . . . . . . . . . . . . . . . . . . 30 + 9.1. Protocol state caches . . . . . . . . . . . . . . . . . . 30 + 9.2. Performance caches . . . . . . . . . . . . . . . . . . . 31 + 10. Specific Transport Protocol Considerations . . . . . . . . . 32 + 10.1. TCP . . . . . . . . . . . . . . . . . . . . . . . . . . 33 + 10.2. UDP . . . . . . . . . . . . . . . . . . . . . . . . . . 34 + 10.3. TLS . . . . . . . . . . . . . . . . . . . . . . . . . . 35 + 10.4. DTLS . . . . . . . . . . . . . . . . . . . . . . . . . . 37 + 10.5. HTTP . . . . . . . . . . . . . . . . . . . . . . . . 
. . 37 + 10.6. QUIC . . . . . . . . . . . . . . . . . . . . . . . . . . 38 + 10.7. HTTP/2 transport . . . . . . . . . . . . . . . . . . . . 39 + 10.8. SCTP . . . . . . . . . . . . . . . . . . . . . . . . . . 39 + 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 42 + 12. Security Considerations . . . . . . . . . . . . . . . . . . . 42 + 12.1. Considerations for Candidate Gathering . . . . . . . . . 42 + 12.2. Considerations for Candidate Racing . . . . . . . . . . 42 + 13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 42 + 14. References . . . . . . . . . . . . . . . . . . . . . . . . . 43 + 14.1. Normative References . . . . . . . . . . . . . . . . . . 43 + 14.2. Informative References . . . . . . . . . . . . . . . . . 44 + Appendix A. Additional Properties . . . . . . . . . . . . . . . 45 + A.1. Properties Affecting Sorting of Branches . . . . . . . . 45 + Appendix B. Reasons for errors . . . . . . . . . . . . . . . . . 45 + Appendix C. Existing Implementations . . . . . . . . . . . . . . 46 + Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 47 1. Introduction The Transport Services architecture [I-D.ietf-taps-arch] defines a system that allows applications to use transport networking protocols flexibly. The interface such a system exposes to applications is defined as the Transport Services API [I-D.ietf-taps-interface]. This API is designed to be generic across multiple transport protocols and sets of protocols features. This document serves as a guide to implementation on how to build a system that provides a Transport Services API. It is the job of an implementation of a Transport Services system to turn the requests of an application into decisions on how to establish connections, and how to transfer data over those connections once established. The terminology used in this document is based on the Architecture [I-D.ietf-taps-arch]. -2. Implementing Basic Objects +2. 
Implementing Connection Objects - The basic objects that are exposed to applications for Transport - Services are the Preconnection, the bundle of properties that - describes the application constraints on the transport; the - Connection, the basic object that represents a flow of data in either - direction between the Local and Remote Endpoints; and the Listener, a - passive waiting object that delivers new Connections. + The connection objects that are exposed to applications for Transport + Services are: + + o the Preconnection, the bundle of properties that describes the + application constraints on the transport; + + o the Connection, the basic object that represents a flow of data in + either direction between the Local and Remote Endpoints; + + o and the Listener, a passive waiting object that delivers new + Connections. Preconnection objects should be implemented as bundles of properties that an application can both read and write. Once a Preconnection has been used to create an outbound Connection or a Listener, the implementation should ensure that the copy of the properties held by the Connection or Listener is immutable. This may involve performing a deep-copy if the application is still able to modify properties on the original Preconnection object. Connection objects represent the interface between the application @@ -549,21 +561,21 @@ Another example is racing SCTP with TCP: 1 [www.example.com:80, Any, Any Stream] 1.1 [www.example.com:80, Any, SCTP] 1.1.1 [192.0.2.1:80, Any, SCTP] 1.2 [www.example.com:80, Any, TCP] 1.2.1 [192.0.2.1:80, Any, TCP] Implementations that support racing protocols and protocol options should maintain a history of which protocols and protocol options - successfully established, on a per-network basis (see Section 8.2). + successfully established, on a per-network basis (see Section 9.2). This information can influence future racing decisions to prioritize or prune branches. 4.2. 
Branching Order-of-Operations Branch types must occur in a specific order relative to one another to avoid creating leaf nodes with invalid or incompatible settings. In the example above, it would be invalid to branch for derived endpoints (the DNS results for www.example.com) before branching between interface paths, since usable DNS results on one network may @@ -628,35 +639,35 @@ Implementations should sort the branches of the tree of connection options in order of their preference rank. Leaf nodes on branches with higher rankings represent connection attempts that will be raced first. Implementations should order the branches to reflect the preferences expressed by the application for its new connection, including Selection Properties, which are specified in [I-D.ietf-taps-interface]. In addition to the properties provided by the application, an implementation may include additional criteria such as cached - performance estimates, see Section 8.2, or system policy, see + performance estimates, see Section 9.2, or system policy, see Section 3.2, in the ranking. Two examples of how Selection and Connection Properties may be used to sort branches are provided below: o "Interface Instance or Type": If the application specifies an interface type to be preferred or avoided, implementations should rank paths accordingly. If the application specifies an interface type to be required or prohibited, we expect an implementation to not include the non-conforming paths in the tree.
o "Capacity Profile": An implementation may use the Capacity Profile to prefer paths optimized for the application's expected traffic pattern according to cached performance estimates, see - Section 8.2: + Section 9.2: * Scavenger: Prefer paths with the highest expected available bandwidth, based on observed maximum throughput * Low Latency/Interactive: Prefer paths with the lowest expected Round Trip Time * Constant-Rate Streaming: Prefer paths that can satisfy the requested Stream Send or Stream Receive Bitrate, based on observed maximum throughput @@ -784,23 +794,24 @@ If a leaf node has successfully completed its connection, all other attempts should be made ineligible for use by the application for the original request. New connection attempts that involve transmitting data on the network should not be started after another leaf node has completed successfully, as the connection as a whole has been established. An implementation may choose to let certain handshakes and negotiations complete in order to gather metrics to influence future connections. Similarly, an implementation may choose to hold onto fully established leaf nodes that were not the first to - establish for use in future connections, but this approach is not - recommended since those attempts were slower to connect and may - exhibit less desirable properties. + establish for use as part of a Pooled Connection, see Section 7.1, or + in future connections. In both cases, keeping additional connections + is generally not recommended since those attempts were slower to + connect and may exhibit less desirable properties. 4.5.1. Determining Successful Establishment Implementations may select the criteria by which a leaf node is considered to be successfully connected differently on a per-protocol basis. 
If the only protocol being used is a transport protocol with a clear handshake, like TCP, then the obvious choice is to declare that node "connected" when the last packet of the three-way handshake has been received. If the only protocol being used is an "unconnected" protocol, like UDP, the implementation may consider the @@ -921,52 +932,50 @@ tuple can listen both for entirely new connections (a new HTTP/2 stream on a new TCP connection, for example) and for new sub- connections (a new HTTP/2 stream on an existing connection). If the abstraction of Connection presented to the application is mapped to the multiplexed stream, then the Listener should deliver new Connection objects in the same way for either case. The implementation should allow the application to introspect the Connection Group marked on the Connections to determine the grouping of the multiplexing. -5. Implementing Data Transfer - -5.1. Data transfer for streams, datagrams, and frames +5. Implementing Sending and Receiving Data The most basic mapping for sending a Message is an abstraction of datagrams, in which the transport protocol naturally deals in discrete packets. Each Message here corresponds to a single datagram. Generally, these will be short enough that sending and receiving will always use a complete Message. For protocols that expose byte-streams, the only delineation provided by the protocol is the end of the stream in a given direction. Each Message in this case corresponds to the entire stream of bytes in a direction. These Messages may be quite long, in which case they can be sent in multiple parts. Protocols that provide the framing (such as length-value protocols, or protocols that use delimiters) provide data boundaries that may be longer than a traditional packet datagram. Each Message for framing protocols corresponds to a single frame, which may be sent either as a complete Message, or in multiple parts. -5.1.1. Sending Messages +5.1. 
Sending Messages The effect of the application sending a Message is determined by the top-level protocol in the established Protocol Stack. That is, if the top-level protocol provides an abstraction of framed messages over a connection, the receiving application will be able to obtain multiple Messages on that connection, even if the framing protocol is built on a byte-stream protocol like TCP. -5.1.1.1. Message Properties +5.1.1. Message Properties o Lifetime: this should be implemented by removing the Message from its queue of pending Messages after the Lifetime has expired. A queue of pending Messages within the transport system implementation that have yet to be handed to the Protocol Stack can always support this property, but once a Message has been sent into the send buffer of a protocol, only certain protocols may support de-queueing a message. For example, TCP cannot remove bytes from its send buffer, while in case of SCTP, such control over the SCTP send buffer can be exercised using the partial @@ -1015,43 +1024,43 @@ to avoid transport-layer segmentation or network-layer fragmentation. Some transports implement network-layer fragmentation avoidance (Path MTU Discovery) without exposing this functionality to the application; in this case, only transport- layer segmentation should be avoided, by fitting the message into a single transport-layer segment or otherwise failing. Otherwise, network-layer fragmentation should be avoided--e.g. by requesting the IP Don't Fragment bit to be set in case of UDP(-Lite) and IPv4 (SET_DF in [RFC8304]). -5.1.1.2. Send Completion +5.1.2. Send Completion The application should be notified whenever a Message or partial Message has been consumed by the Protocol Stack, or has failed to send. The meaning of the Message being consumed by the stack may vary depending on the protocol. For a basic datagram protocol like UDP, this may correspond to the time when the packet is sent into the interface driver. 
For a protocol that buffers data in queues, like TCP, this may correspond to when the data has entered the send buffer. -5.1.1.3. Batching Sends +5.1.3. Batching Sends Since sending a Message may involve a context switch between the application and the transport system, sending patterns that involve multiple small Messages can incur high overhead if each needs to be enqueued separately. To avoid this, the application should have a way to indicate a batch of Send actions, during which time the implementation will hold off on processing Messages until the batch is complete. This can also help context switches when enqueuing data in the interface driver if the operation can be batched. -5.1.2. Receiving Messages +5.2. Receiving Messages Similar to sending, Receiving a Message is determined by the top- level protocol in the established Protocol Stack. The main difference with Receiving is that the size and boundaries of the Message are not known beforehand. The application can communicate in its Receive action the parameters for the Message, which can help the implementation know how much data to deliver and when. For example, if the application only wants to receive a complete Message, the implementation should wait until an entire Message (datagram, stream, or frame) is read before delivering any Message content to the @@ -1062,21 +1071,21 @@ supports a byte-stream and no deframers were supported, the application must specify the minimum number of bytes of Message content it wants to receive (which may be just a single byte) to control the flow of received data. If a Connection becomes finished before a requested Receive action can be satisfied, the implementation should deliver any partial Message content outstanding, or if none is available, an indication that there will be no more received Messages. -5.2. Handling of data for fast-open protocols +5.3. 
Handling of data for fast-open protocols Several protocols allow sending higher-level protocol or application data within the first packet of their protocol establishment, such as TCP Fast Open [RFC7413] and TLS 1.3 [RFC8446]. This approach is referred to as sending Zero-RTT (0-RTT) data. This is a desirable property, but poses challenges to an implementation that uses racing during connection establishment. If the application has 0-RTT data to send in any protocol handshakes, it needs to provide this data before the handshakes have begun. When @@ -1103,104 +1112,223 @@ cookies, previously established TLS tickets, or out-of-band distributed pre-shared keys (PSKs). Implementations should be aware of security concerns around using these tokens across multiple addresses or paths when racing. In the case of TLS, any given ticket or PSK should only be used on one leaf node. If implementations have multiple tickets available from a previous connection, each leaf node attempt must use a different ticket. In effect, each leaf node will send the same early application data, yet encoded (encrypted) differently on the wire. -6. Implementing Maintenance +6. Implementing Message Framers - Maintenance encompasses changes that the application can request to a - Connection, or that a Connection can react to based on system and - network changes. + Message Framers are pieces of code that define simple transformations + between application Message data and raw transport protocol data. A + Framer can encapsulate or encode outbound Messages, and decapsulate + or decode inbound data into Messages. -6.1. Managing Connections + While many protocols can be represented as Message Framers, for the + purposes of the Transport Services interface these are ways for + applications or application frameworks to define their own Message + parsing to be included within a Connection's Protocol Stack. 
As an + example, TLS can serve the purpose of framing data over TCP, but is + exposed as a protocol natively supported by the Transport Services + interface. - Appendix A.1 of [I-D.ietf-taps-minset] explains, using primitives - from [RFC8303] and [RFC8304], how to implement changing some of the - following protocol properties of an established connection with TCP - and UDP. Below, we amend this description for other protocols (if - applicable) and extend it with Connection Properties that are not - contained in [I-D.ietf-taps-minset]. + Most Message Framers fall into one of two categories: - o Notification of excessive retransmissions: TODO - o Retransmission threshold before excessive retransmission - notification: TODO; for TCP, this can be done using ERROR.TCP - described in section 4 of [RFC8303]. + o Header-prefixed record formats, such as a basic Type-Length-Value + (TLV) structure. - o Notification of ICMP soft error message arrival: TODO + o Delimiter-separated formats, such as HTTP/1.1. - o Required minimum coverage of the checksum for receiving: for UDP- - Lite, this can be done using the primitive - SET_MIN_CHECKSUM_COVERAGE.UDP-Lite described in section 4 of - [RFC8303]. + Common Message Framers can be provided by the Transport Services + implementation, but an implementation ought to allow custom Message + Framers to be defined by the application or some other piece of + software. This section describes one possible interface for defining + Message Framers as an example. - o Priority (Connection): TODO; for SCTP, this can be done using the - primitive CONFIGURE_STREAM_SCHEDULER.SCTP described in section 4 - of [RFC8303]. +6.1. Defining Message Framers - o Timeout for aborting Connection: for SCTP, this can be done using - the primitive CHANGE_TIMEOUT.SCTP described in section 4 of - [RFC8303].
+ A Message Framer is primarily defined by the set of code that handles + events for a framer implementation, specifically how it handles + inbound and outbound data parsing. The piece of code that implements + custom framing logic will be referred to as the "framer + implementation", which may be provided by the Transport Services + implementation or the application itself. The Message Framer refers + to the object or piece of code within the main Connection + implementation that delivers events to the custom framer + implementation whenever data is ready to be parsed or framed. - o Connection group transmission scheduler: for SCTP, this can be - done using the primitive SET_STREAM_SCHEDULER.SCTP described in - section 4 of [RFC8303]. + When a Connection establishment attempt begins, an event can be + delivered to notify the framer implementation that a new Connection + is being created. Similarly, a stop event can be delivered when a + Connection is being torn down. The framer implementation can use the + Connection object to look up specific properties of the Connection or + the network being used that may influence how to frame Messages. - o Maximum message size concurrent with Connection establishment: - TODO + MessageFramer -> Start(Connection) + MessageFramer -> Stop(Connection) - o Maximum Message size before fragmentation or segmentation: TODO + When a Message Framer generates a "Start" event, the framer + implementation has the opportunity to start writing some data prior + to the Connection delivering its "Ready" event. This allows the + implementation to communicate control data to the remote endpoint + that can be used to parse Messages. - o Maximum Message size on send: TODO + MessageFramer.MakeConnectionReady(Connection) - o Maximum Message size on receive: TODO + At any time if the implementation encounters a fatal error, it can + also cause the Connection to fail and provide an error. 
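The lifecycle described above (Start, Stop, MakeConnectionReady, FailConnection, plus writing control data before Ready) can be sketched in Python. This is a minimal, single-threaded sketch that assumes a Connection's state can be modelled as a plain string. `Connection`, `MessageFramer`, `preamble_start`, and the snake_case method names are hypothetical mirrors of the pseudocode names, not an existing Transport Services API. `send` stands in for MessageFramer.Send from the sender-side discussion.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Connection:
    """Hypothetical stand-in for a Transport Services Connection."""
    state: str = "establishing"  # later "ready", "failed", or "closed"
    error: Optional[str] = None
    written: List[bytes] = field(default_factory=list)  # data passed to the next protocol


# Signature of the framer implementation's event handlers (hypothetical).
Handler = Callable[["MessageFramer", Connection], None]


class MessageFramer:
    """Hypothetical Message Framer owned by a Connection.

    It delivers Start/Stop events to a custom framer implementation and
    exposes the calls that implementation may make back into the Connection.
    """

    def __init__(self, on_start: Handler, on_stop: Handler) -> None:
        self.on_start = on_start
        self.on_stop = on_stop

    # Events delivered to the framer implementation
    def start(self, conn: Connection) -> None:
        self.on_start(self, conn)

    def stop(self, conn: Connection) -> None:
        self.on_stop(self, conn)
        conn.state = "closed"

    # Calls the framer implementation can make
    def send(self, conn: Connection, data: bytes) -> None:
        if conn.state in ("failed", "closed"):
            raise RuntimeError("connection is not usable")
        conn.written.append(data)

    def make_connection_ready(self, conn: Connection) -> None:
        if conn.state == "establishing":
            conn.state = "ready"  # the Connection may now deliver its Ready event

    def fail_connection(self, conn: Connection, error: str) -> None:
        conn.state = "failed"
        conn.error = error


def preamble_start(framer: MessageFramer, conn: Connection) -> None:
    """Example framer implementation: write control data, then become ready."""
    framer.send(conn, b"FRAMER v1\n")
    framer.make_connection_ready(conn)
```

In this sketch the framer implementation, not the Connection, decides when Ready may be delivered. A framer that needs a reply from the remote endpoint before it can parse Messages would simply defer `make_connection_ready` until that reply arrives.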
- o Capacity Profile: TODO + MessageFramer.FailConnection(Connection, Error) - o Bounds on Send or Receive Rate: TODO + Before an implementation marks a Message Framer as ready, it can also + dynamically add a protocol or framer above it in the stack. This + allows protocols like STARTTLS, that need to add TLS conditionally, + to modify the Protocol Stack based on a handshake result. - o TCP-specific Property: User Timeout: for TCP, this can be - configured using the primitive CHANGE_TIMEOUT.TCP described in - section 4 of [RFC8303]. + otherFramer := NewMessageFramer() + MessageFramer.PrependFramer(Connection, otherFramer) - It may happen that the application attempts to set a Protocol - Property which does not apply to the actually chosen protocol. In - this case, the implementation should fail gracefully, i.e., it may - give a warning to the application, but it should not terminate the - Connection. +6.2. Sender-side Message Framing -6.2. Handling Path Changes + Message Framers generate an event whenever a Connection sends a new + Message. + +MessageFramer -> NewSentMessage + + Upon receiving this event, a framer implementation is responsible for + performing any necessary transformations and sending the resulting + data to the next protocol. Implementations SHOULD ensure that there + is a way to pass the original data through without copying to improve + performance. + + MessageFramer.Send(Connection, Data) + + To provide an example, a simple protocol that adds a length as a + header would receive the "NewSentMessage" event, create a data + representation of the length of the Message data, and then send a + block of data that is the concatenation of the length header and the + original Message data. + +6.3. Receiver-side Message Framing + + In order to parse a received flow of data into Messages, the Message + Framer notifies the framer implementation whenever new data is + available to parse. 
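The length-prefix protocol used as the running example in this section can be sketched in Python. This is a minimal sketch that assumes a 4-byte big-endian length header; the draft does not fix a header size. `frame_message` corresponds to the sender-side example above. `LengthPrefixDeframer.handle_received_data` follows the receive-side steps described in the rest of this section: parse the header, advance past it, then deliver the body. A `bytearray` stands in for the unprocessed data in front of the receive cursor. All names are hypothetical.

```python
import struct
from typing import List

HEADER = struct.Struct("!I")  # assumed: 4-byte, network byte order length prefix


def frame_message(message: bytes) -> bytes:
    """Sender side: prepend a length header to the original Message data."""
    return HEADER.pack(len(message)) + message


class LengthPrefixDeframer:
    """Receiver side: turn a byte stream back into complete Messages.

    `buffer` holds the data not yet consumed; deleting bytes from its
    front corresponds to advancing the receive cursor.
    """

    def __init__(self) -> None:
        self.buffer = bytearray()

    def handle_received_data(self, data: bytes) -> List[bytes]:
        self.buffer += data
        messages: List[bytes] = []
        while True:
            # "Parse" exactly HEADER.size bytes; if they are missing, the parse fails.
            if len(self.buffer) < HEADER.size:
                break
            (length,) = HEADER.unpack_from(self.buffer)
            if len(self.buffer) < HEADER.size + length:
                break  # body incomplete: wait for more data
            # Skip the header, deliver the body, and advance past both.
            body = bytes(self.buffer[HEADER.size:HEADER.size + length])
            del self.buffer[:HEADER.size + length]
            messages.append(body)
        return messages
```

Unlike "DeliverAndAdvanceReceiveCursor", which lets a framer earmark body bytes before they arrive, this simplified deframer buffers each whole body before delivering it. That makes it suitable only for modest Message sizes.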
+ + MessageFramer -> HandleReceivedData + + Upon receiving this event, the framer implementation can inspect the + inbound data. The data is parsed from a particular cursor + representing the unprocessed data. The framer implementation requests a + specific amount of data it needs to have available in order to parse. + If the data is not available, the parse fails. + +MessageFramer.Parse(Connection, MinimumIncompleteLength, MaximumLength) -> (Data, MessageContext, IsEndOfMessage) + + The framer implementation can directly advance the receive cursor + once it has parsed data to effectively discard data (for example, + discard a header once the content has been parsed). + + To deliver a Message to the application, the framer implementation + can either directly deliver data that it has allocated, or deliver a + range of data directly from the underlying transport and + simultaneously advance the receive cursor. + +MessageFramer.AdvanceReceiveCursor(Connection, Length) +MessageFramer.DeliverAndAdvanceReceiveCursor(Connection, MessageContext, Length, IsEndOfMessage) +MessageFramer.Deliver(Connection, MessageContext, Data, IsEndOfMessage) + + Note that "MessageFramer.DeliverAndAdvanceReceiveCursor" allows the + framer implementation to earmark bytes as part of a Message even + before they are received by the transport. This allows the delivery + of very large Messages without requiring the implementation to + directly inspect all of the bytes. + + To provide an example, a simple protocol that parses a length as a + header value would receive the "HandleReceivedData" event and call + "Parse" with a minimum and maximum set to the length of the header + field. Once the parse succeeds, it would call + "AdvanceReceiveCursor" with the length of the header field, and then + call "DeliverAndAdvanceReceiveCursor" with the length of the body + that was parsed from the header, marking the new Message as complete. + +7.
Implementing Connection Management + + Once a Connection is established, the Transport Services system + allows applications to interact with the Connection by modifying or + inspecting Connection Properties. A Connection can also generate + events in the form of Soft Errors. + + The set of Connection Properties that are supported for setting and + getting on a Connection are described in [I-D.ietf-taps-interface]. + For any properties that are generic, and thus could apply to all + protocols being used by a Connection, the Transport System should + store the properties in a generic storage, and notify all protocol + instances in the Protocol Stack whenever the properties have been + modified by the application. For protocol-specific properties, such + as the User Timeout that applies to TCP, the Transport System only + needs to update the relevant protocol instance. + + If an error is encountered in setting a property (for example, if the + application tries to set a TCP-specific property on a Connection that + is not using TCP), the action should fail gracefully. The + application may be informed of the error, but the Connection itself + should not be terminated. + + The Transport Services implementation should allow protocol instances + in the Protocol Stack to pass up arbitrary generic or protocol- + specific errors that can be delivered to the application as Soft + Errors. These allow the application to be informed of ICMP errors + and other similar events. + +7.1. Pooled Connection + + For protocols that employ request/response pairs and do not require + in-order delivery of the responses, like HTTP, the transport + implementation may distribute interactions across several underlying + transport connections. For these kinds of protocols, implementations + may hide the connection management and only expose a single + Connection object and the individual requests/responses as messages.
+ These Pooled Connections can use multiple connections or multiple + streams of multi-streaming connections between endpoints, as long as + all of these satisfy the requirements and prohibitions specified in + the Selection Properties of the Pooled Connection. This enables + implementations to realize transparent connection coalescing and + connection migration, and to perform per-message endpoint and path + selection by choosing among these underlying connections. + +7.2. Handling Path Changes When a path change occurs, the Transport Services implementation is responsible for notifying Protocol Instances in the Protocol Stack. If the Protocol Stack includes a transport protocol that supports multipath connectivity, an update to the available paths should inform the Protocol Instance of the new set of paths that are permissible based on the Selection Properties passed by the application. A multipath protocol can establish new subflows over new paths, and should tear down subflows over paths that are no - longer available. If the Protocol Stack includes a transport - protocol that does not support multipath, but support migrating - between paths, the update to available paths can be used as the - trigger to migrating the connection. For protocols that do not - support multipath or migration, the Protocol Instances may be - informed of the path change, but should not be forcibly disconnected - if the previously used path becomes unavailable. An exception to - this case is if the System Policy changes to prohibit traffic from - the Connection based on its properties, in which case the Protocol - Stack should be disconnected. + longer available. Pooled Connections (Section 7.1) may add or remove + underlying transport connections in a similar manner. If the + Protocol Stack includes a transport protocol that does not support + multipath, but supports migrating between paths, the update to + available paths can be used as the trigger for migrating the + connection.
For protocols that do not support multipath or + migration, the Protocol Instances may be informed of the path change, + but should not be forcibly disconnected if the previously used path + becomes unavailable. An exception to this case is if the System + Policy changes to prohibit traffic from the Connection based on its + properties, in which case the Protocol Stack should be disconnected. -7. Implementing Termination +8. Implementing Connection Termination With TCP, when an application closes a connection, this means that it has no more data to send (but expects all data that has been handed over to be reliably delivered). However, with TCP only, "close" does not mean that the application will stop receiving data. This is related to TCP's ability to support half-closed connections. SCTP is an example of a protocol that does not support such half- closed connections. Hence, with SCTP, the meaning of "close" is stricter: an application has no more data to send (but expects all @@ -1228,37 +1356,37 @@ Initiate action provokes a ConnectionReceived event at its peer. For Close (provoking a Finished event) and Abort (provoking a ConnectionError event), the same logic applies: while it is desirable to be informed when a peer closes or aborts a Connection, whether this is possible depends on the underlying protocol, and no guarantees can be given. With SCTP, the transport system can use the stream reset procedure to cause a Finish event upon a Close action from the peer [NEAT-flow-mapping]. -8. Cached State +9. Cached State Beyond a single Connection's lifetime, it is useful for an implementation to keep state and history. This cached state can help improve future Connection establishment due to re-using results and credentials, and favoring paths and protocols that performed well in the past. Cached state may be associated with different Endpoints for the same Connection, depending on the protocol generating the cached content. 
For example, session tickets for TLS are associated with specific endpoints, and thus should be cached based on a Connection's hostname Endpoint (if applicable). On the other hand, performance characteristics of a path are more likely tied to the IP address and subnet being used. -8.1. Protocol state caches +9.1. Protocol state caches Some protocols will have long-term state to be cached in association with Endpoints. This state often has some time after which it is expired, so the implementation should allow each protocol to specify an expiration for cached content. Examples of cached protocol state include: o The DNS protocol can cache resolution answers (A and AAAA queries, for example), associated with a Time To Live (TTL) to be used for @@ -1275,21 +1403,21 @@ influence an implementation's preference between several candidate Protocol Stacks. For example, if two IP address Endpoints are otherwise equally preferred, an implementation may choose to attempt a connection to an address for which it has a TCP Fast Open cookie. Applications must have a way to flush protocol cache state if desired. This may be necessary, for example, if application-layer identifiers rotate and clients wish to avoid linkability via trackable TLS tickets or TFO cookies. -8.2. Performance caches +9.2. Performance caches In addition to protocol state, Protocol Instances should provide data into a performance-oriented cache to help guide future protocol and path selection. Some performance information can be gathered generically across several protocols to allow predictive comparisons between protocols on given paths: o Observed Round Trip Time o Connection Establishment latency @@ -1319,31 +1447,29 @@ depending on the nature of the value. 
Certain information, like the connection establishment success rate to a Remote Endpoint using a given protocol stack, can be stored for a long period of time (hours or longer), since it is expected that the capabilities of the Remote Endpoint are not changing very quickly. On the other hand, Round Trip Time observed by TCP over a particular network path may vary over a relatively short time interval. For such values, the implementation should remove them from the cache more quickly, or treat older values with less confidence/weight. -9. Specific Transport Protocol Considerations +10. Specific Transport Protocol Considerations Each protocol that can run as part of a Transport Services implementation defines both its API mapping and its implementation - details. - - API mappings for a protocol apply most to Connections in which the - given protocol is the "top" of the Protocol Stack. For example, the - mapping of the "Send" function for TCP applies to Connections in - which the application directly sends over TCP. If HTTP/2 is used on - top of TCP, the HTTP/2 mappings take precendence. + details. API mappings for a protocol apply most directly to Connections in + which the given protocol is the "top" of the Protocol Stack. For + example, the mapping of the "Send" function for TCP applies to + Connections in which the application directly sends over TCP. If + HTTP/2 is used on top of TCP, the HTTP/2 mappings take precedence. Each protocol has a notion of Connectedness. Possible values for Connectedness are: o Unconnected. Unconnected protocols do not establish explicit state between endpoints, and do not perform a handshake during Connection establishment. o Connected. Connected protocols establish state between endpoints, and perform a handshake during Connection establishment. The
Message protocols support Message boundaries that can be sent and received either as complete or partial Messages. Maximum Message lengths can be defined, and Messages can be partially reliable. -9.1. TCP + Below, primitives in the style of + "CATEGORY.[SUBCATEGORY].PRIMITIVENAME.PROTOCOL" (e.g., + "CONNECT.SCTP") refer to the primitives with the same name in section + 4 of [RFC8303]. For further implementation details, the description + of these primitives in [RFC8303] points to section 3, which refers + back to the specifications for each protocol. This back-tracking method + applies to all elements of [I-D.ietf-taps-minset] (see appendix D of + [I-D.ietf-taps-interface]): they are listed in appendix A of + [I-D.ietf-taps-minset] with an implementation hint in the same style, + pointing back to section 4 of [RFC8303]. + +10.1. TCP Connectedness: Connected Data Unit: Byte-stream API mappings for TCP are as follows: Connection Object: TCP connections between two hosts map directly to Connection objects. - Initiate: Calling "Initiate" on a TCP Connection causes it to - reserve a local port, and send a SYN to the Remote Endpoint. + Initiate: CONNECT.TCP. Calling "Initiate" on a TCP Connection + causes it to reserve a local port, and send a SYN to the Remote + Endpoint. - InitiateWithSend: Early idempotent data is sent on a TCP Connection - in the SYN, as TCP Fast Open data. + InitiateWithSend: CONNECT.TCP with parameter "user message". Early + idempotent data is sent on a TCP Connection in the SYN, as TCP + Fast Open data. Ready: A TCP Connection is ready once the three-way handshake is complete. - InitiateError: TCP can throw various errors during connection setup. - Specifically, it is important to handle a RST being sent by the - peer during the handshake. + InitiateError: Failure of CONNECT.TCP. TCP can throw various errors + during connection setup. Specifically, it is important to handle + a RST being sent by the peer during the handshake.
ConnectionError: Once established, TCP throws errors whenever the - connection is disconnected, such as due to receive a RST from the - peer; or hitting a TCP retransmission timeout. + connection is disconnected, such as due to receiving a RST from + the peer or hitting a TCP retransmission timeout. - Listen: Calling "Listen" for TCP binds a local port and prepares it - to receive inbound SYN packets from peers. + Listen: LISTEN.TCP. Calling "Listen" for TCP binds a local port and + prepares it to receive inbound SYN packets from peers. ConnectionReceived: TCP Listeners will deliver new connections once they have replied to an inbound SYN with a SYN-ACK. Clone: Calling "Clone" on a TCP Connection creates a new Connection with equivalent parameters. The two Connections are otherwise independent. - Send: TCP does not on its own preserve Message boundaries. Calling - "Send" on a TCP connection lays out the bytes on the TCP send - stream without any other delineation. Any Message marked as Final - will cause TCP to send a FIN once the Message has been completely - written. + Send: SEND.TCP. TCP does not on its own preserve Message + boundaries. Calling "Send" on a TCP connection lays out the bytes + on the TCP send stream without any other delineation. Any Message + marked as Final will cause TCP to send a FIN once the Message has + been completely written, by calling CLOSE.TCP immediately upon + successful termination of SEND.TCP. - Receive: TCP delivers a stream of bytes without any Message - delineation. All data delivered in the "Received" or + Receive: With RECEIVE.TCP, TCP delivers a stream of bytes without + any Message delineation. All data delivered in the "Received" or "ReceivedPartial" event will be part of a single stream-wide Message that is marked Final (unless a MessageFramer is used). EndOfMessage will be delivered when the TCP Connection has - received a FIN from the peer. + received a FIN (CLOSE-EVENT.TCP or ABORT-EVENT.TCP) from the peer.
Close: Calling "Close" on a TCP Connection indicates that the - Connection should be gracefully closed by sending a FIN to the - peer and waiting for a FIN-ACK before delivering the "Closed" - event. + Connection should be gracefully closed (CLOSE.TCP) by sending a + FIN to the peer and waiting for a FIN-ACK before delivering the + "Closed" event. Abort: Calling "Abort" on a TCP Connection indicates that the Connection should be immediately closed by sending a RST to the - peer. + peer (ABORT.TCP). -9.2. UDP +10.2. UDP Connectedness: Unconnected Data Unit: Datagram API mappings for UDP are as follows: Connection Object: UDP connections represent a pair of specific IP addresses and ports on two hosts. - Initiate: Calling "Initiate" on a UDP Connection causes it to - reserve a local port, but does not generate any traffic. + Initiate: CONNECT.UDP. Calling "Initiate" on a UDP Connection + causes it to reserve a local port, but does not generate any + traffic. InitiateWithSend: Early data on a UDP Connection does not have any special meaning. The data is sent whenever the Connection is Ready. Ready: A UDP Connection is ready once the system has reserved a local port and has a path to send to the Remote Endpoint. InitiateError: UDP Connections can only generate errors on initiation due to port conflicts on the local system. - ConnectionError: Once in use, UDP throws errors upon receiving ICMP - notifications indicating failures in the network. + ConnectionError: Once in use, UDP throws "soft errors" + (ERROR.UDP(-Lite)) upon receiving ICMP notifications indicating + failures in the network. - Listen: Calling "Listen" for UDP binds a local port and prepares it - to receive inbound UDP datagrams from peers. + Listen: LISTEN.UDP. Calling "Listen" for UDP binds a local port and + prepares it to receive inbound UDP datagrams from peers. ConnectionReceived: UDP Listeners will deliver new connections once they have received traffic from a new Remote Endpoint.
Clone: Calling "Clone" on a UDP Connection creates a new Connection with equivalent parameters. The two Connections are otherwise independent. - Send: Calling "Send" on a UDP connection sends the data as the - payload of a complete UDP datagram. Marking Messages as Final - does not change anything in the datagram's contents. + Send: SEND.UDP(-Lite). Calling "Send" on a UDP connection sends the + data as the payload of a complete UDP datagram. Marking Messages + as Final does not change anything in the datagram's contents. + Upon sending a UDP datagram, some relevant fields and flags in the + IP header can be controlled: DSCP (SET_DSCP.UDP(-Lite)), DF in + IPv4 (SET_DF.UDP(-Lite)) and ECN flag (SET_ECN.UDP(-Lite)). - Receive: UDP only delivers complete Messages to "Received", each of - which represents a single datagram received in a UDP packet. + Receive: RECEIVE.UDP(-Lite). UDP only delivers complete Messages to + "Received", each of which represents a single datagram received in + a UDP packet. Upon receiving a UDP datagram, the ECN flag from + the IP header can be obtained (GET_ECN.UDP(-Lite)). - Close: Calling "Close" on a UDP Connection releases the local port - reservation. + Close: Calling "Close" on a UDP Connection (ABORT.UDP(-Lite)) + releases the local port reservation. - Abort: Calling "Abort" on a UDP Connection is identical to calling - "Close". + Abort: Calling "Abort" on a UDP Connection (ABORT.UDP(-Lite)) is + identical to calling "Close". -9.3. TLS +10.3. TLS The mapping of a TLS stream abstraction into the application is - equivalent to the contract provided by TCP (see Section 9.1), and + equivalent to the contract provided by TCP (see Section 10.1), and builds upon many of the actions of TCP connections. Connectedness: Connected Data Unit: Byte-stream Connection Object: Connection objects represent a single TLS connection running over a TCP connection between two hosts. 
Initiate: Calling "Initiate" on a TLS Connection causes it to first @@ -1548,26 +1695,26 @@ Connection should be gracefully closed by sending a "close_notify" to the peer and waiting for a corresponding "close_notify" before delivering the "Closed" event. Abort: Calling "Abort" on a TLS Connection indicates that the Connection should be immediately closed by sending a "close_notify", optionally preceded by "user_canceled", to the peer. Implementations do not need to wait to receive "close_notify" before delivering the "Closed" event. -9.4. DTLS +10.4. DTLS - DTLS follows the same behavior as TLS (Section 9.3), with the notable - exception of not inheriting behavior directly from TCP. Differences - from TLS are detailed below, and all cases not explicitly mentioned - should be considered the same as TLS. + DTLS follows the same behavior as TLS (Section 10.3), with the + notable exception of not inheriting behavior directly from TCP. + Differences from TLS are detailed below, and all cases not explicitly + mentioned should be considered the same as TLS. Connectedness: Connected Data Unit: Datagram Connection Object: Connection objects represent a single DTLS connection running over a set of UDP ports between two hosts. Initiate: Calling "Initiate" on a DTLS Connection causes it to reserve a UDP local port, and begin sending handshake messages to the peer @@ -1578,21 +1725,21 @@ and keys have been established to encrypt application data. Send: Sending over DTLS does preserve message boundaries in the same way that UDP datagrams do. Marking a Message as Final does send a "close_notify" like TLS. Receive: Receiving over DTLS delivers one decrypted Message for each received DTLS datagram. If a "close_notify" is received, a Message will be delivered that is marked as Final. -9.5. HTTP +10.5. HTTP HTTP requests and responses map naturally into Messages, since they are delineated chunks of data with metadata that can be sent over a transport.
To that end, HTTP can be seen as the most prevalent framing protocol that runs on top of streams like TCP, TLS, etc. In order to use a transport Connection that provides HTTP Message support, the establishment and closing of the connection can be treated as it would be without the framing protocol. Sending and receiving of Messages, however, changes to treat each Message as a @@ -1623,21 +1771,21 @@ Receive: HTTP Connections deliver Messages in which HTTP header values are attached to MessageContexts and HTTP bodies are carried in Message data. Close: Calling "Close" on an HTTP Connection will only close the underlying TLS or TCP connection if the HTTP version does not support multiplexing. For HTTP/2, for example, closing the connection only closes a specific stream. -9.6. QUIC +10.6. QUIC QUIC provides a multi-streaming interface to an encrypted transport. Each stream can be viewed as equivalent to a TLS stream over TCP, so a natural mapping is to present each QUIC stream as an individual Connection. The protocol for the stream will be considered Ready whenever the underlying QUIC connection is established to the point that this stream's data can be sent. For streams after the first stream, this will likely be an immediate operation. Closing a single QUIC stream, presented to the application as a @@ -1642,156 +1790,230 @@ Closing a single QUIC stream, presented to the application as a Connection, does not imply closing the underlying QUIC connection itself. Rather, the implementation may choose to close the QUIC connection once all streams have been closed (often after some timeout), or after an individual stream Connection sends an Abort. Connectedness: Multiplexing Connected Data Unit: Stream + Connection Object: Connection objects represent a single QUIC stream on a QUIC connection. -9.7. HTTP/2 transport +10.7. HTTP/2 transport - Similar to QUIC (Section 9.6), HTTP/2 provides a multi-streaming + Similar to QUIC (Section 10.6), HTTP/2 provides a multi-streaming interface.
This will generally use HTTP as the unit of Messages over the streams, in which each stream can be represented as a transport Connection. The lifetime of streams and the HTTP/2 connection should be managed as described for QUIC. It is possible to treat each HTTP/2 stream as a raw byte-stream instead of a carrier for HTTP messages, in which case the Messages over the streams can be represented similarly to the TCP stream (one - Message per direction, see Section 9.1). + Message per direction, see Section 10.1). Connectedness: Multiplexing Connected Data Unit: Stream Connection Object: Connection objects represent a single HTTP/2 stream on an HTTP/2 connection. -9.8. SCTP +10.8. SCTP - To support sender-side stream schedulers (which are implemented on - the sender side), a receiver-side Transport System should always - support message interleaving [RFC8260]. + Connectedness: Connected - SCTP messages can be very large. To allow the reception of large - messages in pieces, a "partial flag" can be used to inform a (native - SCTP) receiving application that a message is incomplete. After - receiving the "partial flag", this application would know that the - next receive calls will only deliver remaining parts of the same - message (i.e., no messages or partial messages will arrive on other - streams until the message is complete) (see Section 8.1.20 in - [RFC6458]). The "partial flag" can therefore facilitate the - implementation of the receiver buffer in the receiving application, - at the cost of limiting multiplexing and temporarily creating head- - of-line blocking delay at the receiver. + Data Unit: Message - When a Transport System transfers a Message, it seems natural to map - the Message object to SCTP messages in order to support properties - such as "Ordered" or "Lifetime" (which maps onto partially reliable - delivery with a SCTP_PR_SCTP_TTL policy [RFC6458]).
However, since - multiplexing of Connections onto SCTP streams may happen, and would - be hidden from the application, the Transport System requires a per- - stream receiver buffer anyway, so this potential benefit is lost and - the "partial flag" becomes unnecessary for the system. + API mappings for SCTP are as follows: - The problem of long messages either requiring large receiver-side - buffers or getting in the way of multiplexing is addressed by message - interleaving [RFC8260], which is yet another reason why a receivers- - side transport system supporting SCTP should implement this - mechanism. + Connection Object: Connection objects represent a flow of SCTP + messages between a client and a server, which may be an SCTP + association or a stream in an SCTP association. How to map + Connection objects to streams is described in [NEAT-flow-mapping]; + in the following, a similar method is described. To map + Connection objects to SCTP streams without head-of-line blocking + on the sender side, both the sending and receiving SCTP + implementations must support message interleaving [RFC8260]. Both + SCTP implementations must also support stream reconfiguration. + Finally, both communicating endpoints must be aware of this + intended multiplexing; [NEAT-flow-mapping] describes a way for a + Transport System to negotiate the stream mapping capability using + SCTP's adaptation layer indication, such that this functionality + would only take effect if both sides are aware of it. The + first flow, for which the SCTP association has been created, will + always use stream id zero. All additional flows are assigned to + unused stream ids in increasing order. To avoid a conflict when both + endpoints map new flows simultaneously, the peer which initiated + the transport connection will use even stream numbers whereas the + remote side will map its flows to odd stream numbers. Both sides + maintain a status map of the assigned stream numbers.
Generally, + new streams must consume the lowest available (even or odd, + depending on the side) stream number; this rule is relevant when + lower numbers become available because Connection objects + associated with the streams are closed. -10. IANA Considerations + Initiate: If this is the only Connection object that is assigned to + the SCTP association or stream mapping has not been negotiated, + CONNECT.SCTP is called. Else, a new stream is used: if there are + enough streams available, "Initiate" is just a local operation + that assigns a new stream number to the Connection object. The + number of streams is negotiated as a parameter of the prior + CONNECT.SCTP call, and it represents a trade-off between local + resource usage and the number of Connection objects that can be + mapped without requiring a reconfiguration signal. When running + out of streams, ADD_STREAM.SCTP must be called. + + InitiateWithSend: If this is the only Connection object that is + assigned to the SCTP association or stream mapping has not been + negotiated, CONNECT.SCTP is called with the "user message" + parameter. Else, a new stream is used (see "Initiate" for how to + handle running out of streams), and this just sends the first + message on a new stream. + + Ready: "Initiate" or "InitiateWithSend" returns without an error, + i.e., SCTP's four-way handshake has completed. If an association + with the peer already exists, and stream mapping has been + negotiated and enough streams are available, a Connection Object + instantly becomes Ready after calling "Initiate" or + "InitiateWithSend". + + InitiateError: Failure of CONNECT.SCTP. + + ConnectionError: TIMEOUT.SCTP or ABORT-EVENT.SCTP. + + Listen: LISTEN.SCTP. If an association with the peer already exists + and stream mapping has been negotiated, "Listen" just expects to + receive a new message on a new stream id (chosen in accordance + with the stream number assignment procedure described above).
+ + ConnectionReceived: LISTEN.SCTP returns without an error (a result + of successful CONNECT.SCTP from the peer), or, in case of stream + mapping, the first message has arrived on a new stream (in this + case, "Receive" is also invoked). + + Clone: Calling "Clone" on an SCTP association creates a new + Connection object and assigns it a new stream number in accordance + with the stream number assignment procedure described above. If + there are not enough streams available, ADD_STREAM.SCTP must be + called. + + Priority (Connection): When this value is changed, or a Message with + Message Property "Priority" is sent, and there are multiple + Connection objects assigned to the same SCTP association, + CONFIGURE_STREAM_SCHEDULER.SCTP is called to adjust the priorities + of streams in the SCTP association. + + Send: SEND.SCTP. Message Properties such as "Lifetime" and + "Ordered" map to parameters of this primitive. + + Receive: RECEIVE.SCTP. The "partial flag" of RECEIVE.SCTP invokes a + "ReceivedPartial" event. + + Close: If this is the only Connection object that is assigned to the + SCTP association, CLOSE.SCTP is called. Else, the Connection object + is one out of several Connection objects that are assigned to the + same SCTP association, and RESET_STREAM.SCTP must be called, which + informs the peer that the stream will no longer be used for mapping + and can be used by future "Initiate", "InitiateWithSend" or "Listen" + calls. At the peer, the event RESET_STREAM-EVENT.SCTP will fire, + which the peer must answer by issuing RESET_STREAM.SCTP too. The + resulting local RESET_STREAM-EVENT.SCTP informs the transport system + that the stream number can now be re-used by the next "Initiate", + "InitiateWithSend" or "Listen" calls. + + Abort: If this is the only Connection object that is assigned to the + SCTP association, ABORT.SCTP is called.
Else, the Connection object + is one out of several Connection objects that are assigned to the + same SCTP association, and shutdown proceeds as described under + "Close". + +11. IANA Considerations RFC-EDITOR: Please remove this section before publication. This document has no actions for IANA. -11. Security Considerations +12. Security Considerations -11.1. Considerations for Candidate Gathering +12.1. Considerations for Candidate Gathering Implementations should avoid downgrade attacks that allow network interference to cause the implementation to select less secure, or entirely insecure, combinations of paths and protocols. -11.2. Considerations for Candidate Racing +12.2. Considerations for Candidate Racing - See Section 5.2 for security considerations around racing with 0-RTT + See Section 5.3 for security considerations around racing with 0-RTT data. An attacker that knows a particular device is racing several options during connection establishment may be able to block packets for the first connection attempt, thus inducing the device to fall back to a secondary attempt. This is a problem if the secondary attempts have worse security properties that enable further attacks. Implementations should ensure that all options have equivalent security properties to avoid incentivizing attacks. Since results from the network can determine how a connection attempt tree is built, such as when DNS returns a list of resolved endpoints, it is possible for the network to cause an implementation to consume significant on-device resources. Implementations should limit the maximum amount of state allowed for any given node, including the number of child nodes, especially when the state is based on results from the network. -12. Acknowledgements +13. Acknowledgements This work has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 644334 (NEAT).
This work has been supported by Leibniz Prize project funds of DFG - German Research Foundation: Gottfried Wilhelm Leibniz-Preis 2011 (FKZ FE 570/4-1). This work has been supported by the UK Engineering and Physical Sciences Research Council under grant EP/R04144X/1. + This work has been supported by the Research Council of Norway under + its "Toppforsk" programme through the "OCARINA" project. + Thanks to Stuart Cheshire, Josh Graessley, David Schinazi, and Eric Kinnear for their implementation and design efforts, including Happy Eyeballs, that heavily influenced this work. -13. References +14. References -13.1. Normative References +14.1. Normative References [I-D.ietf-taps-arch] Pauly, T., Trammell, B., Brunstrom, A., Fairhurst, G., Perkins, C., Tiesel, P., and C. Wood, "An Architecture for - Transport Services", draft-ietf-taps-arch-03 (work in - progress), March 2019. + Transport Services", draft-ietf-taps-arch-04 (work in + progress), July 2019. [I-D.ietf-taps-interface] Trammell, B., Welzl, M., Enghardt, T., Fairhurst, G., - Kuehlewind, M., Perkins, C., Tiesel, P., and C. Wood, "An - Abstract Application Layer Interface to Transport - Services", draft-ietf-taps-interface-03 (work in - progress), March 2019. + Kuehlewind, M., Perkins, C., Tiesel, P., Wood, C., and T. + Pauly, "An Abstract Application Layer Interface to + Transport Services", draft-ietf-taps-interface-04 (work in + progress), July 2019. [I-D.ietf-taps-minset] Welzl, M. and S. Gjessing, "A Minimal Set of Transport Services for End Systems", draft-ietf-taps-minset-11 (work in progress), September 2018. - [RFC6458] Stewart, R., Tuexen, M., Poon, K., Lei, P., and V. - Yasevich, "Sockets API Extensions for the Stream Control - Transmission Protocol (SCTP)", RFC 6458, - DOI 10.17487/RFC6458, December 2011, - . - [RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014, . [RFC7540] Belshe, M., Peon, R., and M. 
Thomson, Ed., "Hypertext Transfer Protocol Version 2 (HTTP/2)", RFC 7540, DOI 10.17487/RFC7540, May 2015, . [RFC8260] Stewart, R., Tuexen, M., Loreto, S., and R. Seggelmann, @@ -1812,80 +2034,168 @@ [RFC8305] Schinazi, D. and T. Pauly, "Happy Eyeballs Version 2: Better Connectivity Using Concurrency", RFC 8305, DOI 10.17487/RFC8305, December 2017, . [RFC8446] Rescorla, E., "The Transport Layer Security (TLS) Protocol Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018, . -13.2. Informative References +14.2. Informative References [I-D.ietf-quic-transport] Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed - and Secure Transport", draft-ietf-quic-transport-20 (work - in progress), April 2019. + and Secure Transport", draft-ietf-quic-transport-23 (work + in progress), September 2019. [NEAT-flow-mapping] "Transparent Flow Mapping for NEAT (in Workshop on Future of Internet Transport (FIT 2017))", n.d.. [RFC5245] Rosenberg, J., "Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal for Offer/Answer Protocols", RFC 5245, DOI 10.17487/RFC5245, April 2010, . [Trickle] "Trickle - Rate Limiting YouTube Video Streaming (ATC 2012)", n.d.. +14.3. URIs + + [1] https://developer.apple.com/documentation/network + + [2] https://github.com/NEAT-project/neat + + [3] https://www.neat-project.org + + [4] https://github.com/fg-inet/python-asyncio-taps + Appendix A. Additional Properties This appendix discusses implementation considerations for additional parameters and properties that could be used to enhance transport protocol and/or path selection, or the transmission of messages given a Protocol Stack that implements them. These are not part of the interface, and may be removed from the final document, but are presented here to support discussion within the TAPS working group as to whether they should be added to a future revision of the base specification. A.1. 
Properties Affecting Sorting of Branches In addition to the Protocol and Path Selection Properties discussed in Section 4.3, the following properties under discussion can influence branch sorting: o Bounds on Send or Receive Rate: If the application indicates a bound on the expected Send or Receive bitrate, an implementation may prefer a path that is likely to provide the desired bandwidth, - based on cached maximum throughput, see Section 8.2. The + based on cached maximum throughput, see Section 9.2. The application may know the Send or Receive Bitrate from metadata in adaptive HTTP streaming, such as MPEG-DASH. o Cost Preferences: If the application indicates a preference to avoid expensive paths, and some paths are associated with a monetary cost, an implementation should decrease the ranking of such paths. If the application indicates that it prohibits using expensive paths, paths that are associated with a cost should be purged from the decision tree. +Appendix B. Reasons for errors + + The Transport Services API [I-D.ietf-taps-interface] allows + several generic error types to specify a more detailed reason for + why an error occurred. This appendix lists some of the possible + reasons. + + o InvalidConfiguration: The transport properties and endpoints + provided by the application are either contradictory or + incomplete. Examples include the lack of a remote endpoint on an + active open, or the use of a multicast group address without + requesting a unidirectional receive. + + o NoCandidates: The configuration is valid, but none of the + available transport protocols can satisfy the transport properties + provided by the application. + + o ResolutionFailed: The remote or local specifier provided by the + application cannot be resolved. + + o EstablishmentFailed: The TAPS system was unable to establish a + transport-layer connection to the remote endpoint specified by the + application.
+ + o PolicyProhibited: The system policy prevents the transport system + from performing the action requested by the application. + + o NotCloneable: The protocol stack is not capable of being cloned. + + o MessageTooLarge: The message is too large for the transport + system to handle. + + o ProtocolFailed: The underlying protocol stack failed. + + o InvalidMessageProperties: The message properties either + contradict the transport properties or cannot be + satisfied by the transport system. + + o DeframingFailed: The data that was received by the underlying + protocol stack could not be deframed. + + o ConnectionAborted: The connection was aborted by the peer. + + o Timeout: A message could not be delivered before a timeout expired. + +Appendix C. Existing Implementations + + This appendix gives an overview of transport system implementations + that exist at the time of writing and are (to some degree) in + line with this document. + + o Apple's Network.framework: + + * [A very brief introduction should be added] + + * Documentation: https://developer.apple.com/documentation/network [1] + + o NEAT: + + * NEAT is the output of the European H2020 research project + "NEAT"; it is a user-space library for protocol-independent + communication on top of TCP, UDP and SCTP, with additional + features such as a policy manager. + + * Code: https://github.com/NEAT-project/neat [2] + + * NEAT project: https://www.neat-project.org [3] + + o PyTAPS: + + * A TAPS implementation based on Python asyncio, offering + protocol-independent communication to applications on top of + TCP, UDP and TLS, with support for multicast. + + * Code: https://github.com/fg-inet/python-asyncio-taps [4] + Authors' Addresses Anna Brunstrom (editor) Karlstad University Universitetsgatan 2 651 88 Karlstad Sweden Email: anna.brunstrom@kau.se + Tommy Pauly (editor) Apple Inc.
One Apple Park Way Cupertino, California 95014 United States of America Email: tpauly@apple.com Theresa Enghardt TU Berlin