--- 1/draft-ietf-taps-impl-10.txt 2022-01-09 13:13:13.675236288 -0800 +++ 2/draft-ietf-taps-impl-11.txt 2022-01-09 13:13:13.783238983 -0800 @@ -1,31 +1,25 @@ TAPS Working Group A. Brunstrom, Ed. Internet-Draft Karlstad University Intended status: Informational T. Pauly, Ed. -Expires: 13 January 2022 Apple Inc. +Expires: 13 July 2022 Apple Inc. T. Enghardt Netflix - K-J. Grinnemo - Karlstad University - T. Jones - University of Aberdeen P. Tiesel SAP SE - C. Perkins - University of Glasgow M. Welzl University of Oslo - 12 July 2021 + 9 January 2022 Implementing Interfaces to Transport Services - draft-ietf-taps-impl-10 + draft-ietf-taps-impl-11 Abstract The Transport Services system enables applications to use transport protocols flexibly for network communication and defines a protocol- independent Transport Services Application Programming Interface (API) that is based on an asynchronous, event-driven interaction pattern. This document serves as a guide to implementation on how to build such a system. @@ -37,35 +31,35 @@ Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on 13 January 2022. + This Internet-Draft will expire on 13 July 2022. Copyright Notice - Copyright (c) 2021 IETF Trust and the persons identified as the + Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved. 
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/ license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components - extracted from this document must include Simplified BSD License text - as described in Section 4.e of the Trust Legal Provisions and are - provided without warranty as described in the Simplified BSD License. + extracted from this document must include Revised BSD License text as + described in Section 4.e of the Trust Legal Provisions and are + provided without warranty as described in the Revised BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Implementing Connection Objects . . . . . . . . . . . . . . . 4 3. Implementing Pre-Establishment . . . . . . . . . . . . . . . 5 3.1. Configuration-time errors . . . . . . . . . . . . . . . . 5 3.2. Role of system policy . . . . . . . . . . . . . . . . . . 6 4. Implementing Connection Establishment . . . . . . . . . . . . 7 4.1. Structuring Candidates as a Tree . . . . . . . . . . . . 8 @@ -87,59 +81,58 @@ 4.7.2. Implementing listeners for Connectionless Protocols . . . . . . . . . . . . . . . . . . . . . . 22 4.7.3. Implementing listeners for Multiplexed Protocols . . 22 5. Implementing Sending and Receiving Data . . . . . . . . . . . 23 5.1. Sending Messages . . . . . . . . . . . . . . . . . . . . 23 5.1.1. Message Properties . . . . . . . . . . . . . . . . . 23 5.1.2. Send Completion . . . . . . . . . . . . . . . . . . . 25 5.1.3. Batching Sends . . . . . . . . . . . . . . . . . . . 25 5.2. Receiving Messages . . . . . . . . . . . . . . . . . . . 25 5.3. Handling of data for fast-open protocols . . . . . . . . 26 - 6. Implementing Message Framers . . . . . . . . . . . . . . . . 27 6.1. Defining Message Framers . . . . . . . . . . . . . . . . 
28 6.2. Sender-side Message Framing . . . . . . . . . . . . . . . 29 - 6.3. Receiver-side Message Framing . . . . . . . . . . . . . . 30 - 7. Implementing Connection Management . . . . . . . . . . . . . 31 + 6.3. Receiver-side Message Framing . . . . . . . . . . . . . . 29 + 7. Implementing Connection Management . . . . . . . . . . . . . 30 7.1. Pooled Connection . . . . . . . . . . . . . . . . . . . . 31 - 7.2. Handling Path Changes . . . . . . . . . . . . . . . . . . 32 + 7.2. Handling Path Changes . . . . . . . . . . . . . . . . . . 31 8. Implementing Connection Termination . . . . . . . . . . . . . 33 9. Cached State . . . . . . . . . . . . . . . . . . . . . . . . 34 9.1. Protocol state caches . . . . . . . . . . . . . . . . . . 34 9.2. Performance caches . . . . . . . . . . . . . . . . . . . 35 10. Specific Transport Protocol Considerations . . . . . . . . . 36 10.1. TCP . . . . . . . . . . . . . . . . . . . . . . . . . . 37 10.2. MPTCP . . . . . . . . . . . . . . . . . . . . . . . . . 39 10.3. UDP . . . . . . . . . . . . . . . . . . . . . . . . . . 39 - 10.4. UDP-Lite . . . . . . . . . . . . . . . . . . . . . . . . 41 - 10.5. UDP Multicast Receive . . . . . . . . . . . . . . . . . 41 + 10.4. UDP-Lite . . . . . . . . . . . . . . . . . . . . . . . . 40 + 10.5. UDP Multicast Receive . . . . . . . . . . . . . . . . . 40 10.6. SCTP . . . . . . . . . . . . . . . . . . . . . . . . . . 42 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 45 12. Security Considerations . . . . . . . . . . . . . . . . . . . 45 12.1. Considerations for Candidate Gathering . . . . . . . . . 45 12.2. Considerations for Candidate Racing . . . . . . . . . . 45 13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 46 14. References . . . . . . . . . . . . . . . . . . . . . . . . . 46 14.1. Normative References . . . . . . . . . . . . . . . . . . 46 14.2. Informative References . . . . . . . . . . . . . . . . . 47 Appendix A. API Mapping Template . . . . . . . . . . 
. . . . . . 49 Appendix B. Additional Properties . . . . . . . . . . . . . . . 50 B.1. Properties Affecting Sorting of Branches . . . . . . . . 50 Appendix C. Reasons for errors . . . . . . . . . . . . . . . . . 51 Appendix D. Existing Implementations . . . . . . . . . . . . . . 52 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 52 1. Introduction The Transport Services architecture [I-D.ietf-taps-arch] defines a - system that allows applications to use transport networking protocols - flexibly. The interface such a system exposes to applications is + system that allows applications to flexibly use transport networking + protocols. The API that such a system exposes to applications is defined as the Transport Services API [I-D.ietf-taps-interface]. This API is designed to be generic across multiple transport protocols and sets of protocols features. This document serves as a guide to implementation on how to build a system that provides a Transport Services API. It is the job of an implementation of a Transport Services system to turn the requests of an application into decisions on how to establish connections, and how to transfer data over those connections once established. The terminology used in this document is based on the Architecture @@ -170,30 +163,36 @@ specific transport protocol instance, since multiple candidate Protocol Stacks might be raced. Once a Preconnection has been used to create an outbound Connection or a Listener, the implementation should ensure that the copy of the properties held by the Connection or Listener is not affected when the application makes changes to a Preconnection object. This may involve the implementation performing a deep-copy, copying the object with all the objects that it references. - Once the Connection is established, its interface maps actions and - events to the details of the chosen Protocol Stack. 
For example, the - same Connection object may ultimately represent a single instance of - one transport protocol (e.g., a TCP connection, a TLS session over - TCP, a UDP flow with fully-specified Local and Remote Endpoints, a - DTLS session, a SCTP stream, a QUIC stream, or an HTTP/2 stream). - The properties held by a Connection or Listener is independent of - other connections that are not part of the same Connection Group. + Once the Connection is established, Transport Services implementation + maps actions and events to the details of the chosen Protocol Stack. + For example, the same Connection object may ultimately represent a + single instance of one transport protocol (e.g., a TCP connection, a + TLS session over TCP, a UDP flow with fully-specified Local and + Remote Endpoints, a DTLS session, a SCTP stream, a QUIC stream, or an + HTTP/2 stream). The properties held by a Connection or Listener is + independent of other connections that are not part of the same + Connection Group. - Once Initate has been called, the Selection Properties and Endpoint + Connection establishment is only a local operation for a Datagram + transport (e.g., UDP(-Lite)), which serves to simplify the local + send/receive functions and to filter the traffic for the specified + addresses and ports [RFC8085]. + + Once Initiate has been called, the Selection Properties and Endpoint information are immutable (i.e, an application is not able to later modify Selection Properties on the original Preconnection object). Listener objects are created with a Preconnection, at which point their configuration should be considered immutable by the implementation. The process of listening is described in Section 4.7. 3. Implementing Pre-Establishment During pre-establishment the application specifies one or more @@ -216,27 +215,27 @@ The Transport Services system should have a list of supported protocols available, which each have transport features reflecting the capabilities of the protocol. 
Once an application specifies its Transport Properties, the transport system matches the required and prohibited properties against the transport features of the available protocols. In the following cases, failure should be detected during pre- establishment: - * A request by an application for Protocol Properties that include - requirements or prohibitions that cannot be satisfied by any of - the available protocols. For example, if an application requires - "Configure Reliability per Message", but no such feature is - available in any protocol the host running the transport system on - the host running the transport system this should result in an - error, e.g., when SCTP is not supported by the operating system. + * A request by an application for Protocol Properties that cannot be + satisfied by any of the available protocols. For example, if an + application requires "Configure Reliability per Message", but no + such feature is available in any protocol the host running the + transport system on the host running the transport system this + should result in an error, e.g., when SCTP is not supported by the + operating system. * A request by an application for Protocol Properties that are in conflict with each other, i.e., the required and prohibited properties cannot be satisfied by the same protocol. For example, if an application prohibits "Reliable Data Transfer" but then requires "Configure Reliability per Message", this mismatch should result in an error. To avoid allocating resources that are not finally needed, it is important that configuration-time errors fail as early as possible. @@ -257,30 +256,30 @@ interfaces, supported transport protocols, and current/previous Connections. Examples of ways to externally retrieve policy- support information are through OS-specific statistics/ measurement tools and tools that reside on middleboxes and routers. 3. Default implementation policy, i.e., predefined policy by OS or application. 
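The policy sources listed above can be treated as independent filters over candidate paths and protocols. The following is a minimal sketch in Python, not part of the Transport Services API. The names is_eligible, application_preference, system_policy, and implementation_default, the dictionary keys, and the interface labels are all hypothetical and chosen only for illustration. A candidate is considered usable only when no source rejects it.

```python
from typing import Callable, Dict, Iterable

# Hypothetical description of a candidate path or protocol.
Candidate = Dict[str, object]
# A policy source returns True when it allows the candidate.
PolicySource = Callable[[Candidate], bool]


def is_eligible(candidate: Candidate, sources: Iterable[PolicySource]) -> bool:
    """A candidate is usable only if every policy source allows it."""
    return all(allows(candidate) for allows in sources)


def application_preference(c: Candidate) -> bool:
    # Assumed example: the application prohibits metered interfaces.
    return not c.get("metered", False)


def system_policy(c: Candidate) -> bool:
    # Assumed example: the system currently blocks one named interface.
    return c.get("interface") not in {"cellular-roaming"}


def implementation_default(c: Candidate) -> bool:
    # Assumed example: restricted interfaces are off unless the
    # application explicitly requested them.
    return not c.get("restricted", False) or bool(c.get("requested", False))
```

Because every source must agree, an interface that the implementation disallows by default becomes usable only when the application requests it and the system policy also allows it.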
In general, any protocol or path used for a connection must conform - to all three sources of constraints. A violation of any of the - layers should cause a protocol or path to be considered ineligible - for use. For an example of application preferences leading to - constraints, an application may prohibit the use of metered network - interfaces for a given Connection to avoid user cost. Similarly, the - system policy at a given time may prohibit the use of such a metered - network interface from the application's process. Lastly, the - implementation itself may default to disallowing certain network - interfaces unless explicitly requested by the application and allowed - by the system. + to all three sources of constraints. A violation that occurs at any + of the policy layers should cause a protocol or path to be considered + ineligible for use. For an example of application preferences + leading to constraints, an application may prohibit the use of + metered network interfaces for a given Connection to avoid user cost. + Similarly, the system policy at a given time may prohibit the use of + such a metered network interface from the application's process. + Lastly, the implementation itself may default to disallowing certain + network interfaces unless explicitly requested by the application and + allowed by the system. It is expected that the database of system policies and the method of looking up these policies will vary across various platforms. An implementation should attempt to look up the relevant policies for the system in a dynamic way to make sure it is reflecting an accurate version of the system policy, since the system's policy regarding the application's traffic may change over time due to user or administrative changes. 4. Implementing Connection Establishment @@ -346,21 +345,21 @@ criteria. 4.1. 
Structuring Candidates as a Tree As noted above, the consideration of multiple candidates in a gathering and racing process can be conceptually structured as a tree; this terminological convention is used throughout this document. Each leaf node of the tree represents a single, coherent connection - attempt, with an Endpoint, a Path, and a set of protocols that can + attempt, with an endpoint, a path, and a set of protocols that can directly negotiate and send data on the network. Each node in the tree that is not a leaf represents a connection attempt that is either underspecified, or else includes multiple distinct options. For example, when connecting on an IP network, a connection attempt to a hostname and port is underspecified, because the connection attempt requires a resolved IP address as its Remote Endpoint. In this case, the node represented by the connection attempt to the hostname is a parent node, with child nodes for each IP address. Similarly, an implementation that is allowed to connect using multiple interfaces will have a parent node of the tree for the @@ -383,21 +382,21 @@ | 192.0.2.1:80/Wi-Fi | | 192.0.2.1:80/LTE | | 2001:DB8::1.80/LTE | +====================+ +====================+ +======================+ The rest of this section will use a notation scheme to represent this tree. The parent (or trunk) node of the tree will be represented by a single integer, such as "1". Each child of that node will have an integer that identifies it, from 1 to the number of children. That child node will be uniquely identified by concatenating its integer to its parent's identifier with a dot in between, such as "1.1" and "1.2". Each node will be summarized by a tuple of three elements: - Endpoint, Path, and Protocol. The above example can now be written + endpoint, path, and protocol. 
The above example can now be written more succinctly as: 1 [www.example.com:80, Any, TCP] 1.1 [www.example.com:80, Wi-Fi, TCP] 1.1.1 [192.0.2.1:80, Wi-Fi, TCP] 1.2 [www.example.com:80, LTE, TCP] 1.2.1 [192.0.2.1:80, LTE, TCP] 1.2.2 [2001:DB8::1.80, LTE, TCP] When an implementation views this aggregate set of connection @@ -720,22 +719,22 @@ Resolving the Remote Endpoint is not a local operation. It will involve a directory service, and can require communication with the Remote Endpoint to rendezvous and exchange peer addresses. This can expose some or all of the candidate Local Endpoints to the Remote Endpoint. 4.3. Candidate Racing The primary goal of the Candidate Racing process is to successfully - negotiate a protocol stack to an endpoint over an interface--to - connect a single leaf node of the tree--with as little delay and as + negotiate a protocol stack to an endpoint over an interface to + connect a single leaf node of the tree with as little delay and as few unnecessary connection attempts as possible. Optimizing these two factors improves the user experience, while minimizing network load. This section covers the dynamic aspect of connection establishment. The tree described above is a useful conceptual and architectural model. However, an implementation is unable to know the full tree before it is formed and many of the possible branches ultimately might not be used. @@ -897,35 +896,36 @@ transport system should notify the application with an InitiateError event. An InitiateError event should also be generated in case the transport system finds no usable candidates to race. 4.5. Establishing multiplexed connections Multiplexing several Connections over a single underlying transport connection requires that the Connections to be multiplexed belong to the same Connection Group (as is indicated by the application using the Clone call). 
When the underlying transport connection supports - multi-streaming, the Transport System can map each Connection in the - Connection Group to a different stream. Thus, when the Connections - that are offered to an application by the Transport System are - multiplexed, the Transport System may implement the establishment of - a new Connection by simply beginning to use a new stream of an - already established transport connection and there is no need for a - connection establishment procedure. This, then, also means that - there may not be any "establishment" message (like a TCP SYN), but - the application can simply start sending or receiving. Therefore, - when the Initiate action of a Transport System is called without - Messages being handed over, it cannot be guaranteed that the other - endpoint will have any way to know about this, and hence a passive - endpoint's ConnectionReceived event may not be called upon an active - endpoint's Inititate. Instead, calling the ConnectionReceived event - may be delayed until the first Message arrives. + multi-streaming, the Transport Services System can map each + Connection in the Connection Group to a different stream. Thus, when + the Connections that are offered to an application by the Transport + Services API are multiplexed, the Transport Services implementation + can establish a new Connection by simply beginning to use a new + stream of an already established transport Connection and there is no + need for a connection establishment procedure. This, then, also + means that there may not be any "establishment" message (like a TCP + SYN), but the application can simply start sending or receiving. + Therefore, when the Initiate action of a Transport Services API is + called without Messages being handed over, it cannot be guaranteed + that the Remote Endpoint will have any way to know about this, and + hence a passive endpoint's ConnectionReceived event might not be + called until data is received. 
Instead, calling the + ConnectionReceived event could be delayed until the first Message + arrives. 4.6. Handling connectionless protocols While protocols that use an explicit handshake to validate a Connection to a peer can be used for racing multiple establishment attempts in parallel, connectionless protocols such as raw UDP do not offer a way to validate the presence of a peer or the usability of a Connection without application feedback. An implementation should consider such a protocol stack to be established as soon as the Transport Services system has selected a path on which to send data. @@ -1059,133 +1059,133 @@ that are Safely Replayable. When a transport system is permitted to replay messages, replay protection could be provided by the application. * Final: when this is true, this means that the sender will not send any further messages. The Connection need not be closed (in case the Protocol Stack supports half-close operation, like TCP). Any messages sent after a Final message will result in a SendError. * Corruption Protection Length: when this is set to any value other - than "Full Coverage", it sets the minimum protection in protocols + than Full Coverage, it sets the minimum protection in protocols that allow limiting the checksum length (e.g. UDP-Lite). If the protocol stack does not support checksum length limitation, this property may be ignored. * Reliable Data Transfer (Message): When true, the property specifies that the Message must be reliably transmitted. When false, and if unreliable transmission is supported by the underlying protocol, then the Message should be unreliably transmitted. If the underlying protocol does not support unreliable transmission, the Message should be reliably transmitted. * Message Capacity Profile Override: When true, this expresses a - wish to override the Generic Connection Property "Capacity - Profile" for this Message. 
Depending on the value, this can, for - example, be implemented by changing the DSCP value of the - associated packet (note that the guidelines in Section 6 of - [RFC7657] apply; e.g., the DSCP value should not be changed for - different packets within a reliable transport protocol session or - DCCP connection). + wish to override the Generic Connection Property Capacity Profile + for this Message. Depending on the value, this can, for example, + be implemented by changing the DSCP value of the associated packet + (note that the guidelines in Section 6 of [RFC7657] apply; e.g., + the DSCP value should not be changed for different packets within + a reliable transport protocol session or DCCP connection). * No Fragmentation: When set, this property limits the message size to the Maximum Message Size Before Fragmentation or Segmentation (see Section 10.1.7 of [I-D.ietf-taps-interface]). Messages larger than this size generate an error. Setting this avoids transport-layer segmentation or network-layer fragmentation. When used with transports running over IP version 4 the Don't Fragment bit will be set to avoid on-path IP fragmentation ([RFC8304]). 5.1.2. Send Completion The application should be notified whenever a Message or partial Message has been consumed by the Protocol Stack, or has failed to send. The time at which a Message is considered to have been consumed by the Protocol Stack may vary depending on the protocol. For example, for a basic datagram protocol like UDP, this may correspond to the time when the packet is sent into the interface driver. For a protocol that buffers data in queues, like TCP, this may correspond to when the data has entered the send buffer. The - time at which a message has failed to send is after the Protocol - Stack or the Transport Services implementation itself has not - successfully sent the entire Message content or partial Message - content on any open candidate connection; this may depend on - protocol-specific timeouts. 
+ time at which a message failed to send is when Transport Services + implementation (including the Protocol Stack) has not successfully + sent the entire Message content or partial Message content on any + open candidate connection; this can depend on protocol-specific + timeouts. 5.1.3. Batching Sends Since sending a Message may involve a context switch between the - application and the transport system, sending patterns that involve - multiple small Messages can incur high overhead if each needs to be - enqueued separately. To avoid this, the application can indicate a - batch of Send actions through the API. When this is used, the - implementation can defer the processing of Messages until the batch - is complete. + application and the Transport Services system, sending patterns that + involve multiple small Messages can incur high overhead if each needs + to be enqueued separately. To avoid this, the application can + indicate a batch of Send actions through the API. When this is used, + the implementation can defer the processing of Messages until the + batch is complete. 5.2. Receiving Messages Similar to sending, Receiving a Message is determined by the top- level protocol in the established Protocol Stack. The main difference with Receiving is that the size and boundaries of the Message are not known beforehand. The application can communicate in its Receive action the parameters for the Message, which can help the - implementation know how much data to deliver and when. For example, - if the application only wants to receive a complete Message, the - implementation should wait until an entire Message (datagram, stream, - or frame) is read before delivering any Message content to the - application. This requires the implementation to understand where - messages end, either via a supplied deframer or because the top-level - protocol in the established Protocol Stack preserves message - boundaries. 
If the top-level protocol only supports a byte-stream - and no framers were supported, the application can control the flow - of received data by specifying the minimum number of bytes of Message - content it wants to receive at one time. + Transport Services implementation know how much data to deliver and + when. For example, if the application only wants to receive a + complete Message, the implementation should wait until an entire + Message (datagram, stream, or frame) is read before delivering any + Message content to the application. This requires the implementation + to understand where messages end, either via a supplied deframer or + because the top-level protocol in the established Protocol Stack + preserves message boundaries. If the top-level protocol only + supports a byte-stream and no framers were supported, the application + can control the flow of received data by specifying the minimum + number of bytes of Message content it wants to receive at one time. If a Connection finishes before a requested Receive action can be - satisfied, the implementation should deliver any partial Message - content outstanding, or if none is available, an indication that - there will be no more received Messages. + satisfied, the Transport Services API should deliver any partial + Message content outstanding, or if none is available, an indication + that there will be no more received Messages. 5.3. Handling of data for fast-open protocols Several protocols allow sending higher-level protocol or application data during their protocol establishment, such as TCP Fast Open [RFC7413] and TLS 1.3 [RFC8446]. This approach is referred to as sending Zero-RTT (0-RTT) data. This is a desirable feature, but poses challenges to an implementation that uses racing during connection establishment. 
The amount of data that can be sent as 0-RTT data varies by protocol - and can be queried by the application using the "Maximum Message Size - Concurrent with Connection Establishment" Connection Property. An + and can be queried by the application using the Maximum Message Size + Concurrent with Connection Establishment Connection Property. An implementation can set this property according to the protocols that it will race based on the given Selection Properties when the application requests to establish a connection. If the application has 0-RTT data to send in any protocol handshakes, it needs to provide this data before the handshakes have begun. When racing, this means that the data should be provided before the process of connection establishment has begun. If the application wants to send 0-RTT data, it must indicate this to the implementation - by setting the "Safely Replayable" send parameter to true when - sending the data. In general, 0-RTT data may be replayed (for - example, if a TCP SYN contains data, and the SYN is retransmitted, - the data will be retransmitted as well but may be considered as a new - connection instead of a retransmission). Also, when racing - connections, different leaf nodes have the opportunity to send the - same data independently. If data is truly safely replayable, this - should be permissible. + by setting the Safely Replayable send parameter to true when sending + the data. In general, 0-RTT data may be replayed (for example, if a + TCP SYN contains data, and the SYN is retransmitted, the data will be + retransmitted as well but may be considered as a new connection + instead of a retransmission). Also, when racing connections, + different leaf nodes have the opportunity to send the same data + independently. If data is truly safely replayable, this should be + permissible. 
- Once the application has provided its 0-RTT data, an implementation - should keep a copy of this data and provide it to each new leaf node - that is started and for which a 0-RTT protocol is being used. + Once the application has provided its 0-RTT data, a Transport + Services implementation should keep a copy of this data and provide + it to each new leaf node that is started and for which a 0-RTT + protocol is being used. It is also possible that protocol stacks within a particular leaf node use 0-RTT handshakes without any safely replayable application data. For example, TCP Fast Open could use a Client Hello from TLS as its 0-RTT data, shortening the cumulative handshake time. 0-RTT handshakes often rely on previous state, such as TCP Fast Open cookies, previously established TLS tickets, or out-of-band distributed pre-shared keys (PSKs). Implementations should be aware of security concerns around using these tokens across multiple @@ -1199,88 +1199,88 @@ wire. 6. Implementing Message Framers Message Framers are functions that define simple transformations between application Message data and raw transport protocol data. A Framer can encapsulate or encode outbound Messages, and decapsulate or decode inbound data into Messages. While many protocols can be represented as Message Framers, for the - purposes of the Transport Services interface these are ways for + purposes of the Transport Services API, these are ways for applications or application frameworks to define their own Message parsing to be included within a Connection's Protocol Stack. As an example, TLS is exposed as a protocol natively supported by the - Transport Services interface, even though it could also serve the - purpose of framing data over TCP. + Transport Services API, even though it could also serve the purpose + of framing data over TCP. 
Most Message Framers fall into one of two categories: * Header-prefixed record formats, such as a basic Type-Length-Value (TLV) structure * Delimiter-separated formats, such as HTTP/1.1. - Common Message Framers can be provided by the Transport Services + Common Message Framers can be provided by a Transport Services implementation, but an implementation ought to allow custom Message Framers to be defined by the application or some other piece of - software. This section describes one possible interface for defining + software. This section describes one possible API for defining Message Framers as an example. 6.1. Defining Message Framers A Message Framer is primarily defined by the code that handles events for a framer implementation, specifically how it handles inbound and outbound data parsing. The function that implements custom framing logic will be referred to as the "framer implementation", which may - be provided by the Transport Services implementation or the - application itself. The Message Framer refers to the object or - function within the main Connection implementation that delivers - events to the custom framer implementation whenever data is ready to - be parsed or framed. + be provided by a Transport Services implementation or the application + itself. The Message Framer refers to the object or function within + the main Connection implementation that delivers events to the custom + framer implementation whenever data is ready to be parsed or framed. When a Connection establishment attempt begins, an event can be delivered to notify the framer implementation that a new Connection is being created. Similarly, a stop event can be delivered when a Connection is being torn down. The framer implementation can use the Connection object to look up specific properties of the Connection or the network being used that may influence how to frame Messages. 
MessageFramer -> Start(Connection) MessageFramer -> Stop(Connection) - When a Message Framer generates a "Start" event, the framer + When a Message Framer generates a Start event, the framer implementation has the opportunity to start writing some data prior - to the Connection delivering its "Ready" event. This allows the + to the Connection delivering its Ready event. This allows the implementation to communicate control data to the Remote Endpoint that can be used to parse Messages. MessageFramer.MakeConnectionReady(Connection) - Similarly, when a Message Framer generates a "Stop" event, the framer + Similarly, when a Message Framer generates a Stop event, the framer implementation has the opportunity to write some final data or clear - up its local state before the "Closed" event is delivered to the + up its local state before the Closed event is delivered to the Application. The framer implementation can indicate that it has finished with this. MessageFramer.MakeConnectionClosed(Connection) At any time if the implementation encounters a fatal error, it can also cause the Connection to fail and provide an error. MessageFramer.FailConnection(Connection, Error) + Should the framer implementation deem the candidate selected during - racing unsuitable it can signal this by failing the Connection prior - to marking it as ready. If there are no other candidates available, - the Connection will fail. Otherwise, the Connection will select a - different candidate and the Message Framer will generate a new - "Start" event. + racing unsuitable, it can signal this to the Transport Services API + by failing the Connection prior to marking it as ready. If there are + no other candidates available, the Connection will fail. Otherwise, + the Connection will select a different candidate and the Message + Framer will generate a new Start event. Before an implementation marks a Message Framer as ready, it can also dynamically add a protocol or framer above it in the stack. 
This allows protocols that need to add TLS conditionally, like STARTTLS [RFC3207], to modify the Protocol Stack based on a handshake result. otherFramer := NewMessageFramer() MessageFramer.PrependFramer(Connection, otherFramer) A Message Framer might also choose to go into a passthrough mode once @@ -1303,21 +1303,21 @@ Upon receiving this event, a framer implementation is responsible for performing any necessary transformations and sending the resulting data back to the Message Framer, which will in turn send it to the next protocol. Implementations SHOULD ensure that there is a way to pass the original data through without copying to improve performance. MessageFramer.Send(Connection, Data) To provide an example, a simple protocol that adds a length as a - header would receive the "NewSentMessage" event, create a data + header would receive the NewSentMessage event, create a data representation of the length of the Message data, and then send a block of data that is the concatenation of the length header and the original Message data. 6.3. Receiver-side Message Framing In order to parse a received flow of data into Messages, the Message Framer notifies the framer implementation whenever new data is available to parse. @@ -1337,91 +1336,97 @@ To deliver a Message to the application, the framer implementation can either directly deliver data that it has allocated, or deliver a range of data directly from the underlying transport and simultaneously advance the receive cursor. MessageFramer.AdvanceReceiveCursor(Connection, Length) MessageFramer.DeliverAndAdvanceReceiveCursor(Connection, MessageContext, Length, IsEndOfMessage) MessageFramer.Deliver(Connection, MessageContext, Data, IsEndOfMessage) - Note that "MessageFramer.DeliverAndAdvanceReceiveCursor" allows the + Note that MessageFramer.DeliverAndAdvanceReceiveCursor allows the framer implementation to earmark bytes as part of a Message even before they are received by the transport. 
This allows the delivery of very large Messages without requiring the implementation to directly inspect all of the bytes. To provide an example, a simple protocol that parses a length as a - header value would receive the "HandleReceivedData" event, and call - "Parse" with a minimum and maximum set to the length of the header - field. Once the parse succeeded, it would call - "AdvanceReceiveCursor" with the length of the header field, and then - call "DeliverAndAdvanceReceiveCursor" with the length of the body - that was parsed from the header, marking the new Message as complete. + header value would receive the HandleReceivedData event, and call + Parse with a minimum and maximum set to the length of the header + field. Once the parse succeeded, it would call AdvanceReceiveCursor + with the length of the header field, and then call + DeliverAndAdvanceReceiveCursor with the length of the body that was + parsed from the header, marking the new Message as complete. 7. Implementing Connection Management - Once a Connection is established, the Transport Services system - allows applications to interact with the Connection by modifying or + Once a Connection is established, the Transport Services API allows + applications to interact with the Connection by modifying or inspecting Connection Properties. A Connection can also generate events in the form of Soft Errors. The set of Connection Properties that are supported for setting and getting on a Connection are described in [I-D.ietf-taps-interface]. For any properties that are generic, and thus could apply to all - protocols being used by a Connection, the Transport System should - store the properties in storage common to all protocols, and notify - all protocol instances in the Protocol Stack whenever the properties - have been modified by the application. For protocol-specfic - properties, such as the User Timeout that applies to TCP, the - Transport System only needs to update the relevant protocol instance. 
+ protocols being used by a Connection, the Transport Services + implementation should store the properties in storage common to all + protocols, and notify all protocol instances in the Protocol Stack + whenever the properties have been modified by the application. For + protocol-specific properties, such as the User Timeout that applies to + TCP, the Transport Services implementation only needs to update the + relevant protocol instance. If an error is encountered in setting a property (for example, if the application tries to set a TCP-specific property on a Connection that is not using TCP), the action should fail gracefully. The application may be informed of the error, but the Connection itself should not be terminated. - The Transport Services implementation should allow protocol instances - in the Protocol Stack to pass up arbitrary generic or protocol- - specific errors that can be delivered to the application as Soft - Errors. These allow the application to be informed of ICMP errors, - and other similar events. + The Transport Services API should allow protocol instances in the + Protocol Stack to pass up arbitrary generic or protocol-specific + errors that can be delivered to the application as Soft Errors. + These allow the application to be informed of ICMP errors, and other + similar events. 7.1. Pooled Connection - For protocols that employ request/response pairs and do not require - in-order delivery of the responses, like HTTP, the transport - implementation may distribute interactions across several underlying - transport connections. For these kinds of protocols, implementations - may hide the connection management and only expose a single - Connection object and the individual requests/responses as messages. 
- These Pooled Connections can use multiple connections or multiple - streams of multi-streaming connections between endpoints, as long as - all of these satisfy the requirements, and prohibitions specified in - the Selection Properties of the Pooled Connection. This enables - implementations to realize transparent connection coalescing, - connection migration, and to perform per-message endpoint and path - selection by choosing among these underlying connections. + For applications that do not need in-order delivery of Messages, the + Transport Services implementation may distribute Messages of a single + Connection across several underlying transport connections or + multiple streams of multi-streaming connections between endpoints, as + long as all of these satisfy the Selection Properties. The Transport + Services implementation will then hide this connection management and + only expose a single Connection object, which we here call a "Pooled + Connection". This is in contrast to Connection Groups, which + explicitly expose combined treatment of Connections, giving the + application control over multiplexing, for example. + + Pooled Connections can be useful when the application using the + Transport Services system implements a protocol such as HTTP, which + employs request/response pairs and does not require in-order delivery + of responses. This enables implementations of Transport Services + systems to realize transparent connection coalescing, connection + migration, and to perform per-message endpoint and path selection by + choosing among multiple underlying connections. 7.2. Handling Path Changes When a path change occurs, e.g., when the IP address of an interface changes or a new interface becomes available, the Transport Services implementation is responsible for notifying the Protocol Instance of the change. 
The path change may interrupt connectivity on a path for an active connection or provide an opportunity for a transport that supports multipath or migration to adapt to the new paths. Note - that, from the Transport Services API point of view, migration is + that, in the model of the Transport Services API, migration is considered a part of multipath connectivity; it is just a limiting - policy on multipath usage. If the "multipath" Selection Property is - set to "Disabled", migration is disallowed. + policy on multipath usage. If the multipath Selection Property is + set to Disabled, migration is disallowed. For protocols that do not support multipath or migration, the Protocol Instances should be informed of the path change, but should not be forcibly disconnected if the previously used path becomes unavailable. There are many common user scenarios that can lead to a path becoming temporarily unavailable, and then recovering before the transport protocol reaches a timeout error. These are particularly common using mobile devices. Examples include: an Ethernet cable becoming unplugged and then plugged back in; a device losing a Wi-Fi signal while a user is in an elevator, and reattaching when the user @@ -1429,37 +1434,36 @@ a train through a tunnel. If the device is able to rejoin a network with the same IP address, a stateful transport connection can generally resume. Thus, while it is useful for a Protocol Instance to be aware of a temporary loss of connectivity, the Transport Services implementation should not aggressively close connections in these scenarios. If the Protocol Stack includes a transport protocol that supports multipath connectivity, the Transport Services implementation should also inform the Protocol Instance of potentially new paths that - become permissible based on the "multipath" Selection Property and - the "multipath-policy" Connection Property choices made by the - application. 
A protocol can then establish new subflows over new - paths while an active path is still available or, if migration is - supported, also after a break has been detected, and should attempt - to tear down subflows over paths that are no longer used. The - Transport Services API's Connection Property "multipath-policy" - allows an application to indicate when and how different paths should - be used. However, detailed handling of these policies is still - implementation-specific. For example, if the "multipath" Selection - Property is set to "active", the decision about when to create a new - path or to announce a new path or set of paths to the Remote - Endpoint, e.g., in the form of additional IP addresses, is - implementation-specific. If the Protocol Stack includes a transport - protocol that does not support multipath, but does support migrating - between paths, the update to the set of available paths can trigger - the connection to be migrated. + become permissible based on the multipath Selection Property and the + multipath-policy Connection Property choices made by the application. + A protocol can then establish new subflows over new paths while an + active path is still available or, if migration is supported, also + after a break has been detected, and should attempt to tear down + subflows over paths that are no longer used. The Connection Property + multipath-policy of the Transport Services API allows an application + to indicate when and how different paths should be used. However, + detailed handling of these policies is still implementation-specific. + For example, if the multipath Selection Property is set to active, + the decision about when to create a new path or to announce a new + path or set of paths to the Remote Endpoint, e.g., in the form of + additional IP addresses, is implementation-specific. 
If the Protocol + Stack includes a transport protocol that does not support multipath, + but does support migrating between paths, the update to the set of + available paths can trigger the connection to be migrated. In case of Pooled Connections (Section 7.1), the Transport Services implementation may add connections over new paths to the pool if permissible based on the multipath policy and Selection Properties. In case a previously used path becomes unavailable, the transport system may disconnect all connections that require this path, but should not disconnect the pooled connection object exposed to the application. The strategy to do so is implementation-specific, but should be consistent with the behavior of multipath transports. @@ -1474,93 +1478,94 @@ SCTP is an example of a protocol that does not support such half- closed connections. Hence, with SCTP, the meaning of "close" is stricter: an application has no more data to send (but expects all data that has been handed over to be reliably delivered), and will also not receive any more data. Implementing a protocol-independent transport system means that the exposed semantics must be the strictest subset of the semantics of all supported protocols. Hence, as is common with all reliable transport protocols, after a Close action, the application can expect - to have its reliability requirements honored regarding the data it - has given to the Transport System, but it cannot expect to be able to - read any more data after calling Close. + to have its reliability requirements honored regarding the data + provided to the Transport Services API, but it cannot expect to be + able to read any more data after calling Close. Abort differs from Close only in that no guarantees are given - regarding data that the application has handed over to the Transport - System before calling Abort. + regarding any data that the application sent to the Transport + Services API before calling Abort. 
As explained in Section 4.5, when a new stream is multiplexed on an already existing connection of a Transport Protocol Instance, there is no need for a connection establishment procedure. Because the - Connections that are offered by the Transport System can be - implemented as streams that are multiplexed on a transport protocol's - connection, it can therefore not be guaranteed that one Endpoint's - Initiate action provokes a ConnectionReceived event at its peer. + Connections that are offered by a Transport Services implementation + can be implemented as streams that are multiplexed on a transport + protocol's connection, it can therefore not be guaranteed that an Initiate + action from one endpoint provokes a ConnectionReceived event at its + peer. For Close (provoking a Finished event) and Abort (provoking a ConnectionError event), the same logic applies: while it is desirable to be informed when a peer closes or aborts a Connection, whether this is possible depends on the underlying protocol, and no guarantees can be given. With SCTP, the transport system can use the stream reset procedure to cause a Finished event upon a Close action from the peer [NEAT-flow-mapping]. 9. Cached State Beyond a single Connection's lifetime, it is useful for an implementation to keep state and history. This cached state can help improve future Connection establishment due to re-using results and credentials, and favoring paths and protocols that performed well in the past. - Cached state may be associated with different Endpoints for the same + Cached state may be associated with different endpoints for the same Connection, depending on the protocol generating the cached content. For example, session tickets for TLS are associated with specific endpoints, and thus should be cached based on a Connection's hostname - Endpoint (if applicable). On the other hand, performance - characteristics of a path are more likely tied to the IP address and - subnet being used. 
+ endpoint (if applicable). However, performance characteristics of a + path are more likely tied to the IP address and subnet being used. 9.1. Protocol state caches Some protocols will have long-term state to be cached in association - with Endpoints. This state often has some time after which it is + with endpoints. This state often has some time after which it is expired, so the implementation should allow each protocol to specify an expiration for cached content. Examples of cached protocol state include: * The DNS protocol can cache resolution answers (A and AAAA queries, for example), associated with a Time To Live (TTL) to be used for future hostname resolutions without asking the DNS resolver again. * TLS caches session state and tickets based on a hostname, which can be used for resuming sessions with a server. * TCP can cache cookies for use in TCP Fast Open. Cached protocol state is primarily used during Connection establishment for a single Protocol Stack, but may be used to influence an implementation's preference between several candidate - Protocol Stacks. For example, if two IP address Endpoints are + Protocol Stacks. For example, if two IP address endpoints are otherwise equally preferred, an implementation may choose to attempt a connection to an address for which it has a TCP Fast Open cookie. - Applications can request that a Connection Group maintain a separate - cache for protocol state. Connections in the group will not use - cached state from connections outside the group, and connections - outside the group will not use state cached from connections inside - the group. This may be necessary, for example, if application-layer - identifiers rotate and clients wish to avoid linkability via - trackable TLS tickets or TFO cookies. + Applications can use the Transport Services API to request that a + Connection Group maintain a separate cache for protocol state. 
+ Connections in the group will not use cached state from connections + outside the group, and connections outside the group will not use + state cached from connections inside the group. This may be + necessary, for example, if application-layer identifiers rotate and + clients wish to avoid linkability via trackable TLS tickets or TFO + cookies. 9.2. Performance caches In addition to protocol state, Protocol Instances should provide data into a performance-oriented cache to help guide future protocol and path selection. Some performance information can be gathered generically across several protocols to allow predictive comparisons between protocols on given paths: * Observed Round Trip Time @@ -1596,26 +1602,26 @@ Trip Time observed by TCP over a particular network path may vary over a relatively short time interval. For such values, the implementation should remove them from the cache more quickly, or treat older values with less confidence/weight. [I-D.ietf-tcpm-2140bis] provides guidance about sharing of TCP Control Block information between connections on initialization. 10. Specific Transport Protocol Considerations - Each protocol that can run as part of a Transport Services + Each protocol that is supported by a Transport Services implementation should have a well-defined API mapping. API mappings - for a protocol apply most to Connections in which the given protocol - is the "top" of the Protocol Stack. For example, the mapping of the - "Send" function for TCP applies to Connections in which the - application directly sends over TCP. + for a protocol are important for Connections in which a given + protocol is the "top" of the Protocol Stack. For example, the + mapping of the Send function for TCP applies to Connections in which + the application directly sends over TCP. Each protocol has a notion of Connectedness. Possible values for Connectedness are: * Connectionless. 
Connectionless protocols do not establish explicit state between endpoints, and do not perform a handshake during Connection establishment. * Connected. Connected protocols establish state between endpoints, and perform a handshake during Connection establishment. The @@ -1662,263 +1668,269 @@ Connectedness: Connected Data Unit: Byte-stream API mappings for TCP are as follows: Connection Object: TCP connections between two hosts map directly to Connection objects. - Initiate: CONNECT.TCP. Calling "Initiate" on a TCP Connection - causes it to reserve a local port, and send a SYN to the Remote - Endpoint. + Initiate: CONNECT.TCP. Calling Initiate on a TCP Connection causes + it to reserve a local port, and send a SYN to the Remote Endpoint. - InitiateWithSend: CONNECT.TCP with parameter "user message". Early + InitiateWithSend: CONNECT.TCP with parameter user message. Early safely replayable data is sent on a TCP Connection in the SYN, as TCP Fast Open data. Ready: A TCP Connection is ready once the three-way handshake is complete. InitiateError: Failure of CONNECT.TCP. TCP can throw various errors during connection setup. Specifically, it is important to handle a RST being sent by the peer during the handshake. ConnectionError: Once established, TCP throws errors whenever the connection is disconnected, such as due to receiving a RST from the peer. - Listen: LISTEN.TCP. Calling "Listen" for TCP binds a local port and + Listen: LISTEN.TCP. Calling Listen for TCP binds a local port and prepares it to receive inbound SYN packets from peers. ConnectionReceived: TCP Listeners will deliver new connections once they have replied to an inbound SYN with a SYN-ACK. - Clone: Calling "Clone" on a TCP Connection creates a new Connection + Clone: Calling Clone on a TCP Connection creates a new Connection with equivalent parameters. These Connections, and Connections - generated via later calls to "Clone" on one of them, form a - Connection Group. 
To realize entanglement for these Connections, - with the exception of "Connection Priority", changing a Connection - Property on one of them must affect the Connection Properties of - the others too. No guarantees of honoring the Connection Property - "Connection Priority" are given, and thus it is safe for an - implementation of a transport system to ignore this property. - When it is reasonable to assume that Connections traverse the same - path (e.g., when they share the same encapsulation), support for - it can also experimentally be implemented using a congestion - control coupling mechanism (see for example [TCP-COUPLING] or - [RFC3124]). + generated via later calls to Clone on an Established Connection, + form a Connection Group. To realize entanglement for these + Connections, with the exception of Connection Priority, changing a + Connection Property on one of them must affect the Connection + Properties of the others too. No guarantees of honoring the + Connection Property Connection Priority are given, and thus it is + safe for an implementation of a transport system to ignore this + property. When it is reasonable to assume that Connections + traverse the same path (e.g., when they share the same + encapsulation), support for it can also experimentally be + implemented using a congestion control coupling mechanism (see for + example [TCP-COUPLING] or [RFC3124]). Send: SEND.TCP. TCP does not on its own preserve Message - boundaries. Calling "Send" on a TCP connection lays out the bytes + boundaries. Calling Send on a TCP connection lays out the bytes on the TCP send stream without any other delineation. Any Message marked as Final will cause TCP to send a FIN once the Message has been completely written, by calling CLOSE.TCP immediately upon successful termination of SEND.TCP. Note that transmitting a 
Note that transmitting a - Message marked as Final should not cause the "Closed" event to be + Message marked as Final should not cause the Closed event to be delivered to the application, as it will still be possible to receive data until the peer closes or aborts the TCP connection. Receive: With RECEIVE.TCP, TCP delivers a stream of bytes without - any Message delineation. All data delivered in the "Received" or - "ReceivedPartial" event will be part of a single stream-wide - Message that is marked Final (unless a Message Framer is used). + any Message delineation. All data delivered in the Received or + ReceivedPartial event will be part of a single stream-wide Message + that is marked Final (unless a Message Framer is used). EndOfMessage will be delivered when the TCP Connection has received a FIN (CLOSE-EVENT.TCP) from the peer. Note that - reception of a FIN should not cause the "Closed" event to be + reception of a FIN should not cause the Closed event to be delivered to the application, as it will still be possible for the application to send data. - Close: Calling "Close" on a TCP Connection indicates that the + Close: Calling Close on a TCP Connection indicates that the Connection should be gracefully closed (CLOSE.TCP) by sending a FIN to the peer. It will then still be possible to receive data - until the peer closes or aborts the TCP connection. The "Closed" + until the peer closes or aborts the TCP connection. The Closed event will be issued upon reception of a FIN. - Abort: Calling "Abort" on a TCP Connection indicates that the + Abort: Calling Abort on a TCP Connection indicates that the Connection should be immediately closed by sending a RST to the peer (ABORT.TCP). 10.2. MPTCP Connectedness: Connected Data Unit: Byte-stream - API mappings for MPTCP are identical to TCP. MPTCP adds support for - multipath properties, such as "Multipath Transport" and "Policy for - using Multipath Transports". 
+ The Transport Services API mappings for MPTCP are identical to TCP. + MPTCP adds support for multipath properties, such as "Multipath + Transport" and "Policy for using Multipath Transports". 10.3. UDP Connectedness: Connectionless Data Unit: Datagram API mappings for UDP are as follows: Connection Object: UDP connections represent a pair of specific IP addresses and ports on two hosts. - Initiate: CONNECT.UDP. Calling "Initiate" on a UDP Connection - causes it to reserve a local port, but does not generate any - traffic. + Initiate: CONNECT.UDP. Calling Initiate on a UDP Connection causes + it to reserve a local port, but does not generate any traffic. InitiateWithSend: Early data on a UDP Connection does not have any special meaning. The data is sent whenever the Connection is Ready. Ready: A UDP Connection is ready once the system has reserved a local port and has a path to send to the Remote Endpoint. InitiateError: UDP Connections can only generate errors on initiation due to port conflicts on the local system. ConnectionError: Once in use, UDP throws "soft errors" (ERROR.UDP(- Lite)) upon receiving ICMP notifications indicating failures in the network. - Listen: LISTEN.UDP. Calling "Listen" for UDP binds a local port and + Listen: LISTEN.UDP. Calling Listen for UDP binds a local port and prepares it to receive inbound UDP datagrams from peers. ConnectionReceived: UDP Listeners will deliver new connections once they have received traffic from a new Remote Endpoint. - Clone: Calling "Clone" on a UDP Connection creates a new Connection + Clone: Calling Clone on a UDP Connection creates a new Connection with equivalent parameters. The two Connections are otherwise independent. - Send: SEND.UDP(-Lite). Calling "Send" on a UDP connection sends the + Send: SEND.UDP(-Lite). Calling Send on a UDP connection sends the data as the payload of a complete UDP datagram. Marking Messages as Final does not change anything in the datagram's contents. 
Upon sending a UDP datagram, some relevant fields and flags in the IP header can be controlled: DSCP (SET_DSCP.UDP(-Lite)), DF in IPv4 (SET_DF.UDP(-Lite)) and ECN flag (SET_ECN.UDP(-Lite)). Receive: RECEIVE.UDP(-Lite). UDP only delivers complete Messages to - "Received", each of which represents a single datagram received in - a UDP packet. Upon receiving a UDP datagram, the ECN flag from - the IP header can be obtained (GET_ECN.UDP(-Lite)). + Received, each of which represents a single datagram received in a + UDP packet. Upon receiving a UDP datagram, the ECN flag from the + IP header can be obtained (GET_ECN.UDP(-Lite)). - Close: Calling "Close" on a UDP Connection (ABORT.UDP(-Lite)) - releases the local port reservation. + Close: Calling Close on a UDP Connection (ABORT.UDP(-Lite)) releases + the local port reservation. - Abort: Calling "Abort" on a UDP Connection (ABORT.UDP(-Lite)) is - identical to calling "Close". + Abort: Calling Abort on a UDP Connection (ABORT.UDP(-Lite)) is + identical to calling Close. 10.4. UDP-Lite Connectedness: Connectionless Data Unit: Datagram - API mappings for UDP-Lite are identical to UDP. Properties that - require checksum coverage are not supported by UDP-Lite, such as - "Corruption Protection Length", "Full Checksum Coverage on Sending", - "Required Minimum Corruption Protection Coverage for Receiving", and - "Full Checksum Coverage on Receiving". + The Transport Services API mappings for UDP-Lite are identical to + UDP. Properties that require checksum coverage are not supported by + UDP-Lite, such as "Corruption Protection Length", "Full Checksum + Coverage on Sending", "Required Minimum Corruption Protection + Coverage for Receiving", and "Full Checksum Coverage on Receiving". 10.5. 
UDP Multicast Receive Connectedness: Connectionless Data Unit: Datagram API mappings for Receiving Multicast UDP are as follows: Connection Object: Established UDP Multicast Receive connections represent a pair of specific IP addresses and ports. The "unidirectional receive" transport property is required, and the Local Endpoint must be configured with a group IP address and a port. - Initiate: Calling "Initiate" on a UDP Multicast Receive Connection + Initiate: Calling Initiate on a UDP Multicast Receive Connection causes an immediate InitiateError. This is an unsupported operation. - InitiateWithSend: Calling "InitiateWithSend" on a UDP Multicast + InitiateWithSend: Calling InitiateWithSend on a UDP Multicast Receive Connection causes an immediate InitiateError. This is an unsupported operation. Ready: A UDP Multicast Receive Connection is ready once the system has received traffic for the appropriate group and port. InitiateError: UDP Multicast Receive Connections generate an InitiateError if Initiate is called. ConnectionError: Once in use, UDP throws "soft errors" (ERROR.UDP(- Lite)) upon receiving ICMP notifications indicating failures in the network. - Listen: LISTEN.UDP. Calling "Listen" for UDP Multicast Receive - binds a local port, prepares it to receive inbound UDP datagrams - from peers, and issues a multicast host join. If a Remote - Endpoint with an address is supplied, the join is Source-specific + Listen: LISTEN.UDP. Calling Listen for UDP Multicast Receive binds + a local port, prepares it to receive inbound UDP datagrams from + peers, and issues a multicast host join. If a Remote Endpoint + with an address is supplied, the join is Source-specific Multicast, and the path selection is based on the route to the Remote Endpoint. If a Remote Endpoint is not supplied, the join is Any-source Multicast, and the path selection is based on the outbound route to the group supplied in the Local Endpoint. 
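The multicast host join issued by Listen can be sketched with the BSD socket API. This is a minimal, IPv4-only sketch of an Any-source Multicast join and not the behavior of any particular Transport Services implementation; the function names build_membership_request and open_multicast_receiver are assumptions made for illustration.

```python
import socket
import struct


def build_membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Pack an IPv4 ip_mreq: group address followed by local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def open_multicast_receiver(group: str, port: int) -> socket.socket:
    """Bind a UDP socket to the port and join the group (Any-source Multicast)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Allow several receivers to bind the same address and port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # The host join: ask the kernel to add this socket to the group.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_membership_request(group))
    return sock
```

Passing 0.0.0.0 as the interface address leaves the choice of interface to the operating system, which roughly corresponds to path selection based on the outbound route to the group.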
+ There are cases where it is required to open multiple connections for + the same address(es). For example, one Connection might be opened + for a multicast group to serve as a multicast control bus, and another + application later opens a separate Connection to the same group to + send signals to and/or receive signals from the common bus. In such + cases, the Transport Services system needs to explicitly enable re- + use of the same set of addresses (equivalent to setting SO_REUSEADDR + in the socket API). + ConnectionReceived: UDP Multicast Receive Listeners will deliver new connections once they have received traffic from a new Remote Endpoint. - Clone: Calling "Clone" on a UDP Multicast Receive Connection creates - a new Connection with equivalent parameters. The two Connections + Clone: Calling Clone on a UDP Multicast Receive Connection creates a + new Connection with equivalent parameters. The two Connections are otherwise independent. - Send: SEND.UDP(-Lite). Calling "Send" on a UDP Multicast Receive + Send: SEND.UDP(-Lite). Calling Send on a UDP Multicast Receive connection causes an immediate SendError. This is an unsupported operation. Receive: RECEIVE.UDP(-Lite). The Receive operation in a UDP Multicast Receive connection only delivers complete Messages to - "Received", each of which represents a single datagram received in - a UDP packet. Upon receiving a UDP datagram, the ECN flag from - the IP header can be obtained (GET_ECN.UDP(-Lite)). + Received, each of which represents a single datagram received in a + UDP packet. Upon receiving a UDP datagram, the ECN flag from the + IP header can be obtained (GET_ECN.UDP(-Lite)). - Close: Calling "Close" on a UDP Multicast Receive Connection + Close: Calling Close on a UDP Multicast Receive Connection (ABORT.UDP(-Lite)) releases the local port reservation and leaves the group. - Abort: Calling "Abort" on a UDP Multicast Receive Connection - (ABORT.UDP(-Lite)) is identical to calling "Close".
+ Abort: Calling Abort on a UDP Multicast Receive Connection + (ABORT.UDP(-Lite)) is identical to calling Close. 10.6. SCTP Connectedness: Connected Data Unit: Message API mappings for SCTP are as follows: Connection Object: Connection objects can be mapped to an SCTP association or a stream in an SCTP association. Mapping Connection objects to SCTP streams is called "stream mapping" and has additional requirements as follows. The following explanation assumes a client-server communication model. Stream mapping requires an association to already be in place between the client and the server, and it requires the server to understand that a new incoming stream should be represented as a new Connection Object by the Transport Services system. A new SCTP stream is created by sending an SCTP message with a new stream id. Thus, to - implement stream mapping, the Transport Services system MUST provide - a newly created Connection Object to the application upon the - reception of such a message. The necessary semantics to implement a - Transport Services system's Close and Abort primitives are provided - by the stream reconfiguration (reset) procedure described in - [RFC6525]. This also allows a stream id to be re-used after resetting - ("closing") the stream. To implement this functionality, SCTP stream - reconfiguration [RFC6525] MUST be supported by both the client and - the server side. + implement stream mapping, the Transport Services API MUST provide a + newly created Connection Object to the application upon the reception + of such a message. The necessary semantics to implement a Transport + Services system's Close and Abort primitives are provided by the stream + reconfiguration (reset) procedure described in [RFC6525]. This also + allows a stream id to be re-used after resetting ("closing") the stream. + To implement this functionality, SCTP stream reconfiguration + [RFC6525] MUST be supported by both the client and the server side.
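The receive side of stream mapping described above can be illustrated with a minimal Python sketch. It is not part of the draft and does not use a real SCTP stack: the names Connection, StreamMapper, on_message, and on_stream_reset are hypothetical, and the sketch assumes the underlying SCTP implementation reports each incoming message together with its stream id and signals a completed stream reset separately.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Connection:
    """Hypothetical stand-in for a Transport Services Connection object."""
    stream_id: int
    received: List[bytes] = field(default_factory=list)
    closed: bool = False


class StreamMapper:
    """Maps SCTP stream ids of one association to Connection objects."""

    def __init__(self, on_connection_received: Callable[[Connection], None]):
        self._by_stream: Dict[int, Connection] = {}
        self._on_connection_received = on_connection_received

    def on_message(self, stream_id: int, payload: bytes) -> Connection:
        # A message on a stream id that is not yet mapped creates a new
        # Connection object and hands it to the application.
        conn = self._by_stream.get(stream_id)
        if conn is None:
            conn = Connection(stream_id)
            self._by_stream[stream_id] = conn
            self._on_connection_received(conn)
        conn.received.append(payload)
        return conn

    def on_stream_reset(self, stream_id: int) -> None:
        # A completed stream reset closes the mapped Connection and
        # frees the stream id so that it can be re-used later.
        conn = self._by_stream.pop(stream_id, None)
        if conn is not None:
            conn.closed = True
```

In this sketch, a second message on the same stream id is delivered to the existing Connection, while a message that arrives after a reset of that stream id yields a fresh Connection object.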
To avoid head-of-line blocking, stream mapping SHOULD only be implemented when both sides support message interleaving [RFC8260]. This allows a sender to schedule transmissions between multiple streams without risking that transmission of a large message on one stream might block transmissions on other streams for a long time. To avoid conflicts between stream ids, the following procedure is recommended: the first Connection, for which the SCTP association has been created, MUST always use stream id zero. All additional @@ -1930,94 +1942,92 @@ ids. Generally, new streams SHOULD consume the lowest available (even or odd, depending on the side) stream id; this rule is relevant when lower ids become available because Connection objects associated with the streams are closed. SCTP stream mapping as described here has been implemented in a research prototype; a description of this implementation is given in [NEAT-flow-mapping]. Initiate: If this is the only Connection object that is assigned to - the SCTP association or stream mapping is not used, CONNECT.SCTP + the SCTP Association or stream mapping is not used, CONNECT.SCTP is called. Else, unless the Selection Property - "activeReadBeforeSend" is Preferred or Required, a new stream is - used: if there are enough streams available, "Initiate" is a local + activeReadBeforeSend is Preferred or Required, a new stream is + used: if there are enough streams available, Initiate is a local operation that assigns a new stream id to the Connection object. The number of streams is negotiated as a parameter of the prior CONNECT.SCTP call, and it represents a trade-off between local resource usage and the number of Connection objects that can be mapped without requiring a reconfiguration signal. When running out of streams, ADD_STREAM.SCTP must be called.
InitiateWithSend: If this is the only Connection object that is assigned to the SCTP association or stream mapping is not used, CONNECT.SCTP is called with the "user message" parameter. Else, a - new stream is used (see "Initiate" for how to handle running out - of streams), and this just sends the first message on a new - stream. + new stream is used (see Initiate for how to handle running out of + streams), and this just sends the first message on a new stream. - Ready: "Initiate" or "InitiateWithSend" returns without an error, - i.e. SCTP's four-way handshake has completed. If an association - with the peer already exists, stream mapping is used and enough - streams are available, a Connection Object instantly becomes Ready - after calling "Initiate" or "InitiateWithSend". + Ready: Initiate or InitiateWithSend returns without an error, i.e. + SCTP's four-way handshake has completed. If an association with + the peer already exists, stream mapping is used and enough streams + are available, a Connection Object instantly becomes Ready after + calling Initiate or InitiateWithSend. InitiateError: Failure of CONNECT.SCTP. ConnectionError: TIMEOUT.SCTP or ABORT-EVENT.SCTP. Listen: LISTEN.SCTP. If an association with the peer already exists - and stream mapping is used, "Listen" just expects to receive a new + and stream mapping is used, Listen just expects to receive a new message with a new stream id (chosen in accordance with the stream id assignment procedure described above). ConnectionReceived: LISTEN.SCTP returns without an error (a result of successful CONNECT.SCTP from the peer), or, in case of stream mapping, the first message has arrived on a new stream (in this - case, "Receive" is also invoked). + case, Receive is also invoked). - Clone: Calling "Clone" on an SCTP association creates a new - Connection object and assigns it a new stream id in accordance - with the stream id assignment procedure described above. 
If there - are not enough streams available, ADD_STREAM.SCTP must be called. + Clone: Calling Clone on an SCTP association creates a new Connection + object and assigns it a new stream id in accordance with the + stream id assignment procedure described above. If there are not + enough streams available, ADD_STREAM.SCTP must be called. Priority (Connection): When this value is changed, or a Message with - Message Property "Priority" is sent, and there are multiple + Message Property Priority is sent, and there are multiple Connection objects assigned to the same SCTP association, CONFIGURE_STREAM_SCHEDULER.SCTP is called to adjust the priorities of streams in the SCTP association. - Send: SEND.SCTP. Message Properties such as "Lifetime" and - "Ordered" map to parameters of this primitive. + Send: SEND.SCTP. Message Properties such as Lifetime and Ordered + map to parameters of this primitive. Receive: RECEIVE.SCTP. The "partial flag" of RECEIVE.SCTP invokes a - "ReceivedPartial" event. + ReceivedPartial event. Close: If this is the only Connection object that is assigned to the - SCTP association, CLOSE.SCTP is called, and the "Closed" event will - be delivered to the application upon the ensuing CLOSE-EVENT.SCTP. + SCTP association, CLOSE.SCTP is called, and the Closed event will be + delivered to the application upon the ensuing CLOSE-EVENT.SCTP. Else, the Connection object is one out of several Connection objects that are assigned to the same SCTP association, and RESET_STREAM.SCTP must be called, which informs the peer that the stream will no longer - be used for mapping and can be used by future "Initiate", - "InitiateWithSend" or "Listen" calls. At the peer, the event + be used for mapping and can be used by future Initiate, + InitiateWithSend or Listen calls. At the peer, the event RESET_STREAM-EVENT.SCTP will fire, which the peer must answer by issuing RESET_STREAM.SCTP too.
The resulting local RESET_STREAM-EVENT.SCTP informs the Transport Services system that the stream id - can now be re-used by the next "Initiate", "InitiateWithSend" or - "Listen" calls, and invokes a "Closed" event towards the application. + can now be re-used by the next Initiate, InitiateWithSend or Listen + calls, and invokes a Closed event towards the application. Abort: If this is the only Connection object that is assigned to the SCTP association, ABORT.SCTP is called. Else, the Connection object is one out of several Connection objects that are assigned to the - same SCTP association, and shutdown proceeds as described under - "Close". + same SCTP association, and shutdown proceeds as described under Close. 11. IANA Considerations RFC-EDITOR: Please remove this section before publication. This document has no actions for IANA. 12. Security Considerations [I-D.ietf-taps-arch] outlines general security considerations and @@ -2054,56 +2064,57 @@ it is possible for the network to cause an implementation to consume significant on-device resources. Implementations should limit the maximum amount of state allowed for any given node, including the number of child nodes, especially when the state is based on results from the network. 13. Acknowledgements This work has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 644334 - (NEAT). + (NEAT) and No. 815178 (5GENESIS). This work has been supported by Leibniz Prize project funds of DFG - German Research Foundation: Gottfried Wilhelm Leibniz-Preis 2011 (FKZ FE 570/4-1). This work has been supported by the UK Engineering and Physical Sciences Research Council under grant EP/R04144X/1. This work has been supported by the Research Council of Norway under its "Toppforsk" programme through the "OCARINA" project.
- Thanks to Stuart Cheshire, Josh Graessley, David Schinazi, and Eric - Kinnear for their implementation and design efforts, including Happy - Eyeballs, that heavily influenced this work. + Thanks to Colin Perkins, Tom Jones, Karl-Johan Grinnemo, and Gorry + Fairhurst for their contributions to the design of this + specification. Thanks also to Stuart Cheshire, Josh Graessley, David + Schinazi, and Eric Kinnear for their implementation and design + efforts, including Happy Eyeballs, that heavily influenced this work. 14. References 14.1. Normative References [I-D.ietf-taps-arch] - Pauly, T., Trammell, B., Brunstrom, A., Fairhurst, G., - Perkins, C., Tiesel, P. S., and C. A. Wood, "An - Architecture for Transport Services", Work in Progress, - Internet-Draft, draft-ietf-taps-arch-10, 30 April 2021, - . + Pauly, T., Trammell, B., Brunstrom, A., Fairhurst, G., and + C. Perkins, "An Architecture for Transport Services", Work + in Progress, Internet-Draft, draft-ietf-taps-arch-12, 3 + January 2022, . [I-D.ietf-taps-interface] Trammell, B., Welzl, M., Enghardt, T., Fairhurst, G., Kuehlewind, M., Perkins, C., Tiesel, P. S., Wood, C. A., Pauly, T., and K. Rose, "An Abstract Application Layer Interface to Transport Services", Work in Progress, - Internet-Draft, draft-ietf-taps-interface-12, 9 April - 2021, . + Internet-Draft, draft-ietf-taps-interface-14, 3 January + 2022, . [RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014, . [RFC7540] Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext Transfer Protocol Version 2 (HTTP/2)", RFC 7540, DOI 10.17487/RFC7540, May 2015, . @@ -2197,20 +2208,24 @@ [RFC7230] Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing", RFC 7230, DOI 10.17487/RFC7230, June 2014, . [RFC7657] Black, D., Ed. and P.
Jones, "Differentiated Services (Diffserv) and Real-Time Communication", RFC 7657, DOI 10.17487/RFC7657, November 2015, . + [RFC8085] Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage + Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085, + March 2017, . + [RFC8260] Stewart, R., Tuexen, M., Loreto, S., and R. Seggelmann, "Stream Schedulers and User Message Interleaving for the Stream Control Transmission Protocol", RFC 8260, DOI 10.17487/RFC8260, November 2017, . [RFC8445] Keranen, A., Holmberg, C., and J. Rosenberg, "Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal", RFC 8445, DOI 10.17487/RFC8445, July 2018, @@ -2301,21 +2316,21 @@ requesting a unidirectional receive. * NoCandidates: The configuration is valid, but none of the available transport protocols can satisfy the transport properties provided by the application. * ResolutionFailed: The remote or local specifier provided by the application can not be resolved. * EstablishmentFailed: The Transport Services system was unable to - establish a transport-layer connection to the remote endpoint + establish a transport-layer connection to the Remote Endpoint specified by the application. * PolicyProhibited: The system policy prevents the transport system from performing the action requested by the application. * NotCloneable: The protocol stack is not capable of being cloned. * MessageTooLarge: The message size is too big for the transport system to handle. @@ -2356,21 +2371,21 @@ communication on top of TCP, UDP and SCTP, with many more features such as a policy manager. - Code: https://github.com/NEAT-project/neat (https://github.com/ NEAT-project/neat) - NEAT project: https://www.neat-project.org (https://www.neat- project.org) - NEATPy is a Python shim over NEAT which updates the NEAT API to - be in line with version 6 of the TAPS interface draft. + be in line with version 6 of the Transport Services API draft. 
- Code: https://github.com/theagilepadawan/NEATPy (https://github.com/theagilepadawan/NEATPy) * PyTAPS: - A TAPS implementation based on Python asyncio, offering protocol-independent communication to applications on top of TCP, UDP and TLS, with support for multicast. @@ -2395,49 +2410,25 @@ Email: tpauly@apple.com Theresa Enghardt Netflix 121 Albright Way Los Gatos, CA 95032, United States of America Email: ietf@tenghardt.net - Karl-Johan Grinnemo - Karlstad University - Universitetsgatan 2 - 651 88 Karlstad - Sweden - - Email: karl-johan.grinnemo@kau.se - - Tom Jones - University of Aberdeen - Fraser Noble Building - Aberdeen, AB24 3UE - United Kingdom - - Email: tom@erg.abdn.ac.uk - Philipp S. Tiesel SAP SE Konrad-Zuse-Ring 10 14469 Potsdam Germany Email: philipp@tiesel.net - Colin Perkins - University of Glasgow - School of Computing Science - Glasgow G12 8QQ - United Kingdom - - Email: csp@csperkins.org - Michael Welzl University of Oslo PO Box 1080 Blindern 0316 Oslo Norway Email: michawe@ifi.uio.no