--- 1/draft-ietf-taps-minset-00.txt 2018-02-06 04:15:11.916607489 -0800 +++ 2/draft-ietf-taps-minset-01.txt 2018-02-06 04:15:12.012609771 -0800 @@ -1,123 +1,131 @@ TAPS M. Welzl Internet-Draft S. Gjessing Intended status: Informational University of Oslo -Expires: April 25, 2018 October 22, 2017 +Expires: August 10, 2018 February 6, 2018 A Minimal Set of Transport Services for TAPS Systems - draft-ietf-taps-minset-00 + draft-ietf-taps-minset-01 Abstract This draft recommends a minimal set of IETF Transport Services offered by end systems supporting TAPS, and gives guidance on choosing among the available mechanisms and protocols. It is based on the set of transport features given in the TAPS document draft- - ietf-taps-transports-usage-08. + ietf-taps-transports-usage-09. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on April 25, 2018. + This Internet-Draft will expire on August 10, 2018. Copyright Notice - Copyright (c) 2017 IETF Trust and the persons identified as the + Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. 
Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 3. The Minimal Set of Transport Features . . . . . . . . . . . . 5 - 3.1. Flow Creation . . . . . . . . . . . . . . . . . . . . . . 5 - 3.2. Flow Connection and Termination . . . . . . . . . . . . . 7 - 3.3. Flow Group Configuration . . . . . . . . . . . . . . . . 8 - 3.4. Flow Configuration . . . . . . . . . . . . . . . . . . . 8 - 3.5. Data Transfer . . . . . . . . . . . . . . . . . . . . . . 9 - 3.5.1. The Sender . . . . . . . . . . . . . . . . . . . . . 9 - 3.5.2. The Receiver . . . . . . . . . . . . . . . . . . . . 10 - 4. An MinSet Abstract Interface . . . . . . . . . . . . . . . . 11 - 4.1. Specification . . . . . . . . . . . . . . . . . . . . . . 12 - 5. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . 17 - 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 18 - 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 18 - 8. Security Considerations . . . . . . . . . . . . . . . . . . . 18 - 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 18 - 9.1. Normative References . . . . . . . . . . . . . . . . . . 18 - 9.2. Informative References . . . . . . . . . . . . . . . . . 19 - Appendix A. Deriving the minimal set . . . . . . . . . . . . . . 21 + 3.1. ESTABLISHMENT, AVAILABILITY and TERMINATION . . . . . . . 5 + 3.2. MAINTENANCE . . . . . . . . . . . . . . . . . . . . . . . 8 + 3.3. DATA Transfer . . . . . . . . . . . . . . . . . . . . . . 9 + 3.3.1. Sending Data . . . . . . . . . . . . . . . . . . . . 9 + 3.3.2. Receiving Data . 
. . . . . . . . . . . . . . . . . . 10 + 4. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 + 4.1. ESTABLISHMENT, AVAILABILITY and TERMINATION . . . . . . . 11 + 4.2. MAINTENANCE . . . . . . . . . . . . . . . . . . . . . . . 12 + 4.2.1. Connection groups . . . . . . . . . . . . . . . . . . 12 + 4.2.2. Individual connections . . . . . . . . . . . . . . . 13 + 4.3. DATA Transfer . . . . . . . . . . . . . . . . . . . . . . 14 + 5. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . 15 + 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 15 + 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 15 + 8. Security Considerations . . . . . . . . . . . . . . . . . . . 16 + 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 16 + 9.1. Normative References . . . . . . . . . . . . . . . . . . 16 + 9.2. Informative References . . . . . . . . . . . . . . . . . 16 + Appendix A. Deriving the minimal set . . . . . . . . . . . . . . 18 A.1. Step 1: Categorization -- The Superset of Transport - Features . . . . . . . . . . . . . . . . . . . . . . . . 21 - A.1.1. CONNECTION Related Transport Features . . . . . . . . 23 - A.1.2. DATA Transfer Related Transport Features . . . . . . 38 + Features . . . . . . . . . . . . . . . . . . . . . . . . 19 + A.1.1. CONNECTION Related Transport Features . . . . . . . . 20 + A.1.2. DATA Transfer Related Transport Features . . . . . . 36 A.2. Step 2: Reduction -- The Reduced Set of Transport - Features . . . . . . . . . . . . . . . . . . . . . . . . 43 - A.2.1. CONNECTION Related Transport Features . . . . . . . . 44 - A.2.2. DATA Transfer Related Transport Features . . . . . . 45 - A.3. Step 3: Discussion . . . . . . . . . . . . . . . . . . . 46 - A.3.1. Sending Messages, Receiving Bytes . . . . . . . . . . 46 - A.3.2. Stream Schedulers Without Streams . . . . . . . . . . 48 - A.3.3. Early Data Transmission . . . . . . . . . . . . . . . 49 - A.3.4. Sender Running Dry . . . . . . . . . . 
. . . . . . . 50 - A.3.5. Capacity Profile . . . . . . . . . . . . . . . . . . 50 - A.3.6. Security . . . . . . . . . . . . . . . . . . . . . . 51 - A.3.7. Packet Size . . . . . . . . . . . . . . . . . . . . . 51 - Appendix B. Revision information . . . . . . . . . . . . . . . . 52 - Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 53 + Features . . . . . . . . . . . . . . . . . . . . . . . . 41 + A.2.1. CONNECTION Related Transport Features . . . . . . . . 42 + A.2.2. DATA Transfer Related Transport Features . . . . . . 43 + A.3. Step 3: Discussion . . . . . . . . . . . . . . . . . . . 43 + A.3.1. Sending Messages, Receiving Bytes . . . . . . . . . . 44 + A.3.2. Stream Schedulers Without Streams . . . . . . . . . . 46 + A.3.3. Early Data Transmission . . . . . . . . . . . . . . . 47 + A.3.4. Sender Running Dry . . . . . . . . . . . . . . . . . 48 + A.3.5. Capacity Profile . . . . . . . . . . . . . . . . . . 48 + A.3.6. Security . . . . . . . . . . . . . . . . . . . . . . 49 + A.3.7. Packet Size . . . . . . . . . . . . . . . . . . . . . 49 + Appendix B. Revision information . . . . . . . . . . . . . . . . 49 + Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 50 1. Introduction The task of any system that implements TAPS is to offer transport services to its applications, i.e. the applications running on top of TAPS, without binding them to a particular transport protocol. + Currently, the set of transport services that most applications use - is based on TCP and UDP; this limits the ability for the network - stack to make use of features of other protocols. For example, if a - protocol supports out-of-order message delivery but applications - always assume that the network provides an ordered bytestream, then - the network stack can never utilize out-of-order message delivery: - doing so would break a fundamental assumption of the application. 
+ is based on TCP and UDP (and protocols running on top of them); this + limits the ability for the network stack to make use of features of + other protocols. For example, if a protocol supports out-of-order + message delivery but applications always assume that the network + provides an ordered bytestream, then the network stack can never + utilize out-of-order message delivery: doing so would break a + fundamental assumption of the application. By exposing the transport services of multiple transport protocols, a TAPS system can make it possible to use these services without having to statically bind an application to a specific transport protocol. The first step towards the design of such a system was taken by [RFC8095], which surveys a large number of transports, and [TAPS2] as well as [TAPS2UDP], which identify the specific transport features that are exposed to applications by the protocols TCP, MPTCP, UDP(- Lite) and SCTP as well as the LEDBAT congestion control mechanism. The present draft is based on these documents and follows the same - terminology (also listed below). + terminology (also listed below). Because the considered transport + protocols together cover a wide range of transport features, there is + reason to hope that the resulting set (and the reasoning that led to + it) will also apply to many aspects of other transport protocols such + as QUIC. The number of transport features of current IETF transports is large, and exposing all of them has a number of disadvantages: generally, the more functionality is exposed, the less freedom a TAPS system has to automate usage of the various functions of its available set of transport protocols. Some functions only exist in one particular protocol, and if an application would use them, this would statically tie the application to this protocol, counteracting the purpose of a TAPS system. 
Also, if the number of exposed features is exceedingly large, a TAPS system might become very hard to use for an application @@ -128,37 +136,33 @@ transport functionality. Applications use a wide variety of APIs today. The transport features in the minimal set in this document must be reflected in *all* network APIs in order for the underlying functionality to become usable everywhere. For example, it does not help an application that talks to a middleware if only the Berkeley Sockets API is extended to offer "unordered message delivery", but the middleware only offers an ordered bytestream. Both the Berkeley Sockets API and the middleware would have to expose the "unordered - message delivery" transport feature (alternatively, there may be - interesting ways for certain types of middleware to use some - transport features without exposing them, based on knowledge about - the applications -- but this is not the general case). In most - situations, in the interest of being as flexible and efficient as - possible, the best choice will be for a middleware or library to - expose at least all of the transport features that are recommended as - a "minimal set" here. + message delivery" transport feature (alternatively, there may be ways + for certain types of middleware to use this transport feature without + exposing it, based on knowledge about the applications -- but this is + not the general case). In most situations, in the interest of being + as flexible and efficient as possible, the best choice will be for a + middleware or library to expose at least all of the transport + features that are recommended as a "minimal set" here. - This "minimal set" can be implemented one-sided with a fall-back to - TCP (or UDP, if certain limitations are put in place). This means - that a sender-side TAPS system can talk to a non-TAPS TCP (or UDP) - receiver, and a receiver-side TAPS system can talk to a non-TAPS TCP - (or UDP) sender. 
For systems that do not have this requirement, - [I-D.trammell-taps-post-sockets] describes a way to extend the - functionality of the minimal set such that some of its limitations - are removed. + This "minimal set" can be implemented one-sided over TCP (or UDP, if + certain limitations are put in place). This means that a sender-side + TAPS system implementing it can talk to a non-TAPS TCP (or UDP) + receiver, and a receiver-side TAPS system implementing it can talk to + a non-TAPS TCP (or UDP) sender. 2. Terminology The following terms are used throughout this document, and in subsequent documents produced by TAPS that describe the composition and decomposition of transport services. Transport Feature: a specific end-to-end feature that the transport layer provides to an application. Examples include confidentiality, reliable delivery, ordered delivery, message- @@ -187,58 +191,67 @@ Moreover, throughout the document, the protocol name "UDP(-Lite)" is used when discussing transport features that are equivalent for UDP and UDP-Lite; similarly, the protocol name "TCP" refers to both TCP and MPTCP. 3. The Minimal Set of Transport Features Based on the categorization, reduction and discussion in Appendix A, this section describes the minimal set of transport features that is - offered by end systems supporting TAPS. This TAPS system is able to - fall back to TCP; elements of the system that may prohibit falling - back to UDP are marked with "!UDP". To implement a TAPS system that - is also able to fall back to UDP, these marked transport features + offered by end systems supporting TAPS. This TAPS system can be + implemented over TCP; elements of the system that may prohibit + implementation over UDP are marked with "!UDP". To implement a TAPS + system that can also work over UDP, these marked transport features should be excluded. -3.1. 
Flow Creation + As in Appendix A, Appendix A.2 and [TAPS2], we categorize the minimal + set of transport features as 1) CONNECTION related (ESTABLISHMENT, + AVAILABILITY, MAINTENANCE, TERMINATION) and 2) DATA Transfer related + (Sending Data, Receiving Data, Errors). Here, the focus is on "TAPS + Connections": connections that the TAPS system offers, as opposed to + connections of transport protocols that the TAPS system uses. - A TAPS flow must be "created" before it is connected, to allow for - initial configurations to be carried out. All configuration - parameters in Section 3.3 and Section 3.4 can be used initially, - although some of them may only take effect when the flow has been - connected. Configuring a flow early helps a TAPS system make the - right decisions. In particular, the "group number" can influence the - TAPS system to implement a TAPS flow as a stream of a multi-streaming - protocol's existing association or not. +3.1. ESTABLISHMENT, AVAILABILITY and TERMINATION - For flows that use a new "group number", early configuration is - necessary because it allows the TAPS system to know which protocols - it should try to use (to steer a mechanism such as "Happy Eyeballs" + A TAPS connection must first be "created" to allow for some initial + configuration to be carried out before the TAPS system can actively + or passively establish a transport connection. All configuration + parameters in Section 3.2 can be used initially, although some of + them may only take effect when a transport connection has been + established. Configuring a connection early helps a TAPS system make + the right decisions. In particular, grouping information can + influence the TAPS system to implement a TAPS connection as a stream + of a multi-streaming protocol's existing association or not. 
+ + For ungrouped TAPS connections, early configuration is necessary + because it allows the TAPS system to know which protocols it should + try to use (to steer a mechanism such as "Happy Eyeballs" [I-D.grinnemo-taps-he]). In particular, a TAPS system that only makes a one-time choice for a particular protocol must know early about strict requirements that must be kept, or it can end up in a deadlock situation (e.g., having chosen UDP and later be asked to - support reliable transfer). As one possibility to correctly handle + support reliable transfer). As a possibility to correctly handle these cases, we provide the following decision tree (this is derived from Appendix A.2.1 excluding authentication, as explained in Section 8): - Will it ever be necessary to offer any of the following? * Reliably transfer data * Notify the peer of closing/aborting * Preserve data ordering Yes: SCTP or TCP can be used. - Is any of the following useful to the application? - * Choosing a scheduler to operate between flows in a group, - with the possibility to configure a priority or weight per flow + * Choosing a scheduler to operate between TAPS connections + in a group, with the possibility to configure a priority + or weight per connection * Configurable message reliability * Unordered message delivery * Request not to delay the acknowledgement (SACK) of a message Yes: SCTP is preferred. No: - Is any of the following useful to the application? * Hand over a message to reliably transfer (possibly multiple times) before connection establishment * Suggest timeout to the peer @@ -255,552 +268,432 @@ * Specify minimum checksum coverage required by receiver Yes: UDP-Lite is preferred. No: UDP is preferred. Note that this decision tree is not optimal for all cases. 
For example, if an application wants to use "Specify checksum coverage used by the sender", which is only offered by UDP-Lite, and "Configure priority or weight for a scheduler", which is only offered by SCTP, the above decision tree will always choose UDP-Lite, making - it impossible to use SCTP's schedulers with priorities between flows - in a group. The TAPS system must know which choice is more important - for the application in order to make the best decision. We caution - implementers to be aware of the full set of trade-offs, for which we - recommend consulting the list in Appendix A.2.1 when deciding how to - initialize a flow. - - Once a flow is created, it can be queried for the maximum amount of - data that an application can possibly expect to have reliably - transmitted before or during connection establishment (with zero - being a possible answer). An application can also give the flow a - message for reliable transmission before or during connection - establishment (!UDP); the TAPS system will then try to transmit it as - early as possible. An application can facilitate sending the message - particularly early by marking it as "idempotent"; in this case, the - receiving application must be prepared to potentially receive - multiple copies of the message (because idempotent messages are - reliably transferred, asking for idempotence is not necessary for - systems that support UDP-fall-back). - -3.2. Flow Connection and Termination + it impossible to use SCTP's schedulers with priorities between + grouped TAPS connections. The TAPS system must know which choice is + more important for the application in order to make the best + decision. We caution implementers to be aware of the full set of + trade-offs, for which we recommend consulting the list in + Appendix A.2.1 when deciding how to initialize a TAPS connection. 
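The visible branches of the decision tree above can be sketched in code. This is a minimal illustration, not part of the draft: the names `Needs` and `candidates` are hypothetical, and the intermediate questions that this excerpt elides (early data, suggesting a timeout to the peer, and their outcomes) are deliberately left out, so the reliable branch stops at "SCTP or TCP" when no SCTP-only feature is requested.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    """Hypothetical summary of what an application asks for at creation time."""
    reliable: bool = False             # reliably transfer data
    notify_close: bool = False         # notify the peer of closing/aborting
    ordered: bool = False              # preserve data ordering
    scheduler: bool = False            # scheduler with per-connection priority/weight
    message_reliability: bool = False  # configurable message reliability
    unordered: bool = False            # unordered message delivery
    no_ack_delay: bool = False         # request not to delay the SACK of a message
    checksum_coverage: bool = False    # sender/receiver checksum coverage


def candidates(n: Needs) -> list[str]:
    """Return the protocols the visible part of the decision tree would pick."""
    if n.reliable or n.notify_close or n.ordered:
        # First question answered "Yes": SCTP or TCP can be used.
        if n.scheduler or n.message_reliability or n.unordered or n.no_ack_delay:
            return ["SCTP"]
        # Further narrowing is elided in this excerpt, so both remain.
        return ["SCTP", "TCP"]
    # First question answered "No": only the checksum question is visible here.
    if n.checksum_coverage:
        return ["UDP-Lite"]
    return ["UDP"]


# The caveat from the text: checksum coverage plus a scheduler priority,
# with no reliability requirement, always ends at UDP-Lite.
print(candidates(Needs(checksum_coverage=True, scheduler=True)))
```

The last call reproduces the limitation noted in the text: because the tree asks about reliability first, a request for SCTP's scheduler is never considered once the application has declined reliability, ordering and close notification.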
- To be compatible with multiple transports, including streams of a - multi-streaming protocol (used as if they were transports - themselves), the semantics of opening and closing need to be the most - restrictive subset of all of them. For example, TCP's support of - half-closed connections can be seen as a feature on top of the more - restrictive "ABORT"; this feature cannot be supported because not all - protocols used by a TAPS system (including streams of an association) - support half-closed connections. + Once a TAPS connection is created, it can be queried for the maximum + amount of data that an application can possibly expect to have + reliably transmitted before or during transport connection + establishment (with zero being a possible answer). An application + can also give the TAPS connection a message for reliable transmission + before or during connection establishment (!UDP); the TAPS system + will then try to transmit it as early as possible. An application + can facilitate sending the message particularly early by marking it + as "idempotent"; in this case, the receiving application must be + prepared to potentially receive multiple copies of the message + (because idempotent messages are reliably transferred, asking for + idempotence is not necessary for systems that support UDP). - After creation, a flow can be actively connected to the other side - using "Connect", or it can passively listen for incoming connection - requests with "Listen". Note that "Connect" may or may not trigger a + After creation, a TAPS system can actively establish communication + with a peer, or it can passively listen for incoming connection + requests. Note that "Establish" may or may not trigger a notification on the listening side. 
It is possible that the first notification on the listening side is the arrival of the first data that the active side sends (a receiver-side TAPS system could handle this by continuing to block a "Listen" call, immediately followed by - issuing "Receive", for example; callback-based implementations may + issuing "Receive", for example; callback-based implementations could simply skip the equivalent of "Listen"). This also means that the active opening side is assumed to be the first side sending data. A TAPS system can actively close a connection, i.e. terminate it after reliably delivering all remaining data to the peer, or it can abort it, i.e. terminate it without delivering remaining data. - Unless all data transfers only used unreliable frame transmission + Unless all data transfers only used unreliable message transmission without congestion control (i.e., UDP-style transfer), closing a connection is guaranteed to cause an event to notify the peer application that the connection has been closed (!UDP). Similarly, for anything but (UDP-style) unreliable non-congestion-controlled data transfer, aborting a connection will cause an event to notify the peer application that the connection has been aborted (!UDP). A - timeout can be configured to abort a flow when data could not be - delivered for too long (!UDP); however, timeout-based abortion does - not notify the peer application that the connection has been aborted. - - Because half-closed connections are not supported, when a TAPS host - receives a notification that the peer is closing or aborting the flow - (!UDP), the other side may not be able to read outstanding data. - This means that unacknowledged data residing in the TAPS system's - send buffer may have to be dropped from that buffer upon arrival of a - notification to close or abort the flow from the peer. 
+ timeout can be configured to abort a TAPS connection when data could + not be delivered for too long (!UDP); however, timeout-based abortion + does not notify the peer application that the connection has been + aborted. Because half-closed connections are not supported, when a + TAPS host receives a notification that the peer is closing or + aborting the connection (!UDP), its peer may not be able to read + outstanding data. This means that unacknowledged data residing in + the TAPS system's send buffer may have to be dropped from that buffer + upon arrival of a "close" or "abort" notification from the peer. -3.3. Flow Group Configuration +3.2. MAINTENANCE - A flow group can be configured with a number of transport features, - and there are some notifications to applications about a flow group. - Here we list transport features and notifications from Appendix A.2 - that sometimes automatically apply to groups of flows (e.g., when a - flow is mapped to a stream of a multi-streaming protocol). + A TAPS connection group can be configured with a number of transport + features, and there are some notifications to applications about a + connection group. The following transport features and notifications + from Appendix A.2 automatically apply to grouped TAPS connections + (e.g., when a TAPS connection is mapped to a stream of a multi- + streaming protocol): Timeout, error notifications: o Change timeout for aborting connection (using retransmit limit or time value) (!UDP) o Suggest timeout to the peer (!UDP) o Notification of Excessive Retransmissions (early warning below abortion threshold) o Notification of ICMP error message arrival Others: - o Choose a scheduler to operate between flows of a group + o Choose a scheduler to operate between connections of a group o Obtain ECN field The following transport features are new or changed, based on the discussion in Appendix A.3: o Capacity profile This describes how an application wants to use its available capacity. 
Choices can be "lowest possible latency at the expense of overhead" (which would disable any Nagle-like algorithm), - "scavenger", and some more values that help determine the DSCP - value for a flow (e.g. similar to table 1 in + "scavenger", and values that help determine the DSCP value for a + connection (e.g. similar to table 1 in [I-D.ietf-tsvwg-rtcweb-qos]). -3.4. Flow Configuration - - Here we list transport features and notifications from Appendix A.2 - that only apply to a single flow. + The following transport features and notifications from Appendix A.2 + only apply to a single TAPS connection: Configure priority or weight for a scheduler + Checksums: o Disable checksum when sending o Disable checksum requirement when receiving o Specify checksum coverage used by the sender o Specify minimum checksum coverage required by receiver + A TAPS system must offer means to group connections; at the same + time, it cannot guarantee truly grouping them below (e.g., it cannot + be guaranteed that TAPS connections become multiplexed as streams on + a single SCTP association when SCTP may not be available). The TAPS + system must therefore ensure that group versus non-group + configurations listed above are handled correctly in some way (e.g., + by applying the configuration to all grouped connections even when + they are not multiplexed, or informing the application about grouping + success or failure). -3.5. Data Transfer +3.3. DATA Transfer -3.5.1. The Sender +3.3.1. Sending Data - This section discusses how to send data after flow establishment. - Section 3.2 discusses the possiblity to hand over a message to - reliably send before or during establishment. + This section discusses how to send data after connection + establishment. Section 3.1 discusses the possibility to hand over a + message to reliably send before or during establishment. 
- Here we list per-frame properties that a sender can optionally - configure if it hands over a delimited frame for sending with + Here we list per-message properties that a sender can optionally + configure if it hands over a delimited message for sending with congestion control (!UDP), taken from Appendix A.2: o Configurable Message Reliability o Ordered message delivery (potentially slower than unordered) o Unordered message delivery (potentially faster than ordered) o Request not to bundle messages o Request not to delay the acknowledgement (SACK) of a message - Additionally, an application can hand over delimited frames for + Additionally, an application can hand over delimited messages for unreliable transmission without congestion control (note that such applications should perform congestion control in accordance with - [RFC2914]). Then, none of the per-frame properties listed above have - any effect, but it is possible to use the transport feature "Specify - DF field" to allow/disallow fragmentation. + [RFC2914]). Then, none of the per-message properties listed above + have any effect, but it is possible to use the transport feature + "Specify DF field" to allow/disallow fragmentation. Following Appendix A.3.7, there are three transport features (two - old, one new) and a notification: + old, one new): - o Get max. transport frame size that may be sent without + o Get max. transport message size that may be sent without fragmentation from the configured interface This is optional for a TAPS system to offer, and may return an error ("not available"). It can aid applications implementing Path MTU Discovery. - o Get max. transport frame size that may be received from the + o Get max. transport message size that may be received from the configured interface This is optional for a TAPS system to offer, and may return an error ("not available"). 
- o Get maximum transport frame size + o Get maximum transport message size Irrespective of fragmentation, there is a size limit for the messages that can be handed over to SCTP or UDP(-Lite); because a TAPS system is independent of the transport, it must allow a TAPS - application to query this value -- the maximum size of a frame in - an Application-Framed-Bytestream. This may also return an error - when frames are not delimited ("not available"). + application to query this value -- the maximum size of a message + in an Application-Framed-Bytestream (see Appendix A.3.1). This + may also return an error when data is not delimited ("not + available"). There are two more sender-side notifications. These are unreliable, i.e. a TAPS system cannot be assumed to implement them, but they may occur: o Notification of send failures A TAPS system may inform a sender application of a failure to send - a specific frame. + a specific message. o Notification of draining below a low water mark A TAPS system can notify a sender application when the TAPS system's filling level of the buffer of unsent data is below a configurable threshold in bytes. Even for TAPS systems that do implement this notification, supporting thresholds other than 0 is optional. "Notification of draining below a low water mark" is a generic notification that tries to enable uniform access to "TCP_NOTSENT_LOWAT" as well as the "SENDER DRY" notification (as discussed in Appendix A.3.4 -- SCTP's "SENDER DRY" is a special case where the threshold (for unsent data) is 0 and there is also no more unacknowledged data in the send buffer). Note that this threshold and its notification should operate across the buffers of the whole TAPS system, i.e. also any potential buffers that the TAPS system itself may use on top of the transport's send buffer. -3.5.2. The Receiver - - A receiving application obtains an Application-Framed Bytestream. - Similar to TCP's receiver semantics, it is just a stream of bytes. 
- If frame boundaries were specified by the sender, a receiver-side - TAPS system will still not inform the receiving application about - them. Within the bytestream, frames themselves will always stay - intact (partial frames are not supported - see Appendix A.3.1). - Different from TCP's semantics, there is no guarantee that all frames - in the bytestream are transmitted from the sender to the receiver, - and that all of them are in the same sequence in which they were - handed over by the sender. If an application is aware of frame - delimiters in the bytestream, and if the sender-side application has - informed the TAPS system about these boundaries and about potentially - relaxed requirements regarding the sequence of frames or per-frame - reliability, frames within the receiver-side bytestream may be out- - of-order or missing. - -4. An MinSet Abstract Interface - - Here we present the minimum set in the form of an abstract interface - that a TAPS system could implement. This abstract interface is - derived from the description in the previous section. The primitives - of this abstract interface can be implemented in various ways. For - example, information that is provided to an application can either be - offered via a primitive that is polled, or via an asynchronous - notification. - - We note that this is just a different form to represent the text in - the previous section, and not an abstract API that is recommended to - be implemented in this form by all TAPS systems. Specifically, TAPS - systems implementing this specific abstract interface would have the - following properties: +3.3.2. Receiving Data - 1. Support one-sided deployment with a fall-back to TCP (or UDP) - 2. Offer all the transport features of (MP)TCP, UDP(-Lite), LEDBAT - and SCTP that require application-specific knowledge - 3. 
Not offer any of the transport features of these protocols and - the LEDBAT congestion control mechanism that do not require - application-specific knowledge (to give maximum flexibility to a - TAPS system) + A receiving application obtains an "Application-Framed Bytestream" + (AFra-Bytestream); this concept is further described in + Appendix A.3.1. In line with TCP's receiver semantics, an AFra- + Bytestream is just a stream of bytes to the receiver. If message + boundaries were specified by the sender, a receiver-side TAPS system + implementing only the minimum set of transport services defined here + will still not inform the receiving application about them. Within + the bytestream, messages themselves will always stay intact (partial + messages are not supported). Different from TCP's semantics, there + is no guarantee that all messages in the bytestream are transmitted + from the sender to the receiver, and that all of them are in the same + sequence in which they were handed over by the sender. If an + application is aware of message delimiters in the bytestream, and if + the sender-side application has informed the TAPS system about these + boundaries and about potentially relaxed requirements regarding the + sequence of messages or per-message reliability, messages within the + receiver-side bytestream may be out-of-order or missing. - This reciprocally means that this is probably not the ideal interface - to implement for systems that: +4. Summary - 1. Assume that there is a system on both sides -- in this case, - richer functionality can be provided (cf. - [I-D.trammell-taps-post-sockets]) -- or assume different fall- - back protocols than TCP or UDP - 2. Use other protocols than (MP)TCP, UDP(-Lite), SCTP or the LEDBAT - congestion control mechanism underneath the TAPS interface - 3. Want to offer transport features that do not require application- - specific knowledge + Here we summarize the minimum set of transport features in a more + compact form.
-4.1. Specification +4.1. ESTABLISHMENT, AVAILABILITY and TERMINATION - CREATE (flow-group-id, reliability, checksum_coverage, - config_msg_prio, earlymsg_timeout_notifications) - Returns: flow-id + A TAPS connection is created and associated with an existing or new + TAPS connection group. Grouping can influence the TAPS system to + multiplex TAPS connections on a single transport connection or not, + and the other parameters serve as input to the decision tree + described in Section 3.1. The TAPS system gives no guarantees about + honoring any of the requests at this stage; these parameters are just + meant to help it choose and configure a suitable protocol. Note that + the parameters below affect all grouped TAPS connections. - Create a flow and associate it with an existing or new flow group - number. The group number can influence the TAPS system to implement - a TAPS flow as a stream of a multi-streaming protocol's existing - association or not, and the other parameters serve as input to the - decision tree described in Section 3.1. The TAPS system gives no - guarantees about honoring any of the requests at this stage; these - parameters are just meant to help it to choose and configure a - suitable protocol. + A TAPS connection can actively connect to a peer; this may or may not + trigger a notification on the listening side. If the application + sends data (see Section 4.3) before the TAPS system establishes a + transport connection, then such data may be transmitted early, upon + connecting. When a TAPS system listens for incoming connections, the + first arriving message may already be the first block of data.
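As an illustration only, the early-data behaviour described above (data handed over before a transport connection exists, then transmitted upon connecting) might be sketched as follows. This is a minimal sketch under stated assumptions, not an interface defined by the draft: the class name SketchConnection, its send and connect methods, and the transmit callback are all hypothetical, and a real TAPS system would additionally have to respect the limit on how much data can be sent before or during connection establishment.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SketchConnection:
    """Hypothetical stand-in for a TAPS connection; not an API from the draft."""

    # Assumed callback that hands bytes to whatever transport was chosen.
    transmit: Callable[[bytes], None]
    connected: bool = False
    pending: List[bytes] = field(default_factory=list)

    def send(self, data: bytes) -> None:
        # Before a transport connection exists, data is only queued.
        if self.connected:
            self.transmit(data)
        else:
            self.pending.append(data)

    def connect(self) -> None:
        # Upon connecting, queued data is handed to the transport first,
        # in the order in which the application handed it over.
        self.connected = True
        for data in self.pending:
            self.transmit(data)
        self.pending.clear()
```

In this sketch the queue is flushed in hand-over order; whether early data actually travels with the connection establishment (as with TCP Fast Open) or immediately after it is a decision left to the TAPS system.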
- PARAMETERS: + Creation / connection / configuration parameters: - flow-group-id: the flow's group number; all other parameters are - only relevant when this number is not currently in use by an - ongoing flow to the same destination (in which case the flow - becomes a member of the existing flow's group and inherits the - configuration of the group). reliability: a boolean that should be set to true when any of the following will be useful to the application: reliably transfer data; notify the peer of closing/aborting; preserve data ordering. checksum_coverage: a boolean to specify whether it will be useful to the application to specify checksum coverage when sending or receiving. config_msg_prio: a boolean that should be set to true when any of the following per-message configuration or prioritization mechanisms will be useful to the application: choosing a scheduler - to operate between flows in a group, with the possibility to - configure a priority or weight per flow; configurable message - reliability; unordered message delivery; requesting not to delay - the acknowledgement (SACK) of a message. + to operate between grouped connections, with the possibility to + configure a priority or weight per connection; configurable + message reliability; unordered message delivery; requesting not to + delay the acknowledgement (SACK) of a message. earlymsg_timeout_notifications: a boolean that should be set to true when any of the following will be useful to the application: hand over a message to reliably transfer (possibly multiple times) before connection establishment; suggest timeout to the peer; notification of excessive retransmissions (early warning below abortion threshold); notification of ICMP error message arrival. - (!UDP) CONFIGURE_TIMEOUT (flow-group-id [timeout] [peer_timeout] - [retrans_notify]) - - This configures timeouts for all flows in a group. 
Configuration - should generally be carried out as early as possible, ideally before - flows are connected, to aid the TAPS system's decision taking. - - PARAMETERS: - - timeout: a timeout value for aborting connections, in seconds - peer_timeout: a timeout value to be suggested to the peer (if - possible), in seconds - retrans_notify: the number of retransmissions after which the - application should be notified of "Excessive Retransmissions" + A TAPS connection can be closed after all outstanding data is + reliably delivered to the peer (if reliable data delivery was + requested earlier (!UDP)), in which case the peer is notified that + the connection is closed. Alternatively, a TAPS connection can be + aborted without delivering outstanding data to the peer. In case + reliable or partially reliable data delivery was requested earlier + (!UDP), the peer is notified that the connection is aborted. - CONFIGURE_CHECKSUM (flow-id [send [send_length]] [receive - [receive_length]]) +4.2. MAINTENANCE - This configures the usage of checksums for a flow in a group. - Configuration should generally be carried out as early as possible, - ideally before the flow is connected, to aid the TAPS system's - decision taking. "send" parameters concern using a checksum when - sending, "receive" parameters concern requiring a checksum when - receiving. There is no guarantee that any checksum limitations will - indeed be enforced; all defaults are: "full coverage, checksum - enabled". + As a general rule, any configuration described below should be + carried out as early as possible to aid the TAPS system's decision + taking. - PARAMETERS: +4.2.1.
Connection groups - send: boolean, enable / disable usage of a checksum - send_length: if send is true, this optional parameter can provide - the desired coverage of the checksum in bytes - receive: boolean, enable / disable requiring a checksum - receive_length: if receive is true, this optional parameter can - provide the required minimum coverage of the checksum in bytes + The transport features below apply to all TAPS connections in the + same group: - CONFIGURE_URGENCY (flow-group-id [scheduler] [capacity_profile] - [low_watermark]) + (!UDP) Configure a timeout: this can be done with the following + parameters: - This carries out configuration related to the urgency of sending data - on flows of a group. Configuration should generally be carried out - as early as possible, ideally before flows are connected, to aid the - TAPS system's decision taking. + o A timeout value for aborting connections, in seconds + o A timeout value to be suggested to the peer (if possible), in + seconds + o The number of retransmissions after which the application should + be notified of "Excessive Retransmissions" - PARAMETERS: + Configure urgency: this can be done with the following parameters: - scheduler: a number to identify the type of scheduler that should be - used to operate between flows in the group (no guarantees given). - Future versions of this document will be self contained, but for - now we suggest the schedulers defined in - [I-D.ietf-tsvwg-sctp-ndata]. - capacity_profile: a number to identify how an application wants to - use its available capacity. Future versions of this document will - be self contained, but for now choices can be "lowest possible + o A number to identify the type of scheduler that should be used to + operate between connections in the group (no guarantees given). + Schedulers are defined in [RFC8260]. + o A "capacity profile" number to identify how an application wants + to use its available capacity.
Choices can be "lowest possible latency at the expense of overhead" (which would disable any - Nagle-like algorithm), "scavenger", and some more values that help - determine the DSCP value for a flow (e.g. similar to table 1 in + Nagle-like algorithm), "scavenger", or values that help determine + the DSCP value for a connection (e.g. similar to table 1 in [I-D.ietf-tsvwg-rtcweb-qos]). - low_watermark: a buffer limit (in bytes); when the sender has fewer - than low_watermark bytes in the buffer, the application may be + o A buffer limit (in bytes); when the sender has fewer than + low_watermark bytes in the buffer, the application may be notified. Notifications are not guaranteed, and supporting - watermark numbers greater than 0 is not guaranteed. - - CONFIGURE_PRIORITY (flow-id priority) - - This configures a flow's priority or weight for a scheduler. - Configuration should generally be carried out as early as possible, - ideally before flows are connected, to aid the TAPS system's decision - taking. - - PARAMETERS: + watermark values greater than 0 is not guaranteed. - priority: future versions of this document will be self contained, - but for now we suggest the priority as described in - [I-D.ietf-tsvwg-sctp-ndata]. + The following properties can be queried: - NOTIFICATIONS - Returns: flow-group-id notification_type + o The maximum message size that may be sent without fragmentation, + in bytes (or "not available") + o The maximum transport message size that can be sent, in bytes (or + "not available") + o The maximum transport message size that can be received, in bytes + (or "not available") + o The maximum amount of data that can possibly be sent before or + during connection establishment, in bytes (or "not available") - This is fired when an event occurs, notifying the application about - something happening in relation to a flow group.
Notification types - are: + In addition to the already mentioned closing / aborting notifications + and possible send errors, the following notifications can occur: - Excessive Retransmissions: the configured (or a default) number of + o Excessive Retransmissions: the configured (or a default) number of retransmissions has been reached, yielding this early warning below an abortion threshold. - ICMP Arrival (parameter: ICMP message): an ICMP packet carrying the - conveyed ICMP message has arrived. - ECN Arrival (parameter: ECN value): a packet carrying the conveyed + o ICMP Arrival (parameter: ICMP message): an ICMP packet carrying + the conveyed ICMP message has arrived. + o ECN Arrival (parameter: ECN value): a packet carrying the conveyed ECN value has arrived. This can be useful for applications implementing congestion control. - Timeout (parameter: s seconds): data could not be delivered for s + o Timeout (parameter: s seconds): data could not be delivered for s seconds. - Close: the peer has closed the connection. The peer has no more - data to send, and will not read more data. Data that is in - transit or resides in the local send buffer will be discarded. - Abort: the peer has aborted the connection. The peer has no more - data to send, and will not read more data. Data that is in - transit or resides in the local send buffer will be discarded. - - Note that there is no guarantee that this notification will be - invoked when the peer aborts. - Drain: the send buffer has either drained below the configured low + o Drain: the send buffer has either drained below the configured low water mark or it has become completely empty. - Path Change (parameter: path identifier): the path has changed; the - path identifier is a number that can be used to determine a - previously used path is used again (e.g., the TAPS system has - switched from one interface to the other and back). 
- Send Failure (parameter: frame identifier): this informs the - application of a failure to send a specific frame. There can be a - send failure without this notification happening. - - QUERY_PROPERTIES (flow-group-id property_identifier) - Returns: requested property (see below) - - This allows to query some properties of a flow group. Return values - per property identifier are: - o The maximum frame size that may be sent without fragmentation, in - bytes (or "not available") - o The maximum transport frame size that can be sent, in bytes (or - "not available") - o The maximum transport frame size that can be received, in bytes - (or "not available") - o The maximum amount of data that can possibly be sent before or - during connection establishment, in bytes (or "not available") - - CONNECT (flow-id dst_addr) - - Connects a flow. This primitive may or may not trigger a - notification (continuing LISTEN) on the listening side. If a send - precedes this call, then data may be transmitted with this connect. - - PARAMETERS: +4.2.2. Individual connections - dst_addr: the destination transport address to connect to + The transport features below apply to individual TAPS connections: - LISTEN (flow-id) + Configure priority or weight for a scheduler, as described in + [RFC8260]. - Blocking passive connect, listening on all interfaces. This may not - be the direct result of the peer calling CONNECT - it may also be - invoked upon reception of the first block of data. In this case, - RECEIVE_FRAME is invoked immediately after. 
+ Configure checksum usage: this can be done with the following + parameters, but there is no guarantee that any checksum limitations + will indeed be enforced (the default behavior is "full coverage, + checksum enabled"): - SEND_FRAME (flow-id frame [reliability] [ordered] [bundle] [delack] - [fragment] [idempotent]) + o A boolean to enable / disable usage of a checksum when sending + o The desired coverage (in bytes) of the checksum used when sending + o A boolean to enable / disable requiring a checksum when receiving + o The required minimum coverage (in bytes) of the checksum when + receiving - Sends an application frame. No guarantees are given about the - preservation of frame boundaries to the peer; if frame boundaries are - needed, the receiving application at the peer must know about them - beforehand (or the TAPS system cannot fall back to TCP). Note that - this call can already be used before a flow is connected. All - parameters refer to the frame that is being handed over. +4.3. DATA Transfer - PARAMETERS: + When sending a message, no guarantees are given about the + preservation of message boundaries to the peer; if message boundaries + are needed, the receiving application at the peer must know about + them beforehand (or the TAPS system cannot use TCP). Note that an + application should already be able to hand over data before the TAPS + system establishes a transport connection. Regarding the message + that is being handed over, the following parameters can be used: - (!UDP) reliability: this parameter is used to convey a choice of: + o (!UDP) Reliability: this parameter is used to convey a choice of: fully reliable, unreliable without congestion control (which is - guaranteed), unreliable, partially reliable (how to configure: - TBD, probably using a time value). The latter two choices are not - guaranteed and may result in full reliability. 
- (!UDP) ordered: this boolean parameter lets an application choose + guaranteed), unreliable, partially reliable (see [RFC3758] and + [RFC7496] for details on how to specify partial reliability). The + latter two choices are not guaranteed and may result in full + reliability. + o (!UDP) Ordered: this boolean parameter lets an application choose between ordered message delivery (true) and possibly unordered, potentially faster message delivery (false). - bundle: a boolean that expresses a preference for allowing to bundle - frames (true) or not (false). No guarantees are given. - delack: a boolean that, if false, lets an application request that - the peer would not delay the acknowledgement for this frame. - fragment: a boolean that expresses a preference for allowing to - fragment frames (true) or not (false), at the IP level. No + o Bundle: a boolean that expresses a preference for allowing to + bundle messages (true) or not (false). No guarantees are given. + o DelAck: a boolean that, if false, lets an application request that + the peer would not delay the acknowledgement for this message. + o Fragment: a boolean that expresses a preference for allowing to + fragment messages (true) or not (false), at the IP level. No guarantees are given. - (!UDP) idempotent: a boolean that expresses whether a frame is - idempotent (true) or not (false). Idempotent frames may arrive + o (!UDP) Idempotent: a boolean that expresses whether a message is + idempotent (true) or not (false). Idempotent messages may arrive multiple times at the receiver (but they will arrive at least once). When data is idempotent it can be used by the receiver - immediately on a connection establishment attempt. Thus, if - SEND_FRAME is used before connecting, stating that a frame is - idempotent facilitates transmitting it to the peer application - particularly early. 
- - (!UDP) CLOSE (flow-id) - - Closes the flow after all outstanding data is reliably delivered to - the peer (if reliable data delivery was requested). In case reliable - or partially reliable data delivery was requested earlier, the peer - is notified of the CLOSE. - - ABORT (flow-id) - Aborts the flow without delivering outstanding data to the peer. In - case reliable or partially reliable data delivery was requested - earlier (!UDP), the peer is notified of the ABORT. + immediately on a connection establishment attempt. Thus, if data + is handed over before the TAPS system establishes a transport + connection, stating that a message is idempotent facilitates + transmitting it to the peer application particularly early. - RECEIVE_FRAME (flow-id buffer) + An application can be notified of a failure to send a specific + message. There is no guarantee of such notifications, i.e. send + failures can also silently occur. - This receives a block of data. This block may or may not correspond - to a sender-side frame, i.e. the receiving application is not - informed about frame boundaries (this limitation is only needed for - TAPS systems that want to be able to fall back to TCP). However, if - the sending application has allowed that frames are not fully - reliably transferred, or delivered out of order, then such re- - ordering or unreliability may be reflected per frame in the arriving - data. Frames will always stay intact - i.e. if an incomplete frame - is contained at the end of the arriving data block, this frame is + When receiving data blocks, these blocks may or may not correspond to + a sender-side message, i.e. the receiving application is not informed + about message boundaries (this limitation is only needed for TAPS + systems that are implemented to directly use TCP). 
However, if the + sending application has allowed that messages are not fully reliably + transferred, or delivered out of order, then such re-ordering or + unreliability may be reflected per message in the arriving data. + Messages will always stay intact - i.e. if an incomplete message is + contained at the end of the arriving data block, this message is guaranteed to continue in the next arriving data block. - PARAMETERS: - - buffer: the buffer where the received data will be stored. - 5. Conclusion By decoupling applications from transport protocols, a TAPS system provides a different abstraction level than the Berkeley sockets interface. As with high- vs. low-level programming languages, a higher abstraction level allows more freedom for automation below the interface, yet it takes some control away from the application programmer. This is the design trade-off that a TAPS system developer is facing, and this document provides guidance on the design of this abstraction level. Some transport features are currently rarely offered by APIs, yet they must be offered or they can never be used ("functional" transport features). Other transport features are offered by the APIs of the protocols covered here, but not exposing them in a TAPS API would allow for more freedom to - automate protocol usage in a TAPS system. - - The minimal set presented in this document is an effort to find a - middle ground that can be recommended for TAPS systems to implement, - on the basis of the transport features discussed in [TAPS2]. This - middle ground eliminates a large number of transport features because - they do not require application-specific knowledge, but instead rely - on knowledge about the network or the Operating System. This leaves - us with an unanswered question about how exactly a TAPS system should - automate using all of these "automatable" transport features. 
- In some cases, it may be best to not entirely automate the decision - making, but leave it up to a system-wide policy. For example, when - multiple paths are available, a system policy could guide the - decision on whether to connect via a WiFi or a cellular interface. - Such high-level guidance could also be provided by application - developers, e.g. via a primitive that lets applications specify such - preferences. As long as this kind of information from applications - is treated as advisory, it will not lead to a permanent protocol - binding and does therefore not limit the flexibility of a TAPS - system. Decisions to add such primitives are therefore left open to - TAPS system designers. + automate protocol usage in a TAPS system. The minimal set presented + in this document is an effort to find a middle ground that can be + recommended for TAPS systems to implement, on the basis of the + transport features discussed in [TAPS2]. 6. Acknowledgements - The authors would like to thank the participants of the TAPS Working - Group and the NEAT research project for valuable input to this - document. We especially thank Michael Tuexen for help with TAPS flow - connection establishment/teardown and Gorry Fairhurst for his - suggestions regarding fragmentation and packet sizes. This work has - received funding from the European Union's Horizon 2020 research and - innovation programme under grant agreement No. 644334 (NEAT). + The authors would like to thank all the participants of the TAPS + Working Group and the NEAT and MAMI research projects for valuable + input to this document. We especially thank Michael Tuexen for help + with TAPS connection establishment/teardown and Gorry + Fairhurst for his suggestions regarding fragmentation and packet + sizes. This work has received funding from the European Union's + Horizon 2020 research and innovation programme under grant agreement + No. 644334 (NEAT). 7.
IANA Considerations XX RFC ED - PLEASE REMOVE THIS SECTION XXX This memo includes no request to IANA. 8. Security Considerations Authentication, confidentiality protection, and integrity protection @@ -843,45 +736,39 @@ [I-D.grinnemo-taps-he] Grinnemo, K., Brunstrom, A., Hurtig, P., Khademi, N., and Z. Bozakov, "Happy Eyeballs for Transport Selection", draft-grinnemo-taps-he-03 (work in progress), July 2017. [I-D.ietf-tsvwg-rtcweb-qos] Jones, P., Dhesikan, S., Jennings, C., and D. Druta, "DSCP Packet Markings for WebRTC QoS", draft-ietf-tsvwg-rtcweb- qos-18 (work in progress), August 2016. - [I-D.ietf-tsvwg-sctp-ndata] - Stewart, R., Tuexen, M., Loreto, S., and R. Seggelmann, - "Stream Schedulers and User Message Interleaving for the - Stream Control Transmission Protocol", draft-ietf-tsvwg- - sctp-ndata-13 (work in progress), September 2017. - [I-D.pauly-taps-transport-security] - Pauly, T. and C. Wood, "A Survey of Transport Security - Protocols", draft-pauly-taps-transport-security-00 (work - in progress), July 2017. - - [I-D.trammell-taps-post-sockets] - Trammell, B., Perkins, C., Pauly, T., Kuehlewind, M., and - C. Wood, "Post Sockets, An Abstract Programming Interface - for the Transport Layer", draft-trammell-taps-post- - sockets-01 (work in progress), September 2017. + Pauly, T., Rose, K., and C. Wood, "A Survey of Transport + Security Protocols", draft-pauly-taps-transport- + security-01 (work in progress), January 2018. [LBE-draft] Bless, R., "A Lower Effort Per-Hop Behavior (LE PHB)", - Internet-draft draft-tsvwg-le-phb-02, June 2017. + Internet-draft draft-tsvwg-le-phb-03, February 2018. [RFC2914] Floyd, S., "Congestion Control Principles", BCP 41, RFC 2914, DOI 10.17487/RFC2914, September 2000, . + [RFC3758] Stewart, R., Ramalho, M., Xie, Q., Tuexen, M., and P. + Conrad, "Stream Control Transmission Protocol (SCTP) + Partial Reliability Extension", RFC 3758, + DOI 10.17487/RFC3758, May 2004, + . + [RFC4895] Tuexen, M., Stewart, R., Lei, P., and E. 
Rescorla, "Authenticated Chunks for the Stream Control Transmission Protocol (SCTP)", RFC 4895, DOI 10.17487/RFC4895, August 2007, . [RFC4987] Eddy, W., "TCP SYN Flooding Attacks and Common Mitigations", RFC 4987, DOI 10.17487/RFC4987, August 2007, . [RFC5925] Touch, J., Mankin, A., and R. Bonica, "The TCP @@ -892,24 +779,41 @@ Yasevich, "Sockets API Extensions for the Stream Control Transmission Protocol (SCTP)", RFC 6458, DOI 10.17487/RFC6458, December 2011, . [RFC6525] Stewart, R., Tuexen, M., and P. Lei, "Stream Control Transmission Protocol (SCTP) Stream Reconfiguration", RFC 6525, DOI 10.17487/RFC6525, February 2012, . + [RFC7305] Lear, E., Ed., "Report from the IAB Workshop on Internet + Technology Adoption and Transition (ITAT)", RFC 7305, + DOI 10.17487/RFC7305, July 2014, + . + [RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014, . + [RFC7496] Tuexen, M., Seggelmann, R., Stewart, R., and S. Loreto, + "Additional Policies for the Partially Reliable Stream + Control Transmission Protocol Extension", RFC 7496, + DOI 10.17487/RFC7496, April 2015, + . + + [RFC8260] Stewart, R., Tuexen, M., Loreto, S., and R. Seggelmann, + "Stream Schedulers and User Message Interleaving for the + Stream Control Transmission Protocol", RFC 8260, + DOI 10.17487/RFC8260, November 2017, + . + [WWDC2015] Lakhera, P. and S. Cheshire, "Your App and Next Generation Networks", Apple Worldwide Developers Conference 2015, San Francisco, USA, June 2015, . Appendix A. Deriving the minimal set We approach the construction of a minimal set of transport features in the following way: @@ -935,21 +839,21 @@ Following [TAPS2], we divide the transport features into two main groups as follows: 1. CONNECTION related transport features - ESTABLISHMENT - AVAILABILITY - MAINTENANCE - TERMINATION - 2. DATA Transfer Related transport features + 2. 
DATA Transfer related transport features - Sending Data - Receiving Data - Errors We assume that TAPS applications have no specific requirements that need knowledge about the network, e.g. regarding the choice of network interface or the end-to-end path. Even with these assumptions, there are certain requirements that are strictly kept by transport protocols today, and these must also be kept by a TAPS system. Some of these requirements relate to transport features that @@ -982,22 +886,22 @@ marked as "ADDED". The corresponding transport features are automatable, and they are listed immediately below the "ADDED" transport feature. In this description, transport services are presented following the nomenclature "CATEGORY.[SUBCATEGORY].SERVICENAME.PROTOCOL", equivalent to "pass 2" in [TAPS2]. We also sketch how some of the TAPS transport features can be implemented by a TAPS system. For all transport features that are categorized as "functional" or "optimizing", and for which no matching TCP and/or UDP primitive - exists in "pass 2" of [TAPS2], a brief discussion on how to fall back - to TCP and/or UDP is included. + exists in "pass 2" of [TAPS2], a brief discussion on how to implement + them over TCP and/or UDP is included. We designate some transport features as "automatable" on the basis of a broader decision that affects multiple transport features: o Most transport features that are related to multi-streaming were designated as "automatable". This was done because the decision on whether to use multi-streaming or not does not depend on application-specific knowledge. This means that a connection that is exhibited to an application could be implemented by using a single stream of an SCTP association instead of mapping it to a @@ -1047,84 +951,84 @@ application-specific knowledge. Implementation: see Appendix A.3.2. 
o Specify number of attempts and/or timeout for the first establishment message Protocols: TCP, SCTP Functional because this is closely related to potentially assumed reliable data delivery for data that is sent before or during connection establishment. Implementation: Using a parameter of CONNECT.TCP and CONNECT.SCTP. - Fall-back to UDP: Do nothing (this is irrelevant in case of UDP - because there, reliable data delivery is not assumed). + Implementation over UDP: Do nothing (this is irrelevant in case of + UDP because there, reliable data delivery is not assumed). o Obtain multiple sockets Protocols: SCTP Automatable because the usage of multiple paths to communicate to the same end host relates to knowledge about the network, not the application. o Disable MPTCP Protocols: MPTCP Automatable because the usage of multiple paths to communicate to the same end host relates to knowledge about the network, not the application. Implementation: via a boolean parameter in CONNECT.MPTCP. o Configure authentication Protocols: TCP, SCTP Functional because this has a direct influence on security. Implementation: via parameters in CONNECT.TCP and CONNECT.SCTP. - Fall-back to TCP: With TCP, this allows to configure Master Key - Tuples (MKTs) to authenticate complete segments (including the TCP - IPv4 pseudoheader, TCP header, and TCP data). With SCTP, this + Implementation over TCP: With TCP, this allows to configure Master + Key Tuples (MKTs) to authenticate complete segments (including the + TCP IPv4 pseudoheader, TCP header, and TCP data). With SCTP, this allows to specify which chunk types must always be authenticated. Authenticating only certain chunk types creates a reduced level of security that is not supported by TCP; to be compatible, this should therefore only allow to authenticate all chunk types. Key material must be provided in a way that is compatible with both [RFC4895] and [RFC5925]. - Fall-back to UDP: Not possible. + Implementation over UDP: Not possible. 
o Indicate (and/or obtain upon completion) an Adaptation Layer via an adaptation code point Protocols: SCTP Functional because it allows to send extra data for the sake of identifying an adaptation layer, which by itself is application- specific. Implementation: via a parameter in CONNECT.SCTP. - Fall-back to TCP: not possible. - Fall-back to UDP: not possible. + Implementation over TCP: not possible. + Implementation over UDP: not possible. o Request to negotiate interleaving of user messages Protocols: SCTP Automatable because it requires using multiple streams, but requesting multiple streams in the CONNECTION.ESTABLISHMENT category is automatable. Implementation: via a parameter in CONNECT.SCTP. o Hand over a message to reliably transfer (possibly multiple times) before connection establishment Protocols: TCP Functional because this is closely tied to properties of the data that an application sends or expects to receive. Implementation: via a parameter in CONNECT.TCP. - Fall-back to UDP: not possible. + Implementation over UDP: not possible. o Hand over a message to reliably transfer during connection establishment Protocols: SCTP Functional because this can only work if the message is limited in size, making it closely tied to properties of the data that an application sends or expects to receive. Implementation: via a parameter in CONNECT.SCTP. - Fall-back to UDP: not possible. + Implementation over UDP: not possible. o Enable UDP encapsulation with a specified remote UDP port number Protocols: SCTP Automatable because UDP encapsulation relates to knowledge about the network, not the application. AVAILABILITY: o Listen Protocols: TCP, SCTP, UDP(-Lite) @@ -1167,98 +1071,99 @@ o Disable MPTCP Protocols: MPTCP Automatable because the usage of multiple paths to communicate to the same end host relates to knowledge about the network, not the application. o Configure authentication Protocols: TCP, SCTP Functional because this has a direct influence on security. 
Implementation: via parameters in LISTEN.TCP and LISTEN.SCTP. - Fall-back to TCP: With TCP, this allows to configure Master Key - Tuples (MKTs) to authenticate complete segments (including the TCP - IPv4 pseudoheader, TCP header, and TCP data). With SCTP, this + Implementation over TCP: With TCP, this allows to configure Master + Key Tuples (MKTs) to authenticate complete segments (including the + TCP IPv4 pseudoheader, TCP header, and TCP data). With SCTP, this allows to specify which chunk types must always be authenticated. Authenticating only certain chunk types creates a reduced level of security that is not supported by TCP; to be compatible, this should therefore only allow to authenticate all chunk types. Key material must be provided in a way that is compatible with both [RFC4895] and [RFC5925]. - Fall-back to UDP: not possible. + Implementation over UDP: not possible. o Obtain requested number of streams Protocols: SCTP Automatable because using multi-streaming does not require application-specific knowledge. Implementation: see Appendix A.3.2. o Limit the number of inbound streams Protocols: SCTP Automatable because using multi-streaming does not require application-specific knowledge. Implementation: see Appendix A.3.2. o Indicate (and/or obtain upon completion) an Adaptation Layer via an adaptation code point Protocols: SCTP Functional because it allows to send extra data for the sake of identifying an adaptation layer, which by itself is application- specific. Implementation: via a parameter in LISTEN.SCTP. - Fall-back to TCP: not possible. - Fall-back to UDP: not possible. + Implementation over TCP: not possible. + Implementation over UDP: not possible. o Request to negotiate interleaving of user messages Protocols: SCTP Automatable because it requires using multiple streams, but requesting multiple streams in the CONNECTION.ESTABLISHMENT category is automatable. Implementation: via a parameter in LISTEN.SCTP. 
MAINTENANCE: o Change timeout for aborting connection (using retransmit limit or time value) Protocols: TCP, SCTP Functional because this is closely related to potentially assumed reliable data delivery. Implementation: via CHANGE-TIMEOUT.TCP or CHANGE-TIMEOUT.SCTP. - Fall-back to UDP: not possible (UDP is unreliable and there is no - connection timeout). + Implementation over UDP: not possible (UDP is unreliable and there + is no connection timeout). o Suggest timeout to the peer Protocols: TCP Functional because this is closely related to potentially assumed reliable data delivery. Implementation: via CHANGE-TIMEOUT.TCP. - Fall-back to UDP: not possible (UDP is unreliable and there is no - connection timeout). + Implementation over UDP: not possible (UDP is unreliable and there + is no connection timeout). o Disable Nagle algorithm Protocols: TCP, SCTP Optimizing because this decision depends on knowledge about the size of future data blocks and the delay between them. Implementation: via DISABLE-NAGLE.TCP and DISABLE-NAGLE.SCTP. - Fall-back to UDP: do nothing (UDP does not implement the Nagle - algorithm). + Implementation over UDP: do nothing (UDP does not implement the + Nagle algorithm). o Request an immediate heartbeat, returning success/failure Protocols: SCTP Automatable because this informs about network-specific knowledge. o Notification of Excessive Retransmissions (early warning below abortion threshold) Protocols: TCP Optimizing because it is an early warning to the application, informing it of an impending functional event. Implementation: via ERROR.TCP. - Fall-back to UDP: do nothing (there is no abortion threshold). + Implementation over UDP: do nothing (there is no abortion + threshold). 
o Add path Protocols: MPTCP, SCTP MPTCP Parameters: source-IP; source-Port; destination-IP; destination-Port SCTP Parameters: local IP address Automatable because the usage of multiple paths to communicate to the same end host relates to knowledge about the network, not the application. @@ -1321,40 +1226,41 @@ Protocols: SCTP Automatable because it requires using multiple streams, but requesting multiple streams in the CONNECTION.ESTABLISHMENT category is automatable. Implementation: via a parameter in GETINTERL.SCTP. o Change authentication parameters Protocols: TCP, SCTP Functional because this has a direct influence on security. Implementation: via SET_AUTH.TCP and SET_AUTH.SCTP. - Fall-back to TCP: With SCTP, this allows to adjust key_id, key, - and hmac_id. With TCP, this allows to change the preferred + Implementation over TCP: With SCTP, this allows to adjust key_id, + key, and hmac_id. With TCP, this allows to change the preferred outgoing MKT (current_key) and the preferred incoming MKT (rnext_key), respectively, for a segment that is sent on the connection. Key material must be provided in a way that is compatible with both [RFC4895] and [RFC5925]. - Fall-back to UDP: not possible. + Implementation over UDP: not possible. o Obtain authentication information Protocols: SCTP Functional because authentication decisions may have been made by the peer, and this has an influence on the necessary application- level measures to provide a certain level of security. Implementation: via GETAUTH.SCTP. - Fall-back to TCP: With SCTP, this allows to obtain key_id and a - chunk list. With TCP, this allows to obtain current_key and + + Implementation over TCP: With SCTP, this allows to obtain key_id + and a chunk list. With TCP, this allows to obtain current_key and rnext_key from a previously received segment. Key material must be provided in a way that is compatible with both [RFC4895] and [RFC5925]. - Fall-back to UDP: not possible. 
+ Implementation over UDP: not possible. o Reset Stream Protocols: SCTP Automatable because using multi-streaming does not require application-specific knowledge. Implementation: see Appendix A.3.2. o Notification of Stream Reset Protocols: SCTP Automatable because using multi-streaming does not require @@ -1385,33 +1291,33 @@ Implementation: see Appendix A.3.2. o Choose a scheduler to operate between streams of an association Protocols: SCTP Optimizing because the scheduling decision requires application-specific knowledge. However, if a TAPS system would not use this, or wrongly configure it on its own, this would only affect the performance of data transfers; the outcome would still be correct within the "best effort" service model. Implementation: using SETSTREAMSCHEDULER.SCTP. - Fall-back to TCP: do nothing. - Fall-back to UDP: do nothing. + Implementation over TCP: do nothing. + Implementation over UDP: do nothing. o Configure priority or weight for a scheduler Protocols: SCTP Optimizing because the priority or weight requires application-specific knowledge. However, if a TAPS system would not use this, or wrongly configure it on its own, this would only affect the performance of data transfers; the outcome would still be correct within the "best effort" service model. Implementation: using CONFIGURESTREAMSCHEDULER.SCTP. - Fall-back to TCP: do nothing. - Fall-back to UDP: do nothing. + Implementation over TCP: do nothing. + Implementation over UDP: do nothing. o Configure send buffer size Protocols: SCTP Automatable because this decision relates to knowledge about the network and the Operating System, not the application (see also the discussion in Appendix A.3.4). o Configure receive buffer (and rwnd) size Protocols: SCTP Automatable because this decision relates to knowledge about the @@ -1437,99 +1343,99 @@ (it can be relevant for a sending application to request not to delay the SACK of a message, but this is a different transport feature).
o Set Cookie life value Protocols: SCTP Functional because it relates to security (possibly weakened by keeping a cookie very long) versus the time between connection establishment attempts. Knowledge about both issues can be application-specific. - - Fall-back to TCP: the closest specified TCP functionality is the - cookie in TCP Fast Open; for this, [RFC7413] states that the - server "can expire the cookie at any time to enhance security" and - section 4.1.2 describes an example implementation where updating - the key on the server side causes the cookie to expire. + Implementation over TCP: the closest specified TCP functionality + is the cookie in TCP Fast Open; for this, [RFC7413] states that + the server "can expire the cookie at any time to enhance security" + and section 4.1.2 describes an example implementation where + updating the key on the server side causes the cookie to expire. Alternatively, for implementations that do not support TCP Fast Open, this transport feature could also affect the validity of SYN cookies (see Section 3.6 of [RFC4987]). - Fall-back to UDP: do nothing. + Implementation over UDP: do nothing. o Set maximum burst Protocols: SCTP Automatable because it relates to knowledge about the network, not the application. o Configure size where messages are broken up for partial delivery Protocols: SCTP Functional because this is closely tied to properties of the data that an application sends or expects to receive. - Fall-back to TCP: not possible. - Fall-back to UDP: not possible. + Implementation over TCP: not possible. + Implementation over UDP: not possible. o Disable checksum when sending Protocols: UDP Functional because application-specific knowledge is necessary to decide whether it can be acceptable to lose data integrity. Implementation: via SET_CHECKSUM_ENABLED.UDP. - Fall-back to TCP: do nothing. + Implementation over TCP: do nothing. 
o Disable checksum requirement when receiving Protocols: UDP Functional because application-specific knowledge is necessary to decide whether it can be acceptable to lose data integrity. Implementation: via SET_CHECKSUM_REQUIRED.UDP. - Fall-back to TCP: do nothing. + Implementation over TCP: do nothing. o Specify checksum coverage used by the sender Protocols: UDP-Lite Functional because application-specific knowledge is necessary to decide for which parts of the data it can be acceptable to lose data integrity. Implementation: via SET_CHECKSUM_COVERAGE.UDP-Lite. - Fall-back to TCP: do nothing. + Implementation over TCP: do nothing. o Specify minimum checksum coverage required by receiver Protocols: UDP-Lite Functional because application-specific knowledge is necessary to decide for which parts of the data it can be acceptable to lose data integrity. Implementation: via SET_MIN_CHECKSUM_COVERAGE.UDP-Lite. - Fall-back to TCP: do nothing. + Implementation over TCP: do nothing. o Specify DF field Protocols: UDP(-Lite) Optimizing because the DF field can be used to carry out Path MTU Discovery, which can lead an application to choose message sizes that can be transmitted more efficiently. + Implementation: via MAINTENANCE.SET_DF.UDP(-Lite) and SEND_FAILURE.UDP(-Lite). - Fall-back to TCP: do nothing. With TCP the sender is not in - control of transport message sizes, making this functionality + Implementation over TCP: do nothing. With TCP the sender is not + in control of transport message sizes, making this functionality irrelevant. o Get max. transport-message size that may be sent using a non- fragmented IP packet from the configured interface Protocols: UDP(-Lite) Optimizing because this can lead an application to choose message sizes that can be transmitted more efficiently. - Fall-back to TCP: do nothing: this information is not available - with TCP. + Implementation over TCP: do nothing: this information is not + available with TCP. o Get max. 
transport-message size that may be received from the configured interface Protocols: UDP(-Lite) Optimizing because this can, for example, influence an application's memory management. - Fall-back to TCP: do nothing: this information is not available - with TCP. + Implementation over TCP: do nothing: this information is not + available with TCP. o Specify TTL/Hop count field Protocols: UDP(-Lite) Automatable because a TAPS system can use a large enough system default to avoid communication failures. Allowing an application to configure it differently can produce notifications of ICMP error message arrivals that yield information which only relates to knowledge about the network, not the application. o Obtain TTL/Hop count field @@ -1541,22 +1447,22 @@ Protocols: UDP(-Lite) Automatable because the ECN field relates to knowledge about the network, not the application. o Obtain ECN field Protocols: UDP(-Lite) Optimizing because this information can be used by an application to better carry out congestion control (this is relevant when choosing a data transmission transport service that does not already do congestion control). - Fall-back to TCP: do nothing: this information is not available - with TCP. + Implementation over TCP: do nothing: this information is not + available with TCP. o Specify IP Options Protocols: UDP(-Lite) Automatable because IP Options relate to knowledge about the network, not the application. o Obtain IP Options Protocols: UDP(-Lite) Automatable because IP Options relate to knowledge about the network, not the application. @@ -1565,307 +1471,312 @@ Protocols: A protocol implementing the LEDBAT congestion control mechanism Optimizing because whether this service is appropriate or not depends on application-specific knowledge. 
However, wrongly using this will only affect the speed of data transfers (albeit including other transfers that may compete with the TAPS transfer in the network), so it is still correct within the "best effort" service model. Implementation: via CONFIGURE.LEDBAT and/or SET_DSCP.TCP / SET_DSCP.SCTP / SET_DSCP.UDP(-Lite) [LBE-draft]. - Fall-back to TCP: do nothing. - Fall-back to UDP: do nothing. + Implementation over TCP: do nothing. + Implementation over UDP: do nothing. TERMINATION: o Close after reliably delivering all remaining data, causing an event informing the application on the other side Protocols: TCP, SCTP Functional because the notion of a connection is often reflected in applications as an expectation to have all outstanding data delivered and no longer be able to communicate after a "Close" succeeded, with a communication sequence relating to this transport feature that is defined by the application protocol. Implementation: via CLOSE.TCP and CLOSE.SCTP. - Fall-back to UDP: not possible. + Implementation over UDP: not possible. o Abort without delivering remaining data, causing an event informing the application on the other side Protocols: TCP, SCTP Functional because the notion of a connection is often reflected in applications as an expectation to potentially not have all outstanding data delivered and no longer be able to communicate after an "Abort" succeeded. On both sides of a connection, an application protocol may define a communication sequence relating to this transport feature. Implementation: via ABORT.TCP and ABORT.SCTP. - Fall-back to UDP: not possible. + Implementation over UDP: not possible. o Abort without delivering remaining data, not causing an event informing the application on the other side Protocols: UDP(-Lite) Functional because the notion of a connection is often reflected in applications as an expectation to potentially not have all outstanding data delivered and no longer be able to communicate after an "Abort" succeeded. 
On both sides of a connection, an application protocol may define a communication sequence relating to this transport feature. Implementation: via ABORT.UDP(-Lite). - Fall-back to TCP: stop using the connection, wait for a timeout. + Implementation over TCP: stop using the connection, wait for a + timeout. o Timeout event when data could not be delivered for too long Protocols: TCP, SCTP Functional because this notifies that potentially assumed reliable data delivery is no longer provided. Implementation: via TIMEOUT.TCP and TIMEOUT.SCTP. - Fall-back to UDP: do nothing: this event will not occur with UDP. + Implementation over UDP: do nothing: this event will not occur + with UDP. A.1.2. DATA Transfer Related Transport Features A.1.2.1. Sending Data o Reliably transfer data, with congestion control Protocols: TCP, SCTP Functional because this is closely tied to properties of the data that an application sends or expects to receive. Implementation: via SEND.TCP and SEND.SCTP. - Fall-back to UDP: not possible. + Implementation over UDP: not possible. o Reliably transfer a message, with congestion control Protocols: SCTP Functional because this is closely tied to properties of the data that an application sends or expects to receive. Implementation: via SEND.SCTP. - Fall-back to TCP: via SEND.TCP. With SEND.TCP, messages will not - be identifiable by the receiver. - Fall-back to UDP: not possible. + Implementation over TCP: via SEND.TCP. With SEND.TCP, messages + will not be identifiable by the receiver. + Implementation over UDP: not possible. o Unreliably transfer a message Protocols: SCTP, UDP(-Lite) Optimizing because only applications know about the time criticality of their communication, and reliably transferring a message is never incorrect for the receiver of a potentially unreliable data transfer, it is just slower. ADDED. This differs from the 2 automatable transport features below in that it leaves the choice of congestion control open.
Implementation: via SEND.SCTP or SEND.UDP(-Lite). - Fall-back to TCP: use SEND.TCP. With SEND.TCP, messages will be - sent reliably, and they will not be identifiable by the receiver. + Implementation over TCP: use SEND.TCP. With SEND.TCP, messages + will be sent reliably, and they will not be identifiable by the + receiver. o Unreliably transfer a message, with congestion control Protocols: SCTP Automatable because congestion control relates to knowledge about the network, not the application. o Unreliably transfer a message, without congestion control Protocols: UDP(-Lite) Automatable because congestion control relates to knowledge about the network, not the application. o Configurable Message Reliability Protocols: SCTP Optimizing because only applications know about the time criticality of their communication, and reliably transferring a message is never incorrect for the receiver of a potentially unreliable data transfer, it is just slower. Implementation: via SEND.SCTP. - Fall-back to TCP: By using SEND.TCP and ignoring this + Implementation over TCP: By using SEND.TCP and ignoring this configuration: based on the assumption of the best-effort service model, unnecessarily delivering data does not violate application expectations. Moreover, it is not possible to associate the requested reliability to a "message" in TCP anyway. - Fall-back to UDP: not possible. + Implementation over UDP: not possible. o Choice of stream Protocols: SCTP Automatable because it requires using multiple streams, but requesting multiple streams in the CONNECTION.ESTABLISHMENT category is automatable. Implementation: see Appendix A.3.2. o Choice of path (destination address) Protocols: SCTP Automatable because it requires using multiple sockets, but obtaining multiple sockets in the CONNECTION.ESTABLISHMENT category is automatable.
o Ordered message delivery (potentially slower than unordered) Protocols: SCTP Functional because this is closely tied to properties of the data that an application sends or expects to receive. Implementation: via SEND.SCTP. - Fall-back to TCP: By using SEND.TCP. With SEND.TCP, messages will - not be identifiable by the receiver. - Fall-back to UDP: not possible. + Implementation over TCP: By using SEND.TCP. With SEND.TCP, + messages will not be identifiable by the receiver. + Implementation over UDP: not possible. o Unordered message delivery (potentially faster than ordered) Protocols: SCTP, UDP(-Lite) Functional because this is closely tied to properties of the data that an application sends or expects to receive. Implementation: via SEND.SCTP. - Fall-back to TCP: By using SEND.TCP and always sending data + Implementation over TCP: By using SEND.TCP and always sending data ordered: based on the assumption of the best-effort service model, ordered delivery may just be slower and does not violate application expectations. Moreover, it is not possible to associate the requested delivery order to a "message" in TCP anyway. o Request not to bundle messages Protocols: SCTP Optimizing because this decision depends on knowledge about the size of future data blocks and the delay between them. Implementation: via SEND.SCTP. - Fall-back to TCP: By using SEND.TCP and DISABLE-NAGLE.TCP to - disable the Nagle algorithm when the request is made and enable it - again when the request is no longer made. Note that this is not - fully equivalent because it relates to the time of issuing the + Implementation over TCP: By using SEND.TCP and DISABLE-NAGLE.TCP + to disable the Nagle algorithm when the request is made and enable + it again when the request is no longer made. Note that this is + not fully equivalent because it relates to the time of issuing the request rather than a specific message. - Fall-back to UDP: do nothing (UDP never bundles messages). 
+ Implementation over UDP: do nothing (UDP never bundles messages). o Specifying a "payload protocol-id" (handed over as such by the receiver) Protocols: SCTP Functional because it allows to send extra application data with every message, for the sake of identification of data, which by itself is application-specific. Implementation: SEND.SCTP. - Fall-back to TCP: not possible. - Fall-back to UDP: not possible. + Implementation over TCP: not possible. + Implementation over UDP: not possible. o Specifying a key id to be used to authenticate a message Protocols: SCTP Functional because this has a direct influence on security. Implementation: via a parameter in SEND.SCTP. - Fall-back to TCP: This could be emulated by using SET_AUTH.TCP - before and after the message is sent. Note that this is not fully - equivalent because it relates to the time of issuing the request - rather than a specific message. - Fall-back to UDP: not possible. + Implementation over TCP: This could be emulated by using + SET_AUTH.TCP before and after the message is sent. Note that this + is not fully equivalent because it relates to the time of issuing + the request rather than a specific message. + Implementation over UDP: not possible. o Request not to delay the acknowledgement (SACK) of a message Protocols: SCTP Optimizing because only an application knows for which message it wants to quickly be informed about success / failure of its delivery. - Fall-back to TCP: do nothing. - Fall-back to UDP: do nothing. + Implementation over TCP: do nothing. + Implementation over UDP: do nothing. A.1.2.2. Receiving Data o Receive data (with no message delimiting) Protocols: TCP Functional because a TAPS system must be able to send and receive data. Implementation: via RECEIVE.TCP. - Fall-back to UDP: do nothing (hand over a message, let the - application ignore frame boundaries). + Implementation over UDP: do nothing (hand over a message, let the + application ignore message boundaries). 
o Receive a message Protocols: SCTP, UDP(-Lite) Functional because this is closely tied to properties of the data that an application sends or expects to receive. Implementation: via RECEIVE.SCTP and RECEIVE.UDP(-Lite). - Fall-back to TCP: not possible. + Implementation over TCP: not possible. o Choice of stream to receive from Protocols: SCTP Automatable because it requires using multiple streams, but requesting multiple streams in the CONNECTION.ESTABLISHMENT category is automatable. Implementation: see Appendix A.3.2. o Information about partial message arrival Protocols: SCTP Functional because this is closely tied to properties of the data that an application sends or expects to receive. Implementation: via RECEIVE.SCTP. - Fall-back to TCP: do nothing: this information is not available - with TCP. - Fall-back to UDP: do nothing: this information is not available - with UDP. + Implementation over TCP: do nothing: this information is not + available with TCP. + + Implementation over UDP: do nothing: this information is not + available with UDP. A.1.2.3. Errors This section describes sending failures that are associated with a specific call in the "Sending Data" category (Appendix A.1.2.1). o Notification of send failures Protocols: SCTP, UDP(-Lite) Functional because this notifies that potentially assumed reliable data delivery is no longer provided. ADDED. This differs from the 2 automatable transport features below in that it does not distinguish between unsent and unacknowledged messages. Implementation: via SENDFAILURE-EVENT.SCTP and SEND_FAILURE.UDP(-Lite). - Fall-back to TCP: do nothing: this notification is not available - and will therefore not occur with TCP. + Implementation over TCP: do nothing: this notification is not + available and will therefore not occur with TCP. o Notification of an unsent (part of a) message Protocols: SCTP, UDP(-Lite) Automatable because the distinction between unsent and unacknowledged is network-specific.
o Notification of an unacknowledged (part of a) message Protocols: SCTP Automatable because the distinction between unsent and unacknowledged is network-specific. o Notification that the stack has no more user data to send Protocols: SCTP Optimizing because reacting to this notification requires the application to be involved, and ensuring that the stack does not run dry of data (for too long) can improve performance. - Fall-back to TCP: do nothing. See also the discussion in + Implementation over TCP: do nothing. See also the discussion in Appendix A.3.4. - Fall-back to UDP: do nothing. This notification is not available - and will therefore not occur with UDP. + Implementation over UDP: do nothing. This notification is not + available and will therefore not occur with UDP. o Notification to a receiver that a partial message delivery has been aborted Protocols: SCTP Functional because this is closely tied to properties of the data that an application sends or expects to receive. - Fall-back to TCP: do nothing. This notification is not available - and will therefore not occur with TCP. - Fall-back to UDP: do nothing. This notification is not available - and will therefore not occur with UDP. + Implementation over TCP: do nothing. This notification is not + available and will therefore not occur with TCP. + Implementation over UDP: do nothing. This notification is not + available and will therefore not occur with UDP. A.2. Step 2: Reduction -- The Reduced Set of Transport Features By hiding automatable transport features from the application, a TAPS system can gain opportunities to automate the usage of network- related functionality. This can facilitate using the TAPS system for the application programmer and it allows for optimizations that may not be possible for an application. For instance, system-wide configurations regarding the usage of multiple interfaces can better be exploited if the choice of the interface is not entirely up to the application. 
Therefore, since they are not strictly necessary to expose in a TAPS system, we do not include automatable transport features in the reduced set of transport features. This leaves us with only the transport features that are either optimizing or functional. - A TAPS system should be able to fall back to TCP or UDP if + A TAPS system should be able to communicate via TCP or UDP if alternative transport protocols are found not to work. For many transport features, this is possible -- often by simply not doing - anything. For some transport features, however, it was identified - that neither a fall-back to TCP nor a fall-back to UDP is possible: - in these cases, even not doing anything would incur semantically - incorrect behavior. Whenever an application would make use of one of - these transport features, this would eliminate the possibility to use - TCP or UDP. Thus, we only keep the functional and optimizing - transport features for which a fall-back to either TCP or UDP is - possible in our reduced set. + anything when a specific request is made. For some transport + features, however, it was identified that direct usage of neither TCP + nor UDP is possible: in these cases, even not doing anything would + incur semantically incorrect behavior. Whenever an application would + make use of one of these transport features, this would eliminate the + possibility to use TCP or UDP. Thus, we only keep the functional and + optimizing transport features for which an implementation over either + TCP or UDP is possible in our reduced set. - In the following list, we precede a transport feature with "T:" if a - fall-back to TCP is possible, "U:" if a fall-back to UDP is possible, - and "TU:" if a fall-back to either TCP or UDP is possible. + In the following list, we precede a transport feature with "T:" if an + implementation over TCP is possible, "U:" if an implementation over + UDP is possible, and "TU:" if an implementation over either TCP or + UDP is possible. A.2.1. 
CONNECTION Related Transport Features ESTABLISHMENT: o T,U: Connect o T,U: Specify number of attempts and/or timeout for the first establishment message o T: Configure authentication o T: Hand over a message to reliably transfer (possibly multiple @@ -1947,93 +1857,94 @@ o T,U: Notification to a receiver that a partial message delivery has been aborted A.3. Step 3: Discussion The reduced set in the previous section exhibits a number of peculiarities, which we will discuss in the following. This section focuses on TCP because, with the exception of one particular transport feature ("Receive a message" -- we will discuss this in Appendix A.3.1), the list shows that UDP is strictly a subset of TCP. - We can first try to understand how to build a TAPS system that is - able to fall back to TCP, and then narrow down the result further to - allow that the system can always fall back to either TCP or UDP - (which effectively means removing everything related to reliability, - ordering, authentication and closing/aborting with a notification to - the peer). + We can first try to understand how to build a TAPS system that can + run over TCP, and then narrow down the result further to allow that + the system can always run over either TCP or UDP (which effectively + means removing everything related to reliability, ordering, + authentication and closing/aborting with a notification to the peer). Note that, because the functional transport features of UDP are -- with the exception of "Receive a message" -- a subset of TCP, TCP can - be used as a fall-back for UDP whenever an application does not need - message delimiting (e.g., because the application-layer protocol + be used as a replacement for UDP whenever an application does not + need message delimiting (e.g., because the application-layer protocol already does it). 
This has been recognized by many applications that already do this in practice, by trying to communicate with UDP at first, and falling back to TCP in case of a connection failure. A.3.1. Sending Messages, Receiving Bytes - When considering to fall back to TCP, there are several transport + For implementing a TAPS system over TCP, there are several transport features related to sending, but only a single transport feature related to receiving: "Receive data (with no message delimiting)" (and, strangely, "information about partial message arrival"). Notably, the transport feature "Receive a message" is also the only - non-automatable transport feature of UDP(-Lite) for which no fall- - back to TCP is possible. + non-automatable transport feature of UDP(-Lite) for which no + implementation over TCP is possible. To support these TCP receiver semantics, we define an "Application- Framed Bytestream" (AFra-Bytestream). AFra-Bytestreams allow senders to operate on messages while minimizing changes to the TCP socket API. In particular, nothing changes on the receiver side - data can be accepted via a normal TCP socket. In an AFra-Bytestream, the sending application can optionally inform - the transport about frame boundaries and required properties per - frame (configurable order and reliability, or embedding a request not - to delay the acknowledgement of a frame). Whenever the sending - application specifies per-frame properties that relax the notion of + the transport about message boundaries and required properties per + message (configurable order and reliability, or embedding a request + not to delay the acknowledgement of a message). Whenever the sending + application specifies per-message properties that relax the notion of reliable in-order delivery of bytes, it must assume that the - receiving application is 1) able to determine frame boundaries, - provided that frames are always kept intact, and 2) able to accept - these relaxed per-frame properties. 
Any signaling of such + receiving application is 1) able to determine message boundaries, + provided that messages are always kept intact, and 2) able to accept + these relaxed per-message properties. Any signaling of such information to the peer is up to an application-layer protocol and considered out of scope of this document. For example, if an application requests to transfer fixed-size messages of 100 bytes with partial reliability, this needs the receiving application to be prepared to accept data in chunks of 100 bytes. If, then, some of these 100-byte messages are missing (e.g., if SCTP with Configurable Reliability is used), this is the expected application behavior. With TCP, no messages would be missing, but this is also correct for the application, and the possible retransmission delay is acceptable within the best effort service - model. Still, the receiving application would separate the byte - stream into 100-byte chunks. + model [RFC7305]. Still, the receiving application would separate the + byte stream into 100-byte chunks. Note that this usage of messages does not require all messages to be equal in size. Many application protocols use some form of Type- Length-Value (TLV) encoding, e.g. by defining a header including length fields; another alternative is the use of byte stuffing methods such as COBS [COBS]. If an application needs message numbers, e.g. to restore the correct sequence of messages, these must also be encoded by the application itself, as the sequence number - related transport features of SCTP are no longer provided (in the - interest of enabling a fall-back to TCP). + related transport features of SCTP are not provided by the "minimum + set" (in the interest of enabling usage of TCP). + + !!!NOTE: IMPLEMENTATION DETAILS BELOW WILL BE MOVED TO A SEPARATE + DRAFT IN A FUTURE VERSION.!!! 
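As a rough illustration of the application-level framing described above, the following is a minimal sketch of a length-prefixed encoding (a simple special case of TLV) that lets a receiving application recover message boundaries from a plain byte stream such as a TCP socket. The 2-byte length header and the names frame and Deframer are assumptions made for this example only; the draft does not prescribe any particular encoding.

```python
import struct
from typing import List

# Hypothetical wire format: 2-byte big-endian length, then the payload.
# This is an assumption for illustration; the draft leaves framing to the
# application-layer protocol.
HEADER = struct.Struct("!H")


def frame(message: bytes) -> bytes:
    """Prefix a message with its length so the receiver can delimit it."""
    if len(message) > 0xFFFF:
        raise ValueError("message too long for a 2-byte length field")
    return HEADER.pack(len(message)) + message


class Deframer:
    """Reassembles whole messages from arbitrary chunks of a byte stream."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> List[bytes]:
        """Add received bytes; return every message that is now complete."""
        self._buf += chunk
        messages = []
        while len(self._buf) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buf)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # message incomplete; wait for more bytes
            messages.append(bytes(self._buf[HEADER.size:end]))
            del self._buf[:end]
        return messages
```

The receiver only ever consumes complete messages from the front of its buffer, so it relies on the property that messages stay intact and that the byte stream never skips ahead. If a transport with partial reliability drops a whole framed message, the remaining stream is still parseable.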
For the implementation of a TAPS system, this has the following consequences: o Because the receiver-side transport leaves it up to the application to delimit messages, messages must always remain intact as they are handed over by the transport receiver. Data can be handed over at any time as they arrive, but the byte stream must never "skip ahead" to the beginning of the next message. - o With SCTP, a "partial flag" informs a receiving application that a message is incomplete. Then, the next receive calls will only deliver remaining parts of the same message (i.e., no messages or partial messages will arrive on other streams until the message is complete) (see Section 8.1.20 in [RFC6458]). This can facilitate the implementation of the receiver buffer in the receiving application, but then such an application does not support message interleaving (which is required by stream schedulers). However, receiving a byte stream from multiple SCTP streams requires a per- stream receiver buffer anyway, so this potential benefit is lost @@ -2047,95 +1958,108 @@ comes at no additional implementation cost on the receiver side. Stream schedulers operate on the sender side. Hence, because a TAPS sender-side application may talk to an SCTP receiver that does not support interleaving, it cannot assume that stream schedulers will always work as expected. A.3.2. Stream Schedulers Without Streams We have already stated that multi-streaming does not require application-specific knowledge. Potential benefits or disadvantages - of, e.g., using two streams over an SCTP association versus using two + of, e.g., using two streams of an SCTP association versus using two separate SCTP associations or TCP connections are related to knowledge about the network and the particular transport protocol in use, not the application. However, the transport features "Choose a scheduler to operate between streams of an association" and "Configure priority or weight for a scheduler" operate on streams. 
Here, streams identify communication channels between which a scheduler operates, and they can be assigned a priority. Moreover, the transport features in the MAINTENANCE category all operate on associations in case of SCTP, i.e. they apply to all streams in that association. With only these semantics necessary to represent, the interface to a - TAPS system becomes easier if we rename connections into "TAPS flows" - (the TAPS equivalent of a connection which may be a transport - connection or association, but could also become a stream of an - existing SCTP association, for example) and allow assigning a "Group - Number" to a TAPS flow. Then, all MAINTENANCE transport features can - be said to operate on flow groups, not connections, and a scheduler - also operates on the flows within a group. + TAPS system becomes easier if we assume that TAPS connections may be + a transport connection or association, but could also be a stream of + an existing SCTP association, for example. We only need to allow for + a way to define a possible grouping of TAPS connections. Then, all + MAINTENANCE transport features can be said to operate on TAPS + connection groups, not TAPS connections, and a scheduler operates on + the connections within a group. + + !!!NOTE: IMPLEMENTATION DETAILS BELOW WILL BE MOVED TO A SEPARATE + DRAFT IN A FUTURE VERSION.!!! For the implementation of a TAPS system, this has the following consequences: o Streams may be identified in different ways across different protocols. The only multi-streaming protocol considered in this document, SCTP, uses a stream id. The transport association below still uses a Transport Address (which includes one port number) for each communicating endpoint. To implement a TAPS system without exposed streams, an application must be given an - identifier for each TAPS flow (akin to a socket), and depending on - whether streams are used or not, there will be a 1:1 mapping - between this identifier and local ports or not.
+ identifier for each TAPS connection (akin to a socket), and + depending on whether streams are used or not, there will be a 1:1 + mapping between this identifier and local ports or not. o In SCTP, a fixed number of streams exists from the beginning of an association; streams are not "established", there is no handshake or any other form of signaling to create them: they can just be used. They are also not "gracefully shut down" -- at best, an "SSN Reset Request Parameter" in a "RE-CONFIG" chunk [RFC6525] can be used to inform the peer of a "Stream Reset", as a rough equivalent of an "Abort". This has an impact on the semantics of - connection establishment and teardown (see Section 3.2). + connection establishment and teardown (see Section 3.1). + o To support stream schedulers, a receiver-side TAPS system should always support message interleaving because it comes at no additional implementation cost (because of the receiver-side stream reception discussed in Appendix A.3.1). Note, however, that stream schedulers operate on the sender side. Hence, because a TAPS sender-side application may talk to a native TCP-based receiver-side application, it cannot assume that stream schedulers will always work as expected. + To be compatible with multiple transport protocols and uniformly + allow access to both transport connections and streams of a multi- + streaming protocol, the semantics of opening and closing need to be + the most restrictive subset of all of the underlying options. For + example, TCP's support of half-closed connections can be seen as a + feature on top of the more restrictive "ABORT"; this feature cannot + be supported because not all protocols used by a TAPS system + (including streams of an association) support half-closed + connections. + A.3.3.
Early Data Transmission There are two transport features related to transferring a message early: "Hand over a message to reliably transfer (possibly multiple times) before connection establishment", which relates to TCP Fast Open [RFC7413], and "Hand over a message to reliably transfer during connection establishment", which relates to SCTP's ability to transfer data together with the COOKIE-Echo chunk. Also without TCP Fast Open, TCP can transfer data during the handshake, together with the SYN packet -- however, the receiver of this data may not hand it over to the application until the handshake has completed. Also, different from TCP Fast Open, this data is not delimited as a message by TCP (thus, not visible as a ``message''). This functionality is commonly available in TCP and supported in several implementations, even though the TCP specification does not explain how to provide it to applications. A TAPS system could differentiate between the cases of transmitting - data "before" (possibly multiple times) or during the handshake. - + data "before" (possibly multiple times) or "during" the handshake. Alternatively, it could also assume that data that are handed over early will be transmitted as early as possible, and "before" the - handshake would only be used for data that are explicitly marked as - "idempotent" (i.e., it would be acceptable to transfer it multiple - times). + handshake would only be used for messages that are explicitly marked + as "idempotent" (i.e., it would be acceptable to transfer them + multiple times). The amount of data that can successfully be transmitted before or during the handshake depends on various factors: the transport protocol, the use of header options, the choice of IPv4 and IPv6 and the Path MTU. A TAPS system should therefore allow a sending application to query the maximum amount of data it can possibly transmit before (or, if exposed, during) connection establishment. A.3.4. 
Sender Running Dry @@ -2143,40 +2067,42 @@ data to send" relates to SCTP's "SENDER DRY" notification. Such notifications can, in principle, be used to avoid having an unnecessarily large send buffer, yet ensure that the transport sender always has data available when it has an opportunity to transmit it. This has been found to be very beneficial for some applications [WWDC2015]. However, "SENDER DRY" truly means that the entire send buffer (including both unsent and unacknowledged data) has emptied -- i.e., when it notifies the sender, it is already too late, the transport protocol already missed an opportunity to send data. Some modern TCP implementations now include the unspecified - "TCP_NOTSENT_LOWAT" socket option proposed in [WWDC2015], which - limits the amount of unsent data that TCP can keep in the socket - buffer; this allows to specify at which buffer filling level the - socket becomes writable, rather than waiting for the buffer to run - empty. + "TCP_NOTSENT_LOWAT" socket option that was proposed in [WWDC2015], + which limits the amount of unsent data that TCP can keep in the + socket buffer; this allows to specify at which buffer filling level + the socket becomes writable, rather than waiting for the buffer to + run empty. SCTP allows to configure the sender-side buffer too: the automatable Transport Feature "Configure send buffer size" provides this functionality, but only for the complete buffer, which includes both unsent and unacknowledged data. SCTP does not allow to control these - two sizes separately. A TAPS system should allow for uniform access - to "TCP_NOTSENT_LOWAT" as well as the "SENDER DRY" notification. + two sizes separately. It therefore makes sense for a TAPS system to + allow for uniform access to "TCP_NOTSENT_LOWAT" as well as the + "SENDER DRY" notification. A.3.5. 
Capacity Profile The transport features: o Disable Nagle algorithm o Enable and configure a "Low Extra Delay Background Transfer" o Specify DSCP field + all relate to a QoS-like application need such as "low latency" or "scavenger". In the interest of flexibility of a TAPS system, they could therefore be offered in a uniform, more abstract way, where a TAPS system could e.g. decide by itself how to use combinations of LEDBAT-like congestion control and certain DSCP values, and an application would only specify a general "capacity profile" (a description of how it wants to use the available capacity). A need for "lowest possible latency at the expense of overhead" could then translate into automatically disabling the Nagle algorithm. @@ -2209,29 +2135,20 @@ UDP(-Lite) has a transport feature called "Specify DF field". This yields an error message in case of sending a message that exceeds the Path MTU, which is necessary for a UDP-based application to be able to implement Path MTU Discovery (a function that UDP-based applications must do by themselves). The "Get max. transport-message size that may be sent using a non-fragmented IP packet from the configured interface" transport feature yields an upper limit for the Path MTU (minus headers) and can therefore help to implement Path MTU Discovery more efficiently. - This also relates to the fact that the choice of path is automatable: - if a TAPS system can switch a path at any time, unknown to an - application, yet the application intends to do Path MTU Discovery, - this could yield a very inefficient behavior. Thus, a TAPS system - should probably avoid automatically switching paths, and inform the - application about any unavoidable path changes, when applications - request to disallow fragmentation with the "Specify DF field" - feature. - Appendix B. Revision information XXX RFC-Ed please remove this section prior to publication. 
-02: implementation suggestions added, discussion section added, terminology extended, DELETED category removed, various other fixes; list of Transport Features adjusted to -01 version of [TAPS2] except that MPTCP is not included. -03: updated to be consistent with -02 version of [TAPS2]. @@ -2260,22 +2177,31 @@ divided into two features, one for ordered, one for unordered delivery. The word "reliably" was added to the transport features "Hand over a message to reliably transfer (possibly multiple times) before connection establishment" and "Hand over a message to reliably transfer during connection establishment" to make it clearer why this is not supported by UDP. Clarified that the "minset abstract interface" is not proposing a specific API for all TAPS systems to implement, but it is just a way to describe the minimum set. Author order changed. -Authors' Addresses + draft-ietf-taps-minset-01: "fall-back to" (TCP or UDP) replaced + (mostly with "implementation over"). References to post-sockets + removed (these were statements that assumed that post-sockets requires + two-sided implementation). Replaced "flow" with "TAPS Connection" + and "frame" with "message" to avoid introducing new terminology. + Made sections 3 and 4 in line with the categorization that is already + used in the appendix and [TAPS2], and changed style of section 4 to + be even shorter and less interface-like. Updated reference draft- + ietf-tsvwg-sctp-ndata to RFC8260. +Authors' Addresses Michael Welzl University of Oslo PO Box 1080 Blindern Oslo N-0316 Norway Phone: +47 22 85 24 20 Email: michawe@ifi.uio.no Stein Gjessing