--- 1/draft-ietf-mmusic-ice-06.txt 2006-03-09 22:12:29.000000000 +0100 +++ 2/draft-ietf-mmusic-ice-07.txt 2006-03-09 22:12:29.000000000 +0100 @@ -1,18 +1,18 @@ MMUSIC J. Rosenberg Internet-Draft Cisco Systems -Expires: April 22, 2006 October 19, 2005 +Expires: September 7, 2006 March 6, 2006 Interactive Connectivity Establishment (ICE): A Methodology for Network Address Translator (NAT) Traversal for Offer/Answer Protocols - draft-ietf-mmusic-ice-06 + draft-ietf-mmusic-ice-07 Status of this Memo By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that @@ -23,218 +23,227 @@ and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. - This Internet-Draft will expire on April 22, 2006. + This Internet-Draft will expire on September 7, 2006. Copyright Notice - Copyright (C) The Internet Society (2005). + Copyright (C) The Internet Society (2006). Abstract This document describes a protocol for Network Address Translator (NAT) traversal for multimedia session signaling protocols based on the offer/answer model, such as the Session Initiation Protocol (SIP). This protocol is called Interactive Connectivity - Establishment (ICE). ICE makes use of existing protocols, such as - Simple Traversal of UDP Through NAT (STUN) and Traversal Using Relay - NAT (TURN). 
ICE makes use of STUN in peer-to-peer cooperative - fashion, allowing participants to discover, create and verify mutual - connectivity. + Establishment (ICE). ICE makes use of the Simple Traversal of UDP + through NAT (STUN), applying its binding discovery, connectivity + check and relay usages. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . 4 3. Overview of ICE . . . . . . . . . . . . . . . . . . . . . . 8 - 4. Sending the Initial Offer . . . . . . . . . . . . . . . . . 10 + 4. Sending the Initial Offer . . . . . . . . . . . . . . . . . 11 5. Receipt of the Offer and Generation of the Answer . . . . . 11 - 6. Processing the Answer . . . . . . . . . . . . . . . . . . . 11 - 7. Common Procedures . . . . . . . . . . . . . . . . . . . . . 11 + 6. Processing the Answer . . . . . . . . . . . . . . . . . . . 12 + 7. Common Procedures . . . . . . . . . . . . . . . . . . . . . 12 7.1 Gathering Candidates . . . . . . . . . . . . . . . . . . . 12 - 7.2 Prioritizing the Candidates and Choosing an Active One . . 15 - 7.3 Encoding Candidates into SDP . . . . . . . . . . . . . . . 17 - 7.4 Forming Candidate Pairs . . . . . . . . . . . . . . . . . 19 - 7.5 Ordering the Candidate Pairs . . . . . . . . . . . . . . . 22 - 7.6 Performing the Connectivity Checks . . . . . . . . . . . . 23 - 7.7 Sending a Binding Request for Connectivity Checks . . . . 27 - 7.8 Receiving a Binding Request for Connectivity Checks . . . 29 - 7.9 Promoting a Candidate to Active . . . . . . . . . . . . . 31 - 7.10 Learning New Candidates from Connectivity Checks . . . . 31 - 7.10.1 On Receipt of a Binding Request . . . . . . . . . . 32 - 7.10.2 On Receipt of a Binding Response . . . . . . . . . . 35 - 7.11 Subsequent Offer/Answer Exchanges . . . . . . . . . . . 37 - 7.11.1 Sending of a Subsequent Offer . . . . . . . . . . . 37 - 7.11.2 Receiving the Offer and Sending an Answer . . . . . 
39 - 7.11.3 Receiving the Answer . . . . . . . . . . . . . . . . 41 - 7.12 Binding Keepalives . . . . . . . . . . . . . . . . . . . 41 - 7.13 Sending Media . . . . . . . . . . . . . . . . . . . . . 42 - 8. Guidelines for Usage with SIP . . . . . . . . . . . . . . . 43 - 9. Interactions with Forking . . . . . . . . . . . . . . . . . 44 - 10. Interactions with Preconditions . . . . . . . . . . . . . . 45 - 11. Example . . . . . . . . . . . . . . . . . . . . . . . . . . 45 - 12. Grammar . . . . . . . . . . . . . . . . . . . . . . . . . . 68 - 13. Security Considerations . . . . . . . . . . . . . . . . . . 69 - 13.1 Attacks on Connectivity Checks . . . . . . . . . . . . . 69 - 13.2 Attacks on Address Gathering . . . . . . . . . . . . . . 72 - 13.3 Attacks on the Offer/Answer Exchanges . . . . . . . . . 73 - 13.4 Insider Attacks . . . . . . . . . . . . . . . . . . . . 73 - 13.4.1 The Voice Hammer Attack . . . . . . . . . . . . . . 73 - 13.4.2 STUN Amplification Attack . . . . . . . . . . . . . 74 - 14. IANA Considerations . . . . . . . . . . . . . . . . . . . . 74 - 14.1 candidate Attribute . . . . . . . . . . . . . . . . . . 74 - 14.2 remote-candidate Attribute . . . . . . . . . . . . . . . 75 - 15. IAB Considerations . . . . . . . . . . . . . . . . . . . . . 75 - 15.1 Problem Definition . . . . . . . . . . . . . . . . . . . 75 - 15.2 Exit Strategy . . . . . . . . . . . . . . . . . . . . . 76 - 15.3 Brittleness Introduced by ICE . . . . . . . . . . . . . 76 - 15.4 Requirements for a Long Term Solution . . . . . . . . . 77 - 15.5 Issues with Existing NAPT Boxes . . . . . . . . . . . . 78 - 16. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 78 - 17. References . . . . . . . . . . . . . . . . . . . . . . . . . 78 - 17.1 Normative References . . . . . . . . . . . . . . . . . . 78 - 17.2 Informative References . . . . . . . . . . . . . . . . . 79 - Author's Address . . . . . . . . . . . . . . . . . . . . . . 
81 - Intellectual Property and Copyright Statements . . . . . . . 82 + 7.2 Prioritizing the Candidates and Choosing an Active One . . 16 + 7.3 Encoding Candidates into SDP . . . . . . . . . . . . . . . 18 + 7.4 Forming Candidate Pairs . . . . . . . . . . . . . . . . . 21 + 7.5 Ordering the Candidate Pairs . . . . . . . . . . . . . . . 23 + 7.6 Performing the Connectivity Checks . . . . . . . . . . . . 26 + 7.7 Sending a Binding Request for Connectivity Checks . . . . 30 + 7.8 Receiving a Binding Request for Connectivity Checks . . . 31 + 7.9 Promoting a Candidate to Active . . . . . . . . . . . . . 33 + 7.10 Learning New Candidates from Connectivity Checks . . . . 34 + 7.10.1 On Receipt of a Binding Request . . . . . . . . . . 34 + 7.10.2 On Receipt of a Binding Response . . . . . . . . . . 38 + 7.11 Subsequent Offer/Answer Exchanges . . . . . . . . . . . 39 + 7.11.1 Sending of a Subsequent Offer . . . . . . . . . . . 40 + 7.11.2 Receiving the Offer and Sending an Answer . . . . . 42 + 7.11.3 Receiving the Answer . . . . . . . . . . . . . . . . 45 + 7.12 Binding Keepalives . . . . . . . . . . . . . . . . . . . 45 + 7.13 Sending Media . . . . . . . . . . . . . . . . . . . . . 46 + 8. Guidelines for Usage with SIP . . . . . . . . . . . . . . . 49 + 9. Interactions with Forking . . . . . . . . . . . . . . . . . 51 + 10. Interactions with Preconditions . . . . . . . . . . . . . . 51 + 11. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 51 + 11.1 Basic Example . . . . . . . . . . . . . . . . . . . . . 53 + 11.2 Advanced Example . . . . . . . . . . . . . . . . . . . . 57 + 12. Grammar . . . . . . . . . . . . . . . . . . . . . . . . . . 77 + 13. Security Considerations . . . . . . . . . . . . . . . . . . 79 + 13.1 Attacks on Connectivity Checks . . . . . . . . . . . . . 79 + 13.2 Attacks on Address Gathering . . . . . . . . . . . . . . 81 + 13.3 Attacks on the Offer/Answer Exchanges . . . . . . . . . 82 + 13.4 Insider Attacks . . . . . . . . . . . . . . . . . 
. . . 82 + 13.4.1 The Voice Hammer Attack . . . . . . . . . . . . . . 82 + 13.4.2 STUN Amplification Attack . . . . . . . . . . . . . 83 + 14. IANA Considerations . . . . . . . . . . . . . . . . . . . . 83 + 14.1 candidate Attribute . . . . . . . . . . . . . . . . . . 83 + 14.2 remote-candidate Attribute . . . . . . . . . . . . . . . 84 + 14.3 ice-pwd Attribute . . . . . . . . . . . . . . . . . . . 84 + 15. IAB Considerations . . . . . . . . . . . . . . . . . . . . . 85 + 15.1 Problem Definition . . . . . . . . . . . . . . . . . . . 85 + 15.2 Exit Strategy . . . . . . . . . . . . . . . . . . . . . 86 + 15.3 Brittleness Introduced by ICE . . . . . . . . . . . . . 86 + 15.4 Requirements for a Long Term Solution . . . . . . . . . 87 + 15.5 Issues with Existing NAPT Boxes . . . . . . . . . . . . 87 + 16. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 88 + 17. References . . . . . . . . . . . . . . . . . . . . . . . . . 88 + 17.1 Normative References . . . . . . . . . . . . . . . . . . 88 + 17.2 Informative References . . . . . . . . . . . . . . . . . 89 + Author's Address . . . . . . . . . . . . . . . . . . . . . . 91 + Intellectual Property and Copyright Statements . . . . . . . 92 1. Introduction - A multimedia session signaling protocol is a protocol that exchanges - control messages between a pair of agents for the purposes of - establishing the flow of media traffic between them. This media flow - is distinct from the flow of control messages, and may take a - different path through the network. Examples of such protocols are - the Session Initiation Protocol (SIP) [3], the Real Time Streaming - Protocol (RTSP) [17] and the International Telecommunications Union - (ITU) H.323. + RFC 3264 [4] defines a two-phase exchange of Session Description + Protocol (SDP) messages [5] for the purpose of establishing + multimedia sessions. This offer/answer mechanism is used by + protocols such as the Session Initiation Protocol (SIP) [2].
- These protocols, by nature of their design, are difficult to operate - through Network Address Translators (NAT). Because their purpose is - to establish a flow of media packets, they tend to carry IP addresses - within their messages, which is known to be problematic through NAT - [18]. The protocols also seek to create a media flow directly - between participants, so that there is no application layer - intermediary between them. This is done to reduce media latency, - decrease packet loss, and reduce the operational costs of deploying - the application. However, this is difficult to accomplish through - NAT. A full treatment of the reasons for this is beyond the scope of - this specification. + Protocols using offer/answer are difficult to operate through Network + Address Translators (NAT). Because their purpose is to establish a + flow of media packets, they tend to carry IP addresses within their + messages, which is known to be problematic through NAT [17]. The + protocols also seek to create a media flow directly between + participants, so that there is no application layer intermediary + between them. This is done to reduce media latency, decrease packet + loss, and reduce the operational costs of deploying the application. + However, this is difficult to accomplish through NAT. A full + treatment of the reasons for this is beyond the scope of this + specification. Numerous solutions have been proposed for allowing these protocols to operate through NAT. These include Application Layer Gateways - (ALGs), the Middlebox Control Protocol [20], Simple Traversal of UDP - through NAT (STUN) [1], Traversal Using Relay NAT [16], and Realm - Specific IP [21] [22] along with session description extensions - needed to make them work, such as the Session Description Protocol - (SDP) [7] attribute for the Real Time Control Protocol (RTCP) [2]. 
- Unfortunately, these techniques all have pros and cons which make - each one optimal in some network topologies, but a poor choice in - others. The result is that administrators and implementors are - making assumptions about the topologies of the networks in which + (ALGs), the Middlebox Control Protocol [19], Simple Traversal of UDP + through NAT (STUN) [16] and its revision [13], the STUN Relay Usage + [14], and Realm Specific IP [20] [21] along with session description + extensions needed to make them work, such as the Session Description + Protocol (SDP) [5] attribute for the Real Time Control Protocol + (RTCP) [1]. Unfortunately, these techniques all have pros and cons + which make each one optimal in some network topologies, but a poor + choice in others. The result is that administrators and implementors + are making assumptions about the topologies of the networks in which their solutions will be deployed. This introduces complexity and brittleness into the system. What is needed is a single solution which is flexible enough to work well in all situations. This specification provides that solution for media streams - established by signaling protocols based on the offer-answer model, - RFC 3264 [5]. It is called Interactive Connectivity Establishment, - or ICE. ICE makes use of STUN and TURN, but uses them in a specific - methodology which avoids many of the pitfalls of using any one alone. + established by signaling protocols based on the offer-answer model. + It is called Interactive Connectivity Establishment, or ICE. ICE + makes use of STUN and its relay extension, commonly called TURN, but + uses them in a specific methodology which avoids many of the pitfalls + of using any one alone. 2. Terminology Several new terms are introduced in this specification: Agent: As defined in RFC 3264, an agent is the protocol implementation involved in the offer/answer exchange. There are two agents involved in an offer/answer exchange. 
Peer: From the perspective of one of the agents in a session, its peer is the other agent. Specifically, from the perspective of the offerer, the peer is the answerer. From the perspective of the answerer, the peer is the offerer. Transport Address: The combination of an IP address and port. Local Transport Address: A local transport address is a transport address that has been allocated from the operating system on the host. This includes transport addresses obtained through Virtual Private Networks (VPNs) and transport addresses obtained through - Realm Specific IP (RSIP) [21] (which lives at the operating system + Realm Specific IP (RSIP) [20] (which lives at the operating system level). Transport addresses are typically obtained by binding to an interface. m/c line: The media and connection lines in the SDP, which together hold the transport address used for the receipt of media. Derived Transport Address: A derived transport address is a transport address which is derived from a local transport address. The derived transport address is related to the associated local transport address in that packets sent to the derived transport address are received on the socket bound to its associated local transport address. Derived addresses are obtained using protocols - like STUN and TURN, and more generally, any UNSAF protocol [23]. + like STUN, and more generally, any UNSAF protocol [22]. + + Reflexive Transport Address: As defined in [13], a transport address + learned by a client which identifies that client as seen by + another host on an IP network, typically a STUN server. When + there is an intervening NAT between the client and the other host, + the reflexive transport address represents the binding allocated + to the client on the public side of the NAT. Reflexive transport + addresses are learned from the MAPPED-ADDRESS attribute in STUN + Binding Responses and Allocate Responses [14], and are a type of + derived transport address. 
+ + Server Reflexive Transport Address: A server reflexive transport + address is a reflexive address that is reflected off of a server, + distinct from the peer, whose address is configured or learned by + the client prior to an offer/answer exchange. + + Peer Reflexive Transport Address: A peer reflexive transport address + is a reflexive address that is reflected off of the peer. Peer + reflexive transport addresses are learned by connectivity checks. + + Relayed Transport Address: A transport address that terminates on a + server, which forwards packets received on it towards the client. + The STUN Allocate Request can be used to obtain a relayed transport + address, for example. Associated Local Transport Address: When a peer sends a packet to a transport address, the associated local transport address is the local transport address at which those packets will actually arrive. For a local transport address, its associated local transport address is the same as the local transport address - itself. For STUN derived and TURN derived transport addresses, - however, they are not the same. The associated local transport - address is the one from which the STUN or TURN transport was - derived. - - Peer Derived Transport Address: A peer derived transport address is a - derived transport address learned from a STUN server running - within a peer in a media session. - - TURN Derived Transport Address: A derived transport address obtained - from a TURN server. - - STUN Derived Transport Address: A derived transport address obtained - from a STUN server whose address has been provisioned or - discovered by the UA. This, by definition, excludes Peer Derived - Transport Addresses. + itself. For reflexive and relayed transport addresses, however, + they are not the same. The associated local transport address is + the one from which the reflexive or relayed transport address was + derived. Candidate: A sequence of transport addresses that form an atomic set for usage with a particular media session.
Here, atomic means that all of the transport addresses in the candidate need to work before the candidate will be used for actual media transport. In the case of RTP, there can be one or more transport addresses per candidate. In the most common case, there are two - one for RTP, and another for RTCP. If the agent doesn't use RTCP, there would - be just one. If Generic Forward Error Correction (FEC) [19] is in + be just one. If Generic Forward Error Correction (FEC) [18] is in use, there may be more than two. The transport addresses that - compose a candidate are all of the same type - local, STUN - derived, TURN derived or peer derived. + compose a candidate are all of the same type - local, server + reflexive, peer reflexive or relayed. Local Candidate: A candidate whose transport addresses are local transport addresses. - STUN Candidate: A candidate whose transport addresses are STUN - derived transport addresses. + Server Reflexive Candidate: A candidate whose transport addresses are + server reflexive transport addresses. - TURN Candidate: A candidate whose transport addresses are TURN - derived transport addresses. + Peer Reflexive Candidate: A candidate whose transport addresses are + peer reflexive transport addresses. - Peer Derived Candidate: A candidate whose transport addresses are - peer derived transport addresses. + Relayed Candidate: A candidate whose transport addresses are relayed + transport addresses. - Generating Candidate: The candidate from which a peer derived + Generating Candidate: The candidate from which a peer reflexive candidate is derived. Active Candidate: The candidate that is in use for exchange of media. This is the one that an agent places in the m/c line of an offer or answer. Candidate ID: An identifier for a candidate.
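To make the candidate terminology concrete, here is a minimal Python sketch. It is not taken from the draft, and it assumes a candidate can be modeled as a type plus an ordered list of (IP, port) pairs, one per component. The names `CandidateType`, `Candidate`, `is_usable` and `prioritize` are hypothetical. `is_usable` expresses the "atomic set" rule: every component must be validated before the candidate can carry media. `prioritize` encodes only the draft's "typically" guidance that relayed candidates get the lowest priority. Real priority assignment is local policy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Set, Tuple

# A transport address: (IP address, port).
TransportAddress = Tuple[str, int]


class CandidateType(Enum):
    LOCAL = "local"
    SERVER_REFLEXIVE = "server reflexive"
    PEER_REFLEXIVE = "peer reflexive"
    RELAYED = "relayed"


@dataclass
class Candidate:
    candidate_id: str
    ctype: CandidateType
    # components[0] is component ID 1 (RTP), components[1] is ID 2 (RTCP), ...
    components: List[TransportAddress]

    def is_usable(self, validated: Set[TransportAddress]) -> bool:
        # Atomic set: usable only if every component's address was validated.
        return all(addr in validated for addr in self.components)


def prioritize(candidates: List[Candidate]) -> List[Candidate]:
    # Illustrative local policy only: relayed candidates sort last (False < True).
    # sorted() is stable, so the other candidates keep their gathering order.
    return sorted(candidates, key=lambda c: c.ctype is CandidateType.RELAYED)


# Hypothetical example addresses (private and documentation ranges).
host = Candidate("1", CandidateType.LOCAL, [("10.0.1.1", 8998), ("10.0.1.1", 8999)])
relay = Candidate("2", CandidateType.RELAYED, [("192.0.2.1", 49170), ("192.0.2.1", 49171)])
ordered = prioritize([relay, host])
```

Validating only the RTP address of `host` leaves the candidate unusable. Its RTCP component has to check out too.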
Component: When a media stream, and as a consequence, its candidate, require several IP addresses and ports to work atomically, each of @@ -302,76 +311,79 @@ if, when they exchange the addresses each has in that realm, they are able to send packets to each other. This includes IPv6 and IPv4 realms, which actually use different address spaces, in addition to private networks connected to the public Internet through NAT. The key assumption in ICE is that a client cannot know, apriori, which address realms it shares with any peer it may wish to communicate with. Therefore, in order to communicate, it has to try connecting to addresses in all of the realms. - Agent A TURN,STUN Servers Agent B + Agent A STUN Servers Agent B |(1) Gather Addresses | | |-------------------->| | |(2) Offer | | |------------------------------------------>| | |(3) Gather Addresses | | |<--------------------| |(4) Answer | | |<------------------------------------------| |(5) STUN Check | | |<------------------------------------------| |(6) STUN Check | | |------------------------------------------>| - |(7) Offer | | - |------------------------------------------>| - |(8) Answer | | - |<------------------------------------------| - |(9) Media | | + |(7) Media | | |<------------------------------------------| - |(10) Media | | + |(8) Media | | + |------------------------------------------>| + |(9) Offer | | |------------------------------------------>| + |(10) Answer | | + |<------------------------------------------| Figure 1 The basic flow of operation for ICE is shown in Figure 1. Before the offerer establishes a session, it obtains local transport addresses from its operating system on as many interfaces as it has access to. These interfaces can include IPv4 and IPv6 interfaces, in addition to Virtual Private Network (VPN) interfaces or ones associated with RSIP. It then obtains transport addresses for the media from each - interface. 
Though the ICE framework can support any type of - transport protocol, this specification only defines mechanisms for - UDP. In addition, the agent obtains derived transport addresses from - each local transport address using protocols such as STUN and TURN. - These are paced at a fixed rate in order to limit network load and - avoid NAT overload. The local and derived transport addresses are - formed into candidates, each of which represents a possible set of - transport addresses that might be viable for a media stream. + interface. Though ICE can support any type of transport protocol, + this specification only defines mechanisms for UDP. In addition, the + agent obtains server reflexive and relayed transport addresses. + These are usually obtained through a single STUN Allocate request, + which provides both. These requests are paced at a fixed rate in + order to limit network load and avoid NAT overload. The local, + server reflexive and relayed transport addresses are formed into + candidates, each of which represents a possible set of transport + addresses that might be viable for a media stream. Each candidate is listed in a set of a=candidate attributes in the offer. Each candidate is given a priority. Priority is a matter of local policy, but typically, lowest priority would be given to - transport addresses learned from a TURN server (i.e., TURN derived - transport addresses). Each candidate is also assigned a distinct ID, - called a candidate ID. + relayed transport addresses. Each candidate is also assigned a + distinct ID, called a candidate ID. The agent will choose one of its candidates as its active candidate for inclusion in the connection and media lines in the offer. Media can be sent to this candidate immediately following its validation. - Media is not sent without validation in order to avoid denial-of- - service attacks. 
In particular, without ICE, an offerer can send an - offer to another agent, and list the IP address and port of a target - in the offer. If the agent is an automata that answers a call - automatically, it will do so and then proceed to send media to the - target. This provides substantial packet amplifications. ICE fixes - this by using STUN-based validation of addresses. + Media can also be sent to a candidate that is not active but has been + validated. Media is not sent without validation in order to avoid + denial-of-service attacks. In particular, without ICE, an offerer + can send an offer to another agent, and list the IP address and port + of a target in the offer. If the agent is an automaton that answers a + call automatically, it will do so and then proceed to send media to + the target. This provides substantial packet amplification. ICE + fixes this by requiring that an agent never send media packets unless + it has sent a STUN message towards the target of the RTP packets, and + received a reply from that target (Section 7.13). The offer is then sent to the answerer. This specification does not address the issue of how the signaling messages themselves traverse NAT. It is assumed that signaling protocol specific mechanisms are used for that purpose. The answerer follows a similar process as the offerer followed; it obtains addresses from local interfaces, obtains derived transport addresses from those, and then groups them into candidates for inclusion in a=candidate attributes in the answer. It picks one candidate as its active candidate and places it into the m/c line in the answer. @@ -384,75 +396,77 @@ of this list, beginning a connectivity check for that transport address pair. At a fixed interval, checks for the next transport address on the list begin. This results in a pacing of the connectivity checks. These connectivity checks are performed through peer-to-peer STUN requests, sent from one agent to the other.
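The pacing rule above can be sketched in Python. The agent takes the highest-priority pair first and starts at most one new check per fixed interval. This is an illustrative model only. The class `PacedChecker`, the parameter `interval_ms`, the 20 ms value and the pair labels are all hypothetical. The sketch does not cover triggered checks, and it does not send any STUN messages.

```python
from collections import deque
from typing import Deque, List, Optional


class PacedChecker:
    """Hypothetical pacer: starts at most one new connectivity check per interval_ms."""

    def __init__(self, ordered_pairs: List[str], interval_ms: int):
        if interval_ms <= 0:
            raise ValueError("interval_ms must be positive")
        # Pairs are assumed to be already ordered, highest priority first.
        self.pending: Deque[str] = deque(ordered_pairs)
        self.interval_ms = interval_ms
        self.last_start_ms: Optional[int] = None

    def poll(self, now_ms: int) -> Optional[str]:
        """Return the pair whose check should start at now_ms, or None."""
        if not self.pending:
            return None
        if self.last_start_ms is not None and now_ms - self.last_start_ms < self.interval_ms:
            return None  # still inside the pacing interval
        self.last_start_ms = now_ms
        return self.pending.popleft()


# Drive the pacer with a 10 ms clock tick and a hypothetical 20 ms interval.
checker = PacedChecker(["A1-B1", "A1-B2", "A2-B1"], interval_ms=20)
started = []
for t in range(0, 100, 10):
    pair = checker.poll(t)
    if pair is not None:
        started.append((t, pair))
```

With these values the checks start at 0, 20 and 40 ms, in list order. Polling from a clock tick keeps the pacer independent of any particular event loop.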
In addition to pacing the checks out at regular intervals, the offerer will generate a connectivity check for a transport address pair when it receives one from its peer. As soon as the active candidate has been verified by the STUN checks, media can begin to flow. Once a higher priority candidate has been verified by the offerer, it ceases - additional connectivity checks, and sends an updated offer which - promotes this higher priority candidate to the m/c-line. That - candidate is also listed in a=candidate attributes, resulting in - periodic STUN keepalives through the duration of the media session. + additional connectivity checks, begins using that candidate for + media, and sends an updated offer which promotes this higher priority + candidate to the m/c-line. That candidate is also listed in + a=candidate attributes, resulting in periodic STUN keepalives through + the duration of the media session. If an agent receives a STUN connectivity check with a new source IP address and port, or a response to such a check with a new IP address and port indicated in the MAPPED-ADDRESS attribute, this new address might be a viable candidate for the receipt of media. This happens - when there is a symmetric NAT between the agents. In such a case, + when there is a NAT with an address dependent or address and port + dependent mapping property [37] between the agents. In such a case, the agents algorithmically construct a new candidate. Like other candidates, connectivity checks begin for it, and if they succeed, its transport addresses can be used for receipt of media by promoting it to the m/c-line. The gathering of addresses and connectivity checks take time. 
As a - consequence, in order to have no impact on the call setup time or - post-pickup delay for SIP, these offer/answer exchanges and checks + consequence, in order to have minimal impact on the call setup time + or post-pickup delay for SIP, these offer/answer exchanges and checks happen while the call is ringing. 4. Sending the Initial Offer When an agent wishes to begin a session by sending an initial offer, it starts by gathering transport addresses, as described in Section 7.1. This will produce a set of candidates, including local - ones, STUN-derived ones, and TURN-derived ones. + ones, server reflexive ones, and relayed ones. This process of gathering candidates can actually happen at any time before sending the initial offer. A agent can pre-gather transport addresses, using a user interface cue (such as picking up the phone, or entry into an address book) as a hint that communications is imminent. Doing so eliminates any additional perceivable call setup delays due to address gathering. When it comes time to offer communications, the agent determines a priority for each candidate and identifies the active candidate that will be used for receipt of media, as described in Section 7.2. The next step is to construct the offer message. For each media stream, it places its candidates into a=candidate attributes in the offer and puts its active candidate into the m/c line. The process for doing this is described in Section 7.3. The offer is then sent. 5. Receipt of the Offer and Generation of the Answer Upon receipt of the offer message, the agent checks if the offer - contains any a=candidate attributes. If it does, the offerer + contains any a=candidate attributes. If the offer does, the offerer supports ICE. In that case, it starts gathering candidates, as described in Section 7.1, and prioritizes them as described in Section 7.2. 
This processing is done immediately on receipt of the offer, to prepare for the case where the user should accept the call, or early media needs to be generated. By gathering candidates (and performing connectivity checks) while the user is being alerted to - the request for communications, session establishment delays due to - that gathering can be eliminated. + the request for communications, session establishment delays are + reduced. The agent then constructs its answer, encoding its candidates into a=candidate attributes and including the active one in the m/c-line, as described in Section 7.3. The agent then forms candidate pairs as described in Section 7.4. These are ordered as described in Section 7.5. The agent then begins connectivity checks, as described in Section 7.6. It follows the logic in Section 7.10 on receipt of Binding Requests and responses to learn new candidates from the checks themselves. @@ -463,21 +477,21 @@ There are two possible cases for processing of the answer. If the answerer did not support ICE, the answer will not contain any a=candidate attributes. As a result, the offerer knows that it cannot perform its connectivity checks. In this case, it proceeds with normal media processing as if ICE was not in use. The procedures for sending media, described in Section 7.13, MUST be followed however. If the answer contains candidates, it implies that the answerer - supports ICE. The agent then forms candidate pairs as described in + supports ICE. The offerer then forms candidate pairs as described in Section 7.4. These are ordered as described in Section 7.5. The agent then begins connectivity checks, as described in Section 7.6. It follows the logic in Section 7.10 on receipt of Binding Requests and responses to learn new candidates from the checks themselves. Transmission of media is performed according to the procedures in Section 7.13. 7. Common Procedures @@ -491,21 +505,25 @@ (Section 4). 
For answerers, it occurs before sending an answer (Section 5). Each candidate has one or more components, each of which is associated with a sequence number, starting at 1 for the first component of each candidate, and incrementing by 1 for each additional component within that candidate. These components represent a set of transport addresses for which connectivity must be validated. For a particular media stream, all of the candidates SHOULD have the same number of components. The number of components - that are needed are a function of the type of media stream. + that are needed is a function of the type of media stream. All of + the components in a candidate MUST be of the same type - server + reflexive, relayed, or local, and obtained from the same server in + the case of server reflexive or relayed candidates. For local + candidates, each component MUST be obtained from the same interface. For traditional RTP-based media streams, it is RECOMMENDED that there be two components per candidate - one for RTP and one for RTCP. The component with the component ID of 1 MUST be RTP, and the one with component ID of 2 MUST be RTCP. If an agent doesn't implement RTCP, it SHOULD have a single component for the RTP stream (which will have a component ID of 1 by definition). Each component of a candidate has a single transport address. The first step is to gather local candidates. Local candidates are @@ -524,233 +542,270 @@ for each UDP stream, requiring K*N local transport addresses. Once the agent has obtained local candidates, it obtains candidates with derived transport addresses. The process for gathering derived candidates depends on the transport protocol. Procedures are specified here for UDP. Extensions to ICE that define procedures for other transport protocols MUST specify how derived transport addresses are gathered.
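The component rules above can be checked mechanically, as in this minimal Python sketch. It is an illustration, not part of the draft. `Component`, `check_candidate` and the `origin` field are hypothetical names. `origin` stands for the interface of a local candidate, or the STUN server of a server reflexive or relayed candidate. The checks cover four rules: component IDs start at 1 and increase by 1, all components share one type, and components share one interface or server. The last rule is enforced only for the three types the text lists.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    component_id: int  # 1 = RTP, 2 = RTCP for RTP-based streams
    ctype: str         # "local", "server reflexive", "peer reflexive" or "relayed"
    origin: str        # interface (local) or STUN server (server reflexive/relayed)
    address: str
    port: int


def check_candidate(components: List[Component]) -> List[str]:
    """Return the rule violations for one candidate's components (empty if none)."""
    if not components:
        return ["a candidate needs at least one component"]
    errors: List[str] = []
    ids = [c.component_id for c in components]
    if ids != list(range(1, len(components) + 1)):
        errors.append("component IDs must start at 1 and increase by 1")
    types = {c.ctype for c in components}
    if len(types) > 1:
        errors.append("all components must be of the same type")
    elif types & {"local", "server reflexive", "relayed"}:
        if len({c.origin for c in components}) > 1:
            errors.append("components must share one interface or server")
    return errors


# Hypothetical RTP/RTCP components on a private interface.
rtp = Component(1, "local", "eth0", "10.0.1.1", 8998)
rtcp = Component(2, "local", "eth0", "10.0.1.1", 8999)
```

`check_candidate([rtp, rtcp])` returns an empty list. Pairing `rtp` with a relayed component, or with a local component from another interface, returns one error message.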
Agents which serve end users directly, such as softphones, - hardphones, terminal adapters and so on, MUST implement STUN and - SHOULD use it to obtain STUN candidates. These devices SHOULD - implement and SHOULD use TURN to obtain TURN candidates. They MAY - implement and MAY use other protocols that provide derived transport - addresses, such as TEREDO [31]. Usage of STUN and TURN is at SHOULD - strength to allow for provider variation. If it is not to be used, - it is RECOMMENDED that it be implemented and just disabled through - configuration, so that it can re-enabled through configuration if - conditions change in the future. + hardphones, terminal adapters and so on, MUST implement the STUN + Binding Discovery usage and SHOULD use it to obtain server reflexive + candidates. These devices SHOULD implement the STUN Relay usage, and + SHOULD use its Allocate request to obtain both server reflexive and + relayed candidates. They MAY implement and MAY use other protocols + that provide server reflexive or relayed transport addresses, such as + TEREDO [33]. + + The requirement to use the Relay usage is at SHOULD strength to allow + for provider variation. If it is not to be used, it is RECOMMENDED + that it be implemented and just disabled through configuration, so + that it can be re-enabled through configuration if conditions change in + the future. Agents which represent network servers under the control of a service provider, such as gateways to the telephone network, media servers, or conferencing servers that are targeted at deployment only in - networks with public IP addresses MAY use STUN, TURN or other similar - protocols to obtain candidates. + networks with public IP addresses MAY use the STUN Binding Discovery + usage and Relay usage, or other similar protocols to obtain + candidates. Why would these types of endpoints even bother to implement ICE? The answer is that such an implementation greatly facilitates NAT traversal for clients that connect to it.
The ability to process - STUN connectivity checks allows for clients to obtain peer-derived - transport addresses that can be used by the network server to - reach them without a relay, even through symmetric NAT. - Furthermore, implementation of the STUN connectivity checks allows - for NAT bindings along the way to be kept open. ICE also provides + STUN connectivity checks allows for clients to obtain peer + reflexive transport addresses that can be used by the network + server to reach them without a relay, even through NATs with + restrictive mapping and filtering policies. Furthermore, + implementation of the STUN connectivity checks allows for NAT + bindings along the way to be kept open. ICE also provides numerous security properties that are independent of NAT traversal, and would benefit any multimedia endpoint. See Section 13 for a discussion on these benefits. - Obtaining STUN, TURN and other derived candidates requires - transmission of packets which have the effect of creating bindings on - NAT devices between the client and the STUN or TURN servers. - Experience has shown that many NAT devices have upper limits on the - rate at which they will create new bindings. Furthermore, - transmission of these packets on the network makes use of bandwidth - and needs to be rate limited by the agent. As a consequence, a - client SHOULD pace its STUN and TURN transactions, such that the - start of each new transaction occurs at least Ta seconds after the - start of the previous transaction. The value of Ta SHOULD be + Obtaining derived candidates requires transmission of packets which + have the effect of creating bindings on NAT devices between the + client and the STUN servers. Experience has shown that many NAT + devices have upper limits on the rate at which they will create new + bindings. Furthermore, transmission of these packets on the network + makes use of bandwidth and needs to be rate limited by the agent. 
As + a consequence, a client SHOULD pace its STUN transactions, such that + the start of each new transaction occurs at least Ta seconds after + the start of the previous transaction. The value of Ta SHOULD be configurable, and SHOULD have a default of 50ms. Note that this pacing applies only to the start of a new transaction; pacing of - retransmissions within a STUN or TURN transaction is governed by the - retransmission rules defined in those protocols. + retransmissions within a STUN transaction is governed by the + retransmission rules defined by STUN. - To obtain STUN candidates, the client takes a local UDP candidate, - and for each configured STUN server, produces a STUN candidate. It - is anticipated that clients may have a multiplicity of STUN servers - that it discovers or is configured with in network environments where - there are multiple layers of NAT. To produce the STUN candidate from - the local candidate, it follows the procedures of Section 9 of RFC - 3489 for each local transport address in the local candidate. It - obtains a shared secret from the STUN server and then initiates a - Binding Request transaction from each local transport address to that - server. The Binding Response will provide the client with its STUN - derived transport address in the MAPPED-ADDRESS attribute. If the - client had K local candidates, this will produce S*K STUN candidates, - where S is the number of STUN servers. + Derived candidates can be obtained from the STUN Binding Discovery + usage or the STUN Relay usage. The latter is preferred since it will + provide the client with both a server reflexive and a relayed + transport address with a single transaction. It is possible that + some STUN servers will only support the Relay usage or only the + Binding Discovery usage, in which case a client might be configured + with different servers depending on the usage. 
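The pacing rule (Ta, default 50 ms) fixes when each new STUN transaction may start. The default 5 s gathering bound recommended later in this section caps how many can start at all. The sketch below is a simplification that only schedules start times. It drops any transaction whose start would fall at or past the bound, which is a stand-in for the draft's rule of abandoning transactions that have not completed. Integer milliseconds avoid floating-point drift.

```python
TA_MS = 50               # default Ta from the text (SHOULD be configurable)
GATHER_BUDGET_MS = 5000  # default address-gathering time bound from the text

def paced_start_times(n_transactions, ta_ms=TA_MS, budget_ms=GATHER_BUDGET_MS):
    """Offsets in ms at which each new STUN transaction may start.

    Starts are spaced exactly ta_ms apart. Retransmissions inside a transaction
    are not paced here, since STUN's own retransmission rules govern them.
    Transactions whose start would fall at or beyond the budget are dropped.
    """
    starts = []
    for i in range(n_transactions):
        start = i * ta_ms
        if start >= budget_ms:
            break
        starts.append(start)
    return starts

print(paced_start_times(4))         # [0, 50, 100, 150]
print(len(paced_start_times(500)))  # 100
```

With the defaults, at most 100 transactions can start within the bound. This is why the text asks clients to try their highest-priority local candidates and STUN servers first.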
- It is anticipated that clients may have a multiplicity of TURN - servers configured or discovered in network environments where there - are multiple layers of NAT, and that layering is known to the - provider of the client. To obtain TURN candidates, for each - configured TURN server, the client initiates an Allocate Request - transaction using the procedures of Section 8 of [16] from each - transport address of a particular local candidate. The Allocate - Response will provide the client with its TURN derived transport - address in the MAPPED-ADDRESS attribute. Once the TURN allocations - against a particular TURN server succeed from all of the transport - addresses in a particular local candidate, the client SHOULD NOT - attempt any further TURN allocations to that particular server from - the transport addresses in any other local candidates. This is to - reduce the number of bindings allocated from the NATs. Only a single - TURN candidate is needed from a particular TURN server. The order in - which local candidates are tried against the TURN server is a matter - of local policy. + To obtain both server reflexive and relayed candidates using the STUN + Relay Usage, the client takes a local UDP candidate, and for each + configured STUN server, produces both candidates. It is anticipated + that clients may have a multiplicity of STUN servers configured or + discovered in network environments where there are multiple layers of + NAT, and that layering is known to the provider of the client. To + obtain these candidates, for each configured STUN server, the client + initiates an Allocate Request transaction using the procedures of + Section 8.1.2 of [14] from each transport address of a particular + local candidate. The Allocate Response will provide the client with + its server reflexive transport address in the MAPPED-ADDRESS + attribute and its relayed transport address in the RELAY-ADDRESS + attribute. 
Once the Allocate requests have given a client a relayed + transport address for all transport addresses in a relayed candidate, + there is no reason for a client to obtain further relayed candidates + through the same STUN server. Thus, if there are other local + candidates from which the client has not yet obtained relayed + transport addresses, the client SHOULD NOT bother to obtain them. + Instead, it SHOULD use the STUN Binding Discovery usage and obtain + just server reflexive addresses from that STUN server. The order in + which local candidates are tried against the STUN server to obtain + relayed candidates is a matter of local policy. - Since a client will pace its STUN and TURN allocations at a rate of - one new transaction every Ta seconds, it will take a certain amount - of time for these allocations to occur. It is RECOMMENDED that - implementations have a configurable upper bound on the total number - of such allocations they will perform before generation of their - offer or answer. Any allocations not completed at that point SHOULD - be abandoned, but MAY continue and be used in an updated offer once - they complete. A default value of 10 is RECOMMENDED. Since the + To obtain server reflexive candidates using the STUN Binding + Discovery usage, the client takes a local UDP candidate, and for each + configured STUN server, produces a server reflexive candidate. To + produce the server reflexive candidate from the local candidate, it + follows the procedures of Section XX of [13] for each local transport + address in the local candidate. The Binding Response will provide + the client with its server reflexive transport address in the MAPPED- + ADDRESS attribute. If the client had K local candidates, this will + produce S*K server reflexive candidates, where S is the number of + STUN servers.
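The rules above for choosing between Allocate and Binding requests can be expressed as a transaction plan. This is a sketch under stated assumptions. Server names are hypothetical, and every Allocate is assumed to succeed. A relay-capable server gets one Allocate (from the first local candidate tried against it) and then only Binding requests. Every transaction yields a server reflexive address, so K local candidates and S servers give S*K entries.

```python
def plan_gathering(local_candidates, servers):
    """Order STUN transactions for address gathering.

    local_candidates: local candidate names, highest priority first.
    servers: (name, supports_relay) tuples, highest priority first.
    Assumes every Allocate succeeds, so each relay-capable server is sent one
    Allocate and then only Binding requests from later local candidates.
    """
    plan = []
    relayed_from = set()  # servers that have already given us a relayed candidate
    for local in local_candidates:
        for server, supports_relay in servers:
            if supports_relay and server not in relayed_from:
                plan.append((local, server, "Allocate"))
                relayed_from.add(server)
            else:
                plan.append((local, server, "Binding"))
    return plan

for step in plan_gathering(["A", "B"], [("relay.example.net", True),
                                        ("stun.example.net", False)]):
    print(step)
```

Which local candidate is tried first against a relay-capable server is a matter of local policy. Here it is simply the first in priority order.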
+ + Since a client will pace its STUN transactions (both Binding and + Allocate requests) at a total rate of one new transaction every Ta + seconds, it will take a certain amount of time to complete the + address gathering phase. It is RECOMMENDED that implementations have + a configurable upper bound on the total amount of time allotted to + address gathering. Any transactions not completed at that point + SHOULD be abandoned, but MAY continue and be used in an updated offer + once they complete. A default value of 5s is RECOMMENDED. Since the total number of allocations that could be done (based on the number - of STUN servers, TURN servers and local interfaces) might exceed this - value, clients SHOULD prioritize their allocations and perform higher - priority ones first. It is RECOMMENDED that STUN allocations be - prioritized over TURN allocations. + of STUN servers and local interfaces) might not fit within this bound, + clients SHOULD prioritize their local candidates and STUN servers, + performing transactions from the highest priority local candidates to + the highest priority STUN servers first. A STUN server would + typically be higher priority if it supports the STUN Relay Usage, + since such a server provides two transport addresses with one + transaction. Once the allocations are complete, any redundant candidates are - discarded. A candidate is redundant if its transport addresses for - each component match the transport addresses for each component of - another candidate. + discarded. Candidate A is redundant with candidate B if the + transport addresses of each of their components match, and + the associated local transport addresses of each component match. For + example, consider a set of candidates with a single component. One + candidate is a local candidate, and its one component has a transport + address of 10.0.1.1:4458. A reflexive transport address is derived + from this local transport address, producing 10.0.1.1:4458.
These + two candidates are identical, and also have identical associated + local transport addresses, so they are redundant. However, in a more + complicated case, consider a multi-homed host, with one interface at + 192.168.1.1 and another at 10.0.1.1. The 192.168 network is natted, + with its "public" side in another net-10 private network. The client + obtains two local candidates, A and B, with transport addresses of + 192.168.1.1:2376 and 10.0.1.1:7266 respectively. A server reflexive + transport address is derived from A through a STUN query, and it + happens to produce 10.0.1.1:7266. Call this candidate C. Candidate C + is not redundant with candidate B, since they have different + associated local transport addresses. 7.2 Prioritizing the Candidates and Choosing an Active One The prioritization process takes the set of candidates and associates each with a priority. This priority reflects the desire that the - agent has to receive media on that address, and is assigned as a + agent has to receive media at that candidate, and is assigned as a value from 0 to 1 (1 being most preferred). Priorities are ordinal, so that their significance is only meaningful relative to other candidates from that agent for a particular media stream. Candidates MAY have the same priority. However, it is RECOMMENDED that each candidate have a distinct priority. Doing so improves the efficiency of ICE. - This specification makes no normative recommendations on how the + This specification makes no normative statements on how the prioritization is done. However, some useful guidelines are suggested on how such a prioritization can be determined. One criteria for choosing one candidate over another is whether or - not that candidate involves the use of a relay. That is, if media is - sent to that candidate, will the media first transit a relay before - being received. TURN candidates make use of relays (the TURN - server), as do any local candidates associated with a VPN server. 
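The redundancy test compares two things: the per-component transport addresses and the associated local transport addresses. The sketch below applies it to both examples above. When two candidates are redundant, the first one seen is kept. That choice is an assumption, since the text does not say which one to discard.

```python
from typing import NamedTuple, Tuple

class Candidate(NamedTuple):
    name: str
    addrs: Tuple[str, ...]   # transport address per component, in component-ID order
    bases: Tuple[str, ...]   # associated local transport address per component

def remove_redundant(candidates):
    """Drop candidates whose component addresses AND associated local
    addresses both match an earlier candidate (the first one is kept)."""
    kept, seen = [], set()
    for cand in candidates:
        key = (cand.addrs, cand.bases)
        if key not in seen:
            seen.add(key)
            kept.append(cand)
    return kept

# Example 1: a reflexive address equal to its own local address.
local = Candidate("local", ("10.0.1.1:4458",), ("10.0.1.1:4458",))
refl = Candidate("reflexive", ("10.0.1.1:4458",), ("10.0.1.1:4458",))
print([c.name for c in remove_redundant([local, refl])])  # ['local']

# Example 2: the multi-homed host; C matches B's address but not its base.
a = Candidate("A", ("192.168.1.1:2376",), ("192.168.1.1:2376",))
b = Candidate("B", ("10.0.1.1:7266",), ("10.0.1.1:7266",))
c = Candidate("C", ("10.0.1.1:7266",), ("192.168.1.1:2376",))
print([x.name for x in remove_redundant([a, b, c])])  # ['A', 'B', 'C']
```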
- When media is transited through a relay, it can increase the latency + not that candidate involves the use of an intermediary. That is, if + media is sent to that candidate, will the media first transit an + intermediate server before being received. Relayed candidates are + clearly one type of candidates that involve an intermediary. Another + are local candidates associated with a VPN server. When media is + transited through an intermediary, it can increase the latency between transmission and reception. It can increase the packet losses, because of the additional router hops that may be taken. It may increase the cost of providing service, since media will be - routed in and right back out of a relay run by the provider. If - these concerns are important, candidates with this property can be + routed in and right back out of an intermediary run by the provider. + If these concerns are important, candidates with this property can be listed with lower priority. Another criteria for choosing one candidate over another is IP address family. ICE works with both IPv4 and IPv6. It therefore provides a transition mechanism that allows dual-stack hosts to prefer connectivity over IPv6, but to fall back to IPv4 in case the v6 networks are disconnected (due, for example, to a failure in a - 6to4 relay) [26]. It can also help with hosts that have both a + 6to4 relay) [25]. It can also help with hosts that have both a native IPv6 address and a 6to4 address. In such a case, higher priority could be afforded to the native v6 address, followed by the 6to4 address, followed by a native v4 address. This allows a site to - obtain and begin using native v6 addresss immediately, yet still + obtain and begin using native v6 addresses immediately, yet still fallback to 6to4 addresses when communicating with agents in other sites that do not yet have native v6 connectivity. Another criteria for choosing one candidate over another is security. 
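The address-family preference above (native IPv6, then 6to4, then native IPv4) maps directly onto qvalues. The draft makes no normative statement on prioritization, so the numeric qvalues below are purely hypothetical. The sketch relies on Python's standard ipaddress module, whose IPv6Address.sixtofour property returns the embedded IPv4 address for 2002::/16 addresses and None otherwise.

```python
import ipaddress

# Hypothetical qvalues; only their relative order matters.
QVALUES = {"native-v6": 1.0, "6to4": 0.8, "v4": 0.6}

def address_class(addr: str) -> str:
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        return "v4"
    # 6to4 addresses live in 2002::/16.
    return "6to4" if ip.sixtofour is not None else "native-v6"

def by_preference(addrs):
    """Sort addresses from most to least preferred."""
    return sorted(addrs, key=lambda a: QVALUES[address_class(a)], reverse=True)

print(by_preference(["192.0.2.10", "2002:c000:20a::1", "2001:db8::1"]))
# ['2001:db8::1', '2002:c000:20a::1', '192.0.2.10']
```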
If a user is a telecommuter, and therefore connected to their corporate network and a local home network, they may prefer their voice traffic to be routed over the VPN in order to keep it on the corporate network when communicating within the enterprise, but use the local network when communicating with users outside of the enterprise. Another criteria for choosing one address over another is topological - awareness. This is most useful for candidates which make use of - relays (including TURN and VPN). In those cases, if an agent has - preconfigured or dynamically discovered knowledge of the topological - proximity of the relays to itself, it can use that to select closer - relays with higher priority. + awareness. This is most useful for candidates that make use of + relays. In those cases, if an agent has preconfigured or dynamically + discovered knowledge of the topological proximity of the relays to + itself, it can use that to select closer relays with higher priority. There may be transport-specific reasons for preferring one candidate over another. In such a case, specifications defining usage of ICE with other transport protocols SHOULD document such considerations. Once the candidates have been prioritized, one may be selected as the active one. This is the candidate that will be used for actual - exchange of media if and when its validated, until replaced by an - updated offer or answer. The active candidate will also be used to + exchange of media if and when it is validated, until a higher priority + candidate is validated. The active candidate will also be used to receive media from ICE-unaware peers. As such, it is RECOMMENDED that one be chosen based on the likelihood of that candidate to work with the peer that is being contacted. Unfortunately, it is difficult to ascertain which candidate that might be. As an example, consider a user within an enterprise.
To reach non-ICE capable agents within the enterprise, a local candidate has to be used, since the enterprise policies may prevent communication between elements using a relay on the public network. However, when communicating to - peers outside of the enterprise, a TURN-based candidate from a - publically accessible TURN server is needed. + peers outside of the enterprise, a relayed candidate from a + publicly accessible STUN server is needed. Indeed, the difficulty in picking just one address that will work is the whole problem that motivated the development of this specification in the first place. As such, it is RECOMMENDED that - the active candidate be a TURN derived candidate from a TURN server - providing public IP addresses. Furthermore, ICE is only truly - effective when it is supported on both sides of the session. It is - therefore most prudent to deploy it to close-knit communities as a - whole, rather than piecemeal. In the example above, this would mean - that ICE would ideally be deployed completely within the enterprise, - rather than just to parts of it. + the active candidate be a relayed candidate from a STUN server + providing public IP addresses in response to an Allocate request. + Furthermore, ICE is only truly effective when it is supported on both + sides of the session. It is therefore most prudent to deploy it to + close-knit communities as a whole, rather than piecemeal. In the + example above, this would mean that ICE would ideally be deployed + completely within the enterprise, rather than just to parts of it. An additional consideration for selection of the active candidate is the switching of media stream destinations between the initial offer and the subsequent offer. If the active candidate pair in the - initial offer is be validated, media will flow once that pair is - validated.
When the ICE checks complete and yield a higher priority - candidate pair, there will be an updated offer/answer exchange that - will change the active candidate. This will result in a change in - the destination of the media packets. This may also cause a - different path for the media packets. That path might have different - delay and jitter characteristics. As a consequence, the jitter - buffers may see a glitch, causing possible media artifacts. If these - issues are a concern, the initial offer MAY omit an active candidate. - In such a case, an updated offer will need to be sent immediately - when communicating with an ICE-unaware agent, setting an active - candidate. + initial offer is being validated, media will flow to that pair once + it is validated. When the ICE checks complete and yield a higher + priority candidate pair, media will begin to flow to it (there will + also be an updated offer/answer exchange that changes the active + candidate). This will result in a change in the destination of the + media packets. This may also cause a different path for the media + packets. That path might have different delay and jitter + characteristics. As a consequence, the jitter buffers may see a + glitch, causing possible media artifacts. If these issues are a + concern, the initial offer MAY omit an active candidate. In such a + case, an updated offer will need to be sent immediately when + communicating with an ICE-unaware agent, setting an active candidate. There may be transport-specific reasons for selection of an active candidate. In such a case, specifications defining usage of ICE with other transport protocols SHOULD document such considerations. 7.3 Encoding Candidates into SDP For each candidate for a media stream, the agent includes a series of a=candidate attributes as media-level attributes, one for each component in the candidate. Each candidate has a unique identifier, called the candidate-id. 
The candidate-id MUST be chosen randomly - and contain at least 128 bits of randomness (this does not mean that - the candidate-id is 128 bits long; just that it has at least 128 bits + and contain at least 24 bits of randomness (this does not mean that + the candidate-id is 24 bits long; just that it has at least 24 bits of randomness). It is chosen only when the candidate is placed into the SDP for the first time; subsequent offers or answers within the same session containing that same candidate MUST use the same - candidate-id used previously. + candidate-id used previously. 24 bits is sufficient because the + candidate-id is not providing security (the much more random password + is). It is needed only to prevent a possible simultaneous selection + by two agents within a private network for the useful lifetime of the + software or hardware. Each component of the candidate has an identifier, called the component-id. The component-id is a sequence number. For each candidate, it starts at one, and increments by one for each component. As discussed below, ICE will perform connectivity checks such that, between a pair of candidates, checks only occur between transport addresses with the same component-id. As a consequence, if one candidate has three components, and it is paired with a candidate that has two, there will only be two transport address pairs and two connectivity checks. @@ -768,30 +823,31 @@ The transport, addr and port of the a=candidate attribute (all defined in Section 12) are set to the transport protocol, unicast address and port of the tranport address. A Fully Qualified Domain Name (FQDN) for a host MAY be used in place of a unicast address. In that case, when receiving an offer or answer containing an FQDN in an a=candidate attribute, the FQDN is looked up in the DNS using an A or AAAA record, and the resulting IP address is used for the remainder of ICE processing. 
The qvalue is set to the priority of the candidate, and MUST be the same for all components of the candidate. - Each transport address also includes a password that will be used for - securing the STUN connectivity checks. This password MUST be chosen - randomly with 128 bits of randomness (though it can be longer than - 128 bits). Like the candidate-id, it is chosen when the candidate is - placed into an SDP for the first time for a particular session; - subsequent offers and answers within the same session conveying the - same candidate MUST use the same password. The converse is true; if - a new offer is generated as part of a new multimedia session, a new - password (and candidate-id) would be used even if the transport - address from a previous session was being recycled. + + All of the candidates share a password that is used for securing the + STUN connectivity checks. This password MUST be chosen randomly with + 128 bits of randomness (though it can be longer than 128 bits). This + password is contained in the a=ice-pwd attribute, present as a + session level attribute. A new password MUST be selected for each + new session, and MUST be present with the same value in all + subsequent offers and answers from the agent. The converse is true; + if a new offer is generated as part of a new multimedia session, a + new password MUST be used even if the transport address from a + previous session was being recycled. The combination of candidate-id and component-id uniquely identify each transport address. As a consequence, each transport address has a unique identifier, called the tid. The tid is formed by concatenating the candidate-id with the component-id, separated by the colon (":"). The tid is not explicitly encoded in the SDP; it is derived from the candidate-id and component-id, which are present in the SDP. 
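The identifier and password rules above can be sketched with Python's secrets module. The candidate-id needs at least 24 bits of randomness, and the session-level a=ice-pwd needs 128 bits. The tid is formed as "candidate-id:component-id". Using the URL-safe base64 alphabet (letters, digits, "-" and "_") for candidate-ids is an assumption made for illustration. The text only requires that the colon never appear in a candidate-id, which token_urlsafe guarantees.

```python
import secrets

def new_candidate_id() -> str:
    # 3 random bytes = 24 bits of randomness, encoded as 4 URL-safe characters.
    return secrets.token_urlsafe(3)

def new_ice_pwd() -> str:
    # 16 random bytes = 128 bits of randomness for the session-level a=ice-pwd.
    return secrets.token_urlsafe(16)

def make_tid(candidate_id: str, component_id: int) -> str:
    return f"{candidate_id}:{component_id}"

def split_tid(tid: str):
    # The candidate-id never contains ':', so the last colon is the separator.
    candidate_id, component_id = tid.rsplit(":", 1)
    return candidate_id, int(component_id)

cid = new_candidate_id()
print(make_tid(cid, 1), make_tid(cid, 2))
```

A new password is drawn once per session and reused in every later offer or answer of that session, while each candidate keeps the candidate-id it was first given.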
The usage of the colon as a separator allows the candidate-id and component-id to be extracted from the tid, since the colon is not a valid character for the candidate-id. @@ -841,33 +897,38 @@ be prepared for it. Note that this is not a problem specific to ICE; stray packets can arrive at a port at any time for any type of protocol, especially ones on the public Internet. As such, this requirement is just restating a general design guideline for Internet applications - be prepared for unknown packets on any port. The active candidate, if there is one, is placed into the m/c lines of the SDP. For RTP streams, this is done by placing the RTP address and port into the c and m lines in the SDP respectively. If the agent is utilizing RTCP, it MUST encode its address and port using - the a=rtcp attribute as defined in RFC 3605 [2]. If RTCP is not in + the a=rtcp attribute as defined in RFC 3605 [1]. If RTCP is not in use, the agent MUST signal that using b=RS:0 and b=RR:0 as defined in - RFC 3556 [8]. + RFC 3556 [6]. If there is no active candidate, the agent MUST include an a=inactive attribute. The RTP address and port in the m/c-line is inconsequential, since it won't be used. Encoding of candidates may involve transport protocol specific considerations. There are none for UDP. However, extensions that define usage of ICE with other transport protocols SHOULD specify any special encoding considerations. + Once an offer or answer is sent, an agent MUST be prepared to + receive both STUN and media packets on each candidate. As discussed + in Section 7.13, media packets can be sent to a candidate prior to + its promotion to active. + 7.4 Forming Candidate Pairs Once the offer/answer exchange has completed, both agents will have a set of candidates for each media stream. Each agent forms a set of candidate pairs for each media stream by combining each of its candidates with each of the candidates of its peer.
Candidates can be paired up only if their transport protocols are identical. If an offer/answer exchange took place for a session comprised of an audio and a video stream, and each agent had two candidates per media stream, there would be 8 candidate pairs, 4 for audio and 4 for @@ -953,46 +1014,57 @@ that candidate pair and all of its transport address pairs. Similarly, the other agent is said to be the answerer of that candidate pair and all of its transport address pairs. As a consequence, each agent has a particular role, either offerer or answerer, for each transport address pair. This role is important; when a candidate pair is to be promoted to active, the offerer is the one which performs the updated offer. 7.5 Ordering the Candidate Pairs - For the same reason that the STUN and TURN allocations are paced at a - rate of Ta transactions per second, so too are the connectivity - checks paced, also at a rate of Ta transactions per second. However, - in order to rapidly converge on a valid candidate pair that is - mutually desirable, the candidate pairs are ordered, and the checks - start with the candidate pair at the top of the list. Rapid - convergence of ICE depends on both the offerer and answerer coming to - the same conclusion on the ordering of candidate pairs. + For the same reason that the STUN transactions during address + gathering are paced at a rate of Ta transactions per second, so too + are the connectivity checks paced, also at a rate of Ta transactions + per second. However, in order to rapidly converge on a valid + candidate pair that is mutually desirable, the candidate pairs are + ordered, and the checks start with the candidate pair at the top of + the list. Rapid convergence of ICE depends on both the offerer and + answerer coming to the same conclusion on the ordering of candidate + pairs. Recall that when each candidate is encoded into SDP, it contains a - qvalue between 1 and 0, with 1 being the highest priority. 
Peer- - derived candidates, learned through the procedures described in + qvalue between 1 and 0, with 1 being the highest priority. Peer + reflexive candidates, learned through the procedures described in Section 7.10 also have a priority between 0 and 1. For each media stream, the native candidates are ordered based on their qvalues, with higher q-values coming first. Amongst candidates with the same - qvalue, they are ordered based on candidate ID, using lexicographic - order where C1 is placed before C2, if C2 precedes C1. In other - words, if the qvalues are the same, the candidates are sorted in - reverse order. This is actually important; as discussed in - Section 13, it allows peer-derived candidates to be preferred over - native ones. The result of these two ordering rules will be an - ordered list of candidates. The first candidate in this list is - given a sequence number of 1, the next is given a sequence number of - 2, and so on. This same procedure is done for the remote candidates. - The result is that each candidate pair has two sequence numbers, one - for the native candidate, and one for the remote candidate. + qvalue, they are ordered based on candidate ID, using reverse + lexicographic order, where C1 is placed before C2, if C2 precedes C1 + lexicographically. Lexicographic order can be viewed as a numerical + ordering where each "digit" is actually a number in numerical base + 256, with the mapping of characters to numerical value being defined + by their ASCII encoding. For example, the candidate with candidate + ID agD is greater than the candidate with ID ad7, and both of those + are greater than the candidate with ID zz. Consequently, if these + three candidates had equal q-values, they would be ordered as agD, + ad7, zz - reverse of their lexicographic order. + + The usage of a reverse lexicographic order is important; as discussed + in Section 13, it allows peer-derived candidates to be preferred over + native ones. 
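The candidate ordering rule above, including its worked example, can be sketched in a few lines. Candidates are sorted by qvalue, highest first. Ties are broken by reading each candidate ID as a base-256 number (one ASCII character per digit), largest first. Under this reading, a longer ID compares greater than a shorter one. Python's built-in string comparison behaves differently, as the last line shows.

```python
def id_value(candidate_id: str) -> int:
    # Each ASCII character is one base-256 "digit", most significant first.
    return int.from_bytes(candidate_id.encode("ascii"), "big")

def order_candidates(candidates):
    """candidates: (candidate_id, qvalue) tuples.
    Returns IDs ordered by qvalue (highest first), ties broken by the larger
    base-256 value first; a candidate's sequence number is its position + 1."""
    ordered = sorted(candidates, key=lambda c: (c[1], id_value(c[0])), reverse=True)
    return [cid for cid, _ in ordered]

print(order_candidates([("zz", 0.5), ("ad7", 0.5), ("agD", 0.5)]))
# ['agD', 'ad7', 'zz']
print(sorted(["zz", "ad7", "agD"], reverse=True))
# ['zz', 'agD', 'ad7'] (plain string comparison differs)
```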
+ + The result of these ordering rules will be an ordered list of + candidates. The first candidate in this list is given a sequence + number of 1, the next is given a sequence number of 2, and so on. + This same procedure is done for the remote candidates. The result is + that each candidate pair has two sequence numbers, one for the native + candidate, and one for the remote candidate. First, all of the candidate pairs for whom the smaller of the two sequence numbers equals 1 are taken first. Then, all of those for whom the smaller of the two sequence numbers equals 2 are taken next, and so on. Amongst those pairs that share the same value for their smaller sequence number, they are ordered by the larger of their two sequence numbers (smallest first). Amongst those pairs that share the same value for their smaller sequence number and the same value for their larger sequence number, the larger of the two candidate IDs in each pair are selected, and the pairs are lexicographically @@ -1016,22 +1087,30 @@ --------------------------------------------------------------------- 1 g9 1.0 1 h8 0.3 1 1 1 h8 2 88 0.8 2 h8 0.3 1 2 1 h8 3 g9 1.0 1 65 0.2 2 2 1 g9 4 g9 1.0 1 k1 0.1 3 3 1 k1 5 88 0.8 2 65 0.2 2 2 2 88 6 88 0.8 2 k1 0.1 3 3 2 k1 This ordering is then modified slightly by taking the candidate pair corresponding to the active candidate, if there is one, and promoting - it to the top of the list. This allows the current active candidate - to be tested first. As discussed below, media is not sent until the + it to the top of the list. To find this candidate pair, the agent + looks for candidate pairs whose native and remote transport addresses + match the native and remote transport addresses in the m/c-line. It + is possible that multiple candidates match; this happens in the case + where an agent obtained the same derived transport address from + different local transport addresses. In such a case, the agent + should pick one of the matching candidates. 
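The pair ordering above can be sketched as follows. The code reproduces the six-row example table: native candidates g9 (1.0) and 88 (0.8); remote candidates h8 (0.3), 65 (0.2) and k1 (0.1). Pairs are sorted by the smaller sequence number, then by the larger one. The active pair, if any, is then promoted to the top. The sentence giving the final tiebreak is cut off in this excerpt. Ordering by the larger candidate ID in descending base-256 value is therefore an assumption, chosen because it matches rows 2 and 3 of the table. All candidates are assumed to use UDP, so no transport filtering is done.

```python
def id_value(candidate_id: str) -> int:
    return int.from_bytes(candidate_id.encode("ascii"), "big")

def sequence_numbers(cands):
    # Highest qvalue first; ties broken by larger base-256 ID value first.
    ordered = sorted(cands, key=lambda c: (c[1], id_value(c[0])), reverse=True)
    return {cid: seq for seq, (cid, _) in enumerate(ordered, start=1)}

def pair_check_order(native, remote, active=None):
    nseq, rseq = sequence_numbers(native), sequence_numbers(remote)
    pairs = [(n, r) for n in nseq for r in rseq]

    def key(pair):
        a, b = nseq[pair[0]], rseq[pair[1]]
        larger_id = max(pair, key=id_value)
        # Smaller sequence number, then larger one, then larger ID descending (assumed).
        return (min(a, b), max(a, b), -id_value(larger_id))

    ordered = sorted(pairs, key=key)
    if active in ordered:          # promote the active candidate pair
        ordered.remove(active)
        ordered.insert(0, active)
    return ordered

native = [("g9", 1.0), ("88", 0.8)]
remote = [("h8", 0.3), ("65", 0.2), ("k1", 0.1)]
for i, pair in enumerate(pair_check_order(native, remote), start=1):
    print(i, pair)
```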
+ + Putting the active candidate at the top of the list allows it to be + tested first. As discussed below, media is not sent until the corresponding candidate is verified, necessitating rapid verification of the active candidate. This modified ordering is called the candidate pair check ordering, since it reflects the order in which connectivity checks will be done. If there was no active candidate, the candidate pair check ordering and the candidate pair priority ordering will be identical. Within each candidate pair there will be a set of transport address pairs, one for each component ID. Those pairs are ordered by component ID. The result is an absolute ordering of all transport @@ -1040,52 +1119,53 @@ followed by the order of their component IDs. This ordering is called the transport address pair check ordering. Ordering of candidates may involve transport protocol specific considerations. There are none for UDP. However, extensions that define usage of ICE with other transport protocols SHOULD specify any special ordering considerations. 7.6 Performing the Connectivity Checks - Connectivity checks are performed by sending peer-to-peer STUN - Binding Requests. These checks result in a candidate progressing - through a state machine that captures the progress of connectivity - checks. The specific state machine and the procedures for the - connectivity checks are specific to the transport protocol. This - specification defines rules for UDP. Extensions to ICE that describe - other transport protocols SHOULD describe the state machine and the - procedures for connectivity checks. + Connectivity checks are a STUN usage defined in [13]. They are + performed by sending peer-to-peer STUN Binding Requests. These + checks result in a candidate progressing through a state machine that + captures the progress of connectivity checks. The specific state + machine and the procedures for the connectivity checks are specific + to the transport protocol. 
This specification defines rules for UDP. + Extensions to ICE that describe other transport protocols SHOULD + describe the state machine and the procedures for connectivity + checks. The set of states visited by the offerer and answerer are depicted graphically in Figure 4 | |Start | | V +------------+ | | | | | Waiting |----------------+ | | | | | | +------------+ | | | | Timer Ta | Get Req | --------. | ------- - | Send Req | Send Res, - V | Send Req - Get Res +------------+ Get Req | - ------- | | ------- | - - | | Send Res | + | Send Req Get Req | Send Res, + V ------- | Send Req + Get Res +------------+ Send Res, | + ------- | | Re-Xmit | + - | | Req | +---------------| Testing |-----------+ | | | | | | | | | | | | +------------+ | | | | | | | | Error | | | | ----- | | Timer Tr | | - | | -------- V V V V Send Req +------------+ +------------+ +------------+ @@ -1130,73 +1210,88 @@ Binding Request. The USERNAME directly contains the transport address pair ID. Requests that are sent by an agent as part of the processing described here encode the transport address pair in the USERNAME. Binding Responses are matched to their requests using the STUN transaction ID, and then mapped to the transport address pair from that. Every Ta seconds, the agent starts a new connectivity check for a transport address pair. The check is started for the first transport address pair in the transport address pair check ordered list (which - will be the active candidate) that is in the Waiting state. The - state machine for this transport address pair is moved to the Testing - state, and the agent sends a connectivity check using a STUN Binding - Request, as outlined in Section 7.7. Once a STUN connectivity check - begins, the processing of the check follows the rules for STUN. - Specifically, retransmits of STUN requests are done as specified in - RFC 3489, and furthermore, if a transaction fails and needs to be - retried, that retry can happen rapidly, as described below. 
It - doesn't "count" against the rate limit of 1/Ta checks per second. In - addition, the keepalives that are generated for a valid pair do not - count against the rate limit either. The rate limit applies strictly - to the start of connectivity checks by the answerer for a transport + will be part of the active candidate) that is in the Waiting state. + The state machine for this transport address pair is moved to the + Testing state, and the agent sends a connectivity check using a STUN + Binding Request, as outlined in Section 7.7. Once a STUN + connectivity check begins, the processing of the check follows the + rules for STUN. Specifically, retransmits of STUN requests are done + as specified in [13], and furthermore, if a transaction fails and + needs to be retried, that retry can happen rapidly, as described + below. It doesn't "count" against the rate limit of 1/Ta checks per + second. In addition, the keepalives that are generated for a valid + pair do not count against the rate limit either. The rate limit + applies strictly to the start of connectivity checks for a transport address pair that has been newly signaled through an offer/answer exchange. In addition, if, while in the Waiting state, an agent receives a Binding Request matching that transport address pair, and this - Binding Request generates a successful response, the agent moves into - the Send-Valid state, and sends a connectivity check of its own using - a STUN Binding Request, as outlined in Section 7.7. If the Binding - Request didn't generate a success response, there is no change in - state or generation of a Binding Request. + Binding Request generates a successful response, the transport + address pair moves into the Send-Valid state, and the agent sends a + connectivity check of its own using a STUN Binding Request, as + outlined in Section 7.7. If the Binding Request didn't generate a + success response, there is no change in state or generation of a + Binding Request. 
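The pacing rule and the Waiting-state transitions can be sketched as follows. The class and method names are hypothetical. Transport address pairs are identified by IDs in the L:1:R:1 style used in this section. The `sent` list only records which pairs a Binding Request would be sent for, and actual STUN transmission, retransmission and the later states are not modeled.

```python
from enum import Enum


class PairState(Enum):
    WAITING = "Waiting"
    TESTING = "Testing"
    SEND_VALID = "Send-Valid"


class PacedChecker:
    """Hypothetical scheduler for the Waiting-state rules of Section 7.6."""

    def __init__(self, check_order):
        # Transport address pair IDs in transport address pair check ordering.
        self.order = list(check_order)
        self.state = {tap: PairState.WAITING for tap in self.order}
        self.sent = []  # pairs for which a Binding Request would be sent

    def on_ta_timer(self):
        """Run every Ta seconds: start a check for the first Waiting pair."""
        for tap in self.order:
            if self.state[tap] is PairState.WAITING:
                self.state[tap] = PairState.TESTING
                self.sent.append(tap)
                return tap
        return None  # no pair left in the Waiting state

    def on_binding_request(self, tap, answered_with_success):
        """A Binding Request matching `tap` was received and answered."""
        if answered_with_success and self.state[tap] is PairState.WAITING:
            # Move to Send-Valid and send a connectivity check of our own.
            self.state[tap] = PairState.SEND_VALID
            self.sent.append(tap)
        # A failed request, or one arriving in another state, is not
        # handled by this sketch.
```

Because a pair that moved to Send-Valid is no longer Waiting, the Ta timer skips it and starts the next Waiting pair in the check ordering.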
If, while in the Testing state, the agent receives a successful - response to its STUN request, it moves into the Recv-Valid state. In - this state, the agent knows that packets can flow in both directions. - However, its peer agent doesn't yet know that; all it knows is that - it has been able to receive a packet. Thus, in this state, the agent - awaits receipt of the Binding Request sent by its peer, as the - response to that request is what informs its peer that packets can - flow in both directions. + response to its STUN request, the transport address pair moves into + the Recv-Valid state. In this state, the agent knows that packets + can flow in both directions. However, its peer agent doesn't yet + know that; all it knows is that it has been able to receive a packet. + Thus, in this state, the agent awaits receipt of the Binding Request + sent by its peer, as the response to that request is what informs its + peer that packets can flow in both directions. + + If, while in the Testing state, the agent receives a Binding Request + matching that transport address pair, and this Binding Request + generates a successful response, the transport address pair moves + into the Send-Valid state. In addition, the agent retransmits a + Binding Request for the transaction in progress. This helps speed up + bidirectional connectivity verification when one agent is behind a + symmetric NAT. If the Binding Request didn't generate a success + response, there is no change in state or generation of a Binding + Request. If, while in the Send-Valid state, the agent receives a successful - response to its STUN request, it moves to the Valid state. In this - state, the agent knows that packets can flow in each direction. It - also knows that its peer has sent it the STUN Request whose response - will demonstrate to the peer that packets can flow in each direction. + response to its STUN request, the transport address pair moves to the + Valid state. 
In this state, the agent knows that packets can flow in + each direction. It also knows that its peer has sent it the STUN + Request whose response will demonstrate to the peer that packets can + flow in each direction. If, while in the Recv-Valid state, the agent receives a STUN Binding Request from its peer that results in a successful response, the - agent moves into the Valid state. Receipt of a request whose - response was not a successful one does not result in a change in - state. + transport address pair moves into the Valid state. Receipt of a + request whose response was not a successful one does not result in a + change in state. In any state, if the STUN transaction results in an error, the state - machine moves into the invalid state. + machine moves into the invalid state. A STUN transaction produces an + "error" based on the processing in Section 7.7, which indicates which + STUN response codes constitute an error as far as ICE processing is + concerned. If a transport address pair is in the Recv-Valid or Valid state, an agent MUST generate a new STUN Binding Request transaction every Tr seconds. This transaction ensures that NAT bindings for the transport address pair remain open while the candidate is under consideration. The transaction is performed as outlined in - Section 7.7. These transactions can also be used to keep the + Section 7.7. These transactions can also be used to keep the NAT bindings alive when the candidate is promoted to active, as described in Section 7.12. Tr SHOULD be configurable, and SHOULD default to 15 seconds. If the transaction results in an error, the state machine moves to the invalid state. This happens in cases where the NAT bindings expire (e.g., due to binding timeouts or NAT failures). The candidate pair itself has a state, which is derived from the states of its transport address pairs. 
If at least one of the transport address pairs in a candidate pair is in the invalid state, the state of the candidate pair is considered to be invalid. If the @@ -1220,181 +1315,166 @@ in the Waiting or Testing states, and at least one is in the Testing state, the state of the candidate pair is Testing. Otherwise, the state of the candidate pair is considered Indeterminate. A candidate itself also has a state. If a candidate is present in at least one valid candidate pair, that candidate is said to be valid. If all of the candidate pairs containing that candidate are invalid, the candidate itself is invalid. Otherwise, the candidate's state is Indeterminate. - If a native candidate becomes valid, and is more preferred than the - active one, the offerer sends an updated offer with this newly - validated candidate promoted to the m/c-line. This process is - discussed in more detail in Section 7.9. - 7.7 Sending a Binding Request for Connectivity Checks - An agent performs a Binding Request transaction by sending a STUN - Binding Request from its native transport address, and sending it to - the remote transport address. The meaning of "sending from its - native transport address" depends on the type of transport protocol - and the type of transport address (local, STUN-derived, TURN-derived, - or peer-derived). This specification defines the meaning for UDP. + An agent performs a connectivity check on a transport address pair by + sending a STUN Binding Request from its native transport address, and + sending it to the remote transport address. The meaning of "sending + from its native transport address" depends on the type of transport + protocol and the type of transport address (local, reflexive, or + relayed). This specification defines the meaning for UDP. Specifications defining other transport protocols must define what this means for them. 
For UDP-based local transport addresses, sending from the local transport address has the meaning one would expect - the request is - sent such that the source IP address and port For STUN derived UDP - transport addresses, it is sent by sending from the local transport - address used to derive that STUN address. For TURN derived UDP - transport addresses, it is sent by using TURN mechanisms to send the - request through the TURN server (using the SEND primitive). Sending - the request through the TURN server neccesarily requires that the - request be sent from the client, using the local transport address - used to derive the TURN transport address. + sent such that the source IP address and port equal those of the local + transport address. For reflexive transport addresses, it is sent by + sending from the associated local transport address used to derive + that reflexive address. For relayed transport addresses, it is sent + by using STUN mechanisms to send the request through the STUN relay + (using the Send request). Sending the request through the STUN relay + server necessarily requires that the request be sent from the client, + using the local transport address used to derive the relayed + transport address. The Binding Request sent by the agent MUST contain the USERNAME attribute. This attribute MUST be set to the transport address pair ID of the corresponding transport address pair as seen by its peer. Thus, for the first transport address pair in Figure 2, if the agent on the left sends the STUN Binding Request, the USERNAME will have the value R:1:L:1. If the agent on the right sends the STUN Binding Request, the USERNAME will have the value L:1:R:1. To be clear, the USERNAME that is used is NOT the one seen locally, but rather the one as seen by its peer. The request SHOULD contain the MESSAGE- - INTEGRITY attribute, computed according to RFC 3489 procedures.
The - key used as input to the HMAC is the password provided by the peer - for this remote transport address. The Binding Request MUST NOT - contain the CHANGE-REQUEST or RESPONSE-ADDRESS attribute. + INTEGRITY attribute, computed according to [13]. The key used as + input to the HMAC is the password provided by the peer for this + remote transport address. This password will be identical for all + remote transport addresses for the same media stream. The STUN transaction will generate either a timeout, or a response. If the response is a 420, 500, or 401, the agent should try again as - described in RFC 3489 (as mentioned above, it need not wait Ta - seconds to try again). Either initially, or after such a retry, the - STUN transaction might produce a non-recoverable failure response - (error codes 400, 430, 431, or 600) or a failure result inapplicable - to this usage of STUN and thus unrecoverable (432, 433). If this - happens, an error event is generated into the state machine, and the - transport address pair enters the invalid state. + described in [13] (as mentioned above, it need not wait Ta seconds to + try again). Either initially, or after such a retry, the STUN + transaction might produce a non-recoverable failure response (error + codes 400, 430, 431, or 600) or a failure result inapplicable to this + usage of STUN and thus unrecoverable (432, 433). If this happens, an + error event is generated into the state machine, and the transport + address pair enters the invalid state. If the STUN transaction times out, the client SHOULD NOT retry. The only reason a retry might succeed is if there was severe packet loss during the duration of the check, or the answer was significantly delayed, also due to packet loss. However, STUN Binding Request transactions run for 9.5 seconds, which is well beyond the typical tolerance for a session establishment. The retries come with a penalty of additional traffic, which can be used to launch DoS attacks Section 13.4.2. 
The only reason to not follow the SHOULD NOT is if the agent has adjusted the STUN transaction timers to be more aggressive. If the Binding Response is a 200, the agent SHOULD check for the - MESSAGE-INTEGRITY attribute and verify it, as discussed in RFC 3489. + MESSAGE-INTEGRITY attribute and verify it, as discussed in [13]. Indeed, this check SHOULD be done for all responses. This will result in the response being discarded (eventually leading to a timeout), if the integrity check fails. 7.8 Receiving a Binding Request for Connectivity Checks As a result of providing a list of candidates in its offer or answer, an agent will receive STUN Binding Request messages. An agent MUST be prepared to receive STUN Binding Requests on each local transport address from the moment it sends an offer or answer that contains a candidate with that local transport address. Similarly, it MUST be prepared to receive STUN Binding Requests on a local transport - address the moment it sends an offer or answer that contains a STUN - or TURN candidate derived from a local candidate containing that - local transport address. It can cease listening for STUN messages on - that local transport address after sending an updated offer or answer - which does not include any candidates with transport addresses that - are equal to or derived from that local transport address. - - The agent does not need to provide STUN service on any other IP - addresses or ports, unlike the STUN usage described in [1]. The need - to run the service on multiple ports is to support receipt of Binding - Requests with the CHANGE-REQUEST attribute. However, that attribute - is not used when STUN is used for connectivity checks. A server - SHOULD reject, with a 400 answer, any STUN requests with a CHANGE- - REQUEST attribute whose value is non-zero. The CHANGED-ADDRESS - attribute in a BindingAnswer is set to the transport address on which - the server is running. 
+ address the moment it sends an offer or answer that contains a + reflexive or relayed candidate derived from a local candidate with + that local transport address. It can cease listening for STUN + messages on that local transport address after sending an updated + offer or answer which does not include any candidates with transport + addresses that are equal to or derived from that local transport + address. - Furthermore, there is no need to support TLS or to be prepared to - receive SharedSecret request messages. Those messages are used to - obtain shared secrets to be used with BindingRequests. However, with - ICE, these shared secrets are exchanged through the offer/answer - exchange itself. + As discussed in [13], since the username and password for STUN + requests are exchanged through another mechanism - here, ICE - the + Shared Secret Request mechanism is not needed and need not be + implemented by agents that provide the connectivity check usage. - One of the candidates may be in use as the active candidate. For the - transport addresses comprising that candidate, the agent will receive - both STUN requests and media packets on its associated local - transport addresses. The agent MUST be able to disambiguate them. - In the case of RTP/RTCP, this disambiguation is easy. RTP and RTCP - packets start with the bits 0b10 (v=2). The first two bits in STUN - are always 0b00. This disambiguation also works for packets sent - using Secure RTP [25], since the RTP header is in the clear. - Disambiguating STUN with other media stream protocols may be more - complicated. However, it can always be possible with arbitrarily - high probabilities by selecting an appropriately random username (see - below). + One of the candidates may be in use as the active candidate, or may + become promoted to the active candidate in the next offer/answer + exchange as a consequence of a successful validation. 
In either + case, both media and STUN packets will be sent to the transport + addresses comprising that candidate, causing both to be received on their + associated local transport addresses. The agent MUST be able to + disambiguate them. This is done trivially by looking for the STUN + magic cookie as the value of the second 32-bit word in the packet. + If present, it identifies a STUN packet. Processing of the Binding Request proceeds in two steps. The first - is generation of the response, and the second is side-effect + is generation of the response, and the second is ICE-specific processing. Generation of the response follows the general - procedures of RFC 3489. The USERNAME is considered valid if its - topmost portion (the part up to, but not including the second colon) - corresponds to a transport address ID known to the agent. The - password associated with that transport address ID is used to verify - the MESSAGE-INTEGRITY attribute, if one was present in the request. - If the USERNAME was not valid, the agent generates a 430. Otherwise, + procedures of [13]. The USERNAME is considered valid if one of the + candidate IDs sent in an offer or answer is a prefix of the USERNAME + (this will always be the case, even for peer reflexive candidates). + The password associated with that candidate ID is used to verify the + MESSAGE-INTEGRITY attribute, if one was present in the request. If + the USERNAME was not valid, the agent generates a 430. Otherwise, the success response will include the MAPPED-ADDRESS attribute, which is used for learning new candidates, as described in Section 7.10. The MAPPED-ADDRESS attribute is populated with the source IP address and port of the Binding Request. For Binding Requests received over - TURN-derived transport addresses, this MUST be the source IP address - and port of the Binding Request when it arrived at the TURN relay, - prior to forwarding towards the agent.
That source transport address - will be present in the REMOTE-ADDRESS attribute of a TURN Data - Indication message, if the Binding Request were delivered through a - Data Indication. If the Binding Request was not encapsulated in a - Data Indication, that source address is equal to the current active - destination for the TURN session. + relayed transport addresses, this MUST be the source IP address and + port of the Binding Request when it arrived at the relay, prior to + forwarding towards the agent. That source transport address will be + present in the REMOTE-ADDRESS attribute of a STUN Data Indication + message, if the Binding Request was delivered through a Data + Indication. If the Binding Request was not encapsulated in a Data + Indication, that source address is equal to the current active + destination for the STUN relay session. - The side effect processing involves changes to the state machine for - a transport address pair. This processing cannot be done until the + The ICE processing involves changes to the state machine for a + transport address pair. This processing cannot be done until the initial offer/answer exchange has completed. As a consequence, if - the answerer received a Binding Request that generated a success + the offerer received a Binding Request that generated a success response, but had not yet received the answer to its offer, it waits - for the answer, and when it arrives, then performs the side effect + for the answer, and when it arrives, then performs the ICE processing. The agent takes the entire contents of the USERNAME, and compares them against the transport address pair identifiers as seen by that agent for each transport address pair. If there is no match, nothing is done - this should never happen for compliant implementations. If there is a match, the resulting transport address pair is called the matching transport address pair.
The state machine for the matching transport address pair is then updated based on the receipt of a STUN Binding Request, and the resulting actions described in Section 7.6 are undertaken. - An agent will continue to receive periodic STUN transactions on a - local transport address as long as it had listed that transport + An agent will continue to receive periodic STUN connectivity checks + on a local transport address as long as it had listed that transport address, or one derived from it, in an a=candidate attribute in its - most recent offer or answer, and the state machine indicates that - Binding Requests are periodically sent (as is the case for UDP). It - MUST process any such transactions according to this section. It is - possible that a transport address pair that was previously valid may - become invalidated as a result of a subsequent failed STUN - transaction. + most recent offer or answer, the state machine for the corresponding + transport address pair is in the Recv-Valid or Valid state, and the + transport address is for UDP. Whether STUN keepalives are used for other + transport protocols is defined by the specifications for that + transport protocol. The agent processes any such transactions + according to this section. It is possible that a transport address + pair that was previously valid may become invalidated as a result of + a subsequent failed STUN transaction. 7.9 Promoting a Candidate to Active As a consequence of the connectivity checks, each agent will change the states for each transport address pair, and consequently, for the candidate pairs. When a candidate pair becomes valid, and the agent is in the role of offerer for that candidate pair, the agent follows the logic in this section. The rules only apply to the offerer of a candidate pair in order to eliminate the possibility of both agents simultaneously offering an update to promote a candidate to active.
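Two receive-side steps from Section 7.8 are sketched below: telling STUN apart from media, and validating the USERNAME. The magic cookie value 0x2112A442 is the one later fixed in RFC 5389. It is an assumption with respect to the STUN revision cited as [13]. The function names are hypothetical. When several signaled candidate IDs are prefixes of the USERNAME, the sketch picks the longest one. That is a design choice not stated in the text.

```python
import struct

# Magic cookie as fixed in RFC 5389; assumed here for the STUN revision [13].
STUN_MAGIC_COOKIE = 0x2112A442


def is_stun(packet: bytes) -> bool:
    """True if the second 32-bit word (bytes 4..7) equals the magic cookie."""
    if len(packet) < 8:
        return False
    (word,) = struct.unpack_from("!I", packet, 4)
    return word == STUN_MAGIC_COOKIE


def candidate_for_username(username: str, signaled_ids):
    """Return the signaled candidate ID that is a prefix of USERNAME.

    None means the USERNAME is not valid and the agent answers with a 430.
    If several IDs match, the longest one is chosen (a design choice).
    """
    matches = [cid for cid in signaled_ids if username.startswith(cid)]
    return max(matches, key=len) if matches else None
```

The returned candidate ID selects the password used to verify MESSAGE-INTEGRITY. Only after that does ICE-specific processing match the full USERNAME against the transport address pair IDs.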
@@ -1411,184 +1491,174 @@ If this candidate pair is not the first on the candidate pair priority ordered list or the candidate pair check ordered list, and the wait-state timer has not yet been set, the agent sets this timer to Tws seconds. Tws SHOULD be configurable, and SHOULD have a default of 100ms. This timer allows for a higher priority connectivity check to complete, in the event its STUN Binding Request was lost or delayed in the network. If, prior to the wait-state timer firing, another connectivity check completes and a candidate pair is validated, there is no need to reset or cancel the timer. + Once the timer fires, the agent SHOULD issue an updated offer as described in Section 7.11.1. + In addition, in order to speed up ICE processing, once the agent has + determined the candidate that is to be promoted, it will send and + receive media using that candidate in expectation of an updated + offer. This is discussed in Section 7.13. + 7.10 Learning New Candidates from Connectivity Checks - ICE makes use of candidate addresses learned through protocols like - STUN, as described in Section 7.1. These addresses are learned when - STUN requests are sent to configured STUN servers. However, the - peer-to-peer STUN connectivity checks can themselves provide - additional candidates that ICE can make use of. This happens, for - example, when two agents are separated by a symmetric NAT. When the - agent behind the symmetric NAT sends a Binding Request to the other - agent (which can have a public address or be behind any type of NAT - except for symmetric), the symmetric NAT will create a new NAT - binding for this Binding Request. Because of the properties of - symmetric NAT, that binding can be used be the agent on the public - side of the symmetric NAT to send packets back to the agent behind - the symmetric NAT. + ICE makes use of reflexive addresses, which are addresses that inform + an agent of its transport address as seen by another host. 
An + initial offer or answer generated by an agent includes server + reflexive addresses, which are learned from a configured or + discovered STUN server in the network. However, the connectivity + checks themselves can inform an agent of reflexive addresses, and in + particular, ones that are reflexive towards its peer. These are + called peer reflexive candidates. A new peer reflexive candidate is + typically observed when two agents are separated by a NAT with the + address-dependent or address and port dependent mapping properties + [37]. When the agent behind such a NAT sends a Binding Request to + the other agent (assuming it is reachable), the NAT will create a new + mapping for this Binding Request. Because STUN and the media packets + are sent on the same port, regardless of the filtering properties of + the NAT (whether endpoint independent, address dependent, or address + and port dependent), this reflexive address can be used by the peer + for sending STUN and media packets back towards the agent. - To do this, ICE agents perform additional processing on the receipt - of STUN Binding Requests and responses, beyond the logic described in - Section 7.7 and Section 7.8. This logic is described below. + To obtain and use these peer reflexive transport addresses, ICE + agents perform additional processing on the receipt of STUN Binding + Requests and responses, beyond the logic described in Section 7.7 and + Section 7.8. This logic is described below. 7.10.1 On Receipt of a Binding Request When a STUN Binding Request is received which generates a success response, that Binding Request would have been associated with a matching transport address pair and corresponding candidate pair. The source IP and port of this Binding Request are compared to the IP address and port of the remote transport address in the matching transport address pair. Note that, in this case, we are comparing actual IP addresses and ports - not tids. 
In addition, if the - Binding Request arrived through a TURN derived transport address, the + Binding Request arrived through a relayed transport address, the source IP and port of this binding request used for the comparison - are those in the Binding Request when it arrived at the TURN relay, - prior to forwarding towards the agent. That source transport address - will be present in the REMOTE-ADDRESS attribute of a TURN Data - Indication message, if the Binding Request were delivered through a - Data Indication. If the Binding Request was not encapsulated in a - Data Indication, that source address is equal to the current active - destination for the TURN session. + are those in the Binding Request when it arrived at the relay, prior + to forwarding towards the agent. That source transport address will + be present in the REMOTE-ADDRESS attribute of a STUN Data Indication + message, if the Binding Request were delivered through a Data + Indication. If the Binding Request was not encapsulated in a Data + Indication, that source address is equal to the current active + destination for the STUN relay session. The comparison of the source IP and port of the Binding Request and the IP address and port of the remote transport address in the - matching transport address pair may not match. One reason this could - happen is if there was a NAT between the two agents. If they do not - match, the source IP and port of the Binding Request (and again, for - TURN derived transport address, this refers to the source IP address - and port of the packet when it arrived at the relay) are compared to - the IP address and ports across the transport address pairs in *all* + matching transport address pair may indicate inequality. 
In that + case, the source IP and port of the Binding Request (and again, for + relayed transport address, this refers to the source IP address and + port of the packet when it arrived at the relay) are compared to the + IP address and ports across the transport address pairs in *all* remote candidates. If there is still no match, it means that the source IP and port might represent another valid remote transport - address. Such a transport address is called a peer-derived transport - address. + address - a peer derived one. To use it, that address needs to be associated with a candidate (called a peer-derived candidate). In this case, however, the candidate isn't signaled through an offer/answer exchange; it is constructed dynamically from information in the STUN request. Like all other candidates, the peer-derived candidate has a candidate ID. The candidate ID is derived from the candidate IDs of the matching candidate pair. In particular, the candidate ID is constructed by concatenating the remote candidate ID with the native candidate ID - (without the colon). + (without the colon). The password for the new candidate equals that + of the remote candidate ID in the matching candidate pair. On receipt of a STUN Binding Request whose source IP and port don't match the transport address in any remote candidate, the agent - constructs the candidate ID that represents the peer-derived + constructs the candidate ID that represents the peer reflexive candidate, and checks to see if that candidate exists. It may already exist if it had been constructed as a consequence of a previous application of this logic on receipt of a Binding Request for a different transport address pair of the same candidate pair. - If there is not yet a peer derived candidate with that candidate ID, - the agent creates it, and assigns it the newly computed candidate ID. 
- The priority of the peer-derived candidate MUST be set to the + If there is not yet a peer reflexive candidate with that candidate + ID, the agent creates it, and assigns it the newly computed candidate + ID. The priority of the peer-derived candidate MUST be set to the priority of its generating candidate - the remote candidate in the matching transport address pair. Note that, at this time, the peer derived candidate has no transport addresses in it. Newly created or not, the agent extracts the component ID from the matching transport address pair, and sees if a transport address with - that same component ID exists in the peer derived candidate. If not - (and it shouldn't), the agent adds a transport address to the peer- - derived candidate. This transport address is equal to the source IP - address and port from the incoming STUN Binding Request. It is - assigned the component ID equal to the component ID in the matching - transport address pair. This transport address will have a tid, - equal to the concatenation of the candidate ID for this new - candidate, and the component ID, separated by a colon. + that same component ID exists in the peer reflexive candidate. If + not (and it shouldn't), the agent adds a transport address to the + peer reflexive candidate. This transport address is equal to the + source IP address and port from the incoming STUN Binding Request + (and in the case of a relayed transport address, the one seen by the + relay). It is assigned the component ID equal to the component ID in + the matching transport address pair. This transport address will + have a tid, equal to the concatenation of the candidate ID for this + new candidate, and the component ID, separated by a colon. - The peer-derived candidate becomes usable once the number of + The peer reflexive candidate becomes usable once the number of transport addresses in it equals the transport address pair count of - the candidate pair from which it is derived. 
Initially, the peer- - derived candidate will start with a single transport address. More + the candidate pair from which it is derived. Initially, the peer + reflexive candidate will start with a single transport address. More are added as the connectivity checks for the original candidate pair - take place. Once the peer-derived candidate becomes usable, it has + take place. Once the peer reflexive candidate becomes usable, it has to be paired up with native candidates. However, unlike the procedures of Section 7.5, which pair up each remote candidate with - each native candidate, this peer-derived candidate is only paired up - with the native candidate from the candidate pair from which it was - derived. This creates a new candidate pair, and a set of new + each native candidate, this peer reflexive candidate is only paired + up with the native candidate from the candidate pair from which it + was derived. This creates a new candidate pair, and a set of new transport address pairs. Recall that, for each candidate pair, one agent plays the role of - offerer, and the other of answerer. For peer-derived candidates, the - agent that receives the STUN request and follows the processing in - this section acts as the answerer. + offerer, and the other of answerer. For a peer-reflexive candidate, + the role is identical to that of its generating candidate. - Figure 5 provides a pictorial representation of the peer derived + Figure 5 provides a pictorial representation of the peer reflexive candidate (the one with id=RL) and its pairing with the native candidate with id L. The candidate with ID R is referred to as the - generating candidate. The peer-derived candidate is effectively an + generating candidate. The peer reflexive candidate is effectively an alternate for that generating candidate, but is only paired with a specific native candidate. 
Note that, for a particular generating candidate, there can be many peer derived candidates, up to one for each native candidate. ............. ............. . tid=L:1 . . tid=R:1 . component. -- . id=L:1:R:1 . -- .component - id=1 . | A|------------------------| C| . id=1 + id=1 . | A|-------------------------| C| . id=1 . -- -------+ . -- . - . . | . . . . | . . Generating . . | . . Candidate - . . | . . - . . | . . . tid=L:2 . | . tid=R:2 . component. -- . | id=L:2:R:2 . -- .component - id=2 . | B|-------C----------------| D| . id=2 + id=2 . | B|-------C-----------------| D| . id=2 . -- -----+ | . -- . - . .| | . . - . .| | . . - . .| | . . - . .| | . . .............| | ............. Native | | Remote Candidate | | Candidate id=L | | id=R | | - | | - .| | - | | - | | - | | | | ............. | | . tid=RL:1 . | | id=L:1:RL:1 . -- .component | +-----------------| C| . id=1 | . -- . - | . . | . . Peer Derived | . . Candidate - | . . - | . . | . tid=RL:2 . | id=L:2:RL:2 . -- .component +-------------------| D| . id=2 . -- . - . . - . . - . . - - . . ............. Remote Candidate id=RL Figure 5 The new transport address pairs have a state machine associated with them. The state that is entered, and actions to take as a consequence, are specific to the transport protocol. For UDP, the @@ -1614,73 +1684,74 @@ candidate pair. This matching is done based on comparison of candidate IDs. The value of the MAPPED-ADDRESS attribute of the Binding Response are compared to the IP address and port of the native transport address in the matching transport address pair. Note that, in this case, we are comparing actual IP addresses and ports - not tids. These may not match if there was a NAT between the two agents. If they do not match, the value of the MAPPED-ADDRESS attribute of the Binding Response are compared to the IP address and ports across the transport address pairs in *all* native candidates. 
If there is still no match, it means that the MAPPED-ADDRESS might - represent another valid remote transport address. + represent another valid native transport address. To use it, that address needs to be associated with a candidate. In this case, however, the candidate isn't signaled through an offer/ answer exchange; it is constructed dynamically from information in - the STUN response. Such a candidate is called a peer-derived - candidate. Like all other candidates, the peer-derived candidate has - a candidate ID. The candidate ID is derived from the candidate IDs - of the matching candidate pair. In particular, the candidate ID is - constructed by concatenating the native candidate ID with the remote - candidate ID (without the colon). + the STUN response. Such a candidate is called a peer reflexive + candidate. Like all other candidates, the peer reflexive candidate + has a candidate ID. The candidate ID is derived from the candidate + IDs of the matching candidate pair. In particular, the candidate ID + is constructed by concatenating the native candidate ID with the + remote candidate ID (without the colon). The password for the new + candidate equals that of the native candidate ID in the matching + candidate pair. On receipt of a STUN Binding Response whose MAPPED-ADDRESS didn't match the transport address in any native candidate, the agent - constructs the candidate ID that represents the peer-derived + constructs the candidate ID that represents the peer reflexive candidate, and checks to see if that candidate exists. It may already exist if it had been constructed as a consequence of a previous application of this logic on receipt of a Binding Response for a different transport address pair of the same candidate pair. If there is not yet a peer derived candidate with that candidate ID, the agent creates it, and assigns it the newly computed candidate ID. 
The priority of the new candidate MUST be set to the priority of the generating candidate - the native candidate in the matching transport address pair. Note that, at this time, the peer derived candidate has no transport addresses in it. Newly created or not, the agent extracts the component ID from the matching transport address pair, and sees if a transport address with - that same component ID exists in the peer derived candidate. If not - (and it shouldn't), the agent adds a transport address to the peer- - derived candidate. This transport address is equal to the MAPPED- - ADDRESS from the STUN Binding Response. It is assigned the component - ID equal to the component ID in the matching transport address pair. - This transport address will have a tid, equal to the concatenation of - the candidate ID for this new candidate, and the component ID, - separated by a colon. + that same component ID exists in the peer reflexive candidate. If + not (and it shouldn't), the agent adds a transport address to the + peer reflexive candidate. This transport address is equal to the + MAPPED-ADDRESS from the STUN Binding Response. It is assigned the + component ID equal to the component ID in the matching transport + address pair. This transport address will have a tid, equal to the + concatenation of the candidate ID for this new candidate, and the + component ID, separated by a colon. The peer-derived candidate becomes usable once the number of transport addresses in it equals the transport address pair count of the candidate pair from which it is derived. Initially, the peer-derived candidate will start with a single transport address. More are added as the connectivity checks for the original candidate pair take place. Once the peer-derived candidate becomes usable, it has to be paired up with remote candidates.
However, unlike the procedures of Section 7.5, which pair up each remote candidate with each native candidate, the peer-derived candidate is only paired up with the remote candidate from the matching candidate pair. This creates a new candidate pair, and a set of new transport address pairs. Recall that, for each candidate pair, one agent plays the role of - offerer, and the other of answerer. For peer-derived candidates, the - agent that receives the STUN request and follows the processing in - this section acts as the answerer. + offerer, and the other of answerer. For a peer-reflexive candidate, + the role is identical to that of its generating candidate. The new transport address pairs have a state machine associated with them. The state that is entered, and actions to take as a consequence, are specific to the transport protocol. For UDP, the procedures are defined here. Extensions that define processing for other transport protocols SHOULD describe the behavior. For UDP, the state machine enters the Recv-Valid state. Effectively, the Binding Response just received "counts" as a validation in this direction, even though it was formally done for a different candidate @@ -1704,173 +1775,229 @@ If there are any aspects of this processing that are specific to the transport protocol, those SHOULD be called out in ICE extensions that define operation with other transport protocols. There are no additional considerations for UDP. 7.11.1 Sending of a Subsequent Offer The offer MAY contain a new active candidate in the m/c line. This candidate SHOULD be the native candidate from the highest candidate pair in the candidate pair priority ordered list whose state is - valid. If there are no candidate pairs in this state, the highest - one whose state is partially valid SHOULD be used. If there are no - candidate pairs in this state, the candidate pair that is most likely - to work with this peer, as described in Section 7.2, SHOULD be used.
- The candidate is encoded into the m/c line in an updated offer as - described in Section 7.3. + Valid. If there are no candidate pairs in this state, the highest + one whose state is Send-Valid or Recv-Valid SHOULD be used. If there + are no candidate pairs in these states, the candidate pair that is + most likely to work with this peer, as described in Section 7.2, + SHOULD be used. The candidate is encoded into the m/c line in an + updated offer as described in Section 7.3. If the candidate pair whose native candidate was encoded into the - m/c-line was valid or partially valid, the agent MUST include an - a=remote-candidate attribute into the offer. This attribute MUST + m/c-line was Valid, Send-Valid or Recv-Valid, the agent MUST include + an a=remote-candidate attribute into the offer. This attribute MUST contain the candidate ID of the remote candidate in the candidate pair. It is used by the recipient of the offer in selecting its candidate for the answer. The meaning of a=candidate attributes within a subsequent offer have the same meaning as they do in an initial offer. They are a request for the peer to attempt (or continue to attempt if the candidate was provided previously) a connectivity check using STUN from each of its own candidates. When an updated offer is sent, there are several dispositions regarding the candidates: retained: A candidate is retained if the candidate ID for the candidate is included in the new offer, and matches the candidate - ID for a candidate in the previous offer or answer. In this case, - all of the information about the candidate - its qvalue and - components, and the IP addresses, ports, STUN passwords and - transport protocols of its components, MUST be the same as the - previous offer or answer from the agent. If the agent wants to - change them, this is accomplished by changing the candidate ID as - well. That will have the effect of removing the old candidate and - adding a new one with the updated information. 
+ ID for a candidate in the previous offer or answer from the agent. + In this case, all of the information about the candidate - its + qvalue and components, and the IP addresses, ports, and transport + protocols of its components, MUST be the same as the previous + offer or answer from the agent. If the agent wants to change + them, this is accomplished by changing the candidate ID as well. + That will have the effect of removing the old candidate and adding + a new one with the updated information. removed: A candidate is removed if its candidate ID appeared in a previous offer or answer, and that candidate ID is not present in the new offer. added: A candidate is added if its candidate ID appeared in the new offer, but was not present in a previous offer or answer from that agent. The following rules are used to determine the disposition of each of the current native candidates in the new offer: - o If a candidate is invalid, and all peer-derived candidates + o If a candidate is invalid, and all peer reflexive candidates generated from it are invalid as well, it SHOULD be removed. o If the candidate in the m/c-line is valid, all other candidates SHOULD be removed. This has the effect of stopping connectivity checks of other candidates. This SHOULD would not be followed if an agent wanted to keep a candidate ready for usage should, for some reason, the active candidate later become invalid. - o If the candidate in the m/c-line is valid, and it is not peer- - derived, that candidate MUST be retained. If the candidate in the - m/c-line is peer-derived, its generating candidate MUST be + o If the candidate in the m/c-line is valid, and it is not peer + reflexive, that candidate MUST be retained. If the candidate in + the m/c-line is peer reflexive, its generating candidate MUST be retained, even if it is itself invalid.
o If the candidate in the m/c-line has not been validated, all other candidates that are not invalid, or candidates for whom their derived candidates are not invalid, SHOULD be retained. - o Peer derived candidates MUST NOT be added; they continue to be + o Peer reflexive candidates MUST NOT be added; they continue to be used as long as their generating candidate was retained. Peer derived candidates are learned exclusively through the STUN connectivity checks. A new candidate MAY be added. This can happen when the candidate is a new one, learned since the previous offer/answer exchange, and it has a higher priority than the currently active candidate. It can also occur when an agent wishes to restart checks for a transport address it had tried previously. Effectively, changing the candidate ID value in an updated offer will "restart" connectivity checks for that candidate. - If a candidate is removed, the agent takes the following steps: + If a candidate is removed, the agent takes the following steps once + the offer is sent: 1. The agent eliminates any candidate pairs whose native candidate equalled the candidate that was removed. Equality is based on comparison of candidate IDs. 2. The agent eliminates any candidate pairs that had a native - candidate that is a peer derived candidate generated from the + candidate that is a peer reflexive candidate generated from the candidate that was removed. 3. The candidate pairs that are eliminated are removed from the candidate pair priority ordered list and candidate pair check ordered list. As a consequence of this, if connectivity checks had not yet begun for the candidate pair, they won't. 4. If connectivity checks were already in progress for transport - addresses in that candidate pair, the agent SHOULD immediately - terminate them. No further retransmissions take place, and no - further transactions from that candidate will be made. 
+ addresses in a candidate pair that was removed, the agent SHOULD + immediately terminate them. No further retransmissions take + place, and no further transactions from that candidate will be + made. - 5. If the removed candidate was a TURN-derived candidate, the agent - SHOULD de-allocate its transport addresses from the TURN server. - If a local candidate was removed, and all of its derived - candidates were also removed (including any peer-derived - candidates), local operating system resources for each of the - transport addresses in the local candidate SHOULD be de- - allocated. + 5. If the removed candidate was a relayed candidate, the agent + SHOULD de-allocate its transport addresses from the STUN relay if + it is not using those resources elsewhere. If a local candidate + was removed, and all of its derived candidates were also removed + (including any peer reflexive candidates), local operating system + resources for each of the transport addresses in the local + candidate SHOULD be de-allocated, as long as it is not using + those resources elsewhere. The resources may be in use elsewhere + if they were included in an initial offer which generated + multiple answers (as can happen with SIP forking). In such a + case, a subsequent offer which removes the candidate will not + imply its removal with the other branches; each becomes a + separate offer/answer relationship. + + Subsequent offers MUST contain the a=ice-pwd attribute. This SHOULD + have the same value as in previous offers. However, an agent MAY + change it if, for some reason, the agent believes that the password + may have been compromised. Since the same password is applied across + all transport addresses in all candidates for all media streams, a + change in the password impacts all of them. An agent MUST be + prepared to receive connectivity checks that use either the new or + old password until Tpw seconds after it receives the answer.
Tpw + SHOULD be configurable, and SHOULD default to 2 seconds. 7.11.2 Receiving the Offer and Sending an Answer To generate the answer, the answerer has to decide which transport addresses to include in the m/c line, and which to include in candidate attributes. + The first step in the process is to look for the a=remote-candidate + attribute in the offer. The a=remote-candidate exists to eliminate a + race condition between the updated offer and the response to the STUN + Binding Request that moved a candidate into the Valid state. This + race condition is shown in Figure 6. On receipt of message 5, agent + A can move its transport address pair state machine into the Valid + state. It sends a STUN response to the request (message 6), but this + is lost. Agent A proceeds with an updated offer (message 7), which + is received at agent B. As far as agent B is concerned, the transport + address pair is still in the Send-Valid state. It will move into the + Valid state only on receipt of the STUN response in message 10. + Thus, upon receipt of the offer, agent B cannot determine which + candidate to include in its answer. To eliminate this condition, the + identity of the validated candidate is included in the offer itself. + Note, however, that the answerer will not send media until it has + received this STUN response. + + Agent A Network Agent B + |(1) Offer | | + |------------------------------------------>| + |(2) Answer | | + |<------------------------------------------| + |(3) STUN Req. | | + |------------------------------------------>| + |(4) STUN Res. | | + |<------------------------------------------| + |(5) STUN Req. | | + |<------------------------------------------| + |(6) STUN Res. | | + |-------------------->| | + | |Lost | + |(7) Offer | | + |------------------------------------------>| + |(8) Answer | | + |<------------------------------------------| + |(9) STUN Req. | | + |<------------------------------------------| + |(10) STUN Res. 
| | + |------------------------------------------>| + + Figure 6 + + If the a=remote-candidate attribute is present, the agent examines + the transport addresses in the m/c-line of the offer. It compares + these with the transport addresses in the remote candidates of all + candidate pairs. If there is at least one match, the agent compares + the native candidate ID of each matching pair with the value of the + a=remote-candidate attribute. If there is a match, that candidate + pair is selected. For each transport address pair in that candidate + pair, if the state of the transport address pair is Send-Valid, the + agent considers the state to be Valid just for the purpose of + selecting the m/c-line as discussed in the paragraph below. The + actual state MUST remain Send-Valid. This is necessary to protect + against DoS attacks. + Rules for choosing transport addresses for the m/c-line are as follows. The agent examines the transport addresses in the m/c-line of the offer. It compares these with the transport addresses in the remote candidates of candidate pairs whose states are Valid. If - there is matching candidate pair in that state, the agent MUST pick - the native candidate from one of those pairs, and use that candidate - as the active one. If none of the matching pairs are in the Valid - state, the agent checks if there are any matching pairs in the Send- - Valid state. If there are, the agent looks for the a=remote- - candidate attribute in the offer. If present, and the candidate ID - listed there is one of the native candidate IDs amongst the matching - pairs, that candidate ID MUST be used as the active one. If the - a=remote-candidate attribute was not present in the offer, or there - were no matching candidate pairs in the Send-Valid state, the - candidate that is most likely to work with this peer, as described in - Section 7.2, SHOULD be used.
- - The a=remote-candidate exists to eliminate a race condition between - the updated offer and the response to the STUN Binding Request that - moved a candidate into the valid state. If the answer arrives at the - agent prior to the Binding Response, the candidate pair that was - validated by the offer will still be in the Send-Valid state. To - eliminate this condition, the identity of the validated candidate is - included in the offer itself. + there is a matching candidate pair in that state, the pair with the + highest priority MUST be chosen, and the native candidate from that + pair used as the active candidate. If there were no matching + candidate pairs in the Valid state, the candidate that is most likely + to work with this peer, as described in Section 7.2, SHOULD be used. - Like the offerer, the answer can decide, for each of its candidates, - whether they are retained or removed. The same rules defined in - Section 7.11.1 for determining their disposition apply to the - answerer. Similarly, if a candidate is removed, the same rules in - Section 7.11.1 regarding removal of canididate pairs and freeing of - resources apply. + Like the offerer, the answerer can decide, for each of its + candidates, whether they are retained or removed. The same rules + defined in Section 7.11.1 for determining their disposition apply to + the answerer. Similarly, if a candidate is removed, the same rules + in Section 7.11.1 regarding removal of candidate pairs and freeing + of resources apply. Once the answer is sent, the answerer will have the set of native and remote candidates before this offer/answer exchange, and the set of - native and remote candidates afterwards. The agent then pairs up the - native and remote candidates which were added or retained. - Furthermore, for candidate pairs containing a peer derived transport - address, those pairs continue as long as both candidates are - retained.
A peer derived candidate continues to be used as long as - its generating parent continues to be used. This leads to a set of - current candidate pairs. + native and remote candidates afterwards. A peer derived candidate + continues to be used as long as its generating parent continues to be + used. The agent then pairs up the native and remote candidates which + were added or retained. This leads to a set of current candidate + pairs. If a candidate pair existed previously, but as a consequence of the - offer/answer exchange, either its native or remote candidate has been - removed, the agent takes the following steps: + offer/answer exchange, it no longer exists, the agent takes the + following steps: 1. The candidate pair is removed from the candidate pair priority ordered list and candidate pair check ordered list. As a consequence of this, if connectivity checks had not yet begun for the candidate pair, they won't. 2. If connectivity checks were already in progress for that candidate pair, the agent SHOULD immediately terminate any STUN transactions in progress from that candidate. No further retransmissions take place, and no further transactions from that @@ -1902,156 +2029,279 @@ 7.12 Binding Keepalives Once a candidate is promoted to active, and media begins flowing, it is still necessary to keep the bindings alive at intermediate NATs for the duration of the session. Normally, the media stream packets themselves (e.g., RTP) meet this objective. However, several cases merit further discussion. Firstly, in some RTP usages, such as SIP, the media streams can be "put on hold". This is accomplished by using the SDP "sendonly" or "inactive" attributes, as defined in RFC - 3264 [5]. RFC 3264 directs implementations to cease transmission of + 3264 [4]. RFC 3264 directs implementations to cease transmission of media in these cases. However, doing so may cause NAT bindings to timeout, and media won't be able to come off hold. 
Secondly, some RTP payload formats, such as the payload format for - text conversation [34], may send packets so infrequently that the + text conversation [36], may send packets so infrequently that the interval exceeds the NAT binding timeouts. Thirdly, if silence suppression is in use, long periods of silence may cause media transmission to cease sufficiently long for NAT bindings to time out. To prevent these problems, ICE implementations MUST continue to list - their active transport addresses in a=candidate lines for UDP-based - media streams. As a consequence of this, STUN packets will be - transmitted periodically independently of the transmission (or lack - thereof) of media packets. This provides a media independent, RTP - independent, and codec independent solution for keeping the NAT - bindings alive. STUN Binding Requests cannot be used for TCP-based - transports because the media protocol may not provide framing - services to support this. As such, application layer keepalives MUST - be used in this case. + their active candidate in a=candidate lines for UDP-based media + streams. As a consequence of this, STUN packets will be transmitted + periodically independently of the transmission (or lack thereof) of + media packets. This provides a media independent, RTP independent, + and codec independent solution for keeping the NAT bindings alive. If an ICE implementation is communicating with one that does not support ICE, keepalives MUST still be sent. Indeed, these keepalives are essential even if neither endpoint implements ICE. As such, this specification defines keepalive behavior generally, for endpoints that support ICE, and those that do not. All endpoints MUST send keepalives for each media session. These keepalives MUST be sent regardless of whether the media stream is currently inactive, sendonly, recvonly or sendrecv. The keepalive SHOULD be sent using a format which is supported by its peer.
ICE endpoints allow for STUN-based keepalives for UDP streams, and as such, STUN keepalives MUST be used when an agent is communicating with a peer that supports ICE. An agent can determine that its peer supports ICE by the presence of the a=candidate attributes for each media session. If the peer does not support ICE, the choice of a packet format for keepalives is a matter of local implementation. A format which allows packets to easily be sent in the absence of actual media content is RECOMMENDED. Examples of formats which - readily meet this goal are RTP No-Op [29] and RTP comfort noise [27]. + readily meet this goal are RTP No-Op [31] and RTP comfort noise [26]. STUN-based keepalives will be sent periodically every Tr seconds as a consequence of the rules in Section 7.7. If STUN keepalives are - not in use (because the peer does not support ICE or because of TCP), - an agent SHOULD ensure that a media packet is sent every Tr seconds. - If one is not sent as a consequence of normal media communications, a - keepalive packet using one of the formats discussed above SHOULD be - sent. + not in use (because the peer does not support ICE), an agent SHOULD + ensure that a media packet is sent every Tr seconds. If one is not + sent as a consequence of normal media communications, a keepalive + packet using one of the formats discussed above SHOULD be sent. 7.13 Sending Media - An agent MUST NOT send media packets until the active candidate has - entered either the Valid or Recv-Valid state. This is to prevent a - particularly destructive denial-of-service attack described in - Section 13.4.1. + When an agent receives an offer and sends an answer, or when it + receives an answer to an offer it sent, it begins connectivity + checks. These checks will include validation of the active candidate + pair, if there was one. An agent SHOULD NOT send media on the active + candidate pair until that candidate pair has reached the Valid or + Recv-Valid state.
This is to help prevent a denial-of-service + attack, described in Section 13. Once the active candidate pair + reaches the Valid or Recv-Valid state, an agent MAY start sending + media to that candidate pair. - It is important to note that an agent always sends media to the - address in the m/c-line, not to a validated candidate. To use a - candidate, it must be promoted to the m/c-line through an updated - offer/answer exchange. + However, offer/answer exchanges are used with protocols, like SIP, + which require media to be sent "early", from the answerer to the + offerer, prior to completion of the initial offer/answer exchange. It + is highly desirable (and sometimes necessary) for this early media to + use the candidate pair ultimately selected by ICE connectivity + checks. For this reason, ICE provides an early media mechanism that + allows for a candidate pair to be used in one direction prior to its + promotion to active in a subsequent offer/answer exchange. Note + that, with ICE, early media pertains to media sent to a candidate + pair until its promotion to active in a subsequent offer/answer + exchange. This is a broader definition than is used in [29], which + defines early media as media sent prior to acceptance of a call. - When an agent sends media packets, it MUST send them from the same IP - address and port it has advertised in the m/c-line. This provides a - property known as symmetry, which is an essential facet of NAT - traversal. + As a consequence of the connectivity checks, an agent will change the + states for each transport address pair, and consequently, for the + candidate pairs. When a candidate pair becomes Valid or Recv-Valid, + and the candidate pair is not equal to the active candidate pair, and + the agent is in the role of answerer for that candidate pair, the + agent checks the position of that pair in the candidate pair priority + ordered list. If it is the first, the agent selects this candidate + pair for early media.
If this candidate pair is not the first on the + candidate pair priority ordered list, but is higher priority than the + active candidate pair, and the early media wait-state timer has not + yet been set, the agent sets this timer to Tws seconds. Tws SHOULD + be configurable, and SHOULD have a default of 100ms. This timer + allows for a higher priority connectivity check to complete, in the + event its STUN Binding Request or Response was lost or delayed in the + network. If, prior to the wait-state timer firing, another + connectivity check completes and a candidate pair enters the Valid or + Recv-Valid state, there is no need to reset or cancel the timer. + Once the timer fires, the agent SHOULD select the highest priority + candidate pair in the Valid or Recv-Valid state for which the agent + has the role of answerer, and use that candidate pair for early + media. - In the case of a STUN-derived transport address, this means that the - RTP packets are sent from the local transport address used to obtain - the STUN address. In the case of a TURN-derived transport address, - this means that media packets are sent through the TURN server (using - the TURN SEND primitive). For local transport addresses, media is - sent from that local transport address. + ICE processing will ensure that, under almost all circumstances, the + candidate pair selected by the answerer for early media will also be + the one selected by the offerer for eventual promotion to active. + The early media state implies that the answerer knows that this + candidate pair is to be used, but the offerer does not yet know that + it will eventually be validated. It is for this reason that the + candidate pair can be used for early media. - This symmetric behavior MUST be followed by an agent even if its peer - in the session doesn't support ICE.
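The early media selection rule above (select the first pair immediately, otherwise arm a one-shot Tws wait-state timer and pick the best answerer-role pair when it fires) can be sketched as follows. This is a minimal, non-normative Python illustration, assuming that the candidate pair priority ordered list is available as a list with the highest priority first, that this list contains the active candidate pair, and that the caller reports check successes and polls the timer using a monotonic clock in seconds. The names CandidatePair, EarlyMediaSelector, on_pair_valid and on_tick, and the "Unchecked" state label, are hypothetical and are not defined by this specification.

```python
from dataclasses import dataclass
from typing import List, Optional

VALID_STATES = {"Valid", "Recv-Valid"}
TWS_DEFAULT = 0.100  # seconds; the suggested default for Tws


@dataclass(eq=False)  # compare pairs by identity, never by field values
class CandidatePair:
    name: str
    state: str = "Unchecked"  # hypothetical placeholder for a not-yet-valid pair
    answerer: bool = True     # True if this agent is answerer for the pair


@dataclass
class EarlyMediaSelector:
    ordered: List[CandidatePair]  # priority ordered list, highest first
    active: CandidatePair         # pair currently in the m/c-line
    tws: float = TWS_DEFAULT
    timer_deadline: Optional[float] = None
    timer_was_set: bool = False
    early: Optional[CandidatePair] = None

    def on_pair_valid(self, pair: CandidatePair, now: float,
                      state: str = "Valid") -> None:
        """Handle `pair` entering the Valid or Recv-Valid state at `now`."""
        pair.state = state
        if pair is self.active or not pair.answerer:
            return
        if pair is self.ordered[0]:
            self.early = pair  # top of the list: select it at once
            return
        rank = self.ordered.index(pair)
        if rank < self.ordered.index(self.active) and not self.timer_was_set:
            # Arm the one-shot wait-state timer; later successes do not reset it.
            self.timer_deadline = now + self.tws
            self.timer_was_set = True

    def on_tick(self, now: float) -> None:
        """On timer expiry, pick the best Valid/Recv-Valid answerer pair."""
        if self.timer_deadline is None or now < self.timer_deadline:
            return
        self.timer_deadline = None
        for pair in self.ordered:  # highest priority first
            if pair.state in VALID_STATES and pair.answerer:
                self.early = pair
                return
```

The timer is armed at most once, mirroring the statement that a later success does not reset or cancel it; a real agent would drive on_tick from its own timer facility.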
+ If a candidate pair is selected for early media, an agent MAY send + media on that candidate pair, even if it is not the same as the + active candidate pair. However, to deal with cases in which the + offerer and answerer do not agree on the eventual selection of this + candidate for promotion to active (a rare but possible case), the + agent MUST discontinue using the candidate pair for sending media Tlo + seconds after the answer has been reliably delivered. An answer is + considered reliably delivered when the agent receives a confirmation + that it has been delivered. In the case of an answer delivered in a + 200 OK to an offer in an INVITE (in the SIP case), the answer is + considered reliably delivered upon receipt of the ACK. Tlo SHOULD be + configurable and SHOULD have a default of 5 seconds. This time + represents the amount of time it should take the offerer to perform + its connectivity checks, arrive at the same conclusion about the + viability of the early candidate, and then generate an updated offer + promoting it to active. If, after Tlo seconds, no updated offer + arrives, the answerer MUST cease using the early candidate. Media + MAY be sent to the active candidate pair if it is in the Valid or + Recv-Valid state. + + If an updated offer does arrive prior to the expiration of the timer, + the agent MUST execute the procedures in Section 7.11.2, which will + result in the selection of a candidate for the m/c-line in the + answer. At that point, the procedures of this section SHOULD be + restarted by the answerer. This implies that the active candidate + pair, if Valid or Recv-Valid, will be used. If a higher priority + candidate pair subsequently enters the Valid or Recv-Valid state, it + may end up being used as an early candidate.
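The Tlo rule above can be sketched as a small decision function for the answerer. This is a non-normative Python illustration that assumes times in seconds from a monotonic clock and represents candidate pairs by hypothetical string labels; the function name pair_to_send_on is invented for this example. Handling of an updated offer (Section 7.11.2), which restarts these procedures, is left to the caller, for instance by passing a new early_pair and resetting answer_delivered_at.

```python
from typing import Optional

TLO_DEFAULT = 5.0  # seconds; the suggested default for Tlo


def pair_to_send_on(now: float,
                    answer_delivered_at: Optional[float],
                    early_pair: Optional[str],
                    active_pair: str,
                    active_valid: bool,
                    tlo: float = TLO_DEFAULT) -> Optional[str]:
    """Return the pair the answerer may send media on at `now`, or None.

    answer_delivered_at: time at which reliable delivery of the answer was
    confirmed (e.g. receipt of the ACK for an answer carried in a 200 OK);
    None while no confirmation has been received.
    """
    tlo_expired = (answer_delivered_at is not None
                   and now >= answer_delivered_at + tlo)
    if early_pair is not None and not tlo_expired:
        return early_pair   # early media is still permitted
    if active_valid:
        return active_pair  # fall back to a Valid/Recv-Valid active pair
    return None             # nothing usable yet: do not send media
```

The clock only starts once delivery is confirmed, so an answer that is never acknowledged keeps the early pair usable; the sketch does not model any other limit an implementation might add.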
+ + To use a candidate pair, whether it is early or active, media is sent + to the IP addresses and ports of the components in the remote + candidate, and is sent from the IP addresses and ports of + the components in the native candidate. Transport addresses are + paired up based on component ID. For example, if a remote candidate + has two components R1 and R2, and the native candidate has two + components L1 and L2, media packets are sent from L1 to R1 and from + L2 to R2. This provides a property known as symmetry. This + symmetric behavior MUST be followed by an agent even if its peer in + the session doesn't support ICE. + + The definition of sending media "from" a particular transport address + depends on the type of transport address. In the case of a server + reflexive transport address, this means that the RTP packets are sent + from the local transport address used to obtain the STUN address. In + the case of a relayed transport address, this means that media + packets are sent through the relay server (for STUN relays, this + would be using the Send request). For local transport addresses, + media is sent from that local transport address. For peer reflexive + transport addresses, media is sent from the local transport address + used to obtain the reflexive address. + + ICE has interactions with jitter buffer adaptation mechanisms. An + RTP stream can begin using one candidate, and switch to another one. + The newer candidate may result in RTP packets taking a different path + through the network - one with different delay characteristics. To + signal to the jitter buffers that this change has happened, it is + RECOMMENDED that, when an agent switches transmission of media from + one candidate pair to another, it sets the RTP marker bit. + Furthermore, it is RECOMMENDED that, upon receipt of an RTP packet + with the marker bit set, or upon receipt of a packet with a different + source IP address, the agent re-adjust its jitter buffers. 8.
Guidelines for Usage with SIP - SIP [3] makes use of the offer/answer model, and is one of the + SIP [2] makes use of the offer/answer model, and is one of the primary targets for usage of ICE. SIP allows for offer/answer exchanges to occur in many different combinations of messages, including INVITE/200 OK and 200 OK/ACK. When support for reliable - provisional responses (RFC 3262 [13]) and UPDATE (RFC 3311 [28]) are + provisional responses (RFC 3262 [11]) and UPDATE (RFC 3311 [27]) are added, additional combinations of messages that can be used for offer/answer exchanges are added. As such, this section provides some guidance on good ways to make use of SIP with ICE. ICE requires a series of STUN-based connectivity checks to take place - between endpoints, along with an updated offer/answer exchange to use - a validated candidate. These exchanges require time to complete. If - the initial offer/answer exchange were to take place in the INVITE - and 200 OK response respectively, the connectivity checks and updated - offer would all occur after the called party answered. This will - result in a potential increase in the post-pickup delay. This delay - refers to the time between when a user "answers the phone" and when - any speech they utter can be delivered to the caller. + between endpoints. These checks start from the answerer on + generation of its answer, and start from the offerer when it receives + the answer. These checks can take time to complete, and as such, the + selection of messages to use with offers and answers can affect + perceived user latency. Two latency figures are of particular + interest. These are the post-pickup delay and the post-dial delay. + The post-pickup delay refers to the time between when a user "answers + the phone" and when any speech they utter can be delivered to the + caller.
The post-dial delay refers to the time between when a user + enters the destination address for the user, and when ringback begins as a + consequence of having successfully started ringing the phone of the + called party. - To eliminate any increase in post-pickup delay due to ICE, it is - RECOMMENDED that the initial offer/answer exchange take place in an - INVITE and a 18x provisional response. As a consequence, support for - RFC 3262 is RECOMMENDED with ICE. The STUN connectivity checks will - then take place while the called party is being "rung". To deliver - the updated offer prior to the user answering the call, it is - RECOMMENDED that it be delivered with an UPDATE request. This will - allow ICE to have completed prior to the called party even answering - the session invitation. + To reduce post-dial delays, it is RECOMMENDED that the caller begin + gathering candidates prior to actually sending its initial INVITE. + This can be started upon user interface cues that a call is pending, + such as activity on a keypad or the phone going offhook. - If RFC 3262 and RFC 3311 are not supported by both agents, tuning can - still take place to reduce post-pickup delays. In particular, the - answerer SHOULD include its answer in an unreliable 18x response. - RFC 3261 requires that the same answer also be placed in a 200 OK, - which is delivered reliably. However, placing it in a 18x gives the - offerer an early preview of the answer, and allows the connectivity - checks to all occur prior to the user answering the call. However, - the updated offer with the highest priority valid candidate promoted - to the m/c-line cannot occur until after the 200 OK, in which case it - SHOULD be done with a re-INVITE. Fortunately, if the active - candidates in the initial offer/answer exchange end up being valid - anyway, media can flow as soon as the user answers the call (or even - before hand, if early media is needed).
The additional offer/answer - exchange in the re-INVITE would merely improve the situation by using - a higher priority candidate pair. + To reduce post-pickup delays, ICE allows for media to be sent from + the answerer to the offerer on a candidate pair, prior to its + promotion to active. However, this requires the answerer to have + generated its answer and sent it. In most cases, it will require + this answer to be received by the offerer. The reason is that + connectivity checks or RTP packets from the answerer to the offerer + will not be forwarded by NATs towards the offerer until the offerer + has established a permission in the NAT by generating a packet + towards the answerer. - One of the difficulties in including the answer in the 18x, and then - using it for connectivity checks, is that the 18x might be lost. In - such a case, the STUN connectivity check from the answerer to the - offerer (UAS to UAC) will pend indefinitely. To prevent this, it is - RECOMMENDED that a SIP UA retransmit its 18x periodically, using the - same exponential backoff defined in RFC 3262, until such time as a - Binding Response is received for any of the Binding Requests it sent. + For this reason, if an offer is received in an INVITE request, the + UAS SHOULD immediately gather its candidates and then generate an + answer in a provisional response. When reliable provisional + responses are not used, the SDP in the provisional response is not + formally the answer; the value in the 200 OK is the actual answer. + However, RFC 3261 allows for SDP to appear in an unreliable + provisional response, in which case its value has to be identical to + the value placed in the 200 OK. Thus, we refer to the SDP in the + provisional response, even when unreliable, as the answer. To deal + with possible losses of the provisional response, it SHOULD be + retransmitted until some indication of receipt. 
This indication can + either be through PRACK [11], or through the receipt of a STUN + Binding Request with a correct username and password. Furthermore, + once the answer has been sent, the agent SHOULD begin its + connectivity checks. Once a candidate reaches the Valid or Recv- + Valid state, the UAS has a known-valid path for media packets towards + the UAC. This point is called the connected point in ICE. + + Once the UAS reaches the connected point, media can be sent from the + UAS towards the UAC without any additional delays. However, between + the receipt of the INVITE and the connected point, any media that + needs to be sent towards the caller (such as SIP early media [29]) + cannot be transmitted. For this reason, implementations MAY choose + to delay alerting the called party until the connected point is + reached. In the case of a PSTN gateway, this would mean that the + setup message into the PSTN is delayed until the connected point. + Doing this increases the post-dial delay, but has the effect of + eliminating 'ghost rings'. Ghost rings are cases where the called + party hears the phone ring, picks up, but hears nothing and cannot be + heard. This technique works without requiring support for, or usage + of, preconditions [7], since it is a localized decision. It also has + the benefit of guaranteeing that not a single packet of early media + will get clipped. If an agent chooses to delay local alerting in + this way, it SHOULD generate a 180 response once alerting begins. + + A slight variation of this approach is to wait for a connectivity + check to succeed to a higher priority candidate pair than the active + one. This allows for the agent to only ever send media, early or + otherwise, to a single candidate, which will work better with jitter + buffers, at the expense of even greater post-dial delays. + + Note that, prior to the promotion of a candidate pair to active, the + offerer will not be able to send using the candidate pair.
When used + with SIP, if the initial offer is sent in the INVITE, and the answer + is sent in both the provisional and final 200 OK response, the + offerer will not be able to send media until it sends a re-INVITE and + receives the 200 OK response to that re-INVITE. This can take + several hundred milliseconds. If this latency is an issue (it is + generally not considered an issue for voice systems), reliable + provisional responses [11] MAY be used, in which case an UPDATE [27] + can be used to send an updated offer prior to the call being + answered. As discussed in Section 13, offer/answer exchanges SHOULD be secured against eavesdropping and man-in-the-middle attacks. To do that, the - usage of SIPS is RECOMMENDED when used in concert with ICE. + usage of SIPS [2] is RECOMMENDED when used in concert with ICE. 9. Interactions with Forking SIP allows INVITE requests carrying offers to fork, which means that they are delivered to multiple user agents. Each of those user agents then provides an answer to the offer in the INVITE. The result is that a single offer generated by the UAC produces multiple answers. ICE interacts very well with forking. Indeed, ICE fixes some of the @@ -2066,83 +2316,101 @@ correlate that traffic with a particular remote UA. When SIP is used without ICE, the incoming media traffic cannot be disambiguated without an additional offer/answer exchange. 10. Interactions with Preconditions Because ICE involves multiple addresses and pre-session activities, its interactions with preconditions merit further discussion. Quality of Service (QoS) preconditions, which are defined in RFC 3312 - [9] and RFC 4032 [10], apply only to the IP addresses and ports - listed in the m/c lines in an offer/answer. If ICE changes the - address and port where media is received, this change is reflected in - the m/c lines of a new offer/answer.
As such, it appears like any - other re-INVITE would, and is fully treated in RFC 3312 and 4032, - which applies without regard to the fact that the m/c lines are - changing due to ICE negotiations ocurring "in the background". + [7] and RFC 4032 [8], apply only to the IP addresses and ports listed + in the m/c lines in an offer/answer. If ICE changes the address and + port where media is received, this change is reflected in the m/c + lines of a new offer/answer. As such, it appears like any other re- + INVITE would, and is fully treated in RFC 3312 and 4032, which + apply without regard to the fact that the m/c lines are changing + due to ICE negotiations occurring "in the background". + + However, usage of early candidates with QoS preconditions is NOT + RECOMMENDED, since QoS will only be reserved for the candidate pair + in the m/c-line. An agent SHOULD only send to the active candidate + (once it enters the Valid or Recv-Valid state) if QoS preconditions + are used for a media session. ICE also has (purposeful) interactions with connectivity - preconditions [14]. Those interactions are described there. + preconditions [30]. Those interactions are described there. -11. Example +11. Examples - This section provides an example ICE call flow. Two agents, L and R, - are using ICE. Both agents have a single IPv4 interface, and are - configured with a single TURN and single STUN server each (indeed, - the same one for each). As a consequence, each agent will end up - with three candidates - a local candidate, a TURN-derived candidate, - and a STUN-derived candidate. The agents are seeking to communicate - using a single RTP-based voice stream. As a consequence, each - candidate has two components - one for RTP and one for RTCP. Agent L - is behind a symmetric NAT, and agent R is on the public Internet. + This section provides two examples. One is a very basic example, and + the other is more elaborate. A common configuration and setup is + used in both cases.
+ + Two agents, L and R, are using ICE. Both agents have a single IPv4 + interface, and are configured with a single STUN server each (indeed, + the same one for each). This STUN server supports both the Binding + Discovery usage and the Relay usage. Agent L is behind a NAT, and + agent R is on the public Internet. To facilitate understanding, transport addresses are listed in a - mnemonic form. This form is