draft-ietf-mmusic-ice-09.txt   draft-ietf-mmusic-ice-10.txt 
MMUSIC J. Rosenberg MMUSIC J. Rosenberg
Internet-Draft Cisco Systems Internet-Draft Cisco Systems
Expires: December 28, 2006 June 26, 2006 Expires: March 4, 2007 August 31, 2006
Interactive Connectivity Establishment (ICE): A Methodology for Network Interactive Connectivity Establishment (ICE): A Methodology for Network
Address Translator (NAT) Traversal for Offer/Answer Protocols Address Translator (NAT) Traversal for Offer/Answer Protocols
draft-ietf-mmusic-ice-09 draft-ietf-mmusic-ice-10
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 34 skipping to change at page 1, line 34
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on December 28, 2006. This Internet-Draft will expire on March 4, 2007.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (2006). Copyright (C) The Internet Society (2006).
Abstract Abstract
This document describes a protocol for Network Address Translator This document describes a protocol for Network Address Translator
(NAT) traversal for multimedia session signaling protocols based on (NAT) traversal for multimedia session signaling protocols based on
the offer/answer model, such as the Session Initiation Protocol the offer/answer model, such as the Session Initiation Protocol
(SIP). This protocol is called Interactive Connectivity (SIP). This protocol is called Interactive Connectivity
Establishment (ICE). ICE makes use of the Simple Traversal of UDP Establishment (ICE). ICE makes use of the Simple Traversal
through NAT (STUN), applying its binding discovery, connectivity Underneath NAT (STUN) protocol, applying its binding discovery and
check and relay usages. relay usages, in addition to defining a new usage for checking
connectivity between peers.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
2. Overview of ICE . . . . . . . . . . . . . . . . . . . . . . . 4 2. Overview of ICE . . . . . . . . . . . . . . . . . . . . . . . 4
3. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.1. Gathering Candidate Addresses . . . . . . . . . . . . . . 6
4. Sending the Initial Offer . . . . . . . . . . . . . . . . . . 18 2.2. Connectivity Checks . . . . . . . . . . . . . . . . . . . 8
5. Receipt of the Offer and Generation of the Answer . . . . . . 19 2.3. Sorting Candidates . . . . . . . . . . . . . . . . . . . . 10
6. Processing the Answer . . . . . . . . . . . . . . . . . . . . 19 2.4. Frozen Candidates . . . . . . . . . . . . . . . . . . . . 10
7. Common Procedures . . . . . . . . . . . . . . . . . . . . . . 20 2.5. Security for Checks . . . . . . . . . . . . . . . . . . . 11
7.1. Gathering Candidates . . . . . . . . . . . . . . . . . . 20 3. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 11
7.2. Prioritizing the Candidates and Choosing an Operating 4. Sending the Initial Offer . . . . . . . . . . . . . . . . . . 13
One . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 4.1. Gathering Candidates . . . . . . . . . . . . . . . . . . . 13
7.3. Encoding Candidates into SDP . . . . . . . . . . . . . . 27 4.2. Prioritizing Candidates . . . . . . . . . . . . . . . . . 15
7.4. Forming Candidate Pairs . . . . . . . . . . . . . . . . . 31 4.3. Choosing In-Use Candidates . . . . . . . . . . . . . . . . 18
7.5. Ordering the Candidate Pairs . . . . . . . . . . . . . . 33 4.4. Encoding the SDP . . . . . . . . . . . . . . . . . . . . . 19
7.6. Performing the Connectivity Checks . . . . . . . . . . . 36 5. Receiving the Initial Offer . . . . . . . . . . . . . . . . . 20
7.7. Sending a Binding Request for Connectivity Checks . . . . 42 5.1. Verifying ICE Support . . . . . . . . . . . . . . . . . . 20
7.8. Receiving a Binding Request for Connectivity Checks . . . 44 5.2. Gathering Candidates . . . . . . . . . . . . . . . . . . . 21
7.9. Promoting a Candidate to Operating . . . . . . . . . . . 46 5.3. Prioritizing Candidates . . . . . . . . . . . . . . . . . 21
7.10. Learning New Candidates from Connectivity Checks . . . . 47 5.4. Choosing In Use Candidates . . . . . . . . . . . . . . . . 21
7.10.1. On Receipt of a Binding Request . . . . . . . . . . 47 5.5. Encoding the SDP . . . . . . . . . . . . . . . . . . . . . 21
7.10.2. On Receipt of a Binding Response . . . . . . . . . . 51 5.6. Forming the Check List . . . . . . . . . . . . . . . . . . 21
7.11. Subsequent Offer/Answer Exchanges . . . . . . . . . . . . 53 5.7. Performing Periodic Checks . . . . . . . . . . . . . . . . 23
7.11.1. Sending of a Subsequent Offer . . . . . . . . . . . 53 6. Receipt of the Initial Answer . . . . . . . . . . . . . . . . 24
7.11.2. Receiving the Offer and Sending an Answer . . . . . 56 6.1. Verifying ICE Support . . . . . . . . . . . . . . . . . . 24
7.11.3. Receiving the Answer . . . . . . . . . . . . . . . . 59 6.2. Forming the Check List . . . . . . . . . . . . . . . . . . 24
7.12. Binding Keepalives . . . . . . . . . . . . . . . . . . . 59 6.3. Performing Periodic Checks . . . . . . . . . . . . . . . . 24
7.13. Sending Media . . . . . . . . . . . . . . . . . . . . . . 61 7. Connectivity Checks . . . . . . . . . . . . . . . . . . . . . 24
7.14. Receiving Media . . . . . . . . . . . . . . . . . . . . . 63 7.1. Applicability . . . . . . . . . . . . . . . . . . . . . . 24
8. Guidelines for Usage with SIP . . . . . . . . . . . . . . . . 64 7.2. Client Discovery of Server . . . . . . . . . . . . . . . . 25
9. Interactions with Forking . . . . . . . . . . . . . . . . . . 66 7.3. Server Determination of Usage . . . . . . . . . . . . . . 25
10. Interactions with Preconditions . . . . . . . . . . . . . . . 67 7.4. New Requests or Indications . . . . . . . . . . . . . . . 25
11. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 67 7.5. New Attributes . . . . . . . . . . . . . . . . . . . . . . 25
11.1. Basic Example . . . . . . . . . . . . . . . . . . . . . . 68 7.6. New Error Response Codes . . . . . . . . . . . . . . . . . 25
11.2. Advanced Example . . . . . . . . . . . . . . . . . . . . 72 7.7. Client Procedures . . . . . . . . . . . . . . . . . . . . 25
12. Grammar . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 7.7.1. Sending the Request . . . . . . . . . . . . . . . . . 25
13. Security Considerations . . . . . . . . . . . . . . . . . . . 95 7.7.2. Processing the Response . . . . . . . . . . . . . . . 26
13.1. Attacks on Connectivity Checks . . . . . . . . . . . . . 95 7.8. Server Procedures . . . . . . . . . . . . . . . . . . . . 27
13.2. Attacks on Address Gathering . . . . . . . . . . . . . . 98 7.9. Security Considerations for Connectivity Check . . . . . . 29
13.3. Attacks on the Offer/Answer Exchanges . . . . . . . . . . 99 8. Completing the ICE Checks . . . . . . . . . . . . . . . . . . 29
13.4. Insider Attacks . . . . . . . . . . . . . . . . . . . . . 99 9. Subsequent Offer/Answer Exchanges . . . . . . . . . . . . . . 30
13.4.1. The Voice Hammer Attack . . . . . . . . . . . . . . 99 9.1. Generating the Offer . . . . . . . . . . . . . . . . . . . 30
13.4.2. STUN Amplification Attack . . . . . . . . . . . . . 99 9.2. Receiving the Offer and Generating an Answer . . . . . . . 31
14. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 100 9.3. Updating the Check and Valid Lists . . . . . . . . . . . . 32
14.1. candidate Attribute . . . . . . . . . . . . . . . . . . . 100 10. Keepalives . . . . . . . . . . . . . . . . . . . . . . . . . . 33
14.2. remote-candidate Attribute . . . . . . . . . . . . . . . 100 11. Media Handling . . . . . . . . . . . . . . . . . . . . . . . . 34
14.3. ice-pwd Attribute . . . . . . . . . . . . . . . . . . . . 101 11.1. Sending Media . . . . . . . . . . . . . . . . . . . . . . 34
15. IAB Considerations . . . . . . . . . . . . . . . . . . . . . 101 11.2. Receiving Media . . . . . . . . . . . . . . . . . . . . . 35
15.1. Problem Definition . . . . . . . . . . . . . . . . . . . 102 12. Usage with SIP . . . . . . . . . . . . . . . . . . . . . . . . 35
15.2. Exit Strategy . . . . . . . . . . . . . . . . . . . . . . 102 12.1. Latency Guidelines . . . . . . . . . . . . . . . . . . . . 35
15.3. Brittleness Introduced by ICE . . . . . . . . . . . . . . 103 12.2. Interactions with Forking . . . . . . . . . . . . . . . . 37
15.4. Requirements for a Long Term Solution . . . . . . . . . . 104 12.3. Interactions with Preconditions . . . . . . . . . . . . . 37
15.5. Issues with Existing NAPT Boxes . . . . . . . . . . . . . 104 12.4. Interactions with Third Party Call Control . . . . . . . . 38
16. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 104 13. Grammar . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
17. References . . . . . . . . . . . . . . . . . . . . . . . . . 105 14. Example . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
17.1. Normative References . . . . . . . . . . . . . . . . . . 105 15. Security Considerations . . . . . . . . . . . . . . . . . . . 46
17.2. Informative References . . . . . . . . . . . . . . . . . 106 15.1. Attacks on Connectivity Checks . . . . . . . . . . . . . . 46
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 108 15.2. Attacks on Address Gathering . . . . . . . . . . . . . . . 49
Intellectual Property and Copyright Statements . . . . . . . . . 109 15.3. Attacks on the Offer/Answer Exchanges . . . . . . . . . . 49
15.4. Insider Attacks . . . . . . . . . . . . . . . . . . . . . 50
15.4.1. The Voice Hammer Attack . . . . . . . . . . . . . . . 50
15.4.2. STUN Amplification Attack . . . . . . . . . . . . . . 50
16. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 51
16.1. candidate Attribute . . . . . . . . . . . . . . . . . . . 51
16.2. remote-candidates Attribute . . . . . . . . . . . . . . . 51
16.3. ice-pwd Attribute . . . . . . . . . . . . . . . . . . . . 52
16.4. ice-ufrag Attribute . . . . . . . . . . . . . . . . . . . 52
17. IAB Considerations . . . . . . . . . . . . . . . . . . . . . . 53
17.1. Problem Definition . . . . . . . . . . . . . . . . . . . . 53
17.2. Exit Strategy . . . . . . . . . . . . . . . . . . . . . . 53
17.3. Brittleness Introduced by ICE . . . . . . . . . . . . . . 54
17.4. Requirements for a Long Term Solution . . . . . . . . . . 55
17.5. Issues with Existing NAPT Boxes . . . . . . . . . . . . . 55
18. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 56
19. References . . . . . . . . . . . . . . . . . . . . . . . . . . 56
19.1. Normative References . . . . . . . . . . . . . . . . . . . 56
19.2. Informative References . . . . . . . . . . . . . . . . . . 57
Appendix A. Design Motivations . . . . . . . . . . . . . . . . . 58
A.1. Applicability to Gateways and Servers . . . . . . . . . . 59
A.2. Pacing of STUN Transactions . . . . . . . . . . . . . . . 60
A.3. Candidates with Multiple Bases . . . . . . . . . . . . . . 61
A.4. Purpose of the Translation . . . . . . . . . . . . . . . . 63
A.5. Importance of the STUN Username . . . . . . . . . . . . . 63
A.6. The Candidate Pair Sequence Number Formula . . . . . . . . 64
A.7. The Frozen State . . . . . . . . . . . . . . . . . . . . . 65
A.8. The remote-candidates attribute . . . . . . . . . . . . . 65
A.9. Why are Keepalives Needed? . . . . . . . . . . . . . . . . 66
A.10. Why Prefer Peer Reflexive Candidates? . . . . . . . . . . 67
A.11. Why Can't Offerers Send Media When a Pair Validates . . . 67
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 69
Intellectual Property and Copyright Statements . . . . . . . . . . 70
1. Introduction 1. Introduction
RFC 3264 [4] defines a two-phase exchange of Session Description RFC 3264 [4] defines a two-phase exchange of Session Description
Protocol (SDP) messages [5] for the purposes of establishment of Protocol (SDP) messages [10] for the purposes of establishment of
multimedia sessions. This offer/answer mechanism is used by multimedia sessions. This offer/answer mechanism is used by
protocols such as the Session Initiation Protocol (SIP) [2]. protocols such as the Session Initiation Protocol (SIP) [3].
Protocols using offer/answer are difficult to operate through Network Protocols using offer/answer are difficult to operate through Network
Address Translators (NAT). Because their purpose is to establish a Address Translators (NAT). Because their purpose is to establish a
flow of media packets, they tend to carry IP addresses within their flow of media packets, they tend to carry IP addresses within their
messages, which is known to be problematic through NAT [15]. The messages, which is known to be problematic through NAT [14]. The
protocols also seek to create a media flow directly between protocols also seek to create a media flow directly between
participants, so that there is no application layer intermediary participants, so that there is no application layer intermediary
between them. This is done to reduce media latency, decrease packet between them. This is done to reduce media latency, decrease packet
loss, and reduce the operational costs of deploying the application. loss, and reduce the operational costs of deploying the application.
However, this is difficult to accomplish through NAT. A full However, this is difficult to accomplish through NAT. A full
treatment of the reasons for this is beyond the scope of this treatment of the reasons for this is beyond the scope of this
specification. specification.
Numerous solutions have been proposed for allowing these protocols to Numerous solutions have been proposed for allowing these protocols to
operate through NAT. These include Application Layer Gateways operate through NAT. These include Application Layer Gateways
(ALGs), the Middlebox Control Protocol [17], Simple Traversal of UDP (ALGs), the Middlebox Control Protocol [15], Simple Traversal
through NAT (STUN) [14] and its revision [12], the STUN Relay Usage Underneath NAT (STUN) [13] and its revision [11], the STUN Relay
[13], and Realm Specific IP [18] [19] along with session description Usage [12], and Realm Specific IP [17] [18] along with session
extensions needed to make them work, such as the Session Description description extensions needed to make them work, such as the Session
Protocol (SDP) [5] attribute for the Real Time Control Protocol Description Protocol (SDP) [10] attribute for the Real Time Control
(RTCP) [1]. Unfortunately, these techniques all have pros and cons Protocol (RTCP) [2]. Unfortunately, these techniques all have pros
which make each one optimal in some network topologies, but a poor and cons which make each one optimal in some network topologies, but
choice in others. The result is that administrators and implementors a poor choice in others. The result is that administrators and
are making assumptions about the topologies of the networks in which implementors are making assumptions about the topologies of the
their solutions will be deployed. This introduces complexity and networks in which their solutions will be deployed. This introduces
brittleness into the system. What is needed is a single solution complexity and brittleness into the system. What is needed is a
which is flexible enough to work well in all situations. single solution which is flexible enough to work well in all
situations.
This specification provides that solution for media streams This specification provides that solution for media streams
established by signaling protocols based on the offer-answer model. established by signaling protocols based on the offer-answer model.
It is called Interactive Connectivity Establishment, or ICE. ICE It is called Interactive Connectivity Establishment, or ICE. ICE
makes use of STUN and its relay extension, commonly called TURN, but makes use of STUN and its relay extension, commonly called TURN, but
uses them in a specific methodology which avoids many of the pitfalls uses them in a specific methodology which avoids many of the pitfalls
of using any one alone. of using any one alone.
2. Overview of ICE 2. Overview of ICE
A typical architecture for an ICE deployment is shown in Figure 1. In a typical ICE deployment, we have two endpoints (known as agents
The figure shows two endpoints (known as agents in RFC 3264 in RFC 3264 terminology) which want to communicate. They are able to
terminology) which we call L and R (for left and right, which helps communicate indirectly via some signaling system such as SIP, by
visualize call flows). Both L and R are behind a NAT. The type of which they can perform an offer/answer exchange of SDP [4] messages.
NAT and its properties are unknown. Indeed, it is not known whether Note that ICE is not intended for NAT traversal for SIP, which is
the agent is behind a NAT at all, or whether there are multiple NATs assumed to be provided via some other mechanism [31]. At the
between it and the network. Agents L and R are capable of engaging beginning of the ICE process, the agents are ignorant of their own
in an offer/answer exchange [4] by which they can exchange SDP topologies. In particular, they might or might not be behind a NAT
messages, whose purpose is to set up a media session between L and R. (or multiple tiers of NATs). ICE allows the agents to discover
Of course, the offer/answer exchange itself must be capable of enough information about their topologies to find a path or paths by
traversing the NAT. Such traversal is facilitated through signaling which they can communicate.
elements such as SIP servers, and is outside the scope of this
specification. Different solutions are applied for traversal of the Figure 1 shows a typical environment for ICE deployment. The
signaling that carries the offer/answer exchange, and for the media two endpoints are labelled L and R (for left and right, which helps
set up by that offer/answer exchange. This is because of the vastly visualize call flows). Both L and R are behind NATs -- though as
different requirements on latency, packet loss, and overall bandwidth mentioned before, they don't know that. The type of NAT and its
between the signaling and media. For example, usage of a signaling properties are also unknown. Agents L and R are capable of engaging
intermediary, such as a SIP proxy, as a relay for all signaling at in an offer/answer exchange by which they can exchange SDP messages,
all times, is acceptable, whereas usage of relays at all times for whose purpose is to set up a media session between L and R.
media is highly undesirable. Typically, this exchange will occur through a SIP server.
In addition to the agents, a SIP server and NATs, ICE is typically In addition to the agents, a SIP server and NATs, ICE is typically
used in concert with STUN servers in the network. Each agent can used in concert with STUN servers in the network. Each agent can
have its own STUN server, or they can be the same. have its own STUN server, or they can be the same.
+-------+ +-------+
| SIP | | SIP |
+-------+ | Srvr | +-------+ +-------+ | Srvr | +-------+
| STUN | | | | STUN | | STUN | | | | STUN |
| Srvr | +-------+ | Srvr | | Srvr | +-------+ | Srvr |
| | | | | | / \ | |
+-------+ +-------+ +-------+ / \ +-------+
/ \
/ \
/ \
/ \
/ <- Signalling -> \
/ \
/ \
+--------+ +--------+ +--------+ +--------+
| NAT | | NAT | | NAT | | NAT |
+--------+ +--------+ +--------+ +--------+
/ \
/ \
/ \
+-------+ +-------+ +-------+ +-------+
| Agent | | Agent | | Agent | | Agent |
| L | | R | | L | | R |
| | | | | | | |
+-------+ +-------+ +-------+ +-------+
Figure 1 Figure 1
Prior to initiating an offer, the offering agent (L in this example) The basic idea behind ICE is as follows: each agent has a variety of
starts by performing a process known as address gathering. This candidate transport addresses it could use to communicate with the
process allows the client to obtain one or more transport addresses, other agent. These might include:
one or more of which might be viable addresses at which the agent can
receive incoming media packets from the other agent, which we call
its peer. A transport address is just the combination of an IP
address and port. With ICE, an agent will actually provide its peer
with all of its possible transport addresses, and ICE will figure out
which one to actually use.
Naturally, one viable transport address is one obtained directly from o Its directly attached network interface (or interfaces in the
a local interface the client has towards the network. Such a case of a multihomed machine)
transport address is called a local transport address. The local
interface could be one on a local layer 2 network technology, such as
ethernet or WiFi, or it could be one that is obtained through a
tunnel mechanism, such as a Virtual Private Network (VPN) or Mobile
IP (MIP). In all cases, these appear to the agent as a local
interface from which ports (and thus transport addresses) can be
allocated.
If an agent is multihomed, it can obtain a transport address from o A translated address on the public side of a NAT (a "server
each interface. Depending on the location of the peer on the IP reflexive" address)
network, the agent may be reachable through one of those interfaces,
or through another. Consider, for example, an agent which has a o The address of a media relay the agent is using.
local interface to a private net 10 network, and also to the public
Internet. A transport address from the net10 interface will be Potentially, any of L's candidate transport addresses can be used to
directly reachable when communicating with a peer on the same private communicate with any of R's transport addresses. In practice,
net 10 network, while a transport address from the public interface however, many combinations will not work. For instance, if L and R
are both behind NATs then their directly attached interface addresses are
unlikely to be able to communicate directly (this is why ICE is
needed, after all!). The purpose of ICE is to discover which pairs
of addresses will work. The way that ICE does this is to
systematically try all possible pairs (in a carefully sorted order)
until it finds one or more that works.
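The pair-by-pair search described above can be sketched as follows. This is only an illustration: the candidate labels, the `works` predicate, and the simple nested ordering are assumptions made for the example, not the sorting rules or the STUN-based check that the specification defines later.

```python
from itertools import product

def working_pairs(local, remote, works):
    """Try every (local, remote) pair in list order and keep those that work."""
    return [(l, r) for l, r in product(local, remote) if works(l, r)]

# Hypothetical candidate lists, assumed to be already sorted by preference.
L_CANDIDATES = ["L-host", "L-srflx", "L-relay"]
R_CANDIDATES = ["R-host", "R-srflx", "R-relay"]

def works(l, r):
    # Assumed reachability for this sketch: both agents sit behind different
    # NATs, so any pair involving a directly attached (host) address fails.
    return not (l.endswith("host") or r.endswith("host"))

pairs = working_pairs(L_CANDIDATES, R_CANDIDATES, works)
print(pairs[0])
```

In a real agent the predicate would be replaced by an actual connectivity check, and the iteration order would follow the pair ordering rules rather than list position.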
2.1. Gathering Candidate Addresses
In order to execute ICE, an agent has to identify all of its address
candidates. Naturally, one viable candidate is one obtained directly
from a local interface the client has towards the network. Such a
candidate is called a HOST CANDIDATE. The local interface could be
one on a local layer 2 network technology, such as ethernet or WiFi,
or it could be one that is obtained through a tunnel mechanism, such
as a Virtual Private Network (VPN) or Mobile IP (MIP). In all cases,
these appear to the agent as a local interface from which ports (and
thus a candidate) can be allocated.
If an agent is multihomed, it can obtain a candidate from each
interface. Depending on the location of the peer on the IP network
relative to the agent, the agent may be reachable by the peer through
one of those interfaces, or through another. Consider, for example,
an agent which has a local interface to a private net 10 network, and
also to the public Internet. A candidate from the net10 interface
will be directly reachable when communicating with a peer on the same
private net 10 network, while a candidate from the public interface
will be directly reachable when communicating with a peer on the will be directly reachable when communicating with a peer on the
public Internet. Rather than trying to guess which interface will public Internet. Rather than trying to guess which interface will
work prior to sending an offer, the offering agent includes both work prior to sending an offer, the offering agent includes both
transport addresses in its offer. candidates in its offer.
Indeed, when using a media technology like the Real Time Transport
Protocol (RTP), an agent needs two transport addresses on each
interface - one for the RTP, and one for the Real Time Control
Protocol (RTCP). Other media technologies may require a multiplicity
of transport addresses to be used and treated as a bundle. Each of
these transport addresses is called a component. There are two
components in an RTP stream - the RTP itself, and the RTCP. In ICE,
the set of transport addresses that represent an atomic grouping on
which communications is possible is called a candidate. In the
example so far, the agent would obtain two candidates - one from the
net 10 interface, and one from the interface on the public Internet.
Each candidate would contain two transport addresses, corresponding
to each of the two components.
Once the agent has obtained local transport addresses, it uses STUN Once the agent has obtained host candidates, it uses STUN to obtain
to obtain additional transport addresses. To do this, it would send additional candidates. These come in two flavors: translated
a STUN Binding Request, using the Binding Discovery Usage [12] or the addresses on the public side of a NAT (SERVER REFLEXIVE CANDIDATES)
Relay Usage [13] from a local transport address, to its STUN server. and addresses of media relays (RELAYED CANDIDATES). The relationship
It is assumed that the address of the STUN server is configured, or of these candidates to the host candidate is shown in Figure 2. Both
learned in some way. Indeed, an agent might even have multiple STUN types of candidates are discovered using STUN.
servers. As a consequence of communicating with the STUN server, the
agent can learn potentially two new types of transport addresses -
server reflexive transport addresses and relayed transport addresses.
The relationship of these addresses to the local transport address is
shown in Figure 2.
To Internet To Internet
| |
| |
| /------------ Relayed | /------------ Relayed
| / Address | / Candidate
+--------+ +--------+
| | | |
| STUN | | STUN |
| Server | | Server |
| | | |
+--------+ +--------+
| |
| |
| /------------ Server | /------------ Server
|/ Reflexive |/ Reflexive
+------------+ Address +------------+ Candidate
| NAT | | NAT |
+------------+ +------------+
| |
| /------------ Local | /------------ Host
|/ Address |/ Candidate
+--------+ +--------+
| | | |
| Agent | | Agent |
| | | |
+--------+ +--------+
Figure 2 Figure 2
The local transport address is resident on the agent itself. Through To find a server reflexive candidate, the agent sends a STUN Binding
either the Binding Discovery Usage or the Relay Usage, the agent can Request, using the Binding Discovery Usage [11] from each host
discover its server reflexive transport address. This is the address candidate, to its STUN server. (It is assumed that the address of
on the public side of the NAT, facing the STUN server. It is the the STUN server is configured, or learned in some way.) When the
transport address allocated to the agent on the public side of the agent sends the Binding Request, the NAT (assuming there is one)
NAT as a consequence of the transmission of the STUN request through will allocate a binding, mapping this server reflexive candidate to
the NAT, to the STUN server. The NAT will allocate a binding, the host candidate. Outgoing packets sent from the host candidate
mapping this server reflexive transport address to the local will be translated by the NAT to the server reflexive candidate.
transport address. Packets received at the NAT, targeted towards the Incoming packets sent to the server reflexive candidate will be
server reflexive transport address, will have their destination translated by the NAT to the host candidate and forwarded to the
address rewritten to the local transport address by the NAT, and then agent. We call the host candidate associated with a given server
be forwarded to the agent. When there are multiple NATs between the reflexive candidate the BASE.
agent and the STUN server, the STUN request will create a binding on
each NAT, but only the outermost server reflexive transport address
will be discovered by the agent.
In addition, through the Relay Usage, the agent can request that the
STUN server itself allocate a transport address from one of its local
interfaces, and establish a binding that maps that transport address
(called a relayed transport address, naturally) towards the source
transport address of the STUN request, which will actually be equal
to the server reflexive transport address allocated by the outermost
NAT. Consequently, packets sent to the relayed transport address
will be routed by the IP network towards the STUN server. The STUN
server will receive them, rewrite the destination address to be equal
to the server reflexive transport address, and forward them. They
will then arrive at the NAT, where the destination address is
rewritten once again, and the packet is finally forwarded to the agent at
its local address.
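The two address rewrites described above (relay to NAT, then NAT to agent) can be modeled with a small sketch. The `Nat` and `Relay` classes, the port numbers, and the documentation-range addresses are all hypothetical; real NATs and relays also apply filtering and lifetime rules that are ignored here.

```python
class Nat:
    """Toy NAT: maps a public (ip, port) to an internal (ip, port)."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.bindings = {}  # public address -> internal address

    def outbound(self, src):
        """Return the public address for src, allocating a binding if needed."""
        for public, internal in self.bindings.items():
            if internal == src:
                return public
        public = (self.public_ip, self.next_port)
        self.next_port += 1
        self.bindings[public] = src
        return public

    def inbound(self, dst):
        """Rewrite a public destination back to the internal address."""
        return self.bindings[dst]


class Relay:
    """Toy relay: maps an allocated address to the client's source address."""

    def __init__(self, ip, first_port=50000):
        self.ip = ip
        self.next_port = first_port
        self.allocations = {}  # relayed address -> address the request came from

    def allocate(self, request_src):
        relayed = (self.ip, self.next_port)
        self.next_port += 1
        self.allocations[relayed] = request_src
        return relayed

    def forward(self, dst):
        """Rewrite a relayed destination to the address seen by the relay."""
        return self.allocations[dst]


host = ("10.0.1.1", 5000)                 # address on the agent itself
nat = Nat("192.0.2.1")
server_reflexive = nat.outbound(host)     # what the server sees as the source
relay = Relay("198.51.100.7")
relayed = relay.allocate(server_reflexive)

# A packet sent to the relayed address is rewritten twice on its way in.
delivered_to = nat.inbound(relay.forward(relayed))
print(server_reflexive, relayed, delivered_to)
```

The sketch shows why the relayed address ultimately maps back to the address on the agent: each hop only undoes the mapping it created.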
Since the server reflexive transport addresses and relayed transport Note
addresses are obtained from a local transport address, they are said
to be derived transport addresses, since they are derived from (and
ultimately map to) their associated local transport address. During
the process of address gathering, the agent will obtain as many
transport addresses of a given type as are needed for the media
session. For example, with RTP, two transport addresses are needed
for a candidate. The agent will obtain two server reflexive
transport addresses (each derived from a local transport address),
and they would be used to constitute a server reflexive candidate.
The local transport addresses make up a local candidate, and the
relayed transport addresses make up a relayed candidate.
Server Server "Base" refers to the address you'd send from for a particular
Reflexive Reflexive candidate. Thus, as a degenerate case host candidates also have a
Candidate Candidate base, but it's the same as the host candidate.
.............. ..............
. . . .
. +-+ +-+ . . +-+ +-+ .
. | | | | . . | | | | .
. +-+ +-+ . . +-+ +-+ .
. ^ ^ . . ^ ^ .
....|....|.... ....|....|....
| | | |
| | | |
....|....|.... ....|....|....
. | | . . | | .
. +-+ +-+ . Local . +-+ +-+ . Local
. | | | | . Candidate . | | | | . Candidate
. +-+ +-+ . . +-+ +-+ .
. | | . . | | .
....|....|.... ....|....|....
| | | |
| | | |
....|....|.... ....|....|....
. V V . . V V .
. +-+ +-+ . . +-+ +-+ .
. | | | | . . | | | | .
. +-+ +-+ . . +-+ +-+ .
. . . .
.............. ..............
Relayed Relayed
Candidate Candidate
Legend When there are multiple NATs between the agent and the STUN server,
------ the STUN request will create a binding on each NAT, but only the
outermost server reflexive candidate will be discovered by the agent.
If the agent is not behind a NAT, then the base candidate will be the
same as the server reflexive candidate and the server reflexive
candidate can be ignored.
+-+ The final type of candidate is a RELAYED candidate. The STUN Relay
| | Transport Address Usage [12] allows a STUN server to act as a media relay, forwarding
+-+ traffic between L and R. In order to send traffic to L, R sends
traffic to the media relay which forwards it to L and vice versa.
The same thing happens in the other direction.
---> Derived From Traffic from L to R has its addresses rewritten twice: first by the
NAT and second by the STUN relay server. Thus, the address that R
knows about and the one that it wants to send to is the one on the
STUN relay server. This address is the final kind of candidate,
which we call a RELAYED CANDIDATE.
... 2.2. Connectivity Checks
. . Candidate
...
Figure 3 Once L has gathered all of its candidates, it orders them highest to
The relationship between these various transport addresses and lowest priority and sends them to R over the signalling channel. The
candidates is shown pictorially in Figure 3. The figure shows our candidates are carried in attributes in the SDP offer. When R
example agent with two local interfaces, each of which provides two receives the offer, it performs the same gathering process and
transport address pairs to make up two candidates. From those two responds with its own list of candidates. At the end of this
local candidates, a server reflexive and relayed candidate are process, each agent has a complete list of both its candidates and
derived. its peer's candidates and is ready to perform connectivity checks by
pairing up the candidates to see which pair works.
Once the agent has completed gathering its candidates, it assigns The basic principle of the connectivity checks is simple:
each a candidate identifier, called the candidate ID. The candidate
ID is a random number used to uniquely identify each candidate, and
is used in the connectivity checks discussed below. The components
of each candidate are ordered numerically, starting at one, such that
each transport address has a component ID. For example, in an RTP
candidate there are two components, component ID 1 and component ID
2. Each transport address pair is therefore uniquely identified by a
combination of its candidate ID and its component ID. The
combination of the two is called, unsurprisingly, a transport address
ID, or tid for short.
The agent will place all of its candidates in an offer, using a new 1. Sort the candidate pairs in priority order.
SDP attribute called the candidate attribute. This attribute
contains the actual transport address, the candidate ID and component
ID, and a q-value. The q-value is used for the agent to prioritize
its candidates. An agent will typically prefer to receive media at
particular candidates over other candidates, based on local policy.
For example, an agent would normally prefer to receive interactive
voice RTP packets at its local candidate as opposed to its relayed
candidate, due to the extra latency incurred by traveling through the
relay.
The candidate attribute will also include an indicator of the type of 2. Send checks on each candidate pair in priority order.
candidate (server reflexive, local, relayed), and its related
transport address. For server reflexive transport addresses, the
related transport address is the local transport address from which
it was derived. For relayed transport addresses, the related
transport address is the server reflexive address towards the relay.
The related transport address for reflexive candidates is used by the
ICE algorithm itself, as explained below. For relayed candidates,
the related transport address is not used by ICE directly; it is
useful for diagnostic purposes and for Quality of Service mechanisms
that require knowledge of addresses closer to the agent.
Finally, the agent chooses one of its candidates for inclusion in the 3. Acknowledge checks received from the other agent.
m and c lines (called the m/c-line collectively). Assuming that
candidate is verified as functional by the ICE connectivity checks
described below, this is the actual IP address and port to which
media will be sent. The candidate selected for inclusion in the m/c-
line of an offer (or an answer) is called the operating candidate,
since it is the one that is the in-use destination for receipt of
media traffic.
Once the operating candidate is chosen, the agent sends the offer. A complete connectivity check for a single candidate pair is a simple
Through the wonders of SIP or other signaling protocols, this offer 4-message handshake:
is delivered to the peer, which must now select its answer. To
create the answer, the agent starts by gathering addresses, in
exactly the same way the offerer did. It includes those as
candidates in its answer, and selects one as the operating candidate,
just like the offerer did. It then sends the answer.
Each agent then pairs up each of its candidates with the candidates A B
of its peer. From the perspective of the offerer, the set of - -
candidates it sent in its offer are called its native candidates, and STUN request -> \ A's
the ones received in the answer are the remote candidates. <- STUN response / check
Similarly, from the perspective of the answerer, the set of
candidates it sent in its answer are the native candidates, and the
ones received in the offer are the remote candidates. Both agents
pair up each of their native candidates with each of the remote
candidates, producing a set of candidate pairs. If there were N
native candidates and M remote candidates, there will be N*M
candidate pairs. Within each candidate pair, the transport addresses
themselves are paired up one for one, resulting in transport address
pairs as well. The transport addresses are paired up such that they
have identical component IDs. Each transport address pair has an ID,
called the transport address pair ID, formed by concatenating the
transport address IDs of its two transport addresses.
Once the pairing is done, the transport address pairs are ordered in <- STUN request \ B's
such a way that both the offerer and answerer will end up with the STUN response -> / check
same order. This ordering is done by using the q-values each side
provided, along with the candidate IDs to help break ties. Then,
each side begins a process known as connectivity checks.
Connectivity checks are STUN transactions, using the connectivity
check usage of STUN, sent from the native transport address to the
remote transport address of a particular transport address pair. If
an agent sends a STUN request and gets a successful response, the
transport address pair is said to be Receive Valid, or Recv Valid for
short, since the agent knows that its peer was able to receive a
packet. If an agent receives a request and sends a response, the
transport address pair is said to be Send Valid, since the agent
knows that its peer was able to send it a packet. When transactions
in both directions complete, the transport address pair is said to be
Valid. The idea behind ICE is that if a transport address pair is
valid, it means that agents were able to successfully exchange IP
packets in both directions. Consequently, any media packets, which
are sent to and from exactly the same IP addresses and ports, should
also work, since they don't differ in their IP addresses or ports.
It's important to point out that, when used with ICE, an agent will Figure 3
always send and receive media on the same transport address. That
is, if an agent includes a transport address of 192.0.2.1:2444
(meaning an IP address of 192.0.2.1 and port of 2444) in its SDP for
receiving RTP packets (and also STUN connectivity check), it will not
only receive STUN requests and RTP packets on this transport address,
it will also send STUN requests and RTP packets from this transport
address. This property, known as symmetric RTP, is essential for
proper operation of ICE. Peer reflexive transport addresses,
discussed further below, will generally only work when symmetric RTP
is used. Symmetric RTP is also key for keeping NAT bindings alive.
Since there can be quite a few transport address pairs to check, As an optimization, as soon as B gets A's check message he
performing all of the connectivity checks in parallel can cause immediately sends his own check message to A on the same candidate
substantial load on the network. Instead, each agent will start at pair. This accelerates the process of finding a valid candidate.
the top of the ordered list they each created, and every 50ms, begin
a new connectivity check.
In order to successfully process a STUN connectivity check, an agent At the end of this handshake, both A and B know that they can send
must be able to correlate the STUN request or response with the (and receive) messages end-to-end in both directions. Note that as
transport address pair whose connectivity the STUN message is meant soon as B receives A's STUN response it knows that the B->A path
to validate. To perform this correlation, the STUN connectivity works and it can start sending media on that path right away, as
checks contain a USERNAME attribute formed in a special way. In shown below. This allows for 'early media' to flow as fast as
particular, the USERNAME contains the actual transport address pair possible:
ID, which, as described above, is formed by concatenating the
transport address IDs of each of the candidates. The USERNAME is
used in conjunction with an authentication and message integrity
operation on the STUN message that requires a password. This
password is conveyed in the offer/answer exchange, and is a random
number valid only for the duration of the media session. This
ensures that, if the signaling channel carrying the offer/answer
exchange is secure, the agent can be certain that its STUN
connectivity checks are taking place with the agent which responded
to the signaling.
Because each agent is receiving STUN requests on the same IP address A B
and port that media will later be sent to, each agent is effectively - -
acting as its own mini STUN server, implementing the connectivity STUN request -> \ A's
check usage described in [12]. Like all STUN servers, when the agent <- STUN response / check
sends a STUN response to a request, the response includes the XOR-
MAPPED-ADDRESS attribute that contains the source IP address and port
that the request came from. In certain deployment scenarios, and in
particular where one of the agents is behind a NAT whose address and
port mapping properties are address and port dependent [32], this
source IP address and port may differ from the server reflexive ones
allocated by the peer during the address gathering phase. This
source IP address and port, conveyed in the XOR-MAPPED-ADDRESS
attribute of the STUN response, therefore constitutes a new transport
address, called a peer reflexive transport address, which can be used
for communications.
+-------+ <- STUN request \ B's
| STUN | STUN response -> / check
| Srvr | <- RTP Data
| |
+-------+
^
|
|
|
|
+--------------------------+ |
| NAT-2| |NAT-1
| +-----------+
| | APD NAT |
| +-----------+
| | |
| \ |
VL1 \|R1
+-------+ +-------+
| Agent | | Agent |
| L | | R |
| | | |
+-------+ +-------+
Figure 4 Figure 4
Consider the example of Figure 4. The agent on the left, agent L, Once any connectivity check for a candidate for a given media
has a single interface and is not behind a NAT. Consequently, it component succeeds, ICE uses that candidate and immediately abandons
ends up with a single candidate with a single transport address all other connectivity checks for that component. Note that due to
(normally two for RTP, but we'll consider just one for ease of race conditions and packet loss, this may mean that the "best"
explanation), transport address L1. It sends an offer to agent R, candidate isn't selected, but it does guarantee the selection of a
which is behind one of these Address and Port Dependent (APD) mapping candidate that works, and because of the sorting process it will
NATs. Agent R has a local transport address R1, and obtains a server generally be one of the most preferred ones.
reflexive transport address from its STUN server, transport address
NAT-1. Now, when agent R sends a connectivity check from its local
transport address (R1) to L's local transport address (L1), this
check will traverse the NAT. The connectivity check itself will
create a new mapping in the NAT and be allocated a new binding on the
NAT - NAT-2. This STUN request arrives at L, which generates a STUN
response containing transport address NAT-2. Agent R, noticing that
this is not the same as its other two transport addresses, treats
this as a new peer reflexive transport address.
This new peer reflexive transport address is paired up with the 2.3. Sorting Candidates
remote transport address containing the STUN server from which that
transport address was learned (transport address L1 in the example
above). This becomes a new transport address pair, and connectivity
checks are run on it as well.
Once all of the transport address pairs in a candidate pair have been Because the algorithm above searches all candidate pairs, if a
validated, that candidate pair is ready to be used. Media starts working pair exists it will eventually find it no matter what order
being sent on it immediately, and the offerer will send an updated the candidates are tried in. In order to produce faster (and better)
offer, now containing the agent's half of the validated candidate pair results, the candidates are sorted in a specified order. The
in the m/c-line. This is called "promoting a candidate to algorithm is described in Section 4.2 but follows two general
operating". The updated offer only contains a single candidate principles:
attribute - the one for the operating candidate. It also contains an
attribute, called the remote-candidate attribute, which tells the
answerer the remote candidate in the validated candidate pair. The
answerer uses this attribute, along with its own view on the states
of the candidate pairs, to place a candidate in the m/c-line and
populate the candidate attributes in its answer.
It is important to understand that, when ICE is in use, media is not o Each agent gives its candidates a numeric priority which is sent
sent to a candidate without validation, even if that candidate along with the candidate to the peer
appears in the m/c-line. This is in order to avoid denial-of-service
attacks. In particular, without ICE, an offerer can send an offer to
another agent, and list the IP address and port of a target in the
offer. If the agent is an automaton that answers a call
automatically, it will do so and then proceed to send media to the
target. This provides substantial packet amplification. ICE fixes
this by requiring that an agent never send media packets unless it
has sent a STUN message towards the target of the RTP packets, and
received a reply from that target. See Section 7.13 for details.
A summary of this overall behavior is shown in the basic call flow in o The local and remote priorities are combined so that each agent
Figure 5. has the same ordering for the candidate pairs.
Agent A STUN Servers Agent B The second property is important for getting ICE to work when there
|(1) Gather Addresses | | are NATs in front of A and B. Frequently, NATs will not allow packets
|-------------------->| | in from a host until the agent behind the NAT has sent a packet
|(2) Offer | | towards that host. Consequently, ICE checks in each direction will
|------------------------------------------>| not succeed until both sides have sent a check through their
| |(3) Gather Addresses | respective NATs.
| |<--------------------|
|(4) Answer | |
|<------------------------------------------|
|(5) STUN Check | |
|<------------------------------------------|
|(6) STUN Check | |
|------------------------------------------>|
|(7) Media | |
|<------------------------------------------|
|(8) Media | |
|------------------------------------------>|
|(9) Offer | |
|------------------------------------------>|
|(10) Answer | |
|<------------------------------------------|
Figure 5 In general the priority algorithm is designed so that candidates of
similar type get similar priorities and so that more direct routes
are favored over indirect ones. Within those guidelines, however,
agents have a fair amount of discretion about how to tune their
algorithms.
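The two principles above can be sketched roughly as follows. This is
only an illustration: the combining rule (compare the smaller of the
two priorities first, then the larger) and the names pair_key and
ordered_pairs are assumptions made for this example. The actual
algorithm is the one described in Section 4.2.

```python
from itertools import product


def pair_key(offerer_prio, answerer_prio):
    # Assumed combining rule, for illustration only: rank a pair by its
    # weaker candidate first, then by its stronger one. Both agents feed
    # in the same two numbers, so they compute the same key.
    return (min(offerer_prio, answerer_prio), max(offerer_prio, answerer_prio))


def ordered_pairs(offerer_cands, answerer_cands):
    """Return (offerer candidate, answerer candidate) names, best pair first.

    Each argument maps a candidate name to the numeric priority that the
    owning agent assigned and sent to its peer.
    """
    pairs = product(offerer_cands.items(), answerer_cands.items())
    ranked = sorted(
        pairs,
        # Candidate names break ties so the result is fully deterministic.
        key=lambda p: (pair_key(p[0][1], p[1][1]), p[0][0], p[1][0]),
        reverse=True,
    )
    return [(o_name, a_name) for (o_name, _), (a_name, _) in ranked]
```

Because both agents call ordered_pairs with the offerer's candidates
first and the answerer's second, each ends up with the same list and
so sends its checks through its NAT in the same order as its peer.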
2.4. Frozen Candidates
The previous description only addresses the case where the agents
wish to establish a single media component--i.e., a single flow with
a single host-port quartet. However, in many cases (in particular
RTP and RTCP) the agents actually need to establish connectivity for
more than one flow.
The naive way to attack this problem would be to simply do
independent ICE exchanges for each media component. This is
obviously inefficient because the network properties are likely to be
very similar for each component (especially because RTP and RTCP are
typically run on adjacent ports). Thus, it should be possible to
leverage information from one media component in order to determine
the best candidates for another. ICE does this with a mechanism
called "frozen candidates."
The basic principle behind frozen candidates is that initially only
the candidates for a single media component are tested. The other
media components are marked "frozen". When the connectivity checks
for the first component succeed, the corresponding candidates for the
other components are unfrozen and checked immediately. This avoids
repeated checking of components which are superficially more
attractive but in fact are likely to fail.
While we've described "frozen" here as a separate mechanism for
expository purposes, in fact it is an integral part of ICE and the
ICE prioritization algorithm automatically ensures that the right
candidates are unfrozen and checked in the right order.
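A minimal sketch of the frozen/unfrozen bookkeeping described above
follows. The Check class, the state names, and the "group" key used
to decide which candidates of different components correspond to one
another are all hypothetical, introduced only for this example. The
draft's own prioritization algorithm determines the real behavior.

```python
from dataclasses import dataclass


@dataclass
class Check:
    group: str       # hypothetical key linking corresponding candidates
    component: int   # 1 = first media component, 2 = the next, and so on
    state: str = "frozen"


def initial_states(checks):
    # Only the first media component is tested at the start.
    for check in checks:
        check.state = "waiting" if check.component == 1 else "frozen"


def on_success(checks, succeeded):
    # A success unfreezes the corresponding checks of other components.
    succeeded.state = "succeeded"
    for check in checks:
        if check.group == succeeded.group and check.state == "frozen":
            check.state = "waiting"
```

In this sketch, a success on the relayed check for component 1 would
unfreeze only the relayed check for component 2. The check for the
superficially more attractive host candidate of component 2 stays
frozen until its counterpart for component 1 has succeeded.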
2.5. Security for Checks
Because ICE is used to discover which addresses can be used to send
media between two agents, it is important to ensure that the process
cannot be hijacked to send media to the wrong location. Each STUN
connectivity check is covered by a message authentication code (MAC)
computed using a key exchanged in the signalling channel. This MAC
provides message integrity and data origin authentication, thus
stopping an attacker from forging or modifying connectivity check
messages. The MAC also aids in disambiguating ICE exchanges from
forked calls.
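The role of the MAC can be illustrated with a short sketch. It
assumes HMAC-SHA1 as the MAC and simply appends the tag to an opaque
message. Both choices, and the function names protect and verify,
are assumptions for illustration and do not reproduce the STUN
message format.

```python
import hashlib
import hmac

TAG_LEN = 20  # length in bytes of an HMAC-SHA1 tag


def protect(message: bytes, key: bytes) -> bytes:
    # Append a MAC computed with the key from the signalling channel.
    return message + hmac.new(key, message, hashlib.sha1).digest()


def verify(protected: bytes, key: bytes):
    # Return the message if the MAC matches, otherwise None.
    message, tag = protected[:-TAG_LEN], protected[-TAG_LEN:]
    expected = hmac.new(key, message, hashlib.sha1).digest()
    return message if hmac.compare_digest(tag, expected) else None
```

An attacker who does not know the key cannot produce a tag that
verify accepts. A check protected under the key of a different
exchange is rejected for the same reason, which is what allows an
agent to tell exchanges from forked calls apart.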
3. Terminology 3. Terminology
Several new terms are introduced in this specification: The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [1].
This specification makes use of the following terminology:
Agent: As defined in RFC 3264, an agent is the protocol Agent: As defined in RFC 3264, an agent is the protocol
implementation involved in the offer/answer exchange. There are implementation involved in the offer/answer exchange. There are
two agents involved in an offer/answer exchange. two agents involved in an offer/answer exchange.
Peer: From the perspective of one of the agents in a session, its Peer: From the perspective of one of the agents in a session, its
peer is the other agent. Specifically, from the perspective of peer is the other agent. Specifically, from the perspective of
the offerer, the peer is the answerer. From the perspective of the offerer, the peer is the answerer. From the perspective of
the answerer, the peer is the offerer. the answerer, the peer is the offerer.
Transport Address: The combination of an IP address and port. Transport Address: The combination of an IP address and port.
Local Transport Address: A local transport address is a transport Candidate: A transport address that is to be tested by ICE procedures
address that has been allocated from the operating system on the in order to determine its suitability for usage for receipt of
host. This includes transport addresses obtained through Virtual media.
Private Networks (VPNs) and transport addresses obtained through
Realm Specific IP (RSIP) [18] (which lives at the operating system
level). Transport addresses are typically obtained by binding to
an interface.
m/c line: The media and connection lines in the SDP, which together
hold the transport address used for the receipt of media.
Derived Transport Address: A derived transport address is a transport
address which is obtained from a local transport address. The
derived transport address is related to the associated local
transport address in that packets sent to the derived transport
address are received on the socket bound to its associated local
transport address. Derived addresses are obtained using protocols
like STUN, and more generally, any UNSAF protocol [20].
Reflexive Transport Address: As defined in [12], a derived transport
address learned by a client which identifies that client as seen
by another host on an IP network, typically a STUN server. When
there is an intervening NAT between the client and the other host,
the reflexive transport address represents the binding allocated
to the client on the public side of the NAT. Reflexive transport
addresses are learned from the XOR-MAPPED-ADDRESS attribute in
STUN Binding Responses and Allocate Responses [13].
Server Reflexive Transport Address: A server reflexive transport
address is a reflexive address that is reflected off of a server,
distinct from the peer, whose address is configured or learned by
the client prior to an offer/answer exchange.
Peer Reflexive Transport Address: A peer reflexive transport address
is a reflexive address that is reflected off of the peer. Peer
reflexive transport addresses are learned by connectivity checks.
Relayed Transport Address: A derived transport address that
terminates on a server, and is forwarded towards the client. The
STUN Allocate Request, defined as part of the STUN relay usage
[13] can be used to obtain a relayed transport address, for
example.
Associated Local Transport Address: When a peer sends a packet to a
transport address, the associated local transport address is the
local transport address at which those packets will actually
arrive. For a local transport address, its associated local
transport address is the same as the local transport address
itself. For reflexive and relayed transport addresses, however,
they are not the same. The associated local transport address is
the one from which the reflexive or relayed transport was derived.
Candidate: A sequence of transport addresses that form an atomic set
for usage with a particular media session. Here, atomic means
that all of the transport addresses in the candidate need to work
before the candidate will be used for actual media transport. In
the case of RTP, there can be one or more transport addresses per
candidate. In the most common case, there are two - one for RTP,
and another for RTCP. If the agent doesn't use RTCP, there would
be just one. If Generic Forward Error Correction (FEC) [16] is in
use, there may be more than two. The transport addresses that
compose a candidate are all of the same type - local, server
reflexive, peer reflexive or relayed.
Local Candidate: A candidate whose transport addresses are local
transport addresses.
Server Reflexive Candidate: A candidate whose transport addresses are
server reflexive transport addresses.
Peer Reflexive Candidate: A candidate whose transport addresses are
peer reflexive transport addresses.
Relayed Candidate: A candidate whose transport addresses are relayed
transport addresses.
Generating Candidate: The candidate from which a peer reflexive Host Candidate: A candidate obtained by binding to a specific port
candidate is derived. from an interface on the host. This includes both physical
interfaces and logical ones, such as ones obtained through Virtual
Private Networks (VPNs) and Realm Specific IP (RSIP) [17] (which
lives at the operating system level).
Operating Candidate: The candidate that is in use for exchange of Server Reflexive Candidate: A candidate obtained by sending a STUN
media. This is the one that an agent places in the m/c line of an request from a host candidate to a STUN server, distinct from the
offer or answer. peer, whose address is configured or learned by the client prior
to an offer/answer exchange.
Candidate ID: An identifier for a candidate. Peer Reflexive Candidate: A candidate obtained by sending a STUN
request from a host candidate to the STUN server running on a
peer's candidate.
Component: When a media stream, and as a consequence, its candidate, Relayed Candidate: A candidate obtained by sending a STUN Allocate
require several IP addresses and ports to work atomically, each of request from a host candidate to a STUN server. The relayed
the constituent IP addresses and ports represents a component of candidate is resident on the STUN server, and the STUN server
that media stream. For example, RTP-based media streams typically relays packets back towards the agent.
have two components - one for RTP, and one for RTCP.
Component ID: An integer, starting with one within each candidate and Translation: The translation of a relayed candidate is the transport
incrementing by one for each component, which identifies the address that the relay will forward a packet to, when one is
component. received at the relayed candidate. For relayed candidates learned
through the STUN Allocate request, the translation of the relayed
candidate is the server reflexive candidate returned by the
Allocate response.
Transport Address ID (tid): An identifier for a transport address, Base: The base of a server reflexive candidate is the host candidate
formed by concatenating the candidate ID with the component ID, from which it was derived. A host candidate is also said to have
separated by a "colon". a base, equal to that candidate itself. Similarly, the base of a
relayed candidate is that candidate itself.
Candidate Pair: The combination of a candidate from one agent along Foundation: Each candidate has a foundation, which is an identifier
with a candidate from its peer. that is distinct for two candidates that have different types,
different interface IP addresses for their base, and different IP
addresses for their STUN servers. Two candidates have the same
foundation when they are of the same type, their bases have the
same IP address, and, for server reflexive or relayed candidates,
they come from the same STUN server. Foundations are used to
correlate candidates, so that when one candidate is found to be
valid, candidates sharing the same foundation can be tested next,
as they are likely to also be valid.
Native Candidate: From the perspective of each agent, the candidate Local Candidate: A candidate that an agent has obtained and included
in a candidate pair which represents a set of addresses obtained in an offer or answer it sent.
by that agent.
Remote Candidate: From the perspective of each agent, the candidate Remote Candidate: A candidate that an agent received in an offer or
in a candidate pair which represents the set of addresses obtained answer from its peer.
by that agent's peer.
Transport Address Pair: The combination of the transport address for In-Use Candidate: A candidate is in-use when it appears in the m/c-
one component of a candidate with the transport address of the line of an active media stream.
same component for the matching candidate in a candidate pair.
Transport Address Pair ID: An identifier for a transport address Candidate Pair: A pairing containing a local candidate and a remote
pair. Formed by concatenating the native transport address ID candidate.
with the remote transport address ID, separated by a "colon".
Matching Transport Address Pair: When a STUN Binding Request is Check: A candidate pair where the local candidate is a transport
received on a local transport address, the matching transport address from which an agent can send a STUN connectivity check.
address pair is the transport address pair whose connectivity is
being checked by that Binding Request.
Candidate Pair Priority Ordering: An ordering of candidate pairs Check List: An ordered set of STUN checks that an agent is to
based on a combination of the qvalues of each candidate and the generate towards a peer.
candidate IDs of each candidate.
Candidate Pair Check Ordering: An ordering of candidate pairs that is Periodic Check: A connectivity check generated by an agent as a
similar to the candidate pair priority ordering, except that the consequence of a timer that fires periodically, instructing it to
operating candidate appears at the top of the list, regardless of send a check.
its priority.
Transport Address Pair Check Ordering: An ordering of transport Triggered Check: A connectivity check generated as a consequence of
address pairs that determines the sequence of connectivity checks the receipt of a connectivity check from the peer.
performed for the pairs.
Transport Address Pair Count: The number of transport address pairs Valid List: An ordered set of candidate pairs that have been
in a candidate pair. This is equal to the minimum of the number validated by a successful STUN transaction.
of transport addresses in the native candidate and the number of
transport addresses in the remote candidate.
4. Sending the Initial Offer 4. Sending the Initial Offer
When an agent wishes to begin a session by sending an initial offer, In order to send the initial offer in an offer/answer exchange, an
it starts by gathering transport addresses, as described in agent must gather candidates, prioritize them, choose ones for
Section 7.1. This will produce a set of candidates, including local inclusion in the m/c-line, and then formulate and send the SDP. Each
ones, server reflexive ones, and relayed ones. of these steps is described in the subsections below.
This process of gathering candidates can actually happen at any time
before sending the initial offer. An agent can pre-gather transport
addresses, using a user interface cue (such as picking up the phone,
or entry into an address book) as a hint that communications is
imminent. Doing so eliminates any additional perceivable call setup
delays due to address gathering.
When it comes time to offer communications, the agent determines a
priority for each candidate and identifies the operating candidate
that will be used for receipt of media, as described in Section 7.2.
The next step is to construct the offer message. For each media
stream, it places its candidates into a=candidate attributes in the
offer and puts its operating candidate into the m/c line. The
process for doing this is described in Section 7.3. The offer is
then sent.
5. Receipt of the Offer and Generation of the Answer
Upon receipt of the offer message, the agent checks if the offer
contains any a=candidate attributes. If the offer does, the offerer
supports ICE. In that case, it starts gathering candidates, as
described in Section 7.1, and prioritizes them as described in
Section 7.2. This processing is done immediately on receipt of the
offer, to prepare for the case where the user should accept the call,
or early media needs to be generated. By gathering candidates (and
performing connectivity checks) while the user is being alerted to
the request for communications, session establishment delays are
reduced.
The agent then constructs its answer, encoding its candidates into
a=candidate attributes and including the operating one in the m/c-
line, as described in Section 7.3. The agent then forms candidate
pairs as described in Section 7.4. These are ordered as described in
Section 7.5. The agent then begins connectivity checks, as described
in Section 7.6. It follows the logic in Section 7.10 on receipt of
Binding Requests and responses to learn new candidates from the
checks themselves.
Transmission of media is performed according to the procedures in
Section 7.13.
6. Processing the Answer
There are two possible cases for processing of the answer. If the
answerer did not support ICE, the answer will not contain any
a=candidate attributes. As a result, the offerer knows that it
cannot perform its connectivity checks. In this case, it proceeds
with normal media processing as if ICE was not in use. However, it
SHOULD send media with the symmetric property described in
Section 7.13, and follow the keepalive procedures in Section 7.12.
If the answer contains candidates, it implies that the answerer
supports ICE. The offerer then forms candidate pairs as described in
Section 7.4. These are ordered as described in Section 7.5. The
agent then begins connectivity checks, as described in Section 7.6.
It follows the logic in Section 7.10 on receipt of Binding Requests
and responses to learn new candidates from the checks themselves.
Transmission of media is performed according to the procedures in
Section 7.13.
7. Common Procedures
This section discusses procedures that are common between offerer and
answerer.
7.1. Gathering Candidates 4.1. Gathering Candidates
An agent gathers candidates when it believes that communications is An agent gathers candidates when it believes that communications is
imminent. For offerers, this occurs before sending an offer imminent. An offerer can do this based on a user interface cue, or
(Section 4). For answerers, it occurs before sending an answer based on an explicit request to initiate a session. Every candidate
(Section 5). is an IP address and port (also known as a transport address). It
also has a type and a base. Three types are defined and gathered by
Each candidate has one or more components, each of which is this specification - host candidates, server reflexive candidates,
associated with a sequence number, starting at 1 for the first and relayed candidates. The base of a candidate is the candidate that an
component of each candidate, and incrementing by 1 for each agent must send from when using that candidate.
additional component within that candidate. These components
represent a set of transport addresses for which connectivity must be
validated. For a particular media stream, all of the candidates
SHOULD have the same number of components. The number of components
that are needed are a function of the type of media stream. All of
the components in a candidate MUST be of the same type - server
reflexive, relayed, or local, and obtained from the same server in
the case of server reflexive or relayed candidates. For local
candidates, each component MUST be obtained from the same interface.
For server reflexive and relayed candidates, each component MUST be
derived from a component with the same component ID, all of which
come from a single local candidate.
For traditional RTP-based media streams, it is RECOMMENDED that there
be two components per candidate - one for RTP and one for RTCP. The
component with the component ID of 1 MUST be RTP, and the one with
component ID of 2 MUST be RTCP. If an agent doesn't implement RTCP,
it SHOULD have a single component for the RTP stream (which will have
a component ID of 1 by definition). Each component of a candidate
has a single transport address.
The first step is to gather local candidates. Local candidates are The first step is to gather host candidates. Host candidates are
obtained by binding to ports (typically ephemeral) on an interface obtained by binding to ports (typically ephemeral) on an interface
(physical or virtual, including VPN interfaces) on the host. The (physical or virtual, including VPN interfaces) on the host. The
process for gathering local candidates depends on the transport process for gathering host candidates depends on the transport
protocol. Procedures are specified here for UDP. Extensions to ICE protocol. Procedures are specified here for UDP.
that define procedures for other transport protocols MUST specify how
local transport addresses are gathered.
For each UDP media stream the agent wishes to use, the agent SHOULD For each UDP media stream the agent wishes to use, the agent SHOULD
obtain a set of candidates (one for each interface) by binding to N obtain a candidate for each component of the media stream on each
UDP ports on each interface, where N is the number of components interface that the host has. It obtains each candidate by binding to
needed for the candidate. For RTP, N is typically two. If a host a UDP port on the specific interface. A host candidate (and indeed
has K local interfaces, this will result in K candidates for each UDP every candidate) is always associated with a specific component for
stream, requiring K*N local transport addresses. which it is a candidate. Each component has an ID assigned to it,
called the component ID. For RTP-based media streams, the RTP itself
Once the agent has obtained local candidates, it obtains candidates has a component ID of 1, and RTCP a component ID of 2. If an agent
with derived transport addresses. The process for gathering derived is using RTCP it MUST obtain a candidate for it. If an agent is
candidates depends on the transport protocol. Procedures are using both RTP and RTCP, it would end up with 2*K host candidates if
specified here for UDP. Extensions to ICE that define procedures for an agent has K interfaces.
other transport protocols MUST specify how derived transport
addresses are gathered.
Agents which serve end users directly, such as softphones, The base for each host candidate is set to the candidate itself.
hardphones, terminal adapters and so on, MUST implement the STUN
Binding Discovery usage and SHOULD use it to obtain server reflexive
candidates. These devices SHOULD implement the STUN Relay usage, and
SHOULD use its Allocate request to obtain both server reflexive and
relayed candidates. They MAY implement and MAY use other protocols
that provide derived transport addresses, such as TEREDO [29].
The requirement to use the relay Usage is at SHOULD strength to allow Once the agent has obtained host candidates, it obtains server
for provider variation. If it is not to be used, it is RECOMMENDED reflexive and relayed candidates. The process for gathering server
that it be implemented and just disabled through configuration, so reflexive and relayed candidates depends on the transport protocol.
that it can re-enabled through configuration if conditions change in Procedures are specified here for UDP.
the future.
Agents which serve end users directly, such as softphones, hardphones,
terminal adapters and so on, SHOULD obtain relayed candidates and
MUST obtain server reflexive candidates. The requirement to obtain
relayed candidates is at SHOULD strength to allow for provider
variation. If relayed candidates are not used, it is RECOMMENDED that
the capability be implemented and just disabled through configuration,
so that it can be re-enabled if conditions change in the future.
Agents which represent network servers under the control of a service Agents which represent network servers under the control of a service
provider, such as gateways to the telephone network, media servers, provider, such as gateways to the telephone network, media servers,
or conferencing servers that are targeted at deployment only in or conferencing servers that are targeted at deployment only in
networks with public IP addresses MAY use the STUN Binding Discovery networks with public IP addresses MAY skip obtaining server reflexive
usage and relay usage, or other similar protocols to obtain and relayed candidates.
candidates.
Why would these types of endpoints even bother to implement ICE?
The answer is that such an implementation greatly facilitates NAT
traversal for clients that connect to it. Consider a PC softphone
behind a NAT whose mapping policy is address and port dependent.
The softphone initiates a call through a gateway that implements
ICE. The gateway doesn't obtain any server reflexive or relayed
transport addresses, but it implements ICE, and consequently, is
prepared to receive STUN connectivity checks on its local
transport addresses. The softphone will send a STUN connectivity
check to that local transport address, causing the NAT to
allocate a new binding for the softphone. The connectivity check
will inform the softphone of this address, allowing it to be used
by the gateway as a peer reflexive remote candidate. This allows
direct media transmission between the gateway and softphone,
without the need for relays. Furthermore, implementation of the
STUN connectivity checks allows for NAT bindings along the way to
be kept open. ICE also provides numerous security properties that
are independent of NAT traversal, and would benefit any multimedia
endpoint. See Section 13 for a discussion on these benefits.
Obtaining derived candidates requires transmission of packets which
have the effect of creating bindings on NAT devices between the
client and the STUN servers. Experience has shown that many NAT
devices have upper limits on the rate at which they will create new
bindings. Furthermore, transmission of these packets on the network
makes use of bandwidth and needs to be rate limited by the agent. As
a consequence, a client SHOULD pace its STUN transactions, such that
the start of each new transaction occurs at least Ta seconds after
the start of the previous transaction. The value of Ta SHOULD be
configurable, and SHOULD have a default of 50ms. Note that this
pacing applies only to the start of a new transaction; pacing of
retransmissions within a STUN transaction is governed by the
retransmission rules defined by STUN.
Derived candidates can be obtained from the STUN Binding Discovery
usage or the STUN Relay usage. The latter is preferred since it will
provide the client with both a server reflexive and a relayed
transport address with a single transaction. It is possible that
some STUN servers will only support the Relay usage or only the
Binding Discovery usage, in which case a client might be configured
with different servers depending on the usage.
To obtain both server reflexive and relayed candidates using the STUN
Relay Usage, the client takes a local UDP candidate, and for each
configured STUN server, produces both candidates. It is anticipated
that clients may have a multiplicity of STUN servers configured or
discovered in network environments where there are multiple layers of
NAT, and where that layering is known to the provider of the client.
To obtain these candidates, for each configured STUN server, the The agent next pairs each host candidate with the STUN server with
client initiates an Allocate Request transaction using the procedures which it is configured or has discovered by some means. This
of Section 8.1.2 of [13] from each transport address of a particular specification only considers usage of a single STUN server. Every Ta
local candidate. The Allocate Response will provide the client with seconds, the agent chooses another such pair (the order is
its server reflexive transport address (obtained from the XOR-MAPPED- inconsequential), and sends a STUN request to the server from that
ADDRESS attribute) and its relayed transport address in the RELAY- host candidate. If the agent is using both relayed and server
ADDRESS attribute. Indeed, these two transport addresses are related reflexive candidates, this request MUST be a STUN Allocate request
to each other. The relay will forward packets received on the from the relay usage [12]. If the agent is using only server
relayed transport address towards that server reflexive transport reflexive candidates, the request MUST be a STUN Binding request
address. As such, the server reflexive transport address is said to using the binding discovery usage [11].
be the associated server reflexive transport address for that relayed
address. Once the Allocate requests have given a client a relayed
transport address for all transport addresses in a relayed candidate,
there is no reason for a client to obtain further relayed candidates
through the same STUN server. Thus, if there are other local
candidates from which the client has not yet obtained relayed
transport address, the client SHOULD NOT bother to obtain them.
Instead, it SHOULD use the STUN Binding Discovery usage and obtain
just server reflexive addresses from that STUN server. The order in
which local candidates are tried against the STUN server to obtain
relayed candidates is a matter of local policy.
To obtain server reflexive candidates using the STUN Binding The value of Ta SHOULD be configurable, and SHOULD have a default of
Discovery usage, the client takes a local UDP candidate, and for each 50ms. Note that this pacing applies only to starting STUN
configured STUN server, produces a server reflexive candidate. To transactions with source and destination transport addresses (i.e.,
produce the server reflexive candidate from the local candidate, it the host candidate and STUN server respectively) for which a STUN
follows the procedures of Section 12.2 of [12] for each local transaction has not previously been sent. Consequently,
transport address in the local candidate. The Binding Response will retransmissions of a STUN request are governed entirely by the
provide the client with its server reflexive transport address. If retransmission rules defined in [11]. Similarly, retries of a
the client had K local candidates, this will produce S*K server request due to recoverable errors (such as an authentication
reflexive candidates, where S is the number of STUN servers. challenge) happen immediately and are not paced by timer Ta. Because
of this pacing, it will take a certain amount of time to obtain all
of the server reflexive and relayed candidates. Implementations
should be aware of the time required to do this, and if the
application requires a time budget, limit the amount of candidates
which are gathered.
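The pacing rule can be sketched as a simple schedule, assuming a single STUN server and the default Ta of 50ms. The function names, the millisecond representation, and the example addresses are illustrative assumptions; retransmissions and retries are deliberately left out, since the text says they are not paced by Ta.

```python
from typing import List, Tuple

TA_DEFAULT_MS = 50  # default value of Ta given in the text


def pacing_schedule(host_candidates: List[str],
                    stun_server: str,
                    ta_ms: int = TA_DEFAULT_MS) -> List[Tuple[int, str, str]]:
    """Return (start offset in ms, host candidate, STUN server) for each
    new transaction, one new transaction starting every Ta."""
    return [(i * ta_ms, host, stun_server)
            for i, host in enumerate(host_candidates)]


def last_start_offset_ms(pair_count: int, ta_ms: int = TA_DEFAULT_MS) -> int:
    """Offset of the last transaction start relative to the first one."""
    return max(pair_count - 1, 0) * ta_ms


if __name__ == "__main__":
    hosts = ["10.0.1.1:2498", "192.168.1.1:3344"]  # example addresses
    for entry in pacing_schedule(hosts, "stun.example.net:3478"):
        print(entry)
```

The second helper gives a lower bound on gathering time, which an application with a time budget could use to decide how many candidates to gather.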
Since a client will pace its STUN transactions (both Binding and An Allocate Response will provide the client with a server reflexive
Allocate requests) at a total rate of one new transaction every Ta candidate (obtained from the mapped address) and a relayed candidate
seconds, it will take a certain amount of time to complete the in the RELAY-ADDRESS attribute. A Binding Response will provide the
address gathering phase. It is RECOMMENDED that implementations have client with only a server reflexive candidate (also obtained from the
a configurable upper bound on the total amount of time allotted to mapped address). The base of the server reflexive candidate is the
address gathering. Any transactions not completed at that point host candidate from which the Allocate or Binding request was sent.
SHOULD be abandoned, but MAY continue and be used in an updated offer The base of a relayed candidate is that candidate itself. A server
once they complete. A default value of 5s is RECOMMENDED. Since the reflexive candidate obtained from an Allocate response is called
total number of allocations that could be done (based on the number the "translation" of the relayed candidate obtained from the same
of STUN servers and local interfaces) might exceed this value, response. The agent will need to remember the translation for the
clients SHOULD prioritize their local candidates and STUN servers, relayed candidate, since it is placed into the SDP. If a relayed
performing transactions from the highest priority local candidates to candidate is identical to a host candidate (which can happen in rare
the highest priority STUN servers first. A STUN server would cases), the relayed candidate MUST be discarded. Proper operation of
typically be higher priority if it supports the STUN Relay Usage, ICE depends on each base being unique.
since such a server provides two transport addresses with one
transaction.
Once the allocations are complete, any redundant candidates are Next, redundant candidates are eliminated. A candidate is redundant
discarded. Candidate A is redundant with candidate B if the if its transport address equals another candidate, and its base
transport addresses of each component match, and each component of equals the base of that other candidate. Note that two candidates
their associated local candidates match. For example, consider a set can have the same transport address yet have different bases, and
of candidates with a single component. One candidate is a local these would not be considered redundant.
candidate, and its one component has a transport address of 10.0.1.1:
4458. A reflexive transport address is derived from this local
transport address, producing a 10.0.1.1:4458. These two candidates
are identical, and also have identical associated local transport
addresses, so they are redundant.
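A minimal sketch of redundancy elimination follows, using the rule that a candidate is redundant when both its transport address and its base equal those of another candidate. The Candidate type and its field names are hypothetical, and keeping the first candidate seen is an assumption, since the text does not say which of two redundant candidates survives.

```python
from typing import List, NamedTuple, Set, Tuple


class Candidate(NamedTuple):
    kind: str          # "host", "server reflexive" or "relayed"
    address: str       # transport address, IP part
    port: int          # transport address, port part
    base_address: str  # base, IP part
    base_port: int     # base, port part


def eliminate_redundant(cands: List[Candidate]) -> List[Candidate]:
    """Drop candidates whose transport address and base both match an
    earlier candidate (assumption: the first one seen is kept)."""
    seen: Set[Tuple[str, int, str, int]] = set()
    kept: List[Candidate] = []
    for c in cands:
        key = (c.address, c.port, c.base_address, c.base_port)
        if key not in seen:
            seen.add(key)
            kept.append(c)
    return kept


EXAMPLE = [
    Candidate("host", "10.0.1.1", 4458, "10.0.1.1", 4458),
    # Same transport address and same base: redundant.
    Candidate("server reflexive", "10.0.1.1", 4458, "10.0.1.1", 4458),
    # Same transport address, different base: not redundant.
    Candidate("server reflexive", "10.0.1.1", 4458, "192.168.1.1", 3344),
]

if __name__ == "__main__":
    for c in eliminate_redundant(EXAMPLE):
        print(c)
```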
+----------+ Finally, each candidate is assigned a foundation. The foundation is
| STUN Srvr| an identifier, scoped within a session. Two candidates MUST have the
+----------+ same foundation ID when they are of the same type (host, relayed,
| server reflexive, peer reflexive or relayed), their bases have the
| same IP address (the ports can be different), and, for reflexive and
----- relayed candidates, the STUN servers used to obtain them have the
// \\ same IP address. Similarly, two candidates MUST have different
| | foundations if their types are different, their bases have different
| B:net10 | IP addresses, or the STUN servers used to obtain them have different
| | IP addresses.
\\ //
-----
|
|
+----------+
| NAT |
+----------+
|
|
-----
// \\
| A |
|192.168/16 |
| |
\\ //
-----
|
|
|192.168.1.1 -----
+----------+ // \\ +----------+
| | | | | |
| Offerer |---------| C:net10 |---------| Answerer |
| |10.0.1.1 | | 10.0.1.2 | |
+----------+ \\ // +----------+
-----
Figure 6
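The foundation rule (same type, same base IP address and, for reflexive and relayed candidates, same STUN server IP address) can be sketched as a lookup keyed on those three values. The types, field names, integer foundation IDs, and addresses below are illustrative assumptions, not something the specification defines.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Candidate(NamedTuple):
    kind: str                      # candidate type
    base_ip: str                   # IP address of the base (port ignored)
    stun_server_ip: Optional[str]  # None for host candidates


def assign_foundations(cands: List[Candidate]) -> List[int]:
    """Give the same ID to candidates sharing type, base IP and STUN
    server IP, and a fresh ID to every other combination."""
    ids: Dict[Tuple[str, str, Optional[str]], int] = {}
    result: List[int] = []
    for c in cands:
        key = (c.kind, c.base_ip, c.stun_server_ip)
        if key not in ids:
            ids[key] = len(ids) + 1
        result.append(ids[key])
    return result


EXAMPLE = [
    Candidate("host", "10.0.1.1", None),                   # RTP
    Candidate("host", "10.0.1.1", None),                   # RTCP, other port
    Candidate("server reflexive", "10.0.1.1", "192.0.2.1"),
    Candidate("host", "192.168.1.1", None),
]

if __name__ == "__main__":
    print(assign_foundations(EXAMPLE))
```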
Consider the more complicated case of Figure 6. In this case, the 4.2. Prioritizing Candidates
offerer is multi-homed. It has one interface, 10.0.1.1, on network
C, which is a net 10 private network. The Answerer is on this same
network. The offerer is also connected to network A, which is
192.168/16. The offerer has an interface of 192.168.1.1 on this
network. There is a NAT on this network, natting into network B,
which is another net10 private network, but not connected to network
C. There is a STUN server on network B.
The offerer obtains local transport address on its interface on The prioritization process results in the assignment of a priority to
network C (10.0.1.1:2498) and a local transport address on its each candidate. An agent does this by determining a preference for
interface on network A (192.168.1.1:3344). It performs a STUN query each type of candidate (server reflexive, peer reflexive, relayed and
to its configured STUN server from 192.168.1.1:3344. This query host), and, when the agent is multihomed, choosing a preference for
passes through the NAT, which happens to assign the binding 10.0.1.1: its interfaces. These two preferences are then combined to compute
2498. The STUN server reflects this in the STUN Binding Response. the priority for a candidate. That priority MUST be computed using
Now, the offerer has obtained a candidate with a transport address it the following formula:
already has (10.0.1.1:2498), but from a new interface. It therefore
keeps it. When it performs its connectivity checks, the offerer will
end up sending packets from both interfaces, and those sent from its
interface on network C will succeed.
7.2. Prioritizing the Candidates and Choosing an Operating One priority = 1000*(type preference) +
100*(local preference) +
10*(stream ID) +
1*(10 - component ID)
The prioritization process takes the set of candidates for a The type preference MUST be an integer from 0 to 9 inclusive, and
particular media stream and associates each with a priority. This represents the preference for the type of the candidate (where the
priority reflects the desire that the agent has to receive media at types are local, server reflexive, peer reflexive and relayed). A 9
that candidate, and is assigned as a value from 0 to 1 (1 being most is the highest preference, and a 0 is the lowest. Setting the value
preferred). Priorities are a property of a candidate, and thus to a 0 means that candidates of this type will only be used as a last
shared across all components of a candidate. Priorities are ordinal, resort. The type preference MUST be identical for all candidates of
so that their significance is only meaningful relative to other the same type and MUST be different for candidates of different
candidates from that agent for a particular media stream. Candidates types. The type preference for peer reflexive candidates MUST be
MAY have the same priority. However, it is RECOMMENDED that each lower than that of server reflexive candidates. Note that candidates
candidate have a distinct priority. Doing so improves the efficiency gathered based on the procedures of Section 4.1 will never be peer
of ICE. reflexive candidates; candidates of this type are learned from the
STUN connectivity checks performed by ICE. The component ID is the
component ID for the candidate, and MUST be between 1 and 10
inclusive. The stream ID is an integer, starting at 9, that
decrements by one for each media stream in the session. When
signaled in the SDP, the first m-line is the one with stream ID 9,
the next with stream ID 8, the next with stream ID 7, and so on. In
essence, the stream ID indicates the position of that media stream in
the SDP itself. The stream ID MUST be less than or equal to 9, and
therefore ICE only works with multimedia sessions with 10 or fewer
media streams. The local preference MUST be an integer from 0 to 9
inclusive. It represents a preference for the particular interface
from which the candidate was obtained, in cases where an agent is
multihomed. A nine represents the highest preference, and a zero,
the lowest. When there is only a single interface, this value SHOULD
be set to nine. Generally speaking, if there are multiple candidates
for a particular component for a particular media stream which have
the same type, the local preference MUST be unique for each one. In
this specification, this only happens for multi-homed hosts.
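The priority formula and the stated ranges can be written out as a short Python sketch. The function names are hypothetical, and the sample type preferences (nine for host, zero for relayed) are taken from the recommendation for the case where relays are a concern.

```python
def stream_id_for_m_line(index: int) -> int:
    """Stream ID for the m-line at zero-based position `index`
    (first m-line is 9, then 8, and so on)."""
    if not 0 <= index <= 9:
        raise ValueError("only 10 media streams are supported")
    return 9 - index


def candidate_priority(type_pref: int, local_pref: int,
                       stream_id: int, component_id: int) -> int:
    """priority = 1000*type + 100*local + 10*stream + (10 - component)."""
    if not 0 <= type_pref <= 9:
        raise ValueError("type preference must be 0..9")
    if not 0 <= local_pref <= 9:
        raise ValueError("local preference must be 0..9")
    if not 0 <= stream_id <= 9:
        raise ValueError("stream ID must be 0..9")
    if not 1 <= component_id <= 10:
        raise ValueError("component ID must be 1..10")
    return (1000 * type_pref + 100 * local_pref
            + 10 * stream_id + (10 - component_id))


if __name__ == "__main__":
    sid = stream_id_for_m_line(0)
    print(candidate_priority(9, 9, sid, 1))  # host, RTP
    print(candidate_priority(9, 9, sid, 2))  # host, RTCP
    print(candidate_priority(0, 9, sid, 1))  # relayed, RTP
```

For a single-homed agent and the first media stream, the host RTP candidate works out to 9000 + 900 + 90 + 9 = 9999, its RTCP counterpart to 9998, and a relayed RTP candidate to 999.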
This specification makes no normative statements on how the These rules guarantee that there is a unique priority for each
prioritization is done. However, some useful guidelines are candidate. This priority will be used by ICE to determine the order
suggested on how such a prioritization can be determined. of the connectivity checks and the relative preference for
candidates. Consequently, what follows are some guidelines for
selection of these values.
One criteria for choosing one candidate over another is whether or One criteria for selection of the type and local preference values is
not that candidate involves the use of an intermediary. That is, if the use of an intermediary. That is, if media is sent to that
media is sent to that candidate, will the media first transit an candidate, will the media first transit an intermediate server before
intermediate server before being received. Relayed candidates are being received. Relayed candidates are clearly one type of
clearly one type of candidates that involve an intermediary. Another candidates that involve an intermediary. Another are host candidates
are local candidates associated with a VPN server. When media is obtained from a VPN interface. When media is transited through an
transited through an intermediary, it can increase the latency intermediary, it can increase the latency between transmission and
between transmission and reception. It can increase the packet reception. It can increase the packet losses, because of the
losses, because of the additional router hops that may be taken. It additional router hops that may be taken. It may increase the cost
may increase the cost of providing service, since media will be of providing service, since media will be routed in and right back
routed in and right back out of an intermediary run by the provider. out of an intermediary run by the provider. If these concerns are
If these concerns are important, candidates with this property can be important, the type preference for relayed candidates can be set
listed with lower priority. lower than the type preference for reflexive and host candidates.
Indeed, it is RECOMMENDED that in this case, host candidates have a
type preference of nine, server reflexive candidates have a type
preference of 5, peer reflexive have a type preference of 6, and
relayed candidates have a type preference of zero. Furthermore, if
an agent is multi-homed and has multiple interfaces, the local
preference for host candidates from a VPN interface SHOULD have a
priority of 0.
Another criteria for choosing one candidate over another is IP Another criteria for selection of preferences is IP address family.
address family. ICE works with both IPv4 and IPv6. It therefore ICE works with both IPv4 and IPv6. It therefore provides a
provides a transition mechanism that allows dual-stack hosts to transition mechanism that allows dual-stack hosts to prefer
prefer connectivity over IPv6, but to fall back to IPv4 in case the connectivity over IPv6, but to fall back to IPv4 in case the v6
v6 networks are disconnected (due, for example, to a failure in a networks are disconnected (due, for example, to a failure in a 6to4
6to4 relay) [23]. It can also help with hosts that have both a relay) [22]. It can also help with hosts that have both a native
native IPv6 address and a 6to4 address. In such a case, higher IPv6 address and a 6to4 address. In such a case, lower local
priority could be afforded to the native v6 address, followed by the preferences could be assigned to the v6 interface, followed by the
6to4 address, followed by a native v4 address. This allows a site to 6to4 interfaces, followed by the v4 interfaces. This allows a site
obtain and begin using native v6 addresses immediately, yet still to obtain and begin using native v6 addresses immediately, yet still
fallback to 6to4 addresses when communicating with agents in other fallback to 6to4 addresses when communicating with agents in other
sites that do not yet have native v6 connectivity. sites that do not yet have native v6 connectivity.
Another criteria for choosing one candidate over another is security. Another criteria for selecting preferences is security. If a user is
If a user is a telecommuter, and therefore connected to their a telecommuter, and therefore connected to their corporate network
corporate network and a local home network, they may prefer their and a local home network, they may prefer their voice traffic to be
voice traffic to be routed over the VPN in order to keep it on the routed over the VPN in order to keep it on the corporate network when
corporate network when communicating within the enterprise, but use communicating within the enterprise, but use the local network when
the local network when communicating with users outside of the communicating with users outside of the enterprise. In such a case,
enterprise. a VPN interface would have a higher local preference than any other
interfaces.
Another criteria for choosing one address over another is topological Another criteria for selecting preferences is topological awareness.
awareness. This is most useful for candidates that make use of This is most useful for candidates that make use of relays. In those
relays. In those cases, if an agent has preconfigured or dynamically cases, if an agent has preconfigured or dynamically discovered
discovered knowledge of the topological proximity of the relays to knowledge of the topological proximity of the relays to itself, it
itself, it can use that to select closer relays with higher priority. can use that to assign higher local preferences to candidates
obtained from closer relays.
There may be transport-specific reasons for preferring one candidate There may be transport-specific reasons for assigning preferences to
over another. In such a case, specifications defining usage of ICE candidates. In such a case, specifications defining usage of ICE
with other transport protocols SHOULD document such considerations. with other transport protocols SHOULD document such considerations.
Once the candidates have been prioritized, one may be selected as the 4.3. Choosing In-Use Candidates
operating one. This is the candidate that will be used for actual
exchange of media if and when it is validated, until a higher priority
candidate is validated. The operating candidate will also be used to
receive media from ICE-unaware peers. As such, it is RECOMMENDED
that one be chosen based on the likelihood of that candidate to work
with the peer that is being contacted. Unfortunately, it is
difficult to ascertain which candidate that might be. As an example,
consider a user within an enterprise. To reach non-ICE capable
agents within the enterprise, a local candidate has to be used, since
the enterprise policies may prevent communication between elements
using a relay on the public network. However, when communicating to
peers outside of the enterprise, a relayed candidate from a
publicly accessible STUN server is needed.
Indeed, the difficulty in picking just one address that will work is
the whole problem that motivated the development of this
specification in the first place. As such, it is RECOMMENDED that
the operating candidate be a relayed candidate from a STUN server
providing public IP addresses in response to an Allocate request.
Furthermore, ICE is only truly effective when it is supported on both
sides of the session. It is therefore most prudent to deploy it to
close-knit communities as a whole, rather than piecemeal. In the
example above, this would mean that ICE would ideally be deployed
completely within the enterprise, rather than just to parts of it.
An additional consideration for selection of the operating candidate
is the switching of media stream destinations between the initial
offer and the subsequent offer. The operating candidate pair in the
initial offer is validated first, and if that validation succeeds,
media will immediately begin to flow between the pair. When the ICE
checks complete and yield a higher priority candidate pair, media
will begin to flow to it (there will also be an updated offer/answer
exchange that changes the operating candidate). This will result in
a change in the destination of the media packets. This may also
cause a different path for the media packets. That path might have
different delay and jitter characteristics. As a consequence, the
jitter buffers may see a glitch, causing possible media artifacts.
If these issues are a concern, the initial offer MAY omit an
operating candidate. This is done by including an m/c-line with an
a=inactive attribute. In such a case, an updated offer will need to
be sent immediately when communicating with an ICE-unaware agent,
setting an operating candidate.
There may be transport-specific reasons for selection of an operating
candidate. In such a case, specifications defining usage of ICE with
other transport protocols SHOULD document such considerations.
7.3. Encoding Candidates into SDP
For each candidate for a media stream, the agent includes a series of
a=candidate attributes as media-level attributes, one for each
component in the candidate. Each candidate has a unique identifier,
called the candidate ID. The candidate ID MUST be chosen randomly
and contain at least 24 bits of randomness. This means that a
candidate ID must be at least 4 characters long, since each character
in the base64 alphabet used for candidate IDs contains at most 6 bits
of randomness. A candidate ID MAY be longer than 4 characters, and
different candidate IDs MAY have different lengths. It is chosen
only when the candidate is placed into the SDP for the first time;
subsequent offers or answers within the same session containing that
same candidate MUST use the same candidate ID used previously. 24
bits is sufficient because the candidate ID is not providing security
(the much more random password is). Its sole purpose is to make it
highly unlikely that both the offerer and answerer select the same
value for a candidate for the same media stream. Different values
for the candidate ID are required to break ties in the procedure that
is used to order the candidate pairs.
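As a minimal sketch of the length arithmetic above: with at most 6 bits per character, 24 bits of randomness needs ceil(24/6) = 4 characters. The alphabet constant and the helper names min_chars and new_candidate_id are illustrative assumptions, not part of this specification; the exact set of permitted characters is given by the grammar in Section 12.

```python
import math
import secrets
import string

# Assumed 64-character alphabet for illustration only; the real grammar
# for candidate IDs is defined elsewhere in the specification.
BASE64_ALPHABET = string.ascii_letters + string.digits + "+/"


def min_chars(bits, bits_per_char=6):
    """Smallest number of characters that can carry `bits` of randomness."""
    return math.ceil(bits / bits_per_char)


def new_candidate_id(bits=24):
    """Draw a random candidate ID with at least `bits` bits of randomness."""
    length = min_chars(bits)
    # secrets.choice uses a cryptographically strong random source.
    return "".join(secrets.choice(BASE64_ALPHABET) for _ in range(length))


print(min_chars(24))        # minimum candidate ID length
print(new_candidate_id())   # a fresh 4-character ID; value differs per run
```

An agent would call new_candidate_id() once per candidate and then reuse the stored value in every later offer or answer in the same session.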
Each component of the candidate has an identifier, called the
component ID. The component ID is a sequence number. For each
candidate, it starts at one, and increments by one for each
component. As discussed below, ICE will perform connectivity checks
such that, between a pair of candidates, checks only occur between
transport addresses with the same component ID. As a consequence, if
one candidate has three components, and it is paired with a candidate
that has two, there will only be two transport address pairs and two
connectivity checks.
ICE will work without a standardized mapping between the components
of a media stream and the numerical value of the component ID. This
allows ICE to be used with media streams with multiple components
without development of standards around such a mapping. However, a
specific mapping has been defined in this specification for RTP -
component ID 1 corresponds to RTP, and component ID of 2 corresponds
to RTCP. Like the candidate ID, the component ID is assigned at the
time the candidate is first placed into the SDP; subsequent offers or
answers within the same session containing that same candidate MUST
use the same component ID used previously.
The transport, addr and port of the a=candidate attribute (all
defined in Section 12) are set to the transport protocol, unicast
address and port of the transport address. A Fully Qualified Domain
Name (FQDN) for a host MAY be used in place of a unicast address. In
that case, when receiving an offer or answer containing an FQDN in an
a=candidate attribute, the FQDN is looked up in the DNS using an A or
AAAA record, and the resulting IP address is used for the remainder
of ICE processing. The qvalue is set to the priority of the
candidate, and MUST be the same for all components of the candidate.
The agent MUST include a type for the transport address by populating A candidate is said to be "in-use" if it appears in the m/c-line of
the candidate-types production with the appropriate value - "local" an offer or answer. When communicating with an ICE peer, being in-
for local transport addresses, "srflx" for server reflexive use implies that, should these candidates be selected by the ICE
candidates, and "relay" for relayed candidates. If the transport algorithm, bidirectional media can flow and the candidates can be
address is server reflexive, the agent MUST include the rel-addr and used. If a candidate is selected by ICE but is not in-use, only
rel-port productions containing the associated local transport unidirectional media can flow and only for a brief time; the
address for that server reflexive transport address. There are candidate must be made in-use through an updated offer/answer
environments in which the policy of an agent is such that it never exchange. When communicating with a peer that is not ICE-aware, the
provides local transport addresses in its offers or answers, for fear in-use candidates will be used exclusively for the exchange of media,
of revealing internal topology to external hosts. In such cases, an as defined in normal offer/answer procedures.
agent MAY include a random transport address instead, as long as it
is the same transport address for all server reflexive candidates
derived from the same actual local transport address. This is
because the transport address in the rel-addr and rel-port production
are used by the ICE algorithm itself for correlation purposes.
If the transport address is relayed, the agent SHOULD include the rel- An agent MUST choose a set of candidates, one for each component of
addr and rel-port productions, containing the associated server each active media stream, to be in-use. A media stream is active if
reflexive transport address. When a relayed address is obtained from it does not contain the a=inactive SDP attribute.
a STUN relay, the associated server reflexive transport address is
the value from the XOR-MAPPED-ADDRESS that was returned in the same
STUN response which provided the relayed address to the agent.
Though not used directly with ICE, the rel-addr and rel-port
attributes are essential for proper functioning of QoS mechanisms,
such as those defined by 3GPP and PacketCable.
The rel-addr and rel-port production MUST NOT be present for a local It is RECOMMENDED that in-use candidates be chosen based on the
transport address. likelihood of those candidates to work with the peer that is being
contacted. Unfortunately, it is difficult to ascertain which
candidates that might be. As an example, consider a user within an
enterprise. To reach non-ICE capable agents within the enterprise,
host candidates have to be used, since the enterprise policies may
prevent communication between elements using a relay on the public
network. However, when communicating with peers outside of the
enterprise, relayed candidates from a publicly accessible STUN
server are needed.
All of the candidates for a media stream share a password that is Indeed, the difficulty in picking just one transport address that
used for securing the STUN connectivity checks. The password will be will work is the whole problem that motivated the development of this
used to process the MESSAGE-INTEGRITY attribute for STUN requests specification in the first place. As such, it is RECOMMENDED that
received by the agent. The password for candidates for different relayed candidates be selected to be in-use. Furthermore, ICE is
media streams MAY be the same, or MAY be different. This password only truly effective when it is supported on both sides of the
MUST be chosen randomly with 128 bits of randomness (though it can be session. It is therefore most prudent to deploy it to close-knit
longer than 128 bits). This password is contained in the a=ice-pwd communities as a whole, rather than piecemeal. In the example above,
attribute, present as a session or media level attribute. Since each this would mean that ICE would ideally be deployed completely within
character of the ice-pwd attribute can represent six bits of the enterprise, rather than just to parts of it.
randomness, the ice-pwd attribute will always be at least 22
characters long. New passwords MUST be selected for each new
session, even if the transport address from a previous session is
being recycled.
The combination of candidate ID and component ID uniquely identifies There may be transport-specific reasons for selection of an in-use
each transport address. As a consequence, each transport address has candidate. In such a case, specifications defining usage of ICE with
a unique identifier, called the transport address ID. The transport other transport protocols SHOULD document such considerations.
address ID is formed by concatenating the candidate ID with the
component ID, separated by the colon (":"). The transport address ID
is not explicitly encoded in the SDP; it is derived from the
candidate ID and component ID, which are present in the SDP. The
usage of the colon as a separator allows the candidate ID and
component ID to be extracted from the transport address ID, since the
colon is not a valid character for the candidate ID.
The transport address ID gets combined, through further 4.4. Encoding the SDP
concatenation, with the transport address ID of a transport address
from the remote candidate (separated again by another colon) to form
the username that is placed in the STUN checks between the peers.
This allows the STUN message to uniquely identify the pairing whose
connectivity it is checking. The transport address ID is needed as a
unique identifier because the IP address within the candidate fails
to provide that uniqueness as a consequence of NAT.
Consider agents A, B, and C. A and B are within private enterprise 1, The agent includes a single a=candidate media level attribute in the
which is using 10.0.0.0/8. C is within private enterprise 2, which SDP for each candidate for that media stream. The a=candidate
is also using 10.0.0.0/8. As it turns out, B and C both have IP attribute contains the IP address, port and transport protocol for
address 10.0.1.1. A sends an offer to C. C, in its answer, provides that candidate. A Fully Qualified Domain Name (FQDN) for a host MAY
A with its transport addresses. In this case, that is 10.0.1.1:8866 be used in place of a unicast address. In that case, when receiving
and 10.0.1.1:8877. As it turns out, B is in a session at that same an offer or answer containing an FQDN in an a=candidate attribute,
time, and is also using 10.0.1.1:8866 and 10.0.1.1:8877. This means the FQDN is looked up in the DNS using an A or AAAA record, and the
that B is prepared to accept STUN messages on those ports, just as C resulting IP address is used for the remainder of ICE processing.
is. A will send a STUN request to 10.0.1.1:8866 and another to The candidate attribute also includes the component ID for that
10.0.1.1:8877. However, these do not go to C as expected. Instead, candidate. For media streams based on RTP, candidates for the actual
they go to B. If B just replied to them, A would believe it has RTP media MUST have a component ID of 1, and candidates for RTCP MUST
connectivity to C, when in fact it has connectivity to a completely have a component ID of 2. Other types of media streams which require
different user, B. To fix this, the transport address ID takes on the multiple components MUST develop specifications which define the
role of a unique identifier. C provides A with an identifier for its mapping of components to component IDs.
transport address, and A provides one to C. A concatenates these two
identifiers (with a colon between) and uses the result as the
username in its STUN query to 10.0.1.1:8866. This STUN query arrives
at B. However, the username is unknown to B, and so the request is
rejected. A treats the rejected STUN request as if there were no
connectivity to C (which is actually true). Therefore, the error is
avoided.
An unfortunate consequence of the non-uniqueness of IP addresses is The candidate attribute also includes the priority, which is the
that, in the above example, B might not even be an ICE agent. It value determined for the candidate as described in Section 4.2, and
could be any host, and the port to which the STUN packet is directed the foundation, which is the value determined for the candidate as
could be any ephemeral port on that host. If there is an application described in Section 4.1. The agent SHOULD include a type for each
listening on this socket for packets, and it is not prepared to candidate by populating the candidate-types production with the
handle malformed packets for whatever protocol is in use, the appropriate value - "host" for host candidates, "srflx" for server
operation of that application could be affected. Fortunately, since reflexive candidates, "prflx" for peer reflexive candidates (though
the ports exchanged in SDP are ephemeral and usually drawn from the these never appear in an initial offer/answer exchange), and "relay"
dynamic or registered range, the odds are good that the port is not for relayed candidates. The related address MUST NOT be included if
used to run a server on host B, but rather is the agent side of some a type was not included. If a type was included, the related address
protocol. This decreases the probability of hitting a port in-use, SHOULD be present for server reflexive, peer reflexive and relayed
due to the transient nature of port usage in this range. However, candidates. If a candidate is server or peer reflexive, the related
the possibility of a problem does exist, and network deployers should address is equal to the base for that server or peer reflexive
be prepared for it. Note that this is not a problem specific to ICE; candidate. If the candidate is relayed, the related address is equal
stray packets can arrive at a port at any time for any type of to the translation of the relayed address. If the candidate is a
protocol, especially ones on the public Internet. As such, this host candidate, there is no related address and the rel-addr
requirement is just restating a general design guideline for Internet production MUST be omitted.
applications - be prepared for unknown packets on any port.
The operating candidate, if there is one, is placed into the m/c STUN connectivity checks between agents make use of a short term
lines of the SDP. For RTP streams, this is done by placing the RTP credential that is exchanged in the offer/answer process. The
address and port into the c and m lines in the SDP respectively. If username part of this credential is formed by concatenating a
the agent is utilizing RTCP, it MUST encode its address and port username fragment from each agent, separated by a colon. Each agent
using the a=rtcp attribute as defined in RFC 3605 [1]. If RTCP is also provides a password, used to compute the message integrity for
not in use, the agent MUST signal that using b=RS:0 and b=RR:0 as requests it receives. As such, an SDP MUST contain the ice-ufrag and
defined in RFC 3556 [6]. ice-pwd attributes, containing the username fragment and password
respectively. These can be either session or media level attributes,
and thus common across all candidates for all media streams, or all
candidates for a particular media stream, respectively. However, if
two media streams have identical ice-ufrag's, they MUST have
identical ice-pwd's. The ice-ufrag and ice-pwd attributes MUST be
chosen randomly at the beginning of a session. The ice-ufrag
attribute MUST contain at least 24 bits of randomness, and the ice-
pwd attribute MUST contain at least 128 bits of randomness. This
means that the ice-ufrag attribute will be at least 4 characters
long, and the ice-pwd at least 22 characters long, since the grammar
for these attributes allows for 6 bits of randomness per character.
The attributes MAY be longer than 4 and 22 characters respectively,
of course.
If there is no operating candidate, the agent MUST include an The m/c-line is populated with the candidates that are in-use. For
a=inactive attribute. The media address and port in the m/c-line is streams based on RTP, this is done by placing the RTP candidate into
inconsequential, since it won't be used. the m and c lines respectively. If the agent is utilizing RTCP, it
MUST encode the RTCP candidate into the m/c-line using the a=rtcp
attribute as defined in RFC 3605 [2]. If RTCP is not in use, the
agent MUST signal that using b=RS:0 and b=RR:0 as defined in RFC 3556
[5].
Encoding of candidates may involve transport protocol specific There MUST be a candidate attribute for each component of the media
considerations. There are none for UDP. However, extensions that stream in the m/c-line.
define usage of ICE with other transport protocols SHOULD specify any
special encoding considerations.
Once an offer or answer is sent, an agent MUST be prepared to Once an offer or answer is sent, an agent MUST be prepared to
receive both STUN and media packets on each candidate. As discussed receive both STUN and media packets on each candidate. As discussed
in Section 7.13, media packets can be sent to a candidate prior to in Section 11.1, media packets can be sent to a candidate prior to
its promotion to operating. its appearance in the m/c-line.
7.4. Forming Candidate Pairs
Once the offer/answer exchange has completed, both agents will have a
set of candidates for each media stream. Each agent forms a set of
candidate pairs for each media stream by combining each of its
candidates with each of the candidates of its peer. Candidates can
be paired up only if their transport protocols are identical. Each
candidate has a number of components, each of which has a transport
address. Within a candidate pair, the components themselves are
paired up such that transport addresses with the same component ID
are combined to form a transport address pair. If one candidate has
more components than the other, those extra components will not be
part of a transport address pair, won't be validated, and will
effectively be treated as if they weren't included in the candidate
pair in the first place.
For example, if an offer/answer exchange took place for a session
comprised of an audio and a video stream, and each agent had two
candidates per media stream, there would be 8 candidate pairs, 4 for
audio and 4 for video. For each of the 8 candidate pairs, there
would be two transport address pairs - one for RTP, and one for RTCP.
The relationship between a candidate, candidate pair, transport
address, transport address pair and component is shown in Figure 7.
This figure shows the relationships as seen by the agent that owns
the candidate with candidate ID "L". This candidate has two
components with transport addresses A and B respectively. This
candidate is called the native candidate, since it is the one owned
by the agent in question. The candidate owned by its peer is called
the remote candidate. As the figure shows, there is a single
candidate pair, and two components in each candidate. The native
candidate has a candidate ID of "L", and the remote candidate has a
candidate ID of "R". Since the two component IDs are 1 and 2,
candidate "L" has two transport addresses with transport address IDs
of "L:1" and "L:2" respectively. Similarly, candidate "R" has two
transport addresses with transport address IDs of "R:1" and "R:2"
respectively. Note that these candidate IDs are not actually legal
since they are not sufficiently random. However, we use "L" and "R"
to keep the figures readable.
Furthermore, each transport address pair is associated with an ID,
the transport address pair ID. This ID is equal to the concatenation
of the transport address ID of the native transport address with the
transport address ID of the remote transport address, separated by a
colon. This means that the identifiers are seen differently for each
agent. For the agent that owns candidate "L", there are two
transport address pairs. One contains transport address "L:1" and
"R:1", with a transport address pair ID of "L:1:R:1". The other
contains transport address "L:2" and "R:2", with a transport address
pair ID of "L:2:R:2". For the agent that owns candidate "R", the
identifiers for these two transport address pairs are reversed; it
would be "R:1:L:1" for the first one and "R:2:L:2" for the second.
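The identifier construction described above can be sketched as follows. The function names are hypothetical; "L" and "R" are the same deliberately short, non-random candidate IDs the text uses for readability.

```python
def transport_address_id(candidate_id, component_id):
    """Candidate ID and component ID joined by a colon."""
    return f"{candidate_id}:{component_id}"


def transport_address_pair_id(native_tid, remote_tid):
    """Native transport address ID first, then the remote one."""
    return f"{native_tid}:{remote_tid}"


def split_transport_address_id(tid):
    """Recover (candidate ID, component ID); relies on the colon never
    appearing inside a candidate ID."""
    candidate_id, component_id = tid.split(":")
    return candidate_id, int(component_id)


l1 = transport_address_id("L", 1)
r1 = transport_address_id("R", 1)
print(transport_address_pair_id(l1, r1))  # as seen by the owner of "L": L:1:R:1
print(transport_address_pair_id(r1, l1))  # as seen by the owner of "R": R:1:L:1
```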
          ...............................................
          .                                             .
          .                                             .
          .  .............             .............    .
          .  . tid=L:1   .             . tid=R:1   .    .
          .  .   --      .             .     --    .    . component
component .  .  | A|------------------------| C|   .    . id=1
id=1      .  .   --      .  Transport  .     --    .    .
          .  .           .   Address   .           .    .
          .  .           .     Pair    .           .    .
          .  .           . id=L:1:R:1  .           .    .
          .  .           .             .           .    .
          .  .           .             .           .    .
          .  . tid=L:2   .             . tid=R:2   .    .
component .  .   --      .             .     --    .    .
id=2      .  .  | B|------------------------| D|   .    . component
          .  .   --      .  Transport  .     --    .    . id=2
          .  .           .   Address   .           .    .
          .  .           .     Pair    .           .    .
          .  .           . id=L:2:R:2  .           .    .
          .  .           .             .           .    .
          .  .............             .............    .
          .     Native                    Remote        .
          .    Candidate                 Candidate      .
          .      id=L                      id=R         .
          .                                             .
          .                                             .
          ...............................................
                          Candidate Pair
                             Figure 7
If a candidate pair was created as a consequence of an offer 5. Receiving the Initial Offer
generated by an agent, then that agent is said to be the offerer of
that candidate pair and all of its transport address pairs.
Similarly, the other agent is said to be the answerer of that
candidate pair and all of its transport address pairs. As a
consequence, each agent has a particular role, either offerer or
answerer, for each transport address pair. This role is important;
when a candidate pair is to be promoted to operating, the offerer is
the one which performs the updated offer.
7.5. Ordering the Candidate Pairs When an agent receives an initial offer, it will check if the offeror
supports ICE, gather candidates, prioritize them, choose one for in-
use, encode and send an answer, and then form a check list and begin
connectivity checks.
Recall that when each candidate is encoded into SDP, it contains a 5.1. Verifying ICE Support
qvalue between 1 and 0, with 1 being the highest priority. Peer
reflexive candidates, learned through the procedures described in
Section 7.10 also have a priority between 0 and 1. For each media
stream, the native candidates are ordered based on their qvalues,
with higher q-values coming first. Amongst candidates with the same
qvalue, they are ordered based on candidate ID, using reverse ASCII
sort order. For example, the candidate with candidate ID "lagDx"
sorts before the candidate with ID "bad79", and both of those follow
the candidate with ID "m8zz".
The usage of a reverse ASCII sort order is important; as discussed in The agent will proceed with the ICE procedures defined in this
Section 13, it allows peer-derived candidates to be preferred over specification if the following are both true:
native ones.
The result of these ordering rules will be an ordered list of o There is at least one a=candidate attribute for each media stream
candidates. The first candidate in this list is given a sequence in the SDP it just received.
number of 1, the next is given a sequence number of 2, and so on.
This same procedure is done for the remote candidates. The result is
that each candidate pair has two sequence numbers, one for the native
candidate, and one for the remote candidate.
First, all of the candidate pairs for whom the smaller of the two o For each media stream, at least one of the candidates is a match
sequence numbers equals 1 are taken first. Then, all of those for for its respective in-use component in the m/c-line.
whom the smaller of the two sequence numbers equals 2 are taken next,
and so on. Amongst those pairs that share the same value for their
smaller sequence number, they are ordered by the larger of their two
sequence numbers (smallest first). Amongst those pairs that share
the same value for their smaller sequence number and the same value
for their larger sequence number, the larger of the two candidate IDs
in each pair are selected, and the pairs are ordered in reverse ASCII
order of the candidate ID, largest first.
The resulting ordering of candidate pairs is called the candidate If either of these conditions is not met, the agent MUST process the
pair priority ordered list. SDP based on normal RFC 3264 procedures, without using any of the ICE
mechanisms described in the remainder of this specification, with the
exception of Section 10, which describes keepalive procedures.
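A rough sketch of the two ICE-support conditions follows. The Candidate and MediaStream types are hypothetical stand-ins for whatever an SDP parser produces, and the second condition is read here as "every in-use component in the m/c-line has a matching candidate", which is one interpretation of the text rather than a normative statement.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Candidate:
    component_id: int
    addr: str
    port: int


@dataclass
class MediaStream:
    # Candidates parsed from the a=candidate attributes of this stream.
    candidates: list = field(default_factory=list)
    # Component ID -> (addr, port) taken from the m/c-line (and a=rtcp).
    in_use: dict = field(default_factory=dict)


def stream_supports_ice(stream):
    # Condition 1: at least one a=candidate attribute is present.
    if not stream.candidates:
        return False
    # Condition 2 (as interpreted here): each in-use component is matched
    # by some candidate with the same component ID, address and port.
    for component_id, (addr, port) in stream.in_use.items():
        if not any(
            c.component_id == component_id and c.addr == addr and c.port == port
            for c in stream.candidates
        ):
            return False
    return True


def offer_supports_ice(streams):
    """True only if every media stream in the SDP passes both conditions."""
    return all(stream_supports_ice(s) for s in streams)
```

If offer_supports_ice returns False, the agent would fall back to plain RFC 3264 processing as described above.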
As an example, consider two agents, A and B. One offers two 5.2. Gathering Candidates
candidates for a media stream with candidate IDs of "g9g9" and
"8888", with q-values of 1.0 and 0.8 respectively. The other answers
with three candidates with candidate IDs of "h8h8", "6565" and
"klkl", with q-values of 0.3, 0.2 and 0.1 respectively. The
following table shows the rank ordering of the six candidate pairs.
The column labeled "Max SN" is the larger of the two sequence numbers
in the candidate pair, and "Min SN" is the minimum. The column
labeled "Max Cand. ID" is the value of the larger of the two
candidate IDs in the candidate pair.
The process for gathering candidates at the answerer is identical to
the process for the offerer as described in Section 4.1. It is
RECOMMENDED that this process begin immediately on receipt of the
offer, prior to user acceptance of a session. Such gathering MAY
even be done pre-emptively when an agent starts.
Order  A      A        A      B      B        B                Max
       Cand.  Cand.    Cand.  Cand.  Cand.    Cand.  Max  Min  Cand.
       ID     q-value  SN     ID     q-value  SN     SN   SN   ID
---------------------------------------------------------------------
1      g9g9   1.0      1      h8h8   0.3      1      1    1    h8h8
2      8888   0.8      2      h8h8   0.3      1      2    1    h8h8
3      g9g9   1.0      1      6565   0.2      2      2    1    g9g9
4      g9g9   1.0      1      klkl   0.1      3      3    1    klkl
5      8888   0.8      2      6565   0.2      2      2    2    8888
6      8888   0.8      2      klkl   0.1      3      3    2    klkl
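The ordering rules and the example table above can be reproduced with a short sketch. The function names and the (candidate ID, q-value) tuple layout are assumptions made for illustration; the sketch relies on Python comparing ASCII strings by code point and on its sort being stable.

```python
def sequence_numbers(candidates):
    """candidates: list of (candidate_id, qvalue).
    Returns {candidate_id: sequence number}, highest q-value first,
    ties broken by reverse ASCII order of the candidate ID."""
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    ranked = sorted(ranked, key=lambda c: c[1], reverse=True)  # stable
    return {cid: i + 1 for i, (cid, _) in enumerate(ranked)}


def order_pairs(native, remote):
    """Return (native_id, remote_id) pairs in candidate pair priority order."""
    n_sn = sequence_numbers(native)
    r_sn = sequence_numbers(remote)
    pairs = [(n, r) for n in n_sn for r in r_sn]
    # Last tie-breaker first: larger candidate ID of the pair, largest first.
    pairs.sort(key=lambda p: max(p), reverse=True)
    # Then smaller sequence number, then larger sequence number.
    pairs.sort(key=lambda p: (min(n_sn[p[0]], r_sn[p[1]]),
                              max(n_sn[p[0]], r_sn[p[1]])))
    return pairs


a = [("g9g9", 1.0), ("8888", 0.8)]
b = [("h8h8", 0.3), ("6565", 0.2), ("klkl", 0.1)]
for rank, pair in enumerate(order_pairs(a, b), start=1):
    print(rank, *pair)
```

Running it lists the six pairs in the same order as the table, including the tie between rows 2 and 3, which is resolved because "h8h8" sorts after "g9g9" in ASCII.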
The candidate pair priority ordered list is then used to obtain an 5.3. Prioritizing Candidates
ordered list of transport address pairs, on which the agent will, in
order, attempt to send STUN connectivity checks. This list, called
the transport address pair check ordered list, is very similar to the
candidate pair priority ordered list, but differs in two important
respects. Firstly, the candidate pairs matching the operating
candidate pair (there can actually be more than one) get promoted to
the top of the list. This allows the operating candidate pair to be
validated first. Secondly, many of the checks would be redundant,
and a filtering algorithm is used to eliminate these redundant
checks.
Ordering of candidates may involve transport protocol specific The process for prioritizing candidates at the answerer is identical
considerations. There are none for UDP. However, extensions that to the process followed by the offerer, as described in Section 4.2.
define usage of ICE with other transport protocols SHOULD specify any
special ordering considerations.
To form the transport address pair check ordered list, the candidate 5.4. Choosing In Use Candidates
list is first modified by taking the candidate pairs corresponding to
the operating candidate pair, and promoting them to the top of the
list. A candidate pair matches the operating candidate pair when its
native and remote transport address match the native and remote
transport addresses in the m/c-line, respectively. In unusual
circumstances, there may be more than one such candidate pair. In
such a case, they should be promoted such that the higher priority
candidate pairs appear first. In addition, it is possible that none
of the candidate pairs match the operating candidate pair. In that
case, no candidate pairs are promoted.
Within each candidate pair there will be a set of transport address The process for selecting in-use candidates at the answerer is
pairs, one for each component ID. Those pairs are ordered by identical to the process followed by the offerer, as described in
component ID. The result is an absolute ordering of all transport Section 4.3.
address pairs for a media stream, sorted first by the order of their
candidate pairs (with the exception of the operating candidate),
followed by the order of their component IDs. This ordering is used
as the start of the transport address pair check ordering.
The next step is to remove redundant transport addresses. Starting 5.5. Encoding the SDP
at the top of the list, the agent moves down from one transport
address pair to the next. If a transport address pair under
consideration has the same remote transport address as a previous
pair, based on transport address pair ID comparisons, and the native
transport address from that previous pair has the same origination
transport address as the one under consideration (based on IP address
and port comparison), the one under consideration is removed from the
list.
The origination transport address is the address that the agent would The process for encoding the SDP at the answerer is identical to the
send from in order to emit a packet with that native transport process followed by the offerer, as described in Section 4.4.
address as a source transport address. For a local transport
address, the origination transport address is equal to that local
transport address. For a server reflexive transport address, the
origination transport address is equal to the local transport address
from which it was derived. For relayed addresses, packets are
emitted by explicitly sending them through the relay. Consequently,
the origination transport address is equal to the relayed address.
After the agent has gone through the entire list, the result is the 5.6. Forming the Check List
transport address pair check ordered list.
The pairs that get removed are redundant since the agent would send a Next, the agent forms the check list. The check list is a sequence
STUN connectivity check using the same source and destination of STUN connectivity checks that are performed by the agent. To form
addresses as a previous check. Consequently, the connectivity check the check list, the agent forms candidate pairs, computes a candidate
will provide no information to the remote agent except for the pair priority, orders the pairs by priority, prunes them, and sets
transport address pair ID it is associated with. These turn out to be their states. These steps are described in this section.
unnecessary due to the STUN processing rules outlined below.
7.6. Performing the Connectivity Checks First, the agent takes each of its candidates (called local
candidates) and pairs them with the candidates it received from its
peer (called remote candidates). A local candidate is paired with a
remote candidate if and only if the two candidates are for the same
media stream, have the same component ID, and have the same IP
address version. It is possible that some of the local candidates
don't get paired with a remote candidate, and some of the remote
candidates don't get paired with local candidates. This can happen
if one agent didn't include candidates for all of the components
for a media stream. In the case of RTP, for example, this would
happen when one agent provided candidates for RTCP, and the other did
not. If this happens, the number of components for that media stream
is effectively reduced, and considered to be equal to the minimum
across both agents of the maximum component ID provided by each agent
across all components for the media stream.
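The pairing rule above (same media stream, same component ID, same IP address version) might be sketched like this. The Candidate tuple and its field names are hypothetical, and the sketch assumes any FQDN has already been resolved to an IP address.

```python
import ipaddress
from collections import namedtuple

# Hypothetical record; `stream` is any identifier for the media stream.
Candidate = namedtuple("Candidate", "stream component_id addr port")


def form_pairs(local, remote):
    """Pair each local candidate with every compatible remote candidate."""
    pairs = []
    for l in local:
        for r in remote:
            same_version = (ipaddress.ip_address(l.addr).version
                            == ipaddress.ip_address(r.addr).version)
            if (l.stream == r.stream
                    and l.component_id == r.component_id
                    and same_version):
                pairs.append((l, r))
    return pairs


local = [Candidate("audio", 1, "192.0.2.1", 5000),
         Candidate("audio", 2, "192.0.2.1", 5001),
         Candidate("audio", 1, "2001:db8::1", 5000)]
remote = [Candidate("audio", 1, "198.51.100.7", 6000)]  # peer sent no RTCP candidate

for l, r in form_pairs(local, remote):
    print(l.addr, l.port, "->", r.addr, r.port)
```

Here only the IPv4 RTP candidate is paired; the local RTCP candidate and the IPv6 candidate remain unpaired, which is the situation the text describes.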
Connectivity checks are a STUN usage defined in [12]. They are Once the pairs are formed, a candidate pair priority is computed.
performed by sending peer-to-peer STUN Binding Requests. These Let O-P be the priority for the candidate provided by the offerer.
checks result in a transport address pair progressing through a state Let A-P be the priority for the candidate provided by the answerer.
machine that captures the progress of the connectivity checks. The Let O-IP be the IP address (without the port) of the candidate
specific state machine and the procedures for the connectivity checks provided by the offerer. Let SZ be two to the power of 32 for IPv4
are specific to the transport protocol. This specification defines candidates, and two to the power of 128 for IPv6 candidates. The
rules for UDP. The state machine processing described in this priority for a pair is computed as:
section MUST be followed by agents. Extensions to ICE that describe
other transport protocols SHOULD describe the state machine and the
procedures for connectivity checks.
The set of states for a transport address pair visited by the offerer pair priority = 10000*MIN(O-P,A-P) + MAX(O-P,A-P) + O-IP/SZ
and answerer are depicted graphically in Figure 9. Note that this
state machine exists for all transport address pairs, including ones
pruned from the transport address pair check ordered list.
OPEN ISSUE: This can be larger than 32 bits. Should consider ways
of reducing that.
|
|Start
|
|
V
+------------+
+-----------------| |
| | |
| +----| Waiting |----------------+
| | | | |
| | | | |
| Miss | +------------+ |
| ---- | | |
Match Res| - | | Selected | Match Req
---------| | | --------. | -------
- | | | Send Req Match Req | Send Req
| | V --------- |
| Match Res | +------------+ Re-Xmit |
| --------- | | | Req |
| - | | | |
| +------c----| Testing |-----------+ |
| | | | | | |
| | | | | | |
| | | +------------+ | |
| | | | | |
| | | | Error or | |
| | | | Miss | |
Timer Tr | | | | ----- | |
-------- V V | V - V V
Send Req +------------+ | +------------+ +------------+
+-----| | +--->| | | |
| | Recv- | | | | Send- |
| | Valid |------->| Invalid |<-------| Valid |
| | | | | | |
+---->| | Error, | | Error, | |
+------------+ Miss +------------+ Miss +------------+
| ----- ^ ----- |
| - | Error, - |
| | Miss |
| | ----- |
| | - |
| +------------+ |
| | | |
| | | |
+-------------->| Valid |<-------------+
Match Req | | Match Res
--------- | | ---------
- +------------+ -
| ^
| |
| |
+-------+
Timer Tr
--------
Send Req
Figure 9 This formula ensures a unique priority for each pair in most cases.
Once the priority is assigned, the agent sorts the candidate pairs in
decreasing order of priority. If two pairs have identical priority,
the ordering amongst them is arbitrary.
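As a rough illustration of the pair priority formula and the sort described in the new text, a minimal Python sketch follows. The function names and the tuple layout are assumptions made for this example and do not come from the draft. Exact fractions are used so that the O-IP/SZ term, which is always less than one, is not lost to floating point rounding.

```python
from fractions import Fraction
import ipaddress


def pair_priority(o_p, a_p, offerer_ip):
    """pair priority = 10000*MIN(O-P,A-P) + MAX(O-P,A-P) + O-IP/SZ."""
    addr = ipaddress.ip_address(offerer_ip)
    # SZ is 2^32 for IPv4 candidates and 2^128 for IPv6 candidates.
    sz = 2 ** 32 if addr.version == 4 else 2 ** 128
    return 10000 * min(o_p, a_p) + max(o_p, a_p) + Fraction(int(addr), sz)


def sort_pairs(pairs):
    """Sort (o_p, a_p, offerer_ip) tuples in decreasing order of priority."""
    return sorted(pairs, key=lambda p: pair_priority(*p), reverse=True)
```

Because the address term is a fraction below one, it only breaks ties between pairs whose integer parts are equal.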
The state machine has six states - Waiting, Testing, Recv-Valid, This sorted list of candidate pairs is used to determine a sequence
Send-Valid, Valid and Invalid. In the Waiting state, the agent is of connectivity checks that will be performed. Each check involves
waiting to send or receive a connectivity check for the pair. In the sending a request from a local candidate to a remote candidate.
Testing state, the agent has sent a connectivity check and is Since an agent cannot send requests directly from a reflexive
awaiting a response. In the Recv-Valid state, the agent knows that candidate, but only from its base, the agent next goes through the
its peer can receive packets from it on this transport address pair. sorted list of candidate pairs. For each pair where the local
In the Send-Valid state, the agent knows that its peer can send candidate is server reflexive, the server reflexive candidate MUST be
packets to it. In the Valid state, the agent knows that its peer can replaced by its base. Once this has been done, the agent MUST remove
both send and receive packets from it. redundant pairs. A pair is redundant if its local and remote
candidates are identical to the local and remote candidates of a pair
higher up on the priority list. The result is called the check list,
and each candidate pair on it is called a check.
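The two steps above (replacing server reflexive local candidates by their base, then removing redundant pairs) might be sketched as follows. This is only an illustration: representing candidates as strings and supplying the base mapping as a dictionary named base_of are assumptions of the example.

```python
def form_check_list(sorted_pairs, base_of):
    """Build the check list from pairs already sorted by decreasing priority.

    sorted_pairs: list of (local, remote) candidate identifiers.
    base_of: maps each server reflexive local candidate to its base
             (hypothetical structure, used only for this sketch).
    """
    check_list = []
    seen = set()
    for local, remote in sorted_pairs:
        # A request can only be sent from the base, so substitute it.
        local = base_of.get(local, local)
        # A pair is redundant if an identical pair appeared higher up.
        if (local, remote) in seen:
            continue
        seen.add((local, remote))
        check_list.append((local, remote))
    return check_list
```

Keeping the first occurrence of each pair preserves the position of the higher priority entry, as the text requires.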
Initially, all transport address pairs start in the Waiting state. Each check is also said to have a foundation, which is merely the
In this state, the agent waits for one of three events - a chance to combination of the foundations of the local and remote candidates in
send a Binding Request, receipt of a Binding Request, or receipt of a the check.
Binding Response.
Since there is an instance of the state machine for each transport Finally, each check in the check list is associated with a state.
address pair, Binding Requests and responses need to be matched to There are five potential values that the state can have:
the specific state machine for which they were meant to apply. As
described below, the Binding Request may not be a match for the
transport address pair it was meant to validate. To find the
transport address pair it was meant to validate, called the target
transport address pair, the agent examines the USERNAME of the
incoming Binding Request. The USERNAME directly contains the
transport address pair ID for the pair it was meant to validate.
Binding Responses are matched to their requests using the STUN
transaction ID, and then mapped to the transport address pair from
that.
For each media stream, the agent starts a new connectivity check for Waiting: This check has not been performed, and can be performed as
a transport address pair every Tb*RND seconds. Tb SHOULD scale soon as it is the highest priority Waiting check on the check
linearly with the number of media streams, so that the pace of list.
connectivity checks overall is invariant to the number of media
streams. Consequently, it is RECOMMENDED that Tb have a default
value of N*50ms, where N is the number of media streams. RND is a
random number chosen uniformly between 0.7 and 1.3, and it helps to
avoid synchronization between the transmission of connectivity checks
for different media streams. On average, if there are N media
streams, the checks across all media streams will be paced out at a
total of N/Tb checks per second. The check is started for the first
transport address pair in the transport address pair check ordered
list that is in the Waiting state. The "Selected" event is passed to
the state machine for this transport address pair, causing it to be
moved to the Testing state. The agent then sends a connectivity
check using a STUN Binding Request, as outlined in Section 7.7.
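The Tb*RND pacing described in the old text can be sketched as below. The function name and the injectable rng parameter are assumptions for illustration; only the N*50ms default and the 0.7 to 1.3 range come from the text.

```python
import random


def check_interval(num_streams, rng=random):
    """Seconds until the next check is started for one media stream."""
    tb = num_streams * 0.050      # recommended default: Tb = N * 50ms
    rnd = rng.uniform(0.7, 1.3)   # RND, uniform between 0.7 and 1.3
    return tb * rnd
```

With this default, the aggregate rate N/Tb works out to 1/50ms, or 20 checks per second on average, whatever the number of media streams.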
Once a STUN connectivity check begins, the processing of the check In-Progress: A request has been sent for this check, but the
follows the rules for STUN. Specifically, retransmits of STUN transaction is still in progress.
requests are done as specified in [12], and furthermore, if a
transaction fails and needs to be retried, that retry can happen
rapidly, as described below. It doesn't "count" against the average
rate limit of 1/Tb checks per second per media stream. In addition,
the keepalives that are generated for a valid pair do not count
against the rate limit either. The rate limit applies strictly to
the start of connectivity checks for a transport address pair that
has been newly signaled through an offer/answer exchange.
When an agent receives a Binding Request, which per the processing Succeeded: This check was already done and produced a successful
rules of Section 7.8 produces a successful response, the agent result.
examines the source transport address of the request. If the native
transport address was relayed, this would be the source as seen by
the relay. For the STUN relay usage, that source transport address
will be present in the REMOTE-ADDRESS attribute of a STUN Data
Indication message, if the Binding Request was delivered through a
Data Indication. If the Binding Request was not encapsulated in a
Data Indication, that source transport address is equal to the
current active destination for the STUN relay session.
If the source transport address matches the remote transport address Failed: This check was already done and failed, either never
of the target transport address pair, the Binding Request is producing any response or producing an unrecoverable failure
considered to be a match for the target transport address pair. response.
Consequently, a Match Req event is passed to the state machine for
the target transport address pair. If the state machine was in the
Waiting or Testing state, the state machine moves into the Send-Valid
state. If it was previously in the Waiting state, the agent sends a
connectivity check of its own for the target transport address pair,
as outlined in Section 7.7. If it was in the Testing state, it
retransmits a Binding Request for the transaction in progress. This
retransmission is one that would not normally occur based on the
procedures in [12]. ICE "prods" the STUN transaction state machine
to send an extra retransmit, in addition to the one which is
scheduled to be sent next. This helps speed up bidirectional
connectivity verification when one agent is behind a NAT with an
address and port dependent filtering behavior [32].
If the source transport address in the Binding Request was not a Frozen: This check hasn't been performed, and it can't yet be
match for the remote transport address, the Binding Request is performed until some other check succeeds, allowing it to move
considered to be a miss for the target transport address pair. into the Waiting state.
Consequently, a Miss event is passed to the state machine of the
target transport address pair, and it immediately moves into the
Invalid state. Typically, the source transport address won't match
when there was a NAT between the sender and receiver with an address
and port dependent mapping property, though there are other cases in
which this can happen.
Though it was a miss for the target transport address pair, the First, the agent sets all of the checks to the Frozen state. Then,
connectivity check may have been a match for a different transport it sets the first check in the check list to Waiting. It then finds
address pair. To determine this, the agent checks the source all of the other checks for the same media stream and with the same
transport address of the Binding Request against all of the other component ID, but different foundations, and sets all of their states
remote transport addresses of transport address pairs for the same to Waiting.
media stream that use the same transport protocol and share the same
native transport address (based on transport address ID comparison)
of the target. Of those that match (assuming at least one matches),
it refines the set further by selecting only those for whom the
origination transport address of the remote transport address matches
the origination transport address of the remote transport address in
the target transport address pair. The origination transport address
for a remote transport address is obtained from information signaled
in the SDP, and depends on the type. For a local transport address,
the origination address equals that local transport address. For a
server reflexive transport address, the origination address is
obtained from the related address information provided in the SDP.
For a relayed transport address, the origination transport address
equals that relayed transport address. For these three types, the
type is signaled in the SDP. For a peer derived transport address,
the origination address is the same as the origination address of the
generating transport address.
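The initial state assignment in the new text (every check Frozen, then the first check and certain others moved to Waiting) could look like the sketch below. The Check class and its field names are assumptions of the example. The phrase "different foundations" is read here as "a foundation different from that of the first check", which is one possible interpretation.

```python
from dataclasses import dataclass


@dataclass
class Check:
    stream: str        # media stream identifier (hypothetical field)
    component: int     # component ID
    foundation: str    # combined foundation of the local and remote candidates
    state: str = "Frozen"


def init_states(check_list):
    """Assign initial states to a check list sorted by decreasing priority."""
    for check in check_list:
        check.state = "Frozen"
    if not check_list:
        return check_list
    first = check_list[0]
    first.state = "Waiting"
    for check in check_list[1:]:
        # Same media stream and component ID as the first check, but a
        # different foundation (assumed reading of the text).
        if (check.stream == first.stream
                and check.component == first.component
                and check.foundation != first.foundation):
            check.state = "Waiting"
    return check_list
```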
If there was a match (there can only be either one or zero matches), 5.7. Performing Periodic Checks
this match is called the alternate. In many cases, the alternate
transport address pair will not be in the transport address pair
check ordered list; it will have been one of the ones pruned.
Indeed, this is why it was pruned - a check on the remaining
transport address pairs can serve to validate it. The state machine
for the alternate is passed the Match Req event. If it was in the
Waiting state, this causes it to move into the Send-Valid state, and
a connectivity check is generated for the alternate transport address
pair. It may have been in the Testing state, in which case it moves
into the Send-Valid state, and the agent retransmits the
Binding Request for the transaction in progress. If it was in
the Recv-Valid state, this causes it to move into the Valid state.
If no alternate could be found, it means that a new remote transport An agent performs two types of checks. The first type are periodic
address and corresponding origination transport address have been checks. These checks occur periodically, and involve choosing the
discovered. In this case, the agent follows the procedures of highest priority check in the Waiting state from the check list, and
Section 7.10.1 to create a new transport address pair and state performing it. The other type of check is called a triggered check.
machine for it. This is a check that is performed on receipt of a connectivity check
from the peer. This section describes how periodic checks are
performed.
If the Binding Request didn't generate a success response, an Error Once the agent has computed the check list as described in
event is passed to the state machine of the target, causing it to Section 5.6, it sets a timer that fires every Ta seconds. This is
move into the Invalid state. the same value used to pace the gathering of candidates, as described
in Section 4.1. The first timer fires immediately, so that the agent
performs a connectivity check the moment the offer/answer exchange
has been done, followed by the next periodic check Ta seconds later.
If the agent receives a successful response to its STUN request, it When the timer fires, the agent MUST find the highest priority check
examines the transport address in the XOR-MAPPED-ADDRESS in the check list that is in the Waiting state. The agent then sends
attribute of the response. This will be a peer reflexive transport a STUN check from the local candidate of that check to the remote
address. If the peer reflexive transport address matches (based on candidate of that check. The procedures for forming the STUN request
IP address and port comparison) the native transport address of the for this purpose are described in Section 7.7.1. If none of the
target transport address pair, a Match Res event is passed to the checks in the check list are in the Waiting state, but there are
state machine of the target. If the state machine was in the Testing checks in the Frozen state, the highest priority check in the Frozen
state, the state machine moves into the Recv-Valid state. If it was state is moved into the Waiting state, and that check is performed.
in the Send-Valid state, it moves into the Valid state. When a check is performed, its state is set to In-Progress. If there
are no checks in either the Waiting or Frozen state, then timer Ta is
stopped.
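The selection performed each time timer Ta fires, as described in the new text, might be sketched as follows. Representing each check as a dictionary with a "state" key is an assumption of this example, and the actual sending of the STUN request is left out.

```python
def on_timer(check_list):
    """Pick the check to perform when timer Ta fires.

    check_list: list of dicts with a "state" key, in decreasing priority.
    Returns the chosen check, or None when timer Ta should be stopped.
    """
    # Prefer the highest priority Waiting check; fall back to the highest
    # priority Frozen check only if nothing is Waiting.
    for wanted in ("Waiting", "Frozen"):
        for check in check_list:
            if check["state"] == wanted:
                # A Frozen check passes through Waiting and is performed at
                # once, so both cases end up In-Progress here.
                check["state"] = "In-Progress"
                return check
    return None
```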
If, however, the transport addresses didn't match, a Miss event is Performing the connectivity check requires the agent to know the
passed to the state machine of the target, and it immediately moves username fragment for the local and remote candidates, and the
into the Invalid state. The agent checks the peer reflexive password for the remote candidate. For periodic checks, the remote
transport address against all of the other native transport addresses username fragment and password are learned directly from the SDP
for transport address pairs for the same media stream with the same received from the peer, and the local username fragment is known by
transport protocol and the same remote transport address (based on the agent.
comparison of transport address ID) as the target. Of those that
match (assuming at least one matches), it refines the set further by
selecting only those for whom the origination transport address of
the native transport address matches the origination address of the
native transport address in the target transport address pair. The
resulting transport address pair (there can be only zero or one) is
called the alternate. In many cases, the alternate transport address
pair will not be in the transport address pair check ordered list; it
will have been one of the ones pruned. The state machine for the
alternate is passed the Match Res event. If it was in the Waiting
state, this causes it to move into the Recv-Valid state. It may have
been in the Testing state, in which case it moves into the Recv-
Valid. If it was in the Send-Valid state, this causes it
to move into the Valid state.
If no alternate could be found, the Binding Response will create a 6. Receipt of the Initial Answer
new peer reflexive transport address, and the procedures of
Section 7.10.2 are followed to create a new transport address pair
and state machine for it.
In any state, if the STUN transaction results in an error, the state This section describes the procedures that an agent follows when it
machine moves into the Invalid state. A STUN transaction produces an receives the answer from the peer. It verifies that its peer
"error" based on the processing in Section 7.7, which indicates which supports ICE, forms the check list and begins performing periodic
STUN response codes constitute an error as far as ICE processing is checks.
concerned.
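The transitions of the old per-pair state machine that are spelled out in the text can be collected into a small table, as in the sketch below. The table covers only the transitions stated in the prose. Leaving the state unchanged for any combination not listed is an assumption of this example, not something the text specifies.

```python
# (current state, event) -> next state, taken from the prose description.
TRANSITIONS = {
    ("Waiting", "Selected"): "Testing",
    ("Waiting", "Match Req"): "Send-Valid",
    ("Testing", "Match Req"): "Send-Valid",
    ("Recv-Valid", "Match Req"): "Valid",
    ("Waiting", "Match Res"): "Recv-Valid",
    ("Testing", "Match Res"): "Recv-Valid",
    ("Send-Valid", "Match Res"): "Valid",
}


def next_state(state, event):
    """Return the state reached from `state` when `event` is delivered."""
    # A Miss or an Error moves the pair to Invalid from any state.
    if event in ("Miss", "Error"):
        return "Invalid"
    # Unlisted combinations keep the current state (assumption).
    return TRANSITIONS.get((state, event), state)
```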
If a transport address pair is in the Recv-Valid or Valid state, an 6.1. Verifying ICE Support
agent MUST generate a new STUN Binding Request transaction every Tr
seconds. This transaction ensures that NAT bindings for the
transport address pair remain open while the candidate is under
consideration. The transaction is performed as outlined in
Section 7.7. These transactions can also be used to keep the NAT
bindings alive when the candidate is promoted to operating, as
described in Section 7.12. Tr SHOULD be configurable, and SHOULD
default to 15 seconds. These STUN transactions are processed in the
same way as any other, and can result in new peer derived transport
addresses, or can fail and cause the transport address pair to be
invalidated.
The candidate pair itself has a state, which is derived from the The offerer follows the same procedures described for the answerer in
states of its transport address pairs. If at least one of the Section 5.1.
transport address pairs in a candidate pair is in the invalid state,
the state of the candidate pair is considered to be invalid. If the
candidate pair enters this state, an agent moves the state machines
for all of the other transport address pairs in this candidate pair
into the invalid state as well. This will ensure that connectivity
checks never start for those transport address pairs. Furthermore,
if checks are already in progress for one of those transport address
pairs, the agent ceases them.
If all of the transport address pairs making up the candidate pair 6.2. Forming the Check List
are Valid, the candidate pair is considered valid. If all of the
transport address pairs making up the candidate pair are either Valid
or Recv-Valid, and at least one is Recv-Valid, the candidate pair is
considered to be Recv-Valid. If all of the transport address pairs
making up the candidate pair are either Valid or Send-Valid, and at
least one is Send-Valid, the candidate pair is considered to be Send-
Valid. If all of the transport address pairs in a candidate pair are
in the Waiting state, the candidate pair is in the waiting state. If
all of the transport address pairs in the candidate pair are either
in the Waiting or Testing states, and at least one is in the Testing
state, the state of the candidate pair is Testing. Otherwise, the
state of the candidate pair is considered Indeterminate.
A candidate itself also has a state. If a candidate is present in at The offerer follows the same procedures described for the answerer in
least one valid candidate pair, that candidate is said to be valid. Section 5.6.
If all of the candidate pairs containing that candidate are invalid,
the candidate itself is invalid. Otherwise, the candidate's state is
Indeterminate.
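The derivation of candidate pair state and candidate state from the old text can be sketched as below. The function names are illustrative, and returning Indeterminate for an empty input is an assumption, since the text does not cover that case.

```python
def candidate_pair_state(states):
    """Derive a candidate pair's state from its transport address pair states."""
    s = set(states)
    if not s:
        return "Indeterminate"   # assumption: not covered by the text
    if "Invalid" in s:
        return "Invalid"
    if s == {"Valid"}:
        return "Valid"
    # All Valid was handled above, so these subsets contain at least one
    # Recv-Valid (or Send-Valid) member.
    if s <= {"Valid", "Recv-Valid"}:
        return "Recv-Valid"
    if s <= {"Valid", "Send-Valid"}:
        return "Send-Valid"
    if s == {"Waiting"}:
        return "Waiting"
    if s <= {"Waiting", "Testing"}:
        return "Testing"
    return "Indeterminate"


def candidate_state(pair_states):
    """Derive a candidate's state from the states of the pairs containing it."""
    pair_states = list(pair_states)
    if "Valid" in pair_states:
        return "Valid"
    if pair_states and all(p == "Invalid" for p in pair_states):
        return "Invalid"
    return "Indeterminate"
```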
7.7. Sending a Binding Request for Connectivity Checks 6.3. Performing Periodic Checks
An agent performs a connectivity check on a transport address pair by The offerer follows the same procedures described for the answerer in
sending a STUN Binding Request from its native transport address, and Section 5.7.
sending it to the remote transport address. Sending from its native
transport address is done by sending it from its origination
transport address. As mentioned above, the origination transport
address depends on the type of transport protocol and the type of
transport address (local, reflexive, or relayed). This specification
defines the meaning for UDP. Specifications defining other transport
protocols must define what this means for them.
For UDP-based local transport addresses, sending from the local 7. Connectivity Checks
transport address has the meaning one would expect - the request is
sent such that the source IP address and port equal that of the local
transport address. For reflexive transport addresses, it is sent by
sending from the associated local transport address used to derive
that reflexive address. For relayed transport addresses, it is sent
by using STUN mechanisms to send the request through the STUN relay
(using the Send request). Sending the request through the STUN relay
server necessarily requires that the request be sent from the client,
using the local transport address used to derive the relayed
transport address.
The Binding Request sent by the agent MUST contain the USERNAME This section describes how connectivity checks are performed.
attribute. This attribute MUST be set to the transport address pair Connectivity checks are a STUN usage, and the behaviors described
ID of the corresponding transport address pair as seen by its peer. here meet the guidelines for definitions of new usages as outlined in
Thus, for the first transport address pair in Figure 7, if the agent [11].
on the left sends the STUN Binding Request, the USERNAME will have
the value R:1:L:1. If the agent on the right sends the STUN Binding
Request, the USERNAME will have the value L:1:R:1. To be clear, the
USERNAME that is used is NOT the one seen locally, but rather the one
as seen by its peer. The request SHOULD contain the MESSAGE-
INTEGRITY attribute, computed according to [12]. The key used as
input to the HMAC is the password provided by the peer for this
remote transport address. This password will be identical for all
remote transport addresses for the same media stream.
Note that all ICE implementations are required to be compliant to Note that all ICE implementations are required to be compliant to
[12], as opposed to the older [14]. Consequently, all connectivity [11], as opposed to the older [13].
checks will contain the magic cookie in the STUN header, and cause
the STUN server embedded in each ICE implementation to include XOR-
MAPPED-ADDRESS attributes in the response, rather than MAPPED-
ADDRESS.
Once created, the STUN transaction is linked to the transport address
pair so that, when the response is received, the state machine on the
linked transport address pair can be updated.
The STUN transaction will generate either a timeout, or a response.
If the response is a 420, 500, or 401, the agent should try again as
described in [12] (as mentioned above, it need not wait the roughly
Tb seconds to try again). Either initially, or after such a retry,
the STUN transaction might produce a non-recoverable failure response
or a failure result inapplicable to this usage of STUN and thus
unrecoverable. If this happens, an error event is generated into the
state machine, and the transport address pair enters the invalid
state.
If the STUN transaction times out, the client SHOULD NOT retry. The
only reason a retry might succeed is if there was severe packet loss
during the duration of the check, or the answer was significantly
delayed, also due to packet loss. However, STUN Binding Request
transactions run for 9.5 seconds, which is well beyond the typical
tolerance for a session establishment. The retries come with a
penalty of additional traffic, which can be used to launch DoS
attacks (see Section 13.4.2). The only reason to not follow the
SHOULD NOT is if the agent has adjusted the STUN transaction timers
to be more aggressive.
If the Binding Response is a 200, the agent SHOULD check for the
MESSAGE-INTEGRITY attribute and verify it, as discussed in [12].
Indeed, this check SHOULD be done for all responses. This will
result in the response being discarded (eventually leading to a
timeout), if the integrity check fails.
7.8. Receiving a Binding Request for Connectivity Checks
As a result of providing a list of candidates in its offer or answer,
an agent will receive STUN Binding Request messages. An agent MUST
be prepared to receive STUN Binding Requests on each local transport
address from the moment it sends an offer or answer that contains a
candidate with that local transport address. Similarly, it MUST be
prepared to receive STUN Binding Requests on a local transport
address the moment it sends an offer or answer that contains a
derived candidate derived from that local transport address. It can
cease listening for STUN messages on that local transport address
after sending an updated offer or answer which does not include any
candidates with transport addresses that are equal to or derived from
that local transport address.
As discussed in [12], since the username and password for STUN
requests are exchanged through another mechanism - here, ICE - the
Shared Secret Request mechanism is not needed and need not be
implemented by agents that provide the connectivity check usage.
One of the candidates may be in use as the operating candidate, or
may become promoted to the operating candidate in the next offer/
answer exchange as a consequence of a successful validation. In
either case, both media and STUN packets will be sent to the
transport addresses comprising that candidate, causing both to
receive on their associated local transport addresses. The agent
MUST be able to disambiguate them. This is done trivially by looking
for the STUN magic cookie as the value of the second 32-bit word in
the packet. If present, it identifies a STUN packet.
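A minimal sketch of the disambiguation step follows. The cookie value 0x2112A442 is taken from the STUN specification rather than from this text, and the function name is an assumption of the example.

```python
import struct

# Magic cookie value as defined by the STUN specification (not stated here).
STUN_MAGIC_COOKIE = 0x2112A442


def is_stun(packet: bytes) -> bool:
    """Return True if the second 32-bit word of the packet is the magic cookie."""
    if len(packet) < 8:
        return False
    # Read one big-endian 32-bit word starting at byte offset 4.
    (second_word,) = struct.unpack_from("!I", packet, 4)
    return second_word == STUN_MAGIC_COOKIE
```

Anything that fails this test would be handed to the media path.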
Processing of the Binding Request proceeds in two steps. The first
is generation of the response, and the second is ICE-specific
processing. Generation of the response follows the general
procedures of [12], and is independent of the state machinery
described in Section 7.6. The USERNAME is considered valid if one of
the candidate IDs sent in an offer or answer is a prefix of the
USERNAME (this will always be the case, even for peer reflexive
candidates), and for the component indicated in the USERNAME, the
associated local transport address matches the local transport
address on which the request was received. The password associated
with that candidate ID, which was provided by the agent to its peer,
is used to verify the MESSAGE-INTEGRITY attribute, if one was present
in the request. If the USERNAME is not valid, the agent generates a
430. Otherwise, the success response will include the XOR-MAPPED-
ADDRESS attribute, which is used for learning new candidates, as
described in Section 7.10. The XOR-MAPPED-ADDRESS attribute is
constructed using the source IP address and port of the Binding
Request. For Binding Requests received over relayed transport
addresses, this MUST be the source IP address and port of the Binding
Request when it arrived at the relay, prior to forwarding towards the
agent. That source transport address will be present in the REMOTE-
ADDRESS attribute of a STUN Data Indication message, if the Binding
Request was delivered through a Data Indication. If the Binding
Request was not encapsulated in a Data Indication, that source
address is equal to the current active destination for the STUN relay
session.
The ICE processing involves changes to the state machine for a
transport address pair. This processing cannot be done until the
initial offer/answer exchange has completed. As a consequence, if
the offerer received a Binding Request that generated a success
response, but had not yet received the answer to its offer, it waits
for the answer, and when it arrives, then performs the ICE
processing.
The agent takes the entire contents of the USERNAME, and compares
them against the transport address pair identifiers as seen by that
agent for each transport address pair. If there is no match, nothing
is done - this should never happen for compliant implementations. If
there is a match, the resulting transport address pair is called the
matching transport address pair. The state machine for the matching
transport address pair is then updated based on the receipt of a STUN
Binding Request, and the resulting actions described in Section 7.6
are undertaken.
An agent will continue to receive periodic STUN connectivity checks
on a local transport address as long as it had listed that transport
address, or one derived from it, in an a=candidate attribute in its
most recent offer or answer and the transport address is for UDP.
Whether STUN keepalives are used for other transport protocols is
defined by the specifications for that transport protocol. The agent
processes any such transactions according to this section. It is
possible that a transport address pair that was previously valid may
become invalidated as a result of a subsequent failed STUN
transaction.
7.9. Promoting a Candidate to Operating
As a consequence of the connectivity checks, each agent will change
the states for each transport address pair, and consequently, for the
candidate pairs. When a candidate pair enters the valid state, and
the agent is in the role of offerer for that candidate pair, the
agent follows the logic in this section. The rules only apply to the
offerer of a candidate pair in order to eliminate the possibility of
both agents simultaneously offering an update to promote a candidate
to operating.
The agent locates the candidate pair in the candidate pair priority
ordered list. If it is the highest priority candidate pair, the
agent SHOULD send an updated offer immediately as described in
Section 7.11.1. If it is not the highest priority candidate pair,
and the states of all lower priority candidate pairs are Invalid, the
agent SHOULD send an updated offer immediately. If it is not the
highest priority candidate pair, and the state of at least one of the
lower priority candidate pairs is Indeterminate, the agent does
nothing. Tests have yet to begin for higher priority candidate
pairs. If it is not the highest priority candidate pair, and none of
the lower priority candidate pairs have a state of Indeterminate, the
agent starts a timer, called the wait-state timer, but only if this
timer is not already running. The timer is set to fire in Tws
seconds. Tws SHOULD be configurable, and SHOULD have a default of
Tws = max(0, 200ms - N*Tb), where N is the number of components for
the candidates for this media stream. The 200ms allows for a single
STUN retransmission (which takes 100ms) and an RTT of 100ms. This
timer allows for a higher priority connectivity check to complete, in
the event its STUN Binding Request was lost or delayed in the
network. Note that the timer goes to zero as the number of
components increases. If, prior to the wait-state timer firing,
another connectivity check completes and a candidate pair is
validated, there is no need to reset or cancel the timer. Once the
timer fires, the agent SHOULD issue an updated offer as described in
Section 7.11.1. This updated offer will use the highest priority
candidate pair in Valid state when the timer fires.
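The default wait-state timer value, Tws = max(0, 200ms - N*Tb), can be worked through with a short sketch. The function name and the choice of milliseconds as the unit are assumptions for illustration.

```python
def wait_state_timer_ms(num_components, tb_ms):
    """Default Tws in milliseconds: max(0, 200ms - N*Tb)."""
    return max(0, 200 - num_components * tb_ms)
```

For example, with Tb at 50ms and two components the timer is 100ms, and it reaches zero once N*Tb is 200ms or more.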
7.10. Learning New Candidates from Connectivity Checks
ICE makes use of reflexive addresses, which are addresses that inform
an agent of its transport address as seen by another host. An
initial offer or answer generated by an agent includes server
reflexive addresses, which are learned from a configured or
discovered STUN server in the network. However, the connectivity
checks themselves can inform an agent of reflexive addresses, and in
particular, ones that are reflexive towards its peer. These are
called peer reflexive candidates. A new peer reflexive candidate is
typically observed when two agents are separated by a NAT with the
address-dependent or address and port dependent mapping properties
[32]. However, in unusual topologies, peer reflexive candidates can
be observed even when there are only NATs with the endpoint
independent mapping property. Because STUN and the media packets are
sent on the same port, regardless of the filtering properties of the
NAT (whether endpoint independent, address dependent, or address and
port dependent), this reflexive address can be used by the peer for
sending STUN and media packets back towards the agent.
To obtain and use these peer reflexive transport addresses, ICE
agents MUST perform the additional processing on the receipt of STUN
Binding Requests and responses described in the following two
subsections. These procedures are not just applied in the (hopefully
increasingly rare) case of address and port dependent mapping NATs.
They are also needed for behave-compliant NATs [32].
7.10.1. On Receipt of a Binding Request
The procedures in this section are followed when an agent receives a
STUN Binding Request matched to a target transport address pair whose
source transport address (where the source is the one seen by the
relay for requests received on a relayed transport address) doesn't
match any of the existing remote transport addresses, or where the
source matches, but the origination transport address does not. This
source address and its associated origination transport address
become a new remote transport address.
To use it, that source transport address needs to be associated with
a candidate (called a peer-derived candidate). In this case,
however, the candidate isn't signaled through an offer/answer
exchange; it is constructed dynamically from information in the STUN
request. Like all other candidates, the peer-derived candidate has a
candidate ID. The candidate ID is derived from the candidate IDs of
the target candidate pair. In particular, the candidate ID is
constructed by concatenating the remote candidate ID with the native
candidate ID (without the colon). The password for the new candidate
equals that of the remote candidate ID in the target candidate pair
(note that this password would be the same for all remote candidates
for the same media line).
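The construction of the candidate ID on receipt of a Binding Request can be sketched as follows. This is only an illustration; the function name is hypothetical and not part of the specification.

```python
def peer_derived_candidate_id(remote_id: str, native_id: str) -> str:
    """Candidate ID of a peer-derived candidate learned from a Binding Request.

    Hypothetical helper: the remote candidate ID is concatenated with the
    native candidate ID of the target candidate pair, with no colon between.
    """
    return remote_id + native_id


# Shortened IDs are used here only for readability.
print(peer_derived_candidate_id("R", "L"))
```

With a remote candidate ID of "R" and a native candidate ID of "L", the result is "RL".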
When the STUN Binding Request is received, the agent constructs the
candidate ID for the peer reflexive candidate, and checks to see if
that candidate exists. It may already exist if it had been
constructed as a consequence of a previous application of this logic
on receipt of a Binding Request from a different remote transport
address of the same new peer reflexive candidate. If there is not
yet a peer reflexive candidate with that candidate ID, the agent
creates it, and assigns it the newly computed candidate ID. The
priority of the peer-derived candidate is set to the priority of its
generating candidate. The generating candidate is the one that the
new peer derived candidate comes from - the remote candidate in the
target candidate pair. Note that, at this time, the peer derived
candidate has no transport addresses in it.
The remote candidate is then paired up with a native candidate. 7.1. Applicability
However, unlike the procedures of Section 7.5, which pair up each
remote candidate with each native candidate, this peer reflexive
candidate is only paired up with the native candidate from the
candidate pair from which it was derived. This creates a new
candidate pair. This new candidate pair is inserted into the
candidate pair priority ordered list based on the ordering rules
defined in Section 7.5. Note that no entries are added to the
transport address pair check ordered list.
Recall that, for each candidate pair, one agent plays the role of This STUN usage provides a connectivity check between two peers
offerer, and the other of answerer. For a peer-reflexive candidate, participating in an offer/answer exchange. This check serves to
the role is identical to that of its generating candidate. validate a pair of candidates for usage of exchange of media.
Connectivity checks also allow agents to discover reflexive
candidates towards their peers, called peer reflexive candidates.
Finally, connectivity checks serve to keep NAT bindings alive.
Newly created or not, the agent extracts the component ID from the It is fundamental to this STUN usage that the addresses and ports
matching transport address pair, and sees if a transport address with used for media are the same ones used for the Binding Requests and
that same component ID exists in the peer reflexive candidate. If it responses. Consequently, it will be necessary to demultiplex STUN
does, the agent does nothing further. This can happen in unusual traffic from whatever the media traffic is. This demultiplexing is
cases when there is a NAT reboot in the middle of a STUN transaction, done using the techniques described in [11].
causing two requests in the same transaction to produce two
different transport addresses. If there is no transport address with
the same component ID in the peer reflexive candidate, the agent adds
a transport address to the peer reflexive candidate. This transport
address is equal to the source IP address and port from the incoming
STUN Binding Request (and in the case of Binding Request received on
a relayed transport address, the one seen by the relay), and has a
transport protocol equal to that of the incoming STUN request. It is
assigned the component ID equal to the component ID in the target
transport address pair. This new transport address will have a
transport address ID, equal to the concatenation of the candidate ID
for this new candidate, and the component ID, separated by a colon.
The type of the transport address is considered to be peer reflexive,
though this is never signaled through SDP and so there is no
candidate-types value defined for it. Recall that each transport
address is associated with an origination transport address. For
server reflexive candidates, the origination transport address is
signaled through SDP. For peer reflexive transport addresses, it is
inherited from the origination transport address of the generating
transport address. If the generating transport address was a local
transport address, then the origination transport address is that
transport address. If the generating transport address was server
reflexive, the origination transport address is the related transport
address that was signaled for that server reflexive candidate. If
the generating transport address was relayed, the origination
transport address is the relayed transport address itself. Whether
and how other candidate attributes defined by extensions are
inherited depends on the extension.
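The transport address ID and the inheritance of the origination transport address can be sketched as follows. This is a minimal illustration; the type names, field names and the string labels for the address kinds are assumptions made for the example and are not defined by the specification.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Addr = Tuple[str, int]  # (IP address, port); an assumed representation


@dataclass
class TransportAddress:
    kind: str                      # "local", "server-reflexive" or "relayed"
    addr: Addr
    related: Optional[Addr] = None  # related address signaled for server reflexive


def transport_address_id(candidate_id: str, component_id: int) -> str:
    """Candidate ID and component ID, separated by a colon."""
    return f"{candidate_id}:{component_id}"


def inherited_origination(generating: TransportAddress) -> Addr:
    """Origination address of a peer reflexive address, taken from its
    generating transport address."""
    if generating.kind in ("local", "relayed"):
        # Local: the address itself. Relayed: the relayed address itself.
        return generating.addr
    if generating.kind == "server-reflexive":
        if generating.related is None:
            raise ValueError("server reflexive address without related address")
        return generating.related
    raise ValueError(f"unknown kind: {generating.kind}")
```

For a peer reflexive candidate with ID "RL", the transport address for component 1 has the ID "RL:1".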
The newly added transport address is paired up with the native 7.2. Client Discovery of Server
transport address with the same component ID. Initially, the peer
reflexive candidate will start with a single transport address and a
transport address pair. More are added as the connectivity checks
for the original candidate pair take place.
Figure 10 provides a pictorial representation of the peer reflexive The client does not follow the DNS-based procedures defined in [11].
candidate (the one with id=RL) and its pairing with the native Rather, the remote candidate of the check to be performed is used as
candidate with ID L. The candidate with ID R is the generating the IP address and port of the STUN server. Note that the STUN
candidate. The peer reflexive candidate is effectively an alternate server is a logical entity, and is not a physically distinct server
for that generating candidate, but is only paired with a specific in this usage.
native candidate. Note that, for a particular generating candidate,
there can be many peer derived candidates, up to one for each native
candidate. Also note that candidate IDs with values "L" and "R" and
"RL" are not actually permitted, since all candidate IDs must be at
least four characters long. These shortened candidate IDs are used
to keep the figure readable.
............. ............. 7.3. Server Determination of Usage
. tid=L:1 . . tid=R:1 .
component. -- . id=L:1:R:1 . -- .component
id=1 . | A|-------------------------| C| . id=1
. -- -------+ . -- .
. . | . . Generating
. . | . . Candidate
. tid=L:2 . | . tid=R:2 .
component. -- . | id=L:2:R:2 . -- .component
id=2 . | B|-------C-----------------| D| . id=2
. -- -----+ | . -- .
.............| | .............
Native | | Remote
Candidate | | Candidate
id=L | | id=R
| |
| | .............
| | . tid=RL:1 .
| | id=L:1:RL:1 . -- .component
| +-----------------| C| . id=1
| . -- .
| . . Peer Derived
| . . Candidate
| . tid=RL:2 .
| id=L:2:RL:2 . -- .component
+-------------------| D| . id=2
. -- .
.............
Remote
Candidate
id=RL
Figure 10 The server is aware of this usage because it signaled this port
through the offer/answer exchange. Any STUN packets received on this
port will be for the connectivity check usage.
The new transport address pair has a state machine associated with 7.4. New Requests or Indications
it. The state that is entered, and actions to take as a consequence,
are specific to the transport protocol. For UDP, the procedures are
defined here. Extensions that define processing for other transport
protocols SHOULD describe the behavior.
For UDP, the state machine enters the Send-Valid state. Effectively, This usage does not define any new message types.
the Binding Request just received "counts" as a validation in this
direction, even though it was formally done for a different transport
address pair. In addition, the agent generates a Binding Request for
the new transport address pair, as described in Section 7.7.
Processing of the response follows the logic described in
Section 7.6.
As with all candidate pairs, the state of this new candidate pair is 7.5. New Attributes
derived from the states of its transport address pairs. Until the
number of transport address pairs in the candidate pair equals the
transport address pair count of the candidate pair from which it is
derived, the state of the candidate pair is Indeterminate. Once they
are equal, the state is derived just like any other candidate pair.
7.10.2. On Receipt of a Binding Response This usage defines a new attribute, PRIORITY. This attribute
indicates the priority that is to be associated with a peer reflexive
candidate, should one be discovered by this check. It is a 32 bit
unsigned integer, and has an attribute type of 0x0024.
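As an illustration, the PRIORITY attribute could be encoded as below. The sketch assumes the generic STUN attribute layout (a 16 bit type, a 16 bit length, then the value, all in network byte order); that layout comes from the base STUN specification, not from this section, and the function names are hypothetical.

```python
import struct

PRIORITY_ATTR_TYPE = 0x0024  # attribute type given in this section


def encode_priority(priority: int) -> bytes:
    """Encode PRIORITY as type, length and a 32 bit unsigned value."""
    if not 0 <= priority <= 0xFFFFFFFF:
        raise ValueError("priority must fit in 32 bits")
    return struct.pack("!HHI", PRIORITY_ATTR_TYPE, 4, priority)


def decode_priority(attr: bytes) -> int:
    """Decode a PRIORITY attribute produced by encode_priority."""
    attr_type, length, value = struct.unpack("!HHI", attr[:8])
    if attr_type != PRIORITY_ATTR_TYPE or length != 4:
        raise ValueError("not a PRIORITY attribute")
    return value
```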
The procedures on receipt of a Binding Response are nearly identical 7.6. New Error Response Codes
to those for receipt of a Binding Request as described above.
The procedures in this section are followed when an agent receives a This usage does not define any new error response codes.
STUN Binding Response matched to a transport address pair whose XOR-
MAPPED-ADDRESS doesn't match any of the existing native transport
addresses. The XOR-MAPPED-ADDRESS becomes a new native transport
address.
To use it, the XOR-MAPPED-ADDRESS needs to be associated with a 7.7. Client Procedures
candidate (called a peer-derived candidate). In this case, however,
the candidate isn't signaled through an offer/answer exchange; it is
constructed dynamically from information in the STUN response. Like
all other candidates, the peer-derived candidate has a candidate ID.
The candidate ID is derived from the candidate IDs of the target
candidate pair. In particular, the candidate ID is constructed by
concatenating the native candidate ID with the remote candidate ID
(without the colon). The password for the new candidate equals that
of the native candidate ID in the matching candidate pair (note that
this password would be the same for all native candidates for the
same media line).
When the Binding Response is received, the agent constructs the This section defines additional procedures for the Binding Request
candidate ID that represents the peer reflexive candidate, and checks transaction, beyond those described in [11].
to see if that candidate exists. It may already exist if it had been
constructed as a consequence of a previous application of this logic
on receipt of a Binding Response for a different transport address
pair of the same candidate pair. If there is not yet a peer
reflexive candidate with that candidate ID, the agent creates it, and
assigns it the newly computed candidate ID. The priority of the
peer-derived candidate is set to the priority of its generating
candidate - the native candidate in the target transport address
pair. Note that, at this time, the peer derived candidate has no
transport addresses in it. The native candidate is then paired up
with a remote candidate. However, unlike the procedures of
Section 7.5, which pair up each native candidate with each remote
candidate, this peer reflexive candidate is only paired up with the
remote candidate from the target candidate pair. This creates a new
candidate pair. This new candidate pair is inserted into the
candidate pair priority ordered list based on the ordering rules
defined in Section 7.5. Note that no entries are added to the
transport address pair check ordered list.
Recall that, for each candidate pair, one agent plays the role of 7.7.1. Sending the Request
offerer, and the other of answerer. For a peer-reflexive candidate,
the role is identical to that of its generating candidate.
Newly created or not, the agent extracts the component ID from the The agent acting as the client generates a connectivity check either
target transport address pair, and sees if a transport address with periodically, or triggered. In either case, the check is generated
that same component ID exists in the peer reflexive candidate. If it by sending a Binding Request from a local candidate, to a remote
does, the agent does nothing further. This can happen in unusual candidate. The agent must know the username fragment for both
cases when there is a NAT reboot in the middle of a STUN transaction, candidates and the password for the remote candidate.
causing two requests in the same transaction to produce two
different transport addresses. If there is no transport address with
the same component ID in the peer reflexive candidate, the agent adds
a transport address to the peer reflexive candidate. This transport
address is equal to the XOR-MAPPED-ADDRESS from the incoming STUN
Binding Response, and has a transport protocol equal to the one used
for the Binding Response. It is assigned the component ID equal to
the component ID in the matching transport address pair. This
transport address will have a transport address ID, equal to the
concatenation of the candidate ID for this new candidate, and the
component ID, separated by a colon. The type of the transport
address is considered to be peer reflexive, though this is never
signaled through SDP and so there is no candidate-types value defined
for it. Recall that each transport address is associated with an
origination transport address. For server reflexive candidates, the
origination transport address is signaled through SDP. For peer
reflexive transport addresses, it is inherited from the origination
transport address of the generating transport address. If the
generating transport address was a local transport address, then the
origination transport address is that transport address. If the
generating transport address was server reflexive, the origination
transport address is the related transport address that was signaled
for that server reflexive candidate. If the generating transport
address was relayed, the origination transport address is the relayed
transport address itself. Whether and how other candidate attributes
defined by extensions are inherited depends on the extension.
The newly added transport address is paired up with the remote A Binding Request serving as a connectivity check MUST utilize a STUN
transport address with the same component ID. Initially, the peer short term credential. Rather than being learned from a Shared
reflexive candidate will start with a single transport address and a Secret request, the short term credential is exchanged in the offer/
transport address pair. More are added as the connectivity checks answer procedures. In particular, the username is formed by
for the original candidate pair take place. concatenating the username fragment provided by the peer with the
username fragment of the agent sending the request, separated by a
colon (":"). The password is equal to the password provided by the
peer. For example, consider the case where agent A is the offerer,
and agent B is the answerer. Agent A included a username fragment of
AFRAG for its candidates, and a password of APASS. Agent B provided
a username fragment of BFRAG and a password of BPASS. A connectivity
check from A to B (and its response, of course) utilizes the username
BFRAG:AFRAG and a password of BPASS. A connectivity check from B to
A (and its response) utilizes the username AFRAG:BFRAG and a password
of APASS.
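The credential rule can be sketched as follows. The function name and argument names are hypothetical; only the concatenation order and the choice of password come from the text above.

```python
from typing import Tuple


def connectivity_check_credentials(own_frag: str, peer_frag: str,
                                   peer_password: str) -> Tuple[str, str]:
    """Return (username, password) for a check sent by the local agent.

    The username is the peer's fragment, a colon, then the sender's own
    fragment. The password is the one provided by the peer.
    """
    return f"{peer_frag}:{own_frag}", peer_password


# Check from A to B, using the values from the example.
print(connectivity_check_credentials("AFRAG", "BFRAG", "BPASS"))
# Check from B to A.
print(connectivity_check_credentials("BFRAG", "AFRAG", "APASS"))
```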
The new transport address pair has a state machine associated with All Binding Requests for the connectivity check usage MUST contain
it. The state that is entered, and actions to take as a consequence, the PRIORITY attribute. This MUST be set equal to the priority that
are specific to the transport protocol. For UDP, the procedures are would be assigned, based on the algorithm in Section 4.2, to a peer
defined here. Extensions that define processing for other transport reflexive candidate learned from this check. Such a peer reflexive
protocols SHOULD describe the behavior. candidate has a stream ID, component ID and local preference that are
equal to the host candidate from which the check is being sent, but a
type preference equal to the value associated with peer reflexive
candidates.
For UDP, the state machine enters the Recv-Valid state. Effectively, The Binding Request by an agent MUST include the USERNAME and
the Binding Response just received "counts" as a validation in this MESSAGE-INTEGRITY attributes. That is, an agent MUST NOT wait to be
direction, even though it was formally done for a different candidate challenged for short term credentials. Rather, it MUST provide them
pair. The peer will likely generate a Binding Request for this in the Binding Request right away.
candidate pair; processing of the request follows the logic described
in Section 7.6.
As with all candidate pairs, the state of this new candidate pair is 7.7.2. Processing the Response
derived from the states of its transport address pairs. Until the
number of transport address pairs in the candidate pair equals the
transport address pair count of the candidate pair from which it is
derived, the state of the candidate pair is Indeterminate. Once they
are equal, the state is derived just like any other candidate pair.
7.11. Subsequent Offer/Answer Exchanges If the STUN transaction generates an unrecoverable failure response
or times out, the agent sets the state of the check to Failed. The
remainder of this section applies to processing of successful
responses (any response from 200 to 299).
An agent MAY issue an updated offer at any time. This updated offer The agent MUST check that the source IP address and port of the
may be sent for reasons having nothing to do with ICE processing (for response equals the destination IP address and port that the Binding
example, the addition of a video stream in a multimedia session), or Request was sent to, and that the source IP address and port of the
it may be due to a change in ICE-related parameters. For example, if request match the destination IP address and port that the Binding
an agent acquires a new candidate after the initial offer/answer Response was received on. If these do not match, the agent sets the
exchange, it may seek to add it. state of the check to Failed. The processing described in the
remainder of this section MUST NOT be performed.
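The symmetry test on the response can be sketched as below. The function and parameter names are assumptions for illustration; addresses are represented as (IP address, port) tuples.

```python
from typing import Tuple

Addr = Tuple[str, int]


def response_is_symmetric(req_src: Addr, req_dst: Addr,
                          resp_src: Addr, resp_dst: Addr) -> bool:
    """True if the response came from where the request was sent, and
    arrived on the address the request was sent from."""
    return resp_src == req_dst and resp_dst == req_src
```

If this returns False, the check would be set to Failed and no further processing of the response would take place.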
However, agents SHOULD follow the logic described in Section 7.9 to Otherwise, the source transport address of the response matched the
determine when to send an updated offer as a consequence of promoting destination transport address of the request. The agent changes the
a candidate to operating. state for this check to Succeeded. Next, the agent sees if the
success of this check can cause other checks to be unfrozen. If the
check had a component ID of one, the agent MUST change the states for
all other Frozen checks for the same media stream and same
foundation, but different component IDs, to Waiting. If the
component ID for the check was equal to the number of components for
the media stream, the agent MUST change the state for all other
Frozen checks for the first component of different media streams but
the same foundation, to Waiting.
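The unfreezing rules can be sketched as follows. This is an illustration under assumed data structures (a flat list of checks and a mapping from media stream to its number of components); none of these names appear in the specification.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Check:
    stream: str
    component: int
    foundation: str
    state: str = "Frozen"


def unfreeze_after_success(done: Check, checks: List[Check],
                           components: Dict[str, int]) -> None:
    """Move Frozen checks to Waiting after `done` has Succeeded."""
    for chk in checks:
        if chk is done or chk.state != "Frozen":
            continue
        if chk.foundation != done.foundation:
            continue
        # Component 1 succeeded: unfreeze the other components of the
        # same media stream with the same foundation.
        same_stream = (done.component == 1
                       and chk.stream == done.stream
                       and chk.component != done.component)
        # Last component succeeded: unfreeze the first component of the
        # other media streams with the same foundation.
        next_stream = (done.component == components[done.stream]
                       and chk.stream != done.stream
                       and chk.component == 1)
        if same_stream or next_stream:
            chk.state = "Waiting"
```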
If there are any aspects of this processing that are specific to the Next, the agent checks the mapped address from the STUN response. If
transport protocol, those SHOULD be called out in ICE extensions that the transport address does not match any of the local candidates that
define operation with other transport protocols. There are no the agent knows about, the mapped address represents a new peer
additional considerations for UDP. reflexive candidate. Its type is equal to peer reflexive. Its base
is set equal to the candidate from which the STUN check was sent.
Its username fragment and password are identical to the candidate
from which the check was sent. It is assigned the priority value
that was placed in the PRIORITY attribute of the request. Its
foundation is selected as described in Section 4.1. The peer
reflexive candidate is then added to the list of local candidates
known by the agent (though it is not paired with other remote
candidates at this time).
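Learning a local peer reflexive candidate from the mapped address can be sketched as below. The class and field names are assumptions; the base is stored as the sending candidate's address for simplicity, and foundation selection (Section 4.1) is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Addr = Tuple[str, int]


@dataclass
class LocalCandidate:
    addr: Addr
    type: str
    base: Addr
    ufrag: str
    password: str
    priority: int


def learn_local_peer_reflexive(mapped: Addr, sent_from: LocalCandidate,
                               request_priority: int,
                               local: List[LocalCandidate]
                               ) -> Optional[LocalCandidate]:
    """Add a peer reflexive candidate if `mapped` is not already known."""
    if any(c.addr == mapped for c in local):
        return None
    cand = LocalCandidate(
        addr=mapped,
        type="peer-reflexive",
        base=sent_from.addr,           # candidate the check was sent from
        ufrag=sent_from.ufrag,         # identical to the sending candidate
        password=sent_from.password,
        priority=request_priority,     # value sent in the PRIORITY attribute
    )
    local.append(cand)                 # not paired with remote candidates here
    return cand
```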
7.11.1. Sending of a Subsequent Offer In addition, the agent creates a candidate pair whose local candidate
equals the mapped address of the response, and whose remote candidate
equals the destination address to which the request was sent. This
is called a validated pair, since it has been validated by a STUN
connectivity check. The agent will know, either from the SDP or
through the PRIORITY attribute that was present in a STUN request,
the priorities of the local and remote candidates of the validated
pair. Based on these priorities, a priority for the validated pair
itself is computed if it was not already known, using the algorithm
in Section 5.6, and the pair is added to the valid list.
The offer MAY contain a new operating candidate in the m/c line. 7.8. Server Procedures
This candidate SHOULD be the native candidate from the highest
priority candidate pair in the candidate pair priority ordered list
whose state is Valid. If there are no candidate pairs in this state,
the highest one whose state is Send-Valid or Recv-Valid SHOULD be
used. If there are no candidate pairs in these states, the candidate
pair that is most likely to work with this peer, as described in
Section 7.2, SHOULD be used. The candidate is encoded into the m/c
line in an updated offer as described in Section 7.3. Note that,
while peer-derived candidates never appear in a=candidate attributes
(only their generating candidates appear there), a peer-derived
candidate can appear in the m/c line if it has been selected for
usage for media.
If the candidate pair whose native candidate was encoded into the An agent MUST be prepared to receive a Binding Request on the base of
m/c-line was Valid, Send-Valid or Recv-Valid, the agent MUST include each candidate it included in its most recent offer or answer.
an a=remote-candidate attribute into the offer. This attribute MUST Receipt of a Binding Request on an IP address and port that the agent
contain the candidate ID of the remote candidate in the candidate had included in a candidate attribute is an indication that the
pair. It is used by the recipient of the offer in selecting its connectivity check usage applies to the request.
candidate for the answer. Because the native candidate in the m/c-
line will typically be Valid, Send-Valid or Recv-Valid in every offer
after the initial one, the a=remote-candidate attribute will
typically be used in all subsequent offers.
The a=candidate attributes within a subsequent offer have The agent MUST use a short term credential to authenticate the
the same meaning as they do in an initial offer. They are a request request and perform a message integrity check. The agent MUST accept
for the peer to attempt (or continue to attempt if the candidate was a credential if the username consists of two values separated by a
provided previously) a connectivity check using STUN from each of its colon, where the first value is equal to the username fragment
own candidates. When an updated offer is sent, there are several generated by the agent in an offer or answer for a session in-
dispositions regarding the candidates: progress, and the password is equal to the password for that username
fragment. It is possible (and in fact very likely) that an offeror
will receive a Binding Request prior to receiving the answer from its
peer. However, the request can be processed without receiving this
answer, and a response generated.
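The username test on the server side can be sketched as follows. The mapping from the agent's own username fragments to passwords is an assumed data structure, and the function name is hypothetical. Verification of MESSAGE-INTEGRITY with the returned password is not shown.

```python
from typing import Dict, Optional


def password_for_request(username: str,
                         own_fragments: Dict[str, str]) -> Optional[str]:
    """Return the password to verify the request with, or None to reject.

    The username must be two values separated by a colon, and the first
    value must be a fragment this agent generated for a session in
    progress. The peer's answer is not needed for this step.
    """
    first, sep, second = username.partition(":")
    if not sep or not first or not second:
        return None
    return own_fragments.get(first)
```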
retained: A candidate is retained if the candidate ID for the For requests being received on a relayed candidate, the source IP
candidate is included in the new offer, and matches the candidate address and port used for STUN processing (namely, generation of the
ID for a candidate in the previous offer or answer from the agent. XOR-MAPPED-ADDRESS attribute) is the IP address and port as seen by
In this case, all of the information about the candidate - its the relay. That source transport address will be present in the
qvalue and components, and the IP addresses, ports, and transport REMOTE-ADDRESS attribute of a STUN Data Indication message, if the
protocols of its components, MUST be the same as the previous Binding Request was delivered through a Data Indication. If the
offer or answer from the agent. If the agent wants to change Binding Request was not encapsulated in a Data Indication, that
them, this is accomplished by changing the candidate ID as well. source address is equal to the current active destination for the
That will have the effect of removing the old candidate and adding STUN relay session.
a new one with the updated information.
removed: A candidate is removed if its candidate ID appeared in a When the agent receives a STUN Binding Request for which it generates
previous offer or answer, and that candidate ID is not present in a successful response, the agent checks the source transport address
the new offer. of the request. If this transport address does not match any
existing remote candidates, it represents a new peer reflexive remote
candidate. This candidate is given a priority equal to the PRIORITY
attribute from the request. The type of the candidate is equal to
peer reflexive. Its foundation is set to an arbitrary value,
different from the foundation for all other remote candidates. The
username fragment for this candidate is equal to the bottom half (the
part after the colon) of the username in the Binding Request that was
just received. The password for this username fragment is taken from
the SDP from the peer. If the agent has not yet received this SDP (a
likely case for the offerer in the initial offer/answer exchange), it
MUST wait for the SDP to be received, and then proceed with the
processing described in the remainder of this section. This
candidate is then added to the list of remote candidates. However,
it is not paired with any local candidates.
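Creation of a remote peer reflexive candidate can be sketched as below. All names are assumptions, and the "prflx-N" foundation format is only one arbitrary way of picking a value distinct from the existing ones. Returning None when the peer's password is unknown stands in for waiting until the SDP arrives.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Addr = Tuple[str, int]


@dataclass
class RemoteCandidate:
    addr: Addr
    type: str
    priority: int
    foundation: str
    ufrag: str
    password: str


def learn_remote_peer_reflexive(src: Addr, username: str, priority: int,
                                peer_password: Optional[str],
                                remote: List[RemoteCandidate]
                                ) -> Optional[RemoteCandidate]:
    """Add a remote peer reflexive candidate for an unknown source."""
    if any(c.addr == src for c in remote):
        return None
    if peer_password is None:
        return None  # SDP from the peer not received yet; caller must wait
    _, sep, bottom = username.partition(":")
    if not sep or not bottom:
        raise ValueError("username must contain two colon-separated values")
    used = {c.foundation for c in remote}
    n = 0
    while f"prflx-{n}" in used:  # arbitrary value, distinct from all others
        n += 1
    cand = RemoteCandidate(src, "peer-reflexive", priority,
                           f"prflx-{n}", bottom, peer_password)
    remote.append(cand)  # not paired with any local candidates here
    return cand
```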
added: A candidate is added if its candidate ID appeared in the new Next, the agent MUST generate a triggered check in the reverse
offer, but was not present in a previous offer or answer from that direction if it has not already sent such a check. The triggered
agent. check has a local candidate equal to the candidate on which the STUN
request was received, and a remote candidate equal to the source
transport address where the request came from (which may be a newly
formed peer reflexive candidate). The agent knows the priorities for
the local and remote candidates of this check, and so can compute the
priority for the check itself. If there is already a check on the
check list with these same local and remote candidates, and the state
of that check is Waiting or Frozen, its state is changed to In-
Progress and the check is performed. If there was already a check on
the check list with these same local and remote candidates, and its
state was In-Progress, the agent SHOULD generate an immediate
retransmit of the Binding Request. This is to facilitate rapid
completion of ICE when both agents are behind NAT. If there was a
check in the list already and its state was Succeeded or Failed,
nothing further is done. If there was no matching check on the check
list, it is inserted into the check list based on its priority, its
state is set to In-Progress, and the check is performed.
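The triggered check decision can be sketched as follows. The check list is assumed to be a Python list kept in descending priority order, and the returned strings ("perform", "retransmit", "none") are labels invented for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Check:
    local: str
    remote: str
    priority: int
    state: str


def triggered_check(check_list: List[Check], local: str, remote: str,
                    priority: int) -> str:
    """Apply the triggered check rules and return the action to take."""
    for chk in check_list:
        if chk.local == local and chk.remote == remote:
            if chk.state in ("Waiting", "Frozen"):
                chk.state = "In-Progress"
                return "perform"
            if chk.state == "In-Progress":
                return "retransmit"   # immediate retransmit of the request
            return "none"             # Succeeded or Failed
    # No matching check: insert by priority and perform it.
    new = Check(local, remote, priority, "In-Progress")
    index = 0
    while index < len(check_list) and check_list[index].priority >= priority:
        index += 1
    check_list.insert(index, new)
    return "perform"
```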
The following rules are used to determine the disposition of each 7.9. Security Considerations for Connectivity Check
of the current native candidates in the new offer:
o If a candidate is invalid, and all peer reflexive candidates Security considerations for the connectivity check are discussed in
generated from it are invalid as well, it SHOULD be removed. Section 15.
o If the candidate in the m/c-line is valid, all other lower 8. Completing the ICE Checks
priority candidates SHOULD be removed. This has the effect of
stopping connectivity checks of other candidates. An agent might
not follow this SHOULD if it wants to keep a candidate ready
for use in case the operating candidate later becomes
invalid.
o If the candidate in the m/c-line is valid, and it is not peer When a pair is added to the valid list, and the agent was the offeror
reflexive, that candidate MUST be retained. If the candidate in in the most recent offer/answer exchange, the agent MUST check to see
the m/c-line is peer reflexive, its generating candidate MUST be if there is a pair on the validated list for each component of each
retained, even if it is itself invalid. media stream. If there is, the offeror MUST stop timer Ta, and MUST
cease retransmitting any Binding Requests for transactions in
progress. It MUST ignore any responses which may subsequently arrive
to transactions previously in progress. The offeror MUST generate an
updated offer as described in Section 9. It does this regardless of
whether the highest priority pairs in the check list match the
current in-use candidate pairs.
o If the candidate in the m/c-line has not been validated, all other When a pair is added to the valid list, and the agent was the answerer
candidates that are not invalid, or candidates for whom their in the most recent offer/answer exchange, the agent MAY begin sending
derived candidates are not invalid, SHOULD be retained. media using that candidate pair, as described in Section 11.1. In
addition, if there is a candidate pair on the valid list for each
component of each media stream, the answerer MUST stop timer Ta, and
MUST cease retransmitting any Binding Requests for transactions in
progress. It MUST ignore any responses which may subsequently arrive
to transactions previously in progress.
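The test for whether the valid list covers every component of every media stream can be sketched as below. Representing each valid pair by a (media stream, component ID) tuple is an assumption made to keep the example short.

```python
from typing import Dict, Iterable, Tuple


def valid_list_is_complete(valid_pairs: Iterable[Tuple[str, int]],
                           components: Dict[str, int]) -> bool:
    """True if there is a valid pair for each component of each stream.

    `components` maps each media stream to its number of components;
    component IDs are assumed to run from 1 up to that number.
    """
    covered = set(valid_pairs)
    return all((stream, comp) in covered
               for stream, count in components.items()
               for comp in range(1, count + 1))
```

When this becomes true, the agent would stop timer Ta and cease retransmitting Binding Requests for transactions in progress.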
o Peer reflexive candidates MUST NOT be added; they continue to be Note that only the agent that was the answerer in the most recent offer/
used as long as their generating candidate was retained. Peer answer exchange gets to send media right away. The offeror must wait
derived candidates are learned exclusively through the STUN for a subsequent offer/answer exchange if the valid candidates don't
connectivity checks. match those in the m/c-line.
A new candidate MAY be added. This can happen when the candidate is OPEN ISSUE: It is possible that higher priority checks may still
a new one, learned since the previous offer/answer exchange, and it succeed, if we allowed things to continue. This can happen for
has a higher priority than the currently operating candidate. It can several reasons. First, an in-progress check of higher priority
also occur when an agent wishes to restart checks for a transport had some packet loss and thus hasn't completed. Timer Tws was
address it had tried previously. Effectively, changing the candidate meant to handle this (I removed this timer from -10 to simplify).
ID value in an updated offer will "restart" connectivity checks for More interestingly, higher priority checks may have not been done
that candidate. because a triggered check of lower priority succeeded. This
happens in cases where the number of checks at each agent is
asymmetric. It is possible to fix both of these problems by
delaying the completion of the ICE procedures for a bit more time.
This adds complexity and latency. The basic algorithm would be
this. You take the lowest priority pair in the valid list. You
keep doing checks as long as there are higher priority checks on
the list in the Waiting state. If there are none, you wait a
brief time (say 50ms) and then consider ICE finished.
If a candidate is removed, the agent takes the following steps once 9. Subsequent Offer/Answer Exchanges
the offer is sent:
1. The agent eliminates any candidate pairs whose native candidate An agent MAY generate a subsequent offer at any time. However, the
equalled the candidate that was removed. Equality is based on rules in Section 7.7.2 will cause the offerer to generate an updated
comparison of candidate IDs. offer when the candidates in the valid list are not all in-use.
2. The agent eliminates any candidate pairs that had a native 9.1. Generating the Offer
candidate that is a peer reflexive candidate generated from the
candidate that was removed.
3. The candidate pairs that are eliminated are removed from the When an agent generates an updated offer, the set of candidate
candidate pair priority ordered list. Their corresponding attributes to include depend on the state of ICE processing. If ICE
transport address pairs are removed from the transport address is "done", which occurs when the valid list includes a candidate pair
pair check ordered list. As a consequence of this, if for each component of each media stream, the agent MUST include a
connectivity checks had not yet begun for the candidate pair, candidate attribute for each local candidate amongst the pairs in the
they won't. If a transport address pair had been pruned from the valid list (including peer reflexive candidates), and SHOULD NOT
transport address pair check ordered list because it was include any others. This will cause STUN keepalives to be sent for
redundant with one of the transport address pairs which was just the in-use candidates, and that's it.
removed, that transport address pair is added back to the list.
4. If connectivity checks were already in progress for transport If, however, the valid list does not yet include a candidate pair for
addresses in a candidate pair that was removed, the agent SHOULD each component of each media stream, the agent SHOULD include all
immediately terminate them. No further retransmissions take current candidates, including any peer reflexive candidates it has
place, and no further transactions from that candidate will be learned since the last offer or answer it sent. This MAY include
made. candidates it did not offer previously, but which it has gathered
since the last offer/answer exchange.
5. If the removed candidate was a relayed candidate, the agent If a candidate was sent in a previous offer/answer exchange, it
SHOULD de-allocate its transport addresses from the STUN relay if SHOULD have the same priority. For a peer reflexive candidate, the
it is not using those resources elsewhere. If a local candidate priority SHOULD be the same as determined by the processing in
was removed, and all of its derived candidates were also removed Section 7.7.2. The foundation SHOULD be the same. The username
(including any peer reflexive candidates), local operating system fragments and passwords for a media stream SHOULD remain the same as
resources for each of the transport addresses in the local the previous offer or answer.
candidate SHOULD be de-allocated, as long as it is not using
those resources elsewhere. The resources may be in use elsewhere
if they were included in an initial offer which generated
multiple answers (as can happen with SIP forking). In such a
case, a subsequent offer which removes the candidate will not
imply its removal with the other branches; each becomes a
separate offer/answer relationship.
Subsequent offers MUST contain a=ice-pwd attributes that specify the Population of the m/c-lines also depends on the state of ICE
password for the candidates for each media stream. If any of the processing. If, for a particular media stream, the valid list has
candidates for a particular m-line are the same as the previous candidate pairs for all of the components of that media stream, those
offer, the ICE password for that m-line MUST be the same. If all of pairs are used. In particular, the m/c-line would be constructed
the candidates for a particular m-line are different from the from the local candidate from each of those candidate pairs. In
previous offer, the ICE password for that m-line MAY be different. addition, the agent MUST include the a=remote-candidates attribute
Note that it is permissible to use a session-level attribute in one for that media stream, and include in it the remote candidates for
offer, but to provide the same password as a media-level attribute in each of the pairs that were used.
a subsequent offer. This is not a change in password, just a change
in its representation.
7.11.2. Receiving the Offer and Sending an Answer If, for a particular media stream, the valid list does not have pairs
for all of the components of the stream, the agent SHOULD populate
the m/c-line for that media stream based on the considerations in
Section 4.3.
To generate the answer, the answerer has to decide which transport The agent MUST use the same ice-pwd and ice-ufrag for a media stream
addresses to include in the m/c line, and which to include in as its previous offer or answer. Note that it is permissible to use
candidate attributes. a session-level attribute in one offer, but to provide the same
password as a media-level attribute in a subsequent offer. This is
not a change in password, just a change in its representation.
The first step in the process is to look for the a=remote-candidate 9.2. Receiving the Offer and Generating an Answer
attribute in the offer. The a=remote-candidate exists to eliminate a
race condition between the updated offer and the response to the STUN
Binding Request that moved a candidate into the Valid state. This
race condition is shown in Figure 11. On receipt of message 5, agent
A can move its transport address pair state machine into the Valid
state. It sends a STUN response to the request (message 6), but this
is lost. Agent A proceeds with an updated offer (message 7), which
is received at agent B. As far as agent B is concerned, the transport
address pair is still in the Send-Valid state. It will move into the
Valid state only on receipt of the STUN response in message 10.
Thus, upon receipt of the offer, agent B cannot determine which When the answerer generates its answer, it must decide what
candidate to include in its answer. To eliminate this condition, the candidates to include in the answer, and how to populate the m/c-
identity of the validated candidate is included in the offer itself. line.
Note, however, that the answerer will not send media until it has
received this STUN response.
Agent A Network Agent B For each media stream in the offer, the agent checks to see if the
|(1) Offer | | stream contained the remote-candidates attribute. If it did, it
|------------------------------------------>| means that the offerer believed that ICE processing has completed for
|(2) Answer | | that media stream. In this case, the remote-candidates attribute
|<------------------------------------------| contains the candidates that the answerer is supposed to use. It is
|(3) STUN Req. | | possible that the agent doesn't even know of these candidates yet;
|------------------------------------------>| they will be discovered shortly through a response to an in-progress
|(4) STUN Res. | | check. The agent MUST populate the m/c-line with the candidates from
|<------------------------------------------| the a=remote-candidates attribute. In addition, it MUST include an
|(5) STUN Req. | | a=candidate attribute in its answer for each candidate in the
|<------------------------------------------| a=remote-candidates attribute. If the agent is not aware of the
|(6) STUN Res. | | candidate yet, it will need to generate a priority value for it. The
|-------------------->| | type preference in the computation is peer-reflexive, and the stream
| |Lost | ID and component ID are known from the offer. The agent chooses an
|(7) Offer | | arbitrary local preference value if it is multi-homed, since it won't
|------------------------------------------>| yet know the interface associated with this candidate.
|(8) Answer | |
|<------------------------------------------|
|(9) STUN Req. | |
|<------------------------------------------|
|(10) STUN Res. | |
|------------------------------------------>|
Figure 11 If a media stream does not yet contain the a=remote-candidates
attribute, it means that the offerer believes that ICE checks are
still in progress for that media stream. In this case, the answerer
SHOULD include an a=candidate attribute for all of the candidates for
that media stream it knows about (including peer-reflexive
candidates). The m/c-line is populated based on the considerations
in Section 4.3.
If the a=remote-candidate attribute is present, the agent examines Construction of the ice-pwd and ice-ufrag are identical to the
the transport addresses in the m/c-line of the offer. It compares procedures followed by the offerer, as described in Section 9.1.
these with the transport addresses in the remote candidates of all
candidate pairs. If there is no match, no further processing of the
a=remote-candidate attribute is done. If there is at least one
match, the agent compares the native candidate ID of each matching
pair with the value of the a=remote-candidate attribute. If there is
a match, that candidate pair is selected. For each transport address
pair in that candidate pair, if the state of the transport address
pair is Send-Valid, the agent considers the state to be Valid just
for the purpose of constructing the answer. In particular, it will
impact selection of the candidate for the m/c-line and the set of
additional candidates to include or exclude from the answer.
However, the actual state MUST remain Send-Valid. This state will be
used to determine when it is safe to send media. Keeping it at Send-
Valid is necessary to protect against DoS attacks.
Note that the a=remote-candidate attribute SHOULD NOT be included in Note that the a=remote-candidates attribute SHOULD NOT be included in
the answer, and if included, will just be ignored by the offerer, the answer, and if included, will just be ignored by the offerer,
since it is not used in any processing of the answer. since it is not used in any processing of the answer.
Rules for choosing transport addresses for the m/c-line are as 9.3. Updating the Check and Valid Lists
follows. The agent examines the transport addresses in the m/c-line
of the offer. It compares these with the transport addresses in the
remote candidates of candidate pairs whose states are Valid. If
there is a matching candidate pair in that state, the pair with the
highest priority MUST be chosen, and the native candidate from that
pair used as the operating candidate. If there were no matching
candidate pairs in the Valid state (possibly because the transport
addresses in the m/c-line in the offer didn't match any of the remote
candidates), the candidate that is most likely to work with this
peer, as described in Section 7.2, SHOULD be used. Note that this
candidate may be Valid as a consequence of being temporarily changed
to such by the a=remote-candidate attribute.
Like the offerer, the answerer can decide, for each of its
candidates, whether they are retained or removed. The same rules
defined in Section 7.11.1 for determining their disposition apply to
the answerer. Similarly, if a candidate is removed, the same rules
in Section 7.11.1 regarding removal of candidate pairs and freeing
of resources apply. As with selection of the candidate for the m/c-
line, the state of one of the candidates may be Valid as a
consequence of being temporarily changed to such by the a=remote-
candidate attribute.
Once the answer is sent, the answerer will have the set of native and
remote candidates before this offer/answer exchange, and the set of
native and remote candidates afterwards. A peer derived candidate
continues to be used as long as its generating parent continues to be
used. The agent then pairs up the native and remote candidates which
were added or retained. This leads to a set of current candidate
pairs.
If a candidate pair existed previously, but as a consequence of the
offer/answer exchange, it no longer exists, the agent takes the
following steps:
1. The candidate pair is removed from the candidate pair priority
ordered list. Their corresponding transport address pairs are
removed from the transport address pair check ordered list. As a
consequence of this, if connectivity checks had not yet begun for
the candidate pair, they won't. If a transport address pair had
been pruned from the transport address pair check ordered list
because it was redundant with one of the transport address pairs
which was just removed, that transport address pair is added back
to the list.
2. If connectivity checks were already in progress for that
candidate pair, the agent SHOULD immediately terminate any STUN
transactions in progress from that candidate. No further
retransmissions take place, and no further transactions from that
candidate will be made.
3. If the agent receives a STUN Binding Request for that candidate Once the subsequent offer/answer exchange has completed, each agent
pair, however, processing occurs as defined in Section 7.8. needs to compute the new check list resulting from this exchange, and
then remove any pairs from the valid list which are no longer usable.
Once these adjustments are made, ICE processing continues using these
new lists.
If a candidate pair existed previously, and continues to exist, no Each agent recomputes the check list using the procedures described
changes are made; any STUN transactions in progress for that in Section 5.6. If a check on this new check list was also on the
candidate pair continue, it remains on the candidate pair priority previous check list, and its state was Waiting, In-Progress,
ordered list, and its transport address pairs remain on the transport Succeeded or Failed, its state is copied over. If a check on the new
address pair check ordered list. check list does not have a state (because it's a new check or its
state was not copied over), and it is for the component with
component ID 1 and for the media stream with stream ID 1, its state
is set to Waiting. All other pairs without a state have their state
set to Frozen.
If a candidate pair is new (because either its native candidate is Next, the agent goes through the check list, starting with the
new, or its remote candidate is new, or both), the agent takes the highest priority check. If a check has a state of Succeeded, and it
role of answerer for this candidate pair. The new candidate pair is has a component ID of 1, then all Frozen checks for the same media
inserted into the candidate pair priority ordered list, and the stream and same foundation whose component IDs are not one, have
transport address pair check ordered list is rederived. STUN their state set to Waiting. If, for a particular media stream, there
connectivity checks will start for them based on the logic described are checks for each component of that media stream in the Succeeded
in Section 7.6. state, the agent moves the state of all Frozen checks for the first
component of all other media streams with the same foundation to
Waiting.
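The state handling described in the newer text (copying states onto the
recomputed check list, then unfreezing) can be sketched as follows. This
is a minimal illustration, not the draft's algorithm verbatim: the Check
class, the pair_id key used to match old and new checks, and the
initial_stream_id parameter (the stream ID the text singles out for the
initial Waiting state) are all assumptions made for the example. The
second unfreezing rule is read here as "same foundation as the Succeeded
checks of the completed stream", which is one interpretation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# States that are copied over from the previous check list.
CARRIED_STATES = {"Waiting", "In-Progress", "Succeeded", "Failed"}


@dataclass
class Check:
    pair_id: str          # hypothetical identifier for the candidate pair
    stream_id: int
    component_id: int
    foundation: str
    priority: int
    state: Optional[str] = None


def assign_states(new_list: List[Check], old_states: Dict[str, str],
                  initial_stream_id: int) -> None:
    """Copy surviving states; set the rest to Waiting or Frozen."""
    for check in new_list:
        previous = old_states.get(check.pair_id)
        if previous in CARRIED_STATES:
            check.state = previous
        elif check.component_id == 1 and check.stream_id == initial_stream_id:
            check.state = "Waiting"
        else:
            check.state = "Frozen"


def unfreeze(new_list: List[Check]) -> None:
    """Apply the two unfreezing rules, highest priority check first."""
    ordered = sorted(new_list, key=lambda c: c.priority, reverse=True)
    for check in ordered:
        if check.state == "Succeeded" and check.component_id == 1:
            # Rule 1: other components, same stream, same foundation.
            for other in new_list:
                if (other.state == "Frozen"
                        and other.stream_id == check.stream_id
                        and other.foundation == check.foundation
                        and other.component_id != 1):
                    other.state = "Waiting"
    for stream in {c.stream_id for c in new_list}:
        in_stream = [c for c in new_list if c.stream_id == stream]
        succeeded = [c for c in in_stream if c.state == "Succeeded"]
        all_components = {c.component_id for c in in_stream}
        if {c.component_id for c in succeeded} == all_components:
            # Rule 2: first component of other streams, same foundation.
            foundations = {c.foundation for c in succeeded}
            for other in new_list:
                if (other.state == "Frozen"
                        and other.stream_id != stream
                        and other.component_id == 1
                        and other.foundation in foundations):
                    other.state = "Waiting"
```

A caller would run assign_states once on the recomputed list and then
unfreeze, repeating unfreeze whenever a check moves to Succeeded.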
7.11.3. Receiving the Answer If a check was on the old check list, but was not on the new check
list, and had a state of In-Progress, the corresponding STUN
transaction is abandoned. No further retransmits will be sent for
the STUN request, and any response that might be received is ignored.
Once the answer is received, the answerer will have the set of native Next, the agent prunes the valid list. For each pair on the valid
and remote candidates before this offer/answer exchange, and the set list, the agent examines each candidate in the pair. If the
of native and remote candidates afterwards. It then follows the same candidate was not peer reflexive, and was not present in the most
logic described in Section 7.11.2, pairing up the candidate pairs, recent offer/answer exchange, the candidate pair is removed from the
removing ones that are no longer in use, and beginning of processing valid list.
for ones that are new.
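The valid list pruning rule in the newer text can be sketched as below.
This is an illustration under stated assumptions: the Candidate and Pair
classes are hypothetical, and "present in the most recent offer/answer
exchange" is modelled as membership of the (address, port) tuple in a
set built from that exchange.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Candidate:
    address: str
    port: int
    peer_reflexive: bool = False


@dataclass(frozen=True)
class Pair:
    local: Candidate
    remote: Candidate


def prune_valid_list(valid: List[Pair],
                     signaled: Set[Tuple[str, int]]) -> List[Pair]:
    """Keep a pair only if each candidate is peer reflexive or was signaled."""
    def usable(c: Candidate) -> bool:
        return c.peer_reflexive or (c.address, c.port) in signaled

    return [p for p in valid if usable(p.local) and usable(p.remote)]
```

Note that, as the OPEN ISSUE that follows points out, a peer reflexive
candidate always survives this filter.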
7.12. Binding Keepalives OPEN ISSUE: This means that you cannot forcefully remove a peer
reflexive candidate. This feature was possible, at much
complexity, in previous versions of the spec. An alternative is
to remove a peer reflexive candidate if it was not present in the
offer/answer, and was discovered more than 500ms ago.
Once a candidate is promoted to operating, and media begins flowing, 10. Keepalives
it is still necessary to keep the bindings alive at intermediate NATs
for the duration of the session. Normally, the media stream packets
themselves (e.g., RTP) meet this objective. However, several cases
merit further discussion. Firstly, in some RTP usages, such as SIP,
the media streams can be "put on hold". This is accomplished by
using the SDP "sendonly" or "inactive" attributes, as defined in RFC
3264 [4]. RFC 3264 directs implementations to cease transmission of
media in these cases. However, doing so may cause NAT bindings to
time out, and media won't be able to come off hold.
Secondly, some RTP payload formats, such as the payload format for STUN connectivity checks are also used to keep NAT bindings open once
text conversation [31], may send packets so infrequently that the a session is underway. This is accomplished by periodically re-
interval exceeds the NAT binding timeouts. starting the check process, as described in this section.
Thirdly, if silence suppression is in use, long periods of silence Once the initial offer/answer exchange has taken place, the agent
may cause media transmission to cease sufficiently long for NAT sets a timer to fire in Tr seconds. Tr SHOULD be configurable and
bindings to time out. SHOULD have a default of 15 seconds. When Tr fires, the agent MUST
reset the states for all of the checks in the check list using the
procedures defined in Section 5.6 and then begin performing periodic
checks as described in Section 5.7. By the time the timer fires for
the first time, the check list will include only the in-use
candidates. Reperforming these checks will therefore perform a
periodic keepalive.
To prevent these problems, ICE implementations MUST continue to list OPEN ISSUE: ICE isn't saying anything about what happens if these
their operating candidate in a=candidate lines for UDP-based media periodic keepalives should fail. If they do, something really bad
streams. As a consequence of this, STUN packets will be transmitted has happened, like a NAT reboot or failure. I think we should
periodically independently of the transmission (or lack thereof) of keep that out of scope.
media packets. These will be received on the same IP address and
port as the media streams. The agent determines whether the packet
is media or STUN by looking for the magic cookie in bits 32-63 of the
data. If present, it indicates that the packet is STUN, and if not,
indicates that it is media. This provides a media independent, RTP
independent, and codec independent solution for keeping the NAT
bindings alive. However, an ICE implementation MUST be prepared for
the transport address received in an m/c-line to not correspond to
any a=candidate attributes.
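The magic cookie test described in the older text (bits 32-63 of the
packet) can be sketched as follows. The cookie value 0x2112A442 is not
given in this section; it is taken from the revised STUN specification
and should be treated as an assumption here, as should the function
name.

```python
# Assumed value: the fixed magic cookie from the revised STUN header.
STUN_MAGIC_COOKIE = 0x2112A442


def looks_like_stun(packet: bytes) -> bool:
    """Return True if bits 32-63 of the packet hold the STUN magic cookie."""
    if len(packet) < 8:
        # Too short to contain bits 32-63 at all.
        return False
    # Bits 32-63 are bytes 4..7, in network (big-endian) byte order.
    return int.from_bytes(packet[4:8], "big") == STUN_MAGIC_COOKIE
```

An RTP packet carries its timestamp in the same bit positions, so a
collision is possible but unlikely; this sketch does not attempt any
further validation of the packet.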
If an ICE implementation is communicating with one that does not When an ICE agent is communicating with an agent that is not ICE-
support ICE, keepalives MUST still be sent. Indeed, these keepalives aware, keepalives still need to be utilized. Indeed, these
are essential even if neither endpoint implements ICE. As such, this keepalives are essential even if neither endpoint implements ICE. As
specification defines keepalive behavior generally, for endpoints such, this specification defines keepalive behavior generally, for
that support ICE, and those that do not. endpoints that support ICE, and those that do not.
All endpoints MUST send keepalives for each media session. These All endpoints MUST send keepalives for each media session. These
keepalives MUST be sent regardless of whether the media stream is keepalives MUST be sent regardless of whether the media stream is
currently inactive, sendonly, recvonly or sendrecv. The keepalive currently inactive, sendonly, recvonly or sendrecv. The keepalive
SHOULD be sent using a format which is supported by its peer. ICE SHOULD be sent using a format which is supported by its peer. ICE
endpoints allow for STUN-based keepalives for UDP streams, and as endpoints allow for STUN-based keepalives for UDP streams, and as
such, STUN keepalives MUST be used when an agent is communicating such, STUN keepalives MUST be used when an agent is communicating
with a peer that supports ICE. An agent can determine that its peer with a peer that supports ICE. An agent can determine that its peer
supports ICE by the presence of the a=candidate attributes for each supports ICE by the presence of the a=candidate attributes for each
media session. If the peer does not support ICE, the choice of a media session. If the peer does not support ICE, the choice of a
packet format for keepalives is a matter of local implementation. A packet format for keepalives is a matter of local implementation. A
format which allows packets to easily be sent in the absence of format which allows packets to easily be sent in the absence of
actual media content is RECOMMENDED. Examples of formats which actual media content is RECOMMENDED. Examples of formats which
readily meet this goal are RTP No-Op [28] and RTP comfort noise [24]. readily meet this goal are RTP No-Op [27] and RTP comfort noise [23].
If the peer doesn't support any formats that are particularly well If the peer doesn't support any formats that are particularly well
suited for keepalives, an agent SHOULD send RTP packets with an suited for keepalives, an agent SHOULD send RTP packets with an
incorrect version number, or some other form of error which would incorrect version number, or some other form of error which would
cause them to be discarded by the peer. cause them to be discarded by the peer.
STUN-based keepalives will be sent periodically every Tr seconds as a STUN-based keepalives will be sent periodically every Tr seconds as
consequence of the rules in Section 7.7. If STUN keepalives are described above. If STUN keepalives are not in use (because the peer
not in use (because the peer does not support ICE), an agent SHOULD does not support ICE), an agent SHOULD ensure that a media packet is
ensure that a media packet is sent every Tr seconds. If one is not sent every Tr seconds. If one is not sent as a consequence of normal
sent as a consequence of normal media communications, a keepalive media communications, a keepalive packet using one of the formats
packet using one of the formats discussed above SHOULD be sent. discussed above SHOULD be sent.
7.13. Sending Media
When an agent receives an offer and sends an answer, or when it
receives an answer to an offer it sent, it begins connectivity
checks. If there is a candidate that corresponds to the m/c-line,
these checks will include validation of the operating candidate pair.
In that case, an agent SHOULD NOT send media on the operating
candidate pair until that candidate pair has reached the Valid or
Recv-Valid state. This is to help prevent a denial-of-service
attack, described in Section 13. Once the operating candidate pair
reaches the Valid or Recv-Valid state, an agent MAY start sending
media to that candidate pair. If there is no candidate that
corresponds to the m/c-line, the m/c-line cannot be validated, and
media is sent to it as described in RFC 3264 [4]. Under normal
conditions, there will be a candidate for the m/c-line. Indeed - ICE
itself requires that an agent include one. However, actual SIP
deployments have seen usage of network intermediaries which
manipulate the m/c-line of offers and answers. Should such elements
ignore the candidate attributes, it would manifest itself like an
agent which did not include a candidate for the m/c-line. For this
reason, this use case is explicitly supported by ICE.
Offer/answer exchanges are used with protocols, like SIP, which
require media to be sent "early", from the answerer to the offerer,
prior to completion of the initial offer/answer exchange. It is
highly desirable (and sometimes necessary) for this early media to
use the candidate pair ultimately selected by ICE connectivity
checks. For this reason, ICE provides an early media mechanism that
allows for a candidate pair to be used in one direction prior to its
promotion to operating in a subsequent offer/answer exchange. Note
that, with ICE, early media pertains to media sent to a candidate
pair until its promotion to operating in a subsequent offer/answer
exchange. This is a broader definition than is used in [26], which
defines early media as media sent prior to acceptance of a call.
As a consequence of the connectivity checks, an agent will change the 11. Media Handling
states for each transport address pair, and consequently, for the
candidate pairs. When a candidate pair becomes Valid or Recv-Valid,
and there is a candidate pair for the m/c-line, and the candidate
pair is not equal to the operating candidate pair, and the agent is
in the role of answerer for that candidate pair, the agent checks the
position of that pair in the candidate pair priority ordered list.
If it is the first, the agent selects this candidate pair for early
media. If this candidate pair is not the first on the candidate pair
priority ordered list, but is higher priority than the operating
candidate pair, and the early media wait-state timer has not yet been
set, the agent sets this timer to Tws seconds. Though the early
media wait state timer has the same value as the wait state timer
described in Section 7.9, these are different timers and indeed are
set by different entities. The early media wait state timer allows
for a higher priority connectivity check to complete, in the event
its STUN Binding Request or Response was lost or delayed in the
network. If, prior to the early media wait-state timer firing,
another connectivity check completes and a candidate pair enters the
Valid or Recv-Valid states, there is no need to reset or cancel the
timer. Once the timer fires, the agent SHOULD select the highest
priority candidate pair in the Valid or Recv-Valid state for which
the agent has the role of answerer, and use that candidate pair for
early media.
ICE processing will ensure that, under almost all circumstances, the 11.1. Sending Media
candidate pair selected by the answerer for early media will also be
the one selected by the offerer for eventual promotion to operating.
The early media state implies that the answerer knows that this
candidate pair is to be used, but the offerer doesn't know yet that
it will eventually be validated. It is for this reason that the
candidate pair can be used for early media.
If a candidate pair is selected for early media, an agent MAY send Agents always send media using a candidate pair. An agent will send
media on that candidate pair, even if it is not the same as the media to the remote candidate in the pair (setting the destination
operating candidate pair. However, to deal with cases in which the address and port of the packet equal to that remote candidate), and
offerer and answerer do not agree on the eventual selection of this will send it from the local candidate. When the local candidate is
candidate for promotion to operating (a rare but possible case), the server or peer reflexive, media is originated from the base. Media
agent MUST discontinue using the candidate pair for sending media Tlo sent from a relayed candidate is sent through that relay, using
seconds after the next opportunity its peer would have to send an procedures defined in [12].
updated offer. In the case of an answer delivered in a 200 OK to an
offer in a SIP INVITE (regardless of whether that same answer
appeared in an earlier unreliable provisional response), this would
be Tlo seconds after receipt of the ACK. Tlo SHOULD be configurable
and SHOULD have a default of 5 seconds. This time represents the
amount of time it should take the offerer to perform its connectivity
checks, arrive at the same conclusion about the viability of the
early candidate, and then generate an updated offer promoting it to
operating. If, after Tlo seconds, no updated offer arrives, the
answerer MUST cease using the early candidate. Media MAY be sent to
the operating candidate pair if it is in the Valid or Recv-Valid
state.
If an updated offer does arrive prior to the expiration of the timer, If an agent was the offerer in the most recent offer/answer exchange,
the agent MUST execute the procedures in Section 7.11.2, which will when it sends media, it MUST use the candidates in the m/c-line for
result in the selection of a candidate for the m/c-line in the each media stream. However, it MUST only send media once those
answer. At that point, the procedures of this section SHOULD be candidates also appear in the valid list. If the candidates in the
restarted by the answerer. This implies that the operating candidate m/c-line are not the ones that are ultimately selected by ICE, this
pair, if Valid or Recv-Valid, will be used. If a higher priority implies that the offerer will need to wait for the subsequent offer/
candidate pair subsequently enters the Valid or Recv-Valid state, it answer exchange to complete before it can send media.
may end up being used as an early candidate.
To use a candidate pair, whether it is early or operating, media is If an agent was the answerer in the most recent offer/answer
sent to the IP addresses and ports of the components in the remote exchange, the rules are different. When the agent wishes to send
candidate, and is sent from the IP addresses and ports of media, and the candidate pairs in the m/c-lines are also the highest
the components in the native candidate. Transport addresses are priority ones in the valid list for each media stream, it uses those
paired up based on component ID. For example, if a remote candidate candidate pairs. If, however, the highest priority pairs in the
has two components R1 and R2, and the native candidate has two valid list for a media stream are not the same as the ones in the
components L1 and L2, media packets are sent from L1 to R1 and from m/c-lines, the agent MUST use the highest priority pairs in the valid
L2 to R2. This provides a property known as symmetry. This list. However, the agent MUST discontinue using those candidate
symmetric behavior MUST be followed by an agent even if its peer in pairs Tlo seconds after the next opportunity its peer would have to
the session doesn't support ICE. send an updated offer. In the case of an answer delivered in a 200
OK to an offer in a SIP INVITE (regardless of whether that same
answer appeared in an earlier unreliable provisional response), this
would be Tlo seconds after receipt of the ACK. Tlo SHOULD be
configurable and SHOULD have a default of 5 seconds. This time
represents the amount of time it should take the offerer to perform
its connectivity checks, arrive at the same conclusion about the
candidate pair, and then generate an updated offer. If, after Tlo
seconds, no updated offer arrives, the answerer MUST cease sending
media, and will need to wait for the updated offer.
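The sending rules in the newer text for offerer and answerer can be
sketched as a single decision function. This is a simplified
illustration for one component of one media stream: pairs are plain
(local, remote) label tuples, the m/c-line is represented as a pair
rather than as separate candidates, and returning None for an answerer
with an empty valid list is an assumption, since the text does not cover
that case explicitly. All names are hypothetical.

```python
from typing import List, Optional, Tuple

Pair = Tuple[str, str]   # (local candidate, remote candidate) labels

TLO_DEFAULT = 5.0        # default for Tlo, in seconds, per the text


def pair_for_media(is_offerer: bool,
                   mc_pair: Pair,
                   valid_by_priority: List[Pair],
                   since_offer_opportunity: Optional[float] = None,
                   tlo: float = TLO_DEFAULT) -> Optional[Pair]:
    """Return the pair to send media on, or None if media must wait.

    valid_by_priority is the valid list, highest priority first.
    since_offer_opportunity is the time in seconds since the peer's next
    chance to send an updated offer (e.g. receipt of the ACK), or None
    if that moment has not happened yet.
    """
    if is_offerer:
        # Offerer: only the m/c-line, and only once it is validated.
        return mc_pair if mc_pair in valid_by_priority else None
    if not valid_by_priority:
        return None  # assumption: nothing validated yet, so wait
    best = valid_by_priority[0]
    if best == mc_pair:
        return best
    # Answerer using a pair other than the m/c-line: limited by Tlo.
    if since_offer_opportunity is not None and since_offer_opportunity >= tlo:
        return None
    return best
```

Keeping the timer outside the function (the caller supplies the elapsed
time) makes the decision easy to test without a clock.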
The definition of sending media "from" a particular transport address OPEN ISSUE: In previous versions of ICE, once this timer fired,
depends on the type of transport address. In the case of a server you just sent media to the one in the m/c-line. This causes the
reflexive transport address, this means that the RTP packets are sent media streams to flip back and forth between addresses, which I am
from the local transport address used to obtain the STUN address. In trying to avoid. Since this timer should never go off anyway, I
the case of a relayed transport address, this means that media removed this feature.
packets are sent through the relay server (for STUN relays, this
would be using the Send request). For local transport addresses,
media is sent from that local transport address. For peer reflexive
transport addresses, media is sent from the local transport address
used to obtain the reflexive address.
ICE has interactions with jitter buffer adaptation mechanisms. An ICE has interactions with jitter buffer adaptation mechanisms. An
RTP stream can begin using one candidate, and switch to another one. RTP stream can begin using one candidate, and switch to another one,
The newer candidate may result in RTP packets taking a different path though this happens rarely with ICE. The newer candidate may result
through the network - one with different delay characteristics. As in RTP packets taking a different path through the network - one with
discussed below, agents are encouraged to re-adjust jitter buffers different delay characteristics. As discussed below, agents are
when there are changes in source or destination address. encouraged to re-adjust jitter buffers when there are changes in
Furthermore, many audio codecs use the marker bit to signal the source or destination address. Furthermore, many audio codecs use
beginning of a talkspurt, for the purposes of jitter buffer the marker bit to signal the beginning of a talkspurt, for the
adaptation. For such codecs, it is RECOMMENDED that the sender purposes of jitter buffer adaptation. For such codecs, it is
change the marker bit when an agent switches transmission of media RECOMMENDED that the sender change the marker bit when an agent
from one candidate pair to another. switches transmission of media from one candidate pair to another.
7.14. Receiving Media
ICE implementations MUST be prepared to receive media on a candidate 11.2. Receiving Media
pair if it is in the role of offerer for that candidate pair, even if
that candidate pair is not currently operating. This is a
consequence of the early media mechanism described in the previous
section.
If an agent determines that its peer supports ICE (an offerer knows ICE implementations MUST be prepared to receive media on any
this when the answer contains a=candidate attributes), it SHOULD candidates provided in the most recent offer/answer exchange. In
discard any media packets received on a candidate pair prior to the order to avoid attacks described in Section 15, when an agent
candidate pair entering the Send Valid state. This helps eliminate receives a media packet, and it knows its peer supports ICE, it MUST
certain attacks, as discussed in Section 13. Note that, in cases of verify that it has received a check (for which a successful response
forking, an agent may get multiple answers to its offer, each for a was generated) on the same 5-tuple as the received media packet (that
different peer. Consequently, it would only discard media packets is, the source and destination transport addresses of the media
received on a candidate pair once it has determined that all forked packet match those of the check). If no such check has succeeded,
targets support ICE. the agent MUST silently discard the media packet.
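The receive-side rule in the newer text (discard media unless a check with a successful response has been seen on the same 5-tuple) can be illustrated with a minimal sketch. The names FiveTuple and MediaFilter are hypothetical and not part of the specification. Accepting media when the peer is not known to support ICE is an assumption of this sketch, since the text only states the rule for ICE-capable peers.

```python
from typing import NamedTuple, Set


class FiveTuple(NamedTuple):
    """Transport protocol plus source and destination transport addresses."""
    protocol: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class MediaFilter:
    """Hypothetical helper: tracks 5-tuples on which a check succeeded."""

    def __init__(self, peer_supports_ice: bool) -> None:
        self.peer_supports_ice = peer_supports_ice
        self._validated: Set[FiveTuple] = set()

    def record_successful_check(self, flow: FiveTuple) -> None:
        # Call this when a check was received on `flow` and a
        # successful response was generated for it.
        self._validated.add(flow)

    def accept_media(self, flow: FiveTuple) -> bool:
        # Assumption: without knowledge of ICE support, media is accepted.
        if not self.peer_supports_ice:
            return True
        # Otherwise the packet is silently discarded unless a check
        # has already succeeded on exactly the same 5-tuple.
        return flow in self._validated


media_filter = MediaFilter(peer_supports_ice=True)
flow = FiveTuple("UDP", "192.0.2.1", 3478, "10.0.1.1", 8998)
before = media_filter.accept_media(flow)
media_filter.record_successful_check(flow)
after = media_filter.accept_media(flow)
```

In this usage, the first packet on the flow is dropped and packets are accepted only once the check has been recorded.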
It is RECOMMENDED that, when an agent receives an RTP packet with a It is RECOMMENDED that, when an agent receives an RTP packet with a
new source or destination IP address for a particular media stream, new source or destination IP address for a particular media stream,
that the agent re-adjust its jitter buffers. that the agent re-adjust its jitter buffers.
RFC 3550 [21] describes an algorithm in Section 8.2 for detecting RFC 3550 [20] describes an algorithm in Section 8.2 for detecting
SSRC collisions and loops. These algorithms are based, in part, on SSRC collisions and loops. These algorithms are based, in part, on
seeing different source IP addresses and ports with the same SSRC. seeing different source IP addresses and ports with the same SSRC.
However, when ICE is used, such changes will naturally occur as the However, when ICE is used, such changes will sometimes occur as the
media streams switch between candidates. An agent will be able to media streams switch between candidates. An agent will be able to
determine that a media stream is from the same peer as a consequence determine that a media stream is from the same peer as a consequence
of the STUN exchange that precedes media transmission. Thus, if of the STUN exchange that precedes media transmission. Thus, if
there is a change in source IP address and port, but the media there is a change in source IP address and port, but the media
packets come from the same peer agent, this SHOULD NOT be treated as packets come from the same peer agent, this SHOULD NOT be treated as
an SSRC collision. an SSRC collision.
8. Guidelines for Usage with SIP 12. Usage with SIP
SIP [2] makes use of the offer/answer model, and is one of the 12.1. Latency Guidelines
primary targets for usage of ICE. SIP allows for offer/answer
exchanges to occur in many different combinations of messages,
including INVITE/200 OK and 200 OK/ACK. When support for reliable
provisional responses (RFC 3262 [11]) and UPDATE (RFC 3311 [25]) are
added, additional combinations of messages that can be used for
offer/answer exchanges are added. As such, this section provides
some guidance on good ways to make use of SIP with ICE.
ICE requires a series of STUN-based connectivity checks to take place ICE requires a series of STUN-based connectivity checks to take place
between endpoints. These checks start from the answerer on between endpoints. These checks start from the answerer on
generation of its answer, and start from the offerer when it receives generation of its answer, and start from the offerer when it receives
the answer. These checks can take time to complete, and as such, the the answer. These checks can take time to complete, and as such, the
selection of messages to use with offers and answers can affect selection of messages to use with offers and answers can affect
perceived user latency. Two latency figures are of particular perceived user latency. Two latency figures are of particular
interest. These are the post-pickup delay and the post-dial delay. interest. These are the post-pickup delay and the post-dial delay.
The post-pickup delay refers to the time between when a user "answers The post-pickup delay refers to the time between when a user "answers
the phone" and when any speech they utter can be delivered to the the phone" and when any speech they utter can be delivered to the
caller. The post-dial delay refers to the time between when a user caller. The post-dial delay refers to the time between when a user
enters the destination address for the user, and ringback begins as a enters the destination address for the user, and ringback begins as a
consequence of having successfully started ringing the phone of the consequence of having successfully started ringing the phone of the
called party. called party.
To reduce post-dial delays, it is RECOMMENDED that the caller begin To reduce post-dial delays, it is RECOMMENDED that the caller begin
gathering candidates prior to actually sending its initial INVITE. gathering candidates prior to actually sending its initial INVITE.
This can be started upon user interface cues that a call is pending, This can be started upon user interface cues that a call is pending,
such as activity on a keypad or the phone going offhook. such as activity on a keypad or the phone going offhook.
To reduce post-pickup delays, ICE allows for media to be sent from If an offer is received in an INVITE request, the callee SHOULD
the answerer to the offerer on a candidate pair, prior to its immediately gather its candidates and then generate an answer in a
promotion to operating. However, this requires the answerer to have provisional response. When reliable provisional responses are not
generated its answer and sent it. In most cases, it will require used, the SDP in the provisional response is the answer, and that
this answer to be received by the offerer. The reason is that exact same answer reappears in the 200 OK. To deal with possible
connectivity checks or RTP packets from the answerer to the offerer losses of the provisional response, it SHOULD be retransmitted until
will not be forwarded by NATs towards the offerer until the offerer some indication of receipt. This indication can either be through
has established a permission in the NAT by generating a packet PRACK [9], or through the receipt of a successful STUN Binding
towards the answerer. Request. Even if PRACK is not used, the provisional response SHOULD
be retransmitted using the exponential backoff described in [9].
For this reason, if an offer is received in an INVITE request, the Furthermore, once the answer has been sent, the agent SHOULD begin
UAS SHOULD immediately gather its candidates and then generate an its connectivity checks. Once candidate pairs for each component of
answer in a provisional response. When reliable provisional a media stream enter the valid list, the callee can begin sending
responses are not used, the SDP in the provisional response is the media on that media stream.
answer, and that exact same answer reappears in the 200 OK. To deal
with possible losses of the provisional response, it SHOULD be
retransmitted until some indication of receipt. This indication can
either be through PRACK [11], or through the receipt of a STUN
Binding Request with a correct username and password. Even if PRACK
is not used, the provisional response SHOULD be retransmitted using
the exponential backoff described in [11]. Furthermore, once the
answer has been sent, the agent SHOULD begin its connectivity checks.
Once a candidate reaches the Valid or Recv-Valid state, the UAS has a
known-valid path for media packets towards the UAC. This point is
called the connected point in ICE.
Once the UAS reaches the connected point, media can be sent from the
UAS towards the UAC without any additional delays. However, between
the receipt of the INVITE and the connected point, any media that
needs to be sent towards the caller (such as SIP early media [26])
cannot be transmitted. For this reason, implementations MAY choose
to delay alerting the called party until the connected point is
reached. In the case of a PSTN gateway, this would mean that the
setup message into the PSTN is delayed until the connected point.
Doing this increases the post-dial delay, but has the effect of
eliminating 'ghost rings'. Ghost rings are cases where the called
party hears the phone ring, picks up, but hears nothing and cannot be
heard. This technique works without requiring support for, or usage
of, preconditions [7], since it's a localized decision. It also has
the benefit of guaranteeing that not a single packet of early media
will get clipped. If an agent chooses to delay local alerting in
this way, it SHOULD generate a 180 response once alerting begins.
A slight variation of this approach is to wait for a connectivity However, prior to this point, any media that needs to be sent towards
check to succeed to a higher priority candidate pair than the the caller (such as SIP early media [25]) cannot be transmitted. For
operating one. This allows for the agent to only ever send media, this reason, implementations SHOULD delay alerting the called party
early or otherwise, to a single candidate, which will work better until candidates for each component of each media stream have entered
with jitter buffers, at the expense of even greater post-dial delays. the valid list. In the case of a PSTN gateway, this would mean that
the setup message into the PSTN is delayed until this point. Doing
this increases the post-dial delay, but has the effect of eliminating
'ghost rings'. Ghost rings are cases where the called party hears
the phone ring, picks up, but hears nothing and cannot be heard.
This technique works without requiring support for, or usage of,
preconditions [6], since it's a localized decision. It also has the
benefit of guaranteeing that not a single packet of media will get
clipped, so that post-pickup delay is zero. If an agent chooses to
delay local alerting in this way, it SHOULD generate a 180 response
once alerting begins.
Note that, prior to the promotion of a candidate pair to operating, Based on the rules in Section 11.1, the offerer will not be able to
the offerer will not be able to send using the candidate pair. When send media until the highest priority valid candidates match the m/c-
used with SIP, if the initial offer is sent in the INVITE, and the line. When used with SIP, if the initial offer is sent in the
answer is sent in both the provisional and final 200 OK response, the INVITE, and the answer is sent in both the provisional and final 200
offerer will not be able to send media until it sends a re-INVITE and OK response, the offerer will generally not be able to send media
receives the 200 OK response to that re-INVITE. This can take until it sends a re-INVITE and receives the 200 OK response to that
several hundred milliseconds. If this latency is an issue (it is re-INVITE. This can take several hundred milliseconds. If this
generally not considered an issue for voice systems), reliable latency is an issue (it is generally not considered an issue for
provisional responses [11] MAY be used, in which case an UPDATE [25] voice systems), reliable provisional responses [9] MAY be used, in
can be used to send an updated offer prior to the call being which case an UPDATE [24] can be used to send an updated offer prior
answered. to the call being answered.
As discussed in Section 13, offer/answer exchanges SHOULD be secured As discussed in Section 15, offer/answer exchanges SHOULD be secured
against eavesdropping and man-in-the-middle attacks. To do that, the against eavesdropping and man-in-the-middle attacks. To do that, the
usage of SIPS [2] is RECOMMENDED when used in concert with ICE. usage of SIPS [3] is RECOMMENDED when used in concert with ICE.
9. Interactions with Forking
SIP allows INVITE requests carrying offers to fork, which means that 12.2. Interactions with Forking
they are delivered to multiple user agents. Each of those user
agents then provides an answer to the offer in the INVITE. The
result is that a single offer generated by the UAC produces multiple
answers.
ICE interacts very well with forking. Indeed, ICE fixes some of the ICE interacts very well with forking. Indeed, ICE fixes some of the
problems associated with forking. Once the offer/answer exchange has problems associated with forking. Without ICE, when a call forks and
completed, the UAC will have an answer from each UAS that received the caller receives multiple incoming media streams, it cannot
the INVITE. The ICE connectivity checks that ensue will carry determine which media stream corresponds to which callee.
transport address pair IDs that correlate each of those checks (and
thus their corresponding IP addresses and ports) with a specific
remote user agent. As these checks happen before any media is
transmitted, ICE allows a UAC to disambiguate subsequent media
traffic by looking at the source IP address and port, and then
correlate that traffic with a particular remote UA. When SIP is used
without ICE, the incoming media traffic cannot be disambiguated
without an additional offer/answer exchange.
10. Interactions with Preconditions With ICE, this problem is resolved. The connectivity checks which
occur prior to transmission of media carry username fragments, which
in turn are correlated to a specific callee. Subsequent media
packets which arrive on the same 5-tuple as the connectivity check
will be associated with that same callee. Thus, the caller can
perform this correlation as long as it has received an answer.
Because ICE involves multiple addresses and pre-session activities, Section 11.2 introduces a requirement for agents receiving media;
its interactions with preconditions merits further discussion. namely, that media should be discarded until a check has been
received from that peer. Unfortunately, this mechanism doesn't work
well in forking situations where a subset of the recipients are not
ICE-aware. Those recipients will not send checks, and media from
them will be discarded.
OPEN ISSUE: Obviously this is an issue. Need to either remove
this feature of ICE or find a way to make it work better in
forking situations.
12.3. Interactions with Preconditions
Quality of Service (QoS) preconditions, which are defined in RFC 3312 Quality of Service (QoS) preconditions, which are defined in RFC 3312
[7] and RFC 4032 [8], apply only to the IP addresses and ports listed [6] and RFC 4032 [7], apply only to the IP addresses and ports listed
in the m/c lines in an offer/answer. If ICE changes the address and in the m/c lines in an offer/answer. If ICE changes the address and
port where media is received, this change is reflected in the m/c port where media is received, this change is reflected in the m/c
lines of a new offer/answer. As such, it appears like any other re- lines of a new offer/answer. As such, it appears like any other re-
INVITE would, and is fully treated in RFC 3312 and 4032, which INVITE would, and is fully treated in RFC 3312 and 4032, which apply
applies without regard to the fact that the m/c lines are changing without regard to the fact that the m/c lines are changing due to ICE
due to ICE negotiations occurring "in the background". negotiations occurring "in the background".
However, usage of early candidates with QoS preconditions is NOT Indeed, an agent SHOULD NOT indicate that QoS preconditions have been
RECOMMENDED, since QoS will only be reserved for the candidate pair met until the ICE checks have completed and selected the candidate
in the m/c-line. An agent SHOULD only send to the operating pairs to be used for media.
candidate (once it enters the Valid or Recv-Valid states) if QoS
preconditions are used for a media session.
ICE also has (purposeful) interactions with connectivity ICE also has (purposeful) interactions with connectivity
preconditions [27]. Those interactions are described there. preconditions [26]. Those interactions are described there.
11. Examples OPEN ISSUE: Are these preconditions really needed with ICE? ICE
provides a connectivity precondition on its own using the
mechanisms described above.
This section provides two examples. One is a very basic example, and 12.4. Interactions with Third Party Call Control
the other is more elaborate. A common configuration and setup is
used in both cases. ICE works with Flows I and IV as described in [16]. Flow I works
without the controller supporting or being aware of ICE. Flow IV
will work as long as the controller passes along the ICE attributes
without alteration. Flow III may disrupt ICE processing, since it
will distort the stream ID values used in the computation of
priorities. When there is but a single media stream, Flow III will
work as long as the controller passes through the ICE attributes
unmodified. Flow II is fundamentally incompatible with ICE; each
agent will believe itself to be the answerer and thus never generate
a re-INVITE.
OPEN ISSUE: It's really too bad flow III doesn't work with
multimedia; should consider ways to make it work. There are
several ways.
The flows for continued operation, as described in Section 7 of RFC
3725, require additional behavior of ICE implementations to support.
In particular, if an agent receives a mid-dialog re-INVITE that
contains no offer, it MUST go through the process of gathering
candidates, prioritizing them and generating an offer, as if this were
an initial offer for a session. Furthermore, that list of candidates
SHOULD include the ones currently in-use.
13. Grammar
This specification defines four new SDP attributes - the "candidate",
"remote-candidates", "ice-ufrag" and "ice-pwd" attributes.
The candidate attribute is a media-level attribute only. It contains
a transport address for a candidate that can be used for connectivity
checks.
The syntax of this attribute is defined using Augmented BNF as
defined in RFC 4234 [8]:
candidate-attribute = "candidate" ":" foundation SP component-id SP
transport SP
priority SP
connection-address SP ;from RFC 4566
port ;port from RFC 4566
[SP cand-type]
[SP rel-addr]
[SP rel-port]
*(SP extension-att-name SP
extension-att-value)
foundation = 1*ice-char
component-id = 1*DIGIT
transport = "UDP" / transport-extension
transport-extension = token ; from RFC 3261
priority = 1*DIGIT
cand-type = "typ" SP candidate-types
candidate-types = "host" / "srflx" / "prflx" / "relay" / token
rel-addr = "raddr" SP connection-address
rel-port = "rport" SP port
extension-att-name = byte-string ;from RFC 4566
extension-att-value = byte-string
ice-char = ALPHA / DIGIT / "+" / "/"
The foundation is composed of one or more ice-char. The component-id
is a positive integer, which identifies the specific component for
which the transport address is a candidate. It MUST start at 1 and
MUST increment by 1 for each component of a particular candidate.
The connection-address production is taken from RFC 4566 [10], allowing
for IPv4 addresses, IPv6 addresses and FQDNs. The port production is
also taken from RFC 4566 [10]. The token production is taken from
RFC 3261 [3]. The transport production indicates the transport
protocol for the candidate. This specification only defines UDP.
However, extensibility is provided to allow for future transport
protocols to be used with ICE, such as TCP or the Datagram Congestion
Control Protocol (DCCP) [28].
The cand-type production encodes the type of candidate. This
specification defines the values "host", "srflx", "prflx" and "relay"
for host, server reflexive, peer reflexive and relayed candidates,
respectively. The set of candidate types is extensible for the
future. Inclusion of the candidate type is optional. The rel-addr
and rel-port productions convey information about the related
transport addresses. Rules for inclusion of these values are
described in Section 4.4.
The a=candidate attribute can itself be extended. The grammar allows
for new name/value pairs to be added at the end of the attribute. An
implementation MUST ignore any name/value pairs it doesn't
understand.
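As an illustration of the candidate-attribute grammar, the following is a minimal parsing sketch. The Candidate class and parse_candidate function are hypothetical names. As a simplification, the sketch accepts the optional "typ", "raddr" and "rport" pairs in any position rather than enforcing the order given in the ABNF, and it does not validate the connection-address. Unrecognized name/value pairs are collected but not acted upon.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional

_ICE_CHARS = re.compile(r"[A-Za-z0-9+/]+")   # 1*ice-char
_DIGITS = re.compile(r"[0-9]+")              # 1*DIGIT


@dataclass
class Candidate:
    foundation: str
    component_id: int
    transport: str
    priority: int
    address: str
    port: int
    cand_type: Optional[str] = None
    rel_addr: Optional[str] = None
    rel_port: Optional[int] = None
    extensions: Dict[str, str] = field(default_factory=dict)


def _number(text: str, what: str) -> int:
    if not _DIGITS.fullmatch(text):
        raise ValueError(f"{what} must be decimal digits, got {text!r}")
    return int(text)


def parse_candidate(line: str) -> Candidate:
    prefix = "a=candidate:"
    if not line.startswith(prefix):
        raise ValueError("not an a=candidate line")
    tokens = line[len(prefix):].split()
    if len(tokens) < 6:
        raise ValueError("candidate attribute has too few fields")
    foundation, component, transport, priority, address, port = tokens[:6]
    if not _ICE_CHARS.fullmatch(foundation):
        raise ValueError(f"invalid foundation {foundation!r}")
    cand = Candidate(
        foundation=foundation,
        component_id=_number(component, "component-id"),
        transport=transport,
        priority=_number(priority, "priority"),
        address=address,
        port=_number(port, "port"),
    )
    rest = tokens[6:]
    if len(rest) % 2 != 0:
        raise ValueError("trailing name without a value")
    # Remaining tokens are name/value pairs.
    for name, value in zip(rest[0::2], rest[1::2]):
        if name == "typ":
            cand.cand_type = value
        elif name == "raddr":
            cand.rel_addr = value
        elif name == "rport":
            cand.rel_port = _number(value, "rport")
        else:
            # Unknown pairs are kept aside and otherwise ignored.
            cand.extensions[name] = value
    return cand


srflx = parse_candidate(
    "a=candidate:2 1 UDP 5909 192.0.2.3 45664 typ srflx "
    "raddr 10.0.1.1 rport 8998"
)
```

The sample line is the unfolded server reflexive candidate from the example offer in this document.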
The syntax of the "remote-candidates" attribute is defined using
Augmented BNF as defined in RFC 4234 [8]. The remote-candidates
attribute is a media level attribute only.
remote-candidate-att = "remote-candidates" ":" remote-candidate
0*(SP remote-candidate)
remote-candidate = component-ID SP connection-address SP port
The attribute contains a connection-address and port for each
component. The ordering of components is irrelevant. However, a
value MUST be present for each component of a media stream.
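A minimal sketch of reading the remote-candidates attribute and checking that every component has a value follows. The function name and the expected_components parameter are hypothetical; the caller is assumed to know which component IDs the media stream uses.

```python
from typing import Dict, Iterable, Tuple


def parse_remote_candidates(
    line: str, expected_components: Iterable[int]
) -> Dict[int, Tuple[str, int]]:
    prefix = "a=remote-candidates:"
    if not line.startswith(prefix):
        raise ValueError("not an a=remote-candidates line")
    tokens = line[len(prefix):].split()
    # Each remote-candidate is: component-ID SP connection-address SP port
    if not tokens or len(tokens) % 3 != 0:
        raise ValueError("expected one or more (component, address, port) triples")
    result: Dict[int, Tuple[str, int]] = {}
    for i in range(0, len(tokens), 3):
        component, address, port = tokens[i:i + 3]
        # int() raises ValueError on non-numeric input.
        result[int(component)] = (address, int(port))
    # Ordering is irrelevant, but every component must be present.
    missing = set(expected_components) - set(result)
    if missing:
        raise ValueError(f"no remote candidate for components {sorted(missing)}")
    return result


remote = parse_remote_candidates(
    "a=remote-candidates:2 192.0.2.1 3479 1 192.0.2.1 3478", [1, 2]
)
```

The addresses and ports in the usage line are illustrative values only.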
The syntax of the "ice-pwd" and "ice-ufrag" attributes is defined
as:
ice-pwd-att = "ice-pwd" ":" password
ice-ufrag-att = "ice-ufrag" ":" ufrag
password = 22*ice-char
ufrag = 4*ice-char
The "ice-pwd" and "ice-ufrag" attributes can appear at either the
session-level or media-level. When present in both, the value in the
media-level takes precedence. Thus, the value at the session level
is effectively a default that applies to all media streams, unless
overridden by a media-level value.
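The precedence rule and the minimum lengths from the grammar can be sketched as follows. The function name and the dictionary representation of SDP attributes are hypothetical. Resolving each of the two attributes independently is an assumption of this sketch.

```python
import re
from typing import Dict, Tuple

_ICE_CHAR = r"[A-Za-z0-9+/]"
_UFRAG = re.compile(_ICE_CHAR + r"{4,}")    # ufrag = 4*ice-char
_PWD = re.compile(_ICE_CHAR + r"{22,}")     # password = 22*ice-char


def effective_credentials(
    session_attrs: Dict[str, str], media_attrs: Dict[str, str]
) -> Tuple[str, str]:
    """Return (ufrag, password) for one media stream."""
    # The media-level value wins; the session-level value is the default.
    ufrag = media_attrs.get("ice-ufrag", session_attrs.get("ice-ufrag"))
    pwd = media_attrs.get("ice-pwd", session_attrs.get("ice-pwd"))
    if ufrag is None or pwd is None:
        raise ValueError("ice-ufrag and ice-pwd are both required")
    if not _UFRAG.fullmatch(ufrag):
        raise ValueError("ice-ufrag must be at least 4 ice-char")
    if not _PWD.fullmatch(pwd):
        raise ValueError("ice-pwd must be at least 22 ice-char")
    return ufrag, pwd


session = {"ice-ufrag": "8hhY", "ice-pwd": "asd88fgpdd777uzjYhagZg"}
default_creds = effective_credentials(session, {})
override_creds = effective_credentials(session, {"ice-ufrag": "Bzo8"})
```

The session-level values are those used in the example offer in this document; the media-level override value is illustrative.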
14. Example
Two agents, L and R, are using ICE. Both agents have a single IPv4 Two agents, L and R, are using ICE. Both agents have a single IPv4
interface. For agent L, it is 10.0.1.1, and for agent R, 192.0.2.1. interface. For agent L, it is 10.0.1.1, and for agent R, 192.0.2.1.
Both are configured with a single STUN server each (indeed, the same Both are configured with a single STUN server each (indeed, the same
one for each), which is listening for STUN requests at an IP address one for each), which is listening for STUN requests at an IP address
of 192.0.2.2 and port 3478. This STUN server supports both the of 192.0.2.2 and port 3478. This STUN server supports both the
Binding Discovery usage and the Relay usage. Agent L is behind a Binding Discovery usage and the Relay usage. Agent L is behind a
NAT, and agent R is on the public Internet. The public side of the NAT, and agent R is on the public Internet. The NAT has an endpoint
NAT has an IP address of 192.0.2.3. independent mapping property and an address dependent filtering
property. The public side of the NAT has an IP address of 192.0.2.3.
To facilitate understanding, transport addresses are listed using To facilitate understanding, transport addresses are listed using
variables that have mnemonic names. This format of the name is variables that have mnemonic names. The format of the name is
entity-type-seqno, where entity refers to the entity whose interface entity-type-seqno, where entity refers to the entity whose interface
the transport address is on, and is one of "L", "R", "STUN", or the transport address is on, and is one of "L", "R", "STUN", or
"NAT". The type is either "PUB" for transport addresses that are "NAT". The type is either "PUB" for transport addresses that are
public, or "PRIV" for transport addresses that are private. public, or "PRIV" for transport addresses that are private.
Finally, seq-no is a sequence number that is different for each Finally, seq-no is a sequence number that is different for each
transport address of the same type on a particular entity. Each transport address of the same type on a particular entity. Each
variable has an IP address and port, denoted by varname.IP and variable has an IP address and port, denoted by varname.IP and
varname.PORT, respectively, where varname is the name of the varname.PORT, respectively, where varname is the name of the
variable. variable.
In addition, candidate IDs are also listed using variables that have
mnemonic names. Agent L uses candidate ID L1 for its local
candidate, L2 for its server reflexive candidate, and L3 for its
relayed candidate. Agent R uses R1 for its local candidate and R2
for its relayed candidate. The password is LPASS for each candidate
from agent L, and RPASS for each candidate from agent R.
The STUN server has advertised transport address STUN-PUB-1 (which is The STUN server has advertised transport address STUN-PUB-1 (which is
192.0.2.2:3478) for both the binding discovery usage and the relay 192.0.2.2:3478) for both the binding discovery usage and the relay
usage. usage. However, neither agent is using the relay usage.
In the call flow itself, STUN messages are annotated with several In the call flow itself, STUN messages are annotated with several
attributes. The "S=" attribute indicates the source transport attributes. The "S=" attribute indicates the source transport
address of the message. The "D=" attribute indicates the destination address of the message. The "D=" attribute indicates the destination
transport address of the message. The "MA=" attribute is used in transport address of the message. The "MA=" attribute is used in
STUN Binding Response messages, STUN Binding Response messages STUN Binding Response messages and refers to the mapped address.
carried in a STUN Send Request or Data Indication, and in a Allocate
Response, and refers to the reflexive transport address derived from
the XOR-MAPPED-ADDRESS attribute. The "RA=" attribute is used in
STUN Data Indications, and refers to the value of the REMOTE-ADDRESS
attribute. The "U=" attribute is used in STUN Requests, and
corresponds to the STUN USERNAME. The "DA=" attribute is used in
STUN Send requests, and refers to the value of the DESTINATION-
ADDRESS attribute. The "R=" attribute is used in Allocate responses,
and it indicates the value of the RELAY-ADDRESS attribute.
The call flow examples omit STUN authentication operations.
11.1. Basic Example
In this example, the NAT has an endpoint independent mapping property The call flow examples omit STUN authentication operations and RTCP,
and an address dependent filtering property. Neither agent is using and focus on RTP for a single media stream.
the STUN relay usage, only the binding discovery usage. As a
consequence, agent L will end up with two candidates - a local
candidate and a server reflexive candidate. Agent R will have one -
a local candidate (the reflexive candidate will be identical to the
local one, and thus discarded). The agents are seeking to
communicate using a single RTP-based voice stream. RTCP is not used.
As a consequence, each candidate has one component.
L NAT STUN R L NAT STUN R
|RTP STUN alloc. | | |RTP STUN alloc. | |
|(1) STUN Req | | | |(1) STUN Req | | |
|S=$L-PRIV-1 | | | |S=$L-PRIV-1 | | |
|D=$STUN-PUB-1 | | | |D=$STUN-PUB-1 | | |
|------------->| | | |------------->| | |
| |(2) STUN Req | | | |(2) STUN Req | |
| |S=$NAT-PUB-1 | | | |S=$NAT-PUB-1 | |
| |D=$STUN-PUB-1 | | | |D=$STUN-PUB-1 | |
skipping to change at page 69, line 36 skipping to change at page 42, line 21
| | |<-------------| | | |<-------------|
| | |(7) STUN Res | | | |(7) STUN Res |
| | |S=$STUN-PUB-1 | | | |S=$STUN-PUB-1 |
| | |D=$R-PUB-1 | | | |D=$R-PUB-1 |
| | |MA=$R-PUB-1 | | | |MA=$R-PUB-1 |
| | |------------->| | | |------------->|
|(8) answer | | | |(8) answer | | |
|<-------------------------------------------| |<-------------------------------------------|
| |(9) Bind Req | | | |(9) Bind Req | |
| |S=$R-PUB-1 | | | |S=$R-PUB-1 | |
| |D=$NAT-PUB-1 | | | |D=L-PRIV-1 | |
| |<----------------------------| | |<----------------------------|
| |Dropped | | | |Dropped | |
|(10) Bind Req | | | |(10) Bind Req | | |
|S=$L-PRIV-1 | | | |S=$L-PRIV-1 | | |
|D=$R-PUB-1 | | | |D=$R-PUB-1 | | |
|------------->| | | |------------->| | |
| |(11) Bind Req | | | |(11) Bind Req | |
| |S=$NAT-PUB-1 | | | |S=$NAT-PUB-1 | |
| |D=$R-PUB-1 | | | |D=$R-PUB-1 | |
| |---------------------------->| | |---------------------------->|
| |(12) Bind Res | | | |(12) Bind Res | |
| |S=$R-PUB-1 | | | |S=$R-PUB-1 | |
| |D=$NAT-PUB-1 | | | |D=$NAT-PUB-1 | |
| |MA=$NAT-PUB-1 | | | |MA=$NAT-PUB-1 | |
| |<----------------------------| | |<----------------------------|
|(13) Bind Res | | | |(13) Bind Res | | |
|S=$R-PUB-1 | | | |S=$R-PUB-1 | | |
|D=$L-PRIV-1 | | | |D=$L-PRIV-1 | | |
|MA=$NAT-PUB-1 | | | |MA=$NAT-PUB-1 | | |
|<-------------| | | |<-------------| | |
|RTP flows | | | |(14) Offer | | |
| |(14) Bind Req | | |------------------------------------------->|
|(15) Answer | | |
|<-------------------------------------------|
| |(16) Bind Req | |
| |S=$R-PUB-1 | | | |S=$R-PUB-1 | |
| |D=$NAT-PUB-1 | | | |D=$NAT-PUB-1 | |
| |<----------------------------| | |<----------------------------|
|(15) Bind Req | | | |(17) Bind Req | | |
|S=$R-PUB-1 | | | |S=$R-PUB-1 | | |
|D=$L-PRIV-1 | | | |D=$L-PRIV-1 | | |
|<-------------| | | |<-------------| | |
|(16) Bind Res | | | |(18) Bind Res | | |
|S=$L-PRIV-1 | | | |S=$L-PRIV-1 | | |
|D=$R-PUB-1 | | | |D=$R-PUB-1 | | |
|MA=$R-PUB-1 | | | |MA=$R-PUB-1 | | |
|------------->| | | |------------->| | |
| |(17) Bind Res | | | |(19) Bind Res | |
| |S=$NAT-PUB-1 | | | |S=$NAT-PUB-1 | |
| |D=$R-PUB-1 | | | |D=$R-PUB-1 | |
| |MA=$R-PUB-1 | | | |MA=$R-PUB-1 | |
| |---------------------------->| | |---------------------------->|
| | | |RTP flows |RTP flows | | |
Figure 12 Figure 9
First, agent L obtains a server reflexive transport address for its First, agent L obtains a host candidate from its local interface (not
RTP packets (messages 1-4). Recall that the NAT has the address and shown), and from that, sends a STUN Binding Request to the STUN
port independent mapping property. Here, it creates a binding of server to get a server reflexive candidate (messages 1-4). Recall
NAT-PUB-1 for this UDP request, and this becomes the server reflexive that the NAT has the address and port independent mapping property.
transport address for RTP, the sole component of its server reflexive Here, it creates a binding of NAT-PUB-1 for this UDP request, and
candidate. this becomes the server reflexive candidate for RTP.
With its two candidates, agent L prioritizes them, choosing the local Agent L sets a type preference of 9 for the host candidate and 5 for
candidate as highest priority, followed by the server reflexive the server reflexive. The local preference is 9. Based on this, the
candidate. It chooses its server reflexive candidate as the priority of the host candidate is 9909 and for the server reflexive
operating candidate, and encodes it into the m/c-line. The resulting candidate is 5909. The host candidate is assigned a foundation of 1,
offer (message 5) looks like (lines folded for clarity): and the server reflexive, a foundation of 2. It chooses its server
reflexive candidate as the in-use candidate, and encodes it into the
m/c-line. The resulting offer (message 5) looks like (lines folded
for clarity):
v=0 v=0
o=jdoe 2890844526 2890842807 IN IP4 $L-PRIV-1.IP o=jdoe 2890844526 2890842807 IN IP4 $L-PRIV-1.IP
s= s=
c=IN IP4 $NAT-PUB-1.IP c=IN IP4 $NAT-PUB-1.IP
t=0 0 t=0 0
a=ice-pwd:$LPASS a=ice-pwd:asd88fgpdd777uzjYhagZg
a=ice-ufrag:8hhY
m=audio $NAT-PUB-1.PORT RTP/AVP 0 m=audio $NAT-PUB-1.PORT RTP/AVP 0
a=rtpmap:0 PCMU/8000 a=rtpmap:0 PCMU/8000
a=candidate:$L1 1 UDP 1.0 $L-PRIV-1.IP $L-PRIV-1.PORT typ local a=candidate:1 1 UDP 9909 $L-PRIV-1.IP $L-PRIV-1.PORT typ local
a=candidate:$L2 1 UDP 0.7 $NAT-PUB-1.IP $NAT-PUB-1.PORT typ srflx raddr a=candidate:2 1 UDP 5909 $NAT-PUB-1.IP $NAT-PUB-1.PORT typ srflx raddr
$L-PRIV-1.IP rport $L-PRIV-1.PORT $L-PRIV-1.IP rport $L-PRIV-1.PORT
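The priorities 9909 and 5909 in the -10 offer can be reproduced with a small calculation. The following is a minimal sketch, not the draft's definition: the formula is an assumption inferred from the numbers in this example (type preference 9 or 5, local preference 9, component ID 1), and the function name candidate_priority is hypothetical.

```python
def candidate_priority(type_pref: int, local_pref: int, component_id: int) -> int:
    """Combine the preferences into a single candidate priority.

    ASSUMPTION: the weighting 1000 / 100 / (10 - component ID) is inferred
    from the example values (9, 9, component 1 -> 9909); it is not stated
    in this section of the draft.
    """
    return 1000 * type_pref + 100 * local_pref + (10 - component_id)


# Values taken from the example: both candidates carry the RTP component (ID 1).
host_priority = candidate_priority(type_pref=9, local_pref=9, component_id=1)
srflx_priority = candidate_priority(type_pref=5, local_pref=9, component_id=1)

print(host_priority, srflx_priority)
```

Under that assumed weighting, the host candidate outranks the server reflexive candidate purely because of its higher type preference.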
The offer, with the variables replaced with their values, will look The offer, with the variables replaced with their values, will look
like (lines folded for clarity): like (lines folded for clarity):
v=0 v=0
o=jdoe 2890844526 2890842807 IN IP4 10.0.1.1 o=jdoe 2890844526 2890842807 IN IP4 10.0.1.1
s= s=
c=IN IP4 192.0.2.3 c=IN IP4 192.0.2.3
t=0 0 t=0 0
a=ice-pwd:asd88fgpdd777uzjYhagZg a=ice-pwd:asd88fgpdd777uzjYhagZg
a=ice-ufrag:8hhY
m=audio 45664 RTP/AVP 0 m=audio 45664 RTP/AVP 0
a=rtpmap:0 PCMU/8000 a=rtpmap:0 PCMU/8000
a=candidate:8hhY 1 UDP 1.0 10.0.1.1 8998 typ local a=candidate:1 1 UDP 9909 10.0.1.1 8998 typ local
a=candidate:Bzo8 1 UDP 0.7 192.0.2.3 45664 typ srflx raddr a=candidate:2 1 UDP 5909 192.0.2.3 45664 typ srflx raddr
10.0.1.1 rport 8998 10.0.1.1 rport 8998
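To make the layout of the a=candidate attribute in the -10 offer concrete, here is a minimal parsing sketch. It assumes the field order seen in this example (foundation, component ID, transport, priority, IP, port, "typ" type, then optional raddr/rport pairs) and that folded lines have already been joined; the Candidate class and parse_candidate function are hypothetical names, not part of the draft.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    foundation: str
    component_id: int
    transport: str
    priority: int
    ip: str
    port: int
    typ: str
    raddr: Optional[str] = None
    rport: Optional[int] = None


def parse_candidate(line: str) -> Candidate:
    """Parse one unfolded a=candidate line as it appears in this example."""
    prefix = "a=candidate:"
    if not line.startswith(prefix):
        raise ValueError("not a candidate attribute")
    tokens = line[len(prefix):].split()
    if len(tokens) < 8 or tokens[6] != "typ":
        raise ValueError("malformed candidate attribute")
    cand = Candidate(tokens[0], int(tokens[1]), tokens[2], int(tokens[3]),
                     tokens[4], int(tokens[5]), tokens[7])
    # Anything after the type is treated as name/value pairs (raddr, rport).
    extras = dict(zip(tokens[8::2], tokens[9::2]))
    if "raddr" in extras:
        cand.raddr = extras["raddr"]
    if "rport" in extras:
        cand.rport = int(extras["rport"])
    return cand


host = parse_candidate("a=candidate:1 1 UDP 9909 10.0.1.1 8998 typ local")
srflx = parse_candidate(
    "a=candidate:2 1 UDP 5909 192.0.2.3 45664 typ srflx raddr 10.0.1.1 rport 8998"
)
```

For the server reflexive candidate, raddr and rport carry the private address and port it was derived from, which is the same address the host candidate advertises.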
This offer is received at agent R. Agent R will gather its server This offer is received at agent R. Agent R will obtain a host
reflexive transport address (messages 6-7). Since R is not behind a candidate, and from it, obtain a server reflexive candidate (messages
NAT, this address is identical to its local transport address, and 6-7). Since R is not behind a NAT, this candidate is identical to
was obtained from its local transport address, and thus does not its host candidate, and they share the same base. It therefore
represent a separate candidate. It therefore ends up with a single discards this candidate and ends up with a single host candidate.
local candidate with a single component for RTP. Its resulting With identical type and local preferences as L, the priority for this
answer looks like: candidate is 9909. It chooses a foundation of 1 for its single
candidate. Its resulting answer looks like:
v=0 v=0
o=bob 2808844564 2808844564 IN IP4 $R-PUB-1.IP o=bob 2808844564 2808844564 IN IP4 $R-PUB-1.IP
s= s=
c=IN IP4 $R-PUB-1.IP c=IN IP4 $R-PUB-1.IP
t=0 0 t=0 0
a=ice-pwd:$RPASS a=ice-pwd:YH75Fviy6338Vbrhrlp8Yh
a=ice-ufrag:9uB6
m=audio $R-PUB-1.PORT RTP/AVP 0 m=audio $R-PUB-1.PORT RTP/AVP 0
a=rtpmap:0 PCMU/8000 a=rtpmap:0 PCMU/8000
a=candidate:$R1 1 UDP 1.0 $R-PUB-1.IP $R-PUB-1.PORT typ local a=candidate:1 1 UDP 9909 $R-PUB-1.IP $R-PUB-1.PORT typ local
With the variables filled in: With the variables filled in:
v=0 v=0
o=bob 2808844564 2808844564 IN IP4 192.0.2.1 o=bob 2808844564 2808844564 IN IP4 192.0.2.1
s= s=
c=IN IP4 192.0.2.1 c=IN IP4 192.0.2.1
t=0 0 t=0 0
a=ice-pwd:YH75Fviy6338Vbrhrlp8Yh a=ice-pwd:YH75Fviy6338Vbrhrlp8Yh
a=ice-ufrag:9uB6
m=audio 3478 RTP/AVP 0 m=audio 3478 RTP/AVP 0
a=rtpmap:0 PCMU/8000 a=rtpmap:0 PCMU/8000
a=candidate:9uB6 1 UDP 1.0 192.0.2.1 3478 typ local a=candidate:1 1 UDP 9909 192.0.2.1 3478 typ local
Next, agents L and R form candidate pairs, the candidate pair Agents L and R both pair up the candidates. They both initially have
priority ordered list and transport address pair check ordered list. two. However, agent L will prune the pair containing its server
The candidate pair priority ordered list will have two entries, and reflexive candidate, resulting in just one. At agent L, this pair
be identical for L and R. The highest priority one will be the one (the check) has a local candidate of $L-PRIV-1 and remote candidate
containing L2 and R1 (since it's the operating candidate pair), and of $R-PUB-1, and has a candidate pair priority of 99099909.039. At
the second one will be L1 and R1. The transport address pair check agent R, there are two checks. The highest priority has a local
ordered list initially starts with two entries. For agent L, this candidate of $R-PUB-1 and remote candidate of $L-PRIV-1 and has a
will be L2:1:R1:1 and L1:1:R1:1. However, after the trimming priority of 99099909.039, and the second has a local candidate of
operation, agent L will remove the second transport address pair, $R-PUB-1 and remote candidate of $NAT-PUB-1 and priority 59099909.75.
since it shares the same origination transport address as the first
(L-PRIV-1 for both). However, R will keep both transport address
pairs.
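The pairing and pruning step can be sketched as follows. This is an illustration under stated assumptions, not the draft's algorithm: each local candidate is represented by the transport address it is actually sent from (its base, L-PRIV-1 for both of agent L's candidates), duplicate (origin, remote) pairs are dropped, and pair priorities are not computed. The names Cand and form_checks are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Cand(NamedTuple):
    name: str  # transport address label, e.g. "NAT-PUB-1"
    base: str  # address that checks for this candidate are sent from


def form_checks(local: List[Cand], remote: List[Cand]) -> List[Tuple[str, str]]:
    """Pair every local candidate with every remote one, then prune pairs
    that would originate from the same address towards the same remote."""
    seen = set()
    checks = []
    for l in local:            # assumed already sorted by priority
        for r in remote:
            key = (l.base, r.name)
            if key in seen:    # same origin and destination: redundant check
                continue
            seen.add(key)
            checks.append(key)
    return checks


# Agent L: host candidate plus server reflexive candidate derived from it.
l_cands = [Cand("L-PRIV-1", "L-PRIV-1"), Cand("NAT-PUB-1", "L-PRIV-1")]
# Agent R: a single host candidate.
r_cands = [Cand("R-PUB-1", "R-PUB-1")]

checks_at_l = form_checks(l_cands, r_cands)
checks_at_r = form_checks(r_cands, l_cands)
```

In this sketch agent L is left with one check and agent R keeps two, which matches the outcome described in the text: only L has two candidates sharing one origination address.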
Agent R begins its connectivity check (message 9) for transport Agent R begins its connectivity check (message 9) for the first pair
address pair L2:1:R1:1 (note that, from its perspective, the (between the two host candidates). The host candidate from agent L
transport address pair has the ID R1:1:L2:1, and this ID would appear is private and behind a different NAT, and thus this check is
in the USERNAME of STUN requests it receives). Since the NAT has a
filtering policy of address dependent, the connectivity check is
discarded. discarded.
When agent L gets the answer, it begins its connectivity check for When agent L gets the answer, it performs its one and only
L2:1:R1:1 (messages 10-13), which succeed, placing the transport connectivity check (messages 10-13). This will succeed. This causes
address pair and resulting candidate pair into the Recv-Valid state. agent L to create a new pair, whose local candidate is from the mapped
L can now send media to R. When agent R receives the connectivity address in the binding response (NAT-PUB-1 from message 13) and whose
check (message 11), it is a match for the transport address pair, and remote candidate is the destination of the request (R-PUB-1 from
the state of the transport address pair moves to Send-Valid. Agent R message 10). This is added to the valid list. At this point, agent
begins its connectivity checks (messages 14-17). When the check L examines the valid list and sees that there is a candidate there
arrives at the NAT (message 14), it is permitted to pass since a for each component of each media stream (which is just RTP for the
permission was created towards R-PUB-1 as a consequence of message single audio stream). It therefore considers ICE checks complete and
10. This check arrives at agent L, which generates a success sends an updated offer (message 14). This offer serves only to
response (message 16), and updates the state of the transport address remove the candidate that was not selected and indicate the remote
pair to Valid. This response arrives at agent R, which also updates candidates; the m/c-line remains unchanged. This offer looks like:
the state of the transport address pair to Valid. Now, media can
flow from agent R to agent L as well.
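The effect of address dependent filtering described in the -09 text (R's first check is discarded, but its later check passes once L has sent message 10 towards R-PUB-1) can be modelled with a toy NAT. This is a simplified sketch: the class AddressDependentNat is hypothetical, a label such as "R-PUB-1" stands in for the remote IP address, and timeouts and port handling are ignored.

```python
class AddressDependentNat:
    """Toy model: inbound traffic is accepted only from remote addresses
    the internal host has already sent a packet to."""

    def __init__(self) -> None:
        self.permitted = set()

    def outbound(self, dest: str) -> None:
        # Sending a packet creates a permission towards the destination.
        self.permitted.add(dest)

    def inbound_allowed(self, source: str) -> bool:
        return source in self.permitted


nat = AddressDependentNat()

# Message 9: R checks towards L before L has sent anything to R.
early_check_passes = nat.inbound_allowed("R-PUB-1")

# Message 10: L sends its Binding Request to R-PUB-1 through the NAT.
nat.outbound("R-PUB-1")

# Message 14: R's check now arrives after the permission exists.
later_check_passes = nat.inbound_allowed("R-PUB-1")
```

The ordering matters: R's check can only succeed after L's own check has opened a permission on the NAT.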
11.2. Advanced Example
In this more advanced example, the NAT has address and port dependent
mapping and filtering properties. Both agents use the STUN relay
usage in addition to the binding discovery usage. As a consequence,
agent L will end up with three candidates - a local candidate, a
relayed candidate, and a server reflexive candidate. Agent R will
have two - a local candidate and a relayed candidate (the server
reflexive candidate will equal the local candidate and thus not be
used). The agents are seeking to communicate using a single RTP-
based voice stream, but are using RTCP. As a consequence, each
candidate has two components - one for RTP and one for RTCP.
L NAT STUN R
| | | |
| | | |
| | | |
|RTP Alloc. | | |
| | | |
| | | |
| | | |
|(1) Alloc Req | | |
|S=L-PRIV-1 | | |
|D=STUN-PUB-1 | | |
|------------->| | |
| | | |
| | | |
| |(2) Alloc Req | |
| |S=NAT-PUB-1 | |
| |D=STUN-PUB-1 | |
| |------------->| |
| |(3) Alloc Res | |