draft-ietf-mmusic-ice-05.txt   draft-ietf-mmusic-ice-06.txt 
MMUSIC J. Rosenberg MMUSIC J. Rosenberg
Internet-Draft Cisco Systems Internet-Draft Cisco Systems
Expires: January 18, 2006 July 17, 2005 Expires: April 22, 2006 October 19, 2005
Interactive Connectivity Establishment (ICE): A Methodology for Network Interactive Connectivity Establishment (ICE): A Methodology for Network
Address Translator (NAT) Traversal for Offer/Answer Protocols Address Translator (NAT) Traversal for Offer/Answer Protocols
draft-ietf-mmusic-ice-05 draft-ietf-mmusic-ice-06
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 34 skipping to change at page 1, line 34
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on January 18, 2006. This Internet-Draft will expire on April 22, 2006.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (2005). Copyright (C) The Internet Society (2005).
Abstract Abstract
This document describes a methodology for Network Address Translator This document describes a protocol for Network Address Translator
(NAT) traversal for multimedia session signaling protocols, such as (NAT) traversal for multimedia session signaling protocols based on
the Session Initiation Protocol (SIP). This methodology is called the offer/answer model, such as the Session Initiation Protocol
Interactive Connectivity Establishment (ICE). ICE makes use of (SIP). This protocol is called Interactive Connectivity
existing protocols, such as Simple Traversal of UDP Through NAT Establishment (ICE). ICE makes use of existing protocols, such as
(STUN) and Traversal Using Relay NAT (TURN). ICE makes use of STUN Simple Traversal of UDP Through NAT (STUN) and Traversal Using Relay
in peer-to-peer cooperative fashion, allowing participants to NAT (TURN). ICE makes use of STUN in peer-to-peer cooperative
discover, create and verify mutual connectivity. fashion, allowing participants to discover, create and verify mutual
connectivity.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . 4
3. Overview of ICE . . . . . . . . . . . . . . . . . . . . . . . 6 3. Overview of ICE . . . . . . . . . . . . . . . . . . . . . . 8
4. Sending the Initial Offer . . . . . . . . . . . . . . . . . . 8 4. Sending the Initial Offer . . . . . . . . . . . . . . . . . 10
5. Receipt of the Offer and Generation of the Answer . . . . . . 9 5. Receipt of the Offer and Generation of the Answer . . . . . 11
6. Processing the Answer . . . . . . . . . . . . . . . . . . . . 9 6. Processing the Answer . . . . . . . . . . . . . . . . . . . 11
7. Common Procedures . . . . . . . . . . . . . . . . . . . . . . 10 7. Common Procedures . . . . . . . . . . . . . . . . . . . . . 11
7.1 Gathering Candidates . . . . . . . . . . . . . . . . . . . 10 7.1 Gathering Candidates . . . . . . . . . . . . . . . . . . . 12
7.2 Encoding Candidates into SDP . . . . . . . . . . . . . . . 13 7.2 Prioritizing the Candidates and Choosing an Active One . . 15
7.3 Prioritizing the Transport Addresses and Choosing an 7.3 Encoding Candidates into SDP . . . . . . . . . . . . . . . 17
Active One . . . . . . . . . . . . . . . . . . . . . . . . 15 7.4 Forming Candidate Pairs . . . . . . . . . . . . . . . . . 19
7.4 Connectivity Checks . . . . . . . . . . . . . . . . . . . 17 7.5 Ordering the Candidate Pairs . . . . . . . . . . . . . . . 22
7.4.1 UDP Connectivity Checks . . . . . . . . . . . . . . . 19 7.6 Performing the Connectivity Checks . . . . . . . . . . . . 23
7.4.1.1 Send Validation . . . . . . . . . . . . . . . . . 19 7.7 Sending a Binding Request for Connectivity Checks . . . . 27
7.4.1.2 Receive Validation . . . . . . . . . . . . . . . . 20 7.8 Receiving a Binding Request for Connectivity Checks . . . 29
7.4.1.3 Learning New Candidates from Connectivity 7.9 Promoting a Candidate to Active . . . . . . . . . . . . . 31
Checks . . . . . . . . . . . . . . . . . . . . . . 22 7.10 Learning New Candidates from Connectivity Checks . . . . 31
7.4.1.3.1 On Receipt of a Binding Request . . . . . . . 23 7.10.1 On Receipt of a Binding Request . . . . . . . . . . 32
7.4.1.3.2 On Receipt of a Binding Response . . . . . . . 26 7.10.2 On Receipt of a Binding Response . . . . . . . . . . 35
7.4.2 TCP Connectivity Checks . . . . . . . . . . . . . . . 26 7.11 Subsequent Offer/Answer Exchanges . . . . . . . . . . . 37
7.4.2.1 Connection Establishment . . . . . . . . . . . . . 26 7.11.1 Sending of a Subsequent Offer . . . . . . . . . . . 37
7.4.2.2 Sending STUN Binding Requests . . . . . . . . . . 27 7.11.2 Receiving the Offer and Sending an Answer . . . . . 39
7.4.2.3 Receiving STUN Requests . . . . . . . . . . . . . 29 7.11.3 Receiving the Answer . . . . . . . . . . . . . . . . 41
7.5 Promoting a Valid Candidate to Active . . . . . . . . . . 30 7.12 Binding Keepalives . . . . . . . . . . . . . . . . . . . 41
7.5.1 Minimum Requirements . . . . . . . . . . . . . . . . . 30 7.13 Sending Media . . . . . . . . . . . . . . . . . . . . . 42
7.5.2 Suggested Algorithm . . . . . . . . . . . . . . . . . 31 8. Guidelines for Usage with SIP . . . . . . . . . . . . . . . 43
7.6 Subsequent Offer/Answer Exchanges . . . . . . . . . . . . 33 9. Interactions with Forking . . . . . . . . . . . . . . . . . 44
7.6.1 Sending of an Offer . . . . . . . . . . . . . . . . . 33 10. Interactions with Preconditions . . . . . . . . . . . . . . 45
7.6.2 Receiving the Offer and Sending an Answer . . . . . . 34 11. Example . . . . . . . . . . . . . . . . . . . . . . . . . . 45
7.6.3 Receiving the Answer . . . . . . . . . . . . . . . . . 36 12. Grammar . . . . . . . . . . . . . . . . . . . . . . . . . . 68
7.7 Binding Keepalives . . . . . . . . . . . . . . . . . . . . 37 13. Security Considerations . . . . . . . . . . . . . . . . . . 69
7.8 Sending Media . . . . . . . . . . . . . . . . . . . . . . 38 13.1 Attacks on Connectivity Checks . . . . . . . . . . . . . 69
8. Interactions with Forking . . . . . . . . . . . . . . . . . . 38 13.2 Attacks on Address Gathering . . . . . . . . . . . . . . 72
9. Interactions with Preconditions . . . . . . . . . . . . . . . 38 13.3 Attacks on the Offer/Answer Exchanges . . . . . . . . . 73
10. Example . . . . . . . . . . . . . . . . . . . . . . . . . . 39 13.4 Insider Attacks . . . . . . . . . . . . . . . . . . . . 73
11. Grammar . . . . . . . . . . . . . . . . . . . . . . . . . . 39 13.4.1 The Voice Hammer Attack . . . . . . . . . . . . . . 73
12. Security Considerations . . . . . . . . . . . . . . . . . . 40 13.4.2 STUN Amplification Attack . . . . . . . . . . . . . 74
13. IANA Considerations . . . . . . . . . . . . . . . . . . . . 42 14. IANA Considerations . . . . . . . . . . . . . . . . . . . . 74
14. IAB Considerations . . . . . . . . . . . . . . . . . . . . . 42 14.1 candidate Attribute . . . . . . . . . . . . . . . . . . 74
14.1 Problem Definition . . . . . . . . . . . . . . . . . . . . 42 14.2 remote-candidate Attribute . . . . . . . . . . . . . . . 75
14.2 Exit Strategy . . . . . . . . . . . . . . . . . . . . . . 43 15. IAB Considerations . . . . . . . . . . . . . . . . . . . . . 75
14.3 Brittleness Introduced by ICE . . . . . . . . . . . . . . 43 15.1 Problem Definition . . . . . . . . . . . . . . . . . . . 75
14.4 Requirements for a Long Term Solution . . . . . . . . . . 44 15.2 Exit Strategy . . . . . . . . . . . . . . . . . . . . . 76
14.5 Issues with Existing NAPT Boxes . . . . . . . . . . . . . 45 15.3 Brittleness Introduced by ICE . . . . . . . . . . . . . 76
15. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 45 15.4 Requirements for a Long Term Solution . . . . . . . . . 77
16. References . . . . . . . . . . . . . . . . . . . . . . . . . 45 15.5 Issues with Existing NAPT Boxes . . . . . . . . . . . . 78
16.1 Normative References . . . . . . . . . . . . . . . . . . . 45 16. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 78
16.2 Informative References . . . . . . . . . . . . . . . . . . 46 17. References . . . . . . . . . . . . . . . . . . . . . . . . . 78
Author's Address . . . . . . . . . . . . . . . . . . . . . . . 47 17.1 Normative References . . . . . . . . . . . . . . . . . . 78
Intellectual Property and Copyright Statements . . . . . . . . 48 17.2 Informative References . . . . . . . . . . . . . . . . . 79
Author's Address . . . . . . . . . . . . . . . . . . . . . . 81
Intellectual Property and Copyright Statements . . . . . . . 82
1. Introduction 1. Introduction
A multimedia session signaling protocol is a protocol that exchanges A multimedia session signaling protocol is a protocol that exchanges
control messages between a pair of agents for the purposes of control messages between a pair of agents for the purposes of
establishing the flow of media traffic between them. This media flow establishing the flow of media traffic between them. This media flow
is distinct from the flow of control messages, and may take a is distinct from the flow of control messages, and may take a
different path through the network. Examples of such protocols are different path through the network. Examples of such protocols are
the Session Initiation Protocol (SIP) [3], the Real Time Streaming the Session Initiation Protocol (SIP) [3], the Real Time Streaming
Protocol (RTSP) [16] and the International Telecommunications Union Protocol (RTSP) [17] and the International Telecommunications Union
(ITU) H.323. (ITU) H.323.
These protocols, by nature of their design, are difficult to operate These protocols, by nature of their design, are difficult to operate
through Network Address Translators (NAT). Because their purpose in through Network Address Translators (NAT). Because their purpose is
life is to establish a flow of packets, they tend to carry IP to establish a flow of media packets, they tend to carry IP addresses
addresses within their messages, which is known to be problematic within their messages, which is known to be problematic through NAT
through NAT [17]. The protocols also seek to create a media flow [18]. The protocols also seek to create a media flow directly
directly between participants, so that there is no application layer between participants, so that there is no application layer
intermediary between them. This is done to reduce media latency, intermediary between them. This is done to reduce media latency,
decrease packet loss, and reduce the operational costs of deploying decrease packet loss, and reduce the operational costs of deploying
the application. However, this is difficult to accomplish through the application. However, this is difficult to accomplish through
NAT. A full treatment of the reasons for this is beyond the scope of NAT. A full treatment of the reasons for this is beyond the scope of
this specification. this specification.
Numerous solutions have been proposed for allowing these protocols to Numerous solutions have been proposed for allowing these protocols to
operate through NAT. These include Application Layer Gateways operate through NAT. These include Application Layer Gateways
(ALGs), the Middlebox Control Protocol [18], Simple Traversal of UDP (ALGs), the Middlebox Control Protocol [20], Simple Traversal of UDP
through NAT (STUN) [1], Traversal Using Relay NAT [14], and Realm through NAT (STUN) [1], Traversal Using Relay NAT [16], and Realm
Specific IP [19] [20] along with session description extensions Specific IP [21] [22] along with session description extensions
needed to make them work, such as the Session Description Protocol needed to make them work, such as the Session Description Protocol
(SDP) [7] attribute for the Real Time Control Protocol (RTCP) [2]. (SDP) [7] attribute for the Real Time Control Protocol (RTCP) [2].
Unfortunately, these techniques all have pros and cons which make Unfortunately, these techniques all have pros and cons which make
each one optimal in some network topologies, but a poor choice in each one optimal in some network topologies, but a poor choice in
others. The result is that administrators and implementors are others. The result is that administrators and implementors are
making assumptions about the topologies of the networks in which making assumptions about the topologies of the networks in which
their solutions will be deployed. This introduces complexity and their solutions will be deployed. This introduces complexity and
brittleness into the system. What is needed is a single solution brittleness into the system. What is needed is a single solution
which is flexible enough to work well in all situations. which is flexible enough to work well in all situations.
This specification provides that solution for protocols based on the This specification provides that solution for media streams
offer-answer model, RFC 3264 [4]. It is called Interactive established by signaling protocols based on the offer-answer model,
Connectivity Establishment, or ICE. ICE makes use of STUN and TURN, RFC 3264 [5]. It is called Interactive Connectivity Establishment,
but uses them in a specific methodology which avoids many of the or ICE. ICE makes use of STUN and TURN, but uses them in a specific
pitfalls of using any one alone. methodology which avoids many of the pitfalls of using any one alone.
2. Terminology 2. Terminology
Several new terms are introduced in this specification: Several new terms are introduced in this specification:
Agent: As defined in RFC 3264, an agent is the protocol
implementation involved in the offer/answer exchange. There are
two agents involved in an offer/answer exchange.
Peer: From the perspective of one of the agents in a session, its Peer: From the perspective of one of the agents in a session, its
peer is the other agent. Specifically, from the perspective of peer is the other agent. Specifically, from the perspective of
the offerer, the peer is the answerer. From the perspective of the offerer, the peer is the answerer. From the perspective of
the answerer, the peer is the offeror. the answerer, the peer is the offerer.
Transport Address: The combination of an IP address and port. Transport Address: The combination of an IP address and port.
Local Transport Address: A local transport address a transport Local Transport Address: A local transport address is a transport
address that has been allocated from the operating system on the address that has been allocated from the operating system on the
host. This includes transport addresses obtained through Virtual host. This includes transport addresses obtained through Virtual
Private Networks (VPNs) and transport addresses obtained through Private Networks (VPNs) and transport addresses obtained through
Realm Specific IP (RSIP) [19] (which lives at the operating system Realm Specific IP (RSIP) [21] (which lives at the operating system
level). Transport addresses are typically obtained by binding to level). Transport addresses are typically obtained by binding to
an interface. an interface.
m/c line: The media and connection lines in the SDP, which together m/c line: The media and connection lines in the SDP, which together
hold the transport address used for the receipt of media. hold the transport address used for the receipt of media.
Derived Transport Address: A derived transport address is a transport Derived Transport Address: A derived transport address is a transport
address which is derived from a local transport address. The address which is derived from a local transport address. The
derived transport address is related to the associated local derived transport address is related to the associated local
transport address in that packets sent to the derived transport transport address in that packets sent to the derived transport
address are received on the socket bound to its associated local address are received on the socket bound to its associated local
transport address. Derived addresses are obtained using protocols transport address. Derived addresses are obtained using protocols
like STUN and TURN, and more generally, any UNSAF protocol [21]. like STUN and TURN, and more generally, any UNSAF protocol [23].
Candidate Transport Address: A transport address advertised by a Associated Local Transport Address: When a peer sends a packet to a
agent in an offer or answer. A candidate transport address can transport address, the associated local transport address is the
either by a local transport address or a derived transport local transport address at which those packets will actually
address. arrive. For a local transport address, its associated local
transport address is the same as the local transport address
itself. For STUN derived and TURN derived transport addresses,
however, they are not the same. The associated local transport
address is the one from which the STUN or TURN transport was
derived.
Peer Derived Transport Address: A peer derived transport address is a Peer Derived Transport Address: A peer derived transport address is a
derived transport address learned from a STUN server running derived transport address learned from a STUN server running
within a peer in a media session. within a peer in a media session.
TURN Derived Transport Address: A derived transport address obtained TURN Derived Transport Address: A derived transport address obtained
from a TURN server. from a TURN server.
STUN Derived Transport Address: A derived transport address obtained STUN Derived Transport Address: A derived transport address obtained
from a STUN server whose address has been provisioned into the UA. from a STUN server whose address has been provisioned or
This, by definition, excludes Peer Derived Transport Addresses. discovered by the UA. This, by definition, excludes Peer Derived
Transport Addresses.
Candidate: A sequence of candidate transport addresses that form an Candidate: A sequence of transport addresses that form an atomic set
atomic set for usage with a particular media stream. In the case for usage with a particular media session. Here, atomic means
of RTP, there are two candidate transport addresses per candidate: that all of the transport addresses in the candidate need to work
one for RTP, and another for RTCP. Connectivity is verified to before the candidate will be used for actual media transport. In
all of the candidate transport addresses within a candidate before the case of RTP, there can be one or more transport addresses per
that candidate is used. The transport addresses that compose a candidate. In the most common case, there are two - one for RTP,
candidate are all of the same type - local, STUN derived, TURN and another for RTCP. If the agent doesn't use RTCP, there would
derived or peer derived. be just one. If Generic Forward Error Correction (FEC) [19] is in
use, there may be more than two. The transport addresses that
compose a candidate are all of the same type - local, STUN
derived, TURN derived or peer derived.
Local Candidate: A candidate whose transport addresses are local Local Candidate: A candidate whose transport addresses are local
transport addresses. transport addresses.
STUN Candidate: A candidate whose transport addresses are STUN STUN Candidate: A candidate whose transport addresses are STUN
derived transport addresses. derived transport addresses.
TURN Candidate: A candidate whose transport addresses are TURN TURN Candidate: A candidate whose transport addresses are TURN
derived transport addresses. derived transport addresses.
Peer Candidate: A candidate whose transport addresses are peer Peer Derived Candidate: A candidate whose transport addresses are
derived transport addresses. peer derived transport addresses.
Generating Candidate: The candidate from which a peer derived
candidate is derived.
Active Candidate: The candidate that is in use for exchange of media. Active Candidate: The candidate that is in use for exchange of media.
This is the one that an agent places in the m/c line of an offer This is the one that an agent places in the m/c line of an offer
or answer. or answer.
Candidate ID: An identifier for a candidate.
Component: When a media stream, and as a consequence, its candidate,
require several IP addresses and ports to work atomically, each of
the constituent IP addresses and ports represents a component of
that media stream. For example, RTP-based media streams typically
have two components - one for RTP, and one for RTCP.
Component ID: An integer, starting with one within each candidate and
incrementing by one for each component, which identifies the
component.
Transport Address ID (tid): An identifier for a transport address,
formed by concatenating the candidate ID with the component ID,
separated by a "colon".
Candidate Pair: The combination of a candidate from one agent along
with a candidate from its peer.
Native Candidate: From the perspective of each agent, the candidate
in a candidate pair which represents a set of addresses obtained
by that agent.
Remote Candidate: From the perspective of each agent, the candidate
in a candidate pair which represents the set of addresses obtained
by that agent's peer.
Transport Address Pair: The combination of the transport address for
one component of a candidate with the transport address of the
same component for the matching candidate in a candidate pair.
Transport Address Pair ID: An identifier for a transport address
pair. Formed by concatenating the native transport address ID
with the remote transport address ID, separated by a "colon".
Matching Transport Address Pair: When a STUN Binding Request is
received on a local transport address, the matching transport
address pair is the transport address pair whose connectivity is
being checked by that Binding Request.
Candidate Pair Priority Ordering: An ordering of candidate pairs
based on a combination of the qvalues of each candidate and the
candidate IDs of each candidate.
Candidate Pair Check Ordering: An ordering of candidate pairs that is
similar to the candidate pair priority ordering, except that the
active candidate appears at the top of the list, regardless of its
priority.
Transport Address Pair Check Ordering: An ordering of transport
address pairs that determines the sequence of connectivity checks
performed for the pairs.
Transport Address Pair Count: The number of transport address pairs
in a candidate pair. This is equal to the minimum of the number
of transport addresses in the native candidate and the number of
transport addresses in the remote candidate.
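The identifier and count definitions above can be illustrated with a short sketch. It is not part of either draft revision. The class name Candidate, its field names, and the helper function names are hypothetical, chosen only for this illustration; the colon-separated concatenation, the component numbering from one, and the use of the minimum for the pair count follow the definitions given in this section.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    """Hypothetical container for one candidate (names are illustrative only)."""
    candidate_id: str
    # One (IP address, port) per component, in component order,
    # e.g. RTP first and RTCP second for a typical RTP stream.
    transport_addresses: List[Tuple[str, int]]

    def tid(self, component_id: int) -> str:
        """Transport address ID: candidate ID, a colon, then the component ID."""
        if not 1 <= component_id <= len(self.transport_addresses):
            raise ValueError("component IDs start at one within each candidate")
        return f"{self.candidate_id}:{component_id}"


def transport_address_pair_count(native: Candidate, remote: Candidate) -> int:
    """Minimum of the number of transport addresses in the two candidates."""
    return min(len(native.transport_addresses), len(remote.transport_addresses))


def transport_address_pair_ids(native: Candidate, remote: Candidate) -> List[str]:
    """Native tid, a colon, then the remote tid, for each shared component."""
    count = transport_address_pair_count(native, remote)
    return [f"{native.tid(c)}:{remote.tid(c)}" for c in range(1, count + 1)]
```

For example, pairing a native candidate that has RTP and RTCP components with a remote candidate that has only an RTP component gives a transport address pair count of one, so only the pair for component 1 is formed.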
3. Overview of ICE 3. Overview of ICE
ICE makes the fundamental assumption that clients exist in a network ICE makes the fundamental assumption that clients exist in a network
of segmented connectivity. This segmentation is the result of a of segmented connectivity. This segmentation is the result of a
number of addressing realms in which a client can simultaneously be number of addressing realms in which a client can simultaneously be
connected. We use "realms" here in the broadest sense. A realm is connected. We use "realms" here in the broadest sense. A realm is
defined purely by connectivity. Two clients are in the same realm defined purely by connectivity. Two clients are in the same realm
if, when they exchange the addresses each has in that realm, they are if, when they exchange the addresses each has in that realm, they are
able to send packets to each other. This includes IPv6 and IPv4 able to send packets to each other. This includes IPv6 and IPv4
realms, which actually use different address spaces, in addition to realms, which actually use different address spaces, in addition to
skipping to change at page 7, line 14 skipping to change at page 8, line 31
Agent A TURN,STUN Servers Agent B Agent A TURN,STUN Servers Agent B
|(1) Gather Addresses | | |(1) Gather Addresses | |
|-------------------->| | |-------------------->| |
|(2) Offer | | |(2) Offer | |
|------------------------------------------>| |------------------------------------------>|
| |(3) Gather Addresses | | |(3) Gather Addresses |
| |<--------------------| | |<--------------------|
|(4) Answer | | |(4) Answer | |
|<------------------------------------------| |<------------------------------------------|
|(5) Media | | |(5) STUN Check | |
|<------------------------------------------|
|(6) Media | |
|------------------------------------------>|
|(7) STUN Checks | |
|<------------------------------------------| |<------------------------------------------|
|(8) STUN Checks | | |(6) STUN Check | |
|------------------------------------------>| |------------------------------------------>|
|(9) Offer | | |(7) Offer | |
|------------------------------------------>| |------------------------------------------>|
|(10) Answer | | |(8) Answer | |
|<------------------------------------------| |<------------------------------------------|
|(11) Media | | |(9) Media | |
|<------------------------------------------| |<------------------------------------------|
|(12) Media | | |(10) Media | |
|------------------------------------------>| |------------------------------------------>|
Figure 1 Figure 1
The basic flow of operation for ICE is shown in Figure 1. Before the The basic flow of operation for ICE is shown in Figure 1. Before the
offeror establishes a session, it obtains local transport addresses offerer establishes a session, it obtains local transport addresses
from its operating system on as many interfaces as it has access to. from its operating system on as many interfaces as it has access to.
These interfaces can include IPv4 and IPv6 interfaces, in addition to These interfaces can include IPv4 and IPv6 interfaces, in addition to
Virtual Private Network (VPN) interfaces or ones associated with Virtual Private Network (VPN) interfaces or ones associated with
RSIP. For media protocols that support both UDP and TCP (such as the RSIP. It then obtains transport addresses for the media from each
Real Time Transport Protocol (RTP) [22], which can run over either), interface. Though the ICE framework can support any type of
it obtains both TCP and UDP transport addresses. In addition, the transport protocol, this specification only defines mechanisms for
agent obtains derived transport addresses from each local transport UDP. In addition, the agent obtains derived transport addresses from
address using protocols such as STUN and TURN. Each local and each local transport address using protocols such as STUN and TURN.
derived transport address becomes a candidate for receipt of media These are paced at a fixed rate in order to limit network load and
traffic. avoid NAT overload. The local and derived transport addresses are
formed into candidates, each of which represents a possible set of
The agent will choose one of its candidate transport addresses as its transport addresses that might be viable for a media stream.
initial media transport address for inclusion in the connection and
media lines in the offer. This transport address will be utilized
for media traffic while connectivity is verified to all of the
candidates. Since these checks may take time to execute, media
clipping will occur if the media transport address is not reachable
by the peer. To minimize the probability of clipping, the transport
address that is most likely to work is chosen. This is normally a
TURN-derived transport address, but others can be utilized based on
local policy.
Each candidate transport address (including the one being used as the Each candidate is listed in a set of a=candidate attributes in the
media transport address) is listed in an a=candidate attribute in the offer. Each candidate is given a priority. Priority is a matter of
offer. Each candidate is given a preference. Preference is a matter local policy, but typically, lowest priority would be given to
of local policy, but typically, lowest preference would be given to
transport addresses learned from a TURN server (i.e., TURN derived transport addresses learned from a TURN server (i.e., TURN derived
transport addresses). Each candidate is also assigned a distinct ID, transport addresses). Each candidate is also assigned a distinct ID,
called a transport ID (tid). called a candidate ID.
The agent will choose one of its candidates as its active candidate
for inclusion in the connection and media lines in the offer. Media
can be sent to this candidate immediately following its validation.
Media is not sent without validation in order to avoid denial-of-
service attacks. In particular, without ICE, an offerer can send an
offer to another agent, and list the IP address and port of a target
in the offer. If the agent is an automaton that answers a call
automatically, it will do so and then proceed to send media to the
target. This provides substantial packet amplification. ICE fixes
this by using STUN-based validation of addresses.
The offer is then sent to the answerer. This specification does not The offer is then sent to the answerer. This specification does not
address the issue of how the signaling messages themselves traverse address the issue of how the signaling messages themselves traverse
NAT. It is assumed that signaling protocol specific mechanisms are NAT. It is assumed that signaling protocol specific mechanisms are
used for that purpose. The answerer follows a similar process as the used for that purpose. The answerer follows a similar process as the
offeror followed; it obtains addresses from local interfaces, obtains offerer followed; it obtains addresses from local interfaces, obtains
derived transport addresses from those, and the combination becomes derived transport addresses from those, and then groups them into
its set of candidate transport addresses. It picks one as its candidates for inclusion in a=candidate attributes in the answer. It
initial media transport address and places it into the m/c line in picks one candidate as its active candidate and places it into the
the answer, and then lists all of them in the a=candidate attributes m/c line in the answer.
in the answer, along with a preference and tid.
Once the offer/answer exchange has completed, each agent sends media Once the offer/answer exchange has completed, both agents pair up the
from its media transport address to the media transport address of candidates, and then determine an ordered set of transport address
its peer. This media stream may or may not work, depending on pairs. This ordering is based primarily on the priority of the
whether or not the media transport address is reachable. In parallel candidates, with the exception of the active candidate, whose
with the transmission of media, a connectivity check begins. This addresses are at the top of the list. Both agents start at the top
check makes use of STUN messages sent from each candidate to each of this list, beginning a connectivity check for that transport
other candidate. These checks will allow each agent to determine address pair. At a fixed interval, checks for the next transport
whether it can send packets from a particular candidate to a address on the list begin. This results in a pacing of the
candidate from its peer, and whether packets can be sent back. If, connectivity checks. These connectivity checks are performed through
after a certain period of time, an agent determines that a pair of peer-to-peer STUN requests, sent from one agent to the other. In
candidates works, and has a higher priority than the transport addition to pacing the checks out at regular intervals, the offerer
addresses currently in use for media (perhaps because the ones in use will generate a connectivity check for a transport address pair when
don't work), it sends a new offer that "promotes" its candidate into it receives one from its peer. As soon as the active candidate has
the m/c line. This causes the media traffic to switch to this new been verified by the STUN checks, media can begin to flow. Once a
transport address. higher priority candidate has been verified by the offerer, it ceases
additional connectivity checks, and sends an updated offer which
promotes this higher priority candidate to the m/c-line. That
candidate is also listed in a=candidate attributes, resulting in
periodic STUN keepalives through the duration of the media session.
If an agent receives a STUN connectivity check with a new source IP
address and port, or a response to such a check with a new IP address
and port indicated in the MAPPED-ADDRESS attribute, this new address
might be a viable candidate for the receipt of media. This happens
when there is a symmetric NAT between the agents. In such a case,
the agents algorithmically construct a new candidate. Like other
candidates, connectivity checks begin for it, and if they succeed,
its transport addresses can be used for receipt of media by promoting
it to the m/c-line.
The gathering of addresses and connectivity checks take time. As a
consequence, in order to have no impact on the call setup time or
post-pickup delay for SIP, these offer/answer exchanges and checks
happen while the call is ringing.
4. Sending the Initial Offer 4. Sending the Initial Offer
When an agent wishes to begin a session by sending an initial offer, When an agent wishes to begin a session by sending an initial offer,
it starts by gathering transport addresses, as described in it starts by gathering transport addresses, as described in
Section 7.1. This will produce a set of candidates, including local Section 7.1. This will produce a set of candidates, including local
ones, STUN-derived ones, and TURN-derived ones. ones, STUN-derived ones, and TURN-derived ones.
This process of gathering candidates can actually happen at any time This process of gathering candidates can actually happen at any time
before sending the initial offer. An agent can pre-gather transport before sending the initial offer. An agent can pre-gather transport
addresses, using a user interface cue (such as picking up the phone, addresses, using a user interface cue (such as picking up the phone,
or entry into an address book) as a hint that communications is or entry into an address book) as a hint that communications is
imminent. Doing so eliminates any additional perceivable call setup imminent. Doing so eliminates any additional perceivable call setup
delays due to address gathering. delays due to address gathering.
When it comes time to offer communications, it determines a priority When it comes time to offer communications, the agent determines a
for each candidate and identifies the active candidate that will be priority for each candidate and identifies the active candidate that
used for receipt of media, as described in Section 7.3. will be used for receipt of media, as described in Section 7.2.
The next step is to construct the offer message. For each media The next step is to construct the offer message. For each media
stream, it places its candidates into a=candidate attributes in the stream, it places its candidates into a=candidate attributes in the
offer and puts its active candidate into the m/c line. The process offer and puts its active candidate into the m/c line. The process
for doing this is described in Section 7.2. The offer is then sent. for doing this is described in Section 7.3. The offer is then sent.
5. Receipt of the Offer and Generation of the Answer 5. Receipt of the Offer and Generation of the Answer
Upon receipt of the offer message, the agent checks if the offer Upon receipt of the offer message, the agent checks if the offer
contains any a=candidate attributes. If it does, the offeror contains any a=candidate attributes. If it does, the offerer
supports ICE. In that case, it starts gathering candidates, as supports ICE. In that case, it starts gathering candidates, as
described in Section 7.1, and prioritizes them Section 7.3. This described in Section 7.1, and prioritizes them as described in
processing is done immediately on receipt of the offer, to prepare Section 7.2. This processing is done immediately on receipt of the
for the case where the user should accept the call, or early media offer, to prepare for the case where the user should accept the call,
needs to be generated. By gathering candidates while the user is or early media needs to be generated. By gathering candidates (and
being alerted to the request for communications, session performing connectivity checks) while the user is being alerted to
establishment delays due to that gathering can be eliminated. the request for communications, session establishment delays due to
that gathering can be eliminated.
At some point, the answerer will decide to accept or reject the The agent then constructs its answer, encoding its candidates into
communications. A rejection terminates ICE processing. In the case a=candidate attributes and including the active one in the m/c-line,
of acceptance, the answer is constructed, and if the offeror as described in Section 7.3. The agent then forms candidate pairs as
supported ICE, the candidates are encoded into the SDP as described described in Section 7.4. These are ordered as described in
in Section 7.2. The answer is then sent. If the offeror supported Section 7.5. The agent then begins connectivity checks, as described
ICE, the answerer begins its connectivity checks as described in in Section 7.6. It follows the logic in Section 7.10 on receipt of
Section 7.4. Binding Requests and responses to learn new candidates from the
checks themselves.
In addition, and regardless if the offeror supported ICE, the Transmission of media is performed according to the procedures in
answerer can begin sending media packets as it normally would. It Section 7.13.
sends media according to the procedures in Section 7.8.
6. Processing the Answer 6. Processing the Answer
There are two possible cases for processing of the answer. If the There are two possible cases for processing of the answer. If the
answerer did not support ICE, the answer will not contain any answerer did not support ICE, the answer will not contain any
a=candidate attributes. As a result, the offeror knows that it a=candidate attributes. As a result, the offerer knows that it
cannot perform its connectivity checks. In this case, it proceeds cannot perform its connectivity checks. In this case, it proceeds
with normal media processing as if ICE was not in use. The with normal media processing as if ICE was not in use. The
procedures for sending media, described in Section 7.8, MUST be procedures for sending media, described in Section 7.13, MUST be
followed however. followed however.
If the answer contains candidates, it implies that the answerer If the answer contains candidates, it implies that the answerer
supported ICE. In that case, the offeror begins connectivity checks supports ICE. The agent then forms candidate pairs as described in
as described in Section 7.4. It also starts sending media, using the Section 7.4. These are ordered as described in Section 7.5. The
candidate in the m/c line, based on the procedures described in agent then begins connectivity checks, as described in Section 7.6.
Section 7.8. It follows the logic in Section 7.10 on receipt of Binding Requests
and responses to learn new candidates from the checks themselves.
Transmission of media is performed according to the procedures in
Section 7.13.
7. Common Procedures 7. Common Procedures
This section discusses procedures that are common between offeror and This section discusses procedures that are common between offerer and
answerer. answerer.
7.1 Gathering Candidates 7.1 Gathering Candidates
An agent gathers candidates when it believes that communications is An agent gathers candidates when it believes that communications is
imminent. For offerors, this occurs before sending an offer imminent. For offerers, this occurs before sending an offer
(Section 4). For answerers, it occurs before sending an answer (Section 4). For answerers, it occurs before sending an answer
(Section 5). (Section 5).
Each candidate is composed of a series of transport addresses of the Each candidate has one or more components, each of which is
same type. In the case of RTP, the candidate is composed of either associated with a sequence number, starting at 1 for the first
one or two transport addresses. Normally there are two - one for component of each candidate, and incrementing by 1 for each
RTP, and one for RTCP. However, if RTCP is not in use, a candidate additional component within that candidate. These components
will only contain a single transport address. represent a set of transport addresses for which connectivity must be
validated. For a particular media stream, all of the candidates
SHOULD have the same number of components. The number of components
that are needed is a function of the type of media stream.
For traditional RTP-based media streams, it is RECOMMENDED that there
be two components per candidate - one for RTP and one for RTCP. The
component with the component ID of 1 MUST be RTP, and the one with
component ID of 2 MUST be RTCP. If an agent doesn't implement RTCP,
it SHOULD have a single component for the RTP stream (which will have
a component ID of 1 by definition). Each component of a candidate
has a single transport address.
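
The component numbering rules in the newer text can be illustrated with a small sketch. This is a minimal, non-normative example: the Component and Candidate classes, the helper names, and the addresses (taken from the documentation range 192.0.2.0/24) are assumptions made for illustration and are not defined by the draft.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Component:
    component_id: int  # sequence number within the candidate, starting at 1
    address: str       # IP address of the component's single transport address
    port: int          # port of the component's single transport address


@dataclass(frozen=True)
class Candidate:
    components: Tuple[Component, ...]


def has_valid_component_ids(candidate: Candidate) -> bool:
    """True if component IDs start at 1 and increment by 1."""
    ids = [c.component_id for c in candidate.components]
    return ids == list(range(1, len(ids) + 1))


def same_component_count(candidates: List[Candidate]) -> bool:
    """True if every candidate for a media stream has the same number of components."""
    return len({len(c.components) for c in candidates}) <= 1


# RTP with RTCP: component 1 is RTP, component 2 is RTCP.
rtp_and_rtcp = Candidate((
    Component(1, "192.0.2.10", 49170),
    Component(2, "192.0.2.10", 49171),
))

# An agent that does not implement RTCP has a single component with ID 1.
rtp_only = Candidate((Component(1, "192.0.2.10", 49172),))
```

Both example candidates pass has_valid_component_ids, while placing them in the same media stream would fail same_component_count, which mirrors the SHOULD in the text above.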
The first step is to gather local candidates. Local candidates are The first step is to gather local candidates. Local candidates are
obtained by binding to ephemeral ports on an interface (physical or obtained by binding to ephemeral ports on an interface (physical or
virtual, including VPN interfaces) on the host. Specifically, for virtual, including VPN interfaces) on the host. The process for
each UDP-only media stream the agent wishes to use, the agent SHOULD gathering local candidates depends on the transport protocol.
Procedures are specified here for UDP. Extensions to ICE that define
procedures for other transport protocols MUST specify how local
transport addresses are gathered.
For each UDP media stream the agent wishes to use, the agent SHOULD
obtain a set of candidates (one for each interface) by binding to N obtain a set of candidates (one for each interface) by binding to N
ephemeral UDP ports on each interface, where N is the number of ephemeral UDP ports on each interface, where N is the number of
transport addresses needed for the candidate. For RTP, N is components needed for the candidate. For RTP, N is typically two.
typically two. For each TCP-only media stream the agent wishes to
use, the agent SHOULD obtain a set of candidates by binding to N
ephemeral TCP ports on each interface, where N is the number of
transport addresses needed for the candidate. For media streams that
can support either UDP or TCP, the agent SHOULD obtain a set of
candidates by binding to N ephemeral UDP and N ephemeral TCP ports on
each interface, where N is the number of transport addresses needed
for the candidate.
If a host has K local interfaces, this will result in K candidates If a host has K local interfaces, this will result in K candidates
for each UDP stream (requiring K*N transport addresses), K candidates for each UDP stream, requiring K*N local transport addresses.
for each TCP stream (requiring K*N transport addresses), and 2K
candidates for streams that support UDP and TCP (requiring 2*K*N
transport addresses).
Media streams carried using the Real Time Transport Protocol (RTP)
[22] can run over TCP [27]. As such, it is RECOMMENDED that both UDP
and TCP candidates be obtained. Transmission of real time media over
UDP is generally preferred to TCP. However, many network
environments, for better or for worse, permit only TCP traffic.
Obtaining a TCP candidate, and then using it in conjunction with a
TURN relay as described below, allows for ICE to make use of the TCP
media only when UDP connectivity is non-existent, as it may be in
these restricted environments. However, providers of real-time
communications services may decide that it is preferable to have no
media at all than it is to have media over TCP. To allow for choice,
it is RECOMMENDED that agents be configurable with whether they
obtain TCP candidates for real time media.
Having it be configurable, and then configuring it to be off, is
far better than not having the capability at all. An important
goal of this specification is to provide a single mechanism that
can be used across all types of endpoints. As such, it is
preferable to account for provider and network variation through
configuration, instead of hard-coded limitations in an
implementation. Furthermore, network characteristics and
connectivity assumptions can, and will change over time. Just
because an agent is communicating with a server on the public
network today, doesn't mean that it won't need to communicate with
one behind a NAT tomorrow. Just because an agent is behind a full
cone NAT today, doesn't mean that tomorrow they won't pick up
their agent and take it to a public network access point where
there is a symmetric NAT or one that only allows outbound TCP.
The way to handle these cases and build a reliable system is for
agents to implement a diverse set of techniques for allocating
addresses, so that at least one of them is almost certainly going
to work in any situation. Implementors should consider very
carefully any assumptions that they make about deployments before
electing not to implement one of the mechanisms for address
allocation. In particular, implementors should consider whether
the elements in the system may be mobile, and connect through
different networks with different connectivity. They should also
consider whether endpoints which are under their control, in terms
of location and network connectivity, would always be under their
control. Only in cases where there isn't now, and never will be,
endpoint mobility or nomadicity of any sort, should a technique be
omitted.
Once the agent has obtained local candidates, it obtains candidates Once the agent has obtained local candidates, it obtains candidates
with derived transport addresses. Agents which serve end users with derived transport addresses. The process for gathering derived
directly, such as softphones, hardphones, terminal adaptors and so candidates depends on the transport protocol. Procedures are
on, MUST implement STUN and SHOULD use it to obtain STUN candidates. specified here for UDP. Extensions to ICE that define procedures for
These devices SHOULD implement and SHOULD use TURN to obtain TURN other transport protocols MUST specify how derived transport
candidates. They MAY implement and MAY use other protocols that addresses are gathered.
provide derived transport addresses, such as TEREDO [25]. As with
TCP, usage of STUN and TURN is at SHOULD strength to allow for Agents which serve end users directly, such as softphones,
provider variation. If it is not to be used, it is also RECOMMENDED hardphones, terminal adapters and so on, MUST implement STUN and
that it be implemented and just disabled through configuration, so SHOULD use it to obtain STUN candidates. These devices SHOULD
that it can be re-enabled through configuration if conditions change in implement and SHOULD use TURN to obtain TURN candidates. They MAY
the future. implement and MAY use other protocols that provide derived transport
addresses, such as TEREDO [31]. Usage of STUN and TURN is at SHOULD
strength to allow for provider variation. If it is not to be used,
it is RECOMMENDED that it be implemented and just disabled through
configuration, so that it can be re-enabled through configuration if
conditions change in the future.
Agents which represent network servers under the control of a service Agents which represent network servers under the control of a service
provider, such as gateways to the telephone network, media servers, provider, such as gateways to the telephone network, media servers,
or conferencing servers that are targeted at deployment only in or conferencing servers that are targeted at deployment only in
networks with public IP addresses MAY use STUN, TURN or other similar networks with public IP addresses MAY use STUN, TURN or other similar
protocols to obtain candidates. protocols to obtain candidates.
Why would these types of endpoints even bother to implement ICE? Why would these types of endpoints even bother to implement ICE?
The answer is that such an implementation greatly facilitates NAT The answer is that such an implementation greatly facilitates NAT
traversal for endpoints that connect to it. The ability to traversal for clients that connect to it. The ability to process
process STUN connectivity checks allows for the network server to STUN connectivity checks allows for clients to obtain peer-derived
obtain peer-derived transport addresses that can be used to transport addresses that can be used by the network server to
provide relay-free traversal of symmetric NAT for endpoints that reach them without a relay, even through symmetric NAT.
connect to it. Furthermore, implementation of the STUN Furthermore, implementation of the STUN connectivity checks allows
connectivity checks allows for NAT bindings along the way to be for NAT bindings along the way to be kept open. ICE also provides
kept open. ICE also provides numerous security properties that numerous security properties that are independent of NAT
are independent of NAT traversal, and would benefit any multimedia traversal, and would benefit any multimedia endpoint. See
endpoint. See Section 12 for a discussion on these benefits. Section 13 for a discussion on these benefits.
To obtain STUN candidates (which are always UDP), the client takes a
local UDP candidate, and for each configured STUN server, produces a
STUN candidate. It is anticipated that clients may have a
multiplicity of STUN servers configured in network environments where
there are multiple layers of NAT, and that layering is known to the
provider of the client. To produce the STUN candidate from the local
candidate, it follows the procedures of Section 9 of RFC 3489 for
each local transport address in the local candidate. It obtains a
shared secret from the STUN server and then initiates a Binding
Request transaction from the local transport address to that server.
The Binding Response will provide the client with its STUN derived
transport address in the MAPPED-ADDRESS attribute. If the client had
K local candidates, this will produce S*K STUN candidates, where S is
the number of configured STUN servers.
To obtain UDP TURN candidates, the client takes a local UDP
candidate, and for each configured TURN server, produces a TURN
candidate. It is anticipated that clients may have a multiplicity of
TURN servers configured in network environments where there are
multiple layers of NAT, and that layering is known to the provider of
the client. To produce the TURN candidate from the local candidate,
it follows the procedures of Section 8 of [14] for each local
transport address in the local candidate. It initiates an Allocate
Request transaction from the local transport address to that server.
The Allocate Response will provide the client with its TURN derived
transport address in the MAPPED-ADDRESS attribute. If the client had
K local candidates, this will produce S*K UDP TURN candidates, where
S is the number of configured TURN servers.
To obtain TURN-derived TCP candidates, the client takes a local TCP
candidate, and for each configured TURN server, produces a TCP TURN
candidate. It is anticipated that clients may have a multiplicity of
TURN servers configured in network environments where there are
multiple layers of NAT, and that layering is known to the provider of
the client. To produce the TURN candidate from the local candidate,
it iterates through the local transport addresses in the local
candidate, and for each one, initiates a TCP connection from the
same interface as the local transport address to the TURN server. It is
not necessary to initiate the connection from the actual port in the
local transport address. Following the procedures of Section 8 of
[14], it initiates an Allocate Request transaction over the
connection. The Allocate Response will provide the client with its
TCP TURN derived transport address in the MAPPED-ADDRESS attribute.
If the client had K local TCP candidates, this will produce S*K TCP
TURN candidates, where S is the number of configured TURN servers.
7.2 Encoding Candidates into SDP
For each candidate to be placed into the SDP, the agent includes a
series of a=candidate attributes as media-level attributes, one for
each transport address in the candidate. Each of the transport
addresses for the same candidate MUST have the same value of the
candidate-id attribute. The a=candidate attributes for different
candidates MUST be unique within that media stream. Using a simple
sequence number, incrementing by one for each candidate for a media
stream, meets these requirements. The transport, unicast-address and
port of the attribute are set to those for the candidate. The qvalue
is set to the priority of this candidate (note that, for RTP, the RTP
and RTCP transport addresses MUST have equal priority values). The
tid MUST be chosen randomly with 128 bits of randomness. The tid is
chosen only when the transport address is placed into the SDP for the
first time; subsequent offers or answers within the same session
containing that same transport address would use the same tid used
previously.
The tid serves as a unique identifier for each transport address. It
also gets combined, through concatenation, with the tid of a peer
candidate to form the username and password that is placed in the
STUN checks between the peers. This allows the STUN message to
uniquely identify the pairing whose connectivity it is checking. The
tid is needed as a unique identifier because the IP address within
the candidate fails to provide that uniqueness as a consequence of
NAT.
Consider agents A, B, and C. A and B are within private enterprise 1, Obtaining STUN, TURN and other derived candidates requires
which is using 10.0.0.0/8. C is within private enterprise 2, which transmission of packets which have the effect of creating bindings on
is also using 10.0.0.0/8. As it turns out, B and C both have IP NAT devices between the client and the STUN or TURN servers.
address 10.0.1.1. A sends an offer to C. C, in its answer, provides Experience has shown that many NAT devices have upper limits on the
A with its transport addresses. In this case, that's 10.0.1.1:8866 rate at which they will create new bindings. Furthermore,
and 8877. As it turns out, B is in a session at that same time, and transmission of these packets on the network makes use of bandwidth
is also using 10.0.1.1:8866 and 8877. This means that B is prepared and needs to be rate limited by the agent. As a consequence, a
to accept STUN messages on those ports, just as C is. A will send a client SHOULD pace its STUN and TURN transactions, such that the
STUN request to 10.0.1.1:8866 and 8877. However, these do not go to start of each new transaction occurs at least Ta seconds after the
C as expected. Instead, they go to B. If B just replied to them, A start of the previous transaction. The value of Ta SHOULD be
would believe it has connectivity to C, when in fact it has configurable, and SHOULD have a default of 50ms. Note that this
connectivity to a completely different user, B. To fix this, tid pacing applies only to the start of a new transaction; pacing of
takes on the role of a unique identifier. C provides A with an retransmissions within a STUN or TURN transaction is governed by the
identifier for its transport address, and A provides one to C. A retransmission rules defined in those protocols.
concatenates these two identifiers and uses the result as the
username and password in its STUN query to 10.0.1.1:8866. This STUN
query arrives at B. However, the username is unknown to B, and so the
request is rejected. A treats the rejected STUN request as if there
were no connectivity to C (which is actually true). Therefore, the
error is avoided.
An unfortunate consequence of the non-uniqueness of IP addresses is To obtain STUN candidates, the client takes a local UDP candidate,
that, in the above example, B might not even be an ICE agent. It and for each configured STUN server, produces a STUN candidate. It
could be any host, and the port to which the STUN packet is directed is anticipated that clients may have a multiplicity of STUN servers
could be any ephemeral port on that host. If there is an application that it discovers or is configured with in network environments where
listening on this socket for packets, and it is not prepared to there are multiple layers of NAT. To produce the STUN candidate from
handle malformed packets for whatever protocol is in use, the the local candidate, it follows the procedures of Section 9 of RFC
operation of that application could be affected. Fortunately, since 3489 for each local transport address in the local candidate. It
the ports exchanged in SDP are ephemeral and usually drawn from the obtains a shared secret from the STUN server and then initiates a
dynamic or registered range, the odds are good that the port is not Binding Request transaction from each local transport address to that
used to run a server on host B, but rather is the agent side of some server. The Binding Response will provide the client with its STUN
protocol. This decreases the probability of hitting a port in-use, derived transport address in the MAPPED-ADDRESS attribute. If the
due to the transient nature of port usage in this range. However, client had K local candidates, this will produce S*K STUN candidates,
the possibility of a problem does exist, and network deployers should where S is the number of STUN servers.
be prepared for it.
Note that, because there are separate transport addresses for RTP and It is anticipated that clients may have a multiplicity of TURN
RTCP, each will have a distinct tid. servers configured or discovered in network environments where there
are multiple layers of NAT, and that layering is known to the
provider of the client. To obtain TURN candidates, for each
configured TURN server, the client initiates an Allocate Request
transaction using the procedures of Section 8 of [16] from each
transport address of a particular local candidate. The Allocate
Response will provide the client with its TURN derived transport
address in the MAPPED-ADDRESS attribute. Once the TURN allocations
against a particular TURN server succeed from all of the transport
addresses in a particular local candidate, the client SHOULD NOT
attempt any further TURN allocations to that particular server from
the transport addresses in any other local candidates. This is to
reduce the number of bindings allocated from the NATs. Only a single
TURN candidate is needed from a particular TURN server. The order in
which local candidates are tried against the TURN server is a matter
of local policy.
The active candidate is placed into the m/c lines of the SDP. For Since a client will pace its STUN and TURN allocations at a rate of
RTP streams, this is done by placing the RTP address and port into one new transaction every Ta seconds, it will take a certain amount
the c and m lines in the SDP respectively. If the agent is utilizing of time for these allocations to occur. It is RECOMMENDED that
RTCP, it MUST encode its address and port using the a=rtcp attribute implementations have a configurable upper bound on the total number
as defined in RFC 3605 [2]. If RTCP is not in use, the agent MUST of such allocations they will perform before generation of their
signal that using b=RS:0 and b=RR:0 as defined in RFC 3556 [8]. offer or answer. Any allocations not completed at that point SHOULD
be abandoned, but MAY continue and be used in an updated offer once
they complete. A default value of 10 is RECOMMENDED. Since the
total number of allocations that could be done (based on the number
of STUN servers, TURN servers and local interfaces) might exceed this
value, clients SHOULD prioritize their allocations and perform higher
priority ones first. It is RECOMMENDED that STUN allocations be
prioritized over TURN allocations.
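
The pacing and bounding rules in the newer text can be sketched as a simple planner. This is a minimal illustration, not a normative algorithm: the function name plan_allocations, the job labels, and the choice to express Ta in milliseconds are assumptions. The defaults (Ta of 50ms, an upper bound of 10, STUN before TURN) come from the text above. The offsets are the earliest permitted start times, since the text requires only that each transaction start at least Ta after the previous one.

```python
from typing import List, Tuple

TA_MS = 50            # default pacing interval Ta (50ms)
MAX_ALLOCATIONS = 10  # default upper bound on allocations before the offer or answer


def plan_allocations(
    stun_jobs: List[str],
    turn_jobs: List[str],
    ta_ms: int = TA_MS,
    limit: int = MAX_ALLOCATIONS,
) -> List[Tuple[int, str, str]]:
    """Return (earliest_start_ms, kind, job) for each transaction to attempt.

    STUN transactions are ordered ahead of TURN ones, the list is cut off
    at the configured upper bound, and each new transaction starts no
    sooner than ta_ms after the start of the previous one.
    """
    ordered = [("STUN", job) for job in stun_jobs] + [("TURN", job) for job in turn_jobs]
    return [(i * ta_ms, kind, job) for i, (kind, job) in enumerate(ordered[:limit])]


# Hypothetical workload: 8 STUN transactions and 4 TURN transactions.
stun_jobs = [f"stun-{n}" for n in range(8)]
turn_jobs = [f"turn-{n}" for n in range(4)]
plan = plan_allocations(stun_jobs, turn_jobs)
```

With these inputs only ten transactions are planned: all eight STUN transactions followed by the first two TURN transactions, the last one starting no earlier than 450ms after the first. The two TURN transactions left out would be abandoned, or could be completed later and used in an updated offer.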
For media streams that are inherently TCP-based (as opposed to ones Once the allocations are complete, any redundant candidates are
where TCP is a fallback and would be listed as a candidate but not discarded. A candidate is redundant if its transport addresses for
the initial active address), the connections MUST be signaled using each component match the transport addresses for each component of
comedia [13], and those connections MUST be in "holdconn" mode. This another candidate.
has the effect of suspending connection attempts via the comedia
mechanisms, allowing ICE to open the connections instead. These
connections then get removed from holdconn mode when the ICE
procedures complete and an updated offer/answer exchange takes place
that promotes one of the existing ICE-established connections to
active. Note that this has the result of increasing the post-dial-
delay for TCP-oriented media, but brings with it substantial security
and NAT traversal properties.
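
The redundancy rule in the newer text (a candidate is redundant if its transport addresses for each component match those of another candidate) can be sketched as follows. This is a minimal illustration under stated assumptions: a candidate is modeled as a tuple of (address, port) pairs in component order, the function name discard_redundant is hypothetical, and the addresses come from documentation ranges.

```python
from typing import List, Tuple

TransportAddress = Tuple[str, int]            # (IP address, port)
Candidate = Tuple[TransportAddress, ...]      # one entry per component, in component order


def discard_redundant(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the first occurrence of each distinct set of component addresses."""
    seen = set()
    kept = []
    for cand in candidates:
        # Two candidates are redundant only if every component matches.
        if cand not in seen:
            seen.add(cand)
            kept.append(cand)
    return kept


# A host with no NAT in front of it: the STUN-derived addresses equal the local ones.
local = (("192.0.2.10", 49170), ("192.0.2.10", 49171))
stun = (("192.0.2.10", 49170), ("192.0.2.10", 49171))
turn = (("198.51.100.7", 32000), ("198.51.100.7", 32001))

unique = discard_redundant([local, stun, turn])
```

Here the STUN candidate is dropped because both of its components match the local candidate, while the TURN candidate is kept. A candidate that matches on only one component is not treated as redundant.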
7.3 Prioritizing the Transport Addresses and Choosing an Active One 7.2 Prioritizing the Candidates and Choosing an Active One
The prioritization process takes the set of candidates and associates The prioritization process takes the set of candidates and associates
each with a priority. This priority reflects the desire that the each with a priority. This priority reflects the desire that the
agent has to receive media on that address, and is assigned as a agent has to receive media on that address, and is assigned as a
value from 0 to 1 (1 being most preferred). Priorities are ordinal, value from 0 to 1 (1 being most preferred). Priorities are ordinal,
so that their significance is only meaningful relative to other so that their significance is only meaningful relative to other
candidates for a particular media stream. candidates from that agent for a particular media stream. Candidates
MAY have the same priority. However, it is RECOMMENDED that each
candidate have a distinct priority. Doing so improves the efficiency
of ICE.
This specification makes no normative recommendations on how the This specification makes no normative recommendations on how the
prioritization is done. However, some useful guidelines are prioritization is done. However, some useful guidelines are
suggested on how such a prioritization can be determined. suggested on how such a prioritization can be determined.
One criterion for choosing one candidate over another is whether or One criterion for choosing one candidate over another is whether or
not that candidate involves the use of a relay. That is, if media is not that candidate involves the use of a relay. That is, if media is
sent to that candidate, will the media first transit a relay before sent to that candidate, will the media first transit a relay before
being received. TURN candidates make use of relays (the TURN being received. TURN candidates make use of relays (the TURN
server), as do any local candidates associated with a VPN server. server), as do any local candidates associated with a VPN server.
skipping to change at page 15, line 49 skipping to change at page 15, line 39
may increase the cost of providing service, since media will be may increase the cost of providing service, since media will be
routed in and right back out of a relay run by the provider. If routed in and right back out of a relay run by the provider. If
these concerns are important, candidates with this property can be these concerns are important, candidates with this property can be
listed with lower priority. listed with lower priority.
Another criterion for choosing one candidate over another is IP Another criterion for choosing one candidate over another is IP
address family. ICE works with both IPv4 and IPv6. It therefore address family. ICE works with both IPv4 and IPv6. It therefore
provides a transition mechanism that allows dual-stack hosts to provides a transition mechanism that allows dual-stack hosts to
prefer connectivity over IPv6, but to fall back to IPv4 in case the prefer connectivity over IPv6, but to fall back to IPv4 in case the
v6 networks are disconnected (due, for example, to a failure in a v6 networks are disconnected (due, for example, to a failure in a
6to4 relay) [24]. It can also help with hosts that have both a 6to4 relay) [26]. It can also help with hosts that have both a
native IPv6 address and a 6to4 address. In such a case, higher native IPv6 address and a 6to4 address. In such a case, higher
priority could be afforded to the native v6 address, followed by the priority could be afforded to the native v6 address, followed by the
6to4 address, followed by a native v4 address. This allows a site to 6to4 address, followed by a native v4 address. This allows a site to
obtain and begin using native v6 addresses immediately, yet still obtain and begin using native v6 addresses immediately, yet still
fall back to 6to4 addresses when communicating with agents in other fall back to 6to4 addresses when communicating with agents in other
sites that do not yet have native v6 connectivity. sites that do not yet have native v6 connectivity.
Another criterion for choosing one candidate over another is security. Another criterion for choosing one candidate over another is security.
If a user is a telecommuter, and therefore connected to their If a user is a telecommuter, and therefore connected to their
corporate network and a local home network, they may prefer their corporate network and a local home network, they may prefer their
voice traffic to be routed over the VPN in order to keep it on the voice traffic to be routed over the VPN in order to keep it on the
corporate network when communicating within the enterprise, but use corporate network when communicating within the enterprise, but use
the local network when communicating with users outside of the the local network when communicating with users outside of the
enterprise. enterprise.
Another criterion for choosing one address over another is topological Another criterion for choosing one address over another is topological
awareness. This is most useful for candidates which make use of awareness. This is most useful for candidates which make use of
relays (including TURN and VPN). In those cases, if a agent has relays (including TURN and VPN). In those cases, if an agent has
preconfigured or dynamically discovered knowledge of the topological preconfigured or dynamically discovered knowledge of the topological
proximity of the relays to itself, it can use that to select closer proximity of the relays to itself, it can use that to select closer
relays with higher priority. relays with higher priority.
Finally, the transport protocol itself is a criterion for choosing one There may be transport-specific reasons for preferring one candidate
candidate over another. If a particular media stream can run over over another. In such a case, specifications defining usage of ICE
UDP or TCP, the UDP candidates might be preferred over the TCP with other transport protocols SHOULD document such considerations.
candidates. This allows ICE to use the lower latency UDP
connectivity if it exists, but fall back to TCP if UDP doesn't work.
Once the candidates have been prioritized, one is selected as the Once the candidates have been prioritized, one may be selected as the
active one. This is the candidate that will be used for actual active one. This is the candidate that will be used for actual
exchange of media, until replaced by an updated offer or answer. exchange of media if and when it is validated, until replaced by an
Since the ICE connectivity checks can take a few seconds to execute, updated offer or answer. The active candidate will also be used to
media clipping can occur if this candidate doesn't work. The active receive media from ICE-unaware peers. As such, it is RECOMMENDED
candidate will also be used to receive media from ICE-unaware peers. that one be chosen based on the likelihood of that candidate to work
As such, it is RECOMMENDED that one be chosen based on the likelihood with the peer that is being contacted. Unfortunately, it is
of that candidate to work with the peer that is being contacted. difficult to ascertain which candidate that might be. As an example,
Unfortunately, it is difficult to ascertain which candidate that consider a user within an enterprise. To reach non-ICE capable
might be. As an example, consider a user within an enterprise. To agents within the enterprise, a local candidate has to be used, since
reach non-ICE capable agents within the enterprise, a local candidate the enterprise policies may prevent communication between elements
has to be used, since the enterprise policies may prevent using a relay on the public network. However, when communicating to
communication between elements using a relay on the public network. peers outside of the enterprise, a TURN-based candidate from a
However, when communicating to peers outside of the enterprise, a publicly accessible TURN server is needed.
TURN-based candidate from a publicly accessible TURN server is
needed.
Indeed, the difficulty in picking just one address that will work is Indeed, the difficulty in picking just one address that will work is
the whole problem that motivated the development of this the whole problem that motivated the development of this
specification in the first place. As such, it is RECOMMENDED that specification in the first place. As such, it is RECOMMENDED that
the default address be a TURN candidate from a TURN server providing the active candidate be a TURN derived candidate from a TURN server
public IP addresses. Furthermore, ICE is only truly effective when providing public IP addresses. Furthermore, ICE is only truly
it is supported on both sides of the session. It is therefore most effective when it is supported on both sides of the session. It is
prudent to deploy it to close-knit communities as a whole, rather therefore most prudent to deploy it to close-knit communities as a
than piecemeal. In the example above, this would mean that ICE would whole, rather than piecemeal. In the example above, this would mean
ideally be deployed completely within the enterprise, rather than that ICE would ideally be deployed completely within the enterprise,
just to parts of it. rather than just to parts of it.
7.4 Connectivity Checks An additional consideration for selection of the active candidate is
the switching of media stream destinations between the initial offer
and the subsequent offer. If the active candidate pair in the
initial offer is to be validated, media will flow once that pair is
validated. When the ICE checks complete and yield a higher priority
candidate pair, there will be an updated offer/answer exchange that
will change the active candidate. This will result in a change in
the destination of the media packets. This may also cause a
different path for the media packets. That path might have different
delay and jitter characteristics. As a consequence, the jitter
buffers may see a glitch, causing possible media artifacts. If these
issues are a concern, the initial offer MAY omit an active candidate.
In such a case, an updated offer will need to be sent immediately
when communicating with an ICE-unaware agent, setting an active
candidate.
There may be transport-specific reasons for selection of an active
candidate. In such a case, specifications defining usage of ICE with
other transport protocols SHOULD document such considerations.
7.3 Encoding Candidates into SDP
For each candidate for a media stream, the agent includes a series of
a=candidate attributes as media-level attributes, one for each
component in the candidate. Each candidate has a unique identifier,
called the candidate-id. The candidate-id MUST be chosen randomly
and contain at least 128 bits of randomness (this does not mean that
the candidate-id is 128 bits long; just that it has at least 128 bits
of randomness). It is chosen only when the candidate is placed into
the SDP for the first time; subsequent offers or answers within the
same session containing that same candidate MUST use the same
candidate-id used previously.
Each component of the candidate has an identifier, called the
component-id. The component-id is a sequence number. For each
candidate, it starts at one, and increments by one for each
component. As discussed below, ICE will perform connectivity checks
such that, between a pair of candidates, checks only occur between
transport addresses with the same component-id. As a consequence, if
one candidate has three components, and it is paired with a candidate
that has two, there will only be two transport address pairs and two
connectivity checks.
ICE will work without a standardized mapping between the components
of a media stream and the numerical value of the component-id. This
allows ICE to be used with media streams with multiple components
without development of standards around such a mapping. However, a
specific mapping has been defined in this specification for RTP -
component-id 1 corresponds to RTP, and component-id of 2 corresponds
to RTCP. Like the candidate-id, the component-id is assigned at the
time the candidate is first placed into the SDP; subsequent offers or
answers within the same session containing that same candidate MUST
use the same component-id used previously.
The transport, addr and port of the a=candidate attribute (all
defined in Section 12) are set to the transport protocol, unicast
address and port of the transport address. A Fully Qualified Domain
Name (FQDN) for a host MAY be used in place of a unicast address. In
that case, when receiving an offer or answer containing an FQDN in an
a=candidate attribute, the FQDN is looked up in the DNS using an A or
AAAA record, and the resulting IP address is used for the remainder
of ICE processing. The qvalue is set to the priority of the
candidate, and MUST be the same for all components of the candidate.
Each transport address also includes a password that will be used for
securing the STUN connectivity checks. This password MUST be chosen
randomly with 128 bits of randomness (though it can be longer than
128 bits). Like the candidate-id, it is chosen when the candidate is
placed into an SDP for the first time for a particular session;
subsequent offers and answers within the same session conveying the
same candidate MUST use the same password. The converse is true; if
a new offer is generated as part of a new multimedia session, a new
password (and candidate-id) would be used even if the transport
address from a previous session was being recycled.
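The randomness and reuse rules above can be sketched in a few lines. This is only an illustration, not part of the specification: the names new_candidate_id, new_password and SessionCredentials are hypothetical, the hex alphabet is assumed to be acceptable to the grammar in Section 12, and keying the password by component is one reading of "each transport address also includes a password".

```python
import secrets


def new_candidate_id() -> str:
    """Return a fresh candidate-id carrying 128 bits of randomness."""
    # 16 random bytes = 128 bits, rendered as 32 hex characters.
    # Hex output never contains ":", which the text uses as the tid separator.
    return secrets.token_hex(16)


def new_password() -> str:
    """Return a fresh password carrying 128 bits of randomness."""
    return secrets.token_hex(16)


class SessionCredentials:
    """Remembers the values chosen when a candidate first enters the SDP.

    One instance is assumed per multimedia session, so a new session
    naturally gets new candidate-ids and passwords even if the same
    transport address is being recycled.
    """

    def __init__(self) -> None:
        self._candidate_ids = {}
        self._passwords = {}

    def candidate_id(self, candidate_key) -> str:
        # candidate_key is any hashable label the application uses for a
        # candidate (hypothetical; the draft does not define such a key).
        if candidate_key not in self._candidate_ids:
            self._candidate_ids[candidate_key] = new_candidate_id()
        return self._candidate_ids[candidate_key]

    def password(self, candidate_key, component_id: int) -> str:
        key = (candidate_key, component_id)
        if key not in self._passwords:
            self._passwords[key] = new_password()
        return self._passwords[key]


session = SessionCredentials()
first_offer_id = session.candidate_id("host-udp")
updated_offer_id = session.candidate_id("host-udp")  # same value is reused
```

A later offer or answer in the same session calls the same SessionCredentials instance and therefore gets the values used previously.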
The combination of candidate-id and component-id uniquely identifies
each transport address. As a consequence, each transport address has
a unique identifier, called the tid. The tid is formed by
concatenating the candidate-id with the component-id, separated by
the colon (":"). The tid is not explicitly encoded in the SDP; it is
derived from the candidate-id and component-id, which are present in
the SDP. The usage of the colon as a separator allows the
candidate-id and component-id to be extracted from the tid, since the
colon is not a valid character for the candidate-id.
The tid gets combined, through further concatenation, with the tid of
a transport address from the remote candidate (separated again by
another colon) to form the username that is placed in the STUN checks
between the peers. This allows the STUN message to uniquely identify
the pairing whose connectivity it is checking. The tid is needed as
a unique identifier because the IP address within the candidate fails
to provide that uniqueness as a consequence of NAT.
Consider agents A, B, and C. A and B are within private enterprise 1,
which is using 10.0.0.0/8. C is within private enterprise 2, which
is also using 10.0.0.0/8. As it turns out, B and C both have IP
address 10.0.1.1. A sends an offer to C. C, in its answer, provides
A with its transport addresses. In this case, that is 10.0.1.1:8866
and 8877. As it turns out, B is in a session at that same time, and
is also using 10.0.1.1:8866 and 8877. This means that B is prepared
to accept STUN messages on those ports, just as C is. A will send a
STUN request to 10.0.1.1:8866 and 8877. However, these do not go to
C as expected. Instead, they go to B. If B just replied to them, A
would believe it has connectivity to C, when in fact it has
connectivity to a completely different user, B. To fix this, tid
takes on the role of a unique identifier. C provides A with an
identifier for its transport address, and A provides one to C. A
concatenates these two identifiers (with a colon between) and uses
the result as the username in its STUN query to 10.0.1.1:8866. This
STUN query arrives at B. However, the username is unknown to B, and
so the request is rejected. A treats the rejected STUN request as if
there were no connectivity to C (which is actually true). Therefore,
the error is avoided.
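The tid construction and the rejection behavior in the example above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the sample candidate-ids are hypothetical, the order of the two tids inside the username is left to the caller because this passage does not fix it, and handle_binding_request reduces "the username is unknown" to "neither tid belongs to this agent".

```python
def make_tid(candidate_id: str, component_id: int) -> str:
    """Concatenate candidate-id and component-id, separated by a colon."""
    if ":" in candidate_id:
        raise ValueError("the colon is not a valid candidate-id character")
    return f"{candidate_id}:{component_id}"


def split_tid(tid: str):
    """Recover (candidate-id, component-id) from a tid."""
    candidate_id, component_id = tid.split(":")
    return candidate_id, int(component_id)


def stun_username(first_tid: str, second_tid: str) -> str:
    """Join two tids with another colon to form the STUN username."""
    return f"{first_tid}:{second_tid}"


def split_username(username: str):
    """Split a username back into its two tids (four colon-separated fields)."""
    cand_a, comp_a, cand_b, comp_b = username.split(":")
    return f"{cand_a}:{comp_a}", f"{cand_b}:{comp_b}"


def handle_binding_request(username: str, own_tids: set) -> bool:
    """Return True to answer the check, False to reject it."""
    try:
        first, second = split_username(username)
    except ValueError:
        return False  # not in tid:tid form at all
    return first in own_tids or second in own_tids


# Agents from the example; the candidate-ids are made up for illustration.
tid_a = make_tid("a7e3", 1)
tid_b = make_tid("b2d4", 1)
tid_c = make_tid("c9f1", 1)

username = stun_username(tid_c, tid_a)  # A's check, intended for C

answered_by_c = handle_binding_request(username, {tid_c})
answered_by_b = handle_binding_request(username, {tid_b})
```

Because B does not own either tid in the username, it rejects the request even though it shares C's IP address and port, which is the outcome the example describes.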
An unfortunate consequence of the non-uniqueness of IP addresses is
that, in the above example, B might not even be an ICE agent. It
could be any host, and the port to which the STUN packet is directed
could be any ephemeral port on that host. If there is an application
listening on this socket for packets, and it is not prepared to
handle malformed packets for whatever protocol is in use, the
operation of that application could be affected. Fortunately, since
the ports exchanged in SDP are ephemeral and usually drawn from the
dynamic or registered range, the odds are good that the port is not
used to run a server on host B, but rather is the agent side of some
protocol. This decreases the probability of hitting a port in-use,
due to the transient nature of port usage in this range. However,
the possibility of a problem does exist, and network deployers should
be prepared for it. Note that this is not a problem specific to ICE;
stray packets can arrive at a port at any time for any type of
protocol, especially ones on the public Internet. As such, this
requirement is just restating a general design guideline for Internet
applications - be prepared for unknown packets on any port.
The active candidate, if there is one, is placed into the m/c lines
of the SDP. For RTP streams, this is done by placing the RTP address
and port into the c and m lines in the SDP respectively. If the
agent is utilizing RTCP, it MUST encode its address and port using
the a=rtcp attribute as defined in RFC 3605 [2]. If RTCP is not in
use, the agent MUST signal that using b=RS:0 and b=RR:0 as defined in
RFC 3556 [8].
If there is no active candidate, the agent MUST include an a=inactive
attribute. The RTP address and port in the m/c-line are
inconsequential, since they won't be used.
Encoding of candidates may involve transport protocol specific
considerations. There are none for UDP. However, extensions that
define usage of ICE with other transport protocols SHOULD specify any
special encoding considerations.
7.4 Forming Candidate Pairs
Once the offer/answer exchange has completed, both agents will have a Once the offer/answer exchange has completed, both agents will have a
set of candidates for each media stream. Each agent forms a set of set of candidates for each media stream. Each agent forms a set of
pairings for each media stream by combining each of its UDP candidate pairs for each media stream by combining each of its
candidates with each of the UDP candidates of its peer, and by candidates with each of the candidates of its peer. Candidates can
combining each of its TCP candidates with each of the TCP candidates be paired up only if their transport protocols are identical. If an
of its peer. If candidates for other transport protocols were offer/answer exchange took place for a session comprised of an audio
signaled through the offer/answer exchange, a pairing is performed and a video stream, and each agent had two candidates per media
between each of those as well. If an offer/answer exchange took stream, there would be 8 candidate pairs, 4 for audio and 4 for
place for a session comprised of an audio and a video stream, and video. One agent can offer two candidates for a media stream, and
each stream had two UDP and two TCP candidates from each agent, there the answer can contain three candidates for the same media stream.
would be 16 pairings, 8 for audio and 8 for video. Each of those In that case, there would be six candidate pairs.
eight would be comprised of four UDP and four TCP. Note that there
is no requirement that the number of candidates from each peer be the
same. One agent can offer two UDP candidates for a media stream, and
the answer can contain three UDP candidates for the same media
stream. In that case, there would be six UDP pairings.
Each candidate has a number of transport addresses. In the case of Each candidate has a number of components, each of which has a
RTP, there are either one or two. Within the pairing, the transport transport address. Within a candidate pair, the components
addresses of each candidate are linked together one-to-one to form a themselves are paired up such that transport addresses with the same
transport address pair. In the case of RTP, the result will either component ID are combined to form a transport address pair.
be one or two transport address pairs - one for RTP, and possibly Returning to the previous example, for each of the 8 candidate pairs,
another for RTCP. The relationship between a candidate, transport there would be two transport address pairs - one for RTP, and one for
address, pairing and transport address pair are shown in Figure 2. RTCP. If one candidate has more components than the other, those
This figure shows the pairing as seen by the agent that owns the extra components will not be part of a transport address pair, won't
candidate {A,B}. The candidate owned by that agent is called the be validated, and will effectively be treated as if they weren't
native candidate, and the one owned by its peer is the remote included in the candidate pair in the first place.
candidate. As the figure shows, there is one pairing between two
candidates, and two transport address pairs ({A,C} and {B,D}). If
one of the candidates only had one transport address (in the case
where RTCP was not being used by one agent), there would only be one
transport address pair, {A,C}. Each transport address is associated
with a tid. Furthermore, each transport address pair is associated
with an ID, the transport address pair ID. This ID is equal to the
concatenation of the tid of the native transport address with the tid
of the remote transport address. This means that the identifiers are
different for each agent. For the agent that owns {A,B}, the
transport address pair ID is WY for the first transport address pair,
and XZ for the second. For the agent that owns {C,D}, it would be
reversed - YW for the first transport address pair, and ZX for the
second.
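The pairing rules described above (candidates pair only when their transport protocols match, and components pair only when their component IDs match) can be sketched as follows. This is an illustration rather than a normative procedure; the Candidate class, its field names and the sample addresses are assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Candidate:
    cand_id: str       # candidate-id
    transport: str     # transport protocol, e.g. "UDP"
    components: dict   # component-id -> transport address ("ip:port")


def form_candidate_pairs(native, remote):
    """Combine every native candidate with every remote candidate of the
    same transport protocol."""
    return [(n, r) for n, r in product(native, remote)
            if n.transport == r.transport]


def transport_address_pairs(native, remote):
    """Pair up transport addresses that share a component ID.

    Components present on only one side are left out, as if they had
    never been part of the candidate pair.
    """
    shared = sorted(native.components.keys() & remote.components.keys())
    return [(native.components[c], remote.components[c]) for c in shared]


# Two offered candidates and three answered candidates for one stream.
offer = [
    Candidate("L1", "UDP", {1: "10.0.1.1:8866", 2: "10.0.1.1:8867"}),
    Candidate("L2", "UDP", {1: "192.0.2.1:5000", 2: "192.0.2.1:5001"}),
]
answer = [
    Candidate("R1", "UDP", {1: "10.0.2.9:7000", 2: "10.0.2.9:7001"}),
    Candidate("R2", "UDP", {1: "192.0.2.7:6000", 2: "192.0.2.7:6001"}),
    Candidate("R3", "UDP", {1: "192.0.2.8:6100", 2: "192.0.2.8:6101",
                            3: "192.0.2.8:6102"}),
]

pairs = form_candidate_pairs(offer, answer)
per_pair = transport_address_pairs(offer[0], answer[2])
```

With two candidates on one side and three on the other there are six candidate pairs, and pairing a two-component candidate with a three-component one yields only two transport address pairs.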
The relationship between a candidate, candidate pair, transport
address, transport address pair and component is shown in Figure 2.
This figure shows the relationships as seen by the agent that owns
the candidate with candidate ID "L". This candidate has two
components with transport addresses A and B respectively. This
candidate is called the native candidate, since it is the one owned
by the agent in question. The candidate owned by its peer is called
the remote candidate. As the figure shows, there is a single
candidate pair, and two components in each candidate. The native
candidate has a candidate-id of "L", and the remote candidate has a
candidate-id of "R". Since the two component-ids are 1 and 2,
candidate "L" has two transport addresses with transport address IDs
of "L:1" and "L:2" respectively. Similarly, candidate "R" has two
transport addresses with transport address IDs of "R:1" and "R:2"
respectively.
Furthermore, each transport address pair is associated with an ID,
the transport address pair ID. This ID is equal to the concatenation
of the tid of the native transport address with the tid of the remote
transport address, separated by a colon. This means that the
identifiers are seen differently for each agent. For the agent that
owns candidate "L", there are two transport address pairs. One
contains transport address "L:1" and "R:1", with a transport address
pair ID of "L:1:R:1". The other contains transport address "L:2" and
"R:2", with a transport address pair ID of "L:2:R:2". For the agent
that owns candidate "R", the identifiers for these two transport
address pairs are reversed; it would be "R:1:L:1" for the first one
and "R:2:L:2" for the second.
[Figure 2 is an ASCII diagram in both versions; its layout did not
survive the side-by-side rendering, so only its content is summarized
here.
Old version, captioned "Pairing": the native candidate holds transport
addresses A (tid=W) and B (tid=X), and the remote candidate holds
transport addresses C (tid=Y) and D (tid=Z). A and C form the
transport address pair with ID=WY, and B and D form the pair with
ID=XZ. Each candidate is also linked to a box of associated local
transport addresses: A and J for the native transport addresses A and
B, and K and D for the remote transport addresses C and D.
New version, captioned "Candidate Pair": the native candidate (id=L)
holds transport addresses A (tid=L:1) and B (tid=L:2), and the remote
candidate (id=R) holds transport addresses C (tid=R:1) and D
(tid=R:2). A and C form the transport address pair with id=L:1:R:1
(component id=1), and B and D form the pair with id=L:2:R:2
(component id=2).]
Figure 2 Figure 2
The figure also shows that each transport address has an associated If a candidate pair was created as a consequence of an offer
local transport address. The associated local transport address is generated by an agent, then that agent is said to be the offerer of
the local transport address at which the agent will receive packets that candidate pair and all of its transport address pairs.
sent to the transport address. For a local transport address, its Similarly, the other agent is said to be the answerer of that
associated local transport address is the same. That is the case of candidate pair and all of its transport address pairs. As a
transport address A and D in the diagram. For STUN derived and TURN consequence, each agent has a particular role, either offerer or
derived transport addresses, however, they are not the same. The answerer, for each transport address pair. This role is important;
associated local transport address is the one from which the STUN or when a candidate pair is to be promoted to active, the offerer is the
TURN transport was derived. one which performs the updated offer.
Next, each agent begins sending connectivity checks for each 7.5 Ordering the Candidate Pairs
transport address pair. The procedure differs for UDP and TCP.
7.4.1 UDP Connectivity Checks For the same reason that the STUN and TURN allocations are paced at a
rate of Ta transactions per second, so too are the connectivity
checks paced, also at a rate of Ta transactions per second. However,
in order to rapidly converge on a valid candidate pair that is
mutually desirable, the candidate pairs are ordered, and the checks
start with the candidate pair at the top of the list. Rapid
convergence of ICE depends on both the offerer and answerer coming to
the same conclusion on the ordering of candidate pairs.
An agent considers a UDP pairing validated when all of its transport Recall that when each candidate is encoded into SDP, it contains a
address pairs have been validated. Each transport address pair is qvalue between 1 and 0, with 1 being the highest priority. Peer-
validated if an agent successfully completed a STUN Binding Request derived candidates, learned through the procedures described in
transaction from its native transport address to the corresponding Section 7.10 also have a priority between 0 and 1. For each media
remote transport address, and when it has received a STUN Binding stream, the native candidates are ordered based on their qvalues,
Request transaction on its native transport address, sent from the with higher q-values coming first. Amongst candidates with the same
remote transport address. This ensures that packets can flow in each qvalue, they are ordered based on candidate ID, using lexicographic
direction. order where C1 is placed before C2, if C2 precedes C1. In other
words, if the qvalues are the same, the candidates are sorted in
reverse order. This is actually important; as discussed in
Section 13, it allows peer-derived candidates to be preferred over
native ones. The result of these two ordering rules will be an
ordered list of candidates. The first candidate in this list is
given a sequence number of 1, the next is given a sequence number of
2, and so on. This same procedure is done for the remote candidates.
The result is that each candidate pair has two sequence numbers, one
for the native candidate, and one for the remote candidate.
Because validation of a transport address pair involves a STUN First, all of the candidate pairs for whom the smaller of the two
transaction in each direction, a pair can be in one of five states - sequence numbers equals 1 are taken first. Then, all of those for
unknown, invalid, send-valid, receive-valid and valid. Each whom the smaller of the two sequence numbers equals 2 are taken next,
transport address pair starts in the unknown state. and so on. Amongst those pairs that share the same value for their
smaller sequence number, they are ordered by the larger of their two
sequence numbers (smallest first). Amongst those pairs that share
the same value for their smaller sequence number and the same value
for their larger sequence number, the larger of the two candidate IDs
in each pair is selected, and the pairs are lexicographically
ordered in reverse by that candidate ID, largest first.
7.4.1.1 Send Validation As an example, consider two agents, A and B. One offers two
candidates for a media stream with candidate IDs of "g9" and "88",
with q-values of 1.0 and 0.8 respectively. The other answers with
three candidates with candidate IDs of "h8", "65" and "k1", with
q-values of 0.3, 0.2 and 0.1 respectively. The following table shows
the rank ordering of the six candidate pairs. The column labeled
"Max SN" is the larger of the two sequence numbers in the candidate
pair, and "Min SN" is the minimum. The column labeled "Max Cand.
To validate a transport address pair in the send direction, an agent ID" is the value of the larger of the two candidate IDs in the
needs to complete a successful STUN Binding Request transaction. candidate pair.
This means it needs to send a Binding Request from its native
transport address to the remote transport address, and receive a
successful Binding Response back.
For UDP-based transport addresses, an agent initiates a STUN Binding
Request transaction by sending from its native transport address, and
sends it to the remote transport address. The meaning of "sending
from its native transport address" is clear in the case of a local
transport address - the request is sent such that the source IP
address and port of the packet is equal to that local transport
address. However, the meaning is different for STUN and TURN derived
transport addresses. For STUN derived transport address, it is sent
by sending from the local transport address used to derive that STUN
address. For TURN derived transport addresses, it is sent by using
TURN mechanisms to send the request through the TURN server (using
the SEND primitive). Sending the request through the TURN server
necessarily requires that the request be sent from the client, using
the local transport address used to derive the TURN transport
address.
Order  A      A        A      B      B        B                Max
       Cand.  Cand.    Cand.  Cand.  Cand.    Cand.  Max  Min  Cand.
       ID     q-value  SN     ID     q-value  SN     SN   SN   ID
---------------------------------------------------------------------
1      g9     1.0      1      h8     0.3      1      1    1    h8
2      88     0.8      2      h8     0.3      1      2    1    h8
3      g9     1.0      1      65     0.2      2      2    1    g9
4      g9     1.0      1      k1     0.1      3      3    1    k1
5      88     0.8      2      65     0.2      2      2    2    88
6      88     0.8      2      k1     0.1      3      3    2    k1
This ordering is then modified slightly by taking the candidate pair
corresponding to the active candidate, if there is one, and promoting
it to the top of the list. This allows the current active candidate
to be tested first. As discussed below, media is not sent until the
corresponding candidate is verified, necessitating rapid verification
of the active candidate. This modified ordering is called the
candidate pair check ordering, since it reflects the order in which
connectivity checks will be done. If there was no active candidate,
the candidate pair check ordering and the candidate pair priority
ordering will be identical.
Within each candidate pair there will be a set of transport address
pairs, one for each component ID. Those pairs are ordered by
component ID. The result is an absolute ordering of all transport
address pairs for a media stream, sorted first by the order of their
candidate pairs (with the exception of the active candidate),
followed by the order of their component IDs. This ordering is
called the transport address pair check ordering.
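The ordering rules above can be sketched as a pair of stable sorts. This is a hedged illustration, not a normative algorithm: the Candidate class and the function names are hypothetical, candidate IDs are compared as plain strings (an assumed reading of "lexicographic"), and the active candidate pair is passed in as a (native ID, remote ID) tuple. The sample data uses the IDs and q-values as they appear in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    cand_id: str    # candidate-id
    qvalue: float   # priority between 0 and 1, 1 being most preferred


def sequence_numbers(candidates):
    """Map each candidate-id to its sequence number (1 comes first)."""
    # Apply the tie-break first (larger candidate ID first); the stable
    # sort by q-value (higher first) then preserves it among equal q-values.
    by_id = sorted(candidates, key=lambda c: c.cand_id, reverse=True)
    ordered = sorted(by_id, key=lambda c: c.qvalue, reverse=True)
    return {c.cand_id: sn for sn, c in enumerate(ordered, start=1)}


def priority_ordering(native, remote):
    """Return (native_id, remote_id) tuples in candidate pair priority order."""
    n_sn, r_sn = sequence_numbers(native), sequence_numbers(remote)
    pairs = [(n.cand_id, r.cand_id) for n in native for r in remote]
    # Last tie-break: the larger of the two candidate IDs, largest first.
    pairs.sort(key=lambda p: max(p), reverse=True)
    # Main keys: smaller sequence number, then larger sequence number.
    pairs.sort(key=lambda p: (min(n_sn[p[0]], r_sn[p[1]]),
                              max(n_sn[p[0]], r_sn[p[1]])))
    return pairs


def check_ordering(pairs, active_pair=None):
    """Promote the active candidate pair, if there is one, to the top."""
    if active_pair is None or active_pair not in pairs:
        return list(pairs)
    return [active_pair] + [p for p in pairs if p != active_pair]


def transport_address_pair_check_ordering(pairs, component_ids):
    """Expand ordered candidate pairs into transport address pair IDs,
    ordered by component ID within each candidate pair."""
    return [f"{n}:{c}:{r}:{c}" for n, r in pairs for c in sorted(component_ids)]


agent_a = [Candidate("g9", 1.0), Candidate("88", 0.8)]
agent_b = [Candidate("h8", 0.3), Candidate("65", 0.2), Candidate("k1", 0.1)]

ordered = priority_ordering(agent_a, agent_b)
checks = check_ordering(ordered, active_pair=("88", "65"))
ta_pairs = transport_address_pair_check_ordering(checks, [1, 2])
```

Because every sort key is symmetric in the two candidates (minimum, maximum, larger ID), the peer arrives at the same sequence of pairs when it runs the same procedure with the native and remote lists swapped, which is the convergence property the text relies on.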
Ordering of candidates may involve transport protocol specific
considerations. There are none for UDP. However, extensions that
define usage of ICE with other transport protocols SHOULD specify any
special ordering considerations.
7.6 Performing the Connectivity Checks
Connectivity checks are performed by sending peer-to-peer STUN
Binding Requests. These checks result in a candidate progressing
through a state machine that captures the progress of connectivity
checks. The specific state machine and the procedures for the
connectivity checks are specific to the transport protocol. This
specification defines rules for UDP. Extensions to ICE that describe
other transport protocols SHOULD describe the state machine and the
procedures for connectivity checks.
The set of states visited by the offerer and answerer is depicted
graphically in Figure 4.
                                      |
                                      |Start
                                      |
                                      |
                                      V
                                +------------+
                                |            |
                                |            |
                                |  Waiting   |----------------+
                                |            |                |
                                |            |                |
                                +------------+                |
                                      |                       |
                                      | Timer Ta              | Get Req
                                      | --------.             | -------
                                      | Send Req              | Send Res,
                                      V                       | Send Req
                 Get Res        +------------+  Get Req       |
                 -------        |            |  -------       |
                    -           |            |  Send Res      |
                +---------------|  Testing   |-----------+    |
                |               |            |           |    |
                |               |            |           |    |
                |               +------------+           |    |
                |                     |                  |    |
                |                     | Error            |    |
                |                     | -----            |    |
 Timer Tr       |                     |   -              |    |
 --------       V                     V                  V    V
 Send Req +------------+        +------------+        +------------+
    +-----|            |        |            |        |            |
    |     |   Recv-    |        |            |        |   Send-    |
    |     |   Valid    |------->|  Invalid   |<-------|   Valid    |
    |     |            |        |            |        |            |
    +---->|            | Error  |            | Error  |            |
          +------------+ -----  +------------+ -----  +------------+
                |          -          ^          -          |
                |                     | Error               |
                |                     | -----               |
                |                     |   -                 |
                |               +------------+              |
                |               |            |              |
                |               |            |              |
                +-------------->|   Valid    |<-------------+
                  Get Req       |            |  Get Res
                  -------       |            |  -------
                  Send Res      +------------+     -
                                   |       ^
                                   |       |
                                   |       |
                                   +-------+
                                    Timer Tr
                                    --------
                                    Send Req
                                  Figure 4
The state machine has six states: Waiting, Testing, Recv-Valid,
Send-Valid, Valid and Invalid. Initially, all transport address
pairs start in the Waiting state. In this state, the agent waits for
one of two events: a chance to send a Binding Request, or receipt of
a Binding Request.
Since there is an instance of the state machine for each transport
address pair, Binding Requests and responses need to be matched to
the specific state machine to which they apply. This is done by
computing the matching transport address pair for each Binding
Request, which is found by examining the USERNAME of the incoming
Binding Request. The USERNAME directly contains the transport
address pair ID. Requests that are sent by an agent as part of the
processing described here encode the transport address pair in the
USERNAME. Binding Responses are matched to their requests using the
STUN transaction ID, and then mapped to the transport address pair
from that.
Every Ta seconds, the agent starts a new connectivity check for a
transport address pair. The check is started for the first transport
address pair in the transport address pair check ordered list (which
will be the active candidate) that is in the Waiting state. The
state machine for this transport address pair is moved to the Testing
state, and the agent sends a connectivity check using a STUN Binding
Request, as outlined in Section 7.7. Once a STUN connectivity check
begins, the processing of the check follows the rules for STUN.
Specifically, retransmits of STUN requests are done as specified in
RFC 3489, and furthermore, if a transaction fails and needs to be
retried, that retry can happen rapidly, as described below. It
doesn't "count" against the rate limit of 1/Ta checks per second. In
addition, the keepalives that are generated for a valid pair do not
count against the rate limit either. The rate limit applies strictly
to the start of connectivity checks by the answerer for a transport
address pair that has been newly signaled through an offer/answer
exchange.
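The pacing rule above (one new check per Ta tick, for the first pair still in the Waiting state) might be sketched as below. The function name and the use of a dictionary keyed by a pair identifier are assumptions for illustration; retransmissions, rapid retries and keepalives are deliberately outside this function, since they do not count against the rate limit.

```python
from typing import Dict, List, Optional


def start_next_check(check_order: List[str], states: Dict[str, str]) -> Optional[str]:
    """Called once every Ta seconds (hypothetical timer callback).

    check_order: pair identifiers in transport address pair check order.
    states: current state name for each pair identifier.
    Returns the pair for which a Binding Request should now be sent,
    or None if no pair is still in the Waiting state.
    """
    for pair_id in check_order:
        if states[pair_id] == "Waiting":
            states[pair_id] = "Testing"  # the caller then sends the STUN Binding Request
            return pair_id
    return None
```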
In addition, if, while in the Waiting state, an agent receives a
Binding Request matching that transport address pair, and this
Binding Request generates a successful response, the agent moves into
the Send-Valid state, and sends a connectivity check of its own using
a STUN Binding Request, as outlined in Section 7.7. If the Binding
Request didn't generate a success response, there is no change in
state or generation of a Binding Request.
If, while in the Testing state, the agent receives a successful
response to its STUN request, it moves into the Recv-Valid state. In
this state, the agent knows that packets can flow in both directions.
However, its peer agent doesn't yet know that; all it knows is that
it has been able to receive a packet. Thus, in this state, the agent
awaits receipt of the Binding Request sent by its peer, as the
response to that request is what informs its peer that packets can
flow in both directions.
If, while in the Send-Valid state, the agent receives a successful
response to its STUN request, it moves to the Valid state. In this
state, the agent knows that packets can flow in each direction. It
also knows that its peer has sent it the STUN Request whose response
will demonstrate to the peer that packets can flow in each direction.
If, while in the Recv-Valid state, the agent receives a STUN Binding
Request from its peer that results in a successful response, the
agent moves into the Valid state. Receipt of a request whose
response was not a successful one does not result in a change in
state.
In any state, if the STUN transaction results in an error, the state
machine moves into the invalid state.
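The transitions described so far can be collected into a small table. This is a sketch under stated assumptions: the event and action names are invented for the example, "get_req" means a received Binding Request that generated a success response, and the Testing-to-Send-Valid transition is read from Figure 4 rather than from the prose. Combinations not listed leave the state unchanged here, which is a simplification.

```python
from typing import Dict, List, Tuple

# (state, event) -> (next state, actions). Event and action names are hypothetical.
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, List[str]]] = {
    ("Waiting", "timer_ta"): ("Testing", ["send_req"]),
    ("Waiting", "get_req"): ("Send-Valid", ["send_res", "send_req"]),
    ("Testing", "get_res"): ("Recv-Valid", []),
    ("Testing", "get_req"): ("Send-Valid", ["send_res"]),  # taken from Figure 4
    ("Recv-Valid", "get_req"): ("Valid", ["send_res"]),
    ("Recv-Valid", "timer_tr"): ("Recv-Valid", ["send_req"]),
    ("Send-Valid", "get_res"): ("Valid", []),
    ("Valid", "timer_tr"): ("Valid", ["send_req"]),
}


def step(state: str, event: str) -> Tuple[str, List[str]]:
    """Advance one transport address pair's state machine by one event."""
    if event == "error":
        # In any state, a STUN transaction error moves the pair to Invalid.
        return "Invalid", []
    return TRANSITIONS.get((state, event), (state, []))
```

One instance of this state would be kept per transport address pair, with incoming requests routed to the right instance by USERNAME and responses by STUN transaction ID.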
If a transport address pair is in the Recv-Valid or Valid state, an
agent MUST generate a new STUN Binding Request transaction every Tr
seconds. This transaction ensures that NAT bindings for the
transport address pair remain open while the candidate is under
consideration. The transaction is performed as outlined in
Section 7.7. These transactions can also be used to keep the
bindings alive when the candidate is promoted to active, as described
in Section 7.12. Tr SHOULD be configurable, and SHOULD default to 15
seconds. If the transaction results in an error, the state machine
moves to the invalid state. This happens in cases where the NAT
bindings expire (e.g., due to binding timeouts or NAT failures).
The candidate pair itself has a state, which is derived from the
states of its transport address pairs. If at least one of the
transport address pairs in a candidate pair is in the invalid state,
the state of the candidate pair is considered to be invalid. If the
candidate pair enters this state, an agent SHOULD move the state
machines for all of the other transport address pairs in this
candidate pair into the invalid state as well. This will ensure that
connectivity checks never start for those transport address pairs.
Furthermore, if checks are already in progress for one of those
transport address pairs, the agent SHOULD cease them.
If all of the transport address pairs making up the candidate pair
are Valid, the candidate pair is considered valid. If all of the
transport address pairs making up the candidate pair are either Valid
or Recv-Valid, and at least one is Recv-Valid, the candidate pair is
considered to be Recv-Valid. If all of the transport address pairs
making up the candidate pair are either Valid or Send-Valid, and at
least one is Send-Valid, the candidate pair is considered to be Send-
Valid. If all of the transport address pairs in a candidate pair are
in the Waiting state, the candidate pair is in the waiting state. If
all of the transport address pairs in the candidate pair are either
in the Waiting or Testing states, and at least one is in the Testing
state, the state of the candidate pair is Testing. Otherwise, the
state of the candidate pair is considered Indeterminate.
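The derivation of a candidate pair's state from its transport address pairs can be written as a short function. This is an illustrative sketch; the function name and the representation of states as strings are assumptions.

```python
from typing import Iterable


def candidate_pair_state(transport_pair_states: Iterable[str]) -> str:
    """Derive the candidate pair state from the states of its transport address pairs."""
    states = set(transport_pair_states)
    if not states:
        raise ValueError("a candidate pair has at least one transport address pair")
    if "Invalid" in states:
        return "Invalid"
    if states == {"Valid"}:
        return "Valid"
    # From here on at least one pair is not Valid.
    if states <= {"Valid", "Recv-Valid"}:
        return "Recv-Valid"
    if states <= {"Valid", "Send-Valid"}:
        return "Send-Valid"
    if states == {"Waiting"}:
        return "Waiting"
    if states <= {"Waiting", "Testing"}:
        return "Testing"
    return "Indeterminate"
```

For instance, a pair whose RTP component is Recv-Valid and whose RTCP component is Send-Valid falls through every rule and is Indeterminate.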
A candidate itself also has a state. If a candidate is present in at
least one valid candidate pair, that candidate is said to be valid.
If all of the candidate pairs containing that candidate are invalid,
the candidate itself is invalid. Otherwise, the candidate's state is
Indeterminate.
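The same idea applies one level up, from candidate pairs to a single candidate. The sketch below assumes the input is the list of states of every candidate pair that contains the candidate; the function name is hypothetical.

```python
from typing import List


def candidate_state(containing_pair_states: List[str]) -> str:
    """Derive a candidate's state from the candidate pairs that contain it."""
    if "Valid" in containing_pair_states:
        return "Valid"
    if containing_pair_states and all(s == "Invalid" for s in containing_pair_states):
        return "Invalid"
    return "Indeterminate"
```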
If a native candidate becomes valid, and is more preferred than the
active one, the offerer sends an updated offer with this newly
validated candidate promoted to the m/c-line. This process is
discussed in more detail in Section 7.9.
7.7 Sending a Binding Request for Connectivity Checks
An agent performs a Binding Request transaction by sending a STUN
Binding Request from its native transport address, and sending it to
the remote transport address. The meaning of "sending from its
native transport address" depends on the type of transport protocol
and the type of transport address (local, STUN-derived, TURN-derived,
or peer-derived). This specification defines the meaning for UDP.
Specifications defining other transport protocols must define what
this means for them.
For UDP-based local transport addresses, sending from the local
transport address has the meaning one would expect - the request is
sent such that the source IP address and port of the packet are equal
to that local transport address. For STUN derived UDP transport
addresses, it is sent by sending from the local transport address
used to derive that STUN address. For TURN derived UDP transport
addresses, it is sent by using TURN mechanisms to send the request
through the TURN server (using the SEND primitive). Sending the
request through the TURN server necessarily requires that the request
be sent from the client, using the local transport address used to
derive the TURN transport address.
The Binding Request sent by the agent MUST contain the USERNAME The Binding Request sent by the agent MUST contain the USERNAME
attribute. This attribute MUST be set to the transport address pair attribute. This attribute MUST be set to the transport address pair
ID of the corresponding transport address pair as seen by its peer. ID of the corresponding transport address pair as seen by its peer.
Thus, for the first transport address pair in the example above, if Thus, for the first transport address pair in Figure 2, if the agent
the agent on the left sends the STUN Binding Request, the USERNAME on the left sends the STUN Binding Request, the USERNAME will have
will have the value YW. The request MAY contain the MESSAGE- the value R:1:L:1. If the agent on the right sends the STUN Binding
Request, the USERNAME will have the value L:1:R:1. To be clear, the
USERNAME that is used is NOT the one seen locally, but rather the one
as seen by its peer. The request SHOULD contain the MESSAGE-
INTEGRITY attribute, computed according to RFC 3489 procedures. The INTEGRITY attribute, computed according to RFC 3489 procedures. The
MESSAGE-INTEGRITY The Binding Request MUST NOT contain the CHANGE- key used as input to the HMAC is the password provided by the peer
REQUEST or ANSWER-ADDRESS attribute. for this remote transport address. The Binding Request MUST NOT
contain the CHANGE-REQUEST or RESPONSE-ADDRESS attribute.
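The "as seen by its peer" rule for the USERNAME in the right-hand (-06) text can be illustrated with the values quoted from Figure 2. The sketch assumes that a transport address pair ID has the form native-ID:remote-ID, that each transport address ID is a candidate ID and a component ID joined by a colon, and that candidate IDs contain no colon; the function names are hypothetical.

```python
from typing import Tuple


def split_pair_id(pair_id: str) -> Tuple[str, str]:
    """Split 'L:1:R:1' into the two transport address IDs ('L:1', 'R:1')."""
    parts = pair_id.split(":")
    if len(parts) != 4:
        raise ValueError(f"unexpected transport address pair ID: {pair_id!r}")
    return ":".join(parts[:2]), ":".join(parts[2:])


def username_for_request(local_pair_id: str) -> str:
    """USERNAME to place in an outgoing Binding Request.

    The local view lists the native transport address first; the peer's
    view lists its own (our remote) transport address first.
    """
    native_id, remote_id = split_pair_id(local_pair_id)
    return f"{remote_id}:{native_id}"
```

The receiving agent can then take the part of the USERNAME before the second colon and look it up among its own transport address IDs.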
Each of these STUN transactions will generate either a timeout, or a The STUN transaction will generate either a timeout, or a response.
response. If the response is a 420, 500, or 401, the agent should If the response is a 420, 500, or 401, the agent should try again as
try again as described in RFC 3489. Either initially, or after such described in RFC 3489 (as mentioned above, it need not wait Ta
a retry, the STUN transaction might produce a non-recoverable failure seconds to try again). Either initially, or after such a retry, the
response (error codes 400, 431, or 600) or a failure result STUN transaction might produce a non-recoverable failure response
inapplicable to this usage of STUN and thus unrecoverable (432, 433). (error codes 400, 430, 431, or 600) or a failure result inapplicable
If this happens the transport address pair and its corresponding to this usage of STUN and thus unrecoverable (432, 433). If this
candidate is considered invalid. If the STUN transaction produces a happens, an error event is generated into the state machine, and the
430 error or times out, the client SHOULD retry with a new STUN transport address pair enters the invalid state.
Binding Request transaction. The 430 response code, as described
below, is generated when the server doesn't recognize the STUN
username because the BindingRequest was received prior to the
receipt of the answer. Its occurrence is a result of a failed race
between the BindingRequest and the answer. This is remedied by
retrying, which allows the "slower" answer to be received. These
retry transactions carry the same USERNAME value as the original
Binding Request, and differ only in their STUN transaction ID. If
these retries have not produced a success response after Tg seconds,
the transport address pair is considered invalid. Tg SHOULD be
configurable. It is RECOMMENDED that it default to 50 seconds. This
is a reasonable approximation of the maximum SIP transaction
duration.
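The handling of response codes can be summarized in a small classifier. Note that the two columns differ on 430: the sketch below follows the right-hand (-06) text, where 430 is non-recoverable, and does not model the Tg-based retry of the -05 text. The function name and the returned labels are assumptions; codes the text does not mention are reported as "unspecified" rather than guessed.

```python
RETRY_CODES = {401, 420, 500}                         # try again as described in RFC 3489
UNRECOVERABLE_CODES = {400, 430, 431, 432, 433, 600}  # error event into the state machine


def classify_response(code: int) -> str:
    """Map a STUN response code to the action taken by the connectivity check."""
    if code == 200:
        return "success"
    if code in RETRY_CODES:
        return "retry"
    if code in UNRECOVERABLE_CODES:
        return "error"
    return "unspecified"
```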
If the STUN transaction succeeds for a UDP transport address pair If the STUN transaction times out, the client SHOULD NOT retry. The
(producing a success response), and the pair was previously in the only reason a retry might succeed is if there was severe packet loss
receive-valid state, it is considered valid. If the pair was during the duration of the check, or the answer was significantly
previously in the unknown state, it is considered send-valid. delayed, also due to packet loss. However, STUN Binding Request
transactions run for 9.5 seconds, which is well beyond the typical
tolerance for a session establishment. The retries come with a
penalty of additional traffic, which can be used to launch DoS
attacks (Section 13.4.2). The only reason to not follow the SHOULD NOT
is if the agent has adjusted the STUN transaction timers to be more
aggressive.
If a transport address pair is send-valid or valid, an agent MUST If the Binding Response is a 200, the agent SHOULD check for the
generate a new STUN Binding Request transaction every Tr seconds. MESSAGE-INTEGRITY attribute and verify it, as discussed in RFC 3489.
This transaction ensures that NAT bindings for the transport address Indeed, this check SHOULD be done for all responses. This will
pair remain open while the candidate is under consideration. They result in the response being discarded (eventually leading to a
can also be used to keep the bindings alive when the candidate is timeout), if the integrity check fails.
promoted to active, as described in Section 7.7. Tr SHOULD be
configurable, and SHOULD default to 15 seconds. Each new Binding
Request transaction is processed according to the procedures in this
Section. It is possible for a previously valid candidate to later be
invalidated by a subsequent STUN transaction. This happens in cases
where the NAT bindings expire.
7.4.1.2 Receive Validation 7.8 Receiving a Binding Request for Connectivity Checks
As a result of providing a list of candidates in its offer or answer, As a result of providing a list of candidates in its offer or answer,
an ICE implementation will receive STUN Binding Request messages. An an agent will receive STUN Binding Request messages. An agent MUST
agent MUST be prepared to receive STUN Binding Requests on each local be prepared to receive STUN Binding Requests on each local transport
transport address from the moment it sends an offer or answer that address from the moment it sends an offer or answer that contains a
contains a candidate with that local transport address. Similarly, candidate with that local transport address. Similarly, it MUST be
it MUST be prepared to receive STUN Binding Requests on a local prepared to receive STUN Binding Requests on a local transport
transport address the moment it sends an offer or answer that address the moment it sends an offer or answer that contains a STUN
contains a STUN or TURN candidate derived from a local candidate or TURN candidate derived from a local candidate containing that
containing that local transport address. It can cease listening for local transport address. It can cease listening for STUN messages on
STUN messages on that local transport address after reliably sending that local transport address after sending an updated offer or answer
an updated offer or answer which does not include any candidates which does not include any candidates with transport addresses that
equal to or derived from that local transport address. Here, are equal to or derived from that local transport address.
"reliably" means that the agent knows that the offer or answer was
received by its peer. This knowledge is based on the protocol
carrying the offer/answer exchanges. In the case of SIP, if the
offer is in an INVITE, the agent knows this was received by its peer
when a 200 OK or reliable provisional response [9] is received with
the answer. If the offer is in a reliable provisional response, the
agent knows it was reliably received when the PRACK arrives. If an
answer is in a 200 OK response, the agent knows this was received
when the ACK is received.
The agent does not need to provide STUN service on any other IP The agent does not need to provide STUN service on any other IP
address or port, unlike the STUN usage described in [1]. The need to addresses or ports, unlike the STUN usage described in [1]. The need
run the service on multiple ports is to support the change flags. to run the service on multiple ports is to support receipt of Binding
However, those flags are not needed with ICE, and the server SHOULD Requests with the CHANGE-REQUEST attribute. However, that attribute
reject, with a 400 answer, any STUN requests with these flags set. is not used when STUN is used for connectivity checks. A server
The CHANGED-ADDRESS attribute in a BindingAnswer is set to the SHOULD reject, with a 400 answer, any STUN requests with a CHANGE-
transport address on which the server is running. REQUEST attribute whose value is non-zero. The CHANGED-ADDRESS
attribute in a BindingAnswer is set to the transport address on which
the server is running.
Furthermore, there is no need to support TLS or to be prepared to Furthermore, there is no need to support TLS or to be prepared to
receive SharedSecret request messages. Those messages are used to receive SharedSecret request messages. Those messages are used to
obtain shared secrets to be used with BindingRequests. However, with obtain shared secrets to be used with BindingRequests. However, with
ICE, a shared secret is not needed. The tid's that are exchanged and ICE, these shared secrets are exchanged through the offer/answer
used to form the STUN USERNAME attribute do not actually require the exchange itself.
security properties associated with a shared secret in order for ICE
to operate securely; this is because ICE security is bootstrapped off
of the protocol carrying the offer/answer exchanges.
One of the candidates will be in use as the active candidate. For One of the candidates may be in use as the active candidate. For the
the transport addresses comprising that candidate, the agent will transport addresses comprising that candidate, the agent will receive
receive both STUN requests and media packets on its associated local both STUN requests and media packets on its associated local
transport addresses. The agent MUST be able to disambiguate them. transport addresses. The agent MUST be able to disambiguate them.
In the case of RTP/RTCP, this disambiguation is easy. RTP and RTCP In the case of RTP/RTCP, this disambiguation is easy. RTP and RTCP
packets start with the bits 0b10 (v=2). The first two bits in STUN packets start with the bits 0b10 (v=2). The first two bits in STUN
are always 0b00. This disambiguation also works for packets sent are always 0b00. This disambiguation also works for packets sent
using Secure RTP [23], since the RTP header is in the clear. using Secure RTP [25], since the RTP header is in the clear.
Disambiguating STUN with other media stream protocols may be more Disambiguating STUN with other media stream protocols may be more
complicated. However, it can always be possible with arbitrarily complicated. However, it can always be possible with arbitrarily
high probabilities by selecting an appropriately random username (see high probabilities by selecting an appropriately random username (see
below). below).
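For the RTP/RTCP case, the disambiguation described above only needs the first two bits of the packet. The following is a minimal sketch; the function name and the "unknown" label for other bit patterns or empty packets are assumptions.

```python
def classify_packet(data: bytes) -> str:
    """Distinguish STUN from RTP/RTCP using the two most significant bits."""
    if not data:
        return "unknown"
    top_two_bits = data[0] >> 6
    if top_two_bits == 0b10:
        return "rtp"    # RTP and RTCP carry version 2 in the first two bits
    if top_two_bits == 0b00:
        return "stun"   # the first two bits of a STUN message are always zero
    return "unknown"
```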
The STUN Binding Request can only be usefully processed once an Processing of the Binding Request proceeds in two steps. The first
offer/answer exchange has completed. As a result, if an offeror is generation of the response, and the second is side-effect
receives a STUN Binding Request message prior to the receipt of an processing. Generation of the response follows the general
answer to its offer, it MUST reject the request with a 430 response. procedures of RFC 3489. The USERNAME is considered valid if its
This will cause the answerer to retry, and give time for the answer topmost portion (the part up to, but not including the second colon)
(which is in transit) to arrive at the offerer. corresponds to a transport address ID known to the agent. The
password associated with that transport address ID is used to verify
the MESSAGE-INTEGRITY attribute, if one was present in the request.
If the USERNAME was not valid, the agent generates a 430. Otherwise,
the success response will include the MAPPED-ADDRESS attribute, which
is used for learning new candidates, as described in Section 7.10.
The MAPPED-ADDRESS attribute is populated with the source IP address
and port of the Binding Request. For Binding Requests received over
TURN-derived transport addresses, this MUST be the source IP address
and port of the Binding Request when it arrived at the TURN relay,
prior to forwarding towards the agent. That source transport address
will be present in the REMOTE-ADDRESS attribute of a TURN Data
Indication message, if the Binding Request were delivered through a
Data Indication. If the Binding Request was not encapsulated in a
Data Indication, that source address is equal to the current active
destination for the TURN session.
If the offer/answer exchange has completed, the agent MUST follow the The side effect processing involves changes to the state machine for
procedures defined in RFC 3489 and verify that the USERNAME attribute a transport address pair. This processing cannot be done until the
is known to the server. Here, this is done by taking the USERNAME initial offer/answer exchange has completed. As a consequence, if
attribute, and comparing it against the transport address pair the offerer received a Binding Request that generated a success
identifiers for each transport address pair as seen by that agent. response, but had not yet received the answer to its offer, it waits
If there is no match, the STUN Binding Request generates a 400. If for the answer, and when it arrives, then performs the side effect
processing.
The agent takes the entire contents of the USERNAME, and compares
them against the transport address pair identifiers as seen by that
agent for each transport address pair. If there is no match, nothing
is done - this should never happen for compliant implementations. If
there is a match, the resulting transport address pair is called the there is a match, the resulting transport address pair is called the
matching transport address pair. The user agent proceeds with the matching transport address pair. The state machine for the matching
processing of the request and generation of a response as per RFC transport address pair is then updated based on the receipt of a STUN
3489. In addition, if the state of that transport address pair Binding Request, and the resulting actions described in Section 7.6
was previously unknown, it changes to receive-valid. If the state are undertaken.
was previously send-valid, it moves to valid.
An agent will continue to receive periodic STUN transactions as long An agent will continue to receive periodic STUN transactions on a
as it had listed its transport address in an a=candidate attribute. local transport address as long as it had listed that transport
It MUST process those transactions according to this section. It is address, or one derived from it, in an a=candidate attribute in its
most recent offer or answer, and the state machine indicates that
Binding Requests are periodically sent (as is the case for UDP). It
MUST process any such transactions according to this section. It is
possible that a transport address pair that was previously valid may possible that a transport address pair that was previously valid may
become invalidated as a result of a subsequent failed STUN become invalidated as a result of a subsequent failed STUN
transaction. transaction.
7.4.1.3 Learning New Candidates from Connectivity Checks 7.9 Promoting a Candidate to Active
As a consequence of the connectivity checks, each agent will change
the states for each transport address pair, and consequently, for the
candidate pairs. When a candidate pair becomes valid, and the agent
is in the role of offerer for that candidate pair, the agent follows
the logic in this section. The rules only apply to the offerer of a
candidate pair in order to eliminate the possibility of both agents
simultaneously offering an update to promote a candidate to active.
If this candidate pair is the first one in the candidate pair
priority ordered list, the agent SHOULD send an updated offer as
described in Section 7.11.1. If this candidate pair is not the first
on that list, but it is the first on the candidate pair check ordered
list, it means that this candidate pair is the active one, and its
connectivity has been verified. This is good news; the currently
active candidate is working. Media can now flow as described in
Section 7.13 (media will never flow prior to validation). However,
no updated offer is sent at this time.
If this candidate pair is not the first on the candidate pair
priority ordered list or the candidate pair check ordered list, and
the wait-state timer has not yet been set, the agent sets this timer
to Tws seconds. Tws SHOULD be configurable, and SHOULD have a
default of 100ms. This timer allows for a higher priority
connectivity check to complete, in the event its STUN Binding Request
was lost or delayed in the network. If, prior to the wait-state
timer firing, another connectivity check completes and a candidate
pair is validated, there is no need to reset or cancel the timer.
Once the timer fires, the agent SHOULD issue an updated offer as
described in Section 7.11.1.
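The offerer's decision when a candidate pair becomes valid can be sketched as one function. The action labels, the function name and the boolean timer flag are assumptions made for this example; timer expiry (after Tws seconds, default 100ms) would separately trigger the updated offer.

```python
from typing import List


def on_candidate_pair_valid(
    pair_id: str,
    priority_order: List[str],
    check_order: List[str],
    wait_state_timer_set: bool,
) -> str:
    """Decide what the offerer does when pair_id becomes valid."""
    if pair_id == priority_order[0]:
        return "send_updated_offer"        # best possible pair: promote it now
    if pair_id == check_order[0]:
        return "active_candidate_verified"  # media can flow; no updated offer yet
    if not wait_state_timer_set:
        return "start_wait_state_timer"     # give higher priority checks Tws seconds
    return "no_action"                      # timer already running; leave it alone
```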
7.10 Learning New Candidates from Connectivity Checks
ICE makes use of candidate addresses learned through protocols like ICE makes use of candidate addresses learned through protocols like
STUN, as described in Section 7.1. These addresses are learned when STUN, as described in Section 7.1. These addresses are learned when
STUN requests are sent to configured STUN servers. However, the STUN requests are sent to configured STUN servers. However, the
peer-to-peer STUN connectivity checks can themselves provide peer-to-peer STUN connectivity checks can themselves provide
additional candidates that ICE can make use of. This happens when additional candidates that ICE can make use of. This happens, for
two agents are separated by a symmetric NAT. When the agent behind example, when two agents are separated by a symmetric NAT. When the
the symmetric NAT sends a Binding Request to the other agent (which agent behind the symmetric NAT sends a Binding Request to the other
can have a public address or be behind any type of NAT except for agent (which can have a public address or be behind any type of NAT
symmetric), the symmetric NAT will create a new NAT binding for this except for symmetric), the symmetric NAT will create a new NAT
Binding Request. Because of the properties of symmetric NAT, that binding for this Binding Request. Because of the properties of
binding can be used by the agent on the public side of the symmetric symmetric NAT, that binding can be used by the agent on the public
NAT to send packets back to the agent behind the symmetric NAT. side of the symmetric NAT to send packets back to the agent behind
the symmetric NAT.
To do this, ICE agents dynamically learn new candidates by examining
the source IP addresses and MAPPED-ADDRESS attributes in STUN Binding
Requests and Responses respectively. If they don't match any
existing candidates, a new candidate is added. This candidate
corresponds to the new IP address and port created by the symmetric
NAT, and is a new point of contact for the agent behind the symmetric
NAT. Since that candidate is only reachable from the very specific
IP address and port where the STUN request was sent to, the new
candidate is paired up with that transport address on the other
agent. Since all candidates need to have properties, such as tids,
priorities and candidate IDs, these are all computed algorithmically,
so that they can be determined by both agents just from the STUN
message.
The specific procedures on receipt of a Binding Request and Response To do this, ICE agents perform additional processing on the receipt
for accomplishing this are described here. of STUN Binding Requests and responses, beyond the logic described in
Section 7.7 and Section 7.8. This logic is described below.
7.4.1.3.1 On Receipt of a Binding Request 7.10.1 On Receipt of a Binding Request
When a STUN Binding Request is received which generates a success When a STUN Binding Request is received which generates a success
response, the source IP address and port of that request is compared response, that Binding Request would have been associated with a
to all existing remote transport addresses. If there is no match, the matching transport address pair and corresponding candidate pair.
agent creates a new remote candidate, and adds a transport address to The source IP and port of this Binding Request are compared to the IP
it. It sets the IP address and port of this new remote transport address and port of the remote transport address in the matching
address to the IP address and port that was present in the incoming transport address pair. Note that, in this case, we are comparing
Binding Request. Since this is a new candidate transport address, it actual IP addresses and ports - not tids. In addition, if the
requires a new tid. The agent creates one algorithmically, by Binding Request arrived through a TURN derived transport address, the
concatenating the tid of the remote transport address in the matching source IP and port of this binding request used for the comparison
transport address pair (recall that the matching transport address are those in the Binding Request when it arrived at the TURN relay,
pair is the one whose transport address pair ID matched the username prior to forwarding towards the agent. That source transport address
of the incoming Binding Request) with the string representation of will be present in the REMOTE-ADDRESS attribute of a TURN Data
the source IP address and port from the incoming Binding Request. Indication message, if the Binding Request were delivered through a
This string representation is defined using the grammar for Data Indication. If the Binding Request was not encapsulated in a
"hostport" from RFC 3261 [3], which defines the familiar notation of Data Indication, that source address is equal to the current active
the IP address and port separated by a colon. destination for the TURN session.
The priority of the new candidate MUST be set to the priority of the The comparison of the source IP and port of the Binding Request and
remote candidate in the matching transport address pair. There is no the IP address and port of the remote transport address in the
need to compute the candidate ID for this new candidate. matching transport address pair may not match. One reason this could
happen is if there was a NAT between the two agents. If they do not
match, the source IP and port of the Binding Request (and again, for
TURN derived transport address, this refers to the source IP address
and port of the packet when it arrived at the relay) are compared to
the IP address and ports across the transport address pairs in *all*
remote candidates. If there is still no match, it means that the
source IP and port might represent another valid remote transport
address. Such a transport address is called a peer-derived transport
address.
Though this is a valid transport address, the agent does not pair it To use it, that address needs to be associated with a candidate
up with each of its own transport addresses. Rather, it pairs it up (called a peer-derived candidate). In this case, however, the
only with the native transport address from the matching transport candidate isn't signaled through an offer/answer exchange; it is
address pair. This creates a new transport address pair. Since constructed dynamically from information in the STUN request. Like
connectivity has been verified in the receive direction, the agent all other candidates, the peer-derived candidate has a candidate ID.
sets its state to receive-valid. As with all other transport address The candidate ID is derived from the candidate IDs of the matching
pairs, the agent will attempt to validate send capabilities by candidate pair. In particular, the candidate ID is constructed by
sending a STUN Binding Request according to the procedures in concatenating the remote candidate ID with the native candidate ID
Section 7.4.1.1. (without the colon).
It is important to note that this process creates a new remote On receipt of a STUN Binding Request whose source IP and port don't
transport address, not a whole new remote candidate. For a whole match the transport address in any remote candidate, the agent
remote candidate to come into existence, all of its component constructs the candidate ID that represents the peer-derived
transport addresses must come into existence, and all must have been candidate, and checks to see if that candidate exists. It may
obtained as a result of a STUN Binding Requests between transport already exist if it had been constructed as a consequence of a
address pairs in the same pairing. As an example, consider the previous application of this logic on receipt of a Binding Request
pairing in Figure 2. If the peer is behind a symmetric NAT, the for a different transport address pair of the same candidate pair.
Binding Request sent from C to A might produce a new remote transport If there is not yet a peer derived candidate with that candidate ID,
address for RTP. To create a full candidate, a STUN Binding Request the agent creates it, and assigns it the newly computed candidate ID.
from D to B has to also create a new remote transport address, to be The priority of the peer-derived candidate MUST be set to the
used for RTCP. If this were to happen, the resulting set of priority of its generating candidate - the remote candidate in the
relationships is shown in Figure 3. To simplify the diagram, matching transport address pair. Note that, at this time, the peer
associated local transport address relationships have been omitted. derived candidate has no transport addresses in it.
Notice how the tids of the new remote candidate have been constructed
by concatenating the tids of the original remote candidate with the Newly created or not, the agent extracts the component ID from the
newly discovered transport addresses, here, {R,S}. matching transport address pair, and sees if a transport address with
that same component ID exists in the peer derived candidate. If not
(and it shouldn't), the agent adds a transport address to the peer-
derived candidate. This transport address is equal to the source IP
address and port from the incoming STUN Binding Request. It is
assigned the component ID equal to the component ID in the matching
transport address pair. This transport address will have a tid,
equal to the concatenation of the candidate ID for this new
candidate, and the component ID, separated by a colon.
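The identifier construction described above can be illustrated with a minimal sketch. The function names are hypothetical and are not defined by ICE; the sketch only assumes that candidate IDs are strings and component IDs are small integers.

```python
def peer_derived_candidate_id(remote_candidate_id: str, native_candidate_id: str) -> str:
    # On receipt of a Binding Request: remote candidate ID followed by the
    # native candidate ID, concatenated without a colon.
    return remote_candidate_id + native_candidate_id


def peer_derived_tid(candidate_id: str, component_id: int) -> str:
    # The tid is the candidate ID and the component ID, separated by a colon.
    return f"{candidate_id}:{component_id}"


cid = peer_derived_candidate_id("R", "L")
tid = peer_derived_tid(cid, 1)
```

With a remote candidate ID of R and a native candidate ID of L, this yields the candidate ID RL and, for component 1, the tid RL:1.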
The peer-derived candidate becomes usable once the number of
transport addresses in it equals the transport address pair count of
the candidate pair from which it is derived. Initially, the peer-
derived candidate will start with a single transport address. More
are added as the connectivity checks for the original candidate pair
take place. Once the peer-derived candidate becomes usable, it has
to be paired up with native candidates. However, unlike the
procedures of Section 7.5, which pair up each remote candidate with
each native candidate, this peer-derived candidate is only paired up
with the native candidate from the candidate pair from which it was
derived. This creates a new candidate pair, and a set of new
transport address pairs.
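As a rough sketch of the usability rule above, an agent might track a peer-derived candidate as follows. The class and field names are assumptions made for illustration; the addresses are documentation addresses, not values taken from this specification.

```python
from dataclasses import dataclass, field


@dataclass
class PeerDerivedCandidate:
    candidate_id: str
    # Transport address pair count of the candidate pair it was derived from.
    pair_count: int
    # Maps component ID -> (ip, port) learned from incoming Binding Requests.
    addresses: dict = field(default_factory=dict)

    def add_address(self, component_id: int, ip: str, port: int) -> None:
        # Add a transport address only if this component is not yet present.
        self.addresses.setdefault(component_id, (ip, port))

    def usable(self) -> bool:
        # Usable once every component of the original pair has an address.
        return len(self.addresses) == self.pair_count


cand = PeerDerivedCandidate("RL", pair_count=2)
cand.add_address(1, "192.0.2.10", 40000)
usable_after_first = cand.usable()
cand.add_address(2, "192.0.2.10", 40001)
usable_after_second = cand.usable()
```

Only once usable() returns true would the agent pair the candidate with the single native candidate from the originating candidate pair.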
Recall that, for each candidate pair, one agent plays the role of
offerer, and the other of answerer. For peer-derived candidates, the
agent that receives the STUN request and follows the processing in
this section acts as the answerer.
Figure 5 provides a pictorial representation of the peer-derived
candidate (the one with id=RL) and its pairing with the native
candidate with id L. The candidate with ID R is referred to as the
generating candidate. The peer-derived candidate is effectively an
alternate for that generating candidate, but is only paired with a
specific native candidate. Note that, for a particular generating
candidate, there can be many peer-derived candidates, up to one for
each native candidate.
............. ............. ............. .............
. . . . . tid=L:1 . . tid=R:1 .
. -- . . -- . component. -- . id=L:1:R:1 . -- .component
. | A|---------------------------------------| C| . id=1 . | A|------------------------| C| . id=1
. -- -----------+ Transport . -- . . -- -------+ . -- .
. Transport . | Address . Transport .
. Address . | Pair . Address .
. tid=W . | ID=WY . tid=Y .
. . | . . . . | . .
. . | . . Generating
. . | . . Candidate
. . | . . . . | . .
. . | . . . . | . .
. -- . | . -- . . tid=L:2 . | . tid=R:2 .
. | B|-----------C---------------------------| D| . component. -- . | id=L:2:R:2 . -- .component
. -- ---------+ | Transport . -- . id=2 . | B|-------C----------------| D| . id=2
. Transport . | | Address . Transport . . -- -----+ | . -- .
. Address . | | Pair . Address . . .| | . .
. tid=X . | | ID=XZ . tid=Z . . .| | . .
. .| | . .
. . | | . . . . | | . .
............. | | ............. ............. | | .............
| | remote Native | | Remote
native | | candidate Candidate | | Candidate
candidate | | id=L | | id=R
| |
| |
.| |
| |
| |
| |
| | ............. | | .............
| | . . | | . tid=RL:1 .
| | . -- . | | id=L:1:RL:1 . -- .component
| +---------------------------| R| . | +-----------------| C| . id=1
| Transport . -- . | . -- .
| Address . Transport .
| Pair . Address .
| ID=WYR . tid=YR .
| . . | . .
| . . Peer Derived
| . . Candidate
| . . | . .
| . . | . .
| . -- . | . tid=RL:2 .
+-----------------------------| S| . | id=L:2:RL:2 . -- .component
Transport . -- . +-------------------| D| . id=2
Address . Transport . . -- .
Pair . Address . . .
ID=XZS . tid=ZS . . .
. . . .
.............
peer-derived
remote candidate
Figure 3
7.4.1.3.2 On Receipt of a Binding Response
When an agent receives a successful Binding Response, it examines the
MAPPED-ADDRESS attribute in that response. If the MAPPED-ADDRESS
does not match any of the existing candidate transport addresses, this
represents a new peer-derived transport address.
The agent creates a new local candidate, and adds a transport address
to it. It sets the IP address and port of this new native transport
address to the IP address and port that was present in the MAPPED-
ADDRESS attribute of the Binding Response. Since this is a new
candidate transport address, it requires a new tid. The agent
creates one algorithmically, by concatenating the tid of the native
transport address in the transport address pair that was being
validated by the Binding Request with the string representation of
the source IP address and port from the MAPPED-ADDRESS attribute.
This string representation is defined using the grammar for
"hostport" from RFC 3261 [3], which defines the familiar notation of
the IP address and port separated by a colon.
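The tid construction above might be sketched as follows. The function names are hypothetical. The bracketed form for IPv6 addresses follows the "IPv6reference" production that the RFC 3261 "hostport" grammar uses; the example addresses are illustrative only.

```python
import ipaddress


def hostport(ip: str, port: int) -> str:
    # RFC 3261 "hostport": host ":" port, with IPv6 addresses in brackets.
    addr = ipaddress.ip_address(ip)
    if addr.version == 6:
        return f"[{ip}]:{port}"
    return f"{ip}:{port}"


def derived_native_tid(native_tid: str, mapped_ip: str, mapped_port: int) -> str:
    # Concatenate the tid of the native transport address being validated
    # with the hostport form of the MAPPED-ADDRESS value.
    return native_tid + hostport(mapped_ip, mapped_port)


new_tid = derived_native_tid("W", "192.0.2.1", 5000)
```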
The priority of the new candidate MUST be set to the priority of the
native candidate that was being validated by the Binding Request.
The agent SHOULD assign a new candidate ID to this candidate.
Though this is a valid transport address, the agent does not pair it
up with each of the remote transport addresses. Rather, it pairs it
up only with the remote transport address from the transport address
pair that was being validated. This creates a new transport address
pair. Since connectivity has been verified in the send direction,
the agent sets its state to send-valid. As with all other transport
address pairs, the agent will attempt to validate receive
capabilities by waiting for a STUN Binding Request according to the
procedures in Section 7.4.1.2.
It is important to note that this process creates a new native
transport address, not a whole new candidate. For a whole native
candidate to come into existence, all of its component transport
addresses must come into existence, and all must have been obtained
as a result of STUN Binding Requests between transport address
pairs in the same pairing.
7.4.2 TCP Connectivity Checks
7.4.2.1 Connection Establishment
Because of the connection-oriented nature of TCP, the connectivity
checks work differently. After the offer/answer exchange completes,
each agent will have a set of TCP candidates on which it is waiting
to receive a connection, and it will have a similar set from its
peer. Thus, a pairing of TCP candidates allows for the possibility
of TCP connections in each direction. Unlike the UDP checks, where
the STUN packets are sent from the native transport addresses to the
remote ones, the TCP connections are not opened from the native TCP
transport addresses to the remote ones. This would represent a
simultaneous open, an unusual condition that would
either fail or, at best, result in a single TCP connection. Rather,
ICE desires to attempt two connections, one in each direction, and
use one of them if both happen to succeed.
To accomplish this, each agent will attempt to open a connection to
each remote transport address in the transport address pair, and do
so "from" its native transport address. Here, however, "from" means
something different than the UDP case. If the native transport
address is a local transport address, the agent opens the TCP
connection from the same IP interface used to obtain the local
transport address, but from a different and ephemeral port. Indeed,
that port MUST NOT be the same as the port in the local transport
address. If the native transport address is a TURN-derived TCP
transport address, no attempt is made to open a connection at all.
TURN-derived TCP transport addresses can only be used in passive
mode.
As such, for each TCP transport address pair, there will be
zero, one, or two connection attempts. If both transport addresses
in the pair are TURN-derived, there will be zero (both sides passive).
If one of the transport addresses is local, and the other
TURN-derived, there will be one connection attempt. The agent owning the
local transport address will be in active mode, and the agent owning
the TURN-derived one will be in passive mode. If both are local
transport addresses, there will be two attempts, and each agent will
act in active mode.
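The three cases above reduce to counting the local (non-TURN-derived) transport addresses in the pair, since only an agent whose native address is local opens a connection. A minimal sketch, with a hypothetical function name:

```python
def connection_attempts(first_is_turn_derived: bool, second_is_turn_derived: bool) -> int:
    # Each agent whose transport address is local acts in active mode and
    # opens one connection; a TURN-derived address is passive only.
    return sum(1 for turn_derived in (first_is_turn_derived, second_is_turn_derived)
               if not turn_derived)


both_turn = connection_attempts(True, True)
one_local = connection_attempts(False, True)
both_local = connection_attempts(False, False)
```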
Because a transport address pair can produce multiple connections,
validity becomes a property of the TCP connection itself. A
transport address pair is considered valid if at least one valid
connection has been established within it. An entire pairing is
valid if all transport address pairs are valid.
7.4.2.2 Sending STUN Binding Requests
Once the connection is established, the agent which opened the . .
connection (that is, acted in active mode) sends a STUN Binding .............
Request over that connection. STUN Binding Requests as described in Remote
RFC 3489 are normally sent over UDP, but when used in conjunction Candidate
with ICE for connectivity checks, they are sent over TCP. id=RL
This unusual operation requires some explanation. At first glance, a Figure 5
successful TCP connection ought to be sufficient. Clearly,
connectivity is established, as TCP packets were exchanged in both
directions via the TCP handshake. While that is true, the STUN
Binding Requests serve many purposes, only one of which is to
literally test connectivity. The STUN requests also serve as a
correlation vehicle, allowing the agent to match the source of a
connection attempt with the offer/answer signaling driving the entire
mechanism. For example, in the case of a forked SIP INVITE carrying
an offer, the UAC may receive two connection attempts to each of its
passive TCP addresses, one from each branch of the fork. These are
readily disambiguated by the STUN Binding Request which will follow,
as the tid in the USERNAME tells the UAC which branch has initiated
the connection.
More importantly, however, the STUN Binding Request is an essential The new transport address pairs have a state machine associated with
part of the security properties of ICE. Without it, an entity them. The state that is entered, and actions to take as a
eavesdropping the signaling messages would be able to deny service or consequence, are specific to the transport protocol. For UDP, the
hijack media connections, and such attacks would require encryption procedures are defined here. Extensions that define processing for
of the offer/answer exchanges (using a mechanism like SIPS [3]) to other transport protocols SHOULD describe the behavior.
prevent. However, when a STUN Binding Request exchange is added,
these attacks are completely foiled without the need for SIPS,
raising the overall security of ICE substantially with minimal cost.
These properties of ICE are discussed thoroughly in Section 12.
As such, once an agent has actively opened a TCP connection to the For UDP, the state machine enters the Send-Valid state. Effectively,
remote agent, it sends a STUN Binding Request over that connection. the Binding Request just received "counts" as a validation in this
Recall that STUN messages include length indicators, allowing them to direction, even though it was formally done for a different candidate
be framed over a connection-oriented transport protocol. The Binding pair. In addition, the agent SHOULD generate a Binding Request for
Request MUST contain the USERNAME attribute. This attribute MUST be each transport address in this new candidate pair, as described in
set to the transport address pair ID of the corresponding transport Section 7.7. The transport address pairs are inserted into the
address pair as seen by its peer. Thus, for the first transport ordered list of pairs based on the ordering described in Section 7.5
address pair in Figure 2, if the agent on the left sends the STUN and processing follows the logic described in Section 7.6.
Binding Request, the USERNAME will have the value YW. The request
MAY contain the MESSAGE-INTEGRITY attribute, computed according to
RFC 3489 procedures. The Binding Request MUST
NOT contain the CHANGE-REQUEST or ANSWER-ADDRESS attribute. The STUN
Binding Request message SHOULD NOT be retransmitted over the
connection.
The STUN will generate either a timeout, or a response. If the 7.10.2 On Receipt of a Binding Response
response is a 420, 500, or 401, the agent should try again as
described in RFC 3489. Either initially, or after such a retry, the
STUN transaction might produce a non-recoverable failure response
(error codes 400, 431, or 600) or a failure result inapplicable to
this usage of STUN and thus unrecoverable (432, 433). If this
happens the connection is considered invalid. If the STUN
transaction produces a 430 error or times out, the client SHOULD
retry with a new STUN Binding Request transaction. The 430 response
code is a result of a failed race between the Binding Request and the
answer. This is remedied by retrying, which allows the "slower"
answer to be received. These retry transactions carry the same
USERNAME value as the original Binding Request, and differ only in
their STUN transaction ID. If these retries have not produced a
success response after Tg seconds, the connection is considered
invalid. Tg SHOULD be configurable. It is RECOMMENDED that it
default to 50 seconds. This is a reasonable approximation of the
maximum SIP transaction duration.
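The outcome handling above can be summarized in a small decision function. This is a sketch only: the function name and the returned labels are hypothetical, a timeout is modeled as an error code of None, and the "unspecified" result for codes not listed in the text is an assumption rather than specified behavior.

```python
RETRY_PER_RFC3489 = {401, 420, 500}
UNRECOVERABLE = {400, 431, 600, 432, 433}
TG_DEFAULT_SECONDS = 50.0  # RECOMMENDED default for Tg


def next_action(error_code, elapsed_seconds, tg=TG_DEFAULT_SECONDS):
    """Decide what to do after a failed STUN transaction over TCP.

    error_code is None for a timeout; elapsed_seconds is the time spent
    retrying so far.
    """
    if error_code in RETRY_PER_RFC3489:
        return "retry-per-rfc3489"
    if error_code in UNRECOVERABLE:
        return "invalid"
    if error_code is None or error_code == 430:
        # Retry with a new transaction ID and the same USERNAME until Tg.
        return "retry-new-transaction" if elapsed_seconds < tg else "invalid"
    return "unspecified"  # not covered by the text; assumption of this sketch
```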
If the STUN Binding Request generates a successful response, the The procedures on receipt of a Binding Response are nearly identical
connection over which it was sent is considered valid. Furthermore, to those for receipt of a Binding Request as described above.
the agent stores the IP address and port from the MAPPED-ADDRESS
attribute in the STUN Binding Response. This is called the "apparent"
native transport address for the active side of the connection. It
will be used later if this connection is used for media transport.
Once a connection is valid, the agent which initiated the connection When a successful STUN Binding Response is received, it will be
MUST generate a new STUN Binding Request transaction every Tr associated with a matching transport address pair and corresponding
seconds. This transaction ensures that NAT bindings for the candidate pair. This matching is done based on comparison of
connection remain open while the connection is under consideration as candidate IDs. The value of the MAPPED-ADDRESS attribute of the
a candidate. Tr SHOULD be configurable, and SHOULD default to 15 Binding Response are compared to the IP address and port of the
seconds. Each new Binding Request transaction is processed according native transport address in the matching transport address pair.
to the procedures in this section. It is possible for a previously Note that, in this case, we are comparing actual IP addresses and
valid candidate to later be invalidated by a subsequent STUN ports - not tids. These may not match if there was a NAT between the
transaction. This happens in cases where the NAT bindings expire. two agents. If they do not match, the value of the MAPPED-ADDRESS
Note that, unlike the UDP case, STUN is sent only while a connection attribute of the Binding Response are compared to the IP address and
is not active for media. If the connection is used as the active ports across the transport address pairs in *all* native candidates.
connection for media, STUN MUST NOT be sent. If there is still no match, it means that the MAPPED-ADDRESS might
represent another valid remote transport address.
7.4.2.3 Receiving STUN Requests To use it, that address needs to be associated with a candidate. In
this case, however, the candidate isn't signaled through an offer/
answer exchange; it is constructed dynamically from information in
the STUN response. Such a candidate is called a peer-derived
candidate. Like all other candidates, the peer-derived candidate has
a candidate ID. The candidate ID is derived from the candidate IDs
of the matching candidate pair. In particular, the candidate ID is
constructed by concatenating the native candidate ID with the remote
candidate ID (without the colon).
When an agent acted as the passive side of a TCP connection, it will On receipt of a STUN Binding Response whose MAPPED-ADDRESS didn't
receive a STUN Binding Request over that connection. match the transport address in any native candidate, the agent
constructs the candidate ID that represents the peer-derived
candidate, and checks to see if that candidate exists. It may
already exist if it had been constructed as a consequence of a
previous application of this logic on receipt of a Binding Response
for a different transport address pair of the same candidate pair.
If there is not yet a peer derived candidate with that candidate ID,
the agent creates it, and assigns it the newly computed candidate ID.
The priority of the new candidate MUST be set to the priority of the
generating candidate - the native candidate in the matching transport
address pair. Note that, at this time, the peer derived candidate
has no transport addresses in it.
One of the candidates will be in use as the active candidate. For Newly created or not, the agent extracts the component ID from the
the transport addresses comprising that candidate, the agent will matching transport address pair, and sees if a transport address with
receive both STUN requests and media packets on its associated local that same component ID exists in the peer derived candidate. If not
transport addresses. The agent MUST be able to disambiguate them. (and it shouldn't), the agent adds a transport address to the peer-
In the case of RTP/RTCP, this disambiguation is easy. RTP and RTCP derived candidate. This transport address is equal to the MAPPED-
packets start with the bits 0b10 (v=2). The first two bits in STUN ADDRESS from the STUN Binding Response. It is assigned the component
are always 0b00. This disambiguation also works for packets sent ID equal to the component ID in the matching transport address pair.
using Secure RTP [23], since the RTP header is in the clear. This transport address will have a tid, equal to the concatenation of
Disambiguating STUN with other media stream protocols may be more the candidate ID for this new candidate, and the component ID,
complicated. However, it can always be possible with arbitrarily separated by a colon.
high probabilities by selecting an appropriately random username (see
below).
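The RTP/RTCP versus STUN check described above depends only on the top two bits of the first octet. A minimal sketch, with a hypothetical function name and return labels:

```python
def classify_packet(packet: bytes) -> str:
    # Inspect the two most significant bits of the first octet.
    if not packet:
        return "unknown"
    top_two_bits = packet[0] >> 6
    if top_two_bits == 0b10:
        return "rtp-or-rtcp"  # RTP/RTCP version 2
    if top_two_bits == 0b00:
        return "stun"
    return "unknown"


rtp_kind = classify_packet(bytes([0x80, 0x00]))
stun_kind = classify_packet(bytes([0x00, 0x01]))
```

Any other bit pattern is reported as unknown here; how to treat such packets is outside what the text above specifies.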
The STUN Binding Request can only be usefully processed once an The peer-derived candidate becomes usable once the number of
offer/answer exchange has completed. As a result, if an offeror transport addresses in it equals the transport address pair count of
receives a STUN Binding Request message prior to the receipt of an candidate pair from which it is derived. Initially, the peer-derived
answer to its offer, it MUST reject the request with a 430 response. candidate will start with a single transport address. More are added
This will cause the answerer to retry, and give time for the answer as the connectivity checks for the original candidate pair take
(which is in transit) to arrive at the offerer. place. Once the peer-derived candidate becomes usable, it has to be
paired up with remote candidates. However, unlike the procedures of
Section 7.5, which pair up each remote candidate with each native
candidate, the peer-derived candidate is only paired up with the
remote candidate from the matching candidate pair . This creates a
new candidate pair, and a set of new transport address pairs.
If the offer/answer exchange has completed, the agent MUST follow the Recall that, for each candidate pair, one agent plays the role of
procedures defined in RFC 3489 and verify that the USERNAME attribute offerer, and the other of answerer. For peer-derived candidates, the
is known to the server. Here, this is done by taking the USERNAME agent that receives the STUN request and follows the processing in
attribute, and comparing it against the transport address pair this section acts as the answerer.
identifiers for each transport address pair as seen by that agent.
If there is no match, the STUN Binding Request generates a 400. If
there is a match, the resulting transport address pair is called the
matching transport address pair. The user agent proceeds with the
processing of the request and generation of a response as per RFC
3489. In addition, the agent stores the source IP address and port
of the Binding Request, and associates it with the connection. This
address is called the "apparent" remote transport address for this
connection.
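The checks applied to an incoming Binding Request over TCP can be sketched as below. The function name, parameters, and returned labels are hypothetical; the sketch covers only the 430 and 400 rejections described in this section and leaves the RFC 3489 processing itself out.

```python
def check_binding_request(username, known_pair_ids, is_offerer, answer_received):
    # An offerer that has not yet received the answer rejects with 430,
    # which causes the peer to retry.
    if is_offerer and not answer_received:
        return "reject-430"
    # The USERNAME must equal a transport address pair ID as seen locally.
    if username not in known_pair_ids:
        return "reject-400"
    # Otherwise this is the matching transport address pair.
    return "process"


early = check_binding_request("YW", {"YW", "ZX"}, is_offerer=True, answer_received=False)
unknown = check_binding_request("QQ", {"YW", "ZX"}, is_offerer=True, answer_received=True)
matched = check_binding_request("YW", {"YW", "ZX"}, is_offerer=False, answer_received=True)
```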
An agent will continue to receive periodic STUN transactions as long The new transport address pairs have a state machine associated with
as it had listed its transport address in an a=candidate attribute. them. The state that is entered, and actions to take as a
It MUST process those transactions according to this section. It is consequence, are specific to the transport protocol. For UDP, the
possible that a transport address pair that was previously valid may procedures are defined here. Extensions that define processing for
become invalidated as a result of a subsequent failed STUN other transport protocols SHOULD describe the behavior.
transaction.
Note that, unlike the UDP case, there will never be simultaneous For UDP, the state machine enters the Recv-Valid state. Effectively,
transmission of media and STUN packets over TCP connections. This is the Binding Response just received "counts" as a validation in this
because the connection is listed as on hold according to comedia direction, even though it was formally done for a different candidate
procedures, and no media will be transmitted. ICE will establish the pair. The transport address pairs are inserted into the ordered list
connections as described here. Once established, an updated offer/ of pairs based on the ordering described in Section 7.5, and
answer exchange can promote those connections to active usage through processing follows the logic described in Section 7.6.
the comedia "exist" mechanism, as described below. The additional
offer/answer exchange provides a barrier synchronization point at
which a TCP connection switches from ICE control to control by the
media source and sinks. Once it is active, STUN packets will no
longer be sent on the connection.
7.5 Promoting a Valid Candidate to Active 7.11 Subsequent Offer/Answer Exchanges
7.5.1 Minimum Requirements An agent MAY issue an updated offer at any time. This updated offer
may be sent for reasons having nothing to do with ICE processing (for
example, the addition of a video stream in a multimedia session), or
it may be due to a change in ICE-related parameters. For example, if
an agent acquires a new candidate after the initial offer/answer
exchange, it may seek to add it.
As the STUN connectivity checks run, they will result in the However, agents SHOULD follow the logic described in Section 7.9 to
validation of pairings. Once validated, a pairing can be used by determine when to send an updated offer as a consequence of promoting
promoting it to active. This promotion occurs by placing the a candidate to active.
transport addresses for the native candidate of the pairing into the
m/c line and sending an updated offer. It MAY promote a candidate
associated with any validated pairing at any time, as long as the
candidate had been provided in a series of a=candidate attributes in
the most recent offer (in other words, an agent can't validate a
candidate, omit that candidate from the a=candidate attribute of an
offer, and then later on, generate a new offer that promotes the
candidate to active). The procedures for doing so are described
here.
Any candidates which the agent would like to retain as valid If there are any aspects of this processing that are specific to the
candidates are also included in a=candidate lines in the offer. It transport protocol, those SHOULD be called out in ICE extensions that
SHOULD include any candidates learned from the peer-to-peer discovery define operation with other transport protocols. There are no
processing of Section 7.4.1.3, and SHOULD include any candidates of additional considerations for UDP.
higher priority than the one just promoted to active. It SHOULD omit
candidates of lower priority than the one being promoted to active.
It SHOULD omit any candidate for which all pairings that include that candidate
have become invalid.
If a candidate is omitted, and that candidate was a TURN-derived 7.11.1 Sending of a Subsequent Offer
transport address, the agent SHOULD de-allocate the address from the
TURN server. If a local candidate was omitted, along with all of its
derived transport addresses, local operating system resources for
that candidate SHOULD be de-allocated.
Once it has decided on the set of candidates to provide in the The offer MAY contain a new active candidate in the m/c line. This
updated offer, the agent constructs the offer and follows the candidate SHOULD be the native candidate from the highest candidate
procedures in Section 7.6 which defines general subsequent offer/ pair in the candidate pair priority ordered list whose state is
answer processing. valid. If there are no candidate pairs in this state, the highest
one whose state is partially valid SHOULD be used. If there are no
candidate pairs in this state, the candidate pair that is most likely
to work with this peer, as described in Section 7.2, SHOULD be used.
The candidate is encoded into the m/c line in an updated offer as
described in Section 7.3.
7.5.2 Suggested Algorithm If the candidate pair whose native candidate was encoded into the
m/c-line was valid or partially valid, the agent MUST include an
a=remote-candidate attribute into the offer. This attribute MUST
contain the candidate ID of the remote candidate in the candidate
pair. It is used by the recipient of the offer in selecting its
candidate for the answer.
ICE leaves substantial variability to implementors around when an The meaning of a=candidate attributes within a subsequent offer have
agent decides to generate a new offer. However, there are good ways the same meaning as they do in an initial offer. They are a request
to do this, and bad ways. Perhaps the worst algorithm possible would for the peer to attempt (or continue to attempt if the candidate was
be to generate a new offer every time a candidate with higher provided previously) a connectivity check using STUN from each of its
priority than the active one becomes valid. This algorithm will own candidates. When an updated offer is sent, there are several
likely result in a large number of offer/answer exchanges in rapid dispositions regarding the candidates:
succession, many of which will produce "glare" as each agent will
independently initiate an exchange. This will consume CPU and
network resources for little benefit. Rather, the ideal algorithm
strikes a balance between usage of network resources and the desire
to use the ideal pair of candidates.
The following algorithm provides a good tradeoff, and usage of this retained: A candidate is retained if the candidate ID for the
algorithm is RECOMMENDED. The algorithm results in a bounded number candidate is included in the new offer, and matches the candidate
of additional offer/answer exchanges after the initial one - never ID for a candidate in the previous offer or answer. In this case,
more than two, and frequently one or zero. The algorithm almost all of the information about the candidate - its qvalue and
never produces a glare condition. components, and the IP addresses, ports, STUN passwords and
transport protocols of its components, MUST be the same as the
previous offer or answer from the agent. If the agent wants to
change them, this is accomplished by changing the candidate ID as
well. That will have the effect of removing the old candidate and
adding a new one with the updated information.
Once the initial offer/answer exchange completes, media flow will removed: A candidate is removed if its candidate ID appeared in a
happen, though not optimally (where optimal is defined by the previous offer or answer, and that candidate ID is not present in
policies used to set the priorities of the candidates), as long as the new offer.
the candidate that is active has been validated. Thus, the objective
of the algorithm is to quickly make sure that there is a valid path
for media (to avoid clipping), and then do a single offer/answer
exchange to use the highest priority pairing that was validated.
After the initial offer/answer exchange, each agent sets a timer Tu. added: A candidate is added if its candidate ID appeared in the new
This timer SHOULD have a configurable baseline value, which SHOULD offer, but was not present in a previous offer or answer from that
default to 3 seconds. The actual timer is set to this baseline, plus agent.
a time value chosen uniformly between -1 and 1 seconds. This causes
the actual timer to be randomized so that the timer doesn't fire
simultaneously at each agent. In addition, each agent monitors the
status of the active pairing. If the active media stream is UDP-
based, the status of the active candidates is equal to the status of
the pairing with matching transport addresses. In the case of TCP-
based media, the active media stream is never active initially, since
it always begins with the "holdconn" state.
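The retained/removed/added dispositions defined in the newer text above reduce to set operations on candidate IDs. The following is a minimal sketch, not part of the specification; the function name and the use of plain strings for candidate IDs are assumptions made for illustration.

```python
def classify_candidates(previous_ids, new_offer_ids):
    """Classify candidate IDs by comparing the previous offer/answer
    from this agent with the new offer (illustrative helper only)."""
    previous = set(previous_ids)
    current = set(new_offer_ids)
    return {
        # Present before and present now: all candidate details must be unchanged.
        "retained": sorted(previous & current),
        # Present before, absent now.
        "removed": sorted(previous - current),
        # Absent before, present now.
        "added": sorted(current - previous),
    }


if __name__ == "__main__":
    print(classify_candidates(["c1", "c2"], ["c2", "c3"]))
```

Under this model, changing the details of a candidate means issuing a new candidate ID, so the old ID shows up as removed and the new one as added.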
If, when Tu fires, the active pairing has not been validated, and The following rules are used to determine the disposition of each
there exists at least one pairing that has been validated, the agent of the current native candidates in the new offer:
generates a new offer. This offer promotes its highest priority
candidate with a validated pairing to the active candidate. If there
are no pairings that have been validated when the timer fires, the
agent waits until one is validated, and once that happens, sets a
timer to fire randomly between 0 and 2 seconds. When the timer
fires, a new offer is generated that promotes the candidate from this
validating pairing to active. If the active pairing is validated
when the timer fires, the agent does nothing at this time.
If new offer is to be sent, the agent includes the new active o If a candidate is invalid, and all peer-derived candidates
candidate in the a=candidate attribute list. It also includes all generated from it are invalid as well, it SHOULD be removed.
candidates with higher priority than the one that is active,
including ones it learned from the connectivity checks themselves.
At this point, media is flowing successfully, since a valid candidate o If the candidate in the m/c-line is valid, all other candidates
is active. However, it may not be optimal. So, the next stage of SHOULD be removed. This has the effect of stopping connectivity
the algorithm is to let the connectivity checks continue. If those checks of other candidates. This SHOULD would not be followed if
checks indicate that a pairing between the two highest priority an agent wanted to keep a candidate ready for usage should, for
candidates from both agents has been validated, each agent sets a some reason, the active candidate later become invalid.
timer whose value is randomly set between 0 and 2 seconds. When the
timer fires, a new offer is generated that promotes the candidate
from this validating pairing to active. Otherwise, when the
connectivity checks have all concluded, such that no pairing exists
in the invalid state, each agent sets a timer whose value is randomly
set between 0 and 2 seconds. When the timer fires, a new offer is
generated that promotes the candidate from the valid pairing with the
highest priority to active.
7.6 Subsequent Offer/Answer Exchanges o If the candidate in the m/c-line is valid, and it is not peer-
derived, that candidate MUST be retained. If the candidate in the
m/c-line is peer-derived, its generating candidate MUST be
retained, even if it is itself invalid.
An offer/answer exchange within a session can occur at any time, o If the candidate in the m/c-line has not been validated, all other
whether it is the result of the algorithm described in Section 7.5.2, candidates that are not invalid, or candidates for whom their
or because one of the agents wishes to add or remove a media stream, derived candidates are not invalid, SHOULD be retained.
or add a codec, and so on.
7.6.1 Sending of an Offer o Peer derived candidates MUST NOT be added; they continue to be
used as long as their generating candidate was retained. Peer
derived candidates are learned exclusively through the STUN
connectivity checks.
The meaning of a=candidate attributes within a subsequent offer have A new candidate MAY be added. This can happen when the candidate is
the same meaning they do in an initial offer. They are a request for a new one, learned since the previous offer/answer exchange, and it
the peer to attempt (or continue to attempt if the candidate was has a higher priority than the currently active candidate. It can
provided previously) a connectivity check using STUN from each of its also occur when an agent wishes to restart checks for a transport
own candidates. As such, an a=candidate attribute is included in address it had tried previously. Effectively, changing the candidate
subsequent offers when (1) connectivity checks haven't concluded yet ID value in an updated offer will "restart" connectivity checks for
to that candidate, or (2) the checks have concluded, and the that candidate.
candidate is currently active. In that case, STUN is used to keep
the bindings active.
If an agent sends an offer which omits candidates it had sent to its If a candidate is removed, the agent takes the following steps:
peer previously, it MUST cease connectivity checks from that
candidate. Any pairings that include the absent native candidate are
discarded. Any STUN transactions in progress from that candidate are
immediately terminated - no further retransmissions take place, and
no further transactions from that candidate will be made. If a TCP
connection was opened to or from that candidate, and that connection
is not listed as the active one in the offer, the connection is torn
down.
The offer MAY contain a new active candidate in the m/c line. If the 1. The agent eliminates any candidate pairs whose native candidate
new active transprot address is UDP, candidate is encoded into an equalled the candidate that was removed. Equality is based on
update offer as described in Section 7.2. The transport addresses comparison of candidate IDs.
constituting the candidate SHOULD also be listed in a=candidate
attributes, so that STUN can be used as an ongoing keepalive.
If the new active transport address is TCP, it is more complicated. 2. The agent eliminates any candidate pairs that had a native
Recall that each TCP connection is opened from one of the agents to candidate that is a peer derived candidate generated from the
the other, such that, for each connection, one agent has the active candidate that was removed.
role, and the other, the passive. The ICE mechanisms allow the
active agent to actually choose a specific connection for use in an
offer, so long as the agent has used a different ephemeral port for
each connection it initiated (which is almost always the case). If,
however, an agent was in the passive role, it cannot choose a
specific connection. Rather, it can choose a specific native
transport address which may have been used to receive multiple
connections. This assymetric behavior brings with it some important
security properties, which are discussed in Section 12.
If the agent was the active one and established the connection, it 3. The candidate pairs that are eliminated are removed from the
includes its apparent native transport address in the m/c line of the candidate pair priority ordered list and candidate pair check
SDP (recall that this address was discovered via the STUN exchange ordered list. As a consequence of this, if connectivity checks
over the connection). Note that this is instead of the SHOULD- had not yet begun for the candidate pair, they won't.
strength recommendation in comedia, which recommends that the port
number sent by the entity which initiated the connection should be
'9'. The actual port number is present to facilitate identification
of the connection. The a=setup attribute MUST be present and MUST
contain the value "active". The a=connection attribute MUST be
present and MUST have the value of "existing".
If the agent was the passive one and was the recipient of the 4. If connectivity checks were already in progress for transport
connection, it includes its transport address in the m/c line of the addresses in that candidate pair, the agent SHOULD immediately
SDP. In this case, that address will be the same as the one it had terminate them. No further retransmissions take place, and no
placed into the a=candidate line of the SDP. The a=setup attribute further transactions from that candidate will be made.
MUST be present and MUST contain the value of "passive". The
a=connection attribute MUST be present and MUST have the value of
"existing".
7.6.2 Receiving the Offer and Sending an Answer 5. If the removed candidate was a TURN-derived candidate, the agent
SHOULD de-allocate its transport addresses from the TURN server.
If a local candidate was removed, and all of its derived
candidates were also removed (including any peer-derived
candidates), local operating system resources for each of the
transport addresses in the local candidate SHOULD be de-
allocated.
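Steps 1 through 3 of the removal procedure in the newer text can be sketched as a filter over the candidate pair list. This is an illustration under stated assumptions: the CandidatePair type, its field names, and the native_parent_id field (the candidate ID of the generating candidate for a peer-derived native candidate) are hypothetical and not defined by the draft.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CandidatePair:
    native_id: str
    remote_id: str
    # Hypothetical field: ID of the generating candidate when the native
    # candidate is peer-derived, otherwise None.
    native_parent_id: Optional[str] = None


def eliminate_pairs(
    pairs: List[CandidatePair], removed_id: str
) -> Tuple[List[CandidatePair], List[CandidatePair]]:
    """Split pairs into (kept, eliminated) after a native candidate is removed.
    Equality is by candidate ID, as in step 1."""
    kept, eliminated = [], []
    for pair in pairs:
        if pair.native_id == removed_id or pair.native_parent_id == removed_id:
            eliminated.append(pair)  # steps 1 and 2
        else:
            kept.append(pair)
    return kept, eliminated


if __name__ == "__main__":
    pairs = [
        CandidatePair("n1", "r1"),
        CandidatePair("p1", "r1", native_parent_id="n1"),
        CandidatePair("n2", "r1"),
    ]
    kept, gone = eliminate_pairs(pairs, "n1")
    print([p.native_id for p in kept], [p.native_id for p in gone])
```

The kept list would replace both ordered lists (step 3); terminating in-progress STUN transactions and releasing TURN or operating system resources (steps 4 and 5) are outside this sketch.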
If an agent receives an updated offer with a=candidate attributes, it 7.11.2 Receiving the Offer and Sending an Answer
checks to see if it already knows about the listed candidates. This
is done by comparing the tid with the candidates it had received in
the previous offer or answer from the peer. If the tid is already
known, processing for that candidate continues as if no offer had
been made. Any connectivity checks in progress continue, and any
ongoing STUN keepalives continue.
If a candidate which had been listed previously is no longer present To generate the answer, the answerer has to decide which transport
in the offer, this tells the answerer to cease connectivity checks. addresses to include in the m/c line, and which to include in
Any pairings that include the absent remote candidate are discarded. candidate attributes.
Any STUN transactions in progress to that candidate are immediately
terminated - no further retransmissions take place, and no further
transactions to that candidate will be made. If a TCP connection was
opened to or from that candidate, and that connection is not listed
as the active one in the offer, the connection is torn down.
The agent then sends its answer. Like the offerer, it can add or Rules for choosing transport addresses for the m/c-line are as
remove candidates from its answer. If it removed candidates from its follows. The agent examines the transport addresses in the m/c-line
answer, it ceases STUN connectivity checks from those candidates, and of the offer. It compares these with the transport addresses in the
any pairings that include those candidates are discarded. Any STUN remote candidates of candidate pairs whose states are Valid. If
transactions in progress to that candidate are immediately terminated there is a matching candidate pair in that state, the agent MUST pick
- no further retransmissions take place, and no further transactions the native candidate from one of those pairs, and use that candidate
to that candidate will be made. If a TCP connection was opened to or as the active one. If none of the matching pairs are in the Valid
from that candidate, and that connection is not listed as the active state, the agent checks if there are any matching pairs in the Send-
one in the answer, the connection is torn down. Valid state. If there are, the agent looks for the a=remote-
candidate attribute in the offer. If present, and the candidate ID
listed there is one of the native candidate IDs amongst the matching
pairs, that candidate ID MUST be used as the active one. If the
a=remote-candidate attribute was not present in the offer, or there
were no matching candidate pairs in the Send-Valid state, the
candidate that is most likely to work with this peer, as described in
Section 7.2, SHOULD be used.
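The answerer's choice of active candidate in the newer text can be sketched as follows. This is only an illustration: the dictionary keys, the representation of a transport address as a single (host, port) tuple, and the fallback_id argument (standing in for the candidate most likely to work per Section 7.2) are assumptions. The case where a=remote-candidate is present but names no matching native candidate is not spelled out in the text; the sketch falls back in that case as well.

```python
def choose_active_native(pairs, offer_mc_addr, remote_candidate_id, fallback_id):
    """Return the native candidate ID to place in the answer's m/c-line.

    pairs: list of dicts with keys "native_id", "remote_addr", "state".
    offer_mc_addr: transport address from the m/c-line of the offer.
    remote_candidate_id: value of a=remote-candidate in the offer, or None.
    """
    matching = [p for p in pairs if p["remote_addr"] == offer_mc_addr]

    # A matching pair in the Valid state takes precedence.
    valid = [p for p in matching if p["state"] == "Valid"]
    if valid:
        return valid[0]["native_id"]

    # Otherwise consult a=remote-candidate against Send-Valid pairs.
    send_valid_ids = {p["native_id"] for p in matching if p["state"] == "Send-Valid"}
    if remote_candidate_id is not None and remote_candidate_id in send_valid_ids:
        return remote_candidate_id

    # No usable match: fall back to the locally preferred candidate.
    return fallback_id


if __name__ == "__main__":
    addr = ("192.0.2.1", 5000)
    pairs = [{"native_id": "n1", "remote_addr": addr, "state": "Send-Valid"}]
    print(choose_active_native(pairs, addr, "n1", "n9"))
```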
After transmission of the answer, there may be a set of candidates The a=remote-candidate exists to eliminate a race condition between
which were new in the offer, and a set that were new in the answer. the updated offer and the response to the STUN Binding Request that
The agent begins connectivity checks as described in Section 7.4, moved a candidate into the valid state. If the answer arrives at the
pairing each new candidate in its answer with all candidates in the agent prior to the Binding Response, the candidate pair that was
offer, and each new candidate in the offer with all of its candidates validated by the offer will still be in the Send-Valid state. To
in the answer. eliminate this condition, the identity of the validated candidate is
included in the offer itself.
The m/c line may have also changed, indicating a new active Like the offerer, the answerer can decide, for each of its candidates,
candidate. If the m/c line contains a UDP stream, the agent begins whether they are retained or removed. The same rules defined in
sending media to the transport addresses listed there. In addition, Section 7.11.1 for determining their disposition apply to the
it checks to see if those transport addresses correspond to a remote answerer. Similarly, if a candidate is removed, the same rules in
candidate in a valid pairing. So long as the remote agent has Section 7.11.1 regarding removal of candidate pairs and freeing of
offered up a candidate that has been validated by ICE, it should be resources apply.
the case. Indeed, there may be a multitude of valid pairings
containing the transport addresses in the m/c line as the remote
candidate. In that case, the agent MUST choose the pairing whose
native candidate has the highest priority. It MUST place this
candidate in the m/c line. Transmission of media occurs as defined
in Section 7.8.
If the m/c line has changed, and now indicates a new TCP candidate, Once the answer is sent, the answerer will have the set of native and
the agent examines it. The comedia "a=connection" attribute will remote candidates before this offer/answer exchange, and the set of
normally be present and normally contain the value of "existing". If native and remote candidates afterwards. The agent then pairs up the
not present, or if present but with a value of "new", comedia process native and remote candidates which were added or retained.
is followed, as apparently the peer has abandoned ICE operation for Furthermore, for candidate pairs containing a peer derived transport
this media stream. Assuming it contains a value of "existing", the address, those pairs continue as long as both candidates are
agent looks at whether the a=setup attribute is present. If its retained. A peer derived candidate continues to be used as long as
value is "active", it means that a connection that was initiated by its generating parent continues to be used. This leads to a set of
the remote agent is to be used. The agent examines the transport current candidate pairs.
address in the m/c line. It looks for a matching value in the
apparent remote transport addresses of existing connections. If it
matches multiple connections (though it should normally match just
one), one of those connections is chosen. The native transport
address of that connection is then placed into the m/c line of the
answer. If no existing connections where matched, an error has
occured. The agent SHOULD respond with "holdconn", and then generate
its own offer with a connection to the peer which it believes is
valid.
If the a=setup attribute had a value of "passive", it means that a If a candidate pair existed previously, but as a consequence of the
connection that was initiated by the agent itself is to be used. The offer/answer exchange, either its native or remote candidate has been
agent examines the transport address in the m/c line. It looks for a removed, the agent takes the following steps:
matching value amongst the remote transport addresses in valid
pairings. If multiple pairings match, it MUST choose the one whose
native transport address has the highest priority. The apparent
native transport address associated with an active connection
initiated by the agent is then placed into the m/c line, and that TCP
connection is used to send and receive media. If no pairings match,
an error has occured. The agent SHOULD respond with "holdconn", and
then generate its own offer with a connection to the peer which it
believes is valid.
7.6.3 Receiving the Answer 1. The candidate pair is removed from the candidate pair priority
ordered list and candidate pair check ordered list. As a
consequence of this, if connectivity checks had not yet begun for
the candidate pair, they won't.
If an agent receives an answer with a=candidate attributes, it checks 2. If connectivity checks were already in progress for that
to see if it already knows about the listed candidates. This is done candidate pair, the agent SHOULD immediately terminate any STUN
by comparing the tid with the candidates it had received in the transactions in progress from that candidate. No further
previous offer or answer from the peer. If the tid is already known, retransmissions take place, and no further transactions from that
processing for that candidate continues as if no offer had been made. candidate will be made.
Any connectivity checks in progress continue, and any ongoing STUN
keepalives continue.
If a candidate which had been listed previously is no longer present 3. If the agent receives a STUN Binding Request for that candidate
in the answer, this tells the offerer to cease connectivity checks. pair, the agent SHOULD generate a 430 response.
Any pairings that include the absent remote candidate are discarded.
Any STUN transactions in progress to that candidate are immediately
terminated - no further retransmissions take place, and no further
transactions to that candidate will be made. If a TCP connection was
opened to or from that candidate, and that connection is not listed
as the active one in the answer, the connection is torn down.
Furthermore, there may be a set of candidates which were new in the If a candidate pair existed previously, and continues to exist, no
offer, and a set that were new in the answer. The agent begins changes are made; any STUN transactions in progress for that
connectivity checks as described in Section 7.4, pairing each new candidate pair continue, and it remains on the candidate pair
candidate in its offer with all candidates in the answer, and each priority ordered list and candidate pair check ordered list.
new candidate in the answer with all of its candidates in the offer.
The m/c line may have also changed, indicating a new active If a candidate pair is new (because either its native candidate is
candidate. If the m/c line contains a UDP stream, the agent begins new, or its remote candidate is new, or both), the agent takes the
sending media to the transport addresses listed there as defined in role of answerer for this candidate pair. The new candidate pair is
Section 7.8. It will send from the m/c line it had signaled in the inserted into the candidate pair priority ordered list and candidate
offer. pair check ordered list. STUN connectivity checks will start for
them based on the logic described in Section 7.6.
If the m/c line has changed, and now indicates a new TCP candidate, 7.11.3 Receiving the Answer
the agent examines it. If the agent had, in its offer, indicated the
desire to use a specific connection that it had initiated, it would
have used the a=connection attribute with the value of "existing",
and the a=setup attribute with the value of "active", and have placed
its apparent native transport address in the m/c line. In that case,
the m/c line in the answer will normally have the a=connection
attribute with the value "existing", which means that the remote
agent agrees with the usage of that connection. The transport
addresses in the m/c line should correspond to the remote transport
addresses that the agent had initiated its connection to. If so,
that connection is used.
If the agent had, in its offer, indicated the desire to use any Once the answer is received, the offerer will have the set of native
connection that had been established to a specific native transport and remote candidates before this offer/answer exchange, and the set
address, it would have, in its offer, used the a=connection attribute of native and remote candidates afterwards. It then follows the same
with the value of "existing" and the a=setup attribute with the value logic described in Section 7.11.2, pairing up the candidate pairs,
of "passive", and placed that address in the m/c line. In that case, removing ones that are no longer in use, and beginning of processing
the m/c line in the answer will normally have the a=connection for ones that are new.
attribute with the value of "existing" and the a=setup attribute with
the value of "active". The transport address in the m/c line will
correspond to the apparent remote transport address. The agent MUST
scan its existing connections to the native transport address it had
advertised in the offer, and find the one whose apparent remote
transport address matches the m/c line in the answer. If there is a
match, that connection is used for sending media. If there is no
match, an error has occurred.
7.7 Binding Keepalives 7.12 Binding Keepalives
Once the candidates are promoted to active, and media begins flowing, Once a candidate is promoted to active, and media begins flowing, it
it is still necessary to keep the bindings alive at intermediate NATs is still necessary to keep the bindings alive at intermediate NATs
for the duration of the session. Normally, the RTP packets for the duration of the session. Normally, the media stream packets
themselves meet this objective. However, several cases merit further themselves (e.g., RTP) meet this objective. However, several cases
discussion. Firstly, in some RTP usages, such as SIP, the media merit further discussion. Firstly, in some RTP usages, such as SIP,
streams can be "put on hold". This is accomplished by using the SDP the media streams can be "put on hold". This is accomplished by
"sendonly" or "inactive" attributes, as defined in RFC 3264 [4]. RFC using the SDP "sendonly" or "inactive" attributes, as defined in RFC
3264 directs implementations to cease transmission of media in these 3264 [5]. RFC 3264 directs implementations to cease transmission of
cases. However, doing so may cause NAT bindings to time out, and media in these cases. However, doing so may cause NAT bindings to
media won't be able to come off hold. time out, and media won't be able to come off hold.
Secondly, some RTP payload formats, such as the payload format for Secondly, some RTP payload formats, such as the payload format for
text conversation [28], may send packets so infrequently that the text conversation [34], may send packets so infrequently that the
interval exceeds the NAT binding timeouts. interval exceeds the NAT binding timeouts.
Thirdly, if silence suppression is in use, long periods of silence Thirdly, if silence suppression is in use, long periods of silence
may cause media transmission to cease sufficiently long for NAT may cause media transmission to cease sufficiently long for NAT
bindings to time out. bindings to time out.
To prevent these problems, ICE implementations MUST continue to list To prevent these problems, ICE implementations MUST continue to list
their active transport addresses as candidates in a=candidate lines. their active transport addresses in a=candidate lines for UDP-based
As a consequence of this, STUN packets will be transmitted media streams. As a consequence of this, STUN packets will be
periodically independently of the transmission (or lack thereof) of transmitted periodically independently of the transmission (or lack
media packets. This provides a media independent, RTP independent, thereof) of media packets. This provides a media independent, RTP
and codec independent solution for keeping the NAT bindings alive. independent, and codec independent solution for keeping the NAT
bindings alive. STUN Binding Requests cannot be used for TCP-based
transports because the media protocol may not provide framing
services to support this. As such, application layer keepalives MUST
be used in this case.
If an ICE implementation is communicating with one that does not If an ICE implementation is communicating with one that does not
support ICE, keepalives MUST still be sent. In that case, it is support ICE, keepalives MUST still be sent. Indeed, these keepalives
RECOMMENDED that an agent support the RTP No-Op payload format [15], are essential even if neither endpoint implements ICE. As such, this
and send it at least once every 20 seconds if media is not otherwise specification defines keepalive behavior generally, for endpoints
being sent. This No-Op MUST be sent even if the media stream is that support ICE, and those that do not.
inactive or recvonly.
7.8 Sending Media All endpoints MUST send keepalives for each media session. These
keepalives MUST be sent regardless of whether the media stream is
currently inactive, sendonly, recvonly or sendrecv. The keepalive
SHOULD be sent using a format which is supported by its peer. ICE
endpoints allow for STUN-based keepalives for UDP streams, and as
such, STUN keepalives MUST be used when an agent is communicating
with a peer that supports ICE. An agent can determine that its peer
supports ICE by the presence of the a=candidate attributes for each
media session. If the peer does not support ICE, the choice of a
packet format for keepalives is a matter of local implementation. A
format which allows packets to easily be sent in the absence of
actual media content is RECOMMENDED. Examples of formats which
readily meet this goal are RTP No-Op [29] and RTP comfort noise [27].
STUN-based keepalives will be sent periodically every Tr seconds as a
consequence of the rules in Section 7.7. If STUN keepalives are
not in use (because the peer does not support ICE or because of TCP),
an agent SHOULD ensure that a media packet is sent every Tr seconds.
If one is not sent as a consequence of normal media communications, a
keepalive packet using one of the formats discussed above SHOULD be
sent.
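The keepalive rules above can be summarized in a small decision sketch. It is illustrative only: the function names and the string labels it returns are assumptions, and the value of Tr is supplied by the caller because this section does not define it.

```python
def keepalive_method(peer_supports_ice, transport):
    """Pick a keepalive mechanism for one media stream (illustrative labels).

    peer_supports_ice: True if the peer's SDP carried a=candidate attributes
    for the media session.
    transport: "UDP" or "TCP".
    """
    if transport == "TCP":
        # STUN Binding Requests cannot be used on TCP-based media streams.
        return "application-layer"
    if peer_supports_ice:
        return "stun"
    # Non-ICE peer: a media format that can be sent without real content,
    # for example RTP No-Op or RTP comfort noise.
    return "media-format"


def keepalive_due(seconds_since_last_packet, tr_seconds):
    """True when nothing has been sent for Tr seconds and a keepalive is needed."""
    return seconds_since_last_packet >= tr_seconds


if __name__ == "__main__":
    print(keepalive_method(True, "UDP"), keepalive_due(21.0, 20.0))
```

The check applies regardless of stream direction, since keepalives are required for inactive, sendonly, recvonly and sendrecv streams alike.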
7.13 Sending Media
An agent MUST NOT send media packets until the active candidate has
entered either the Valid or Recv-Valid state. This is to prevent a
particularly destructive denial-of-service attack described in
Section 13.4.1.
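The gating rule above amounts to a single state check before any media packet is sent. A minimal sketch, assuming the pair state is tracked as a string using the state names from the text (the constant and function names are hypothetical):

```python
# States in which the active candidate may carry media, per the rule above.
MEDIA_PERMITTED_STATES = frozenset({"Valid", "Recv-Valid"})


def may_send_media(active_candidate_state):
    """Return True only if the active candidate has been sufficiently validated."""
    return active_candidate_state in MEDIA_PERMITTED_STATES


if __name__ == "__main__":
    for state in ("Valid", "Recv-Valid", "Send-Valid"):
        print(state, may_send_media(state))
```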
It is important to note that an agent always sends media to the
address in the m/c-line, not to a validated candidate. To use a
candidate, it must be promoted to the m/c-line through an updated
offer/answer exchange.
When an agent sends media packets, it MUST send them from the same IP When an agent sends media packets, it MUST send them from the same IP
address and port it has advertised in the m/c-line. This provides a address and port it has advertised in the m/c-line. This provides a
property known as symmetry, which is an essential facet of NAT property known as symmetry, which is an essential facet of NAT
travresal. traversal.
In the case of a STUN-derived transport address, this means that the In the case of a STUN-derived transport address, this means that the
RTP packets are sent from the local transport address used to obtain RTP packets are sent from the local transport address used to obtain
the STUN address. In the case of a TURN-derived transport address, the STUN address. In the case of a TURN-derived transport address,
this means that media packets are sent through the TURN server (using this means that media packets are sent through the TURN server (using
the TURN SEND primitive). For local transport addresses, media is the TURN SEND primitive). For local transport addresses, media is
sent from that local transport address. sent from that local transport address.
This symmetric behavior MUST be followed by an agent even if its peer This symmetric behavior MUST be followed by an agent even if its peer
in the session doesn't support ICE. in the session doesn't support ICE.
8. Interactions with Forking 8. Guidelines for Usage with SIP
SIP [3] makes use of the offer/answer model, and is one of the
primary targets for usage of ICE. SIP allows for offer/answer
exchanges to occur in many different combinations of messages,
including INVITE/200 OK and 200 OK/ACK. When support for reliable
provisional responses (RFC 3262 [13]) and UPDATE (RFC 3311 [28]) is
added, additional combinations of messages that can be used for
offer/answer exchanges are added. As such, this section provides
some guidance on good ways to make use of SIP with ICE.
ICE requires a series of STUN-based connectivity checks to take place
between endpoints, along with an updated offer/answer exchange to use
a validated candidate. These exchanges require time to complete. If
the initial offer/answer exchange were to take place in the INVITE
and 200 OK response respectively, the connectivity checks and updated
offer would all occur after the called party answered. This will
result in a potential increase in the post-pickup delay. This delay
refers to the time between when a user "answers the phone" and when
any speech they utter can be delivered to the caller.
To eliminate any increase in post-pickup delay due to ICE, it is
RECOMMENDED that the initial offer/answer exchange take place in an
INVITE and a 18x provisional response. As a consequence, support for
RFC 3262 is RECOMMENDED with ICE. The STUN connectivity checks will
then take place while the called party is being "rung". To deliver
the updated offer prior to the user answering the call, it is
RECOMMENDED that it be delivered with an UPDATE request. This will
allow ICE to have completed prior to the called party even answering
the session invitation.
If RFC 3262 and RFC 3311 are not supported by both agents, tuning can
still take place to reduce post-pickup delays. In particular, the
answerer SHOULD include its answer in an unreliable 18x response.
RFC 3261 requires that the same answer also be placed in a 200 OK,
which is delivered reliably. However, placing it in a 18x gives the
offerer an early preview of the answer, and allows the connectivity
checks to all occur prior to the user answering the call. However,
the updated offer with the highest priority valid candidate promoted
to the m/c-line cannot occur until after the 200 OK, in which case it
SHOULD be done with a re-INVITE. Fortunately, if the active
candidates in the initial offer/answer exchange end up being valid
anyway, media can flow as soon as the user answers the call (or even
beforehand, if early media is needed). The additional offer/answer
exchange in the re-INVITE would merely improve the situation by using
a higher priority candidate pair.
One of the difficulties in including the answer in the 18x, and then
using it for connectivity checks, is that the 18x might be lost. In
such a case, the STUN connectivity check from the answerer to the
offerer (UAS to UAC) will pend indefinitely. To prevent this, it is
RECOMMENDED that a SIP UA retransmit its 18x periodically, using the
same exponential backoff defined in RFC 3262, until such time as a
Binding Response is received for any of the Binding Requests it sent.
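The retransmission schedule for the 18x can be sketched as a generator of wait intervals. This is a sketch under assumptions rather than a statement of the RFC 3262 procedure: it assumes the interval starts at T1 (taken here as 0.5 seconds) and doubles on each retransmission, and it stops once the total wait would exceed 64*T1. The upper bound is an assumption added so the loop terminates; the text above only says to stop when a Binding Response arrives, which the caller would handle by abandoning the generator early.

```python
def retransmit_intervals(t1=0.5, limit_factor=64):
    """Yield successive wait times, in seconds, between 18x retransmissions.

    t1 and limit_factor are assumed defaults, not values taken from this draft.
    """
    elapsed = 0.0
    interval = t1
    limit = limit_factor * t1
    while elapsed + interval <= limit:
        yield interval
        elapsed += interval
        interval *= 2  # exponential backoff: double after each retransmission


if __name__ == "__main__":
    print(list(retransmit_intervals()))
```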
As discussed in Section 13, offer/answer exchanges SHOULD be secured
against eavesdropping and man-in-the-middle attacks. To do that, the
usage of SIPS is RECOMMENDED when used in concert with ICE.
9. Interactions with Forking
SIP allows INVITE requests carrying offers to fork, which means that
they are delivered to multiple user agents. Each of those user
agents then provides an answer to the offer in the INVITE. The
result is that a single offer generated by the UAC produces multiple
answers.
ICE interacts very well with forking. Indeed, ICE fixes some of the
problems associated with forking. Once the offer/answer exchange has
completed, the UAC will have an answer from each UAS that received
the INVITE. The ICE connectivity checks that ensue will carry
transport address pair IDs that correlate each of those checks (and
thus their corresponding IP addresses and ports) with a specific
remote user agent. As these checks happen before any media is
transmitted, ICE allows a UAC to disambiguate subsequent media
traffic by looking at the source IP address and port, and then
correlate that traffic with a particular remote UA. When SIP is used
without ICE, the incoming media traffic cannot be disambiguated
without an additional offer/answer exchange.
10. Interactions with Preconditions
Because ICE involves multiple addresses and pre-session activities,
its interactions with preconditions merit further discussion.
Quality of Service (QoS) preconditions, which are defined in RFC 3312
[9] and RFC 4032 [10], apply only to the IP addresses and ports
listed in the m/c lines in an offer/answer. If ICE changes the
address and port where media is received, this change is reflected in
the m/c lines of a new offer/answer. As such, it appears like any
other re-INVITE would, and is fully treated in RFC 3312 and RFC 4032,
which apply without regard to the fact that the m/c lines are
changing due to ICE negotiations occurring "in the background".
ICE also has (purposeful) interactions with connectivity
preconditions [14]. Those interactions are described there.
11. Example
This section provides an example ICE call flow. Two agents, L and R,
are using ICE. Both agents have a single IPv4 interface, and are
configured with a single TURN and single STUN server each (indeed,
the same one for each). As a consequence, each agent will end up
with three candidates - a local candidate, a TURN-derived candidate,
and a STUN-derived candidate. The agents are seeking to communicate
using a single RTP-based voice stream. As a consequence, each
candidate has two components - one for RTP and one for RTCP. Agent L
is behind a symmetric NAT, and agent R is on the public Internet.
To facilitate understanding, transport addresses are listed in a
mnemonic form. This form is <entity>-<type>-<seq-no>, where
<entity> refers to the entity whose interface the transport address
is on, and is one of "L", "R", "STUN", "TURN", or "NAT". The <type>
is either "PUB" for transport addresses that are public, or "PRIV"
for transport addresses that are private. Finally, <seq-no> is a
sequence number that is different for each transport address of the
same type on a particular entity.
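
The mnemonic form is regular enough to be parsed mechanically. The following is a minimal sketch of such a parser, intended only to make the three fields concrete. The names TransportMnemonic and parse_mnemonic are hypothetical and are not part of this specification.

```python
import re
from typing import NamedTuple


class TransportMnemonic(NamedTuple):
    entity: str   # one of "L", "R", "STUN", "TURN", "NAT"
    kind: str     # "PUB" or "PRIV"
    seq_no: int   # distinguishes addresses of the same type on one entity


_MNEMONIC = re.compile(r"^(L|R|STUN|TURN|NAT)-(PUB|PRIV)-(\d+)$")


def parse_mnemonic(name):
    """Split a mnemonic such as 'NAT-PUB-5' into its three fields."""
    match = _MNEMONIC.match(name)
    if match is None:
        raise ValueError("not a transport address mnemonic: %r" % name)
    return TransportMnemonic(match.group(1), match.group(2), int(match.group(3)))
```

For example, parse_mnemonic("L-PRIV-2") yields entity "L", type "PRIV", and sequence number 2.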
The STUN server has advertised transport address STUN-PUB-1 for STUN
requests, and the TURN server has advertised transport address TURN-
PUB-1 for TURN allocations.
In addition, candidate IDs are also listed in mnemonic form. Agent L
uses candidate ID L1 for its local candidate, L2 for its STUN derived
candidate, and L3 for its TURN derived candidate. Agent R uses R1
for its local candidate and R2 for its TURN derived candidate. The
passwords for each transport address are LPASS1 through LPASS6 for
agent L, and RPASS1 through RPASS4 for agent R.
In example SDP messages, $<token>.IP is used to refer to the value
of the IP address of the transport address with mnemonic name
"token". Similarly, $<token>.PORT is used to refer to the value
of the port of the transport address with mnemonic name "token".
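
To show how the $<token>.IP and $<token>.PORT placeholders might be expanded into concrete SDP, here is a small sketch. The expand function and the address table passed to it are hypothetical, and the addresses used in any call are illustrative values only, not values defined by this document.

```python
import re

# Matches placeholders such as $TURN-PUB-2.IP or $L-PRIV-1.PORT.
_PLACEHOLDER = re.compile(r"\$([A-Z]+-(?:PUB|PRIV)-\d+)\.(IP|PORT)")


def expand(sdp_template, addresses):
    """Replace each placeholder using a {mnemonic: (ip, port)} table.

    An unknown mnemonic raises KeyError rather than being left in place.
    """
    def lookup(match):
        ip, port = addresses[match.group(1)]
        return ip if match.group(2) == "IP" else str(port)

    return _PLACEHOLDER.sub(lookup, sdp_template)
```

Given a table that maps TURN-PUB-2 to ("192.0.2.10", 50000), the line "c=IN IP4 $TURN-PUB-2.IP" expands to "c=IN IP4 192.0.2.10".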
In the call flow example in Figure 6, STUN and TURN messages are
annotated with several attributes. The "S=" attribute indicates the
source transport address of the message. The "D=" attribute
indicates the destination transport address of the message. The
"MA=" attribute is used in STUN Binding Response messages, or STUN
Binding Response messages carried in a TURN SEND or TURN DATA
message, and refers to the value of the MAPPED-ADDRESS attribute in
the STUN Binding Response. The "RA=" attribute is used in TURN DATA
messages, and refers to the value of the REMOTE-ADDRESS attribute.
The "U=" attribute is used in STUN Binding Requests, and corresponds
to the STUN USERNAME. Finally, the "DA=" attribute is used in TURN
SEND messages, and refers to the value of the DESTINATION-ADDRESS
attribute.
The call flow example omits STUN authentication operations.
L NAT STUN R
| | | |
| | | |
| | | |
|RTP STUN alloc. | |
| | | |
| | | |
| | | |
|(1) STUN Req | | |
|S=L-PRIV-1 | | |
|D=STUN-PUB-1 | | |
|------------->| | |
| | | |
| | | |
| |(2) STUN Req | |
| |S=NAT-PUB-1 | |
| |D=STUN-PUB-1 | |
| |------------->| |
| | | |
| |(3) STUN Res | |
| |S=STUN-PUB-1 | |
| |D=NAT-PUB-1 | |
| |MA=NAT-PUB-1 | |
| |<-------------| |
| | | |
|(4) STUN Res | | |
|S=STUN-PUB-1 | | |
|D=L-PRIV-1 | | |
|MA=NAT-PUB-1 | | |
|<-------------| | |
| | | |
| | | |
| | | |
|RTCP STUN alloc. | |
|Ta secs. later| | |
| | | |
| | | |
| | | |
|(5) STUN Req | | |
|S=L-PRIV-2 | | |
|D=STUN-PUB-1 | | |
|------------->| | |
| | | |
| | | |
| |(6) STUN Req | |
| |S=NAT-PUB-2 | |
| |D=STUN-PUB-1 | |
| |------------->| |
| | | |
| |(7) STUN Res | |
| |S=STUN-PUB-1 | |
| |D=NAT-PUB-2 | |
| |MA=NAT-PUB-2 | |
| |<-------------| |
| | | |
|(8) STUN Res | | |
|S=STUN-PUB-1 | | |
|D=L-PRIV-2 | | |
|MA=NAT-PUB-2 | | |
|<-------------| | |
| | | |
| | | |
| | | |
|RTP TURN alloc. | |
|Ta secs. later| | |
| | | |
| | | |
| | | |
|(9) TURN Req | | |
|S=L-PRIV-1 | | |
|D=TURN-PUB-1 | | |
|------------->| | |
| | | |
| | | |
| |(10) TURN Req | |
| |S=NAT-PUB-3 | |
| |D=TURN-PUB-1 | |
| |------------->| |
| | | |
| |(11) TURN Res | |
| |S=TURN-PUB-1 | |
| |D=NAT-PUB-3 | |
| |MA=TURN-PUB-2 | |
| |<-------------| |
| | | |
|(12) TURN Res | | |
|S=TURN-PUB-1 | | |
|D=L-PRIV-1 | | |
|MA=TURN-PUB-2 | | |
|<-------------| | |
| | | |
| | | |
| | | |
|RTCP TURN alloc. | |
|Ta secs. later| | |
| | | |
| | | |
| | | |
|(13) TURN Req | | |
|S=L-PRIV-2 | | |
|D=TURN-PUB-1 | | |
|------------->| | |
| | | |
| | | |
| |(14) TURN Req | |
| |S=NAT-PUB-4 | |
| |D=TURN-PUB-1 | |
| |------------->| |
| | | |
| |(15) TURN Res | |
| |S=TURN-PUB-1 | |
| |D=NAT-PUB-4 | |
| |MA=TURN-PUB-3 | |
| |<-------------| |
| | | |
|(16) TURN Res | | |
|S=TURN-PUB-1 | | |
|D=L-PRIV-2 | | |
|MA=TURN-PUB-3 | | |
|<-------------| | |
| | | |
| | | |
| | | |
| | | |
|(17) Offer | | |
|------------------------------------------->|
| | | |
| | | |
| | | |
| | | |
| | | |RTP STUN alloc.
| | | |
| | | |
| | | |
| | |(18) STUN Req |
| | |S=R-PUB-1 |
| | |D=STUN-PUB-1 |
| | |<-------------|
| | | |
| | |(19) STUN Res |
| | |S=STUN-PUB-1 |
| | |D=R-PUB-1 |
| | |MA=R-PUB-1 |
| | |------------->|
| | | |
| | | |
| | | |
| | | |RTCP STUN alloc.
| | | |Ta secs. later
| | | |
| | | |
| | | |
| | |(20) STUN Req |
| | |S=R-PUB-2 |
| | |D=STUN-PUB-1 |
| | |<-------------|
| | | |
| | |(21) STUN Res |
| | |S=STUN-PUB-1 |
| | |D=R-PUB-2 |
| | |MA=R-PUB-2 |
| | |------------->|
| | | |
| | | |
| | | |
| | | |RTP TURN alloc.
| | | |Ta secs. later
| | | |
| | | |
| | | |
| | |(22) TURN Req |
| | |S=R-PUB-1 |
| | |D=TURN-PUB-1 |
| | |<-------------|
| | | |
| | |(23) TURN Res |
| | |S=TURN-PUB-1 |
| | |D=R-PUB-1 |
| | |MA=TURN-PUB-4 |
| | |------------->|
| | | |
| | | |
| | | |
| | | |RTCP TURN alloc.
| | | |Ta secs. later
| | | |
| | | |
| | | |
| | |(24) TURN Req |
| | |S=R-PUB-2 |
| | |D=TURN-PUB-1 |
| | |<-------------|
| | | |
| | |(25) TURN Res |
| | |S=TURN-PUB-1 |
| | |D=R-PUB-2 |
| | |MA=TURN-PUB-5 |
| | |------------->|
| | | |
| | | |
| | | |
| | | |
|(26) answer | | |
|<-------------------------------------------|
| | | |
| | | |
| | | |
| | | |Validate
| | | |TURN-PUB-4 to TURN-PUB-2
| | | |
| | | |
| | |(27) TURN SEND|
| | |S=R-PUB-1 |
| | |D=TURN-PUB-1 |
| | |DA=TURN-PUB-2 |
| | |<-------------|
| | | |
| | |STUN Req. |
| | |S=TURN-PUB-4 |
| | |D=TURN-PUB-2 |
| | |U=L3:1:R2:1 |
| | | |
| | | |
| | | |
| | | |
| | | |
| | |Discard |
| | | |
| | | |
| | | |
| | | |
|Validate | | |
|TURN-PUB-2 to TURN-PUB-4 | |
| | | |
| | | |
|(28) TURN SEND| | |
|S=L-PRIV-1 | | |
|D=TURN-PUB-1 | | |
|DA=TURN-PUB-4 | | |
|------------->| | |
| | | |
| |(29) TURN SEND| |
| |S=NAT-PUB-3 | |
| |D=TURN-PUB-1 | |
| |DA=TURN-PUB-4 | |
| |------------->| |
| | | |
| | |STUN Req. |
| | |S=TURN-PUB-2 |
| | |D=TURN-PUB-4 |
| | |U=R2:1:L3:1 |
| | | |
| | | |
| | |(30) TURN DATA|
| | |S=TURN-PUB-1 |
| | |D=R-PUB-1 |
| | |RA=TURN-PUB-2 |
| | |------------->|
| | |(31) TURN SEND|
| | |S=R-PUB-1 |
| | |D=TURN-PUB-1 |
| | |DA=TURN-PUB-2 |
| | |MA=TURN-PUB-2 |
| | |<-------------|
| | | |
| | |STUN Res. |
| | |S=TURN-PUB-4 |
| | |D=TURN-PUB-2 |
| | |MA=TURN-PUB-2 |
| | | |
| |(32) TURN DATA| |
| |S=TURN-PUB-1 | |
| |D=NAT-PUB-3 | |
| |RA=TURN-PUB-4 | |
| |MA=TURN-PUB-2 | |
| |<-------------| |
|(33) TURN DATA| | |
|S=TURN-PUB-1 | | |
|D=L-PRIV-1 | | |
|RA=TURN-PUB-4 | | |
|MA=TURN-PUB-2 | | |
|<-------------| | |
| | | |
| | | |
| | | |
| | | |Validate
| | | |TURN-PUB-4 to TURN-PUB-2
| | | |
| | | |
| | |(34) TURN SEND|
| | |S=R-PUB-1 |
| | |D=TURN-PUB-1 |
| | |DA=TURN-PUB-2 |
| | |<-------------|
| | | |
| | |STUN Req. |
| | |S=TURN-PUB-4 |
| | |D=TURN-PUB-2 |
| | |U=L3:1:R2:1 |
| | | |
| | | |
| |(35) TURN DATA| |
| |S=TURN-PUB-1 | |
| |D=NAT-PUB-3 | |
| |RA=TURN-PUB-4 | |
| |<-------------| |
| | | |
|(36) TURN DATA| | |
|S=TURN-PUB-1 | | |
|D=L-PRIV-1 | | |
|RA=TURN-PUB-4 | | |
|<-------------| | |
|(37) TURN SEND| | |
|S=L-PRIV-1 | | |
|D=TURN-PUB-1 | | |
|DA=TURN-PUB-4 | | |
|MA=TURN-PUB-4 | | |
|------------->| | |
| |(38) TURN SEND| |
| |S=NAT-PUB-3 | |
| |D=TURN-PUB-1 | |
| |DA=TURN-PUB-4 | |
| |MA=TURN-PUB-4 | |
| |------------->| |
| | | |
| | |STUN Res. |
| | |S=TURN-PUB-2 |
| | |D=TURN-PUB-4 |
| | |MA=TURN-PUB-4 |
| | | |
| | |(39) TURN DATA|
| | |S=TURN-PUB-1 |
| | |D=R-PUB-1 |
| | |RA=TURN-PUB-2 |
| | |MA=TURN-PUB-4 |
| | |------------->|
| | | |
| | | |
| | | |
| | | |Validate
| | | |TURN-PUB-5 to TURN-PUB-3
| | | |
| | | |
| | |(40) TURN SEND|
| | |S=R-PUB-2 |
| | |D=TURN-PUB-1 |
| | |DA=TURN-PUB-3 |
| | |<-------------|
| | | |
| | |STUN Req. |
| | |S=TURN-PUB-5 |
| | |D=TURN-PUB-3 |
| | |U=L3:2:R2:2 |
| | | |
| | | |
| | | |
| | | |
| | | |
| | |Discard |
| | | |
| | | |
| | | |
| | | |
|Validate | | |
|TURN-PUB-3 to TURN-PUB-5 | |
| | | |
| | | |
|(41) TURN SEND| | |
|S=L-PRIV-2 | | |
|D=TURN-PUB-1 | | |
|DA=TURN-PUB-5 | | |
|------------->| | |
| | | |
| |(42) TURN SEND| |
| |S=NAT-PUB-4 | |
| |D=TURN-PUB-1 | |
| |DA=TURN-PUB-5 | |
| |------------->| |
| | | |
| | |STUN Req. |
| | |S=TURN-PUB-3 |
| | |D=TURN-PUB-5 |
| | |U=R2:2:L3:2 |
| | | |
| | | |
| | |(43) TURN DATA|
| | |S=TURN-PUB-1 |
| | |D=R-PUB-2 |
| | |RA=TURN-PUB-3 |
| | |------------->|
| | |(44) TURN SEND|
| | |S=R-PUB-2 |
| | |D=TURN-PUB-1 |
| | |DA=TURN-PUB-3 |
| | |MA=TURN-PUB-3 |
| | |<-------------|
| | | |
| | |STUN Res. |
| | |S=TURN-PUB-5 |
| | |D=TURN-PUB-3 |
| | |MA=TURN-PUB-3 |
| | | |
| |(45) TURN DATA| |
| |S=TURN-PUB-1 | |
| |D=NAT-PUB-4 | |
| |RA=TURN-PUB-5 | |
| |MA=TURN-PUB-3 | |
| |<-------------| |
|(46) TURN DATA| | |
|S=TURN-PUB-1 | | |
|D=L-PRIV-2 | | |
|RA=TURN-PUB-5 | | |
|MA=TURN-PUB-3 | | |
|<-------------| | |
| | | |
| | | |
| | | |
| | | |Validate
| | | |TURN-PUB-5 to TURN-PUB-3
| | | |
| | | |
| | |(47) TURN SEND|
| | |S=R-PUB-2 |
| | |D=TURN-PUB-1 |
| | |DA=TURN-PUB-3 |
| | |<-------------|
| | | |
| | |STUN Req. |
| | |S=TURN-PUB-5 |
| | |D=TURN-PUB-3 |
| | |U=L3:2:R2:2 |
| | | |
| | | |
| |(48) TURN DATA| |
| |S=TURN-PUB-1 | |
| |D=NAT-PUB-4 | |
| |RA=TURN-PUB-5 | |
| |<-------------| |
| | | |
|(49) TURN DATA| | |
|S=TURN-PUB-1 | | |
|D=L-PRIV-2 | | |
|RA=TURN-PUB-5 | | |
|<-------------| | |
|(50) TURN SEND| | |
|S=L-PRIV-2 | | |
|D=TURN-PUB-1 | | |
|DA=TURN-PUB-5 | | |
|MA=TURN-PUB-5 | | |
|------------->| | |
| |(51) TURN SEND| |
| |S=NAT-PUB-4 | |
| |D=TURN-PUB-1 | |
| |DA=TURN-PUB-5 | |
| |MA=TURN-PUB-5 | |
| |------------->| |
| | | |
| | |STUN Res. |
| | |S=TURN-PUB-3 |
| | |D=TURN-PUB-5 |
| | |MA=TURN-PUB-5 |
| | | |
| | |(52) TURN DATA|
| | |S=TURN-PUB-1 |
| | |D=R-PUB-2 |
| | |RA=TURN-PUB-3 |
| | |MA=TURN-PUB-5 |
| | |------------->|
| | | |
| | | |
| | | |
| | | |
|RTP flows | | |
| | | |
| | | |
|(53) TURN SEND| | |
|S=L-PRIV-1 | | |
|D=TURN-PUB-1 | | |
|DA=TURN-PUB-4 | | |
|------------->| | |
| | | |
| |(54) TURN SEND| |
| |S=NAT-PUB-3 | |
| |D=TURN-PUB-1 | |
| |DA=TURN-PUB-4 | |
| |------------->| |
| | | |
| | | |
| | |RTP |
| | |S=TURN-PUB-2 |
| | |D=TURN-PUB-4 |
| | | |
| | | |
| | |(55) TURN DATA|
| | |S=TURN-PUB-1 |
| | |D=R-PUB-1 |
| | |RA=TURN-PUB-2 |
| | |------------->|
| | | |
| | | |
| | | |
| | | |
| | | |RTP flows
| | | |
| | | |
| | |(56) TURN SEND|
| | |S=R-PUB-1 |
| | |D=TURN-PUB-1 |
| | |DA=TURN-PUB-2 |
| | |<-------------|
| | | |
| | | |
| | |RTP |
| | |S=TURN-PUB-4 |
| | |D=TURN-PUB-2 |
| | | |
| | | |
| |(57) TURN DATA| |
| |S=TURN-PUB-1 | |
| |D=NAT-PUB-3 | |
| |RA=TURN-PUB-4 | |
| |<-------------| |
| | | |
|(58) TURN DATA| | |
|S=TURN-PUB-1 | | |
|D=L-PRIV-1 | | |
|RA=TURN-PUB-4 | | |
|<-------------| | |
| | | |
| | | |
| | | |
|Validate | | |
|L-PRIV-1 to R-PUB-1 | |
| | | |
| | | |
|(59) STUN Req.| | |
|S=L-PRIV-1 | | |
|D=R-PUB-1 | | |
|U=R1:1:L1:1 | | |
|------------->| | |
| | | |
| |(60) STUN Req.| |
| |S=NAT-PUB-5 | |
| |D=R-PUB-1 | |
| |U=R1:1:L1:1 | |
| |---------------------------->|
| | | |
| |(61) STUN Res.| |
| |S=R-PUB-1 | |
| |D=NAT-PUB-5 | |
| |MA=NAT-PUB-5 | |
| |<----------------------------|
| | | |
|(62) STUN Res.| | |
|S=R-PUB-1 | | |
|D=L-PRIV-1 | | |
|MA=NAT-PUB-5 | | |
|<-------------| | |
| | | |
| | | |
| | | |
| | | |Validate
| | | |R-PUB-1 to L-PRIV-1
| | | |
| | | |
| |(63) STUN Req.| |
| |S=R-PUB-1 | |
| |D=L-PRIV-1 | |
| |U=L1:1:R1:1 | |
| |<----------------------------|
| | | |
| | | |
| | | |
| | | |
| |Discard | |
| | | |
| | | |
| | | |
| | | |
| | | |Validate
| | | |R-PUB-2 to L-PRIV-2
| | | |
| | | |
| |(64) STUN Req.| |
| |S=R-PUB-2 | |
| |D=L-PRIV-2 | |
| |U=L1:2:R1:2 | |
| |<----------------------------|
| | | |
| | | |
| | | |
| | | |
| |Discard | |
| | | |
| | | |
| | | |
| | | |
|Validate | | |
|L-PRIV-2 to R-PUB-2 | |
| | | |
| | | |
|(65) STUN Req.| | |
|S=L-PRIV-2 | | |
|D=R-PUB-2 | | |
|U=R1:2:L1:2 | | |
|------------->| | |
| | | |
| |(66) STUN Req.| |
| |S=NAT-PUB-6 | |
| |D=R-PUB-2 | |
| |U=R1:2:L1:2 | |
| |---------------------------->|
| | | |
| |(67) STUN Res.| |
| |S=R-PUB-2 | |
| |D=NAT-PUB-6 | |
| |MA=NAT-PUB-6 | |
| |<----------------------------|
| | | |
|(68) STUN Res.| | |
|S=R-PUB-2 | | |
|D=L-PRIV-2 | | |
|MA=NAT-PUB-6 | | |
|<-------------| | |
| | | |
| | | |
| | | |
| | | |Validate
| | | |R-PUB-1 to NAT-PUB-5
| | | |
| | | |
| |(69) STUN Req.| |
| |S=R-PUB-1 | |
| |D=NAT-PUB-5 | |
| |U=L1R1:1:R1:1 | |
| |<----------------------------|
| | | |
|(70) STUN Req.| | |
|S=R-PUB-1 | | |
|D=L-PRIV-1 | | |
|U=L1R1:1:R1:1 | | |
|<-------------| | |
| | | |
|(71) STUN Res.| | |
|S=L-PRIV-1 | | |
|D=R-PUB-1 | | |
|MA=R-PUB-1 | | |
|------------->| | |
| | | |
| |(72) STUN Res.| |
| |S=NAT-PUB-5 | |
| |D=R-PUB-1 | |
| |MA=R-PUB-1 | |
| |---------------------------->|
| | | |
| | | |
| | | |
| | | |Validate
| | | |R-PUB-2 to NAT-PUB-6
| | | |
| | | |
| |(73) STUN Req.| |
| |S=R-PUB-2 | |
| |D=NAT-PUB-6 | |
| |U=L1R1:2:R1:2 | |
| |<----------------------------|
| | | |
|(74) STUN Req.| | |
|S=R-PUB-2 | | |
|D=L-PRIV-2 | | |
|U=L1R1:2:R1:2 | | |
|<-------------| | |
| | | |
|(75) STUN Res.| | |
|S=L-PRIV-2 | | |
|D=R-PUB-2 | | |
|MA=R-PUB-2 | | |
|------------->| | |
| | | |
| |(76) STUN Res.| |
| |S=NAT-PUB-6 | |
| |D=R-PUB-2 | |
| |MA=R-PUB-2 | |
| |---------------------------->|
| | | |
| | | |
| | | |
| | | |
|(77) Offer | | |
|------------------------------------------->|
| | | |
| | | |
| | | |
| | | |
|(78) Answer | | |
|<-------------------------------------------|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
Figure 6
First, agent L obtains a STUN derived transport address for its RTP
packets (messages 1-4). Recall that the NAT is symmetric. Here, it
creates a binding of NAT-PUB-1 for this UDP request, and this becomes
the STUN derived transport address for RTP. Agent L repeats this
process for RTCP (messages 5-8) Ta seconds later, and obtains NAT-
PUB-2 as its STUN derived transport address for RTCP. The two
transport addresses are the two components of the STUN derived
candidate that agent L has just obtained.
Next, agent L will allocate a TURN derived transport address for RTP
(messages 9-12) and RTCP (messages 13-16). This produces TURN-PUB-2
and TURN-PUB-3 for RTP and RTCP, respectively.
With its three candidates, agent L prioritizes them, choosing the
local candidate as highest priority, followed by the STUN derived
candidate, followed by the TURN-derived candidate. It chooses its
TURN derived candidate as the active candidate, and encodes it into
the m/c-line. The resulting offer (message 17) looks like:
v=0
o=jdoe 2890844526 2890842807 IN IP4 $L-PRIV-1.IP
s=
c=IN IP4 $TURN-PUB-2.IP
t=0 0
m=audio $TURN-PUB-2.PORT RTP/AVP 0
a=rtpmap:0 PCMU/8000
a=rtcp:$TURN-PUB-3.PORT
a=candidate:$L1 1 $L-PASS1 UDP 1.0 $L-PRIV-1.IP $L-PRIV-1.PORT
a=candidate:$L1 2 $L-PASS2 UDP 1.0 $L-PRIV-2.IP $L-PRIV-2.PORT
a=candidate:$L2 1 $L-PASS3 UDP 0.7 $NAT-PUB-1.IP $NAT-PUB-1.PORT
a=candidate:$L2 2 $L-PASS4 UDP 0.7 $NAT-PUB-2.IP $NAT-PUB-2.PORT
a=candidate:$L3 1 $L-PASS5 UDP 0.3 $TURN-PUB-2.IP $TURN-PUB-2.PORT
a=candidate:$L3 2 $L-PASS6 UDP 0.3 $TURN-PUB-3.IP $TURN-PUB-3.PORT
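
The prioritization that agent L applies before building this offer can be sketched as follows. The q-values are the ones that appear in the example offer. The Candidate class, the function names, and the rule "prefer a TURN derived candidate as the active one when it exists" are assumptions made for illustration, reflecting only what the example does.

```python
from dataclasses import dataclass

# q-values as used in the example offer.
Q_BY_KIND = {"local": 1.0, "stun": 0.7, "turn": 0.3}


@dataclass(frozen=True)
class Candidate:
    cand_id: str
    kind: str      # "local", "stun", or "turn"
    qvalue: float


def make_candidates(kinds_by_id):
    """Build candidates from a {candidate_id: kind} mapping."""
    return [Candidate(cid, kind, Q_BY_KIND[kind]) for cid, kind in kinds_by_id.items()]


def prioritize(cands):
    """Highest q-value first."""
    return sorted(cands, key=lambda c: c.qvalue, reverse=True)


def pick_active(cands):
    """Assumed rule: use the TURN derived candidate in the m/c-line if
    there is one, otherwise fall back to the highest priority candidate."""
    relayed = [c for c in cands if c.kind == "turn"]
    return relayed[0] if relayed else prioritize(cands)[0]
```

For agent L, the priority order comes out as L1, L2, L3, while L3 is the candidate placed in the m/c-line.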
This offer is received at agent R. Agent R will gather its STUN
derived RTP transport address (messages 18-19) and RTCP address
(messages 20-21). Since the result of the STUN allocations did not
provide a new set of transport addresses, there will not be a
separate candidate for them. Agent R then gathers its TURN derived
RTP transport address (messages 22-23) and TURN derived RTCP
transport address (messages 24-25). Agent R now has two
candidates. It prioritizes the local candidate with higher priority
than the TURN candidate, and selects the TURN candidate as the active
candidate. Its resulting answer looks like:
v=0
o=bob 2808844564 2808844564 IN IP4 $R-PUB-1.IP
s=
c=IN IP4 $TURN-PUB-4.IP
t=0 0
m=audio $TURN-PUB-4.PORT RTP/AVP 0
a=rtpmap:0 PCMU/8000
a=rtcp:$TURN-PUB-5.PORT
a=candidate:$R1 1 $R-PASS1 UDP 1.0 $R-PUB-1.IP $R-PUB-1.PORT
a=candidate:$R1 2 $R-PASS2 UDP 1.0 $R-PUB-2.IP $R-PUB-2.PORT
a=candidate:$R2 1 $R-PASS3 UDP 0.3 $TURN-PUB-4.IP $TURN-PUB-4.PORT
a=candidate:$R2 2 $R-PASS4 UDP 0.3 $TURN-PUB-5.IP $TURN-PUB-5.PORT
Next, agents L and R form candidate pairs and the transport address
pair check ordered list. This list will start with the two
components in the currently active candidate pair - TURN. Agent R
begins its
checks (message 27). It will check connectivity between the active
candidate pair, starting with the first component, which is
TURN-PUB-4 for agent R and TURN-PUB-2 for agent L. The state machine
for that transport address pair moves to the Testing state. Since
this is a TURN-derived transport address for agent R, it utilizes the
TURN SEND mechanism to deliver the Binding Request. The DESTINATION-
ADDRESS is TURN-PUB-2.
The TURN server will extract the content of the TURN message, which
is a STUN Binding Request, and deliver it to the destination, TURN-
PUB-2. This request will be sent from the TURN address allocated to
R, which is TURN-PUB-4. As both interfaces are on the TURN server,
this message is sent to itself (and thus the lack of a message number
in the sequence diagram above). Note that the USERNAME in the
Binding Request is L3:1:R2:1, which represents the transport address
pair ID. This message gets discarded by the TURN server since, as of
yet, there are no permissions established for the TURN-PUB-2
allocation. However, it did have the side effect of establishing a
permission on the TURN-PUB-4 binding, allowing incoming packets from
TURN-PUB-2.
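
The USERNAME values shown in the call flow follow a simple pattern: the candidate ID and component of the agent receiving the request come first, followed by the candidate ID and component of the agent sending it. The sketch below reproduces that pattern as it appears in this example. The function names are hypothetical, and the pattern is inferred from the example values rather than quoted from the normative text.

```python
def check_username(receiver_cand, sender_cand, component):
    """Build the USERNAME for a connectivity check on one component."""
    return "%s:%d:%s:%d" % (receiver_cand, component, sender_cand, component)


def mirrored(username):
    """Return the USERNAME the peer would use for the same pair."""
    recv_id, recv_comp, send_id, send_comp = username.split(":")
    return ":".join([send_id, send_comp, recv_id, recv_comp])
```

Agent R checking its candidate R2 against L3 for RTP uses check_username("L3", "R2", 1), which gives "L3:1:R2:1". The same pair as seen from agent L is "R2:1:L3:1".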
Once L gets the answer, it will attempt to validate the first
transport address pair in the transport address pair check ordered
list, which will be the active candidate. The state machine for this
transport address pair moves into the Testing state. Like agent R
did, it will use the TURN SEND primitive to send a STUN Binding
Request from its TURN derived transport address, TURN-PUB-2, to TURN-
PUB-4 (message 28). This packet traverses the NAT (message 29) and
arrives at the TURN server. The TURN server will unwrap the contents
of the packet and send them from TURN-PUB-2 to TURN-PUB-4. It will
also, as a consequence, add a permission for TURN-PUB-4. The
contents of the packet are a STUN Binding Request with USERNAME R2:1:
L3:1 (note how this is the flip of the USERNAME in the Binding
Request sent by agent R). This is also a packet from the TURN server
to itself. However, now, the packet is not discarded, as a
permission had been installed as a consequence of the "suicide
packet" from agent R (a suicide packet is a packet that has no hope
of traversing a far end NAT, but serves the purpose of enabling a
permission in a near end NAT so that a packet from the peer can be
returned). Thus, the TURN server will relay the received STUN
request towards agent R (message 30). This is delivered as a TURN
DATA Indication. Notice how the REMOTE-ADDRESS is TURN-PUB-2; this
is important as it will be used to construct the STUN Binding
Response.
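
The permission behavior that makes the suicide packet useful can be modeled in a few lines. This is a toy model, not a TURN implementation: the TurnServer class and its send method are hypothetical, and the only rule encoded is the one described above, namely that sending from an allocation toward a destination installs a permission for that destination, and that a packet arriving at an allocation is relayed only if a permission for its source already exists.

```python
class TurnServer:
    """Toy model of per-allocation permissions on a single TURN server."""

    def __init__(self, allocations):
        # allocation address -> set of remote addresses allowed to send to it
        self.permissions = {addr: set() for addr in allocations}

    def send(self, allocation, destination):
        """The client owning `allocation` sends a packet to `destination`."""
        # Side effect of sending: permit return traffic from the destination.
        self.permissions[allocation].add(destination)
        if destination not in self.permissions:
            return "forwarded"      # destination is not on this server
        # The packet loops back to another allocation on the same server.
        if allocation in self.permissions[destination]:
            return "relayed"
        return "discarded"
```

Replaying the RTP checks from the example: R's first request from TURN-PUB-4 to TURN-PUB-2 is discarded, L's request from TURN-PUB-2 to TURN-PUB-4 is relayed, and R's retransmission is then relayed as well.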
Agent R will receive the DATA Indication, and unwrap its contents to
find the Binding Request. The state machine for this transport
address pair is currently in the Testing state. It therefore moves
into the Send-Valid state, and it generates a Binding Response.
However, the MAPPED-ADDRESS in the Binding Response is constructed
using the source IP address and port that were seen by the TURN
server when the Binding Request arrived at TURN-PUB-4, which is the
looped message between messages 29 and 30. This source address is
TURN-PUB-2, which is the value of the REMOTE-ADDRESS attribute in
message 30. Thus, the STUN Binding Response will contain TURN-PUB-2
in the MAPPED-ADDRESS, and is to be sent to TURN-PUB-2. To send the
response, agent R takes the STUN Binding Response and encapsulates it
in a TURN SEND primitive, setting the DESTINATION-ADDRESS to TURN-
PUB-2. This is shown in message 31.
The TURN server will receive this SEND request, and unwrap its
contents to find the STUN Binding Response. It sends it to the value
of the DESTINATION-ADDRESS attribute, and sends it from the TURN
address allocated to R, which is TURN-PUB-4. This, once again,
results in a looped message to itself, and it arrives at TURN-PUB-2.
Now, however, there is a permission installed for TURN-PUB-4. The
TURN server will therefore forward the packet to agent L. To do so,
it constructs a TURN DATA Indication containing the contents of the
packet. It sets the REMOTE-ADDRESS to the source transport address
of the packet it received (TURN-PUB-4), and forwards it to agent L
(message 32). This traverses the NAT (message 33) and arrives at
agent L. As a consequence of the receipt of a Binding Response, the
state machine for this transport address pair moves to the Recv-Valid
state. The agent also examines the MAPPED-ADDRESS of the STUN
response. It is TURN-PUB-2. This is the same as the native
transport address of this transport address pair, and thus doesn't
represent a new transport address that might have been learned.
A short while later, agent R will attempt a retransmission of its
STUN Binding Request that was lost (the contents of message 27 that
were discarded by the TURN server due to lack of permission). This
time, however, a permission has been installed and the retransmission
will work. So, it sends the Binding Request again (message 34,
identical to message 27). This is looped by the TURN server to
itself again, but this time there is a permission in place when it
arrives at TURN-PUB-2. As such, the request is forwarded towards
agent L this time, in a TURN DATA Indication (message 35). This
traverses the NAT (message 36) and arrives at agent L. Agent L
extracts the contents of the request, which are a STUN Binding
Request. This causes the state machine to move from Recv-Valid to
Valid. It generates a STUN Binding Response, and sets the MAPPED-
ADDRESS to the value of the REMOTE-ADDRESS in message 36
(TURN-PUB-4). This Binding Response is sent to TURN-PUB-4, which is
accomplished through a TURN SEND primitive (message 37). This SEND
Request traverses the NAT (message 38) and is received by the TURN
server. Its contents are decapsulated, and sent to TURN-PUB-4, which
is again a loop on the same host. This packet is then sent towards
agent R in a DATA Indication (message 39). The contents of the DATA
Indication are extracted, and the agent sees a successful Binding
Response. It therefore moves the state machine from the Send-Valid
state to the Valid state. At this point, the transport address pair
is in the Valid state for both agents.
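
The state changes traced in this walkthrough can be summarized as a small transition table. The sketch below encodes only the transitions that the example exercises; the event names, and the choice to ignore any event not listed, are assumptions made for illustration and do not replace the state machine defined elsewhere in this specification.

```python
from enum import Enum


class State(Enum):
    WAITING = "Waiting"
    TESTING = "Testing"
    SEND_VALID = "Send-Valid"
    RECV_VALID = "Recv-Valid"
    VALID = "Valid"


# (current state, event) -> next state, as observed in the example.
_TRANSITIONS = {
    (State.WAITING, "send_request"): State.TESTING,
    (State.TESTING, "recv_request"): State.SEND_VALID,
    (State.TESTING, "recv_response"): State.RECV_VALID,
    (State.SEND_VALID, "recv_response"): State.VALID,
    (State.RECV_VALID, "recv_request"): State.VALID,
}


def step(state, event):
    """Apply one event; events not in the table leave the state unchanged."""
    return _TRANSITIONS.get((state, event), state)


def run(events, start=State.WAITING):
    """Apply a sequence of events and return the final state."""
    state = start
    for event in events:
        state = step(state, event)
    return state
```

Agent R's view of the RTP pair is send_request, recv_request, recv_response; agent L's view is send_request, recv_response, recv_request. Both sequences end in the Valid state.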
Approximately Ta seconds after agent R sent message 27, agent R will
start checks for the next transport address pair in its transport
address pair check ordered list. This is the second component of the
same candidate pair, used for RTCP. This sequence, messages 40
through 52, is identical to the one for RTP, differing only in the
specific transport addresses.
Once that validation happens, the second transport address pair has
been validated. The candidate pair moves into the valid state, and
both candidates are considered valid. The active candidate has now
been validated, and media can begin to flow. It will do so through
the TURN server; indeed, it is relayed "twice" through the TURN
server. Even though there is a single TURN server, it is logically
acting as two separate TURN servers. Indeed, had L and R used two
separate TURN servers, media would be relayed through both TURN
servers.
The actual media flows are shown as well. It is important to note
that, since the ICE checks have not yet concluded on the candidate
that will ultimately be used, no TURN Set Active Destinations have
been sent. As a consequence, media that is sent through the TURN
servers has to be sent using TURN Send requests. This introduces
some overhead, but is a transient condition. In message 53, agent L
sends an RTP packet to agent R using a SEND request. It is sent to
TURN-PUB-4. This traverses the NAT (message 54), and arrives at the
TURN server. It is decapsulated, looped to itself, and arrives at
TURN-PUB-4. From there, it is encapsulated in a DATA Indication and
sent to agent R (message 55). In the reverse direction, agent R will
send an RTP packet using a TURN SEND primitive (message 56), and send
it to TURN-PUB-2. This is received by the TURN server, decapsulated,
and sent to TURN-PUB-2 from TURN-PUB-4. This is again a loop within
the same host, arriving at TURN-PUB-2. The contents of the packet
are sent to agent L through a TURN DATA Indication (message 57),
which traverses the NAT (message 58) to arrive at agent L. Since this
call flow is already long enough, RTCP packet transmission is not
shown.
Approximately Ta seconds after it sends message 41, agent L goes to
the next transport address pair in its transport address pair check
ordered list that is in the Waiting state. This will be the RTP
component of the top priority candidate pair, which is L-PRIV-1 on
agent L and R-PUB-1 on agent R. This is a local candidate for each
agent. To perform the check, agent L sends a STUN Binding Request
from L-PRIV-1 to R-PUB-1 (message 59). Note the USERNAME of
R1:1:L1:1, which identifies this transport address pair. This
traverses the NAT (message 60). Since the NAT is symmetric and this
is a new destination IP address, the NAT allocates a new transport
address on its public side, NAT-PUB-5, and places this in the source
IP address and port. This packet arrives at agent R. Agent R finds a
matching transport address pair in the Waiting state. The state
machine transitions to the Send-Valid state. It sends the Binding
response, with a MAPPED-ADDRESS equal to NAT-PUB-5 (message 61),
which traverses the NAT and arrives at agent L (message 62). Agent
R, in addition to sending the response, will also send a Binding
Request. It is important to remember that this Binding Request is
sent to the remote address in the transport address pair (L-PRIV-1),
and NOT to the source IP address and port of the Binding Request
(NAT-PUB-5); that will happen later. This attempt is shown in
message 63. However, since L-PRIV-1 is private, the packet is
discarded in the network.
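
The reason NAT-PUB-5 appears here, rather than a reuse of NAT-PUB-1 or NAT-PUB-3, is that a symmetric NAT creates a separate binding for each destination. The sketch below models that behavior. The SymmetricNat class is hypothetical, and its inbound filtering rule (only the destination that triggered a binding may use it) is a simplifying assumption rather than something this document specifies.

```python
class SymmetricNat:
    """Toy symmetric NAT: one public binding per (private source, destination)."""

    def __init__(self):
        self._bindings = {}   # (private_src, destination) -> public mnemonic
        self._next_seq = 1

    def outbound(self, private_src, destination):
        """Return the public address used for this source/destination pair."""
        key = (private_src, destination)
        if key not in self._bindings:
            self._bindings[key] = "NAT-PUB-%d" % self._next_seq
            self._next_seq += 1
        return self._bindings[key]

    def inbound(self, public_dst, remote_src):
        """Return the private address a packet is delivered to, or None."""
        for (private_src, destination), public in self._bindings.items():
            if public == public_dst and destination == remote_src:
                return private_src
        return None
```

Replaying agent L's traffic in order (two STUN requests, two TURN requests, then the checks toward R-PUB-1 and R-PUB-2) yields NAT-PUB-1 through NAT-PUB-6, matching the call flow.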
Now, as a consequence of receiving message 60, agent R will have
constructed a peer-derived candidate. The candidate ID for this
candidate is L1R1, and it initially contains a single transport
address pair, NAT-PUB-5 and R-PUB-1. However, the candidate isn't
yet usable until the other component gets added. Similarly, agent L
will have constructed the same peer-derived candidate, with the same
candidate ID and the same transport address pair.
Some Ta seconds after sending message 40, agent R will move to the
next transport address pair in the transport address pair check
ordered list whose state is Waiting. This is the RTCP component of
the highest priority candidate pair. It will attempt a connectivity
check, from R-PUB-2 to L-PRIV-2 (message 64). Since L-PRIV-2 is
private, this message is discarded.
Some Ta seconds after sending message 59, agent L will move to the
next transport address pair in the transport address pair check
ordered list whose state is Waiting. This is the RTCP component of
the highest priority candidate pair. It will attempt a connectivity
check, from L-PRIV-2 to R-PUB-2 (message 65), which operates nearly
identically to messages 59-62, with the exception of the specific
addresses. Here, the NAT will create a new binding for the RTCP,
NAT-PUB-6, and this transport address is new for both participants.
On receipt of this Binding Request at agent R (message 66), agent R
constructs the candidate ID for the peer-derived candidate, L1R1, and
finds it already exists. As such, this new transport address is
added, and the peer-derived candidate becomes complete and usable.
Agent L does the same thing on receipt of message 68. This candidate
will have the same priority as its generating candidate L1 (1.0), and
is paired up with R1 (also at priority 1.0). Since L1R1 has the same
priority as L1 itself, the ordering algorithm in Section 7.5 will use
the reverse lexicographic order of the candidate ID itself to
determine order. L1R1 is larger than L1, so that the peer-derived
candidate will come before its generating candidate. As a
consequence, the peer-derived candidate pair will have a higher
priority than its generating candidate, and appear just before it in
the candidate pair priority ordered list.
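The tie-break can be illustrated with a short sketch. It assumes that "reverse lexicographic order" means a descending string comparison of candidate IDs, and it is not the full ordering algorithm of Section 7.5. The names and the third candidate (L2 at priority 0.7) are hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    priority: float
    candidate_id: str


def order_candidates(candidates):
    # Highest priority first; equal priorities are ordered by candidate ID,
    # largest ID first (assumed reading of "reverse lexicographic order").
    return sorted(candidates,
                  key=lambda c: (c.priority, c.candidate_id),
                  reverse=True)


ordered = order_candidates([
    Candidate(1.0, "L1"),
    Candidate(0.7, "L2"),    # hypothetical lower-priority candidate
    Candidate(1.0, "L1R1"),
])
ordered_ids = [c.candidate_id for c in ordered]
```

Because "L1R1" compares greater than its prefix "L1", the peer-derived candidate sorts ahead of its generating candidate.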
As a consequence, after agent R sends message 67 and completes the
peer-derived candidate, it will move the two transport addresses in
the peer-derived candidate into the Send-Valid state, and send a
Binding Request for each in rapid succession (agent L will have moved
both into the Recv-Valid state upon receipt of message 68). The
first of these connectivity checks is for the RTP component, from
R-PUB-1 to NAT-PUB-5 (message 69). Note the USERNAME in the STUN
Binding Request, L1R1:1:R1:1, which identifies the peer-derived
transport address pair. This will successfully traverse the NAT and
be delivered to agent L (message 70). The receipt of this request
moves the state machine for this transport address pair from Recv-
Valid to Valid, and a Binding Response is sent (message 71). This
passes through the NAT and arrives at agent R (message 72). This
causes its state machine to enter the Valid state as well. The
MAPPED-ADDRESS, R-PUB-1, is not new to agent R and thus does not
result in the creation of a new peer-derived candidate.
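As a rough sketch, the USERNAME seen in message 69 can be composed from the two transport address identifiers. The field order used here (receiving side first, then sending side) is an assumption inferred only from the single L1R1:1:R1:1 example, and the function names are hypothetical.

```python
def stun_username(to_candidate, to_component, from_candidate, from_component):
    # Assumed layout: <receiver candidate>:<component>:<sender candidate>:<component>
    return f"{to_candidate}:{to_component}:{from_candidate}:{from_component}"


def split_username(username):
    # Candidate IDs use the base64 alphabet, so ":" cannot appear inside them.
    to_cand, to_comp, from_cand, from_comp = username.split(":")
    return (to_cand, int(to_comp)), (from_cand, int(from_comp))


username = stun_username("L1R1", 1, "R1", 1)
```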
Messages 73 through 76 show the same basic flow for RTCP. Upon
receipt of message 76, both transport address pairs are Valid at both
agents, causing the peer-derived candidate to become valid. Timer
Tws is set at agent L, and fires without any higher priority
candidate pairs becoming validated. As such, agent L now decides to
send an updated offer to promote the peer-derived candidate to active.
This offer (message 77) looks like:
v=0
o=jdoe 2890844526 2890842808 IN IP4 $L-PRIV-1.IP
s=
c=IN IP4 $NAT-PUB-5.IP
t=0 0
m=audio $NAT-PUB-5.PORT RTP/AVP 0
a=rtpmap:0 PCMU/8000
a=rtcp:$NAT-PUB-6.PORT
a=remote-candidate:R1
a=candidate:$L1 1 $L-PASS1 UDP 1.0 $L-PRIV-1.IP $L-PRIV-1.PORT
a=candidate:$L1 2 $L-PASS2 UDP 1.0 $L-PRIV-2.IP $L-PRIV-2.PORT
There are several important things to note in this offer. Firstly,
note how the m/c-line now contains NAT-PUB-5 and NAT-PUB-6, the
peer-derived transport addresses agent L learned through ICE processing.
Secondly, note how there remains a candidate encoded into the
a=candidate attributes. This is candidate L1, NOT candidate L1R1.
Recall that the peer-derived candidates are never encoded into the
SDP. Rather, their generating candidate is encoded. This will cause
keepalives to take place for the generating candidate if valid
(though it is not) and any of its derived candidates, which is what we
want. Finally, notice the inclusion of the a=remote-candidate
attribute. Since agent L doesn't know whether agent R received
messages 72 or 76, it doesn't know whether the state of the candidate
is Recv-Valid or Valid at agent R. So, it has to tell agent R to use
the candidate anyway, in case it is only Recv-Valid.
The answer generated by agent R looks like:
v=0
o=bob 2808844564 2808844565 IN IP4 $R-PUB-1.IP
s=
c=IN IP4 $R-PUB-1.IP
t=0 0
m=audio $R-PUB-1.PORT RTP/AVP 0
a=rtpmap:0 PCMU/8000
a=rtcp:$R-PUB-2.PORT
a=candidate:$R1 1 $R-PASS1 UDP 1.0 $R-PUB-1.IP $R-PUB-1.PORT
a=candidate:$R1 2 $R-PASS2 UDP 1.0 $R-PUB-2.IP $R-PUB-2.PORT
With this, media can now flow directly between endpoints. The
removal of the TURN-based candidates from the offer/answer exchange
will cause the TURN allocations to be removed.
12. Grammar
This specification defines two new SDP attributes - the "candidate"
and "remote-candidate" attributes.
The candidate attribute MUST be present within a media block of the
SDP. It contains a transport address for a candidate that can be
used for connectivity checks. There may be multiple candidate
attributes in a media block.
The syntax of this attribute is defined using Augmented BNF as
defined in RFC 2234 [11]:
draft-ietf-mmusic-ice-05:
                        transport SP
                        qvalue SP      ;qvalue from RFC 3261
                        addr SP
                        port SP
                        ;addr, port from RFC 2327
transport             = "UDP" / "TCP" / transport-extension
transport-extension   = token
candidate-id          = 1*DIGIT
id                    = non-ws-string
draft-ietf-mmusic-ice-06:
candidate-attribute   = "candidate" ":" candidate-id SP component-id SP
                        password SP
                        transport SP
                        qvalue SP      ;qvalue from RFC 3261
                        addr SP        ;addr from RFC 3266
                        port           ;port from RFC 2327
                        *(SP extension-att-name SP
                             extension-att-value)
transport             = "UDP" / transport-extension
transport-extension   = token
candidate-id          = 1*base64-char
password              = 1*base64-char
base64-char           = ALPHANUM / DIGIT / "+" / "/"
component-id          = 1*DIGIT
extension-att-name    = token
extension-att-value   = token
draft-ietf-mmusic-ice-05:
The candidate-id is used to group together the transport addresses
for a particular candidate. It MUST be a positive integer whose
value is less than (2^31 - 1). It MUST have the same value for all
transport addresses within the same candidate. It MUST have a
different value for transport addresses within different candidates
for the same media stream. The tid production contains an
identifier, chosen with 128 bits of randomness, that identifies the
transport address. The tid of a pair of transport addresses is
combined to form the username and password of a STUN request from one
transport address to another. The transport production indicates the
transport protocol for the candidate. This can be either UDP or TCP.
Extensibility is provided to allow for future transport protocols to
be used with ICE, such as the Datagram Congestion Control Protocol
(DCCP) [26]. The unicast-address production is from RFC 2327, and
contains the IPv4 or IPv6 address of the candidate. The port
production contains its port.
12. Security Considerations
There are numerous threats in a system using ICE. This section
overviews these threats and discusses how they are mitigated.
STUN itself introduces many security considerations, which receive an
extensive treatment in RFC 3489. STUN is used within ICE in two ways
- one, as a technique for address gathering, and two, as a
peer-to-peer connectivity check. All of the security considerations of
RFC 3489 apply directly to the former usage. However, the latter usage,
as a peer-to-peer connectivity check, is sufficiently different that
a discussion of its security considerations is appropriate.
It remains the case that many attacks are rooted in a single
primitive - an attacker attempts to inject a STUN response with an
invalid MAPPED-ADDRESS attribute. In the usages of STUN described in
RFC 3489, this injection can occur as a result of compromises of STUN
servers, attacks on the DNS, rogue NATs, injection of faked responses
coupled with a DoS attack, and replaying modified requests. With
peer-to-peer STUN, compromises of STUN servers are not much of a
concern, since the STUN servers are embedded in endpoints and
distributed throughout the network. Thus, compromising the STUN
server is equivalent to compromising the endpoint, and if that
happens, far more problematic attacks are possible than those against
ICE. Similarly, DNS attacks are irrelevant since STUN servers are
not discovered via DNS; they are signaled via SIP. Rogue NATs,
injection of fake responses and relaying modified requests all can be
handled in ICE with the countermeasures discussed below.
Consider an attacker that intercepts a STUN packet used for
connectivity checks, and replays it using its own source address. If
successful, this would fool an endpoint into thinking that this faked
source address was a valid destination for media (recall that the
source transport address of received STUN packets is used as a
potential candidate address). However, the recipient of the replayed
packet will not just send media to that candidate. It will verify it
with a STUN connectivity check. This check will be sent to that
faked source address, and if there is no answer, the address will not
be used. The attacker cannot answer the STUN request without access
to the username and password, which are exchanged as part of the
signaling. Thus, if the signaling is protected as recommended above,
the attacker cannot obtain the username or password.
If an attacker instead intercepts and replays STUN packets used for
the purposes of unilateral allocation, a similar result occurs. The
target of the attack will be fooled into thinking it has a STUN
derived transport address that it does not. Its peer will perform a
connectivity check to this address, which will fail. The attacker
cannot force this check to succeed without access to the username and
password, which are protected. Thus, this address will not be used.
In the worst case, an attacker can generate enough traffic so that
none of the valid STUN checks or unilateral allocations succeed.
This would result in a service disruption. However, this attack is
no worse than any pure packet flood disruption attack launched
against any other protocol. These attacks cannot be prevented by any
protocol means.
If an attacker could intercept and modify the contents of the Offer
or Accept messages, they could disrupt the session, divert the media,
and otherwise take control over the session. This attack is
prevented by encryption, authentication and message integrity of the
signaling channel used for ICE.
SIP-based implementations of ICE SHOULD use the sips URI scheme when
transporting SDP with ICE information, and MAY use S/MIME [3].
13. IANA Considerations
This specification defines one new SDP attribute per the procedures
draft-ietf-mmusic-ice-06:
The candidate-id is used to group together the transport addresses
for a particular candidate. It MUST be constructed with at least 128
bits of randomness. It MUST have the same value for all transport
addresses within the same candidate. It MUST have a different value
for transport addresses within different candidates for the same
media stream. The password MUST also be constructed with at least
128 bits of randomness, and MUST differ for each transport address.
Both of these use a syntax that is defined to be equal to the base64
alphabet [4], which allows the candidate-id and password to be
generated by performing a base64 encoding of a randomly generated 128
bit value (note, however, that this does not mean that the
candidate-id or password is base64 decoded when used in STUN
messages). The component-id is a positive integer, which identifies
the specific component of the candidate. It MUST start at 1 and MUST
increment by 1 for each component of a particular candidate.
The addr production is taken from [12], allowing for IPv4 addresses,
IPv6 addresses and FQDNs. The port production is taken from RFC 2327
[7]. The token production is taken from RFC 3261 [3]. The transport
production indicates the transport protocol for the candidate. This
specification only defines UDP. However, extensibility is provided
to allow for future transport protocols to be used with ICE, such as
TCP or the Datagram Congestion Control Protocol (DCCP) [32].
The a=candidate attribute can itself be extended. The grammar allows
for new name/value pairs to be added at the end of the attribute. An
implementation MUST ignore any name/value pairs it doesn't
understand.
The syntax of the "remote-candidate" attribute is defined using
Augmented BNF as defined in RFC 2234 [11]:
remote-candidate-att = "remote-candidate" ":" candidate-id
This attribute MUST be present in an offer when the candidate in the
m/c-line is part of a candidate pair that is in the valid or
partially valid state.
13. Security Considerations
There are several types of attacks possible in an ICE system. This
section considers these attacks and their countermeasures.
13.1 Attacks on Connectivity Checks
An attacker might also attempt to disrupt the STUN-based connectivity
checks. Ultimately, all of these attacks fool an agent into thinking
something incorrect about the results of the connectivity checks.
The possible false conclusions an attacker can try to cause are:
False Invalid An attacker can fool a pair of agents into thinking a
candidate pair is invalid, when it isn't. This can be used to
cause an agent to prefer a different candidate (such as one
injected by the attacker), or to disrupt a call by forcing all
candidates to fail.
False Valid An attacker can fool a pair of agents into thinking a
candidate pair is valid, when it isn't. This can cause an agent
to proceed with a session, but then not be able to receive any
media.
False Peer-Derived Candidate An attacker can cause an agent to
discover a new peer-derived candidate, when it shouldn't have.
This can be used to redirect media streams to a DoS target or to
the attacker, for eavesdropping or other purposes.
False Valid on False Candidate An attacker has already convinced an
agent that there is a candidate with an address that doesn't
actually route to that agent (for example, by injecting a false
peer-derived candidate or false STUN-derived candidate). It must
then launch an attack that forces the agents to believe that this
candidate is valid.
Of the various techniques for creating faked STUN messages described
in RFC 3489, many are not applicable for the connectivity checks.
Compromises of STUN servers are not much of a concern, since the STUN
servers are embedded in endpoints and distributed throughout the
network. Thus, compromising the STUN server is equivalent to
compromising the endpoint, and if that happens, far more problematic
attacks are possible than those against ICE. Similarly, DNS attacks
are irrelevant since STUN servers are not discovered via DNS; they
are signaled via SIP. Injection of fake responses and relaying
modified requests all can be handled in ICE with the countermeasures
discussed below.
To force the false invalid result, the attacker has to wait for the
connectivity check for one of the agents to be sent. When it is, the
attacker needs to inject a fake error response carrying an unrecoverable
response code, such as 600. This attack only needs to be launched
against one of the agents in order to invalidate the candidate pair.
However, since the candidate is, in fact, valid, the original request
may reach the peer agent, and result in a success response. The
attacker needs to force this packet or its response to be dropped,
through a DoS attack, layer 2 network disruption, or other technique.
If it doesn't do this, the success response will also reach the
originator, alerting it to a possible attack. This will cause the
agent to abandon the candidate, which is the desired result in any
case. Fortunately, this attack is mitigated completely through the
STUN message integrity mechanism. The attacker needs to inject a
fake response, and in order for this response to be processed, the
attacker needs the password. If the offer/answer signaling is
secured, the attacker will not have the password.
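The reasoning above can be sketched with a keyed integrity check. RFC 3489 message integrity is based on HMAC-SHA1; this example is deliberately simplified and does not reproduce the STUN wire format, attribute layout or padding rules. The message contents, password values and function names are hypothetical.

```python
import hashlib
import hmac


def integrity_tag(message: bytes, password: bytes) -> bytes:
    # HMAC-SHA1 over the message, keyed with the password from the offer/answer.
    return hmac.new(password, message, hashlib.sha1).digest()


def accept_response(message: bytes, tag: bytes, password: bytes) -> bool:
    # Constant-time comparison of the expected and received tags.
    return hmac.compare_digest(integrity_tag(message, password), tag)


password = b"shared-via-offer-answer"    # known only to the two agents
genuine = b"binding-response-success"
forged = b"binding-error-response-600"   # injected by an attacker

genuine_ok = accept_response(genuine, integrity_tag(genuine, password), password)
forged_ok = accept_response(forged, integrity_tag(forged, b"attacker-guess"), password)
```

Without the password, the attacker's tag does not verify, so the forged error response is not processed.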
Forcing the false valid result works in a similar way. The attacker
needs to wait for the Binding Request from each agent, and inject a
fake success response. The attacker won't need to worry about
disrupting the actual response since, if the candidate is not valid,
it presumably wouldn't be received anyway. However, like the false
invalid attack, this attack is mitigated completely through the STUN
message integrity and offer/answer security techniques.
Forcing the false peer-derived candidate result can be done either
with fake requests or responses, or with replays. We consider the
fake requests and responses case first. It requires the attacker to
send a Binding Request to one agent with a source IP address and port
for the false transport address. In addition, the attacker must wait
for a Binding Request from the other agent, and generate a fake
response with a MAPPED-ADDRESS attribute. This attack is best
launched against a candidate pair that is likely to be invalid, so
the attacker doesn't need to contend with the actual responses to the
real connectivity checks. Like the other attacks described here,
this attack is mitigated by the STUN message integrity mechanisms and
secure offer/answer exchanges.
Forcing the false peer-derived candidate result with packet replays
is different. The attacker waits until one of the agents sends a
Binding Request for one of the transport address pairs. It then
intercepts this request, and replays it towards the other agent with
a faked source IP address. It must also prevent the original request
from reaching the remote agent, either by launching a DoS attack to
cause the packet to be dropped, or forcing it to be dropped using
layer 2 mechanisms. The replayed packet is received at the other
agent, and accepted, since the integrity check passes (the integrity
check cannot and does not cover the source IP address and port). It
is then responded to. This response will contain a MAPPED-ADDRESS
with the false transport address. The response is sent to this false
address. The attacker must then intercept it and relay it towards
the originator.
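The reason the replay is accepted can be shown in a small sketch: the integrity tag is computed over the message bytes only, so the same request verifies regardless of the source it appears to come from, and that source is what gets reflected in MAPPED-ADDRESS. This is a simplified model, not the STUN wire format. The function names, payload and addresses (taken from documentation address ranges) are hypothetical.

```python
import hashlib
import hmac


def tag(payload: bytes, password: bytes) -> bytes:
    return hmac.new(password, payload, hashlib.sha1).digest()


def handle_binding_request(payload, mac, source, password):
    """Return the MAPPED-ADDRESS for an accepted request, or None if rejected."""
    if not hmac.compare_digest(tag(payload, password), mac):
        return None
    # The observed source address and port are not covered by the tag;
    # they are simply reflected back to the sender.
    return source


password = b"shared-via-offer-answer"
payload = b"binding-request-bytes"
mac = tag(payload, password)

genuine = handle_binding_request(payload, mac, ("192.0.2.10", 5000), password)
replayed = handle_binding_request(payload, mac, ("203.0.113.66", 6000), password)
tampered = handle_binding_request(payload + b"x", mac, ("203.0.113.66", 6000), password)
```

The replayed request yields a MAPPED-ADDRESS pointing at the faked source, while any change to the message bytes themselves is rejected.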
The other agent will then initiate a connectivity check towards that
transport address. This validation needs to succeed. This requires
the attacker to force a false valid on a false candidate. Injection
of fake requests or responses to achieve this goal is prevented using
the integrity mechanisms of STUN and the offer/answer exchange.
Thus, this attack can only be launched through replays. To do that,
the attacker must intercept the Binding Request towards this false
transport address, and replay it towards the other agent. Then, it
must intercept the response and replay that back as well.
This attack is very hard to launch unless the attacker themself is
identified by the fake transport address. This is because it
requires the attacker to intercept and replay packets sent by two
different hosts. If both agents are on different networks (for
example, across the public Internet), this attack can be hard to
coordinate, since it needs to occur against two different endpoints
on different parts of the network at the same time.
If the attacker themself is identified by the fake transport address,
the attack is easier to coordinate. However, if SRTP is used [25],
the attacker will not be able to play the media packets; they will
only be able to discard them, effectively disabling the media stream
for the call. However, this attack requires the attacker to disrupt
packets in order to block the connectivity check from reaching the
target. In that case, if the goal is to disrupt the media stream,
it's much easier to just disrupt it with the same mechanism, rather
than attack ICE.
13.2 Attacks on Address Gathering
ICE endpoints make use of STUN for gathering addresses from a STUN
server in the network. This corresponds to the binding
acquisition use case discussed in Section 10.1 of RFC 3489. As a
consequence, the attacks against STUN itself that are described in
Section 12 of RFC 3489 can still be used against the STUN address
gathering operations that occur in ICE.
However, the additional mechanisms provided by ICE actually
counteract such attacks, making binding acquisition with STUN more
secure when combined with ICE than without ICE.
Consider an attacker that is able to provide an agent with a faked
MAPPED-ADDRESS in a STUN Binding Response that is used for address
gathering. This is the primary attack primitive described in Section
12 of RFC 3489. This address will be used as a STUN derived
candidate in the ICE exchange. For this candidate to actually be
used for media, the attacker must also attack the connectivity
checks, and in particular, force a false valid on a false candidate.
This attack is very hard to launch if the false address identifies a
third party, and is prevented by SRTP if it identifies the attacker
themself.
If the attacker elects not to attack the connectivity checks, the
worst it can do is prevent the STUN-derived address from being used.
However, if the peer agent has at least one address that is reachable
by the agent under attack, the STUN connectivity checks themselves
will provide a STUN-derived address that can be used for the exchange
of media. Peer-derived candidates are preferred over the candidate
they are generated from for this reason. As such, an attack solely
on the STUN address gathering will normally have no impact on a call
at all.
13.3 Attacks on the Offer/Answer Exchanges
An attacker that can modify or disrupt the offer/answer exchanges