--- 1/draft-ietf-mmusic-ice-07.txt 2006-03-31 02:12:18.000000000 +0200 +++ 2/draft-ietf-mmusic-ice-08.txt 2006-03-31 02:12:18.000000000 +0200 @@ -1,18 +1,18 @@ MMUSIC J. Rosenberg Internet-Draft Cisco Systems -Expires: September 7, 2006 March 6, 2006 +Expires: September 30, 2006 March 29, 2006 Interactive Connectivity Establishment (ICE): A Methodology for Network Address Translator (NAT) Traversal for Offer/Answer Protocols - draft-ietf-mmusic-ice-07 + draft-ietf-mmusic-ice-08 Status of this Memo By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that @@ -23,21 +23,21 @@ and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. - This Internet-Draft will expire on September 7, 2006. + This Internet-Draft will expire on September 30, 2006. Copyright Notice Copyright (C) The Internet Society (2006). Abstract This document describes a protocol for Network Address Translator (NAT) traversal for multimedia session signaling protocols based on the offer/answer model, such as the Session Initiation Protocol @@ -49,67 +49,68 @@ Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . 4 3. Overview of ICE . . . . . . . . . . . . . . . . . . . . . . 8 4. Sending the Initial Offer . . . . . . . . . . . . . . . . . 11 5. 
Receipt of the Offer and Generation of the Answer . . . . . 11 6. Processing the Answer . . . . . . . . . . . . . . . . . . . 12 7. Common Procedures . . . . . . . . . . . . . . . . . . . . . 12 7.1 Gathering Candidates . . . . . . . . . . . . . . . . . . . 12 - 7.2 Prioritizing the Candidates and Choosing an Active One . . 16 - 7.3 Encoding Candidates into SDP . . . . . . . . . . . . . . . 18 - 7.4 Forming Candidate Pairs . . . . . . . . . . . . . . . . . 21 - 7.5 Ordering the Candidate Pairs . . . . . . . . . . . . . . . 23 - 7.6 Performing the Connectivity Checks . . . . . . . . . . . . 26 - 7.7 Sending a Binding Request for Connectivity Checks . . . . 30 - 7.8 Receiving a Binding Request for Connectivity Checks . . . 31 - 7.9 Promoting a Candidate to Active . . . . . . . . . . . . . 33 - 7.10 Learning New Candidates from Connectivity Checks . . . . 34 - 7.10.1 On Receipt of a Binding Request . . . . . . . . . . 34 - 7.10.2 On Receipt of a Binding Response . . . . . . . . . . 38 - 7.11 Subsequent Offer/Answer Exchanges . . . . . . . . . . . 39 - 7.11.1 Sending of a Subsequent Offer . . . . . . . . . . . 40 - 7.11.2 Receiving the Offer and Sending an Answer . . . . . 42 - 7.11.3 Receiving the Answer . . . . . . . . . . . . . . . . 45 - 7.12 Binding Keepalives . . . . . . . . . . . . . . . . . . . 45 - 7.13 Sending Media . . . . . . . . . . . . . . . . . . . . . 46 - 8. Guidelines for Usage with SIP . . . . . . . . . . . . . . . 49 - 9. Interactions with Forking . . . . . . . . . . . . . . . . . 51 - 10. Interactions with Preconditions . . . . . . . . . . . . . . 51 - 11. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 51 - 11.1 Basic Example . . . . . . . . . . . . . . . . . . . . . 53 - 11.2 Advanced Example . . . . . . . . . . . . . . . . . . . . 57 - 12. Grammar . . . . . . . . . . . . . . . . . . . . . . . . . . 77 - 13. Security Considerations . . . . . . . . . . . . . . . . . . 79 - 13.1 Attacks on Connectivity Checks . . . . . . . . . . . . . 
79 - 13.2 Attacks on Address Gathering . . . . . . . . . . . . . . 81 - 13.3 Attacks on the Offer/Answer Exchanges . . . . . . . . . 82 - 13.4 Insider Attacks . . . . . . . . . . . . . . . . . . . . 82 - 13.4.1 The Voice Hammer Attack . . . . . . . . . . . . . . 82 - 13.4.2 STUN Amplification Attack . . . . . . . . . . . . . 83 - 14. IANA Considerations . . . . . . . . . . . . . . . . . . . . 83 - 14.1 candidate Attribute . . . . . . . . . . . . . . . . . . 83 - 14.2 remote-candidate Attribute . . . . . . . . . . . . . . . 84 - 14.3 ice-pwd Attribute . . . . . . . . . . . . . . . . . . . 84 - 15. IAB Considerations . . . . . . . . . . . . . . . . . . . . . 85 - 15.1 Problem Definition . . . . . . . . . . . . . . . . . . . 85 - 15.2 Exit Strategy . . . . . . . . . . . . . . . . . . . . . 86 - 15.3 Brittleness Introduced by ICE . . . . . . . . . . . . . 86 - 15.4 Requirements for a Long Term Solution . . . . . . . . . 87 - 15.5 Issues with Existing NAPT Boxes . . . . . . . . . . . . 87 - 16. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 88 - 17. References . . . . . . . . . . . . . . . . . . . . . . . . . 88 - 17.1 Normative References . . . . . . . . . . . . . . . . . . 88 - 17.2 Informative References . . . . . . . . . . . . . . . . . 89 - Author's Address . . . . . . . . . . . . . . . . . . . . . . 91 - Intellectual Property and Copyright Statements . . . . . . . 92 + 7.2 Prioritizing the Candidates and Choosing an Active One . . 18 + 7.3 Encoding Candidates into SDP . . . . . . . . . . . . . . . 20 + 7.4 Forming Candidate Pairs . . . . . . . . . . . . . . . . . 23 + 7.5 Ordering the Candidate Pairs . . . . . . . . . . . . . . . 25 + 7.6 Performing the Connectivity Checks . . . . . . . . . . . . 28 + 7.7 Sending a Binding Request for Connectivity Checks . . . . 32 + 7.8 Receiving a Binding Request for Connectivity Checks . . . 33 + 7.9 Promoting a Candidate to Active . . . . . . . . . . . . . 
35 + 7.10 Learning New Candidates from Connectivity Checks . . . . 36 + 7.10.1 On Receipt of a Binding Request . . . . . . . . . . 36 + 7.10.2 On Receipt of a Binding Response . . . . . . . . . . 40 + 7.11 Subsequent Offer/Answer Exchanges . . . . . . . . . . . 42 + 7.11.1 Sending of a Subsequent Offer . . . . . . . . . . . 42 + 7.11.2 Receiving the Offer and Sending an Answer . . . . . 45 + 7.11.3 Receiving the Answer . . . . . . . . . . . . . . . . 47 + 7.12 Binding Keepalives . . . . . . . . . . . . . . . . . . . 48 + 7.13 Sending Media . . . . . . . . . . . . . . . . . . . . . 49 + 7.14 Receiving Media . . . . . . . . . . . . . . . . . . . . 51 + 8. Guidelines for Usage with SIP . . . . . . . . . . . . . . . 52 + 9. Interactions with Forking . . . . . . . . . . . . . . . . . 54 + 10. Interactions with Preconditions . . . . . . . . . . . . . . 54 + 11. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 55 + 11.1 Basic Example . . . . . . . . . . . . . . . . . . . . . 56 + 11.2 Advanced Example . . . . . . . . . . . . . . . . . . . . 60 + 12. Grammar . . . . . . . . . . . . . . . . . . . . . . . . . . 80 + 13. Security Considerations . . . . . . . . . . . . . . . . . . 82 + 13.1 Attacks on Connectivity Checks . . . . . . . . . . . . . 82 + 13.2 Attacks on Address Gathering . . . . . . . . . . . . . . 85 + 13.3 Attacks on the Offer/Answer Exchanges . . . . . . . . . 86 + 13.4 Insider Attacks . . . . . . . . . . . . . . . . . . . . 86 + 13.4.1 The Voice Hammer Attack . . . . . . . . . . . . . . 86 + 13.4.2 STUN Amplification Attack . . . . . . . . . . . . . 86 + 14. IANA Considerations . . . . . . . . . . . . . . . . . . . . 87 + 14.1 candidate Attribute . . . . . . . . . . . . . . . . . . 87 + 14.2 remote-candidate Attribute . . . . . . . . . . . . . . . 87 + 14.3 ice-pwd Attribute . . . . . . . . . . . . . . . . . . . 88 + 15. IAB Considerations . . . . . . . . . . . . . . . . . . . . . 88 + 15.1 Problem Definition . . . . . . . . . . . . . . . . . . . 
89 + 15.2 Exit Strategy . . . . . . . . . . . . . . . . . . . . . 89 + 15.3 Brittleness Introduced by ICE . . . . . . . . . . . . . 90 + 15.4 Requirements for a Long Term Solution . . . . . . . . . 91 + 15.5 Issues with Existing NAPT Boxes . . . . . . . . . . . . 91 + 16. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 91 + 17. References . . . . . . . . . . . . . . . . . . . . . . . . . 92 + 17.1 Normative References . . . . . . . . . . . . . . . . . . 92 + 17.2 Informative References . . . . . . . . . . . . . . . . . 93 + Author's Address . . . . . . . . . . . . . . . . . . . . . . 94 + Intellectual Property and Copyright Statements . . . . . . . 96 1. Introduction RFC 3264 [4] defines a two-phase exchange of Session Description Protocol (SDP) messages [5] for the purposes of establishment of multimedia sessions. This offer/answer mechanism is used by protocols such as the Session Initiation Protocol (SIP) [2]. Protocols using offer/answer are difficult to operate through Network Address Translators (NAT). Because their purpose is to establish a @@ -178,23 +179,23 @@ address are received on the socket bound to its associated local transport address. Derived addresses are obtained using protocols like STUN, and more generally, any UNSAF protocol [22]. Reflexive Transport Address: As defined in [13], a transport address learned by a client which identifies that client as seen by another host on an IP network, typically a STUN server. When there is an intervening NAT between the client and the other host, the reflexive transport address represents the binding allocated to the client on the public side of the NAT. Reflexive transport - addresses are learned from the MAPPED-ADDRESS attribute in STUN - Binding Responses and Allocate Responses [14], and are a type of - derived transport address. + addresses are learned from the XOR-MAPPED-ADDRESS attribute in + STUN Binding Responses and Allocate Responses [14], and are a type + of derived transport address. 
Server Reflexive Transport Address: A server reflexive transport address is a reflexive address that is reflected off of a server, distinct from the peer, whose address is configured or learned by the client prior to an offer/answer exchange. Peer Reflexive Transport Address: A peer reflexive transport address is a reflexive address that is reflected off of the peer. Peer reflexive transport addresses are learned by connectivity checks. @@ -403,29 +404,29 @@ it receives one from its peer. As soon as the active candidate has been verified by the STUN checks, media can begin to flow. Once a higher priority candidate has been verified by the offerer, it ceases additional connectivity checks, begins using that candidate for media, and sends an updated offer which promotes this higher priority candidate to the m/c-line. That candidate is also listed in a=candidate attributes, resulting in periodic STUN keepalives through the duration of the media session. If an agent receives a STUN connectivity check with a new source IP - address and port, or a response to such a check with a new IP address - and port indicated in the MAPPED-ADDRESS attribute, this new address - might be a viable candidate for the receipt of media. This happens - when there is a NAT with an address dependent or address and port - dependent mapping property [37] between the agents. In such a case, - the agents algorithmically construct a new candidate. Like other - candidates, connectivity checks begin for it, and if they succeed, - its transport addresses can be used for receipt of media by promoting - it to the m/c-line. + address and port, or a response to such a check with a new reflexive + transport address (obtained from the XOR-MAPPED-ADDRESS attribute), + this new address might be a viable candidate for the receipt of + media. This happens when there is a NAT with an address dependent or + address and port dependent mapping property [37] between the agents. 
+ In such a case, the agents algorithmically construct a new candidate. + Like other candidates, connectivity checks begin for it, and if they + succeed, its transport addresses can be used for receipt of media by + promoting it to the m/c-line. The gathering of addresses and connectivity checks take time. As a consequence, in order to have minimal impact on the call setup time or post-pickup delay for SIP, these offer/answer exchanges and checks happen while the call is ringing. 4. Sending the Initial Offer When an agent wishes to begin a session by sending an initial offer, it starts by gathering transport addresses, as described in @@ -472,23 +473,23 @@ Transmission of media is performed according to the procedures in Section 7.13. 6. Processing the Answer There are two possible cases for processing of the answer. If the answerer did not support ICE, the answer will not contain any a=candidate attributes. As a result, the offerer knows that it cannot perform its connectivity checks. In this case, it proceeds - with normal media processing as if ICE was not in use. The - procedures for sending media, described in Section 7.13, MUST be - followed however. + with normal media processing as if ICE was not in use. However, it + SHOULD send media with the symmetric property described in + Section 7.13, and follow the keepalive procedures in Section 7.12. If the answer contains candidates, it implies that the answerer supports ICE. The offerer then forms candidate pairs as described in Section 7.4. These are ordered as described in Section 7.5. The agent then begins connectivity checks, as described in Section 7.6. It follows the logic in Section 7.10 on receipt of Binding Requests and responses to learn new candidates from the checks themselves. Transmission of media is performed according to the procedures in Section 7.13. 
@@ -608,43 +609,42 @@ To obtain both server reflexive and relayed candidates using the STUN Relay Usage, the client takes a local UDP candidate, and for each configured STUN server, produces both candidates. It is anticipated that clients may have a multiplicity of STUN servers configured or discovered in network environments where there are multiple layers of NAT, and that layering is known to the provider of the client. To obtain these candidates, for each configured STUN server, the client initiates an Allocate Request transaction using the procedures of Section 8.1.2 of [14] from each transport address of a particular local candidate. The Allocate Response will provide the client with - its server reflexive transport address in the MAPPED-ADDRESS - attribute and its relayed transport address in the RELAY-ADDRESS - attribute. Once the Allocate requests have given a client a relayed - transport address for all transport addresses in a relayed candidate, - there is no reason for a client to obtain further relayed candidates - through the same STUN server. Thus, if there are other local - candidates from which the client has not yet obtained relayed + its server reflexive transport address (obtained from the XOR-MAPPED- + ADDRESS attribute) and its relayed transport address in the RELAY- + ADDRESS attribute. Once the Allocate requests have given a client a + relayed transport address for all transport addresses in a relayed + candidate, there is no reason for a client to obtain further relayed + candidates through the same STUN server. Thus, if there are other + local candidates from which the client has not yet obtained relayed transport address, the client SHOULD NOT bother to obtain them. Instead, it SHOULD use the STUN Binding Discovery usage and obtain just server reflexive addresses from that STUN server. The order in which local candidates are tried against the STUN server to obtain relayed candidates is a matter of local policy. 
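The gathering policy in this passage can be sketched roughly as follows. This is a simplified illustration, not the draft's algorithm: it assumes the first local candidate in the list is the one tried for relaying (the draft leaves that order to local policy), that its Allocate transactions succeed for every component, and that candidates and servers are represented by plain strings. The function name `plan_gathering` is hypothetical.

```python
from typing import List, Tuple

def plan_gathering(local_candidates: List[str],
                   stun_servers: List[str]) -> List[Tuple[str, str, str]]:
    """Return (server, local candidate, usage) triples for address gathering.

    Assumption for this sketch: the first local candidate is used with the
    STUN Relay Usage (Allocate), which yields both a server reflexive and a
    relayed address. Every other local candidate uses the Binding Discovery
    usage and yields only a server reflexive address from that server.
    """
    plan = []
    for server in stun_servers:
        for index, local in enumerate(local_candidates):
            usage = "allocate" if index == 0 else "binding"
            plan.append((server, local, usage))
    return plan
```

With two local candidates and one server, the plan contains one Allocate transaction and one Binding transaction, so only a single relayed candidate is requested from that server.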
- To obtain server reflexice candidates using the STUN Binding + To obtain server reflexive candidates using the STUN Binding Discovery usage, the client takes a local UDP candidate, and for each configured STUN server, produces a server reflexive candidate. To produce the server reflexive candidate from the local candidate, it - follows the procedures of Section XX of [13] for each local transport - address in the local candidate. The Binding Response will provide - the client with its server reflexive transport address in the MAPPED- - ADDRESS attribute. If the client had K local candidates, this will - produce S*K server reflexive candidates, where S is the number of - STUN servers. + follows the procedures of Section 12.2 of [13] for each local + transport address in the local candidate. The Binding Response will + provide the client with its server reflexive transport address. If + the client had K local candidates, this will produce S*K server + reflexive candidates, where S is the number of STUN servers. Since a client will pace its STUN transactions (both Binding and Allocate requests) at a total rate of one new transaction every Ta seconds, it will take a certain amount of time to complete the address gathering phase. It is RECOMMENDED that implementations have a configurable upper bound on the total amount of time allotted to address gathering. Any transactions not completed at that point SHOULD be abandoned, but MAY continue and be used in an updated offer once they complete. A default value of 5s is RECOMMENDED. Since the total number of allocations that could be done (based on the number @@ -658,30 +658,80 @@ Once the allocations are complete, any redundant candidates are discarded. Candidate A is redundant with candidate B if the transport addresses for each component of each component match, and each component of their associated local candidates match. For example, consider a set of candidates with a single component. 
One candidate is a local candidate, and its one component has a transport address of 10.0.1.1:4458. A reflexive transport address is derived from this local transport address, producing a 10.0.1.1:4458. These two candidates are identical, and also have identical associated - local transport addresses, so they are redundant. However, in a more - complicated case, consider a multi-homed host, with one interface at - 192.168.1.1 and another at 10.0.1.1. The 192.168 network is natted, - with its "public" side in another net-10 private network. The client - obtains two local candidates, A and B, with transport addresses of - 192.168.1.1:2376 and 10.0.1.1:7266 respectively. A server reflexive - transport address is derived from A through a STUN query, and it - happens to produce 10.0.1.1:7266. Call this candidate C. Candidate C - is not redundant with candidate B, since they have different - associated local transport addresses. + local transport addresses, so they are redundant. + + +----------+ + | STUN Srvr| + +----------+ + | + | + ----- + // \\ + | | + | B:net10 | + | | + \\ // + ----- + | + | + +----------+ + | NAT | + +----------+ + | + | + ----- + // \\ + | A | + |192.168/16 | + | | + \\ // + ----- + | + | + |192.168.1.1 ----- + +----------+ // \\ +----------+ + | | | | | | + | Offerer |---------| C:net10 |---------| Answerer | + | |10.0.1.1 | | 10.0.1.2 | | + +----------+ \\ // +----------+ + ----- + + Figure 2 + + Consider the more complicated case of Figure 2. In this case, the + offerer is multi-homed. It has one interface, 10.0.1.1, on network + C, which is a net 10 private network. The Answerer is on this same + network. The offerer is also connected to network A, which is + 192.168/16. The offerer has an interface of 192.168.1.1 on this + network. There is a NAT on this network, natting into network B, + which is another net10 private network, but not connected to network + C. There is a STUN server on network B. 
+ + The offerer obtains local transport address on its interface on + network C (10.0.1.1:2498) and a local transport address on its + interface on network A (192.168.1.1:3344). It performs a STUN query + to its configured STUN server from 192.168.1.1:3344. This query + passes through the NAT, which happens to assign the binding 10.0.1.1: + 2498. The STUN server reflects this in the STUN Binding Response. + Now, the offerer has obtained a candidate with a transport address it + already has (10.0.1.1:2498), but from a new interface. It therefore + keeps it. When it performs its connectivity checks, the offerer will + end up sending packets from both interfaces, and those sent from its + interface on network C will succeed. 7.2 Prioritizing the Candidates and Choosing an Active One The prioritization process takes the set of candidates and associates each with a priority. This priority reflects the desire that the agent has to receive media at that candidate, and is assigned as a value from 0 to 1 (1 being most preferred). Priorities are ordinal, so that their significance is only meaningful relative to other candidates from that agent for a particular media stream. Candidates MAY have the same priority. However, it is RECOMMENDED that each @@ -824,30 +874,29 @@ The transport, addr and port of the a=candidate attribute (all defined in Section 12) are set to the transport protocol, unicast address and port of the transport address. A Fully Qualified Domain Name (FQDN) for a host MAY be used in place of a unicast address. In that case, when receiving an offer or answer containing an FQDN in an a=candidate attribute, the FQDN is looked up in the DNS using an A or AAAA record, and the resulting IP address is used for the remainder of ICE processing. The qvalue is set to the priority of the candidate, and MUST be the same for all components of the candidate. - All of the candidates share a password that is used for securing the - STUN connectivity checks. 
This password MUST be chosen randomly with - 128 bits of randomness (though it can be longer than 128 bits). This + All of the candidates for a media stream share a password that is + used for securing the STUN connectivity checks. Furthermore, the + password for candidates for different media streams MAY be the same, + or MAY be different. This password MUST be chosen randomly with 128 + bits of randomness (though it can be longer than 128 bits). This password is contained in the a=ice-pwd attribute, present as a - session level attribute. A new password MUST be selected for each - new session, and MUST be present with the same value in all - subsequent offers and answers from the agent. The converse is true; - if a new offer is generated as part of a new multimedia session, a - new password MUST be used even if the transport address from a - previous session was being recycled. + session or media level attribute. New passwords MUST be selected for + each new session, even if the transport address from a previous + session was being recycled. The combination of candidate-id and component-id uniquely identify each transport address. As a consequence, each transport address has a unique identifier, called the tid. The tid is formed by concatenating the candidate-id with the component-id, separated by the colon (":"). The tid is not explicitly encoded in the SDP; it is derived from the candidate-id and component-id, which are present in the SDP. The usage of the colon as a separator allows the candidate-id and component-id to be extracted from the tid, since the colon is not a valid character for the candidate-id. @@ -941,21 +990,21 @@ themselves are paired up such that transport addresses with the same component ID are combined to form a transport address pair. Returning to the previous example, for each of the 8 candidate pairs, there would be two transport address pairs - one for RTP, and one for RTCP. 
If one candidate has more components than the other, those extra components will not be part of a transport address pair, won't be validated, and will effectively be treated as if they weren't included in the candidate pair in the first place. The relationship between a candidate, candidate pair, transport - address, transport address pair and component are shown in Figure 2. + address, transport address pair and component are shown in Figure 3. This figure shows the relationships as seen by the agent that owns the candidate with candidate ID "L". This candidate has two components with transport addresses A and B respectively. This candidate is called the native candidate, since it is the one owned by the agent in question. The candidate owned by its peer is called the remote candidate. As the figure shows, there is a single candidate pair, and two components in each candidate. The native candidate has a candidate-id of "L", and the remote candidate has a candidate-id of "R". Since the two component-ids are 1 and 2, candidate "L" has two transport addresses with transport address IDs @@ -1000,21 +1049,21 @@ . ............. ............. . . Native Remote . . Candidate Candidate . . id=L id=R . . . . . ............................................... Candidate Pair - Figure 2 + Figure 3 If a candidate pair was created as a consequence of an offer generated by an agent, then that agent is said to be the offerer of that candidate pair and all of its transport address pairs. Similarly, the other agent is said to be the answerer of that candidate pair and all of its transport address pairs. As a consequence, each agent has a particular role, either offerer or answerer, for each transport address pair. This role is important; when a candidate pair is to be promoted to active, the offerer is the one which performs the updated offer. @@ -1130,21 +1179,21 @@ performed by sending peer-to-peer STUN Binding Requests. 
These checks result in a candidate progressing through a state machine that captures the progress of connectivity checks. The specific state machine and the procedures for the connectivity checks are specific to the transport protocol. This specification defines rules for UDP. Extensions to ICE that describe other transport protocols SHOULD describe the state machine and the procedures for connectivity checks. The set of states visited by the offerer and answerer are depicted - graphically in Figure 4 + graphically in Figure 5 | |Start | | V +------------+ | | | | | Waiting |----------------+ @@ -1187,21 +1236,21 @@ ------- | | ------- Send Res +------------+ - | ^ | | | | +-------+ Timer Tr -------- Send Req - Figure 4 + Figure 5 The state machine has six states - waiting, testing, Recv-Valid, Send-Valid, Valid and Invalid. Initially, all transport address pairs start in the waiting state. In this state, the agent waits for one of two events - a chance to send a Binding Request, or receipt of a Binding Request. Since there is an instance of the state machine for each transport address pair, Binding Requests and responses need to be matched to the specific state machine for which they apply. This is done by @@ -1341,31 +1390,38 @@ that reflexive address. For relayed transport addresses, it is sent by using STUN mechanisms to send the request through the STUN relay (using the Send request). Sending the request through the STUN relay server necessarily requires that the request be sent from the client, using the local transport address used to derive the relayed transport address. The Binding Request sent by the agent MUST contain the USERNAME attribute. This attribute MUST be set to the transport address pair ID of the corresponding transport address pair as seen by its peer. 
- Thus, for the first transport address pair in Figure 2, if the agent + Thus, for the first transport address pair in Figure 3, if the agent on the left sends the STUN Binding Request, the USERNAME will have the value R:1:L:1. If the agent on the right sends the STUN Binding Request, the USERNAME will have the value L:1:R:1. To be clear, the USERNAME that is used is NOT the one seen locally, but rather the one as seen by its peer. The request SHOULD contain the MESSAGE- INTEGRITY attribute, computed according to [13]. The key used as input to the HMAC is the password provided by the peer for this remote transport address. This password will be identical for all remote transport addresses for the same media stream. + Note that all ICE implementations are required to be compliant to + [13], as opposed to the older [16]. Consequently, all connectivity + checks will contain the magic cookie in the STUN header, and cause + the STUN server embedded in each ICE implementation to include XOR- + MAPPED-ADDRESS attributes in the response, rather than MAPPED- + ADDRESS. + The STUN transaction will generate either a timeout, or a response. If the response is a 420, 500, or 401, the agent should try again as described in [13] (as mentioned above, it need not wait Ta seconds to try again). Either initially, or after such a retry, the STUN transaction might produce a non-recoverable failure response (error codes 400, 430, 431, or 600) or a failure result inapplicable to this usage of STUN and thus unrecoverable (432, 433). If this happens, an error event is generated into the state machine, and the transport address pair enters the invalid state. @@ -1419,32 +1475,32 @@ Processing of the Binding Request proceeds in two steps. The first is generation of the response, and the second ICE-specific processing. Generation of the response follows the general procedures of [13]. 
The USERNAME is considered valid if one of the candidate IDs sent in an offer or answer is a prefix of the USERNAME (this will always be the case, even for peer reflexive candidates). The password associated with that candidate ID is used to verify the MESSAGE-INTEGRITY attribute, if one was present in the request. If the USERNAME was not valid, the agent generates a 430. Otherwise, - the success response will include the MAPPED-ADDRESS attribute, which - is used for learning new candidates, as described in Section 7.10. - The MAPPED-ADDRESS attribute is populated with the source IP address - and port of the Binding Request. For Binding Requests received over - relayed transport addresses, this MUST be the source IP address and - port of the Binding Request when it arrived at the relay, prior to - forwarding towards the agent. That source transport address will be - present in the REMOTE-ADDRESS attribute of a STUN Data Indication - message, if the Binding Request was delivered through a Data - Indication. If the Binding Request was not encapsulated in a Data - Indication, that source address is equal to the current active - destination for the STUN relay session. + the success response will include the XOR-MAPPED-ADDRESS attribute, + which is used for learning new candidates, as described in + Section 7.10. The XOR-MAPPED-ADDRESS attribute is constructed using + the source IP address and port of the Binding Request. For Binding + Requests received over relayed transport addresses, this MUST be the + source IP address and port of the Binding Request when it arrived at + the relay, prior to forwarding towards the agent. That source + transport address will be present in the REMOTE-ADDRESS attribute of + a STUN Data Indication message, if the Binding Request was delivered + through a Data Indication. If the Binding Request was not + encapsulated in a Data Indication, that source address is equal to + the current active destination for the STUN relay session. 
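As a rough illustration of how an XOR-MAPPED-ADDRESS value could be constructed from the source IP address and port of a Binding Request, the sketch below uses the IPv4 encoding later standardized in RFC 5389: the port is XORed with the upper 16 bits of the magic cookie 0x2112A442, and the address is XORed with the full cookie. The encoding in the revision of [13] current when this draft was written may differ, and the function names are hypothetical.

```python
import ipaddress
import struct
from typing import Tuple

MAGIC_COOKIE = 0x2112A442  # STUN magic cookie value, as given in RFC 5389

def encode_xor_mapped_ipv4(ip: str, port: int) -> bytes:
    """Build the 8-byte attribute value: reserved, family, X-Port, X-Address."""
    x_port = port ^ (MAGIC_COOKIE >> 16)
    x_addr = int(ipaddress.IPv4Address(ip)) ^ MAGIC_COOKIE
    return struct.pack("!BBHI", 0, 0x01, x_port, x_addr)

def decode_xor_mapped_ipv4(value: bytes) -> Tuple[str, int]:
    """Recover the reflexive transport address from the attribute value."""
    _, family, x_port, x_addr = struct.unpack("!BBHI", value)
    if family != 0x01:
        raise ValueError("only the IPv4 family is handled in this sketch")
    port = x_port ^ (MAGIC_COOKIE >> 16)
    ip = str(ipaddress.IPv4Address(x_addr ^ MAGIC_COOKIE))
    return ip, port
```

An agent answering a connectivity check would pass the observed source address to the encoder; the agent receiving the response would run the decoder and compare the result against its known candidates.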
The ICE processing involves changes to the state machine for a transport address pair. This processing cannot be done until the initial offer/answer exchange has completed. As a consequence, if the offerer received a Binding Request that generated a success response, but had not yet received the answer to its offer, it waits for the answer, and when it arrives, then performs the ICE processing. The agent takes the entire contents of the USERNAME, and compares @@ -1551,34 +1606,47 @@ Indication, that source address is equal to the current active destination for the STUN relay session. The comparison of the source IP and port of the Binding Request and the IP address and port of the remote transport address in the matching transport address pair may indicate inequality. In that case, the source IP and port of the Binding Request (and again, for relayed transport address, this refers to the source IP address and port of the packet when it arrived at the relay) are compared to the IP address and ports across the transport address pairs in *all* - remote candidates. If there is still no match, it means that the - source IP and port might represent another valid remote transport - address - a peer derived one. + remote candidates. If there is a match to another remote candidate + (called the alternate remote candidate), this is not a new candidate; + however, the Binding Request has effectively helped validate the + alternate remote candidate. The agent SHOULD select the candidate + pair corresponding to the combination of the alternate remote + candidate and the native candidate from the original matching + candidate pair. A "Get Req" event is passed to the state machine for + that candidate pair. Consequently, if this candidate pair was in the + Waiting state, a connectivity check will be generated for it. 
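The source-address comparison in this part of the text can be sketched as a three-way classification. The data model here is an assumption made for illustration: `RemoteCandidate`, `classify_source`, and the result labels are hypothetical names, and each candidate is reduced to a map from component ID to transport address.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Addr = Tuple[str, int]  # (IP address, port)

@dataclass
class RemoteCandidate:
    candidate_id: str
    addrs: Dict[int, Addr]  # component-id -> transport address

def classify_source(src: Addr,
                    matched: RemoteCandidate,
                    component_id: int,
                    remotes: List[RemoteCandidate]) -> Tuple[str, Optional[str]]:
    """Classify the source address of a received Binding Request.

    Returns ("match", id) when src equals the remote transport address of the
    matching pair, ("alternate", id) when it belongs to a different remote
    candidate, and ("peer-derived", None) when no remote candidate has it.
    """
    if matched.addrs.get(component_id) == src:
        return ("match", matched.candidate_id)
    for cand in remotes:
        if cand is not matched and src in cand.addrs.values():
            return ("alternate", cand.candidate_id)
    return ("peer-derived", None)
```

In the "alternate" case an agent would pass a "Get Req" event to the state machine of the pair formed by the alternate remote candidate and the original native candidate; only the "peer-derived" case leads to constructing a new candidate.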
+ + If, when the source IP and port of the STUN packet, when compared + against all remote candidates, was not a match to any of them, it + means that the source IP and port might represent another valid + remote transport address - a peer derived one. To use it, that address needs to be associated with a candidate (called a peer-derived candidate). In this case, however, the candidate isn't signaled through an offer/answer exchange; it is constructed dynamically from information in the STUN request. Like all other candidates, the peer-derived candidate has a candidate ID. The candidate ID is derived from the candidate IDs of the matching candidate pair. In particular, the candidate ID is constructed by concatenating the remote candidate ID with the native candidate ID (without the colon). The password for the new candidate equals that - of the remote candidate ID in the matching candidate pair. + of the remote candidate ID in the matching candidate pair (note that, + this password would be the same for all remote candidates for the + same media line). On receipt of a STUN Binding Request whose source IP and port don't match the transport address in any remote candidate, the agent constructs the candidate ID that represents the peer reflexive candidate, and checks to see if that candidate exists. It may already exist if it had been constructed as a consequence of a previous application of this logic on receipt of a Binding Request for a different transport address pair of the same candidate pair. If there is not yet a peer reflexive candidate with that candidate ID, the agent creates it, and assigns it the newly computed candidate @@ -1609,21 +1677,21 @@ procedures of Section 7.5, which pair up each remote candidate with each native candidate, this peer reflexive candidate is only paired up with the native candidate from the candidate pair from which it was derived. This creates a new candidate pair, and a set of new transport address pairs. 
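As a hedged illustration of the candidate ID construction above, the following Python sketch builds the ID by concatenating the remote candidate ID with the native candidate ID and creates the candidate only if it does not already exist. The `PeerDerivedCandidate` class, the `table` dictionary, and the function name are assumptions for illustration, not structures defined by the draft.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PeerDerivedCandidate:
    candidate_id: str
    password: str


def learn_from_request(table: Dict[str, PeerDerivedCandidate],
                       remote_id: str,
                       native_id: str,
                       remote_password: str) -> PeerDerivedCandidate:
    """Get or create the candidate learned from a Binding Request."""
    # Remote candidate ID followed by native candidate ID, with no colon.
    candidate_id = remote_id + native_id
    candidate = table.get(candidate_id)
    if candidate is None:
        # The password equals that of the remote candidate in the matching pair.
        candidate = PeerDerivedCandidate(candidate_id, remote_password)
        table[candidate_id] = candidate
    return candidate
```

With remote candidate ID `L1` and native candidate ID `R1`, this yields the ID `L1R1`; a second request for another component of the same pair returns the existing candidate rather than creating a new one.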
Recall that, for each candidate pair, one agent plays the role of offerer, and the other of answerer. For a peer-reflexive candidate, the role is identical to that of its generating candidate. - Figure 5 provides a pictorial representation of the peer reflexive + Figure 6 provides a pictorial representation of the peer reflexive candidate (the one with id=RL) and its pairing with the native candidate with id L. The candidate with ID R is referred to as the generating candidate. The peer reflexive candidate is effectively an alternate for that generating candidate, but is only paired with a specific native candidate. Note that, for a particular generating candidate, there can be many peer derived candidates, up to one for each native candidate. ............. ............. . tid=L:1 . . tid=R:1 . @@ -1650,21 +1718,21 @@ | . . Candidate | . tid=RL:2 . | id=L:2:RL:2 . -- .component +-------------------| D| . id=2 . -- . ............. Remote Candidate id=RL - Figure 5 + Figure 6 The new transport address pairs have a state machine associated with them. The state that is entered, and actions to take as a consequence, are specific to the transport protocol. For UDP, the procedures are defined here. Extensions that define processing for other transport protocols SHOULD describe the behavior. For UDP, the state machine enters the Send-Valid state. Effectively, the Binding Request just received "counts" as a validation in this direction, even though it was formally done for a different candidate @@ -1675,67 +1743,77 @@ and processing follows the logic described in Section 7.6. 7.10.2 On Receipt of a Binding Response The procedures on receipt of a Binding Response are nearly identical to those for receipt of a Binding Request as described above. When a successful STUN Binding Response is received, it will be associated with a matching transport address pair and corresponding candidate pair. This matching is done based on comparison of - candidate IDs. 
The value of the MAPPED-ADDRESS attribute of the - Binding Response are compared to the IP address and port of the - native transport address in the matching transport address pair. - Note that, in this case, we are comparing actual IP addresses and - ports - not tids. These may not match if there was a NAT between the - two agents. If they do not match, the value of the MAPPED-ADDRESS - attribute of the Binding Response are compared to the IP address and - ports across the transport address pairs in *all* native candidates. - If there is still no match, it means that the MAPPED-ADDRESS might - represent another valid native transport address. + candidate IDs. The reflexive transport address from the Binding + Response is compared to the IP address and port of the native + transport address in the matching transport address pair. Note that, + in this case, we are comparing actual IP addresses and ports - not + tids. These may not match if there was a NAT between the two agents. + If they do not match, the reflexive transport address is compared to + the IP address and ports across the transport address pairs in *all* + native candidates. If there is a match to another native candidate + (called the alternate native candidate), this is not a new candidate; + however, the Binding Response has effectively helped validate the + alternate native candidate. The agent SHOULD select the candidate + pair corresponding to the combination of the alternate native + candidate and the remote candidate from the original matching + candidate pair. If the candidate pair is in the Waiting state, it + moves directly to the Recv Valid state. + + If, when the reflexive transport address, when compared against all + native candidates, was not a match to any of them, it means that the + reflexive transport address might represent another valid native + transport address - a peer derived one. To use it, that address needs to be associated with a candidate. 
In this case, however, the candidate isn't signaled through an offer/ answer exchange; it is constructed dynamically from information in the STUN response. Such a candidate is called a peer reflexive candidate. Like all other candidates, the peer reflexive candidate has a candidate ID. The candidate ID is derived from the candidate IDs of the matching candidate pair. In particular, the candidate ID is constructed by concatenating the native candidate ID with the remote candidate ID (without the colon). The password for the new candidate equals that of the native candidate ID in the matching candidate pair. - On receipt of a STUN Binding Response whose MAPPED-ADDRESS didn't - match the transport address in any native candidate, the agent - constructs the candidate ID that represents the peer reflexive - candidate, and checks to see if that candidate exists. It may - already exist if it had been constructed as a consequence of a + On receipt of a STUN Binding Response whose reflexive transport + address didn't match the transport address in any native candidate, + the agent constructs the candidate ID that represents the peer + reflexive candidate, and checks to see if that candidate exists. It + may already exist if it had been constructed as a consequence of a previous application of this logic on receipt of a Binding Response for a different transport address pair of the same candidate pair. If there is not yet a peer derived candidate with that candidate ID, the agent creates it, and assigns it the newly computed candidate ID. The priority of the new candidate MUST be set to the priority of the generating candidate - the native candidate in the matching transport address pair. Note that, at this time, the peer derived candidate has no transport addresses in it. Newly created or not, the agent extracts the component ID from the matching transport address pair, and sees if a transport address with that same component ID exists in the peer reflexive candidate. 
If not (and it shouldn't), the agent adds a transport address to the peer reflexive candidate. This transport address is equal to the - MAPPED-ADDRESS from the STUN Binding Response. It is assigned the - component ID equal to the component ID in the matching transport - address pair. This transport address will have a tid, equal to the - concatenation of the candidate ID for this new candidate, and the - component ID, separated by a colon. + reflexive transport address from the STUN Binding Response. It is + assigned the component ID equal to the component ID in the matching + transport address pair. This transport address will have a tid, + equal to the concatenation of the candidate ID for this new + candidate, and the component ID, separated by a colon. The peer-derived candidate becomes usable once the number of transport addresses in it equals the transport address pair count of candidate pair from which it is derived. Initially, the peer-derived candidate will start with a single transport address. More are added as the connectivity checks for the original candidate pair take place. Once the peer-derived candidate becomes usable, it has to be paired up with remote candidates. However, unlike the procedures of Section 7.5, which pair up each remote candidate with each native candidate, the peer-derived candidate is only paired up with the @@ -1780,21 +1858,24 @@ 7.11.1 Sending of a Subsequent Offer The offer MAY contain a new active candidate in the m/c line. This candidate SHOULD be the native candidate from the highest candidate pair in the candidate pair priority ordered list whose state is Valid. If there are no candidate pairs in this state, the highest one whose state is Send-Valid or Recv-Valid SHOULD be used. If there are no candidate pairs in these states, the candidate pair that is most likely to work with this peer, as described in Section 7.2, SHOULD be used. 
The candidate is encoded into the m/c line in an - updated offer as described in Section 7.3. + updated offer as described in Section 7.3. Note that, while peer- + derived candidates never appear in a=candidate attributes (only their + generating candidates appear there), a peer-derived candidate can + appear in the m/c line if it has been selected for usage for media. If the candidate pair whose native candidate was encoded into the m/c-line was Valid, Send-Valid or Recv-Valid, the agent MUST include an a=remote-candidate attribute into the offer. This attribute MUST contain the candidate ID of the remote candidate in the candidate pair. It is used by the recipient of the offer in selecting its candidate for the answer. The meaning of a=candidate attributes within a subsequent offer have the same meaning as they do in an initial offer. They are a request @@ -1885,41 +1966,44 @@ (including any peer reflexive candidates), local operating system resources for each of the transport addresses in the local candidate SHOULD be de-allocated, as long as it is not using those resources elsewhere. The resources may be in use elsewhere if they were included in an initial offer which generated multiple answers (as can happen with SIP forking). In such a case, a subsequent offer which removes the candidate will not imply its removal with the other branches; each becomes a separate offer/answer relationship. - Subsequent offers MUST contain the a=ice-pwd attribute. This SHOULD - have the same value as in previous offers. However, an agent MAY - change it if, for some reason, the agent believes that the password - may have been compromised. Since the same password is applied across - all transport addresses in all candidates for all media streams, a - change in the password impacts all of them. An agent MUST be + Subsequent offers MUST contain a=ice-pwd attributes that specify the + password for the candidates for each media stream. 
The password for + the candidates for a particular media stream SHOULD have the same + value as in previous offers. However, an agent MAY change it if, for + some reason, the agent believes that the password may have been + compromised. Note that it is permissible to use a session-level + attribute in one offer, and in a subsequent offer, provide the same + password as a media-level attribute. This is not a change in the + password; merely a change in its representation. An agent MUST be prepared to receive connectivity checks that use either the new or old password until Tpw seconds after it receives the answer. Tpw SHOULD be configurable, and SHOULD default to 2 seconds. 7.11.2 Receiving the Offer and Sending an Answer To generate the answer, the answerer has to decide which transport addresses to include in the m/c line, and which to include in candidate attributes. The first step in the process is to look for the a=remote-candidate attribute in the offer. The a=remote-candidate exists to eliminate a race condition between the updated offer and the response to the STUN Binding Request that moved a candidate into the Valid state. This - race condition is shown in Figure 6. On receipt of message 5, agent + race condition is shown in Figure 7. On receipt of message 5, agent A can move its transport address pair state machine into the Valid state. It sends a STUN response to the request (message 6), but this is lost. Agent A proceeds with an updated offer (message 7), which is received at agent B. As far as agent B is concerned, the transport address pair is still in the Send-Valid state. It will move into the Valid state only on receipt of the STUN response in message 10. Thus, upon receipt of the offer, agent B cannot determine which candidate to include in its answer. To eliminate this condition, the identity of the validated candidate is included in the offer itself. 
Note, however, that the answerer will not send media until it has @@ -1941,21 +2025,21 @@ | |Lost | |(7) Offer | | |------------------------------------------>| |(8) Answer | | |<------------------------------------------| |(9) STUN Req. | | |<------------------------------------------| |(10) STUN Res. | | |------------------------------------------>| - Figure 6 + Figure 7 If the a=remote-candidate attribute is present, the agent examines the transport addresses in the m/c-line of the offer. It compares these with the transport addresses in the remote candidates of all candidate pairs. If there is at least one match, the agent compares the native candidate ID of each matching pair with the value of the a=remote-candidate attribute. If there is a match, that candidate pair is selected. For each transport address pair in that candidate pair, if the state of the transport address pair is Send-Valid, the agent considers the state to be Valid just for the purpose of @@ -2184,27 +2268,56 @@ the case of a relayed transport address, this means that media packets are sent through the relay server (for STUN relays, this would be using the Send request). For local transport addresses, media is sent from that local transport address. For peer reflexive transport addresses, media is sent from the local transport address used to obtain the reflexive address. ICE has interactions with jitter buffer adaptation mechanisms. An RTP stream can begin using one candidate, and switch to another one. The newer candidate may result in RTP packets taking a different path - through the network - one with different delay characteristics. To - signal to the jitter buffers that this change has happened, it is - RECOMMENDED that, when an agent switches transmission of media from - one candidate pair to another, it sets the RTP marker bit. 
Furthermore, it is RECOMMENDED that, upon receipt of an RTP packet - with the marker bit set, or upon receipt of a packet with a different - source IP address, that the agent re-adjust its jitter buffers. + through the network - one with different delay characteristics. As + discussed below, agents are encouraged to re-adjust jitter buffers + when there are changes in source or destination address. + Furthermore, many audio codecs use the marker bit to signal the + beginning of a talkspurt, for the purposes of jitter buffer + adaptation. For such codecs, it is RECOMMENDED that the sender + change the marker bit when an agent switches transmission of media + from one candidate pair to another. + +7.14 Receiving Media + + ICE implementations MUST be prepared to receive media on a candidate + pair if it is in the role of offerer for that candidate pair, even if + that candidate pair is not currently active. This is a consequence + of the early media mechanism described in the previous section. + + If an agent determines that its peer supports ICE (an offerer knows + this when the answer contains a=candidate attributes), it SHOULD + discard any media packets received on a candidate pair prior to the + candidate pair entering the Send Valid state. This helps eliminate + certain attacks, as discussed in Section 13. + + It is RECOMMENDED that, when an agent receives an RTP packet with a + new source or destination IP address for a particular media stream, + that the agent re-adjust its jitter buffers. + + RFC 3550 [23] describes an algorithm in Section 8.2 for detecting + SSRC collisions and loops. These algorithms are based, in part, on + seeing different source IP addresses and ports with the same SSRC. + However, when ICE is used, such changes will naturally occur as the + media streams switch between candidates. An agent will be able to + determine that a media stream is from the same peer as a consequence + of the STUN exchange that precedes media transmission. 
Thus, if + there is a change in source IP address and port, but the media + packets come from the same peer agent, this SHOULD NOT be treated as + an SSRC collision. 8. Guidelines for Usage with SIP SIP [2] makes use of the offer/answer model, and is one of the primary targets for usage of ICE. SIP allows for offer/answer exchanges to occur in many different combinations of messages, including INVITE/200 OK and 200 OK/ACK. When support for reliable provisional responses (RFC 3262 [11]) and UPDATE (RFC 3311 [27]) are added, additional combinations of messages that can be used for offer/answer exchanges are added. As such, this section provides @@ -2244,25 +2357,27 @@ answer in a provisional response. When reliable provisional responses are not used, the SDP in the provisional response is not formally the answer; the value in the 200 OK is the actual answer. However, RFC 3261 allows for SDP to appear in an unreliable provisional response, in which case its value has to be identical to the value placed in the 200 OK. Thus, we refer to the SDP in the provisional response, even when unreliable, as the answer. To deal with possible losses of the provisional response, it SHOULD be retransmitted until some indication of receipt. This indication can either be through PRACK [11], or through the receipt of a STUN - Binding Request with a correct username and password. Furthermore, - once the answer has been sent, the agent SHOULD begin its - connectivity checks. Once a candidate reaches the Valid or Recv- - Valid state, the UAS has a known-valid path for media packets towards - the UAC. This point is called the connected point in ICE. + Binding Request with a correct username and password. Even if PRACK + is not used, the provisional response SHOULD be retransmitted using + the exponential backoff described in [11]. Furthermore, once the + answer has been sent, the agent SHOULD begin its connectivity checks. 
Once a candidate reaches the Valid or Recv-Valid state, the UAS has a + known-valid path for media packets towards the UAC. This point is + called the connected point in ICE. Once the UAS reaches the connected point, media can be sent from the UAS towards the UAC without any additional delays. However, between the receipt of the INVITE and the connected point, any media that needs to be sent towards the caller (such as SIP early media [29]) cannot be transmitted. For this reason, implementations MAY choose to delay alerting the called party until the connected point is reached. In the case of a PSTN gateway, this would mean that the setup message into the PSTN is delayed until the connected point. Doing this increases the post-dial delay, but has the effect of @@ -2375,28 +2490,28 @@ the IP address of the transport address with mnemonic name "taddr". Similarly, $TADDR.PORT is used to refer to the value of the port of the transport address with mnemonic name "TADDR". In the call flow itself, STUN messages are annotated with several attributes. The "S=" attribute indicates the source transport address of the message. The "D=" attribute indicates the destination transport address of the message. The "MA=" attribute is used in STUN Binding Response messages, STUN Binding Response messages carried in a STUN Send Request or Data Indication, and in an Allocate - Response, and refers to the value of the MAPPED-ADDRESS attribute. - The "RA=" attribute is used in STUN Data Indications, and refers to - the value of the REMOTE-ADDRESS attribute. The "U=" attribute is - used in STUN Requests, and corresponds to the STUN USERNAME. The - "DA=" attribute is used in STUN Send requests, and refers to the - value of the DESTINATION-ADDRESS attribute. The "R=" attribute is - used in Allocate responses, and it indicates the value of the RELAY- - ADDRESS attribute. + Response, and refers to the reflexive transport address derived from + the XOR-MAPPED-ADDRESS attribute. 
The "RA=" attribute is used in + STUN Data Indications, and refers to the value of the REMOTE-ADDRESS + attribute. The "U=" attribute is used in STUN Requests, and + corresponds to the STUN USERNAME. The "DA=" attribute is used in + STUN Send requests, and refers to the value of the DESTINATION- + ADDRESS attribute. The "R=" attribute is used in Allocate responses, + and it indicates the value of the RELAY-ADDRESS attribute. The call flow examples omit STUN authentication operations. 11.1 Basic Example In this example, the NAT has the address and port independent mapping property and the address dependent permission property. Neither agent is using the STUN relay usage, only the binding discovery usage. As a consequence, agent L will end up with two candidates - a local candidate and a server reflexive candidate. Agent R will have @@ -2525,21 +2640,21 @@ | | | | | | | |RTP flows | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Figure 7 + Figure 8 First, agent L obtains a server reflexive transport address for its RTP packets (messages 1-4). Recall that the NAT has the address and port independent mapping property. Here, it creates a binding of NAT-PUB-1 for this UDP request, and this becomes the server reflexive transport address for RTP, the sole component of its server reflexive candidate. With its two candidates, agent L prioritizes them, choosing the local candidate as highest priority, followed by the server reflexive @@ -3204,21 +3319,21 @@ | | | | |(66) Answer | | | |<-------------------------------------------| | | | | | | | | | | | | | | | | | | | | | | | | - Figure 10 + Figure 11 First, agent L obtains both server reflexive and relayed transport addresses for its RTP packets, using a STUN Allocate request, which will provide it with both types of addresses (messages 1-4). Recall that the NAT has the address and port dependent mapping property. 
Here, it creates a binding of NAT-PUB-1 for this UDP request, and this becomes the server reflexive transport address for RTP. The relayed transport address is STUN-PUB-2, allocated by the STUN server. Agent L repeats this process for RTCP (messages 5-8) Ta seconds later, and obtains NAT-PUB-2 as its server reflexive @@ -3315,65 +3431,65 @@ a packet from the peer can be returned). Thus, the STUN server will relay the received STUN request towards agent R (message 18). This is delivered as a STUN Data Indication. Notice how the REMOTE- ADDRESS is STUN-PUB-2; this is important as it will be used to construct the STUN Binding Response. Agent R will receive the Data Indication, and unwrap its contents to find the Binding Request. The state machine for this transport address pair is currently in the Testing state. It therefore moves into the Send-Valid state, and it generates a Binding Response. - However, the MAPPED-ADDRESS in the Binding Response is constructed - using the source IP address and port that were seen by the STUN - server when the Binding Request arrived at STUN-PUB-4, which is the - looped message between messages 17 and 18. This source address is - STUN-PUB-2, which is the value of the REMOTE-ADDRESS attribute in - message 18. Thus, the STUN Binding Response will contain STUN-PUB-2 - in the MAPPED-ADDRESS, and is to be sent to STUN-PUB-2. To send the - response, agent R takes the STUN Binding Response and encapsulates it - in a STUN Send indication, setting the DESTINATION-ADDRESS to STUN- - PUB-2. This is shown in message 19. + However, the XOR-MAPPED-ADDRESS in the Binding Response is + constructed using the source IP address and port that were seen by + the STUN server when the Binding Request arrived at STUN-PUB-4, which + is the looped message between messages 17 and 18. This source + address is STUN-PUB-2, which is the value of the REMOTE-ADDRESS + attribute in message 18. 
Thus, the STUN Binding Response will + contain STUN-PUB-2 in the XOR-MAPPED-ADDRESS, and is to be sent to + STUN-PUB-2. To send the response, agent R takes the STUN Binding + Response and encapsulates it in a STUN Send indication, setting the + DESTINATION-ADDRESS to STUN-PUB-2. This is shown in message 19. The STUN server will receive this Send Indication, and unwrap its contents to find the STUN Binding Response. It sends it to the value of the DESTINATION-ADDRESS attribute, and sends it from the relayed address allocated to R, which is STUN-PUB-4. This, once again, results in a looped message to itself, and it arrives at STUN-PUB-2. Now, however, there is a permission installed for STUN-PUB-4. The STUN server will therefore forward the packet to agent L. To do so, it constructs a STUN Data Indication containing the contents of the packet. It sets the REMOTE-ADDRESS to the source transport address of the request it received (STUN-PUB-4), and forwards it to agent L (message 20). This traverses the NAT (message 21) and arrives at agent L. As a consequence of the receipt of a Binding Response, the state machine for this transport address pair moves to the Recv-Valid - state. The agent also examines the MAPPED-ADDRESS of the STUN - response. It is STUN-PUB-2. This is the same as the native + state. The agent also examines the XOR-MAPPED-ADDRESS of the STUN + response. It indicates STUN-PUB-2. This is the same as the native transport address of this transport address pair, and thus doesn't represent a new transport address that might have been learned. Because of the receipt of message 18, the transport address pair moved from Testing to Send-Valid, causing R to attempt a retransmission of its STUN Binding Request that was lost (the contents of message 15 that were discarded by the STUN server due to lack of permission). This time, however, a permission has been installed and the retransmission will work. 
So, it sends the Binding Request again (message 22, identical to message 15). This is looped by the STUN server to itself again, but this time there is a permission in place when it arrives at STUN-PUB-2. As such, the request is forwarded towards agent L this time, in a STUN Data Indication (message 23). This traverses the NAT (message 24) and arrives at agent L. Agent L extracts the contents of the request, which are a STUN Binding Request. This causes the state machine to move from Recv-Valid to Valid. It generates a STUN Binding Response, - and sets the MAPPED-ADDRESS to the value of the REMOTE-ADDRESS in - message 24 (STUN-PUB-4). This Binding Response is sent to + and sets the XOR-MAPPED-ADDRESS based on the value of the REMOTE- + ADDRESS in message 24 (STUN-PUB-4). This Binding Response is sent to STUN-PUB-4, which is accomplished through a STUN Send Indication (message 25). This Send Indication traverses the NAT (message 26) and is received by the STUN server. Its contents are decapsulated, and sent to STUN-PUB-4, which is again a loop on the same host. This packet is then sent towards agent R in a Data Indication (message 27). The contents of the DATA Indication are extracted, and the agent sees a successful Binding Response. It therefore moves the state machine from the Send-Valid state to the Valid state. At this point, the transport address pair is in the Valid state for both agents. @@ -3422,29 +3538,30 @@ agent L and R-PUB-1 on agent R. This is a local candidate for each agent. To perform the check, agent L sends a STUN Binding Request from L-PRIV-1 to R-PUB-1 (message 47). Note the USERNAME of R1:1:L1:1, which identifies this transport address pair. This traverses the NAT (message 48). Since the NAT has the address and port dependent mapping property, and this is a new destination IP address, the NAT allocates a new transport address on its public side, NAT-PUB-3, and places this in the source IP address and port. This packet arrives at agent R. 
Agent R finds a matching transport address pair in the Waiting state. The state machine transitions to - the Send-Valid state. It sends the Binding response, with a MAPPED- - ADDRESS equal to NAT-PUB-3 (message 49), which traverses the NAT and - arrives at agent L (message 50). Agent R, in addition to sending the - response, will also send a Binding Request. It is important to - remember that this Binding Request is sent to the remote address in - the transport address pair (L-PRIV-1), and NOT to the source IP - address and port of the Binding Request (NAT-PUB-3); that will happen - later. This attempt is shown in message 51. However, since the - L-PRIV-1 is private, the packet is discarded in the network. + the Send-Valid state. It sends the Binding response, with a XOR- + MAPPED-ADDRESS indicating NAT-PUB-3 (message 49), which traverses the + NAT and arrives at agent L (message 50). Agent R, in addition to + sending the response, will also send a Binding Request. It is + important to remember that this Binding Request is sent to the remote + address in the transport address pair (L-PRIV-1), and NOT to the + source IP address and port of the Binding Request (NAT-PUB-3); that + will happen later. This attempt is shown in message 51. However, + since the L-PRIV-1 is private, the packet is discarded in the + network. Now, as a consequence of receiving message 48, agent R will have constructed a peer-derived candidate. The candidate ID for this candidate is L1R1, and it initially contains a single transport address pair, NAT-PUB-3 and R-PUB-1. However, the candidate isn't yet usable until the other component gets added. Similarly, agent L will have constructed the same peer-derived candidate, with the same candidate ID and the same transport address pair. Some Ta seconds after sending message 28, agent R will move to the @@ -3484,22 +3601,22 @@ both into the Recv-Valid state upon receipt of message 56). 
The first of these connectivity checks is for the RTP component, from R-PUB-1 to NAT-PUB-3 (message 57). Note the USERNAME in the STUN Binding Request, L1R1:1:R1:1, which identifies the peer-derived transport address pair. This will successfully traverse the NAT and be delivered to agent L (message 58). The receipt of this request moves the state machine for this transport address pair from Recv- Valid to Valid, and a Binding Response is sent (message 59). This passes through the NAT and arrives at agent R (message 60). This causes its state machine to enter the Valid state as well. The - MAPPED-ADDRESS, R-PUB-1, is not new to agent R and thus does not - result in the creation of a new peer-derived candidate. + reflexive transport address, R-PUB-1, is not new to agent R and thus + does not result in the creation of a new peer-derived candidate. Messages 61 through 64 show the same basic flow for RTCP. Upon receipt of message 64, both transport address pairs are Valid at both agents, causing the peer derived candidate to become valid. Timer Tws is set at agent L, and fires without any higher priority candidate pairs becoming validated. At agent R, media can now be sent on this candidate pair from answerer (agent R) to offerer (agent L). Agent L sends an updated offer to promote the peer-derived candidate to active. This offer (message 65) looks like: @@ -3582,21 +3699,24 @@ for a particular candidate. It MUST be constructed with at least 24 bits of randomness. It MUST have the same value for all transport addresses within the same candidate. It MUST have a different value for transport addresses within different candidates for the same media stream. The candidate-id uses a syntax that is defined to be equal to the base64 alphabet [3], which allows the candidate-id to be generated by performing a base64 encoding of a randomly generated value (note, however, that this does not mean that the candidate-id or password is base64 decoded when used in STUN messages). 
In addition, if content is base64 encoded to generate the candidate-id, - it MUST NOT be padded with '='. The component-id is a positive + it MUST NOT be padded with '='. Section 2.2 of RFC 3548 indicates + that some base64 usages do not require padding, and it requests that + such usages call out that fact. ICE is one such usage. This is + because the data is never decoded. The component-id is a positive integer, which identifies the specific component of the candidate. It MUST start at 1 and MUST increment by 1 for each component of a particular candidate. The addr production is taken from [10], allowing for IPv4 addresses, IPv6 addresses and FQDNs. The port production is taken from RFC 2327 [5]. The token production is taken from RFC 3261 [2]. The transport production indicates the transport protocol for the candidate. This specification only defines UDP. However, extensibility is provided to allow for future transport protocols to be used with ICE, such as @@ -3614,27 +3734,29 @@ This attribute MUST be present in an offer when the candidate in the m/c-line is part of a candidate pair that is in the valid or partially valid state. The syntax of the "ice-pwd" attribute is defined as: ice-pwd-att = "ice-pwd" ":" password password = 1*base64-char - The "ice-pwd" attribute MUST appear at the session-level, and is - consequently shared by all candidates for all media streams within - the session. It MUST have at least 128 bits of randomness. Like the - candidate-ID, its syntax is taken from the base64 alphabet, allowing - the password to be generated from a base64 encoding of a 128-bit - value. In addition, if content is base64 encoded to generate the - candidate-id, it MUST NOT be padded with '='. + The "ice-pwd" attribute can appear at either the session-level or + media-level. When present in both, the value in the media-level + takes precedence.
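The precedence rule for "ice-pwd" can be sketched as follows. This is a minimal, non-normative Python illustration; the function name effective_ice_pwds and the simplified line scanning (it does not parse SDP in full) are assumptions made for this example.

```python
from typing import List, Optional


def effective_ice_pwds(sdp: str) -> List[Optional[str]]:
    """Return the effective ice-pwd for each m-line, in order of appearance."""
    prefix = "a=ice-pwd:"
    session_pwd: Optional[str] = None
    media_pwds: List[Optional[str]] = []
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            # A new media section begins; no media-level value seen yet.
            media_pwds.append(None)
        elif line.startswith(prefix):
            value = line[len(prefix):]
            if media_pwds:
                media_pwds[-1] = value  # media-level value
            else:
                session_pwd = value     # session-level value
    # A media-level value takes precedence over the session-level one.
    return [pwd if pwd is not None else session_pwd for pwd in media_pwds]
```

For a description with a session-level password and two m-lines, only the second of which carries its own "ice-pwd", the function returns the session-level value for the first stream and the media-level value for the second.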
Thus, the value at the session level is + effectively a default that applies to all media streams, unless + overridden by a media-level value. It MUST have at least 128 bits of + randomness. Like the candidate-ID, its syntax is taken from the + base64 alphabet, allowing the password to be generated from a base64 + encoding of a 128-bit value. In addition, if content is base64 + encoded to generate the candidate-id, it MUST NOT be padded with '='. 13. Security Considerations There are several types of attacks possible in an ICE system. This section considers these attacks and their countermeasures. 13.1 Attacks on Connectivity Checks An attacker might attempt to disrupt the STUN-based connectivity checks. Ultimately, all of these attacks fool an agent into thinking @@ -3701,41 +3823,41 @@ it presumably wouldn't be received anyway. However, like the fake invalid attack, this attack is mitigated completely through the STUN message integrity and offer/answer security techniques. Forcing the false peer-derived candidate result can be done either with fake requests or responses, or with replays. We consider the fake requests and responses case first. It requires the attacker to send a Binding Request to one agent with a source IP address and port for the false transport address. In addition, the attacker must wait for a Binding Request from the other agent, and generate a fake - response with a MAPPED-ADDRESS attribute. This attack is best + response with a XOR-MAPPED-ADDRESS attribute. This attack is best launched against a candidate pair that is likely to be invalid, so the attacker doesn't need to contend with the actual responses to the real connectivity checks. Like the other attacks described here, this attack is mitigated by the STUN message integrity mechanisms and secure offer/answer exchanges. Forcing the false peer-derived candidate result with packet replays is different.
The attacker waits until one of the agents sends a Binding Request for one of the transport address pairs. It then intercepts this request, and replays it towards the other agent with a faked source IP address. It must also prevent the original request from reaching the remote agent, either by launching a DoS attack to cause the packet to be dropped, or forcing it to be dropped using layer 2 mechanisms. The replayed packet is received at the other agent, and accepted, since the integrity check passes (the integrity check cannot and does not cover the source IP address and port). It - is then responded to. This response will contain a MAPPED-ADDRESS - with the false transport address. It is passed to this false - address. The attacker must then intercept it and relay it towards - the originator. + is then responded to. This response will contain a XOR-MAPPED- + ADDRESS with the false transport address. It is passed to this + false address. The attacker must then intercept it and relay it + towards the originator. The other agent will then initiate a connectivity check towards that transport address. This validation needs to succeed. This requires the attacker to force a false valid on a false candidate. Injection of fake requests or responses to achieve this goal is prevented using the integrity mechanisms of STUN and the offer/answer exchange. Thus, this attack can only be launched through replays. To do that, the attacker must intercept the Binding Request towards this false transport address, and replay it towards the other agent. Then, it must intercept the response and replay that back as well. @@ -3765,21 +3887,21 @@ acquisition use case discussed in Section 10.1 of [13]. As a consequence, the attacks against STUN itself that are described in Section 12 of [13] can still be used against the STUN address gathering operations that occur in ICE.
However, the additional mechanisms provided by ICE actually counteract such attacks, making binding acquisition with STUN more secure when combined with ICE than without ICE. Consider an attacker which is able to provide an agent with a faked - MAPPED-ADDRESS in a STUN Binding Request that is used for address + XOR-MAPPED-ADDRESS in a STUN Binding Request that is used for address gathering. This is the primary attack primitive described in Section 12 of [13]. This address will be used as a STUN-derived candidate in the ICE exchange. For this candidate to actually be used for media, the attacker must also attack the connectivity checks, and in particular, force a false valid on a false candidate. This attack is very hard to launch if the false address identifies a third party, and is prevented by SRTP if it identifies the attacker themself. If the attacker elects not to attack the connectivity checks, the worst it can do is prevent the STUN-derived address from being used. @@ -3959,56 +4081,54 @@ From RFC 3424, any UNSAF proposal must provide: Description of an exit strategy/transition plan. The better short term fixes are the ones that will naturally see less and less use as the appropriate technology is deployed. ICE itself doesn't easily get phased out. However, it is useful even in a globally connected Internet, to serve as a means for detecting whether a router failure has temporarily disrupted connectivity, for - example. However, what ICE does is help phase out other UNSAF - mechanisms. ICE effectively selects amongst those mechanisms, - prioritizing ones that are better, and deprioritizing ones that are - worse. Local IPv6 addresses can be preferred. As NATs begin to - dissipate as IPv6 is introduced, derived transport addresses from - other UNSAF mechanisms simply never get used, because higher priority - connectivity exists. Therefore, the servers get used less and less, - and can eventually be removed when their usage goes to zero. + example.
ICE also helps prevent certain security attacks which have + nothing to do with NAT. However, what ICE does is help phase out + other UNSAF mechanisms. ICE effectively selects amongst those + mechanisms, prioritizing ones that are better, and deprioritizing + ones that are worse. Local IPv6 addresses can be preferred. As NATs + begin to dissipate as IPv6 is introduced, derived transport addresses + from other UNSAF mechanisms simply never get used, because higher + priority connectivity exists. Therefore, the servers get used less + and less, and can eventually be removed when their usage goes to zero. Indeed, ICE can assist in the transition from IPv4 to IPv6. It can be used to determine whether to use IPv6 or IPv4 when two dual-stack hosts communicate with SIP (IPv6 gets used). It can also allow a network with both 6to4 and native v6 connectivity to determine which address to use when communicating with a peer. 15.3 Brittleness Introduced by ICE From RFC 3424, any UNSAF proposal must provide: Discussion of specific issues that may render systems more "brittle". For example, approaches that involve using data at multiple network layers create more dependencies, increase debugging challenges, and make it harder to transition. ICE actually removes brittleness from existing UNSAF mechanisms. In - particular, traditional STUN (the usage described in [13]) has - several points of brittleness. One of them is the discovery process - which requires an agent to try and classify the type of NAT it is - behind. This process is error-prone. With ICE, that discovery - process is simply not used. Rather than unilaterally assessing the - validity of the address, its validity is dynamically determined by - measuring connectivity to a peer. The process of determining - connectivity is very robust. The only potential problem is that - bilaterally fixed addresses through STUN can expire if traffic does - not keep them alive.
However, that is substantially less brittleness - than the STUN discovery mechanisms. + particular, traditional STUN (as described in [16]) has several + points of brittleness. One of them is the discovery process which + requires an agent to try and classify the type of NAT it is behind. + This process is error-prone. With ICE, that discovery process is + simply not used. Rather than unilaterally assessing the validity of + the address, its validity is dynamically determined by measuring + connectivity to a peer. The process of determining connectivity is + very robust. Another point of brittleness in STUN and any other unilateral mechanism is its absolute reliance on an additional server. ICE makes use of a server for allocating unilateral addresses, but allows agents to directly connect if possible. Therefore, in some cases, the failure of a STUN server would still allow for a call to progress when ICE is used. Another point of brittleness in traditional STUN is that it assumes that the STUN server is on the public Internet. Interestingly, with @@ -4017,20 +4137,24 @@ provided a usable address. The most troubling point of brittleness in traditional STUN is that it doesn't work in all network topologies. In cases where there is a shared NAT between each agent and the STUN server, traditional STUN may not work. With ICE, that restriction can be lifted. Traditional STUN also introduces some security considerations. Fortunately, those security considerations are also mitigated by ICE. + Consequently, ICE serves to repair the brittleness introduced in + other UNSAF mechanisms, and does not introduce any additional + brittleness into the system. + 15.4 Requirements for a Long Term Solution From RFC 3424, any UNSAF proposal must provide: Identify requirements for longer term, sound technical solutions -- contribute to the process of finding the right longer term solution. Our conclusions from STUN remain unchanged.
However, we feel ICE actually helps because we believe it can be part of the long term @@ -4039,29 +4163,41 @@ 15.5 Issues with Existing NAPT Boxes From RFC 3424, any UNSAF proposal must provide: Discussion of the impact of the noted practical issues with existing, deployed NA[P]Ts and experience reports. A number of NAT boxes are now being deployed into the market which try and provide "generic" ALG functionality. These generic ALGs hunt for IP addresses, either in text or binary form within a packet, and - rewrite them if they match a binding. This will interfere with - proper operation of any UNSAF mechanism, including ICE. + rewrite them if they match a binding. This interferes with + traditional STUN. However, the update to STUN [13] uses an encoding + which hides these binary addresses from generic ALGs. Since [13] is + required for all ICE implementations, this NAPT problem does not + impact ICE. + + Existing NAPT boxes have non-deterministic and typically short + expiration times for UDP-based bindings. This requires + implementations to send periodic keepalives to maintain those + bindings. ICE uses a default of 15s, which is a very conservative + estimate. Eventually, over time, as NAT boxes become compliant to + behave [37], this minimum keepalive will become deterministic and + well-known, and the ICE timers can be adjusted. Having a way to + discover the minimum keepalive interval would be far better still. 16. Acknowledgements The authors would like to thank Flemming Andreasen, Rohan Mahy, Dean - Willis, Dan Wing, Douglas Otis, and Francois Audet for their comments - and input. A special thanks goes to Magnus Westerlund for doing - several detailed reviews on the various revisions of this + Willis, Dan Wing, Douglas Otis, Tim Moore, and Francois Audet for + their comments and input. A special thanks goes to Magnus Westerlund + for doing several detailed reviews on the various revisions of this specification. 
His input led to many substantive improvements in this document. 17. References 17.1 Normative References [1] Huitema, C., "Real Time Control Protocol (RTCP) attribute in Session Description Protocol (SDP)", RFC 3605, October 2003. @@ -4097,22 +4233,22 @@ [11] Rosenberg, J. and H. Schulzrinne, "Reliability of Provisional Responses in Session Initiation Protocol (SIP)", RFC 3262, June 2002. [12] Yon, D., "Connection-Oriented Media Transport in the Session Description Protocol (SDP)", draft-ietf-mmusic-sdp-comedia-10 (work in progress), November 2004. [13] Rosenberg, J., "Simple Traversal of UDP Through Network Address - Translators (NAT) (STUN)", draft-ietf-behave-rfc3489bis-02 - (work in progress), July 2005. + Translators (NAT) (STUN)", draft-ietf-behave-rfc3489bis-03 + (work in progress), March 2006. [14] Rosenberg, J., Mahy, R., and C. Huitema, "Obtaining Relay Addresses from Simple Traversal of UDP Through NAT (STUN)", Internet Draft draft-ietf-behave-turn-00.txt, February 2006. 17.2 Informative References [15] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming Protocol (RTSP)", RFC 2326, April 1998.