Internet Engineering Task Force                     Gonzalo Camarillo
Internet draft                                             Jan Holler
                                                    Goran AP Eriksson
                                                             Ericsson
                                                           April 2001
                                                 Expires October 2001
                                       <draft-ietf-mmusic-fid-01.txt>

                         The SDP fid attribute

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts. Internet-Drafts are draft documents valid for a maximum of
   six months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt
   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document defines an SDP media attribute. This attribute is
   intended to be used in conjunction with SIP in order to align
   different media streams belonging to a session. The use of this
   attribute allows receiving media from a single flow (several media
   streams), encoded in different formats during a particular session,
   on different ports and host interfaces.

Camarillo/Holler/Eriksson                                            1
                        The SDP fid attribute

TABLE OF CONTENTS

   1   Motivation
   1.1 SIP and cellular access
   1.2 DTMF tones
   2   Media flow definition
   3   Flow identification attribute
   4   Semantics of the fid attribute
   4.1 Interactions with other media level attributes
   5   Usage of the fid attribute in SIP
   5.1 Backward compatibility
   5.2 Caller does not support fid
   5.3 Callee does not support fid
   6   Acknowledgments
   7   References
   8   Authors' Addresses

1. Motivation

   The RTSP RFC [1] defines a media stream as "a single media instance,
   e.g., an audio stream or a video stream as well as a single
   whiteboard or shared application group. When using RTP, a stream
   consists of all RTP and RTCP packets created by a source within an
   RTP session".

   This definition assumes that a single audio (or video) stream maps
   into an RTP session. The RTP RFC [2] defines an RTP session as
   follows: "For each participant, the session is defined by a
   particular pair of destination transport addresses (one network
   address plus a port pair for RTP and RTCP)".


   However, there are situations where a single media instance (e.g.,
   an audio stream or a video stream) is sent using more than one RTP
   session. Two examples (among many others) of this kind of situation
   are cellular systems using SIP [3] and systems receiving DTMF tones
   on a different host than the voice. Both examples are described in
   the following sections.

1.1 SIP and cellular access

   Systems using a cellular access (such as UMTS or EDGE) and SIP as a
   signalling protocol need to receive media over the air. During a
   session the media can be encoded using different codecs. The
   encoded media has to traverse
   the radio interface. The radio interface is generally characterized
   by being bit error prone and associated with relatively high packet
   transfer delays. In addition, radio interface resources in a
   cellular environment are scarce and thus expensive, which calls for
   special measures in providing a highly efficient transport [4]. In
   order to get an appropriate speech quality in combination with an
   efficient transport, precise knowledge of codec properties is
   required so that a proper radio bearer for the RTP session can be
   configured before transferring the media. These radio bearers are
   dedicated bearers per media type, i.e. codec.

   In UMTS, for instance, when the RTP packets shall be delivered over
   the air interface, a packet filtering function routes the packets to
   the proper radio bearer towards the UMTS/SIP terminal. The packet
   filtering function operates using a Traffic Flow Template (TFT) [7],
   which is established when configuring the radio bearer. The TFT
   hence specifies the profile of the data that should be carried by
   the radio bearer. A TFT can contain the following data:

   -Source Address and Subnet Mask.
   -Protocol Number (IPv4) / Next Header (IPv6).
   -Destination Port Range.
   -Source Port Range.
   -IPSec Security Parameter Index (SPI).
   -Type of Service (TOS) (IPv4) / Traffic class (IPv6) and Mask.
   -Flow Label (IPv6).

   It is worth noticing that just certain combinations of these
   parameters are allowed.

   Cellular systems typically configure different radio bearers on
   different port numbers. Therefore, incoming media has to have
   different destination port numbers for the different possible codecs
   in order to be filtered and routed properly to the correct radio
   bearer. Thus, this is an example in which several RTP sessions are
   used to carry a single media instance (the encoded speech from the
   sender).

1.2 DTMF tones

   Some voice sessions include DTMF tones. Sometimes the voice handling
   is performed by a different host than the DTMF handling. [5]
   contains several examples of how application servers in the network
   gather DTMF tones for the user while the user receives the encoded
   speech on his user agent. In these situations it is necessary to
   establish two RTP sessions: one for the voice and the other for the
   DTMF tones. Both RTP sessions are logically part of the same media
   instance.

2. Media flow definition

   The previous examples show that the definition of a media stream in
   [1] has to be updated. It cannot be assumed that a single media
   instance maps into a single RTP session. Therefore, we introduce the
   definition of media flow:

   Media flow consists of a single media instance, e.g., an audio
   stream or a video stream as well as a single whiteboard or shared
   application group. When using RTP, a media flow comprises one or
   more RTP sessions.

   For instance, in a two party call where the voice exchanged can be
   encoded using GSM or PCM, the receiver wants to receive GSM on a
   port number and PCM on a different port number. Two RTP sessions
   will be established, one carrying GSM and the other carrying PCM.

   At any particular moment just one codec is in use. Therefore, at any
   moment one of the RTP sessions will not transport any voice. Here
   the systems are dealing with a single media flow, but two RTP
   sessions.

3. Flow identification attribute

   An RTP session is described in SDP [6] using an "m" line. When a
   media flow comprises more than one RTP session, we need a way to
   associate several "m" lines together into a media flow.

   A new "flow identification" media attribute is defined. It is used
   for identifying media flows within a session. Its formatting in SDP
   is described by the following BNF:

         fid-attribute      = "a=fid:" identification-tag
         identification-tag = token

   The identification tag is unique within the SDP session description.
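
   The BNF above can be exercised with a small parser. The following
   Python sketch is illustrative only and is not part of the attribute
   definition. It assumes that an identification-tag is any run of
   non-whitespace characters (the draft only says "token"), and the
   function name parse_fid_tags is hypothetical.

```python
import re

# Assumption: the tag is any run of non-whitespace characters; the
# draft only states that it is an SDP "token".
FID_LINE = re.compile(r"^a=fid:(\S+)$")


def parse_fid_tags(sdp_text):
    """Return one (port, fid tag or None) tuple per "m" line."""
    media = []
    for raw in sdp_text.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # m=<media> <port>[/<number of ports>] <transport> <fmt list>
            port = int(line.split()[1].split("/")[0])
            media.append([port, None])
        elif media:
            match = FID_LINE.match(line)
            if match:
                media[-1][1] = match.group(1)
    return [tuple(entry) for entry in media]


example = """v=0
m=audio 30000 RTP/AVP 0
a=fid:1
m=audio 30002 RTP/AVP 8
a=fid:1
m=video 40000 RTP/AVP 31
"""
print(parse_fid_tags(example))
```

   An "m" line without a fid attribute is reported with a tag of None,
   so a caller can tell grouped streams from ungrouped ones.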

   Syntactically fid is a media-level attribute. It provides
   information about a media stream defined by an "m" line.
   Semantically fid would be defined as a session-level attribute since
   it provides flow hierarchy inside a session description.

4. Semantics of the fid attribute

   A media agent handling a media flow that comprises several "m" lines
   sends media to different destinations (IP address/port number)
   depending on the codec used at any moment. If several "m" lines
   contain the codec used, media is sent to different destinations in
   parallel.

   For instance, a SIP user agent receives an INVITE with the following
   body:

         v=0
         o=Laura 289083124 289083124 IN IP4 second.example.com
         t=0 0
         c=IN IP4 131.160.1.112
         m=audio 30000 RTP/AVP 0
         a=fid:1
         m=audio 30002 RTP/AVP 8
         a=fid:1
         m=audio 30004 RTP/AVP 0 8
         a=fid:1

   At a particular point of time, if the media agent is sending PCM
   u-law (payload 0) it sends RTP packets to ports 30000 and 30004
   (first and third "m" lines). If it is sending PCM A-law (payload 8)
   it sends RTP packets to ports 30002 and 30004 (second and third "m"
   lines).

   Note that if several "m" lines with the same fid value contain the
   same codec the media agent MUST send media over several RTP sessions
   at the same time.
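
   The destination selection described in this section can be sketched
   in a few lines of Python. This is an illustration under stated
   assumptions, not a normative algorithm: the helper names parse_media
   and destination_ports are hypothetical, and only the "m=" and
   "a=fid:" lines of the session description are interpreted.

```python
def parse_media(sdp_text):
    """Return one dict per "m" line with its port, payloads and fid tag."""
    media = []
    for raw in sdp_text.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            fields = line[2:].split()
            media.append({
                "port": int(fields[1].split("/")[0]),
                "payloads": fields[3:],
                "fid": None,
            })
        elif line.startswith("a=fid:") and media:
            media[-1]["fid"] = line[len("a=fid:"):]
    return media


def destination_ports(media, fid, payload):
    """Ports of every "m" line in flow `fid` that lists `payload`."""
    return [m["port"] for m in media
            if m["fid"] == fid and payload in m["payloads"]]


SDP = """v=0
o=Laura 289083124 289083124 IN IP4 second.example.com
t=0 0
c=IN IP4 131.160.1.112
m=audio 30000 RTP/AVP 0
a=fid:1
m=audio 30002 RTP/AVP 8
a=fid:1
m=audio 30004 RTP/AVP 0 8
a=fid:1
"""
media = parse_media(SDP)
print(destination_ports(media, "1", "0"))  # [30000, 30004]
print(destination_ports(media, "1", "8"))  # [30002, 30004]
```

   When the returned list has more than one entry, the media agent
   sends the same media over several RTP sessions in parallel.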

4.1 Interactions with other media level attributes

   Media level attributes affect a media stream defined by an "m" line.
   The presence of fid does not modify this behavior.

   For instance, a 200 OK
   response: SIP user agent receives an INVITE with the following
   body:

         v=0
         o=Laura 289083124 289083124 IN IP4 second.example.com
         t=0 0

Camarillo/Holler/Eriksson                                            4
                        The SDP fid attribute

         c=IN IP4 222.222.222.222 131.160.1.112
         m=audio 30000 RTP/AVP 0
         a=fid:1
         m=audio 30002 RTP/AVP 8
         a=recvonly
         a=fid:1

   The ACK carries the definitive SDP from the caller:

         v=0
         o=John 289085535 289085535 IN IP4 first.example.com
         t=0 0
         c=IN IP4 111.111.111.111
         m=audio 20000 RTP/AVP 0
         a=fid:1
         m=audio 20002 RTP/AVP 8
         a=fid:1

   With the current way of performing SDP media alignment in SIP the
   callee would have accepted the call and immediately after re-INVITEd agent knows that at a certain moment it can send either
   PCM u-law to port number 30000 or PCM A-law to port number 30002.
   However, the caller with media agent also knows that the new SDP. The fid attribute saves many RTTs.

   Besides saving bandwidth and RTTs other end will only
   send PCM u-law (payload 0).

   Note that the fid attribute provides a means allows to express uni-directional codecs
   for describing a logical relationship between bi-directional media streams that
   belong to flow, as it is shown in the same flow.

4.2 Application Server Components

   Section 5.4 example
   above.
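
   The interaction between fid and the direction attributes can be
   sketched as follows. The Python below is illustrative only. It
   assumes that direction attributes appear at media level, that an
   "m" line without one is "sendrecv", and that the helper names
   (parse_media, we_may_send, remote_may_send) are hypothetical.

```python
DIRECTIONS = ("sendrecv", "sendonly", "recvonly")


def parse_media(sdp_text):
    """One dict per "m" line: port, payloads, fid tag and direction."""
    media = []
    for raw in sdp_text.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            fields = line[2:].split()
            media.append({"port": int(fields[1].split("/")[0]),
                          "payloads": fields[3:],
                          "fid": None,
                          "direction": "sendrecv"})
        elif not media:
            continue  # session-level line, ignored in this sketch
        elif line.startswith("a=fid:"):
            media[-1]["fid"] = line[len("a=fid:"):]
        elif line.startswith("a=") and line[2:] in DIRECTIONS:
            media[-1]["direction"] = line[2:]
    return media


def we_may_send(media, fid):
    """(payload, port) pairs the remote end is prepared to receive."""
    return [(p, m["port"]) for m in media
            if m["fid"] == fid and m["direction"] != "sendonly"
            for p in m["payloads"]]


def remote_may_send(media, fid):
    """Payload types the remote end may send for this flow."""
    return sorted({p for m in media
                   if m["fid"] == fid and m["direction"] != "recvonly"
                   for p in m["payloads"]})


SDP = """v=0
o=Laura 289083124 289083124 IN IP4 second.example.com
t=0 0
c=IN IP4 131.160.1.112
m=audio 30000 RTP/AVP 0
a=fid:1
m=audio 30002 RTP/AVP 8
a=recvonly
a=fid:1
"""
media = parse_media(SDP)
print(we_may_send(media, "1"))      # [('0', 30000), ('8', 30002)]
print(remote_may_send(media, "1"))  # ['0']
```

   Each "m" line keeps its own direction; fid only states that both
   lines belong to the same flow.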

5. Usage of the fid attribute in SIP

   SIP [3] is an application layer protocol for establishing,
   terminating and modifying multimedia sessions. SIP carries session
   descriptions in the bodies of the SIP messages but is independent
   from the protocol used for describing sessions. SDP [6] is one of
   the protocols that can be used for this purpose.

   Appendix B of [3] describes the usage of SDP in relation to SIP. It
   states: "The caller and callee align their media description so that
   the nth media stream ("m=" line) in the caller's session description
   corresponds to the nth media stream in the callee's description."

   The presence of the fid attribute in an SDP session description does
   not modify this behavior.

5.1 Backward compatibility

   This document does not define any SIP "Require" header. Therefore,
   if one of the SIP user agents does not understand the fid attribute
   the standard SDP fall back mechanism is used.

   A system that understands the fid attribute MUST add it to any SDP
   session description that it generates.

   If a response to a request that included the fid attribute also
   includes it, media alignment is performed based on the fid attribute
   rather than on matching of nth lines.
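
   The rule in the last paragraph can be written as a small decision
   function. This Python sketch is illustrative only; the function
   names and the returned labels "fid" and "nth-line" are hypothetical
   and are not defined by this document.

```python
def has_fid(sdp_text):
    """True if any line of the session description is an fid attribute."""
    return any(line.strip().startswith("a=fid:")
               for line in sdp_text.splitlines())


def alignment_method(request_sdp, response_sdp):
    """Choose how the "m" lines of request and response are aligned."""
    if has_fid(request_sdp) and has_fid(response_sdp):
        return "fid"
    # One side did not use fid: fall back to matching the nth "m" lines.
    return "nth-line"


offer = "v=0\nm=audio 20000 RTP/AVP 0\na=fid:1\n"
answer_with_fid = "v=0\nm=audio 30000 RTP/AVP 0\na=fid:1\n"
answer_without_fid = "v=0\nm=audio 30000 RTP/AVP 0\n"

print(alignment_method(offer, answer_with_fid))     # fid
print(alignment_method(offer, answer_without_fid))  # nth-line
```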

5.2 Caller does not support fid

   This situation does not represent a problem. The SDP in the INVITE
   will not contain any fid attribute. The callee knows that the caller
   does not support fid.

5.3 Callee does not support fid

   The callee will ignore the fid attribute, since it does not
   understand it. It will consider that the session comprises several
   media streams.

   Different implementations would behave in different ways.

   In the case of audio and different "m" lines for different codecs an
   implementation might decide to act as a mixer with the different
   incoming RTP sessions, which is the correct behavior.

   An implementation might also decide to refuse the request (e.g. 488
   Not acceptable here or 606 Not Acceptable) because it contains
   several "m" lines. In this case, the callee does not support the
   type of session that the caller wanted to establish. In case the
   caller is willing to establish a simpler session anyway, he should
   re-try the request without the fid attribute and only one "m" line
   per flow.
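
   A caller that wants to re-try with a simpler session has to derive
   a session description without fid. The Python sketch below shows
   one possible way to do this. It is illustrative only and rests on
   assumptions that this document does not make: the first "m" line of
   each flow is the one kept, the media-level attributes of dropped
   "m" lines are dropped with them, and lines are joined with "\n" for
   brevity. The function name fallback_sdp is hypothetical.

```python
FID_PREFIX = "a=fid:"


def fallback_sdp(sdp_text):
    """Remove fid attributes and keep one "m" line per flow."""
    session, sections = [], []
    for raw in sdp_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("m="):
            sections.append([line])        # start of a media section
        elif sections:
            sections[-1].append(line)      # media-level line
        else:
            session.append(line)           # session-level line

    out, seen = list(session), set()
    for section in sections:
        tags = [l[len(FID_PREFIX):] for l in section
                if l.startswith(FID_PREFIX)]
        tag = tags[0] if tags else None
        if tag is not None:
            if tag in seen:
                continue  # later "m" line of a flow already described
            seen.add(tag)
        out.extend(l for l in section if not l.startswith(FID_PREFIX))
    return "\n".join(out) + "\n"


offer = """v=0
o=Laura 289083124 289083124 IN IP4 second.example.com
t=0 0
c=IN IP4 131.160.1.112
m=audio 30000 RTP/AVP 0
a=fid:1
m=audio 30002 RTP/AVP 8
a=fid:1
m=audio 30004 RTP/AVP 0 8
a=fid:1
"""
print(fallback_sdp(offer))
```

   "m" lines that never carried an fid attribute are passed through
   unchanged, since they do not belong to any flow with several lines.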

6. Acknowledgments

   The authors would like to thank Jonathan Rosenberg and Adam Roach
   for their feedback on this document.

7. References

   [1] H. Schulzrinne/A. Rao/R. Lanphier, "Real Time Streaming Protocol
   (RTSP)", RFC 2326, IETF; April 1998.

   [2] H. Schulzrinne/S. Casner/R. Frederick/V. Jacobson, "RTP: A
   Transport Protocol for Real-Time Applications", RFC 1889, IETF;
   January 1996.

   [3] M. Handley/H. Schulzrinne/E. Schooler/J. Rosenberg, "SIP:
   Session Initiation Protocol", RFC 2543, IETF; March 1999.

   [4] L. Westberg/M. Lindqvist, "Realtime Traffic over Cellular Access
   Networks", draft-westberg-realtime-cellular-03.txt, IETF; November
   2000. Work in progress.

   [5] J. Rosenberg/P. Mataga/H. Schulzrinne, "An Application Server
   Component Architecture for SIP",
   draft-rosenberg-sip-app-components-00.txt, IETF; November 2000.
   Work in progress.

   [6] M. Handley/V. Jacobson, "SDP: Session Description Protocol", RFC
   2327, IETF; April 1998.

   [7] 3G TS 23.060 v3.2.1 General Packet Radio Service Description.

8. Authors' Addresses

   Gonzalo Camarillo
   Ericsson
   Advanced Signalling Research Lab.
   FIN-02420 Jorvas
   Finland
   Phone: +358 9 299 3371
   Fax: +358 9 299 3052
   Email: Gonzalo.Camarillo@ericsson.com

   Jan Holler
   Ericsson Research
   S-16480 Stockholm
   Sweden
   Phone: +46 8 58532845
   Fax: +46 8 4047020
   Email: Jan.Holler@era.ericsson.se

   Goran AP Eriksson
   Ericsson Research
   S-16480 Stockholm
   Sweden
   Phone: +46 8 58531762
   Fax: +46 8 4047020
   Email: Goran.AP.Eriksson@era.ericsson.se
