draft-ietf-avtcore-rtp-topologies-update-01.txt   draft-ietf-avtcore-rtp-topologies-update-02.txt 
Network Working Group M. Westerlund Network Working Group M. Westerlund
Internet-Draft Ericsson Internet-Draft Ericsson
Obsoletes: 5117 (if approved) S. Wenger Obsoletes: 5117 (if approved) S. Wenger
Intended status: Informational Vidyo Intended status: Informational Vidyo
Expires: April 25, 2014 October 22, 2013 Expires: November 28, 2014 May 27, 2014
RTP Topologies RTP Topologies
draft-ietf-avtcore-rtp-topologies-update-01 draft-ietf-avtcore-rtp-topologies-update-02
Abstract Abstract
This document discusses point to point and multi-endpoint topologies This document discusses point to point and multi-endpoint topologies
used in Real-time Transport Protocol (RTP)-based environments. In used in Real-time Transport Protocol (RTP)-based environments. In
particular, centralized topologies commonly employed in the video particular, centralized topologies commonly employed in the video
conferencing industry are mapped to the RTP terminology. conferencing industry are mapped to the RTP terminology.
This document is updated with additional topologies and is intended This document is updated with additional topologies and is intended
to replace RFC 5117. to replace RFC 5117.
skipping to change at page 1, line 37 skipping to change at page 1, line 37
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 25, 2014. This Internet-Draft will expire on November 28, 2014.
Copyright Notice Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as the Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
skipping to change at page 2, line 18 skipping to change at page 2, line 18
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1. Glossary . . . . . . . . . . . . . . . . . . . . . . . . 3 2.1. Glossary . . . . . . . . . . . . . . . . . . . . . . . . 3
3. Topologies . . . . . . . . . . . . . . . . . . . . . . . . . 4 3. Topologies . . . . . . . . . . . . . . . . . . . . . . . . . 4
3.1. Point to Point . . . . . . . . . . . . . . . . . . . . . 4 3.1. Point to Point . . . . . . . . . . . . . . . . . . . . . 4
3.2. Point to Point via Middlebox . . . . . . . . . . . . . . 5 3.2. Point to Point via Middlebox . . . . . . . . . . . . . . 5
3.2.1. Translators . . . . . . . . . . . . . . . . . . . . . 5 3.2.1. Translators . . . . . . . . . . . . . . . . . . . . . 5
3.2.2. Back to Back RTP sessions . . . . . . . . . . . . . . 9 3.2.2. Back to Back RTP sessions . . . . . . . . . . . . . . 9
3.3. Point to Multipoint Using Multicast . . . . . . . . . . . 9 3.3. Point to Multipoint Using Multicast . . . . . . . . . . . 10
3.3.1. Any Source Multicast (ASM) . . . . . . . . . . . . . 10 3.3.1. Any Source Multicast (ASM) . . . . . . . . . . . . . 10
3.3.2. Source Specific Multicast (SSM) . . . . . . . . . . . 11 3.3.2. Source Specific Multicast (SSM) . . . . . . . . . . . 11
3.3.3. SSM with Local Unicast Resources . . . . . . . . . . 13 3.3.3. SSM with Local Unicast Resources . . . . . . . . . . 13
3.4. Point to Multipoint Using Mesh . . . . . . . . . . . . . 14 3.4. Point to Multipoint Using Mesh . . . . . . . . . . . . . 15
3.5. Point to Multipoint Using the RFC 3550 Translator . . . . 17 3.5. Point to Multipoint Using the RFC 3550 Translator . . . . 18
3.5.1. Relay - Transport Translator . . . . . . . . . . . . 17 3.5.1. Relay - Transport Translator . . . . . . . . . . . . 18
3.5.2. Media Translator . . . . . . . . . . . . . . . . . . 19 3.5.2. Media Translator . . . . . . . . . . . . . . . . . . 19
3.6. Point to Multipoint Using the RFC 3550 Mixer Model . . . 19 3.6. Point to Multipoint Using the RFC 3550 Mixer Model . . . 20
3.6.1. Media Mixing . . . . . . . . . . . . . . . . . . . . 21 3.6.1. Media Mixing . . . . . . . . . . . . . . . . . . . . 22
3.6.2. Media Switching . . . . . . . . . . . . . . . . . . . 24 3.6.2. Media Switching . . . . . . . . . . . . . . . . . . . 25
3.7. Selective Forwarding Middlebox . . . . . . . . . . . . . 26 3.7. Selective Forwarding Middlebox . . . . . . . . . . . . . 27
3.8. Point to Multipoint Using Video Switching MCUs . . . . . 29 3.8. Point to Multipoint Using Video Switching MCUs . . . . . 30
3.9. Point to Multipoint Using RTCP-Terminating MCU . . . . . 30 3.9. Point to Multipoint Using RTCP-Terminating MCU . . . . . 32
3.10. Split Component Endpoint . . . . . . . . . . . . . . . . 32 3.10. Split Component Endpoint . . . . . . . . . . . . . . . . 33
3.11. Non-Symmetric Mixer/Translators . . . . . . . . . . . . . 33 3.11. Non-Symmetric Mixer/Translators . . . . . . . . . . . . . 34
3.12. Combining Topologies . . . . . . . . . . . . . . . . . . 33 3.12. Combining Topologies . . . . . . . . . . . . . . . . . . 35
4. Comparing Topologies . . . . . . . . . . . . . . . . . . . . 34 4. Comparing Topologies . . . . . . . . . . . . . . . . . . . . 35
4.1. Topology Properties . . . . . . . . . . . . . . . . . . . 34 4.1. Topology Properties . . . . . . . . . . . . . . . . . . . 36
4.1.1. All to All Media Transmission . . . . . . . . . . . . 34 4.1.1. All to All Media Transmission . . . . . . . . . . . . 36
4.1.2. Transport or Media Interoperability . . . . . . . . . 35 4.1.2. Transport or Media Interoperability . . . . . . . . . 37
4.1.3. Per Domain Bit-Rate Adaptation . . . . . . . . . . . 35 4.1.3. Per Domain Bit-Rate Adaptation . . . . . . . . . . . 37
4.1.4. Aggregation of Media . . . . . . . . . . . . . . . . 36 4.1.4. Aggregation of Media . . . . . . . . . . . . . . . . 37
4.1.5. View of All Session Participants . . . . . . . . . . 36 4.1.5. View of All Session Participants . . . . . . . . . . 38
4.1.6. Loop Detection . . . . . . . . . . . . . . . . . . . 36 4.1.6. Loop Detection . . . . . . . . . . . . . . . . . . . 38
4.2. Comparison of Topologies . . . . . . . . . . . . . . . . 36 4.2. Comparison of Topologies . . . . . . . . . . . . . . . . 39
5. Security Considerations . . . . . . . . . . . . . . . . . . . 37 5. Security Considerations . . . . . . . . . . . . . . . . . . . 39
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 39 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 41
7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 39 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 41
8. References . . . . . . . . . . . . . . . . . . . . . . . . . 39 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 41
8.1. Normative References . . . . . . . . . . . . . . . . . . 39 8.1. Normative References . . . . . . . . . . . . . . . . . . 41
8.2. Informative References . . . . . . . . . . . . . . . . . 39 8.2. Informative References . . . . . . . . . . . . . . . . . 42
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 41 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 43
1. Introduction 1. Introduction
Real-time Transport Protocol (RTP) [RFC3550] topologies describe Real-time Transport Protocol (RTP) [RFC3550] topologies describe
methods for interconnecting RTP entities and their processing methods for interconnecting RTP entities and their processing
behavior of RTP and RTCP. This document tries to address past and behavior of RTP and RTCP. This document tries to address past and
existing confusion, especially with respect to terms not defined in existing confusion, especially with respect to terms not defined in
RTP but in common use in the conversational communication industry, RTP but in common use in the conversational communication industry,
such as the Multipoint Control Unit or MCU. such as the Multipoint Control Unit or MCU.
skipping to change at page 4, line 50 skipping to change at page 5, line 5
3.1. Point to Point 3.1. Point to Point
Shortcut name: Topo-Point-to-Point Shortcut name: Topo-Point-to-Point
The Point to Point (PtP) topology (Figure 1) consists of two The Point to Point (PtP) topology (Figure 1) consists of two
endpoints, communicating using unicast. Both RTP and RTCP traffic endpoints, communicating using unicast. Both RTP and RTCP traffic
are conveyed endpoint-to-endpoint, using unicast traffic only (even are conveyed endpoint-to-endpoint, using unicast traffic only (even
if, in exotic cases, this unicast traffic happens to be conveyed over if, in exotic cases, this unicast traffic happens to be conveyed over
an IP-multicast address). an IP-multicast address).
+---+ +---+ +---+ +---+
| A |<------->| B | | A |<------->| B |
+---+ +---+ +---+ +---+
Figure 1: Point to Point Figure 1: Point to Point
The main property of this topology is that A sends to B, and only B, The main property of this topology is that A sends to B, and only B,
while B sends to A, and only A. This avoids all complexities of while B sends to A, and only A. This avoids all complexities of
handling multiple endpoints and combining the requirements stemming handling multiple endpoints and combining the requirements stemming
from them. Note that an endpoint can still use multiple RTP from them. Note that an endpoint can still use multiple RTP
Synchronization Sources (SSRCs) in an RTP session. The number of RTP Synchronization Sources (SSRCs) in an RTP session. The number of RTP
sessions in use between A and B can also be of any number, subject sessions in use between A and B can also be of any number, subject
only to system level limitations like the number range of ports. only to system level limitations like the number range of ports.
RTCP feedback messages for the indicated SSRCs are communicated RTCP feedback messages for the indicated SSRCs are communicated
directly between the endpoints. Therefore, this topology poses directly between the endpoints. Therefore, this topology poses
minimal (if any) issues for any feedback messages. For RTP sessions minimal (if any) issues for any feedback messages. For RTP sessions
which use multiple SSRCs per endpoint, it can be relevant to implement which use multiple SSRCs per endpoint, it can be relevant to implement
support for cross-reporting suppression as defined in "Sending support for cross-reporting suppression as defined in "Sending
Multiple Media Streams in a Single RTP Session" Multiple Media Streams in a Single RTP Session"
[I-D.ietf-avtcore-rtp-multi-stream]. [I-D.ietf-avtcore-rtp-multi-stream-optimisation].
3.2. Point to Point via Middlebox 3.2. Point to Point via Middlebox
This section discusses cases where two endpoints communicate but have This section discusses cases where two endpoints communicate but have
one or more middleboxes involved in the RTP session. one or more middleboxes involved in the RTP session.
3.2.1. Translators 3.2.1. Translators
Shortcut name: Topo-PtP-Translator Shortcut name: Topo-PtP-Translator
skipping to change at page 6, line 42 skipping to change at page 6, line 45
transport translators. These middleboxes come in many variations transport translators. These middleboxes come in many variations
including NAT [RFC3022] traversal by pinning the media path to a including NAT [RFC3022] traversal by pinning the media path to a
public address domain relay, network topologies where the media flow public address domain relay, network topologies where the media flow
is required to pass a particular point for audit by employing is required to pass a particular point for audit by employing
relaying, or preserving privacy by hiding each peer's transport relaying, or preserving privacy by hiding each peer's transport
addresses to the other party. Other protocols or functionalities addresses to the other party. Other protocols or functionalities
that provide this behavior are TURN [RFC5766] servers, Session Border that provide this behavior are TURN [RFC5766] servers, Session Border
Gateways and Media Processing Nodes with media anchoring Gateways and Media Processing Nodes with media anchoring
functionalities. functionalities.
+---+ +---+ +---+ +---+ +---+ +---+
| A |<------>| T |<------->| B | | A |<------>| T |<------->| B |
+---+ +---+ +---+ +---+ +---+ +---+
Figure 2: Point to Point with Translator Figure 2: Point to Point with Translator
A common element in these functions is that they are normally A common element in these functions is that they are normally
transparent at the RTP level, i.e., they perform no changes on any transparent at the RTP level, i.e., they perform no changes on any
RTP or RTCP packet fields and only affect the lower layers. They may RTP or RTCP packet fields and only affect the lower layers. They may
affect, however, the path the RTP and RTCP packets are routed between affect, however, the path the RTP and RTCP packets are routed between
the endpoints in the RTP session, and thereby only indirectly affect the endpoints in the RTP session, and thereby only indirectly affect
the RTP session. For this reason, one could believe that transport the RTP session. For this reason, one could believe that transport
translator-type middleboxes do not need to be included in this translator-type middleboxes do not need to be included in this
skipping to change at page 8, line 13 skipping to change at page 8, line 17
encoding. encoding.
Stand-alone Media Translators are rare. Most commonly, a combination Stand-alone Media Translators are rare. Most commonly, a combination
of Transport and Media Translator is used to translate both the media of Transport and Media Translator is used to translate both the media
stream and the transport aspects of a stream between two transport stream and the transport aspects of a stream between two transport
domains (or clouds). domains (or clouds).
When media translation occurs, the Translator's task regarding When media translation occurs, the Translator's task regarding
handling of RTCP traffic becomes substantially more complex. In this handling of RTCP traffic becomes substantially more complex. In this
case, the Translator needs to rewrite B's RTCP Receiver Report before case, the Translator needs to rewrite B's RTCP Receiver Report before
forwarding them to A. The rewriting is needed as the stream received forwarding them to A. The rewriting is needed as the stream received
by B is not the same stream as the other participants receive. For by B is not the same stream as the other participants receive. For
example, the number of packets transmitted to B may be lower than example, the number of packets transmitted to B may be lower than
what A sends, due to the different media format and data rate. what A sends, due to the different media format and data rate.
Therefore, if the Receiver Reports were forwarded without changes, Therefore, if the Receiver Reports were forwarded without changes,
the extended highest sequence number would indicate that B was the extended highest sequence number would indicate that B was
substantially behind in reception, while most likely it would not be. substantially behind in reception, while most likely it would not be.
Therefore, the Translator must translate that number to a Therefore, the Translator must translate that number to a
corresponding sequence number for the stream the Translator received. corresponding sequence number for the stream the Translator received.
Similar arguments can be made for most other fields in the RTCP Similar arguments can be made for most other fields in the RTCP
Receiver Reports. Receiver Reports.
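The sequence number translation described above can be sketched as follows. This is a minimal illustration, assuming the media Translator records, for every packet it sends to B, the highest extended sequence number of A's stream that went into it. The class SeqMapper and its methods are hypothetical names, and only the extended highest sequence number field is handled, not the other Receiver Report fields.

```python
import bisect
from typing import List, Optional


class SeqMapper:
    """Hypothetical helper for a media translator that sends B fewer packets
    than it receives from A.  It maps an extended sequence number in the
    stream sent to B back to the matching one in the stream received from A."""

    def __init__(self) -> None:
        self._out: List[int] = []  # extended seq numbers sent to B, ascending
        self._in: List[int] = []   # highest extended seq from A used for each

    def on_send(self, out_seq: int, highest_in_seq: int) -> None:
        # Called for every packet sent to B: out_seq was produced from
        # A's packets up to and including highest_in_seq.
        self._out.append(out_seq)
        self._in.append(highest_in_seq)

    def translate_ehsn(self, reported: int) -> Optional[int]:
        # B's Receiver Report carries the highest extended seq it received,
        # in the translator's outgoing numbering.  Find the last packet sent
        # at or before that value and return the input seq it corresponds to.
        i = bisect.bisect_right(self._out, reported) - 1
        return self._in[i] if i >= 0 else None
```

For example, if packets 1000, 1001 and 1002 sent to B were built from A's packets up to 5001, 5003 and 5005, a report of 1001 from B is rewritten to 5003 before the Receiver Report is forwarded to A.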
skipping to change at page 8, line 35 skipping to change at page 8, line 39
A media Translator may in some cases act on behalf of the "real" A media Translator may in some cases act on behalf of the "real"
source and respond to RTCP feedback messages. This may occur, for source and respond to RTCP feedback messages. This may occur, for
example, when a receiver requests a bandwidth reduction, and the example, when a receiver requests a bandwidth reduction, and the
media Translator has not detected any congestion or other reasons for media Translator has not detected any congestion or other reasons for
bandwidth reduction between the media source and itself. In that bandwidth reduction between the media source and itself. In that
case, it is sensible that the media Translator reacts to the codec case, it is sensible that the media Translator reacts to the codec
control messages itself, for example, by transcoding to a lower media control messages itself, for example, by transcoding to a lower media
rate. rate.
A variant of translator behaviour worth pointing out is the one A variant of translator behaviour worth pointing out is the one
depicted in Figure 3, where an endpoint A sends a media flow to B. On the depicted in Figure 3, where an endpoint A sends a media flow to B. On
path there is a device T that on A's behalf does something with the the path there is a device T that on A's behalf does something with
media streams, for example adds an RTP session with FEC information the media streams, for example adds an RTP session with FEC
for A's media streams. In this case, T needs to bind the new FEC information for A's media streams. In this case, T needs to bind the
streams to A's media stream, for example by using the same CNAME as new FEC streams to A's media stream, for example by using the same
A. CNAME as A.
+------+ +------+ +------+ +------+ +------+ +------+
| | | | | | | | | | | |
| A |------->| T |-------->| B | | A |------->| T |-------->| B |
| | | |---FEC-->| | | | | |---FEC-->| |
+------+ +------+ +------+ +------+ +------+ +------+
Figure 3: When De-composition is a Translator Figure 3: When De-composition is a Translator
This type of functionality where T does something with the media This type of functionality where T does something with the media
stream on behalf of A is covered under the media translator stream on behalf of A is covered under the media translator
definition. definition.
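How a receiver such as B could use the shared CNAME to associate T's FEC streams with A's media is sketched below. It assumes the receiver has already collected the SDES CNAME item for each stream; the function name and the session labels ("media", "fec") are hypothetical and only for illustration.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

# A stream is identified here by (RTP session label, SSRC); the labels are
# illustrative, and SSRCs are only unique within one RTP session.
StreamId = Tuple[str, int]


def group_by_cname(sdes_cnames: Dict[StreamId, str]) -> Dict[str, Set[StreamId]]:
    """Group streams by the CNAME announced for them in RTCP SDES, so that
    streams added by a middlebox with the sender's CNAME are associated
    with that sender, even across RTP sessions."""
    groups: Dict[str, Set[StreamId]] = defaultdict(set)
    for stream, cname in sdes_cnames.items():
        groups[cname].add(stream)
    return dict(groups)
```

If T announces its FEC stream with A's CNAME, both that stream and A's media stream end up in the same group, even though they belong to different RTP sessions.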
3.2.2. Back to Back RTP sessions 3.2.2. Back to Back RTP sessions
There exist middleboxes that interconnect two endpoints through There exist middleboxes that interconnect two endpoints through
themselves, but not by being part of a common RTP session. They themselves, but not by being part of a common RTP session. They
establish instead two different RTP sessions, one between A and the establish instead two different RTP sessions, one between A and the
middlebox and another between the middlebox and B. middlebox and another between the middlebox and B. This topology is
called Topo-Back-To-Back.
|<--Session A-->| |<--Session B-->| |<--Session A-->| |<--Session B-->|
+------+ +------+ +------+ +------+ +------+ +------+
| A |------->| MB |-------->| B | | A |------->| MB |-------->| B |
+------+ +------+ +------+ +------+ +------+ +------+
Figure 4: Back to Back RTP Sessions Figure 4: Back to Back RTP Sessions
The middlebox acts as an application-level gateway and bridges the The middlebox acts as an application-level gateway and bridges the
two RTP sessions. This bridging can be as basic as forwarding the two RTP sessions. This bridging can be as basic as forwarding the
RTP payloads between the sessions, or more complex including media RTP payloads between the sessions, or more complex including media
transcoding. The difference with the single RTP session context is transcoding. The difference with the single RTP session context is
the handling of the SSRCs and the other session-related identifiers, the handling of the SSRCs and the other session-related identifiers,
such as CNAMEs. With two different RTP sessions these can be freely such as CNAMEs. With two different RTP sessions these can be freely
changed and it becomes the middlebox's task to maintain the correct changed and it becomes the middlebox's task to maintain the correct
skipping to change at page 9, line 40 skipping to change at page 10, line 4
sessions and changing identifiers. The structure with two RTP sessions and changing identifiers. The structure with two RTP
sessions also puts a congestion control requirement on the middlebox, sessions also puts a congestion control requirement on the middlebox,
because it becomes fully responsible for the media stream it sources because it becomes fully responsible for the media stream it sources
into each of the sessions. into each of the sessions.
Adherence to congestion control can be achieved locally or by also Adherence to congestion control can be achieved locally or by also
bridging statistics from the receiving endpoint. From an implementation bridging statistics from the receiving endpoint. From an implementation
point of view, however, this requires dealing with a number of point of view, however, this requires dealing with a number of
inconsistencies. First, packet loss must be detected for an RTP flow inconsistencies. First, packet loss must be detected for an RTP flow
sent from A to the middlebox, and that loss must be reported through sent from A to the middlebox, and that loss must be reported through
a skipped sequence number in the flow from the middlebox to B. This a skipped sequence number in the flow from the middlebox to B. This
coupling and the resulting inconsistencies are conceptually easier to coupling and the resulting inconsistencies are conceptually easier to
handle when considering the two flows as belonging to a single RTP handle when considering the two flows as belonging to a single RTP
session. session.
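The loss propagation requirement above can be illustrated with a small sketch of the middlebox's forwarding path. It assumes a middlebox that forwards A's packets one-to-one into session B using its own 16-bit sequence numbering; the class name and the choice to drop duplicate or late packets are assumptions of this sketch, not behaviour required by the text.

```python
from typing import Optional


class LossPropagatingForwarder:
    """Hypothetical forwarding path of a back-to-back middlebox: A's packets
    arrive in session A and are re-sent in session B with the middlebox's
    own 16-bit RTP sequence numbers.  A gap detected on the A side is
    reproduced as skipped sequence numbers on the B side."""

    def __init__(self, out_base: int) -> None:
        self._next_out = out_base & 0xFFFF   # next seq number in session B
        self._last_in: Optional[int] = None  # last seq forwarded from session A

    def forward(self, in_seq: int) -> Optional[int]:
        if self._last_in is None:
            gap = 0
        else:
            # Distance modulo 2^16, since RTP sequence numbers wrap.
            delta = (in_seq - self._last_in) & 0xFFFF
            if delta == 0 or delta >= 0x8000:
                return None  # duplicate or late packet; simply dropped here
            gap = delta - 1  # packets from A that never arrived
        out_seq = (self._next_out + gap) & 0xFFFF  # skip one number per loss
        self._next_out = (out_seq + 1) & 0xFFFF
        self._last_in = in_seq
        return out_seq
```

With strictly one-to-one forwarding this is equivalent to adding a constant offset modulo 2^16; the gap is computed explicitly because that is the point where the middlebox detects the loss on the A side.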
3.3. Point to Multipoint Using Multicast 3.3. Point to Multipoint Using Multicast
Multicast is an IP layer functionality that is available in some Multicast is an IP layer functionality that is available in some
networks. Two main flavors can be distinguished: Any Source networks. Two main flavors can be distinguished: Any Source
Multicast (ASM) [RFC1112] where any multicast group participant can Multicast (ASM) [RFC1112] where any multicast group participant can
send to the group address and expect the packet to reach all group send to the group address and expect the packet to reach all group
participants; and Source Specific Multicast (SSM) [RFC3569], where participants; and Source Specific Multicast (SSM) [RFC3569], where
only a particular IP host sends to the multicast group. Both these only a particular IP host sends to the multicast group. Both these
models are discussed below in their respective sections. models are discussed below in their respective sections.
3.3.1. Any Source Multicast (ASM) 3.3.1. Any Source Multicast (ASM)
Shortcut name: Topo-ASM (was Topo-Multicast) Shortcut name: Topo-ASM (was Topo-Multicast)
+-----+ +-----+
+---+ / \ +---+ +---+ / \ +---+
| A |----/ \---| B | | A |----/ \---| B |
+---+ / Multi- \ +---+ +---+ / Multi- \ +---+
+ Cast + + Cast +
+---+ \ Network / +---+ +---+ \ Network / +---+
| C |----\ /---| D | | C |----\ /---| D |
+---+ \ / +---+ +---+ \ / +---+
+-----+ +-----+
Figure 5: Point to Multipoint Using Multicast Figure 5: Point to Multipoint Using Multicast
Point to Multipoint (PtM) is defined here as using a multicast Point to Multipoint (PtM) is defined here as using a multicast
topology as a transmission model, in which traffic from any topology as a transmission model, in which traffic from any
participant reaches all the other participants, except for cases such participant reaches all the other participants, except for cases such
as: as:
o packet loss, or o packet loss, or
skipping to change at page 11, line 16 skipping to change at page 11, line 23
small multicast group, some applications may still want to use the small multicast group, some applications may still want to use the
more limited options for RTCP feedback available to large multicast more limited options for RTCP feedback available to large multicast
groups, for example when there is a likelihood that the threshold of groups, for example when there is a likelihood that the threshold of
the small multicast group (in terms of participants) may be exceeded the small multicast group (in terms of participants) may be exceeded
during the lifetime of a session. during the lifetime of a session.
RTCP feedback messages in multicast reach, like media data, every RTCP feedback messages in multicast reach, like media data, every
subscriber (subject to packet losses and multicast group subscriber (subject to packet losses and multicast group
subscription). Therefore, the feedback suppression mechanism subscription). Therefore, the feedback suppression mechanism
discussed in [RFC4585] is typically required. Each individual node discussed in [RFC4585] is typically required. Each individual node
needs to process every feedback message it receives, not to determine needs to process every feedback message it receives, not only to
if it is affected or if the feedback message applies only to some determine if it is affected or if the feedback message applies only
other participant, but also to derive timing restrictions for the to some other participant, but also to derive timing restrictions for
sending of its own feedback messages, if any. the sending of its own feedback messages, if any.
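The suppression behaviour can be sketched in simplified form: a receiver delays its feedback by a random dither and cancels it if an equivalent message from another group member arrives first. This is only in the spirit of [RFC4585]; it does not implement that specification's timing rules, and the class names, the fb_type labels and the max_dither value are assumptions of this sketch.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FeedbackKey:
    media_ssrc: int                # SSRC the feedback refers to
    fb_type: str                   # illustrative label, e.g. "NACK" or "PLI"
    detail: Tuple[int, ...] = ()   # e.g. the lost sequence numbers of a NACK


@dataclass
class FeedbackScheduler:
    """Simplified suppression logic for feedback sent to a multicast group:
    feedback is delayed by a random dither and cancelled if an equivalent
    message from another receiver is seen first."""
    max_dither: float  # seconds; the value is an assumption of this sketch
    pending: Dict[FeedbackKey, float] = field(default_factory=dict)

    def schedule(self, key: FeedbackKey, now: float) -> float:
        # Keep an already scheduled send time for the same feedback.
        return self.pending.setdefault(key, now + random.uniform(0.0, self.max_dither))

    def on_feedback_received(self, key: FeedbackKey) -> bool:
        # Every received feedback message is inspected; one that matches our
        # own pending message suppresses it.  Returns True if suppressed.
        return self.pending.pop(key, None) is not None

    def due(self, now: float) -> List[FeedbackKey]:
        # Feedback that survived suppression and whose send time has come.
        ready = [k for k, t in self.pending.items() if t <= now]
        for k in ready:
            del self.pending[k]
        return ready
```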
3.3.2. Source Specific Multicast (SSM) 3.3.2. Source Specific Multicast (SSM)
In Any Source Multicast, any of the participants can send to all the In Any Source Multicast, any of the participants can send to all the
other participants, by sending a packet to the multicast group. In other participants, by sending a packet to the multicast group. In
contrast, Source Specific Multicast [RFC3569][RFC4607] refers to contrast, Source Specific Multicast [RFC3569][RFC4607] refers to
scenarios where only a single source (Distribution Source) can send scenarios where only a single source (Distribution Source) can send
to the multicast group, creating a topology that looks like the one to the multicast group, creating a topology that looks like the one
below: below:
+--------+ +-----+ +--------+ +-----+
|Media | | | Source-specific |Media | | | Source-specific
|Sender 1|<----->| D S | Multicast |Sender 1|<----->| D S | Multicast
+--------+ | I O | +--+----------------> R(1) +--------+ | I O | +--+----------------> R(1)
| S U | | | | | S U | | | |
+--------+ | T R | | +-----------> R(2) | +--------+ | T R | | +-----------> R(2) |
|Media |<----->| R C |->+ | : | | |Media |<----->| R C |->+ | : | |
|Sender 2| | I E | | +------> R(n-1) | | |Sender 2| | I E | | +------> R(n-1) | |
+--------+ | B | | | | | | +--------+ | B | | | | | |
: | U | +--+--> R(n) | | | : | U | +--+--> R(n) | | |
: | T +-| | | | | : | T +-| | | | |
: | I | |<---------+ | | | : | I | |<---------+ | | |
+--------+ | O |F|<---------------+ | | +--------+ | O |F|<---------------+ | |
|Media | | N |T|<--------------------+ | |Media | | N |T|<--------------------+ |
|Sender M|<----->| | |<-------------------------+ |Sender M|<----->| | |<-------------------------+
+--------+ +-----+ RTCP Unicast +--------+ +-----+ RTCP Unicast
FT = Feedback Target FT = Feedback Target
Transport from the Feedback Target to the Distribution Transport from the Feedback Target to the Distribution
Source is via unicast or multicast RTCP if they are not Source is via unicast or multicast RTCP if they are not
co-located. co-located.
Figure 6: Point to Multipoint using Source Specific Multicast Figure 6: Point to Multipoint using Source Specific Multicast
In the SSM topology (Figure 6) a number of RTP sources (1 to M) are In the SSM topology (Figure 6) a number of RTP sources (1 to M) are
allowed to send media to the SSM group. These sources send media to allowed to send media to the SSM group. These sources send media to
a dedicated distribution source, which forwards the media streams to a dedicated distribution source, which forwards the media streams to
the multicast group on behalf of the original senders. The media the multicast group on behalf of the original senders. The media
streams reach the Receivers (R(1) to R(n)). The Receivers' RTCP streams reach the Receivers (R(1) to R(n)). The Receivers' RTCP
messages cannot be sent to the multicast group, as the SSM multicast messages cannot be sent to the multicast group, as the SSM multicast
group by definition has only a single source. To support RTCP, an group by definition has only a single IP sender. To support RTCP, an
RTP extension for SSM [RFC5760] was defined. It uses unicast RTP extension for SSM [RFC5760] was defined. It uses unicast
transmission to send RTCP from each of the receivers to one or more transmission to send RTCP from each of the receivers to one or more
Feedback Targets (FT). The feedback targets relay the RTCP Feedback Targets (FT). The feedback targets relay the RTCP
unmodified, or provide a summary of the participants' RTCP reports unmodified, or provide a summary of the participants' RTCP reports
towards the whole group by forwarding the RTCP traffic to the towards the whole group by forwarding the RTCP traffic to the
distribution source. Figure 6 only shows a single feedback target distribution source. Figure 6 only shows a single feedback target
integrated in the distribution source, but for scalability the FT can integrated in the distribution source, but for scalability the FT can
be many and have responsibility for sub-groups of the receivers. For be many and have responsibility for sub-groups of the receivers. For
summary reports, however, there must be a single feedback aggregating summary reports, however, there must be a single feedback aggregating
all the summaries to a common message to the whole receiver group. all the summaries to a common message to the whole receiver group.
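The split between per-FT summaries and a single aggregation point can be sketched as follows. This is a simplified illustration, not the RTCP Receiver Summary Information format defined in [RFC5760]; the report fields, the receiver names, and the choice of min/max/mean statistics are assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ReceiverReport:
    receiver: str          # hypothetical receiver identifier
    fraction_lost: float   # 0.0-1.0, as carried in an RTCP report block
    jitter: int            # interarrival jitter, RTP timestamp units


def summarize(reports):
    """Summary produced by one FT for the sub-group of receivers it serves."""
    losses = [r.fraction_lost for r in reports]
    return {
        "receivers": len(reports),
        "loss_min": min(losses),
        "loss_max": max(losses),
        "loss_mean": mean(losses),
        "jitter_max": max(r.jitter for r in reports),
    }


def merge(summaries):
    """Single aggregation point: combine per-FT summaries into one
    group-wide summary (the mean is weighted by sub-group size)."""
    total = sum(s["receivers"] for s in summaries)
    return {
        "receivers": total,
        "loss_min": min(s["loss_min"] for s in summaries),
        "loss_max": max(s["loss_max"] for s in summaries),
        "loss_mean": sum(s["loss_mean"] * s["receivers"] for s in summaries) / total,
        "jitter_max": max(s["jitter_max"] for s in summaries),
    }


ft1 = summarize([ReceiverReport("R1", 0.0, 10), ReceiverReport("R2", 0.1, 30)])
ft2 = summarize([ReceiverReport("R3", 0.2, 20)])
group = merge([ft1, ft2])
print(group["receivers"], group["loss_max"], group["jitter_max"])  # 3 0.2 30
```

Weighting the merged mean by sub-group size is one reason a single aggregator is needed: no individual FT sees enough of the group to produce the group-wide figure.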
skipping to change at page 13, line 31 skipping to change at page 13, line 31
All multicast configurations share a signalling requirement: all of All multicast configurations share a signalling requirement: all of
the participants need to have the same RTP and payload type the participants need to have the same RTP and payload type
configuration. Otherwise, A could, for example, be using payload configuration. Otherwise, A could, for example, be using payload
type 97 to identify the video codec H.264, while B would identify it type 97 to identify the video codec H.264, while B would identify it
as MPEG-2. as MPEG-2.
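A signalling or management component could detect this kind of mismatch before media flows. The sketch below assumes each participant's configuration is available as a simple payload-type-to-encoding-name map; the function name and data layout are hypothetical, while "H264" and "MPV" are the registered RTP encoding names for H.264 and MPEG-1/MPEG-2 video.

```python
def find_pt_conflicts(configs):
    """Return payload types that different participants map to different encodings.

    configs: participant -> {payload type: encoding name}
    """
    usage = {}  # payload type -> {encoding name -> [participants]}
    for participant, pt_map in configs.items():
        for pt, encoding in pt_map.items():
            # Media type names are case-insensitive, so normalize before comparing.
            usage.setdefault(pt, {}).setdefault(encoding.upper(), []).append(participant)
    return {pt: encodings for pt, encodings in usage.items() if len(encodings) > 1}


configs = {
    "A": {96: "opus", 97: "H264"},
    "B": {96: "OPUS", 97: "MPV"},  # MPV: MPEG-1/MPEG-2 video
}
print(find_pt_conflicts(configs))  # {97: {'H264': ['A'], 'MPV': ['B']}}
```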
Security solutions for this type of group communications are also Security solutions for this type of group communications are also
challenging. First, the key-management and the security protocol challenging. First, the key-management and the security protocol
must support group communication. Source authentication becomes more must support group communication. Source authentication becomes more
difficult and requires special solutions. For more discussion on difficult and requires special solutions. For more discussion on
this, see Options for Securing RTP Sessions this, see Options for Securing RTP Sessions [RFC7201].
[I-D.ietf-avtcore-rtp-security-options].
3.3.3. SSM with Local Unicast Resources 3.3.3. SSM with Local Unicast Resources
[RFC6285] "Unicast-Based Rapid Acquisition of Multicast RTP Sessions" [RFC6285] "Unicast-Based Rapid Acquisition of Multicast RTP Sessions"
defines additional extensions to the SSM topology. defines additional extensions to the SSM topology.
----------- -------------- ----------- --------------
| |------------------------------------>| | | |------------------------------------>| |
| |.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.->| | | |.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.->| |
| | | | | | | |
skipping to change at page 14, line 31 skipping to change at page 14, line 46
Figure 7 Figure 7
The Rapid acquisition extension allows an endpoint joining an SSM The Rapid acquisition extension allows an endpoint joining an SSM
multicast session to request media starting with the last sync-point multicast session to request media starting with the last sync-point
(from where media can be decoded without requiring context (from where media can be decoded without requiring context
established by the decoding of prior packets) to be sent at high established by the decoding of prior packets) to be sent at high
speed until, after decoding of these burst-delivered speed until, after decoding of these burst-delivered
media packets, the correct media timing is established, i.e., media media packets, the correct media timing is established, i.e., media
packets are received within adequate buffer intervals for this packets are received within adequate buffer intervals for this
application. This is accomplished by first establishing a unicast application. This is accomplished by first establishing a unicast
PtP RTP session between the Burst/Retransmission Source (BRS, Figure PtP RTP session between the Burst/Retransmission Source (BRS,
7) and the RTP Receiver. The unicast session is used to transmit Figure 7) and the RTP Receiver. The unicast session is used to
cached packets from the multicast group at higher than normal speed transmit cached packets from the multicast group at higher than
in order to synchronize the receiver to the ongoing multicast packet normal speed in order to synchronize the receiver to the ongoing
flow. Once the RTP receiver and its decoder have caught up with the multicast packet flow. Once the RTP receiver and its decoder have
multicast session's current delivery, the receiver switches over to caught up with the multicast session's current delivery, the receiver
receiving directly from the multicast group. The (still existing) switches over to receiving directly from the multicast group. The
PtP RTP session is, in many deployed applications, used as a (still existing) PtP RTP session is, in many deployed applications,
repair channel, i.e., for RTP Retransmission traffic of those packets used as a repair channel, i.e., for RTP Retransmission traffic of
that were not received from the multicast group. those packets that were not received from the multicast group.
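How long the burst lasts can be estimated with simple arithmetic. The sketch below uses a simplified model that is an assumption, not something [RFC6285] prescribes: the BRS sends cached media at a constant multiple of the nominal rate, the join happens a known time after the last sync-point, and round-trip time, request processing and decoder buffering are ignored.

```python
def catch_up_time(backlog_s, speedup):
    """Seconds of bursting until the receiver reaches the live multicast point.

    backlog_s: media duration between the last sync-point and the live point
    speedup:   burst rate as a multiple of the nominal media rate
    After t seconds the burst has delivered speedup * t seconds of media,
    while the live point has moved on by t, so catch-up needs
    speedup * t = backlog_s + t.
    """
    if speedup <= 1.0:
        raise ValueError("the burst must be faster than real time to catch up")
    return backlog_s / (speedup - 1.0)


def burst_size_bytes(backlog_s, speedup, nominal_bps):
    """Bytes sent over the unicast session during the burst."""
    return speedup * catch_up_time(backlog_s, speedup) * nominal_bps / 8


print(catch_up_time(2.0, 1.5))                # 4.0
print(burst_size_bytes(2.0, 1.5, 4_000_000))  # 3000000.0
```

Under this model, halving the burst speedup margin doubles the catch-up time, which is why the burst rate is a trade-off between join latency and the load on the path to the receiver.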
3.4. Point to Multipoint Using Mesh 3.4. Point to Multipoint Using Mesh
Shortcut name: Topo-Mesh Shortcut name: Topo-Mesh
+---+ +---+ +---+ +---+
| A |<---->| B | | A |<---->| B |
+---+ +---+ +---+ +---+
^ ^ ^ ^
\ / \ /
\ / \ /
v v v v
+---+ +---+
| C | | C |
+---+ +---+
Figure 8: Point to Multi-Point using Mesh Figure 8: Point to Multi-Point using Mesh
Based on the RTP session definition, it is clearly possible to have a Based on the RTP session definition, it is clearly possible to have a
joint RTP session over multiple unicast transport flows like the joint RTP session over multiple unicast transport flows like the
three-endpoint session depicted above. In this case, A needs to send three-endpoint session depicted above. In this case, A needs to send
its media streams and RTCP packets to both B and C over their its media streams and RTCP packets to both B and C over their
respective transport flows. As long as all participants do the same, respective transport flows. As long as all participants do the same,
everyone will have a joint view of the RTP session. everyone will have a joint view of the RTP session.
This does not create any additional requirements beyond the need to This does not create any additional requirements beyond the need to
have multiple transport flows associated with a single RTP session. have multiple transport flows associated with a single RTP session.
Note that an endpoint may use a single local port to receive all Note that an endpoint may use a single local port to receive all
these transport flows, or it might have separate local reception these transport flows, or it might have separate local reception
ports for each of the endpoints. ports for each of the endpoints.
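The cost of the mesh follows directly from this: with N endpoints there are N(N-1)/2 unicast transport flows, and every sender transmits N-1 copies of each packet. A minimal sketch, with endpoint names as placeholders:

```python
from itertools import combinations


def mesh_flows(endpoints):
    """One unicast transport flow per pair of endpoints."""
    return list(combinations(sorted(endpoints), 2))


def destinations(endpoints, sender):
    """In a joint RTP session, a sender's RTP and RTCP go to every other endpoint."""
    return [e for e in endpoints if e != sender]


endpoints = ["A", "B", "C"]
print(mesh_flows(endpoints))         # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(destinations(endpoints, "A"))  # ['B', 'C']
```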
+-A--------------------+ +-B-----------+ +-A--------------------+
|+---+ | | | |+---+ |
||CAM| | | | ||CAM| | +-B-----------+
|+---+ +-UDP1------| |-UDP1------+ | |+---+ +-UDP1------| |-UDP1------+ |
| | | +-RTP1----| |-RTP1----+ | | | | | +-RTP1----| |-RTP1----+ | |
| V | | +-Video-| |-Video-+ | | | | V | | +-Video-| |-Video-+ | | |
|+----+ | | | |<----------------|BV1 | | | | |+----+ | | | |<----------------|BV1 | | | |
||ENC |----+-+-+--->AV1|---------------->| | | | | ||ENC |----+-+-+--->AV1|---------------->| | | | |
|+----+ | | +-------| |-------+ | | | |+----+ | | +-------| |-------+ | | |
| | | +---------| |---------+ | | | | | +---------| |---------+ | |
| | +-----------| |-----------+ | | | +-----------| |-----------+ |
| | ------------| |------------ | | | | +-------------+
| | | |-------------+
| | | | | |
| | | +-C-----------+ | | | +-C-----------+
| | | | |
| | +-UDP2------| |-UDP2------+ | | | +-UDP2------| |-UDP2------+ |
| | | +-RTP1----| |-RTP1----+ | | | | | +-RTP1----| |-RTP1----+ | |
| | | | +-Video-| |-Video-+ | | | | | | | +-Video-| |-Video-+ | | |
| +-------+-+-+--->AV1|---------------->| | | | | | +-------+-+-+--->AV1|---------------->| | | | |
| | | | |<----------------|CV1 | | | | | | | | |<----------------|CV1 | | | |
| | | +-------| |-------+ | | | | | | +-------| |-------+ | | |
| | +---------| |---------+ | | | | +---------| |---------+ | |
| +-----------| |-----------+ | | +-----------| |-----------+ |
| ------------| |------------ |
+----------------------+ +-------------+ +----------------------+ +-------------+
Figure 9: A Multi-unicast Mesh with a joint RTP session Figure 9: A Multi-unicast Mesh with a joint RTP session
From A's perspective, the joint RTP session of the Mesh depicted in From A's perspective, the joint RTP session of the Mesh depicted in
Figure 8 has multiple transport flows, here enumerated as UDP1 and Figure 8 has multiple transport flows, here enumerated as UDP1 and
UDP2. However, there is only one RTP session (RTP1). The media UDP2. However, there is only one RTP session (RTP1). The media
source (CAM) is encoded and transmitted over the SSRC (AV1) across source (CAM) is encoded and transmitted over the SSRC (AV1) across
both transport flows. However, as this is a joint RTP session, the both transport flows. However, as this is a joint RTP session, the
two streams must be the same. Thus, a congestion two streams must be the same. Thus, a congestion
skipping to change at page 17, line 47 skipping to change at page 18, line 31
multipoint of Translators compared to the point to point only cases multipoint of Translators compared to the point to point only cases
in Section 3.2.1. in Section 3.2.1.
3.5.1. Relay - Transport Translator 3.5.1. Relay - Transport Translator
Shortcut name: Topo-PtM-Trn-Translator Shortcut name: Topo-PtM-Trn-Translator
This section discusses how Transport Translators alone can be used to This section discusses how Transport Translators alone can be used to
enable multipoint sessions. enable multipoint sessions.
+-----+ +-----+
+---+ / \ +------------+ +---+ +---+ / \ +------------+ +---+
| A |<---/ \ | |<---->| B | | A |<---/ \ | |<---->| B |
+---+ / Multi- \ | | +---+ +---+ / Multi- \ | | +---+
+ cast +->| Translator | + cast +->| Translator |
+---+ \ Network / | | +---+
+---+ \ Network / | | +---+ | C |<---\ / | |<---->| D |
| C |<---\ / | |<---->| D | +---+ \ / +------------+ +---+
+---+ \ / +------------+ +---+ +-----+
+-----+
Figure 11: Point to Multipoint Using Multicast Figure 11: Point to Multipoint Using Multicast
Figure 11 depicts an example of a Transport Translator performing at Figure 11 depicts an example of a Transport Translator performing at
least IP address translation. It allows the (non-multicast-capable) least IP address translation. It allows the (non-multicast-capable)
participants B and D to take part in an any source multicast session participants B and D to take part in an any source multicast session
by having the Translator forward their unicast traffic to the by having the Translator forward their unicast traffic to the
multicast addresses in use, and vice versa. It must also forward B's multicast addresses in use, and vice versa. It must also forward B's
traffic to D, and vice versa, to provide each of B and D with a traffic to D, and vice versa, to provide each of B and D with a
complete view of the session. complete view of the session.
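The forwarding rule for Figure 11 can be written down as a small decision function. This is a sketch under the assumption that the Translator classifies each incoming packet either as arriving from the multicast group or from a named unicast peer; the label `ASM_GROUP` and the function name are hypothetical.

```python
ASM_GROUP = "asm-group"  # hypothetical label for the multicast side (A and C)


def translator_targets(arrived_from, unicast_peers):
    """Where the Translator forwards a packet (RTP or RTCP).

    arrived_from: ASM_GROUP for traffic received from the multicast group,
                  otherwise the name of the unicast peer that sent it.
    """
    if arrived_from == ASM_GROUP:
        # A and C already reach each other natively; only the unicast peers need a copy.
        return list(unicast_peers)
    # Unicast traffic goes to the group and to every other unicast peer.
    return [ASM_GROUP] + [p for p in unicast_peers if p != arrived_from]


peers = ["B", "D"]
print(translator_targets(ASM_GROUP, peers))  # ['B', 'D']
print(translator_targets("B", peers))        # ['asm-group', 'D']
```

The second branch is what gives B and D a complete view: without forwarding B's traffic to D (and vice versa), each would only see the multicast participants.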
+---+ +------------+ +---+ +---+ +------------+ +---+
| A |<---->| |<---->| B | | A |<---->| |<---->| B |
+---+ | | +---+ +---+ | | +---+
| Translator | | Translator |
+---+ | | +---+ +---+ | | +---+
| C |<---->| |<---->| D | | C |<---->| |<---->| D |
+---+ +------------+ +---+ +---+ +------------+ +---+
Figure 12: RTP Translator (Relay) with Only Unicast Paths Figure 12: RTP Translator (Relay) with Only Unicast Paths
Another Translator scenario is depicted in Figure 12. The Translator Another Translator scenario is depicted in Figure 12. The Translator
in this case connects multiple users of a conference through unicast. in this case connects multiple users of a conference through unicast.
This can be implemented using a very simple transport Translator This can be implemented using a very simple transport Translator
which, in this document, is called a relay. The relay forwards all which, in this document, is called a relay. The relay forwards all
traffic it receives, both RTP and RTCP, to all other participants. traffic it receives, both RTP and RTCP, to all other participants.
In doing so, a multicast network is emulated without relying on a In doing so, a multicast network is emulated without relying on a
multicast-capable network infrastructure. multicast-capable network infrastructure.
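A minimal sketch of the relay behavior, assuming participants are identified by name (in practice by transport address): every packet goes to everyone except its sender, so each participant ends up hearing every other participant, just as on a multicast group, at the cost of N(N-1) packets leaving the relay per round in which each of the N participants sends one packet.

```python
def relay_targets(sender, participants):
    """A relay forwards every RTP and RTCP packet to all participants except its sender."""
    return [p for p in participants if p != sender]


def emulated_views(participants):
    """Which senders each participant hears if everyone sends one packet."""
    heard = {p: set() for p in participants}
    for sender in participants:
        for receiver in relay_targets(sender, participants):
            heard[receiver].add(sender)
    return heard


participants = ["A", "B", "C", "D"]
views = emulated_views(participants)
print(sorted(views["A"]))  # ['B', 'C', 'D']
print(sum(len(relay_targets(p, participants)) for p in participants))  # 12
```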
skipping to change at page 19, line 29 skipping to change at page 20, line 13
other participants to reach B without overloading the path. This other participants to reach B without overloading the path. This
transcoding can help the other participants in the Multicast part of transcoding can help the other participants in the Multicast part of
the session, by not requiring the quality transmitted by A to be the session, by not requiring the quality transmitted by A to be
lowered to the bitrates that B is actually capable of receiving. lowered to the bitrates that B is actually capable of receiving.
3.6. Point to Multipoint Using the RFC 3550 Mixer Model 3.6. Point to Multipoint Using the RFC 3550 Mixer Model
Shortcut name: Topo-Mixer Shortcut name: Topo-Mixer
A Mixer is a middlebox that aggregates multiple RTP streams that are A Mixer is a middlebox that aggregates multiple RTP streams that are
part of a session by generating a new RTP stream and, in most cases, part of a session by generating one or more new RTP streams and, in
by manipulating the media data. One common application for a Mixer most cases, by manipulating the media data. One common application
is to allow a participant to receive a session with a reduced amount for a Mixer is to allow a participant to receive a session with a
of resources. reduced amount of resources.
+-----+ +-----+
+---+ / \ +-----------+ +---+ +---+ / \ +-----------+ +---+
| A |<---/ \ | |<---->| B | | A |<---/ \ | |<---->| B |
+---+ / Multi- \ | | +---+ +---+ / Multi- \ | | +---+
+ cast +->| Mixer | + cast +->| Mixer |
+---+ \ Network / | | +---+ +---+ \ Network / | | +---+
| C |<---\ / | |<---->| D | | C |<---\ / | |<---->| D |
+---+ \ / +-----------+ +---+ +---+ \ / +-----------+ +---+
+-----+ +-----+
Figure 13: Point to Multipoint Using the RFC 3550 Mixer Model Figure 13: Point to Multipoint Using the RFC 3550 Mixer Model
A Mixer can be viewed as a device terminating the media streams A Mixer can be viewed as a device terminating the media streams
received from other session participants. Using the media data from received from other session participants. Using the media data from
the received media streams, a Mixer generates a media stream that is the received media streams, a Mixer generates media streams that are
sent to the session participant. sent to the session participant.
The content that the Mixer provides is the mixed aggregate of what The content that the Mixer provides is the mixed aggregate of what
the Mixer receives over the PtP or PtM paths, which are part of the the Mixer receives over the PtP or PtM paths, which are part of the
same conference session. same conference session.
The Mixer is the content source, as it mixes the content (often in The Mixer is the content source, as it mixes the content (often in
the uncompressed domain) and then encodes it for transmission to a the uncompressed domain) and then encodes it for transmission to a
participant. The CSRC Count (CC) and CSRC fields in the RTP header participant. The CSRC Count (CC) and CSRC fields in the RTP header
can be used to indicate the contributors to the newly generated can be used to indicate the contributors to the newly generated
skipping to change at page 20, line 32 skipping to change at page 21, line 14
The Mixer is responsible for generating RTCP packets in accordance The Mixer is responsible for generating RTCP packets in accordance
with its role. It is a receiver and should therefore send receiver with its role. It is a receiver and should therefore send receiver
reports for the media streams it receives. In its role as a media reports for the media streams it receives. In its role as a media
sender, it should also generate sender reports for those media sender, it should also generate sender reports for those media
streams it sends. As specified in Section 7.3 of RFC 3550, a Mixer streams it sends. As specified in Section 7.3 of RFC 3550, a Mixer
must not forward RTCP unaltered between the two domains. must not forward RTCP unaltered between the two domains.
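In its role as a media sender, the Mixer emits RTP packets under its own SSRC, and the CC and CSRC fields can identify the contributing sources. The sketch below packs the fixed RTP header of RFC 3550 followed by the CSRC list; the SSRC values, payload type and timestamp are made up for the example, and padding, header extensions and the payload itself are omitted.

```python
import struct


def rtp_header(ssrc, seq, timestamp, payload_type, csrcs=(), marker=False):
    """Pack an RFC 3550 fixed RTP header followed by the CSRC list.

    The Mixer puts its own SSRC in the SSRC field and may list the SSRCs of
    the contributing sources in the CSRC list; CC holds their number.
    """
    if len(csrcs) > 15:
        raise ValueError("CC is a 4-bit field: at most 15 CSRCs")
    byte0 = (2 << 6) | len(csrcs)                       # V=2, P=0, X=0, CC
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)  # M, PT
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + b"".join(struct.pack("!I", c) for c in csrcs)


# Hypothetical SSRC values: the Mixer's own SSRC and two contributors.
MIXER_SSRC, CONTRIB_1, CONTRIB_2 = 0x4D495831, 0x00000B01, 0x00000C01
hdr = rtp_header(MIXER_SSRC, seq=1, timestamp=160, payload_type=0,
                 csrcs=[CONTRIB_1, CONTRIB_2])
print(len(hdr), hex(hdr[0]))  # 20 0x82
```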
The Mixer depicted in Figure 13 is involved in three domains that The Mixer depicted in Figure 13 is involved in three domains that
need to be separated: the any source multicast network (including need to be separated: the any source multicast network (including
participants A and C), participant B, and participant D. Assuming all participants A and C), participant B, and participant D. Assuming
four participants in the conference are interested in receiving all four participants in the conference are interested in receiving
content from each other participant, the Mixer produces different content from each other participant, the Mixer produces different
mixed streams for B and D, as the one to B may contain content mixed streams for B and D, as the one to B may contain content
received from D, and vice versa. However, the Mixer may only need received from D, and vice versa. However, the Mixer may only need
one SSRC per media type in each domain where it is the receiving one SSRC per media type in each domain where it is the receiving
entity and transmitter of mixed content. entity and transmitter of mixed content.
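Which content goes into the mix sent to each domain can be expressed as a set difference: each domain receives everything except what came from that domain. A sketch for the three domains of Figure 13, with domain and participant names as placeholders:

```python
def mix_contributors(domains):
    """For each domain, the participants whose media goes into the mix sent there.

    domains: domain name -> set of participants reached through that domain.
    A domain never gets its own participants' media back from the Mixer.
    """
    everyone = set().union(*domains.values())
    return {name: everyone - members for name, members in domains.items()}


domains = {"multicast": {"A", "C"}, "B": {"B"}, "D": {"D"}}
for name, sources in mix_contributors(domains).items():
    print(name, sorted(sources))
# multicast ['B', 'D']
# B ['A', 'C', 'D']
# D ['A', 'B', 'C']
```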
In the multicast domain, a Mixer still needs to provide a mixed view In the multicast domain, a Mixer still needs to provide a mixed view
of the other domains. This makes the Mixer simpler to implement and of the other domains. This makes the Mixer simpler to implement and
avoids any issues with advanced RTCP handling or loop detection, avoids any issues with advanced RTCP handling or loop detection,
which would be problematic if the Mixer were providing non-symmetric which would be problematic if the Mixer were providing non-symmetric
skipping to change at page 21, line 14 skipping to change at page 22, line 5
participants in the other domain(s). In other cases, a message is participants in the other domain(s). In other cases, a message is
handled by the Mixer itself and therefore not forwarded to any other handled by the Mixer itself and therefore not forwarded to any other
domain. domain.
When replacing the multicast network in Figure 13 (to the left of the When replacing the multicast network in Figure 13 (to the left of the
Mixer) with individual unicast paths as depicted in Figure 14, the Mixer) with individual unicast paths as depicted in Figure 14, the
Mixer model is very similar to the one discussed in Section 3.9 Mixer model is very similar to the one discussed in Section 3.9
below. Please see the discussion in Section 3.9 about the below. Please see the discussion in Section 3.9 about the
differences between these two models. differences between these two models.
+---+ +------------+ +---+ +---+ +------------+ +---+
| A |<---->| |<---->| B | | A |<---->| |<---->| B |
+---+ | | +---+ +---+ | | +---+
| Mixer | | Mixer |
+---+ | | +---+ +---+ | | +---+
| C |<---->| |<---->| D | | C |<---->| |<---->| D |
+---+ +------------+ +---+ +---+ +------------+ +---+
Figure 14: RTP Mixer with Only Unicast Paths Figure 14: RTP Mixer with Only Unicast Paths
We now discuss in more detail the different mixing operations that a We now discuss in more detail the different mixing operations that a
mixer can perform and how they can affect RTP and RTCP behavior. mixer can perform and how they can affect RTP and RTCP behavior.
3.6.1. Media Mixing 3.6.1. Media Mixing
The media mixing mixer is likely the one most people think of when they The media mixing mixer is likely the one most people think of when they
hear the term "mixer". Its basic mode of operation is that it hear the term "mixer". Its basic mode of operation is that it
skipping to change at page 22, line 46 skipping to change at page 24, line 5
annoyingly noticeable in case of video, or in case of audio if that annoyingly noticeable in case of video, or in case of audio if that
mixed audio is lip-synchronized with high-latency video. The mixed audio is lip-synchronized with high-latency video. The
advantage of media mixing is that it is straightforward for the advantage of media mixing is that it is straightforward for the
clients to handle the single media stream (which includes the mixed clients to handle the single media stream (which includes the mixed
aggregate of many sources), as they don't need to handle multiple aggregate of many sources), as they don't need to handle multiple
decodings, local mixing and composition. In fact, mixers were decodings, local mixing and composition. In fact, mixers were
introduced in pre-RTP times so that legacy, single stream receiving introduced in pre-RTP times so that legacy, single stream receiving
endpoints could successfully participate in what a user would endpoints could successfully participate in what a user would
recognize as a multiparty video conference. recognize as a multiparty video conference.
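For audio, the per-peer mixes in Figure 15 (B+C towards A, A+C towards B, A+B towards C) can be produced by summing all decoded signals once and subtracting each peer's own contribution. The sketch assumes decoded, time-aligned 16-bit linear PCM frames of equal length; real mixers also deal with jitter buffering, resampling and gain control, which are left out here.

```python
def mix_minus(frames):
    """Per-participant audio mix that excludes the participant's own signal.

    frames: participant -> equally long list of decoded 16-bit PCM samples
            covering the same time interval.
    """
    total = [sum(column) for column in zip(*frames.values())]
    return {
        # Subtract the participant's own samples, then saturate to 16 bits.
        who: [max(-32768, min(32767, t - s)) for t, s in zip(total, own)]
        for who, own in frames.items()
    }


frames = {"A": [1000, 20000], "B": [500, 20000], "C": [-200, 0]}
mixes = mix_minus(frames)
print(mixes["A"])  # B + C: [300, 20000]
print(mixes["C"])  # A + B: [1500, 32767] (40000 clipped)
```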
+-A---------+ +-MIXER----------------------+ +-A---------+ +-MIXER----------------------+
| +-RTP1----| |-RTP1------+ +-----+ | | +-RTP1----| |-RTP1------+ +-----+ |
| | +-Audio-| |-Audio---+ | +---+ | | | | | +-Audio-| |-Audio---+ | +---+ | | |
| | | AA1|--------->|---------+-+-|DEC|->| | | | | | AA1|--------->|---------+-+-|DEC|->| | |
| | | |<---------|MA1 <----+ | +---+ | | | | | | |<---------|MA1 <----+ | +---+ | | |
| | | | |(BA1+CA1)|\| +---+ | | | | | | | |(BA1+CA1)|\| +---+ | | |
| | +-------| |---------+ +-|ENC|<-| B+C | | | | +-------| |---------+ +-|ENC|<-| B+C | |
| +---------| |-----------+ +---+ | | | | +---------| |-----------+ +---+ | | |
+-----------+ | | | | +-----------+ | | | |
| | M | | | | M | |
+-B---------+ | | E | | +-B---------+ | | E | |
| +-RTP2----| |-RTP2------+ | D | | | +-RTP2----| |-RTP2------+ | D | |
| | +-Audio-| |-Audio---+ | +---+ | I | | | | +-Audio-| |-Audio---+ | +---+ | I | |
| | | BA1|--------->|---------+-+-|DEC|->| A | | | | | BA1|--------->|---------+-+-|DEC|->| A | |
| | | |<---------|MA2 <----+ | +---+ | | | | | | |<---------|MA2 <----+ | +---+ | | |
| | +-------| |(BA1+CA1)|\| +---+ | | | | | +-------| |(BA1+CA1)|\| +---+ | | |
| +---------| |---------+ +-|ENC|<-| A+C | | | +---------| |---------+ +-|ENC|<-| A+C | |
+-----------+ |-----------+ +---+ | | | +-----------+ |-----------+ +---+ | | |
| | M | | | | M | |
+-C---------+ | | I | | +-C---------+ | | I | |
| +-RTP3----| |-RTP3------+ | X | | | +-RTP3----| |-RTP3------+ | X | |
| | +-Audio-| |-Audio---+ | +---+ | E | | | | +-Audio-| |-Audio---+ | +---+ | E | |
| | | CA1|--------->|---------+-+-|DEC|->| R | | | | | CA1|--------->|---------+-+-|DEC|->| R | |
| | | |<---------|MA3 <----+ | +---+ | | | | | | |<---------|MA3 <----+ | +---+ | | |
| | +-------| |(BA1+CA1)|\| +---+ | | | | | +-------| |(BA1+CA1)|\| +---+ | | |
| +---------| |---------+ +-|ENC|<-| A+B | | | +---------| |---------+ +-|ENC|<-| A+B | |
+-----------+ |-----------+ +---+ +-----+ | +-----------+ |-----------+ +---+ +-----+ |
+----------------------------+ +----------------------------+
Figure 15: Session and SSRC details for Media Mixer Figure 15: Session and SSRC details for Media Mixer
From an RTP perspective media mixing can be a very simple process, as From an RTP perspective media mixing can be a very simple process, as
can be seen in Figure 15. The mixer presents one SSRC towards the can be seen in Figure 15. The mixer presents one SSRC towards the
receiving client, e.g., MA1 to Peer A, where the associated stream is receiving client, e.g., MA1 to Peer A, where the associated stream is
the media mix of the other participants. As each peer, in this the media mix of the other participants. As each peer, in this
example, receives a different version of a mix from the mixer, there example, receives a different version of a mix from the mixer, there
is no actual relation between the different RTP sessions in terms of is no actual relation between the different RTP sessions in terms of
actual media or transport level information. There are, however, actual media or transport level information. There are, however,
skipping to change at page 25, line 5 skipping to change at page 26, line 13
same numbering for a given configuration. This also requires that same numbering for a given configuration. This also requires that
the different endpoints support a common set of codecs, otherwise the different endpoints support a common set of codecs, otherwise
media transcoding for codec compatibility would still be required. media transcoding for codec compatibility would still be required.
We now consider the operation of a media switching mixer that We now consider the operation of a media switching mixer that
supports a video conference with six participants (A-F) where the two supports a video conference with six participants (A-F) where the two
most recent speakers in the conference are shown to each participant. most recent speakers in the conference are shown to each participant.
The mixer thus has two SSRCs sending video to each peer, and each The mixer thus has two SSRCs sending video to each peer, and each
peer is capable of locally handling two video streams simultaneously. peer is capable of locally handling two video streams simultaneously.
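The selection logic can be sketched as follows, assuming speech activity is reported by some voice activity detector and that a peer is never shown its own video. Each slot stands for one of the mixer's outgoing SSRCs towards that peer (e.g., MV1 and MV2 towards A in Figure 16); keeping still-selected sources in their slot is a design choice assumed here so that a speaker does not jump between SSRCs. Class and method names are hypothetical.

```python
class SpeakerSwitch:
    """Forward the `slots` most recent speakers to each peer, never the peer itself."""

    def __init__(self, slots=2):
        self.slots = slots
        self.history = []    # participant IDs, most recent speaker first
        self.forwarded = {}  # peer -> source shown in each slot (None = idle)

    def on_speech(self, speaker):
        """Record speech activity, e.g. reported by a voice activity detector."""
        if speaker in self.history:
            self.history.remove(speaker)
        self.history.insert(0, speaker)

    def update(self, peer):
        """Recompute the peer's slots; still-selected sources keep their slot."""
        selected = [s for s in self.history if s != peer][: self.slots]
        previous = self.forwarded.get(peer, [None] * self.slots)
        slots = [s if s in selected else None for s in previous]
        newcomers = [s for s in selected if s not in slots]
        for i, current in enumerate(slots):
            if current is None and newcomers:
                slots[i] = newcomers.pop(0)
        self.forwarded[peer] = slots
        return slots


switch = SpeakerSwitch()
switch.on_speech("E")
switch.on_speech("B")
print(switch.update("A"))  # ['B', 'E']
print(switch.update("B"))  # ['E', None]
switch.on_speech("C")
print(switch.update("A"))  # ['B', 'C']
```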
+-A---------+ +-MIXER----------------------+ +-A---------+ +-MIXER----------------------+
| +-RTP1----| |-RTP1------+ +-----+ | | +-RTP1----| |-RTP1------+ +-----+ |
| | +-Video-| |-Video---+ | | | | | | +-Video-| |-Video---+ | | | |
| | | AV1|------------>|---------+-+------->| S | | | | | AV1|------------>|---------+-+------->| S | |
| | | |<------------|MV1 <----+-+-BV1----| W | | | | | |<------------|MV1 <----+-+-BV1----| W | |
| | | |<------------|MV2 <----+-+-EV1----| I | | | | | |<------------|MV2 <----+-+-EV1----| I | |
| | +-------| |---------+ | | T | | | | +-------| |---------+ | | T | |
| +---------| |-----------+ | C | | | +---------| |-----------+ | C | |
+-----------+ | | H | | +-----------+ | | H | |
| | | | | | | |
+-B---------+ | | M | | +-B---------+ | | M | |
| +-RTP2----| |-RTP2------+ | A | | | +-RTP2----| |-RTP2------+ | A | |
| | +-Video-| |-Video---+ | | T | | | | +-Video-| |-Video---+ | | T | |
| | | BV1|------------>|---------+-+------->| R | | | | | BV1|------------>|---------+-+------->| R | |
| | | |<------------|MV3 <----+-+-AV1----| I | | | | | |<------------|MV3 <----+-+-AV1----| I | |
| | | |<------------|MV4 <----+-+-EV1----| X | | | | | |<------------|MV4 <----+-+-EV1----| X | |
| | +-------| |---------+ | | | | | | +-------| |---------+ | | | |
| +---------| |-----------+ | | | | +---------| |-----------+ | | |
+-----------+ | | | | +-----------+ | | | |
: : : : : : : :
: : : : : : : :
+-F---------+ | | | | +-F---------+ | | | |
| +-RTP6----| |-RTP6------+ | | | | +-RTP6----| |-RTP6------+ | | |
| | +-Video-| |-Video---+ | | | | | | +-Video-| |-Video---+ | | | |
| | | CV1|------------>|---------+-+------->| | | | | | CV1|------------>|---------+-+------->| | |
| | | |<------------|MV11 <---+-+-AV1----| | | | | | |<------------|MV11 <---+-+-AV1----| | |
| | | |<------------|MV12 <---+-+-EV1----| | | | | | |<------------|MV12 <---+-+-EV1----| | |
| | +-------| |---------+ | | | | | | +-------| |---------+ | | | |
| +---------| |-----------+ +-----+ | | +---------| |-----------+ +-----+ |
+-----------+ +----------------------------+ +-----------+ +----------------------------+
Figure 16: Media Switching RTP Mixer Figure 16: Media Switching RTP Mixer
The Media Switching RTP mixer can, similarly to the Media Mixing The Media Switching RTP mixer can, similarly to the Media Mixing
Mixer, reduce the bit-rate required for media transmission towards Mixer, reduce the bit-rate required for media transmission towards
the different peers by selecting and forwarding only a sub-set of RTP the different peers by selecting and forwarding only a sub-set of RTP
media streams it receives from the conference participants. In cases media streams it receives from the conference participants. In cases
where the mixer receives simulcast transmissions or a scalable encoding of where the mixer receives simulcast transmissions or a scalable encoding of
the media source, the mixer has more degrees of freedom to select the media source, the mixer has more degrees of freedom to select
streams or sub-sets of streams to forward to a receiver, both based on streams or sub-sets of streams to forward to a receiver, both based on
skipping to change at page 26, line 25 skipping to change at page 28, line 5
3.7. Selective Forwarding Middlebox 3.7. Selective Forwarding Middlebox
Another method for handling media in the RTP mixer is to "project", Another method for handling media in the RTP mixer is to "project",
or make available, all potential RTP sources (SSRCs) into a per- or make available, all potential RTP sources (SSRCs) into a per-
endpoint, independent RTP session. The middlebox can select which of endpoint, independent RTP session. The middlebox can select which of
the potential sources that are currently actively transmitting media the potential sources that are currently actively transmitting media
will be sent to each of the endpoints. This is similar to the media will be sent to each of the endpoints. This is similar to the media
switching Mixer but has some important differences in RTP details. switching Mixer but has some important differences in RTP details.
+-A---------+ +-Middlebox-----------------+ +-A---------+ +-Middlebox-----------------+
| +-RTP1----| |-RTP1------+ +-----+ | | +-RTP1----| |-RTP1------+ +-----+ |
| | +-Video-| |-Video---+ | | | | | | +-Video-| |-Video---+ | | | |
| | | AV1|------------>|---------+-+------>| | | | | | AV1|------------>|---------+-+------>| | |
| | | |<------------|BV1 <----+-+-------| S | | | | | |<------------|BV1 <----+-+-------| S | |
| | | |<------------|CV1 <----+-+-------| W | | | | | |<------------|CV1 <----+-+-------| W | |
| | | |<------------|DV1 <----+-+-------| I | | | | | |<------------|DV1 <----+-+-------| I | |
| | | |<------------|EV1 <----+-+-------| T | | | | | |<------------|EV1 <----+-+-------| T | |
| | | |<------------|FV1 <----+-+-------| C | | | | | |<------------|FV1 <----+-+-------| C | |
| | +-------| |---------+ | | H | | | | +-------| |---------+ | | H | |
| +---------| |-----------+ | | | | +---------| |-----------+ | | |
+-----------+ | | M | | +-----------+ | | M | |
| | A | | | | A | |
+-B---------+ | | T | | +-B---------+ | | T | |
| +-RTP2----| |-RTP2------+ | R | | | +-RTP2----| |-RTP2------+ | R | |
| | +-Video-| |-Video---+ | | I | | | | +-Video-| |-Video---+ | | I | |
| | | BV1|------------>|---------+-+------>| X | | | | | BV1|------------>|---------+-+------>| X | |
| | | |<------------|AV1 <----+-+-------| | | | | | |<------------|AV1 <----+-+-------| | |
| | | |<------------|CV1 <----+-+-------| | | | | | |<------------|CV1 <----+-+-------| | |
| | | | : : : |: : : : : : : : :| | | | | | | : : : |: : : : : : : : :| | |
| | | |<------------|FV1 <----+-+-------| | | | | | |<------------|FV1 <----+-+-------| | |
| | +-------| |---------+ | | | | | | +-------| |---------+ | | | |
| +---------| |-----------+ | | | | +---------| |-----------+ | | |
+-----------+ | | | | +-----------+ | | | |
: : : : : : : :
: : : : : : : :
+-F---------+ | | | | +-F---------+ | | | |
| +-RTP6----| |-RTP6------+ | | | | +-RTP6----| |-RTP6------+ | | |
| | +-Video-| |-Video---+ | | | | | | +-Video-| |-Video---+ | | | |
| | | FV1|------------>|---------+-+------>| | | | | | FV1|------------>|---------+-+------>| | |
| | | |<------------|AV1 <----+-+-------| | | | | | |<------------|AV1 <----+-+-------| | |
| | | | : : : |: : : : : : : : :| | | | | | | : : : |: : : : : : : : :| | |
| | | |<------------|EV1 <----+-+-------| | | | | | |<------------|EV1 <----+-+-------| | |
| | +-------| |---------+ | | | | | | +-------| |---------+ | | | |
| +---------| |-----------+ +-----+ | | +---------| |-----------+ +-----+ |
+-----------+ +---------------------------+ +-----------+ +---------------------------+
Figure 17: Selective Forwarding Middlebox Figure 17: Selective Forwarding Middlebox
In the six-participant conference depicted above (Figure 17), one In the six-participant conference depicted above (Figure 17), one
can see that end-point A is aware of five incoming SSRCs, BV1-FV1. can see that end-point A is aware of five incoming SSRCs, BV1-FV1.
If this middlebox is intended to behave like the Mixer in If this middlebox is intended to behave like the Mixer in
Section 3.6.2, which provides the end-points with the two Section 3.6.2, which provides the end-points with the two
most recently speaking end-points, then only two out of these five most recently speaking end-points, then only two out of these five
SSRCs need to transmit media to A concurrently. As the middlebox selects the SSRCs need to transmit media to A concurrently. As the middlebox selects the
source in the different RTP sessions that transmit media to the end- source in the different RTP sessions that transmit media to the end-
points, each RTP media stream requires some rewriting of RTP header points, each RTP media stream requires some rewriting of RTP header
fields when being projected from one session into another. In fields when being projected from one session into another. In
particular, the sequence number needs to be consecutively incremented particular, the sequence number needs to be consecutively incremented
based on the packets actually transmitted in each RTP session. based on the packets actually transmitted in each RTP session.
Therefore, the RTP sequence number offset will change each time a Therefore, the RTP sequence number offset will change each time a
source is turned on in an RTP session. The timestamp (possibly source is turned on in an RTP session. The timestamp (possibly
offset) stays the same. offset) stays the same.
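The sequence number handling described above can be illustrated with a minimal Python sketch. It is not taken from any SFM implementation: the class `SeqRewriter` and its interface are hypothetical, and it assumes packets of the forwarded source arrive in order. Each outgoing stream keeps its own counter, recomputes a 16-bit offset whenever a different source is switched in, and passes the RTP timestamp through unchanged.

```python
from __future__ import annotations


class SeqRewriter:
    """Hypothetical per-outgoing-stream RTP sequence number rewriter.

    Assumes packets of the currently forwarded source arrive in order;
    reordering and retransmission handling are out of scope here.
    """

    def __init__(self, initial_seq: int = 0) -> None:
        self.next_out = initial_seq & 0xFFFF   # next sequence number to emit
        self.offset = 0                        # 16-bit offset for current source
        self.current_ssrc: int | None = None   # source currently projected

    def forward(self, src_ssrc: int, in_seq: int, in_ts: int) -> tuple[int, int]:
        """Return (outgoing sequence number, outgoing timestamp)."""
        if src_ssrc != self.current_ssrc:
            # A new source is turned on in this session: choose the offset
            # so that its first packet continues the outgoing sequence.
            self.current_ssrc = src_ssrc
            self.offset = (self.next_out - in_seq) & 0xFFFF
        out_seq = (in_seq + self.offset) & 0xFFFF
        self.next_out = (out_seq + 1) & 0xFFFF
        # The timestamp is forwarded as-is (any timestamp offset omitted).
        return out_seq, in_ts
```

Because the offset stays fixed while the same source is forwarded, a gap in that source's sequence numbers remains visible to the receiver as packet loss.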
As the RTP sessions are independent, the SSRC numbers used can also As the RTP sessions are independent, the SSRC numbers used can also
skipping to change at page 29, line 19 skipping to change at page 31, line 4
contexts, are one point of difference. The other is how the contexts, are one point of difference. The other is how the
identification is performed, where the Mixer uses CSRC to provide identification is performed, where the Mixer uses CSRC to provide
information about what is included in a particular RTP packet stream information about what is included in a particular RTP packet stream
that represents a particular concept. Selective forwarding gets the that represents a particular concept. Selective forwarding gets the
source information through the SSRC, and instead has to use other source information through the SSRC, and instead has to use other
mechanisms to make clear the stream's current purpose. mechanisms to make clear the stream's current purpose.
3.8. Point to Multipoint Using Video Switching MCUs 3.8. Point to Multipoint Using Video Switching MCUs
Shortcut name: Topo-Video-switch-MCU Shortcut name: Topo-Video-switch-MCU
+---+ +------------+ +---+
+---+ +------------+ +---+ | A |------| Multipoint |------| B |
| A |------| Multipoint |------| B | +---+ | Control | +---+
+---+ | Control | +---+ | Unit |
| Unit | +---+ | (MCU) | +---+
+---+ | (MCU) | +---+ | C |------| |------| D |
| C |------| |------| D | +---+ +------------+ +---+
+---+ +------------+ +---+
Figure 18: Point to Multipoint Using a Video Switching MCU Figure 18: Point to Multipoint Using a Video Switching MCU
This PtM topology was popular in early implementations of multipoint This PtM topology was popular in early implementations of multipoint
videoconferencing systems due to its simplicity, and the videoconferencing systems due to its simplicity, and the
corresponding middlebox design has been known as a "video switching corresponding middlebox design has been known as a "video switching
MCU". The more complex RTCP-terminating MCUs, discussed in the next MCU". The more complex RTCP-terminating MCUs, discussed in the next
section, became the norm, however, when technology allowed section, became the norm, however, when technology allowed
implementations at acceptable costs. implementations at acceptable costs.
skipping to change at page 30, line 36 skipping to change at page 32, line 18
appropriate CSRC values. Second, the MCU needs to modify the RTCP appropriate CSRC values. Second, the MCU needs to modify the RTCP
RRs it forwards between the domains. As a result, it is recommended RRs it forwards between the domains. As a result, it is recommended
that one implement a centralized video switching conference using a that one implement a centralized video switching conference using a
Mixer according to RFC 3550, instead of the shortcut implementation Mixer according to RFC 3550, instead of the shortcut implementation
described here. described here.
3.9. Point to Multipoint Using RTCP-Terminating MCU 3.9. Point to Multipoint Using RTCP-Terminating MCU
Shortcut name: Topo-RTCP-terminating-MCU Shortcut name: Topo-RTCP-terminating-MCU
+---+ +------------+ +---+ +---+ +------------+ +---+
| A |<---->| Multipoint |<---->| B | | A |<---->| Multipoint |<---->| B |
+---+ | Control | +---+ +---+ | Control | +---+
| Unit | | Unit |
+---+ | (MCU) | +---+ +---+ | (MCU) | +---+
| C |<---->| |<---->| D | | C |<---->| |<---->| D |
+---+ +------------+ +---+ +---+ +------------+ +---+
Figure 19: Point to Multipoint Using Content Modifying MCUs Figure 19: Point to Multipoint Using Content Modifying MCUs
In this PtM scenario, each participant runs an RTP point-to-point In this PtM scenario, each participant runs an RTP point-to-point
session between itself and the MCU. This is a very commonly deployed session between itself and the MCU. This is a very commonly deployed
topology in multipoint video conferencing. The content that the MCU topology in multipoint video conferencing. The content that the MCU
provides to each participant is either: provides to each participant is either:
a. a selection of the content received from the other participants, a. a selection of the content received from the other participants,
or or
skipping to change at page 32, line 28 skipping to change at page 34, line 10
Which decomposition scheme is possible is highly dependent on the RTP Which decomposition scheme is possible is highly dependent on the RTP
session usage. It is not really feasible to decompose one logical session usage. It is not really feasible to decompose one logical
end-point into two different transport nodes in one RTP session. A end-point into two different transport nodes in one RTP session. A
third-party monitor would report such an attempt as two different third-party monitor would report such an attempt as two different
end-points with a CNAME collision. As a result, end-points with a CNAME collision. As a result,
a fully RTP-conformant decomposed endpoint is one where the a fully RTP-conformant decomposed endpoint is one where the
different decomposed parts use separate RTP sessions to send and/or different decomposed parts use separate RTP sessions to send and/or
receive media streams intended for them. receive media streams intended for them.
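One heuristic a third-party monitor might apply is sketched below in Python, assuming it has already extracted (SSRC, CNAME, source transport address) records from the RTCP SDES packets of a single RTP session; the record layout and the function name are illustrative assumptions. Any CNAME observed from more than one transport address is reported, which is how one logical endpoint split across two transport nodes within one session would appear.

```python
from collections import defaultdict


def find_cname_collisions(observations):
    """Return {cname: sorted addresses} for CNAMEs seen from >1 address.

    `observations` is an iterable of (ssrc, cname, (ip, port)) tuples, as a
    hypothetical monitor might collect them from one RTP session.
    """
    addresses = defaultdict(set)
    for _ssrc, cname, addr in observations:
        addresses[cname].add(addr)
    # Only CNAMEs that appear behind several transport addresses are flagged.
    return {cname: sorted(addrs)
            for cname, addrs in addresses.items() if len(addrs) > 1}
```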
+---------------------+ +---------------------+
| Endpoint A | | Endpoint A |
| Local Area Network | | Local Area Network |
| +------------+ | | +------------+ |
| +->| Audio |<+-RTP---\ | +->| Audio |<+-RTP---\
| | +------------+ | \ +------+ | | +------------+ | \ +------+
| | +------------+ | +-->| | | | +------------+ | +-->| |
| +->| Video |<+-RTP-------->| B | | +->| Video |<+-RTP-------->| B |
| | +------------+ | +-->| | | | +------------+ | +-->| |
| | +------------+ | / +------+ | | +------------+ | / +------+
| +->| Control |<+-SIP---/ | +->| Control |<+-SIP---/
| +------------+ | | +------------+ |
+---------------------+ +---------------------+
Figure 20: Split Component Endpoint Figure 20: Split Component Endpoint
In the above usage, let us assume that the different RTP sessions are In the above usage, let us assume that the different RTP sessions are
used for audio and video. The audio and video parts, however, use a used for audio and video. The audio and video parts, however, use a
common CNAME and also have a common clock to ensure that common CNAME and also have a common clock to ensure that
synchronization and clock drift handling work, despite the fact that synchronization and clock drift handling work, despite the fact that
the components are separated. Also, RTCP handling works correctly as the components are separated. Also, RTCP handling works correctly as
long as only one part of the split endpoint is part of each RTP long as only one part of the split endpoint is part of each RTP
session. That way any differences in the path between A's audio session. That way any differences in the path between A's audio
skipping to change at page 33, line 24 skipping to change at page 35, line 7
Shortcut name: Topo-Asymmetric Shortcut name: Topo-Asymmetric
It is theoretically possible to construct an MCU that is a Mixer in It is theoretically possible to construct an MCU that is a Mixer in
one direction and a Translator in another. The main reason to one direction and a Translator in another. The main reason to
consider this would be to allow topologies similar to Figure 13, consider this would be to allow topologies similar to Figure 13,
where the Mixer does not need to mix in the direction from B or D where the Mixer does not need to mix in the direction from B or D
towards the multicast domains with A and C. Instead, the media towards the multicast domains with A and C. Instead, the media
streams from B and D are forwarded without changes. Avoiding this streams from B and D are forwarded without changes. Avoiding this
mixing would save the media processing resources otherwise spent on mixing would save the media processing resources otherwise spent on
mixing in cases where it isn't needed. However, there would still be a need mixing in cases where it isn't needed. However, there would still be a need
to mix B's stream towards D. Only in the direction B -> multicast to mix B's stream towards D. Only in the direction B -> multicast
domain or D -> multicast domain would it be possible to work as a domain or D -> multicast domain would it be possible to work as a
Translator. In all other directions, it would function as a Mixer. Translator. In all other directions, it would function as a Mixer.
The Mixer/Translator would still need to process and change the RTCP The Mixer/Translator would still need to process and change the RTCP
before forwarding it in the directions of B or D to the multicast before forwarding it in the directions of B or D to the multicast
domain. One issue is that A and C do not know about the mixed-media domain. One issue is that A and C do not know about the mixed-media
stream the Mixer sends to either B or D. Therefore, any reports stream the Mixer sends to either B or D. Therefore, any reports
related to these streams must be removed. Also, receiver reports related to these streams must be removed. Also, receiver reports
related to A's and C's media streams would be missing. To avoid A and C related to A's and C's media streams would be missing. To avoid A and C
thinking that B and D aren't receiving A and C at all, the Mixer thinking that B and D aren't receiving A and C at all, the Mixer
needs to insert locally generated reports reflecting the situation needs to insert locally generated reports reflecting the situation
for the streams from A and C into B's and D's Sender Reports. In the for the streams from A and C into B's and D's Sender Reports. In the
opposite direction, the Receiver Reports from A and C about B's and opposite direction, the Receiver Reports from A and C about B's and
D's streams also need to be aggregated into the Mixer's Receiver D's streams also need to be aggregated into the Mixer's Receiver
Reports sent to B and D. Since B and D only have the Mixer as the source Reports sent to B and D. Since B and D only have the Mixer as the source
for the stream, all RTCP from A and C must be suppressed by the for the stream, all RTCP from A and C must be suppressed by the
Mixer. Mixer.
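Removing reports about the mixed streams amounts to filtering RTCP report blocks by the SSRC they refer to before forwarding. The Python sketch below assumes the report blocks have already been parsed; `ReportBlock` and `filter_report_blocks` are hypothetical names, and generating the locally created reports that the Mixer has to insert is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportBlock:
    """Simplified RTCP report block: only the fields used here."""
    source_ssrc: int      # SSRC of the stream being reported on
    fraction_lost: int    # 8-bit fixed-point loss fraction
    jitter: int           # interarrival jitter, in timestamp units


def filter_report_blocks(blocks, ssrcs_known_downstream):
    """Keep only report blocks about streams the receiving domain knows.

    Reports about streams the downstream participants never saw (for
    example, the Mixer's mixed stream sent to B or D, when forwarding
    into the multicast domain) are dropped.
    """
    return [b for b in blocks if b.source_ssrc in ssrcs_known_downstream]
```

After blocks are removed, the report count (RC) and length fields of the enclosing RTCP packet would also have to be rewritten, which is one of the places where such an implementation is easy to get wrong.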
This topology is so problematic and it is so easy to get the RTCP This topology is so problematic and it is so easy to get the RTCP
processing wrong, that it is not recommended for implementation. processing wrong, that it is not recommended for implementation.
3.12. Combining Topologies 3.12. Combining Topologies
Topologies can be combined and linked to each other using Mixers or Topologies can be combined and linked to each other using Mixers or
Translators. However, care must be taken in handling the SSRC/CSRC Translators. However, care must be taken in handling the SSRC/CSRC
skipping to change at page 34, line 23 skipping to change at page 36, line 7
4. Comparing Topologies 4. Comparing Topologies
The topologies discussed in Section 3 have different properties. The topologies discussed in Section 3 have different properties.
This section first describes these properties and then analyzes how This section first describes these properties and then analyzes how
these properties are supported by the different topologies. Note these properties are supported by the different topologies. Note
that, even if a certain property is supported within a particular that, even if a certain property is supported within a particular
topology concept, the necessary functionality may be optional to topology concept, the necessary functionality may be optional to
implement. implement.
Note: This section has not yet been updated with the new additions of
topologies.
4.1. Topology Properties 4.1. Topology Properties
4.1.1. All to All Media Transmission 4.1.1. All to All Media Transmission
Multicast, at least Any Source Multicast (ASM), provides the To recapitulate, multicast, and in particular Any Source Multicast
functionality that everyone may send to, or receive from, everyone (ASM), provides the functionality that everyone may send to, or
else within the session. Mesh, MCUs, Mixers, and Translators may all receive from, everyone else within the session. Source-Specific
provide that functionality at least on some basic level. However, Multicast (SSM) can provide similar functionality by having anyone
there are some differences in which type of reachability they intending to participate as a sender send its media to the SSM
provide. distribution source. The SSM distribution source forwards the media
to all receivers subscribed to the multicast group. Mesh, MCUs,
Mixers, SFMs and Translators may all provide that functionality at
least on some basic level. However, there are some differences in
which type of reachability they provide.
The transport Translator function called "relay", in Section 3.5, as The transport Translator function called "relay" in Section 3.5, as
well as the Mesh is the ones that provides the emulation of ASM that well as the Mesh with joint RTP sessions, perhaps comes closest to
is closest to true IP-multicast-based, all to all transmission. true IP-multicast-based, all to all transmission. Media
Media Translators, Mixers, and the MCU variants do not provide a Translators, Mesh with independent RTP sessions, Mixers, SFMs, and the
fully meshed forwarding on the transport level; instead, they only MCU variants do not provide a fully meshed forwarding on the
allow limited forwarding of content from the other session transport level; instead, they only allow limited forwarding of
participants. content from the other session participants.
The "all to all media transmission" requires that any media The "all to all media transmission" requires that any media
transmitting entity considers the path to the least capable receiver. transmitting entity considers the path to the least capable receiver.
Otherwise, the media transmissions may overload that path. Otherwise, the media transmissions may overload that path.
Therefore, a media sender needs to monitor the path from itself to Therefore, a media sender needs to monitor the path from itself to
any of the participants, to detect the currently least capable any of the participants, to detect the currently least capable
receiver, and adapt its sending rate accordingly. As multiple receiver, and adapt its sending rate accordingly. As multiple
participants may send simultaneously, the available resources may participants may send simultaneously, the available resources may
vary. RTCP's Receiver Reports help perform this monitoring, at vary. RTCP's Receiver Reports help perform this monitoring, at
least on a medium time scale. least on a medium time scale.
The resource consumption for performing all to all transmission The resource consumption for performing all to all transmission
varies, where the benefit of ASM is that only one copy of each packet varies depending on the topology. Both ASM and SSM have the
traverse a particular link. Using a relay, causes one copy per benefit that only one copy of each packet traverses a particular
client to relay path and packet transmitted, however, in most cases link. Using a relay causes the transmission of one copy of a packet
the links with the multiple copies will be the ones close to the per client-to-relay path and packet transmitted. However, in most
relay, rather than the clients unless they share LAN segment. The cases the links carrying the multiple copies will be the ones close
Mesh causes N-1 copies of of each transmitted packet to traverse the to the relay (which can be assumed to be part of the network
first hop link from the client, in a N client mesh. How long the infrastructure with good connectivity to the backbone), rather than
different paths are common, is highly situation dependent. the clients (which may be behind slower access links). The Mesh
causes N-1 streams of transmitted packets to traverse the first-hop
link from the client, in an N-client mesh. How far the different
paths share common links is highly situation dependent.
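The differences in per-link cost can be made concrete by counting packet copies. The Python sketch below uses a deliberately simplified model (an assumption, not a statement about any real network): for one packet in an N-participant session it returns the number of copies on the sender's first-hop link and on the busiest single link, and it ignores how far the paths share links.

```python
from __future__ import annotations


def packet_copies(topology: str, n: int) -> tuple[int, int]:
    """Return (copies on the sender's first-hop link,
    copies on the busiest single link) for one packet sent in an
    n-participant session, under a simplified model."""
    if n < 2:
        raise ValueError("need at least two participants")
    if topology in ("asm", "ssm"):
        # Network replication: at most one copy per link.
        return 1, 1
    if topology == "relay":
        # One copy to the relay; the relay's own link carries n - 1.
        return 1, n - 1
    if topology == "mesh":
        # The sender transmits one stream to each other participant.
        return n - 1, n - 1
    raise ValueError(f"unknown topology: {topology}")
```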
The transmission of RTCP automatically adapts to any changes in the The transmission of RTCP by design adapts to any changes in the
number of participants due to the transmission algorithm, defined in number of participants due to the transmission algorithm, defined in
the RTP specification [RFC3550], and the extensions in AVPF [RFC4585] the RTP specification [RFC3550], and the extensions in AVPF [RFC4585]
(when applicable). That way, the resources utilized for RTCP stay (when applicable). That way, the resources utilized for RTCP stay
within the bounds configured for the session. within the bounds configured for the session.
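The scaling of RTCP with the number of participants can be illustrated with the deterministic part of the interval computation in RFC 3550 (Section 6.3.1, with sample code in Appendix A.7). The Python sketch below follows the structure of that sample code but omits the randomization, the compensation factor, and timer reconsideration; the function and parameter names are chosen here for illustration.

```python
def rtcp_deterministic_interval(members: int, senders: int, rtcp_bw: float,
                                we_sent: bool, avg_rtcp_size: float,
                                initial: bool = False) -> float:
    """Deterministic RTCP interval in seconds (no randomization).

    rtcp_bw is the RTCP bandwidth in octets per second (typically 5% of
    the session bandwidth); avg_rtcp_size is in octets.
    """
    rtcp_min_time = 2.5 if initial else 5.0
    sender_fraction = 0.25
    n = members
    if senders <= members * sender_fraction:
        # Senders and receivers get separate shares of the RTCP bandwidth.
        if we_sent:
            rtcp_bw *= sender_fraction
            n = senders
        else:
            rtcp_bw *= 1 - sender_fraction
            n -= senders
    # The interval grows linearly with n, keeping aggregate RTCP bounded.
    t = avg_rtcp_size * n / rtcp_bw
    return max(t, rtcp_min_time)
```

For example, with an RTCP bandwidth of 400 octets per second (5% of a 64 kbit/s session), an average RTCP packet size of 100 octets, and one sender, a non-sending member of a 100-member session gets a 33-second interval, so the 99 receivers together still send only about 300 octets per second of RTCP.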
4.1.2. Transport or Media Interoperability 4.1.2. Transport or Media Interoperability
Translators, Mixers, and RTCP-terminating MCU, and Mesh with Translators, Mixers, RTCP-terminating MCUs, and Mesh with
individual RTP sessions, all allow changing the media encoding or the individual RTP sessions all allow changing the media encoding or the
transport to other properties of the other domain, thereby providing transport to match the properties of the other domain, thereby providing
extended interoperability in cases where the participants lack a extended interoperability in cases where the participants lack a
common set of media codecs and/or transport protocols. common set of media codecs and/or transport protocols. Selective
Forwarding Middleboxes can adapt the transport and (at least)
selectively forward the encoded streams that match a receiver's
capability. An additional media Translator is required to change the
media encoding if none of the encoded streams match the receiver's
capabilities.
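The selection an SFM performs in place of transcoding can be sketched in Python as follows. The `(codec, kbit/s)` tuples, the codec labels, and `select_encoding` are hypothetical; the sketch picks the highest-rate encoded stream that the receiver can decode within its rate limit, and returns `None` when nothing matches, which is the case where an additional media Translator would be needed.

```python
from typing import Optional, Sequence, Set, Tuple

# (codec name, bitrate in kbit/s); an illustrative representation only.
Encoding = Tuple[str, int]


def select_encoding(available: Sequence[Encoding],
                    receiver_codecs: Set[str],
                    max_kbps: int) -> Optional[Encoding]:
    """Pick the best-fitting encoded stream to forward, or None if no
    available stream matches (a media Translator would then be needed)."""
    candidates = [e for e in available
                  if e[0] in receiver_codecs and e[1] <= max_kbps]
    if not candidates:
        return None
    # Prefer the highest bitrate the receiver can accept.
    return max(candidates, key=lambda e: e[1])
```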
4.1.3. Per Domain Bit-Rate Adaptation 4.1.3. Per Domain Bit-Rate Adaptation
Participants are most likely to be connected to each other with a Participants are most likely to be connected to each other with a
heterogeneous set of paths. This makes congestion control in a Point heterogeneous set of paths. This makes congestion control in a Point
to Multipoint setting problematic. For the ASM, Mesh with common RTP to Multipoint setting problematic. For the ASM, SSM, Mesh with common
session, and Relay scenario, each individual sender has to adapt to RTP session, and Transport Relay scenario, each individual sender has
the receiver with the least capable path. This is no longer to adapt to the receiver with the least capable path, yielding
necessary when Media Translators, Mixers, or MCUs are involved, as suboptimal quality for the receivers behind the more capable paths.
each participant only needs to adapt to the slowest path within its This is no longer necessary when Media Translators, Mixers, SFM or
own domain. The Translator, Mixer, or MCU topologies all require MCUs are involved, as each participant only needs to adapt to the
their respective outgoing streams to adjust the bit-rate, packet- slowest path within its own domain. The Translator, Mixer, SFM, or
rate, etc., to adapt to the least capable path in each of the other MCU topologies all require their respective outgoing streams to
domains. That way one can avoid lowering the quality to the least- adjust the bit-rate, packet-rate, etc., to adapt to the least capable
capable participant in all the domains at the cost (complexity, path in each of the other domains. That way one can avoid lowering
delay, equipment) of the Mixer or Translator. the quality to the least-capable participant in all the domains at
the cost (complexity, delay, equipment) of the Mixer, SFM or
Translator, and potentially of the media sender (simulcast/layered
encoding and sending the different representations).
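The benefit of per-domain adaptation can be shown with a toy calculation. The Python sketch below assumes every receiver's path capacity is known in kbit/s and that the sender, or the middlebox on behalf of each domain, simply matches the least capable path it has to serve; real congestion control is considerably more involved, and the function names are illustrative.

```python
from __future__ import annotations


def single_domain_rate(capacities_by_domain: dict[str, list[int]]) -> int:
    """Rate (kbit/s) when one sender must serve every receiver directly,
    as with ASM, SSM, a Transport Relay, or a Mesh with a common session."""
    return min(c for caps in capacities_by_domain.values() for c in caps)


def per_domain_rates(capacities_by_domain: dict[str, list[int]]) -> dict[str, int]:
    """Rate per domain when a Media Translator, Mixer, SFM, or MCU adapts
    the outgoing streams separately towards each domain."""
    return {domain: min(caps) for domain, caps in capacities_by_domain.items()}
```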
4.1.4. Aggregation of Media 4.1.4. Aggregation of Media
In the all to all media property mentioned above and provided by ASM, In the all to all media property mentioned above and provided by ASM,
all simultaneous media transmissions share the available bit-rate. SSM, Mesh with common RTP session, and relay, all simultaneous media
For participants with limited reception capabilities, this may result transmissions share the available bit-rate. For participants with
in a situation where even a minimal acceptable media quality cannot limited reception capabilities, this may result in a situation where
be accomplished. This is the result of multiple media streams even a minimal acceptable media quality cannot be accomplished,
needing to share the available resources. The solution to this because multiple media streams need to share the same resources. One
problem is to provide for a Mixer or MCU to aggregate the multiple solution to this problem is to use a Mixer or MCU to
streams into a single one. This aggregation can be performed aggregate the multiple streams into a single one, where the single
according to different methods. Mixing or selection are two common stream takes up fewer resources in terms of bit-rate. This
methods. aggregation can be performed according to different methods. Mixing
or selection are two common methods. Selection is almost always
possible and easy to implement. Mixing requires resources in the
mixer, and may be relatively easy without impairing the quality too
badly (audio), or quite difficult (video tiling, which is not only
computationally complex but also reduces the pixel count per stream,
with a corresponding loss in perceptual quality).
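For audio, mixing can be as simple as summing time-aligned samples, which is part of why it is the easier case. The Python sketch below assumes the contributing streams have already been decoded into synchronized 16-bit linear PCM sample lists of equal length; it sums them with saturation and leaves out the decoding, gain control, and re-encoding a real Mixer would perform. The function name is hypothetical.

```python
from __future__ import annotations


def mix_pcm16(streams: list[list[int]]) -> list[int]:
    """Mix equal-length 16-bit PCM sample lists by summing with clipping."""
    if not streams:
        return []
    length = len(streams[0])
    if any(len(s) != length for s in streams):
        raise ValueError("streams must be time-aligned and of equal length")
    lo, hi = -32768, 32767
    # Sum sample-by-sample and saturate to the 16-bit range.
    return [max(lo, min(hi, sum(samples))) for samples in zip(*streams)]
```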
4.1.5. View of All Session Participants 4.1.5. View of All Session Participants
The RTP protocol includes functionality to identify the session The RTP protocol includes functionality to identify the session
participants through the use of the SSRC and CSRC fields. In participants through the use of the SSRC and CSRC fields. In
addition, it is capable of carrying some further identity information addition, it is capable of carrying some further identity information
about these participants using the RTCP Source Descriptors (SDES). about these participants using the RTCP Source Descriptors (SDES).
To maintain this functionality, it is necessary that RTCP is handled In topologies that provide full all to all functionality, i.e.,
correctly in domain bridging function. This is specified for ASM, Mesh with common RTP session, and Relay, a compliant RTP
Translators and Mixers. The MCU described in Section 3.8 does not implementation offers the functionality directly as specified in RTP.
entirely fulfill this. The one described in Section 3.9 does not In topologies that do not offer all-to-all communication, it is
support this at all. necessary that RTCP is handled correctly in the domain bridging function.
RTP includes explicit specification text for Translators and Mixers,
and for SFMs the required functionality can be derived from that
text. However, the MCU described in Section 3.8 cannot offer the
full functionality for session participant identification through RTP
means. The topologies that create independent RTP sessions per
endpoint or pair of endpoints, like Back to Back RTP sessions, Mesh
with independent RTP sessions, and the RTCP-terminating MCU
(Section 3.9), do not support RTP-based identification
of session participants. In all those cases, other non-RTP-based
mechanisms need to be implemented if such knowledge is required or
desirable.
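How a receiver could assemble its view of the participants from these fields is sketched below in Python. The inputs are simplified, hypothetical structures rather than parsed packets: each RTP packet contributes its SSRC and any CSRCs (the latter identifying the contributors behind a Mixer), and a mapping from SSRC to CNAME stands in for the information carried in RTCP SDES.

```python
from collections import defaultdict


def participants(rtp_packets, sdes_cnames):
    """Group observed SSRCs/CSRCs by CNAME.

    rtp_packets: iterable of (ssrc, [csrc, ...]) tuples.
    sdes_cnames: dict mapping SSRC -> CNAME learned from RTCP SDES.
    Identifiers without a known CNAME are grouped under None.
    """
    seen = set()
    for ssrc, csrcs in rtp_packets:
        seen.add(ssrc)
        seen.update(csrcs)   # contributing sources listed by a Mixer
    view = defaultdict(set)
    for source in seen:
        view[sdes_cnames.get(source)].add(source)
    return dict(view)
```

In topologies with independent RTP sessions per endpoint, a receiver would typically see only identifiers generated by the middlebox here, which is why other mechanisms are needed in those cases.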
4.1.6. Loop Detection 4.1.6. Loop Detection
In complex topologies with multiple interconnected domains, it is In complex topologies with multiple interconnected domains, it is
possible to form media loops. RTP and RTCP support detecting such possible to unintentionally form media loops. RTP and RTCP support
loops, as long as the SSRC and CSRC identities are correctly set in detecting such loops, as long as the SSRC and CSRC identities are
forwarded packets. It is likely that loop detection works for the maintained and correctly set in forwarded packets. Loop detection
MCU, described in Section 3.8, at least as long as it forwards the will work in ASM, SSM, Mesh with joint RTP session, and Relay. It is
RTCP between the participants. However, the MCU in Section 3.9 will likely that loop detection works for the video switching MCU
definitely break the loop detection mechanism. (Section 3.8), at least as long as it forwards the RTCP between the
participants. However, the Back to Back RTP sessions, Mesh with
independent RTP sessions, and SFM will definitely break the loop
detection mechanism.
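A much simplified form of the SSRC-based checks in RFC 3550 (Section 8.2) is sketched below in Python; it is not the full collision resolution and loop detection algorithm from that section. It remembers the first source transport address seen for each SSRC, and flags a packet that reuses a known SSRC from a different address, or that carries the local SSRC in its SSRC or CSRC fields. The class name `LoopDetector` is hypothetical.

```python
from __future__ import annotations


class LoopDetector:
    """Hypothetical, much simplified SSRC-based loop/collision check."""

    def __init__(self, own_ssrc: int) -> None:
        self.own_ssrc = own_ssrc
        self.addr_by_ssrc: dict[int, tuple[str, int]] = {}

    def check(self, ssrc: int, csrcs: list[int],
              src_addr: tuple[str, int]) -> str | None:
        """Return None if nothing suspicious is seen, else a reason."""
        if ssrc == self.own_ssrc or self.own_ssrc in csrcs:
            # Our identifier came back: looped media, or another
            # source collided with our SSRC.
            return "own SSRC received"
        first = self.addr_by_ssrc.setdefault(ssrc, src_addr)
        if first != src_addr:
            # Same SSRC via a second transport address: a loop through
            # a forwarding middlebox, or an SSRC collision.
            return "SSRC seen from a different transport address"
        return None
```

The sketch also suggests why topologies in which the middlebox may assign new SSRCs when forwarding defeat this mechanism: a looped stream returns under an SSRC the receiver has not seen before and passes every check.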
4.2. Comparison of Topologies 4.2. Comparison of Topologies
The table below attempts to summarize the properties of the different The table below attempts to summarize the properties of the different
topologies. The legend for the topology abbreviations is: Topo- topologies. The legend for the topology abbreviations is: Topo-
Point-to-Point (PtP), Topo-Multicast (Multic), Topo-Trns-Translator Point-to-Point (PtP), Topo-ASM (ASM), Topo-SSM (SSM), Topo-Trns-
(TTrn), Topo-Media-Translator (including Transport Translator) Translator (TT), Topo-Media-Translator (including Transport
(MTrn), Topo-Mixer (Mixer), Topo-Asymmetric (ASY), Topo-Video-switch- Translator) (MT), Topo-Mesh with joint session (MJS), Topo-Mesh with
MCU (MCUs), and Topo-RTCP-terminating-MCU (MCUt). In the table individual sessions (MIS), Topo-Mixer (Mix), Topo-Asymmetric (ASY),
below, Y indicates Yes or full support, N indicates No support, (Y) Topo-Video-switch-MCU (VSM), Topo-RTCP-terminating-MCU (RTM), and
indicates partial support, and N/A indicates not applicable. Selective Forwarding Middlebox (SFM). In the table below, Y
indicates Yes or full support, N indicates No support, (Y) indicates
partial support, and N/A indicates not applicable.
Property              PtP Multic TTrn MTrn Mixer ASY MCUs MCUt    Property              PtP ASM SSM TT  MT  MJS MIS Mix ASY VSM RTM SFM
--------------------------------------------------------------    ---------------------------------------------------------------------
All to All media      N   Y      Y    Y    (Y)   (Y) (Y)  (Y)     All to All media      N   Y   (Y) Y   Y   Y   (Y) (Y) (Y) (Y) (Y) (Y)
Interoperability      N/A N      Y    Y    Y     Y   N    Y       Interoperability      N/A N   N   Y   Y   Y   Y   Y   Y   N   Y   Y
Per Domain Adaptation N/A N      N    Y    Y     Y   N    Y       Per Domain Adaptation N/A N   N   N   Y   N   Y   Y   Y   N   Y   Y
Aggregation of media  N   N      N    N    Y     (Y) Y    Y       Aggregation of media  N   N   N   N   N   N   N   Y   (Y) Y   Y   N
Full Session View     Y   Y      Y    Y    Y     Y   (Y)  N       Full Session View     Y   Y   Y   Y   Y   Y   N   Y   Y   (Y) N   Y
Loop Detection        Y   Y      Y    Y    Y     Y   (Y)  N       Loop Detection        Y   Y   Y   Y   Y   Y   N   Y   Y   (Y) N   N
Please note that the Media Translator also includes the transport Please note that the Media Translator also includes the transport
Translator functionality. Translator functionality.
5. Security Considerations 5. Security Considerations
The use of Mixers and Translators has impact on security and the The use of Mixers, SFMs, and Translators has an impact on security and
security functions used. The primary issue is that both Mixers and the security functions used. The primary issue is that Mixers,
Translators modify packets, thus preventing the use of integrity and SFMs, and Translators modify packets, thus preventing the use of
source authentication, unless they are trusted devices that take part integrity and source authentication, unless they are trusted devices
in the security context, e.g., the device can send Secure Realtime that take part in the security context, e.g., the device can send
Transport Protocol (SRTP) and Secure Realtime Transport Control Secure Realtime Transport Protocol (SRTP) and Secure Realtime
Protocol (SRTCP) [RFC3711] packets to session endpoints. If Transport Control Protocol (SRTCP) [RFC3711] packets to session
encryption is employed, the media Translator and Mixer need to be endpoints. If encryption is employed, the media Translator, SFM and
able to decrypt the media to perform its function. A transport Mixer need to be able to decrypt the media to perform their function.
Translator may be used without access to the encrypted payload in A transport Translator may be used without access to the encrypted
cases where it translates parts that are not included in the payload in cases where it translates parts that are not included in
encryption and integrity protection, for example, IP address and UDP the encryption and integrity protection, for example, IP address and
port numbers in a media stream using SRTP [RFC3711]. However, in UDP port numbers in a media stream using SRTP [RFC3711]. However, in
general, the Translator or Mixer needs to be part of the signalling general, the Translator, SFM or Mixer needs to be part of the
context and get the necessary security associations (e.g., SRTP signalling context and get the necessary security associations (e.g.,
crypto contexts) established with its RTP session participants. SRTP crypto contexts) established with its RTP session participants.
Including the Mixer and Translator in the security context allows the Including the Mixer, SFM and Translator in the security context
entity, if subverted or misbehaving, to perform a number of very allows the entity, if subverted or misbehaving, to perform a number
serious attacks as it has full access. It can perform all the of very serious attacks as it has full access. It can perform all
attacks possible (see RFC 3550 and any applicable profiles) as if the the attacks possible (see RFC 3550 and any applicable profiles) as if
media session were not protected at all, while giving the impression the media session were not protected at all, while giving the
to the session participants that they are protected. impression to the session participants that they are protected.
Transport Translators have no interactions with cryptography that Transport Translators have no interactions with cryptography that
works above the transport layer, such as SRTP, since that sort of works above the transport layer, such as SRTP, since that sort of
Translator leaves the RTP header and payload unaltered. Media Translator leaves the RTP header and payload unaltered. Media
Translators, on the other hand, have strong interactions with Translators, on the other hand, have strong interactions with
cryptography, since they alter the RTP payload. A media Translator cryptography, since they alter the RTP payload. A media Translator
in a session that uses cryptographic protection needs to perform in a session that uses cryptographic protection needs to perform
cryptographic processing to both inbound and outbound packets. cryptographic processing to both inbound and outbound packets.
A media Translator may need to use different cryptographic keys for A media Translator may need to use different cryptographic keys for
skipping to change at page 38, line 18 skipping to change at page 40, line 32
When the media Translator uses different keys to process inbound and When the media Translator uses different keys to process inbound and
outbound packets, each session participant needs to be provided with outbound packets, each session participant needs to be provided with
the appropriate key, depending on whether they are listening to the the appropriate key, depending on whether they are listening to the
Translator or the original source. (Note that there is an Translator or the original source. (Note that there is an
architectural difference between RTP media translation, in which architectural difference between RTP media translation, in which
participants can rely on the RTP Payload Type field of a packet to participants can rely on the RTP Payload Type field of a packet to
determine appropriate processing, and cryptographically protected determine appropriate processing, and cryptographically protected
media translation, in which participants must use information that is media translation, in which participants must use information that is
not carried in the packet.) not carried in the packet.)
When using security mechanisms with Translators and Mixers, it is When using security mechanisms with Translators, SFMs and Mixers, it
possible that the Translator or Mixer could create different security is possible that the Translator, SFM or Mixer could create different
associations for the different domains they are working in. Doing so security associations for the different domains they are working in.
has some implications: Doing so has some implications:
First, it might weaken security if the Mixer/Translator accepts a First, it might weaken security if the Mixer/Translator accepts a
weaker algorithm or key in one domain than in another. Therefore, weaker algorithm or key in one domain than in another. Therefore,
care should be taken that appropriately strong security parameters care should be taken that appropriately strong security parameters
are negotiated in all domains. In many cases, "appropriate" are negotiated in all domains. In many cases, "appropriate"
translates to "similar" strength. If a key management system does translates to "similar" strength. If a key management system does
allow the negotiation of security parameters resulting in a different allow the negotiation of security parameters resulting in a different
strength of the security, then this system should notify the strength of the security, then this system should notify the
participants in the other domains about this. participants in the other domains about this.
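As an illustration of the first implication, the Python sketch below flags which domains a key management system might notify when the strength negotiated in one domain is weaker than in another. It assumes, purely for illustration, that security strength can be reduced to a single comparable number per domain (such as key length in bits). It also assumes that participants in every domain stronger than the weakest one are the ones to notify. The function name `domains_to_notify` and the domain names are hypothetical and are not part of any RTP or SRTP API.

```python
from typing import Dict, List


def domains_to_notify(strength_bits: Dict[str, int]) -> List[str]:
    """Return the domains whose participants should be told that some
    other domain negotiated weaker security parameters.

    Assumption for illustration: strength is reduced to one number
    (e.g. key length in bits) per domain, and participants in every
    domain stronger than the weakest one are notified.
    """
    if not strength_bits:
        return []
    weakest = min(strength_bits.values())
    # Domains already at the weakest level learn nothing new; the others
    # have a stronger local association than the end-to-end path provides.
    return sorted(d for d, bits in strength_bits.items() if bits > weakest)


# Hypothetical domain names and negotiated strengths.
print(domains_to_notify({"unicast-A": 128, "multicast": 256}))  # -> ['multicast']
```

Reducing strength to one integer is a simplification; a real comparison would consider the full cipher suite and authentication parameters of each domain.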
Second, the number of crypto contexts (keys and security related Second, the number of crypto contexts (keys and security related
state) needed (for example, in SRTP [RFC3711]) may vary between state) needed (for example, in SRTP [RFC3711]) may vary between
Mixers and Translators. A Mixer normally needs to represent only a Mixers, SFMs and Translators. A Mixer normally needs to represent
single SSRC per domain and therefore needs to create only one only a single SSRC per domain and therefore needs to create only one
security association (SRTP crypto context) per domain. In contrast, security association (SRTP crypto context) per domain. In contrast,
a Translator needs one security association per participant it a Translator needs one security association per participant it
translates towards, in the opposite domain. Considering Figure 11, translates towards, in the opposite domain. Considering Figure 11,
the Translator needs two security associations towards the multicast the Translator needs two security associations towards the multicast
domain, one for B and one for D. It may be forced to maintain a set domain, one for B and one for D. It may be forced to maintain a set
of totally independent security associations between itself and B and of totally independent security associations between itself and B and
D respectively, so as to avoid two-time pad occurrences. These D respectively, so as to avoid two-time pad occurrences. These
contexts must also be capable of handling all the sources present in contexts must also be capable of handling all the sources present in
the other domains. Hence, using completely independent security the other domains. Hence, using completely independent security
associations (for certain keying mechanisms) may force a Translator associations (for certain keying mechanisms) may force a Translator
to handle N*DM keys and related state; where N is the total number of to handle N*DM keys and related state; where N is the total number of
SSRCs used over all domains and DM is the total number of domains. SSRCs used over all domains and DM is the total number of domains.
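The difference in per-entity state can be made concrete with a small calculation. The following Python sketch encodes only the two counts stated above: one security association per domain for a Mixer, and the worst case of N*DM for a Translator that uses completely independent security associations. The function names are hypothetical. The sketch does not model SFMs, for which the text gives no count, and real keying mechanisms (for example, group keys) may need considerably fewer contexts.

```python
from typing import Sequence


def mixer_crypto_contexts(num_domains: int) -> int:
    """A Mixer represents a single SSRC per domain, so it needs one
    security association (SRTP crypto context) per domain."""
    return num_domains


def translator_crypto_contexts_worst_case(ssrcs_per_domain: Sequence[int]) -> int:
    """Upper bound N * DM for a Translator using completely independent
    security associations, where N is the total number of SSRCs over
    all domains and DM is the number of domains."""
    n = sum(ssrcs_per_domain)
    dm = len(ssrcs_per_domain)
    return n * dm


# Hypothetical example: three domains carrying 2, 3 and 1 SSRCs.
domains = [2, 3, 1]
print(mixer_crypto_contexts(len(domains)))             # -> 3
print(translator_crypto_contexts_worst_case(domains))  # -> 18
```

With three domains carrying six SSRCs in total, the Mixer needs 3 contexts, while the Translator's worst case is 6 * 3 = 18. The Translator's state grows with both the number of sources and the number of domains.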
The multicast-based (ASM and SSM), Relay, and Mesh with common RTP
session topologies all involve multiple endpoints that require
knowledge about the different crypto contexts for the endpoints.
These multi-party topologies have special requirements on key
management as well as on the security functions. Specifically,
source authentication in these environments has special requirements.
There exist a number of different mechanisms to provide keys to the There exist a number of different mechanisms to provide keys to the
different participants. One example is the choice between group keys different participants. One example is the choice between group keys
and unique keys per SSRC. The appropriate keying model is impacted and unique keys per SSRC. The appropriate keying model is impacted
by the topologies one intends to use. The final security properties by the topologies one intends to use. The final security properties
are dependent on both the topologies in use and the keying are dependent on both the topologies in use and the keying
mechanisms' properties, and need to be considered by the application. mechanisms' properties, and need to be considered by the application.
Exactly which mechanisms are used is outside of the scope of this Exactly which mechanisms are used is outside of the scope of this
document. Please review RTP Security Options document. Please review RTP Security Options [RFC7201] to get a
[I-D.ietf-avtcore-rtp-security-options] to get a better understanding better understanding of most of the available options.
of most of the available options.
6. IANA Considerations 6. IANA Considerations
This document makes no request of IANA. This document makes no request of IANA.
Note to RFC Editor: this section may be removed on publication as an Note to RFC Editor: this section may be removed on publication as an
RFC. RFC.
7. Acknowledgements 7. Acknowledgements
skipping to change at page 39, line 48 skipping to change at page 42, line 20
Initiation Protocol (SIP) Event Package for Conference Initiation Protocol (SIP) Event Package for Conference
State", RFC 4575, August 2006. State", RFC 4575, August 2006.
[RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey, [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
"Extended RTP Profile for Real-time Transport Control "Extended RTP Profile for Real-time Transport Control
Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July
2006. 2006.
8.2. Informative References 8.2. Informative References
[I-D.ietf-avtcore-rtp-multi-stream] [I-D.ietf-avtcore-rtp-multi-stream-optimisation]
Lennox, J., Westerlund, M., Wu, W., and C. Perkins, Lennox, J., Westerlund, M., Wu, W., and C. Perkins,
"Sending Multiple Media Streams in a Single RTP Session", "Sending Multiple Media Streams in a Single RTP Session:
draft-ietf-avtcore-rtp-multi-stream-01 (work in progress), Grouping RTCP Reception Statistics and Other Feedback",
July 2013. draft-ietf-avtcore-rtp-multi-stream-optimisation-02 (work
in progress), February 2014.
[I-D.ietf-avtcore-rtp-security-options]
Westerlund, M. and C. Perkins, "Options for Securing RTP
Sessions", draft-ietf-avtcore-rtp-security-options-08
(work in progress), October 2013.
[RFC1112] Deering, S., "Host extensions for IP multicasting", STD 5, [RFC1112] Deering, S., "Host extensions for IP multicasting", STD 5,
RFC 1112, August 1989. RFC 1112, August 1989.
[RFC3022] Srisuresh, P. and K. Egevang, "Traditional IP Network [RFC3022] Srisuresh, P. and K. Egevang, "Traditional IP Network
Address Translator (Traditional NAT)", RFC 3022, January Address Translator (Traditional NAT)", RFC 3022, January
2001. 2001.
[RFC3569] Bhattacharyya, S., "An Overview of Source-Specific [RFC3569] Bhattacharyya, S., "An Overview of Source-Specific
Multicast (SSM)", RFC 3569, July 2003. Multicast (SSM)", RFC 3569, July 2003.
skipping to change at page 41, line 5 skipping to change at page 43, line 13
Traversal Utilities for NAT (STUN)", RFC 5766, April 2010. Traversal Utilities for NAT (STUN)", RFC 5766, April 2010.
[RFC6285] Ver Steeg, B., Begen, A., Van Caenegem, T., and Z. Vax, [RFC6285] Ver Steeg, B., Begen, A., Van Caenegem, T., and Z. Vax,
"Unicast-Based Rapid Acquisition of Multicast RTP "Unicast-Based Rapid Acquisition of Multicast RTP
Sessions", RFC 6285, June 2011. Sessions", RFC 6285, June 2011.
[RFC6465] Ivov, E., Marocco, E., and J. Lennox, "A Real-time [RFC6465] Ivov, E., Marocco, E., and J. Lennox, "A Real-time
Transport Protocol (RTP) Header Extension for Mixer-to- Transport Protocol (RTP) Header Extension for Mixer-to-
Client Audio Level Indication", RFC 6465, December 2011. Client Audio Level Indication", RFC 6465, December 2011.
[RFC7201] Westerlund, M. and C. Perkins, "Options for Securing RTP
Sessions", RFC 7201, April 2014.
Authors' Addresses Authors' Addresses
Magnus Westerlund Magnus Westerlund
Ericsson Ericsson
Farogatan 6 Farogatan 6
SE-164 80 Kista SE-164 80 Kista
Sweden Sweden
Phone: +46 10 714 82 87 Phone: +46 10 714 82 87
Email: magnus.westerlund@ericsson.com Email: magnus.westerlund@ericsson.com
 End of changes. 70 change blocks. 
402 lines changed or deleted 437 lines changed or added
