draft-ietf-avtcore-srtp-vbr-audio-03.txt   draft-ietf-avtcore-srtp-vbr-audio-04.txt 
Network Working Group C. Perkins Network Working Group C. Perkins
Internet-Draft University of Glasgow Internet-Draft University of Glasgow
Intended status: BCP JM. Valin Intended status: BCP JM. Valin
Expires: January 7, 2012 Octasic Inc. Expires: July 2, 2012 Octasic Inc.
July 6, 2011 December 30, 2011
Guidelines for the use of Variable Bit Rate Audio with Secure RTP Guidelines for the use of Variable Bit Rate Audio with Secure RTP
draft-ietf-avtcore-srtp-vbr-audio-03.txt draft-ietf-avtcore-srtp-vbr-audio-04.txt
Abstract Abstract
This memo discusses potential security issues that arise when using This memo discusses potential security issues that arise when using
variable bit rate audio with the secure RTP profile. Guidelines to variable bit rate audio with the secure RTP profile. Guidelines to
mitigate these issues are suggested. mitigate these issues are suggested.
Status of this Memo Status of this Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
skipping to change at page 1, line 33 skipping to change at page 1, line 33
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 7, 2012. This Internet-Draft will expire on July 2, 2012.
Copyright Notice Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 17 skipping to change at page 2, line 17
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Scenario-Dependent Risk . . . . . . . . . . . . . . . . . . . . 3 2. Scenario-Dependent Risk . . . . . . . . . . . . . . . . . . . . 3
3. Guidelines for use of VBR Audio with SRTP . . . . . . . . . . . 4 3. Guidelines for use of VBR Audio with SRTP . . . . . . . . . . . 4
4. Guidelines for use of Voice Activity Detection with SRTP . . . 4 4. Guidelines for use of Voice Activity Detection with SRTP . . . 4
5. Padding the output of VBR codecs . . . . . . . . . . . . . . . 5 5. Padding the output of VBR codecs . . . . . . . . . . . . . . . 5
6. Security Considerations . . . . . . . . . . . . . . . . . . . . 6 6. Security Considerations . . . . . . . . . . . . . . . . . . . . 6
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 6 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 6
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 6 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 6
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 6 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 6
9.1. Normative References . . . . . . . . . . . . . . . . . . . 6 9.1. Normative References . . . . . . . . . . . . . . . . . . . 6
9.2. Informative References . . . . . . . . . . . . . . . . . . 6 9.2. Informative References . . . . . . . . . . . . . . . . . . 7
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 6 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 7
1. Introduction 1. Introduction
The secure RTP framework (SRTP) [RFC3711] is a widely used framework The secure RTP framework (SRTP) [RFC3711] is a widely used framework
for securing RTP sessions. SRTP provides the ability to encrypt the for securing RTP [RFC3550] sessions. SRTP provides the ability to
payload of an RTP packet, and optionally add an authentication tag, encrypt the payload of an RTP packet, and optionally add an
while leaving the RTP header and any header extension in the clear. authentication tag, while leaving the RTP header and any header
A range of encryption transforms can be used with SRTP, but none of extension in the clear. A range of encryption transforms can be used
the pre-defined encryption transforms use any padding; the RTP and with SRTP, but none of the pre-defined encryption transforms use any
SRTP payload sizes match exactly. padding; the RTP and SRTP payload sizes match exactly.
When using SRTP with voice streams compressed using variable bit rate When using SRTP with voice streams compressed using variable bit rate
(VBR) codecs, the length of the compressed packets will therefore (VBR) codecs, the length of the compressed packets will therefore
depend on the characteristics of the speech signal. This variation depend on the characteristics of the speech signal. This variation
in packet size will leak a small amount of information about the in packet size will leak a small amount of information about the
contents of the speech signal. For example [spot-me] shows that contents of the speech signal. This is potentially a security risk
known phrases in an encrypted call using the Speex codec in VBR mode for some applications. For example, [spot-me] shows that known
can be recognised with high accuracy in certain circumstances, phrases in an encrypted call using the Speex codec in VBR mode can be
without breaking the encryption. Other work, referenced from recognised with high accuracy in certain circumstances, and [fon-iks]
[spot-me], has shown that the language spoken in encrypted shows that approximate transcripts of encrypted VBR calls can be
conversations can also be recognised. This is potentially a security derived for some codecs without breaking the encryption. How
risk for some applications. How significant these results are and significant these results are, and how they generalise to other
how they generalise to other codecs is still an open question. This codecs, is still an open question. This memo discusses ways in which
memo discusses ways in which this traffic analysis risk may be such traffic analysis risks may be mitigated.
mitigated.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119]. document are to be interpreted as described in RFC 2119 [RFC2119].
2. Scenario-Dependent Risk 2. Scenario-Dependent Risk
Whether the information leak analysed in [spot-me] is significant Whether the information leaks and attacks discussed in [spot-me],
highly depends on the application. In the worst case, using the rate [fon-iks], and similar works are significant is highly dependent on
the application and use scenario. In the worst case, using the rate
information to recognize a pre-recorded message knowing the set of information to recognize a pre-recorded message knowing the set of
all possible messages would lead to near-perfect accuracy. Even when all possible messages would lead to near-perfect accuracy. Even when
the audio is not pre-recorded, there is a real possibility of being the audio is not pre-recorded, there is a real possibility of being
able to recognize contents from encrypted audio when the dialog is able to recognize contents from encrypted audio when the dialog is
highly structured (e.g. when the eavesdropper knows that only a highly structured (e.g., when the eavesdropper knows that only a
handful of sentences are possible) and thus contains only handful of sentences are possible), and thus contains only
little information. On the other end, recognizing unconstrained little information. Recognizing unconstrained conversational speech
conversational speech from the rate information alone appears to be from the rate information alone is unreliable and computationally
highly unlikely at best. In fact, such a task is already considered expensive at present, but does appear possible in some circumstances.
a hard problem even when one has access to the unencrypted audio. These attacks are only likely to improve over time.
In practical SRTP scenarios, it must also be considered how In practical SRTP scenarios, it must also be considered how
significant the information leak is when compared to other SRTP- significant the information leak is when compared to other SRTP-
related information, such as the fact that the source and destination related information, such as the fact that the source and destination
IP addresses are available. IP addresses are available.
3. Guidelines for use of VBR Audio with SRTP 3. Guidelines for use of VBR Audio with SRTP
It is the responsibility of the application designer to determine the It is the responsibility of the application designer to determine the
appropriate trade-off between security and bandwidth overhead. As a appropriate trade-off between security and bandwidth overhead. As a
general rule, VBR codecs should be considered safe in the context of general rule, VBR codecs should be considered safe in the context of
encrypted unstructured calls. However, applications that make use of low-value encrypted unstructured calls. However, applications that
pre-recorded messages where the contents of such pre-recorded make use of pre-recorded messages where the contents of such pre-
messages may be of any value to an eavesdropper (i.e., messages beyond recorded messages may be of any value to an eavesdropper (i.e.,
standard greeting messages) SHOULD NOT use codecs in VBR mode. messages beyond standard greeting messages) SHOULD NOT use codecs in
Interactive voice response (IVR) applications would be particularly VBR mode. Interactive voice response (IVR) applications would be
vulnerable since an eavesdropper could use the rate information particularly vulnerable since an eavesdropper could use the
to easily recognize the prompts being played out. rate information to easily recognize the prompts being played out.
Applications conveying highly sensitive unstructured information
SHOULD NOT use codecs in VBR mode.
It is safe to use variable rate coding to adapt the output of a voice It is safe to use variable rate coding to adapt the output of a voice
codec to match characteristics of a network channel, for example for codec to match characteristics of a network channel, for example for
congestion control purposes, provided this adaptation is done in a way congestion control purposes, provided this adaptation is done in a way
that does not expose any information on the speech signal. That is, that does not expose any information on the speech signal. That is,
it is safe if the variation is driven by the available network bandwidth, not by it is safe if the variation is driven by the available network bandwidth, not by
the input speech (i.e., if the packet sizes and spacing are constant the input speech (i.e., if the packet sizes and spacing are constant
unless the network conditions change). VBR speech codecs can safely unless the network conditions change). VBR speech codecs can safely
be used in this fashion with SRTP while avoiding leaking information be used in this fashion with SRTP while avoiding leaking information
on the contents of the speech signal that might be useful for traffic on the contents of the speech signal that might be useful for traffic
skipping to change at page 5, line 34 skipping to change at page 5, line 36
recorded messages may be of any value to an eavesdropper. recorded messages may be of any value to an eavesdropper.
The application of a random overhang period to each talkspurt will The application of a random overhang period to each talkspurt will
reduce the effectiveness of VAD in SRTP sessions when compared to reduce the effectiveness of VAD in SRTP sessions when compared to
non-SRTP sessions. It is, however, still expected that the use of non-SRTP sessions. It is, however, still expected that the use of
VAD will provide a significant bandwidth saving for many encrypted VAD will provide a significant bandwidth saving for many encrypted
sessions. sessions.
5. Padding the output of VBR codecs 5. Padding the output of VBR codecs
For scenarios where VBR is considered unsafe, the codec SHOULD be For scenarios where VBR is considered unsafe, a constant bit rate
operated in constant bit rate (CBR) mode. However, if the codec does (CBR) codec SHOULD be negotiated and used instead, or the VBR codec
not support CBR, RTP padding SHOULD be used to reduce the information SHOULD be operated in a CBR mode. However, if the codec does not
support CBR, RTP padding SHOULD be used to reduce the information
leak to an insignificant level. Packets may be padded to a constant leak to an insignificant level. Packets may be padded to a constant
size ([spot-me] achieves good results by padding to the next multiple size or to a small range of sizes ([spot-me] achieves good results by
of 16 octets, but the amount of padding needed to hide the variation padding to the next multiple of 16 octets, but the amount of padding
in packet size will depend on the codec), or may be padded to a size needed to hide the variation in packet size will depend on the codec
that varies with time. In the case where the size of the padded and the sophistication of the attacker), or may be padded to a size
packets varies in time, the same concerns as for VAD apply. That is, that varies with time. The most secure, and RECOMMENDED, option is
the padding SHOULD NOT be reduced without waiting for a certain to pad all packets throughout the call to the same size.
(random) time. The RECOMMENDED "hold time" is the same as the one
for VAD. In the case where the size of the padded packets varies in time, the
same concerns as for VAD apply. That is, the padding SHOULD NOT be
reduced without waiting for a certain (random) time. The RECOMMENDED
"hold time" is the same as the one for VAD.
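The hold-time rule for time-varying padding can be illustrated with a short sketch. This is not taken from the memo: the class name, the method name, and the 1 to 3 second hold range are hypothetical placeholders, and an implementation would substitute the hold time recommended for VAD. Increases in padded size take effect immediately, while a reduction is allowed only after a random hold time has elapsed.

```python
import random


class PaddingSizeHold:
    """Track the padded packet size, delaying any reduction by a random hold time."""

    def __init__(self, min_hold=1.0, max_hold=3.0, rng=None):
        # min_hold and max_hold (seconds) are placeholder values for this
        # sketch; they are assumptions, not values taken from the memo.
        self.min_hold = min_hold
        self.max_hold = max_hold
        self.rng = rng or random.Random()
        self.current = 0        # size currently being padded to
        self.pending = 0        # largest size needed while a hold is running
        self.reduce_at = None   # time at which a pending reduction may apply

    def target_size(self, needed, now):
        """Return the size to pad to, given the size needed at time `now`."""
        if needed >= self.current:
            # Increases apply at once and cancel any pending reduction.
            self.current = needed
            self.reduce_at = None
        elif self.reduce_at is None:
            # First packet that could be smaller: start a random hold timer.
            self.reduce_at = now + self.rng.uniform(self.min_hold, self.max_hold)
            self.pending = needed
        else:
            # Remember the largest size seen during the hold period.
            self.pending = max(self.pending, needed)
            if now >= self.reduce_at:
                # Hold time elapsed: reduce, but only to that largest size.
                self.current = self.pending
                self.reduce_at = None
        return self.current
```

Reducing to the largest size observed during the hold period, rather than to the most recent one, is a design choice of this sketch intended to avoid an immediate increase right after a reduction.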
Note that SRTP encrypts the count of the number of octets of padding Note that SRTP encrypts the count of the number of octets of padding
added to a packet, but not the bit in the RTP header that indicates added to a packet, but not the bit in the RTP header that indicates
that the packet has been padded. For this reason, it is RECOMMENDED that the packet has been padded. For this reason, it is RECOMMENDED
to add at least one octet of padding to all packets in a media to add at least one octet of padding to all packets in a media
stream, so an attacker cannot tell which packets needed padding. stream, so an attacker cannot tell which packets needed padding.
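As a minimal sketch of the padding rules above, the following Python functions append RTP padding to a payload before SRTP encryption. The function names and the default block size of 16 octets are illustrative assumptions (16 is the value reported for [spot-me]; other codecs may need a different amount). The padding layout follows RTP [RFC3550]: the final padding octet carries the count of padding octets, including itself, and the P bit in the first header octet is set.

```python
def pad_rtp_payload(payload: bytes, block: int = 16) -> bytes:
    """Pad payload so its length is a multiple of `block` octets.

    At least one padding octet is always added, so every packet in the
    stream carries padding and the P bit reveals nothing.
    """
    if not 1 <= block <= 255:
        raise ValueError("block must be between 1 and 255")
    pad_len = block - (len(payload) % block)  # always in the range 1..block
    # The last padding octet holds the padding count, itself included.
    return payload + bytes(pad_len - 1) + bytes([pad_len])


def pad_to_constant_size(payload: bytes, size: int) -> bytes:
    """Pad payload to exactly `size` octets, adding at least one octet."""
    pad_len = size - len(payload)
    if not 1 <= pad_len <= 255:
        raise ValueError("payload must be 1 to 255 octets shorter than size")
    return payload + bytes(pad_len - 1) + bytes([pad_len])


def set_padding_bit(first_header_octet: int) -> int:
    """Set the P bit (0x20) in the first octet of the RTP header."""
    return first_header_octet | 0x20
```

For example, a 20-octet payload is padded to 32 octets with a final octet of 12, and a payload that is already 16 octets long still receives a full block of 16 padding octets.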
6. Security Considerations 6. Security Considerations
The security considerations of [RFC3711] apply. This entire memo is about security. The security considerations of
[RFC3711] also apply.
7. IANA Considerations 7. IANA Considerations
No IANA actions are required. No IANA actions are required.
8. Acknowledgements 8. Acknowledgements
This memo is based on the discussion in [spot-me]. ZRTP [RFC6189] ZRTP [RFC6189] contains similar recommendations; the purpose of this
contain a similar recommendation; the purpose of this memo is to memo is to highlight these issues to a wider audience, since they are
highlight these issues to a wider audience, since they are not not specific to ZRTP. Thanks are due to Phil Zimmermann, Stefan
specific to ZRTP. Thanks are due to Phil Zimmermann, Stefan Doehla, Doehla, Mats Naslund, Gregory Maxwell, David McGrew, Mark Baugher,
Mats Naslund, Gregory Maxwell, David McGrew, Mark Baugher, Koen Vos, Koen Vos, Ingemar Johansson, and Stephen Farrell for their comments
and Ingemar Johansson for their comments and feedback on this memo. and feedback on this memo.
9. References 9. References
9.1. Normative References 9.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
Jacobson, "RTP: A Transport Protocol for Real-Time
Applications", STD 64, RFC 3550, July 2003.
[RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
Norrman, "The Secure Real-time Transport Protocol (SRTP)", Norrman, "The Secure Real-time Transport Protocol (SRTP)",
RFC 3711, March 2004. RFC 3711, March 2004.
9.2. Informative References 9.2. Informative References
[RFC6189] Zimmermann, P., Johnston, A., and J. Callas, "ZRTP: Media [RFC6189] Zimmermann, P., Johnston, A., and J. Callas, "ZRTP: Media
Path Key Agreement for Unicast Secure RTP", RFC 6189, Path Key Agreement for Unicast Secure RTP", RFC 6189,
April 2011. April 2011.
[fon-iks] White, A., Matthews, A., Snow, K., and F. Monrose,
"Phonotactic Reconstruction of Encrypted VoIP
Conversations: Hookt on fon-iks", Proceedings of the IEEE
Symposium on Security and Privacy 2011, May 2011.
[spot-me] Wright, C., Ballard, L., Coull, S., Monrose, F., and G. [spot-me] Wright, C., Ballard, L., Coull, S., Monrose, F., and G.
Masson, "Spot me if you can: Uncovering spoken phrases in Masson, "Spot me if you can: Uncovering spoken phrases in
encrypted VoIP conversation", Proceedings of the IEEE encrypted VoIP conversation", Proceedings of the IEEE
Symposium on Security and Privacy 2008, May 2008. Symposium on Security and Privacy 2008, May 2008.
Authors' Addresses Authors' Addresses
Colin Perkins Colin Perkins
University of Glasgow University of Glasgow
School of Computing Science School of Computing Science
 End of changes. 15 change blocks. 
55 lines changed or deleted 71 lines changed or added
