RFC 6562 - Guidelines for the Use of Variable Bit Rate Audio with Secure RTP 日本語訳

原文URL : https://datatracker.ietf.org/doc/html/rfc6562
タイトル : RFC 6562 - Secure RTPでの可変ビットレートオーディオの使用に関するガイドライン
翻訳編集 : 自動生成

[要約] この文書は、Secure RTP（SRTP）を使用して暗号化された音声通信において、可変ビットレート（VBR）オーディオを使用する際のセキュリティ上のリスクと対策を説明しています。パケットサイズの変化から音声の内容を推測されるサイドチャネル攻撃（トラフィック解析）を防ぐためのパディング技術やコーデックの適切な使用方法に関するガイドラインを提供します。

Internet Engineering Task Force (IETF)                        C. Perkins
Request for Comments: 6562                         University of Glasgow
Category: Standards Track                                      JM. Valin
ISSN: 2070-1721                                      Mozilla Corporation
                                                              March 2012

Guidelines for the Use of Variable Bit Rate Audio with Secure RTP

Secure RTPでの可変ビットレートオーディオの使用に関するガイドライン

Abstract

概要

This memo discusses potential security issues that arise when using variable bit rate (VBR) audio with the secure RTP profile. Guidelines to mitigate these issues are suggested.

このメモは、安全なRTPプロファイルで可変ビットレート（VBR）オーディオを使用するときに発生する潜在的なセキュリティ問題について説明します。これらの問題を軽減するためのガイドラインが推奨されます。

Status of This Memo

本文書の状態

This is an Internet Standards Track document.

これはInternet Standards Trackドキュメントです。

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 5741.

このドキュメントは、IETF（Internet Engineering Task Force）の製品です。これは、IETFコミュニティのコンセンサスを表しています。公開レビューを受け、インターネットエンジニアリングステアリンググループ（IESG）による公開が承認されました。インターネット標準の詳細については、RFC 5741のセクション2をご覧ください。

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc6562.

このドキュメントの現在のステータス、エラータ、およびフィードバックの提供方法に関する情報は、http：//www.rfc-editor.org/info/rfc6562で入手できます。

著作権表示

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

この文書は、BCP 78およびIETF文書に関するIETFトラストの法的規定（http://trustee.ietf.org/license-info）の対象であり、この文書の発行日に有効です。これらのドキュメントは、このドキュメントに関するあなたの権利と制限を説明しているため、注意深く確認してください。このドキュメントから抽出されたコードコンポーネントには、Trust Legal Provisionsのセクション4.eに記載されているSimplified BSD Licenseのテキストが含まれている必要があり、Simplified BSD Licenseに記載されているように保証なしで提供されます。

Table of Contents

   1.  Introduction ...................................................2
   2.  Scenario-Dependent Risk ........................................2
   3.  Guidelines for Use of VBR Audio with SRTP ......................3
   4.  Guidelines for Use of Voice Activity Detection with SRTP .......3
   5.  Padding the Output of VBR Codecs ...............................4
   6.  Security Considerations ........................................5
   7.  Acknowledgements ...............................................5
   8.  References .....................................................5
       8.1. Normative References ......................................5
       8.2. Informative References ....................................6

1. Introduction

1. はじめに

The Secure RTP (SRTP) framework [RFC3711] is a widely used framework for securing RTP sessions [RFC3550]. SRTP provides the ability to encrypt the payload of an RTP packet, and optionally add an authentication tag, while leaving the RTP header and any header extension in the clear. A range of encryption transforms can be used with SRTP, but none of the predefined encryption transforms use any padding; the RTP and SRTP payload sizes match exactly.

Secure RTP（SRTP）フレームワーク[RFC3711]は、RTPセッションを保護するために広く使用されているフレームワーク[RFC3550]です。 SRTPは、RTPパケットのペイロードを暗号化し、オプションで認証タグを追加する機能を提供しますが、RTPヘッダーとヘッダー拡張はクリアテキストのままにします。 SRTPではさまざまな暗号化トランスフォームを使用できますが、事前定義された暗号化トランスフォームはパディングを使用しません。 RTPとSRTPのペイロードサイズは正確に一致します。

When using SRTP with voice streams compressed using variable bit rate (VBR) codecs, the length of the compressed packets will depend on the characteristics of the speech signal. This variation in packet size will leak a small amount of information about the contents of the speech signal. This is potentially a security risk for some applications. For example, [spot-me] shows that known phrases in an encrypted call using the Speex codec in VBR mode can be recognized with high accuracy in certain circumstances, and [fon-iks] shows that approximate transcripts of encrypted VBR calls can be derived for some codecs without breaking the encryption. How significant these results are, and how they generalize to other codecs, is still an open question. This memo discusses ways in which such traffic analysis risks may be mitigated.

可変ビットレート（VBR）コーデックを使用して圧縮された音声ストリームでSRTPを使用する場合、圧縮されたパケットの長さは、音声信号の特性によって異なります。このパケットサイズの変動により、音声信号の内容に関する少量の情報が漏洩します。これは、一部のアプリケーションの潜在的なセキュリティリスクです。たとえば、[spot-me]は、VBRモードでSpeexコーデックを使用した暗号化されたコールの既知のフレーズを特定の状況で高精度で認識できることを示し、[fon-iks]は、暗号化されたVBRコールのおおよそのトランスクリプトを導出できることを示しています。一部のコーデックでは、暗号化を破ることはありません。これらの結果がいかに重要であり、それらが他のコーデックに一般化する方法は、まだ未解決の問題です。このメモでは、そのようなトラフィック分析のリスクを軽減する方法について説明します。

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

このドキュメントのキーワード "MUST"、"MUST NOT"、"REQUIRED"、"SHALL"、"SHALL NOT"、"SHOULD"、"SHOULD NOT"、"RECOMMENDED"、"MAY"、および "OPTIONAL" は、RFC 2119 [RFC2119] で説明されているように解釈されるものとします。

2. Scenario-Dependent Risk

2. シナリオ依存のリスク

Whether the information leaks and attacks discussed in [spot-me], [fon-iks], and similar works are significant is highly dependent on the application and use scenario. In the worst case, using the rate information to recognize a prerecorded message knowing the set of all possible messages would lead to near-perfect accuracy. Even when the audio is not prerecorded, there is a real possibility of being able to recognize contents from encrypted audio when the dialog is highly structured (e.g., when the eavesdropper knows that only a handful of possible sentences are possible), and thus contain only little information. Recognizing unconstrained conversational speech from the rate information alone is unreliable and computationally expensive at present, but does appear possible in some circumstances. These attacks are only likely to improve over time.

[spot-me]、[fon-iks]、および同様の作品で説明されている情報漏えいや攻撃が重要であるかどうかは、アプリケーションと使用シナリオに大きく依存します。最悪の場合、レート情報を使用して、すべての可能なメッセージのセットを知っている事前に録音されたメッセージを認識すると、ほぼ完全な精度になります。音声が事前に録音されていない場合でも、ダイアログが高度に構造化されている場合（たとえば、盗聴者が可能な文はほんの一握りしかないとわかっている場合）、暗号化された音声からコンテンツを認識できる可能性があり、少しの情報。レート情報のみから無制限の会話音声を認識することは、現時点では信頼性が低く、計算コストがかかりますが、状況によっては可能であるように見えます。これらの攻撃は、時間の経過とともに改善する可能性があります。

In practical SRTP scenarios, how significant the information leak is when compared to other SRTP-related information must be considered, such as the fact that the source and destination IP addresses are available.

実際のSRTPシナリオでは、送信元IPアドレスと宛先IPアドレスが利用可能であるなど、他のSRTP関連情報と比較した場合の情報漏えいの重要性を考慮する必要があります。

3. Guidelines for Use of VBR Audio with SRTP

3. SRTPでVBRオーディオを使用するためのガイドライン

It is the responsibility of the application designer to determine the appropriate trade-off between security and bandwidth overhead. As a general rule, VBR codecs should be considered safe in the context of low-value encrypted unstructured calls. However, applications that make use of prerecorded messages where the contents of such prerecorded messages may be of any value to an eavesdropper (i.e., messages beyond standard greeting messages) SHOULD NOT use codecs in VBR mode. Interactive voice response (IVR) applications would be particularly vulnerable since an eavesdropper could easily use the rate information to recognize the prompts being played out. Applications conveying highly sensitive unstructured information SHOULD NOT use codecs in VBR mode.

セキュリティと帯域幅のオーバーヘッドの間の適切なトレードオフを決定するのは、アプリケーション設計者の責任です。原則として、VBRコーデックは、価値の低い暗号化された非構造化呼び出しのコンテキストでは安全であると見なされます。ただし、事前録音されたメッセージの内容が盗聴者にとって価値のある可能性がある事前録音されたメッセージを使用するアプリケーション（つまり、標準の挨拶メッセージ以外のメッセージ）は、VBRモードでコーデックを使用しないでください。対話型音声応答（IVR）アプリケーションは、盗聴者がレート情報を簡単に使用して、再生されているプロンプトを認識できるため、特に脆弱です。機密性の高い非構造化情報を伝達するアプリケーションは、VBRモードでコーデックを使用しないでください。

It is safe to use variable rate coding to adapt the output of a voice codec to match characteristics of a network channel, provided this adaptation is done in a way that does not expose any information on the speech signal. For example, VBR audio can be used for congestion control purposes, where the variation is driven by the available network bandwidth, not by the input speech (i.e., the packet sizes and spacing are constant unless the network conditions change). VBR speech codecs can safely be used in this fashion with SRTP while avoiding leaking information on the contents of the speech signal that might be useful for traffic analysis.

音声コーデックの出力をネットワークチャネルの特性に一致するように適応させるために、可変レートコーディングを使用しても安全です。ただし、この適応は、音声信号に関する情報を公開しない方法で行われます。たとえば、VBRオーディオは輻輳制御の目的で使用でき、変動は入力音声ではなく、利用可能なネットワーク帯域幅によって駆動されます（つまり、ネットワークの状態が変化しない限り、パケットサイズと間隔は一定です）。 VBR音声コーデックは、SRTPでこのように安全に使用でき、トラフィック分析に役立つ音声信号の内容に関する情報の漏洩を回避できます。

4. Guidelines for Use of Voice Activity Detection with SRTP

4. SRTPでの音声アクティビティ検出の使用に関するガイドライン

Many speech codecs employ some form of voice activity detection (VAD) to either suppress output frames, or generate some form of lower-rate comfort noise frames, during periods when the speaker is not active. If VAD is used on an encrypted speech signal, then some information about the characteristics of that speech signal can be determined by watching the patterns of voice activity. This information leakage is less than with VBR coding since there are only two rates possible.

多くの音声コーデックは、スピーカーがアクティブでない期間に、何らかの形式の音声アクティビティ検出（VAD）を使用して、出力フレームを抑制するか、低速のコンフォートノイズフレームを生成します。暗号化された音声信号でVADを使用する場合、その音声信号の特性に関するいくつかの情報は、音声アクティビティのパターンを監視することによって決定できます。可能なレートは2つしかないため、この情報漏えいはVBRコーディングの場合よりも少なくなります。

The information leakage due to VAD in SRTP audio sessions can be much reduced if the sender adds an unpredictable "overhang" period to the end of active speech intervals, obscuring their actual length. An RTP sender using VAD with encrypted SRTP audio SHOULD insert such an overhang period at the end of each talkspurt, delaying the start of the silence/comfort noise by a random interval. The length of the overhang applied to each talkspurt must be randomly chosen in such a way that it is computationally infeasible for an attacker to reliably estimate the length of that talkspurt. This may be more important for short talkspurts, since it seems easier to distinguish between different single word responses based on the exact word length, than to glean meaning from the duration of a longer phrase. The audio data comprising the overhang period must be packetized and transmitted in RTP packets in a manner that is indistinguishable from the other data in the talkspurt.

SRTPオーディオセッションでのVADによる情報漏えいは、送信者が予測できない「オーバーハング」期間をアクティブなスピーチ間隔の最後に追加して、実際の長さを不明瞭にすると、大幅に削減できます。暗号化されたSRTPオーディオでVADを使用するRTP送信者は、このようなオーバーハング期間を各トークスパートの最後に挿入して、無音/快適ノイズの開始をランダムな間隔で遅らせる必要があります（SHOULD）。各トークスパートに適用されるオーバーハングの長さは、攻撃者がそのトークスパートの長さを確実に推定することを計算上実行できないようにランダムに選択する必要があります。これは、長いフレーズの継続時間から意味を収集するよりも、正確な語長に基づいて異なる単一の単語の応答を区別する方が簡単なように思われるため、短いトークスパートにとってより重要になる場合があります。オーバーハング期間を構成するオーディオデータは、トークスパート内の他のデータと区別できない方法でRTPパケットでパケット化して送信する必要があります。

The overhang period SHOULD have an exponentially decreasing probability distribution function. This ensures a long tail, while being easy to compute. It is RECOMMENDED to use an overhang with a "half life" of a few hundred milliseconds (this should be sufficient to obscure the presence of interword pauses and the lengths of single words spoken in isolation, for example, the digits of a credit card number clearly enunciated for an automated system, but not so long as to significantly reduce the effectiveness of VAD for detecting listening pauses). Despite the overhang (and no matter what the duration is), there is still a small amount of information leaked about the start time of the talkspurt due to the fact that we cannot apply an overhang to the start of a talkspurt without unacceptably affecting intelligibility. For that reason, VAD SHOULD NOT be used in encrypted IVR applications where the content of prerecorded messages may be of any value to an eavesdropper.

オーバーハング期間は、指数関数的に減少する確率分布関数を持つ必要があります。これにより、計算が簡単でありながら、長いテールが保証されます。数百ミリ秒の「半減期」を持つオーバーハングを使用することをお勧めします（これは、単語間ポーズの存在と、たとえばクレジットカード番号の桁など、単独で話される単一の単語の長さを不明瞭にするのに十分です）自動化されたシステムについて明確に示されていますが、リスニングポーズを検出するためのVADの効果を大幅に低下させるほど長くはありません）。オーバーハングにもかかわらず（および継続時間が何であろうと）、了解度に許容できないほどの影響を与えることなくトークスパートの開始にオーバーハングを適用できないため、トークスパートの開始時刻に関する少量の情報が漏れています。そのため、VADは、事前に録音されたメッセージのコンテンツが盗聴者にとって価値のある暗号化されたIVRアプリケーションでは使用しないでください。

The application of a random overhang period to each talkspurt will reduce the effectiveness of VAD in SRTP sessions when compared to non-SRTP sessions. However, it is still expected that the use of VAD will provide significant bandwidth savings for many encrypted sessions.

各トークスパートにランダムオーバーハング期間を適用すると、非SRTPセッションと比較した場合に、SRTPセッションでのVADの効果が低下します。ただし、VADを使用すると、多くの暗号化されたセッションで帯域幅を大幅に節約できると予想されます。

5. Padding the Output of VBR Codecs

5. VBRコーデックの出力のパディング

For scenarios where VBR is considered unsafe, a constant bit rate (CBR) codec SHOULD be negotiated and used instead, or the VBR codec SHOULD be operated in a CBR mode. However, if the codec does not support CBR, RTP padding SHOULD be used to reduce the information leak to an insignificant level. Packets may be padded to a constant size or to a small range of sizes ([spot-me] achieves good results by padding to the next multiple of 16 octets, but the amount of padding needed to hide the variation in packet size will depend on the codec and the sophistication of the attacker) or may be padded to a size that varies with time. The most secure and RECOMMENDED option is to pad all packets throughout the call to the same size.

VBRが安全でないと考えられるシナリオでは、固定ビットレート（CBR）コーデックをネゴシエートして代わりに使用する必要があります（SHOULD）。または、VBRコーデックはCBRモードで操作する必要があります（SHOULD）。ただし、コーデックがCBRをサポートしていない場合は、RTPパディングを使用して、情報漏えいを無視できるレベルに減らす必要があります（SHOULD）。パケットは、一定のサイズまたは小さなサイズの範囲にパディングされる場合があります（[spot-me]は、16オクテットの次の倍数にパディングすることで良好な結果を達成しますが、パケットサイズの変動を隠すために必要なパディングの量は、コーデックと攻撃者の洗練度）、または時間とともに変化するサイズにパディングされる場合があります。最も安全で推奨されるオプションは、呼び出し全体ですべてのパケットを同じサイズに埋め込むことです。

In the case where the size of the padded packets varies in time, the same concerns as for VAD apply. That is, the padding SHOULD NOT be reduced without waiting for a certain (random) time. The RECOMMENDED "hold time" is the same as the one for VAD.

パディングされたパケットのサイズが時間とともに変化する場合、VADと同じ問題が適用されます。つまり、一定の（ランダムな）時間を待たずにパディングを減らすべきではありません（SHOULD NOT）。推奨される「保持時間」は、VADの場合と同じです。

Note that SRTP encrypts the count of the number of octets of padding added to a packet, but not the bit in the RTP header that indicates that the packet has been padded. For this reason, it is RECOMMENDED to add at least one octet of padding to all packets in a media stream, so an attacker cannot tell which packets needed padding.

SRTPはパケットに追加されたパディングのオクテット数のカウントを暗号化しますが、パケットがパディングされたことを示すRTPヘッダーのビットは暗号化しないことに注意してください。このため、メディアストリームのすべてのパケットに少なくとも1オクテットのパディングを追加することをお勧めします。これにより、攻撃者はパディングが必要なパケットを特定できません。

6. Security Considerations

6. セキュリティに関する考慮事項

This entire memo is about security. The security considerations of [RFC3711] also apply.

このメモ全体はセキュリティに関するものです。 [RFC3711]のセキュリティに関する考慮事項も適用されます。

7. Acknowledgements

7. 謝辞

ZRTP [RFC6189] contains similar recommendations; the purpose of this memo is to highlight these issues to a wider audience, since they are not specific to ZRTP. Thanks are due to Phil Zimmermann, Stefan Doehla, Mats Naslund, Gregory Maxwell, David McGrew, Mark Baugher, Koen Vos, Ingemar Johansson, and Stephen Farrell for their comments and feedback on this memo.

ZRTP [RFC6189]にも同様の推奨事項が含まれています。このメモの目的は、これらの問題がZRTPに固有のものではないため、より広い聴衆にこれらの問題を強調することです。このメモに関するコメントとフィードバックを提供してくれたPhil Zimmermann、Stefan Doehla、Mats Naslund、Gregory Maxwell、David McGrew、Mark Baugher、Koen Vos、Ingemar Johansson、Stephen Farrellに感謝します。

8. References

8. 参考文献

8.1. Normative References

8.1. 引用文献

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2119] Bradner、S。、「要件レベルを示すためにRFCで使用するキーワード」、BCP 14、RFC 2119、1997年3月。

[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.

[RFC3550] Schulzrinne、H.、Casner、S.、Frederick、R。、およびV. Jacobson、「RTP：A Transport Protocol for Real-Time Applications」、STD 64、RFC 3550、2003年7月。

[RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC 3711, March 2004.

[RFC3711]バウアー、M。、マクルー、D。、ナスルンド、M。、カララ、E。、およびK.ノーマン、「Secure Real-time Transport Protocol（SRTP）」、RFC 3711、2004年3月。

8.2. Informative References

8.2. 参考引用

[RFC6189] Zimmermann, P., Johnston, A., and J. Callas, "ZRTP: Media Path Key Agreement for Unicast Secure RTP", RFC 6189, April 2011.

[RFC6189] Zimmermann、P.、Johnston、A。、およびJ. Callas、「ZRTP：Media Path Key Agreement for Unicast Secure RTP」、RFC 6189、2011年4月。

[fon-iks] White, A., Matthews, A., Snow, K., and F. Monrose, "Phonotactic Reconstruction of Encrypted VoIP Conversations: Hookt on fon-iks", Proceedings of the IEEE Symposium on Security and Privacy 2011, May 2011.

[fon-iks]ホワイト、A。、マシューズ、A。、スノー、K。、およびF.モンローズ、「暗号化されたVoIP会話の表現法の再構築：fon-iksのフック」、セキュリティとプライバシーに関するIEEEシンポジウム2011の議事録、2011年5月。

[spot-me] Wright, C., Ballard, L., Coull, S., Monrose, F., and G. Masson, "Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversation", Proceedings of the IEEE Symposium on Security and Privacy 2008, May 2008.

[spot-me] Wright、C.、Ballard、L.、Coull、S.、Monrose、F。、およびG. Masson、「できれば私を見つけて：暗号化されたVoIP会話で話されたフレーズを明らかにする」、IEEE議事録セキュリティとプライバシーに関するシンポジウム2008、2008年5月。

Authors' Addresses

著者のアドレス

Colin Perkins University of Glasgow School of Computing Science Glasgow G12 8QQ UK

コリンパーキンスグラスゴー大学コンピューティングサイエンススクールグラスゴーG12 8QQイギリス

   EMail: csp@csperkins.org

Jean-Marc Valin Mozilla Corporation 650 Castro Street Mountain View, CA 94041 USA

Jean-Marc Valin Mozilla Corporation 650 Castro Street Mountain View、CA 94041 USA

   Phone: +1 650 903-0800
   EMail: jmvalin@jmvalin.ca