RFC 9628 - RTP Payload Format for VP9 Video (Japanese translation)

Source URL : https://www.rfc-editor.org/rfc/rfc9628.html
Title : RFC 9628 - RTP Payload Format for VP9 Video
Translation : automatically generated

Internet Engineering Task Force (IETF)                         J. Uberti
Request for Comments: 9628                                        OpenAI
Category: Standards Track                                      S. Holmer
ISSN: 2070-1721                                               M. Flodman
                                                                 D. Hong
                                                                  Google
                                                               J. Lennox
                                                             8x8 / Jitsi
                                                              March 2025

RTP Payload Format for VP9 Video

Abstract

This specification describes an RTP payload format for the VP9 video codec. The payload format has wide applicability as it supports applications from low bitrate peer-to-peer usage to high bitrate video conferences. It includes provisions for temporal and spatial scalability.

Status of This Memo

This is an Internet Standards Track document.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 7841.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc9628.

Copyright Notice

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.

   1.  Introduction
   2.  Conventions
   3.  Media Format Description
   4.  Payload Format
     4.1.  RTP Header Usage
     4.2.  VP9 Payload Descriptor
       4.2.1.  Scalability Structure (SS)
     4.3.  Frame Fragmentation
     4.4.  Scalable Encoding Considerations
     4.5.  Example of a VP9 RTP Stream
       4.5.1.  Reference Picture Use for Scalable Structure
   5.  Feedback Messages and Header Extensions
     5.1.  Reference Picture Selection Indication (RPSI)
     5.2.  Full Intra Request (FIR)
     5.3.  Layer Refresh Request (LRR)
   6.  Payload Format Parameters
     6.1.  SDP Parameters
       6.1.1.  Mapping of Media Subtype Parameters to SDP
       6.1.2.  Offer/Answer Considerations
   7.  Media Type Definition
   8.  Security Considerations
   9.  Congestion Control
   10. IANA Considerations
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Acknowledgments
   Authors' Addresses

1. Introduction

This document describes an RTP [RFC3550] payload specification applicable to the transmission of video streams encoded using the VP9 video codec [VP9-BITSTREAM]. The format described in this document can be used both in peer-to-peer and video conferencing applications.

The VP9 video codec was developed by Google and is the successor to its earlier VP8 [RFC6386] codec. Beyond the compression improvements and other general enhancements over VP8, VP9 is also designed in a way that allows spatially scalable video encoding.

2. Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

3. Media Format Description

The VP9 codec can maintain up to eight reference frames, of which up to three can be referenced by any new frame.

VP9 also allows a frame to use another frame of a different resolution as a reference frame. (Specifically, a frame may use any references whose width and height are between 1/16th that of the current frame and twice that of the current frame, inclusive.) This allows internal resolution changes without requiring the use of keyframes.
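
The resolution constraint above can be written as a small predicate. This is only a sketch: the function name and its arguments are hypothetical, and it assumes the limits apply independently to width and height, as the parenthetical suggests.

```python
def can_reference(ref_w, ref_h, cur_w, cur_h):
    """Return True if a ref_w x ref_h frame may serve as a reference
    for a cur_w x cur_h frame (hypothetical helper, not from the RFC)."""
    def in_range(ref, cur):
        # Integer form of: cur / 16 <= ref <= 2 * cur (inclusive).
        return 16 * ref >= cur and ref <= 2 * cur
    return in_range(ref_w, cur_w) and in_range(ref_h, cur_h)
```

For instance, under these assumptions a 1280x720 frame could reference a 640x360 frame, but not a 79x45 one, because 79 is below 1280/16.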

These features together enable an encoder to implement various forms of coarse-grained scalability, including temporal, spatial, and quality scalability modes, as well as combinations of these, without the need for explicit scalable coding tools.

Temporal layers define different frame rates of video; spatial and quality layers define different and possibly dependent representations of a single input frame. Spatial layers allow a frame to be encoded at different resolutions, whereas quality layers allow a frame to be encoded at the same resolution but at different qualities (and, thus, with different amounts of coding error). VP9 supports quality layers as spatial layers without any resolution changes; hereinafter, the term "spatial layer" is used to represent both spatial and quality layers.

This payload format specification defines how such temporal and spatial scalability layers can be described and communicated.

Temporal and spatial scalability layers are associated with non-negative integer IDs. The lowest layer of either type has an ID of zero and is sometimes referred to as the "base" temporal or spatial layer.

Layers are designed, and MUST be encoded, such that if any layer, and all higher layers, are removed from the bitstream along either the spatial or temporal dimension, the remaining bitstream is still correctly decodable.
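
The layer-removal property can be illustrated with a simple filter, such as a forwarding middlebox might apply. This is a sketch only: the Frame record and select_layers are hypothetical names, and a real forwarder would also have to rewrite the marker bit as described in Section 4.1.

```python
from typing import NamedTuple

class Frame(NamedTuple):
    """Hypothetical per-frame record: Picture ID plus layer IDs."""
    pid: int
    tid: int
    sid: int

def select_layers(frames, max_tid, max_sid):
    """Keep only frames at or below the target temporal and spatial layers.

    Because a layer and all higher layers can be removed along either
    dimension, the frames that remain should still be decodable.
    """
    return [f for f in frames if f.tid <= max_tid and f.sid <= max_sid]
```

Calling select_layers(frames, max_tid=0, max_sid=0) would keep only the base temporal and base spatial layer.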

For terminology, this document uses the term "frame" to refer to a single encoded VP9 frame for a particular resolution and/or quality, and "picture" to refer to all the representations (frames) at a single instant in time. Thus, a picture consists of one or more frames, encoding different spatial layers.

Within a picture, a frame with spatial-layer ID equal to S, where S > 0, can depend on a frame of the same picture with a lower spatial-layer ID. This "inter-layer" dependency can result in additional coding gain compared to the case where only "inter-picture" dependency is used, where a frame depends on a previously coded frame in time. For simplicity, this payload format assumes that, within a picture and if inter-layer dependency is used, a spatial-layer S frame can depend only on the immediately previous spatial-layer S-1 frame, when S > 0. Additionally, if inter-picture dependency is used, a spatial-layer S frame is assumed to only depend on a previously coded spatial-layer S frame.

Given the above simplifications for inter-layer and inter-picture dependencies, a flag (the D bit described below) is used to indicate whether a spatial-layer SID frame depends on the spatial-layer SID-1 frame. Given the D bit, a receiver only needs to additionally know the inter-picture dependency structure for a given spatial-layer frame in order to determine its decodability. Two modes of describing the inter-picture dependency structure are possible: "flexible mode" and "non-flexible mode". An encoder can only switch between the two on the first packet of a keyframe with a temporal-layer ID equal to zero.

In flexible mode, each packet can contain up to three reference indices, which identify all frames referenced by the frame transmitted in the current packet for inter-picture prediction. This (along with the D bit) enables a receiver to identify if a frame is decodable or not and helps it understand the temporal-layer structure. Since this is signaled in each packet, it makes it possible to have very flexible temporal-layer hierarchies and scalability structures, which are changing dynamically.
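
A receiver-side decodability check in flexible mode might look like the following sketch. It relies on the simplifying assumptions stated earlier in this section (a spatial-layer S frame depends only on the S-1 frame of the same picture and on earlier spatial-layer S frames). The function name, its parameters, and the "decoded" set of (PID, SID) pairs are hypothetical bookkeeping, not part of the payload format.

```python
def is_decodable(pid, sid, d_bit, p_bit, p_diffs, decoded, pid_bits=15):
    """Return True if every dependency of the frame has been decoded.

    decoded  -- set of (pid, sid) tuples for frames already decoded
    p_diffs  -- reference indices carried in the packet (flexible mode)
    pid_bits -- width of the Picture ID field in use (7 or 15)
    """
    mask = (1 << pid_bits) - 1
    # Inter-layer dependency: the D bit points at SID-1 of the same picture.
    if d_bit and (pid, sid - 1) not in decoded:
        return False
    # Inter-picture dependencies: each reference index is an offset back
    # from the current Picture ID, assumed here to stay on the same SID.
    if p_bit:
        for diff in p_diffs:
            if ((pid - diff) & mask, sid) not in decoded:
                return False
    return True
```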

In non-flexible mode, frames are encoded using a fixed, recurring pattern of dependencies; the set of pictures that recur in this pattern is known as a "Picture Group" (or "PG"). In this mode, the inter-picture dependencies (the reference indices) of the PG MUST be pre-specified as part of the Scalability Structure (SS) data. Each packet has an index to refer to one of the described pictures in the PG from which the pictures referenced by the picture transmitted in the current packet for inter-picture prediction can be identified.

Note: A "Picture Group" or "PG", as used in this document, is not the same thing as the term "Group of Pictures" as it is commonly used in video coding, i.e., to mean an independently decodable run of pictures beginning with a keyframe.

The SS data can also be used to specify the resolution of each spatial layer present in the VP9 stream for both flexible and non-flexible modes.

4. Payload Format

This section describes how the encoded VP9 bitstream is encapsulated in RTP. To handle network losses, usage of RTP/AVPF [RFC4585] is RECOMMENDED. All integer fields in this specification are encoded as unsigned integers in network octet order.

4.1. RTP Header Usage

The general RTP payload format for VP9 is depicted below.

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |V=2|P|X|  CC   |M|     PT      |       sequence number         |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                           timestamp                           |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |           synchronization source (SSRC) identifier            |
     +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
     |            contributing source (CSRC) identifiers             |
     |                             ....                              |
     +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
     |            VP9 payload descriptor (integer #octets)           |
     :                                                               :
     |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                               :                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
     |                                                               |
     +                                                               |
     :                          VP9 payload                          :
     |                                                               |
     |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                               :    OPTIONAL RTP padding       |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 1: General RTP Payload Format for VP9

See Section 4.2 for more information on the VP9 payload descriptor; the VP9 payload is described in [VP9-BITSTREAM]. OPTIONAL RTP padding MUST NOT be included unless the P bit is set.

Marker bit (M):

This bit MUST be set to one for the final packet of the highest spatial-layer frame (the final packet of the picture); otherwise, it is zero. Unless spatial scalability is in use for this picture, this bit will have the same value as the E bit described in Section 4.2. Note that this bit MUST be set to one for the target spatial-layer frame if a stream is being rewritten to remove higher spatial layers.

Payload Type (PT):

In line with the policy in Section 3 of [RFC3551], applications using the VP9 RTP payload profile MUST assign a dynamic payload type number to be used in each RTP session and provide a mechanism to indicate the mapping. See Section 6.1 for the mechanism to be used with the Session Description Protocol (SDP) [RFC8866].

Timestamp:

The RTP timestamp [RFC3550] indicates the time when the input frame was sampled, at a clock rate of 90 kHz. If the input picture is encoded with multiple frames, all of the frames of the picture MUST have the same timestamp.
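
As a rough illustration of the 90 kHz clock, a capture time in seconds can be mapped to a 32-bit RTP timestamp as follows. The function name and the offset parameter are hypothetical; the offset stands in for the random initial timestamp that [RFC3550] recommends.

```python
RTP_CLOCK_HZ = 90_000  # VP9 RTP clock rate

def rtp_timestamp(capture_time_s, offset=0):
    """Convert a capture time in seconds to a 32-bit RTP timestamp.

    All frames of one picture would share the value computed for that
    picture's capture time.
    """
    ticks = round(capture_time_s * RTP_CLOCK_HZ)
    return (offset + ticks) & 0xFFFFFFFF  # wrap modulo 2^32
```

At 30 pictures per second, consecutive pictures are 3000 ticks apart.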

If a frame has the VP9 show_frame field set to zero (i.e., it is meant only to populate a reference buffer without being output), its timestamp MAY alternatively be set to be the same as the subsequent frame with show_frame equal to one. (This will be convenient for playing out pre-encoded content packaged with VP9 "superframes", which typically bundle show_frame==0 frames with a subsequent show_frame==1 frame.) Every picture containing a frame with show_frame==1, however, MUST have a unique timestamp modulo the 2^32 wrap of the field.

The remaining RTP Fixed Header Fields (V, P, X, CC, sequence number, SSRC, and CSRC identifiers) are used as specified in Section 5.1 of [RFC3550].

4.2. VP9 Payload Descriptor

In flexible mode (with the F bit below set to one), the first octets after the RTP header are the VP9 payload descriptor, with the following structure.

         0 1 2 3 4 5 6 7
        +-+-+-+-+-+-+-+-+
        |I|P|L|F|B|E|V|Z| (REQUIRED)
        +-+-+-+-+-+-+-+-+
   I:   |M| PICTURE ID  | (REQUIRED)
        +-+-+-+-+-+-+-+-+
   M:   | EXTENDED PID  | (RECOMMENDED)
        +-+-+-+-+-+-+-+-+
   L:   | TID |U| SID |D| (Conditionally RECOMMENDED)
        +-+-+-+-+-+-+-+-+                             -\
   P,F: | P_DIFF      |N| (Conditionally REQUIRED)    - up to 3 times
        +-+-+-+-+-+-+-+-+                             -/
   V:   | SS            |
        | ..            |
        +-+-+-+-+-+-+-+-+

Figure 2: Flexible Mode Format for VP9 Payload Descriptor

In non-flexible mode (with the F bit below set to zero), the first octets after the RTP header are the VP9 payload descriptor, with the following structure.

         0 1 2 3 4 5 6 7
        +-+-+-+-+-+-+-+-+
        |I|P|L|F|B|E|V|Z| (REQUIRED)
        +-+-+-+-+-+-+-+-+
   I:   |M| PICTURE ID  | (RECOMMENDED)
        +-+-+-+-+-+-+-+-+
   M:   | EXTENDED PID  | (RECOMMENDED)
        +-+-+-+-+-+-+-+-+
   L:   | TID |U| SID |D| (Conditionally RECOMMENDED)
        +-+-+-+-+-+-+-+-+
        |   TL0PICIDX   | (Conditionally REQUIRED)
        +-+-+-+-+-+-+-+-+
   V:   | SS            |
        | ..            |
        +-+-+-+-+-+-+-+-+

Figure 3: Non-Flexible Mode Format for VP9 Payload Descriptor

Except as noted, the following field descriptions apply to the payload descriptor formats in both Figures 2 and 3.

I:

Picture ID (PID) present. When set to one, the OPTIONAL PID MUST be present after the mandatory first octet and specified as below. Otherwise, PID MUST NOT be present. If the V bit was set in the stream's most recent start of a keyframe (i.e., the SS field was present) and the F bit is set to zero (i.e., non-flexible scalability mode is in use), then this bit MUST be set on every packet.

P:

Inter-picture predicted frame. When set to zero, the frame does not utilize inter-picture prediction. In this case, up-switching to a current spatial layer's frame is possible from a directly lower spatial-layer frame. P SHOULD also be set to zero when encoding a layer synchronization frame in response to a Layer Refresh Request (LRR) [RFC9627] message (see Section 5.3). When P is set to zero, the Temporal-layer ID (TID) field (described below) MUST also be set to zero (if present). Note that the P bit does not forbid intra-picture, inter-layer prediction from earlier frames of the same picture, if any.

L:

Layer indices present. When set to one, the one or two octets following the mandatory first octet and the PID (if present) are as described by "Layer indices" below. If the F bit (described below) is set to one (indicating flexible mode), then only one octet is present for the layer indices. Otherwise, if the F bit is set to zero (indicating non-flexible mode), then two octets are present for the layer indices.

F:

Flexible mode. When set to one, this indicates flexible mode; if the P bit is also set to one, then the octets following the mandatory first octet, the PID, and layer indices (if present) are as described by "reference indices" below. This bit MUST only be set to one if the I bit is also set to one; if the I bit is set to zero, then this bit MUST also be set to zero and ignored by receivers. (Flexible mode's reference indices are defined as offsets from the Picture ID field, so they would have no meaning if the I bit were not set.) The value of the F bit MUST only change on the first packet of a key picture. A "key picture" is a picture whose base spatial-layer frame is a keyframe, and thus one which completely resets the encoder state. This packet will have its P bit equal to zero, SID or L bit (described below) equal to zero, and B bit (described below) equal to one.

B:

Start of Frame. This bit MUST be set to one if the first payload octet of the RTP packet is the beginning of a new VP9 frame; otherwise, it MUST NOT be one. Note that this frame might not be the first frame of a picture.

E:

End of Frame. This bit MUST be set to one for the final RTP packet of a VP9 frame; otherwise, it is zero. This enables a decoder to finish decoding the frame, where it otherwise may need to wait for the next packet to explicitly know that the frame is complete. Note that, if spatial scalability is in use, more frames from the same picture may follow; see the description of the B bit above.

V:

Scalability Structure (SS) data present. When set to one, the OPTIONAL SS data MUST be present in the payload descriptor. Otherwise, the SS data MUST NOT be present.

Z:

Not a reference frame for upper spatial layers. When set to one, this indicates that frames with higher spatial layers SID+1 and greater of the current and following pictures do not depend on the current spatial-layer SID frame. This enables a decoder that is targeting a higher spatial layer to know that it can safely discard this packet's frame without processing it, without having to wait for the D bit in the higher-layer frame (see below).

The mandatory first octet is followed by the extension data fields that are enabled:

M:

The most significant bit of the first octet is an extension flag. The field MUST be present if the I bit is equal to one. If M is set, the PID field MUST contain 15 bits; otherwise, it MUST contain 7 bits. See PID below.

Picture ID (PID):

Picture ID represented in 7 or 15 bits, depending on the M bit. This is a running index of the pictures, where the sender increments the value by one for each picture it sends. (Note, however, that because a middlebox can discard pictures where permitted by the SS, Picture IDs as received by a receiver might not be contiguous.) This field MUST be present if the I bit is equal to one. If M is set to zero, 7 bits carry the PID; else, if M is set to one, 15 bits carry the PID in network byte order. The sender may choose between a 7- or 15-bit index. The PID SHOULD start on a random number and MUST wrap after reaching the maximum ID (0x7f or 0x7fff depending on the index size chosen). The receiver MUST NOT assume that the number of bits in the PID stays the same through the session. If this field transitions from 7 bits to 15 bits, the value is zero-extended (e.g., the value after 0x6e is 0x006f); if the field transitions from 15 bits to 7 bits, it is truncated (e.g., the value after 0x1bbe is 0x3f).
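
The PID encoding rules can be sketched as below. The helper names are hypothetical; the bit layout (M flag in the most significant bit, followed by 7 or 15 PID bits in network byte order) follows the description above.

```python
def encode_pid(pid, use_15_bits):
    """Serialize a Picture ID as one octet (M=0) or two octets (M=1)."""
    if use_15_bits:
        if not 0 <= pid <= 0x7FFF:
            raise ValueError("PID does not fit in 15 bits")
        return bytes([0x80 | (pid >> 8), pid & 0xFF])
    if not 0 <= pid <= 0x7F:
        raise ValueError("PID does not fit in 7 bits")
    return bytes([pid])

def decode_pid(data):
    """Return (pid, pid_bits, octets_consumed) from the start of data."""
    if data[0] & 0x80:  # M bit set: 15-bit PID
        return ((data[0] & 0x7F) << 8) | data[1], 15, 2
    return data[0] & 0x7F, 7, 1

def next_pid(pid, pid_bits):
    """Increment a PID, wrapping (or truncating) to pid_bits bits."""
    return (pid + 1) & ((1 << pid_bits) - 1)
```

With these helpers, next_pid(0x1bbe, 7) yields 0x3f and next_pid(0x6e, 15) yields 0x6f, which matches the two width-transition examples in the text.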

In the non-flexible mode (when the F bit is set to zero), this PID is used as an index to the PG specified in the SS data below. In this mode, the PID of the keyframe corresponds to the first specified frame in the PG. Then subsequent PIDs are mapped to subsequently specified frames in the PG (modulo N_G, specified in the SS data below), respectively.

All frames of the same picture MUST have the same PID value.

Frames (and their corresponding pictures) with the VP9 show_frame field equal to zero MUST have distinct PID values from subsequent pictures with show_frame equal to one. Thus, a picture (as defined in this specification) is different than a VP9 superframe.

All frames of the same picture MUST have the same value for show_frame.

Layer indices:

This field is optional but RECOMMENDED whenever encoding with layers. For both flexible and non-flexible modes, one octet is used to specify a layer frame's Temporal-layer ID (TID) and Spatial-layer ID (SID), as shown in both Figures 2 and 3. Additionally, a bit (U) is used to indicate that the current frame is a "switching up point" frame. Another bit (D) is used to indicate whether inter-layer prediction is used for the current frame.

In the non-flexible mode (when the F bit is set to zero), another octet is used to represent the Temporal Layer 0 Picture Index (8 bits) (TL0PICIDX), as depicted in Figure 3. The TL0PICIDX is present so that all minimally required frames (the base temporal-layer frames) can be tracked.

The TID and SID fields indicate the temporal and spatial layers and can help middleboxes and endpoints quickly identify which layer a packet belongs to.

TID:

The temporal-layer ID of the current frame. In the case of non-flexible mode, if a PID is mapped to a picture in a specified PG, then the value of the TID MUST match the corresponding TID value of the mapped picture in the PG.

U:

Switching up point. When this bit is set to one, if the current picture has a temporal-layer ID equal to value T, then subsequent pictures with temporal-layer ID values higher than T will not depend on any picture before the current picture (in decode order) with a temporal-layer ID value greater than T.

SID:

The spatial-layer ID of the current frame. Note that frames with spatial-layer SID > 0 may be dependent on the decoded spatial-layer SID-1 frame within the same picture. Different frames of the same picture MUST have distinct spatial-layer IDs, and frames' spatial layers MUST appear in increasing order within the picture.

D:

Inter-layer dependency is used. D MUST be set to one if and only if the current spatial-layer SID frame depends on the spatial-layer SID-1 frame of the same picture; otherwise, it MUST be set to zero. For the base-layer frame (with SID equal to zero), the D bit MUST be set to zero.

TL0PICIDX:

Temporal Layer 0 Picture Index (8 bits). TL0PICIDX is only present in the non-flexible mode (F = 0). This is a running index for the temporal base-layer pictures, i.e., the pictures with a TID set to zero. If the TID is larger than zero, TL0PICIDX indicates which temporal base-layer picture the current picture depends on. TL0PICIDX MUST be incremented by one when the TID is equal to zero. The index SHOULD start on a random number and MUST restart at zero after reaching the maximum number 255.

Reference indices:

When P and F are both set to one, indicating a non-keyframe in flexible mode, then at least one reference index MUST be specified as below. Additional reference indices (a total of up to three reference indices are allowed) may be specified using the N bit below. When either P or F is set to zero, then no reference index is specified.

PとFが両方とも1つに設定されている場合、柔軟なモードの非キーフレームを示している場合、少なくとも1つの参照インデックスを以下のように指定する必要があります。追加の参照インデックス（合計3つの参照インデックスが許可されます）は、以下のNビットを使用して指定できます。PまたはFがゼロに設定されている場合、参照インデックスは指定されていません。

P_DIFF:

P_DIFF：

The reference index (in 7 bits) specified as the relative PID from the current picture. For example, P_DIFF=3 on a packet containing the picture with PID 112 means that the picture refers back to the picture with PID 109. This calculation is done modulo the size of the PID field, i.e., either 7 or 15 bits. A P_DIFF value of zero is invalid.

現在の画像からの相対PIDとして指定された参照インデックス（7ビット）。たとえば、PID 112の画像を含むパケットでP_DIFF = 3の場合、その画像はPID 109の画像を参照することを意味します。この計算は、PIDフィールドのサイズ（7ビットまたは15ビット）を法として行われます。ゼロのP_DIFF値は無効です。
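
The modular calculation described above can be illustrated with a short sketch. The function name referenced_pid and its parameters are assumptions made for this example.

```python
def referenced_pid(current_pid: int, p_diff: int, pid_bits: int) -> int:
    """Return the PID of the picture referenced by P_DIFF (hypothetical helper).

    pid_bits is the width of the PID field in use, either 7 or 15.
    """
    if pid_bits not in (7, 15):
        raise ValueError("PID field is either 7 or 15 bits wide")
    if not 1 <= p_diff <= 127:
        # P_DIFF is a 7-bit field, and a value of zero is invalid.
        raise ValueError("P_DIFF must be in the range 1..127")
    # Subtract modulo the size of the PID field so that wraparound is handled.
    return (current_pid - p_diff) % (1 << pid_bits)
```

For the example in the text, referenced_pid(112, 3, 7) gives 109. Near a wraparound, the same P_DIFF on PID 1 refers to PID 126 with a 7-bit PID and to PID 32766 with a 15-bit PID.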

N:

1 if there is an additional P_DIFF following the current P_DIFF.

1現在のp_diffに続いて追加のp_diffがある場合。

4.2.1. Scalability Structure (SS)

4.2.1. スケーラビリティ構造（SS）

The SS data describes the resolution of each frame within a picture as well as the inter-picture dependencies for a PG. If the VP9 payload descriptor's V bit is set, the SS data is present in the position indicated in Figures 2 and 3.

SSデータは、画像内の各フレームの解像度と、PGの間隔依存関係を記述します。VP9ペイロード記述子のVビットが設定されている場合、SSデータは図2および3に示されている位置に存在します。

        +-+-+-+-+-+-+-+-+
   V:   | N_S |Y|G|-|-|-|
        +-+-+-+-+-+-+-+-+              -\
   Y:   |     WIDTH     | (OPTIONAL)    .
        +               +               .
        |               | (OPTIONAL)    .
        +-+-+-+-+-+-+-+-+               . - N_S + 1 times
        |     HEIGHT    | (OPTIONAL)    .
        +               +               .
        |               | (OPTIONAL)    .
        +-+-+-+-+-+-+-+-+              -/
   G:   |      N_G      | (OPTIONAL)
        +-+-+-+-+-+-+-+-+                            -\
   N_G: | TID |U| R |-|-| (OPTIONAL)                 .
        +-+-+-+-+-+-+-+-+              -\            . - N_G times
        |    P_DIFF     | (OPTIONAL)    . - R times  .
        +-+-+-+-+-+-+-+-+              -/            -/

Figure 4: VP9 Scalability Structure

図4：VP9スケーラビリティ構造

N_S:

N_S：

Number of Spatial Layers Minus 1. N_S + 1 indicates the number of spatial layers present in the VP9 stream.

空間層の数を引いた1。N_S + 1は、VP9ストリームに存在する空間層の数を示します。

Y:

Each spatial layer's frame resolution is present. When set to one, the OPTIONAL WIDTH (2 octets) and HEIGHT (2 octets) MUST be present for each layer frame. Otherwise, the resolution MUST NOT be present.

各空間層のフレーム解像度が存在します。1つに設定すると、オプションの幅（2オクテット）と高さ（2オクテット）が各レイヤーフレームに存在する必要があります。それ以外の場合、解決策が存在してはなりません。

G:

The PG description present flag.

PG説明はフラグを提示します。

-:

- ：

A bit reserved for future use. It MUST be set to zero and MUST be ignored by the receiver.

将来の使用のために少し予約されています。ゼロに設定する必要があり、受信機によって無視する必要があります。

N_G:

N_G：

N_G indicates the number of pictures in a PG. If N_G is greater than zero, then the SS data allows the inter-picture dependency structure of the VP9 stream to be pre-declared, rather than indicating it on the fly with every packet. If N_G is greater than zero, then for N_G pictures in the PG, each picture's Temporal-layer ID (TID), switch up point (U), and reference indices (P_DIFFs) are specified.

N_Gは、Pgの写真の数を示します。N_Gがゼロより大きい場合、SSデータは、すべてのパケットでその場でそれを示すのではなく、VP9ストリームの偏光間依存構造を事前に宣言することができます。N_Gがゼロより大きい場合、PGのN_G写真の場合、各画像の時間層ID（TID）、スイッチアップポイント（U）、および参照インデックス（P_DIFF）が指定されています。

The first picture specified in the PG MUST have a TID set to zero.

PGで指定された最初の画像には、ゼロに設定されたTIDセットが必要です。

G set to zero or N_G set to zero indicates that either there is only one temporal layer (for non-flexible mode) or no fixed inter-picture dependency information is present (for flexible mode) going forward in the bitstream.

Gがゼロに設定されているか、N_Gがゼロに設定されている場合、（非フレキシブルモードでは）時間層が1つしかないこと、または（フレキシブルモードでは）ビットストリームの以降の部分に固定のピクチャ間依存関係情報が存在しないことを示します。

Note that for a given picture, all frames follow the same inter-picture dependency structure. However, the frame rate of each spatial layer can be different from each other; this can be described with the use of the D bit described above. The specified dependency structure in the SS data MUST be for the highest frame rate layer.

特定の画像では、すべてのフレームが同じ選択子間依存性構造に従うことに注意してください。ただし、各空間層のフレームレートは互いに異なる場合があります。これは、上記のDビットを使用して説明できます。SSデータの指定された依存関係構造は、最高のフレームレートレイヤー用でなければなりません。

R:

The number of P_DIFF fields that are present.

存在するP_DIFFフィールドの数。

In a scalable stream sent with a fixed pattern, the SS data SHOULD be included in the first packet of every key frame. This is a packet with the P bit equal to zero, SID or L bit equal to zero, and B bit equal to one. The SS data MUST only be changed on the picture that corresponds to the first picture specified in the previous SS data's PG (if the previous SS data's N_G was greater than zero).

固定パターンで送信されるスケーラブルなストリームでは、SSデータをすべてのキーフレームの最初のパケットに含める必要があります。これは、ゼロに等しいPビット、sidまたはlビットがゼロに等しく、bビットが1に等しいパケットです。SSデータは、以前のSSデータのPGで指定された最初の画像に対応する画像でのみ変更する必要があります（以前のSSデータのN_Gがゼロより大きい場合）。
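
As an illustration of the layout in Figure 4, the following sketch parses SS data from a byte string. It is not a reference implementation. The class and function names are hypothetical, and the sketch assumes that bit 0 in the figure is the most significant bit of each octet and that WIDTH and HEIGHT are in network byte order.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PgPicture:
    tid: int            # temporal-layer ID of this picture in the PG
    switch_up: bool     # U bit
    p_diffs: List[int]  # R reference indices, one octet each


@dataclass
class ScalabilityStructure:
    num_spatial_layers: int                       # N_S + 1
    resolutions: Optional[List[Tuple[int, int]]]  # (width, height) per layer if Y = 1
    pictures: List[PgPicture]                     # empty if G = 0 or N_G = 0


def _take(data: bytes, offset: int, count: int) -> bytes:
    """Return count octets starting at offset, or fail on truncated input."""
    if offset + count > len(data):
        raise ValueError("truncated SS data")
    return data[offset:offset + count]


def parse_ss(data: bytes, offset: int = 0) -> Tuple[ScalabilityStructure, int]:
    """Parse SS data starting at offset; return the result and the next offset."""
    first = _take(data, offset, 1)[0]
    offset += 1
    n_s = first >> 5          # top three bits
    y = bool(first & 0x10)    # resolutions present
    g = bool(first & 0x08)    # PG description present

    resolutions = None
    if y:
        resolutions = []
        for _ in range(n_s + 1):
            raw = _take(data, offset, 4)
            offset += 4
            resolutions.append((int.from_bytes(raw[:2], "big"),
                                int.from_bytes(raw[2:], "big")))

    pictures = []
    if g:
        n_g = _take(data, offset, 1)[0]
        offset += 1
        for _ in range(n_g):
            octet = _take(data, offset, 1)[0]
            offset += 1
            r = (octet >> 2) & 0x03             # number of P_DIFF octets that follow
            p_diffs = list(_take(data, offset, r))
            offset += r
            pictures.append(PgPicture(tid=octet >> 5,
                                      switch_up=bool(octet & 0x10),
                                      p_diffs=p_diffs))
    return ScalabilityStructure(n_s + 1, resolutions, pictures), offset
```

The parser raises ValueError rather than reading past the end of the buffer, in line with the robustness concerns for malformed payloads. The reserved bits are ignored on reception.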

4.3. Frame Fragmentation

4.3. フレームの断片化

VP9 frames are fragmented into packets in RTP sequence number order: beginning with a packet with the B bit set and ending with a packet with the E bit set. There is no mechanism for finer-grained access to parts of a VP9 frame.

VP9フレームは、RTPシーケンス番号の順序でパケットに断片化されます。Bビットセットのパケットから始まり、Eビットセットのパケットで終了します。VP9フレームの一部へのより細かい粒度のアクセスのメカニズムはありません。
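
The following sketch shows one way a receiver might group packets into complete frames using the B and E bits. It assumes the first octet of the payload descriptor is laid out as I|P|L|F|B|E|V|Z, as shown in the figures of Section 4.2, so that B is mask 0x08 and E is mask 0x04. The function name and the input representation (pairs of RTP sequence number and first descriptor octet, already sorted in sequence order) are assumptions made for this example.

```python
from typing import Iterable, List, Tuple

B_BIT = 0x08  # start of a frame
E_BIT = 0x04  # end of a frame


def complete_frames(packets: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """Return the sequence numbers of each completely received frame.

    packets yields (rtp_sequence_number, first_descriptor_octet) pairs in
    sequence-number order. A frame is dropped if any of its packets is missing.
    """
    frames: List[List[int]] = []
    current = None  # sequence numbers of the frame being assembled, if any
    for seq, first_octet in packets:
        if first_octet & B_BIT:
            current = [seq]
        elif current is not None and seq == (current[-1] + 1) & 0xFFFF:
            current.append(seq)  # contiguous continuation (16-bit wraparound)
        else:
            current = None       # gap in sequence numbers or missing start
        if first_octet & E_BIT and current is not None:
            frames.append(current)
            current = None
    return frames
```

Because there is no finer-grained access to parts of a frame, the sketch hands only whole frames to the decoder and discards partial ones.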

4.4. Scalable Encoding Considerations

4.4. スケーラブルなエンコードの考慮事項

In addition to the use of reference frames, VP9 has several additional forms of inter-frame dependencies, largely involving probability tables for the entropy and tree encoders. In VP9 syntax, the syntax element "error_resilient_mode" resets this additional inter-frame data, allowing a frame's syntax to be decoded independently.

参照フレームの使用に加えて、VP9には、主にエントロピーおよびツリーエンコーダーの確率表が含まれる、いくつかの追加の形式間依存関係があります。VP9構文では、構文要素「error_resilient_mode」がこの追加のフレーム間データをリセットして、フレームの構文を個別にデコードできるようにします。

Due to the requirements of scalable streams, a VP9 encoder producing a scalable stream needs to ensure that a frame does not depend on a previous frame (of the same or a previous picture) that can legitimately be removed from the stream. Thus, a frame that follows a frame that might be removed (in full decode order) MUST be encoded with "error_resilient_mode" set to true.

スケーラブルストリームの要件により、スケーラブルなストリームを生成するVP9エンコーダーは、フレームがストリームから合法的に削除できる（同じまたは前の画像の）以前のフレームに依存しないことを確認する必要があります。したがって、（完全なデコード順序で）削除される可能性のあるフレームに従うフレームは、「error_resilient_mode」がtrueに設定されてエンコードする必要があります。

For spatially scalable streams, this means that "error_resilient_mode" needs to be turned on for the base spatial layer; however, it can be turned off for higher spatial layers, assuming they are sent with inter-layer dependency (i.e., with the D bit set). For streams that are only temporally scalable without spatial scalability, "error_resilient_mode" can additionally be turned off for any picture that immediately follows a temporal-layer 0 frame.

空間的にスケーラブルなストリームの場合、これはベース空間層に対して「error_resilient_mode」をオンにする必要があることを意味します。ただし、より高い空間層では、それらが層間依存関係（つまり、Dビットセット）で送信されると仮定してオフにすることができます。空間スケーラビリティなしで一時的にスケーラブルなストリームの場合、「error_resilient_mode」は、時間層0フレームの直後の画像に対してさらにオフにできます。

4.5. Example of a VP9 RTP Stream

4.5. VP9 RTPストリームの例

4.5.1. Reference Picture Use for Scalable Structure

4.5.1. スケーラブルな構造の参照画像使用

As discussed in Section 3, the VP9 codec can maintain up to eight reference frames, of which up to three can be referenced or updated by any new frame. This section illustrates one way that a scalable structure (with three spatial layers and three temporal layers) can be constructed using these reference frames.

セクション3で説明したように、VP9コーデックは最大8つの参照フレームを維持でき、そのうち最大3つの参照フレームを新しいフレームで参照または更新できます。このセクションでは、これらの参照フレームを使用してスケーラブルな構造（3つの空間層と3つの時間層を使用）を構築できる1つの方法を示しています。

               +==========+=========+============+=========+
               | Temporal | Spatial | References | Updates |
               +==========+=========+============+=========+
               |    0     |    0    |     0      |    0    |
               +----------+---------+------------+---------+
               |    0     |    1    |    0,1     |    1    |
               +----------+---------+------------+---------+
               |    0     |    2    |    1,2     |    2    |
               +----------+---------+------------+---------+
               |    2     |    0    |     0      |    6    |
               +----------+---------+------------+---------+
               |    2     |    1    |    1,6     |    7    |
               +----------+---------+------------+---------+
               |    2     |    2    |    2,7     |    -    |
               +----------+---------+------------+---------+
               |    1     |    0    |     0      |    3    |
               +----------+---------+------------+---------+
               |    1     |    1    |    1,3     |    4    |
               +----------+---------+------------+---------+
               |    1     |    2    |    2,4     |    5    |
               +----------+---------+------------+---------+
               |    2     |    0    |     3      |    6    |
               +----------+---------+------------+---------+
               |    2     |    1    |    4,6     |    7    |
               +----------+---------+------------+---------+
               |    2     |    2    |    5,7     |    -    |
               +----------+---------+------------+---------+

Table 1: Example Scalability Structure

表1：スケーラビリティ構造の例

This structure is constructed such that the U bit can always be set.

この構造は、Uビットを常に設定できるように構築されています。
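
As an informal consistency check of Table 1, the sketch below replays the table in a loop and verifies that, in steady state, every frame references only buffers last written by a frame of the same or a lower temporal layer and the same or a lower spatial layer. This property is a simplification chosen for this example; it is not the definition of the U bit. The names TABLE_1 and check_layer_safety are hypothetical.

```python
# Rows of Table 1: (temporal layer, spatial layer, referenced buffers, updated buffer)
TABLE_1 = [
    (0, 0, (0,), 0),
    (0, 1, (0, 1), 1),
    (0, 2, (1, 2), 2),
    (2, 0, (0,), 6),
    (2, 1, (1, 6), 7),
    (2, 2, (2, 7), None),
    (1, 0, (0,), 3),
    (1, 1, (1, 3), 4),
    (1, 2, (2, 4), 5),
    (2, 0, (3,), 6),
    (2, 1, (4, 6), 7),
    (2, 2, (5, 7), None),
]


def check_layer_safety(table, passes: int = 2) -> bool:
    """Replay the pattern and check references from the second pass onward.

    The first pass only fills the buffers, since the first picture of a real
    stream is a keyframe and does not use the references listed in the table.
    """
    writer = {}  # buffer index -> (tid, sid) of the frame that last wrote it
    for n in range(passes):
        for tid, sid, refs, update in table:
            if n > 0:
                for slot in refs:
                    w_tid, w_sid = writer[slot]
                    if w_tid > tid or w_sid > sid:
                        return False
            if update is not None:
                writer[update] = (tid, sid)
    return True
```

For the table as given, the check passes, which is consistent with the observation that dropping higher temporal or spatial layers never removes a frame that a retained frame references.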

5. Feedback Messages and Header Extensions

5. フィードバックメッセージとヘッダー拡張機能

5.1. Reference Picture Selection Indication (RPSI)

5.1. 参照画像選択表示（RPSI）

The RPSI is a payload-specific feedback message defined within the RTCP-based feedback format. The RPSI message is generated by a receiver and can be used in two ways: either it can signal a preferred reference picture when a loss has been detected by the decoder (preferably a reference that the decoder knows is perfect) or it can be used as positive feedback information to acknowledge correct decoding of certain reference pictures. The positive feedback method is useful for VP9 used for point-to-point (unicast) communication. The use of RPSI for VP9 is preferably combined with a special update pattern of the codec's two special reference frames -- the golden frame and the altref frame -- in which they are updated in an alternating leapfrog fashion. When a receiver has received and correctly decoded a golden or altref frame, and that frame had a Picture ID in the payload descriptor, the receiver can acknowledge this simply by sending an RPSI message back to the sender. The message body (i.e., the "native RPSI bit string" in [RFC4585]) is simply the (7- or 15-bit) Picture ID of the received frame.

RPSIは、RTCPベースのフィードバック形式内で定義されたペイロード固有のフィードバックメッセージです。RPSIメッセージはレシーバーによって生成され、2つの方法で使用できます。デコーダーによって損失が検出された場合（できればデコーダーが知っている参照が完全です）、特定の参照写真の正しいデコードを認める肯定的なフィードバック情報として使用できます。肯定的なフィードバック方法は、ポイントツーポイント（ユニキャスト）通信に使用されるVP9に役立ちます。VP9のRPSIの使用は、コーデックの2つの特別な参照フレーム（ゴールデンフレームとアルトレフフレーム）の特別な更新パターンと組み合わせることができます。レシーバーがゴールデンまたはアルトレフフレームを受信して正しくデコードし、そのフレームがペイロード記述子に画像IDを持っていた場合、レシーバーはRPSIメッセージを送信者に送信するだけでこれを確認できます。メッセージ本文（つまり、[RFC4585]の「ネイティブRPSIビット文字列」）は、受信したフレームの（7または15ビット）画像IDです。

Note: because all frames of the same picture must have the same inter-picture reference structure, there is no need for a message to specify which frame is being selected.

注：同じ画像のすべてのフレームには、同じ絵画間参照構造が必要なため、選択されているフレームを指定するメッセージは必要ありません。
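
For illustration, the sketch below builds the feedback control information of an RPSI message whose native bit string is the Picture ID. It follows the RPSI layout of [RFC4585] as understood here: an 8-bit count of padding bits (PB), a zero bit plus the 7-bit payload type, the native bit string, and zero padding up to a 32-bit boundary. Placing the Picture ID immediately after the two header octets, most significant bit first, is an assumption of this sketch, as is the function name.

```python
def build_rpsi_fci(payload_type: int, picture_id: int, pid_bits: int) -> bytes:
    """Return RPSI FCI octets carrying a 7- or 15-bit Picture ID (hypothetical helper)."""
    if pid_bits not in (7, 15):
        raise ValueError("Picture ID is either 7 or 15 bits wide")
    if not 0 <= picture_id < (1 << pid_bits):
        raise ValueError("Picture ID does not fit in the PID field")
    if not 0 <= payload_type < 128:
        raise ValueError("payload type is a 7-bit value")
    used_bits = 16 + pid_bits          # PB octet + payload type octet + bit string
    pad_bits = (-used_bits) % 32       # padding needed to reach a 32-bit boundary
    # Shift the Picture ID left so that the padding bits follow it as zeros.
    body = picture_id << pad_bits
    body_len = (pid_bits + pad_bits) // 8
    return bytes([pad_bits, payload_type]) + body.to_bytes(body_len, "big")
```

With payload type 98, a 7-bit Picture ID of 109 needs nine padding bits and a 15-bit Picture ID needs one; in both cases the FCI is four octets long.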

5.2. Full Intra Request (FIR)

5.2. フルイントラリクエスト（FIR）

The Full Intra Request (FIR) [RFC5104] RTCP feedback message allows a receiver to request a full state refresh of an encoded stream.

フルイントラリクエスト（FIR）[RFC5104] RTCPフィードバックメッセージにより、レシーバーはエンコードされたストリームの完全な状態リフレッシュを要求できます。

Upon receipt of a FIR request, a VP9 sender MUST send a picture with a keyframe for its spatial-layer 0 layer frame and then send frames without inter-picture prediction (P=0) for any higher-layer frames.

FIRリクエストを受信すると、VP9送信者は、空間層0レイヤーフレームのキーフレームを備えた画像を送信し、高レイヤーフレームのためにピクチャー間予測なし（p = 0）なしでフレームを送信する必要があります。

5.3. Layer Refresh Request (LRR)

5.3. レイヤーリフレッシュリクエスト（LRR）

The Layer Refresh Request (LRR) [RFC9627] allows a receiver to request a single layer of a spatially or temporally encoded stream to be refreshed without necessarily affecting the stream's other layers.

レイヤーリフレッシュリクエスト（LRR）[RFC9627]により、レシーバーは、ストリームの他のレイヤーに必ずしも影響を与えることなく、空間的または一時的にエンコードされたストリームの単一層をリフレッシュするようにすることができます。

               +---------------+---------------+
               |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
               +---------------+---------+-----+
               |   RES   | TID | RES     | SID |
               +---------------+---------+-----+

Figure 5: LRR Index Format

図5：LRRインデックス形式

Figure 5 shows the format of an LRR's layer index fields for VP9 streams. The two "RES" fields MUST be set to zero on transmission and ignored on reception. See Section 4.2 for details on the TID and SID fields.

図5は、VP9ストリームのLRRのレイヤーインデックスフィールドの形式を示しています。2つの「RES」フィールドは、送信時にゼロに設定し、受信で無視する必要があります。TIDおよびSIDフィールドの詳細については、セクション4.2を参照してください。
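
The 16-bit layout in Figure 5 can be packed and unpacked as sketched below. The sketch assumes that bit 0 in the figure is the most significant bit of each octet, so that TID and SID occupy the three least significant bits of the first and second octet, respectively. The function names are hypothetical.

```python
from typing import Tuple


def pack_lrr_layer_index(tid: int, sid: int) -> bytes:
    """Encode a TID/SID pair; both RES fields are set to zero on transmission."""
    if not 0 <= tid <= 7 or not 0 <= sid <= 7:
        raise ValueError("TID and SID are 3-bit values")
    return bytes([tid, sid])


def unpack_lrr_layer_index(data: bytes) -> Tuple[int, int]:
    """Decode a TID/SID pair; the RES fields are ignored on reception."""
    if len(data) < 2:
        raise ValueError("layer index needs two octets")
    return data[0] & 0x07, data[1] & 0x07
```

Masking on reception means that a layer index with nonzero reserved bits still decodes to the same TID and SID.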

Identification of a layer refresh frame can be derived from the reference IDs of each frame by backtracking the dependency chain until reaching a point where only decodable frames are being referenced. Therefore, it's recommended for both the flexible and the non-flexible mode that, when switching up points are being encoded in response to an LRR, those packets contain layer indices and the reference field or fields so that the decoder or selective forwarding middleboxes [RFC7667] can make this derivation.

レイヤーリフレッシュフレームの識別は、デコード可能なフレームのみが参照されるポイントに到達するまで、依存関係チェーンをバックトラックすることにより、各フレームの参照IDから導出できます。したがって、柔軟なモードと柔軟性のないモードの両方に推奨されます。これにより、ポイントを切り替えるとLRRに応じてエンコードされている場合、これらのパケットにはレイヤーインデックスと参照フィールドまたはフィールドが含まれているため、デコーダーまたは選択的な転送ミドルボックス[RFC7667]がこの派生を行うことができます。

Example:

例：

LRR {1,0}, {2,1} is sent by a Multipoint Control Unit (MCU) when it is currently relaying {1,0} to a receiver that wants to upgrade to {2,1}. In response, the encoder should encode the next frames in layers {1,1} and {2,1} by only referring to frames in {1,0} or {0,0}.

LRR {1,0}、{2,1}は、現在{1,0}を{2,1}にアップグレードしたいレシーバーに{1,0}を中継しているときに、マルチポイントコントロールユニット（MCU）によって送信されます。これに応じて、エンコーダーは、{1,0}または{0,0}のフレームのみを参照することにより、次のフレームをレイヤー{1,1}および{2,1}でエンコードする必要があります。

In the non-flexible mode, periodic upgrade frames can be defined by the layer structure of the SS; thus, periodic upgrade frames can be automatically identified by the Picture ID.

柔軟性のないモードでは、定期的なアップグレードフレームは、SSのレイヤー構造によって定義できます。したがって、定期的なアップグレードフレームは、画像IDによって自動的に識別できます。

6. Payload Format Parameters

6. ペイロードフォーマットパラメーター

This payload format has three optional parameters: max-fr, max-fs, and profile-id.

このペイロード形式には、MAX-FR、MAX-FS、およびProfile-IDの3つのオプションパラメーターがあります。

The max-fr and max-fs parameters are used to signal the capabilities of a receiver implementation. If the implementation is willing to receive media, both parameters MUST be provided. These parameters MUST NOT be used for any other purpose. A media sender SHOULD NOT send media with a frame rate or frame size exceeding the max-fr and max-fs values signaled. (There may be scenarios, such as pre-encoded media or selective forwarding middleboxes [RFC7667], where a media sender does not have media available that fits within a receiver's max-fs and max-fr values; in such scenarios, a sender MAY exceed the signaled values.)

MAX-FRおよびMAX-FSパラメーターは、受信機の実装の機能を信号するために使用されます。実装がメディアを受信する意思がある場合、両方のパラメーターを提供する必要があります。これらのパラメーターは、他の目的に使用してはなりません。メディア送信者は、メディアをフレームレートまたはフレームサイズをMAX-FRおよびMAX-FS値を超えて送信してはなりません。（事前エンコードされたメディアや選択的転送ミドルボックス[RFC7667]などのシナリオがあります。メディア送信者には、受信者のMAX-FSおよびMAX-FR値に適合するメディアが利用できません。このようなシナリオでは、送信者は信号値を超えている可能性があります。）

max-fr:

MAX-FR：

The value of max-fr is an integer indicating the maximum frame rate in units of frames per second that the decoder is capable of decoding.

Max-FRの値は、デコーダーがデコードできることを1秒あたりのフレーム単位単位の最大フレームレートを示す整数です。

max-fs:

MAX-FS：

The value of max-fs is an integer indicating the maximum frame size in units of macroblocks that the decoder is capable of decoding.

Max-FSの値は、デコーダーがデコードできるマクロブロックの単位の最大フレームサイズを示す整数です。

The decoder is capable of decoding this frame size as long as the width and height of the frame in macroblocks are each less than int(sqrt(max-fs * 8)); for instance, a max-fs of 1200 (capable of supporting 640x480 resolution) will support widths and heights up to 1552 pixels (97 macroblocks).

デコーダーは、マクロブロックのフレームの幅と高さがそれぞれint（sqrt（max-fs * 8））未満である限り、このフレームサイズをデコードできます。たとえば、1200の最大（640x480の解像度をサポートできる）は、最大1552ピクセル（97マクロブロック）までの幅と高さをサポートします。
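
The arithmetic in the example above can be reproduced with the following sketch. It assumes macroblocks of 16x16 pixels, which is consistent with the figures in the text (1552 / 97 = 16), and the function names are hypothetical. Note that the text says "less than" while its own example treats 97 macroblocks as supported; the sketch only computes the bound and leaves that comparison to the caller.

```python
import math

MACROBLOCK_SIZE = 16  # pixels per macroblock side (assumption, see lead-in)


def frame_size_in_macroblocks(width_px: int, height_px: int) -> int:
    """Return the frame size in macroblocks, rounding partial macroblocks up."""
    mb_wide = -(-width_px // MACROBLOCK_SIZE)   # ceiling division
    mb_high = -(-height_px // MACROBLOCK_SIZE)
    return mb_wide * mb_high


def dimension_bound_in_macroblocks(max_fs: int) -> int:
    """Return int(sqrt(max-fs * 8)), the per-dimension bound from the text."""
    return math.isqrt(max_fs * 8)
```

A 640x480 frame is 40 x 30 = 1200 macroblocks, and for max-fs=1200 the bound is int(sqrt(9600)) = 97 macroblocks, or 1552 pixels.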

profile-id:

Profile-ID：

The value of profile-id is an integer indicating the default coding profile (the subset of coding tools that may have been used to generate the stream or that the receiver supports). Table 2 lists all of the profiles defined in Section 7.2 of [VP9-BITSTREAM] and the corresponding integer values to be used.

Profile-IDの値は、デフォルトのコーディングプロファイル（ストリームの生成に使用された可能性のあるコーディングツールのサブセットまたは受信機がサポートする可能性のあるコーディングツールのサブセット）を示す整数です。表2に、[VP9ビットストリーム]のセクション7.2で定義されているすべてのプロファイルと、使用する対応する整数値を示します。

If no profile-id is present, Profile 0 MUST be inferred. (The profile-id parameter was added relatively late in the development of this specification, so some existing implementations may not send it.)

プロファイルIDが存在しない場合、プロファイル0を推測する必要があります。（この仕様の開発により、プロファイル-IDパラメーターが比較的遅く追加されたため、一部の既存の実装は送信されない場合があります。）

Informative note: See Table 3 for capabilities of coding profiles defined in Section 7.2 of [VP9-BITSTREAM].

有益な注意：[VP9ビットストリーム]のセクション7.2で定義されているコーディングプロファイルの機能については、表3を参照してください。

A receiver MUST ignore any parameter unspecified in this specification.

受信者は、この仕様で特定されていないパラメーターを無視する必要があります。

                          +=========+============+
                          | Profile | profile-id |
                          +=========+============+
                          |    0    |     0      |
                          +---------+------------+
                          |    1    |     1      |
                          +---------+------------+
                          |    2    |     2      |
                          +---------+------------+
                          |    3    |     3      |
                          +---------+------------+

Table 2: Correspondence between profile-id and VP9 Profile Integer

表2：VP9プロファイル整数へのプロファイルID間の対応

   +=========+===========+=================+==========================+
   | Profile | Bit Depth | SRGB Colorspace |    Chroma Subsampling    |
   +=========+===========+=================+==========================+
   |    0    |     8     |        No       |        YUV 4:2:0         |
   +---------+-----------+-----------------+--------------------------+
   |    1    |     8     |       Yes       | YUV 4:2:2,4:4:0 or 4:4:4 |
   +---------+-----------+-----------------+--------------------------+
   |    2    |  10 or 12 |        No       |        YUV 4:2:0         |
   +---------+-----------+-----------------+--------------------------+
   |    3    |  10 or 12 |       Yes       | YUV 4:2:2,4:4:0 or 4:4:4 |
   +---------+-----------+-----------------+--------------------------+

Table 3: Profile Capabilities

表3：プロファイル機能

Note: SRGB (often sRGB) = Standard Red-Green-Blue

注：SRGB（多くの場合sRGBと表記）= 標準の赤・緑・青（Standard Red-Green-Blue）

6.1. SDP Parameters

6.1. SDPパラメーター

6.1.1. Mapping of Media Subtype Parameters to SDP

6.1.1. SDPへのメディアサブタイプパラメーターのマッピング

The media type video/vp9 string is mapped to fields in the Session Description Protocol (SDP) [RFC8866] as follows:

メディアタイプのビデオ/VP9文字列は、次のようにセッション説明プロトコル（SDP）[RFC8866]のフィールドにマッピングされます。

* The media name in the "m=" line of SDP MUST be video.

* SDPの「m =」行のメディア名はビデオでなければなりません。

* The encoding name in the "a=rtpmap" line of SDP MUST be VP9 (the media subtype).

* SDPの「a = rtpmap」行のエンコーディング名は、VP9（メディアサブタイプ）でなければなりません。

* The clock rate in the "a=rtpmap" line MUST be 90000.

* 「a = rtpmap」行のクロックレートは90000でなければなりません。

* The parameters max-fr and max-fs MUST be included in the "a=fmtp" line of SDP if the receiver wishes to declare its receiver capabilities. These parameters are expressed as a media subtype string in the form of a semicolon-separated list of parameter=value pairs.

* 受信者が受信機の機能を宣言したい場合は、MAX-FRとMAX-FSをSDPの「A = FMTP」ラインに含める必要があります。これらのパラメーターは、パラメーター=値ペアのセミコロン分離リストの形式のメディアサブタイプ文字列として表されます。

* The OPTIONAL parameter profile-id, when present, SHOULD be included in the "a=fmtp" line of SDP. This parameter is expressed as a media subtype string in the form of a parameter=value pair. When the parameter is not present, a value of 0 MUST be inferred for profile-id.

* オプションのパラメータープロファイルIDは、存在する場合、SDPの「a = fmtp」行に含める必要があります。このパラメーターは、パラメーター=値ペアの形式のメディアサブタイプ文字列として表されます。パラメーターが存在しない場合、プロファイルIDに対して0の値を推測する必要があります。

6.1.1.1. Example

6.1.1.1. 例

An example of media representation in SDP is as follows:

SDPのメディア表現の例は次のとおりです。

   m=video 49170 RTP/AVPF 98
   a=rtpmap:98 VP9/90000
   a=fmtp:98 max-fr=30;max-fs=3600;profile-id=0
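
A receiver of such an SDP description might extract the parameters as sketched below. The function name parse_vp9_fmtp and the shape of the returned dictionary are assumptions made for this example. The sketch applies two rules from Section 6: unknown parameters are ignored, and profile-id defaults to 0 when absent.

```python
KNOWN_PARAMETERS = ("max-fr", "max-fs", "profile-id")


def parse_vp9_fmtp(line: str) -> dict:
    """Parse an "a=fmtp" line for VP9 into a dictionary (hypothetical helper)."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    pt_text, _, params_text = line[len(prefix):].partition(" ")
    result = {"payload_type": int(pt_text), "profile-id": 0}
    for item in params_text.split(";"):
        name, sep, value = item.strip().partition("=")
        name = name.strip().lower()
        if sep and name in KNOWN_PARAMETERS:
            result[name] = int(value)
        # Parameters not defined by this specification are ignored.
    return result
```

For the example above, the result contains payload type 98, max-fr 30, max-fs 3600, and profile-id 0.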

6.1.2. Offer/Answer Considerations

6.1.2. オファー/アンサーに関する考慮事項

When VP9 is offered over RTP using SDP in an Offer/Answer model [RFC3264] for negotiation for unicast usage, the following limitations and rules apply:

ユニキャスト使用に関する交渉のために、オファー/回答モデル[RFC3264]でSDPを使用してRTPを介してVP9を提供する場合、次の制限とルールが適用されます。

* The parameter identifying a media format configuration for VP9 is profile-id. This media format configuration parameter MUST be used symmetrically; that is, the answerer MUST either maintain this configuration parameter or remove the media format (payload type) completely if it is not supported.

* VP9のメディア形式構成を識別するパラメーターはProfile-IDです。このメディア形式の構成パラメーターは、対称的に使用する必要があります。つまり、応答者は、この構成パラメーターを維持するか、サポートされていない場合はメディア形式（ペイロードタイプ）を完全に削除する必要があります。

* The max-fr and max-fs parameters are used declaratively to describe receiver capabilities, even in the Offer/Answer model. The values in an answer are used to describe the answerer's capabilities; thus, their values are set independently of the values in the offer.

* MAX-FRおよびMAX-FSパラメーターは、オファー/回答モデルであっても、受信機の機能を記述するために宣言的に使用されます。回答の値は、回答者の能力を説明するために使用されます。したがって、それらの価値は、オファーの価値とは独立して設定されます。

* To simplify the handling and matching of these configurations, the same RTP payload type number used in the offer SHOULD also be used in the answer and in a subsequent offer, as specified in [RFC3264]. An answer or subsequent offer MUST NOT contain the payload type number used in the offer unless the profile-id value is exactly the same as in the original offer. However, max-fr and max-fs parameters MAY be changed in subsequent offers and answers, with the same payload type number, if an endpoint wishes to change its declared receiver capabilities.

* これらの構成の取り扱いと一致を簡素化するには、[RFC3264]で指定されているように、オファーで使用される同じRTPペイロードタイプ番号も回答とその後のオファーで使用する必要があります。回答または後続のオファーには、プロファイルID値が元のオファーとまったく同じでない限り、オファーで使用されるペイロードタイプ番号を含めてはなりません。ただし、エンドポイントが宣言されたレシーバー機能を変更したい場合、同じペイロードタイプ番号で、その後のオファーと回答でMAX-FRおよびMAX-FSパラメーターが変更される場合があります。
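
The rules above can be summarized in a small sketch of an answerer's decision. The function name, the dictionary representation of an offered media format, and the parameter names are all assumptions made for this example; a real implementation would operate on full SDP.

```python
from typing import Optional, Set


def answer_for_offer(offered: dict, supported_profiles: Set[int],
                     own_max_fr: int, own_max_fs: int) -> Optional[dict]:
    """Return the answerer's media format for one offered VP9 format, or None.

    offered holds "payload_type" and optionally "profile-id" (inferred as 0
    when absent). None means the payload type is removed from the answer.
    """
    profile = offered.get("profile-id", 0)
    if profile not in supported_profiles:
        # profile-id is used symmetrically: keep it unchanged or drop the format.
        return None
    return {
        "payload_type": offered["payload_type"],  # same number as in the offer
        "profile-id": profile,                    # unchanged from the offer
        "max-fr": own_max_fr,                     # answerer's own capabilities,
        "max-fs": own_max_fs,                     # independent of the offer
    }
```

Note that max-fr and max-fs in the offer are deliberately not consulted when building the answer, since they describe the offerer's receive capabilities rather than a negotiated value.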

7. Media Type Definition

7. メディアタイプの定義

This registration uses the template defined in [RFC6838] and follows [RFC4855].

この登録は、[RFC6838]および[RFC4855]で定義されたテンプレートを使用します。

Type name:

タイプ名：

video

ビデオ

Subtype name:

サブタイプ名：

VP9

Required parameters:

必要なパラメーター：

N/A

n/a

Optional parameters:

オプションのパラメーター：

There are three optional parameters: max-fr, max-fs, and profile-id. See Section 6 for their definition.

MAX-FR、MAX-FS、およびProfile-IDの3つのオプションパラメーターがあります。その定義については、セクション6を参照してください。

Encoding considerations:

考慮事項のエンコード：

This media type is framed in RTP and contains binary data; see Section 4.8 of [RFC6838].

このメディアタイプはRTPに囲まれており、バイナリデータが含まれています。[RFC6838]のセクション4.8を参照してください。

Security considerations:

セキュリティ上の考慮事項：

See Section 8 of RFC 9628.

RFC 9628のセクション8を参照してください。

Interoperability considerations:

相互運用性の考慮事項：

None

なし

Published specification:

公開された仕様：

VP9 bitstream format [VP9-BITSTREAM] and RFC 9628.

VP9ビットストリーム形式[VP9ビットストリーム]およびRFC 9628。

Applications that use this media type:

このメディアタイプを使用するアプリケーション：

For example, video over IP, video conferencing.

たとえば、ビデオオーバーIP、ビデオ会議。

Fragment identifier considerations:

フラグメント識別子の考慮事項：

N/A

n/a

Additional information:

追加情報：

None

なし

Person & email address to contact for further information:

詳細については、連絡先への個人およびメールアドレス：

Jonathan Lennox <jonathan.lennox@8x8.com>

Intended usage:

意図された使用法：

COMMON

一般

Restrictions on usage:

使用に関する制限：

This media type depends on RTP framing; hence, it is only defined for transfer via RTP [RFC3550].

このメディアタイプは、RTPフレーミングに依存します。したがって、RTP [RFC3550]を介した転送のみで定義されます。

Author:

著者：

Jonathan Lennox <jonathan.lennox@8x8.com>

Change controller:

Change Controller：

IETF AVTCore Working Group delegated from the IETF.

IETF AVTCOREワーキンググループは、IETFから委任されました。

8. Security Considerations

8. セキュリティに関する考慮事項

RTP packets using the payload format defined in this specification are subject to the security considerations discussed in the RTP specification [RFC3550], and in any applicable RTP profile such as RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or RTP/SAVPF [RFC5124]. However, as "Securing the RTP Framework: Why RTP Does Not Mandate a Single Media Security Solution" [RFC7202] discusses, it is not an RTP payload format's responsibility to discuss or mandate what solutions are used to meet the basic security goals like confidentiality, integrity, and source authenticity for RTP in general. This responsibility lies with anyone using RTP in an application. They can find guidance on available security mechanisms in "Options for Securing RTP Sessions" [RFC7201]. Applications SHOULD use one or more appropriate strong security mechanisms.

この仕様で定義されているペイロード形式を使用するRTPパケットは、RTP仕様[RFC3550]、およびRTP/AVP [RFC3551]、RTP/AVPF [RFC4585]、RTP/SAVP [RFC3711]、RTP/SAVPF [RFC5124]などの該当するRTPプロファイルで説明されているセキュリティ上の考慮事項の対象となります。ただし、「Securing the RTP Framework: Why RTP Does Not Mandate a Single Media Security Solution」[RFC7202]で説明されているように、RTP全般の機密性、整合性、送信元の真正性といった基本的なセキュリティ目標を満たすためにどのソリューションを使用するかを議論または義務付けることは、RTPペイロード形式の責任ではありません。この責任は、アプリケーションでRTPを使用する者にあります。利用可能なセキュリティメカニズムに関するガイダンスは、「Options for Securing RTP Sessions」[RFC7201]にあります。アプリケーションは、1つ以上の適切な強力なセキュリティメカニズムを使用する必要があります。

Implementations of this RTP payload format need to take appropriate security considerations into account. It is extremely important for the decoder to be robust against malicious or malformed payloads and ensure that they do not cause the decoder to overrun its allocated memory or otherwise misbehave. An overrun in allocated memory could lead to arbitrary code execution by an attacker. The same applies to the encoder, even though problems in encoders are (typically) rarer.

このRTPペイロード形式の実装は、適切なセキュリティに関する考慮事項を考慮に入れる必要があります。デコーダーが悪意のあるまたは奇形のペイロードに対して堅牢であることが非常に重要であり、デコーダーが割り当てられたメモリをオーバーランしたり、その他の誤ったりしないようにします。割り当てられたメモリのオーバーランは、攻撃者による任意のコード実行につながる可能性があります。エンコーダーの問題は（通常）よりまれである場合でも、エンコーダーにも同じことが当てはまります。

This RTP payload format and its media decoder do not exhibit any significant non-uniformity in the receiver-side computational complexity for packet processing; thus, they are unlikely to pose a denial-of-service threat due to the receipt of pathological data. Nor does the RTP payload format contain any active content.

このRTPペイロード形式とそのメディアデコーダーは、パケット処理のために受信機側の計算の複雑さに有意な不均一性を示さない。したがって、病理学的データの受領により、サービス拒否の脅威をもたらす可能性は低いです。また、RTPペイロード形式にはアクティブなコンテンツが含まれていません。

9. Congestion Control

9. 混雑制御

Congestion control for RTP SHALL be used in accordance with [RFC3550], and with any applicable RTP profile, e.g., [RFC3551]. The congestion control mechanism can, in a real-time encoding scenario, adapt the transmission rate by instructing the encoder to encode at a certain target rate. Media-aware network elements MAY use the information in the VP9 payload descriptor in Section 4.2 to identify non-reference frames and discard them in order to reduce network congestion. Note that discarding of non-reference frames cannot be done if the stream is encrypted (because the non-reference marker is encrypted).

RTPの混雑制御は、[RFC3550]に従って、および該当するRTPプロファイルなど、[RFC3551]に従って使用するものとします。輻輳制御メカニズムは、リアルタイムエンコードシナリオでは、エンコーダに特定の目標レートでエンコードするよう指示することにより、伝送速度を適応させることができます。メディア認識ネットワーク要素は、セクション4.2のVP9ペイロード記述子の情報を使用して、非参照フレームを識別し、ネットワークの混雑を減らすためにそれらを破棄することができます。ストリームが暗号化されている場合（非参照マーカーが暗号化されているため）、非参照フレームの破棄は実行できないことに注意してください。

10. IANA Considerations

10. IANAの考慮事項

IANA has registered the media type "video/vp9" as specified in Section 7. The media type has also been added to the "RTP Payload Format Media Types" registry of the "Real-Time Transport Protocol (RTP) Parameters" registry group (https://www.iana.org/assignments/rtp-parameters) as follows.

IANAは、セクション7で指定されているように、メディアタイプ「video/vp9」を登録しています。このメディアタイプは、「Real-Time Transport Protocol (RTP) Parameters」レジストリグループ（https://www.iana.org/assignments/rtp-parameters）の「RTP Payload Format Media Types」レジストリにも、次のように追加されています。

Media Type:

メディアタイプ：