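[Summary] RFC 9317 provides an overview of the networking and transport protocol issues that matter when delivering streaming video and other high-bitrate media over the Internet, as they relate to quality of experience (QoE). The document highlights the differences between common assumptions in existing networking practices and the media delivery issues observed when streaming media, and it explains characteristics that surprise network designers and transport experts who lack specific media expertise.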

Operational Considerations for Streaming Media




This document provides an overview of operational networking and transport protocol issues that pertain to the quality of experience (QoE) when streaming video and other high-bitrate media over the Internet.


This document explains the characteristics of streaming media delivery that have surprised network designers or transport experts who lack specific media expertise, since streaming media highlights key differences between common assumptions in existing networking practices and observations of media delivery issues encountered when streaming media over those existing networks.


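This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are candidates for any level of Internet Standard; see Section 2 of RFC 7841.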

Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.

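This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info). Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.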

1. Introduction

This document provides an overview of operational networking and transport protocol issues that pertain to the quality of experience (QoE) when streaming video and other high-bitrate media over the Internet.


This document is intended to explain the characteristics of streaming media delivery that have surprised network designers or transport experts who lack specific media expertise, since streaming media highlights key differences between common assumptions in existing networking practices and observations of media delivery issues encountered when streaming media over those existing networks.


1.1. Key Definitions

This document defines "high-bitrate streaming media over the Internet" as follows:


* "High-bitrate" is a context-sensitive term broadly intended to capture rates that can be sustained over some but not all of the target audience's network connections. A snapshot of values commonly qualifying as high-bitrate on today's Internet is given by the higher-value entries in Section 3.1.1.


* "Streaming" means the continuous transmission of media segments from a server to a client and its simultaneous consumption by the client.


- The term "simultaneous" is critical, as media segment transmission is not considered "streaming" if one downloads a media file and plays it after the download is completed. Instead, this would be called "download and play".


- This has two implications. First, the sending rate for media segments must match the client's consumption rate (whether loosely or tightly) to provide uninterrupted playback. That is, the client must not run out of media segments (buffer underrun) and must not accept more media segments than it can buffer before playback (buffer overrun).


- Second, the client's media segment consumption rate is limited not only by the path's available bandwidth but also by media segment availability. The client cannot fetch media segments that a media server cannot provide (yet).


* "Media" refers to any type of media and associated streams, such as video, audio, metadata, etc.


* "Over the Internet" means that a single operator does not have control of the entire path between media servers and media clients, so it is not a "walled garden".

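The buffer underrun and buffer overrun conditions in the definition of "streaming" above can be illustrated with a minimal sketch. The function name, the per-tick arrival trace, and the unit of "segments per tick" are assumptions made for illustration only and do not come from this document.

```python
def simulate_buffer(arrivals, capacity, consume_per_tick=1):
    """Track client buffer occupancy (in media segments) tick by tick.

    arrivals: segments delivered in each tick (hypothetical input trace).
    capacity: maximum number of segments the client can buffer.
    Returns a list of (tick, event), where event is "underrun" or "overrun".
    """
    buffered = 0
    events = []
    for tick, arrived in enumerate(arrivals):
        buffered += arrived
        if buffered > capacity:
            # More segments arrived than the client can hold before playback.
            events.append((tick, "overrun"))
            buffered = capacity
        if buffered < consume_per_tick:
            # Nothing to play: playback stalls for this tick.
            events.append((tick, "underrun"))
        else:
            buffered -= consume_per_tick
    return events
```

With a steady trace such as `[1, 1, 1, 1]` and a capacity of 3, no events are reported; a bursty trace such as `[0, 5, 0, 0, 0, 0]` produces both kinds of event, which is why the sending rate has to track the consumption rate at least loosely.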

This document uses these terms to describe the streaming media ecosystem:


Streaming Media Operator: an entity that provides streaming media servers


Media Server: a server that provides streaming media to a media player, which is also referred to as a streaming media server, or simply a server


Intermediary: an entity that is on-path, between the streaming media operator and the ultimate media consumer, and that is media aware


When the streaming media is encrypted, an intermediary must have credentials that allow the intermediary to decrypt the media in order to be media aware.


An intermediary can be one of many specialized subtypes that meet this definition.


Media Player: an endpoint that requests streaming media from a media server for an ultimate media consumer, which is also referred to as a streaming media client, or simply a client


Ultimate Media Consumer: a human or machine using a media player


1.2. Document Scope

A full review of all streaming media considerations for all types of media over all types of network paths is too broad a topic to cover comprehensively in a single document.


This document focuses chiefly on the large-scale delivery of streaming high-bitrate media to end users. It is primarily intended for those controlling endpoints involved in delivering streaming media traffic. This can include origin servers publishing content, intermediaries like content delivery networks (CDNs), and providers for client devices and media players.


Most of the considerations covered in this document apply to both "live media" (created and streamed as an event is in progress) and "media on demand" (previously recorded media that is streamed from storage), except where noted.


Most of the considerations covered in this document apply to both media that is consumed by a media player, for viewing by a human, and media that is consumed by a machine, such as a media recorder that is executing an adaptive bitrate (ABR) streaming algorithm, except where noted.


This document contains


* a short description of streaming video characteristics in Section 2 to set the stage for the rest of the document,


* general guidance on bandwidth provisioning (Section 3) and latency considerations (Section 4) for streaming media delivery,


* a description of adaptive encoding and adaptive delivery techniques in common use for streaming video, along with a description of the challenges media senders face in detecting the bitrate available between the media sender and media receiver, and a collection of measurements by a third party for use in analytics (Section 5),


* a description of existing transport protocols used for media streaming and the issues encountered when using those protocols, along with a description of the QUIC transport protocol [RFC9000] more recently used for streaming media (Section 6),


* a description of implications when streaming encrypted media (Section 7), and


* a pointer to additional resources for further reading on this rapidly changing subject (Section 8).


Topics outside this scope include the following:


* an in-depth examination of real-time, two-way interactive media, such as videoconferencing; although this document touches lightly on topics related to this space, the intent is to let readers know that for more in-depth coverage they should look to other documents, since the techniques and issues for interactive real-time, two-way media differ so dramatically from those in large-scale, one-way delivery of streaming media.


* specific recommendations on operational practices to mitigate issues described in this document; although some known mitigations are mentioned in passing, the primary intent is to provide a point of reference for future solution proposals to describe how new technologies address or avoid existing problems.


* generalized network performance techniques; while considerations, such as data center design, transit network design, and "walled garden" optimizations, can be crucial components of a performant streaming media service, these are considered independent topics that are better addressed by other documents.


* transparent tunnels; while tunnels can have an impact on streaming media via issues like the round-trip time and the maximum transmission unit (MTU) of packets carried over tunnels, for the purposes of this document, these issues are considered as part of the set of network path properties.


Questions about whether this document also covers "Web Real-Time Communication (WebRTC)" have come up often. It does not. WebRTC's principal media transport protocol [RFC8834] [RFC8835], the Real-time Transport Protocol (RTP), is mentioned in this document. However, as noted in Section 2, it is difficult to give general guidance for unreliable media transport protocols used to carry interactive real-time media.


2. Our Focus on Streaming Video

As the Internet has grown, an increasingly large share of the traffic delivered to end users has become video. The most recent available estimates found that 75% of the total traffic to end users was video in 2019 (as described in [RFC8404], such traffic surveys have since become impossible to conduct due to ubiquitous encryption). At that time, the share of video traffic had been growing for years and was projected to continue growing (Appendix D of [CVNI]).


A substantial part of this growth is due to the increased use of streaming video. However, video traffic in real-time communications (for example, online videoconferencing) has also grown significantly. While both streaming video and videoconferencing have real-time delivery and latency requirements, these requirements vary from one application to another. For additional discussion of latency requirements, see Section 4.


In many contexts, media traffic can be handled transparently as generic application-level traffic. However, as the volume of media traffic continues to grow, it is becoming increasingly important to consider the effects of network design decisions on application-level performance, with considerations for the impact on media delivery.


Much of the focus of this document is on media streaming over HTTP. HTTP is widely used for media streaming because


* support for HTTP is widely available in a wide range of operating systems,


* HTTP is also used in a wide variety of other applications,


* HTTP has been demonstrated to provide acceptable performance over the open Internet,


* HTTP includes state-of-the-art standardized security mechanisms, and


* HTTP can use already-deployed caching infrastructure, such as CDNs, local proxies, and browser caches.


Various HTTP versions have been used for media delivery. HTTP/1.0, HTTP/1.1, and HTTP/2 are carried over TCP [RFC9293], and TCP's transport behavior is described in Section 6.1. HTTP/3 is carried over QUIC, and QUIC's transport behavior is described in Section 6.3.


Unreliable media delivery using RTP and other UDP-based protocols is also discussed in Sections 4.1, 6.2, and 7.2, but it is difficult to give general guidance for these applications. For instance, when packet loss occurs, the most appropriate response may depend on the type of codec being used.


3. Bandwidth Provisioning
3.1. Scaling Requirements for Media Delivery
3.1.1. Video Bitrates

Video bitrate selection depends on many variables including the resolution (height and width), frame rate, color depth, codec, encoding parameters, scene complexity, and amount of motion. Generally speaking, as the resolution, frame rate, color depth, scene complexity, and amount of motion increase, the encoding bitrate increases. As newer codecs with better compression tools are used, the encoding bitrate decreases. Similarly, a multi-pass encoding generally produces better quality output compared to single-pass encoding at the same bitrate or delivers the same quality at a lower bitrate.


Here are a few common resolutions used for video content, with typical ranges of bitrates for the two most popular video codecs [Encodings].


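         +============+================+============+============+
         | Name       | Width x Height | H.264      | H.265      |
         +============+================+============+============+
         | DVD        | 720 x 480      | 1.0 Mbps   | 0.5 Mbps   |
         +------------+----------------+------------+------------+
         | 720p (1K)  | 1280 x 720     | 3-4.5 Mbps | 2-4 Mbps   |
         +------------+----------------+------------+------------+
         | 1080p (2K) | 1920 x 1080    | 6-8 Mbps   | 4.5-7 Mbps |
         +------------+----------------+------------+------------+
         | 2160p (4k) | 3840 x 2160    | N/A        | 10-20 Mbps |
         +------------+----------------+------------+------------+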

Table 1: Typical Resolutions and Bitrate Ranges Used for Video Encoding


* Note that these codecs do not take the actual "available bandwidth" between media servers and media players into account when encoding because the codec does not have any idea what network paths and network path conditions will carry the encoded video at some point in the future. It is common for codecs to offer a small number of resource variants, differing only in the bandwidth each variant targets.


* Note that media players attempting to receive encoded video across a network path with insufficient available path bandwidth might request the media server to provide video encoded for lower bitrates, at the cost of lower video quality, as described in Section 5.3.


* In order to provide multiple encodings for video resources, the codec must produce multiple variants (also called renditions) of the video resource encoded at various bitrates, as described in Section 5.2.

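As a rough illustration of how a media player might choose among variants when path bandwidth is limited, the following sketch picks the highest-bitrate rendition that fits an available-bandwidth estimate. The ladder uses the upper-end H.264 values from Table 1; the function name and the "largest that fits" rule are simplifying assumptions, and real players use richer logic (see Section 5.3).

```python
# Upper-end H.264 bitrates from Table 1, in Mbps (Table 1 lists no H.264
# entry for 2160p). The dictionary layout is an assumption for illustration.
H264_LADDER_MBPS = {"DVD": 1.0, "720p": 4.5, "1080p": 8.0}


def pick_rendition(available_mbps, ladder=H264_LADDER_MBPS):
    """Return the highest-bitrate rendition that fits the estimate.

    Falls back to the lowest-bitrate rendition when nothing fits.
    """
    fitting = [(rate, name) for name, rate in ladder.items()
               if rate <= available_mbps]
    if fitting:
        return max(fitting)[1]
    return min(ladder, key=ladder.get)
```

For example, an estimate of 5 Mbps selects the 720p rendition, while an estimate below 1 Mbps falls back to the DVD-resolution rendition at reduced quality.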

3.1.2. Virtual Reality Bitrates

The bitrates given in Section 3.1.1 describe video streams that provide the user with a single, fixed point of view -- therefore, the user has no "degrees of freedom", and the user sees all of the video image that is available.


Even basic virtual reality (360-degree) videos that allow users to look around freely (referred to as "three degrees of freedom" or 3DoF) require substantially larger bitrates when they are captured and encoded, as such videos require multiple fields of view of the scene. Yet, due to smart delivery methods, such as viewport-based or tile-based streaming, there is no need to send the whole scene to the user. Instead, the user needs only the portion corresponding to its viewpoint at any given time [Survey360].


In more immersive applications, where limited user movement ("three degrees of freedom plus" or 3DoF+) or full user movement ("six degrees of freedom" or 6DoF) is allowed, the required bitrate grows even further. In this case, immersive content is typically referred to as volumetric media. One way to represent the volumetric media is to use point clouds, where streaming a single object may easily require a bitrate of 30 Mbps or higher. Refer to [MPEGI] and [PCC] for more details.


3.2. Path Bottlenecks and Constraints

Even when the bandwidth requirements for media streams along a path are well understood, additional analysis is required to understand the constraints on bandwidth at various points along the path between media servers and media players. Media streams can encounter bottlenecks at many points along a path, whether the bottleneck happens at a node or at a path segment along the path, and these bottlenecks may involve a lack of processing power, buffering capacity, link speed, or any other exhaustible resource.


Media servers may react to bandwidth constraints using two independent feedback loops:


* Media servers often respond to application-level feedback from the media player that indicates a bottleneck somewhere along the path by sending a different media bitrate. This is described in greater detail in Section 5.


* Media servers also typically rely on transport protocols with capacity-seeking congestion controllers that probe for available path bandwidth and adjust the media sending rate based on transport mechanisms. This is described in greater detail in Section 6.


The result is that these two (potentially competing) "helpful" mechanisms each respond to the same bottleneck with no coordination between themselves, so that each is unaware of actions taken by the other, and this can result in QoE for users that is significantly lower than what could have been achieved.


One might wonder why media servers and transport protocols are each unaware of what the other is doing, and there are multiple reasons for that. One reason is that media servers are often implemented as applications executing in user space, relying on a general-purpose operating system that typically has its transport protocols implemented in the operating system kernel, making decisions that the media server never knows about.


As one example, if a media server overestimates the available bandwidth to the media player,


* the transport protocol may detect loss due to congestion and reduce its sending window size per round trip,


* the media server adapts to application-level feedback from the media player and reduces its own sending rate, and/or


* the transport protocol sends media at the new, lower rate and confirms that this new, lower rate is "safe" because no transport-level loss is occurring.


However, because the media server continues to send at the new, lower rate, the transport protocol's maximum sending rate is now limited by the amount of information the media server queues for transmission. Therefore, the transport protocol cannot probe for available path bandwidth by sending at a higher rate until the media player requests segments that buffer enough data for the transport to perform the probing.
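The application-limited condition described above can be reduced to a toy model. This is only a sketch: the function names and the idea of expressing both limits as rates in Mbps are assumptions for illustration, not a description of any particular transport implementation.

```python
def effective_send_rate(window_rate_mbps, app_queue_rate_mbps):
    """A transport can send no faster than the application supplies data.

    window_rate_mbps: rate the congestion controller would currently allow.
    app_queue_rate_mbps: rate at which the media server queues data.
    """
    return min(window_rate_mbps, app_queue_rate_mbps)


def can_probe_upward(window_rate_mbps, app_queue_rate_mbps):
    """Probing above the current rate requires queued data beyond that rate."""
    return app_queue_rate_mbps > window_rate_mbps
```

In this model, a server that queues 4 Mbps of media while the transport would allow 10 Mbps is sending at 4 Mbps and gives the transport nothing extra to probe with, so neither feedback loop learns that more bandwidth is available.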


To avoid these types of situations, which can potentially affect all the users whose streaming media segments traverse a bottleneck path segment, there are several possible mitigations that streaming operators can use. However, the first step toward mitigating a problem is knowing that a problem is occurring.


3.2.1. Recognizing Changes from a Baseline

There are many reasons why path characteristics might change in normal operation. For example:


* The path topology changes. Routing changes, which can happen in normal operation, may result in traffic being carried over a new path topology that is partially or entirely disjoint from the previous path, especially if the new path topology includes one or more path segments that are more heavily loaded, offer lower total bandwidth, change the overall Path MTU size, or simply cover more distance between the path endpoints.


* Cross traffic that also traverses part or all of the same path topology increases or decreases, especially if this new cross traffic is "inelastic" and does not respond to indications of path congestion.


* Wireless links (Wi-Fi, 5G, LTE, etc.) may see rapid changes to capacity from changes in radio interference and signal strength as endpoints move.


To recognize that a path carrying streaming media has experienced a change, maintaining a baseline that captures its prior properties is fundamental. Analytics that aid in that recognition can be more or less sophisticated and can usefully operate on several different time scales, from milliseconds to hours or days.


Useful properties to monitor for changes can include the following:


* round-trip times


* loss rate (and explicit congestion notification (ECN) [RFC3168] when in use)


* out-of-order packet rate


* packet and byte receive rate


* application-level goodput


* properties of other connections carrying competing traffic, in addition to the connections carrying the streaming media


* externally provided measurements, for example, from network cards or metrics collected by the operating system

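One simple way to maintain a baseline for any single property in the list above is an exponentially weighted moving average that flags samples deviating strongly from it. The sketch below is illustrative only: the class name, the smoothing weight, and the 50% tolerance are assumed values, not recommendations from this document.

```python
class Baseline:
    """Exponentially weighted baseline for one path property (e.g., RTT in ms)."""

    def __init__(self, alpha=0.1, tolerance=0.5):
        self.alpha = alpha          # weight of each new sample (assumed value)
        self.tolerance = tolerance  # relative deviation treated as a change (assumed)
        self.value = None           # no baseline until the first sample arrives

    def update(self, sample):
        """Fold in a sample; return True if it deviates from the prior baseline."""
        if self.value is None:
            self.value = sample
            return False
        changed = abs(sample - self.value) > self.tolerance * self.value
        self.value = (1 - self.alpha) * self.value + self.alpha * sample
        return changed
```

Running several such baselines with different `alpha` values is one way to observe the same property on different time scales, from short-lived spikes to slow drift.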

3.3. Path Requirements

The bitrate requirements in Section 3.1 are per end user actively consuming a media feed, so in the worst case, the bitrate demands can be multiplied by the number of simultaneous users to find the bandwidth requirements for a delivery path with that number of users downstream. For example, at a node with 10,000 downstream users simultaneously consuming video streams, approximately 80 Gbps might be necessary for all of them to get typical content at 1080p resolution.

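The worst-case arithmetic in the example above can be written out as a short sketch. The function name is hypothetical, and the 8 Mbps per-user figure is taken as the upper end of the H.264 range for 1080p in Table 1.

```python
def worst_case_gbps(simultaneous_users, per_user_mbps):
    """Worst-case bandwidth when every user receives an individual stream."""
    return simultaneous_users * per_user_mbps / 1000  # Mbps -> Gbps


# 10,000 downstream users at 8 Mbps each (upper-end 1080p H.264, Table 1).
required = worst_case_gbps(10_000, 8.0)
```

Here `required` evaluates to 80.0, matching the approximate 80 Gbps figure given in the text.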

However, when there is some overlap in the feeds being consumed by end users, it is sometimes possible to reduce the bandwidth provisioning requirements for the network by performing some kind of replication within the network. This can be achieved via object caching with the delivery of replicated objects over individual connections and/or by packet-level replication using multicast.


To the extent that replication of popular content can be performed, bandwidth requirements at peering or ingest points can be reduced to as low as a per-feed requirement instead of a per-user requirement.
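The difference between a per-user and a per-feed requirement can be sketched as follows. The function name, the feed names, and the assumption that every feed uses the same bitrate are illustrative simplifications, and the replicated case represents the best-case lower bound rather than a guaranteed outcome.

```python
def ingest_gbps(viewers_per_feed, per_feed_mbps, replicated):
    """Estimate bandwidth needed at a peering or ingest point.

    viewers_per_feed: mapping of feed name -> simultaneous viewers (hypothetical).
    replicated: True if the network replicates each feed downstream
                (for example, via caching or multicast).
    """
    if replicated:
        # Best case: one copy of each feed that has at least one viewer.
        active_feeds = sum(1 for v in viewers_per_feed.values() if v > 0)
        return active_feeds * per_feed_mbps / 1000
    # Worst case: one copy per viewer.
    return sum(viewers_per_feed.values()) * per_feed_mbps / 1000
```

With 10,000 viewers split across two feeds at 8 Mbps, the unreplicated estimate is 80 Gbps, while the replicated lower bound is 0.016 Gbps (16 Mbps).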


3.4. Caching Systems

When demand for content is relatively predictable, and especially when that content is relatively static, caching content close to requesters and preloading caches to respond quickly to initial requests are often useful (for example, HTTP/1.1 caching is described in [RFC9111]). This is subject to the usual considerations for caching -- for example, how much data must be cached to make a significant difference to the requester and how the benefit of caching and preloading cache balances against the costs of tracking stale content in caches and refreshing that content.


It is worth noting that not all high-demand content is "live" content. One relevant example is when popular streaming content can be staged close to a significant number of requesters, as can happen when a new episode of a popular show is released. This content may be largely stable and is therefore low-cost to maintain in multiple places throughout the Internet. This can reduce demands for high end-to-end bandwidth without having to use mechanisms like multicast.


Caching and preloading can also reduce exposure to peering point congestion, since less traffic crosses the peering point exchanges if the caches are placed in peer networks. This is especially true when the content can be preloaded during off-peak hours and if the transfer can make use of "A Lower-Effort Per-Hop Behavior (LE PHB) for Differentiated Services" [RFC8622], "Low Extra Delay Background Transport (LEDBAT)" [RFC6817], or similar mechanisms.


All of this depends, of course, on the ability of a streaming media operator to predict usage and provision bandwidth, caching, and other mechanisms to meet the needs of users. In some cases (Section 3.5), this is relatively routine, but in other cases, it is more difficult (Section 3.6).


With the emergence of ultra-low-latency streaming, responses have to start streaming to the end user while still being transmitted to the cache and while the cache does not yet know the size of the object. Some of the popular caching systems were designed around a cache footprint and had deeply ingrained assumptions about knowing the size of objects that are being stored, so the change in design requirements in long-established systems caused some errors in production. Incidents occurred where a transmission error in the connection from the upstream source to the cache could result in the cache holding a truncated segment and transmitting it to the end user's device. In this case, players rendering the stream often had a playback freeze until the player was reset. In some cases, the truncated object was even cached that way and served later to other players as well, causing continued stalls at the same spot in the media for all players playing the segment delivered from that cache node.
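One way to avoid serving a truncated object to later players is to store an object only after the upstream transfer has finished cleanly. The following is a toy sketch of that idea and not a description of any existing caching system; the class, its `fetch` method, and the convention that the upstream iterator raises `ConnectionError` on a transmission error are all assumptions.

```python
class StreamingCache:
    """Toy pass-through cache that forwards chunks before the size is known."""

    def __init__(self):
        self.store = {}

    def fetch(self, key, upstream_chunks, deliver):
        """Forward chunks to the player via deliver(); cache only complete objects.

        upstream_chunks: iterable of bytes that raises ConnectionError on a
        transmission error (hypothetical upstream interface).
        Returns True if the object completed and was cached.
        """
        received = []
        try:
            for chunk in upstream_chunks:
                deliver(chunk)          # low latency: forward immediately
                received.append(chunk)
        except ConnectionError:
            return False                # truncated: never store a partial object
        self.store[key] = b"".join(received)
        return True
```

Note that the player that triggered the fetch has still received a partial segment in the failure case, so a real system would also need to signal the error downstream; the sketch only prevents the truncated object from being served again from the cache.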


3.5. Predictable Usage Profiles

Historical data shows that users consume more videos, and these videos are encoded at a bitrate higher than they were in the past. Improvements in the codecs that help reduce the encoding bitrates with better compression algorithms have not offset the increase in the demand for the higher quality video (higher resolution, higher frame rate, better color gamut, better dynamic range, etc.). In particular, mobile data usage in cellular access networks has shown a large jump over the years due to increased consumption of entertainment and conversational video.


3.6. Unpredictable Usage Profiles

It is also possible for usage profiles to change significantly and suddenly. These changes are more difficult to plan for, but at a minimum, recognizing that sudden changes are happening is critical.


The two examples that follow are instructive.


3.6.1. Peer-to-Peer Applications

In the first example, described in "Report from the IETF Workshop on Peer-to-Peer (P2P) Infrastructure, May 28, 2008" [RFC5594], when the BitTorrent file sharing application came into widespread use in 2005, sudden and unexpected growth in peer-to-peer traffic led to complaints from ISP customers about the performance of delay-sensitive traffic (Voice over IP (VoIP) and gaming). These performance issues resulted from at least two causes:


* Many access networks for end users used underlying technologies that are inherently asymmetric, favoring downstream bandwidth (e.g., ADSL, cellular technologies, and most IEEE 802.11 variants), assuming that most users will need more downstream bandwidth than upstream bandwidth. This is a good assumption for client-server applications, such as streaming media or software downloads, but BitTorrent rewarded peers that uploaded as much as they downloaded, so BitTorrent users had much more symmetric usage profiles, which interacted badly with these asymmetric access network technologies.


* Some P2P systems also used distributed hash tables to organize peers into a ring topology, where each peer knew its "next peer" and "previous peer". There was no connection between the application-level ring topology and the lower-level network topology, so a peer's "next peer" might be anywhere on the reachable Internet. Traffic models that expected most communication to take place with a relatively small number of servers were unable to cope with peer-to-peer traffic that was much less predictable.

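The lack of any relationship between ring position and network location can be seen in a small sketch of a hash-ordered ring. Using a SHA-1 hash of an address string as the peer identifier is an assumption made for illustration; actual P2P systems differ in how identifiers are assigned.

```python
import hashlib


def ring_neighbors(peer_addresses):
    """Map each peer to its (previous, next) peer on a hash-ordered ring."""
    ring = sorted(
        peer_addresses,
        key=lambda addr: hashlib.sha1(addr.encode()).hexdigest(),
    )
    n = len(ring)
    # Index -1 wraps to the last peer; the modulo wraps the last peer to the first.
    return {ring[i]: (ring[i - 1], ring[(i + 1) % n]) for i in range(n)}
```

Because the ordering depends only on the hash, two peers with adjacent addresses can end up far apart on the ring, and a peer's "next peer" can be anywhere on the reachable Internet.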

Especially as end users increase the use of video-based social networking applications, it will be helpful for access network providers to watch for increasing numbers of end users uploading significant amounts of content.


3.6.2. Impact of Global Pandemic
3.6.2. 世界のパンデミックの影響

Early in 2020, the COVID-19 pandemic and resulting quarantines and shutdowns led to significant changes in traffic patterns due to a large number of people who suddenly started working and attending school remotely and using more interactive applications (e.g., videoconferencing and streaming media). Subsequently, the Internet Architecture Board (IAB) held a COVID-19 Network Impacts Workshop [RFC9075] in November 2020. The following observations from the workshop report are worth considering.


* Participants describing different types of networks reported different kinds of impacts, but all types of networks saw impacts.

* さまざまな種類のネットワークについて説明した参加者は、それぞれ異なる種類の影響を報告しましたが、すべての種類のネットワークで影響が見られました。

* Mobile networks saw traffic reductions, and residential networks saw significant increases.

* モバイルネットワークではトラフィックが減少し、住宅ネットワークではトラフィックが大幅に増加しました。

* Reported traffic increases from ISPs and Internet Exchange Points (IXPs) over just a few weeks were as big as the traffic growth over the course of a typical year, representing a 15-20% surge in growth to land at a new normal that was much higher than anticipated.

* ISPとインターネットエクスチェンジポイント(IXP)から報告されたわずか数週間のトラフィック増加は、典型的な1年間のトラフィック増加に匹敵する規模であり、15〜20%の急増を経て、予想をはるかに上回る新たな通常水準に落ち着きました。

* At Deutscher Commercial Internet Exchange (DE-CIX) Frankfurt, the world's largest IXP in terms of data throughput, the year 2020 has seen the largest increase in peak traffic within a single year since the IXP was founded in 1995.

* データスループットの観点から世界最大のIXPであるDeutscher Commercial Internet Exchange(DE-CIX)Frankfurtで、2020年は1995年にIXPが設立されて以来、1年以内にピークトラフィックが最大に増加しました。

* The usage pattern changed significantly as work-from-home and videoconferencing usage peaked during normal work hours, which would have typically been off-peak hours with adults at work and children at school. One might expect that the peak would have had more impact on networks if it had happened during typical evening peak hours for streaming applications.

* 在宅勤務とビデオ会議の利用が通常の勤務時間中にピークに達したため、使用パターンは大きく変化しました。この時間帯は、大人は職場に、子供は学校にいるため、通常はオフピークの時間帯でした。このピークがストリーミングアプリケーションの典型的な夕方のピーク時間帯に発生していたら、ネットワークへの影響はより大きかったと予想されます。

* The increase in daytime bandwidth consumption reflected both significant increases in essential applications, such as videoconferencing and virtual private networks (VPNs), and entertainment applications as people watched videos or played games.

* 昼間の帯域幅の消費の増加は、ビデオ会議や仮想プライベートネットワーク(VPN)などの必須アプリケーションの大幅な増加と、人々がビデオを見たりゲームをプレイしたりしたときのエンターテイメントアプリケーションの両方を反映しています。

* At the IXP level, it was observed that physical link utilization increased. This phenomenon could probably be explained by a higher level of uncacheable traffic, such as videoconferencing and VPNs, from residential users as they stopped commuting and switched to working at home.

* IXPレベルでは、物理リンクの使用率が増加したことが観察されました。この現象は、通勤をやめて在宅勤務に切り替えた住宅ユーザーからの、ビデオ会議やVPNなどのキャッシュ不可能なトラフィックの増加によって説明できると考えられます。

Again, it will be helpful for streaming operators to monitor traffic as described in Section 5.6, watching for sudden changes in performance.


4. Latency Considerations
4. 待ち時間の考慮事項

Streaming media latency refers to the "glass-to-glass" time duration, which is the delay between the real-life occurrence of an event and the streamed media being appropriately played on an end user's device. Note that this is different from the network latency (defined as the time for a packet to cross a network from one end to another end) because it includes media encoding/decoding and buffering time and, for most cases, also the ingest to an intermediate service, such as a CDN or other media distribution service, rather than a direct connection to an end user.


The team working on this document found these rough categories to be useful when considering a streaming media application's latency requirements:


* ultra-low-latency (less than 1 second)

* 超低遅延(1秒未満)

* low-latency live (less than 10 seconds)

* 低遅延ライブ(10秒未満)

* non-low-latency live (10 seconds to a few minutes)

* 非低遅延ライブ(10秒から数分)

* on-demand (hours or more)

* オンデマンド(数時間以上)
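
As an illustration only, the rough categories above can be written as a small Python helper. The function name `classify_latency` and the 300-second boundary standing in for "a few minutes" are assumptions made for this sketch; the list above does not give an exact boundary between non-low-latency live and on-demand.

```python
# Hypothetical boundary for "a few minutes"; not taken from the document.
NON_LOW_LATENCY_MAX_SECONDS = 300.0


def classify_latency(glass_to_glass_seconds: float) -> str:
    """Map a glass-to-glass latency target to one of the rough categories."""
    if glass_to_glass_seconds < 0:
        raise ValueError("latency cannot be negative")
    if glass_to_glass_seconds < 1:
        return "ultra-low-latency"
    if glass_to_glass_seconds < 10:
        return "low-latency live"
    if glass_to_glass_seconds <= NON_LOW_LATENCY_MAX_SECONDS:
        return "non-low-latency live"
    return "on-demand"


if __name__ == "__main__":
    for target in (0.4, 5, 45, 7200):
        print(target, "->", classify_latency(target))
```

In practice, whether a stream is on-demand is a property of the content rather than of a measured delay, so the final branch is best read as a placeholder.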

4.1. Ultra-Low-Latency
4.1. 超低遅延

Ultra-low-latency delivery of media is defined here as having a glass-to-glass delay target under 1 second.


Some media content providers aim to achieve this level of latency for live media events. This introduces new challenges when compared to the other latency categories described in Section 4, because ultra-low-latency is on the same scale as commonly observed end-to-end network latency variation, often due to bufferbloat [CoDel], Wi-Fi error correction, or packet reordering. These effects can make it difficult to achieve ultra-low-latency for many users and may require accepting relatively frequent user-visible media artifacts. However, for controlled environments that provide mitigations against such effects, ultra-low-latency is potentially achievable with the right provisioning and the right media transport technologies.


Most applications operating over IP networks and requiring latency this low use the Real-time Transport Protocol (RTP) [RFC3550] or WebRTC [RFC8825], which uses RTP as its media transport protocol, along with several other protocols necessary for safe operation in browsers.

IPネットワーク上で動作し、これほど低いレイテンシを必要とするほとんどのアプリケーションは、Real-time Transport Protocol(RTP)[RFC3550]またはWebRTC [RFC8825]を使用します。WebRTCは、メディアトランスポートプロトコルとしてRTPを使用し、ブラウザでの安全な動作に必要な他のいくつかのプロトコルも併用します。

It is worth noting that many applications for ultra-low-latency delivery do not need to scale to as many users as applications for low-latency and non-low-latency live delivery, which simplifies many delivery considerations.


Recommended reading for applications adopting an RTP-based approach also includes [RFC7656]. For increasing the robustness of the playback by implementing adaptive playout methods, refer to [RFC4733] and [RFC6843].


4.1.1. Near-Real-Time Latency
4.1.1. ほぼリアルタイムの遅延

Some Internet applications that incorporate media streaming have specific interactivity or control-feedback requirements that drive much lower glass-to-glass media latency targets than 1 second. These include videoconferencing or voice calls; remote video gameplay; remote control of hardware platforms like drones, vehicles, or surgical robots; and many other envisioned or deployed interactive applications.


Applications with latency targets in these regimes are out of scope for this document.


4.2. Low-Latency Live
4.2. 低遅延ライブ

Low-latency live delivery of media is defined here as having a glass-to-glass delay target under 10 seconds.


This level of latency is targeted to have a user experience similar to broadcast TV delivery. A frequently cited problem with failing to achieve this level of latency for live sporting events is the user experience failure from having crowds within earshot of one another who react audibly to an important play or from users who learn of an event in the match via some other channel, for example, social media, before it has happened on the screen showing the sporting event.


Applications requiring low-latency live media delivery are generally feasible at scale with some restrictions. This typically requires the use of a premium service dedicated to the delivery of live media, and some trade-offs may be necessary relative to what is feasible in a higher-latency service. The trade-offs may include higher costs, delivering a lower quality media, reduced flexibility for adaptive bitrates, or reduced flexibility for available resolutions so that fewer devices can receive an encoding tuned for their display. Low-latency live delivery is also more susceptible to user-visible disruptions due to transient network conditions than higher-latency services.


Implementation of a low-latency live media service can be achieved with the use of HTTP Live Streaming (HLS) [RFC8216] by using its low-latency extension (called LL-HLS) [HLS-RFC8216BIS] or with Dynamic Adaptive Streaming over HTTP (DASH) [MPEG-DASH] by using its low-latency extension (called LL-DASH) [LL-DASH]. These extensions use the Common Media Application Format (CMAF) standard [MPEG-CMAF] that allows the media to be packaged into and transmitted in units smaller than segments, which are called "chunks" in CMAF language. This way, the latency can be decoupled from the duration of the media segments. Without a CMAF-like packaging, lower latencies can only be achieved by using very short segment durations. However, using shorter segments means using more frequent intra-coded frames, and that is detrimental to video encoding quality. The CMAF standard allows us to still use longer segments (improving encoding quality) without penalizing latency.
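
The decoupling of latency from segment duration can be illustrated with a deliberately simplified calculation. The model below is an assumption made for this sketch, not a formula from the CMAF, LL-HLS, or LL-DASH specifications: a unit (segment or chunk) must be fully produced before it is sent, and the player holds a fixed number of complete units before it starts playback. Encoding and network delays are ignored, and the function name and example durations are hypothetical.

```python
def rough_packaging_latency(unit_duration_s: float, units_buffered: int) -> float:
    """Seconds of latency attributed to packaging plus player buffering.

    One unit of delay comes from waiting for the unit to be produced,
    and `units_buffered` more units come from the player's startup buffer.
    """
    if unit_duration_s <= 0 or units_buffered < 0:
        raise ValueError("duration must be positive and buffer count non-negative")
    return unit_duration_s * (1 + units_buffered)


# Whole 6-second segments versus 0.5-second chunks of those same segments.
segment_based = rough_packaging_latency(6.0, 3)
chunk_based = rough_packaging_latency(0.5, 3)

if __name__ == "__main__":
    print(f"segment-based: {segment_based:.1f} s, chunk-based: {chunk_based:.1f} s")
```

Under these assumptions, the segment duration (and therefore the spacing of intra-coded frames) can stay long while the latency contribution follows the much shorter chunk duration.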


While an LL-HLS client retrieves each chunk with a separate HTTP GET request, an LL-DASH client uses the chunked transfer encoding feature of the HTTP [CMAF-CTE], which allows the LL-DASH client to fetch all the chunks belonging to a segment with a single GET request. An HTTP server can transmit the CMAF chunks to the LL-DASH client as they arrive from the encoder/packager. A detailed comparison of LL-HLS and LL-DASH is given in [MMSP20].

LL-HLSクライアントは各チャンクを個別のHTTP GETリクエストで取得しますが、LL-DASHクライアントはHTTPのチャンク転送エンコーディング機能[CMAF-CTE]を使用します。これにより、LL-DASHクライアントは、1つのセグメントに属するすべてのチャンクを単一のGETリクエストで取得できます。HTTPサーバーは、エンコーダー/パッケージャーから到着したCMAFチャンクを順次LL-DASHクライアントに送信できます。LL-HLSとLL-DASHの詳細な比較は[MMSP20]に記載されています。

4.3. Non-Low-Latency Live
4.3. 非低遅延ライブ

Non-low-latency live delivery of media is defined here as a live stream that does not have a latency target shorter than 10 seconds.


This level of latency is the historically common case for segmented media delivery using HLS and DASH. This level of latency is often considered adequate for content like news. This level of latency is also sometimes achieved as a fallback state when some part of the delivery system or the client-side players do not support low-latency live streaming.


This level of latency can typically be achieved at scale with commodity CDN services for HTTP(s) delivery, and in some cases, the increased time window can allow for the production of a wider range of encoding options relative to the requirements for a lower-latency service without the need for increasing the hardware footprint, which can allow for wider device interoperability.


4.4. On-Demand
4.4. オンデマンド

On-demand media streaming refers to the playback of pre-recorded media based on a user's action. In some cases, on-demand media is produced as a by-product of a live media production, using the same segments as the live event but freezing the manifest that describes the media available from the media server after the live event has finished. In other cases, on-demand media is constructed out of pre-recorded assets with no streaming necessarily involved during the production of the on-demand content.


On-demand media generally is not subject to latency concerns, but other timing-related considerations can still be as important or even more important to the user experience than the same considerations with live events. These considerations include the startup time, the stability of the media stream's playback quality, and avoidance of stalls and other media artifacts during the playback under all but the most severe network conditions.


In some applications, optimizations are available to on-demand media but are not always available to live events, such as preloading the first segment for a startup time that does not have to wait for a network download to begin.


5. Adaptive Encoding, Adaptive Delivery, and Measurement Collection
5. 適応エンコード、適応配信、および測定コレクション

This section describes one of the best-known ways to provide a good user experience over a given network path, but one thing to keep in mind is that application-level mechanisms cannot provide a better experience than the underlying network path can support.


5.1. Overview
5.1. 概要

A simple model of media playback can be described as a media stream consumer, a buffer, and a transport mechanism that fills the buffer. The consumption rate is fairly static and is represented by the content bitrate. The size of the buffer is also commonly a fixed size. The buffer fill process needs to be at least fast enough to ensure that the buffer is never empty; however, it also can have significant complexity when things like personalization or advertising insertion workflows are introduced.
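
The consumer, buffer, and fill process described above can be sketched as a small simulation. This is a minimal model under stated assumptions: time advances in fixed steps, the player either plays a full step or stalls for a full step, and the names `simulate_playback`, `throughput_kbps`, and `buffer_capacity_s` are hypothetical.

```python
def simulate_playback(throughput_kbps, bitrate_kbps, buffer_capacity_s, step_s=1.0):
    """Return (final buffer level in seconds, total stalled seconds).

    throughput_kbps: one network throughput sample per time step.
    bitrate_kbps: constant content bitrate (the consumption rate).
    """
    buffer_s = 0.0
    stalled_s = 0.0
    for rate in throughput_kbps:
        # Fill: each step downloads step_s * rate / bitrate seconds of media,
        # limited by the fixed buffer size.
        buffer_s = min(buffer_capacity_s, buffer_s + step_s * rate / bitrate_kbps)
        # Drain: play one step if enough media is buffered, otherwise stall.
        if buffer_s >= step_s:
            buffer_s -= step_s
        else:
            stalled_s += step_s
    return buffer_s, stalled_s


if __name__ == "__main__":
    print(simulate_playback([2000] * 5, bitrate_kbps=1000, buffer_capacity_s=10))
    print(simulate_playback([500] * 4, bitrate_kbps=1000, buffer_capacity_s=10))
```

When the fill rate is twice the content bitrate the buffer grows without stalls; when it is half the content bitrate, this model stalls on every other step.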


The challenges in filling the buffer in a timely way fall into two broad categories:


* Content variation (also sometimes called a "bitrate ladder") is the set of content renditions that are available at any given selection point.

* コンテンツのバリエーション(「ビットレートラダー」と呼ばれることもあります)は、任意の選択ポイントで利用可能なコンテンツレンディションのセットです。

* Content selection comprises all of the steps a client uses to determine which content rendition to play.

* コンテンツの選択は、どのコンテンツレンディションを再生するかを決定するためにクライアントが使用するすべてのステップで構成されています。

The mechanism used to select the bitrate is part of the content selection, and the content variation is all of the different bitrate renditions.


Adaptive bitrate streaming ("ABR streaming" or simply "ABR") is a commonly used technique for dynamically adjusting the media quality of a stream to match bandwidth availability. When this goal is achieved, the media server will tend to send enough media that the media player does not "stall", without sending so much media that the media player cannot accept it.

アダプティブビットレートストリーミング(「ABRストリーミング」または単に「ABR」)は、帯域幅の可用性に合わせてストリームのメディア品質を動的に調整するために一般的に使用される手法です。この目標が達成されると、メディアサーバーは、メディアプレーヤーが「ストール」しない程度に十分なメディアを送信しつつ、メディアプレーヤーが受け入れられないほど多くのメディアは送信しないようになります。

ABR uses an application-level response strategy in which the streaming client attempts to detect the available bandwidth of the network path by first observing the successful application-layer download speed; then, given the available bandwidth, the client chooses a bitrate for each of the video, audio, subtitles, and metadata (among a limited number of available options for each type of media) that fits within that bandwidth, typically adjusting as changes in available bandwidth occur in the network or changes in capabilities occur during the playback (such as available memory, CPU, display size, etc.).
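
The two steps described above (observe the application-layer download speed, then pick a rendition that fits) can be sketched as follows. This is not any particular player's algorithm; the function names and the `safety_factor` of 0.8 are assumptions for illustration, and real rate adaptation algorithms are usually more elaborate.

```python
def measure_throughput_kbps(bytes_received: int, download_seconds: float) -> float:
    """Application-layer download speed of one completed request, in kbit/s."""
    if download_seconds <= 0:
        raise ValueError("download time must be positive")
    return bytes_received * 8 / 1000 / download_seconds


def select_bitrate(ladder_kbps, estimated_bandwidth_kbps, safety_factor=0.8):
    """Pick the highest rendition that fits within a fraction of the estimate.

    Falls back to the lowest rendition when nothing fits.
    """
    budget = estimated_bandwidth_kbps * safety_factor
    candidates = [b for b in sorted(ladder_kbps) if b <= budget]
    return candidates[-1] if candidates else min(ladder_kbps)


if __name__ == "__main__":
    ladder = [400, 1200, 2500, 5000]
    estimate = measure_throughput_kbps(bytes_received=2_000_000, download_seconds=4.0)
    print(estimate, "->", select_bitrate(ladder, estimate))
```

The safety margin reflects one possible design choice: leaving headroom so that a modest drop in bandwidth does not immediately drain the buffer.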


5.2. Adaptive Encoding
5.2. 適応エンコーディング

Media servers can provide media streams at various bitrates because the media has been encoded at various bitrates. This is a so-called "ladder" of bitrates that can be offered to media players as part of the manifest so that the media player can select among the available bitrate choices.


The media server may also choose to alter which bitrates are made available to players by adding or removing bitrate options from the ladder delivered to the player in subsequent manifests built and sent to the player. This way, both the player, through its selection of bitrate to request from the manifest, and the server, through its construction of the bitrates offered in the manifest, are able to affect network utilization.
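
As one concrete form of such a ladder, an HLS master playlist [RFC8216] lists each rendition with an `EXT-X-STREAM-INF` tag. The sketch below builds a minimal playlist from a list of renditions; the helper name, the dictionary keys, and the URIs are hypothetical, and many attributes a production playlist would carry (for example `CODECS`) are omitted.

```python
def build_master_playlist(ladder):
    """Render a minimal HLS master playlist, lowest bitrate first.

    Each rendition is a dict with 'bandwidth_bps', 'resolution', and 'uri'.
    """
    lines = ["#EXTM3U"]
    for rendition in sorted(ladder, key=lambda r: r["bandwidth_bps"]):
        lines.append(
            "#EXT-X-STREAM-INF:"
            f"BANDWIDTH={rendition['bandwidth_bps']},"
            f"RESOLUTION={rendition['resolution']}"
        )
        lines.append(rendition["uri"])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example_ladder = [
        {"bandwidth_bps": 2500000, "resolution": "1280x720", "uri": "720p/index.m3u8"},
        {"bandwidth_bps": 800000, "resolution": "640x360", "uri": "360p/index.m3u8"},
    ]
    print(build_master_playlist(example_ladder), end="")
```

A server that wants to influence network utilization could pass a shorter ladder to the same helper when building a later manifest.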


5.3. Adaptive Segmented Delivery
5.3. 適応セグメント化された配信

Adaptive segmented delivery attempts to optimize its own use of the path between a media server and a media client. ABR playback is commonly implemented by streaming clients using HLS [RFC8216] or DASH [MPEG-DASH] to perform a reliable segmented delivery of media over HTTP. Different implementations use different strategies [ABRSurvey], often relying on proprietary algorithms (called rate adaptation or bitrate selection algorithms) to perform available bandwidth estimation/prediction and the bitrate selection.

アダプティブセグメント配信は、メディアサーバーとメディアクライアント間のパスの自身による使用を最適化しようとします。ABR再生は一般に、ストリーミングクライアントがHLS [RFC8216]またはDASH [MPEG-DASH]を使用して、HTTP上で信頼性のあるセグメント化されたメディア配信を行うことで実装されます。実装ごとに異なる戦略[ABRSurvey]が使用され、多くの場合、利用可能な帯域幅の推定/予測とビットレート選択を行うために、独自のアルゴリズム(レート適応アルゴリズムまたはビットレート選択アルゴリズムと呼ばれる)に依存しています。

Many systems will do an initial probe or a very simple throughput speed test at the start of media playback. This is done to get a rough sense of the highest (total) media bitrate that the network between the server and player will likely be able to provide under initial network conditions. After the initial testing, clients tend to rely upon passive network observations and will make use of player-side statistics, such as buffer fill rates, to monitor and respond to changing network conditions.


The choice of bitrate occurs within the context of optimizing for one or more metrics monitored by the client, such as the highest achievable audiovisual quality or the lowest chances for a rebuffering event (playback stall).


5.4. Advertising
5.4. 広告

The inclusion of advertising alongside or interspersed with streaming media content is common in today's media landscape.


Some commonly used forms of advertising can introduce potential user experience issues for a media stream. This section provides a very brief overview of a complex and rapidly evolving space.


The same techniques used to allow a media player to switch between renditions of different bitrates at segment boundaries can also be used to enable the dynamic insertion of advertisements (hereafter referred to as "ads"), but this does not mean that the insertion of ads has no effect on the user's quality of experience.


Ads may be inserted with either Client-side Ad Insertion (CSAI) or Server-side Ad Insertion (SSAI). In CSAI, the ABR manifest will generally include links to an external ad server for some segments of the media stream, while in SSAI, the server will remain the same during ads but will include media segments that contain the advertising. In SSAI, the media segments may or may not be sourced from an external ad server like with CSAI.


In general, the more targeted the ad request is, the more requests the ad service needs to be able to handle concurrently. If connectivity is poor to the ad service, this can cause rebuffering even if the underlying media assets (both content and ads) can be accessed quickly. The less targeted the ad request is, the more likely that ad requests can be consolidated and that ads can be cached similarly to the media content.


In some cases, especially with SSAI, advertising space in a stream is reserved for a specific advertiser and can be integrated with the video so that the segments share the same encoding properties, such as bitrate, dynamic range, and resolution. However, in many cases, ad servers integrate with a Supply Side Platform (SSP) that offers advertising space in real-time auctions via an Ad Exchange, with bids for the advertising space coming from Demand Side Platforms (DSPs) that collect money from advertisers for delivering the ads. Most such Ad Exchanges use application-level protocol specifications published by the Interactive Advertising Bureau [IAB-ADS], an industry trade organization.

場合によっては、特にSSAIでは、ストリーム内の広告スペースが特定の広告主用に予約されており、セグメントがビットレート、ダイナミックレンジ、解像度などの同じエンコード特性を共有するようにビデオと統合できます。ただし、多くの場合、広告サーバーは、アドエクスチェンジを介したリアルタイムオークションで広告スペースを提供するサプライサイドプラットフォーム(SSP)と統合されており、広告スペースへの入札は、広告配信の対価を広告主から集めるデマンドサイドプラットフォーム(DSP)から行われます。このようなアドエクスチェンジのほとんどは、業界団体であるInteractive Advertising Bureau [IAB-ADS]が公開しているアプリケーションレベルのプロトコル仕様を使用しています。

This ecosystem balances several competing objectives, and integrating with it naively can produce surprising user experience results. For example, ad server provisioning and/or the bitrate of the ad segments might be different from that of the main content, and either of these differences can result in playback stalls. For another example, since the inserted ads are often produced independently, they might have a different base volume level than the main content, which can make for a jarring user experience.


Another major source of competing objectives comes from user privacy considerations vs. the advertiser's incentives to target ads to user segments based on behavioral data. Multiple studies, for example, [BEHAVE] and [BEHAVE2], have reported large improvements in ad effectiveness when using behaviorally targeted ads, relative to untargeted ads. This provides a strong incentive for advertisers to gain access to the data necessary to perform behavioral targeting, leading some to engage in what is indistinguishable from a pervasive monitoring attack [RFC7258] based on user tracking in order to collect the relevant data. A more complete review of issues in this space is available in [BALANCING].


On top of these competing objectives, this market historically has had incidents of misreporting of ad delivery to end users for financial gain [ADFRAUD]. As a mitigation for concerns driven by those incidents, some SSPs have required the use of specific media players that include features like reporting of ad delivery or providing additional user information that can be used for tracking.


In general, this is a rapidly developing space with many considerations, and media streaming operators engaged in advertising may need to research these and other concerns to find solutions that meet their user experience, user privacy, and financial goals. For further reading on mitigations, [BAP] has published some standards and best practices based on user experience research.


5.5. Bitrate Detection Challenges
5.5. ビットレート検出の課題

This kind of bandwidth-measurement system can experience various troubles that are affected by networking and transport protocol issues. Because adaptive application-level response strategies are often using rates as observed by the application layer, there are sometimes inscrutable transport-level protocol behaviors that can produce surprising measurement values when the application-level feedback loop is interacting with a transport-level feedback loop.


A few specific examples of surprising phenomena that affect bitrate detection measurements are described in the following subsections. As these examples will demonstrate, it is common to encounter cases that can deliver application-level measurements that are too low, too high, and (possibly) correct but that vary more quickly than a lab-tested selection algorithm might expect.


These effects and others that cause transport behavior to diverge from lab modeling can sometimes have a significant impact on bitrate selection and on user QoE, especially where players use naive measurement strategies and selection algorithms that do not account for the likelihood of bandwidth measurements that diverge from the true path capacity.


5.5.1. Idle Time between Segments
5.5.1. セグメント間のアイドル時間

When the bitrate selection is chosen substantially below the available capacity of the network path, the response to a segment request will typically complete in much less absolute time than the duration of the requested segment, leaving significant idle time between segment downloads. This can have a few surprising consequences:


* TCP slow-start, when restarting after idle, requires multiple RTTs to re-establish a throughput at the network's available capacity. When the active transmission time for segments is substantially shorter than the time between segments, leaving an idle gap between segments that triggers a restart of TCP slow-start, the estimate of the successful download speed coming from the application-visible receive rate on the socket can thus end up much lower than the actual available network capacity. This, in turn, can prevent a shift to the most appropriate bitrate. [RFC7661] provides some mitigations for this effect at the TCP transport layer for senders who anticipate a high incidence of this problem.

* TCPスロースタートは、アイドル後に再開する際、ネットワークの利用可能な容量でのスループットを再確立するために複数のRTTを必要とします。セグメントのアクティブな送信時間がセグメント間の時間よりも大幅に短く、TCPスロースタートの再開を引き起こすアイドルギャップがセグメント間に生じる場合、ソケット上でアプリケーションから見える受信レートに基づくダウンロード速度の推定値は、実際に利用可能なネットワーク容量よりもはるかに低くなる可能性があります。その結果、最も適切なビットレートへの移行が妨げられることがあります。[RFC7661]は、この問題の頻発を見込む送信者向けに、TCPトランスポート層でこの影響を緩和する方法をいくつか提供しています。

* Mobile flow-bandwidth spectrum and timing mapping can be impacted by idle time in some networks. The carrier capacity assigned to a physical or virtual link can vary with activity. Depending on the idle time characteristics, this can result in a lower available bitrate than would be achievable with a steadier transmission in the same network.

* モバイルフローバンド幅のスペクトルとタイミングマッピングは、一部のネットワークでのアイドル時間によって影響を受ける可能性があります。物理的または仮想リンクに割り当てられたキャリア容量は、アクティビティによって異なります。アイドル時間の特性に応じて、これにより、同じネットワーク内のより安定した伝送で達成できるよりも、利用可能なビットレートが低くなる可能性があります。

Some receiver-side ABR algorithms, such as [ELASTIC], are designed to try to avoid this effect.


Another way to mitigate this effect is by the help of two simultaneous TCP connections, as explained in [MMSys11] for Microsoft Smooth Streaming. In some cases, the system-level TCP slow-start restart can also be disabled, for example, as described in [OReilly-HPBN].

この影響を緩和する別の方法は、Microsoft Smooth Streamingについて[MMSys11]で説明されているように、2つの同時TCP接続を利用することです。場合によっては、たとえば[OReilly-HPBN]で説明されているように、システムレベルでアイドル後のTCPスロースタート再開を無効にすることもできます。
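
The slow-start restart effect described in this section can be illustrated with an idealized model. The sketch below is not a TCP implementation: it assumes no loss, a congestion window that starts at 10 packets and doubles every RTT until it covers the bandwidth-delay product, and a full RTT charged for every slow-start round. The function names, the 1460-byte packet size, and the example path (50 Mbit/s, 50 ms RTT) are all assumptions for illustration.

```python
import math


def slow_start_download_time(size_bytes, capacity_bps, rtt_s,
                             mss_bytes=1460, initial_window=10):
    """Idealized seconds needed to fetch one object after an idle period."""
    bdp_packets = max(1, int(capacity_bps * rtt_s / (8 * mss_bytes)))
    remaining = math.ceil(size_bytes / mss_bytes)
    cwnd = initial_window
    elapsed = 0.0
    while remaining > 0:
        if cwnd >= bdp_packets:
            # The window now fills the path, so the rest drains at path capacity.
            elapsed += remaining * mss_bytes * 8 / capacity_bps
            break
        sent = min(cwnd, remaining)
        remaining -= sent
        elapsed += rtt_s      # one round trip per slow-start round
        cwnd *= 2
    return elapsed


def measured_rate_bps(size_bytes, capacity_bps, rtt_s):
    """Rate an application would compute as bytes received over elapsed time."""
    return size_bytes * 8 / slow_start_download_time(size_bytes, capacity_bps, rtt_s)


if __name__ == "__main__":
    capacity = 50_000_000
    for size in (1_000_000, 100_000_000):
        rate = measured_rate_bps(size, capacity, rtt_s=0.05)
        print(f"{size} bytes: measured {rate / 1e6:.1f} Mbit/s of {capacity / 1e6:.0f}")
```

In this model a 1 MB segment yields an estimate of less than half the path capacity, while a much larger transfer comes close to it, which is the kind of underestimate that can hold a client at a lower bitrate than the path could support.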

5.5.2. Noisy Measurements
5.5.2. 騒々しい測定

In addition to smoothing over an appropriate time scale to handle network jitter (see [RFC5481]), ABR systems relying on measurements at the application layer also have to account for noise from the in-order data transmission at the transport layer.


For instance, in the event of a lost packet on a TCP connection with SACK support (a common case for segmented delivery in practice), loss of a packet can provide a confusing bandwidth signal to the receiving application. Because of the sliding window in TCP, many packets may be accepted by the receiver without being available to the application until the missing packet arrives. Upon the arrival of the one missing packet after retransmit, the receiver will suddenly get access to a lot of data at the same time.


To a receiver measuring bytes received per unit time at the application layer and interpreting it as an estimate of the available network bandwidth, this appears as a high jitter in the goodput measurement, presenting as a stall followed by a sudden leap that can far exceed the actual capacity of the transport path from the server when the hole in the received data is filled by a later retransmission.
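
The stall followed by a sudden leap can be reproduced with a toy model of in-order delivery. The sketch assumes one equally sized packet arrives per millisecond, that a single lost packet is retransmitted and arrives later, and that the application sees a packet only once every earlier packet has arrived. The function name and all numbers are hypothetical.

```python
def app_visible_bytes(arrival_ms, packet_bytes, window_ms):
    """Bytes released to the application in each measurement window.

    arrival_ms[i] is when packet i reaches the receiver; delivery to the
    application is in order, so packet i waits for every earlier packet.
    """
    deliver_ms = []
    latest = 0
    for t in arrival_ms:
        latest = max(latest, t)
        deliver_ms.append(latest)
    totals = [0] * (max(deliver_ms) // window_ms + 1)
    for t in deliver_ms:
        totals[t // window_ms] += packet_bytes
    return totals


if __name__ == "__main__":
    arrivals = list(range(40))   # one packet per millisecond
    arrivals[5] = 35             # packet 5 is lost; its retransmission lands at 35 ms
    print(app_visible_bytes(arrivals, packet_bytes=1000, window_ms=10))
```

The path delivers a steady 10,000 bytes per 10 ms window here, yet the application-layer view shows two empty windows followed by one window holding 35,000 bytes.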


5.5.3. Wide and Rapid Variation in Path Capacity
5.5.3. パス容量の幅広い急速な変動

As many end devices have moved to wireless connections for the final hop (such as Wi-Fi, 5G, LTE, etc.), new problems in bandwidth detection have emerged.


In most real-world operating environments, wireless links can often experience sudden changes in capacity as the end user device moves from place to place or encounters new sources of interference. Microwave ovens, for example, can cause a throughput degradation in Wi-Fi of more than a factor of 2 while active [Micro].


These swings in actual transport capacity can result in user experience issues when interacting with ABR algorithms that are not tuned to handle the capacity variation gracefully.
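
One way an estimator can be made less sensitive to such swings is to average recent throughput samples with a harmonic mean, which is pulled toward the low samples more strongly than an arithmetic mean. This is offered only as an illustrative option, not as a technique described in this document; the function name and the window of five samples are assumptions.

```python
from statistics import harmonic_mean, mean


def conservative_estimate_kbps(samples_kbps, window=5):
    """Harmonic mean of the most recent throughput samples, in kbit/s."""
    recent = list(samples_kbps)[-window:]
    if not recent:
        raise ValueError("at least one sample is required")
    return harmonic_mean(recent)


if __name__ == "__main__":
    # Hypothetical samples: a brief interference episode cuts throughput to a quarter.
    samples = [8000, 8000, 2000, 8000]
    print("arithmetic:", mean(samples))
    print("harmonic:  ", round(conservative_estimate_kbps(samples)))
```

Because the harmonic mean weights the dip heavily, a selection algorithm fed with it would tend to step down sooner and step up later than one fed with the arithmetic mean.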


5.6. Measurement Collection
5.6. 測定コレクション

Media players use measurements to guide their segment-by-segment adaptive streaming requests but may also provide measurements to streaming media providers.


In turn, media providers may base analytics on these measurements to guide decisions, such as whether adaptive encoding bitrates in use are the best ones to provide to media players or whether current media content caching is providing the best experience for viewers.


To that effect, the Consumer Technology Association (CTA), which owns the Web Application Video Ecosystem (WAVE) project, has published two important specifications.

そのため、Web Application Video Ecosystem(WAVE)プロジェクトを所有するConsumer Technology Association(CTA)は、2つの重要な仕様を公開しています。

* CTA-2066: Streaming Quality of Experience Events, Properties and Metrics

* CTA-2066:ストリーミングの体感品質(QoE)に関するイベント、プロパティ、メトリック

[CTA-2066] specifies a set of media player events, properties, QoE metrics, and associated terminology for representing streaming media QoE across systems, media players, and analytics vendors. While all these events, properties, metrics, and associated terminology are used across a number of proprietary analytics and measurement solutions, they were used in slightly (or vastly) different ways that led to interoperability issues. CTA-2066 attempts to address this issue by defining common terminology and how each metric should be computed for consistent reporting.


* CTA-5004: Web Application Video Ecosystem - Common Media Client Data (CMCD)

* CTA-5004:Web Application Video Ecosystem - Common Media Client Data(CMCD)

Many assume that the CDNs have a holistic view of the health and performance of the streaming clients. However, this is not the case. The CDNs produce millions of log lines per second across hundreds of thousands of clients, and they have no concept of a "session" as a client would have, so CDNs are decoupled from the metrics the clients generate and report. A CDN cannot tell which request belongs to which playback session, the duration of any media object, the bitrate, or whether any of the clients have stalled and are rebuffering or are about to stall and will rebuffer. The consequence of this decoupling is that a CDN cannot prioritize delivery for when the client needs it most, prefetch content, or trigger alerts when the network itself may be underperforming. One approach to couple the CDN to the playback sessions is for the clients to communicate standardized media-relevant information to the CDNs while they are fetching data. [CTA-5004] was developed exactly for this purpose.
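
To give a sense of what such client-to-CDN signaling can look like, the sketch below serializes a few CMCD values into a query argument attached to a segment request. It reflects one reading of [CTA-5004] (keys sorted, strings quoted, a true boolean sent as a bare key, the whole value URL-encoded) and should be checked against the specification before use. The helper name, the `TOKEN_KEYS` set, and the sample values are assumptions.

```python
from urllib.parse import quote

# Keys whose values are sent unquoted as tokens in this sketch (assumed subset).
TOKEN_KEYS = {"ot", "sf", "st"}


def build_cmcd_query(values):
    """Serialize a dict of CMCD keys into a 'CMCD=...' query argument."""
    parts = []
    for key in sorted(values):
        value = values[key]
        if value is None or value is False:
            continue                      # absent and false values are omitted
        if value is True:
            parts.append(key)             # a true boolean is sent as the bare key
        elif isinstance(value, str) and key not in TOKEN_KEYS:
            parts.append(f'{key}="{value}"')
        else:
            parts.append(f"{key}={value}")
    return "CMCD=" + quote(",".join(parts), safe="")


if __name__ == "__main__":
    query = build_cmcd_query({
        "br": 3200,      # encoded bitrate of the requested object, kbit/s
        "bl": 21300,     # current buffer length, milliseconds
        "bs": True,      # the buffer was starved since the last request
        "ot": "v",       # object type: video
        "sid": "abc",    # playback session identifier
    })
    print("https://cdn.example.com/video/seg_42.m4s?" + query)
```

With a session identifier and buffer state on each request, a CDN log line can be tied back to a playback session, which is the coupling the paragraph above describes.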


6. Transport Protocol Behaviors and Their Implications for Media Transport Protocols
6. 輸送プロトコルの動作とメディア輸送プロトコルに対するそれらの影響

Within this document, the term "media transport protocol" is used to describe any protocol that carries media metadata and media segments in its payload, and the term "transport protocol" describes any protocol that carries a media transport protocol, or another transport protocol, in its payload. This is easier to understand if the reader assumes a protocol stack that looks something like this:


             Media Segments
              Media Format
         Media Transport Protocol
          Transport Protocol(s)



* "Media segments" would be something like the output of a codec or some other source of media segments, such as closed-captioning,

* 「メディアセグメント」は、コーデックの出力や、クローズドキャプションなどのメディアセグメントの他のソースのようなものになります。

* "Media format" would be something like an RTP payload format [RFC2736] or an ISO base media file format (ISOBMFF) profile [ISOBMFF],

* 「Media Format」は、RTPペイロード形式[RFC2736]またはISOベースメディアファイル形式(ISOBMFF)プロファイル[ISOBMFF]のようなものになります。

* "Media transport protocol" would be something like RTP [RFC3550] or DASH [MPEG-DASH], and

* 「Media Transport Protocol」は、RTP [RFC3550]やDASH [MPEG-DASH]のようなものになり、

* "Transport protocol" would be a protocol that provides appropriate transport services, as described in Section 5 of [RFC8095].

* 「輸送プロトコル」は、[RFC8095]のセクション5で説明されているように、適切な輸送サービスを提供するプロトコルです。

Not all possible streaming media applications follow this model, but for the ones that do, it seems useful to distinguish between the protocol layer that is aware it is transporting media segments and underlying protocol layers that are not aware.


As described in the abstract of [RFC8095], the IETF has standardized a number of protocols that provide transport services. Although these protocols, taken in total, provide a wide variety of transport services, Section 6 will distinguish between two extremes:


* transport protocols used to provide reliable, in-order media delivery to an endpoint, typically providing flow control and congestion control (Section 6.1), and

* エンドポイントへの信頼性の高い順序どおりのメディア配信を提供するために使用され、通常はフロー制御と輻輳制御を提供するトランスポートプロトコル(セクション6.1)、および

* transport protocols used to provide unreliable, unordered media delivery to an endpoint, without flow control or congestion control (Section 6.2).

* フロー制御や輻輳制御なしで、エンドポイントへの信頼性のない順序付けされないメディア配信を提供するために使用されるトランスポートプロトコル(セクション6.2)。

Because newly standardized transport protocols, such as QUIC [RFC9000], that are typically implemented in user space can evolve their transport behavior more rapidly than currently used transport protocols that are typically implemented in operating system kernel space, this document includes a description of how the path characteristics that streaming media providers may see are likely to evolve; see Section 6.3.

It is worth noting explicitly that the transport protocol layer might include more than one protocol. For example, a specific media transport protocol might run over HTTP, or over WebTransport, which in turn runs over HTTP.


More complex network protocol stacks are certainly possible. For instance, when packets with this protocol stack are carried in a tunnel or in a VPN, the entire packet would likely appear in the payload of other protocols. If these environments are present, streaming media operators may need to analyze their effects on applications as well.


6.1. Media Transport over Reliable Transport Protocols

The HLS [RFC8216] and DASH [MPEG-DASH] media transport protocols are typically carried over HTTP, and HTTP has used TCP as its only standardized transport protocol until HTTP/3 [RFC9114]. These media transport protocols use ABR response strategies as described in Section 5 to respond to changing path characteristics, and underlying transport protocols are also attempting to respond to changing path characteristics.

The past success of the largely TCP-based Internet is evidence that the various flow control and congestion control mechanisms TCP has used have been largely successful at reaching equilibrium quickly, at a point where TCP senders do not interfere with other TCP senders for sustained periods of time [RFC5681]. The Internet has continued to work even when the specific TCP mechanisms used to reach equilibrium changed over time [RFC7414]. Because TCP provided a common tool to avoid contention, even when significant TCP-based applications like FTP were largely replaced by other significant TCP-based applications like HTTP, the transport behavior remained safe for the Internet.


Modern TCP implementations [RFC9293] continue to probe for available bandwidth and "back off" when a network path is saturated but may also work to avoid growing queues along network paths, which can prevent older TCP senders from quickly detecting when a network path is becoming saturated. Congestion control mechanisms, such as Copa [COPA18] and Bottleneck Bandwidth and Round-trip propagation time (BBR) [BBR-CONGESTION-CONTROL], make these decisions based on measured path delays, assuming that if the measured path delay is increasing, the sender is injecting packets onto the network path faster than the network can forward them (or the receiver can accept them), so the sender should adjust its sending rate accordingly.
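
As a rough illustration of the delay-based reasoning described above, the following Python sketch adjusts a sending rate from RTT samples. It is a toy model, not Copa or BBR: the class name, the 10 ms queuing-delay tolerance, the additive probe step, and the back-off factor are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class DelayBasedSender:
    """Toy delay-based rate controller (illustrative only; not Copa or BBR)."""

    rate_bps: float                    # current sending rate
    min_rtt_ms: float = float("inf")   # lowest RTT seen, used as the empty-queue baseline
    queue_threshold_ms: float = 10.0   # assumed tolerance for queuing delay
    increase_bps: float = 100_000.0    # assumed additive probe step
    decrease_factor: float = 0.85      # assumed multiplicative back-off

    def on_rtt_sample(self, rtt_ms: float) -> float:
        """Update the sending rate from one RTT measurement and return it."""
        self.min_rtt_ms = min(self.min_rtt_ms, rtt_ms)
        queuing_delay_ms = rtt_ms - self.min_rtt_ms
        if queuing_delay_ms > self.queue_threshold_ms:
            # Delay is growing: packets enter the path faster than it drains them.
            self.rate_bps *= self.decrease_factor
        else:
            # No standing queue detected: keep probing for available bandwidth.
            self.rate_bps += self.increase_bps
        return self.rate_bps


sender = DelayBasedSender(rate_bps=2_000_000.0)
for rtt in (40.0, 41.0, 42.0, 65.0, 80.0):
    sender.on_rtt_sample(rtt)
```

In this sketch, an RTT that rises above the lowest RTT observed is treated as evidence of a growing queue, so the sender backs off before any packet is lost.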

Although common TCP behavior has changed significantly since the days of [Jacobson-Karels] and [RFC2001], even with adding new congestion controllers such as CUBIC [RFC8312], the common practice of implementing TCP as part of an operating system kernel has acted to limit how quickly TCP behavior can change. Even with the widespread use of automated operating system update installation on many end-user systems, streaming media providers could have a reasonable expectation that they could understand TCP transport protocol behaviors and that those behaviors would remain relatively stable in the short term.

6.2. Media Transport over Unreliable Transport Protocols

Because UDP does not provide any feedback mechanism to senders to help limit impacts on other users, UDP-based application-level protocols have been responsible for the decisions that TCP-based applications have delegated to TCP, i.e., what to send, how much to send, and when to send it. Because UDP itself has no transport-layer feedback mechanisms, UDP-based applications that send and receive substantial amounts of information are expected to provide their own feedback mechanisms and to respond to the feedback the application receives. This expectation is most recently codified as a Best Current Practice [RFC8085].


In contrast to adaptive segmented delivery over a reliable transport as described in Section 5.3, some applications deliver streaming media segments using an unreliable transport and rely on a variety of approaches, including:


* media encapsulated in a raw MPEG Transport Stream (MPEG-TS) [MPEG-TS] over UDP, which makes no attempt to account for reordering or loss in the transport,

* RTP [RFC3550], which can notice packet loss and repair some limited reordering,

* the Stream Control Transmission Protocol (SCTP) [RFC9260], which can use partial reliability [RFC3758] to recover from some loss but can abandon recovery to limit head-of-line blocking, and

* the Secure Reliable Transport (SRT) [SRT], which can use forward error correction and time-bound retransmission to recover from loss within certain limits but can abandon recovery to limit head-of-line blocking.

Under congestion and loss, approaches like the above generally experience transient media artifacts more often and delay of playback effects less often, as compared with reliable segment transport. Often, one of the key goals of using a UDP-based transport that allows some unreliability is to reduce latency and better support applications like videoconferencing or other live-action video with interactive components, such as some sporting events.


Congestion avoidance strategies for deployments using unreliable transport protocols vary widely in practice, ranging from being entirely unresponsive to responding by using strategies, including:


* feedback signaling to change encoder settings (as in [RFC5762]),

* fewer enhancement layers (as in [RFC6190]), and

* proprietary methods to detect QoE issues and turn off video to allow less bandwidth-intensive media, such as audio, to be delivered.
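
The second strategy above, sending fewer enhancement layers, can be sketched as a simple fit against an estimated path capacity. The function name and the layer bitrates below are hypothetical, and real layered codecs and their signaling are considerably more involved than this.

```python
from typing import List


def layers_to_send(layer_bitrates_bps: List[int], available_bps: int) -> int:
    """Return how many layers, base layer first, fit within the available rate.

    A result of 0 means that even the base layer does not fit, in which case a
    sender might fall back to audio only.
    """
    total = 0
    count = 0
    for bitrate in layer_bitrates_bps:
        if total + bitrate > available_bps:
            break
        total += bitrate
        count += 1
    return count


# Hypothetical base layer plus two enhancement layers.
layers = [500_000, 700_000, 1_500_000]
count = layers_to_send(layers, available_bps=1_300_000)
```

With the hypothetical values shown, the base layer and the first enhancement layer fit within the estimate, and the second enhancement layer is withheld.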

RTP relies on RTCP sender and receiver reports [RFC3550] as its own feedback mechanism and even includes circuit breakers for unicast RTP sessions [RFC8083] for situations when normal RTP congestion control has not been able to react sufficiently to RTP flows sending at rates that result in sustained packet loss.
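
The general shape of a congestion circuit breaker can be sketched as follows. This is a simplified illustration, not the procedure specified in [RFC8083]: it compares the reported sending rate with the widely used simplified TCP throughput approximation, s / (R * sqrt(2p/3)), and the excess factor and the number of consecutive reporting intervals are assumed parameters.

```python
import math


def tcp_friendly_rate_bps(packet_size_bytes: float, rtt_s: float, loss_fraction: float) -> float:
    """Simplified TCP throughput approximation, s / (R * sqrt(2p/3)), in bit/s."""
    if loss_fraction <= 0.0:
        return math.inf  # no loss reported, so no loss-based limit applies
    return 8.0 * packet_size_bytes / (rtt_s * math.sqrt(2.0 * loss_fraction / 3.0))


class CongestionCircuitBreaker:
    """Trips when the sender stays far above a TCP-friendly rate for too long."""

    def __init__(self, excess_factor: float = 10.0, trip_after_intervals: int = 3):
        self.excess_factor = excess_factor                # assumed threshold
        self.trip_after_intervals = trip_after_intervals  # assumed persistence
        self.consecutive = 0
        self.tripped = False

    def on_report(self, send_rate_bps: float, packet_size_bytes: float,
                  rtt_s: float, loss_fraction: float) -> bool:
        """Process one receiver report; return True once the breaker has tripped."""
        limit = self.excess_factor * tcp_friendly_rate_bps(
            packet_size_bytes, rtt_s, loss_fraction)
        if send_rate_bps > limit:
            self.consecutive += 1
        else:
            self.consecutive = 0  # one acceptable interval resets the count
        if self.consecutive >= self.trip_after_intervals:
            self.tripped = True   # the application should stop sending this flow
        return self.tripped
```

The breaker acts only on sustained excess; a single lossy reporting interval does not stop the flow.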


The notion of "circuit breakers" has also been applied to other UDP applications in [RFC8084], such as tunneling packets over UDP that are potentially not congestion controlled (for example, "encapsulating MPLS in UDP", as described in [RFC7510]). If streaming media segments are carried in tunnels encapsulated in UDP, these media streams may encounter "tripped circuit breakers", with resulting user-visible impacts.


6.3. QUIC and Changing Transport Protocol Behavior

The QUIC protocol, developed from a proprietary protocol into an IETF Standards Track protocol [RFC9000], behaves differently than the transport protocols characterized in Sections 6.1 and 6.2.


Although QUIC provides an alternative to the TCP and UDP transport protocols, QUIC is itself encapsulated in UDP. As noted in Section 7.1, the QUIC protocol encrypts almost all of its transport parameters and all of its payload, so any intermediaries that network operators may be using to troubleshoot HTTP streaming media performance issues, perform analytics, or even intercept exchanges in current applications will not work for QUIC-based applications unless the operators make changes to their networks. Section 7 describes the implications of media encryption in more detail.


While QUIC is designed as a general-purpose transport protocol and can carry different application-layer protocols, the current standardized mapping is for HTTP/3 [RFC9114], which describes how QUIC transport services are used for HTTP. The convention is for HTTP/3 to run over UDP port 443 [Port443], but this is not a strict requirement.

When HTTP/3 is encapsulated in QUIC, which is then encapsulated in UDP, streaming operators (and network operators) might see UDP traffic patterns that are similar to HTTP(S) over TCP. UDP traffic may be blocked on any port other than a few commonly used ones, such as UDP 53 for DNS. Even when UDP ports are not blocked and QUIC packets can flow, streaming operators (and network operators) may severely rate-limit this traffic because they do not expect to see legitimate high-bandwidth traffic, such as streaming media, over the UDP ports that HTTP/3 is using.

As noted in Section 5.5.2, because TCP provides a reliable, in-order delivery service for applications, any packet loss for a TCP connection causes head-of-line blocking so that no TCP segments arriving after a packet is lost will be delivered to the receiving application until retransmission of the lost packet has been received, allowing in-order delivery to the application to continue. As described in [RFC9000], QUIC connections can carry multiple streams, and when packet losses do occur, only the streams carried in the lost packet are delayed.
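
The difference can be illustrated with a small model of which packets a receiver can hand to the application before a retransmission arrives. The sketch assumes, for simplicity, that each packet carries data for exactly one stream; real QUIC packets may carry frames from several streams, and the stream labels are hypothetical.

```python
from typing import List, Set, Tuple

Packet = Tuple[int, str]  # (packet sequence number, stream label)


def deliverable_single_stream(packets: List[Packet], lost: Set[int]) -> List[int]:
    """TCP-like: one ordered byte stream, so nothing after the first gap is delivered."""
    delivered = []
    for seq, _stream in packets:
        if seq in lost:
            break
        delivered.append(seq)
    return delivered


def deliverable_per_stream(packets: List[Packet], lost: Set[int]) -> List[int]:
    """QUIC-like: ordering is per stream, so only streams with a gap are held back."""
    blocked: Set[str] = set()
    delivered = []
    for seq, stream in packets:
        if seq in lost:
            blocked.add(stream)
        elif stream not in blocked:
            delivered.append(seq)
    return delivered


packets = [(1, "video"), (2, "audio"), (3, "video"), (4, "audio"), (5, "video")]
lost = {3}
tcp_like = deliverable_single_stream(packets, lost)
quic_like = deliverable_per_stream(packets, lost)
```

In this example, the loss of packet 3 holds back everything that follows it on the single ordered stream, while in the per-stream model, the audio data in packet 4 is still delivered.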


A QUIC extension [RFC9221] adds the capability for "unreliable" delivery, similar to the service provided by UDP, but these datagrams are still subject to the QUIC connection's congestion controller, providing some transport-level congestion avoidance measures, which UDP does not.


As noted in Section 6.1, there is an increasing interest in congestion control algorithms that respond to delay measurements instead of responding to packet loss. These algorithms may deliver an improved user experience, but in some cases, they have not responded to sustained packet loss, which exhausts available buffers along the end-to-end path and may affect other users sharing that path. The QUIC protocol provides a set of congestion control hooks that can be used for algorithm agility, and [RFC9002] defines a basic congestion control algorithm that is roughly similar to TCP NewReno [RFC6582]. However, QUIC senders can and do unilaterally choose to use different algorithms, such as loss-based CUBIC [RFC8312], delay-based Copa or BBR, or even something completely different.
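
One way to picture this algorithm agility is as a small interface that the sender calls on acknowledgment and loss events, with the controller behind it being swappable. The sketch below is an assumption-laden illustration rather than the algorithm in [RFC9002]: the interface, the class names, the initial window, and the halving on loss are chosen to be roughly Reno-like, and recovery periods and pacing are omitted.

```python
from typing import Protocol


class CongestionController(Protocol):
    """Hypothetical hook interface; any object with these members can be plugged in."""

    cwnd: int

    def on_ack(self, acked_bytes: int) -> None: ...

    def on_loss(self) -> None: ...


class RenoLikeController:
    """Roughly Reno-like window growth and reduction (constants are assumptions)."""

    def __init__(self, max_datagram_size: int = 1200):
        self.mds = max_datagram_size
        self.cwnd = 10 * max_datagram_size  # assumed initial window
        self.ssthresh = float("inf")

    def on_ack(self, acked_bytes: int) -> None:
        if self.cwnd < self.ssthresh:
            self.cwnd += acked_bytes  # slow start: grow by the bytes acknowledged
        else:
            # Congestion avoidance: about one datagram per window's worth of ACKs.
            self.cwnd += self.mds * acked_bytes // self.cwnd

    def on_loss(self) -> None:
        self.cwnd = max(self.cwnd // 2, 2 * self.mds)  # halve, with a floor
        self.ssthresh = self.cwnd


class Sender:
    """The sender consults whichever controller it was constructed with."""

    def __init__(self, controller: CongestionController):
        self.controller = controller

    def can_send(self, bytes_in_flight: int, size: int) -> bool:
        return bytes_in_flight + size <= self.controller.cwnd
```

Because the sender depends only on the interface, replacing the controller changes the congestion response without changing the rest of the sender.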

The Internet community does have experience with deploying new congestion controllers without causing congestion collapse on the Internet. As noted in [RFC8312], both the CUBIC congestion controller and its predecessor BIC have significantly different behavior from Reno-style congestion controllers, such as TCP NewReno [RFC6582]; both were added to the Linux kernel to allow experimentation and analysis, both were then selected as the default TCP congestion controllers in Linux, and both were deployed globally.

The point made in Section 6.1 about TCP congestion controllers being implemented in operating system kernels does not apply in the same way to QUIC. Although QUIC can be implemented in operating system kernels, one of the design goals when this work was chartered was "QUIC is expected to support rapid, distributed development and testing of features"; to meet this expectation, many implementers have chosen to implement QUIC in user space, outside the operating system kernel, and even to distribute QUIC libraries with their own applications. It is worth noting that streaming operators using HTTP/3, carried over QUIC, can expect more frequent deployment of new congestion controller behavior than has been the case with HTTP/1 and HTTP/2, carried over TCP.


It is worth considering that if TCP-based HTTP traffic and UDP-based HTTP/3 traffic are allowed to enter operator networks on roughly equal terms, questions of fairness and contention will be heavily dependent on interactions between the congestion controllers in use for TCP-based HTTP traffic and UDP-based HTTP/3 traffic.


7. Streaming Encrypted Media

"Encrypted Media" has at least three meanings:


* Media encrypted at the application layer, typically using some sort of Digital Rights Management (DRM) system or other object encryption/security mechanism and typically remaining encrypted at rest when senders and receivers store it.

* Media encrypted by the sender at the transport layer and remaining encrypted until it reaches the ultimate media consumer (in this document, it is referred to as end-to-end media encryption).

* Media encrypted by the sender at the transport layer and remaining encrypted until it reaches some intermediary that is _not_ the ultimate media consumer but has credentials allowing decryption of the media content. This intermediary may examine and even transform the media content in some way, before forwarding re-encrypted media content (in this document, it is referred to as hop-by-hop media encryption).

This document focuses on media encrypted at the transport layer, whether encryption is performed hop by hop or end to end. Because media encrypted at the application layer will only be processed by application-level entities, this encryption does not have transport-layer implications. Of course, both hop-by-hop and end-to-end encrypted transport may carry media that is, in addition, encrypted at the application layer.


Each of these encryption strategies is intended to achieve a different goal. For instance, application-level encryption may be used for business purposes, such as avoiding piracy or enforcing geographic restrictions on playback, while transport-layer encryption may be used to prevent media stream manipulation or to protect manifests.


This document does not take a position on whether those goals are valid.


Both end-to-end and hop-by-hop media encryption have specific implications for streaming operators. These are described in Sections 7.2 and 7.3.


7.1. General Considerations for Streaming Media Encryption

The use of strong encryption provides confidentiality for encrypted streaming media, from the sender to either the ultimate media consumer or an intermediary that possesses credentials allowing decryption. This prevents deep packet inspection (DPI) by any on-path intermediary that does not possess such credentials. However, even encrypted content streams may be vulnerable to traffic analysis. An on-path observer that can identify that encrypted traffic contains a media stream could "fingerprint" this encrypted media stream and then compare it against "fingerprints" of known content. The protection provided by strong encryption can be further lessened if a streaming media operator is repeatedly encrypting the same content. "Identifying HTTPS-Protected Netflix Videos in Real-Time" [CODASPY17] is an example of what is possible when identifying HTTPS-protected videos over TCP transport, based either on the length of entire resources being transferred or on characteristic packet patterns at the beginning of a resource being transferred. If traffic analysis is successful at identifying encrypted content and associating it with specific users, this tells an on-path observer what resource is being streamed, and by whom, almost as certainly as examining decrypted traffic.

Because HTTPS has historically layered HTTP on top of TLS, which is in turn layered on top of TCP, intermediaries have historically had access to unencrypted TCP-level transport information, such as retransmissions, and some carriers exploited this information in attempts to improve transport-layer performance [RFC3135]. The most recent standardized version of HTTPS, HTTP/3 [RFC9114], uses the QUIC protocol [RFC9000] as its transport layer. QUIC relies on the TLS 1.3 initial handshake [RFC8446] only for key exchange [RFC9001] and encrypts almost all transport parameters itself, except for a few invariant header fields. In the QUIC short header, the only transport-level parameter that is sent "in the clear" is the Destination Connection ID [RFC8999], and even in the QUIC long header, the only transport-level parameters sent "in the clear" are the version, Destination Connection ID, and Source Connection ID. For these reasons, HTTP/3 is significantly more "opaque" than HTTPS with HTTP/1 or HTTP/2.
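
To make concrete how little an on-path observer can read, the following sketch extracts only the version-independent fields laid out in [RFC8999]. The function and class names are hypothetical. Because the short header does not encode the Destination Connection ID length, the sketch takes that length as an assumed parameter (8 bytes here), which a real observer would have to learn by other means.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VisibleFields:
    long_header: bool
    version: Optional[int]  # present only in long headers
    dcid: bytes
    scid: Optional[bytes]   # present only in long headers


def parse_invariant_header(packet: bytes, short_dcid_len: int = 8) -> VisibleFields:
    """Return the fields of a QUIC packet that are visible without any keys."""
    if not packet:
        raise ValueError("empty packet")
    if packet[0] & 0x80:  # the most significant bit set means long header
        if len(packet) < 6:
            raise ValueError("truncated long header")
        version = int.from_bytes(packet[1:5], "big")
        dcid_len = packet[5]
        dcid_end = 6 + dcid_len
        if len(packet) < dcid_end + 1:
            raise ValueError("truncated Destination Connection ID")
        dcid = packet[6:dcid_end]
        scid_len = packet[dcid_end]
        scid_end = dcid_end + 1 + scid_len
        if len(packet) < scid_end:
            raise ValueError("truncated Source Connection ID")
        scid = packet[dcid_end + 1:scid_end]
        return VisibleFields(True, version, dcid, scid)
    # Short header: the Destination Connection ID follows the first byte directly.
    if len(packet) < 1 + short_dcid_len:
        raise ValueError("truncated short header")
    return VisibleFields(False, None, packet[1:1 + short_dcid_len], None)
```

Everything after these fields is version specific and, in QUIC version 1, protected, so a parser like this one has nothing further to report.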

[RFC9312] discusses the manageability of the QUIC transport protocol that is used to encapsulate HTTP/3, focusing on the implications of QUIC's design and wire image on network operations involving QUIC traffic. It discusses what network operators can consider in some detail.


More broadly, "Considerations around Transport Header Confidentiality, Network Operations, and the Evolution of Internet Transport Protocols" [RFC9065] describes the impact of increased encryption of transport headers in general terms.


It is also worth noting that considerations for heavily encrypted transport protocols also come into play when streaming media is carried over IP-level VPNs and tunnels, with the additional consideration that an intermediary that does not possess credentials allowing decryption will not have visibility to the source and destination IP addresses of the packets being carried inside the tunnel.


7.2. Considerations for Hop-by-Hop Media Encryption

Hop-by-hop media encryption offers the benefits described in Section 7.1 between the streaming media operator and authorized intermediaries, among authorized intermediaries, and between authorized intermediaries and the ultimate media consumer; however, it does not provide these benefits end to end. The streaming media operator and ultimate media consumer must trust the authorized intermediaries, and if these intermediaries cannot be trusted, the benefits of encryption are lost.


Although the IETF has put considerable emphasis on end-to-end streaming media encryption, there are still important use cases that require the insertion of intermediaries.


There are a variety of ways to involve intermediaries, and some are much more intrusive than others.


From a streaming media operator's perspective, a number of considerations are in play. The first question is likely whether the streaming media operator intends that intermediaries are explicitly addressed from endpoints or whether the streaming media operator is willing to allow intermediaries to "intercept" streaming content transparently, with no awareness or permission from either endpoint.


If a streaming media operator does not actively work to avoid interception by on-path intermediaries, the effect will be indistinguishable from "impersonation attacks", and endpoints cannot be assured of any level of confidentiality and cannot trust that the content received came from the expected sender.


Assuming that a streaming media operator does intend to allow intermediaries to participate in content streaming and does intend to provide some level of privacy for endpoints, there are a number of possible tools, either already available or still being specified. These include the following:


Server and Network Assisted DASH [MPEG-DASH-SAND]: This specification introduces explicit messaging between DASH clients and DASH-aware network elements or among various DASH-aware network elements for the purpose of improving the efficiency of streaming sessions by providing information about real-time operational characteristics of networks, servers, proxies, caches, CDNs, as well as a DASH client's performance and status.

"Double Encryption Procedures for the Secure Real-Time Transport Protocol (SRTP)" [RFC8723]: This specification provides a cryptographic transform for the SRTP that provides both hop-by-hop and end-to-end security guarantees.


Secure Frames [SFRAME]: [RFC8723] is closely tied to SRTP, and this close association impeded widespread deployment, because it could not be used for the most common media content delivery mechanisms. A more recent proposal, Secure Frames [SFRAME], also provides both hop-by-hop and end-to-end security guarantees but can be used with other media transport protocols beyond SRTP.

A streaming media operator's choice of whether to involve intermediaries requires careful consideration. As an example, when ABR manifests were commonly sent unencrypted, some access network operators would modify manifests during peak hours by removing high-bitrate renditions to prevent players from choosing those renditions, thus reducing the overall bandwidth consumed for delivering these media streams and thereby reducing the network load and improving the average user experience for their customers. Now that ubiquitous encryption typically prevents this kind of modification, a streaming media operator who used intermediaries in the past, and who now wishes to maintain the same level of network health and user experience, must choose between adding intermediaries who are authorized to change the manifests or adding some other form of complexity to their service.
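
The manifest modification described above can be sketched for an HLS multivariant playlist [RFC8216], where each rendition is announced by an EXT-X-STREAM-INF tag followed by its URI line. The function name, the bandwidth cap, and the playlist contents are hypothetical, and the sketch applies only where the manifest is readable, that is, where it is unencrypted or the intermediary is authorized to decrypt it.

```python
import re

# Match the BANDWIDTH attribute but not AVERAGE-BANDWIDTH.
_BANDWIDTH = re.compile(r"[:,]BANDWIDTH=(\d+)")


def cap_renditions(playlist: str, max_bandwidth_bps: int) -> str:
    """Drop variant streams whose declared BANDWIDTH exceeds the cap."""
    lines = playlist.splitlines()
    kept = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith("#EXT-X-STREAM-INF"):
            match = _BANDWIDTH.search(line)
            if match and int(match.group(1)) > max_bandwidth_bps:
                i += 2  # skip the tag and the URI line that follows it
                continue
        kept.append(line)
        i += 1
    return "\n".join(kept) + "\n"


example = (
    "#EXTM3U\n"
    "#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360\n"
    "low.m3u8\n"
    "#EXT-X-STREAM-INF:AVERAGE-BANDWIDTH=4500000,BANDWIDTH=5000000,RESOLUTION=1920x1080\n"
    "high.m3u8\n"
)
capped = cap_renditions(example, max_bandwidth_bps=2_000_000)
```

A player that receives the capped playlist never learns that the higher-bitrate rendition exists, which is how this kind of modification reduced peak-hour load.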


Some resources that might inform other similar considerations are further discussed in [RFC8824] (for WebRTC) and [RFC9312] (for HTTP/3 and QUIC).


7.3. Considerations for End-to-End Media Encryption

End-to-end media encryption offers the benefits described in Section 7.1 from the streaming media operator to the ultimate media consumer.


End-to-end media encryption has become much more widespread in the years since the IETF issued "Pervasive Monitoring Is an Attack" [RFC7258] as a Best Current Practice, describing pervasive monitoring as a much greater threat than previously appreciated. After the Snowden disclosures, many content providers made the decision to use HTTPS protection -- HTTP over TLS -- for most or all content being delivered as a routine practice, rather than in exceptional cases for content that was considered sensitive.


However, as noted in [RFC7258], there is no way to prevent pervasive monitoring by an attacker while allowing monitoring by a more benign entity who only wants to use DPI to examine HTTP requests and responses to provide a better user experience. If a modern encrypted transport protocol is used for end-to-end media encryption, unauthorized on-path intermediaries are unable to examine transport and application protocol behavior. As described in Section 7.2, only an intermediary that the streaming media operator has explicitly authorized to examine packet payloads can continue these practices; an intermediary that intercepts packets and examines them without authorization cannot.


[RFC7258] states that "[t]he IETF will strive to produce specifications that mitigate pervasive monitoring attacks", so streaming operators should expect the IETF's direction toward preventing unauthorized monitoring of IETF protocols to continue for the foreseeable future.

8. Additional Resources for Streaming Media

The Media Operations (MOPS) community maintains a list of references and resources; for further reading, see [MOPS-RESOURCES].


9. IANA Considerations

This document has no IANA actions.


10. Security Considerations

Security is an important matter for streaming media applications, and the topic of media encryption is explained in Section 7. This document itself introduces no new security issues.


11. Informative References

