Network Working Group                                         M. Handley
Request for Comments: 2887                                      S. Floyd
Category: Informational                                            ACIRI
                                                              B. Whetten
                                                              R. Kermode
                                                             L. Vicisano
                                                                 M. Luby
                                                  Digital Fountain, Inc.
                                                             August 2000
       The Reliable Multicast Design Space for Bulk Data Transfer

Status of this Memo


This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.


Copyright Notice


Copyright (C) The Internet Society (2000). All Rights Reserved.




The design space for reliable multicast is rich, with many possible solutions having been devised. However, application requirements serve to constrain this design space to a relatively small solution space. This document provides an overview of the design space and the ways in which application constraints affect possible solutions.


1. Introduction
1. はじめに

The term "general purpose reliable multicast protocol" is something of an oxymoron. Different applications have different requirements of a reliable multicast protocol, and these requirements constrain the design space in ways that two applications with differing requirements often cannot share a single solution. There are however many successful reliable multicast protocol designs that serve more special purpose requirements well.


In this document we attempt to review the design space for reliable multicast protocols intended for bulk data transfer. The term bulk data transfer should be taken as having broad meaning - the main limitations are that the data stream is continuous and long lived - constraints necessary for the forms of congestion control we currently understand. The purpose of this review is to gather together an overview of the field and to make explicit the constraints imposed by particular mechanisms. The aim is to provide guidance to the standardization process for protocols and protocol building blocks. In doing this, we cluster potential solutions into a number of loose categories - real protocols may be composed of mechanisms from more than one of these clusters.

この文書では、バルクデータ転送のために意図され、信頼性マルチキャストプロトコルの設計空間を確認しようとします。用語バルクデータ転送は、広い意味を持つものとして取られるべきである - 私たちは現在、理解輻輳制御のフォームに必要な制約 - 主な制限は、データストリームが連続しており、長く住んでいたということです。このレビューの目的は、一緒にフィールドの概要を収集し、特定のメカニズムによって課される明示的な制約を作ることです。目的は、プロトコルとプロトコル・ビルディング・ブロックのための標準化プロセスへのガイダンスを提供することです。これを行うには、我々は緩いカテゴリの数に潜在的な解決策をクラスタ化 - 実際のプロトコルは、これらのクラスタの複数のメカニズムで構成されてもよいです。

The main constraint on solutions is imposed by the need to scale to large receiver sets. For small receiver sets the design space is much less restricted.


2. Application Constraints

Application requirements for reliable multicast (RM) are as broad and varied as the applications themselves. However, there are a set of requirements that significantly affect the design of an RM protocol. A brief list includes:


o Does the application need to know that everyone received the data?


o Does the application need to constrain differences between receivers?


o Does the application need to scale to large numbers of receivers?


o Does the application need to be totally reliable?


o Does the application need ordered data?


o Does the application need to provide low-delay delivery?


o Does the application need to provide time-bounded delivery?


o Does the application need many interacting senders?


o Is the application data flow intermittent?


o Does the application need to work in the public Internet?


o Does the application need to work without a return path (e.g. satellite)?


o Does the application need to provide secure delivery?


In the context of standardizing bulk data transfer protocols, we can rule out applications with multiple interacting senders and intermittent data flows. It is not that these applications are unimportant, but that we do not yet have effective congestion control for such applications.


2.1. Did everyone receive the data?
2.1. 誰もがデータを受信しましたか?

In many applications a logically defined unit or units of data is to be delivered to multiple clients, e.g., a file or a set of files, a software package, a stock quote or package of stock quotes, an event notification, a set of slides, a frame or block from a video. An application data unit (ADU) is defined to be a logically separable unit of data that is useful to the application. In some cases, an application data unit may be short enough to fit into a single packet (e.g., an event notification or a stock quote), whereas in other cases an application data unit may be much longer than a packet (e.g., a software package).


A protocol may optionally provide delivery confirmation to ensure reliable delivery, i.e., a mechanism for receivers to inform the sender when data has been delivered. There are two types of confirmation, at the application data unit level and at the packet level. Application data unit confirmation is useful at the application level, e.g., to inform the application about receiver progress and to decide when to stop sending packets about a particular application data unit. Packet confirmation is useful at the transport level, e.g., to inform the transport level when it can release buffer space being used for storing packets for which delivery has been confirmed.


Some applications have a strong requirement for confirmation that all the receivers got an ADU, or if not, to be informed of which specific receivers failed to receive the entire ADU. Examples include applications where receivers pay for data, and reliable file-system replication. Other applications do not have such a requirement. An example is the distribution of free software.


If the application does need to know that every receiver got the ADU, then a positive acknowledgment must be received from every receiver, although it may be possible to aggregate these acknowledgments. If the application needs to know precisely which receivers failed to get the ADU, additional constraints are placed on acknowledgment aggregation.


It should be noted that different mechanisms can be used for ADU-level confirmation and packet-level confirmation in the same application. For example, an ADU-level confirmation mechanism using positive acknowledgments may sit on top of a packet-level NACK or FEC-based transport. Typically this only makes sense when ADUs are significantly larger than a single packet.

異なるメカニズムが同じアプリケーション内ADUレベルの確認とパケットレベルの確認のために使用することができることに留意すべきです。例えば、肯定応答を使用してADUレベル確認機構は、パケットレベルのNACK又はFECベースのトランスポートの上に座ることができます。 ADUが単一のパケットよりも有意に大きい場合、通常、これはのみ意味があります。

2.2. Constraining differences
2.2. 違いを拘束

Some applications need to constrain differences between receivers so that the data reception characteristics for all receivers falls within some range. An example is a stock price feed, where it is unacceptable for a receiver to suffer delivery that is delayed significantly more than any other receiver.


This requirement is difficult to satisfy without harming performance. Typically solutions involve not sending more than a limited amount of new data until positive acknowledgments have been received from all the receivers. Such a solution does not cope with network and end-system failures well.


2.3. Receiver Set Scaling
2.3. 受信機セットスケーリング

There are many applications for RM that do not need to scale to large numbers of receivers. For such applications, a range of solutions may be available that are not available for applications where scaling to large receiver sets is a requirement.


A protocol must achieve good throughput of application data units to receivers. This means that most data that is delivered to receivers is useful in recovering the application data unit that they are trying to receive. A protocol must also provide good congestion control to fairly share the available network resources between all applications. Receiver set scaling is one of the most important constraints in meeting these requirements, because it strictly limits the mechanisms that can be used to achieve these requirements to those that will efficiently scale to a large receiver population. Acknowledgement packets have been employed by many systems to achieve these goals, but it is important to understand the strength and limitations of different ways of using such packets.


In a very small system, it may be acceptable to have the receivers acknowledge every packet. This approach provides the sender with the maximum amount of information about reception conditions at all the receivers, information that can be used both to achieve good throughput and to achieve congestion control.


For larger systems, such "flat ACK" schemes cause acknowledge implosions at the sender. Attempts have been made to reduce this problem by sending aggregate ACKs infrequently [RMWT98, BC94], but it is very difficult to incorporate effective congestion control into such protocols because of the spareceness of feedback.


Using negative acknowledgments (NACKs) instead of ACKs reduces this problem to one of NACK implosion (only from the receivers missing the packets), and because the sender really only needs to know that at least one receiver is missing data in order to achieve good throughput, various NACK suppression mechanisms can be applied.


An alternative to NACKs is ACK aggregation, which can be done by arranging the receivers into a logical tree, so that each leaf sends ACKs to its parent which aggregates them, and passes them on up the tree. Tree-based protocols scale well, but tree formation can be problematic.


Other ACK topologies such as rings are also possible, but are often more difficult to form and maintain than trees are. An alternative strategy is to add mechanisms to routers so that they can help out in achieving good throughput or in reducing the cost of achieving good throughput.


All these solutions improve receiver set scaling, but they all have limits of one form or another. One class of solutions scales to an infinite number of receivers by having no feedback channel whatsoever in order to achieve good throughput. These open-loop solutions take the initial data and encode it using an FEC-style mechanism. This encoded data is transmitted in a continuous stream. Receivers then join the session and receive packets until they have sufficient packets to decode the original data, at which point they leave the session.


Thus, it is clear that the intended scale of the session constrains the possible solutions. All solutions will work for very small sessions, but as the intended receive set increases, the range of possible solutions that can be deployed safely decreases.


It should also be noted that hybrids of these mechanisms are possible, and that using one mechanism at the packet-level and a different (typically higher overhead) solution at the ADU level may also scale reasonably if the ADUs are large compared to packets.


2.4. Total vs Semi-reliable
2.4. セミ信頼性の高い対合計

Many applications require delivery of application data units to be totally reliable; if any of the application data unit is missing, none of the received portion of the application data unit is useful. File transfer applications are a good example of applications requiring total reliability.


However, some applications do not need total reliability. An example is audio broadcasting, where missing packets reduce the quality of the received audio but do not render it unusable. Such applications can sometimes get by without any additional reliability over native IP reliability, but often having a semi-reliable multicast protocol is desirable.


2.5. Time-bounded Delivery
2.5. 時間制限配達

Many applications just require data to be delivered to the receivers as fast as possible. They have no absolute deadline for delivery.


However, some applications have hard delivery constraints - if the data does not arrive at the receiver by a certain time, there is no point in delivering it at all. Such time-boundedness may be as a result of real-time constraints such as with audio or video streaming, or as the result of new data superseding old data. In both cases, the requirement is for the application to have a greater degree of control over precisely what the application sends at which time than might be required with applications such as file transfer.

ただし、一部のアプリケーションは、ハード配信制約を持っている - データが一定の時間で受信機に到着しない場合は、すべてでそれを提供しても意味がありません。このような時間有界は、オーディオやビデオストリーミングのように、または古いデータに取って代わる、新たなデータの結果として、リアルタイム制約の結果であってもよいです。アプリケーションは、アプリケーションがファイル転送などのアプリケーションで必要となる場合がありますよりも、その時点で送信し、正確に何を制御の大きな度合いを持っているため、両方のケースでは、要件があります。

Time-bounded delivery usually also implies a semi-reliable protocol, but the converse does not necessarily hold.


3. Network Constraints

The properties of the network in which the application is being deployed may themselves constrain the reliable multicast design space.


3.1. Internet vs Intranet
3.1. イントラネット対インターネット

In principle the Internet and intranets are the same. In practice however, the fact that an intranet is under one administration might allow for solutions to be configured that can not easily be done in the public Internet. Thus, if the data is of very high value, it might be appropriate to enhance the routers to provide assistance to a reliable multicast transport protocol. In the public Internet, it is less likely that the additional expense required to support this state in the routers would be acceptable.


3.2. Return Path
3.2. 復路

In principle, when feedback is required from receivers, this feedback can be multicast or unicast. Multicast feedback has advantages, especially in NACK-based protocols where it is valuable for NACK suppression. However, it is not clear at this time whether all ISPs will allow all members of a session to send to that session. If multicast feedback is not allowed, then unicast feedback can almost always be substituted, although often at the expense of additional messages and mechanisms.


Some networks may not allow any form of feedback however. The primary example of this occurs with satellite broadcasts where the back channel may be very narrow or even non-existent. For such networks the solution space is very constrained - only FEC-based encodings have any real chance of working. If the receivers are direct satellite receivers, then no congestion control is needed, but it is dangerous to make such assumptions because it is possible for a satellite hop to feed downstream networks. Thus, congestion control still needs to be considered with solutions that do not have a return path.

ネットワークによっては、しかし、フィードバックの任意のフォームを許可しない場合があります。これの主な例は、バックチャネルが非常に狭く、あるいは存在しないかもしれ衛星放送で発生します。解空間が非常に制約されているようなネットワークのために - 唯一のFECベースのエンコーディングは、作業の任意の本当のチャンスを持っています。受信機は、直接衛星受信機である場合、輻輳制御は必要ありませんが、衛星ホップが下流のネットワークを養うすることが可能であるため、そのような仮定をするのは危険です。このように、輻輳制御は、まだリターンパスを持っていないソリューションを検討する必要があります。

3.3. Network Assistance
3.3. ネットワーク支援

A reliable multicast protocol must involve mechanisms running in end hosts, and must involve routers forwarding multicast packets. However under some circumstances, it is possible to rely on some additional degree of assistance from network elements. Broadly speaking we can cluster RM protocols into four classes depending on the degree of support received from other network elements.


No Additional Support The routers merely forward packets, and only the sender and receivers have any reliable multicast protocol state.


Layered Approaches Data is split across multiple multicast groups. Receivers join appropriate groups to receive only the traffic they require. This may in some cases require fast join or leave functionality from the routers, and may require more forwarding state in the routers.


Server-based Approaches Additional nodes are used to assist with data delivery or feedback aggregation. These additional nodes might not be normal senders or receivers, and may be present on the distribution or feedback tree only to provide assistance to the reliable multicast protocol. They would not otherwise receive the multicast traffic.


Router-based Approaches With router-based approaches, routers on the normal data distribution tree from the sender to the receivers assist in the delivery of data or feedback aggregation or suppression. As routers can directly influence multicast routing, they have more control over which traffic goes to which group members than server-based approaches. However routers do not normally have a large amount of spare memory or processing power, which restricts how much functionality can be placed in the routers. In addition, router code is normally more difficult to upgrade than application code, so router-based approaches need to be very general as they are more difficult to deploy and to change.


4. Good Throughput Mechanisms

Two main concerns that a RM protocol must address are congestion control and good throughput. Packet loss plays a major role with respect to both concerns. The primary symptom of congestion in many networks is packet loss. The primary obstacle that must be overcome to achieve good throughput is packet loss. Thus, measuring and reacting to packet loss is crucial to address both concerns. RM solutions that address these concerns can be roughly categorized as using one or more of the following techniques:


o Data packet acknowledgment.


o Negative acknowledgment of missing data packets.


o Redundancy allowing not all packets to be received.


These techniques themselves can be usefully subdivided, so that we can examine the parts of the requirement space in which each mechanism can be deployed. In this section, we focus on using these mechanisms for achieving good throughput, and in the next section we focus on using these mechanisms for congestion control.


4.1. ACK-based Mechanisms
4.1. ACKベースのメカニズム

The simplest ACK-based mechanism involves every receiver sending an ACK packet for every data packet it receives and resending packets that are lost by any receiver. Such mechanisms are limited to very small receiver groups by the implosion of ACKs received at the sender, and for this reason they are impractical for most applications.


Putting multiple ACKs into a single data packet [RMWT98] reduces the implosion problem by a constant amount, allowing slightly larger receiver groups. However a limit is soon reached whereby feedback to the sender is too infrequent for sender-based congestion control mechanisms to work reliably.


Arranging the receivers into a ring [WKM94] whereby an "ACK-token" is passed around the ring prevents the implosion problem for data. However ring creation and maintenance may itself be problematic. Also if ring creation does not take into account network topology (something which is difficult to achieve in practice), then the number of ACK packets crossing the network backbone for each data packet sent may increase O(n) with the number of receivers.


4.1.1. Tree-based ACK Mechanisms
4.1.1. ツリーベースのACKメカニズム

Arranging the receivers into a tree [MWB+98, KCW98] whereby receivers generate ACKs to a parent node, which aggregates those ACKs to its parent in turn, is both more robust and more easily configured than a ring. The ACK-tree is typically only used for ACK-aggregation - data packets are multicast from the sender to the receivers as normal. Trees are easier to construct than rings because more local information can be used in their construction. Also they can be more fault tolerant than rings because node failures only affect a subset of receivers, each of which can easily and locally decide to by-pass its parent and report directly to the node one level higher in the tree. With good ACK-tree formation, tree-based ACK mechanisms have the potential to be one of the most scalable RM solutions.

ツリーに受信機を配置する[MWB + 98、KCW98]は受信機が順番に親にそれらのACKを集約親ノードにACKを生成することにより、両方のより堅牢でより容易に構成されたリングよりなります。 ACKツリーは、典型的にのみACK凝集のために使用される - データパケットは通常通り受信機に送信者からマルチキャストです。より多くの地元の情報がその構築に使用することができますので、木はリングよりも構築が容易です。ノードの障害だけ簡単に、ローカルツリーの上位ノード1のレベルに直接その親との報告を渡すことにより、-に決めることができ、それぞれが、受信機のサブセットに影響を与えるため、また、彼らはリングよりもフォールトトレラントすることができます。良いACK-木の形成と、ツリーベースのACKメカニズムは、最もスケーラブルなRMソリューションの1つになる可能性があります。

To be simple to deploy, tree-based protocols must be self-organizing - the receivers must form the tree themselves using local information in a scalable manner. Such mechanisms are possible, but are not trivial. The main scaling limitations of tree-based protocols therefore come from the tree formation and maintenance mechanisms rather than from the use of ACKs. Without such a scalable and automatic tree-formation mechanism, tree-based protocols must rely on manual configuration, which significantly limits their applicability (often to intranets) and (due to the complexity of configuration) their scalability.

導入が簡単であるために、ツリーベースのプロトコルは、自己組織化でなければならない - 受信機は、スケーラブルな方法で地域情報を使用して自分自身をツリーを形成しなければなりません。このようなメカニズムは可能ですが、些細ではありません。ツリーベースのプロトコルの主なスケーリングの制限は、したがって、ツリーの形成および維持機構からではなく、ACKの使用から来ます。そのようなスケーラブルな自動ツリー生成機構なしで、ツリーベースのプロトコルは、有意に(多くの場合、イントラネット)及びそれらのスケーラビリティ(による構成の複雑さのために)それらの適用を制限する手動の構成に依存しなければなりません。

Orthogonal to the issue of tree formation is the issue of subtree retransmission. With appropriate router mechanisms, or the use of multiple multicast groups, it is possible to allow the intermediate tree nodes to retransmit missing data to the nodes below them in the tree rather than relying on the original sender to retransmit the data. This relies on there being a good correlation at the point of the intermediate node between the ACK tree and the actual data tree, as well as there being a mechanism to constrain the retransmission to the subtree. A good automatic tree formation mechanism combined with the use of administrative scoped multicast groups might provide such a solution. Without such tree formation mechanisms, subtree retransmission is difficult to deploy in large groups in the public internet. This could also be solved by the use of transport-level router mechanisms to assist or perform retransmission, although existing router mechanisms [FLST98] support NACK-based rather than ACK-based protocols.


Another important issue is the nature of the aggregation performed at interior nodes on the ACK-tree. Such nodes could:


1. aggregate ACKs by sending a single ACK when all their children have ACKed,


2. aggregate ACKs by listing all the children that have ACKed,
3. send an aggregated ACK with a NACK-like exception list.
3. NACKのような例外リストに集約されたACKを送信します。

For data packets, 1. is clearly more scalable, and should be preferred. However if the sender needs to know exactly which receivers received the data, 2. and 3. provide this information. Fortunately, there is usually no need to do this on a per-packet basis, but rather on a per-ADU basis. Doing 1. on a per packet basis, and 3. on a per ADU basis is the most scalable solution for applications that need this information, and suffers virtually no disadvantage compared to the other solutions used on a per-packet basis.

データパケットの場合、1は明らかに、よりスケーラブルであり、好ましいする必要があります。送信者は受信機がデータを受信し、正確に知っている必要がある場合は、2と3は、この情報を提供します。幸いなことに、パケット単位ではなく、むしろあたり-ADUベースでこれを実行する必要は通常ありません。 ADUごとにパケットごとに1、および3を実行すると、この情報を必要とするアプリケーションのための最もスケーラブルなソリューションであり、パケット単位で使用される他のソリューションに比べて実質的には不利益を被ります。

4.2. NACK-based mechanisms
4.2. NACKベースのメカニズム

Instead of sending an ACK for every data packet received, receivers can send a negative acknowledgment (NACK) for every data packet they discover they did not receive. This has a number of advantages over ACK-based mechanisms:


o The sender no longer needs to know exactly how many receivers there are. This removes the topology-building phase needed for ring- or tree-style ACK-based algorithms.

O送信者は、もはや存在している正確にどのように多くの受信機を知っている必要はありません。これは、リング - またはツリースタイルのACKベースのアルゴリズムのために必要なトポロジー構築フェーズを削除します。

o Fault-tolerance is made somewhat simpler by making receivers responsible for reliability.


o Sender state can be significantly reduced because the sender does not need to keep track of the receivers state.


o Only a single NACK is needed from any receiver to indicate a packet that is missing by any number of receivers. Thus NACK suppression is possible.


The disadvantages are that it is more difficult for the sender to know that it can free transmission buffers, and that additional session level mechanisms are needed if the sender really needs to know if a particular receiver actually received all the data. However for many applications, neither of these is an issue.


4.2.1. NACK Suppression
4.2.1. NACK抑制

The key differences between NACK-based protocols is in how NACK-suppression is performed. The goal is for only one NACK to reach the sender (or a node that can resend the missing data) as soon as possible after the loss is first noticed, and for only one copy of the missing data to be received by those nodes needing retransmission.


Different mechanisms come close to satisfying these goals in different ways.


o SRM [FJM95] uses random timers weighted by the round trip time between the sender and each node missing the data. This is effective, but requires computing the RTT to each receiver before suppression works properly.

O SRM [FJM95]送信者とデータの欠落、各ノード間の往復時間で重み付けランダムタイマーを使用しています。これは有効であるが、抑制が正常に動作する前に、各受信機にRTTを計算する必要があります。

o NTE [HC97] uses a sender-triggered mechanism based on random keys and sliding masks. This does not require random timers, and works for very large sessions, but makes it difficult to provide the constant low-level stream of feedback needed to perform congestion control.

O NTE [HC97]はランダムキーと摺動マスクに基づいて送信者トリガ機構を使用します。これは、ランダムなタイマーを必要とし、非常に大きなセッションのために動作しますが、それが困難な輻輳制御を実行するために必要なフィードバックの一定の低レベルのストリームを提供することができるものではありません。

o AAP [Ha99] uses exponentially distributed random timers and is effective for large sessions without needing to compute the RTT to each receiver.

O AAPは[Ha99】指数分布ランダムタイマを使用し、各受信機にRTTを計算することなく、大きなセッションのために有効です。

o PGM [FLST98] and LMS [PPV98] use additional mechanisms in routers to suppress duplicate NACKs. In the case of PGM, router assistance suppliments SRM-stype random timers and localizes the suppression so that the whole group does not need suppressing.

O PGM [FLST98]とLMSは、[PPV98]重複NACKを抑制するためにルータに追加のメカニズムを使用します。 PGMの場合は、ルータの援助はSRM-STYPEランダムタイマーをsupplimentsし、グループ全体が抑制必要がないように抑制することが局在化します。

The most general of these mechanisms is probably exponentially weighted random timers. Although SRM style timers can reduce feedback delay, they are harder to use correctly in situations where all the RTTs are not known, or where the number of respondees is unknown. In contrast, exponentially weighted random timers work well across a large range of session sizes with good worst case delay characteristics.

これらのメカニズムの中で最も一般的には、おそらく指数関数的に重み付けランダムタイマーです。 SRMスタイルタイマーがフィードバック遅延を減らすことができますが、それらはすべてのRTTが知られていない状況で、正しく使用が困難であるか、あるいはrespondeesの数は不明です。これとは対照的に、指数関数的に重み付けランダムタイマーが良い最悪の場合の遅延特性とのセッションサイズの大きな範囲でうまく動作します。

Either form of random timer based mechanism can be supplemented by router-support where it is available. Sender triggered NACK mechanisms (e.g. [HC97]) are more difficult to integrate with router-based support mechanisms.


4.3. Replication
4.3. レプリケーション

Some RM protocols can be designed so as to not need explicit reliability mechanisms except in comparatively rare cases. An example is in a multicast game, where the position of a moving object is continuously multicast. This positional stream does not require additional reliability because a new position superseding the old one will be sent before any retransmission could take place. However, when the moving object interacts with other objects or stops moving, then an explicit reliability mechanism is required to reliably send the interaction information or last position.


It is not just games that can be built in this manner - the NTE shared text editor[HC97] uses just such a mechanism with changes to a line of text. For every change the whole line is sent, and so long as the user keeps typing no explicit reliability mechanism is needed. The major advantage of replication is that it is not susceptible to spatially uncorrelated packet loss. With a traditional ACK or NACK based protocol, the probability of any particular packet being received by all the receivers in a large group can be very low. This leads to high retransmission rates. In contrast, replicated streams do not suffer as the size of the receiver group increases - different receivers lose different packets, but this does not increase network traffic.

それは、このように構築することができるだけのゲームではありません - NTE共有テキストエディタ[HC97]は、テキストの行への変更とちょうどそのようなメカニズムを使用しています。すべてのために行全体が送信され、ユーザーが明示的な信頼性メカニズムが必要とされていない入力を続ける限り、される変更。複製の主な利点は、それが空間的に相関しないパケットロスの影響を受けないということです。従来のACK又はNACKベースのプロトコルを用いて、大規模なグループ内のすべての受信機によって受信される任意の特定のパケットの確率が非常に低くすることができます。これは、高再送率につながります。異なる受信機が異なるパケットを失うが、これは、ネットワークトラフィックを増加させない - これとは対照的に、複製されたストリームは、受信グループのサイズが大きくなると苦しむことはありません。

4.4. Packet-level Forward Error Correction
4.4. パケットレベル前方誤り訂正

Forward Error Correction (FEC) is a well known technique for protecting data against corruption. For reliable multicast it is most useful in the form of erasure codes.


The simplest form of packet-level FEC is to take a group of packets that is to be sent, and to XOR the packets together to form a newpacket which is also sent. If there were three original packets plus the XOR packet sent, then if a receiver is missing any one of the original data packets, but receives the XOR packet, then it can reproduce the missing original packet.

パケットレベルのFECの最も単純な形態は、送信されるパケットのグループを取るために、また、送信されるnewpacketを形成するために一緒にパケットをXORすることです。 3元のパケットプラスXORパケット送信さがあった場合は、受信機が、元のデータパケットのいずれかが欠けますが、XORパケットを受信した場合、その後、それが欠落している元のパケットを再生することができます。

More general erasure codes exist [BKKKLZ95], [Ri97], [LMSSS97] that allow the generation of n encoding packets from k original data packets. In such cases, so long as at least k of the n encoding packets are received, then the k original data packets can be reproduced.


To apply FEC the sender groups data packets into rounds, and encoding packets are produced based on all the data packets in a round. A round may consist of all data packets in an entire application data unit in some cases, whereas in other cases it may consist of a group of data packets that make up only a small portion of an application data unit.


Using erasure codes to repair packet loss is a significant improvement over simple retransmission because the dependency on which packets have been lost is removed. Thus, the amount of repair traffic required to repair spatially uncorrelated packet loss is considerably lessened.


We can divide packet-level FEC schemes into two categories: proactive FEC and reactive FEC. The difference between the two is that for proactive FEC the sender decides a priori how many encoding packets to send for each round of data packets, whereas for reactive FEC the sender initially transmits only the original data packets for each round. Then, the sender uses feedback from the receivers to compute how many packets were lost by the receiver that experienced the most loss in each round, and then only that number of additional encoding packets are sent for that round. These encoding packets will then also serve to repair loss at the other receivers that are missing fewer packets. The receivers report via ACKs or NACKs how many packets are missing from each round. With NACKs, only the receiver missing the most packets need send a NACK for this round, so this is used to weight the random timers in the NACK calculation.

積極的FECと反応FEC:我々は2つのカテゴリにパケットレベルのFECスキームを分割することができます。両者の違いは、プロアクティブFECのために、送信者は、反応FECのために、送信者は、最初に各ラウンドのためにのみ、元のデータ・パケットを送信するのに対し、多くの符号化パケットが、データパケットの各ラウンドのために送信する方法を先験的に決定することです。次に、送信者は各ラウンドで最も損失を経験した受信機によって失われた、その後、追加のエンコーディングパケットの唯一その数は、そのラウンドのために送信されますどのように多くのパケット計算するために受信機からのフィードバックを使用しています。これらの符号化パケットは、その後も、より少ないパケットが欠落している他の受信機での損失を修復するのに役立つであろう。受信機は、各ラウンドから欠落しているどのように多くのパケットのACKまたはNACKが経由して報告しています。 NACKでは、ほとんどのパケットは、このラウンドのためにNACKを送信する必要があるので、これはNACK計算におけるランダムタイマーを重み付けするために使用される唯一欠けている受信機。

Proactive and reactive FEC can be combined, e.g., a certain amount of proactive FEC can be sent for each round and if there are receivers that experience more loss than can be overcome by this for some rounds then they can request and receive additional encoding packets for these rounds.


FEC is very effective at reducing the repair traffic for packet loss. However, it requires that the data to be sent to be grouped into rounds, which can add to end-to-end latency. For bulk-data applications this is typically not a problem, but this may be an issue for interactive applications where replication may be a better solution.


4.5. Layered FEC
4.5. 階層化FEC

An alternative use of packet level FEC is possible when data is spread across several multicast groups [RVC98], [BLMR98]. In such cases, the original k data packets are used to generate n encoding packets, where n is much larger than k. The n encoded packets are then striped across multiple multicast groups. When a receiver wishes to receive the original data it joins one or more of the multicast groups, and receives the encoding packets. Once it has received k different encoding packets, the receiver can then leave all the multicast groups and reconstruct the original data.

データは、いくつかのマルチキャストグループに分散されたときに、パケットレベルのFECの別の使用は、[BLMR98]、[RVC98]が可能です。このような場合、元のk個のデータパケットは、nはkよりもはるかに大きい場合、n個の符号化パケットを生成するために使用されます。 N個の符号化されたパケットは、複数のマルチキャストグループにわたってストライピングされます。受信機は、元のデータを受信したい場合には、マルチキャストグループの一つ以上を結合し、符号化パケットを受信します。それはK個の異なる符号化パケットを受信した後、受信機は、すべてのマルチキャストグループを離れて元のデータを再構成することができます。

The primary importance of such a layering is that it allows different receivers to be able to receive the traffic at different rates according to the available capacity. Such schemes do not require any form of feedback from the receivers to the sender to ensure good throughput, and therefore the need for good throughput does not constrain the size of the receiver set. However, to perform adequate network congestion control using receiver joins and leaves in this manner may require coordination between members that are behind the same congested link from the sender. As described in the next section, [RVC98] suggests such a layered congestion control scheme.


5. Congestion Control Mechanisms

The basic delivery model of the Internet is best-effort service. No guarantees are given as to throughput, delay or packet loss. End-systems are expected to be adaptive, and to reduce their transmission rate to a level appropriate for the congestion state of the network. Although increasingly the Internet will start to support reserved bandwidth and differentiated service classes for specialist applications, unless an end-system knows explicitly that it has reserved bandwidth, it must still perform congestion control.


Broadly speaking, there are five classes of single-sender multicast congestion control solution:


o Sender-controlled, one group.


A single multicast group is used for data distribution. Feedback from the group members is used to control the rate of this group. The goal is to transmit at a rate dictated by the slowest receiver.


o Sender-controlled, multiple groups.


One initial multicast group is adaptively subdivided into multiple subgroups with subdivisions centered on congestion points in the network. Application-level relays buffer data from a group nearer the original sender, and retransmit it at a slower rate into a group further from the original sender. In this way, different receivers can receiver the data at different rates. Sender-based congestion control takes place between the members of a subgroup and their relay.


o Receiver-controlled, one group.


A single multicast group is used for data distribution. The receivers determine if the sender is transmitting too rapidly for the current congestion state of the network, and they leave the group if this is the case.


o Receiver-controlled, layered organization.


A layered approach for how to combine this scheme with a congestion control protocol that requires no receiver feedback is described in [RVC98]. The sender stripes data across multiple multicast groups simultaneously. Receivers join and leave these layered groups depending on their measurements of the congestion state of the network, so that the amount of data being received is always appropriate. However, this scheme relies on receivers to join and leave the different multicast groups in a coordinated fashion behind a bottleneck link, and it has not yet been completely confirmed that this approach will scale in practice to the Internet. As a result, more work on this congestion control mechanism would be beneficial.


o Router-based congestion control.


It is possible to add additional mechanisms to multicast routers to assist in multicast congestion control. Such mechanisms could include:


o Conditional joins (a multicast join that specifies a loss rate above which it is acceptable for the router to reject the join).


o Router filtering of traffic that exceeds a reasonable rate. This may include mechanisms for filtering traffic at different points in the network at different rates depending on local congestion conditions [LVS99].


o Fair queuing schemes combined with end-to-end adaptation.


Router-based schemes generally require more state in network routers than has traditionally been acceptable for backbone routers. Thus, in the near-term, such schemes are only likely to be applicable for intranet solutions.


For reliable multicast protocols, it is important to consider congestion control at the same time as reliability is being considered. The same mechanisms that are used to provide reliability will sometimes be used to provide congestion control.


In the case of receiver-based congestion control, open-loop delivery using FEC is the likely choice for achieving good throughput for bulk- data transfer. This is because open-loop delivery requires no feedback from receivers, and thus it is a perfect match with a receiver-based congestion-control mechanism that operates without feedback from receivers.


6. Security Considerations

Generally speaking, security considerations have relatively little effect on constraining the design space for reliable multicast protocols. The primary issues constraining the design space are all related to receiver-set scaling. For authentication of the source and of data integrity, receiver-set scaling is not a significant issue. However, for data encryption, key distribution and particularly re-keying may be significantly affected by receiver-set scaling. Tree and graph based re-keying solutions[WHA98,WGL97] would appear to be appropriate solutions to these problems. It is not clear however that such re-keying solutions need to directly affect the design of the data distribution part of a reliable multicast protocol.


The primary question to consider for the security of reliable multicast protocols is the role of third-parties. If nodes other than the original source of the data are allowed to send or resend data packets, then the security model for the protocol must take this into account. In particular, it must be clear whether such third parties are trusted or untrusted. A requirement for trusted third parties can make protocols difficult to deploy on the Internet.


Untrusted third parties (such as receivers that retransmit the data) may be used so long as the data authentication mechanisms take this into account. Typically this means that the original sender digitally signs and timestamps the data, and that the third parties resend this signed timestamped payload unmodified.


Unlike unicast protocols, denial-of-service attacks on multicast transport state are easy if the protocol design does not take such attacks into account. This is because any receiver can join the session, and can then produce feedback that influences the progress of a session involving many other receivers. Hence protection against denial-of-service attacks on reliable multicast protocols must be carefully considered. A receiver that requests retransmission of every packet, or that refuses to acknowledge packets in an ACK-based protocol can potentially bring a reliable multicast session to a standstill. Senders must have appropriate policy to deal with such conditions, and if necessary, evict the receiver from the group. A single receiver masquerading as a large number of receivers may still be an issue under such circumstances with protocols that support NACK-like functionality. Providing unique "keys" to each NACKer when they first NACK using a unicast response might potentially prevent such attacks.


Denial-of-service attacks caused by traffic flooding are however somewhat easier to protect against than with unicast. Unwanted senders can simply be pruned from the distribution tree using the mechanisms implemented in IGMP v3[CDT99].

トラフィックフラッディングによって引き起こされるDoS攻撃は、しかし、ユニキャストと比べに対する保護がやや簡単です。不要な送信者は、単に、IGMP V3 [CDT99]で実装メカニズムを使用して配信ツリーからプルーニングすることができます。

7. Conclusions

In this document we present an overview of the design space for reliable multicast within the context of one-to-many bulk-data transfer. Other flavors of multicast application are not considered in this document, and hence the overview given should not be considered inclusive of the design space for protocols that fall outside the context of one-to-many bulk-data transfer. During the course of this overview, we have reaffirmed the notion that the process of reliable multicast protocol design is affected by a number of factors that render the generation of a "one size fits all solution" moot. These factors are then described to show how an application's needs serve to constrain the set of available techniques that may be used to create a reliable multicast protocol. We examined a number of basic techniques and to show how well they can meet the needs of certain types of applications.


This document is intended to provide guidance to the IETF community regarding the standardization of reliable multicast protocols for bulk-data transfer. Given the degree to which application requirements constrain reliable multicast solutions, and the diverse set of applications that need to be supported, it should be clear that any standardization work should take great pains to be future-proof. This would seem to imply not standardizing complete reliable multicast transport protocols in one pass, but rather examining the degree to which such protocols are separable into functional building blocks, and standardizing these blocks separately to the maximum degree that makes sense. Such an approach allows for protocol evolution, and allows applications with new constraints to be supported with maximal reuse of existing and tested mechanisms.


8. Acknowledgments

This document represents an overview of the reliable multicast design space. The ideas presented are not those of the authors, but are collected from the varied presentations and discussions in the IRTF Reliable Multicast Research Group. Although they are too numerous to list here, we thank everyone who has participated in these discussions for their contributions.


9. Authors' Addresses

Mark Handley ATT Center for Internet Research at ICSI, International Computer Science Institute, 1947 Center Street, Suite 600, Berkeley, CA 94704, USA

ICSI、国際コンピュータサイエンス研究所、1947センターストリート、スイート600、バークレー、CA 94704、USAでのインターネットリサーチのためのマーク・ハンドリーATTセンター



Sally Floyd ATT Center for Internet Research at ICSI, International Computer Science Institute, 1947 Center Street, Suite 600, Berkeley, CA 94704, USA

ICSI、国際コンピュータサイエンス研究所、1947センターストリート、スイート600、バークレー、CA 94704、USAでのインターネットリサーチのためのサリーフロイドATTセンター



Brian Whetten Talarian Corporation, 333 Distel Circle, Los Altos, CA 94022, USA

ブライアンWhetten Talarian社、333 DISTELサークル、ロスアルトス、CA 94022、USA



Roger Kermode Motorola Australian Research Centre Level 3, 12 Lord St, Botany NSW 2019, Australia

ロジャーKermodeモトローラオーストラリアの研究センターレベル3、12主の聖、ボタニーNSW 2019、オーストラリア



Lorenzo Vicisano Cisco Systems, 170 West Tasman Dr. San Jose, CA 95134, USA

ロレンツォVicisanoシスコシステムズ、170西タスマン博士サンノゼ、CA 95134、USA



Michael Luby Digital Fountain, Inc. 600 Alabama Street San Francisco, CA 94110

マイケル・ルビーデジタル噴水、Inc.の600アラバマストリート、サンフランシスコ、CA 94110



10. References

[BC94] K. Birman, T. Clark. "Performance of the Isis Distributed Computing Toolkit." Technical Report TR-94-1432, Dept. of Computer Science, Cornell University.

[BC94] K.バーマン、T.クラーク。 「イシスのパフォーマンス・コンピューティング・ツールキットを配布しました。」テクニカルレポートTR-94から1432、コンピュータサイエンス学科、コーネル大学。

[BKKKLZ95] J. Bloemer, M. Kalfane, M. Karpinski, R. Karp, M. Luby, D. Zuckerman, "An XOR-based Erasure Resilient Coding Scheme", ICSI Technical Report No. TR-95-048, August 1995.

【BKKKLZ95] J. Bloemer、M. Kalfane、M. Karpinski、R.カープ、M.ルビー、D.ズッカーマン、 "符号化方式弾性XORベースの消去"、ICSIテクニカルレポートNo. TR-95から048年8月1995。

[BLMR98] J. Byers, M. Luby, M. Mitzenmacher, A. Rege, "A Digital Fountain Approach to Reliable Distribution of Bulk Data", Proc ACM SIGCOMM 98.

【BLMR98] J.バイヤーズ、M.ルビー、M. Mitzenmacher、A.レゲ、 "バルクデータの信頼性の分布にデジタルファウンテンアプローチ"、PROC ACM SIGCOMM 98。

[CDT99] Cain, B., Deering, S., and A. Thyagarajan, "Internet Group Management Protocol, Version 3", Work in Progress.

[CDT99]カイン、B.、デアリング、S.、およびA. Thyagarajan、 "インターネットグループ管理プロトコル、バージョン3" が進行中で働いています。

[FLST98] Farinacci, D., Lin, S., Speakman, T. and A. Tweedly, "PGM reliable transport protocol specification", Work in Progress.

【FLST98]ファリナッチ、D.、リン、S.、スピークマン、T.及びA. Tweedly、 "PGM信頼できるトランスポートプロトコル仕様"、ProgressのWork。

[FJM95] S. Floyd, V. Jacobson, S. McCanne, "A Reliable Multicast Framework for Light-weight Sessions and Application Level Framing", Proc ACM SIGCOMM 95, Aug 1995 pp. 342-356.

【FJM95] S.フロイド、V. Jacobsonの、S. McCanne、 "軽量セッションの高信頼マルチキャストフレームワークとアプリケーションレベルフレーミング"、PROC ACM SIGCOMM 95、1995年8月頁342から356。

[Ha99] Handley, M., "Multicast address allocation protocol (AAP)", Work in Progress.

【Ha99]ハンドレー、M.、 "マルチキャストアドレス割り当てプロトコル(AAP)"、ProgressのWork。

[HC97] M. Handley and J. Crowcroft, "Network text editor (NTE) a scalable shared text editor for MBone," ACM Computer Communication Review, vol. 27, pp. 197-208, Oct. 1997. ACM SIGCOMM'97, Sept. 1997.

[HC97] M. HandleyのとJ.クロウクロフト、「ネットワークテキストエディタ(NTE)MBONEのスケーラブルな共有テキストエディタ、」ACMコンピュータコミュニケーションレビュー、巻。 27頁197から208まで、10月1997年ACM SIGCOMM'97、1997年9月。

[KCW98] Kadansky, M., Chiu, D. and J. Wesley, "Tree-based reliable multicast (TRAM)", Work in Progress.

【KCW98] Kadansky、M.、チウ、D.およびJ.ウェスリー、 "ツリーベースの信頼性の高いマルチキャスト(TRAM)"、進行中の作業。

[LMSSS97] M. Luby, M. Mitzenmacher, A. Shokrollahi, D. Spielman, V. Stemann, "Practical Loss-Resilient Codes", Proc ACM Symposium on Theory of Computing, 1997.

【LMSSS97] M.ルビー、M. Mitzenmacher、A.ショクロラヒ、D. Spielman、V. Stemann、 "実用的なロス弾性コード"、コンピューティング、1997の理論PROC ACMシンポジウム。

[MWB+98] Montgomery, T., Whetten, B., Basavaiah, M., Paul, S., Rastogi, N., Conlan, J. and T. Yeh, "THE RMTP-II PROTOCOL", Work in Progress.

[MWB + 98]モンゴメリー、T.、Whetten、B.、Basavaiah、M.、ポール、S.、Rastogi、N.、コンラン、J.及びT.葉、 "RMTP-IIプロトコル"、ProgressのWork 。

[PPV98] C. Papadopoulos, G. Parulkar, and G. Varghese, "An error control scheme for large-scale multicast applications," in Proceedings of the Conference on Computer Communications (IEEE Infocom), (San Francisco, California), p. 1188, March/April 1998.

コンピュータ通信会議(IEEEインフォコム)、(サンフランシスコ、カリフォルニア州)の議事録における[PPV98] C.パパドプロス、G. Parulkar、及びG. Varghese、「大規模なマルチキャストアプリケーションのための誤り制御方式」、P 。 1188年3月/ 1998年4月。

[Ri97] L. Rizzo, "Effective erasure codes for reliable computer communication protocols," ACM Computer Communication Review, vol. 27, pp. 24-36, Apr. 1997.

[Ri97] L.リゾー、「信頼性の高いコンピュータ通信プロトコルのための効果的な消去符号、」ACMコンピュータコミュニケーションレビュー、巻。 27頁24-36、1997年4月。

[RV97] L. Rizzo, L. Vicisano, "A Reliable Multicast data Distribution Protocol based on software FEC techniques", Proc. of The Fourth IEEE Workshop on the Architecture and Implementation of High Performance Communication Systems (HPCS'97), Sani Beach, Chalkidiki, Greece June 23-25, 1997.

"ソフトウェアFEC技術に基づいた信頼性の高いマルチキャストデータ配布プロトコル" [RV97] L.リゾー、L. Vicisano、PROC。高性能通信システムのアーキテクチャと実装第4回IEEEワークショップ(HPCS'97)、サニビーチ、ハルキディキ、ギリシャ6月23-25、1997年。

[RVC98] L. Rizzo, L. Vicisano, J. Crowcroft, "The RLC multicast congestion control algorithm", submitted to IEEE Network - special issue multicast.

[RVC98] L.リゾー、L. Vicisano、J.クロウクロフト、IEEEネットワークに提出し、 "RLCマルチキャスト輻輳制御アルゴリズム"、 - 特別号マルチキャスト。

[RMWT98] Robertson, K., Miller, K., White, M. and A. Tweedly, "StarBurst multicast file transfer protocol (MFTP) specification", Work in Progress.

【RMWT98]ロバートソン、K.、ミラー、K.、白、M.及びA. Tweedly、 "スターバーストマルチキャストファイル転送プロトコル(MFTP)仕様"、ProgressのWork。

[WHA98] Wallner, D., Hardler, E. and R. Agee, "Key Management for Multicast: Issues and Architectures", RFC 2627, June 1999.

[WHA98]ウォルナー、D.、Hardler、E.およびR.アゲ、 "マルチキャストのためのキー管理:問題とアーキテクチャ"、RFC 2627、1999年6月。

[WKM94] Brian Whetten, Simon Kaplan, and Todd Montgomery, "A high performance totally ordered multicast protocol," research memorandum, Aug. 1994.


[WGL97] C.K. Wong, M. Gouda, S. Lam, "Secure Group Communications Using Key Graphs," Technical Report TR 97-23, Department of Computer Sciences, The University of Texas at Austin, July 1997.

【WGL97] C.K.ウォン、M.ゴーダ、S.ラムは、「キーグラフを使用してグループ通信をセキュア、」テクニカルレポートTR 97から23、コンピュータ科学科、オースチン、1997年7月のテキサス大学。

11. Full Copyright Statement

Copyright (C) The Internet Society (2000). All Rights Reserved.


This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.


The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.






Funding for the RFC Editor function is currently provided by the Internet Society.

RFC Editor機能のための基金は現在、インターネット協会によって提供されます。