Network Working Group                                           D. Thaler
Request for Comments: 2991                                      Microsoft
Category: Informational                                          C. Hopps
                                                     NextHop Technologies
                                                            November 2000
      Multipath Issues in Unicast and Multicast Next-Hop Selection

Status of this Memo


This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.


Copyright Notice


Copyright (C) The Internet Society (2000). All Rights Reserved.




Various routing protocols, including Open Shortest Path First (OSPF) and Intermediate System to Intermediate System (ISIS), explicitly allow "Equal-Cost Multipath" (ECMP) routing. Some router implementations also allow equal-cost multipath usage with RIP and other routing protocols. The effect of multipath routing on a forwarder is that the forwarder potentially has several next-hops for any given destination and must use some method to choose which next-hop should be used for a given data packet.


1. Introduction
1. はじめに

Various routing protocols, including OSPF and ISIS, explicitly allow "Equal-Cost Multipath" routing. Some router implementations also allow equal-cost multipath usage with RIP and other routing protocols. Using equal-cost multipath means that if multiple equal-cost routes to the same destination exist, they can be discovered and used to provide load balancing among redundant paths.


The effect of multipath routing on a forwarder is that the forwarder potentially has several next-hops for any given destination and must use some method to choose which next-hop should be used for a given data packet. This memo summarizes current practices, problems, and solutions.


2. Concerns

Several router implementations allow multipath forwarding. This is sometimes done naively via round-robin, where each packet matching a given destination route is forwarded using the subsequent next-hop, in a round-robin fashion. This does provide a form of load balancing, but there are several problems with approaches such as round-robin or random:


Variable Path MTU Since each of the redundant paths may have a different MTU, this means that the overall path MTU can change on a packet-by-packet basis, negating the usefulness of path MTU discovery.

変数Path MTU冗長パスの各々は、異なるMTUを有することができるので、これは全体的な経路MTUは、パスMTU発見の有用性を否定、パケットごとに変更することができることを意味します。

Variable Latencies Since each of the redundant paths may have a different latency involved, having packets take separate paths can cause packets to always arrive out of order, increasing delivery latency and buffering requirements.


         Packet reordering causes TCP to believe that loss has taken
         place when packets with higher sequence numbers arrive before
         an earlier one.  When three or more packets are received before
         a "late" packet, TCP enters a mode called "fast-retransmit" [6]
         which consumes extra bandwidth (which could potentially cause
         more loss, decreasing throughput) as it attempts to
         unnecessarily retransmit the delayed packet(s).  Hence,
         reordering can be detrimental to network performance.

Debugging Common debugging utilities such as ping and traceroute are much less reliable in the presence of multiple paths and may even present completely wrong results.


In multicast routing, the problem with multiple paths is that multicast routing protocols prevent loops and duplicates by constructing a single tree to all receivers of the same group address. Multicast routing protocols deployed today (DVMRP, PIM-DM, PIM-SM) [2] construct shortest-path trees rooted at either the source, or another router known as a Core or Rendezvous Point. Hence, the way they ensure that duplicates will not arise is that a given tree must use only a single next-hop towards the root of the tree.


3. Requirements

In the remainder of this document, we will use the term "flow" to represent the granularity at which the router keeps state (if at all) for classes of traffic. The exact definition of a flow may depend on the actual implementation. For example, a flow might be identified solely by destination address, or it might be identified by (source address, destination address, protocol id) triplet. Hence "flow" is not necessarily synonymous with the term "microflow" as used in RFC 2474 [7], which also includes port numbers. Indeed, including transport-layer information in the next-hop selection process can actually be problematic. For example, if packets are fragmented, the transport-layer information may not be available in every packet. Furthermore, having the choice of path depend on transport-layer fields may negate the benefit of caching information such as MTU for use in subsequent connections between the same endpoints.

この文書の残りの部分では、我々は(すべての場合)、ルータがトラフィックのクラスの状態を維持しているの粒度を表すために、用語「流れ」を使用します。流れの正確な定義は、実際の実装に依存してもよいです。例えば、フローは単に、宛先アドレスによって識別されるかもしれない、またはそれが(ソースアドレス、宛先アドレス、プロトコルID)トリプレットによって識別されるかもしれません。したがって、「フロー」は、ポート番号を含む[7]、必ずしもRFC 2474で使用される用語「マイクロ」と同義ではありません。実際、次ホップ選択処理におけるトランスポートレイヤ情報を含めて、実際に問題となり得ます。パケットが断片化されている場合、例えば、トランスポート層の情報は、すべてのパケットに利用可能ではないかもしれません。また、経路の選択を有する同一のエンドポイント間の後続の接続で使用するためにそのようなMTUなどの情報をキャッシュの利点を打ち消すことができるトランスポート層フィールドに依存します。

All of the problems outlined in the previous section arise when packets in the same unicast or multicast "flow" are split among multiple paths. The natural solution is therefore to ensure that packets for the same flow always use the same path.


Two additional features are desirable:


Minimal disruption When multipath is used, meaning that multiple routes contribute valid next-hops, the chances are higher of routes being added and deleted from consideration than when only the "best" route is used (in which case metric changes in alternate routes have no effect on traffic paths). Since a higher number of routes may actually be used for forwarding when multipath is in use, the potential for packet reordering and packet loss due to route flaps can be much greater than when not using multipath. Hence, it is desirable to minimize the number of active flows affected by the addition or deletion of another next-hop.


Fast implementation The amount of additional computation required to forward a packet should be small. For example, when doing round-robin, this computation might consist of incrementing (modulo the number of next-hops) a next-hop index.


4. Solutions

We now provide three possible methods for improving the performance of multipath and then discuss their applicability to unicast and multicast forwarding.


Modulo-N Hash To select a next-hop from the list of N next-hops, the router performs a modulo-N hash over the packet header fields that identify a flow. This has the advantage of being fast, at the expense of (N-1)/N of all flows changing paths whenever a next-hop is added or removed.


Hash-Threshold The router first selects a key by performing a hash over the packet header fields that identify the flow. The N next-hops have been assigned unique regions in the hash function's output space. By comparing the hash value against region boundaries the router can determine which region the hash value belongs to and thus which next-hop to use. This method has the advantage of only affecting flows near the region boundaries (or thresholds) when next-hops are added or removed. For ECMP hash-threshold's lookup can be done with a simple division (hash_value / fixed_region_size). When a next-hop is added or removed, between 1/4 and 1/2 of all flows change paths. An analysis of this method can be found in [3].

ハッシュしきい値ルータは最初のフローを識別パケットヘッダフィールド上ハッシュを行うことにより、キーを選択します。 Nネクストホップは、ハッシュ関数の出力空間でユニークな領域を割り当てられています。領域境界に対してハッシュ値を比較することにより、ルータは、ハッシュ値が属する領域を判定することができるので、次のホップが使用します。この方法は、次のホップが追加または削除された領域の境界(またはしきい値)付近の流れに影響を与えるという利点を有します。 ECMPハッシュ・スレッショルドの検索のために、単純な割り算(HASH_VALUE / fixed_region_size)で行うことができます。ネクストホップを追加または削除、すべてのフローの1/4と1/2の間にあるときにパスを変更。この方法の分析は、[3]に見ることができます。

Highest Random Weight (HRW) The router computes a key for EACH next-hop by performing a hash over the packet header fields that identify the flow, as well as over the address of the next-hop. The router then chooses the next-hop with the highest resulting key value [4]. This has the advantage of minimizing the number of flows affected by a next-hop addition or deletion (only 1/N of them), but is approximately N times as expensive as a modulo-N hash.

最高ランダム重量(HRW)ルータは、フローを識別パケットヘッダフィールド上、ならびに次のホップのアドレス上にハッシュを実行することによって、各次ホップのためのキーを計算します。ルータは次に、最高得られたキー値[4]を用いて次ホップを選択します。これは、ネクストホップ付加または欠失(そのうちのわずか1 / N)によって影響を受けるフローの数を最小限にするという利点を有するが、モジュロNのハッシュとして約N倍高価です。

The applicability of these three alternatives depends on (at least) two factors: whether the forwarder maintains per-flow state, and how precious CPU is to a multipath forwarder.


Some routers may maintain per-flow state for reasons other than for supporting multipath. For example, routers typically keep per-flow state for multicast flows so that they can maintain the list of interfaces to which packets in the flow should be copied.


If per-flow state is maintained in a multipath forwarder, then computation of the next-hop can be done by the router at state creation time. This entails no additional computations at packet forwarding time compared with normal forwarding to a single next-hop, since the next-hop is precomputed. In this case, any method can be used, including round-robin, random, modulo-N, hash-threshold or HRW. Hash functions such as modulo-N, hash-threshold and HRW are better if the forwarder state may be deleted for any reason during the lifetime of a flow since subsequent next-hop computations by the router will always select the same path. This also improves the usefulness of debugging utilities such as traceroute. Finally, to maximize the stability of paths (and hence the usefulness of traceroute, etc.), the use of HRW is recommended over the other methods mentioned herein.


If per-flow state is not maintained by the forwarder, then using multiple next-hops requires that the next-hop be calculated at packet arrival time. When CPU is more precious than stability of flow paths, hash-threshold is recommended over the other methods mentioned herein.

フロー毎の状態をフォワーダによって維持されていない場合は、複数の次のホップを使用して、次のホップがパケット到着時間で計算されることを要求します。 CPUは、流路の安定性よりも貴重である場合、ハッシュ閾値は、本明細書に記載の他の方法よりも推奨されます。

4.1. Unicast Forwarding
4.1. ユニキャスト転送

Depending on the implementation, unicast forwarding may or may not keep per-flow state. We recommend that where forwarder implementations keep flow state, routers should use HRW at state creation time (and next-hop deletion time) to select the next-hop, and that forwarders without per-flow state use hash-threshold.


4.2. Multicast Forwarding
4.2. マルチキャスト転送

Today's multicast forwarding engines use a cache of forwarding entries indexed by group (or group prefix) and source (or source prefix). This means that today's multicast forwarder's always keep per-flow state, although for some multicast routing protocols, the "flow" may be fairly coarse (e.g., traffic from all sources to the same destination). Since per-flow state is kept by the forwarder, it is recommended that the router always use HRW to select the next-hop.


Routers using explicit-joining protocols such as PIM-SM [5] should thus use the multipath information when determining to which neighbor a join message should be sent. For example, when multiple next-hops exist for a given Rendezvous Point (RP) toward which a (*,G) Join should be sent, it is recommended that HRW be used to select the next-hop to use for each group.


5. Applicability

The algorithms discussed above (except round-robin) all rely on some form of hash function. Equal flow distribution is achieved when the hash function is uniformly distributed. Since the commonly used hash functions only become uniformly distributed when the number of inputs is relatively large, these algorithms are more applicable to routers used to route many flows, than in, for example, a small business setting.


6. Redundant Parallel Links

A related problem occurs when multiple parallel links are used between the same pair of routers. A common solution is to bundle the two links together into a "super"-link when is then used for routing. For multicast forwarding, this results in the two links being reduced to a single next-hop (over the combined link) which can be used to prevent duplicates. When a unicast or multicast packet is queued to the combined link, some method, such as those discussed earlier, is still required to determine the physical link on which to transmit the packet. If the parallel links are identical, then most of the concerns discussed in this document are avoided with the combined link. The exception is packet reordering, which can still occur with round-robin, adversely affecting TCP.


7. Security Considerations

This document discusses issues with various methods of choosing a next-hop from among multiple valid next-hops. As such, it does not directly impact the security of the Internet infrastructure or its applications.


One issue that is worth mentioning, however, is that when next-hop selection is predictable, an attacker can synthesize traffic that will all hash the same, making it possible to launch a denial-of-service attack that overloads a particular path. Since a special case of this is when the same (single) next-hop is always selected, such an attack is easiest when multipath is not being used. Introducing multipath routing can make such an attack more difficult; the more unpredictable the hash is, the harder it becomes to conduct a denial-of-service attack against any single link.


8. References

[1] Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998.

[1]モイ、J.、 "OSPFバージョン2"、STD 54、RFC 2328、1998年4月。

[2] Maufer, T., "Deploying IP Multicast in the Enterprise", Prentice-Hall, 1998.

[2] Maufer、T.、 "Enterpriseの展開IPマルチキャスト"、プレンティス・ホール、1998。

[3] Hopps, C., "Analysis of an Equal-Cost Multi-Path Algorithm", RFC 2992, November 2000.

[3] Hoppsが、C.、 "等価コストマルチパスアルゴリズムの分析"、RFC 2992、2000年11月に。

[4] Thaler, D., and C.V. Ravishankar, "Using Name-Based Mappings to Increase Hit Rates", IEEE/ACM Transactions on Networking, February 1998.

[4]ターレル、D.、及びC.V. Ravishankar、ネットワーキング、1998年2月にIEEE / ACM取引、「ヒット率を高めるために名前ベースのマッピングを使用します」。

[5] Estrin, D., Farinacci, D., Helmy, A., Thaler, D., Deering, S., Handley, M., Jacobson, V., Liu, C., Sharma, P. and L. Wei, "Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol Specification", RFC 2362, June 1998.

[5] Estrin、D.、ファリナッチ、D.、Helmy、A.、ターラー、D.、デアリング、S.、ハンドレー、M.、ヤコブソン、V.、劉、C.、シャルマ、P.、およびL.魏、 "プロトコル独立マルチキャスト - スパースモード(PIM-SM):プロトコル仕様"、RFC 2362、1998年6月。

[6] Allman, M., Paxson, V. and W. Stevens, "TCP Congestion Control", RFC 2581, April 1999.

[6]オールマン、M.、パクソン、V.とW.スティーブンス、 "TCP輻輳制御"、RFC 2581、1999年4月。

[7] Nichols, K., Blake, S., Baker, F. and D. Black., "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers", RFC 2474, December 1998.

[7]ニコルズ、K.、ブレイク、S.、ベイカー、F.とD.黒。、 "IPv4とIPv6ヘッダーの差別化されたサービス分野(DSフィールド)の定義"、RFC 2474、1998年12月。

9. Authors' Addresses

Dave Thaler Microsoft One Microsoft Way Redmond, WA 98052


Phone: +1 425 703 8835 EMail:

電話:+1 425 703 8835 Eメール

Christian E. Hopps NextHop Technologies, Inc. 517 W. William Street Ann Arbor, MI 48103-4943 U.S.A

クリスチャン・E. HoppsがのNextHop Technologies社517 W.ウィリアム・ストリートアナーバー、ミシガン州48103から4943 U.S.A

Phone: +1 734 936 0291 EMail:

電話:+1 734 936 0291 Eメール

10. Full Copyright Statement

Copyright (C) The Internet Society (2000). All Rights Reserved.


This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.


The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.






Funding for the RFC Editor function is currently provided by the Internet Society.

RFC Editor機能のための基金は現在、インターネット協会によって提供されます。