Network Working Group                                            C. Hopps
Request for Comments: 2992                           NextHop Technologies
Category: Informational                                     November 2000
             Analysis of an Equal-Cost Multi-Path Algorithm

Status of this Memo


This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.


Copyright Notice


Copyright (C) The Internet Society (2000). All Rights Reserved.




Equal-cost multi-path (ECMP) is a routing technique for routing packets along multiple paths of equal cost. The forwarding engine identifies paths by next-hop. When forwarding a packet the router must decide which next-hop (path) to use. This document gives an analysis of one method for making that decision. The analysis includes the performance of the algorithm and the disruption caused by changes to the set of next-hops.


1. Hash-Threshold

One method for determining which next-hop to use when routing with ECMP can be called hash-threshold. The router first selects a key by performing a hash (e.g., CRC16) over the packet header fields that identify a flow. The N next-hops have been assigned unique regions in the key space. The router uses the key to determine which region and thus which next-hop to use.

ECMPでルーティングするときに使用する次のホップを決定するための一つの方法は、ハッシュ閾値と呼ぶことができます。ルータは最初のフローを識別パケットヘッダーフィールド上ハッシュ(例えば、CRC16)を行うことにより、キーを選択します。 Nネクストホップは、鍵空間でユニークな領域を割り当てられています。ルータが使用する領域、したがって次のホップを決定するためにキーを使用します。

As an example of hash-threshold, upon receiving a packet the router performs a CRC16 on the packet's header fields that define the flow (e.g., the source and destination fields of the packet), this is the key. Say for this destination there are 4 next-hops to choose from. Each next-hop is assigned a region in 16 bit space (the key space). For equal usage the router may have chosen to divide it up evenly so each region is 65536/4 or 16k large. The next-hop is chosen by determining which region contains the key (i.e., the CRC result).


2. Analysis

There are a few concerns when choosing an algorithm for deciding which next-hop to use. One is performance, the computational requirements to run the algorithm. Another is disruption (i.e., the changing of which path a flow uses). Balancing is a third concern; however, since the algorithm's balancing characteristics are directly related to the chosen hash function this analysis does not treat this concern in depth.


For this analysis we will assume regions of equal size. If the output of the hash function is uniformly distributed the distribution of flows amongst paths will also be uniform, and so the algorithm will properly implement ECMP. One can implement non-equal-cost multi-path routing by using regions of unequal size; however, non-equal-cost multi-path routing is outside the scope of this document.


2.1. Performance
2.1. 演奏

The performance of the hash-threshold algorithm can be broken down into three parts: selection of regions for the next-hops, obtaining the key and comparing the key to the regions to decide which next-hop to use.


The algorithm doesn't specify the hash function used to obtain the key. Its performance in this area will be exactly the performance of the hash function. It is presumed that if this calculation proves to be a concern it can be done in hardware parallel to other operations that need to complete before deciding which next-hop to use.


Since regions are restricted to be of equal size the calculation of region boundaries is trivial. Each boundary is exactly regionsize away from the previous boundary starting from 0 for the first region. As we will show, for equal sized regions, we don't need to store the boundary values.


To choose the next-hop we must determine which region contains the key. Because the regions are of equal size determining which region contains the key is a simple division operation.


                regionsize = keyspace.size / #{nexthops}
                region = key / regionsize;

Thus the time required to find the next-hop is dependent on the way the next-hops are organized in memory. The obvious use of an array indexed by region yields O(1).


2.2. Disruption
2.2. 破壊

Protocols such as TCP perform better if the path they flow along does not change while the stream is connected. Disruption is the measurement of how many flows have their paths changed due to some change in the router. We measure disruption as the fraction of total flows whose path changes in response to some change in the router. This can become important if one or more of the paths is flapping. For a description of disruption and how it affects protocols such as


TCP see [1].


Some algorithms such as round-robin (i.e., upon receiving a packet the least recently used next-hop is chosen) are disruptive regardless of any change in the router. Clearly this is not the case with hash-threshold. As long as the region boundaries remain unchanged the same next-hop will be chosen for a given flow.


Because we have required regions to be equal in size the only reason for a change in region boundaries is the addition or removal of a next-hop. In this case the regions must all grow or shrink to fill the key space. The analysis begins with some examples of this.


             |   1   |   2   |   3   |   4   |   5   |
             |    1    |    2    |    4    |    5    |

Figure 1. Before and after deletion of region 3


In figure 1. region 3 has been deleted. The remaining regions grow equally and shift to compensate. In this case 1/4 of region 2 is now in region 1, 1/2 (2/4) of region 3 is in region 2, 1/2 of region 3 is in region 4 and 1/4 of region 4 is in region 5. Since each of the original regions represent 1/5 of the flows, the total disruption is 1/5*(1/4 + 1/2 + 1/2 + 1/4) or 3/10.

図では1領域3が削除されました。残りの領域は、均等に成長し、補償するシフト。この場合、領域2の1/4領域1、領域3の1/2(2/4)は、領域3の1/2領域4にあり、領域4の1/4であり、領域2にある中であります元の領域の各々は、流れの1/5を表す領域5ので、総破壊は、1/5×(1/4 + 1/2 + 1/2 + 1/4)または3/10です。

Note that the disruption to flows when adding a region is equivalent to that of removing a region. That is, we are considering the fraction of total flows that changes regions when moving from N to N-1 regions, and that same fraction of flows will change when moving from N-1 to N regions.


             |   1   |   2   |   3   |   4   |   5   |
             |    1    |    2    |    3    |    5    |

Figure 2. Before and after deletion of region 4


In figure 2. region 4 has been deleted. Again the remaining regions grow equally and shift to compensate. 1/4 of region 2 is now in region 1, 1/2 of region 3 is in region 2, 3/4 of region 4 is in region 3 and 1/4 of region 4 is in region 5. Since each of the original regions represent 1/5 of the flows the, total disruption is 7/20.


To generalize, upon removing a region K the remaining N-1 regions grow to fill the 1/N space. This growth is evenly divided between the N-1 regions and so the change in size for each region is 1/N/(N-1) or 1/(N(N-1)). This change in size causes non-end regions to move. The first region grows and so the second region is shifted towards K by the change in size of the first region. 1/(N(N-1)) of the flows from region 2 are subsumed by the change in region 1's size. 2/(N(N-1)) of the flows in region 3 are subsumed by region 2. This is because region 2 has shifted by 1/(N(N-1)) and grown by 1/(N(N-1)). This continues from both ends until you reach the regions that bordered K. The calculation for the number of flows subsumed from the Kth region into the bordering regions accounts for the removal of the Kth region. Thus we have the following equation.

残りのN-1領域は、1 / Nの空間を埋めるために成長K領域を除去する際に、一般化します。この成長は均一にN-1領域の間に分割して、各地域の大きさの変化は、/ 1 / Nである(N-1)又は1 /(N(N-1))されています。サイズの変化は、移動する非端部領域が発生します。第1の領域は成長し、したがって第2の領域は第1の領域の大きさの変化によりKに向かってシフトされます。領域2からの流れの1 /(N(N-1))が領域1の大きさの変化により包含されます。領域3における流れの2 /(N(N-1))領域2が1 /(N(N-1))だけシフトし、1 /(N(N-によって成長したからである領域2によって包含されます1))。あなたが国境を接する地域にK番目の領域から包摂フローの数のための計算は、K番目領域の除去を占めKを境領域に達するまで、これは両端から続いています。したがって、私たちは次の式を持っています。

                           K-1              N
                           ---    i        ---  (i-K)
             disruption =  \     ---    +  \     ---
                           /   (N)(N-1)    /   (N)(N-1)
                           ---             ---
                           i=1            i=K+1

We can factor 1/((N)(N-1)) out as it is constant.

それが一定であるように我々は、1 /((N)(N-1))を考慮することができます。

                                /  K-1         N        \
                          1     |  ---        ---       |
                     =   ---    |  \    i  +  \   (i-K) |
                       (N)(N-1) |  /          /         |
                                \  ---        ---       /
                                     1        i=K+1

We now use the the concrete formulas for the sum of integers. The first summation is (K)(K-1)/2. For the second summation notice that we are summing the integers from 1 to N-K, thus it is (N-K)(N-K+1)/2.

私たちは今、整数の合計のための具体的な数式を使用しています。最初の合計は(K)(K-1)/ 2です。我々は1からN-Kの整数を加算された第2の加算通知のために、したがって、(N-K)(N-K + 1)/ 2です。

                             (K-1)(K) + (N-K)(N-K+1)
                           = -----------------------

Considering the summations, one can see that the least disruption is when K is as close to half way between 1 and N as possible. This can be proven by finding the minimum of the concrete formula for K holding N constant. First break apart the quantities and collect.


                            2K*K - 2K - 2NK + N*N + N
                          = -------------------------
                             K*K - K - NK      N + 1
                          = --------------  + -------
                               (N)(N-1)        2(N-1)

Since we are minimizing for K the right side (N+1)/2(N-1) is constant as is the denominator (N)(N-1) so we can drop them. To minimize we take the derivative. d -- (K*K - (N+1)K) dk

我々はKのために最小化されるので、分母(N)(N-1)であるので、我々はそれらをドロップすることができるように右側(N + 1)/ 2(N-1)は一定です。最小限に抑えるために、我々は、微分を取ります。 D - (K * K - (N + 1)K)DK

= 2K - (N+1)

= 2K - (N + 1)

Which is zero when K is (N+1)/2.

Kは、(N + 1)/ 2であるときにはゼロです。

The last thing to consider is that K must be an integer. When N is odd (N+1)/2 will yield an integer, however when N is even (N+1)/2 yields an integer + 1/2. In the case, because of symmetry, we get the least disruption when K is N/2 or N/2 + 1.

考慮すべき最後のことは、Kは整数でなければならないということです。 Nは、(N + 1)が奇数であるとき/ 2はNが偶数である場合しかし、整数を生じる(N + 1)/ 2が整数+ 1/2が得られます。対称性から、我々は少なくとも混乱を得るため場合、KはN / 2又はN / 2 + 1である場合。

Now since the formula is quadratic with a global minimum half way between 1 and N the maximum possible disruption must occur when edge regions (1 and N) are removed. If K is 1 or N the formula reduces to 1/2.

エッジ領域(1、N)が削除されたとき、今式は1とNとの間のグローバル最小途中で二次であるので、可能な最大の破壊が発生しなければなりません。 Kは、1又はNである場合、式は1/2に減少させます。

The minimum possible disruption is obtained by letting K=(N+1)/2. In this case the formula reduces to 1/4 + 1/(4*N). So the range of possible disruption is (1/4, 1/2].

最小の可能な破壊は、K =(N + 1)/ 2をさせることによって得られます。この場合、式は1/4 + 1 /(4 * N)に低減します。そう可能破壊の範囲は(1/4、1/2です。

To minimize disruption we recommend adding new regions to the center rather than the ends.


3. Comparison to other algorithms

Other algorithms exist to decide which next-hop to use. These algorithms all have different performance and disruptive characteristics. Of these algorithms we will only consider ones that are not disruptive by design (i.e., if no change to the set of next-hops occurs the path a flow takes remains the same). This will exclude round-robin and random choice. We will look at modulo-N and highest random weight.


Modulo-N is a "simpler" form of hash-threshold. Given N next-hops the packet header fields which describe the flow are run through a hash function. A final modulo-N is applied to the output of the hash. This result then directly maps to one of the next-hops. Modulo-N is the most disruptive of the algorithms; if a next-hop is added or removed the disruption is (N-1)/N. The performance of Modulo-N is equivalent to hash-threshold.

モジュロ-Nは、ハッシュしきい値の「単純な」形です。 N所定のフローを記述するパケットヘッダフィールドは、ハッシュ関数を介して実行され、次のホップ。最終モジュロNはハッシュの出力に適用されます。この結果は、直接ネクストホップのいずれかにマッピングされます。モジュロ-Nは、アルゴリズムの中で最も破壊的です。ネクストホップが追加または削除された場合に破壊は、(N-1)/ Nです。モジュロNの性能は、ハッシュ閾値と同等です。

Highest random weight (HRW) is a comparative method similar in some ways to hash-threshold with non-fixed sized regions. For each next-hop, the router seeds a pseudo-random number generator with the packet header fields which describe the flow and the next-hop to obtain a weight. The next-hop which receives the highest weight is selected. The advantage with using HRW is that it has minimal disruption (i.e., disruption due to adding or removing a next-hop is always 1/N.) The disadvantage with HRW is that the next-hop selection is more expensive than hash-threshold. A description of HRW along with comparisons to other methods can be found in [2]. Although not used for next-hop calculation an example usage of HRW can be found in [3].

最高ランダム量(HRW)は、非固定サイズの領域とハッシュ閾値といくつかの点で同様の比較方法です。各次のホップのために、ルータの種子の流れ及び重量を得るために、次ホップを記載してパケットヘッダフィールドを持つ擬似乱数生成器。最も高い重みを受け取る次ホップが選択されます。 HRWを使用しての利点は、それが最小限の中断(これは次のホップを追加または削除する、すなわち、破壊が常に1 / Nである。)を有することであるHRWの欠点は、次ホップ選択はハッシュ閾値よりも高価であることです。他の方法との比較とともに、HRWの説明は、[2]に見出すことができます。ネクストホップ計算のために使用されていないがHRWの使用例[3]に見ることができます。

Since each of modulo-N, hash-threshold and HRW require a hash on the packet header fields which define a flow, we can factor the performance of the hash out of the comparison. If the hash can not be done inexpensively (e.g., in hardware) it too must be considered when using any of the above methods.


The lookup performance for hash-threshold, like modulo-N is an optimal O(1). HRW's lookup performance is O(N).

モジュロNのようなハッシュ閾値のルックアップ性能は、最適なO(1)です。 HRWの検索のパフォーマンスはO(N)です。

Disruptive behavior is the opposite of performance. HRW is best with 1/N. Hash-threshold is between 1/4 and 1/2. Finally Modulo-N is (N-1)/N.

破壊的行動は、パフォーマンスの反対です。 HRWは、1 / Nと最高です。ハッシュ閾値は1/4と1/2の間です。最後モジュロ-Nは、(N-1)/ Nです。

If the complexity of HRW's next-hop selection process is acceptable we think it should be considered as an alternative to hash-threshold. This could be the case when, for example, per-flow state is kept and thus the next-hop choice is made infrequently.


However, when HRW's next-hop selection is seen as too expensive the obvious choice is hash-threshold as it performs as well as modulo-N and is less disruptive.


4. Security Considerations

This document is an analysis of an algorithm used to implement an ECMP routing decision. This analysis does not directly affect the security of the Internet Infrastructure.


5. References

[1] Thaler, D. and C. Hopps, "Multipath Issues in Unicast and Multicast", RFC 2991, November 2000.

[1]ターラー、D.およびC. Hoppsが、 "ユニキャストとマルチキャストにおけるマルチパスの問題"、RFC 2991、2000年11月。

[2] Thaler, D. and C.V. Ravishankar, "Using Name-Based Mappings to Increase Hit Rates", IEEE/ACM Transactions on Networking, February 1998.

[2]ターレル、D.及びC.V. Ravishankar、ネットワーキング、1998年2月にIEEE / ACM取引、「ヒット率を高めるために名前ベースのマッピングを使用します」。

[3] Estrin, D., Farinacci, D., Helmy, A., Thaler, D., Deering, S., Handley, M., Jacobson, V., Liu, C., Sharma, P. and L. Wei, "Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol Specification", RFC 2362, June 1998.

[3] Estrin、D.、ファリナッチ、D.、Helmy、A.、ターラー、D.、デアリング、S.、ハンドレー、M.、ヤコブソン、V.、劉、C.、シャルマ、P.、およびL.魏、 "プロトコル独立マルチキャスト - スパースモード(PIM-SM):プロトコル仕様"、RFC 2362、1998年6月。

6. Author's Address

Christian E. Hopps NextHop Technologies, Inc. 517 W. William Street Ann Arbor, MI 48103-4943 U.S.A

クリスチャン・E. HoppsがのNextHop Technologies社517 W.ウィリアム・ストリートアナーバー、ミシガン州48103から4943 U.S.A

Phone: +1 734 936 0291 EMail:

電話:+1 734 936 0291 Eメール

7. Full Copyright Statement

Copyright (C) The Internet Society (2000). All Rights Reserved.


This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.


The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.






Funding for the RFC Editor function is currently provided by the Internet Society.

RFC Editor機能のための基金は現在、インターネット協会によって提供されます。