draft-ietf-bess-evpn-unequal-lb-01.txt   draft-ietf-bess-evpn-unequal-lb-02.txt 
skipping to change at page 1, line 19 skipping to change at page 1, line 19
J. Rabadan J. Rabadan
Nokia Nokia
J. Drake J. Drake
Juniper Juniper
A. Lingala A. Lingala
AT&T AT&T
Expires: Sept 26, 2019 March 25, 2019 Expires: Jan 23, 2020 July 22, 2019
Weighted Multi-Path Procedures for EVPN All-Active Multi-Homing Weighted Multi-Path Procedures for EVPN All-Active Multi-Homing
draft-ietf-bess-evpn-unequal-lb-01 draft-ietf-bess-evpn-unequal-lb-02
Abstract Abstract
In an EVPN-IRB based network overlay, EVPN all-active multi-homing In an EVPN-IRB based network overlay, EVPN all-active multi-homing
enables multi-homing for a CE device connected to two or more PEs via enables multi-homing for a CE device connected to two or more PEs via
a LAG bundle, such that bridged and routed traffic from remote PEs a LAG bundle, such that bridged and routed traffic from remote PEs
can be equally load balanced (ECMPed) across the multi-homing PEs. can be equally load balanced (ECMPed) across the multi-homing PEs.
This document defines extensions to EVPN procedures to optimally This document defines extensions to EVPN procedures to optimally
handle unequal access bandwidth distribution across a set of multi- handle unequal access bandwidth distribution across a set of multi-
homing PEs in order to: homing PEs in order to:
skipping to change at page 3, line 5 skipping to change at page 3, line 5
3.1 Link Bandwidth Extended Community . . . . . . . . . . . . . 8 3.1 Link Bandwidth Extended Community . . . . . . . . . . . . . 8
3.2 REMOTE PE Behavior . . . . . . . . . . . . . . . . . . . . . 9 3.2 REMOTE PE Behavior . . . . . . . . . . . . . . . . . . . . . 9
4. Weighted BUM Traffic Load-Sharing . . . . . . . . . . . . . . 10 4. Weighted BUM Traffic Load-Sharing . . . . . . . . . . . . . . 10
4.1 The BW Capability in the DF Election Extended Community . . 10 4.1 The BW Capability in the DF Election Extended Community . . 10
4.2 BW Capability and Default DF Election algorithm . . . . . . 11 4.2 BW Capability and Default DF Election algorithm . . . . . . 11
4.3 BW Capability and HRW DF Election algorithm (Type 1 and 4.3 BW Capability and HRW DF Election algorithm (Type 1 and
4) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 4) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.3.1 BW Increment . . . . . . . . . . . . . . . . . . . . . . 11 4.3.1 BW Increment . . . . . . . . . . . . . . . . . . . . . . 11
4.3.2 HRW Hash Computations with BW Increment . . . . . . . . 12 4.3.2 HRW Hash Computations with BW Increment . . . . . . . . 12
4.3.3 Cost-Benefit Tradeoff on Link Failures . . . . . . . . . 13 4.3.3 Cost-Benefit Tradeoff on Link Failures . . . . . . . . . 13
4.4 BW Capability and Preference DF Election algorithm . . . . 14 4.4 BW Capability and Weighted HRW DF Election algorithm
5. Real-time Available Bandwidth . . . . . . . . . . . . . . . . . 15 (Type TBD) . . . . . . . . . . . . . . . . . . . . . . . . 14
6. Routed EVPN Overlay . . . . . . . . . . . . . . . . . . . . . . 15 4.5 BW Capability and Preference DF Election algorithm . . . . 15
7. EVPN-IRB Multi-homing with non-EVPN routing . . . . . . . . . . 16 5. Real-time Available Bandwidth . . . . . . . . . . . . . . . . . 16
7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 17 6. Routed EVPN Overlay . . . . . . . . . . . . . . . . . . . . . . 16
7.1 Normative References . . . . . . . . . . . . . . . . . . . 17 7. EVPN-IRB Multi-homing with non-EVPN routing . . . . . . . . . . 17
7.2 Informative References . . . . . . . . . . . . . . . . . . 17 7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 18
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 18 7.1 Normative References . . . . . . . . . . . . . . . . . . . 18
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 18 7.2 Informative References . . . . . . . . . . . . . . . . . . 18
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 19
9. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 19
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 19
1 Introduction 1 Introduction
In an EVPN-IRB based network overlay, with a CE multi-homed via In an EVPN-IRB based network overlay, with a CE multi-homed via
EVPN all-active multi-homing, bridged and routed traffic from remote EVPN all-active multi-homing, bridged and routed traffic from remote
PEs can be equally load balanced (ECMPed) across the multi-homing PEs can be equally load balanced (ECMPed) across the multi-homing
PEs: PEs:
o ECMP Load-balancing for bridged unicast traffic is enabled via o ECMP Load-balancing for bridged unicast traffic is enabled via
aliasing and mass-withdraw procedures detailed in RFC 7432. aliasing and mass-withdraw procedures detailed in RFC 7432.
skipping to change at page 10, line 19 skipping to change at page 10, line 19
Type 1. In the event that link bandwidth attribute is not received Type 1. In the event that link bandwidth attribute is not received
from one or more PEs, forwarding path-list would be computed using from one or more PEs, forwarding path-list would be computed using
regular ECMP semantics. regular ECMP semantics.
4. Weighted BUM Traffic Load-Sharing 4. Weighted BUM Traffic Load-Sharing
Optionally, load sharing of per-service DF role, weighted by Optionally, load sharing of per-service DF role, weighted by
individual PE's link-bandwidth share within a multi-homed ES may also individual PE's link-bandwidth share within a multi-homed ES may also
be achieved. be achieved.
In order to do that, a new DF Election Capability [EVPN-DF-ELECT- In order to do that, a new DF Election Capability [RFC8584] called
FRAMEWORK] called "BW" (Bandwidth Weighted DF Election) is defined. "BW" (Bandwidth Weighted DF Election) is defined. BW may be used
BW may be used along with some DF Election Types, as described in the along with some DF Election Types, as described in the following
following sections. sections.
4.1 The BW Capability in the DF Election Extended Community 4.1 The BW Capability in the DF Election Extended Community
[EVPN-DF-ELECT-FRAMEWORK] defines a new extended community for PEs [RFC8584] defines a new extended community for PEs within a
within a redundancy group to signal and agree on uniform DF Election redundancy group to signal and agree on uniform DF Election Type and
Type and Capabilities for each ES. This document requests a bit in Capabilities for each ES. This document requests a bit in the DF
the DF Election extended community Bitmap: Election extended community Bitmap:
Bit 28: BW (Bandwidth Weighted DF Election) Bit 28: BW (Bandwidth Weighted DF Election)
ES routes advertised with the BW bit set will indicate the desire of ES routes advertised with the BW bit set will indicate the desire of
the advertising PE to consider the link-bandwidth in the DF Election the advertising PE to consider the link-bandwidth in the DF Election
algorithm defined by the value in the "DF Type". algorithm defined by the value in the "DF Type".
As per [EVPN-DF-ELECT-FRAMEWORK], all the PEs in the ES MUST As per [RFC8584], all the PEs in the ES MUST advertise the same
advertise the same Capabilities and DF Type, otherwise the PEs will Capabilities and DF Type, otherwise the PEs will fall back to Default
fall back to Default [RFC7432] DF Election procedure. [RFC7432] DF Election procedure.
The BW Capability MAY be advertised with the following DF Types: The BW Capability MAY be advertised with the following DF Types:
o Type 0: Default DF Election algorithm, as in [RFC7432] o Type 0: Default DF Election algorithm, as in [RFC7432]
o Type 1: HRW algorithm, as in [EVPN-DF-ELECT-FRAMEWORK] o Type 1: HRW algorithm, as in [RFC8584]
o Type 2: Preference algorithm, as in [EVPN-DF-PREF] o Type 2: Preference algorithm, as in [EVPN-DF-PREF]
o Type 4: HRW per-multicast flow DF Election, as in o Type 4: HRW per-multicast flow DF Election, as in
[EVPN-PER-MCAST-FLOW-DF] [EVPN-PER-MCAST-FLOW-DF]
The following sections describe how the DF Election procedures are The following sections describe how the DF Election procedures are
modified for the above DF Types when the BW Capability is used. modified for the above DF Types when the BW Capability is used.
4.2 BW Capability and Default DF Election algorithm 4.2 BW Capability and Default DF Election algorithm
When all the PEs in the Ethernet Segment (ES) agree to use the BW When all the PEs in the Ethernet Segment (ES) agree to use the BW
skipping to change at page 11, line 32 skipping to change at page 11, line 32
for DF election is: for DF election is:
[PE-1, PE-1, PE-2, PE-3]. [PE-1, PE-1, PE-2, PE-3].
The DF for a given VLAN-a on ES-10 is now computed as (VLAN-a % 4). The DF for a given VLAN-a on ES-10 is now computed as (VLAN-a % 4).
This would result in the DF role being distributed across PE1, PE2, This would result in the DF role being distributed across PE1, PE2,
and PE3 in proportion to each PE's normalized weight for ES-10. and PE3 in proportion to each PE's normalized weight for ES-10.
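The weighted service carving example above can be sketched as follows. This is a minimal illustration, not text from the draft: the function names are hypothetical, the normalized weights (2, 1, 1) are assumed from the candidate list shown, and the PEs are assumed to be supplied already in ordinal order.

```python
def weighted_candidate_list(weights):
    """Build the DF candidate list, repeating each PE once per unit of
    normalized weight. `weights` is a list of (pe_name, weight) pairs,
    assumed to be in ordinal order already."""
    candidates = []
    for pe, weight in weights:
        candidates.extend([pe] * weight)
    return candidates


def elect_df_type0(vlan, candidates):
    """Modulo-based service carving over the weighted candidate list."""
    return candidates[vlan % len(candidates)]


# Assumed normalized weights for ES-10: PE-1 = 2, PE-2 = 1, PE-3 = 1.
es10 = weighted_candidate_list([("PE-1", 2), ("PE-2", 1), ("PE-3", 1)])
for vlan in range(4):
    print(vlan, elect_df_type0(vlan, es10))
```

With these assumed weights, PE-1 is elected for two of every four consecutive VLAN values, and PE-2 and PE-3 for one each.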
4.3 BW Capability and HRW DF Election algorithm (Type 1 and 4) 4.3 BW Capability and HRW DF Election algorithm (Type 1 and 4)
[EVPN-DF-ELECT-FRAMEWORK] introduces Highest Random Weight (HRW) [RFC8584] introduces Highest Random Weight (HRW) algorithm (DF Type
algorithm (DF Type 1) for DF election in order to solve potential DF 1) for DF election in order to solve potential DF election skew
election skew depending on Ethernet tag space distribution. [EVPN- depending on Ethernet tag space distribution. [EVPN-PER-MCAST-FLOW-
PER-MCAST-FLOW-DF] further extends HRW algorithm for per-multicast DF] further extends HRW algorithm for per-multicast flow based hash
flow based hash computations (DF Type 4). This section describes computations (DF Type 4). This section describes extensions to HRW
extensions to HRW Algorithm for EVPN DF Election specified in [EVPN- Algorithm for EVPN DF Election specified in [RFC8584] and in [EVPN-
DF-ELECT-FRAMEWORK] and in [EVPN-PER-MCAST-FLOW-DF] in order to PER-MCAST-FLOW-DF] in order to achieve DF election distribution that
achieve DF election distribution that is weighted by link bandwidth. is weighted by link bandwidth.
4.3.1 BW Increment 4.3.1 BW Increment
A new variable called "bandwidth increment" is computed for each [PE, A new variable called "bandwidth increment" is computed for each [PE,
ES] advertising the ES link bandwidth attribute as follows: ES] advertising the ES link bandwidth attribute as follows:
In the context of an ES, In the context of an ES,
L(i) = Link bandwidth advertised by PE(i) for this ES L(i) = Link bandwidth advertised by PE(i) for this ES
skipping to change at page 12, line 30 skipping to change at page 12, line 30
b(1) = 1, b(2) = 1, b(3) = 1 b(1) = 1, b(2) = 1, b(3) = 1
Note that the bandwidth increment must always be an integer, Note that the bandwidth increment must always be an integer,
including, in an unlikely scenario of a PE's link bandwidth not being including, in an unlikely scenario of a PE's link bandwidth not being
an exact multiple of L(min). If it computes to a non-integer value an exact multiple of L(min). If it computes to a non-integer value
(including as a result of link failure), it MUST be rounded down to (including as a result of link failure), it MUST be rounded down to
an integer. an integer.
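The rounding rule above can be sketched as follows. The defining formula for b(i) falls in a part of the draft that this diff skips, so the sketch assumes b(i) = L(i) / L(min), rounded down, which is what the "exact multiple of L(min)" wording suggests. The function name and the sample bandwidth values are hypothetical.

```python
def bandwidth_increments(link_bw):
    """Return b(i) for each PE, assuming b(i) = floor(L(i) / L(min)).

    `link_bw` maps a PE name to the link bandwidth it advertises for the
    ES, as a positive integer in any consistent unit."""
    if not link_bw or min(link_bw.values()) <= 0:
        raise ValueError("every PE must advertise a positive bandwidth")
    l_min = min(link_bw.values())
    # Integer floor division implements the "rounded down" rule.
    return {pe: bw // l_min for pe, bw in link_bw.items()}


print(bandwidth_increments({"PE-1": 10, "PE-2": 10, "PE-3": 10}))
print(bandwidth_increments({"PE-1": 25, "PE-2": 10}))
```

In the second call, 25 is not an exact multiple of 10, so the increment for PE-1 is rounded down to 2.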
4.3.2 HRW Hash Computations with BW Increment 4.3.2 HRW Hash Computations with BW Increment
The HRW algorithm as described in [EVPN-DF-ELECT-FRAMEWORK] and in [EVPN- The HRW algorithm as described in [RFC8584] and in [EVPN-PER-MCAST-FLOW-
PER-MCAST-FLOW-DF] computes a random hash value (referred to as DF] computes a random hash value (referred to as affinity here) for
affinity here) for each PE(i), where, (0 < i <= N), PE(i) is the PE each PE(i), where, (0 < i <= N), PE(i) is the PE at ordinal i, and
at ordinal i, and Address(i) is the IP address of PE at ordinal i. Address(i) is the IP address of PE at ordinal i.
For 'N' PEs sharing an Ethernet segment, this results in 'N' For 'N' PEs sharing an Ethernet segment, this results in 'N'
candidate hash computations. PE that has the highest hash value is candidate hash computations. PE that has the highest hash value is
selected as the DF. selected as the DF.
Affinity computation for each PE(i) is extended to be computed one Affinity computation for each PE(i) is extended to be computed one
per-bandwidth increment associated with PE(i) instead of a single per-bandwidth increment associated with PE(i) instead of a single
affinity computation per PE(i). affinity computation per PE(i).
PE(i) with b(i) = j, results in j affinity computations: PE(i) with b(i) = j, results in j affinity computations:
skipping to change at page 13, line 38 skipping to change at page 13, line 38
For example, For example,
affinity function specified in [EVPN-PER-MCAST-FLOW-DF] MAY be affinity function specified in [EVPN-PER-MCAST-FLOW-DF] MAY be
extended as follows to incorporate bandwidth increment j: extended as follows to incorporate bandwidth increment j:
affinity(S,G,V, ESI, Address(i,j)) = affinity(S,G,V, ESI, Address(i,j)) =
(1103515245.((1103515245.Address(i).j + 12345) XOR (1103515245.((1103515245.Address(i).j + 12345) XOR
D(S,G,V,ESI))+12345) (mod 2^31) D(S,G,V,ESI))+12345) (mod 2^31)
affinity or random function specified in [EVPN-DF-ELECT-FRAMEWORK] affinity or random function specified in [RFC8584] MAY be extended as
MAY be extended as follows to incorporate bandwidth increment j: follows to incorporate bandwidth increment j:
affinity(v, Es, Address(i,j)) = (1103515245((1103515245.Address(i).j affinity(v, Es, Address(i,j)) = (1103515245((1103515245.Address(i).j
+ 12345) XOR D(v,Es))+12345)(mod 2^31) + 12345) XOR D(v,Es))+12345)(mod 2^31)
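The per-increment affinity computation for DF Type 1 can be sketched as follows. Several details are assumptions rather than statements from the draft: "Address(i).j" is read as the product of the PE address (as an integer) and the increment index j, with j running from 1 to b(i); D(v,Es) is approximated by a CRC-32 over the 4-octet Ethernet tag followed by the 10-octet ESI, masked to 31 bits; and ties are broken arbitrarily. All function names are hypothetical.

```python
import ipaddress
import struct
import zlib

A, C, M = 1103515245, 12345, 2 ** 31


def digest(vlan, esi):
    """Stand-in for D(v, Es): CRC-32 of tag + ESI, masked to 31 bits
    (an assumption for this sketch)."""
    return zlib.crc32(struct.pack("!I", vlan) + esi) & 0x7FFFFFFF


def affinity(vlan, esi, address, j):
    """Affinity for one [PE, bandwidth increment j] pair."""
    addr = int(ipaddress.ip_address(address))
    return (A * ((A * addr * j + C) ^ digest(vlan, esi)) + C) % M


def elect_df_hrw(vlan, esi, increments):
    """Compute b(i) affinities per PE; the PE owning the highest one wins.
    `increments` maps a PE IP address string to its bandwidth increment."""
    scored = (
        (affinity(vlan, esi, pe, j), pe)
        for pe, b in increments.items()
        for j in range(1, b + 1)
    )
    return max(scored)[1]


esi = bytes(10)  # placeholder 10-octet ESI
print(elect_df_hrw(100, esi, {"192.0.2.1": 2, "192.0.2.2": 1}))
```

A PE with a larger bandwidth increment contributes more candidate affinities, so it is more likely to hold the highest value for a given VLAN.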
4.3.3 Cost-Benefit Tradeoff on Link Failures 4.3.3 Cost-Benefit Tradeoff on Link Failures
While incorporating link bandwidth into the DF election process While incorporating link bandwidth into the DF election process
provides optimal BUM traffic distribution across the ES links, it provides optimal BUM traffic distribution across the ES links, it
also implies that affinity values for a given PE are re-computed, and also implies that affinity values for a given PE are re-computed, and
DF elections are re-adjusted on changes to that PE's bandwidth DF elections are re-adjusted on changes to that PE's bandwidth
skipping to change at page 14, line 13 skipping to change at page 14, line 13
the operator does not wish to have this level of churn in their DF the operator does not wish to have this level of churn in their DF
election, then they should not advertise the BW capability. Not election, then they should not advertise the BW capability. Not
advertising BW capability may result in less than optimal BUM traffic advertising BW capability may result in less than optimal BUM traffic
distribution while still retaining the ability to allow a remote distribution while still retaining the ability to allow a remote
ingress PE to do weighted ECMP for its unicast traffic to a set of ingress PE to do weighted ECMP for its unicast traffic to a set of
multi-homed PEs, as described in section 3.2. multi-homed PEs, as described in section 3.2.
The same also applies to use of BW capability with service carving (DF The same also applies to use of BW capability with service carving (DF
Type 0), as specified in section 4.2. Type 0), as specified in section 4.2.
4.4 BW Capability and Preference DF Election algorithm 4.4 BW Capability and Weighted HRW DF Election algorithm (Type TBD)
Use of BW capability together with HRW DF election algorithm
described in the previous section has a few limitations:
o While in most scenarios a change in BW for a given PE results in
re-assignment of DF roles from or to that PE, in certain
scenarios, a change in PE BW can result in complete re-assignment
of DF roles.
o If BW advertised from a set of PEs does not have a good least
common multiple, the BW set may result in a high BW increment for
each PE, and hence, may result in higher order of complexity.
The [WEIGHTED-HRW] document describes an alternative DF election algorithm
that uses a weighted score function that is minimally disruptive such
that it minimizes the probability of complete re-assignment of DF
roles in a BW change scenario. It also does not require multiple BW
increment based computations.
Instead of computing BW increment and an HRW hash for each [PE, BW
increment], a single weighted score is computed for each PE using the
proposed score function with absolute BW advertised by each PE as its
weight value.
As described in section 4 of [WEIGHTED-HRW], an HRW hash computation
for each PE is converted to a weighted score as follows:
Score(Oi, Sj) = -wi/log(Hash(Oi, Sj)/Hmax); where Hmax is the maximum
hash value.
Oi is the object being assigned, e.g., a vlan-id in this case;
Sj is the server, e.g., a PE IP address in this case;
wi is the weight, e.g., the BW advertised by the PE in this case;
Object Oi is assigned to the server Sj with the highest score.
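The score function above can be sketched as follows. The hash is an assumption: the first 32 bits of SHA-256 stand in for whatever hash [WEIGHTED-HRW] specifies, and Hmax is set one above the largest shifted hash value so that the ratio stays strictly between 0 and 1. All names are hypothetical.

```python
import hashlib
import math

HMAX = 2 ** 32 + 1  # keeps hash/Hmax strictly inside (0, 1)


def hash32(obj, server):
    """Stand-in hash of (object, server), shifted into 1 .. 2**32."""
    data = f"{obj}|{server}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big") + 1


def score(obj, server, weight):
    """Score(Oi, Sj) = -w / log(Hash(Oi, Sj) / Hmax).
    log() of a value in (0, 1) is negative, so the score is non-negative."""
    return -weight / math.log(hash32(obj, server) / HMAX)


def elect_df_weighted(vlan, bandwidths):
    """Assign the VLAN to the PE with the highest weighted score.
    `bandwidths` maps a PE address to the BW it advertises."""
    return max(bandwidths, key=lambda pe: score(vlan, pe, bandwidths[pe]))


before = {"192.0.2.1": 10, "192.0.2.2": 10, "192.0.2.3": 10}
after = dict(before, **{"192.0.2.1": 20})
moved = [v for v in range(1000)
         if elect_df_weighted(v, before) != elect_df_weighted(v, after)]
print(len(moved), "of 1000 VLANs changed DF")
```

Raising the bandwidth of one PE raises only that PE's scores, so every VLAN that changes DF in this sketch moves to that PE, which illustrates the minimally disruptive behavior described above.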
4.5 BW Capability and Preference DF Election algorithm
This section applies to ESes where all the PEs in the ES agree to use This section applies to ESes where all the PEs in the ES agree to use
the BW Capability with DF Type 2. The BW Capability modifies the the BW Capability with DF Type 2. The BW Capability modifies the
Preference DF Election procedure [EVPN-DF-PREF], by adding the LBW Preference DF Election procedure [EVPN-DF-PREF], by adding the LBW
value as a tie-breaker as follows: value as a tie-breaker as follows:
o Section 4.1, bullet (f) in [EVPN-DF-PREF] now considers the LBW o Section 4.1, bullet (f) in [EVPN-DF-PREF] now considers the LBW
value: value:
f) In case of equal Preference in two or more PEs in the ES, the f) In case of equal Preference in two or more PEs in the ES, the
skipping to change at page 16, line 20 skipping to change at page 17, line 20
use cases from EVPN IRB use cases discussed earlier is that EVPN use cases from EVPN IRB use cases discussed earlier is that EVPN
control plane is used only to enable LAG interface based multi-homing control plane is used only to enable LAG interface based multi-homing
and NOT as an overlay VPN control plane. EVPN control plane in this and NOT as an overlay VPN control plane. EVPN control plane in this
case enables: case enables:
o DF election via EVPN RT-4 based procedures described in [RFC7432] o DF election via EVPN RT-4 based procedures described in [RFC7432]
o LOCAL MAC sync across multi-homing PEs via EVPN RT-2 o LOCAL MAC sync across multi-homing PEs via EVPN RT-2
o LOCAL ARP and ND sync across multi-homing PEs via EVPN RT-2 o LOCAL ARP and ND sync across multi-homing PEs via EVPN RT-2
Applicability of weighted ECMP procedures proposed in this document Applicability of weighted ECMP procedures proposed in this document
to this set of use cases will be addressed in subsequent revisions. to this set of use cases is an area of further consideration.
7. References 7. References
7.1 Normative References 7.1 Normative References
[RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A., [RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February
2015, <http://www.rfc-editor.org/info/rfc7432>. 2015, <http://www.rfc-editor.org/info/rfc7432>.
[BGP-LINK-BW] Mohapatra, P., Fernando, R., "BGP Link Bandwidth [BGP-LINK-BW] Mohapatra, P., Fernando, R., "BGP Link Bandwidth
Extended Community", January 2013, Extended Community", March 2018,
<https://tools.ietf.org/html/draft-ietf-idr-link- <https://tools.ietf.org/html/draft-ietf-idr-link-
bandwidth-06>. bandwidth-07>.
[EVPN-IP-ALIASING] Sajassi, A., Badoni, G., "L3 Aliasing and Mass [EVPN-IP-ALIASING] Sajassi, A., Badoni, G., "L3 Aliasing and Mass
Withdrawal Support for EVPN", July 2017, Withdrawal Support for EVPN", July 2017,
<https://tools.ietf.org/html/draft-sajassi-bess-evpn-ip- <https://tools.ietf.org/html/draft-sajassi-bess-evpn-ip-
aliasing-00>. aliasing-00>.
[EVPN-DF-PREF] Rabadan, J., Sathappan, S., Przygienda, T., Lin, W., [EVPN-DF-PREF] Rabadan, J., Sathappan, S., Przygienda, T., Lin, W.,
Drake, J., Sajassi, A., and S. Mohanty, "Preference-based Drake, J., Sajassi, A., and S. Mohanty, "Preference-based
EVPN DF Election", internet-draft ietf-bess-evpn-pref-df- EVPN DF Election", internet-draft ietf-bess-evpn-pref-df-
01.txt, April 2018. 01.txt, April 2018.
[EVPN-PER-MCAST-FLOW-DF] Sajassi, et al., "Per multicast flow [EVPN-PER-MCAST-FLOW-DF] Sajassi, et al., "Per multicast flow
Designated Forwarder Election for EVPN", March 2018, Designated Forwarder Election for EVPN", March 2018,
<https://tools.ietf.org/html/draft-sajassi-bess-evpn-per- <https://tools.ietf.org/html/draft-sajassi-bess-evpn-per-
mcast-flow-df-election-00>. mcast-flow-df-election-00>.
[EVPN-DF-ELECT-FRAMEWORK] Rabadan, Mohanty, et al., "Framework for [RFC8584] Rabadan, Mohanty, et al., "Framework for Ethernet VPN
EVPN Designated Forwarder Election Extensibility", March Designated Forwarder Election Extensibility", April 2019,
2018, <https://tools.ietf.org/html/draft-ietf-bess-evpn- <https://tools.ietf.org/html/rfc8584>.
df-election-framework-03>.
[WEIGHTED-HRW] Mohanty, et al., "Weighted HRW and its applications",
Sept. 2019, <https://tools.ietf.org/html/draft-mohanty-
bess-weighted-hrw-00>.
[RFC2119] S. Bradner, "Key words for use in RFCs to Indicate [RFC2119] S. Bradner, "Key words for use in RFCs to Indicate
Requirement Levels", March 1997, Requirement Levels", March 1997,
<https://tools.ietf.org/html/rfc2119>. <https://tools.ietf.org/html/rfc2119>.
[RFC8174] B. Leiba, "Ambiguity of Uppercase vs Lowercase in RFC 2119 [RFC8174] B. Leiba, "Ambiguity of Uppercase vs Lowercase in RFC 2119
Key Words", May 2017, Key Words", May 2017,
<https://tools.ietf.org/html/rfc8174>. <https://tools.ietf.org/html/rfc8174>.
7.2 Informative References 7.2 Informative References
8. Acknowledgements 8. Acknowledgements
Authors would like to thank Satya Mohanty for valuable review and Authors would like to thank Satya Mohanty for valuable review and
inputs with respect to HRW algorithm refinements proposed in this inputs with respect to HRW and weighted HRW algorithm refinements
document. proposed in this document.
9. Contributors
Satya Ranjan Mohanty
Cisco
Email: satyamoh@cisco.com
Authors' Addresses Authors' Addresses
Neeraj Malhotra, Editor. Neeraj Malhotra, Editor.
Arrcus Arrcus
Email: neeraj.ietf@gmail.com Email: neeraj.ietf@gmail.com
Ali Sajassi Ali Sajassi
Cisco Cisco
Email: sajassi@cisco.com Email: sajassi@cisco.com
 End of changes. 16 change blocks. 
47 lines changed or deleted 97 lines changed or added
