draft-ietf-bess-evpn-unequal-lb-00.txt   draft-ietf-bess-evpn-unequal-lb-01.txt 
INTERNET-DRAFT N. Malhotra, Ed. BESS Working Group N. Malhotra, Ed.
Arrcus Internet-Draft Arrcus
Intended Status: Proposed Standard
A. Sajassi A. Sajassi
Intended Status: Proposed Standard Cisco S. Thoria
Cisco
J. Rabadan J. Rabadan
Nokia Nokia
J. Drake J. Drake
Juniper Juniper
A. Lingala A. Lingala
AT&T AT&T
S. Thoria
Cisco
Expires: March 23, 2019 September 19, 2018 Expires: Sept 26, 2019 March 25, 2019
Weighted Multi-Path Procedures for EVPN All-Active Multi-Homing Weighted Multi-Path Procedures for EVPN All-Active Multi-Homing
draft-ietf-bess-evpn-unequal-lb-00 draft-ietf-bess-evpn-unequal-lb-01
Abstract Abstract
In an EVPN-IRB based network overlay, EVPN LAG enables all-active In an EVPN-IRB based network overlay, EVPN all-active multi-homing
multi-homing for a host or CE device connected to two or more PEs via enables multi-homing for a CE device connected to two or more PEs via
a LAG bundle, such that bridged and routed traffic from remote PEs a LAG bundle, such that bridged and routed traffic from remote PEs
can be equally load balanced (ECMPed) across the multi-homing PEs. can be equally load balanced (ECMPed) across the multi-homing PEs.
This document defines extensions to EVPN procedures to optimally This document defines extensions to EVPN procedures to optimally
handle unequal access bandwidth distribution across a set of multi- handle unequal access bandwidth distribution across a set of multi-
homing PEs in order to: homing PEs in order to:
o provide greater flexibility, with respect to adding or o provide greater flexibility, with respect to adding or
removing individual PE-CE links within the access LAG removing individual PE-CE links within the access LAG
o handle PE-CE LAG member link failures that can result in unequal o handle PE-CE LAG member link failures that can result in unequal
skipping to change at page 4, line 7 skipping to change at page 4, line 7
6. Routed EVPN Overlay . . . . . . . . . . . . . . . . . . . . . . 15 6. Routed EVPN Overlay . . . . . . . . . . . . . . . . . . . . . . 15
7. EVPN-IRB Multi-homing with non-EVPN routing . . . . . . . . . . 16 7. EVPN-IRB Multi-homing with non-EVPN routing . . . . . . . . . . 16
7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 17 7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 17
7.1 Normative References . . . . . . . . . . . . . . . . . . . 17 7.1 Normative References . . . . . . . . . . . . . . . . . . . 17
7.2 Informative References . . . . . . . . . . . . . . . . . . 17 7.2 Informative References . . . . . . . . . . . . . . . . . . 17
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 18 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 18
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 18 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 18
1 Introduction 1 Introduction
In an EVPN-IRB based network overlay, with an access CE multi-homed In an EVPN-IRB based network overlay, with a CE multi-homed via
via a LAG interface, bridged and routed traffic from remote PEs can EVPN all-active multi-homing, bridged and routed traffic from remote
be equally load balanced (ECMPed) across the multi-homing PEs: PEs can be equally load balanced (ECMPed) across the multi-homing
PEs:
o ECMP Load-balancing for bridged unicast traffic is enabled via o ECMP Load-balancing for bridged unicast traffic is enabled via
aliasing and mass-withdraw procedures detailed in RFC 7432. aliasing and mass-withdraw procedures detailed in RFC 7432.
o ECMP Load-balancing for routed unicast traffic is enabled via o ECMP Load-balancing for routed unicast traffic is enabled via
existing L3 ECMP mechanisms. existing L3 ECMP mechanisms.
o Load-sharing of bridged BUM traffic on local ports is enabled o Load-sharing of bridged BUM traffic on local ports is enabled
via EVPN DF election procedure detailed in RFC 7432 via EVPN DF election procedure detailed in RFC 7432
skipping to change at page 5, line 25 skipping to change at page 5, line 25
\ ESI-1 / \ ESI-1 /
\ / \ /
+\---/+ +\---/+
| \ / | | \ / |
+--+--+ +--+--+
| |
CE1 CE1
Figure 1 Figure 1
Consider a CE1 that is dual-homed to PE1 and PE2 via EVPN-LAG with Consider a CE1 that is dual-homed to PE1 and PE2 via EVPN all-active
single member links of equal bandwidth to each PE (aka, equal access multi-homing with single member links of equal bandwidth to each PE
bandwidth distribution across PE1 and PE2). If the provider wants to (aka, equal access bandwidth distribution across PE1 and PE2). If the
increase link bandwidth to CE1, it MUST add a link to both PE1 and provider wants to increase link bandwidth to CE1, it MUST add a link
PE2 in order to maintain equal access bandwidth distribution and to both PE1 and PE2 in order to maintain equal access bandwidth
inter-work with EVPN ECMP load-balancing. In other words, for a dual- distribution and inter-work with EVPN ECMP load-balancing. In other
homed CE, total number of CE links must be provisioned in multiples words, for a dual-homed CE, total number of CE links must be
of 2 (2, 4, 6, and so on). For a triple-homed CE, number of CE links provisioned in multiples of 2 (2, 4, 6, and so on). For a triple-
must be provisioned in multiples of three (3, 6, 9, and so on). To homed CE, number of CE links must be provisioned in multiples of
generalize, for a CE that is multi-homed to "n" PEs, number of PE-CE three (3, 6, 9, and so on). To generalize, for a CE that is multi-
physical links provisioned must be an integral multiple of "n". This homed to "n" PEs, number of PE-CE physical links provisioned must be
is restrictive in case of dual-homing and very quickly becomes an integral multiple of "n". This is restrictive in case of dual-
prohibitive in case of multi-homing. homing and very quickly becomes prohibitive in case of multi-homing.
Instead, a provider may wish to increase PE-CE bandwidth OR number of Instead, a provider may wish to increase PE-CE bandwidth OR number of
links in ANY link increments. As an example, for CE1 dual-homed to links in ANY link increments. As an example, for CE1 dual-homed to
PE1 and PE2 in all-active mode, provider may wish to add a third link PE1 and PE2 in all-active mode, provider may wish to add a third link
to ONLY PE1 to increase total bandwidth for this CE by 50%, rather to ONLY PE1 to increase total bandwidth for this CE by 50%, rather
than being required to increase access bandwidth by 100% by adding a than being required to increase access bandwidth by 100% by adding a
link to each of the two PEs. While existing EVPN based all-active link to each of the two PEs. While existing EVPN based all-active
load-balancing procedures do not necessarily preclude such asymmetric load-balancing procedures do not necessarily preclude such asymmetric
access bandwidth distribution among the PEs providing redundancy, it access bandwidth distribution among the PEs providing redundancy, it
may result in unexpected traffic loss due to congestion in the access may result in unexpected traffic loss due to congestion in the access
skipping to change at page 6, line 33 skipping to change at page 6, line 33
\\ ESI-1 // \\ ESI-1 //
\\ /X \\ /X
+\\---//+ +\\---//+
| \\ // | | \\ // |
+---+---+ +---+---+
| |
CE1 CE1
Consider a CE1 that is multi-homed to PE1 and PE2 via a link bundle Consider a CE1 that is multi-homed to PE1 and PE2 via a link bundle
with two member links to each PE. On a PE2-CE1 physical link failure, with two member links to each PE. On a PE2-CE1 physical link failure,
link bundle represented by ESI-1 on PE2 stays up, however, its link bundle represented by an Ethernet Segment ESI-1 on PE2 stays up,
bandwidth is cut in half. With the existing ECMP procedures, both PE1 however, its bandwidth is cut in half. With existing ECMP
and PE2 will continue to attract equal amount of traffic from remote procedures, both PE1 and PE2 will continue to attract equal amount of
PEs, even when PE1 has double the bandwidth to CE1. If bandwidth traffic from remote PEs, even when PE1 has double the bandwidth to
distribution to CE1 across PE1 and PE2 is 2:1, traffic from remote CE1. If bandwidth distribution to CE1 across PE1 and PE2 is 2:1,
hosts MUST also be load-balanced across PE1 and PE2 in 2:1 manner to traffic from remote hosts MUST also be load-balanced across PE1 and
avoid unexpected congestion and traffic loss on PE2-CE1 links within PE2 in 2:1 manner to avoid unexpected congestion and traffic loss on
the LAG. PE2-CE1 links within the LAG.
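As a worked illustration of the 2:1 case above, the following sketch reduces per-PE access bandwidth to the smallest integer ratio. The function name `relative_weights`, the PE labels, and the 10 Gbps member-link speed are assumptions made for illustration only; neither draft revision prescribes this computation.

```python
from functools import reduce
from math import gcd


def relative_weights(bandwidth_by_pe):
    """Reduce per-PE access bandwidth (positive integers, same unit)
    to the smallest integer ratio."""
    divisor = reduce(gcd, bandwidth_by_pe.values())
    return {pe: bw // divisor for pe, bw in bandwidth_by_pe.items()}


# Assumed scenario: two 10G member links per PE (values in Mbps),
# with one PE2-CE1 member link down.
weights = relative_weights({"PE1": 20_000, "PE2": 10_000})
print(weights)  # {'PE1': 2, 'PE2': 1}
```

With equal ECMP, PE2 would still receive half of the traffic while holding only one third of the access bandwidth; the 2:1 ratio is what a remote PE would need to apply instead.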
1.3 Design Requirement 1.3 Design Requirement
+-----------------------+ +-----------------------+
|Underlay Network Fabric| |Underlay Network Fabric|
+-----------------------+ +-----------------------+
+-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
| PE1 | | PE2 | ..... | PEx | | PEn | | PE1 | | PE2 | ..... | PEx | | PEn |
+-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
skipping to change at page 8, line 9 skipping to change at page 8, line 9
OR router that physically hosts the ESI. OR router that physically hosts the ESI.
"REMOTE PE" in the context of an ESI refers to a provider edge switch "REMOTE PE" in the context of an ESI refers to a provider edge switch
OR router in an EVPN overlay, whose overlay reachability to the ESI OR router in an EVPN overlay, whose overlay reachability to the ESI
is via the LOCAL PE. is via the LOCAL PE.
2. Solution Overview 2. Solution Overview
In order to achieve weighted load balancing for overlay unicast In order to achieve weighted load balancing for overlay unicast
traffic, Ethernet A-D per-ES route (EVPN Route Type 1) is leveraged traffic, Ethernet A-D per-ES route (EVPN Route Type 1) is leveraged
to signal the ESI bandwidth to remote PEs. Using Ethernet A-D per-ES to signal the Ethernet Segment bandwidth to remote PEs. Using
route to signal the ESI bandwidth provides a mechanism to be able to Ethernet A-D per-ES route to signal the Ethernet Segment bandwidth
react to changes in access bandwidth in a service and host provides a mechanism to be able to react to changes in access
independent manner. Remote PEs computing the MAC path-lists based on bandwidth in a service and host independent manner. Remote PEs
global and aliasing Ethernet A-D routes now have the ability to setup computing the MAC path-lists based on global and aliasing Ethernet A-
weighted load-balancing path-lists based on the ESI access bandwidth D routes now have the ability to setup weighted load-balancing path-
received from each PE that the ESI is multi-homed to. If Ethernet A-D lists based on the ESI access bandwidth received from each PE that
per-ES route is also leveraged for IP path-list computation, as per the ESI is multi-homed to. If Ethernet A-D per-ES route is also
[EVPN-IP-ALIASING], it also provides a method to do weighted load- leveraged for IP path-list computation, as per [EVPN-IP-ALIASING], it
balancing for IP routed traffic. also provides a method to do weighted load-balancing for IP routed
traffic.
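The remote PE behavior outlined above might be sketched as follows: the bandwidth received in each PE's Ethernet A-D per-ES route is turned into a path-list in which every next hop appears in proportion to its bandwidth, and each flow is hashed onto one entry. The helper names, the CRC32 flow hash, and the replicated-list representation are assumptions for illustration; an actual forwarding plane may realize the weights differently.

```python
import zlib
from functools import reduce
from math import gcd


def weighted_path_list(es_bandwidth):
    """Expand next hops so each PE appears in proportion to the
    bandwidth it advertised for the Ethernet Segment."""
    divisor = reduce(gcd, es_bandwidth.values())
    paths = []
    for pe in sorted(es_bandwidth):
        paths.extend([pe] * (es_bandwidth[pe] // divisor))
    return paths


def pick_next_hop(paths, flow_key):
    """Map a flow to one path-list entry with a deterministic hash."""
    return paths[zlib.crc32(flow_key.encode()) % len(paths)]


# Hypothetical bandwidth values (Mbps) learned from two per-ES routes.
paths = weighted_path_list({"PE1": 20_000, "PE2": 10_000})
print(paths)  # ['PE1', 'PE1', 'PE2']
```

Because the path-list depends only on the per-ES routes, a bandwidth change on one PE updates the load-balancing ratio for every MAC behind that Ethernet Segment without any per-host or per-service signaling.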
In order to achieve weighted load-balancing of overlay BUM traffic, In order to achieve weighted load-balancing of overlay BUM traffic,
EVPN ES route (Route Type 4) is leveraged to signal the ESI bandwidth EVPN ES route (Route Type 4) is leveraged to signal the ESI bandwidth
to PEs within an ESI's redundancy group to influence per-service DF to PEs within an ESI's redundancy group to influence per-service DF
election. PEs in an ESI redundancy group now have the ability to do election. PEs in an ESI redundancy group now have the ability to do
service carving in proportion to each PE's relative ESI bandwidth. service carving in proportion to each PE's relative ESI bandwidth.
Procedures to accomplish this are described in greater detail next. Procedures to accomplish this are described in greater detail next.
3. Weighted Unicast Traffic Load-balancing 3. Weighted Unicast Traffic Load-balancing
3.1 LOCAL PE Behavior 3.1 LOCAL PE Behavior
A PE that is part of an ESI's redundancy group would advertise an A PE that is part of an Ethernet Segment's redundancy group would
additional "link bandwidth" EXT-COMM attribute with Ethernet A-D per- advertise an additional "link bandwidth" EXT-COMM attribute with
ES route (EVPN Route Type 1), that represents total bandwidth of PE's Ethernet A-D per-ES route (EVPN Route Type 1), that represents total
physical links in an ESI. BGP link bandwidth EXT-COMM defined in bandwidth of PE's physical links in an Ethernet Segment. BGP link
[BGP-LINK-BW] is re-used for this purpose. bandwidth EXT-COMM defined in [BGP-LINK-BW] is re-used for this
purpose.
3.1 Link Bandwidth Extended Community 3.1 Link Bandwidth Extended Community
Link bandwidth extended community described in [BGP-LINK-BW] for Link bandwidth extended community described in [BGP-LINK-BW] for
layer 3 VPNs is re-used here to signal local ES link bandwidth to layer 3 VPNs is re-used here to signal local ES link bandwidth to
remote PEs. Link-bandwidth extended community is however defined in remote PEs. Link-bandwidth extended community is however defined in
[BGP-LINK-BW] as optional non-transitive. In inter-AS scenarios, [BGP-LINK-BW] as optional non-transitive. In inter-AS scenarios,
link-bandwidth may need to be signaled to an eBGP neighbor along with link-bandwidth may need to be signaled to an eBGP neighbor along with
next-hop unchanged. It is work in progress with authors of [BGP-LINK- next-hop unchanged. It is work in progress with authors of [BGP-LINK-
BW] to allow for this attribute to be used as transitive in inter-AS BW] to allow for this attribute to be used as transitive in inter-AS
skipping to change at page 10, line 46 skipping to change at page 10, line 46
As per [EVPN-DF-ELECT-FRAMEWORK], all the PEs in the ES MUST As per [EVPN-DF-ELECT-FRAMEWORK], all the PEs in the ES MUST
advertise the same Capabilities and DF Type, otherwise the PEs will advertise the same Capabilities and DF Type, otherwise the PEs will
fall back to Default [RFC7432] DF Election procedure. fall back to Default [RFC7432] DF Election procedure.
The BW Capability MAY be advertised with the following DF Types: The BW Capability MAY be advertised with the following DF Types:
o Type 0: Default DF Election algorithm, as in [RFC7432] o Type 0: Default DF Election algorithm, as in [RFC7432]
o Type 1: HRW algorithm, as in [EVPN-DF-ELECT-FRAMEWORK] o Type 1: HRW algorithm, as in [EVPN-DF-ELECT-FRAMEWORK]
o Type 2: Preference algorithm, as in [EVPN-DF-PREF] o Type 2: Preference algorithm, as in [EVPN-DF-PREF]
o Type 4: HRW per-multicast flow DF Election, as in [XXX] o Type 4: HRW per-multicast flow DF Election, as in
[EVPN-PER-MCAST-FLOW-DF]
The following sections describe how the DF Election procedures are The following sections describe how the DF Election procedures are
modified for the above DF Types when the BW Capability is used. modified for the above DF Types when the BW Capability is used.
4.2 BW Capability and Default DF Election algorithm 4.2 BW Capability and Default DF Election algorithm
When all the PEs in the ES agree to use the BW Capability with DF When all the PEs in the Ethernet Segment (ES) agree to use the BW
Type 0, the Default DF Election procedure is modified as follows: Capability with DF Type 0, the Default DF Election procedure is
modified as follows:
o Each PE advertises a "Link Bandwidth" EXT-COMM attribute along o Each PE advertises a "Link Bandwidth" EXT-COMM attribute along
with the ES route to signal the PE-CE link bandwidth (LBW) for with the ES route to signal the PE-CE link bandwidth (LBW) for
the ES. the ES.
o A receiving PE MUST use the ES link bandwidth attribute o A receiving PE MUST use the ES link bandwidth attribute
received from each PE to compute a relative weight for each received from each PE to compute a relative weight for each
remote PE. remote PE.
o The DF Election procedure MUST now use this weighted list of PEs o The DF Election procedure MUST now use this weighted list of PEs
to compute the per-VLAN Designated Forwarder, such that the DF to compute the per-VLAN Designated Forwarder, such that the DF
role is distributed in proportion to this normalized weight. role is distributed in proportion to this normalized weight.
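One possible reading of the three steps above, sketched under stated assumptions: the default [RFC7432] procedure orders the PEs by IP address and picks the DF for VLAN V at ordinal (V mod N). The sketch below keeps that modulo step but replicates each PE in the ordered list according to its normalized weight. The replication approach, the function name `weighted_df`, and the example addresses and bandwidths are illustrative assumptions, not text taken from the lines shown here.

```python
import ipaddress
from functools import reduce
from math import gcd


def weighted_df(lbw_by_pe, vlan):
    """Return the DF for a VLAN from a weight-replicated, IP-ordered PE list."""
    divisor = reduce(gcd, lbw_by_pe.values())
    ordered = sorted(lbw_by_pe, key=ipaddress.ip_address)
    candidates = []
    for pe in ordered:
        # Assumption: a PE with weight w occupies w consecutive ordinals.
        candidates.extend([pe] * (lbw_by_pe[pe] // divisor))
    return candidates[vlan % len(candidates)]


# Hypothetical PEs and PE-CE link bandwidth (Mbps) from their ES routes.
lbw = {"192.0.2.1": 20_000, "192.0.2.2": 10_000}
for vlan in range(6):
    print(vlan, weighted_df(lbw, vlan))
```

Over any run of consecutive VLAN IDs whose length is a multiple of three, this assigns twice as many VLANs to 192.0.2.1 as to 192.0.2.2, matching the 2:1 bandwidth ratio.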
skipping to change at page 18, line 12 skipping to change at page 18, line 12
7.2 Informative References 7.2 Informative References
8. Acknowledgements 8. Acknowledgements
Authors would like to thank Satya Mohanty for valuable review and Authors would like to thank Satya Mohanty for valuable review and
inputs with respect to HRW algorithm refinements proposed in this inputs with respect to HRW algorithm refinements proposed in this
document. document.
Authors' Addresses Authors' Addresses
Neeraj Malhotra, Ed. Neeraj Malhotra, Editor.
Arrcus Arrcus
Email: neeraj.ietf@gmail.com Email: neeraj.ietf@gmail.com
Ali Sajassi Ali Sajassi
Cisco Cisco
Email: sajassi@cisco.com Email: sajassi@cisco.com
Jorge Rabadan Jorge Rabadan
Nokia Nokia
Email: jorge.rabadan@nokia.com Email: jorge.rabadan@nokia.com
 End of changes. 16 change blocks. 
52 lines changed or deleted 60 lines changed or added
