draft-ietf-bess-evpn-df-election-framework-01.txt   draft-ietf-bess-evpn-df-election-framework-02.txt 
skipping to change at page 1, line 14 skipping to change at page 1, line 14
Internet Draft Nokia Internet Draft Nokia
S. Mohanty, Ed. S. Mohanty, Ed.
Intended status: Standards Track A. Sajassi Intended status: Standards Track A. Sajassi
Cisco Cisco
J. Drake J. Drake
Juniper Juniper
K. Nagaraj K. Nagaraj
S. Sathappan S. Sathappan
Nokia Nokia
Expires: October 14, 2018 April 12, 2018 Expires: November 24, 2018 May 23, 2018
Framework for EVPN Designated Forwarder Election Extensibility Framework for EVPN Designated Forwarder Election Extensibility
draft-ietf-bess-evpn-df-election-framework-01 draft-ietf-bess-evpn-df-election-framework-02
Abstract Abstract
The Designated Forwarder (DF) in EVPN networks is the PE responsible The Designated Forwarder (DF) in EVPN networks is the PE responsible
for sending broadcast, unknown unicast and multicast (BUM) traffic to for sending broadcast, unknown unicast and multicast (BUM) traffic to
a multi-homed CE, on a given VLAN on a particular Ethernet Segment a multi-homed CE, on a given VLAN on a particular Ethernet Segment
(ES). The DF is selected out of a list of candidate PEs that (ES). The DF is selected out of a list of candidate PEs that
advertise the same Ethernet Segment Identifier (ESI) to the EVPN advertise the same Ethernet Segment Identifier (ESI) to the EVPN
network. By default, EVPN uses a DF Election algorithm referred to as network. By default, EVPN uses a DF Election algorithm referred to as
"Service Carving" and it is based on a modulus function (V mod N) "Service Carving" and it is based on a modulus function (V mod N)
skipping to change at page 2, line 18 skipping to change at page 2, line 18
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html http://www.ietf.org/shadow.html
This Internet-Draft will expire on October 14, 2018. This Internet-Draft will expire on November 24, 2018.
Copyright Notice Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 3, line 38 skipping to change at page 3, line 38
9.1. Normative References . . . . . . . . . . . . . . . . . . . 21 9.1. Normative References . . . . . . . . . . . . . . . . . . . 21
9.2. Informative References . . . . . . . . . . . . . . . . . . 22 9.2. Informative References . . . . . . . . . . . . . . . . . . 22
10. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 23 10. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 23
11. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 23 11. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 23
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 23 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 23
1. Conventions and Terminology 1. Conventions and Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in "OPTIONAL" in this document are to be interpreted as described in BCP
BCP14 [RFC2119] [RFC8174] when, and only when, they appear in all 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here. capitals, as shown here.
o AC and ACS - Attachment Circuit and Attachment Circuit Status. An o AC and ACS - Attachment Circuit and Attachment Circuit Status. An
AC has an Ethernet Tag associated to it. AC has an Ethernet Tag associated to it.
o BUM - refers to the Broadcast, Unknown unicast and Multicast o BUM - refers to the Broadcast, Unknown unicast and Multicast
traffic. traffic.
o DF, NDF and BDF - Designated Forwarder, Non-Designated Forwarder o DF, NDF and BDF - Designated Forwarder, Non-Designated Forwarder
and Backup Designated Forwarder and Backup Designated Forwarder
skipping to change at page 5, line 10 skipping to change at page 5, line 10
responsible for: responsible for:
o Flooding Broadcast, Unknown unicast and Multicast traffic (BUM), on o Flooding Broadcast, Unknown unicast and Multicast traffic (BUM), on
a given Ethernet Tag on a particular Ethernet Segment (ES), to the a given Ethernet Tag on a particular Ethernet Segment (ES), to the
CE. This is valid for single-active and all-active EVPN CE. This is valid for single-active and all-active EVPN
multi-homing. multi-homing.
o Sending unicast traffic on a given Ethernet Tag on a particular ES o Sending unicast traffic on a given Ethernet Tag on a particular ES
to the CE. This is valid for single-active multi-homing. to the CE. This is valid for single-active multi-homing.
Figure 1 illustrates an example that we will be used to explain the Figure 1 illustrates an example that we will use to explain the
Designated Forwarder function. Designated Forwarder function.
+---------------+ +---------------+
| IP/MPLS | | IP/MPLS |
| CORE | | CORE |
+----+ ES1 +----+ +----+ +----+ ES1 +----+ +----+
| CE1|-----| |-----------| |____ES2 | CE1|-----| |-----------| |____ES2
+----+ | PE1| | PE2| \ +----+ | PE1| | PE2| \
| |-------- +----+ \+----+ | |-------- +----+ \+----+
+----+ | | | CE2| +----+ | | | CE2|
skipping to change at page 6, line 5 skipping to change at page 6, line 5
is very important that, in case of multi-homing, only one of the is very important that, in case of multi-homing, only one of the
links be used to direct traffic to/from the core. links be used to direct traffic to/from the core.
One of the pre-requisites for this support is that participating PEs One of the pre-requisites for this support is that participating PEs
must agree amongst themselves as to who would act as the Designated must agree amongst themselves as to who would act as the Designated
Forwarder (DF). This needs to be achieved through a distributed Forwarder (DF). This needs to be achieved through a distributed
algorithm in which each participating PE independently and algorithm in which each participating PE independently and
unambiguously selects one of the participating PEs as the DF, and the unambiguously selects one of the participating PEs as the DF, and the
result should be unanimously in agreement. result should be unanimously in agreement.
The default procedure for DF election defined by [RFC7432] at the The default algorithm for DF election defined by [RFC7432] at the
granularity of (ESI,EVI) is referred to as "service carving". In this granularity of (ESI,EVI) is referred to as "service carving". In this
document, service carving or default DF Election algorithm is used document, service carving or default DF Election algorithm is used
indistinctly. With service carving, it is possible to elect multiple indistinctly. With service carving, it is possible to elect multiple
DFs per Ethernet Segment (one per EVI) in order to perform load- DFs per Ethernet Segment (one per EVI) in order to perform load-
balancing of traffic destined to a given Segment. The objective is balancing of traffic destined to a given Segment. The objective is
that the load-balancing procedures should carve up the BD space among that the load-balancing procedures should carve up the BD space among
the redundant PE nodes evenly, in such a way that every PE is the DF the redundant PE nodes evenly, in such a way that every PE is the DF
for a disjoint set of EVIs. for a disjoint set of EVIs.
The DF Election algorithm as described in [RFC7432] (Section 8.5) is The DF Election algorithm as described in [RFC7432] (Section 8.5) is
skipping to change at page 6, line 29 skipping to change at page 6, line 29
values. For example, there are N PEs: PE0, PE1,... PEN-1 ranked as values. For example, there are N PEs: PE0, PE1,... PEN-1 ranked as
per increasing IP addresses in the ordinal list; then for each VLAN per increasing IP addresses in the ordinal list; then for each VLAN
with Ethernet Tag V, configured on the Ethernet Segment ES1, PEx is with Ethernet Tag V, configured on the Ethernet Segment ES1, PEx is
the DF for VLAN V on ES1 when x equals (V mod N). In the case of the DF for VLAN V on ES1 when x equals (V mod N). In the case of
VLAN-Bundle only the lowest VLAN is used. In the case when the VLAN-Bundle only the lowest VLAN is used. In the case when the
planned density is high (meaning there is a significant number of planned density is high (meaning there is a significant number of
VLANs and the Ethernet Tags are uniformly distributed), the thinking VLANs and the Ethernet Tags are uniformly distributed), the thinking
is that the DF Election will be spread across the PEs hosting that is that the DF Election will be spread across the PEs hosting that
Ethernet Segment and good service carving can be achieved. Ethernet Segment and good service carving can be achieved.
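The modulus rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not text from either draft version: the function name default_df and the 192.0.2.x addresses are hypothetical, and all candidates are assumed to use addresses of the same family so that they can be ranked against each other.

```python
import ipaddress

def default_df(candidate_ips, ethernet_tag):
    """Pick the DF for one VLAN with the (V mod N) rule.

    candidate_ips: addresses of the PEs attached to the ES (hypothetical input).
    ethernet_tag:  the Ethernet Tag V of the VLAN.
    """
    # Build the ordinal list: rank the PEs by increasing IP address.
    # Numeric ordering matters: as strings, "10.0.0.10" would sort before "10.0.0.9".
    ordered = sorted(candidate_ips, key=ipaddress.ip_address)
    # PEx is the DF when x equals (V mod N).
    return ordered[ethernet_tag % len(ordered)]

pes = ["192.0.2.3", "192.0.2.1", "192.0.2.2"]
for tag in (10, 11, 12):
    print(tag, default_df(pes, tag))
```

With three PEs, consecutive tags rotate through the ordinal list, which is the even spread that service carving hopes for when the tags are uniformly distributed.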
The described default DF Election algorithm has some undesirable However, the described default DF Election algorithm has some
properties and in some cases can be somewhat disruptive and unfair. undesirable properties and in some cases can be somewhat disruptive
This document describes those issues and proposes a mechanism for and unfair. This document describes some of those issues and proposes
dealing with them. These mechanisms do involve changes to the default a mechanism for dealing with them. These mechanisms do involve
DF Election algorithm, however they do not require any protocol changes to the default DF Election algorithm, but they do not require
changes to the EVPN Route exchange and have minimal changes to their any changes to the EVPN Route exchange and have minimal changes to
content per se. their content per se.
In addition, there is a need to extend the DF Election procedures so
that new algorithms and capabilities are possible. A single algorithm
(the default DF Election algorithm) may not meet the requirements in
all the use-cases.
Note that while [RFC7432] elects a DF per <ES, EVI>, this document Note that while [RFC7432] elects a DF per <ES, EVI>, this document
elects a DF per <ES, BD>. This means that unlike [RFC7432], where elects a DF per <ES, BD>. This means that unlike [RFC7432], where
for a VLAN Aware Bundle service EVI there is only one DF for the EVI, for a VLAN Aware Bundle service EVI there is only one DF for the EVI,
this document specifies that there will be multiple DFs, one for each this document specifies that there will be multiple DFs, one for each
BD configured in that EVI. BD configured in that EVI.
2.2. Problem Statement 2.2. Problem Statement
This section describes some potential issues on the default DF This section describes some potential issues on the default DF
skipping to change at page 7, line 11 skipping to change at page 7, line 16
There are three fundamental problems with the current default DF There are three fundamental problems with the current default DF
Election algorithm. Election algorithm.
1- First, the algorithm will not perform well when the Ethernet Tag 1- First, the algorithm will not perform well when the Ethernet Tag
follows a non-uniform distribution, for instance when the Ethernet follows a non-uniform distribution, for instance when the Ethernet
Tags are all even or all odd. In such a case let us assume that Tags are all even or all odd. In such a case let us assume that
the ES is multi-homed to two PEs; all the VLANs will only pick one the ES is multi-homed to two PEs; all the VLANs will only pick one
of the PEs as the DF. This is very sub-optimal. It defeats the of the PEs as the DF. This is very sub-optimal. It defeats the
purpose of service carving as the DFs are not really evenly spread purpose of service carving as the DFs are not really evenly spread
across. In this particular case, in fact one of the PEs does not across. In fact, in this particular case, one of the PEs does not
get elected all as the DF, so it does not participate in the DF get elected as DF at all, so it does not participate in the DF
responsibilities at all. Consider another example where referring responsibilities at all. Consider another example where, referring
to Figure 1, let us assume that PE2, PE3, PE4 are in ascending order to Figure 1, let us assume that PE2, PE3, PE4 are in ascending order
of the IP address; and each VLAN configured on ES2 is associated of the IP address; and each VLAN configured on ES2 is associated
with an Ethernet Tag of the form (3x+1), where x is an integer. with an Ethernet Tag of the form (3x+1), where x is an integer.
This will result in PE3 always being selected as the DF. This will result in PE3 always being selected as the DF.
2- Even in the case when the Ethernet Tag distribution is uniform the 2- Even in the case when the Ethernet Tag distribution is uniform the
instance of a PE being up or down results in re-computation (v instance of a PE being up or down results in re-computation (v
mod (N-1) or v mod (N+1), as the case may be); the resulting modulus mod (N-1) or v mod (N+1), as the case may be); the resulting modulus
value need not be uniformly distributed because it can be subject value need not be uniformly distributed because it can be subject
to the primality of N-1 or N+1 as may be the case. to the primality of N-1 or N+1 as may be the case.
3- The third problem is one of disruption. Consider a case when the 3- The third problem is one of disruption. Consider a case when the
same Ethernet Segment is multi-homed to a set of PEs. When the ES same Ethernet Segment is multi-homed to a set of PEs. When the ES
is down in one of the PEs, say PE1, or PE1 itself reboots, or the is down in one of the PEs, say PE1, or PE1 itself reboots, or the
BGP process goes down or the connectivity between PE1 and an RR BGP process goes down or the connectivity between PE1 and an RR
goes down, the effective number of PEs in the system now becomes goes down, the effective number of PEs in the system now becomes
N-1 and DFs are computed for all the VLANs that are configured on N-1, and DFs are computed for all the VLANs that are configured on
that Ethernet Segment. In general, if the DF for a VLAN v happens that Ethernet Segment. In general, if the DF for a VLAN v happens
not to be PE1, but some other PE, say PE2, it is likely that some not to be PE1, but some other PE, say PE2, it is likely that some
other PE will become the new DF. This is not desirable. Similarly other PE will become the new DF. This is not desirable. Similarly
when a new PE hosts the same Ethernet Segment, the mapping again when a new PE hosts the same Ethernet Segment, the mapping again
changes because of the mod operation. This results in needless changes because of the modulus operation. This results in needless
churn. Again referring to Figure 1, say v1, v2 and v3 are VLANs churn. Again referring to Figure 1, say v1, v2 and v3 are VLANs
configured on ES2 with associated Ethernet Tags of value 999, 1000 configured on ES2 with associated Ethernet Tags of value 999, 1000
and 10001 respectively. So PE1, PE2 and PE3 are also the DFs for and 10001 respectively. So PE1, PE2 and PE3 are the DFs for v1, v2
v1, v2 and v3 respectively. Now when PE3 goes down, PE2 will and v3 respectively. Now when PE3 goes down, PE2 will become the
become the DF for v1 and PE1 will become the DF for v2. DF for v1 and PE1 will become the DF for v2.
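The numeric examples in the three items above can be checked with a short Python sketch. The helper elect is a hypothetical name; it assumes the list passed in is already ordered by increasing IP address, and it uses the PE names and Ethernet Tag values exactly as they appear in the text.

```python
def elect(ordered_pes, ethernet_tag):
    """Default rule: the PE at ordinal (V mod N) is the DF."""
    return ordered_pes[ethernet_tag % len(ordered_pes)]

# Item 1: two PEs and only even Ethernet Tags -> one PE is never elected.
even = [elect(["PE1", "PE2"], tag) for tag in range(2, 21, 2)]
print(set(even))

# Item 1, second example: tags of the form 3x+1 on an ES with three PEs.
skew = {elect(["PE2", "PE3", "PE4"], 3 * x + 1) for x in range(10)}
print(skew)

# Item 3: DF assignment before and after PE3 goes down.
tags = {"v1": 999, "v2": 1000, "v3": 10001}
before = {v: elect(["PE1", "PE2", "PE3"], t) for v, t in tags.items()}
after = {v: elect(["PE1", "PE2"], t) for v, t in tags.items()}
print(before)
print(after)
```

In the last case v1 and v2 change DF even though neither of their DFs failed, which is the needless churn the text describes.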
One point to note is that the current DF election algorithm assumes One point to note is that the default DF election algorithm assumes
that all the PEs who are multi-homed to the same Ethernet Segment and that all the PEs who are multi-homed to the same Ethernet Segment
interested in the DF Election by exchanging EVPN routes have a V4 (and interested in the DF Election by exchanging EVPN routes) use an
peering with each other or via a Route Reflector. This need not be Originating Router's IP Address of the same family. This does not
the case as there can be a v6 peering and supporting the EVPN need to be the case as the EVPN address-family can be carried over a
address-family. v4 or v6 peering, and the PEs attached to the same ES may use an
address of either family.
Mathematically, a conventional hash function maps a key k to a number Mathematically, a conventional hash function maps a key k to a number
i representing one of m hash buckets through a function h(k) i.e. i representing one of m hash buckets through a function h(k) i.e.
i=h(k). In the EVPN case, h is simply a modulo-m hash function viz. i=h(k). In the EVPN case, h is simply a modulo-m hash function viz.
h(v) = v mod N, where N is the number of PEs that are multi-homed to h(v) = v mod N, where N is the number of PEs that are multi-homed to
the Ethernet Segment in discussion. It is well-known that for good the Ethernet Segment in discussion. It is well-known that for good
hash distribution using the modulus operation, the modulus N should hash distribution using the modulus operation, the modulus N should
be a prime-number not too close to a power of 2 [CLRS2009]. When the be a prime-number not too close to a power of 2 [CLRS2009]. When the
effective number of PEs changes from N to N-1 (or vice versa); all effective number of PEs changes from N to N-1 (or vice versa); all
the objects (VLAN V) will be remapped except those for which V mod N the objects (VLAN V) will be remapped except those for which V mod N
and V mod (N-1) refer to the same PE in the previous and subsequent and V mod (N-1) refer to the same PE in the previous and subsequent
ordinal rankings respectively. ordinal rankings respectively. From a forwarding perspective, this is
a churn, as it results in programming the PE side ports as blocking
or non-blocking at potentially all PEs when the DF changes.
From a forwarding perspective, this is a churn, as it results in This document addresses this problem and furnishes a solution to this
programming the CE and PE side ports as blocking or non-blocking at
potentially all PEs when the DF changes either because (i) a new PE
is added or (ii) another one goes down or loses connectivity or else
cannot take part in the DF election process for whatever reason. This
document addresses this problem and furnishes a solution to this
undesirable behavior. undesirable behavior.
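To give a feel for the size of the remapping, the following Python sketch counts how many VLANs change DF when the number of PEs drops from N to N-1. It is an illustration only: the function name remap_stats is hypothetical, and it assumes the PE that leaves is the last one in the ordinal ranking, so that the ordinals of the remaining PEs do not shift.

```python
def remap_stats(ethernet_tags, n):
    """Return (changed, forced) when the last-ranked of n PEs is removed.

    changed: VLANs whose DF ordinal differs between V mod n and V mod (n-1).
    forced:  the subset whose old DF was the removed PE, so a move is unavoidable.
    """
    changed = forced = 0
    for v in ethernet_tags:
        old, new = v % n, v % (n - 1)
        if old != new:
            changed += 1
            if old == n - 1:  # the old DF was the PE that went away
                forced += 1
    return changed, forced

changed, forced = remap_stats(range(1, 601), 3)
print(changed, forced, changed - forced)
```

For Ethernet Tags 1 to 600 and N going from 3 to 2, this reports 400 changes, of which only 200 were unavoidable; the other 200 VLANs change DF although their DF is still present.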
2.2.2. Traffic Black-Holing on Individual AC Failures 2.2.2. Traffic Black-Holing on Individual AC Failures
As discussed in section 2.1 the default DF Election algorithm defined As discussed in section 2.1 the default DF Election algorithm defined
by [RFC7432] takes into account only two variables in modulus by [RFC7432] takes into account only two variables in the modulus
function for a given ES: the existence of the PE's IP address on the function for a given ES: the existence of the PE's IP address on the
candidate list and the locally provisioned Ethernet Tags. candidate list and the locally provisioned Ethernet Tags.
If the DF for an <ESI, EVI> fails (due to physical link/node If the DF for an <ESI, EVI> fails (due to physical link/node
failures) an ES route withdrawal will make the Non-DF (NDF) PEs re- failures) an ES route withdrawal will make the Non-DF (NDF) PEs re-
elect the DF for that <ESI, EVI> and the service will be recovered. elect the DF for that <ESI, EVI> and the service will be recovered.
However the default DF election procedure does not provide a However, the default DF election procedure does not provide a
protection against "logical" failures or human errors that may occur protection against "logical" failures or human errors that may occur
at service level on the DF, while the list of active PEs for a given at service level on the DF, while the list of active PEs for a given
ES does not change. These failures may have an impact not only on the ES does not change. These failures may have an impact not only on the
local PE where the issue happens, but also on the rest of the PEs of local PE where the issue happens, but also on the rest of the PEs of
the ES. Some examples of such logical failures are listed below: the ES. Some examples of such logical failures are listed below:
a) A given individual Attachment Circuit (AC) defined in an ES is a) A given individual Attachment Circuit (AC) defined in an ES is
accidentally shut down or even not provisioned yet (hence the accidentally shut down or even not provisioned yet (hence the
Attachment Circuit Status - ACS - is DOWN), while the ES is Attachment Circuit Status - ACS - is DOWN), while the ES is
operationally active (since the ES route is active). operationally active (since the ES route is active).
skipping to change at page 9, line 6 skipping to change at page 9, line 10
b) A given MAC-VRF - with a defined ES - is shut down or not b) A given MAC-VRF - with a defined ES - is shut down or not
provisioned yet, while the ES is operationally active (since the provisioned yet, while the ES is operationally active (since the
ES route is active). In this case, the ACS of all the ACs defined ES route is active). In this case, the ACS of all the ACs defined
in that MAC-VRF is considered to be DOWN. in that MAC-VRF is considered to be DOWN.
Neither (a) nor (b) will trigger the DF re-election on the remote Neither (a) nor (b) will trigger the DF re-election on the remote
multi-homed PEs for a given ES since the ACS is not taken into multi-homed PEs for a given ES since the ACS is not taken into
account in the DF election procedures. While the ACS is used as a DF account in the DF election procedures. While the ACS is used as a DF
election tie-breaker and trigger in VPLS multi-homing procedures election tie-breaker and trigger in VPLS multi-homing procedures
[VPLS-MH], there is no procedure defined in EVPN [RFC7432] to trigger [VPLS-MH], there is no procedure defined in EVPN [RFC7432] to trigger
the DF re- election based on the ACS change on the DF. the DF re-election based on the ACS change on the DF.
Figure 2 illustrates the described issue with an example. Figure 2 illustrates the described issue with an example.
+---+ +---+
|CE4| |CE4|
+---+ +---+
| |
PE4 | PE4 |
+-----+-----+ +-----+-----+
+---------------| +-----+ |---------------+ +---------------| +-----+ |---------------+
skipping to change at page 10, line 28 skipping to change at page 10, line 31
keep assuming PE2 is the DF. keep assuming PE2 is the DF.
Quoting [RFC7432], "when an Ethernet Tag is decommissioned on an Quoting [RFC7432], "when an Ethernet Tag is decommissioned on an
Ethernet Segment, then the PE MUST withdraw the Ethernet A-D per EVI Ethernet Segment, then the PE MUST withdraw the Ethernet A-D per EVI
route(s) announced for the <ESI, Ethernet Tags> that are impacted by route(s) announced for the <ESI, Ethernet Tags> that are impacted by
the decommissioning", however, while this A-D per EVI route the decommissioning", however, while this A-D per EVI route
withdrawal is used at the remote PEs performing aliasing or backup withdrawal is used at the remote PEs performing aliasing or backup
procedures, it is not used to influence the DF election for the procedures, it is not used to influence the DF election for the
affected EVIs. affected EVIs.
This document modifies the default DF Election procedure so that the This document adds an optional modification of the DF Election
ACS may be taken into account as a variable in the DF election, and procedure so that the ACS may be taken into account as a variable in
therefore EVPN can provide protection against logical failures. the DF election, and therefore EVPN can provide protection against
logical failures.
2.3. The Need for Extending the Default DF Election in EVPN 2.3. The Need for Extending the Default DF Election in EVPN
Section 2.2 describes some of the issues that exist in the default DF Section 2.2 describes some of the issues that exist in the default DF
Election procedures. In order to address those issues, this document Election procedures. In order to address those issues, this document
describes a new DF Election algorithm and a new capability that can introduces a new DF Election framework. This framework allows the PEs
influence the DF Election result: to agree on a common DF election type, as well as the capabilities to
enable during the DF Election procedure. In general, "DF Election
Type" refers to the type of DF election algorithm that takes a number
of parameters as input and determines the DF PE. A "DF Election
capability" refers to an additional feature that can be executed
along with the DF election algorithm, such as modifying the inputs
(or list of candidate PEs) before the DF Election algorithm chooses
the DF.
Within this framework, this document defines a new DF Election
algorithm and a new capability that can influence the DF Election
result:
o The new DF Election algorithm is referred to as "Highest Random o The new DF Election algorithm is referred to as "Highest Random
Weight" (HRW). The HRW procedures are described in section 4. Weight" (HRW). The HRW procedures are described in section 4.
o The new DF Election capability is referred to as "AC-Influenced DF o The new DF Election capability is referred to as "AC-Influenced DF
Election" (AC-DF). The AC-DF procedures are described in section 5. Election" (AC-DF). The AC-DF procedures are described in section 5.
o HRW and AC-DF mechanisms are independent of each other. Therefore, o HRW and AC-DF mechanisms are independent of each other. Therefore,
a PE MAY support either HRW or AC-DF independently or MAY support a PE MAY support either HRW or AC-DF independently or MAY support
both of them together. A PE MAY also support AC-DF capability along both of them together. A PE MAY also support AC-DF capability along
with the default DF election algorithm per [RFC7432]. with the default DF election algorithm per [RFC7432].
o In general, a DF Election Type refers to the type of DF election
algorithm that takes a number of parameters as input and determines
the DF PE. A DF Election capability refers to an additional feature
that can be executed along with the DF election algorithm, such as
modifying the inputs (or list of candidate PEs) before the DF
Election algorithm chooses the DF.
In addition, this document defines a way to indicate the support of In addition, this document defines a way to indicate the support of
HRW and/or AC-DF along with the EVPN ES routes advertised for a given HRW and/or AC-DF along with the EVPN ES routes advertised for a given
ES. Refer to section 3.2 for more details. ES. Refer to section 3.2 for more details.
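The split between a DF Election Type (the algorithm) and a DF Election capability (a step that can modify the candidate list first) can be pictured with the Python sketch below. It is not the procedure of section 4 or section 5, which are not reproduced here. All names (modulus_algorithm, ac_filter, elect) are hypothetical, and the filter is only one assumed reading of "modifying the inputs (or list of candidate PEs)".

```python
def modulus_algorithm(candidates, ethernet_tag):
    """Default DF Election Type: ordinal (V mod N) in the candidate list."""
    return candidates[ethernet_tag % len(candidates)]

def ac_filter(candidates, acs_up):
    """Assumed capability step: drop PEs whose AC status for this VLAN is DOWN."""
    return [pe for pe in candidates if acs_up.get(pe, False)]

def elect(candidates, ethernet_tag, algorithm, acs_up=None):
    """Run the optional capability first, then the chosen algorithm."""
    pool = candidates if acs_up is None else ac_filter(candidates, acs_up)
    if not pool:
        return None  # no eligible PE for this VLAN
    return algorithm(pool, ethernet_tag)

pes = ["PE1", "PE2", "PE3"]  # assumed already ordered by IP address
print(elect(pes, 1, modulus_algorithm))
print(elect(pes, 1, modulus_algorithm,
            acs_up={"PE1": True, "PE2": False, "PE3": True}))
```

Because the algorithm is passed in as a parameter, the same capability step could sit in front of the default algorithm or of a different one, which mirrors the independence between HRW and AC-DF stated in the bullets above.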
3. Designated Forwarder Election Protocol and BGP Extensions 3. Designated Forwarder Election Protocol and BGP Extensions
This section describes the BGP extensions required to support the new This section describes the BGP extensions required to support the new
DF Election procedures. In addition, since the specification in EVPN DF Election procedures. In addition, since the specification in EVPN
[RFC7432] does leave several questions open as to the precise final [RFC7432] does leave several questions open as to the precise final
state machine behavior of the DF election, section 3.1 describes state machine behavior of the DF election, section 3.1 describes
precisely the intended behavior. precisely the intended behavior.
3.1 The DF Election Finite State Machine (FSM) 3.1 The DF Election Finite State Machine (FSM)
Per [RFC7432], the FSM described in Figure 3 is executed per Per [RFC7432], the FSM described in Figure 3 is executed per
<ESI,VLAN> in case of VLAN-based service or <ESI,[VLANs in VLAN- <ESI,VLAN> in case of VLAN-based service or <ESI,[VLANs in VLAN-
Bundle]> in case of VLAN-Bundle on each participating PE. Bundle]> in case of VLAN-Bundle on each participating PE.
Observe that currently the VLANs are derived from local configuration Observe that currently the VLANs are derived from local configuration
and the FSM does not provide any protection against misconfiguration and the FSM does not provide any protection against misconfiguration
where same EVI,ESI combination has different set of VLANs on where the same (EVI,ESI) combination has different set of VLANs on
different participating PEs or one of the PEs elects to consider different participating PEs or one of the PEs elects to consider
VLANs as VLAN-Bundle and another as separate VLANs for election VLANs as VLAN-Bundle and another as separate VLANs for election
purposes (service type mismatch). purposes (service type mismatch).
The FSM is normative in the sense that any design or implementation The FSM is conceptual and any design or implementation MUST comply
MUST behave towards external peers and as observable external with a behavior equivalent to the one outlined in this FSM.
behavior (DF) in a manner equivalent to this FSM.
LOST_ES LOST_ES
RCVD_ES RCVD_ES RCVD_ES RCVD_ES
LOST_ES +----+ LOST_ES +----+
+----+ | v +----+ | v
| | ++----++ RCVD_ES | | ++----++ RCVD_ES
| +-+----+ ES_UP | DF +<--------+ | +-+----+ ES_UP | DF +<--------+
+->+ INIT +---------------> WAIT | | +->+ INIT +---------------> WAIT | |
++-----+ +----+-+ | ++-----+ +----+-+ |
^ | | ^ | |
skipping to change at page 12, line 35 skipping to change at page 12, line 35
| VLAN_CHANGE | | VLAN_CHANGE |
| | | |
+-------------------------------------+ +-------------------------------------+
Figure 3 DF Election Finite State Machine Figure 3 DF Election Finite State Machine
States: States:
1. INIT: Initial State 1. INIT: Initial State
2. DF WAIT: State in which the participants waits for enough 2. DF WAIT: State in which the participant waits for enough
information to perform the DF election for the EVI/ESI/VLAN information to perform the DF election for the EVI/ESI/VLAN
combination. combination.
3. DF CALC: State in which the new DF is recomputed. 3. DF CALC: State in which the new DF is recomputed.
4. DF DONE: State in which the according DF for the EVI/ESI/VLAN 4. DF DONE: State in which the according DF for the EVI/ESI/VLAN
combination has been elected. combination has been elected.
Events: Events:
1. ES_UP: The ESI has been locally configured as 'up'. 1. ES_UP: The ESI has been locally configured as 'up'.
2. ES_DOWN: The ESI has been locally configured as 'down'. 2. ES_DOWN: The ESI has been locally configured as 'down'.
3. VLAN_CHANGE: The VLANs configured in a bundle that uses the ESI 3. VLAN_CHANGE: The VLANs configured in a bundle (that uses the ESI)
changed. This event is necessary for VLAN-Bundles only. changed. This event is necessary for VLAN-Bundles only.
4. DF_TIMER: DF Wait timer has expired. 4. DF_TIMER: DF Wait timer has expired.
5. RCVD_ES: A new or changed Ethernet Segment Route is received in a 5. RCVD_ES: A new or changed Ethernet Segment Route is received in a
BGP REACH UPDATE. Receiving an unchanged UPDATE MUST NOT trigger BGP REACH UPDATE. Receiving an unchanged UPDATE MUST NOT trigger
this event. this event.
6. LOST_ES: A BGP UNREACH UPDATE for a previously received Ethernet 6. LOST_ES: A BGP UNREACH UPDATE for a previously received Ethernet
Segment route has been received. If an UNREACH is seen for a Segment route has been received. If an UNREACH is seen for a
route that has not been advertised previously, the event MUST NOT route that has not been advertised previously, the event MUST NOT
be triggered. be triggered.
7. CALCULATED: DF has been successfully calculated. 7. CALCULATED: DF has been successfully calculated.
Corresponding actions when transitions are performed or states are Corresponding actions when transitions are performed or states are
entered/exited: entered/exited:
1. ANY STATE on ES_DOWN: (i)stop DF timer (ii) assume non-DF for 1. ANY STATE on ES_DOWN: (i) stop DF timer (ii) assume non-DF for
local PE. local PE.
2. INIT on ES_UP: (i)do nothing. 2. INIT on ES_UP: transition to DF_WAIT.
3. INIT on RCVD_ES, LOST_ES: (i)do nothing. 3. INIT on RCVD_ES, LOST_ES: do nothing.
4. DF_WAIT on entering the state: (i) start DF timer if not started 4. DF_WAIT on entering the state: (i) start DF timer if not started
already or expired (ii) assume non-DF for local PE. already or expired (ii) assume non-DF for local PE.
5. DF_WAIT on RCVD_ES, LOST_ES: do nothing. 5. DF_WAIT on RCVD_ES, LOST_ES: do nothing.
6. DF_WAIT on DF_TIMER: do nothing. 6. DF_WAIT on DF_TIMER: transition to DF_CALC.
7. DF_CALC on entering or re-entering the state: (i) rebuild 7. DF_CALC on entering or re-entering the state: (i) rebuild
according list and hashes and perform election (ii) FSM generates candidate list, hash and perform election (ii) Afterwards FSM
CALCULATED event against itself. generates CALCULATED event against itself.
8. DF_CALC on LOST_ES or VLAN_CHANGE: do nothing. 8. DF_CALC on LOST_ES or VLAN_CHANGE: do nothing.
9. DF_CALC on RCVD_ES: do nothing. 9. DF_CALC on RCVD_ES: transition to DF_WAIT.
10. DF_CALC on CALCULATED: (i) mark election result for VLAN or 10. DF_CALC on CALCULATED: mark election result for VLAN or bundle,
bundle. and transition to DF_DONE.
11. DF_DONE on exiting the state: (i)if RFC7432 election or new 11. DF_DONE on exiting the state: (i) if [RFC7432] election or new
election and lost primary DF then assume non-DF for local PE for election and lost primary DF then assume non-DF for local PE for
VLAN or VLAN-Bundle. VLAN or VLAN-Bundle.
12. DF_DONE on VLAN_CHANGE or LOST_ES: do nothing. 12. DF_DONE on VLAN_CHANGE or LOST_ES: transition to DF_CALC.
13. DF_DONE on RCVD_ES: transition to DF_WAIT.
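The transitions listed above (as given in the right-hand, -02 column) can be sketched as a small table-driven state machine. This is only an illustrative sketch, not a normative implementation: the names State, Event, TRANSITIONS, next_state and run are assumptions introduced here, timers and the election itself are left out, and the return to INIT on ES_DOWN is assumed from the partly visible Figure 3 rather than stated in the action list.

```python
from enum import Enum, auto


class State(Enum):
    INIT = auto()
    DF_WAIT = auto()
    DF_CALC = auto()
    DF_DONE = auto()


class Event(Enum):
    ES_UP = auto()
    ES_DOWN = auto()
    VLAN_CHANGE = auto()
    DF_TIMER = auto()
    RCVD_ES = auto()
    LOST_ES = auto()
    CALCULATED = auto()


# (state, event) -> next state. Pairs that are not listed correspond to
# "do nothing" in the action list and leave the state unchanged.
TRANSITIONS = {
    (State.INIT, Event.ES_UP): State.DF_WAIT,
    (State.DF_WAIT, Event.DF_TIMER): State.DF_CALC,
    (State.DF_CALC, Event.RCVD_ES): State.DF_WAIT,
    (State.DF_CALC, Event.CALCULATED): State.DF_DONE,
    (State.DF_DONE, Event.VLAN_CHANGE): State.DF_CALC,
    (State.DF_DONE, Event.LOST_ES): State.DF_CALC,
    (State.DF_DONE, Event.RCVD_ES): State.DF_WAIT,
}


def next_state(state, event):
    """Return the state reached from `state` when `event` occurs."""
    if event is Event.ES_DOWN:
        # Assumption: ES_DOWN leads back to INIT from any state.
        return State.INIT
    return TRANSITIONS.get((state, event), state)


def run(events, state=State.INIT):
    """Feed a sequence of events to the FSM and return the final state."""
    for event in events:
        state = next_state(state, event)
    return state
```

For example, run([Event.ES_UP, Event.DF_TIMER, Event.CALCULATED]) ends in State.DF_DONE, and a subsequent RCVD_ES moves the FSM back to State.DF_WAIT. A table keeps the "do nothing" cases implicit, which mirrors how the action list only names the transitions that change state.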
3.2 The DF Election Extended Community 3.2 The DF Election Extended Community
For the DF election procedures to be globally convergent and For the DF election procedures to be globally consistent and
unanimous, it is necessary that all the participating PEs agree on unanimous, it is necessary that all the participating PEs agree on
the DF Election algorithm to be used. For instance, it is not the DF Election type and capabilities to be used. For instance, it is
possible that some PEs continue to use the default DF Election not possible that some PEs continue to use the default DF Election
algorithm and some PEs use HRW. For brown-field deployments and for algorithm and some PEs use HRW. For brown-field deployments and for
interoperability with legacy boxes, it is important that all PEs interoperability with legacy boxes, it is important that all PEs
have the capability to fall back on the Default DF Election. have the capability to fall back on the Default DF Election.
A PE can indicate its willingness to support HRW and/or AC-DF by A PE can indicate its willingness to support HRW and/or AC-DF by
signaling a DF Election Extended Community along with the Ethernet signaling a DF Election Extended Community along with the Ethernet
Segment Route (Type-4). Segment Route (Type-4).
The DF Election Extended Community is a new BGP transitive extended The DF Election Extended Community is a new BGP transitive extended
community attribute [RFC4360] that is defined to identify the DF community attribute [RFC4360] that is defined to identify the DF
election procedure to be used for the Ethernet Segment. Figure 4 election procedure to be used for the Ethernet Segment. Figure 4
skipping to change at page 15, line 22 skipping to change at page 15, line 26
- Bit 25: AC-DF (AC-Influenced DF Election, explained in this - Bit 25: AC-DF (AC-Influenced DF Election, explained in this
document). When set to 1, it indicates the desire to use AC- document). When set to 1, it indicates the desire to use AC-
Influenced DF Election with the rest of the PEs in the ES. Influenced DF Election with the rest of the PEs in the ES.
- Bits 26-31: Unassigned. - Bits 26-31: Unassigned.
The DF Election Extended Community is used as follows: The DF Election Extended Community is used as follows:
o A PE SHOULD attach the DF Election Extended Community to any o A PE SHOULD attach the DF Election Extended Community to any
advertised ES route and the Extended Community MUST be sent if the advertised ES route and the Extended Community MUST be sent if the
ES is locally configured for DF Type HRW and/or AC-DF. In the ES is locally configured with a DF election type different from the
Extended Community, the PE indicates the desired "DF Type" Default Election algorithm or if a capability is required to be
algorithm and "Bitmap" capabilities to be used for the ES. Only one used. In the Extended Community, the PE indicates the desired "DF
DF Election Extended Community can be sent along with an ES route. Type" algorithm and "Bitmap" capabilities to be used for the ES.
- Only one DF Election Extended Community can be sent along with an
ES route. Note that the intent is not for the advertising PE to
indicate all the supported DF Types and capabilities, but signal
the preferred ones.
- DF Types 0 and 1 can be both used with bit AC-DF set to 0 or 1. - DF Types 0 and 1 can be both used with bit AC-DF set to 0 or 1.
- In general, a specific DF Type MAY determine the use of the - In general, a specific DF Type MAY determine the use of the
reserved bits in the Extended Community. In case of DF Type HRW, reserved bits in the Extended Community. In case of DF Type HRW,
the reserved bits will be sent as 0 and will be ignored on the reserved bits will be sent as 0 and will be ignored on
reception. reception.
o When a PE receives the ES Routes from all the other PEs for the ES o When a PE receives the ES Routes from all the other PEs for the ES
in question, it checks to see if all the advertisements have the in question, it checks to see if all the advertisements have the
extended community with the same DF Type and Bitmap: extended community with the same DF Type and Bitmap:
- In the case that they do, this particular PE will follow the - In the case that they do, this particular PE MUST follow the
procedures for the advertised DF Type and capabilities. For procedures for the advertised DF Type and capabilities. For
instance, if all ES routes for a given ES indicate DF Type HRW instance, if all ES routes for a given ES indicate DF Type HRW
and AC-DF set to 1, the receiving PE and by induction all the and AC-DF set to 1, the receiving PE and by induction all the
other PEs in the ES will proceed to do DF Election as per the HRW other PEs in the ES will proceed to do DF Election as per the HRW
Algorithm and following the AC-DF procedures. Algorithm and following the AC-DF procedures.
- Otherwise if even a single advertisement for the type-4 route is - Otherwise if even a single advertisement for the type-4 route is
not received with the locally configured DF Type and capability, not received with the locally configured DF Type and capability,
the default DF Election algorithm (modulus) MUST be the default DF Election algorithm (modulus) MUST be
used as in [RFC7432]. used as in [RFC7432].
- The absence of the DF Election Extended Community MUST be - The absence of the DF Election Extended Community MUST be
interpreted by a receiving PE as an indication of the default DF interpreted by a receiving PE as an indication of the default DF
Election algorithm on the sending PE, that is, DF Type 0 and no Election algorithm on the sending PE, that is, DF Type 0 and no
DF Election capabilities. DF Election capabilities.
o When all the PEs in an ES advertise DF Type 255, they will rely on o When all the PEs in an ES advertise DF Type 255, they will rely on
the local policy to decide how to proceed with the DF Election. the local policy to decide how to proceed with the DF Election.
o For any new capability defined in the future, the
applicability/compatibility of this new capability to the existing
DF types must be assessed on a case-by-case basis.
o Likewise, for any new DF type defined in the future, its
applicability/compatibility to the existing capabilities must be
assessed on a case-by-case basis.
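The agreement check described in this section can be sketched as follows. This is a minimal illustration under stated assumptions: the names DfElectionEC, DEFAULT and agreed_procedure are hypothetical, the Bitmap is treated as an opaque integer (no bit positions are encoded here), and a received route without the Extended Community is represented as None.

```python
from typing import Iterable, NamedTuple, Optional


class DfElectionEC(NamedTuple):
    """Hypothetical in-memory form of the DF Election Extended Community."""
    df_type: int  # 0 = default (modulus), 1 = HRW, 255 = local policy
    bitmap: int   # capability bits, treated as opaque in this sketch


# Absence of the Extended Community is read as DF Type 0, no capabilities.
DEFAULT = DfElectionEC(df_type=0, bitmap=0)


def agreed_procedure(local: DfElectionEC,
                     received: Iterable[Optional[DfElectionEC]]) -> DfElectionEC:
    """Return the DF Type and Bitmap this PE should follow for the ES.

    `received` holds one entry per ES route from the other PEs in the ES;
    None means the route carried no DF Election Extended Community.
    """
    for ec in received:
        effective = DEFAULT if ec is None else ec
        if effective != local:
            # A single mismatch forces the default election as in [RFC7432].
            return DEFAULT
    return local
```

With local = DfElectionEC(1, 0x1), the call agreed_procedure(local, [DfElectionEC(1, 0x1)]) returns local, while adding a single None to the list makes it return DEFAULT. The value 0x1 is only a placeholder for "some capability set".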
3.3 Auto-Derivation of ES-Import Route Target 3.3 Auto-Derivation of ES-Import Route Target
Section 7.6 of [RFC7432] describes how the value of the ES-Import Section 7.6 of [RFC7432] describes how the value of the ES-Import
Route Target for ESI types 1, 2, and 3 can be auto-derived by using Route Target for ESI types 1, 2, and 3 can be auto-derived by using
the high-order six bytes of the nine byte ESI value. This document the high-order six bytes of the nine byte ESI value. The same auto-
extends the same auto-derivation procedure to ESI types 0, 4, and 5. derivation procedure can be extended to ESI types 0, 4, and 5 as long
as it is ensured that the auto-derived values for ES-Import RT among
different ES types don't overlap.
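As a small illustration of the auto-derivation, the sketch below extracts the six octets that would be used as the ES-Import Route Target value. It assumes the ESI is held as 10 octets, with the first octet being the ESI Type and the remaining nine the ESI value, as in [RFC7432]. The function name es_import_rt_value is hypothetical, and the sketch does not check that values derived from different ESI types do not overlap.

```python
def es_import_rt_value(esi: bytes) -> bytes:
    """Return the high-order six octets of the nine-octet ESI value."""
    if len(esi) != 10:
        raise ValueError("an ESI is expected to be exactly 10 octets")
    esi_value = esi[1:]   # drop the one-octet ESI Type
    return esi_value[:6]  # high-order six octets of the ESI value
```

For instance, an ESI of type 1 whose value octets are 01 02 ... 09 yields the six octets 01 02 03 04 05 06.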
4. The Highest Random Weight DF Election Type 4. The Highest Random Weight DF Election Type
The procedure discussed in this section is applicable to the DF The procedure discussed in this section is applicable to the DF
Election in EVPN Services [RFC7432] and EVPN Virtual Private Wire Election in EVPN Services [RFC7432] and EVPN Virtual Private Wire
Services [RFC8214]. Services [RFC8214].
Highest Random Weight (HRW) as defined in [HRW1999] is originally Highest Random Weight (HRW) as defined in [HRW1999] is originally
proposed in the context of Internet Caching and proxy Server load proposed in the context of Internet Caching and proxy Server load
balancing. Given an object name and a set of servers, HRW maps a balancing. Given an object name and a set of servers, HRW maps a
skipping to change at page 16, line 48 skipping to change at page 17, line 23
to the key distribution and imparts a good uniform distribution of to the key distribution and imparts a good uniform distribution of
the hash output is an important aspect of the algorithm. Fortunately the hash output is an important aspect of the algorithm. Fortunately
many such hash functions exist. [HRW1999] provides pseudo-random many such hash functions exist. [HRW1999] provides pseudo-random
functions based on Unix utilities rand and srand and easily functions based on Unix utilities rand and srand and easily
constructed XOR functions that perform considerably well. This constructed XOR functions that perform considerably well. This
imparts very good properties in the load balancing context. Also each imparts very good properties in the load balancing context. Also each
server independently and unambiguously arrives at the primary server server independently and unambiguously arrives at the primary server
selection. HRW already finds use in multicast and ECMP [RFC2991], selection. HRW already finds use in multicast and ECMP [RFC2991],
[RFC2992]. [RFC2992].
In the default DF Election algorithm (Section 2.1), whenever a new PE
comes up or an existing PE goes down, there is a significant interval
before the change is noticed by all peer PEs as it has to be conveyed
by the BGP update message involving the type-4 route. There is a
timer to batch all the messages before triggering the service carving
procedures.
When the timer expires, each PE will build the ordered list and
follow the procedures for DF Election. In the proposed method which
we will describe shortly this "jittered" behavior is retained.
4.1. HRW and Consistent Hashing 4.1. HRW and Consistent Hashing
HRW is not the only algorithm that addresses the object to server HRW is not the only algorithm that addresses the object to server
mapping problem with goals of fair load distribution, redundancy and mapping problem with goals of fair load distribution, redundancy and
fast access. There is another family of algorithms that also fast access. There is another family of algorithms that also
addresses this problem; these fall under the umbrella of the addresses this problem; these fall under the umbrella of the
Consistent Hashing Algorithms [CHASH]. These will not be considered Consistent Hashing Algorithms [CHASH]. These will not be considered
here. here.
4.2. HRW Algorithm for EVPN DF Election 4.2. HRW Algorithm for EVPN DF Election
The applicability of HRW to DF Election is described here. Let DF(v) The applicability of HRW to DF Election is described here. Let DF(v)
denote the Designated Forwarder and BDF(v) the Backup Designated denote the Designated Forwarder and BDF(v) the Backup Designated
Forwarder for the Ethernet Tag v, where v is the VLAN, Si is the IP Forwarder for the Ethernet Tag v, where v is the VLAN, Si is the IP
address of server i, Es denotes the Ethernet Segment Identifier and address of server i, Es denotes the Ethernet Segment Identifier and
weight is a pseudo-random function of v and Si. weight is a pseudo-random function of v and Si.
Note that while the DF election algorithm in [RFC7432] uses PE Note that while the DF election algorithm in [RFC7432] uses PE
address and Ethernet Tag as inputs, this document uses PE address, address and vlan as inputs, this document uses PE address, ESI, and
ESI, and Ethernet Tag as inputs. This is because if the same set of vlan as inputs. This is because if the same set of PEs are multi-
PEs are multi-homed to the same set of ESes, then the DF election homed to the same set of ESes, then the DF election algorithm used in
algorithm used in [RFC7432] would result in the same PE being elected [RFC7432] would result in the same PE being elected DF for the same
DF for the same set of broadcast domains on each ES, which can have set of broadcast domains on each ES, which can have adverse side-
adverse side-effects on both load balancing and redundancy. Including effects on both load balancing and redundancy. Including ESI in the
ESI in the DF election algorithm introduces additional entropy which DF election algorithm introduces additional entropy which
significantly reduces the probability of the same PE being elected DF significantly reduces the probability of the same PE being elected DF
for the same set of broadcast domains on each ES. Therefore, the ESI for the same set of broadcast domains on each ES. Therefore, the ESI
value in the Weight function below SHOULD be set to that of value in the Weight function below SHOULD be set to that of
corresponding ES. The ESI value MAY be set to all 0's in the Weight corresponding ES. The ESI value MAY be set to all 0's in the Weight
function below if the operator chooses so. function below if the operator chooses so.
In case of a VLAN-Bundle service, v denotes the lowest VLAN similar In case of a VLAN-Bundle service, v denotes the lowest VLAN similar
to the 'lowest VLAN in bundle' logic of [RFC7432]. to the 'lowest VLAN in bundle' logic of [RFC7432].
1. DF(v) = Si: Weight(v, Es, Si) >= Weight(v, Es, Sj), for all j. In 1. DF(v) = Si: Weight(v, Es, Si) >= Weight(v, Es, Sj), for all j. In
case of a tie, choose the PE whose IP address is numerically the case of a tie, choose the PE whose IP address is numerically the
least. Note 0 <= i,j < Number of PEs in the redundancy group. least. Note 0 <= i,j < Number of PEs in the redundancy group.
2. BDF(v) = Sk: Weight(v, Es, Si) >= Weight(v, Es, Sk) and Weight(v, 2. BDF(v) = Sk: Weight(v, Es, Si) >= Weight(v, Es, Sk) and Weight(v,
Sk) >= Weight(v, Es, Sj). In case of tie choose the PE whose IP Es, Sk) >= Weight(v, Es, Sj). In case of tie choose the PE whose
address is numerically the least. IP address is numerically the least.
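The two selection rules above amount to ranking the candidate PEs by weight, highest first, and breaking ties by the numerically lowest IP address. A minimal sketch follows. The function name hrw_elect and the idea of passing the Weight function as a parameter are assumptions made for illustration, and PE addresses are given as text strings.

```python
import ipaddress


def hrw_elect(v, es, pe_addresses, weight):
    """Return (DF, BDF) for Ethernet Tag v on Ethernet Segment es.

    `weight` is any callable weight(v, es, s) returning a number.
    BDF is None when there is only one candidate PE.
    """
    if not pe_addresses:
        raise ValueError("at least one candidate PE is required")
    ranked = sorted(
        pe_addresses,
        # Highest weight first; on equal weight, lowest address first.
        key=lambda s: (-weight(v, es, s), int(ipaddress.ip_address(s))),
    )
    df = ranked[0]
    bdf = ranked[1] if len(ranked) > 1 else None
    return df, bdf
```

Because every PE ranks the same candidate list with the same function, each PE arrives at the same DF and BDF independently. Removing a PE that is neither DF nor BDF leaves the first two entries of the ranking unchanged.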
Since the Weight is a Pseudo-random function with domain as the Since the Weight is a Pseudo-random function with domain as the
three-tuple (v, Es, S), it is an efficient deterministic algorithm three-tuple (v, Es, S), it is an efficient deterministic algorithm
which is independent of the Ethernet Tag V sample space distribution. which is independent of the Ethernet Tag V sample space distribution.
Choosing a good hash function for the pseudo-random function is an Choosing a good hash function for the pseudo-random function is an
important consideration for this algorithm to perform probably better important consideration for this algorithm to perform probably better
than the default algorithm. As mentioned previously, such functions than the default algorithm. As mentioned previously, such functions
are described in the HRW paper. We take as candidate hash functions are described in the HRW paper. We take as candidate hash functions
two of the ones that are preferred in [HRW1999]. two of the ones that are preferred in [HRW1999].
1. Wrand(v, Es, Si) = (1103515245((1103515245.Si+12345)XOR 1. Wrand(v, Es, Si) = (1103515245((1103515245.Si+12345)XOR
D(v,Es))+12345)(mod 2^31) and D(v,Es))+12345)(mod 2^31) and
2. Wrand2(v, Es, Si) = (1103515245((1103515245.D(v,Es)+12345)XOR 2. Wrand2(v, Es, Si) = (1103515245((1103515245.D(v,Es)+12345)XOR
Si)+12345)(mod 2^31) Si)+12345)(mod 2^31)
Here D(v,Es) is the 31-bit digest (CRC-32 and discarding the MSB as Here D(v,Es) is the 31-bit digest (CRC-32 and discarding the MSB as
in [HRW1999] ) of the 14-byte stream, the Ethernet Tag v (4 bytes) in [HRW1999]) of the 14-byte stream, the Ethernet Tag v (4 bytes)
followed by the Ethernet Segment Identifier (10 bytes). Si is address followed by the Ethernet Segment Identifier (10 bytes). It is
of the ith server. The server's IP address length does not matter as mandated that the 14-byte stream is formed by concatenation of the
only the low-order 31 bits are modulo significant. Although both the Ethernet tag and the Ethernet Segment identifier in network byte
above hash functions perform similarly, we select the first hash order. The CRC should proceed as if the architecture is in network
function (1) of choice, as the hash function has to be the same in byte order (big-endian). Si is address of the ith server. The
all the PEs participating in the DF election. server's IP address length does not matter as only the low-order 31
bits are modulo significant. Although both the above hash functions
perform similarly, we select the first hash function (1) of choice,
as the hash function has to be the same in all the PEs participating
in the DF election.
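The first hash function, Wrand, can be sketched as below. Several points are assumptions rather than statements of the draft: the CRC-32 variant is taken to be the one provided by Python's zlib.crc32, "discarding the MSB" is implemented as masking to the low-order 31 bits, and the names digest and wrand are hypothetical. Interoperable implementations would need to agree on these details.

```python
import ipaddress
import struct
import zlib

A = 1103515245
C = 12345
MOD = 2 ** 31


def digest(v: int, esi: bytes) -> int:
    """31-bit digest D(v, Es) of the 14-byte stream: Ethernet Tag, then ESI."""
    if len(esi) != 10:
        raise ValueError("an ESI is expected to be exactly 10 octets")
    stream = struct.pack("!I", v) + esi  # 4-byte tag in network byte order
    # Assumption: zlib's CRC-32, with the most significant bit discarded.
    return zlib.crc32(stream) & 0x7FFFFFFF


def wrand(v: int, esi: bytes, pe_address: str) -> int:
    """Wrand(v, Es, Si) = (A * ((A * Si + C) XOR D(v, Es)) + C) mod 2^31."""
    si = int(ipaddress.ip_address(pe_address))
    return (A * ((A * si + C) ^ digest(v, esi)) + C) % MOD
```

Since the result is reduced modulo 2^31, only the low-order 31 bits of the PE address influence the weight, so IPv4 and IPv6 addresses can be fed to the same function.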
A point to note is that the Weight function takes into consideration A point to note is that the Weight function takes into consideration
the combination of the Ethernet Tag, Ethernet Segment and the PE IP- the combination of the Ethernet Tag, Ethernet Segment and the PE IP-
address, and the actual length of the server IP address (whether V4 address, and the actual length of the server IP address (whether V4
or V6) is not really relevant The existing algorithm in [RFC7432] as or V6) is not really relevant. The default algorithm in [RFC7432]
is cannot employ both V4 and V6 neighbor peering address. cannot employ both V4 and V6 PE addresses, since [RFC7432] does not
specify how to decide on the ordering (the ordinal list) when both V4
and V6 PEs are present.
HRW solves the disadvantage pointed out in Section 2.2.1 and ensures: HRW solves the disadvantage pointed out in Section 2.2.1 and ensures:
o with very high probability that the task of DF election for o with very high probability that the task of DF election for
respective VLANs is more or less equally distributed among the PEs respective VLANs is more or less equally distributed among the PEs
even for the 2 PE case. even for the 2 PE case.
o If a PE, hosting some VLANs on given ES, but is neither the DF nor o If a PE, hosting some VLANs on given ES, but is neither the DF nor
the BDF for that VLAN, goes down or its connection to the ES goes the BDF for that VLAN, goes down or its connection to the ES goes
down, it does not result in a DF and BDF reassignment the other down, it does not result in a DF and BDF reassignment the other
skipping to change at page 19, line 15 skipping to change at page 19, line 31
o In addition to the DF, the algorithm also furnishes the BDF, which o In addition to the DF, the algorithm also furnishes the BDF, which
would be the DF if the current DF fails. would be the DF if the current DF fails.
5. The Attachment Circuit Influenced DF Election Capability 5. The Attachment Circuit Influenced DF Election Capability
The procedure discussed in this section is applicable to the DF The procedure discussed in this section is applicable to the DF
Election in EVPN Services [RFC7432] and EVPN Virtual Private Wire Election in EVPN Services [RFC7432] and EVPN Virtual Private Wire
Services [RFC8214]. Services [RFC8214].
The AC-DF capability MAY be used with any "DF Type" algorithm. It The AC-DF capability MAY be used with any "DF Type" algorithm. It
modifies the default DF Election procedures in [RFC7432] by removing MUST modify the DF Election procedures by removing from consideration
from consideration any candidate PE in the ES that cannot forward any candidate PE in the ES that cannot forward traffic on the AC that
traffic on the AC that belongs to the BD. This section is applicable belongs to the BD. This section is applicable to VLAN-Based and VLAN-
to VLAN-Based and VLAN-Bundle service interfaces. Section 5.1 Bundle service interfaces. Section 5.1 describes the procedures for
describes the procedures for VLAN-Aware Bundle interfaces. VLAN-Aware Bundle interfaces.
In particular, the AC-DF capability modifies the Step 3 in the In particular, when used with the default DF Type, the AC-DF
default DF Election procedure described in [RFC7432] Section 8.5, as capability modifies the Step 3 in the DF Election procedure described
follows: in [RFC7432] Section 8.5, as follows:
3. When the timer expires, each PE builds an ordered "candidate" list 3. When the timer expires, each PE builds an ordered "candidate" list
of the IP addresses of all the PE nodes connected to the Ethernet of the IP addresses of all the PE nodes connected to the Ethernet
Segment (including itself), in increasing numeric value. The Segment (including itself), in increasing numeric value. The
candidate list is based on the Originator Router's IP addresses of candidate list is based on the Originator Router's IP addresses of
the ES routes, excluding all the PEs for which no Ethernet A-D per the ES routes, excluding all the PEs for which no Ethernet A-D per
ES route has been received, or for which the route has been ES route has been received, or for which the route has been
withdrawn. Afterwards, the DF Election algorithm is applied on a withdrawn. Afterwards, the DF Election algorithm is applied on a
per <ES,VLAN> or <ES,VLAN-bundle>, however, the IP address for a per <ES,VLAN> or <ES,VLAN-bundle>, however, the IP address for a
PE will not be considered candidate for a given <ES,VLAN> or PE will not be considered candidate for a given <ES,VLAN> or
skipping to change at page 20, line 5 skipping to change at page 20, line 22
one (which is the default DF Election, or DF Type 0 in this one (which is the default DF Election, or DF Type 0 in this
document). document).
o The candidate list is pruned based on the Ethernet A-D routes: a o The candidate list is pruned based on the Ethernet A-D routes: a
PE's IP address MUST be removed from the ES candidate list if its PE's IP address MUST be removed from the ES candidate list if its
Ethernet A-D per ES route is withdrawn. A PE's IP address MUST NOT Ethernet A-D per ES route is withdrawn. A PE's IP address MUST NOT
be considered as candidate DF for a <ES,VLAN> or <ES,VLAN-bundle>, be considered as candidate DF for a <ES,VLAN> or <ES,VLAN-bundle>,
if its Ethernet A-D per EVI route for the <ES,VLAN> or <ES,VLAN- if its Ethernet A-D per EVI route for the <ES,VLAN> or <ES,VLAN-
bundle> respectively, is withdrawn. bundle> respectively, is withdrawn.
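The pruning rules above, combined with the default (modulus) election, can be sketched as follows. This is an illustration under stated assumptions: the function names acdf_candidates and default_df are hypothetical, route state is reduced to sets of originator addresses, and only IPv4 addresses are used in the example since [RFC7432] does not specify the ordering of a mixed V4/V6 list.

```python
import ipaddress


def acdf_candidates(es_route_origins, ad_per_es, ad_per_evi):
    """Build the ordered candidate list for one <ES,VLAN> or <ES,VLAN-bundle>.

    es_route_origins: originator addresses of the ES routes for the ES.
    ad_per_es:  PEs with a currently valid Ethernet A-D per ES route.
    ad_per_evi: PEs with a currently valid Ethernet A-D per EVI route
                for this <ES,VLAN> or <ES,VLAN-bundle>.
    """
    kept = [pe for pe in es_route_origins
            if pe in ad_per_es and pe in ad_per_evi]
    # Increasing numeric value of the IP address, as in the default election.
    return sorted(kept, key=lambda pe: int(ipaddress.ip_address(pe)))


def default_df(vlan, candidates):
    """Default election: the PE at ordinal (V mod N) in the ordered list."""
    if not candidates:
        return None
    return candidates[vlan % len(candidates)]
```

For example, with three PEs advertising the ES route but only 10.0.0.1 and 10.0.0.3 having an Ethernet A-D per EVI route for the VLAN, the candidate list shrinks to those two PEs and the modulus is taken over N = 2 instead of N = 3.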
The following example illustrates the AC-DF behavior, assuming the The following example illustrates the AC-DF behavior applied to the
network in Figure 2: Default DF election algorithm, assuming the network in Figure 2:
a) When PE1 and PE2 discover ES12, they advertise an ES route for a) When PE1 and PE2 discover ES12, they advertise an ES route for
ES12 with the associated ES-import extended community and the DF ES12 with the associated ES-import extended community and the DF
Election Extended Community indicating AC-DF=1; they start a timer Election Extended Community indicating AC-DF=1; they start a timer
at the same time. Likewise, PE2 and PE3 advertise an ES route for at the same time. Likewise, PE2 and PE3 advertise an ES route for
ES23 with AC-DF=1 and start a timer. ES23 with AC-DF=1 and start a timer.
b) PE1/PE2 advertise an Ethernet A-D per ES route for ES12, and b) PE1/PE2 advertise an Ethernet A-D per ES route for ES12, and
PE2/PE3 advertise an Ethernet A-D per ES route for ES23. PE2/PE3 advertise an Ethernet A-D per ES route for ES23.
skipping to change at page 20, line 46 skipping to change at page 21, line 15
f) Once the PEs with ACS = DOWN for a given BD have been removed from f) Once the PEs with ACS = DOWN for a given BD have been removed from
the candidate list, the DF Election can be applied for the the candidate list, the DF Election can be applied for the
remaining N candidates. remaining N candidates.
Note that this procedure only modifies the existing EVPN control Note that this procedure only modifies the existing EVPN control
plane by adding and processing the DF Election Extended Community, plane by adding and processing the DF Election Extended Community,
and by pruning the candidate list of PEs that take part in the DF and by pruning the candidate list of PEs that take part in the DF
election. election.
In addition to the procedure described above, the following events In addition to the events defined in the FSM in Section 3.1, the
SHALL modify the candidate PE list and trigger the DF re-election in following events SHALL modify the candidate PE list and trigger the
a PE for a given <ES,VLAN> or <ES,VLAN-Bundle>: DF re-election in a PE for a given <ES,VLAN> or <ES,VLAN-Bundle>. In
the FSM of Figure 3, the events below MUST trigger a transition from
i. Local ES going DOWN due to a physical failure or reception of an DF_DONE to DF_CALC:
ES route withdraw for that ES.
ii. Local ES going UP due to its detection/configuration or
reception of a new ES route update for that ES.
iii. Local AC going DOWN/UP. i. Local AC going DOWN/UP.
iv. Reception of a new Ethernet A-D per EVI update/withdraw for the ii. Reception of a new Ethernet A-D per EVI update/withdraw for the
<ES,VLAN> or <ES,VLAN-Bundle>. <ES,VLAN> or <ES,VLAN-Bundle>.
v. Reception of a new Ethernet A-D per ES update/withdraw for the iii. Reception of a new Ethernet A-D per ES update/withdraw for the
ES. ES.
5.1. AC-Influenced DF Election Capability For VLAN-Aware Bundle Services 5.1. AC-Influenced DF Election Capability For VLAN-Aware Bundle Services
The procedure described in Section 5 works for VLAN-based and The procedure described in Section 5 works for VLAN-based and
VLAN-Bundle service interfaces since, for those service types, a PE VLAN-Bundle service interfaces since, for those service types, a PE
advertises only one Ethernet A-D per EVI route per <ES,VLAN> or advertises only one Ethernet A-D per EVI route per <ES,VLAN> or
<ES,VLAN-Bundle>. The withdrawal of such route means that the PE <ES,VLAN-Bundle>. The withdrawal of such route means that the PE
cannot forward traffic on that particular <ES,VLAN> or cannot forward traffic on that particular <ES,VLAN> or
<ES,VLAN-Bundle>, therefore the PE can be removed from consideration <ES,VLAN-Bundle>, therefore the PE can be removed from consideration
skipping to change at page 23, line 32 skipping to change at page 23, line 46
[RFC8214] Boutros, S., Sajassi, A., Salam, S., Drake, J., and J. [RFC8214] Boutros, S., Sajassi, A., Salam, S., Drake, J., and J.
Rabadan, "Virtual Private Wire Service Support in Ethernet VPN", RFC Rabadan, "Virtual Private Wire Service Support in Ethernet VPN", RFC
8214, DOI 10.17487/RFC8214, August 2017, <https://www.rfc- 8214, DOI 10.17487/RFC8214, August 2017, <https://www.rfc-
editor.org/info/rfc8214>. editor.org/info/rfc8214>.
[HRW1999] Thaler, D. and C. Ravishankar, "Using Name-Based Mappings [HRW1999] Thaler, D. and C. Ravishankar, "Using Name-Based Mappings
to Increase Hit Rates", IEEE/ACM Transactions in networking Volume 6 to Increase Hit Rates", IEEE/ACM Transactions in networking Volume 6
Issue 1, February 1998. Issue 1, February 1998.
[RFC7153] Rosen, E. and Y. Rekhter, "IANA Registries for BGP
Extended Communities", RFC 7153, DOI 10.17487/RFC7153, March 2014,
<https://www.rfc-editor.org/info/rfc7153>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March
1997, <http://www.rfc-editor.org/info/rfc2119>. 1997, <https://www.rfc-editor.org/info/rfc2119>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017,
<https://www.rfc-editor.org/info/rfc8174>. <https://www.rfc-editor.org/info/rfc8174>.
[RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
Border Gateway Protocol 4 (BGP-4)", RFC 4271, DOI 10.17487/RFC4271,
January 2006, <http://www.rfc-editor.org/info/rfc4271>.
[RFC4360] Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended [RFC4360] Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended
Communities Attribute", RFC 4360, DOI 10.17487/RFC4360, February Communities Attribute", RFC 4360, DOI 10.17487/RFC4360, February
2006, <http://www.rfc-editor.org/info/rfc4360>. 2006, <http://www.rfc-editor.org/info/rfc4360>.
9.2. Informative References 9.2. Informative References
[VPLS-MH] Kothari, Henderickx et al., "BGP based Multi-homing in [VPLS-MH] Kothari, Henderickx et al., "BGP based Multi-homing in
Virtual Private LAN Service", draft-ietf-bess-vpls-multihoming- Virtual Private LAN Service", draft-ietf-bess-vpls-multihoming-
01.txt, work in progress, January, 2016. 01.txt, work in progress, January, 2016.
[CHASH] Karger, D., Lehman, E., Leighton, T., Panigrahy, R., [CHASH] Karger, D., Lehman, E., Leighton, T., Panigrahy, R., Levine,
Levine, M., and D. Lewin, "Consistent Hashing and Random Trees: M., and D. Lewin, "Consistent Hashing and Random Trees: Distributed
Distributed Caching Protocols for Relieving Hot Spots on the World Caching Protocols for Relieving Hot Spots on the World Wide Web", ACM
Wide Web", ACM Symposium on Theory of Computing ACM Press New York, Symposium on Theory of Computing ACM Press New York, May 1997.
May 1997.
[CLRS2009] Cormen, T., Leiserson, C., Rivest, R., and C. Stein, [CLRS2009] Cormen, T., Leiserson, C., Rivest, R., and C. Stein,
"Introduction to Algorithms (3rd ed.)", MIT Press and McGraw-Hill "Introduction to Algorithms (3rd ed.)", MIT Press and McGraw-Hill
ISBN 0-262-03384-4., February 2009. ISBN 0-262-03384-4., February 2009.
[RFC2991] Thaler, D. and C. Hopps, "Multipath Issues in Unicast and [RFC2991] Thaler, D. and C. Hopps, "Multipath Issues in Unicast and
Multicast Next-Hop Selection", RFC 2991, DOI 10.17487/RFC2991, Multicast Next-Hop Selection", RFC 2991, DOI 10.17487/RFC2991,
November 2000, <http://www.rfc-editor.org/info/rfc2991>. November 2000, <http://www.rfc-editor.org/info/rfc2991>.
[RFC2992] Hopps, C., "Analysis of an Equal-Cost Multi-Path [RFC2992] Hopps, C., "Analysis of an Equal-Cost Multi-Path
Algorithm", RFC 2992, DOI 10.17487/RFC2992, November 2000, Algorithm", RFC 2992, DOI 10.17487/RFC2992, November 2000,
<http://www.rfc-editor.org/info/rfc2992>. <http://www.rfc-editor.org/info/rfc2992>.
10. Acknowledgments 10. Acknowledgments
The authors want to thank Sriram Venkateswaran, Laxmi Padakanti, The authors want to thank Sriram Venkateswaran, Laxmi Padakanti,
Ranganathan Boovaraghavan, Tamas Mondal, Sami Boutros, Jakob Heitz Ranganathan Boovaraghavan, Tamas Mondal, Sami Boutros, Jakob Heitz,
and Stephane Litkowski for their review and contributions. Mrinmoy Ghosh, Leo Mermelstein, Mankamna Misra and Samir Thoria for
their review and contributions. Special thanks to Stephane Litkowski
for his thorough review and detailed contributions.
11. Contributors 11. Contributors
In addition to the authors listed on the front page, the following In addition to the authors listed on the front page, the following
coauthors have also contributed to this document: coauthors have also contributed to this document:
Antoni Przygienda Antoni Przygienda
Juniper Networks, Inc. Juniper Networks, Inc.
1194 N. Mathilda Drive 1194 N. Mathilda Drive
Sunnyvale, CA 95134 Sunnyvale, CA 95134
skipping to change at page 26, line 4 skipping to change at page 26, line 9
San Jose, CA 95134 San Jose, CA 95134
USA USA
Email: satyamoh@cisco.com Email: satyamoh@cisco.com
Ali Sajassi Ali Sajassi
Cisco Systems, Inc. Cisco Systems, Inc.
225 West Tasman Drive 225 West Tasman Drive
San Jose, CA 95134 San Jose, CA 95134
USA USA
Email: sajassi@cisco.com Email: sajassi@cisco.com
John Drake John Drake
Juniper Networks, Inc. Juniper Networks, Inc.
1194 N. Mathilda Drive 1194 N. Mathilda Drive
Sunnyvale, CA 95134 Sunnyvale, CA 95134
USA USA
Email: jdrake@juniper.com Email: jdrake@juniper.net
Kiran Nagaraj Kiran Nagaraj
Nokia Nokia
701 E. Middlefield Road 701 E. Middlefield Road
Mountain View, CA 94043 USA Mountain View, CA 94043 USA
Email: kiran.nagaraj@nokia.com Email: kiran.nagaraj@nokia.com
Senthil Sathappan Senthil Sathappan
Nokia Nokia
701 E. Middlefield Road 701 E. Middlefield Road
 End of changes. 60 change blocks. 
149 lines changed or deleted 157 lines changed or added
