draft-ietf-bess-evpn-df-election-framework-03.txt   draft-ietf-bess-evpn-df-election-framework-04.txt 
skipping to change at page 1, line 14 skipping to change at page 1, line 14
Internet Draft Nokia Internet Draft Nokia
S. Mohanty, Ed. S. Mohanty, Ed.
Intended status: Standards Track A. Sajassi Intended status: Standards Track A. Sajassi
Cisco Cisco
J. Drake J. Drake
Juniper Juniper
K. Nagaraj K. Nagaraj
S. Sathappan S. Sathappan
Nokia Nokia
Expires: November 25, 2018 May 24, 2018 Expires: April 22, 2019 October 19, 2018
Framework for EVPN Designated Forwarder Election Extensibility Framework for EVPN Designated Forwarder Election Extensibility
draft-ietf-bess-evpn-df-election-framework-03 draft-ietf-bess-evpn-df-election-framework-04
Abstract Abstract
The Designated Forwarder (DF) in EVPN networks is the PE responsible The Designated Forwarder (DF) in EVPN networks is the Provider Edge
for sending broadcast, unknown unicast and multicast (BUM) traffic to (PE) router responsible for sending broadcast, unknown unicast and
a multi-homed CE, on a given VLAN on a particular Ethernet Segment multicast (BUM) traffic to a multi-homed Customer Equipment (CE)
(ES). The DF is selected out of a list of candidate PEs that device, on a given VLAN on a particular Ethernet Segment (ES). The DF
advertise the same Ethernet Segment Identifier (ESI) to the EVPN is selected out of a list of candidate PEs that advertise the same
network. By default, EVPN uses a DF Election algorithm referred to as Ethernet Segment Identifier (ESI) to the EVPN network. By default,
"Service Carving" and it is based on a modulus function (V mod N) EVPN uses a DF Election algorithm referred to as "Service Carving"
that takes the number of PEs in the ES (N) and the VLAN value (V) as and it is based on a modulus function (V mod N) that takes the number
input. This default DF Election algorithm has some inefficiencies of PEs in the ES (N) and the VLAN value (V) as input. This default DF
that this document addresses by defining a new DF Election algorithm Election algorithm has some inefficiencies that this document
and a capability to influence the DF Election result for a VLAN, addresses by defining a new DF Election algorithm and a capability to
depending on the state of the associated Attachment Circuit (AC). In influence the DF Election result for a VLAN, depending on the state
addition, this document creates a registry with IANA, for future DF of the associated Attachment Circuit (AC). In addition, this document
Election Algorithms and Capabilities. It also presents a formal creates a registry with IANA, for future DF Election Algorithms and
definition and clarification of the DF Election Finite State Machine. Capabilities. It also presents a formal definition and clarification
of the DF Election Finite State Machine.
Status of this Memo Status of this Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
Drafts. Drafts.
skipping to change at page 2, line 18 skipping to change at page 2, line 19
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html http://www.ietf.org/shadow.html
This Internet-Draft will expire on November 24, 2018. This Internet-Draft will expire on April 22, 2019.
Copyright Notice Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 4, line 28 skipping to change at page 4, line 28
o VID and CE-VID - VLAN Identifier and Customer Equipment VLAN o VID and CE-VID - VLAN Identifier and Customer Equipment VLAN
Identifier. Identifier.
o Ethernet Tag - used to represent a Broadcast Domain that is o Ethernet Tag - used to represent a Broadcast Domain that is
configured on a given ES for the purpose of DF election. Note that configured on a given ES for the purpose of DF election. Note that
any of the following may be used to represent a Broadcast Domain: any of the following may be used to represent a Broadcast Domain:
VIDs (including double Q-in-Q tags), configured IDs, VNI, VIDs (including double Q-in-Q tags), configured IDs, VNI,
normalized VID, I-SIDs, etc., as long as the representation of the normalized VID, I-SIDs, etc., as long as the representation of the
broadcast domains is configured consistently across the multi-homed broadcast domains is configured consistently across the multi-homed
PEs attached to that ES. PEs attached to that ES. The Ethernet Tag value MUST be different
from zero.
o Ethernet Tag ID - refers to the identifier used in the EVPN routes
defined in [RFC7432]. Its value may be the same as the Ethernet Tag
value (see Ethernet Tag definition) when advertising routes for
VLAN-aware bundle services. Note that in case of VLAN-based or VLAN
Bundle services, the Ethernet Tag ID is zero.
o DF Election Procedure and DF Algorithm - The Designated Forwarder o DF Election Procedure and DF Algorithm - The Designated Forwarder
Election Procedure or simply DF Election, refers to the process in Election Procedure or simply DF Election, refers to the process in
its entirety, including the discovery of the PEs in the ES, the its entirety, including the discovery of the PEs in the ES, the
creation and maintenance of the PE candidate list and the selection creation and maintenance of the PE candidate list and the selection
of a PE. The Designated Forwarder Algorithm is just a component of of a PE. The Designated Forwarder Algorithm is just a component of
the DF Election Procedure and strictly refers to the selection of a the DF Election Procedure and strictly refers to the selection of a
PE for a given <ES,Ethernet Tag>. PE for a given <ES,Ethernet Tag>.
This document also assumes familiarity with the terminology of This document also assumes familiarity with the terminology of
skipping to change at page 5, line 10 skipping to change at page 5, line 16
responsible for: responsible for:
o Flooding Broadcast, Unknown unicast and Multicast traffic (BUM), on o Flooding Broadcast, Unknown unicast and Multicast traffic (BUM), on
a given Ethernet Tag on a particular Ethernet Segment (ES), to the a given Ethernet Tag on a particular Ethernet Segment (ES), to the
CE. This is valid for single-active and all-active EVPN CE. This is valid for single-active and all-active EVPN
multi-homing. multi-homing.
o Sending unicast traffic on a given Ethernet Tag on a particular ES o Sending unicast traffic on a given Ethernet Tag on a particular ES
to the CE. This is valid for single-active multi-homing. to the CE. This is valid for single-active multi-homing.
Figure 1 illustrates and example that we will use to explain the Figure 1 illustrates an example that we will use to explain the
Designated Forwarder function. Designated Forwarder function.
+---------------+ +---------------+
| IP/MPLS | | IP/MPLS |
| CORE | | CORE |
+----+ ES1 +----+ +----+ +----+ ES1 +----+ +----+
| CE1|-----| |-----------| |____ES2 | CE1|-----| |-----------| |____ES2
+----+ | PE1| | PE2| \ +----+ | PE1| | PE2| \
| |-------- +----+ \+----+ | |-------- +----+ \+----+
+----+ | | | CE2| +----+ | | | CE2|
skipping to change at page 5, line 50 skipping to change at page 6, line 8
Layer-2 devices are particularly susceptible to forwarding loops Layer-2 devices are particularly susceptible to forwarding loops
because of the broadcast nature of the Ethernet traffic. Therefore it because of the broadcast nature of the Ethernet traffic. Therefore it
is very important that, in case of multi-homing, only one of the is very important that, in case of multi-homing, only one of the
links be used to direct traffic to/from the core. links be used to direct traffic to/from the core.
One of the pre-requisites for this support is that participating PEs One of the pre-requisites for this support is that participating PEs
must agree amongst themselves as to who would act as the Designated must agree amongst themselves as to who would act as the Designated
Forwarder (DF). This needs to be achieved through a distributed Forwarder (DF). This needs to be achieved through a distributed
algorithm in which each participating PE independently and algorithm in which each participating PE independently and
unambiguously selects one of the participating PEs as the DF, and the unambiguously selects one of the participating PEs as the DF, and the
result should be unanimously in agreement. result should be consistent and unanimous.
The default algorithm for DF election defined by [RFC7432] at the The default algorithm for DF election defined by [RFC7432] at the
granularity of (ESI,EVI) is referred to as "service carving". In this granularity of (ESI,EVI) is referred to as "service carving". In this
document, service carving or default DF Election algorithm is used document, service carving or default DF Election algorithm are used
indistinctly. With service carving, it is possible to elect multiple interchangeably. With service carving, it is possible to elect
DFs per Ethernet Segment (one per EVI) in order to perform load- multiple DFs per Ethernet Segment (one per EVI) in order to perform
balancing of traffic destined to a given Segment. The objective is load-balancing of traffic destined to a given Segment. The objective
that the load-balancing procedures should carve up the BD space among is that the load-balancing procedures should carve up the BD space
the redundant PE nodes evenly, in such a way that every PE is the DF among the redundant PE nodes evenly, in such a way that every PE is
for a disjoint set of EVIs. the DF for a distinct set of EVIs.
The DF Election algorithm as described in [RFC7432] (Section 8.5) is The DF Election algorithm as described in [RFC7432] (Section 8.5) is
based on a modulus operation. The PEs to which the ES (for which DF based on a modulus operation. The PEs to which the ES (for which DF
election is to be carried out per VLAN) is multi-homed form an election is to be carried out per EVI) is multi-homed form an ordered
ordered (ordinal) list in ascending order of the PE IP address (ordinal) list in ascending order of the PE IP address values. For
values. For example, there are N PEs: PE0, PE1,... PEN-1 ranked as example, there are N PEs: PE0, PE1,... PEN-1 ranked as per increasing
per increasing IP addresses in the ordinal list; then for each VLAN IP addresses in the ordinal list; then for each VLAN with Ethernet
with Ethernet Tag V, configured on the Ethernet Segment ES1, PEx is Tag V, configured on the Ethernet Segment ES1, PEx is the DF for VLAN
the DF for VLAN V on ES1 when x equals (V mod N). In the case of V on ES1 when x equals (V mod N). In the case of VLAN-Bundle only the
VLAN-Bundle only the lowest VLAN is used. In the case when the lowest VLAN is used. In the case when the planned density is high
planned density is high (meaning there are a significant number of (meaning there are a significant number of VLANs and the Ethernet Tags
VLANs and the Ethernet Tags are uniformly distributed), the thinking are uniformly distributed), the thinking is that the DF Election will
is that the DF Election will be spread across the PEs hosting that be spread across the PEs hosting that Ethernet Segment and good load-
Ethernet Segment and good service carving can be achieved. balancing can be achieved.
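The modulus-based selection described above can be sketched as follows. This is a minimal illustration, not text from either draft revision: the function name default_df and the PE addresses (taken from the 192.0.2.0/24 documentation range) are assumptions made for the example, and all PEs are assumed to use addresses of the same family.

```python
import ipaddress


def default_df(pe_addresses, ethernet_tag):
    """Return the PE elected as DF for one Ethernet Tag using (V mod N).

    pe_addresses: IP addresses (strings) of the PEs attached to the ES.
    ethernet_tag: the Ethernet Tag value V for the VLAN.
    """
    # Build the ordinal list: PEs in ascending order of IP address value.
    # Comparing integer values avoids the pitfalls of string ordering.
    ordered = sorted(pe_addresses, key=lambda a: int(ipaddress.ip_address(a)))
    # PEx is the DF when x equals (V mod N), N being the number of PEs.
    return ordered[ethernet_tag % len(ordered)]


# Hypothetical ES with three PEs (N = 3).
pes = ["192.0.2.3", "192.0.2.1", "192.0.2.2"]
print(default_df(pes, 9))   # 9 mod 3 = 0, lowest address
print(default_df(pes, 10))  # 10 mod 3 = 1, second lowest address
```

Because every PE sorts the same candidate list and applies the same function, each PE reaches the same result independently, which is the property the election relies on.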
However, the described default DF Election algorithm has some However, the described default DF Election algorithm has some
undesirable properties and in some cases can be somewhat disruptive undesirable properties and in some cases can be somewhat disruptive
and unfair. This document describes some of those issues and proposes and unfair. This document describes some of those issues and proposes
a mechanism for dealing with them. These mechanisms do involve a mechanism for dealing with them. These mechanisms do involve
changes to the default DF Election algorithm, but they do not require changes to the default DF Election algorithm, but they do not require
any changes to the EVPN Route exchange and have minimal changes to any changes to the EVPN Route exchange and have minimal changes to
their content per se. their content per se.
In addition, there is a need to extend the DF Election procedures so In addition, there is a need to extend the DF Election procedures so
skipping to change at page 6, line 50 skipping to change at page 7, line 7
all the use-cases. all the use-cases.
Note that while [RFC7432] elects a DF per <ES, EVI>, this document Note that while [RFC7432] elects a DF per <ES, EVI>, this document
elects a DF per <ES, BD>. This means that unlike [RFC7432], where for elects a DF per <ES, BD>. This means that unlike [RFC7432], where for
a VLAN Aware Bundle service EVI there is only one DF for the EVI, a VLAN Aware Bundle service EVI there is only one DF for the EVI,
this document specifies that there will be multiple DFs, one for each this document specifies that there will be multiple DFs, one for each
BD configured in that EVI. BD configured in that EVI.
2.2. Problem Statement 2.2. Problem Statement
This section describes some potential issues on the default DF This section describes some potential issues with the default DF
Election algorithm. Election algorithm.
2.2.1. Unfair Load-Balancing and Service Disruption 2.2.1. Unfair Load-Balancing and Service Disruption
There are three fundamental problems with the current default DF There are three fundamental problems with the current default DF
Election algorithm. Election algorithm.
1- First, the algorithm will not perform well when the Ethernet Tag 1- First, the algorithm will not perform well when the Ethernet Tag
follows a non-uniform distribution, for instance when the Ethernet follows a non-uniform distribution, for instance when the Ethernet
Tags are all even or all odd. In such a case let us assume that Tags are all even or all odd. In such a case let us assume that
the ES is multi-homed to two PEs; all the VLANs will only pick one the ES is multi-homed to two PEs; one of the PEs will be elected
of the PEs as the DF. This is very sub-optimal. It defeats the as DF for all of the VLANs. This is very sub-optimal. It defeats
purpose of service carving as the DFs are not really evenly spread the purpose of service carving as the DFs are not really evenly
across. In fact, in this particular case, one of the PEs does not spread across. In fact, in this particular case, one of the PEs
get elected as DF at all, so it does not participate in the DF does not get elected as DF at all, so it does not participate in
responsibilities at all. Consider another example where, referring the DF responsibilities at all. Consider another example where,
to Figure 1, let's assume that PE2, PE3, PE4 are in ascending order referring to Figure 1, let's assume that PE2, PE3, PE4 are in
of the IP address; and each VLAN configured on ES2 is associated ascending order of the IP address; and each VLAN configured on ES2
with an Ethernet Tag of of the form (3x+1), where x is an integer. is associated with an Ethernet Tag of the form (3x+1), where x is
This will result in PE3 always being selected as the DF. an integer. This will result in PE3 always being selected as the DF.
2- Even in the case when the Ethernet Tag distribution is uniform the 2- Even in the case when the Ethernet Tag distribution is uniform the
instance of a PE being up or down results in re-computation ((v instance of a PE being up or down results in re-computation ((v
mod (N-1)) or (v mod (N+1)) as is the case); the resulting modulus mod (N-1)) or (v mod (N+1)) as is the case); the resulting modulus
value need not be uniformly distributed because it can be subject value need not be uniformly distributed because it can be subject
to the primality of N-1 or N+1 as may be the case. to the primality of N-1 or N+1 as may be the case.
3- The third problem is one of disruption. Consider a case when the 3- The third problem is one of disruption. Consider a case when the
same Ethernet Segment is multi-homed to a set of PEs. When the ES same Ethernet Segment is multi-homed to a set of PEs. When the ES
is down in one of the PEs, say PE1, or PE1 itself reboots, or the is down in one of the PEs, say PE1, or PE1 itself reboots, or the
BGP process goes down or the connectivity between PE1 and an RR BGP process goes down or the connectivity between PE1 and an RR
goes down, the effective number of PEs in the system now becomes goes down, the effective number of PEs in the system now becomes
N-1, and DFs are computed for all the VLANs that are configured on N-1, and DFs are computed for all the VLANs that are configured on
that Ethernet Segment. In general, if the DF for a VLAN v happens that Ethernet Segment. In general, if the DF for a VLAN v happens
not to be PE1, but some other PE, say PE2, it is likely that some not to be PE1, but some other PE, say PE2, it is likely that some
other PE will become the new DF. This is not desirable. Similarly other PE will become the new DF. This is not desirable. Similarly
when a new PE hosts the same Ethernet Segment, the mapping again when a new PE hosts the same Ethernet Segment, the mapping again
changes because of the modulus operation. This results in needless changes because of the modulus operation. This results in needless
churn. Again referring to Figure 1, say v1, v2 and v3 are VLANs churn. Again referring to Figure 1, say v1, v2 and v3 are VLANs
configured on ES2 with associated Ethernet Tags of value 999, 1000 configured on ES2 with associated Ethernet Tags of value 999, 1000
and 10001 respectively. So PE1, PE2 and PE3 are the DFs for v1, v2 and 1001 respectively. So PE1, PE2 and PE3 are the DFs for v1, v2
and v3 respectively. Now when PE3 goes down, PE2 will become the and v3 respectively. Now when PE3 goes down, PE2 will become the
DF for v1 and PE1 will become the DF for v2. DF for v1 and PE1 will become the DF for v2.
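The examples in the list above can be reproduced with a short sketch. The helper names (elect, df_counts) and the generic PE labels PE_a and PE_b are assumptions made for illustration; the PE lists are assumed to be already sorted by ascending IP address, and the third example uses the PE names and Ethernet Tag values 999, 1000 and 1001 as given in the text.

```python
def elect(pes, ethernet_tag):
    """Default DF election: index (V mod N) into the ordered PE list."""
    return pes[ethernet_tag % len(pes)]


def df_counts(pes, ethernet_tags):
    """Count how many Ethernet Tags elect each PE as DF."""
    counts = {pe: 0 for pe in pes}
    for tag in ethernet_tags:
        counts[elect(pes, tag)] += 1
    return counts


# Problem 1a: only even Ethernet Tags on an ES multi-homed to two PEs.
print(df_counts(["PE_a", "PE_b"], range(2, 101, 2)))
# {'PE_a': 50, 'PE_b': 0}

# Problem 1b: Ethernet Tags of the form (3x+1) on PE2, PE3, PE4.
print(df_counts(["PE2", "PE3", "PE4"], [3 * x + 1 for x in range(50)]))
# {'PE2': 0, 'PE3': 50, 'PE4': 0}

# Problem 3: three PEs, then the third one goes down.
tags = [999, 1000, 1001]
before = {t: elect(["PE1", "PE2", "PE3"], t) for t in tags}
after = {t: elect(["PE1", "PE2"], t) for t in tags}
print(before)  # {999: 'PE1', 1000: 'PE2', 1001: 'PE3'}
print(after)   # {999: 'PE2', 1000: 'PE1', 1001: 'PE2'}
```

In the last case only the VLAN with Ethernet Tag 1001 lost its DF, yet the DF changes for all three VLANs.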
One point to note is that the default DF election algorithm assumes One point to note is that the default DF election algorithm assumes
that all the PEs who are multi-homed to the same Ethernet Segment that all the PEs who are multi-homed to the same Ethernet Segment
(and interested in the DF Election by exchanging EVPN routes) use an (and interested in the DF Election by exchanging EVPN routes) use an
Originating Router's IP Address of the same family. This does not Originating Router's IP Address of the same family. This does not
need to be the case as the EVPN address-family can be carried over a need to be the case as the EVPN address-family can be carried over a
v4 or v6 peering, and the PEs attached to the same ES may use an v4 or v6 peering, and the PEs attached to the same ES may use an
address of either family. address of either family.
skipping to change at page 8, line 17 skipping to change at page 8, line 22
i representing one of m hash buckets through a function h(k) i.e. i representing one of m hash buckets through a function h(k) i.e.
i=h(k). In the EVPN case, h is simply a modulo-m hash function viz. i=h(k). In the EVPN case, h is simply a modulo-m hash function viz.
h(v) = v mod N, where N is the number of PEs that are multi-homed to h(v) = v mod N, where N is the number of PEs that are multi-homed to
the Ethernet Segment in discussion. It is well-known that for good the Ethernet Segment in discussion. It is well-known that for good
hash distribution using the modulus operation, the modulus N should hash distribution using the modulus operation, the modulus N should
be a prime-number not too close to a power of 2 [CLRS2009]. When the be a prime-number not too close to a power of 2 [CLRS2009]. When the
effective number of PEs changes from N to N-1 (or vice versa), all effective number of PEs changes from N to N-1 (or vice versa), all
the objects (VLAN V) will be remapped except those for which V mod N the objects (VLAN V) will be remapped except those for which V mod N
and V mod (N-1) refer to the same PE in the previous and subsequent and V mod (N-1) refer to the same PE in the previous and subsequent
ordinal rankings respectively. From a forwarding perspective, this is ordinal rankings respectively. From a forwarding perspective, this is
a churn, as it results in programming the PE side ports as blocking a churn, as it results in re-programming the PE ports as either
or non-blocking at potentially all PEs when the DF changes. blocking or non-blocking at potentially all PEs when the DF changes.
This document addresses this problem and furnishes a solution to this This document addresses this problem and furnishes a solution to this
undesirable behavior. undesirable behavior.
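To give a feel for the amount of churn, the following sketch counts how many Ethernet Tags change DF when the number of PEs drops from 4 to 3. The PE labels, the tag range 1..1200 and the choice of removing the last PE in the ordinal list are assumptions picked for the example; other values give different but comparable ratios.

```python
def elect(pes, ethernet_tag):
    """Default DF election: index (V mod N) into the ordered PE list."""
    return pes[ethernet_tag % len(pes)]


def remapped_tags(tags, before, after):
    """Return the Ethernet Tags whose DF differs between the two PE lists."""
    return [t for t in tags if elect(before, t) != elect(after, t)]


before = ["PE_a", "PE_b", "PE_c", "PE_d"]  # N = 4, ordered by IP address
after = before[:-1]                        # PE_d goes down, N = 3
tags = range(1, 1201)

moved = remapped_tags(tags, before, after)
# Tags that had PE_d as DF must move in any case; the rest is avoidable churn.
unavoidable = [t for t in tags if elect(before, t) == "PE_d"]

print(len(moved), len(unavoidable))  # 900 300
```

Under these assumptions 900 of the 1200 Ethernet Tags change DF, although only the 300 that were served by the failed PE strictly needed a new one.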
2.2.2. Traffic Black-Holing on Individual AC Failures 2.2.2. Traffic Black-Holing on Individual AC Failures
As discussed in section 2.1 the default DF Election algorithm defined As discussed in section 2.1 the default DF Election algorithm defined
by [RFC7432] takes into account only two variables in the modulus by [RFC7432] takes into account only two variables in the modulus
function for a given ES: the existence of the PE's IP address on the function for a given ES: the existence of the PE's IP address on the
candidate list and the locally provisioned Ethernet Tags. candidate list and the locally provisioned Ethernet Tags.
skipping to change at page 10, line 18 skipping to change at page 10, line 23
BUM traffic to CE12 will be "black-holed", whereas for single- BUM traffic to CE12 will be "black-holed", whereas for single-
active multi-homing, all the traffic to/from CE12 will be active multi-homing, all the traffic to/from CE12 will be
discarded. This is due to the fact that a logical failure in PE2's discarded. This is due to the fact that a logical failure in PE2's
AC2 may not trigger an ES route withdrawal for ES12 (since there AC2 may not trigger an ES route withdrawal for ES12 (since there
are still other ACs active on ES12) and therefore PE1 will not re- are still other ACs active on ES12) and therefore PE1 will not re-
run the DF election procedures. run the DF election procedures.
b) If the Bridge Table for BD-1 is administratively shut down or even b) If the Bridge Table for BD-1 is administratively shut down or even
not configured yet on PE2, CE12 and CE23 will both be impacted: not configured yet on PE2, CE12 and CE23 will both be impacted:
BUM traffic to both CEs will be discarded in case of all-active BUM traffic to both CEs will be discarded in case of all-active
multi- homing and all traffic will be discarded to/from the CEs in multi-homing and all traffic will be discarded to/from the CEs in
case of single-active multi-homing. This is due to the fact that case of single-active multi-homing. This is due to the fact that
PE1 and PE3 will not re-run the DF election procedures and will PE1 and PE3 will not re-run the DF election procedures and will
keep assuming PE2 is the DF. keep assuming PE2 is the DF.
Quoting [RFC7432], "when an Ethernet Tag is decommissioned on an Quoting [RFC7432], "when an Ethernet Tag is decommissioned on an
Ethernet Segment, then the PE MUST withdraw the Ethernet A-D per EVI Ethernet Segment, then the PE MUST withdraw the Ethernet A-D per EVI
route(s) announced for the <ESI, Ethernet Tags> that are impacted by route(s) announced for the <ESI, Ethernet Tags> that are impacted by
the decommissioning", however, while this A-D per EVI route the decommissioning", however, while this A-D per EVI route
withdrawal is used at the remote PEs performing aliasing or backup withdrawal is used at the remote PEs performing aliasing or backup
procedures, it is not used to influence the DF election for the procedures, it is not used to influence the DF election for the
skipping to change at page 10, line 41 skipping to change at page 10, line 46
This document adds an optional modification of the DF Election This document adds an optional modification of the DF Election
procedure so that the ACS may be taken into account as a variable in procedure so that the ACS may be taken into account as a variable in
the DF election, and therefore EVPN can provide protection against the DF election, and therefore EVPN can provide protection against
logical failures. logical failures.
2.3. The Need for Extending the Default DF Election in EVPN 2.3. The Need for Extending the Default DF Election in EVPN
Section 2.2 describes some of the issues that exist in the default DF Section 2.2 describes some of the issues that exist in the default DF
Election procedures. In order to address those issues, this document Election procedures. In order to address those issues, this document
introduces a new DF Election framework. This framework allows the PEs introduces a new DF Election framework. This framework allows the PEs
to agree on a common DF election type, as well as the capabilities to to agree on a common DF election algorithm, as well as the
enable during the DF Election procedure. In general, "DF Election capabilities to enable during the DF Election procedure. Generally,
Type" refers to the type of DF election algorithm that takes a number 'DF election algorithm' refers to the algorithm by which a number of
of parameters as input and determines the DF PE. A "DF Election input parameters are used to determine the DF PE, while 'DF election
capability" refers to an additional feature that can be executed capability' refers to an additional feature that can be used prior to
along with the DF election algorithm, such as modifying the inputs the invocation of the DF election algorithm, such as modifying the
(or list of candidate PEs) before the DF Election algorithm chooses inputs (or list of candidate PEs).
the DF.
Within this framework, this document defines a new DF Election Within this framework, this document defines a new DF Election
algorithm and a new capability that can influence the DF Election algorithm and a new capability that can influence the DF Election
result: result:
o The new DF Election algorithm is referred to as "Highest Random o The new DF Election algorithm is referred to as "Highest Random
Weight" (HRW). The HRW procedures are described in section 4. Weight" (HRW). The HRW procedures are described in section 4.
o The new DF Election capability is referred to as "AC-Influenced DF o The new DF Election capability is referred to as "AC-Influenced DF
Election" (AC-DF). The AC-DF procedures are described in section 5. Election" (AC-DF). The AC-DF procedures are described in section 5.
skipping to change at page 12, line 5 skipping to change at page 12, line 5
Observe that currently the VLANs are derived from local configuration Observe that currently the VLANs are derived from local configuration
and the FSM does not provide any protection against misconfiguration and the FSM does not provide any protection against misconfiguration
where the same (EVI,ESI) combination has a different set of VLANs on where the same (EVI,ESI) combination has a different set of VLANs on
different participating PEs or one of the PEs elects to consider different participating PEs or one of the PEs elects to consider
VLANs as VLAN-Bundle and another as separate VLANs for election VLANs as VLAN-Bundle and another as separate VLANs for election
purposes (service type mismatch). purposes (service type mismatch).
The FSM is conceptual and any design or implementation MUST comply The FSM is conceptual and any design or implementation MUST comply
with a behavior equivalent to the one outlined in this FSM. with a behavior equivalent to the one outlined in this FSM.
LOST_ES VLAN_CHANGE
RCVD_ES RCVD_ES VLAN_CHANGE RCVD_ES
RCVD_ES LOST_ES
LOST_ES +----+ LOST_ES +----+
+----+ | v +----+ | v
| | ++----++ RCVD_ES | | ++----++
| +-+----+ ES_UP | DF +<--------+ | +-+----+ ES_UP | DF |
+->+ INIT +---------------> WAIT | | +->+ INIT +---------------> WAIT |
++-----+ +----+-+ | ++-----+ +----+-+
^ | | ^ |
+-----------+ | |DF_TIMER | +-----------+ | |DF_TIMER
| ANY STATE +-------+ VLAN_CHANGE | | | ANY STATE +-------+ VLAN_CHANGE |
+-----------+ ES_DOWN +-----------------+ | ^ +-----------+ ES_DOWN +-----------------+ |
| LOST_ES v v | | RCVD_ES v v
+-----++ ++---+-+ | +-----++ LOST_ES ++---+-+
| DF | | DF +---------+ | DF | | DF |
| DONE +<--------------+ CALC +v-+ | | DONE +<--------------+ CALC +<-+
+-+----+ CALCULATED +----+-+ | | +------+ CALCULATED +----+-+ |
| | | | | |
| +----+ | +----+
| LOST_ES | VLAN_CHANGE
| VLAN_CHANGE | RCVD_ES
| | LOST_ES
+-------------------------------------+
Figure 3 DF Election Finite State Machine Figure 3 DF Election Finite State Machine
States: States:
1. INIT: Initial State 1. INIT: Initial State
2. DF WAIT: State in which the participant waits for enough 2. DF WAIT: State in which the participant waits for enough
information to perform the DF election for the EVI/ESI/VLAN information to perform the DF election for the EVI/ESI/VLAN
combination. combination.
skipping to change at page 13, line 26 skipping to change at page 13, line 26
7. CALCULATED: DF has been successfully calculated. 7. CALCULATED: DF has been successfully calculated.
Corresponding actions when transitions are performed or states Corresponding actions when transitions are performed or states
entered/exited: entered/exited:
1. ANY STATE on ES_DOWN: (i) stop DF timer (ii) assume non-DF for 1. ANY STATE on ES_DOWN: (i) stop DF timer (ii) assume non-DF for
local PE. local PE.
2. INIT on ES_UP: transition to DF_WAIT. 2. INIT on ES_UP: transition to DF_WAIT.
3. INIT on RCVD_ES, LOST_ES: do nothing. 3. INIT on VLAN_CHANGE, RCVD_ES, LOST_ES: do nothing.
4. DF_WAIT on entering the state: (i) start DF timer if not started 4. DF_WAIT on entering the state: (i) start DF timer if not started
already or expired (ii) assume non-DF for local PE. already or expired (ii) assume non-DF for local PE.
5. DF_WAIT on RCVD_ES, LOST_ES: do nothing. 5. DF_WAIT on VLAN_CHANGE, RCVD_ES, LOST_ES: do nothing.
6. DF_WAIT on DF_TIMER: transition to DF_CALC. 6. DF_WAIT on DF_TIMER: transition to DF_CALC.
7. DF_CALC on entering or re-entering the state: (i) rebuild 7. DF_CALC on entering or re-entering the state: (i) rebuild
candidate list, hash and perform election (ii) Afterwards FSM candidate list, hash and perform election (ii) Afterwards FSM
generates CALCULATED event against itself. generates CALCULATED event against itself.
8. DF_CALC on LOST_ES or VLAN_CHANGE: do nothing. 8. DF_CALC on VLAN_CHANGE, RCVD_ES, LOST_ES: do nothing.
9. DF_CALC on RCVD_ES: transition to DF_WAIT.
10. DF_CALC on CALCULATED: mark election result for VLAN or bundle, 9. DF_CALC on CALCULATED: mark election result for VLAN or bundle,
and transition to DF_DONE. and transition to DF_DONE.
11. DF_DONE on exiting the state: (i) if [RFC7432] election or new 11. DF_DONE on exiting the state: if there is a new DF election
election and lost primary DF then assume non-DF for local PE for triggered and the current DF is lost, then assume non-DF for
VLAN or VLAN-Bundle. local PE for VLAN or VLAN-Bundle.
12. DF_DONE on VLAN_CHANGE or LOST_ES: transition to DF_CALC.
13. DF_DONE on RCVD_ES: transition to DF_WAIT. 12. DF_DONE on VLAN_CHANGE, RCVD_ES or LOST_ES: transition to
DF_CALC.
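The transition rules of the -04 (right-hand) column can be sketched as a small lookup table. This is only an illustrative sketch, not part of either draft: the Python names are hypothetical, timer handling and the DF/non-DF side effects are left out, and the return to INIT on ES_DOWN is an assumption (the text only says to stop the timer and assume non-DF). In the -03 column, RCVD_ES in DF_CALC or DF_DONE would lead to DF_WAIT instead.

```python
from enum import Enum, auto


class State(Enum):
    INIT = auto()
    DF_WAIT = auto()
    DF_CALC = auto()
    DF_DONE = auto()


class Event(Enum):
    ES_UP = auto()
    ES_DOWN = auto()
    VLAN_CHANGE = auto()
    RCVD_ES = auto()
    LOST_ES = auto()
    DF_TIMER = auto()
    CALCULATED = auto()


# (state, event) -> next state. Pairs that are not listed mean "do nothing".
TRANSITIONS = {
    (State.INIT, Event.ES_UP): State.DF_WAIT,
    (State.DF_WAIT, Event.DF_TIMER): State.DF_CALC,
    (State.DF_CALC, Event.CALCULATED): State.DF_DONE,
    (State.DF_DONE, Event.VLAN_CHANGE): State.DF_CALC,
    (State.DF_DONE, Event.RCVD_ES): State.DF_CALC,
    (State.DF_DONE, Event.LOST_ES): State.DF_CALC,
}


def next_state(state, event):
    """Return the state reached from `state` when `event` occurs."""
    if event is Event.ES_DOWN:
        # Assumption: ES_DOWN brings the FSM back to INIT from any state.
        return State.INIT
    return TRANSITIONS.get((state, event), state)
```

A caller would run the timer and election actions on entry to DF_WAIT and DF_CALC; the table only captures which state comes next.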
3.2 The DF Election Extended Community 3.2 The DF Election Extended Community
For the DF election procedures to be globally consistent and For the DF election procedures to be consistent and unanimous, it is
unanimous, it is necessary that all the participating PEs agree on necessary that all the participating PEs agree on the DF Election
the DF Election type and capabilities to be used. For instance, it is algorithm and capabilities to be used. For instance, it is not
not possible that some PEs continue to use the default DF Election possible that some PEs continue to use the default DF Election
algorithm and some PEs use HRW. For brown-field deployments and for algorithm and some PEs use HRW. For brown-field deployments and for
interoperability with legacy boxes, its is important that all PEs interoperability with legacy PEs, it is important that all PEs need
need to have the capability to fall back on the Default DF Election. to have the capability to fall back on the Default DF Election. A PE
A PE can indicate its willingness to support HRW and/or AC-DF by can indicate its willingness to support HRW and/or AC-DF by signaling
signaling a DF Election Extended Community along with the Ethernet a DF Election Extended Community along with the Ethernet Segment
Segment Route (Type-4). Route (Type-4).
The DF Election Extended Community is a new BGP transitive extended The DF Election Extended Community is a new BGP transitive extended
community attribute [RFC4360] that is defined to identify the DF community attribute [RFC4360] that is defined to identify the DF
election procedure to be used for the Ethernet Segment. Figure 4 election procedure to be used for the Ethernet Segment. Figure 4
shows the encoding of the DF Election Extended Community. shows the encoding of the DF Election Extended Community.
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type=0x06 | Sub-Type(0x06)| DF Type | Bitmap | | Type=0x06 | Sub-Type(0x06)| DF Alg | Bitmap |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Reserved = 0 | | Bitmap | Reserved |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 4 DF Election Extended Community Figure 4 DF Election Extended Community
Where: Where:
o Type is 0x06 as registered with IANA for EVPN Extended Communities. o Type is 0x06 as registered with IANA for EVPN Extended Communities.
o Sub-Type is 0x06 - "DF Election Extended Community" as requested by o Sub-Type is 0x06 - "DF Election Extended Community" as requested by
this document to IANA. this document to IANA.
o DF Type (1 octet) - Encodes the DF Election algorithm values o DF Alg (1 octet) - Encodes the DF Election algorithm values
(between 0 and 255) that the advertising PE desires to use for the (between 0 and 255) that the advertising PE desires to use for the
ES. This document requests IANA to set up a registry called "DF ES. This document requests IANA to set up a registry called "DF Alg
Type Registry" and solicits the following values: Registry" and solicits the following values:
- Type 0: Default DF Election algorithm, or modulus-based algorithm - Type 0: Default DF Election algorithm, or modulus-based algorithm
as in [RFC7432]. as in [RFC7432].
- Type 1: HRW algorithm (explained in this document). - Type 1: HRW algorithm (explained in this document).
- Types 2-254: Unassigned. - Types 2-254: Unassigned.
- Type 255: Reserved for Experimental Use. - Type 255: Reserved for Experimental Use.
o Bitmap (1 octet) - Encodes "capabilities" associated to the DF o Bitmap (2 octets) - Encodes "capabilities" to use with the DF
Election algorithm in the field "DF Type". This document requests Election algorithm in the field "DF Alg". This document requests
IANA to create a registry for the Bitmap field, called "DF Election IANA to create a registry for the Bitmap field, with values 0-15,
Capabilities" and solicits the following values: called "DF Election Capabilities" and solicits the following
values:
- Bit 24: Unassigned. - Bit 0: Unassigned.
- Bit 25: AC-DF (AC-Influenced DF Election, explained in this - Bit 1: AC-DF (AC-Influenced DF Election, explained in this
document). When set to 1, it indicates the desire to use AC- document). When set to 1, it indicates the desire to use AC-
Influenced DF Election with the rest of the PEs in the ES. Influenced DF Election with the rest of the PEs in the ES.
- Bits 26-31: Unassigned. - Bits 2-15: Unassigned.
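As an illustration of the -04 layout in Figure 4 (1-octet DF Alg, 2-octet Bitmap, 3 reserved octets), the following sketch packs and unpacks the eight octets of the extended community. The function names are hypothetical. Two points are assumptions rather than statements from the draft: bit 0 is taken to be the most significant bit of the Bitmap field, and the reserved octets are sent as zero.

```python
import struct

EVPN_TYPE = 0x06
DF_ELECTION_SUBTYPE = 0x06
AC_DF_BIT = 1  # bit position of AC-DF within the 2-octet Bitmap


def bitmap_mask(bit):
    # Assumption: bit 0 is the most significant bit of the Bitmap field.
    return 1 << (15 - bit)


def encode_df_election_ec(df_alg, ac_df=False):
    """Build the 8-octet DF Election Extended Community value."""
    if not 0 <= df_alg <= 255:
        raise ValueError("DF Alg must fit in one octet")
    bitmap = bitmap_mask(AC_DF_BIT) if ac_df else 0
    # Type, Sub-Type, DF Alg (1 octet each), Bitmap (2 octets),
    # Reserved (3 octets, written as zero by the pad code "3x").
    return struct.pack("!BBBH3x", EVPN_TYPE, DF_ELECTION_SUBTYPE, df_alg, bitmap)


def decode_df_election_ec(data):
    """Return (df_alg, ac_df) from an 8-octet extended community."""
    ec_type, sub_type, df_alg, bitmap = struct.unpack("!BBBH3x", data)
    if (ec_type, sub_type) != (EVPN_TYPE, DF_ELECTION_SUBTYPE):
        raise ValueError("not a DF Election Extended Community")
    return df_alg, bool(bitmap & bitmap_mask(AC_DF_BIT))
```

For example, `encode_df_election_ec(1, ac_df=True)` would signal HRW together with the AC-DF capability.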
The DF Election Extended Community is used as follows: The DF Election Extended Community is used as follows:
o A PE SHOULD attach the DF Election Extended Community to any o A PE SHOULD attach the DF Election Extended Community to any
advertised ES route and the Extended Community MUST be sent if the advertised ES route and the Extended Community MUST be sent if the
ES is locally configured with a DF election type different from the ES is locally configured with a DF election algorithm other than
Default Election algorithm or if a capability is required to be the Default Election algorithm or if a capability is required to be
used. In the Extended Community, the PE indicates the desired "DF used. In the Extended Community, the PE indicates the desired "DF
Type" algorithm and "Bitmap" capabilities to be used for the ES. Alg" algorithm and "Bitmap" capabilities to be used for the ES.
- Only one DF Election Extended Community can be sent along with an - Only one DF Election Extended Community can be sent along with an
ES route. Note that the intent is not for the advertising PE to ES route. Note that the intent is not for the advertising PE to
indicate all the supported DF Types and capabilities, but signal indicate all the supported DF election algorithms and
the preferred ones. capabilities, but signal the preferred one.
- DF Types 0 and 1 can be both used with bit AC-DF set to 0 or 1. - DF Algs 0 and 1 can be both used with bit AC-DF set to 0 or 1.
- In general, a specific DF Type MAY determine the use of the - In general, a specific DF Alg MAY determine the use of the
reserved bits in the Extended Community. In case of DF Type HRW, reserved bits in the Extended Community, which may be used in a
the reserved bits will be sent as 0 and will be ignored on different way for a different DF Alg.
reception.
o When a PE receives the ES Routes from all the other PEs for the ES o When a PE receives the ES Routes from all the other PEs for the ES
in question, it checks to see if all the advertisements have the in question, it checks to see if all the advertisements have the
extended community with the same DF Type and Bitmap: extended community with the same DF Alg and Bitmap:
- In the case that they do, this particular PE MUST follow the - In the case that they do, this particular PE MUST follow the
procedures for the advertised DF Type and capabilities. For procedures for the advertised DF Alg and capabilities. For
instance, if all ES routes for a given ES indicate DF Type HRW instance, if all ES routes for a given ES indicate DF Alg HRW and
and AC-DF set to 1, the receiving PE and by induction all the AC-DF set to 1, the receiving PE and by induction all the other
other PEs in the ES will proceed to do DF Election as per the HRW PEs in the ES will proceed to do DF Election as per the HRW
Algorithm and following the AC-DF procedures. Algorithm and following the AC-DF procedures.
- Otherwise if even a single advertisement for the type-4 route is - Otherwise if even a single advertisement for the type-4 route is
not received with the locally configured DF Type and capability, not received with the locally configured DF Alg and capability,
the default DF Election algorithm (modulus) MUST be the default DF Election algorithm (modulus) MUST be
used as in [RFC7432]. used as in [RFC7432].
- The absence of the DF Election Extended Community MUST be - The absence of the DF Election Extended Community MUST be
interpreted by a receiving PE as an indication of the default DF interpreted by a receiving PE as an indication of the default DF
Election algorithm on the sending PE, that is, DF Type 0 and no Election algorithm on the sending PE, that is, DF Alg 0 and no DF
DF Election capabilities. Election capabilities.
o When all the PEs in an ES advertise DF Type 255, they will rely on o When all the PEs in an ES advertise DF Type 255, they will rely on
the local policy to decide how to proceed with the DF Election. the local policy to decide how to proceed with the DF Election.
o For any new capability defined in the future, the o For any new capability defined in the future, the
applicability/compatibility of this new capability to the existing applicability/compatibility of this new capability to the existing
DF types must be assessed on a per case by case basis. DF Algs must be assessed on a case by case basis.
o Likewise, for any new DF type defined in future, its o Likewise, for any new DF Alg defined in future, its
applicability/compatibility to the existing capabilities must be applicability/compatibility to the existing capabilities must be
assessed on a per case by case basis. assessed on a case by case basis.
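The agreement check described in the list above can be sketched as follows. This is a simplified illustration with hypothetical names: each advertisement is modeled as a (DF Alg, Bitmap) pair, and `None` stands for an ES route received without the extended community. The sketch assumes that falling back to the default algorithm also means using no capabilities; the local-policy case for value 255 is not modeled.

```python
DEFAULT = (0, 0)  # DF Alg 0 (modulus-based), no capabilities


def agreed_procedure(local, received):
    """Pick the (df_alg, bitmap) pair to use for the ES.

    local    -- (df_alg, bitmap) configured on this PE
    received -- one entry per remote ES route: a (df_alg, bitmap) pair,
                or None if the route carried no DF Election community
    """
    # A missing community is interpreted as DF Alg 0 and no capabilities.
    adverts = [DEFAULT if adv is None else adv for adv in received]
    if all(adv == local for adv in adverts):
        return local
    # Any mismatch: fall back to the default election
    # (assumption: capabilities are dropped as well).
    return DEFAULT
```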
3.3 Auto-Derivation of ES-Import Route Target 3.3 Auto-Derivation of ES-Import Route Target
Section 7.6 of [RFC7432] describes how the value of the ES-Import Section 7.6 of [RFC7432] describes how the value of the ES-Import
Route Target for ESI types 1, 2, and 3 can be auto-derived by using Route Target for ESI types 1, 2, and 3 can be auto-derived by using
the high-order six bytes of the nine byte ESI value. The same auto- the high-order six bytes of the nine byte ESI value. The same auto-
derivation procedure can be extended to ESI types 0, 4, and 5 as long derivation procedure can be extended to ESI types 0, 4, and 5 as long
as it is ensured that the auto-derived values for ES-Import RT among as it is ensured that the auto-derived values for ES-Import RT among
different ES types don't overlap. different ES types don't overlap.
4. The Highest Random Weight DF Election Type 4. The Highest Random Weight DF Election Algorithm
The procedure discussed in this section is applicable to the DF The procedure discussed in this section is applicable to the DF
Election in EVPN Services [RFC7432] and EVPN Virtual Private Wire Election in EVPN Services [RFC7432] and EVPN Virtual Private Wire
Services [RFC8214]. Services [RFC8214].
Highest Random Weight (HRW) as defined in [HRW1999] is originally Highest Random Weight (HRW) as defined in [HRW1999] is originally
proposed in the context of Internet Caching and proxy Server load proposed in the context of Internet Caching and proxy Server load
balancing. Given an object name and a set of servers, HRW maps a balancing. Given an object name and a set of servers, HRW maps a
request to a server using the object-name (object-id) and server-name request to a server using the object-name (object-id) and server-name
(server-id) rather than the server states. HRW forms a (server-id) rather than the server states. HRW forms a
skipping to change at page 17, line 34 skipping to change at page 17, line 29
HRW is not the only algorithm that addresses the object to server HRW is not the only algorithm that addresses the object to server
mapping problem with goals of fair load distribution, redundancy and mapping problem with goals of fair load distribution, redundancy and
fast access. There is another family of algorithms that also fast access. There is another family of algorithms that also
addresses this problem; these fall under the umbrella of the addresses this problem; these fall under the umbrella of the
Consistent Hashing Algorithms [CHASH]. These will not be considered Consistent Hashing Algorithms [CHASH]. These will not be considered
here. here.
4.2. HRW Algorithm for EVPN DF Election 4.2. HRW Algorithm for EVPN DF Election
The applicability of HRW to DF Election is described here. Let DF(v) This section describes the application of HRW to DF election. Let
denote the Designated Forwarder and BDF(v) the Backup Designated DF(v) denote the Designated Forwarder and BDF(v) the Backup
forwarder for the Ethernet Tag V, where v is the VLAN, Si is the IP Designated forwarder for the Ethernet Tag v, where v is the VLAN, Si
address of server i, Es denotes the Ethernet Segment Identifier and is the IP address of server i, Es denotes the Ethernet Segment
weight is a pseudo-random function of v and Si. Identifier and weight is a function of v, Si, and Es.
Note that while the DF election algorithm in [RFC7432] uses PE Note that while the DF election algorithm in [RFC7432] uses PE
address and vlan as inputs, this document uses PE address, ESI, and address and vlan as inputs, this document uses Ethernet Tag, PE
vlan as inputs. This is because if the same set of PEs are multi- address and ESI as inputs. This is because if the same set of PEs are
homed to the same set of ESes, then the DF election algorithm used in multi-homed to the same set of ESes, then the DF election algorithm
[RFC7432] would result in the same PE being elected DF for the same used in [RFC7432] would result in the same PE being elected DF for
set of broadcast domains on each ES, which can have adverse side- the same set of broadcast domains on each ES, which can have adverse
effects on both load balancing and redundancy. Including ESI in the side-effects on both load balancing and redundancy. Including ESI in
DF election algorithm introduces additional entropy which the DF election algorithm introduces additional entropy which
significantly reduces the probability of the same PE being elected DF significantly reduces the probability of the same PE being elected DF
for the same set of broadcast domains on each ES. Therefore, the ESI for the same set of broadcast domains on each ES. Therefore, the ESI
value in the Weight function below SHOULD be set to that of value in the Weight function below SHOULD be set to that of
corresponding ES. The ESI value MAY be set to all 0's in the Weight corresponding ES. The ESI value MAY be set to all 0's in the Weight
function below if the operator chooses so. function below if the operator chooses so.
In case of a VLAN-Bundle service, v denotes the lowest VLAN similar In case of a VLAN-Bundle service, v denotes the lowest VLAN similar
to the 'lowest VLAN in bundle' logic of [RFC7432]. to the 'lowest VLAN in bundle' logic of [RFC7432].
1. DF(v) = Si: Weight(v, Es, Si) >= Weight(V, Es, Sj), for all j. In 1. DF(v) = Si: Weight(v, Es, Si) >= Weight(v, Es, Sj), for all j. In
case of a tie, choose the PE whose IP address is numerically the case of a tie, choose the PE whose IP address is numerically the
least. Note 0 <= i,j <= Number of PEs in the redundancy group. least. Note 0 <= i,j <= Number of PEs in the redundancy group.
2. BDF(v) = Sk: Weight(v, Es, Si) >= Weight(v, Es, Sk) and Weight(v, 2. BDF(v) = Sk: Weight(v, Es, Si) >= Weight(v, Es, Sk) and Weight(v,
Es, Sk) >= Weight(v, Es, Sj). In case of tie choose the PE whose Es, Sk) >= Weight(v, Es, Sj). In case of tie choose the PE whose
IP address is numerically the least. IP address is numerically the least.
Since the Weight is a Pseudo-random function with domain as the Since the Weight is a Pseudo-random function with domain as the
three-tuple (v, Es, S), it is an efficient deterministic algorithm three-tuple (v, Es, S), it is an efficient deterministic algorithm
which is independent of the Ethernet Tag V sample space distribution. that is independent of the Ethernet Tag v sample space distribution.
Choosing a good hash function for the pseudo-random function is an Choosing a good hash function for the pseudo-random function is an
important consideration for this algorithm to perform probably better important consideration for this algorithm to perform better than the
than the default algorithm. As mentioned previously, such functions default algorithm. As mentioned previously, such functions are
are described in the HRW paper. We take as candidate hash functions described in the HRW paper. We take as candidate hash functions two
two of the ones that are preferred in [HRW1999]. of the ones that are preferred in [HRW1999].
1. Wrand(v, Es, Si) = (1103515245((1103515245.Si+12345)XOR 1. Wrand(v, Es, Si) = (1103515245((1103515245.Si+12345)XOR
D(v,Es))+12345)(mod 2^31) and D(v,Es))+12345)(mod 2^31) and
2. Wrand2(v, Es, Si) = (1103515245((1103515245.D(v,Es)+12345)XOR 2. Wrand2(v, Es, Si) = (1103515245((1103515245.D(v,Es)+12345)XOR
Si)+12345)(mod 2^31) Si)+12345)(mod 2^31)
Here D(v,Es) is the 31-bit digest (CRC-32 and discarding the MSB as Here D(v,Es) is the 31-bit digest (CRC-32 and discarding the MSB as
in [HRW1999]) of the 14-byte stream, the Ethernet Tag v (4 bytes) in [HRW1999]) of the 14-byte stream, the Ethernet Tag v (4 bytes)
followed by the Ethernet Segment Identifier (10 bytes). It is followed by the Ethernet Segment Identifier (10 bytes). It is
mandated that the 14-byte stream is formed by concatenation of the mandated that the 14-byte stream is formed by concatenation of the
Ethernet tag and the Ethernet Segment identifier in network byte Ethernet tag and the Ethernet Segment identifier in network byte
order. The CRC should proceed as if the architecture is in network order. The CRC should proceed as if the stream is in network byte
byte order (big-endian). Si is address of the ith server. The order (big-endian). Si is address of the ith server. The server's IP
server's IP address length does not matter as only the low-order 31 address length does not matter as only the low-order 31 bits are
bits are modulo significant. Although both the above hash functions modulo significant. Although both the above hash functions perform
perform similarly, we select the first hash function (1) of choice, similarly, we select the first hash function (1) of choice, as the
as the hash function has to be the same in all the PEs participating hash function has to be the same in all the PEs participating in the
in the DF election. DF election.
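A minimal sketch of the first weight function, Wrand, and of the resulting DF/BDF selection is given below. It is illustrative only. It assumes that the CRC-32 intended here is the common IEEE 802.3 one computed by Python's `zlib.crc32`, and that the PE address is used as the integer value of its IP address; the function names are hypothetical.

```python
import ipaddress
import struct
import zlib

A, C, MOD = 1103515245, 12345, 2 ** 31


def digest(vlan, esi):
    """31-bit digest D(v, Es) of the 14-byte stream (tag, then ESI)."""
    if len(esi) != 10:
        raise ValueError("ESI must be 10 bytes")
    # Ethernet Tag as 4 bytes in network byte order, followed by the ESI.
    stream = struct.pack("!I", vlan) + esi
    # Assumption: zlib's CRC-32; the most significant bit is discarded.
    return zlib.crc32(stream) & 0x7FFFFFFF


def wrand(vlan, esi, pe_ip):
    """Wrand(v, Es, Si); only the low-order 31 bits of Si matter."""
    si = int(ipaddress.ip_address(pe_ip))
    return (A * ((A * si + C) ^ digest(vlan, esi)) + C) % MOD


def elect(vlan, esi, candidates):
    """Return (DF, BDF) for a non-empty list of candidate PE addresses."""
    # Highest weight first; ties go to the numerically lowest IP address.
    ranked = sorted(
        candidates,
        key=lambda ip: (-wrand(vlan, esi, ip), int(ipaddress.ip_address(ip))),
    )
    bdf = ranked[1] if len(ranked) > 1 else None
    return ranked[0], bdf
```

Because every PE ranks the same candidates with the same function, removing a PE that is neither DF nor BDF for a given Ethernet Tag leaves the result for that tag unchanged.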
A point to note is that the Weight function takes into consideration A point to note is that the Weight function takes into consideration
the combination of the Ethernet Tag, Ethernet Segment and the PE IP- the combination of the Ethernet Tag, Ethernet Segment and the PE IP-
address, and the actual length of the server IP address (whether V4 address, and the actual length of the server IP address (whether V4
or V6) is not really relevant. The default algorithm in [RFC7432] or V6) is not really relevant. The default algorithm in [RFC7432]
cannot employ both V4 and V6 PE addresses, since [RFC7432] does not cannot employ both V4 and V6 PE addresses, since [RFC7432] does not
specify how to decide on the ordering (the ordinal list) when both V4 specify how to decide on the ordering (the ordinal list) when both V4
and V6 PEs are present. and V6 PEs are present.
HRW solves the disadvantage pointed out in Section 2.2.1 and ensures: HRW solves the disadvantage pointed out in Section 2.2.1 and ensures:
o with very high probability that the task of DF election for o with very high probability that the task of DF election for the
respective VLANs is more or less equally distributed among the PEs VLANs configured on an ES is more or less equally distributed among
even for the 2 PE case. the PEs even for the 2 PE case.
o If a PE, hosting some VLANs on given ES, but is neither the DF nor o If a PE that is not the DF or the BDF for that VLAN, goes down or
the BDF for that VLAN, goes down or its connection to the ES goes its connection to the ES goes down, it does not result in a DF or
down, it does not result in a DF and BDF reassignment the other BDF reassignment. This saves computation, especially in the case
PEs. This saves computation, especially in the case when the when the connection flaps.
connection flaps.
o More importantly it avoids the needless disruption case of Section o More importantly it avoids the needless disruption case of Section
2.2.1 (3), that is inherent in the existing default DF Election. 2.2.1 (3), that is inherent in the existing default DF Election.
o In addition to the DF, the algorithm also furnishes the BDF, which o In addition to the DF, the algorithm also furnishes the BDF, which
would be the DF if the current DF fails. would be the DF if the current DF fails.
5. The Attachment Circuit Influenced DF Election Capability 5. The Attachment Circuit Influenced DF Election Capability
The procedure discussed in this section is applicable to the DF The procedure discussed in this section is applicable to the DF
Election in EVPN Services [RFC7432] and EVPN Virtual Private Wire Election in EVPN Services [RFC7432] and EVPN Virtual Private Wire
Services [RFC8214]. Services [RFC8214].
The AC-DF capability MAY be used with any "DF Type" algorithm. It The AC-DF capability MAY be used with any "DF Alg" algorithm. It MUST
MUST modify the DF Election procedures by removing from consideration modify the DF Election procedures by removing from consideration any
any candidate PE in the ES that cannot forward traffic on the AC that candidate PE in the ES that cannot forward traffic on the AC that
belongs to the BD. This section is applicable to VLAN-Based and VLAN- belongs to the BD. This section is applicable to VLAN-Based and VLAN-
Bundle service interfaces. Section 5.1 describes the procedures for Bundle service interfaces. Section 5.1 describes the procedures for
VLAN-Aware Bundle interfaces. VLAN-Aware Bundle interfaces.
In particular, when used with the default DF Type, the AC-DF In particular, when used with the default DF Alg, the AC-DF
capability modifies the Step 3 in the DF Election procedure described capability modifies the Step 3 in the DF Election procedure described
in [RFC7432] Section 8.5, as follows: in [RFC7432] Section 8.5, as follows:
3. When the timer expires, each PE builds an ordered "candidate" list 3. When the timer expires, each PE builds an ordered "candidate" list
of the IP addresses of all the PE nodes connected to the Ethernet of the IP addresses of all the PE nodes attached to the Ethernet
Segment (including itself), in increasing numeric value. The Segment (including itself), in increasing numeric value. The
candidate list is based on the Originator Router's IP addresses of candidate list is based on the Originator Router's IP addresses of
the ES routes, excluding all the PEs for which no Ethernet A-D per the ES routes, but excludes any PE from whom no Ethernet A-D per
ES route has been received, or for which the route has been ES route has been received, or from whom the route has been
withdrawn. Afterwards, the DF Election algorithm is applied on a withdrawn. Afterwards, the DF Election algorithm is applied on a
per <ES,VLAN> or <ES,VLAN-bundle>, however, the IP address for a per <ES,VLAN> or <ES,VLAN-bundle>, however, the IP address for a
PE will not be considered candidate for a given <ES,VLAN> or PE will not be considered candidate for a given <ES,VLAN> or
<ES,VLAN-bundle> until the corresponding Ethernet A-D per EVI <ES,VLAN-bundle> until the corresponding Ethernet A-D per EVI
route has been received from that PE. In other words, the ACS on route has been received from that PE. In other words, the ACS on
the ES for a given PE must be UP so that the PE is considered as the ES for a given PE must be UP so that the PE is considered as
candidate for a given BD. candidate for a given BD.
The above paragraph differs from [RFC7432] Section 8.5, Step 3, in The above paragraph differs from [RFC7432] Section 8.5, Step 3, in
two aspects: two aspects:
o Any DF Type algorithm can be used, and not only the modulus-based o Any DF Alg algorithm can be used, and not only the modulus-based
one (which is the default DF Election, or DF Type 0 in this one (which is the default DF Election, or DF Alg 0 in this
document). document).
o The candidate list is pruned based on the Ethernet A-D routes: a o The candidate list is pruned based upon non-receipt of Ethernet A-D
PE's IP address MUST be removed from the ES candidate list if its routes: a PE's IP address MUST be removed from the ES candidate
Ethernet A-D per ES route is withdrawn. A PE's IP address MUST NOT list if its Ethernet A-D per ES route is withdrawn. A PE's IP
be considered as candidate DF for a <ES,VLAN> or <ES,VLAN-bundle>, address MUST NOT be considered as candidate DF for a <ES,VLAN> or
if its Ethernet A-D per EVI route for the <ES,VLAN> or <ES,VLAN- <ES,VLAN-bundle>, if its Ethernet A-D per EVI route for the
bundle> respectively, is withdrawn. <ES,VLAN> or <ES,VLAN-bundle> respectively, is withdrawn.
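The pruning described above, combined with the default (modulus-based) algorithm, can be sketched as follows. This is a simplified model with hypothetical names: route state is reduced to sets of PE addresses, the local PE is assumed to be listed like any other originator, and timers are not modeled.

```python
import ipaddress


def ac_df_candidates(es_route_originators, ad_per_es, ad_per_evi, vlan):
    """Ordered candidate list for <ES,VLAN> with AC-DF in use.

    es_route_originators -- PE addresses that advertised an ES route
    ad_per_es            -- PE addresses with an active Ethernet A-D per ES route
    ad_per_evi           -- dict: vlan -> PE addresses with an active
                            Ethernet A-D per EVI route for that VLAN
    """
    active_for_vlan = ad_per_evi.get(vlan, set())
    eligible = {
        pe for pe in es_route_originators
        if pe in ad_per_es and pe in active_for_vlan
    }
    # Increasing numeric value of the IP address, as in RFC 7432 Step 3.
    return sorted(eligible, key=lambda ip: int(ipaddress.ip_address(ip)))


def default_df(candidates, vlan):
    """Default election: the PE at ordinal (V mod N), or None if no candidate."""
    if not candidates:
        return None
    return candidates[vlan % len(candidates)]
```

With three PEs on the ES and one of them having withdrawn its Ethernet A-D per EVI route for a VLAN, the election for that VLAN runs over the two remaining PEs only.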
The following example illustrates the AC-DF behavior applied to the The following example illustrates the AC-DF behavior applied to the
Default DF election algorithm, assuming the network in Figure 2: Default DF election algorithm, assuming the network in Figure 2:
a) When PE1 and PE2 discover ES12, they advertise an ES route for a) When PE1 and PE2 discover ES12, they advertise an ES route for
ES12 with the associated ES-import extended community and the DF ES12 with the associated ES-import extended community and the DF
Election Extended Community indicating AC-DF=1; they start a timer Election Extended Community indicating AC-DF=1; they start a timer
at the same time. Likewise, PE2 and PE3 advertise an ES route for at the same time. Likewise, PE2 and PE3 advertise an ES route for
ES23 with AC-DF=1 and start a timer. ES23 with AC-DF=1 and start a timer.
skipping to change at page 20, line 43 skipping to change at page 20, line 37
c) In addition, PE1/PE2/PE3 advertise an Ethernet A-D per EVI route c) In addition, PE1/PE2/PE3 advertise an Ethernet A-D per EVI route
for AC1, AC2, AC3 and AC4 as soon as the ACs are enabled. Note for AC1, AC2, AC3 and AC4 as soon as the ACs are enabled. Note
that the AC can be associated to a single customer VID (e.g. VLAN- that the AC can be associated to a single customer VID (e.g. VLAN-
based service interfaces) or a bundle of customer VIDs (e.g. VLAN- based service interfaces) or a bundle of customer VIDs (e.g. VLAN-
Bundle service interfaces). Bundle service interfaces).
d) When the timer expires, each PE builds an ordered "candidate" list d) When the timer expires, each PE builds an ordered "candidate" list
of the IP addresses of all the PE nodes connected to the Ethernet of the IP addresses of all the PE nodes connected to the Ethernet
Segment (including itself) as explained above in [RFC7432] Step 3. Segment (including itself) as explained above in [RFC7432] Step 3.
All the PEs for which no Ethernet A-D per ES route has been Any PE from which an Ethernet A-D per ES route has not been
received, are pruned from the list. received is pruned from the list.
e) When electing the DF for a given BD, a PE will not be considered e) When electing the DF for a given BD, a PE will not be considered
candidate until an Ethernet A-D per EVI route has been received candidate until an Ethernet A-D per EVI route has been received
from that PE. In other words, the ACS on the ES for a given PE from that PE. In other words, the ACS on the ES for a given PE
must be UP so that the PE is considered as candidate for a given must be UP so that the PE is considered as candidate for a given
BD. For example, PE1 will not consider PE2 as candidate for DF BD. For example, PE1 will not consider PE2 as candidate for DF
election for <ES12,VLAN-1> until an Ethernet A-D per EVI route is election for <ES12,VLAN-1> until an Ethernet A-D per EVI route is
received from PE2 for <ES12,VLAN-1>. received from PE2 for <ES12,VLAN-1>.
f) Once the PEs with ACS = DOWN for a given BD have been removed from f) Once the PEs with ACS = DOWN for a given BD have been removed from
skipping to change at page 22, line 8 skipping to change at page 21, line 45
performed per <ES,VLAN-Bundle>. The withdrawal of an individual route performed per <ES,VLAN-Bundle>. The withdrawal of an individual route
only indicates the unavailability of a specific AC but not only indicates the unavailability of a specific AC but not
necessarily all the ACs in the <ES,VLAN-Bundle>. necessarily all the ACs in the <ES,VLAN-Bundle>.
This document modifies the DF Election for VLAN-Aware Bundle services This document modifies the DF Election for VLAN-Aware Bundle services
in the following way: in the following way:
o After confirming that all the PEs in the ES advertise the AC-DF o After confirming that all the PEs in the ES advertise the AC-DF
capability, a PE will perform a DF Election per <ES,VLAN>, as capability, a PE will perform a DF Election per <ES,VLAN>, as
opposed to per <ES,VLAN-Bundle> in [RFC7432]. Now, the withdrawal opposed to per <ES,VLAN-Bundle> in [RFC7432]. Now, the withdrawal
of an Ethernet per EVI route for a VLAN will indicate that the of an Ethernet A-D per EVI route for a VLAN will indicate that the
advertising PE's ACS is DOWN and the rest of the PEs in the ES can advertising PE's ACS is DOWN and the rest of the PEs in the ES can
remove the PE from consideration for DF in the <ES,VLAN>. remove the PE from consideration for DF in the <ES,VLAN>.
o The PEs will now follow the procedures in section 5. o The PEs will now follow the procedures in section 5.
For example, assuming three bridge tables in PE1 for the same MAC-VRF For example, assuming three bridge tables in PE1 for the same MAC-VRF
(each one associated to a different Ethernet Tag, e.g. VLAN-1, VLAN-2 (each one associated to a different Ethernet Tag, e.g. VLAN-1, VLAN-2
and VLAN-3), PE1 will advertise three Ethernet A-D per EVI routes for and VLAN-3), PE1 will advertise three Ethernet A-D per EVI routes for
ES12. Each of the three routes will indicate the status of each of ES12. Each of the three routes will indicate the status of each of
the three ACs in ES12. PE1 will be considered as a valid candidate PE the three ACs in ES12. PE1 will be considered as a valid candidate PE
for DF election in <ES12,VLAN-1>, <ES12,VLAN-2>, <ES12,VLAN-3> as for DF election in <ES12,VLAN-1>, <ES12,VLAN-2>, <ES12,VLAN-3> as
long as its three routes are active. For instance, if PE1 withdraws long as its three routes are active. For instance, if PE1 withdraws
the Ethernet A-D per EVI routes for <ES12,VLAN-1>, the PEs in ES12 the Ethernet A-D per EVI routes for <ES12,VLAN-1>, the PEs in ES12
will not consider PE1 as a suitable DF candidate for <ES12,VLAN-1>. will not consider PE1 as a suitable DF candidate for <ES12,VLAN-1>.
PE1 will still be considered for <ES12,VLAN-2> and <ES12,VLAN-3>
since its routes are active.
6. Solution Benefits 6. Solution Benefits
The solution described in this document provides the following The solution described in this document provides the following
benefits: benefits:
a) Extends the DF Election in [RFC7432] to address the unfair load- a) Extends the DF Election in [RFC7432] to address the unfair load-
balancing and potential black-holing issues of the default DF balancing and potential black-holing issues of the default DF
Election algorithm. The solution is applicable to the DF Election Election algorithm. The solution is applicable to the DF Election
in EVPN Services [RFC7432] and EVPN Virtual Private Wire Services in EVPN Services [RFC7432] and EVPN Virtual Private Wire Services
skipping to change at page 22, line 47 skipping to change at page 22, line 40
defining the DF Election Extended Community, which allow signaling defining the DF Election Extended Community, which allow signaling
of the capabilities supported by this document as well as any of the capabilities supported by this document as well as any
other future DF Election algorithms and capabilities. other future DF Election algorithms and capabilities.
c) The solution is backwards compatible with the procedures defined c) The solution is backwards compatible with the procedures defined
in [RFC7432]. If one or more PEs in the ES do not support the new in [RFC7432]. If one or more PEs in the ES do not support the new
procedures, they will all follow the [RFC7432] DF Election. procedures, they will all follow the [RFC7432] DF Election.
7. Security Considerations 7. Security Considerations
The same Security Considerations described in [RFC7432] are valid for This document addresses some identified issues in the DF Election
this document. procedures described in [RFC7432] by defining a new DF Election
framework. In general, this framework allows the PEs that are part of
the same Ethernet Segment to exchange additional information and
agree on the DF Election Type and Capabilities to be used.
Following the procedures in this document, the operator will minimize
undesired situations such as unfair load-balancing, service
disruption and traffic black-holing. Since those situations may have
been purposely created by a malicious user with access to the
configuration of one PE, this document enhances also the security of
the network. In addition, the new framework is extensible and allows
for future new security enhancements that are out of the scope of
this document. Finally, since this document extends the procedures in
[RFC7432], the same Security Considerations described in [RFC7432]
are valid for this document.
8. IANA Considerations 8. IANA Considerations
IANA is requested to: IANA is requested to:
o Allocate Sub-Type value 0x06 as "DF Election Extended Community" in o Allocate Sub-Type value 0x06 in the "EVPN Extended Community Sub-
the "EVPN Extended Community Sub-Types" registry. Types" registry defined in [RFC7153] as follows:
o Set up a registry "DF Type" for the DF Type octet in the Extended SUB-TYPE VALUE NAME Reference
Community. The following values in that registry are requested: -------------- ------------------------- -------------
0x06 DF Election Extended Community This document
- Type 0: Default DF Election. o Set up a registry called "DF Alg" for the DF Alg octet in the
- Type 1: HRW algorithm. Extended Community. New registrations will be made through the "RFC
- Type 255: Reserved for Experimental use. Required" procedure defined in [RFC8126]. The following initial
values in that registry are requested:
o Set up a registry "DF Election Capabilities" for the Bitmap octet Alg Name Reference
in the Extended Community. The following values in that registry ---- -------------- -------------
are requested: 0 Default DF Election This document
1 HRW algorithm This document
2-254 Unassigned
255 Reserved for Experimental use This document
- Bit 25: AC-DF capability. o Set up a registry called "DF Election Capabilities" for the two-
octet Bitmap field in the Extended Community. New registrations
will be made through the "RFC Required" procedure defined in
[RFC8126]. The following initial value in that registry is
requested:
o The registration policy for the two registries is "Specification Bit Name Reference
Required". ---- -------------- -------------
0 Unassigned
1 AC-DF capability This document
2-15 Unassigned
9. References 9. References
9.1. Normative References 9.1. Normative References
[RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A., [RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based Ethernet Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based Ethernet
VPN", RFC 7432, DOI 10.17487/RFC7432, February 2015, VPN", RFC 7432, DOI 10.17487/RFC7432, February 2015,
<https://www.rfc-editor.org/info/rfc7432>. <https://www.rfc-editor.org/info/rfc7432>.
skipping to change at page 24, line 13 skipping to change at page 24, line 29
1997, <https://www.rfc-editor.org/info/rfc2119>. 1997, <https://www.rfc-editor.org/info/rfc2119>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017,
<https://www.rfc-editor.org/info/rfc8174>. <https://www.rfc-editor.org/info/rfc8174>.
[RFC4360] Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended [RFC4360] Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended
Communities Attribute", RFC 4360, DOI 10.17487/RFC4360, February Communities Attribute", RFC 4360, DOI 10.17487/RFC4360, February
2006, <http://www.rfc-editor.org/info/rfc4360>. 2006, <http://www.rfc-editor.org/info/rfc4360>.
[RFC7153] Rosen, E. and Y. Rekhter, "IANA Registries for BGP
Extended Communities", RFC 7153, DOI 10.17487/RFC7153, March 2014,
<https://www.rfc-editor.org/info/rfc7153>.
[RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for
Writing an IANA Considerations Section in RFCs", BCP 26, RFC 8126,
DOI 10.17487/RFC8126, June 2017, <https://www.rfc-
editor.org/info/rfc8126>.
9.2. Informative References 9.2. Informative References
[VPLS-MH] Kothari, Henderickx et al., "BGP based Multi-homing in [VPLS-MH] Kothari, Henderickx et al., "BGP based Multi-homing in
Virtual Private LAN Service", draft-ietf-bess-vpls-multihoming- Virtual Private LAN Service", draft-ietf-bess-vpls-multihoming-
01.txt, work in progress, January, 2016. 02.txt, work in progress, September, 2018.
[CHASH] Karger, D., Lehman, E., Leighton, T., Panigrahy, R., Levine, [CHASH] Karger, D., Lehman, E., Leighton, T., Panigrahy, R., Levine,
M., and D. Lewin, "Consistent Hashing and Random Trees: Distributed M., and D. Lewin, "Consistent Hashing and Random Trees: Distributed
Caching Protocols for Relieving Hot Spots on the World Wide Web", ACM Caching Protocols for Relieving Hot Spots on the World Wide Web", ACM
Symposium on Theory of Computing ACM Press New York, May 1997. Symposium on Theory of Computing ACM Press New York, May 1997.
[CLRS2009] Cormen, T., Leiserson, C., Rivest, R., and C. Stein, [CLRS2009] Cormen, T., Leiserson, C., Rivest, R., and C. Stein,
"Introduction to Algorithms (3rd ed.)", MIT Press and McGraw-Hill "Introduction to Algorithms (3rd ed.)", MIT Press and McGraw-Hill
ISBN 0-262-03384-4., February 2009. ISBN 0-262-03384-4., February 2009.
 End of changes. 72 change blocks. 
205 lines changed or deleted 245 lines changed or added
