draft-ietf-bess-evpn-df-election-framework-09.txt   rfc8584.txt 
BESS Workgroup J. Rabadan, Ed. Internet Engineering Task Force (IETF) J. Rabadan, Ed.
Internet Draft Nokia Request for Comments: 8584 Nokia
Updates: 7432 S. Mohanty, Ed. Updates: 7432 S. Mohanty, Ed.
Intended status: Standards Track A. Sajassi Category: Standards Track A. Sajassi
Cisco ISSN: 2070-1721 Cisco
J. Drake J. Drake
Juniper Juniper
K. Nagaraj K. Nagaraj
S. Sathappan S. Sathappan
Nokia Nokia
April 2019
Expires: July 28, 2019 January 24, 2019 Framework for Ethernet VPN Designated Forwarder Election Extensibility
Framework for EVPN Designated Forwarder Election Extensibility
draft-ietf-bess-evpn-df-election-framework-09
Abstract Abstract
An alternative to the Default Designated Forwarder (DF) selection An alternative to the default Designated Forwarder (DF) selection
algorithm in Ethernet VPN (EVPN) networks is defined. The DF is the algorithm in Ethernet VPNs (EVPNs) is defined. The DF is the
Provider Edge (PE) router responsible for sending broadcast, unknown Provider Edge (PE) router responsible for sending Broadcast, Unknown
unicast and multicast (BUM) traffic to multi-homed Customer Equipment Unicast, and Multicast (BUM) traffic to a multihomed Customer Edge
(CE) on a particular Ethernet Segment (ES) within a VLAN. In (CE) device on a given VLAN on a particular Ethernet Segment (ES).
addition, the capability to influence the DF election result for a In addition, the ability to influence the DF election result for a
VLAN based on the state of the associated Attachment Circuit (AC) is VLAN based on the state of the associated Attachment Circuit (AC) is
specified. This document clarifies the DF Election Finite State specified. This document clarifies the DF election Finite State
Machine in EVPN, therefore it updates the EVPN specification. Machine in EVPN services. Therefore, it updates the EVPN
specification (RFC 7432).
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months Status of This Memo
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at This is an Internet Standards Track document.
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at This document is a product of the Internet Engineering Task Force
http://www.ietf.org/shadow.html (IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by the
Internet Engineering Steering Group (IESG). Further information on
Internet Standards is available in Section 2 of RFC 7841.
This Internet-Draft will expire on July 28, 2019. Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
https://www.rfc-editor.org/info/rfc8584.
Copyright Notice Copyright Notice
Copyright (c) 2019 IETF Trust and the persons identified as the Copyright (c) 2019 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction 1. Introduction
1.1. Default Designated Forwarder (DF) Election in EVPN 1.1. Conventions and Terminology
1.2. Problem Statement 1.2. Default Designated Forwarder (DF) Election in EVPN Services
1.2.1. Unfair Load-Balancing and Service Disruption 1.3. Problem Statement
1.2.2. Traffic Black-Holing on Individual AC Failures 1.3.1. Unfair Load Balancing and Service Disruption
1.3. The Need for Extending the Default DF Election in EVPN 1.3.2. Traffic Black-Holing on Individual AC Failures
2. Conventions and Terminology 1.4. The Need for Extending the Default DF Election in EVPN Services
3. Designated Forwarder Election Protocol and BGP Extensions 2. Designated Forwarder Election Protocol and BGP Extensions
3.1. The DF Election Finite State Machine (FSM) 2.1. The DF Election Finite State Machine (FSM)
3.2. The DF Election Extended Community 2.2. The DF Election Extended Community
3.2.1. Backward Compatibility 2.2.1. Backward Compatibility
3.3. Auto-Derivation of ES-Import Route Target
4. The Highest Random Weight DF Election Algorithm 3. The Highest Random Weight DF Election Algorithm
4.1. HRW and Consistent Hashing 3.1. HRW and Consistent Hashing
4.2. HRW Algorithm for EVPN DF Election 3.2. HRW Algorithm for EVPN DF Election
5. The Attachment Circuit Influenced DF Election Capability 4. The AC-Influenced DF Election Capability
5.1. AC-Influenced DF Election Capability For VLAN-Aware Bundle Services 4.1. AC-Influenced DF Election Capability for VLAN-Aware Bundle Services
6. Solution Benefits 5. Solution Benefits
7. Security Considerations 6. Security Considerations
8. IANA Considerations 7. IANA Considerations
9. References 8. References
9.1. Normative References 8.1. Normative References
9.2. Informative References 8.2. Informative References
10. Acknowledgments Acknowledgments
11. Contributors Contributors
Authors' Addresses Authors' Addresses
1. Introduction 1. Introduction
The Designated Forwarder (DF) in EVPN networks is the Provider Edge The Designated Forwarder (DF) in Ethernet VPNs (EVPNs) is the
(PE) router responsible for sending broadcast, unknown unicast and Provider Edge (PE) router responsible for sending Broadcast, Unknown
multicast (BUM) traffic to a multi-homed Customer Equipment (CE) Unicast, and Multicast (BUM) traffic to a multihomed Customer Edge
device, on a given VLAN on a particular Ethernet Segment (ES). The DF (CE) device on a given VLAN on a particular Ethernet Segment (ES).
is selected out of a list of candidate PEs that advertise the same The DF is elected from the set of multihomed PEs attached to a given
Ethernet Segment Identifier (ESI) to the EVPN network. By default, ES, each of which advertises an ES route for the ES as identified by
EVPN uses a DF Election algorithm referred to as "Service Carving" its Ethernet Segment Identifier (ESI). By default, the EVPN uses a
and it is based on a modulus function (V mod N) that takes the number DF election algorithm referred to as "service carving". The DF
of PEs in the ES (N) and the VLAN value (V) as input. This Default DF election algorithm is based on a modulus function (V mod N) that
Election algorithm has some inefficiencies that this document takes the number of PEs in the ES (N) and the VLAN value (V) as
addresses by defining a new DF Election algorithm and a capability to input. This document addresses inefficiencies in the default DF
influence the DF Election result for a VLAN, depending on the state election algorithm by defining a new DF election algorithm and an
of the associated Attachment Circuit (AC). In order to avoid any ability to influence the DF election result for a VLAN, depending on
ambiguity with the identifier used in the DF Election Algorithm, this the state of the associated Attachment Circuit (AC). In order to
document uses the term Ethernet Tag instead of VLAN. This document avoid any ambiguity with the identifier used in the DF election
also creates a registry with IANA, for future DF Election Algorithms algorithm, this document uses the term "Ethernet Tag" instead of
and Capabilities. It also presents a formal definition and "VLAN". This document also creates a registry with IANA for future
clarification of the DF Election Finite State Machine (FSM), DF election algorithms and capabilities (see Section 7). It also
therefore the document updates [RFC7432] and EVPN implementations presents a formal definition and clarification of the DF election
MUST conform to the prescribed FSM. Finite State Machine (FSM). Therefore, this document updates
[RFC7432], and EVPN implementations MUST conform to the
prescribed FSM.
The procedures described in this document apply to DF election in all The procedures described in this document apply to DF election in all
EVPN solutions including [RFC7432] and [RFC8214]. Apart from the FSM EVPN solutions, including those described in [RFC7432] and [RFC8214].
formal description, this document does not intend to update other Apart from the formal description of the FSM, this document does not
[RFC7432] procedures. It only aims to improve the behavior of the DF intend to update other procedures described in [RFC7432]; it only
Election on PEs that are upgraded to follow the described procedures. aims to improve the behavior of the DF election on PEs that are
upgraded to follow the procedures described in this document.
1.1. Default Designated Forwarder (DF) Election in EVPN 1.1. Conventions and Terminology
[RFC7432] defines the Designated Forwarder (DF) as the EVPN PE The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
responsible for: "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
o Flooding Broadcast, Unknown unicast and Multicast traffic (BUM), on o AC: Attachment Circuit. An AC has an Ethernet Tag associated
a given Ethernet Tag on a particular Ethernet Segment (ES), to the with it.
CE. This is valid for single-active and all-active EVPN
multi-homing.
o Sending unicast traffic on a given Ethernet Tag on a particular ES o ACS: Attachment Circuit Status.
to the CE. This is valid for single-active multi-homing.
Figure 1 illustrates an example that we will use to explain the o BUM: Broadcast, unknown unicast, and multicast.
Designated Forwarder function.
o DF: Designated Forwarder.
o NDF: Non-Designated Forwarder.
o BDF: Backup Designated Forwarder.
o Ethernet A-D per ES route: Refers to Route Type 1 as defined in
[RFC7432] or to Auto-discovery per Ethernet Segment route.
o Ethernet A-D per EVI route: Refers to Route Type 1 as defined in
[RFC7432] or to Auto-discovery per EVPN Instance route.
o ES: Ethernet Segment.
o ESI: Ethernet Segment Identifier.
o EVI: EVPN Instance.
o MAC-VRF: A Virtual Routing and Forwarding table for Media Access
Control (MAC) addresses on a PE.
o BD: Broadcast Domain. An EVI may be comprised of one BD
(VLAN-based or VLAN Bundle services) or multiple BDs (VLAN-aware
Bundle services).
o Bridge table: An instantiation of a BD on a MAC-VRF.
o HRW: Highest Random Weight.
o VID: VLAN Identifier.
o CE-VID: Customer Edge VLAN Identifier.
o Ethernet Tag: Used to represent a BD that is configured on a given
ES for the purpose of DF election. Note that any of the following
may be used to represent a BD: VIDs (including Q-in-Q tags),
configured IDs, VNIs (Virtual Extensible Local Area Network
(VXLAN) Network Identifiers), normalized VIDs, I-SIDs (Service
Instance Identifiers), etc., as long as the representation of the
BDs is configured consistently across the multihomed PEs attached
to that ES. The Ethernet Tag value MUST be different from zero.
o Ethernet Tag ID: Refers to the identifier used in the EVPN routes
defined in [RFC7432]. Its value may be the same as the Ethernet
Tag value (see the definition for Ethernet Tag) when advertising
routes for VLAN-aware Bundle services. Note that in the case of
VLAN-based or VLAN Bundle services, the Ethernet Tag ID is zero.
o DF election procedure: Also called "DF election". Refers to the
process in its entirety, including the discovery of the PEs in the
ES, the creation and maintenance of the PE candidate list, and the
selection of a PE.
o DF algorithm: A component of the DF election procedure. Strictly
refers to the selection of a PE for a given <ES, Ethernet Tag>.
o RR: Route Reflector. A network routing component for BGP
[RFC4456]. It offers an alternative to the logical full-mesh
requirement of the Internal Border Gateway Protocol (IBGP). The
purpose of the RR is concentration. Multiple BGP routers can peer
with a central point, the RR -- acting as a route reflector server
-- rather than peer with every other router in a full mesh. This
results in an O(N) peering as opposed to O(N^2).
o TTL: Time To Live.
This document also assumes that the reader is familiar with the
terminology provided in [RFC7432].
1.2. Default Designated Forwarder (DF) Election in EVPN Services
[RFC7432] defines the DF as the EVPN PE responsible for:
o Flooding BUM traffic on a given Ethernet Tag on a particular ES to
the CE. This is valid for Single-Active and All-Active EVPN
multihoming.
o Sending unicast traffic on a given Ethernet Tag on a particular ES
to the CE. This is valid for Single-Active multihoming.
Figure 1 illustrates an example that we will use to explain the DF
function.
              +---------------+                                 +---------------+
              |    IP/MPLS    |                                 |    IP/MPLS    |
              |     CORE      |                                 |     Core      |
+----+ ES1 +----+           +----+                +----+ ES1 +----+           +----+
| CE1|-----|    |           |    |____ES2         | CE1|-----|    |           |    |____ES2
+----+     | PE1|           | PE2|    \           +----+     | PE1|           | PE2|    \
           |    |           +----+     \+----+               |    |           +----+     \+----+
           +----+             |         | CE2|               +----+             |         | CE2|
              |             +----+     /+----+                  |             +----+     /+----+
              |             |    |____/   |                     |             |    |____/   |
              |             | PE3|   ES2 /                      |             | PE3|   ES2 /
              |             +----+      /                       |             +----+      /
              |               |        /                        |               |        /
              +-------------+----+    /                         +-------------+----+    /
                            | PE4|____/ES2                                    | PE4|____/ES2
                            |    |                                            |    |
                            +----+                                            +----+
Figure 1 Multi-homing Network of EVPN Figure 1: EVPN Multihoming
Figure 1 illustrates a case where there are two Ethernet Segments, Figure 1 illustrates a case where there are two ESes: ES1 and ES2.
ES1 and ES2. PE1 is attached to CE1 via Ethernet Segment ES1 whereas PE1 is attached to CE1 via ES1, whereas PE2, PE3, and PE4 are
PE2, PE3 and PE4 are attached to CE2 via ES2 i.e. PE2, PE3 and PE4 attached to CE2 via ES2, i.e., PE2, PE3, and PE4 form a redundancy
form a redundancy group. Since CE2 is multi-homed to different PEs on group. Since CE2 is multihomed to different PEs on the same ES, it
the same Ethernet Segment, it is necessary for PE2, PE3 and PE4 to is necessary for PE2, PE3, and PE4 to agree on a DF to satisfy the
agree on a DF to satisfy the above mentioned requirements. above-mentioned requirements.
The effect of forwarding loops in a Layer-2 network is particularly The effect of forwarding loops in a Layer 2 network is particularly
severe because of the broadcast nature of Ethernet traffic and the severe because of the broadcast nature of Ethernet traffic and the
lack of a Time-To-Live (TTL). Therefore it is very important that in lack of a TTL. Therefore, it is very important that, in the case of
the case of a multi-homed CE only one of the PEs be used to send BUM a multihomed CE, only one of the PEs be used to send BUM traffic
traffic to it. to it.
One of the pre-requisites for this support is that participating PEs One of the prerequisites for this support is that participating PEs
must agree amongst themselves as to who would act as the Designated must agree amongst themselves as to who would act as the DF. This
Forwarder (DF). This needs to be achieved through a distributed needs to be achieved through a distributed algorithm in which each
algorithm in which each participating PE independently and participating PE independently and unambiguously selects one of the
unambiguously selects one of the participating PEs as the DF, and the participating PEs as the DF, and the result should be consistent and
result should be consistent and unanimous. unanimous.
The default algorithm for DF election defined by [RFC7432] at the The default algorithm for DF election defined by [RFC7432] at the
granularity of (ESI,EVI) is referred to as "service carving". In this granularity of (ESI, EVI) is referred to as "service carving". In
document, service carving and Default DF Election algorithm are used this document, service carving and the default DF election algorithm
interchangeably. With service carving, it is possible to elect are used interchangeably. With service carving, it is possible to
multiple DFs per Ethernet Segment (one per EVI) in order to perform elect multiple DFs per ES (one per EVI) in order to perform load
load-balancing of traffic destined to a given Segment. The objective balancing of traffic destined to a given ES. The objective is that
is that the load-balancing procedures should carve up the BD space the load-balancing procedures should carve up the BD space among the
among the redundant PE nodes evenly, in such a way that every PE is redundant PE nodes evenly, in such a way that every PE is the DF for
the DF for a distinct set of EVIs. a distinct set of EVIs.
The DF Election algorithm as described in [RFC7432] (Section 8.5) is The DF election algorithm (as described in [RFC7432], Section 8.5) is
based on a modulus operation. The PEs to which the ES (for which DF based on a modulus operation. The PEs to which the ES (for which DF
election is to be carried out per EVI) is multi-homed form an ordered election is to be carried out per EVI) is multihomed form an ordered
(ordinal) list in ascending order of the PE IP address values. For (ordinal) list in ascending order by PE IP address value. For
example, there are N PEs: PE0, PE1,... PEN-1 ranked as per increasing example, there are N PEs: PE0, PE1,... PE(N-1) ranked as per
IP addresses in the ordinal list; then for each VLAN with Ethernet increasing IP addresses in the ordinal list; then, for each VLAN with
Tag V, configured on the Ethernet Segment ES1, PEx is the DF for VLAN Ethernet Tag V, configured on ES1, PEx is the DF for VLAN V on ES1
V on ES1 when x equals (V mod N). In the case of VLAN Bundle only the when x equals (V mod N). In the case of a VLAN Bundle, only the
lowest VLAN is used. In the case when the planned density is high lowest VLAN is used. In the case when the planned density is high
(meaning there are significant number of VLANs and the Ethernet Tags (meaning there are a significant number of VLANs and the Ethernet
are uniformly distributed), the thinking is that the DF Election will Tags are uniformly distributed), the thinking is that the DF election
be spread across the PEs hosting that Ethernet Segment and good load- will be spread across the PEs hosting that ES and good load balancing
balancing can be achieved. can be achieved.
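The default procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not text from [RFC7432]: the function name service_carving_df and the 192.0.2.x addresses standing in for PE2, PE3, and PE4 are hypothetical, and all candidate addresses are assumed to belong to the same address family.

```python
import ipaddress


def service_carving_df(candidate_pes, ethernet_tag):
    """Pick the DF for one Ethernet Tag with the default (V mod N) rule."""
    if not candidate_pes:
        raise ValueError("candidate list is empty")
    # Build the ordinal list: ascending order by PE IP address value.
    # Sorting on the integer value assumes a single address family.
    ordered = sorted(candidate_pes, key=lambda a: int(ipaddress.ip_address(a)))
    # PEx is the DF when x equals (V mod N).
    return ordered[ethernet_tag % len(ordered)]


# Hypothetical addresses standing in for PE2, PE3, and PE4 on ES2.
es2_candidates = ["192.0.2.4", "192.0.2.2", "192.0.2.3"]
for tag in (999, 1000, 1001):
    print(tag, service_carving_df(es2_candidates, tag))
```

With these assumed addresses, Ethernet Tags 999, 1000, and 1001 select ordinal positions 0, 1, and 2, i.e., 192.0.2.2, 192.0.2.3, and 192.0.2.4, respectively. Every PE that holds the same candidate list computes the same result independently, which is the property the distributed election relies on.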
However, the described Default DF Election algorithm has some However, the described default DF election algorithm has some
undesirable properties and in some cases can be somewhat disruptive undesirable properties and, in some cases, can be somewhat disruptive
and unfair. This document describes some of those issues and defines and unfair. This document describes some of those issues and defines
a mechanism for dealing with them. These mechanisms do involve a mechanism for dealing with them. These mechanisms do involve
changes to the Default DF Election algorithm, but they do not require changes to the default DF election algorithm, but they do not require
any changes to the EVPN Route exchange and have minimal changes in any changes to the EVPN route exchange, and changes in the EVPN
the EVPN routes. routes will be minimal.
In addition, there is a need to extend the DF Election procedures so In addition, there is a need to extend the DF election procedures so
that new algorithms and capabilities are possible. A single algorithm that new algorithms and capabilities are possible. A single
(the Default DF Election algorithm) may not meet the requirements in algorithm (the default DF election algorithm) may not meet the
all the use-cases. requirements in all the use cases.
Note that while [RFC7432] elects a DF per <ES, EVI>, this document Note that while [RFC7432] elects a DF per <ES, EVI>, this document
elects a DF per <ES, BD>. This means that unlike [RFC7432], where for elects a DF per <ES, BD>. This means that unlike [RFC7432], where
a VLAN-Aware Bundle service EVI there is only one DF for the EVI, for a VLAN-aware Bundle service EVI there is only one DF for the EVI,
this document specifies that there will be multiple DFs, one for each this document specifies that there will be multiple DFs, one for each
BD configured in that EVI. BD configured in that EVI.
1.2. Problem Statement 1.3. Problem Statement
This section describes some potential issues with the Default DF This section describes some potential issues with the default DF
Election algorithm. election algorithm.
1.2.1. Unfair Load-Balancing and Service Disruption 1.3.1. Unfair Load Balancing and Service Disruption
There are three fundamental problems with the current Default DF There are three fundamental problems with the current default DF
Election algorithm. election algorithm.
1- First, the algorithm will not perform well when the Ethernet Tag 1. The algorithm will not perform well when the Ethernet Tag follows
follows a non-uniform distribution, for instance when the Ethernet a non-uniform distribution -- for instance, when the Ethernet
Tags are all even or all odd. In such a case let us assume that Tags are all even or all odd. In such a case, let us assume that
the ES is multi-homed to two PEs; one of the PEs will be elected the ES is multihomed to two PEs; one of the PEs will be elected
as DF for all of the VLANs. This is very sub-optimal. It defeats as the DF for all of the VLANs. This is very suboptimal. It
the purpose of service carving as the DFs are not really evenly defeats the purpose of service carving, as the DFs are not really
spread across. In fact, in this particular case, one of the PEs evenly spread across the PEs hosting the ES. In fact, in this
does not get elected as DF at all, so it does not participate in particular case, one of the PEs does not get elected as the DF at
the DF responsibilities at all. Consider another example where, all, so it does not participate in DF responsibilities at all.
referring to Figure 1, lets assume that PE2, PE3, PE4 are in Consider another example where, referring to Figure 1, let's
ascending order of the IP address; and each VLAN configured on ES2 assume that (1) PE2, PE3, and PE4 are listed in ascending order
is associated with an Ethernet Tag of the form (3x+1), where x is by IP address and (2) each VLAN configured on ES2 is associated
an integer. This will result in PE3 always be selected as the DF. with an Ethernet Tag of the form (3x+1), where x is an integer.
This will result in PE3 always being selected as the DF.
2- The Ethernet tag that identifies the BD can be as large as 2^24; 2. The Ethernet Tag that identifies the BD can be as large as 2^24;
however, it is not guaranteed that the tenant BD on the ES will however, it is not guaranteed that the tenant BD on the ES will
conform to a uniform distribution. In fact, it is up to the conform to a uniform distribution. In fact, it is up to the
customer what BDs they will configure on the ES. Quoting [Knuth], customer what BDs they will configure on the ES. Quoting
"In general, we want to avoid values of M that divide r^k+a or [Knuth]:
r^k-a, where k and a are small numbers and r is the radix of the
alphabetic character set (usually r=64, 256 or 100), since a
remainder modulo such a value of M tends to be largely a simple
superposition of key digits. Such considerations suggest that we
choose M to be a prime number such that r^k != a (modulo M) or
r^k != -a (modulo M) for small k & a."
In our case, N is the number of PEs in [RFC7432] which corresponds In general, we want to avoid values of M that divide r^k+a or
to M above. Since N, N-1 or N+1 need not satisfy the primality r^k-a, where k and a are small numbers and r is the radix of
properties of the M above; as per the [RFC7432] modulo based DF the alphabetic character set (usually r=64, 256 or 100), since
assignment, whenever a PE goes down or a new PE boots up (hosting a remainder modulo such a value of M tends to be largely a
the same Ethernet Segment), the modulo scheme will not necessarily simple superposition of key digits. Such considerations
map BDs to PEs uniformly. suggest that we choose M to be a prime number such that
r^k != a (modulo M) or r^k != -a (modulo M) for small k & a.
3- The third problem is one of disruption. Consider a case when the In our case, N is the number of PEs (Section 8.5 of [RFC7432]).
same Ethernet Segment is multi-homed to a set of PEs. When the ES N corresponds to M above. Since N, N-1, or N+1 need not satisfy
is down in one of the PEs, say PE1, or PE1 itself reboots, or the the primality properties of M, as per the modulo-based DF
BGP process goes down or the connectivity between PE1 and an RR assignment [RFC7432], whenever a PE goes down or a new PE boots
goes down, the effective number of PEs in the system now becomes up (attached to the same ES), the modulo scheme will not
N-1, and DFs are computed for all the VLANs that are configured on necessarily map BDs to PEs uniformly.
that Ethernet Segment. In general, if the DF for a VLAN v happens
not to be PE1, but some other PE, say PE2, it is likely that some
other PE (different from PE1 and PE2) will become the new DF. This
is not desirable. Similarly when a new PE hosts the same Ethernet
Segment, the mapping again changes because of the modulus
operation. This results in needless churn. Again referring to
Figure 1, say v1, v2 and v3 are VLANs configured on ES2 with
associated Ethernet Tags of value 999, 1000 and 1001 respectively.
So PE2, PE3 and PE4 are the DFs for v1, v2 and v3 respectively.
Now when PE4 goes down, PE3 will become the DF for v1 and PE2 will
become the DF for v2.
One point to note is that the Default DF election algorithm assumes 3. Disruption is another problem. Consider a case when the same ES
that all the PEs who are multi-homed to the same Ethernet Segment is multihomed to a set of PEs. When the ES is DOWN in one of the
(and interested in the DF Election by exchanging EVPN routes) use an PEs, say PE1, or PE1 itself reboots, or the BGP process goes down
Originating Router's IP Address of the same family. This does not or the connectivity between PE1 and an RR goes down, the
need to be the case as the EVPN address-family can be carried over an effective number of PEs in the system now becomes N-1, and DFs
are computed for all the VLANs that are configured on that ES.
In general, if the DF for a VLAN V happens not to be PE1, but
some other PE, say PE2, it is likely that some other PE
(different from PE1 and PE2) will become the new DF. This is not
desirable. Similarly, when a new PE hosts the same ES, the
mapping again changes because of the modulus operation. This
results in needless churn. Again referring to Figure 1, say V1,
V2, and V3 are VLANs configured on ES2 with associated Ethernet
Tags of values 999, 1000, and 1001, respectively. So, PE2, PE3,
and PE4 are the DFs for V1, V2, and V3, respectively. Now when
PE4 goes down, PE3 will become the DF for V1 and PE2 will become
the DF for V2.
One point to note is that the default DF election algorithm assumes
that all the PEs who are multihomed to the same ES (and interested in
the DF election by exchanging EVPN routes) use an Originating
Router's IP address [RFC7432] of the same family. This does not need
to be the case, as the EVPN address family can be carried over an
IPv4 or IPv6 peering, and the PEs attached to the same ES may use an IPv4 or IPv6 peering, and the PEs attached to the same ES may use an
address of either family. address of either family.
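The unfairness described in the first problem can be reproduced with a short Python sketch. It is an illustration only: the helper name df_share and the sample tag ranges are assumptions chosen for the example, and the returned list is indexed by ordinal position in the candidate list rather than by PE name.

```python
from collections import Counter


def df_share(ethernet_tags, num_pes):
    """Count how many tags each ordinal position wins under (V mod N)."""
    wins = Counter(tag % num_pes for tag in ethernet_tags)
    return [wins.get(position, 0) for position in range(num_pes)]


# 100 even Ethernet Tags on an ES multihomed to two PEs.
even_tags = list(range(2, 202, 2))
print(df_share(even_tags, 2))

# 100 Ethernet Tags of the form (3x+1) on an ES multihomed to three PEs.
tags_3x_plus_1 = [3 * x + 1 for x in range(100)]
print(df_share(tags_3x_plus_1, 3))
```

The first call prints [100, 0]: one of the two PEs is the DF for every VLAN. The second prints [0, 100, 0]: only ordinal position 1 is ever elected, which corresponds to PE3 when PE2, PE3, and PE4 are in ascending order by IP address.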
Mathematically, a conventional hash function maps a key k to a number Mathematically, a conventional hash function maps a key k to a number
i representing one of m hash buckets through a function h(k) i.e. i representing one of m hash buckets through a function h(k), i.e.,
i=h(k). In the EVPN case, h is simply a modulo-m hash function viz. i = h(k). In the EVPN case, h is simply a modulo-m hash function
h(v) = v mod N, where N is the number of PEs that are multi-homed to viz. h(V) = V mod N, where N is the number of PEs that are multihomed
the Ethernet Segment in discussion. It is well-known that for good to the ES in question. It is well known that for good hash
hash distribution using the modulus operation, the modulus N should distribution using the modulus operation, the modulus N should be a
be a prime-number not too close to a power of 2 [CLRS2009]. When the prime number not too close to a power of 2 [CLRS2009]. When the
effective number of PEs changes from N to N-1 (or vice versa); all effective number of PEs changes from N to N-1 (or vice versa), all
the objects (VLAN V) will be remapped except those for which V mod N the objects (VLAN V) will be remapped except those for which V mod N
and V mod (N-1) refer to the same PE in the previous and subsequent and V mod (N-1) refer to the same PE in the previous and subsequent
ordinal rankings respectively. From a forwarding perspective, this is ordinal rankings, respectively. From a forwarding perspective, this
a churn, as it results in re-programming the PE ports as either is a churn, as it results in reprogramming the PE ports as either
blocking or non-blocking at the PEs where the DF state changes. blocking or non-blocking at the PEs where the DF state changes.
This document addresses this problem and furnishes a solution to this This document addresses this problem and furnishes a solution to this
undesirable behavior. undesirable behavior.
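The remapping effect described above can be illustrated with a short sketch. It assumes four hypothetical PE addresses from the documentation range and VLANs 1 through 1000; the function names are illustrative only and do not come from the specification. It counts the VLANs whose DF changes when one PE leaves the ES, even though their previous DF is still present.

```python
import ipaddress


def default_df(pes, vlan):
    """Modulus-based election sketch: order the candidate PE addresses
    numerically and pick the PE whose ordinal equals V mod N."""
    ordered = sorted(pes, key=lambda a: int(ipaddress.ip_address(a)))
    return ordered[vlan % len(ordered)]


def unnecessary_moves(pes, failed_pe, vlans):
    """Count VLANs whose DF changes although their DF was not the failed PE."""
    survivors = [p for p in pes if p != failed_pe]
    moved = 0
    for v in vlans:
        before = default_df(pes, v)
        after = default_df(survivors, v)
        if before != failed_pe and before != after:
            moved += 1
    return moved


# Hypothetical candidate list (addresses are assumptions for illustration).
pes = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]
moved = unnecessary_moves(pes, "192.0.2.4", range(1, 1001))
print(moved)  # VLANs reprogrammed even though their DF did not fail
```

In this example, roughly half of the 1000 VLANs move to a different PE although only one of the four PEs was removed, which is the churn that the text refers to.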
1.2.2. Traffic Black-Holing on Individual AC Failures 1.3.2. Traffic Black-Holing on Individual AC Failures
As discussed in section 2.1 the Default DF Election algorithm defined The default DF election algorithm defined by [RFC7432] takes into
by [RFC7432] takes into account only two variables in the modulus account only two variables in the modulus function for a given ES:
function for a given ES: the existence of the PE's IP address on the the existence of the PE's IP address in the candidate list and the
candidate list and the locally provisioned Ethernet Tags. locally provisioned Ethernet Tags.
If the DF for an <ESI, EVI> fails (due to physical link/node If the DF for an <ESI, EVI> fails (due to physical link/node
failures) an ES route withdrawal will make the Non-DF (NDF) PEs re- failures), an ES route withdrawal will make the NDF PEs re-elect the
elect the DF for that <ESI, EVI> and the service will be recovered. DF for that <ESI, EVI> and the service will be recovered.
However, the Default DF election procedure does not provide a However, the default DF election procedure does not provide
protection against "logical" failures or human errors that may occur protection against "logical" failures or human errors that may occur
at service level on the DF, while the list of active PEs for a given at the service level on the DF, while the list of active PEs for a
ES does not change. These failures may have an impact not only on the given ES does not change. These failures may have an impact not only
local PE where the issue happens, but also on the rest of the PEs of on the local PE where the issue happens but also on the rest of the
the ES. Some examples of such logical failures are listed below: PEs of the ES. Some examples of such logical failures are listed
below:
a) A given individual Attachment Circuit (AC) defined in an ES is (a) A given individual AC defined in an ES is accidentally shut down
accidentally shutdown or even not provisioned yet (hence the or is not provisioned yet (hence, the ACS is DOWN), while the ES
Attachment Circuit Status - ACS - is DOWN), while the ES is is operationally active (since the ES route is active).
operationally active (since the ES route is active).
b) A given MAC-VRF - with a defined ES - is shutdown or not (b) A given MAC-VRF with a defined ES is either shut down or not
provisioned yet, while the ES is operationally active (since the provisioned yet, while the ES is operationally active (since the
ES route is active). In this case, the ACS of all the ACs defined ES route is active). In this case, the ACS of all the ACs
in that MAC-VRF is considered to be DOWN. defined in that MAC-VRF is considered to be DOWN.
Neither (a) nor (b) will trigger the DF re-election on the remote Neither (a) nor (b) will trigger the DF re-election on the remote
multi-homed PEs for a given ES since the ACS is not taken into multihomed PEs for a given ES, since the ACS is not taken into
account in the DF election procedures. While the ACS is used as a DF account in the DF election procedures. While the ACS is used as a DF
election tie-breaker and trigger in VPLS multi-homing procedures election tiebreaker and trigger in Virtual Private LAN Service (VPLS)
[VPLS-MH], there is no procedure defined in EVPN [RFC7432] to trigger multihoming procedures [VPLS-MH], there is no procedure defined in
the DF re-election based on the ACS change on the DF. the EVPN specification [RFC7432] to trigger the DF re-election based
on the ACS change on the DF.
Figure 2 illustrates the described issue with an example. Figure 2 shows an example of logical AC failure.
+---+ +---+
|CE4| |CE4|
+---+ +---+
| |
PE4 | PE4 |
+-----+-----+ +-----+-----+
+---------------| +-----+ |---------------+ +---------------| +-----+ |---------------+
| | | BD-1| | | | | | BD-1| | |
| +-----------+ | | +-----------+ |
skipping to change at page 9, line 30 skipping to change at page 11, line 32
| | BD-1| | | | BD-1| | | | BD-1| | | | BD-1| | | | BD-1| | | | BD-1| |
| +-----+ |-------| +-----+ |-------| +-----+ | | +-----+ |-------| +-----+ |-------| +-----+ |
+-----------+ +-----------+ +-----------+ +-----------+ +-----------+ +-----------+
AC1\ ES12 /AC2 AC3\ ES23 /AC4 AC1\ ES12 /AC2 AC3\ ES23 /AC4
\ / \ / \ / \ /
\ / \ / \ / \ /
+----+ +----+ +----+ +----+
|CE12| |CE23| |CE12| |CE23|
+----+ +----+ +----+ +----+
Figure 2 Default DF Election and Traffic Black-Holing Figure 2: Default DF Election and Traffic Black-Holing
BD-1 is defined in PE1, PE2, PE3 and PE4. CE12 is a multi-homed CE BD-1 is defined in PE1, PE2, PE3, and PE4. CE12 is a multihomed CE
connected to ES12 in PE1 and PE2. Similarly CE23 is multi-homed to connected to ES12 in PE1 and PE2. Similarly, CE23 is multihomed to
PE2 and PE3 using ES23. Both, CE12 and CE23, are connected to BD-1 PE2 and PE3 using ES23. Both CE12 and CE23 are connected to BD-1
through VLAN-based service interfaces: CE12-VID 1 (VLAN ID 1 on CE12) through VLAN-based service interfaces: CE12-VID 1 (VID 1 on CE12) is
is associated to AC1 and AC2 in BD-1, whereas CE23-VID 1 is associated with AC1 and AC2 in BD-1, whereas CE23-VID 1 is associated
associated to AC3 and AC4 in BD-1. Assume that, although not with AC3 and AC4 in BD-1. Assume that, although not represented,
represented, there are other ACs defined on these ES mapped to there are other ACs defined on these ESes mapped to different BDs.
different BDs.
After executing the [RFC7432] Default DF election algorithm, PE2 After executing the default DF election algorithm as described in
turns out to be the DF for ES12 and ES23 in BD-1. The following [RFC7432], PE2 turns out to be the DF for ES12 and ES23 in BD-1. The
issues may arise: following issues may arise:
a) If AC2 is accidentally shutdown or even not configured, CE12 (a) If AC2 is accidentally shut down or is not configured yet, CE12
traffic will be impacted. In case of all-active multi-homing, the traffic will be impacted. In the case of All-Active
BUM traffic to CE12 will be "black-holed", whereas for single- multihoming, the BUM traffic to CE12 will be "black-holed",
active multi-homing, all the traffic to/from CE12 will be whereas for Single-Active multihoming, all the traffic to/from
discarded. This is due to the fact that a logical failure in PE2's CE12 will be discarded. This is because a logical failure in
AC2 may not trigger an ES route withdrawn for ES12 (since there PE2's AC2 may not trigger an ES route withdrawal for ES12 (since
are still other ACs active on ES12) and therefore PE1 will not re- there are still other ACs active on ES12); therefore, PE1 will
run the DF election procedures. not rerun the DF election procedures.
b) If the Bridge Table for BD-1 is administratively shutdown or even (b) If the bridge table for BD-1 is administratively shut down or is
not configured yet on PE2, CE12 and CE23 will both be impacted: not configured yet on PE2, CE12 and CE23 will both be impacted:
BUM traffic to both CEs will be discarded in case of all-active BUM traffic to both CEs will be discarded in the case of
multi-homing and all traffic will be discarded to/from the CEs in All-Active multihoming, and all traffic will be discarded
case of single-active multi-homing. This is due to the fact that to/from the CEs in the case of Single-Active multihoming. This
PE1 and PE3 will not re-run the DF election procedures and will is because PE1 and PE3 will not rerun the DF election procedures
keep assuming PE2 is the DF. and will keep assuming that PE2 is the DF.
Quoting [RFC7432], "when an Ethernet Tag is decommissioned on an Quoting [RFC7432], "When an Ethernet tag is decommissioned on an
Ethernet Segment, then the PE MUST withdraw the Ethernet A-D per EVI Ethernet segment, then the PE MUST withdraw the Ethernet A-D per EVI
route(s) announced for the <ESI, Ethernet Tags> that are impacted by route(s) announced for the <ESI, Ethernet tags> that are impacted by
the decommissioning", however, while this A-D per EVI route the decommissioning." However, while this A-D per EVI route
withdrawal is used at the remote PEs performing aliasing or backup withdrawal is used at the remote PEs performing aliasing or backup
procedures, it is not used to influence the DF election for the procedures, it is not used to influence the DF election for the
affected EVIs. affected EVIs.
This document adds an optional modification of the DF Election This document adds an optional modification of the DF election
procedure so that the ACS may be taken into account as a variable in procedure so that the ACS may be taken into account as a variable in
the DF election, and therefore EVPN can provide protection against the DF election; therefore, EVPN can provide protection against
logical failures. logical failures.
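The difference between ignoring and considering the ACS can be sketched as follows. This is only an illustration under stated assumptions: the PE addresses are hypothetical, and pruning PEs whose AC is DOWN from the candidate list is a simplification chosen for the example; the actual AC-DF procedures are specified in a later section that is not reproduced here.

```python
import ipaddress

# Hypothetical addresses for the two PEs attached to ES12.
PES = {"PE1": "192.0.2.1", "PE2": "192.0.2.2"}


def elect(candidates, ethernet_tag):
    """Modulus-based election over the candidate list (the ACS is ignored)."""
    ordered = sorted(candidates, key=lambda pe: int(ipaddress.ip_address(PES[pe])))
    return ordered[ethernet_tag % len(ordered)]


def elect_with_acs(candidates, ethernet_tag, acs_up):
    """Same election, but PEs whose AC is DOWN are removed first (assumption)."""
    eligible = [pe for pe in candidates if acs_up.get(pe, False)]
    if not eligible:
        return None
    return elect(eligible, ethernet_tag)


# AC2 on PE2 is shut down while the ES route of PE2 stays active.
acs = {"PE1": True, "PE2": False}
default_result = elect(["PE1", "PE2"], 1)
acs_result = elect_with_acs(["PE1", "PE2"], 1, acs)
print(default_result, acs_result)
```

With the ACS ignored, PE2 remains the DF for Ethernet Tag 1 and BUM traffic to CE12 is black-holed; once the ACS is taken into account, PE1 takes over.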
1.3. The Need for Extending the Default DF Election in EVPN 1.4. The Need for Extending the Default DF Election in EVPN Services
Section 1.2 describes some of the issues that exist in the Default DF Section 1.3 describes some of the issues that exist in the default DF
Election procedures. In order to address those issues, this document election procedures. In order to address those issues, this document
introduces a new DF Election framework. This framework allows the PEs introduces a new DF election framework. This framework allows the
to agree on a common DF election algorithm, as well as the PEs to agree on a common DF election algorithm, as well as the
capabilities to enable during the DF Election procedure. Generally, capabilities to enable during the DF election procedure. Generally,
'DF election algorithm' refers to the algorithm by which a number of "DF election algorithm" refers to the algorithm by which a number of
input parameters are used to determine the DF PE, while 'DF election input parameters are used to determine the DF PE, while "DF election
capability' refers to an additional feature that can be used prior to capability" refers to an additional feature that can be used prior to
the invocation of the DF election algorithm, such as modifying the the invocation of the DF election algorithm, such as modifying the
inputs (or list of candidate PEs). inputs (or list of candidate PEs).
Within this framework, this document defines a new DF Election Within this framework, this document defines a new DF election
algorithm and a new capability that can influence the DF Election algorithm and a new capability that can influence the DF election
result: result:
o The new DF Election algorithm is referred to as "Highest Random o The new DF election algorithm is referred to as "Highest Random
Weight" (HRW). The HRW procedures are described in section 4. Weight" (HRW). The HRW procedures are described in Section 3.
o The new DF Election capability is referred to as "AC-Influenced DF o The new DF election capability is referred to as "AC-Influenced DF
Election" (AC-DF). The AC-DF procedures are described in section 5. election" (AC-DF). The AC-DF procedures are described in
Section 4.
o HRW and AC-DF mechanisms are independent of each other. Therefore, o HRW and AC-DF mechanisms are independent of each other.
a PE may support either HRW or AC-DF independently or may support Therefore, a PE may support either HRW or AC-DF independently or
both of them together. A PE may also support AC-DF capability along may support both of them together. A PE may also support the
with the Default DF election algorithm per [RFC7432]. AC-DF capability along with the default DF election algorithm per
[RFC7432].
In addition, this document defines a way to indicate the support of In addition, this document defines a way to indicate the support of
HRW and/or AC-DF along with the EVPN ES routes advertised for a given HRW and/or AC-DF along with the EVPN ES routes advertised for a given
ES. Refer to section 3.2 for more details. ES. Refer to Section 2.2 for more details.
2. Conventions and Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP
14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
o AC and ACS - Attachment Circuit and Attachment Circuit Status. An
AC has an Ethernet Tag associated to it.
o BUM - refers to the Broadcast, Unknown unicast and Multicast
traffic.
o DF, NDF and BDF - Designated Forwarder, Non-Designated Forwarder
and Backup Designated Forwarder
o Ethernet A-D per ES route - refers to [RFC7432] route type 1 or
Auto-Discovery per Ethernet Segment route.
o Ethernet A-D per EVI route - refers to [RFC7432] route type 1 or
Auto-Discovery per EVPN Instance route.
o ES and ESI - Ethernet Segment and Ethernet Segment Identifier.
o EVI - EVPN Instance.
o MAC-VRF - A Virtual Routing and Forwarding table for Media Access
Control (MAC) addresses on a PE.
o BD - Broadcast Domain. An EVI may be comprised of one (VLAN-Based
or VLAN Bundle services) or multiple (VLAN-Aware Bundle services)
Broadcast Domains.
o Bridge Table - An instantiation of a broadcast domain on a MAC-VRF.
o HRW - Highest Random Weight
o VID and CE-VID - VLAN Identifier and Customer Equipment VLAN
Identifier.
o Ethernet Tag - used to represent a Broadcast Domain that is
configured on a given ES for the purpose of DF election. Note that
any of the following may be used to represent a Broadcast Domain:
VIDs (including Q-in-Q tags), configured IDs, VNI (VXLAN Network
Identifiers), normalized VID, I-SIDs (Service Instance
Identifiers), etc., as long as the representation of the broadcast
domains is configured consistently across the multi-homed PEs
attached to that ES. The Ethernet Tag value MUST be different from
zero.
o Ethernet Tag ID - refers to the identifier used in the EVPN routes
defined in [RFC7432]. Its value may be the same as the Ethernet Tag
value (see Ethernet Tag definition) when advertising routes for
VLAN-aware Bundle services. Note that in case of VLAN-based or VLAN
Bundle services, the Ethernet Tag ID is zero.
o DF Election Procedure and DF Algorithm - The Designated Forwarder
Election Procedure or simply DF Election, refers to the process in
its entirety, including the discovery of the PEs in the ES, the
creation and maintenance of the PE candidate list and the selection
of a PE. The Designated Forwarder Algorithm is just a component of
the DF Election Procedure and strictly refers to the selection of a
PE for a given <ES,Ethernet Tag>.
o TTL - Time To Live
This document also assumes familiarity with the terminology of
[RFC7432].
3. Designated Forwarder Election Protocol and BGP Extensions 2. Designated Forwarder Election Protocol and BGP Extensions
This section describes the BGP extensions required to support the new This section describes the BGP extensions required to support the new
DF Election procedures. In addition, since the EVPN specification DF election procedures. In addition, since the EVPN specification
[RFC7432] does leave several questions open as to the precise final [RFC7432] leaves several questions open as to the precise FSM
state machine behavior of the DF election, section 3.1 describes behavior of the DF election, Section 2.1 precisely describes the
precisely the intended behavior. intended behavior.
3.1. The DF Election Finite State Machine (FSM) 2.1. The DF Election Finite State Machine (FSM)
Per [RFC7432], the FSM described in Figure 3 is executed per Per [RFC7432], the FSM shown in Figure 3 is executed per <ES, VLAN>
<ESI,VLAN> in case of VLAN-based service or <ESI,[VLANs in VLAN in the case of VLAN-based service or <ES, [VLANs in VLAN Bundle]> in
Bundle]> in case of VLAN Bundle on each participating PE. the case of a VLAN Bundle on each participating PE. Note that the
FSM is conceptual. Any design or implementation MUST comply with
behavior that is equivalent to the behavior outlined in this FSM.
Observe that currently the VLANs are derived from local configuration VLAN_CHANGE VLAN_CHANGE
and the FSM does not provide any protection against misconfiguration RCVD_ES RCVD_ES
where the same (EVI,ESI) combination has different set of VLANs on LOST_ES LOST_ES
different participating PEs or one of the PEs elects to consider +----+ +-------+
VLANs as VLAN Bundle and another as separate VLANs for election | | | v
purposes (service type mismatch). | +-+----+ ES_UP ++-------++
+->+ INIT +-------------->+ DF_WAIT |
++-----+ +-------+-+
^ |
+-----------+ | |DF_TIMER
| ANY_STATE +-------+ VLAN_CHANGE |
+-----------+ ES_DOWN +-----------------+ |
| RCVD_ES v v
+--------++ LOST_ES ++------+-+
| DF_DONE +<--------------+ DF_CALC +<-+
+---------+ CALCULATED +-------+-+ |
| |
+----+
VLAN_CHANGE
RCVD_ES
LOST_ES
The FSM is conceptual and any design or implementation MUST comply Figure 3: DF Election Finite State Machine
with a behavior equivalent to the one outlined in this FSM.
VLAN_CHANGE Observe that each EVI is locally configured on each of the multihomed
VLAN_CHANGE RCVD_ES PEs attached to a given ES and that the FSM does not provide any
RCVD_ES LOST_ES protection against inconsistent configuration between these PEs.
LOST_ES +----+ That is, for a given EVI, one or more of the PEs are inadvertently
+----+ | v configured with a different set of VLANs for a VLAN-aware Bundle
| | ++----++ service or with different VLANs for a VLAN-based service.
| +-+----+ ES_UP | DF |
+->+ INIT +---------------> WAIT |
++-----+ +----+-+
^ |
+-----------+ | |DF_TIMER
| ANY STATE +-------+ VLAN_CHANGE |
+-----------+ ES_DOWN +-----------------+ |
| RCVD_ES v v
+-----++ LOST_ES ++---+-+
| DF | | DF |
| DONE +<--------------+ CALC +<-+
+------+ CALCULATED +----+-+ |
| |
+----+
VLAN_CHANGE
RCVD_ES
LOST_ES
Figure 3 DF Election Finite State Machine The states and events shown in Figure 3 are defined as follows.
States: States:
1. INIT: Initial State 1. INIT: Initial state.
2. DF_WAIT: State in which the participant waits for enough 2. DF_WAIT: State in which the participant waits for enough
information to perform the DF election for the EVI/ESI/VLAN information to perform the DF election for the EVI/ESI/VLAN
combination. combination.
3. DF_CALC: State in which the new DF is recomputed. 3. DF_CALC: State in which the new DF is recomputed.
4. DF_DONE: State in which the according DF for the EVI/ESI/VLAN 4. DF_DONE: State in which the corresponding DF for the EVI/ESI/VLAN
combination has been elected. combination has been elected.
5. ANY_STATE: Refers to any of the above states. 5. ANY_STATE: Refers to any of the above states.
Events: Events:
1. ES_UP: The ESI has been locally configured as 'up'. 1. ES_UP: The ES has been locally configured as "UP".
2. ES_DOWN: The ESI has been locally configured as 'down'. 2. ES_DOWN: The ES has been locally configured as "DOWN".
3. VLAN_CHANGE: The VLANs configured in a bundle (that uses the ESI) 3. VLAN_CHANGE: The VLANs configured in a bundle (that uses the ES)
changed. This event is necessary for VLAN Bundles only. changed. This event is necessary for VLAN Bundles only.
4. DF_TIMER: DF Wait timer [RFC7432] has expired. 4. DF_TIMER: DF timer [RFC7432] (referred to as "Wait timer" in this
document) has expired.
5. RCVD_ES: A new or changed Ethernet Segment route is received in a 5. RCVD_ES: A new or changed ES route is received in an Update
BGP REACH UPDATE. Receiving an unchanged UPDATE MUST NOT trigger message with an MP_REACH_NLRI. Receiving an unchanged Update
this event. MUST NOT trigger this event.
6. LOST_ES: A BGP UNREACH UPDATE for a previously received Ethernet 6. LOST_ES: An Update message with an MP_UNREACH_NLRI for a
Segment route has been received. If an UNREACH is seen for a previously received ES route has been received. If such a
route that has not been advertised previously, the event MUST NOT message is seen for a route that has not been advertised
be triggered. previously, the event MUST NOT be triggered.
7. CALCULATED: DF has been successfully calculated. 7. CALCULATED: DF has been successfully calculated.
According actions when transitions are performed or states Corresponding actions when transitions are performed or states are
entered/exited: entered/exited:
1. ANY_STATE on ES_DOWN: (i) stop DF wait timer (ii) assume NDF for 1. ANY_STATE on ES_DOWN:
local PE. (i) Stop the DF Wait timer.
(ii) Assume an NDF for the local PE.
2. INIT on ES_UP: transition to DF_WAIT. 2. INIT on ES_UP: Transition to DF_WAIT.
3. INIT on VLAN_CHANGE, RCVD_ES or LOST_ES: do nothing. 3. INIT on VLAN_CHANGE, RCVD_ES, or LOST_ES: Do nothing.
4. DF_WAIT on entering the state: (i) start DF wait timer if not 4. DF_WAIT on entering the state:
started already or expired (ii) assume NDF for local PE. (i) Start the DF Wait timer if not started already or expired.
(ii) Assume an NDF for the local PE.
5. DF_WAIT on VLAN_CHANGE, RCVD_ES or LOST_ES: do nothing. 5. DF_WAIT on VLAN_CHANGE, RCVD_ES, or LOST_ES: Do nothing.
6. DF_WAIT on DF_TIMER: transition to DF_CALC. 6. DF_WAIT on DF_TIMER: Transition to DF_CALC.
7. DF_CALC on entering or re-entering the state: (i) rebuild 7. DF_CALC on entering or re-entering the state:
candidate list, hash and perform election (ii) Afterwards FSM (i) Rebuild the candidate list, perform a hash, and perform the
generates CALCULATED event against itself. election.
(ii) Afterwards, the FSM generates a CALCULATED event against
itself.
8. DF_CALC on VLAN_CHANGE, RCVD_ES or LOST_ES: do as in transition 8. DF_CALC on VLAN_CHANGE, RCVD_ES, or LOST_ES: Do as prescribed in
7. Transition 7.
9. DF_CALC on CALCULATED: mark election result for VLAN or bundle, 9. DF_CALC on CALCULATED: Mark the election result for the VLAN or
and transition to DF_DONE. bundle, and transition to DF_DONE.
11. DF_DONE on exiting the state: if there is a new DF election 10. DF_DONE on exiting the state: If a new DF election is triggered
triggered and the current DF is lost, then assume NDF for local and the current DF is lost, then assume an NDF for the local PE
PE for VLAN or VLAN Bundle. for the VLAN or VLAN Bundle.
12. DF_DONE on VLAN_CHANGE, RCVD_ES or LOST_ES: transition to 11. DF_DONE on VLAN_CHANGE, RCVD_ES, or LOST_ES: Transition to
DF_CALC. DF_CALC.
The above events and transitions are defined for the Default DF The above events and transitions are defined for the default DF
Election Algorithm. As described in Section 5, the use of the AC-DF election algorithm. As described in Section 4, the use of the AC-DF
capability introduces additional events and transitions. capability introduces additional events and transitions.
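The states, events, and transitions listed above can be sketched as a small state machine. This is a minimal, synchronous illustration, not a definitive implementation. The class name and the "elect" and "current_df_lost" callbacks are hypothetical. The return to INIT on ES_DOWN is taken from the arrow in Figure 3. The Wait timer is modeled as a flag only.

```python
from enum import Enum, auto


class State(Enum):
    INIT = auto()
    DF_WAIT = auto()
    DF_CALC = auto()
    DF_DONE = auto()


class Event(Enum):
    ES_UP = auto()
    ES_DOWN = auto()
    VLAN_CHANGE = auto()
    DF_TIMER = auto()
    RCVD_ES = auto()
    LOST_ES = auto()
    CALCULATED = auto()


CHANGES = {Event.VLAN_CHANGE, Event.RCVD_ES, Event.LOST_ES}


class DfElectionFsm:
    def __init__(self, elect, current_df_lost=lambda: False):
        self.elect = elect                      # returns True if the local PE wins
        self.current_df_lost = current_df_lost  # hypothetical "DF is lost" check
        self.state = State.INIT
        self.local_is_df = False
        self.timer_running = False
        self._pending = False

    def handle(self, event):
        if event is Event.ES_DOWN:              # ANY_STATE on ES_DOWN
            self.timer_running = False
            self.local_is_df = False
            self.state = State.INIT
        elif self.state is State.INIT and event is Event.ES_UP:
            self.state = State.DF_WAIT          # entering DF_WAIT
            self.timer_running = True
            self.local_is_df = False
        elif self.state is State.DF_WAIT and event is Event.DF_TIMER:
            self.timer_running = False
            self._enter_df_calc()
        elif self.state is State.DF_CALC and event in CHANGES:
            self._enter_df_calc()               # re-enter DF_CALC
        elif self.state is State.DF_CALC and event is Event.CALCULATED:
            self.local_is_df = self._pending    # mark the election result
            self.state = State.DF_DONE
        elif self.state is State.DF_DONE and event in CHANGES:
            if self.current_df_lost():          # exiting DF_DONE
                self.local_is_df = False
            self._enter_df_calc()
        # Every other (state, event) pair: do nothing.

    def _enter_df_calc(self):
        self.state = State.DF_CALC
        self._pending = self.elect()            # rebuild list and elect
        self.handle(Event.CALCULATED)           # event generated against itself


fsm = DfElectionFsm(elect=lambda: True)
for ev in (Event.RCVD_ES, Event.ES_UP, Event.RCVD_ES, Event.DF_TIMER):
    fsm.handle(ev)
print(fsm.state, fsm.local_is_df)
```

Because CALCULATED is raised synchronously in this sketch, DF_CALC is never observed from outside; an implementation that computes the election asynchronously would also exercise the DF_CALC transitions on VLAN_CHANGE, RCVD_ES, and LOST_ES.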
3.2. The DF Election Extended Community 2.2. The DF Election Extended Community
For the DF election procedures to be consistent and unanimous, it is For the DF election procedures to be consistent and unanimous, it is
necessary that all the participating PEs agree on the DF Election necessary that all the participating PEs agree on the DF election
algorithm and capabilities to be used. For instance, it is not algorithm and capabilities to be used. For instance, it is not
possible that some PEs continue to use the Default DF Election possible for some PEs to continue to use the default DF election
algorithm and some PEs use HRW. For brown-field deployments and for algorithm while some PEs use HRW. For brownfield deployments and for
interoperability with legacy PEs, it is important that all PEs need interoperability with legacy PEs, it is important that all PEs have
to have the capability to fall back on the Default DF Election. A PE the ability to fall back on the default DF election. A PE can
can indicate its willingness to support HRW and/or AC-DF by signaling indicate its willingness to support HRW and/or AC-DF by signaling a
a DF Election Extended Community along with the Ethernet Segment DF Election Extended Community along with the ES route (Route
route (Type-4). Type 4).
The DF Election Extended Community is a new BGP transitive extended The DF Election Extended Community is a new BGP transitive Extended
community attribute [RFC4360] that is defined to identify the DF Community attribute [RFC4360] that is defined to identify the DF
election procedure to be used for the Ethernet Segment. Figure 4 election procedure to be used for the ES. Figure 4 shows the
shows the encoding of the DF Election Extended Community. encoding of the DF Election Extended Community.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type=0x06 | Sub-Type(0x06)| RSV | DF Alg | Bitmap ~ | Type = 0x06 | Sub-Type(0x06)| RSV | DF Alg | Bitmap ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~ Bitmap | Reserved | ~ Bitmap | Reserved |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 4 DF Election Extended Community Figure 4: DF Election Extended Community
Where: Where:
o Type is 0x06 as registered with IANA for EVPN Extended Communities. o Type: 0x06, as registered with IANA (Section 7) for EVPN Extended
Communities.
o Sub-Type is 0x06 - "DF Election Extended Community" as requested by o Sub-Type: 0x06. "DF Election Extended Community", as registered
this document to IANA. with IANA.
o RSV / Reserved - Reserved bits for DF Alg specific information. o RSV/Reserved: Reserved bits for information that is specific to
DF Alg.
o DF Alg (5 bits) - Encodes the DF Election algorithm values (between o DF Alg (5 bits): Encodes the DF election algorithm values (between
0 and 31) that the advertising PE desires to use for the ES. This 0 and 31) that the advertising PE desires to use for the ES. This
document requests IANA to set up a registry called "DF Alg document creates an IANA registry called "DF Alg" (Section 7),
Registry" and solicits the following values: which contains the following values:
- Type 0: Default DF Election algorithm, or modulus-based algorithm - Type 0: Default DF election algorithm, or modulus-based
as in [RFC7432]. algorithm as defined in [RFC7432].
- Type 1: HRW algorithm (explained in this document). - Type 1: HRW Algorithm (Section 3).
- Types 2-30: Unassigned. - Types 2-30: Unassigned.
- Type 31: Reserved for Experimental Use. - Type 31: Reserved for Experimental Use.
o Bitmap (2 octets) - Encodes "capabilities" to use with the DF o Bitmap (2 octets): Encodes "capabilities" to use with the DF
Election algorithm in the field "DF Alg". This document requests election algorithm in the DF Alg field. This document creates an
IANA to create a registry for the Bitmap field, with values 0-15, IANA registry (Section 7) for the Bitmap field, with values 0-15.
called "DF Election Capabilities" and solicits the following This registry is called "DF Election Capabilities" and includes
values: the bit values listed below.
1 1 1 1 1 1 1 1 1 1 1 1
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |A| | | |A| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 5 Bitmap field in the DF Election Extended Community Figure 5: Bitmap Field in the DF Election Extended Community
- Bit 0 (corresponds to Bit 24 of the DF Election Extended - Bit 0 (corresponds to Bit 24 of the DF Election Extended
Community): Unassigned. Community): Unassigned.
- Bit 1: AC-DF (AC-Influenced DF Election, explained in this - Bit 1: AC-DF Capability (AC-Influenced DF election; see
document). When set to 1, it indicates the desire to use AC- Section 4). When set to 1, it indicates the desire to use
Influenced DF Election with the rest of the PEs in the ES. AC-DF with the rest of the PEs in the ES.
- Bits 2-15: Unassigned. - Bits 2-15: Unassigned.
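The field layout above can be sketched as an 8-octet encoder and decoder. This is an illustration under stated assumptions: the function and constant names are hypothetical, the 3-bit width of RSV is read off Figure 4 (the text gives only the 5-bit width of DF Alg), and Bitmap bits are numbered from the most significant bit, as suggested by Bit 0 of the Bitmap corresponding to Bit 24 of the Extended Community.

```python
import struct

EVPN_TYPE = 0x06
DF_ELECTION_SUB_TYPE = 0x06
AC_DF_BIT = 1  # Bit 1 of the Bitmap, counted from the most significant bit
AC_DF_MASK = 1 << (15 - AC_DF_BIT)


def encode_df_election_ec(df_alg, ac_df=False):
    """Build the 8 octets: Type, Sub-Type, RSV+DF Alg, Bitmap, Reserved."""
    if not 0 <= df_alg <= 31:
        raise ValueError("DF Alg must fit in 5 bits")
    bitmap = AC_DF_MASK if ac_df else 0
    # RSV (3 bits, left at zero) and DF Alg (5 bits) share the third octet.
    return struct.pack("!BBBH3s", EVPN_TYPE, DF_ELECTION_SUB_TYPE,
                       df_alg & 0x1F, bitmap, b"\x00\x00\x00")


def decode_df_election_ec(data):
    """Return (df_alg, ac_df) from an 8-octet DF Election Extended Community."""
    if len(data) != 8:
        raise ValueError("an Extended Community is 8 octets long")
    ec_type, sub_type, alg_octet, bitmap, _reserved = struct.unpack("!BBBH3s", data)
    if (ec_type, sub_type) != (EVPN_TYPE, DF_ELECTION_SUB_TYPE):
        raise ValueError("not a DF Election Extended Community")
    return alg_octet & 0x1F, bool(bitmap & AC_DF_MASK)


# HRW (DF Alg 1) with the AC-DF capability requested.
ec = encode_df_election_ec(1, ac_df=True)
print(ec.hex())
print(decode_df_election_ec(ec))
```

The decoder masks out the RSV bits rather than rejecting non-zero values, in line with the statement that for DF Alg values 0 and 1 the reserved bits should be ignored by the receiving PE.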
The DF Election Extended Community is used as follows: The DF Election Extended Community is used as follows:
o A PE SHOULD attach the DF Election Extended Community to any o A PE SHOULD attach the DF Election Extended Community to any
advertised ES route and the Extended Community MUST be sent if the advertised ES route, and the Extended Community MUST be sent if
ES is locally configured with a DF election algorithm other than the ES is locally configured with a DF election algorithm other
the Default Election algorithm or if a capability is required to be than the default DF election algorithm or if a capability is
used. In the Extended Community, the PE indicates the desired "DF required to be used. In the Extended Community, the PE indicates
Alg" algorithm and "Bitmap" capabilities to be used for the ES. the desired "DF Alg" algorithm and "Bitmap" capabilities to be
used for the ES.
- Only one DF Election Extended Community can be sent along with an
ES route. Note that the intent is not for the advertising PE to
indicate all the supported DF election algorithms and
capabilities, but signal the preferred one.
- DF Algs 0 and 1 can be both used with bit AC-DF set to 0 or 1.
- In general, a specific DF Alg SHOULD determine the use of the - Only one DF Election Extended Community can be sent along with
reserved bits in the Extended Community, which may be used in a an ES route. Note that the intent is not for the advertising
different way for a different DF Alg. In particular, for DF Algs PE to indicate all the supported DF election algorithms and
0 and 1, the reserved bits are not set by the advertising PE and capabilities but to signal the preferred one.
SHOULD be ignored by the receiving PE.
o When a PE receives the ES Routes from all the other PEs for the ES - DF Alg values 0 and 1 can both be used with Bit 1 (AC-DF) set
in question, it checks to see if all the advertisements have the to 0 or 1.
extended community with the same DF Alg and Bitmap:
- In the case that they do, this particular PE MUST follow the - In general, a specific DF Alg SHOULD determine the use of the
procedures for the advertised DF Alg and capabilities. For reserved bits in the Extended Community, which may be used in a
instance, if all ES routes for a given ES indicate DF Alg HRW and different way for a different DF Alg. In particular, for DF
AC-DF set to 1, the receiving PE and by induction all the other Alg values 0 and 1, the reserved bits are not set by the
PEs in the ES will proceed to do DF Election as per the HRW advertising PE and SHOULD be ignored by the receiving PE.
Algorithm and following the AC-DF procedures.
- Otherwise if even a single advertisement for the type-4 route is o When a PE receives the ES routes from all the other PEs for the ES
received without the locally configured DF Alg and capability, in question, it checks to see if all the advertisements have the
the Default DF Election algorithm (modulus) algorithm MUST be Extended Community with the same DF Alg and Bitmap:
used as in [RFC7432]. This procedure handles the case where
participating PEs in the ES disagree about the DF algorithm and
capability to apply.
- The absence of the DF Election Extended Community or the presence - If they do, this particular PE MUST follow the procedures for
of multiple DF Election Extended Communities (in the same route) the advertised DF Alg and capabilities. For instance, if all
MUST be interpreted by a receiving PE as an indication of the ES routes for a given ES indicate DF Alg HRW and AC-DF set
Default DF Election algorithm on the sending PE, that is, DF Alg to 1, then the PEs attached to the ES will perform the DF
0 and no DF Election capabilities. election as per the HRW algorithm and following the AC-DF
procedures.
o When all the PEs in an ES advertise DF Type 31, they will rely on - Otherwise, if even a single advertisement for Route Type 4 is
the local policy to decide how to proceed with the DF Election. received without the locally configured DF Alg and capability,
the default DF election algorithm MUST be used as prescribed in
[RFC7432]. This procedure handles the case where participating
PEs in the ES disagree about the DF algorithm and capability to
be applied.
o For any new capability defined in the future, the - The absence of the DF Election Extended Community or the
applicability/compatibility of this new capability to the existing presence of multiple DF Election Extended Communities (in the
DF Algs must be assessed on a case by case basis. same route) MUST be interpreted by a receiving PE as an
indication of the default DF election algorithm on the sending
PE -- that is, DF Alg 0 and no DF election capabilities.
o Likewise, for any new DF Alg defined in future, its o When all the PEs in an ES advertise DF Type 31, they will rely on
applicability/compatibility to the existing capabilities must be the local policy to decide how to proceed with the DF election.
assessed on a case by case basis.
3.2.1. Backward Compatibility o For any new capability defined in the future, the applicability/
compatibility of this new capability to/with the existing DF Alg
values must be assessed on a case-by-case basis.
[RFC7432] implementations (i.e., those that predate this o Likewise, for any new DF Alg defined in the future, its
specification) will not advertise the DF Election Extended Community. applicability/compatibility to/with the existing capabilities must
That means that all other participating PEs in the ES will not be assessed on a case-by-case basis.
receive DF preferences and will revert to the Default DF Election
algorithm without AC-Influenced DF Election.
Similarly, a [RFC7432] implementation receiving a DF Election 2.2.1. Backward Compatibility
Extended Community will ignore it and will continue to use the
Default DF Election algorithm.
3.3. Auto-Derivation of ES-Import Route Target Implementations that comply with [RFC7432] only (i.e.,
implementations that predate this specification) will not advertise
the DF Election Extended Community. That means that all other
participating PEs in the ES will not receive DF preferences and will
revert to the default DF election algorithm without AC-DF.
Section 7.6 of [RFC7432] describes how the value of the ES-Import Similarly, an implementation that complies with [RFC7432] only and
Route Target for ESI types 1, 2, and 3 can be auto-derived by using that receives a DF Election Extended Community will ignore it and
the high-order six bytes of the nine byte ESI value. The same auto- will continue to use the default DF election algorithm.
derivation procedure can be extended to ESI types 0, 4, and 5 as long
as it is ensured that the auto-derived values for ES-Import RT among
different ES types don't overlap. As in [RFC7432], the mechanism to
guarantee that the auto-derived ESI or ES-import RT values for
different ESIs do not match is out of scope of this document.
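The agreement check on the DF Election Extended Community described
earlier in this section (same DF Alg and Bitmap in every ES route, and
otherwise a fallback to the default algorithm) can be sketched as
follows. This is a minimal illustration, not a normative procedure: the
names DfElectionEC, negotiate, and AC_DF are hypothetical, the Bitmap
is treated as an opaque integer, the AC_DF mask value is an assumption
made only for the example, and the local-policy case (DF Type 31) is
not modeled.

```python
from typing import NamedTuple, Optional, Sequence

DF_ALG_DEFAULT = 0  # modulus-based election of RFC 7432 ("DF Alg 0")
DF_ALG_HRW = 1      # Highest Random Weight

class DfElectionEC(NamedTuple):
    """Hypothetical decoded view of one DF Election Extended Community."""
    df_alg: int
    bitmap: int  # capabilities, treated as an opaque integer here

DEFAULT_EC = DfElectionEC(DF_ALG_DEFAULT, 0)

def negotiate(ecs_per_route: Sequence[Sequence[DfElectionEC]]) -> DfElectionEC:
    """Return the DF Alg and capabilities to use for one ES.

    ecs_per_route holds, for every PE in the ES (the local PE included),
    the DF Election Extended Communities found in that PE's ES route.
    """
    chosen: Optional[DfElectionEC] = None
    for ecs in ecs_per_route:
        # Absent or multiple communities: the sender is treated as
        # using DF Alg 0 with no capabilities.
        if len(ecs) != 1:
            return DEFAULT_EC
        if chosen is None:
            chosen = ecs[0]
        elif ecs[0] != chosen:
            # Any disagreement on DF Alg or Bitmap: use the default.
            return DEFAULT_EC
    return chosen if chosen is not None else DEFAULT_EC

AC_DF = 0x4000  # illustrative mask only; assumed for this example
hrw_acdf = DfElectionEC(DF_ALG_HRW, AC_DF)

agreed = negotiate([[hrw_acdf], [hrw_acdf], [hrw_acdf]])
legacy_peer = negotiate([[hrw_acdf], []])
```

In this sketch, agreed is the HRW/AC-DF tuple because all three routes
carry the same community, while legacy_peer falls back to DEFAULT_EC
because one route carries no community at all.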
4. The Highest Random Weight DF Election Algorithm 3. The Highest Random Weight DF Election Algorithm
The procedure discussed in this section is applicable to the DF The procedure discussed in this section is applicable to the DF
Election in EVPN Services [RFC7432] and EVPN Virtual Private Wire election in EVPN services [RFC7432] and the EVPN Virtual Private Wire
Services [RFC8214]. Service (VPWS) [RFC8214].
Highest Random Weight (HRW) as defined in [HRW1999] was originally HRW as defined in [HRW1999] was originally proposed in the context of
proposed in the context of Internet Caching and proxy Server load Internet caching and proxy server load balancing. Given an object
balancing. Given an object name and a set of servers, HRW maps a name and a set of servers, HRW maps a request to a server using the
request to a server using the object-name (object-id) and server-name object-name (object-id) and server-name (server-id) rather than the
(server-id) rather than the server states. HRW forms a hash out of server states. HRW forms a hash out of the server-id and the
the server-id and the object-id and forms an ordered list of the object-id and forms an ordered list of the servers for the particular
servers for the particular object-id. The server for which the hash object-id. The server for which the hash value is highest serves as
value is highest, serves as the primary responsible for that the primary server responsible for that particular object, and the
particular object, and the server with the next highest value in that server with the next-highest value in that hash serves as the backup
hash serves as the backup server. HRW always maps a given object name server. HRW always maps a given object name to the same server
to the same server within a given cluster; consequently it can be within a given cluster; consequently, it can be used at client sites
used at client sites to achieve global consensus on object-server to achieve global consensus on object-to-server mappings. When that
mappings. When that server goes down, the backup server becomes the server goes down, the backup server becomes the responsible
responsible designate. designate.
Choosing an appropriate hash function that is statistically oblivious Choosing an appropriate hash function that is statistically oblivious
to the key distribution and imparts a good uniform distribution of to the key distribution and imparts a good uniform distribution of
the hash output is an important aspect of the algorithm. Fortunately the hash output is an important aspect of the algorithm.
many such hash functions exist. [HRW1999] provides pseudo-random Fortunately, many such hash functions exist. [HRW1999] provides
functions based on the Unix utilities rand and srand and easily pseudorandom functions based on the Unix utilities rand and srand and
constructed XOR functions that satisfy the desired hashing easily constructed XOR functions that satisfy the desired hashing
properties. HRW already finds use in multicast and ECMP properties. HRW already finds use in multicast and ECMP [RFC2991]
[RFC2991],[RFC2992]. [RFC2992].
4.1. HRW and Consistent Hashing 3.1. HRW and Consistent Hashing
HRW is not the only algorithm that addresses the object to server HRW is not the only algorithm that addresses the object-to-server
mapping problem with goals of fair load distribution, redundancy and mapping problem with goals of fair load distribution, redundancy, and
fast access. There is another family of algorithms that also fast access. There is another family of algorithms that also
addresses this problem; these fall under the umbrella of the addresses this problem; these fall under the umbrella of the
Consistent Hashing Algorithms [CHASH]. These will not be considered Consistent Hashing Algorithms [CHASH]. These will not be considered
here. here.
4.2. HRW Algorithm for EVPN DF Election 3.2. HRW Algorithm for EVPN DF Election
This section describes the application of HRW to DF election. Let This section describes the application of HRW to DF election. Let
DF(v) denote the Designated Forwarder and BDF(v) the Backup DF(V) denote the DF and BDF(V) denote the BDF for the Ethernet Tag V;
Designated forwarder for the Ethernet Tag v, where v is the VLAN, Si Si is the IP address of PE i; Es is the ESI; and Weight is a function
is the IP address of PE i, Es denotes the Ethernet Segment Identifier of V, Si, and Es.
and weight is a function of v, Si, and Es.
Note that while the DF election algorithm in [RFC7432] uses PE Note that while the DF election algorithm provided in [RFC7432] uses
address and vlan as inputs, this document uses Ethernet Tag, PE a PE address and VLAN as inputs, this document uses an Ethernet Tag,
address and ESI as inputs. This is because if the same set of PEs are PE address, and ESI as inputs. This is because if the same set of
multi-homed to the same set of ESes, then the DF election algorithm PEs are multihomed to the same set of ESes, then the DF election
used in [RFC7432] would result in the same PE being elected DF for algorithm used in [RFC7432] would result in the same PE being elected
the same set of broadcast domains on each ES, which can have adverse DF for the same set of BDs on each ES; this could have adverse
side-effects on both load balancing and redundancy. Including ESI in side effects on both load balancing and redundancy. Including an ESI
the DF election algorithm introduces additional entropy which in the DF election algorithm introduces additional entropy, which
significantly reduces the probability of the same PE being elected DF significantly reduces the probability of the same PE being elected DF
for the same set of broadcast domains on each ES. Therefore, when for the same set of BDs on each ES. Therefore, when using the HRW
using the HRW Algorithm for EVPN DF Election, the ESI value in the algorithm for EVPN DF election, the ESI value in the Weight function
Weight function below SHOULD be set to that of the corresponding ES. below SHOULD be set to that of the corresponding ES.
In case of a VLAN Bundle service, v denotes the lowest VLAN similar In the case of a VLAN Bundle service, V denotes the lowest VLAN,
to the 'lowest VLAN in bundle' logic of [RFC7432]. similar to the "lowest VLAN in bundle" logic of [RFC7432].
1. DF(v) = Si| Weight(v, Es, Si) >= Weight(v, Es, Sj), for all j. In 1. DF(V) = Si| Weight(V, Es, Si) >= Weight(V, Es, Sj), for all j.
case of a tie, choose the PE whose IP address is numerically the In the case of a tie, choose the PE whose IP address is
least. Note 0 <= i,j < Number of PEs in the redundancy group. numerically the least. Note that 0 <= i,j < number of PEs in the
redundancy group.
2. BDF(v) = Sk| Weight(v, Es, Si) >= Weight(v, Es, Sk) and Weight(v, 2. BDF(V) = Sk| Weight(V, Es, Si) >= Weight(V, Es, Sk), and
Es, Sk) >= Weight(v, Es, Sj). In case of tie choose the PE whose Weight(V, Es, Sk) >= Weight(V, Es, Sj). In the case of a tie,
IP address is numerically the least. choose the PE whose IP address is numerically the least.
Where: Where:
DF(v): is defined to be the address Si (index i) for which weight(v, o DF(V) is defined to be the address Si (index i) for which
Es, Si) is the highest, 0 <= i <= N-1 Weight(V, Es, Si) is the highest; 0 <= i <= N-1.
BDF(v) is defined as that PE with address Sk for which the computed o BDF(V) is defined as that PE with address Sk for which the
weight is the next highest after the weight of the DF. j is the computed Weight is the next highest after the Weight of the DF.
running index from 0 to N-1, i, k are selected values. j is the running index from 0 to N-1; i and k are selected values.
Since the Weight is a pseudo-random function with domain as the Since the Weight is a pseudorandom function with the domain as the
three-tuple (v, Es, S), it is an efficient and deterministic three-tuple (V, Es, S), it is an efficient and deterministic
algorithm that is independent of the Ethernet Tag v sample space algorithm that is independent of the Ethernet Tag V sample space
distribution. Choosing a good hash function for the pseudo-random distribution. Choosing a good hash function for the pseudorandom
function is an important consideration for this algorithm to perform function is an important consideration for this algorithm to perform
better than the Default algorithm. As mentioned previously, such better than the default algorithm. As mentioned previously, such
functions are described in the HRW paper. We take as candidate hash functions are described in [HRW1999]. We take as a candidate hash
function the first one out of the two that are preferred in function the first one out of the two that are listed as preferred in
[HRW1999]: [HRW1999]:
Wrand(v, Es, Si) = (1103515245((1103515245.Si+12345) XOR Wrand(V, Es, Si) = (1103515245((1103515245.Si+12345) XOR
D(v,Es))+12345)(mod 2^31) D(V, Es))+12345)(mod 2^31)
Here D(v,Es) is the 31-bit digest (CRC-32 and discarding the MSB as Here, D(V, Es) is the 31-bit digest (CRC-32 and discarding the
in [HRW1999]) of the 14-byte stream, the Ethernet Tag v (4 bytes) most significant bit (MSB), as noted in [HRW1999]) of the 14-octet
followed by the Ethernet Segment Identifier (10 bytes). It is stream (the 4-octet Ethernet Tag V followed by the 10-octet ESI). It
mandated that the 14-byte stream is formed by concatenation of the is mandated that the 14-octet stream be formed by the concatenation
Ethernet tag and the Ethernet Segment identifier in network byte of the Ethernet Tag and the ESI in network byte order. The CRC
order. The CRC should proceed as if the stream is in network byte should proceed as if the stream is in network byte order
order (big-endian). Si is address of the ith server. The server's IP (big-endian). Si is the address of the ith server. The server's
address length does not matter as only the low-order 31 bits are IP address length does not matter, as only the low-order 31 bits are
modulo significant. modulo significant.
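The Wrand function and the DF/BDF selection rules above can be
sketched as follows. This is an illustration under stated assumptions
rather than a reference implementation: the function names are
hypothetical, and the use of zlib.crc32 (the common IEEE 802.3 CRC-32)
for the digest D(V, Es) is an assumption, since the text above only
says "CRC-32".

```python
import ipaddress
import struct
import zlib

A = 1103515245
B = 12345
MOD = 2 ** 31

def digest(eth_tag: int, esi: bytes) -> int:
    """31-bit digest D(V, Es) over the 14-octet stream Tag || ESI."""
    if len(esi) != 10:
        raise ValueError("ESI must be exactly 10 octets")
    stream = struct.pack("!I", eth_tag) + esi  # network byte order
    # Assumption: zlib's CRC-32; the MSB is discarded to keep 31 bits.
    return zlib.crc32(stream) & 0x7FFFFFFF

def wrand(eth_tag: int, esi: bytes, pe_ip: str) -> int:
    """Wrand(V, Es, Si); works for IPv4 and IPv6 PE addresses."""
    si = int(ipaddress.ip_address(pe_ip))
    return (A * ((A * si + B) ^ digest(eth_tag, esi)) + B) % MOD

def elect_df_bdf(eth_tag: int, esi: bytes, pe_ips):
    """Return (DF, BDF); ties go to the numerically lowest address."""
    ranked = sorted(
        pe_ips,
        key=lambda ip: (-wrand(eth_tag, esi, ip),
                        int(ipaddress.ip_address(ip))),
    )
    df = ranked[0] if ranked else None
    bdf = ranked[1] if len(ranked) > 1 else None
    return df, bdf

esi = bytes(range(10))  # example ESI value, chosen arbitrarily
pes = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "2001:db8::4"]
df, bdf = elect_df_bdf(100, esi, pes)
```

Because the result depends only on the per-PE weights, removing a PE
that is neither the DF nor the BDF from pes leaves (df, bdf) unchanged,
and removing the DF promotes the BDF.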
A point to note is that the Weight function takes into consideration A point to note is that the Weight function takes into consideration
the combination of the Ethernet Tag, Ethernet Segment and the PE IP- the combination of the Ethernet Tag, the ES, and the PE IP address,
address, and the actual length of the server IP address (whether IPv4 and the actual length of the server IP address (whether IPv4 or IPv6)
or IPv6) is not really relevant. The Default algorithm in [RFC7432] is not really relevant. The default algorithm defined in [RFC7432]
cannot employ both IPv4 and IPv6 PE addresses, since [RFC7432] does cannot employ both IPv4 and IPv6 PE addresses, since [RFC7432] does
not specify how to decide on the ordering (the ordinal list) when not specify how to decide on the ordering (the ordinal list) when
both IPv4 and IPv6 PEs are present. both IPv4 and IPv6 PEs are present.
HRW solves the disadvantages pointed out in Section 1.2.1 and HRW solves the disadvantages pointed out in Section 1.3.1 of this
ensures: document and ensures that:
o with very high probability that the task of DF election for the o With very high probability, the task of DF election for the VLANs
VLANs configured on an ES is more or less equally distributed among configured on an ES is more or less equally distributed among the
the PEs even for the 2 PE case. PEs, even in the case of two PEs (see the first fundamental
problem listed in Section 1.3.1).
o If a PE that is not the DF or the BDF for that VLAN, goes down or o If a PE that is not the DF or the BDF for that VLAN goes down or
its connection to the ES goes down, it does not result in a DF or its connection to the ES goes down, it does not result in a DF or
BDF reassignment. This saves computation, especially in the case BDF reassignment. This saves computation, especially in the case
when the connection flaps. when the connection flaps.
o More importantly it avoids the needless disruption case of Section o More importantly, it avoids the third fundamental problem listed
1.2.1 (3), that is inherent in the existing Default DF Election. in Section 1.3.1 (needless disruption) that is inherent in the
existing default DF election.
o In addition to the DF, the algorithm also furnishes the BDF, which o In addition to the DF, the algorithm also furnishes the BDF, which
would be the DF if the current DF fails. would be the DF if the current DF fails.
5. The Attachment Circuit Influenced DF Election Capability 4. The AC-Influenced DF Election Capability
The procedure discussed in this section is applicable to the DF The procedure discussed in this section is applicable to the DF
Election in EVPN Services [RFC7432] and EVPN Virtual Private Wire election in EVPN services [RFC7432] and EVPN VPWS [RFC8214].
Services [RFC8214].
The AC-DF capability is expected to be of general applicability with The AC-DF capability is expected to be generally applicable to any
any future DF Algorithm. It modifies the DF Election procedures by future DF algorithm. It modifies the DF election procedures by
removing from consideration any candidate PE in the ES that cannot removing from consideration any candidate PE in the ES that cannot
forward traffic on the AC that belongs to the BD. This section is forward traffic on the AC that belongs to the BD. This section is
applicable to VLAN-Based and VLAN Bundle service interfaces. Section applicable to VLAN-based and VLAN Bundle service interfaces.
5.1 describes the procedures for VLAN-Aware Bundle interfaces. Section 4.1 describes the procedures for VLAN-aware Bundle service
interfaces.
In particular, when used with the Default DF Alg, the AC-DF In particular, when used with the default DF algorithm, the AC-DF
capability modifies the Step 3 in the DF Election procedure described capability modifies Step 3 in the DF election procedure described in
in [RFC7432] Section 8.5, as follows: [RFC7432], Section 8.5, as follows:
3. When the timer expires, each PE builds an ordered "candidate" list 3. When the timer expires, each PE builds an ordered candidate list
of the IP addresses of all the PE nodes attached to the Ethernet of the IP addresses of all the PE nodes attached to the ES
Segment (including itself), in increasing numeric value. The (including itself), in increasing numeric value. The candidate
candidate list is based on the Originator Router's IP addresses of list is based on the Originating Router's IP addresses of the ES
the ES routes, but excludes any PE from whom no Ethernet A-D per routes but excludes any PE from whom no Ethernet A-D per ES route
ES route has been received, or from whom the route has been has been received or from whom the route has been withdrawn.
withdrawn. Afterwards, the DF Election algorithm is applied on a Afterwards, the DF election algorithm is applied on a per
per <ES, Ethernet Tag> basis; however, the IP address for a PE will not <ES, Ethernet Tag> basis; however, the IP address for a PE will not be
be considered candidate for a given <ES, Ethernet Tag> until the considered to be a candidate for a given <ES, Ethernet Tag> until
corresponding Ethernet A-D per EVI route has been received from the corresponding Ethernet A-D per EVI route has been received
that PE. In other words, the ACS on the ES for a given PE must be from that PE. In other words, the ACS on the ES for a given PE
UP so that the PE is considered as candidate for a given BD. must be UP so that the PE is considered to be a candidate for a
given BD.
If the Default DF Alg is used, every PE in the resulting candidate If the default DF algorithm is used, every PE in the resulting
list is then given an ordinal indicating its position in the candidate list is then given an ordinal indicating its position in
ordered list, starting with 0 as the ordinal for the PE with the the ordered list, starting with 0 as the ordinal for the PE with
numerically lowest IP address. The ordinals are used to determine the numerically lowest IP address. The ordinals are used to
which PE node will be the DF for a given Ethernet Tag on the determine which PE node will be the DF for a given Ethernet Tag on
Ethernet Segment, using the following rule: the ES, using the following rule:
Assuming a redundancy group of N PE nodes, for VLAN-based service, Assuming a redundancy group of N PE nodes, for VLAN-based service,
the PE with ordinal i is the DF for an <ES, Ethernet Tag V> when the PE with ordinal i is the DF for an <ES, Ethernet Tag V> when
(V mod N)= i. In the case of VLAN-(aware) bundle service, then the (V mod N) = i. In the case of a VLAN (-aware) Bundle service,
numerically lowest VLAN value in that bundle on that ES MUST be then the numerically lowest VLAN value in that bundle on that ES
used in the modulo function as Ethernet Tag. MUST be used in the modulo function as the Ethernet Tag.
It should be noted that using the "Originating Router's IP It should be noted that using the Originating Router's IP Address
address" field in the Ethernet Segment route to get the PE IP field [RFC7432] in the ES route to get the PE IP address needed
address needed for the ordered list allows for a CE to be for the ordered list allows for a CE to be multihomed across
multihomed across different ASes if such a need ever arises. different Autonomous Systems (ASes) if such a need ever arises.
The above three paragraphs differ from [RFC7432] Section 8.5, Step 3, The modified Step 3, above, differs from [RFC7432], Section 8.5,
in two aspects: Step 3 in two ways:
o Any DF Alg algorithm can be used, and not only the described o Any DF Alg can be used -- not only the described modulus-based DF
modulus-based DF Alg (referred to as the Default DF Election, or DF Alg (referred to as the default DF election or "DF Alg 0" in this
Alg 0 in this document). document).
o The candidate list is pruned based upon non-receipt of Ethernet A-D o The candidate list is pruned based upon non-receipt of Ethernet
routes: a PE's IP address MUST be removed from the ES candidate A-D routes: a PE's IP address MUST be removed from the ES
list if its Ethernet A-D per ES route is withdrawn. A PE's IP candidate list if its Ethernet A-D per ES route is withdrawn. A
address MUST NOT be considered as candidate DF for a <ES, Ethernet PE's IP address MUST NOT be considered to be a candidate DF for an
Tag>, if its Ethernet A-D per EVI route for the <ES, Ethernet Tag> <ES, Ethernet Tag> if its Ethernet A-D per EVI route for the
is withdrawn. <ES, Ethernet Tag> is withdrawn.
The following example illustrates the AC-DF behavior applied to the The following example illustrates the AC-DF behavior applied to the
Default DF election algorithm, assuming the network in Figure 2: default DF election algorithm, assuming the network in Figure 2:
a) When PE1 and PE2 discover ES12, they advertise an ES route for (a) When PE1 and PE2 discover ES12, they advertise an ES route for
ES12 with the associated ES-import extended community and the DF ES12 with the associated ES-Import Extended Community and the DF
Election Extended Community indicating AC-DF=1; they start a DF Election Extended Community indicating AC-DF = 1; they start a
Wait timer (independently). Likewise, PE2 and PE3 advertise an ES DF Wait timer (independently). Likewise, PE2 and PE3 advertise
route for ES23 with AC-DF=1 and start a DF Wait timer. an ES route for ES23 with AC-DF = 1 and start a DF Wait timer.
b) PE1/PE2 advertise an Ethernet A-D per ES route for ES12, and (b) PE1 and PE2 advertise an Ethernet A-D per ES route for ES12.
PE2/PE3 advertise an Ethernet A-D per ES route for ES23. PE2 and PE3 advertise an Ethernet A-D per ES route for ES23.
c) In addition, PE1/PE2/PE3 advertise an Ethernet A-D per EVI route (c) In addition, PE1, PE2, and PE3 advertise an Ethernet A-D per EVI
for AC1, AC2, AC3 and AC4 as soon as the ACs are enabled. Note route for AC1, AC2, AC3, and AC4 as soon as the ACs are enabled.
that the AC can be associated to a single customer VID (e.g. VLAN- Note that the AC can be associated with a single customer VID
based service interfaces) or a bundle of customer VIDs (e.g. VLAN (e.g., VLAN-based service interfaces) or a bundle of customer
Bundle service interfaces). VIDs (e.g., VLAN Bundle service interfaces).
d) When the timer expires, each PE builds an ordered "candidate" list (d) When the timer expires, each PE builds an ordered candidate list
of the IP addresses of all the PE nodes connected to the Ethernet of the IP addresses of all the PE nodes attached to the ES
Segment (including itself) as explained above in [RFC7432] Step 3. (including itself) as explained in the modified Step 3 above.
Any PE from which an Ethernet A-D per ES route has not been Any PE from which an Ethernet A-D per ES route has not been
received is pruned from the list. received is pruned from the list.
e) When electing the DF for a given BD, a PE will not be considered (e) When electing the DF for a given BD, a PE will not be considered
candidate until an Ethernet A-D per EVI route has been received to be a candidate until an Ethernet A-D per EVI route has been
from that PE. In other words, the ACS on the ES for a given PE received from that PE. In other words, the ACS on the ES for a
must be UP so that the PE is considered as candidate for a given given PE must be UP so that the PE is considered to be a
BD. For example, PE1 will not consider PE2 as candidate for DF candidate for a given BD. For example, PE1 will not consider
election for <ES12,VLAN-1> until an Ethernet A-D per EVI route is PE2 as a candidate for DF election for <ES12, VLAN-1> until an
received from PE2 for <ES12,VLAN-1>. Ethernet A-D per EVI route is received from PE2 for
<ES12, VLAN-1>.
f) Once the PEs with ACS = DOWN for a given BD have been removed from (f) Once the PEs with ACS = DOWN for a given BD have been removed
the candidate list, the DF Election can be applied for the from the candidate list, the DF election can be applied for the
remaining N candidates. remaining N candidates.
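The steps above, applied to the default (modulus-based) algorithm, can
be sketched as follows. The data structures and function names are
hypothetical and chosen only for illustration. The PE addresses are
example values and are assumed to be IPv4 only, since the ordering of
mixed IPv4 and IPv6 addresses is not specified for the default
algorithm.

```python
import ipaddress

def candidate_list(es_routes, ad_per_es, ad_per_evi, es, eth_tag):
    """Return the ordered candidate list for <es, eth_tag>.

    es_routes:  {es: set of PE IPs that advertised an ES route}
    ad_per_es:  {es: set of PE IPs with an active A-D per ES route}
    ad_per_evi: {(es, eth_tag): set of PE IPs with an active
                 A-D per EVI route, i.e., ACS = UP}
    """
    eligible = (es_routes.get(es, set())
                & ad_per_es.get(es, set())
                & ad_per_evi.get((es, eth_tag), set()))
    # Increasing numeric value of the PE IP address.
    return sorted(eligible, key=lambda ip: int(ipaddress.ip_address(ip)))

def default_df(candidates, eth_tag):
    """PE with ordinal i is the DF when (V mod N) = i."""
    if not candidates:
        return None
    return candidates[eth_tag % len(candidates)]

PE1, PE2 = "192.0.2.1", "192.0.2.2"
es_routes = {"ES12": {PE1, PE2}}
ad_per_es = {"ES12": {PE1, PE2}}
ad_per_evi = {("ES12", 1): {PE1, PE2}}

# Both ACs are UP for <ES12, VLAN-1>: N = 2 and 1 mod 2 = 1.
df_both_up = default_df(
    candidate_list(es_routes, ad_per_es, ad_per_evi, "ES12", 1), 1)
print(df_both_up)   # 192.0.2.2

# PE2 withdraws its A-D per EVI route: N = 1 and 1 mod 1 = 0.
ad_per_evi[("ES12", 1)].discard(PE2)
df_pe2_down = default_df(
    candidate_list(es_routes, ad_per_es, ad_per_evi, "ES12", 1), 1)
print(df_pe2_down)  # 192.0.2.1
```

Without the pruning step, PE2 would remain the DF for <ES12, VLAN-1>
even though its AC cannot forward traffic.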
Note that this procedure only modifies the existing EVPN control Note that this procedure only modifies the existing EVPN control
plane by adding and processing the DF Election Extended Community, plane by adding and processing the DF Election Extended Community
and by pruning the candidate list of PEs that take part in the DF and by pruning the candidate list of PEs that take part in the DF
election. election.
In addition to the events defined in the FSM in Section 3.1, the In addition to the events defined in the FSM in Section 2.1, the
following events SHALL modify the candidate PE list and trigger the following events SHALL modify the candidate PE list and trigger the
DF re-election in a PE for a given <ES, Ethernet Tag>. In the FSM of DF re-election in a PE for a given <ES, Ethernet Tag>. In the FSM
Figure 3, the events below MUST trigger a transition from DF_DONE to shown in Figure 3, the events below MUST trigger a transition from
DF_CALC: DF_DONE to DF_CALC:
i. Local AC going DOWN/UP. 1. Local AC going DOWN/UP.
ii. Reception of a new Ethernet A-D per EVI update/withdraw for the 2. Reception of a new Ethernet A-D per EVI route update/withdrawal
<ES, Ethernet Tag>. for the <ES, Ethernet Tag>.
iii. Reception of a new Ethernet A-D per ES update/withdraw for the 3. Reception of a new Ethernet A-D per ES route update/withdrawal
ES. for the ES.
5.1. AC-Influenced DF Election Capability For VLAN-Aware Bundle Services 4.1. AC-Influenced DF Election Capability for VLAN-Aware Bundle
Services
The procedure described in section 5 works for VLAN-based and VLAN The procedure described in Section 4 works for VLAN-based and VLAN
Bundle service interfaces since, for those service types, a PE Bundle service interfaces because, for those service types, a PE
advertises only one Ethernet A-D per EVI route per <ES,VLAN> or advertises only one Ethernet A-D per EVI route per <ES, VLAN> or
<ES,VLAN Bundle>. In Section 5, an Ethernet Tag represents a given <ES, VLAN Bundle>. In Section 4, an Ethernet Tag represents a given
VLAN or VLAN Bundle for the purpose of DF Election. The withdrawal of VLAN or VLAN Bundle for the purpose of DF election. The withdrawal
such route means that the PE cannot forward traffic on that of such a route means that the PE cannot forward traffic on that
particular <ES,VLAN> or <ES,VLAN Bundle>, therefore the PE can be particular <ES, VLAN> or <ES, VLAN Bundle>; therefore, the PE can be
removed from consideration for DF. removed from consideration for DF election.
According to [RFC7432], in VLAN-aware Bundle services, the PE According to [RFC7432], in VLAN-aware Bundle services, the PE
advertises multiple Ethernet A-D per EVI routes per <ES,VLAN Bundle> advertises multiple Ethernet A-D per EVI routes per <ES, VLAN Bundle>
(one route per Ethernet Tag), while the DF Election is still (one route per Ethernet Tag), while the DF election is still
performed per <ES,VLAN Bundle>. The withdrawal of an individual route performed per <ES, VLAN Bundle>. The withdrawal of an individual
only indicates the unavailability of a specific AC but not route only indicates the unavailability of a specific AC and not
necessarily all the ACs in the <ES,VLAN Bundle>. necessarily all the ACs in the <ES, VLAN Bundle>.
This document modifies the DF Election for VLAN-Aware Bundle services This document modifies the DF election for VLAN-aware Bundle services
in the following way: in the following ways:
o After confirming that all the PEs in the ES advertise the AC-DF o After confirming that all the PEs in the ES advertise the AC-DF
capability, a PE will perform a DF Election per <ES,VLAN>, as capability, a PE will perform a DF election per <ES, VLAN>, as
opposed to per <ES,VLAN Bundle> in [RFC7432]. Now, the withdrawal opposed to per <ES, VLAN Bundle> as described in [RFC7432]. Now,
of an Ethernet A-D per EVI route for a VLAN will indicate that the the withdrawal of an Ethernet A-D per EVI route for a VLAN will
advertising PE's ACS is DOWN and the rest of the PEs in the ES can indicate that the advertising PE's ACS is DOWN and the rest of the
remove the PE from consideration for DF in the <ES,VLAN>. PEs in the ES can remove the PE from consideration for DF election
in the <ES, VLAN>.
o The PEs will now follow the procedures in section 5. o The PEs will now follow the procedures in Section 4.
For example, assuming three Bridge Tables in PE1 for the same MAC-VRF For example, assuming three bridge tables in PE1 for the same MAC-VRF
(each one associated to a different Ethernet Tag, e.g. VLAN-1, VLAN-2 (each one associated with a different Ethernet Tag, e.g., VLAN-1,
and VLAN-3), PE1 will advertise three Ethernet A-D per EVI routes for VLAN-2, and VLAN-3), PE1 will advertise three Ethernet A-D per EVI
ES12. Each of the three routes will indicate the status of each of routes for ES12. Each of the three routes will indicate the status
the three ACs in ES12. PE1 will be considered as a valid candidate PE of each of the three ACs in ES12. PE1 will be considered to be a
for DF Election in <ES12,VLAN-1>, <ES12,VLAN-2>, <ES12,VLAN-3> as valid candidate PE for DF election in <ES12, VLAN-1>, <ES12, VLAN-2>,
long as its three routes are active. For instance, if PE1 withdraws and <ES12, VLAN-3> as long as its three routes are active. For
the Ethernet A-D per EVI routes for <ES12,VLAN-1>, the PEs in ES12 instance, if PE1 withdraws the Ethernet A-D per EVI routes for
will not consider PE1 as a suitable DF candidate for <ES12,VLAN-1>. <ES12, VLAN-1>, the PEs in ES12 will not consider PE1 as a suitable
PE1 will still be considered for <ES12,VLAN-2> and <ES12,VLAN-3> DF candidate for <ES12, VLAN-1>. PE1 will still be considered for
since its routes are active. <ES12, VLAN-2> and <ES12, VLAN-3>, since its routes are active.
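The PE1 example above can be sketched by tracking Ethernet A-D per EVI
routes per <ES, VLAN> instead of per <ES, VLAN Bundle>. The class and
method names are hypothetical and serve only to illustrate the
per-VLAN candidate bookkeeping.

```python
from collections import defaultdict

class AdPerEviTable:
    """Active Ethernet A-D per EVI routes, keyed by (ES, VLAN)."""

    def __init__(self):
        self._routes = defaultdict(set)  # (es, vlan) -> set of PEs

    def advertise(self, pe, es, vlan):
        self._routes[(es, vlan)].add(pe)

    def withdraw(self, pe, es, vlan):
        self._routes[(es, vlan)].discard(pe)

    def is_candidate(self, pe, es, vlan):
        """A PE is a DF candidate only while its route is active."""
        return pe in self._routes.get((es, vlan), set())

table = AdPerEviTable()
for vlan in (1, 2, 3):
    table.advertise("PE1", "ES12", vlan)

# PE1's AC for VLAN-1 goes DOWN, so only that route is withdrawn.
table.withdraw("PE1", "ES12", 1)

status = {v: table.is_candidate("PE1", "ES12", v) for v in (1, 2, 3)}
print(status)  # {1: False, 2: True, 3: True}
```

Each <ES, VLAN> key then feeds its own DF election, so a single AC
failure affects only the VLAN concerned.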
6. Solution Benefits 5. Solution Benefits
The solution described in this document provides the following The solution described in this document provides the following
benefits: benefits:
a) Extends the DF Election in [RFC7432] to address the unfair load- (a) It extends the DF election as defined in [RFC7432] to address
balancing and potential black-holing issues of the Default DF the unfair load balancing and potential black-holing issues with
Election algorithm. The solution is applicable to the DF Election the default DF election algorithm. The solution is applicable
in EVPN Services [RFC7432] and EVPN Virtual Private Wire Services to the DF election in EVPN services [RFC7432] and EVPN VPWS
[RFC8214]. [RFC8214].
b) It defines a way to signal the DF Election algorithm and (b) It defines a way to signal the DF election algorithm and
capabilities intended by the advertising PE. This is done by capabilities intended by the advertising PE. This is done by
defining the DF Election Extended Community, which allow signaling defining the DF Election Extended Community, which allows the
of the capabilities supported by this document as well as any advertising PE to indicate its support for the capabilities
other future DF Election algorithms and capabilities. defined in this document as well as any subsequently defined DF
election algorithms or capabilities.
c) The solution is backwards compatible with the procedures defined (c) It is backwards compatible with the procedures defined in
in [RFC7432]. If one or more PEs in the ES do not support the new [RFC7432]. If one or more PEs in the ES do not support the new
procedures, they will all follow the [RFC7432] DF Election. procedures, they will all follow DF election as defined in
[RFC7432].
7. Security Considerations 6. Security Considerations
This document addresses some identified issues in the DF Election This document addresses some identified issues in the DF election
procedures described in [RFC7432] by defining a new DF Election procedures described in [RFC7432] by defining a new DF election
framework. In general, this framework allows the PEs that are part of framework. In general, this framework allows the PEs that are part
the same Ethernet Segment to exchange additional information and of the same ES to exchange additional information and agree on the DF
agree on the DF Election Type and Capabilities to be used. election type and capabilities to be used.
Following the procedures in this document, the operator will minimize By following the procedures in this document, the operator will
undesired situations such as unfair load-balancing, service minimize such undesirable situations as unfair load balancing,
disruption and traffic black-holing. Since those situations may have service disruption, and traffic black-holing. Because such
been purposely created by a malicious user with access to the situations could be purposely created by a malicious user with access
configuration of one PE, this document enhances also the security of to the configuration of one PE, this document also enhances the
the network. Note that the network will not benefit of the new security of the network. Note that the network will not benefit from
procedures if the DF Election Alg is not consistently configured on the new procedures if the DF election algorithm is not consistently
all the PEs in the ES (if there is no unanimity among all the PEs, configured on all the PEs in the ES (if there is no unanimity among
the DF Election Alg falls back to the Default [RFC7432] DF Election). all the PEs, the DF election algorithm falls back to the default DF
This behavior could be exploited by an attacker that manages to election as provided in [RFC7432]). This behavior could be exploited
modify the configuration of one PE in the Ethernet Segment so that by an attacker that manages to modify the configuration of one PE in
the DF Election Alg and capabilities in all the PEs in the Ethernet the ES so that the DF election algorithm and capabilities in all the
Segment fall back to the Default DF Election. If that is the case, PEs in the ES fall back to the default DF election. If that is the
the PEs will be exposed to the unfair load-balancing, service case, the PEs will be exposed to the unfair load balancing, service
disruption and black-holing that were mentioned earlier. disruption, and black-holing mentioned earlier.
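The fallback behavior described in this paragraph can be sketched as below. It is a simplified illustration under stated assumptions: the function name `agreed_df_alg` is hypothetical, each PE's advertisement is reduced to a single DF Alg value (with `None` standing for a PE that signals no DF Election Extended Community), and the handling of capabilities is left out.

```python
from typing import Iterable, Optional

DEFAULT_DF_ALG = 0  # "Default DF Election" in the DF Alg registry


def agreed_df_alg(advertised: Iterable[Optional[int]]) -> int:
    """Return the DF Alg all PEs in the ES agree on, else the default.

    `advertised` holds one entry per PE in the Ethernet Segment.
    None models a PE that did not signal any DF Alg (assumption).
    """
    values = set(advertised)
    if len(values) == 1:
        (alg,) = values
        if alg is not None:
            return alg
    # No unanimity (or nothing signaled): fall back to the default election.
    return DEFAULT_DF_ALG


unanimous = agreed_df_alg([1, 1, 1])     # every PE signals DF Alg 1
one_changed = agreed_df_alg([1, 1, 0])   # one PE reconfigured to DF Alg 0
```

The second call shows the exposure discussed above: changing the configuration of a single PE is enough to move the whole ES back to the default election.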
In addition, the new framework is extensible and allows for future In addition, the new framework is extensible and allows for new
new security enhancements that are out of the scope of this document. security enhancements in the future. Note that such enhancements are
Finally, since this document extends the procedures in [RFC7432], the out of scope for this document. Finally, since this document extends
same Security Considerations described in [RFC7432] are valid for the procedures in [RFC7432], the same security considerations as
this document. those described in [RFC7432] are valid for this document.
8. IANA Considerations 7. IANA Considerations
IANA is requested to: IANA has:
o Allocate Sub-Type value 0x06 in the "EVPN Extended Community Sub- o Allocated Sub-Type value 0x06 in the "EVPN Extended Community
Types" registry defined in [RFC7153] as follows: Sub-Types" registry defined in [RFC7153] as follows:
SUB-TYPE VALUE NAME Reference Sub-Type Value Name Reference
-------------- ------------------------- ------------- -------------- ------------------------------ -------------
0x06 DF Election Extended Community This document 0x06 DF Election Extended Community This document
o Set up a registry called "DF Alg" for the DF Alg field in the o Set up a registry called "DF Alg" for the DF Alg field in the
Extended Community. New registrations will be made through the "RFC Extended Community. New registrations will be made through the
Required" procedure defined in [RFC8126]. Value 31 is for "RFC Required" procedure defined in [RFC8126]. Value 31 is for
Experimental use and does not require any other RFC than this experimental use and does not require any other RFC than this
document. The following initial values in that registry are document. The following initial values in that registry exist:
requested:
Alg Name Reference Alg Name Reference
---- -------------- ------------- ---- ----------------------------- -------------
0 Default DF Election This document 0 Default DF Election This document
1 HRW algorithm This document 1 HRW Algorithm This document
2-30 Unassigned 2-30 Unassigned
31 Reserved for Experimental use This document 31 Reserved for Experimental Use This document
o Set up a registry called "DF Election Capabilities" for the two- o Set up a registry called "DF Election Capabilities" for the
octet Bitmap field in the Extended Community. New registrations 2-octet Bitmap field in the Extended Community. New registrations
will be made through the "RFC Required" procedure defined in will be made through the "RFC Required" procedure defined in
[RFC8126]. The following initial value in that registry is [RFC8126]. The following initial value in that registry exists:
requested:
Bit Name Reference Bit Name Reference
---- -------------- ------------- ---- ---------------- -------------
0 Unassigned 0 Unassigned
1 AC-DF capability This document 1 AC-DF Capability This document
2-15 Unassigned 2-15 Unassigned
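As a rough illustration of how the initial registry contents above might be consulted, the sketch below maps DF Alg values to their registered names and tests a capability bit. The helper names are hypothetical, and the bit numbering (bit 0 as the most significant bit of the 2-octet Bitmap) is an assumption based on the usual IETF convention; this section does not state it.

```python
# Initial "DF Alg" registry values listed above; 2-30 are unassigned.
DF_ALG_NAMES = {
    0: "Default DF Election",
    1: "HRW Algorithm",
    31: "Reserved for Experimental Use",
}

# Initial "DF Election Capabilities" registry: bit 1 is AC-DF Capability.
AC_DF_BIT = 1


def df_alg_name(alg: int) -> str:
    """Return the registered name for a DF Alg value in the listed range 0-31."""
    if not 0 <= alg <= 31:
        raise ValueError("DF Alg value outside the registry range 0-31")
    return DF_ALG_NAMES.get(alg, "Unassigned")


def has_capability(bitmap: int, bit: int) -> bool:
    """Test one bit of the 2-octet Bitmap.

    Assumption: bit 0 is the most significant bit of the 16-bit field.
    """
    if not 0 <= bit <= 15:
        raise ValueError("Bitmap bit outside the range 0-15")
    return bool(bitmap & (1 << (15 - bit)))


name = df_alg_name(1)
ac_df = has_capability(0x4000, AC_DF_BIT)  # 0x4000 sets bit 1 under the MSB-0 assumption
```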
9. References 8. References
9.1. Normative References 8.1. Normative References
[RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A., [RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based Ethernet Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
VPN", RFC 7432, DOI 10.17487/RFC7432, February 2015, Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432,
<https://www.rfc-editor.org/info/rfc7432>. February 2015, <https://www.rfc-editor.org/info/rfc7432>.
[RFC8214] Boutros, S., Sajassi, A., Salam, S., Drake, J., and J. [RFC8214] Boutros, S., Sajassi, A., Salam, S., Drake, J., and J.
Rabadan, "Virtual Private Wire Service Support in Ethernet VPN", RFC Rabadan, "Virtual Private Wire Service Support in Ethernet
8214, DOI 10.17487/RFC8214, August 2017, <https://www.rfc- VPN", RFC 8214, DOI 10.17487/RFC8214, August 2017,
editor.org/info/rfc8214>. <https://www.rfc-editor.org/info/rfc8214>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March Requirement Levels", BCP 14, RFC 2119,
1997, <https://www.rfc-editor.org/info/rfc2119>. DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, RFC 2119 Key Words", BCP 14, RFC 8174,
<https://www.rfc-editor.org/info/rfc8174>. DOI 10.17487/RFC8174, May 2017,
<https://www.rfc-editor.org/info/rfc8174>.
[RFC4360] Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended [RFC4360] Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended
Communities Attribute", RFC 4360, DOI 10.17487/RFC4360, February Communities Attribute", RFC 4360, DOI 10.17487/RFC4360,
2006, <http://www.rfc-editor.org/info/rfc4360>. February 2006, <https://www.rfc-editor.org/info/rfc4360>.
[RFC7153] Rosen, E. and Y. Rekhter, "IANA Registries for BGP [RFC7153] Rosen, E. and Y. Rekhter, "IANA Registries for BGP
Extended Communities", RFC 7153, DOI 10.17487/RFC7153, March 2014, Extended Communities", RFC 7153, DOI 10.17487/RFC7153,
<https://www.rfc-editor.org/info/rfc7153>. March 2014, <https://www.rfc-editor.org/info/rfc7153>.
[RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for [RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for
Writing an IANA Considerations Section in RFCs", BCP 26, RFC 8126, Writing an IANA Considerations Section in RFCs", BCP 26,
DOI 10.17487/RFC8126, June 2017, <https://www.rfc- RFC 8126, DOI 10.17487/RFC8126, June 2017,
editor.org/info/rfc8126>. <https://www.rfc-editor.org/info/rfc8126>.
9.2. Informative References 8.2. Informative References
[VPLS-MH] Kothari, Henderickx et al., "BGP based Multi-homing in [VPLS-MH] Kothari, B., Kompella, K., Henderickx, W., Balus, F., and
Virtual Private LAN Service", draft-ietf-bess-vpls-multihoming- J. Uttaro, "BGP based Multi-homing in Virtual Private LAN
02.txt, work in progress, September, 2018. Service", Work in Progress,
draft-ietf-bess-vpls-multihoming-03, March 2019.
[CHASH] Karger, D., Lehman, E., Leighton, T., Panigrahy, R., Levine, [CHASH] Karger, D., Lehman, E., Leighton, T., Panigrahy, R.,
M., and D. Lewin, "Consistent Hashing and Random Trees: Distributed Levine, M., and D. Lewin, "Consistent Hashing and Random
Caching Protocols for Relieving Hot Spots on the World Wide Web", ACM Trees: Distributed Caching Protocols for Relieving Hot
Symposium on Theory of Computing ACM Press New York, May 1997. Spots on the World Wide Web", ACM Symposium on Theory of
Computing, ACM Press, New York, DOI 10.1145/258533.258660,
May 1997.
[CLRS2009] Cormen, T., Leiserson, C., Rivest, R., and C. Stein, [CLRS2009] Cormen, T., Leiserson, C., Rivest, R., and C. Stein,
"Introduction to Algorithms (3rd ed.)", MIT Press and McGraw-Hill "Introduction to Algorithms (3rd Edition)", MIT
ISBN 0-262-03384-4., February 2009. Press, ISBN 0-262-03384-8, 2009.
[RFC2991] Thaler, D. and C. Hopps, "Multipath Issues in Unicast and [RFC2991] Thaler, D. and C. Hopps, "Multipath Issues in Unicast and
Multicast Next-Hop Selection", RFC 2991, DOI 10.17487/RFC2991, Multicast Next-Hop Selection", RFC 2991,
November 2000, <http://www.rfc-editor.org/info/rfc2991>. DOI 10.17487/RFC2991, November 2000,
<https://www.rfc-editor.org/info/rfc2991>.
[RFC2992] Hopps, C., "Analysis of an Equal-Cost Multi-Path [RFC2992] Hopps, C., "Analysis of an Equal-Cost Multi-Path
Algorithm", RFC 2992, DOI 10.17487/RFC2992, November 2000, Algorithm", RFC 2992, DOI 10.17487/RFC2992, November 2000,
<http://www.rfc-editor.org/info/rfc2992>. <https://www.rfc-editor.org/info/rfc2992>.
[RFC4456] Bates, T., Chen, E., and R. Chandra, "BGP Route
Reflection: An Alternative to Full Mesh Internal BGP
(IBGP)", RFC 4456, DOI 10.17487/RFC4456, April 2006,
<https://www.rfc-editor.org/info/rfc4456>.
[HRW1999] Thaler, D. and C. Ravishankar, "Using Name-Based Mappings [HRW1999] Thaler, D. and C. Ravishankar, "Using Name-Based Mappings
to Increase Hit Rates", IEEE/ACM Transactions in networking Volume 6 to Increase Hit Rates", IEEE/ACM Transactions on
Issue 1, February 1998, <https://www.microsoft.com/en-us/research/wp- Networking, Volume 6, No. 1, February 1998,
content/uploads/2017/02/HRW98.pdf>. <https://www.microsoft.com/en-us/research/wp-content/
uploads/2017/02/HRW98.pdf>.
[Knuth] Art of Computer Programming - Sorting and Searching,Vol 3 [Knuth] Knuth, D., "The Art of Computer Programming: Volume 3:
Pg. 516, Addison Wesley Sorting and Searching", 2nd Edition, Addison-Wesley,
Page 516, 1998.
10. Acknowledgments Acknowledgments
The authors want to thank Sriram Venkateswaran, Laxmi Padakanti, The authors want to thank Ranganathan Boovaraghavan, Sami Boutros,
Ranganathan Boovaraghavan, Tamas Mondal, Sami Boutros, Jakob Heitz, Luc Andre Burdet, Anoop Ghanwani, Mrinmoy Ghosh, Jakob Heitz, Leo
Mrinmoy Ghosh, Leo Mermelstein, Mankamana Mishra, Anoop Ghanwani and Mermelstein, Mankamana Mishra, Tamas Mondal, Laxmi Padakanti, Samir
Samir Thoria for their review and contributions. Special thanks to Thoria, and Sriram Venkateswaran for their review and contributions.
Stephane Litkowski for his thorough review and detailed Special thanks to Stephane Litkowski for his thorough review and
contributions. detailed contributions.
11. Contributors They would also like to thank their working group chairs, Matthew
Bocci and Stephane Litkowski, and their AD, Martin Vigoureux, for
their guidance and support.
In addition to the authors listed on the front page, the following Finally, they would like to thank the Directorate reviewers and the
coauthors have also contributed to this document: ADs for their thorough reviews and probing questions, the answers to
which have substantially improved the quality of the document.
Contributors
The following people have contributed substantially to this document
and should be considered coauthors:
Antoni Przygienda Antoni Przygienda
Juniper Networks, Inc. Juniper Networks, Inc.
1194 N. Mathilda Drive 1194 N. Mathilda Ave.
Sunnyvale, CA 95134 Sunnyvale, CA 94089
USA United States of America
Email: prz@juniper.net Email: prz@juniper.net
Vinod Prabhu Vinod Prabhu
Nokia Nokia
Email: vinod.prabhu@nokia.com Email: vinod.prabhu@nokia.com
Wim Henderickx Wim Henderickx
Nokia Nokia
Email: wim.henderickx@nokia.com Email: wim.henderickx@nokia.com
Wen Lin Wen Lin
Juniper Networks, Inc. Juniper Networks, Inc.
Email: wlin@juniper.net
Email: wlin@juniper.net
Patrice Brissette Patrice Brissette
Cisco Systems Cisco Systems
Email: pbrisset@cisco.com Email: pbrisset@cisco.com
Keyur Patel Keyur Patel
Arrcus, Inc Arrcus, Inc.
Email: keyur@arrcus.com Email: keyur@arrcus.com
Autumn Liu Autumn Liu
Ciena Ciena
Email: hliu@ciena.com Email: hliu@ciena.com
Authors' Addresses Authors' Addresses
Jorge Rabadan Jorge Rabadan (editor)
Nokia Nokia
777 E. Middlefield Road 777 E. Middlefield Road
Mountain View, CA 94043 USA Mountain View, CA 94043
United States of America
Email: jorge.rabadan@nokia.com Email: jorge.rabadan@nokia.com
Satya Mohanty Satya Mohanty (editor)
Cisco Systems, Inc. Cisco Systems, Inc.
225 West Tasman Drive 225 West Tasman Drive
San Jose, CA 95134 San Jose, CA 95134
USA United States of America
Email: satyamoh@cisco.com Email: satyamoh@cisco.com
Ali Sajassi Ali Sajassi
Cisco Systems, Inc. Cisco Systems, Inc.
225 West Tasman Drive 225 West Tasman Drive
San Jose, CA 95134 San Jose, CA 95134
USA United States of America
Email: sajassi@cisco.com
Email: sajassi@cisco.com
John Drake John Drake
Juniper Networks, Inc. Juniper Networks, Inc.
1194 N. Mathilda Drive 1194 N. Mathilda Ave.
Sunnyvale, CA 95134 Sunnyvale, CA 94089
USA United States of America
Email: jdrake@juniper.net Email: jdrake@juniper.net
Kiran Nagaraj Kiran Nagaraj
Nokia Nokia
701 E. Middlefield Road 701 E. Middlefield Road
Mountain View, CA 94043 USA Mountain View, CA 94043
United States of America
Email: kiran.nagaraj@nokia.com Email: kiran.nagaraj@nokia.com
Senthil Sathappan Senthil Sathappan
Nokia Nokia
701 E. Middlefield Road 701 E. Middlefield Road
Mountain View, CA 94043 USA Mountain View, CA 94043
United States of America
Email: senthil.sathappan@nokia.com Email: senthil.sathappan@nokia.com
 End of changes. 242 change blocks. 
899 lines changed or deleted 948 lines changed or added

This diff was produced by rfcdiff 1.47.