draft-ietf-bess-evpn-overlay-12.txt   rfc8365.txt 
BESS Workgroup A. Sajassi (Editor) Internet Engineering Task Force (IETF) A. Sajassi, Ed.
INTERNET-DRAFT Cisco Request for Comments: 8365 Cisco
Intended Status: Standards Track J. Drake (Editor) Category: Standards Track J. Drake, Ed.
Juniper ISSN: 2070-1721 Juniper
N. Bitar N. Bitar
Nokia Nokia
R. Shekhar R. Shekhar
Juniper Juniper
J. Uttaro J. Uttaro
AT&T AT&T
W. Henderickx W. Henderickx
Nokia Nokia
March 2018
Expires: August 9, 2018 February 9, 2018 A Network Virtualization Overlay Solution Using Ethernet VPN (EVPN)
A Network Virtualization Overlay Solution using EVPN
draft-ietf-bess-evpn-overlay-12
Abstract Abstract
This document specifies how Ethernet VPN (EVPN) can be used as a This document specifies how Ethernet VPN (EVPN) can be used as a
Network Virtualization Overlay (NVO) solution and explores the Network Virtualization Overlay (NVO) solution and explores the
various tunnel encapsulation options over IP and their impact on the various tunnel encapsulation options over IP and their impact on the
EVPN control-plane and procedures. In particular, the following EVPN control plane and procedures. In particular, the following
encapsulation options are analyzed: Virtual Extensible LAN (VXLAN), encapsulation options are analyzed: Virtual Extensible LAN (VXLAN),
Network Virtualization using Generic Routing Encapsulation (NVGRE), Network Virtualization using Generic Routing Encapsulation (NVGRE),
and MPLS over Generic Routing Encapsulation (GRE). This specification and MPLS over GRE. This specification is also applicable to Generic
is also applicable to Generic Network Virtualization Encapsulation Network Virtualization Encapsulation (GENEVE); however, some
(GENEVE) encapsulation; however, some incremental work is required incremental work is required, which will be covered in a separate
which will be covered in a separate document. This document also document. This document also specifies new multihoming procedures
specifies new multi-homing procedures for split-horizon filtering and for split-horizon filtering and mass withdrawal. It also specifies
mass-withdraw. It also specifies EVPN route constructions for EVPN route constructions for VXLAN/NVGRE encapsulations and
VXLAN/NVGRE encapsulations and Autonomous System Boundary Router Autonomous System Border Router (ASBR) procedures for multihoming of
(ASBR) procedures for multi-homing of Network Virtualization (NV) Network Virtualization Edge (NVE) devices.
Edge devices.
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Status of This Memo
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months This is an Internet Standards Track document.
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at This document is a product of the Internet Engineering Task Force
http://www.ietf.org/1id-abstracts.html (IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by the
Internet Engineering Steering Group (IESG). Further information on
Internet Standards is available in Section 2 of RFC 7841.
The list of Internet-Draft Shadow Directories can be accessed at Information about the current status of this document, any errata,
http://www.ietf.org/shadow.html and how to provide feedback on it may be obtained at
https://www.rfc-editor.org/info/rfc8365.
Copyright and License Notice Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction ....................................................4
2 Requirements Notation and Conventions . . . . . . . . . . . . . 5 2. Requirements Notation and Conventions ...........................5
3 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . 5 3. Terminology .....................................................5
4 EVPN Features . . . . . . . . . . . . . . . . . . . . . . . . . 6 4. EVPN Features ...................................................7
5 Encapsulation Options for EVPN Overlays . . . . . . . . . . . . 8 5. Encapsulation Options for EVPN Overlays .........................8
5.1 VXLAN/NVGRE Encapsulation . . . . . . . . . . . . . . . . . 8 5.1. VXLAN/NVGRE Encapsulation ..................................8
5.1.1 Virtual Identifiers Scope . . . . . . . . . . . . . . . 9 5.1.1. Virtual Identifiers Scope ...........................9
5.1.1.1 Data Center Interconnect with Gateway . . . . . . . 9 5.1.2. Virtual Identifiers to EVI Mapping .................11
5.1.1.2 Data Center Interconnect without Gateway . . . . . . 9 5.1.3. Constructing EVPN BGP Routes .......................13
5.1.2 Virtual Identifiers to EVI Mapping . . . . . . . . . . . 10 5.2. MPLS over GRE .............................................15
5.1.2.1 Auto Derivation of RT . . . . . . . . . . . . . . . 11 6. EVPN with Multiple Data-Plane Encapsulations ...................15
5.1.3 Constructing EVPN BGP Routes . . . . . . . . . . . . . 13 7. Single-Homing NVEs - NVE Residing in Hypervisor ................16
5.2 MPLS over GRE . . . . . . . . . . . . . . . . . . . . . . . 14 7.1. Impact on EVPN BGP Routes & Attributes for VXLAN/NVGRE ....16
6 EVPN with Multiple Data Plane Encapsulations . . . . . . . . . 15 7.2. Impact on EVPN Procedures for VXLAN/NVGRE Encapsulations ..17
7 Single-Homing NVEs - NVE Residing in Hypervisor . . . . . . . . 15 8. Multihoming NVEs - NVE Residing in ToR Switch ..................18
7.1 Impact on EVPN BGP Routes & Attributes for VXLAN/NVGRE 8.1. EVPN Multihoming Features .................................18
Encapsulation . . . . . . . . . . . . . . . . . . . . . . . 16 8.1.1. Multihomed ES Auto-Discovery .......................18
8.1.2. Fast Convergence and Mass Withdrawal ...............18
7.2 Impact on EVPN Procedures for VXLAN/NVGRE Encapsulation . . 16 8.1.3. Split-Horizon ......................................19
8 Multi-Homing NVEs - NVE Residing in ToR Switch . . . . . . . . 17 8.1.4. Aliasing and Backup Path ...........................19
8.1 EVPN Multi-Homing Features . . . . . . . . . . . . . . . . 17 8.1.5. DF Election ........................................20
8.1.1 Multi-homed Ethernet Segment Auto-Discovery . . . . . . 18 8.2. Impact on EVPN BGP Routes and Attributes ..................20
8.1.2 Fast Convergence and Mass Withdraw . . . . . . . . . . . 18 8.3. Impact on EVPN Procedures .................................20
8.1.3 Split-Horizon . . . . . . . . . . . . . . . . . . . . . 18 8.3.1. Split Horizon ......................................21
8.1.4 Aliasing and Backup-Path . . . . . . . . . . . . . . . . 18 8.3.2. Aliasing and Backup Path ...........................22
8.1.5 DF Election . . . . . . . . . . . . . . . . . . . . . . 19 8.3.3. Unknown Unicast Traffic Designation ................22
8.2 Impact on EVPN BGP Routes & Attributes . . . . . . . . . . . 20 9. Support for Multicast ..........................................23
8.3 Impact on EVPN Procedures . . . . . . . . . . . . . . . . . 20 10. Data-Center Interconnections (DCIs) ...........................24
8.3.1 Split Horizon . . . . . . . . . . . . . . . . . . . . . 20 10.1. DCI Using GWs ............................................24
8.3.2 Aliasing and Backup-Path . . . . . . . . . . . . . . . . 21 10.2. DCI Using ASBRs ..........................................24
8.3.3 Unknown Unicast Traffic Designation . . . . . . . . . . 21 10.2.1. ASBR Functionality with Single-Homing NVEs ........25
9 Support for Multicast . . . . . . . . . . . . . . . . . . . . . 22 10.2.2. ASBR Functionality with Multihoming NVEs ..........26
10 Data Center Interconnections - DCI . . . . . . . . . . . . . . 23 11. Security Considerations .......................................28
10.1 DCI using GWs . . . . . . . . . . . . . . . . . . . . . . . 23 12. IANA Considerations ...........................................29
10.2 DCI using ASBRs . . . . . . . . . . . . . . . . . . . . . . 24 13. References ....................................................29
10.2.1 ASBR Functionality with Single-Homing NVEs . . . . . . 25 13.1. Normative References .....................................29
10.2.2 ASBR Functionality with Multi-Homing NVEs . . . . . . . 25 13.2. Informative References ...................................30
11 Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . 27 Acknowledgements ..................................................32
12 Security Considerations . . . . . . . . . . . . . . . . . . . 27 Contributors ......................................................32
13 IANA Considerations . . . . . . . . . . . . . . . . . . . . . 28 Authors' Addresses ................................................33
14 References . . . . . . . . . . . . . . . . . . . . . . . . . . 28
14.1 Normative References . . . . . . . . . . . . . . . . . . . 28
14.2 Informative References . . . . . . . . . . . . . . . . . . 29
Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 30
1 Introduction 1. Introduction
This document specifies how Ethernet VPN (EVPN) [RFC7432] can be used This document specifies how Ethernet VPN (EVPN) [RFC7432] can be used
as a Network Virtualization Overlay (NVO) solution and explores the as a Network Virtualization Overlay (NVO) solution and explores the
various tunnel encapsulation options over IP and their impact on the various tunnel encapsulation options over IP and their impact on the
EVPN control-plane and procedures. In particular, the following EVPN control plane and procedures. In particular, the following
encapsulation options are analyzed: Virtual Extensible LAN (VXLAN) encapsulation options are analyzed: Virtual Extensible LAN (VXLAN)
[RFC7348], Network Virtualization using Generic Routing Encapsulation [RFC7348], Network Virtualization using Generic Routing Encapsulation
(NVGRE) [RFC7637], and MPLS over Generic Routing Encapsulation (GRE) (NVGRE) [RFC7637], and MPLS over Generic Routing Encapsulation (GRE)
[RFC4023]. This specification is also applicable to Generic Network [RFC4023]. This specification is also applicable to Generic Network
Virtualization Encapsulation (GENEVE) encapsulation [GENEVE]; Virtualization Encapsulation (GENEVE) [GENEVE]; however, some
however, some incremental work is required which will be covered in a incremental work is required, which will be covered in a separate
separate document [EVPN-GENEVE]. This document also specifies new document [EVPN-GENEVE]. This document also specifies new multihoming
multi-homing procedures for split-horizon filtering and mass- procedures for split-horizon filtering and mass withdrawal. It also
withdraw. It also specifies EVPN route constructions for VXLAN/NVGRE specifies EVPN route constructions for VXLAN/NVGRE encapsulations and
encapsulations and Autonomous System Boundary Router (ASBR) Autonomous System Border Router (ASBR) procedures for multihoming of
procedures for multi-homing of Network Virtualization (NV) Edge Network Virtualization Edge (NVE) devices.
devices.
In the context of this document, a Network Virtualization Overlay In the context of this document, an NVO is a solution to address the
(NVO) is a solution to address the requirements of a multi-tenant requirements of a multi-tenant data center, especially one with
data center, especially one with virtualized hosts, e.g., Virtual virtualized hosts, e.g., Virtual Machines (VMs) or virtual workloads.
Machines (VMs) or virtual workloads. The key requirements of such a The key requirements of such a solution, as described in [RFC7364],
solution, as described in [RFC7364], are: are the following:
- Isolation of network traffic per tenant - Isolation of network traffic per tenant
- Support for a large number of tenants (tens or hundreds of - Support for a large number of tenants (tens or hundreds of
thousands) thousands)
- Extending L2 connectivity among different VMs belonging to a given - Extension of Layer 2 (L2) connectivity among different VMs
tenant segment (subnet) across different Point of Deliveries (PODs) belonging to a given tenant segment (subnet) across different
within a data center or between different data centers Points of Delivery (PoDs) within a data center or between
different data centers
- Allowing a given VM to move between different physical points of - Allowing a given VM to move between different physical points of
attachment within a given L2 segment attachment within a given L2 segment
The underlay network for NVO solutions is assumed to provide IP The underlay network for NVO solutions is assumed to provide IP
connectivity between NVO endpoints (NVEs). connectivity between NVO endpoints.
This document describes how Ethernet VPN (EVPN) can be used as an NVO This document describes how EVPN can be used as an NVO solution and
solution and explores applicability of EVPN functions and procedures. explores applicability of EVPN functions and procedures. In
In particular, it describes the various tunnel encapsulation options particular, it describes the various tunnel encapsulation options for
for EVPN over IP, and their impact on the EVPN control-plane and EVPN over IP and their impact on the EVPN control plane as well as
procedures for two main scenarios: procedures for two main scenarios:
a) single-homing NVEs - when a NVE resides in the hypervisor, and (a) single-homing NVEs - when an NVE resides in the hypervisor, and
b) multi-homing NVEs - when a NVE resides in a Top of Rack (ToR)
device (b) multihoming NVEs - when an NVE resides in a Top-of-Rack (ToR)
device.
The possible encapsulation options for EVPN overlays that are The possible encapsulation options for EVPN overlays that are
analyzed in this document are: analyzed in this document are:
- VXLAN and NVGRE - VXLAN and NVGRE
- MPLS over GRE - MPLS over GRE
Before getting into the description of the different encapsulation Before getting into the description of the different encapsulation
options for EVPN over IP, it is important to highlight the EVPN options for EVPN over IP, it is important to highlight the EVPN
solution's main features, how those features are currently supported, solution's main features, how those features are currently supported,
and any impact that the encapsulation has on those features. and any impact that the encapsulation has on those features.
2 Requirements Notation and Conventions 2. Requirements Notation and Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP "OPTIONAL" in this document are to be interpreted as described in
14 [RFC2119] [RFC8174] when, and only when, they appear in all BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here. capitals, as shown here.
3 Terminology 3. Terminology
Most of the terminology used in this document comes from [RFC7432] Most of the terminology used in this document comes from [RFC7432]
and [RFC7365]. and [RFC7365].
VXLAN: Virtual Extensible LAN VXLAN: Virtual Extensible LAN
GRE: Generic Routing Encapsulation GRE: Generic Routing Encapsulation
NVGRE: Network Virtualization using Generic Routing Encapsulation NVGRE: Network Virtualization using Generic Routing Encapsulation
GENEVE: Generic Network Virtualization Encapsulation GENEVE: Generic Network Virtualization Encapsulation
POD: Point of Delivery PoD: Point of Delivery
NV: Network Virtualization NV: Network Virtualization
NVO: Network Virtualization Overlay
NVO: Network Virtualization Overlay NVE: Network Virtualization Edge
NVE: Network Virtualization Endpoint VNI: VXLAN Network Identifier
VNI: Virtual Network Identifier (for VXLAN) VSID: Virtual Subnet Identifier (for NVGRE)
VSID: Virtual Subnet Identifier (for NVGRE)
EVPN: Ethernet VPN I-SID: Service Instance Identifier
EVI: An EVPN instance spanning the Provider Edge (PE) devices EVPN: Ethernet VPN
participating in that EVPN
MAC-VRF: A Virtual Routing and Forwarding table for Media Access EVI: EVPN Instance. An EVPN instance spanning the Provider Edge
Control (MAC) addresses on a PE (PE) devices participating in that EVPN
IP-VRF: A Virtual Routing and Forwarding table for Internet Protocol MAC-VRF: A Virtual Routing and Forwarding table for Media Access
(IP) addresses on a PE Control (MAC) addresses on a PE
Ethernet Segment (ES): When a customer site (device or network) is IP-VRF: A Virtual Routing and Forwarding table for Internet Protocol
connected to one or more PEs via a set of Ethernet links, then that (IP) addresses on a PE
set of links is referred to as an 'Ethernet segment'.
Ethernet Segment Identifier (ESI): A unique non-zero identifier that ES: Ethernet Segment. When a customer site (device or network) is
identifies an Ethernet segment is called an 'Ethernet Segment connected to one or more PEs via a set of Ethernet links, then
Identifier'. that set of links is referred to as an 'Ethernet segment'.
Ethernet Tag: An Ethernet tag identifies a particular broadcast Ethernet Segment Identifier (ESI): A unique non-zero identifier that
domain, e.g., a VLAN. An EVPN instance consists of one or more identifies an Ethernet segment is called an 'Ethernet Segment
broadcast domains. Identifier'.
PE: Provider Edge device. Ethernet Tag: An Ethernet tag identifies a particular broadcast
domain, e.g., a VLAN. An EVPN instance consists of one or more
broadcast domains.
Single-Active Redundancy Mode: When only a single PE, among all the PE: Provider Edge
PEs attached to an Ethernet segment, is allowed to forward traffic
to/from that Ethernet segment for a given VLAN, then the Ethernet
segment is defined to be operating in Single-Active redundancy mode.
All-Active Redundancy Mode: When all PEs attached to an Ethernet Single-Active Redundancy Mode: When only a single PE, among all the
segment are allowed to forward known unicast traffic to/from that PEs attached to an ES, is allowed to forward traffic to/from that
Ethernet segment for a given VLAN, then the Ethernet segment is ES for a given VLAN, then the Ethernet segment is defined to be
defined to be operating in All-Active redundancy mode. operating in Single-Active redundancy mode.
PIM-SM: Protocol Independent Multicast - Sparse-Mode All-Active Redundancy Mode: When all PEs attached to an Ethernet
segment are allowed to forward known unicast traffic to/from that
ES for a given VLAN, then the ES is defined to be operating in
All-Active redundancy mode.
PIM-SSM: Protocol Independent Multicast - Source Specific Multicast PIM-SM: Protocol Independent Multicast - Sparse-Mode
PIM-SSM: Protocol Independent Multicast - Source-Specific Multicast
Bidir PIM: Bidirectional PIM BIDIR-PIM: Bidirectional PIM
4 EVPN Features 4. EVPN Features
EVPN [RFC7432] was originally designed to support the requirements EVPN [RFC7432] was originally designed to support the requirements
detailed in [RFC7209] and therefore has the following attributes detailed in [RFC7209] and therefore has the following attributes
which directly address control plane scaling and ease of deployment which directly address control-plane scaling and ease of deployment
issues. issues.
1) Control plane information is distributed with BGP and Broadcast 1. Control-plane information is distributed with BGP and broadcast
and Multicast traffic is sent using a shared multicast tree or with and multicast traffic is sent using a shared multicast tree or
ingress replication. with ingress replication.
2) Control plane learning is used for MAC (and IP) addresses instead 2. Control-plane learning is used for MAC (and IP) addresses
of data plane learning. The latter requires the flooding of unknown instead of data-plane learning. The latter requires the
unicast and Address Resolution Protocol (ARP) frames; whereas, the flooding of unknown unicast and Address Resolution Protocol
former does not require any flooding. (ARP) frames; whereas, the former does not require any flooding.
3) Route Reflector (RR) is used to reduce a full mesh of BGP sessions 3. Route Reflector (RR) is used to reduce a full mesh of BGP
among PE devices to a single BGP session between a PE and the RR. sessions among PE devices to a single BGP session between a PE
Furthermore, RR hierarchy can be leveraged to scale the number of BGP and the RR. Furthermore, RR hierarchy can be leveraged to scale
routes on the RR. the number of BGP routes on the RR.
4) Auto-discovery via BGP is used to discover PE devices 4. Auto-discovery via BGP is used to discover PE devices
participating in a given VPN, PE devices participating in a given participating in a given VPN, PE devices participating in a
redundancy group, tunnel encapsulation types, multicast tunnel type, given redundancy group, tunnel encapsulation types, multicast
multicast members, etc. tunnel types, multicast members, etc.
5) All-Active multihoming is used. This allows a given customer 5. All-Active multihoming is used. This allows a given Customer
device (CE) to have multiple links to multiple PEs, and traffic Edge (CE) device to have multiple links to multiple PEs, and
to/from that CE fully utilizes all of these links. traffic to/from that CE fully utilizes all of these links.
6) When a link between a CE and a PE fails, the PEs for that EVI are 6. When a link between a CE and a PE fails, the PEs for that EVI
notified of the failure via the withdrawal of a single EVPN route. are notified of the failure via the withdrawal of a single EVPN
This allows those PEs to remove the withdrawing PE as a next hop for route. This allows those PEs to remove the withdrawing PE as a
every MAC address associated with the failed link. This is termed next hop for every MAC address associated with the failed link.
'mass withdrawal'. This is termed "mass withdrawal".
7) BGP route filtering and constrained route distribution are 7. BGP route filtering and constrained route distribution are
leveraged to ensure that the control plane traffic for a given EVI is leveraged to ensure that the control-plane traffic for a given
only distributed to the PEs in that EVI. EVI is only distributed to the PEs in that EVI.
8) When a 802.1Q interface is used between a CE and a PE, each of the 8. When an IEEE 802.1Q [IEEE.802.1Q] interface is used between a CE
VLAN ID (VID) on that interface can be mapped onto a bridge table and a PE, each of the VLAN IDs (VIDs) on that interface can be
(for upto 4094 such bridge tables). All these bridge tables may be mapped onto a bridge table (for up to 4094 such bridge tables).
mapped onto a single MAC-VRF (in case of VLAN-aware bundle service). All these bridge tables may be mapped onto a single MAC-VRF (in
case of VLAN-aware bundle service).
9) VM Mobility mechanisms ensure that all PEs in a given EVI know 9. VM Mobility mechanisms ensure that all PEs in a given EVI know
the ES with which a given VM, as identified by its MAC and IP the ES with which a given VM, as identified by its MAC and IP
addresses, is currently associated. addresses, is currently associated.
10) Route Targets are used to allow the operator (or customer) to 10. RTs are used to allow the operator (or customer) to define a
define a spectrum of logical network topologies including mesh, hub & spectrum of logical network topologies including mesh, hub and
spoke, and extranets (e.g., a VPN whose sites are owned by different spoke, and extranets (e.g., a VPN whose sites are owned by
enterprises), without the need for proprietary software or the aid of different enterprises), without the need for proprietary
other virtual or physical devices. software or the aid of other virtual or physical devices.
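
The mass withdrawal behavior in item 6 above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class name `RemoteMacTable`, its methods, and the string identifiers for ESs and PEs are hypothetical and are not taken from [RFC7432] or from this document. It shows the idea that one withdrawal, scoped to an Ethernet segment, removes the withdrawing PE as a next hop for every MAC address learned on that segment, without a per-MAC withdrawal.

```python
from collections import defaultdict


class RemoteMacTable:
    """Toy view of the remote MAC state held by one PE (hypothetical model)."""

    def __init__(self):
        self.es_of_mac = {}                # MAC address -> ES identifier
        self.next_hops = defaultdict(set)  # MAC address -> set of next-hop PEs

    def learn(self, mac, esi, pe):
        """Record that `mac` is reachable via `pe` on Ethernet segment `esi`."""
        self.es_of_mac[mac] = esi
        self.next_hops[mac].add(pe)

    def mass_withdraw(self, esi, pe):
        """Handle a single withdrawal from `pe` for segment `esi`.

        Every MAC associated with that segment loses `pe` as a next hop;
        MACs on other segments are left untouched.
        """
        for mac, mac_esi in self.es_of_mac.items():
            if mac_esi == esi:
                self.next_hops[mac].discard(pe)
```

In this model, a MAC address that was reachable through two PEs on the same segment stays reachable through the remaining PE after the withdrawal, which is what allows traffic to keep flowing while the failed link is repaired.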
Because the design goal for NVO is millions of instances per common Because the design goal for NVO is millions of instances per common
physical infrastructure, the scaling properties of the control plane physical infrastructure, the scaling properties of the control plane
for NVO are extremely important. EVPN and the extensions described for NVO are extremely important. EVPN and the extensions described
herein are designed with this level of scalability in mind. herein are designed with this level of scalability in mind.
5 Encapsulation Options for EVPN Overlays 5. Encapsulation Options for EVPN Overlays
5.1 VXLAN/NVGRE Encapsulation 5.1. VXLAN/NVGRE Encapsulation
Both VXLAN and NVGRE are examples of technologies that provide a data Both VXLAN and NVGRE are examples of technologies that provide a data
plane encapsulation which is used to transport a packet over the plane encapsulation which is used to transport a packet over the
common physical IP infrastructure between Network Virtualization common physical IP infrastructure between Network Virtualization
Edges (NVEs) - e.g., VXLAN Tunnel End Points (VTEPs) in a VXLAN Edges (NVEs) - e.g., VXLAN Tunnel End Points (VTEPs) in a VXLAN
network. Both of these technologies include the identifier of the network. Both of these technologies include the identifier of the
specific NVO instance, Virtual Network Identifier (VNI) in VXLAN and specific NVO instance, VNI in VXLAN and VSID in NVGRE, in each
Virtual Subnet Identifier (VSID) in NVGRE, in each packet. In the packet. In the remainder of this document we use VNI as the
remainder of this document we use VNI as the representation for NVO representation for NVO instance with the understanding that VSID can
instance with the understanding that VSID can equally be used if the equally be used if the encapsulation is NVGRE unless it is stated
encapsulation is NVGRE unless it is stated otherwise. otherwise.
Note that a Provider Edge (PE) is equivalent to a NVE/VTEP. Note that a PE is equivalent to an NVE/VTEP.
VXLAN encapsulation is based on UDP, with an 8-byte header following VXLAN encapsulation is based on UDP, with an 8-byte header following
the UDP header. VXLAN provides a 24-bit VNI, which typically provides the UDP header. VXLAN provides a 24-bit VNI, which typically
a one-to-one mapping to the tenant VLAN ID, as described in provides a one-to-one mapping to the tenant VID, as described in
[RFC7348]. In this scenario, the ingress VTEP does not include an [RFC7348]. In this scenario, the ingress VTEP does not include an
inner VLAN tag on the encapsulated frame, and the egress VTEP inner VLAN tag on the encapsulated frame, and the egress VTEP
discards the frames with an inner VLAN tag. This mode of operation in discards the frames with an inner VLAN tag. This mode of operation
[RFC7348] maps to VLAN Based Service in [RFC7432], where a tenant in [RFC7348] maps to VLAN-Based Service in [RFC7432], where a tenant
VLAN ID gets mapped to an EVPN instance (EVI). VID gets mapped to an EVI.
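
As an illustration of the 8-byte VXLAN header and its 24-bit VNI described above, the following is a minimal sketch. The field layout (an 8-bit flags field with the "I" bit set, 24 reserved bits, the 24-bit VNI, and 8 reserved bits) follows [RFC7348]; the function names are hypothetical, and the outer Ethernet, IP, and UDP headers are deliberately omitted.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header that follows the UDP header."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 0: flags (8 bits) + reserved (24 bits)
    # Word 1: VNI (24 bits) + reserved (8 bits)
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header, checking the "I" flag."""
    word0, word1 = struct.unpack("!II", header[:8])
    if not (word0 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return word1 >> 8
```

A receiver in the VLAN-Based Service mode would use the value returned by `unpack_vni` to select the bridge table, since no inner VLAN tag is carried in that mode.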
VXLAN also provides an option of including an inner VLAN tag in the VXLAN also provides an option of including an inner VLAN tag in the
encapsulated frame, if explicitly configured at the VTEP. This mode encapsulated frame, if explicitly configured at the VTEP. This mode
of operation can map to VLAN Bundle Service in [RFC7432] because all of operation can map to VLAN Bundle Service in [RFC7432] because all
the tenant's tagged frames map to a single bridge table / MAC-VRF, the tenant's tagged frames map to a single bridge table / MAC-VRF,
and the inner VLAN tag is not used for lookup by the disposition PE and the inner VLAN tag is not used for lookup by the disposition PE
when performing VXLAN decapsulation as described in section 6 of when performing VXLAN decapsulation as described in Section 6 of
[RFC7348]. [RFC7348].
[RFC7637] encapsulation is based on GRE encapsulation and it mandates [RFC7637] encapsulation is based on GRE encapsulation, and it
the inclusion of the optional GRE Key field which carries the VSID. mandates the inclusion of the optional GRE Key field, which carries
There is a one-to-one mapping between the VSID and the tenant VLAN the VSID. There is a one-to-one mapping between the VSID and the
ID, as described in [RFC7637] and the inclusion of an inner VLAN tag tenant VID, as described in [RFC7637]. The inclusion of an inner
is prohibited. This mode of operation in [RFC7637] maps to VLAN Based VLAN tag is prohibited. This mode of operation in [RFC7637] maps to
Service in [RFC7432]. VLAN Based Service in [RFC7432].
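
The way NVGRE carries the VSID in the GRE Key field can be sketched as follows. The layout is taken from [RFC7637] rather than from this document: the Key Present bit is set, the protocol type is 0x6558 (Transparent Ethernet Bridging), and the 32-bit Key field is split into a 24-bit VSID and an 8-bit FlowID. The function names are hypothetical, and the outer headers are omitted.

```python
import struct

GRE_KEY_PRESENT = 0x2000     # K bit in the first 16 bits of the GRE header
GRE_PROTO_TEB = 0x6558       # Transparent Ethernet Bridging


def pack_nvgre_header(vsid: int, flow_id: int = 0) -> bytes:
    """Build the 8-byte GRE header used by NVGRE (no checksum, no sequence)."""
    if not 0 <= vsid < 2**24:
        raise ValueError("VSID must fit in 24 bits")
    if not 0 <= flow_id < 2**8:
        raise ValueError("FlowID must fit in 8 bits")
    key = (vsid << 8) | flow_id  # Key field: VSID (24 bits) + FlowID (8 bits)
    return struct.pack("!HHI", GRE_KEY_PRESENT, GRE_PROTO_TEB, key)


def unpack_vsid(header: bytes) -> int:
    """Extract the VSID from an NVGRE header, requiring the Key Present bit."""
    flags, proto, key = struct.unpack("!HHI", header[:8])
    if not flags & GRE_KEY_PRESENT or proto != GRE_PROTO_TEB:
        raise ValueError("not an NVGRE header")
    return key >> 8
```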
As described in the next section there is no change to the encoding As described in the next section, there is no change to the encoding
of EVPN routes to support VXLAN or NVGRE encapsulation except for the of EVPN routes to support VXLAN or NVGRE encapsulation, except for
use of the BGP Encapsulation extended community to indicate the the use of the BGP Encapsulation Extended Community to indicate the
encapsulation type (e.g., VXLAN or NVGRE). However, there is encapsulation type (e.g., VXLAN or NVGRE). However, there is
potential impact to the EVPN procedures depending on where the NVE is potential impact to the EVPN procedures depending on where the NVE is
located (i.e., in hypervisor or TOR) and whether multi-homing located (i.e., in hypervisor or ToR) and whether multihoming
capabilities are required. capabilities are required.
5.1.1 Virtual Identifiers Scope 5.1.1. Virtual Identifiers Scope
Although VNIs are defined as 24-bit globally unique values, there are Although VNIs are defined as 24-bit globally unique values, there are
scenarios in which it is desirable to use a locally significant value scenarios in which it is desirable to use a locally significant value
for VNI, especially in the context of data center interconnect: for the VNI, especially in the context of a data-center interconnect.
5.1.1.1 Data Center Interconnect with Gateway 5.1.1.1. Data-Center Interconnect with Gateway
In the case where NVEs in different data centers need to be In the case where NVEs in different data centers need to be
interconnected, and the NVEs need to use VNIs as a globally unique interconnected, and the NVEs need to use VNIs as globally unique
identifiers within a data center, then a Gateway needs to be employed identifiers within a data center, then a Gateway (GW) needs to be
at the edge of the data center network. This is because the Gateway employed at the edge of the data-center network (DCN). This is
will provide the functionality of translating the VNI when crossing because the Gateway will provide the functionality of translating the
network boundaries, which may align with operator span of control VNI when crossing network boundaries, which may align with operator
boundaries. As an example, consider the network of Figure 1 below. span-of-control boundaries. As an example, consider the network of
Assume there are three network operators: one for each of the DC1, Figure 1. Assume there are three network operators: one for each of
DC2 and WAN networks. The Gateways at the edge of the data centers the DC1, DC2, and WAN networks. The Gateways at the edge of the data
are responsible for translating the VNIs between the values used in centers are responsible for translating the VNIs between the values
each of the data center networks and the values used in the WAN. used in each of the DCNs and the values used in the WAN.
+--------------+ +--------------+
| | | |
+---------+ | WAN | +---------+ +---------+ | WAN | +---------+
+----+ | +---+ +----+ +----+ +---+ | +----+ +----+ | +---+ +----+ +----+ +---+ | +----+
|NVE1|--| | | |WAN | |WAN | | | |--|NVE3| |NVE1|--| | | |WAN | |WAN | | | |--|NVE3|
+----+ |IP |GW |--|Edge| |Edge|--|GW | IP | +----+ +----+ |IP |GW |--|Edge| |Edge|--|GW | IP | +----+
+----+ |Fabric +---+ +----+ +----+ +---+ Fabric | +----+ +----+ |Fabric +---+ +----+ +----+ +---+ Fabric | +----+
|NVE2|--| | | | | |--|NVE4| |NVE2|--| | | | | |--|NVE4|
+----+ +---------+ +--------------+ +---------+ +----+ +----+ +---------+ +--------------+ +---------+ +----+
|<------ DC 1 ------> <------ DC2 ------>| |<------ DC 1 ------> <------ DC2 ------>|
Figure 1: Data Center Interconnect with Gateway Figure 1: Data-Center Interconnect with Gateway
5.1.1.2 Data Center Interconnect without Gateway 5.1.1.2. Data-Center Interconnect without Gateway
In the case where NVEs in different data centers need to be In the case where NVEs in different data centers need to be
interconnected, and the NVEs need to use locally assigned VNIs (e.g., interconnected, and the NVEs need to use locally assigned VNIs (e.g.,
similar to MPLS labels), then there may be no need to employ Gateways similar to MPLS labels), there may be no need to employ Gateways at
at the edge of the data center network. More specifically, the VNI the edge of the DCN. More specifically, the VNI value that is used
value that is used by the transmitting NVE is allocated by the NVE by the transmitting NVE is allocated by the NVE that is receiving the
that is receiving the traffic (in other words, this is similar to traffic (in other words, this is similar to a "downstream-assigned"
"downstream assigned" MPLS label). This allows the VNI space to be MPLS label). This allows the VNI space to be decoupled between
decoupled between different data center networks without the need for different DCNs without the need for a dedicated Gateway at the edge
a dedicated Gateway at the edge of the data centers. This topics is of the data centers. This topic is covered in Section 10.2.
covered in section 10.2.
+--------------+ +--------------+
| | | |
+---------+ | WAN | +---------+ +---------+ | WAN | +---------+
+----+ | | +----+ +----+ | | +----+ +----+ | | +----+ +----+ | | +----+
|NVE1|--| | |ASBR| |ASBR| | |--|NVE3| |NVE1|--| | |ASBR| |ASBR| | |--|NVE3|
+----+ |IP Fabric|---| | | |--|IP Fabric| +----+ +----+ |IP Fabric|---| | | |--|IP Fabric| +----+
+----+ | | +----+ +----+ | | +----+ +----+ | | +----+ +----+ | | +----+
|NVE2|--| | | | | |--|NVE4| |NVE2|--| | | | | |--|NVE4|
+----+ +---------+ +--------------+ +---------+ +----+ +----+ +---------+ +--------------+ +---------+ +----+
|<------ DC 1 -----> <---- DC2 ------>| |<------ DC 1 -----> <---- DC2 ------>|
Figure 2: Data Center Interconnect with ASBR Figure 2: Data-Center Interconnect with ASBR
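
The "downstream-assigned" behavior described above can be sketched as follows. This is only an illustration under stated assumptions: the class LocalVniAllocator, its method names, and the bridge-domain labels are hypothetical and are not defined by this document. The point shown is that each receiving NVE allocates the VNI it expects to see, so the same tenant segment may be reached with different VNI values on different NVEs.

```python
class LocalVniAllocator:
    """Hypothetical per-NVE allocator for locally significant VNIs."""

    VNI_MAX = (1 << 24) - 1  # VNIs are 24-bit values

    def __init__(self, first_vni: int = 1):
        self._next = first_vni
        self._by_domain = {}

    def vni_for(self, bridge_domain: str) -> int:
        """Return the VNI this (receiving) NVE advertises for a bridge domain."""
        if bridge_domain not in self._by_domain:
            if self._next > self.VNI_MAX:
                raise RuntimeError("local VNI space exhausted")
            self._by_domain[bridge_domain] = self._next
            self._next += 1
        return self._by_domain[bridge_domain]


# Each NVE allocates independently, so the value a transmitting NVE must use
# depends on which NVE it is sending to.
nve3 = LocalVniAllocator(first_vni=5000)
nve4 = LocalVniAllocator(first_vni=100)
vni_toward_nve3 = nve3.vni_for("tenant-a/subnet-1")
vni_toward_nve4 = nve4.vni_for("tenant-a/subnet-1")
```

A transmitting NVE would learn these values from the routes advertised by NVE3 and NVE4 rather than from any shared configuration.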
5.1.2 Virtual Identifiers to EVI Mapping 5.1.2. Virtual Identifiers to EVI Mapping
When the EVPN control plane is used in conjunction with VXLAN (or Just like in [RFC7432], where two options existed for mapping
NVGRE encapsulation), just like [RFC7432] where two options existed broadcast domains (represented by VLAN IDs) to an EVI, when the EVPN
for mapping broadcast domains (represented by VLAN IDs) to an EVI, in control plane is used in conjunction with VXLAN (or NVGRE
here there are also two options for mapping broadcast domains encapsulation), there are also two options for mapping broadcast
represented by VXLAN VNIs (or NVGRE VSIDs) to an EVI: domains represented by VXLAN VNIs (or NVGRE VSIDs) to an EVI:
1. Option 1: Single Broadcast Domain per EVI Option 1: A Single Broadcast Domain per EVI
In this option, a single Ethernet broadcast domain (e.g., subnet) In this option, a single Ethernet broadcast domain (e.g., subnet)
represented by a VNI is mapped to a unique EVI. This corresponds to represented by a VNI is mapped to a unique EVI. This corresponds to
the VLAN Based service in [RFC7432], where a tenant-facing interface, the VLAN-Based Service in [RFC7432], where a tenant-facing interface,
logical interface (e.g., represented by a VLAN ID) or physical, gets logical interface (e.g., represented by a VID), or physical interface
mapped to an EVPN instance (EVI). As such, a BGP RD and RT are needed gets mapped to an EVI. As such, a BGP Route Distinguisher (RD) and
per VNI on every NVE. The advantage of this model is that it allows Route Target (RT) are needed per VNI on every NVE. The advantage of
the BGP RT constraint mechanisms to be used in order to limit the this model is that it allows the BGP RT constraint mechanisms to be
propagation and import of routes to only the NVEs that are interested used in order to limit the propagation and import of routes to only
in a given VNI. The disadvantage of this model may be the the NVEs that are interested in a given VNI. The disadvantage of
provisioning overhead if RD and RT are not derived automatically from this model may be the provisioning overhead if the RD and RT are not
VNI. derived automatically from the VNI.
In this option, the MAC-VRF table is identified by the RT in the In this option, the MAC-VRF table is identified by the RT in the
control plane and by the VNI in the data-plane. In this option, the control plane and by the VNI in the data plane. In this option, the
specific MAC-VRF table corresponds to only a single bridge table. specific MAC-VRF table corresponds to only a single bridge table.
2. Option 2: Multiple Broadcast Domains per EVI Option 2: Multiple Broadcast Domains per EVI
In this option, multiple subnets each represented by a unique VNI are In this option, multiple subnets, each represented by a unique VNI,
mapped to a single EVI. For example, if a tenant has multiple are mapped to a single EVI. For example, if a tenant has multiple
segments/subnets each represented by a VNI, then all the VNIs for segments/subnets each represented by a VNI, then all the VNIs for
that tenant are mapped to a single EVI - e.g., the EVI in this case that tenant are mapped to a single EVI; for example, the EVI in this
represents the tenant and not a subnet . This corresponds to the case represents the tenant and not a subnet. This corresponds to the
VLAN-aware bundle service in [RFC7432]. The advantage of this model VLAN-aware bundle service in [RFC7432]. The advantage of this model
is that it doesn't require the provisioning of RD/RT per VNI. is that it doesn't require the provisioning of an RD/RT per VNI.
However, this is a moot point when compared to option 1 where auto- However, this is a moot point when compared to Option 1 where auto-
derivation is used. The disadvantage of this model is that routes derivation is used. The disadvantage of this model is that routes
would be imported by NVEs that may not be interested in a given VNI. would be imported by NVEs that may not be interested in a given VNI.
In this option the MAC-VRF table is identified by the RT in the In this option, the MAC-VRF table is identified by the RT in the
control plane and a specific bridge table for that MAC-VRF is control plane; a specific bridge table for that MAC-VRF is identified
identified by the <RT, Ethernet Tag ID> in the control plane. In this by the <RT, Ethernet Tag ID> in the control plane. In this option,
option, the VNI in the data-plane is sufficient to identify a the VNI in the data plane is sufficient to identify a specific bridge
specific bridge table. table.
5.1.2.1 Auto Derivation of RT 5.1.2.1. Auto-Derivation of RT
When the option of a single VNI per EVI is used, in order to simplify In order to simplify configuration, when the option of a single VNI
configuration, the RT used for EVPN can be auto-derived. RD can be per EVI is used, the RT used for EVPN can be auto-derived. RD can be
auto generated as described in [RFC7432] and RT can be auto-derived auto-generated as described in [RFC7432], and RT can be auto-derived
as described next. as described next.
Since a gateway PE as depicted in figure-1 participates in both the Since a Gateway PE as depicted in Figure 1 participates in both the
DCN and WAN BGP sessions, it is important that when RT values are DCN and WAN BGP sessions, it is important that, when RT values are
auto-derived from VNIs, there is no conflict in RT spaces between DCN auto-derived from VNIs, there be no conflict in RT spaces between
and WAN networks assuming that both are operating within the same AS. DCNs and WANs, assuming that both are operating within the same
Also, there can be scenarios where both VXLAN and NVGRE Autonomous System (AS). Also, there can be scenarios where both
encapsulations may be needed within the same DCN and their VXLAN and NVGRE encapsulations may be needed within the same DCN, and
corresponding VNIs are administered independently which means VNI their corresponding VNIs are administered independently, which means
spaces can overlap. In order to avoid conflict in RT spaces arises, VNI spaces can overlap. In order to avoid conflict in RT spaces, the
the 6-byte RT values with 2-octet AS number for DCNs can be auto- 6-byte RT values with 2-octet AS number for DCNs can be auto-derived
derived as follows: as follows:
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Global Administrator | Local Administrator | | Global Administrator | Local Administrator |
+-----------------------------------------------+---------------+ +-----------------------------------------------+---------------+
| Local Administrator (Cont.) | | Local Administrator (Cont.) |
+-------------------------------+ +-------------------------------+
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Global Administrator |A| TYPE| D-ID | Service ID | | Global Administrator |A| TYPE| D-ID | Service ID |
+-----------------------------------------------+---------------+ +-----------------------------------------------+---------------+
| Service ID (Cont.) | | Service ID (Cont.) |
+-------------------------------+ +-------------------------------+
The 6-octet RT field consists of two sub-field: The 6-octet RT field consists of two sub-fields:
- Global Administrator sub-field: 2 octets. This sub-field contains - Global Administrator sub-field: 2 octets. This sub-field contains
an Autonomous System number assigned by IANA. an AS number assigned by IANA <https://www.iana.org/assignments/
as-numbers/>.
- Local Administrator sub-field: 4 octets - Local Administrator sub-field: 4 octets
* A: A single-bit field indicating if this RT is auto-derived * A: A single-bit field indicating if this RT is auto-derived
0: auto-derived 0: auto-derived
1: manually-derived 1: manually derived
* Type: A 3-bit field that identifies the space in which * Type: A 3-bit field that identifies the space in which the
the other 3 bytes are defined. The following spaces are other 3 bytes are defined. The following spaces are defined:
defined:
0 : VID (802.1Q VLAN ID) 0 : VID (802.1Q VLAN ID)
1 : VXLAN 1 : VXLAN
2 : NVGRE 2 : NVGRE
3 : I-SID 3 : I-SID
4 : EVI 4 : EVI
5 : dual-VID (QinQ VLAN ID) 5 : dual-VID (QinQ VLAN ID)
* D-ID: A 4-bit field that identifies domain-id. The default * D-ID: A 4-bit field that identifies domain-id. The default
value of domain-id is zero indicating that only a single value of domain-id is zero, indicating that only a single
numbering space exists for a given technology. However, if numbering space exists for a given technology. However, if more
there are more than one number space exist for a given than one number space exists for a given technology (e.g.,
technology (e.g., overlapping VXLAN spaces), then each of overlapping VXLAN spaces), then each of the number spaces needs
the number spaces needs to be identify by their to be identified by its corresponding domain-id starting from
corresponding domain-id starting from 1. 1.
* Service ID: This 3-octet field is set to VNI, VSID, I-SID, * Service ID: This 3-octet field is set to VNI, VSID, I-SID, or
or VID. VID.
It should be noted that RT auto-derivation is applicable for 2-octet It should be noted that RT auto-derivation is applicable for 2-octet
AS numbers. For 4-octet AS numbers, RT needs to be manually AS numbers. For 4-octet AS numbers, the RT needs to be manually
configured since 3-octet VNI fields cannot be fit within 2-octet configured because 3-octet VNI fields cannot be fit within the
local administrator field. 2-octet local administrator field.
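
As a minimal sketch of the layout above, the 6-octet RT value can be packed as a 2-octet Global Administrator followed by a 4-octet Local Administrator made up of the A bit, the 3-bit Type, the 4-bit D-ID, and the 3-octet Service ID. The function name auto_derive_rt and the string labels in RT_TYPE are illustrative assumptions; only the field widths and the Type values come from the text. The extended community type and sub-type octets that precede these 6 octets on the wire are not shown.

```python
import struct

# Type values as listed in the text; the string keys are illustrative labels.
RT_TYPE = {"VID": 0, "VXLAN": 1, "NVGRE": 2, "I-SID": 3, "EVI": 4, "dual-VID": 5}


def auto_derive_rt(as_number: int, space: str, service_id: int,
                   domain_id: int = 0) -> bytes:
    """Pack the 6-octet RT value: 2-octet AS + 4-octet Local Administrator."""
    if not 0 <= as_number <= 0xFFFF:
        raise ValueError("auto-derivation applies to 2-octet AS numbers only")
    if not 0 <= service_id <= 0xFFFFFF:
        raise ValueError("Service ID must fit in 3 octets")
    if not 0 <= domain_id <= 0xF:
        raise ValueError("D-ID must fit in 4 bits")
    a_bit = 0  # 0 means auto-derived
    local_admin = (a_bit << 31) | (RT_TYPE[space] << 28) \
        | (domain_id << 24) | service_id
    return struct.pack("!HI", as_number, local_admin)


# Example: AS 65000, VXLAN space, VNI 10000, default domain-id.
rt = auto_derive_rt(65000, "VXLAN", 10000)
rt_hex = rt.hex()
```

With these inputs, the Global Administrator is 0xFDE8 and the Local Administrator is 0x10002710. The range check on as_number reflects the restriction above that a 4-octet AS number leaves no room for the 3-octet Service ID.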
5.1.3 Constructing EVPN BGP Routes 5.1.3. Constructing EVPN BGP Routes
In EVPN, an MPLS label for instance identifying forwarding table is In EVPN, an MPLS label, for instance, identifying the forwarding
distributed by the egress PE via the EVPN control plane and is placed table is distributed by the egress PE via the EVPN control plane and
in the MPLS header of a given packet by the ingress PE. This label is is placed in the MPLS header of a given packet by the ingress PE.
used upon receipt of that packet by the egress PE for disposition of This label is used upon receipt of that packet by the egress PE for
that packet. This is very similar to the use of the VNI by the egress disposition of that packet. This is very similar to the use of the
NVE, with the difference being that an MPLS label has local VNI by the egress NVE, with the difference being that an MPLS label
significance while a VNI typically has global significance. has local significance while a VNI typically has global significance.
Accordingly, and specifically to support the option of locally- Accordingly, and specifically to support the option of locally
assigned VNIs, the MPLS Label1 field in the MAC/IP Advertisement assigned VNIs, the MPLS Label1 field in the MAC/IP Advertisement
route, the MPLS label field in the Ethernet AD per EVI route, and the route, the MPLS label field in the Ethernet A-D per EVI route, and
MPLS label field in the PMSI Tunnel Attribute of the Inclusive the MPLS label field in the P-Multicast Service Interface (PMSI)
Multicast Ethernet Tag (IMET) route are used to carry the VNI. For Tunnel attribute of the Inclusive Multicast Ethernet Tag (IMET) route
the balance of this memo, the above MPLS label fields will be are used to carry the VNI. For the balance of this memo, the above
referred to as the VNI field. The VNI field is used for both local MPLS label fields will be referred to as the VNI field. The VNI
and global VNIs, and for either case the entire 24-bit field is used field is used for both local and global VNIs; for either case, the
to encode the VNI value. entire 24-bit field is used to encode the VNI value.
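
The statement that the entire 24-bit field carries the VNI can be illustrated with a short sketch. The helper names encode_vni_field and decode_vni_field are hypothetical; the only fact taken from the text is that the VNI occupies all 24 bits of the 3-octet field.

```python
def encode_vni_field(vni: int) -> bytes:
    """Encode a VNI into the 3-octet field, using all 24 bits."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must be a 24-bit value")
    return vni.to_bytes(3, "big")


def decode_vni_field(field: bytes) -> int:
    """Recover the VNI from the 3-octet field."""
    if len(field) != 3:
        raise ValueError("VNI field is exactly 3 octets")
    return int.from_bytes(field, "big")


encoded = encode_vni_field(10000)
decoded = decode_vni_field(encoded)
```

The same encoding applies whether the VNI is locally or globally significant; only the allocation of the value differs.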
For the VLAN-based service (a single VNI per MAC-VRF), the Ethernet For the VLAN-Based Service (a single VNI per MAC-VRF), the Ethernet
Tag field in the MAC/IP Advertisement, Ethernet AD per EVI, and IMET Tag field in the MAC/IP Advertisement, Ethernet A-D per EVI, and IMET
route MUST be set to zero just as in the VLAN Based service in route MUST be set to zero just as in the VLAN-Based Service in
[RFC7432]. [RFC7432].
For the VLAN-aware bundle service (multiple VNIs per MAC-VRF with For the VLAN-Aware Bundle Service (multiple VNIs per MAC-VRF with
each VNI associated with its own bridge table), the Ethernet Tag each VNI associated with its own bridge table), the Ethernet Tag
field in the MAC Advertisement, Ethernet AD per EVI, and IMET route field in the MAC Advertisement, Ethernet A-D per EVI, and IMET route
MUST identify a bridge table within a MAC-VRF and the set of Ethernet MUST identify a bridge table within a MAC-VRF; the set of Ethernet
Tags for that EVI needs to be configured consistently on all PEs Tags for that EVI needs to be configured consistently on all PEs
within that EVI. For locally-assigned VNIs, the value advertised in within that EVI. For locally assigned VNIs, the value advertised in
the Ethernet Tag field MUST be set to a VID just as in the VLAN-aware the Ethernet Tag field MUST be set to a VID just as in the VLAN-aware
bundle service in [RFC7432]. Such setting must be done consistently bundle service in [RFC7432]. Such setting must be done consistently
on all PE devices participating in that EVI within a given domain. on all PE devices participating in that EVI within a given domain.
For global VNIs, the value advertised in the Ethernet Tag field For global VNIs, the value advertised in the Ethernet Tag field
SHOULD be set to a VNI as long as it matches the existing semantics SHOULD be set to a VNI as long as it matches the existing semantics
of the Ethernet Tag, i.e., it identifies a bridge table within a MAC- of the Ethernet Tag, i.e., it identifies a bridge table within a
VRF and the set of VNIs are configured consistently on each PE in MAC-VRF and the set of VNIs are configured consistently on each PE in
that EVI. that EVI.
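
The Ethernet Tag rules above can be summarized in a small sketch. The function ethernet_tag and its string arguments are illustrative assumptions. For global VNIs the text says SHOULD, so the sketch shows only the recommended choice and assumes the VNI matches the Ethernet Tag semantics described above.

```python
from typing import Optional


def ethernet_tag(service: str, vni_scope: Optional[str] = None,
                 vid: Optional[int] = None, vni: Optional[int] = None) -> int:
    """Return the Ethernet Tag value to advertise for a bridge table."""
    if service == "vlan-based":
        return 0  # MUST be zero for a single VNI per MAC-VRF
    if service == "vlan-aware-bundle":
        if vni_scope == "local":
            if vid is None:
                raise ValueError("locally assigned VNIs require a VID")
            return vid  # MUST be a VID
        if vni_scope == "global":
            if vni is None:
                raise ValueError("global scope requires a VNI")
            return vni  # SHOULD be the VNI
        raise ValueError("vni_scope must be 'local' or 'global'")
    raise ValueError("unknown service type")


tag_vlan_based = ethernet_tag("vlan-based")
tag_local = ethernet_tag("vlan-aware-bundle", vni_scope="local", vid=100)
tag_global = ethernet_tag("vlan-aware-bundle", vni_scope="global", vni=10000)
```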
In order to indicate which type of data plane encapsulation (i.e., In order to indicate which type of data-plane encapsulation (i.e.,
VXLAN, NVGRE, MPLS, or MPLS in GRE) is to be used, the BGP VXLAN, NVGRE, MPLS, or MPLS in GRE) is to be used, the BGP
Encapsulation extended community defined in [RFC5512] is included Encapsulation Extended Community defined in [RFC5512] is included
with all EVPN routes (i.e. MAC Advertisement, Ethernet AD per EVI, with all EVPN routes (i.e., MAC Advertisement, Ethernet A-D per EVI,
Ethernet AD per ESI, Inclusive Multicast Ethernet Tag, and Ethernet Ethernet A-D per ESI, IMET, and Ethernet Segment) advertised by an
Segment) advertised by an egress PE. Five new values have been egress PE. Five new values have been assigned by IANA to extend the
assigned by IANA to extend the list of encapsulation types defined in list of encapsulation types defined in [RFC5512]; they are listed in
[RFC5512] and they are listed in section 13. Section 11.
The MPLS encapsulation tunnel type, listed in section 13, is needed The MPLS encapsulation tunnel type, listed in Section 11, is needed
in order to distinguish between an advertising node that only in order to distinguish between an advertising node that only
supports non-MPLS encapsulations and one that supports MPLS and non- supports non-MPLS encapsulations and one that supports MPLS and
MPLS encapsulations. An advertising node that only supports MPLS non-MPLS encapsulations. An advertising node that only supports MPLS
encapsulation does not need to advertise any encapsulation tunnel encapsulation does not need to advertise any encapsulation tunnel
types; i.e., if the BGP Encapsulation extended community is not types; i.e., if the BGP Encapsulation Extended Community is not
present, then either MPLS encapsulation or a statically configured present, then either MPLS encapsulation or a statically configured
encapsulation is assumed. encapsulation is assumed.
The Next Hop field of the MP_REACH_NLRI attribute of the route MUST The Next Hop field of the MP_REACH_NLRI attribute of the route MUST
be set to the IPv4 or IPv6 address of the NVE. The remaining fields be set to the IPv4 or IPv6 address of the NVE. The remaining fields
in each route are set as per [RFC7432]. in each route are set as per [RFC7432].
Note that the procedure defined here to use the MPLS Label field to Note that the procedure defined here -- to use the MPLS Label field
carry the VNI in the presence of a Tunnel Encapsulation Extended to carry the VNI in the presence of a Tunnel Encapsulation Extended
Community specifying the use of a VNI, is aligned with the procedures Community specifying the use of a VNI -- is aligned with the
described in section 8.2.2.2 of [TUNNEL-ENCAP] ("When a Valid VNI has procedures described in Section 8.2.2.2 of [TUNNEL-ENCAP] ("When a
not been Signaled"). Valid VNI has not been Signaled").
5.2 MPLS over GRE 5.2. MPLS over GRE
The EVPN data-plane is modeled as an EVPN MPLS client layer sitting The EVPN data plane is modeled as an EVPN MPLS client layer sitting
over an MPLS PSN-tunnel server layer. Some of the EVPN functions over an MPLS PSN tunnel server layer. Some of the EVPN functions
(split-horizon, aliasing, and backup-path) are tied to the MPLS (split-horizon, Aliasing, and Backup Path) are tied to the MPLS
client layer. If MPLS over GRE encapsulation is used, then the EVPN client layer. If MPLS over GRE encapsulation is used, then the EVPN
MPLS client layer can be carried over an IP PSN tunnel transparently. MPLS client layer can be carried over an IP PSN tunnel transparently.
Therefore, there is no impact to the EVPN procedures and associated Therefore, there is no impact to the EVPN procedures and associated
data-plane operation. data-plane operation.
The existing standards for MPLS over GRE encapsulation as defined by [RFC4023] defines the standard for using MPLS over GRE encapsulation,
[RFC4023] can be used for this purpose; however, when it is used in which can be used for this purpose. However, when MPLS over GRE is
conjunction with EVPN, it is recommended that the GRE key field be used in conjunction with EVPN, it is recommended that the GRE key
present and be used to provide a 32-bit entropy value only if the P field be present and be used to provide a 32-bit entropy value only
nodes can perform Equal-Cost Multipath (ECMP) hashing based on the if the P nodes can perform Equal-Cost Multipath (ECMP) hashing based
GRE key; otherwise, the GRE header SHOULD NOT include the GRE key. on the GRE key; otherwise, the GRE header SHOULD NOT include the GRE
The Checksum and Sequence Number fields MUST NOT be included and the key field. The Checksum and Sequence Number fields MUST NOT be
corresponding C and S bits in the GRE Packet Header MUST be set to included, and the corresponding C and S bits in the GRE header MUST
zero. A PE capable of supporting this encapsulation, SHOULD advertise be set to zero. A PE capable of supporting this encapsulation SHOULD
its EVPN routes along with the Tunnel Encapsulation extended advertise its EVPN routes along with the Tunnel Encapsulation
community indicating MPLS over GRE encapsulation as described in Extended Community indicating MPLS over GRE encapsulation as
previous section. described in the previous section.
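
A minimal sketch of the GRE header described above follows. The C and S bits are zero and the key field is present only when an entropy value is supplied. The Key Present flag position (0x2000) and the MPLS unicast protocol type (0x8847) are taken from the GRE and MPLS-in-GRE specifications rather than from this text, and the function name gre_header is a hypothetical helper.

```python
import struct
from typing import Optional

GRE_PROTO_MPLS_UNICAST = 0x8847  # EtherType for MPLS unicast
GRE_FLAG_KEY = 0x2000            # K bit in the first 16 bits of the header


def gre_header(entropy_key: Optional[int] = None) -> bytes:
    """Build a GRE header for MPLS over GRE with C=0 and S=0."""
    flags = 0  # Checksum (C) and Sequence Number (S) bits stay zero
    if entropy_key is None:
        # P nodes cannot hash on the key: omit the key field entirely.
        return struct.pack("!HH", flags, GRE_PROTO_MPLS_UNICAST)
    if not 0 <= entropy_key <= 0xFFFFFFFF:
        raise ValueError("entropy value must fit in 32 bits")
    return struct.pack("!HHI", flags | GRE_FLAG_KEY,
                       GRE_PROTO_MPLS_UNICAST, entropy_key)


without_key = gre_header()
with_key = gre_header(0xDEADBEEF)
```

How the 32-bit entropy value is computed (for example, from a hash of inner-frame fields) is left open here, as the text does not specify it.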
6 EVPN with Multiple Data Plane Encapsulations 6. EVPN with Multiple Data-Plane Encapsulations
The use of the BGP Encapsulation extended community per [RFC5512] The use of the BGP Encapsulation Extended Community per [RFC5512]
allows each NVE in a given EVI to know each of the encapsulations allows each NVE in a given EVI to know each of the encapsulations
supported by each of the other NVEs in that EVI. i.e., each of the supported by each of the other NVEs in that EVI. That is, each of
NVEs in a given EVI may support multiple data plane encapsulations. the NVEs in a given EVI may support multiple data-plane
An ingress NVE can send a frame to an egress NVE only if the set of encapsulations. An ingress NVE can send a frame to an egress NVE
encapsulations advertised by the egress NVE forms a non-empty only if the set of encapsulations advertised by the egress NVE forms
intersection with the set of encapsulations supported by the ingress a non-empty intersection with the set of encapsulations supported by
NVE, and it is at the discretion of the ingress NVE which the ingress NVE; it is at the discretion of the ingress NVE which
encapsulation to choose from this intersection. (As noted in encapsulation to choose from this intersection. (As noted in
section 5.1.3, if the BGP Encapsulation extended community is not Section 5.1.3, if the BGP Encapsulation Extended Community is not
present, then the default MPLS encapsulation or a locally configured present, then the default MPLS encapsulation or a locally configured
encapsulation is assumed.) encapsulation is assumed.)
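
The selection rule above can be sketched as a set intersection. The function names and encapsulation labels are illustrative assumptions and are not IANA code points. The sketch treats an absent BGP Encapsulation Extended Community as MPLS and does not model a locally configured default. The preference order used by the ingress NVE is a local policy choice.

```python
from typing import Iterable, List, Optional, Set


def usable_encapsulations(ingress_supported: Iterable[str],
                          egress_advertised: Iterable[str]) -> Set[str]:
    """Encapsulations common to the ingress NVE and the egress NVE."""
    advertised = set(egress_advertised)
    if not advertised:
        # No Encapsulation Extended Community: assume MPLS here.
        advertised = {"MPLS"}
    return set(ingress_supported) & advertised


def choose_encapsulation(ingress_preference: List[str],
                         egress_advertised: Iterable[str]) -> Optional[str]:
    """Pick the first ingress-preferred encapsulation the egress accepts."""
    common = usable_encapsulations(ingress_preference, egress_advertised)
    for encap in ingress_preference:
        if encap in common:
            return encap
    return None  # empty intersection: the frame cannot be sent


picked = choose_encapsulation(["VXLAN", "NVGRE"], ["NVGRE", "VXLAN"])
no_common = choose_encapsulation(["VXLAN"], ["NVGRE"])
default_case = choose_encapsulation(["MPLS"], [])
```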
When a PE advertises multiple supported encapsulations, it MUST When a PE advertises multiple supported encapsulations, it MUST
advertise encapsulations that use the same EVPN procedures including advertise encapsulations that use the same EVPN procedures including
procedures associated with split-horizon filtering described in procedures associated with split-horizon filtering described in
section 8.3.1. For example, VXLAN and NVGRE (or MPLS and MPLS over Section 8.3.1. For example, VXLAN and NVGRE (or MPLS and MPLS over
GRE) encapsulations use the same EVPN procedures and thus a PE can GRE) encapsulations use the same EVPN procedures; thus, a PE can
advertise both of them and can support either of them or both of them advertise both of them and can support either of them or both of them
simultaneously. However, a PE MUST NOT advertise VXLAN and MPLS simultaneously. However, a PE MUST NOT advertise VXLAN and MPLS
encapsulations together because (a) the MPLS field of EVPN routes is encapsulations together because (a) the MPLS field of EVPN routes is
set to either an MPLS label or a VNI but not both and (b) some EVPN set to either an MPLS label or a VNI, but not both and (b) some EVPN
procedures (such as split-horizon filtering) are different for procedures (such as split-horizon filtering) are different for VXLAN/
VXLAN/NVGRE and MPLS encapsulations. NVGRE and MPLS encapsulations.
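
The restriction above can be checked with a simple grouping. The sketch below assumes two procedure groups, one for VXLAN and NVGRE and one for MPLS and MPLS over GRE, as suggested by the examples in the text. The labels and the function name are hypothetical.

```python
from typing import Iterable

# Encapsulations within a group are assumed to share EVPN procedures.
PROCEDURE_GROUPS = (
    frozenset({"VXLAN", "NVGRE"}),
    frozenset({"MPLS", "MPLS-over-GRE"}),
)


def may_advertise_together(encapsulations: Iterable[str]) -> bool:
    """True if all encapsulations fall within a single procedure group."""
    encaps = set(encapsulations)
    if not encaps:
        return True  # nothing advertised, nothing to conflict
    return any(encaps <= group for group in PROCEDURE_GROUPS)


ok_overlay = may_advertise_together(["VXLAN", "NVGRE"])
ok_mpls = may_advertise_together(["MPLS", "MPLS-over-GRE"])
mixed = may_advertise_together(["VXLAN", "MPLS"])
```

A PE could run such a check before originating routes, rejecting any configuration for which the result is False.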
An ingress node that uses shared multicast trees for sending An ingress node that uses shared multicast trees for sending
broadcast or multicast frames MAY maintain distinct trees for each broadcast or multicast frames MAY maintain distinct trees for each
different encapsulation type. different encapsulation type.
It is the responsibility of the operator of a given EVI to ensure It is the responsibility of the operator of a given EVI to ensure
that all of the NVEs in that EVI support at least one common that all of the NVEs in that EVI support at least one common
encapsulation. If this condition is violated, it could result in encapsulation. If this condition is violated, it could result in
service disruption or failure. The use of the BGP Encapsulation service disruption or failure. The use of the BGP Encapsulation
extended community provides a method to detect when this condition is Extended Community provides a method to detect when this condition is
violated but the actions to be taken are at the discretion of the violated, but the actions to be taken are at the discretion of the
operator and are outside the scope of this document. operator and are outside the scope of this document.
7 Single-Homing NVEs - NVE Residing in Hypervisor 7. Single-Homing NVEs - NVE Residing in Hypervisor
When a NVE and its hosts/VMs are co-located in the same physical When an NVE and its hosts/VMs are co-located in the same physical
device, e.g., when they reside in a server, the links between them device, e.g., when they reside in a server, the links between them
are virtual and they typically share fate; i.e., the subject are virtual and they typically share fate. That is, the subject
hosts/VMs are typically not multi-homed or if they are multi-homed, hosts/VMs are typically not multihomed or, if they are multihomed,
the multi-homing is a purely local matter to the server hosting the the multihoming is a purely local matter to the server hosting the VM
VM and the NVEs, and need not be "visible" to any other NVEs residing and the NVEs, and it need not be "visible" to any other NVEs residing
on other servers, and thus does not require any specific protocol on other servers. Thus, it does not require any specific protocol
mechanisms. The most common case of this is when the NVE resides on mechanisms. The most common case of this is when the NVE resides on
the hypervisor. the hypervisor.
In the sub-sections that follow, we will discuss the impact on EVPN In the subsections that follow, we will discuss the impact on EVPN
procedures for the case when the NVE resides on the hypervisor and procedures for the case when the NVE resides on the hypervisor and
the VXLAN (or NVGRE) encapsulation is used. the VXLAN (or NVGRE) encapsulation is used.
7.1 Impact on EVPN BGP Routes & Attributes for VXLAN/NVGRE Encapsulation 7.1. Impact on EVPN BGP Routes & Attributes for VXLAN/NVGRE
Encapsulations
In scenarios where different groups of data centers are under In scenarios where different groups of data centers are under
different administrative domains, and these data centers are different administrative domains, and these data centers are
connected via one or more backbone core providers as described in connected via one or more backbone core providers as described in
[RFC7365], the RD must be a unique value per EVI or per NVE as [RFC7365], the RD must be a unique value per EVI or per NVE as
described in [RFC7432]. In other words, whenever there is more than described in [RFC7432]. In other words, whenever there is more than
one administrative domain for global VNI, then a unique RD must be one administrative domain for global VNI, a unique RD must be used;
used, or whenever the VNI value has local significance, then a unique or, whenever the VNI value has local significance, a unique RD must
RD must be used. Therefore, it is recommended to use a unique RD as be used. Therefore, it is recommended to use a unique RD as
described in [RFC7432] at all time. described in [RFC7432] at all times.
When the NVEs reside on the hypervisor, the EVPN BGP routes and When the NVEs reside on the hypervisor, the EVPN BGP routes and
attributes associated with multi-homing are no longer required. This attributes associated with multihoming are no longer required. This
reduces the required routes and attributes to the following subset of reduces the required routes and attributes to the following subset of
four out of the total of eight listed in section 7 of [RFC7432]: four out of the total of eight listed in Section 7 of [RFC7432]:
- MAC/IP Advertisement Route - MAC/IP Advertisement Route
- Inclusive Multicast Ethernet Tag Route
- MAC Mobility Extended Community
- Default Gateway Extended Community
However, as noted in section 8.6 of [RFC7432] in order to enable a - Inclusive Multicast Ethernet Tag Route
- MAC Mobility Extended Community
- Default Gateway Extended Community
However, as noted in Section 8.6 of [RFC7432], in order to enable a
single-homing ingress NVE to take advantage of fast convergence, single-homing ingress NVE to take advantage of fast convergence,
aliasing, and backup-path when interacting with multi-homed egress Aliasing, and Backup Path when interacting with multihomed egress
NVEs attached to a given Ethernet segment, the single-homing ingress NVEs attached to a given ES, the single-homing ingress NVE should be
NVE should be able to receive and process Ethernet AD per ES and able to receive and process routes that are Ethernet A-D per ES and
Ethernet AD per EVI routes. Ethernet A-D per EVI.
7.2 Impact on EVPN Procedures for VXLAN/NVGRE Encapsulation 7.2. Impact on EVPN Procedures for VXLAN/NVGRE Encapsulations
When the NVEs reside on the hypervisors, the EVPN procedures When the NVEs reside on the hypervisors, the EVPN procedures
associated with multi-homing are no longer required. This limits the associated with multihoming are no longer required. This limits the
procedures on the NVE to the following subset of the EVPN procedures: procedures on the NVE to the following subset.
1. Local learning of MAC addresses received from the VMs per section 1. Local learning of MAC addresses received from the VMs per
10.1 of [RFC7432]. Section 10.1 of [RFC7432].
2. Advertising locally learned MAC addresses in BGP using the MAC/IP 2. Advertising locally learned MAC addresses in BGP using the MAC/IP
Advertisement routes. Advertisement routes.
3. Performing remote learning using BGP per Section 10.2 of 3. Performing remote learning using BGP per Section 9.2 of
[RFC7432]. [RFC7432].
4. Discovering other NVEs and constructing the multicast tunnels 4. Discovering other NVEs and constructing the multicast tunnels
using the Inclusive Multicast Ethernet Tag routes. using the IMET routes.
5. Handling MAC address mobility events per the procedures of Section 5. Handling MAC address mobility events per the procedures of
16 in [RFC7432]. Section 15 in [RFC7432].
However, as noted in section 8.6 of [RFC7432] in order to enable a However, as noted in Section 8.6 of [RFC7432], in order to enable a
single-homing ingress NVE to take advantage of fast convergence, single-homing ingress NVE to take advantage of fast convergence,
aliasing, and back-up path when interacting with multi-homed egress Aliasing, and Backup Path when interacting with multihomed egress
NVEs attached to a given Ethernet segment, a single-homing ingress NVEs attached to a given ES, a single-homing ingress NVE should
NVE should implement the ingress node processing of Ethernet AD per implement the ingress node processing of routes that are Ethernet A-D
ES and Ethernet AD per EVI routes as defined in sections 8.2 Fast per ES and Ethernet A-D per EVI as defined in Sections 8.2 ("Fast
Convergence and 8.4 Aliasing and Backup-Path of [RFC7432]. Convergence") and 8.4 ("Aliasing and Backup Path") of [RFC7432].
8 Multi-Homing NVEs - NVE Residing in ToR Switch 8. Multihoming NVEs - NVE Residing in ToR Switch
In this section, we discuss the scenario where the NVEs reside in the In this section, we discuss the scenario where the NVEs reside in the
Top of Rack (ToR) switches AND the servers (where VMs are residing) ToR switches AND the servers (where VMs are residing) are multihomed
are multi-homed to these ToR switches. The multi-homing NVE operate to these ToR switches. The multihoming NVE operates in All-Active or
in All-Active or Single-Active redundancy mode. If the servers are Single-Active redundancy mode. If the servers are single-homed to
single-homed to the ToR switches, then the scenario becomes similar the ToR switches, then the scenario becomes similar to that where the
to that where the NVE resides on the hypervisor, as discussed in NVE resides on the hypervisor, as discussed in Section 7, as far as
Section 7, as far as the required EVPN functionality are concerned. the required EVPN functionality is concerned.
[RFC7432] defines a set of BGP routes, attributes and procedures to [RFC7432] defines a set of BGP routes, attributes, and procedures to
support multi-homing. We first describe these functions and support multihoming. We first describe these functions and
procedures, then discuss which of these are impacted by the VXLAN procedures, then discuss which of these are impacted by the VXLAN (or
(or NVGRE) encapsulation and what modifications are required. As it NVGRE) encapsulation and what modifications are required. As will be
will be seen later in this section, the only EVPN procedure that is seen later in this section, the only EVPN procedure that is impacted
impacted by non-MPLS overlay encapsulation (e.g., VXLAN or NVGRE) by non-MPLS overlay encapsulation (e.g., VXLAN or NVGRE) where it
where it provides space for one ID rather than stack of labels, is provides space for one ID rather than a stack of labels, is that of
that of split-horizon filtering for multi-homed Ethernet Segments split-horizon filtering for multihomed ESs described in
described in section 8.3.1. Section 8.3.1.
8.1 EVPN Multi-Homing Features 8.1. EVPN Multihoming Features
In this section, we will recap the multi-homing features of EVPN to In this section, we will recap the multihoming features of EVPN to
highlight the encapsulation dependencies. The section only describes highlight the encapsulation dependencies. The section only describes
the features and functions at a high-level. For more details, the the features and functions at a high level. For more details, the
reader is to refer to [RFC7432]. reader is to refer to [RFC7432].
8.1.1 Multi-homed Ethernet Segment Auto-Discovery 8.1.1. Multihomed ES Auto-Discovery
EVPN NVEs (or PEs) connected to the same Ethernet Segment (e.g. the EVPN NVEs (or PEs) connected to the same ES (e.g., the same server
same server via LAG) can automatically discover each other with via Link Aggregation Group (LAG)) can automatically discover each
minimal to no configuration through the exchange of BGP routes. other with minimal to no configuration through the exchange of BGP
routes.
8.1.2 Fast Convergence and Mass Withdraw 8.1.2. Fast Convergence and Mass Withdrawal
EVPN defines a mechanism to efficiently and quickly signal, to remote EVPN defines a mechanism to efficiently and quickly signal, to remote
NVEs, the need to update their forwarding tables upon the occurrence NVEs, the need to update their forwarding tables upon the occurrence
of a failure in connectivity to an Ethernet segment (e.g., a link or of a failure in connectivity to an ES (e.g., a link or a port
a port failure). This is done by having each NVE advertise an failure). This is done by having each NVE advertise an Ethernet A-D
Ethernet A-D Route per Ethernet segment for each locally attached route per ES for each locally attached segment. Upon a failure in
segment. Upon a failure in connectivity to the attached segment, the connectivity to the attached segment, the NVE withdraws the
NVE withdraws the corresponding Ethernet A-D route. This triggers all corresponding Ethernet A-D route. This triggers all NVEs that
NVEs that receive the withdrawal to update their next-hop adjacencies receive the withdrawal to update their next-hop adjacencies for all
for all MAC addresses associated with the Ethernet segment in MAC addresses associated with the ES in question. If no other NVE
question. If no other NVE had advertised an Ethernet A-D route for had advertised an Ethernet A-D route for the same segment, then the
the same segment, then the NVE that received the withdrawal simply NVE that received the withdrawal simply invalidates the MAC entries
invalidates the MAC entries for that segment. Otherwise, the NVE for that segment. Otherwise, the NVE updates the next-hop adjacency
updates the next-hop adjacency list accordingly. list accordingly.
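The mass-withdrawal behavior described above can be sketched as follows. This is a minimal illustration, not an implementation of [RFC7432]: the class RemoteNve, its method names, and the string-valued ESIs and NVE addresses are all hypothetical, and the sketch assumes for simplicity that every MAC on a segment is reachable via every NVE that currently advertises an Ethernet A-D route per ES for that segment.

```python
from collections import defaultdict


class RemoteNve:
    """Toy forwarding state a remote NVE keeps per Ethernet segment (hypothetical model)."""

    def __init__(self):
        # ESI -> set of NVE addresses with a live Ethernet A-D per ES route
        self.es_next_hops = defaultdict(set)
        # MAC address -> ESI on which it was advertised
        self.mac_to_esi = {}

    def on_ad_per_es_advertise(self, esi, nve_ip):
        self.es_next_hops[esi].add(nve_ip)

    def on_mac_advertise(self, mac, esi):
        self.mac_to_esi[mac] = esi

    def on_ad_per_es_withdraw(self, esi, nve_ip):
        # One withdrawal updates the adjacency for every MAC on the segment.
        self.es_next_hops[esi].discard(nve_ip)
        if not self.es_next_hops[esi]:
            # No other NVE advertises this segment: invalidate its MAC entries.
            self.mac_to_esi = {
                m: e for m, e in self.mac_to_esi.items() if e != esi
            }
            del self.es_next_hops[esi]

    def next_hops(self, mac):
        esi = self.mac_to_esi.get(mac)
        if esi is None:
            return set()
        return set(self.es_next_hops.get(esi, set()))
```

The point of the design is that the per-MAC table stores only the ESI, so a single route withdrawal changes the next hops of all MAC addresses on the segment without touching each MAC entry individually.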
8.1.3 Split-Horizon 8.1.3. Split-Horizon
If a server is multi-homed to two or more NVEs (represented by an If a server is multihomed to two or more NVEs (represented by an ES
Ethernet segment ES1) and operating in an all-active redundancy mode, ES1) and operating in an All-Active redundancy mode, sends a BUM
sends a BUM packet (ie, Broadcast, Unknown unicast, or Multicast) to (i.e., Broadcast, Unknown unicast, or Multicast) packet to one of
one of these NVEs, then it is important to ensure the packet is not these NVEs, then it is important to ensure the packet is not looped
looped back to the server via another NVE connected to this server. back to the server via another NVE connected to this server. The
The filtering mechanism on the NVE to prevent such loop and packet filtering mechanism on the NVE to prevent such loop and packet
duplication is called "split horizon filtering'. duplication is called "split-horizon filtering".
8.1.4 Aliasing and Backup-Path 8.1.4. Aliasing and Backup Path
In the case where a station is multi-homed to multiple NVEs, it is In the case where a station is multihomed to multiple NVEs, it is
possible that only a single NVE learns a set of the MAC addresses possible that only a single NVE learns a set of the MAC addresses
associated with traffic transmitted by the station. This leads to a associated with traffic transmitted by the station. This leads to a
situation where remote NVEs receive MAC advertisement routes, for situation where remote NVEs receive MAC Advertisement routes, for
these addresses, from a single NVE even though multiple NVEs are these addresses, from a single NVE even though multiple NVEs are
connected to the multi-homed station. As a result, the remote NVEs connected to the multihomed station. As a result, the remote NVEs
are not able to effectively load-balance traffic among the NVEs are not able to effectively load-balance traffic among the NVEs
connected to the multi-homed Ethernet segment. This could be the connected to the multihomed ES. For example, this could be the case
case, for e.g. when the NVEs perform data-path learning on the when the NVEs perform data-path learning on the access and the load-
access, and the load-balancing function on the station hashes traffic balancing function on the station hashes traffic from a given source
from a given source MAC address to a single NVE. Another scenario MAC address to a single NVE. Another scenario where this occurs is
where this occurs is when the NVEs rely on control plane learning on when the NVEs rely on control-plane learning on the access (e.g.,
the access (e.g. using ARP), since ARP traffic will be hashed to a using ARP), since ARP traffic will be hashed to a single link in the
single link in the LAG. LAG.
To alleviate this issue, EVPN introduces the concept of Aliasing. To alleviate this issue, EVPN introduces the concept of "Aliasing".
This refers to the ability of an NVE to signal that it has This refers to the ability of an NVE to signal that it has
reachability to a given locally attached Ethernet segment, even when reachability to a given locally attached ES, even when it has learned
it has learnt no MAC addresses from that segment. The Ethernet A-D no MAC addresses from that segment. The Ethernet A-D route per EVI
route per EVI is used to that end. Remote NVEs which receive MAC is used to that end. Remote NVEs that receive MAC Advertisement
advertisement routes with non-zero ESI should consider the MAC routes with non-zero ESIs should consider the MAC address as
address as reachable via all NVEs that advertise reachability to the reachable via all NVEs that advertise reachability to the relevant
relevant Segment using Ethernet A-D routes with the same ESI and with Segment using Ethernet A-D routes with the same ESI and with the
the Single-Active flag reset. Single-Active flag reset.
Backup-Path is a closely related function, albeit it applies to the Backup Path is a closely related function, albeit one that applies to
case where the redundancy mode is Single-Active. In this case, the the case where the redundancy mode is Single-Active. In this case,
NVE signals that it has reachability to a given locally attached the NVE signals that it has reachability to a given locally attached
Ethernet Segment using the Ethernet A-D route as well. Remote NVEs ES using the Ethernet A-D route as well. Remote NVEs that receive
which receive the MAC advertisement routes, with non-zero ESI, should the MAC Advertisement routes, with non-zero ESI, should consider the
consider the MAC address as reachable via the advertising NVE. MAC address as reachable via the advertising NVE. Furthermore, the
Furthermore, the remote NVEs should install a Backup-Path, for said remote NVEs should install a Backup Path, for said MAC, to the NVE
MAC, to the NVE which had advertised reachability to the relevant that had advertised reachability to the relevant segment using an
Segment using an Ethernet A-D route with the same ESI and with the Ethernet A-D route with the same ESI and with the Single-Active flag
Single-Active flag set. set.
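As a rough sketch of how a remote NVE might combine the two functions above, the following resolves a MAC Advertisement route into primary and backup next hops. The function name resolve_paths and the dictionary keys ('esi', 'nve', 'single_active') are assumptions made for illustration; the sketch also assumes the advertising NVE is always a valid primary path and does not model route validation.

```python
def resolve_paths(mac_route, ad_routes):
    """Return (primary, backup) next-hop lists for one MAC route.

    mac_route: {'esi': ..., 'nve': ...}               (hypothetical shape)
    ad_routes: [{'esi': ..., 'nve': ..., 'single_active': bool}, ...]
    """
    esi, advertiser = mac_route["esi"], mac_route["nve"]
    primary = {advertiser}
    backup = set()
    if esi != 0:  # a zero ESI means the MAC is not on a multihomed segment
        for route in ad_routes:
            if route["esi"] != esi or route["nve"] == advertiser:
                continue
            if route["single_active"]:
                # Single-Active flag set: install as Backup Path only.
                backup.add(route["nve"])
            else:
                # Single-Active flag reset: Aliasing, load-balance to it.
                primary.add(route["nve"])
    return sorted(primary), sorted(backup)
```

With an All-Active segment, every NVE advertising the ESI ends up in the primary list; with a Single-Active segment, only the NVE that advertised the MAC is primary and the others are held as backups.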
8.1.5 DF Election 8.1.5. DF Election
If a host is multi-homed to two or more NVEs on an Ethernet segment If a host is multihomed to two or more NVEs on an ES operating in
operating in all-active redundancy mode, then for a given EVI only All-Active redundancy mode, then, for a given EVI, only one of these
one of these NVEs, termed the Designated Forwarder (DF) is NVEs, termed the "Designated Forwarder" (DF) is responsible for
responsible for sending it broadcast, multicast, and, if configured sending it broadcast, multicast, and, if configured for that EVI,
for that EVI, unknown unicast frames. unknown unicast frames.
This is required in order to prevent duplicate delivery of multi- This is required in order to prevent duplicate delivery of multi-
destination frames to a multi-homed host or VM, in case of all-active destination frames to a multihomed host or VM, in case of All-Active
redundancy. redundancy.
In NVEs where .1Q tagged frames are received from hosts, the DF In NVEs where frames tagged as IEEE 802.1Q [IEEE.802.1Q] are received
election should be performed based on host VLAN IDs (VIDs) per from hosts, the DF election should be performed based on host VIDs
section 8.5 of [RFC7432]. Furthermore, multi-homing PEs of a given per Section 8.5 of [RFC7432]. Furthermore, multihoming PEs of a
Ethernet Segment MAY perform DF election using configured IDs such as given ES MAY perform DF election using configured IDs such as VNI,
VNI, EVI, normalized VIDs, and etc. as long as the IDs are configured EVI, normalized VIDs, and etc., as long as the IDs are configured
consistently across the multi-homing PEs. consistently across the multihoming PEs.
In GWs where VXLAN encapsulated frames are received, the DF election In GWs where VXLAN-encapsulated frames are received, the DF election
is performed on VNIs. Again, it is assumed that for a given Ethernet is performed on VNIs. Again, it is assumed that, for a given
Segment, VNIs are unique and consistent (e.g., no duplicate VNIs Ethernet segment, VNIs are unique and consistent (e.g., no duplicate
exist). VNIs exist).
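For illustration, the default DF election of Section 8.5 of [RFC7432] can be approximated as a modulo over the ordered list of NVE addresses, with the tag being a VID on NVEs or a VNI on GWs as described above. The function name elect_df and its parameters are hypothetical, and the sketch assumes all NVEs on the segment see the same address list and the same tag values.

```python
import ipaddress


def elect_df(pe_ips, tag):
    """Pick the Designated Forwarder for one tag (VID or VNI) on a segment.

    pe_ips: addresses of the NVEs attached to the Ethernet segment
    tag:    integer VID or VNI; must be consistent across those NVEs
    """
    # Order the NVEs by numeric address value, not by string comparison.
    ordered = sorted(pe_ips, key=lambda ip: int(ipaddress.ip_address(ip)))
    # The NVE whose ordinal equals (tag mod N) is the DF for that tag.
    return ordered[tag % len(ordered)]
```

Because the result depends only on the address list and the tag, every NVE computes the same answer independently, which is why the text requires the IDs to be configured consistently across the multihoming PEs.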
8.2 Impact on EVPN BGP Routes & Attributes 8.2. Impact on EVPN BGP Routes and Attributes
Since multi-homing is supported in this scenario, then the entire set Since multihoming is supported in this scenario, the entire set of
of BGP routes and attributes defined in [RFC7432] are used. The BGP routes and attributes defined in [RFC7432] is used. The setting
setting of the Ethernet Tag field in the MAC Advertisement, Ethernet of the Ethernet Tag field in the MAC Advertisement, Ethernet A-D per
AD per EVI, and Inclusive Multicast routes follows that of section EVI, and IMET routes follows that of Section 5.1.3. Furthermore,
5.1.3. Furthermore, the setting of the VNI field in the MAC the setting of the VNI field in the MAC Advertisement and Ethernet
Advertisement and Ethernet AD per EVI routes follows that of section A-D per EVI routes follows that of Section 5.1.3.
5.1.3.
8.3 Impact on EVPN Procedures 8.3. Impact on EVPN Procedures
Two cases need to be examined here, depending on whether the NVEs are Two cases need to be examined here, depending on whether the NVEs are
operating in Single-Active or in All-Active redundancy mode. operating in Single-Active or in All-Active redundancy mode.
First, lets consider the case of Single-Active redundancy mode, where First, let's consider the case of Single-Active redundancy mode,
the hosts are multi-homed to a set of NVEs, however, only a single where the hosts are multihomed to a set of NVEs; however, only a
NVE is active at a given point of time for a given VNI. In this case, single NVE is active at a given point of time for a given VNI. In
the aliasing is not required and the split-horizon filtering may not this case, the Aliasing is not required, and the split-horizon
be required, but other functions such as multi-homed Ethernet segment filtering may not be required, but other functions such as multihomed
auto-discovery, fast convergence and mass withdraw, backup path, and ES auto-discovery, fast convergence and mass withdrawal, Backup Path,
DF election are required. and DF election are required.
Second, let's consider the case of All-Active redundancy mode. In Second, let's consider the case of All-Active redundancy mode. In
this case, out of all the EVPN multi-homing features listed in this case, out of all the EVPN multihoming features listed in
section 8.1, the use of the VXLAN or NVGRE encapsulation impacts the Section 8.1, the use of the VXLAN or NVGRE encapsulation impacts the
split-horizon and aliasing features, since those two rely on the MPLS split-horizon and Aliasing features, since those two rely on the MPLS
client layer. Given that this MPLS client layer is absent with these client layer. Given that this MPLS client layer is absent with these
types of encapsulations, alternative procedures and mechanisms are types of encapsulations, alternative procedures and mechanisms are
needed to provide the required functions. Those are discussed in needed to provide the required functions. Those are discussed in
detail next. detail next.
8.3.1 Split Horizon 8.3.1. Split Horizon
In EVPN, an MPLS label is used for split-horizon filtering to support In EVPN, an MPLS label is used for split-horizon filtering to support
All-Active multi-homing where an ingress NVE adds a label All-Active multihoming where an ingress NVE adds a label
corresponding to the site of origin (aka ESI Label) when corresponding to the site of origin (aka an ESI label) when
encapsulating the packet. The egress NVE checks the ESI label when encapsulating the packet. The egress NVE checks the ESI label when
attempting to forward a multi-destination frame out an interface, and attempting to forward a multi-destination frame out an interface, and
if the label corresponds to the same site identifier (ESI) associated if the label corresponds to the same site identifier (ESI) associated
with that interface, the packet gets dropped. This prevents the with that interface, the packet gets dropped. This prevents the
occurrence of forwarding loops. occurrence of forwarding loops.
Since VXLAN and NVGRE encapsulations do not include the ESI label, Since VXLAN and NVGRE encapsulations do not include the ESI label,
other means of performing the split-horizon filtering function must other means of performing the split-horizon filtering function must
be devised for these encapsulations. The following approach is be devised for these encapsulations. The following approach is
recommended for split-horizon filtering when VXLAN (or NVGRE) recommended for split-horizon filtering when VXLAN (or NVGRE)
encapsulation is used. encapsulation is used.
Every NVE track the IP address(es) associated with the other NVE(s) Every NVE tracks the IP address(es) associated with the other NVE(s)
with which it has shared multi-homed Ethernet Segments. When the NVE with which it has shared multihomed ESs. When the NVE receives a
receives a multi-destination frame from the overlay network, it multi-destination frame from the overlay network, it examines the
examines the source IP address in the tunnel header (which source IP address in the tunnel header (which corresponds to the
corresponds to the ingress NVE) and filters out the frame on all ingress NVE) and filters out the frame on all local interfaces
local interfaces connected to Ethernet Segments that are shared with connected to ESs that are shared with the ingress NVE. With this
the ingress NVE. With this approach, it is required that the ingress approach, it is required that the ingress NVE perform replication
NVE performs replication locally to all directly attached Ethernet locally to all directly attached Ethernet segments (regardless of the
Segments (regardless of the DF Election state) for all flooded DF election state) for all flooded traffic ingress from the access
traffic ingress from the access interfaces (i.e. from the hosts). interfaces (i.e., from the hosts). This approach is referred to as
This approach is referred to as "Local Bias", and has the advantage "Local Bias", and has the advantage that only a single IP address
that only a single IP address needs to be used per NVE for split- need be used per NVE for split-horizon filtering, as opposed to
horizon filtering, as opposed to requiring an IP address per Ethernet requiring an IP address per Ethernet segment per NVE.
Segment per NVE.
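The Local Bias procedure above can be sketched as two forwarding decisions, one for frames arriving from the overlay and one for frames arriving from the access side. The function names, the dictionary shapes, and the choice to apply the DF check on the egress side for segments not shared with the ingress NVE are assumptions made for this illustration.

```python
def from_overlay(local_segments, peers_by_segment, is_df, tunnel_src_ip):
    """Segments onto which a BUM frame received from the overlay is forwarded.

    peers_by_segment: ES name -> set of peer NVE IPs sharing that ES
    is_df:            ES name -> True if this NVE is DF (assumed True for
                      single-homed segments missing from the map)
    """
    out = []
    for es in local_segments:
        if tunnel_src_ip in peers_by_segment.get(es, set()):
            # Shared with the ingress NVE, which already replicated locally.
            continue
        if is_df.get(es, True):
            out.append(es)
    return out


def from_access(local_segments, ingress_segment):
    """Local replication at the ingress NVE, regardless of DF election state."""
    return [es for es in local_segments if es != ingress_segment]
```

Note that the filter keys only on the tunnel source IP address, which is why a single IP address per NVE suffices for this approach.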
In order to allow proper operation of split-horizon filtering among In order to allow proper operation of split-horizon filtering among
the same group of multi-homing PE devices, a mix of PE devices with the same group of multihoming PE devices, a mix of PE devices with
MPLS over GRE encapsulations running [RFC7432] procedures for split- MPLS over GRE encapsulations running the procedures from [RFC7432]
horizon filtering on the one hand and VXLAN/NVGRE encapsulations for split-horizon filtering on the one hand and VXLAN/NVGRE
running local-bias procedures on the other on a given Ethernet encapsulation running local-bias procedures on the other on a given
Segment MUST NOT be configured. Ethernet segment MUST NOT be configured.
8.3.2 Aliasing and Backup-Path 8.3.2. Aliasing and Backup Path
The Aliasing and the Backup-Path procedures for VXLAN/NVGRE The Aliasing and the Backup Path procedures for VXLAN/NVGRE
encapsulation are very similar to the ones for MPLS. In case of MPLS, encapsulation are very similar to the ones for MPLS. In the case of
Ethernet A-D route per EVI is used for Aliasing when the MPLS, Ethernet A-D route per EVI is used for Aliasing when the
corresponding Ethernet Segment operates in All-Active multi-homing, corresponding ES operates in All-Active multihoming, and the same
and the same route is used for Backup-Path when the corresponding route is used for Backup Path when the corresponding ES operates in
Ethernet Segment operates in Single-Active multi-homing. In case of Single-Active multihoming. In the case of VXLAN/NVGRE, the same
VXLAN/NVGRE, the same route is used for the Aliasing and the Backup- route is used for the Aliasing and the Backup Path with the
Path with the difference that the Ethernet Tag and VNI fields in difference that the Ethernet Tag and VNI fields in Ethernet A-D per
Ethernet A-D per EVI route are set as described in section 5.1.3. EVI route are set as described in Section 5.1.3.
8.3.3 Unknown Unicast Traffic Designation 8.3.3. Unknown Unicast Traffic Designation
In EVPN, when an ingress PE uses ingress replication to flood unknown In EVPN, when an ingress PE uses ingress replication to flood unknown
unicast traffic to egress PEs, the ingress PE uses a different EVPN unicast traffic to egress PEs, the ingress PE uses a different EVPN
MPLS label (from the one used for known unicast traffic) to identify MPLS label (from the one used for known unicast traffic) to identify
such BUM traffic. The egress PEs use this label to identify such BUM such BUM traffic. The egress PEs use this label to identify such BUM
traffic and thus apply DF filtering for All-Active multi-homed sites. traffic and, thus, apply DF filtering for All-Active multihomed
In absence of unknown unicast traffic designation and in presence of sites. In absence of an unknown unicast traffic designation and in
enabling unknown unicast flooding, there can be transient duplicate the presence of enabling unknown unicast flooding, there can be
traffic to All-Active multi-homed sites under the following transient duplicate traffic to All-Active multihomed sites under the
condition: the host MAC address is learned by the egress PE(s) and following condition: the host MAC address is learned by the egress
advertised to the ingress PE; however, the MAC advertisement has not PE(s) and advertised to the ingress PE; however, the MAC
been received or processed by the ingress PE, resulting in the host Advertisement has not been received or processed by the ingress PE,
MAC address to be unknown on the ingress PE but be known on the resulting in the host MAC address being unknown on the ingress PE but
egress PE(s). Therefore, when a packet destined to that host MAC known on the egress PE(s). Therefore, when a packet destined to that
address arrives on the ingress PE, it floods it via ingress host MAC address arrives on the ingress PE, it floods it via ingress
replication to all the egress PE(s) and since they are known to the replication to all the egress PE(s), and since they are known to the
egress PE(s), multiple copies is sent to the All-Active multi-homed egress PE(s), multiple copies are sent to the All-Active multihomed
site. It should be noted that such transient packet duplication only site. It should be noted that such transient packet duplication only
happens when a) the destination host is multi-homed via All-Active happens when a) the destination host is multihomed via All-Active
redundancy mode, b) flooding of unknown unicast is enabled in the redundancy mode, b) flooding of unknown unicast is enabled in the
network, c) ingress replication is used, and d) traffic for the network, c) ingress replication is used, and d) traffic for the
destination host arrives on the ingress PE before it learns the destination host arrives on the ingress PE before it learns the
host MAC address via BGP EVPN advertisement. If it is desired to host MAC address via BGP EVPN advertisement. If it is desired to
avoid occurrence of such transient packet duplication (however low avoid occurrence of such transient packet duplication (however low
probability that may be), then VXLAN-GPE encapsulation needs to be probability that may be), then VXLAN-GPE encapsulation needs to be
used between these PEs and the ingress PE needs to set the BUM used between these PEs and the ingress PE needs to set the BUM
Traffic Bit (B bit) [VXLAN-GPE] to indicate that this is an ingress- Traffic Bit (B bit) [VXLAN-GPE] to indicate that this is an ingress-
replicated BUM traffic. replicated BUM traffic.
9 Support for Multicast 9. Support for Multicast
The E-VPN Inclusive Multicast Ethernet Tag (IMET) route is used to The EVPN IMET route is used to discover the multicast tunnels among
discover the multicast tunnels among the endpoints associated with a the endpoints associated with a given EVI (e.g., given VNI) for VLAN-
given EVI (e.g., given VNI) for VLAN-based service and a given Based Service and a given <EVI, VLAN> for VLAN-Aware Bundle Service.
<EVI,VLAN> for VLAN-aware bundle service. All fields of this route is All fields of this route are set as described in Section 5.1.3. The
set as described in section 5.1.3. The Originating router's IP originating router's IP address field is set to the NVE's IP address.
address field is set to the NVE's IP address. This route is tagged This route is tagged with the PMSI Tunnel attribute, which is used to
with the PMSI Tunnel attribute, which is used to encode the type of encode the type of multicast tunnel to be used as well as the
multicast tunnel to be used as well as the multicast tunnel multicast tunnel identifier. The tunnel encapsulation is encoded by
identifier. The tunnel encapsulation is encoded by adding the BGP adding the BGP Encapsulation Extended Community as per Section 5.1.1.
Encapsulation extended community as per section 5.1.1. For example, For example, the PMSI Tunnel attribute may indicate the multicast
the PMSI Tunnel attribute may indicate the multicast tunnel is of tunnel is of type Protocol Independent Multicast - Sparse-Mode (PIM-
type Protocol Independent Multicast - Sparse-Mode (PIM-SM); whereas, SM); whereas, the BGP Encapsulation Extended Community may indicate
the BGP Encapsulation extended community may indicate the the encapsulation for that tunnel is of type VXLAN. The following
encapsulation for that tunnel is of type VXLAN. The following tunnel tunnel types as defined in [RFC6514] can be used in the PMSI Tunnel
types as defined in [RFC6514] can be used in the PMSI tunnel
attribute for VXLAN/NVGRE: attribute for VXLAN/NVGRE:
+ 3 - PIM-SSM Tree + 3 - PIM-SSM Tree
+ 4 - PIM-SM Tree + 4 - PIM-SM Tree
+ 5 - Bidir-PIM Tree + 5 - BIDIR-PIM Tree
+ 6 - Ingress Replication + 6 - Ingress Replication
In case of VxLAN and NVGRE encapsulation with locally-assigned VNIs, In case of VXLAN and NVGRE encapsulations with locally assigned VNIs,
just as in [RFC7432], each PE MUST advertise an IMET route to other just as in [RFC7432], each PE MUST advertise an IMET route to other
PEs in an EVPN instance for the multicast tunnel type that it uses PEs in an EVPN instance for the multicast tunnel type that it uses
(i.e., ingress replication, PIM-SM, PIM-SSM, or Bidir-PIM tunnel). (i.e., ingress replication, PIM-SM, PIM-SSM, or BIDIR-PIM tunnel).
However, for globally-assigned VNIs, each PE MUST advertise IMET However, for globally assigned VNIs, each PE MUST advertise an IMET
route to other PEs in an EVPN instance for ingress replication or route to other PEs in an EVPN instance for ingress replication or a
PIM-SSM tunnel, and MAY advertise IMET route for PIM-SM or Bidir-PIM PIM-SSM tunnel, and they MAY advertise an IMET route for a PIM-SM or
tunnel. In case of PIM-SM or Bidir-PIM tunnel, no information in the BIDIR-PIM tunnel. In case of a PIM-SM or BIDIR-PIM tunnel, no
IMET route is needed by the PE to setup these tunnels. information in the IMET route is needed by the PE to set up these
tunnels.
In the scenario where the multicast tunnel is a tree, both the In the scenario where the multicast tunnel is a tree, both the
Inclusive as well as the Aggregate Inclusive variants may be used. In Inclusive as well as the Aggregate Inclusive variants may be used.
the former case, a multicast tree is dedicated to a VNI. Whereas, in In the former case, a multicast tree is dedicated to a VNI. Whereas,
the latter, a multicast tree is shared among multiple VNIs. For VNI- in the latter, a multicast tree is shared among multiple VNIs. For
based service, the Aggregate Inclusive mode is accomplished by having VNI-Based Service, the Aggregate Inclusive mode is accomplished by
the NVEs advertise multiple IMET routes with different Route Targets having the NVEs advertise multiple IMET routes with different RTs
(one per VNI) but with the same tunnel identifier encoded in the PMSI (one per VNI) but with the same tunnel identifier encoded in the PMSI
tunnel attribute. For VNI-aware bundle service, the Aggregate Tunnel attribute. For VNI-Aware Bundle Service, the Aggregate
Inclusive mode is accomplished by having the NVEs advertise multiple Inclusive mode is accomplished by having the NVEs advertise multiple
IMET routes with different VNI encoded in the Ethernet Tag field, but IMET routes with different VNIs encoded in the Ethernet Tag field,
with the same tunnel identifier encoded in the PMSI Tunnel attribute. but with the same tunnel identifier encoded in the PMSI Tunnel
attribute.
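The two Aggregate Inclusive variants above differ only in which field distinguishes the IMET routes. The sketch below models the routes as plain dictionaries; the function names, the field names, the use of a multicast group address as the tunnel identifier, and the single RT per bundle in the second function are assumptions for illustration. The Ethernet Tag of the VNI-based routes is omitted here because it is set per Section 5.1.3.

```python
def aggregate_imet_vni_based(nve_ip, rt_by_vni, tunnel_id):
    """One IMET route per VNI: different RTs, same tunnel identifier."""
    return [
        {"originator": nve_ip, "route_target": rt, "pmsi_tunnel_id": tunnel_id}
        for _vni, rt in sorted(rt_by_vni.items())
    ]


def aggregate_imet_vni_aware(nve_ip, vnis, route_target, tunnel_id):
    """One IMET route per VNI: VNI in the Ethernet Tag, same tunnel identifier."""
    return [
        {
            "originator": nve_ip,
            "ethernet_tag": vni,
            "route_target": route_target,  # assumed shared by the bundle
            "pmsi_tunnel_id": tunnel_id,
        }
        for vni in sorted(vnis)
    ]
```

In both cases, a receiver can tell that the routes share one multicast tree because the tunnel identifier in the PMSI Tunnel attribute is identical across them.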
10 Data Center Interconnections - DCI 10. Data-Center Interconnections (DCIs)
For DCI, the following two main scenarios are considered when For DCIs, the following two main scenarios are considered when
connecting data centers running evpn-overlay (as described here) over connecting data centers running evpn-overlay (as described here) over
MPLS/IP core network: an MPLS/IP core network:
- Scenario 1: DCI using GWs - Scenario 1: DCI using GWs
- Scenario 2: DCI using ASBRs - Scenario 2: DCI using ASBRs
The following two subsections describe the operations for each of The following two subsections describe the operations for each of
these scenarios. these scenarios.
10.1 DCI using GWs 10.1. DCI Using GWs
This is the typical scenario for interconnecting data centers over This is the typical scenario for interconnecting data centers over
WAN. In this scenario, EVPN routes are terminated and processed in WAN. In this scenario, EVPN routes are terminated and processed in
each GW and MAC/IP routes are always re-advertised from DC to WAN but each GW and MAC/IP routes are always re-advertised from DC to WAN but
from WAN to DC, they are not re-advertised if unknown MAC address from WAN to DC, they are not re-advertised if unknown MAC addresses
(and default IP address) are utilized in NVEs. In this scenario, each (and default IP address) are utilized in the NVEs. In this scenario,
GW maintains a MAC-VRF (and/or IP-VRF) for each EVI. The main each GW maintains a MAC-VRF (and/or IP-VRF) for each EVI. The main
advantage of this approach is that NVEs do not need to maintain MAC advantage of this approach is that NVEs do not need to maintain MAC
and IP addresses from any remote data centers when default IP route and IP addresses from any remote data centers when default IP routes
and unknown MAC routes are used - i.e., they only need to maintain and unknown MAC routes are used; that is, they only need to maintain
routes that are local to their own DC. When default IP route and routes that are local to their own DC. When default IP routes and
unknown MAC route are used, any unknown IP and MAC packets from NVEs unknown MAC routes are used, any unknown IP and MAC packets from NVEs
are forwarded to the GWs where all the VPN MAC and IP routes are are forwarded to the GWs where all the VPN MAC and IP routes are
maintained. This approach reduces the size of MAC-VRF and IP-VRF maintained. This approach reduces the size of MAC-VRF and IP-VRF
significantly at NVEs. Furthermore, it results in a faster significantly at NVEs. Furthermore, it results in a faster
convergence time upon a link or NVE failure in a multi-homed network convergence time upon a link or NVE failure in a multihomed network
or device redundancy scenario, because the failure related BGP routes or device redundancy scenario, because the failure-related BGP routes
(such as mass withdraw message) do not need to get propagated all the (such as mass withdrawal message) do not need to get propagated all
way to the remote NVEs in the remote DCs. This approach is described the way to the remote NVEs in the remote DCs. This approach is
in details in section 3.4 of [DCI-EVPN-OVERLAY]. described in detail in Section 3.4 of [DCI-EVPN-OVERLAY].
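
The forwarding behavior at an NVE in the GW scenario can be sketched as a lookup with a fallback entry. This is a minimal Python illustration only: the `UNKNOWN_MAC` key, the `nve_next_hop` function, and the MAC addresses and device names are hypothetical and are not taken from this document or from [DCI-EVPN-OVERLAY].

```python
# Hypothetical key standing in for the unknown MAC route advertised by the GW.
UNKNOWN_MAC = "unknown"


def nve_next_hop(mac_vrf, dst_mac):
    """Return the next hop for dst_mac, falling back to the unknown MAC route."""
    if dst_mac in mac_vrf:
        return mac_vrf[dst_mac]      # route local to this DC
    return mac_vrf.get(UNKNOWN_MAC)  # None if no unknown MAC route is installed


# The NVE holds only DC-local MACs plus the unknown MAC route toward the GW.
mac_vrf = {"00:aa:bb:00:00:01": "NVE2", UNKNOWN_MAC: "GW1"}
local_hop = nve_next_hop(mac_vrf, "00:aa:bb:00:00:01")
remote_hop = nve_next_hop(mac_vrf, "00:cc:dd:00:00:09")
```

The table at the NVE stays small because a MAC address in a remote data center never needs its own entry; it is resolved by the GW instead.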
10.2 DCI using ASBRs 10.2. DCI Using ASBRs
This approach can be considered as the opposite of the first approach This approach can be considered as the opposite of the first
and it favors simplification at DCI devices over NVEs such that approach. It favors simplification at DCI devices over NVEs such
larger MAC-VRF (and IP-VRF) tables need to be maintained on NVEs; that larger MAC-VRF (and IP-VRF) tables need to be maintained on
whereas, DCI devices don't need to maintain any MAC (and IP) NVEs; whereas DCI devices don't need to maintain any MAC (and IP)
forwarding tables. Furthermore, DCI devices do not need to terminate forwarding tables. Furthermore, DCI devices do not need to terminate
and process routes related to multi-homing but rather to relay these and process routes related to multihoming but rather to relay these
messages for the establishment of an end-to-end Label Switched Path messages for the establishment of an end-to-end Label Switched Path
(LSP) path. In other words, DCI devices in this approach operate (LSP). In other words, DCI devices in this approach operate similar
similar to ASBRs for inter-AS option B - section 10 of [RFC4364]. to ASBRs for inter-AS Option B (see Section 10 of [RFC4364]). This
This requires locally assigned VNIs to be used just like downstream requires locally assigned VNIs to be used just like downstream-
assigned MPLS VPN label where for all practical purposes the VNIs assigned MPLS VPN labels where, for all practical purposes, the VNIs
function like 24-bit VPN labels. This approach is equally applicable function like 24-bit VPN labels. This approach is equally applicable
to data centers (or Carrier Ethernet networks) with MPLS to data centers (or Carrier Ethernet networks) with MPLS
encapsulation. encapsulation.
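
The following Python sketch illustrates one way an ASBR could re-advertise a route in the manner of inter-AS Option B, rewriting the next hop to itself and substituting a locally assigned VNI. The `Asbr` class, the dictionary-based route representation, the starting VNI value, and the choice to allocate one local VNI per upstream (next hop, VNI) pair are all assumptions made for illustration; this document does not mandate a particular allocation scheme.

```python
from itertools import count


class Asbr:
    def __init__(self, name, first_vni=10000):
        self.name = name
        self._next_vni = count(first_vni)  # hypothetical local VNI pool
        self._allocated = {}               # (upstream next hop, VNI) -> local VNI
        self.swap_table = {}               # local VNI -> (upstream next hop, VNI)

    def readvertise(self, route):
        """Return a copy of the route with next hop and VNI rewritten."""
        upstream = (route["next_hop"], route["vni"])
        local_vni = self._allocated.get(upstream)
        if local_vni is None:
            local_vni = next(self._next_vni)
            self._allocated[upstream] = local_vni
            self.swap_table[local_vni] = upstream
        # RD and NLRI are untouched; only the next hop and VNI change.
        return {**route, "next_hop": self.name, "vni": local_vni}


asbr1 = Asbr("ASBR1")
received = {"rd": "RD-PE1", "nlri": "M1", "next_hop": "PE1", "vni": 5001}
advertised = asbr1.readvertise(received)
```

The swap table plays the same role as the label-swap state of an Option B ASBR: a packet arriving with the local VNI is forwarded to the recorded upstream next hop with the upstream VNI.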
In inter-AS option B, when ASBR receives an EVPN route from its DC In inter-AS Option B, when ASBR receives an EVPN route from its DC
over internal BGP (iBGP) and re-advertises it to other ASBRs, it re- over internal BGP (iBGP) and re-advertises it to other ASBRs, it
advertises the EVPN route by re-writing the BGP next-hops to itself, re-advertises the EVPN route by re-writing the BGP next hops to
thus losing the identity of the PE that originated the advertisement. itself, thus losing the identity of the PE that originated the
This re-write of BGP next-hop impacts the EVPN Mass Withdraw route advertisement. This rewrite of BGP next hop impacts the EVPN mass
(Ethernet A-D per ES) and its procedure adversely. However, it does withdrawal route (Ethernet A-D per ES) and its procedure adversely.
not impact EVPN Aliasing mechanism/procedure because when the However, it does not impact the EVPN Aliasing mechanism/procedure
Aliasing routes (Ether A-D per EVI) are advertised, the receiving PE because when the Aliasing routes (Ethernet A-D per EVI) are
first resolves a MAC address for a given EVI into its corresponding advertised, the receiving PE first resolves a MAC address for a given
<ES,EVI> and subsequently, it resolves the <ES,EVI> into multiple EVI into its corresponding <ES, EVI>, and, subsequently, it resolves
paths (and their associated next hops) via which the <ES,EVI> is the <ES, EVI> into multiple paths (and their associated next hops)
reachable. Since Aliasing and MAC routes are both advertised per EVI via which the <ES, EVI> is reachable. Since Aliasing and MAC routes
basis and they use the same RD and RT (per EVI), the receiving PE can are both advertised on a per-EVI basis and they use the same RD and
associate them together on a per BGP path basis (e.g., per RT (per EVI), the receiving PE can associate them together on a
originating PE) and thus perform recursive route resolution - e.g., a per-BGP-path basis (e.g., per originating PE). Thus, it can perform
MAC is reachable via an <ES,EVI> which in turn, is reachable via a recursive route resolution, e.g., a MAC is reachable via an <ES, EVI>
set of BGP paths, thus the MAC is reachable via the set of BGP paths. which in turn, is reachable via a set of BGP paths; thus, the MAC is
Since on a per EVI basis, the association of MAC routes and the reachable via the set of BGP paths. Due to the per-EVI basis, the
corresponding Aliasing route is fixed and determined by the same RD association of MAC routes and the corresponding Aliasing route is
and RT, there is no ambiguity when the BGP next hop for these routes fixed and determined by the same RD and RT; there is no ambiguity
is re-written as these routes pass through ASBRs - i.e., the when the BGP next hop for these routes is rewritten as these routes
receiving PE may receive multiple Aliasing routes for the same EVI pass through ASBRs. That is, the receiving PE may receive multiple
from a single next hop (a single ASBR), and it can still create Aliasing routes for the same EVI from a single next hop (a single
multiple paths toward that <ES, EVI>. ASBR), and it can still create multiple paths toward that <ES, EVI>.
However, when the BGP next hop address corresponding to the However, when the BGP next-hop address corresponding to the
originating PE is re-written, the association between the Mass originating PE is rewritten, the association between the mass
Withdraw route (Ether A-D per ES) and its corresponding MAC routes withdrawal route (Ethernet A-D per ES) and its corresponding MAC
cannot be made based on their RDs and RTs because the RD for Mass routes cannot be made based on their RDs and RTs because the RD for
Withdraw route is different than the one for the MAC routes. the mass withdrawal route is different than the one for the MAC
Therefore, the functionality needed at the ASBRs and the receiving routes. Therefore, the functionality needed at the ASBRs and the
PEs depends on whether the Mass Withdraw route is originated and receiving PEs depends on whether the mass withdrawal route is
whether there is a need to handle route resolution ambiguity for this originated and whether there is a need to handle route resolution
route. The following two subsections describe the functionality ambiguity for this route. The following two subsections describe the
needed by the ASBRs and the receiving PEs depending on whether the functionality needed by the ASBRs and the receiving PEs depending on
NVEs reside in Hypervisors or in TORs. whether the NVEs reside in hypervisors or in ToR switches.
10.2.1 ASBR Functionality with Single-Homing NVEs 10.2.1. ASBR Functionality with Single-Homing NVEs
When NVEs reside in hypervisors as described in section 7.1, there is When NVEs reside in hypervisors as described in Section 7.1, there is
no multi-homing and thus there is no need for the originating NVE to no multihoming; thus, there is no need for the originating NVE to
send Ethernet A-D per ES or Ethernet A-D per EVI routes. However, as send Ethernet A-D per ES or Ethernet A-D per EVI routes. However, as
noted in section 7, in order to enable a single-homing ingress NVE to noted in Section 7, in order to enable a single-homing ingress NVE to
take advantage of fast convergence, aliasing, and backup-path when take advantage of fast convergence, Aliasing, and Backup Path when
interacting with multi-homing egress NVEs attached to a given interacting with multihoming egress NVEs attached to a given ES, the
Ethernet segment, the single-homing NVE should be able to receive and single-homing NVE should be able to receive and process Ethernet A-D
process Ethernet AD per ES and Ethernet AD per EVI routes. The per ES and Ethernet A-D per EVI routes. The handling of these routes
handling of these routes are described in the next section. is described in the next section.
10.2.2 ASBR Functionality with Multi-Homing NVEs 10.2.2. ASBR Functionality with Multihoming NVEs
When NVEs reside in TORs and operate in multi-homing redundancy mode, When NVEs reside in ToR switches and operate in multihoming
then as described in section 8, there is a need for the originating redundancy mode, there is a need, as described in Section 8, for the
multi-homing NVE to send Ethernet A-D per ES route(s) (used for mass originating multihoming NVE to send Ethernet A-D per ES route(s)
withdraw) and Ethernet A-D per EVI routes (used for aliasing). As (used for mass withdrawal) and Ethernet A-D per EVI routes (used for
described above, the re-write of BGP next-hop by ASBRs creates Aliasing). As described above, the rewrite of BGP next hop by ASBRs
ambiguities when Ethernet A-D per ES routes are received by the creates ambiguities when Ethernet A-D per ES routes are received by
remote NVE in a different ASBR because the receiving NVE cannot the remote NVE in a different ASBR because the receiving NVE cannot
associated that route with the MAC/IP routes of that Ethernet Segment associate that route with the MAC/IP routes of that ES advertised by
advertised by the same originating NVE. This ambiguity inhibits the the same originating NVE. This ambiguity inhibits the function of
function of mass-withdraw per ES by the receiving NVE in a different mass withdrawal per ES by the receiving NVE in a different AS.
AS.
As an example consider a scenario where CE is multi-homed to PE1 and As an example, consider a scenario where a CE is multihomed to PE1
PE2 where these PEs are connected via ASBR1 and then ASBR2 to the and PE2, where these PEs are connected via ASBR1 and then ASBR2 to
remote PE3. Furthermore, consider that PE1 receives M1 from CE1 but the remote PE3. Furthermore, consider that PE1 receives M1 from CE1
not PE2. Therefore, PE1 advertises Eth A-D per ES1, Eth A-D per EVI1, but not PE2. Therefore, PE1 advertises Ethernet A-D per ES1,
and M1; whereas, PE2 only advertises Eth A-D per ES1 and Eth A-D per Ethernet A-D per EVI1, and M1; whereas, PE2 only advertises Ethernet
EVI1. ASBR1 receives all these five advertisements and passes them to A-D per ES1 and Ethernet A-D per EVI1. ASBR1 receives all these five
ASBR2 (with itself as the BGP next hop). ASBR2, in turn, passes them advertisements and passes them to ASBR2 (with itself as the BGP next
to the remote PE3 with itself as the BGP next hop. PE3 receives these hop). ASBR2, in turn, passes them to the remote PE3, with itself as
five routes where all of them have the same BGP next-hop (i.e., the BGP next hop. PE3 receives these five routes where all of them
ASBR2). Furthermore, the two Ether A-D per ES routes received by PE3 have the same BGP next hop (i.e., ASBR2). Furthermore, the two
have the same info - i.e., same ESI and the same BGP next hop. Ethernet A-D per ES routes received by PE3 have the same information,
Although both of these routes are maintained by the BGP process in i.e., same ESI and the same BGP next hop. Although both of these
PE3 (because they have different RDs and thus treated as different routes are maintained by the BGP process in PE3 (because they have
BGP routes), information from only one of them is used in the L2 different RDs and, thus, are treated as different BGP routes),
routing table (L2 RIB). information from only one of them is used in the L2 routing table (L2
RIB).
PE1 PE1
/ \ / \
CE ASBR1---ASBR2---PE3 CE ASBR1---ASBR2---PE3
\ / \ /
PE2 PE2
Figure 1: Inter-AS Option B Figure 3: Inter-AS Option B
Now, when the AC between the PE2 and the CE fails and PE2 sends NLRI Now, when the AC between the PE2 and the CE fails and PE2 sends
withdrawal for Ether A-D per ES route and this withdrawal gets Network Layer Reachability Information (NLRI) withdrawal for Ethernet
propagated and received by the PE3, the BGP process in PE3 removes A-D per ES route, and this withdrawal gets propagated and received by
the corresponding BGP route; however, it doesn't remove the the PE3, the BGP process in PE3 removes the corresponding BGP route;
associated info (namely ESI and BGP next hop) from the L2 routing however, it doesn't remove the associated information (namely ESI and
table (L2 RIB) because it still has the other Ether A-D per ES route BGP next hop) from the L2 routing table (L2 RIB) because it still has
(originated from PE1) with the same info. That is why the mass- the other Ethernet A-D per ES route (originated from PE1) with the
withdraw mechanism does not work when doing DCI with inter-AS option same information. That is why the mass withdrawal mechanism does not
B. However, as described previoulsy, the aliasing function works and work when doing DCI with inter-AS Option B. However, as described
so does "mass-withdraw per EVI" (which is associated with withdrawing previously, the Aliasing function works and so does "mass withdrawal
the EVPN route associated with Aliasing - i.e., Ether A-D per EVI per EVI" (which is associated with withdrawing the EVPN route
route). associated with Aliasing, i.e., Ethernet A-D per EVI route).
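
The reason the per-ES mass withdrawal is not actionable at PE3 can be illustrated with a small Python sketch. It assumes, as the example above describes, that the BGP process keeps one route per RD while the L2 RIB keeps only the (ESI, BGP next hop) information. The dictionary layout, the `l2_rib` function, and the RD names are hypothetical.

```python
# BGP table at PE3: one entry per RD, so both Ethernet A-D per ES routes are kept.
bgp_ad_per_es = {
    "RD-PE1": {"esi": "ES1", "next_hop": "ASBR2"},
    "RD-PE2": {"esi": "ES1", "next_hop": "ASBR2"},
}


def l2_rib(bgp_routes):
    """Derive the L2 RIB view: only distinct (ESI, next hop) pairs survive."""
    return {(r["esi"], r["next_hop"]) for r in bgp_routes.values()}


before = l2_rib(bgp_ad_per_es)
del bgp_ad_per_es["RD-PE2"]  # PE2's withdrawal reaches PE3
after = l2_rib(bgp_ad_per_es)
```

Because both routes collapse to the same (ES1, ASBR2) pair, the L2 RIB is identical before and after the withdrawal, so PE3 has nothing to act on.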
In the above example, the PE3 receives two Aliasing routes with the In the above example, the PE3 receives two Aliasing routes with the
same BGP next hop (ASBR2) but different RDs. One of the Alias route same BGP next hop (ASBR2) but different RDs. One of the Aliasing
has the same RD as the advertised MAC route (M1). PE3 follows the routes has the same RD as the advertised MAC route (M1). PE3 follows
route resolution procedure specified in [RFC7432] upon receiving the the route resolution procedure specified in [RFC7432] upon receiving
two Aliasing route - ie, it resolves M1 to <ES, EVI1> and the two Aliasing routes; that is, it resolves M1 to <ES, EVI1>, and,
subsequently it resolves <ES,EVI1> to a BGP path list with two paths subsequently, it resolves <ES, EVI1> to a BGP path list with two
along with the corresponding VNIs/MPLS labels (one associated with paths along with the corresponding VNIs/MPLS labels (one associated
PE1 and the other associated with PE2). It should be noted that even with PE1 and the other associated with PE2). It should be noted that
though both paths are advertised by the same BGP next hop (ASBR2), even though both paths are advertised by the same BGP next hop
the receiving PE3 can handle them properly. Therefore, M1 is (ASBR2), the receiving PE3 can handle them properly. Therefore, M1
reachable via two paths. This creates two end-to-end LSPs, from PE3 is reachable via two paths. This creates two end-to-end LSPs, from
to PE1 and from PE3 to PE2, for M1 such that when PE3 wants to PE3 to PE1 and from PE3 to PE2, for M1 such that when PE3 wants to
forward traffic destined to M1, it can load balanced between the two forward traffic destined to M1, it can load-balance between the two
LSPs. Although route resolution for Aliasing routes with the same BGP LSPs. Although route resolution for Aliasing routes with the same
next hop is not explicitly mentioned in [RFC7432], this is the BGP next hop is not explicitly mentioned in [RFC7432], this is the
expected operation and thus it is elaborated here. expected operation; thus, it is elaborated here.
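
The two-step resolution at PE3 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure text of [RFC7432]: the `resolve` function, the dictionary layout, the RD names, and the VNI values 7001 and 7002 are hypothetical.

```python
# MAC/IP Advertisement route for M1, as seen by PE3.
mac_routes = {"M1": {"rd": "RD-PE1", "es": "ES1", "evi": "EVI1"}}

# Aliasing (Ethernet A-D per EVI) routes: same next hop, different RDs and VNIs.
aliasing_routes = [
    {"rd": "RD-PE1", "es": "ES1", "evi": "EVI1", "next_hop": "ASBR2", "vni": 7001},
    {"rd": "RD-PE2", "es": "ES1", "evi": "EVI1", "next_hop": "ASBR2", "vni": 7002},
]


def resolve(mac, mac_routes, aliasing_routes):
    """Resolve MAC -> <ES, EVI> -> list of (next hop, VNI) paths."""
    route = mac_routes[mac]
    es_evi = (route["es"], route["evi"])
    return [
        (a["next_hop"], a["vni"])
        for a in aliasing_routes
        if (a["es"], a["evi"]) == es_evi
    ]


paths = resolve("M1", mac_routes, aliasing_routes)
```

Although both paths share the next hop ASBR2, they remain distinct because each carries its own VNI (or MPLS label), which is what allows PE3 to load-balance between them.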
When the AC between the PE2 and the CE fails and PE2 sends NLRI When the AC between the PE2 and the CE fails and PE2 sends NLRI
withdrawal for Ether A-D per EVI routes and these withdrawals get withdrawal for Ethernet A-D per EVI routes, and these withdrawals get
propagated and received by the PE3, the PE3 removes the Aliasing propagated and received by the PE3, the PE3 removes the Aliasing
route and updates the path list - ie, it removes the path route and updates the path list; that is, it removes the path
corresponding to the PE2. Therefore, all the corresponding MAC routes corresponding to the PE2. Therefore, all the corresponding MAC
for that <ES,EVI> that point to that path list will now have the routes for that <ES, EVI> that point to that path list will now have
updated path list with a single path associated with PE1. This action the updated path list with a single path associated with PE1. This
can be considered as the mass-withdraw at the per-EVI level. The action can be considered to be the mass withdrawal at the per-EVI
mass-withdraw at per-EVI level has longer convergence time than the level. The mass withdrawal at the per-EVI level has a longer
mass-withdraw at per-ES level; however, it is much faster than the convergence time than the mass withdrawal at the per-ES level;
convergence time when the withdraw is done on a per-MAC basis. however, it is much faster than the convergence time when the
withdrawal is done on a per-MAC basis.
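
The per-EVI mass withdrawal can be pictured as a single update to a path list shared by all MAC addresses of one <ES, EVI>. The Python sketch below is illustrative only: the `PathList` class, the `paths_for` helper, the second MAC address M2, and the VNI values are assumptions that do not appear in this document.

```python
class PathList:
    """Paths for one <ES, EVI>, keyed by the RD of the Aliasing route."""

    def __init__(self):
        self.paths = {}  # RD -> (next hop, VNI)

    def add(self, rd, next_hop, vni):
        self.paths[rd] = (next_hop, vni)

    def withdraw(self, rd):
        self.paths.pop(rd, None)


path_lists = {("ES1", "EVI1"): PathList()}
path_lists[("ES1", "EVI1")].add("RD-PE1", "ASBR2", 7001)
path_lists[("ES1", "EVI1")].add("RD-PE2", "ASBR2", 7002)

# Every MAC learned on <ES1, EVI1> points at the same shared path list.
mac_table = {"M1": ("ES1", "EVI1"), "M2": ("ES1", "EVI1")}


def paths_for(mac):
    return sorted(path_lists[mac_table[mac]].paths.values())


paths_before = paths_for("M1")
path_lists[("ES1", "EVI1")].withdraw("RD-PE2")  # Ethernet A-D per EVI withdrawn
paths_after = {mac: paths_for(mac) for mac in mac_table}
```

One withdrawal updates every MAC address on that <ES, EVI> at once, which is why this is faster than per-MAC withdrawal, yet it must be repeated for each EVI on the ES, which is why it is slower than the per-ES form.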
If a PE becomes detached from a given ES, then in addition to If a PE becomes detached from a given ES, then, in addition to
withdrawing its previously advertised Ethernet AD Per ES routes, it withdrawing its previously advertised Ethernet A-D per ES routes, it
MUST also withdraw its previously advertised Ethernet AD Per EVI MUST also withdraw its previously advertised Ethernet A-D per EVI
routes for that ES. For a remote PE that is separated from the routes for that ES. For a remote PE that is separated from the
withdrawing PE by one or more EVPN inter-AS option B ASBRs, the withdrawing PE by one or more EVPN inter-AS Option B ASBRs, the
withdrawal of the Ethernet AD Per ES routes is not actionable. withdrawal of the Ethernet A-D per ES routes is not actionable.
However, a remote PE is able to correlate a previously advertised However, a remote PE is able to correlate a previously advertised
Ethernet AD Per EVI route with any MAC/IP Advertisement routes also Ethernet A-D per EVI route with any MAC/IP Advertisement routes also
advertised by the withdrawing PE for that <ES, EVI, BD>. Hence, when advertised by the withdrawing PE for that <ES, EVI, BD>. Hence, when
it receives the withdrawal of an Ethernet AD Per EVI route, it SHOULD it receives the withdrawal of an Ethernet A-D per EVI route, it
remove the withdrawing PE as a next-hop for all MAC addresses SHOULD remove the withdrawing PE as a next hop for all MAC addresses
associated with that <ES, EVI, BD>. associated with that <ES, EVI, BD>.
In the previous example, when the AC between PE2 and the CE fails, In the previous example, when the AC between PE2 and the CE fails,
PE2 will withdraw its Ethernet AD Per ES and Per EVI routes. When PE2 will withdraw its Ethernet A-D per ES and per EVI routes. When
PE3 receives the withdrawal of an Ethernet AD Per EVI route, it PE3 receives the withdrawal of an Ethernet A-D per EVI route, it
removes PE2 as a valid next-hop for all MAC addresses associated with removes PE2 as a valid next hop for all MAC addresses associated with
the corresponding <ES, EVI, BD>. Therefore, all the MAC next-hops the corresponding <ES, EVI, BD>. Therefore, all the MAC next hops
for that <ES,EVI, BD> will now have a single next-hop, viz the LSP to for that <ES, EVI, BD> will now have a single next hop, viz. the LSP
PE1. to PE1.
In summary, it can be seen that aliasing (and backup path)
functionality should work as is for inter-AS option B without
requiring any addition functionality in ASBRs or PEs. However, the
mass-withdraw functionality falls back from per-ES mode to per-EVI
mode for inter-AS option B - i.e., PEs receiving mass-withdraw route
from the same AS take action on Ether A-D per ES route; whereas, PEs
receiving mass-withdraw route from different AS take action on Ether
A-D per EVI route.
11 Acknowledgement In summary, it can be seen that Aliasing (and Backup Path)
functionality should work as is for inter-AS Option B without
requiring any additional functionality in ASBRs or PEs. However, the
mass withdrawal functionality falls back from per-ES mode to per-EVI
mode for inter-AS Option B. That is, PEs receiving a mass withdrawal
route from the same AS take action on Ethernet A-D per ES route;
whereas, PEs receiving mass withdrawal routes from different ASes
take action on the Ethernet A-D per EVI route.
The authors would like to thank Aldrin Isaac, David Smith, John 11. Security Considerations
Mullooly, Thomas Nadeau, Samir Thoria, and Jorge Rabadan for their
valuable comments and feedback. The authors would also like to thank
Jakob Heitz for his contribution on section 10.2.
12 Security Considerations This document uses IP-based tunnel technologies to support data-plane
This document uses IP-based tunnel technologies to support data transport. Consequently, the security considerations of those tunnel
plane transport. Consequently, the security considerations of those technologies apply. This document defines support for VXLAN
tunnel technologies apply. This document defines support for VXLAN [RFC7348] and NVGRE encapsulations [RFC7637]. The security
[RFC7348] and NVGRE [RFC7637] encapsulations. The security considerations from those RFCs apply to the data-plane aspects of
considerations from those RFCs apply to the data plane aspects of
this document. this document.
As with [RFC5512], any modification of the information that is used As with [RFC5512], any modification of the information that is used
to form encapsulation headers, to choose a tunnel type, or to choose to form encapsulation headers, to choose a tunnel type, or to choose
a particular tunnel for a particular payload type may lead to user a particular tunnel for a particular payload type may lead to user
data packets getting misrouted, misdelivered, and/or dropped. data packets getting misrouted, misdelivered, and/or dropped.
More broadly, the security considerations for the transport of IP More broadly, the security considerations for the transport of IP
reachability information using BGP are discussed in [RFC4271] and reachability information using BGP are discussed in [RFC4271] and
[RFC4272], and are equally applicable for the extensions described [RFC4272] and are equally applicable for the extensions described in
in this document. this document.
13 IANA Considerations 12. IANA Considerations
This document requests the following BGP Tunnel Encapsulation This document registers the following in the "BGP Tunnel
Attribute Tunnel Types from IANA and they have already been Encapsulation Attribute Tunnel Types" registry.
allocated. The IANA registry needs to point to this document.
Value Name
----- ------------------------
8 VXLAN Encapsulation 8 VXLAN Encapsulation
9 NVGRE Encapsulation 9 NVGRE Encapsulation
10 MPLS Encapsulation 10 MPLS Encapsulation
11 MPLS in GRE Encapsulation 11 MPLS in GRE Encapsulation
12 VXLAN GPE Encapsulation 12 VXLAN GPE Encapsulation
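
For implementers, the registered values above could be captured as a simple enumeration. The Python sketch below copies the values from the table; the `TunnelType` class and its member names are hypothetical and are not part of any registry or library.

```python
from enum import IntEnum


class TunnelType(IntEnum):
    """BGP Tunnel Encapsulation Attribute Tunnel Types listed in this section."""
    VXLAN = 8
    NVGRE = 9
    MPLS = 10
    MPLS_IN_GRE = 11
    VXLAN_GPE = 12


decoded = TunnelType(8)  # map a received tunnel type value to its name
```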
14 References 13. References
14.1 Normative References 13.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
<http://www.rfc-editor.org/info/rfc8174>. May 2017, <https://www.rfc-editor.org/info/rfc8174>.
[RFC7432] Sajassi et al., "BGP MPLS Based Ethernet VPN", RFC 7432,
February 2014
[RFC7348] Mahalingam, M., et al, "VXLAN: A Framework for Overlaying [RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
Virtualized Layer 2 Networks over Layer 3 Networks", RFC 7348, August Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
2014 Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February
2015, <https://www.rfc-editor.org/info/rfc7432>.
[RFC7637] Garg, P., et al., "NVGRE: Network Virtualization using [RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
Generic Routing Encapsulation", RFC 7637, September, 2015 L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
eXtensible Local Area Network (VXLAN): A Framework for
Overlaying Virtualized Layer 2 Networks over Layer 3
Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014,
<https://www.rfc-editor.org/info/rfc7348>.
[RFC5512] Mohapatra, P. and E. Rosen, "The BGP Encapsulation [RFC5512] Mohapatra, P. and E. Rosen, "The BGP Encapsulation
Subsequent Address Family Identifier (SAFI) and the BGP Tunnel Subsequent Address Family Identifier (SAFI) and the BGP
Encapsulation Attribute", RFC 5512, April 2009. Tunnel Encapsulation Attribute", RFC 5512,
DOI 10.17487/RFC5512, April 2009,
<https://www.rfc-editor.org/info/rfc5512>.
[RFC4023] T. Worster et al., "Encapsulating MPLS in IP or Generic [RFC4023] Worster, T., Rekhter, Y., and E. Rosen, Ed.,
Routing Encapsulation (GRE)", RFC 4023, March 2005 "Encapsulating MPLS in IP or Generic Routing Encapsulation
(GRE)", RFC 4023, DOI 10.17487/RFC4023, March 2005,
<https://www.rfc-editor.org/info/rfc4023>.
14.2 Informative References [RFC7637] Garg, P., Ed. and Y. Wang, Ed., "NVGRE: Network
Virtualization Using Generic Routing Encapsulation",
RFC 7637, DOI 10.17487/RFC7637, September 2015,
<https://www.rfc-editor.org/info/rfc7637>.
[RFC7209] Sajassi et al., "Requirements for Ethernet VPN (EVPN)", RFC 13.2. Informative References
7209, May 2014
[RFC4272] S. Murphy, "BGP Security Vulnerabilities Analysis.", [RFC7209] Sajassi, A., Aggarwal, R., Uttaro, J., Bitar, N.,
January 2006. Henderickx, W., and A. Isaac, "Requirements for Ethernet
VPN (EVPN)", RFC 7209, DOI 10.17487/RFC7209, May 2014,
<https://www.rfc-editor.org/info/rfc7209>.
[RFC7364] Narten et al., "Problem Statement: Overlays for Network [RFC4272] Murphy, S., "BGP Security Vulnerabilities Analysis",
Virtualization", RFC 7364, October 2014. RFC 4272, DOI 10.17487/RFC4272, January 2006,
<https://www.rfc-editor.org/info/rfc4272>.
[RFC7365] Lasserre et al., "Framework for DC Network Virtualization", [RFC7364] Narten, T., Ed., Gray, E., Ed., Black, D., Fang, L.,
RFC 7365, October 2014. Kreeger, L., and M. Napierala, "Problem Statement:
Overlays for Network Virtualization", RFC 7364,
DOI 10.17487/RFC7364, October 2014,
<https://www.rfc-editor.org/info/rfc7364>.
[DCI-EVPN-OVERLAY] Rabadan et al., "Interconnect Solution for EVPN [RFC7365] Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y.
Overlay networks", draft-ietf-bess-dci-evpn-overlay-08, work in Rekhter, "Framework for Data Center (DC) Network
progress, February 8, 2018. Virtualization", RFC 7365, DOI 10.17487/RFC7365, October
2014, <https://www.rfc-editor.org/info/rfc7365>.
[RFC4271] Y. Rekhter, Ed., T. Li, Ed., S. Hares, Ed., "A Border [RFC6514] Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP
Gateway Protocol 4 (BGP-4)", January 2006. Encodings and Procedures for Multicast in MPLS/BGP IP
VPNs", RFC 6514, DOI 10.17487/RFC6514, February 2012,
<https://www.rfc-editor.org/info/rfc6514>.
[RFC4364] Rosen, E., et al, "BGP/MPLS IP Virtual Private Networks [RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
(VPNs)", RFC 4364, February 2006. Border Gateway Protocol 4 (BGP-4)", RFC 4271,
DOI 10.17487/RFC4271, January 2006,
<https://www.rfc-editor.org/info/rfc4271>.
[TUNNEL-ENCAP] Rosen et al., "The BGP Tunnel Encapsulation [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
Attribute", draft-ietf-idr-tunnel-encaps-08, work in progress, Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February
January 11, 2018. 2006, <https://www.rfc-editor.org/info/rfc4364>.
[RFC6514] R. Aggarwal et al., "BGP Encodings and Procedures for [TUNNEL-ENCAP]
Multicast in MPLS/BGP IP VPNs", RFC 6514, February 2012 Rosen, E., Ed., Patel, K., and G. Velde, "The BGP Tunnel
Encapsulation Attribute", Work in Progress draft-ietf-idr-
tunnel-encaps-09, February 2018.
[VXLAN-GPE] Maino et al., "Generic Protocol Extension for VXLAN", [DCI-EVPN-OVERLAY]
draft-ietf-nvo3-vxlan-gpe-05, work in progress October 30, 2017. Rabadan, J., Ed., Sathappan, S., Henderickx, W., Sajassi,
A., and J. Drake, "Interconnect Solution for EVPN Overlay
networks", Work in Progress, draft-ietf-bess-dci-evpn-
overlay-10, March 2018.
[GENEVE] J. Gross et al., "Geneve: Generic Network Virtualization [EVPN-GENEVE]
Encapsulation", draft-ietf-nvo3-geneve-05, September 2017 Boutros, S., Sajassi, A., Drake, J., and J. Rabadan, "EVPN
control plane for Geneve", Work in Progress,
draft-boutros-bess-evpn-geneve-02, March 2018.
[EVPN-GENEVE] S. Boutros et al., "EVPN control plane for Geneve", [VXLAN-GPE]
draft-boutros-bess-evpn-geneve-00.txt, June 2017 Maino, F., Kreeger, L., Ed., and U. Elzur, Ed., "Generic
Protocol Extension for VXLAN", Work in Progress,
draft-ietf-nvo3-vxlan-gpe-05, October 2017.
[GENEVE] Gross, J., Ed., Ganga, I., Ed., and T. Sridhar, Ed.,
"Geneve: Generic Network Virtualization Encapsulation",
Work in Progress, draft-ietf-nvo3-geneve-06, March 2018.
[IEEE.802.1Q]
IEEE, "IEEE Standard for Local and metropolitan area
networks - Bridges and Bridged Networks - Media Access
Control (MAC) Bridges and Virtual Bridged Local Area
Networks", IEEE Std 802.1Q.
Acknowledgements
The authors would like to thank Aldrin Isaac, David Smith, John
Mullooly, Thomas Nadeau, Samir Thoria, and Jorge Rabadan for their
valuable comments and feedback. The authors would also like to thank
Jakob Heitz for his contribution on Section 10.2.
Contributors Contributors
S. Salam S. Salam
K. Patel K. Patel
D. Rao D. Rao
S. Thoria S. Thoria
D. Cai D. Cai
Cisco Cisco
Y. Rekhter Y. Rekhter
A. Issac A. Issac
Wen Lin W. Lin
Nischal Sheth N. Sheth
Juniper Juniper
L. Yong L. Yong
Huawei Huawei
Authors' Addresses Authors' Addresses
Ali Sajassi Ali Sajassi (editor)
Cisco Cisco
USA United States of America
Email: sajassi@cisco.com Email: sajassi@cisco.com
John Drake John Drake (editor)
Juniper Networks Juniper Networks
USA United States of America
Email: jdrake@juniper.net Email: jdrake@juniper.net
Nabil Bitar Nabil Bitar
Nokia Nokia
USA United States of America
Email : nabil.bitar@nokia.com
Email: nabil.bitar@nokia.com
R. Shekhar R. Shekhar
Juniper Juniper
USA United States of America
Email: rshekhar@juniper.net Email: rshekhar@juniper.net
James Uttaro James Uttaro
AT&T AT&T
USA United States of America
Email: uttaro@att.com Email: uttaro@att.com
Wim Henderickx Wim Henderickx
Nokia Nokia
USA Copernicuslaan 50
e-mail: wim.henderickx@nokia.com 2018 Antwerp
Belgium
Email: wim.henderickx@nokia.com
 End of changes. 267 change blocks. 
888 lines changed or deleted 930 lines changed or added
