draft-ietf-bess-evpn-overlay-02.txt   draft-ietf-bess-evpn-overlay-03.txt 
L2VPN Workgroup A. Sajassi (Editor) L2VPN Workgroup A. Sajassi (Editor)
INTERNET-DRAFT Cisco INTERNET-DRAFT Cisco
Intended Status: Standards Track J. Drake (Editor) Intended Status: Standards Track J. Drake (Editor)
Juniper Juniper
Nabil Bitar N. Bitar
Verizon Nokia
Aldrin Isaac A. Isaac
Juniper Juniper
James Uttaro J. Uttaro
AT&T AT&T
W. Henderickx W. Henderickx
Alcatel-Lucent Nokia
Expires: April 19, 2016 October 19, 2015 Expires: November 24, 2016 May 24, 2016
A Network Virtualization Overlay Solution using EVPN A Network Virtualization Overlay Solution using EVPN
draft-ietf-bess-evpn-overlay-02 draft-ietf-bess-evpn-overlay-03
Abstract Abstract
This document describes how Ethernet VPN (EVPN) [RFC7432] can be used This document describes how Ethernet VPN (EVPN) [RFC7432] can be used
as a Network Virtualization Overlay (NVO) solution and explores the as a Network Virtualization Overlay (NVO) solution and explores the
various tunnel encapsulation options over IP and their impact on the various tunnel encapsulation options over IP and their impact on the
EVPN control-plane and procedures. In particular, the following EVPN control-plane and procedures. In particular, the following
encapsulation options are analyzed: VXLAN, NVGRE, and MPLS over GRE. encapsulation options are analyzed: VXLAN, NVGRE, and MPLS over GRE.
Status of this Memo Status of this Memo
skipping to change at page 3, line 5 skipping to change at page 3, line 5
8.1.1 Multi-homed Ethernet Segment Auto-Discovery . . . . . . 16 8.1.1 Multi-homed Ethernet Segment Auto-Discovery . . . . . . 16
8.1.2 Fast Convergence and Mass Withdraw . . . . . . . . . . . 16 8.1.2 Fast Convergence and Mass Withdraw . . . . . . . . . . . 16
8.1.3 Split-Horizon . . . . . . . . . . . . . . . . . . . . . 16 8.1.3 Split-Horizon . . . . . . . . . . . . . . . . . . . . . 16
8.1.4 Aliasing and Backup-Path . . . . . . . . . . . . . . . . 17 8.1.4 Aliasing and Backup-Path . . . . . . . . . . . . . . . . 17
8.1.5 DF Election . . . . . . . . . . . . . . . . . . . . . . 17 8.1.5 DF Election . . . . . . . . . . . . . . . . . . . . . . 17
8.2 Impact on EVPN BGP Routes & Attributes . . . . . . . . . . . 18 8.2 Impact on EVPN BGP Routes & Attributes . . . . . . . . . . . 18
8.3 Impact on EVPN Procedures . . . . . . . . . . . . . . . . . 18 8.3 Impact on EVPN Procedures . . . . . . . . . . . . . . . . . 18
8.3.1 Split Horizon . . . . . . . . . . . . . . . . . . . . . 19 8.3.1 Split Horizon . . . . . . . . . . . . . . . . . . . . . 19
8.3.2 Aliasing and Backup-Path . . . . . . . . . . . . . . . . 19 8.3.2 Aliasing and Backup-Path . . . . . . . . . . . . . . . . 19
9 Support for Multicast . . . . . . . . . . . . . . . . . . . . . 19 9 Support for Multicast . . . . . . . . . . . . . . . . . . . . . 20
10 Data Center Interconnections - DCI . . . . . . . . . . . . . . 20 10 Data Center Interconnections - DCI . . . . . . . . . . . . . . 20
10.1 DCI using GWs . . . . . . . . . . . . . . . . . . . . . . . 20 10.1 DCI using GWs . . . . . . . . . . . . . . . . . . . . . . . 21
10.2 DCI using ASBRs . . . . . . . . . . . . . . . . . . . . . . 21 10.2 DCI using ASBRs . . . . . . . . . . . . . . . . . . . . . . 21
10.2.1 ASBR Functionality with NVEs in Hypervisors . . . . . . 22 10.2.1 ASBR Functionality with NVEs in Hypervisors . . . . . . 22
10.2.2 ASBR Functionality with NVEs in TORs . . . . . . . . . 22 10.2.2 ASBR Functionality with NVEs in TORs . . . . . . . . . 22
11 Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . 24 11 Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . 24
12 Security Considerations . . . . . . . . . . . . . . . . . . . 24 12 Security Considerations . . . . . . . . . . . . . . . . . . . 24
13 IANA Considerations . . . . . . . . . . . . . . . . . . . . . 25 13 IANA Considerations . . . . . . . . . . . . . . . . . . . . . 25
14 References . . . . . . . . . . . . . . . . . . . . . . . . . . 25 14 References . . . . . . . . . . . . . . . . . . . . . . . . . . 25
14.1 Normative References . . . . . . . . . . . . . . . . . . . 25 14.1 Normative References . . . . . . . . . . . . . . . . . . . 25
14.2 Informative References . . . . . . . . . . . . . . . . . . 25 14.2 Informative References . . . . . . . . . . . . . . . . . . 26
Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 26 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 27
1 Introduction 1 Introduction
In the context of this document, a Network Virtualization Overlay In the context of this document, a Network Virtualization Overlay
(NVO) is a solution to address the requirements of a multi-tenant (NVO) is a solution to address the requirements of a multi-tenant
data center, especially one with virtualized hosts, e.g., Virtual data center, especially one with virtualized hosts, e.g., Virtual
Machines (VMs). The key requirements of such a solution, as described Machines (VMs). The key requirements of such a solution, as described
in [Problem-Statement], are: in [Problem-Statement], are:
- Isolation of network traffic per tenant - Isolation of network traffic per tenant
skipping to change at page 7, line 28 skipping to change at page 7, line 28
plane for NVO are extremely important. EVPN and the extensions plane for NVO are extremely important. EVPN and the extensions
described herein, are designed with this level of scalability in described herein, are designed with this level of scalability in
mind. mind.
5 Encapsulation Options for EVPN Overlays 5 Encapsulation Options for EVPN Overlays
5.1 VXLAN/NVGRE Encapsulation 5.1 VXLAN/NVGRE Encapsulation
Both VXLAN and NVGRE are examples of technologies that provide a data Both VXLAN and NVGRE are examples of technologies that provide a data
plane encapsulation which is used to transport a packet over the plane encapsulation which is used to transport a packet over the
common physical IP infrastructure between VXLAN Tunnel End Points common physical IP infrastructure between Network Virtualization
(VTEPs) in VXLAN network and Network Virtualization Endpoints (NVEs) Edges (NVEs) - e.g., VXLAN Tunnel End Points (VTEPs) in VXLAN
in NVGRE network. Both of these technologies include the identifier network. Both of these technologies include the identifier of the
of the specific NVO instance, Virtual Network Identifier (VNI) in specific NVO instance, Virtual Network Identifier (VNI) in VXLAN and
VXLAN and Virtual Subnet Identifier (VSID) in NVGRE, in each packet. Virtual Subnet Identifier (VSID) in NVGRE, in each packet. In the
remainder of this document we use VNI as the representation for NVO
instance with the understanding that VSID can equally be used if the
encapsulation is NVGRE unless it is stated otherwise.
Note that a Provider Edge (PE) is equivalent to a VTEP/NVE. Note that a Provider Edge (PE) is equivalent to an NVE/VTEP.
VXLAN encapsulation is based on UDP, with an 8-byte header following VXLAN encapsulation is based on UDP, with an 8-byte header following
the UDP header. VXLAN provides a 24-bit VNI, which typically provides the UDP header. VXLAN provides a 24-bit VNI, which typically provides
a one-to-one mapping to the tenant VLAN ID, as described in a one-to-one mapping to the tenant VLAN ID, as described in
[RFC7348]. In this scenario, the ingress VTEP does not include an [RFC7348]. In this scenario, the ingress VTEP does not include an
inner VLAN tag on the encapsulated frame, and the egress VTEP inner VLAN tag on the encapsulated frame, and the egress VTEP
discards the frames with an inner VLAN tag. This mode of operation in discards the frames with an inner VLAN tag. This mode of operation in
[RFC7348] maps to VLAN Based Service in [RFC7432], where a tenant [RFC7348] maps to VLAN Based Service in [RFC7432], where a tenant
VLAN ID gets mapped to an EVPN instance (EVI). VLAN ID gets mapped to an EVPN instance (EVI).
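As an illustration of the 8-byte VXLAN header and its 24-bit VNI, the following is a minimal sketch of how the header could be packed and parsed. The layout (an 8-bit flags field with the VNI-valid bit 0x08, 24 reserved bits, the 24-bit VNI, and 8 reserved bits) is taken from [RFC7348]; the function names are hypothetical and do not appear in either draft.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag from RFC 7348; must be set for a valid VNI


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header that follows the UDP header."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) followed by 24 reserved bits.
    # Word 2: VNI (24 bits) followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Return the VNI carried in a VXLAN header (hypothetical helper)."""
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return word2 >> 8
```

Because the inner VLAN tag is omitted in this mode, the VNI alone is what the egress VTEP would use to select the bridge table.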
skipping to change at page 8, line 16 skipping to change at page 8, line 18
[NVGRE] encapsulation is based on [GRE] and it mandates the inclusion [NVGRE] encapsulation is based on [GRE] and it mandates the inclusion
of the optional GRE Key field which carries the VSID. There is a one- of the optional GRE Key field which carries the VSID. There is a one-
to-one mapping between the VSID and the tenant VLAN ID, as described to-one mapping between the VSID and the tenant VLAN ID, as described
in [NVGRE] and the inclusion of an inner VLAN tag is prohibited. This in [NVGRE] and the inclusion of an inner VLAN tag is prohibited. This
mode of operation in [NVGRE] maps to VLAN Based Service in mode of operation in [NVGRE] maps to VLAN Based Service in
[RFC7432]. [RFC7432].
As described in the next section there is no change to the encoding As described in the next section there is no change to the encoding
of EVPN routes to support VXLAN or NVGRE encapsulation except for the of EVPN routes to support VXLAN or NVGRE encapsulation except for the
use of BGP Encapsulation extended community. However, there is use of BGP Encapsulation extended community to indicate the
encapsulation type (e.g., VxLAN or NVGRE). However, there is
potential impact to the EVPN procedures depending on where the NVE is potential impact to the EVPN procedures depending on where the NVE is
located (i.e., in hypervisor or TOR) and whether multi-homing located (i.e., in hypervisor or TOR) and whether multi-homing
capabilities are required. capabilities are required.
5.1.1 Virtual Identifiers Scope 5.1.1 Virtual Identifiers Scope
Although VNI or VSID are defined as 24-bit globally unique values, Although VNIs are defined as 24-bit globally unique values, there are
there are scenarios in which it is desirable to use a locally scenarios in which it is desirable to use a locally significant value
significant value for VNI or VSID, especially in the context of data for VNI, especially in the context of data center interconnect:
center interconnect:
5.1.1.1 Data Center Interconnect with Gateway 5.1.1.1 Data Center Interconnect with Gateway
In the case where NVEs in different data centers need to be In the case where NVEs in different data centers need to be
interconnected, and the NVEs need to use VNIs or VSIDs as globally interconnected, and the NVEs need to use VNIs as globally unique
unique identifiers within a data center, then a Gateway needs to be identifiers within a data center, then a Gateway needs to be employed
employed at the edge of the data center network. This is because the at the edge of the data center network. This is because the Gateway
Gateway will provide the functionality of translating the VNI or VSID will provide the functionality of translating the VNI when crossing
when crossing network boundaries, which may align with operator span network boundaries, which may align with operator span of control
of control boundaries. As an example, consider the network of Figure boundaries. As an example, consider the network of Figure 1 below.
1 below. Assume there are three network operators: one for each of Assume there are three network operators: one for each of the DC1,
the DC1, DC2 and WAN networks. The Gateways at the edge of the data DC2 and WAN networks. The Gateways at the edge of the data centers
centers are responsible for translating the VNIs / VSIDs between the are responsible for translating the VNIs between the values used in
values used in each of the data center networks and the values used each of the data center networks and the values used in the WAN.
in the WAN.
+--------------+ +--------------+
| | | |
+---------+ | WAN | +---------+ +---------+ | WAN | +---------+
+----+ | +---+ +----+ +----+ +---+ | +----+ +----+ | +---+ +----+ +----+ +---+ | +----+
|NVE1|--| | | |WAN | |WAN | | | |--|NVE3| |NVE1|--| | | |WAN | |WAN | | | |--|NVE3|
+----+ |IP |GW |--|Edge| |Edge|--|GW | IP | +----+ +----+ |IP |GW |--|Edge| |Edge|--|GW | IP | +----+
+----+ |Fabric +---+ +----+ +----+ +---+ Fabric | +----+ +----+ |Fabric +---+ +----+ +----+ +---+ Fabric | +----+
|NVE2|--| | | | | |--|NVE4| |NVE2|--| | | | | |--|NVE4|
+----+ +---------+ +--------------+ +---------+ +----+ +----+ +---------+ +--------------+ +---------+ +----+
|<------ DC 1 ------> <------ DC2 ------>| |<------ DC 1 ------> <------ DC2 ------>|
Figure 1: Data Center Interconnect with Gateway Figure 1: Data Center Interconnect with Gateway
5.1.1.2 Data Center Interconnect without Gateway 5.1.1.2 Data Center Interconnect without Gateway
In the case where NVEs in different data centers need to be In the case where NVEs in different data centers need to be
interconnected, and the NVEs need to use locally assigned VNIs or interconnected, and the NVEs need to use locally assigned VNIs (e.g.,
VSIDs (e.g., as MPLS labels), then there may be no need to employ similar to MPLS labels), then there may be no need to employ Gateways
Gateways at the edge of the data center network. More specifically, at the edge of the data center network. More specifically, the VNI
the VNI or VSID value that is used by the transmitting NVE is value that is used by the transmitting NVE is allocated by the NVE
allocated by the NVE that is receiving the traffic (in other words, that is receiving the traffic (in other words, this is similar to
this is a "downstream assigned" MPLS label). This allows the VNI or "downstream assigned" MPLS label). This allows the VNI space to be
VSID space to be decoupled between different data center networks decoupled between different data center networks without the need for
without the need for a dedicated Gateway at the edge of the data a dedicated Gateway at the edge of the data centers. This topic is
centers. covered in section 10.2.
+--------------+ +--------------+
| | | |
+---------+ | WAN | +---------+ +---------+ | WAN | +---------+
+----+ | | +----+ +----+ | | +----+ +----+ | | +----+ +----+ | | +----+
|NVE1|--| | |WAN | |WAN | | |--|NVE3| |NVE1|--| | |ASBR| |ASBR| | |--|NVE3|
+----+ |IP Fabric|---|Edge| |Edge|--|IP Fabric| +----+ +----+ |IP Fabric|---| | | |--|IP Fabric| +----+
+----+ | | +----+ +----+ | | +----+ +----+ | | +----+ +----+ | | +----+
|NVE2|--| | | | | |--|NVE4| |NVE2|--| | | | | |--|NVE4|
+----+ +---------+ +--------------+ +---------+ +----+ +----+ +---------+ +--------------+ +---------+ +----+
|<------ DC 1 -----> <---- DC2 ------>| |<------ DC 1 -----> <---- DC2 ------>|
Figure 2: Data Center Interconnect without Gateway Figure 2: Data Center Interconnect with ASBR
5.1.2 Virtual Identifiers to EVI Mapping 5.1.2 Virtual Identifiers to EVI Mapping
When the EVPN control plane is used in conjunction with VXLAN or When the EVPN control plane is used in conjunction with VXLAN (or
NVGRE, two options for mapping the VXLAN VNI or NVGRE VSID to an EVI NVGRE encapsulation), two options for mapping the VXLAN VNI (or NVGRE
are possible: VSID) to an EVI are possible:
1. Option 1: Single Subnet per EVI 1. Option 1: Single Subnet per EVI
In this option, a single subnet represented by a VNI or VSID is In this option, a single subnet represented by a VNI is mapped to a
mapped to a unique EVI. This corresponds to the VLAN Based service in unique EVI. This corresponds to the VLAN Based service in [RFC7432],
[RFC7432], where a tenant VLAN ID gets mapped to an EVPN instance where a tenant VLAN ID gets mapped to an EVPN instance (EVI). As
(EVI). As such, a BGP RD and RT is needed per VNI / VSID on every such, a BGP RD and RT is needed per VNI on every NVE. The advantage
VTEP. The advantage of this model is that it allows the BGP RT of this model is that it allows the BGP RT constraint mechanisms to
constraint mechanisms to be used in order to limit the propagation be used in order to limit the propagation and import of routes to
and import of routes to only the VTEPs that are interested in a given only the NVEs that are interested in a given VNI. The disadvantage of
VNI or VSID. The disadvantage of this model may be the provisioning this model may be the provisioning overhead if RD and RT are not
overhead if RD and RT are not derived automatically from VNI or derived automatically from VNI.
VSID.
In this option, the MAC-VRF table is identified by the RT in the In this option, the MAC-VRF table is identified by the RT in the
control plane and by the VNI or VSID in the data-plane. In this control plane and by the VNI in the data-plane. In this option, the
option, the specific MAC-VRF table corresponds to only a single specific MAC-VRF table corresponds to only a single bridge table.
bridge table.
2. Option 2: Multiple Subnets per EVI 2. Option 2: Multiple Subnets per EVI
In this option, multiple subnets each represented by a unique VNI or In this option, multiple subnets each represented by a unique VNI are
VSID are mapped to a single EVI. For example, if a tenant has mapped to a single EVI. For example, if a tenant has multiple
multiple segments/subnets each represented by a VNI or VSID, then all segments/subnets each represented by a VNI, then all the VNIs for
the VNIs (or VSIDs) for that tenant are mapped to a single EVI - that tenant are mapped to a single EVI - e.g., the EVI in this case
e.g., the EVI in this case represents the tenant and not a subnet. represents the tenant and not a subnet. This corresponds to the
This corresponds to the VLAN-Aware Bundle service in [RFC7432]. The VLAN-aware bundle service in [RFC7432]. The advantage of this model
advantage of this model is that it doesn't require the provisioning is that it doesn't require the provisioning of RD/RT per VNI.
of RD/RT per VNI or VSID. However, this is a moot point if option 1 However, this is a moot point if option 1 with auto-derivation is
with auto-derivation is used. The disadvantage of this model is that used. The disadvantage of this model is that routes would be imported
routes would be imported by VTEPs that may not be interested in a by NVEs that may not be interested in a given VNI.
given VNI or VSID.
In this option the MAC-VRF table is identified by the RT in the In this option the MAC-VRF table is identified by the RT in the
control plane and a specific bridge table for that MAC-VRF is control plane and a specific bridge table for that MAC-VRF is
identified by the <RT, Ethernet Tag ID> in the control plane. In this identified by the <RT, Ethernet Tag ID> in the control plane. In this
option, the VNI/VSID in the data-plane is sufficient to identify a option, the VNI in the data-plane is sufficient to identify a
specific bridge table - e.g., no need to do a lookup based on specific bridge table - e.g., no need to do a lookup based on VNI and
VNI/VSID and Ethernet Tag ID fields to identify a bridge table. Ethernet Tag ID fields to identify a bridge table.
5.1.2.1 Auto Derivation of RT 5.1.2.1 Auto Derivation of RT
When the option of a single VNI or VSID per EVI is used, it is When the option of a single VNI per EVI is used, it is important to
important to auto-derive RT for EVPN BGP routes in order to simplify auto-derive RT for EVPN BGP routes in order to simplify configuration
configuration for data center operations. RD can be derived easily as for data center operations. RD can be auto generated as described in
described in [RFC7432] and RT can be auto-derived as described next. [RFC7432] and RT can be auto-derived as described next.
Since a gateway PE as depicted in figure-1 participates in both the Since a gateway PE as depicted in figure-1 participates in both the
DCN and WAN BGP sessions, it is important that when RT values are DCN and WAN BGP sessions, it is important that when RT values are
auto-derived for VNIs (or VSIDs), there is no conflict in RT spaces auto-derived for VNIs, there is no conflict in RT spaces between DCN
between DCN and WAN networks assuming that both are operating within and WAN networks assuming that both are operating within the same AS.
the same AS. Also, there can be scenarios where both VXLAN and NVGRE Also, there can be scenarios where both VXLAN and NVGRE
encapsulations may be needed within the same DCN and their encapsulations may be needed within the same DCN and their
corresponding VNIs and VSIDs are administered independently which corresponding VNIs are administered independently which means VNI
means VNI and VSID spaces can overlap. In order to ensure that no spaces can overlap. In order to ensure that no such conflict in RT
such conflict in RT spaces arises, RT values for DCNs are auto-derived spaces arises, RT values for DCNs are auto-derived as follows:
as follows:
0 1 2 3 4 0 1 2 3 4
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 0 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 0
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+---+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+---+
| AS # |A| TYPE| D-ID |Service Instance ID| | AS # |A| TYPE| D-ID |Service Instance ID|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+---+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+---+
- 2 bytes of global admin field of the RT is set to the AS number. - 2 bytes of global admin field of the RT is set to the AS number.
- Three least significant bytes of the local admin field of the RT is - Three least significant bytes of the local admin field of the RT is
set to the VNI or VSID, I-SID, or VID. The most significant bit of set to the VNI, VSID, I-SID, or VID.
the local admin field of the RT is set as follows:
- The most significant bit of the local admin field of the RT is set
as follows:
0: auto-derived 0: auto-derived
1: manually-derived 1: manually-derived
- The next 3 bits of the most significant byte of the local admin - The next 3 bits of the most significant byte of the local admin
field of the RT identifies the space in which the other 3 bytes are field of the RT identifies the space in which the other 3 bytes are
defined. The following spaces are defined: defined. The following spaces are defined:
0 : VID 0 : VID
1 : VXLAN 1 : VXLAN
2 : NVGRE 2 : NVGRE
3 : I-SID 3 : I-SID
skipping to change at page 12, line 4 skipping to change at page 11, line 46
- The remaining 4 bits of the most significant byte of the local - The remaining 4 bits of the most significant byte of the local
admin field of the RT identifies the domain-id. The default value of admin field of the RT identifies the domain-id. The default value of
domain-id is zero, indicating that only a single numbering space exists domain-id is zero, indicating that only a single numbering space exists
for a given technology. However, if more than one number for a given technology. However, if more than one number
space exists for a given technology (e.g., overlapping VXLAN spaces), space exists for a given technology (e.g., overlapping VXLAN spaces),
then each of the number spaces needs to be identified by its then each of the number spaces needs to be identified by its
corresponding domain-id starting from 1. corresponding domain-id starting from 1.
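The auto-derivation rules above can be sketched as follows. This is only an illustration under stated assumptions: it assumes a two-octet AS number, a 4-byte local admin field laid out as A (1 bit), TYPE (3 bits), D-ID (4 bits) and Service Instance ID (24 bits), and only the four space values listed in this excerpt. The function and constant names are hypothetical.

```python
# Space identifiers listed in this excerpt; any further values are not shown here.
SPACE_TYPES = {"VID": 0, "VXLAN": 1, "NVGRE": 2, "I-SID": 3}


def derive_rt(as_number: int, space: str, service_id: int,
              domain_id: int = 0, auto: bool = True) -> tuple:
    """Return (global admin, local admin) for an RT built per the rules above."""
    if not 0 <= as_number <= 0xFFFF:
        raise ValueError("this sketch assumes a 2-byte AS number")
    if not 0 <= service_id < (1 << 24):
        raise ValueError("service instance ID must fit in 3 bytes")
    if not 0 <= domain_id < (1 << 4):
        raise ValueError("domain-id must fit in 4 bits")
    a_bit = 0 if auto else 1          # 0: auto-derived, 1: manually-derived
    local_admin = (
        (a_bit << 31)                 # most significant bit of the local admin field
        | (SPACE_TYPES[space] << 28)  # next 3 bits: numbering space
        | (domain_id << 24)           # remaining 4 bits of the top byte: domain-id
        | service_id                  # three least significant bytes
    )
    return as_number, local_admin


def format_rt(as_number: int, local_admin: int) -> str:
    """Render the RT in the common 'AS:value' text form (presentation assumption)."""
    return f"{as_number}:{local_admin}"
```

For example, under these assumptions VXLAN VNI 10010 in AS 65000 with the default domain-id yields the local admin value 0x1000271A, while the same number in the NVGRE space yields 0x2000271A, so the two spaces cannot collide.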
5.1.3 Constructing EVPN BGP Routes 5.1.3 Constructing EVPN BGP Routes
In EVPN, an MPLS label is distributed by the egress PE via the EVPN In EVPN, an MPLS label is distributed by the egress PE via the EVPN
control plane and is placed in the MPLS header of a given packet by control plane and is placed in the MPLS header of a given packet by
the ingress PE. This label is used upon receipt of that packet by the the ingress PE. This label is used upon receipt of that packet by the
egress PE for disposition of that packet. This is very similar to the egress PE for disposition of that packet. This is very similar to the
use of the VNI or VSID by the egress VTEP or NVE, respectively, with use of the VNI by the egress NVE, with the difference being that an
the difference being that an MPLS label has local significance while MPLS label has local significance while a VNI typically has global
a VNI or VSID typically has global significance. Accordingly, and significance. Accordingly, and specifically to support the option of
specifically to support the option of locally assigned VNIs, the MPLS locally assigned VNIs, the MPLS label field in the MAC Advertisement,
label field in the MAC Advertisement, Ethernet AD per EVI, and Ethernet AD per EVI, and Inclusive Multicast Ethernet Tag routes is
Inclusive Multicast Ethernet Tag routes is used to carry the VNI or used to carry the VNI. For the balance of this memo, the MPLS label
VSID. For the balance of this memo, the MPLS label field will be field will be referred to as the VNI field. The VNI field is used for
referred to as the VNI/VSID field. The VNI/VSID field is used for both local and global VNIs, and for either case the entire 24-bit
both local and global VNIs/VSIDs, and for either case the entire 24- field is used to encode the VNI value.
bit field is used to encode the VNI/VSID value.
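A minimal sketch of the encoding described above, in which the entire 3-octet field carries the VNI. The helper names are hypothetical, and network byte order is an assumption of this sketch rather than something stated in this excerpt.

```python
def encode_vni_field(vni: int) -> bytes:
    """Encode a VNI into the 3-octet field, using all 24 bits."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    return vni.to_bytes(3, "big")


def decode_vni_field(field: bytes) -> int:
    """Recover the VNI from the 3-octet field."""
    if len(field) != 3:
        raise ValueError("the field is exactly 3 octets")
    return int.from_bytes(field, "big")
```

The same routines would apply whether the VNI is globally or locally significant, since only the allocation scope differs and not the encoding.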
For the VLAN-based service (a single VNI per MAC-VRF), the Ethernet For the VLAN-based service (a single VNI per MAC-VRF), the Ethernet
Tag field in the MAC/IP Advertisement, Ethernet AD per EVI, and Tag field in the MAC/IP Advertisement, Ethernet AD per EVI, and
Inclusive Multicast route MUST be set to zero just as in the VLAN Inclusive Multicast route MUST be set to zero just as in the VLAN
Based service in [RFC7432]. Based service in [RFC7432].
For the VLAN-aware bundle service (multiple VNIs per MAC-VRF with For the VLAN-aware bundle service (multiple VNIs per MAC-VRF with
each VNI associated with its own bridge table), the Ethernet Tag each VNI associated with its own bridge table), the Ethernet Tag
field in the MAC Advertisement, Ethernet AD per EVI, and Inclusive field in the MAC Advertisement, Ethernet AD per EVI, and Inclusive
Multicast route MUST identify a bridge table within a MAC-VRF and the Multicast route MUST identify a bridge table within a MAC-VRF and the
skipping to change at page 13, line 7 skipping to change at page 13, line 5
Segment) advertised by an egress PE. Five new values have been Segment) advertised by an egress PE. Five new values have been
assigned by IANA to extend the list of encapsulation types defined in assigned by IANA to extend the list of encapsulation types defined in
[RFC5512]: [RFC5512]:
+ 8 - VXLAN Encapsulation + 8 - VXLAN Encapsulation
+ 9 - NVGRE Encapsulation + 9 - NVGRE Encapsulation
+ 10 - MPLS Encapsulation + 10 - MPLS Encapsulation
+ 11 - MPLS in GRE Encapsulation + 11 - MPLS in GRE Encapsulation
+ 12 - VXLAN GPE Encapsulation + 12 - VXLAN GPE Encapsulation
If the BGP Encapsulation extended community is not present, then the Note that the MPLS encapsulation tunnel type is needed in order to
default MPLS encapsulation or a statically configured encapsulation distinguish between an advertising node that only supports non-MPLS
is assumed. encapsulations and one that supports MPLS and non-MPLS
encapsulations. An advertising node that only supports MPLS
encapsulation does not need to advertise any encapsulation tunnel
types; i.e., if the BGP Encapsulation extended community is not
present, then either MPLS encapsulation or a statically configured
encapsulation is assumed.
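To make the tunnel type values and the default rule concrete, the following sketch encodes the BGP Encapsulation extended community and derives the set of encapsulations a route advertises. The wire layout used (type 0x03, sub-type 0x0c, 4 reserved octets, 2-octet tunnel type) is taken from [RFC5512]. The function names are hypothetical, and the sketch assumes every community passed in is exactly 8 octets long.

```python
import struct
from typing import Iterable, Set

# Tunnel type values listed above.
TUNNEL_TYPES = {8: "VXLAN", 9: "NVGRE", 10: "MPLS", 11: "MPLS in GRE", 12: "VXLAN GPE"}
MPLS = 10


def pack_encap_community(tunnel_type: int) -> bytes:
    """Encode one Encapsulation extended community (8 octets)."""
    return struct.pack("!BBIH", 0x03, 0x0C, 0, tunnel_type)


def advertised_encaps(communities: Iterable[bytes], default: int = MPLS) -> Set[int]:
    """Return the tunnel types a route advertises.

    If no Encapsulation extended community is present, fall back to
    `default`, which stands in for MPLS or a statically configured
    encapsulation.
    """
    encaps = set()
    for community in communities:
        ctype, subtype, _reserved, tunnel = struct.unpack("!BBIH", community)
        if (ctype, subtype) == (0x03, 0x0C):
            encaps.add(tunnel)
    return encaps or {default}
```

A node that supports both MPLS and VXLAN would therefore attach two communities (types 10 and 8), whereas an MPLS-only node may attach none.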
The Ethernet Segment and Ethernet AD per ESI routes MAY be advertised The Ethernet Segment and Ethernet AD per ESI routes MAY be advertised
with multiple encapsulation types as long as they use the same EVPN with multiple encapsulation types as long as they use the same EVPN
multi-homing procedures - e.g., the mix of VXLAN and NVGRE multi-homing procedures - e.g., the mix of VXLAN and NVGRE
encapsulation types is a valid one but not the mix of VXLAN and MPLS encapsulation types is a valid one but not the mix of VXLAN and MPLS
encapsulation types. encapsulation types.
The Next Hop field of the MP_REACH_NLRI attribute of the route MUST The Next Hop field of the MP_REACH_NLRI attribute of the route MUST
be set to the IPv4 or IPv6 address of the NVE. The remaining fields be set to the IPv4 or IPv6 address of the NVE. The remaining fields
in each route are set as per [RFC7432]. in each route are set as per [RFC7432].
5.2 MPLS over GRE 5.2 MPLS over GRE
The EVPN data-plane is modeled as an EVPN MPLS client layer sitting The EVPN data-plane is modeled as an EVPN MPLS client layer sitting
over an MPLS PSN tunnel. Some of the EVPN functions (split-horizon, over an MPLS PSN-tunnel server layer. Some of the EVPN functions
aliasing, and backup-path) are tied to the MPLS client layer. If MPLS (split-horizon, aliasing, and backup-path) are tied to the MPLS
over GRE encapsulation is used, then the EVPN MPLS client layer can client layer. If MPLS over GRE encapsulation is used, then the EVPN
be carried over an IP PSN tunnel transparently. Therefore, there is MPLS client layer can be carried over an IP PSN tunnel transparently.
no impact to the EVPN procedures and associated data-plane Therefore, there is no impact to the EVPN procedures and associated
operation. data-plane operation.
The existing standards for MPLS over GRE encapsulation as defined by The existing standards for MPLS over GRE encapsulation as defined by
[RFC4023] can be used for this purpose; however, when it is used in [RFC4023] can be used for this purpose; however, when it is used in
conjunction with EVPN the key field SHOULD be present, and SHOULD be conjunction with EVPN the GRE key field SHOULD be present, and SHOULD
used to provide a 32-bit entropy field. The Checksum and Sequence be used to provide a 32-bit entropy field. The Checksum and Sequence
Number fields are not needed and their corresponding C and S bits Number fields are not needed and their corresponding C and S bits
MUST be set to zero. MUST be set to zero.
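The GRE header settings described above can be sketched as follows. The flag bit positions follow the GRE key and sequence number extensions, and 0x8847 is the EtherType for MPLS unicast. The use of CRC-32 over a flow identifier to produce the 32-bit entropy value is purely an assumption of this sketch, as are the function names; the text above does not specify how entropy is computed.

```python
import struct
import zlib

GRE_FLAG_CHECKSUM = 0x8000   # C bit, MUST be zero here
GRE_FLAG_KEY = 0x2000        # K bit, set because the key field is present
GRE_FLAG_SEQUENCE = 0x1000   # S bit, MUST be zero here
ETHERTYPE_MPLS_UNICAST = 0x8847


def flow_entropy(flow_id: bytes) -> int:
    """Hypothetical entropy function: hash a flow identifier to 32 bits."""
    return zlib.crc32(flow_id) & 0xFFFFFFFF


def pack_gre_header(key: int) -> bytes:
    """Build an 8-byte GRE header with only the key field present."""
    if not 0 <= key <= 0xFFFFFFFF:
        raise ValueError("GRE key is a 32-bit field")
    # Flags/version word: K=1, C=0, S=0, version 0; then protocol type; then key.
    return struct.pack("!HHI", GRE_FLAG_KEY, ETHERTYPE_MPLS_UNICAST, key)
```

A sender might call `pack_gre_header(flow_entropy(five_tuple_bytes))`, where `five_tuple_bytes` is a hypothetical serialization of the inner flow, so that transit nodes hashing on the key spread flows across equal-cost paths.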
6 EVPN with Multiple Data Plane Encapsulations 6 EVPN with Multiple Data Plane Encapsulations
The use of the BGP Encapsulation extended community allows each PE in The use of the BGP Encapsulation extended community per [RFC5512]
a given EVI to know each of the encapsulations supported by each of allows each NVE in a given EVI to know each of the encapsulations
the other PEs in that EVI. I.e., each of the PEs in a given EVI may supported by each of the other NVEs in that EVI. i.e., each of the
support multiple data plane encapsulations. An ingress PE can send a NVEs in a given EVI may support multiple data plane encapsulations.
frame to an egress PE only if the set of encapsulations advertised by An ingress NVE can send a frame to an egress NVE only if the set of
the egress PE in the subject MAC/IP Advertisement or per EVI Ethernet encapsulations advertised by the egress NVE in the subject MAC/IP
AD route, forms a non-empty intersection with the set of Advertisement or per EVI Ethernet AD route, forms a non-empty
encapsulations supported by the ingress PE, and it is at the intersection with the set of encapsulations supported by the ingress
discretion of the ingress PE which encapsulation to choose from this NVE, and it is at the discretion of the ingress NVE which
intersection. (As noted in section 5.1.3, if the BGP Encapsulation encapsulation to choose from this intersection. (As noted in
extended community is not present, then the default MPLS section 5.1.3, if the BGP Encapsulation extended community is not
encapsulation or a statically configured encapsulation is assumed.) present, then the default MPLS encapsulation or a statically
configured encapsulation is assumed.)
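The selection rule in this paragraph can be sketched as a set intersection. The preference list is a hypothetical local policy input, since the choice within the intersection is left to the ingress node; the function and parameter names are likewise assumptions of this sketch.

```python
from typing import Iterable, Optional, Sequence

MPLS = 10  # tunnel type value for MPLS encapsulation


def choose_encap(ingress_supported: Iterable[int],
                 egress_advertised: Iterable[int],
                 preference: Sequence[int],
                 default: int = MPLS) -> Optional[int]:
    """Pick an encapsulation for sending to an egress node, or None.

    egress_advertised is the set of tunnel types carried on the egress
    node's route; if it is empty, `default` (MPLS or a statically
    configured encapsulation) is assumed.
    """
    advertised = set(egress_advertised) or {default}
    usable = set(ingress_supported) & advertised
    # The ingress node may choose freely; here it follows a local preference order.
    for tunnel_type in preference:
        if tunnel_type in usable:
            return tunnel_type
    return None  # empty intersection: the ingress node cannot send to this egress
```

A return value of None corresponds to the misconfiguration case in which two nodes in the same EVI share no common encapsulation.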
An ingress node that uses shared multicast trees for sending An ingress node that uses shared multicast trees for sending
broadcast or multicast frames MUST maintain distinct trees for each broadcast or multicast frames MUST maintain distinct trees for each
different encapsulation type. different encapsulation type.
It is the responsibility of the operator of a given EVI to ensure It is the responsibility of the operator of a given EVI to ensure
that all of the PEs in that EVI support at least one common that all of the NVEs in that EVI support at least one common
encapsulation. If this condition is violated, it could result in encapsulation. If this condition is violated, it could result in
service disruption or failure. The use of the BGP Encapsulation service disruption or failure. The use of the BGP Encapsulation
extended community provides a method to detect when this condition is extended community provides a method to detect when this condition is
violated but the actions to be taken are at the discretion of the violated but the actions to be taken are at the discretion of the
operator and are outside the scope of this document. operator and are outside the scope of this document.
7 NVE Residing in Hypervisor 7 NVE Residing in Hypervisor
When a PE and its CEs are co-located in the same physical device, When a NVE and its hosts/VMs are co-located in the same physical
e.g., when the PE resides in a server and the CEs are its VMs, the device, e.g., when they reside in a server, the links between them
links between them are virtual and they typically share fate; i.e., are virtual and they typically share fate; i.e., the subject
the subject CEs are typically not multi-homed or if they are multi- hosts/VMs are typically not multi-homed or if they are multi-homed,
homed, the multi-homing is a purely local matter to the server the multi-homing is a purely local matter to the server hosting the
hosting the VM, and need not be "visible" to any other PEs, and thus VM and the NVEs, and need not be "visible" to any other NVEs residing
does not require any specific protocol mechanisms. The most common on other servers, and thus does not require any specific protocol
case of this is when the NVE resides in the hypervisor. mechanisms. The most common case of this is when the NVE resides on
the hypervisor.
In the sub-sections that follow, we will discuss the impact on EVPN In the sub-sections that follow, we will discuss the impact on EVPN
procedures for the case when the NVE resides on the hypervisor and procedures for the case when the NVE resides on the hypervisor and
the VXLAN or NVGRE encapsulation is used. the VXLAN (or NVGRE) encapsulation is used.
7.1 Impact on EVPN BGP Routes & Attributes for VXLAN/NVGRE Encapsulation 7.1 Impact on EVPN BGP Routes & Attributes for VXLAN/NVGRE Encapsulation
In the scenario where all data centers are under a single In the scenario where all data centers are under a single
administrative domain, and there is a single global VNI/VSID space, administrative domain, and there is a single global VNI space, the RD
the RD MAY be set to zero in the EVPN routes. However, in the MAY be set to zero in the EVPN routes. However, in the scenario where
scenario where different groups of data centers are under different different groups of data centers are under different administrative
administrative domains, and these data centers are connected via one domains, and these data centers are connected via one or more
or more backbone core providers as described in [NOV3-Framework], the backbone core providers as described in [NOV3-Framework], the RD must
RD must be a unique value per EVI or per NVE as described in be a unique value per EVI or per NVE as described in [RFC7432]. In
[RFC7432]. In other words, whenever there is more than one other words, whenever there is more than one administrative domain
administrative domain for global VNI or VSID, then a non-zero RD MUST for global VNI, then a non-zero RD MUST be used, or whenever the VNI
be used, or whenever the VNI or VSID value has local significance, value has local significance, then a non-zero RD MUST be used. It is
then a non-zero RD MUST be used. It is recommended to use a non-zero RD recommended to use a non-zero RD at all times.
at all times.
When the NVEs reside on the hypervisor, the EVPN BGP routes and When the NVEs reside on the hypervisor, the EVPN BGP routes and
attributes associated with multi-homing are no longer required. This attributes associated with multi-homing are no longer required. This
reduces the required routes and attributes to the following subset of reduces the required routes and attributes to the following subset of
four out of the set of eight : four out of eight:
- MAC Advertisement Route - MAC/IP Advertisement Route
- Inclusive Multicast Ethernet Tag Route - Inclusive Multicast Ethernet Tag Route
- MAC Mobility Extended Community - MAC Mobility Extended Community
- Default Gateway Extended Community - Default Gateway Extended Community
However, as noted in section 8.6 of [RFC7432] in order to enable a However, as noted in section 8.6 of [RFC7432] in order to enable a
single-homing ingress PE to take advantage of fast convergence, single-homing ingress NVE to take advantage of fast convergence,
aliasing, and backup-path when interacting with multi-homed egress aliasing, and backup-path when interacting with multi-homed egress
PEs attached to a given Ethernet segment, a single-homing ingress PE NVEs attached to a given Ethernet segment, the single-homing ingress
SHOULD be able to receive and process Ethernet AD per ES and Ethernet NVE SHOULD be able to receive and process Ethernet AD per ES and
AD per EVI routes." Ethernet AD per EVI routes.
7.2 Impact on EVPN Procedures for VXLAN/NVGRE Encapsulation 7.2 Impact on EVPN Procedures for VXLAN/NVGRE Encapsulation
When the NVEs reside on the hypervisors, the EVPN procedures When the NVEs reside on the hypervisors, the EVPN procedures
associated with multi-homing are no longer required. This limits the associated with multi-homing are no longer required. This limits the
procedures on the NVE to the following subset of the EVPN procedures: procedures on the NVE to the following subset of the EVPN procedures:
1. Local learning of MAC addresses received from the VMs per section 1. Local learning of MAC addresses received from the VMs per section
10.1 of [RFC7432]. 10.1 of [RFC7432].
2. Advertising locally learned MAC addresses in BGP using the MAC 2. Advertising locally learned MAC addresses in BGP using the MAC/IP
Advertisement routes. Advertisement routes.
3. Performing remote learning using BGP per Section 10.2 of 3. Performing remote learning using BGP per Section 10.2 of
[RFC7432]. [RFC7432].
4. Discovering other NVEs and constructing the multicast tunnels 4. Discovering other NVEs and constructing the multicast tunnels
using the Inclusive Multicast Ethernet Tag routes. using the Inclusive Multicast Ethernet Tag routes.
5. Handling MAC address mobility events per the procedures of Section 5. Handling MAC address mobility events per the procedures of Section
16 in [RFC7432]. 16 in [RFC7432].
However, as noted in section 8.6 of [RFC7432] in order to enable a However, as noted in section 8.6 of [RFC7432] in order to enable a
single-homing ingress PE to take advantage of fast convergence, single-homing ingress NVE to take advantage of fast convergence,
aliasing, and back-up path when interacting with multi-homed egress aliasing, and back-up path when interacting with multi-homed egress
PEs attached to a given Ethernet segment, a single-homing ingress PE NVEs attached to a given Ethernet segment, a single-homing ingress
SHOULD implement the ingress node processing of Ethernet AD per ES NVE SHOULD implement the ingress node processing of Ethernet AD per
and Ethernet AD per EVI routes as defined in sections 8.2 Fast ES and Ethernet AD per EVI routes as defined in sections 8.2 Fast
Convergence and 8.4 Aliasing and Backup-Path of [RFC7432]. Convergence and 8.4 Aliasing and Backup-Path of [RFC7432].
8 NVE Residing in ToR Switch 8 NVE Residing in ToR Switch
In this section, we discuss the scenario where the NVEs reside in the In this section, we discuss the scenario where the NVEs reside in the
Top of Rack (ToR) switches AND the servers (where VMs are residing) Top of Rack (ToR) switches AND the servers (where VMs are residing)
are multi-homed to these ToR switches. The multi-homing may operate are multi-homed to these ToR switches. The multi-homing may operate
in All-Active or Single-Active redundancy mode. If the servers are in All-Active or Single-Active redundancy mode. If the servers are
single-homed to the ToR switches, then the scenario becomes similar single-homed to the ToR switches, then the scenario becomes similar
to that where the NVE resides in the hypervisor, as discussed in to that where the NVE resides on the hypervisor, as discussed in
Section 5, as far as the required EVPN functionality. Section 7, as far as the required EVPN functionality are concerned.
[RFC7432] defines a set of BGP routes, attributes and procedures to [RFC7432] defines a set of BGP routes, attributes and procedures to
support multi-homing. We first describe these functions and support multi-homing. We first describe these functions and
procedures, then discuss which of these are impacted by the procedures, then discuss which of these are impacted by the VxLAN
encapsulation (such as VXLAN or NVGRE) and what modifications are (or NVGRE) encapsulation and what modifications are required.
required.
8.1 EVPN Multi-Homing Features 8.1 EVPN Multi-Homing Features
In this section, we will recap the multi-homing features of EVPN to In this section, we will recap the multi-homing features of EVPN to
highlight the encapsulation dependencies. The section only describes highlight the encapsulation dependencies. The section only describes
the features and functions at a high-level. For more details, the the features and functions at a high-level. For more details, the
reader is to refer to [RFC7432]. reader is to refer to [RFC7432].
8.1.1 Multi-homed Ethernet Segment Auto-Discovery 8.1.1 Multi-homed Ethernet Segment Auto-Discovery
skipping to change at page 16, line 43 skipping to change at page 16, line 44
of a failure in connectivity to an Ethernet segment (e.g., a link or of a failure in connectivity to an Ethernet segment (e.g., a link or
a port failure). This is done by having each NVE advertise an a port failure). This is done by having each NVE advertise an
Ethernet A-D Route per Ethernet segment for each locally attached Ethernet A-D Route per Ethernet segment for each locally attached
segment. Upon a failure in connectivity to the attached segment, the segment. Upon a failure in connectivity to the attached segment, the
NVE withdraws the corresponding Ethernet A-D route. This triggers all NVE withdraws the corresponding Ethernet A-D route. This triggers all
NVEs that receive the withdrawal to update their next-hop adjacencies NVEs that receive the withdrawal to update their next-hop adjacencies
for all MAC addresses associated with the Ethernet segment in for all MAC addresses associated with the Ethernet segment in
question. If no other NVE had advertised an Ethernet A-D route for question. If no other NVE had advertised an Ethernet A-D route for
the same segment, then the NVE that received the withdrawal simply the same segment, then the NVE that received the withdrawal simply
invalidates the MAC entries for that segment. Otherwise, the NVE invalidates the MAC entries for that segment. Otherwise, the NVE
updates the next-hop adjacencies to point to the backup NVE(s). updates the next-hop adjacency list accordingly.
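The mass-withdraw behavior described above can be sketched as follows. This is a minimal model, not an implementation of [RFC7432]: the class RemoteEsState and its method names are hypothetical, Ethernet segments are identified by strings, and NVEs by their IP addresses.

```python
from typing import Dict, List, Set


class RemoteEsState:
    """Per-Ethernet-segment state kept by a receiving NVE (hypothetical model)."""

    def __init__(self) -> None:
        # ESI -> NVE addresses that advertised an Ethernet A-D route for it
        self.es_next_hops: Dict[str, Set[str]] = {}
        # MAC -> ESI, as learned from MAC advertisement routes
        self.mac_to_esi: Dict[str, str] = {}

    def add_ad_route(self, esi: str, nve_ip: str) -> None:
        self.es_next_hops.setdefault(esi, set()).add(nve_ip)

    def learn_mac(self, mac: str, esi: str) -> None:
        self.mac_to_esi[mac] = esi

    def next_hops_for(self, mac: str) -> Set[str]:
        esi = self.mac_to_esi.get(mac)
        return set(self.es_next_hops.get(esi, set())) if esi else set()

    def withdraw_ad_route(self, esi: str, nve_ip: str) -> List[str]:
        """Process a withdrawal; return the MACs invalidated as a result."""
        hops = self.es_next_hops.get(esi, set())
        hops.discard(nve_ip)
        if hops:
            # Another NVE still advertises the segment: only the adjacency
            # list changes, the MAC entries stay valid.
            return []
        # No NVE left for this segment: invalidate every MAC behind it.
        self.es_next_hops.pop(esi, None)
        invalid = sorted(m for m, e in self.mac_to_esi.items() if e == esi)
        for mac in invalid:
            del self.mac_to_esi[mac]
        return invalid
```

A single withdrawal therefore updates all MAC entries behind the segment at once, which is the point of the mechanism: convergence does not depend on the number of MAC addresses.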
8.1.3 Split-Horizon 8.1.3 Split-Horizon
If a server is multi-homed to two or more NVEs on an Ethernet segment If a server is multi-homed to two or more NVEs (represented by an
ES1 operating in all-active redundancy mode sends a multicast, Ethernet segment ES1) and operating in an all-active redundancy mode,
broadcast or unknown unicast packet to a one of these NVEs, then it sends a BUM packet (ie, Broadcast, Unknown unicast, or Multicast)
is important to ensure the packet is not looped back to the server packet to one of these NVEs, then it is important to ensure the
via another NVE connected to this server. The filtering mechanism on packet is not looped back to the server via another NVE connected to
the NVE to prevent such loop and packet duplication is called "split this server. The filtering mechanism on the NVE to prevent such loop
horizon filtering". and packet duplication is called "split horizon filtering".
8.1.4 Aliasing and Backup-Path 8.1.4 Aliasing and Backup-Path
In the case where a station is multi-homed to multiple NVEs, it is In the case where a station is multi-homed to multiple NVEs, it is
possible that only a single NVE learns a set of the MAC addresses possible that only a single NVE learns a set of the MAC addresses
associated with traffic transmitted by the station. This leads to a associated with traffic transmitted by the station. This leads to a
situation where remote NVEs receive MAC advertisement routes, for situation where remote NVEs receive MAC advertisement routes, for
these addresses, from a single NVE even though multiple NVEs are these addresses, from a single NVE even though multiple NVEs are
connected to the multi-homed station. As a result, the remote NVEs connected to the multi-homed station. As a result, the remote NVEs
are not able to effectively load-balance traffic among the NVEs are not able to effectively load-balance traffic among the NVEs
skipping to change at page 17, line 47 skipping to change at page 17, line 48
Ethernet Segment using the Ethernet A-D route as well. Remote NVEs Ethernet Segment using the Ethernet A-D route as well. Remote NVEs
which receive the MAC advertisement routes, with non-zero ESI, SHOULD which receive the MAC advertisement routes, with non-zero ESI, SHOULD
consider the MAC address as reachable via the advertising NVE. consider the MAC address as reachable via the advertising NVE.
Furthermore, the remote NVEs SHOULD install a Backup-Path, for said Furthermore, the remote NVEs SHOULD install a Backup-Path, for said
MAC, to the NVE which had advertised reachability to the relevant MAC, to the NVE which had advertised reachability to the relevant
Segment using an Ethernet A-D route with the same ESI and with the Segment using an Ethernet A-D route with the same ESI and with the
Single-Active flag set. Single-Active flag set.
8.1.5 DF Election 8.1.5 DF Election
If a CE is multi-homed to two or more NVEs on an Ethernet segment If a host is multi-homed to two or more NVEs on an Ethernet segment
operating in all-active redundancy mode, then for a given EVI only operating in all-active redundancy mode, then for a given EVI only
one of these NVEs, termed the Designated Forwarder (DF) is one of these NVEs, termed the Designated Forwarder (DF) is
responsible for sending it broadcast, multicast, and, if configured responsible for sending it broadcast, multicast, and, if configured
for that EVI, unknown unicast frames. for that EVI, unknown unicast frames.
This is required in order to prevent duplicate delivery of multi- This is required in order to prevent duplicate delivery of multi-
destination frames to a multi-homed host or VM, in case of all-active destination frames to a multi-homed host or VM, in case of all-active
redundancy. redundancy.
In NVEs where .1Q tagged frames are received from hosts, the DF In NVEs where .1Q tagged frames are received from hosts, the DF
skipping to change at page 18, line 24 skipping to change at page 18, line 25
duplicate VIDs exist). duplicate VIDs exist).
In GWs where VxLAN encapsulated frames are received, the DF election In GWs where VxLAN encapsulated frames are received, the DF election
is performed on VNIs. Again, it is assumed that for a given Ethernet is performed on VNIs. Again, it is assumed that for a given Ethernet
Segment, VNIs are unique and consistent (e.g., no duplicate VNIs Segment, VNIs are unique and consistent (e.g., no duplicate VNIs
exist). exist).
8.2 Impact on EVPN BGP Routes & Attributes 8.2 Impact on EVPN BGP Routes & Attributes
Since multi-homing is supported in this scenario, then the entire set Since multi-homing is supported in this scenario, then the entire set
of BGP routes and attributes defined in [RFC7432] are used. As of BGP routes and attributes defined in [RFC7432] are used. The
discussed in Section 3.1.3, the VSID or VNI is carried in the setting of the Ethernet Tag field in the MAC Advertisement, Ethernet
VNI/VSID field in the MAC Advertisement, Ethernet AD per EVI, and AD per EVI, and Inclusive Multicast routes follows that of section
Inclusive Multicast Ethernet Tag routes. 5.1.3. Furthermore, the setting of the VNI field in the MAC
Advertisement and Ethernet AD per EVI routes follows that of section
5.1.3.
8.3 Impact on EVPN Procedures 8.3 Impact on EVPN Procedures
Two cases need to be examined here, depending on whether the NVEs are Two cases need to be examined here, depending on whether the NVEs are
operating in Active/Standby or in All-Active redundancy. operating in Active/Standby or in All-Active redundancy.
First, let's consider the case of Active/Standby redundancy, where the First, let's consider the case of Active/Standby redundancy, where the
hosts are multi-homed to a set of NVEs, however, only a single NVE is hosts are multi-homed to a set of NVEs, however, only a single NVE is
active at a given point of time for a given VNI or VSID. In this active at a given point of time for a given VNI. In this case, the
case, the split-horizon and the aliasing functions are not required aliasing is not required and the split-horizon may not be required,
but other functions such as multi-homed Ethernet segment auto- but other functions such as multi-homed Ethernet segment auto-
discovery, fast convergence and mass withdraw, backup path, and DF discovery, fast convergence and mass withdraw, backup path, and DF
election are required. election are required.
Second, let's consider the case of All-Active redundancy. In this Second, let's consider the case of All-Active redundancy. In this
case, out of the EVPN multi-homing features listed in section 8.1, case, out of all the EVPN multi-homing features listed in section
the use of the VXLAN or NVGRE encapsulation impacts the split-horizon 8.1, the use of the VXLAN or NVGRE encapsulation impacts the split-
and aliasing features, since those two rely on the MPLS client layer. horizon and aliasing features, since those two rely on the MPLS
Given that this MPLS client layer is absent with these types of client layer. Given that this MPLS client layer is absent with these
encapsulations, alternative procedures and mechanisms are needed to types of encapsulations, alternative procedures and mechanisms are
provide the required functions. Those are discussed in detail next. needed to provide the required functions. Those are discussed in
detail next.
8.3.1 Split Horizon 8.3.1 Split Horizon
In EVPN, an MPLS label is used for split-horizon filtering to support In EVPN, an MPLS label is used for split-horizon filtering to support
active/active multi-homing where an ingress NVE adds a label All-Active multi-homing where an ingress NVE adds a label
corresponding to the site of origin (aka ESI Label) when corresponding to the site of origin (aka ESI Label) when
encapsulating the packet. The egress NVE checks the ESI label when encapsulating the packet. The egress NVE checks the ESI label when
attempting to forward a multi-destination frame out an interface, and attempting to forward a multi-destination frame out an interface, and
if the label corresponds to the same site identifier (ESI) associated if the label corresponds to the same site identifier (ESI) associated
with that interface, the packet gets dropped. This prevents the with that interface, the packet gets dropped. This prevents the
occurrence of forwarding loops. occurrence of forwarding loops.
Since the VXLAN or NVGRE encapsulation does not include this ESI Since the VXLAN or NVGRE encapsulation does not include this ESI
label, other means of performing the split-horizon filtering function label, other means of performing the split-horizon filtering function
MUST be devised. The following approach is recommended for split- MUST be devised. The following approach is recommended for split-
horizon filtering when VXLAN or NVGRE encapsulation is used. horizon filtering when VXLAN (or NVGRE) encapsulation is used.
Every NVE tracks the IP address(es) associated with the other NVE(s) Every NVE tracks the IP address(es) associated with the other NVE(s)
with which it has shared multi-homed Ethernet Segments. When the NVE with which it has shared multi-homed Ethernet Segments. When the NVE
receives a multi-destination frame from the overlay network, it receives a multi-destination frame from the overlay network, it
examines the source IP address in the tunnel header (which examines the source IP address in the tunnel header (which
corresponds to the ingress NVE) and filters out the frame on all corresponds to the ingress NVE) and filters out the frame on all
local interfaces connected to Ethernet Segments that are shared with local interfaces connected to Ethernet Segments that are shared with
the ingress NVE. With this approach, it is required that the ingress the ingress NVE. With this approach, it is required that the ingress
NVE performs replication locally to all directly attached Ethernet NVE performs replication locally to all directly attached Ethernet
Segments (regardless of the DF Election state) for all flooded Segments (regardless of the DF Election state) for all flooded
skipping to change at page 19, line 46 skipping to change at page 19, line 49
In order to prevent unhealthy interactions between the split horizon In order to prevent unhealthy interactions between the split horizon
procedures defined in [RFC7432] and the local bias procedures procedures defined in [RFC7432] and the local bias procedures
described in this document, a mix of MPLS over GRE encapsulations on described in this document, a mix of MPLS over GRE encapsulations on
the one hand and VXLAN/NVGRE encapsulations on the other on a given the one hand and VXLAN/NVGRE encapsulations on the other on a given
Ethernet Segment is prohibited. Ethernet Segment is prohibited.
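The source-IP based filtering described in this section can be sketched as follows. The sketch covers only the egress-side check on frames received from the overlay. The function name egress_flood_ports and the two dictionaries are hypothetical, and DF election state is deliberately not modeled; it would be applied as a separate filter.

```python
from typing import Dict, List, Optional, Set


def egress_flood_ports(
    local_ports: Dict[str, Optional[str]],
    shared_es_peers: Dict[str, Set[str]],
    tunnel_src_ip: str,
) -> List[str]:
    """Return the local ports that pass the split-horizon check.

    local_ports maps a port name to the ESI it attaches to (None if the
    port is single-homed). shared_es_peers maps an ESI to the IP addresses
    of the other NVEs attached to the same Ethernet Segment.
    """
    allowed = []
    for port, esi in sorted(local_ports.items()):
        peers = shared_es_peers.get(esi, set()) if esi is not None else set()
        if tunnel_src_ip in peers:
            # The ingress NVE shares this segment and has already delivered
            # the frame locally, so forwarding here would duplicate or loop.
            continue
        allowed.append(port)
    return allowed
```

The check needs nothing from the frame beyond the outer source IP address, which is why it works for encapsulations that carry no ESI label.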
8.3.2 Aliasing and Backup-Path 8.3.2 Aliasing and Backup-Path
The Aliasing and the Backup-Path procedures for VXLAN/NVGRE The Aliasing and the Backup-Path procedures for VXLAN/NVGRE
encapsulation are very similar to the ones for MPLS. In case of MPLS, encapsulation are very similar to the ones for MPLS. In case of MPLS,
two different Ethernet AD routes are used for this purpose. The one two different Ethernet A-D routes are used for this purpose. The one
used for Aliasing has a VPN scope and carries a VPN label but the one used for Aliasing has a VPN scope (per EVI) and carries a VPN label
used for Backup-Path has Ethernet segment scope and doesn't carry any but the one used for Backup-Path has Ethernet segment scope (per ES)
VPN specific info (e.g., Ethernet Tag and MPLS label are set to and doesn't carry any VPN specific info (e.g., Ethernet Tag and MPLS
zero). label are set to zero). In case of VxLAN/NVGRE, the same two routes
are used for the Aliasing and the Backup-Path. In case of Aliasing,
the Ethernet Tag and VNI fields in Ethernet A-D per EVI route is set
as described in section 5.1.3.
9 Support for Multicast 9 Support for Multicast
The E-VPN Inclusive Multicast BGP route is used to discover the The E-VPN Inclusive Multicast BGP route is used to discover the
multicast tunnels among the endpoints associated with a given VXLAN multicast tunnels among the endpoints associated with a given EVI
VNI or NVGRE VSID. The Ethernet Tag field of this route is used to (e.g., given VNI) for VLAN-based service and a given <EVI,VLAN> for
encode the VNI for VLXAN or VSID for NVGRE. The Originating router's VLAN-aware bundle service. The Ethernet Tag field of this route is
IP address field is set to the NVE's IP address. This route is tagged set as described in section 5.1.3. The Originating router's IP
address field is set to the NVE's IP address. This route is tagged
with the PMSI Tunnel attribute, which is used to encode the type of with the PMSI Tunnel attribute, which is used to encode the type of
multicast tunnel to be used as well as the multicast tunnel multicast tunnel to be used as well as the multicast tunnel
identifier. The tunnel encapsulation is encoded by adding the BGP identifier. The tunnel encapsulation is encoded by adding the BGP
Encapsulation extended community as per section 3.1.1. The following Encapsulation extended community as per section 5.1.1. The following
tunnel types as defined in [RFC6514] can be used in the PMSI tunnel tunnel types as defined in [RFC6514] can be used in the PMSI tunnel
attribute for VXLAN/NVGRE: attribute for VXLAN/NVGRE:
+ 3 - PIM-SSM Tree + 3 - PIM-SSM Tree
+ 4 - PIM-SM Tree + 4 - PIM-SM Tree
+ 5 - BIDIR-PIM Tree + 5 - BIDIR-PIM Tree
+ 6 - Ingress Replication + 6 - Ingress Replication
Except for Ingress Replication, this multicast tunnel is used by the Except for Ingress Replication, this multicast tunnel is used by the
PE originating the route for sending multicast traffic to other PEs, PE originating the route for sending multicast traffic to other PEs,
and is used by PEs that receive this route for receiving the traffic and is used by PEs that receive this route for receiving the traffic
originated by CEs connected to the PE that originated the route. originated by hosts connected to the PE that originated the route.
In the scenario where the multicast tunnel is a tree, both the In the scenario where the multicast tunnel is a tree, both the
Inclusive as well as the Aggregate Inclusive variants may be used. In Inclusive as well as the Aggregate Inclusive variants may be used. In
the former case, a multicast tree is dedicated to a VNI or VSID. the former case, a multicast tree is dedicated to a VNI. Whereas, in
Whereas, in the latter, a multicast tree is shared among multiple the latter, a multicast tree is shared among multiple VNIs. This is
VNIs or VSIDs. This is done by having the NVEs advertise multiple done by having the NVEs advertise multiple Inclusive Multicast routes
Inclusive Multicast routes with different VNI or VSID encoded in the with different VNI encoded in the Ethernet Tag field, but with the
Ethernet Tag field, but with the same tunnel identifier encoded in same tunnel identifier encoded in the PMSI Tunnel attribute.
the PMSI Tunnel attribute.
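The distinction between the Inclusive and Aggregate Inclusive variants can be illustrated by grouping received Inclusive Multicast routes by tunnel identifier. This is a sketch only: routes are reduced to (VNI, tunnel identifier) pairs, and the function names group_by_tunnel and is_aggregate are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def group_by_tunnel(imet_routes: Iterable[Tuple[int, str]]) -> Dict[str, Set[int]]:
    """Map each PMSI tunnel identifier to the VNIs advertised with it."""
    trees: Dict[str, Set[int]] = defaultdict(set)
    for vni, tunnel_id in imet_routes:
        trees[tunnel_id].add(vni)
    return dict(trees)


def is_aggregate(trees: Dict[str, Set[int]], tunnel_id: str) -> bool:
    """A tree is shared (Aggregate Inclusive) when more than one VNI uses it."""
    return len(trees.get(tunnel_id, ())) > 1
```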
10 Data Center Interconnections - DCI 10 Data Center Interconnections - DCI
For DCI, the following two main scenarios are considered when For DCI, the following two main scenarios are considered when
connecting data centers running evpn-overlay (as described here) over connecting data centers running evpn-overlay (as described here) over
MPLS/IP core network: MPLS/IP core network:
- Scenario 1: DCI using GWs - Scenario 1: DCI using GWs
- Scenario 2: DCI using ASBRs - Scenario 2: DCI using ASBRs
The following two subsections describe the operations for each of The following two subsections describe the operations for each of
these scenarios. these scenarios.
10.1 DCI using GWs 10.1 DCI using GWs
This is the typical scenario for interconnecting data centers over This is the typical scenario for interconnecting data centers over
WAN. In this scenario, EVPN routes are terminated and processed in WAN. In this scenario, EVPN routes are terminated and processed in
each GW and MAC/IP routes are always re-advertised from DC to WAN but each GW and MAC/IP routes are always re-advertised from DC to WAN but
from WAN to DC, they are not re-advertised if unknown MAC address from WAN to DC, they are not re-advertised if unknown MAC address
(and default IP address) are utilized in NVEs. In this scenario, each (and default IP address) are utilized in NVEs. In this scenario, each
skipping to change at page 21, line 35 skipping to change at page 21, line 42
and it favors simplification at DCI devices over NVEs such that and it favors simplification at DCI devices over NVEs such that
larger MAC-VRF (and IP-VRF) tables need to be maintained on NVEs; larger MAC-VRF (and IP-VRF) tables need to be maintained on NVEs;
whereas, DCI devices don't need to maintain any MAC (and IP) whereas, DCI devices don't need to maintain any MAC (and IP)
forwarding tables. Furthermore, DCI devices do not need to terminate forwarding tables. Furthermore, DCI devices do not need to terminate
and process routes related to multi-homing but rather to relay and process routes related to multi-homing but rather to relay
these messages for the establishment of an end-to-end LSP path. In these messages for the establishment of an end-to-end LSP path. In
other words, DCI devices in this approach operate similar to ASBRs other words, DCI devices in this approach operate similar to ASBRs
for inter-AS option B. This requires locally assigned VNIs to be for inter-AS option B. This requires locally assigned VNIs to be
used just like downstream assigned MPLS VPN label where for all used just like downstream assigned MPLS VPN label where for all
practical purposes the VNIs function like 24-bit VPN labels. This practical purposes the VNIs function like 24-bit VPN labels. This
approach is equally applicable to data centers (or access networks) approach is equally applicable to data centers (or Carrier Ethernet
with MPLS encapsulation. networks) with MPLS encapsulation.
In inter-AS option B, when ASBR receives an EVPN route from its DC In inter-AS option B, when ASBR receives an EVPN route from its DC
over iBGP and re-advertises it to other ASBRs, it re-advertises the over iBGP and re-advertises it to other ASBRs, it re-advertises the
EVPN route by re-writing the BGP next-hops to itself, thus losing the EVPN route by re-writing the BGP next-hops to itself, thus losing the
identity of the PE that originated the advertisement. This re-write identity of the PE that originated the advertisement. This re-write
of BGP next-hop impacts the EVPN Mass Withdraw route (Ethernet A-D of BGP next-hop impacts the EVPN Mass Withdraw route (Ethernet A-D
per ES) and its procedure adversely. In EVPN, the route used for per ES) and its procedure adversely. However, it does not impact EVPN
aliasing (Ethernet A-D per EVI route) has the same RD as the MAC/IP Aliasing mechanism/procedure because when the Aliasing routes (Ether
routes associated with that EVI. Therefore, the receiving PE can A-D per EVI) are advertised, the receiving PE first resolves a MAC
associated the receive MAC/IP routes with its corresponding aliasing address for a given EVI into its corresponding <ES,EVI> and
route using their RDs even if their next hop is written to the same subsequently, it resolves the <ES,EVI> into multiple paths (and their
ASBR router's address. However, in EVPN, the mass-withdraw route uses associated next hops) via which the <ES,EVI> is reachable. Since
a different RD than that of its associated MAC/IP routes. Thus, the Aliasing and MAC routes are both advertised per EVI basis and they
way to associate them together is via their next-hop router's use the same RD and RT (per EVI), the receiving PE can associate them
address. Now, when BGP next hop address representing the originating together on a per BGP path basis (e.g., per originating PE) and thus
PE, gets re-written by the re-advertising ASBR, it creates ambiguity perform recursive route resolution - e.g., a MAC is reachable via an
in the receiving PE that cannot be resolved. Therefore, the <ES,EVI> which in turn, is reachable via a set of BGP paths, thus the
functionality needed at the ASBRs depends on whether the EVPN MAC is reachable via the set of BGP paths. Since on a per EVI basis,
Ethernet A-D routes (per ES and/or per EVI) are originated and the association of MAC routes and the corresponding Aliasing route is
whether there is a need to handle route resolution ambiguity for fixed and determined by the same RD and RT, there is no ambiguity
Ethernet A-D per ES route. when the BGP next hop for these routes is re-written as these routes
pass through ASBRs - i.e., the receiving PE may receive multiple
Aliasing routes for the same EVI from a single next hop (a single
ASBR), and it can still create multiple paths toward that <ES, EVI>.
The following two subsections describe the functionality needed by However, when the BGP next hop address corresponding to the
the ASBRs depending on whether the NVEs reside in Hypervisors or in originating PE is re-written, the association between the Mass
TORs. Withdraw route (Ether A-D per ES) and its corresponding MAC routes
cannot be made based on their RDs and RTs because the RD for Mass
Withdraw route is different than the one for the MAC routes.
Therefore, the functionality needed at the ASBRs and the receiving
PEs depends on whether the Mass Withdraw route is originated and
whether there is a need to handle route resolution ambiguity for this
route. The following two subsections describe the functionality
needed by the ASBRs and the receiving PEs depending on whether the
NVEs reside in Hypervisors or in TORs.
10.2.1 ASBR Functionality with NVEs in Hypervisors 10.2.1 ASBR Functionality with NVEs in Hypervisors
When NVEs reside in hypervisors as described in section 7.1, there is When NVEs reside in hypervisors as described in section 7.1, there is
no multi-homing and thus there is no need for the originating NVE to no multi-homing and thus there is no need for the originating NVE to
send Ethernet A-D per ES or Ethernet A-D per EVI routes. Furthermore, send Ethernet A-D per ES or Ethernet A-D per EVI routes. However, as
the processing of these routes by the receiving NVE in the hypervisor noted in section 7, in order to enable a single-homing ingress NVE to
is optional per [RFC7432] and as described in section 7. Therefore, take advantage of fast convergence, aliasing, and backup-path when
the ambiguity issue discussed above doesn't exist for this scenario interacting with multi-homing egress NVEs attached to a given
and the functionality of ASBRs is that of existing L2VPN (or L3VPN) Ethernet segment, the single-homing NVE SHOULD be able to receive and
where the ASBRs assist in setting up end-to-end LSPs among the NVEs' process Ethernet A-D per ES and Ethernet A-D per EVI routes. The
MAC-VRFs. As noted previously, for all practical purposes, the 24-bit handling of these routes is described in the next section.
locally assigned VNIs used in this scenario, function as 24-bit
labels in setting up the end-to-end LSPs.
10.2.2 ASBR Functionality with NVEs in TORs 10.2.2 ASBR Functionality with NVEs in TORs
When NVEs reside in TORs and operate in multi-homing redundancy mode, When NVEs reside in TORs and operate in multi-homing redundancy mode,
then as described in section 8, there is a need for the originating then as described in section 8, there is a need for the originating
NVE to send Ethernet A-D per ES route(s) (used for mass withdraw) and NVE to send Ethernet A-D per ES route(s) (used for mass withdraw) and
Ethernet A-D per EVI routes (used for aliasing). As described above, Ethernet A-D per EVI routes (used for aliasing). As described above,
the re-write of BGP next-hop by ASBRs creates ambiguities when the re-write of BGP next-hop by ASBRs creates ambiguities when
Ethernet A-D per ES routes are received by the remote PE in a Ethernet A-D per ES routes are received by the remote NVE in a
different ASBR because the receiving PE cannot associate that route different ASBR because the receiving NVE cannot associate that route
with the MAC/IP routes from the same Ethernet Segment advertised by with the MAC/IP routes of that Ethernet Segment advertised by the
the same originating PE. This ambiguity inhibits the function of same originating NVE. This ambiguity inhibits the function of mass-
mass-withdraw per ES by the receiving PE in a different ASBR. withdraw per ES by the receiving NVE in a different AS.
As an example consider a scenario where CE is multi-homed to PE1 and As an example consider a scenario where CE is multi-homed to PE1 and
PE2 where these PEs are connected via ASBR1 and then ASBR2 to the PE2 where these PEs are connected via ASBR1 and then ASBR2 to the
remote PE3. Furthermore, consider that PE1 receives M1 from CE but remote PE3. Furthermore, consider that PE1 receives M1 from CE but
not PE2. Therefore, PE1 advertises Eth A-D per ES1, Eth A-D per EVI1, not PE2. Therefore, PE1 advertises Eth A-D per ES1, Eth A-D per EVI1,
and M1; whereas, PE2 only advertises Eth A-D per ES1 and Eth A-D per and M1; whereas, PE2 only advertises Eth A-D per ES1 and Eth A-D per
EVI1. ASBR1 receives all these five advertisements and passes them to EVI1. ASBR1 receives all these five advertisements and passes them to
ASBR2 (with itself as the BGP next hop). ASBR2, in turn, passes them ASBR2 (with itself as the BGP next hop). ASBR2, in turn, passes them
to the remote PE3 with itself as the BGP next hop. PE3 receives these to the remote PE3 with itself as the BGP next hop. PE3 receives these
five routes where all of them have the same BGP next-hop (i.e., five routes where all of them have the same BGP next-hop (i.e.,
ASBR2). Furthermore, the two Ether A-D per ES routes received by PE3 ASBR2). Furthermore, the two Ether A-D per ES routes received by PE3
have the same info - i.e., same ESI and the same BGP next hop. have the same info - i.e., same ESI and the same BGP next hop.
Although both of these routes are maintained by the BGP process in Although both of these routes are maintained by the BGP process in
PE3, information from only one of them is used in the L2 routing PE3 (because they have different RDs and are thus treated as different
table (L2 RIB). BGP routes), information from only one of them is used in the L2
routing table (L2 RIB).
PE1 PE1
/ \ / \
CE ASBR1---ASBR2---PE3 CE ASBR1---ASBR2---PE3
\ / \ /
PE2 PE2
Figure 1: Inter-AS Option B Figure 1: Inter-AS Option B
Now, when the AC between the PE2 and the CE fails and PE2 sends NLRI Now, when the AC between the PE2 and the CE fails and PE2 sends NLRI
withdrawal for Ether A-D per ES route and this withdrawal gets withdrawal for Ether A-D per ES route and this withdrawal gets
propagated and received by the PE3, the BGP process in PE3 removes propagated and received by the PE3, the BGP process in PE3 removes
the corresponding BGP route; however, it doesn't remove the the corresponding BGP route; however, it doesn't remove the
associated info (namely ESI and BGP next hop) from the L2 routing associated info (namely ESI and BGP next hop) from the L2 routing
table (L2 RIB) because it still has the other Ether A-D per ES route table (L2 RIB) because it still has the other Ether A-D per ES route
(originated from PE1) with the same info. That is why the mass- (originated from PE1) with the same info. That is why the mass-
withdraw mechanism does not work when doing DCI with inter-AS option withdraw mechanism does not work when doing DCI with inter-AS option
B. However, as described next, the Aliasing function works and so B. However, as described previously, the aliasing function works and
does mass-withdraw per EVI (which is associated with withdrawing the so does "mass-withdraw per EVI" (which is associated with withdrawing
EVPN route associated with Aliasing - i.e., Ether A-D per EVI route). the EVPN route associated with Aliasing - i.e., Ether A-D per EVI
route).
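The failure described above can be sketched in a few lines. This is a minimal illustration only, assuming a receiving PE whose L2 RIB keys Ether A-D per ES state by (ESI, BGP next hop) while the BGP table keys routes by (RD, ESI). The class name `Pe3`, its methods, and the RD strings are hypothetical and do not come from this document or from [RFC7432].

```python
from collections import defaultdict


class Pe3:
    """Toy model of the receiving PE in Figure 1 (hypothetical names)."""

    def __init__(self):
        # BGP table: routes stay distinct because their RDs differ.
        self.bgp_routes = {}  # (rd, esi) -> next_hop
        # L2 RIB: one entry per (ESI, next hop); the set records which
        # BGP routes (by RD) currently back that entry.
        self.l2_rib = defaultdict(set)

    def receive_ad_per_es(self, rd, esi, next_hop):
        self.bgp_routes[(rd, esi)] = next_hop
        self.l2_rib[(esi, next_hop)].add(rd)

    def withdraw_ad_per_es(self, rd, esi):
        next_hop = self.bgp_routes.pop((rd, esi))
        backers = self.l2_rib[(esi, next_hop)]
        backers.discard(rd)
        # The L2 RIB entry is removed only when no BGP route backs it.
        if not backers:
            del self.l2_rib[(esi, next_hop)]

    def es_reachable_via(self, esi):
        return sorted(nh for (e, nh) in self.l2_rib if e == esi)


pe3 = Pe3()
# Both originating PEs are hidden behind the same next hop (ASBR2).
pe3.receive_ad_per_es("RD-PE1", "ES1", "ASBR2")
pe3.receive_ad_per_es("RD-PE2", "ES1", "ASBR2")
# PE2 loses its AC and withdraws its Ether A-D per ES route.
pe3.withdraw_ad_per_es("RD-PE2", "ES1")
print(pe3.es_reachable_via("ES1"))  # ['ASBR2'] -- entry is unchanged
```

Because the (ESI, next hop) entry survives the withdrawal, nothing in the L2 RIB changes and no mass-withdraw is triggered for the MAC routes learned via PE2.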
In the above example, the PE3 receives two Aliasing routes with the In the above example, the PE3 receives two Aliasing routes with the
same BGP next hop (ASBR2) but different RDs. One of the Alias routes same BGP next hop (ASBR2) but different RDs. One of the Alias routes
has the same RD as the advertised MAC route (M1). PE3 follows the has the same RD as the advertised MAC route (M1). PE3 follows the
route resolution procedure specified in [RFC7432] upon receiving the route resolution procedure specified in [RFC7432] upon receiving the
two Aliasing routes. PE3 should also resolve the alias path properly two Aliasing routes - i.e., it resolves M1 to <ES,EVI1> and
even though both the primary and backup paths have the same BGP next subsequently it resolves <ES,EVI1> to a BGP path list with two paths
hop, they have different RDs and the alias route with the different along with the corresponding VNIs/MPLS labels (one associated with
RD than that of the MAC route is considered as the backup path. PE1 and the other associated with PE2). It should be noted that even
Therefore, PE3 installs both primary and backup paths (and their though both paths are advertised by the same BGP next hop (ASBR2),
associated ESI/EVI MPLS labels or local VNIs) for the MAC route M1. the receiving PE3 can handle them properly. Therefore, M1 is
This creates two end-to-end LSPs from PE3 to PE1 for M1 such that reachable via two paths. This creates two end-to-end LSPs, from PE3 to
when PE3 wants to forward traffic destined to M1, it can load PE1 and from PE3 to PE2, for M1 such that when PE3 wants to forward traffic destined to
balance between the two paths. Although route resolution for M1, it can load balance between the two paths. Although route
Aliasing routes with the same BGP next hop is not described in this resolution for Aliasing routes with the same BGP next hop is not
level of details in [RFC7432], it is expected to operate as such and explicitly mentioned in [RFC7432], this is the expected operation and
thus it is clarified here. thus it is elaborated here.
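The two-step resolution described above (MAC to <ES,EVI>, then <ES,EVI> to a path list) can be sketched as follows. This is an illustrative model, not a definitive implementation: the dataclasses, the `resolve` function, the RD strings, and the label values 1001 and 1002 are all assumptions made for the example. The point it shows is that paths are told apart by RD, so two Aliasing routes sharing the next hop ASBR2 still yield two paths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AliasRoute:
    """Ether A-D per EVI route as seen by PE3 (hypothetical model)."""
    rd: str
    esi: str
    evi: str
    next_hop: str
    label: int  # VNI or MPLS label; values below are made up


@dataclass(frozen=True)
class MacRoute:
    """MAC/IP advertisement route as seen by PE3 (hypothetical model)."""
    rd: str
    mac: str
    esi: str
    evi: str
    next_hop: str


def resolve(mac_route, alias_routes):
    """Resolve a MAC to its <ES,EVI>, then to a path list keyed by RD."""
    paths = {}
    for route in alias_routes:
        if (route.esi, route.evi) == (mac_route.esi, mac_route.evi):
            # Keyed by RD, so an identical next hop does not merge paths.
            paths[route.rd] = (route.next_hop, route.label)
    return paths


alias_routes = [
    AliasRoute("RD-PE1", "ES1", "EVI1", "ASBR2", 1001),
    AliasRoute("RD-PE2", "ES1", "EVI1", "ASBR2", 1002),
]
m1 = MacRoute("RD-PE1", "M1", "ES1", "EVI1", "ASBR2")

paths = resolve(m1, alias_routes)
print(sorted(paths))  # ['RD-PE1', 'RD-PE2']
```

Keying the path list by RD rather than by next hop is the design choice that keeps both paths distinct after the ASBRs rewrite the next hop.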
When the AC between the PE2 and the CE fails and PE2 sends NLRI When the AC between the PE2 and the CE fails and PE2 sends NLRI
withdrawal for Ether A-D per EVI routes and these withdrawals get withdrawal for Ether A-D per EVI routes and these withdrawals get
propagated and received by the PE3, the PE3 removes the Aliasing propagated and received by the PE3, the PE3 removes the Aliasing
route and updates all the corresponding MAC routes for that EVI to route and updates the path list - i.e., it removes the path
remove the backup path. This action makes the mass-withdraw corresponding to the PE2. Therefore, all the corresponding MAC routes
functionality to perform at the per-EVI level (instead of per-ES). for that <ES,EVI> that point to that path list will now have the
The mass-withdraw at per-EVI level requires more messages than that updated path list with a single path associated with PE1. This action
of per-ES level and thus its convergence time is not as good as per can be considered as the mass-withdraw at the per-EVI level. The
ES level. However, its convergence time is much better than mass-withdraw at per-EVI level has longer convergence time than the
individual MAC withdraw. mass-withdraw at per-ES level; however, it is much faster than the
convergence time when the withdraw is done on a per-MAC basis.
In summary, it can be seen that aliasing and backup path In summary, it can be seen that aliasing (and backup path)
functionality should work as is for inter-AS option B. Furthermore, functionality should work as is for inter-AS option B without
in case of inter-AS option B, mass-withdraw functionality falls back requiring any additional functionality in ASBRs or PEs. However, the
from per-ES to per-EVI. If per-ES mass-withdraw functionality is mass-withdraw functionality falls back from per-ES mode to per-EVI
needed along with backward compatibility, then it is recommended to mode for inter-AS option B - i.e., PEs receiving mass-withdraw route
use GWs (per section 10.1) instead of ASBRs for DCI. from the same AS use Ether A-D per ES route; whereas, PEs receiving
mass-withdraw route from a different AS use Ether A-D per EVI route.
11 Acknowledgement 11 Acknowledgement
The authors would like to thank David Smith, John Mullooly, and Thomas The authors would like to thank David Smith, John Mullooly, and Thomas
Nadeau for their valuable comments and feedback. The authors would Nadeau for their valuable comments and feedback. The authors would
also like to thank Jakob Heitz for his contribution on section 10. also like to thank Jakob Heitz for his contribution on section 10.
12 Security Considerations 12 Security Considerations
This document uses IP-based tunnel technologies to support data This document uses IP-based tunnel technologies to support data
skipping to change at page 26, line 21 skipping to change at page 26, line 42
Network Virtualization", draft-ietf-nvo3-overlay-problem-statement- Network Virtualization", draft-ietf-nvo3-overlay-problem-statement-
01, September 2012. 01, September 2012.
[L3VPN-ENDSYSTEMS] Marques et al., "BGP-signaled End-system IP/VPNs", [L3VPN-ENDSYSTEMS] Marques et al., "BGP-signaled End-system IP/VPNs",
draft-ietf-l3vpn-end-system, work in progress, October 2012. draft-ietf-l3vpn-end-system, work in progress, October 2012.
[NOV3-FRWK] Lasserre et al., "Framework for DC Network [NOV3-FRWK] Lasserre et al., "Framework for DC Network
Virtualization", draft-ietf-nvo3-framework-01.txt, work in progress, Virtualization", draft-ietf-nvo3-framework-01.txt, work in progress,
October 2012. October 2012.
[DCI-EVPN-OVERLAY] Rabadan et al., "Interconnect Solution for EVPN
Overlay networks", draft-ietf-bess-dci-evpn-overlay-02, work in
progress, February 29, 2016.
Contributors Contributors
S. Salam K. Patel D. Rao S. Thoria D. Cai Cisco S. Salam K. Patel D. Rao S. Thoria D. Cai Cisco
Y. Rekhter R. Shekhar Wen Lin Nischal Sheth Juniper Y. Rekhter R. Shekhar Wen Lin Nischal Sheth Juniper
L. Yong Huawei L. Yong Huawei
Authors' Addresses Authors' Addresses
Ali Sajassi Ali Sajassi
Cisco Cisco
Email: sajassi@cisco.com Email: sajassi@cisco.com
John Drake John Drake
Juniper Networks Juniper Networks
Email: jdrake@juniper.net Email: jdrake@juniper.net
Nabil Bitar Nabil Bitar
Verizon Communications Nokia
Email: nabil.n.bitar@verizon.com Email: nabil.bitar@nokia.com
Aldrin Isaac Aldrin Isaac
Juniper Juniper
Email: aisaac@juniper.net Email: aisaac@juniper.net
James Uttaro James Uttaro
AT&T AT&T
Email: uttaro@att.com Email: uttaro@att.com
Wim Henderickx Wim Henderickx
Alcatel-Lucent Alcatel-Lucent
e-mail: wim.henderickx@alcatel-lucent.com e-mail: wim.henderickx@nokia.com
 End of changes. 76 change blocks. 
267 lines changed or deleted 294 lines changed or added
