Network Working Group                                             X. Xu
Internet-Draft                                      Huawei Technologies
Intended status: Informational                                R. Raszuk
Expires: May 30, 2016                                       Bloomberg LP
                                                            C. Jacquenet
                                                                  Orange
                                                                T. Boyes
                                                            Bloomberg LP
                                                                  B. Fee
                                                        Extreme Networks
                                                       November 27, 2015
   Virtual Subnet: A BGP/MPLS IP VPN-based Subnet Extension Solution
                   draft-ietf-bess-virtual-subnet-06

Abstract

   This document describes a BGP/MPLS IP VPN-based subnet extension
   solution referred to as Virtual Subnet, which can be used for
   building Layer 3 network virtualization overlays within and/or
   between data centers.

Status of This Memo

skipping to change at page 1, line 40
   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 30, 2016.
Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents

skipping to change at page 2, line 18

   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents
   1. Introduction . . . . . . . . . . . . . . . . . . . . . . . .  2
   2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4
   3. Solution Description . . . . . . . . . . . . . . . . . . . .  4
     3.1. Unicast . . . . . . . . . . . . . . . . . . . . . . . . . 4
       3.1.1. Intra-subnet Unicast . . . . . . . . . . . . . . . .  4
       3.1.2. Inter-subnet Unicast . . . . . . . . . . . . . . . .  6
     3.2. Multicast . . . . . . . . . . . . . . . . . . . . . . . . 8
     3.3. Host Discovery . . . . . . . . . . . . . . . . . . . . .  9
     3.4. ARP/ND Proxy . . . . . . . . . . . . . . . . . . . . . .  9
     3.5. Host Mobility . . . . . . . . . . . . . . . . . . . . . . 9
     3.6. Forwarding Table Scalability on Data Center Switches . . 10
     3.7. ARP/ND Cache Table Scalability on Default Gateways . . . 10
     3.8. ARP/ND and Unknown Unicast Flood Avoidance . . . . . . . 10
     3.9. Path Optimization . . . . . . . . . . . . . . . . . . .  10
   4. Limitations . . . . . . . . . . . . . . . . . . . . . . . .  11
     4.1. Non-support of Non-IP Traffic . . . . . . . . . . . . .  11
     4.2. Non-support of IP Broadcast and Link-local Multicast . . 11
     4.3. TTL and Traceroute . . . . . . . . . . . . . . . . . . . 11
   5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 12
   6. IANA Considerations . . . . . . . . . . . . . . . . . . . .  12
   7. Security Considerations . . . . . . . . . . . . . . . . . .  12
   8. References . . . . . . . . . . . . . . . . . . . . . . . . . 12
     8.1. Normative References . . . . . . . . . . . . . . . . . . 12
     8.2. Informative References . . . . . . . . . . . . . . . . . 13
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . .  14
1. Introduction

   For business continuity purposes, Virtual Machine (VM) migration
   across data centers is commonly used in situations such as data
   center maintenance, migration, consolidation, expansion, or disaster
   avoidance. It's generally admitted that IP renumbering of servers
   (i.e., VMs) after the migration is usually complex and costly at the
   risk of extending the business downtime during the process of
   migration. To allow the migration of a VM from one data center to
   another without IP renumbering, the subnet on which the VM resides
   needs to be extended across these data centers.
   To achieve subnet extension across multiple cloud data centers in a
   scalable way, the following requirements and challenges must be
   considered:
   a.  VPN Instance Space Scalability: In a modern cloud data center
       environment, thousands or even tens of thousands of tenants
       could be hosted over a shared network infrastructure. For
       security and performance isolation purposes, these tenants need
       to be isolated from one another.
   b.  Forwarding Table Scalability: With the development of server
       virtualization technologies, it's not uncommon for a single
       cloud data center to contain millions of VMs. This number
       already implies a big challenge to the forwarding table
       scalability of data center switches. If multiple data centers of
       such a scale were interconnected at Layer 2, this challenge
       would become even worse.
   c.  ARP/ND Cache Table Scalability: [RFC6820] notes that the Address
       Resolution Protocol (ARP)/Neighbor Discovery (ND) cache tables
       maintained by default gateways within cloud data centers can
       raise scalability issues. Therefore, mastering the size of the
       ARP/ND cache tables is critical as the number of data centers to
       be connected increases.
   d.  ARP/ND and Unknown Unicast Flooding: It's well-known that the
       flooding of ARP/ND broadcast/multicast messages as well as
       unknown unicast traffic within large Layer 2 networks is likely
       to affect network and host performance. When multiple data
       centers, each hosting millions of VMs, are interconnected at
       Layer 2, the impact of such flooding would become even worse.
       As such, it becomes increasingly important to avoid the flooding
       of ARP/ND broadcast/multicast as well as unknown unicast traffic
       across data centers.
   e.  Path Optimization: A subnet usually indicates a location in the
       network. However, when a subnet has been extended across
       multiple geographically-dispersed data center locations, the
       location semantics of such a subnet are no longer retained. As a
       result, traffic exchanged between a user and a server located in
       different data centers may first be forwarded through a third
       data center. This suboptimal routing would obviously result in
       an unnecessary consumption of the bandwidth resources between
       data centers. Furthermore, in the case where traditional VPLS
       technology [RFC4761] [RFC4762] is used for data center
       interconnect, return traffic from a server may be forwarded to a
       default gateway located in a different data center due to the
       configuration of a virtual router redundancy group. This
       suboptimal routing would also unnecessarily consume the
       bandwidth resources between data centers.
   This document describes a BGP/MPLS IP VPN-based subnet extension
   solution referred to as Virtual Subnet, which can be used for data
   center interconnection while addressing all of the aforementioned
   requirements and challenges. Here, BGP/MPLS IP VPN refers to both
   BGP/MPLS IPv4 VPN [RFC4364] and BGP/MPLS IPv6 VPN [RFC4659]. In
   addition, since Virtual Subnet is mainly built on proven
   technologies such as BGP/MPLS IP VPN and ARP/ND proxy
   [RFC0925][RFC1027][RFC4389], service providers that offer
   Infrastructure as a Service (IaaS) cloud services can rely upon
   their existing BGP/MPLS IP VPN infrastructure and take advantage of
   their BGP/MPLS VPN operational experience to interconnect data
   centers.
   Although Virtual Subnet is described in this document as an approach
   for data center interconnection, it can be used within data centers
   as well.

   Note that the approach described in this document is not intended to
   achieve an exact emulation of Layer 2 connectivity, and therefore it
   can only support a restricted Layer 2 connectivity service model
   with the limitations discussed in Section 4. The discussion of the
   environments in which this service model is applicable is outside
   the scope of this document.
2. Terminology

   This memo makes use of the terms defined in [RFC4364].

3. Solution Description

3.1. Unicast

3.1.1. Intra-subnet Unicast

skipping to change at page 5, line 32
   +------------+---------+--------+ +------------+---------+--------+
   |192.0.2.2/32|192.0.2.2| Direct | |192.0.2.2/32|  PE-1   |  IBGP  |
   +------------+---------+--------+ +------------+---------+--------+
   |192.0.2.3/32|  PE-2   |  IBGP  | |192.0.2.3/32|192.0.2.3| Direct |
   +------------+---------+--------+ +------------+---------+--------+
   |192.0.2.0/24|192.0.2.1| Direct | |192.0.2.0/24|192.0.2.1| Direct |
   +------------+---------+--------+ +------------+---------+--------+

                 Figure 1: Intra-subnet Unicast Example
   As shown in Figure 1, two hosts (i.e., Hosts A and B) belonging to
   the same subnet (i.e., 192.0.2.0/24) are located in different data
   centers (i.e., DC West and DC East), respectively. PE routers (i.e.,
   PE-1 and PE-2) that are used for interconnecting these two data
   centers create host routes for their own local hosts respectively
   and then advertise these routes by means of the BGP/MPLS IP VPN
   signaling. Meanwhile, an ARP proxy is enabled on the Virtual Routing
   and Forwarding (VRF) attachment circuits of these PE routers.
   Let's now assume that host A sends an ARP request for host B before
   communicating with host B. Upon receiving the ARP request, PE-1,
   acting as an ARP proxy, returns its own MAC address as a response.
   Host A then sends IP packets for host B to PE-1. PE-1 tunnels such
   packets towards PE-2, which in turn forwards them to host B. Thus,
   hosts A and B can communicate with each other as if they were
   located within the same subnet.
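   To make this forwarding behavior more concrete, the following Python
   fragment sketches how a VRF populated as in Figure 1 resolves an
   intra-subnet destination. It is provided for illustration only and
   is not part of the solution: the table contents are copied from
   Figure 1, while the function and variable names are invented for
   this example.

   import ipaddress

   # PE-1's VRF_A, as shown in Figure 1: one /32 host route per host
   # plus the locally configured subnet route.
   PE1_VRF_A = [
       # (prefix,        next hop,    protocol)
       ("192.0.2.2/32",  "192.0.2.2", "Direct"),  # local host A
       ("192.0.2.3/32",  "PE-2",      "IBGP"),    # remote host B
       ("192.0.2.0/24",  "192.0.2.1", "Direct"),  # the extended subnet
   ]

   def lookup(vrf, destination):
       """Longest-prefix-match lookup returning (prefix, nexthop, proto)."""
       dst = ipaddress.ip_address(destination)
       matches = [(ipaddress.ip_network(p), nh, proto)
                  for p, nh, proto in vrf
                  if dst in ipaddress.ip_network(p)]
       return max(matches, key=lambda m: m[0].prefixlen) if matches else None

   # Host A (192.0.2.2) sends a packet to host B (192.0.2.3). Since
   # PE-1 answered host A's ARP request with its own MAC address, the
   # packet arrives at PE-1.
   prefix, nexthop, proto = lookup(PE1_VRF_A, "192.0.2.3")
   if proto == "IBGP":
       # The best match is the /32 host route learnt from PE-2, so the
       # packet is tunneled across the backbone towards PE-2.
       print("tunnel packet towards", nexthop, "(matched", str(prefix) + ")")
   else:
       # A Direct route means the destination is locally attached.
       print("forward on the local attachment circuit")

   PE-2 would perform the symmetric lookup in its own VRF_A and deliver
   the packet to host B over its local attachment circuit.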
3.1.2. Inter-subnet Unicast
                             +--------------------+
       +------------------+  |                    |  +------------------+
       |VPN_A:192.0.2.1/24|  |                    |  |VPN_A:192.0.2.1/24|
       |                \ |  |                    |  | /                |
       |  +------+       \++---+-+            +-+---++/       +------+  |
       |  |Host A+--------+ PE-1 |            | PE-2 +-+------+Host B|  |
       |  +------+\       ++-+-+-+            +-+-+-++ |     /+------+  |
       | 192.0.2.2/24     |  | |                | |  | |  192.0.2.3/24  |
       | GW=192.0.2.4     |  | |                | |  | |  GW=192.0.2.4  |
       |                  |  | |                | |  | |    +------+    |
skipping to change at page 6, line 44
   +------------+---------+--------+ +------------+---------+--------+
   |192.0.2.4/32|  PE-2   |  IBGP  | |192.0.2.4/32|192.0.2.4| Direct |
   +------------+---------+--------+ +------------+---------+--------+
   |192.0.2.0/24|192.0.2.1| Direct | |192.0.2.0/24|192.0.2.1| Direct |
   +------------+---------+--------+ +------------+---------+--------+
   | 0.0.0.0/0  |  PE-2   |  IBGP  | | 0.0.0.0/0  |192.0.2.4| Static |
   +------------+---------+--------+ +------------+---------+--------+

               Figure 2: Inter-subnet Unicast Example (1)
   As shown in Figure 2, only one data center (i.e., DC East) is
   deployed with a default gateway (i.e., GW). PE-2, which is connected
   to GW, would either be configured with or learn from GW a default
   route with the next hop pointing at GW. Meanwhile, this route is
   distributed to the other PE routers (i.e., PE-1) as per normal
   [RFC4364] operation. Assume host A sends an ARP request for its
   default gateway (i.e., 192.0.2.4) prior to communicating with a
   destination host outside of its subnet. Upon receiving this ARP
   request, PE-1, acting as an ARP proxy, returns its own MAC address
   as a response. Host A then sends the packet destined for that
   outside host to PE-1. PE-1 tunnels such a packet towards PE-2
   according to the default route learnt from PE-2, and PE-2 in turn
   forwards the packet to GW.
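   The way the routes of Figure 2 could be originated and distributed
   is sketched below. This Python fragment is a toy model given for
   illustration only: the class and method names (e.g., add, advertise)
   do not refer to any particular router implementation, and BGP route
   selection is reduced to "a route already present for a prefix is
   preferred over a newly received IBGP route".

   from dataclasses import dataclass, field

   @dataclass
   class Route:
       prefix: str
       nexthop: str
       protocol: str

   @dataclass
   class PE:
       name: str
       vrf: list = field(default_factory=list)

       def add(self, prefix, nexthop, protocol):
           self.vrf.append(Route(prefix, nexthop, protocol))

       def advertise(self, peers):
           # Advertise every locally originated route to the other PEs
           # of the VPN, with this PE as the BGP next hop. A peer that
           # already has a route for the prefix (e.g., its own Direct
           # route) keeps it.
           for peer in peers:
               for route in self.vrf:
                   if route.protocol != "IBGP" and \
                      not any(r.prefix == route.prefix for r in peer.vrf):
                       peer.vrf.append(Route(route.prefix, self.name, "IBGP"))

   pe1, pe2 = PE("PE-1"), PE("PE-2")

   # PE-1 (DC West): local host A plus the extended subnet.
   pe1.add("192.0.2.2/32", "192.0.2.2", "Direct")
   pe1.add("192.0.2.0/24", "192.0.2.1", "Direct")

   # PE-2 (DC East): local host B, the subnet, the GW's host route, and
   # a default route pointing at the GW (configured or learnt from GW).
   pe2.add("192.0.2.3/32", "192.0.2.3", "Direct")
   pe2.add("192.0.2.4/32", "192.0.2.4", "Direct")
   pe2.add("192.0.2.0/24", "192.0.2.1", "Direct")
   pe2.add("0.0.0.0/0",    "192.0.2.4", "Static")

   pe1.advertise([pe2])
   pe2.advertise([pe1])

   # PE-1 now holds an IBGP default route via PE-2, so host A's
   # off-subnet traffic is tunneled to PE-2 and handed to GW, as in
   # Figure 2.
   for route in pe1.vrf:
       print("PE-1 VRF_A:", route.prefix, "via", route.nexthop,
             "(" + route.protocol + ")")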
                             +--------------------+
       +------------------+  |                    |  +------------------+
       |VPN_A:192.0.2.1/24|  |                    |  |VPN_A:192.0.2.1/24|
       |                \ |  |                    |  | /                |
       |  +------+       \++---+-+            +-+---++/       +------+  |
       |  |Host A+----+---+ PE-1 |            | PE-2 +-+------+Host B|  |
       |  +------+\   |   ++-+-+-+            +-+-+-++ |     /+------+  |
       | 192.0.2.2/24 |   |  | |                | |  | |  192.0.2.3/24  |
       | GW=192.0.2.4 |   |  | |                | |  | |  GW=192.0.2.4  |
skipping to change at page 8, line 37
   +------------+---------+--------+ +------------+---------+--------+
   |192.0.2.3/32|  PE-2   |  IBGP  | |192.0.2.3/32|192.0.2.3| Direct |
   +------------+---------+--------+ +------------+---------+--------+
   |192.0.2.0/24|192.0.2.1| Direct | |192.0.2.0/24|192.0.2.1| Direct |
   +------------+---------+--------+ +------------+---------+--------+
   | 0.0.0.0/0  |  PE-3   |  IBGP  | | 0.0.0.0/0  |  PE-3   |  IBGP  |
   +------------+---------+--------+ +------------+---------+--------+

               Figure 4: Inter-subnet Unicast Example (3)
   Alternatively, as shown in Figure 4, PE routers themselves could be
   configured as default gateways for their locally connected hosts, as
   long as these PE routers have routes to reach outside networks.
3.2. Multicast
   To support IP multicast between hosts of the same Virtual Subnet,
   Multicast VPN (MVPN) technologies [RFC6513] could be used without
   any change. For example, PE routers attached to a given VPN join a
   default provider multicast distribution tree that is dedicated to
   that VPN. Ingress PE routers, upon receiving multicast packets from
   their local hosts, forward them towards remote PE routers through
   the corresponding default provider multicast distribution tree.
   Within this context, IP multicast doesn't include link-local
   multicast.

3.3. Host Discovery
   PE routers should be able to dynamically discover their local hosts
   and keep the list of these hosts up to date in a timely manner so as
   to ensure the availability and accuracy of the host routes that they
   originate. PE routers could accomplish local host discovery by means
   of traditional host discovery mechanisms that rely on the ARP or ND
   protocols.

3.4. ARP/ND Proxy
   Acting as an ARP or ND proxy, a PE router should only respond to an
   ARP request or Neighbor Solicitation (NS) message for a target host
   when it has a best route for that target host in the associated VRF
   and the outgoing interface of that best route is different from the
   one over which the ARP request or NS message is received. In the
   scenario where a given VPN site (i.e., a data center) is multi-homed
   to more than one PE router via an Ethernet switch or an Ethernet
   network, the Virtual Router Redundancy Protocol (VRRP) [RFC5798] is
   usually enabled on these PE routers. In this case, only the PE
   router elected as the VRRP Master is allowed to perform the ARP/ND
   proxy function.
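   The proxy rule above can be summarized as follows. This Python
   fragment is an informative sketch of the decision only, not a
   normative statement of PE behavior; the Route class and the
   interface names are invented for this example.

   from dataclasses import dataclass
   from typing import Optional

   @dataclass
   class Route:
       prefix: str
       out_interface: str

   def should_proxy_reply(best_route: Optional[Route], in_interface: str,
                          vrrp_enabled: bool = False,
                          vrrp_master: bool = True) -> bool:
       """Decide whether the PE answers an ARP request or NS message
       for a target host with its own MAC address."""
       # With multi-homing, only the VRRP Master performs the proxy
       # function.
       if vrrp_enabled and not vrrp_master:
           return False
       # No best route for the target in the associated VRF: stay silent.
       if best_route is None:
           return False
       # Reply only when the best route's outgoing interface differs
       # from the interface over which the request arrived.
       return best_route.out_interface != in_interface

   # Request from host A for host B: the best route points to the MPLS
   # tunnel towards PE-2 while the request arrived on the DC West
   # attachment circuit, so PE-1 replies with its own MAC address.
   print(should_proxy_reply(Route("192.0.2.3/32", "tunnel-to-PE-2"),
                            "ac-dc-west"))                      # True
   # Request for another host attached to the same circuit: PE-1 stays
   # silent and lets that host answer by itself.
   print(should_proxy_reply(Route("192.0.2.2/32", "ac-dc-west"),
                            "ac-dc-west"))                      # False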
3.5. Host Mobility
   During the VM migration process, the PE router to which the moving
   VM is now attached would create a host route for that host upon
   receiving a notification message of VM attachment (e.g., a
   gratuitous ARP or unsolicited NA message). The PE router to which
   the moving VM was previously attached would withdraw the
   corresponding host route when noticing the detachment of that VM.
   Meanwhile, the latter PE router could optionally broadcast a
   gratuitous ARP or send an unsolicited NA message on behalf of that
   host with the source MAC address being one of its own. In this way,
   the ARP/ND entry for the host that moved, as cached on any local
   host, would be updated accordingly. In the case where there is no
   explicit VM detachment notification mechanism, the PE router could
   also use the following trick to detect the VM detachment: upon
   learning a route update for a local host from a remote PE router for
   the first time, the PE router would immediately check whether that
   local host is still attached to it by some means (e.g., ARP/ND PING
   and/or ICMP PING). It is important to ensure that the same MAC and
   IP addresses are associated with the active default gateway in each
   data center, as the VM would most likely continue to send packets to
   the same default gateway address after having migrated from one data
   center to another. One possible way to achieve this goal is to
   configure the same VRRP group at each location so as to ensure that
   the active default gateway in each data center shares the same
   virtual MAC and virtual IP addresses.
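   The detachment-detection heuristic described above is sketched below
   for illustration. All helper names (e.g., arp_ping,
   withdraw_host_route, send_gratuitous_arp) are placeholders invented
   for this example rather than features of any particular
   implementation.

   local_hosts = {"192.0.2.2"}   # hosts this PE currently believes are local

   def arp_ping(ip):
       """Placeholder: probe the host over the local attachment circuit."""
       return False              # assume the VM has already moved away

   def withdraw_host_route(ip):
       print("withdraw", ip + "/32", "from the VPN")

   def send_gratuitous_arp(ip, mac):
       print("gratuitous ARP:", ip, "is now at", mac)

   def on_remote_host_route(ip, advertising_pe, own_mac="<this PE's MAC>"):
       """Called when a /32 route for a supposedly local host is learnt
       from a remote PE for the first time."""
       if ip in local_hosts and not arp_ping(ip):
           # The VM is gone: stop originating the host route and refresh
           # the stale ARP/ND entries cached by the remaining local hosts.
           local_hosts.discard(ip)
           withdraw_host_route(ip)
           send_gratuitous_arp(ip, own_mac)

   # PE-2 starts advertising 192.0.2.2/32 after the VM has moved to DC
   # East; the PE in DC West verifies the detachment and reacts.
   on_remote_host_route("192.0.2.2", "PE-2")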
3.6. Forwarding Table Scalability on Data Center Switches

   In a Virtual Subnet environment, the MAC learning domain associated
   with a given Virtual Subnet which has been extended across multiple
   data centers is partitioned into segments and each segment is
   confined within a single data center. Therefore, data center
   switches only need to learn local MAC addresses, rather than
   learning both local and remote MAC addresses.
3.7. ARP/ND Cache Table Scalability on Default Gateways

   When default gateway functions are implemented on PE routers as
   shown in Figure 4, the ARP/ND cache table on each PE router only
   needs to contain ARP/ND entries of local hosts. As a result, the
   ARP/ND cache table size would not grow as the number of data centers
   to be connected increases.
3.8. ARP/ND and Unknown Unicast Flood Avoidance

   In a Virtual Subnet environment, the flooding domain associated with
   a given Virtual Subnet that has been extended across multiple data
   centers is partitioned into segments and each segment is confined
   within a single data center. Therefore, the performance impact on
   networks and servers imposed by the flooding of ARP/ND
   broadcast/multicast and unknown unicast traffic is minimized.
3.9. Path Optimization

   Take the scenario shown in Figure 4 as an example: to optimize the
   forwarding path for traffic between cloud users and cloud data
   centers, PE routers located in cloud data centers (i.e., PE-1 and
   PE-2), which are also acting as default gateways, propagate host
   routes for their own local hosts to the remote PE routers that are
   attached to cloud user sites (i.e., PE-3). As such, traffic from
   cloud user sites to a given server on the Virtual Subnet that has
   been extended across data centers would be forwarded directly to the
   data center location where that server resides, since traffic is now
   forwarded according to the host route for that server, rather than
   the subnet route. Furthermore, for traffic coming from cloud data
   centers and forwarded to cloud user sites, each PE router acting as
   a default gateway would forward traffic according to the
   longest-match route in the corresponding VRF. As a result, traffic
   from data centers to cloud user sites is forwarded along an optimal
   path as well.
4. Limitations

4.1. Non-support of Non-IP Traffic
   Although most traffic within and across data centers is IP traffic,
   there may still be a few legacy clustering applications that rely on
   non-IP communications (e.g., heartbeat messages between cluster
   nodes). Since Virtual Subnet is strictly based on L3 forwarding,
   such non-IP communications cannot be supported in the Virtual Subnet
   solution. In order to support such non-IP traffic (if present) in an
   environment where the Virtual Subnet solution has been deployed, an
   approach following the idea of "route all IP traffic, bridge non-IP
   traffic" could be considered. In other words, all IP traffic, both
   intra-subnet and inter-subnet, would be processed according to the
   Virtual Subnet design, while non-IP traffic would be forwarded
   according to a particular Layer 2 VPN approach. Such a unified L2/L3
   VPN approach requires ingress PE routers to classify packets
   received from hosts before distributing them to the corresponding L2
   or L3 VPN forwarding processes. Note that more and more cluster
   vendors are offering clustering applications based on Layer 3
   interconnection.
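   The classification step could, for instance, be keyed on the
   EtherType of the frames received over the attachment circuit, as
   sketched below. This Python fragment is illustrative only; the
   EtherType values are the standard ones, while the returned process
   names are placeholders invented for this example.

   ETHERTYPE_IPV4 = 0x0800
   ETHERTYPE_ARP = 0x0806
   ETHERTYPE_IPV6 = 0x86DD

   def classify(ethertype):
       """Return the process an ingress PE hands a received frame to."""
       if ethertype in (ETHERTYPE_IPV4, ETHERTYPE_IPV6):
           # Routable IP traffic (including intra-subnet traffic) is
           # handled by the Virtual Subnet (L3 VPN) forwarding process.
           return "L3VPN (Virtual Subnet) forwarding"
       if ethertype == ETHERTYPE_ARP:
           # ARP is intercepted by the PE's ARP proxy (Section 3.4).
           return "ARP proxy"
       # Anything else (e.g., a legacy non-IP cluster heartbeat) is
       # bridged over the companion L2 VPN service.
       return "L2VPN (bridged) forwarding"

   print(classify(0x0800))   # IPv4 -> routed by the Virtual Subnet
   print(classify(0x8870))   # non-IP legacy protocol -> bridged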
4.2. Non-support of IP Broadcast and Link-local Multicast

   As illustrated before, intra-subnet traffic is forwarded at Layer 3
   in the Virtual Subnet solution. Therefore, IP broadcast and
   link-local multicast traffic cannot be supported by the Virtual
   Subnet solution. In order to support IP broadcast and link-local
   multicast traffic in an environment where the Virtual Subnet
   solution has been deployed, the unified L2/L3 overlay approach
   described in Section 4.1 could be considered as well. That is, IP
   broadcast and link-local multicast messages would be forwarded at
   Layer 2 while routable IP traffic would be processed according to
   the Virtual Subnet design.
4.3. TTL and Traceroute

   As mentioned before, intra-subnet traffic is forwarded at Layer 3 in
   the Virtual Subnet context. Virtual Subnet does not require any
   change to the Time To Live (TTL) handling mechanism of the BGP/MPLS
   IP VPN. As a consequence, when doing a traceroute operation from one
   host to another host (assuming that these two hosts are within the
   same subnet but are attached to different sites), the traceroute
   output would reflect the fact that these two hosts within the same
   subnet are actually connected via a Virtual Subnet rather than a
   Layer 2 connection, since the PE routers to which those two hosts
   are connected would be displayed in the traceroute output. In
   addition, any other application that generates intra-subnet traffic
   with the TTL set to 1 may not work properly in the Virtual Subnet
   context, unless special TTL processing for such a context has been
   implemented (e.g., if the source and destination addresses of a
   packet whose TTL is set to 1 belong to the same extended subnet,
   neither ingress nor egress PE routers should decrement the TTL of
   such a packet; furthermore, the TTL of such a packet should not be
   copied into the TTL of the transport tunnel and vice versa).
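   One possible formulation of this special TTL processing is sketched
   below for illustration. The subnet used is the example subnet of the
   previous figures, and the function name is invented for this
   example.

   import ipaddress

   EXTENDED_SUBNET = ipaddress.ip_network("192.0.2.0/24")

   def ingress_ttl(ip_src, ip_dst, ttl):
       """Return the TTL the ingress PE leaves in the IP header."""
       same_subnet = (ipaddress.ip_address(ip_src) in EXTENDED_SUBNET and
                      ipaddress.ip_address(ip_dst) in EXTENDED_SUBNET)
       if same_subnet:
           # Intra-subnet packet: do not decrement the TTL and do not
           # copy it into the TTL of the transport tunnel, so that a TTL
           # of 1 still reaches a host located in another data center.
           return ttl
       # Ordinary routed packet: the usual TTL decrement applies.
       return ttl - 1

   print(ingress_ttl("192.0.2.2", "192.0.2.3", 1))      # 1
   print(ingress_ttl("192.0.2.2", "198.51.100.7", 64))  # 63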
5. Acknowledgements

   Thanks to Susan Hares, Yongbing Fan, Dino Farinacci, Himanshu Shah,
   Nabil Bitar, Giles Heron, Ronald Bonica, Monique Morrow, Rajiv
   Asati, Eric Osborne, Thomas Morin, Martin Vigoureux, Pedro Roque
   Marque, Joe Touch and Wim Henderickx for their valuable comments and
   suggestions on this document. Thanks to Loa Andersson for his WG LC
   review of this document. Thanks to Alvaro Retana for his AD review
   of this document. Thanks to Ronald Bonica for his RtgDir review.
   Thanks to Donald Eastlake for his Sec-DIR review of this document.
   Thanks to Jouni Korhonen for the OPS-Dir review of this document.
   Thanks to Roni Even for the Gen-ART review of this document. Thanks
   to Sabrina Tanamal for the IANA review of this document.
6. IANA Considerations

   There is no requirement for any IANA action.

7. Security Considerations
   Since the BGP/MPLS IP VPN signaling is reused without any change,
   the security considerations described in [RFC4364] apply to this
   document. Meanwhile, since this solution makes use of an ND proxy
   function and therefore inherits the security issues associated with
   the Neighbor Discovery Protocol (NDP), the security considerations
   and recommendations described in [RFC6583] apply to this document as
   well.
8. References

8.1. Normative References
   [RFC0925] Postel, J., "Multi-LAN address resolution", RFC 925,
             DOI 10.17487/RFC0925, October 1984,
             <http://www.rfc-editor.org/info/rfc925>.

   [RFC1027] Carl-Mitchell, S. and J. Quarterman, "Using ARP to

skipping to change at page 13, line 39
   [RFC5798] Nadas, S., Ed., "Virtual Router Redundancy Protocol (VRRP)
             Version 3 for IPv4 and IPv6", RFC 5798,
             DOI 10.17487/RFC5798, March 2010,
             <http://www.rfc-editor.org/info/rfc5798>.

   [RFC6513] Rosen, E., Ed. and R. Aggarwal, Ed., "Multicast in MPLS/
             BGP IP VPNs", RFC 6513, DOI 10.17487/RFC6513, February
             2012, <http://www.rfc-editor.org/info/rfc6513>.
   [RFC6583] Gashinsky, I., Jaeggli, J., and W. Kumari, "Operational
             Neighbor Discovery Problems", RFC 6583,
             DOI 10.17487/RFC6583, March 2012,
             <http://www.rfc-editor.org/info/rfc6583>.
   [RFC6820] Narten, T., Karir, M., and I. Foo, "Address Resolution
             Problems in Large Data Center Networks", RFC 6820,
             DOI 10.17487/RFC6820, January 2013,
             <http://www.rfc-editor.org/info/rfc6820>.

Authors' Addresses

Xiaohu Xu
Huawei Technologies
No.156 Beiqing Rd
Beijing 100095
CHINA
Email: xuxiaohu@huawei.com

Robert Raszuk
Bloomberg LP
731 Lexington Ave
New York City, NY 10022
USA
Email: robert@raszuk.net

Christian Jacquenet
Orange
4 rue du Clos Courtel
Cesson-Sevigne, 35512
FRANCE
Email: christian.jacquenet@orange.com

Truman Boyes
Bloomberg LP
Email: tboyes@bloomberg.net

Brendan Fee
Extreme Networks