draft-ietf-nvo3-overlay-problem-statement-00.txt   draft-ietf-nvo3-overlay-problem-statement-01.txt 
Internet Engineering Task Force T. Narten, Ed. Internet Engineering Task Force T. Narten, Ed.
Internet-Draft IBM Internet-Draft IBM
Intended status: Informational D. Black Intended status: Informational E. Gray, Ed.
Expires: March 9, 2013 EMC Expires: April 26, 2013 Ericsson
D. Black
EMC
D. Dutt D. Dutt
L. Fang L. Fang
Cisco Systems Cisco Systems
E. Gray, Ed.
Ericsson
L. Kreeger L. Kreeger
Cisco Cisco
M. Napierala M. Napierala
AT&T AT&T
M. Sridharan M. Sridharan
Microsoft Microsoft
September 5, 2012 October 23, 2012
Problem Statement: Overlays for Network Virtualization Problem Statement: Overlays for Network Virtualization
draft-ietf-nvo3-overlay-problem-statement-00 draft-ietf-nvo3-overlay-problem-statement-01
Abstract Abstract
This document describes issues associated with providing multi- This document describes issues associated with providing multi-
tenancy in large data center networks that require an overlay-based tenancy in large data center networks and how these issues may be
network virtualization approach to addressing them. A key multi- addressed using an overlay-based network virtualization approach. A
tenancy requirement is traffic isolation, so that a tenant's traffic key multi-tenancy requirement is traffic isolation, so that one
is not visible to any other tenant. This isolation can be achieved tenant's traffic is not visible to any other tenant. Another
by assigning one or more virtual networks to each tenant such that requirement is address space isolation, so that different tenants can
traffic within a virtual network is isolated from traffic in other use the same address space within different virtual networks.
virtual networks. The primary functionality required is provisioning Traffic and address space isolation is achieved by assigning one or
virtual networks, associating a virtual machine's virtual network more virtual networks to each tenant, where traffic within a virtual
interface(s) with the appropriate virtual network, and maintaining network can only cross into another virtual network in a controlled
that association as the virtual machine is activated, migrated and/or fashion (e.g., via a configured router and/or a security gateway).
deactivated. Use of an overlay-based approach enables scalable Additional functionality is required to provision virtual networks,
deployment on large network infrastructures. associating a virtual machine's network interface(s) with the
appropriate virtual network, and maintaining that association as the
virtual machine is activated, migrated and/or deactivated. Use of an
overlay-based approach enables scalable deployment on large network
infrastructures.
Status of this Memo Status of this Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on March 9, 2013. This Internet-Draft will expire on April 26, 2013.
Copyright Notice Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
2. Problem Areas . . . . . . . . . . . . . . . . . . . . . . . . 5 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1. Need For Dynamic Provisioning . . . . . . . . . . . . . . 5 3. Problem Areas . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2. Virtual Machine Mobility Limitations . . . . . . . . . . . 6 3.1. Need For Dynamic Provisioning . . . . . . . . . . . . . . 6
2.3. Inadequate Forwarding Table Sizes in Switches . . . . . . 6 3.2. Virtual Machine Mobility Limitations . . . . . . . . . . . 6
2.4. Need to Decouple Logical and Physical Configuration . . . 7 3.3. Inadequate Forwarding Table Sizes . . . . . . . . . . . . 7
2.5. Need For Address Separation Between Tenants . . . . . . . 7 3.4. Need to Decouple Logical and Physical Configuration . . . 7
2.6. Need For Address Separation Between Tenant and 3.5. Need For Address Separation Between Virtual Networks . . . 8
Infrastructure . . . . . . . . . . . . . . . . . . . . . . 7 3.6. Need For Address Separation Between Virtual Networks
2.7. IEEE 802.1 VLAN Limitations . . . . . . . . . . . . . . . 8 and Infrastructure . . . . . . . . . . . . . . . . . . . . 8
3. Network Overlays . . . . . . . . . . . . . . . . . . . . . . . 8 3.7. Optimal Forwarding . . . . . . . . . . . . . . . . . . . . 8
3.1. Benefits of Network Overlays . . . . . . . . . . . . . . . 9 4. Using Network Overlays to Provide Virtual Networks . . . . . . 9
3.2. Communication Between Virtual and Traditional Networks . . 10 4.1. Overview of Network Overlays . . . . . . . . . . . . . . . 9
3.3. Communication Between Virtual Networks . . . . . . . . . . 11 4.2. Communication Between Virtual and Non-virtualized
3.4. Overlay Design Characteristics . . . . . . . . . . . . . . 11 Networks . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.5. Overlay Networking Work Areas . . . . . . . . . . . . . . 12 4.3. Communication Between Virtual Networks . . . . . . . . . . 11
4. Related IETF and IEEE Work . . . . . . . . . . . . . . . . . 14 4.4. Overlay Design Characteristics . . . . . . . . . . . . . . 12
4.1. L3 BGP/MPLS IP VPNs . . . . . . . . . . . . . . . . . . . 14 4.5. Control Plane Overlay Networking Work Areas . . . . . . . 13
4.2. L2 BGP/MPLS IP VPNs . . . . . . . . . . . . . . . . . . . 14 4.6. Data Plane Work Areas . . . . . . . . . . . . . . . . . . 14
4.3. IEEE 802.1aq - Shortest Path Bridging . . . . . . . . . . 15 5. Related IETF and IEEE Work . . . . . . . . . . . . . . . . . . 14
4.4. ARMD . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 5.1. BGP/MPLS IP VPNs . . . . . . . . . . . . . . . . . . . . . 15
4.5. TRILL . . . . . . . . . . . . . . . . . . . . . . . . . . 15 5.2. BGP/MPLS Ethernet VPNs . . . . . . . . . . . . . . . . . . 15
4.6. L2VPNs . . . . . . . . . . . . . . . . . . . . . . . . . . 16 5.3. 802.1 VLANs . . . . . . . . . . . . . . . . . . . . . . . 15
4.7. Proxy Mobile IP . . . . . . . . . . . . . . . . . . . . . 16 5.4. IEEE 802.1aq - Shortest Path Bridging . . . . . . . . . . 16
4.8. LISP . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 5.5. ARMD . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
5. Further Work . . . . . . . . . . . . . . . . . . . . . . . . . 16 5.6. TRILL . . . . . . . . . . . . . . . . . . . . . . . . . . 17
6. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 5.7. L2VPNs . . . . . . . . . . . . . . . . . . . . . . . . . . 17
7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 17 5.8. Proxy Mobile IP . . . . . . . . . . . . . . . . . . . . . 17
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 17 5.9. LISP . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
9. Security Considerations . . . . . . . . . . . . . . . . . . . 17 5.10. VDP . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
10. Informative References . . . . . . . . . . . . . . . . . . . . 17 6. Further Work . . . . . . . . . . . . . . . . . . . . . . . . . 18
Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 19 7. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
A.1. Changes from 8. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 18
draft-narten-nvo3-overlay-problem-statement-04.txt . . . . 19 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 19 10. Security Considerations . . . . . . . . . . . . . . . . . . . 19
11. Informative References . . . . . . . . . . . . . . . . . . . . 19
Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 20
A.1. Changes From -00 to -01 . . . . . . . . . . . . . . . . . 20
A.2. Changes from
draft-narten-nvo3-overlay-problem-statement-04.txt . . . . 21
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 21
1. Introduction 1. Introduction
Data Centers are increasingly being consolidated and outsourced in an Data Centers are increasingly being consolidated and outsourced in an
effort, both to improve the deployment time of applications as well effort to improve the deployment time of applications and reduce
as reduce operational costs. This coincides with an increasing operational costs. This coincides with an increasing demand for
demand for compute, storage, and network resources from applications. compute, storage, and network resources from applications. In order
In order to scale compute, storage, and network resources, physical to scale compute, storage, and network resources, physical resources
resources are being abstracted from their logical representation, in are being abstracted from their logical representation, in what is
what is referred to as server, storage, and network virtualization. referred to as server, storage, and network virtualization.
Virtualization can be implemented in various layers of computer Virtualization can be implemented in various layers of computer
systems or networks systems or networks.
The demand for server virtualization is increasing in data centers. The demand for server virtualization is increasing in data centers.
With server virtualization, each physical server supports multiple With server virtualization, each physical server supports multiple
virtual machines (VMs), each running its own operating system, virtual machines (VMs), each running its own operating system,
middleware and applications. Virtualization is a key enabler of middleware and applications. Virtualization is a key enabler of
workload agility, i.e., allowing any server to host any application workload agility, i.e., allowing any server to host any application
and providing the flexibility of adding, shrinking, or moving and providing the flexibility of adding, shrinking, or moving
services within the physical infrastructure. Server virtualization services within the physical infrastructure. Server virtualization
provides numerous benefits, including higher utilization, increased provides numerous benefits, including higher utilization, increased
security, reduced user downtime, reduced power usage, etc. security, reduced user downtime, reduced power usage, etc.
Multi-tenant data centers are taking advantage of the benefits of Multi-tenant data centers are taking advantage of the benefits of
server virtualization to provide a new kind of hosting, a virtual server virtualization to provide a new kind of hosting, a virtual
hosted data center. Multi-tenant data centers are ones where hosted data center. Multi-tenant data centers are ones where
individual tenants could belong to a different company (in the case individual tenants could belong to a different company (in the case
of a public provider) or a different department (in the case of an of a public provider) or a different department (in the case of an
internal company data center). Each tenant has the expectation of a internal company data center). Each tenant has the expectation of a
level of security and privacy separating their resources from those level of security and privacy separating their resources from those
of other tenants. For example, one tenant's traffic must never be of other tenants. For example, one tenant's traffic must never be
exposed to another tenant, except through carefully controlled exposed to another tenant, except through carefully controlled
interfaces, such as a security gateway. interfaces, such as a security gateway (e.g., a firewall).
To a tenant, virtual data centers are similar to their physical To a tenant, virtual data centers are similar to their physical
counterparts, consisting of end stations attached to a network, counterparts, consisting of end stations attached to a network,
complete with services such as load balancers and firewalls. But complete with services such as load balancers and firewalls. But
unlike a physical data center, end stations connect to a virtual unlike a physical data center, tenant systems connect to a virtual
network. To end stations, a virtual network looks like a normal network. To tenant systems, a virtual network looks like a normal
network (e.g., providing an Ethernet or L3 service), except that the network (e.g., providing an Ethernet or L3 service), except that the
only end stations connected to the virtual network are those only end stations connected to the virtual network are those
belonging to a tenant's specific virtual network. belonging to a tenant's specific virtual network.
A tenant is the administrative entity that is responsible for and A tenant is the administrative entity on whose behalf one or more
manages a specific virtual network instance and its associated specific virtual network instances and their associated services
services (whether virtual or physical). In a cloud environment, a (whether virtual or physical) are managed. In a cloud environment, a
tenant would correspond to the customer that has defined and is using tenant would correspond to the customer that is using a particular
a particular virtual network. However, a tenant may also find it virtual network. However, a tenant may also find it useful to create
useful to create multiple different virtual network instances. multiple different virtual network instances. Hence, there is a one-
to-many mapping between tenants and virtual network instances. A
Hence, there is a one-to-many mapping between tenants and virtual single tenant may operate multiple individual virtual network
network instances. A single tenant may operate multiple individual instances, each associated with a different service.
virtual network instances, each associated with a different service.
How a virtual network is implemented does not generally matter to the How a virtual network is implemented does not generally matter to the
tenant; what matters is that the service provided (L2 or L3) has the tenant; what matters is that the service provided (L2 or L3) has the
right semantics, performance, etc. It could be implemented via a right semantics, performance, etc. It could be implemented via a
pure routed network, a pure bridged network or a combination of pure routed network, a pure bridged network or a combination of
bridged and routed networks. A key requirement is that each bridged and routed networks. A key requirement is that each
individual virtual network instance be isolated from other virtual individual virtual network instance be isolated from other virtual
network instances. network instances, with traffic crossing from one virtual network to
another only when allowed by policy.
For data center virtualization, two key issues must be addressed. For data center virtualization, two key issues must be addressed.
First, address space separation between tenants must be supported. First, address space separation between tenants must be supported.
Second, it must be possible to place (and migrate) VMs anywhere in Second, it must be possible to place (and migrate) VMs anywhere in
the data center, without restricting VM addressing to match the the data center, without restricting VM addressing to match the
subnet boundaries of the underlying data center network. subnet boundaries of the underlying data center network.
This document outlines the problems encountered in scaling the number The document outlines problems encountered in scaling the number of
of isolated networks in a data center, as well as the problems of isolated virtual networks in a data center. Furthermore, the
managing the creation/deletion, membership and span of these networks document presents issues associated with managing those virtual
and makes the case that an overlay-based approach, where individual networks, in relation to operations, such as virtual network
networks are implemented within individual virtual networks that are creation/deletion and end-node membership change. Finally, the
dynamically controlled by a standardized control plane provides a document makes the case that an overlay-based approach has a number
number of advantages over current approaches. The purpose of this of advantages over traditional, non-overlay approaches. The purpose
document is to identify the set of problems that any solution has to of this document is to identify the set of issues that any solution
address in building multi-tenant data centers. With this approach, has to address in building multi-tenant data centers. With this
the goal is to allow the construction of standardized, interoperable approach, the goal is to allow the construction of standardized,
implementations to allow the construction of multi-tenant data interoperable implementations to allow the construction of multi-
centers. tenant data centers.
Section 2 describes the problem space details. Section 3 describes This document is the problem statement for the "Network
overlay networks in more detail. Sections 4 and 5 review related and Virtualization over L3" (NVO3) Working Group. NVO3 is focused on the
further work, while Section 6 closes with a summary. construction of overlay networks that operate over an IP (L3)
underlay transport network. NVO3 expects to provide both L2 service
and IP service to end devices (though perhaps as two different
solutions). Some deployments require an L2 service, others an L3
service, and some may require both.
2. Problem Areas Section 2 gives terminology. Section 3 describes the problem space
details. Section 4 describes overlay networks in more detail.
Sections 5 and 6 review related and further work, while Section 7
closes with a summary.
2. Terminology
This document uses the same terminology as
[I-D.lasserre-nvo3-framework]. In addition, this document uses the
following terms.
In-Band Virtual Network: A Virtual Network that separates tenant
traffic without hiding tenant forwarding information from the
physical infrastructure. The Tenant System may also retain
visibility of a tenant within the underlying physical
infrastructure. IEEE 802.1 networks using C-VIDs are an example
of an in-band Virtual Network.
Overlay Virtual Network: A Virtual Network in which the separation
of tenants is hidden from the underlying physical infrastructure.
That is, the underlying transport network does not need to know
about tenancy separation to correctly forward traffic.
VLANs: An informal term referring to IEEE 802.1 networks using
C-VIDs.
3. Problem Areas
The following subsections describe aspects of multi-tenant data The following subsections describe aspects of multi-tenant data
center networking that pose problems for network infrastructure. center networking that pose problems for network infrastructure.
Different problem aspects may arise based on the network architecture Different problem aspects may arise based on the network architecture
and scale. and scale.
2.1. Need For Dynamic Provisioning 3.1. Need For Dynamic Provisioning
Cloud computing involves on-demand provisioning of resources for Cloud computing involves on-demand provisioning of resources for
multi-tenant environments. A common example of cloud computing is multi-tenant environments. A common example of cloud computing is
the public cloud, where a cloud service provider offers elastic the public cloud, where a cloud service provider offers elastic
services to multiple customers over the same infrastructure. In services to multiple customers over the same infrastructure. In
current systems, it can be difficult to provision resources for current systems, it can be difficult to provision resources for
individual tenants in such a way that provisioned properties migrate individual tenants (e.g., QoS) in such a way that provisioned
automatically when services are dynamically moved around within the properties migrate automatically when services are dynamically moved
data center to optimize workloads. around within the data center to optimize workloads.
2.2. Virtual Machine Mobility Limitations 3.2. Virtual Machine Mobility Limitations
A key benefit of server virtualization is virtual machine (VM) A key benefit of server virtualization is virtual machine (VM)
mobility. A VM can be migrated from one server to another, live, mobility. A VM can be migrated from one server to another, live,
i.e., while continuing to run and without needing to shut it down and i.e., while continuing to run and without needing to shut it down and
restart it at the new location. A key requirement for live migration restart it at the new location. A key requirement for live migration
is that a VM retain critical network state at its new location, is that a VM retain critical network state at its new location,
including its IP and MAC address(es). Preservation of MAC addresses including its IP and MAC address(es). Preservation of MAC addresses
may be necessary, for example, when software licenses are bound to may be necessary, for example, when software licenses are bound to
MAC addresses. More generally, any change in the VM's MAC addresses MAC addresses. More generally, any change in the VM's MAC addresses
resulting from a move would be visible to the VM and thus potentially resulting from a move would be visible to the VM and thus potentially
result in unexpected disruptions. Retaining IP addresses after a result in unexpected disruptions. Retaining IP addresses after a
move is necessary to prevent existing transport connections (e.g., move is necessary to prevent existing transport connections (e.g.,
TCP) from breaking and needing to be restarted. TCP) from breaking and needing to be restarted.
In traditional data centers, servers are assigned IP addresses based In data center networks, servers are typically assigned IP addresses
on their physical location, for example based on the Top of Rack based on their physical location, for example based on the Top of
(ToR) switch for the server rack or the VLAN configured to the Rack (ToR) switch for the server rack or the VLAN configured to the
server. Servers can only move to other locations within the same IP server. Servers can only move to other locations within the same IP
subnet. This constraint is not problematic for physical servers, subnet. This constraint is not problematic for physical servers,
which move infrequently, but it restricts the placement and movement which move infrequently, but it restricts the placement and movement
of VMs within the data center. Any solution for a scalable multi- of VMs within the data center. Any solution for a scalable multi-
tenant data center must allow a VM to be placed (or moved) anywhere tenant data center must allow a VM to be placed (or moved) anywhere
within the data center, without being constrained by the subnet within the data center, without being constrained by the subnet
boundary concerns of the host servers. boundary concerns of the host servers.
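
The subnet constraint described above can be illustrated with a minimal sketch. The rack names and prefixes below are hypothetical and are not taken from this document; the sketch only checks whether a VM could keep its IP address after moving to a server behind a different ToR switch.

```python
import ipaddress

# Hypothetical per-rack subnets (illustrative assumptions only).
RACK_SUBNETS = {
    "rack-a": ipaddress.ip_network("10.1.1.0/24"),
    "rack-b": ipaddress.ip_network("10.1.2.0/24"),
}


def can_keep_address(vm_ip: str, target_rack: str) -> bool:
    """Return True if vm_ip lies inside the subnet assigned to target_rack."""
    return ipaddress.ip_address(vm_ip) in RACK_SUBNETS[target_rack]


# A VM addressed out of rack-a's subnet can stay in rack-a, but cannot
# move to rack-b without renumbering when addressing follows location.
print(can_keep_address("10.1.1.25", "rack-a"))  # True
print(can_keep_address("10.1.1.25", "rack-b"))  # False
```

When addressing is tied to location in this way, the second check fails and a live migration would break existing transport connections.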
2.3. Inadequate Forwarding Table Sizes in Switches 3.3. Inadequate Forwarding Table Sizes
Today's virtualized environments place additional demands on the Today's virtualized environments place additional demands on the
forwarding tables of switches in the physical infrastructure. forwarding tables of forwarding nodes in the physical infrastructure.
Instead of just one link-layer address per server, the switching The core problem is that location independence results in specific
infrastructure has to learn addresses of the individual VMs (which end state information being propagated into the forwarding system
could range in the 100s per server). This is a requirement since (e.g., /32 host routes in L3 networks, or MAC addresses in L2
traffic from/to the VMs to the rest of the physical network will networks). In L2 networks, for instance, instead of just one link-
traverse the physical network infrastructure. This places a much layer address per server, the switching infrastructure may have to
larger demand on the switches' forwarding table capacity compared to learn addresses of the individual VMs (which could range in the 100s
non-virtualized environments, causing more traffic to be flooded or per server). This increases the demand on a forwarding node's table
dropped when the number of addresses in use exceeds a switch's capacity compared to non-virtualized environments.
forwarding table capacity.
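
As a rough illustration of the scaling concern, the following sketch estimates the number of link-layer addresses an L2 forwarding node might have to learn. The server count, VM density, and table capacity are assumed values chosen for the example, not figures from this document.

```python
def required_l2_entries(servers: int, vms_per_server: int) -> int:
    """One address per physical server plus one per hosted VM (assumed model)."""
    return servers * (1 + vms_per_server)


def exceeds_capacity(servers: int, vms_per_server: int, table_capacity: int) -> bool:
    """Return True if the estimated demand is larger than the table capacity."""
    return required_l2_entries(servers, vms_per_server) > table_capacity


# Hypothetical data center: 1000 servers, table capacity of 64000 entries.
print(required_l2_entries(1000, 0))        # 1000 (non-virtualized)
print(required_l2_entries(1000, 100))      # 101000 (100 VMs per server)
print(exceeds_capacity(1000, 100, 64000))  # True
```

Under these assumptions, virtualization raises the address count by two orders of magnitude, which is the kind of increase the text above refers to.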
2.4. Need to Decouple Logical and Physical Configuration 3.4. Need to Decouple Logical and Physical Configuration
Data center operators must be able to achieve high utilization of Data center operators must be able to achieve high utilization of
server and network capacity. For efficient and flexible allocation, server and network capacity. For efficient and flexible allocation,
operators should be able to spread a virtual network instance across operators should be able to spread a virtual network instance across
servers in any rack in the data center. It should also be possible servers in any rack in the data center. It should also be possible
to migrate compute workloads to any server anywhere in the network to migrate compute workloads to any server anywhere in the network
while retaining the workload's addresses. In networks using VLANs, while retaining the workload's addresses. In networks using VLANs,
moving servers elsewhere in the network may require expanding the moving servers elsewhere in the network may require expanding the
scope of the VLAN beyond its original boundaries. While this can be scope of the VLAN beyond its original boundaries. While this can be
done, it requires potentially complex network configuration changes done, it requires potentially complex network configuration changes
and can conflict with the desire to bound the size of broadcast and can conflict with the desire to bound the size of broadcast
domains, especially in larger data centers. domains, especially in larger data centers. In addition, when VMs
migrate, the physical network (e.g., access lists) may need to be
reconfigured which can be time consuming and error prone.
However, in order to limit the broadcast domain of each VLAN, multi- In order to limit the broadcast domain of each VLAN, multi-
destination frames within a VLAN should optimally flow only to those destination frames within a VLAN should optimally flow only to those
devices that have that VLAN configured. When workloads migrate, the devices that have that VLAN configured. When workloads migrate, the
physical network (e.g., access lists) may need to be reconfigured physical network (e.g., access lists) may need to be reconfigured
which is typically time consuming and error prone. which is typically time consuming and error prone.
An important use case is cross-pod expansion. A pod typically An important use case is cross-pod expansion. A pod typically
consists of one or more racks of servers with its associated network consists of one or more racks of servers with associated network and
and storage connectivity. A tenant's virtual network may start off storage connectivity. A tenant's virtual network may start off on a
on a pod and, due to expansion, require servers/VMs on other pods, pod and, due to expansion, require servers/VMs on other pods,
especially when other pods are not fully utilizing all their especially when other pods are not fully utilizing all their
resources. This use case requires that virtual networks span resources. This use case requires that virtual networks span
multiple pods in order to provide connectivity to all of the tenant's multiple pods in order to provide connectivity to all of the tenant's
servers/VMs. Such expansion can be difficult to achieve when tenant servers/VMs. Such expansion can be difficult to achieve when tenant
addressing is tied to the addressing used by the underlay network or addressing is tied to the addressing used by the underlay network or
when it requires that the scope of the underlying L2 VLAN expand when the expansion requires that the scope of the underlying L2 VLAN
beyond its original pod boundary. expand beyond its original pod boundary.
2.5. Need For Address Separation Between Tenants 3.5. Need For Address Separation Between Virtual Networks
Individual tenants need control over the addresses they use within a Individual tenants need control over the addresses they use within a
virtual network. But it can be problematic when different tenants virtual network. But it can be problematic when different tenants
want to use the same addresses, or even if the same tenant wants to want to use the same addresses, or even if the same tenant wants to
reuse the same addresses in different virtual networks. reuse the same addresses in different virtual networks.
Consequently, virtual networks must allow tenants to use whatever Consequently, virtual networks must allow tenants to use whatever
addresses they want without concern for what addresses are being used addresses they want without concern for what addresses are being used
by other tenants or other virtual networks. by other tenants or other virtual networks.
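
One way to picture address separation between virtual networks is a forwarding table keyed on the pair (virtual network identifier, address) rather than on the address alone. The sketch below is illustrative only; the identifiers, addresses, and location names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Key: (virtual network identifier, tenant address); value: attachment point.
ForwardingTable = Dict[Tuple[int, str], str]


def learn(table: ForwardingTable, vn_id: int, address: str, location: str) -> None:
    """Record where an address lives within one virtual network."""
    table[(vn_id, address)] = location


def lookup(table: ForwardingTable, vn_id: int, address: str) -> Optional[str]:
    """Resolve an address in the context of a single virtual network."""
    return table.get((vn_id, address))


table: ForwardingTable = {}
# Two virtual networks reuse the same tenant address without conflict.
learn(table, 100, "192.168.0.10", "server-1")
learn(table, 200, "192.168.0.10", "server-7")
print(lookup(table, 100, "192.168.0.10"))  # server-1
print(lookup(table, 200, "192.168.0.10"))  # server-7
print(lookup(table, 300, "192.168.0.10"))  # None
```

Because every lookup is scoped by the virtual network identifier, an address chosen in one virtual network places no constraint on the addresses used in another.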
2.6. Need For Address Separation Between Tenant and Infrastructure 3.6. Need For Address Separation Between Virtual Networks and
Infrastructure
As in the previous case, a tenant needs to be able to use whatever As in the previous case, a tenant needs to be able to use whatever
addresses it wants in a virtual network independent of what addresses addresses it wants in a virtual network independent of what addresses
the underlying data center network is using. Tenants (and the the underlying data center network is using. Tenants (and the
underlay infrastructure provider) should be able to use whatever underlay infrastructure provider) should be able to use whatever
addresses make sense for them, without having to worry about address addresses make sense for them, without having to worry about address
collisions between addresses used by tenants and those used by the collisions between addresses used by tenants and those used by the
underlay data center network. underlay data center network.
2.7. IEEE 802.1 VLAN Limitations 3.7. Optimal Forwarding
VLANs are a well-known construct in the networking industry, Another problem area relates to the changing of optimal paths when a
providing an L2 service via an L2 underlay. A VLAN is an L2 bridging VM moves from one location to another. In the simplest case, a
construct that provides some of the semantics of virtual networks virtual network may have 2 external routers (whether for inter-VN
mentioned above: a MAC address is unique within a VLAN, but not traffic, or traffic external to all VNs). When a VM migrates to a
necessarily across VLANs. Traffic sourced within a VLAN (including new location within the data center, the closest router may change,
broadcast and multicast traffic) remains within the VLAN it i.e., the VM may get better service by switching to the "closer"
originates from. Traffic forwarded from one VLAN to another router. But IP does not normally distinguish between multiple
typically involves router (L3) processing. The forwarding table routers on the same subnet. All routers are considered one hop away.
lookup operation is keyed on {VLAN, MAC address} tuples.
But there are problems and limitations with L2 VLANs. VLANs are a The issue is further complicated when middleboxes (e.g., load-
pure L2 bridging construct and VLAN identifiers are carried along balancers, firewalls, etc.) must be traversed. Middleboxes may have
with data frames to allow each forwarding point to know what VLAN the session state that must be preserved for ongoing communication, and
frame belongs to. A VLAN today is defined as a 12-bit number, traffic must continue to flow through the middlebox, regardless of
limiting the total number of VLANs to 4096 (though typically, this which router is "closest".
number is 4094 since 0 and 4095 are reserved). Due to the large
number of tenants that a cloud provider might service, the 4094 VLAN
limit is often inadequate. In addition, there is often a need for
multiple VLANs per tenant, which exacerbates the issue. The use of a
sufficiently large VNID, present in the overlay control plane and
possibly also in the data plane, would eliminate current VLAN size
limitations associated with single 12-bit VLAN tags.
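The arithmetic behind the 12-bit VLAN limit can be worked through in a short sketch. The 24-bit VNID width used for comparison is an assumption chosen for illustration, since the text only calls for a "sufficiently large" identifier. The helper `vlans_exhausted` is hypothetical.

```python
VLAN_ID_BITS = 12
vlan_ids = 2 ** VLAN_ID_BITS      # 4096 possible values
usable_vlans = vlan_ids - 2       # values 0 and 4095 are reserved

# Assumption for illustration only: a 24-bit virtual network identifier.
ASSUMED_VNID_BITS = 24
vnid_space = 2 ** ASSUMED_VNID_BITS

def vlans_exhausted(tenants, vlans_per_tenant):
    """Return True when the demand exceeds the usable VLAN ID space."""
    return tenants * vlans_per_tenant > usable_vlans
```

With these numbers, 2000 tenants needing three VLANs each would require 6000 identifiers, which exceeds the 4094 usable VLAN IDs.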
3. Network Overlays 4. Using Network Overlays to Provide Virtual Networks
Virtual Networks are used to isolate a tenant's traffic from that of Virtual Networks are used to isolate a tenant's traffic from that of
other tenants (or even traffic within the same tenant that requires other tenants (or even traffic within the same tenant network that
isolation). There are two main characteristics of virtual networks: requires isolation). There are two main characteristics of virtual
networks:
1. Virtual networks isolate the address space used in one virtual 1. Virtual networks isolate the address space used in one virtual
network from the address space used by another virtual network. network from the address space used by another virtual network.
The same network addresses may be used in different virtual The same network addresses may be used in different virtual
networks at the same time. In addition, the address space used networks at the same time. In addition, the address space used
by a virtual network is independent from that used by the by a virtual network is independent from that used by the
underlying physical network. underlying physical network.
2. Virtual Networks limit the scope of packets sent on the virtual 2. Virtual Networks limit the scope of packets sent on the virtual
network. Packets sent by end systems attached to a virtual network. Packets sent by Tenant Systems attached to a virtual
network are delivered as expected to other end systems on that network are delivered as expected to other Tenant Systems on that
virtual network and may exit a virtual network only through virtual network and may exit a virtual network only through
controlled exit points such as a security gateway. Likewise, controlled exit points such as a security gateway. Likewise,
packets sourced from outside of the virtual network may enter the packets sourced from outside of the virtual network may enter the
virtual network only through controlled entry points, such as a virtual network only through controlled entry points, such as a
security gateway. security gateway.
3.1. Benefits of Network Overlays 4.1. Overview of Network Overlays
To address the problems described in Section 2, a network overlay To address the problems described in Section 3, a network overlay
model can be used. approach can be used.
The idea behind an overlay is quite straightforward. Each virtual The idea behind an overlay is quite straightforward. Each virtual
network instance is implemented as an overlay. The original packet network instance is implemented as an overlay. The original packet
is encapsulated by the first-hop network device. The encapsulation is encapsulated by the first-hop network device, called a Network
identifies the destination of the device that will perform the Virtualization Edge (NVE). The encapsulation identifies the
decapsulation before delivering the original packet to the endpoint. destination of the device that will perform the decapsulation (i.e.,
The rest of the network forwards the packet based on the the egress NVE) before delivering the original packet to the
endpoint. The rest of the network forwards the packet based on the
encapsulation header and can be oblivious to the payload that is encapsulation header and can be oblivious to the payload that is
carried inside. carried inside.
Overlays are based on what is commonly known as a "map-and-encap" Overlays are based on what is commonly known as a "map-and-encap"
architecture. There are three distinct and logically separable architecture. When processing and forwarding packets, three distinct
steps: and logically separable steps take place:
1. The first-hop overlay device implements a mapping operation that 1. The first-hop overlay device implements a mapping operation that
determines where the encapsulated packet should be sent to reach determines where the encapsulated packet should be sent to reach
its intended destination VM. Specifically, the mapping function its intended destination VM. Specifically, the mapping function
maps the destination address (either L2 or L3) of a packet maps the destination address (either L2 or L3) of a packet
received from a VM into the corresponding destination address of received from a VM into the corresponding destination address of
the egress device. The destination address will be the underlay the egress NVE device. The destination address will be the
address of the device doing the decapsulation and is an IP underlay address of the NVE device doing the decapsulation and is
address. an IP address.
2. Once the mapping has been determined, the ingress overlay device 2. Once the mapping has been determined, the ingress overlay NVE
encapsulates the received packet within an overlay header. device encapsulates the received packet within an overlay header.
3. The final step is to actually forward the (now encapsulated) 3. The final step is to actually forward the (now encapsulated)
packet to its destination. The packet is forwarded by the packet to its destination. The packet is forwarded by the
underlay (i.e., the IP network) based entirely on its outer underlay (i.e., the IP network) based entirely on its outer
address. Upon receipt at the destination, the egress overlay address. Upon receipt at the destination, the egress overlay NVE
device decapsulates the original packet and delivers it to the device decapsulates the original packet and delivers it to the
intended recipient VM. intended recipient VM.
Each of the above steps is logically distinct, though an Each of the above steps is logically distinct, though an
implementation might combine them for efficiency or other reasons. implementation might combine them for efficiency or other reasons.
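The three map-and-encap steps above can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not a protocol implementation: the class names, the dictionary-based mapping table, and the sample addresses are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InnerPacket:
    dst: str          # tenant (L2 or L3) destination address
    payload: bytes

@dataclass(frozen=True)
class EncapsulatedPacket:
    outer_dst: str    # underlay IP address of the egress NVE
    vn_id: int        # identifies the virtual network
    inner: InnerPacket

class IngressNVE:
    def __init__(self, mapping):
        # mapping: {(vn_id, tenant destination): egress NVE underlay IP}
        self.mapping = mapping

    def map_destination(self, vn_id, pkt):
        """Step 1: map the tenant destination to the egress NVE address."""
        return self.mapping[(vn_id, pkt.dst)]

    def encapsulate(self, vn_id, pkt, egress_ip):
        """Step 2: wrap the original packet in an overlay header."""
        return EncapsulatedPacket(egress_ip, vn_id, pkt)

class EgressNVE:
    def decapsulate(self, encap):
        """Recover the virtual network and the original packet."""
        return encap.vn_id, encap.inner

def underlay_forward(encap, nves_by_ip):
    """Step 3: the underlay looks only at the outer destination address."""
    return nves_by_ip[encap.outer_dst]

ingress = IngressNVE({(7, "10.0.0.5"): "192.0.2.20"})
egress = EgressNVE()
pkt = InnerPacket("10.0.0.5", b"hello")
egress_ip = ingress.map_destination(7, pkt)
encap = ingress.encapsulate(7, pkt, egress_ip)
target = underlay_forward(encap, {"192.0.2.20": egress})
vn, delivered = target.decapsulate(encap)
```

Keeping the three steps as separate functions mirrors the logical separation described in the text, although an implementation might combine them.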
It should be noted that in L3 BGP/VPN terminology, the above steps It should be noted that in L3 BGP/VPN terminology, the above steps
are commonly known as "forwarding" or "virtual forwarding". are commonly known as "forwarding" or "virtual forwarding".
The first hop network device can be a traditional switch or router or The first hop network NVE device can be a traditional switch or
the virtual switch residing inside a hypervisor. Furthermore, the router or the virtual switch residing inside a hypervisor.
endpoint can be a VM or it can be a physical server. Examples of Furthermore, the endpoint can be a VM or it can be a physical server.
architectures based on network overlays include BGP/MPLS VPNs Examples of architectures based on network overlays include BGP/MPLS
[RFC4364], TRILL [RFC6325], LISP [I-D.ietf-lisp], and Shortest Path VPNs [RFC4364], TRILL [RFC6325], LISP [I-D.ietf-lisp], and Shortest
Bridging (SPB-M) [SPBM]. Path Bridging (SPBM) [SPBM].
In the data plane, a virtual network identifier (or VNID), or a
locally significant identifier, can be carried as part of the overlay
header so that every data packet explicitly identifies the specific
virtual network the packet belongs to. Since both routed and bridged
semantics can be supported by a virtual data center, the original
packet carried within the overlay header can be an Ethernet frame
complete with MAC addresses or just the IP packet.
The use of a sufficiently large VNID would address current VLAN In the data plane, an overlay header provides a place to carry either
limitations associated with single 12-bit VLAN tags. This VNID can the virtual network identifier, or an identifier that is locally-
be carried in the control plane. In the data plane, an overlay significant to the edge device. In both cases, the identifier in the
header provides a place to carry either the VNID, or an identifier overlay header specifies which specific virtual network the data
that is locally-significant to the edge device. In both cases, the packet belongs to. Since both routed and bridged semantics can be
identifier in the overlay header specifies which virtual network the supported by a virtual data center, the original packet carried
data packet belongs to. within the overlay header can be an Ethernet frame or just the IP
packet.
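To make the role of the overlay header concrete, the following sketch packs a virtual network identifier in front of the original packet. The 8-byte layout (one payload-type byte, three padding bytes, a 4-byte identifier) is purely hypothetical and does not correspond to any standardized encapsulation. The constant and function names are likewise assumptions.

```python
import struct

# Hypothetical 8-byte overlay header, NOT a standardized format:
# 1 byte payload type, 3 padding bytes, 4-byte virtual network identifier.
HEADER = struct.Struct("!B3xI")
PAYLOAD_ETHERNET = 1   # original packet is a full Ethernet frame
PAYLOAD_IP = 2         # original packet is a bare IP packet

def add_overlay_header(vn_id, payload_type, original):
    """Prepend the overlay header to the original packet bytes."""
    return HEADER.pack(payload_type, vn_id) + original

def strip_overlay_header(frame):
    """Return (vn_id, payload_type, original packet bytes)."""
    payload_type, vn_id = HEADER.unpack_from(frame)
    return vn_id, payload_type, frame[HEADER.size:]

framed = add_overlay_header(5001, PAYLOAD_IP, b"\x45\x00")
```

The payload-type field reflects the point that the carried packet can be either an Ethernet frame or just an IP packet.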
A key aspect of overlays is the decoupling of the "virtual" MAC A key aspect of overlays is the decoupling of the "virtual" MAC
and/or IP addresses used by VMs from the physical network and/or IP addresses used by VMs from the physical network
infrastructure and the infrastructure IP addresses used by the data infrastructure and the infrastructure IP addresses used by the data
center. If a VM changes location, the overlay edge devices simply center. If a VM changes location, the overlay edge devices simply
update their mapping tables to reflect the new location of the VM update their mapping tables to reflect the new location of the VM
within the data center's infrastructure space. Because an overlay within the data center's infrastructure space. Because an overlay
network is used, a VM can now be located anywhere in the data center network is used, a VM can now be located anywhere in the data center
that the overlay reaches without regard to traditional constraints that the overlay reaches without regard to traditional constraints
implied by L2 properties such as VLAN numbering, or the span of an L2 implied by L2 properties such as VLAN numbering, or L3 properties such as
broadcast domain scoped to a single pod or access switch. the IP subnet number.
Multi-tenancy is supported by isolating the traffic of one virtual Multi-tenancy is supported by isolating the traffic of one virtual
network instance from traffic of another. Traffic from one virtual network instance from traffic of another. Traffic from one virtual
network instance cannot be delivered to another instance without network instance cannot be delivered to another instance without
(conceptually) exiting the instance and entering the other instance (conceptually) exiting the instance and entering the other instance
via an entity that has connectivity to both virtual network via an entity (e.g., a gateway) that has connectivity to both virtual
instances. Without the existence of this entity, tenant traffic network instances. Without the existence of a gateway entity, tenant
remains isolated within each individual virtual network instance. traffic remains isolated within each individual virtual network
instance.
Overlays are designed to allow a set of VMs to be placed within a Overlays are designed to allow a set of VMs to be placed within a
single virtual network instance, whether that virtual network single virtual network instance, whether that virtual network
provides a bridged network or a routed network. provides a bridged network or a routed network.
3.2. Communication Between Virtual and Traditional Networks 4.2. Communication Between Virtual and Non-virtualized Networks
Not all communication will be between devices connected to Not all communication will be between devices connected to
virtualized networks. Devices using overlays will continue to access virtualized networks. Devices using overlays will continue to access
devices and make use of services on traditional, non-virtualized devices and make use of services on non-virtualized networks, whether
networks, whether in the data center, the public Internet, or at in the data center, the public Internet, or at remote/branch
remote/branch campuses. Any virtual network solution must be capable campuses. Any virtual network solution must be capable of
of interoperating with existing routers, VPN services, load interoperating with existing routers, VPN services, load balancers,
balancers, intrusion detection services, firewalls, etc. on external intrusion detection services, firewalls, etc. on external networks.
networks.
Communication between devices attached to a virtual network and Communication between devices attached to a virtual network and
devices connected to non-virtualized networks is handled devices connected to non-virtualized networks is handled
architecturally by having specialized gateway devices that receive architecturally by having specialized gateway devices that receive
packets from a virtualized network, decapsulate them, process them as packets from a virtualized network, decapsulate them, process them as
regular (i.e., non-virtualized) traffic, and finally forward them on regular (i.e., non-virtualized) traffic, and finally forward them on
to their appropriate destination (and vice versa). Additional to their appropriate destination (and vice versa).
identification, such as VLAN tags, could be used on the non-
virtualized side of such a gateway to enable forwarding of traffic
for multiple virtual networks over a common non-virtualized link.
A wide range of implementation approaches are possible. Overlay A wide range of implementation approaches are possible. Overlay
gateway functionality could be combined with other network gateway functionality could be combined with other network
functionality into a network device that implements the overlay functionality into a network device that implements the overlay
functionality, and then forwards traffic between other internal functionality, and then forwards traffic between other internal
components that implement functionality such as full router service, components that implement functionality such as full router service,
load balancing, firewall support, VPN gateway, etc. load balancing, firewall support, VPN gateway, etc.
3.3. Communication Between Virtual Networks 4.3. Communication Between Virtual Networks
Communication between devices on different virtual networks is Communication between devices on different virtual networks is
handled architecturally by adding specialized interconnect handled architecturally by adding specialized interconnect
functionality among the otherwise isolated virtual networks. For a functionality among the otherwise isolated virtual networks. For a
virtual network providing an L2 service, such interconnect virtual network providing an L2 service, such interconnect
functionality could be IP forwarding configured as part of the functionality could be IP forwarding configured as part of the
"default gateway" for each virtual network. For a virtual network "default gateway" for each virtual network. For a virtual network
providing L3 service, the interconnect functionality could be IP providing L3 service, the interconnect functionality could be IP
forwarding configured as part of routing between IP subnets or it can forwarding configured as part of routing between IP subnets or it can
be based on configured inter-virtual network traffic policies. In be based on configured inter-virtual-network traffic policies. In
both cases, the implementation of the interconnect functionality both cases, the implementation of the interconnect functionality
could be distributed across the NVEs, and could be combined with could be distributed across the NVEs and could be combined with other
other network functionality (e.g., load balancing, firewall support) network functionality (e.g., load balancing, firewall support) that
that is applied to traffic that is forwarded between virtual is applied to traffic forwarded between virtual networks.
networks.
3.4. Overlay Design Characteristics 4.4. Overlay Design Characteristics
There are existing layer 2 and layer 3 overlay protocols in Below are some of the characteristics of environments that must be
existence, but they do not necessarily solve all of today's problems taken into account by the overlay technology.
in the environment of a highly virtualized data center. Below are
some of the characteristics of environments that must be taken into
account by the overlay technology:
1. Highly distributed systems. The overlay should work in an 1. Highly distributed systems: The overlay should work in an
environment where there could be many thousands of access devices environment where there could be many thousands of access
(e.g. residing within the hypervisors) and many more end systems switches (e.g. residing within the hypervisors) and many more
(e.g. VMs) connected to them. This leads to a distributed Tenant Systems (e.g. VMs) connected to them. This leads to a
mapping system that puts a low overhead on the overlay tunnel distributed mapping system that puts a low overhead on the
endpoints. overlay tunnel endpoints.
2. Many highly distributed virtual networks with sparse membership. 2. Many highly distributed virtual networks with sparse membership:
Each virtual network could be highly dispersed inside the data Each virtual network could be highly dispersed inside the data
center. Also, along with the expectation of many virtual networks, center. Also, along with the expectation of many virtual networks,
the number of end systems connected to any one virtual network is the number of end systems connected to any one virtual network is
expected to be relatively low; therefore, the percentage of expected to be relatively low; therefore, the percentage of NVEs
access devices participating in any given virtual network would participating in any given virtual network would also be expected
also be expected to be low. For this reason, efficient delivery to be low. For this reason, efficient delivery of multi-
of multi-destination traffic within a virtual network instance destination traffic within a virtual network instance should be
should be taken into consideration. taken into consideration.
3. Highly dynamic end systems. End systems connected to virtual 3. Highly dynamic Tenant Systems: Tenant Systems connected to
networks can be very dynamic, both in terms of creation/deletion/ virtual networks can be very dynamic, both in terms of creation/
power-on/off and in terms of mobility across the access devices. deletion/power-on/off and in terms of mobility from one access
device to another.
4. Work with existing, widely deployed network Ethernet switches and 4. Be incrementally deployable, without necessarily requiring major
IP routers without requiring wholesale replacement. The first upgrade of the entire network: The first hop device (or end
hop device (or end system) that adds and removes the overlay system) that adds and removes the overlay header may require new
header will require new equipment and/or new software. software and may require new hardware (e.g., for improved
performance). But the rest of the network should not need to
change just to enable the use of overlays.
5. Work with existing data center network deployments without 5. Work with existing data center network deployments without
requiring major changes in operational or other practices. For requiring major changes in operational or other practices: For
example, some data centers have not enabled multicast beyond example, some data centers have not enabled multicast beyond
link-local scope. Overlays should be capable of leveraging link-local scope. Overlays should be capable of leveraging
underlay multicast support where appropriate, but not require its underlay multicast support where appropriate, but not require its
enablement in order to use an overlay solution. enablement in order to use an overlay solution.
6. Network infrastructure administered by a single administrative 6. Network infrastructure administered by a single administrative
domain. This is consistent with operation within a data center, domain: This is consistent with operation within a data center,
and not across the Internet. and not across the Internet.
3.5. Overlay Networking Work Areas 4.5. Control Plane Overlay Networking Work Areas
There are three specific and separate potential work areas needed to There are three specific and separate potential work areas in the
realize an overlay solution. The areas correspond to different area of control plane protocols needed to realize an overlay
possible "on-the-wire" protocols, where distinct entities interact solution. The areas correspond to different possible "on-the-wire"
with each other. protocols, where distinct entities interact with each other.
One area of work concerns the address dissemination protocol an NVE One area of work concerns the address dissemination protocol an NVE
uses to build and maintain the mapping tables it uses to deliver uses to build and maintain the mapping tables it uses to deliver
encapsulated packets to their proper destination. One approach is to encapsulated packets to their proper destination. One approach is to
build mapping tables entirely via learning (as is done in 802.1 build mapping tables entirely via learning (as is done in 802.1
networks). But to provide better scaling properties, a more networks). Another approach is to use a specialized control plane
sophisticated approach is needed, i.e., the use of a specialized protocol. While there are some advantages to using or leveraging an
control plane protocol. While there are some advantages to using or existing protocol for maintaining mapping tables, the fact that large
leveraging an existing protocol for maintaining mapping tables, the numbers of NVEs will likely reside in hypervisors places constraints
fact that large numbers of NVEs will likely reside in hypervisors on the resources (CPU and memory) that can be dedicated to such
places constraints on the resources (CPU and memory) that can be functions.
dedicated to such functions.
From an architectural perspective, one can view the address mapping From an architectural perspective, one can view the address mapping
dissemination problem as having two distinct and separable dissemination problem as having two distinct and separable
components. The first component consists of a back-end "oracle" that components. The first component consists of a back-end "oracle" that
is responsible for distributing and maintaining the mapping is responsible for distributing and maintaining the mapping
information for the entire overlay system. The second component information for the entire overlay system. For this document, we use
consists of the on-the-wire protocols an NVE uses when interacting the term "oracle" in its generic sense, referring to an entity that
with the oracle. supplies answers, without regard to how it knows the answers it is
providing. The second component consists of the on-the-wire
protocols an NVE uses when interacting with the oracle.
The back-end oracle could provide high performance, high resiliency, The back-end oracle could provide high performance, high resiliency,
failover, etc. and could be implemented in significantly different failover, etc. and could be implemented in significantly different
ways. For example, one model uses a traditional, centralized ways. For example, one model uses a traditional, centralized
"directory-based" database, using replicated instances for "directory-based" database, using replicated instances for
reliability and failover. A second model involves using and possibly reliability and failover. A second model involves using and possibly
extending an existing routing protocol (e.g., BGP, IS-IS, etc.). To extending an existing routing protocol (e.g., BGP, IS-IS, etc.). To
support different architectural models, it is useful to have one support different architectural models, it is useful to have one
standard protocol for the NVE-oracle interaction while allowing standard protocol for the NVE-oracle interaction while allowing
different protocols and architectural approaches for the oracle different protocols and architectural approaches for the oracle
itself. Separating the two allows NVEs to transparently interact itself. Separating the two allows NVEs to transparently interact
with different types of oracles, i.e., either of the two with different types of oracles, i.e., either of the two
architectural models described above. Having separate protocols architectural models described above. Having separate protocols
could also allow for a simplified NVE that only interacts with the could also allow for a simplified NVE that only interacts with the
oracle for the mapping table entries it needs and allows the oracle oracle for the mapping table entries it needs and allows the oracle
(and its associated protocols) to evolve independently over time with (and its associated protocols) to evolve independently over time with
minimal impact to the NVEs. minimal impact to the NVEs.
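The separation between the NVE-oracle interaction and the oracle's internals can be sketched with an abstract interface. This is an illustrative assumption only: the method names `resolve` and `update`, the `DirectoryOracle` class, and the caching behavior of `SimpleNVE` are hypothetical and are not defined by the draft.

```python
from abc import ABC, abstractmethod

class Oracle(ABC):
    """Hypothetical NVE-facing interface, independent of the back end."""

    @abstractmethod
    def resolve(self, vn_id, tenant_addr):
        """Return the underlay IP of the NVE serving this address, or None."""

    @abstractmethod
    def update(self, vn_id, tenant_addr, nve_ip):
        """Record that tenant_addr in vn_id is reachable via nve_ip."""

class DirectoryOracle(Oracle):
    """One possible back end: a simple directory-style database."""

    def __init__(self):
        self._db = {}

    def resolve(self, vn_id, tenant_addr):
        return self._db.get((vn_id, tenant_addr))

    def update(self, vn_id, tenant_addr, nve_ip):
        self._db[(vn_id, tenant_addr)] = nve_ip

class SimpleNVE:
    """Asks the oracle only for the mapping entries it actually needs."""

    def __init__(self, oracle):
        self.oracle = oracle
        self.cache = {}

    def next_hop(self, vn_id, tenant_addr):
        key = (vn_id, tenant_addr)
        nve_ip = self.cache.get(key)
        if nve_ip is None:
            nve_ip = self.oracle.resolve(vn_id, tenant_addr)
            if nve_ip is not None:
                self.cache[key] = nve_ip   # cache only successful answers
        return nve_ip

oracle = DirectoryOracle()
oracle.update(7, "10.0.0.5", "192.0.2.20")
nve = SimpleNVE(oracle)
```

A routing-protocol-based back end could implement the same `Oracle` interface, and `SimpleNVE` would not need to change.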
A third work area considers the attachment and detachment of VMs (or A third work area considers the attachment and detachment of VMs (or
Tenant End Systems [I-D.lasserre-nvo3-framework] more generally) from Tenant Systems [I-D.lasserre-nvo3-framework] more generally) from a
a specific virtual network instance. When a VM attaches, the Network specific virtual network instance. When a VM attaches, the NVE
Virtualization Edge (NVE) [I-D.lasserre-nvo3-framework] associates associates the VM with a specific overlay for the purposes of
the VM with a specific overlay for the purposes of tunneling traffic tunneling traffic sourced from or destined to the VM. When a VM
sourced from or destined to the VM. When a VM disconnects, it is disconnects, the NVE should notify the oracle that the Tenant System
removed from the overlay and the NVE effectively terminates any to NVE address mapping is no longer valid. In addition, if this VM
tunnels associated with the VM. To achieve this functionality, a was the last remaining member of the virtual network, then the NVE
standardized interaction between the NVE and hypervisor may be can also terminate any tunnels used to deliver tenant multi-
needed, for example in the case where the NVE resides on a separate destination packets within the VN to the NVE. In the case where an
device from the VM. NVE and hypervisor are on separate physical devices separated by an
access network, a standardized protocol may be needed.
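The attach and detach behavior described above, as worded in the -01 text, can be sketched as follows. The classes `RecordingOracle` and `AttachmentNVE`, their method names, and the representation of multi-destination tunnel state as a set are assumptions made for illustration.

```python
class RecordingOracle:
    """Hypothetical stand-in that records the notifications it receives."""

    def __init__(self):
        self.mappings = {}

    def register(self, vn_id, addr, nve_ip):
        self.mappings[(vn_id, addr)] = nve_ip

    def withdraw(self, vn_id, addr):
        self.mappings.pop((vn_id, addr), None)

class AttachmentNVE:
    def __init__(self, underlay_ip, oracle):
        self.underlay_ip = underlay_ip
        self.oracle = oracle
        self.members = {}                # vn_id -> set of local tenant addresses
        self.multidest_tunnels = set()   # vn_ids with multi-destination state

    def attach(self, vn_id, addr):
        """Associate a VM with a virtual network and announce the mapping."""
        self.members.setdefault(vn_id, set()).add(addr)
        self.multidest_tunnels.add(vn_id)
        self.oracle.register(vn_id, addr, self.underlay_ip)

    def detach(self, vn_id, addr):
        """Withdraw the mapping; drop tunnel state if no members remain."""
        local = self.members.get(vn_id, set())
        local.discard(addr)
        self.oracle.withdraw(vn_id, addr)
        if not local:
            self.members.pop(vn_id, None)
            self.multidest_tunnels.discard(vn_id)

oracle = RecordingOracle()
nve = AttachmentNVE("192.0.2.10", oracle)
nve.attach(7, "10.0.0.5")
nve.attach(7, "10.0.0.6")
nve.detach(7, "10.0.0.5")
still_joined = 7 in nve.multidest_tunnels
nve.detach(7, "10.0.0.6")
```

Tunnel state for a virtual network is released only when the last local member detaches, while the oracle is notified on every detach.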
In summary, there are three areas of potential work. The first area In summary, there are three areas of potential work. The first area
concerns the oracle itself and any on-the-wire protocols it needs. A concerns the implementation of the oracle function itself and any
second area concerns the interaction between the oracle and NVEs. protocols it needs (e.g., if implemented in a distributed fashion).
A second area concerns the interaction between the oracle and NVEs.
The third work area concerns protocols associated with attaching and The third work area concerns protocols associated with attaching and
detaching a VM from a particular virtual network instance. All three detaching a VM from a particular virtual network instance. All three
work areas are important to the development of scalable, work areas are important to the development of scalable,
interoperable solutions. interoperable solutions.
4. Related IETF and IEEE Work 4.6. Data Plane Work Areas
The following subsections discuss related IETF and IEEE work in The data plane carries encapsulated packets for Tenant Systems. The
progress, the items are not meant to be complete coverage of all IETF data plane encapsulation header carries a VN Context identifier
and IEEE data center related work, nor are the descriptions [I-D.lasserre-nvo3-framework] for the virtual network to which the
comprehensive. Each area is currently trying to address certain data packet belongs. Numerous encapsulation or tunneling protocols
limitations of today's data center networks, e.g., scaling is a already exist that can be leveraged. In the absence of strong and
common issue for every area listed and multi-tenancy and VM mobility compelling justification, it would not seem necessary or helpful to
are important focus areas as well. Comparing and evaluating the work develop yet another encapsulation format just for NVO3.
result and progress of each work area listed is out of scope of this
document. The intent of this section is to provide a reference to
the interested readers.
4.1. L3 BGP/MPLS IP VPNs 5. Related IETF and IEEE Work
BGP/MPLS IP VPNs [RFC4364] support multi-tenancy address overlapping, The following subsections discuss related IETF and IEEE work. The
VPN traffic isolation, and address separation between tenants and items are not meant to provide complete coverage of all IETF and IEEE
network infrastructure. The BGP/MPLS control plane is used to data center related work, nor should the descriptions be considered
distribute the VPN labels and the tenant IP addresses which identify comprehensive. Each area aims to address particular limitations of
the tenants (or to be more specific, the particular VPN/VN) and today's data center networks. In all areas, scaling is a common
tenant IP addresses. Deployment of enterprise L3 VPNs has been shown theme as are multi-tenancy and VM mobility. Comparing and evaluating
to scale to thousands of VPNs and millions of VPN prefixes. BGP/MPLS the work result and progress of each work area listed is out of scope
IP VPNs are currently deployed in some large enterprise data centers. of this document. The intent of this section is to provide a
The potential limitation for deploying BGP/MPLS IP VPNs in data reference to the interested readers. Note that NVO3 is scoped to
center environments is the practicality of using BGP in the data running over an IP/L3 underlay network.
center, especially reaching into the servers or hypervisors. There
may be computing work force skill set issues, equipment support 5.1. BGP/MPLS IP VPNs
issues, and potential new scaling challenges. A combination of BGP
and lighter weight IP signaling protocols, e.g., XMPP, has been BGP/MPLS IP VPNs [RFC4364] support multi-tenancy, VPN traffic
proposed to extend the solutions into the DC environment isolation, address overlapping and address separation between tenants
[I-D.marques-l3vpn-end-system], while taking advantage of building in and network infrastructure. The BGP/MPLS control plane is used to
distribute the VPN labels and the tenant IP addresses that identify
the tenants (or to be more specific, the particular VPN/virtual
network) and tenant IP addresses. Deployment of enterprise L3 VPNs
has been shown to scale to thousands of VPNs and millions of VPN
prefixes. BGP/MPLS IP VPNs are currently deployed in some large
enterprise data centers. The potential limitation for deploying BGP/
MPLS IP VPNs in data center environments is the practicality of using
BGP in the data center, especially reaching into the servers or
hypervisors. There may be computing work force skill set issues,
equipment support issues, and potential new scaling challenges. A
combination of BGP and lighter weight IP signaling protocols, e.g.,
XMPP, has been proposed to extend the solutions into the DC environment
[I-D.marques-l3vpn-end-system], while taking advantage of built-in
VPN features with its rich policy support; it is especially useful VPN features with its rich policy support; it is especially useful
for inter-tenant connectivity. for inter-tenant connectivity.
4.2. L2 BGP/MPLS IP VPNs 5.2. BGP/MPLS Ethernet VPNs
Ethernet Virtual Private Networks (E-VPNs) [I-D.ietf-l2vpn-evpn] Ethernet Virtual Private Networks (E-VPNs) [I-D.ietf-l2vpn-evpn]
provide an emulated L2 service in which each tenant has its own provide an emulated L2 service in which each tenant has its own
Ethernet network over a common IP or MPLS infrastructure and a BGP/ Ethernet network over a common IP or MPLS infrastructure. A BGP/MPLS
MPLS control plane is used to distribute the tenant MAC addresses and control plane is used to distribute the tenant MAC addresses and the
the MPLS labels that identify the tenants and tenant MAC addresses. MPLS labels that identify the tenants and tenant MAC addresses.
Within the BGP/MPLS control plane a thirty two bit Ethernet Tag is Within the BGP/MPLS control plane a thirty two bit Ethernet Tag is
used to identify the broadcast domains (VLANs) associated with a used to identify the broadcast domains (VLANs) associated with a
given L2 VLAN service instance and these Ethernet tags are mapped to given L2 VLAN service instance and these Ethernet tags are mapped to
VLAN IDs understood by the tenant at the service edges. This means VLAN IDs understood by the tenant at the service edges. This means
that the limit of 4096 VLANs is associated with an individual tenant that the limit of 4096 VLANs is associated with an individual tenant
service edge, enabling a much higher level of scalability. service edge, enabling a much higher level of scalability.
Interconnection between tenants is also allowed in a controlled Interconnection between tenants is also allowed in a controlled
fashion. fashion.
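
The per-edge scoping of VLAN IDs described above can be illustrated with a minimal sketch. It assumes each service edge keeps its own table from 32-bit Ethernet Tags to tenant-facing 12-bit VLAN IDs; the class and method names (ServiceEdge, bind, vlan_for) are hypothetical and are not taken from [I-D.ietf-l2vpn-evpn].

```python
class ServiceEdge:
    """Hypothetical per-edge table from 32-bit Ethernet Tags to local VLAN IDs."""

    MAX_TAG = (1 << 32) - 1   # Ethernet Tag is thirty-two bits wide
    VLAN_ID_SPACE = 1 << 12   # 4096 VLAN IDs, scoped to this edge only

    def __init__(self, name: str) -> None:
        self.name = name
        self._tag_to_vlan: dict[int, int] = {}

    def bind(self, ethernet_tag: int, vlan_id: int) -> None:
        """Map a network-wide Ethernet Tag to a VLAN ID used at this edge."""
        if not 0 <= ethernet_tag <= self.MAX_TAG:
            raise ValueError("Ethernet Tag must fit in 32 bits")
        if not 0 <= vlan_id < self.VLAN_ID_SPACE:
            raise ValueError("VLAN ID must fit in 12 bits")
        if vlan_id in self._tag_to_vlan.values():
            raise ValueError("VLAN ID already in use on this edge")
        self._tag_to_vlan[ethernet_tag] = vlan_id

    def vlan_for(self, ethernet_tag: int) -> int:
        """Return the local VLAN ID for a given Ethernet Tag."""
        return self._tag_to_vlan[ethernet_tag]


if __name__ == "__main__":
    edge_a = ServiceEdge("edge-a")
    edge_b = ServiceEdge("edge-b")
    # The same VLAN ID (100) is reused at two edges for different
    # broadcast domains, because the 4096 limit applies per edge.
    edge_a.bind(ethernet_tag=70000, vlan_id=100)
    edge_b.bind(ethernet_tag=90000, vlan_id=100)
    print(edge_a.vlan_for(70000), edge_b.vlan_for(90000))
```

Because the table is kept per edge, the number of broadcast domains in the network as a whole is bounded by the Ethernet Tag space rather than by the 12-bit VLAN ID.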
VM Mobility [I-D.raggarwa-data-center-mobility] introduces the VM Mobility [I-D.raggarwa-data-center-mobility] introduces the
concept of a combined L2/L3 VPN service in order to support the concept of a combined L2/L3 VPN service in order to support the
mobility of individual Virtual Machines (VMs) between Data Centers mobility of individual Virtual Machines (VMs) between Data Centers
connected over a common IP or MPLS infrastructure. connected over a common IP or MPLS infrastructure.
4.3. IEEE 802.1aq - Shortest Path Bridging 5.3. 802.1 VLANs
Shortest Path Bridging (SPB-M) is an IS-IS based overlay for L2 VLANs are a well understood construct in the networking industry,
Ethernets. SPB-M supports multi-pathing and addresses a number of providing an L2 service via an in-band L2 Virtual Network. A VLAN is
shortcoming in the original Ethernet Spanning Tree Protocol. SPB-M an L2 bridging construct that provides the semantics of virtual
uses IEEE 802.1ah MAC-in-MAC encapsulation and supports a 24-bit networks mentioned above: a MAC address can be kept unique within a
I-SID, which can be used to identify virtual network instances. VLAN, but it is not necessarily unique across VLANs. Traffic scoped
SPB-M is entirely L2 based, extending the L2 Ethernet bridging model. within a VLAN (including broadcast and multicast traffic) can be kept
within the VLAN it originates from. Traffic forwarded from one VLAN
to another typically involves router (L3) processing. The forwarding
table look up operation may be keyed on {VLAN, MAC address} tuples.
4.4. ARMD VLANs are a pure L2 bridging construct and VLAN identifiers are
carried along with data frames to allow each forwarding point to know
what VLAN the frame belongs to. Various types of VLANs are available
today, which can be used for network virtualization, even in combination.
The C-VLAN, S-VLAN and B-VLAN IDs are 12 bits. The 24-bit I-SID
allows the support of more than 16 million virtual networks.
Altogether, 60 bits are available for network virtualization in the
Ethernet header today.
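
The identifier arithmetic above can be checked with a short sketch. The constant and function names are illustrative assumptions; the only inputs are the field widths quoted in the text.

```python
# Field widths in bits, as quoted in the text above.
VLAN_ID_BITS = {"C-VLAN": 12, "S-VLAN": 12, "B-VLAN": 12}
I_SID_BITS = 24


def id_space(bits: int) -> int:
    """Number of distinct values an identifier of the given width can hold."""
    return 1 << bits


def total_virtualization_bits() -> int:
    """Sum of the three 12-bit VLAN IDs and the 24-bit I-SID."""
    return sum(VLAN_ID_BITS.values()) + I_SID_BITS


if __name__ == "__main__":
    print(id_space(12))                 # 4096
    print(id_space(I_SID_BITS))         # 16777216
    print(total_virtualization_bits())  # 60
```

The 24-bit result, 16,777,216, is the "more than 16 million" figure, and 12 + 12 + 12 + 24 gives the 60 bits cited.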
ARMD is chartered to look at data center scaling issues with a focus 5.4. IEEE 802.1aq - Shortest Path Bridging
on address resolution. ARMD is currently chartered to develop a
problem statement and is not currently developing solutions. While
an overlay-based approach may address some of the "pain points" that
have been raised in ARMD (e.g., better support for multi-tenancy), an
overlay approach may also push some of the L2 scaling concerns (e.g.,
excessive flooding) to the IP level (flooding via IP multicast).
Analysis will be needed to understand the scaling tradeoffs of an
overlay based approach compared with existing approaches. On the
other hand, existing IP-based approaches such as proxy ARP may help
mitigate some concerns.
4.5. TRILL Shortest Path Bridging (SPBM) [SPBM] is an IS-IS based overlay that
operates over L2 Ethernets. SPBM supports multi-pathing and
addresses a number of shortcomings in the original Ethernet Spanning
Tree Protocol. SPBM uses IEEE 802.1ah PBB (MAC-in-MAC) encapsulation
and supports a 24-bit I-SID, which can be used to identify virtual
network instances. SPBM provides multi-pathing and supports easy
virtual network creation or update.
TRILL is an L2-based approach aimed at improving deficiencies and SPB extends IS-IS in order to perform link-state routing among core
limitations with current Ethernet networks and STP in particular. SPBM nodes, obviating the need for learning for communication among
Although it differs from Shortest Path Bridging in many architectural core SPBM nodes. Learning is still used to build and maintain the
and implementation details, it is similar in that is provides an L2- mapping tables of edge nodes to encapsulate Tenant System traffic for
based service to end systems. TRILL as defined today, supports only transport across the SPBM core.
the standard (and limited) 12-bit VLAN model. Approaches to extend
TRILL to support more than 4094 VLANs are currently under
investigation [I-D.ietf-trill-fine-labeling]
4.6. L2VPNs SPB is compatible with all other 802.1 standards and thus allows
leveraging of other features, e.g., VSI Discovery Protocol (VDP), OAM
or scalability solutions.
5.5. ARMD
The ARMD WG examined data center scaling issues with a focus on
address resolution and developed a problem statement document
[I-D.ietf-armd-problem-statement]. While an overlay-based approach
may address some of the "pain points" that were raised in ARMD (e.g.,
better support for multi-tenancy), an overlay approach may also push
some of the L2 scaling concerns (e.g., excessive flooding) to the IP
level (flooding via IP multicast). Analysis will be needed to
understand the scaling tradeoffs of an overlay based approach
compared with existing approaches. On the other hand, existing IP-
based approaches such as proxy ARP may help mitigate some concerns.
5.6. TRILL
TRILL is a network protocol that provides an Ethernet L2 service to
end systems and is designed to operate over any L2 link type. TRILL
establishes forwarding paths using IS-IS routing and encapsulates
traffic within its own TRILL header. TRILL, as defined today,
supports only the standard (and limited) 12-bit C-VID identifier.
Approaches to extend TRILL to support more than 4094 VLANs are
currently under investigation [I-D.ietf-trill-fine-labeling].
5.7. L2VPNs
The IETF has specified a number of approaches for connecting L2 The IETF has specified a number of approaches for connecting L2
domains together as part of the L2VPN Working Group. That group, domains together as part of the L2VPN Working Group. That group,
however, has historically been focused on Provider-provisioned L2 however, has historically been focused on Provider-provisioned L2
VPNs, where the service provider participates in management and VPNs, where the service provider participates in management and
provisioning of the VPN. In addition, much of the target environment provisioning of the VPN. In addition, much of the target environment
for such deployments involves carrying L2 traffic over WANs. Overlay for such deployments involves carrying L2 traffic over WANs. Overlay
approaches are intended to be used within data centers where the overlay approaches as discussed in this document are intended to be used within
network is managed by the data center operator, rather than by an data centers where the overlay network is managed by the data center
outside party. While overlays can run across the Internet as well, operator, rather than by an outside party. While overlays can run
they will extend well into the data center itself (e.g., up to and across the Internet as well, they will extend well into the data
including hypervisors) and include large numbers of machines within center itself (e.g., up to and including hypervisors) and include
the data center itself. large numbers of machines within the data center itself.
Other L2VPN approaches, such as L2TP [RFC2661] require significant Other L2VPN approaches, such as L2TP [RFC3931] require significant
tunnel state at the encapsulating and decapsulating end points. tunnel state at the encapsulating and decapsulating end points.
Overlays require less tunnel state than other approaches, which is Overlays require less tunnel state than other approaches, which is
important to allow overlays to scale to hundreds of thousands of end important to allow overlays to scale to hundreds of thousands of end
points. It is assumed that smaller switches (i.e., virtual switches points. It is assumed that smaller switches (i.e., virtual switches
in hypervisors or the adjacent devices to which VMs connect) will be in hypervisors or the adjacent devices to which VMs connect) will be
part of the overlay network and be responsible for encapsulating and part of the overlay network and be responsible for encapsulating and
decapsulating packets. decapsulating packets.
4.7. Proxy Mobile IP 5.8. Proxy Mobile IP
Proxy Mobile IP [RFC5213] [RFC5844] makes use of the GRE Key Field Proxy Mobile IP [RFC5213] [RFC5844] makes use of the GRE Key Field
[RFC5845] [RFC6245], but not in a way that supports multi-tenancy. [RFC5845] [RFC6245], but not in a way that supports multi-tenancy.
4.8. LISP 5.9. LISP
LISP [I-D.ietf-lisp] essentially provides an IP over IP overlay where LISP [I-D.ietf-lisp] essentially provides an IP over IP overlay where
the internal addresses are end station Identifiers and the outer IP the internal addresses are end station Identifiers and the outer IP
addresses represent the location of the end station within the core addresses represent the location of the end station within the core
IP network topology. The LISP overlay header uses a 24-bit Instance IP network topology. The LISP overlay header uses a 24-bit Instance
ID used to support overlapping inner IP addresses. ID used to support overlapping inner IP addresses.
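
How an Instance ID allows overlapping inner addresses can be sketched as a lookup keyed on the pair (Instance ID, inner address). This is a minimal illustration under stated assumptions: the MappingTable class and its methods are hypothetical, not part of LISP, and the addresses are arbitrary example values.

```python
import ipaddress
from typing import Dict, Optional, Tuple, Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]

INSTANCE_ID_BITS = 24  # width of the Instance ID quoted in the text


class MappingTable:
    """Hypothetical map from (Instance ID, inner address) to outer locator."""

    def __init__(self) -> None:
        self._map: Dict[Tuple[int, IPAddress], IPAddress] = {}

    def register(self, instance_id: int, inner: str, outer: str) -> None:
        """Record that an inner address in one instance sits behind a locator."""
        if not 0 <= instance_id < (1 << INSTANCE_ID_BITS):
            raise ValueError("Instance ID must fit in 24 bits")
        key = (instance_id, ipaddress.ip_address(inner))
        self._map[key] = ipaddress.ip_address(outer)

    def locate(self, instance_id: int, inner: str) -> Optional[IPAddress]:
        """Return the outer locator, or None if no mapping is known."""
        return self._map.get((instance_id, ipaddress.ip_address(inner)))


if __name__ == "__main__":
    table = MappingTable()
    # Two tenants use the same inner address; the Instance ID disambiguates.
    table.register(1, "10.0.0.5", "192.0.2.1")
    table.register(2, "10.0.0.5", "192.0.2.2")
    print(table.locate(1, "10.0.0.5"))
    print(table.locate(2, "10.0.0.5"))
```

Keying on the pair rather than on the inner address alone is what keeps the two tenants' identical addresses from colliding.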
5. Further Work 5.10. VDP
VDP is the Virtual Station Interface (VSI) Discovery and
Configuration Protocol specified by IEEE P802.1Qbg [Qbg]. VDP is a
protocol that supports the association of a VSI with a port. VDP is
run between the end system (e.g., a hypervisor) and its adjacent
switch, i.e., the device on the edge of the network. VDP is used,
for example, to communicate to the switch that a Virtual Machine
(Virtual Station) is moving, i.e., it is designed to support VM
migration.
6. Further Work
It is believed that overlay-based approaches may be able to reduce It is believed that overlay-based approaches may be able to reduce
the overall amount of flooding and other multicast and broadcast the overall amount of flooding and other multicast and broadcast
related traffic (e.g., ARP and ND) currently experienced within related traffic (e.g., ARP and ND) currently experienced within
current data centers with a large flat L2 network. Further analysis current data centers with a large flat L2 network. Further analysis
is needed to characterize expected improvements. is needed to characterize expected improvements.
There are a number of VPN approaches that provide some if not all of There are a number of VPN approaches that provide some if not all of
the desired semantics of virtual networks. A gap analysis will be the desired semantics of virtual networks. A gap analysis will be
needed to assess how well existing approaches satisfy the needed to assess how well existing approaches satisfy the
requirements. requirements.
6. Summary 7. Summary
This document has argued that network virtualization using overlays This document has argued that network virtualization using overlays
addresses a number of issues being faced as data centers scale in addresses a number of issues being faced as data centers scale in
size. In addition, careful study of current data center problems is size. In addition, careful study of current data center problems is
needed for development of proper requirements and standard solutions. needed for development of proper requirements and standard solutions.
Three potential work areas were identified. The first involves the This document identified three potential control protocol work areas.
interactions that take place when a VM attaches or detaches from an The first involves a backend "oracle" and how it learns and
overlay. A second involves the protocol an NVE would use to distributes the mapping information NVEs use when processing tenant
communicate with a backend "oracle" to learn and disseminate mapping traffic. A second involves the protocol an NVE would use to
information about the VMs the NVE communicates with. The third communicate with the backend oracle to obtain the mapping
potential work area involves the backend oracle itself, i.e., how it information. The third potential work area concerns the interactions that
provides failover and how it interacts with oracles in other domains. take place when a VM attaches or detaches from a specific virtual
network instance.
7. Acknowledgments 8. Acknowledgments
Helpful comments and improvements to this document have come from Helpful comments and improvements to this document have come from
John Drake, Ariel Hendel, Vinit Jain, Thomas Morin, Benson Schliesser John Drake, Janos Farkas, Ilango Ganga, Ariel Hendel, Vinit Jain,
and many others on the mailing list. Petr Lapukhov, Thomas Morin, Benson Schliesser, Lucy Yong and many
others on the NVO3 mailing list.
8. IANA Considerations 9. IANA Considerations
This memo includes no request to IANA. This memo includes no request to IANA.
9. Security Considerations 10. Security Considerations
TBD Because this document describes the problem space associated with the
need for virtualization of networks in complex, large-scale, data-
center networks, it does not itself introduce any security risks.
However, it is clear that security concerns need to be a
consideration of any solutions proposed to address this problem
space.
10. Informative References 11. Informative References
[I-D.ietf-armd-problem-statement]
Narten, T., Karir, M., and I. Foo, "Address Resolution
Problems in Large Data Center Networks",
draft-ietf-armd-problem-statement-04 (work in progress),
October 2012.
[I-D.ietf-l2vpn-evpn] [I-D.ietf-l2vpn-evpn]
Sajassi, A., Aggarwal, R., Henderickx, W., Balus, F., Sajassi, A., Aggarwal, R., Henderickx, W., Balus, F.,
Isaac, A., and J. Uttaro, "BGP MPLS Based Ethernet VPN", Isaac, A., and J. Uttaro, "BGP MPLS Based Ethernet VPN",
draft-ietf-l2vpn-evpn-01 (work in progress), July 2012. draft-ietf-l2vpn-evpn-01 (work in progress), July 2012.
[I-D.ietf-lisp] [I-D.ietf-lisp]
Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, Farinacci, D., Fuller, V., Meyer, D., and D. Lewis,
"Locator/ID Separation Protocol (LISP)", "Locator/ID Separation Protocol (LISP)",
draft-ietf-lisp-23 (work in progress), May 2012. draft-ietf-lisp-23 (work in progress), May 2012.
[I-D.ietf-trill-fine-labeling] [I-D.ietf-trill-fine-labeling]
Eastlake, D., Zhang, M., Agarwal, P., Perlman, R., and D. Eastlake, D., Zhang, M., Agarwal, P., Perlman, R., and D.
Dutt, "TRILL: Fine-Grained Labeling", Dutt, "TRILL: Fine-Grained Labeling",
draft-ietf-trill-fine-labeling-01 (work in progress), draft-ietf-trill-fine-labeling-02 (work in progress),
June 2012. October 2012.
[I-D.lasserre-nvo3-framework] [I-D.lasserre-nvo3-framework]
Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y. Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y.
Rekhter, "Framework for DC Network Virtualization", Rekhter, "Framework for DC Network Virtualization",
draft-lasserre-nvo3-framework-03 (work in progress), draft-lasserre-nvo3-framework-03 (work in progress),
July 2012. July 2012.
[I-D.marques-l3vpn-end-system] [I-D.marques-l3vpn-end-system]
Marques, P., Fang, L., Pan, P., Shukla, A., Napierala, M., Marques, P., Fang, L., Pan, P., Shukla, A., Napierala, M.,
and N. Bitar, "BGP-signaled end-system IP/VPNs.", and N. Bitar, "BGP-signaled end-system IP/VPNs.",
draft-marques-l3vpn-end-system-07 (work in progress), draft-marques-l3vpn-end-system-07 (work in progress),
August 2012. August 2012.
[I-D.raggarwa-data-center-mobility] [I-D.raggarwa-data-center-mobility]
Aggarwal, R., Rekhter, Y., Henderickx, W., Shekhar, R., Aggarwal, R., Rekhter, Y., Henderickx, W., Shekhar, R.,
and L. Fang, "Data Center Mobility based on BGP/MPLS, IP and L. Fang, "Data Center Mobility based on BGP/MPLS, IP
Routing and NHRP", draft-raggarwa-data-center-mobility-03 Routing and NHRP", draft-raggarwa-data-center-mobility-03
(work in progress), June 2012. (work in progress), June 2012.
[RFC2661] Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn, [Qbg] "IEEE P802.1Qbg Edge Virtual Bridging", February 2012.
G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"",
RFC 2661, August 1999. [RFC3931] Lau, J., Townsley, M., and I. Goyret, "Layer Two Tunneling
Protocol - Version 3 (L2TPv3)", RFC 3931, March 2005.
[RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
Networks (VPNs)", RFC 4364, February 2006. Networks (VPNs)", RFC 4364, February 2006.
[RFC5213] Gundavelli, S., Leung, K., Devarapalli, V., Chowdhury, K., [RFC5213] Gundavelli, S., Leung, K., Devarapalli, V., Chowdhury, K.,
and B. Patil, "Proxy Mobile IPv6", RFC 5213, August 2008. and B. Patil, "Proxy Mobile IPv6", RFC 5213, August 2008.
[RFC5844] Wakikawa, R. and S. Gundavelli, "IPv4 Support for Proxy [RFC5844] Wakikawa, R. and S. Gundavelli, "IPv4 Support for Proxy
Mobile IPv6", RFC 5844, May 2010. Mobile IPv6", RFC 5844, May 2010.
skipping to change at page 19, line 12 skipping to change at page 20, line 46
Ghanwani, "Routing Bridges (RBridges): Base Protocol Ghanwani, "Routing Bridges (RBridges): Base Protocol
Specification", RFC 6325, July 2011. Specification", RFC 6325, July 2011.
[SPBM] "IEEE P802.1aq/D4.5 Draft Standard for Local and [SPBM] "IEEE P802.1aq/D4.5 Draft Standard for Local and
Metropolitan Area Networks -- Media Access Control (MAC) Metropolitan Area Networks -- Media Access Control (MAC)
Bridges and Virtual Bridged Local Area Networks, Bridges and Virtual Bridged Local Area Networks,
Amendment 8: Shortest Path Bridging", February 2012. Amendment 8: Shortest Path Bridging", February 2012.
Appendix A. Change Log Appendix A. Change Log
A.1. Changes from draft-narten-nvo3-overlay-problem-statement-04.txt A.1. Changes From -00 to -01
1. Numerous editorial and clarity improvements.
2. Picked up updated terminology from the framework document (e.g.,
Tenant System).
3. Significant changes regarding IEEE 802.1 Ethernets and VLANs.
All text moved to the Related Work section, where the technology
is summarized.
4. Removed section on Forwarding Table Size limitations. This issue
only occurs in some deployments with L2 bridging, and is not
considered a motivating factor for the NVO3 work.
5. Added paragraph in Introduction that makes clear that NVO3 is
focused on providing both L2 and L3 service to end systems, and
that IP is assumed as the underlay transport in the data center.
6. Added new section (2.6) on Optimal Forwarding.
7. Added a section on Data Plane issues.
8. Significant improvement to Section describing SPBM.
9. Added sub-section on VDP in "Related Work"
A.2. Changes from draft-narten-nvo3-overlay-problem-statement-04.txt
1. This document has only one substantive change relative to 1. This document has only one substantive change relative to
draft-narten-nvo3-overlay-problem-statement-04.txt. Two draft-narten-nvo3-overlay-problem-statement-04.txt. Two
sentences were removed per the discussion that led to WG adoption sentences were removed per the discussion that led to WG adoption
of this document. of this document.
Authors' Addresses Authors' Addresses
Thomas Narten (editor) Thomas Narten (editor)
IBM IBM
Email: narten@us.ibm.com Email: narten@us.ibm.com
Eric Gray (editor)
Ericsson
Email: eric.gray@ericsson.com
David Black David Black
EMC EMC
Email: david.black@emc.com Email: david.black@emc.com
Dinesh Dutt Dinesh Dutt
Email: ddutt.ietf@hobbesdutt.com Email: ddutt.ietf@hobbesdutt.com
Luyuan Fang Luyuan Fang
Cisco Systems Cisco Systems
111 Wood Avenue South 111 Wood Avenue South
Iselin, NJ 08830 Iselin, NJ 08830
USA USA
Email: lufang@cisco.com Email: lufang@cisco.com
Eric Gray (editor)
Ericsson
Email: eric.gray@ericsson.com
Lawrence Kreeger Lawrence Kreeger
Cisco Cisco
Email: kreeger@cisco.com Email: kreeger@cisco.com
Maria Napierala Maria Napierala
AT&T AT&T
200 Laurel Avenue 200 Laurel Avenue
Middletown, NJ 07748 Middletown, NJ 07748
 End of changes. 98 change blocks. 
355 lines changed or deleted 473 lines changed or added
