Internet Engineering Task Force                          T. Narten, Ed.
Internet-Draft                                                       IBM
Intended status: Informational                                  D. Black
Expires: February 11, 2013                                           EMC
                                                                 D. Dutt
                                                                 L. Fang
                                                           Cisco Systems
                                                                 E. Gray
                                                                Ericsson
                                                              L. Kreeger
                                                                   Cisco
                                                            M. Napierala
                                                                    AT&T
                                                            M. Sridharan
                                                               Microsoft
                                                         August 10, 2012
        Problem Statement: Overlays for Network Virtualization
           draft-narten-nvo3-overlay-problem-statement-04
Abstract
This document describes issues associated with providing multi-tenancy
in large data center networks and how these issues can be addressed
using an overlay-based network virtualization approach. A key
multi-tenancy requirement is traffic isolation, so that a tenant's
traffic is not visible to any other tenant. This isolation can be
achieved by assigning one or more virtual networks to each tenant such
that traffic within a virtual network is isolated from traffic in
other virtual networks. The primary functionality required is
provisioning virtual networks, associating a virtual machine's virtual
network interface(s) with the appropriate virtual network, and
maintaining that association as the virtual machine is activated,
migrated and/or deactivated. Use of an overlay-based approach enables
scalable deployment on large network infrastructures.
Status of this Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF). Note that other groups may also distribute working
documents as Internet-Drafts. The list of current Internet-Drafts is
at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

This Internet-Draft will expire on February 11, 2013.
Copyright Notice

Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents carefully,
as they describe your rights and restrictions with respect to this
document. Code Components extracted from this document must include
Simplified BSD License text as described in Section 4.e of the Trust
Legal Provisions and are provided without warranty as described in the
Simplified BSD License.
Table of Contents

1. Introduction
2. Problem Areas
   2.1. Need For Dynamic Provisioning
   2.2. Virtual Machine Mobility Limitations
   2.3. Inadequate Forwarding Table Sizes in Switches
   2.4. Need to Decouple Logical and Physical Configuration
   2.5. Need For Address Separation Between Tenants
   2.6. Need For Address Separation Between Tenant and Infrastructure
   2.7. IEEE 802.1 VLAN Limitations
3. Network Overlays
   3.1. Benefits of Network Overlays
   3.2. Communication Between Virtual and Traditional Networks
   3.3. Communication Between Virtual Networks
   3.4. Overlay Design Characteristics
   3.5. Overlay Networking Work Areas
4. Related IETF and IEEE Work
   4.1. L3 BGP/MPLS IP VPNs
   4.2. L2 BGP/MPLS IP VPNs
   4.3. IEEE 802.1aq - Shortest Path Bridging
   4.4. ARMD
   4.5. TRILL
   4.6. L2VPNs
   4.7. Proxy Mobile IP
   4.8. LISP
5. Further Work
6. Summary
7. Acknowledgments
8. IANA Considerations
9. Security Considerations
10. Informative References
Appendix A. Change Log
   A.1. Changes from -01
   A.2. Changes from -02
   A.3. Changes from -03
Authors' Addresses
1. Introduction
Data centers are increasingly being consolidated and outsourced in an
effort both to improve the deployment time of applications and to
reduce operational costs. This coincides with an increasing demand for
compute, storage, and network resources from applications. In order to
scale compute, storage, and network resources, physical resources are
being abstracted from their logical representation, in what is
referred to as server, storage, and network virtualization.
Virtualization can be implemented in various layers of computer
systems or networks.

The demand for server virtualization is increasing in data centers.
With server virtualization, each physical server supports multiple
virtual machines (VMs), each running its own operating system,
middleware and applications. Virtualization is a key enabler of
workload agility, i.e., allowing any server to host any application
and providing the flexibility of adding, shrinking, or moving services
within the physical infrastructure. Server virtualization provides
numerous benefits, including higher utilization, increased security,
reduced user downtime, reduced power usage, etc.

Multi-tenant data centers are taking advantage of the benefits of
server virtualization to provide a new kind of hosting, a virtual
hosted data center. Multi-tenant data centers are ones where
individual tenants could belong to a different company (in the case of
a public provider) or a different department (in the case of an
internal company data center). Each tenant has the expectation of a
level of security and privacy separating their resources from those of
other tenants. For example, one tenant's traffic must never be exposed
to another tenant, except through carefully controlled interfaces,
such as a security gateway.
To a tenant, virtual data centers are similar to their physical
counterparts, consisting of end stations attached to a network,
complete with services such as load balancers and firewalls. But
unlike a physical data center, end stations connect to a virtual
network. To end stations, a virtual network looks like a normal
network (e.g., providing an Ethernet or L3 service), except that the
only end stations connected to the virtual network are those belonging
to a tenant's specific virtual network.
A tenant is the administrative entity that is responsible for and
manages a specific virtual network instance and its associated
services (whether virtual or physical). In a cloud environment, a
tenant would correspond to the customer that has defined and is using
a particular virtual network. However, a tenant may also find it
useful to create multiple different virtual network instances. Hence,
there is a one-to-many mapping between tenants and virtual network
instances. A single tenant may operate multiple individual virtual
network instances, each associated with a different service.
How a virtual network is implemented does not generally matter to the
tenant; what matters is that the service provided (L2 or L3) has the
right semantics, performance, etc. It could be implemented via a pure
routed network, a pure bridged network or a combination of bridged and
routed networks. A key requirement is that each individual virtual
network instance be isolated from other virtual network instances.
For data center virtualization, two key issues must be addressed.
First, address space separation between tenants must be supported.
Second, it must be possible to place (and migrate) VMs anywhere in
the data center, without restricting VM addressing to match the
subnet boundaries of the underlying data center network.
This document outlines the problems encountered in scaling the number
of isolated networks in a data center, as well as the problems of
managing the creation/deletion, membership, and span of these
networks, and makes the case that an overlay-based approach, where
individual networks are implemented within individual virtual networks
that are dynamically controlled by a standardized control plane,
provides a number of advantages over current approaches. The purpose
of this document is to identify the set of problems that any solution
has to address in building multi-tenant data centers. The goal is to
enable the construction of standardized, interoperable implementations
of multi-tenant data centers.
Section 2 describes the problem space details. Section 3 describes
overlay networks in more detail. Sections 4 and 5 review related and
further work, while Section 6 closes with a summary.
2. Problem Areas
The following subsections describe aspects of multi-tenant data center
networking that pose problems for network infrastructure. Different
problem aspects may arise based on the network architecture and scale.
2.1. Need For Dynamic Provisioning
Cloud computing involves on-demand provisioning of resources for
multi-tenant environments. A common example of cloud computing is the
public cloud, where a cloud service provider offers elastic services
to multiple customers over the same infrastructure. In current
systems, it can be difficult to provision resources for individual
tenants in such a way that provisioned properties migrate
automatically when services are dynamically moved around within the
data center to optimize workloads.
2.2. Virtual Machine Mobility Limitations
A key benefit of server virtualization is virtual machine (VM)
mobility. A VM can be migrated from one server to another, live, i.e.,
while continuing to run and without needing to shut it down and
restart it at the new location. A key requirement for live migration
is that a VM retain critical network state at its new location,
including its IP and MAC address(es). Preservation of MAC addresses
may be necessary, for example, when software licenses are bound to MAC
addresses. More generally, any change in the VM's MAC addresses
resulting from a move would be visible to the VM and thus potentially
result in unexpected disruptions. Retaining IP addresses after a move
is necessary to prevent existing transport connections (e.g., TCP)
from breaking and needing to be restarted.
In traditional data centers, servers are assigned IP addresses based
on their physical location, for example based on the Top of Rack (ToR)
switch for the server rack or the VLAN configured to the server.
Servers can only move to other locations within the same IP subnet.
This constraint is not problematic for physical servers, which move
infrequently, but it restricts the placement and movement of VMs
within the data center. Any solution for a scalable multi-tenant data
center must allow a VM to be placed (or moved) anywhere within the
data center, without being constrained by the subnet boundary concerns
of the host servers.
2.3. Inadequate Forwarding Table Sizes in Switches

Today's virtualized environments place additional demands on the
forwarding tables of switches in the physical infrastructure. Instead
of just one link-layer address per server, the switching
infrastructure has to learn addresses of the individual VMs (which
could range in the 100s per server). This is a requirement since
traffic to/from the VMs and the rest of the physical network will
traverse the physical network infrastructure. This places a much
larger demand on the switches' forwarding table capacity compared to
non-virtualized environments, causing more traffic to be flooded or
dropped when the number of addresses in use exceeds a switch's
forwarding table capacity.
2.4. Need to Decouple Logical and Physical Configuration

Data center operators must be able to achieve high utilization of
server and network capacity. For efficient and flexible allocation,
operators should be able to spread a virtual network instance across
servers in any rack in the data center. It should also be possible to
migrate compute workloads to any server anywhere in the network while
retaining the workload's addresses. In networks using VLANs, moving
servers elsewhere in the network may require expanding the scope of
the VLAN beyond its original boundaries. While this can be done, it
requires potentially complex network configuration changes and can
conflict with the desire to bound the size of broadcast domains,
especially in larger data centers.

In order to limit the broadcast domain of each VLAN, multi-destination
frames within a VLAN should optimally flow only to those devices that
have that VLAN configured. When workloads migrate, the physical
network (e.g., access lists) may need to be reconfigured, which is
typically time consuming and error prone.
An important use case is cross-pod expansion. A pod typically consists
of one or more racks of servers with its associated network and
storage connectivity. A tenant's virtual network may start off on a
pod and, due to expansion, require servers/VMs on other pods,
especially when other pods are not fully utilizing all their
resources. This use case requires that virtual networks span multiple
pods in order to provide connectivity to all of the tenant's
servers/VMs. Such expansion can be difficult to achieve when tenant
addressing is tied to the addressing used by the underlay network or
when it requires that the scope of the underlying L2 VLAN expand
beyond its original pod boundary.
2.5. Need For Address Separation Between Tenants

Individual tenants need control over the addresses they use within a
virtual network. But it can be problematic when different tenants want
to use the same addresses, or even if the same tenant wants to reuse
the same addresses in different virtual networks. Consequently,
virtual networks must allow tenants to use whatever addresses they
want without concern for what addresses are being used by other
tenants or other virtual networks.
2.6. Need For Address Separation Between Tenant and Infrastructure

As in the previous case, a tenant needs to be able to use whatever
addresses it wants in a virtual network, independent of what addresses
the underlying data center network is using. Tenants (and the underlay
infrastructure provider) should be able to use whatever addresses make
sense for them, without having to worry about address collisions
between addresses used by tenants and those used by the underlay data
center network.
2.7. IEEE 802.1 VLAN Limitations

VLANs are a well known construct in the networking industry, providing
an L2 service via an L2 underlay. A VLAN is an L2 bridging construct
that provides some of the semantics of virtual networks mentioned
above: a MAC address is unique within a VLAN, but not necessarily
across VLANs. Traffic sourced within a VLAN (including broadcast and
multicast traffic) remains within the VLAN it originates from. Traffic
forwarded from one VLAN to another typically involves router (L3)
processing. The forwarding table lookup operation is keyed on {VLAN,
MAC address} tuples.

But there are problems and limitations with L2 VLANs. VLANs are a pure
L2 bridging construct and VLAN identifiers are carried along with data
frames to allow each forwarding point to know what VLAN the frame
belongs to. A VLAN today is defined as a 12-bit number, limiting the
total number of VLANs to 4096 (though typically, this number is 4094
since 0 and 4095 are reserved). Due to the large number of tenants
that a cloud provider might service, the 4094 VLAN limit is often
inadequate. In addition, there is often a need for multiple VLANs per
tenant, which exacerbates the issue. The use of a sufficiently large
VNID, present in the overlay control plane and possibly also in the
data plane, would eliminate current VLAN size limitations associated
with single 12-bit VLAN tags.
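The difference in identifier space is easy to quantify. The sketch
below assumes a 24-bit VNID purely as an example (this document does
not fix a VNID width) and also shows the {VLAN, MAC address}
forwarding key mentioned above.

   vlan_bits = 12
   vnid_bits = 24                   # example width; not fixed here

   usable_vlans = 2**vlan_bits - 2  # 4094: IDs 0 and 4095 are reserved
   vnid_space   = 2**vnid_bits      # 16,777,216 possible networks
   print(usable_vlans, vnid_space)

   # VLAN forwarding is keyed on {VLAN, MAC address} tuples:
   fdb = {(100, "00:11:22:33:44:55"): "port-7"}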
3. Network Overlays

Virtual Networks are used to isolate a tenant's traffic from that of
other tenants (or even traffic within the same tenant that requires
isolation). There are two main characteristics of virtual networks:

1. Virtual networks isolate the address space used in one virtual
   network from the address space used by another virtual network.
   The same network addresses may be used in different virtual
   networks at the same time. In addition, the address space used by a
   virtual network is independent from that used by the underlying
   physical network.

2. Virtual networks limit the scope of packets sent on the virtual
   network. Packets sent by end systems attached to a virtual network
   are delivered as expected to other end systems on that virtual
   network and may exit a virtual network only through controlled exit
   points such as a security gateway. Likewise, packets sourced from
   outside of the virtual network may enter the virtual network only
   through controlled entry points, such as a security gateway.
3.1. Benefits of Network Overlays

To address the problems described in Section 2, a network overlay
model can be used.

The idea behind an overlay is quite straightforward. Each virtual
network instance is implemented as an overlay. The original packet is
encapsulated by the first-hop network device. The encapsulation
identifies the destination of the device that will perform the
decapsulation before delivering the original packet to the endpoint.
The rest of the network forwards the packet based on the encapsulation
header and can be oblivious to the payload that is carried inside.
Overlays are based on what is commonly known as a "map-and-encap"
architecture. There are three distinct and logically separable steps:

1. The first-hop overlay device implements a mapping operation that
   determines where the encapsulated packet should be sent to reach
   its intended destination VM. Specifically, the mapping function
   maps the destination address (either L2 or L3) of a packet received
   from a VM into the corresponding destination address of the egress
   device. The destination address will be the underlay address of the
   device doing the decapsulation and is an IP address.

2. Once the mapping has been determined, the ingress overlay device
   encapsulates the received packet within an overlay header.

3. The final step is to actually forward the (now encapsulated) packet
   to its destination. The packet is forwarded by the underlay (i.e.,
   the IP network) based entirely on its outer address. Upon receipt
   at the destination, the egress overlay device decapsulates the
   original packet and delivers it to the intended recipient VM.

Each of the above steps is logically distinct (see the sketch below),
though an implementation might combine them for efficiency or other
reasons. It should be noted that in L3 BGP/VPN terminology, the above
steps are commonly known as "forwarding" or "virtual forwarding".
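The following Python sketch walks through the three map-and-encap
steps under stated assumptions: the header layout, field names, and
static mapping table are invented for illustration, not taken from any
proposed encapsulation.

   # Sketch of the three map-and-encap steps; all values are invented.
   MAPPINGS = {                        # (vnid, inner dst) ->
       (7, "10.1.1.5"): "192.0.2.20",  # underlay IP of the egress NVE
   }

   def map_packet(vnid, inner_dst):
       """Step 1: map the tenant destination to an egress underlay
       address."""
       return MAPPINGS[(vnid, inner_dst)]

   def encapsulate(vnid, inner_pkt, egress_ip, ingress_ip):
       """Step 2: wrap the original packet in an overlay header."""
       return {"outer_src": ingress_ip, "outer_dst": egress_ip,
               "vnid": vnid, "payload": inner_pkt}

   def underlay_forward_and_decap(encap_pkt):
       """Step 3: the underlay forwards on outer_dst only; the egress
       NVE strips the header and delivers the original packet."""
       return encap_pkt["payload"]

   pkt = {"dst": "10.1.1.5", "data": b"hello"}
   egress = map_packet(7, pkt["dst"])
   out = underlay_forward_and_decap(
       encapsulate(7, pkt, egress, "192.0.2.10"))
   assert out == pkt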
The first hop network device can be a traditional switch or router or
the virtual switch residing inside a hypervisor. Furthermore, the
endpoint can be a VM or it can be a physical server. Examples of
architectures based on network overlays include BGP/MPLS VPNs
[RFC4364], TRILL [RFC6325], LISP [I-D.ietf-lisp], and Shortest Path
Bridging (SPB-M) [SPBM].
In the data plane, a virtual network identifier (or VNID), or a
locally significant identifier, can be carried as part of the overlay
header so that every data packet explicitly identifies the specific
virtual network the packet belongs to. Since both routed and bridged
semantics can be supported by a virtual data center, the original
packet carried within the overlay header can be an Ethernet frame
complete with MAC addresses or just the IP packet.
The use of a sufficiently large VNID would address current VLAN
limitations associated with single 12-bit VLAN tags. This VNID can be
carried in the control plane. In the data plane, an overlay header
provides a place to carry either the VNID or an identifier that is
locally significant to the edge device. In both cases, the identifier
in the overlay header specifies which virtual network the data packet
belongs to.
A key aspect of overlays is the decoupling of the "virtual" MAC and/or
IP addresses used by VMs from the physical network infrastructure and
the infrastructure IP addresses used by the data center. If a VM
changes location, the overlay edge devices simply update their mapping
tables to reflect the new location of the VM within the data center's
infrastructure space. Because an overlay network is used, a VM can now
be located anywhere in the data center that the overlay reaches
without regard to traditional constraints implied by L2 properties
such as VLAN numbering, or the span of an L2 broadcast domain scoped
to a single pod or access switch.
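A hypothetical illustration of that decoupling: a VM move changes only
the overlay mapping entry, never the VM's own addresses. Addresses and
table layout are invented.

   # Sketch: when a VM moves, only the overlay edge mapping changes;
   # the VM's own addresses are untouched.  Addresses are illustrative.
   mapping = {(7, "00:aa:bb:cc:dd:ee"): "192.0.2.20"}  # -> NVE underlay IP

   def migrate(vnid, vm_mac, new_nve_ip):
       mapping[(vnid, vm_mac)] = new_nve_ip            # one table update

   migrate(7, "00:aa:bb:cc:dd:ee", "192.0.2.99")       # VM moved racks
   assert mapping[(7, "00:aa:bb:cc:dd:ee")] == "192.0.2.99"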
Multi-tenancy is supported by isolating the traffic of one virtual
network instance from traffic of another. Traffic from one virtual
network instance cannot be delivered to another instance without
(conceptually) exiting the instance and entering the other instance
via an entity that has connectivity to both virtual network instances.
Without the existence of this entity, tenant traffic remains isolated
within each individual virtual network instance. Overlays are designed
to allow a set of VMs to be placed within a single virtual network
instance, whether that virtual network provides a bridged network or a
routed network.
3.2. Communication Between Virtual and Traditional Networks

Not all communication will be between devices connected to virtualized
networks. Devices using overlays will continue to access devices and
make use of services on traditional, non-virtualized networks, whether
in the data center, the public Internet, or at remote/branch campuses.
Any virtual network solution must be capable of interoperating with
existing routers, VPN services, load balancers, intrusion detection
services, firewalls, etc. on external networks.

Communication between devices attached to a virtual network and
devices connected to non-virtualized networks is handled
architecturally by having specialized gateway devices that receive
packets from a virtualized network, decapsulate them, process them as
regular (i.e., non-virtualized) traffic, and finally forward them on
to their appropriate destination (and vice versa). Additional
identification, such as VLAN tags, could be used on the
non-virtualized side of such a gateway to enable forwarding of traffic
for multiple virtual networks over a common non-virtualized link.

A wide range of implementation approaches are possible. Overlay
gateway functionality could be combined with other network
functionality into a network device that implements the overlay
functionality, and then forwards traffic between other internal
components that implement functionality such as full router service,
load balancing, firewall support, VPN gateway, etc.
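The following sketch shows the gateway behavior described above in
schematic form: decapsulate, then forward as regular traffic,
optionally re-tagging with a VLAN on the non-virtualized side. The
VNID-to-VLAN table is an assumed per-gateway configuration, not part
of any specification.

   # Sketch of a gateway that decapsulates overlay traffic and forwards
   # it as ordinary traffic.  The vnid->vlan map is illustrative.
   VN_TO_VLAN = {7: 100, 8: 200}   # assumed per-gateway configuration

   def gateway_egress(encap_pkt):
       inner = encap_pkt["payload"]          # decapsulate
       vlan = VN_TO_VLAN.get(encap_pkt["vnid"])
       return {"vlan": vlan, **inner}        # forward as regular traffic

   out = gateway_egress({"vnid": 7, "payload": {"dst": "198.51.100.9"}})
   assert out["vlan"] == 100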
3.3. Communication Between Virtual Networks

Communication between devices on different virtual networks is handled
architecturally by adding specialized interconnect functionality among
the otherwise isolated virtual networks. For a virtual network
providing an L2 service, such interconnect functionality could be IP
forwarding configured as part of the "default gateway" for each
virtual network. For a virtual network providing L3 service, the
interconnect functionality could be IP forwarding configured as part
of routing between IP subnets, or it could be based on configured
inter-virtual-network traffic policies. In both cases, the
implementation of the interconnect functionality could be distributed
across the NVEs and could be combined with other network functionality
(e.g., load balancing, firewall support) that is applied to traffic
forwarded between virtual networks.
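A minimal sketch of such interconnect functionality, assuming a
configured policy table (the table and its contents are invented):
traffic crosses virtual networks only where a policy entry permits it.

   # Sketch: inter-VN traffic passes only through configured
   # interconnect policy; the policy table is invented.
   allowed = {(7, 8)}              # VN 7 may send to VN 8

   def inter_vn_forward(src_vnid, dst_vnid, pkt):
       if (src_vnid, dst_vnid) not in allowed:
           return None             # isolated by default
       return pkt                  # e.g. IP-forwarded via the gateway

   assert inter_vn_forward(7, 8, "pkt") == "pkt"
   assert inter_vn_forward(8, 7, "pkt") is None  # reverse not configured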
3.4. Overlay Design Characteristics

Layer 2 and layer 3 overlay protocols already exist, but they do not
necessarily solve all of today's problems in the environment of a
highly virtualized data center. Below are some of the characteristics
of environments that must be taken into account by the overlay
technology:

1. Highly distributed systems. The overlay should work in an
   environment where there could be many thousands of access devices
   (e.g., residing within the hypervisors) and many more end systems
   (e.g., VMs) connected to them. This calls for a distributed mapping
   system that puts a low overhead on the overlay tunnel endpoints.

2. Many highly distributed virtual networks with sparse membership.
   Each virtual network could be highly dispersed inside the data
   center. Also, along with the expectation of many virtual networks,
   the number of end systems connected to any one virtual network is
   expected to be relatively low; therefore, the percentage of access
   devices participating in any given virtual network would also be
   expected to be low. For this reason, efficient delivery of
   multi-destination traffic within a virtual network instance should
   be taken into consideration (see the sketch following this list).

3. Highly dynamic end systems. End systems connected to virtual
   networks can be very dynamic, both in terms of creation/deletion/
   power-on/off and in terms of mobility across the access devices.

4. Work with existing, widely deployed network Ethernet switches and
   IP routers without requiring wholesale replacement. The first-hop
   device (or end system) that adds and removes the overlay header
   will require new equipment and/or new software.

5. Work with existing data center network deployments without
   requiring major changes in operational or other practices. For
   example, some data centers have not enabled multicast beyond
   link-local scope. Overlays should be capable of leveraging underlay
   multicast support where appropriate, but not require its enablement
   in order to use an overlay solution.

6. Network infrastructure administered by a single administrative
   domain. This is consistent with operation within a data center, and
   not across the Internet.
3.5. Overlay Networking Work Areas

There are three specific and separate potential work areas needed to
realize an overlay solution. The areas correspond to different
possible "on-the-wire" protocols, where distinct entities interact
with each other.

One area of work concerns the address dissemination protocol an NVE
uses to build and maintain the mapping tables it uses to deliver
encapsulated packets to their proper destination. One approach is to
build mapping tables entirely via learning (as is done in 802.1
networks). But to provide better scaling properties, a more
sophisticated approach is needed, i.e., the use of a specialized
control plane protocol. While there are some advantages to using or
leveraging an existing protocol for maintaining mapping tables, the
fact that large numbers of NVEs will likely reside in hypervisors
places constraints on the resources (CPU and memory) that can be
dedicated to such functions. For example, routing protocols (e.g.,
IS-IS, BGP) may have scaling difficulties if implemented directly in
all NVEs, based on both flooding and convergence time concerns.
[...]
tunnels associated with the VM. To achieve this functionality, a
standardized interaction between the NVE and hypervisor may be needed,
for example in the case where the NVE resides on a separate device
from the VM.
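As a sketch of the NVE/oracle split discussed in this section: the NVE
keeps a local cache of mappings and consults the oracle on a miss
instead of flooding. The Oracle class and its resolve() method are
hypothetical names for illustration, not a proposed protocol.

   # Sketch: the NVE keeps a small cache and asks the oracle (the
   # mapping authority) on a miss.  Names are invented.
   class Oracle:
       def __init__(self, mappings):
           self._m = mappings
       def resolve(self, vnid, addr):
           return self._m.get((vnid, addr))

   class NVE:
       def __init__(self, oracle):
           self.cache = {}
           self.oracle = oracle
       def egress_for(self, vnid, addr):
           key = (vnid, addr)
           if key not in self.cache:          # avoid flood-and-learn
               self.cache[key] = self.oracle.resolve(vnid, addr)
           return self.cache[key]

   nve = NVE(Oracle({(7, "10.1.1.5"): "192.0.2.20"}))
   assert nve.egress_for(7, "10.1.1.5") == "192.0.2.20"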
In summary, there are three areas of potential work. The first area
concerns the oracle itself and any on-the-wire protocols it needs. A
second area concerns the interaction between the oracle and NVEs. The
third work area concerns protocols associated with attaching and
detaching a VM from a particular virtual network instance. All three
work areas are important to the development of scalable, interoperable
solutions.
4. Related IETF and IEEE Work
The following subsections discuss related IETF and IEEE work in
progress; these items are not meant to be complete coverage of all
IETF and IEEE data center related work, nor are the descriptions
comprehensive. Each area is currently trying to address certain
limitations of today's data center networks, e.g., scaling is a common
issue for every area listed, and multi-tenancy and VM mobility are
important focus areas as well. Comparing and evaluating the work
result and progress of each work area listed is out of scope of this
document. The intent of this section is to provide a reference for
interested readers.
4.1. L3 BGP/MPLS IP VPNs

BGP/MPLS IP VPNs [RFC4364] support multi-tenancy, address overlap, VPN
traffic isolation, and address separation between tenants and network
infrastructure. The BGP/MPLS control plane is used to distribute the
VPN labels, which identify the tenants (or, to be more specific, the
particular VPN/VN), and the tenant IP addresses. Deployment of
enterprise L3 VPNs has been shown to scale to thousands of VPNs and
millions of VPN prefixes. BGP/MPLS IP VPNs are currently deployed in
some large enterprise data centers. The potential limitation for
deploying BGP/MPLS IP VPNs in data center environments is the
practicality of using BGP in the data center, especially reaching into
the servers or hypervisors. There may be computing workforce skill set
issues, equipment support issues, and potential new scaling
challenges. A combination of BGP and lighter-weight IP signaling
protocols, e.g., XMPP, has been proposed to extend the solution into
the DC environment [I-D.marques-end-system], while taking advantage of
built-in VPN features with their rich policy support; this is
especially useful for inter-tenant connectivity.
4.2. L2 BGP/MPLS IP VPNs
Ethernet Virtual Private Networks (E-VPNs) [I-D.ietf-l2vpn-evpn]
provide an emulated L2 service in which each tenant has its own
Ethernet network over a common IP or MPLS infrastructure and a BGP/
MPLS control plane is used to distribute the tenant MAC addresses and
the MPLS labels that identify the tenants and tenant MAC addresses.
Within the BGP/MPLS control plane, a thirty-two-bit Ethernet Tag is
used to identify the broadcast domains (VLANs) associated with a
given L2 VLAN service instance and these Ethernet tags are mapped to
VLAN IDs understood by the tenant at the service edges. This means
that the limit of 4096 VLANs is associated with an individual tenant
service edge, enabling a much higher level of scalability.
Interconnection between tenants is also allowed in a controlled
fashion.
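The following hypothetical sketch (not from [I-D.ietf-l2vpn-evpn])
illustrates the Ethernet Tag mapping just described: the 32-bit tag
is global to the control plane, while each service edge translates
it to a locally significant 12-bit VLAN ID:

   class ServiceEdge:
       """Maps the control plane's 32-bit Ethernet Tag to a locally
       significant 12-bit VLAN ID at one tenant service edge."""
       def __init__(self):
           self.tag_to_vlan = {}
           self.vlan_to_tag = {}

       def bind(self, ethernet_tag, local_vlan):
           assert 0 <= ethernet_tag < 2**32  # global broadcast domain
           assert 0 < local_vlan < 4095      # locally scoped VLAN ID
           self.tag_to_vlan[ethernet_tag] = local_vlan
           self.vlan_to_tag[local_vlan] = ethernet_tag

   edge_a, edge_b = ServiceEdge(), ServiceEdge()
   edge_a.bind(ethernet_tag=700123, local_vlan=10)  # same domain,
   edge_b.bind(ethernet_tag=700123, local_vlan=42)  # local VLAN IDs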
VM Mobility [I-D.raggarwa-data-center-mobility] introduces the
concept of a combined L2/L3 VPN service in order to support the
mobility of individual Virtual Machines (VMs) between data centers
connected over a common IP or MPLS infrastructure.
4.3. IEEE 802.1aq - Shortest Path Bridging

Shortest Path Bridging (SPB-M) is an IS-IS-based overlay for L2
Ethernets. SPB-M supports multi-pathing and addresses a number of
shortcomings in the original Ethernet Spanning Tree Protocol. SPB-M
uses IEEE 802.1ah MAC-in-MAC encapsulation and supports a 24-bit
I-SID, which can be used to identify virtual network instances.
SPB-M is entirely L2 based, extending the L2 Ethernet bridging model.
4.4. ARMD
ARMD is chartered to look at data center scaling issues with a focus
on address resolution; it is currently chartered to develop a
problem statement and is not developing solutions. While an
overlay-based approach may address some of the "pain points" that
have been raised in ARMD (e.g., better support for multi-tenancy),
an overlay approach may also push some of the L2 scaling concerns
(e.g., excessive flooding) to the IP level (flooding via IP
multicast). Analysis will be needed to understand the scaling
tradeoffs of an overlay-based approach compared with existing
approaches. On the other hand, existing IP-based approaches such as
proxy ARP may help mitigate some concerns.
4.5. TRILL
TRILL is an L2-based approach aimed at addressing deficiencies and
limitations of current Ethernet networks, and of STP in particular.
Although it differs from Shortest Path Bridging in many
architectural and implementation details, it is similar in that it
provides an L2-based service to end systems. TRILL, as defined
today, supports only the standard (and limited) 12-bit VLAN model.
Approaches to extend TRILL to support more than 4094 VLANs are
currently under investigation [I-D.ietf-trill-fine-labeling].
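The scaling gap that motivates the fine-grained labeling work can be
seen in simple arithmetic; this back-of-envelope sketch is about
field widths only, not frame layout:

   VLAN_BITS, FGL_BITS = 12, 24      # base TRILL vs. fine-grained
   usable_vlans = 2**VLAN_BITS - 2   # 0x000 and 0xFFF reserved: 4094
   fgl_labels = 2**FGL_BITS          # roughly 16M virtual networks
   print(usable_vlans, fgl_labels)   # -> 4094 16777216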
4.6. L2VPNs
The IETF has specified a number of approaches for connecting L2
domains together as part of the L2VPN Working Group. That group,
however, has historically focused on provider-provisioned L2 VPNs,
where the service provider participates in management and
provisioning of the VPN. In addition, much of the target environment
for such deployments involves carrying L2 traffic over WANs. Overlay
approaches are intended to be used within data centers, where the
overlay network is managed by the data center operator rather than
by an outside party. While overlays can run across the Internet as
well, they will extend well into the data center itself (e.g., up to
and including hypervisors) and include large numbers of machines
within the data center.
Other L2VPN approaches, such as L2TP [RFC2661], require significant
tunnel state at the encapsulating and decapsulating end points.
Overlays require less tunnel state than such approaches, which is
important to allow overlays to scale to hundreds of thousands of end
points. It is assumed that smaller switches (i.e., virtual switches
in hypervisors or the adjacent devices to which VMs connect) will be
part of the overlay network and be responsible for encapsulating and
decapsulating packets.
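A hedged sketch of the state-reduction point above: an overlay NVE's
"tunnel state" can reduce to a lookup table from (virtual network,
inner address) to the remote NVE's outer IP address, with no
per-tunnel session to establish or tear down. Class and field names
are hypothetical:

   class Nve:
       """Overlay endpoint: tunnel state is just a mapping table."""
       def __init__(self, local_ip):
           self.local_ip = local_ip
           self.mappings = {}  # (vn_id, inner_mac) -> remote outer IP

       def learn(self, vn_id, inner_mac, remote_ip):
           self.mappings[(vn_id, inner_mac)] = remote_ip

       def outer_destination(self, vn_id, inner_mac):
           # A miss would trigger a query to the oracle (or
           # flooding), as discussed earlier in this document.
           return self.mappings.get((vn_id, inner_mac))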
4.7. Proxy Mobile IP
Proxy Mobile IP [RFC5213] [RFC5844] makes use of the GRE Key Field
[RFC5845] [RFC6245], but not in a way that supports multi-tenancy.
4.8. LISP
LISP [I-D.ietf-lisp] essentially provides an IP-over-IP overlay in
which the inner addresses are end station identifiers and the outer
IP addresses represent the location of the end station within the
core IP network topology. The LISP overlay header includes a 24-bit
Instance ID used to support overlapping inner IP addresses.
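The disambiguation role of the Instance ID can be shown in a few
lines; this is a sketch of the concept, not the exact LISP header
layout:

   def vn_context(instance_id, inner_ip):
       assert 0 <= instance_id < 2**24   # 24-bit Instance ID
       return (instance_id, inner_ip)

   a = vn_context(9001, "10.1.1.1")   # tenant A
   b = vn_context(9002, "10.1.1.1")   # tenant B: same inner IP,
   assert a != b                      # distinct forwarding context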
4.9. Individual Submissions

Many individual submissions also aim to address some or all of the
issues raised in this document. Examples of such drafts are VXLAN
[I-D.mahalingam-dutt-dcops-vxlan], NVGRE
[I-D.sridharan-virtualization-nvgre], and Virtual Machine Mobility
in L3 Networks [I-D.wkumari-dcops-l3-vmmobility].
5. Further Work
It is believed that overlay-based approaches may be able to reduce
the overall amount of flooding and other multicast- and broadcast-
related traffic (e.g., ARP and ND) experienced within current data
centers with a large flat L2 network. Further analysis is needed to
characterize the expected improvements.
There are a number of VPN approaches that provide some, if not all,
of the desired semantics of virtual networks. A gap analysis will be
needed to assess how well existing approaches satisfy the
requirements.
6. Summary
This document has argued that network virtualization using overlays
addresses a number of issues being faced as data centers scale in
size. In addition, careful study of current data center problems is
needed for the development of proper requirements and standard
solutions.
Three potential work areas were identified. The first involves the
interactions that take place when a VM attaches to or detaches from
an overlay. A second involves the protocol an NVE would use to
communicate with a backend "oracle" to learn and disseminate mapping
information about the VMs with which it communicates. The third
potential work area involves the backend oracle itself, i.e., how it
provides failover and how it interacts with oracles in other domains.
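To make the second work area concrete, here is a hypothetical
request/response shape for the NVE-to-oracle interaction; the message
fields and encoding are assumptions for illustration, not a proposed
protocol:

   import json

   def map_request(vn_id, inner_mac):
       # NVE -> oracle: "where is this VM's address reachable?"
       return json.dumps({"op": "map-request", "vn": vn_id,
                          "mac": inner_mac})

   def map_reply(vn_id, inner_mac, outer_ip, ttl=300):
       # oracle -> NVE: locator plus a lifetime for cache expiry
       return json.dumps({"op": "map-reply", "vn": vn_id,
                          "mac": inner_mac, "locator": outer_ip,
                          "ttl": ttl})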
7. Acknowledgments
Helpful comments and improvements to this document have come from
John Drake, Ariel Hendel, Vinit Jain, Thomas Morin, Benson
Schliesser, and many others on the mailing list.
8. IANA Considerations

This memo includes no request to IANA.
9. Security Considerations

TBD
10. Informative References
[I-D.fang-vpn4dc-problem-statement]
Napierala, M., Fang, L., and D. Cai, "IP-VPN Data Center
Problem Statement and Requirements",
draft-fang-vpn4dc-problem-statement-01 (work in progress),
June 2012.
[I-D.ietf-l2vpn-evpn]
Sajassi, A., Aggarwal, R., Henderickx, W., Balus, F.,
Isaac, A., and J. Uttaro, "BGP MPLS Based Ethernet VPN",
draft-ietf-l2vpn-evpn-01 (work in progress), July 2012.

[I-D.ietf-lisp]
Farinacci, D., Fuller, V., Meyer, D., and D. Lewis,
"Locator/ID Separation Protocol (LISP)",
draft-ietf-lisp-23 (work in progress), May 2012.

[I-D.ietf-trill-fine-labeling]
Eastlake, D., Zhang, M., Agarwal, P., Perlman, R., and D.
Dutt, "TRILL: Fine-Grained Labeling",
draft-ietf-trill-fine-labeling-01 (work in progress),
June 2012.

[I-D.kreeger-nvo3-overlay-cp]
Kreeger, L., Dutt, D., Narten, T., Black, D., and M.
Sridharan, "Network Virtualization Overlay Control
Protocol Requirements", draft-kreeger-nvo3-overlay-cp-01
(work in progress), July 2012.

[I-D.lasserre-nvo3-framework]
Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y.
Rekhter, "Framework for DC Network Virtualization",
draft-lasserre-nvo3-framework-03 (work in progress),
July 2012.
[I-D.mahalingam-dutt-dcops-vxlan]
Sridhar, T., Bursell, M., Kreeger, L., Dutt, D., Wright,
C., Mahalingam, M., Duda, K., and P. Agarwal, "VXLAN: A
Framework for Overlaying Virtualized Layer 2 Networks over
Layer 3 Networks", draft-mahalingam-dutt-dcops-vxlan-01
(work in progress), February 2012.
[I-D.raggarwa-data-center-mobility]
Aggarwal, R., Rekhter, Y., Henderickx, W., Shekhar, R.,
and L. Fang, "Data Center Mobility based on BGP/MPLS, IP
Routing and NHRP", draft-raggarwa-data-center-mobility-03
(work in progress), June 2012.
[I-D.sridharan-virtualization-nvgre]
Sridharan, M., Greenberg, A., Venkataramaiah, N., Wang,
Y., Duda, K., Ganga, I., Lin, G., Pearson, M., Thaler, P.,
and C. Tumuluri, "NVGRE: Network Virtualization using
Generic Routing Encapsulation",
draft-sridharan-virtualization-nvgre-01 (work in
progress), July 2012.
[I-D.wkumari-dcops-l3-vmmobility]
Kumari, W. and J. Halpern, "Virtual Machine mobility in L3
Networks.", draft-wkumari-dcops-l3-vmmobility-00 (work in
progress), August 2011.
[RFC2661] Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn,
G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"",
RFC 2661, August 1999.
[RFC4023] Worster, T., Rekhter, Y., and E. Rosen, "Encapsulating
MPLS in IP or Generic Routing Encapsulation (GRE)",
RFC 4023, March 2005.
[RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
Networks (VPNs)", RFC 4364, February 2006.
[RFC5036] Andersson, L., Minei, I., and B. Thomas, "LDP
Specification", RFC 5036, October 2007.
[RFC5213] Gundavelli, S., Leung, K., Devarapalli, V., Chowdhury, K.,
and B. Patil, "Proxy Mobile IPv6", RFC 5213, August 2008.

[RFC5844] Wakikawa, R. and S. Gundavelli, "IPv4 Support for Proxy
Mobile IPv6", RFC 5844, May 2010.

[RFC5845] Muhanna, A., Khalil, M., Gundavelli, S., and K. Leung,
"Generic Routing Encapsulation (GRE) Key Option for Proxy
Mobile IPv6", RFC 5845, June 2010.

[RFC6245] Yegani, P., Leung, K., Lior, A., Chowdhury, K., and J.
Navali, "Generic Routing Encapsulation (GRE) Key Extension
for Mobile IPv4", RFC 6245, May 2011.

[RFC6325] Perlman, R., Eastlake, D., Dutt, D., Gai, S., and A.
Ghanwani, "Routing Bridges (RBridges): Base Protocol
Specification", RFC 6325, July 2011.
[SPBM] "IEEE P802.1aq/D4.5 Draft Standard for Local and
Metropolitan Area Networks -- Media Access Control (MAC)
Bridges and Virtual Bridged Local Area Networks,
Amendment 8: Shortest Path Bridging", February 2012.
Appendix A. Change Log
A.1. Changes from -01
1. Removed Section 4.2 (Standardization Issues) and Section 5
   (Control Plane) as those are more appropriately covered in and
   ...
5. Revised some of the terminology to be consistent with
   [I-D.lasserre-nvo3-framework] and [I-D.kreeger-nvo3-overlay-cp].
A.2. Changes from -02
1. Numerous changes in response to discussions on the nvo3 mailing
   list, with the majority of changes in Section 2 (Problem Details)
   and Section 3 (Network Overlays). Best to see diffs for specific
   text changes.
A.3. Changes from -03

1. Too numerous to enumerate; moved solution-specific descriptions
   to the Related Work section. Pulled in additional text (and
   authors) from [I-D.fang-vpn4dc-problem-statement]; numerous
   editorial improvements.
Authors' Addresses
Thomas Narten (editor)
IBM

Email: narten@us.ibm.com
David Black
EMC

Email: david.black@emc.com
Dinesh Dutt

Email: ddutt.ietf@hobbesdutt.com
Luyuan Fang
Cisco Systems
111 Wood Avenue South
Iselin, NJ 08830
USA

Email: lufang@cisco.com
Eric Gray
Ericsson
Email: eric.gray@ericsson.com
Lawrence Kreeger
Cisco

Email: kreeger@cisco.com
Maria Napierala
AT&T
200 Laurel Avenue
Middletown, NJ 07748
USA
Email: mnapierala@att.com
Murari Sridharan
Microsoft
Email: muraris@microsoft.com