draft-narten-nvo3-overlay-problem-statement-03.txt

Internet Engineering Task Force                          T. Narten, Ed.
Internet-Draft                                                      IBM
Intended status: Informational                             M. Sridharan
Expires: January 18, 2013                                     Microsoft
                                                                D. Dutt
                                                               D. Black
                                                                    EMC
                                                             L. Kreeger
                                                                  Cisco
                                                          July 17, 2012

         Problem Statement: Overlays for Network Virtualization
              draft-narten-nvo3-overlay-problem-statement-03
Abstract

This document describes issues associated with providing multi-
tenancy in large data center networks and an overlay-based network
virtualization approach to addressing them.  A key multi-tenancy
requirement is traffic isolation, so that a tenant's traffic is not
visible to any other tenant.  This isolation can be achieved by
assigning one or more virtual networks to each tenant such that
traffic within a virtual network is isolated from traffic in other
virtual networks.  The primary functionality required is provisioning
virtual networks, associating a virtual machine's virtual network
interface(s) with the appropriate virtual network, and maintaining
that association as the virtual machine is activated, migrated and/or
deactivated.  Use of an overlay-based approach enables scalable
deployment on large network infrastructures.
Status of this Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF).  Note that other groups may also distribute
working documents as Internet-Drafts.  The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

This Internet-Draft will expire on January 18, 2013.
Copyright Notice

Copyright (c) 2012 IETF Trust and the persons identified as the
document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document.  Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.  Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  4
   2.  Problem Details  . . . . . . . . . . . . . . . . . . . . . . .  5
     2.1.  Dynamic Provisioning . . . . . . . . . . . . . . . . . . .  5
     2.2.  Virtual Machine Mobility Requirements  . . . . . . . . . .  5
     2.3.  Span of Virtual Networks . . . . . . . . . . . . . . . . .  6
     2.4.  Inadequate Forwarding Table Sizes in Switches  . . . . . .  6
     2.5.  Decoupling Logical and Physical Configuration  . . . . . .  6
     2.6.  Separating Tenant Addressing from Infrastructure
           Addressing . . . . . . . . . . . . . . . . . . . . . . . .  7
     2.7.  Communication Between Virtual and Traditional Networks . .  7
     2.8.  Communication Between Virtual Networks . . . . . . . . . .  7
     2.9.  Overlay Design Characteristics . . . . . . . . . . . . . .  8
   3.  Network Overlays . . . . . . . . . . . . . . . . . . . . . . .  9
     3.1.  Limitations of Existing Virtual Network Models . . . . . .  9
     3.2.  Benefits of Network Overlays . . . . . . . . . . . . . . . 10
     3.3.  Overlay Networking Work Areas  . . . . . . . . . . . . . . 11
   4.  Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 13
     4.1.  IEEE 802.1aq - Shortest Path Bridging  . . . . . . . . . . 13
     4.2.  ARMD . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
     4.3.  TRILL  . . . . . . . . . . . . . . . . . . . . . . . . . . 13
     4.4.  L2VPNs . . . . . . . . . . . . . . . . . . . . . . . . . . 14
     4.5.  Proxy Mobile IP  . . . . . . . . . . . . . . . . . . . . . 14
     4.6.  LISP . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
     4.7.  Individual Submissions . . . . . . . . . . . . . . . . . . 14
   5.  Further Work . . . . . . . . . . . . . . . . . . . . . . . . . 15
   6.  Summary  . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
   7.  Acknowledgments  . . . . . . . . . . . . . . . . . . . . . . . 15
   8.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 15
   9.  Security Considerations  . . . . . . . . . . . . . . . . . . . 15
   10. Informative References . . . . . . . . . . . . . . . . . . . . 15
   Appendix A.  Change Log  . . . . . . . . . . . . . . . . . . . . . 17
     A.1.  Changes from -01 . . . . . . . . . . . . . . . . . . . . . 17
     A.2.  Changes from -02 . . . . . . . . . . . . . . . . . . . . . 18
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 18
1.  Introduction

Server virtualization is increasingly becoming the norm in data
centers.  With server virtualization, each physical server supports
multiple virtual machines (VMs), each running its own operating
system, middleware and applications.  Virtualization is a key enabler
of workload agility, i.e., allowing any server to host any
application and providing the flexibility of adding, shrinking, or
moving services within the physical infrastructure.  Server
virtualization provides numerous benefits, including higher
utilization, increased security, reduced user downtime, reduced power
usage, etc.

Large scale multi-tenant data centers are taking advantage of the
benefits of server virtualization to provide a new kind of hosting, a
virtual hosted data center.  Multi-tenant data centers are ones where
individual tenants could belong to a different company (in the case
of a public provider) or a different department (in the case of an
internal company data center).  Each tenant has the expectation of a
level of security and privacy separating their resources from those
of other tenants.  For example, one tenant's traffic must never be
exposed to another tenant, except through carefully controlled

skipping to change at page 5, line 30

Sections 4 and 5 review related and further work, while Section 6
closes with a summary.
2.  Problem Details

The following subsections describe aspects of multi-tenant networking
that pose problems for large scale network infrastructure.  Different
problem aspects may arise based on the network architecture and
scale.
2.1.  Dynamic Provisioning

Cloud computing involves on-demand provisioning of resources for
multi-tenant environments.  A common example of cloud computing is
the public cloud, where a cloud service provider offers elastic
services to multiple customers over the same infrastructure.  The
on-demand nature of provisioning, in conjunction with trusted
hypervisors controlling network access by VMs, calls for resilient
distributed network control mechanisms.
2.2.  Virtual Machine Mobility Requirements

A key benefit of server virtualization is virtual machine (VM)
mobility.  A VM can be migrated from one server to another, live,
i.e., while continuing to run and without needing to shut it down and
restart it at the new location.  A key requirement for live migration
is that a VM retain critical network state at its new location,
including its IP and MAC address(es).  Preservation of MAC addresses
may be necessary, for example, when software licenses are bound to
MAC addresses.  More generally, any change in the VM's MAC addresses
resulting from a move would be visible to the VM and thus potentially
result in unexpected disruptions.  Retaining IP addresses after a
move is necessary to prevent existing transport connections (e.g.,
TCP) from breaking and needing to be restarted.

In traditional data centers, servers are assigned IP addresses based
on their physical location, for example based on the Top of Rack
(ToR) switch for the server rack or the VLAN configured to the
server.  Servers can only move to other locations within the same IP
subnet.  This constraint is not problematic for physical servers,
which move infrequently, but it restricts the placement and movement
of VMs within the data center.  Any solution for a scalable multi-
tenant data center must allow a VM to be placed (or moved) anywhere
within the data center, without being constrained by the subnet
boundary concerns of the host servers.
2.3.  Span of Virtual Networks

Another use case is cross pod expansion.  A pod typically consists of
one or more racks of servers with its associated network and storage
connectivity.  Tenants may start off on a pod and, due to expansion,
require servers/VMs on other pods, especially when tenants on the
other pods are not fully utilizing all their resources.  This use
case requires that virtual networks span multiple pods in order to
provide connectivity to all of the tenant's servers/VMs.

skipping to change at page 7, line 9
to migrate compute workloads to any server anywhere in the network
while retaining the workload's addresses.  This can be achieved today
by stretching VLANs (e.g., by using TRILL or SPB).

However, in order to limit the broadcast domain of each VLAN, multi-
destination frames within a VLAN should optimally flow only to those
devices that have that VLAN configured.  When workloads migrate, the
physical network (e.g., access lists) may need to be reconfigured,
which is typically time consuming and error prone.
2.6.  Separating Tenant Addressing from Infrastructure Addressing

It is highly desirable to be able to number the data center underlay
network using whatever addresses make sense for it, without having to
worry about address collisions between addresses used by the underlay
and those used by tenants.
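As an illustration of this separation (a sketch with made-up virtual
network identifiers and addresses, not part of this document): a
tenant address is only meaningful relative to its virtual network, so
two tenants can reuse the same IP address without colliding with each
other or with the underlay's own numbering.

```python
# Illustrative only: mapping table keyed on (VN identifier, tenant IP)
# resolving to an underlay (infrastructure) address.  Two tenants reuse
# 10.0.0.5; the VN identifier keeps the entries distinct.
mapping = {
    (1001, "10.0.0.5"): "192.0.2.10",   # tenant A, VN 1001
    (2002, "10.0.0.5"): "192.0.2.77",   # tenant B, VN 2002, same tenant IP
}

def underlay_next_hop(vnid, tenant_ip):
    """Resolve a tenant address within the scope of its virtual network."""
    return mapping[(vnid, tenant_ip)]

print(underlay_next_hop(1001, "10.0.0.5"))  # -> 192.0.2.10
print(underlay_next_hop(2002, "10.0.0.5"))  # -> 192.0.2.77
```

The underlay addresses (192.0.2.x here) are chosen entirely by the
infrastructure operator and never appear in tenant forwarding state.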
2.7.  Communication Between Virtual and Traditional Networks
Not all communication will be between devices connected to
virtualized networks. Devices using overlays will continue to access
devices and make use of services on traditional, non-virtualized
networks, whether in the data center, the public Internet, or at
remote/branch campuses. Any virtual network solution must be capable
of interoperating with existing routers, VPN services, load
balancers, intrusion detection services, firewalls, etc. on external
networks.
Communication between devices attached to a virtual network and
devices connected to non-virtualized networks is handled
architecturally by having specialized gateway devices that receive
packets from a virtualized network, decapsulate them, process them as
regular (i.e., non-virtualized) traffic, and finally forward them on
to their appropriate destination (and vice versa). Additional
identification, such as VLAN tags, could be used on the non-
virtualized side of such a gateway to enable forwarding of traffic
for multiple virtual networks over a common non-virtualized link.
A wide range of implementation approaches are possible. Overlay
gateway functionality could be combined with other network
functionality into a network device that implements the overlay
functionality, and then forwards traffic between other internal
components that implement functionality such as full router service,
load balancing, firewall support, VPN gateway, etc.
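The gateway behavior described above can be sketched as follows (the
names and the VNID-to-VLAN table are hypothetical, for illustration
only, not a defined interface):

```python
# Illustrative gateway sketch: strip the overlay encapsulation and tag
# the frame with a VLAN so multiple virtual networks can share one
# non-virtualized link.  Assumed static configuration, made-up values.
VNID_TO_VLAN = {1001: 100, 2002: 200}

def to_traditional_network(overlay_packet):
    vnid, inner_frame = overlay_packet           # decapsulation
    vlan = VNID_TO_VLAN[vnid]                    # per-VN identification
    # From here the frame is processed as regular (non-virtualized) traffic.
    return {"vlan": vlan, "frame": inner_frame}

out = to_traditional_network((1001, b"original ethernet frame"))
# out["vlan"] is 100; out["frame"] is the original frame, untouched
```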
2.8. Communication Between Virtual Networks
Communication between devices on different virtual networks is
handled architecturally by adding specialized interconnect
functionality among the otherwise isolated virtual networks. For a
virtual network providing an Ethernet service, such interconnect
functionality could be IP forwarding configured as part of the
"default gateway" for each virtual network. For a virtual network
providing IP service, the interconnect functionality could be IP
forwarding configured as part of the IP addressing structure of each
virtual network. In both cases, the implementation of the
interconnect functionality could be distributed across the NVEs, and
could be combined with other network functionality (e.g., load
balancing, firewall support) that is applied to traffic that is
forwarded between virtual networks.
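The interconnect idea above can be reduced to a small forwarding
decision (an illustrative sketch, not a specified mechanism; the
policy table is invented for the example): traffic stays inside its
virtual network unless explicitly configured interconnect
functionality joins the two networks.

```python
# Illustrative only: pairs of VNs permitted to communicate via the
# configured interconnect (e.g., a VN's "default gateway").
ALLOWED = {(1001, 1002)}

def forward(src_vnid, dst_vnid, packet):
    if src_vnid == dst_vnid:
        return ("deliver", packet)                  # ordinary intra-VN case
    if (src_vnid, dst_vnid) in ALLOWED:
        return ("route-via-interconnect", packet)   # controlled inter-VN case
    return ("drop", None)                           # isolation preserved

# Without an ALLOWED entry, traffic between VNs is simply dropped.
```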
2.9. Overlay Design Characteristics
A number of layer 2 overlay protocols already exist, but they were
not necessarily designed to solve the problem in the environment of a
highly virtualized data center.  Below are some of the
characteristics of environments that must be taken into account by
the overlay technology:

1.  Highly distributed systems.  The overlay should work in an
    environment where there could be many thousands of access
    switches (e.g., residing within the hypervisors) and many more end

skipping to change at page 9, line 44
up operation is keyed on {VLAN, MAC address} tuples.

But there are problems and limitations with L2 VLANs.  VLANs are a
pure L2 bridging construct and VLAN identifiers are carried along
with data frames to allow each forwarding point to know what VLAN the
frame belongs to.  A VLAN today is defined as a 12-bit number,
limiting the total number of VLANs to 4096 (though typically, this
number is 4094 since 0 and 4095 are reserved).  Due to the large
number of tenants that a cloud provider might service, the 4094 VLAN
limit is often inadequate.  In addition, there is often a need for
multiple VLANs per tenant, which exacerbates the issue.  The use of a
sufficiently large VNID, present in the overlay control plane and
possibly also in the data plane, would eliminate current VLAN size
limitations associated with single 12-bit VLAN tags.
For IP/MPLS networks, Ethernet Virtual Private Network (E-VPN)
[I-D.ietf-l2vpn-evpn] provides an emulated Ethernet service in which
each tenant has its own Ethernet network over a common IP or MPLS
infrastructure and a BGP/MPLS control plane is used to distribute the
tenant MAC addresses and the MPLS labels that identify the tenants
and tenant MAC addresses.  Within the BGP/MPLS control plane, a
32-bit Ethernet Tag is used to identify the broadcast domains (VLANs)
associated with a given L2 VLAN service instance, and these Ethernet
tags are mapped to VLAN IDs understood by the tenant at the service
edges.  This means that the limit of 4096 VLANs is associated with an
individual tenant service edge, enabling a much higher level of
scalability.  Interconnectivity between tenants is also allowed in a
controlled fashion.
IP/MPLS networks also provide an IP VPN service (L3 VPN) [RFC4364] in
which each tenant has its own IP network over a common IP or MPLS
infrastructure and a BGP/MPLS control plane is used to distribute the
tenant IP routes and the MPLS labels that identify the tenants and
tenant IP routes.  As with E-VPNs, interconnectivity between tenants
is also allowed in a controlled fashion.
VM Mobility [I-D.raggarwa-data-center-mobility] introduces the
concept of a combined L2/L3 VPN service in order to support the
mobility of individual Virtual Machines (VMs) between Data Centers
connected over a common IP or MPLS infrastructure.

There are a number of VPN approaches that provide some if not all of
the desired semantics of virtual networks.  A gap analysis will be
needed to assess how well existing approaches satisfy the
requirements.
3.2.  Benefits of Network Overlays
To address the problems described earlier, a network overlay model
can be used.

The idea behind an overlay is quite straightforward.  Each virtual
network instance is implemented as an overlay.  The original frame is
encapsulated by the first hop network device.  The encapsulation
identifies the destination of the device that will perform the
decapsulation before delivering the frame to the endpoint.  The rest
of the network forwards the frame based on the encapsulation header
and can be oblivious to the payload that is carried inside.  To avoid
belaboring the point each time, the first hop network device can be a
traditional switch or router or the virtual switch residing inside a
hypervisor.  Furthermore, the endpoint can be a VM or it can be a
physical server.  Examples of architectures based on network overlays
include BGP/MPLS VPNs [RFC4364], TRILL [RFC6325], LISP
[I-D.ietf-lisp], and Shortest Path Bridging [SPB].
With the overlay, a virtual network identifier (or VNID) can be
carried as part of the overlay header so that every data frame
explicitly identifies the specific virtual network the frame belongs
to.  Since both routed and bridged semantics can be supported by a
virtual data center, the original frame carried within the overlay
header can be an Ethernet frame complete with MAC addresses or just
the IP packet.
The use of a sufficiently large VNID would address current VLAN
limitations associated with single 12-bit VLAN tags.  This VNID can
be carried in the control plane.  In the data plane, an overlay
header provides a place to carry either the VNID, or a locally-
significant identifier.  In both cases, the identifier in the overlay
header specifies which virtual network the data packet belongs to.
A key aspect of overlays is the decoupling of the "virtual" MAC and
IP addresses used by VMs from the physical network infrastructure and
the infrastructure IP addresses used by the data center.  If a VM
changes location, the switches at the edge of the overlay simply
update their mapping tables to reflect the new location of the VM
within the data center's infrastructure space.  Because an overlay
network is used, a VM can now be located anywhere in the data center
that the overlay reaches without regard to traditional constraints
implied by L2 properties such as VLAN numbering, or the span of an L2
broadcast domain scoped to a single pod or access switch.
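The edge mapping update just described amounts to a single table
write (a sketch with invented names and addresses, not an interface
from this document): the VM's own addresses never change, only the
infrastructure address they map to.

```python
# Illustrative only: edge mapping from the VM's unchanged (VNID, MAC)
# to its current infrastructure (underlay) address.
mappings = {(1001, "00:11:22:33:44:55"): "192.0.2.10"}

def vm_moved(vnid, mac, new_underlay_ip):
    """On migration, update only the underlay location; the VM's MAC
    and IP addresses are preserved, so its connections survive."""
    mappings[(vnid, mac)] = new_underlay_ip

vm_moved(1001, "00:11:22:33:44:55", "192.0.2.99")
```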
Multi-tenancy is supported by isolating the traffic of one virtual
network instance from traffic of another. Traffic from one virtual
network instance cannot be delivered to another instance without
(conceptually) exiting the instance and entering the other instance
via an entity that has connectivity to both virtual network
instances. Without the existence of this entity, tenant traffic
remains isolated within each individual virtual network instance.
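The isolation rule above can be stated as a small predicate. This is a hedged illustration under assumed names (the `VirtualNetwork` class and the notion of a dual-attached "entity" are invented for the example):

```python
# Illustrative sketch of the isolation rule: delivery is allowed only
# within one virtual network instance, unless some entity (e.g. a
# gateway) is attached to both instances.
class VirtualNetwork:
    def __init__(self, vni):
        self.vni = vni


def can_deliver(src_vn, dst_vn, entity_attachments):
    """entity_attachments: set of VNIs a gateway entity is attached to."""
    if src_vn.vni == dst_vn.vni:
        return True
    # Cross-instance delivery requires an entity present in both.
    return {src_vn.vni, dst_vn.vni} <= entity_attachments


blue, red = VirtualNetwork(10), VirtualNetwork(20)
assert can_deliver(blue, blue, set())
assert not can_deliver(blue, red, set())     # isolated by default
assert can_deliver(blue, red, {10, 20})      # via a dual-attached gateway
```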
External communication (from a VM within a virtual network instance
to a machine outside of any virtual network instance, e.g. on the
Internet) is handled by having an ingress switch forward traffic to
an external router, where an egress switch decapsulates a tunneled
packet and delivers it to the router for normal processing. This
router is external to the overlay, and behaves much like existing
external facing routers in data centers today.
Overlays are designed to allow a set of VMs to be placed within a
single virtual network instance, whether that virtual network
provides a bridged network or a routed network.
3.3. Overlay Networking Work Areas

There are three specific and separate potential work areas needed to
realize an overlay solution. The areas correspond to different
possible "on-the-wire" protocols, where distinct entities interact
with each other.
One area of work concerns the address dissemination protocol an NVE
uses to build and maintain the mapping tables it uses to deliver
encapsulated frames to their proper destination. One approach is to
build mapping tables entirely via learning (as is done in 802.1
networks). But to provide better scaling properties, a more
sophisticated approach is needed, i.e., the use of a specialized
control plane protocol. While there are some advantages to using or
leveraging an existing protocol for maintaining mapping tables, the
fact that large numbers of NVEs will likely reside in hypervisors
places constraints on the resources (CPU and memory) that can be
dedicated to such functions. For example, routing protocols (e.g.,
IS-IS, BGP) may have scaling difficulties if implemented directly in
all NVEs, based on both flooding and convergence time concerns. An
alternative approach would be to use a standard query protocol
between NVEs and the set of network nodes that maintain address
mappings used across the data center for the entire overlay system.
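The query alternative can be sketched as follows. This is not a protocol definition; the `MappingNode` interface and the NVE cache behavior are assumptions made purely for illustration:

```python
# Illustrative sketch of the "query" alternative: each NVE keeps only a
# small cache and asks a mapping node on a miss, instead of running a
# full routing protocol itself.
class MappingNode:
    """Stands in for the nodes holding data-center-wide mappings."""

    def __init__(self, mappings):
        self._mappings = mappings   # (vni, vm_addr) -> nve_addr

    def query(self, vni, vm_addr):
        return self._mappings.get((vni, vm_addr))


class NVE:
    def __init__(self, mapping_node):
        self._cache = {}
        self._node = mapping_node

    def resolve(self, vni, vm_addr):
        key = (vni, vm_addr)
        if key not in self._cache:      # cache miss -> on-the-wire query
            self._cache[key] = self._node.query(vni, vm_addr)
        return self._cache[key]


node = MappingNode({(7, "vm-b"): "192.0.2.10"})
nve = NVE(node)
assert nve.resolve(7, "vm-b") == "192.0.2.10"
assert (7, "vm-b") in nve._cache        # later lookups stay local
```

The design trade-off the text describes falls out directly: the NVE's state is bounded by the mappings it actually uses, while the full table lives in a smaller number of dedicated nodes.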
From an architectural perspective, one can view the address mapping
dissemination problem as having two distinct and separable
components. The first component consists of a back-end "oracle" that
is responsible for distributing and maintaining the mapping
information for the entire overlay system. The second component
consists of the on-the-wire protocols an NVE uses when interacting
with the oracle.
The back-end oracle could provide high performance, high resiliency,
failover, etc. and could be implemented in significantly different
ways. For example, one model uses a traditional, centralized
"directory-based" database, using replicated instances for
reliability and failover. A second model involves using and possibly
extending an existing routing protocol (e.g., BGP, IS-IS, etc.). To
support different architectural models, it is useful to have one
standard protocol for the NVE-oracle interaction while allowing
different protocols and architectural approaches for the oracle
itself. Separating the two allows NVEs to transparently interact
with different types of oracles, i.e., either of the two
architectural models described above. Having separate protocols
could also allow for a simplified NVE that only interacts with the
oracle for the mapping table entries it needs and allows the oracle
(and its associated protocols) to evolve independently over time with
minimal impact to the NVEs.
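The separation argued for above can be modeled as one fixed NVE-facing interface with interchangeable back ends. Both back-end classes below are hypothetical stand-ins for the two models, not real implementations:

```python
# Sketch: one stable NVE-oracle interface (get_mapping), two
# interchangeable oracle back ends corresponding to the two models.
class DirectoryOracle:
    """Model 1: centralized, replicated directory database."""

    def __init__(self):
        self._db = {}

    def publish(self, key, value):
        self._db[key] = value

    def get_mapping(self, key):
        return self._db.get(key)


class RoutingOracle:
    """Model 2: mappings learned from a routing-protocol feed."""

    def __init__(self, learned_routes):
        self._routes = learned_routes

    def get_mapping(self, key):
        return self._routes.get(key)


def nve_lookup(oracle, key):
    # The NVE depends only on get_mapping(); the oracle's internals
    # (and its own protocols) can evolve independently.
    return oracle.get_mapping(key)


d = DirectoryOracle()
d.publish(("vni1", "vm-x"), "nve-3")
r = RoutingOracle({("vni1", "vm-x"): "nve-3"})
assert nve_lookup(d, ("vni1", "vm-x")) == nve_lookup(r, ("vni1", "vm-x"))
```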
A third work area considers the attachment and detachment of VMs (or
Tenant End Systems [I-D.lasserre-nvo3-framework] more generally) from
a specific virtual network instance. When a VM attaches, the Network
Virtualization Edge (NVE) [I-D.lasserre-nvo3-framework] associates
the VM with a specific overlay for the purposes of tunneling traffic
sourced from or destined to the VM. When a VM disconnects, it is
removed from the overlay and the NVE effectively terminates any
tunnels associated with the VM. To achieve this functionality, a
standardized interaction between the NVE and hypervisor may be
needed, for example in the case where the NVE resides on a separate
device from the VM.
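A minimal sketch of that attach/detach interaction follows. The method names and VM identifiers are illustrative, not from any specification:

```python
# Hypothetical sketch: the hypervisor notifies the NVE when a VM
# joins or leaves a virtual network instance.
class NVEEdge:
    def __init__(self):
        self.attached = {}            # vm_id -> vni

    def attach(self, vm_id, vni):
        # Associate the VM with an overlay so its traffic is tunneled.
        self.attached[vm_id] = vni

    def detach(self, vm_id):
        # Remove the VM; tunnels for it are effectively terminated.
        self.attached.pop(vm_id, None)


edge = NVEEdge()
edge.attach("vm-42", 5001)
assert edge.attached["vm-42"] == 5001
edge.detach("vm-42")
assert "vm-42" not in edge.attached
```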
In summary, there are three areas of potential work. The first area
concerns the oracle itself and any on-the-wire protocols it needs. A
second area concerns the interaction between the oracle and NVEs.
The third work area concerns protocols associated with attaching and
detaching a VM from a particular virtual network instance. All three
work areas are important to the development of a scalable,
interoperable solution.
4. Related Work

4.1. IEEE 802.1aq - Shortest Path Bridging
Shortest Path Bridging (SPB) is an IS-IS based overlay for L2
Ethernets. SPB supports multi-pathing and addresses a number of
shortcomings in the original Ethernet Spanning Tree Protocol. SPB-M
uses IEEE 802.1ah MAC-in-MAC encapsulation and supports a 24-bit
I-SID, which can be used to identify virtual network instances. SPB
4.2. ARMD

ARMD is chartered to look at data center scaling issues with a focus
on address resolution. ARMD is currently chartered to develop a
problem statement and is not currently developing solutions. While
an overlay-based approach may address some of the "pain points" that
have been raised in ARMD (e.g., better support for multi-tenancy), an
overlay approach may also push some of the L2 scaling concerns (e.g.,
excessive flooding) to the IP level (flooding via IP multicast).
Analysis will be needed to understand the scaling tradeoffs of an
overlay-based approach compared with existing approaches. On the
other hand, existing IP-based approaches such as proxy ARP may help
mitigate some concerns.
4.3. TRILL

TRILL is an L2-based approach aimed at improving deficiencies and
limitations with current Ethernet networks and STP in particular.
Although it differs from Shortest Path Bridging in many architectural
and implementation details, it is similar in that it provides an L2-
4.5. Proxy Mobile IP

Proxy Mobile IP [RFC5213] [RFC5844] makes use of the GRE Key Field
[RFC5845] [RFC6245], but not in a way that supports multi-tenancy.
4.6. LISP

LISP [I-D.ietf-lisp] essentially provides an IP over IP overlay where
the internal addresses are end station Identifiers and the outer IP
addresses represent the location of the end station within the core
IP network topology. The LISP overlay header uses a 24-bit Instance
ID to support overlapping inner IP addresses.
4.7. Individual Submissions

Many individual submissions also aim to address some or all of the
issues discussed in this draft. Examples of such drafts are
VXLAN [I-D.mahalingam-dutt-dcops-vxlan], NVGRE
[I-D.sridharan-virtualization-nvgre], and Virtual Machine Mobility in
L3 networks [I-D.wkumari-dcops-l3-vmmobility].
8. IANA Considerations

This memo includes no request to IANA.

9. Security Considerations

TBD

10. Informative References
[I-D.ietf-l2vpn-evpn]
Sajassi, A., Aggarwal, R., Henderickx, W., Balus, F.,
Isaac, A., and J. Uttaro, "BGP MPLS Based Ethernet VPN",
draft-ietf-l2vpn-evpn-01 (work in progress), July 2012.
[I-D.ietf-lisp]
Farinacci, D., Fuller, V., Meyer, D., and D. Lewis,
"Locator/ID Separation Protocol (LISP)",
draft-ietf-lisp-23 (work in progress), May 2012.
[I-D.ietf-trill-fine-labeling]
Eastlake, D., Zhang, M., Agarwal, P., Perlman, R., and D.
Dutt, "TRILL: Fine-Grained Labeling",
draft-ietf-trill-fine-labeling-01 (work in progress),
June 2012.
[I-D.kreeger-nvo3-overlay-cp]
Black, D., Dutt, D., Kreeger, L., Sridharan, M., and T.
Narten, "Network Virtualization Overlay Control Protocol
Requirements", draft-kreeger-nvo3-overlay-cp-00 (work in
progress), January 2012.
[I-D.lasserre-nvo3-framework]
Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y.
Rekhter, "Framework for DC Network Virtualization",
draft-lasserre-nvo3-framework-03 (work in progress),
July 2012.
[I-D.mahalingam-dutt-dcops-vxlan]
Sridhar, T., Bursell, M., Kreeger, L., Dutt, D., Wright,
C., Mahalingam, M., Duda, K., and P. Agarwal, "VXLAN: A
Framework for Overlaying Virtualized Layer 2 Networks over
Layer 3 Networks", draft-mahalingam-dutt-dcops-vxlan-01
(work in progress), February 2012.
[I-D.raggarwa-data-center-mobility]
Aggarwal, R., Rekhter, Y., Henderickx, W., Shekhar, R.,
and L. Fang, "Data Center Mobility based on BGP/MPLS, IP
Routing and NHRP", draft-raggarwa-data-center-mobility-03
(work in progress), June 2012.
[I-D.sridharan-virtualization-nvgre]
Sridharan, M., Greenberg, A., Venkataramaiah, N., Wang,
Y., Duda, K., Ganga, I., Lin, G., Pearson, M., Thaler, P.,
and C. Tumuluri, "NVGRE: Network Virtualization using
Generic Routing Encapsulation",
draft-sridharan-virtualization-nvgre-01 (work in
progress), July 2012.
[I-D.wkumari-dcops-l3-vmmobility]
Kumari, W. and J. Halpern, "Virtual Machine mobility in L3
Networks.", draft-wkumari-dcops-l3-vmmobility-00 (work in
progress), August 2011.

[RFC2661] Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn,
G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"",
RFC 2661, August 1999.
[RFC4023] Worster, T., Rekhter, Y., and E. Rosen, "Encapsulating
MPLS in IP or Generic Routing Encapsulation (GRE)",
RFC 4023, March 2005.
[RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
Networks (VPNs)", RFC 4364, February 2006.
[RFC5036] Andersson, L., Minei, I., and B. Thomas, "LDP
Specification", RFC 5036, October 2007.
[RFC5213] Gundavelli, S., Leung, K., Devarapalli, V., Chowdhury, K.,
and B. Patil, "Proxy Mobile IPv6", RFC 5213, August 2008.

[RFC5844] Wakikawa, R. and S. Gundavelli, "IPv4 Support for Proxy
Mobile IPv6", RFC 5844, May 2010.

[RFC5845] Muhanna, A., Khalil, M., Gundavelli, S., and K. Leung,
"Generic Routing Encapsulation (GRE) Key Option for Proxy
Mobile IPv6", RFC 5845, June 2010.

[RFC6245] Yegani, P., Leung, K., Lior, A., Chowdhury, K., and J.
Navali, "Generic Routing Encapsulation (GRE) Key Extension
for Mobile IPv4", RFC 6245, May 2011.

[RFC6325] Perlman, R., Eastlake, D., Dutt, D., Gai, S., and A.
Ghanwani, "Routing Bridges (RBridges): Base Protocol
Specification", RFC 6325, July 2011.
[SPB] "IEEE P802.1aq/D4.5 Draft Standard for Local and
Metropolitan Area Networks -- Media Access Control (MAC)
Bridges and Virtual Bridged Local Area Networks,
Amendment 8: Shortest Path Bridging", February 2012.
Appendix A. Change Log

A.1. Changes from -01

1. Removed Section 4.2 (Standardization Issues) and Section 5
(Control Plane) as those are more appropriately covered in and
overlap with material in [I-D.lasserre-nvo3-framework] and
[I-D.kreeger-nvo3-overlay-cp].

2. Expanded introduction and better explained terms such as tenant
3. Added Section 3.3 "Overlay Networking Work Areas" to better
articulate the three separable work components (or "on-the-wire
protocols") where work is needed.

4. Added section on Shortest Path Bridging in Related Work section.

5. Revised some of the terminology to be consistent with
[I-D.lasserre-nvo3-framework] and [I-D.kreeger-nvo3-overlay-cp].
A.2. Changes from -02
1. Numerous changes in response to discussions on the nvo3 mailing
list, with the majority of changes in Section 2 (Problem Details)
and Section 3 (Network Overlays). Best to see diffs for specific
text changes.
Authors' Addresses

Thomas Narten (editor)
IBM

Email: narten@us.ibm.com

Murari Sridharan
Microsoft