draft-narten-nvo3-overlay-problem-statement-03.txt | draft-narten-nvo3-overlay-problem-statement-04.txt | |||
---|---|---|---|---|
Internet Engineering Task Force T. Narten, Ed. | Internet Engineering Task Force T. Narten, Ed. | |||
Internet-Draft IBM | Internet-Draft IBM | |||
Intended status: Informational M. Sridharan | Intended status: Informational D. Black | |||
Expires: January 18, 2013 Microsoft | Expires: February 11, 2013 EMC | |||
D. Dutt | D. Dutt | |||
D. Black | L. Fang | |||
EMC | Cisco Systems | |||
E. Gray | ||||
Ericsson | ||||
L. Kreeger | L. Kreeger | |||
Cisco | Cisco | |||
July 17, 2012 | M. Napierala | |||
AT&T | ||||
M. Sridharan | ||||
Microsoft | ||||
August 10, 2012 | ||||
Problem Statement: Overlays for Network Virtualization | Problem Statement: Overlays for Network Virtualization | |||
draft-narten-nvo3-overlay-problem-statement-03 | draft-narten-nvo3-overlay-problem-statement-04 | |||
Abstract | Abstract | |||
This document describes issues associated with providing multi- | This document describes issues associated with providing multi- | |||
tenancy in large data center networks and an overlay-based network | tenancy in large data center networks that require an overlay-based | |||
virtualization approach to addressing them. A key multi-tenancy | network virtualization approach to addressing them. A key multi- | |||
requirement is traffic isolation, so that a tenant's traffic is not | tenancy requirement is traffic isolation, so that a tenant's traffic | |||
visible to any other tenant. This isolation can be achieved by | is not visible to any other tenant. This isolation can be achieved | |||
assigning one or more virtual networks to each tenant such that | by assigning one or more virtual networks to each tenant such that | |||
traffic within a virtual network is isolated from traffic in other | traffic within a virtual network is isolated from traffic in other | |||
virtual networks. The primary functionality required is provisioning | virtual networks. The primary functionality required is provisioning | |||
virtual networks, associating a virtual machine's virtual network | virtual networks, associating a virtual machine's virtual network | |||
interface(s) with the appropriate virtual network, and maintaining | interface(s) with the appropriate virtual network, and maintaining | |||
that association as the virtual machine is activated, migrated and/or | that association as the virtual machine is activated, migrated and/or | |||
deactivated. Use of an overlay-based approach enables scalable | deactivated. Use of an overlay-based approach enables scalable | |||
deployment on large network infrastructures. | deployment on large network infrastructures. | |||
Status of this Memo | Status of this Memo | |||
skipping to change at page 1, line 49 | skipping to change at page 2, line 10 | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on January 18, 2013. | This Internet-Draft will expire on February 11, 2013. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2012 IETF Trust and the persons identified as the | Copyright (c) 2012 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
2. Problem Details . . . . . . . . . . . . . . . . . . . . . . . 5 | 2. Problem Areas . . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
2.1. Dynamic Provisioning . . . . . . . . . . . . . . . . . . . 5 | 2.1. Need For Dynamic Provisioning . . . . . . . . . . . . . . 5 | |||
2.2. Virtual Machine Mobility Requirements . . . . . . . . . . 5 | 2.2. Virtual Machine Mobility Limitations . . . . . . . . . . . 6 | |||
2.3. Span of Virtual Networks . . . . . . . . . . . . . . . . . 6 | 2.3. Inadequate Forwarding Table Sizes in Switches . . . . . . 6 | |||
2.4. Inadequate Forwarding Table Sizes in Switches . . . . . . 6 | 2.4. Need to Decouple Logical and Physical Configuration . . . 7 | |||
2.5. Decoupling Logical and Physical Configuration . . . . . . 6 | 2.5. Need For Address Separation Between Tenants . . . . . . . 7 | |||
2.6. Separating Tenant Addressing from Infrastructure | 2.6. Need For Address Separation Between Tenant and | |||
Addressing . . . . . . . . . . . . . . . . . . . . . . . . 7 | Infrastructure . . . . . . . . . . . . . . . . . . . . . . 7 | |||
2.7. Communication Between Virtual and Traditional Networks . . 7 | 2.7. IEEE 802.1 VLAN Limitations . . . . . . . . . . . . . . . 8 | |||
2.8. Communication Between Virtual Networks . . . . . . . . . . 7 | 3. Network Overlays . . . . . . . . . . . . . . . . . . . . . . . 8 | |||
2.9. Overlay Design Characteristics . . . . . . . . . . . . . . 8 | 3.1. Benefits of Network Overlays . . . . . . . . . . . . . . . 9 | |||
3. Network Overlays . . . . . . . . . . . . . . . . . . . . . . . 9 | 3.2. Communication Between Virtual and Traditional Networks . . 10 | |||
3.1. Limitations of Existing Virtual Network Models . . . . . . 9 | 3.3. Communication Between Virtual Networks . . . . . . . . . . 11 | |||
3.2. Benefits of Network Overlays . . . . . . . . . . . . . . . 10 | 3.4. Overlay Design Characteristics . . . . . . . . . . . . . . 11 | |||
3.3. Overlay Networking Work Areas . . . . . . . . . . . . . . 11 | 3.5. Overlay Networking Work Areas . . . . . . . . . . . . . . 12 | |||
4. Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 13 | 4. Related IETF and IEEE Work . . . . . . . . . . . . . . . . . 14 | |||
4.1. IEEE 802.1aq - Shortest Path Bridging . . . . . . . . . . 13 | 4.1. L3 BGP/MPLS IP VPNs . . . . . . . . . . . . . . . . . . . 14 | |||
4.2. ARMD . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 | 4.2. L2 BGP/MPLS IP VPNs . . . . . . . . . . . . . . . . . . . 15 | |||
4.3. TRILL . . . . . . . . . . . . . . . . . . . . . . . . . . 13 | 4.3. IEEE 802.1aq - Shortest Path Bridging . . . . . . . . . . 15 | |||
4.4. L2VPNs . . . . . . . . . . . . . . . . . . . . . . . . . . 14 | 4.4. ARMD . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 | |||
4.5. Proxy Mobile IP . . . . . . . . . . . . . . . . . . . . . 14 | 4.5. TRILL . . . . . . . . . . . . . . . . . . . . . . . . . . 15 | |||
4.6. LISP . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 | 4.6. L2VPNs . . . . . . . . . . . . . . . . . . . . . . . . . . 16 | |||
4.7. Individual Submissions . . . . . . . . . . . . . . . . . . 14 | 4.7. Proxy Mobile IP . . . . . . . . . . . . . . . . . . . . . 16 | |||
5. Further Work . . . . . . . . . . . . . . . . . . . . . . . . . 15 | 4.8. LISP . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 | |||
6. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 | 5. Further Work . . . . . . . . . . . . . . . . . . . . . . . . . 16 | |||
7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 15 | 6. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 | |||
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 15 | 7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 17 | |||
9. Security Considerations . . . . . . . . . . . . . . . . . . . 15 | 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 17 | |||
10. Informative References . . . . . . . . . . . . . . . . . . . . 15 | 9. Security Considerations . . . . . . . . . . . . . . . . . . . 17 | |||
Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 17 | 10. Informative References . . . . . . . . . . . . . . . . . . . . 17 | |||
A.1. Changes from -01 . . . . . . . . . . . . . . . . . . . . . 17 | Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 19 | |||
A.2. Changes from -02 . . . . . . . . . . . . . . . . . . . . . 18 | A.1. Changes from -01 . . . . . . . . . . . . . . . . . . . . . 19 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 18 | A.2. Changes from -02 . . . . . . . . . . . . . . . . . . . . . 19 | |||
A.3. Changes from -03 . . . . . . . . . . . . . . . . . . . . . 20 | ||||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 20 | ||||
1. Introduction | 1. Introduction | |||
Server virtualization is increasingly becoming the norm in data | Data Centers are increasingly being consolidated and outsourced in an | |||
centers. With server virtualization, each physical server supports | effort, both to improve the deployment time of applications as well | |||
multiple virtual machines (VMs), each running its own operating | as reduce operational costs. This coincides with an increasing | |||
system, middleware and applications. Virtualization is a key enabler | demand for compute, storage, and network resources from applications. | |||
of workload agility, i.e., allowing any server to host any | In order to scale compute, storage, and network resources, physical | |||
application and providing the flexibility of adding, shrinking, or | resources are being abstracted from their logical representation, in | |||
moving services within the physical infrastructure. Server | what is referred to as server, storage, and network virtualization. | |||
virtualization provides numerous benefits, including higher | Virtualization can be implemented in various layers of computer | |||
utilization, increased security, reduced user downtime, reduced power | systems or networks | |||
usage, etc. | ||||
Large scale multi-tenant data centers are taking advantage of the | The demand for server virtualization is increasing in data centers. | |||
benefits of server virtualization to provide a new kind of hosting, a | With server virtualization, each physical server supports multiple | |||
virtual hosted data center. Multi-tenant data centers are ones where | virtual machines (VMs), each running its own operating system, | |||
middleware and applications. Virtualization is a key enabler of | ||||
workload agility, i.e., allowing any server to host any application | ||||
and providing the flexibility of adding, shrinking, or moving | ||||
services within the physical infrastructure. Server virtualization | ||||
provides numerous benefits, including higher utilization, increased | ||||
security, reduced user downtime, reduced power usage, etc. | ||||
Multi-tenant data centers are taking advantage of the benefits of | ||||
server virtualization to provide a new kind of hosting, a virtual | ||||
hosted data center. Multi-tenant data centers are ones where | ||||
individual tenants could belong to a different company (in the case | individual tenants could belong to a different company (in the case | |||
of a public provider) or a different department (in the case of an | of a public provider) or a different department (in the case of an | |||
internal company data center). Each tenant has the expectation of a | internal company data center). Each tenant has the expectation of a | |||
level of security and privacy separating their resources from those | level of security and privacy separating their resources from those | |||
of other tenants. For example, one tenant's traffic must never be | of other tenants. For example, one tenant's traffic must never be | |||
exposed to another tenant, except through carefully controlled | exposed to another tenant, except through carefully controlled | |||
interfaces, such as a security gateway. | interfaces, such as a security gateway. | |||
To a tenant, virtual data centers are similar to their physical | To a tenant, virtual data centers are similar to their physical | |||
counterparts, consisting of end stations attached to a network, | counterparts, consisting of end stations attached to a network, | |||
complete with services such as load balancers and firewalls. But | complete with services such as load balancers and firewalls. But | |||
unlike a physical data center, end stations connect to a virtual | unlike a physical data center, end stations connect to a virtual | |||
network. To end stations, a virtual network looks like a normal | network. To end stations, a virtual network looks like a normal | |||
network (e.g., providing an ethernet service), except that the only | network (e.g., providing an ethernet or L3 service), except that the | |||
end stations connected to the virtual network are those belonging to | only end stations connected to the virtual network are those | |||
the tenant. | belonging to a tenant's specific virtual network. | |||
A tenant is the administrative entity that is responsible for and | A tenant is the administrative entity that is responsible for and | |||
manages a specific virtual network instance and its associated | manages a specific virtual network instance and its associated | |||
services (whether virtual or physical). In a cloud environment, a | services (whether virtual or physical). In a cloud environment, a | |||
tenant would correspond to the customer that has defined and is using | tenant would correspond to the customer that has defined and is using | |||
a particular virtual network. However, a tenant may also find it | a particular virtual network. However, a tenant may also find it | |||
useful to create multiple different virtual network instances. | useful to create multiple different virtual network instances. | |||
Hence, there is a one-to-many mapping between tenants and virtual | Hence, there is a one-to-many mapping between tenants and virtual | |||
network instances. A single tenant may operate multiple individual | network instances. A single tenant may operate multiple individual | |||
virtual network instances, each associated with a different service. | virtual network instances, each associated with a different service. | |||
How a virtual network is implemented does not matter to the tenant. | How a virtual network is implemented does not generally matter to the | |||
It could be a pure routed network, a pure bridged network or a | tenant; what matters is that the service provided (L2 or L3) has the | |||
combination of bridged and routed networks. The key requirement is | right semantics, performance, etc. It could be implemented via a | |||
that each individual virtual network instance be isolated from other | pure routed network, a pure bridged network or a combination of | |||
virtual network instances. | bridged and routed networks. A key requirement is that each | |||
individual virtual network instance be isolated from other virtual | ||||
network instances. | ||||
For data center virtualization, two key issues must be addressed. | ||||
First, address space separation between tenants must be supported. | ||||
Second, it must be possible to place (and migrate) VMs anywhere in | ||||
the data center, without restricting VM addressing to match the | ||||
subnet boundaries of the underlying data center network. | ||||
This document outlines the problems encountered in scaling the number | This document outlines the problems encountered in scaling the number | |||
of isolated networks in a data center, as well as the problems of | of isolated networks in a data center, as well as the problems of | |||
managing the creation/deletion, membership and span of these networks | managing the creation/deletion, membership and span of these networks | |||
and makes the case that an overlay based approach, where individual | and makes the case that an overlay based approach, where individual | |||
networks are implemented within individual virtual networks that are | networks are implemented within individual virtual networks that are | |||
dynamically controlled by a standardized control plane provides a | dynamically controlled by a standardized control plane provides a | |||
number of advantages over current approaches. The purpose of this | number of advantages over current approaches. The purpose of this | |||
document is to identify the set of problems that any solution has to | document is to identify the set of problems that any solution has to | |||
address in building multi-tenant data centers. With this approach, | address in building multi-tenant data centers. With this approach, | |||
the goal is to allow the construction of standardized, interoperable | the goal is to allow the construction of standardized, interoperable | |||
implementations to allow the construction of multi-tenant data | implementations to allow the construction of multi-tenant data | |||
centers. | centers. | |||
Section 2 describes the problem space details. Section 3 describes | Section 2 describes the problem space details. Section 3 describes | |||
network overlays in more detail and the potential work areas. | overlay networks in more detail. Sections 4 and 5 review related and | |||
Sections 4 and 5 review related and further work, while Section 6 | further work, while Section 6 closes with a summary. | |||
closes with a summary. | ||||
2. Problem Details | 2. Problem Areas | |||
The following subsections describe aspects of multi-tenant networking | The following subsections describe aspects of multi-tenant data | |||
that pose problems for large scale network infrastructure. Different | center networking that pose problems for network infrastructure. | |||
problem aspects may arise based on the network architecture and | Different problem aspects may arise based on the network architecture | |||
scale. | and scale. | |||
2.1. Dynamic Provisioning | 2.1. Need For Dynamic Provisioning | |||
Cloud computing involves on-demand provisioning of resources for | Cloud computing involves on-demand provisioning of resources for | |||
multi-tenant environments. A common example of cloud computing is | multi-tenant environments. A common example of cloud computing is | |||
the public cloud, where a cloud service provider offers elastic | the public cloud, where a cloud service provider offers elastic | |||
services to multiple customers over the same infrastructure. The on- | services to multiple customers over the same infrastructure. In | |||
demand nature of provisioning in conjunction with trusted hypervisors | current systems, it can be difficult to provision resources for | |||
controlling network access by VMs can be achieved through resilient | individual tenants in such a way that provisioned properties migrate | |||
distributed network control mechanisms. | automatically when services are dynamically moved around within the | |||
data center to optimize workloads. | ||||
2.2. Virtual Machine Mobility Requirements | 2.2. Virtual Machine Mobility Limitations | |||
A key benefit of server virtualization is virtual machine (VM) | A key benefit of server virtualization is virtual machine (VM) | |||
mobility. A VM can be migrated from one server to another, live, | mobility. A VM can be migrated from one server to another, live, | |||
i.e., while continuing to run and without needing to shut it down and | i.e., while continuing to run and without needing to shut it down and | |||
restart it at the new location. A key requirement for live migration | restart it at the new location. A key requirement for live migration | |||
is that a VM retain critical network state at its new location, | is that a VM retain critical network state at its new location, | |||
including its IP and MAC address(es). Preservation of MAC addresses | including its IP and MAC address(es). Preservation of MAC addresses | |||
may be necessary, for example, when software licences are bound to | may be necessary, for example, when software licenses are bound to | |||
MAC addresses. More generally, any change in the VM's MAC addresses | MAC addresses. More generally, any change in the VM's MAC addresses | |||
resulting from a move would be visible to the VM and thus potentially | resulting from a move would be visible to the VM and thus potentially | |||
result in unexpected disruptions. Retaining IP addresses after a | result in unexpected disruptions. Retaining IP addresses after a | |||
move is necessary to prevent existing transport connections (e.g., | move is necessary to prevent existing transport connections (e.g., | |||
TCP) from breaking and needing to be restarted. | TCP) from breaking and needing to be restarted. | |||
In traditional data centers, servers are assigned IP addresses based | In traditional data centers, servers are assigned IP addresses based | |||
on their physical location, for example based on the Top of Rack | on their physical location, for example based on the Top of Rack | |||
(ToR) switch for the server rack or the VLAN configured to the | (ToR) switch for the server rack or the VLAN configured to the | |||
server. Servers can only move to other locations within the same IP | server. Servers can only move to other locations within the same IP | |||
subnet. This constraint is not problematic for physical servers, | subnet. This constraint is not problematic for physical servers, | |||
which move infrequently, but it restricts the placement and movement | which move infrequently, but it restricts the placement and movement | |||
of VMs within the data center. Any solution for a scalable multi- | of VMs within the data center. Any solution for a scalable multi- | |||
tenant data center must allow a VM to be placed (or moved) anywhere | tenant data center must allow a VM to be placed (or moved) anywhere | |||
within the data center, without being constrained by the subnet | within the data center, without being constrained by the subnet | |||
boundary concerns of the host servers. | boundary concerns of the host servers. | |||
2.3. Span of Virtual Networks | 2.3. Inadequate Forwarding Table Sizes in Switches | |||
Another use case is cross pod expansion. A pod typically consists of | ||||
one or more racks of servers with its associated network and storage | ||||
connectivity. Tenants may start off on a pod and, due to expansion, | ||||
require servers/VMs on other pods, especially the case when tenants | ||||
on the other pods are not fully utilizing all their resources. This | ||||
use case requires that virtual networks span multiple pods in order | ||||
to provide connectivity to all of the tenant's servers/VMs. | ||||
2.4. Inadequate Forwarding Table Sizes in Switches | ||||
Today's virtualized environments place additional demands on the | Today's virtualized environments place additional demands on the | |||
forwarding tables of switches. Instead of just one link-layer | forwarding tables of switches in the physical infrastructure. | |||
address per server, the switching infrastructure has to learn | Instead of just one link-layer address per server, the switching | |||
addresses of the individual VMs (which could range in the 100s per | infrastructure has to learn addresses of the individual VMs (which | |||
server). This is a requirement since traffic from/to the VMs to the | could range in the 100s per server). This is a requirement since | |||
rest of the physical network will traverse the physical network | traffic from/to the VMs to the rest of the physical network will | |||
infrastructure. This places a much larger demand on the switches' | traverse the physical network infrastructure. This places a much | |||
forwarding table capacity compared to non-virtualized environments, | larger demand on the switches' forwarding table capacity compared to | |||
causing more traffic to be flooded or dropped when the addresses in | non-virtualized environments, causing more traffic to be flooded or | |||
use exceeds the forwarding table capacity. | dropped when the number of addresses in use exceeds a switch's | |||
forwarding table capacity. | ||||
2.5. Decoupling Logical and Physical Configuration | 2.4. Need to Decouple Logical and Physical Configuration | |||
Data center operators must be able to achieve high utilization of | Data center operators must be able to achieve high utilization of | |||
server and network capacity. For efficient and flexible allocation, | server and network capacity. For efficient and flexible allocation, | |||
operators should be able to spread a virtual network instance across | operators should be able to spread a virtual network instance across | |||
servers in any rack in the data center. It should also be possible | servers in any rack in the data center. It should also be possible | |||
to migrate compute workloads to any server anywhere in the network | to migrate compute workloads to any server anywhere in the network | |||
while retaining the workload's addresses. This can be achieved today | while retaining the workload's addresses. In networks using VLANs, | |||
by stretching VLANs (e.g., by using TRILL or SPB). | moving servers elsewhere in the network may require expanding the | |||
scope of the VLAN beyond its original boundaries. While this can be | ||||
done, it requires potentially complex network configuration changes | ||||
and can conflict with the desire to bound the size of broadcast | ||||
domains, especially in larger data centers. | ||||
However, in order to limit the broadcast domain of each VLAN, multi- | However, in order to limit the broadcast domain of each VLAN, multi- | |||
destination frames within a VLAN should optimally flow only to those | destination frames within a VLAN should optimally flow only to those | |||
devices that have that VLAN configured. When workloads migrate, the | devices that have that VLAN configured. When workloads migrate, the | |||
physical network (e.g., access lists) may need to be reconfigured | physical network (e.g., access lists) may need to be reconfigured | |||
which is typically time consuming and error prone. | which is typically time consuming and error prone. | |||
2.6. Separating Tenant Addressing from Infrastructure Addressing | An important use case is cross-pod expansion. A pod typically | |||
consists of one or more racks of servers with its associated network | ||||
It is highly desirable to be able to number the data center underlay | and storage connectivity. A tenant's virtual network may start off | |||
network using whatever addresses make sense for it, without having to | on a pod and, due to expansion, require servers/VMs on other pods, | |||
worry about address collisions between addresses used by the underlay | especially the case when other pods are not fully utilizing all their | |||
and those used by tenants. | resources. This use case requires that virtual networks span | |||
multiple pods in order to provide connectivity to all of its tenant's | ||||
2.7. Communication Between Virtual and Traditional Networks | servers/VMs. Such expansion can be difficult to achieve when tenant | |||
addressing is tied to the addressing used by the underlay network or | ||||
Not all communication will be between devices connected to | when it requires that the scope of the underlying L2 VLAN expand | |||
virtualized networks. Devices using overlays will continue to access | beyond its original pod boundary. | |||
devices and make use of services on traditional, non-virtualized | ||||
networks, whether in the data center, the public Internet, or at | ||||
remote/branch campuses. Any virtual network solution must be capable | ||||
of interoperating with existing routers, VPN services, load | ||||
balancers, intrusion detection services, firewalls, etc. on external | ||||
networks. | ||||
Communication between devices attached to a virtual network and | ||||
devices connected to non-virtualized networks is handled | ||||
architecturally by having specialized gateway devices that receive | ||||
packets from a virtualized network, decapsulate them, process them as | ||||
regular (i.e., non-virtualized) traffic, and finally forward them on | ||||
to their appropriate destination (and vice versa). Additional | ||||
identification, such as VLAN tags, could be used on the non- | ||||
virtualized side of such a gateway to enable forwarding of traffic | ||||
for multiple virtual networks over a common non-virtualized link. | ||||
A wide range of implementation approaches are possible. Overlay | ||||
gateway functionality could be combined with other network | ||||
functionality into a network device that implements the overlay | ||||
functionality, and then forwards traffic between other internal | ||||
components that implement functionality such as full router service, | ||||
load balancing, firewall support, VPN gateway, etc. | ||||
2.8. Communication Between Virtual Networks | ||||
Communication between devices on different virtual networks is | ||||
handled architecturally by adding specialized interconnect | ||||
functionality among the otherwise isolated virtual networks. For a | ||||
virtual network providing an Ethernet service, such interconnect | ||||
functionality could be IP forwarding configured as part of the | ||||
"default gateway" for each virtual network. For a virtual network | ||||
providing IP service, the interconnect functionality could be IP | ||||
forwarding configured as part of the IP addressing structure of each | ||||
virtual network. In both cases, the implementation of the | ||||
interconnect functionality could be distributed across the NVEs, and | ||||
could be combined with other network functionality (e.g., load | ||||
balancing, firewall support) that is applied to traffic that is | ||||
forwarded between virtual networks. | ||||
2.9. Overlay Design Characteristics | ||||
There are existing layer 2 overlay protocols in existence, but they | ||||
were not necessarily designed to solve the problem in the environment | ||||
of a highly virtualized data center. Below are some of the | ||||
characteristics of environments that must be taken into account by | ||||
the overlay technology: | ||||
1. Highly distributed systems. The overlay should work in an | ||||
environment where there could be many thousands of access | ||||
switches (e.g. residing within the hypervisors) and many more end | ||||
systems (e.g., VMs) connected to them. This calls for a | ||||
distributed mapping system that places low overhead on the | ||||
overlay tunnel endpoints. | ||||
2. Many highly distributed virtual networks with sparse membership. | ||||
Each virtual network could be highly dispersed inside the data | ||||
center. Also, along with expectation of many virtual networks, | ||||
the number of end systems connected to any one virtual network is | ||||
expected to be relatively low; therefore, the percentage of | ||||
access switches participating in any given virtual network would | ||||
also be expected to be low. For this reason, efficient pruning | ||||
of multi-destination traffic should be taken into consideration. | ||||
3. Highly dynamic end systems. End systems connected to virtual | ||||
networks can be very dynamic, both in terms of creation/deletion/ | ||||
power-on/off and in terms of mobility across the access switches. | ||||
4. Work with existing, widely deployed network Ethernet switches and | ||||
IP routers without requiring wholesale replacement. The first | ||||
hop switch that adds and removes the overlay header will require | ||||
new equipment and/or new software. | ||||
5. Network infrastructure administered by a single administrative | ||||
domain. This is consistent with operation within a data center, | ||||
and not across the Internet. | ||||
3. Network Overlays | 2.5. Need For Address Separation Between Tenants | |||
Virtual Networks are used to isolate a tenant's traffic from that of | Individual tenants need control over the addresses they use within a | |||
other tenants (or even traffic within the same tenant that requires | virtual network. But it can be problematic when different tenants | |||
isolation). There are two main characteristics of virtual networks: | want to use the same addresses, or even if the same tenant wants to | |||
reuse the same addresses in different virtual networks. | ||||
Consequently, virtual networks must allow tenants to use whatever | ||||
addresses they want without concern for what addresses are being used | ||||
by other tenants or other virtual networks. | ||||
1. Providing network address space that is isolated from other | 2.6. Need For Address Separation Between Tenant and Infrastructure | |||
virtual networks. The same network addresses may be used in | ||||
different virtual networks on the same underlying network | ||||
infrastructure. | ||||
2. Limiting the scope of frames sent on the virtual network. Frames | As in the previous case, a tenant needs to be able to use whatever | |||
sent by end systems attached to a virtual network are delivered | addresses it wants in a virtual network independent of what addresses | |||
as expected to other end systems on that virtual network and may | the underlying data center network is using. Tenants (and the | |||
exit a virtual network only through controlled exit points such | underlay infrastructure provider) should be able to use whatever | |||
as a security gateway. Likewise, frames sourced outside of the | addresses make sense for them, without having to worry about address | |||
virtual network may enter the virtual network only through | collisions between addresses used by tenants and those used by the | |||
controlled entry points, such as a security gateway. | underlay data center network. | |||
3.1. Limitations of Existing Virtual Network Models | 2.7. IEEE 802.1 VLAN Limitations | |||
Virtual networks are not new to networking. For example, VLANs are a | VLANs are a well-known construct in the networking industry, | |||
well-known construct in the networking industry. A VLAN is an L2 | providing an L2 service via an L2 underlay. A VLAN is an L2 bridging | |||
bridging construct that provides some of the semantics of virtual | construct that provides some of the semantics of virtual networks | |||
networks mentioned above: a MAC address is unique within a VLAN, but | mentioned above: a MAC address is unique within a VLAN, but not | |||
not necessarily across VLANs. Traffic sourced within a VLAN | necessarily across VLANs. Traffic sourced within a VLAN (including | |||
(including broadcast and multicast traffic) remains within the VLAN | broadcast and multicast traffic) remains within the VLAN it | |||
it originates from. Traffic forwarded from one VLAN to another | originates from. Traffic forwarded from one VLAN to another | |||
typically involves router (L3) processing. The forwarding table | typically involves router (L3) processing. The forwarding table | |||
lookup operation is keyed on {VLAN, MAC address} tuples. | lookup operation is keyed on {VLAN, MAC address} tuples. | |||
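The {VLAN, MAC address} keying can be illustrated with a minimal Python sketch. The table contents, port names, and the `forward` helper are hypothetical; the sketch only shows that the same MAC address can appear in two VLANs without colliding, and that a lookup never matches an entry belonging to a different VLAN.

```python
from typing import Dict, Optional, Tuple

# Hypothetical bridge forwarding table keyed on (VLAN ID, MAC address).
fdb: Dict[Tuple[int, str], str] = {
    (10, "00:00:5e:00:53:01"): "port1",
    # The same MAC address in a different VLAN is a separate, independent entry.
    (20, "00:00:5e:00:53:01"): "port7",
}

def forward(vlan: int, dst_mac: str) -> Optional[str]:
    """Return the egress port, or None (unknown: flood within this VLAN only)."""
    return fdb.get((vlan, dst_mac.lower()))

print(forward(10, "00:00:5E:00:53:01"))  # port1
print(forward(20, "00:00:5e:00:53:01"))  # port7
print(forward(30, "00:00:5e:00:53:01"))  # None
```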
But there are problems and limitations with L2 VLANs. VLANs are a | But there are problems and limitations with L2 VLANs. VLANs are a | |||
pure L2 bridging construct and VLAN identifiers are carried along | pure L2 bridging construct and VLAN identifiers are carried along | |||
with data frames to allow each forwarding point to know what VLAN the | with data frames to allow each forwarding point to know what VLAN the | |||
frame belongs to. A VLAN today is defined as a 12-bit number, | frame belongs to. A VLAN today is defined as a 12-bit number, | |||
limiting the total number of VLANs to 4096 (though typically, this | limiting the total number of VLANs to 4096 (though typically, this | |||
number is 4094 since 0 and 4095 are reserved). Due to the large | number is 4094 since 0 and 4095 are reserved). Due to the large | |||
number of tenants that a cloud provider might service, the 4094 VLAN | number of tenants that a cloud provider might service, the 4094 VLAN | |||
limit is often inadequate. In addition, there is often a need for | limit is often inadequate. In addition, there is often a need for | |||
multiple VLANs per tenant, which exacerbates the issue. The use of a | multiple VLANs per tenant, which exacerbates the issue. The use of a | |||
sufficiently large VNID, present in the overlay control plane and | sufficiently large VNID, present in the overlay control plane and | |||
possibly also in the data plane, would eliminate current VLAN size | possibly also in the data plane, would eliminate current VLAN size | |||
limitations associated with single 12-bit VLAN tags. | limitations associated with single 12-bit VLAN tags. | |||
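The arithmetic behind the VLAN limit, and why multiple VLANs per tenant make it worse, can be checked with a few lines of Python. The text does not fix a VNID width; the 24-bit width used here is an assumption (it matches the SPB-M I-SID size mentioned in the related-work section), and the figure of 10 networks per tenant is purely illustrative.

```python
def id_space(bits: int, reserved: int = 0) -> int:
    """Usable identifiers in a `bits`-wide field after excluding reserved values."""
    return 2 ** bits - reserved

def max_tenants(ids: int, networks_per_tenant: int) -> int:
    """Tenants that fit if each one needs `networks_per_tenant` separate IDs."""
    return ids // networks_per_tenant

vlan_ids = id_space(12, reserved=2)  # VLAN IDs 0 and 4095 are reserved
vnid_ids = id_space(24)              # assumed 24-bit VNID

print(vlan_ids, max_tenants(vlan_ids, 10))  # 4094 409
print(vnid_ids, max_tenants(vnid_ids, 10))  # 16777216 1677721
```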
For IP/MPLS networks, Ethernet Virtual Private Network (E-VPN) | 3. Network Overlays | |||
[I-D.ietf-l2vpn-evpn] provides an emulated Ethernet service in which | ||||
each tenant has its own Ethernet network over a common IP or MPLS | ||||
infrastructure and a BGP/MPLS control plane is used to distribute the | ||||
tenant MAC addresses and the MPLS labels that identify the tenants | ||||
and tenant MAC addresses. Within the BGP/MPLS control plane a thirty | ||||
two bit Ethernet Tag is used to identify the broadcast domains | ||||
(VLANs) associated with a given L2 VLAN service instance and these | ||||
Ethernet tags are mapped to VLAN IDs understood by the tenant at the | ||||
service edges. This means that the limit of 4096 VLANs is associated | ||||
with an individual tenant service edge, enabling a much higher level | ||||
of scalability. Interconnectivity between tenants is also allowed in | ||||
a controlled fashion. | ||||
IP/MPLS networks also provide an IP VPN service (L3 VPN) [RFC4364] in | Virtual Networks are used to isolate a tenant's traffic from that of | |||
which each tenant has its own IP network over a common IP or MPLS | other tenants (or even traffic within the same tenant that requires | |||
infrastructure and a BGP/MPLS control plane is used to distribute the | isolation). There are two main characteristics of virtual networks: | |||
tenant IP routes and the MPLS labels that identify the tenants and | ||||
tenant IP routes. As with E-VPNs, interconnectivity between tenants | ||||
is also allowed in a controlled fashion. | ||||
VM Mobility [I-D.raggarwa-data-center-mobility] introduces the | 1. Virtual networks isolate the address space used in one virtual | |||
concept of a combined L2/L3 VPN service in order to support the | network from the address space used by another virtual network. | |||
mobility of individual Virtual Machines (VMs) between Data Centers | The same network addresses may be used in different virtual | |||
connected over a common IP or MPLS infrastructure. | networks at the same time. In addition, the address space used | |||
by a virtual network is independent from that used by the | ||||
underlying physical network. | ||||
There are a number of VPN approaches that provide some, if not all, of | 2. Virtual networks limit the scope of packets sent on the virtual | |||
the desired semantics of virtual networks. A gap analysis will be | network. Packets sent by end systems attached to a virtual | |||
needed to assess how well existing approaches satisfy the | network are delivered as expected to other end systems on that | |||
requirements. | virtual network and may exit a virtual network only through | |||
controlled exit points such as a security gateway. Likewise, | ||||
packets sourced from outside of the virtual network may enter the | ||||
virtual network only through controlled entry points, such as a | ||||
security gateway. | ||||
3.2. Benefits of Network Overlays | 3.1. Benefits of Network Overlays | |||
To address the problems described earlier, a network overlay model | To address the problems described in Section 2, a network overlay | |||
can be used. | model can be used. | |||
The idea behind an overlay is quite straightforward. Each virtual | The idea behind an overlay is quite straightforward. Each virtual | |||
network instance is implemented as an overlay. The original frame is | network instance is implemented as an overlay. The original packet | |||
encapsulated by the first hop network device. The encapsulation | is encapsulated by the first-hop network device. The encapsulation | |||
identifies the device that will perform the | identifies the device that will perform the | |||
decapsulation before delivering the frame to the endpoint. The rest | decapsulation before delivering the original packet to the endpoint. | |||
of the network forwards the frame based on the encapsulation header | The rest of the network forwards the packet based on the | |||
and can be oblivious to the payload that is carried inside. To avoid | encapsulation header and can be oblivious to the payload that is | |||
belaboring the point each time, the first hop network device can be a | carried inside. | |||
traditional switch or router or the virtual switch residing inside a | ||||
hypervisor. Furthermore, the endpoint can be a VM or it can be a | ||||
physical server. Examples of architectures based on network overlays | ||||
include BGP/MPLS VPNs [RFC4364], TRILL [RFC6325], LISP | ||||
[I-D.ietf-lisp], and Shortest Path Bridging [SPB]. | ||||
With the overlay, a virtual network identifier (or VNID) can be | Overlays are based on what is commonly known as a "map-and-encap" | |||
carried as part of the overlay header so that every data frame | architecture. There are three distinct and logically separable | |||
explicitly identifies the specific virtual network the frame belongs | steps: | |||
to. Since both routed and bridged semantics can be supported by a | ||||
virtual data center, the original frame carried within the overlay | 1. The first-hop overlay device implements a mapping operation that | |||
header can be an Ethernet frame complete with MAC addresses or just | determines where the encapsulated packet should be sent to reach | |||
the IP packet. | its intended destination VM. Specifically, the mapping function | |||
maps the destination address (either L2 or L3) of a packet | ||||
received from a VM into the corresponding destination address of | ||||
the egress device. The destination address will be the underlay | ||||
address of the device doing the decapsulation and is an IP | ||||
address. | ||||
2. Once the mapping has been determined, the ingress overlay device | ||||
encapsulates the received packet within an overlay header. | ||||
3. The final step is to actually forward the (now encapsulated) | ||||
packet to its destination. The packet is forwarded by the | ||||
underlay (i.e., the IP network) based entirely on its outer | ||||
address. Upon receipt at the destination, the egress overlay | ||||
device decapsulates the original packet and delivers it to the | ||||
intended recipient VM. | ||||
Each of the above steps is logically distinct, though an | ||||
implementation might combine them for efficiency or other reasons. | ||||
It should be noted that in L3 BGP/VPN terminology, the above steps | ||||
are commonly known as "forwarding" or "virtual forwarding". | ||||
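The three map-and-encap steps can be sketched in Python as follows. This is a minimal model under stated assumptions, not a protocol definition: the class and field names (`NVE`, `InnerPacket`, `OverlayPacket`, `mapping`, `local_vms`) are hypothetical, the mapping table is populated by hand, and the addresses come from documentation ranges. The point is that each step touches different state, so the steps are logically separable.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class InnerPacket:
    dst: str        # tenant (virtual) L2 or L3 destination address
    payload: bytes

@dataclass(frozen=True)
class OverlayPacket:
    outer_dst: str  # underlay IP address of the egress NVE
    vnid: int       # identifies the virtual network
    inner: InnerPacket

@dataclass
class NVE:
    underlay_ip: str
    # Step 1 state: (VNID, tenant destination) -> underlay IP of the egress NVE.
    mapping: Dict[Tuple[int, str], str] = field(default_factory=dict)
    # Locally attached VMs: (VNID, tenant address) -> payloads delivered so far.
    local_vms: Dict[Tuple[int, str], List[bytes]] = field(default_factory=dict)

    def map_and_encap(self, vnid: int, pkt: InnerPacket) -> OverlayPacket:
        egress_ip = self.mapping[(vnid, pkt.dst)]   # step 1: mapping
        return OverlayPacket(egress_ip, vnid, pkt)  # step 2: encapsulation

    def decap_and_deliver(self, opkt: OverlayPacket) -> None:
        # Egress: strip the overlay header and deliver within the same VN only.
        self.local_vms[(opkt.vnid, opkt.inner.dst)].append(opkt.inner.payload)

def underlay_forward(opkt: OverlayPacket, nves: Dict[str, NVE]) -> None:
    # Step 3: the underlay uses only the outer address; the inner packet is opaque.
    nves[opkt.outer_dst].decap_and_deliver(opkt)

a = NVE("192.0.2.1")
b = NVE("192.0.2.2", local_vms={(5001, "10.0.0.2"): []})
a.mapping[(5001, "10.0.0.2")] = b.underlay_ip
nves = {n.underlay_ip: n for n in (a, b)}
underlay_forward(a.map_and_encap(5001, InnerPacket("10.0.0.2", b"hi")), nves)
print(b.local_vms[(5001, "10.0.0.2")])  # [b'hi']
```

A real implementation might fuse the steps for efficiency, as the text notes; keeping them as separate methods here just makes the boundaries visible.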
The first hop network device can be a traditional switch or router or | ||||
the virtual switch residing inside a hypervisor. Furthermore, the | ||||
endpoint can be a VM or it can be a physical server. Examples of | ||||
architectures based on network overlays include BGP/MPLS VPNs | ||||
[RFC4364], TRILL [RFC6325], LISP [I-D.ietf-lisp], and Shortest Path | ||||
Bridging (SPB-M) [SPBM]. | ||||
In the data plane, a virtual network identifier (or VNID), or a | ||||
locally significant identifier, can be carried as part of the overlay | ||||
header so that every data packet explicitly identifies the specific | ||||
virtual network the packet belongs to. Since both routed and bridged | ||||
semantics can be supported by a virtual data center, the original | ||||
packet carried within the overlay header can be an Ethernet frame | ||||
complete with MAC addresses or just the IP packet. | ||||
The use of a sufficiently large VNID would address current VLAN | The use of a sufficiently large VNID would address current VLAN | |||
limitations associated with single 12-bit VLAN tags. This VNID can | limitations associated with single 12-bit VLAN tags. This VNID can | |||
be carried in the control plane. In the data plane, an overlay | be carried in the control plane. In the data plane, an overlay | |||
header provides a place to carry either the VNID, or a locally- | header provides a place to carry either the VNID, or an identifier | |||
significant identifier. In both cases, the identifier in the overlay | that is locally-significant to the edge device. In both cases, the | |||
header specifies which virtual network the data packet belongs to. | identifier in the overlay header specifies which virtual network the | |||
data packet belongs to. | ||||
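To make "a place to carry the identifier" concrete, the following Python sketch packs and unpacks a hypothetical 8-byte overlay header with a 24-bit identifier field. The layout (a flags byte with an "identifier present" bit 0x08, three reserved bytes, a 24-bit identifier, one reserved byte) is an assumption chosen for illustration; it resembles the VXLAN header later published in RFC 7348, but the text here does not mandate any particular format.

```python
import struct

VNID_BITS = 24
FLAG_VNID_VALID = 0x08  # hypothetical "identifier present" flag

def pack_header(vnid: int) -> bytes:
    """Build the hypothetical 8-byte overlay header carrying `vnid`."""
    if not 0 <= vnid < 2 ** VNID_BITS:
        raise ValueError("VNID does not fit in 24 bits")
    # "!B3xI": flags byte, 3 reserved bytes, then a 32-bit word holding VNID << 8
    # (the low 8 bits of that word are the final reserved byte).
    return struct.pack("!B3xI", FLAG_VNID_VALID, vnid << 8)

def unpack_header(hdr: bytes) -> int:
    """Return the identifier from the first 8 bytes of an encapsulated packet."""
    flags, word = struct.unpack("!B3xI", hdr[:8])
    if not flags & FLAG_VNID_VALID:
        raise ValueError("no VNID present")
    return word >> 8

print(pack_header(5001).hex())             # 0800000000138900
print(unpack_header(pack_header(5001)))    # 5001
```

Whether the field holds a globally unique VNID or an identifier that is only locally significant to the edge device is a matter of how it is assigned, not of the encoding; the same field works for both.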
A key aspect of overlays is the decoupling of the "virtual" MAC and | A key aspect of overlays is the decoupling of the "virtual" MAC | |||
IP addresses used by VMs from the physical network infrastructure and | and/or IP addresses used by VMs from the physical network | |||
the infrastructure IP addresses used by the data center. If a VM | infrastructure and the infrastructure IP addresses used by the data | |||
changes location, the switches at the edge of the overlay simply | center. If a VM changes location, the overlay edge devices simply | |||
update their mapping tables to reflect the new location of the VM | update their mapping tables to reflect the new location of the VM | |||
within the data center's infrastructure space. Because an overlay | within the data center's infrastructure space. Because an overlay | |||
network is used, a VM can now be located anywhere in the data center | network is used, a VM can now be located anywhere in the data center | |||
that the overlay reaches without regard to traditional constraints | that the overlay reaches without regard to traditional constraints | |||
implied by L2 properties such as VLAN numbering, or the span of an L2 | implied by L2 properties such as VLAN numbering, or the span of an L2 | |||
broadcast domain scoped to a single pod or access switch. | broadcast domain scoped to a single pod or access switch. | |||
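The decoupling described above means that a VM move is, from the overlay's point of view, a single mapping-table update: the VM keeps its virtual address, and only the underlay location changes. The minimal Python sketch below illustrates this; the table layout and the `move_vm` helper are hypothetical, and the new location is deliberately in a different underlay subnet to show that no shared L2 domain is assumed.

```python
from typing import Dict, Tuple

# Hypothetical mapping table held by an overlay edge device:
# (VNID, VM's virtual MAC or IP address) -> underlay IP of the NVE serving it.
Mapping = Dict[Tuple[int, str], str]

def move_vm(table: Mapping, vnid: int, vm_addr: str, new_nve_ip: str) -> str:
    """Record a VM move and return its previous location; vm_addr is unchanged."""
    key = (vnid, vm_addr)
    previous = table[key]
    table[key] = new_nve_ip
    return previous

table: Mapping = {(5001, "10.0.0.2"): "192.0.2.2"}
previous = move_vm(table, 5001, "10.0.0.2", "198.51.100.7")
print(previous, table[(5001, "10.0.0.2")])  # 192.0.2.2 198.51.100.7
```

How every edge device that holds this entry learns about the update is exactly the address dissemination question raised in the work areas discussion.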
Multi-tenancy is supported by isolating the traffic of one virtual | Multi-tenancy is supported by isolating the traffic of one virtual | |||
network instance from traffic of another. Traffic from one virtual | network instance from traffic of another. Traffic from one virtual | |||
network instance cannot be delivered to another instance without | network instance cannot be delivered to another instance without | |||
(conceptually) exiting the instance and entering the other instance | (conceptually) exiting the instance and entering the other instance | |||
via an entity that has connectivity to both virtual network | via an entity that has connectivity to both virtual network | |||
instances. Without the existence of this entity, tenant traffic | instances. Without the existence of this entity, tenant traffic | |||
remains isolated within each individual virtual network instance. | remains isolated within each individual virtual network instance. | |||
Overlays are designed to allow a set of VMs to be placed within a | Overlays are designed to allow a set of VMs to be placed within a | |||
single virtual network instance, whether that virtual network | single virtual network instance, whether that virtual network | |||
provides a bridged network or a routed network. | provides a bridged network or a routed network. | |||
3.3. Overlay Networking Work Areas | 3.2. Communication Between Virtual and Traditional Networks | |||
Not all communication will be between devices connected to | ||||
virtualized networks. Devices using overlays will continue to access | ||||
devices and make use of services on traditional, non-virtualized | ||||
networks, whether in the data center, the public Internet, or at | ||||
remote/branch campuses. Any virtual network solution must be capable | ||||
of interoperating with existing routers, VPN services, load | ||||
balancers, intrusion detection services, firewalls, etc., on external | ||||
networks. | ||||
Communication between devices attached to a virtual network and | ||||
devices connected to non-virtualized networks is handled | ||||
architecturally by having specialized gateway devices that receive | ||||
packets from a virtualized network, decapsulate them, process them as | ||||
regular (i.e., non-virtualized) traffic, and finally forward them on | ||||
to their appropriate destination (and vice versa). Additional | ||||
identification, such as VLAN tags, could be used on the non- | ||||
virtualized side of such a gateway to enable forwarding of traffic | ||||
for multiple virtual networks over a common non-virtualized link. | ||||
A wide range of implementation approaches are possible. Overlay | ||||
gateway functionality could be combined with other network | ||||
functionality into a network device that implements the overlay | ||||
functionality, and then forwards traffic between other internal | ||||
components that implement functionality such as full router service, | ||||
load balancing, firewall support, VPN gateway, etc. | ||||
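One simple way a gateway could carry several virtual networks over a single non-virtualized link is a one-to-one table between VNIDs and VLAN tags, as in the Python sketch below. The `OverlayGateway` class and its methods are hypothetical; the sketch assumes the inner packet has already been decapsulated on the overlay side and only shows the identifier translation, not the additional processing (routing, firewalling, and so on) a real gateway would apply.

```python
from typing import Dict, Tuple

class OverlayGateway:
    """Hypothetical gateway: VNIDs on the overlay side, VLAN tags on a shared
    non-virtualized link."""

    def __init__(self, vnid_to_vlan: Dict[int, int]) -> None:
        vlans = list(vnid_to_vlan.values())
        if len(set(vlans)) != len(vlans):
            raise ValueError("each virtual network needs its own VLAN tag")
        if any(not 1 <= v <= 4094 for v in vlans):
            raise ValueError("VLAN IDs must be in 1..4094")
        self.vnid_to_vlan = dict(vnid_to_vlan)
        self.vlan_to_vnid = {v: n for n, v in vnid_to_vlan.items()}

    def to_legacy(self, vnid: int, inner: bytes) -> Tuple[int, bytes]:
        # `inner` is the already-decapsulated packet; tag it for the shared link.
        return self.vnid_to_vlan[vnid], inner

    def to_overlay(self, vlan: int, frame: bytes) -> Tuple[int, bytes]:
        # Tagged traffic from the legacy side re-enters the matching virtual
        # network; the caller would then map and encapsulate it.
        return self.vlan_to_vnid[vlan], frame

gw = OverlayGateway({5001: 100, 5002: 200})
print(gw.to_legacy(5002, b"pkt"))    # (200, b'pkt')
print(gw.to_overlay(100, b"frame"))  # (5001, b'frame')
```

Note that the non-virtualized side reintroduces the 12-bit VLAN limit, but only per shared link rather than across the whole data center.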
3.3. Communication Between Virtual Networks | ||||
Communication between devices on different virtual networks is | ||||
handled architecturally by adding specialized interconnect | ||||
functionality among the otherwise isolated virtual networks. For a | ||||
virtual network providing an L2 service, such interconnect | ||||
functionality could be IP forwarding configured as part of the | ||||
"default gateway" for each virtual network. For a virtual network | ||||
providing L3 service, the interconnect functionality could be IP | ||||
forwarding configured as part of routing between IP subnets or it can | ||||
be based on configured inter-virtual network traffic policies. In | ||||
both cases, the implementation of the interconnect functionality | ||||
could be distributed across the NVEs, and could be combined with | ||||
other network functionality (e.g., load balancing, firewall support) | ||||
that is applied to traffic that is forwarded between virtual | ||||
networks. | ||||
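The policy-based variant of interconnect functionality can be sketched as a default-deny check on (source VN, destination VN) pairs. The Python below is an illustrative assumption, not a description of any particular NVE: the `Interconnect` class and its directional policy set are hypothetical, and a real implementation would also perform routing and any other services (load balancing, firewalling) the text mentions.

```python
from typing import FrozenSet, Optional, Tuple

class Interconnect:
    """Hypothetical inter-virtual-network forwarder driven by configured policy."""

    def __init__(self, allowed: FrozenSet[Tuple[int, int]]) -> None:
        self.allowed = allowed  # directional (source VNID, destination VNID) pairs

    def forward(self, src_vnid: int, dst_vnid: int,
                pkt: bytes) -> Optional[Tuple[int, bytes]]:
        if src_vnid == dst_vnid:
            return dst_vnid, pkt  # intra-VN traffic needs no interconnect
        if (src_vnid, dst_vnid) in self.allowed:
            return dst_vnid, pkt  # re-enters the overlay in the destination VN
        return None               # default: tenant traffic stays isolated

ic = Interconnect(frozenset({(5001, 5002)}))
print(ic.forward(5001, 5002, b"a"))  # (5002, b'a')
print(ic.forward(5002, 5001, b"b"))  # None
```

Making the policy directional is a design choice for the sketch; it shows how isolation remains the default and inter-VN reachability is granted explicitly.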
3.4. Overlay Design Characteristics | ||||
Layer 2 and layer 3 overlay protocols already exist, but they do | ||||
not necessarily solve all of today's problems in the environment | ||||
of a highly virtualized data center. Below are | ||||
some of the characteristics of environments that must be taken into | ||||
account by the overlay technology: | ||||
1. Highly distributed systems. The overlay should work in an | ||||
environment where there could be many thousands of access devices | ||||
(e.g. residing within the hypervisors) and many more end systems | ||||
(e.g., VMs) connected to them. This calls for a distributed | ||||
mapping system that places low overhead on the overlay tunnel | ||||
endpoints. | ||||
2. Many highly distributed virtual networks with sparse membership. | ||||
Each virtual network could be highly dispersed inside the data | ||||
center. Also, along with expectation of many virtual networks, | ||||
the number of end systems connected to any one virtual network is | ||||
expected to be relatively low; therefore, the percentage of | ||||
access devices participating in any given virtual network would | ||||
also be expected to be low. For this reason, efficient delivery | ||||
of multi-destination traffic within a virtual network instance | ||||
should be taken into consideration. | ||||
3. Highly dynamic end systems. End systems connected to virtual | ||||
networks can be very dynamic, both in terms of creation/deletion/ | ||||
power-on/off and in terms of mobility across the access devices. | ||||
4. Work with existing, widely deployed network Ethernet switches and | ||||
IP routers without requiring wholesale replacement. The first | ||||
hop device (or end system) that adds and removes the overlay | ||||
header will require new equipment and/or new software. | ||||
5. Work with existing data center network deployments without | ||||
requiring major changes in operational or other practices. For | ||||
example, some data centers have not enabled multicast beyond | ||||
link-local scope. Overlays should be capable of leveraging | ||||
underlay multicast support where appropriate, but not require its | ||||
enablement in order to use an overlay solution. | ||||
6. Network infrastructure administered by a single administrative | ||||
domain. This is consistent with operation within a data center, | ||||
and not across the Internet. | ||||
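Items 2 and 5 together suggest one option for multi-destination traffic that does not depend on underlay multicast: ingress (head-end) replication, where the ingress NVE sends one unicast copy to each remote NVE that currently has members in the virtual network. The Python sketch below illustrates that technique under the assumption that each NVE knows the per-VN membership; the `replicate` function and the membership table are hypothetical, and the text itself does not prescribe this mechanism.

```python
from typing import Dict, List, Set, Tuple

def replicate(vnid: int, frame: bytes, ingress_ip: str,
              membership: Dict[int, Set[str]]) -> List[Tuple[str, int, bytes]]:
    """Head-end replication: one unicast copy per remote NVE that has at least
    one end system in the virtual network; all other NVEs are skipped."""
    targets = sorted(membership.get(vnid, set()) - {ingress_ip})
    return [(nve_ip, vnid, frame) for nve_ip in targets]

# Sparse membership: only three NVEs participate in VN 5001.
membership = {5001: {"192.0.2.1", "192.0.2.2", "192.0.2.9"}}
for copy in replicate(5001, b"arp-request", "192.0.2.1", membership):
    print(copy)
# ('192.0.2.2', 5001, b'arp-request')
# ('192.0.2.9', 5001, b'arp-request')
```

The cost grows with the number of participating NVEs, which is why this approach fits the sparse-membership case; where underlay multicast is enabled, it could be used instead.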
3.5. Overlay Networking Work Areas | ||||
There are three specific and separate potential work areas needed to | There are three specific and separate potential work areas needed to | |||
realize an overlay solution. The areas correspond to different | realize an overlay solution. The areas correspond to different | |||
possible "on-the-wire" protocols, where distinct entities interact | possible "on-the-wire" protocols, where distinct entities interact | |||
with each other. | with each other. | |||
One area of work concerns the address dissemination protocol an NVE | One area of work concerns the address dissemination protocol an NVE | |||
uses to build and maintain the mapping tables it uses to deliver | uses to build and maintain the mapping tables it uses to deliver | |||
encapsulated frames to their proper destination. One approach is to | encapsulated packets to their proper destination. One approach is to | |||
build mapping tables entirely via learning (as is done in 802.1 | build mapping tables entirely via learning (as is done in 802.1 | |||
networks). But to provide better scaling properties, a more | networks). But to provide better scaling properties, a more | |||
sophisticated approach is needed, i.e., the use of a specialized | sophisticated approach is needed, i.e., the use of a specialized | |||
control plane protocol. While there are some advantages to using or | control plane protocol. While there are some advantages to using or | |||
leveraging an existing protocol for maintaining mapping tables, the | leveraging an existing protocol for maintaining mapping tables, the | |||
fact that large numbers of NVEs will likely reside in hypervisors | fact that large numbers of NVEs will likely reside in hypervisors | |||
places constraints on the resources (CPU and memory) that can be | places constraints on the resources (CPU and memory) that can be | |||
dedicated to such functions. For example, routing protocols (e.g., | dedicated to such functions. For example, routing protocols (e.g., | |||
IS-IS, BGP) may have scaling difficulties if implemented directly in | IS-IS, BGP) may have scaling difficulties if implemented directly in | |||
all NVEs, based on both flooding and convergence time concerns. An | all NVEs, based on both flooding and convergence time concerns. An | |||
skipping to change at page 13, line 10 | skipping to change at page 14, line 11 | |||
tunnels associated with the VM. To achieve this functionality, a | tunnels associated with the VM. To achieve this functionality, a | |||
standardized interaction between the NVE and hypervisor may be | standardized interaction between the NVE and hypervisor may be | |||
needed, for example in the case where the NVE resides on a separate | needed, for example in the case where the NVE resides on a separate | |||
device from the VM. | device from the VM. | |||
In summary, there are three areas of potential work. The first area | In summary, there are three areas of potential work. The first area | |||
concerns the oracle itself and any on-the-wire protocols it needs. A | concerns the oracle itself and any on-the-wire protocols it needs. A | |||
second area concerns the interaction between the oracle and NVEs. | second area concerns the interaction between the oracle and NVEs. | |||
The third work area concerns protocols associated with attaching and | The third work area concerns protocols associated with attaching and | |||
detaching a VM from a particular virtual network instance. All three | detaching a VM from a particular virtual network instance. All three | |||
work areas are important to the development of a scalable, | work areas are important to the development of scalable, | |||
interoperable solution. | interoperable solutions. | |||
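The division of labor among the work areas can be sketched in Python as an oracle that NVEs register with and query. Everything here is hypothetical: `Oracle`, `NVEClient`, and their methods are illustrative names, and the oracle's own internals (the first work area, e.g., how it is replicated) are not modeled. The NVE's `attach`/`detach` calls stand in for what it would learn from the hypervisor (third work area) and report to the oracle (second work area).

```python
from typing import Dict, Optional, Tuple

Key = Tuple[int, str]  # (VNID, tenant address)

class Oracle:
    """Hypothetical mapping authority; its internal protocols are not modeled."""

    def __init__(self) -> None:
        self._map: Dict[Key, str] = {}

    def attach(self, vnid: int, addr: str, nve_ip: str) -> None:
        # An NVE registers a VM it has learned about from the hypervisor.
        self._map[(vnid, addr)] = nve_ip

    def detach(self, vnid: int, addr: str) -> None:
        self._map.pop((vnid, addr), None)

    def lookup(self, vnid: int, addr: str) -> Optional[str]:
        # An NVE asks where to send traffic for a destination.
        return self._map.get((vnid, addr))

class NVEClient:
    """Hypothetical NVE that pulls mappings on demand and caches them."""

    def __init__(self, oracle: Oracle) -> None:
        self.oracle = oracle
        self.cache: Dict[Key, str] = {}

    def resolve(self, vnid: int, addr: str) -> Optional[str]:
        key = (vnid, addr)
        if key not in self.cache:
            hit = self.oracle.lookup(vnid, addr)
            if hit is None:
                return None
            self.cache[key] = hit
        return self.cache[key]

oracle = Oracle()
nve = NVEClient(oracle)
oracle.attach(5001, "10.0.0.2", "192.0.2.2")
print(nve.resolve(5001, "10.0.0.2"))  # 192.0.2.2
oracle.detach(5001, "10.0.0.2")
print(nve.resolve(5001, "10.0.0.2"))  # 192.0.2.2 (stale cached entry)
```

The stale answer in the last line is deliberate: a pull-and-cache design alone cannot track highly dynamic end systems, which is one reason the oracle-to-NVE interaction needs its own protocol work (for example, pushing invalidations).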
4. Related Work | 4. Related IETF and IEEE Work | |||
4.1. IEEE 802.1aq - Shortest Path Bridging | The following subsections discuss related IETF and IEEE work in | |||
progress; the items are not meant to provide complete coverage of all | ||||
IETF and IEEE data center related work, nor are the descriptions | ||||
comprehensive. Each area is currently trying to address certain | ||||
limitations of today's data center networks; for example, scaling is | ||||
a common issue for every area listed, and multi-tenancy and VM | ||||
mobility are important focus areas as well. Comparing and evaluating | ||||
the results and progress of each work area listed is out of scope of | ||||
this document. The intent of this section is to provide a reference | ||||
for interested readers. | ||||
Shortest Path Bridging (SPB) is an IS-IS based overlay for L2 | 4.1. L3 BGP/MPLS IP VPNs | |||
Ethernets. SPB supports multi-pathing and addresses a number of | ||||
BGP/MPLS IP VPNs [RFC4364] support multi-tenancy with overlapping | ||||
tenant addresses, VPN traffic isolation, and address separation | ||||
between tenants and network infrastructure. The BGP/MPLS control | ||||
plane is used to distribute the tenant IP addresses and the VPN | ||||
labels that identify the tenants (or, more specifically, the | ||||
particular VPN/VN) and tenant IP addresses. Deployment of enterprise | ||||
L3 VPNs has been shown to scale to thousands of VPNs and millions of | ||||
VPN prefixes. BGP/MPLS IP VPNs are currently deployed in some large | ||||
enterprise data centers. | ||||
A potential limitation for deploying BGP/MPLS IP VPNs in data center | ||||
environments is the practicality of using BGP in the data center, | ||||
especially reaching into the servers or hypervisors. There may be | ||||
workforce skill-set issues, equipment support issues, and potential | ||||
new scaling challenges. A combination of BGP and lighter-weight IP | ||||
signaling protocols, e.g., XMPP, has been proposed to extend the | ||||
solutions into the DC environment [I-D.margues-end-system], while | ||||
taking advantage of built-in VPN features and their rich policy | ||||
support, which is especially useful for inter-tenant connectivity. | ||||
4.2. L2 BGP/MPLS IP VPNs | ||||
Ethernet Virtual Private Networks (E-VPNs) [I-D.ietf-l2vpn-evpn] | ||||
provide an emulated L2 service in which each tenant has its own | ||||
Ethernet network over a common IP or MPLS infrastructure and a BGP/ | ||||
MPLS control plane is used to distribute the tenant MAC addresses and | ||||
the MPLS labels that identify the tenants and tenant MAC addresses. | ||||
Within the BGP/MPLS control plane a thirty two bit Ethernet Tag is | ||||
used to identify the broadcast domains (VLANs) associated with a | ||||
given L2 VLAN service instance and these Ethernet tags are mapped to | ||||
VLAN IDs understood by the tenant at the service edges. This means | ||||
that the limit of 4096 VLANs is associated with an individual tenant | ||||
service edge, enabling a much higher level of scalability. | ||||
Interconnection between tenants is also allowed in a controlled | ||||
fashion. | ||||
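The Ethernet Tag mapping can be pictured as a per-edge table from the 32-bit Ethernet Tag used in the BGP/MPLS control plane to a 12-bit VLAN ID that the tenant sees locally. The Python sketch below is a hypothetical model (the `TenantServiceEdge` class and `bind` method are illustrative, not E-VPN procedures); it only shows why the 4094 limit becomes a per-edge constraint rather than a global one.

```python
from typing import Dict

class TenantServiceEdge:
    """Hypothetical tenant service edge mapping 32-bit Ethernet Tags (global,
    control plane) to 12-bit VLAN IDs (local, tenant-facing)."""

    def __init__(self) -> None:
        self.tag_to_vlan: Dict[int, int] = {}

    def bind(self, ethernet_tag: int, vlan_id: int) -> None:
        if not 0 <= ethernet_tag < 2 ** 32:
            raise ValueError("Ethernet Tag must fit in 32 bits")
        if not 1 <= vlan_id <= 4094:
            raise ValueError("VLAN ID must be in 1..4094")
        if ethernet_tag in self.tag_to_vlan:
            raise ValueError("Ethernet Tag already bound at this edge")
        if vlan_id in self.tag_to_vlan.values():
            raise ValueError("VLAN ID already in use at this edge")
        self.tag_to_vlan[ethernet_tag] = vlan_id

edge_a, edge_b = TenantServiceEdge(), TenantServiceEdge()
edge_a.bind(70000, 100)
edge_b.bind(80000, 100)  # same local VLAN ID, different broadcast domain
print(edge_a.tag_to_vlan, edge_b.tag_to_vlan)  # {70000: 100} {80000: 100}
```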
VM Mobility [I-D.raggarwa-data-center-mobility] introduces the | ||||
concept of a combined L2/L3 VPN service in order to support the | ||||
mobility of individual Virtual Machines (VMs) between Data Centers | ||||
connected over a common IP or MPLS infrastructure. | ||||
4.3. IEEE 802.1aq - Shortest Path Bridging | ||||
Shortest Path Bridging (SPB-M) is an IS-IS based overlay for L2 | ||||
Ethernets. SPB-M supports multi-pathing and addresses a number of | ||||
shortcomings in the original Ethernet Spanning Tree Protocol. SPB-M | shortcomings in the original Ethernet Spanning Tree Protocol. SPB-M | |||
uses IEEE 802.1ah MAC-in-MAC encapsulation and supports a 24-bit | uses IEEE 802.1ah MAC-in-MAC encapsulation and supports a 24-bit | |||
I-SID, which can be used to identify virtual network instances. SPB | I-SID, which can be used to identify virtual network instances. | |||
is entirely L2 based, extending the L2 Ethernet bridging model. | SPB-M is entirely L2 based, extending the L2 Ethernet bridging model. | |||
4.2. ARMD | 4.4. ARMD | |||
ARMD is chartered to look at data center scaling issues with a focus | ARMD is chartered to look at data center scaling issues with a focus | |||
on address resolution. ARMD is currently chartered to develop a | on address resolution. ARMD is currently chartered to develop a | |||
problem statement and is not currently developing solutions. While | problem statement and is not currently developing solutions. While | |||
an overlay-based approach may address some of the "pain points" that | an overlay-based approach may address some of the "pain points" that | |||
have been raised in ARMD (e.g., better support for multi-tenancy), an | have been raised in ARMD (e.g., better support for multi-tenancy), an | |||
overlay approach may also push some of the L2 scaling concerns (e.g., | overlay approach may also push some of the L2 scaling concerns (e.g., | |||
excessive flooding) to the IP level (flooding via IP multicast). | excessive flooding) to the IP level (flooding via IP multicast). | |||
Analysis will be needed to understand the scaling tradeoffs of an | Analysis will be needed to understand the scaling tradeoffs of an | |||
overlay-based approach compared with existing approaches. On the | overlay-based approach compared with existing approaches. On the | |||
other hand, existing IP-based approaches such as proxy ARP may help | other hand, existing IP-based approaches such as proxy ARP may help | |||
mitigate some concerns. | mitigate some concerns. | |||
4.3. TRILL | 4.5. TRILL | |||
TRILL is an L2-based approach aimed at addressing deficiencies and | TRILL is an L2-based approach aimed at addressing deficiencies and | |||
limitations of current Ethernet networks, and of STP in particular. | limitations of current Ethernet networks, and of STP in particular. | |||
Although it differs from Shortest Path Bridging in many architectural | Although it differs from Shortest Path Bridging in many architectural | |||
and implementation details, it is similar in that it provides an L2- | and implementation details, it is similar in that it provides an L2- | |||
based service to end systems. TRILL, as defined today, supports only | based service to end systems. TRILL, as defined today, supports only | |||
the standard (and limited) 12-bit VLAN model. Approaches to extend | the standard (and limited) 12-bit VLAN model. Approaches to extend | |||
TRILL to support more than 4094 VLANs are currently under | TRILL to support more than 4094 VLANs are currently under | |||
investigation [I-D.ietf-trill-fine-labeling]. | investigation [I-D.ietf-trill-fine-labeling]. | |||
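The gap between the 12-bit VLAN model and the 24-bit identifiers used by SPB-M (I-SID) and LISP (Instance ID) is easiest to see as arithmetic. The sketch below assumes the IEEE 802.1Q convention that VLAN IDs 0 and 4095 are reserved, which yields the 4094 figure used above. For 24-bit fields it counts raw values only and ignores any values a particular specification may reserve. The function names are hypothetical.

```python
# Compare the identifier space of a 12-bit VLAN ID with that of a
# 24-bit virtual network identifier (e.g. an SPB-M I-SID or LISP Instance ID).

VLAN_ID_BITS = 12
VN_ID_24_BITS = 24


def id_space(bits: int) -> int:
    """Number of distinct values an unsigned field of `bits` bits can hold."""
    return 1 << bits


def usable_vlan_ids() -> int:
    """12-bit VLAN ID space minus the two reserved values (0 and 4095)."""
    return id_space(VLAN_ID_BITS) - 2


if __name__ == "__main__":
    print(f"usable 12-bit VLAN IDs:   {usable_vlan_ids()}")
    print(f"raw 24-bit identifier values: {id_space(VN_ID_24_BITS)}")
    ratio = id_space(VN_ID_24_BITS) // id_space(VLAN_ID_BITS)
    print(f"24-bit space is {ratio}x the 12-bit space")
```

Running it prints 4094 usable VLAN IDs against 16777216 raw 24-bit values, a factor of 4096.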
4.4. L2VPNs | 4.6. L2VPNs | |||
The IETF has specified a number of approaches for connecting L2 | The IETF has specified a number of approaches for connecting L2 | |||
domains together as part of the L2VPN Working Group. That group, | domains together as part of the L2VPN Working Group. That group, | |||
however, has historically been focused on provider-provisioned L2 | however, has historically been focused on provider-provisioned L2 | |||
VPNs, where the service provider participates in management and | VPNs, where the service provider participates in management and | |||
provisioning of the VPN. In addition, much of the target environment | provisioning of the VPN. In addition, much of the target environment | |||
for such deployments involves carrying L2 traffic over WANs. Overlay | for such deployments involves carrying L2 traffic over WANs. Overlay | |||
approaches are intended to be used within data centers where the overlay | approaches are intended to be used within data centers where the overlay | |||
network is managed by the data center operator, rather than by an | network is managed by the data center operator, rather than by an | |||
outside party. While overlays can run across the Internet as well, | outside party. While overlays can run across the Internet as well, | |||
they will extend well into the data center itself (e.g., up to and | they will extend well into the data center itself (e.g., up to and | |||
including hypervisors) and include large numbers of machines within | including hypervisors) and include large numbers of machines within | |||
the data center itself. | the data center itself. | |||
Other L2VPN approaches, such as L2TP [RFC2661], require significant | Other L2VPN approaches, such as L2TP [RFC2661], require significant | |||
tunnel state at the encapsulating and decapsulating end points. | tunnel state at the encapsulating and decapsulating end points. | |||
Overlays require less tunnel state than other approaches, which is | Overlays require less tunnel state than other approaches, which is | |||
important to allow overlays to scale to hundreds of thousands of end | important to allow overlays to scale to hundreds of thousands of end | |||
points. It is assumed that smaller switches (i.e., virtual switches | points. It is assumed that smaller switches (i.e., virtual switches | |||
in hypervisors or the physical switches to which VMs connect) will be | in hypervisors or the adjacent devices to which VMs connect) will be | |||
part of the overlay network and be responsible for encapsulating and | part of the overlay network and be responsible for encapsulating and | |||
decapsulating packets. | decapsulating packets. | |||
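The encapsulate and decapsulate steps performed by a virtual switch or adjacent device can be sketched as follows. The 8-byte header layout here is hypothetical and is not the wire format of any particular specification. It assumes an 8-bit flags field, 24 reserved bits, a 24-bit virtual network ID, and 8 more reserved bits, with one hypothetical flag bit marking the VN ID as valid. The only per-packet state the sketch needs is the VN ID itself, which reflects the point above that overlays keep tunnel state small.

```python
from __future__ import annotations

import struct

FLAG_VNID_VALID = 0x08  # hypothetical flag: "VN ID field is present"
HEADER_LEN = 8          # hypothetical fixed header length in bytes


def encapsulate(vn_id: int, inner_frame: bytes) -> bytes:
    """Prepend a hypothetical 8-byte overlay header carrying a 24-bit VN ID."""
    if not 0 <= vn_id < (1 << 24):
        raise ValueError("VN ID must fit in 24 bits")
    # Word 1: 8-bit flags followed by 24 reserved bits.
    # Word 2: 24-bit VN ID followed by 8 reserved bits.
    header = struct.pack("!II", FLAG_VNID_VALID << 24, vn_id << 8)
    return header + inner_frame


def decapsulate(packet: bytes) -> tuple[int, bytes]:
    """Strip the hypothetical header and return (vn_id, inner_frame)."""
    if len(packet) < HEADER_LEN:
        raise ValueError("packet shorter than overlay header")
    word1, word2 = struct.unpack("!II", packet[:HEADER_LEN])
    if not (word1 >> 24) & FLAG_VNID_VALID:
        raise ValueError("VN ID flag not set")
    return word2 >> 8, packet[HEADER_LEN:]


if __name__ == "__main__":
    frame = b"\x00" * 14  # placeholder inner Ethernet header
    pkt = encapsulate(5000, frame)
    vn_id, inner = decapsulate(pkt)
    print(vn_id, len(pkt), inner == frame)
```

In a real deployment, the outer IP/UDP (or GRE) headers and the choice of destination NVE would wrap this shim. Those parts are left out here.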
4.5. Proxy Mobile IP | 4.7. Proxy Mobile IP | |||
Proxy Mobile IP [RFC5213] [RFC5844] makes use of the GRE Key Field | Proxy Mobile IP [RFC5213] [RFC5844] makes use of the GRE Key Field | |||
[RFC5845] [RFC6245], but not in a way that supports multi-tenancy. | [RFC5845] [RFC6245], but not in a way that supports multi-tenancy. | |||
4.6. LISP | 4.8. LISP | |||
LISP [I-D.ietf-lisp] essentially provides an IP-over-IP overlay where | LISP [I-D.ietf-lisp] essentially provides an IP-over-IP overlay where | |||
the inner addresses are end-station identifiers and the outer IP | the inner addresses are end-station identifiers and the outer IP | |||
addresses represent the location of the end station within the core | addresses represent the location of the end station within the core | |||
IP network topology. The LISP overlay header uses a 24-bit Instance | IP network topology. The LISP overlay header uses a 24-bit Instance | |||
ID to support overlapping inner IP addresses. | ID to support overlapping inner IP addresses. | |||
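The sentence above says a 24-bit Instance ID lets inner addresses overlap. One way to illustrate that is a mapping table keyed by the pair (Instance ID, inner address) instead of the inner address alone. The minimal sketch below uses a hypothetical `MappingTable` class and documentation-range addresses. It is not the LISP mapping system itself. It only shows why two tenants can both use 10.0.0.5 without conflict.

```python
import ipaddress
from typing import Dict, Optional, Tuple

# Hypothetical key: (24-bit instance ID, inner end-station address).
MappingKey = Tuple[int, ipaddress.IPv4Address]


class MappingTable:
    """Hypothetical table mapping (instance ID, inner address) -> outer locator."""

    def __init__(self) -> None:
        self._entries: Dict[MappingKey, ipaddress.IPv4Address] = {}

    def add(self, instance_id: int, inner: str, locator: str) -> None:
        if not 0 <= instance_id < (1 << 24):
            raise ValueError("instance ID must fit in 24 bits")
        key = (instance_id, ipaddress.IPv4Address(inner))
        self._entries[key] = ipaddress.IPv4Address(locator)

    def lookup(self, instance_id: int, inner: str) -> Optional[ipaddress.IPv4Address]:
        # The instance ID is part of the key, so identical inner
        # addresses in different instances never collide.
        return self._entries.get((instance_id, ipaddress.IPv4Address(inner)))


if __name__ == "__main__":
    table = MappingTable()
    table.add(100, "10.0.0.5", "192.0.2.1")     # tenant A
    table.add(200, "10.0.0.5", "198.51.100.1")  # tenant B, same inner address
    print(table.lookup(100, "10.0.0.5"), table.lookup(200, "10.0.0.5"))
```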
4.7. Individual Submissions | ||||
Many individual submissions also seek to address some or all of | ||||
the issues discussed in this draft. Examples of such drafts are | ||||
VXLAN [I-D.mahalingam-dutt-dcops-vxlan], NVGRE | ||||
[I-D.sridharan-virtualization-nvgre] and Virtual Machine Mobility in | ||||
L3 networks [I-D.wkumari-dcops-l3-vmmobility]. | ||||
5. Further Work | 5. Further Work | |||
It is believed that overlay-based approaches may be able to reduce | It is believed that overlay-based approaches may be able to reduce | |||
the overall amount of flooding and other multicast and broadcast | the overall amount of flooding and other multicast and broadcast | |||
related traffic (e.g., ARP and ND) currently experienced within | related traffic (e.g., ARP and ND) currently experienced within | |||
data centers with a large flat L2 network. Further analysis | data centers with a large flat L2 network. Further analysis | |||
is needed to characterize expected improvements. | is needed to characterize expected improvements. | |||
There are a number of VPN approaches that provide some, if not all, of | ||||
the desired semantics of virtual networks. A gap analysis will be | ||||
needed to assess how well existing approaches satisfy the | ||||
requirements. | ||||
6. Summary | 6. Summary | |||
This document has argued that network virtualization using L3 | This document has argued that network virtualization using overlays | |||
overlays addresses a number of issues being faced as data centers | addresses a number of issues being faced as data centers scale in | |||
scale in size. In addition, careful consideration of a number of | size. In addition, careful study of current data center problems is | |||
issues would lead to the development of interoperable implementations | needed for development of proper requirements and standard solutions. | |||
of virtualization overlays. | ||||
Three potential work areas were identified. The first involves the | Three potential work areas were identified. The first involves the | |||
interactions that take place when a VM attaches to or detaches from an | interactions that take place when a VM attaches to or detaches from an | |||
overlay. A second involves the protocol an NVE would use to | overlay. A second involves the protocol an NVE would use to | |||
communicate with a backend "oracle" to learn and disseminate mapping | communicate with a backend "oracle" to learn and disseminate mapping | |||
information about the VMs the NVE communicates with. The third | information about the VMs the NVE communicates with. The third | |||
potential work area involves the backend oracle itself, i.e., how it | potential work area involves the backend oracle itself, i.e., how it | |||
provides failover and how it interacts with oracles in other domains. | provides failover and how it interacts with oracles in other domains. | |||
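The first two work areas above can be illustrated with a small model. VM attach and detach events update a backend "oracle", and an NVE learns mappings from the oracle on demand and caches them. This is a minimal sketch under stated assumptions. The `Oracle` and `NVE` classes, the (VN ID, VM address) key, and push-based cache invalidation on detach are all hypothetical design choices, not a protocol proposal. The third work area (oracle failover and inter-domain interaction) is not modeled.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical key: (virtual network ID, VM address).
Key = Tuple[int, str]


class Oracle:
    """Hypothetical backend mapping service: (VN ID, VM address) -> NVE address."""

    def __init__(self) -> None:
        self._map: Dict[Key, str] = {}
        self._subscribers: List[Callable[[Key], None]] = []

    def subscribe(self, on_withdraw: Callable[[Key], None]) -> None:
        self._subscribers.append(on_withdraw)

    def attach(self, vn_id: int, vm_addr: str, nve_addr: str) -> None:
        # Work area 1: record that a VM attached behind a given NVE.
        self._map[(vn_id, vm_addr)] = nve_addr

    def detach(self, vn_id: int, vm_addr: str) -> None:
        # Remove the mapping and tell subscribed NVEs to drop cached copies.
        key = (vn_id, vm_addr)
        if self._map.pop(key, None) is not None:
            for notify in self._subscribers:
                notify(key)

    def query(self, vn_id: int, vm_addr: str) -> Optional[str]:
        return self._map.get((vn_id, vm_addr))


class NVE:
    """Hypothetical NVE that learns mappings from the oracle (work area 2)."""

    def __init__(self, oracle: Oracle) -> None:
        self._oracle = oracle
        self._cache: Dict[Key, str] = {}
        self.queries = 0  # number of round trips to the oracle
        oracle.subscribe(self._invalidate)

    def _invalidate(self, key: Key) -> None:
        self._cache.pop(key, None)

    def resolve(self, vn_id: int, vm_addr: str) -> Optional[str]:
        key = (vn_id, vm_addr)
        if key not in self._cache:
            self.queries += 1
            answer = self._oracle.query(vn_id, vm_addr)
            if answer is None:
                return None
            self._cache[key] = answer
        return self._cache[key]
```

Invalidation is pushed from the oracle here. An alternative would be cache entries that expire after a timeout. The trade-off between the two is the kind of question the second work area would need to settle.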
7. Acknowledgments | 7. Acknowledgments | |||
Helpful comments and improvements to this document have come from | Helpful comments and improvements to this document have come from | |||
Ariel Hendel, Vinit Jain, and Benson Schliesser. | John Drake, Ariel Hendel, Vinit Jain, Thomas Morin, Benson Schliesser | |||
and many others on the mailing list. | ||||
8. IANA Considerations | 8. IANA Considerations | |||
This memo includes no request to IANA. | This memo includes no request to IANA. | |||
9. Security Considerations | 9. Security Considerations | |||
TBD | TBD | |||
10. Informative References | 10. Informative References | |||
[I-D.fang-vpn4dc-problem-statement] | ||||
Napierala, M., Fang, L., and D. Cai, "IP-VPN Data Center | ||||
Problem Statement and Requirements", | ||||
draft-fang-vpn4dc-problem-statement-01 (work in progress), | ||||
June 2012. | ||||
[I-D.ietf-l2vpn-evpn] | [I-D.ietf-l2vpn-evpn] | |||
Sajassi, A., Aggarwal, R., Henderickx, W., Balus, F., | Sajassi, A., Aggarwal, R., Henderickx, W., Balus, F., | |||
Isaac, A., and J. Uttaro, "BGP MPLS Based Ethernet VPN", | Isaac, A., and J. Uttaro, "BGP MPLS Based Ethernet VPN", | |||
draft-ietf-l2vpn-evpn-01 (work in progress), July 2012. | draft-ietf-l2vpn-evpn-01 (work in progress), July 2012. | |||
[I-D.ietf-lisp] | [I-D.ietf-lisp] | |||
Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, | Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, | |||
"Locator/ID Separation Protocol (LISP)", | "Locator/ID Separation Protocol (LISP)", | |||
draft-ietf-lisp-23 (work in progress), May 2012. | draft-ietf-lisp-23 (work in progress), May 2012. | |||
[I-D.ietf-trill-fine-labeling] | [I-D.ietf-trill-fine-labeling] | |||
Eastlake, D., Zhang, M., Agarwal, P., Perlman, R., and D. | Eastlake, D., Zhang, M., Agarwal, P., Perlman, R., and D. | |||
Dutt, "TRILL: Fine-Grained Labeling", | Dutt, "TRILL: Fine-Grained Labeling", | |||
draft-ietf-trill-fine-labeling-01 (work in progress), | draft-ietf-trill-fine-labeling-01 (work in progress), | |||
June 2012. | June 2012. | |||
[I-D.kreeger-nvo3-overlay-cp] | [I-D.kreeger-nvo3-overlay-cp] | |||
Black, D., Dutt, D., Kreeger, L., Sridharan, M., and T. | Kreeger, L., Dutt, D., Narten, T., Black, D., and M. | |||
Narten, "Network Virtualization Overlay Control Protocol | Sridharan, "Network Virtualization Overlay Control | |||
Requirements", draft-kreeger-nvo3-overlay-cp-00 (work in | Protocol Requirements", draft-kreeger-nvo3-overlay-cp-01 | |||
progress), January 2012. | (work in progress), July 2012. | |||
[I-D.lasserre-nvo3-framework] | [I-D.lasserre-nvo3-framework] | |||
Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y. | Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y. | |||
Rekhter, "Framework for DC Network Virtualization", | Rekhter, "Framework for DC Network Virtualization", | |||
draft-lasserre-nvo3-framework-03 (work in progress), | draft-lasserre-nvo3-framework-03 (work in progress), | |||
July 2012. | July 2012. | |||
[I-D.mahalingam-dutt-dcops-vxlan] | ||||
Sridhar, T., Bursell, M., Kreeger, L., Dutt, D., Wright, | ||||
C., Mahalingam, M., Duda, K., and P. Agarwal, "VXLAN: A | ||||
Framework for Overlaying Virtualized Layer 2 Networks over | ||||
Layer 3 Networks", draft-mahalingam-dutt-dcops-vxlan-01 | ||||
(work in progress), February 2012. | ||||
[I-D.raggarwa-data-center-mobility] | [I-D.raggarwa-data-center-mobility] | |||
Aggarwal, R., Rekhter, Y., Henderickx, W., Shekhar, R., | Aggarwal, R., Rekhter, Y., Henderickx, W., Shekhar, R., | |||
and L. Fang, "Data Center Mobility based on BGP/MPLS, IP | and L. Fang, "Data Center Mobility based on BGP/MPLS, IP | |||
Routing and NHRP", draft-raggarwa-data-center-mobility-03 | Routing and NHRP", draft-raggarwa-data-center-mobility-03 | |||
(work in progress), June 2012. | (work in progress), June 2012. | |||
[I-D.sridharan-virtualization-nvgre] | ||||
Sridharan, M., Greenberg, A., Venkataramaiah, N., Wang, | ||||
Y., Duda, K., Ganga, I., Lin, G., Pearson, M., Thaler, P., | ||||
and C. Tumuluri, "NVGRE: Network Virtualization using | ||||
Generic Routing Encapsulation", | ||||
draft-sridharan-virtualization-nvgre-01 (work in | ||||
progress), July 2012. | ||||
[I-D.wkumari-dcops-l3-vmmobility] | ||||
Kumari, W. and J. Halpern, "Virtual Machine mobility in L3 | ||||
Networks.", draft-wkumari-dcops-l3-vmmobility-00 (work in | ||||
progress), August 2011. | ||||
[RFC2661] Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn, | [RFC2661] Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn, | |||
G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"", | G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"", | |||
RFC 2661, August 1999. | RFC 2661, August 1999. | |||
[RFC4023] Worster, T., Rekhter, Y., and E. Rosen, "Encapsulating | ||||
MPLS in IP or Generic Routing Encapsulation (GRE)", | ||||
RFC 4023, March 2005. | ||||
[RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private | [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private | |||
Networks (VPNs)", RFC 4364, February 2006. | Networks (VPNs)", RFC 4364, February 2006. | |||
[RFC5036] Andersson, L., Minei, I., and B. Thomas, "LDP | ||||
Specification", RFC 5036, October 2007. | ||||
[RFC5213] Gundavelli, S., Leung, K., Devarapalli, V., Chowdhury, K., | [RFC5213] Gundavelli, S., Leung, K., Devarapalli, V., Chowdhury, K., | |||
and B. Patil, "Proxy Mobile IPv6", RFC 5213, August 2008. | and B. Patil, "Proxy Mobile IPv6", RFC 5213, August 2008. | |||
[RFC5844] Wakikawa, R. and S. Gundavelli, "IPv4 Support for Proxy | [RFC5844] Wakikawa, R. and S. Gundavelli, "IPv4 Support for Proxy | |||
Mobile IPv6", RFC 5844, May 2010. | Mobile IPv6", RFC 5844, May 2010. | |||
[RFC5845] Muhanna, A., Khalil, M., Gundavelli, S., and K. Leung, | [RFC5845] Muhanna, A., Khalil, M., Gundavelli, S., and K. Leung, | |||
"Generic Routing Encapsulation (GRE) Key Option for Proxy | "Generic Routing Encapsulation (GRE) Key Option for Proxy | |||
Mobile IPv6", RFC 5845, June 2010. | Mobile IPv6", RFC 5845, June 2010. | |||
[RFC6245] Yegani, P., Leung, K., Lior, A., Chowdhury, K., and J. | [RFC6245] Yegani, P., Leung, K., Lior, A., Chowdhury, K., and J. | |||
Navali, "Generic Routing Encapsulation (GRE) Key Extension | Navali, "Generic Routing Encapsulation (GRE) Key Extension | |||
for Mobile IPv4", RFC 6245, May 2011. | for Mobile IPv4", RFC 6245, May 2011. | |||
[RFC6325] Perlman, R., Eastlake, D., Dutt, D., Gai, S., and A. | [RFC6325] Perlman, R., Eastlake, D., Dutt, D., Gai, S., and A. | |||
Ghanwani, "Routing Bridges (RBridges): Base Protocol | Ghanwani, "Routing Bridges (RBridges): Base Protocol | |||
Specification", RFC 6325, July 2011. | Specification", RFC 6325, July 2011. | |||
[SPB] "IEEE P802.1aq/D4.5 Draft Standard for Local and | [SPBM] "IEEE P802.1aq/D4.5 Draft Standard for Local and | |||
Metropolitan Area Networks -- Media Access Control (MAC) | Metropolitan Area Networks -- Media Access Control (MAC) | |||
Bridges and Virtual Bridged Local Area Networks, | Bridges and Virtual Bridged Local Area Networks, | |||
Amendment 8: Shortest Path Bridging", February 2012. | Amendment 8: Shortest Path Bridging", February 2012. | |||
Appendix A. Change Log | Appendix A. Change Log | |||
A.1. Changes from -01 | A.1. Changes from -01 | |||
1. Removed Section 4.2 (Standardization Issues) and Section 5 | 1. Removed Section 4.2 (Standardization Issues) and Section 5 | |||
(Control Plane) as those are more appropriately covered in and | (Control Plane) as those are more appropriately covered in and | |||
skipping to change at page 18, line 25 | skipping to change at page 20, line 6 | |||
5. Revised some of the terminology to be consistent with | 5. Revised some of the terminology to be consistent with | |||
[I-D.lasserre-nvo3-framework] and [I-D.kreeger-nvo3-overlay-cp]. | [I-D.lasserre-nvo3-framework] and [I-D.kreeger-nvo3-overlay-cp]. | |||
A.2. Changes from -02 | A.2. Changes from -02 | |||
1. Numerous changes in response to discussions on the nvo3 mailing | 1. Numerous changes in response to discussions on the nvo3 mailing | |||
list, with the majority of changes in Section 2 (Problem Details) and | list, with the majority of changes in Section 2 (Problem Details) and | |||
Section 3 (Network Overlays). Best to see diffs for specific | Section 3 (Network Overlays). Best to see diffs for specific | |||
text changes. | text changes. | |||
A.3. Changes from -03 | ||||
1. Too numerous to enumerate; moved solution-specific descriptions | ||||
to the Related Work section. Pulled in additional text (and authors) | ||||
from [I-D.fang-vpn4dc-problem-statement]. Numerous editorial | ||||
improvements. | ||||
Authors' Addresses | Authors' Addresses | |||
Thomas Narten (editor) | Thomas Narten (editor) | |||
IBM | IBM | |||
Email: narten@us.ibm.com | Email: narten@us.ibm.com | |||
Murari Sridharan | David Black | |||
Microsoft | EMC | |||
Email: muraris@microsoft.com | Email: david.black@emc.com | |||
Dinesh Dutt | Dinesh Dutt | |||
Email: ddutt.ietf@hobbesdutt.com | Email: ddutt.ietf@hobbesdutt.com | |||
David Black | Luyuan Fang | |||
EMC | Cisco Systems | |||
111 Wood Avenue South | ||||
Iselin, NJ 08830 | ||||
USA | ||||
Email: david.black@emc.com | Email: lufang@cisco.com | |||
Eric Gray | ||||
Ericsson | ||||
Email: eric.gray@ericsson.com | ||||
Lawrence Kreeger | Lawrence Kreeger | |||
Cisco | Cisco | |||
Email: kreeger@cisco.com | Email: kreeger@cisco.com | |||
Maria Napierala | ||||
AT&T | ||||
200 Laurel Avenue | ||||
Middletown, NJ 07748 | ||||
USA | ||||
Email: mnapierala@att.com | ||||
Murari Sridharan | ||||
Microsoft | ||||
Email: muraris@microsoft.com | ||||
End of changes. 72 change blocks. | ||||
337 lines changed or deleted | 440 lines changed or added | |||
This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ |