draft-ietf-nvo3-framework-06.txt | draft-ietf-nvo3-framework-07.txt | |||
---|---|---|---|---|
Internet Engineering Task Force Marc Lasserre | Internet Engineering Task Force Marc Lasserre | |||
Internet Draft Florin Balus | Internet Draft Florin Balus | |||
Intended status: Informational Alcatel-Lucent | Intended status: Informational Alcatel-Lucent | |||
Expires: Nov 2014 | Expires: Dec 2014 | |||
Thomas Morin | Thomas Morin | |||
France Telecom Orange | France Telecom Orange | |||
Nabil Bitar | Nabil Bitar | |||
Verizon | Verizon | |||
Yakov Rekhter | Yakov Rekhter | |||
Juniper | Juniper | |||
May 21, 2014 | June 5, 2014 | |||
Framework for DC Network Virtualization | Framework for DC Network Virtualization | |||
draft-ietf-nvo3-framework-06.txt | draft-ietf-nvo3-framework-07.txt | |||
Abstract | Abstract | |||
This document provides a framework for Network Virtualization | This document provides a framework for Data Center (DC) Network | |||
Overlays (NVO3) and it defines a reference model along with logical | Virtualization Overlays (NVO3) and it defines a reference model | |||
components required to design a solution. | along with logical components required to design a solution. | |||
Status of this Memo | Status of this Memo | |||
This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six | Internet-Drafts are draft documents valid for a maximum of six | |||
months and may be updated, replaced, or obsoleted by other documents | months and may be updated, replaced, or obsoleted by other documents | |||
at any time. It is inappropriate to use Internet-Drafts as | at any time. It is inappropriate to use Internet-Drafts as | |||
reference material or to cite them other than as "work in progress." | reference material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on Nov 20, 2014. | This Internet-Draft will expire on Dec 5, 2014. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2014 IETF Trust and the persons identified as the | Copyright (c) 2014 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
skipping to change at page 3, line 5 | skipping to change at page 3, line 5 | |||
3.1.5.3. Address advertisement and tunnel mapping........15 | 3.1.5.3. Address advertisement and tunnel mapping........15 | |||
3.1.5.4. Overlay Tunneling...............................15 | 3.1.5.4. Overlay Tunneling...............................15 | |||
3.2. Multi-homing...........................................16 | 3.2. Multi-homing...........................................16 | |||
3.3. VM Mobility............................................17 | 3.3. VM Mobility............................................17 | |||
4. Key aspects of overlay networks.............................17 | 4. Key aspects of overlay networks.............................17 | |||
4.1. Pros & Cons............................................17 | 4.1. Pros & Cons............................................17 | |||
4.2. Overlay issues to consider.............................19 | 4.2. Overlay issues to consider.............................19 | |||
4.2.1. Data plane vs Control plane driven................19 | 4.2.1. Data plane vs Control plane driven................19 | |||
4.2.2. Coordination between data plane and control plane.19 | 4.2.2. Coordination between data plane and control plane.19 | |||
4.2.3. Handling Broadcast, Unknown Unicast and Multicast (BUM) | 4.2.3. Handling Broadcast, Unknown Unicast and Multicast (BUM) | |||
traffic..................................................20 | traffic..................................................19 | |||
4.2.4. Path MTU..........................................20 | 4.2.4. Path MTU..........................................20 | |||
4.2.5. NVE location trade-offs...........................21 | 4.2.5. NVE location trade-offs...........................21 | |||
4.2.6. Interaction between network overlays and underlays.22 | 4.2.6. Interaction between network overlays and underlays.22 | |||
5. Security Considerations......................................22 | 5. Security Considerations.....................................22 | |||
6. IANA Considerations.........................................23 | 6. IANA Considerations.........................................23 | |||
7. References..................................................23 | 7. References..................................................23 | |||
7.1. Informative References.................................23 | 7.1. Informative References.................................23 | |||
8. Acknowledgments.............................................24 | 8. Acknowledgments.............................................24 | |||
1. Introduction | 1. Introduction | |||
This document provides a framework for Data Center (DC) Network | This document provides a framework for Data Center (DC) Network | |||
Virtualization over Layer3 (L3) tunnels. This framework is intended | Virtualization over Layer3 (L3) tunnels. This framework is intended | |||
to aid in standardizing protocols and mechanisms to support large- | to aid in standardizing protocols and mechanisms to support large- | |||
skipping to change at page 16, line 34 | skipping to change at page 16, line 34 | |||
on MPLS re-routing capabilities. | on MPLS re-routing capabilities. | |||
When a Tenant System is co-located with the NVE, the Tenant System | When a Tenant System is co-located with the NVE, the Tenant System | |||
is effectively single homed to the NVE via a virtual port. When the | is effectively single homed to the NVE via a virtual port. When the | |||
Tenant System and the NVE are separated, the Tenant System is | Tenant System and the NVE are separated, the Tenant System is | |||
connected to the NVE via a logical Layer2 (L2) construct such as a | connected to the NVE via a logical Layer2 (L2) construct such as a | |||
VLAN and it can be multi-homed to various NVEs. An NVE may provide | VLAN and it can be multi-homed to various NVEs. An NVE may provide | |||
an L2 service to the end system or an L3 service. An NVE may be | an L2 service to the end system or an L3 service. An NVE may be | |||
multi-homed to a next layer in the DC at Layer2 (L2) or Layer3 | multi-homed to a next layer in the DC at Layer2 (L2) or Layer3 | |||
(L3). When an NVE provides an L2 service and is not co-located with | (L3). When an NVE provides an L2 service and is not co-located with | |||
the end system, techniques such as Ethernet Link Aggregation Group | the end system, loop avoidance techniques must be used. Similarly, | |||
(LAG) or Spanning Tree Protocol (STP) can be used to switch traffic | when the NVE provides L3 service, similar dual-homing techniques can | |||
between an end system and connected NVEs without creating | be used. When the NVE provides an L3 service to the end system, it is | |||
loops. Similarly, when the NVE provides L3 service, similar dual- | possible that no dynamic routing protocol is enabled between the end | |||
homing techniques can be used. When the NVE provides an L3 service to | system and the NVE. The end system can be multi-homed to | |||
the end system, it is possible that no dynamic routing protocol is | multiple physically-separated L3 NVEs over multiple interfaces. When | |||
enabled between the end system and the NVE. The end system can be | one of the links connected to an NVE fails, the other interfaces can | |||
multi-homed to multiple physically-separated L3 NVEs over multiple | be used to reach the end system. | |||
interfaces. When one of the links connected to an NVE fails, the | ||||
other interfaces can be used to reach the end system. | ||||
External connectivity from a DC can be handled by two or more DC | External connectivity from a DC can be handled by two or more DC | |||
gateways. Each gateway provides access to external networks such as | gateways. Each gateway provides access to external networks such as | |||
VPNs or the Internet. A gateway may be connected to two or more edge | VPNs or the Internet. A gateway may be connected to two or more edge | |||
nodes in the external network for redundancy. When a connection to | nodes in the external network for redundancy. When a connection to | |||
an upstream node is lost, the alternative connection is used and the | an upstream node is lost, the alternative connection is used and the | |||
failed route withdrawn. | failed route withdrawn. | |||
3.3. VM Mobility | 3.3. VM Mobility | |||
skipping to change at page 17, line 29 | skipping to change at page 17, line 27 | |||
Solutions to maintain connectivity while a VM is moved are necessary | Solutions to maintain connectivity while a VM is moved are necessary | |||
in the case of "hot" mobility. This implies that connectivity among | in the case of "hot" mobility. This implies that connectivity among | |||
VMs is preserved. For instance, for L2 VNs, ARP caches are updated | VMs is preserved. For instance, for L2 VNs, ARP caches are updated | |||
accordingly. | accordingly. | |||
Upon VM mobility, NVE policies that define connectivity among VMs | Upon VM mobility, NVE policies that define connectivity among VMs | |||
must be maintained. | must be maintained. | |||
During VM mobility, it is expected that the path to the VM's default | During VM mobility, it is expected that the path to the VM's default | |||
gateway assures adequate performance to VM applications. | gateway assures adequate QoS to VM applications, i.e. QoS that | |||
matches the expected service level agreement for these applications. | ||||
4. Key aspects of overlay networks | 4. Key aspects of overlay networks | |||
The intent of this section is to highlight specific issues that | The intent of this section is to highlight specific issues that | |||
proposed overlay solutions need to address. | proposed overlay solutions need to address. | |||
4.1. Pros & Cons | 4.1. Pros & Cons | |||
An overlay network is a layer of virtual network topology on top of | An overlay network is a layer of virtual network topology on top of | |||
the physical network. | the physical network. | |||
skipping to change at page 18, line 35 | skipping to change at page 18, line 32 | |||
It is difficult to accurately evaluate network properties. It | It is difficult to accurately evaluate network properties. It | |||
might be preferable for the underlay network to expose usage | might be preferable for the underlay network to expose usage | |||
and performance information. | and performance information. | |||
- Miscommunication or lack of coordination between overlay and | - Miscommunication or lack of coordination between overlay and | |||
underlay networks can lead to an inefficient usage of network | underlay networks can lead to an inefficient usage of network | |||
resources. | resources. | |||
- When multiple overlays co-exist on top of a common underlay | - When multiple overlays co-exist on top of a common underlay | |||
network, the lack of coordination between overlays can lead | network, the lack of coordination between overlays can lead | |||
to performance issues and/or resource usage inefficiencies. | to performance issues and/or resource usage inefficiencies. | |||
- Traffic carried over an overlay may not traverse firewalls and | - Traffic carried over an overlay might fail to traverse | |||
NAT devices. | firewalls and NAT devices. | |||
- Multicast service scalability: Multicast support may be | - Multicast service scalability: Multicast support may be | |||
required in the underlay network to address tenant flood | required in the underlay network to address tenant flood | |||
containment or efficient multicast handling. The underlay may | containment or efficient multicast handling. The underlay may | |||
also be required to maintain multicast state on a per-tenant | also be required to maintain multicast state on a per-tenant | |||
basis, or even on a per-individual multicast flow of a given | basis, or even on a per-individual multicast flow of a given | |||
tenant. Ingress replication at the NVE eliminates that | tenant. Ingress replication at the NVE eliminates that | |||
additional multicast state in the underlay core, but depending | additional multicast state in the underlay core, but depending | |||
on the multicast traffic volume, it may cause inefficient use | on the multicast traffic volume, it may cause inefficient use | |||
of bandwidth. | of bandwidth. | |||
- Hash-based load balancing may not be optimal as the hash | ||||
algorithm may not work well due to the limited number of | ||||
combinations of tunnel source and destination addresses. Other | ||||
NVO3 mechanisms may use additional entropy information than | ||||
source and destination addresses. | ||||
4.2. Overlay issues to consider | 4.2. Overlay issues to consider | |||
4.2.1. Data plane vs Control plane driven | 4.2.1. Data plane vs Control plane driven | |||
In the case of an L2 NVE, it is possible to dynamically learn MAC | In the case of an L2 NVE, it is possible to dynamically learn MAC | |||
addresses against VAPs. It is also possible that such addresses be | addresses against VAPs. It is also possible that such addresses be | |||
known and controlled via management or a control protocol for both | known and controlled via management or a control protocol for both | |||
L2 NVEs and L3 NVEs. Dynamic data plane learning implies that | L2 NVEs and L3 NVEs. Dynamic data plane learning implies that | |||
flooding of unknown destinations be supported and hence implies that | flooding of unknown destinations be supported and hence implies that | |||
broadcast and/or multicast be supported or that ingress replication | broadcast and/or multicast be supported or that ingress replication | |||
skipping to change at page 20, line 44 | skipping to change at page 20, line 36 | |||
trees as opposed to dedicated multicast trees. | trees as opposed to dedicated multicast trees. | |||
4.2.4. Path MTU | 4.2.4. Path MTU | |||
When using overlay tunneling, an outer header is added to the | When using overlay tunneling, an outer header is added to the | |||
original frame. This can cause the MTU of the path to the egress | original frame. This can cause the MTU of the path to the egress | |||
tunnel endpoint to be exceeded. | tunnel endpoint to be exceeded. | |||
It is usually not desirable to rely on IP fragmentation for | It is usually not desirable to rely on IP fragmentation for | |||
performance reasons. Ideally, the interface MTU as seen by a Tenant | performance reasons. Ideally, the interface MTU as seen by a Tenant | |||
System is adjusted such that no fragmentation is needed. TCP will | System is adjusted such that no fragmentation is needed. | |||
adjust its maximum segment size accordingly. | ||||
It is possible for the MTU to be configured manually or to be | It is possible for the MTU to be configured manually or to be | |||
discovered dynamically. Various Path MTU discovery techniques exist | discovered dynamically. Various Path MTU discovery techniques exist | |||
in order to determine the proper MTU size to use: | in order to determine the proper MTU size to use: | |||
- Classical ICMP-based MTU Path Discovery [RFC1191] [RFC1981] | - Classical ICMP-based MTU Path Discovery [RFC1191] [RFC1981] | |||
- Tenant Systems rely on ICMP messages to discover the MTU of the | - Tenant Systems rely on ICMP messages to discover the MTU of the | |||
end-to-end path to its destination. This method is not always | end-to-end path to its destination. This method is not always | |||
possible, such as when traversing middle boxes (e.g. firewalls) | possible, such as when traversing middle boxes (e.g. firewalls) | |||
which disable ICMP for security reasons | which disable ICMP for security reasons | |||
- Extended MTU Path Discovery techniques such as defined in | - Extended MTU Path Discovery techniques such as defined in | |||
[RFC4821] | [RFC4821] | |||
- Tenant Systems rely on detection of receipt and loss of probe | - Tenant Systems send probe packets of different sizes, and rely | |||
packets at receivers and communication of that receipt/loss | on confirmation of receipt or lack thereof from receivers to | |||
information to senders in order to discover the MTU of the end- | allow a sender to discover the MTU of the end-to-end paths. | |||
to-end path to its destination | ||||
It is also possible to rely on the NVE to perform segmentation and | While it could also be possible to rely on the NVE to perform | |||
reassembly operations without relying on the Tenant Systems to know | segmentation and reassembly operations without relying on the Tenant | |||
about the end-to-end MTU. The assumption is that some hardware | Systems to know about the end-to-end MTU, this would lead to | |||
assist is available on the NVE node to perform such SAR operations. | undesired performance and congestion issues as well as significantly | |||
However, fragmentation by the NVE can lead to performance and | increase the complexity of hardware NVEs required for buffering and | |||
congestion issues due to TCP dynamics and might require new | reassembly logic. | |||
congestion avoidance mechanisms from the underlay network [FLOYD]. | ||||
Finally, the underlay network may be designed in such a way that the | Preferably, the underlay network should be designed in such a way | |||
MTU can accommodate the extra tunneling and possibly additional NVO3 | that the MTU can accommodate the extra tunneling and possibly | |||
header encapsulation overhead. | additional NVO3 header encapsulation overhead. | |||
4.2.5. NVE location trade-offs | 4.2.5. NVE location trade-offs | |||
In the case of DC traffic, traffic originated from a VM is native | In the case of DC traffic, traffic originated from a VM is native | |||
Ethernet traffic. This traffic can be switched by a local virtual | Ethernet traffic. This traffic can be switched by a local virtual | |||
switch or ToR switch and then by a DC gateway. The NVE function can | switch or ToR switch and then by a DC gateway. The NVE function can | |||
be embedded within any of these elements. | be embedded within any of these elements. | |||
There are several criteria to consider when deciding where the NVE | There are several criteria to consider when deciding where the NVE | |||
function should happen: | function should happen: | |||
skipping to change at page 22, line 42 | skipping to change at page 22, line 32 | |||
coordination in placing overlay demand on an underlay network, may | coordination in placing overlay demand on an underlay network, may | |||
be achieved by providing mechanisms to exchange performance and | be achieved by providing mechanisms to exchange performance and | |||
liveliness information between the underlay and overlay(s) or the | liveliness information between the underlay and overlay(s) or the | |||
use of such information by a coordination system. Such information | use of such information by a coordination system. Such information | |||
may include: | may include: | |||
- Performance metrics (throughput, delay, loss, jitter) | - Performance metrics (throughput, delay, loss, jitter) | |||
- Cost metrics | - Cost metrics | |||
such as defined in [RFC2330]. | ||||
5. Security Considerations | 5. Security Considerations | |||
Since NVEs and NVAs play a central role in NVO3, it is critical that | Since NVEs and NVAs play a central role in NVO3, it is critical that | |||
a secure access to NVEs and NVAs be ensured such that no | a secure access to NVEs and NVAs be ensured such that no | |||
unauthorized access is possible. | unauthorized access is possible. | |||
As discussed in section 3.1.5.2, Tenant Systems identification is | As discussed in section 3.1.5.2, Tenant Systems identification is | |||
based upon state that is often provided by management systems (e.g. | based upon state that is often provided by management systems (e.g. | |||
a VM orchestration system in a virtualized environment). Secure | a VM orchestration system in a virtualized environment). Secure | |||
access to such management systems must also be ensured. | access to such management systems must also be ensured. | |||
When an NVE receives data from a TS, the tenant identity needs to be | When an NVE receives data from a Tenant System, the tenant identity | |||
verified in order to guarantee that it is authorized to access the | needs to be verified in order to guarantee that it is authorized to | |||
corresponding VN. This can be achieved by identifying incoming | access the corresponding VN. This can be achieved by identifying | |||
packets against specific VAPs in some cases. In other circumstances, | incoming packets against specific VAPs in some cases. In other | |||
authentication may be necessary. | circumstances, authentication may be necessary. | |||
Data integrity can be assured if authorized access to NVEs, NVAs, | Data integrity can be assured if authorized access to NVEs, NVAs, | |||
and intermediate underlay nodes is ensured. Otherwise, encryption | and intermediate underlay nodes is ensured. Otherwise, encryption | |||
must be used. | must be used. | |||
NVO3 provides data confidentiality through data separation. The use | NVO3 provides data confidentiality through data separation. The use | |||
of both VNIs and tunneling of tenant traffic by NVEs ensures that | of both VNIs and tunneling of tenant traffic by NVEs ensures that | |||
NVO3 data is kept in a separate context and thus separated from | NVO3 data is kept in a separate context and thus separated from | |||
other tenant traffic. When NVO3 data traverses untrusted networks, | other tenant traffic. When NVO3 data traverses untrusted networks, | |||
data encryption may be needed. | data encryption may be needed. | |||
skipping to change at page 23, line 40 | skipping to change at page 23, line 31 | |||
information). | information). | |||
6. IANA Considerations | 6. IANA Considerations | |||
IANA does not need to take any action for this draft. | IANA does not need to take any action for this draft. | |||
7. References | 7. References | |||
7.1. Informative References | 7.1. Informative References | |||
[NVOPS] Narten, T. et al, "Problem Statement : Overlays for Network | [NVOPS] Narten, T. et al, "Problem Statement : Overlays for | |||
Virtualization", draft-narten-nvo3-overlay-problem- | Network Virtualization", draft-ietf-nvo3-overlay-problem- | |||
statement (work in progress) | statement (work in progress) | |||
[OF] Open Networking Foundation, "OpenFlow Switch Specification | ||||
v1.4.0" | ||||
[FLOYD] Sally Floyd, Allyn Romanow, "Dynamics of TCP Traffic over | ||||
ATM Networks", IEEE JSAC, V. 13 N. 4, May 1995 | ||||
[RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private | [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private | |||
Networks (VPNs)", RFC 4364, February 2006. | Networks (VPNs)", RFC 4364, February 2006. | |||
[RFC4761] Kompella, K. et al, "Virtual Private LAN Service (VPLS) | [RFC4761] Kompella, K. et al, "Virtual Private LAN Service (VPLS) | |||
Using BGP for auto-discovery and Signaling", RFC4761, | Using BGP for auto-discovery and Signaling", RFC4761, | |||
January 2007 | January 2007 | |||
[RFC4762] Lasserre, M. et al, "Virtual Private LAN Service (VPLS) | [RFC4762] Lasserre, M. et al, "Virtual Private LAN Service (VPLS) | |||
Using Label Distribution Protocol (LDP) Signaling", | Using Label Distribution Protocol (LDP) Signaling", | |||
RFC4762, January 2007 | RFC4762, January 2007 | |||
skipping to change at page 24, line 30 | skipping to change at page 24, line 14 | |||
[RFC1981] McCann, J. et al, "Path MTU Discovery for IPv6", RFC1981, | [RFC1981] McCann, J. et al, "Path MTU Discovery for IPv6", RFC1981, | |||
August 1996 | August 1996 | |||
[RFC4821] Mathis, M. et al, "Packetization Layer Path MTU | [RFC4821] Mathis, M. et al, "Packetization Layer Path MTU | |||
Discovery", RFC4821, March 2007 | Discovery", RFC4821, March 2007 | |||
[RFC6820] Narten, T. et al, "Address Resolution Problems in Large | [RFC6820] Narten, T. et al, "Address Resolution Problems in Large | |||
Data Center Networks", RFC6820, January 2013 | Data Center Networks", RFC6820, January 2013 | |||
[RFC2330] Paxson, V. et al, "Framework for IP Performance Metrics", | ||||
RFC2330, May 1998 | ||||
8. Acknowledgments | 8. Acknowledgments | |||
In addition to the authors the following people have contributed to | In addition to the authors the following people have contributed to | |||
this document: | this document: | |||
Dimitrios Stiliadis, Rotem Salomonovitch, Lucy Yong, Thomas Narten, | Dimitrios Stiliadis, Rotem Salomonovitch, Lucy Yong, Thomas Narten, | |||
Larry Kreeger, David Black. | Larry Kreeger, David Black. | |||
This document was prepared using 2-Word-v2.0.template.dot. | This document was prepared using 2-Word-v2.0.template.dot. | |||
End of changes. 21 change blocks. | ||||
58 lines changed or deleted | 46 lines changed or added | |||
This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ |
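
Illustrative note on Section 4.2.4 (Path MTU): the -07 text states that the Tenant System interface MTU should ideally be adjusted so that no fragmentation is needed, and that the underlay MTU should preferably accommodate the encapsulation overhead. The arithmetic behind both statements can be sketched as follows. This is a minimal sketch, not part of either draft. The drafts do not name an encapsulation, so the header sizes below (outer IPv4, UDP, an 8-byte overlay header, inner Ethernet header) are assumptions chosen only for illustration, and the names Encap, tenant_mtu, required_underlay_mtu, and needs_fragmentation are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Encap:
    """Per-packet bytes added by an overlay tunnel (all sizes are assumptions)."""
    outer_ip: int         # outer IP header, e.g. 20 for IPv4 without options
    outer_transport: int  # outer transport header, e.g. 8 for UDP
    overlay_header: int   # overlay-specific header carrying the VN context
    inner_ethernet: int   # inner Ethernet header carried inside the tunnel

    @property
    def overhead(self) -> int:
        # Total bytes the tunnel adds on top of the tenant's IP packet.
        return (self.outer_ip + self.outer_transport
                + self.overlay_header + self.inner_ethernet)


def tenant_mtu(underlay_mtu: int, encap: Encap) -> int:
    """Largest tenant IP packet that fits in the underlay without fragmentation."""
    mtu = underlay_mtu - encap.overhead
    if mtu <= 0:
        raise ValueError("underlay MTU is smaller than the encapsulation overhead")
    return mtu


def required_underlay_mtu(tenant_ip_mtu: int, encap: Encap) -> int:
    """Underlay MTU needed so tenants can keep the given IP MTU unchanged."""
    return tenant_ip_mtu + encap.overhead


def needs_fragmentation(tenant_packet_len: int, underlay_mtu: int, encap: Encap) -> bool:
    """True if the encapsulated packet would exceed the underlay MTU."""
    return tenant_packet_len + encap.overhead > underlay_mtu


# Hypothetical encapsulation: outer IPv4 + UDP + 8-byte overlay header + inner Ethernet.
example = Encap(outer_ip=20, outer_transport=8, overlay_header=8, inner_ethernet=14)

if __name__ == "__main__":
    print("overhead:", example.overhead)
    print("tenant MTU on a 1500-byte underlay:", tenant_mtu(1500, example))
    print("underlay MTU for 1500-byte tenants:", required_underlay_mtu(1500, example))
```

With these assumed sizes, either the Tenant System MTU is lowered by the overhead or the underlay MTU is raised by the same amount; the two functions are inverses of each other. A different encapsulation or an IPv6 outer header changes only the values passed to Encap.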