draft-ietf-spring-segment-routing-msdc-01.txt   draft-ietf-spring-segment-routing-msdc-02.txt 
Network Working Group C. Filsfils, Ed. Network Working Group C. Filsfils, Ed.
Internet-Draft S. Previdi, Ed. Internet-Draft S. Previdi, Ed.
Intended status: Informational Cisco Systems, Inc. Intended status: Informational Cisco Systems, Inc.
Expires: October 15, 2016 J. Mitchell Expires: April 6, 2017 J. Mitchell
Unaffiliated Unaffiliated
E. Aries E. Aries
Juniper Networks
P. Lapukhov P. Lapukhov
Facebook Facebook
April 13, 2016 October 3, 2016
BGP-Prefix Segment in large-scale data centers BGP-Prefix Segment in large-scale data centers
draft-ietf-spring-segment-routing-msdc-01 draft-ietf-spring-segment-routing-msdc-02
Abstract Abstract
This document describes the motivation and benefits for applying This document describes the motivation and benefits for applying
segment routing in the data-center. It describes the design to segment routing in the data-center. It describes the design to
deploy segment routing in the data-center, for both the MPLS and IPv6 deploy segment routing in the data-center, for both the MPLS and IPv6
dataplanes. dataplanes.
Requirements Language Requirements Language
skipping to change at page 1, line 44 skipping to change at page 1, line 45
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on October 15, 2016. This Internet-Draft will expire on April 6, 2017.
Copyright Notice Copyright Notice
Copyright (c) 2016 IETF Trust and the persons identified as the Copyright (c) 2016 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 3, line 26 skipping to change at page 3, line 26
instructions, called segments. A segment can represent any instructions, called segments. A segment can represent any
instruction, topological or service-based. A segment can have a instruction, topological or service-based. A segment can have a
local semantic to an SR node or global within an SR domain. SR local semantic to an SR node or global within an SR domain. SR
allows enforcing a flow through any topological path and service allows enforcing a flow through any topological path and service
chain while maintaining per-flow state only at the ingress node to chain while maintaining per-flow state only at the ingress node to
the SR domain. Segment Routing can be applied to the MPLS and IPv6 the SR domain. Segment Routing can be applied to the MPLS and IPv6
data-planes. data-planes.
The use-cases described in this document should be The use-cases described in this document should be
considered in the context of the BGP-based large-scale data-center considered in the context of the BGP-based large-scale data-center
(DC) design described in[I-D.ietf-rtgwg-bgp-routing-large-dc]We (DC) design described in [RFC7938]. We extend it by applying SR both
extend it by applying SR both with IPv6 and MPLS dataplane. with IPv6 and MPLS dataplane.
2. Large Scale Data Center Network Design Summary 2. Large Scale Data Center Network Design Summary
This section provides a brief summary of the informational document This section provides a brief summary of the informational document
[I-D.ietf-rtgwg-bgp-routing-large-dc] that outlines a practical [RFC7938] that outlines a practical network design suitable for data-
network design suitable for data-centers of various scales: centers of various scales:
o Data-center networks have highly symmetric topologies with o Data-center networks have highly symmetric topologies with
multiple parallel paths between two server attachment points. The multiple parallel paths between two server attachment points. The
well-known Clos topology is most popular among the operators. In well-known Clos topology is most popular among the operators. In
a Clos topology, the minimum number of parallel paths between two a Clos topology, the minimum number of parallel paths between two
elements is determined by the "width" of the middle stage. See elements is determined by the "width" of the middle stage. See
Figure 1 below for an illustration of the concept. Figure 1 below for an illustration of the concept.
o Large-scale data-centers commonly use a routing protocol, such as o Large-scale data-centers commonly use a routing protocol, such as
BGP4 [RFC4271] in order to provide endpoint connectivity. BGP4 [RFC4271] in order to provide endpoint connectivity.
skipping to change at page 7, line 15 skipping to change at page 7, line 15
For illustration purposes, when considering an MPLS data-plane, we For illustration purposes, when considering an MPLS data-plane, we
assume that the segment index allocated to prefix 192.0.2.x/32 is X. assume that the segment index allocated to prefix 192.0.2.x/32 is X.
As a result, a local label 1600x is allocated for prefix 192.0.2.x/32 As a result, a local label 1600x is allocated for prefix 192.0.2.x/32
by each node throughout the DC fabric. by each node throughout the DC fabric.
When IPv6 data-plane is considered, we assume that Node X is When IPv6 data-plane is considered, we assume that Node X is
allocated IPv6 address (segment) 2001:DB8::X. allocated IPv6 address (segment) 2001:DB8::X.
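The numbering convention above can be sketched in a few lines of Python. This is only an illustration, not part of either draft revision: the SRGB base of 16000 is an assumption inferred from the "1600x" example labels, and the helper names (prefix_label, node_loopback_v4, node_segment_v6) are hypothetical.

```python
import ipaddress

# Assumption: the SRGB starts at 16000 on every node, inferred from the
# "1600x" labels used in the text. It is not stated as a fixed value here.
SRGB_BASE = 16000


def prefix_label(index: int, srgb_base: int = SRGB_BASE) -> int:
    """Return the local MPLS label for a prefix segment index."""
    if index < 0:
        raise ValueError("segment index must be non-negative")
    return srgb_base + index


def node_loopback_v4(x: int) -> ipaddress.IPv4Network:
    """Return the illustrative IPv4 prefix 192.0.2.x/32 for Node x."""
    return ipaddress.ip_network(f"192.0.2.{x}/32")


def node_segment_v6(x: int) -> ipaddress.IPv6Address:
    """Return the illustrative IPv6 segment 2001:DB8::X for Node X.

    The node number is substituted textually into the last group, as in
    the text (Node11 maps to 2001:DB8::11).
    """
    return ipaddress.IPv6Address(f"2001:db8::{x}")


if __name__ == "__main__":
    for node in (5, 11):
        print(node, node_loopback_v4(node), prefix_label(node), node_segment_v6(node))
```

Because every node is assumed to use the same SRGB base, the same index yields the same label at every node in the fabric.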
4.2. eBGP Labeled Unicast (RFC3107) 4.2. eBGP Labeled Unicast (RFC3107)
Referring to Figure 1 and [[I-D.ietf-rtgwg-bgp-routing-large-dc], the Referring to Figure 1 and [RFC7938], the following design
following design modifications are introduced: modifications are introduced:
o Each node peers with its neighbors via an eBGP3107 session. o Each node peers with its neighbors via an eBGP3107 session.
o The forwarding plane at Tier-2 and Tier-1 is MPLS. o The forwarding plane at Tier-2 and Tier-1 is MPLS.
o The forwarding plane at Tier-3 is either IP2MPLS (if the host o The forwarding plane at Tier-3 is either IP2MPLS (if the host
sends IP traffic) or MPLS2MPLS (if the host sends MPLS- sends IP traffic) or MPLS2MPLS (if the host sends MPLS-
encapsulated traffic). encapsulated traffic).
Figure 2 zooms in on a path from server A to server Z within the Figure 2 zooms in on a path from server A to server Z within the
skipping to change at page 12, line 46 skipping to change at page 12, line 46
The control-plane behavior is mostly the same as described in the The control-plane behavior is mostly the same as described in the
previous section: the only difference is that the eBGP3107 path previous section: the only difference is that the eBGP3107 path
propagation is simply replaced by an iBGP3107 path reflection with propagation is simply replaced by an iBGP3107 path reflection with
next-hop changed to self. next-hop changed to self.
The data-plane tables are exactly the same. The data-plane tables are exactly the same.
5. Applying Segment Routing in the DC with IPv6 dataplane 5. Applying Segment Routing in the DC with IPv6 dataplane
The design described in I-D.ietf-rtgwg-bgp-routing-large-dc The design described in [RFC7938] is reused with one single
[I-D.ietf-rtgwg-bgp-routing-large-dc] is reused with one single
modification. We highlight it using the example of the reachability modification. We highlight it using the example of the reachability
to Node11 via spine switch Node5. to Node11 via spine switch Node5.
Spine5 originates 2001:DB8::5/128 with the attached BGP Prefix Spine5 originates 2001:DB8::5/128 with the attached BGP Prefix
Attribute advertising the support of the Segment Routing extension Attribute advertising the support of the Segment Routing extension
header (SRH, [I-D.ietf-6man-segment-routing-header]) for IPv6 packets header (SRH, [I-D.ietf-6man-segment-routing-header]) for IPv6 packets
destined to segment 2001:DB8::5. destined to segment 2001:DB8::5.
Tor11 originates 2001:DB8::11/128 with the attached BGP Prefix Tor11 originates 2001:DB8::11/128 with the attached BGP Prefix
Attribute advertising the support of the Segment Routing extension Attribute advertising the support of the Segment Routing extension
skipping to change at page 16, line 44 skipping to change at page 16, line 44
the data-center. For every host, it may send packets over each of the data-center. For every host, it may send packets over each of
the possible paths, knowing exactly which links and devices these the possible paths, knowing exactly which links and devices these
packets will be crossing. Correlating results for multiple packets will be crossing. Correlating results for multiple
destinations with the topological data, it may automatically isolate destinations with the topological data, it may automatically isolate
a possible problem to a link or device in the network. a possible problem to a link or device in the network.
8. Additional Benefits 8. Additional Benefits
8.1. MPLS Dataplane with operational simplicity 8.1. MPLS Dataplane with operational simplicity
As required by [I-D.ietf-rtgwg-bgp-routing-large-dc], no new As required by [RFC7938], no new signaling protocol is introduced.
signaling protocol is introduced. The Prefix Segment is a The Prefix Segment is a lightweight extension to BGP Labelled Unicast
lightweight extension to BGP Labelled Unicast (RFC3107 [RFC3107]). (RFC3107 [RFC3107]). It applies either to eBGP or iBGP based
It applies either to eBGP or iBGP based designs. designs.
Specifically, LDP and RSVP-TE are not used. These protocols would Specifically, LDP and RSVP-TE are not used. These protocols would
drastically impact the operational complexity of the Data Center and drastically impact the operational complexity of the Data Center and
would not scale. This is in line with the requirements expressed in would not scale. This is in line with the requirements expressed in
[I-D.ietf-rtgwg-bgp-routing-large-dc] [RFC7938].
A key element of the operational simplicity is the deployment of the A key element of the operational simplicity is the deployment of the
design with a single and consistent SRGB across the DC fabric. design with a single and consistent SRGB across the DC fabric.
At every node in the fabric, the same label is associated to a given At every node in the fabric, the same label is associated to a given
BGP prefix segment and hence a notion of global prefix segment BGP prefix segment and hence a notion of global prefix segment
arises. arises.
When a controller programs HostA to send traffic to HostZ via the When a controller programs HostA to send traffic to HostZ via the
normally available BGP ECMP paths, the controller uses label 16011 normally available BGP ECMP paths, the controller uses label 16011
associated with the ToR switch connected to the HostZ. The associated with the ToR switch connected to the HostZ. The
skipping to change at page 22, line 7 skipping to change at page 22, line 7
16.2. Informative References 16.2. Informative References
[GREENBERG09] [GREENBERG09]
Greenberg, A., Hamilton, J., Jain, N., Kadula, S., Kim, Greenberg, A., Hamilton, J., Jain, N., Kadula, S., Kim,
C., Lahiri, P., Maltz, D., Patel, P., and S. Sengupta, C., Lahiri, P., Maltz, D., Patel, P., and S. Sengupta,
"VL2: A Scalable and Flexible Data Center Network", 2009. "VL2: A Scalable and Flexible Data Center Network", 2009.
[I-D.ietf-6man-segment-routing-header] [I-D.ietf-6man-segment-routing-header]
Previdi, S., Filsfils, C., Field, B., Leung, I., Linkova, Previdi, S., Filsfils, C., Field, B., Leung, I., Linkova,
J., Kosugi, T., Vyncke, E., and D. Lebrun, "IPv6 Segment J., Aries, E., Kosugi, T., Vyncke, E., and D. Lebrun,
Routing Header (SRH)", draft-ietf-6man-segment-routing- "IPv6 Segment Routing Header (SRH)", draft-ietf-6man-
header-01 (work in progress), March 2016. segment-routing-header-02 (work in progress), September
2016.
[I-D.ietf-idr-bgp-prefix-sid] [I-D.ietf-idr-bgp-prefix-sid]
Previdi, S., Filsfils, C., Lindem, A., Patel, K., Previdi, S., Filsfils, C., Lindem, A., Patel, K.,
Sreekantiah, A., Ray, S., and H. Gredler, "Segment Routing Sreekantiah, A., Ray, S., and H. Gredler, "Segment Routing
Prefix SID extensions for BGP", draft-ietf-idr-bgp-prefix- Prefix SID extensions for BGP", draft-ietf-idr-bgp-prefix-
sid-02 (work in progress), December 2015. sid-03 (work in progress), June 2016.
[I-D.ietf-mpls-seamless-mpls] [I-D.ietf-mpls-seamless-mpls]
Leymann, N., Decraene, B., Filsfils, C., Konstantynowicz, Leymann, N., Decraene, B., Filsfils, C., Konstantynowicz,
M., and D. Steinberg, "Seamless MPLS Architecture", draft- M., and D. Steinberg, "Seamless MPLS Architecture", draft-
ietf-mpls-seamless-mpls-07 (work in progress), June 2014. ietf-mpls-seamless-mpls-07 (work in progress), June 2014.
[I-D.ietf-rtgwg-bgp-routing-large-dc]
Lapukhov, P., Premji, A., and J. Mitchell, "Use of BGP for
routing in large-scale data centers", draft-ietf-rtgwg-
bgp-routing-large-dc-09 (work in progress), March 2016.
[I-D.ietf-spring-segment-routing] [I-D.ietf-spring-segment-routing]
Filsfils, C., Previdi, S., Decraene, B., Litkowski, S., Filsfils, C., Previdi, S., Decraene, B., Litkowski, S.,
and R. Shakir, "Segment Routing Architecture", draft-ietf- and R. Shakir, "Segment Routing Architecture", draft-ietf-
spring-segment-routing-07 (work in progress), December spring-segment-routing-09 (work in progress), July 2016.
2015.
[I-D.ietf-spring-segment-routing-central-epe] [I-D.ietf-spring-segment-routing-central-epe]
Filsfils, C., Previdi, S., Ginsburg, D., and D. Afanasiev, Filsfils, C., Previdi, S., Aries, E., Ginsburg, D., and D.
"Segment Routing Centralized BGP Peer Engineering", draft- Afanasiev, "Segment Routing Centralized BGP Peer
ietf-spring-segment-routing-central-epe-01 (work in Engineering", draft-ietf-spring-segment-routing-central-
progress), March 2016. epe-02 (work in progress), September 2016.
[KANDULA04] [KANDULA04]
Sinha, S., Kandula, S., and D. Katabi, "Harnessing TCP's Sinha, S., Kandula, S., and D. Katabi, "Harnessing TCP's
Burstiness with Flowlet Switching", 2004. Burstiness with Flowlet Switching", 2004.
[RFC7938] Lapukhov, P., Premji, A., and J. Mitchell, Ed., "Use of
BGP for Routing in Large-Scale Data Centers", RFC 7938,
DOI 10.17487/RFC7938, August 2016,
<http://www.rfc-editor.org/info/rfc7938>.
Authors' Addresses Authors' Addresses
Clarence Filsfils (editor) Clarence Filsfils (editor)
Cisco Systems, Inc. Cisco Systems, Inc.
Brussels Brussels
BE BE
Email: cfilsfil@cisco.com Email: cfilsfil@cisco.com
Stefano Previdi (editor) Stefano Previdi (editor)
Cisco Systems, Inc. Cisco Systems, Inc.
skipping to change at page 23, line 18 skipping to change at page 23, line 18
Italy Italy
Email: sprevidi@cisco.com Email: sprevidi@cisco.com
Jon Mitchell Jon Mitchell
Unaffiliated Unaffiliated
Email: jrmitche@puck.nether.net Email: jrmitche@puck.nether.net
Ebben Aries Ebben Aries
Facebook Juniper Networks
1133 Innovation Way
Sunnyvale CA 94089
US US
Email: exa@fb.com Email: exa@juniper.net
Petr Lapukhov Petr Lapukhov
Facebook Facebook
US US
Email: petr@fb.com Email: petr@fb.com
 End of changes. 19 change blocks. 
34 lines changed or deleted 37 lines changed or added
