draft-ietf-rtgwg-cl-requirement-04.txt   draft-ietf-rtgwg-cl-requirement-05.txt 
RTGWG C. Villamizar, Ed. RTGWG C. Villamizar, Ed.
Internet-Draft Infinera Corporation Internet-Draft OCCNC, LLC
Intended status: Informational D. McDysan, Ed. Intended status: Informational D. McDysan, Ed.
Expires: September 15, 2011 S. Ning Expires: August 2, 2012 S. Ning
A. Malis A. Malis
Verizon Verizon
L. Yong L. Yong
Huawei USA Huawei USA
March 14, 2011 January 30, 2012
Requirements for MPLS Over a Composite Link Requirements for MPLS Over a Composite Link
draft-ietf-rtgwg-cl-requirement-04 draft-ietf-rtgwg-cl-requirement-05
Abstract Abstract
There is often a need to provide large aggregates of bandwidth that There is often a need to provide large aggregates of bandwidth that
are best provided using parallel links between routers or MPLS LSR. are best provided using parallel links between routers or MPLS LSR.
In core networks there is often no alternative since the aggregate In core networks there is often no alternative since the aggregate
capacities of core networks today far exceed the capacity of a single capacities of core networks today far exceed the capacity of a single
physical link or single packet processing element. physical link or single packet processing element.
The presence of parallel links, with each link potentially comprised The presence of parallel links, with each link potentially comprised
skipping to change at page 2, line 4 skipping to change at page 2, line 4
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 15, 2011. This Internet-Draft will expire on August 2, 2012.
Copyright Notice Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
skipping to change at page 3, line 16 skipping to change at page 3, line 16
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1. Requirements Language . . . . . . . . . . . . . . . . . . 4 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 4
2. Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . 4
3. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 4 3. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 4
4. Network Operator Functional Requirements . . . . . . . . . . . 5 4. Network Operator Functional Requirements . . . . . . . . . . . 5
4.1. Availability, Stability and Transient Response . . . . . . 5 4.1. Availability, Stability and Transient Response . . . . . . 5
4.2. Component Links Provided by Lower Layer Networks . . . . . 6 4.2. Component Links Provided by Lower Layer Networks . . . . . 6
4.3. Parallel Component Links with Different Characteristics . 7 4.3. Parallel Component Links with Different Characteristics . 7
5. Derived Requirements . . . . . . . . . . . . . . . . . . . . . 9 5. Derived Requirements . . . . . . . . . . . . . . . . . . . . . 9
6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 10 6. Management Requirements . . . . . . . . . . . . . . . . . . . 10
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 11
8. Security Considerations . . . . . . . . . . . . . . . . . . . 10 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 11 9. Security Considerations . . . . . . . . . . . . . . . . . . . 11
9.1. Normative References . . . . . . . . . . . . . . . . . . . 11 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 11
9.2. Informative References . . . . . . . . . . . . . . . . . . 11 10.1. Normative References . . . . . . . . . . . . . . . . . . . 11
9.3. Appendix References . . . . . . . . . . . . . . . . . . . 12 10.2. Informative References . . . . . . . . . . . . . . . . . . 12
Appendix A. More Details on Existing Network Operator 10.3. Appendix References . . . . . . . . . . . . . . . . . . . 13
Practices and Protocol Usage . . . . . . . . . . . . 13 Appendix A. Existing Network Operator Practices and Protocol
Appendix B. Existing Multipath Standards and Techniques . . . . . 15 Usage . . . . . . . . . . . . . . . . . . . . . . . . 14
B.1. Common Multipath Load Splitting Techniques . . . . . . . . 16 Appendix B. Existing Multipath Standards and Techniques . . . . . 14
B.2. Simple and Adaptive Load Balancing Multipath . . . . . . . 17
B.3. Traffic Split over Parallel Links . . . . . . . . . . . . 18
B.4. Traffic Split over Multiple Paths . . . . . . . . . . . . 18
Appendix C. ITU-T G.800 Composite Link Definitions and Appendix C. ITU-T G.800 Composite Link Definitions and
Terminology . . . . . . . . . . . . . . . . . . . . . 18 Terminology . . . . . . . . . . . . . . . . . . . . . 14
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 19 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 15
1. Introduction 1. Introduction
The purpose of this document is to describe why network operators The purpose of this document is to describe why network operators
require certain functions in order to solve certain business problems require certain functions in order to solve certain business problems
(Section 2). The intent is to first describe why things need to be (Section 2). The intent is to first describe why things need to be
done in terms of functional requirements that are as independent as done in terms of functional requirements that are as independent as
possible of protocol specifications (Section 4). For certain possible of protocol specifications (Section 4). For certain
functional requirements this document describes a set of derived functional requirements this document describes a set of derived
protocol requirements (Section 5). Three appendices provide protocol requirements (Section 5). Three appendices provide
skipping to change at page 7, line 42 skipping to change at page 7, line 42
"preempted LSPs" may not be restored if there is no "preempted LSPs" may not be restored if there is no
uncongested path in the network. uncongested path in the network.
4.3. Parallel Component Links with Different Characteristics 4.3. Parallel Component Links with Different Characteristics
Corresponding to Case 1 of [ITU-T.G.800], as one means to provide Corresponding to Case 1 of [ITU-T.G.800], as one means to provide
high availability, network operators deploy a topology in the MPLS high availability, network operators deploy a topology in the MPLS
network using lower layer networks that have a certain degree of network using lower layer networks that have a certain degree of
diversity at the lower layer(s). Many techniques have been developed diversity at the lower layer(s). Many techniques have been developed
to balance the distribution of flows across component links that to balance the distribution of flows across component links that
connect the same pair of nodes (See Appendix B.3). When the path for connect the same pair of nodes. When the path for a flow can be
a flow can be chosen from a set of candidate nodes connected via chosen from a set of candidate nodes connected via composite links,
composite links, other techniques have been developed (See other techniques have been developed.
Appendix B.4).
FR#11 The solution SHALL measure traffic on a labeled traffic flow FR#11 The solution SHALL measure traffic on a labeled traffic flow
and dynamically select the component link on which to place and dynamically select the component link on which to place
this flow in order to balance the load so that no component this flow in order to balance the load so that no component
link in the composite link between a pair of nodes is link in the composite link between a pair of nodes is
overloaded. overloaded.
FR#12 When a traffic flow is moved from one component link to FR#12 When a traffic flow is moved from one component link to
another in the same composite link between a set of nodes (or another in the same composite link between a set of nodes (or
sites), it MUST be done so in a minimally disruptive manner. sites), it MUST be done so in a minimally disruptive manner.
skipping to change at page 9, line 14 skipping to change at page 9, line 14
FR#20 The solution SHOULD support the use case where a composite FR#20 The solution SHOULD support the use case where a composite
link itself is a component link for a higher order composite link itself is a component link for a higher order composite
link. For example, a composite link comprised of MPLS-TP bi- link. For example, a composite link comprised of MPLS-TP bi-
directional tunnels viewed as logical links could then be used directional tunnels viewed as logical links could then be used
as a component link in yet another composite link that as a component link in yet another composite link that
connects MPLS routers. connects MPLS routers.
FR#21 The solution MUST support an optional means for LSP signaling FR#21 The solution MUST support an optional means for LSP signaling
to bind an LSP to a particular component link within a to bind an LSP to a particular component link within a
composite link. If this option is applied to a bidirectional composite link. If this option is not exercised, then an LSP
LSP, both directions of the LSP are bound to the component. that is bound to a composite link may be bound to any
If this option is not exercised, then an LSP that is bound to component link matching all other signaled requirements, and
a composite link may be bound to any component link matching different directions of a bidirectional LSP can be bound to
all other signaled requirements, and different directions of a different component links.
bidirectional LSP can be bound to different component links.
FR#22 The solution MUST support a means to indicate that both
directions of a co-routed bidirectional LSP MUST be bound to the
same component link.
5. Derived Requirements 5. Derived Requirements
This section takes the next step and derives high-level requirements This section takes the next step and derives high-level requirements
on protocol specification from the functional requirements. on protocol specification from the functional requirements.
DR#1 The solution SHOULD attempt to extend existing protocols DR#1 The solution SHOULD attempt to extend existing protocols
wherever possible, developing a new protocol only if this adds wherever possible, developing a new protocol only if this adds
a significant set of capabilities. a significant set of capabilities.
The vast majority of network operators have provisioned L3VPN
services over LDP. Many have deployed L2VPN services over LDP
as well. TE extensions to IGP and RSVP-TE are viewed as being
overly complex by some operators.
DR#2 A solution SHOULD extend LDP capabilities to meet functional DR#2 A solution SHOULD extend LDP capabilities to meet functional
requirements (without using TE methods as decided in requirements (without using TE methods as decided in
[RFC3468]). [RFC3468]).
DR#3 Coexistence of LDP and RSVP-TE signaled LSPs MUST be supported DR#3 Coexistence of LDP and RSVP-TE signaled LSPs MUST be supported
on a composite link. Other functional requirements should be on a composite link. Other functional requirements should be
supported as independently of signaling protocol as possible. supported as independently of signaling protocol as possible.
DR#4 When the nodes connected via a composite link are in the same DR#4 When the nodes connected via a composite link are in the same
MPLS network topology, the solution MAY define extensions to MPLS network topology, the solution MAY define extensions to
skipping to change at page 10, line 25 skipping to change at page 10, line 25
component link characteristics (e.g., latency, up/down state), component link characteristics (e.g., latency, up/down state),
and/or bandwidth utilization. and/or bandwidth utilization.
DR#7 When a worst case failure scenario occurs, the number of DR#7 When a worst case failure scenario occurs, the number of
RSVP-TE LSPs to be resignaled will cause a period of RSVP-TE LSPs to be resignaled will cause a period of
unavailability as perceived by users. The resignaling time of unavailability as perceived by users. The resignaling time of
the solution MUST meet the NPO objective for the duration of the solution MUST meet the NPO objective for the duration of
unavailability. The resignaling time of the solution MUST NOT unavailability. The resignaling time of the solution MUST NOT
increase significantly as compared with current methods. increase significantly as compared with current methods.
6. Acknowledgements 6. Management Requirements
MR#1 Management Plane MUST support polling of the status and
configuration of a composite link and its individual component
links, and support notification of status change.
MR#2 Management Plane MUST be able to activate or de-activate any
component link in a composite link in order to facilitate
operation maintenance tasks. The routers at each end of a
composite link MUST redistribute traffic to move traffic from a
de-activated link to other component links based on the traffic
flow TE criteria.
MR#3 Management Plane MUST be able to configure an LSP over a
composite link and be able to select a component link for the
LSP.
MR#4 Management Plane MUST be able to trace which component link an
LSP is assigned to and monitor individual component link and
composite link performance.
MR#5 Management Plane MUST be able to verify connectivity over each
individual component link within a composite link.
MR#6 Management Plane SHOULD provide the means for an operator to
initiate an optimization process.
7. Acknowledgements
Frederic Jounay of France Telecom and Yuji Kamite of NTT Frederic Jounay of France Telecom and Yuji Kamite of NTT
Communications Corporation co-authored a version of this document. Communications Corporation co-authored a version of this document.
A rewrite of this document occurred after the IETF77 meeting. A rewrite of this document occurred after the IETF77 meeting.
Dimitri Papadimitriou, Lou Berger, Tony Li, the WG chairs John Scudder Dimitri Papadimitriou, Lou Berger, Tony Li, the WG chairs John Scudder
and Alex Zinin, and others provided valuable guidance prior to and at and Alex Zinin, and others provided valuable guidance prior to and at
the IETF77 RTGWG meeting. the IETF77 RTGWG meeting.
Tony Li and John Drake have made numerous valuable comments on the Tony Li and John Drake have made numerous valuable comments on the
RTGWG mailing list that are reflected in versions following the RTGWG mailing list that are reflected in versions following the
IETF77 meeting. IETF77 meeting.
7. IANA Considerations 8. IANA Considerations
This memo includes no request to IANA. This memo includes no request to IANA.
8. Security Considerations 9. Security Considerations
This document specifies a set of requirements. The requirements This document specifies a set of requirements. The requirements
themselves do not pose a security threat. If these requirements are themselves do not pose a security threat. If these requirements are
met using MPLS signaling as commonly practiced today with met using MPLS signaling as commonly practiced today with
authenticated but unencrypted OSPF-TE, ISIS-TE, and RSVP-TE or LDP, authenticated but unencrypted OSPF-TE, ISIS-TE, and RSVP-TE or LDP,
then the requirement to provide additional information in this then the requirement to provide additional information in this
communication presents additional information that could conceivably communication presents additional information that could conceivably
be gathered in a man-in-the-middle confidentiality breach. Such an be gathered in a man-in-the-middle confidentiality breach. Such an
attack would require a capability to monitor this signaling either attack would require a capability to monitor this signaling either
through a provider breach or access to provider physical transmission through a provider breach or access to provider physical transmission
infrastructure. A provider breach already poses a threat of numerous infrastructure. A provider breach already poses a threat of numerous
types of attacks which are of far more serious consequence. Encryption types of attacks which are of far more serious consequence. Encryption
of the signaling can prevent or render more difficult any of the signaling can prevent or render more difficult any
confidentiality breach that otherwise might occur by means of access confidentiality breach that otherwise might occur by means of access
to provider physical transmission infrastructure. to provider physical transmission infrastructure.
9. References 10. References
9.1. Normative References 10.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
9.2. Informative References 10.2. Informative References
[I-D.ietf-l2vpn-vpms-frmwk-requirements] [I-D.ietf-l2vpn-vpms-frmwk-requirements]
Kamite, Y., JOUNAY, F., Niven-Jenkins, B., Brungard, D., Kamite, Y., JOUNAY, F., Niven-Jenkins, B., Brungard, D.,
and L. Jin, "Framework and Requirements for Virtual and L. Jin, "Framework and Requirements for Virtual
Private Multicast Service (VPMS)", Private Multicast Service (VPMS)",
draft-ietf-l2vpn-vpms-frmwk-requirements-03 (work in draft-ietf-l2vpn-vpms-frmwk-requirements-03 (work in
progress), July 2010. progress), July 2010.
[ITU-T.G.800] [ITU-T.G.800]
ITU-T, "Unified functional architecture of transport ITU-T, "Unified functional architecture of transport
skipping to change at page 12, line 34 skipping to change at page 13, line 16
[RFC4797] Rekhter, Y., Bonica, R., and E. Rosen, "Use of Provider [RFC4797] Rekhter, Y., Bonica, R., and E. Rosen, "Use of Provider
Edge to Provider Edge (PE-PE) Generic Routing Edge to Provider Edge (PE-PE) Generic Routing
Encapsulation (GRE) or IP in BGP/MPLS IP Virtual Private Encapsulation (GRE) or IP in BGP/MPLS IP Virtual Private
Networks", RFC 4797, January 2007. Networks", RFC 4797, January 2007.
[RFC5254] Bitar, N., Bocci, M., and L. Martini, "Requirements for [RFC5254] Bitar, N., Bocci, M., and L. Martini, "Requirements for
Multi-Segment Pseudowire Emulation Edge-to-Edge (PWE3)", Multi-Segment Pseudowire Emulation Edge-to-Edge (PWE3)",
RFC 5254, October 2008. RFC 5254, October 2008.
9.3. Appendix References 10.3. Appendix References
[I-D.ietf-pwe3-fat-pw] [I-D.ietf-pwe3-fat-pw]
Bryant, S., Filsfils, C., Drafz, U., Kompella, V., Regan, Bryant, S., Filsfils, C., Drafz, U., Kompella, V., Regan,
J., and S. Amante, "Flow Aware Transport of Pseudowires J., and S. Amante, "Flow Aware Transport of Pseudowires
over an MPLS PSN", draft-ietf-pwe3-fat-pw-03 (work in over an MPLS PSN", draft-ietf-pwe3-fat-pw-03 (work in
progress), January 2010. progress), January 2010.
[IEEE-802.1AX]
IEEE Standards Association, "IEEE Std 802.1AX-2008 IEEE
Standard for Local and Metropolitan Area Networks - Link
Aggregation", 2006, <http://standards.ieee.org/getieee802/
download/802.1AX-2008.pdf>.
[ITU-T.Y.1540]
ITU-T, "Internet protocol data communication service - IP
packet transfer and availability performance parameters",
2007, <http://www.itu.int/rec/T-REC-Y.1540/en>.
[ITU-T.Y.1541]
ITU-T, "Network performance objectives for IP-based
services", 2006, <http://www.itu.int/rec/T-REC-Y.1541/en>.
[RFC1717] Sklower, K., Lloyd, B., McGregor, G., and D. Carr, "The [RFC1717] Sklower, K., Lloyd, B., McGregor, G., and D. Carr, "The
PPP Multilink Protocol (MP)", RFC 1717, November 1994. PPP Multilink Protocol (MP)", RFC 1717, November 1994.
[RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., [RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.,
and W. Weiss, "An Architecture for Differentiated and W. Weiss, "An Architecture for Differentiated
Services", RFC 2475, December 1998. Services", RFC 2475, December 1998.
[RFC2615] Malis, A. and W. Simpson, "PPP over SONET/SDH", RFC 2615, [RFC2615] Malis, A. and W. Simpson, "PPP over SONET/SDH", RFC 2615,
June 1999. June 1999.
skipping to change at page 13, line 42 skipping to change at page 14, line 9
Internet Protocol", RFC 4301, December 2005. Internet Protocol", RFC 4301, December 2005.
[RFC4385] Bryant, S., Swallow, G., Martini, L., and D. McPherson, [RFC4385] Bryant, S., Swallow, G., Martini, L., and D. McPherson,
"Pseudowire Emulation Edge-to-Edge (PWE3) Control Word for "Pseudowire Emulation Edge-to-Edge (PWE3) Control Word for
Use over an MPLS PSN", RFC 4385, February 2006. Use over an MPLS PSN", RFC 4385, February 2006.
[RFC4928] Swallow, G., Bryant, S., and L. Andersson, "Avoiding Equal [RFC4928] Swallow, G., Bryant, S., and L. Andersson, "Avoiding Equal
Cost Multipath Treatment in MPLS Networks", BCP 128, Cost Multipath Treatment in MPLS Networks", BCP 128,
RFC 4928, June 2007. RFC 4928, June 2007.
Appendix A. More Details on Existing Network Operator Practices and Appendix A. Existing Network Operator Practices and Protocol Usage
Protocol Usage
Often, network operators have a contractual Service Level Agreement
(SLA) with customers for services that are comprised of numerical
values for performance measures, principally availability, latency,
and delay variation. Additionally, network operators may have a
Service Level Specification (SLS) that is for internal use by the
operator. See [ITU-T.Y.1540], [ITU-T.Y.1541], and Section 4.9 of
[RFC3809]
for examples of the form of such SLA and SLS specifications. In this
document we use the term Network Performance Objective (NPO) as
defined in section 5 of [ITU-T.Y.1541] since the SLA and SLS measures
have network operator and service specific implications. Note that
the numerical NPO values of Y.1540 and Y.1541 span multiple networks
and may be looser than network operator SLA or SLS objectives.
Applications and acceptable user experience have an important
relationship to these performance parameters.
Consider latency as an example. In some cases, minimizing latency
relates directly to the best customer experience (e.g., in TCP closer
is faster). In other cases, user experience is relatively insensitive
to latency, up to a specific limit at which point user perception of
quality degrades significantly (e.g., interactive human voice and
multimedia conferencing). A number of NPOs have a bound on
point-to-point latency, and as long as this bound is met, the NPO is
met -- decreasing the latency is not necessary. In some NPOs, if the
specified latency is not met, the user considers the service as
unavailable. An unprotected LSP can be manually provisioned on a set
of links to meet this type of NPO, but this lowers availability since
an alternate route that meets the latency NPO cannot be determined.
Historically, when an IP/MPLS network was operated over a lower layer
circuit switched network (e.g., SONET rings), a change in latency
caused by the lower layer network (e.g., due to a maintenance action
or failure) was not known to the MPLS network. This resulted in
latency affecting end user experience, sometimes violating NPOs or
resulting in user complaints.
A response to this problem was to provision IP/MPLS networks over
unprotected circuits and set the metric and/or TE-metric proportional
to latency. This resulted in traffic being directed over the least
latency path, even if this was not needed to meet an NPO or meet user
experience objectives. This results in reduced flexibility and
increased cost for network operators. Using lower layer networks to
provide restoration and grooming is expected to be more efficient,
but the inability to communicate performance parameters, in
particular latency, from the lower layer network to the higher layer
network is an important problem to be solved before this can be done.
Latency NPOs for pt-pt services are often tied closely to geographic
locations, while latency for multipoint services may be based upon a
worst case within a region.
Section 7 of [ITU-T.Y.1540] defines availability for an IP service in
terms of loss exceeding a threshold for a period on the order of 5
minutes. However, the timeframes for restoration (i.e., as
implemented by pre-determined protection, convergence of routing
protocols and/or signaling) for services range from on the order of
100 ms or less (e.g., for VPWS to emulate classical SDH/SONET
protection switching), to several minutes (e.g., to allow BGP to
reconverge for L3VPN) and may differ among the set of customers
within a single service.
The presence of only three Traffic Class (TC) bits (previously known
as EXP bits) in the MPLS shim header is limiting when a network
operator needs to support QoS classes for multiple services (e.g.,
L2VPN VPWS, VPLS, L3VPN and Internet), each of which has a set of QoS
classes that need to be supported. In some cases one bit is used to
indicate conformance to some ingress traffic classification, leaving
only two bits for indicating the service QoS classes. The approach
that has been taken is to aggregate these QoS classes into similar
sets on LER-LSR and LSR-LSR links.
Labeled LSPs and the use of link layer encapsulation have been
standardized in order to provide a means to meet these needs.
The IP DSCP cannot be used for flow identification since RFC 4301
Section 5.5 [RFC4301] requires Diffserv transparency, and in general
network operators do not rely on the DSCP of Internet packets.
Pushing a label onto Internet packets when they are carried along
with L2/L3VPN packets on the same link or lower layer network
provides a means to distinguish between the QoS classes for these
packets.
Operating an MPLS-TE network involves a different paradigm from
operating an IGP metric-based LDP signaled MPLS network. The mpt-pt
LDP signaled MPLS LSPs occur automatically, and balancing across
parallel links occurs if the IGP metrics are set "equally" (with
equality a locally definable relation).
Traffic is typically comprised of a few large (some very large) flows The network operator practices appendix has been moved to a separate
and many small flows. In some cases, separate LSPs are established document. When that document has an XML I-D tag the references to
for very large flows. This can occur even if the IP header this appendix will be changed to that document and this appendix will
information is inspected by a router, for example an IPsec tunnel be deleted.
that carries a large amount of traffic. An important example of
large flows is that of a L2/L3 VPN customer who has an access line
bandwidth comparable to a client-client composite link bandwidth --
there could be flows that are on the order of the access line
bandwidth.
Appendix B. Existing Multipath Standards and Techniques Appendix B. Existing Multipath Standards and Techniques
Today the requirement to handle large aggregations of traffic, much The multipath standards and techniques appendix has been moved to a
larger than a single component link, can be handled by a number of separate document. When that document has an XML I-D tag the
techniques which we will collectively call multipath. Multipath references to this appendix will be changed to that document and this
applied to parallel links between the same set of nodes includes appendix will be deleted.
Ethernet Link Aggregation [IEEE-802.1AX], link bundling [RFC4201], or
other aggregation techniques some of which may be vendor specific.
Multipath applied to diverse paths rather than parallel links
includes Equal Cost MultiPath (ECMP) as applied to OSPF, ISIS, or
even BGP, and equal cost LSP, as described in Appendix B.4. Various
multipath techniques have strengths and weaknesses.
The term composite link is more general than terms such as link
aggregate which is generally considered to be specific to Ethernet
and its use here is consistent with the broad definition in
[ITU-T.G.800]. The term multipath excludes inverse multiplexing and
refers to techniques which only solve the problem of large
aggregations of traffic, without addressing the other requirements
outlined in this document.
B.1. Common Multipath Load Splitting Techniques
Identical load balancing techniques are used for multipath both over
parallel links and over diverse paths.
Large aggregates of IP traffic do not provide explicit signaling to
indicate the expected traffic loads. Large aggregates of MPLS
traffic are carried in MPLS tunnels supported by MPLS LSP. LSP which
are signaled using RSVP-TE extensions do provide explicit signaling
which includes the expected traffic load for the aggregate. LSP
which are signaled using LDP do not provide an expected traffic load.
MPLS LSP may contain other MPLS LSP arranged hierarchically. When an
MPLS LSR serves as a midpoint LSR in an LSP carrying other LSP as
payload, there is no signaling associated with these inner LSP.
Therefore even when using RSVP-TE signaling there may be insufficient
information provided by signaling to adequately distribute load
across a composite link.
Generally a set of label stack entries that is unique across the
ordered set of label numbers can safely be assumed to contain a group
of flows. The reordering of traffic can therefore be considered to
be acceptable unless reordering occurs within traffic containing a
common unique set of label stack entries. Existing load splitting
techniques take advantage of this property in addition to looking
beyond the bottom of the label stack and determining if the payload
is IPv4 or IPv6 to load balance traffic accordingly.
MPLS-TP OAM violates the assumption that it is safe to reorder
traffic within an LSP. If MPLS-TP OAM is to be accommodated, then
existing multipath techniques must be modified. Such modifications
are outside the scope of this document.
For example, a large aggregate of IP traffic may be subdivided into a
large number of groups of flows using a hash on the IP source and
destination addresses. This is as described in [RFC2475] and
clarified in [RFC3260]. For MPLS traffic carrying IP, a similar hash
can be performed on the set of labels in the label stack. These
techniques are both examples of means to subdivide traffic into
groups of flows for the purpose of load balancing traffic across
aggregated link capacity. The means of identifying a flow should not
be confused with the definition of a flow.
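The two ways of identifying groups of flows described above can be
sketched as follows. This is a minimal illustration only: the function
names are hypothetical, and CRC32 is used purely as a stand-in hash,
since the choice of hashing algorithm is outside the scope of this
document.

```python
import ipaddress
import zlib
from typing import Sequence


def flow_group_from_ip(src: str, dst: str, num_groups: int) -> int:
    """Map an IP source/destination address pair to a group of flows.

    Works for IPv4 and IPv6 alike. CRC32 is an assumption made for
    illustration, not a recommended or specified hash.
    """
    key = ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed
    return zlib.crc32(key) % num_groups


def flow_group_from_labels(label_stack: Sequence[int], num_groups: int) -> int:
    """Map an ordered MPLS label stack to a group of flows.

    Each label is a 20-bit value; it is packed into 3 bytes so that the
    order of the labels in the stack is preserved in the hash key.
    """
    key = b"".join(label.to_bytes(3, "big") for label in label_stack)
    return zlib.crc32(key) % num_groups
```

Packets with the same address pair, or the same ordered label stack,
always map to the same group, so traffic within one group is not
reordered as long as the group stays on one component link.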
Discussion of whether a hash based approach provides a sufficiently
even load balance using any particular hashing algorithm or method of
distributing traffic across a set of component links is outside of
the scope of this document.
The current load balancing techniques are referenced in [RFC4385] and
[RFC4928]. The use of three hash based approaches is described in
[RFC2991] and [RFC2992]. A mechanism to identify flows within PW is
described in [I-D.ietf-pwe3-fat-pw]. The use of hash based
approaches is mentioned as an example of an existing set of
techniques to distribute traffic over a set of component links.
Other techniques are not precluded.
B.2. Simple and Adaptive Load Balancing Multipath
Simple multipath generally relies on the mathematical probability
that given a very large number of small microflows, these microflows
will tend to be distributed evenly across a hash space. A common
simple multipath implementation assumes that all members (component
links) are of equal capacity and performs a modulo operation on
the hashed value. An alternate simple multipath technique uses a
table, generally with a power of two size, and distributes the table
entries proportionally among members according to the capacity of
each member.
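The two simple multipath techniques described above might look like
the following sketch. The function names, the default table size of
256, and the largest-remainder rounding rule are assumptions made for
illustration; implementations differ in these details.

```python
def pick_member_modulo(hash_value: int, num_members: int) -> int:
    """Equal-capacity members: select a member by hash modulo member count."""
    return hash_value % num_members


def build_table(capacities, table_size=256):
    """Assign table entries to members in proportion to member capacity.

    Integer largest-remainder rounding (an assumption of this sketch)
    ensures that exactly table_size entries are assigned.
    """
    if table_size <= 0 or table_size & (table_size - 1):
        raise ValueError("table_size must be a power of two")
    total = sum(capacities)
    counts = [c * table_size // total for c in capacities]
    remainders = [c * table_size % total for c in capacities]
    leftover = table_size - sum(counts)
    by_remainder = sorted(range(len(capacities)),
                          key=lambda i: remainders[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    table = []
    for member, n in enumerate(counts):
        table.extend([member] * n)
    return table


def pick_member_table(hash_value: int, table) -> int:
    """Look up the member for a hash value.

    With a power of two table size the modulo reduces to a bit mask.
    """
    return table[hash_value & (len(table) - 1)]
```

For example, with member capacities of 10, 10, 40, and 100 (in any
common unit) and a 256-entry table, the members receive 16, 16, 64,
and 160 entries respectively.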
Simple load balancing works well if there are a very large number of
small microflows (i.e., microflow rate is much less than component
link capacity). However, the case where there are even a few large
microflows is not handled well by simple load balancing.
An adaptive multipath technique is one where the traffic bound to
each member (component link) is measured and the load split is
adjusted accordingly. As long as the adjustment is done within a
single network element, then no protocol extensions are required and
there are no interoperability issues.
Note that if the load balancing algorithm and/or its parameters are
adjusted, then packets in some flows may be delivered out of
sequence.
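One possible form of adaptive adjustment is sketched below, assuming
a table-based split in which the traffic hitting each table entry can
be measured. The function name rebalance_step, the per-entry load
measurements, and the greedy one-entry-at-a-time policy are all
hypothetical; they are not taken from any specification or
implementation.

```python
def rebalance_step(table, entry_load, capacities):
    """Move at most one table entry from the most utilized member to
    the least utilized member.

    table[e] is the member currently assigned to entry e, entry_load[e]
    is the measured load on entry e, and capacities[m] is the capacity
    of member m (same units as the load). The table is modified in
    place. Returns the index of the moved entry, or None if no single
    move would lower the higher of the two members' utilizations.
    """
    num_members = len(capacities)
    member_load = [0.0] * num_members
    for entry, member in enumerate(table):
        member_load[member] += entry_load[entry]
    utilization = [member_load[m] / capacities[m] for m in range(num_members)]
    busiest = max(range(num_members), key=lambda m: utilization[m])
    idlest = min(range(num_members), key=lambda m: utilization[m])
    if busiest == idlest:
        return None

    best_entry = None
    best_peak = utilization[busiest]
    for entry, member in enumerate(table):
        if member != busiest or entry_load[entry] <= 0:
            continue
        new_busy = (member_load[busiest] - entry_load[entry]) / capacities[busiest]
        new_idle = (member_load[idlest] + entry_load[entry]) / capacities[idlest]
        peak = max(new_busy, new_idle)
        if peak < best_peak:
            best_entry, best_peak = entry, peak
    if best_entry is None:
        return None
    table[best_entry] = idlest
    return best_entry
```

All state in this sketch is local to one network element. Flows that
hash to the moved entry change component link, which is where the
out of sequence delivery noted above can occur.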
B.3. Traffic Split over Parallel Links
The load splitting techniques defined in Appendix B.1 and Appendix B.2
are both used in splitting traffic over parallel links between the
same pair of nodes. The best known technique, though far from being
the first, is Ethernet Link Aggregation [IEEE-802.1AX]. This same
technique had been applied much earlier using OSPF or ISIS Equal Cost
MultiPath (ECMP) over parallel links between the same nodes.
Multilink PPP [RFC1717] uses a technique that provides inverse
multiplexing; however, a number of vendors had provided proprietary
extensions to PPP over SONET/SDH [RFC2615] that predated Ethernet
Link Aggregation but are no longer used.
Link bundling [RFC4201] provides yet another means of handling
parallel LSP. RFC4201 explicitly allows a special value of all ones
to indicate a split across all members of the bundle.
B.4. Traffic Split over Multiple Paths
OSPF or ISIS Equal Cost MultiPath (ECMP) is a well known form of
traffic split over multiple paths that may traverse intermediate
nodes. ECMP is often incorrectly equated to only this case, and
multipath over multiple diverse paths is often incorrectly equated to
ECMP.
Many implementations are able to create more than one LSP between a
pair of nodes, where these LSP are routed diversely to better make
use of available capacity. The load on these LSP can be distributed
proportionally to the reserved bandwidth of the LSP. These multiple
LSP may be advertised as a single PSC FA and any LSP making use of
the FA may be split over these multiple LSP.
Link bundling [RFC4201] component links may themselves be LSP. When
this technique is used, any LSP which specifies the link bundle may
be split across the multiple paths of the LSP that comprise the
bundle.
Appendix C. ITU-T G.800 Composite Link Definitions and Terminology Appendix C. ITU-T G.800 Composite Link Definitions and Terminology
Composite Link: Composite Link:
Section 6.9.2 of ITU-T-G.800 [ITU-T.G.800] defines composite link Section 6.9.2 of ITU-T-G.800 [ITU-T.G.800] defines composite link
in terms of three cases, of which the following two are relevant in terms of three cases, of which the following two are relevant
(the one describing inverse (TDM) multiplexing does not apply). (the one describing inverse (TDM) multiplexing does not apply).
Note that these case definitions are taken verbatim from section Note that these case definitions are taken verbatim from section
6.9, "Layer Relationships". 6.9, "Layer Relationships".
Case 1: "Multiple parallel links between the same subnetworks Case 1: "Multiple parallel links between the same subnetworks
can be bundled together into a single composite link. Each can be bundled together into a single composite link. Each
component of the composite link is independent in the sense component of the composite link is independent in the sense
skipping to change at page 19, line 45 skipping to change at page 15, line 21
in this draft), or connection oriented (e.g., MPLS signaled or in this draft), or connection oriented (e.g., MPLS signaled or
configured). configured).
Component Link: A topological relationship between subnetworks Component Link: A topological relationship between subnetworks
(i.e., a connection between nodes), which may be a wavelength, (i.e., a connection between nodes), which may be a wavelength,
circuit, virtual circuit or an MPLS LSP. circuit, virtual circuit or an MPLS LSP.
Authors' Addresses Authors' Addresses
Curtis Villamizar (editor) Curtis Villamizar (editor)
Infinera Corporation OCCNC, LLC
169 W. Java Drive
Sunnyvale, CA 94089 Email: curtis@occnc.com
Email: cvillamizar@infinera.com
Dave McDysan (editor) Dave McDysan (editor)
Verizon Verizon
22001 Loudoun County PKWY 22001 Loudoun County PKWY
Ashburn, VA 20147 Ashburn, VA 20147
Email: dave.mcdysan@verizon.com Email: dave.mcdysan@verizon.com
So Ning So Ning
Verizon Verizon
2400 N. Glenville Ave. 2400 N. Glenville Ave.
 End of changes. 25 change blocks. 
300 lines changed or deleted 78 lines changed or added
