draft-ietf-idr-bgp-nh-cost-01.txt   draft-ietf-idr-bgp-nh-cost-02.txt 
Internet Engineering Task Force I. Varlashkin Internet Engineering Task Force I. Varlashkin
Internet-Draft Easynet Global Services Internet-Draft Google
Intended status: Standards Track R. Raszuk Intended status: Standards Track R. Raszuk
Expires: September 28, 2012 NTT MCL Inc. Expires: November 16, 2015 Mirantis Inc.
March 27, 2012 K. Patel
M. Bhardwaj
S. Bayraktar
Cisco Systems
May 15, 2015
Carrying next-hop cost information in BGP Carrying next-hop cost information in BGP
draft-ietf-idr-bgp-nh-cost-01 draft-ietf-idr-bgp-nh-cost-02
Abstract Abstract
This document describes new BGP SAFI to exchange cost information to BGPLS provides a mechanism by which Link state and traffic
next-hops for the purpose of calculating best path from a peer engineering information can be collected from internal networks and
perspective rather than local BGP speaker own perspective. shared with external network routers using BGP. BGPLS defines a new
Address Family to exchange this information using BGP.
Status of this Memo BGP Optimal Route Reflection (ORR) provides a mechanism for a
centralized BGP Route Reflector to achieve the requirements of Hot
Potato Routing as described in Section 11 of [RFC4456]. Optimal
Route Reflection requires BGP ORR to overwrite the default IGP
location placement of the route reflector, which is used for
determining cost to the next-hop contained in the path.
This draft augments BGPLS and defines new extensions to exchange
cost information to next-hops for the purpose of calculating best
path from a peer's perspective rather than the local BGP speaker's
own perspective.
Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on November 16, 2015.
This Internet-Draft will expire on September 28, 2012.
Copyright Notice Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the Copyright (c) 2015 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
This document may contain material from IETF Documents or IETF
Contributions published or made publicly available before November
10, 2008. The person(s) controlling the copyright in some of this
material may not have granted the IETF Trust the right to allow
modifications of such material outside the IETF Standards Process.
Without obtaining an adequate license from the person(s) controlling
the copyright in such materials, this document may not be modified
outside the IETF Standards Process, and derivative works of it may
not be created outside the IETF Standards Process, except to format
it for publication as an RFC or to translate it into languages other
than English.
Table of Contents Table of Contents
1. Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. NEXT-HOP INFORMATION BASE . . . . . . . . . . . . . . . . . . . 3 2. NEXT-HOP INFORMATION BASE . . . . . . . . . . . . . . . . . . 3
3. BGP BEST PATH SELECTION MODIFICATION . . . . . . . . . . . . . 3 3. BGP Bestpath Selection Modification . . . . . . . . . . . . . 4
4. USING BGP TO POPULATE NHIB . . . . . . . . . . . . . . . . . . 4 4. BGPLS Extensions . . . . . . . . . . . . . . . . . . . . . . 4
4.1. NEXT-HOP SAFI . . . . . . . . . . . . . . . . . . . . . . . 4 4.1. RIB Metrics Prefix Descriptor . . . . . . . . . . . . . . 4
4.2. CAPABILITY ADVERTISEMENT . . . . . . . . . . . . . . . . . 4 4.2. RIB Protocol ID . . . . . . . . . . . . . . . . . . . . . 4
4.3. INFORMATION ENCODING . . . . . . . . . . . . . . . . . . . 4 4.3. Information Exchange . . . . . . . . . . . . . . . . . . 5
4.4. SESSION ESTABLISHMENT . . . . . . . . . . . . . . . . . . . 5 4.4. Termination of the session carrying next-hop cost . . . . 5
4.5. INFORMATION EXCHANGE . . . . . . . . . . . . . . . . . . . 5 4.5. Graceful Restart and Route-Refresh . . . . . . . . . . . 5
4.6. TERMINATION OF NH SAFI SESSION . . . . . . . . . . . . . . 6 5. Security considerations . . . . . . . . . . . . . . . . . . . 5
4.7. GRACEFUL RESTART AND ROUTE REFRESH . . . . . . . . . . . . 6 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 5
5. Security considerations . . . . . . . . . . . . . . . . . . . . 6 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 6
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 6 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 6
7. Acknowledgment . . . . . . . . . . . . . . . . . . . . . . . . 7 8.1. Normative References . . . . . . . . . . . . . . . . . . 6
8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 7 8.2. Informative References . . . . . . . . . . . . . . . . . 7
8.1. Normative References . . . . . . . . . . . . . . . . . . . 7 Appendix A. USAGE SCENARIOS . . . . . . . . . . . . . . . . . . 7
8.2. Informative References . . . . . . . . . . . . . . . . . . 7 A.1. Trivial case . . . . . . . . . . . . . . . . . . . . . . 7
Appendix A. USAGE SCENARIOS . . . . . . . . . . . . . . . . . . . 7 A.2. Non-IGP based cost . . . . . . . . . . . . . . . . . . . 7
A.1. Trivial case . . . . . . . . . . . . . . . . . . . . . . . 7 A.3. Multiple route-reflectors . . . . . . . . . . . . . . . . 8
A.2. Non-IGP based cost . . . . . . . . . . . . . . . . . . . . 8 A.4. Inter-AS MPLS VPN . . . . . . . . . . . . . . . . . . . . 8
A.3. Multiple route-reflectors . . . . . . . . . . . . . . . . . 8 A.5. Corner case . . . . . . . . . . . . . . . . . . . . . . . 9
A.4. Inter-AS MPLS VPN . . . . . . . . . . . . . . . . . . . . . 9 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 9
A.5. Corner case . . . . . . . . . . . . . . . . . . . . . . . . 9
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 9
1. Motivation 1. Introduction
In certain situation route-reflector clients may not get optimum path In a certain situation, route-reflector clients may not get optimum
to certain destinations. ADDPATH solves this problem by letting path to certain destinations. ADDPATH solves this problem by letting
route-reflector to advertise multiple paths for given prefix. If route-reflector to advertise multiple paths for a given prefix. If
number of advertised paths sufficiently big, route-reflector clients number of advertised paths are sufficiently big, route-reflector
can choose same route as they would in case of full-mesh. This clients can choose same route as they would in case of full-mesh.
approach however places additional burden on the control plane. This approach however places an additional burden on the control
Solutions proposed by [BGP-ORR] use different approach - instead of plane. Solutions proposed by [BGP-ORR] use different approach -
calculating best path from local speaker own perspective the instead of calculating best path from the local speaker's own
calculations are done using cost from the client to the next-hops. perspective the calculations are done using cost from the client to
Although they eliminate need for transmitting redundant routing the next-hops. Although they eliminate need for transmitting
information between peers, there are scenarios where cost to the redundant routing information between peers, there are scenarios
next-hop cannot be obtained accurately using this methods. For where cost to the next-hop cannot be obtained accurately using these
example, if next-hop information itself has been learned via BGP then methods. For example, if next-hop information itself has been
simple SPF run on link-state database won't be sufficient to obtain learned via BGP then simple SPF run on link-state database won't be
cost information. To address such scenarios this document proposes a sufficient to obtain cost information. There are also scenarios
solution where cost information to the next-hops is carried within where while a Route Reflector can reach its clients, the client to
BGP itself using dedicated SAFI. client connectivity MAY be down.
BGPLS [I-D.ietf-idr-ls-distribution] provides a mechanism by which Link state
and traffic engineering information can be collected from internal
networks and shared with external network routers using BGP. BGPLS
defines a new Address Family to exchange this information using BGP.
To address such scenarios, this draft defines extensions to BGPLS to
carry cost information of the next-hops. In particular, this draft
defines a new Protocol ID to announce a Router's IGP routes, and a
Prefix Descriptor to carry the cost information of the IGP routes
used towards resolving next-hops.
2. NEXT-HOP INFORMATION BASE 2. NEXT-HOP INFORMATION BASE
To facilitate further description of the proposed solution we To facilitate further description of the proposed solution we
introduce new table for all known next hops and costs to it from introduce a new table for all known next-hops and costs to it from
various routers on the network. various routers on the network.
Next-Hop Information Base (NHIB) stores cost to reach next-hop from Next-Hop Information Base (NHIB) stores cost to reach next-hop from
arbitrary router on the network. This information is essential for an arbitrary router on the network. This information is essential
choosing best path from a peer perspective rather than BGP-speaker for choosing best path from a peer perspective rather than BGP-
own perspective. In canonical form NHIB entry is triplet (router, speaker own perspective. In canonical form NHIB entry is triplet
next-hop, cost), however this specification does not impose any (router, next-hop, cost), however this specification does not impose
restriction on how BGP implementations store that information any restriction on how BGP implementations store that information
internally. The cost in NHIB does not have to be an IGP cost, but internally. The cost in NHIB does not have to be an IGP cost, but
all costs in NHIB MUST be comparable with each other. all costs in NHIB MUST be comparable with each other.
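As an illustration only, the canonical (router, next-hop, cost) triplet could be held in a structure such as the one sketched below. The names Nhib, update, cost and invalidate_router are hypothetical and are not defined by this document, which leaves internal storage to the implementation. The 32-bit range check assumes the 4-byte cost field used by the encodings in this document.

```python
from typing import Dict, Optional, Tuple

MAX_COST = 0xFFFFFFFF  # assumption: cost is carried as a 32-bit unsigned value


class Nhib:
    """Hypothetical Next-Hop Information Base: (router, next_hop) -> cost."""

    def __init__(self) -> None:
        self._costs: Dict[Tuple[str, str], int] = {}

    def update(self, router: str, next_hop: str, cost: int) -> None:
        """Store or overwrite the cost from `router` to `next_hop`."""
        if not 0 <= cost <= MAX_COST:
            raise ValueError("cost must fit in 32 bits")
        self._costs[(router, next_hop)] = cost

    def cost(self, router: str, next_hop: str) -> Optional[int]:
        """Return the stored cost, or None when nothing is known."""
        return self._costs.get((router, next_hop))

    def invalidate_router(self, router: str) -> None:
        """Drop every entry learned for `router`."""
        for key in [k for k in self._costs if k[0] == router]:
            del self._costs[key]


nhib = Nhib()
nhib.update("R4", "R1", 8)
nhib.update("R4", "R2", 1)
```

Keying on the (router, next-hop) pair keeps lookups constant-time; an implementation with many peers might instead index per router first so that all entries for one peer can be dropped in a single step.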
NHIB can be populated from various sources both static and dynamic. NHIB can be populated from various sources including static routing
This document focuses on populating NHIB using BGP. However it is and dynamic routing. However, this document focuses on populating
possible that protocols other than BGP could be also used to populate NHIB using BGP.
NHIB.
3. BGP BEST PATH SELECTION MODIFICATION An implementation implementing the BGP extension described in this
draft MAY provide an operator-controlled configuration knob, locally
significant to an individual BGP speaker, that treats two or more
clients as equivalent with respect to next-hop cost information. For
example, a route-reflector could receive next-hop cost only from R1
but also use it when calculating the best path for R2, R3, ..., Rn
because it has been instructed to do so by locally significant
configuration. Multiple sources can be used for redundancy purposes.
3. BGP Bestpath Selection Modification
This section applies regardless of method used to populate NHIB. This section applies regardless of method used to populate NHIB.
When BGP speaker conforming to this specification selects routes to When BGP speaker conforming to this specification selects routes to
be advertised to a peer it SHOULD use cost information from NHIB be advertised to a peer it SHOULD use cost information from NHIB
rather than its own IGP cost to the next-hop after step (d) of rather than its own IGP cost to the next-hop after step (d) of
9.1.2.2 in [RFC4271]. 9.1.2.2 in [RFC4271].
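The tie-break described above can be sketched as follows. This is a minimal illustration, not a normative algorithm: the function name prefer_by_peer_cost and the dictionary-based NHIB are hypothetical, and falling back to the speaker's own IGP cost when NHIB has no entry for a (peer, next-hop) pair is an assumption that this document does not specify. The costs 8 and 1 for R4 are taken from the trivial case in Appendix A; the speaker's own costs are invented for the example.

```python
from typing import Dict, List, Tuple

NhibTable = Dict[Tuple[str, str], int]  # (router, next_hop) -> cost


def prefer_by_peer_cost(
    candidates: List[str],
    peer: str,
    nhib: NhibTable,
    own_igp_cost: Dict[str, int],
) -> List[str]:
    """Keep the next-hops with the lowest cost as seen from `peer`.

    `candidates` holds the next-hops of routes still tied after step (d)
    of Section 9.1.2.2 of RFC 4271.
    """

    def cost_for(next_hop: str) -> int:
        key = (peer, next_hop)
        if key in nhib:
            return nhib[key]
        # Assumption: use the speaker's own IGP cost when NHIB has no entry.
        return own_igp_cost[next_hop]

    if not candidates:
        return []
    best = min(cost_for(nh) for nh in candidates)
    return [nh for nh in candidates if cost_for(nh) == best]


nhib_table: NhibTable = {("R4", "R1"): 8, ("R4", "R2"): 1}
rr_own_cost = {"R1": 2, "R2": 5}  # hypothetical costs from the reflector itself

for_r4 = prefer_by_peer_cost(["R1", "R2"], "R4", nhib_table, rr_own_cost)
for_r3 = prefer_by_peer_cost(["R1", "R2"], "R3", nhib_table, rr_own_cost)
```

With these inputs the reflector would select R2 for R4, while for R3, which has no NHIB entries in this example, it falls back to its own view and selects R1.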
4. USING BGP TO POPULATE NHIB 4. BGPLS Extensions
This section describes extension to base BGP specification that
allows BGP to be used for exchanging next-hop information between BGP
speakers via new SAFI in order to populate NHIB. Although next-hops
costs are exchanged via dedicated SAFI, this information is vital to
best path selection process for other AFI/SAFI (e.g. IPv4 and IPv6
unicast). It's therefore recommended that next-hop cost information
is exchanged before other AFI/SAFI.
4.1. NEXT-HOP SAFI
This document introduces Next-Hop SAFI (NH SAFI) with value to be
assigned by IANA and purpose of exchanging information about cost to
next-hops.
4.2. CAPABILITY ADVERTISEMENT
A BGP speaker willing to exchange next-hop information MUST advertise
this in the OPEN message using BGP Capability Code 1 (Multiprotocol
Extensions, see [RFC4760]) setting AFI appropriately to indicate IPv4
or IPv6 and SAFI to the value assigned by IANA for NH SAFI. Note
that if BGP speaker wishes to exchange cost information for both
IPv4 and IPv6, then it MUST advertise two capabilities: one NH SAFI
for IPv4 and one NH SAFI for IPv6.
4.3. INFORMATION ENCODING
Routers use standard BGP UPDATE messages to exchange NH SAFI
information. Cost to reachable next-hops is communicated using
MP_REACH_NLRI (attribute 14) with NLRI part as described below.
Requests are also sent using MP_REACH_NLRI. Informing a neighbour
about unreachable next-hop is done using MP_UNREACH_NLRI. All NH
SAFI messages MUST contain BGP COMMUNITY attribute with value
NO_ADVERTISE (0xFFFFFF02) and their propagation MUST follow normal
BGP rules (i.e. they're not to be propagated).
To request cost to a next-hop from peer or to inform peer about cost
to a next-hop BGP attribute 14 is used as follows:
1. AFI is set to indicate IPv4 or IPv6 (whichever is appropriate) 4.1. RIB Metrics Prefix Descriptor
2. SAFI is set to NH SAFI This draft defines a new Prefix Descriptor known as a Cost Prefix
3. Network Address of Next-Hop field is zeroed out Descriptor with a TLV code point value to be assigned by IANA. The
Cost descriptor looks like:
4. NLRI field is encoded as shown in the next figure +--------------+-----------------------+----------+-----------------+
| TLV Code | Description | Length | Value defined |
| Point | | | in: |
+--------------+-----------------------+----------+-----------------+
| TBD | Cost | 4 bytes | Cost Value |
+--------------+-----------------------+----------+-----------------+
Format of NH SAFI NLRI is as follow: Cost Value is a 4 byte Metric value computed by a Router's
+-----+------+-------+----------+------+ local RIB.
| AFI | SAFI | Flags | NEXT_HOP | cost |
+-----+------+-------+----------+------+
Flags - 1 octet field. Least significant bit MUST be set to 1 for The Cost value is a cost associated with a prefix by a Router. The
Request and to zero for Response cost is typically computed by the routing protocol that owns the
route.
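A minimal sketch of how the Cost Prefix Descriptor might be serialized, assuming the usual BGPLS TLV layout of a 2-octet type and a 2-octet length followed by the value. The function names are hypothetical, and the type 65000 used in the example is only a placeholder because the real code point is still to be assigned by IANA.

```python
import struct
from typing import Tuple

COST_VALUE_LENGTH = 4  # the Cost value is a 4-byte metric


def encode_cost_tlv(tlv_type: int, cost: int) -> bytes:
    """Pack type, length and a 32-bit cost in network byte order."""
    if not 0 <= tlv_type <= 0xFFFF:
        raise ValueError("TLV type must fit in 16 bits")
    if not 0 <= cost <= 0xFFFFFFFF:
        raise ValueError("cost must fit in 32 bits")
    return struct.pack("!HHI", tlv_type, COST_VALUE_LENGTH, cost)


def decode_cost_tlv(data: bytes) -> Tuple[int, int]:
    """Return (tlv_type, cost) from an 8-octet Cost TLV."""
    if len(data) != 4 + COST_VALUE_LENGTH:
        raise ValueError("Cost TLV must be exactly 8 octets")
    tlv_type, length, cost = struct.unpack("!HHI", data)
    if length != COST_VALUE_LENGTH:
        raise ValueError("unexpected Cost TLV length")
    return tlv_type, cost


PLACEHOLDER_TYPE = 65000  # hypothetical; the real code point is TBD
encoded = encode_cost_tlv(PLACEHOLDER_TYPE, 8)
decoded = decode_cost_tlv(encoded)
```

Rejecting any length other than 4 keeps the decoder strict; a deployed parser would more likely skip unknown or malformed TLVs according to the error handling rules of the base BGPLS specification.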
AFI/SAFI fields can be set either to one of the registered values to 4.2. RIB Protocol ID
indicate that next-hop cost info applies only to specified AFI/SAFI.
Alternatively when both fields are set to zero, the cost
information applies to any compatible AFI/SAFI negotiated with given
peer.
Next-hop - IPv4 or IPv6 address for which cost is being communicated This draft defines a new protocol ID for IPv4 and IPv6 Topology
or requested. Type is determined from context, and length is Prefix NLRI known as a RIB Protocol ID. The RIB Protocol ID has a
inferred from total length of attribute. value to be assigned by IANA. The Prefix NLRI with RIB Protocol ID
is used to announce all the local and IGP-computed routes that are
installed in the RIB along with their Cost values.
Cost is 32-bit unsigned integer (value described below), and NEXT_HOP 4.3. Information Exchange
is AFI-specific address of the next-hop cost to which is being
communicated or requested. Size of NEXT_HOP field is inferred from
total length of attribute 14.
To inform peer that particular next-hop is unreachable Typically BGPLS sessions will be established between route-reflectors
MP_UNREACH_NLRI attribute is used with same NLRI format as described and its internal peers (both clients and non-clients). As soon as
above. In this case cost field SHOULD be set to 0xFFFFFFFF. the BGPLS session is ESTABLISHED, all the RIB routes used to resolve
next-hop cost and information about next-hop costs MAY be sent
immediately by clients to their route-reflector. Implementations are
advised to announce BGP updates for this SAFI before any other SAFIs
to facilitate faster convergence of other SAFIs on Route Reflectors.
4.4. SESSION ESTABLISHMENT Each internal neighbor of a route-reflector announces its IGP RIB
Prefix information and its RIB metrics to the Route Reflector using a
BGPLS session and a new NLRI Protocol ID and RIB metric Prefix
Descriptor. Each neighbor updates the Route Reflector with its IGP
prefix cost every time the cost of an IGP route changes.
BGP speakers willing to exchange next-hop information SHOULD NOT Upon receipt of a BGPLS route and its associated cost, a Route
establish more than one session for given AFI and NH SAFI, even using Reflector stores the prefix, cost, and neighbor information in its
different transport addresses. This can be ensured for example by local NHIB database. It then uses the received cost towards
checking peer's Router Id. calculation of bestpath from the respective client's perspective as
opposed to its own IGP cost.
4.5. INFORMATION EXCHANGE 4.4. Termination of the session carrying next-hop cost
Typically NH SAFI sessions will be established between route- When the BGPLS session carrying next-hop cost terminates (for
reflectors and its internal peers (both clients and non-clients). As whatever reason), the BGP speaker SHOULD invalidate all the next-hop
soon as the NH SAFI session is ESTABLISHED requests for next-hop cost cost information (i.e., the same treatment applies to the next-hop
and information about next-hop costs MAY be sent cost as to any other BGP-learned information).
independently. That is, route-reflector MAY send multiple requests
without waiting for response, and its peers MAY send cost information
before or after receiving such request. On the other hand, Route
Reflectors SHOULD request cost information from their internal peers
as soon as possible (due to reasons stated in section "BGP best path
selection modification"). BGP speaker does not need to track
outstanding requests to the peer.
When a BGP speaker receives request for cost information it MUST 4.5. Graceful Restart and Route-Refresh
reply with actual cost (not necessarily IGP cost, but whatever has
been chosen to be carried in NH SAFI) to given next-hop or with cost
set to all-ones indicating that next-hop is unreachable. If next-hop
information is obtained from sender's routing table, then sender MUST
perform lookup exactly the same way as it would for resolving next-
hop in BGP UPDATE message. For example, for non-labelled
destinations (e.g. AFI/SAFI 1/1 or 2/1) lookup would be done using
longest match, whereas for labelled IPv4 (AFI/SAFI 1/4, 1/128 or 2/4)
exact-match would be used.
When a BGP speaker detects change in cost to previously advertised BGPLS sessions carrying next-hop cost could use Graceful Restart
next-hop with delta equal or exceeding configured advertisement [RFC4724] and Route Refresh [RFC7313] mechanisms in the same way as
threshold, it SHOULD inform peer by sending MP_UNREACH_NLRI as it's used for IPv4 and IPv6 unicast.
described earlier.
When a BGP speaker discovers new next-hop among candidate routes it 5. Security considerations
SHOULD request cost information from the peer.
4.6. TERMINATION OF NH SAFI SESSION This document does not introduce new security considerations above
and beyond those already specified in [RFC4271],
[I-D.ietf-idr-bgp-optimal-route-reflection] and
[I-D.ietf-idr-ls-distribution].
When BGP speaker terminates (for whatever reason) NH SAFI session 6. IANA Considerations
with a peer, it SHOULD remove all cost information received from that
peer unless instructed by configuration to do otherwise.
4.7. GRACEFUL RESTART AND ROUTE REFRESH This draft defines a new protocol id value for RIB Protocol ID. This
draft requests IANA to allocate a value for a RIB Protocol ID from
BGPLS Protocol ID Registry.
NH SAFI sessions could use graceful restart and route refresh This draft defines a new RIB Metrics Prefix Descriptor value. This
mechanisms in the same way as it's used for IPv4 and IPv6 unicast - draft requests IANA to allocate a TLV code value for the new
preservation and purge of next-hop cost information follows normal GR descriptor from the Prefix Descriptor registry.
rules.
5. Security considerations 7. Acknowledgements
No new security issues are introduced to the BGP protocol by this The authors would like to acknowledge David Ward, Anton Elita,
specification. Nagendra Kumar and Burjiz Pithawala for their critical reviews and
feedback.
6. IANA Considerations 8. References
IANA is requested to allocate value for Next-Hop Subsequent Address 8.1. Normative References
Family Identifier.
7. Acknowledgment [I-D.ietf-idr-bgp-optimal-route-reflection]
Raszuk, R., Cassar, C., Aman, E., Decraene, B., and S.
Litkowski, "BGP Optimal Route Reflection (BGP-ORR)",
draft-ietf-idr-bgp-optimal-route-reflection-09 (work in
progress), April 2015.
Authors would like to thank Keyur Patel, Anton Elita, Nagendra Kumar [I-D.ietf-idr-ls-distribution]
for critical reviews and feedback. Gredler, H., Medved, J., Previdi, S., Farrel, A., and S.
Ray, "North-Bound Distribution of Link-State and TE
Information using BGP", draft-ietf-idr-ls-distribution-10
(work in progress), January 2015.
8. References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
8.1. Normative References [RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998.
[RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway [RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
Protocol 4 (BGP-4)", RFC 4271, January 2006. Protocol 4 (BGP-4)", RFC 4271, January 2006.
[RFC4760] Bates, T., Chandra, R., Katz, D., and Y. Rekhter, [RFC4456] Bates, T., Chen, E., and R. Chandra, "BGP Route
"Multiprotocol Extensions for BGP-4", RFC 4760, Reflection: An Alternative to Full Mesh Internal BGP
(IBGP)", RFC 4456, April 2006.
[RFC4724] Sangli, S., Chen, E., Fernando, R., Scudder, J., and Y.
Rekhter, "Graceful Restart Mechanism for BGP", RFC 4724,
January 2007. January 2007.
8.2. Informative References [RFC4760] Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
"Multiprotocol Extensions for BGP-4", RFC 4760, January
2007.
[I-D.raszuk-bgp-optimal-route-reflection] [RFC7313] Patel, K., Chen, E., and B. Venkatachalapathy, "Enhanced
Raszuk, R., Cassar, C., Aman, E., and B. Decraene, "BGP Route Refresh Capability for BGP-4", RFC 7313, July 2014.
Optimal Route Reflection (BGP-ORR)",
draft-raszuk-bgp-optimal-route-reflection-01 (work in 8.2. Informative References
progress), March 2011.
[RFC2918] Chen, E., "Route Refresh Capability for BGP-4", RFC 2918, [RFC2918] Chen, E., "Route Refresh Capability for BGP-4", RFC 2918,
September 2000. September 2000.
Appendix A. USAGE SCENARIOS Appendix A. USAGE SCENARIOS
A.1. Trivial case A.1. Trivial case
--+---NetA---+-- --+---NetA---+--
| | | |
skipping to change at page 8, line 6 skipping to change at page 7, line 31
R3 R3
In this scenario r1 and r3 along with NetA are part of AS1; and R1-R4 In this scenario r1 and r3 along with NetA are part of AS1; and R1-R4
along with RR are in AS2. along with RR are in AS2.
If RR implements non-optimized route-reflection, then it will choose If RR implements non-optimized route-reflection, then it will choose
path to NetA via R1 and advertise it to both R3 and R4. Such choice path to NetA via R1 and advertise it to both R3 and R4. Such choice
is good from R3 perspective, but it results in suboptimal traffic is good from R3 perspective, but it results in suboptimal traffic
flow from R4 to NetA. flow from R4 to NetA.
Using NH SAFI the route-reflector will learn that cost from R4 to R1 Using the proposed BGPLS extensions, the route-reflector will learn
is 8 whereas to R2 it's only 1. RR will announce NetA to R4 with that cost from R4 to R1 is 8 whereas to R2 it's only 1. RR will
next-hop set to R2, while its announcement to R3 will still have R1 as announce NetA to R4 with next-hop set to R2, while its announcement to R3
next-hop. Both R3 and R4 now will send traffic to NetA via closest will still have R1 as next-hop. Both R3 and R4 now will send traffic
exit, achieving same behaviour as if full iBGP mesh would have been to NetA via closest exit, achieving same behaviour as if full iBGP
configured. mesh would have been configured.
A.2. Non-IGP based cost A.2. Non-IGP based cost
When it's desirable to direct traffic over an exit other than the one When it's desirable to direct traffic over an exit other than the one
with smallest IGP cost, NH SAFI can be used to convey cost which is with smallest IGP cost, BGPLS extensions can be used to convey cost
not based on IGP. For example, network operator may arrange exit which is not based on IGP. For example, network operator may arrange
points in order of administrative preference and configure routers to exit points in order of administrative preference and configure
send this instead of IGP cost. Route reflector will then routers to send this instead of IGP cost. Route reflector will
calculate best path based on administrative preference rather than then calculate best path based on administrative preference rather
IGP metrics. than IGP metrics.
Network operators should exercise care to ensure that all routers up Network operators should exercise care to ensure that all routers up
to and including exit point do not divert packets on to a different to and including exit point do not divert packets on to a different
path, otherwise routing loops may occur. One way to achieve this is path, otherwise routing loops may occur. One way to achieve this is
to have consistent administrative preference among all routers. to have consistent administrative preference among all routers.
Another option is to use a tunneling mechanism (e.g. MPLS-TE tunnel) Another option is to use a tunneling mechanism (e.g. MPLS-TE tunnel)
between source and the exit point, provided that the router serving between source and the exit point, provided that the router serving
as exit point will send packets out of the network rather than as exit point will send packets out of the network rather than
diverting them to another exit point. diverting them to another exit point.
A.3. Multiple route-reflectors A.3. Multiple route-reflectors
This example demonstrates that NH SAFI peerings are necessary only This example demonstrates that BGPLS extensions are necessary only
between routers that already exchange other AFI/SAFI. between routers that already exchange other AFI/SAFI.
| |
R1----R3---------R5----R7--+ R1----R3---------R5----R7--+
| | | | | |
RR1 | NetA RR1 | NetA
| RR2 | | RR2 |
| | | | | |
R2----R4---------R6----R8--+ R2----R4---------R6----R8--+
| |
In the above network the routers R1-R4 are clients of RR1, and R5-R8 In the above network the routers R1-R4 are clients of RR1, and R5-R8
are clients of RR2. RR1 and RR2 also peer with each other and use are clients of RR2. RR1 and RR2 also peer with each other and use
ADDPATH. ADDPATH.
RR2 learns about NetA from R7 and R8. Since it sends not just best- RR2 learns about NetA from R7 and R8. Since it sends not just best-
path but all prefixes to RR1, there is no need for RR2 to learn cost path but all prefixes to RR1, there is no need for RR2 to learn cost
information from R1 and R2 towards R7 and R8. On the other hand RR1 information from R1 and R2 towards R7 and R8. On the other hand RR1
does exchange NH SAFI information with R1 and R2 so that each of them does exchange cost information using BGPLS with R1 and R2 so that
can receive routes, which are best from their perspective. each of them can receive routes, which are best from their
perspective.
In addition to ADDPATH, a mechanism could be devised that would allow In addition to ADDPATH, a mechanism could be devised that would allow
RR2 to learn how many alternative routes it needs to send to RR1. RR2 to learn how many alternative routes it needs to send to RR1.
For example, if NetA would also be connected to R9 (not shown) but For example, if NetA would also be connected to R9 (not shown) but
all clients of RR1 prefer R7 as exit point and R9 as next-best, then all clients of RR1 prefer R7 as exit point and R9 as next-best, then
there is no need for RR2 to send NetA routes with next-hop R8 to RR1. there is no need for RR2 to send NetA routes with next-hop R8 to RR1.
Discussion: authors would like to solicit discussion whether there is Discussion: authors would like to solicit discussion whether there is
sufficient interest in such mechanism. sufficient interest in such mechanism.
skipping to change at page 9, line 46 skipping to change at page 9, line 27
selection modification" requires RR to have next-hop cost information selection modification" requires RR to have next-hop cost information
for every next-hop and every peer. for every next-hop and every peer.
Note that the problem is the same as if RR would not use extensions Note that the problem is the same as if RR would not use extensions
described in this document and R3 would peer directly with R1 and R2, described in this document and R3 would peer directly with R1 and R2,
while R4 would peer only with RR. while R4 would peer only with RR.
Authors' Addresses Authors' Addresses
Ilya Varlashkin Ilya Varlashkin
Easynet Global Services Google
Email: ilya@nobulus.com
Email: ilya.varlashkin@easynet.com
Robert Raszuk Robert Raszuk
NTT MCL Inc. Mirantis Inc.
101 S Ellsworth Avenue Suite 350 615 National Ave. #100
San Mateo, CA 94401 Mt View, CA 94043
US USA
Email: robert@raszuk.net Email: robert@raszuk.net
Keyur Patel
Cisco Systems
170 W. Tasman Drive
San Jose, CA 95124 95134
USA
Email: keyupate@cisco.com
Manish Bhardwaj
Cisco Systems
170 W. Tasman Drive
San Jose, CA 95124 95134
USA
Email: manbhard@cisco.com
Serpil Bayraktar
Cisco Systems
170 W. Tasman Drive
San Jose, CA 95124 95134
USA
Email: serpil@cisco.com
 End of changes. 55 change blocks. 
222 lines changed or deleted 212 lines changed or added
