--- 1/draft-ietf-idr-bgp-nh-cost-01.txt 2015-05-16 03:17:00.797754988 -0700 +++ 2/draft-ietf-idr-bgp-nh-cost-02.txt 2015-05-16 03:17:00.821755572 -0700 @@ -1,306 +1,294 @@ Internet Engineering Task Force I. Varlashkin -Internet-Draft Easynet Global Services +Internet-Draft Google Intended status: Standards Track R. Raszuk -Expires: September 28, 2012 NTT MCL Inc. - March 27, 2012 +Expires: November 16, 2015 Mirantis Inc. + K. Patel + M. Bhardwaj + S. Bayraktar + Cisco Systems + May 15, 2015 Carrying next-hop cost information in BGP - draft-ietf-idr-bgp-nh-cost-01 + draft-ietf-idr-bgp-nh-cost-02 Abstract - This document describes new BGP SAFI to exchange cost information to - next-hops for the purpose of calculating best path from a peer - perspective rather than local BGP speaker own perspective. + BGPLS provides a mechanism by which link-state and traffic + engineering information can be collected from internal networks and + shared with external network routers using BGP. BGPLS defines a new + Address Family to exchange this information using BGP. -Status of this Memo + BGP Optimal Route Reflection (ORR) provides a mechanism for a + centralized BGP Route Reflector to achieve the requirements of Hot + Potato Routing as described in Section 11 of [RFC4456]. Optimal + Route Reflection requires BGP ORR to overwrite the default IGP + location placement of the route reflector, which is used for + determining the cost to the next-hop contained in the path. + + This draft augments BGPLS and defines new extensions to exchange + cost information to next-hops for the purpose of calculating the best + path from a peer's perspective rather than the local BGP speaker's own + perspective. + +Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. 
The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - - This Internet-Draft will expire on September 28, 2012. + This Internet-Draft will expire on November 16, 2015. Copyright Notice - Copyright (c) 2012 IETF Trust and the persons identified as the + Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. - This document may contain material from IETF Documents or IETF - Contributions published or made publicly available before November - 10, 2008. The person(s) controlling the copyright in some of this - material may not have granted the IETF Trust the right to allow - modifications of such material outside the IETF Standards Process. - Without obtaining an adequate license from the person(s) controlling - the copyright in such materials, this document may not be modified - outside the IETF Standards Process, and derivative works of it may - not be created outside the IETF Standards Process, except to format - it for publication as an RFC or to translate it into languages other - than English. - Table of Contents - 1. Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . 3 - 2. 
NEXT-HOP INFORMATION BASE . . . . . . . . . . . . . . . . . . . 3 - 3. BGP BEST PATH SELECTION MODIFICATION . . . . . . . . . . . . . 3 - 4. USING BGP TO POPULATE NHIB . . . . . . . . . . . . . . . . . . 4 - 4.1. NEXT-HOP SAFI . . . . . . . . . . . . . . . . . . . . . . . 4 - 4.2. CAPABILITY ADVERTISEMENT . . . . . . . . . . . . . . . . . 4 - 4.3. INFORMATION ENCODING . . . . . . . . . . . . . . . . . . . 4 - 4.4. SESSION ESTABLISHMENT . . . . . . . . . . . . . . . . . . . 5 - 4.5. INFORMATION EXCHANGE . . . . . . . . . . . . . . . . . . . 5 - 4.6. TERMINATION OF NH SAFI SESSION . . . . . . . . . . . . . . 6 - 4.7. GRACEFUL RESTART AND ROUTE REFRESH . . . . . . . . . . . . 6 - 5. Security considerations . . . . . . . . . . . . . . . . . . . . 6 - 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 6 - 7. Acknowledgment . . . . . . . . . . . . . . . . . . . . . . . . 7 - 8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 7 - 8.1. Normative References . . . . . . . . . . . . . . . . . . . 7 - 8.2. Informative References . . . . . . . . . . . . . . . . . . 7 - Appendix A. USAGE SCENARIOS . . . . . . . . . . . . . . . . . . . 7 - A.1. Trivial case . . . . . . . . . . . . . . . . . . . . . . . 7 - A.2. Non-IGP based cost . . . . . . . . . . . . . . . . . . . . 8 - A.3. Multiple route-reflectors . . . . . . . . . . . . . . . . . 8 - A.4. Inter-AS MPLS VPN . . . . . . . . . . . . . . . . . . . . . 9 - A.5. Corner case . . . . . . . . . . . . . . . . . . . . . . . . 9 - Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 9 + 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 + 2. NEXT-HOP INFORMATION BASE . . . . . . . . . . . . . . . . . . 3 + 3. BGP Bestpath Selection Modification . . . . . . . . . . . . . 4 + 4. BGPLS Extensions . . . . . . . . . . . . . . . . . . . . . . 4 + 4.1. RIB Metrics Prefix Descriptor . . . . . . . . . . . . . . 4 + 4.2. RIB Protocol ID . . . . . . . . . . . . . . . . . . . . . 
4 + 4.3. Information Exchange . . . . . . . . . . . . . . . . . . 5 + 4.4. Termination of the session carrying next-hop cost . . . . 5 + 4.5. Graceful Restart and Route-Refresh . . . . . . . . . . . 5 + 5. Security considerations . . . . . . . . . . . . . . . . . . . 5 + 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 5 + 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 6 + 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 6 + 8.1. Normative References . . . . . . . . . . . . . . . . . . 6 + 8.2. Informative References . . . . . . . . . . . . . . . . . 7 + Appendix A. USAGE SCENARIOS . . . . . . . . . . . . . . . . . . 7 + A.1. Trivial case . . . . . . . . . . . . . . . . . . . . . . 7 + A.2. Non-IGP based cost . . . . . . . . . . . . . . . . . . . 7 + A.3. Multiple route-reflectors . . . . . . . . . . . . . . . . 8 + A.4. Inter-AS MPLS VPN . . . . . . . . . . . . . . . . . . . . 8 + A.5. Corner case . . . . . . . . . . . . . . . . . . . . . . . 9 + Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 9 -1. Motivation +1. Introduction - In certain situation route-reflector clients may not get optimum path - to certain destinations. ADDPATH solves this problem by letting - route-reflector to advertise multiple paths for given prefix. If - number of advertised paths sufficiently big, route-reflector clients - can choose same route as they would in case of full-mesh. This - approach however places additional burden on the control plane. - Solutions proposed by [BGP-ORR] use different approach - instead of - calculating best path from local speaker own perspective the - calculations are done using cost from the client to the next-hops. - Although they eliminate need for transmitting redundant routing - information between peers, there are scenarios where cost to the - next-hop cannot be obtained accurately using this methods. 
For - example, if next-hop information itself has been learned via BGP then - simple SPF run on link-state database won't be sufficient to obtain - cost information. To address such scenarios this document proposes a - solution where cost information to the next-hops is carried within - BGP itself using dedicated SAFI. + In certain situations, route-reflector clients may not get the optimum + path to certain destinations. ADDPATH solves this problem by letting + a route-reflector advertise multiple paths for a given prefix. If + the number of advertised paths is sufficiently large, route-reflector + clients can choose the same route as they would in the case of a full mesh. + This approach, however, places an additional burden on the control + plane. Solutions proposed by [BGP-ORR] use a different approach: + instead of calculating the best path from the local speaker's own + perspective, the calculations are done using the cost from the client to + the next-hops. Although they eliminate the need for transmitting + redundant routing information between peers, there are scenarios + where the cost to the next-hop cannot be obtained accurately using these + methods. For example, if the next-hop information itself has been + learned via BGP, then a simple SPF run on the link-state database won't be + sufficient to obtain cost information. There are also scenarios + where a Route Reflector can reach its clients but client-to-client + connectivity may be down. + + BGPLS [I-D.ietf-idr-ls-distribution] provides a mechanism by which link-state + and traffic engineering information can be collected from internal + networks and shared with external network routers using BGP. BGPLS + defines a new Address Family to exchange this information using BGP. + + To address such scenarios, this draft defines extensions to BGPLS to + carry cost information of the next-hops. 
In particular, this draft + defines a new Protocol ID to announce a Router's IGP routes, and a + Prefix Descriptor to carry the cost information of the IGP routes + used towards resolving next-hops. 2. NEXT-HOP INFORMATION BASE To facilitate further description of the proposed solution we - introduce new table for all known next hops and costs to it from + introduce a new table for all known next-hops and costs to them from various routers on the network. Next-Hop Information Base (NHIB) stores cost to reach next-hop from - arbitrary router on the network. This information is essential for - choosing best path from a peer perspective rather than BGP-speaker - own perspective. In canonical form NHIB entry is triplet (router, - next-hop, cost), however this specification does not impose any - restriction on how BGP implementations store that information + an arbitrary router on the network. This information is essential + for choosing the best path from a peer's perspective rather than the BGP + speaker's own perspective. In canonical form, an NHIB entry is a triplet + (router, next-hop, cost); however, this specification does not impose + any restriction on how BGP implementations store that information internally. The cost in NHIB does not have to be an IGP cost, but all costs in NHIB MUST be comparable with each other. - NHIB can be populated from various sources both static and dynamic. - This document focuses on populating NHIB using BGP. However it is - possible that protocols other than BGP could be also used to populate - NHIB. + NHIB can be populated from various sources, including static routing + and dynamic routing. However, this document focuses on populating + NHIB using BGP. -3. 
BGP BEST PATH SELECTION MODIFICATION + An implementation implementing the BGP extension described in this + draft MAY provide an operator-controlled configuration knob + significant to an individual BGP speaker that treats next-hop cost + information received from two or more clients as equivalent. For + example a route-reflector could receive next-hop cost only from R1 + but it will use it while calculating best-path also for R2, R3, Rn + because it has been instructed to do so by locally-significant + configuration. Multiple sources can be used for redundancy purpose. + +3. BGP Bestpath Selection Modification This section applies regardless of method used to populate NHIB. When BGP speaker conforming to this specification selects routes to be advertised to a peer it SHOULD use cost information from NHIB rather than its own IGP cost to the next-hop after step (d) of 9.1.2.2 in [RFC4271]. -4. USING BGP TO POPULATE NHIB - - This section describes extension to base BGP specification that - allows BGP to be used for exchanging next-hop information between BGP - speakers via new SAFI in order to populate NHIB. Although next-hops - costs are exchanged via dedicated SAFI, this information is vital to - best path selection process for other AFI/SAFI (e.g. IPv4 and IPv6 - unicast). It's therefore recommended that next-hop cost information - is exchanged before other AFI/SAFI. - -4.1. NEXT-HOP SAFI - - This document introduces Next-Hop SAFI (NH SAFI) with value to be - assigned by IANA and purpose of exchanging information about cost to - next-hops. - -4.2. CAPABILITY ADVERTISEMENT - - A BGP speaker willing to exchange next-hop information MUST advertise - this in the OPEN message using BGP Capability Code 1 (Multiprotocol - Extensions, see [RFC4760]) setting AFI appropriately to indicate IPv4 - or IPv6 and SAFI to the value assigned by IANA for NH SAFI. 
Note - that if BGP speaker whishes to exchange cost information for both - IPv4 and IPv6, then it MUST advertise two capabilities: one NH SAFI - for IPv4 and one NH SAFI for IPv6. - -4.3. INFORMATION ENCODING - - Routers use standard BGP UPDATE messages to exchange NH SAFI - information. Cost to reachable next-hops is communicated using - MP_REACH_NLRI (attribute 14) with NLRI part as described below. - Requests are also sent using MP_REACH_NLRI. Informing a neighbour - about unreachable next-hop is done using MP_UNREACH_NLRI. All NH - SAFI messages MUST contain BGP COMMUNITY attribute with value - NO_ADVERTISE (0xFFFFFF02) and their propagation MUST follow normal - BGP rules (i.e. they're not to be propagated). - - To request cost to a next-hop from peer or to inform peer about cost - to a next-hop BGP attribute 14 is used as follow: +4. BGPLS Extensions - 1. AFI is set to indicate IPv4 or IPv6 (whichever is appropriate) +4.1. RIB Metrics Prefix Descriptor - 2. SAFI is set to NH SAFI - 3. Network Address of Next-Hop field is zeroed out + This draft defines a new Prefix Descriptor known as a Cost Prefix + Descriptor with a TLV code point value to be assigned by IANA. The + Cost descriptor looks like: - 4. NLRI field is encoded as shown in the next figure + +--------------+-----------------------+----------+-----------------+ + | TLV Code | Description | Length | Value defined | + | Point | | | in: | + +--------------+-----------------------+----------+-----------------+ + | TBD | Cost | 4 bytes | Cost Value | + +--------------+-----------------------+----------+-----------------+ - Format of NH SAFI NLRI is as follow: - +-----+------+-------+----------+------+ - | AFI | SAFI | Flags | NEXT_HOP | cost | - +-----+------+-------+----------+------+ + Cost Value is a 4 byte Metric value computed by a Router's + local RIB. - Flags - 1 octet field. 
Least significant bit MUST be set to 1 for - Request and to zero for Response + The Cost value is a cost associated with a prefix by a Router. The + cost is typically computed by the routing protocol that owns the + route. - AFI/SAFI fields can be set either to one of the registered values to - indicate that next-hop cost info applies only to specified AFI/SAFI. - Alternatively when both fields are be set to zero, the cost - information applies to any compatible AFI/SAFI negotiated with given - peer. +4.2. RIB Protocol ID - Next-hop - IPv4 or IPv6 address for which cost is being communicated - or requested. Type is determined from context, and length is - inferred from total length of attribute. + This draft defines a new protocol ID for IPv4 and IPv6 Topology + Prefix NLRI known as a RIB Protocol ID. The RIB Protocol ID has a + value to be assigned by IANA. The Prefix NLRI with RIB Protocol ID + is used to announce all the local and IGP-computed routes that are + installed in the RIB along with their Cost values. - Cost is 32-bit unsigned integer (value described below), and NEXT_HOP - is AFI-specific address of the next-hop cost to which is being - communicated or requested. Size of NEXT_HOP field is inferred from - total length of attribute 14. +4.3. Information Exchange - To inform peer that particular next-hop is unreachable - MP_UNREACH_NLRI attribute is used with same NLRI format as described - above. In this case cost field SHOULD be set to 0xFFFFFFFF. + Typically, BGPLS sessions will be established between route-reflectors + and their internal peers (both clients and non-clients). As soon as + the BGPLS session is ESTABLISHED, all the RIB routes used to resolve + next-hop cost and information about next-hop costs MAY be sent + immediately by clients to their route-reflector. Implementations are + advised to announce BGP updates for this SAFI before any other SAFIs + to facilitate faster convergence of other SAFIs on Route Reflectors. -4.4. 
SESSION ESTABLISHMENT + Each internal neighbor of a route-reflector announces its IGP RIB + Prefix information and its RIB metrics to the Route Reflector using a + BGPLS session and a new NLRI Protocol ID and RIB metric Prefix + Descriptor. Each neighbor updates the Route Reflector with its IGP + prefix cost every time the cost to an IGP route changes. - BGP speakers willing to exchange next-hop information SHOULD NOT - establish more then one session for given AFI and NH SAFI, even using - different transport addresses. This can be ensured for example by - checking peer's Router Id. + Upon receipt of a BGPLS route and its associated cost, a Route + Reflector stores the prefix, cost, and neighbor information in its + local NHIB database. It then uses the received cost in the + calculation of the bestpath from the respective client's perspective, as + opposed to its own IGP cost. -4.5. INFORMATION EXCHANGE +4.4. Termination of the session carrying next-hop cost - Typically NH SAFI sessions will be established between route- - reflectors and its internal peers (both clients and non-clients). As - soon as the NH SAFI session is ESTABLISHED requests for next-hop cost - and information information about next-hop costs MAY be sent - independently. That is, route-reflector MAY send multiple requests - without waiting for response, and its peers MAY send cost information - before or after receiving such request. On the other hand, Router - Reflectors SHOULD request cost information from their internal peers - as soon as possible (due to reasons stated in section "BGP best path - selection modification"). BGP speaker does not need to track - outstanding requests to the peer. + When the BGPLS session carrying next-hop cost terminates (for + whatever reason), the BGP speaker SHOULD invalidate all the next-hop + cost information (i.e., the same treatment applies to the next-hop + cost as to any other BGP-learned information). 
- When a BGP speaker receives request for cost information it MUST - reply with actual cost (not necessarily IGP cost, but whatever has - been chosen to be carried in NH SAFI) to given next-hop or with cost - set to all-ones indicating that next-hop is unreachable. If next-hop - information is obtained from sender's routing table, then sender MUST - perform lookup exactly the same way as it would for resolving next- - hop in BGP UPDATE message. For example, for non-labelled - destinations (e.g. AFI/SAFI 1/1 or 2/1) lookup would be done using - longest match, whereas for labelled IPv4 (AFI/SAFI 1/4, 1/128 or 2/4) - exact-match would be used. +4.5. Graceful Restart and Route-Refresh - When a BGP speaker detects change in cost to previously advertised - next-hop with delta equal or exceeding configured advertisement - threshold, it SHOULD inform peer by sending MP_UNREACH_NLRI as - described earlier. + BGPLS sessions carrying next-hop cost could use Graceful Restart + [RFC4724] and Route Refresh [RFC7313] mechanisms in the same way as + they are used for IPv4 and IPv6 unicast. - When a BGP speaker discovers new next-hop among candidate routes it - SHOULD request cost information from the peer. +5. Security considerations -4.6. TERMINATION OF NH SAFI SESSION + This document does not introduce new security considerations above + and beyond those already specified in [RFC4271], + [I-D.ietf-idr-bgp-optimal-route-reflection] and [I-D.ietf-idr-ls-distribution]. - When BGP speaker terminates (for whatever reason) NH SAFI session - with a peer, it SHOULD remove all cost information received from that - peer unless instructed by configuration to do otherwise. +6. IANA Considerations -4.7. GRACEFUL RESTART AND ROUTE REFRESH + This draft defines a new protocol ID value for RIB Protocol ID. This + draft requests IANA to allocate a value for a RIB Protocol ID from + the BGPLS Protocol ID Registry. 
- NH SAFI sessions could use graceful restart and route refresh - mechanisms in the same way as it's used for IPv4 and IPv6 unicast - - preservation and purge of next-hop cost information follows normal GR - rules. + This draft defines a new RIB Metrics Prefix Descriptor value. This + draft requests IANA to allocate a TLV code value for the new + descriptor from the Prefix Descriptor registry. -5. Security considerations +7. Acknowledgements - No new security issues are introduced to the BGP protocol by this - specification. + The authors would like to acknowledge David Ward, Anton Elita, + Nagendra Kumar and Burjiz Pithawala for their critical reviews and + feedback. -6. IANA Considerations +8. References - IANA is requested to allocate value for Next-Hop Subsequent Address - Family Identifier. +8.1. Normative References -7. Acknowledgment + [I-D.ietf-idr-bgp-optimal-route-reflection] + Raszuk, R., Cassar, C., Aman, E., Decraene, B., and S. + Litkowski, "BGP Optimal Route Reflection (BGP-ORR)", + draft-ietf-idr-bgp-optimal-route-reflection-09 (work in + progress), April 2015. - Authors would like to thank Keyur Patel, Anton Elita, Nagendra Kumar - for critical reviews and feedback. + [I-D.ietf-idr-ls-distribution] + Gredler, H., Medved, J., Previdi, S., Farrel, A., and S. + Ray, "North-Bound Distribution of Link-State and TE + Information using BGP", draft-ietf-idr-ls-distribution-10 + (work in progress), January 2015. -8. References + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, March 1997. -8.1. Normative References + [RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998. [RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, January 2006. - [RFC4760] Bates, T., Chandra, R., Katz, D., and Y. Rekhter, - "Multiprotocol Extensions for BGP-4", RFC 4760, + [RFC4456] Bates, T., Chen, E., and R. 
Chandra, "BGP Route + Reflection: An Alternative to Full Mesh Internal BGP + (IBGP)", RFC 4456, April 2006. + + [RFC4724] Sangli, S., Chen, E., Fernando, R., Scudder, J., and Y. + Rekhter, "Graceful Restart Mechanism for BGP", RFC 4724, January 2007. -8.2. Informative References + [RFC4760] Bates, T., Chandra, R., Katz, D., and Y. Rekhter, + "Multiprotocol Extensions for BGP-4", RFC 4760, January + 2007. - [I-D.raszuk-bgp-optimal-route-reflection] - Raszuk, R., Cassar, C., Aman, E., and B. Decraene, "BGP - Optimal Route Reflection (BGP-ORR)", - draft-raszuk-bgp-optimal-route-reflection-01 (work in - progress), March 2011. + [RFC7313] Patel, K., Chen, E., and B. Venkatachalapathy, "Enhanced + Route Refresh Capability for BGP-4", RFC 7313, July 2014. + +8.2. Informative References [RFC2918] Chen, E., "Route Refresh Capability for BGP-4", RFC 2918, September 2000. Appendix A. USAGE SCENARIOS A.1. Trivial case --+---NetA---+-- | | @@ -312,69 +300,70 @@ R3 In this scenario r1 and r3 along with NetA are part of AS1; and R1-R4 along with RR are in AS2. If RR implements non-optimized route-reflection, then it will choose path to NetA via R1 and advertise it to both R3 and R4. Such choice is good from R3 perspective, but it results in suboptimal traffic flow from R4 to NetA. - Using NH SAFI the route-reflector will learn that cost from R4 to R1 - is 8 whereas to R2 it's only 1. RR will announce NetA to R4 with - next-hop set to R2, while its announce to R3 will still have R1 as - next-hop. Both R3 and R4 now will send traffic to NetA via closest - exit, achieving same behaviour as if full iBGP mesh would have been - configured. + Using the proposed BGPLS extensions, the route-reflector will learn + that the cost from R4 to R1 is 8 whereas to R2 it's only 1. RR will + announce NetA to R4 with next-hop set to R2, while its announcement to R3 + will still have R1 as next-hop. 
Both R3 and R4 will now send traffic + to NetA via the closest exit, achieving the same behaviour as if a full iBGP + mesh had been configured. A.2. Non-IGP based cost When it's desirable to direct traffic over an exit other than the one - with smallest IGP cost, NH SAFI can be used to convey cost which is - not based on IGP. For example, network operator may arrange exit - points in order of administrative preference and configure routers to - send this instead of IGP cost. Route reflector then will then - calculate best path based on administrative preference rather than - IGP metrics. + with the smallest IGP cost, BGPLS extensions can be used to convey a cost + which is not based on IGP. For example, a network operator may arrange + exit points in order of administrative preference and configure + routers to send this instead of the IGP cost. The route reflector will + then calculate the best path based on administrative preference rather + than IGP metrics. Network operators should exercise care to ensure that all routers up to and including exit point do not divert packets on to a different path, otherwise routing loops may occur. One way to achieve this is to have consistent administrative preference among all routers. Another option is to use a tunneling mechanism (e.g. MPLS-TE tunnel) between source and the exit point, provided that the router serving as exit point will send packets out of the network rather than diverting them to another exit point. A.3. Multiple route-reflectors - This example demonstrates that NH SAFI peerings are necessary only + This example demonstrates that BGPLS extensions are necessary only between routers that already exchange other AFI/SAFI. | R1----R3---------R5----R7--+ | | | RR1 | NetA | RR2 | | | | R2----R4---------R6----R8--+ | In the above network the routers R1-R4 are clients of RR1, and R5-R8 are clients of RR2. RR1 and RR2 also peer with each other and use ADDPATH. RR2 learns about NetA from R7 and R8. 
Since it sends not just best- path but all prefixes to RR1, there is no need for RR2 to learn cost information from R1 and R2 towards R7 and R8. On the other hand RR1 - does exchange NH SAFI information with R1 and R2 so that each of them - can receive routes, which are best from their perspective. + does exchange cost information using BGPLS with R1 and R2 so that + each of them can receive routes, which are best from their + perspective. As addition to ADDPATH a mechanism could be devised that would allow RR2 to learn how many alternative routes does it need to send to RR1. For example, if NetA would also be connected to R9 (not shown) but all clients of RR1 prefer R7 as exit point and R9 as next-best, then there is no need for RR2 to send NetA routes with next-hop R8 to RR1. Discussion: authors would like to solicit discussion whether there is sufficient interest in such mechanism. @@ -400,20 +389,44 @@ selection modification" requires RR to have next-hop cost information for every next-hop and every peer. Note that the problem is the same as if RR would not use extensions described in this document and R3 would peer directly with R1 and R2, while R4 would peer only with RR. Authors' Addresses Ilya Varlashkin - Easynet Global Services + Google + + Email: ilya@nobulus.com - Email: ilya.varlashkin@easynet.com Robert Raszuk - NTT MCL Inc. - 101 S Ellsworth Avenue Suite 350 - San Mateo, CA 94401 - US + Mirantis Inc. + 615 National Ave. #100 + Mt View, CA 94043 + USA Email: robert@raszuk.net + + Keyur Patel + Cisco Systems + 170 W. Tasman Drive + San Jose, CA 95124 95134 + USA + + Email: keyupate@cisco.com + Manish Bhardwaj + Cisco Systems + 170 W. Tasman Drive + San Jose, CA 95124 95134 + USA + + Email: manbhard@cisco.com + + Serpil Bayraktar + Cisco Systems + 170 W. Tasman Drive + San Jose, CA 95124 95134 + USA + + Email: serpil@cisco.com
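Illustrative sketch (not part of either draft revision): the NHIB triplet (router, next-hop, cost) from Section 2 and the per-peer selection rule from Section 3 could be modelled roughly as below. The class and function names, the use of 0xFFFFFFFF as an "unknown cost" sentinel (borrowed from the -01 text), and the decision to return no path when a peer has no usable cost are all assumptions made for illustration. The R4 costs (8 to R1, 1 to R2) come from Appendix A.1; the R3 costs are hypothetical, since the drafts do not state them.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple

# Assumption: sentinel meaning "no usable cost"; the -01 draft used this
# value for unreachable next-hops, the -02 draft does not define one.
UNREACHABLE = 0xFFFFFFFF


class Path(NamedTuple):
    prefix: str
    next_hop: str


class NHIB:
    """Next-Hop Information Base: maps (router, next-hop) to a cost."""

    def __init__(self) -> None:
        self._cost: Dict[Tuple[str, str], int] = {}

    def update(self, router: str, next_hop: str, cost: int) -> None:
        # Store or overwrite the cost reported by `router` for `next_hop`.
        self._cost[(router, next_hop)] = cost

    def invalidate(self, router: str) -> None:
        # Drop everything learned from `router`, e.g. when the session
        # carrying its next-hop costs terminates (Section 4.4).
        self._cost = {k: v for k, v in self._cost.items() if k[0] != router}

    def cost(self, router: str, next_hop: str) -> int:
        return self._cost.get((router, next_hop), UNREACHABLE)


def best_path_for(peer: str, candidates: List[Path], nhib: NHIB) -> Optional[Path]:
    """Pick the candidate with the lowest cost as seen from `peer`.

    `candidates` are assumed to be paths still tied after the earlier
    decision-process steps; only the next-hop cost comparison is modelled.
    """
    usable = [p for p in candidates if nhib.cost(peer, p.next_hop) != UNREACHABLE]
    if not usable:
        return None  # assumption: no cost information means no selection
    return min(usable, key=lambda p: nhib.cost(peer, p.next_hop))


if __name__ == "__main__":
    nhib = NHIB()
    nhib.update("R4", "R1", 8)  # from Appendix A.1
    nhib.update("R4", "R2", 1)  # from Appendix A.1
    nhib.update("R3", "R1", 1)  # hypothetical value
    nhib.update("R3", "R2", 8)  # hypothetical value
    paths = [Path("NetA", "R1"), Path("NetA", "R2")]
    for client in ("R3", "R4"):
        print(client, best_path_for(client, paths, nhib))
```

Keeping the cost table keyed by (router, next-hop) makes the Section 4.4 invalidation a single filter over the table; a real implementation would be free to store this information differently, as Section 2 notes.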