--- 1/draft-ietf-idr-bgp-nh-cost-00.txt 2012-03-27 10:13:59.374756439 +0200
+++ 2/draft-ietf-idr-bgp-nh-cost-01.txt 2012-03-27 10:13:59.394757428 +0200
@@ -1,19 +1,19 @@

 Internet Engineering Task Force                            I. Varlashkin
 Internet-Draft                                   Easynet Global Services
 Intended status: Standards Track                               R. Raszuk
-Expires: August 2, 2012                                     NTT MCL Inc.
-                                                        January 30, 2012
+Expires: September 28, 2012                                 NTT MCL Inc.
+                                                          March 27, 2012

                Carrying next-hop cost information in BGP
-                     draft-ietf-idr-bgp-nh-cost-00
+                     draft-ietf-idr-bgp-nh-cost-01

 Abstract

    This document describes new BGP SAFI to exchange cost information to
    next-hops for the purpose of calculating best path from a peer
    perspective rather than local BGP speaker own perspective.

 Status of this Memo

    This Internet-Draft is submitted in full conformance with the
@@ -22,21 +22,21 @@
    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF).  Note that other groups may also distribute
    working documents as Internet-Drafts.  The list of current Internet-
    Drafts is at http://datatracker.ietf.org/drafts/current/.

    Internet-Drafts are draft documents valid for a maximum of six months
    and may be updated, replaced, or obsoleted by other documents at any
    time.  It is inappropriate to use Internet-Drafts as reference
    material or to cite them other than as "work in progress."

-   This Internet-Draft will expire on August 2, 2012.
+   This Internet-Draft will expire on September 28, 2012.

 Copyright Notice

    Copyright (c) 2012 IETF Trust and the persons identified as the
    document authors.  All rights reserved.

    This document is subject to BCP 78 and the IETF Trust's Legal
    Provisions Relating to IETF Documents
    (http://trustee.ietf.org/license-info) in effect on the date of
    publication of this document.  Please review these documents
@@ -66,29 +66,30 @@
    4.  USING BGP TO POPULATE NHIB . . . . . . . . . . . . . . . . . .  4
      4.1.  NEXT-HOP SAFI . . . . . . . . . . . . . . . . . . . . . . . 4
      4.2.  CAPABILITY ADVERTISEMENT . . . . . . . . . . . . . . . . .  4
      4.3.  INFORMATION ENCODING . . . . . . . . . . . . . . . . . . .  4
      4.4.  SESSION ESTABLISHMENT . . . . . . . . . . . . . . . . . . . 5
      4.5.  INFORMATION EXCHANGE . . . . . . . . . . . . . . . . . . .  5
      4.6.  TERMINATION OF NH SAFI SESSION . . . . . . . . . . . . . .  6
      4.7.  GRACEFUL RESTART AND ROUTE REFRESH . . . . . . . . . . . .  6
    5.  Security considerations . . . . . . . . . . . . . . . . . . . . 6
    6.  IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 6
-   7.  References . . . . . . . . . . . . . . . . . . . . . . . . . .  6
-     7.1.  Normative References . . . . . . . . . . . . . . . . . . .  6
-     7.2.  Informative References . . . . . . . . . . . . . . . . . .  6
+   7.  Acknowledgment . . . . . . . . . . . . . . . . . . . . . . . .  7
+   8.  References . . . . . . . . . . . . . . . . . . . . . . . . . .  7
+     8.1.  Normative References . . . . . . . . . . . . . . . . . . .  7
+     8.2.  Informative References . . . . . . . . . . . . . . . . . .  7
    Appendix A.  USAGE SCENARIOS . . . . . . . . . . . . . . . . . . .  7
      A.1.  Trivial case . . . . . . . . . . . . . . . . . . . . . . .  7
-     A.2.  Non-IGP based cost . . . . . . . . . . . . . . . . . . . .  7
+     A.2.  Non-IGP based cost . . . . . . . . . . . . . . . . . . . .  8
      A.3.  Multiple route-reflectors . . . . . . . . . . . . . . . . . 8
-     A.4.  Inter-AS MPLS VPN . . . . . . . . . . . . . . . . . . . . . 8
-     A.5.  Corner case . . . . . . . . . . . . . . . . . . . . . . . . 8
+     A.4.  Inter-AS MPLS VPN . . . . . . . . . . . . . . . . . . . . . 9
+     A.5.  Corner case . . . . . . . . . . . . . . . . . . . . . . . . 9
    Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . .  9

 1.  Motivation

    In certain situation route-reflector clients may not get optimum path
    to certain destinations.  ADDPATH solves this problem by letting
    route-reflector to advertise multiple paths for given prefix.  If
    number of advertised paths sufficiently big, route-reflector clients
    can choose same route as they would in case of full-mesh.  This
    approach however places additional burden on the control plane.
@@ -154,48 +155,65 @@
    A BGP speaker willing to exchange next-hop information MUST advertise
    this in the OPEN message using BGP Capability Code 1 (Multiprotocol
    Extensions, see [RFC4760]) setting AFI appropriately to indicate IPv4
    or IPv6 and SAFI to the value assigned by IANA for NH SAFI.  Note
    that if BGP speaker whishes to exchange cost information for both
    IPv4 and IPv6, then it MUST advertise two capabilities: one NH SAFI
    for IPv4 and one NH SAFI for IPv6.

 4.3.  INFORMATION ENCODING

+   Routers use standard BGP UPDATE messages to exchange NH SAFI
+   information.  Cost to reachable next-hops is communicated using
+   MP_REACH_NLRI (attribute 14) with NLRI part as described below.
+   Requests are also sent using MP_REACH_NLRI.  Informing a neighbour
+   about unreachable next-hop is done using MP_UNREACH_NLRI.  All NH
+   SAFI messages MUST contain BGP COMMUNITY attribute with value
+   NO_ADVERTISE (0xFFFFFF02) and their propagation MUST follow normal
+   BGP rules (i.e. they're not to be propagated).
+
    To request cost to a next-hop from peer or to inform peer about cost
    to a next-hop BGP attribute 14 is used as follow:

    1.  AFI is set to indicate IPv4 or IPv6 (whichever is appropriate)

    2.  SAFI is set to NH SAFI

-   3.  Network Address of Next-Hop field is zeroed out

    4.  NLRI field is encoded as shown in the next figure

-      +-------------+------------+
-      |  NEXT_HOP   |    cost    |
-      +-------------+------------+
+   Format of NH SAFI NLRI is as follow:
+      +-----+------+-------+----------+------+
+      | AFI | SAFI | Flags | NEXT_HOP | cost |
+      +-----+------+-------+----------+------+

-   Where cost is 32-bit unsigned integer (value described below), and
-   NEXT_HOP is AFI-specific address of the next-hop cost to which is
-   being communicated or requested.  Size of NEXT_HOP field is inferred
-   from total length of attribute 14.
+   Flags - 1 octet field.  Least significant bit MUST be set to 1 for
+   Request and to zero for Response

-   To request cost to arbitrary next-hop from a peer, BGP speaker sets
-   cost field to zero.
+   AFI/SAFI fields can be set either to one of the registered values to
+   indicate that next-hop cost info applies only to specified AFI/SAFI.
+   Alternatively when both fields are be set to zero, the cost
+   information applies to any compatible AFI/SAFI negotiated with given
+   peer.

-   To inform peer about cost to a next-hop BGP speaker sets cost to
-   actual cost value.
+   Next-hop - IPv4 or IPv6 address for which cost is being communicated
+   or requested.  Type is determined from context, and length is
+   inferred from total length of attribute.

-   To inform peer that a next-hop is not reachable the cost is set to
-   all-ones (0xFFFFFFFF).
+   Cost is 32-bit unsigned integer (value described below), and NEXT_HOP
+   is AFI-specific address of the next-hop cost to which is being
+   communicated or requested.  Size of NEXT_HOP field is inferred from
+   total length of attribute 14.
+
+   To inform peer that particular next-hop is unreachable
+   MP_UNREACH_NLRI attribute is used with same NLRI format as described
+   above.  In this case cost field SHOULD be set to 0xFFFFFFFF.

 4.4.  SESSION ESTABLISHMENT

    BGP speakers willing to exchange next-hop information SHOULD NOT
    establish more then one session for given AFI and NH SAFI, even using
    different transport addresses.  This can be ensured for example by
    checking peer's Router Id.

 4.5.  INFORMATION EXCHANGE

@@ -207,66 +225,76 @@
    without waiting for response, and its peers MAY send cost information
    before or after receiving such request.  On the other hand, Router
    Reflectors SHOULD request cost information from their internal peers
    as soon as possible (due to reasons stated in section "BGP best path
    selection modification").  BGP speaker does not need to track
    outstanding requests to the peer.

    When a BGP speaker receives request for cost information it MUST
    reply with actual cost (not necessarily IGP cost, but whatever has
    been chosen to be carried in NH SAFI) to given next-hop or with cost
-   set to all-ones indicating that next-hop is unreachable.
-
-   Note that BGP speaker MUST use longest match rather than exact match
-   for the next-hop.
+   set to all-ones indicating that next-hop is unreachable.  If next-hop
+   information is obtained from sender's routing table, then sender MUST
+   perform lookup exactly the same way as it would for resolving next-
+   hop in BGP UPDATE message.  For example, for non-labelled
+   destinations (e.g. AFI/SAFI 1/1 or 2/1) lookup would be done using
+   longest match, whereas for labelled IPv4 (AFI/SAFI 1/4, 1/128 or 2/4)
+   exact-match would be used.

    When a BGP speaker detects change in cost to previously advertised
    next-hop with delta equal or exceeding configured advertisement
-   threshold, it SHOULD inform peer by advertising new cost or
-   0xFFFFFFFF.
+   threshold, it SHOULD inform peer by sending MP_UNREACH_NLRI as
+   described earlier.

    When a BGP speaker discovers new next-hop among candidate routes it
    SHOULD request cost information from the peer.

 4.6.  TERMINATION OF NH SAFI SESSION

    When BGP speaker terminates (for whatever reason) NH SAFI session
    with a peer, it SHOULD remove all cost information received from that
    peer unless instructed by configuration to do otherwise.

 4.7.  GRACEFUL RESTART AND ROUTE REFRESH

    NH SAFI sessions could use graceful restart and route refresh
-   mechanisms in the same way as it's used for IPv4 and IPv6 unicast.
+   mechanisms in the same way as it's used for IPv4 and IPv6 unicast -
+   preservation and purge of next-hop cost information follows normal GR
+   rules.

 5.  Security considerations

    No new security issues are introduced to the BGP protocol by this
    specification.

 6.  IANA Considerations

    IANA is requested to allocate value for Next-Hop Subsequent Address
    Family Identifier.

-7.  References
+7.  Acknowledgment

-7.1.  Normative References
+   Authors would like to thank Keyur Patel, Anton Elita, Nagendra Kumar
+   for critical reviews and feedback.
+
+8.  References
+
+8.1.  Normative References

    [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
               Protocol 4 (BGP-4)", RFC 4271, January 2006.

    [RFC4760]  Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
               "Multiprotocol Extensions for BGP-4", RFC 4760,
               January 2007.

-7.2.  Informative References
+8.2.  Informative References

    [I-D.raszuk-bgp-optimal-route-reflection]
               Raszuk, R., Cassar, C., Aman, E., and B. Decraene, "BGP
               Optimal Route Reflection (BGP-ORR)",
               draft-raszuk-bgp-optimal-route-reflection-01 (work in
               progress), March 2011.

    [RFC2918]  Chen, E., "Route Refresh Capability for BGP-4", RFC 2918,
               September 2000.
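
Reviewer note on the -01 NLRI layout (section 4.3 in the hunk above). The sketch below is not part of either draft revision; it only illustrates how the AFI | SAFI | Flags | NEXT_HOP | cost fields might be packed and parsed. Several things are assumptions: the draft text states only that Flags is 1 octet and cost is a 32-bit unsigned integer, so the 2-octet AFI and 1-octet SAFI widths are borrowed from their usual BGP sizes; the function names and the choice of cost 0 in a Request are hypothetical.

```python
import ipaddress
import struct

REQUEST_FLAG = 0x01            # least significant bit of Flags: 1 = Request, 0 = Response
UNREACHABLE_COST = 0xFFFFFFFF  # cost the draft says SHOULD accompany MP_UNREACH_NLRI


def encode_nh_nlri(next_hop, cost=0, request=False, afi=0, safi=0):
    """Pack AFI | SAFI | Flags | NEXT_HOP | cost in network byte order.

    Assumed widths: AFI 2 octets, SAFI 1 octet (not stated in the draft).
    afi=0 and safi=0 mean "any compatible AFI/SAFI negotiated with the peer".
    """
    addr = ipaddress.ip_address(next_hop)
    if not 0 <= cost <= UNREACHABLE_COST:
        raise ValueError("cost must fit in 32 bits")
    flags = REQUEST_FLAG if request else 0
    header = struct.pack("!HBB", afi, safi, flags)
    return header + addr.packed + struct.pack("!I", cost)


def decode_nh_nlri(data):
    """Parse one NLRI; NEXT_HOP length is inferred from the total length."""
    addr_len = len(data) - 8  # 4 octets of AFI/SAFI/Flags plus 4 octets of cost
    if addr_len not in (4, 16):
        raise ValueError("NEXT_HOP must be 4 (IPv4) or 16 (IPv6) octets")
    afi, safi, flags = struct.unpack("!HBB", data[:4])
    next_hop = ipaddress.ip_address(data[4:4 + addr_len])
    (cost,) = struct.unpack("!I", data[4 + addr_len:])
    return {
        "afi": afi,
        "safi": safi,
        "request": bool(flags & REQUEST_FLAG),
        "next_hop": next_hop,
        "cost": cost,
    }


if __name__ == "__main__":
    # A Response carrying cost 20 to an IPv4 next-hop, valid for any AFI/SAFI.
    wire = encode_nh_nlri("192.0.2.1", cost=20)
    print(wire.hex(), decode_nh_nlri(wire))
```

Because the draft infers the NEXT_HOP size from the attribute length, the decoder above handles a single NLRI per buffer; how several NLRIs would be delimited inside one attribute is not specified in the text quoted in this diff.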