draft-ietf-idr-bgp-nh-cost-01.txt | draft-ietf-idr-bgp-nh-cost-02.txt | |||
---|---|---|---|---|
Internet Engineering Task Force I. Varlashkin | Internet Engineering Task Force I. Varlashkin | |||
Internet-Draft Easynet Global Services | Internet-Draft Google | |||
Intended status: Standards Track R. Raszuk | Intended status: Standards Track R. Raszuk | |||
Expires: September 28, 2012 NTT MCL Inc. | Expires: November 16, 2015 Mirantis Inc. | |||
March 27, 2012 | K. Patel | |||
M. Bhardwaj | ||||
S. Bayraktar | ||||
Cisco Systems | ||||
May 15, 2015 | ||||
Carrying next-hop cost information in BGP | Carrying next-hop cost information in BGP | |||
draft-ietf-idr-bgp-nh-cost-01 | draft-ietf-idr-bgp-nh-cost-02 | |||
Abstract | Abstract | |||
This document describes new BGP SAFI to exchange cost information to | BGPLS provides a mechanism by which Link state and traffic | |||
next-hops for the purpose of calculating best path from a peer | engineering information can be collected from internal networks and | |||
perspective rather than local BGP speaker own perspective. | shared with external network routers using BGP. BGPLS defines a new | |||
Address Family to exchange this information using BGP. | ||||
Status of this Memo | BGP Optimal Route Reflection (ORR) provides a mechanism for a | |||
centralized BGP Route Reflector to achieve the requirements of Hot | ||||
Potato Routing as described in Section 11 of [RFC4456]. Optimal | ||||
Route Reflection requires BGP ORR to override the default IGP | ||||
location placement of the route reflector, which is used for | ||||
determining cost to the nexthop contained in the path. | ||||
This draft augments BGPLS and defines new extensions to exchange | ||||
cost information to next-hops for the purpose of calculating best | ||||
path from a peer's perspective rather than the local BGP speaker's | ||||
own perspective. | ||||
Status of This Memo | ||||
This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on November 16, 2015. | ||||
This Internet-Draft will expire on September 28, 2012. | ||||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2012 IETF Trust and the persons identified as the | Copyright (c) 2015 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | described in the Simplified BSD License. | |||
This document may contain material from IETF Documents or IETF | ||||
Contributions published or made publicly available before November | ||||
10, 2008. The person(s) controlling the copyright in some of this | ||||
material may not have granted the IETF Trust the right to allow | ||||
modifications of such material outside the IETF Standards Process. | ||||
Without obtaining an adequate license from the person(s) controlling | ||||
the copyright in such materials, this document may not be modified | ||||
outside the IETF Standards Process, and derivative works of it may | ||||
not be created outside the IETF Standards Process, except to format | ||||
it for publication as an RFC or to translate it into languages other | ||||
than English. | ||||
Table of Contents | Table of Contents | |||
1. Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | |||
2. NEXT-HOP INFORMATION BASE . . . . . . . . . . . . . . . . . . . 3 | 2. NEXT-HOP INFORMATION BASE . . . . . . . . . . . . . . . . . . 3 | |||
3. BGP BEST PATH SELECTION MODIFICATION . . . . . . . . . . . . . 3 | 3. BGP Bestpath Selection Modification . . . . . . . . . . . . . 4 | |||
4. USING BGP TO POPULATE NHIB . . . . . . . . . . . . . . . . . . 4 | 4. BGPLS Extensions . . . . . . . . . . . . . . . . . . . . . . 4 | |||
4.1. NEXT-HOP SAFI . . . . . . . . . . . . . . . . . . . . . . . 4 | 4.1. RIB Metrics Prefix Descriptor . . . . . . . . . . . . . . 4 | |||
4.2. CAPABILITY ADVERTISEMENT . . . . . . . . . . . . . . . . . 4 | 4.2. RIB Protocol ID . . . . . . . . . . . . . . . . . . . . . 4 | |||
4.3. INFORMATION ENCODING . . . . . . . . . . . . . . . . . . . 4 | 4.3. Information Exchange . . . . . . . . . . . . . . . . . . 5 | |||
4.4. SESSION ESTABLISHMENT . . . . . . . . . . . . . . . . . . . 5 | 4.4. Termination of the session carrying next-hop cost . . . . 5 | |||
4.5. INFORMATION EXCHANGE . . . . . . . . . . . . . . . . . . . 5 | 4.5. Graceful Restart and Route-Refresh . . . . . . . . . . . 5 | |||
4.6. TERMINATION OF NH SAFI SESSION . . . . . . . . . . . . . . 6 | 5. Security considerations . . . . . . . . . . . . . . . . . . . 5 | |||
4.7. GRACEFUL RESTART AND ROUTE REFRESH . . . . . . . . . . . . 6 | 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 5 | |||
5. Security considerations . . . . . . . . . . . . . . . . . . . . 6 | 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 6 | |||
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 6 | 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
7. Acknowledgment . . . . . . . . . . . . . . . . . . . . . . . . 7 | 8.1. Normative References . . . . . . . . . . . . . . . . . . 6 | |||
8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 7 | 8.2. Informative References . . . . . . . . . . . . . . . . . 7 | |||
8.1. Normative References . . . . . . . . . . . . . . . . . . . 7 | Appendix A. USAGE SCENARIOS . . . . . . . . . . . . . . . . . . 7 | |||
8.2. Informative References . . . . . . . . . . . . . . . . . . 7 | A.1. Trivial case . . . . . . . . . . . . . . . . . . . . . . 7 | |||
Appendix A. USAGE SCENARIOS . . . . . . . . . . . . . . . . . . . 7 | A.2. Non-IGP based cost . . . . . . . . . . . . . . . . . . . 7 | |||
A.1. Trivial case . . . . . . . . . . . . . . . . . . . . . . . 7 | A.3. Multiple route-reflectors . . . . . . . . . . . . . . . . 8 | |||
A.2. Non-IGP based cost . . . . . . . . . . . . . . . . . . . . 8 | A.4. Inter-AS MPLS VPN . . . . . . . . . . . . . . . . . . . . 8 | |||
A.3. Multiple route-reflectors . . . . . . . . . . . . . . . . . 8 | A.5. Corner case . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
A.4. Inter-AS MPLS VPN . . . . . . . . . . . . . . . . . . . . . 9 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
A.5. Corner case . . . . . . . . . . . . . . . . . . . . . . . . 9 | ||||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 9 | ||||
1. Motivation | 1. Introduction | |||
In certain situation route-reflector clients may not get optimum path | In certain situations, route-reflector clients may not get the optimum | |||
to certain destinations. ADDPATH solves this problem by letting | path to certain destinations. ADDPATH solves this problem by letting | |||
route-reflector to advertise multiple paths for given prefix. If | the route-reflector advertise multiple paths for a given prefix. If the | |||
number of advertised paths sufficiently big, route-reflector clients | number of advertised paths is sufficiently large, route-reflector | |||
can choose same route as they would in case of full-mesh. This | clients can choose same route as they would in case of full-mesh. | |||
approach however places additional burden on the control plane. | This approach however places an additional burden on the control | |||
Solutions proposed by [BGP-ORR] use different approach - instead of | plane. Solutions proposed by [BGP-ORR] use different approach - | |||
calculating best path from local speaker own perspective the | instead of calculating best path from the local speaker's own | |||
calculations are done using cost from the client to the next-hops. | perspective the calculations are done using cost from the client to | |||
Although they eliminate need for transmitting redundant routing | the next-hops. Although they eliminate need for transmitting | |||
information between peers, there are scenarios where cost to the | redundant routing information between peers, there are scenarios | |||
next-hop cannot be obtained accurately using this methods. For | where cost to the next-hop cannot be obtained accurately using these | |||
example, if next-hop information itself has been learned via BGP then | methods. For example, if next-hop information itself has been | |||
simple SPF run on link-state database won't be sufficient to obtain | learned via BGP then simple SPF run on link-state database won't be | |||
cost information. To address such scenarios this document proposes a | sufficient to obtain cost information. There are also scenarios | |||
solution where cost information to the next-hops is carried within | where a Route Reflector can reach its clients but the client to | |||
BGP itself using dedicated SAFI. | client connectivity may be down. | |||
BGPLS [I-D.ietf-idr-ls-distribution] provides a mechanism by which | ||||
Link state and traffic engineering information can be collected from | ||||
internal networks and shared with external network routers using BGP. | ||||
BGPLS defines a new Address Family to exchange this information using | ||||
BGP. | ||||
To address such scenarios, this draft defines extensions to BGPLS to | ||||
carry cost information of the next-hops. In particular, this draft | ||||
defines a new Protocol ID to announce a Router's IGP routes, and a | ||||
Prefix Descriptor to carry the cost information of the IGP routes | ||||
used towards resolving next-hops. | ||||
2. NEXT-HOP INFORMATION BASE | 2. NEXT-HOP INFORMATION BASE | |||
To facilitate further description of the proposed solution we | To facilitate further description of the proposed solution we | |||
introduce new table for all known next hops and costs to them from | introduce a new table for all known next-hops and costs to them from | |||
various routers on the network. | various routers on the network. | |||
Next-Hop Information Base (NHIB) stores cost to reach next-hop from | Next-Hop Information Base (NHIB) stores cost to reach next-hop from | |||
arbitrary router on the network. This information is essential for | an arbitrary router on the network. This information is essential | |||
choosing best path from a peer perspective rather than BGP-speaker | for choosing best path from a peer perspective rather than BGP- | |||
own perspective. In canonical form NHIB entry is triplet (router, | speaker own perspective. In canonical form NHIB entry is triplet | |||
next-hop, cost), however this specification does not impose any | (router, next-hop, cost), however this specification does not impose | |||
restriction on how BGP implementations store that information | any restriction on how BGP implementations store that information | |||
internally. The cost in NHIB does not have to be an IGP cost, but | internally. The cost in NHIB does not have to be an IGP cost, but | |||
all costs in NHIB MUST be comparable with each other. | all costs in NHIB MUST be comparable with each other. | |||
NHIB can be populated from various sources both static and dynamic. | NHIB can be populated from various sources including static routing | |||
This document focuses on populating NHIB using BGP. However it is | and dynamic routing. However, this document focuses on populating | |||
possible that protocols other than BGP could be also used to populate | NHIB using BGP. | |||
NHIB. | ||||
3. BGP BEST PATH SELECTION MODIFICATION | An implementation implementing the BGP extension described in this | |||
draft MAY provide an operator-controlled configuration knob | ||||
significant to an individual BGP speaker that treats next-hop cost | ||||
information received from two or more clients as equivalent. For | ||||
example a route-reflector could receive next-hop cost only from R1 | ||||
but it will use it while calculating best-path also for R2, R3, Rn | ||||
because it has been instructed to do so by locally-significant | ||||
configuration. Multiple sources can be used for redundancy purpose. | ||||
3. BGP Bestpath Selection Modification | ||||
This section applies regardless of method used to populate NHIB. | This section applies regardless of method used to populate NHIB. | |||
When BGP speaker conforming to this specification selects routes to | When BGP speaker conforming to this specification selects routes to | |||
be advertised to a peer it SHOULD use cost information from NHIB | be advertised to a peer it SHOULD use cost information from NHIB | |||
rather than its own IGP cost to the next-hop after step (d) of | rather than its own IGP cost to the next-hop after step (d) of | |||
9.1.2.2 in [RFC4271]. | 9.1.2.2 in [RFC4271]. | |||
4. USING BGP TO POPULATE NHIB | 4. BGPLS Extensions | |||
This section describes extension to base BGP specification that | ||||
allows BGP to be used for exchanging next-hop information between BGP | ||||
speakers via new SAFI in order to populate NHIB. Although next-hops | ||||
costs are exchanged via dedicated SAFI, this information is vital to | ||||
best path selection process for other AFI/SAFI (e.g. IPv4 and IPv6 | ||||
unicast). It's therefore recommended that next-hop cost information | ||||
is exchanged before other AFI/SAFI. | ||||
4.1. NEXT-HOP SAFI | ||||
This document introduces Next-Hop SAFI (NH SAFI) with value to be | ||||
assigned by IANA and purpose of exchanging information about cost to | ||||
next-hops. | ||||
4.2. CAPABILITY ADVERTISEMENT | ||||
A BGP speaker willing to exchange next-hop information MUST advertise | ||||
this in the OPEN message using BGP Capability Code 1 (Multiprotocol | ||||
Extensions, see [RFC4760]) setting AFI appropriately to indicate IPv4 | ||||
or IPv6 and SAFI to the value assigned by IANA for NH SAFI. Note | ||||
that if BGP speaker wishes to exchange cost information for both | ||||
IPv4 and IPv6, then it MUST advertise two capabilities: one NH SAFI | ||||
for IPv4 and one NH SAFI for IPv6. | ||||
4.3. INFORMATION ENCODING | ||||
Routers use standard BGP UPDATE messages to exchange NH SAFI | ||||
information. Cost to reachable next-hops is communicated using | ||||
MP_REACH_NLRI (attribute 14) with NLRI part as described below. | ||||
Requests are also sent using MP_REACH_NLRI. Informing a neighbour | ||||
about unreachable next-hop is done using MP_UNREACH_NLRI. All NH | ||||
SAFI messages MUST contain BGP COMMUNITY attribute with value | ||||
NO_ADVERTISE (0xFFFFFF02) and their propagation MUST follow normal | ||||
BGP rules (i.e. they're not to be propagated). | ||||
To request cost to a next-hop from peer or to inform peer about cost | ||||
to a next-hop BGP attribute 14 is used as follow: | ||||
1. AFI is set to indicate IPv4 or IPv6 (whichever is appropriate) | 4.1. RIB Metrics Prefix Descriptor | |||
2. SAFI is set to NH SAFI | This draft defines a new Prefix Descriptor known as a Cost Prefix | |||
3. Network Address of Next-Hop field is zeroed out | Descriptor with a TLV code point value to be assigned by IANA. The | |||
Cost descriptor looks like: | ||||
4. NLRI field is encoded as shown in the next figure | +--------------+-----------------------+----------+-----------------+ | |||
| TLV Code | Description | Length | Value defined | | ||||
| Point | | | in: | | ||||
+--------------+-----------------------+----------+-----------------+ | ||||
| TBD | Cost | 4 bytes | Cost Value | | ||||
+--------------+-----------------------+----------+-----------------+ | ||||
Format of NH SAFI NLRI is as follow: | Cost Value is a 4 byte Metric value computed by a Router's | |||
+-----+------+-------+----------+------+ | local RIB. | |||
| AFI | SAFI | Flags | NEXT_HOP | cost | | ||||
+-----+------+-------+----------+------+ | ||||
Flags - 1 octet field. Least significant bit MUST be set to 1 for | The Cost value is a cost associated with a prefix by a Router. The | |||
Request and to zero for Response | cost is typically computed by the routing protocol that owns the | |||
route. | ||||
AFI/SAFI fields can be set either to one of the registered values to | 4.2. RIB Protocol ID | |||
indicate that next-hop cost info applies only to specified AFI/SAFI. | ||||
Alternatively when both fields are set to zero, the cost | ||||
information applies to any compatible AFI/SAFI negotiated with given | ||||
peer. | ||||
Next-hop - IPv4 or IPv6 address for which cost is being communicated | This draft defines a new protocol ID for IPv4 and IPv6 Topology | |||
or requested. Type is determined from context, and length is | Prefix NLRI known as a RIB Protocol ID. The RIB Protocol ID has a | |||
inferred from total length of attribute. | value to be assigned by IANA. The Prefix NLRI with RIB Protocol ID | |||
is used to announce all the local and IGP-computed routes that are | ||||
installed in the RIB along with their Cost values. | ||||
Cost is 32-bit unsigned integer (value described below), and NEXT_HOP | 4.3. Information Exchange | |||
is AFI-specific address of the next-hop cost to which is being | ||||
communicated or requested. Size of NEXT_HOP field is inferred from | ||||
total length of attribute 14. | ||||
To inform peer that particular next-hop is unreachable | Typically BGPLS sessions will be established between route-reflectors | |||
MP_UNREACH_NLRI attribute is used with same NLRI format as described | and their internal peers (both clients and non-clients). As soon as | |||
above. In this case cost field SHOULD be set to 0xFFFFFFFF. | the BGPLS session is ESTABLISHED, all the RIB routes used to resolve | |||
next-hop cost and information about next-hop costs MAY be sent | ||||
immediately by clients to their route-reflector. Implementations are | ||||
advised to announce BGP updates for this SAFI before any other SAFIs | ||||
to facilitate faster convergence of other SAFIs on Route Reflectors. | ||||
4.4. SESSION ESTABLISHMENT | Each internal neighbor of a route-reflector announces its IGP RIB | |||
Prefix information and its RIB metrics to the Route Reflector using a | ||||
BGPLS session and a new NLRI Protocol ID and RIB metric Prefix | ||||
Descriptor. Each neighbor updates the Route Reflector with its IGP | ||||
prefix cost every time the cost to an IGP route changes. | ||||
BGP speakers willing to exchange next-hop information SHOULD NOT | Upon receipt of a BGPLS route and its associated cost, a Route | |||
establish more than one session for given AFI and NH SAFI, even using | Reflector stores the prefix, cost, and neighbor information in its | |||
different transport addresses. This can be ensured for example by | local NHIB. It then uses the received cost towards | |||
checking peer's Router Id. | calculation of bestpath from the respective client's perspective as | |||
opposed to its own IGP cost. | ||||
4.5. INFORMATION EXCHANGE | 4.4. Termination of the session carrying next-hop cost | |||
Typically NH SAFI sessions will be established between route- | When the BGPLS session carrying next-hop cost terminates (for | |||
reflectors and its internal peers (both clients and non-clients). As | whatever reason), the BGP speaker SHOULD invalidate all the next-hop | |||
soon as the NH SAFI session is ESTABLISHED requests for next-hop cost | cost information (i.e., the same treatment applies to the next-hop | |||
and information about next-hop costs MAY be sent | cost as to any other BGP-learned information). | |||
independently. That is, route-reflector MAY send multiple requests | ||||
without waiting for response, and its peers MAY send cost information | ||||
before or after receiving such request. On the other hand, Router | ||||
Reflectors SHOULD request cost information from their internal peers | ||||
as soon as possible (due to reasons stated in section "BGP best path | ||||
selection modification"). BGP speaker does not need to track | ||||
outstanding requests to the peer. | ||||
When a BGP speaker receives request for cost information it MUST | 4.5. Graceful Restart and Route-Refresh | |||
reply with actual cost (not necessarily IGP cost, but whatever has | ||||
been chosen to be carried in NH SAFI) to given next-hop or with cost | ||||
set to all-ones indicating that next-hop is unreachable. If next-hop | ||||
information is obtained from sender's routing table, then sender MUST | ||||
perform lookup exactly the same way as it would for resolving next- | ||||
hop in BGP UPDATE message. For example, for non-labelled | ||||
destinations (e.g. AFI/SAFI 1/1 or 2/1) lookup would be done using | ||||
longest match, whereas for labelled IPv4 (AFI/SAFI 1/4, 1/128 or 2/4) | ||||
exact-match would be used. | ||||
When a BGP speaker detects change in cost to previously advertised | BGPLS sessions carrying next-hop cost could use Graceful Restart | |||
next-hop with delta equal or exceeding configured advertisement | [RFC4724] and Route Refresh [RFC7313] mechanisms in the same way as | |||
threshold, it SHOULD inform peer by sending MP_UNREACH_NLRI as | they are used for IPv4 and IPv6 unicast. | |||
described earlier. | ||||
When a BGP speaker discovers new next-hop among candidate routes it | 5. Security considerations | |||
SHOULD request cost information from the peer. | ||||
4.6. TERMINATION OF NH SAFI SESSION | This document does not introduce new security considerations above | |||
and beyond those already specified in [RFC4271], | ||||
[I-D.ietf-idr-bgp-optimal-route-reflection] and | ||||
[I-D.ietf-idr-ls-distribution]. | ||||
When BGP speaker terminates (for whatever reason) NH SAFI session | 6. IANA Considerations | |||
with a peer, it SHOULD remove all cost information received from that | ||||
peer unless instructed by configuration to do otherwise. | ||||
4.7. GRACEFUL RESTART AND ROUTE REFRESH | This draft defines a new protocol id value for RIB Protocol ID. This | |||
draft requests IANA to allocate a value for a RIB Protocol ID from | ||||
BGPLS Protocol ID Registry. | ||||
NH SAFI sessions could use graceful restart and route refresh | This draft defines a new RIB Metrics Prefix Descriptor value. This | |||
mechanisms in the same way as they are used for IPv4 and IPv6 unicast - | draft requests IANA to allocate a TLV code value for the new | |||
preservation and purge of next-hop cost information follows normal GR | descriptor from the Prefix Descriptor registry. | |||
rules. | ||||
5. Security considerations | 7. Acknowledgements | |||
No new security issues are introduced to the BGP protocol by this | The authors would like to acknowledge David Ward, Anton Elita, | |||
specification. | Nagendra Kumar and Burjiz Pithawala for their critical reviews and | |||
feedback. | ||||
6. IANA Considerations | 8. References | |||
IANA is requested to allocate value for Next-Hop Subsequent Address | 8.1. Normative References | |||
Family Identifier. | ||||
7. Acknowledgment | [I-D.ietf-idr-bgp-optimal-route-reflection] | |||
Raszuk, R., Cassar, C., Aman, E., Decraene, B., and S. | ||||
Litkowski, "BGP Optimal Route Reflection (BGP-ORR)", | ||||
draft-ietf-idr-bgp-optimal-route-reflection-09 (work in | ||||
progress), April 2015. | ||||
Authors would like to thank Keyur Patel, Anton Elita, Nagendra Kumar | [I-D.ietf-idr-ls-distribution] | |||
for critical reviews and feedback. | Gredler, H., Medved, J., Previdi, S., Farrel, A., and S. | |||
Ray, "North-Bound Distribution of Link-State and TE | ||||
Information using BGP", draft-ietf-idr-ls-distribution-10 | ||||
(work in progress), January 2015. | ||||
8. References | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, March 1997. | ||||
8.1. Normative References | [RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998. | |||
[RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway | [RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway | |||
Protocol 4 (BGP-4)", RFC 4271, January 2006. | Protocol 4 (BGP-4)", RFC 4271, January 2006. | |||
[RFC4760] Bates, T., Chandra, R., Katz, D., and Y. Rekhter, | [RFC4456] Bates, T., Chen, E., and R. Chandra, "BGP Route | |||
"Multiprotocol Extensions for BGP-4", RFC 4760, | Reflection: An Alternative to Full Mesh Internal BGP | |||
(IBGP)", RFC 4456, April 2006. | ||||
[RFC4724] Sangli, S., Chen, E., Fernando, R., Scudder, J., and Y. | ||||
Rekhter, "Graceful Restart Mechanism for BGP", RFC 4724, | ||||
January 2007. | January 2007. | |||
8.2. Informative References | [RFC4760] Bates, T., Chandra, R., Katz, D., and Y. Rekhter, | |||
"Multiprotocol Extensions for BGP-4", RFC 4760, January | ||||
2007. | ||||
[I-D.raszuk-bgp-optimal-route-reflection] | [RFC7313] Patel, K., Chen, E., and B. Venkatachalapathy, "Enhanced | |||
Raszuk, R., Cassar, C., Aman, E., and B. Decraene, "BGP | Route Refresh Capability for BGP-4", RFC 7313, July 2014. | |||
Optimal Route Reflection (BGP-ORR)", | ||||
draft-raszuk-bgp-optimal-route-reflection-01 (work in | 8.2. Informative References | |||
progress), March 2011. | ||||
[RFC2918] Chen, E., "Route Refresh Capability for BGP-4", RFC 2918, | [RFC2918] Chen, E., "Route Refresh Capability for BGP-4", RFC 2918, | |||
September 2000. | September 2000. | |||
Appendix A. USAGE SCENARIOS | Appendix A. USAGE SCENARIOS | |||
A.1. Trivial case | A.1. Trivial case | |||
--+---NetA---+-- | --+---NetA---+-- | |||
| | | | | | |||
skipping to change at page 8, line 6 | skipping to change at page 7, line 31 | |||
R3 | R3 | |||
In this scenario r1 and r3 along with NetA are part of AS1; and R1-R4 | In this scenario r1 and r3 along with NetA are part of AS1; and R1-R4 | |||
along with RR are in AS2. | along with RR are in AS2. | |||
If RR implements non-optimized route-reflection, then it will choose | If RR implements non-optimized route-reflection, then it will choose | |||
path to NetA via R1 and advertise it to both R3 and R4. Such choice | path to NetA via R1 and advertise it to both R3 and R4. Such choice | |||
is good from R3 perspective, but it results in suboptimal traffic | is good from R3 perspective, but it results in suboptimal traffic | |||
flow from R4 to NetA. | flow from R4 to NetA. | |||
Using NH SAFI the route-reflector will learn that cost from R4 to R1 | Using the proposed BGPLS extensions, the route-reflector will learn | |||
is 8 whereas to R2 it's only 1. RR will announce NetA to R4 with | that cost from R4 to R1 is 8 whereas to R2 it's only 1. RR will | |||
next-hop set to R2, while its announcement to R3 will still have R1 as | announce NetA to R4 with next-hop set to R2, while its announcement to R3 | |||
next-hop. Both R3 and R4 now will send traffic to NetA via closest | will still have R1 as next-hop. Both R3 and R4 now will send traffic | |||
exit, achieving same behaviour as if full iBGP mesh would have been | to NetA via closest exit, achieving same behaviour as if full iBGP | |||
configured. | mesh would have been configured. | |||
A.2. Non-IGP based cost | A.2. Non-IGP based cost | |||
When it's desirable to direct traffic over an exit other than the one | When it's desirable to direct traffic over an exit other than the one | |||
with smallest IGP cost, NH SAFI can be used to convey cost which is | with smallest IGP cost, BGPLS extensions can be used to convey cost | |||
not based on IGP. For example, network operator may arrange exit | which is not based on IGP. For example, network operator may arrange | |||
points in order of administrative preference and configure routers to | exit points in order of administrative preference and configure | |||
send this instead of IGP cost. The route reflector will then | routers to send this instead of IGP cost. The route reflector will | |||
calculate best path based on administrative preference rather than | then calculate best path based on administrative preference rather | |||
IGP metrics. | than IGP metrics. | |||
Network operators should exercise care to ensure that all routers up | Network operators should exercise care to ensure that all routers up | |||
to and including the exit point do not divert packets onto a different | to and including the exit point do not divert packets onto a different | |||
path, otherwise routing loops may occur. One way to achieve this is | path, otherwise routing loops may occur. One way to achieve this is | |||
to have consistent administrative preference among all routers. | to have consistent administrative preference among all routers. | |||
Another option is to use a tunneling mechanism (e.g. MPLS-TE tunnel) | Another option is to use a tunneling mechanism (e.g. MPLS-TE tunnel) | |||
between source and the exit point, provided that the router serving | between source and the exit point, provided that the router serving | |||
as exit point will send packets out of the network rather than | as exit point will send packets out of the network rather than | |||
diverting them to another exit point. | diverting them to another exit point. | |||
A.3. Multiple route-reflectors | A.3. Multiple route-reflectors | |||
This example demonstrates that NH SAFI peerings are necessary only | This example demonstrates that BGPLS extensions are necessary only | |||
between routers that already exchange other AFI/SAFI. | between routers that already exchange other AFI/SAFI. | |||
| | | | |||
R1----R3---------R5----R7--+ | R1----R3---------R5----R7--+ | |||
| | | | | | | | |||
RR1 | NetA | RR1 | NetA | |||
| RR2 | | | RR2 | | |||
| | | | | | | | |||
R2----R4---------R6----R8--+ | R2----R4---------R6----R8--+ | |||
| | | | |||
In the above network the routers R1-R4 are clients of RR1, and R5-R8 | In the above network the routers R1-R4 are clients of RR1, and R5-R8 | |||
are clients of RR2. RR1 and RR2 also peer with each other and use | are clients of RR2. RR1 and RR2 also peer with each other and use | |||
ADDPATH. | ADDPATH. | |||
RR2 learns about NetA from R7 and R8. Since it sends not just best- | RR2 learns about NetA from R7 and R8. Since it sends not just best- | |||
path but all prefixes to RR1, there is no need for RR2 to learn cost | path but all prefixes to RR1, there is no need for RR2 to learn cost | |||
information from R1 and R2 towards R7 and R8. On the other hand RR1 | information from R1 and R2 towards R7 and R8. On the other hand RR1 | |||
does exchange NH SAFI information with R1 and R2 so that each of them | does exchange cost information using BGPLS with R1 and R2 so that | |||
can receive routes, which are best from their perspective. | each of them can receive routes, which are best from their | |||
perspective. | ||||
In addition to ADDPATH, a mechanism could be devised that would allow | In addition to ADDPATH, a mechanism could be devised that would allow | |||
RR2 to learn how many alternative routes it needs to send to RR1. | RR2 to learn how many alternative routes it needs to send to RR1. | |||
For example, if NetA would also be connected to R9 (not shown) but | For example, if NetA would also be connected to R9 (not shown) but | |||
all clients of RR1 prefer R7 as exit point and R9 as next-best, then | all clients of RR1 prefer R7 as exit point and R9 as next-best, then | |||
there is no need for RR2 to send NetA routes with next-hop R8 to RR1. | there is no need for RR2 to send NetA routes with next-hop R8 to RR1. | |||
Discussion: authors would like to solicit discussion whether there is | Discussion: authors would like to solicit discussion whether there is | |||
sufficient interest in such mechanism. | sufficient interest in such mechanism. | |||
skipping to change at page 9, line 46 | skipping to change at page 9, line 27 | |||
selection modification" requires RR to have next-hop cost information | selection modification" requires RR to have next-hop cost information | |||
for every next-hop and every peer. | for every next-hop and every peer. | |||
Note that the problem is the same as if RR would not use extensions | Note that the problem is the same as if RR would not use extensions | |||
described in this document and R3 would peer directly with R1 and R2, | described in this document and R3 would peer directly with R1 and R2, | |||
while R4 would peer only with RR. | while R4 would peer only with RR. | |||
Authors' Addresses | Authors' Addresses | |||
Ilya Varlashkin | Ilya Varlashkin | |||
Easynet Global Services | ||||
Email: ilya@nobulus.com | ||||
Email: ilya.varlashkin@easynet.com | ||||
Robert Raszuk | Robert Raszuk | |||
NTT MCL Inc. | Mirantis Inc. | |||
101 S Ellsworth Avenue Suite 350 | 615 National Ave. #100 | |||
San Mateo, CA 94401 | Mt View, CA 94043 | |||
US | USA | |||
Email: robert@raszuk.net | Email: robert@raszuk.net | |||
Keyur Patel | ||||
Cisco Systems | ||||
170 W. Tasman Drive | ||||
San Jose, CA 95134 | ||||
USA | ||||
Email: keyupate@cisco.com | ||||
Manish Bhardwaj | ||||
Cisco Systems | ||||
170 W. Tasman Drive | ||||
San Jose, CA 95134 | ||||
USA | ||||
Email: manbhard@cisco.com | ||||
Serpil Bayraktar | ||||
Cisco Systems | ||||
170 W. Tasman Drive | ||||
San Jose, CA 95134 | ||||
USA | ||||
Email: serpil@cisco.com | ||||
End of changes. 55 change blocks. | ||||
222 lines changed or deleted | 212 lines changed or added | |||
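
Illustrative sketch (not part of either draft revision): the NHIB triplet of Section 2, the Cost Prefix Descriptor of Section 4.1, the per-client selection of Section 3, and the invalidation on session termination of Section 4.4 could be modelled roughly as below. Several points are assumptions rather than text from the draft: the TLV code point is "TBD" in the draft, so COST_TLV_TYPE is a made-up placeholder; the 2-octet type and 2-octet length header follows the usual BGPLS TLV convention; the names NHIB, encode_cost_tlv, decode_cost_tlv and best_next_hop are hypothetical; and the tie-break among equal costs is arbitrary.

```python
import struct
from typing import Dict, List, Optional, Tuple

# Hypothetical placeholder: the draft leaves the Cost TLV code point "TBD"
# (to be assigned by IANA). This value is NOT a real assignment.
COST_TLV_TYPE = 0xFFFF


def encode_cost_tlv(cost: int) -> bytes:
    """Pack a Cost descriptor as type (2 octets), length (2 octets), value (4 octets).

    The 2+2 octet header is assumed from the general BGPLS TLV layout.
    """
    if not 0 <= cost <= 0xFFFFFFFF:
        raise ValueError("cost must fit in 4 bytes")
    return struct.pack("!HHI", COST_TLV_TYPE, 4, cost)


def decode_cost_tlv(data: bytes) -> int:
    """Unpack a Cost descriptor produced by encode_cost_tlv."""
    tlv_type, length, cost = struct.unpack("!HHI", data[:8])
    if tlv_type != COST_TLV_TYPE or length != 4:
        raise ValueError("not a Cost TLV")
    return cost


class NHIB:
    """Next-Hop Information Base: canonical (router, next-hop, cost) triplets."""

    def __init__(self) -> None:
        self._costs: Dict[Tuple[str, str], int] = {}

    def update(self, router: str, next_hop: str, cost: int) -> None:
        # Called when a neighbor announces or changes its cost to a next-hop.
        self._costs[(router, next_hop)] = cost

    def cost(self, router: str, next_hop: str) -> Optional[int]:
        return self._costs.get((router, next_hop))

    def invalidate_router(self, router: str) -> None:
        # Session to `router` terminated: drop everything learned from it.
        self._costs = {k: v for k, v in self._costs.items() if k[0] != router}


def best_next_hop(nhib: NHIB, client: str, candidates: List[str]) -> Optional[str]:
    """Pick, among paths still tied after the earlier decision steps, the
    next-hop with the lowest cost from the client's point of view.

    Equal costs are broken by next-hop name, which is an arbitrary choice
    made only to keep the sketch deterministic.
    """
    known = [(c, nh) for nh in candidates
             if (c := nhib.cost(client, nh)) is not None]
    return min(known)[1] if known else None
```

With the costs from Appendix A.1 (R4 to R1 is 8, R4 to R2 is 1), best_next_hop(nhib, "R4", ["R1", "R2"]) returns "R2". When no cost is known for a client, the function returns None, and a real implementation would presumably fall back to the route reflector's own IGP cost.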