draft-ietf-idr-restart-11.txt | draft-ietf-idr-restart-12.txt | |||
---|---|---|---|---|
Network Working Group Srihari R. Sangli | Network Working Group Srihari R. Sangli | |||
Internet Draft Yakov Rekhter | Internet Draft Yakov Rekhter | |||
Expiration Date: November 2006 Rex Fernando | Expiration Date: December 2006 Rex Fernando | |||
John G. Scudder | John G. Scudder | |||
Enke Chen | Enke Chen | |||
Graceful Restart Mechanism for BGP | Graceful Restart Mechanism for BGP | |||
draft-ietf-idr-restart-11.txt | draft-ietf-idr-restart-12.txt | |||
Status of this Memo | Status of this Memo | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
other groups may also distribute working documents as Internet- | other groups may also distribute working documents as Internet- | |||
Drafts. | Drafts. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
skipping to change at line 39 | skipping to change at line 39 | |||
IPR Disclosure Acknowledgement | IPR Disclosure Acknowledgement | |||
By submitting this Internet-Draft, each author represents that any | By submitting this Internet-Draft, each author represents that any | |||
applicable patent or other IPR claims of which he or she is aware | applicable patent or other IPR claims of which he or she is aware | |||
have been or will be disclosed, and any of which he or she becomes | have been or will be disclosed, and any of which he or she becomes | |||
aware will be disclosed, in accordance with Section 6 of BCP 79. | aware will be disclosed, in accordance with Section 6 of BCP 79. | |||
Abstract | Abstract | |||
This document proposes a mechanism for BGP that would help minimize | This document describes a mechanism for BGP that would help minimize | |||
the negative effects on routing caused by BGP restart. An End-of-RIB | the negative effects on routing caused by BGP restart. An End-of-RIB | |||
marker is specified and can be used to convey routing convergence | marker is specified and can be used to convey routing convergence | |||
information. A new BGP capability, termed "Graceful Restart | information. A new BGP capability, termed "Graceful Restart | |||
Capability", is defined which would allow a BGP speaker to express | Capability", is defined which would allow a BGP speaker to express | |||
its ability to preserve forwarding state during BGP restart. Finally, | its ability to preserve forwarding state during BGP restart. Finally, | |||
procedures are outlined for temporarily retaining routing information | procedures are outlined for temporarily retaining routing information | |||
across a TCP transport reset. | across a TCP transport reset. | |||
The mechanisms described in this document are applicable to all | The mechanisms described in this document are applicable to all | |||
routers, both those with the ability to preserve forwarding state | routers, both those with the ability to preserve forwarding state | |||
skipping to change at line 72 | skipping to change at line 72 | |||
Usually when BGP on a router restarts, all the BGP peers detect that | Usually when BGP on a router restarts, all the BGP peers detect that | |||
the session went down, and then came up. This "down/up" transition | the session went down, and then came up. This "down/up" transition | |||
results in a "routing flap" and causes BGP route re-computation, | results in a "routing flap" and causes BGP route re-computation, | |||
generation of BGP routing updates and flap the forwarding tables. It | generation of BGP routing updates and flap the forwarding tables. It | |||
could spread across multiple routing domains. Such routing flaps may | could spread across multiple routing domains. Such routing flaps may | |||
create transient forwarding blackholes and/or transient forwarding | create transient forwarding blackholes and/or transient forwarding | |||
loops. They also consume resources on the control plane of the | loops. They also consume resources on the control plane of the | |||
routers affected by the flap. As such they are detrimental to the | routers affected by the flap. As such they are detrimental to the | |||
overall network performance. | overall network performance. | |||
This document proposes a mechanism for BGP that would help minimize | This document describes a mechanism for BGP that would help minimize | |||
the negative effects on routing caused by BGP restart. An End-of-RIB | the negative effects on routing caused by BGP restart. An End-of-RIB | |||
marker is specified and can be used to convey routing convergence | marker is specified and can be used to convey routing convergence | |||
information. A new BGP capability, termed "Graceful Restart | information. A new BGP capability, termed "Graceful Restart | |||
Capability", is defined which would allow a BGP speaker to express | Capability", is defined which would allow a BGP speaker to express | |||
its ability to preserve forwarding state during BGP restart. Finally, | its ability to preserve forwarding state during BGP restart. Finally, | |||
procedures are outlined for temporarily retaining routing information | procedures are outlined for temporarily retaining routing information | |||
across a TCP transport reset. | across a TCP transport reset. | |||
3. Marker for End-of-RIB | 3. Marker for End-of-RIB | |||
skipping to change at line 204 | skipping to change at line 204 | |||
the <AFI, SAFI> has indeed been preserved during the previous | the <AFI, SAFI> has indeed been preserved during the previous | |||
BGP restart. When set (value 1), the bit indicates that the | BGP restart. When set (value 1), the bit indicates that the | |||
forwarding state has been preserved. | forwarding state has been preserved. | |||
The remaining bits are reserved, and SHOULD be set to zero by | The remaining bits are reserved, and SHOULD be set to zero by | |||
the sender and ignored by the receiver. | the sender and ignored by the receiver. | |||
When a sender of this capability doesn't include any <AFI, SAFI> in | When a sender of this capability doesn't include any <AFI, SAFI> in | |||
the capability, it means that the sender is not capable of preserving | the capability, it means that the sender is not capable of preserving | |||
its forwarding state during BGP restart, but supports procedures for | its forwarding state during BGP restart, but supports procedures for | |||
the Receiving Speaker (as defined in Section 6.2 of this document). | the Receiving Speaker (as defined in Section 5.2 of this document). | |||
In that case the value of the "Restart Time" field advertised by the | In that case the value of the "Restart Time" field advertised by the | |||
sender is irrelevant. | sender is irrelevant. | |||
A BGP speaker SHOULD NOT include more than one instance of the | A BGP speaker SHOULD NOT include more than one instance of the | |||
Graceful Restart Capability in the capability advertisement [BGP- | Graceful Restart Capability in the capability advertisement [BGP- | |||
CAP]. If more than one instance of the Graceful Restart Capability | CAP]. If more than one instance of the Graceful Restart Capability | |||
is carried in the capability advertisement, the receiver of the | is carried in the capability advertisement, the receiver of the | |||
advertisement SHOULD ignore all but the last instance of the Graceful | advertisement SHOULD ignore all but the last instance of the Graceful | |||
Restart Capability. | Restart Capability. | |||
skipping to change at line 245 | skipping to change at line 245 | |||
The End-of-RIB marker SHOULD be sent by a BGP speaker to its peer | The End-of-RIB marker SHOULD be sent by a BGP speaker to its peer | |||
once it completes the initial routing update (including the case when | once it completes the initial routing update (including the case when | |||
there is no update to send) for an address family after the BGP | there is no update to send) for an address family after the BGP | |||
session is established. | session is established. | |||
It is noted that the normal BGP procedures MUST be followed when the | It is noted that the normal BGP procedures MUST be followed when the | |||
TCP session terminates due to the sending or receiving of a BGP | TCP session terminates due to the sending or receiving of a BGP | |||
NOTIFICATION message. | NOTIFICATION message. | |||
In general the Restart Time SHOULD NOT be greater than the HOLDTIME | A suggested default for the Restart Time is a value less than or | |||
carried in the OPEN. | equal to the HOLDTIME carried in the OPEN. | |||
In the following sections, "Restarting Speaker" refers to a router | In the following sections, "Restarting Speaker" refers to a router | |||
whose BGP has restarted, and "Receiving Speaker" refers to a router | whose BGP has restarted, and "Receiving Speaker" refers to a router | |||
that peers with the restarting speaker. | that peers with the restarting speaker. | |||
Consider that the Graceful Restart Capability for an address family | Consider that the Graceful Restart Capability for an address family | |||
is advertised by the Restarting Speaker, and is understood by the | is advertised by the Restarting Speaker, and is understood by the | |||
Receiving Speaker, and a BGP session between them is established. | Receiving Speaker, and a BGP session between them is established. | |||
The following sections detail the procedures that SHALL be followed | The following sections detail the procedures that SHALL be followed | |||
by the Restarting Speaker as well as the Receiving Speaker once the | by the Restarting Speaker as well as the Receiving Speaker once the | |||
Restarting Speaker restarts. | Restarting Speaker restarts. | |||
5.1. Procedures for the Restarting Speaker | 5.1. Procedures for the Restarting Speaker | |||
When the Restarting Speaker restarts, possible it SHOULD retain, if | When the Restarting Speaker restarts, it SHOULD retain, if possible, | |||
possible, the forwarding state for the BGP routes in the Loc-RIB, and | the forwarding state for the BGP routes in the Loc-RIB, and SHALL | |||
SHALL mark them as stale. It SHOULD NOT differentiate between stale | mark them as stale. It SHOULD NOT differentiate between stale and | |||
and other information during forwarding. | other information during forwarding. | |||
To re-establish the session with its peer, the Restarting Speaker | To re-establish the session with its peer, the Restarting Speaker | |||
MUST set the "Restart State" bit in the Graceful Restart Capability | MUST set the "Restart State" bit in the Graceful Restart Capability | |||
of the OPEN message. Unless allowed via configuration, the | of the OPEN message. Unless allowed via configuration, the | |||
"Forwarding State" bit for an address family in the capability can be | "Forwarding State" bit for an address family in the capability can be | |||
set only if the forwarding state has indeed been preserved for that | set only if the forwarding state has indeed been preserved for that | |||
address family during the restart. | address family during the restart. | |||
Once the session between the Restarting Speaker and the Receiving | Once the session between the Restarting Speaker and the Receiving | |||
Speaker is re-established, the Restarting Speaker will receive and | Speaker is re-established, the Restarting Speaker will receive and | |||
skipping to change at line 348 | skipping to change at line 348 | |||
Graceful Restart Capability of the OPEN message sent by the Receiving | Graceful Restart Capability of the OPEN message sent by the Receiving | |||
Speaker SHALL NOT be set unless the Receiving Speaker has restarted. | Speaker SHALL NOT be set unless the Receiving Speaker has restarted. | |||
The presence and the setting of the "Forwarding State" bit for an | The presence and the setting of the "Forwarding State" bit for an | |||
address family depends upon the actual forwarding state and | address family depends upon the actual forwarding state and | |||
configuration. | configuration. | |||
If the session does not get re-established within the "Restart Time" | If the session does not get re-established within the "Restart Time" | |||
that the peer advertised previously, the Receiving Speaker SHALL | that the peer advertised previously, the Receiving Speaker SHALL | |||
delete all the stale routes from the peer that it is retaining. | delete all the stale routes from the peer that it is retaining. | |||
A BGP speaker could have some way of determining whether its peer's | ||||
forwarding state is still viable, for example through [BFD] or | ||||
through monitoring layer two information. Specifics of such | ||||
mechanisms are beyond the scope of this document. In the event that | ||||
it determines that its peer's forwarding state is not viable prior to | ||||
the re-establishment of the session, the speaker MAY delete all the | ||||
stale routes from the peer that it is retaining. | ||||
Once the session is re-established, if the "Forwarding State" bit for | Once the session is re-established, if the "Forwarding State" bit for | |||
a specific address family is not set in the newly received Graceful | a specific address family is not set in the newly received Graceful | |||
Restart Capability, or if a specific address family is not included | Restart Capability, or if a specific address family is not included | |||
in the newly received Graceful Restart Capability, or if the Graceful | in the newly received Graceful Restart Capability, or if the Graceful | |||
Restart Capability isn't received in the re-established session at | Restart Capability isn't received in the re-established session at | |||
all, then Receiving Speaker SHALL immediately remove all the stale | all, then Receiving Speaker SHALL immediately remove all the stale | |||
routes from the peer that it is retaining for that address family. | routes from the peer that it is retaining for that address family. | |||
The Receiving Speaker SHALL send the End-of-RIB marker once it | The Receiving Speaker SHALL send the End-of-RIB marker once it | |||
completes the initial update for an address family (including the | completes the initial update for an address family (including the | |||
skipping to change at line 492 | skipping to change at line 500 | |||
- drops the TCP connection, | - drops the TCP connection, | |||
- increments the ConnectRetryCounter by 1, | - increments the ConnectRetryCounter by 1, | |||
- changes its state to Idle. | - changes its state to Idle. | |||
7. Deployment Considerations | 7. Deployment Considerations | |||
While the procedures described in this document would help minimize | While the procedures described in this document would help minimize | |||
the effect of routing flaps, it is noted, however, that when a BGP | the effect of routing flaps, it is noted, however, that when a BGP | |||
Graceful Restart capable router restarts, there is a potential for | Graceful Restart capable router restarts, or if it restarts without | |||
transient routing loops or blackholes in the network if routing | preserving its forwarding state (for example due to a power failure) | |||
information changes before the involved routers complete routing | there is a potential for transient routing loops or blackholes in the | |||
updates and convergence. Also, depending on the network topology, if | network if routing information changes before the involved routers | |||
not all IBGP speakers are Graceful Restart capable, there could be an | complete routing updates and convergence. Also, depending on the | |||
increased exposure to transient routing loops or blackholes when the | network topology, if not all IBGP speakers are Graceful Restart | |||
Graceful Restart procedures are exercised. | capable, there could be an increased exposure to transient routing | |||
loops or blackholes when the Graceful Restart procedures are | ||||
exercised. | ||||
The Restart Time, the upper bound for retaining routes and the upper | The Restart Time, the upper bound for retaining routes and the upper | |||
bound for deferring route selection may need to be tuned as more | bound for deferring route selection may need to be tuned as more | |||
deployment experience is gained. | deployment experience is gained. | |||
Finally, it is noted that the benefits of deploying BGP Graceful | Finally, it is noted that the benefits of deploying BGP Graceful | |||
Restart in an AS whose IGPs and BGP are tightly coupled (i.e., BGP | Restart in an AS whose IGPs and BGP are tightly coupled (i.e., BGP | |||
and IGPs would both restart) and IGPs have no similar Graceful | and IGPs would both restart) and IGPs have no similar Graceful | |||
Restart capability are reduced relative to the scenario where IGPs do | Restart capability are reduced relative to the scenario where IGPs do | |||
have similar Graceful Restart capability. | have similar Graceful Restart capability. | |||
skipping to change at line 598 | skipping to change at line 608 | |||
[BGP-AUTH] Heffernan A., "Protection of BGP Sessions via the TCP MD5 | [BGP-AUTH] Heffernan A., "Protection of BGP Sessions via the TCP MD5 | |||
Signature Option", RFC 2385, August 1998. | Signature Option", RFC 2385, August 1998. | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, March 1997. | Requirement Levels", BCP 14, RFC 2119, March 1997. | |||
[IANA-AFI] http://www.iana.org/assignments/address-family-numbers. | [IANA-AFI] http://www.iana.org/assignments/address-family-numbers. | |||
[IANA-SAFI] http://www.iana.org/assignments/safi-namespace. | [IANA-SAFI] http://www.iana.org/assignments/safi-namespace. | |||
14. Author Information | 14. Non-normative References | |||
[BFD] Katz, D., Ward, D., "Bidirectional Forwarding Detection", | ||||
draft-ietf-bfd-base-03.txt, work in progress | ||||
15. Author Information | ||||
Srihari R. Sangli | Srihari R. Sangli | |||
Cisco Systems, Inc. | Cisco Systems, Inc. | |||
EMail: rsrihari@cisco.com | EMail: rsrihari@cisco.com | |||
Yakov Rekhter | Yakov Rekhter | |||
Juniper Networks, Inc. | Juniper Networks, Inc. | |||
EMail: yakov@juniper.net | EMail: yakov@juniper.net | |||
Rex Fernando | Rex Fernando | |||
End of changes. 10 change blocks. | ||||
19 lines changed or deleted | 34 lines changed or added | |||
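
The -12 text on stale-route handling by the Receiving Speaker (Section 5.2 in the comparison above) can be sketched roughly as follows. This is only an illustrative model, not an implementation taken from either draft: the class `StaleRouteTable`, its method names, and the dictionary representation of the received Graceful Restart Capability are all assumptions made for the example. The sketch also chooses to delete routes in the case where the draft only says the speaker MAY delete them.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

# An address family is identified by its <AFI, SAFI> pair.
AfiSafi = Tuple[int, int]


@dataclass
class StaleRouteTable:
    """Hypothetical per-peer store of routes retained while the peer restarts."""

    restart_time: int  # seconds, as previously advertised by the peer
    stale: Dict[AfiSafi, Set[str]] = field(default_factory=dict)

    def on_restart_timer_expired(self) -> None:
        # Session not re-established within the Restart Time:
        # the Receiving Speaker SHALL delete all retained stale routes.
        self.stale.clear()

    def on_forwarding_not_viable(self) -> None:
        # New in -12: if the peer's forwarding state is judged not viable
        # (for example via BFD), the speaker MAY delete the stale routes.
        # Assumption: this sketch always takes that option.
        self.stale.clear()

    def on_session_reestablished(
        self, gr_capability: Optional[Dict[AfiSafi, bool]]
    ) -> None:
        # gr_capability is None when no Graceful Restart Capability was
        # received; otherwise it maps each listed <AFI, SAFI> to the value
        # of its "Forwarding State" bit (an assumed representation).
        for family in list(self.stale):
            preserved = gr_capability is not None and gr_capability.get(family, False)
            if not preserved:
                # Bit not set, family not listed, or capability absent:
                # SHALL immediately remove stale routes for that family.
                del self.stale[family]


table = StaleRouteTable(
    restart_time=120,
    stale={(1, 1): {"10.0.0.0/8"}, (2, 1): {"2001:db8::/32"}},
)
# The peer comes back and reports preserved forwarding state for IPv4 unicast only.
table.on_session_reestablished({(1, 1): True})
print(sorted(table.stale))  # [(1, 1)]
```

Keeping the three triggers as separate methods mirrors the way the draft states them as separate conditions. A real implementation would also need the timer itself and the End-of-RIB handling, neither of which is modelled here.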