draft-ietf-idr-restart-03.txt | draft-ietf-idr-restart-04.txt | |||
---|---|---|---|---|
Network Working Group Srihari R. Sangli (Procket Networks) | Network Working Group Srihari R. Sangli (Procket Networks) | |||
Internet Draft Yakov Rekhter (Juniper Networks) | Internet Draft Yakov Rekhter (Juniper Networks) | |||
Expiration Date: October 2002 Rex Fernando (Procket Networks) | Expiration Date: December 2002 Rex Fernando (Procket Networks) | |||
John G. Scudder (Cisco Systems) | John G. Scudder (Cisco Systems) | |||
Enke Chen (Redback Networks) | Enke Chen (Redback Networks) | |||
Graceful Restart Mechanism for BGP | Graceful Restart Mechanism for BGP | |||
draft-ietf-idr-restart-03.txt | draft-ietf-idr-restart-04.txt | |||
1. Status of this Memo | 1. Status of this Memo | |||
This document is an Internet-Draft and is in full conformance with | This document is an Internet-Draft and is in full conformance with | |||
all provisions of Section 10 of RFC2026. | all provisions of Section 10 of RFC2026. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
other groups may also distribute working documents as Internet- | other groups may also distribute working documents as Internet- | |||
Drafts. | Drafts. | |||
skipping to change at page 2, line 28 | skipping to change at page 2, line 28 | |||
the negative effects on routing caused by BGP restart. An End-of-RIB | the negative effects on routing caused by BGP restart. An End-of-RIB | |||
marker is specified and can be used to convey routing convergence | marker is specified and can be used to convey routing convergence | |||
information. A new BGP capability, termed "Graceful Restart | information. A new BGP capability, termed "Graceful Restart | |||
Capability", is defined which would allow a BGP speaker to express | Capability", is defined which would allow a BGP speaker to express | |||
its ability to preserve forwarding state during BGP restart. Finally, | its ability to preserve forwarding state during BGP restart. Finally, | |||
procedures are outlined for temporarily retaining routing information | procedures are outlined for temporarily retaining routing information | |||
across a TCP transport reset. | across a TCP transport reset. | |||
4. Marker for End-of-RIB | 4. Marker for End-of-RIB | |||
An UPDATE message with empty withdrawn NLRI is specified as the End- | An UPDATE message with no reachable NLRI and empty withdrawn NLRI is | |||
Of-RIB Marker that can be used by a BGP speaker to indicate to its | specified as the End-Of-RIB Marker that can be used by a BGP speaker | |||
peer the completion of the initial routing update after the session | to indicate to its peer the completion of the initial routing update | |||
is established. For IPv4 unicast address family, the End-Of-RIB | after the session is established. For IPv4 unicast address family, | |||
Marker is an UPDATE message with the minimum length [BGP-4]. For any | the End-Of-RIB Marker is an UPDATE message with the minimum length | |||
other address family, it is an UPDATE message that contains only | [BGP-4]. For any other address family, it is an UPDATE message that | |||
MP_UNREACH_NLRI [BGP-MP] with no withdrawn routes for that <AFI, Sub- | contains only the MP_UNREACH_NLRI attribute [BGP-MP] with no | |||
AFI>. | withdrawn routes for that <AFI, SAFI>. | |||
Although the End-of-RIB Marker is specified for the purpose of BGP | Although the End-of-RIB Marker is specified for the purpose of BGP | |||
graceful restart, it is noted that the generation of such a marker | graceful restart, it is noted that the generation of such a marker | |||
upon completion of the initial update would be useful for routing | upon completion of the initial update would be useful for routing | |||
convergence in general, and thus the practice is recommended. | convergence in general, and thus the practice is recommended. | |||
In addition, it would be beneficial for routing convergence if a BGP | In addition, it would be beneficial for routing convergence if a BGP | |||
speaker can indicate to its peer up-front that it will generate the | speaker can indicate to its peer up-front that it will generate the | |||
End-Of-RIB marker, regardless of its ability to preserve its | End-Of-RIB marker, regardless of its ability to preserve its | |||
forwarding state during BGP restart. This can be accomplished using | forwarding state during BGP restart. This can be accomplished using | |||
skipping to change at page 3, line 19 | skipping to change at page 3, line 19 | |||
its forwarding state during BGP restart. It can also be used to | its forwarding state during BGP restart. It can also be used to | |||
convey to its peer its intention of generating the End-Of-RIB marker | convey to its peer its intention of generating the End-Of-RIB marker | |||
upon the completion of its initial routing updates. | upon the completion of its initial routing updates. | |||
This capability is defined as follows: | This capability is defined as follows: | |||
Capability code: 64 | Capability code: 64 | |||
Capability length: variable | Capability length: variable | |||
Capability value: Consists of the "Restart Flags" field, | Capability value: Consists of the "Restart Flags" field, "Restart | |||
"Restart Time" field, and zero or more of the tuples <AFI, | Time" field, and zero or more of the tuples <AFI, SAFI, Flags for | |||
Sub-AFI, Flags for address family> as follows. | address family> as follows: | |||
+--------------------------------------------------+ | +--------------------------------------------------+ | |||
| Restart Flags (4 bits) | | | Restart Flags (4 bits) | | |||
+--------------------------------------------------+ | +--------------------------------------------------+ | |||
| Restart Time in seconds (12 bits) | | | Restart Time in seconds (12 bits) | | |||
+--------------------------------------------------+ | +--------------------------------------------------+ | |||
| Address Family Identifier (16 bits) | | | Address Family Identifier (16 bits) | | |||
+--------------------------------------------------+ | +--------------------------------------------------+ | |||
| Subsequent Address Family Identifier (8 bits) | | | Subsequent Address Family Identifier (8 bits) | | |||
+--------------------------------------------------+ | +--------------------------------------------------+ | |||
skipping to change at page 3, line 49 | skipping to change at page 3, line 49 | |||
+--------------------------------------------------+ | +--------------------------------------------------+ | |||
| Flags for Address Family (8 bits) | | | Flags for Address Family (8 bits) | | |||
+--------------------------------------------------+ | +--------------------------------------------------+ | |||
The use and meaning of the fields are as follows: | The use and meaning of the fields are as follows: | |||
Restart Flags: | Restart Flags: | |||
This field contains bit flags related to restart. | This field contains bit flags related to restart. | |||
The most significant bit is defined as the Restart State bit | 0 1 2 3 | |||
which can be used to avoid possible deadlock caused by waiting | +-+-+-+-+ | |||
for the End-of-RIB marker when multiple BGP speakers peering | |R|Resv.| | |||
with each other restart. When set (value 1), this bit indicates | +-+-+-+-+ | |||
that the BGP speaker has restarted, and its peer should not wait | The most significant bit is defined as the Restart State (R) | |||
for the End-of-RIB marker from the speaker before advertising | bit which can be used to avoid possible deadlock caused by | |||
routing information to the speaker. | waiting for the End-of-RIB marker when multiple BGP speakers | |||
peering with each other restart. When set (value 1), this bit | ||||
indicates that the BGP speaker has restarted, and its peer | ||||
should not wait for the End-of-RIB marker from the speaker | ||||
before advertising routing information to the speaker. | ||||
The remaining bits are reserved. | The remaining bits are reserved, and should be set to zero by | |||
the sender and ignored by the receiver. | ||||
Restart Time: | Restart Time: | |||
This is the estimated time (in seconds) it will take for the BGP | This is the estimated time (in seconds) it will take for the | |||
session to be re-established after a restart. This can be used to | BGP session to be re-established after a restart. This can be | |||
speed up routing convergence by its peer in case that the BGP | used to speed up routing convergence by its peer in case that | |||
speaker does not come back after a restart. | the BGP speaker does not come back after a restart. | |||
Address Family Identifier (AFI): | Address Family Identifier (AFI): | |||
This field carries the identity of the Network Layer protocol | This field carries the identity of the Network Layer protocol | |||
for which the Graceful Restart support is advertised. Presently | for which the Graceful Restart support is advertised. Presently | |||
defined values for this field are specified in RFC1700 (see | defined values for this field are specified in [IANA-AFI]. | |||
the Address Family Numbers section). | ||||
Subsequent Address Family Identifier (Sub-AFI): | Subsequent Address Family Identifier (SAFI): | |||
This field provides additional information about the type of | This field provides additional information about the type of | |||
the Network Layer Reachability Information carried in the | the Network Layer Reachability Information carried in the | |||
attribute. | attribute. Presently defined values for this field are | |||
specified in [IANA-SAFI]. | ||||
Flags for Address Family: | Flags for Address Family: | |||
This field contains bit flags for the <AFI, Sub-AFI>. | This field contains bit flags for the <AFI, SAFI>. | |||
The most significant bit is defined as the Forwarding State | 0 1 2 3 4 5 6 7 | |||
+-+-+-+-+-+-+-+-+ | ||||
|F| Reserved | | ||||
+-+-+-+-+-+-+-+-+ | ||||
The most significant bit is defined as the Forwarding State (F) | ||||
bit which can be used to indicate if the forwarding state for | bit which can be used to indicate if the forwarding state for | |||
the <AFI, Sub-AFI> has indeed been preserved during the previous | the <AFI, SAFI> has indeed been preserved during the previous | |||
BGP restart. When set (value 1), the bit indicates that the | BGP restart. When set (value 1), the bit indicates that the | |||
forwarding state has been preserved. | forwarding state has been preserved. | |||
The remaining bits are reserved. | The remaining bits are reserved, and should be set to zero by | |||
the sender and ignored by the receiver. | ||||
The advertisement of this capability by a BGP speaker also implies | When a sender of this capability doesn't include any <AFI, SAFI> in | |||
that it will generate the End-of-RIB marker (for all address families | the capability, it means that the sender is not capable of preserving | |||
exchanged) upon completion of its initial routing update to its peer. | its forwarding state during BGP restart, but is going to generate the | |||
The value of the "Restart Time" field is irrelevant in the case that | End-of-RIB marker upon the completion of its initial routing updates. | |||
the capability does not carry any <AFI, Sub-AFI>. | The value of the "Restart Time" field is irrelevant in that case. | |||
A BGP speaker should not include more than one instance of the | ||||
Graceful Restart Capability in the capability advertisement [BGP- | ||||
CAP]. If more than one instance of the Graceful Restart Capability | ||||
is carried in the capability advertisement, the receiver of the | ||||
advertisement should ignore all but the last instance of the Graceful | ||||
Restart Capability. | ||||
Including <AFI=IPv4, SAFI=unicast> into the Graceful Restart | ||||
Capability doesn't imply that the IPv4 unicast routing information | ||||
should be carried by using the BGP Multiprotocol extensions [BGP-MP] | ||||
- it could be carried in the NLRI field of the BGP UPDATE message. | ||||
6. Operation | 6. Operation | |||
A BGP speaker may advertise the Graceful Restart Capability for an | A BGP speaker may advertise the Graceful Restart Capability for an | |||
address family to its peer only if it has the ability to preserve its | address family to its peer if it has the ability to preserve its | |||
forwarding state for the address family when BGP restarts. | forwarding state for the address family when BGP restarts. In | |||
addition, even if the speaker does not have the ability to preserve | ||||
Even if the speaker does not have the ability to preserve its | its forwarding state for any address family during BGP restart, it is | |||
forwarding state for any address family during BGP restart, it is | ||||
still recommended that the speaker advertise the Graceful Restart | still recommended that the speaker advertise the Graceful Restart | |||
Capability to its peer to indicate its intention of generating the | Capability to its peer to indicate its intention of generating the | |||
End-of-RIB marker upon the completion of its initial routing updates. | End-of-RIB marker upon the completion of its initial routing updates | |||
(as mentioned before this is done by not including any <AFI, SAFI> in | ||||
the advertised capability), as doing this would be useful for routing | ||||
convergence in general. | ||||
The End-of-RIB marker should be sent by a BGP speaker to its peer | The End-of-RIB marker should be sent by a BGP speaker to its peer | |||
once it completes the initial routing update (including the case when | once it completes the initial routing update (including the case when | |||
there is no update to send) for an address family after the BGP | there is no update to send) for an address family after the BGP | |||
session is established. | session is established. | |||
It is noted that the normal BGP procedures MUST be followed when the | It is noted that the normal BGP procedures must be followed when the | |||
TCP session terminates due to the sending or receiving of a BGP | TCP session terminates due to the sending or receiving of a BGP | |||
NOTIFICATION message. | NOTIFICATION message. | |||
In general the Restart Time SHOULD NOT be greater than the HOLDTIME | In general the Restart Time should not be greater than the HOLDTIME | |||
carried in the OPEN. | carried in the OPEN. | |||
In the following sections, "Restarting Speaker" refers to a router | In the following sections, "Restarting Speaker" refers to a router | |||
whose BGP has restarted, and "Receiving Speaker" refers to a router | whose BGP has restarted, and "Receiving Speaker" refers to a router | |||
that peers with the restarting speaker. | that peers with the restarting speaker. | |||
Consider that the Graceful Restart Capability for an address family | Consider that the Graceful Restart Capability for an address family | |||
is advertised by the Restarting Speaker, and is understood by the | is advertised by the Restarting Speaker, and is understood by the | |||
Receiving Speaker, and a BGP session between them is established. | Receiving Speaker, and a BGP session between them is established. | |||
The following sections detail the procedures that shall be followed | The following sections detail the procedures that shall be followed | |||
skipping to change at page 6, line 31 | skipping to change at page 6, line 51 | |||
of the speaker shall be updated and any previously marked stale | of the speaker shall be updated and any previously marked stale | |||
information shall be removed. The Adj-RIB-Out can then be advertised | information shall be removed. The Adj-RIB-Out can then be advertised | |||
to its peers. Once the initial update is complete for an address | to its peers. Once the initial update is complete for an address | |||
family (including the case that there is no routing update to send), | family (including the case that there is no routing update to send), | |||
the End-of-RIB marker shall be sent. | the End-of-RIB marker shall be sent. | |||
To put an upper bound on the amount of time a router defers its route | To put an upper bound on the amount of time a router defers its route | |||
selection, an implementation must support a (configurable) timer that | selection, an implementation must support a (configurable) timer that | |||
imposes this upper bound. | imposes this upper bound. | |||
If one wants to apply graceful restart only when the restart is | ||||
planned (as opposed to both planned and unplanned restart), then one | ||||
way to accomplish this would be to set the Forwarding State bit to 1 | ||||
after a planned restart, and to 0 in all other cases. Other | ||||
approaches to accomplish this are outside the scope of this document. | ||||
6.2. Procedures for the Receiving Speaker | 6.2. Procedures for the Receiving Speaker | |||
When the Restarting Speaker restarts, the Receiving Speaker may or | When the Restarting Speaker restarts, the Receiving Speaker may or | |||
may not detect the termination of the TCP session with the Restarting | may not detect the termination of the TCP session with the Restarting | |||
Speaker, depending on the underlying TCP implementation, whether or | Speaker, depending on the underlying TCP implementation, whether or | |||
not [BGP-AUTH] is in use, and the specific circumstances of the | not [BGP-AUTH] is in use, and the specific circumstances of the | |||
restart. In case it does not detect the TCP reset and still | restart. In case it does not detect the TCP reset and still | |||
considers the BGP session as being established, it shall treat the | considers the BGP session as being established, it shall treat the | |||
subsequent open connection from the peer as an indication of TCP | subsequent open connection from the peer as an indication of TCP | |||
reset and act accordingly (when the Graceful Restart Capabilty has | reset and act accordingly (when the Graceful Restart Capability has | |||
been received from the peer). | been received from the peer). | |||
"Acting accordingly" in this context means that the previous TCP | ||||
session should be closed, and the new one retained. Note that this | ||||
behavior differs from the default behavior, as specified in [BGP-4] | ||||
section 6.8. Since the previous connection is considered to be | ||||
reset, no NOTIFICATION message should be sent -- the previous TCP | ||||
session is simply closed. | ||||
When the Receiving Speaker detects TCP reset for a BGP session with a | When the Receiving Speaker detects TCP reset for a BGP session with a | |||
peer that has advertised the Graceful Restart Capability, it shall | peer that has advertised the Graceful Restart Capability, it shall | |||
retain the routes received from the peer for all the address families | retain the routes received from the peer for all the address families | |||
that were previously received in the Graceful Restart Capability, and | that were previously received in the Graceful Restart Capability, and | |||
shall mark them as stale routing information. To deal with possible | shall mark them as stale routing information. To deal with possible | |||
consecutive restarts, a route (from the peer) previously marked as | consecutive restarts, a route (from the peer) previously marked as | |||
stale shall be deleted. The router should not differentiate between | stale shall be deleted. The router should not differentiate between | |||
stale and other routing information during forwarding. | stale and other routing information during forwarding. | |||
In re-establishing the session, the "Restart State" bit in the | In re-establishing the session, the "Restart State" bit in the | |||
skipping to change at page 7, line 17 | skipping to change at page 7, line 49 | |||
Speaker shall not be set unless the Receiving Speaker has restarted. | Speaker shall not be set unless the Receiving Speaker has restarted. | |||
The presence and the setting of the "Forwarding State" bit for an | The presence and the setting of the "Forwarding State" bit for an | |||
address family depends upon the actual forwarding state and | address family depends upon the actual forwarding state and | |||
configuration. | configuration. | |||
If the session does not get re-established within the "Restart Time" | If the session does not get re-established within the "Restart Time" | |||
that the peer advertised previously, the Receiving Speaker shall | that the peer advertised previously, the Receiving Speaker shall | |||
delete all the stale routes from the peer that it is retaining. | delete all the stale routes from the peer that it is retaining. | |||
Once the session is re-established, if the "Forwarding State" bit for | Once the session is re-established, if the "Forwarding State" bit for | |||
an address family is not set in the received Graceful Restart | a specific address family is not set in the newly received Graceful | |||
Capability, or if the capability is not received for an address | Restart Capability, or if a specific address family is not included | |||
family, the Receiving Speaker shall immediately remove all the stale | in the newly received Graceful Restart Capability, or if the Graceful | |||
Restart Capability isn't received in the re-established session at | ||||
all, then Receiving Speaker shall immediately remove all the stale | ||||
routes from the peer that it is retaining for that address family. | routes from the peer that it is retaining for that address family. | |||
The Receiving Speaker shall send the End-of-RIB marker once it | The Receiving Speaker shall send the End-of-RIB marker once it | |||
completes the initial update for an address family (including the | completes the initial update for an address family (including the | |||
case that it has no routes to send) to the peer. | case that it has no routes to send) to the peer. | |||
The Receiving Speaker shall replace the stale routes by the routing | The Receiving Speaker shall replace the stale routes by the routing | |||
updates received from the peer. Once the End-of-RIB marker for an | updates received from the peer. Once the End-of-RIB marker for an | |||
address family is received from the peer, it shall immediately remove | address family is received from the peer, it shall immediately remove | |||
any routes from the peer that are still marked as stale for that | any routes from the peer that are still marked as stale for that | |||
address family. | address family. | |||
To put an upper bound on the amount of time a router retains the | To put an upper bound on the amount of time a router retains the | |||
stale routes, an implementation may support a (configurable) timer | stale routes, an implementation may support a (configurable) timer | |||
that imposes this upper bound. | that imposes this upper bound. | |||
7. Deployment Considerations | 7. Deployment Considerations | |||
While the procedures described in this document would help minimize | While the procedures described in this document would help minimize | |||
the effect of routing flaps, it is noted, however, that when a BGP | the effect of routing flaps, it is noted, however, that when a BGP | |||
Graceful-Restart capable router restarts, there is a potential for | Graceful Restart capable router restarts, there is a potential for | |||
transient routing loops or blackholes in the network if routing | transient routing loops or blackholes in the network if routing | |||
information changes before the involved routers complete routing | information changes before the involved routers complete routing | |||
updates and convergence. Also, depending on the network topology, if | updates and convergence. Also, depending on the network topology, if | |||
not all IBGP speakers are Graceful-Restart capable, there could be an | not all IBGP speakers are Graceful Restart capable, there could be an | |||
increased exposure to transient routing loops or blackholes when the | increased exposure to transient routing loops or blackholes when the | |||
Graceful-Restart procedures are exercised. | Graceful Restart procedures are exercised. | |||
The Restart Time, the upper bound for retaining routes and the upper | The Restart Time, the upper bound for retaining routes and the upper | |||
bound for deferring route selection may need to be tuned as more | bound for deferring route selection may need to be tuned as more | |||
deployment experience is gained. | deployment experience is gained. | |||
Finally, it is noted that there is little benefit deploying BGP | Finally, it is noted that the benefits of deploying BGP Graceful | |||
Graceful-Restart in an AS whose IGPs and BGP are tightly coupled | Restart in an AS whose IGPs and BGP are tightly coupled (i.e., BGP | |||
(i.e., BGP and IGPs would both restart), and IGPs have no similar | and IGPs would both restart) and IGPs have no similar Graceful | |||
Graceful-Restart capability. | Restart capability are reduced relative to the scenario where IGPs do | |||
have similar Graceful Restart capability. | ||||
8. Security Considerations | 8. Security Considerations | |||
Since with this proposal a new connection can cause an old one to be | Since with this proposal a new connection can cause an old one to be | |||
terminated, it might seem to open the door to denial of service | terminated, it might seem to open the door to denial of service | |||
attacks. However, it is noted that unauthenticated BGP is already | attacks. However, it is noted that unauthenticated BGP is already | |||
known to be vulnerable to denials of service through attacks on the | known to be vulnerable to denials of service through attacks on the | |||
TCP transport. The TCP transport is commonly protected through use | TCP transport. The TCP transport is commonly protected through use | |||
of [BGP-AUTH]. Such authentication will equally protect against | of [BGP-AUTH]. Such authentication will equally protect against | |||
denials of service through spurious new connections. | denials of service through spurious new connections. | |||
It is thus concluded that this proposal does not change the | It is thus concluded that this proposal does not change the | |||
underlying security model (and issues) of BGP-4. | underlying security model (and issues) of BGP-4. | |||
9. Acknowledgments | 9. Acknowledgments | |||
The authors would like to thank Alvaro Retana, Satinder Singh, David | The authors would like to thank Bruce Cole, Bill Fenner, Eric Gray, | |||
Ward, Naiming Shen and Bruce Cole for their review and comments. | Jeffrey Haas, Alvaro Retana, Naiming Shen, Satinder Singh, David | |||
Ward, Shane Wright and Alex Zinin for their review and comments. | ||||
10. References | 10. References | |||
[BGP-4] Rekhter, Y., and T. Li, "A Border Gateway Protocol 4 (BGP- | [BGP-4] Rekhter, Y., and T. Li, "A Border Gateway Protocol 4 (BGP- | |||
4)", RFC 1771, March 1995. | 4)", RFC 1771, March 1995. | |||
[BGP-MP] Bates, T., Chandra, R., Katz, D., and Rekhter, Y., | [BGP-MP] Bates, T., Chandra, R., Katz, D., and Rekhter, Y., | |||
"Multiprotocol Extensions for BGP-4", RFC 2283, March 1998. | "Multiprotocol Extensions for BGP-4", RFC2858, June 2000. | |||
[BGP-CAP] Chandra, R., Scudder, J., "Capabilities Advertisement with | [BGP-CAP] Chandra, R., Scudder, J., "Capabilities Advertisement with | |||
BGP-4", RFC 2842, May 2000. | BGP-4", draft-ietf-idr-rfc2842bis-02.txt, April 2002. | |||
[BGP-AUTH] Heffernan A., "Protection of BGP Sessions via the TCP MD5 | [BGP-AUTH] Heffernan A., "Protection of BGP Sessions via the TCP MD5 | |||
Signature Option", RFC 2385, August 1998. | Signature Option", RFC 2385, August 1998. | |||
[IANA-AFI] http://www.iana.org/assignments/address-family-numbers. | ||||
[IANA-SAFI] http://www.iana.org/assignments/safi-namespace. | ||||
11. Author Information | 11. Author Information | |||
Srihari R. Sangli | Srihari R. Sangli | |||
Procket Networks, Inc. | Procket Networks, Inc. | |||
1100 Cadillac Court | 1100 Cadillac Court | |||
Milpitas, CA 95035 | Milpitas, CA 95035 | |||
e-mail: srihari@procket.com | e-mail: srihari@procket.com | |||
Yakov Rekhter | Yakov Rekhter | |||
End of changes. | ||||
This html diff was produced by rfcdiff 1.23, available from http://www.levkowetz.com/ietf/tools/rfcdiff/ |
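
The capability value layout described in the -04 column (4 bits of Restart Flags, 12 bits of Restart Time, then zero or more <AFI, SAFI, Flags for address family> tuples) can be illustrated with a short sketch. This is not taken from either draft: the class and function names are hypothetical, the choice of Python is arbitrary, and the example assumes the IANA values AFI=1 (IPv4) and SAFI=1 (unicast). Reserved bits are written as zero and ignored on receipt, as the -04 text asks.

```python
import struct
from dataclasses import dataclass, field
from typing import List, Tuple

GRACEFUL_RESTART_CAP_CODE = 64   # capability code given in the draft
RESTART_STATE_BIT = 0x8          # R bit: most significant of the 4 Restart Flags bits
FORWARDING_STATE_BIT = 0x80      # F bit: most significant of the 8 per-family flag bits


@dataclass
class GracefulRestartCapability:
    """Hypothetical in-memory form of the capability value."""
    restarted: bool                                  # Restart State (R) bit
    restart_time: int                                # seconds, must fit in 12 bits
    # Each entry is (AFI, SAFI, forwarding_state_preserved).
    families: List[Tuple[int, int, bool]] = field(default_factory=list)


def encode_value(cap: GracefulRestartCapability) -> bytes:
    """Build the capability value; reserved bits are sent as zero."""
    if not 0 <= cap.restart_time <= 0x0FFF:
        raise ValueError("Restart Time must fit in 12 bits")
    flags = RESTART_STATE_BIT if cap.restarted else 0
    out = struct.pack("!H", (flags << 12) | cap.restart_time)
    for afi, safi, preserved in cap.families:
        out += struct.pack("!HBB", afi, safi,
                           FORWARDING_STATE_BIT if preserved else 0)
    return out


def decode_value(data: bytes) -> GracefulRestartCapability:
    """Parse a capability value; reserved bits are ignored."""
    if len(data) < 2 or (len(data) - 2) % 4 != 0:
        raise ValueError("malformed Graceful Restart Capability value")
    (head,) = struct.unpack_from("!H", data, 0)
    restarted = bool((head >> 12) & RESTART_STATE_BIT)
    restart_time = head & 0x0FFF
    families = []
    for offset in range(2, len(data), 4):
        afi, safi, fam_flags = struct.unpack_from("!HBB", data, offset)
        families.append((afi, safi, bool(fam_flags & FORWARDING_STATE_BIT)))
    return GracefulRestartCapability(restarted, restart_time, families)


# A speaker that has restarted, estimates 120 seconds to re-establish the
# session, and preserved forwarding state for IPv4 unicast.
example = GracefulRestartCapability(True, 120, [(1, 1, True)])
wire = encode_value(example)
```

A value that decodes to an empty `families` list corresponds to the case the -04 text describes, in which the sender cannot preserve forwarding state but will still generate the End-of-RIB marker, and the Restart Time is irrelevant.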