draft-ietf-mpls-ldp-restart-01.txt   draft-ietf-mpls-ldp-restart-02.txt 
Network Working Group Manoj Leelanivas (Juniper Networks) Network Working Group Manoj Leelanivas (Juniper Networks)
Internet Draft Yakov Rekhter(Juniper Networks) Internet Draft Yakov Rekhter(Juniper Networks)
Expiration Date: October 2002 Rahul Aggarwal (Redback Networks) Expiration Date: December 2002 Rahul Aggarwal (Redback Networks)
Graceful Restart Mechanism for LDP Graceful Restart Mechanism for LDP
draft-ietf-mpls-ldp-restart-01.txt draft-ietf-mpls-ldp-restart-02.txt
1. Status of this Memo 1. Status of this Memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026, except that the right to all provisions of Section 10 of RFC2026, except that the right to
produce derivative works is not granted. produce derivative works is not granted.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts. groups may also distribute working documents as Internet-Drafts.
skipping to change at page 1, line 35 skipping to change at page 1, line 35
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
2. Abstract 2. Abstract
This document describes a mechanism that helps to minimize the This document describes a mechanism that helps to minimize the
negative effects on MPLS traffic caused by LSR's control plane negative effects on MPLS traffic caused by Label Switch Router's
restart, and specifically by the restart of its LDP component, on (LSR's) control plane restart, and specifically by the restart of its
LSRs that are capable of preserving the MPLS forwarding component Label Distribution Protocol (LDP) component, on LSRs that are capable
across the restart. of preserving the MPLS forwarding component across the restart.
The mechanism described in this document is applicable to all LSRs,
both those with the ability to preserve forwarding state during LDP
restart and those without (although the latter need to implement only
a subset of the mechanism described in this document).
The mechanism makes minimalistic assumptions on what has to be
preserved across restart - the mechanism assumes that only the actual
MPLS forwarding state has to be preserved; the mechanism does not
require any of the LDP-related state to be preserved across the
restart.
3. Summary for Sub-IP Area 3. Summary for Sub-IP Area
3.1. Summary 3.1. Summary
This document describes a mechanism that helps to minimize the This document describes a mechanism that helps to minimize the
negative effects on MPLS traffic caused by LSR's control plane negative effects on MPLS traffic caused by LSR's control plane
restart, and specifically by the restart of its LDP component, on restart, and specifically by the restart of its LDP component, on
LSRs that are capable of preserving the MPLS forwarding component LSRs that are capable of preserving the MPLS forwarding component
across the restart. across the restart.
skipping to change at page 3, line 14 skipping to change at page 3, line 14
4. Motivation 4. Motivation
In the case where an LSR could preserve its MPLS forwarding state In the case where an LSR could preserve its MPLS forwarding state
across restart of its control plane, and specifically its LDP across restart of its control plane, and specifically its LDP
component [LDP], it may be desirable not to perturb the LSPs going component [LDP], it may be desirable not to perturb the LSPs going
through that LSR (and specifically, the LSPs established by LDP). In through that LSR (and specifically, the LSPs established by LDP). In
this document, we describe a mechanism, termed "LDP Graceful this document, we describe a mechanism, termed "LDP Graceful
Restart", that accomplishes this goal. Restart", that accomplishes this goal.
The mechanism described in this document is applicable to all LSRs,
both those with the ability to preserve forwarding state during LDP
restart and those without (although the latter need to implement only
a subset of the mechanism described in this document).
The mechanism makes minimalistic assumptions on what has to be
preserved across restart - the mechanism assumes that only the actual
MPLS forwarding state has to be preserved. Clearly this is the
minimum amount of state that has to be preserved across the restart
in order not to perturb the LSPs traversing a restarting LSR. The
mechanism does not require any of the LDP-related state to be
preserved across the restart.
5. LDP Extension 5. LDP Extension
An LSR indicates that it is capable of supporting LDP Graceful An LSR indicates that it is capable of supporting LDP Graceful
Restart, as defined in this document, by including the Graceful Restart, as defined in this document, by including the Fault Tolerant
Restart TLV as an Optional Parameter in the LDP Initialization (FT) Session TLV as an Optional Parameter in the LDP Initialization
message. message. The format of the FT Session TLV is defined in [FT-LDP].
The L flag has to be set to 1. The rest of the FT flags are set to 0
by a sender and ignored on receipt.
 0                   1                   2                   3      The value field of the FT Session TLV contains two components that
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1    are used by the mechanisms defined in this document: FT Reconnect
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   Timeout, and Recovery Time.
|1|0|        Type (TBD)         |          Length = 8           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Restart Time (in milliseconds)                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Recovery Time (in milliseconds)                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The value field of the Graceful Restart TLV contains two components: The FT Reconnect Timeout is the time (in milliseconds) that the
Restart Time, and Recovery Time. sender of the TLV would like the receiver of that TLV to wait after
the receiver detects the failure of LDP communication with the
sender. While waiting, the receiver should retain the LDP and MPLS
forwarding state for the (already established) LSPs that traverse a
link between the sender and the receiver. The FT Reconnect Timeout
should be long enough to allow the restart of the control plane of
the sender of the TLV, and specifically its LDP component to bring it
to the state where the sender could exchange LDP messages with its
neighbors.
The Restart Time is the time (in milliseconds) that the sender of the Setting the FT Reconnect Timeout to 0 indicates that the sender of
TLV would like the receiver of that TLV to wait after the receiver the TLV will not preserve its forwarding state across the restart,
detects the failure of LDP communication with the sender. While yet the sender supports the procedures defined in Section 6.3 of this
waiting, the receiver should retain the LDP and MPLS forwarding state document, and therefore could take advantage if its neighbor can
for the (already established) LSPs that traverse a link between the preserve its forwarding state across the restart.
sender and the receiver. The Restart Time should be long enough to
allow the restart of the control plane of the sender of the TLV, and
specifically its LDP component to bring it to the state where the
sender could exchange LDP messages with its neighbors.
For a restarting LSR the Recovery Time carries the time (in For a restarting LSR the Recovery Time carries the time (in
milliseconds) the LSR is willing to retain its MPLS forwarding state milliseconds) the LSR is willing to retain its MPLS forwarding state
that it preserved across the restart. The time is from the moment the that it preserved across the restart. The time is from the moment the
LSR sends the Initialization message that carries the Graceful LSR sends the Initialization message that carries the FT Session TLV
Restart TLV after restart. Setting this time to 0 indicates that the after restart. Setting this time to 0 indicates that the MPLS
MPLS forwarding state wasn't preserved across the restart (or even if forwarding state wasn't preserved across the restart (or even if it
it was preserved, is no longer available). was preserved, is no longer available).
The Recovery Time should be long enough to allow the neighboring The Recovery Time should be long enough to allow the neighboring
LSRs to re-sync all the LSPs in a graceful manner, without creating LSRs to re-sync all the LSPs in a graceful manner, without creating
congestion in the LDP control plane. congestion in the LDP control plane.
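
As an illustration of how a receiver might interpret the two values described above (in the -02 wording), the following is a minimal Python sketch. The class and method names are hypothetical and are not taken from either draft; the wire encoding of the FT Session TLV is defined in [FT-LDP] and is deliberately not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FtSessionValues:
    """Hypothetical holder for the two TLV values used by LDP Graceful Restart."""

    ft_reconnect_timeout_ms: int  # how long the sender asks the receiver to wait
    recovery_time_ms: int         # how long a restarted sender keeps preserved state

    def sender_preserves_state_across_restart(self) -> bool:
        # An FT Reconnect Timeout of 0 means the sender will not preserve
        # its forwarding state across a restart.
        return self.ft_reconnect_timeout_ms != 0

    def state_survived_restart(self) -> bool:
        # A Recovery Time of 0 means the forwarding state was not preserved
        # (or is no longer available).
        return self.recovery_time_ms != 0
```

A neighbor that advertises an FT Reconnect Timeout of 0 can still follow the procedures for the non-restarting side; the sketch only captures what the two zero values signal.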
6. Operations 6. Operations
For the sake of brevity in the context of this document by "the For the sake of brevity in the context of this document by "the
control plane" we mean "the LDP component of the control plane". control plane" we mean "the LDP component of the control plane".
An LSR that supports functionality described in this document should An LSR that supports functionality described in this document should
advertise this to its LDP neighbors by carrying the Graceful Restart advertise this to its LDP neighbors by carrying the FT Session TLV in
TLV in the LDP Initialization message. the LDP Initialization message.
The procedures described in this document apply to downstream The procedures described in this document apply to downstream
unsolicited label distribution. Extending these procedures to unsolicited label distribution. Extending these procedures to
downstream on demand label distribution is for further study. downstream on demand label distribution is for further study.
This document assumes that in addition to the MPLS forwarding state, This document assumes that in addition to the MPLS forwarding state,
an LSR can also preserve its IP forwarding state across the restart. an LSR can also preserve its IP forwarding state across the restart.
Procedures for preserving IP forwarding state across the restart are Procedures for preserving IP forwarding state across the restart are
defined in [OSPF-RESTART], [ISIS-RESTART], and [BGP-RESTART]. defined in [OSPF-RESTART], [ISIS-RESTART], and [BGP-RESTART].
6.1. Procedures for the restarting LSR 6.1. Procedures for the restarting LSR
For the sake of brevity in the context of this document by "MPLS For the sake of brevity in the context of this document by "MPLS
forwarding state" we mean either <incoming label -> (outgoing label, forwarding state" we mean either <incoming label -> (outgoing label,
next hop)> (non-ingress case), or <FEC->(outgoing label, next hop)> next hop)> (non-ingress case), or <FEC->(outgoing label, next hop)>
(ingress case) mapping. (ingress case) mapping.
After an LSR restarts its control plane, the LSR should check whether After an LSR restarts its control plane, the LSR should check whether
it was able to preserve its MPLS forwarding state from prior to the it was able to preserve its MPLS forwarding state from prior to the
restart. If no, then the LSR must set the Recovery Time to 0 in the restart. If no, then the LSR must set the Recovery Time to 0 in the
Graceful Restart TLV the LSR sends to its neighbors. FT Session TLV the LSR sends to its neighbors.
If the forwarding state has been preserved, then the LSR starts its If the forwarding state has been preserved, then the LSR starts its
internal timer, called MPLS Forwarding State Holding timer (the value internal timer, called MPLS Forwarding State Holding timer (the value
of that timer should be configurable), and marks all the MPLS of that timer should be configurable), and marks all the MPLS
forwarding state entries as "stale". At the expiration of the timer, forwarding state entries as "stale". At the expiration of the timer,
all the entries still marked as stale should be deleted. The value of all the entries still marked as stale should be deleted. The value of
the Recovery Time advertised in the Graceful Restart TLV should be the Recovery Time advertised in the FT Session TLV should be set to
set to the (current) value of the timer at the point when the the (current) value of the timer at the point when the Initialization
Initialization message carrying the Graceful Restart TLV is sent. message carrying the FT Session TLV is sent.
We say that an LSR is in the process of restarting when the MPLS We say that an LSR is in the process of restarting when the MPLS
Forwarding State Holding timer is not expired. Once the timer Forwarding State Holding timer is not expired. Once the timer
expires, we say that the LSR completed its restart. expires, we say that the LSR completed its restart.
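
The timer handling described above can be sketched roughly as follows. This is an illustrative Python sketch, not part of either draft: the class name, the `mark_fresh` hook, and the injectable millisecond clock are all assumptions, and the conditions under which an entry stops being stale are defined elsewhere in the document and are not reproduced here.

```python
import time


def _monotonic_ms() -> int:
    """Default clock: monotonic time in whole milliseconds."""
    return int(time.monotonic() * 1000)


class RestartingLsr:
    """Hypothetical model of the MPLS Forwarding State Holding timer."""

    def __init__(self, preserved_keys, holding_time_ms, now_ms=_monotonic_ms):
        self._now_ms = now_ms
        # Every preserved forwarding entry starts out marked as stale (True).
        self.entries = {key: True for key in preserved_keys}
        # The timer is started only if some forwarding state was preserved.
        self._deadline_ms = now_ms() + holding_time_ms if self.entries else None

    def is_restarting(self) -> bool:
        # "In the process of restarting" while the timer has not expired.
        return self._deadline_ms is not None and self._now_ms() < self._deadline_ms

    def recovery_time_ms(self) -> int:
        # Value to advertise: 0 if nothing was preserved, otherwise the
        # current (remaining) value of the holding timer.
        if self._deadline_ms is None:
            return 0
        return max(0, self._deadline_ms - self._now_ms())

    def mark_fresh(self, key) -> None:
        # Placeholder hook: clear the stale mark on one entry.
        if key in self.entries:
            self.entries[key] = False

    def on_holding_timer_expiry(self) -> None:
        # Delete every entry that is still marked as stale.
        self.entries = {k: stale for k, stale in self.entries.items() if not stale}
```

Passing the clock in as a parameter keeps the sketch testable without real waiting; a real implementation would use whatever timer facility the platform provides.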
The following procedures apply when an LSR is in the process of The following procedures apply when an LSR is in the process of
restarting. restarting.
6.1.1. Non-egress LSR 6.1.1. Non-egress LSR
skipping to change at page 7, line 18 skipping to change at page 7, line 28
described in Section 6.1. described in Section 6.1.
The procedures described in this section assume that the restarting The procedures described in this section assume that the restarting
LSR has (at least) as many unallocated as allocated labels. The LSR has (at least) as many unallocated as allocated labels. The
latter form the MPLS forwarding state that the LSR managed to latter form the MPLS forwarding state that the LSR managed to
preserve across the restart. preserve across the restart.
After an LSR restarts its control plane, the LSR should check whether After an LSR restarts its control plane, the LSR should check whether
it was able to preserve its MPLS forwarding state from prior to the it was able to preserve its MPLS forwarding state from prior to the
restart. If no, then the LSR must set the Recovery Time to 0 in the restart. If no, then the LSR must set the Recovery Time to 0 in the
Graceful Restart TLV the LSR sends to its neighbors. FT Session TLV the LSR sends to its neighbors.
If the forwarding state has been preserved, then the LSR starts its If the forwarding state has been preserved, then the LSR starts its
internal timer, called MPLS Forwarding State Holding timer (the value internal timer, called MPLS Forwarding State Holding timer (the value
of that timer should be configurable), and marks all the MPLS of that timer should be configurable), and marks all the MPLS
forwarding state entries as "stale". At the expiration of the timer, forwarding state entries as "stale". At the expiration of the timer,
all the entries still marked as stale should be deleted. The value of all the entries still marked as stale should be deleted. The value of
the Recovery Time advertised in the Graceful Restart TLV should be the Recovery Time advertised in the FT Session TLV should be set to
set to the (current) value of the timer at the point when the the (current) value of the timer at the point when the Initialization
Initialization message carrying the Graceful Restart TLV is sent. message carrying the FT Session TLV is sent.
We say that an LSR is in the process of restarting when the MPLS We say that an LSR is in the process of restarting when the MPLS
Forwarding State Holding timer is not expired. Once the timer Forwarding State Holding timer is not expired. Once the timer
expires, we say that the LSR completed its restart. expires, we say that the LSR completed its restart.
While an LSR is in the process of restarting, the LSR creates local While an LSR is in the process of restarting, the LSR creates local
label binding by following the normal LDP procedures. label binding by following the normal LDP procedures.
Note that while an LSR is in the process of restarting, the LSR may Note that while an LSR is in the process of restarting, the LSR may
have not one, but two local label bindings for a given FEC - one that have not one, but two local label bindings for a given FEC - one that
was retained from prior to restart, and another that was created was retained from prior to restart, and another that was created
after the restart. Once the LSR completes its restart, the former after the restart. Once the LSR completes its restart, the former
will be deleted. Both of these bindings though would have the same will be deleted. Both of these bindings though would have the same
outgoing label (and the same next hop). outgoing label (and the same next hop).
6.3. Restart of LDP communication with a neighbor LSR 6.3. Restart of LDP communication with a neighbor LSR
When an LSR detects that its LDP session with a neighbor went down, When an LSR detects that its LDP session with a neighbor went down,
and the LSR knows that the neighbor is capable of preserving its MPLS and the LSR knows that the neighbor is capable of preserving its MPLS
forwarding state across the restart (as was indicated by the Graceful forwarding state across the restart (as was indicated by the FT
Restart TLV in the Initialization message received from the Session TLV in the Initialization message received from the
neighbor), the LSR should retain the label-FEC bindings received via neighbor), the LSR should retain the label-FEC bindings received via
that session (rather than discarding the bindings), but should mark that session (rather than discarding the bindings), but should mark
them as "stale". them as "stale".
After detecting that the LDP session with the neighbor went down, the After detecting that the LDP session with the neighbor went down, the
LSR should try to re-establish LDP communication with the neighbor. LSR should try to re-establish LDP communication with the neighbor
following the usual LDP procedures.
The amount of time the LSR should keep its stale label-FEC bindings The amount of time the LSR should keep its stale label-FEC bindings
is set to the lesser of the Restart Time, as was advertised by the is set to the lesser of the FT Reconnect Timeout, as was advertised
neighbor, and a local timer. After that, if the LSR still doesn't by the neighbor, and a local timer, called the Neighbor Liveness
establish an LDP session with the neighbor, all stale bindings should Timer. If within that time the LSR still doesn't establish an LDP
be deleted. The local timer is started when the LSR detects that its session with the neighbor, all the stale bindings should be deleted.
LDP session with the neighbor went down. The value of the local timer The Neighbor Liveness Timer is started when the LSR detects that its
should be configurable. LDP session with the neighbor went down. The value of the Neighbor
Liveness timer should be configurable.
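
The retention rule above reduces to taking a minimum. The following Python sketch shows one way to express it; the function and parameter names are hypothetical and chosen only for illustration.

```python
def stale_binding_hold_ms(ft_reconnect_timeout_ms: int,
                          neighbor_liveness_ms: int) -> int:
    """How long stale label-FEC bindings are kept after the session goes down.

    This is the lesser of the FT Reconnect Timeout advertised by the
    neighbor and the locally configured Neighbor Liveness Timer.
    """
    return min(ft_reconnect_timeout_ms, neighbor_liveness_ms)


def must_delete_stale(elapsed_ms: int, session_reestablished: bool,
                      ft_reconnect_timeout_ms: int,
                      neighbor_liveness_ms: int) -> bool:
    """True once the hold time has run out without a new LDP session."""
    hold_ms = stale_binding_hold_ms(ft_reconnect_timeout_ms, neighbor_liveness_ms)
    return (not session_reestablished) and elapsed_ms >= hold_ms
```

Under this reading, a neighbor that advertised an FT Reconnect Timeout of 0 yields a hold time of 0, so its bindings are dropped as soon as the session failure is detected.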
If the LSR re-establishes an LDP session with the neighbor within the If the LSR re-establishes an LDP session with the neighbor within the
lesser of the Restart Time and the local timer, and the LSR lesser of the FT Reconnect Timeout and the Neighbor Liveness Timer,
determines that the neighbor was not able to preserve its MPLS and the LSR determines that the neighbor was not able to preserve its
forwarding state, the LSR should immediately delete all the stale MPLS forwarding state, the LSR should immediately delete all the
label-FEC bindings received from that neighbor. If the LSR determines stale label-FEC bindings received from that neighbor. If the LSR
that the neighbor was able to preserve its MPLS forwarding state (as determines that the neighbor was able to preserve its MPLS forwarding
was indicated by the non-zero Recovery Time advertised by the state (as was indicated by the non-zero Recovery Time advertised by
neighbor), the LSR should further keep the stale label-FEC bindings the neighbor), the LSR should further keep the stale label-FEC
received from the neighbor for as long as the lesser of the Recovery bindings received from the neighbor for as long as the lesser of the
Time, advertised by the neighbor, and a local configurable value. Recovery Time, advertised by the neighbor, and a local configurable
value, called Maximum Recovery Time.
The LSR should try to complete the exchange of its label mapping The LSR should try to complete the exchange of its label mapping
information with the neighbor within the Recovery Time, as specified information with the neighbor within 1/2 of the Recovery Time, as
in the Graceful Restart TLV received from the neighbor. specified in the FT Session TLV received from the neighbor.
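
The decisions taken once the session is re-established can be summarised in a short Python sketch. It follows the -02 wording (Maximum Recovery Time, and 1/2 of the Recovery Time as the target for completing the label mapping exchange); the `RecoveryPlan` type and `plan_recovery` function are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class RecoveryPlan(NamedTuple):
    delete_stale_now: bool   # neighbor did not preserve its forwarding state
    keep_stale_ms: int       # how much longer stale bindings are kept
    exchange_target_ms: int  # target for finishing the label mapping exchange


def plan_recovery(recovery_time_ms: int, max_recovery_time_ms: int) -> RecoveryPlan:
    """Decide what to do with stale bindings after the session comes back."""
    if recovery_time_ms == 0:
        # A Recovery Time of 0 signals that the state was not preserved.
        return RecoveryPlan(True, 0, 0)
    # Keep stale bindings for the lesser of the advertised Recovery Time
    # and the locally configured Maximum Recovery Time.
    keep_ms = min(recovery_time_ms, max_recovery_time_ms)
    # Aim to finish the exchange within half of the advertised Recovery Time.
    return RecoveryPlan(False, keep_ms, recovery_time_ms // 2)
```

Note that the exchange target is derived from the value received in the TLV, not from the locally capped value; that is one reading of the text and is an assumption of this sketch.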
The LSR should handle the Label Mapping messages received from the The LSR should handle the Label Mapping messages received from the
neighbor by following the normal LDP procedures, except that (a) it neighbor by following the normal LDP procedures, except that (a) it
should treat the stale entries in its Label Information Base (LIB), should treat the stale entries in its Label Information Base (LIB),
as if these entries have been received over the (newly established) as if these entries have been received over the (newly established)
session, (b) if the label-FEC binding carried in the message is the session, (b) if the label-FEC binding carried in the message is the
same as the one that is present in the LIB, but is marked as stale, same as the one that is present in the LIB, but is marked as stale,
the LIB entry should no longer be marked as stale, and (c) if for the the LIB entry should no longer be marked as stale, and (c) if for the
FEC in the label-FEC binding carried in the message there is already FEC in the label-FEC binding carried in the message there is already
a label-FEC binding in the LIB that is marked as stale, and the label a label-FEC binding in the LIB that is marked as stale, and the label
skipping to change at page 9, line 6 skipping to change at page 9, line 22
the FEC in the binding. If the route to the FEC disappears, and then the FEC in the binding. If the route to the FEC disappears, and then
re-appears again later, then this may result in using a different re-appears again later, then this may result in using a different
label value, as when the route re-appears, the LSR would create a new label value, as when the route re-appears, the LSR would create a new
<label, FEC> binding. <label, FEC> binding.
To minimize the potential mis-routing caused by the label change, To minimize the potential mis-routing caused by the label change,
when creating a new <label, FEC> binding the LSR should pick up the when creating a new <label, FEC> binding the LSR should pick up the
least recently used label. Once an LSR releases a label, the LSR least recently used label. Once an LSR releases a label, the LSR
should not re-use this label for advertising a <label, FEC> binding should not re-use this label for advertising a <label, FEC> binding
to a neighbor that supports graceful restart for at least the sum of to a neighbor that supports graceful restart for at least the sum of
Restart Time plus Recovery Time, as advertised by the neighbor to the FT Reconnect Timeout plus Recovery Time, as advertised by the
LSR. neighbor to the LSR.
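
One possible way to combine the two label rules above (pick the least recently used label, and do not re-use a released label too soon) is sketched below in Python. The `LabelPool` class, its methods, and the millisecond clock parameter are hypothetical; the hold-off passed to `allocate` is assumed to be the sum of the FT Reconnect Timeout and the Recovery Time advertised by the neighbor.

```python
from collections import deque


class LabelPool:
    """Hypothetical free-label pool ordered from least to most recently used."""

    def __init__(self, labels, now_ms):
        self._now_ms = now_ms
        # Each item is (label, released_at_ms); None means never used.
        self._free = deque((label, None) for label in labels)

    def release(self, label) -> None:
        # Released labels go to the back, so the front stays the
        # least recently used label.
        self._free.append((label, self._now_ms()))

    def allocate(self, hold_off_ms: int):
        """Return the least recently used label, or None if none is eligible."""
        if not self._free:
            return None
        label, released_at = self._free[0]
        if released_at is not None and self._now_ms() - released_at < hold_off_ms:
            # The oldest free label is still inside the hold-off period,
            # so every label released after it is too.
            return None
        self._free.popleft()
        return label
```

Because labels are queued in release order, checking only the front of the queue is enough, provided the clock never goes backwards.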
7. Security Consideration 7. Security Consideration
This document does not introduce new security issues. The security This document does not introduce new security issues. The security
considerations pertaining to the original LDP protocol remain considerations pertaining to the original LDP protocol remain
relevant. relevant.
8. Intellectual Property Considerations 8. Intellectual Property Considerations
Juniper Networks, Inc. is seeking patent protection on some or all of Juniper Networks, Inc. is seeking patent protection on some or all of
skipping to change at page 9, line 33 skipping to change at page 10, line 7
Redback Networks, Inc. is seeking patent protection on some of the Redback Networks, Inc. is seeking patent protection on some of the
technology described in this Internet-Draft. If technology in this technology described in this Internet-Draft. If technology in this
document is adopted as a standard, Redback Networks agrees to document is adopted as a standard, Redback Networks agrees to
license, on reasonable and non-discriminatory terms, any patent license, on reasonable and non-discriminatory terms, any patent
rights it obtains covering such technology to the extent necessary to rights it obtains covering such technology to the extent necessary to
comply with the standard. comply with the standard.
9. Acknowledgments 9. Acknowledgments
We would like to thank Chaitanya Kodeboyina, Nischal Sheth, and Enke We would like to thank Chaitanya Kodeboyina, Ina Minei, Nischal
Chen for their contributions to this document. Sheth, and Enke Chen for their contributions to this document.
10. References 10. Normative References
[LDP] "Label Distribution Protocol", RFC3036 [LDP] "Label Distribution Protocol", RFC3036
[FT-LDP] "Fault Tolerance for LDP and CR-LDP", work in progress
11. Non-normative References
[OSPF-RESTART] "Hitless OSPF Restart", draft-ietf-ospf-hitless- [OSPF-RESTART] "Hitless OSPF Restart", draft-ietf-ospf-hitless-
restart-01.txt restart-01.txt
[ISIS-RESTART] "Restart signaling for ISIS", draft-shand-isis- [ISIS-RESTART] "Restart signaling for ISIS", draft-ietf-isis-
restart-00.txt restart-01.txt
[BGP-RESTART] "Graceful Restart Mechanism for BGP", draft-ietf-idr- [BGP-RESTART] "Graceful Restart Mechanism for BGP", draft-ietf-idr-
restart-00.txt restart-03.txt
11. Author Information 12. Author Information
Manoj Leelanivas Manoj Leelanivas
Juniper Networks Juniper Networks
1194 N.Mathilda Ave 1194 N.Mathilda Ave
Sunnyvale, CA 94089 Sunnyvale, CA 94089
e-mail: manoj@juniper.net e-mail: manoj@juniper.net
Yakov Rekhter Yakov Rekhter
Juniper Networks Juniper Networks
1194 N.Mathilda Ave 1194 N.Mathilda Ave