draft-ietf-idr-route-damp-03.txt   rfc2439.txt 
Internet Engineering Task Force Curtis Villamizar Network Working Group C. Villamizar
INTERNET-DRAFT ANS Request for Comments: 2439 ANS
draft-ietf-idr-route-damp-03 Ravi Chandra Category: Standards Track R. Chandra
Cisco Cisco
Ramesh Govindan R. Govindan
ISI ISI
May 15, 1998 November 1998
BGP Route Flap Damping BGP Route Flap Damping
Status of this Memo Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working This document specifies an Internet standards track protocol for the
documents of the Internet Engineering Task Force (IETF), its areas, Internet community, and requests discussion and suggestions for
and its working groups. Note that other groups may also distribute improvements. Please refer to the current edition of the "Internet
working documents as Internet-Drafts. Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.
Internet-Drafts are draft documents valid for a maximum of six months Copyright Notice
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''
To view the entire list of current Internet-Drafts, please check Copyright (C) The Internet Society (1998). All Rights Reserved.
the "1id-abstracts.txt" listing contained in the Internet-Drafts
Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net
(Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au
(Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu
(US West Coast).
Abstract Abstract
A usage of the BGP routing protocol is described which is capable of A usage of the BGP routing protocol is described which is capable of
reducing the routing traffic passed on to routing peers and therefore reducing the routing traffic passed on to routing peers and therefore
the load on these peers without adversely affecting route convergence the load on these peers without adversely affecting route convergence
time for relatively stable routes. This technique has been time for relatively stable routes. This technique has been
implemented in commercial products supporting BGP. The technique is implemented in commercial products supporting BGP. The technique is
also applicable to IDRP. also applicable to IDRP.
The overall goals are:
o to provide a mechanism capable of reducing router processing load The overall goals are:
caused by instability
o in doing so prevent sustained routing oscillations o to provide a mechanism capable of reducing router processing load
caused by instability
o to do so without sacrificing route convergence time for generally o in doing so prevent sustained routing oscillations
well behaved routes.
This must be accomplished keeping other goals of BGP in mind: o to do so without sacrificing route convergence time for generally
well behaved routes.
o pack changes into a small number of updates This must be accomplished keeping other goals of BGP in mind:
o preserve consistent routing o pack changes into a small number of updates
o minimal additional space and computational overhead o preserve consistent routing
o minimal additional space and computational overhead
An excessive rate of update to the advertised reachability of a subset An excessive rate of update to the advertised reachability of a
of Internet prefixes has been widespread in the Internet. This subset of Internet prefixes has been widespread in the Internet.
observation was made in the early 1990s by many people involved in This observation was made in the early 1990s by many people involved
Internet operations and remains the case. These excessive updates are in Internet operations and remains the case. These excessive updates
not necessarily periodic so route oscillation would be a misleading are not necessarily periodic so route oscillation would be a
term. The informal term used to describe this effect is ``route misleading term. The informal term used to describe this effect is
flap''. The techniques described here are now widely deployed and are "route flap". The techniques described here are now widely deployed
commonly referred to as ``route flap damping''. and are commonly referred to as "route flap damping".
1 Overview 1 Overview
To maintain scalability of a routed internet, it is necessary to To maintain scalability of a routed internet, it is necessary to
reduce the amount of change in routing state propagated by BGP in reduce the amount of change in routing state propagated by BGP in
order to limit processing requirements. The primary contributors of order to limit processing requirements. The primary contributors of
processing load resulting from BGP updates are the BGP decision processing load resulting from BGP updates are the BGP decision
process and adding and removing forwarding entries. process and adding and removing forwarding entries.
Consider the following example. A widely deployed BGP implementation Consider the following example. A widely deployed BGP implementation
may tend to fail due to high routing update volume. For example, it may tend to fail due to high routing update volume. For example, it
may be unable to maintain its BGP or IGP sessions if sufficiently may be unable to maintain its BGP or IGP sessions if sufficiently
loaded. The failure of one router can further contribute to the load loaded. The failure of one router can further contribute to the load
on other routers. This additional load may cause failures in other on other routers. This additional load may cause failures in other
instances of the same implementation or other implementations with a instances of the same implementation or other implementations with a
similar weakness. In the worst case, a stable oscillation could similar weakness. In the worst case, a stable oscillation could
result. Such worst cases have already been observed in practice. result. Such worst cases have already been observed in practice.
A BGP implementation must be prepared for a large volume of routing A BGP implementation must be prepared for a large volume of routing
traffic. A BGP implementation cannot rely upon the sender to traffic. A BGP implementation cannot rely upon the sender to
sufficiently shield it from route instabilities. The guidelines here sufficiently shield it from route instabilities. The guidelines here
are designed to prevent sustained oscillations, but do not eliminate are designed to prevent sustained oscillations, but do not eliminate
the need for robust and efficient implementations. The mechanisms the need for robust and efficient implementations. The mechanisms
described here allow routing instability to be contained at an AS described here allow routing instability to be contained at an AS
border router bordering the instability. border router bordering the instability.
Even where BGP implementations are highly robust, the performance of Even where BGP implementations are highly robust, the performance of
the routing process is limited. Limiting the propagation of the routing process is limited. Limiting the propagation of
unnecessary change then becomes an issue of maintaining reasonable unnecessary change then becomes an issue of maintaining reasonable
route change convergence time as a routing topology grows. route change convergence time as a routing topology grows.
2 Methods of Limiting Route Advertisement 2 Methods of Limiting Route Advertisement
Two methods of controlling the frequency of route advertisement are Two methods of controlling the frequency of route advertisement are
described here. The first involves fixed timers. The fixed timer described here. The first involves fixed timers. The fixed timer
technique has no space overhead per route but has the disadvantage of technique has no space overhead per route but has the disadvantage of
slowing route convergence for the normal case where a route does not slowing route convergence for the normal case where a route does not
have a history of instability. The second method overcomes this have a history of instability. The second method overcomes this
limitation at the expense of maintaining some additional space limitation at the expense of maintaining some additional space
overhead. The additional overhead includes a small amount of state overhead. The additional overhead includes a small amount of state
per route and a very small processing overhead. per route and a very small processing overhead.
It is possible and desirable to combine both techniques. In practice, It is possible and desirable to combine both techniques. In
fixed timers have been set to very short time intervals and have practice, fixed timers have been set to very short time intervals and
proven useful to pack routes (NLRI) into a smaller number of updates have proven useful to pack routes into a smaller number of updates
when routes arrive in separate updates. when routes arrive in separate updates. The BGP protocol refers to
this as packing Network Layer Reachability Information (NLRI) [5].
Seldom are fixed timers set to the tens of minutes to hours that would Seldom are fixed timers set to the tens of minutes to hours that
be necessary to actually damp route flap. To do so would produce the would be necessary to actually damp route flap. To do so would
undesirable effect of severely limiting routing convergence. produce the undesirable effect of severely limiting routing
convergence.
2.1 Existing Fixed Timer Recommendations 2.1 Existing Fixed Timer Recommendations
BGP-3 does not make specific recommendations in this area [1]. The BGP-3 does not make specific recommendations in this area [1]. The
short section entitled ``Frequency of Route Selection'' simply short section entitled "Frequency of Route Selection" simply
recommends that something be done and makes broad statements regarding recommends that something be done and makes broad statements
certain properties that are desirable or undesirable. regarding certain properties that are desirable or undesirable.
BGP4 retains the ``Frequency of Route Advertisement'' section and adds BGP4 retains the "Frequency of Route Advertisement" section and adds
a ``Frequency of Route Origination'' section. BGP-4 describes a a "Frequency of Route Origination" section. BGP-4 describes a method
method of limiting route advertisement involving a fixed of limiting route advertisement involving a fixed (configurable)
(configurable) MinRouteAdvertisementInterval timer and fixed MinRouteAdvertisementInterval timer and fixed
MinASOriginationInterval timer [5]. The recommended timer value of MinASOriginationInterval timer [5]. The recommended timer value of
MinRouteAdvertisementInterval is 30 seconds and MinRouteAdvertisementInterval is 30 seconds and
MinASOriginationInterval is 15 seconds. MinASOriginationInterval is 15 seconds.
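The fixed timer behavior described above can be sketched as follows. This is a minimal illustration, not the procedure specified by BGP-4: the class AdvertisementTimer and its methods are hypothetical names, and a single timer is kept for all prefixes for simplicity. Only the two interval values are taken from the text.

```python
MIN_ROUTE_ADVERTISEMENT_INTERVAL = 30.0  # seconds, value quoted in the text
MIN_AS_ORIGINATION_INTERVAL = 15.0       # seconds, value quoted in the text


class AdvertisementTimer:
    """Hypothetical helper: hold changed prefixes until the interval elapses."""

    def __init__(self, interval):
        self.interval = interval
        self.last_sent = None   # time of the previous flush, None if never sent
        self.pending = set()    # prefixes changed since the previous flush

    def route_changed(self, prefix):
        # Record the change; repeated changes to one prefix collapse to one entry.
        self.pending.add(prefix)

    def flush(self, now):
        """Return the sorted prefixes to advertise now, or [] if it is too early."""
        if not self.pending:
            return []
        if self.last_sent is not None and now - self.last_sent < self.interval:
            return []
        batch = sorted(self.pending)
        self.pending.clear()
        self.last_sent = now
        return batch
```

Changes that arrive within the interval are held and sent together, which is the packing effect the fixed timers are used for in practice.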
2.2 Desirable Properties of Damping Algorithms 2.2 Desirable Properties of Damping Algorithms
Before describing damping algorithms the objectives need to be clearly Before describing damping algorithms the objectives need to be
defined. Some key properties are examined to clarify the design clearly defined. Some key properties are examined to clarify the
rationale. design rationale.
The overall objective is to reduce the route update load without The overall objective is to reduce the route update load without
limiting convergence time for well behaved routes. To accomplish limiting convergence time for well behaved routes. To accomplish
this, criteria must be defined for well behaved and poorly behaved this, criteria must be defined for well behaved and poorly behaved
routes. An algorithm must be defined which allows poorly behaved routes. An algorithm must be defined which allows poorly behaved
routes to be identified. Ideally, this measure would be a prediction routes to be identified. Ideally, this measure would be a prediction
of the future stability of a route. of the future stability of a route.
Any delay in propagation of well behaved routes should be minimal. Any delay in propagation of well behaved routes should be minimal.
Some delay is tolerable to support better packing of updates. Delay Some delay is tolerable to support better packing of updates. Delay
of poorly behaved routes should, if possible, be proportional to a of poorly behaved routes should, if possible, be proportional to a
measure of the expected future instability of the route. Delay in measure of the expected future instability of the route. Delay in
propagating an unstable route should cause the unstable route to be propagating an unstable route should cause the unstable route to be
suppressed until there is some degree of confidence that the route has suppressed until there is some degree of confidence that the route
stabilized. has stabilized.
If a large number of route changes are received in separate updates If a large number of route changes are received in separate updates
over some very short period of time and these updates have the over some very short period of time and these updates have the
potential to be combined into a single update then these should be potential to be combined into a single update then these should be
packed as efficiently as possible before propagating further. Some packed as efficiently as possible before propagating further. Some
small delay in propagating well behaved routes is tolerable and is small delay in propagating well behaved routes is tolerable and is
necessary to allow better packing of updates. necessary to allow better packing of updates.
Where routes are unstable, use and announcement of the routes should Where routes are unstable, use and announcement of the routes should
be suppressed rather than suppressing their removal. Where one route be suppressed rather than suppressing their removal. Where one route
to a destination is stable, and another route to the same destination to a destination is stable, and another route to the same destination
is somewhat unstable, if possible, the unstable route should be is somewhat unstable, if possible, the unstable route should be
suppressed more aggressively than if there were no alternate path. suppressed more aggressively than if there were no alternate path.
Routing consistency within an AS is very important. Only very minimal Routing consistency within an AS is very important. Only very
delay of internal BGP (IBGP) should be done. Routing consistency minimal delay of internal BGP (IBGP) should be done. Routing
across AS boundaries is also very important. It is highly undesirable consistency across AS boundaries is also very important. It is
to advertise a route that is different from the route that is being highly undesirable to advertise a route that is different from the
used, except for a very minimal time. It is more desirable to route that is being used, except for a very minimal time. It is more
suppress the acceptance of a route (and therefore the use of that desirable to suppress the acceptance of a route (and therefore the
route in the IGP) rather than suppress only the redistribution. use of that route in the IGP) rather than suppress only the
redistribution.
It is clearly not possible to accurately predict the future stability It is clearly not possible to accurately predict the future stability
of a route. The recent history of stability is generally regarded as of a route. The recent history of stability is generally regarded as
a good basis for estimating the likelihood of future stability. The a good basis for estimating the likelihood of future stability. The
criterion that is used to distinguish well behaved from poorly behaved criterion that is used to distinguish well behaved from poorly behaved
routes is therefore based on the recent history of stability of the routes is therefore based on the recent history of stability of the
route. There is no simple quantitative expression of recent stability route. There is no simple quantitative expression of recent
so a figure of merit must be defined. Some desirable characteristics stability so a figure of merit must be defined. Some desirable
of this figure of merit would be that the farther in the past that characteristics of this figure of merit would be that the farther in
instability occurred, the less its effect on the figure of merit and the past that instability occurred, the less its effect on the
that the instability measure would be cumulative rather than figure of merit and that the instability measure would be cumulative
reflecting only the most recent event. rather than reflecting only the most recent event.
The algorithms should behave such that for routes which have a history The algorithms should behave such that for routes which have a
of stability but make a few transitions, those transitions should be history of stability but make a few transitions, those transitions
made quickly. If transitions continue, advertisement of the route should be made quickly. If transitions continue, advertisement of
should be suppressed. There should be some memory of prior instability. the route should be suppressed. There should be some memory of prior
The degree to which prior instability is considered should be instability. The degree to which prior instability is considered
gradually reduced as long as the route remains announced and stable. should be gradually reduced as long as the route remains announced
and stable.
2.3 Design Choices 2.3 Design Choices
After routes have been accepted their readvertisement will be briefly After routes have been accepted their readvertisement will be briefly
suppressed to improve packing of updates. There may be a lengthy suppressed to improve packing of updates. There may be a lengthy
suppression of the acceptance of an external route. How long a route suppression of the acceptance of an external route. How long a route
will be suppressed is based on a figure of merit that is expected to will be suppressed is based on a figure of merit that is expected to
be correlated to the probability of future instability of a route. be correlated to the probability of future instability of a route.
Routes with high figure of merit values will be suppressed. An Routes with high figure of merit values will be suppressed. An
exponential decay algorithm was chosen as the basis for reducing the exponential decay algorithm was chosen as the basis for reducing the
figure of merit over time. These choices should be viewed as figure of merit over time. These choices should be viewed as
suggestions for implementation. suggestions for implementation.
An exponential decay function has the property that previous An exponential decay function has the property that previous
instability can be remembered for a fairly long time. The rate at instability can be remembered for a fairly long time. The rate at
which the instability figure of merit decays slows as time goes on. which the instability figure of merit decays slows as time goes on.
Exponential decay has the following property. Exponential decay has the following property.
f(f(figure-of-merit, t1), t2) = f(figure-of-merit, t1+t2) f(f(figure-of-merit, t1), t2) = f(figure-of-merit, t1+t2)
This property allows the decay for a long period to be computed in a This property allows the decay for a long period to be computed in a
single operation regardless of the current value (figure-of-merit). single operation regardless of the current value (figure-of-merit).
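The composition property can be checked numerically. The sketch below assumes the decay function f is a half-life decay; the function name decay and the parameter half_life are illustrative and do not appear in the text.

```python
import math


def decay(figure_of_merit, elapsed, half_life):
    """Assumed form of f: halve the figure of merit once per half_life."""
    return figure_of_merit * math.pow(0.5, elapsed / half_life)


# Decaying for t1 and then for t2 matches a single decay over t1 + t2.
two_steps = decay(decay(1000.0, 300.0, 900.0), 600.0, 900.0)
one_step = decay(1000.0, 900.0, 900.0)
assert math.isclose(two_steps, one_step)
```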
As a performance optimization, the decay can be applied in fixed time As a performance optimization, the decay can be applied in fixed time
increments. Given a desired decay half life, the decay for a single increments. Given a desired decay half life, the decay for a single
time increment can be computed ahead of time. The decay for multiple time increment can be computed ahead of time. The decay for multiple
time increments is expressed below. time increments is expressed below.
f(figure-of-merit, n*t0) = f(figure-of-merit, t0)**n = K**n f(figure-of-merit, n*t0) = f(figure-of-merit, t0)**n = K**n
The values of K ** n can be precomputed for a reasonable number of The values of K ** n can be precomputed for a reasonable number of
``n'' and stored in an array. The value of ``K'' is always less than "n" and stored in an array. The value of "K" is always less than
one. The array size can be bounded since the value quickly approaches one. The array size can be bounded since the value quickly
zero. This makes the decay easy to compute using an array bound approaches zero. This makes the decay easy to compute using an array
check, an array lookup and a single multiply regardless of how much bound check, an array lookup and a single multiply regardless of
time has elapsed. how much time has elapsed.
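A minimal sketch of the precomputed decay array follows. The function names, the tick parameter (the fixed time increment t0) and the cutoff used to bound the array are assumptions made for illustration. Floating point is used for brevity; an implementation could equally use scaled integers.

```python
def build_decay_array(half_life, tick, cutoff=0.001):
    """Precompute K**n for n = 0, 1, ... until the factor falls below cutoff."""
    k = 0.5 ** (tick / half_life)   # decay over one time increment, always < 1
    table = []
    factor = 1.0
    while factor >= cutoff:
        table.append(factor)
        factor *= k
    return table


def decay_with_table(figure_of_merit, elapsed, tick, table):
    """Decay using one bound check, one array lookup and one multiply."""
    n = int(elapsed // tick)        # whole time increments elapsed
    if n >= len(table):             # beyond the array: treated as fully decayed
        return 0.0
    return figure_of_merit * table[n]
```

With a half life equal to the time increment, K is 0.5, so two increments reduce a figure of merit of 1000 to 250. Any elapsed time beyond the end of the array is treated as decay to zero.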
3 Limiting Route Advertisements using Fixed Timers 3 Limiting Route Advertisements using Fixed Timers
This method of limiting route advertisements involves the use of fixed This method of limiting route advertisements involves the use of
timers applied to the process of sending routes. Its primary purpose fixed timers applied to the process of sending routes. Its primary
is to improve the packing of routes in BGP update messages. The delay purpose is to improve the packing of routes in BGP update messages.
in advertising a stable route should be bounded and minimal. The The delay in advertising a stable route should be bounded and
delay in advertising an unreachable need not be zero, but should also minimal. The delay in advertising an unreachable need not be zero,
be bounded and should probably have a separate bound set less than or but should also be bounded and should probably have a separate bound
equal to the bound for a reachable advertisement. set less than or equal to the bound for a reachable advertisement.
Routes that need to be readvertised can be marked in the RIB or an The BGP protocol defines the use of a Routing Information Base (RIB).
external set of structures maintained, which references the RIB. Routes that need to be readvertised can be marked in the RIB or an
Periodically, a subset of the marked routes can be flushed. This is external set of structures maintained, which references the RIB.
fairly straightforward and accomplishes the objectives. Computation
for too simple an implementation may be order N squared. To avoid N
squared performance, some form of data structure is needed to group
routes with common attributes.
An implementation should pack updates efficiently, provide a minimum Periodically, a subset of the marked routes can be flushed. This is
readvertisement delay, provide a bound on the maximum readvertisement fairly straightforward and accomplishes the objectives. Computation
delay that would be experienced solely as a result of the algorithm for too simple an implementation may be order N squared. To avoid N
used to provide a minimum delay, and must be computationally efficient squared performance, some form of data structure is needed to group
in the presence of a very large number of candidates for routes with common attributes.
readvertisement.
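One way to group routes with common attributes is a hash table keyed on the attribute set, which makes a single pass over the marked routes. The sketch below is illustrative only: pack_updates is a hypothetical name, and path attributes are represented as a plain hashable tuple.

```python
from collections import defaultdict


def pack_updates(marked_routes):
    """Group (prefix, attributes) pairs so each attribute set yields one update.

    attributes must be hashable, for example a tuple of (AS path, next hop).
    Each route is visited once, avoiding pairwise comparison of routes.
    """
    groups = defaultdict(list)
    for prefix, attributes in marked_routes:
        groups[attributes].append(prefix)
    return dict(groups)
```

Each key of the result corresponds to one update message carrying every prefix that shares those attributes.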
4 Stability Sensitive Suppression of Route Advertisement An implementation should pack updates efficiently, provide a minimum
readvertisement delay, provide a bound on the maximum
readvertisement delay that would be experienced solely as a result of
the algorithm used to provide a minimum delay, and must be
computationally efficient in the presence of a very large number of
candidates for readvertisement.
This method of limiting route advertisements uses a measure of route 4 Stability Sensitive Suppression of Route Advertisement
stability applied on a per route basis. This technique is applied
when receiving updates from external peers only (EBGP). Applying this
technique to IBGP learned routes or to advertisement to IBGP or EBGP
peers after making a route selection can result in routing loops.
A figure of merit based on a measure of instability is maintained on a This method of limiting route advertisements uses a measure of route
per route basis. This figure of merit is used in the decision to stability applied on a per route basis. This technique is applied
suppress the use of the route. Routes with high figure of merit are when receiving updates from external peers only (EBGP). Applying this
suppressed. Each time a route is withdrawn, the figure of merit is technique to IBGP learned routes or to advertisement to IBGP or EBGP
incremented. While the route is not changing the figure of merit peers after making a route selection can result in routing loops.
value is decayed exponentially with separate decay rates depending on
whether the route is stable and reachable or has been stable and
unreachable. The decay rate may be slower when the route is unreachable,
or the stability figure of merit could remain fixed (not decay
at all) while the route remains unreachable. Whether to decay unreachable
routes at the same rate, a slower rate, or not at all is an implementation
choice. Decaying at a slower rate is recommended.
A very efficient implementation is suggested in the following A figure of merit based on a measure of instability is maintained on
sections. The implementation only requires computation for the routes a per route basis. This figure of merit is used in the decision to
contained in an update, when an update is received or withdrawn (as suppress the use of the route. Routes with high figure of merit are
opposed to the simplistic approach of periodically decaying each suppressed. Each time a route is withdrawn, the figure of merit is
route). The suggested implementation involves only a small number of incremented. While the route is not changing the figure of merit
simple operations, and can be implemented using scaled integers. value is decayed exponentially with separate decay rates depending on
whether the route is stable and reachable or has been stable and
unreachable. The decay rate may be slower when the route is
unreachable, or the stability figure of merit could remain fixed (not
decay at all) while the route remains unreachable. Whether to decay
unreachable routes at the same rate, a slower rate, or not at all is
an implementation choice. Decaying at a slower rate is recommended.
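The figure of merit bookkeeping described above can be sketched as follows. This is a simplified illustration under stated assumptions: the class name RouteStability, the penalty of 1.0 per withdrawal, the single cutoff of 2.0 and both half lives are hypothetical values chosen for the example. Decay is applied only when a route is touched, not periodically.

```python
class RouteStability:
    """Hypothetical per-route damping state; all numeric defaults are illustrative."""

    def __init__(self, half_life_reachable=900.0, half_life_unreachable=1800.0,
                 penalty=1.0, cutoff=2.0):
        self.half_life_reachable = half_life_reachable
        self.half_life_unreachable = half_life_unreachable  # slower decay
        self.penalty = penalty
        self.cutoff = cutoff
        self.figure_of_merit = 0.0
        self.reachable = True
        self.last_update = 0.0

    def _decay(self, now):
        # Apply the decay for the whole elapsed period in a single step,
        # using the rate that matches the current reachability state.
        half_life = (self.half_life_reachable if self.reachable
                     else self.half_life_unreachable)
        elapsed = now - self.last_update
        self.figure_of_merit *= 0.5 ** (elapsed / half_life)
        self.last_update = now

    def withdraw(self, now):
        self._decay(now)
        self.figure_of_merit += self.penalty  # incremented on each withdrawal
        self.reachable = False

    def announce(self, now):
        self._decay(now)
        self.reachable = True

    def suppressed(self, now):
        self._decay(now)
        return self.figure_of_merit > self.cutoff
```

With these example values, three withdrawals in quick succession raise the figure of merit to 3.0 and the route is suppressed. After one reachable half life of stability it has decayed to 1.5 and the route is usable again.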
The behavior of unstable routes is fairly predictable. Severely A very efficient implementation is suggested in the following
flapping routes will often be advertised and withdrawn at regular time sections. The implementation only requires computation for the
intervals corresponding to the timers of a particular protocol (the routes contained in an update, when an update is received or
IGP or exterior protocol in use where the problem exists). Marginal withdrawn (as opposed to the simplistic approach of periodically
circuits or mild congestion can result in a long term pattern of decaying each route). The suggested implementation involves only a
occasional brief route withdrawal or occasional brief connectivity. small number of simple operations, and can be implemented using
scaled integers.
4.1 Single vs. Multiple Configuration Parameter Sets The behavior of unstable routes is fairly predictable. Severely
flapping routes will often be advertised and withdrawn at regular
time intervals corresponding to the timers of a particular protocol
(the IGP or exterior protocol in use where the problem exists).
Marginal circuits or mild congestion can result in a long term
pattern of occasional brief route withdrawal or occasional brief
connectivity.
The behavior of the algorithm is modified by a number of configurable 4.1 Single vs. Multiple Configuration Parameter Sets
parameters. It is possible to configure separate sets of parameters
designed to handle short term severe route flap and chronic milder
route flap (a pattern of occasional drops over a long time period).
The former would require a fast decay and low threshold (allowing a
small number of consecutive flaps to cause a route to be suppressed,
but allowing it to be reused after a relatively short period of
stability). The latter would require a very slow decay and a higher
threshold and might be appropriate for routes for which there was an
alternate path of similar bandwidth.
It may also be desirable to configure different thresholds for routes The behavior of the algorithm is modified by a number of configurable
with roughly equivalent alternate paths than for routes where the parameters. It is possible to configure separate sets of parameters
alternate paths have a lower bandwidth or tend to be congested. This designed to handle short term severe route flap and chronic milder
can be solved by associating a different set of parameters with route flap (a pattern of occasional drops over a long time period).
different ranges of preference values. Parameter selection could be The former would require a fast decay and low threshold (allowing a
based on BGP LOCAL_PREF. small number of consecutive flaps to cause a route to be suppressed,
but allowing it to be reused after a relatively short period of
stability). The latter would require a very slow decay and a higher
threshold and might be appropriate for routes for which there was an
alternate path of similar bandwidth.
Parameter selection could also be based on whether an alternate route It may also be desirable to configure different thresholds for routes
was known. A route would be considered if, for any applicable with roughly equivalent alternate paths than for routes where the
parameter set, an alternate route with the specified preference value alternate paths have a lower bandwidth or tend to be congested. This
existed and the figure of merit associated with the parameter set did can be solved by associating a different set of parameters with
not indicate a need to suppress the route. A less aggressive different ranges of preference values. Parameter selection could be
suppression would be applied to the case where no alternate route at based on BGP LOCAL_PREF.
all existed. In the simplest case, a more aggressive suppression
would be applied if any alternate route existed. Only the highest
preference (most preferred) value needs to be specified, since the
ranges may overlap.
It might also be desirable to configure a different set of thresholds Parameter selection could also be based on whether an alternate route
for routes which rely on switched services and may disconnect at times was known. A route would be considered if, for any applicable
to reduce connect charges. Such routes might be expected to change parameter set, an alternate route with the specified preference value
state somewhat more often, but should be suppressed if continuous existed and the figure of merit associated with the parameter set did
state changes indicate instability. not indicate a need to suppress the route. A less aggressive
suppression would be applied to the case where no alternate route at
all existed. In the simplest case, a more aggressive suppression
would be applied if any alternate route existed. Only the highest
preference (most preferred) value needs to be specified, since the
ranges may overlap.
While not essential, it might be desirable to be able to configure It might also be desirable to configure a different set of thresholds
multiple sets of configuration parameters per route. It may also be for routes which rely on switched services and may disconnect at
desirable to be able to configure sets of parameters that only times to reduce connect charges. Such routes might be expected to
correspond to a set of routes (identified by AS path, peer router, change state somewhat more often, but should be suppressed if
specific destinations or other means). Experience may dictate how continuous state changes indicate instability.
much flexibility is needed and how best to set the parameters.
Whether to allow different damping parameter sets for different
routes, and whether to allow multiple figures of merit per route is an
implementation choice.
Parameter selection can also be based on prefix length. The rationale While not essential, it might be desirable to be able to configure
is that longer prefixes tend to reach fewer end systems and are less multiple sets of configuration parameters per route. It may also be
important and these less important prefixes can be damped more desirable to be able to configure sets of parameters that only
aggressively. This technique is in fairly widespread use. Small correspond to a set of routes (identified by AS path, peer router,
sites or those with dense address allocation who are multihomed are specific destinations or other means). Experience may dictate how
often reachable by long prefixes which are not easily aggregated. much flexibility is needed and how best to set the parameters.
These sites tend to dispute the choice of prefix length for parameter Whether to allow different damping parameter sets for different
selection. Advocates of the technique point out that it encourages routes, and whether to allow multiple figures of merit per route is
better aggregation. an implementation choice.
4.2 Configuration Parameters Parameter selection can also be based on prefix length. The
rationale is that longer prefixes tend to reach fewer end systems and
are less important and these less important prefixes can be damped
more aggressively. This technique is in fairly widespread use.
Small sites or those with dense address allocation who are multihomed
are often reachable by long prefixes which are not easily aggregated.
These sites tend to dispute the choice of prefix length for parameter
selection. Advocates of the technique point out that it encourages
better aggregation.
At configuration time, a number of parameters may be specified by the 4.2 Configuration Parameters
user. The configuration parameters are expressed in units meaningful
to the user. These differ from the parameters used at run time which
are in units convenient for computation. The run time parameters are
derived from the configuration parameters. Suggested configuration
parameters are listed below.
cutoff threshold (cut) At configuration time, a number of parameters may be specified by the
user. The configuration parameters are expressed in units meaningful
to the user. These differ from the parameters used at run time which
are in units convenient for computation. The run time parameters are
derived from the configuration parameters. Suggested configuration
parameters are listed below.
This value is expressed as a number of route withdrawals. It is cutoff threshold (cut)
the value above which a route advertisement will be suppressed.
reuse threshold (reuse) This value is expressed as a number of route withdrawals. It is
the value above which a route advertisement will be suppressed.
This value is expressed as a number of route withdrawals. It is reuse threshold (reuse)
the value below which a suppressed route will now be used again.
maximum hold down time (T-hold) This value is expressed as a number of route withdrawals. It is
the value below which a suppressed route will now be used again.
This value is the maximum time a route can be suppressed no matter maximum hold down time (T-hold)
how unstable it has been prior to this period of stability.
decay half life while reachable (decay-ok) This value is the maximum time a route can be suppressed no
matter how unstable it has been prior to this period of
stability.
This value is the time duration in minutes or seconds during which decay half life while reachable (decay-ok)
the accumulated stability figure of merit will be reduced by half
if the route is considered reachable (whether suppressed or not).
decay half life while unreachable (decay-ng) This value is the time duration in minutes or seconds during
which the accumulated stability figure of merit will be reduced
by half if the route is considered reachable (whether suppressed
or not).
This value is the time duration in minutes or seconds during which decay half life while unreachable (decay-ng)
the accumulated stability figure of merit will be reduced by half
if the route is considered unreachable. If not specified or set to
zero, no decay will occur while a route remains unreachable.
decay memory limit (Tmax-ok or Tmax-ng) This value is the time duration in minutes or seconds during
which the accumulated stability figure of merit will be reduced
by half if the route is considered unreachable. If not
specified or set to zero, no decay will occur while a route
remains unreachable.
This is the maximum time that any memory of previous instability decay memory limit (Tmax-ok or Tmax-ng)
will be retained given that the route's state remains unchanged,
whether reachable or unreachable. This parameter is generally used
to determine array sizes.
There may be multiple sets of the parameters above as described in This is the maximum time that any memory of previous instability
Section 4.1. The configuration parameters listed below would be will be retained given that the route's state remains unchanged,
applied system wide. These include the time granularity of all whether reachable or unreachable. This parameter is generally
computations, and the parameters used to control reevaluation of used to determine array sizes.
routes that have previously been suppressed.
time granularity (delta-t) There may be multiple sets of the parameters above as described in
Section 4.1. The configuration parameters listed below would be
applied system wide. These include the time granularity of all
computations, and the parameters used to control reevaluation of
routes that have previously been suppressed.
This is the time granularity in seconds used to perform all decay time granularity (delta-t)
computations.
reuse list time granularity (delta-reuse) This is the time granularity in seconds used to perform all
decay computations.
This is the time interval between evaluations of the reuse lists. reuse list time granularity (delta-reuse)
Each reuse list corresponds to an additional time increment.
reuse list memory reuse-list-max This is the time interval between evaluations of the reuse
lists. Each reuse list corresponds to an additional time
increment.
This is the time value corresponding to the last reuse list. This reuse list memory reuse-list-max
may be the maximum value of T-hold for all parameter sets or may be
configured.
number of reuse lists (reuse-list-size) This is the time value corresponding to the last reuse list.
This may be the maximum value of T-hold for all parameter sets
or may be configured.
This is the number of reuse lists. It may be determined from number of reuse lists (reuse-list-size)
reuse-list-max or set explicitly.
A necessary optimization is described in Section 4.8.6 that involves This is the number of reuse lists. It may be determined from
an array referred to as the ``reuse index array''. A reuse index reuse-list-max or set explicitly.
array is needed for each decay rate in use. The reuse index array is
used to estimate which reuse list to place a route when it is
suppressed. Proper placement avoids the need to periodically evaluate
decay to determine if a route can be reused or when storage can be
recovered. Using the reuse index array avoids the need to compute a
logarithm to determine placement. One additional system wide
parameter can be introduced.
reuse index array size (reuse-index-array-size) A recommended optimization is described in Section 4.8.6 that
involves an array referred to as the "reuse index array". A reuse
index array is needed for each decay rate in use. The reuse index
array is used to estimate which reuse list to place a route when it
is suppressed. Proper placement avoids the need to periodically
evaluate decay to determine if a route can be reused or when storage
can be recovered. Using the reuse index array avoids the need to
compute a logarithm to determine placement. One additional system
wide parameter can be introduced.
This is the size of reuse index arrays. This size determines the reuse index array size (reuse-index-array-size)
accuracy with which suppressed routes can be placed within the set
of reuse lists when suppressed for a long time.
4.3 Guidelines for Setting Parameters This is the size of reuse index arrays. This size determines
the accuracy with which suppressed routes can be placed within
the set of reuse lists when suppressed for a long time.
The decay half life should be set to a time considerably longer than 4.3 Guidelines for Setting Parameters
the period of the route flap it is intended to address. For example,
if the decay is set to ten minutes and a route is withdrawn and
readvertised exactly every ten minutes, the route would continue to
flap if the cutoff was set to a value of 2 or above.
The stability figure of merit itself is an accumulated time decayed The decay half life should be set to a time considerably longer than
total. This must be kept in mind in setting the decay time, cutoff the period of the route flap it is intended to address. For example,
values and reuse values. For example, if a route flaps at four times if the decay is set to ten minutes and a route is withdrawn and
the decay rate, it will reach 3 in 4 cycles, 4 in 6 cycles, 5 in 10 readvertised exactly every ten minutes, the route would continue to
cycles, and will converge at about 6.3. At twice the decay time, it flap if the cutoff was set to a value of 2 or above.
will reach 3 in 7 cycles, and converge at a value of less than 3.5.
Figure 1 shows the stability figure of merit for route flap at a The stability figure of merit itself is an accumulated time decayed
constant rate. The time axis is labeled in multiples of the decay total. This must be kept in mind in setting the decay time, cutoff
half life. The plots represent route flap with a period of 1/2, 1/3, values and reuse values. The figure of merit is increased each time
1/4, and 1/8 times the decay half life. A ceiling of 4.5 was set, a route transitions from reachable to unreachable. The figure of
which can be seen to affect three of the plots, effectively limiting merit is decayed at a rate proportional to its current value.
the time it takes to readvertise the route regardless of the prior Increasing the rate of route flap therefore increments the figure of
history. With the cutoff and reuse thresholds suggested by the dotted merit more often and reaches a given threshold in a shorter amount
lines, routes would be suppressed after being declared unreachable 2-3 of time. When the response to a constant rate route flap is plotted
times and be used again after approximately 2 decay half life periods this looks like a sawtooth with an abrupt rising edge and a decaying
of stability. falling edge. Since the absolute decay amount is proportional to the
figure of merit, at a continuous constant flap rate the baseline of
the sawtooth will tend to stop rising and converge if not clipped by
a ceiling value.
From the maximum hold time value (T-hold), a ratio of the reuse value If clipped by a ceiling value, the sawtooth baseline will simply
to a ceiling can be determined. An integer value for the ceiling can reach the ceiling faster at a higher rate of route flap. For
then be chosen such that overflow will not be a problem and all other example, if flapping at four times the decay rate the following
values can be scaled accordingly. If both cutoffs are specified or if progression occurs. When the route becomes unreachable the first
multiple parameter sets are used the highest ceiling will be used. time the value becomes 1. When the next flap occurs, one is added to
the previous value, which has been decreased by the fourth root of 2
(the amount of decay that would occur in 1/4 of the half life time if
decay is exponential). The sequence is 1, 1.84, 2.55, 3.14, 3.64,
4.06, 4.42, 4.71, 4.96, 5.17, ..., converging at about 6.285. If a
route flaps at four times the decay rate, it will reach 3 in 4
cycles, 4 in 6 cycles, 5 in 10 cycles, and will converge at about
6.3. At twice the decay time, it will reach 3 in 7 cycles, and
converge at a value of less than 3.5.
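The progression quoted above can be reproduced with a short sketch. It is illustrative only: the function names are hypothetical, and it assumes purely exponential decay, evenly spaced withdrawals, and an increment of 1 per withdrawal, as the text describes.

```python
def flap_sequence(flaps_per_half_life, count, ceiling=None):
    """Figure of merit just after each of `count` evenly spaced withdrawals.

    flaps_per_half_life is the number of withdrawals per decay half life
    (4 corresponds to "flapping at four times the decay rate").
    """
    # Decay factor applied over one flap period.
    k = 0.5 ** (1.0 / flaps_per_half_life)
    value = 0.0
    values = []
    for _ in range(count):
        # Decay the previous value, then add 1 for the new withdrawal.
        value = value * k + 1.0
        if ceiling is not None:
            value = min(value, ceiling)
        values.append(value)
    return values


def convergence_limit(flaps_per_half_life):
    """Value the unclipped sawtooth peak approaches: 1 / (1 - k)."""
    k = 0.5 ** (1.0 / flaps_per_half_life)
    return 1.0 / (1.0 - k)


if __name__ == "__main__":
    print([round(v, 2) for v in flap_sequence(4, 10)])
    print(round(convergence_limit(4), 3))
```

With four withdrawals per half life the sketch yields the sequence given in the text, and with two per half life the limit stays below 3.5.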
time figure-of-merit as a function of time Figure 1 shows the stability figure of merit for route flap at a
constant rate. The time axis is labeled in multiples of the decay
half life. The plots represent route flap with a period of 1/2, 1/3,
1/4, and 1/8 times the decay half life. A ceiling of 4.5 was set,
which can be seen to affect three of the plots, effectively limiting
the time it takes to readvertise the route regardless of the prior
history. With cutoff and reuse thresholds of 1.5 and 0.75, routes
would be suppressed after being declared unreachable 2-3 times and be
used again after approximately 2 decay half life periods of
stability.
0.00 0.000 . 0.000 . 0.000 . 0.000 . This function can be expressed formally. Reachability of a route can
0.08 0.000 . 0.000 . 0.000 . 0.000 . be represented by a variable "R" with possible values of 0 and 1
0.16 0.000 . 0.000 . 0.000 . 0.973 . representing unreachable and reachable. At a discrete time R can
0.24 0.000 . 0.000 . 0.000 . 0.920 . only have one value. The figure of merit is increased by 1 at each
0.32 0.000 . 0.000 . 0.946 . 1.817 . transition from R=1 to R=0 and clipped to a ceiling value. The decay
0.40 0.000 . 0.953 . 0.895 . 2.698 . in figure of merit can then be expressed over a set of discrete times
0.48 0.000 . 0.901 . 0.847 . 2.552 . as follows.
0.56 0.953 . 0.853 . 1.754 . 3.367 .
0.64 0.901 . 0.807 . 1.659 . 4.172 .
0.72 0.853 . 1.722 . 1.570 . 3.947 .
0.80 0.807 . 1.629 . 2.444 . 4.317 .
0.88 0.763 . 1.542 . 2.312 . 4.469 .
0.96 0.722 . 1.458 . 2.188 . 4.228 .
1.04 1.649 . 2.346 . 3.036 . 4.347 .
1.12 1.560 . 2.219 . 2.872 . 4.112 .
1.20 1.476 . 2.099 . 2.717 . 4.257 .
1.28 1.396 . 1.986 . 3.543 . 4.377 .
1.36 1.321 . 2.858 . 3.352 . 4.141 .
1.44 1.250 . 2.704 . 3.171 . 4.287 .
1.52 2.162 . 2.558 . 3.979 . 4.407 .
1.60 2.045 . 2.420 . 3.765 . 4.170 .
1.68 1.935 . 3.276 . 3.562 . 4.317 .
1.76 1.830 . 3.099 . 4.356 . 4.438 .
1.84 1.732 . 2.932 . 4.121 . 4.199 .
1.92 1.638 . 2.774 . 3.899 . 3.972 .
2.00 1.550 . 2.624 . 3.688 . 3.758 .
2.08 1.466 . 2.483 . 3.489 . 3.555 .
2.16 1.387 . 2.349 . 3.301 . 3.363 .
2.24 1.312 . 2.222 . 3.123 . 3.182 .
2.32 1.242 . 2.102 . 2.955 . 3.010 .
2.40 1.175 . 1.989 . 2.795 . 2.848 .
2.48 1.111 . 1.882 . 2.644 . 2.694 .
2.56 1.051 . 1.780 . 2.502 . 2.549 .
2.64 0.995 . 1.684 . 2.367 . 2.411 .
2.72 0.941 . 1.593 . 2.239 . 2.281 .
2.80 0.890 . 1.507 . 2.118 . 2.158 .
2.88 0.842 . 1.426 . 2.004 . 2.042 .
2.96 0.797 . 1.349 . 1.896 . 1.932 .
3.04 0.754 . 1.276 . 1.794 . 1.828 .
3.12 0.713 . 1.207 . 1.697 . 1.729 .
3.20 0.675 . 1.142 . 1.605 . 1.636 .
3.28 0.638 . 1.081 . 1.519 . 1.547 .
3.36 0.604 . 1.022 . 1.437 . 1.464 .
3.44 0.571 . 0.967 . 1.359 . 1.385 .
Figure 1: Instability figure of merit for flap at a constant rate figure-of-merit(t) = K * figure-of-merit(t - delta-t)
time figure-of-merit as a function of time K = K1 for R=0 K=K2 for R=1
0.00 0.000 . 0.000 . 0.000 . The four plots are presented vertically. Due to space limitations,
0.20 0.000 . 0.000 . 0.000 . only a limited set of points along the time axis are shown. The
0.40 0.000 . 0.000 . 0.000 . value of the figure of merit is given. Along side each value is a
0.60 0.000 . 0.000 . 0.000 . very low resolution strip chart made up of ASCII dots. This is just
0.80 0.000 . 0.000 . 0.000 . intended to give a rough feel for the rise and fall of the values.
1.00 0.999 . 0.999 . 0.999 . The strip charts are not displayed on an overlapping set of axes
1.20 0.971 . 0.971 . 0.929 . because the sawtooth waveforms cross each other quite frequently. At
1.40 0.945 . 0.945 . 0.809 . the very low resolution of these plots, the rise and fall of the
1.60 0.919 . 0.865 . 0.704 . baseline is evident, but the sawtooth nature is only observed in the
1.80 0.894 . 0.753 . 0.613 . printed value.
2.00 1.812 . 1.657 . 1.535 .
2.20 1.762 . 1.612 . 1.428 .
2.40 1.714 . 1.568 . 1.244 .
2.60 1.667 . 1.443 . 1.083 .
2.80 1.622 . 1.256 . 0.942 .
3.00 1.468 . 1.094 . 0.820 .
3.20 2.400 . 2.036 . 1.694 .
3.40 2.335 . 1.981 . 1.475 .
3.60 2.271 . 1.823 . 1.284 .
3.80 2.209 . 1.587 . 1.118 .
4.00 1.999 . 1.381 . 0.973 .
4.20 2.625 . 2.084 . 1.727 .
4.40 2.285 . 1.815 . 1.503 .
4.60 1.990 . 1.580 . 1.309 .
4.80 1.732 . 1.375 . 1.139 .
5.00 1.508 . 1.197 . 0.992 .
5.20 1.313 . 1.042 . 0.864 .
5.40 1.143 . 0.907 . 0.752 .
5.60 0.995 . 0.790 . 0.654 .
5.80 0.866 . 0.688 . 0.570 .
6.00 0.754 . 0.599 . 0.496 .
6.20 0.656 . 0.521 . 0.432 .
6.40 0.571 . 0.454 . 0.376 .
6.60 0.497 . 0.395 . 0.327 .
6.80 0.433 . 0.344 . 0.285 .
7.00 0.377 . 0.299 . 0.248 .
7.20 0.328 . 0.261 . 0.216 .
7.40 0.286 . 0.227 . 0.188 .
7.60 0.249 . 0.197 . 0.164 .
7.80 0.216 . 0.172 . 0.142 .
8.00 0.188 . 0.150 . 0.124 .
Figure 2: Separate decay constants when unreachable From the maximum hold time value (T-hold), a ratio of the reuse value
to a ceiling can be determined. An integer value for the ceiling can
then be chosen such that overflow will not be a problem and all other
values can be scaled accordingly. If both cutoffs are specified or
if multiple parameter sets are used the highest ceiling will be used.
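As a rough illustration of the paragraph above, the ratio of the ceiling to the reuse value follows from exponential decay: a figure of merit at the ceiling must fall to the reuse value in exactly T-hold. The sketch below is not a definitive implementation. The function names, the dictionary layout, and the default integer ceiling of 65535 are assumptions made for the example.

```python
def ceiling_ratio(t_hold, half_life):
    """Ratio ceiling/reuse such that decay from ceiling to reuse takes t_hold.

    Both arguments must use the same time unit.
    """
    return 2.0 ** (t_hold / half_life)


def scale_parameters(cut, reuse, t_hold, half_life, int_ceiling=65535):
    """Scale user-visible thresholds to integers (hypothetical layout).

    int_ceiling is an assumed integer ceiling chosen to avoid overflow.
    """
    ceiling = reuse * ceiling_ratio(t_hold, half_life)
    scale = int_ceiling / ceiling
    return {
        "scale": scale,
        "cut": round(cut * scale),
        "reuse": round(reuse * scale),
        "ceiling": int_ceiling,
    }


if __name__ == "__main__":
    # A maximum hold time of two half lives gives a ceiling of 4 * reuse.
    print(scale_parameters(cut=1.5, reuse=0.75, t_hold=30, half_life=15))
```

With a reuse value of 0.75 and the ceiling of 4.5 used in Figure 1, the ratio is 6, which corresponds to a maximum hold time of about 2.6 half lives.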
Figure 2 shows the effect of configuring separate decay rates to be time figure-of-merit as a function of time (in minutes)
used when the route is reachable or unreachable. The decay rate is
5 times slower when the route is unreachable. In the three cases
shown, the period of the route flap is equal to the decay half life
but the route is reachable 1/8 of the time in one, reachable 1/2 the
time in one, and reachable 7/8 of the time in the other. In the last
case the route is not suppressed until after the third unreachable
(when it is above the top threshold after becoming reachable again).
In both Figure 1 and Figure 2, routes would be suppressed. Routes 0.00 0.000 . 0.000 . 0.000 . 0.000 .
flapping at the decay half life or less would be withdrawn two or 0.08 0.000 . 0.000 . 0.000 . 0.000 .
three times and then remain withdrawn until they had remained stably 0.16 0.000 . 0.000 . 0.000 . 0.973 .
announced and stable for on the order of 1 1/2 to 2 1/2 times the 0.24 0.000 . 0.000 . 0.000 . 0.920 .
decay half life (given the ceiling in the example). 0.32 0.000 . 0.000 . 0.946 . 1.817 .
0.40 0.000 . 0.953 . 0.895 . 2.698 .
0.48 0.000 . 0.901 . 0.847 . 2.552 .
0.56 0.953 . 0.853 . 1.754 . 3.367 .
0.64 0.901 . 0.807 . 1.659 . 4.172 .
0.72 0.853 . 1.722 . 1.570 . 3.947 .
0.80 0.807 . 1.629 . 2.444 . 4.317 .
0.88 0.763 . 1.542 . 2.312 . 4.469 .
0.96 0.722 . 1.458 . 2.188 . 4.228 .
1.04 1.649 . 2.346 . 3.036 . 4.347 .
1.12 1.560 . 2.219 . 2.872 . 4.112 .
1.20 1.476 . 2.099 . 2.717 . 4.257 .
1.28 1.396 . 1.986 . 3.543 . 4.377 .
1.36 1.321 . 2.858 . 3.352 . 4.141 .
1.44 1.250 . 2.704 . 3.171 . 4.287 .
1.52 2.162 . 2.558 . 3.979 . 4.407 .
1.60 2.045 . 2.420 . 3.765 . 4.170 .
1.68 1.935 . 3.276 . 3.562 . 4.317 .
1.76 1.830 . 3.099 . 4.356 . 4.438 .
1.84 1.732 . 2.932 . 4.121 . 4.199 .
1.92 1.638 . 2.774 . 3.899 . 3.972 .
2.00 1.550 . 2.624 . 3.688 . 3.758 .
2.08 1.466 . 2.483 . 3.489 . 3.555 .
2.16 1.387 . 2.349 . 3.301 . 3.363 .
2.24 1.312 . 2.222 . 3.123 . 3.182 .
2.32 1.242 . 2.102 . 2.955 . 3.010 .
2.40 1.175 . 1.989 . 2.795 . 2.848 .
2.48 1.111 . 1.882 . 2.644 . 2.694 .
2.56 1.051 . 1.780 . 2.502 . 2.549 .
2.64 0.995 . 1.684 . 2.367 . 2.411 .
2.72 0.941 . 1.593 . 2.239 . 2.281 .
2.80 0.890 . 1.507 . 2.118 . 2.158 .
2.88 0.842 . 1.426 . 2.004 . 2.042 .
2.96 0.797 . 1.349 . 1.896 . 1.932 .
3.04 0.754 . 1.276 . 1.794 . 1.828 .
3.12 0.713 . 1.207 . 1.697 . 1.729 .
3.20 0.675 . 1.142 . 1.605 . 1.636 .
3.28 0.638 . 1.081 . 1.519 . 1.547 .
3.36 0.604 . 1.022 . 1.437 . 1.464 .
3.44 0.571 . 0.967 . 1.359 . 1.385 .
A larger time granularity will keep table storage down. The time Figure 1: Instability figure of merit for flap at a constant rate
granularity should be less than a minimal reasonable time between
expected worst case route flaps. It might be reasonable to fix this
parameter at compile time or set a default and strongly recommend that
the user leave it alone. With an exponential decay, array size can be
greatly reduced by setting a period of complete stability after which
the decayed total will be considered zero rather than retaining a tiny
quantity. Alternately, very long decays can be implemented by
multiplying more than once if array bounds are exceeded.
The reuse lists hold suppressed routes grouped according to how long time figure-of-merit as a function of time (in minutes)
it will be before the routes are eligible for reuse. Periodically
each list will be advanced by one position and one list removed as
described in Section 4.8.7. All of the suppressed routes in the removed
list will be reevaluated and either used or placed in another list
according to how much additional time must elapse before the route can
be reused. The last list will always contain all the routes which
will not be advertised for more time than is appropriate for the
remaining list heads. When the last list advances to the front, some of
the routes will not be ready to be used and will have to be requeued.
The time interval for reconsidering suppressed routes and number of list
heads should be configurable. Reasonable defaults might be 30 seconds and
64 list heads. A route suppressed for a long time would need to be
reevaluated every 32 minutes.
4.4 Run Time Data Structures 0.00 0.000 . 0.000 . 0.000 .
0.20 0.000 . 0.000 . 0.000 .
0.40 0.000 . 0.000 . 0.000 .
0.60 0.000 . 0.000 . 0.000 .
0.80 0.000 . 0.000 . 0.000 .
1.00 0.999 . 0.999 . 0.999 .
1.20 0.971 . 0.971 . 0.929 .
1.40 0.945 . 0.945 . 0.809 .
1.60 0.919 . 0.865 . 0.704 .
1.80 0.894 . 0.753 . 0.613 .
2.00 1.812 . 1.657 . 1.535 .
2.20 1.762 . 1.612 . 1.428 .
2.40 1.714 . 1.568 . 1.244 .
2.60 1.667 . 1.443 . 1.083 .
2.80 1.622 . 1.256 . 0.942 .
3.00 1.468 . 1.094 . 0.820 .
3.20 2.400 . 2.036 . 1.694 .
3.40 2.335 . 1.981 . 1.475 .
3.60 2.271 . 1.823 . 1.284 .
3.80 2.209 . 1.587 . 1.118 .
4.00 1.999 . 1.381 . 0.973 .
4.20 2.625 . 2.084 . 1.727 .
4.40 2.285 . 1.815 . 1.503 .
4.60 1.990 . 1.580 . 1.309 .
4.80 1.732 . 1.375 . 1.139 .
5.00 1.508 . 1.197 . 0.992 .
5.20 1.313 . 1.042 . 0.864 .
5.40 1.143 . 0.907 . 0.752 .
5.60 0.995 . 0.790 . 0.654 .
5.80 0.866 . 0.688 . 0.570 .
6.00 0.754 . 0.599 . 0.496 .
6.20 0.656 . 0.521 . 0.432 .
6.40 0.571 . 0.454 . 0.376 .
6.60 0.497 . 0.395 . 0.327 .
6.80 0.433 . 0.344 . 0.285 .
7.00 0.377 . 0.299 . 0.248 .
7.20 0.328 . 0.261 . 0.216 .
7.40 0.286 . 0.227 . 0.188 .
7.60 0.249 . 0.197 . 0.164 .
7.80 0.216 . 0.172 . 0.142 .
8.00 0.188 . 0.150 . 0.124 .
A fixed small amount of per system storage will be required. Where Figure 2: Separate decay constants when unreachable
sets of multiple configuration parameters are used, storage will be
required per set of parameters. A small amount of per route storage
is required. A set of list heads is needed. These list heads are
used to arrange suppressed routes according to the time remaining
until they can be reused.
A separate reuse list can be used to hold unreachable routes for the Figure 2 shows the effect of configuring separate decay rates to be
purpose of later recovering storage if they remain unreachable too used when the route is reachable or unreachable. The decay rate is 5
long. This might be more accurately described as a recycling list. times slower when the route is unreachable. In the three cases shown,
The advantage this would provide is making free data structures the period of the route flap is equal to the decay half life but the
available as soon as possible. Alternately, the data structures can route is reachable 1/8 of the time in one, reachable 1/2 the time in
simply be placed on a queue and the storage recovered when the route one, and reachable 7/8 of the time in the other. In the last case
hits the front of the queue and if storage is needed. The latter is the route is not suppressed until after the third unreachable (when
less optimal but simple. it is above the top threshold after becoming reachable again).
If multiple sets of configuration parameters are allowed per route, The main point of Figure 2 is to show the effect of changing the duty
there is a need for some means of associating more than one figure of cycle of the square wave in the variable "R" for a fixed frequency of
merit and set of parameters with each route. Building a linked list the square wave. If the decay constants are chosen such that decay
of these objects seems like one of a number of reasonable is slower when R=0 (the route is unreachable), then the figure of
implementations. Similarly, a means of associating a route to a reuse merit rises more slowly (more accurately, the baseline of the
list is required. A small overhead will be required for the pointers sawtooth waveform rises more slowly) if the route is reachable a
needed to implement whatever data structure is chosen for the reuse larger percentage of the time. The effect when the route becomes
lists. The suggested implementation uses doubly linked lists and so persistently reachable again can be fairly negligible if the sawtooth
requires two pointers per figure of merit. is clipped by a ceiling value, but is more significant if a slow
route flap rate or short interval of route flapping is such that the
sawtooth does not reach the ceiling value. In Figure 2 the interval
in which the routes are unstable is short enough that the ceiling
value is not reached, therefore, the routes that are reachable for a
greater percentage of the route flap cycle are reused (placed in the
RIB and advertised to peers) sooner than others after the route
becomes stable again ("R" becomes 1, indicating the announced state
goes to reachable and remains there).
Each set of configuration parameters can reference decay arrays and In both Figure 1 and Figure 2, routes would be suppressed. Routes
reuse arrays. These arrays should be shared among multiple sets of flapping at the decay half life or less would be withdrawn two or
parameters since their storage requirement is not negligible. There three times and then remain withdrawn until they had remained stably
will be only one set of reuse list heads for the entire router. announced and stable for on the order of 1 1/2 to 2 1/2 times the
decay half life (given the ceiling in the example).
4.4.1 Data Structures for Configuration Parameter Sets The purpose of damping BGP route flap is to reduce the processor
burden at the immediate router and the processor burden to downstream
routers (BGP peer routers and peers of peers that will see the route
announcements advertised by the immediate router). Computing a
figure of merit at each discrete time interval using figure-of-
merit(t) = K * figure-of-merit(t - delta-t) would be very inefficient
and defeat the purpose. This problem is addressed by deferring
computation as long as possible and doing a single simple computation
to compensate for the decay during the time that has elapsed since
the figure of merit was last updated. The use of decay arrays
provides the single simple calculation. The use of reuse lists
(described later) provides a means to defer calculations. A route
becomes usable if there was no further change for a period of time
and the route is reachable. The data structure storage is
recovered if the route's state has not changed for a period of time
and it has been unreachable. The reuse arrays provide a means to
estimate how long a computation can be deferred if there is no
further change.
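The deferred computation described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and floating point values are used for readability where an implementation would more likely store scaled integers.

```python
def build_decay_array(half_life, delta_t, size):
    """decay[i] is the factor applied after i ticks of length delta_t.

    half_life and delta_t must use the same time unit.
    """
    # Decay per tick: the K of figure-of-merit(t) = K * figure-of-merit(t - delta-t).
    k = 0.5 ** (delta_t / half_life)
    return [k ** i for i in range(size)]


def decayed(figure, ticks_elapsed, decay):
    """Apply all decay since the last update with a single lookup.

    Past the end of the array the figure of merit is treated as zero,
    the "period of complete stability" simplification.
    """
    if ticks_elapsed >= len(decay):
        return 0.0
    return figure * decay[ticks_elapsed]


if __name__ == "__main__":
    # Assumed values: 15 minute half life, 15 second ticks, 1 hour of memory.
    decay = build_decay_array(half_life=900, delta_t=15, size=241)
    print(decayed(3.0, 60, decay))  # one half life later, about 1.5
```

One multiplication at update time replaces a multiplication per tick, which is the saving the text describes.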
Based on the configuration parameters described in the previous A larger time granularity will keep table storage down. The time
section, the following values can be computed as scaled integers granularity should be less than a minimal reasonable time between
directly from the corresponding configuration parameters. expected worst case route flaps. It might be reasonable to fix this
parameter at compile time or set a default and strongly recommend
that the user leave it alone. With an exponential decay, array size
can be greatly reduced by setting a period of complete stability
after which the decayed total will be considered zero rather than
retaining a tiny quantity. Alternately, very long decays can be
implemented by multiplying more than once if array bounds are
exceeded.
o decay array scale factor (decay-array-scale-factor) The reuse lists hold suppressed routes grouped according to how long
it will be before the routes are eligible for reuse. Periodically
each list will be advanced by one position and one list removed as
described in Section 4.8.7. All of the suppressed routes in the
removed list will be reevaluated and either used or placed in another
list according to how much additional time must elapse before the
route can be reused. The last list will always contain all the
routes which will not be advertised for more time than is appropriate
for the remaining list heads. When the last list advances to the
front, some of the routes will not be ready to be used and will have
to be requeued. The time interval for reconsidering suppressed
routes and number of list heads should be configurable. Reasonable
defaults might be 30 seconds and 64 list heads. A route suppressed
for a long time would need to be reevaluated every 32 minutes.
o cutoff value (cut) 4.4 Run Time Data Structures
o reuse value (reuse) A fixed small amount of per system storage will be required. Where
sets of multiple configuration parameters are used, storage will be
required per set of parameters. A small amount of per route storage
is required. A set of list heads is needed. These list heads are
used to arrange suppressed routes according to the time remaining
until they can be reused.
o figure of merit ceiling (ceiling) A separate reuse list can be used to hold unreachable routes for the
purpose of later recovering storage if they remain unreachable too
long. This might be more accurately described as a recycling list.
The advantage this would provide is making free data structures
available as soon as possible. Alternately, the data structures can
simply be placed on a queue and the storage recovered when the route
hits the front of the queue and if storage is needed. The latter is
less optimal but simple.
Each configuration parameter set will reference one or two decay If multiple sets of configuration parameters are allowed per route,
arrays and one or two reuse arrays. Only one array will be needed if there is a need for some means of associating more than one figure of
the decay rate is the same while a route is unreachable as while it is merit and set of parameters with each route. Building a linked list
reachable, or if the stability figure of merit does not decay while a of these objects seems like one of a number of reasonable
route is unreachable. implementations. Similarly, a means of associating a route to a
reuse list is required. A small overhead will be required for the
pointers needed to implement whatever data structure is chosen for
the reuse lists. The suggested implementation uses doubly linked
lists and so requires two pointers per figure of merit.
4.4.2 Data Structures per Decay Array and Reuse Index Array Each set of configuration parameters can reference decay arrays and
reuse arrays. These arrays should be shared among multiple sets of
parameters since their storage requirement is not negligible. There
will be only one set of reuse list heads for the entire router.
The following are also computed from the configuration parameters 4.4.1 Data Structures for Configuration Parameter Sets
though not as directly.
o decay rate per tick (decay-delta-t) Based on the configuration parameters described in the previous
section, the following values can be computed as scaled integers
directly from the corresponding configuration parameters.
o decay array size (decay-array-size) o decay array scale factor (decay-array-scale-factor)
o decay array (decay[]) o cutoff value (cut)
o reuse index array size (reuse-index-array-size) o reuse value (reuse)
o reuse index array (reuse-index-array[]) o figure of merit ceiling (ceiling)
For each decay rate specified, an array will be used to store the Each configuration parameter set will reference one or two decay
value of a computed parameter raised to the power of the index of each arrays and one or two reuse arrays. Only one array will be needed if
array element. This is to speed computations. The decay rate per the decay rate is the same while a route is unreachable as while it
tick is an intermediate value expressed as a real number and used to is reachable, or if the stability figure of merit does not decay
compute the values stored in the decay arrays. The array size is while a route is unreachable.
computed from the decay memory limit configuration parameter expressed
as an array size or as a maximum hold time.
The decay array size must be of sufficient size to accommodate the 4.4.2 Data Structures per Decay Array and Reuse Index Array
specified decay memory given the time granularity, or sufficient to
hold the number of array elements until integer rounding produces a
zero result if that value is smaller, or an implementation-imposed
reasonable size to prevent configurations which use excessive memory.
Implementations may choose to make the array size shorter and multiply
more than once when decaying a long time interval to reduce storage.
The reuse index arrays serve a similar purpose to the decay arrays. The following are also computed from the configuration parameters
The amount of time until a route can be reused can be determined using though not as directly. The computation is described in Section 4.5.
an array lookup. The array can be built given the decay rate. The
array is indexed using a scaled integer proportional to the ratio
between a current stability figure of merit value and the value needed
for the route to be reused.
4.4.3 Per Route State o decay rate per tick (decay-delta-t)
Information must be maintained per some tuple representing a route. o decay array size (decay-array-size)
At the very minimum, the NLRI (BGP prefix and length) must be
contained in the tuple. Different BGP attributes may be included or
excluded depending on the specific situation. The AS path should also
be contained in the tuple by default. The tuple may also optionally
contain other BGP attributes such as MULTI_EXIT_DISCRIMINATOR (MED).
The tuple representing a route for the purpose of route flap damping o decay array (decay[])
is:
o reuse index array size (reuse-index-array-size)
o reuse index array (reuse-index-array[])
For each decay rate specified, an array will be used to store the
value of a computed parameter raised to the power of the index of
each array element. This is to speed computations. The decay rate
per tick is an intermediate value expressed as a real number and used
to compute the values stored in the decay arrays. The array size is
computed from the decay memory limit configuration parameter
expressed as an array size or as a maximum hold time.
The decay array size must be of sufficient size to accommodate the
specified decay memory given the time granularity, or sufficient to
hold the number of array elements until integer rounding produces a
zero result if that value is smaller, or an implementation-imposed
reasonable size to prevent configurations which use excessive memory.
Implementations may choose to make the array size shorter and multiply
more than once when decaying a long time interval to reduce storage.
The reuse index arrays serve a similar purpose to the decay arrays.
In BGP, a route is said to be "used" if it is considered the best
route. In this context, if the route is "used" it is placed in the
RIB and is eligible for advertisement to BGP peers. If a route is
withdrawn (a BGP announcement is made by a peer indicating that it is
no longer reachable), then it is no longer eligible for "use". When
a route becomes reachable it may not be "used" immediately if the
figure of merit indicates that a recent instability has occurred.
After the route remains stable and the figure of merit decays below
the "reuse" threshold, the route is said to be eligible to be
"reused" (treated as truly reachable, placed in the RIB and
advertised to peers). The amount of time until a route can be reused
can be determined using an array lookup. The array can be built given
the decay rate. The array is indexed using a scaled integer
proportional to the ratio between a current stability figure of merit
value and the value needed for the route to be reused.
4.4.3 Per Route State
Information must be maintained per some tuple representing a route.
At the very minimum, the NLRI (BGP prefix and length) must be
contained in the tuple. Different BGP attributes may be included or
excluded depending on the specific situation. The AS path should
also be contained in the tuple by default. The tuple may also
optionally contain other BGP attributes such as
MULTI_EXIT_DISCRIMINATOR (MED).
The tuple representing a route for the purpose of route flap damping
is:
tuple entry default options tuple entry default options
------------------------------------------- -------------------------------------------
NLRI NLRI
prefix required prefix required
length required length required
AS path included option to exclude AS path included option to exclude
last AS set in path excluded option to include last AS set in path excluded option to include
next hop excluded option to include next hop excluded option to include
MED excluded option to include MED excluded option to include
in comparisons only in comparisons only
The AS path is generally included in order to identify downstream The AS path is generally included in order to identify downstream
instability which is not being damped or not being sufficiently damped instability which is not being damped or not being sufficiently
and is alternating between a stable and an unstable path. Under rare damped and is alternating between a stable and an unstable path.
circumstances it may be desirable to exclude AS path for all or a Under rare circumstances it may be desirable to exclude AS path for
subset of prefixes. If an AS path ends in an AS set, in practice the all or a subset of prefixes. If an AS path ends in an AS set, in
path is always for an aggregate. Changes to the trailing AS set practice the path is always for an aggregate. Changes to the
should be ignored. Ideally the AS path comparison should insure that trailing AS set should be ignored. Ideally the AS path comparison
at least one AS has remained constant in the old and new AS set, but should insure that at least one AS has remained constant in the old
completely ignoring the contents of a trailing AS set is also and new AS set, but completely ignoring the contents of a trailing AS
acceptable. set is also acceptable.
Including next hop and MED changes can help suppress the use of an AS Including next hop and MED changes can help suppress the use of an AS
which is internally unstable or avoid a next hop which is closer to an which is internally unstable or avoid a next hop which is closer to
unstable IGP path in the adjacent AS. If a large number of MED values an unstable IGP path in the adjacent AS. If a large number of MED
are used, the increase in the amount of state may become a problem. values are used, the increase in the amount of state may become a
For this reason MED is disabled by default and enabled only as part of problem. For this reason MED is disabled by default and enabled only
the tuple comparison, using a single state entry regardless of MED as part of the tuple comparison, using a single state entry
value. Including MED will suppress the use of the adjacent AS even regardless of MED value. Including MED will suppress the use of the
though the change need not be propagated further. Using MED is only a adjacent AS even though the change need not be propagated further.
safe practice if a path is known to exist through another AS or where Using MED is only a safe practice if a path is known to exist through
there are enough peering sites with the adjacent AS such that routes another AS or where there are enough peering sites with the adjacent
heard at only a subset of the peering sites will be suppressed. AS such that routes heard at only a subset of the peering sites will
be suppressed.
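
The tuple options above can be sketched as a function that builds the key under which damping state is stored. This is a minimal illustration and not part of the specification: the name damping_key, its parameters, and the representation of an AS set as a frozenset are assumptions made for the example. MED is left out of the key, following the text's use of a single state entry regardless of MED value.

```python
from typing import FrozenSet, Hashable, Optional, Sequence, Tuple, Union

# Hypothetical representation: an AS path is a sequence whose elements are
# either an AS number (int) or an AS set (frozenset of AS numbers).
Segment = Union[int, FrozenSet[int]]


def damping_key(
    prefix: str,
    length: int,
    as_path: Sequence[Segment],
    next_hop: Optional[str] = None,
    *,
    include_as_path: bool = True,       # default: included
    include_last_as_set: bool = False,  # default: excluded
    include_next_hop: bool = False,     # default: excluded
) -> Tuple[Hashable, ...]:
    """Build the tuple that identifies a route for damping purposes."""
    key = [prefix, length]  # the NLRI (prefix and length) is always required
    if include_as_path:
        path = tuple(as_path)
        # Ignore a trailing AS set unless the option to include it is set.
        if path and isinstance(path[-1], frozenset) and not include_last_as_set:
            path = path[:-1]
        key.append(path)
    if include_next_hop:
        key.append(next_hop)
    return tuple(key)


# Two announcements that differ only in the trailing AS set share one key.
a = damping_key("192.0.2.0", 24, [64500, 64501, frozenset({64510, 64511})])
b = damping_key("192.0.2.0", 24, [64500, 64501, frozenset({64510, 64512})])
```

With the defaults, a and b map to the same damping state, so a change confined to the trailing AS set is not treated as a route change. The stricter check the text calls ideal (at least one AS constant between the old and new AS set) is not attempted here.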
4.4.4 Data Structures per Route 4.4.4 Data Structures per Route
The following information must be maintained per route. A route here The following information must be maintained per route. A route here
is considered to be a tuple usually containing NLRI, next hop, and AS is considered to be a tuple usually containing NLRI, next hop, and AS
path as defined in Section 4.4.3. path as defined in Section 4.4.3.
stability figure of merit (figure-of-merit) stability figure of merit (figure-of-merit)
Each route must have a stability figure of merit per applicable
parameter set.
last time updated (time-update) Each route must have a stability figure of merit per applicable
parameter set.
The exact last time updated must be maintained to allow exponential last time updated (time-update)
decay of the accumulated figure of merit to be deferred until the The exact last time updated must be maintained to allow
route might reasonably be considered eligible for a change in exponential decay of the accumulated figure of merit to be
status (having gone from unreachable to reachable or advancing deferred until the route might reasonably be considered eligible
within the reuse lists). for a change in status (having gone from unreachable to
reachable or advancing within the reuse lists).
config block pointer config block pointer
Any implementation that supports multiple parameter sets must Any implementation that supports multiple parameter sets must
provide a means of quickly identifying which set of parameters provide a means of quickly identifying which set of parameters
corresponds to the route currently being considered. For corresponds to the route currently being considered. For
implementations supporting only parameter sets where all routes implementations supporting only parameter sets where all routes
must be treated the same, this pointer is not required. must be treated the same, this pointer is not required.
reuse list traversal pointers reuse list traversal pointers
If doubly linked lists are used to implement reuse lists, then two If doubly linked lists are used to implement reuse lists, then
pointers will be needed, previous and next. Generally there is a two pointers will be needed, previous and next. Generally there
doubly linked list which is unused when a route is suppressed from is a doubly linked list which is unused when a route is
use that can be used for reuse list traversal eliminating the need suppressed from use that can be used for reuse list traversal
for additional pointer storage. eliminating the need for additional pointer storage.
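
The per-route fields listed above can be pictured as one small record hanging off each route. The sketch below is only an illustration under assumed names (ParamSet, DampingState, and Route are hypothetical, as is the separate suppressed flag); an implementation would normally pack these into whatever route structure it already has.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ParamSet:
    """Placeholder for a configuration parameter block (hypothetical)."""
    name: str = "default"


@dataclass
class DampingState:
    figure_of_merit: int               # stability figure of merit, scaled integer
    time_update: int                   # exact last time updated, in ticks
    suppressed: bool = False           # assumed flag: route is currently suppressed
    config: Optional[ParamSet] = None  # not needed if all routes share one set
    # Reuse list traversal pointers (previous and next). They are excluded
    # from repr and comparison so that linked entries do not recurse.
    prev: Optional["DampingState"] = field(default=None, repr=False, compare=False)
    next: Optional["DampingState"] = field(default=None, repr=False, compare=False)


@dataclass
class Route:
    key: tuple                              # the tuple of Section 4.4.3
    damping: Optional[DampingState] = None  # None while there is no history


stable = Route(key=("192.0.2.0", 24))
flapping = Route(
    key=("198.51.100.0", 24),
    damping=DampingState(figure_of_merit=1 << 16, time_update=0),
)
```

Keeping the damping record behind an optional reference means a route with no instability history carries only one empty reference.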
4.5 Processing Configuration Parameters 4.5 Processing Configuration Parameters
From the configuration parameters, it is possible to precompute a From the configuration parameters, it is possible to precompute
number of values that will be used repeatedly and retain these to a number of values that will be used repeatedly and retain these
speed later computations that will be required frequently. to speed later computations that will be required frequently.
Scaling is usually dependent on the highest value that figure-of-merit Scaling is usually dependent on the highest value that figure-
can attain, referred to here as the ceiling. The real number value of of-merit can attain, referred to here as the ceiling. The real
the ceiling will typically be determined by the following equation. number value of the ceiling will typically be determined by the
following equation. The ceiling can also be configured to a
specific value, which in turn dictates T-hold.
ceiling = reuse * exp((T-hold/decay-half-life) * log(2)) ceiling = reuse * exp((T-hold/decay-half-life) * log(2))
The methods of scaled integer arithmetic are not described in detail In the above equation, reuse is the reuse threshold described
here. The methods of determining the real values are given. in Section 4.2.
Translation into scaled integer values and the details of scaled
integer arithmetic are left up to the individual implementations.
figure of merit scale factor ( scale-figure-of-merit ) The methods of scaled integer arithmetic are not described in
detail here. The methods of determining the real values are
given. Translation into scaled integer values and the details
of scaled integer arithmetic are left up to the individual
implementations.
The ceiling value can be set to be the largest integer that can fit The ceiling value can be set to be the largest integer that can fit
in half the bits available for an unsigned integer. This will in half the bits available for an unsigned integer. This will
allow the scaled integers to be multiplied by the scaled decay allow the scaled integers to be multiplied by the scaled decay
value and then shifted down. Implementations may prefer to use value and then shifted down. Implementations may prefer to use
real numbers or may use any integer scaling deemed appropriate for real numbers or may use any integer scaling deemed appropriate for
their architecture. their architecture.
penalty value and thresholds (as proportional scaled integers) penalty value and thresholds (as proportional scaled integers)
The figure of merit penalty for one route withdrawal and the cutoff
values must be scaled according to the above scaling factor.
decay rate per tick (decay[1])
The decay value per increment of time as defined by the time The figure of merit penalty for one route withdrawal and the
granularity must be determined (at least initially as a floating cutoff values must be scaled according to the above scaling
point number). The per tick decay is a number slightly less than factor.
one. It is the Nth root of one half where N is the half life
divided by the time granularity.
decay[1] = exp ((1 / (decay-half-life/delta-t)) * log decay rate per tick (decay[1])
(1/2))
decay array size (decay-array-size) The decay value per increment of time as defined by the time
granularity must be determined (at least initially as a floating
point number). The per tick decay is a number slightly less
than one. It is the Nth root of one half where N is the
half life divided by the time granularity.
The decay array size is the decay memory divided by the time decay[1] = exp ((1 / (decay-half-life/delta-t)) * log (1/2))
granularity. If integer truncation brings the value of an array
element to zero, the array can be made smaller. An implementation
should also impose a maximum reasonable array size or allow more
than one multiplication.
decay-array-size = (Tmax/delta-t) decay array size (decay-array-size)
decay array (decay[]) The decay array size is the decay memory divided by the time
granularity. If integer truncation brings the value of an array
element to zero, the array can be made smaller. An
implementation should also impose a maximum reasonable array
size or allow more than one multiplication.
Each i-th element of the decay array is the per tick decay raised decay-array-size = (Tmax/delta-t)
to the i-th power. This might be best done by successive floating
point multiplies followed by scaling and integer rounding or
truncation. The array itself need only be computed at startup.
decay[i] = decay[1] ** i decay array (decay[])
4.6 Building the Reuse Index Arrays Each i-th element of the decay array is the per tick decay
raised to the i-th power. This might be best done by successive
floating point multiplies followed by scaling and integer
rounding or truncation. The array itself need only be computed
at startup.
The reuse lists may be accessed quite frequently if a lot of routes decay[i] = decay[1] ** i
are flapping sufficiently to be suppressed. A method of speeding the
determination of which reuse list to use for a given route is
suggested. This method is introduced in Section 4.2, its
configuration described in Section 4.4.2 and the algorithms described
in Section 4.8.6 and Section 4.8.7. This section describes building
the reuse list index arrays.
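
The decay array computation of Section 4.5 can be sketched as follows. This is an illustration rather than a reference implementation: the function names and the scale factor of 2**16 are assumptions, and the half-life and Tmax values are the ones used in the sample configuration of Section 4.7, expressed in seconds.

```python
import math


def build_decay_array(half_life, delta_t, t_max, scale):
    """Return decay[] as scaled integers, where decay[i] = decay[1] ** i."""
    # Per tick decay: the Nth root of one half, with N = half_life / delta_t.
    decay_1 = math.exp((1.0 / (half_life / delta_t)) * math.log(1.0 / 2.0))
    size = int(t_max / delta_t)  # decay-array-size = Tmax / delta-t
    decay = []
    value = 1.0  # decay[1] ** 0
    for _ in range(size):
        scaled = int(value * scale)  # scale, then truncate to an integer
        if scaled == 0:
            break  # truncation reached zero, so the array can stop here
        decay.append(scaled)
        value *= decay_1  # successive floating point multiplies
    return decay


def apply_decay(figure_of_merit, ticks, decay, scale):
    """Decay a scaled figure of merit over `ticks` with a single lookup."""
    if ticks >= len(decay):
        return 0  # off the end of the array: treated as fully decayed
    return figure_of_merit * decay[ticks] // scale


SCALE = 1 << 16  # hypothetical scale factor
decay_ok = build_decay_array(half_life=300, delta_t=1, t_max=900, scale=SCALE)
decay_ng = build_decay_array(half_life=900, delta_t=1, t_max=1800, scale=SCALE)
```

With a one second tick the two arrays hold 900 and 1,800 entries, and the entry one half-life in is about half the scale factor. The arrays only need to be built at startup; afterwards each decay operation is one multiply and one integer division.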
A ratio of the figure of merit of the route under consideration to the 4.6 Building the Reuse Index Arrays
cutoff value is used as the basis for an array lookup. The ratio is
scaled and truncated to an integer and used to index the array. The
array entry is an integer used to determine which reuse list to use.
reuse array maximum ratio (max-ratio) The reuse lists may be accessed quite frequently if a lot of routes
are flapping sufficiently to be suppressed. A method of speeding the
determination of which reuse list to use for a given route is
suggested. This method is introduced in Section 4.2, its
configuration described in Section 4.4.2 and the algorithms described
in Section 4.8.6 and Section 4.8.7. This section describes building
the reuse list index arrays.
This is the maximum ratio between the current value of the A ratio of the figure of merit of the route under consideration to
stability figure of merit and the target reuse value that can be the cutoff value is used as the basis for an array lookup. The ratio
indexed by the reuse array. It may be limited by the ceiling is scaled and truncated to an integer and used to index the array.
imposed by the maximum hold time or by the amount of time that the The array entry is an integer used to determine which reuse list to
reuse lists cover. use.
max-ratio = min(ceiling/reuse, exp((1 / reuse array maximum ratio (max-ratio)
(half-life/reuse-array-time)) * log(2)))
reuse array scale factor ( scale-factor ) This is the maximum ratio between the current value of the
stability figure of merit and the target reuse value that can be
indexed by the reuse array. It may be limited by the ceiling
imposed by the maximum hold time or by the amount of time that
the reuse lists cover.
Since the reuse array is an estimator, the reuse array scale factor max-ratio = min(ceiling/reuse, exp((1 / (half-life/reuse-
has to be computed such that the full size of the reuse array is array-time)) * log(2)))
used.
scale-factor = reuse-index-array-size / (max-ratio - 1) reuse array scale factor ( scale-factor )
reuse index array (reuse-index-array[]) Since the reuse array is an estimator, the reuse array scale
factor has to be computed such that the full size of the reuse
array is used.
Each reuse index array entry should contain an index into the reuse scale-factor = reuse-index-array-size / (max-ratio - 1)
list array pointing to one of the list heads. This index should
correspond to the reuse list that will be evaluated just after a
route would be eligible for reuse given the ratio of current value
of the stability figure of merit to target reuse value
corresponding to the reuse array entry.
reuse-index-array[j] = integer((decay-half-life / reuse index array (reuse-index-array[])
reuse-time-granularity) * log(1/(reuse * (1 + (j / Each reuse index array entry should contain an index into the
scale-factor)))) / log(1/2)) reuse list array pointing to one of the list heads. This index
should correspond to the reuse list that will be evaluated
just after a route would be eligible for reuse given the ratio
of current value of the stability figure of merit to target
reuse value corresponding to the reuse array entry.
To determine which reuse queue to place a route which is being reuse-index-array[j] = integer((decay-half-life / reuse-
suppressed, the following procedure is used. Divide the current figure time-granularity) * log(1/(reuse * (1 + (j / scale-factor)))) /
of merit by the cutoff. Subtract one. Multiply by the scale factor. log(1/2))
This is the index into the reuse index array (reuse-index-array[]).
The value fetched from the reuse index array (reuse-index-array[]) is
an index into the array of reuse lists (reuse-array[]). If this index
is off the end of the array use the last queue otherwise look in the
array and pick the number of the queue from the array at that index.
This is quite fast and well worth the setup and storage required.
4.7 A Sample Configuration To determine which reuse queue to place a route which is being
suppressed, the following procedure is used. Divide the current
figure of merit by the cutoff. Subtract one. Multiply by the scale
factor. This is the index into the reuse index array (reuse-index-
array[]). The value fetched from the reuse index array (reuse-
index-array[]) is an index into the array of reuse lists (reuse-
array[]). If this index is off the end of the array use the last
queue otherwise look in the array and pick the number of the queue
from the array at that index. This is quite fast and well worth the
setup and storage required.
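
The construction and lookup described in Section 4.6 can be sketched as below. Several points are assumptions and not a transcription of the equations. The ratio is taken against the reuse value, as in the max-ratio definition, where the procedure text says cutoff. Entry j is computed from the ratio 1 + j/scale-factor alone, as the number of reuse intervals needed to decay from that ratio down to 1. The function names and the values ceiling = 1.0 and 120 reuse lists are hypothetical.

```python
import math


def build_reuse_index_array(half_life, delta_reuse, n_lists, size, ceiling, reuse):
    """Return (reuse_index_array, scale_factor)."""
    reuse_array_time = n_lists * delta_reuse  # time covered by the reuse lists
    max_ratio = min(
        ceiling / reuse,
        math.exp((1.0 / (half_life / reuse_array_time)) * math.log(2.0)),
    )
    # Spread the ratios 1 .. max_ratio over the full size of the array.
    scale_factor = size / (max_ratio - 1.0)
    index = []
    for j in range(size):
        ratio = 1.0 + j / scale_factor
        # Reuse intervals needed to decay from `ratio` down to 1 (assumed reading).
        intervals = (half_life / delta_reuse) * math.log(ratio) / math.log(2.0)
        index.append(min(int(intervals), n_lists - 1))
    return index, scale_factor


def pick_reuse_list(figure_of_merit, reuse, index, scale_factor, n_lists):
    """Pick the reuse list number for a route that is being suppressed."""
    j = int((figure_of_merit / reuse - 1.0) * scale_factor)
    if j >= len(index):
        return n_lists - 1  # off the end of the array: use the last list
    return index[max(j, 0)]


index, sf = build_reuse_index_array(
    half_life=900, delta_reuse=15, n_lists=120, size=1024, ceiling=1.0, reuse=0.5
)
slot = pick_reuse_list(0.75, 0.5, index, sf, n_lists=120)
```

The logarithms are all evaluated when the array is built, so placing a route on a reuse list costs one division, one multiplication, and one array lookup.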
A simple example is presented here in which the space overhead is 4.7 A Sample Configuration
estimated for a set of configuration parameters. The design here
assumes:
1. there is a single parameter set used for all routes, A simple example is presented here in which the space overhead is
estimated for a set of configuration parameters. The design here
assumes:
2. decay time for unreachable routes is slower than for reachable 1. there is a single parameter set used for all routes,
routes
3. the arrays must be full size, rather than allow more than one 2. decay time for unreachable routes is slower than for reachable
multiply per decay operation to reduce the array size. routes
This example is used in later sections. The use of multiple parameter 3. the arrays must be full size, rather than allow more than one
sets complicates the examples somewhat. Where multiple parameter sets multiply per decay operation to reduce the array size.
are allowed for a single route, the decay portion of the algorithm is
repeated for each parameter set. If different routes are allowed to
have different parameter sets, the routes must have pointers to the
parameter sets to keep the time to locate to a minimum, but the
algorithms are otherwise unchanged.
A sample set of configuration parameters and a sample set of This example is used in later sections. The use of multiple
implementation parameters are provided in the two following lists. parameter sets complicates the examples somewhat. Where multiple
parameter sets are allowed for a single route, the decay portion of
the algorithm is repeated for each parameter set. If different
routes are allowed to have different parameter sets, the routes must
have pointers to the parameter sets to keep the time to locate to a
minimum, but the algorithms are otherwise unchanged.
1. Configuration Parameters A sample set of configuration parameters and a sample set of
implementation parameters are provided in the two following lists.
o cut = 1.25 1. Configuration Parameters
o reuse = 0.5 o cut = 1.25
o T-hold = 15 mins
o decay-ok = 5 min o reuse = 0.5
o decay-ng = 15 min o T-hold = 15 mins
o Tmax-ok, Tmax-ng = 15, 30 mins o decay-ok = 5 min
2. Implementation Parameters o decay-ng = 15 min
o delta-t = 1 sec o Tmax-ok, Tmax-ng = 15, 30 mins
o delta-reuse 2. Implementation Parameters
o reuse-list-size = 256 o delta-t = 1 sec
o reuse-index-array-size = 1,024 o delta-reuse = 15 sec
o reuse-list-size = 256
Using these configuration and implementation parameters and the o reuse-index-array-size = 1,024
equations in Section 4.5, the space overhead can be computed. There
is a fixed space overhead that is independent of the number of routes.
There is a space requirement associated with a stable route. There is
a larger space requirement associated with an unstable route. The
space requirements for the parameters above are provided in the lists
below.
1. fixed overhead (using parameters from previous example) Using these configuration and implementation parameters and the
equations in Section 4.5, the space overhead can be computed. There
is a fixed space overhead that is independent of the number of
routes. There is a space requirement associated with a stable route.
There is a larger space requirement associated with an unstable
route. The space requirements for the parameters above are provided
in the lists below.
o 900 * integer - decay array 1. fixed overhead (using parameters from previous example)
o 1,800 * integer - decay array o 900 * integer - decay array
o 120 * pointer - reuse list-heads o 1,800 * integer - decay array
o 2,048 * integer - reuse index arrays o 120 * pointer - reuse list-heads
2. overhead per stable route o 2,048 * integer - reuse index arrays
o pointer - containing null entry 2. overhead per stable route
3. overhead per unstable route o pointer - containing null entry
o pointer - to a damping structure containing the following 3. overhead per unstable route
o integer - figure of merit + bit for state o pointer - to a damping structure containing the following
o integer - last time updated
o pointer (optional) to configuration parameter block o integer - figure of merit + bit for state
o 2 * pointer - reuse list pointers (prev, next) o integer - last time updated
Figure 3 shows the behavior of the algorithm with the parameters given o 2 * pointer - reuse list pointers (prev, next)
above. Four cases are given in this example. In all four, there is a
twelve minute period of route oscillations. Two periods of oscillation
are used, 2 minutes and 4 minutes. Two duty cycles are used, one
in which the route is reachable during 20% of the cycle and the other
where the route is reachable during 80% of the cycle. In all four
cases, the route becomes suppressed after it becomes unreachable the
second time. Once suppressed, it remains suppressed until some period
after becoming stable. The routes which oscillate over a 4 minute period
are no longer suppressed within 9-11 minutes after becoming stable.
The routes with a 2 minute period of oscillation are suppressed for
nearly the maximum 15 minute period after becoming stable.
4.8 Processing Routing Protocol Activity The decay arrays are sized according to delta-t and Tmax-ok or Tmax-
ng. The number of reuse list-heads is based on delta-reuse and the
greater of Tmax-ok or Tmax-ng. There are two reuse index arrays
whose size is a configured parameter.
The prior sections concentrate on configuration parameters and their Figure 3 shows the behavior of the algorithm with the parameters
relationship to the parameters and arrays used at run time and provide given above. Four cases are given in this example. In all four,
the algorithms for initializing run time storage. This section there is a twelve minute period of route oscillations. Two periods
provides the steps taken in processing routing events and timer events of oscillation are used, 2 minutes and 4 minutes. Two duty cycles
when running. are used, one in which the route is reachable during 20% of the cycle
and the other where the route is reachable during 80% of the cycle.
In all four cases, the route becomes suppressed after it becomes
unreachable the second time. Once suppressed, it remains suppressed
until some period after becoming stable. The routes which oscillate
over a 4 minute period are no longer suppressed within 9-11 minutes
after becoming stable. The routes with a 2 minute period of
oscillation are suppressed for nearly the maximum 15 minute period
after becoming stable.
The routing events are: 4.8 Processing Routing Protocol Activity
1. A BGP peer or new route comes up for the first time (or after an The prior sections concentrate on configuration parameters and their
extended down time) (Section 4.8.1) relationship to the parameters and arrays used at run time and
provide the algorithms for initializing run time storage. This
section provides the steps taken in processing routing events and
timer events when running.
2. A route becomes unreachable (Section 4.8.2) The routing events are:
3. A route becomes reachable again (Section 4.8.3) 1. A BGP peer or new route comes up for the first time (or after
an extended down time) (Section 4.8.1)
4. A route changes (Section 4.8.4) 2. A route becomes unreachable (Section 4.8.2)
5. A peer goes down (Section 4.8.5) 3. A route becomes reachable again (Section 4.8.3)
The reuse list is used to provide a means of fast evaluation of route 4. A route changes (Section 4.8.4)
that had been suppressed, but had been stable long enough to be reused
again or had been suppressed long enough that it can be treated as a
new route. The following two operations are described.
time figure-of-merit as a function of time 5. A peer goes down (Section 4.8.5)
time figure-of-merit as a function of time (in minutes)
0.00 0.000 . 0.000 . 0.000 . 0.000 . 0.00 0.000 . 0.000 . 0.000 . 0.000 .
0.62 0.000 . 0.000 . 0.000 . 0.000 . 0.62 0.000 . 0.000 . 0.000 . 0.000 .
1.25 0.000 . 0.000 . 0.000 . 0.000 . 1.25 0.000 . 0.000 . 0.000 . 0.000 .
1.88 0.000 . 0.000 . 0.000 . 0.000 . 1.88 0.000 . 0.000 . 0.000 . 0.000 .
2.50 0.977 . 0.968 . 0.000 . 0.000 . 2.50 0.977 . 0.968 . 0.000 . 0.000 .
3.12 0.949 . 0.888 . 0.000 . 0.000 . 3.12 0.949 . 0.888 . 0.000 . 0.000 .
3.75 0.910 . 0.814 . 0.000 . 0.000 . 3.75 0.910 . 0.814 . 0.000 . 0.000 .
4.37 1.846 . 1.756 . 0.983 . 0.983 . 4.37 1.846 . 1.756 . 0.983 . 0.983 .
5.00 1.794 . 1.614 . 0.955 . 0.935 . 5.00 1.794 . 1.614 . 0.955 . 0.935 .
skipping to change at page 23, line 49 skipping to change at page 25, line 48
20.00 1.380 . 1.220 . 0.817 . 0.691 . 20.00 1.380 . 1.220 . 0.817 . 0.691 .
20.62 1.266 . 1.119 . 0.750 . 0.633 . 20.62 1.266 . 1.119 . 0.750 . 0.633 .
21.25 1.161 . 1.026 . 0.687 . 0.581 . 21.25 1.161 . 1.026 . 0.687 . 0.581 .
21.87 1.064 . 0.941 . 0.630 . 0.533 . 21.87 1.064 . 0.941 . 0.630 . 0.533 .
22.50 0.976 . 0.863 . 0.578 . 0.488 . 22.50 0.976 . 0.863 . 0.578 . 0.488 .
23.12 0.895 . 0.791 . 0.530 . 0.448 . 23.12 0.895 . 0.791 . 0.530 . 0.448 .
23.75 0.821 . 0.725 . 0.486 . 0.411 . 23.75 0.821 . 0.725 . 0.486 . 0.411 .
24.37 0.753 . 0.665 . 0.446 . 0.377 . 24.37 0.753 . 0.665 . 0.446 . 0.377 .
25.00 0.690 . 0.610 . 0.409 . 0.345 . 25.00 0.690 . 0.610 . 0.409 . 0.345 .
Figure 3: Some fairly long route flap cycles, repeated for 12 Figure 3: Some fairly long route flap cycles, repeated for 12 minutes,
minutes, followed by a period of stability. followed by a period of stability.
1. Inserting into a reuse list (Section 4.8.6) The reuse list is used to provide a means of fast evaluation of route
that had been suppressed, but had been stable long enough to be
reused again or had been suppressed long enough that it can be
treated as a new route. The following two operations are described.
2. Reuse list processing every delta-t seconds (Section 4.8.7) 1. Inserting into a reuse list (Section 4.8.6)
4.8.1 Processing a New Peer or New Routes 2. Reuse list processing every delta-t seconds (Section 4.8.7)
When a peer comes up, no action is required if the routes had no 4.8.1 Processing a New Peer or New Routes
previous history of instability, for example if this is the first time
the peer is coming up and announcing these routes. For each route,
the pointer to the damping structure would be zeroed and route used.
The same action is taken for a new route or a route that has been down
long enough that the figure of merit reached zero and the damping
structure was deleted.
4.8.2 Processing Unreachable Messages When a peer comes up, no action is required if the routes had no
previous history of instability, for example if this is the first
time the peer is coming up and announcing these routes. For each
route, the pointer to the damping structure would be zeroed and route
used. The same action is taken for a new route or a route that has
been down long enough that the figure of merit reached zero and the
damping structure was deleted.
When a route is withdrawn or changed (Section 4.8.4 describes how a 4.8.2 Processing Unreachable Messages
change is handled), the following procedure is used.
If there is no previous stability history (the damping structure When a route is withdrawn or changed (Section 4.8.4 describes how a
pointer is zero), then: change is handled), the following procedure is used.
1. allocate a damping structure If there is no previous stability history (the damping structure
pointer is zero), then:
2. set figure-of-merit = 1 1. allocate a damping structure
3. withdraw the route 2. set figure-of-merit = 1
Otherwise, if there is an existing damping structure, then: 3. withdraw the route
1. set t-diff = t-now - t-updated Otherwise, if there is an existing damping structure, then:
2. if (t-diff puts you off the end of the array) { 1. set t-diff = t-now - t-updated
set figure-of-merit = 1 2. if (t-diff puts you off the end of the array) {
} else { set figure-of-merit = 1
set figure-of-merit = figure-of-merit * decay-array-ok [t-diff] + 1 } else {
if (figure-of-merit > ceiling) { set figure-of-merit = figure-of-merit * decay-array-ok [t-diff] + 1
set figure-of-merit = ceiling if (figure-of-merit > ceiling) {
} set figure-of-merit = ceiling
} }
3. remove the route from a reuse list if it is on one }
4. withdraw the route unless it is already suppressed 3. remove the route from a reuse list if it is on one
In either case then: 4. withdraw the route unless it is already suppressed
1. set t-updated = t-now In either case then:
2. insert into a reuse list (see Section 4.8.6) 1. set t-updated = t-now
If there was a stability history, the previous value of the stability 2. insert into a reuse list (see Section 4.8.6)
figure of merit is decayed. This is done using the decay array
(decay-array). The index is determined by subtracting the current
time and the last time updated, then dividing by the time granularity.
If the index is zero, the figure of merit is unchanged (no decay). If
it is greater than the array size, it is zeroed. Otherwise use the
index to fetch a decay array element and multiply the figure of merit
by the array element. If using the suggested scaled integer method,
shift down half an integer. Add the scaled penalty for one more un-
reachable (shown above as 1). If the result is above the ceiling re-
place it with the ceiling value. Now update the last time updated field
(preferably taking into account how much time was truncated before doing
the decay calculation).
When a route becomes unreachable, alternate paths must be considered. If there was a stability history, the previous value of the stability
This process is complicated slightly if different configuration param- figure of merit is decayed. This is done using the decay array
eters are used in the presence or absence of viable alternate paths. (decay-array). The index is determined by subtracting the current
If all of these alternate paths have been suppressed because there had time and the last time updated, then dividing by the time
previously been an alternate route and the new route withdrawal granularity. If the index is zero, the figure of merit is unchanged
changes that condition, the suppressed alternate paths must be reeval- (no decay). If it is greater than the array size, it is zeroed.
uated. They should be reevaluated in order of normal route prefer- Otherwise use the index to fetch a decay array element and multiply
ence. When one of these alternate routes is encountered that had been the figure of merit by the array element. If using the suggested
suppressed but is now usable since there is no alternate route, no scaled integer method, shift down half an integer. Add the scaled
further routes need to be reevaluated. This only applies if routes penalty for one more unreachable (shown above as 1). If the result
are given two different reuse thresholds, one for use when there is an al- is above the ceiling replace it with the ceiling value. Now update
ternate path and a higher threshold to use when suppressing the route would the last time updated field (preferably taking into account how much
result in making the destination completely unreachable. time was truncated before doing the decay calculation).
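The withdrawal procedure of Section 4.8.2 can be sketched in Python as shown below. This is a minimal illustration, not the implementation described by the authors. It assumes floating point values rather than the suggested scaled integers, assumes that times are already expressed in decay-array ticks, and leaves the route withdrawal and reuse list steps to the caller. The names Damping, make_decay_array and process_unreachable are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Damping:
    """Hypothetical per-route damping structure."""
    figure_of_merit: float
    t_updated: int            # last update time, in decay-array ticks
    suppressed: bool = False


def make_decay_array(half_life_ticks: int, size: int) -> List[float]:
    # Entry i is the decay factor after i ticks; entry 0 is 1.0 (no decay).
    return [0.5 ** (i / half_life_ticks) for i in range(size)]


def process_unreachable(damp: Optional[Damping], t_now: int,
                        decay_array_ok: List[float], ceiling: float,
                        penalty: float = 1.0) -> Damping:
    """Apply one withdrawal (or route change) penalty at tick t_now."""
    if damp is None:
        # No stability history: allocate a structure, figure-of-merit = 1.
        damp = Damping(figure_of_merit=penalty, t_updated=t_now)
    else:
        t_diff = t_now - damp.t_updated
        if t_diff >= len(decay_array_ok):
            # Off the end of the array: old history has fully decayed.
            damp.figure_of_merit = penalty
        else:
            decayed = damp.figure_of_merit * decay_array_ok[t_diff]
            damp.figure_of_merit = min(decayed + penalty, ceiling)
    damp.t_updated = t_now
    # The caller would now withdraw the route (unless already suppressed)
    # and insert it into a reuse list (Section 4.8.6).
    return damp
```

With a half-life of 2 ticks, a second withdrawal 2 ticks after the first yields a figure of merit of 1 * 0.5 + 1 = 1.5.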
4.8.3 Processing Route Advertisements When a route becomes unreachable, alternate paths must be considered.
This process is complicated slightly if different configuration
parameters are used in the presence or absence of viable alternate
paths. If all of these alternate paths have been suppressed because
there had previously been an alternate route and the new route
withdrawal changes that condition, the suppressed alternate paths
must be reevaluated. They should be reevaluated in order of normal
route preference. When one of these alternate routes is encountered
that had been suppressed but is now usable since there is no
alternate route, no further routes need to be reevaluated. This only
applies if routes are given two different reuse thresholds, one for
use when there is an alternate path and a higher threshold to use
when suppressing the route would result in making the destination
completely unreachable.
When a route is readvertised if there is no damping structure, then 4.8.3 Processing Route Advertisements
the procedure is the same as in Section 4.8.1.
1. don't create a new damping structure When a route is readvertised if there is no damping structure, then
the procedure is the same as in Section 4.8.1.
2. use the route 1. don't create a new damping structure
If a damping structure exists, the figure of merit is decayed and the 2. use the route
figure of merit and last time updated fields are updated. A decision
is now made as to whether the route can be used immediately or needs
to be suppressed for some period of time.
1. set t-diff = t-now - t-updated If a damping structure exists, the figure of merit is decayed and
the figure of merit and last time updated fields are updated. A
decision is now made as to whether the route can be used immediately
or needs to be suppressed for some period of time.
2. if (t-diff puts you off the end of the array) { 1. set t-diff = t-now - t-updated
set figure-of-merit = 0 2. if (t-diff puts you off the end of the array) {
} else { set figure-of-merit = 0
set figure-of-merit = figure-of-merit * decay-array-ng [t-diff] } else {
} set figure-of-merit = figure-of-merit * decay-array-ng [t-diff]
3. if (not suppressed and figure-of-merit < cut) { }
use the route 3. if ( not suppressed and figure-of-merit < cut ) {
} else if (suppressed and figure-of-merit < reuse) { use the route
set state to not suppressed } else if (suppressed and figure-of-merit < reuse) {
remove the route from a reuse list set state to not suppressed
use the route remove the route from a reuse list
} else { use the route
set state to suppressed } else {
don't use the route set state to suppressed
insert into a reuse list (see Section 4.8.6) don't use the route
} insert into a reuse list (see Section 4.8.6)
4. if (figure-of-merit > 0) { }
set t-updated = t-now 4. if ( figure-of-merit > 0 ) {
} else { set t-updated = t-now
recover memory for damping struct } else {
recover memory for damping struct
zero pointer to damping struct zero pointer to damping struct
} }
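The readvertisement steps of Section 4.8.3 above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: floating point values are used instead of scaled integers, times are in decay-array ticks, and reuse list maintenance and best-route selection are left to the caller. The names Damping and process_advertisement are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Damping:
    """Hypothetical per-route damping structure."""
    figure_of_merit: float
    t_updated: int            # last update time, in decay-array ticks
    suppressed: bool = False


def process_advertisement(damp: Optional[Damping], t_now: int,
                          decay_array_ng: List[float],
                          cut: float, reuse: float
                          ) -> Tuple[Optional[Damping], bool]:
    """Return (damping structure or None, whether the route may be used)."""
    if damp is None:
        # No history: do not create a damping structure, use the route.
        return None, True

    # Step 1 and 2: decay the figure of merit over the unreachable period.
    t_diff = t_now - damp.t_updated
    if t_diff >= len(decay_array_ng):
        damp.figure_of_merit = 0.0
    else:
        damp.figure_of_merit *= decay_array_ng[t_diff]

    # Step 3: decide whether the route is usable.
    if not damp.suppressed and damp.figure_of_merit < cut:
        use = True
    elif damp.suppressed and damp.figure_of_merit < reuse:
        damp.suppressed = False
        use = True            # caller removes the route from its reuse list
    else:
        damp.suppressed = True
        use = False           # caller inserts the route into a reuse list

    # Step 4: keep the structure only while some history remains.
    if damp.figure_of_merit > 0:
        damp.t_updated = t_now
        return damp, use
    return None, use          # structure can be freed, pointer zeroed
```

Returning None in place of the structure plays the role of "zero pointer to damping struct" in the steps above.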
If the route is deemed usable, a search for the current best route If the route is deemed usable, a search for the current best route
must be made. The newly reachable route is then evaluated according must be made. The newly reachable route is then evaluated according
to the BGP protocol rules for route selection. to the BGP protocol rules for route selection.
If the new route is usable, the previous best route is examined. If the new route is usable, the previous best route is examined.
Prior to route comparisons, the current best route may have to be Prior to route comparisons, the current best route may have to be
reevaluated if separate parameter sets are used depending on the reevaluated if separate parameter sets are used depending on the
presence or absence of an alternate route. If there had been no presence or absence of an alternate route. If there had been no
alternate the previous best route may be suppressed. alternate the previous best route may be suppressed.
If the new route is to be suppressed it is placed on a reuse list only If the new route is to be suppressed it is placed on a reuse list
if it would have been preferred to the current best route had the new only if it would have been preferred to the current best route had
route been accepted as stable. There is no reason to queue a route on the new route been accepted as stable. There is no reason to queue a
a reuse list if after the route becomes usable it would not be used route on a reuse list if after the route becomes usable it would not
anyway due to the existence of a more preferred route. Such a route be used anyway due to the existence of a more preferred route. Such
would not have to be reevaluated unless the preferred route became a route would not have to be reevaluated unless the preferred route
unreachable. As specified here, the less preferred route would be became unreachable. As specified here, the less preferred route
reevaluated and potentially used or potentially added to a reuse list would be reevaluated and potentially used or potentially added to a
when processing the withdrawal of a more preferred best route. reuse list when processing the withdrawal of a more preferred best
route.
4.8.4 Processing Route Changes 4.8.4 Processing Route Changes
If a route is replaced by a peer router by supplying a new path, the If a route is replaced by a peer router by supplying a new path, the
route that is being replaced should be treated as if an unreachable route that is being replaced should be treated as if an unreachable
were received (see Section 4.8.2). This will occur when a peer were received (see Section 4.8.2). This will occur when a peer
somewhere back in the AS path is continuously switching between two AS somewhere back in the AS path is continuously switching between two
paths and that peer is not damping route flap (or applying less AS paths and that peer is not damping route flap (or applying less
damping). There is no way to determine if one AS path is stable and damping). There is no way to determine if one AS path is stable and
the other is flapping, or if they are both flapping. If the cycle is the other is flapping, or if they are both flapping. If the cycle is
sufficiently short compared to convergence times neither route through sufficiently short compared to convergence times neither route
that peer will deliver packets very reliably. Since there is no way through that peer will deliver packets very reliably. Since there is
to affect the peer such that it chooses the stable of the two AS no way to affect the peer such that it chooses the stable of the two
paths, the only viable option is to penalize both routes by considering AS paths, the only viable option is to penalize both routes by
each change as an unreachable followed by a route advertisement. considering each change as an unreachable followed by a route
advertisement.
4.8.5 Processing A Peer Router Loss 4.8.5 Processing A Peer Router Loss
When a peer routing session is broken, either all individual routes When a peer routing session is broken, either all individual routes
advertised by that peer may be marked as unstable, or the peering advertised by that peer may be marked as unstable, or the peering
session itself may be marked as unstable. Marking the peer will save session itself may be marked as unstable. Marking the peer will save
considerable memory. Since the individual routes are advertised as considerable memory. Since the individual routes are advertised as
unreachable to routers beyond the immediate problem, per route state unreachable to routers beyond the immediate problem, per route state
will be incurred beyond the peer immediately adjacent to the BGP will be incurred beyond the peer immediately adjacent to the BGP
session that went down. If the instability continues, the immediately session that went down. If the instability continues, the
adjacent router need only keep track of the peer stability history. immediately adjacent router need only keep track of the peer
The routers beyond that point will receive no further advertisements stability history. The routers beyond that point will receive no
or withdrawal of routes and will dispose of the damping structure over further advertisements or withdrawal of routes and will dispose of
time. the damping structure over time.
BGP notification through an optional transitive attribute that damping BGP notification through an optional transitive attribute that
will already be applied may be considered in the future to reduce the damping will already be applied may be considered in the future to
number of routers that incur damping structure storage overhead. reduce the number of routers that incur damping structure storage
overhead.
4.8.6 Inserting into the Reuse Timer List 4.8.6 Inserting into the Reuse Timer List
The reuse lists are used to provide a means of fast evaluation of The reuse lists are used to provide a means of fast evaluation of
routes that had been suppressed, but had been stable long enough to be routes that had been suppressed, but had been stable long enough to be
reused again. The data structure consists of a series of list heads. reused again. The data structure consists of a series of list heads.
Each list contains a set of routes that are scheduled for reevaluation Each list contains a set of routes that are scheduled for
at approximately the same time. The set of reuse list heads are reevaluation at approximately the same time. The set of reuse list
treated as a circular array. heads are treated as a circular array. Refer to Figure 4.
A simple implementation of the circular array of list heads would be A simple implementation of the circular array of list heads would be
an array containing the list heads with an offset. The offset would an array containing the list heads. An offset is used when accessing
identify the first list. The Nth list would be at the index the array. The offset would identify the first list. The Nth list
corresponding to N plus the offset modulo the number of list heads. would be at the index corresponding to N plus the offset modulo the
This design will be assumed in the examples that follow. number of list heads. This design will be assumed in the examples
that follow.
A key requirement is to be able to insert an entry in the most A key requirement is to be able to insert an entry in the most
appropriate queue with a minimum of computation. The computation is appropriate queue with a minimum of computation. The computation is
given only the current value of figure-of-merit. The array, scale, given only the current value of figure-of-merit. Instead of a
and bounds are precomputed to map figure-of-merit to the nearest list computation which would involve a logarithm, the reuse array (reuse-
head without requiring a logarithm to be computed (see Section 4.5). array[]) described in Section 4.6 is used. The array, scale, and
bounds are precomputed to map figure-of-merit to the nearest list
head without requiring a logarithm to be computed (see Section 4.5).
1. scale figure-of-merit for the index array lookup producing index +-+ +-+ +-+ non-empty linked list means
| | | | | | <-- that there are routes with
+-+ +-+ +-+ deferred action to be taken
^ ^ ^ N * delta-reuse seconds later.
| | |
+------+------+------+------+------+ +------+
| list | list | list | list | list | ... | list |
| head | head | head | head | head | ... | head |
+------+------+------+------+------+ +------+
^ ^ ^ ^ ^ ^
Nth 1st 2nd 3rd 4th N-1
|
offset to first list
(the offset is incremented every delta-reuse seconds)
2. check index against the array bound Figure 4: Reuse List Data Structures
3. if (within the array bound) { Note that in the following sections the operator prefix notation
"modulo a b" means "b % a" in C language algebraic operator notation.
For example, "modulo 16 1023" would be 15.
set index = reuse-array [index] 1. scale figure-of-merit for the index array lookup producing
index
} else { 2. check index against the array bound
set index = reuse-list-size - 1 3. if (within the array bound) {
} set index = reuse-array [index]
4. insert into the list } else {
reuse-list [modulo reuse-list-size (index + offset)] set index = reuse-list-size - 1
Choosing the correct reuse list involves only a multiply and shift to }
do the scaling, an integer truncation, then an array lookup. The most
common method of implementing a circular array is to use an array and
apply an offset and modulo operation to pick the correct array entry.
The offset is incremented to rotate the the circular array.
4.8.7 Handling Reuse Timer Events 4. insert into the list
The granularity of the reuse timer should be more course that that of reuse-list [modulo reuse-list-size (index + offset)]
the decay timer. As a result, when the reuse timer fires, suppressed
routes should be decayed by multiple increments of decay time. Some
computation can be avoided by always inserting into the reuse list
corresponding to one time increment past reuse eligibility. In cases
where the reuse lists have a longer ``memory'' than the ``decay
memory'' (described above), all of the routes in the first queue will
be available for immediate reuse if reachable or the history entry
could be disposed of if unreachable.
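The reuse list insertion of Section 4.8.6 (scale, bounds check, table lookup, then offset and modulo) can be sketched in Python as follows. This is a hedged illustration only. The way the reuse array is filled in build_reuse_array, and the scaling of figure-of-merit by (figure-of-merit / reuse - 1) * scale_factor, are assumptions made for this example and are not taken from the text above. All class, function and parameter names are hypothetical.

```python
import math
from typing import List


def build_reuse_array(entries: int, scale_factor: float, half_life: float,
                      delta_reuse: float, list_size: int) -> List[int]:
    # Assumed precomputation: entry j covers the ratio 1 + j/scale_factor
    # of figure-of-merit to the reuse threshold, and holds the number of
    # reuse-timer ticks needed to decay back to the threshold.
    table = []
    for j in range(entries):
        ratio = 1.0 + j / scale_factor
        ticks = math.ceil(half_life * math.log2(ratio) / delta_reuse)
        table.append(min(ticks, list_size - 1))
    return table


class ReuseLists:
    """Circular array of list heads, addressed through an offset."""

    def __init__(self, size: int, reuse_array: List[int],
                 scale_factor: float, reuse: float) -> None:
        self.lists: List[list] = [[] for _ in range(size)]
        self.offset = 0       # index of the first list, advanced by the timer
        self.reuse_array = reuse_array
        self.scale_factor = scale_factor
        self.reuse = reuse

    def insert(self, route, figure_of_merit: float) -> int:
        # 1. scale figure-of-merit (a multiply and an integer truncation)
        index = max(0, int((figure_of_merit / self.reuse - 1.0)
                           * self.scale_factor))
        # 2. and 3. bounds check, then table lookup instead of a logarithm
        if index < len(self.reuse_array):
            index = self.reuse_array[index]
        else:
            index = len(self.lists) - 1
        # 4. insert into the list selected by offset and modulo
        slot = (index + self.offset) % len(self.lists)
        self.lists[slot].append(route)
        return slot
```

The logarithm is evaluated only while the table is built; at insertion time the cost is a multiply, a truncation, one array lookup and one modulo.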
When it is time to advance the lists, the first queue on the reuse Choosing the correct reuse list involves only a multiply and shift to
list must be processed and the circular queue must be rotated. Using do the scaling, an integer truncation, then an array lookup in the
an array and an offset as a circular array (as described in reuse array (reuse-array[]). The value retrieved from the reuse
Section 4.8.6), the algorithm below is repeated every t-reuse seconds. array is used to select a reuse list. The reuse list is a circular
list. The most common method of implementing a circular list is to
use an array and apply an offset and modulo operation to pick the
correct array entry. The offset is incremented to rotate the
circular list.
1. save a pointer to the current zeroth queue head and zero the list 4.8.7 Handling Reuse Timer Events
head entry
2. set offset = modulo reuse-list-size ( offset + 1 ), thereby The granularity of the reuse timer should be more coarse than that of
rotating the circular queue of list-heads the decay timer. As a result, when the reuse timer fires, suppressed
routes should be decayed by multiple increments of decay time. Some
computation can be avoided by always inserting into the reuse list
corresponding to one time increment past reuse eligibility. In cases
where the reuse lists have a longer "memory" than the "decay memory"
(described above), all of the routes in the first queue will be
available for immediate reuse if reachable or the history entry could
be disposed of if unreachable.
3. if (the saved list head pointer is non-empty) When it is time to advance the lists, the first queue on the reuse
list must be processed and the circular queue must be rotated. Using
an array and an offset as a circular array (as described in Section
4.8.6), the algorithm below is repeated every delta-reuse seconds.
foreach entry { 1. save a pointer to the current zeroth queue head and zero the
list head entry
set t-diff = t-now - t-updated 2. set offset = modulo reuse-list-size ( offset + 1 ), thereby
rotating the circular queue of list-heads
set figure-of-merit = figure-of-merit * decay-array-ok [t-diff] 3. if ( the saved list head pointer is non-empty )
set t-updated = t-now
if (figure-of-merit < reuse) for each entry {
reuse the route set t-diff = t-now - t-updated
else set figure-of-merit = figure-of-merit * decay-array-ok [t-diff]
re-insert into another list (see Section 4.8.6) set t-updated = t-now
} if (figure-of-merit < reuse)
The value of the zeroth list head would be saved and the array entry reuse the route
itself zeroed. The list heads would then be advanced by incrementing
the offset. Starting with the saved head of the old zeroth list, each
route would be reevaluated and used, disposed of entirely or requeued
if it were not ready for reuse. If a route is used, it must be
treated as if it were a new route advertisement as described in
Section 4.8.3.
5 Implementation Experience else
The first implementations of ``route flap damping'' were the route re-insert into another list (see Section 4.8.6)
server daemon (rsd) coding by Ramesh Govindan (ISI) and the Cisco IOS
implementation by Ravi Chandra. Both implementations first became
available in 1995 and have been used extensively. The rsd
implementation has been in use in route servers at the NSF funded
Network Access Points (NAPs) and at other major Internet
interconnects. The Cisco IOS version has been in use by Internet
Service Providers worldwide. The rsd implementation has been
integrated in releases of gated (see http://www.gated.org) and is
available in commercial routers using gated.
There are now more than 2 years of BGP route damping deployment }
experience. Some problems have occurred in deployment. So far these
are solvable by careful implementation of the algorithm and by careful
deployment. In some topologies coordinated deployment can be helpful
and in all cases disclosure of the use of route damping and the param-
eters used is highly beneficial in debugging connectivity problems.
Some of the problems have occurred due to subtle implementation The value of the zeroth list head would be saved and the array entry
errors. Route damping should never be applied on IBGP learned routes. itself zeroed. The list heads would then be advanced by incrementing
To do so can open the possibility for persistent route loops. the offset. Starting with the saved head of the old zeroth list,
Implementations should disallow this configuration. Penalties for each route would be reevaluated and used, disposed of entirely or
flapping should only be applied when a route is removed or replaced requeued if it were not ready for reuse. If a route is used, it must
and not when a route is added. If damping parameters are applied be treated as if it were a new route advertisement as described in
consistently, this implementation constraint will result in a stable Section 4.8.3.
secondary path being preferred over an unstable primary path due to
damping of the primary path near the source.
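The reuse timer handling of Section 4.8.7 (save and clear the zeroth list head, advance the offset, then reevaluate each saved entry) can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation. It assumes floating point values, times in decay-array ticks, and a caller-supplied reinsert callback standing in for the insertion procedure of Section 4.8.6. The guard for a t-diff beyond the end of the decay array is an addition made for this example. The names Entry and reuse_timer_tick are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Entry:
    """Hypothetical suppressed-route record held on a reuse list."""
    name: str
    figure_of_merit: float
    t_updated: int            # last update time, in decay-array ticks


def reuse_timer_tick(lists: List[List[Entry]], offset: int, t_now: int,
                     decay_array_ok: List[float], reuse: float,
                     reinsert: Callable[[Entry, int], None]
                     ) -> Tuple[int, List[Entry]]:
    """Process the zeroth list; return (new offset, routes ready for reuse)."""
    # 1. save the current zeroth list and clear its list head
    saved, lists[offset] = lists[offset], []
    # 2. rotate the circular array of list heads
    offset = (offset + 1) % len(lists)
    # 3. reevaluate every saved entry
    reusable = []
    for entry in saved:
        t_diff = t_now - entry.t_updated
        if t_diff >= len(decay_array_ok):
            entry.figure_of_merit = 0.0   # assumed: history fully decayed
        else:
            entry.figure_of_merit *= decay_array_ok[t_diff]
        entry.t_updated = t_now
        if entry.figure_of_merit < reuse:
            # Caller treats this as a new advertisement (Section 4.8.3).
            reusable.append(entry)
        else:
            reinsert(entry, offset)       # requeue on a later list
    return offset, reusable
```

Clearing the list head before walking the saved entries means a requeued route can never land back on the list that is being processed.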
In topologies where multiple AS paths to a given destination exist 5 Implementation Experience
flapping of the primary path can result in suppression of the
secondary path. This can occur if no damping is being done near the
cause of the route flap or if damping is being applied more
aggressively by a distant AS. This problem can be solved in one of two
ways. Damping can be done near the source of the route flap and the
damping parameters can be made consistent. Alternately, a distant AS
which insists on more aggressive damping parameters can disable
penalizing routes on AS path change, penalizing routes only if they
are withdrawn completely. In order to do so, the implementation must
support this option (as described in Section 4.4.3).
Route flap should be damped near the source. Single homed The first implementations of "route flap damping" were the route
destinations can be covered by static routes. Aggregation provides server daemon (rsd) coding by Ramesh Govindan (ISI) and the Cisco IOS
another means of damping. Providers should damp their own internal implementation by Ravi Chandra. Both implementations first became
problems, however damping on IGP link state origination is not yet available in 1995 and have been used extensively. The rsd
implemented by router vendors. Providers which use multiple AS within implementation has been in use in route servers at the NSF funded
their own topology should damp between their own AS. Providers should Network Access Points (NAPs) and at other major Internet
damp adjacent providers AS. interconnects. The Cisco IOS version has been in use by Internet
Service Providers worldwide. The rsd implementation has been
integrated in releases of gated (see http://www.gated.org) and is
available in commercial routers using gated.
Damping provides a means to limit propagation of excessive route change There are now more than 2 years of BGP route damping deployment
when connectivity is highly intermittent. Once a problem is experience. Some problems have occurred in deployment. So far these
corrected, select damping state can be manually cleared. In order to are solvable by careful implementation of the algorithm and by
determine where damping may have occurred after connectivity problems, careful deployment. In some topologies coordinated deployment can be
providers should publish their damping parameters. Providers should helpful and in all cases disclosure of the use of route damping and
be willing to manually clear damping on specific prefixes or AS paths the parameters used is highly beneficial in debugging connectivity
at the request of other providers when the request is accompanied by problems.
assurance that the problem has truly been addressed.
By damping their own routing information, providers can reduce their Some of the problems have occurred due to subtle implementation
own need to make requests of other providers to clear damping state errors. Route damping should never be applied on IBGP learned
after correcting a problem. Providers should be pro-active and routes. To do so can open the possibility for persistent route
monitor what prefixes and paths are suppressed in addition to loops. When IBGP routes within an AS are inconsistent, route loops
monitoring link states and BGP session state. can easily form. Suppressing IBGP learned routes causes such
inconsistencies. Implementations should disallow configuration of
route damping on IBGP peers.
Penalties for instability should only be applied when a route is
removed or replaced and not when a route is added. If damping
parameters are applied consistently, this implementation constraint
will result in a stable secondary path being preferred over an
unstable primary path due to damping of the primary path near the
source.
In topologies where multiple AS paths to a given destination exist
flapping of the primary path can result in suppression of the
secondary path. This can occur if no damping is being done near the
cause of the route flap or if damping is being applied more
aggressively by a distant AS. This problem can be solved in one of
two ways. Damping can be done near the source of the route flap and
the damping parameters can be made consistent. Alternately, a
distant AS which insists on more aggressive damping parameters can
disable penalizing routes on AS path change, penalizing routes only
if they are withdrawn completely. In order to do so, the
implementation must support this option (as described in Section
4.4.3).
Route flap should be damped near the source. Single homed
destinations can be covered by static routes. Aggregation provides
another means of damping. Providers should damp their own internal
problems, however damping on IGP link state origination is not yet
implemented by router vendors. Providers which use multiple AS
within their own topology should damp between their own AS. Providers
should damp adjacent providers AS.
Damping provides a means to limit propagation of excessive route change
when connectivity is highly intermittent. Once a problem is
corrected, damping state corresponding to the prefixes known to be
damped due to the problem just fixed can be manually cleared. In
order to determine where damping may have occurred after connectivity
problems, providers should publish their damping parameters.
Providers should be willing to manually clear damping on specific
prefixes or AS paths at the request of other providers when the
request is accompanied by credible assurance that the problem has
truly been addressed.
By damping their own routing information, providers can reduce their
own need to make requests of other providers to clear damping state
after correcting a problem. Providers should be pro-active and
monitor what prefixes and paths are suppressed in addition to
monitoring link states and BGP session state.
Acknowledgements Acknowledgements
This work and this document may not have been completed without the This work and this document may not have been completed without the
advice, comments and encouragement of Yakov Rekhter (Cisco). Dennis advice, comments and encouragement of Yakov Rekhter (Cisco). Dennis
Ferguson (MCI) provided a description of the algorithms in the gated Ferguson (MCI) provided a description of the algorithms in the gated
BGP implementation and many valuable comments and insights. David BGP implementation and many valuable comments and insights. David
Bolen (ANS) and Jordan Becker (ANS) provided valuable comments, Bolen (ANS) and Jordan Becker (ANS) provided valuable comments,
particularly regarding early simulations. Over four years elapsed particularly regarding early simulations. Over four years elapsed
between the initial draft presented to the BGP WG (October 1993) and between the initial draft presented to the BGP WG (October 1993) and
this iteration. At the time of this writing there is significant this iteration. At the time of this writing there is significant
experience with two implementations, each having been deployed since experience with two implementations, each having been deployed since
1995. One was led by Ramesh Govindan (ISI) for the NSF Routing Ar- 1995. One was led by Ramesh Govindan (ISI) for the NSF Routing
biter project. The second was led by Ravi Chandra (Cisco). Sean Doran Arbiter project. The second was led by Ravi Chandra (Cisco). Sean
(Sprintlink) and Serpil Bayraktar (ANS) were among the early independent Doran (Sprintlink) and Serpil Bayraktar (ANS) were among the early
testers of the Cisco pre-beta implementation. Valuable comments and im- independent testers of the Cisco pre-beta implementation. Valuable
plementation feedback were shared by many individuals on the IETF IDR WG comments and implementation feedback were shared by many individuals
and the RIPE Routing Work Group and in NANOG and IEPG. on the IETF IDR WG and the RIPE Routing Work Group and in NANOG and
IEPG.
Thanks also to Rob Coltun (Fore Systems), Sanjay Wadhwa (Fore), John Thanks also to Rob Coltun (Fore Systems), Sanjay Wadhwa (Fore), John
Scudder (IENG), Eric Bennet (IENG) and Jayesh Bhatt (Bay Networks) for Scudder (IENG), Eric Bennet (IENG) and Jayesh Bhatt (Bay Networks)
pointing out errors in the math uncovered during coding of more recent for pointing out errors in the math uncovered during coding of more
implementations. These errors appeared in the details of the recent implementations. These errors appeared in the details of the
implementation suggestion sections written after the first two implementation suggestion sections written after the first two
implementations were completed. implementations were completed. Thanks also to Vern Paxson for a
very thorough review resulting in numerous clarifications to the
document.
References References
[1] P. Gross and Y. Rekhter. Application of the border gateway proto- [1] Gross, P., and Y. Rekhter, "Application of the border gateway
col in protocol in the internet", RFC 1268, October 1991.
the internet. Request for Comments (Draft Standard) RFC 1268, In-
ternet Engineering Task Force, October 1991. (Obsoletes RFC1164);
(Obsoleted by RFC1655). ftp://ds.internic.net/rfc/rfc1268.txt.
[2] ISO/IEC. Iso/iec 10747 - information technology - telecommunica- [2] ISO/IEC. Iso/iec 10747 - information technology - telecommuni-
tions and information exchange between systems - protocol for cations and information exchange between systems - protocol for
exchange of inter-domain routeing information among intermediate exchange of inter-domain routeing information among intermediate
systems to support forwarding of iso systems to support forwarding of iso 8473 pdus. Technical
8473 pdus. Technical report, International Organization for Stan- report, International Organization for Standardization, August
dardization, August 1994. ftp://merit.edu/pub/iso/idrp.ps.gz. 1994. ftp://merit.edu/pub/iso/idrp.ps.gz.
[3] K. Lougheed and Y. Rekhter. A border gateway protocol 3 (BGP-3). [3] Lougheed, K., and Y. Rekhter, "A border gateway protocol 3 (BGP-
Request for Comments (Draft Standard) RFC 1267, In- 3)", RFC 1267, October 1991.
ternet Engineering Task Force, October 1991. (Obsoletes RFC1163).
ftp://ds.internic.net/rfc/rfc1267.txt.
[4] Y. Rekhter and P. Gross. Application of the border gateway proto-
col in the internet. Request for Comments (Draft Standard)
RFC 1772, Internet Engineering Task Force, March 1995. (Obsoletes
RFC1655). ftp://ds.internic.net/rfc/rfc1772.txt.
[5] Y. Rekhter and T. Li. A border [4] Rekhter, Y., and P. Gross, "Application of the border gateway
gateway protocol 4 (BGP-4). Request for Comments (Draft Standard) protocol in the internet", RFC 1772, March 1995.
RFC 1771, Internet Engineering Task Force, March 1995. (Obsoletes
RFC1654). ftp://ds.internic.net/rfc/rfc1771.txt.
[6] Y. Rekhter and C. Topolcic. Exchanging routing information across [5] Rekhter, Y., and T. Li, "A border gateway protocol 4 (BGP-4)",
provider boundaries in the CIDR environment. Request for Comments RFC 1771, March 1995.
(Informational) RFC 1520, Internet Engineering Task Force,
September 1993. ftp://ds.internic.net/rfc/rfc1520.txt.
[7] P. Traina. BGP-4 protocol analysis. Request for Comments (Infor- [6] Rekhter, Y., and C. Topolcic,"Exchanging routing information
mational) RFC 1774, Internet Engineering Task Force, March 1995. across provider boundaries in the CIDR environment", RFC 1520,
ftp://ds.internic.net/rfc/rfc1774.txt. September 1993.
[8] P. Traina. Experience with the BGP-4 protocol. Request for Com- [7] Traina, P., "BGP-4 protocol analysis", RFC 1774, March 1995.
ments (Informational) RFC 1773,
Internet Engineering Task Force, March 1995. (Obsoletes RFC1656). [8] Traina, P., "Experience with the BGP-4 protocol", RFC 1773, March
ftp://ds.internic.net/rfc/rfc1773.txt. 1995.
Security Considerations Security Considerations
The practices outlined in this document do not further weaken the The practices outlined in this document do not further weaken the
security of the routing protocols. Denial of service is possible in security of the routing protocols. Denial of service is possible in
an already insecure routing environment but these practices only an already insecure routing environment but these practices only
contribute to the persistence of such attacks and do not impact the contribute to the persistence of such attacks and do not impact the
methods of prevention and the methods of determining the source. methods of prevention and the methods of determining the source.
Author's Addresses Authors' Addresses
Curtis Villamizar Curtis Villamizar
ANS Communications ANS
<curtis@ans.net>
Ravi Chandra EMail: curtis@ans.net
Cisco Systems
<rchandra@cisco.com>
Ramesh Govindan Ravi Chandra
ISI Cisco Systems
<govindan@isi.edu>
EMail: rchandra@cisco.com
Ramesh Govindan
ISI
EMail: govindan@isi.edu
Full Copyright Statement
Copyright (C) The Internet Society (1998). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
 End of changes. 320 change blocks. 
1107 lines changed or deleted 1242 lines changed or added
