draft-ietf-idr-route-damp-02.txt   draft-ietf-idr-route-damp-03.txt 
Internet Engineering Task Force Curtis Villamizar Internet Engineering Task Force Curtis Villamizar
INTERNET-DRAFT ANS INTERNET-DRAFT ANS
draft-ietf-idr-route-damp-02 Ravi Chandra draft-ietf-idr-route-damp-03 Ravi Chandra
Cisco Cisco
Ramesh Govindan Ramesh Govindan
ISI ISI
February 15, 1998 May 15, 1998
BGP Route Flap Damping BGP Route Flap Damping
Status of this Memo Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts. working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet- Drafts as reference time. It is inappropriate to use Internet- Drafts as reference
material or to cite them other than as ``work in progress.'' material or to cite them other than as ``work in progress.''
To view the entire list of current Internet-Drafts, please check the To view the entire list of current Internet-Drafts, please check
``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow the "1id-abstracts.txt" listing contained in the Internet-Drafts
Directories on ftp.is.co.za (Africa), ftp.nordu.net (Europe), Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or (Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au
ftp.isi.edu (US West Coast). (Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu
(US West Coast).
Abstract Abstract
A usage of the BGP routing protocol is described which is capable of A usage of the BGP routing protocol is described which is capable of
reducing the routing traffic passed on to routing peers and therefore reducing the routing traffic passed on to routing peers and therefore
the load on these peers without adversely affecting route convergence the load on these peers without adversely affecting route convergence
time for relatively stable routes. This technique has been time for relatively stable routes. This technique has been
implemented in commercial products supporting BGP. The technique is implemented in commercial products supporting BGP. The technique is
also applicable to IDRP. also applicable to IDRP.
skipping to change at page 10, line 10 skipping to change at page 10, line 10
number of reuse lists (reuse-list-size) number of reuse lists (reuse-list-size)
This is the number of reuse lists. It may be determined from This is the number of reuse lists. It may be determined from
reuse-list-max or set explicitly. reuse-list-max or set explicitly.
A necessary optimization is described in Section 4.8.6 that involves A necessary optimization is described in Section 4.8.6 that involves
an array referred to as the ``reuse index array''. A reuse index an array referred to as the ``reuse index array''. A reuse index
array is needed for each decay rate in use. The reuse index array is array is needed for each decay rate in use. The reuse index array is
used to estimate which reuse list to place a route when it is used to estimate which reuse list to place a route when it is
suppressed. Proper placement avoids the need to periodically evaluate suppressed. Proper placement avoids the need to periodically evaluate
decay to determine if a route can be reused. Using the reuse index decay to determine if a route can be reused or when storage can be
array avoids the need to compute a logarithm to determine placement. recovered. Using the reuse index array avoids the need to compute a
One additional system wide parameter can be introduced. logarithm to determine placement. One additional system wide
parameter can be introduced.
reuse index array size (reuse-index-array-size) reuse index array size (reuse-index-array-size)
This is the size of reuse index arrays. This size determines the This is the size of reuse index arrays. This size determines the
accuracy with which suppressed routes can be placed within the set accuracy with which suppressed routes can be placed within the set
of reuse lists when suppressed for a long time. of reuse lists when suppressed for a long time.
4.3 Guidelines for Setting Parameters 4.3 Guidelines for Setting Parameters
The decay half life should be set to a time considerably longer than The decay half life should be set to a time considerably longer than
skipping to change at page 10, line 46 skipping to change at page 10, line 47
constant rate. The time axis is labeled in multiples of the decay constant rate. The time axis is labeled in multiples of the decay
half life. The plots represent route flap with a period of 1/2, 1/3, half life. The plots represent route flap with a period of 1/2, 1/3,
1/4, and 1/8 times the decay half life. A ceiling of 4.5 was set, 1/4, and 1/8 times the decay half life. A ceiling of 4.5 was set,
which can be seen to affect three of the plots, effectively limiting which can be seen to affect three of the plots, effectively limiting
the time it takes to readvertise the route regardless of the prior the time it takes to readvertise the route regardless of the prior
history. With the cutoff and reuse thresholds suggested by the dotted history. With the cutoff and reuse thresholds suggested by the dotted
lines, routes would be suppressed after being declared unreachable 2-3 lines, routes would be suppressed after being declared unreachable 2-3
times and be used again after approximately 2 decay half life periods times and be used again after approximately 2 decay half life periods
of stability. of stability.
From either maximum hold time value (Tmax-ok or Tmax-ng), a ratio of From the maximum hold time value (T-hold), a ratio of the reuse value
the cutoff to a ceiling can be determined. An integer value for the to a ceiling can be determined. An integer value for the ceiling can
ceiling can then be chosen such that overflow will not be a problem then be chosen such that overflow will not be a problem and all other
and all other values can be scaled accordingly. If both cutoffs are values can be scaled accordingly. If both cutoffs are specified or if
specified or if multiple parameter sets are used the highest ceiling multiple parameter sets are used the highest ceiling will be used.
will be used.
time figure-of-merit as a function of time time figure-of-merit as a function of time
0.00 0.000 . 0.000 . 0.000 . 0.000 . 0.00 0.000 . 0.000 . 0.000 . 0.000 .
0.08 0.000 . 0.000 . 0.000 . 0.000 . 0.08 0.000 . 0.000 . 0.000 . 0.000 .
0.16 0.000 . 0.000 . 0.000 . 0.973 . 0.16 0.000 . 0.000 . 0.000 . 0.973 .
0.24 0.000 . 0.000 . 0.000 . 0.920 . 0.24 0.000 . 0.000 . 0.000 . 0.920 .
0.32 0.000 . 0.000 . 0.946 . 1.817 . 0.32 0.000 . 0.000 . 0.946 . 1.817 .
0.40 0.000 . 0.953 . 0.895 . 2.698 . 0.40 0.000 . 0.953 . 0.895 . 2.698 .
0.48 0.000 . 0.901 . 0.847 . 2.552 . 0.48 0.000 . 0.901 . 0.847 . 2.552 .
skipping to change at page 14, line 5 skipping to change at page 14, line 5
4.4 Run Time Data Structures 4.4 Run Time Data Structures
A fixed small amount of per system storage will be required. Where A fixed small amount of per system storage will be required. Where
sets of multiple configuration parameters are used, storage will be sets of multiple configuration parameters are used, storage will be
required per set of parameters. A small amount of per route storage required per set of parameters. A small amount of per route storage
is required. A set of list heads is needed. These list heads are is required. A set of list heads is needed. These list heads are
used to arrange suppressed routes according to the time remaining used to arrange suppressed routes according to the time remaining
until they can be reused. until they can be reused.
A separate reuse list can be used to hold unreachable routes for the
purpose of later recovering storage if they remain unreachable too
long. This might be more accurately described as a recycling list.
The advantage this would provide is making free data structures
available as soon as possible. Alternately, the data structures can
simply be placed on a queue and the storage recovered when the route
hits the front of the queue and if storage is needed. The latter is
less optimal but simple.
If multiple sets of configuration parameters are allowed per route, If multiple sets of configuration parameters are allowed per route,
there is a need for some means of associating more than one figure of there is a need for some means of associating more than one figure of
merit and set of parameters with each route. Building a linked list merit and set of parameters with each route. Building a linked list
of these objects seems like one of a number of reasonable of these objects seems like one of a number of reasonable
implementations. Similarly, a means of associating a route to a reuse implementations. Similarly, a means of associating a route to a reuse
list is required. A small overhead will be required for the pointers list is required. A small overhead will be required for the pointers
needed to implement whatever data structure is chosen for the reuse needed to implement whatever data structure is chosen for the reuse
lists. The suggested implementation uses a double linked lists and so lists. The suggested implementation uses a double linked lists and so
requires two pointers per figure of merit. requires two pointers per figure of merit.
skipping to change at page 15, line 4 skipping to change at page 15, line 13
route is unreachable. route is unreachable.
4.4.2 Data Structures per Decay Array and Reuse Index Array 4.4.2 Data Structures per Decay Array and Reuse Index Array
The following are also computed from the configuration parameters The following are also computed from the configuration parameters
though not as directly. though not as directly.
o decay rate per tick (decay-delta-t) o decay rate per tick (decay-delta-t)
o decay array size (decay-array-size) o decay array size (decay-array-size)
o decay array (decay)
o decay array (decay[])
o reuse index array size (reuse-index-array-size) o reuse index array size (reuse-index-array-size)
o reuse index array (reuse-index-array) o reuse index array (reuse-index-array[])
For each decay rate specified, an array will be used to store the For each decay rate specified, an array will be used to store the
value of a computed parameter raised to the power of the index of each value of a computed parameter raised to the power of the index of each
array element. This is to speed computations. The decay rate per array element. This is to speed computations. The decay rate per
tick is an intermediate value expressed as a real number and used to tick is an intermediate value expressed as a real number and used to
compute the values stored in the decay arrays. The array size is compute the values stored in the decay arrays. The array size is
computed from the decay memory limit configuration parameter expressed computed from the decay memory limit configuration parameter expressed
as an array size or as a maximum hold time. as an array size or as a maximum hold time.
The decay array size must be of sufficient size to accommodate the The decay array size must be of sufficient size to accommodate the
skipping to change at page 17, line 29 skipping to change at page 17, line 37
double linked list which is unused when a route is suppressed from double linked list which is unused when a route is suppressed from
use that can be used for reuse list traversal eliminating the need use that can be used for reuse list traversal eliminating the need
for additional pointer storage. for additional pointer storage.
4.5 Processing Configuration Parameters 4.5 Processing Configuration Parameters
From the configuration parameters, it is possible to precompute a From the configuration parameters, it is possible to precompute a
number of values that will be used repeatedly and retain these to number of values that will be used repeatedly and retain these to
speed later computations that will be required frequently. speed later computations that will be required frequently.
Scaling is usually dependent on the highest value that figure-of-merit
can attain, referred to here as the ceiling. The real number value of
the ceiling will typically be determined by the following equation.
ceiling = reuse * (exp(T-hold/decay-half-life) * log(2))
The methods of scaled integer arithmetic are not described in detail The methods of scaled integer arithmetic are not described in detail
here. The methods of determining the real values are given. here. The methods of determining the real values are given.
Translation into scaled integer values and the details of scaled Translation into scaled integer values and the details of scaled
integer arithmetic are left up to the individual implementations. integer arithmetic are left up to the individual implementations.
figure of merit scale factor ( scale-figure-of-merit ) figure of merit scale factor ( scale-figure-of-merit )
The ceiling value can be set to be the largest integer that can fit The ceiling value can be set to be the largest integer that can fit
in half the bits available for an unsigned integer. This will in half the bits available for an unsigned integer. This will
allow the scaled integers to be multiplied by the scaled decay allow the scaled integers to be multiplied by the scaled decay
skipping to change at page 18, line 8 skipping to change at page 18, line 27
values must be scaled according to the above scaling factor. values must be scaled according to the above scaling factor.
decay rate per tick (decay[1]) decay rate per tick (decay[1])
The decay value per increment of time as defined by the time The decay value per increment of time as defined by the time
granularity must be determined (at least initially as a floating granularity must be determined (at least initially as a floating
point number). The per tick decay is a number slightly less than point number). The per tick decay is a number slightly less than
one. It is the Nth root of the one half where N is the half life one. It is the Nth root of the one half where N is the half life
divided by the time granularity. divided by the time granularity.
decay[1] = exp ((1 / (decay-rate/delta-t)) * log (1/2)) decay[1] = exp ((1 / (decay-half-life/delta-t)) * log
(1/2))
decay array size (decay-array-size) decay array size (decay-array-size)
The decay array size is the decay memory divided by the time The decay array size is the decay memory divided by the time
granularity. If integer truncation brings the value of an array granularity. If integer truncation brings the value of an array
element to zero, the array can be made smaller. An implementation element to zero, the array can be made smaller. An implementation
should also impose a maximum reasonable array size or allow more should also impose a maximum reasonable array size or allow more
than one multiplication. than one multiplication.
decay-array-size = (Tmax/delta-t) decay-array-size = (Tmax/delta-t)
skipping to change at page 19, line 10 skipping to change at page 19, line 29
reuse array maximum ratio (max-ratio) reuse array maximum ratio (max-ratio)
This is the maximum ratio between the current value of the This is the maximum ratio between the current value of the
stability figure of merit and the target reuse value that can be stability figure of merit and the target reuse value that can be
indexed by the reuse array. It may be limited by the ceiling indexed by the reuse array. It may be limited by the ceiling
imposed by the maximum hold time or by the amount of time that the imposed by the maximum hold time or by the amount of time that the
reuse lists cover. reuse lists cover.
max-ratio = min(ceiling/reuse, exp((1 / max-ratio = min(ceiling/reuse, exp((1 /
(half-life/reuse-array-time)) * log(1/2))) (half-life/reuse-array-time)) * log(2)))
reuse array scale factor ( scale-factor ) reuse array scale factor ( scale-factor )
Since the reuse array is an estimator, the reuse array scale factor Since the reuse array is an estimator, the reuse array scale factor
has to be computed such that the full size of the reuse array is has to be computed such that the full size of the reuse array is
used. used.
scale-factor = (max-ratio - 1) / reuse-array-size scale-factor = reuse-index-array-size / (max-ratio - 1)
reuse index array (reuse) reuse index array (reuse-index-array[])
Each reuse index array entry should contain an index into the reuse Each reuse index array entry should contain an index into the reuse
list array pointing to one of the list heads. This index should list array pointing to one of the list heads. This index should
corresponding to the reuse list that will be evaluated just after a corresponding to the reuse list that will be evaluated just after a
route would be eligible for reuse given the ratio of current value route would be eligible for reuse given the ratio of current value
of the stability figure of merit to target reuse value of the stability figure of merit to target reuse value
corresponding the the reuse array entry. corresponding the the reuse array entry.
reuse-array[j] = integer(log(1 / (1 + ((j+1) * reuse-index-array[j] = integer((decay-half-life /
(max-ratio-1)))) / reuse-time-granularity) reuse-time-granularity) * log(1/(reuse * (1 + (j /
scale-factor)))) / log(1/2))
To determine which reuse queue to place a route which is being To determine which reuse queue to place a route which is being sup-
suppressed, the following procedure is used. Divide the current pressed, the following procedure is used. Divide the current figure
figure of merit by the cutoff. Subtract one. Multiply by the scale of merit by the cutoff. Subtract one. Multiply by the scale factor.
factor. This is the array index. If it is off the end of the array This is the index into the reuse index array (reuse-index-array[]).
use the last queue otherwise look in the array and pick the number of The value fetched from the reuse index array (reuse-index-array[]) is
the queue from the array at that index. This is quite fast and well an index into the array of reuse lists (reuse-array[]). If this index
worth the setup and storage required. is off the end of the array use the last queue otherwise look in the
array and pick the number of the queue from the array at that index.
This is quite fast and well worth the setup and storage required.
4.7 A Sample Configuration 4.7 A Sample Configuration
A simple example is presented here in which the space overhead is A simple example is presented here in which the space overhead is
estimated for a set of configuration parameters. The design here estimated for a set of configuration parameters. The design here
assumes: assumes:
1. there is a single parameter set used for all routes, 1. there is a single parameter set used for all routes,
2. decay time for unreachable routes is slower than for reachable 2. decay time for unreachable routes is slower than for reachable
skipping to change at page 22, line 5 skipping to change at page 22, line 23
tion are used, 2 minutes and 4 minutes. Two duty cycles are used, one tion are used, 2 minutes and 4 minutes. Two duty cycles are used, one
in which the route is reachable during 20% of the cycle and the other in which the route is reachable during 20% of the cycle and the other
where the route is reachable during 80% of the cycle. In all four where the route is reachable during 80% of the cycle. In all four
cases, the route becomes suppressed after it becomes unreachable the cases, the route becomes suppressed after it becomes unreachable the
second time. Once suppressed, it remains suppressed until some period second time. Once suppressed, it remains suppressed until some period
after becoming stable. The routes which oscillate over a 4 minute pe- after becoming stable. The routes which oscillate over a 4 minute pe-
riod are no longer suppressed within 9-11 minutes after becoming sta- riod are no longer suppressed within 9-11 minutes after becoming sta-
ble. The routes with a 2 minute period of oscillation are suppressed for ble. The routes with a 2 minute period of oscillation are suppressed for
nearly the maximum 15 minute period after becoming stable. nearly the maximum 15 minute period after becoming stable.
4.8 Processing Routing Protocol Activity
The prior sections concentrate on configuration parameters and their
relationship to the parameters and arrays used at run time and provide
the algorithms for initializing run time storage. This section
provides the steps taken in processing routing events and timer events
when running.
The routing events are:
1. A BGP peer or new route comes up for the first time (or after an
extended down time) (Section 4.8.1)
2. A route becomes unreachable (Section 4.8.2)
3. A route becomes reachable again (Section 4.8.3)
4. A route changes (Section 4.8.4)
5. A peer goes down (Section 4.8.5)
The reuse list is used to provide a means of fast evaluation of route
that had been suppressed, but had been stable long enough to be reused
again or had been suppressed long enough that it can be treated as a
new route. The following two operations are described.
time figure-of-merit as a function of time time figure-of-merit as a function of time
0.00 0.000 . 0.000 . 0.000 . 0.000 . 0.00 0.000 . 0.000 . 0.000 . 0.000 .
0.62 0.000 . 0.000 . 0.000 . 0.000 . 0.62 0.000 . 0.000 . 0.000 . 0.000 .
1.25 0.000 . 0.000 . 0.000 . 0.000 . 1.25 0.000 . 0.000 . 0.000 . 0.000 .
1.88 0.000 . 0.000 . 0.000 . 0.000 . 1.88 0.000 . 0.000 . 0.000 . 0.000 .
2.50 0.977 . 0.968 . 0.000 . 0.000 . 2.50 0.977 . 0.968 . 0.000 . 0.000 .
3.12 0.949 . 0.888 . 0.000 . 0.000 . 3.12 0.949 . 0.888 . 0.000 . 0.000 .
3.75 0.910 . 0.814 . 0.000 . 0.000 . 3.75 0.910 . 0.814 . 0.000 . 0.000 .
4.37 1.846 . 1.756 . 0.983 . 0.983 . 4.37 1.846 . 1.756 . 0.983 . 0.983 .
skipping to change at page 23, line 5 skipping to change at page 24, line 5
21.87 1.064 . 0.941 . 0.630 . 0.533 . 21.87 1.064 . 0.941 . 0.630 . 0.533 .
22.50 0.976 . 0.863 . 0.578 . 0.488 . 22.50 0.976 . 0.863 . 0.578 . 0.488 .
23.12 0.895 . 0.791 . 0.530 . 0.448 . 23.12 0.895 . 0.791 . 0.530 . 0.448 .
23.75 0.821 . 0.725 . 0.486 . 0.411 . 23.75 0.821 . 0.725 . 0.486 . 0.411 .
24.37 0.753 . 0.665 . 0.446 . 0.377 . 24.37 0.753 . 0.665 . 0.446 . 0.377 .
25.00 0.690 . 0.610 . 0.409 . 0.345 . 25.00 0.690 . 0.610 . 0.409 . 0.345 .
Figure 3: Some fairly long route flap cycles, repeated for 12 Figure 3: Some fairly long route flap cycles, repeated for 12
minutes, followed by a period of stability. minutes, followed by a period of stability.
4.8 Processing Routing Protocol Activity
The prior sections concentrate on configuration parameters and their
relationship to the parameters and arrays used at run time and provide
the algorithms for initializing run time storage. This section
provides the steps taken in processing routing events and timer events
when running.
The routing events are:
1. A BGP peer or new route comes up for the first time (or after an
extended down time) (Section 4.8.1)
2. A route becomes unreachable (Section 4.8.2)
3. A route becomes reachable again (Section 4.8.3)
4. A route changes (Section 4.8.4)
5. A peer goes down (Section 4.8.5)
The reuse list is used to provide a means of fast evaluation of route
that had been suppressed, but had been stable long enough to be reused
again or had been suppressed long enough that it can be treated as a
new route. The following two operations are described.
1. Inserting into a reuse list (Section 4.8.6) 1. Inserting into a reuse list (Section 4.8.6)
2. Reuse list processing every delta-t seconds (Section 4.8.7) 2. Reuse list processing every delta-t seconds (Section 4.8.7)
4.8.1 Processing a New Peer or New Routes 4.8.1 Processing a New Peer or New Routes
When a peer comes up, no action is required if the routes had no When a peer comes up, no action is required if the routes had no
previous history of instability, for example if this is the first time previous history of instability, for example if this is the first time
the peer is coming up and announcing these routes. For each route, the peer is coming up and announcing these routes. For each route,
the pointer to the damping structure would be zeroed and route used. the pointer to the damping structure would be zeroed and route used.
skipping to change at page 31, line 43 skipping to change at page 32, line 12
between the initial draft presented to the BGP WG (October 1993) and between the initial draft presented to the BGP WG (October 1993) and
this iteration. At the time of this writing there is significant this iteration. At the time of this writing there is significant
experience with two implementations, each having been deployed since experience with two implementations, each having been deployed since
1995. One was led by Ramesh Govindan (ISI) for the NSF Routing Ar- 1995. One was led by Ramesh Govindan (ISI) for the NSF Routing Ar-
biter project. The second was led by Ravi Chandra (Cisco). Sean Doran biter project. The second was led by Ravi Chandra (Cisco). Sean Doran
(Sprintlink) and Serpil Bayraktar (ANS) were among the early independent (Sprintlink) and Serpil Bayraktar (ANS) were among the early independent
testers of the Cisco pre-beta implementation. Valuable comments and im- testers of the Cisco pre-beta implementation. Valuable comments and im-
plementation feedback were shared by many individuals on the IETF IDR WG plementation feedback were shared by many individuals on the IETF IDR WG
and the RIPE Routing Work Group and in NANOG and IEPG. and the RIPE Routing Work Group and in NANOG and IEPG.
Thanks also to Rob Coltun (Fore Systems), Sanjay Wadhwa (Fore), John
Scudder (IENG), Eric Bennet (IENG) and Jayesh Bhatt (Bay Networks) for
pointing out errors in the math uncovered during coding of more recent
implementations. These errors appeared in the details of the
implementation suggestion sections written after the first two
implementations were completed.
References References
[1] P. Gross and Y. Rekhter. Application of the border gateway proto- [1] P. Gross and Y. Rekhter. Application of the border gateway proto-
col in col in
the internet. Request for Comments (Draft Standard) RFC 1268, In- the internet. Request for Comments (Draft Standard) RFC 1268, In-
ternet Engineering Task Force, October 1991. (Obsoletes RFC1164); ternet Engineering Task Force, October 1991. (Obsoletes RFC1164);
(Obsoleted by RFC1655). ftp://ds.internic.net/rfc/rfc1268.txt. (Obsoleted by RFC1655). ftp://ds.internic.net/rfc/rfc1268.txt.
[2] ISO/IEC. Iso/iec 10747 - information technology - telecommunica- [2] ISO/IEC. Iso/iec 10747 - information technology - telecommunica-
tions and information exchange between systems - protocol for tions and information exchange between systems - protocol for
 End of changes. 

This html diff was produced by rfcdiff 1.23, available from http://www.levkowetz.com/ietf/tools/rfcdiff/