draft-ietf-grow-bgp-wedgies-02.txt   draft-ietf-grow-bgp-wedgies-03.txt 
GROW                                                          T. Griffin
Internet-Draft                                   University of Cambridge
Expires: December 12, 2005                                     G. Huston
                                                                   APNIC
                                                           June 10, 2005
                              BGP Wedgies
                   draft-ietf-grow-bgp-wedgies-03.txt
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on December 12, 2005.
Copyright Notice
Copyright (C) The Internet Society (2005).
Abstract
It has commonly been assumed that the Border Gateway Protocol (BGP)
is a tool for distributing reachability information in a manner that
creates forwarding paths deterministically. In this memo we
skipping to change at page 2, line 40
exchange. In the same vein, the local network is often configured to
prefer routes learned from a peer or a customer over those learned
from a directly connected upstream transit provider. These
preferences may be expressed via a local preference configuration
setting, where the local preference overrides the AS path length
metric of the base BGP operation.
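The following Python fragment is a minimal sketch of this ordering. It ranks candidate routes by local preference first and uses AS path length only to break ties. The relationship-to-preference values (customer 100, peer 90, provider 80) and the `Route` type are hypothetical. The AS numbers come from the documentation range, and the remaining steps of the BGP decision process are omitted.

```python
from dataclasses import dataclass

# Hypothetical local-preference values; real networks choose their own.
LOCAL_PREF = {"customer": 100, "peer": 90, "provider": 80}


@dataclass(frozen=True)
class Route:
    as_path: tuple[int, ...]  # AS path as received, nearest AS first
    learned_from: str         # neighbor relationship: customer, peer or provider

    @property
    def local_pref(self) -> int:
        return LOCAL_PREF[self.learned_from]


def best_route(routes: list[Route]) -> Route:
    """Highest local preference wins; AS path length only breaks ties."""
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))


via_transit = Route(as_path=(64500, 64510), learned_from="provider")
via_peer = Route(as_path=(64501, 64502, 64510), learned_from="peer")
print(best_route([via_transit, via_peer]) is via_peer)  # True
```

The peer route is selected even though its AS path is one hop longer than the transit provider's route. This is the override of the AS path length metric described above.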
In terms of engineering reliability in the inter-domain routing
environment, it is common for a service provider to enter into
arrangements with two or more upstream transit providers, passing
routes to all upstream providers and receiving traffic from all of
them. If the path to one upstream provider fails, the traffic will
switch to the other links. Once the path is recovered, the traffic
should switch back.
In such situations of multiple upstream providers it is also
commonplace to place a relative preference on the providers, so that
one connection is regarded as a preferred, or "primary", connection,
and other connections are regarded as less preferred, or "backup",
connections. The intent is typically that the backup connections
will be used for traffic only for the duration of a failure in the
primary connection.
It is possible to express this primary / backup policy using local AS
skipping to change at page 3, line 32
is no other source of the route.
3. BGP Wedgies
The richness of local policy expression through the use of
communities, when coupled with the behavior of a distance vector
protocol like BGP, leads to the observation that certain
configurations have more than one "solution", or more than one stable
BGP state. An example of such a situation is indicated in Figure 1.
      +----+                        +----+
      |AS 3|------------------------|AS 4|
      +----+ peer              peer +----+
        |provider              provider|
        |                              |
        |customer                      |
      +----+                           |
      |AS 2|                           |
      +----+                           |
        |provider                      |
        |                              |
        |customer              customer|
        +---------------+  +-----------+
          backup service|  |primary service
                       +----+
                       |AS 1|
                       +----+

                      Figure 1
In this case AS1 has marked its advertisement of prefixes to AS2 as
"backup only", and its advertisement of prefixes to AS4 as "primary".
AS4 will advertise AS1's prefixes to AS3. AS3 will hear AS4's
advertisement across the peering link, and select AS1's prefixes with
the path "AS4, AS1". AS3 will advertise these prefixes to AS2. AS2
will hear two paths to AS1's prefixes: the first is via the direct
connection to AS1, and the second is via the path "AS3, AS4, AS1".
AS2 will prefer the longer path, as the directly connected routes are
marked "backup only", and AS2's local preference decision will prefer
the AS3 advertisement over the AS1 advertisement.
This is the intended outcome of AS1's policy settings: in the
'normal' state no traffic passes from AS2 to AS1 across the backup
link, and AS2 reaches AS1 via a path that transits AS3 and AS4, using
the primary link to AS1.
This intended outcome is achieved as long as AS1 announces its routes
on the primary path to AS4 before announcing its backup routes to
AS2. If the backup routes are announced first, AS2 advertises them
to AS3, and AS3 then keeps preferring this customer route over the
peer route that later arrives from AS4.
If the AS1 - AS4 path is broken, causing a BGP session failure
between AS1 and AS4, then AS4 will withdraw its advertisement of
AS1's routes to AS3, which, in turn, will send a withdrawal to AS2.
AS2 will then select the backup path to AS1. AS2 will advertise
this path to AS3, and AS3 will advertise this path to AS4. Again,
this is part of the intended operation of the primary / backup policy
setting, and all traffic to AS1 will use the backup path.
When connectivity between AS4 and AS1 is restored, the BGP state will
not revert to the original state. AS4 will learn the primary path to
AS1, and readvertise this to AS3 using the path "AS4, AS1". AS3,
using a default preference of preferring customer-advertised routes
over peer routes, will continue to prefer the "AS2, AS1" path. AS3
will not pass any updates to AS2. After the restoration of the AS4
to AS1 circuit, the traffic from AS3 to AS1 and from AS2 to AS1 will
be presented to AS1 via the backup path, even though the primary
path via AS4 is back in service.
The intended forwarding state can only be restored by AS1
deliberately bringing down its eBGP session with AS2, even though
that session is carrying traffic. This will cause the BGP state to
revert to the intended configuration: AS2 withdraws the backup route
from AS3, AS3 falls back to the path via AS4 and advertises it to
AS2, and when the AS1 - AS2 session is re-established AS2 again
prefers the AS3 advertisement over the "backup only" route.
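The sequence above can be replayed in a toy model. The sketch below is a hypothetical single-prefix simulation of Figure 1, not an implementation of BGP. It makes the following assumptions:

- Customer, peer, and provider local preferences are 100, 90, and 80.
- AS1's "backup only" marking becomes a local preference of 60 at AS2.
- Routes learned from peers and providers are passed on only to customers.
- Updates are processed one at a time in FIFO order.

The class and function names are invented for illustration.

```python
from collections import deque

# Hypothetical policy values (not taken from this memo): customer > peer >
# provider, and a route the origin tags "backup only" ranks below all of them.
LOCAL_PREF = {"customer": 100, "peer": 90, "provider": 80}
BACKUP_PREF = 60
INVERSE = {"customer": "provider", "provider": "customer", "peer": "peer"}


class ToyBGP:
    """Single-prefix BGP model with customer/peer/provider export rules."""

    def __init__(self, origin, backup_neighbors):
        self.origin = origin
        self.backup = set(backup_neighbors)  # providers told "backup only"
        self.rel = {}         # rel[a][b]: what b is to a
        self.rib_in = {}      # rib_in[a][b]: AS path that a learned from b
        self.best = {}        # best[a]: (neighbor, AS path) or None
        self.up = set()       # established sessions, as frozensets
        self.queue = deque()  # pending updates: (receiver, sender, path)

    def link(self, a, b, b_is):
        for x in (a, b):
            self.rel.setdefault(x, {})
            self.rib_in.setdefault(x, {})
            self.best.setdefault(x, (None, ()) if x == self.origin else None)
        self.rel[a][b], self.rel[b][a] = b_is, INVERSE[b_is]

    def pref(self, a, nbr):
        if nbr == self.origin and a in self.backup:
            return BACKUP_PREF
        return LOCAL_PREF[self.rel[a][nbr]]

    def export(self, a, b):
        """AS path that a announces to b, or None (a withdrawal)."""
        if self.best[a] is None:
            return None
        learned_from, path = self.best[a]
        # Customer-learned (or originated) routes go to everyone, others to customers.
        if learned_from is None or "customer" in (self.rel[a][learned_from], self.rel[a][b]):
            return (a,) + path
        return None

    def recompute(self, a):
        if a == self.origin:
            return
        # Highest local preference, then shortest AS path, then lowest AS number.
        ranked = [(self.pref(a, n), -len(p), -n, n, p) for n, p in self.rib_in[a].items()]
        new = max(ranked)[3:] if ranked else None
        if new != self.best[a]:
            self.best[a] = new
            for b in self.rel[a]:
                if frozenset((a, b)) in self.up:
                    self.queue.append((b, a, self.export(a, b)))

    def run(self):
        while self.queue:
            b, a, path = self.queue.popleft()
            if frozenset((a, b)) not in self.up:
                continue  # session went down while the update was queued
            if path is None or b in path:  # withdrawal, or AS-path loop
                self.rib_in[b].pop(a, None)
            else:
                self.rib_in[b][a] = path
            self.recompute(b)

    def session_up(self, a, b):
        self.up.add(frozenset((a, b)))
        self.queue.extend([(b, a, self.export(a, b)), (a, b, self.export(b, a))])
        self.run()

    def session_down(self, a, b):
        self.up.discard(frozenset((a, b)))
        for x, y in ((a, b), (b, a)):
            self.rib_in[x].pop(y, None)
            self.recompute(x)
        self.run()


def figure_1(first, second):
    """Build Figure 1; AS1 brings up session `first`, then `second`."""
    net = ToyBGP(origin=1, backup_neighbors=[2])
    net.link(3, 4, "peer")
    net.link(3, 2, "customer")  # AS2 is a customer of AS3
    net.link(2, 1, "customer")  # AS1 - AS2: backup service
    net.link(4, 1, "customer")  # AS1 - AS4: primary service
    for a, b in [(3, 4), (3, 2), first, second]:
        net.session_up(a, b)
    return net


net = figure_1((1, 4), (1, 2))
print(net.best[3][0], net.best[2][0])  # 4 3 (intended state)
net.session_down(1, 4)
net.session_up(1, 4)
print(net.best[3][0], net.best[2][0])  # 2 1 (wedged)
net.session_down(1, 2)
net.session_up(1, 2)
print(net.best[3][0], net.best[2][0])  # 4 3 (restored)
```

The model behaves as follows:

- When AS1 brings up the primary session first, the model settles in the intended state.
- Failing and restoring the AS1 - AS4 session leaves AS3 routing via AS2, and AS2 using its direct backup link.
- Only resetting the AS1 - AS2 session returns the model to the intended state.
- Calling `figure_1((1, 2), (1, 4))`, which brings up the backup session first, also ends with AS3 routing via AS2. This matches the ordering dependence noted for the initial announcements.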
It is often the case that an AS will attempt to balance incoming
traffic across multiple providers, again using the primary / backup
mechanism. For some prefixes one link is configured as the primary
link, and the others as backup links, while for other prefixes
another link is selected as the primary link. An example is shown in
Figure 2.
      +----+                            +----+
      |AS 3|----------------------------|AS 4|
      +----+ peer                  peer +----+
        |provider                  provider|
        |                                  |
        |customer                  customer|
      +----+                            +----+
      |AS 2|                            |AS 5|
      +----+                            +----+
        |provider                  provider|
        |                                  |
        |customer                  customer|
        +---------------------+  +---------+
                              |  |
         backup (192.0.2.0/25)|  |primary service (192.0.2.0/25)
      primary (192.0.2.128/25)|  |backup service (192.0.2.128/25)
                             +----+
                             |AS 1|
                             +----+

                            Figure 2
The intended configuration has all incoming traffic for addresses in
the range 192.0.2.0/25 arriving via the link from AS5, and all
incoming traffic for addresses in the range 192.0.2.128/25 arriving
via the link from AS2.
In this case, if the link between AS3 and AS4 is reset, AS3 will learn
both routes from AS2, and AS4 will learn both routes from AS5: once
the peer-learned routes are withdrawn, AS2 falls back to its backup
route for 192.0.2.0/25 and AS5 to its backup route for
192.0.2.128/25, and each advertises that route to its provider. As
these customer routes are preferred over peer routes, when the link
between AS3 and AS4 is restored, neither AS3 nor AS4 will alter its
routing behavior with respect to AS1's routes. This situation is now
wedged, in that there is no eBGP peering that can be reset that will
flip BGP back to the intended state. This is an instance of a BGP
Wedgie.
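The reason neither AS moves can be checked mechanically. The sketch below lists the candidate routes per prefix for AS3 and AS4 once the AS3 - AS4 peering is back up. For each prefix there is a customer-learned route, adopted during the outage, and a peer-learned route, re-offered across the restored peering. The AS paths follow Figure 2. The local-preference values (customer 100, peer 90, provider 80) are hypothetical, and later decision steps are ignored.

```python
# Hypothetical local-preference values for a customer > peer > provider policy.
LOCAL_PREF = {"customer": 100, "peer": 90, "provider": 80}

LOW, HIGH = "192.0.2.0/25", "192.0.2.128/25"

# Candidate routes once the AS3 - AS4 peering is back up, keyed by
# (AS, prefix): [(relationship of the advertising neighbor, AS path), ...]
candidates = {
    (3, LOW): [("customer", (2, 1)), ("peer", (4, 5, 1))],
    (3, HIGH): [("customer", (2, 1)), ("peer", (4, 5, 1))],
    (4, LOW): [("customer", (5, 1)), ("peer", (3, 2, 1))],
    (4, HIGH): [("customer", (5, 1)), ("peer", (3, 2, 1))],
}


def select(routes):
    """Highest local preference first; a shorter AS path only breaks ties."""
    return max(routes, key=lambda r: (LOCAL_PREF[r[0]], -len(r[1])))


for (asn, prefix), routes in candidates.items():
    relationship, path = select(routes)
    print(f"AS{asn} {prefix}: via AS{path[0]} ({relationship})")
# AS3 192.0.2.0/25: via AS2 (customer)
# AS3 192.0.2.128/25: via AS2 (customer)
# AS4 192.0.2.0/25: via AS5 (customer)
# AS4 192.0.2.128/25: via AS5 (customer)
```

AS3 keeps sending 192.0.2.0/25 traffic over AS2's backup link, and AS4 keeps sending 192.0.2.128/25 traffic over AS5's backup link. The peer-learned route that would restore the intended state never outranks either customer route.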
The restoration path here is that AS1 has to withdraw the backup
advertisements on both paths, operate for an interval without
backup, and then readvertise the backup prefix advertisements. The
length of the interval cannot be readily determined in advance, as it
has to be long enough to allow AS2 and AS5 to learn of an
alternate path to AS1. At this stage the backup routes can be
readvertised.
4. Multi-Party BGP Wedgies
This situation can be more complex when three or more parties provide
upstream transit services to an AS. An example is indicated in
Figure 3.
      +----+                        +----+
      |AS 3|------------------------|AS 4|
      +----+ peer              peer +----+
       ||provider              provider|
       |+-----------------+            |
       |                  |            |
       |customer          |customer    |
      +----+            +----+         |
      |AS 2|------------|AS 5|         |
      +----+    peer    +----+         |
       |provider  provider|            |
       |                  |            |
       |customer  customer|    customer|
       +----------------+ |+-----------+
          backup service| ||primary service
                       +----+
                       |AS 1|
                       +----+

                      Figure 3
In this example the intended state is that AS2 and AS5 are both
backup providers to AS1, and AS4 is the primary provider. When the
link between AS1 and AS4 breaks and is subsequently restored, AS3
will continue to direct traffic to AS1 via AS2 or AS5. In this case
a single reset of the link between AS2 and AS1 will not restore the
original intended BGP state, as the BGP-selected best route to AS1
will switch to AS5, and AS2 and AS3 will learn a path to AS1 via AS5.
What AS1 observes is incoming traffic on the backup link from AS2.
Resetting this connection will not restore traffic to the primary
path, but will instead switch incoming traffic over to AS5.
The action required to correct the situation is to simultaneously
reset both the link to AS2 and the link to AS5. This is not
necessarily an intuitively obvious solution, as at any point in time
only one of these links will be carrying backup traffic, yet both BGP
sessions need to be brought down at the same time in order to
commence restoration of the intended primary and backup state.
5. BGP and Determinism
BGP does not behave deterministically in all cases, and, as a
consequence, there is both intended and unintended non-determinism in
BGP. For example, the default final tie-break in some implementations
of BGP is to prefer the longest-lived route. To achieve determinism
in this last step it would be necessary to use a comparison operator
that has a predictable outcome, such as a comparison of router
identifiers. This class of non-deterministic behavior is termed here
skipping to change at page 8, line 43
introduces no new factors in terms of the security and integrity of
inter-domain routing.
This memo illustrates that, in attempting to create policy-based
outcomes relating to path selection for incoming traffic, it is
possible to generate BGP configurations where there are multiple
stable outcomes, rather than a single outcome. Furthermore, among
these instances of multiple outcomes, there are cases where the BGP
selection of a particular outcome is not a deterministic selection.
This class of behavior may be exploitable by a hostile third party.
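Section 5 mentions two possible final tie-breaks. The sketch below contrasts them on two routes that are otherwise equally preferred. The `Candidate` record and its arrival timestamp are hypothetical. Preferring the longest-lived route makes the outcome depend on the order in which the routes arrived. Comparing router identifiers does not: the route from the speaker with the lowest BGP Identifier wins, as in the RFC 4271 decision process, and the answer is the same whatever the arrival history.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    router_id: str    # BGP Identifier of the advertising router, dotted quad
    received_at: int  # hypothetical arrival time of the route, in seconds


def oldest_route(cands):
    """Prefer the longest-lived route: the answer depends on arrival order."""
    return min(cands, key=lambda c: c.received_at)


def lowest_router_id(cands):
    """Prefer the lowest BGP Identifier, compared as a 32-bit number."""
    return min(cands, key=lambda c: int(ipaddress.IPv4Address(c.router_id)))


# The same two equally preferred routes, learned in opposite orders.
run_a = [Candidate("192.0.2.10", received_at=10), Candidate("192.0.2.9", received_at=20)]
run_b = [Candidate("192.0.2.10", received_at=20), Candidate("192.0.2.9", received_at=10)]

print(oldest_route(run_a).router_id, oldest_route(run_b).router_id)
# 192.0.2.10 192.0.2.9  (outcome depends on history)
print(lowest_router_id(run_a).router_id, lowest_router_id(run_b).router_id)
# 192.0.2.9 192.0.2.9  (same outcome either way)
```

The identifiers are compared numerically rather than as text. As strings, "192.0.2.10" would sort before "192.0.2.9".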
A common theme of BGP Wedgies is that, starting from an intended or
desired forwarding state, the loss and subsequent restoration of an
eBGP peering connection can flip the network's forwarding
configuration into an unintended and potentially undesired state.
Significant administrative effort, based on BGP state and
configuration knowledge that may not be locally available, may be
required to shift the BGP forwarding configuration back to the
intended or desired forwarding state. If a hostile third party
can deliberately cause a BGP session to reset, thereby producing
the initial conditions that lead to an unintended forwarding state,
the network impacts of the resulting unintended or undesired
forwarding state may be long-lived, far outliving the temporary
interruption of connectivity that triggered the condition. If these
impacts, including potential issues of increased cost, reduction of
available bandwidth, increases in overall latency or degradation of
service reliability, are significant, then disrupting a BGP session
could represent an attractive attack vector to a hostile party.
7. IANA Considerations
[Note to RFC Editor: Please remove this section prior to publication]
This document has no associated IANA actions or considerations.
8. References
8.1 Normative References

This html diff was produced by rfcdiff 1.23, available from http://www.levkowetz.com/ietf/tools/rfcdiff/