draft-ietf-grow-va-01.txt [-01] vs. draft-ietf-grow-va-02.txt [-02]
Network Working Group                                         P. Francis
Internet-Draft                                                   MPI-SWS
Intended status: Informational                                     X. Xu
Expires: September 9, 2010 [-01: April 28, 2010]                 Huawei
                                                              H. Ballani
                                                              Cornell U.
                                                                  D. Jen
                                                                    UCLA
                                                               R. Raszuk
                                                       Cisco [-01: Self]
                                                                L. Zhang
                                                                    UCLA
                                   March 8, 2010 [-01: October 25, 2009]
                FIB Suppression with Virtual Aggregation
         draft-ietf-grow-va-02.txt [-01: draft-ietf-grow-va-01.txt]
Abstract
The continued growth in the Default Free Routing Table (DFRT)
stresses the global routing system in a number of ways. One of the
most costly stresses is FIB size: ISPs often must upgrade router
hardware simply because the FIB has run out of space, and router
vendors must design routers that have adequate FIB capacity. FIB
suppression is an approach to relieving stress on the FIB by NOT
loading selected RIB entries into the FIB. Virtual Aggregation (VA)
allows ISPs to shrink the FIBs of any and all routers, easily by an
order of magnitude, with negligible increase in path length and load.
FIB suppression can be deployed autonomously by an ISP (cooperation
between ISPs is not required), and can co-exist with legacy routers
in the ISP.
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
skipping to change at page 1, line 41 [-01]; page 2, line 9 [-02]
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on September 9, 2010 [-01: April 28,
2010].
Copyright Notice
Copyright (c) 2010 [-01: 2009] IETF Trust and the persons identified
as the document authors. All rights reserved.
[-01] This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents in effect on the date of
publication of this document (http://trustee.ietf.org/license-info).
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document.
[-02] This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the BSD License.
[-01] The Abstract, identical to the one above, appeared here, after
the Copyright Notice, rather than before the Status of this Memo.
Table of Contents
[-01]
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1. Scope of this Document . . . . . . . . . . . . . . . . . . 4
1.2. Requirements notation . . . . . . . . . . . . . . . . . . 5
1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 5
1.4. Temporary Sections . . . . . . . . . . . . . . . . . . . . 6
1.4.1. Document revisions . . . . . . . . . . . . . . . . . . 6
2. Overview of Virtual Aggregation (VA) . . . . . . . . . . . . . 8
2.1. Mix of legacy and VA routers . . . . . . . . . . . . . . . 10
2.2. Summary of Tunnels and Paths . . . . . . . . . . . . . . . 10
3. Specification of VA . . . . . . . . . . . . . . . . . . . . . 12
3.1. Requirements for VA . . . . . . . . . . . . . . . . . . . 12
3.2. VA Operation . . . . . . . . . . . . . . . . . . . . . . . 13
3.2.1. Legacy Routers . . . . . . . . . . . . . . . . . . . . 13
3.2.2. Advertising and Handling Virtual Prefixes (VP) . . . . 13
3.2.3. Border VA Routers . . . . . . . . . . . . . . . . . . 17
3.2.4. Advertising and Handling Sub-Prefixes . . . . . . . . 18
3.2.5. Suppressing FIB Sub-prefix Routes . . . . . . . . . . 18
3.2.6. Core-Edge Operation . . . . . . . . . . . . . . . . . 20
3.3. Requirements Discussion . . . . . . . . . . . . . . . . . 21
3.3.1. Response to router failure . . . . . . . . . . . . . . 21
3.3.2. Traffic Engineering . . . . . . . . . . . . . . . . . 22
3.3.3. Incremental and safe deploy and start-up . . . . . . . 22
3.3.4. VA security . . . . . . . . . . . . . . . . . . . . . 22
3.4. New Configuration . . . . . . . . . . . . . . . . . . . . 23
4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 24
5. Security Considerations . . . . . . . . . . . . . . . . . . . 24
5.1. Properly Configured VA . . . . . . . . . . . . . . . . . . 24
5.2. Mis-configured VA . . . . . . . . . . . . . . . . . . . . 25
6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 25
7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 25
7.1. Normative References . . . . . . . . . . . . . . . . . . . 25
7.2. Informative References . . . . . . . . . . . . . . . . . . 26
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 26
[-02]
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1. Scope of this Document . . . . . . . . . . . . . . . . . . 5
1.2. Requirements notation . . . . . . . . . . . . . . . . . . 6
1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6
1.4. Temporary Sections . . . . . . . . . . . . . . . . . . . . 7
1.4.1. Document revisions . . . . . . . . . . . . . . . . . . 7
2. Overview of Virtual Aggregation (VA) . . . . . . . . . . . . . 9
2.1. Mix of legacy and VA routers . . . . . . . . . . . . . . . 11
2.2. Summary of Tunnels and Paths . . . . . . . . . . . . . . . 11
3. Specification of VA . . . . . . . . . . . . . . . . . . . . . 13
3.1. VA Operation . . . . . . . . . . . . . . . . . . . . . . . 13
3.1.1. Legacy Routers . . . . . . . . . . . . . . . . . . . . 13
3.1.2. Advertising and Handling Virtual Prefixes (VP) . . . . 14
3.1.3. Border VA Routers . . . . . . . . . . . . . . . . . . 17
3.1.4. Advertising and Handling Sub-Prefixes . . . . . . . . 18
3.1.5. Suppressing FIB Sub-prefix Routes . . . . . . . . . . 18
3.2. New Configuration . . . . . . . . . . . . . . . . . . . . 20
4. Usage of Tunnels . . . . . . . . . . . . . . . . . . . . . . . 21
4.1. MPLS tunnels . . . . . . . . . . . . . . . . . . . . . . . 21
4.2. Usage of Inner Label . . . . . . . . . . . . . . . . . . . 21
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 22
6. Security Considerations . . . . . . . . . . . . . . . . . . . 22
6.1. Properly Configured VA . . . . . . . . . . . . . . . . . . 22
6.2. Mis-configured VA . . . . . . . . . . . . . . . . . . . . 23
7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 23
8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 23
8.1. Normative References . . . . . . . . . . . . . . . . . . . 23
8.2. Informative References . . . . . . . . . . . . . . . . . . 24
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 24
1. Introduction
[-01] ISPs today manage constant DFRT growth in a number of ways. Most
commonly, ISPs will upgrade their router hardware before DFRT growth
outstrips the size of the FIB. In cases where an ISP wants to
continue to use routers whose FIBs are not large enough, it may
deploy them at edge locations where a full DFRT is not needed, for
instance at the customer interface. Packets for which there is no
route are defaulted to a "core" infrastructure that does contain the
full DFRT. While this helps, it cannot be used for all edge routers,
for instance those that interface with other ISPs. Alternatively,
some lower-tier ISPs may simply ignore some routes, for instance
/24s that fall within the aggregate of another route.
[-02] ISPs today manage constant DFRT growth in a number of ways. One
way, of course, is for ISPs to upgrade their router hardware before
DFRT growth outstrips the size of the FIB. This is too expensive for
many ISPs, which would prefer to extend the lifetime of routers whose
FIBs can no longer hold the full DFRT.
[-02] A common approach taken by lower-tier ISPs is to default route
to their providers. Routes to customers and peer ISPs are
maintained, but everything else defaults to the provider. This
approach has several disadvantages. First, packets to Internet
destinations may take longer-than-necessary AS paths. This problem
can be mitigated through careful configuration of partial defaults,
but this can require substantial configuration overhead. Second, an
ISP that defaults to its providers is no longer able to provide the
full DFRT to its customers. Finally, provider defaults prevent the
ISP from detecting martian packets. As a result, the ISP carries
packets over its expensive provider links that it could otherwise
have dropped.
[-02] An alternative is for the ISP to maintain full routes in its
core routers, but to filter routes from edge routers that do not
require a full DFRT. These edge routers can then default route to
the core routers. This is often possible with edge routers that
interface to customer networks. The problem with this approach is
that it cannot be used for all edge routers. For instance, it cannot
be used for routers that connect to transit providers. Nor does it
help in cases where core routers themselves have inadequate FIB
capacity.
FIB Suppression is an approach to shrinking FIB size that requires no
changes to BGP, no changes to packet forwarding mechanisms in
routers, and relatively minor changes to control mechanisms in
routers and configuration of those mechanisms. The core idea behind
FIB suppression is to run BGP as normal, and in particular not to
shrink the RIB, but rather not to load certain RIB entries into the
FIB. This approach minimizes changes to routers, and in particular
is simpler than more general routing architectures that try to shrink
both RIB and FIB. With FIB suppression, there are no changes to BGP
per se. The BGP decision process does not change. The selected
AS-path does not change, and except on rare occasions the exit router
does not change. ISPs can deploy FIB suppression autonomously and
with no coordination with neighboring ASes.
This document describes an approach to FIB suppression called
"Virtual Aggregation" (VA). VA operates by organizing the IP (v4 or
v6) address space into Virtual Prefixes (VP), and using tunnels to
aggregate the (regular) sub-prefixes within each VP. The decrease in
FIB size can be dramatic, easily 5x or 10x with only a slight path
length and router load increase [nsdi09]. The VPs can be organized
such that all routers in an ISP see FIB size decrease, or in such a
way that "core" routers keep the full FIB, and "edge" routers have
almost no FIB (i.e., by defining a VP of 0/0). This "core-edge" style
of VA deployment is much simpler than a "full" VA deployment, in
which multiple VPs are defined and any router, core or otherwise, can
have a reduced FIB size. This simpler "core-edge" style of deployment
is specified in a separate draft to make it easier to understand
[I-D.ietf-grow-simple-va].
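As a back-of-the-envelope illustration of where the savings come from, the Python sketch below estimates the FIB size of a single VA router. It is not the model used in [nsdi09]. It assumes, hypothetically, that sub-prefixes are spread evenly over equally sized VPs, that a router installs one route per VP plus every sub-prefix of the VPs for which it is an APR, and that its Popular Prefixes lie outside those VPs. The function name and the 300,000-entry table size are hypothetical.

```python
def estimate_fib_entries(n_subprefixes: int, n_vps: int,
                         vps_served: int, n_popular: int = 0) -> int:
    """Rough FIB size of a VA router under a hypothetical even-spread model.

    n_subprefixes -- sub-prefixes in the full DFRT
    n_vps         -- number of equally sized VPs in the VP-List
    vps_served    -- number of VPs for which this router is an APR
    n_popular     -- Popular Prefixes installed outside those VPs
    """
    if n_vps < 1 or not 0 <= vps_served <= n_vps:
        raise ValueError("need n_vps >= 1 and 0 <= vps_served <= n_vps")
    # One route per VP, so packets for any VP can reach one of its APRs.
    vp_routes = n_vps
    # Assumed even spread: each VP holds n_subprefixes / n_vps sub-prefixes.
    apr_subprefixes = n_subprefixes * vps_served // n_vps
    return vp_routes + apr_subprefixes + n_popular


if __name__ == "__main__":
    full = 300_000  # hypothetical DFRT size
    va = estimate_fib_entries(full, n_vps=128, vps_served=4)
    print(va, round(full / va, 1))           # 9503 31.6
    # Core-edge style, with a single 0/0 VP:
    print(estimate_fib_entries(full, 1, 1))  # core router (APR): 300001
    print(estimate_fib_entries(full, 1, 0))  # edge router: 1
```

The idealized ratio should not be read as a prediction; the 5x to 10x reductions cited above come from [nsdi09], not from this model. The core-edge lines show why a 0/0 VP leaves core routers with a full FIB while edge routers hold almost nothing.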
VA has the following characteristics:
o it is robust to router failure,
o it allows for traffic engineering,
o it allows for existing inter-domain routing policies,
o it operates in a predictable manner and can therefore be tested,
debugged, and reasoned about in terms of performance (i.e., to
establish SLAs),
o it can be safely installed, tested, and started up,
o it can be configured and reconfigured without service
interruption,
o it can be incrementally deployed, and in particular can be
operated in an AS with a mix of VA-capable and legacy routers,
o it accommodates existing security mechanisms such as unicast
Reverse Path Forwarding (uRPF) ingress filtering and DoS defense,
o it does not introduce significant new security vulnerabilities.
1.1. Scope of this Document
The scope of this document is limited to intra-domain VA operation,
that is, the case where a single ISP autonomously operates VA
internally without any coordination with neighboring ISPs.
Note that this document assumes that the VA "domain" (i.e., the unit
of autonomy) is the AS (that is, different ASes run VA independently
and without coordination). For the remainder of this document, the
terms ISP, AS, and domain are used interchangeably.
This document applies equally to IPv4 and IPv6.
VA may operate with a mix of upgraded routers and legacy routers.
There are no topological restrictions placed on the mix of routers.
[-01] In order to avoid loops between upgraded and legacy routers,
however, legacy routers must be able to terminate tunnels.
[-01] This document is agnostic about what type of tunnel may be used
for VA, and does not specify a tunnel type per se. Rather, it refers
generically to tunnels and specifies the minimum set of requirements
that a given tunnel type must satisfy. Separate documents are used
to specify the operation of VA for specific tunnel types.
[-02] In order to avoid loops between upgraded and legacy routers,
packets are always tunneled by the VA routers to the BGP NEXT_HOPs of
the matched BGP routes. If a given local ASBR is a legacy router, it
must be able to terminate tunnels.
1.2. Requirements notation
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
1.3. Terminology
Aggregation Point Router (APR): An Aggregation Point Router (APR) is
skipping to change at page 5, line 39 [-01]; page 6, line 26 [-02]
APRs advertise the VP to other routers with BGP. For each
sub-prefix within the VP, APRs have a tunnel from themselves to the
remote ASBR (Autonomous System Border Router) where packets for
that prefix should be delivered.
Install and Suppress: The terms "install" and "suppress" are used to
describe whether a RIB entry has been loaded or not loaded into
the FIB. In other words, the phrase "install a route" means
"install a route into the FIB", and the phrase "suppress a route"
means "do not install a route into the FIB".
Legacy Router: A router that does not run VA, and has no knowledge
of VA. [-01] Legacy routers, however, must be able to terminate
tunnels. (If a Legacy router cannot terminate tunnels, then any
routes that are reached via that router must be installed in all
FIBs.) [-02] Legacy routers, however, must be able to terminate
tunnels when they are local ASBRs.
non-APR Router: In discussing VPs, it is often necessary to
distinguish between routers that are APRs for that VP, and routers
that are not APRs for that VP (but of course may be APRs for other
VPs not under discussion). In these cases, the term "APR" is
taken to mean "a VA router that is an APR for the given VP", and
the term "non-APR" is taken to mean "a VA router that is not an
APR for the given VP". The term non-APR router is not used to
refer to legacy routers.
Popular Prefix: A Popular Prefix is a sub-prefix that is installed
in a router in addition to the sub-prefixes it holds by virtue of
being an Aggregation Point Router. Installing a Popular Prefix
allows packets destined to it to follow the shortest path rather
than detouring through an APR. Note that different routers do not
need to have the same set of Popular Prefixes.
Routing Information Base (RIB): The term RIB is used rather loosely
in this document to refer either to the Loc-RIB (as used in
[RFC4271]), or to the combined Adj-RIBs-In, the Loc-RIB, and the
Adj-RIBs-Out.
Sub-Prefix: A regular (physically aggregatable) prefix. These are
equivalent to the prefixes that would normally comprise the DFRT
in the absence of VA. A VA router will contain a sub-prefix entry
either because the sub-prefix falls within a Virtual Prefix for
which the router is an APR, or because the sub-prefix is installed
as a Popular Prefix. Legacy routers hold the same sub-prefixes
that they hold today.
Tunnel: [-01] VA can use a variety of tunnel types: MPLS LSPs,
IP-in-IP, GRE, L2TP, and so on. This document does not describe how
the information for any given tunnel type is conveyed: that is left
for companion documents. This document uses the term tunnel to
refer to any appropriate tunnel type. [-02] This draft specifies the
use of MPLS Label Switched Paths (LSPs), and of MPLS inner labels
tunneled over either LSPs or IP headers. Other types of tunnels may
be used, but are not specified here. This document generically uses
the term tunnel to refer to any of these tunnel types.
VA router: A router that operates Virtual Aggregation according to
this document.
Virtual Prefix (VP): A Virtual Prefix (VP) is a prefix used to
aggregate its contained regular prefixes (sub-prefixes). The set
of sub-prefixes in a VP is not physically aggregatable, and so the
sub-prefixes are aggregated at APRs through the use of tunnels.
VP-List: A list of defined VPs. All routers must agree on the
contents of this list (which is statically configured into every
VA router).
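To make the Install and Suppress, Popular Prefix, and VP-List definitions concrete, the Python sketch below decides whether a VA router installs a given sub-prefix route. It is a simplified illustration, not the procedure this document specifies: the four-entry VP-List, the function names, and the representation of a router's APR role as a set of VPs are hypothetical, and the VP routes themselves (which every VA router also needs) are outside its scope.

```python
import ipaddress
from typing import Iterable, Optional, Union

Net = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Hypothetical VP-List: statically configured, identical on every VA router.
VP_LIST = [ipaddress.ip_network(p) for p in
           ("0.0.0.0/2", "64.0.0.0/2", "128.0.0.0/2", "192.0.0.0/2")]


def containing_vp(subprefix: Net, vp_list: Iterable[Net] = VP_LIST) -> Optional[Net]:
    """Return the VP that contains subprefix, or None if no VP does."""
    for vp in vp_list:
        # subnet_of() raises TypeError across IP versions, so check first.
        if vp.version == subprefix.version and subprefix.subnet_of(vp):
            return vp
    return None


def should_install(subprefix: str, apr_vps: Iterable[str],
                   popular: Iterable[str]) -> bool:
    """Install the sub-prefix if it is a Popular Prefix on this router or
    falls within a VP for which this router is an APR; otherwise suppress
    it. The route stays in the RIB either way; only FIB loading changes."""
    net = ipaddress.ip_network(subprefix)
    if net in {ipaddress.ip_network(p) for p in popular}:
        return True
    vp = containing_vp(net)
    return vp is not None and vp in {ipaddress.ip_network(v) for v in apr_vps}


if __name__ == "__main__":
    # A hypothetical router that is an APR for 0.0.0.0/2 only.
    print(should_install("10.0.0.0/8", ["0.0.0.0/2"], []))        # True (own VP)
    print(should_install("100.64.0.0/10", ["0.0.0.0/2"], []))     # False (suppressed)
    print(should_install("100.64.0.0/10", ["0.0.0.0/2"],
                         ["100.64.0.0/10"]))                      # True (Popular Prefix)
```

Because the decision depends only on the shared VP-List and the router's own configuration, two routers can reach different install decisions for the same RIB entry, which matches the note that different routers need not hold the same Popular Prefixes.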
1.4. Temporary Sections
This section contains temporary information, and will be removed in
the final version.
1.4.1. Document revisions
This document was previously published as both
draft-francis-idr-intra-va-01.txt and draft-francis-intra-va-01.txt.
1.4.1.1. Revisions from the 01 version of draft-ietf-grow-va
The specification of how to use tunnels has been incorporated
directly into this draft. Formerly the specifications were provided
in separate drafts ([I-D.ietf-grow-va-mpls] and
[I-D.ietf-grow-va-mpls-innerlabel]). The tunneling types specified
in [I-D.ietf-grow-va-gre] are not included in this draft.
The simpler "core-edge" style of deployment has been removed from
this draft and specified in a stand-alone draft
[I-D.ietf-grow-simple-va] to simplify its understanding for those
interested in only that style of deployment.
Added text about usage of uRPF (strict and loose).
Added text about the flapping APR failure scenario.
1.4.1.2. Revisions from the 00 version of draft-ietf-grow-va
[-01: 1.4.1.1. Revisions from the 00 version of draft-ietf-grow-va-00]
Removed the notion that FIB suppression can be done by suppressing
entries from the Routing Table (as defined in Section 3.2 of
[RFC4271]), an idea that was introduced in the second version of the
draft. Suppressing from the Routing Table breaks PIM-SM, which
relies on the contents of the unicast Routing Table to produce its
forwarding table.
1.4.1.3. Revisions from the 00 version (of
draft-francis-intra-va-00.txt) [-01: 1.4.1.2.]
Added additional authors (Jen, Raszuk, Zhang), to reflect primary
contributors moving forward. In addition, a number of minor
clarifications were made.
1.4.1.4. Revisions from the 01 version (of
draft-francis-idr-intra-va-01.txt) [-01: 1.4.1.3.]
1. Changed file name from draft-francis-idr-intra-va to
draft-francis-intra-va.
2. Restructured the document to make the edge suppression mode a
specific sub-case of VA rather than a separate mode of operation.
This includes modifying the title of the draft.
3. Removed MPLS tunneling details so that specific tunneling
approaches can be described in separate documents.
1.4.1.5. Revisions from 00 version [-01: 1.4.1.4.]
o Changed intended document type from STD to BCP, as per advice from
the Dublin IDR meeting.
o Cleaned up the MPLS language, and specified that the full-address
routes to remote ASBRs must be imported into OSPF (Section 3.1.3
[-01: Section 3.2.3]). As per Daniel Ginsburg's email
http://www.ietf.org/mail-archive/web/idr/current/msg02933.html.
o Clarified that legacy routers must run MPLS. As per Daniel
Ginsburg's email
http://www.ietf.org/mail-archive/web/idr/current/msg02935.html.
o Fixed LOCAL_PREF bug. As per Daniel Ginsburg's email
http://www.ietf.org/mail-archive/web/idr/current/msg02940.html.
o Removed the need for the extended communities attribute on VP
routes, and added the requirement that all VA routers be
statically configured with the complete list of VPs. As per
skipping to change at page 8, line 24 [-01]; page 9, line 24 [-02]
overviews the additional functions required by VA routers to
accommodate legacy routers.
A key concept behind VA is to operate BGP as normal, and in
particular to populate the RIB with the full DFRT, but to suppress
many or most prefixes from being loaded into the FIB. By populating
the RIB as normal, we avoid any changes to BGP, and changes to router
operation are relatively minor. The basic idea behind VA is quite
simple. The address space is partitioned into large prefixes, larger
than any aggregatable prefix in use today. These prefixes are
called Virtual Prefixes (VP). Different VPs do not need to be the
same size. They may be a mix of /6, /7, /8 (for IPv4), and so on.
Indeed, an ISP can define a single /0 VP, and use it for a core/edge
type of configuration [I-D.ietf-grow-simple-va]. That is, the core
routers would maintain full FIBs, and edge routers could maintain
default routes to the core routers, and suppress as much of the FIB
as they wish. Each ISP can independently select the size of its VPs.
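Because every VA router must be configured with the same VP-List, an operator may want to check that a candidate list really partitions the address space as described above. The Python sketch below performs such a check on a hypothetical IPv4 VP-List that mixes /6, /7, and /8 VPs. The function name is hypothetical, and the two criteria it tests (no overlaps, full coverage) are an assumption about what a reasonable sanity check looks like, not a procedure this document specifies.

```python
import ipaddress


def is_partition(vp_list, version=4):
    """True if the VPs cover the whole address space with no overlaps."""
    vps = [ipaddress.ip_network(vp) for vp in vp_list]
    if not vps or any(vp.version != version for vp in vps):
        return False
    # No two VPs may overlap (a quadratic check is fine for short lists).
    for i, a in enumerate(vps):
        for b in vps[i + 1:]:
            if a.overlaps(b):
                return False
    # Disjoint VPs cover the space exactly when their sizes add up to it.
    space = 2 ** (32 if version == 4 else 128)
    return sum(vp.num_addresses for vp in vps) == space


if __name__ == "__main__":
    # 63 /6 VPs, plus the first /6 split into one /7 and two /8s.
    sixes = list(ipaddress.ip_network("0.0.0.0/0").subnets(new_prefix=6))
    mixed = ["0.0.0.0/8", "1.0.0.0/8", "2.0.0.0/7"] + [str(n) for n in sixes[1:]]
    print(len(mixed), is_partition(mixed))  # 66 True
    print(is_partition(["0.0.0.0/0"]))      # True: single /0 (core/edge) VP
    print(is_partition(["0.0.0.0/1", "0.0.0.0/2", "128.0.0.0/1"]))  # False: overlap
    print(is_partition(["0.0.0.0/1"]))      # False: gap
```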
VPs are not themselves topologically aggregatable. VA makes the VPs
aggregatable through the use of tunnels, as follows. Associated with
each VP are one or more "Aggregation Point Routers" (APR). An APR
(for a given VP) is a router that installs routes for all
sub-prefixes (i.e., real physically aggregatable prefixes) within the
VP. Note that an APR is not a special router per se; it is an
otherwise normal router that is configured to operate as an APR. By
"install routes" here, we mean:
1. The route for each of the sub-prefixes is loaded into the FIB,
and
2. there is a tunnel from the APR to the BGP NEXT_HOP for the route.
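The two conditions above can be sketched in Python as follows. This is a simplified model, not this document's specification: it represents the relevant part of an APR's RIB as a hypothetical mapping from sub-prefix to BGP NEXT_HOP, and represents condition 2 as a hypothetical TunnelTo action naming the tunnel endpoint, leaving aside how the tunnel (an MPLS LSP or inner label) is actually established.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class TunnelTo:
    """Hypothetical FIB action: send the packet through a tunnel to endpoint."""
    endpoint: str


def apr_fib_entries(rib, served_vps):
    """FIB entries an APR installs for the VPs it serves.

    rib maps each sub-prefix (str) to its BGP NEXT_HOP (str). A sub-prefix
    inside a served VP is loaded into the FIB (condition 1), paired with a
    tunnel to its BGP NEXT_HOP (condition 2); all others are suppressed.
    """
    vps = [ipaddress.ip_network(vp) for vp in served_vps]
    fib = {}
    for prefix, next_hop in rib.items():
        net = ipaddress.ip_network(prefix)
        if any(net.version == vp.version and net.subnet_of(vp) for vp in vps):
            fib[net] = TunnelTo(next_hop)
    return fib


if __name__ == "__main__":
    # Hypothetical RIB using documentation addresses as BGP NEXT_HOPs.
    rib = {"10.0.0.0/8": "192.0.2.1",
           "20.1.0.0/16": "192.0.2.2",
           "150.0.0.0/8": "192.0.2.3"}
    for net, action in apr_fib_entries(rib, ["0.0.0.0/2"]).items():
        print(net, "->", action.endpoint)
    # 10.0.0.0/8 -> 192.0.2.1
    # 20.1.0.0/16 -> 192.0.2.2
```

The entry for 150.0.0.0/8 stays in the RIB but is not installed, because it lies outside the VP this APR serves.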
The APR originates a BGP route to the VP. This route is distributed The APR originates a BGP route to the VP. This route is distributed
within the domain, but not outside the domain. With this structure within the domain, but not outside the domain. With this structure
in place, a packet transiting the ISP goes from the ingress router to in place, a packet transiting the ISP goes from the ingress router to
the APR (possibly via a tunnel), and then from the APR to the BGP the APR (usually via a tunnel), and then from the APR to the BGP
NEXT_HOP router via a tunnel. NEXT_HOP router via a tunnel. VA can operate with MPLS LSPs, or with
MPLS inner labels over LSPs or IP headers. Section 4 specifies the
usage of tunnels.
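The Python sketch below is a rough illustration of the two-stage path described above. It models each router's FIB as a list of entries. Each entry points either at an APR (a VP route) or at a BGP NEXT_HOP (a sub-prefix route). The sketch follows tunnels until the packet reaches a NEXT_HOP or is dropped. The names `FibEntry`, `lookup`, and `forward_path`, the router labels, and the prefixes are all hypothetical. Tunnel encapsulation itself is not modeled.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class FibEntry:
    prefix: ipaddress.IPv4Network
    tunnel_endpoint: str  # an APR for a VP route, the BGP NEXT_HOP otherwise
    is_vp: bool


def lookup(fib, dst):
    """Longest-prefix match over a list of FibEntry; None if nothing matches."""
    addr = ipaddress.ip_address(dst)
    matches = [e for e in fib if addr in e.prefix]
    return max(matches, key=lambda e: e.prefix.prefixlen, default=None)


def forward_path(fibs, ingress, dst, max_tunnels=4):
    """List the routers a packet is tunneled through, starting at `ingress`."""
    path, router = [ingress], ingress
    for _ in range(max_tunnels):
        entry = lookup(fibs[router], dst)
        if entry is None:
            return path + ["DROP"]  # e.g. an APR with no matching sub-prefix
        path.append(entry.tunnel_endpoint)
        if not entry.is_vp:
            return path  # tunneled straight to the BGP NEXT_HOP
        router = entry.tunnel_endpoint  # tunneled to the APR; look up again there
    raise RuntimeError("tunnel chain too long")


net = ipaddress.ip_network
fibs = {
    # Ingress router R1 holds only the VP route.
    "R1": [FibEntry(net("20.0.0.0/7"), "APR1", True)],
    # APR1 holds every sub-prefix of 20.0.0.0/7 (two shown here).
    "APR1": [
        FibEntry(net("20.1.0.0/16"), "ASBR-X", False),
        FibEntry(net("21.5.0.0/16"), "ASBR-Y", False),
    ],
}
print(forward_path(fibs, "R1", "20.1.2.3"))  # ['R1', 'APR1', 'ASBR-X']
print(forward_path(fibs, "R1", "20.9.9.9"))  # ['R1', 'APR1', 'DROP']
```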
Normally the BGP NEXT_HOP is the remote ASBR. In this case, even The BGP NEXT_HOP can be either the local ASBR or the remote ASBR. In
though the remote ASBR is the tunnel endpoint, the tunnel header is the former case, an inner label is used to tunnel packets
stripped by the local ASBR before the packet is delivered to the (Section 4.2). In either case, all tunnel headers are stripped by
remote ASBR. In other words, the remote ASBR sees a normal IP the local ASBR before the packet is delivered to the remote ASBR. In
packet, and is completely unaware of the existence of VA in the other words, the remote ASBR sees a normal IP packet, and is
neighboring ISP. The exception to this is legacy local ASBR routers. completely unaware of the existence of VA in the neighboring ISP.
In this case, the legacy router is the BGP NEXT_HOP, and packets are Note that legacy ASBRs MUST set themselves as the BGP NEXT_HOP.
tunneled to the legacy router, which then uses a FIB lookup to
deliver the packet to the appropriate remote ASBR. This applies only
to legacy routers that can convey tunnel parameters and detunnel
packets.
Note that the AS-path is not affected at all by VA. This means among Note that the AS-path is not affected at all by VA. This means among
other things that AS-level policies are not affected by VA. The other things that AS-level policies are not affected by VA. The
packet may not, however, follow the shortest path within the ISP packet may not, however, follow the shortest path within the ISP
(where shortest path is defined here as the path that would have been (where shortest path is defined here as the path that would have been
taken if VA were not operating), because the APR may not be on the taken if VA were not operating), because the APR may not be on the
shortest path between the ingress and egress routers. When this shortest path between the ingress and egress routers. When this
happens, the packet experiences additional latency and creates extra happens, the packet experiences additional latency and creates extra
load (by virtue of taking more hops than it otherwise would have). load (by virtue of taking more hops than it otherwise would have).
Note also that, with VA, a packet may occasionally take a different Note also that, with VA, a packet may occasionally take a different
exit point than it otherwise would have. exit point than it otherwise would have.
VA can avoid traversing the APR for selected routes by installing VA can avoid traversing the APR for selected routes by installing
these routes in non-APR routers. In other words, even if an ingress these routes in non-APR routers. In other words, even if an ingress
router is not an APR for a given sub-prefix, it may install that sub- router is not an APR for a given sub-prefix, it MAY install that sub-
prefix into its FIB. Packets in this case are tunneled directly from prefix into its FIB. Packets in this case are tunneled directly from
the ingress to the BGP NEXT_HOP. These extra routes are called the ingress to the BGP NEXT_HOP. These extra routes are called
"Popular Prefixes", and are typically installed for policy reasons "Popular Prefixes", and are typically installed for policy reasons
(i.e. customer routes are always installed), or for sub-prefixes that (e.g. customer routes are always installed), or for sub-prefixes that
carry a high volume of traffic (Section 3.2.5.1). Different routers carry a high volume of traffic (Section 3.1.5.1). Different routers
may have different popular prefixes. As such, an ISP may assign MAY have different Popular Prefixes. As such, an ISP MAY assign
popular prefixes per router, per POP, or uniformly across the ISP. A Popular Prefixes per router, per POP, or uniformly across the ISP. A
given router may have zero popular prefixes, or the majority of its given router MAY have zero Popular Prefixes, or the majority of its
FIB may consist of popular prefixes. The effectiveness of popular FIB MAY consist of Popular Prefixes. The effectiveness of Popular
prefixes to reduce traffic load relies on the fact that traffic Prefixes to reduce traffic load relies on the fact that traffic
volumes follow something like a power-law distribution: e.g. that 90% volumes follow something like a power-law distribution: e.g. that 90%
of traffic is destined to 10% of the destinations. Internet traffic of traffic is destined to 10% of the destinations. Internet traffic
measurement studies over the years have consistently shown that measurement studies over the years have consistently shown that
traffic patterns follow this distribution, though there is no traffic patterns follow this distribution, though there is no
guarantee that they always will. guarantee that they always will.
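The sketch below shows one possible way to pick Popular Prefixes for a router. This document does not specify the procedure. The sketch always includes policy prefixes, such as customer routes. It then adds the highest-volume destinations until a target share of measured traffic is covered. The function name, the 0.9 target, and the traffic volumes are assumptions for illustration. See Section 3.1.5.1 for the draft's own discussion of high-volume prefixes.

```python
def select_popular_prefixes(volume_by_prefix, policy_prefixes=(), target_share=0.9):
    """Pick policy prefixes plus the heaviest prefixes until `target_share`
    of measured traffic is covered. Returns (prefixes, share_covered)."""
    total = sum(volume_by_prefix.values())
    chosen = list(policy_prefixes)  # e.g. customer routes, always installed
    covered = sum(volume_by_prefix.get(p, 0) for p in chosen)
    # Heaviest destinations first; with power-law-like traffic, few are needed.
    ranked = sorted(volume_by_prefix.items(), key=lambda kv: kv[1], reverse=True)
    for prefix, volume in ranked:
        if total == 0 or covered / total >= target_share:
            break
        if prefix not in chosen:
            chosen.append(prefix)
            covered += volume
    return chosen, (covered / total if total else 0.0)


# Hypothetical per-prefix traffic volumes (arbitrary units).
volumes = {"20.1.0.0/16": 50, "21.5.0.0/16": 30, "30.0.0.0/8": 10,
           "40.2.0.0/16": 5, "50.0.0.0/12": 5}
chosen, share = select_popular_prefixes(volumes, policy_prefixes=["198.51.100.0/24"])
print(chosen)  # ['198.51.100.0/24', '20.1.0.0/16', '21.5.0.0/16', '30.0.0.0/8']
print(share)   # 0.9
```

A router could use a different target. Alternatively, it could cap the count so that the Popular Prefixes fit whatever FIB space remains after its APR duties.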
Note that for routing to work properly, every packet must sooner or Note that for routing to work properly, every packet must sooner or
later reach a router that has installed a sub-prefix route that later reach a router that has installed a sub-prefix route that
matches the packet. This would obviously be the case for a given matches the packet. This would obviously be the case for a given
sub-prefix if every router has installed a route for that sub-prefix sub-prefix if every router has installed a route for that sub-prefix
(which of course is the situation in the absence of VA). If this is (which of course is the situation in the absence of VA). If this is
not the case, then there must be at least one Aggregation Point not the case, then there MUST be at least one Aggregation Point
Router (APR) for the sub-prefix's virtual prefix (VP). Ideally, Router (APR) for the sub-prefix's Virtual Prefix (VP). Ideally,
every POP contains at least two APRs for every virtual prefix. By every POP contains at least two APRs for every Virtual Prefix. By
having APRs in every POP, the latency imposed by routing to the APR having APRs in every POP, the latency imposed by routing to the APR
is minimal (the extra hop is within the POP). By having more than is minimal (the extra hop is within the POP). By having more than
one APR, there is a redundant APR should one fail. In practice it is one APR, there is a redundant APR should one fail. In practice it is
often not possible to have an APR for every VP in every POP. This is often not possible to have an APR for every VP in every POP. This is
because some POPs may have only one or a few routers, and therefore because some POPs may have only one or a few routers, and therefore
there may not be enough cumulative FIB space in the POP to hold there may not be enough cumulative FIB space in the POP to hold
every sub-prefix. Note that any router ("edge", "core", etc.) may be every sub-prefix. Note that any router ("edge", "core", etc.) MAY
an APR. be an APR.
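An operator might audit an APR assignment against the "at least two APRs for every VP in every POP" goal, as in the sketch below. The inputs are assumptions. `apr_assignment` maps a hypothetical router name to its POP and to the set of VPs for which it is an APR. Routers that are not APRs can be listed with an empty set, so that their POP is still examined.

```python
from collections import defaultdict


def under_protected(vp_list, apr_assignment, min_aprs=2):
    """Return {(pop, vp): apr_count} for POP/VP pairs with fewer than `min_aprs` APRs.

    apr_assignment: router name -> (pop name, set of VPs it is APR for).
    """
    count = defaultdict(int)
    pops = set()
    for pop, vps in apr_assignment.values():
        pops.add(pop)
        for vp in vps:
            count[(pop, vp)] += 1
    return {
        (pop, vp): count[(pop, vp)]
        for pop in sorted(pops)
        for vp in vp_list
        if count[(pop, vp)] < min_aprs
    }


vp_list = ["0.0.0.0/1", "128.0.0.0/1"]
assignment = {
    "fra-r1": ("FRA", {"0.0.0.0/1"}),
    "fra-r2": ("FRA", {"0.0.0.0/1", "128.0.0.0/1"}),
    "fra-r3": ("FRA", {"128.0.0.0/1"}),
    "lis-r1": ("LIS", {"0.0.0.0/1", "128.0.0.0/1"}),  # small POP, one router
}
print(under_protected(vp_list, assignment))
# {('LIS', '0.0.0.0/1'): 1, ('LIS', '128.0.0.0/1'): 1}
```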
It is important that both the contents of BGP RIBs, as well as the It is important that both the contents of BGP RIBs, as well as the
contents of the Routing Table (as defined in Section 3.2 of contents of the Routing Table (as defined in Section 3.2 of
[RFC4271]) not be modified by VA (other than the introduction of [RFC4271]) not be modified by VA (other than the introduction of
routes to VPs). This is because PIM-SM [RFC4601] relies on the routes to VPs). This is because PIM-SM [RFC4601] relies on the
contents of the Routing Table to build its own trees and forwarding contents of the Routing Table to build its own trees and forwarding
table. Therefore, FIB suppression must take place between the table. Therefore, FIB suppression MUST take place between the
Routing Table and the actual FIB(s). Routing Table and the actual FIB(s).
2.1. Mix of legacy and VA routers 2.1. Mix of legacy and VA routers
It is important that an ISP be able to operate with a mix of "VA It is important that an ISP be able to operate with a mix of "VA
routers" (routers upgraded to operate VA as described in the routers" (routers upgraded to operate VA as described in the
document) and "legacy routers". This allows ISPs to deploy VA in an document) and "legacy routers". This allows ISPs to deploy VA in an
incremental fashion and to continue to use routers that for whatever incremental fashion and to continue to use routers that for whatever
reason cannot be upgraded. This document allows such a mix, and reason cannot be upgraded. This document allows such a mix, and
indeed places no topological restrictions on that mix. It does, indeed places no topological restrictions on that mix. It does,
however, require that legacy routers (and VA routers for that matter) however, require that legacy routers (and VA routers for that matter)
are able to forward already-tunneled packets, are able to serve as are able to forward already-tunneled packets, are able to serve as
tunnel endpoints, and are able to participate in distribution of tunnel endpoints, and are able to participate in distribution of
tunnel information required to establish themselves as tunnel tunnel information required to establish themselves as tunnel
endpoints. (This is listed as Requirement R5 in the companion endpoints. (This is listed as Requirement R5 in the companion
tunneling documents.) Depending on the tunnel type, legacy routers tunneling documents.) Depending on the tunnel type, legacy routers
may also be able to generate tunneled packets, though this is an MAY also be able to initiate tunneled packets, though this is an
optional requirement. (This is listed as Requirement R4 in the OPTIONAL requirement. (This is listed as Requirement R4 in the
companion tunneling documents.) Legacy routers must use their own companion tunneling documents.) Legacy routers MUST use their own
address as the BGP NEXT_HOP, and must FIB-install routes for which address as the BGP NEXT_HOP, and MUST FIB-install routes for which
they are the BGP NEXT_HOP. they are the BGP NEXT_HOP.
2.2. Summary of Tunnels and Paths 2.2. Summary of Tunnels and Paths
To summarize, the following tunnels are created: To summarize, the following tunnels are created:
1. From all VA routers to all BGP NEXT_HOP addresses (where the BGP 1. From all VA routers to all BGP NEXT_HOP addresses (where the BGP
NEXT_HOP address is either an APR, a legacy router, or the remote NEXT_HOP address is either an APR, a local ASBR, or the remote
ASBR neighbor of a VA router). Note that this is listed as ASBR neighbor of a VA router). Note that this is listed as
Requirement R3 in the companion tunneling documents. Requirement R3 in the companion tunneling documents.
2. Optionally, from all legacy routers to all BGP NEXT_HOP 2. Optionally, from all legacy routers to all BGP NEXT_HOP
addresses. addresses.
There are a number of possible paths that packets may take through an There are a number of possible paths that packets may take through an
ISP, summarized in the following diagram. Here, "VA" is a VA router, ISP, summarized in the following diagram. Here, "VA" is a VA router,
"LR" is a legacy router, the symbol "==>" represents a tunneled "LR" is a legacy router, the symbol "==>" represents a tunneled
packet (through zero or more routers), "-->" represents an untunneled packet (through zero or more routers), "-->" represents an untunneled
packet, and "(pop)" represents stripping the tunnel header. The packet, and "(pop)" represents stripping the tunnel header. The
symbol "::>" represents the portion of the path where although the symbol "::>" represents the portion of the path where although the
skipping to change at page 11, line 40 skipping to change at page 12, line 39
6. LR===============================>LR--------->LR 6. LR===============================>LR--------->LR
(The following two exist in the case where legacy routers (The following two exist in the case where legacy routers
cannot initiate tunneled packets.) cannot initiate tunneled packets.)
7. LR------->VA (remaining paths as in 1 to 4 above) 7. LR------->VA (remaining paths as in 1 to 4 above)
8. LR------->LR--------------------->LR--------->LR 8. LR------->LR--------------------->LR--------->LR
The first and second paths represent the case where the ingress The first and second paths represent the case where the ingress
router does not have a popular prefix for the destination, and must router does not have a Popular Prefix for the destination, and MUST
tunnel the packet to an APR. The third and fourth paths represent tunnel the packet to an APR. The third and fourth paths represent
the case where the ingress router does have a popular prefix for the the case where the ingress router does have a Popular Prefix for the
destination, and so tunnels the packet directly to the egress. The destination, and so tunnels the packet directly to the egress. The
fifth and sixth paths are similar, but where the ingress is a legacy fifth and sixth paths are similar to the third and fourth paths
router that can initiate tunneled packets, and effectively has the respectively, but where the ingress is a legacy router that can
popular prefix by virtue of holding the entire DFRT. (Note that some initiate tunneled packets, and effectively has the Popular Prefix by
ISPs have only partial RIBs in their customer-facing edge routers, virtue of holding the entire DFRT. (Note that some ISPs have only
and default route to a router that holds the full DFRT. This case is partial RIBs in their customer-facing edge routers, and default route
not shown here.) Finally, paths 7 and 8 represent the case where to a router that holds the full DFRT. This case is not shown here,
legacy routers cannot initiate a tunneled packet. but works perfectly well.) Finally, paths 7 and 8 represent the case
where legacy routers cannot initiate a tunneled packet.
VA prevents the routing loops that might otherwise occur when VA VA prevents the routing loops that might otherwise occur when VA
routers and legacy routers are mixed. The trick is avoiding the case routers and legacy routers are mixed. The trick is avoiding the case
where a legacy router is forwarding packets towards the BGP NEXT_HOP, where a legacy router is forwarding packets towards the BGP NEXT_HOP,
while a VA router is forwarding packets towards the APR, with each while a VA router is forwarding packets towards the APR, with each
router thinking that the other is on the shortest path to their router thinking that the other is on the shortest path to their
respective targets. respective targets.
In the first four types of path, the loop is avoided because tunnels In the first four types of path, the loop is avoided because tunnels
are used all the way to the egress. As a result, there is never an are used all the way to the egress. As a result, there is never an
skipping to change at page 12, line 35 skipping to change at page 13, line 35
progress through a series of legacy routers (in which case the IGP progress through a series of legacy routers (in which case the IGP
prevents loops), or it will eventually reach a VA router, after which prevents loops), or it will eventually reach a VA router, after which
it will take tunnels as in the 1st and 2nd cases. it will take tunnels as in the 1st and 2nd cases.
3. Specification of VA 3. Specification of VA
This section describes in detail how to operate VA. It starts with a This section describes in detail how to operate VA. It starts with a
brief discussion of requirements, followed by a specification of brief discussion of requirements, followed by a specification of
router support for VA. router support for VA.
3.1. Requirements for VA 3.1. VA Operation
While the core requirement is of course to be able to manage FIB
size, this must be done in a way that:
o is robust to router failure,
o allows for traffic engineering,
o allows for existing inter-domain routing policies,
o operates in a predictable manner and is therefore possible to
test, debug, and reason about performance (i.e. establish SLAs),
o can be safely installed, tested, and started up,
o can be configured and reconfigured without service interruption,
o can be incrementally deployed, and in particular can be operated
in an AS with a mix of VA-capable and legacy routers,
o accommodates existing security mechanisms such as ingress
filtering and DoS defense,
o does not introduce significant new security vulnerabilities.
In short, operation of VA must not significantly affect the way ISPs
operate their networks today. Section 3.3 discusses the extent to
which these requirements are met by the design presented in
Section 3.2.
3.2. VA Operation
In this section, the detailed operation of VA is specified. In this section, the detailed operation of VA is specified.
3.2.1. Legacy Routers 3.1.1. Legacy Routers
VA can operate with a mix of VA and legacy routers. To avoid the VA can operate with a mix of VA and legacy routers. To prevent the
types of loops described in Section 2.2, however, legacy routers MUST types of loops described in Section 2.2, however, legacy routers MUST
satisfy the following requirements: satisfy the following requirements:
1. When forwarding externally-received routes over iBGP, the BGP 1. When forwarding externally-received routes over iBGP, the BGP
NEXT_HOP attribute MUST be set to the legacy router itself. NEXT_HOP attribute MUST be set to the legacy router itself.
2. Legacy routers MUST be able to detunnel packets addressed to 2. Legacy routers MUST be able to detunnel packets addressed to
themselves at the BGP NEXT_HOP address. They MUST also be able themselves at the BGP NEXT_HOP address. They MUST also be able
to convey the tunnel information needed by other routers to to convey the tunnel information needed by other routers to
initiate tunneled packets to them. This is listed as initiate tunneled packets to them. This is listed as
"Requirement R1" in the companion tunneling documents. If a "Requirement R1" in the companion tunneling documents. If a
skipping to change at page 13, line 41 skipping to change at page 14, line 17
4. Every legacy router MUST hold its complete FIB. (Note, of 4. Every legacy router MUST hold its complete FIB. (Note, of
course, that this FIB does not necessarily need to contain the course, that this FIB does not necessarily need to contain the
full DFRT. This might be the case, for instance, if the router full DFRT. This might be the case, for instance, if the router
is an edge router that defaults to a core router.) is an edge router that defaults to a core router.)
As long as legacy routers participate in tunneling as described As long as legacy routers participate in tunneling as described
above, there are no topological restrictions on the legacy routers. above, there are no topological restrictions on the legacy routers.
They may be freely mixed with VA routers without the possibility of They may be freely mixed with VA routers without the possibility of
forming sustained loops (Section 2.2). forming sustained loops (Section 2.2).
3.2.2. Advertising and Handling Virtual Prefixes (VP) 3.1.2. Advertising and Handling Virtual Prefixes (VP)
3.2.2.1. Distinguishing VP's from Sub-prefixes 3.1.2.1. Distinguishing VPs from Sub-prefixes
VA routers must be able to distinguish VP's from sub-prefixes. This VA routers MUST be able to distinguish VPs from sub-prefixes. This
is primarily in order to know which routes to install. In is primarily in order to know which routes to install. In
particular, non-APR routers must know which prefixes are VPs before particular, non-APR routers MUST know which prefixes are VPs before
they receive routes for those VPs, for instance when they first boot they receive routes for those VPs, for instance when they first boot
up. This is in order to avoid the situation where they unnecessarily up. This is in order to avoid the situation where they unnecessarily
start filling their FIB with routes that they ultimately don't need start filling their FIBs with routes that they ultimately don't need
to install (Section 3.2.5). This leads to the following requirement: to install (Section 3.1.5). This leads to the following requirement:
It MUST be possible to statically configure the complete list of VP's It MUST be possible to statically configure the complete list of VPs
into all VA routers. This list is known as the VP-List. into all VA routers. This list is known as the VP-List.
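The sketch below is a minimal illustration of a statically configured VP-List. It also shows the test a VA router needs in order to tell VPs from sub-prefixes. The file format (one prefix per line, with `#` comments) and the names `load_vp_list` and `classify` are assumptions. Nested VPs such as 20/7 and 20/8 (Section 3.1.2.2) are accepted.

```python
import ipaddress


def load_vp_list(lines):
    """Parse a statically configured VP-List: one prefix per line, '#' comments."""
    vps = []
    for line in lines:
        text = line.split("#", 1)[0].strip()
        if text:
            vps.append(ipaddress.ip_network(text))  # raises ValueError on typos
    return vps


def classify(prefix, vp_list):
    """Return 'vp', 'sub-prefix' (inside some VP), or 'uncovered' (in no VP)."""
    p = ipaddress.ip_network(prefix)
    if p in vp_list:
        return "vp"
    # subnet_of() rejects mixed IPv4/IPv6 arguments, so compare versions first.
    if any(p.version == vp.version and p.subnet_of(vp) for vp in vp_list):
        return "sub-prefix"
    return "uncovered"


vp_list = load_vp_list(["20.0.0.0/7   # being split", "20.0.0.0/8", "", "# end"])
print(classify("20.0.0.0/8", vp_list))   # vp
print(classify("21.5.0.0/16", vp_list))  # sub-prefix
print(classify("64.0.0.0/10", vp_list))  # uncovered
```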
3.2.2.2. Limitations on Virtual Prefixes 3.1.2.2. Limitations on Virtual Prefixes
From the point of view of best-match routing semantics, VPs are From the point of view of best-match routing semantics, VPs are
treated identically to any other prefix. In other words, if the treated identically to any other prefix. In other words, if the
longest matching prefix is a VP, then the packet is routed towards longest matching prefix is a VP, then the packet is routed towards
the VP. If a packet matching a VP reaches an Aggregation Point the VP. If a packet matching a VP reaches an Aggregation Point
Router (APR) for that VP, and the APR does not have a better matching Router (APR) for that VP, and the APR does not have a better matching
route, then the packet is discarded by the APR (just as a router that route, then the packet is discarded by the APR (just as a router that
originates any prefix will discard a packet that does not have a originates any prefix will discard a packet that does not have a
better match). better match).
The overall semantics of VPs, however, are subtly different from The overall semantics of VPs, however, are slightly different from
those of real prefixes (well, maybe not so subtly). Without VA, when those of real prefixes. Without VA, when a router originates a route
a router originates a route for a (real) prefix, the expectation is for a (real) prefix, the expectation is that the addresses within the
that the addresses within the prefix are within the originating AS prefix are within the originating AS (or a customer of the AS). For
(or a customer of the AS). For VPs, this is not the case. APRs VPs, this is not the case. APRs originate VPs whose sub-prefixes
originate VPs whose sub-prefixes exist in different ASes. Because of exist in different ASes. Because of this, it is important that VPs
this, it is important that VPs not be advertised across AS not be advertised across AS boundaries.
boundaries.
It is up to individual domains to define their own VPs. VPs MUST be It is up to individual domains to define their own VPs. VPs MUST be
"larger" (span a larger address space) than any real sub-prefix. If "larger" (span a larger address space) than any real sub-prefix. If
a VP is smaller than a real prefix, then packets that match the real a VP is smaller than a real prefix, then packets that match the real
prefix will nevertheless be routed to an APR owning the VP, at which prefix will nevertheless be routed to an APR owning the VP, at which
point the packet will be dropped if it does not match a sub-prefix point the packet will be dropped if it does not match a sub-prefix
within the VP (Section 5). within the VP (Section 6).
(Note that in principle there are cases where a VP could be smaller (Note that in principle there are cases where a VP could be smaller
than a real prefix. This is where the egress router to the real than a real prefix. This is where the egress router to the real
prefix is a VA router. In this case, the APR could theoretically prefix is a VA router. In this case, the APR could theoretically
tunnel the packet to the appropriate remote ASBR, which would then tunnel the packet to the appropriate remote ASBR, which would then
forward the packet correctly. On the other hand, if the egress forward the packet correctly. On the other hand, if the egress
router is a legacy router, then the APR could not tunnel matching router is a legacy router, then the APR could not tunnel matching
packets to the egress. This is because the egress would view the VP packets to the egress. This is because the egress would view the VP
as a better match, and would loop the packet back to the APR. For as a better match, and would loop the packet back to the APR. For
this reason we require that VPs be larger than any real prefixes, and this reason we require that VPs be larger than any real prefixes, and
that APR's never install prefixes larger than a VP in their FIBs.) that APRs never install prefixes larger than a VP in their FIBs.)
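A toy hop-by-hop model can reproduce the loop in the legacy-egress case. In the sketch below, the router names, the prefixes, and the `resolve` and `trace` helpers are hypothetical. A real prefix 20.0.0.0/7 is reachable through a legacy egress LR, while the ISP has wrongly configured the smaller VP 20.0.0.0/8. After the packet is detunneled at LR, LR's longest match is the VP, which sends the packet back to the APR. With a VA egress, the APR tunnels straight to the remote ASBR and no loop forms.

```python
import ipaddress


def resolve(fib, dst):
    """Longest-prefix match over {prefix: next_router}; None if no match."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, target in fib.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, target)
    return None if best is None else best[1]


def trace(fibs, start, dst, max_steps=8):
    """Follow lookups router by router; stop outside the AS, on a drop, or on a loop."""
    path = [start]
    while start in fibs and len(path) <= max_steps:
        nxt = resolve(fibs[start], dst)
        if nxt is None:
            return path + ["DROP"]
        if nxt in path:
            return path + [nxt, "LOOP"]
        path.append(nxt)
        start = nxt
    return path


# Legacy egress: the misconfigured VP /8 beats the real /7 at LR.
legacy = {
    "APR": {"20.0.0.0/7": "LR"},
    "LR": {"20.0.0.0/7": "EXT", "20.0.0.0/8": "APR"},
}
# VA egress: the APR tunnels directly to the remote ASBR (EXT).
va = {"APR": {"20.0.0.0/7": "EXT"}}
print(trace(legacy, "APR", "20.1.2.3"))  # ['APR', 'LR', 'APR', 'LOOP']
print(trace(va, "APR", "20.1.2.3"))      # ['APR', 'EXT']
```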
It is valid for a VP to be a subset of another VP. For example, 20/7 It is valid for a VP to be a subset of another VP. For example, 20/7
and 20/8 can both be VPs. In fact, this capability is necessary for and 20/8 can both be VPs. In fact, this capability is necessary for
"splitting" a VP without temporarily the FIB size in any router. "splitting" a VP without temporarily increasing the FIB size in any
(Section 3.2.2.5). router (Section 3.1.2.5).
3.2.2.3. Aggregation Point Routers (APR) 3.1.2.3. Aggregation Point Routers (APR)
Any router may be configured as an Aggregation Point Router (APR) for Any router MAY be configured as an Aggregation Point Router (APR) for
one or more Virtual Prefixes (VP). For each VP for which a router is one or more Virtual Prefixes (VP). For each VP for which a router is
an APR, the router does the following: an APR, the router does the following:
1. The APR MUST originate a BGP route to the VP [RFC4271]. In this 1. The APR MUST originate a BGP route to the VP [RFC4271]. In this
route, the NLRI are all of the VPs for which the router is an route, the NLRI are all of the VPs for which the router is an
APR. This is true even for VPs that are a subset of another VP. APR. This is true even for VPs that are a subset of another VP.
The ORIGIN is set to INCOMPLETE (value 2), the AS number of the The ORIGIN is set to INCOMPLETE (value 2), the AS number of the
APR's AS is used in the AS_PATH, and the BGP NEXT_HOP is set to APR's AS is used in the AS_PATH, and the BGP NEXT_HOP is set to
the address of the APR. The ATOMIC_AGGREGATE and AGGREGATOR the address of the APR. The ATOMIC_AGGREGATE and AGGREGATOR
attributes are not included. attributes are not included.
2. The APR MUST attach a NO_EXPORT Communities Attribute [RFC1997] 2. The APR MUST attach a NO_EXPORT Communities Attribute [RFC1997]
to the route. to the route.
3. The APR MUST be able to detunnel packets addressed to itself at 3. The APR MUST be able to detunnel packets addressed to itself at
its BGP NEXT_HOP address. It MUST also be able to convey the its BGP NEXT_HOP address. It MUST also be able to convey the
tunnel information needed by other routers to initiate tunneled tunnel information needed by other routers to initiate tunneled
packets to them (Requirement R1). packets to them (Requirement R1).
4. If a packet is received at the APR whose best match is the VP 4. If a packet is received at the APR whose best match route is the
(i.e. it matches the VP but not any sub-prefixes within the VP), VP (i.e. it matches the VP but not any sub-prefixes within the
then the packet MUST be discarded (see Section 3.2.2.2). This VP), then the packet MUST be discarded (see Section 3.1.2.2).
can be accomplished by never installing a prefix larger than the This can be accomplished by never installing a prefix larger than
VP into the FIB, or by installing the VP as a route to /dev/null. the VP into the FIB, or by installing the VP as a route to
/dev/null.
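The APR steps listed above can be sketched as two small functions. The first builds the single BGP route that carries all of the APR's VPs, with the attributes the list calls for. The second builds the APR's FIB so that each VP acts as a discard route behind its more specific sub-prefixes. This is a data-model sketch only, not an interface to any real BGP implementation. The class and function names, the AS number 64500, and the addresses are illustrative assumptions, and all prefixes are assumed to be IPv4.

```python
import ipaddress
from dataclasses import dataclass

ORIGIN_INCOMPLETE = 2   # ORIGIN attribute value INCOMPLETE (RFC 4271)
NO_EXPORT = 0xFFFFFF01  # well-known NO_EXPORT community (RFC 1997)


@dataclass
class VpAdvertisement:
    nlri: list
    next_hop: str
    as_path: tuple
    origin: int = ORIGIN_INCOMPLETE
    communities: tuple = (NO_EXPORT,)
    # ATOMIC_AGGREGATE and AGGREGATOR are deliberately not modeled.


def originate_vp_route(apr_vps, local_as, apr_address):
    """One route whose NLRI lists every VP this APR serves, nested VPs included."""
    nlri = sorted({ipaddress.ip_network(v) for v in apr_vps})
    return VpAdvertisement(nlri=nlri, next_hop=apr_address, as_path=(local_as,))


def apr_fib(apr_vps, rib_routes):
    """Install each VP as a discard route, plus every strictly smaller sub-prefix."""
    vps = [ipaddress.ip_network(v) for v in apr_vps]
    fib = {vp: "discard" for vp in vps}  # unmatched packets are dropped
    for prefix, next_hop in rib_routes.items():
        p = ipaddress.ip_network(prefix)
        if any(p.subnet_of(vp) and p.prefixlen > vp.prefixlen for vp in vps):
            fib[p] = next_hop  # never a prefix larger than the VP
    return fib


route = originate_vp_route(["20.0.0.0/8", "20.0.0.0/7"], 64500, "192.0.2.1")
print([str(n) for n in route.nlri])  # ['20.0.0.0/7', '20.0.0.0/8']
fib = apr_fib(["20.0.0.0/8"], {"20.1.0.0/16": "198.51.100.7", "64.0.0.0/10": "203.0.113.9"})
print({str(k): v for k, v in fib.items()})
# {'20.0.0.0/8': 'discard', '20.1.0.0/16': '198.51.100.7'}
```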
3.2.2.3.1. Selecting APRs 3.1.2.3.1. Selecting APRs
An ISP is free to select APRs however it chooses. The details of An ISP is free to select APRs however it chooses. The details of
this are outside the scope of this document. Nevertheless, a few this are outside the scope of this document. Nevertheless, a few
comments are made here. In general, APRs should be selected such comments are made here. In general, APRs should be selected such
that the distance to the nearest APR for any VP is small---ideally that the distance to the nearest APR for any VP is small---ideally
within the same POP. Depending on the number of routers in a POP, within the same POP. Depending on the number of routers in a POP,
and the sizes of the FIBs in the routers relative to the DFRT size, and the sizes of the FIBs in the routers relative to the DFRT size,
it may not be possible for all VPs to be represented in a given POP. it may not be possible for all VPs to be represented in a given POP.
In addition, there should be multiple APRs for each VP, again ideally In addition, there should be multiple APRs for each VP, again ideally
in each POP, so that the failure of one does not unduly disrupt in each POP, so that the failure of one does not unduly disrupt
traffic. traffic.
APRs may be (and probably should be) statically assigned. They may
also, however, be dynamically assigned, for instance in response to
APR failure. For instance, each router may be assigned as a backup
APR for some other APR. If the other APR crashes (as indicated by
the withdrawal of its routes to its VPs), the backup APR can install
the appropriate sub-prefixes and advertise the VP as specified above.
Note that doing so may require it to first remove some popular
prefixes from its FIB to make room.
Note that, although VPs MUST be larger than real prefixes, there is Note that, although VPs MUST be larger than real prefixes, there is
intentionally no mechanism designed to automatically ensure that this intentionally no mechanism designed to automatically ensure that this
is the case. Such a mechanism would be dangerous. For instance, if is the case. Such a mechanism would be dangerous. For instance, if
an ISP somewhere advertised a very large prefix (a /4, say), then an ISP somewhere advertised a very large prefix (a /4, say), then
this would cause APRs to throw out all VPs that are smaller than this would cause APRs to throw out all VPs that are smaller than
this. For this reason, VPs must be set through static configuration this. For this reason, VPs MUST be set through static configuration
only. only.
3.2.2.4. Non-APR Routers 3.1.2.4. Non-APR Routers
A non-APR router MUST install at least the following routes: A non-APR router MUST install at least the following routes:
1. Routes to VPs (identifiable using the VP-List). 1. Routes to VPs (identifiable using the VP-List).
2. Routes to the largest of any prefixes that contain a given VP. 2. Routes to all sub-prefixes that are not covered by any VP in the
(Note that although this is not supposed to happen, if it does VP-list.
the non-APR should install it, with the effect that any addresses
in the prefix not covered by VPs will be routed outside the
domain.)
3. Routes to all prefixes that contain an address that is in part of
the address space for which no VP is defined (i.e. as is done
today without VA).
If the non-APR has a tunnel to the BGP NEXT_HOP of any such route, it If the non-APR has a tunnel to the BGP NEXT_HOP of any such route, it
MUST use the tunnel to forward packets to the BGP NEXT_HOP. MUST use the tunnel to forward packets to the BGP NEXT_HOP.
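The install rules above, as written in the -02 text, can be sketched as a filter over the RIB. Popular Prefixes are the optional extra that a non-APR MAY install. The function name and the example prefixes are hypothetical. The sketch decides only which prefixes enter the FIB, not which tunnel they use.

```python
import ipaddress


def non_apr_installs(rib_prefixes, vp_list, popular=()):
    """Return the prefixes a non-APR router places in its FIB."""
    vps = {ipaddress.ip_network(v) for v in vp_list}
    pops = {ipaddress.ip_network(p) for p in popular}
    installed = []
    for prefix in rib_prefixes:
        p = ipaddress.ip_network(prefix)
        covered = any(p.version == vp.version and p.subnet_of(vp) for vp in vps)
        if p in vps:          # rule 1: routes to VPs
            installed.append(p)
        elif not covered:     # rule 2: sub-prefixes not covered by any VP
            installed.append(p)
        elif p in pops:       # optional: Popular Prefixes (MAY)
            installed.append(p)
    return installed


rib = ["20.0.0.0/7", "20.1.0.0/16", "21.5.0.0/16", "64.0.0.0/10"]
installed = non_apr_installs(rib, ["20.0.0.0/7"], popular=["21.5.0.0/16"])
print([str(p) for p in installed])  # ['20.0.0.0/7', '21.5.0.0/16', '64.0.0.0/10']
```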
When an APR fails, routers MUST select another APR to send packets to When an APR fails, routers must select another APR to send packets to
(if there is one). This happens, however, through normal internal (if there is one). This happens, however, through normal internal
BGP convergence mechanisms. Note that it is strongly recommended BGP convergence mechanisms.
that routers keep at least two VP routes in their RIB at all times.
The main reason is that if the currently used VP route is withdrawn,
the second VP route can be immediately installed, and the issue of
whether to temporarily install sub-prefixes in the FIB is avoided
(Section 3.2.5). Another reason is that the IGP can be used to even
more quickly detect that the APR has crashed, again allowing the
second VP route to be immediately installed.
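The sketch below illustrates the failover behavior described above. It picks the nearest APR for a VP by IGP cost. When an APR is marked failed, it falls back to the next-nearest one. It is a sketch only: the router names, the `igp_cost` table, and the `nearest_apr` helper are hypothetical. In a real network this choice is the outcome of ordinary iBGP best-path selection, not separate code.

```python
def nearest_apr(vp_routes, igp_cost, failed=frozenset()):
    """Return the live APR with the lowest IGP cost, or None.

    vp_routes: APRs currently advertising a route for the VP
    igp_cost:  IGP distance from this router to each APR (hypothetical table)
    failed:    APRs known to be down (e.g. their VP route was withdrawn)
    """
    candidates = [apr for apr in vp_routes if apr not in failed]
    return min(candidates, key=lambda apr: igp_cost[apr], default=None)


# Hypothetical topology: two APRs serve the same VP.
routes_for_vp = ["apr-pop1", "apr-pop2"]
cost = {"apr-pop1": 10, "apr-pop2": 30}

print(nearest_apr(routes_for_vp, cost))                              # apr-pop1
print(nearest_apr(routes_for_vp, cost, failed={"apr-pop1"}))         # apr-pop2
print(nearest_apr(routes_for_vp, cost, failed=set(routes_for_vp)))   # None
```

draft-ietf-grow-va-01.txt recommends keeping more than one VP route in the RIB. With that in place, the fallback in the second call is a purely local choice among routes already known. The router does not have to wait for a replacement route to arrive.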
3.1.2.5. Adding and deleting VPs
An ISP may from time to time wish to reconfigure its VP-List. There
are a number of reasons for this. For instance, early in its
deployment an ISP may configure one or a small number of VPs in order
to test VA. As the ISP gets more confident with VA, it may increase
the number of VPs. Or, an ISP may start with a small number of large
VPs (e.g. /4s or even a single /0), and over time move to a larger
number of smaller VPs in order to save even more FIB. In this case,
the ISP will need to "split" a VP. Finally, since the address space
is not uniformly populated with prefixes, the ISP may want to change
the size of VPs in order to balance FIB size across routers. This
can involve both splitting and merging VPs. Of course, an ISP must
be able to modify its VP-List without 1) interrupting service to any
destinations, or 2) temporarily increasing the size of any FIB (i.e.
the FIB size during the change must be no bigger than the larger of
its sizes before and after the change).
Adding a VP is straightforward. The first step is to configure the
APRs for the VP. This causes the APRs to originate routes for the
VP. Non-APR routers will install this route according to the rules
in Section 3.1.2.4 even though they do not yet recognize that the
prefix is a VP. Subsequently the VP is added to the VP-List of
non-APR routers. The non-APR routers can then start suppressing the
sub-prefixes with no loss of service.
To delete a VP, the process is reversed. First, the VP is removed
from the VP-Lists of non-APRs. This causes the non-APRs to install
the sub-prefixes. After all sub-prefixes have been installed, the VP
may be removed from the APRs.
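The ordering rules for adding and deleting a VP can be checked with a small model of a single non-APR. The sketch assumes the draft-ietf-grow-va-02 installation rules: VP routes, plus any route not covered by a VP on the VP-List. It ignores Popular Prefixes. All helper names are hypothetical. It checks only that every sub-prefix stays covered by some FIB entry after each step. It does not check the FIB-size requirement.

```python
from ipaddress import ip_network


def non_apr_fib(rib, vp_list):
    """FIB of a non-APR: VP routes, plus any route not inside a listed VP."""
    return {p for p in rib
            if p in vp_list or not any(p.subnet_of(vp) for vp in vp_list)}


def reachable(fib, destinations):
    """True if every destination prefix falls inside some FIB entry."""
    return all(any(d.subnet_of(f) for f in fib) for d in destinations)


def service_kept(steps, rib, vp_list, destinations):
    """Apply reconfiguration steps in order; False if any step black-holes."""
    rib, vp_list = set(rib), set(vp_list)
    if not reachable(non_apr_fib(rib, vp_list), destinations):
        return False
    for step in steps:
        step(rib, vp_list)
        if not reachable(non_apr_fib(rib, vp_list), destinations):
            return False
    return True


VP = ip_network("10.0.0.0/8")
SUBS = {ip_network("10.1.0.0/16"), ip_network("10.2.0.0/16")}


# Hypothetical reconfiguration steps.
def apr_originates(rib, vps):
    rib.add(VP)       # APR configured for the VP: its route appears in the RIB


def apr_withdraws(rib, vps):
    rib.discard(VP)   # VP removed from the APR: its route is withdrawn


def list_add(rib, vps):
    vps.add(VP)       # VP added to the non-APR's VP-List


def list_remove(rib, vps):
    vps.discard(VP)   # VP removed from the non-APR's VP-List


print(service_kept([apr_originates, list_add], SUBS, set(), SUBS))          # True
print(service_kept([list_add, apr_originates], SUBS, set(), SUBS))          # False
print(service_kept([list_remove, apr_withdraws], SUBS | {VP}, {VP}, SUBS))  # True
print(service_kept([apr_withdraws, list_remove], SUBS | {VP}, {VP}, SUBS))  # False
```

Reversing either order opens a window in which the non-APR suppresses the sub-prefixes while no VP route exists. That window is the loss of service the procedure is designed to avoid.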
In many cases, it is desirable to split a VP. For instance, consider
skipping to change at page 17, line 47 skipping to change at page 17, line 48
forward packets to the APRs for the smaller VPs. Next, the larger VP
can be removed from the VP-Lists of all non-APR routers. Finally,
the larger VP can be removed from its APRs.
To merge two VPs, the new larger VP is configured in all non-APRs.
This has no effect on FIB size or APR selection, since the smaller
VPs are better matches. Next, the larger VP is configured in its
selected APRs. Next, the smaller VPs are deleted from all non-APRs.
Finally, the smaller VPs are deleted from their corresponding APRs.
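The merge procedure relies on longest-match. While the smaller VPs are still on the VP-List, they win over the newly added larger VP. The sketch below uses Python's standard `ipaddress` module. It lists the merge steps in the order given above, then shows the longest-match VP before and after the smaller VPs are removed. The helper names and step labels are hypothetical.

```python
from ipaddress import ip_address, ip_network


def merge_plan(vp_a, vp_b):
    """Ordered (who, action, prefix) steps for merging two sibling VPs."""
    parent = vp_a.supernet()
    if vp_a == vp_b or vp_b.supernet() != parent:
        raise ValueError("VPs must be the two halves of one larger prefix")
    return [
        ("non-APRs", "add to VP-List", parent),
        ("APRs for larger VP", "configure VP", parent),
        ("non-APRs", "remove from VP-List", vp_a),
        ("non-APRs", "remove from VP-List", vp_b),
        ("APRs for smaller VPs", "remove VP", vp_a),
        ("APRs for smaller VPs", "remove VP", vp_b),
    ]


def best_vp(addr, vp_list):
    """Longest-match VP containing addr, or None."""
    matches = [vp for vp in vp_list if addr in vp]
    return max(matches, key=lambda vp: vp.prefixlen, default=None)


low, high = ip_network("10.0.0.0/9"), ip_network("10.128.0.0/9")
for who, action, prefix in merge_plan(low, high):
    print(f"{who}: {action} {prefix}")

dest = ip_address("10.200.1.1")
print(best_vp(dest, {low, high, ip_network("10.0.0.0/8")}))  # 10.128.0.0/9
print(best_vp(dest, {ip_network("10.0.0.0/8")}))             # 10.0.0.0/8
```

The first `best_vp` call shows why adding the larger VP first changes nothing: the /9 still wins. Only after the /9s leave the VP-List does the /8 take over.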
3.1.3. Border VA Routers
draft-ietf-grow-va-01.txt:
VA routers that are border routers MUST do the following: When
forwarding externally-received routes over iBGP, the BGP NEXT_HOP
attribute MUST be set to the remote ASBR. They MUST establish
tunnels that have the following properties (Requirement R2 in
companion documents):
1. The tunnel target must be the remote ASBR BGP NEXT_HOP address.
In other words, the target address used by other routers in the
domain for tunneling packets is the remote ASBR address.
2. The border router must detunnel the packet before forwarding the
packet to the remote ASBR. In other words, the remote ASBR
receives a normal untunneled packet identical to the packet it
would receive without VA.
3. The border router must be able to forward the packet without a
FIB lookup. In other words, the tunnel information itself
contains all the information needed by the border router to know
which remote ASBR should receive the packet.
Note that there are a number of ways the above tunnel can be created,
as documented separately. For instance, the label on an MPLS LSP
could identify the remote ASBR, and the border router could use what
is effectively penultimate hop popping to deliver the packet. Or, GRE
could be used whereby the outer IP header addresses the border
router, and the GRE key value identifies the remote ASBR.
draft-ietf-grow-va-02.txt:
A VA router that is an ASBR MUST do the following:
1. When forwarding externally-received routes over iBGP, if a tunnel
with an inner label is used, the ASBR MUST set the BGP NEXT_HOP
attribute to itself. Otherwise, the BGP NEXT_HOP attribute is
left unchanged.
2. It MUST establish tunnels as described in Section 4.
3. The ASBR MUST detunnel the packet before forwarding the packet to
the remote ASBR. In other words, the remote ASBR receives a
normal untunneled packet identical to the packet it would receive
without VA.
4. The ASBR MUST be able to forward the packet without a FIB lookup.
In other words, the tunnel information itself contains all the
information needed by the ASBR to know which remote ASBR should
receive the packet.
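Requirements 1 and 4 of the draft-ietf-grow-va-02 list can be sketched together. When the ASBR uses an inner label, it rewrites the NEXT_HOP to itself and attaches a label. Following Section 4.2, that label is one per remote ASBR. Otherwise it passes the route on unchanged. In the data plane, the label alone selects the remote ASBR, with no longest-prefix FIB lookup. The `Route` and `VaAsbr` types, the addresses (from documentation ranges), and the label values are hypothetical. The actual label allocation policy is not specified here.

```python
from dataclasses import dataclass, replace
from itertools import count
from typing import Optional


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str                  # BGP NEXT_HOP
    label: Optional[int] = None    # inner label carried with the route, if any


class VaAsbr:
    def __init__(self, my_address, use_inner_label, first_label=1000):
        self.my_address = my_address
        self.use_inner_label = use_inner_label
        self._labels = count(first_label)   # hypothetical label allocator
        self.label_for_peer = {}            # remote ASBR address -> inner label
        self.peer_for_label = {}            # inner label -> remote ASBR (data plane)

    def to_ibgp(self, ebgp_route):
        """Rewrite an externally received route before sending it over iBGP."""
        if not self.use_inner_label:
            return ebgp_route               # NEXT_HOP stays the remote ASBR
        peer = ebgp_route.next_hop
        if peer not in self.label_for_peer:
            label = next(self._labels)
            self.label_for_peer[peer] = label
            self.peer_for_label[label] = peer
        return replace(ebgp_route, next_hop=self.my_address,
                       label=self.label_for_peer[peer])

    def egress(self, inner_label):
        """Pick the remote ASBR from the label alone: exact match, no FIB lookup."""
        return self.peer_for_label[inner_label]


asbr = VaAsbr("192.0.2.1", use_inner_label=True)
r = asbr.to_ibgp(Route("203.0.113.0/24", next_hop="198.51.100.7"))
print(r)  # Route(prefix='203.0.113.0/24', next_hop='192.0.2.1', label=1000)
print(asbr.egress(r.label))  # 198.51.100.7
```

Without an inner label, the sketch leaves the route untouched. In that case the tunnel itself must identify the remote ASBR, as in Section 4.1.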
3.1.4. Advertising and Handling Sub-Prefixes
Sub-prefixes are advertised and handled by BGP as normal. VA does
not affect this behavior. The only difference in the handling of
sub-prefixes is that they might not be installed in the FIB, as
described in Section 3.1.5.
In those cases where the route is installed, packets forwarded to
prefixes external to the AS MUST be transmitted via the tunnel
established as described in Section 3.1.3.
3.1.5. Suppressing FIB Sub-prefix Routes
Any route not for a known VP (i.e. not in the VP-List) is taken to be
a sub-prefix. The following rules are used to determine if a
sub-prefix route can be suppressed.
1. A VA router MUST NOT FIB-install a sub-prefix route for which
there is no tunnel to the BGP NEXT_HOP address. This is to
prevent a loop whereby the APR forwards the packet hop-by-hop
towards the next hop, but a router on the path that has
FIB-suppressed the sub-prefix forwards it back to the APR. If there
is an alternate route to the sub-prefix for which there is a
tunnel, then that route SHOULD be selected, even if it is less
attractive according to the normal BGP best path selection
algorithm.
2. If the router is an APR, a route for every sub-prefix within the
VP MUST be FIB-installed (subject to the above limitation that
there be a tunnel).
3. If a non-APR router has a sub-prefix route that does not fall
within any VP (as determined by the VP-List), then the route MUST
be installed. This may occur because the ISP hasn't defined a VP
covering that prefix, for instance during an incremental
deployment buildup.
draft-ietf-grow-va-01.txt:
4. If a non-APR router does not have a route for a known VP, then it
MAY or MAY NOT install sub-prefixes within that VP. Whether or
not it does is up to the vendor and the network operator. One
approach is to never install such sub-prefixes, on the assumption
that the network operator will engineer the network so that this
rarely if ever happens.
5. Another approach is to have routers install such sub-prefixes,
but to take care not to do so if the missing VP route is a
transient condition. For instance, if the router is booting up,
and simply has not yet received all of its routes, then it can
reasonably expect to receive a VP route soon and so SHOULD NOT
install the sub-prefixes. On the other hand, if a continuously
operating router had only a single remaining route for the VP,
and that route is withdrawn, then the router might not expect to
receive a replacement VP route soon and so SHOULD install the
sub-prefixes. Obviously a router can't predict the future with
certainty, so the following algorithm might be a useful way to
manage whether or not to install sub-prefixes for a missing VP
route:
* Define a timer MISSING_VP_TIMER, set for a relatively short
time (say 10 seconds or so).
* Start the timer when either: 1) the last VP route is
withdrawn, or 2) there are initially neither VP routes nor
sub-prefix routes, and the first sub-prefix route is received.
* When the timer expires, install sub-prefix routes. Note,
however, that optional routes may first need to be removed
from the FIB to make room for the new sub-prefix routes. If
even after removing optional routes there is no room in the
FIB for sub-prefix routes, then they should remain suppressed.
In other words, sub-prefix entries required by virtue of being
an APR take priority over sub-prefix entries required by
virtue of not having a VP route.
6. All other sub-prefix routes MAY be suppressed. Such "optional"
sub-prefixes that are nevertheless installed are referred to as
popular prefixes.
draft-ietf-grow-va-02.txt:
4. If an ASBR is using strict uRPF to do ingress filtering, then it
MUST install routes for which the remote ASBR is the BGP NEXT_HOP
[RFC2827]. Note that only an APR may do loose uRPF filtering, and
then only for routes to sub-prefixes within its VPs.
5. All other sub-prefix routes MAY be suppressed. Such "optional"
sub-prefixes that are nevertheless installed are referred to as
Popular Prefixes. Note, however, that whether or not to install
a given sub-prefix SHOULD NOT be based on whether or not there is
an active route to a VP in the VP-List. This avoids the
situation whereby, during BGP initialization, the router receives
some sub-prefix routes before receiving the corresponding VP
route, with the result that it installs routes in its FIB that it
will only remove a short time later, possibly even overflowing
its FIB.
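The draft-ietf-grow-va-02 rules can be read as a classification of each sub-prefix route. The sketch below returns one of three outcomes: must install, must not install, or may suppress. All parameter names are hypothetical.

One ordering choice is an assumption. The sketch checks the uRPF rule (4) before the tunnel rule (1), on the reading that an ASBR reaches its directly connected remote ASBR without a tunnel. The text does not state how rules 1 and 4 interact. The sketch also applies rule 3 to any router, treating an APR as a non-APR for prefixes outside every VP.

```python
from ipaddress import ip_network

MUST_NOT_INSTALL = "must not install"
MUST_INSTALL = "must install"
MAY_SUPPRESS = "may suppress"


def fib_decision(prefix, next_hop, *, vp_list, apr_vps, tunnel_endpoints,
                 strict_urpf=False, remote_asbrs=frozenset()):
    """Classify one sub-prefix route (sketch of the draft-02 rules).

    vp_list:          VPs on this router's VP-List
    apr_vps:          VPs this router is an APR for
    tunnel_endpoints: BGP NEXT_HOPs reachable through a tunnel
    remote_asbrs:     NEXT_HOPs that are this ASBR's directly attached peers
    """
    # Rule 4, checked before rule 1 (assumption: the remote ASBR is reached
    # directly, without a tunnel).
    if strict_urpf and next_hop in remote_asbrs:
        return MUST_INSTALL
    # Rule 1: never install without a tunnel to the NEXT_HOP.
    if next_hop not in tunnel_endpoints:
        return MUST_NOT_INSTALL
    # Rule 2: an APR installs every sub-prefix inside its VPs.
    if any(prefix.subnet_of(vp) for vp in apr_vps):
        return MUST_INSTALL
    # Rule 3: routes outside every VP are installed as they are today.
    if not any(prefix.subnet_of(vp) for vp in vp_list):
        return MUST_INSTALL
    # Rule 5: everything else is optional (a Popular Prefix if installed).
    return MAY_SUPPRESS


VPS = [ip_network("10.0.0.0/9"), ip_network("10.128.0.0/9")]
ctx = dict(vp_list=VPS, apr_vps=[VPS[0]], tunnel_endpoints={"192.0.2.1"})

print(fib_decision(ip_network("10.1.0.0/16"), "192.0.2.1", **ctx))    # must install
print(fib_decision(ip_network("10.200.0.0/16"), "192.0.2.1", **ctx))  # may suppress
print(fib_decision(ip_network("172.16.0.0/16"), "192.0.2.1", **ctx))  # must install
print(fib_decision(ip_network("10.1.0.0/16"), "192.0.2.99", **ctx))   # must not install
print(fib_decision(ip_network("10.200.0.0/16"), "198.51.100.7", **ctx,
                   strict_urpf=True, remote_asbrs={"198.51.100.7"}))  # must install
```

The function never looks at whether a VP route is currently present. This matches the -02 guidance that the install decision SHOULD NOT depend on it.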
3.1.5.1. Selecting Popular Prefixes
Individual routers MAY independently choose which sub-prefixes are
Popular Prefixes. There is no need for different routers to install
the same sub-prefixes. There is therefore significant leeway as to
how routers select Popular Prefixes. As a general rule, routers
should fill the FIB as much as possible, because the cost of doing so
is relatively small, and more FIB entries lead to fewer packets
taking a longer path. Broadly speaking, an ISP may choose to fill
the FIB by making routers APRs for as many VPs as possible, or by
assigning relatively few APRs and instead filling the FIB with
Popular Prefixes. Several basic approaches to selecting Popular
Prefixes are outlined here. Router vendors are free to implement
whatever approaches they want.
1. Policy-based: The simplest approach for network administrators is
to have broad policies that routers use to determine which
sub-prefixes are designated as popular. An obvious policy would be a
"customer routes" policy, whereby all customer routes are
installed (as identified for instance by appropriate community
attribute tags). Another policy would be for a router to install
prefixes originated by specific ASes. For instance, two ISPs
could mutually agree to install each other's originated prefixes.
A third policy might be to install prefixes with the shortest
AS-path.
2. Static list: Another approach would be to configure static lists
of specific prefixes to install. For instance, prefixes
associated with an SLA might be configured. Or, a list of
prefixes for the most popular websites might be installed.
3. High-volume prefixes: By installing high-volume prefixes as
Popular Prefixes, the latency and load associated with the longer
path required by VA are minimized. One approach would be for an
ISP to measure its traffic volume over time (days or a few
weeks), and statically configure high-volume prefixes as Popular
Prefixes. There is strong evidence that prefixes that are
high-volume tend to remain high-volume over multi-day or multi-week
timeframes (though not necessarily at short timeframes like
minutes or seconds). High-volume prefixes MAY also be installed
dynamically. In other words, a router measures its own traffic
volumes, and installs and removes Popular Prefixes in response to
short-term traffic load. The downside of this approach is that
it complicates debugging network problems. If packets are being
dropped somewhere in the network, it is more difficult to find
out where if the selected path can change dynamically.
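The sketch below combines the policy-based and high-volume approaches. Policy prefixes (for example customer routes) are taken first. The remaining FIB space is then filled with the highest-volume optional sub-prefixes. The capacity figure, the traffic counters, and the function name are assumptions for illustration. As in the text, it leaves open how counters are collected and how often the selection is recomputed (statically over days, or dynamically).

```python
import heapq


def select_popular(volumes, required, capacity, policy=()):
    """Choose optional sub-prefixes to install as Popular Prefixes.

    volumes:  {prefix: traffic measured toward it (e.g. bytes per day)}
    required: FIB entries the suppression rules force in (VP routes, ...)
    capacity: total number of FIB entries available
    policy:   prefixes a configured policy says to install first
    """
    budget = capacity - len(required)
    chosen = []
    for prefix in policy:              # policy-based selection first
        if budget <= 0:
            break
        if prefix not in required and prefix not in chosen:
            chosen.append(prefix)
            budget -= 1
    # Fill what is left with the highest-volume remaining prefixes.
    rest = [p for p in volumes if p not in required and p not in chosen]
    chosen += heapq.nlargest(max(budget, 0), rest, key=volumes.get)
    return chosen


volumes = {"10.1.0.0/16": 900, "10.2.0.0/16": 50,
           "10.3.0.0/16": 400, "10.4.0.0/16": 700}
required = {"10.128.0.0/9", "172.16.0.0/16"}
print(select_popular(volumes, required, capacity=5, policy=["10.2.0.0/16"]))
# ['10.2.0.0/16', '10.1.0.0/16', '10.4.0.0/16']
```

Giving policy prefixes priority over measured volume is a design choice of this sketch. It reflects the SLA and customer-route examples above, and an operator could weigh the two differently.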
draft-ietf-grow-va-01.txt:
3.2.6. Core-Edge Operation
A common style of router deployment in ISPs is the "core-edge"
deployment, whereby there is a core of high-capacity routers
surrounded by potentially lower-capacity "edge" routers that may not
carry the whole DFRT, and which default route to a core router. VA
can support this style of configuration by effectively defining a
single VP as 0/0, and by defining core routers to be APRs for 0/0.
This results in core routers maintaining full FIBs, and edge routers
having potentially extremely small FIBs. The advantage of using VA
to support core-edge topologies is that, with VA, any edge router,
including those peering with other ISPs, can have a small FIB. Today
such routers must maintain the full DFRT in order to peer.
Vendors may wish to facilitate configuration of a core-edge style of
VA for their customers that already use a core-edge topology. In
other words, a vendor may wish to simplify the VA configuration task
so that a customer merely needs to configure which of its routers are
core and which are edge, and the appropriate VA configuration, i.e.
the VP-List, tunnels, and popular prefixes, is automatically done
"under the hood" so to speak. Note that, under a core-edge
configuration, it isn't strictly necessary for core routers to
advertise the 0/0 VP within BGP. Rather, edge routers could rely on
their default route to a core router.
3.3. Requirements Discussion
This section describes the extent to which VA satisfies the list of
requirements given in Section 3.1.
3.3.1. Response to router failure
VA introduces a new failure mode in the form of Aggregation Point
Router (APR) failure. There are two basic approaches to protecting
against APR failure: static APR redundancy and dynamic APR
assignment (see Section 3.2.2.3.1). In static APR redundancy, enough
APRs are assigned for each Virtual Prefix (VP) so that if one goes
down, there are others to absorb its load. Failover to a static
redundant APR is automatic with existing BGP mechanisms. If an APR
crashes, BGP will cause packets to be routed to the next nearest APR.
Nevertheless, there are three concerns here: convergence time, load
increase at the redundant APR, and latency increase for diverted
flows.
Regarding convergence time, note that, while fast-reroute mechanisms
apply to the rerouting of packets to a given APR or egress router,
they don't apply to APR failure. Convergence time was discussed in
Section 3.2.2.4, which suggested that BGP convergence times are
likely to be adequate, and that if they are not, IGP mechanisms may
be used.
Regarding load increase, in general this is relatively small. This
is because substantial reductions in FIB size can be achieved with
almost negligible increase in load. For instance, [nsdi09] shows
that a 5x reduction in FIB size yields a less than one percent
increase in load overall. Given this, depending on the configuration
of redundant APRs, failure of one APR increases the load of its
backups by only a few percent. This is well within the variation
seen in normal traffic loads.
Regarding latency increase, some flows may see a significant increase
in delay (and, specifically, an increase that puts them outside of
their SLA boundary). Normally a redundant APR would be placed within
the same POP, and so increased latency would be minimal (assuming
that load is also quite small, and so there is no significant queuing
delay). It is not always possible, however, to have an APR for every
VP within every POP, much less a redundant APR within every POP, and
so sometimes failure of an APR will result in significant latency
increases for a small fraction of traffic.
3.3.2. Traffic Engineering
VA complicates traffic engineering because the placement of APRs and
the selection of popular prefixes influence how packets flow. (To
repeat, though, the increase in load is likely to be minimal, and so
the effect on traffic engineering should not be great.) Since the
majority of packets may be forwarded by popular prefixes (and
therefore follow the shortest path), it is particularly important
that popular prefixes be selected appropriately. As discussed in
Section 3.2.5.1, there are static and dynamic approaches to this.
[nsdi09] shows that high-volume prefixes tend to stay high-volume for
many days, and so a static strategy is probably adequate.
VA can operate correctly using either RSVP-TE [RFC3209] or LDP to
establish tunnels.
3.3.3. Incremental and safe deploy and start-up
It must be possible to install and configure VA in a safe and
incremental fashion, as well as start it up when routers reboot.
This document allows for a mixture of VA and legacy routers, allows a
fraction or all of the address space to fall within virtual prefixes,
and allows different routers to suppress different FIB entries
(including none at all). As a result, it is generally possible to
deploy and test VA in an incremental fashion.
3.3.4. VA security
Regarding ingress filtering, because in VA the RIB is effectively
unchanged, routers contain the same information they have today for
installing ingress filters [RFC2827]. Presumably, installing an
ingress filter in the FIB takes up some memory space. Since ingress
filtering is most effective at the "edge" of the network (i.e. at the
customer interface), the number of FIB entries for ingress filtering
should remain relatively small, equal to the number of prefixes
owned by the customer. Whether this is true in all cases remains for
further study.
Regarding DoS attacks, there are two issues that need to be
considered. First, does VA result in new types of DoS attacks?
Second, does VA make it more difficult to deploy DoS defense systems?
Regarding the first issue, one possibility is that an attacker
targets a given router by flooding the network with traffic to
prefixes that are not popular, and for which that router is an APR.
This would cause a disproportionate amount of traffic to be forwarded
to the APR(s). While it is up to individual ISPs to decide if this
attack is a concern, the authors do not believe that this attack is
likely to significantly worsen the DoS problem.
Regarding DoS defense system deployment, more input about specific
systems is needed. It is the authors' understanding, however, that
at least some of these systems use dynamically established Routing
Table entries to divert victims' traffic into LSPs that carry the
traffic to scrubbers. The expectation is that this mechanism simply
overrides whatever route is in place (with or without VA), and so
the operation of VA should not limit the deployment of these types of
DoS defense systems. Nevertheless, more study is needed here.
3.2. New Configuration
VA places new configuration requirements on ISP administrators.
Namely, the administrator must:
1. Select VPs, and configure the VP-List into all VA routers. As a
general rule, having a larger number of relatively small prefixes
gives administrators the most flexibility in terms of filling
available FIB with sub-prefixes, and in terms of balancing load
across routers. Once an administrator has selected a VP-List, it
is just as easy to configure routers with a large list as a small
skipping to change at page 24, line 15 skipping to change at page 20, line 51
Routers should have the appropriate counters to allow
administrators to know the volume of APR traffic each router is
handling so as to adjust load by adding or removing APR
assignments.
3. Select and configure Popular Prefixes or Popular Prefix policies.
There are two general goals here. The first is to minimize load
overall by minimizing the number of packets that take longer
paths. The second is to ensure that specific selected prefixes
don't have overly long paths. These goals must be weighed
against the administrative overhead of configuring potentially
thousands of Popular Prefixes. As one example, a small ISP may
wish to keep it simple by doing nothing more than indicating that
customer routes should be installed. In this case, the
administrator could otherwise assign as many APRs as possible
while leaving enough FIB space for customer routes. As another
example, a large ISP could build a management system that takes
into consideration the traffic matrix, customer SLAs, robustness
requirements, FIB sizes, topology, and router capacity, and
periodically and automatically computes APR and Popular Prefix
assignments.
draft-ietf-grow-va-02.txt:
4. Usage of Tunnels
4.1. MPLS tunnels
VA uses a straightforward application of MPLS. The tunnels are
MPLS Label Switched Paths (LSPs), and are signaled using either the
Label Distribution Protocol (LDP) [RFC5036] or RSVP-TE [RFC3209].
Both VA and legacy routers MUST participate in this signaling.
APRs and ASBRs initiate tunnels. In both cases, tunnels are
initiated in Downstream Unsolicited mode to all IGP neighbors, with
the full BGP NEXT_HOP address as the Forwarding Equivalence Class
(FEC). In the case of APRs, the BGP NEXT_HOP is the APR's own
address. In the case of legacy ASBRs, the BGP NEXT_HOP is the ASBR's
own address. In the case of VA ASBRs, the BGP NEXT_HOP is that of
the remote ASBR. Existing Penultimate Hop Popping (PHP) mechanisms in
the data plane can be used for forwarding packets to remote ASBRs.
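The data-plane path to a remote ASBR can be pictured as a chain of label-table lookups. In the sketch below, the local ASBR is the penultimate hop for the FEC that names the remote ASBR. It therefore pops the label and hands the remote ASBR a plain IP packet without consulting its FIB. Router names, the FEC address, and label values are hypothetical. Label distribution itself (LDP or RSVP-TE) is not modeled.

```python
# Label forwarding tables:
#   router -> {incoming label: (operation, outgoing label, next hop)}
LFIB = {
    "P1":    {100: ("swap", 200, "ASBR1")},
    "ASBR1": {200: ("pop", None, "remote-ASBR")},   # PHP toward the remote ASBR
}

# Ingress mapping: FEC (the remote ASBR's BGP NEXT_HOP) -> (label to push, next hop)
FTN = {"198.51.100.7": (100, "P1")}


def deliver(fec):
    """Follow one packet from the ingress; return (path, labels left on arrival)."""
    label, router = FTN[fec]
    stack = [label]                      # stack[0] is the top label
    path = ["ingress", router]
    while stack:
        op, out_label, router = LFIB[router][stack[0]]
        if op == "swap":
            stack[0] = out_label
        elif op == "pop":
            stack.pop(0)
        else:
            raise ValueError(f"unknown operation {op!r}")
        path.append(router)
    return path, stack


print(deliver("198.51.100.7"))
# (['ingress', 'P1', 'ASBR1', 'remote-ASBR'], [])
```

The empty label stack on arrival corresponds to the requirement that the remote ASBR receive a normal, untunneled packet.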
4.2. Usage of Inner Label
Besides using a separate LSP to identify the remote ASBR as described
above, it is also possible to use an inner label to identify the
remote ASBR. Either an outer label or an IP tunnel identifies the
local ASBR.
When a local ASBR advertises a route into iBGP, it sets the NEXT_HOP
to itself, and assigns a label to the route. This label is used as
the inner label, and identifies the remote ASBR from which the route
was received [RFC3107].
The presence of the inner label in the iBGP update acts as the signal
to the receiving router that an inner label MUST be used in packets
tunneled to the NEXT_HOP address. If an LSP has been established to
the NEXT_HOP address, then it is used to tunnel the packet to the
NEXT_HOP address. Otherwise, the packet is encapsulated in an outer
IP header addressed to the NEXT_HOP address.
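The choice of outer encapsulation described above can be sketched as a small function. It assumes a route record carrying the BGP NEXT_HOP and an optional inner label, plus a table of LSPs keyed by their target address. The names, addresses, and label values are hypothetical. The function returns `None` when there is neither an inner label nor an LSP. This reflects the suppression rule that a sub-prefix route with no tunnel to its NEXT_HOP is not FIB-installed.

```python
from typing import NamedTuple, Optional


class IbgpRoute(NamedTuple):
    prefix: str
    next_hop: str
    inner_label: Optional[int] = None   # present if the iBGP update carried a label


def encapsulation(route, lsp_label_to):
    """Headers to push, outermost first, or None if there is no usable tunnel.

    lsp_label_to: {target address: label of an LSP established to it}
    """
    lsp = lsp_label_to.get(route.next_hop)
    if route.inner_label is None:
        # No inner label: an LSP targeted at the NEXT_HOP must carry the packet.
        return [("mpls", lsp)] if lsp is not None else None
    # Inner label present: outer LSP if one exists, else an outer IP header.
    outer = ("mpls", lsp) if lsp is not None else ("ip", route.next_hop)
    return [outer, ("mpls", route.inner_label)]


lsps = {"192.0.2.1": 300}
print(encapsulation(IbgpRoute("203.0.113.0/24", "192.0.2.1", 1000), lsps))
# [('mpls', 300), ('mpls', 1000)]
print(encapsulation(IbgpRoute("203.0.113.0/24", "192.0.2.5", 1001), lsps))
# [('ip', '192.0.2.5'), ('mpls', 1001)]
print(encapsulation(IbgpRoute("203.0.113.128/25", "198.51.100.7"), lsps))
# None
```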
5. IANA Considerations
There are no IANA considerations.
6. Security Considerations
We consider the security implications of VA under two scenarios, one
where VA is configured and operated correctly, and one where it is
mis-configured. A cornerstone of VA operation is that the basic
behavior of BGP doesn't change, especially inter-domain. Among other
things, this makes it easier to reason about security.
6.1. Properly Configured VA
If VA is configured and operated properly, then the external behavior
of an AS does not change. The same upstream ASes are selected, and
the same prefixes and AS-paths are advertised. Therefore, a properly
configured VA domain has no security impact on other domains.
draft-ietf-grow-va-01.txt:
This document discusses intra-domain security concerns in
Section 3.3.4, which argues that any new security concerns appear to
be relatively minor.
If another ISP starts advertising a prefix that is larger than a If another ISP starts advertising a prefix that is larger than a
given VP, this prefix will be ignored by APRs that have a VP that given VP, this prefix will be ignored by APRs that have a VP that
falls within the larger prefix (Section 3.2.2.3). As a result, falls within the larger prefix (Section 3.1.2.3). As a result,
packets that might otherwise have been routed to the new larger packets that might otherwise have been routed to the new larger
prefix will be dropped at the APRs. Note that the trend in the prefix will be dropped at the APRs. Note that the trend in the
Internet is towards large prefixes being broken up into smaller ones, Internet is towards large prefixes being broken up into smaller ones,
not the reverse. Therefore, such a larger prefix is likely to be not the reverse. Therefore, such a larger prefix is likely to be
invalid. If it is determined without a doubt that the larger prefix invalid. If it is determined without a doubt that the larger prefix
is valid, then the ISP will have to reconfigure its VPs. is valid, then the ISP will have to reconfigure its VPs.
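As a rough illustration of the behavior described above, the following Python sketch models an APR that installs only the prefixes falling within its VP. The function names, the VP 20.0.0.0/8, and the covering prefix 16.0.0.0/4 are hypothetical. The sketch ignores the rest of VA's machinery (tunnels, routes toward other APRs), handles IPv4 only, and uses only the standard ipaddress module.

```python
import ipaddress
from typing import Iterable, List, Optional

Net = ipaddress.IPv4Network  # IPv4 only, for brevity


def apr_fib(vp: str, received: Iterable[str]) -> List[Net]:
    """Hypothetical APR filter: keep only prefixes equal to or inside the VP.

    A prefix larger than the VP (one that covers it) is ignored. Prefixes
    outside the VP are also skipped here; they belong to other VPs.
    """
    vp_net = ipaddress.ip_network(vp)
    return [ipaddress.ip_network(p) for p in received
            if ipaddress.ip_network(p).subnet_of(vp_net)]


def lookup(fib: Iterable[Net], addr: str) -> Optional[Net]:
    """Longest-prefix match; None means no route, so the packet is dropped."""
    a = ipaddress.ip_address(addr)
    matches = [net for net in fib if a in net]
    return max(matches, key=lambda net: net.prefixlen, default=None)


received = ["20.1.0.0/16", "16.0.0.0/4"]  # 16.0.0.0/4 covers 20.0.0.0/8
fib = apr_fib("20.0.0.0/8", received)
print(fib)                       # [IPv4Network('20.1.0.0/16')]
print(lookup(fib, "20.1.2.3"))   # 20.1.0.0/16
print(lookup(fib, "20.2.0.1"))   # None: dropped at the APR
```

A router holding the full table would instead match 20.2.0.1 against 16.0.0.0/4; that is the traffic the text describes as being dropped at the APRs.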
5.2. Mis-configured VA VA does not change an ISP's ability to do ingress filtering using
strict uRPF (Section 3.1.5).
Regarding DoS attacks, two questions need to be considered. First,
does VA result in new types of DoS attacks? Second, does VA make it
more difficult to deploy DoS defense systems?
Regarding the first question, one possibility is that an attacker
targets a given router by flooding the network with traffic to
prefixes that are not popular and for which that router is an APR.
Because VA forwards such traffic to the APR(s) for the covering VP,
this would cause a disproportionate amount of traffic to be forwarded
to those APR(s). While it is up to individual ISPs to decide whether
this attack is a concern, the authors do not believe that it is
likely to worsen the DoS problem significantly.
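The attack can be illustrated with a toy load model. This is a sketch under simplifying assumptions, not part of the VA specification: each VP is assumed to have a single APR, traffic to a set of "popular" prefixes is assumed to be forwarded without detouring via an APR, and all other traffic to a VP is charged to that VP's APR. Router names, prefixes, and byte counts are hypothetical.

```python
import ipaddress
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def apr_load(flows: Iterable[Tuple[str, int]],
             vp_to_apr: Dict[str, str],
             popular: Set[str]) -> Counter:
    """Toy model: bytes forwarded via each (hypothetical) APR.

    Traffic to a popular prefix is counted under "direct" (assumed not to
    detour via an APR); other traffic is charged to the APR of its VP.
    Destinations outside every VP are ignored.
    """
    vps = [(ipaddress.ip_network(vp), apr) for vp, apr in vp_to_apr.items()]
    pops = [ipaddress.ip_network(p) for p in popular]
    load: Counter = Counter()
    for dst, nbytes in flows:
        addr = ipaddress.ip_address(dst)
        if any(addr in p for p in pops):
            load["direct"] += nbytes
            continue
        for vp, apr in vps:
            if addr in vp:
                load[apr] += nbytes
                break
    return load


vp_to_apr = {"20.0.0.0/8": "R1", "30.0.0.0/8": "R2"}
popular = {"20.1.0.0/16"}
normal = [("20.1.0.5", 900), ("30.4.0.1", 100)]
attack = [("20.77.0.1", 10_000)]  # unpopular destination inside R1's VP

print(apr_load(normal, vp_to_apr, popular)["R1"])           # 0
print(apr_load(normal + attack, vp_to_apr, popular)["R1"])  # 10000
```

In this model the flood lands entirely on R1, while the popular-prefix traffic and R2's share are unaffected.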
Many DoS defense systems use dynamically established routing table
entries to divert victims' traffic into LSPs that carry the traffic
to scrubbers. This mechanism works with VA: it simply overrides
whatever route is in place, and so works equally well on APRs and
non-APRs.
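The following sketch shows why the override works regardless of whether a router is an APR. It assumes, hypothetically, that the diversion entry is a host route more specific than any other FIB entry, so longest-prefix match selects it; real diversion systems may instead override a route of the same length by preference. FIB contents and next-hop labels are illustrative only.

```python
import ipaddress
from typing import Dict, Optional


def next_hop(fib: Dict[str, str], dst: str) -> Optional[str]:
    """Longest-prefix match over a {prefix: next hop} FIB."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, nh in fib.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, nh)
    return best[1] if best is not None else None


# Hypothetical FIB entries for destinations in the VP 20.0.0.0/8.
non_apr = {"20.0.0.0/8": "tunnel-to-APR"}         # VP route only
apr = {"20.1.0.0/16": "tunnel-to-egress-ASBR"}    # real prefix route
victim = "20.1.2.3"
divert = {"20.1.2.3/32": "lsp-to-scrubber"}       # dynamically injected entry

for name, fib in (("non-APR", non_apr), ("APR", apr)):
    before = next_hop(fib, victim)
    after = next_hop({**fib, **divert}, victim)
    print(f"{name}: {before} -> {after}")
```

On both router types the victim's traffic moves to the scrubber LSP, while other destinations keep their original next hop.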
6.2. Mis-configured VA
VA introduces the possibility that a VP is advertised outside of an VA introduces the possibility that a VP is advertised outside of an
AS. This should in fact be a low-probability event, but it is AS. This should in fact be a low-probability event, but it is
considered here nonetheless. considered here nonetheless.
If an AS leaks a large VP (i.e., larger than any real prefix), then If an AS leaks a large VP (i.e., larger than any real prefix), then
the impact is minimal. Smaller prefixes will be preferred because of the impact is minimal. Smaller prefixes will be preferred because of
longest-match semantics, and so the only impact is that packets that longest-match semantics, and so the only impact is that packets that
otherwise have no matching routes will be sent to the misbehaving AS otherwise have no matching routes will be sent to the misbehaving AS
and dropped there. If an AS leaks a small VP (i.e., smaller than a and dropped there. If an AS leaks a small VP (i.e., smaller than a
real prefix), then packets to addresses within that VP will be real prefix), then packets to addresses within that VP will be
hijacked by the misbehaving AS and dropped. This can happen with or hijacked by the misbehaving AS and dropped. This can happen with or
without VA, and so does not represent a new security problem per se. without VA, and so does not represent a new security problem per se.
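The two leak cases follow directly from longest-prefix match, as this sketch of route selection in a remote AS illustrates. The prefixes and AS labels are hypothetical, and the sketch considers only prefix length, not BGP path selection among routes for the same prefix.

```python
import ipaddress
from typing import Dict, Optional


def chosen_origin(routes: Dict[str, str], dst: str) -> Optional[str]:
    """Longest-prefix match over {prefix: advertising AS}; None = no route."""
    addr = ipaddress.ip_address(dst)
    matches = [(ipaddress.ip_network(p), asn) for p, asn in routes.items()
               if addr in ipaddress.ip_network(p)]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


real = {"20.1.0.0/16": "AS-A"}  # hypothetical real prefix and its origin

# Large VP leaked by AS-X: only otherwise-unroutable traffic is attracted.
large_leak = {**real, "20.0.0.0/8": "AS-X"}
print(chosen_origin(large_leak, "20.1.5.5"))    # AS-A
print(chosen_origin(large_leak, "20.9.0.1"))    # AS-X (no route without the leak)

# Small VP leaked by AS-X: it is more specific, so it hijacks that range.
small_leak = {**real, "20.1.128.0/17": "AS-X"}
print(chosen_origin(small_leak, "20.1.5.5"))    # AS-A
print(chosen_origin(small_leak, "20.1.200.1"))  # AS-X
```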
6. Acknowledgements 7. Acknowledgements
The authors would like to acknowledge the efforts of Xinyang Zhang The authors would like to acknowledge the efforts of Xinyang Zhang
and Jia Wang, who worked on CRIO (Core Router Integrated Overlay), an and Jia Wang, who worked on CRIO (Core Router Integrated Overlay), an
early inter-domain variant of FIB suppression, and the efforts of early inter-domain variant of FIB suppression, and the efforts of
Hitesh Ballani and Tuan Cao, who worked on the configuration-only Hitesh Ballani and Tuan Cao, who worked on the configuration-only
variant of VA that works with legacy routers. We would also like to variant of VA that works with legacy routers. We would also like to
thank Scott Brim, Daniel Ginsburg, and Rajiv Asati for their helpful thank Scott Brim, Daniel Ginsburg, and Rajiv Asati for their helpful
comments. In particular, Daniel's comments significantly simplified comments. In particular, Daniel's comments significantly simplified
the spec (eliminating the need for a new Extended Communities the spec (eliminating the need for a new Extended Communities
Attribute). Attribute).
7. References 8. References
7.1. Normative References 8.1. Normative References
[RFC1997] Chandrasekeran, R., Traina, P., and T. Li, "BGP [RFC1997] Chandrasekeran, R., Traina, P., and T. Li, "BGP
Communities Attribute", RFC 1997, August 1996. Communities Attribute", RFC 1997, August 1996.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998.
[RFC2827] Ferguson, P. and D. Senie, "Network Ingress Filtering: [RFC2827] Ferguson, P. and D. Senie, "Network Ingress Filtering:
Defeating Denial of Service Attacks which employ IP Source Defeating Denial of Service Attacks which employ IP Source
Address Spoofing", BCP 38, RFC 2827, May 2000. Address Spoofing", BCP 38, RFC 2827, May 2000.
[RFC3107] Rekhter, Y. and E. Rosen, "Carrying Label Information in [RFC3107] Rekhter, Y. and E. Rosen, "Carrying Label Information in
BGP-4", RFC 3107, May 2001. BGP-4", RFC 3107, May 2001.
[RFC3209] Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V., [RFC3209] Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V.,
and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP
Tunnels", RFC 3209, December 2001. Tunnels", RFC 3209, December 2001.
[RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway [RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
Protocol 4 (BGP-4)", RFC 4271, January 2006. Protocol 4 (BGP-4)", RFC 4271, January 2006.
[RFC4601] Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas, [RFC4601] Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas,
"Protocol Independent Multicast - Sparse Mode (PIM-SM): "Protocol Independent Multicast - Sparse Mode (PIM-SM):
Protocol Specification (Revised)", RFC 4601, August 2006. Protocol Specification (Revised)", RFC 4601, August 2006.
7.2. Informative References [RFC5036] Andersson, L., Minei, I., and B. Thomas, "LDP
Specification", RFC 5036, October 2007.
8.2. Informative References
[I-D.ietf-grow-simple-va]
Francis, P., Xu, X., Ballani, H., Raszuk, R., and L.
Zhang, "Simple Virtual Aggregation (S-VA)",
draft-ietf-grow-simple-va-00 (work in progress),
March 2010.
[I-D.ietf-grow-va-gre]
Francis, P., Raszuk, R., and X. Xu, "GRE and IP-in-IP
Tunnels for Virtual Aggregation",
draft-ietf-grow-va-gre-00 (work in progress), July 2009.
[I-D.ietf-grow-va-mpls]
Francis, P. and X. Xu, "MPLS Tunnels for Virtual
Aggregation", draft-ietf-grow-va-mpls-00 (work in
progress), May 2009.
[I-D.ietf-grow-va-mpls-innerlabel]
Xu, X. and P. Francis, "Proposal to use an inner MPLS
label to identify the remote ASBR VA",
draft-ietf-grow-va-mpls-innerlabel-00 (work in progress),
September 2009.
[nsdi09] Ballani, H., Francis, P., Cao, T., and J. Wang, "Making [nsdi09] Ballani, H., Francis, P., Cao, T., and J. Wang, "Making
Routers Last Longer with ViAggre", ACM Usenix NSDI 2009, Routers Last Longer with ViAggre", ACM Usenix NSDI 2009,
http://www.usenix.org/events/nsdi09/tech/full_papers/ballani/ballani.pdf, http://www.usenix.org/events/nsdi09/tech/full_papers/ballani/ballani.pdf,
April 2009. April 2009.
Authors' Addresses Authors' Addresses
Paul Francis Paul Francis
Max Planck Institute for Software Systems Max Planck Institute for Software Systems
skipping to change at page 27, line 30 skipping to change at page 26, line 4
Email: hitesh@cs.cornell.edu Email: hitesh@cs.cornell.edu
Dan Jen Dan Jen
UCLA UCLA
4805 Boelter Hall 4805 Boelter Hall
Los Angeles, CA 90095 Los Angeles, CA 90095
US US
Phone: Phone:
Email: jenster@cs.ucla.edu Email: jenster@cs.ucla.edu
Robert Raszuk Robert Raszuk
Self Cisco Systems, Inc.
170 West Tasman Drive
San Jose, CA 95134
USA
Phone: Phone:
Email: robert@raszuk.net Email: raszuk@cisco.com
Lixia Zhang Lixia Zhang
UCLA UCLA
3713 Boelter Hall 3713 Boelter Hall
Los Angeles, CA 90095 Los Angeles, CA 90095
US US
Phone: Phone:
Email: lixia@cs.ucla.edu Email: lixia@cs.ucla.edu
 End of changes. 105 change blocks. 
460 lines changed or deleted 386 lines changed or added