Outbound Route Optimization

Mon Jan 26 18:58:49 UTC 2004

Richard, 

  You have made some good points in this thread.
One general observation, and then specific responses
... I don't assert that current route optimization
technology solves ALL routing problems, but I do think
that there are some specific problems that automation
can solve effectively and gracefully.

> * The inability to receive FULL bgp routes from every bgp peer to your
> optimization box without requiring your transit providers to set up a host
> of eBGP Multihop sessions (which most refuse to do). This means you will
> always be stuck assuming that every egress path is a transit and can reach 
> any destination on the Internet until your active or passive probing says 
> otherwise.

The issue that you describe does indeed place some 
constraints on the application of route optimization
technology. Within the scope of this issue, though, 
I think that you would agree that a network which is
ALL transit would face no challenge here -- and more
specifically, if there is a routing optimization 
decision among local transit links, that problem 
could be solved independently of the existence of
"non-transit" links. 

Applying this technology in the presence of "non-
transit" routes requires constraining measurements to 
only the prefixes appropriate for a given link. It
is true that knowing all BGP routes ("BGP Losers")
would be a nice way to get this information ... 
but it's not necessarily the only approach towards
the goal. Some solutions may have topological 
dependencies, but it can be feasible to simply drop 
all measurement towards "illegal" destinations.

In other cases, it may be possible to define the
set of destinations that are legal over a given
link, and constrain measurements for that link. 
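
As a rough illustration of that last approach, here is a
minimal Python sketch. The function name and the per-link
prefix lists are hypothetical and not taken from any
product; a transit link would simply be given 0.0.0.0/0 as
its legal set.

```python
import ipaddress

def legal_targets(targets, allowed_prefixes):
    """Keep only the measurement targets that fall inside the
    prefixes considered legal for one link (hypothetical helper)."""
    nets = [ipaddress.ip_network(p) for p in allowed_prefixes]
    return [t for t in targets
            if any(ipaddress.ip_address(t) in n for n in nets)]

# A peering link that only carries the peer's own prefix:
peer_targets = legal_targets(["192.0.2.10", "198.51.100.7"],
                             ["192.0.2.0/24"])

# A transit link is assumed to reach everything:
transit_targets = legal_targets(["192.0.2.10", "198.51.100.7"],
                                ["0.0.0.0/0"])
```

Probes towards anything outside the list are never sent
on that link, so no "illegal" measurement can influence
the routing decision.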

> * The requirement of deaggregation in order to make best path decisions 
> effective. For example, someone's T3 to genuithree gets congested and the 
> best path to their little /24 of the Internet is through another provider. 
> Do you move 4.0.0.0/8?

Perhaps. Yes, it's a /8. But if measurements to the /8 show
better collective performance over another link, why NOT 
move it? Yes, it could be carrying a lot of traffic, and 
could result in congesting the next link ... so it is 
necessary to be able to:

  - know when links are at/near capacity, 
    and so avoid their use; and

  - react quickly in case of congestion

Note that these problems are not specific to /8s, 
and that traffic loads are dynamic - even if it 
does look like there is "room" for a prefix on a
link, once the route gets changed, conditions 
could very well change also. Any route optimization
system needs to deal with these issues for ALL 
prefixes. 
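
To make the two requirements concrete, here is a small
Python sketch of a capacity-aware move decision. The field
names, the 80% utilization ceiling, and the use of latency
as the performance metric are all assumptions made for
illustration, not a description of any particular system.

```python
def can_absorb(link, prefix_load_mbps, max_utilization=0.8):
    """True if the link stays under the utilization ceiling
    after taking on the prefix's current load."""
    projected = link["load_mbps"] + prefix_load_mbps
    return projected <= max_utilization * link["capacity_mbps"]

def choose_link(prefix_load_mbps, current, candidates, max_utilization=0.8):
    """Return the name of the best-performing link that can absorb
    the prefix; stay on the current link if none qualifies."""
    best = current
    for link in candidates:
        if (link["latency_ms"] < best["latency_ms"]
                and can_absorb(link, prefix_load_mbps, max_utilization)):
            best = link
    return best["name"]

current = {"name": "A", "latency_ms": 80, "load_mbps": 40, "capacity_mbps": 45}
candidates = [
    # Fastest path, but a nearly full T3: moving 10 Mb/s would congest it.
    {"name": "B", "latency_ms": 50, "load_mbps": 30, "capacity_mbps": 45},
    # Slightly slower, with plenty of headroom.
    {"name": "C", "latency_ms": 60, "load_mbps": 10, "capacity_mbps": 100},
]
chosen = choose_link(10, current, candidates)
```

Because loads shift as soon as a route changes, a check
like this would have to be re-run on fresh measurements
rather than trusted as a one-time decision.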

There are multiple levels of optimization possible
on top of this:

  a) If there is a general belief that /8s are 
     simply "too big" to move, they can be manually
     deaggregated. Our experience shows that by 
     breaking up a /8 into as few as (10) or (15) 
     carefully designed "chunks", the resultant 
     load per (deaggregated) prefix becomes equivalent
     to hundreds of other prefixes. 

  b) If manually configuring deaggregates is not
     desirable, automated approaches to deaggregation
     are possible: "If I see traffic in this range, 
     and a /xx does not exist for the observed traffic, 
     then create the /xx". 

  c) Dynamically measure all of the possible 
     deaggregations of all active space, and dynamically
     determine which prefixes need to be deaggregated
     to what level. 

Note that in any of the above cases, the de-aggregated 
routes should be marked NO_EXPORT. 
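
A minimal Python sketch of approach (b), assuming a fixed
target length for the more-specifics. The function name,
the dictionary layout, and the "no-export" tag string are
illustrative only; a real implementation would inject the
routes via BGP with the NO_EXPORT community attached.

```python
import ipaddress

def deaggregate_on_traffic(observed_ips, aggregate, new_len, existing=()):
    """For each observed destination inside the aggregate, create the
    covering /new_len if it does not already exist (hypothetical helper)."""
    agg = ipaddress.ip_network(aggregate)
    known = {ipaddress.ip_network(p) for p in existing}
    created = []
    for ip in observed_ips:
        if ipaddress.ip_address(ip) not in agg:
            continue  # traffic outside this aggregate is ignored
        # strict=False masks the host bits, giving the covering prefix
        more_specific = ipaddress.ip_network(f"{ip}/{new_len}", strict=False)
        if more_specific not in known:
            known.add(more_specific)
            created.append({"prefix": str(more_specific),
                            "communities": ["no-export"]})
    return created

routes = deaggregate_on_traffic(
    ["4.2.2.1", "4.2.200.9", "8.8.8.8", "4.200.1.1"],
    "4.0.0.0/8", 16)
```

Only the parts of the /8 that actually carry traffic get
more-specifics, so the number of created routes tracks
usage rather than the size of the aggregate.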

I know of solid commercial implementations of (a) and
(b). (c) is a more interesting project ... :)    

> * The constant noise of stupid scripts pinging everything 
>   on the Internet.

Pinging the Internet is clearly a wasteful approach. Essentially
no one needs optimization to the ENTIRE Internet. Granted, major
backbones probably actually use a great deal of the routing 
table ... 

  (Quiz for the list readers: 
   What percentage of the Internet routing table does 
   your network actually use?)

... but for many ISPs, hosting facilities, and major
multihomed enterprises, our experience shows that only a
very small fraction of traffic is seen beyond about
(20,000-30,000) routes in a given day. 

There is no reason to measure destinations unless they 
are involved with traffic to your network. Basing 
measurements on observed traffic, or having applications 
instrumented to automatically generate their own measurements, 
are both "clean" options here. 
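
A short Python sketch of the traffic-driven option: pick
the smallest set of prefixes that accounts for most of the
observed bytes, and measure only those. The record format
(prefix, bytes) and the 95% coverage default are
assumptions for illustration.

```python
from collections import Counter

def prefixes_to_measure(flow_records, coverage=0.95):
    """Return the busiest prefixes, largest first, until they cover
    the requested share of observed traffic (hypothetical helper)."""
    totals = Counter()
    for prefix, nbytes in flow_records:
        totals[prefix] += nbytes
    grand_total = sum(totals.values())
    selected, running = [], 0
    for prefix, nbytes in totals.most_common():
        if grand_total == 0 or running >= coverage * grand_total:
            break
        selected.append(prefix)
        running += nbytes
    return selected

flows = [
    ("192.0.2.0/24", 600),
    ("198.51.100.0/24", 90),
    ("203.0.113.0/24", 10),
    ("192.0.2.0/24", 300),
]
targets = prefixes_to_measure(flows)
```

Destinations that carry no traffic never appear in the
flow records, so they are never probed.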

Companies and ISPs today spend time(=money) managing their
connectivity to the Internet. Loop-free connectivity is a
basic first step; but in many cases real connectivity goals
include:

   - Capacity management (especially in the presence 
     of asymmetrical bandwidth)
   - Load management (in the case of usage-based billing)
   - Performance management (realizing 'best possible'
     performance)
   - Maximizing application availability (fastest possible
     reroute, in the case of congestive failure)

Manually tweaking routing policies to achieve these goals
is a time-honored craft (especially with this crowd :) ...
but I suspect that even the most experienced in this area
will acknowledge that there is a tier of this problem that
may be best automated. (Note that I said "a tier" -- there
are clearly additional problems that current route optimization
technology DOESN'T solve. :)

cheers -- Sean