redundancy [was: something about arrogance]

Pedro R Marques roque at sbcglobal.net
Tue Jul 30 10:23:24 UTC 2002


Brad writes:
 >        I'm probably demonstrating my ignorance here (and my stupidity
 > in stepping into a long-standing highly charged argument), but I'm
 > completely missing something.  For reasons of redundancy & 
 > reliability, even if you were to buy bandwidth in only one location, 
 > wouldn't you want to buy it from at least two different providers?
 
 >        If you buy bandwidth from two different providers at two 
 > different locations, this would seem to me to be a good way to 
 > provide backup in case one provider or one location goes 
 > Tango-Uniform, and you could always backhaul the bandwidth for the 
 > site/provider that is down.

Several other posters have mentioned reasons why redundancy across 2
connections to separate providers is not, in most situations, the
preferable approach, but I would like to add another point/question...

When considering redundancy/reliability/etc., it is important to think
about what kinds of failure you want to protect against versus the cost
of doing so.

It is my impression, from reading this list and tidbits of gossip, that 
the most common causes of failure are:
- link failure
- equipment failure (routers mostly), both software and hardware
- configuration errors

All of those are much more frequent than the failure of an entire ISP (a
transit provider). A competent ISP is expected, I believe, to provide
redundancy within a POP, on its inter-POP links/equipment, and on its
connections to upstreams/peers.

As such, probably the first level of redundancy that an origin AS
(non-transit) would look at is protection against failures of its
external connectivity link and termination equipment (routers on both
ends).

To do so, one can look at:
- 2 external links to distinct providers
- 2 external links to the same provider

While I can't speak to the economics part of the equation (although I
would expect it to be cheaper to buy an additional link than to connect
to a different provider), from a restoration point of view, protecting a
path with an alternate path from the same provider is certainly an
approach that gives you much better convergence times.

This comes from the fact that, in terms of network topology, the
distance between 2 links to the same upstream is much shorter than
between 2 links to different upstreams. If you protect a path with an
alternate path to the same ISP, you can expect convergence to occur
within the IGP convergence time of your provider; with 2 different
providers, you need global BGP convergence to occur.
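
To make the "topological distance" point concrete, here is a minimal
sketch, not a model of real BGP. It assumes a made-up AS-level topology
(CUST, ISP_A, ISP_B, T1, T2, X and Y are all hypothetical names), picks
the shortest AS path with no routing policy at all, and simply counts
which ASes see a different AS path to the customer after one customer
link fails. It says nothing about actual convergence time, only about
how far the change has to travel.

```python
from collections import deque

def adjacency(links):
    """Build an AS-level adjacency map; parallel links collapse into one edge."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def best_paths(graph, origin):
    """Shortest AS path from every reachable AS to origin (plain BFS, no policy)."""
    paths = {origin: [origin]}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for neigh in sorted(graph.get(node, ())):
            if neigh not in paths:
                paths[neigh] = [neigh] + paths[node]
                queue.append(neigh)
    return paths

def changed_ases(links, origin, failed_link):
    """ASes whose AS path to origin differs after one link is removed."""
    before = best_paths(adjacency(links), origin)
    remaining = list(links)
    remaining.remove(failed_link)  # drops only one copy of a parallel link
    after = best_paths(adjacency(remaining), origin)
    return sorted(asn for asn in before
                  if asn != origin and before[asn] != after.get(asn))

# Hypothetical topology shared by both scenarios.
SHARED = [("ISP_A", "T1"), ("ISP_A", "X"), ("ISP_B", "T2"),
          ("T1", "T2"), ("T2", "Y")]
same_provider = [("CUST", "ISP_A"), ("CUST", "ISP_A")] + SHARED
two_providers = [("CUST", "ISP_A"), ("CUST", "ISP_B")] + SHARED

print(changed_ases(same_provider, "CUST", ("CUST", "ISP_A")))
print(changed_ases(two_providers, "CUST", ("CUST", "ISP_A")))
```

In this toy topology the first call prints an empty list: the AS path
seen by everyone else is unchanged, so the repair stays inside the
provider. The second prints ISP_A, T1 and X, i.e. every AS that was
reaching the customer through the failed provider has to learn a new
path.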

This takes longer the more topologically distant your 2 upstreams
are... for instance, attempting to protect a path to an ISP with very
wide connectivity with a protection path from one with very limited
connectivity would be a particularly bad case, as you would have to wait
for the path announced by the larger ISP to be withdrawn n times, from
all its peering points, and for the protection path to make its way
through in replacement.
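
A rough way to see this effect is a toy path-vector simulation. The
sketch below is an illustration under strong assumptions, not how real
BGP timers behave: every AS updates in lock-step rounds, prefers the
shortest loop-free AS path (ties broken alphabetically), and the
topology and names (CUST, PRIMARY, SECONDARY, X1..X3) are invented. It
counts the rounds and the individual best-path changes needed after the
customer's link to PRIMARY fails.

```python
def adjacency(links):
    """Build an AS-level adjacency map from a list of (a, b) links."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def step(graph, origin, best):
    """One synchronous round: each AS re-selects from its neighbors' last paths."""
    new_best = {}
    for node in graph:
        if node == origin:
            new_best[node] = [origin]
            continue
        candidates = []
        for neigh in graph[node]:
            path = best.get(neigh)
            if path is not None and node not in path:  # AS-path loop check
                candidates.append([node] + path)
        # Shortest path wins; alphabetical order is an arbitrary tie-break.
        new_best[node] = min(candidates, key=lambda p: (len(p), p), default=None)
    return new_best

def run(graph, origin, best, max_rounds=100):
    """Iterate until no AS changes its best path; return state and counters."""
    rounds = updates = 0
    while rounds < max_rounds:
        new_best = step(graph, origin, best)
        changed = sum(1 for node in graph if new_best[node] != best.get(node))
        if changed == 0:
            break
        best, rounds, updates = new_best, rounds + 1, updates + changed
    return best, rounds, updates

def failover(links, origin, failed_link):
    """Converge on the full topology, fail one link, then reconverge."""
    steady, _, _ = run(adjacency(links), origin, {})
    remaining = [link for link in links if link != failed_link]
    return run(adjacency(remaining), origin, steady)

# Hypothetical topology: PRIMARY has wide connectivity in both scenarios.
CORE = [("CUST", "PRIMARY"), ("CUST", "SECONDARY"), ("PRIMARY", "X1"),
        ("PRIMARY", "X2"), ("PRIMARY", "X3"), ("X1", "X2")]
limited_backup = CORE + [("SECONDARY", "X3")]
wide_backup = CORE + [("SECONDARY", "X1"), ("SECONDARY", "X2"),
                      ("SECONDARY", "X3")]

for name, links in (("limited", limited_backup), ("wide", wide_backup)):
    final, rounds, updates = failover(links, "CUST", ("CUST", "PRIMARY"))
    print(name, "rounds:", rounds, "best-path changes:", updates)
```

With the poorly connected backup, X1 and X2 first fall back on each
other's stale paths through PRIMARY, then lose the route altogether, and
only then pick up the protection path, so the same ASes change state
several times. With the well-connected backup they switch once.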

What I perceive to be the standard practice, origin-only ASes
multi-homing to 2 distinct providers, is counter-intuitive to me. From
several points of view (convergence times, load on the global routing
system, complexity of management, etc.), dual connectivity to different
routers of the same provider (using distinct physical paths) would seem
to me to make more sense.

Unless the main concern is that the upstream ISP fails entirely... which,
given the fact that it tends to get front-page honors in the NYTimes
these days, does not appear to be an all too common occurrence (I mean
operationally, not financially; clarification added to dispel potential
humorous remarks).

So, my question to the list is: why is multi-homing to 2 different
providers such a desirable thing? What is the motivation, and why is it
preferred over multiple connections to the same upstream?

Is the main motivation not so much reliability but having a shorter
AS path to more destinations? This would not appear to me to be a clear
advantage, since a shorter AS path does not necessarily translate into
better quality of interconnection.

My apologies in advance if these seem to be stupid questions...

thanks,
  Pedro.




More information about the NANOG mailing list