OSPF multi-level hierarchy: side question
avg at kotovnik.com
Fri May 28 22:44:41 UTC 1999
Steve Meuse <smeuse at bbnplanet.com> wrote:
> On the other hand, you can choose to build a box that can handle thousands
> of customers, and decrease the traffic load, but also increase the
> likelihood of a failure that can directly affect a larger percentage of
Dan Rabb <danr at dbn.net> wrote:
> Routers will inevitably fail. The question becomes how much exposure do you
> want when it does?
First, you have to stop thinking of routers as "black boxes" and expose the
internal structure of large boxes so you can compare it with clusters.
In this respect, the big router designs I know of are eminently more reliable
than clusters of traditional routers, for a number of reasons:
1) the connectivity between components ("elementary routers") is significantly
   richer, with many diverse paths between components.
2) the design is inherently simpler than that of a multi-vendor,
   multi-standard cluster, with significantly fewer distinct components
   and a much more regular topology. Simplicity directly translates into
   reliability.
3) there is built-in support for extensive fault tolerance and self-diagnostics
   at a level simply unachievable with standard routing protocols (which by
   their nature do not have the foggiest idea of the internal structure and
   diagnostic possibilities of the routers, and do not provide any support
   for state mirroring).
4) the individual failure domains are much smaller (i.e. one line card vs. an
   entire router, at least in the Pluris design -- the line card interface is
   not a bus but a serial link with a protocol which cannot be screwed up by a
   misbehaving line card, unlike any known bus protocols).
5) power supplies are distributed (the Pluris box simply has a separate DC-DC
   converter on every card).
6) at least one vendor (Pluris) has all card cages completely isolated
   electrically.
7) the last (but not least) aspect of terabit routing is its inherent reliance
   on inverse multiplexing over multiple parallel channels, which allows
   service to degrade gracefully when individual channels or paths fail --
   without any need to make the problem visible at the IP level, and therefore
   without being limited by the performance of distributed routing algorithms.
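The inverse-multiplexing point above can be sketched in a few lines. This is a
minimal illustrative model, not the Pluris implementation: channel names, the
flow-key format, and the hash-based spreading are all my assumptions.

```python
# Hypothetical sketch of inverse multiplexing over parallel channels.
# Channel names and the flow-key format are invented for illustration.
import hashlib

class InverseMux:
    def __init__(self, channels):
        # channels: identifiers of the parallel paths, all initially healthy
        self.active = list(channels)

    def fail(self, channel):
        # A channel failure just shrinks the active set; the remaining
        # channels absorb the load. Nothing changes at the IP level --
        # no routing update is ever generated.
        self.active.remove(channel)

    def pick_channel(self, flow_key):
        # Hash the flow key so packets of one flow stay on one channel
        # (avoiding reordering) while flows spread across all active channels.
        h = int(hashlib.sha256(flow_key.encode()).hexdigest(), 16)
        return self.active[h % len(self.active)]

mux = InverseMux(["ch0", "ch1", "ch2", "ch3"])
mux.pick_channel("10.0.0.1->10.0.0.2")  # some channel in the active set
mux.fail("ch1")                         # one parallel path dies
mux.pick_channel("10.0.0.1->10.0.0.2")  # still forwards, over 3 channels
```

Capacity degrades from four channels to three, but forwarding continues and
the failure stays invisible to external routing protocols.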
Alex Zinin <zinin at amt.ru> wrote:
> The "have more bigger boxes rather than fewer smaller ones" approach is not
> for everybody and not for every case. If you have clusters sitting in one room,
> powered from the same source, sharing the same ceiling that can fall, running
> the same version of soft, using the same config., etc., then yes, it's ok,
> because they will more likely crash at the same moment.
A big router does not have to be all in one place physically. The Pluris design
allows hundreds of feet of component separation over optical cabling.
> Also, even if you do use a large box, you probably don't wanna know
> all the details about its connections at some level of your network.
The whole premise of the big-box design is that its internal capacity is so
much bigger than its interface capacity that from the outside it looks like a
single point, w/o any need to optimize routing inside. From the perspective
of network management, of course, big boxes have to provide detailed internal
status info. A sane design for a big router has an out-of-band diagnostic
network within the box.
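That abstraction can be shown concretely. The sketch below collapses a cluster
of "elementary routers" into the single node its neighbors see; the element
names, interface names, and data layout are all invented for illustration.

```python
# Hypothetical sketch: a cluster of "elementary routers" presented
# externally as one routing node. All names are invented.

# Rich internal mesh -- hidden from the rest of the network.
internal_topology = {
    "elem0": {"elem1", "elem2"},
    "elem1": {"elem0", "elem2"},
    "elem2": {"elem0", "elem1"},
}

# Only these interfaces face the outside world.
external_interfaces = {
    "elem0": ["if0/0"],
    "elem1": ["if1/0", "if1/1"],
    "elem2": [],
}

def external_view(name, interfaces):
    # Collapse the whole cluster into one node: neighbors see only the
    # union of external interfaces, never the internal mesh or its routing.
    return {"node": name,
            "interfaces": sorted(i for ifs in interfaces.values() for i in ifs)}

print(external_view("bigrouter1", external_interfaces))
# {'node': 'bigrouter1', 'interfaces': ['if0/0', 'if1/0', 'if1/1']}
```

Because internal capacity dwarfs interface capacity, this collapsed view loses
nothing that external routing could usefully optimize.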
>> to eliminate updates which "do not matter" unlike SPF-based algorithms
>> which have to inform everyone about local topology changes.
> In SPF-based protocols we have areas for this purpose---we do not propagate
> topology information across the area boundaries.
Across boundaries which have to be configured _manually_. DV and diffuse
algorithms tend to squelch topology updates automatically _within_ an
area if a same-metric alternative path is found. SPF has to have a coherent
picture of the network topology at all times, so route flap can easily kill
it off. Diffuse algorithms by design work well in a network with rapidly
changing topology.
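The squelching argument can be sketched as follows. This is a toy model of the
distance-vector side only (node, destination, and next-hop names are invented),
showing that a failure covered by a same-metric alternative generates no
update, while SPF would have to flood the topology change in either case.

```python
# Hypothetical sketch of DV update squelching: advertise only when the
# best metric actually changes. All names are invented for illustration.

class DVNode:
    def __init__(self, routes):
        # routes: destination -> list of (next_hop, metric) alternatives
        self.routes = routes

    def link_down(self, next_hop):
        # Remove all paths via the failed next hop; emit an update only
        # for destinations whose best metric changed. An SPF router would
        # instead flood an LSA for this event unconditionally.
        updates = []
        for dest, paths in self.routes.items():
            best = min(m for _, m in paths)
            paths[:] = [(nh, m) for nh, m in paths if nh != next_hop]
            new_best = min((m for _, m in paths), default=float("inf"))
            if new_best != best:
                updates.append((dest, new_best))
        return updates

node = DVNode({"netA": [("r1", 10), ("r2", 10)],  # two same-metric paths
               "netB": [("r1", 5)]})              # single path via r1
print(node.link_down("r1"))
# [('netB', inf)]
```

netA keeps metric 10 via r2, so its update is squelched; only netB, which lost
its sole path, is advertised to neighbors.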