Vulnerabilities of Interconnection

Iljitsch van Beijnum iljitsch at muada.com
Fri Sep 13 09:33:13 UTC 2002


On Fri, 13 Sep 2002, Stephen J. Wilcox wrote:

> > At what point does one build redundancy into the network?

> No, it doesn't necessarily use IXes. In the event of there being no peered path
> across an IX, traffic will flow from the originator to their upstream
> "tier1" over a private transit link, then that "tier1" will peer with the
> destination's upstream "tier1" over a private fat pipe, then that will go to the
> destination via their private transit link.

But will these links have enough spare capacity so congestion doesn't
happen?

> I'm only aware of a few providers who transit across IXes and I think the
> consensus is that it's a bad thing, so it tends to be just small people for whom
> the cost of the private link is relatively high.

I apologize in advance for naming names here, but I think it is important
for making my point.

A while back (I think last year, but I'm not sure) the AMS-IX had a huge
outage because the power failed in two of the main locations. One of the
locations didn't at that time have battery or generator backed up power
(although they used three diversely routed inputs from the power company)
and the other location only had batteries, which didn't last long.

Nearly everything was still reachable over transit rather than peering
with only minor congestion. However, some networks got their transit in
the same buildings as where they connect to the AMS-IX, so both their
peering and transit were gone and they were unreachable. If you think this
was only true for small networks: think again. Surfnet suffered the same
problem. Surfnet is one of the largest (if not _the_ largest) Dutch networks,
connecting all the universities in the country at multi-gigabit speeds.
However, they only connected to other networks in a single building at
that time. I don't know if this is still the case.
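
To make the shared-building problem concrete, here is a minimal Python
sketch. Everything in it is made up for illustration: the Link type, the
site labels and the two example networks are assumptions, not anyone's
real topology. It simply lists the buildings whose loss alone would cut
every external connection a network has.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    kind: str      # "peering" or "transit"
    building: str  # hypothetical site label


def reachable_after_outage(links, failed_buildings):
    """True if at least one external link sits outside the failed buildings."""
    return any(link.building not in failed_buildings for link in links)


def single_building_risks(links):
    """Buildings whose loss alone would cut every external link."""
    buildings = {link.building for link in links}
    return sorted(b for b in buildings
                  if not reachable_after_outage(links, {b}))


# Hypothetical networks: one takes peering and transit in the same
# building, the other takes transit somewhere else.
colocated = [Link("peering", "site-A"), Link("transit", "site-A")]
diverse = [Link("peering", "site-A"), Link("transit", "site-B")]

print(single_building_risks(colocated))  # ['site-A']
print(single_building_risks(diverse))    # []
```

The same check with more than one building in failed_buildings covers
the "more than a single building" case.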

Now this is only one big network and a few small ones that suffered.
However, things could have been much worse for people in the rest of the
Netherlands, because even with all the rerouting going on almost all
traffic still flowed through Amsterdam. So any outage in Amsterdam that
takes down more than a single building would cripple the majority of Dutch
networks. Obviously, something like this doesn't happen all the time, but
luck has a tendency to run out from time to time. A plane crash (a 747
went down in an Amsterdam suburb 10 years ago) or a good sized flood (lots
of stuff is below sea level in NL) will do it.

> I suspect the catch would be that in the event of major switching nodes being
> taken out there would be considerable congestion on the transit links and most
> likely on the private peering of the tier1's also.

I'm more worried about long distance fiber running through rural areas.
Much more bang for your backhoe renting buck.

> > not sure I'd call it a "poor job"  for not planning all possible
> > failure modes, or for not having links in place for them.

> Well, the trouble is in the real world we can't have the budgets we'd like to
> implement our plans and end up compromising... there's the catch.

I don't think it's just a matter of money. In 1999, I helped roll out a
completely new network. EVERYTHING in it, except the ports customers
connect to, had a backup. Management originally wanted to connect every
location to at least three others. (We got this requirement dropped
because it essentially means you're buying a third circuit that doesn't do
anything useful until the two others are down; traffic engineering for
both regular operation and the different failure modes is too complex.)
Still, I couldn't convince them to move the second transit connection to
another city where both our network and the transit network were also
present in the same building.

A year or so after I left, I was in the building where that entire network
connects to its transit network over two independent routers at both ends
when the power went down and they couldn't get the generators online.
Fortunately, the utility power came back before the batteries were
empty. All of this is on the ground floor in a place that's below sea
level, only a block or so from a river.



