Level3 routing issues?

Jack Bates jbates at brightok.net
Tue Jan 28 15:47:17 UTC 2003


From: <cowie at renesys.com>

<snip>
> On the other hand, we also know (from private communications and from
> other mailing lists.. ahem) that high rate and high src/dst diversity
> of scans causes some network devices to fail (devices that cache flows, or
> devices that suffer from cpu overload under such conditions).
>
> Some BGP-speaking routers (not all, by any means, but some subpopulation)
> found themselves pegged at 100% CPU on Saturday.  Just one example:
>
>    http://noc.ilan.net.il/stats/ILAN-CPU/new-gp-cpu.html
>
Was it not known that under certain conditions the router would flatline?
What precautionary measures were put in place to limit the damage in such an
event?

> Whether you believe "anthropogenic" explanations for the instability
> depends on how fast you believe NEs can look, think, and type, compared
> to the speed with which the BGP announcement and withdrawal rates are
> observed to take off.  For my part, I'd bet that the long slow exponential
> decay (with superimposed spiky noise) is people at work.  But the initial
> blast is not.
>
When the crisis is on you, it's too late. You are either prepared and know
exactly what to do at that critical moment or you don't. You either had a <5
minute response time to the crisis or you didn't. We also know (from private
communications and from other mailing lists.. yes, I'm a thief :) that many
NEs were caught with their pants down, a mistake they aren't apt to make
again. It comes down to one's outlook. Do you just configure and maintain, or
do you strive to push the envelope? Do you truly know your network?
Remember, it's a living, breathing thing. The complexity of variables makes
complete predictability impossible, and so we must learn to understand it
and how it reacts.

Then again, perhaps I'm a lunatic. :)

Jack Bates
BrightNet Oklahoma



