Famous operational issues

Daniel Karrenberg dfk at ripe.net
Fri Feb 19 11:07:58 UTC 2021



On 16 Feb 2021, at 20:37, John Kristoff wrote:

> I'd like to start a thread about the most famous and widespread 
> Internet
> operational issues, outages or implementation incompatibilities you
> have seen.
>
> Which examples would make up your top three?


My absolute top one happened in 1995. Traffic engineering was not a widely 
used term then. A bright colleague, who will remain unnamed, decided that 
he could make AS paths longer by repeating the same AS number more than 
once. Unfortunately the prevalent software on Cisco routers was not 
resilient to such trickery and reacted with a reboot. This caused an 
avalanche of yo-yo-ing routers. Think it through!
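
For readers who have not met the trick: repeating your own AS number in the 
AS path (now usually called prepending) makes a route look longer, so 
neighbours comparing path lengths prefer another one. The following is only 
a minimal sketch of that idea in Python, not the router code involved. The 
function names are hypothetical, the AS numbers come from the range reserved 
for documentation, and real BGP best-path selection weighs several other 
attributes besides AS path length.

```python
from typing import List

ORIGIN_AS = 64496  # documentation-range AS number, purely illustrative


def prepend(as_path: List[int], asn: int, times: int) -> List[int]:
    """Return a new AS path with `asn` placed `times` times at the front."""
    if times < 1:
        raise ValueError("times must be at least 1")
    return [asn] * times + list(as_path)


def prefer_shortest(candidates: List[List[int]]) -> List[int]:
    """Pick the path with the fewest AS hops (simplified: length only)."""
    return min(candidates, key=len)


# The origin announces itself once towards provider 64497 ...
plain = prepend([], ORIGIN_AS, 1)
# ... and three times towards provider 64498 to make that route less attractive.
padded = prepend([], ORIGIN_AS, 3)

# Each provider adds its own AS number before passing the route on.
via_a = prepend(plain, 64497, 1)
via_b = prepend(padded, 64498, 1)

# A remote router comparing only path length picks the unpadded route.
best = prefer_shortest([via_b, via_a])
```

A receiver has to cope with the same AS number appearing several times in a 
row; the software of the day did not.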

It took some time before that offending path could be purged from the 
whole Internet; yes, we all roughly knew the topology and the players of 
the BGP-speaking parts of it at that time. Luckily this happened 
during the set-up for the Danvers IETF, and co-ordination between major 
operators was quick because most of their routing geeks happened to be 
in the same room, the ‘terminal room’; remember those?

Since at the time I personally had no responsibility for operations any 
more, I went back to pulling cables and crimping RJ45s.

Lessons: HW/SW monocultures are dangerous. Input testing is good 
practice at all levels of software. Operational co-ordination is key in 
times of crisis.

Daniel


