Famous operational issues

Pierre Emeriaud petrus.lt at gmail.com
Tue Feb 16 22:52:01 UTC 2021


On Tue, Feb 16, 2021 at 21:03, Job Snijders via NANOG
<nanog at nanog.org> wrote:
>
> https://labs.ripe.net/Members/erik/ripe-ncc-and-duke-university-bgp-experiment/
>
> The experiment triggered a bug in some Cisco router models: affected
> Ciscos would corrupt this specific BGP announcement ** ON OUTBOUND **.
> Any peers of such Ciscos receiving this BGP update, would (according to
> then current RFCs) consider the BGP UPDATE corrupted, and would
> subsequently tear down the BGP sessions with the Ciscos. Because the
> corruption was not detected by the Ciscos themselves, whenever the
> sessions would come back online again they'd reannounce the corrupted
> update, causing a session tear down. Bounce ... Bounce ... Bounce ... at
> global scale in both IBGP and EBGP! :-)

In a similar fashion, a network I know had a massive outage when a
failing linecard corrupted IS-IS LSPs, triggering a flood of purges
and taking down the whole backbone.

This was pre-RFC 6232 (no Purge Originator Identification TLV yet), so
purges did not say which router had generated them. You can guess that
resolving the issue was a real PITA.

This kind of outage fuels my netops nightmares.

