[outages] Major Level3 (CenturyLink) Issues

Jon Lewis jlewis at lewis.org
Wed Sep 2 21:31:22 UTC 2020


On Wed, 2 Sep 2020, Warren Kumari wrote:

> The root issue here is that the *public* RFO is incomplete / unclear.
> Something something flowspec something, blocked flowspec, no more
> something does indeed explain that something bad happened, but not
> what caused the lack of withdraws / cascading churn.
> As with many interesting outages, I suspect that we will never get the
> full story, and "Something bad happened, we fixed it and now it's all
> better and will never happen ever again, trust us..." seems to be the
> new normal for public postmortems...

It's possible Level3's people don't fully understand what happened or that 
the "bad flowspec rule" causing BGP sessions to repeatedly flap
network-wide triggered software bugs on their routers.  You've never seen rpd
stuck at 100% CPU for hours or an MX960 advertise history routes to 
external peers, even after the internal session that had advertised the 
route to it has been cleared?

To quote Zaphod Beeblebrox: "Listen, three eyes, don't you try to outweird
me. I get stranger things than you free with my breakfast cereal."

Kick a BGP implementation hard enough, and weird shit is likely to happen.

----------------------------------------------------------------------
  Jon Lewis, MCP :)           |  I route
  StackPath, Sr. Neteng       |  therefore you are
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________