[outages] Major Level3 (CenturyLink) Issues

Luke Guillory lguillory at reservetele.com
Wed Sep 2 17:13:06 UTC 2020


A detailed explanation can be found at the link below.


https://blog.thousandeyes.com/centurylink-level-3-outage-analysis/





From: NANOG <nanog-bounces+lguillory=reservetele.com at nanog.org> on behalf of Baldur Norddahl <baldur.norddahl at gmail.com>
Date: Wednesday, September 2, 2020 at 12:09 PM
To: "nanog at nanog.org" <nanog at nanog.org>
Subject: Re: [outages] Major Level3 (CenturyLink) Issues

I believe someone on this list reported that updates were also broken. They could neither add prepending nor modify communities.

Anyway, I am not saying it cannot happen, because clearly something did happen. I just don't believe it is a simple case of overload. There has to be more to it.
On Wed, 2 Sep 2020 at 15:36, Saku Ytti <saku at ytti.fi> wrote:
On Wed, 2 Sep 2020 at 16:16, Baldur Norddahl <baldur.norddahl at gmail.com> wrote:

> I am not buying it. No normal implementation of BGP stays online, replying to heartbeats and accepting updates from eBGP peers, yet after 5 hours still fails to process withdrawals from customers.

I can imagine writing a BGP implementation like this:

 a) own queue for keepalives, which I always serve first and fully
 b) own queue for updates, which I serve second
 c) own queue for withdraws, which I serve last

Why might I think this makes sense? Perhaps I have just received from
RR2 a prefix I'm pulling from RR1. If I don't handle all my updates
first, I'm causing an outage that should not happen, because I had
actually already received the update telling me I don't need to
withdraw it.

Is this the right way to do it? Maybe not, but it's easy to imagine
why it might seem like a good idea.
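
To make the failure mode concrete, here is a minimal sketch of the
three-queue scheme above. It is a toy model, not any vendor's actual
code: the class name, the simulate() helper, its parameters and the
192.0.2.0/24 prefix are all made up for illustration. It only shows
that with strict priority, a withdraw waits for as long as keepalives
and updates arrive at least as fast as they are served.

```python
from collections import deque


class StrictPriorityInbox:
    """Hypothetical inbound scheduler: keepalive > update > withdraw."""

    ORDER = ("keepalive", "update", "withdraw")

    def __init__(self):
        self.queues = {kind: deque() for kind in self.ORDER}

    def enqueue(self, kind, payload=None):
        self.queues[kind].append(payload)

    def next_message(self):
        # Always drain the highest-priority non-empty queue first.
        for kind in self.ORDER:
            if self.queues[kind]:
                return kind, self.queues[kind].popleft()
        return None


def simulate(ticks, updates_per_tick, budget_per_tick):
    """Return the tick at which the single withdraw is served, or None.

    Assumed load model: one keepalive and `updates_per_tick` updates
    arrive every tick, and the speaker can process `budget_per_tick`
    messages per tick.
    """
    inbox = StrictPriorityInbox()
    inbox.enqueue("withdraw", "192.0.2.0/24")  # documentation prefix
    for tick in range(ticks):
        inbox.enqueue("keepalive")
        for i in range(updates_per_tick):
            inbox.enqueue("update", (tick, i))
        for _ in range(budget_per_tick):
            message = inbox.next_message()
            if message is None:
                break
            if message[0] == "withdraw":
                return tick
    return None


if __name__ == "__main__":
    # Budget exactly matches keepalive + update arrivals: withdraw starves.
    print(simulate(ticks=100, updates_per_tick=5, budget_per_tick=6))
    # One spare slot per tick: withdraw is served in the first tick.
    print(simulate(ticks=100, updates_per_tick=5, budget_per_tick=7))
```

In this model the session looks healthy from outside the whole time
(keepalives are answered, updates are accepted), yet the withdraw is
never processed until the update load drops below the service rate.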

How well BGP works in common cases and how it works in pathologically
scaled and busy cases are two very different things.

I know that even in stable states, commonly deployed vendors on
commonly deployed hardware can take more than 2 hours to finish
converging iBGP on initial turn-up.

--
  ++ytti