[outages] Major Level3 (CenturyLink) Issues

Baldur Norddahl baldur.norddahl at gmail.com
Wed Sep 2 17:07:30 UTC 2020


I believe someone on this list reported that updates were also broken: they
could neither add prepending nor modify communities.

Anyway, I am not saying it cannot happen, because clearly something did
happen. I just don't believe it is a simple case of overload. There has to
be more to it.

On Wed, 2 Sep 2020 at 15:36, Saku Ytti <saku at ytti.fi> wrote:

> On Wed, 2 Sep 2020 at 16:16, Baldur Norddahl <baldur.norddahl at gmail.com>
> wrote:
>
> > I am not buying it. No normal implementation of BGP stays online,
> > replying to heartbeats and accepting updates from eBGP peers, yet after 5
> > hours has failed to process withdrawals from customers.
>
> I can imagine writing a BGP implementation like this:
>
>  a) own queue for keepalives, which I always serve first, fully
>  b) own queue for updates, which I serve second
>  c) own queue for withdrawals, which I serve last
>
> Why might I think this makes sense? Perhaps I just received from
> RR2 the prefix I'm pulling from RR1. If I don't handle all my updates
> first, I cause an outage that should not happen, because I had already
> received the update telling me I don't need to withdraw it.
>
> Is this the right way to do it? Maybe not, but it's easy to imagine
> why it might seem like a good idea.
>
> How well BGP works in common cases and how it works in pathologically
> scaled and busy cases are very different cases.
>
> I know that even in stable states, commonly run vendors on commonly run
> hardware can take over 2 hours to finish converging iBGP on initial turn-up.
>
> --
>   ++ytti
>
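
To make the three-queue idea quoted above concrete, here is a minimal toy
sketch of strict-priority servicing. It is not any vendor's implementation;
the class name, the simulate() helper and its parameters are all hypothetical,
and real BGP speakers are far more complicated. It only illustrates that if
updates arrive at least as fast as they can be processed, a queue that is
served last may never be served at all, while keepalives keep being answered.

```python
from collections import deque

# Hypothetical message classes, served strictly in this order (assumption,
# modelled on the a/b/c list in the quoted mail).
PRIORITY = ("keepalive", "update", "withdraw")


class StrictPriorityScheduler:
    """Toy model: always drain higher-priority queues before lower ones."""

    def __init__(self):
        self.queues = {kind: deque() for kind in PRIORITY}

    def enqueue(self, kind, payload):
        self.queues[kind].append(payload)

    def next_message(self):
        # Return the oldest message from the highest-priority non-empty queue.
        for kind in PRIORITY:
            if self.queues[kind]:
                return kind, self.queues[kind].popleft()
        return None


def simulate(ticks, updates_per_tick, capacity_per_tick):
    """Run under a steady update load and count messages served per class."""
    sched = StrictPriorityScheduler()
    sched.enqueue("withdraw", "customer-prefix")  # the withdrawal we care about
    served = {kind: 0 for kind in PRIORITY}
    for tick in range(ticks):
        sched.enqueue("keepalive", tick)
        for n in range(updates_per_tick):
            sched.enqueue("update", (tick, n))
        # Process at most capacity_per_tick messages in this tick.
        for _ in range(capacity_per_tick):
            msg = sched.next_message()
            if msg is None:
                break
            served[msg[0]] += 1
    return served


if __name__ == "__main__":
    print("saturated:", simulate(100, 10, 10))
    print("idle:     ", simulate(100, 5, 10))
```

In the saturated run the single withdrawal is still sitting in its queue at
the end, even though every keepalive was served; in the idle run it is
processed in the first tick. Whether anything like this happened in the
actual incident is not something this sketch can tell us.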