Centurylink having a bad morning?

Mike Bolitho mikebolitho at gmail.com
Mon Aug 31 14:48:56 UTC 2020


That's all we can do. Thankfully I work for an org that understands this
and has *at least* two fully redundant circuits. Sometimes a third smaller
carrier if we can prove that it is diverse, but that isn't the case very
often.

- Mike Bolitho


On Mon, Aug 31, 2020 at 7:35 AM Tomas Lynch <tomas.lynch at gmail.com> wrote:

> Maybe we are idealizing these so-called tier-1 carriers and we, tier-ns,
> should treat them as what they really are: another AS. Accept that they are
> going to fail and do our best to mitigate the impact on our own networks,
> i.e. more peering.
>
> On Mon, Aug 31, 2020 at 9:54 AM Martijn Schmidt via NANOG <nanog at nanog.org>
> wrote:
>
>> At this point you don't even know whether it's a human error (example:
>> generating a flowspec rule for port TCP/179), a filtering issue (example:
>> accepting a flowspec rule for port TCP/179), or a software issue (example:
>> certain flowspec update crashes the BGP daemon). And in the third scenario
>> I think that at least some portion of the blame shifts from the carrier to
>> its vendors, assuming the thing that crashed was not a home-grown BGP
>> implementation.
>>
>> With the route optimizer incidents - because let's face it, Honest
>> Networker is on the money as usual
>> https://honestnetworker.net/2020/08/06/as10990-routing/ - there is
>> really no excuse for any tier-1 carrier, they should at the very least have
>> strict prefix-list based filtering in place for customer-facing EBGP
>> sessions. In those cases it's much easier to state who's not taking care of
>> their proverbial lawn.
>>
>> Best regards,
>> Martijn
>>
>> On 8/31/20 3:25 PM, Tom Beecher wrote:
>>
>> https://blog.cloudflare.com/analysis-of-todays-centurylink-level-3-outage/
>>
>>
>> I definitely found Mr. Prince's writing about yesterday's events
>> fascinating.
>>
>> Verizon makes a mistake with BGP filters that allows a secondary mistake
>> from leaked "optimizer" routes to propagate, and Mr. Prince takes every
>> opportunity to lob large chunks of granite about how terrible they are.
>>
>> L3 allows an erroneous flowspec announcement to cause massive global
>> connectivity issues, and Mr. Prince shrugs and says "Incidents happen."
>>
>>
>> On Mon, Aug 31, 2020 at 1:15 AM Hank Nussbacher <hank at interall.co.il>
>> wrote:
>>
>>> On 30/08/2020 20:08, Baldur Norddahl wrote:
>>>
>>>
>>> https://blog.cloudflare.com/analysis-of-todays-centurylink-level-3-outage/
>>>
>>> Sounds like Flowspec possibly blocking tcp/179 might be the cause.
>>>
>>> But that is Cloudflare speculation.
>>>
>>> Regards,
>>> Hank
>>> Caveat: The views expressed above are solely my own and do not express
>>> the views or opinions of my employer
>>>
>>> An outage is what it is. I am not worried about outages. We have
>>> multiple transits to deal with that.
>>>
>>> It is the continued announcement of prefixes after peers and customers
>>> withdraw them that is the huge problem here. That is killing all the effort and
>>> money I put into having redundancy. It is sabotage of my network after I
>>> cut the ties. I do not want to be a customer at an outlet who has a system
>>> that will do that. Luckily we do not currently have a contract and now they
>>> will have to convince me it is safe for me to make a contract with them. If
>>> that is impossible I guess I won't be getting a contract with them.
>>>
>>> But I disagree that it would be impossible. They need to make a good
>>> report telling exactly what went wrong and how they changed the design, so
>>> something like this cannot happen again. The basic design of BGP is such
>>> that this should not happen easily if at all. They did something unwise.
>>> Did they make a route reflector based on a database or something?
>>>
>>> Regards,
>>>
>>> Baldur
>>>
>>> On Sun, Aug 30, 2020 at 5:13 PM Mike Bolitho <mikebolitho at gmail.com>
>>> wrote:
>>>
>>>> Exactly. And asking that they somehow prove this won't happen again is
>>>> impossible.
>>>>
>>>> - Mike Bolitho
>>>>
>>>> On Sun, Aug 30, 2020, 8:10 AM Drew Weaver <drew.weaver at thenap.com>
>>>> wrote:
>>>>
>>>>> I’m not defending them but I am sure it isn’t intentional.
>>>>>
>>>>>
>>>>>
>>>>> *From:* NANOG <nanog-bounces+drew.weaver=thenap.com at nanog.org> *On
>>>>> Behalf Of* Baldur Norddahl
>>>>> *Sent:* Sunday, August 30, 2020 9:28 AM
>>>>> *To:* nanog at nanog.org
>>>>> *Subject:* Re: Centurylink having a bad morning?
>>>>>
>>>>>
>>>>>
>>>>> How is that acceptable behaviour? I shall remember never to make a
>>>>> contract with these guys until they can prove that they won't advertise my
>>>>> prefixes after I pull them. Under any circumstances.
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Aug 30, 2020 at 15:14, Joseph Jenkins <
>>>>> joe at breathe-underwater.com> wrote:
>>>>>
>>>>> Finally got through on their support line and spoke to level1. The
>>>>> only thing the tech could say was it was an issue with BGP route reflectors
>>>>> and it started about 3 am (Pacific). They were still trying to isolate the
>>>>> issue. I've tried failing over my circuits and no go, the traffic just dies
>>>>> as L3 won't stop advertising my routes.
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Aug 30, 2020 at 5:21 AM Drew Weaver via NANOG <nanog at nanog.org>
>>>>> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>>
>>>>>
>>>>> Woke up this morning to a bunch of reports of connectivity issues and
>>>>> had to shut down some Level3/CTL connections to get it to return to normal.
>>>>>
>>>>>
>>>>>
>>>>> As of right now their support portal won’t load:
>>>>> https://www.centurylink.com/business/login/
>>>>>
>>>>>
>>>>>
>>>>> Just wondering what others are seeing.