CenturyLink RCA?

Töma Gavrichenkov ximaera at gmail.com
Sun Dec 30 17:46:05 UTC 2018


There's a Reddit user claiming he works at CL who says the cause was some
faulty Infinera DTN-X instances.

https://www.reddit.com/r/centurylink/comments/aa2qa4/comment/ecovgab

(dunno though why the user posted that to Reddit and not here)

On 30 Dec 2018 at 20:19, Saku Ytti <saku at ytti.fi> wrote:

> Hey John,
>
> Your criticism is warranted, but would also be addressed by the
> explanation that DCN/OOB was the source of the problem.
>
> At any rate, I am looking forward to stopping the speculation and
> reading a post-mortem written by someone who knows how networks work.
>
> On Sun, 30 Dec 2018 at 18:28, John Von Essen <john at essenz.com> wrote:
> >
> > One thing that is troubling when reading that URL is that it appears
> > several steps of restoration required teams to go onsite for local
> > login, etc. Granted, to troubleshoot hardware you need to be physically
> > present to pop a line card in and out, but CTL/LVL3 should have full
> > out-of-band console and power control to all core devices; we shouldn't
> > be waiting for someone to drive to a location to get console or do
> > power cycling. And I would imagine the first step of a lot of the
> > troubleshooting was power cycling and local console logs.
> >
> >
> > -John
> >
> >
> >
> > On 12/30/18 10:42 AM, Mike Hammett wrote:
> >
> > It's technical enough so that laypeople immediately lose interest, yet
> > completely useless to anyone that works with this stuff.
> >
> >
> >
> > -----
> > Mike Hammett
> > Intelligent Computing Solutions
> > http://www.ics-il.com
> >
> > Midwest-IX
> > http://www.midwest-ix.com
> >
> > ________________________________
> > From: "Saku Ytti" <saku at ytti.fi>
> > To: "nanog list" <nanog at nanog.org>
> > Sent: Sunday, December 30, 2018 7:42:49 AM
> > Subject: CenturyLink RCA?
> >
> > Apologies for the URL, I do not know the official source and I do not
> > share the URL's sentiment.
> > https://fuckingcenturylink.com/
> >
> > Can someone translate this into IP-engineer terms? What actually happened?
> > From my own history, I rarely recognise the problem I fixed from
> > reading the public RCA. I hope CenturyLink will do better.
> >
> > Best guess so far that I've heard is
> >
> > a) CenturyLink runs global L2 DCN/OOB
> > b) there was HW fault which caused L2 loop (perhaps HW dropped BPDU,
> > I've had this failure mode)
> > c) DCN had direct access to control-plane, and L2 congested
> > control-plane resources causing it to deprovision waves
> >
> > Now of course this is entirely speculation, but intended to show what
> > type of explanation is acceptable and can be used to fix things.
> > Hopefully CenturyLink does come out with IP-engineering readable
> > explanation, so that we may use it as leverage to support work in our
> > own domains to remove such risks.
> >
> > a) do not run L2 DCN/OOB
> > b) do not connect MGMT ETH (it is unprotected access to control-plane,
> > it cannot be protected by CoPP/lo0 filter/LPTS etc.)
> > c) do add in your RFP scoring item for proper OOB port (Like Cisco CMP)
> > d) do fail optical network up
> >
> > --
> >   ++ytti
> >
>
>
> --
>   ++ytti
>
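
For anyone who wants to see why guesses (b) and (c) in the quoted message
hang together, here is a toy model of the failure mode. It is only a sketch
under loud assumptions: the ring topology, the function name simulate_ring
and all parameters are hypothetical, and it says nothing about how
CenturyLink's DCN is actually built. It assumes a ring of L2 switches where
every broadcast frame is also punted to each switch's control plane, and
compares a ring with one link blocked (as spanning tree would do) against a
ring where nothing is blocked (for example because BPDUs were dropped).

```python
def simulate_ring(num_switches, blocked_link=None, ticks=20):
    """Toy model: broadcast frames circulating in a ring of L2 switches.

    Switch i is linked to switch (i + 1) % num_switches by link i.
    blocked_link is the index of a link that is blocked (as STP would do),
    or None for an unprotected loop. Returns, per tick, the number of
    frames delivered to switches, i.e. the control-plane load if every
    broadcast is punted to the CPU (an assumption of this model).
    """
    in_flight = []  # each frame is (current_switch, direction), direction +1/-1
    load = []
    for _ in range(ticks):
        # Switch 0 originates one broadcast per tick, flooded both ways.
        in_flight.append((0, +1))
        in_flight.append((0, -1))
        delivered = []
        for switch, direction in in_flight:
            # Link used when leaving `switch` in the given direction.
            link = switch if direction == +1 else (switch - 1) % num_switches
            if link == blocked_link:
                continue  # frame stops at the blocked port
            # Ethernet has no TTL, so nothing else ever removes the frame.
            delivered.append(((switch + direction) % num_switches, direction))
        in_flight = delivered
        load.append(len(in_flight))
    return load


if __name__ == "__main__":
    print("link blocked:", simulate_ring(6, blocked_link=3))
    print("L2 loop:     ", simulate_ring(6, blocked_link=None))
```

With a blocked link the load settles at a small constant, because each
broadcast reaches every other switch once and then dies. Without it the
frames never leave the ring, so the load grows every tick for as long as
anything keeps sending broadcasts. A meshed topology would be worse than
this ring, since frames would also be duplicated at every branch.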