<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>One thing that is troubling when reading that URL is that
several steps of the restoration appear to have required teams to go
onsite for local login, etc. Granted, to troubleshoot hardware you need
to be physically present to pop a line card in and out, but
CTL/LVL3 should have full out-of-band console and power control to
all core devices. We shouldn't be waiting for someone to drive to
a location to get console or do power cycling. And I would imagine
the first step in a lot of the troubleshooting was power cycling
and local console logs.<br>
</p>
<p><br>
</p>
<p>-John<br>
</p>
<p><br>
</p>
<p><br>
</p>
<div class="moz-cite-prefix">On 12/30/18 10:42 AM, Mike Hammett
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:543548599.1639.1546184537220.JavaMail.mhammett@ThunderFuck">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<style type="text/css">p { margin: 0; }</style>
<div style="font-family: arial,helvetica,sans-serif; font-size:
10pt; color: #000000">It's technical enough so that laypeople
immediately lose interest, yet completely useless to anyone that
works with this stuff.<br>
<br>
<div><span name="x"></span><br>
<br>
-----<br>
Mike Hammett<br>
Intelligent Computing Solutions<br>
<a class="moz-txt-link-freetext" href="http://www.ics-il.com">http://www.ics-il.com</a><br>
<br>
Midwest-IX<br>
<a class="moz-txt-link-freetext" href="http://www.midwest-ix.com">http://www.midwest-ix.com</a><span name="x"></span><br>
</div>
<br>
<hr id="zwchr">
<div
style="color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;"><b>From:
</b>"Saku Ytti" <a class="moz-txt-link-rfc2396E" href="mailto:saku@ytti.fi">&lt;saku@ytti.fi&gt;</a><br>
<b>To: </b>"nanog list" <a class="moz-txt-link-rfc2396E" href="mailto:nanog@nanog.org">&lt;nanog@nanog.org&gt;</a><br>
<b>Sent: </b>Sunday, December 30, 2018 7:42:49 AM<br>
<b>Subject: </b>CenturyLink RCA?<br>
<br>
Apologies for the URL, I do not know the official source and I do
not<br>
share the URL's sentiment.<br>
<a class="moz-txt-link-freetext" href="https://fuckingcenturylink.com/">https://fuckingcenturylink.com/</a><br>
<br>
Can someone translate this to IP engineer? What actually
happened?<br>
From my own history, I rarely recognise the problem I fixed
from<br>
reading the public RCA. I hope CenturyLink will do better.<br>
<br>
Best guess so far that I've heard is<br>
<br>
a) CenturyLink runs global L2 DCN/OOB<br>
b) there was a HW fault which caused an L2 loop (perhaps HW dropped
BPDU,<br>
I've had this failure mode)<br>
c) DCN had direct access to control-plane, and L2 congested<br>
control-plane resources causing it to deprovision waves<br>
<br>
Now of course this is entirely speculation, but intended to
show what<br>
type of explanation is acceptable and can be used to fix
things.<br>
Hopefully CenturyLink does come out with an IP-engineer-readable<br>
explanation, so that we may use it as leverage to support work
in our<br>
own domains to remove such risks.<br>
<br>
a) do not run L2 DCN/OOB<br>
b) do not connect MGMT ETH (it is unprotected access to
control-plane,<br>
it cannot be protected by CoPP/lo0 filter/LPTS etc.)<br>
c) do add an RFP scoring item for a proper OOB port (like
Cisco CMP)<br>
d) do fail optical network up<br>
<br>
-- <br>
++ytti<br>
</div>
<br>
</div>
</blockquote>
</body>
</html>