Level3 worldwide emergency upgrade?

Dorian Kim dorian at blackrose.org
Thu Feb 7 22:12:14 UTC 2013


No one had hit the IS-IS bug before the IETF-enforced maintenance freeze because no one in their right mind would be running three-week-old code back then. I don't think things have changed that much. ;)

-dorian

On Feb 7, 2013, at 4:19 PM, Siegel, David wrote:

> I remember being glued to my workstation for 10 straight hours due to an OSPF bug that took down the whole of net99's network.
> 
> I was pretty proud of our size at the time...about 30Mbps at peak.  Times are different and so are expectations.  :-)
> 
> Dave
> 
> 
> -----Original Message-----
> From: Brett Watson [mailto:brett at the-watsons.org] 
> Sent: Wednesday, February 06, 2013 6:07 PM
> To: nanog at nanog.org
> Subject: Re: Level3 worldwide emergency upgrade?
> 
> Hell, we used to not have to bother notifying customers of anything; we just fixed the problem. Reminds me of a story I've probably shared in the past.
> 
> 1995, IETF in Dallas. The "big ISP" I worked for at the time got tripped up on a 24-day IS-IS timer bug (maybe all of them at the time did, I don't recall) where all adjacencies reset at once. That's, like, entire network down. Working with our engineering team (in the *terminal* lab, mind you) and Ravi Chandra (then at Cisco), we reloaded the entire network of routers with new code from Cisco once they'd fixed the bug. I seem to remember this being my first exposure to Tony Li's infamous line, "... Confidence Level: boots in the lab."
> 
> Good times.
> 
> -b
> 
> 
> On Feb 6, 2013, at 5:41 PM, Brandt, Ralph wrote:
> 
>> David. I am on an evening shift and am just now reading this thread.   
>> 
>> I was almost tempted to write an explanation that would have had 
>> identical content with yours based simply on Level3 doing something 
>> and keeping the information close.
>> 
>> Responsible Vendors do not try to hide what is being done unless it is 
>> an Op Sec issue and I have never seen Level3 act with less than 
>> responsibility so it had to be Op Sec.
>> 
>> When it is that, it is best if the remainder of us sit quietly on the 
>> sidelines.
>> 
>> Ralph Brandt
>> 
>> 
>> -----Original Message-----
>> From: Siegel, David [mailto:David.Siegel at Level3.com]
>> Sent: Wednesday, February 06, 2013 12:01 PM
>> To: 'Ray Wong'; nanog at nanog.org
>> Subject: RE: Level3 worldwide emergency upgrade?
>> 
>> Hi Ray,
>> 
>> This topic reminds me of yesterday's discussion at the conference
>> around getting some BCOPs drafted.  It would be useful to confirm my
>> own view of the BCOP around communicating security issues.  My
>> understanding of the best practice is to limit knowledge distribution
>> of security-related problems both before and after the patches are
>> deployed.  You limit knowledge before the patch is deployed to prevent 
>> yourself from being exploited, but you also limit knowledge afterwards 
>> in order to limit potential damage to others (customers, 
>> competitors...the Internet at large).  You also do not want to 
>> announce that you will be deploying a security patch until you have a 
>> fix in hand and know when you will deploy it (typically, the next
>> available maintenance window unless the cat is out of the bag and
>> danger is real and imminent).
>> 
>> As a service provider, you should stay on top of security alerts from 
>> your vendors so that you can make your own decision about what action 
>> is required.  I would not recommend relying on service provider 
>> maintenance bulletins or public operations mailing lists for obtaining 
>> this type of information.  There is some information that can cause
>> more harm than good if it is distributed in the wrong way, and
>> information relating to security vulnerabilities definitely falls
>> into that category.
>> 
>> Dave
>> 
>> -----Original Message-----
>> From: Ray Wong [mailto:rayw at rayw.net]
>> Sent: Wednesday, February 06, 2013 9:16 AM
>> To: nanog at nanog.org
>> Subject: Re: Level3 worldwide emergency upgrade?
>> 
>> OK, having had that first cup of coffee, I can say perhaps the main 
>> reason I was wondering is I've gotten used to Level3 always being on 
>> top of things (and admittedly, rarely communicating). They've reached 
>> the top by often being a black box of reliability, so it's (perhaps
>> unrealistically) surprising to see them caught by surprise. Anything 
>> that pushes them into scramble mode causes me to lose a little sleep 
>> anyway. The alternative to what they did seems likely for at least a 
>> few providers who'll NOT manage to fix things in time, so I may well 
>> be looking at longer outages from other providers, and need to issue 
>> guidance to others on what to do if/when other links go down for 
>> periods long enough that all the cost-bounding monitoring alarms start 
>> to scream even louder.
>> 
>> I was also grumpy at myself for having not noticed advance 
>> communication, which I still don't seem to have, though since I 
>> outsourced my email to bigG, I've noticed I'm more likely to miss 
>> things. Perhaps giving up maintaining that massive set of procmail 
>> rules has cost me a bit more edge.
>> 
>> Related, of course, just because you design/run your network to
>> tolerate some issues doesn't mean you can also budget to be on a
>> support contract. :) Knowing more about the exploit/fix might mean
>> trying to find a way to get free upgrades to some kit to prevent more
>> localized attacks on other types of gear as well, though in this case
>> it's all about Juniper PR839412, so vendor-specific, it seems?
>> 
>> There are probably more reasons to wish for more info, too. There's 
>> still more of them (exploiters/attackers) than there are those of us 
>> trying to keep things running smoothly and transparently, so anything 
>> that smells of "OMG new exploit found!" also triggers my desire to 
>> share information. The network bad guys share information far more 
>> quickly and effectively than we do, it often seems.
>> 
>> -R>
>> 
>> 
>> 
> 
> 
> 




