FYI Netflix is down

Hal Murray hmurray at megapathdsl.net
Tue Jul 3 04:24:29 UTC 2012


George Herbert <george.herbert at gmail.com> said:

> I worked for a Sun clone vendor (Axil) for a while and took some of our
> systems and storage to Comdex one year in the 90s.  We had a RAID unit
> (Mylex controller) we had just introduced.  Beforehand, I made REALLY REALLY
> SURE that the pull-the-disk and pull-the-redundant-power tricks worked.  And
> showed them to people with the "Please keep in mind that this voids the
> warranty, but here we *rip* go...".  All of the other server vendors were
> giving me dirty looks for that one. Apparently I sold a few systems that
> way. 

:)  Nice.  Thanks.

Many years ago, I worked for one of DEC's research groups.  We built a 
network using FDDI 4B/5B link technology based on AMD TAXI chips.  (They were 
state of the art back then.)  The switches were 3U(?) boxes with 12 ports.  
It took a rack of 6 or 8 of them in the phone closet to cover a floor.  
Workstations had 2 cables plugged into different switches.  In theory, we 
covered any single point of failure.

My office was near the phone closet.  I got to watch my boss give demos to 
visiting VIPs.  He was pretty good at it.  In the middle of explaining 
things, he would grab a power cord and yank it.  Blinka-blinka-blinka and the 
remaining switches would reconfigure and go back to work.  (It took under a 
second.)

It was interesting to watch the VIPs.  Most of them got it: the network 
really could recover quickly. The interesting ones had a telco background.  
They were really surprised.  The concept of disrupting live traffic for 
something as insignificant as a demo was off scale in their culture.

It was just a research lab.  We were used to eating our own dog food.

----------

"Greg D. Moore" <mooregr at greenms.com> said:

> If folks have not read it, I would suggest reading Normal Accidents  by
> Charles Perrow.

+1

> The "it can't happen" is almost guaranteed to happen. ;-)  And when  it
> does, it'll often interact in ways we can't predict or sometimes  even
> understand. 

My memory of that sort of event is roughly...  (see above for context)

The hardware broke and turned a vanilla packet into a super-long packet.  My 
FPGA code was supposed to catch that case and do something sane.  It was 
never tested and didn't work.  It poured crap all over memory.  Needless to 
say, things went downhill from there.

Easy to spot in hindsight.  None of us thought that was an interesting case 
while we were testing.
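
For illustration only, here is a minimal sketch in Python (not the original FPGA logic) of the kind of guard that was supposed to catch the super-long packet, plus the oversize test case that nobody ran.  The names MAX_FRAME and store_frame, and the 4500-byte limit, are assumptions made up for this example.

```python
MAX_FRAME = 4500  # hypothetical per-slot limit in bytes (an assumption)


def store_frame(memory: bytearray, offset: int, frame: bytes) -> bool:
    """Copy a frame into its buffer slot, refusing anything oversize.

    Returns True if the frame was stored, False if it was dropped.
    """
    # Check the length BEFORE writing, so a corrupted super-long frame
    # can never spill past its slot into neighbouring memory.
    if len(frame) > MAX_FRAME or offset + MAX_FRAME > len(memory):
        return False
    memory[offset:offset + len(frame)] = frame
    return True


# The "uninteresting" case: feed in a frame one byte too long and
# confirm that nothing outside (or inside) the slot was touched.
mem = bytearray(2 * MAX_FRAME)
stored_ok = store_frame(mem, 0, b"\x01" * 100)
stored_long = store_frame(mem, 0, b"\x02" * (MAX_FRAME + 1))
```

The check itself is trivial.  The point is that the oversize path only counts as working once a test has actually driven a too-long frame through it.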


-- 
These are my opinions.  I hate spam.






