Famous operational issues
Rogier van Eeten
rogier at virtunix.nl
Wed Feb 17 14:57:38 UTC 2021
Ahh, war stories. I like the one where I got a wake-up call that our IRC
server was on fire, together with the rest of the DC.
Not that widespread, but we reached Slashdot. :)
November 2002, University of Twente, The Netherlands. Some idiot wanted
to be a hero. He deflated people's tires so he could help inflate them.
One morning he thought it would be a good idea to start a small fire and
then extinguish it, so he would be the hero who stopped a fire. He
failed and the building burned down. He got caught a few days later when
he tried the same thing in a different building.
Almost all of the IT was in that building, including the core network,
uplinks to SURFNet (the Dutch educational network) and to the 2000
students living on campus. Ironically, a new DC was already being built,
and it was ready for use a few weeks later.
As we had quite a network for 2002, we hosted, for instance,
security.debian.org. The students all had 100Mbit in their rooms, so
some of them also hosted popular websites. One I can remember was an
image sharing site.
Some students immediately created a backup network: DHCP server, DNS
server with a catch-all, website explaining what was going on, IRC
A local ISP offered to sponsor 50Mbit for the residents, which was
connected via a microwave relay, and a temporary fiber was run through a
ditch to connect two parts of the campus residences. At the end of the
day all 2000 students had their internet connection back, although all
behind a single 50Mbit link.
Syslog message from the local SURFNet router:
lo0.ar5.enschede1.surf.net 3613: Nov 20 07:20:50.927 UTC:
%ENV_MON-2-TEMP: Hotpoint temp sensor(slot 18) temperature has reached
WARNING level at 61(C)
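For anyone who wants to pull the fields out of a line like that, here is a
minimal sketch in Python. It assumes the message has been joined back onto
one line and follows the usual %FACILITY-SEVERITY-MNEMONIC layout seen
above. The regular expression and the parse_env_mon name are made up for
this example and were only fitted to this single message.

```python
import re

# The message quoted above, joined back onto a single line.
LINE = (
    "lo0.ar5.enschede1.surf.net 3613: Nov 20 07:20:50.927 UTC: "
    "%ENV_MON-2-TEMP: Hotpoint temp sensor(slot 18) temperature has reached "
    "WARNING level at 61(C)"
)

# Hypothetical pattern, written only to fit the one line above.
PATTERN = re.compile(
    r"^(?P<host>\S+) (?P<seq>\d+): "
    r"(?P<timestamp>\w{3} +\d+ [\d:.]+ \w+): "
    r"%(?P<facility>[A-Z_]+)-(?P<severity>\d)-(?P<mnemonic>[A-Z_]+): "
    r"(?P<text>.*)$"
)


def parse_env_mon(line):
    """Split one router log line into named fields, or return None."""
    match = PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["seq"] = int(fields["seq"])
    fields["severity"] = int(fields["severity"])
    # The temperature is only present in the free-text part of the message.
    temp = re.search(r"at (\d+)\(C\)", fields["text"])
    fields["temp_c"] = int(temp.group(1)) if temp else None
    return fields


if __name__ == "__main__":
    for key, value in parse_env_mon(LINE).items():
        print(f"{key}: {value}")
```

A line that does not fit the pattern returns None rather than raising, so
the sketch can be run over a mixed log without special handling.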
(Disclaimer: Where I say we, I mean we as the university. I wasn't
working for the university, but was part of the group of students
working on the backup network. There are probably some other people on
the list with more details, and I've probably missed some, but this is
the summary.)
On 16-02-2021 23:08, Jared Mauch wrote:
> I was thinking about how we need a war stories nanog track. My favorite was being on call when the router was stolen.
> Sent from my TI-99/4a
>> On Feb 16, 2021, at 2:40 PM, John Kristoff <jtk at dataplane.org> wrote:
>> I'd like to start a thread about the most famous and widespread Internet
>> operational issues, outages or implementation incompatibilities you
>> have seen.
>> Which examples would make up your top three?
>> To get things started, I'd suggest the AS 7007 event is perhaps the
>> most notorious and likely to top many lists including mine. So if
>> that is one for you I'm asking for just two more.
>> I'm particularly interested in this as the first step in developing a
>> future NANOG session. I'd be particularly interested in any issues
>> that also identify key individuals that might still be around and
>> interested in participating in a retrospective. I already have someone
>> that is willing to talk about AS 7007, which shouldn't be hard to guess
>> Thanks in advance for your suggestions,