Famous operational issues

Aaron C. de Bruyn aaron at heyaaron.com
Fri Feb 19 16:44:59 UTC 2021


All these stories remind me of two of my own from back in the late 90s.
I worked for a regional ISP doing some network stuff (under the real
engineer), and some software development.

Like a lot of ISPs in the 90s, this one started out in a rental house.
Over the months and years, rooms were slowly converted to host more and more
equipment as we expanded our customer base and presence in the region.
If we needed a "rack", someone would go to the store and buy a 4-post metal
shelf [1] or, in some cases, go to the dump to see what they had.

We had one that looked like an oversized filing cabinet with some sort of
rails on the sides.  I don't recall how the equipment was mounted, but I
think it was by drilling holes into the front lip and tapping the screws
in.  This was the big super-important rack.  It had the main router that
connected lines between 5 POPs around the region, and also several
connections to Portland, Oregon, about 60 miles away.  Since we were
making tons of money, we decided we should update our image and install
real racks in the "bedroom server room".  It was decided we were going to
do it with no downtime.

I was on the 2-man team that stood behind and in front of the rack holding
2x4s, dead-lifting them as each piece of equipment was unscrewed and lowered
onto the boards.  I was on the back side of the rack.  After all the equipment was
unscrewed, someone came in with a Sawzall and cut the filing cabinet thing
apart.  The top half was removed and taken away, then we lifted up on the
boards and the bottom half was slid out of the way.  The new rack was
brought in, bolted to the floor, and then one by one equipment was taken
off the pile we were holding up with 2x4s, brought through the back of the
new rack, and then mounted.

I was pleasantly surprised and very relieved when we finished moving the
big router, several switches, a few servers, and a UPS unit over to the new
rack with zero downtime.  The entire team cheered and cracked beers.  I
stepped out from behind the rack...
...and snagged the power cable to the main router with my foot.  I don't
recall the Cisco model number after all this time...but I do remember the
excruciating 6-8 minutes it took for the damn thing to reboot, and the
sight of the 7 PRI cards in our phone system almost immediately jumping
from 5 channels in use to 100% full.

It's been 20 years, but I swear my arms are still sore from holding all
that equipment up for ~20 minutes, and I always pick my feet up very slowly
when I'm near a rack. ;)

The second story is a short one from the same time period.  Our POPs
consisted of the aforementioned 4-post metal shelves piled high with
US Robotics 56k modems [2] stacked on top of each other.  They were wired
back to some sort of serial box that was in turn connected to an ISA card
stuck in a Windows NT 4 server that used RADIUS to authenticate sessions
with an NT4 server back at the main office that had user accounts for all
our customers.  Every single modem had a wall-wart power brick for power,
an RJ11 phone line, and a big old serial cable.  It was an absolute rat's
nest of cables.  The small POP (which I think was a TuffShed in someone's
yard about 50 feet from the telco building) was always 100 degrees--even in
the dead of winter.

One year we made the decision to switch to 3Com Total Control Chassis with
PRI cards.  The cut-over was pretty seamless and immediately made shelves
stacked full of hundreds of modems completely useless.  As we started
disconnecting modems with the intent of selling them for a few bucks to
existing customers who wanted to upgrade or giving them to new customers to
get them signed up, we found a bunch of the stacks of modems had actually
melted together due to the temps.  That explained the handful of numbers in
the hunt group that would just ring and ring with no answer.  In the end we
went from a completely packed 10x20 shed to two small 3Com TCH boxes packed
with PRI cards and a handful of PRI cables with much more normal
temperatures.

I thoroughly enjoyed the "wild west" days of the internet.

If Eric and Dan are reading this, thanks for everything you taught me about
networking, business, hard work, and generally being a good person.

-A

[1] -
https://www.amazon.com/dp/B01D54TICS

[2] - https://www.usr.com/products/56k-dialup-modem/usr5686g/



On Tue, Feb 16, 2021 at 11:39 AM John Kristoff <jtk at dataplane.org> wrote:

> Friends,
>
> I'd like to start a thread about the most famous and widespread Internet
> operational issues, outages or implementation incompatibilities you
> have seen.
>
> Which examples would make up your top three?
>
> To get things started, I'd suggest the AS 7007 event is perhaps the
> most notorious and likely to top many lists including mine.  So if
> that is one for you I'm asking for just two more.
>
> I'm particularly interested in this as the first step in developing a
> future NANOG session.  I'd be particularly interested in any issues
> that also identify key individuals that might still be around and
> interested in participating in a retrospective.  I already have someone
> that is willing to talk about AS 7007, which shouldn't be hard to guess
> who.
>
> Thanks in advance for your suggestions,
>
> John
>