Famous operational issues

Valdis Klētnieks valdis.kletnieks at vt.edu
Wed Feb 24 06:20:01 UTC 2021


On Tue, 23 Feb 2021 20:46:38 -0800, Randy Bush said:
> maybe late '60s or so, we had a few 2314 dasd monsters[0].  think maybe
> 4m x 2m with 9 drives with removable disk packs.
>
> a grave shift operator gets errors on a drive and wonders if maybe they
> swap it into another spindle.  no luck, so swapped those two drives with
> two others.  one more iteration, and they had wiped out the entire
> array.  at that point they called me; so i missed the really creative
> part.

I suspect every S/360 site that had 2314s had an operator who did that, as I
was witness to the same thing.  For at least a decade after that debacle, the
Manager of Operations was awarding Gold, Silver, and Bronze Danny awards for
operational screw-ups. (The 2314 event was the sole Platinum Danny :)

And yes, it was all too easy to hit the EPO button on the keyboard of an IBM
4341 console.  We got guards for the consoles after one of our operators nailed
the button a second time in a month.

And to tie the S/360 and 4341 together - we were one of the last sites that was
still running an S/360 Mod 65J.  And plans came through for a new server room
on the top floor of a new building.  Architect comes through, measures the S/360
and all the peripherals for floorspace and power/cooling - and the CPU, plus
*4* meg of memory, and 3 strings of 2314 drives chewed a lot of both.

Construction starts.  Meanwhile, IBM announces the 4341, and offers us a real
sweetheart deal because even at the high maintenance charges we were paying,
IBM was losing money. Something insane like the system and peripherals and
first 3 years of maintenance, for less than the old system's per-year
maintenance. Oh, and the power requirements are like 10% of the 360's.

So we take delivery of the new system and it's looking pitiful, just one box
and 2 small strings of disk in 10K square feet.  Lots of empty space. Do all
the migrations to the new system over the summer, and life is good.  Until
fall and winter arrive, and we discover there is zero heat in the room, and the
ceiling is uninsulated, and it's below zero outside because this is way upstate
NY.  And if there was a 360 in the room, it would *still* be needing cooling
rather than heating. But it's a 4341 that's shedding only 10% of the heat...

Finally, one February morning, the 4341 throws a thermal check. Air was too
cold at the intakes.  Our IBM CE did a double-take because he'd been doing IBM
mainframes for 3 decades and had never seen a thermal check for too cold
before.

Lots of legal action threatened against the architect, who simply said "If you
had *told* me that the system was being replaced, I'd have put heat in the
room". A settlement was reached, revised plans were drawn up, there was a whole
mess of construction to get ductwork and insulation and other stuff into place,
and life was good for the decade or so before I left for a better gig....



