What to expect after a cooling failure

Tri Tran tritran at cox.net
Wed Jul 10 05:56:07 UTC 2013


I have seen DDR2 RAM give random errors from inadequate cooling. The cabinets were stacked to the max with servers but the doors were not meshed. DDR2 runs fairly hot, especially when all the banks are filled.
Tri Tran

-----Original Message-----
From: Jay Ashworth <jra at baylink.com>
Date: Wed, 10 Jul 2013 00:04:23 
To: NANOG<nanog at nanog.org>
Subject: Re: What to expect after a cooling failure

----- Original Message -----
> From: "Erik Levinson" <erik.levinson at uberflip.com>


> For those who have gone through such events in the past, what can one
> expect in terms of long-term impact...should we expect some premature
> component failures? Does anyone have any stats to share?

If the HDDs were spinning while above rated maximum ambient intake temp,
*especially* if they're not *right out front in the intake path* (is
anything not built that way anymore?  Yeah; the back side of 45-drive
Supermicro racks, among other things), you should probably plan on doing
a preemptive replacement cycle, or at the very least, pay *very* close
attention to smartd/smartctl, and have a good stock of pre-trayed replacements.
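
For the "pay close attention" part, here is a rough Python sketch of
the kind of check one might script. It assumes the classic ATA
attribute table that `smartctl -A /dev/sdX` prints (smartmontools), and
the set of watched attribute IDs is my own assumption, not something
from this thread. The helper names are hypothetical.

```python
import re

# Attribute IDs to watch after a heat event. This selection is an
# assumption for illustration; adjust it to your drives and vendor docs.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}

# One row of the `smartctl -A` attribute table:
# ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\S+"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_attributes(text):
    """Map attribute ID -> (name, leading integer of RAW_VALUE)."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[int(m.group(1))] = (m.group(2), int(m.group(3)))
    return attrs


def needs_attention(attrs):
    """Return the watched attribute IDs whose raw count is non-zero."""
    return sorted(i for i in WATCHED if i in attrs and attrs[i][1] > 0)
```

Feed `parse_attributes` the text captured from `smartctl -A` for each
drive and log the result daily; a rising count on any watched ID is a
reasonable trigger for pulling that drive before it fails on its own.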

Remember that you may fall in the RAID Hole if you wait for failures,
and hence lose data which isn't backed up anyway -- if more drives in a
RAID group fail *during rebuilds*, you're essentially screwed.

If your RAID groups were properly dispersed across drive build dates, then
this will probably be *slightly* less dangerous, but still.
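
To put a rough number on the rebuild risk, here is a small Python
sketch. It assumes drives fail independently at a constant rate, which
is optimistic for drives from one batch that were cooked together, so
treat the result as a floor rather than an estimate. The function name
and all inputs are hypothetical.

```python
def p_failure_during_rebuild(remaining_drives, annual_failure_rate,
                             rebuild_hours):
    """Probability that at least one remaining drive fails in the window.

    Assumes independent drives with a constant failure rate, so the
    per-drive survival over t hours is (1 - AFR) ** (t / 8760).
    """
    hours_per_year = 24 * 365
    survive_one = (1.0 - annual_failure_rate) ** (rebuild_hours / hours_per_year)
    # All remaining drives must survive for the rebuild to complete.
    return 1.0 - survive_one ** remaining_drives
```

Plug in the annual failure rate you are willing to assume for the
heat-stressed drives (higher than the datasheet figure) and your real
rebuild time; the probability grows with both, and with group width.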

Also watch bearing-type fans.

Cheers,
-- jra
-- 
Jay R. Ashworth                  Baylink                       jra at baylink.com
Designer                     The Things I Think                       RFC 2100
Ashworth & Associates     http://baylink.pitas.com         2000 Land Rover DII
St Petersburg FL USA               #natog                      +1 727 647 1274


