What to expect after a cooling failure
george.herbert at gmail.com
Wed Jul 10 09:07:29 UTC 2013
Numbers are from memory and filed off a bit for anonymity, but....
A site I was consulting for had a statistically large population of x86 servers (say, 3,000), SPARC enterprise gear (100), NetApp units (60), and NetApp drives (5,000+) go through a roughly 42C excursion. It was much hotter at ceiling level, but fortunately the ceilings were high (20 feet). That came within about 1C of the head fuse temperature of the (wet-pipe) sprinkler system... (shudder)
Both NetApp and x86 server PSUs had significantly increased failure rates for the next year. Say, in rough numbers, 10% failed within the year; about 2% were instant failures.
Hard drives also had a significantly higher failure rate for the next year, likewise in the 10% range.
No change in the rate of motherboard, CPU, or RAM failures was noted, as I recall.
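To put those rough percentages in fleet terms, here is a minimal back-of-the-envelope sketch. It assumes one counted PSU per x86 server and uses only the approximate fleet sizes and ~10%/~2% rates quoted above; the figures are illustrative, not measured data.

```python
# Back-of-the-envelope estimate of post-excursion failures, using the
# rough fleet sizes and failure rates quoted above. All inputs are
# approximations from the original post; PSU-per-server count is assumed.

fleet = {
    "x86 server PSUs": 3000,   # assumption: ~1 PSU counted per server
    "NetApp unit PSUs": 60,
    "hard drives": 5000,
}

ANNUAL_FAIL_RATE = 0.10   # ~10% failed within the following year
INSTANT_FAIL_RATE = 0.02  # ~2% failed immediately (PSUs)

for component, count in fleet.items():
    year_failures = count * ANNUAL_FAIL_RATE
    print(f"{component}: ~{year_failures:.0f} expected failures in the year")

instant_psu = (fleet["x86 server PSUs"] + fleet["NetApp unit PSUs"]) * INSTANT_FAIL_RATE
print(f"instant PSU failures: ~{instant_psu:.0f}")
```

Even at these rough rates, that is on the order of 300 PSUs and 500 drives replaced over the year, which is the kind of sustained elevated churn to budget spares for.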
George William Herbert
Sent from my iPhone
On Jul 9, 2013, at 8:28 PM, "Erik Levinson" <erik.levinson at uberflip.com> wrote:
> As some may know, yesterday 151 Front St suffered a cooling failure after Enwave's facilities were flooded.
> One of the suites that we're in recovered quickly, but the other took much longer, and some of our gear shut down automatically due to overheating. We remotely shut down many redundant and non-essential systems in the hotter suite and remotely transferred some others to the cooler suite, to ensure that, at a minimum, all core systems stayed running in the hotter suite. We waited until temperatures returned to normal, then brought everything back online. The entire event lasted from approx 18:45 until 01:15. Apparently ambient temperature exceeded 43 degrees Celsius at one point on the cool side of the cabinets in the hotter suite.
> For those who have gone through such events in the past, what can one expect in terms of long-term impact...should we expect some premature component failures? Does anyone have any stats to share?
> Erik Levinson
> CTO, Uberflip
> 1183 King Street West, Suite 100
> Toronto ON M6K 3C5