What to expect after a cooling failure

Jimmy Hess mysidia at gmail.com
Wed Jul 10 05:07:11 UTC 2013


On 7/9/13, Erik Levinson <erik.levinson at uberflip.com> wrote:
> For those who have gone through such events in the past, what can one expect
> in terms of long-term impact...should we expect some premature component
> failures? Does anyone have any stats to share?

Realistically, you had a single short-lived stress event. There
are likely to be some random component failures in the future, but
you are unlikely to be able to attribute them to a short-lived
stress event of that magnitude. On average, there might be a small
increase over normal failure rates.

The bigger concern may be that /a lot of different components/
could have been subjected to the same kind of abuse at the same time,
including sets of components that are supposed to form a redundant
pair and not fail simultaneously.

I wouldn't necessarily be so concerned about premature failures.
I would be more concerned that you may have redundant components
that were exposed to the same stress event at the same time. The
assumption that their chances of failure are independent becomes
more questionable, and the chance of a correlated failure in the
future might be greatly increased, reducing the level of effective
redundancy/risk reduction you have today.

That would apply mainly to mechanical devices such as HDDs.
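
To put rough numbers on the independence point, here is a minimal
sketch. It assumes a simple two-device model in which each device
fails within some window with probability p and the two failure
events have correlation rho. The function name and the values of p
and rho are made up for illustration; they are not measured rates.

```python
def joint_failure_probability(p: float, rho: float = 0.0) -> float:
    """Probability that both members of a redundant pair fail in the same window.

    p   -- per-device failure probability in the window (hypothetical value)
    rho -- correlation between the two failure events; 0 means independent
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be in [0, 1]")
    # For two Bernoulli events with the same marginal probability p:
    #   P(both fail) = p^2 + rho * p * (1 - p)
    # rho = 0 gives the independent case p^2; rho = 1 gives p.
    return p * p + rho * p * (1.0 - p)


if __name__ == "__main__":
    p = 0.03  # illustrative per-drive failure probability, not a real statistic
    for rho in (0.0, 0.05, 0.25):
        joint = joint_failure_probability(p, rho)
        print(f"rho={rho:.2f}  P(both fail)={joint:.6f}")
```

Under these assumptions, even a small correlation matters more than
a modest rise in p: with p = 0.03, going from rho = 0 to rho = 0.05
more than doubles the chance of losing both members of the pair.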


> Thanks
--
-JH



