Tornados in Ashburn (Equinix affected)

Sean Donelan sean at donelan.com
Sun Sep 19 05:40:53 UTC 2004


On Sat, 18 Sep 2004, Deepak Jain wrote:
> 3) Many new systems [say datacenters built/upgraded in the last 5 years]
> haven't been around long enough to really test 99.999% and above levels
> of availability... many new systems won't start showing problems for
> 5-10 years.

Past performance is not a guarantee of future results.

Sometimes you get lucky.  My residence, with no UPS, no backup generator,
and no surge protection, hasn't lost power in almost 5 years, even during
the California rolling blackouts.  Nevertheless I wouldn't recommend using
my residence as a co-location facility.

The "five 9s" figure is a bit of a myth and invites some creative
statistics. There are datacenters over 5 years old which have met 100%
scheduled availability.  They are rare and probably exceeded their design
expectations.  All of the ones I know about are private data centers, not
co-location, and all the owners have backup data centers because they know
one day they will have a problem. On the other hand, there are many
private data centers that are worse than professionally operated
co-location facilities.
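
To put numbers on that, here is a rough sketch of the arithmetic, assuming
availability is measured simply as uptime divided by total elapsed time
(operators often count only "scheduled" time, which is where the creative
statistics come in). The function names and the 30-minute outage are made
up for illustration and do not describe the Ashburn event.

```python
# Average year length in minutes, including leap days.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_budget_minutes(availability_pct, years=1.0):
    """Minutes of outage allowed over `years` at the given availability."""
    return (1 - availability_pct / 100.0) * MINUTES_PER_YEAR * years


def observed_availability_pct(outage_minutes, years):
    """Availability actually achieved, given total outage minutes."""
    return 100.0 * (1 - outage_minutes / (MINUTES_PER_YEAR * years))


if __name__ == "__main__":
    # Outage budget per year for three, four and five 9s.
    for pct in (99.9, 99.99, 99.999):
        print(f"{pct}% allows {downtime_budget_minutes(pct):.1f} min/year")

    # Hypothetical: one 30-minute outage in an otherwise perfect 5 years.
    print(f"5-year budget at five 9s: "
          f"{downtime_budget_minutes(99.999, years=5):.1f} min")
    print(f"Achieved with one 30-minute outage: "
          f"{observed_availability_pct(30, years=5):.5f}%")
```

Five 9s works out to a little over 5 minutes of outage per year, so a
single half-hour event uses up more than five years of budget.  A 5-year
clean record therefore says little about whether a facility will hold
that level over the long run.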

> 1) Good that they [seemed] to have maintained partial power.

It would be interesting to find out what happened to the two UPSes that
apparently failed.  Was it something that exceeded the design, i.e. a
lightning strike greater than X joules?  Or something else?  Equinix
tests the heck out of their systems, but there is always the potential
for a problem.

> 2) Good that they restored cooling [power to the blowers?] relatively
> quickly. By the graph someone posted and their message, it looks like
> their chillers were on an unaffected system, but their blowers weren't
> [as in, were affected].

The initial spike looks normal, although a bit bigger than is comfortable.
Chiller plants and compressors take several minutes to reset and restart
when the backup generators come online.  The storm may have had some
impact on the recovery because the temperature appears to take a long time
to stabilize.

> 3) Good that they seemed to be able to bring together enough
> knowledgeable folks quickly to resolve the problems that did occur
> relatively quickly.

Yep, whatever the problem, restoration that quickly tends to indicate
their team was on the ball.  Stuff will always fail.  The real test is
how quickly it is fixed.


