Amazon diagnosis

Sun May 1 18:18:47 UTC 2011

On 5/1/2011 2:07 PM, Mike wrote:
> I am still waiting for proof that single points of failure can
> realistically be completely eliminated from any moderately complicated
> network environment / application. So far, I think Murphy is still
> winning on this one.

Sure they can, but as a thought exercise, full 2N redundancy is
difficult on a small scale for anything web-facing.  I've seen a very
simple implementation for a website requiring five nines that consumed
over $50k in equipment, and it wasn't even geographically diverse.  I
have to believe that scaling up the concept of "doing it right" results
in exponential cost increases.  To illustrate the problem, here is the
first step in the thought exercise:  find two datacenters with diverse
carriers that aren't on the same regional power grid.  As we learned in
the (iirc) 2003 power outage, New York and DC won't work, nor will Ohio,
so you need redundant teams to cover a very remote site.
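
To put rough numbers on "five nines" and on why the shared power grid
matters, here is a minimal back-of-the-envelope sketch.  The 99.9%
per-site figure and the function names are my own assumptions for
illustration, not measurements from any real deployment, and the
redundancy formula only holds if the two sites fail independently.

```python
# Rough availability arithmetic; all inputs are illustrative assumptions.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability):
    """Yearly downtime budget implied by an availability target."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def parallel_availability(single, copies):
    """Availability of `copies` redundant sites, assuming their
    failures are fully independent (no shared grid or carrier)."""
    return 1.0 - (1.0 - single) ** copies


five_nines = 0.99999
budget = downtime_minutes_per_year(five_nines)

site = 0.999  # hypothetical availability of one datacenter
pair = parallel_availability(site, 2)

print(f"five nines allows about {budget:.2f} minutes of downtime per year")
print(f"two independent {site:.1%} sites give about {pair:.4%}")
```

The independence assumption is the catch: two sites on the same
regional grid fail together, so the pair is no better than a single
site for that failure mode.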