FYI Netflix is down

Sat Jun 30 13:12:27 UTC 2012

On Jun 30, 2012 12:25 AM, "joel jaeggli" <joelja at bogus.com> wrote:
>
> On 6/30/12 12:11 AM, Tyler Haske wrote:
>>>
>>> I am not a computer science guy, but I have been around a long time. Data
>>> centers and clouds are like software. Once they reach a certain size, it's
>>> impossible to keep the bugs out. You can test and test your heart out and
>>> something will slip by. You can say the same thing about nuclear reactors,
>>> Apollo moon missions, the Northeast power grid, and most other technology
>>> disasters.
>>
>> How to run a datacenter 101: have more than one location, preferably
>> far apart. It being Amazon, I would expect more. :/
>
> There are 7 regions in EC2: three in North America, two in Asia, one in
> Europe, and one in South America.
>
> US East, the region currently being impacted, is further subdivided
> into 5 availability zones.
>
> us-east-1d appears to be the only zone currently being impacted.
>
> Distributing your application is left as an exercise for the reader.
>
>

+1

Sorry to be the Monday morning quarterback, but the sites that went down
learned a valuable lesson in single-point-of-failure analysis: a highly
redundant, professionally run data center is still a single point of failure.

Geo-redundancy is key. In fact, I would take distributed data centers over
RAID, UPS, or any other "fancy pants" © mechanism any day.

And AWS East also seems to be cursed. I would run out of West for a
while. :-)

I would also look into clouds of clouds, i.e. spreading across more than one
provider. Who knows? Amazon could have an Enron moment, at which point a
corporate entity with a tax ID becomes the single point of failure.

Pay your money, take your chances.

CB