Reliable Cloud host ?

George Herbert george.herbert at
Mon Feb 27 19:19:27 UTC 2012

On Mon, Feb 27, 2012 at 7:28 AM, William Herrin <bill at> wrote:
> On Sun, Feb 26, 2012 at 7:02 PM, Randy Carpenter <rcarpen at> wrote:
>>> On Feb 26, 2012, at 4:56 PM, Randy Carpenter wrote:
>>> > 1. Full redundancy with instant failover to other hypervisor hosts
>>> > upon hardware failure (I thought this was a given!)
>>> This is actually a much harder problem to solve than it sounds, and
>>> gets progressively harder depending on what you mean by "failover".
>>> At the very least, having two physical hosts capable of running your
>>> VM requires that your VM be stored on some kind of SAN (usually
>>> iSCSI based) storage system. Otherwise, two hosts have no way of
>>> accessing your VM's data if one were to die. This makes things an
>>> order of magnitude or higher more expensive.
>> This does not have to be true at all.  Even having a fully fault-tolerant
>> SAN in addition to spare servers should not cost much more than
>> having separate RAID arrays inside each of the servers, when you
>> are talking about 1,000s of servers (which Rackspace certainly has)
> Randy,
> You're kidding, right?
> SAN storage costs the better part of an order of magnitude more than
> server storage, which itself is several times more expensive than
> workstation storage. That's before you duplicate the SAN and set up
> the replication process so that cabinet and room level failures don't
> take you out.

This is clearly becoming a not-NANOG-ish thread, however...

Failing to have central shared storage (iSCSI, NAS, SAN, whatever you
prefer) fails the smell test on a local enterprise-grade
virtualization cluster, much less a shared cloud service.

Some people have done tricks with distributing the data using one of
the research-ish shared filesystems, rather than separate shared
storage.  That can be made to work if the host OS model and its
available shared filesystems work for you.  Doesn't work for VMware
vCenter / vMotion-ish stuff as far as I know.

There are plenty of people doing non-enterprise-grade virtualization.
There's no mandate that you have the ability to migrate a virtual to
another node in realtime or restart it immediately on another node if
the first node dies suddenly.  But anyone saying "we have a cloud" and
not providing that type of service is in marketing, not engineering.
From a systems architecture point of view, you can't do that.
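
To make the restart-on-another-node point concrete, here is a toy
sketch in Python.  Everything in it (Host, Cluster, fail_host, the VM
names) is made up for illustration and is not any hypervisor's API.
It only models the one constraint under discussion: a surviving node
can restart a VM only if it can reach that VM's disk image.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    # Hypothetical model of a hypervisor node.
    name: str
    alive: bool = True
    local_disks: set = field(default_factory=set)  # VM images on this node's own disks


@dataclass
class Cluster:
    # Hypothetical model of a virtualization cluster.
    hosts: dict                                      # host name -> Host
    shared_disks: set = field(default_factory=set)   # VM images on shared storage
    placement: dict = field(default_factory=dict)    # VM name -> host name

    def can_run(self, vm, host):
        # A host can run a VM only if it is up and can see the VM's image.
        return host.alive and (vm in self.shared_disks or vm in host.local_disks)

    def fail_host(self, name):
        """Mark a host dead and try to restart its VMs on another host.

        Returns {vm: new_host_name or None}; None means the VM stays down.
        """
        self.hosts[name].alive = False
        result = {}
        for vm, where in list(self.placement.items()):
            if where != name:
                continue
            target = next(
                (h.name for h in self.hosts.values() if self.can_run(vm, h)),
                None,
            )
            result[vm] = target
            if target is None:
                del self.placement[vm]      # nowhere to restart it
            else:
                self.placement[vm] = target
        return result


cluster = Cluster(
    hosts={"a": Host("a", local_disks={"vm-local"}), "b": Host("b")},
    shared_disks={"vm-shared"},
    placement={"vm-local": "a", "vm-shared": "a"},
)
outcome = cluster.fail_host("a")
print(outcome)
```

When node "a" dies, the VM whose image lives on shared storage comes
back on "b"; the one stored only on a's local disks has nowhere to go.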

-george william herbert
george.herbert at
