Reliable Cloud host?

William Herrin bill at herrin.us
Mon Feb 27 15:28:37 UTC 2012


On Sun, Feb 26, 2012 at 7:02 PM, Randy Carpenter <rcarpen at network1.net> wrote:
>> On Feb 26, 2012, at 4:56 PM, Randy Carpenter wrote:
>> > 1. Full redundancy with instant failover to other hypervisor hosts
>> > upon hardware failure (I thought this was a given!)
>>
>> This is actually a much harder problem to solve than it sounds, and
>> gets progressively harder depending on what you mean by "failover".
>>
>> At the very least, having two physical hosts capable of running your
>> VM requires that your VM be stored on some kind of SAN (usually
>> iSCSI based) storage system. Otherwise, two hosts have no way of
>> accessing your VM's data if one were to die. This makes things an
>> order of magnitude or higher more expensive.
>
> This does not have to be true at all.  Even having a fully fault-tolerant
> SAN in addition to spare servers should not cost much more than
> having separate RAID arrays inside each of the servers, when you
> are talking about 1,000s of servers (which Rackspace certainly has).

Randy,

You're kidding, right?

SAN storage costs the better part of an order of magnitude more than
server storage, which itself is several times more expensive than
workstation storage. That's before you duplicate the SAN and set up
the replication process so that cabinet and room level failures don't
take you out.

DR sites then create a ferocious (read: expensive) bandwidth
challenge. Data can't flush from the primary SAN's write cache until
the DR SAN acknowledges receipt. If you don't have enough bandwidth to
keep up under the heaviest daily loads, the cache quickly fills and
the writes block.
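
To put rough numbers on that, here is a minimal back-of-the-envelope
sketch. It assumes a constant write rate and a constant replication
link rate, which real workloads won't give you, and the function name
and all figures are made up for illustration.

```python
def seconds_until_writes_block(cache_bytes, write_rate, link_rate):
    """Estimate how long a synchronously replicated write cache lasts.

    Data leaves the primary's cache only after the DR side acknowledges
    it, so the cache drains at link_rate while it fills at write_rate.
    Rates are in bytes per second. Returns None if the link keeps up.
    """
    backlog_growth = write_rate - link_rate
    if backlog_growth <= 0:
        return None  # the DR link keeps pace; the cache never fills
    return cache_bytes / backlog_growth


# Hypothetical figures: 8 GB of cache, 200 MB/s of peak writes,
# and a 1 Gbit/s (125 MB/s) link to the DR site.
peak = seconds_until_writes_block(8e9, 200e6, 125e6)
print(f"writes block after about {peak:.0f} seconds of peak load")
```

The point of the arithmetic: the cache only buys you time in
proportion to its size, so the link has to be sized for the peak
write rate, not the average.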


I maintain 50ish VMs with about 30 different providers at the moment.
Not one of them attempts to do anything like what you describe.


> NetApp. HA heads. Done. Add a DR site with replication,
> and you can survive a site failure, and be back up and
> running in less than an hour. I would think that the big
> datacenter guys already have this type of thing set up.

That's expensive and VMs are sold primarily on price. You want high
reliability, you start with the dedicated colo server. Customers who
want DR in a VM environment buy two VMs and build data replication at
the app layer.


On Mon, Feb 27, 2012 at 9:31 AM, Max <perldork at webwizarddesign.com> wrote:
> Linode.com is not cloud based but they offer IP failover between VPS
> instances at no additional charge - their pricing is excellent, I have
> had no down time issues with them in 3+ years with 3 different
> customers using them and they have nice OOB and programmatic API
> access for controlling VPS instances as well.

Hi Max,

I have had superb results from Linode and highly recommend them.
However, they're facilitating application-level failover, not keeping
your VM magically alive. And:

http://library.linode.com/linux-ha/ip-failover-heartbeat-pacemaker-ubuntu-10.04

"Both Linodes must reside in the same datacenter for IP failover"

So they don't support a full DR capability even if you're smart at the
app level.
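
For anyone unfamiliar with what "application level failover" means
in practice, here is a minimal sketch of the decision a standby node
makes. It is not Heartbeat's or Pacemaker's actual logic; the class
name and the dead-time parameter are hypothetical, and the step of
actually claiming the shared IP is left out.

```python
class StandbyMonitor:
    """Tracks heartbeats from the active node and decides when to take over."""

    def __init__(self, dead_time):
        self.dead_time = dead_time  # seconds of silence before declaring the peer dead
        self.last_seen = None       # timestamp of the most recent heartbeat

    def heartbeat(self, now):
        """Record that the active node was heard from at time `now`."""
        self.last_seen = now

    def should_take_over(self, now):
        """True once the active node has been silent longer than dead_time."""
        if self.last_seen is None:
            return False  # never heard from the peer; don't grab the IP blindly
        return now - self.last_seen > self.dead_time


monitor = StandbyMonitor(dead_time=10)
monitor.heartbeat(now=100)
print(monitor.should_take_over(now=105))  # peer was heard from 5 seconds ago
print(monitor.should_take_over(now=111))  # peer has been silent for 11 seconds
```

Note that the VM itself is not kept alive by any of this. A second
VM notices the silence and takes over the address, and your
application has to cope with whatever state was lost.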


On Mon, Feb 27, 2012 at 9:39 AM, Jared Mauch <jared at puck.nether.net> wrote:
> Is the DNS service authoritative or recursive?  If auth, you can
> solve this a few ways, either by giving the DNS name people
> point to multiple AAAA (and A) records pointing at a diverse
> set of instances.  DNS is designed to work around a host
> being down.  Same goes for MX and several other services.
>  While it may make the service slightly slower, it's certainly
> not the end of the world.

Hi Jared,

How DNS is designed to work and how it actually works are not the same.
Look up "DNS Pinning" for example. For most kinds of DR you need
IP-level failover where the IP address is rerouted to the available site.
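
For contrast, the client behavior that the multiple-record approach
depends on looks roughly like the sketch below. The function name is
hypothetical and the addresses in the usage note are documentation
ranges, not real servers. A client that pins the first address it
resolved never gets past the first pass through the loop, which is
the gap between design and practice.

```python
import socket


def connect_first_available(addresses, port, timeout=3.0,
                            connect=socket.create_connection):
    """Try each address in turn and return the first successful connection.

    `connect` is injectable so the loop can be exercised without a network;
    by default it is the standard library's socket.create_connection.
    """
    last_error = None
    for addr in addresses:
        try:
            return connect((addr, port), timeout)
        except OSError as exc:
            last_error = exc  # this host is down or unreachable; try the next
    raise OSError(f"no address reachable: {last_error}")
```

Usage would be something like
connect_first_available(["192.0.2.1", "198.51.100.1"], 80), with the
address list coming from whatever the resolver returned. Each dead
address costs the client up to the full timeout before it moves on.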

Regards,
Bill Herrin


-- 
William D. Herrin ................ herrin at dirtside.com  bill at herrin.us
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004



