Redundant Data Center Architectures

Truman Boyes truman at suspicious.org
Thu Oct 29 04:52:05 UTC 2009


On 29/10/2009, at 8:39 AM, Stefan Fouant wrote:

>> -----Original Message-----
>> From: Darren Bolding [mailto:darren at bolding.org]
>> Sent: Wednesday, October 28, 2009 4:57 PM
>> To: Roland Dobbins
>> Cc: NANOG list
>> Subject: Re: Redundant Data Center Architectures
>>
>> Also, commercial solutions from F5 (their GTM product and their old
>> 3-DNS product).
>>
>> Using CDNs is also a way of handling this, but you need to be
>> prepared for all your traffic to come from their source IPs, or do
>> creative things with X-Forwarded-For, etc.
>>
>> Making an active/active datacenter design work (or preferably one
>> with enough sites that more than one can be down without seriously
>> impacting service) is a serious challenge.  Lots of people will tell
>> you (and sell you solutions for) parts of the puzzle.  My experience
>> has been that the best case is when the architecture of the
>> application/infrastructure has been designed with these challenges
>> in mind from the get-go.  I have seen that done on the network and
>> server side, but never on the software side; that has always
>> required significant effort when the time came.
>>
>> The "drop in" solutions for this (active/active database replication,
>> middleware solutions, proxies) are always expensive in one way or
>> another
>> and frequently have major deployment challenges.
>>
>> The network side of this can frequently be the easiest to resolve,
>> in my experience.  If you are serving up content that does not
>> require synchronized data on the backend, that will make your life
>> much easier, and GSLB, a CDN, or similar may help a great deal.
>
> Thanks everyone who has responded so far.
>
> I should have clarified my intent a bit in the original email.  I am
> definitely interested in architectures which support synchronized
> data between data center locations in as close to real time as
> possible.  More specifically, I am interested in designs which
> support zero downtime during failures, or as close to zero downtime
> as possible.  GSLB, Anycast, CDNs... those types of approaches
> certainly have their place, especially where the pull model is
> employed (DNS, Netflix, etc.).  However, what types of solutions are
> being used for synchronized data and even network I/O on back-end
> systems?  I've been looking at the VMware vSphere 4 Fault Tolerance
> stuff to synchronize the data storage and network I/O across
> distributed virtual machines, but I'm still worried about the
> consequences of doing this across WAN links with high degrees of
> latency, etc.  From the thread I get the feeling that L2
> interconnects (VPLS, PWs) are generally considered a bad thing; I
> gathered as much, as I figured there would be lots of unintended
> consequences with regard to designated router elections and other
> weirdness.  Besides connecting sites via L3 VPNs, what other
> approaches are others using?  Also, I would appreciate any comments
> on the synchronization items above.
>
> Thanks,
>
> --
> Stefan Fouant

Layer 2 interconnects (whether they are VPLS / PWE3 / or other
CCC-based models) are not bad in their own right, but I think it's
important to realize that extending a (sub)network across large
geographical regions, simply because applications have no intelligence
about locality or presence, is not sound engineering. I hear it all
the time: just extend layer 2 between these two data centers so that
we can have either (1) disaster recovery or (2) vMotion / heartbeats /
etc.

The truth is we can do things better and smarter than just extending
bridging domains across disparate geographical locations. Real-time
storage should ideally be local, but there is no reason why it can't
be "available" over the cloud to other networks. The key is to have a
single namespace for all storage: not to be tied to a particular
storage technology, but simply to be able to present the
storage/disk/mount point to virtual machines.
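
As a rough illustration of that idea, here is a minimal sketch of how
a provisioning layer might resolve one logical volume name to
whichever replica is reachable, preferring the local copy. The names,
paths, and the local-first policy are assumptions for illustration,
not any particular product's behaviour:

import os

# Hypothetical global namespace: one logical name, several physical
# replicas. Which path backs the name at a given site is an assumption
# for illustration only.
REPLICAS = {
    "vol/app-db": [
        "/global/dc1/app-db.img",  # local replica in this data center
        "/global/dc2/app-db.img",  # remote replica, reachable but slower
    ],
}

def resolve(logical_name):
    """Return the first reachable replica, preferring the local one."""
    for path in REPLICAS[logical_name]:
        if os.path.exists(path):
            return path
    raise RuntimeError("no replica of %s is reachable" % logical_name)

# Example: the VM is handed a disk path without caring which
# technology or site actually backs it (raises if no replica is
# mounted on this host).
print(resolve("vol/app-db"))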

Extending layer 2 for iSCSI / SAN / and even FCoE is feasible. But
let's think about the technology in detail: FCoE relies on pause
frames for lossless delivery, and when there is significant
geographical delay between sites, FCoE is not the right technology. It
works great locally, and it should be treated as just one technology
for delivering storage locally within a DC.
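
A quick back-of-the-envelope shows why distance breaks this model: a
pause frame cannot take effect until it has crossed the link, so
roughly one round-trip's worth of data is already in flight and has to
be buffered to stay lossless. The 10 Gbps link speed and the RTTs
below are illustrative assumptions:

# Bytes in flight on a PAUSE-based (lossless) link during one RTT.
LINK_BPS = 10e9  # assume a 10GbE link

for label, rtt_s in [("same row, ~10 us", 10e-6),
                     ("metro DC pair, ~2 ms", 2e-3),
                     ("cross-country WAN, ~70 ms", 70e-3)]:
    in_flight_bytes = LINK_BPS * rtt_s / 8
    print("%-25s %12.0f bytes in flight" % (label, in_flight_bytes))

# ~12 KB at local distances, but ~2.5 MB metro and ~87 MB
# cross-country: at WAN RTTs you need megabytes of headroom per port
# just to stay lossless, which is why PAUSE-based FCoE is a poor fit
# between sites.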

Internally I would explore DNS-based approaches (GSLB / anycast /
etc.) and even ideas like Mobile IPv4 / IPv6 mobility before I started
extending layer 2 domains across the world.
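
For the GSLB piece, the core decision can be sketched in a few lines.
This is only a toy, assuming a priority-ordered site list and a bare
TCP health check; real GSLB products (F5 GTM and the like) layer geo,
latency and load metrics on top, and the addresses below are
documentation-range placeholders:

import socket

# Hypothetical service VIPs, in priority order.
SITES = [
    ("dc1", "192.0.2.10"),
    ("dc2", "198.51.100.10"),
]

def healthy(ip, port=443, timeout=2):
    """Crude health check: can we complete a TCP handshake to the VIP?"""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False

def gslb_answer():
    """Return the A record a GSLB-style DNS server would hand out:
    the first healthy site in priority order, or None if all are down."""
    for name, ip in SITES:
        if healthy(ip):
            return ip
    return None

print(gslb_answer())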

Kind regards,
Truman
