Facility wide DR/Continuity

Stefan netfortius at gmail.com
Wed Jun 3 15:01:38 UTC 2009

On Wed, Jun 3, 2009 at 7:09 AM, Drew Weaver <drew.weaver at thenap.com> wrote:

> Hi All,
> I'm attempting to devise a method which will provide continuous operation
> of certain resources in the event of a disaster at a single facility.
> The types of resources that need to be available in the event of a disaster
> are ecommerce applications and other business critical resources.
> Some of the questions I keep running into are:
> Should the additional sites be connected to the primary site (and/or the
> Internet directly)?
> What is the best way to handle the routing? Obviously two devices cannot
> occupy the same IP address at the same time, so how do you provide that
> instant 'cut-over'? I could see using application balancers to do this but
> then what if the application balancers fail, etc?
> Any advice from folks on list or off who have done similar work is greatly
> appreciated.
> Thanks,
> -Drew

In an environment where a DR site is deemed critical, it is my experience
that critical business applications also have a test or development
environment associated with the production one. Looked at this way, a DR
site equipped with the test/devel systems, with one "instance" of
production always available, would only be challenging in terms of data
sync. Various SAN solutions address that (SAN syncing over WAN/MAN/etc.).
Virtualizing the critical systems may add some benefits here: clone the
critical VMs at the DR site and, with the storage available there as well,
you'll be able to bring those machines up in no time. Just make sure you
have some sort of L2 connectivity available between the sites, maybe EoS or
tunneling over an L3 connection. There is plenty of information out there
if you search for virtual machine mobility and inter-site connectivity.

Voice has to be considered as well. For PSTN, make arrangements with your
provider to re-route (8xx) numbers in case of disaster. VoIP may add some
extra capabilities in terms of reachability over the Internet, in case your
DR site cannot accommodate everyone. C/S people, for example, are critical
for interfacing with customers during a disaster (no information means a
bigger loss, plus perception issues), so they have to be able to connect
even from home.

As far as an "immediate" switch from one site to the other goes, DNS is the
primary concern (unless some wise people have hardcoded IPs all over), but
there are other issues people tend to forget, at the core of some clients.
Take the Oracle "fat" client and its TNS names: I've seen those associated
with IPs instead of host names, etc.
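
To make the TNS names point concrete, here is a rough Python sketch that
flags HOST entries written as IP literals in tnsnames.ora-style text. It is
an illustration only: the function name, the regular expression and the
sample entries (ORDERS, BILLING, 192.0.2.10, db.example.com) are made up
for this example, and a real file may need more careful parsing than a
single regex.

```python
import ipaddress
import re

# Matches "(HOST = value)" inside an ADDRESS entry, case-insensitively.
# Assumption: one HOST value per parenthesised group, no embedded spaces.
HOST_RE = re.compile(r"\(\s*HOST\s*=\s*([^)\s]+)\s*\)", re.IGNORECASE)


def hardcoded_ip_hosts(tns_text):
    """Return HOST values in tns_text that are IP literals, in file order."""
    hits = []
    for value in HOST_RE.findall(tns_text):
        try:
            ipaddress.ip_address(value)
        except ValueError:
            continue  # not an IP literal, so it is resolved by name (DNS etc.)
        hits.append(value)
    return hits


# Hypothetical sample content, not taken from any real deployment.
SAMPLE = """
ORDERS =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.0.2.10)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = orders))
  )
BILLING =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = db.example.com)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = billing))
  )
"""

if __name__ == "__main__":
    for host in hardcoded_ip_hosts(SAMPLE):
        print("hardcoded IP:", host)
```

Any entry this reports would keep pointing at the primary site after a DNS
change, so those are the clients that need manual attention (or a move to
host names) before a cut-over can be anywhere near "immediate".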

Disclaimer: the above covers only one of many aspects. I have seen the DNS
comments already, so I won't repeat those.


More information about the NANOG mailing list