Facility wide DR/Continuity

gb10hkzo-nanog at yahoo.co.uk
Wed Jun 3 14:53:30 UTC 2009

As with all things, there's no "right answer".  A lot of it depends on three things:

- what you are hoping to achieve
- what your budget is
- what you have at your disposal in terms of numbers of qualified staff available to both implement and support the chosen solution

Those are the main business-level factors.  At the technical level, two key factors (although, of course, there are many others to consider) are:

- whether you are after an active/active or active/passive solution
- what the underlying application(s) are (e.g. you might have other options such as anycast with DNS)

Anyway, there's a lot to consider.  And despite all the expertise on NANOG, I would still suggest the original poster does a fair share of their own homework. :)

----- Original Message ----
From: Jim Wise <jwise at draga.com>
To: gb10hkzo-nanog at yahoo.co.uk
Cc: nanog at nanog.org
Sent: Wednesday, 3 June, 2009 15:42:24
Subject: Re: Facility wide DR/Continuity

gb10hkzo-nanog at yahoo.co.uk writes:

> On the subject of DNS GSLB, there's a fairly well known article on the
> subject that anyone considering implementing it should read at least
> once.... :)
> http://www.tenereillo.com/GSLBPageOfShame.htm
> and part 2
> http://www.tenereillo.com/GSLBPageOfShameII.htm
> Yes it was written in 2004.  But all the "food for thought" that it
> provides is still very much applicable today.

One thing I've noticed about this paper in the past that kind of bugs me
is that in arguing that multiple A records are a better solution than a
single GSLB-managed A record, the paper assumes that browsers and other
common internet clients will actually cache multiple A records, and fail
between them if the earlier A records fail.  The first of the two
pages explicitly touts this as a high availability solution.

However, I haven't observed this behavior from browsers, media players,
and similar programs `in the wild' -- as far as I've been able to tell,
most client software picks an A record from those returned (possibly,
but not usually, skipping those found to be unreachable), and then holds
onto that choice of IP address until the record times out of cache, and
a new request is made.

Have I been unlucky in my observations?  Are there client programs which
do failover between multiple A records returned for a single name --
presumably sticking with one IP for session-affinity purposes until a
failure is detected?
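
For concreteness, here is a minimal sketch of what such client-side
failover could look like, assuming a plain TCP client written in Python.
The helper names (resolve_ipv4, connect_with_failover) and the connect
parameter are hypothetical, introduced only for illustration; this is
not a description of how any particular browser or media player behaves.

```python
import socket


def resolve_ipv4(hostname, port):
    """Return the distinct IPv4 addresses the resolver gives for hostname,
    in the order the resolver returned them."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def connect_with_failover(addresses, port, timeout=3.0,
                          connect=socket.create_connection):
    """Try each address in turn and return (ip, connection) for the first
    one that accepts.  `connect` is a hypothetical hook so the loop can be
    exercised without a network; by default it is the standard library's
    socket.create_connection."""
    errors = []
    for ip in addresses:
        try:
            return ip, connect((ip, port), timeout)
        except OSError as exc:
            # This A record looks dead; remember why and move to the next.
            errors.append((ip, exc))
    raise ConnectionError(f"all addresses failed: {errors}")
```

A caller would do something like
connect_with_failover(resolve_ipv4("www.example.com", 80), 80) and keep
using the returned IP for the rest of the session, only walking the list
again once that address stops answering.  Note that Python's own
socket.create_connection, when handed a hostname rather than an IP,
already loops over the addresses returned by getaddrinfo in this way, so
at least some client libraries do fail over at connect time.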

If clients do not behave this way, then the paper's observations about
GSLB for HA purposes don't seem to hold -- though in my limited
experience the paper's other point (that geographic dispatch is Hard)
seems much more accurate (making GSLB a better HA solution than it is a
load-sharing solution, again, at least in my experience).

Or am I missing something?

                Jim Wise
                jwise at draga.com

