Server Redundancy
Joe Abley
jabley at isc.org
Thu Aug 7 16:35:35 UTC 2003
On Thursday, 7 August 2003, at 07:28AM, Rob Pickering wrote:
> Then you've just got your BGP convergence time and unequal load
> balancing effects to worry about.
>
> Whilst I'm not knocking Paul's solution in an application like running
> a root NS for which it is perfect, I'm not so sure it's necessarily
> best for every kind of service load balancing.
We're using the technique Paul used in local clusters with OSPF; the
convergence time in a single OSPF area which contains only a small
number of servers and a couple of routers is pretty small.
There's no BGP convergence issue in this application (there's no BGP
within the server cluster).
We're using another anycast technique in the wide area, using BGP to
advertise covering supernets for services which are offered
autonomously in multiple locations. BGP is involved in this one, but we
are mitigating the potential for flap damage or transient convergence
loops by offering service from remote nodes to a local community only,
and not the whole Internet (i.e. the service supernet is offered as a
peering route, with restricted propagation, and not for global
transit).
The general approach we're taking with the wide-area, global service
distribution technique is described here:
http://www.isc.org/tn/isc-tn-2003-1.html
http://www.isc.org/tn/isc-tn-2003-1.txt
> I've used both the route hack based and commercial NAT load balancers,
> and they both have their place.
It's not really that much of a hack; it's just anycast over an IGP
coupled with routers which can populate the FIB with multiple
equal-cost routes with different next-hops, with some manner of flow
hash to keep traffic from a single session pointing at the same
server.
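As a rough illustration of the flow hash idea (a sketch only, not how any
particular router implements it; the function name, the addresses and the
choice of SHA-256 are all assumptions made for the example), hashing the
5-tuple and taking it modulo the number of equal-cost next-hops keeps every
packet of a session on the same server:

```python
import hashlib


def pick_next_hop(src_ip, src_port, dst_ip, dst_port, proto, next_hops):
    """Map a flow's 5-tuple onto one of several equal-cost next-hops.

    Hypothetical helper for illustration: real routers do this in the
    forwarding plane with their own (usually much cheaper) hash.
    """
    if not next_hops:
        raise ValueError("no next-hops available")
    # Same 5-tuple -> same key -> same digest -> same next-hop.
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


# Three servers all announcing the same anycast service address
# (addresses are made up for the example).
servers = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]
flow = ("192.0.2.7", 40312, "198.51.100.53", 53, "udp")

first = pick_next_hop(*flow, servers)
second = pick_next_hop(*flow, servers)
print(first == second)  # every packet of the flow picks the same server
```

Note that with a plain modulo like this, withdrawing one server's route
changes the next-hop count and so can move existing flows to a different
server; how much that matters depends on how long-lived the sessions are.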
> If you are running complex web services (think expensive per server sw
> licences etc) then the investment in a pair of redundant load
> balancers for the front end to give more consistent performance under
> load as well as resilience can look very sane indeed.
I've deployed services behind Foundry
layer-4/layer-7/content/SLB/buzzword-du-jour switches before, and they
worked very well; from the brief time I spent with them, they seemed
well-designed and feature rich.
However, the Foundry switches still suffered from the (near) single
point of failure problem. It only takes one person to mess up the
switch config whilst modifying a service or adding a new one, or a
firmware upgrade that goes bad, and you lose all your services at once.
As Paul mentioned, the advantage of using local-scope anycast with an
IGP to build a cluster is that there are no additional components, and
hence no additional points of failure.
Joe