Data Center testing
warren at kumari.net
Wed Aug 26 16:39:57 CDT 2009
On Aug 24, 2009, at 9:38 AM, Dan Snyder wrote:
> We have done power tests before and had no problem. I guess I am
> looking for someone who does testing of the network equipment outside of
> just power tests. We had an outage due to a configuration mistake that
> became apparent when a switch failed.
So, one of the better ways to make sure that your failover system is
working when you need it is just to do away with the concept of a
failover system and make your "failover" system part of your primary system.
This means that your failover system is always passing traffic and you
know that it is alive and well -- it also helps mitigate the pain when
a device fails (you are sharing the load over both systems and so only
half as much traffic gets disrupted). Scheduled maintenance is also
simpler and less stressful as you already know that your other path is
alive and well.
Your design and use case dictates how exactly you implement this, but
in general it involves things like tuning your IGP so you are using
all your links, staggering VLANs if you rely on them, multiple VRRP
groups per subnet, etc.
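As a rough illustration of "staggering VLANs" and "multiple VRRP groups per subnet", here is a small Python sketch that plans VRRP priorities so each router is the intended master for about half of the groups. The router names, VLAN IDs, and priority values (110 for the intended master, 100 for the backup) are hypothetical placeholders, not taken from any particular vendor's configuration.

```python
from dataclasses import dataclass

# Hypothetical router names; substitute your own.
ROUTERS = ("rtr-a", "rtr-b")
MASTER_PRIORITY = 110  # assumed value: the higher VRRP priority wins mastership
BACKUP_PRIORITY = 100  # assumed value: the common VRRP default priority


@dataclass(frozen=True)
class VrrpGroup:
    vlan: int
    group_id: int
    priorities: dict  # router name -> VRRP priority

    @property
    def master(self) -> str:
        # The router with the highest priority is the intended master.
        return max(self.priorities, key=self.priorities.get)


def plan_groups(vlans, groups_per_vlan=2):
    """Alternate the intended master across every group on every VLAN.

    With groups_per_vlan=1 this staggers whole VLANs between routers;
    with groups_per_vlan=2 each subnet gets one group mastered by each router.
    """
    plan = []
    index = 0
    for vlan in vlans:
        for group_id in range(1, groups_per_vlan + 1):
            master = ROUTERS[index % len(ROUTERS)]
            priorities = {
                r: (MASTER_PRIORITY if r == master else BACKUP_PRIORITY)
                for r in ROUTERS
            }
            plan.append(VrrpGroup(vlan, group_id, priorities))
            index += 1
    return plan


if __name__ == "__main__":
    for g in plan_groups([10, 20, 30]):
        print(f"VLAN {g.vlan} group {g.group_id}: master={g.master} {g.priorities}")
```

With two groups per subnet, the usual idea is to hand half the hosts one group's virtual IP as their default gateway and the other half the second group's, so both routers forward traffic in normal operation.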
This does require a tiny bit more planning during the design phase,
and also requires that you check every now and then to make sure that
you are actually using both devices (and didn't, for example, shift
traffic to one device and then forget to shift it back :-)).
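The "check every now and then" step can be scripted. The sketch below assumes you already collect per-device traffic counters over the same interval (for example, interface octet counters polled via SNMP) and flags the pair as lopsided when one device carries more than an assumed share of the combined traffic. The 75% threshold, device names, and sample numbers are hypothetical.

```python
def traffic_share(bytes_by_device):
    """Return each device's fraction of the combined traffic."""
    total = sum(bytes_by_device.values())
    if total == 0:
        raise ValueError("no traffic recorded on any device")
    return {dev: count / total for dev, count in bytes_by_device.items()}


def is_lopsided(bytes_by_device, max_share=0.75):
    """True if any single device carries more than max_share of the traffic.

    max_share is an assumed tuning knob: 0.5 would demand a perfect split,
    1.0 would never raise an alarm.
    """
    return any(share > max_share for share in traffic_share(bytes_by_device).values())


if __name__ == "__main__":
    # Hypothetical byte counts for two devices over the same polling interval.
    samples = {"rtr-a": 9_800_000_000, "rtr-b": 150_000_000}
    for dev, share in traffic_share(samples).items():
        print(f"{dev}: {share:.1%}")
    if is_lopsided(samples):
        print("WARNING: traffic is not being shared; was it shifted and never shifted back?")
```

A device that has been fully drained shows a share of zero, which makes its partner's share 100% and trips the check.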
It also requires that you keep capacity issues in mind -- in a
primary and failover scenario you might be able to run devices fairly
close to capacity, but if you are sharing the load you need to keep
things under 50% (so when you *do* have a failure the remaining device
can handle the full load) -- it's important to make this clear to the
finance folks before going down this path :-)
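The 50% rule is simply the requirement that the surviving device can absorb the whole load. A minimal sketch of that arithmetic follows. It assumes equal-capacity devices and perfect redistribution of load after a failure, and (as an extension beyond the two-device case discussed here) generalizes to N devices: the combined load must fit on the N-1 survivors, so an even split leaves each device a ceiling of (N-1)/N. Utilization figures in the example are hypothetical.

```python
def max_safe_utilization(devices: int) -> float:
    """Per-device utilization ceiling so that losing any one device still fits.

    Assumes load is split evenly and every device has the same capacity.
    """
    if devices < 2:
        raise ValueError("need at least two devices to survive a failure")
    return (devices - 1) / devices


def survives_single_failure(utilizations) -> bool:
    """True if the total load fits on the devices left after any one fails.

    utilizations are fractions of one device's capacity (0.4 means 40%),
    with equal-capacity devices assumed.
    """
    n = len(utilizations)
    if n < 2:
        return False
    # After one failure, (n - 1) devices' worth of capacity remains.
    return sum(utilizations) <= n - 1


if __name__ == "__main__":
    print(max_safe_utilization(2))               # two devices sharing load
    print(survives_single_failure([0.9, 0.0]))   # primary/failover, primary near capacity
    print(survives_single_failure([0.6, 0.6]))   # load sharing, each device over 50%
```

The primary/failover case passes even at 90% because the idle standby contributes nothing to the total, while two load-sharing devices at 60% each would need 120% of one device after a failure.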
> It didn't cause a problem, however, when we did a power
> test for the whole data center.
> On Mon, Aug 24, 2009 at 9:31 AM, Ken Gilmour <ken.gilmour at gmail.com> wrote:
>> I know Peer1 in Vancouver regularly send out notifications of
>> "non-impacting" generator load testing, like monthly. Also InterXion
>> in Dublin, Ireland have occasionally sent me notification that there
>> was a power outage of less than a minute however their backup
>> successfully took the load.
>> I only remember one complete outage in Peer1 a few years ago... Never
>> seen any outage in InterXion Dublin.
>> Also I don't ever remember any power failure at AiNet (Deepak will
>> probably elaborate)
>> 2009/8/24 Dan Snyder <sliplever at gmail.com>:
>>> Does anyone know of any data centers that do failure testing of
>>> networking equipment regularly? I mean to verify that everything
>>> fails over properly after changes have been made over time. Are
>>> there any best-practice guides for doing this?
"Does Emacs have the Buddha nature? Why not? It has bloody well