Data Center testing

Warren Kumari warren at kumari.net
Wed Aug 26 21:39:57 UTC 2009


On Aug 24, 2009, at 9:38 AM, Dan Snyder wrote:

> We have done power tests before and had no problem.  I guess I am looking
> for someone who does testing of the network equipment outside of just power
> tests.  We had an outage due to a configuration mistake that became apparent
> when a switch failed.

So, one of the better ways to make sure that your failover system is
working when you need it is to do away with the concept of a failover
system and make your "failover" system part of your "primary" system.
This means that your failover system is always passing traffic and you
know that it is alive and well. It also helps mitigate the pain when
a device fails (you are sharing the load over both systems, so only
about half as much traffic gets disrupted). Scheduled maintenance is
also simpler and less stressful, as you already know that your other
path is alive and well.

Your design and use case dictate exactly how you implement this, but
in general it involves things like tuning your IGP so you are using
all your links, staggering VLANs if you rely on them, multiple VRRP
groups per subnet, etc.
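
As a rough illustration of the VRRP part, here is a minimal Python
sketch that alternates which router gets the higher priority for each
group, so that each router ends up master for about half of them. The
router names, group names, and priority values (110 and 90) are made up
for the example; this only produces a plan and does not touch any
device.

```python
from collections import Counter

def stagger_vrrp(groups, routers=("rtr-a", "rtr-b"), high=110, low=90):
    """Give each group a master by walking the router list round-robin.

    Returns {group: {router: priority}}; the router with the higher
    priority is the intended master for that group.
    """
    plan = {}
    for i, group in enumerate(groups):
        master = routers[i % len(routers)]
        plan[group] = {r: (high if r == master else low) for r in routers}
    return plan

def masters(plan):
    """Count how many groups each router is the intended master for."""
    return Counter(max(prios, key=prios.get) for prios in plan.values())

# Hypothetical group names, purely for illustration.
plan = stagger_vrrp(["vlan10", "vlan20", "vlan30", "vlan40"])
for group, prios in plan.items():
    print(group, prios)
print(masters(plan))
```

The same round-robin idea applies whether a "group" is one per VLAN or
one of several groups on the same subnet.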

This does require a tiny bit more planning during the design phase,
and also requires that you check every now and then to make sure that
you are actually using both devices (and didn't, for example, shift
traffic to one device and then forget to shift it back :-)).
It also requires that you keep capacity issues in mind -- in a
primary and failover scenario you might be able to run devices fairly
close to capacity, but if you are sharing the load across two devices
you need to keep each one under 50% (so when you *do* have a failure
the remaining device can handle the full load). It's important to make
this clear to the finance folks before going down this path :-)
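
The two checks above (is everything actually carrying traffic, and can
the survivors carry the full load) can be sketched in a few lines of
Python. This is only a sketch: it assumes identical devices, that load
redistributes evenly after a failure, and that you already have
per-device load figures from somewhere. The device names, the numbers,
and the 10% "looks idle" threshold are all made up for the example.

```python
def survives_single_failure(loads, capacity):
    """True if the remaining devices can carry the total load after one fails.

    loads: per-device load, all in the same unit as capacity.
    With two devices this reduces to "total load <= one device's
    capacity", i.e. each device averaging under 50%.
    """
    n = len(loads)
    if n < 2:
        return False  # nothing left to fail over to
    return sum(loads) <= capacity * (n - 1)

def idle_devices(loads, min_share=0.10):
    """Return devices carrying less than min_share of the total load."""
    total = sum(loads.values())
    if total == 0:
        return sorted(loads)
    return sorted(name for name, load in loads.items()
                  if load / total < min_share)

# Hypothetical figures in Gbps, with 10 Gbps devices.
print(survives_single_failure([4.0, 3.5], capacity=10.0))
print(survives_single_failure([6.0, 5.0], capacity=10.0))
print(idle_devices({"sw1": 7.0, "sw2": 0.2}))
```

The second function is meant to catch the "shifted traffic and forgot to
shift it back" case, where one device is technically up but carrying
almost nothing.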

W

>  It didn't cause a problem however when we did a power
> test for the whole data center.
>
> -Dan
>
>
> On Mon, Aug 24, 2009 at 9:31 AM, Ken Gilmour <ken.gilmour at gmail.com> wrote:
>
>> I know Peer1 in Vancouver regularly send out notifications of
>> "non-impacting" generator load testing, like monthly. Also InterXion
>> in Dublin, Ireland have occasionally sent me notification that there
>> was a power outage of less than a minute however their backup
>> successfully took the load.
>>
>> I only remember one complete outage in Peer1 a few years ago... Never
>> seen any outage in InterXion Dublin.
>>
>> Also I don't ever remember any power failure at AiNet (Deepak will
>> probably elaborate)
>>
>> 2009/8/24 Dan Snyder <sliplever at gmail.com>:
>>> Does anyone know of any data centers that do failure testing of their
>>> networking equipment regularly? I mean to verify that everything fails
>>> over properly after changes have been made over time.  Are there any
>>> best practice guides for doing this?
>>>
>>> Thanks,
>>> Dan
>>>
>>

-- 
"Does Emacs have the Buddha nature? Why not? It has bloody well  
everything else!"





