Anycast applicable to Radius Server Farm?

Joe Maimon jmaimon at ttec.com
Mon May 8 16:18:14 UTC 2006




Joe Shen wrote:
>>Can you indicate in more detail what the problems
>>were with the L4  
>>switch?
> 
> 
> We separate our Radius servers into two farms, each
> with an L4 switch in front. To our understanding, the
> Radius authentication and accounting info of a
> PPPoE session should be processed by the same Radius
> server.

I don't think that's true. If the auth radius server fails to respond, 
authentication and accounting will then go to the next configured server.

> So, although the L4 switch provides a single IP
> for BRAS configuration, each BRAS is assigned a real
> server IP in the L4 switch. So, here comes the problem:
> 
> 1) Load is not balanced automatically but by human
> estimation; there is a server whose load is twice that
> of some other server.
> 

See if you can extract load from the radius server using snmp or 
something and make your L4 switch utilize that.
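
As a rough sketch of that idea, assume each server's load has already been polled (over SNMP or otherwise) as a busy fraction between 0 and 1. Balancer weights could then be derived from the remaining headroom. The function name, the 0-to-1 load scale and the max_weight parameter are illustrative assumptions; how the weights get pushed to the L4 switch is vendor-specific and not shown.

```python
def weights_from_load(load_by_server, max_weight=10):
    """Map per-server load (fraction busy, 0.0-1.0) to integer balancer weights."""
    weights = {}
    for server, load in load_by_server.items():
        # Clamp out-of-range readings so one bad poll cannot produce a negative weight.
        load = min(max(load, 0.0), 1.0)
        headroom = 1.0 - load
        # Keep a floor of 1 so a busy server still receives some traffic.
        weights[server] = max(1, round(headroom * max_weight))
    return weights


if __name__ == "__main__":
    # Hypothetical readings: the first server is twice as loaded as the second.
    print(weights_from_load({"radius1": 0.8, "radius2": 0.4}))
```

A server reporting twice the load of its neighbour ends up with a proportionally smaller weight, which is the correction that manual estimation was missing.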

> 2) The L4 switch becomes a bottleneck for service
> availability. In past years, the L4 switch caused several
> service failures. Just last Friday, the L4 switch
> did not respond to any network packets while its
> ethernet interface seemed OK.
> 

Add a couple of the actual servers' IPs to the aaa servers the NASes use.

> 3) As the L4 switch is the only entrance to a single
> server farm, a DoS attack or some kind of software
> bug will surely degrade the security level. A farm
> using ECMP, by contrast, relies on the server group
> to resist DoS attacks.

Your firewalls should be protecting your radius servers from DoS -- 
unless you really expect the world to communicate with them. Spoofed 
sources however could be hard to protect against.

> 
> 4) Maintenance is a little bit costly. Any maintenance,
> whether on a radius server or on the L4 switch, needs a
> scheduled time window.
> 
> 5) Service protection is hard (the 'cascade' case you
> mentioned). As there are two server farms, if one
> farm fails it takes ten or more minutes to migrate
> its Radius traffic to the other farm. This is
> unacceptable.
> 

Let the NAS do it. They fail over much faster than that.

Whatever you choose, try to combine the ability of the NAS to fail over 
between radius servers into your redundancy plan.
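
To make the failover behaviour concrete, here is a minimal sketch of the kind of ordered retry a NAS performs across its configured server list. It is not any vendor's implementation: the function name, the retries parameter and the send callback (assumed to return None on timeout) are all hypothetical, and the addresses are documentation examples.

```python
def send_with_failover(servers, send, retries=2):
    """Try each server in configured order; return (server, reply) from the first to answer.

    `send` is a hypothetical transport callback that returns None on timeout.
    """
    for server in servers:
        for _ in range(retries):
            reply = send(server)
            if reply is not None:
                return server, reply
    raise TimeoutError("no RADIUS server responded")


if __name__ == "__main__":
    # L4 switch virtual IP first, then real server IPs as fallbacks.
    configured = ["192.0.2.1", "192.0.2.11", "192.0.2.12"]

    def fake_send(server):
        # Simulate the L4 switch being unresponsive.
        return None if server == "192.0.2.1" else "Access-Accept"

    print(send_with_failover(configured, fake_send))
```

With the virtual IP listed first and a couple of real server IPs after it, a dead L4 switch costs only the retry timeouts on the first entry before requests reach a real server.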


