Dampening considered harmful? (Was: Re: verizon.net and other email grief)

Jared Mauch jared at puck.nether.net
Fri Dec 17 04:52:54 UTC 2004


On Thu, Dec 16, 2004 at 11:43:12PM -0500, Jared Mauch wrote:
> On Fri, Dec 17, 2004 at 12:42:21AM +0100, Iljitsch van Beijnum wrote:
> > 
> > On 17-dec-04, at 0:21, Jerry Pasker wrote:
> > 
> > >>	ie: does dampening cause more problems than it tries to solve/avoid
> > >>these days.
> > 
> > >I don't know what takes more router resources;  dampening enabled 
> > >doing the dampening calculations, or no dampening and constantly 
> > >churning the BGP table.  I would assume dampening generally saves 
> > >router resources, or operators wouldn't choose to enable it.
> > 
> > One reason to be careful with dampening is that flaps can be 
> > multiplied. (Connect to routeviews and see the different flap counts 
> > under different peers for the same flap at your end to observe this.)
> 
> 	There have been numerous people who have spoken and released
> research on this topic.
> 
> 	I think with the "better" routing code out there these
> days, most people's hw/sw can quickly handle a large number
> of next-hop changes, etc., so disabling dampening
> would allow the networks to reconverge fairly quickly without (much)
> trouble.  (going to respond to the streaming video/audio/whatnot
> issue separately).

	oops, i thought their reply was cc:ed to the list, but i
guess not.  i'm going to summarize a private reply i got:

	a streaming video provider saw that when dampening was
enabled, it reduced the number of complaints related to rebuffering
and other similar issues by 15%.

	I think that, for a more stable/reliable
network, switchover to a forwarding path should happen
at a more moderate rate.  Obviously, if a prefix is flapping
badly via one connection, it should be less preferred, but if
it's the only path to that prefix, what harm is caused by
reinstalling it fairly quickly?  (eg: after 60 seconds of stability
instead of a lot longer).
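	To put rough numbers on "a lot longer": the sketch below assumes
the usual exponential-decay dampening model and commonly cited vendor
default values (15 minute half-life, reuse threshold of 750, penalty of
1000 per flap).  None of those values come from this thread, and the
function name is made up for illustration.

```python
import math

def seconds_until_reuse(penalty, reuse_limit=750.0, half_life_s=900.0):
    """Time for an exponentially decaying penalty to fall to reuse_limit.

    All defaults are assumed values for illustration, not taken from
    any particular router configuration.
    """
    if penalty <= reuse_limit:
        return 0.0  # already at or below the reuse threshold
    # penalty * 0.5 ** (t / half_life_s) == reuse_limit, solved for t
    return half_life_s * math.log2(penalty / reuse_limit)

# Three flaps in quick succession leave a penalty of roughly 3000.
print(seconds_until_reuse(3000.0))  # 1800.0 seconds, i.e. 30 minutes
```

	Under those assumptions a prefix that flapped three times stays
suppressed for about 30 minutes after it stabilizes, compared with the
60 seconds suggested above.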

	This would possibly create a few transient cases of
lack of reachability that would be a bit harder to diagnose if
max dampening timers were a lot lower and a route was still oscillating.

	In the case of lack of reachability to an anycast prefix
(i know that .org and ultradns were being picked on, so i'll use
them as an example.. sorry Rodney :) ) having the shorter times
would increase availability.  Obviously the ideal case is that you
would exempt these prefixes from dampening, but creating such
policies may actually be more work for the router in the long term
than just disabling dampening totally.

	you might see more transient cpu usage as the network goes
flappity-flap, but not having to evaluate a more complex
dampening policy might yield some savings as well.  (considering
the cpu time spent to evaluate it, allocate memory to
store state, count flaps, etc.)
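	For a feel of the bookkeeping involved, here is a minimal sketch
of per-(peer, prefix) dampening state.  It is not how any particular
router implements it; the class names, thresholds and half-life are
assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class DampState:
    penalty: float = 0.0
    last_update: float = 0.0
    flaps: int = 0
    suppressed: bool = False

class Dampener:
    """Toy flap dampener; every default here is an assumed value."""

    def __init__(self, half_life_s=900.0, flap_penalty=1000.0,
                 suppress=2000.0, reuse=750.0):
        self.half_life_s = half_life_s
        self.flap_penalty = flap_penalty
        self.suppress = suppress
        self.reuse = reuse
        self.state = {}  # (peer, prefix) -> DampState, one entry per path

    def _decay(self, st, now):
        # Exponential decay of the penalty since the last update.
        elapsed = now - st.last_update
        st.penalty *= 0.5 ** (elapsed / self.half_life_s)
        st.last_update = now
        if st.suppressed and st.penalty < self.reuse:
            st.suppressed = False

    def flap(self, peer, prefix, now):
        """Record one flap; return True if the path is now suppressed."""
        st = self.state.setdefault((peer, prefix), DampState(last_update=now))
        self._decay(st, now)
        st.penalty += self.flap_penalty
        st.flaps += 1
        if st.penalty > self.suppress:
            st.suppressed = True
        return st.suppressed

    def is_suppressed(self, peer, prefix, now):
        st = self.state.get((peer, prefix))
        if st is None:
            return False
        self._decay(st, now)
        return st.suppressed
```

	Every flapping path costs one state entry plus a decay
calculation on each update or lookup, which is the memory and cpu
overhead being weighed against the churn of not dampening at all.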

	- jared

-- 
Jared Mauch  | pgp key available via finger from jared at puck.nether.net
clue++;      | http://puck.nether.net/~jared/  My statements are only mine.


