BFD for routes learned trough Route-servers in IXPs

Baldur Norddahl baldur.norddahl at gmail.com
Sun Sep 20 20:32:42 UTC 2020


Hello

ARP timeout should be lower than MAC timeout, but usually the default is
the other way around. Which is extremely stupid. To those who do not know
why, let me give a simple example:

Router R1 is connected to switch SW1 with a connection to server SRV: R1
<-> SW1 <-> SRV
Router R2 is connected to switch SW2 with a connection to server SRV: R2
<-> SW2 <-> SRV

The server is using R1 as default gateway. Traffic is arriving from the
internet through R2 towards the server. The server will however send
replies back through the default gateway at R1. This is a usual case with
redundant routers - only one will be used as a default gateway but traffic
may come from both.

Initially all will be good. But SW2 is only seeing unidirectional traffic
from R2. No traffic goes from SRV to R2 and thus, after some time, SW2 will
expire the MAC learning for SRV. This has the unfortunate result that SW2
will start flooding traffic to SRV out through all ports.

Then after more time has passed, R2 will renew the ARP binding by sending
out an ARP query to SRV. The server will send back an ARP reply to R2. This
packet from SRV to R2 will pass SW2 and thus have the effect of renewing
the MAC binding at SW2 too. The flooding stops and all is well again. Until
the MAC binding expires and the story repeats.

If the MAC timeout is 5 minutes and the ARP timeout is 20 minutes, which is
very usual, you will have flooding for 15 minutes out of every 20 minutes
interval! Stupid!

Why have vendors not fixed their defaults for this case?

Regards,

Baldur



On Thu, Sep 17, 2020 at 7:51 AM Saku Ytti <saku at ytti.fi> wrote:

> On Wed, 16 Sep 2020 at 23:15, Chriztoffer Hansen
> <chriztoffer.hansen at de-cix.net> wrote:
> > On 16/09/2020 04:01, Ryan Hamel wrote:
>
> > > CoPP is always important, and it's not just Mikrotik's with default low
> > > ARP timeouts.
> > >
> > > Linux - 1 minute
> > > Brocade - 10 minutes
> > > Cumulus  - 18 minutes
> > > BSD distros - 20 minutes
> > > Extreme - 20 minutes
> > Juniper - 20 minutes
> > > HP - 25 minutes
> IOS - 4 hours
>
> Why are these considered (by Ryan) low values? Does low have a
> negative connotation here?
>
> ARP timeout should be lower than MAC timeout, and MAC timeout usually
> is 300 seconds. Anything above 300seconds is probably poor BCP for
> default value, as defaults should interoperate in a somewhat sane
> manner.
> Of course operators are free to configure very high ARP timeout, as
> long as they also remember to equally configure higher MAC timeout.
>
> --
>   ++ytti
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.nanog.org/pipermail/nanog/attachments/20200920/8fe6bba6/attachment.html>


More information about the NANOG mailing list