BGP offloading (fixing legacy router BGP scalability issues)
Frederik Kriewitz
frederik at kriewitz.eu
Thu Apr 9 11:24:28 UTC 2015
Thank you very much for all your responses.
First of all, the problems we see really are RIB (processor memory)
and CPU related.
The TCAM/FIB limits are properly configured. From a FIB capacity
point of view they should last a couple more years. Software routing
isn't causing the problem either.
The most extreme case of Cisco 6500/SUP720 abuse I'm aware of is a
setup with 4 full-table transit connections + 2 RR sessions + ~20
peerings, no downstreams. Besides the IPv4 and IPv6 peerings it's
handling little more than a small amount of OSPF and MPLS (<5k
prefixes, ~500 routers). No netflow or any other memory hog. Under
normal conditions it runs at 20% CPU and 90% processor memory
(1 GB, SUP720 XL).
When a session with a lot of prefixes (e.g. a transit) fails, it
takes up to 5 minutes for the BGP Router process to recompute the RIB,
etc. During that time it's running at 100% CPU. Low-priority
processes are completely starved (e.g. SNMP-based monitoring stops
working). Occasionally it even drops OSPF neighbours or other BGP
sessions due to expired hold timers, causing further havoc.
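To see why a multi-minute stall is enough to take down adjacencies, compare the stall with the neighbours' timers. The sketch below assumes common defaults (BGP hold time 180 s, OSPF dead interval 40 s); the actual timers on this box aren't stated, and it models the worst case in which no keepalives or hellos are exchanged for the whole stall.

```python
# Assumed timers: common defaults, NOT the configuration from this post.
HOLD_TIMERS_S: dict[str, float] = {
    "BGP (hold time)": 180,
    "OSPF (dead interval)": 40,
}


def sessions_at_risk(stall_s: float, timers: dict[str, float]) -> list[str]:
    """Return the protocols whose hold/dead timer is shorter than a stall.

    Worst-case model: no keepalive or hello is sent or processed during the
    stall, so a session expires once the stall outlasts its timer.
    """
    return sorted(name for name, timer in timers.items() if stall_s > timer)


# A 5-minute (300 s) recomputation outlasts both default timers.
print(sessions_at_risk(300, HOLD_TIMERS_S))  # ['BGP (hold time)', 'OSPF (dead interval)']
```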
I had a look at David Barroso's SDN Internet Router project. While
it's definitely a very interesting project, it focuses on FIB
limitations; in our case the RIB is the problem.
Using netflow and traffic stats as an additional metric is something
I'm missing from today's routers too (not to work around FIB limits,
but to allow more intelligent load balancing and to avoid congested
ports).
Applying a /22 filter was suggested. In order to actually save RIB
memory we would have to disable soft-reconfiguration on the
corresponding sessions.
I don't like that option for various reasons: it trades lower memory
usage for longer convergence times and a significantly bigger impact
of route map updates.
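A simplified model of why the /22 filter alone doesn't save memory: with soft-reconfiguration inbound the router retains every received path so that inbound policy can be re-applied without asking the peer to resend, so the prefixes the filter rejects still occupy memory. The function names and the per-session path counting are illustrative assumptions, not router internals; extra copies for paths modified by policy are ignored.

```python
import ipaddress

MAX_PREFIXLEN = 22  # the suggested "/22 filter": accept /22 and shorter


def accept(prefix: str, max_len: int = MAX_PREFIXLEN) -> bool:
    """True if the prefix is no more specific than max_len."""
    return ipaddress.ip_network(prefix).prefixlen <= max_len


def stored_paths(received: list[str], soft_reconfig_inbound: bool) -> int:
    """Rough count of paths a session keeps in memory (simplified model).

    With soft-reconfiguration inbound, every received path is retained,
    including the ones the filter rejects. Without it, only accepted paths
    are stored.
    """
    if soft_reconfig_inbound:
        return len(received)
    return sum(accept(p) for p in received)


received = ["192.0.2.0/24", "198.18.0.0/15", "203.0.113.0/24", "10.0.0.0/8"]
print(stored_paths(received, soft_reconfig_inbound=True))   # 4
print(stored_paths(received, soft_reconfig_inbound=False))  # 2
```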
Due to IPv4 exhaustion we expect to see more small prefixes in the
future that can't be aggregated (considering the AS path). Simply
dropping them would result in less optimal routing.
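Why differing AS paths block aggregation can be shown with Python's `ipaddress.collapse_addresses`: adjacent prefixes are merged only if they share an identical AS path. This is a deliberately simplified sketch; real aggregation decisions also involve the next hop and other attributes, and the routes and documentation ASNs below are made up.

```python
import ipaddress
from collections import defaultdict


def aggregate_by_as_path(routes):
    """Collapse adjacent prefixes, but only within groups sharing an AS path.

    routes: iterable of (prefix, as_path) pairs, as_path being a tuple of ASNs.
    Returns {as_path: [collapsed networks]}.
    """
    groups = defaultdict(list)
    for prefix, as_path in routes:
        groups[as_path].append(ipaddress.ip_network(prefix))
    return {path: list(ipaddress.collapse_addresses(nets))
            for path, nets in groups.items()}


routes = [
    ("198.18.0.0/24", (64500, 64510)),
    ("198.18.1.0/24", (64500, 64510)),  # same path: merges into 198.18.0.0/23
    ("198.18.2.0/24", (64500, 64511)),
    ("198.18.3.0/24", (64501, 64511)),  # different path: stays a separate /24
]
for path, nets in aggregate_by_as_path(routes).items():
    print(path, [str(n) for n in nets])
```

Four /24s shrink to three entries; the last two stay separate although they are adjacent.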
Having a hardware router carry just a small subset of routes to handle
most of the traffic, and send the remaining traffic via a default route
to a software router with a full table, is a different approach to FIB
limits. It shares problems similar to those mentioned in the original
post (how to make two routers appear as one, ...).
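A minimal sketch of that split: install only the busiest prefixes (ranked by something like netflow byte counts) in the hardware router's small FIB, plus a default route towards the full-table software router, and resolve destinations by longest-prefix match. The traffic figures, the next-hop labels `"hardware"` and `"sw-router"`, and the function names are hypothetical.

```python
import ipaddress


def build_small_fib(traffic, max_entries, sw_router):
    """Select the busiest prefixes for a size-limited hardware FIB.

    traffic: {prefix: observed traffic, e.g. from netflow}  (hypothetical input)
    max_entries (>= 1) includes one slot reserved for a default route towards
    the full-table software router. Returns {network: next_hop}, where
    "hardware" stands for "forwarded by the hardware router itself".
    """
    busiest = sorted(traffic, key=traffic.get, reverse=True)[: max_entries - 1]
    fib = {ipaddress.ip_network(p): "hardware" for p in busiest}
    fib[ipaddress.ip_network("0.0.0.0/0")] = sw_router
    return fib


def lookup(fib, dst):
    """Longest-prefix match: the most specific covering route wins."""
    addr = ipaddress.ip_address(dst)
    best = max((net for net in fib if addr in net), key=lambda net: net.prefixlen)
    return fib[best]


traffic = {"198.18.0.0/24": 900, "198.18.1.0/24": 50, "203.0.113.0/24": 400}
fib = build_small_fib(traffic, max_entries=3, sw_router="sw-router")
print(lookup(fib, "198.18.0.10"))  # hardware
print(lookup(fib, "198.18.1.10"))  # sw-router (not in the small FIB)
```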
On the edge towards the end customers we already make heavy use of
Linux routers based on standard servers. While we would love to
replace all hardware routers with feature-rich software routers, we
still consider hardware routers necessary at the internet-facing edge
in order to be able to mitigate certain (D)DoS attacks.
Dropping entire ASs is not an option as already discussed here.
Another suggestion was to use OpenFlow PacketIn/Out messages to
inject/extract the BGP packets. That would probably be a nice way to
do it, but unfortunately legacy routers typically don't support
OpenFlow. The Cisco 6500/SUP720 is no exception.
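Whatever the transport, the match that decides which packets get handed to an external BGP speaker is simple: TCP with source or destination port 179. The sketch below applies that match to a raw IPv4 packet in plain Python (IPv6 is not handled); it is not an OpenFlow controller API, and the function name is hypothetical.

```python
import struct

BGP_PORT = 179
IPPROTO_TCP = 6


def is_bgp(packet: bytes) -> bool:
    """True if a raw IPv4 packet carries TCP to or from port 179 (BGP)."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return False                        # too short, or not IPv4
    ihl = (packet[0] & 0x0F) * 4            # IPv4 header length in bytes
    frag_offset = ((packet[6] & 0x1F) << 8) | packet[7]
    if ihl < 20 or packet[9] != IPPROTO_TCP or frag_offset != 0:
        return False                        # bad header, not TCP, or a non-first fragment
    if len(packet) < ihl + 4:
        return False                        # TCP ports truncated
    src_port, dst_port = struct.unpack("!HH", packet[ihl:ihl + 4])
    return BGP_PORT in (src_port, dst_port)


# Minimal IPv4 header (no options, protocol TCP) followed by ports 51000 -> 179.
ip_header = bytes([0x45, 0, 0, 24, 0, 0, 0, 0, 64, IPPROTO_TCP]) + bytes(10)
print(is_bgp(ip_header + struct.pack("!HH", 51000, BGP_PORT)))  # True
```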
I'll probably set up a small test environment to see whether this
actually works as expected.
Best Regards,
Freddy
More information about the NANOG mailing list