scaling linux-based router hardware recommendations
edtardist at gmail.com
Wed Jan 28 00:05:58 UTC 2015
On Mon, Jan 26, 2015 at 8:53 PM, micah anderson <micah at riseup.net> wrote:
> I know that specially programmed ASICs on dedicated hardware like Cisco,
> Juniper, etc. are going to always outperform a general purpose server
> running gnu/linux, *bsd... but I find the idea of trying to use
> proprietary, NSA-backdoored devices difficult to accept, especially when
> I don't have the budget for it.
> I've noticed that even with a relatively modern system (Supermicro with
> a 4-core 1265LV2 CPU with a 9MB cache, Intel E1G44HTBLK server
> adapters, and 16GB of RAM), you still tend to get a high percentage of
> time spent on softirqs on all the CPUs when pps reaches somewhere
> around 60-70k and the traffic approaches 600-900 Mbit/s (during a
> DDoS, such hardware typically cannot cope).
> It seems like finding hardware more optimized for very high packet per
> second counts would be a good thing to do. I just have no idea what is
> out there that could meet these goals. I'm unsure if faster CPUs, or
> more CPUs is really the problem, or networking cards, or just plain old
> fashioned tuning.
> Any ideas or suggestions would be welcome!
This is a very interesting, yet obscure and rarely discussed, subject.
And the industry generally does not like this discussion to come up on
public lists like this one. If you can reach line-rate PPS throughput on
x86 for filtering or forwarding, how will they keep their high profit
margins and keep investors happy?
With that said, I am a very happy customer of two hardware vendors that are
not widely known, and a user of a technology that is well known but still
barely discussed.
I run FreeBSD, the so-called "silent workhorse", as a BGP router, and
FreeBSD (or pfSense) as a border firewall.
For hardware vendors, I am a very happy customer of:
- iXSystems (www.ixsystems.com)
- ServerU Inc. (www.serveru.us)
Both are BSD/Linux-focused hardware specialists, and both are very good
consultants and engineers.
I run a number of BGP and firewall boxes in GA, NY, FL and some other
East Coast locations, as well as in Belize, the BVI, the Bahamas and LATAM.
pfSense is my first choice, but sometimes I run vanilla FreeBSD,
especially in my core locations.
In one central location I have the following setup:
- 1x ServerU Netmap L800 box in bridge mode for core firewall protection
- 2x ServerU Netmap L800 boxes as BGP routers (redundant)
- Several Netmap L800, L100 and iXSystems servers (iXS for everything else,
since ServerU boxes are networking-centric only, not high-storage or
high-processing boxes)
In this setup I am running yet another little-known but very promising
technology called Netmap.
The Netmap firewall (called netmap-ipfw) was supplied by the ServerU
vendor. It's a slightly modified version of what you can download from the
public repository of Luigi Rizzo (the netmap author), with multithreading
based on the number of queues available in the ServerU igb(4) networking
cards.
What it does is, IMHO, amazing for x86 hardware: a line-rate firewall on a
1GbE port (1.3-1.4 Mpps) and a line-rate firewall on a 10GbE port
(12-14 Mpps) in a system with an 8-core 2.4 GHz Intel Rangeley CPU.
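As a rough sanity check on those figures, the theoretical Ethernet maximum for minimum-size (64-byte) frames can be computed by counting the 20 extra bytes each frame occupies on the wire (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap). The Python below is a minimal sketch of that arithmetic, not a measurement:

```python
# Per-frame wire overhead in bytes: preamble (7) + SFD (1) + inter-frame gap (12).
WIRE_OVERHEAD_BYTES = 20


def line_rate_pps(link_bps: float, frame_bytes: int = 64) -> float:
    """Maximum frames per second on an Ethernet link for a given frame size.

    frame_bytes counts the Ethernet frame from destination MAC through FCS,
    so 64 is the minimum legal size.
    """
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_per_frame


if __name__ == "__main__":
    for name, bps in [("1GbE", 1e9), ("10GbE", 10e9), ("40GbE", 40e9)]:
        print(f"{name}: {line_rate_pps(bps) / 1e6:.2f} Mpps at 64-byte frames")
```

This yields about 1.49 Mpps for 1GbE, 14.88 Mpps for 10GbE and 59.52 Mpps for 40GbE, so the 1.3-1.4 Mpps and 12-14 Mpps figures sit close to the ceiling for minimum-size frames.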
It's not Linux DNA. It's not PF_RING. It's not Intel DPDK.
It's netmap, and it's there, available in the FreeBSD base system, with a
number of utilities and reference code in Rizzo's repositories. It's there,
it's available and it's amazing.
This firewall has saved my sleep several times since November, dropping up
to 9 Mpps of amplified UDP/NTP traffic at peak DDoS attack rates.
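The actual netmap-ipfw rules are not shown here. Purely to illustrate the per-packet decision involved, here is a minimal Python sketch that flags IPv4/UDP packets from source port 123 (NTP) that are larger than a normal NTP reply. The function names and the 128-byte threshold are hypothetical choices for this example, not netmap-ipfw syntax or defaults:

```python
import struct

IPPROTO_UDP = 17
NTP_PORT = 123


def should_drop(ip_packet: bytes, max_ntp_len: int = 128) -> bool:
    """Return True for IPv4/UDP packets that look like NTP amplification.

    Hypothetical rule (not from netmap-ipfw): UDP from source port 123 whose
    total IPv4 length exceeds max_ntp_len. A plain NTP reply is 76 bytes
    (20-byte IP header + 8-byte UDP header + 48-byte NTP header).
    """
    if len(ip_packet) < 20:
        return False  # too short to hold an IPv4 header
    version_ihl = ip_packet[0]
    if version_ihl >> 4 != 4:
        return False  # not IPv4
    ihl = (version_ihl & 0x0F) * 4  # IPv4 header length in bytes
    total_len = struct.unpack("!H", ip_packet[2:4])[0]
    if ip_packet[9] != IPPROTO_UDP or len(ip_packet) < ihl + 8:
        return False  # not UDP, or UDP header missing
    src_port = struct.unpack("!H", ip_packet[ihl:ihl + 2])[0]
    return src_port == NTP_PORT and total_len > max_ntp_len


def make_udp_packet(src_port: int, payload_len: int) -> bytes:
    """Build a minimal IPv4/UDP packet (checksums left as zero) for testing."""
    total_len = 20 + 8 + payload_len
    ip_header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, total_len, 0, 0, 64, IPPROTO_UDP, 0,
        bytes([192, 0, 2, 1]), bytes([198, 51, 100, 1]),
    )
    udp_header = struct.pack("!HHHH", src_port, 4444, 8 + payload_len, 0)
    return ip_header + udp_header + bytes(payload_len)
```

For example, `should_drop(make_udp_packet(123, 440))` flags a 468-byte reply from port 123, while a 76-byte reply or a DNS packet passes. A firewall has to make a decision like this for every packet, which is why per-packet cost dominates during a flood.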
For the BGP box, I needed trunking, Q-in-Q and VLANs, and sadly none of
these is available in a netmap implementation right now.
This meant I had to keep my BGP router on the kernel path. It's funny to say
this, because Netmap usually bypasses the kernel path completely and works
directly on the NIC, reaching the backplane and bus limits.
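Q-in-Q (IEEE 802.1ad) stacks an outer service tag (TPID 0x88A8) on top of a regular 802.1Q customer tag (TPID 0x8100), and each 4-byte tag carries a 12-bit VLAN ID. The Python sketch below shows the extra header parsing a forwarding path has to do per frame for these features; it is a hypothetical illustration, not code from FreeBSD's vlan(4) or from netmap, and it accepts either TPID at any position because some equipment also uses 0x8100 for the outer tag:

```python
import struct

TPID_8021Q = 0x8100   # IEEE 802.1Q customer VLAN tag (C-tag)
TPID_8021AD = 0x88A8  # IEEE 802.1ad service VLAN tag (S-tag, outer Q-in-Q tag)


def parse_vlan_tags(frame: bytes):
    """Return (vlan_ids, inner_ethertype) for an Ethernet frame.

    vlan_ids lists VLAN IDs from outermost to innermost: two entries for a
    Q-in-Q frame, one for a plain 802.1Q frame, none if untagged.
    Illustrative only; assumes the frame is long enough for its tags.
    """
    offset = 12  # skip destination and source MAC addresses
    vlan_ids = []
    ethertype = struct.unpack("!H", frame[offset:offset + 2])[0]
    while ethertype in (TPID_8021AD, TPID_8021Q):
        tci = struct.unpack("!H", frame[offset + 2:offset + 4])[0]
        vlan_ids.append(tci & 0x0FFF)  # low 12 bits of the TCI are the VLAN ID
        offset += 4  # move past this tag
        ethertype = struct.unpack("!H", frame[offset:offset + 2])[0]
    return vlan_ids, ethertype


def make_frame(tags, ethertype=0x0800):
    """Build a header-only test frame; tags is a list of (tpid, vlan_id)."""
    header = bytes(12)  # zeroed MAC addresses
    for tpid, vid in tags:
        header += struct.pack("!HH", tpid, vid)
    return header + struct.pack("!H", ethertype)
```

For a double-tagged frame, `parse_vlan_tags(make_frame([(0x88A8, 100), (0x8100, 20)]))` returns `([100, 20], 0x0800)`, i.e. outer VLAN 100, inner VLAN 20, carrying IPv4.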
The ServerU people recommended Chelsio Terminator 5 (T5) 40G ports. OK, I
only needed 10G, but they convinced me to look not at the bits-per-second
numbers but at the packets-per-second numbers.
Honestly, I don't know how the Chelsio T5 does it. The ServerU 1GbE ports
already perform very well on interrupt CPU usage (probably credit due to
the Intel igb(4)/ix(4) drivers), but for everything I route from one 40GbE
port to the other port on the same L-800 expansion card, I see very, very,
very LOW interrupt rates. Sometimes I have no interrupts at all!!
I peaked at 6 Mpps routed on a ServerU L-800 and still had CPU available.
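To put 6 Mpps in bits-per-second terms: the wire bandwidth depends entirely on frame size. A minimal Python sketch, again counting 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter, inter-frame gap):

```python
WIRE_OVERHEAD_BYTES = 20  # preamble (7) + SFD (1) + inter-frame gap (12)


def wire_bps(pps: float, frame_bytes: int) -> float:
    """Bits per second on the wire needed to carry pps frames of frame_bytes each."""
    return pps * (frame_bytes + WIRE_OVERHEAD_BYTES) * 8


if __name__ == "__main__":
    for size in (64, 512, 1518):
        print(f"6 Mpps of {size}-byte frames = {wire_bps(6e6, size) / 1e9:.1f} Gbit/s")
```

At 64-byte frames, 6 Mpps is only about 4.0 Gbit/s, while the same packet rate at 1518-byte frames would need about 73.8 Gbit/s. That is why a port's bits-per-second rating says little about the packet rate a box can sustain.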
I am not sure whether the credit is due to the ServerU hardware, the
FreeBSD OS, Netmap or Chelsio. But I am sure about what matters to my VP
and my CFO: $$$
A T5 card costs around USD 1,000 and a ServerU L-800 router another
USD 1,200, so my overall cost of ownership is USD 2,200 for a box that
gives me PPS rates that would otherwise cost USD 9,000 to 12,000 in an
industry product.
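The cost comparison is simple arithmetic. A short Python version using only the prices quoted above (purchase price only; power, support and staff time are not included):

```python
def cost_comparison(diy_parts_usd: list[int], vendor_price_usd: int):
    """Return (total DIY cost, savings vs. vendor, vendor/DIY price ratio)."""
    diy_total = sum(diy_parts_usd)
    savings = vendor_price_usd - diy_total
    return diy_total, savings, vendor_price_usd / diy_total


if __name__ == "__main__":
    parts = [1_000, 1_200]  # Chelsio T5 card, ServerU L-800 router (prices quoted above)
    for vendor in (9_000, 12_000):
        total, saved, ratio = cost_comparison(parts, vendor)
        print(f"vendor ${vendor:,}: DIY ${total:,}, saves ${saved:,} ({ratio:.1f}x)")
```

With these inputs the box costs USD 2,200, saving USD 6,800 to 9,800 per unit, and the industry product costs about 4.1 to 5.5 times as much.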
I have followed a good discussion in a LinkedIn group (anyone googling for
it will find it) comparing Netmap to DPDK from the developer perspective.
The Netmap developer made some good points, while an Intel engineer offered
other perspectives. Overall, from my end-user, non-developer,
non-software-engineer point of view, DPDK and Netmap sound very similar in
terms of results, while differing in the gory inner details: Netmap has
some flexibility/generality advantages, and DPDK has some hardware-specific
advantages when running on Intel hardware (of course), since it's like
CUDA is for Nvidia... vendor-specific.
I honestly hope a fraction of the million-dollar donation the WhatsApp
founder made to the FreeBSD Foundation goes to research and enhancements
for Netmap.
It's the most promising networking technology I have seen in recent
years, and it goes straight to what FreeBSD does best: networking
performance. It's no coincidence that since the beginning of the Internet,
top Internet traffic servers, from Yahoo! to WhatsApp and Netflix, have run
FreeBSD.
I don't know how the important decisions will be made about adding to the
Netmap stack a superset of full forwarding capability along with lagg(4),
vlan(4), Q-in-Q, maybe carp(4) and other features that are lightweight but
still heavily tied to the kernel path. But I hope the FreeBSD engineers make
good decisions about prioritizing those issues, and commit time, funds and
goals to them.
For now, however, if you really want a relatively new and innovative
technology with actual code to use and run, ready and available, this is my
recommendation: FreeBSD + Netmap.
And for hardware vendors, iXSystems + ServerU.
It takes things out of the realm of speculation, since Netmap reference
code for serious work, including a complete firewall, is available and
ready to test, compare results with, enhance and use.
The Suricata IDS/IPS engine has Netmap support, so yes, you can inspect
packets at close to line rate in IDS (not IPS) mode with Suricata.
For everything else (DPDK, DNA, PF_RING), you have a framework in place.
Some are experimental, some are more mature, but you will have to write the
code and prove it yourself.
FreeBSD/Netmap, on the other hand, is a flavor ready to be tasted.
These are my 5 cents on such a great topic!
As for BGP convergence time: come on, are you serious? You deal with
platforms that take 1 to 3 minutes to fully converge a couple of full
BGP sessions?
What hardware is that? An 8-bit Nintendo? LOL! ;-)
Seriously and literally, a Sega Dreamcast video game console running
NetBSD + BIRD would have better convergence times!!
Now, seriously again, and no more irony.
While Cisco and Juniper have great ASICs and so on, it's amazing to see
that nowadays the Juniper MX Series still runs weak Cavium Octeon CPUs for
work its Trio 3D chip can't handle. The same goes for Cisco: amazing ASICs,
but weak CPUs that indeed need to be protected from DDoS attacks for the
things that don't run on the ASICs.
IMHO, convergence times above 30 seconds should not be accepted in any new
BGP environment nowadays. Only legacy hardware should take that long.
With OpenBGP I get <30s convergence for several full sessions on x86
hardware like the boxes mentioned above. With BIRD, convergence times are
even lower. If convergence takes longer on OpenBGP or BIRD, it's mostly
due to how long the UPDATE messages take to arrive, not how long they take
to be processed.