Zebra/Linux device production networking?

Kevin Day toasty at dragondata.com
Tue Jun 6 22:53:28 UTC 2006



On Jun 6, 2006, at 4:42 PM, Nick Burke wrote:


>
> How many of you have actually used Zebra/Linux as a routing
> device (core and/or regional; I'd be interested in both) in a
> production (read: 99.999% required, HSRP, BGP, dot1q, other
> goodies) environment?
>
> And, if you care to spend this much time, what pitfalls/benefits  
> did you find out about after implementation?


We started out with a FreeBSD/Zebra routing solution at our company
(a content provider). It worked acceptably for many years, but it
wasn't what I'd call robust.

The "router" was a single P4 2.4GHz server, with four GigE ports to
four uplinks (each giving us a full BGP feed) and two more GigE ports
to our switches. We could route over 750 Mbps easily, without packet
loss or added latency.

The biggest issue we had was Zebra's single-threadedness. After a
restart of bgpd, it would spend so much CPU time processing BGP
updates that it fell far behind on BGP keepalives, and our sessions
would time out before it finished handling the initial burst. I'd
have to shut down all the sessions, then bring them up one at a time.
It wasn't so much that bgpd needed that much CPU, but that bgpd had
very little left once the server was forwarding a few hundred Mbps of
traffic. Perhaps a dual-CPU server would have worked better, but we
never tried.
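
For anyone else hitting this, the workaround was simple but manual:
bgpd lets you administratively shut a peer and re-enable it later, so
after a restart I'd pre-shut everything and clear the shutdowns one
at a time as the table converged. A rough sketch (the AS number and
peer addresses here are made up):

  router bgp 64512
   neighbor 10.1.1.1 shutdown
   neighbor 10.2.2.2 shutdown
   neighbor 10.3.3.3 shutdown
  ! ...wait for the table to settle, then, one peer at a time:
   no neighbor 10.1.1.1 shutdown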

There were also cases where two Zebra routers would deadlock: each
had many megabytes of BGP updates to send the other, and each wanted
to finish sending its full update before accepting any data in.
Mucking with kernel tunables to give TCP sockets a 16 MB receive
buffer helped, but wasn't a cure.
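
I don't remember the exact values, but on FreeBSD the knobs were
roughly these (a sketch, not our literal config; defaults this large
really only make sense on a box dedicated to routing):

  # raise the hard ceiling on any socket buffer to 16 MB
  sysctl kern.ipc.maxsockbuf=16777216
  # raise the default TCP receive/send buffers so bgpd's
  # sessions get the big window without code changes
  sysctl net.inet.tcp.recvspace=16777216
  sysctl net.inet.tcp.sendspace=16777216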

You're also giving up things like RIBs, fancy queuing/rate limiting,
and any kind of hardware acceleration. Pushing hundreds of megabits
is easy, but software-based routers tend to fall over under DoS
conditions (lots of tiny packets) much sooner than hardware ones.
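
To put rough numbers on that: a minimum-size 64-byte frame occupies
84 bytes on the wire once you count preamble and inter-frame gap, so
one GigE at line rate is about 1,000,000,000 / (84 * 8) = ~1.49
million packets/sec, versus only ~81,000 packets/sec with 1500-byte
frames. A software router that forwards the latter without breaking a
sweat can be buried by the former, since per-packet costs (interrupt
handling, route lookup) dominate.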

However, it was about as close to free as you could get. We re-used
an old server and only had to buy a few two-port Ethernet cards.
Support for Zebra is pretty iffy, though. More often than not, I'd
post a bug report to the Zebra mailing list and get a "Yeah, known
bug!" reply. The original author has all but abandoned development,
leading to a fork called Quagga. Quagga is better (we still use it in
a few places), but it's still mostly a polished-up Zebra.


In the end, we needed to push more traffic than we could get our
Zebra box to handle. A couple of 20+ minute outages during peak
usage, caused by deadlocked bgpd processes, helped my case that we
needed to buy some Junipers instead.

I know you haven't given specifics, but any description of how much
traffic you intend to push and how many ports you need would help in
giving relevant advice. If you're talking about one BGP feed and
10 Mbps, I'd say go for it. If you're talking about a dozen sessions
and 2 Gbps of traffic: no way. Where you fall between those extremes
is what really matters.



