Zebra/linux device production networking?
alex at pilosoft.com
Tue Jun 6 23:17:17 UTC 2006
On Tue, 6 Jun 2006, Nick Burke wrote:
> First, a little background.. My CTO made my stomach curdle today when he
> announced that he wanted to do away with all our cisco [routers] and
> instead use Linux/zebra boxen. We are a small company, so naturally
> penny pinching is the primary motivation. That, and the sheer joy of
> watching me squirm. He has informed me that he has found "many people"
> who do this for their "core devices". I'm not so certain about this
> whole situation, so I humbly ask:
>
> How many of you have actually use(d) Zebra/Linux as a routing device
> (core and/or regional, I'd be interested in both) in a production (read:
> 99.999% required, hsrp, bgp, dot1q, other goodies) environment?
>
> And, if you care to spend this much time, what pitfalls/benefits did you
> find out about after implementation?
Having done exactly that previously, I wouldn't recommend it.
While it will work most of the time, reaching 99.999% will be a
challenge. The amount of engineering time you will spend to reach
that point (and to maintain your setup) will dwarf the cost of leasing
proper equipment.
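For scale, the 99.999% ("five nines") target leaves very little room for error. A minimal Python sketch of the downtime arithmetic, assuming a 365-day year (leap years ignored):

```python
# Downtime budget implied by an availability target.
# Assumption: a 365-day year; leap years are ignored.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(availability: float) -> float:
    """Maximum minutes of downtime per year allowed by `availability` (0..1)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return MINUTES_PER_YEAR * (1.0 - availability)


if __name__ == "__main__":
    for target in (0.999, 0.9999, 0.99999):
        print(f"{target:.3%} -> {downtime_minutes_per_year(target):.2f} min/year")
```

At 99.999% the budget is roughly 5.26 minutes per year, so a single long troubleshooting session can easily consume all of it.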
Issues encountered:
*) Performance under DDoS: The Linux routing stack is route-cache-based. That
means performance is a function of flows per second, and even a small DDoS
with random source/destination addresses will kill you. Even when this is
fixed, performance will be limited by pps, and the "worst case" performance
of a PC router is not as impressive as "omg i can route 1gbit with
p3/1ghz". In the end, "worst case" performance is what really matters, and
it isn't all that awesome.
*) Management: It takes a certain amount of sysadmin time to manage each PC
router (tools, etc.).
*) Integration: As it is not designed as a "complete system", you will
have little weirdnesses, such as Quagga not seeing kernel-installed
routes, or netlink not being able to keep up with route updates. All
of those are fairly small things, but there are more than enough of them.
*) Troubleshooting/continuity of operations: It takes two orders of
magnitude more clue to troubleshoot a zebra network. There are simply
*lots* more things that can go wrong: you don't just worry about your
links breaking, you also have to worry about your software being buggy.
While any CCIE will most likely be able to troubleshoot and run a
Cisco-based network, the pool of engineers sufficiently clued in the myriad
of things that relate to troubleshooting a PC router (i.e. network
engineer, system admin, protocol engineer, kernel hacker, and at times,
zebra-source-code hacker, all in one person) is far smaller.
*) Maturity: While it has been improving, things like Quagga still have
stability issues and "weird issues that are resolved by killing ospfd".
Because such an environment is in a greater state of flux, you are likely
to encounter things like "oh, this bug is fixed in the latest release", and
then have to retest the new release, which has completely different bugs.
Yes, I know, you get that with proprietary vendors, but at least you get
the benefit of *them* doing some amount of testing prior to release.
*) Redundancy: Adding more redundancy to such a system is not likely to
increase availability; in fact, it is likely to decrease availability
because of the added complexity and "more things to break". Your problem
is not likely to be the PC losing power (complete failure). Your problems
will be things like zebra's idea of the routing table being different from
the kernel's idea, or zebra getting unhappy after a transit provider flaps
and sucking up CPU time, leading to other things timing out. Redundancy
will exacerbate these issues, making troubleshooting *harder*.
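To illustrate the route-cache point from the performance item, here is a toy Python model of a per-flow route cache. It is an assumption-laden sketch, not the Linux kernel's actual implementation: every new (source, destination) pair misses the cache and stands in for an expensive slow-path lookup, and the "flush everything when full" policy is a hypothetical simplification.

```python
import random
from typing import Dict, Iterable, List, Tuple

Flow = Tuple[int, int]  # (source address, destination address) as integers


class ToyRouteCache:
    """Toy per-flow route cache. NOT the Linux kernel's real implementation.

    A hit is the cheap fast path; a miss stands in for the expensive slow
    path (full routing-table lookup plus cache insertion)."""

    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries
        self.entries: Dict[Flow, str] = {}
        self.slow_path_lookups = 0

    def forward(self, flow: Flow) -> None:
        if flow in self.entries:
            return  # fast path: flow already cached
        self.slow_path_lookups += 1  # slow path: cache miss
        if len(self.entries) >= self.max_entries:
            self.entries.clear()  # hypothetical policy: flush whole cache when full
        self.entries[flow] = "nexthop"  # placeholder next hop


def slow_path_ratio(packets: Iterable[Flow], max_entries: int = 4096) -> float:
    """Fraction of packets that had to take the slow path."""
    cache = ToyRouteCache(max_entries)
    count = 0
    for flow in packets:
        cache.forward(flow)
        count += 1
    return cache.slow_path_lookups / count if count else 0.0


def normal_traffic(n: int) -> List[Flow]:
    # 100 long-lived flows from different sources to a single destination.
    return [(i % 100, 1) for i in range(n)]


def random_ddos(n: int, seed: int = 42) -> List[Flow]:
    # Every packet carries a random 32-bit source and destination.
    rng = random.Random(seed)
    return [(rng.getrandbits(32), rng.getrandbits(32)) for _ in range(n)]


if __name__ == "__main__":
    print("normal:", slow_path_ratio(normal_traffic(100_000)))
    print("ddos:  ", slow_path_ratio(random_ddos(100_000)))
```

With normal traffic only the first packet of each of the 100 flows takes the slow path (0.1% of packets), while the random-address flood sends essentially every packet down the slow path, even though the packet rate is identical. That is why cost tracks flows per second rather than bandwidth.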
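A recurring symptom from the items above is zebra's view of the routing table drifting from the kernel's. A minimal Python sketch of a consistency check, assuming both tables have already been reduced to a hypothetical `prefix -> next hop` mapping (for example, parsed from `ip route show` on the kernel side and `show ip route` in vtysh on the daemon side; that parsing is not shown and its details are assumptions):

```python
from typing import Dict, List, Tuple

# Hypothetical representation: prefix string -> next-hop string.
RouteTable = Dict[str, str]


def diff_routes(
    daemon: RouteTable, kernel: RouteTable
) -> Tuple[List[str], List[str], List[Tuple[str, str, str]]]:
    """Compare the routing daemon's view with the kernel's view.

    Returns (missing_in_kernel, extra_in_kernel, nexthop_mismatches),
    each sorted for stable output."""
    missing = sorted(p for p in daemon if p not in kernel)
    extra = sorted(p for p in kernel if p not in daemon)
    mismatched = sorted(
        (p, daemon[p], kernel[p])
        for p in daemon.keys() & kernel.keys()
        if daemon[p] != kernel[p]
    )
    return missing, extra, mismatched


if __name__ == "__main__":
    # Illustrative tables using documentation/private address ranges.
    daemon = {"10.0.0.0/8": "192.0.2.1", "198.51.100.0/24": "192.0.2.2",
              "203.0.113.0/24": "192.0.2.3"}
    kernel = {"10.0.0.0/8": "192.0.2.1", "198.51.100.0/24": "192.0.2.9",
              "172.16.0.0/12": "192.0.2.4"}
    missing, extra, mismatched = diff_routes(daemon, kernel)
    print("missing in kernel:", missing)
    print("extra in kernel:  ", extra)
    print("next-hop mismatch:", mismatched)
```

In practice some "extra" kernel entries (connected or manually added routes) are expected, so a real check would need a filter for those; the sketch only shows the comparison itself.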
So, in conclusion, if you have a large number of clued Linux hackers who
have nothing better to do, it may be a good idea. Otherwise, you'll
realize you are spending far more on sysadmin time than you are saving on
equipment cost.
--
Alex Pilosov | DSL, Colocation, Hosting Services
President | alex at pilosoft.com 877-PILOSOFT x601
Pilosoft, Inc. | http://www.pilosoft.com