External BGP Controller for L3 Switch BGP routing

Mon Jan 16 12:08:53 UTC 2017

On 16 January 2017 at 08:40, Tore Anderson <tore at fud.no> wrote:

Hey,

> Do you know of any traditional «Internet scale» router that can do ~720
> Gbps of throughput for less than 10x the price of a Trident II box? Or
> even <100kUSD? (Disregarding any volume discounts.)

It's really hard to talk about pricing, as it depends heavily on many
factors. But I guess pretty much all Jericho boxes would fit that
bill? Arista will probably set you back anywhere in the range of
15-35k, will do full table (for now) and has deep packet buffers.
NCS5501 is also sub-100k, even with external TCAM: probably around 40k
for a single unit without external TCAM and 60k with it, and with
external TCAM you lose 8x10G and 2x100G ports.

But my comment wasn't really about what is available now. It was more
fundamentally about the economics of large FIBs and large buffers: they
are not inherently very expensive in BOM terms.

I wonder whether true white-label is possible. Would some 'real' HW
vendor of BRCM size release HW docs openly? Then some integrator could
start selling the HW at BOM+10-20%, with no support and no software at
all, and the community could build the actual software on it.
It seems to me that what is keeping us away from near-BOM prices is
software engineering, and we cannot do it as a community because the HW
docs are not available.

> Quite the opposite, changing between different interface speeds happens
> very commonly inside the data centre (and most of the time it's done by
> shallow-buffered switches using Trident II or similar chips).

The reason I said it won't be a problem inside the DC is low RTT, which
means small bursts. I'm talking about backend network infra in the DC,
not Internet-facing. Anywhere you see large RTT and a speed/availability
step-down, you'll need buffers. The exception would be if we changed TCP
to pace window growth instead of bursting as it does now. AFAIK you
could already configure your Linux server to pace at the estimated BW,
but then you'd lose on congested links, as a more aggressive TCP stack
would beat you to oblivion.
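To put rough numbers on the RTT argument, here is a back-of-envelope
sketch in Python. It rests on a crude simplification of my own, not a
measured model: a sender bursts one bandwidth-delay product at the
ingress rate and the switch drains it at the lower egress rate. The
function names, the 40G->10G step-down and the RTT values are all
hypothetical and purely illustrative.

```python
"""Back-of-envelope buffer estimate for a speed step-down.

Simplified, hypothetical model (an assumption for illustration only):
a TCP sender bursts one full window at the ingress line rate, and the
switch drains it at the lower egress rate. The window is approximated
by the bandwidth-delay product of the bottleneck (egress rate x RTT).
"""


def bdp_bytes(rate_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product in bytes."""
    return rate_bps * rtt_s / 8


def stepdown_buffer_bytes(ingress_bps: float, egress_bps: float,
                          rtt_s: float) -> float:
    """Bytes that queue up when a window-sized burst arrives at
    ingress_bps and leaves at egress_bps.

    While the burst arrives, egress drains only egress/ingress of it,
    so the remaining fraction has to sit in the buffer.
    """
    if egress_bps >= ingress_bps:
        return 0.0  # no step-down, so no queue builds in this model
    burst = bdp_bytes(egress_bps, rtt_s)
    return burst * (1 - egress_bps / ingress_bps)


def pacing_gap_s(packet_bytes: int, pacing_bps: float) -> float:
    """Inter-packet gap if the sender paces at pacing_bps instead of
    sending the window back-to-back."""
    return packet_bytes * 8 / pacing_bps


if __name__ == "__main__":
    # Hypothetical 40GE spine -> 10GE exit, DC-like vs WAN-like RTT.
    for rtt_ms in (0.1, 100):
        need = stepdown_buffer_bytes(40e9, 10e9, rtt_ms / 1000)
        print(f"RTT {rtt_ms} ms: ~{need / 1e6:.3f} MB of buffer")
    gap_ns = pacing_gap_s(1500, 10e9) * 1e9
    print(f"1500 B packets paced at 10 Gbps: {gap_ns:.0f} ns apart")
```

Under these assumptions the required buffer scales linearly with RTT,
so going from a 0.1 ms DC RTT to a 100 ms Internet RTT multiplies it by
1000 (roughly 94 kB vs 94 MB in this toy case). Pacing removes the
back-to-back burst, which is why it would help, at the competitive cost
described above.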

> I suppose you could for example use a couple of MX240s or something as
> a special-purpose leaf layer for external connectivity.
> MPC5E-40G10G-IRB or something towards the 40GE spines and any regular
> 10GE MPC towards the exits. That way you'd only have one
> shallow-buffered speed conversion remaining. But I'm very sceptical if
> something like this makes sense after taking the cost/benefit ratio
> into account.

MPC is indeed on a completely different level in BOM, as it's an NPU
with lookup and packets in DRAM: fairly complicated and
space-inefficient. But we have pipeline chips on the market with deep
buffers and full DFZ. There is no real reason for the markup on them to
be significant; the control plane should cost more. This is why the
promise of a Xeon router is odd to me, as it's a fundamentally very
expensive chip, combined with poorly predictable performance (jitter,
latency...).

-- 
  ++ytti