External BGP Controller for L3 Switch BGP routing

joel jaeggli joelja at bogus.com
Tue Jan 17 05:22:16 UTC 2017

On 1/15/17 11:00 PM, Yucong Sun wrote:
> In my setup, I use a BIRD instance to combine multiple full Internet
> tables. I use some filters to generate override routes to send to my L3
> switch for routing. The L3 switch is configured with a default route
> to the main transit provider; if BIRD is down, routing is
> unoptimized, but everything else remains operable until I fix that BIRD
> instance.
> I've asked around about why there isn't an L3 switch capable of handling
> full tables; I really don't understand the difference/logic behind it.
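
For illustration, the fallback behaviour described in the quoted setup can be sketched as a longest-prefix-match lookup over a small FIB. This is only a sketch under stated assumptions: the prefixes (documentation ranges), the next-hop names, and the `lookup` helper are all hypothetical, and it does not reflect how BIRD or any particular switch implements forwarding.

```python
import ipaddress

def lookup(fib, dst):
    """Longest-prefix match over a dict of {network: next_hop}."""
    addr = ipaddress.ip_address(dst)
    best = None
    for net, next_hop in fib.items():
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None

# Static default route configured on the switch itself (hypothetical name).
static_fib = {ipaddress.ip_network("0.0.0.0/0"): "transit-main"}

# Override routes the external controller would announce (hypothetical,
# using documentation prefixes); only these few consume FIB entries.
overrides = {
    ipaddress.ip_network("198.51.100.0/24"): "transit-b",
    ipaddress.ip_network("203.0.113.0/24"): "peer-c",
}

# While the controller session is up, the switch holds default + overrides.
fib_with_controller = {**static_fib, **overrides}

print(lookup(fib_with_controller, "198.51.100.7"))  # override wins
print(lookup(fib_with_controller, "192.0.2.1"))     # falls to default
# Controller down: overrides are withdrawn, traffic still follows the default.
print(lookup(static_fib, "198.51.100.7"))
```

The point of the design is that the switch only ever needs the default plus the handful of overrides, so losing the controller degrades path selection but not reachability.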

In practice there are several merchant silicon implementations that
support the addition of external TCAMs. Building them accordingly
increases the COGS and imposes various performance and packaging limitations.

Arista 7280R and Cisco NCS 5500 are Broadcom Jericho-based devices that
are packaged accordingly.

Ethernet merchant silicon is heavily biased towards doing most if not
all the IO on the same ASIC, with limitations driven by gate size, die
size, heat dissipation, pin count, and so on.

There was a recent packet pushers episode with Pradeep Sindhu that
touched on some of these issues:


> On Sun, Jan 15, 2017 at 10:43 PM Tore Anderson <tore at fud.no> wrote:
>> Hi Saku,
>> https://www.redpill-linpro.com/sysadvent/2016/12/09/slimming-routing-table.html
>>> ---
>>> As described in a previous post, we’re testing a HPE Altoline 6920 in
>>> our lab. The Altoline 6920 is, like other switches based on the
>>> Broadcom Trident II chipset, able to handle up to 720 Gbps of
>>> throughput, packing 48x10GbE + 6x40GbE ports in a compact 1RU chassis.
>>> Its price is in all likelihood a single-digit percentage of the price
>>> of a traditional Internet router with a comparable throughput rating.
>>> ---
>>> This makes it sound like a small-FIB router is a single-digit
>>> percentage of the cost of a full-FIB one.
>> Do you know of any traditional «Internet scale» router that can do ~720
>> Gbps of throughput for less than 10x the price of a Trident II box? Or
>> even <100kUSD? (Disregarding any volume discounts.)
>>> Also having Trident in Internet facing interface may be suspect,
>>> especially if you need to go from fast interface to slow or busy
>>> interface, due to very minor packet buffers. This obviously won't be
>>> much of a problem in inside-DC traffic.
>> Quite the opposite, changing between different interface speeds happens
>> very commonly inside the data centre (and most of the time it's done by
>> shallow-buffered switches using Trident II or similar chips).
>> One ubiquitous configuration has the servers and any external uplinks
>> attached with 10GE to leaf switches, which in turn connect to a 40GE
>> spine layer. In this config server<->server and server<->Internet
>> packets will need to change speed twice:
>> [server]-10GE-(leafX)-40GE-(spine)-40GE-(leafY)-10GE-[server/internet]
>> I suppose you could for example use a couple of MX240s or something as
>> a special-purpose leaf layer for external connectivity.
>> MPC5E-40G10G-IRB or something towards the 40GE spines and any regular
>> 10GE MPC towards the exits. That way you'd only have one
>> shallow-buffered speed conversion remaining. But I'm very sceptical if
>> something like this makes sense after taking the cost/benefit ratio
>> into account.
>> Tore
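
As a rough worked example of the numbers in the quoted discussion: the port arithmetic for the 48x10GbE + 6x40GbE box, and a back-of-the-envelope estimate of how long a shared packet buffer lasts when traffic arrives on a 40GE port and drains to a 10GE port. The buffer size used here is an assumption for illustration only and is not taken from the thread; the helper names are hypothetical, and the model ignores scheduling, flow control, and per-port buffer carving.

```python
def total_gbps(ports):
    """Sum of port_count * port_speed_gbps over (count, speed) pairs."""
    return sum(count * speed for count, speed in ports)

def fill_time_seconds(buffer_bytes, in_gbps, out_gbps):
    """Time for a sustained in->out speed mismatch to fill the buffer."""
    if in_gbps <= out_gbps:
        return float("inf")  # egress keeps up, buffer never fills
    excess_bits_per_second = (in_gbps - out_gbps) * 1e9
    return buffer_bytes * 8 / excess_bits_per_second

# Port mix quoted above: 48x10GbE + 6x40GbE.
print(total_gbps([(48, 10), (6, 40)]))  # 480 + 240 = 720

# ASSUMPTION: a 12 MB buffer fully available to one congested egress port.
BUFFER_BYTES = 12 * 10**6
print(fill_time_seconds(BUFFER_BYTES, in_gbps=40, out_gbps=10))
```

Under that assumption a line-rate 40GE burst toward a single 10GE port is absorbed for only a few milliseconds, which is the kind of speed-conversion case both posters are arguing about.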
