<div dir="ltr"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">My understanding of Juniper's approach to the problem is that instead<br>of employing TCAMs for next-hop lookup, they use general purpose CPUs<br>operating on a radix tree, exactly as you would for an all-software<br>router.<br></blockquote><div><br></div><div>They're absolutely not doing that with "general purpose CPUs". </div><div><br></div><div>The LU block on early-gen Trios was a dedicated ASIC (LU by itself, then slightly consolidated), and later-gen Trio put everything on a single chip, but again as a dedicated ASIC. </div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">To<br>achieve an -aggregate- lookup speed comparable to a TCAM, they<br>implement a bunch of these lookup engines as dedicated parallel<br>subprocessors rather than using the router's primary compute engine.<br></blockquote><div><br></div><div>You're correct that there is parallelism in the LU functions, but I still think you're kinda smushing together a bunch of stuff that happens in different places. </div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Sep 29, 2023 at 4:44 PM William Herrin <<a href="mailto:bill@herrin.us">bill@herrin.us</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Thu, Sep 28, 2023 at 10:29 PM Saku Ytti <<a href="mailto:saku@ytti.fi" target="_blank">saku@ytti.fi</a>> wrote:<br>

> On Fri, 29 Sept 2023 at 08:24, William Herrin <<a href="mailto:bill@herrin.us" target="_blank">bill@herrin.us</a>> wrote:<br>

> > Maybe. That's where my comment about CPU cache starvation comes into<br>

> > play. I haven't delved into the Juniper line cards recently so I could<br>

> > easily be wrong, but if the number of routes being actively used<br>

> > pushes past the CPU data cache, the cache miss rate will go way up and<br>

> > it'll start thrashing main memory. The net result is that the<br>

> > achievable PPS drops by at least an order of magnitude.<br>

><br>

> When you say you haven't delved into the Juniper line cards recently,<br>

> which specific Juniper line card does your comment apply to?<br>

<br>

Howdy,<br>

<br>

My understanding of Juniper's approach to the problem is that instead<br>

of employing TCAMs for next-hop lookup, they use general purpose CPUs<br>

operating on a radix tree, exactly as you would for an all-software<br>

router. This makes each lookup much slower than a TCAM can achieve.<br>

However, that doesn't matter much: the lookup delays are much shorter<br>

than the transmission delays so it's not noticeable to the user. To<br>

achieve an -aggregate- lookup speed comparable to a TCAM, they<br>

implement a bunch of these lookup engines as dedicated parallel<br>

subprocessors rather than using the router's primary compute engine.<br>
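<div>Here is a back-of-envelope sketch of the aggregate-rate argument. It uses a hypothetical model, not Juniper's design. By Little's law, the number of lookups that must be in flight equals the target lookup rate times the latency of one lookup. The function name engines_needed and every figure passed to it (target rate, memory accesses per lookup, nanoseconds per access) are illustrative assumptions. The model also assumes that each engine works on exactly one lookup at a time and that its memory accesses are strictly sequential.</div>
<pre>
```python
NS_PER_SEC = 1_000_000_000


def engines_needed(target_lookups_per_sec, accesses_per_lookup, ns_per_access):
    """Hypothetical model: minimum number of one-lookup-at-a-time engines
    needed to sustain target_lookups_per_sec, when every lookup performs
    accesses_per_lookup dependent memory reads of ns_per_access each."""
    ns_per_lookup = accesses_per_lookup * ns_per_access  # latency of one lookup
    # Little's law: lookups in flight = arrival rate * time each one takes.
    # Rate is per second and latency is in ns, so divide by NS_PER_SEC.
    in_flight = target_lookups_per_sec * ns_per_lookup
    # Integer ceiling division avoids floating-point rounding surprises.
    return (in_flight + NS_PER_SEC - 1) // NS_PER_SEC


if __name__ == "__main__":
    # Illustrative numbers only: 1 billion lookups/s, 24 reads of 5 ns each.
    print(engines_needed(1_000_000_000, 24, 5))  # 120
```
</pre>
<div>With those made-up numbers, 24 accesses at 5 ns is 120 ns per lookup, so one engine manages roughly 8.3 million lookups per second and about 120 engines are needed. An engine that interleaves several independent lookups would reduce the physical engine count proportionally.</div>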

<br>

A TCAM lookup is approximately O(1) while a radix tree lookup is<br>

approximately O(log n). (Neither description is strictly correct but<br>

it's close enough to understand the running time.) Log n is pretty<br>

small so it doesn't take much parallelism for the practical run time<br>

to catch up to the TCAM.<br>
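<div>Below is a minimal sketch of the radix-tree side of this comparison. It assumes the simplest variant, an uncompressed binary trie over IPv4 addresses that consumes one bit per level. Real implementations commonly use path-compressed or multibit-stride variants, and nothing here reflects how Trio LU hardware actually stores routes. The class and function names are hypothetical. The sketch counts nodes visited per lookup. In this structure that count is bounded by the address width (at most 33 nodes including the root for IPv4), however many prefixes are installed. That bound is one concrete reason the O(log n) figure is only approximate.</div>
<pre>
```python
import ipaddress


class Node:
    """One node of an uncompressed binary trie keyed on address bits."""

    __slots__ = ("children", "next_hop")

    def __init__(self):
        self.children = [None, None]  # subtrees for the next bit being 0 or 1
        self.next_hop = None          # set when a prefix ends at this node


def leading_bits(addr, length):
    """The first length bits of an IPv4 address, as a string of 0s and 1s."""
    return format(int(addr), "032b")[:length]


class BinaryTrie:
    def __init__(self):
        self.root = Node()

    def insert(self, prefix, next_hop):
        """Walk down one level per prefix bit, creating nodes as needed."""
        net = ipaddress.IPv4Network(prefix)
        node = self.root
        for bit in leading_bits(net.network_address, net.prefixlen):
            i = int(bit)
            if node.children[i] is None:
                node.children[i] = Node()
            node = node.children[i]
        node.next_hop = next_hop

    def lookup(self, address):
        """Longest-prefix match; returns (next_hop, nodes_visited)."""
        node = self.root
        best = node.next_hop
        visited = 1
        for bit in leading_bits(ipaddress.IPv4Address(address), 32):
            node = node.children[int(bit)]
            if node is None:
                break  # no longer prefix exists below this point
            visited += 1
            if node.next_hop is not None:
                best = node.next_hop  # remember the longest match so far
        return best, visited


if __name__ == "__main__":
    fib = BinaryTrie()
    fib.insert("0.0.0.0/0", "default")
    fib.insert("10.0.0.0/8", "A")
    fib.insert("10.1.0.0/16", "B")
    print(fib.lookup("10.1.2.3"))   # ('B', 17)
    print(fib.lookup("10.2.0.1"))   # ('A', 15)
    print(fib.lookup("192.0.2.1"))  # ('default', 1)
```
</pre>
<div>Each visited node is a dependent memory read: the next node's address is not known until the current one has been fetched. That dependency is why the cache-miss concern raised earlier in the thread matters for this kind of lookup.</div>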

<br>

Feel free to correct me if I'm mistaken or fill in any important<br>

details I've glossed over.<br>

<br>

Regards,<br>

Bill Herrin<br>

<br>

<br>

--<br>

William Herrin<br>

<a href="mailto:bill@herrin.us" target="_blank">bill@herrin.us</a><br>

<a href="https://bill.herrin.us/" rel="noreferrer" target="_blank">https://bill.herrin.us/</a><br>

</blockquote></div>