400G forwarding - how does it work?

Jeff Tantsura jefftant.ietf at gmail.com
Wed Aug 10 18:38:48 UTC 2022


Sharada’s answers:

a) Yes, the run-to-completion model of Trio is superior to the FP5/Nokia model when it comes to flexible processing engines. In Trio, the same engines can do either ingress or egress processing. Traditionally, there is more processing on ingress than on egress. When that happens, by design, fewer processing engines are used for egress and more engines are available for ingress processing. Trio gives full flexibility. Unless Nokia is optimizing the engines to save overall area (not all engines are identical, and some are area-optimized for specific processing), I do not see any other advantage.
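
To make the engine-flexibility point concrete, here is a toy Python sketch (purely illustrative; the engine counts and workload numbers are invented, and this is not Juniper's or Nokia's actual design). It just shows that a shared pool of run-to-completion engines serves whichever direction is busier, while a fixed ingress/egress split strands capacity:

ENGINES = 32
INGRESS_WORK = 24   # invented engine-units of ingress work this interval
EGRESS_WORK = 6     # invented engine-units of egress work this interval

def served_shared(ingress, egress, engines):
    # Run-to-completion style: every engine can do either direction.
    return min(ingress + egress, engines)

def served_split(ingress, egress, engines):
    # Fixed split: half the engines are dedicated to each direction.
    half = engines // 2
    return min(ingress, half) + min(egress, half)

print("shared pool serves:", served_shared(INGRESS_WORK, EGRESS_WORK, ENGINES))   # 30
print("fixed split serves:", served_split(INGRESS_WORK, EGRESS_WORK, ENGINES))    # 22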

b) Trio provides on-chip shallow buffering on ingress for fabric queues. We share this buffer between the slices on the same die. This gives us the flexibility to go easy on the size of SRAM we want to support for buffering. 
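
As a rough illustration of the shared-buffer point (again with invented numbers, not the actual Trio design): because all slices on the die draw from one pool, the pool can be sized for the realistic aggregate demand rather than the per-slice worst case.

class SharedIngressBuffer:
    # Toy shared-buffer accounting: all slices on the die draw from one SRAM pool.
    def __init__(self, total_kb):
        self.total_kb = total_kb
        self.used_kb = 0

    def admit(self, slice_id, packet_kb):
        # Admission is against the shared pool, not a per-slice carve-out.
        if self.used_kb + packet_kb > self.total_kb:
            return False          # pool exhausted
        self.used_kb += packet_kb
        return True

pool = SharedIngressBuffer(total_kb=12_000)   # invented pool size
print(pool.admit(slice_id=0, packet_kb=9))    # True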

c) I didn't completely follow the question. Shallow ingress buffers are for fabric-facing queues, and we do not expect sustained fabric congestion. This, combined with the fact that we have some speed-up over the fabric, ensures that all WAN packets do reach the egress PFE buffer. On ingress, if packet processing is oversubscribed, line-rate pre-classifiers perform the drops based on WAN queue priority.
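
A minimal sketch of the kind of priority-aware drop described above, assuming made-up priority levels and load thresholds (the real pre-classifier logic is certainly different):

def preclassifier_admit(priority, engine_load):
    # priority: higher number = more important WAN queue (invented scale)
    # engine_load: fraction of packet-processing capacity currently in use
    if engine_load < 0.7:
        return True           # no oversubscription: admit everything
    if engine_load < 0.9:
        return priority >= 1  # mild oversubscription: shed best-effort first
    return priority >= 2      # heavy oversubscription: keep only high priority

print(preclassifier_admit(priority=0, engine_load=0.95))   # False -> dropped at line rate
print(preclassifier_admit(priority=2, engine_load=0.95))   # True  -> handed to the engines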

Cheers,
Jeff

> On Aug 9, 2022, at 16:34, Jeff Tantsura <jefftant.ietf at gmail.com> wrote:
> 
> 
> Saku,
>  
> I have forwarded your questions to Sharada.
>  
> All,
>  
> For this week – at 11:00am PST, Thursday 08/11, we will be joined by Guy Caspary, co-founder of Leaba Semiconductor (acquired by Cisco -> SiliconOne):
> https://m.youtube.com/watch?v=GDthnCj31_Y
>  
> For the next week, I’m planning to get one of the main architects of Broadcom DNX (Jericho/Qumran/Ramon).
>  
> Cheers,
> Jeff
>  
> From: Saku Ytti
> Sent: Friday, August 5, 2022 12:15 AM
> To: Jeff Tantsura
> Cc: NANOG; Jeff Doyle
> Subject: Re: 400G forwarding - how does it work?
>  
> Thank you for this.
>  
> I wish there had been a deeper dive into the lookup side. My open questions:
>  
> a) The Trio model, where a packet stays in a single PPE until done, vs. the
> FP model of a line of PPEs (identical cores). I don't understand the
> advantages of the FP model; the Trio model's advantages are clear to me.
> Obviously the FP model must have some advantages, so what are they?
>  
> b) What exactly are the gains of putting two Trios on-package in Trio6?
> There is no local switching between the WAN ports of the Trios in the same
> package; they are, as far as I can tell, ships in the night, and packets
> between the Trios go via the fabric, just as they would with separate Trios.
> I can understand the benefit of putting the Trio and HBM2 on the same
> package, to reduce distance so that wattage goes down or frequency goes up.
>  
> c) What evolution are they considering for the shallow ingress buffers in
> Trio6? The collateral-damage potential is significant, because the WAN port
> that asks for the most gets the most, instead of each getting its fair
> share; a potentially arbitrarily low-rate WAN ingress might not get access
> to the ingress buffer at all, causing drops. Would it be practical, in terms
> of wattage/area, to add some sort of pre-QoS in front of the shallow ingress
> buffer, so that each WAN ingress has a fair, guaranteed rate into the
> shallow buffers?
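
A toy sketch of the per-WAN guarantee that question (c) is asking about: each ingress port keeps a small reservation in the shallow buffer, and only the remainder is contended for first-come-first-served (all sizes here are invented for illustration):

class FairShallowBuffer:
    def __init__(self, total_cells, ports, reserved_per_port):
        # Carve a small guaranteed reservation out of the pool for every WAN port.
        self.shared_free = total_cells - ports * reserved_per_port
        self.reserved_free = {p: reserved_per_port for p in range(ports)}

    def admit(self, port, cells):
        if self.reserved_free[port] >= cells:   # guaranteed share first
            self.reserved_free[port] -= cells
            return True
        if self.shared_free >= cells:           # then the shared remainder
            self.shared_free -= cells
            return True
        return False                            # otherwise drop at ingress

buf = FairShallowBuffer(total_cells=1024, ports=8, reserved_per_port=32)
print(buf.admit(port=3, cells=16))   # True: served from port 3's reservation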
>  
> On Fri, 5 Aug 2022 at 02:18, Jeff Tantsura <jefftant.ietf at gmail.com> wrote:
> > 
> > Apologies for garbage/HTMLed email, not sure what happened (thanks
> > Brian F for letting me know).
> > Anyway, the podcast with Juniper (mostly around Trio/Express) was
> > broadcast today and is available at
> > https://www.youtube.com/watch?v=1he8GjDBq9g
> > Next in the pipeline are:
> > Cisco SiliconOne
> > Broadcom DNX (Jericho/Qumran/Ramon)
> > For both, the guests are the main architects of the silicon.
> > 
> > Enjoy
> > 
> > 
> > On Wed, Aug 3, 2022 at 5:06 PM Jeff Tantsura <jefftant.ietf at gmail.com> wrote:
> > >
> > > Hey,
> > >
> > >
> > >
> > > This is not an advertisement but an attempt to help folks to better understand networking HW.
> > >
> > >
> > >
> > > Some of you might know (and love 😊) the “between 0x2 nerds” podcast Jeff Doyle and I have been hosting for a couple of years.
> > >
> > >
> > >
> > > Following up on that discussion, we have decided to dedicate a number of upcoming podcasts to networking HW, a topic where more information and better education is very much needed (no, you won’t have to sign an NDA before joining 😊). We have lined up a number of great guests, people who design and build ASICs and can talk firsthand about the evolution of networking HW, the complexity of the process, the differences between fixed and programmable pipelines, memories, and databases. This Thursday (08/04) at 11:00 PST we are joined by the one and only Sharada Yeluri, Sr. Director ASIC at Juniper. Other vendors will be joining in later episodes; the usual rules apply – no marketing, no BS.
> > >
> > > More to come, stay tuned.
> > >
> > > Live feed: https://lnkd.in/gk2x2ezZ
> > >
> > > Between 0x2 nerds playlist, videos will be published to: https://www.youtube.com/playlist?list=PLMYH1xDLIabuZCr1Yeoo39enogPA2yJB7
> > >
> > >
> > >
> > > Cheers,
> > >
> > > Jeff
> > >
> > >
> > >
> > > From: James Bensley
> > > Sent: Wednesday, July 27, 2022 12:53 PM
> > > To: Lawrence Wobker; NANOG
> > > Subject: Re: 400G forwarding - how does it work?
> > >
> > >
> > >
> > > On Tue, 26 Jul 2022 at 21:39, Lawrence Wobker <ljwobker at gmail.com> wrote:
> > >
> > > > So if this pipeline can do 1.25 billion PPS and I want to be able to forward 10BPPS, I can build a chip that has 8 of these pipelines and get my performance target that way.  I could also build a "pipeline" that processes multiple packets per clock, if I have one that does 2 packets/clock then I only need 4 of said pipelines.. and so on and so forth.
> > >
> > >
> > >
> > > Thanks for the response Lawrence.
> > >
> > >
> > >
> > > The Broadcom BCM16K KBP has a clock speed of 1.2 GHz, so I expect the
> > > J2 to have something similar (as someone already mentioned, most chips
> > > I've seen are in the 1-1.5 GHz range), so in this case "only" 2
> > > pipelines would be needed to maintain the headline 2 Bpps rate of the
> > > J2, or even just 1 if they have managed to squeeze out two packets per
> > > cycle through parallelisation within the pipeline.
> > >
> > > Cheers,
> > > James.
> > >
> > >
>  
>  
>  
> --
>   ++ytti
>  
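
For what it's worth, the pipeline arithmetic in Lawrence's and James's exchange quoted above boils down to dividing the target packet rate by (clock rate x packets per clock) and rounding up; a small sketch:

import math

def pipelines_needed(target_pps, clock_hz, packets_per_clock=1):
    # How many identical pipelines are needed to hit a target packet rate.
    return math.ceil(target_pps / (clock_hz * packets_per_clock))

# Lawrence's example: 1.25 Gpps per pipeline, 10 Gpps target.
print(pipelines_needed(10e9, 1.25e9))                        # 8
print(pipelines_needed(10e9, 1.25e9, packets_per_clock=2))   # 4

# James's J2 estimate: ~1.2 GHz clock, 2 Gpps headline rate.
print(pipelines_needed(2e9, 1.2e9))                          # 2
print(pipelines_needed(2e9, 1.2e9, packets_per_clock=2))     # 1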