400G forwarding - how does it work?

Masataka Ohta mohta at necom830.hpcl.titech.ac.jp
Sun Aug 7 11:16:07 UTC 2022

Saku Ytti wrote:

>> I'm afraid you imply too much buffer bloat only to cause
>> unnecessary and unpleasant delay.
>> With 99% load M/M/1, 500 packets (750kB for 1500B MTU) of
>> buffer is enough to make packet drop probability less than
>> 1%. With 98% load, the probability is 0.0041%.
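
The two figures quoted above match the tail probability of an
infinite M/M/1 queue, P(N >= K) = rho^K, with rho the load and K the
buffer depth in packets. The sketch below is a minimal check under
that assumption. The function name is made up for illustration, and
a finite M/M/1/K model would give somewhat different numbers.

```python
def mm1_overflow_probability(load: float, buffer_packets: int) -> float:
    """Probability that an infinite M/M/1 queue holds at least
    `buffer_packets` packets, i.e. load ** buffer_packets."""
    if not 0.0 <= load < 1.0:
        raise ValueError("load must satisfy 0 <= load < 1")
    if buffer_packets < 0:
        raise ValueError("buffer_packets must be non-negative")
    return load ** buffer_packets


MTU_BYTES = 1500       # packet size used in the text
BUFFER_PACKETS = 500   # buffer depth used in the text

if __name__ == "__main__":
    # 500 packets of 1500 bytes is the 750 kB mentioned in the text.
    print(f"buffer size: {BUFFER_PACKETS * MTU_BYTES / 1000:.0f} kB")
    for load in (0.99, 0.98):
        p = mm1_overflow_probability(load, BUFFER_PACKETS)
        print(f"load {load:.0%}: overflow probability {p:.2e}")
```

At 99% load this gives roughly 0.66%, and at 98% load roughly
0.0041%, consistent with the numbers quoted.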

> I feel like I'll live to regret asking. Which congestion control
> algorithm are you thinking of?

I'm not assuming a LAN environment, for which paced TCP may
be desirable (if the bandwidth requirement is tight, which is
unlikely in a LAN).

> But Cubic and Reno will burst tcp window growth at sender rate, which
> may be much more than receiver rate, someone has to store that growth
> and pace it out at receiver rate, otherwise window won't grow, and
> receiver rate won't be achieved.

When many TCP flows are running, bursts are averaged out and
the aggregate traffic becomes Poisson.

> So in an ideal scenario, no we don't need a lot of buffer, in
> practical situations today, yes we need quite a bit of buffer.

That is an old theory known to be invalid (Ethernet switches with
small buffers are enough for IXes) and theoretically refuted by:

	Sizing router buffers

after which paced TCP was developed for the unimportant,
exceptional case of LANs.
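
Assuming the reference above is the SIGCOMM 2004 paper of that
title, its widely cited rule is that a link carrying n long-lived
TCP flows needs a buffer of about RTT * C / sqrt(n), rather than the
full bandwidth-delay product RTT * C. The sketch below only
evaluates that formula. The link speed, RTT and flow count are
illustrative assumptions, not figures from this thread.

```python
import math


def bdp_bytes(link_bps: float, rtt_seconds: float) -> float:
    """Classic rule of thumb: one bandwidth-delay product, in bytes."""
    return link_bps * rtt_seconds / 8


def small_buffer_bytes(link_bps: float, rtt_seconds: float, num_flows: int) -> float:
    """Bandwidth-delay product divided by sqrt(number of flows), in bytes."""
    if num_flows < 1:
        raise ValueError("num_flows must be at least 1")
    return bdp_bytes(link_bps, rtt_seconds) / math.sqrt(num_flows)


if __name__ == "__main__":
    link_bps = 400e9      # assumed: one 400G port
    rtt_seconds = 0.1     # assumed: 100 ms round-trip time
    num_flows = 10_000    # assumed: concurrent long-lived flows
    print(f"full BDP:     {bdp_bytes(link_bps, rtt_seconds) / 1e6:.0f} MB")
    print(f"BDP/sqrt(n):  "
          f"{small_buffer_bytes(link_bps, rtt_seconds, num_flows) / 1e6:.0f} MB")
```

With these assumed inputs the full bandwidth-delay product is 5 GB,
while the sqrt(n) rule asks for about 50 MB.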

> Now add to this multiple logical interfaces, each having 4-8 queues,
> it adds up.

Having so many queues requires sorting the queues to prioritize
them properly, which costs a lot of computation (and
performance) for no benefit, and is a bad idea.

> Also the shallow ingress buffers discussed in the thread are not delay
> buffers and the problem is complex because no device is marketable
> that can accept wire rate of minimum packet size, so what trade-offs
> do we carry, when we get bad traffic at wire rate at small packet
> size? We can't empty the ingress buffers fast enough, do we have
> physical memory for each port, do we share, how do we share?

People who use irrationally small packets will suffer, which is
not a problem for the rest of us.
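
For a sense of scale, the packet rate a 400G Ethernet port sees at
wire rate can be worked out from the frame size plus the 20 bytes of
per-frame overhead on the wire (8 bytes of preamble and start
delimiter, 12 bytes of inter-packet gap). This is a rough sketch:
the function name is made up for illustration and it ignores VLAN
tags and other encapsulation.

```python
WIRE_OVERHEAD_BYTES = 20  # 8 bytes preamble/SFD + 12 bytes inter-packet gap


def wire_rate_pps(link_bps: float, frame_bytes: int) -> float:
    """Packets per second at full wire rate for a given Ethernet frame size."""
    if frame_bytes <= 0:
        raise ValueError("frame_bytes must be positive")
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


if __name__ == "__main__":
    link_bps = 400e9
    # 64 bytes is the minimum Ethernet frame; 1518 bytes is a 1500 byte
    # MTU payload plus 14 bytes of header and 4 bytes of FCS.
    for frame_bytes in (64, 1518):
        pps = wire_rate_pps(link_bps, frame_bytes)
        print(f"{frame_bytes:>5} B frames: {pps / 1e6:7.1f} Mpps, "
              f"{1e9 / pps:6.2f} ns per packet")
```

Minimum-size frames arrive at roughly 595 Mpps (about 1.68 ns per
packet), against roughly 32.5 Mpps for full-size frames.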

						Masataka Ohta
