400G forwarding - how does it work?

Saku Ytti saku at ytti.fi
Sun Aug 7 09:46:00 UTC 2022


On Sun, 7 Aug 2022 at 12:16, Masataka Ohta
<mohta at necom830.hpcl.titech.ac.jp> wrote:

> I'm afraid you imply too much buffer bloat only to cause
> unnecessary and unpleasant delay.
>
> With 99% load M/M/1, 500 packets (750kB for 1500B MTU) of
> buffer is enough to make packet drop probability less than
> 1%. With 98% load, the probability is 0.0041%.
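As a sanity check on those figures: they match the standard M/M/1
tail approximation, where the drop probability with a K-packet buffer
is taken to be P(queue >= K) = rho^K. A minimal sketch, assuming that
is the model being used:

  # M/M/1 tail approximation (assumed model): P(queue >= K) ~= rho**K
  def mm1_overflow_prob(rho, buffer_pkts):
      return rho ** buffer_pkts

  for load in (0.99, 0.98):
      print(load, mm1_overflow_prob(load, 500))
  # 0.99 -> ~0.0066, i.e. ~0.66% (< 1%); 0.98 -> ~0.000041, i.e. 0.0041%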

I feel like I'll live to regret asking. Which congestion control
algorithm are you thinking of? If we estimate bandwidth and pace TCP
window growth at the estimated bandwidth, we don't need much
buffering at all. But Cubic and Reno burst their window growth at the
sender's rate, which may be much higher than the receiver's rate;
someone has to store that growth and pace it out at the receiver's
rate, otherwise the window won't grow and the receiver's rate won't
be achieved.
So in an ideal scenario, no, we don't need a lot of buffer; in
practical situations today, yes, we need quite a bit of buffer.
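To put rough numbers on that difference (the rates and burst size
here are hypothetical, chosen only to illustrate the arrival-rate vs.
drain-rate mismatch):

  # Hypothetical example: sender on a 100G link, receiver behind a 10G
  # bottleneck. A paced sender queues almost nothing; a Cubic/Reno-style
  # sender emits its window growth back-to-back at line rate, so the
  # bottleneck queue must absorb the difference while draining at 10G.
  def peak_queue_bytes(burst_bytes, sender_bps, bottleneck_bps):
      return burst_bytes * (1 - bottleneck_bps / sender_bps)

  burst = 1000 * 1500  # 1000 MTU-sized packets of window growth (assumed)
  print(peak_queue_bytes(burst, 100e9, 10e9))  # ~1.35 MB queued at once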

Now add to this multiple logical interfaces, each with 4-8 queues,
and it adds up. "Big buffers are bad, mmm'kay" is frankly simplistic
and inaccurate.
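A back-of-the-envelope illustration of how it adds up (all counts
assumed, not taken from any particular box):

  # Assumed, illustrative numbers: per-queue buffer multiplied across a card.
  ports = 36                    # 400G ports on a line card (assumed)
  logicals = 10                 # logical interfaces per port (assumed)
  queues = 8                    # queues per logical interface
  per_queue_bytes = 500 * 1500  # the 500-packet figure quoted above
  print(ports * logicals * queues * per_queue_bytes / 1e9)  # ~2.16 GB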

Also, the shallow ingress buffers discussed in this thread are not
delay buffers, and the problem is complex, because no marketable
device can accept wire rate at minimum packet size. So what
trade-offs do we carry when we get bad traffic at wire rate and small
packet size? We can't empty the ingress buffers fast enough. Do we
have physical memory for each port? Do we share it, and if so, how?
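For a sense of the scale involved (only standard Ethernet framing
overheads assumed; the rest is arithmetic):

  # Per-packet time budget at 400G with minimum-size (64B) frames.
  # 64B frame + 8B preamble + 12B inter-frame gap = 84B on the wire.
  line_rate_bps = 400e9
  wire_bits = (64 + 8 + 12) * 8
  pps = line_rate_bps / wire_bits
  print(pps / 1e6, 1e9 / pps)   # ~595 Mpps, ~1.68 ns per packet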

-- 
  ++ytti
