10GE TOR port buffers (was Re: 10G switch recommendation)

Leo Bicknell bicknell at ufp.org
Fri Jan 27 15:52:27 CST 2012


In a message written on Fri, Jan 27, 2012 at 10:40:03PM +0100, bas wrote:
> But do you generally agree that "the market" has a requirement for a
> deep-buffer TOR switch?
> 
> Or am I crazy for thinking that my customers need such a solution?

You're crazy. :)

You need to google "bufferbloat".  While that effort has been aimed
more at (SOHO) routers that have absurd (multi-second) buffers, the
concepts at play work here as well.

Let's say you have a VOIP application with 250ms of jitter tolerance,
and you're going 80ms across country.  You then add in a switch on
one end that has 300ms of buffer.

Oops, you go way over, but only from time to time: when the switch
buffer is full, a few packets see 300+80 = 380ms of latency.
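
To put numbers on that example, here is a minimal Python sketch.  The
function name is made up for illustration, and it assumes the only
variable delay on the path is the one 300ms switch buffer:

```python
def latency_range_ms(path_ms, buffer_ms):
    """Return (best, worst) one-way latency for a path with one buffer.

    Assumption: the buffer is either empty (best case) or completely
    full (worst case); nothing else on the path adds variable delay.
    """
    return path_ms, path_ms + buffer_ms


best, worst = latency_range_ms(path_ms=80, buffer_ms=300)
swing = worst - best          # how far latency can move around
tolerance_ms = 250            # the VOIP jitter tolerance from above

print(best, worst, swing, swing > tolerance_ms)  # 80 380 300 True
```

The swing alone (300ms) is already past the 250ms tolerance, before you
even count the 80ms of path delay.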

Dropped packets are a _GOOD_ thing.  If your ethernet switch can't
get the packet out another port in ~1-2ms it should drop it.  The
output port is congested, and congestion is what tells the sender to
back off.  If you buffer the packets instead you get congestion
collapse, which is far worse for throughput in the end, and in
particular has severely detrimental effects on the others on the LAN,
not just the box filling the buffers.

A network dropping packets is healthy: the packet loss tells the
upstream boxes to throttle to the appropriate speed, which is how
TCP operates.  I can't tell you how many times I've had network
engineers tell me "no matter how big I make the buffers performance
gets worse and worse".  Well duh, you're just introducing more and
more latency in your network, and making TCP backoff fail, rather
than work properly.  I go in and slash their 50-100 packet buffers
down to 5 and magically the network performs great, even when full.

Now, how much buffer do you need?  One packet is the minimum.  If
you can't buffer one packet it becomes hard to reach 100% utilization
on a link.  Anyone who's tried with a pure cut-through switch can
tell you it tops out around 90% (with multiple senders to a single
egress).  Amazingly, one packet of buffer almost entirely fixes the
problem.

When I can manually set the buffers, I generally go for 1ms of buffers
on high speed (e.g. 10GE) links, and might increase that to as much as
15 ms of buffers on extremely low speed links, like sub-T1.
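
If your gear wants the buffer in bytes or packets rather than
milliseconds, the conversion is just link rate times time.  A rough
Python sketch follows; the function names, the 1500-byte packet size,
and the use of a full T1 (1.544 Mbps) as the "low speed" example are
my assumptions for illustration, not anything a particular vendor
exposes:

```python
def buffer_bytes(link_bps, buffer_ms):
    """Bytes the link transmits in buffer_ms, i.e. the buffer depth
    that takes buffer_ms to drain at line rate."""
    return link_bps * buffer_ms / 8000.0


def drain_ms(link_bps, packets, packet_bytes=1500):
    """Milliseconds needed to drain a full buffer of `packets` packets.
    packet_bytes=1500 is an assumed typical Ethernet payload size."""
    return packets * packet_bytes * 8 * 1000.0 / link_bps


TEN_GE_BPS = 10e9      # 10 Gigabit Ethernet
T1_BPS = 1.544e6       # T1 line rate, used as the low-speed example

print(buffer_bytes(TEN_GE_BPS, 1))        # 1250000.0 bytes for 1ms at 10GE
print(buffer_bytes(T1_BPS, 15))           # 2895.0 bytes for 15ms at T1
print(round(drain_ms(T1_BPS, 100), 1))    # 777.2 ms for 100 packets at T1
print(round(drain_ms(T1_BPS, 5), 1))      # 38.9 ms for 5 packets at T1
```

So 1ms at 10GE is about 1.25MB (roughly 830 full-size packets), while
15ms at T1 is about two full-size packets.  The same 100-packet buffer
that is harmless at 10GE is over three quarters of a second on a T1.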

Remember, your RTT will vary (jitter) +- the sum of all buffers on all
hops along the path.  A 10 hop path with 15ms per hop could see 150ms of
jitter if all links go between full and not full!
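
The arithmetic, as a small Python sketch.  The function name is
hypothetical, and it assumes every hop independently swings between an
empty and a completely full buffer, which is the worst case:

```python
def worst_case_jitter_ms(per_hop_buffer_ms):
    """Upper bound on added delay variation along a path: the sum of
    every hop's buffer depth, expressed in milliseconds."""
    return sum(per_hop_buffer_ms)


ten_hops = [15.0] * 10                  # 10 hops, 15ms of buffer each
print(worst_case_jitter_ms(ten_hops))   # 150.0

one_ms_hops = [1.0] * 10                # same path with 1ms buffers
print(worst_case_jitter_ms(one_ms_hops))  # 10.0
```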

Buffers in most network gear are bad; don't do it.

-- 
       Leo Bicknell - bicknell at ufp.org - CCIE 3440
        PGP keys at http://www.ufp.org/~bicknell/