TCP time_wait and port exhaustion for servers

Wed Dec 5 17:09:31 UTC 2012

This would be outgoing connections sourced from the IP of the proxy,
destined to whatever remote website (so 80 or 443) requested by the
user.

Essentially it's a modified Squid service that is used to filter HTTP
for CIPA compliance (required by the government) to keep children in
public schools from stumbling onto inappropriate content.

Like most web traffic, the majority of these connections open and
close in under a second.  Once the users behind the proxy generate
more than 500 new outgoing connections per second, sustained, they
start seeing an error because there are no local ports available for
Squid to use: they're all tied up in the TIME_WAIT state.

Here is an example of netstat totals on a box we're seeing the behavior on:

   10 LAST_ACK
   32 LISTEN
    5 SYN_RECV
    5 CLOSE_WAIT
  756 ESTABLISHED
   26 FIN_WAIT1
   40 FIN_WAIT2
    5 CLOSING
   10 SYN_SENT
481947 TIME_WAIT
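
For anyone who wants to produce the same kind of totals, here is a
rough Python sketch that tallies the state column of "netstat -ant"
style output.  The function name and the sample rows are made up for
illustration, and it assumes the usual Linux column layout (proto,
recv-q, send-q, local address, foreign address, state).

```python
from collections import Counter

# Hypothetical sample rows in Linux "netstat -ant" layout (illustration only).
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:3128            0.0.0.0:*               LISTEN
tcp        0      0 192.0.2.10:40001        198.51.100.7:80         TIME_WAIT
tcp        0      0 192.0.2.10:40002        198.51.100.8:443        TIME_WAIT
tcp        0      0 192.0.2.10:40003        198.51.100.9:80         ESTABLISHED
"""

def count_tcp_states(netstat_output):
    """Return a Counter mapping TCP state name to number of sockets."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Only rows whose first column is tcp/tcp6 carry a state in column 6;
        # header lines and non-TCP rows are skipped.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            counts[fields[5]] += 1
    return counts

if __name__ == "__main__":
    # Print smallest totals first, similar to the listing above.
    for state, n in sorted(count_tcp_states(SAMPLE).items(), key=lambda kv: kv[1]):
        print(f"{n:6d} {state}")
```

On a live box you would feed it the real command output instead of
SAMPLE, for example via subprocess.run(["netstat", "-ant"],
capture_output=True, text=True).stdout.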

As a band-aid we've opened up the local port range to allow up to 50K
local ports with /proc/sys/net/ipv4/ip_local_port_range, but we're
brushing up against that limit again at peak times.
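
The back-of-the-envelope math behind that limit, as a rough Python
sketch.  It assumes the worst case described here, where every
outgoing connection ties up one local port for the full 60 second
TIME_WAIT period; the function name is made up for illustration.

```python
TIME_WAIT_SECONDS = 60  # TIME_WAIT duration on Linux, per the thread

def sustainable_rate(port_count, time_wait=TIME_WAIT_SECONDS):
    """New outgoing connections per second before local ports run out.

    Assumes each connection holds one local port for the whole
    TIME_WAIT period and that ports are never shared.
    """
    return port_count / time_wait

if __name__ == "__main__":
    for ports in (30000, 50000):
        print(f"{ports} ports -> {sustainable_rate(ports):.0f} conn/s")
    # 30000 ports -> 500 conn/s
    # 50000 ports -> 833 conn/s
```

So going to 50K ports only moves the ceiling from roughly 500 to
roughly 830 new connections per second.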

It's a shame because memory and CPU-wise the box isn't breaking a sweat.

Enabling TW_REUSE (/proc/sys/net/ipv4/tcp_tw_reuse) doesn't seem to
have any effect in this case.  Using TW_RECYCLE drops the TIME_WAIT
count to about 10K instead of 50K, but everything I read online says
to avoid TW_RECYCLE because it will break things horribly.

Someone responded off-list saying that TIME_WAIT is controlled by
/proc/sys/net/ipv4/tcp_fin_timeout, but that is just incorrect
information that has been parroted by a lot of blogs.  There is no
relation between fin_timeout and TCP_TIMEWAIT_LEN.
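
To see what a given box is actually set to, something like the
following Python sketch can dump the knobs mentioned in this thread.
The helper name is made up; the paths are the /proc/sys/net/ipv4
entries named above plus tcp_tw_recycle, and a missing entry (for
example on a kernel that does not provide tcp_tw_recycle) is reported
as None rather than treated as an error.  Note that TCP_TIMEWAIT_LEN
will not show up here, since it is a kernel constant and not a sysctl.

```python
from pathlib import Path

# sysctl entries discussed in this thread
SYSCTLS = (
    "ip_local_port_range",
    "tcp_tw_reuse",
    "tcp_tw_recycle",
    "tcp_fin_timeout",
)

def read_tcp_sysctls(base="/proc/sys/net/ipv4"):
    """Return {name: list of string fields, or None if unreadable}."""
    values = {}
    for name in SYSCTLS:
        try:
            values[name] = Path(base, name).read_text().split()
        except OSError:
            # Entry missing or not readable on this system.
            values[name] = None
    return values

if __name__ == "__main__":
    for name, value in read_tcp_sysctls().items():
        print(f"{name}: {value}")
```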

This level of use seems to translate into about 250 Mbps of traffic on
average, FWIW.

On Wed, Dec 5, 2012 at 11:56 AM, JÁKÓ András <jako.andras at eik.bme.hu> wrote:
>  Ray,
>
>> With a 60 second timeout on TIME_WAIT, local port identifiers are tied
>> up from being used for new outgoing connections (in this case a proxy
>> server).  The default local port range on Linux can easily be
>> adjusted; but even when bumped up to a range of 32K ports, the 60
>> second timeout means you can only sustain about 500 new connections
>> per second before you run out of ports.
>
> Is that 500 new connections per second per {protocol, remote address,
> remote port} tuple, that's too few for your proxy? (OK, this tuple is more
> or less equivalent with only {remote address} if we talk about a web
> proxy.) Just curious.
>
> Regards,
> András

-- 
Ray Patrick Soucy
Network Engineer
University of Maine System

T: 207-561-3526
F: 207-561-3531

MaineREN, Maine's Research and Education Network
www.maineren.net