Westnet and Utah outage

Jim Forster forster at cisco.com
Fri Dec 1 00:01:35 UTC 1995


I made a private reply to Curtis on his posting earlier this week, and he
gave a nice analysis and cc'd end2end-interest rather than nanog.  For
those that don't care to read all this, here's the summary:

> Which would you prefer?  140 msec and 0% loss or 70 msec and 5% loss?

So we get to choose between large delay and large lossage.  Doesn't sound
wonderful...

I thought you folks in nanog might be interested, so with Curtis'
permission, here's the full exchange (the original posting by Curtis is at
the very end).

  -- Jim

Here's what I wrote:

> In message <199511272220.OAA01151 at stilton.cisco.com>, Jim Forster writes:
> > Curtis,
> > 
> > I think these days for lots of folks the interesting question is not what
> > happens when a single or a few high-rate TCPs get in equilibrium, but rather
> > what happens when a DS-3 or higher is filled with 56k or slower flows, each
> > of which only lasts for an average of 20 packets or so.  Unfortunately,
> > these 20 packet TCP flows are what's driving the stats these days, due I
> > guess to the silly WWW (TCP per file; file per graphic; many graphics per
> > page) that's been so successful.

And Curtis's reply:

> The analysis below also applies to just under 800 TCP flows each
> getting 1/800th of a DS3 link or about 56Kb/s.  The loss rate on the
> link should be about one packet in 11 if the delay can be increased to
> 250 msec.  If the delay is held at 70 msec, lots of timeouts and
> terrible fairness and poor overall performance will result.
> 
> Do we need an ISP to prove this to you by exhibiting terrible
> performance?  If so, please speak to Jon Crowcroft.  His case is 400
> flows on 4 Mb/s, which is far worse, since delay would have to be
> increased to over 3 seconds or segment size reduced below 552.  :-(
> 
> > I could try to derive the results but I'm sure you or others would do
> > better :-).  How many of the packets in the 20 packet flow are at
> > equilibrium?  What's the drop rate?  Hmmm, very simple-minded analysis says
> > that it will be large: exponential growth (doubling cwnd every ack) should
> > get above best case pretty quickly, certainly within the 20 packet flow.
> > Assume it's only above optimum once, then the packet loss rate is 1 in 20.
> > Sounds grim.  Vegas TCP sounds better for these reasons, since it tracks
> > actual bw, but I'm not really qualified to judge.
> > 
> >   -- Jim
> 
> 
> Jim,
> 
> The end2end-interest thread was quite long and I didn't want to repeat
> the whole thing.  The initial topic was very tiny TCP flows of 3 to 4
> packets.  That is a really bad problem, but should no longer be a
> realistic problem once HTTP is modified to allow it to pick up both
> the HTML page and all inline images in one TCP connection.
> 
> Your example is quite reasonable.  At 20 packets per flow, with no
> loss you get 1, 2, 4, 8, 5 packets per RTT, or a complete transfer in
> about 5 RTT.  On average each TCP flow will get 20 packets / 5 RTT of
> bandwidth, or 4 packets/RTT, until congestion (at 552 bytes per 70
> msec, one packet/RTT is about 64 Kb/s).  If the link is temporarily
> overloaded by a factor of 2, this must be reduced to 2 packets/RTT.
> If we drop 1 packet in 20, roughly 35% of the flows go completely
> untouched (0.95^20).  Some 15% will drop one of the first 3 packets,
> time out, and slow start, resulting in less than 20 packets / 3
> seconds (3 seconds >> 5*RTT).  Some 50% will drop one of the 4th
> through 20th packets, resulting in fast retransmit, no timeout, and
> linear growth in window.  If the 4th is dropped, the window is cut to
> 2, so over the following RTTs you get 2, 3, 4, 5, 3 packets, or 8
> RTTs total (2 initial, 1 for the drop, 5 more).  This is probably not
> quite enough to slow things down.
> 
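A quick sanity check of that arithmetic, replayed in a few lines of
Python; the 1-in-20 drop probability and the 20-packet flows are just
the assumptions from Curtis's example, not measurements:

    # Replay the 20-packet-flow arithmetic: independent drops at
    # p = 1/20 and the usual slow-start doubling of cwnd each RTT.
    p = 1.0 / 20            # per-packet drop probability
    flow = 20               # packets per flow

    schedule, sent, cwnd = [], 0, 1
    while sent < flow:
        burst = min(cwnd, flow - sent)
        schedule.append(burst)
        sent += burst
        cwnd *= 2
    print(schedule, len(schedule))           # [1, 2, 4, 8, 5] in 5 RTTs

    untouched = (1 - p) ** flow              # no drop at all
    early = 1 - (1 - p) ** 3                 # drop in first 3 -> timeout
    late = 1 - untouched - early             # drop in 4th..20th -> fast rexmit
    print("%.0f%% untouched, %.0f%% timeout, %.0f%% fast retransmit"
          % (100 * untouched, 100 * early, 100 * late))
    # -> roughly 36% untouched, 14% timeout, 50% fast retransmit
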
> On a DS3 with 70 msec RTT and 1500 simultaneous flows of 20 packets
> each (steady state such that the number of active flows remains about
> 1500, roughly twice what a DS3 could support) you would need a drop
> rate on the order of 5% or more.  Alternatively, you could queue
> things up, doubling the delay to 140 msec, give every flow the same
> slower rate (perfect fairness in your example), and have a zero drop
> rate.
> 
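The delay-versus-drops tradeoff in that paragraph falls straight out of
rate = cwnd * MSS / RTT; here is a small illustration, reusing the
552-byte segments and 70 msec RTT from the example above:

    # Per-flow throughput is roughly cwnd * segment_size / RTT.
    def rate(cwnd, rtt, seg_bytes=552):
        return cwnd * seg_bytes * 8 / rtt    # bits per second

    print(rate(4, 0.070) / 1e3)   # ~252 Kb/s: 4 packets/RTT at 70 msec
    print(rate(4, 0.140) / 1e3)   # ~126 Kb/s: same window, queueing doubles RTT
    print(rate(2, 0.070) / 1e3)   # ~126 Kb/s: same RTT, a drop halves cwnd
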
> Which would you prefer?  140 msec and 0% loss or 70 msec and 5% loss?
> Delay is good.  We want delay for elastic traffic!  But not for real
> time - use RSVP, admission control, police at the ingress and stick it
> on the front of the queue.
> 
> In practice, I'd expect overload to be due to lots of flows, but not
> so many little guys that they alone overload the link (if they do,
> get a bigger pipe; we can say that and put it into practice).  The
> overload will be due to a high baseline of little guys (20-packet
> flows, or a range of fairly small ones), plus some percentage of
> longer-duration flows capable of sucking up the better part of a T1,
> given half a chance.  It is the latter that you want to slow down,
> and these are the ones that you *can* slow down with a fairly low
> drop rate.
> 
> I leave it as an exercise to the reader to determine how RED fits into
> this picture (either one, my overload scenario or Jim's where all the
> flows are 20 packets in duration).
> 
> The 400 flows on 4 Mb/s is an interesting (and difficult) case.  I've
> suggested both allowing delay to get very large (i.e., as high as 2
> seconds) and hacking the host implementation to reduce segment size
> to as low as 128 bytes when RTT gets huge or cwnd drops below 4
> segments, holding the window to no less than 512 bytes (4 segments)
> in hopes that fast
> retransmit will almost always work even in 15-20% loss situations.
> 
> Curtis
> 


Curtis's original posting:


> In order to get X bandwidth on a given TCP flow you need to have an
> average window size of X * RTT.  This is expressed in terms of TCP
> segments N = (X * RTT) / MSS (or more correctly the segment size in
> use rather than MSS).  To sustain an average window of N segments, you
> must ideally reach a steady state where you cut cwnd (current window)
> in half, then grow linearly, fluctuating between 2/3 and 4/3 of the
> target size.  This would mean one drop in 2/3 N windows or DropRate in
> terms of time is 2/3 N * RTT.  In one RTT on average X * RTT amount of
> data flows.  In practice, you rarely drop at the perfect time, so the
> constant 2/3 (call it K) can be raised to 1-2.  Since N = (X * RTT) /
> MSS, DropRate = K * X * RTT * X * RTT / MSS.  Units are b/s * sec *
> b/s * sec / b, or b.  The DropRate expressed in bits can be converted
> to seconds or packets (divide by X or by MSS).  This type of analysis
> is courtesy of the good folks at PSC (Matt, Jamshid, et al).
> 
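Curtis's formula transcribes directly into a few lines of Python; the
little helper below (the drop_spacing name is mine) is just that
transcription, with K, X, RTT, and MSS as he defines them:

    # A flow at rate X needs an average window of N = (X * RTT) / MSS
    # segments, and sees roughly one drop per K * N windows, i.e.
    # K * (X * RTT)^2 / MSS bits between drops.
    def drop_spacing(X, rtt, mss_bytes, K=1.0):
        mss = mss_bytes * 8                  # segment size in bits
        bits = K * (X * rtt) ** 2 / mss      # bits carried between drops
        return bits / X, bits / mss          # (seconds, packets) per drop

    # The "one packet in 11" figure quoted earlier: 56 Kb/s, 250 msec,
    # 512-byte segments.
    print(drop_spacing(56e3, 0.250, 512))    # -> (~0.85 s, ~11.7 packets)
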
> For example, to get 40 Mb/s at 70 msec RTT and 4096 MSS, you get one
> error about every 6 seconds (K=1) or 1 in 7,300 packets.  If you look
> at 56 Kb/s and 512 MSS you get a very interesting result.  You need
> one error every 66 msec or 1 error in 0.9 packets.  This gives a good
> incentive to increase delay.  At 250 msec, you get a result of one
> error in 11.7 packets (much better!).
> 
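Plugging those three cases back into N = (X * RTT) / MSS reproduces the
figures: seconds between drops is N * RTT and packets between drops is
N^2, again with K = 1:

    # (rate in b/s, RTT in seconds, MSS in bytes) for the three examples.
    for X, rtt, mss_bytes in [(40e6, 0.070, 4096),
                              (56e3, 0.070, 512),
                              (56e3, 0.250, 512)]:
        N = X * rtt / (mss_bytes * 8)        # average window, in segments
        print("one drop per %.2f s, or 1 packet in %.1f" % (N * rtt, N * N))
    # -> ~5.98 s / 1 in ~7300, ~0.07 s / 1 in ~0.9, ~0.85 s / 1 in ~11.7
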
> Another interesting point to note is that you need 3 duplicate ACKs
> for TCP fast retransmit to work, so your window must be at least 4
> segments (and should be more).  If you have a very large number of TCP
> flows, where on average people get less than 1200 baud or so, the
> delay you need to make TCP work well starts to exceed the magic 3
> second boundary.  This was discussed ad nauseam on end2end-interest.
> An important result is that you need more queueing than the delay
> bandwidth product for severely congested links.  Another is that there
> is a limit to the number of active TCP flows that can be supported
> per unit of bandwidth.  One suggestion to address the latter problem
> is to further
> drop segment size if cwnd is less than 4 segments in size and/or when
> estimated RTT gets into the seconds range.
> 
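The three-duplicate-ACK requirement turns into a floor on delay for a
given per-flow rate: RTT has to be at least 4 * MSS / rate to keep a
4-segment window in flight.  Here is a rough sketch of that floor (the
min_rtt name, the sample rates, and the segment sizes are just
illustrative assumptions):

    # Minimum RTT needed to keep a window of ~4 segments in flight
    # at a given per-flow rate, so fast retransmit has a chance.
    def min_rtt(rate_bps, mss_bytes, window_segments=4):
        return window_segments * mss_bytes * 8 / rate_bps

    for r in (56e3, 10e3, 1.2e3):            # per-flow shares, in b/s
        print("%6.0f b/s: %5.1f s at 552-byte MSS, %4.1f s at 128 bytes"
              % (r, min_rtt(r, 552), min_rtt(r, 128)))
    # 56 Kb/s needs only ~0.3 s; around 1200 b/s even 128-byte segments
    # push the needed delay past the 3-second mark.
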
> This sort of analysis of how much loss TCP can tolerate would not be
> out of place in an informational RFC, but so far none exists.
> 
> Curtis



