Extreme congestion (was Re: inter-domain link recovery)

Fred Baker fred at cisco.com
Thu Aug 16 17:15:28 UTC 2007

On Aug 16, 2007, at 7:46 AM, <michael.dillon at bt.com> wrote:
>> In many cases, yes. I know of a certain network that ran with 30%  
>> loss for a matter of years because the option didn't exist to  
>> increase the bandwidth. When it became reality, guess what they did.
> How many people have noticed that when you replace a circuit with a  
> higher capacity one, the traffic on the new circuit is suddenly  
> greater than 100% of the old one. Obviously this doesn't happen all  
> the time, such as when you have a 40% threshold for initiating a  
> circuit upgrade, but if you do your upgrades when they are 80% or  
> 90% full, this does happen.

Well, so let's do a thought experiment.

First, that Infocom paper I mentioned says that they measured the
variation in delay PoP-to-PoP at microsecond granularity with
hyper-synchronized clocks, and found that with 90% confidence the variation
in delay in their particular optical network was less than 1 ms. Also  
with 90% confidence, they noted "frequent" (frequency not specified,  
but apparently pretty frequent, enough that one of the authors later  
worried in my presence about offering VoIP services on it) variations  
on the order of 10 ms. For completeness, I'll note that they had six  
cases in a five hour sample where the delay changed by 100 ms and  
stayed there for a period of time, but we'll leave that observation  
for now.

Such spikes are not difficult to explain. If you think of TCP as an  
on-off function, a wave function with some similarities to a sine
wave, you might ask yourself what the sum of a bunch of sine waves
with slightly different periods is. It is also a wave function, and  
occasionally has a very tall peak. The study says that TCP  
synchronization happens in the backbone. Surprise.
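
To make the sum-of-waves picture concrete, here is a minimal sketch in
Python. It is a toy, not a traffic model: each flow is a raised sine
between 0 and 1, the 20 periods between 1.00 and 1.19 time units are
made-up numbers, and all flows are assumed to start in phase (as they
might just after a synchronizing loss event).

```python
import math

def summed_load(t, periods):
    """Total load at time t: one raised sine in [0, 1] per flow."""
    return sum(0.5 * (1.0 + math.sin(2.0 * math.pi * t / p)) for p in periods)

# Hypothetical flows: 20 periods spread between 1.00 and 1.19 time units.
periods = [1.0 + 0.01 * k for k in range(20)]

# Sample 200 time units at a step of 0.01.
step = 0.01
samples = [summed_load(i * step, periods) for i in range(20000)]

peak = max(samples)
mean = sum(samples) / len(samples)
# Fraction of samples where the sum exceeds 75% of the theoretical maximum.
tall = sum(1 for s in samples if s > 0.75 * len(periods)) / len(samples)

print(f"mean {mean:.2f}, peak {peak:.2f}, maximum possible {len(periods)}")
print(f"fraction of time above 75% of maximum: {tall:.3f}")
```

The mean sits near half the maximum, while the peak comes close to the
full maximum whenever the waves line up. That is the spike a link
provisioned for the average never sees coming.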

Now, let's say you're running your favorite link at 90% and get such  
a spike. What happens? The tip of it gets clipped off - a few packets  
get dropped. Those TCPs slow down momentarily. The more that happens,  
the more frequently TCPs get clipped and back off.

Now you upgrade the circuit and the TCPs stop getting clipped. What
happens?

The TCPs don't slow down. They use the bandwidth you have made  
available instead.

In your words, "the traffic on the new circuit is suddenly greater
than 100% of the old one".
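
A rough way to see this is a toy additive-increase,
multiplicative-decrease loop. Everything in it is an assumption made
for illustration: 50 flows, a per-flow ceiling of 4 units (standing in
for whatever else limits a flow, such as an access link), an old
circuit of 100 units, and the crude rule that every flow halves
whenever the sum exceeds capacity. Real TCPs do not all back off at
once.

```python
def simulate(capacity, flows=50, max_rate=4.0, steps=2000):
    """Return the average carried load on a link of the given capacity."""
    rates = [1.0] * flows
    carried_total = 0.0
    for _ in range(steps):
        offered = sum(rates)
        # The link cannot carry more than its capacity; the excess is clipped.
        carried_total += min(offered, capacity)
        if offered > capacity:
            # Clipped: in this toy, every flow loses a packet and halves.
            rates = [r / 2.0 for r in rates]
        else:
            # Not clipped: every flow speeds up, to its own ceiling at most.
            rates = [min(r + 0.1, max_rate) for r in rates]
    return carried_total / steps

old_capacity = 100.0
old_avg = simulate(old_capacity)
new_avg = simulate(2 * old_capacity)

print(f"old circuit: average carried load {old_avg:.1f} of {old_capacity:.0f}")
print(f"new circuit: average carried load {new_avg:.1f} of {2 * old_capacity:.0f}")
print(f"new load as a fraction of the old capacity: {new_avg / old_capacity:.2f}")
```

On the old circuit the flows keep getting knocked back, so the carried
load stays below capacity. On the doubled circuit nothing clips them,
they climb to their ceilings, and the carried load ends up well above
100% of the old circuit.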

In 1995 at the NGN conference, I found myself on a stage with Phill  
Gross, then a VP at MCI. He was basically reporting on this  
phenomenon and apologizing to his audience. MCI had put in an OC-3  
network - gee-whiz stuff then - and had some of the links run too  
close to full before starting to upgrade. By the time they had two  
OC-3s in parallel on every path, there were some paths with a
standing 20% loss rate. Phill figured that doubling the bandwidth  
again (622 everywhere) on every path throughout the network should  
solve the problem for that remaining 20% of load, and started with  
the hottest links. To his surprise, with the standing load > 95% and  
experiencing 20% loss at 311 Mbps, doubling the rate to 622 Mbps
resulted in links with a standing load > 90% and 4% loss. He still  
needed more bandwidth. After we walked offstage, I explained TCP to
him.

Yup. That's what happens.
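
The arithmetic behind those MCI numbers can be worked through quickly.
This sketch makes some simplifying assumptions that are mine, not
Phill's: the "> 95%" and "> 90%" figures are treated as exact, the
standing load is taken to be delivered traffic, and the loss rate is
taken to be the fraction of offered traffic that was dropped.

```python
def carried_and_offered(link_mbps, utilization, loss_rate):
    """Return (carried, offered) load in Mbps under the stated assumptions."""
    carried = link_mbps * utilization
    # If loss_rate of the offered traffic is dropped, carried is the remainder.
    offered = carried / (1.0 - loss_rate)
    return carried, offered

before = carried_and_offered(311.0, 0.95, 0.20)
after = carried_and_offered(622.0, 0.90, 0.04)

print(f"311 Mbps: carried {before[0]:.1f}, offered {before[1]:.1f}")
print(f"622 Mbps: carried {after[0]:.1f}, offered {after[1]:.1f}")
print(f"growth in carried load: {after[0] / before[0]:.2f}x")
```

Under those assumptions the carried load goes from roughly 295 Mbps to
roughly 560 Mbps, which is far more than the 20% the loss rate seemed
to suggest was missing.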

Several folks have commented on p2p as a major issue here.  
Personally, I don't think of p2p as the problem in this context, but  
it is an application that exacerbates the problem. Bottom line, the  
common p2p applications like to keep lots of TCP sessions flowing,  
and have lots of data to move. Also (and to my small mind this is  
egregious), they make no use of locality - if the content they are  
looking for is both next door and half-way around the world, they're  
perfectly happy to move it around the world. Hence, moving a file
into a campus doesn't mean that the campus has the file and will stop  
bothering you. I'm pushing an agenda in the open source world to add  
some concept of locality, with the purpose of moving traffic off ISP  
networks when I can. I think the user will be just as happy or  
happier, and folks pushing large optics will certainly be.
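
As one illustration of what "some concept of locality" could look
like, here is a sketch of peer ranking. It is not taken from any
existing p2p client. The peer records, the rtt_ms field, the
rank_peers helper and the idea of a configured list of local networks
are all hypothetical, and the addresses are documentation prefixes.

```python
import ipaddress

def rank_peers(peers, local_networks):
    """Order peers so those inside a local network come first, then by RTT."""
    nets = [ipaddress.ip_network(n) for n in local_networks]

    def sort_key(peer):
        addr = ipaddress.ip_address(peer["addr"])
        is_local = any(addr in net for net in nets)
        # Local peers sort ahead of remote ones; ties are broken by RTT.
        return (0 if is_local else 1, peer["rtt_ms"])

    return sorted(peers, key=sort_key)

# Hypothetical candidates that all hold the same content.
peers = [
    {"addr": "203.0.113.7", "rtt_ms": 180.0},
    {"addr": "192.0.2.44", "rtt_ms": 2.0},
    {"addr": "198.51.100.9", "rtt_ms": 35.0},
]
ranked = rank_peers(peers, local_networks=["192.0.2.0/24"])

for peer in ranked:
    print(peer["addr"], peer["rtt_ms"])
```

With a rule like this, the copy next door is tried before the copy
half-way around the world, and the transfer stays off the ISP's
long-haul links whenever a nearby peer has the content.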
