Extreme congestion (was Re: inter-domain link recovery)
Fred Baker
fred at cisco.com
Thu Aug 16 17:15:28 UTC 2007
On Aug 16, 2007, at 7:46 AM, <michael.dillon at bt.com> wrote:
>> In many cases, yes. I know of a certain network that ran with 30%
>> loss for a matter of years because the option didn't exist to
>> increase the bandwidth. When it became reality, guess what they did.
>
> How many people have noticed that when you replace a circuit with a
> higher capacity one, the traffic on the new circuit is suddenly
> greater than 100% of the old one. Obviously this doesn't happen all
> the time, such as when you have a 40% threshold for initiating a
> circuit upgrade, but if you do your upgrades when they are 80% or
> 90% full, this does happen.
Well, let's do a thought experiment.
First, that Infocom paper I mentioned says that they measured the
variation in delay PoP-to-PoP at microsecond granularity with
hyper-synchronized clocks, and found that with 90% confidence the
variation in delay in their particular optical network was less than
1 ms. Also with 90% confidence, they noted "frequent" (frequency not
specified, but apparently pretty frequent, enough that one of the
authors later worried in my presence about offering VoIP services on
it) variations on the order of 10 ms. For completeness, I'll note
that they had six cases in a five-hour sample where the delay changed
by 100 ms and stayed there for a period of time, but we'll leave that
observation for now.
Such spikes are not difficult to explain. If you think of TCP as an
on-off function, a wave function with some similarities to a sine
wave, you might ask yourself what the sum of a bunch of sine waves
with slightly different periods is. It is also a wave function, and
occasionally has a very tall peak. The study says that TCP
synchronization happens in the backbone. Surprise.
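If you want to see the effect, here is a toy illustration (my own
sketch, not the paper's model): sum a few dozen unit-amplitude sine
waves whose periods differ slightly, and compare the typical
magnitude of the sum to its tallest peak.

```python
import math

# Toy illustration: N "flows" as unit sine waves with slightly
# different periods. Most of the time they partially cancel; now and
# then they line up into a spike far above the typical level.
N = 50
periods = [100 + 0.5 * i for i in range(N)]   # slightly different periods

def aggregate(t):
    """Instantaneous sum of all N waves at time t."""
    return sum(math.sin(2 * math.pi * t / p) for p in periods)

samples = [aggregate(t) for t in range(20000)]
peak = max(samples)
typical = sum(abs(s) for s in samples) / len(samples)
print(f"typical |sum| ~ {typical:.1f}, tallest peak = {peak:.1f} of {N} possible")
```

The typical magnitude hovers near the square root of N, while the
occasional peak approaches N itself - that peak is the spike that
gets clipped on a nearly-full link.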
Now, let's say you're running your favorite link at 90% and get such
a spike. What happens? The tip of it gets clipped off - a few packets
get dropped. Those TCPs slow down momentarily. The more that happens,
the more frequently TCPs get clipped and back off.
Now you upgrade the circuit and the TCPs stop getting clipped. What
happens?
The TCPs don't slow down. They use the bandwidth you have made
available instead.
In your words, "the traffic on the new circuit is suddenly greater
than 100% of the old one".
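A crude way to see this is a synchronous AIMD cartoon (my
simplification, not a real TCP implementation): flows additively
increase until the link clips them, then all halve. Run the same
population against the old and the upgraded capacity.

```python
# Idealized AIMD sketch (assumption: all flows see loss in the same
# tick, unlike real TCP). Each flow adds 1 unit of rate per tick;
# when summed demand exceeds capacity, every flow halves its rate.
def run(capacity, flows=20, ticks=2000):
    rates = [1.0] * flows
    carried = 0.0
    for _ in range(ticks):
        total = sum(rates)
        if total > capacity:
            rates = [r / 2 for r in rates]   # spike clipped: back off
        else:
            rates = [r + 1 for r in rates]   # no loss: additive increase
        carried += min(total, capacity)
    return carried / ticks

old = run(capacity=100)
new = run(capacity=200)
print(f"avg carried on old link: {old:.0f}, on upgraded link: {new:.0f}")
```

The same flows that averaged well under the old capacity immediately
expand to fill most of the new one - nothing about the upgrade made
them "slow down".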
In 1995 at the NGN conference, I found myself on a stage with Phill
Gross, then a VP at MCI. He was basically reporting on this
phenomenon and apologizing to his audience. MCI had put in an OC-3
network - gee-whiz stuff then - and had some of the links run too
close to full before starting to upgrade. By the time they had two
OC-3's in parallel on every path, there were some paths with a
standing 20% loss rate. Phill figured that doubling the bandwidth
again (622 everywhere) on every path throughout the network should
solve the problem for that remaining 20% of load, and started with
the hottest links. To his surprise, with the standing load > 95% and
experiencing 20% loss at 311 Mbps, doubling the rate to 622 Mbps
resulted in links with a standing load > 90% and 4% loss. He still
needed more bandwidth. After we walked offstage, I explained TCP to
him...
Yup. That's what happens.
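Back-of-envelope arithmetic on the numbers Phill quoted (my own
estimate, not from his talk) makes the point: if a link carries some
fraction of its capacity and the network drops some fraction of what
senders offer, the offered load is roughly carried / (1 - loss).

```python
# Rough estimate of offered load from link utilization and loss rate.
# Assumes drops are spread evenly over the offered traffic.
def offered(capacity_mbps, load, loss):
    carried = capacity_mbps * load
    return carried / (1 - loss)

before = offered(311, load=0.95, loss=0.20)   # roughly 369 Mbps offered
after = offered(622, load=0.90, loss=0.04)    # roughly 583 Mbps offered
print(f"offered before upgrade ~ {before:.0f} Mbps, after ~ {after:.0f} Mbps")
```

The TCPs weren't offering a fixed 369 Mbps that the upgrade would
satisfy; once the clipping eased, demand grew toward 583 Mbps and
beyond.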
Several folks have commented on p2p as a major issue here.
Personally, I don't think of p2p as the problem in this context, but
it is an application that exacerbates the problem. Bottom line, the
common p2p applications like to keep lots of TCP sessions flowing,
and have lots of data to move. Also (and to my small mind this is
egregious), they make no use of locality - if the content they are
looking for is both next door and half-way around the world, they're
perfectly happy to move it around the world. Hence, moving a file
into a campus doesn't mean that the campus has the file and will stop
bothering you. I'm pushing an agenda in the open source world to add
some concept of locality, with the purpose of moving traffic off ISP
networks when I can. I think the user will be just as happy or
happier, and folks pushing large optics will certainly be.
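The locality idea can be sketched in a few lines (a hypothetical
illustration of mine, not any real client's peer-selection
algorithm): among candidate peers that all have the content, prefer
the nearest by some cheap proxy for topological distance - here,
measured round-trip time.

```python
# Hypothetical locality-preferring peer selection. Each candidate is
# (peer_id, rtt_ms); RTT stands in for topological distance. A real
# client would also weigh throughput, choke state, etc.
def pick_peers(candidates, want=4):
    """Return the `want` nearest peers, sorted by ascending RTT."""
    return [peer for peer, _ in sorted(candidates, key=lambda c: c[1])[:want]]

peers = [("campus-neighbor", 1.2), ("same-isp", 8.0),
         ("cross-country", 70.0), ("overseas", 180.0)]
print(pick_peers(peers, want=2))
```

With even this naive preference, the copy next door gets used before
the one halfway around the world, and the traffic stays off the
long-haul links.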
More information about the NANOG
mailing list