Extreme congestion (was Re: inter-domain link recovery)
Fred Baker
fred at cisco.com
Wed Aug 15 16:59:00 UTC 2007
Let me answer at least twice.
As you say, remember the end-2-end principle. The end-2-end
principle, in my precis, says "in deciding where functionality should
be placed, do so in the simplest, cheapest, and most reliable manner
when considered in the context of the entire network. That is usually
close to the edge." Note the presence of advice and absence of mandate.
Parekh and Gallager in their 1993 papers on the topic proved using
control theory that if we can specify the amount of data that each
session keeps in the network (for some definition of "session") and
for each link the session crosses define exactly what the link will
do with it, we can mathematically predict the delay the session will
experience. TCP congestion control as presently defined tries to
manage delay by adjusting the window; some algorithms literally
measure delay, while most measure loss, which is the extreme case of
delay. The math tells me that the place to control the rate of a
session is in the end system. Funny thing: that is found "close to
the edge".
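For reference, the Parekh–Gallager result alluded to above can be stated roughly as follows (my paraphrase, for a session constrained by a leaky bucket and served by weighted fair queueing at every hop; symbols are mine, not from this thread):

```latex
% Session i is leaky-bucket constrained with burst \sigma_i and rate \rho_i,
% and is guaranteed a service rate g_i \ge \rho_i at each of its K hops,
% where hop k has link speed r_k. L_i is session i's maximum packet size
% and L_{\max} the largest packet size in the network. Then the end-to-end
% delay is bounded:
D_i \;\le\; \frac{\sigma_i}{g_i}
      \;+\; \sum_{k=1}^{K-1} \frac{L_i}{g_i}
      \;+\; \sum_{k=1}^{K} \frac{L_{\max}}{r_k}
```

That is: once you pin down how much data the session keeps in the network (the leaky bucket) and what each link does with it (the guaranteed rate), the delay is mathematically predictable, which is the point being made above.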
What ISPs routinely try to do is adjust routing in order to maximize
their ability to carry customer sessions without increasing their
outlay for bandwidth. It's called "load sharing", and we have a list
of ways we do that, notably in recent years using BGP advertisements.
Where Parekh and Gallager calculated what the delay was, the ISP has
the option of minimizing it through appropriate use of routing.
i.e., edge and middle both have valid options, and the totality works
best when they work together. That may be heresy, but it's true. When
I hear my company's marketing line on intelligence in the network
(which makes me cringe), I try to remind my marketing folks that the
best use of intelligence in the network is to offer intelligent
services to the intelligent edge that enable the intelligent edge to
do something intelligent. But there is a place for intelligence in
the network, and routing is its poster child.
In your summary of the problem, the assumption is that both of these
are operative and have done what they can - several links are down,
the remaining links (including any rerouting that may have occurred)
are full to the gills, TCP is backing off as far as it can back off,
and even so due to high loss little if anything productive is in fact
happening. You're looking for a third "thing that can be done" to
avoid congestive collapse, which is the case in which the network or
some part of it is fully utilized and yet accomplishing no useful work.
So I would suggest that a third thing that can be done, after the
other two avenues have been exhausted, is to decide to not start new
sessions unless there is some reasonable chance that they will be
able to accomplish their work. This is a burden I would not want to
put on the host, because the probability is vanishingly small - any
competent network operator is going to solve the problem with money
if it is other than transient. But from where I sit, it looks like
the "simplest, cheapest, and most reliable" place to detect
overwhelming congestion is at the congested link, and given that
sessions tend to be of finite duration and present semi-predictable
loads, if you want to allow established sessions to complete, you
want to run the established sessions in preference to new ones. The
thing to do is delay the initiation of new sessions.
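A toy sketch of that arrangement (mine, purely illustrative, not any shipping implementation): a strict-priority pair of queues at the congested link, where packets of established sessions are always served ahead of session-opening packets.

```python
from collections import deque

class SessionAwareScheduler:
    """Toy strict-priority scheduler: packets belonging to established
    sessions are served before packets that would start new sessions."""

    def __init__(self):
        self.established = deque()  # data, ACKs, etc.
        self.new_session = deque()  # TCP SYN, SCTP INIT

    @staticmethod
    def starts_new_session(pkt):
        # A bare SYN (no ACK set) or an SCTP INIT opens a new session.
        return (pkt.get("proto") == "tcp" and pkt.get("syn") and not pkt.get("ack")) \
            or (pkt.get("proto") == "sctp" and pkt.get("chunk") == "INIT")

    def enqueue(self, pkt):
        queue = self.new_session if self.starts_new_session(pkt) else self.established
        queue.append(pkt)

    def dequeue(self):
        # Established traffic always goes first; SYN/INIT only get the
        # leftover capacity, which delays the initiation of new sessions.
        if self.established:
            return self.established.popleft()
        if self.new_session:
            return self.new_session.popleft()
        return None
```

The effect is exactly the preference argued for above: established sessions run to completion, and new work is admitted only when there is room for it.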
If I had an ICMP that went to the application, and if I trusted the
application to obey me, I might very well say "dear browser or p2p
application, I know you want to open 4-7 TCP sessions at a time, but
for the coming 60 seconds could I convince you to open only one at a
time?". I suspect that would go a long way. But there is a trust
issue - would enterprise firewalls let it get to the host, would the
host be able to get it to the application, would the application
honor it, and would the ISP trust the enterprise/host/application to
do so? Is DDoS possible? <mumble>
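If such a signal existed and were trusted, the client side could be as simple as this sketch (entirely hypothetical; the signal, the class, and the 60-second hold are illustrative, not any defined protocol):

```python
import time

class PoliteClient:
    """Hypothetical sketch of a browser/p2p client that normally opens
    up to `parallel` TCP sessions concurrently, but falls back to
    one-at-a-time for `hold` seconds after an (imagined) congestion
    notice from the network."""

    def __init__(self, parallel=6):
        self.parallel = parallel       # normal fan-out, e.g. 4-7 sessions
        self.throttled_until = 0.0

    def congestion_notice(self, hold=60.0, now=None):
        # The imagined "dear browser, please open only one at a time
        # for the next 60 seconds" signal arrives here.
        now = time.monotonic() if now is None else now
        self.throttled_until = now + hold

    def max_concurrent(self, now=None):
        # One session at a time while throttled, else the normal fan-out.
        now = time.monotonic() if now is None else now
        return 1 if now < self.throttled_until else self.parallel
```

The hard part, as noted above, is not the code but the trust chain between ISP, firewall, host, and application.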
So plan B would be to in some way rate limit the passage of TCP SYN/
SYN-ACK and SCTP INIT in such a way that the hosed links remain fully
utilized but sessions that have become established get acceptable
service (maybe not great service, but they eventually complete
without failing).
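Plan B amounts to metering session-opening packets. A minimal sketch, assuming a classic token bucket at the congested link (rates and names are mine, for illustration only):

```python
class TokenBucket:
    """Token bucket used to rate-limit session-opening packets
    (TCP SYN, SCTP INIT) at a congested link."""

    def __init__(self, rate, burst):
        self.rate = float(rate)    # SYN/INITs admitted per second
        self.burst = float(burst)  # maximum burst size, in packets
        self.tokens = float(burst)
        self.last = 0.0            # timestamp of the previous check

    def allow(self, now):
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True   # forward the SYN/INIT
        return False      # drop or defer it; established traffic is untouched
```

Established sessions never consume a token, so the hosed links stay fully utilized while the admission rate of new sessions is held down.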
On Aug 15, 2007, at 8:59 AM, Sean Donelan wrote:
> On Wed, 15 Aug 2007, Fred Baker wrote:
>> On Aug 15, 2007, at 8:35 AM, Sean Donelan wrote:
>>> Or should IP backbones have methods to predictably control which
>>> IP applications receive the remaining IP bandwidth? Similar to
>>> the telephone network special information tone -- All Circuits
>>> are Busy. Maybe we've found a new use for ICMP Source Quench.
>>
>> Source Quench wouldn't be my favored solution here. What I might
>> suggest is taking TCP SYN and SCTP INIT (or new sessions if they
>> are encrypted or UDP) and put them into a lower priority/rate
>> queue. Delaying the start of new work would have a pretty strong
>> effect on the congestive collapse of the existing work, I should
>> think.
>
> I was joking about Source Quench (missing :-), it's got a lot of
> problems.
>
> But I think the fundamental issue is who is responsible for
> controlling the back-off process? The edge or the middle?
>
> Using different queues implies the middle (i.e. routers). At best
> it might be the "near-edge," and creating some type of shared
> knowledge between past, current and new sessions in the host stacks
> (and maybe middle-boxes like NAT gateways).
>
> How fast do you need to signal large-scale back-off over what time
> period? Since major events in the real-world also result in a lot
> of "new" traffic, how do you signal new sessions before they reach
> the affected region of the network? Can you use BGP to signal the
> far-reaches of the Internet that I'm having problems, and other
> ASNs should start slowing things down before they reach my region
> (security can-o-worms being opened).