Extreme congestion (was Re: inter-domain link recovery)
Rod.Beck at hiberniaatlantic.com
Wed Aug 15 19:40:27 UTC 2007
Is this a declaration of principles? There is no reason why 'Tier 1' means that the carrier will not have an incentive to shape or even block traffic, particularly if they have a lot of eyeballs.
Roderick S. Beck
Director of EMEA Sales
1, Passage du Chantier, 75012 Paris
AOL Messenger: GlobalBandwidth
rod.beck at hiberniaatlantic.com
rodbeck at erols.com
``Unthinking respect for authority is the greatest enemy of truth.'' Albert Einstein.
From: owner-nanog at merit.edu on behalf of Chiloé Temuco
Sent: Wed 8/15/2007 6:06 PM
To: nanog at merit.edu
Subject: Re: Extreme congestion (was Re: inter-domain link recovery)
Congestion and applications...
A tier 1 provider does not care what traffic it carries. That is all a function of the application not the network.
A tier 2 provider may do traffic shaping, etc.
A tier 3 provider may decide to block traffic patterns.
More or less... The network was intended to move data from one machine to another... The less manipulation in the middle the better... No manipulation of the payload is the name of the game.
That being said, it's entirely a function of the application to time out and drop out-of-order packets, etc.
ONS is designed around this principle.
In streaming data... often it is better to get bad or missing data than to try and put out-of-order or bad data in the buffer...
A good example is digital over-the-air TV... If you didn't build in enough error correction... then you'll have digital breakup, etc. It is impossible to recover any of that data.
If reliable transport of data is required... That is a function of the application.
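As a rough illustration of the streaming point above, here is a minimal Python sketch of a receiver that plays packets as they arrive and simply discards anything that shows up late, rather than trying to slot it back into the buffer. The function name, the (sequence number, payload) tuple layout and the sample data are all assumptions made for the example; they are not taken from ONS or any real protocol.

```python
from typing import Iterable, List, Tuple


def play_stream(packets: Iterable[Tuple[int, str]]) -> Tuple[List[str], List[int]]:
    """Play packets in arrival order, dropping any that arrive out of order.

    Each packet is a hypothetical (sequence_number, payload) pair.
    """
    last_seq = -1          # highest sequence number played so far
    played: List[str] = []
    dropped: List[int] = []
    for seq, payload in packets:
        if seq > last_seq:
            # In order (possibly after a gap): play it and move on.
            played.append(payload)
            last_seq = seq
        else:
            # Late or duplicate: too old to be useful for a live stream.
            dropped.append(seq)
    return played, dropped


# Packet 2 arrives after packet 3, so it is discarded instead of buffered.
arrivals = [(0, "a"), (1, "b"), (3, "d"), (2, "c"), (4, "e")]
played, dropped = play_stream(arrivals)
print(played, dropped)
```

The network does nothing special here; the decision to skip the late packet lives entirely in the application.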
ONS is an Optical Networking Standard in the development stage.
On 8/15/07, Stephen Wilcox <steve.wilcox at packetrade.com> wrote:
On Wed, Aug 15, 2007 at 11:35:43AM -0400, Sean Donelan wrote:
> On Wed, 15 Aug 2007, Stephen Wilcox wrote:
> >(Check slide 4) - the simple fact was that with something like 7 of 9
> >cables down the redundancy is useless .. even if operators maintained
> >N+1 redundancy which is unlikely for many operators that would imply
> >50% of capacity was actually used with 50% spare.. however we see
> >around 78% of capacity is lost. There was simply too much traffic and
> >not enough capacity.. IP backbones fail pretty badly when faced with
> >extreme congestion.
> Remember the end-to-end principle. IP backbones don't fail with extreme
> congestion, IP applications fail with extreme congestion.
Hmm I'm not sure about that... a 100% full link dropping packets causes many problems:
L7: Applications stop working, humans get angry
L4: TCP/UDP drops cause retransmits, connection drops, retries etc
L3: BGP sessions drop, OSPF hellos are lost.. routing fails
L2: STP packets dropped.. switching fails
I believe any or all of the above could occur on a backbone which has just failed massively and now has 20% capacity available, such as occurred in SE Asia.
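To put rough numbers on the capacity figures quoted above, here is a back-of-the-envelope Python sketch. It assumes all nine cables carry equal capacity and that the pre-failure network ran at the 50% utilisation implied by N+1 redundancy; both are simplifying assumptions, and the function names are made up for the example.

```python
def remaining_fraction(total_cables: int, failed_cables: int) -> float:
    """Fraction of capacity left, assuming every cable has equal capacity."""
    return (total_cables - failed_cables) / total_cables


def overload_ratio(utilisation_before: float, remaining: float) -> float:
    """Offered load divided by surviving capacity (above 1.0 means congestion)."""
    return utilisation_before / remaining


remaining = remaining_fraction(total_cables=9, failed_cables=7)
lost = 1 - remaining
ratio = overload_ratio(utilisation_before=0.5, remaining=remaining)

print(f"capacity lost: {lost:.0%}")                # about 78%
print(f"offered load / capacity: {ratio:.2f}")     # 2.25
```

Under those assumptions the surviving links are offered more than twice the traffic they can carry, so more than half of it has to be dropped or deferred somewhere.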
> Should IP applications respond to extreme congestion conditions better?
"Ping timed out"
kinda icky, but it's not the application's job to manage the network
> Or should IP backbones have methods to predictably control which IP
> applications receive the remaining IP bandwidth? Similar to the telephone
> network special information tone -- All Circuits are Busy. Maybe we've
> found a new use for ICMP Source Quench.
yes and no.. for a private network perhaps, but for the Internet backbone where all traffic is important (right?), differentiation is difficult unless applied at the edge, and when you have major failure and congestion I don't see what you can do that will have any reasonable effect. perhaps you are a government contractor and you reserve some capacity for them and drop everything else, but what is really out there as a solution?
FYI I have seen telephone networks fail badly under extreme congestion. COs have small CPUs that don't do a whole lot - set up calls, send busy signals .. once a call is in place it doesn't occupy CPU time as the path is locked in place elsewhere. however, if something occurs to cause a serious number of busy circuits then CPU usage goes through the roof and you can cause cascade failures of whole COs
telcos look to solutions such as call gapping to intervene when they anticipate major congestion, rather than relying on the network to handle it
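For readers who haven't met call gapping, the idea can be sketched in a few lines of Python: after one call attempt is admitted, further attempts are rejected until a gap interval has passed. This is a simplified illustration only; the class name, the one-second gap and the arrival times are assumptions for the example, not how any particular switch implements it.

```python
class CallGapper:
    """Admit at most one call attempt per gap interval (simplified sketch)."""

    def __init__(self, gap_seconds: float) -> None:
        self.gap_seconds = gap_seconds
        self._next_allowed = 0.0  # earliest time the next attempt may pass

    def admit(self, now: float) -> bool:
        """Return True if an attempt arriving at time `now` is let through."""
        if now >= self._next_allowed:
            self._next_allowed = now + self.gap_seconds
            return True
        return False  # rejected early, before it can load the switch CPU


gapper = CallGapper(gap_seconds=1.0)
arrival_times = [0.0, 0.2, 0.9, 1.0, 1.5, 2.1]
decisions = [gapper.admit(t) for t in arrival_times]
print(decisions)
```

The point is that rejected attempts are turned away cheaply at the edge, before they consume the call-setup CPU that is already the scarce resource during the overload.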
> Even if the IP protocols recover "as designed," does human impatience mean
> there is a maximum recovery timeout period before humans start making the
> problem worse?
I'm not sure they were designed to do this.. the ARPANET wasn't intended to be massively congested.. the redundant links were in place to cope with loss of a node and usage was manageable.