Questions about Internet Packet Losses

Mon Jan 13 23:26:14 UTC 1997

to add to tony's constructive response...

re: host stacks.  other improvements i've heard of that are
relevant to "reacting to http" are (1) applying slow-start-
and congestion-avoidance-type algorithms to the rate at
which new tcp connections are opened and (2) having kernels
share data in the protocol control blocks relevant to the
tcp algorithms across connections to the same host (or even
"network").  there are issues with both of these ideas, but
the point is that we can do things to react to the observed
behavior resulting from the extreme popularity of the web
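
as a rough illustration of idea (1), here is a minimal python sketch of
slow-start- and congestion-avoidance-style pacing applied to connection
opens rather than to segments.  the class, its method names and the
constants are all hypothetical and not taken from any real stack; it only
shows the shape of the control loop.

```python
class ConnectionOpenPacer:
    """Toy controller limiting how many connection opens may be outstanding.

    Hypothetical names throughout: `window` is the number of opens allowed
    in flight at once, `ssthresh` is where growth switches from
    exponential (slow start) to roughly linear (congestion avoidance).
    """

    def __init__(self, initial_window=1.0, ssthresh=8.0, max_window=64.0):
        self.window = initial_window
        self.ssthresh = ssthresh
        self.max_window = max_window

    def allowed(self):
        """Number of opens that may be outstanding right now."""
        return max(1, int(self.window))

    def on_open_succeeded(self):
        if self.window < self.ssthresh:
            # Slow start: +1 per success, which doubles the window per round.
            self.window += 1.0
        else:
            # Congestion avoidance: about +1 per full round of successes.
            self.window += 1.0 / self.window
        self.window = min(self.window, self.max_window)

    def on_open_timed_out(self):
        # Treat a timed-out open as a congestion signal and back off.
        self.ssthresh = max(self.window / 2.0, 1.0)
        self.window = 1.0
```

a client would keep at most `allowed()` opens outstanding to a given host
and report each outcome back through the two callbacks.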

re: merit's data.  while i have the highest respect for
their motivations in collecting this data, i *do* concur
with tony's concern about them doing anything more than
reporting raw numbers.  and with respect to the raw numbers,
some questions and observations are:

    (1) the measurements are made nap-to-nap.  no user traffic
        goes from nap-to-nap (because ISP1 doesn't play
        transit for traffic between ISP2 and ISP3)
    (2) what percentage of traffic between large providers
        goes across public exchanges anyway?
    (3) do the collection methodologies and analysis
        techniques share community consensus? is the former
        independently verifiable?

having said all of this, i will add that some of their data
is alarming.  but for the second time today on this list, i
will say that we need to be careful about chasing stats for
their own sake .. we need to look very carefully at *exactly*
what is being measured, how it's being collected and analyzed
and what the results actually mean

/jws

 > 
 > Bob,
 > 
 > You quote:
 > 
 >    "Although some
 >    of the packet loss is inadvertent, a large percentage of the public
 >    exchange point connectivity problems reflect intentional engineering
 >    decisions by Internet service providers based on commercial settlement
 >    issues.
 > 
 > I think that this is an _extremely_ dangerous assertion on Merit's part.
 > As always, ascribing intent rather than raw data requires much more
 > justification which I have yet to see.
 > 
 >    Are you familiar with this packet loss data from Merit?  If not, please see
 >    above URL.
 > 
 > Am now...  ;-)
 > 
 >    Is Merit's packet loss data (NetNow) credible?  Do packet losses in the
 >    Internet now average between 2% and 4% daily?  Are 30% packet losses common
 >    during peak periods?  Is there any evidence that Internet packet losses are
 >    trending up or down?
 > 
 > Yes, that matches my instinctive feel.  I don't have concrete data which
 > corroborates or disputes their data, or which shows high packet loss rates
 > or trends.
 > 
 >    Were Merit's data correct, what would be the impact of 30% packet losses on
 >    opening up TCP connections?  
 > 
 > TCP is pretty damn robust.  Opening a connection is still likely to work.
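
a back-of-the-envelope check on the claim that opening a connection is
still likely to work at 30% loss.  the assumptions are mine, not merit's or
tony's: losses are independent per packet, an attempt succeeds when both
the SYN and the SYN-ACK get through, and the number of SYN attempts before
giving up is left as a parameter because it varies by stack.

```python
def handshake_success_probability(loss_rate, attempts):
    """Chance that at least one SYN / SYN-ACK exchange completes.

    Assumes independent per-packet loss at `loss_rate` in both directions
    and ignores the final ACK of the three-way handshake.
    """
    per_attempt = (1.0 - loss_rate) ** 2
    return 1.0 - (1.0 - per_attempt) ** attempts


if __name__ == "__main__":
    for attempts in (1, 2, 4):
        p = handshake_success_probability(0.30, attempts)
        print(f"{attempts} attempt(s): {p:.3f}")
```

under these assumptions a single attempt succeeds 49% of the time and four
attempts succeed about 93% of the time, so the open usually works, but
every retry first waits out a retransmission timer.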
 > 
 >    On TCP throughput, say through a 28.8Kbps
 >    modem?  On Web throughput, since so many TCP connections are involved?  On
 >    DNS look-ups?  On email transport?
 > 
 > As you might imagine, that kind of packet loss rate is 'highly detrimental'
 > to throughput.  If you're asking for concrete numbers, I don't have them,
 > but I've lived through them.  Qualitatively, it means that interactive
 > usage is intolerable.  On the bright side, email works just fine.
 > 
 >    How big a problem is HTTP's opening of so many TCP connections?  
 > 
 > It's a very significant problem.  It decreases the average packet size,
 > thereby making routers work much harder.  It generates many more packets
 > than necessary, and then closes down the connection after a very short
 > transfer.  In short, it's a horribly inefficient use of the net.
 > 
 >    Does TCP need to operate differently than it does now when confronted
 >    routinely with 30% packet losses and quarter-second transit delays?
 > 
 > Your question presumes that we should live with the 30% losses.  We should
 > not.  TCP does palatably well at surviving such brown-outs and I would not
 > suggest changes for that cause.  Note that there are other changes that I'd
 > like to see, such as more use of Path MTU Discovery and fixing HTTP which
 > are much more important.  The quarter-second transit delays fall into two
 > categories: the first is transient delays, mostly caused by routing transients.
 > Obviously we need to minimize such transients.  The second is normal
 > propagation delay.  Using larger windows would aid that a great deal.  I
 > don't think that many TCP implementations allocate sufficient buffering
 > today to truly be efficient.
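
the point about larger windows follows from the bandwidth-delay product: a
sender can have at most one window of data in flight per round trip.  a
small sketch, assuming the quarter second is the round-trip time (the
original question does not say), taking a T3 as 44.736 Mbit/s, and using
65535 bytes as the largest window available without window scaling:

```python
def window_bytes_needed(bandwidth_bps, rtt_seconds):
    """Bytes that must be in flight to keep a path of this speed full."""
    return bandwidth_bps * rtt_seconds / 8.0


def max_throughput_bps(window_bytes, rtt_seconds):
    """Upper bound on throughput for a fixed window, ignoring loss."""
    return window_bytes * 8.0 / rtt_seconds


if __name__ == "__main__":
    rtt = 0.25  # assumed round-trip time in seconds
    print(window_bytes_needed(44.736e6, rtt))  # window to fill a T3
    print(window_bytes_needed(28_800, rtt))    # window to fill a 28.8 kbit/s modem
    print(max_throughput_bps(65_535, rtt))     # ceiling with an unscaled window
```

with these numbers a T3 path needs roughly 1.4 MB in flight, while an
unscaled 65535-byte window tops out near 2.1 Mbit/s; a 28.8 kbit/s modem
needs only about 900 bytes.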
 > 
 >    What is the proper
 >    response of an IP-based protocol, like TCP, as packet losses climb?  Try
 >    harder or back off or what?  
 > 
 > Back off.  Slow start is the accepted algorithm.  Trying harder only
 > increases congestion.
 > 
 >    How robust are various widespread TCP/IP
 >    implementations in the face of 30% packet loss and quarter-second transit
 >    delays?
 > 
 > I have yet to see a significant problem with robustness.
 > 
 >    Is the Internet's sometimes bogging down due mainly to packet losses or
 >    busy servers or what, or does the Internet not bog down?
 > 
 > That depends on your definitions.  "The Internet" as a whole does not bog
 > down.  It's a modular system and there are localized problems and
 > congestion which result in poor service to a wide-ranging set of users.
 > The causes of the problems vary.  I've seen lots of really slow servers,
 > congested access links, unhappy routers, congested interconnects, etc.
 > 
 >    Where is the data on packet losses experienced by traffic that does not go
 >    through public exchange points?
 > 
 > I suspect that you'd have to ask the parties involved in the private
 > exchange point.  I suspect that no such statistics are currently
 > kept, or if so, they would not be willing to disclose them.  Thus IPPM...
 > 
 >    If 30% loss impacts are noticeable, what should be done to eliminate the
 >    losses or reduce their impacts on Web performance and reliability?
 > 
 > Ah...  Yes, loss rates of 30% are noticeable and painful.  There are
 > literally hundreds of things that can and should be done to improve
 > things.  Let's see, just off the top of my head:
 > 
 > - more private interconnects are necessary in the long term to scale the
 >   network.  We cannot have interconnects of infinite bandwidth as hardware
 >   simply doesn't scale as quickly as demand.  Thus, we need to invoke
 >   parallelism.  I think that this is already happening in a reasonable way.
 > - more bandwidth.  Of course, faster is better.  OC3 SONET technology is
 >   quickly becoming an obvious upgrade path from today's T3 backbones.
 > - better routers.  Current implementations have many shortcomings which
 >   aggravate instability.
 > - accurate reporting.  There seems to be a trend to find a problem and get
 >   everyone hyped up over it, far in excess of reality.  We spend time
 >   dealing with such issues rather than doing beneficial engineering.
 > - improved protocols.  We have an ongoing scalability problem with our
 >   routing protocols.
 > - fixed host stacks.  Using the full MTU would be a boon.  Recent data
 >   indicates that >40% of the packets out there are 40 bytes.
 > 
 >    Are packet losses due mainly to transient queue buffer overflows of user
 >    traffic or to discards by overburdened routing processors or something else?
 > 
 > "mainly" is a dangerous quantifier given that there's no hard data.  My
 > intuition says that sheer congestion is the most serious problem, followed
 > closely by router implementation.
 > 
 >    What does Merit mean when they say that some of these losses are
 >    intentional because of settlement issues?
 > 
 > I think you really need to ask Merit that.  I could find no justification
 > for that on their Web page.
 > 
 >    Are ISPs cooperating
 >    intelligently in the carriage of Internet traffic, or are ISPs competing
 >    destructively, to the detriment of them and their customers?
 > 
 > Ummm...  I see them cooperating.  "intelligently" is in the eye of the
 > beholder.  Certainly there are some who are being anti-social.
 > 
 > Tony