Questions about Internet Packet Losses

William Allen Simpson wsimpson at greendragon.com
Tue Jan 14 03:47:16 UTC 1997


> From: Bob Metcalfe <metcalfe at infoworld.com>
> Is Merit's packet loss data (NetNow) credible?

The measurement needs more data points located inside other providers,
but is pretty accurate regarding links into and out of the major NAPs.

> Do packet losses in the
> Internet now average between 2% and 4% daily?

This is nothing new.  In fact, it used to be much worse.  Remember 1986?
1989?  1994?  We've had pretty serious loss rates at times.

> Are 30% packet losses common
> during peak periods?

I've personally measured 40% pretty regularly, and 80% at times.  But it
is much better now than a few years back.

> Is there any evidence that Internet packet losses are
> trending up or down?
>
Losses are on the rise again (past few months), but there was a downward
trend during 1996 for my own connections.  I specifically saw MCI and
Sprint put in faster links last year, and add more private interconnects.
That improved my life immensely.


> Were Merit's data correct, what would be the impact of 30% packet losses on
> opening up TCP connections?

TCP will keep working.  It's much harder on UDP, as UDP applications
often aren't very good at recovery.  Lots of applications that should
have used TCP use UDP instead.  TCP is very robust.
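
As a rough sketch (hypothetical parameters, not any particular
application's code), this is the kind of recovery loop a UDP application
has to supply for itself, and which TCP already does internally:

    import socket

    def udp_request(payload, server, port, retries=5, timeout=2.0):
        # TCP retransmits lost segments on its own; over UDP the
        # application has to time out and resend for itself, or it
        # simply fails.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(timeout)
        try:
            for _ in range(retries):
                sock.sendto(payload, (server, port))
                try:
                    reply, _addr = sock.recvfrom(2048)
                    return reply
                except socket.timeout:
                    continue    # request or reply was lost; try again
            return None         # give up after several losses in a row
        finally:
            sock.close()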

> On TCP throughput, say through a 28.8Kbps
> modem?

Compared to what?  I remember using TCP pretty well over 300 bps and
1200 bps modems.  Even at 28.8 Kbps, the delay in the modem swamps the
delay of the net.
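
A quick back-of-the-envelope check (assuming an uncompressed 576-byte
packet and ignoring the modem's own buffering latency):

    # Time just to clock a 576-byte packet through a 28.8 Kbps modem:
    print(576 * 8 / 28800.0)    # ~0.16 seconds per packet, on the same
                                # order as (or larger than) typical
                                # backbone delays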

> On Web throughput, since so many TCP connections are involved?

The Web is actually a major source of the problem.  It was not very well
designed, protocol-wise.  It causes rapid transient congestion.

> On DNS look-ups?

Yes, DNS uses UDP, and fails a lot more often.  So, I just try again.

> On email transport?

SMTP uses TCP, and still works great.  Neither delay nor packet loss is
an issue.


> How big a problem is HTTP's opening of so many TCP connections?

It is a terrible problem.  There is no time for the congestion and round
trip estimation algorithms to kick in, as each connection is so short.
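
Here is a rough sketch of the effect (assumed numbers: 512-byte
segments, an initial congestion window of one segment, classic slow
start doubling every round trip):

    def round_trips(object_bytes, mss=512, initial_cwnd=1):
        # Classic slow start: the congestion window begins at one
        # segment and doubles each round trip until the object is sent.
        segments_left = -(-object_bytes // mss)    # ceiling division
        cwnd, rtts = initial_cwnd, 0
        while segments_left > 0:
            segments_left -= cwnd
            cwnd *= 2
            rtts += 1
        return rtts

    # A typical 10 KB page or inline image:
    print(round_trips(10 * 1024))    # 5 round trips, and the connection
                                     # is finished before any loss
                                     # feedback or a decent round-trip
                                     # estimate can develop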

But, HTTP is being fixed.

> Does TCP
> need to operate differently than it does now when confronted routinely with
> 30% packet losses and quarter-second transit delays?

No.  It works fine.  BTW, 1/4 second delays are not a problem; as I said
earlier, this is actually better than we had when we developed the code,
when 2 second delays were typical with modems (576 byte payloads at
2400 bps).
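
The 2 second figure is just serialization delay (assuming a 576-byte
payload and no compression):

    # Time to clock a 576-byte payload through a 2400 bps modem:
    print(576 * 8 / 2400.0)    # ~1.92 seconds, before the packet even
                               # reaches the network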


> What is the proper
> response of an IP-based protocol, like TCP, as packet losses climb?  Try
> harder or back off or what?

Exponential backoff.  This was all well described by Van Jacobson nearly
a decade ago.  This kind of question is what makes folks think you
haven't done your homework, Bob.
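
In rough terms (a sketch of the idea, not any particular stack's timer
code), the retransmission timer doubles on every timeout, so a sender
facing heavy loss slows itself down instead of hammering the congested
path:

    def backoff_schedule(initial_rto=1.0, max_rto=64.0, attempts=7):
        # On each timeout, double the retransmission timer (up to a
        # ceiling) before trying again.
        rto, schedule = initial_rto, []
        for _ in range(attempts):
            schedule.append(rto)
            rto = min(rto * 2, max_rto)
        return schedule

    print(backoff_schedule())    # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]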


> How robust are various widespread TCP/IP
> implementations in the face of 30% packet loss and quarter-second transit
> delays?
>
BSD 4.4 and derivatives work fine.  Newer implementations with SACK work
a bit better.

Karn's KA9Q stack (used in many small enterprise routers and some MS-DOS
hosts) is even more robust, as it was developed for amateur radio.  High
losses and high delay are typical in radio.

FTP Software's stack seems to perform very well, and they update it
regularly.

MacTCP is terrible.  And many early WWW servers and clients were Mac
based, so they have terrible TCP characteristics.  But, they still
actually worked!

But Apple replaced MacTCP last year with Open Transport, which works
much better.  Everyone should upgrade to MacOS 7.6.

I've also had some problems with Win'95, and actually removed it and
went back to Win 3.1.  Since I gave up on it, I never really got to the
bottom of why it was performing so badly.  As an ISP, we actually charge
more to set up folks with Win'95, while we charge nothing for Macs with
Open Transport.  The difference in user support costs is amazing.

I've noticed that there are rather a lot of old SunOS systems out
there.  They don't have modern versions of TCP/IP, with MTU Discovery
et cetera.  But they should have been upgraded years ago, and are
probably about to fall over anyway.


> Is the Internet's sometimes bogging down due mainly to packet losses or
> busy servers or what, or does the Internet not bog down?
>
Busy servers have been the worst problem I've had in the past year.
They're much worse (more frequent) than the packet losses.


> What fraction of Internet traffic still goes through public exchange points
> and therefore sees these kinds of packet losses?  What fraction of Internet
> traffic originates and terminates within a single ISP?
>
From personal experience with my clients, virtually all traffic is
distributed across multiple providers.  Hardly any is local.  This has
always been true of email, and is also true of WWW traffic.

But as far as I know, there is no serious data available on internal
versus external traffic of ISPs.  In the ISP I partly own, most of the
traffic is external.  Even where dial-up users POP their email from the
local server, that email still came to the server via other providers.


> Where is the data on packet losses experienced by traffic that does not go
> through public exchange points?
>
As far as I know, there is no data on how much traffic goes through
public versus private exchange points.  But, we should encourage more
exchange points of every kind.


> If 30% loss impacts are noticeable, what should be done to eliminate the
> losses or reduce their impacts on Web performance and reliability?
>
The losses are noticeable.  What _should_ be done is fairly well known.
We've been talking about it for years, and I've been fairly active on
the topic.

Link speeds do not increase at the rate of Internet traffic.  Merely
making the links faster is doomed to fail.

Routing performance will not increase at the rate of Internet traffic.
Merely adding links between the same places is doomed to fail.

Resource reservation on already congested links is doomed to fail.

Resource reservation on many short flows is doomed to fail.

We need providers to share faster links, such as inter-continental
circuits.  By the very nature of Internet traffic multiplexing, it is
better to share one bigger link than many smaller ones.  Traffic shaping
would ensure that each provider gets its "fair share".
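
A toy queueing calculation illustrates the multiplexing argument
(assuming simple M/M/1 behaviour and invented numbers):

    def mm1_delay(arrival_rate, service_rate):
        # Mean time a packet spends in an M/M/1 queue: 1 / (mu - lambda).
        assert service_rate > arrival_rate, "load must fit on the link"
        return 1.0 / (service_rate - arrival_rate)

    # Four providers, each offering 80 packets/s of load:
    print(mm1_delay(80, 100))          # 0.0500 s on each of four
                                       # separate links
    print(mm1_delay(4 * 80, 4 * 100))  # 0.0125 s on one shared link four
                                       # times as fast; same total
                                       # capacity, but a quarter of the
                                       # queueing delay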

We need more exchanges, both public and private.  There should be one or
more major public exchanges in every metropolitan area.  Massively
parallel inter-connections are more robust in the case of link failure,
and give protection against backhoes.  It's the only way we can scale at
the rate of Internet traffic.

Unfortunately, both these solutions require cooperation, which is in
short supply.


> Are packet losses due mainly to transient queue buffer overflows of user
> traffic or to discards by overburdened routing processors or something else?
>
Most of the packet losses I see and have verified are due to _link_
underprovisioning!  That is, providers have sold more subscriber
connections than they can carry to other providers, and subscribers have
bought links that are too small for the amount of traffic they generate.
I've seen the provider version of the problem a lot more often than I've
seen the subscriber version.
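
A hypothetical illustration of the provider version (all numbers
invented for the example):

    # An ISP with 600 dial-up ports at 28.8 Kbps behind a single T1:
    subscribed = 600 * 28.8e3      # 17.28 Mbps of potential subscriber
                                   # demand
    upstream   = 1.536e6           # what the T1 toward other providers
                                   # can carry
    print(subscribed / upstream)   # 11.25x oversubscribed; when enough
                                   # users are busy at once, the T1's
                                   # queue overflows and packets drop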


> What does Merit mean when they say that some of these losses are
> intentional because of settlement issues?

In some cases, the decision to keep the small congested link to other
providers appears to be political and deliberate.

> Are ISPs cooperating
> intelligently in the carriage of Internet traffic, or are ISPs competing
> destructively, to the detriment of them and their customers?
>
Major ISPs are not cooperating very well.  Regional ISPs are doing a
better job of cooperating.  There are plenty of examples of both.

WSimpson at UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32
BSimpson at MorningStar.com
    Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2




