How reliable does the Internet need to be? (Was: Re: Converged Network Threat)

Steve Gibbard scg at gibbard.org
Thu Feb 26 00:30:15 UTC 2004


Having woken up this morning and realized it was raining in my bedroom
(last night was the biggest storm the Bay Area has had since my house got
its new roof last summer), and then having moved from cleaning up that
mess to vacuuming water out of the basement after the city's storm sewer
overflowed (which seems to happen to everybody in my neighborhood a couple
of times a year), I've spent lots of time today thinking about general
expectations of reliability.  In the telecommunications industry, where we
tend to treat reliability as very important and any outage as a disaster,
hopefully the questions I've been coming up with aren't career ending. ;)
With that in mind, how much in the way of reliability problems is it
reasonable to expect our users to accept?

If the Internet is a utility, or more generally infrastructure our society
depends on, it seems there are a bunch of different systems to compare it
to.  In general, if I pick up my landline phone, I expect to get a
dialtone, and I expect to be able to make a call.  If somebody calls my
landline, I expect the phone to ring, and if I'm near the phone I expect
to be able to answer.  Yet, if I want somebody to actually get through to
me reliably, I'll probably give them my cell phone number instead.  If it
rings, I'm far more likely to be able to answer it easily than I am my
landline, since the landline phone is in a fixed location.  Yet some
significant portion of calls to or from my cell phone come in when I'm in
areas with bad reception, and the conversation becomes barely
understandable.  In many cases, the signal is too weak to make a call at
all, and those who call me get sent straight to voicemail.  Most of us put
up with this, because we judge mobility to be more important than
reliability.

I don't think I've ever had a natural gas outage that I've noticed, but
most of my gas appliances won't work without electric power.  I seem to
lose electric power at home for a few hours once a year or so, and after
the interruption life tends to resume as it was before.  When power outages
were significantly more frequent, and due to rationing rather than to
accidents, it caused major political problems for the California
government.  There must be some threshold for what people are willing to
accept in terms of residential power outages, that's somewhere above 2-3
hours per year.

In Ann Arbor, Michigan, where I grew up, the whole town tended to pretty
much grind to a halt two or three days a year, when more snow fell than
the city had the resources to deal with.  The quantity of snow necessary
to cause that was probably four or five inches.  My understanding is that
Minneapolis and Washington DC both grind to a halt due to snow with
somewhat similar frequency, but the amount of snow required is
significantly more in Minneapolis and significantly less in DC.  Again,
there must be some threshold of interruptions due to exceptionally bad
weather that are tolerated, which nobody wants to do worse than and nobody
wants to spend the money to do better than.

So, it appears that among general infrastructure we depend on, there are
probably the following reliability thresholds:

Employees not being able to get to work due to snow: two to three days per
year.
Berkeley storm sewers: overflow two to three days per year.
Residential Electricity: out two to three hours per year.
Cell phone service: Somewhat better than nine fives of reliability ;)
Landline phone service:  I haven't noticed an outage on my home lines in a
few years.
Natural gas: I've never noticed an outage.
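
For comparison with the "nines" usually quoted for telecom services, the
rough figures above can be turned into availability percentages.  The
following is a minimal sketch, not a measurement: it assumes a 365-day
year, treats every outage as total, and takes the upper end of each "two
to three" range.  The function names and the example table are
illustrative only.

```python
import math

HOURS_PER_YEAR = 365 * 24  # 8760; leap years ignored (assumption)


def availability(downtime_hours_per_year):
    """Fraction of the year a service is up, given total hours of outage."""
    return 1.0 - downtime_hours_per_year / HOURS_PER_YEAR


def nines(avail):
    """Number of 'nines' in an availability fraction, e.g. 0.999 -> 3.0."""
    if avail >= 1.0:
        return math.inf  # no observed downtime at all
    return -math.log10(1.0 - avail)


# Upper end of the ranges given in the list above (assumed values).
examples = {
    "Snow days / storm sewers (3 days/year)": 3 * 24,
    "Residential electricity (3 hours/year)": 3,
}

for name, hours in examples.items():
    a = availability(hours)
    print(f"{name}: {a * 100:.3f}% available, about {nines(a):.1f} nines")
```

Under those assumptions, three hours of outage a year works out to
roughly 99.97% availability, and three days a year to roughly 99.2%.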

How Internet service fits into that of course depends on how you're
accessing the Net.  The T-Mobile GPRS card I got recently seems
significantly less reliable than my cell phone.  My SBC DSL line is almost
to the reliability level of my landline phone or natural gas service,
except that the DSL router in my basement doesn't work when electric power
is out.  I'm probably poorly qualified to talk about the end-user
experience on the networks I actually work on, even if I had permission
to.  Like pretty much everybody else here, I'm always interested in doing
better on reliability.  And, like many of my neighbors, I'd like to be
able to store stuff on my basement floor.  In comparison to a lot of other
infrastructure we depend on, it seems to me the Internet is already doing
pretty well.

-Steve

On Wed, 25 Feb 2004, Jared Mauch wrote:

>
> 	Ok.
>
> 	I can't sit by here while people speculate about the possible
> problems of a network outage.
>
> 	I think that most everyone here reading NANOG realizes that
> the Internet is becoming more and more central to daily life even
> for those that are not connected to the internet.
>
> 	From where I'm sitting, I see a number of potentially dangerous
> trends that could result in some quite catastrophic failures of networks.
> No, I'm not predicting that the internet will end in 8^H7 days or anything
> like that.  I think the Level3 outage as seen from the outside is a clear
> case that single providers will continue to have their own network failures
> for some time to come.  (I just hope daily it's not my employer's network ;-) )
>
> 	So, We're sitting here at the crossroads, where VoIP is
> "coming of age".  Vonage, 8x8 and others are blazing a path that
> the rest of the providers are now beginning to gun for.  We've already
> read in press releases and articles in the past year how providers
> in Canada and the US are moving to VoIP transport within their long-distance
> networks.
>
> 	I keep hearing of Frame-Relay and ATM signaling that is going
> to happen in large providers' MPLS cores.  That's right, your "safe" TDM
> based services will be transported over someone's IP backbone first.
> This means if they don't protect their IP network, the TDM services could
> fail.  These types of CES services are not just limited to Frame and ATM.
> (Did anyone with frame/atm/vpn services from Level3 experience the
> same outage?)
>
> 	Now the question of Emergency Services is being posed here but also
> in parallel by a number of other people at the FCC.  We've seen the E911
> recommendation come out regarding VoIP calls.  How long until a simple
> power failure results in the inability to place calls?
>
> 	Now, I'm not trying to pick on Level3 at all.  The trend I
> outline here is very real.  The reliance on the Internet for critical
> communications is a trend that continues.  Look at how it was used
> on 9/11 for communications when cell and land based telephony networks
> were crippled.
>
> 	The internet has become a very critical part of all of our lives
> (some more than others) with banks using VPNs to link their ATMs back into
> their corporate network as well as the number of people that use it for
> just plain "just in time" bill payment and other things.  I can literally
> cancel my home phone line, cell phone and communicate solely with my
> internet connection, performing all my bill payments without any paperwork.
> I can even file my taxes online.
>
> 	We're at (or already past) the dangerous point of network
> convergence.  While I suspect that nobody directly died as a result of
> the recent outage, the trend to link together hospitals, doctors
> and other agencies via the Internet and a series of VPN clients continues
> to grow.  (I say this knowing how important the internet is to
> the medical community, reading x-rays and other data scans at home for the
> oncall is quite common).
>
> 	While my friends that are local VFD do still have the traditional
> pager service with towers, etc... how long until the T1's that are
> used for dial-in or speaking to the towers are moved to some sort of
> IP based system?  The global economy seems to be going this direction with
> varying degrees of caution.
>
> 	I'm concerned, but not worried.. the network will survive..
>
> 	- Jared
>
>
> On Wed, Feb 25, 2004 at 09:17:30AM -0600, Pete Templin wrote:
> > If an IP-based system lets you see the status of the 23 hospitals in San
> > Antonio graphically, perhaps overlaid with near-real-time traffic
> > conditions, I'd rather use it as primary and telephone as secondary.
> >
> > Counting on it?  No.  Gaining usability from it?  You betcha.
> >
> > Brian Knoblauch wrote:
> >
> > >	If you're counting on IP (a "best attempt" protocol) for critical
> > >data, you've got a serious design flaw in your system...
> > >
> > >-----Original Message-----
> > >From: owner-nanog at merit.edu [mailto:owner-nanog at merit.edu]
> > >On Behalf Of Pete Templin
> > >Sent: Wednesday, February 25, 2004 9:10
> > >To: Colin Neeson
> > >Cc: nanog at merit.edu
> > >Subject: Re: Level 3 statement concerning 2/23 events (nothing to see, move
> > >along)
> > >
> > >
> > >
> > >
> > >Are you sure no one died as a result?  My hobby is volunteering as a
> > >firefighter and EMT.  If Level3's network sits between a dispatch center
> > >or mobile data terminal and a key resource, it could be a factor
> > >(hospital status website, hazardous materials action guide, VoIP link
> > >that didn't reroute because the control plane was happy but the
> > >forwarding plane was sad, etc.).
> > >
> > >And if the problem could happen to another network tomorrow but could be
> > >prevented or patched, wouldn't inquiring minds want to know?  Your life
> > >might be more interesting when the fit hits the shan if you have the
> > >same vulnerability.
> > >
> > >Colin Neeson wrote:
> > >
> > >
> > >>Because, in the grand scheme of things, it's really not that
> > >>important.
> > >>
> > >>No one died because of it, the normal, everyday events of the world
> > >>went on, unaffected by a Level 3 outage...
> > >>
> > >>Might be nice to know what happened, but my life will certainly not be
> > >>less interesting by not having that knowledge...
>
> --
> Jared Mauch  | pgp key available via finger from jared at puck.nether.net
> clue++;      | http://puck.nether.net/~jared/  My statements are only mine.
>

--------------------------------------------------------------------------------
Steve Gibbard				scg at gibbard.org
+1 415 717-7842	(cell)			http://www.gibbard.org/~scg
+1 510 528-1035 (home)


