SUMMARY: SONET ring questions

Peter Polasek pete at cobra.brass.com
Mon Oct 19 21:09:09 UTC 1998


Last week I submitted a question about SONET connectivity.  The message 
reached a much wider audience than I had anticipated and many people 
took the time to provide very detailed data.   All of the information 
was beneficial and I very much appreciate the efforts.  This message 
summarizes the information received.  The beginning of the message is 
my attempt at summarizing the responses to each point.  Excerpts from 
the individual responses are included after the summary to provide 
additional details and an opportunity for enthusiastic readers to draw 
their own conclusions.

Thanks,
Peter Polasek

> This is my first posting to the NANOG list.  I don't think this is
> off-topic, but if so, please send replies (or flames) directly to me 
> (rather than to the list) and I will issue a detailed summary.
>  
> We are in the process of implementing an OC-12 SONET ring connection 
> between two sites in New Jersey that span a distance of 15 miles.
> The SONET ring will be provided by Bell Atlantic and is composed of
> fully redundant hardware (there are no single points of failure within the
> telecom equipment) and redundant rings.  We are splitting the OC-12 
> into pairs of OC-3's on two routers in each location (running ATM on
> the WAN).  This interface is extremely mission critical to the point 
> that a 99.9% uptime will not be acceptable.  I have the following 
> questions:
> 
> 1) Bell Atlantic assures us that, because of the redundancy, we can
>    expect 100% uptime from the OC-12.  I would like feedback as to 
>    whether this is a realistic portrayal of the SONET environment.

The overwhelming majority felt that the reliability of the SONET will 
be very high. The real world switch times between primary and secondary 
rings are well under 1 second.  Nobody believes there will be 100% 
uptime, but, by all accounts, there should be lots of 9's following 
the 99.9% figure.  All of the respondents' reliability statements were 
predicated on the condition that all telecom equipment and fiber is 
redundant and non-shared.  Even Bell represents SONET as 'virtually'
100% uptime.  When asked about SONET failures, the Bell representative 
leaned back in the chair, convincingly straining to recall an incident, 
then finally described an episode in Alabama where connectivity was lost 
during simultaneous floods and earthquakes while, I believe, JR was being
shot.  Actually, I may not have all the details correct - but it was 
either an academy award performance or a sincere portrayal of a 
reliable medium.  The response from the group seems to support the 
latter.  Many suggested that 'guarantee of diversity' and 'downtime
penalty' clauses  should be written into the Bell Atlantic contract.  

Howard Berkowitz suggested URL 'http://www.bell-atl.atd.net/s-wpaper'
describing a Bell Atlantic SONET deployment for military organizations.
 
Several people pointed out that the sample period is extremely 
important when talking about uptime percentages.  This is a very
valid point.  Thirty minutes of downtime in a year represents 99.994
percent uptime.  If the 30 minutes is consecutive, then this would be a 
very large impact on service.  If there were 30 one minute interruptions,
the impact would be far less significant for our application (though it 
would represent vastly disappointing performance).
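
To make the sample-period point concrete, here is a minimal sketch of the 
arithmetic.  It assumes a 365-day year; the function names are mine and 
purely illustrative.  It shows that one 30 minute outage and thirty 
1 minute outages give the same uptime percentage, so the percentage alone 
does not describe the impact.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a non-leap year


def uptime_percent(downtime_minutes, period_minutes=MINUTES_PER_YEAR):
    """Return uptime as a percentage of the sample period."""
    return 100.0 * (1.0 - downtime_minutes / period_minutes)


def summarize(outages_minutes, period_minutes=MINUTES_PER_YEAR):
    """Return (uptime %, number of outages, longest single outage in minutes)."""
    total = sum(outages_minutes)
    longest = max(outages_minutes) if outages_minutes else 0
    return uptime_percent(total, period_minutes), len(outages_minutes), longest


one_long = summarize([30])        # a single 30 minute outage
many_short = summarize([1] * 30)  # thirty 1 minute interruptions

for label, (pct, count, longest) in (("one long", one_long),
                                     ("many short", many_short)):
    print(f"{label}: {pct:.4f}% uptime, {count} outage(s), longest {longest} min")
```

Both cases report the same percentage; only the outage count and the 
longest single outage distinguish them, which is why an SLA probably needs 
to state those as well.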

Several people questioned the use of ATM on the WAN.  We are not running
ATM anywhere within the internal LAN and do not need voice or video.  The
ATM interface is being deployed on the WAN because it is the only option 
for 155Mbps connectivity - almost.  Cisco provides a relatively
new 'Packet over Sonet' (POS) option that more effectively uses the
OC-3 bandwidth because it eliminates the ATM encapsulation overhead.
We are considering this option but are a little hesitant because it is 
not terribly mature at this point.  I would be interested in hearing 
about any real-world experience with POS from those who are using 
it in a mission critical production environment.
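
For a rough feel of the ATM encapsulation overhead, the sketch below 
compares bytes on the wire for one IP packet carried over ATM versus POS.  
The 53/48 byte cell sizes and the 8 byte AAL5 trailer are standard; the 
8 byte LLC/SNAP header and the 9 bytes of PPP/HDLC framing are my 
assumptions about the encapsulation in use, and SONET overhead and HDLC 
byte stuffing are ignored.  Treat the numbers as a ball-park, not a 
vendor specification.

```python
import math

CELL_BYTES = 53          # ATM cell: 5 byte header + 48 byte payload
CELL_PAYLOAD_BYTES = 48
AAL5_TRAILER_BYTES = 8   # AAL5 trailer appended to each packet
LLC_SNAP_BYTES = 8       # assumption: LLC/SNAP encapsulation of IP over AAL5
POS_FRAMING_BYTES = 9    # assumption: flag, address, control, protocol, 32-bit FCS


def atm_wire_bytes(ip_bytes):
    """Bytes sent on the wire for one IP packet over AAL5 (padded to whole cells)."""
    pdu = ip_bytes + LLC_SNAP_BYTES + AAL5_TRAILER_BYTES
    cells = math.ceil(pdu / CELL_PAYLOAD_BYTES)
    return cells * CELL_BYTES


def pos_wire_bytes(ip_bytes):
    """Bytes sent on the wire for one IP packet over POS (no byte stuffing)."""
    return ip_bytes + POS_FRAMING_BYTES


def efficiency(ip_bytes, wire_bytes):
    """Fraction of the transmitted bytes that are IP packet bytes."""
    return ip_bytes / wire_bytes


for size in (40, 576, 1500):
    atm = efficiency(size, atm_wire_bytes(size))
    pos = efficiency(size, pos_wire_bytes(size))
    print(f"{size:5d} byte packet: ATM {atm:.1%}  POS {pos:.1%}")
```

Under these assumptions small packets suffer the most over ATM, because a 
40 byte packet plus encapsulation spills into a second cell.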

> 2) We have the option of using either single-mode or multi-mode fiber 
>    OC-3 connections - what factors should be considered in selecting
>    the fiber media type.

The primary difference between the two is distance.  Single mode fiber
has much longer range than multimode.  Single mode also has a higher 
theoretical bandwidth limit - I didn't get any specific limits for each 
fiber type, but I suspect it is not an issue for OC-3 and OC-12 data 
rates.  Virtually all phone company fiber is single mode and they 
typically provide an on-site converter to hand off the signal in 
multi-mode.  3com and Cisco support both single and multi mode (with 
different cards).  The multi-mode cards are less expensive.  Cisco 
offers the single mode in two flavors - 'long range' and 'intermediate'.
A few people suggested avoiding the long range flavor.  One also 
suggested that it is very easy to damage the eyes through exposure to
open single mode fiber cables.  Surprisingly, nobody provided the 
specific range limits, but I was able to get these from the Cisco rep:

Type                      Distance  Approx. List Price
------------------------  --------  ------------------
Multi-Mode                2 km       $8,000.00
Single-Mode-Intermediate  15 km     $10,000.00
Single-Mode-Long          45 km     $12,000.00
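
As a small illustration of how the table might be used, the sketch below 
picks the least expensive interface whose quoted range covers a given run.  
The figures are the approximate ones from the Cisco rep above; the 
function name is hypothetical, and the sketch deliberately ignores other 
factors raised by respondents (such as overdriving the receiver on short 
single mode runs).

```python
# (type, maximum distance in km, approximate list price in USD), from the table above
OPTICS = [
    ("Multi-Mode", 2, 8000),
    ("Single-Mode-Intermediate", 15, 10000),
    ("Single-Mode-Long", 45, 12000),
]


def cheapest_optic(distance_km):
    """Return the cheapest (type, reach, price) that covers distance_km, or None."""
    candidates = [optic for optic in OPTICS if optic[1] >= distance_km]
    if not candidates:
        return None
    return min(candidates, key=lambda optic: optic[2])


for run_km in (0.1, 10, 30, 60):
    print(run_km, "km ->", cheapest_optic(run_km))
```

Note that the relevant distance is the run from the router to the telco 
hand-off, not the 15 miles between sites.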
 
Austin Schutz suggested the following URL to help select the correct fiber
mode: http://www.cisco.com/univercd/cc/td/doc/product/core/cis7505/ipicg

> 3) Which routers should be used.  The options are 3com NB-II's DPE+
>    (dual CPU), Cisco 7200 series, or Cisco 7500 (each router will 
>    have at least two 100Base-T LAN ports).  OK, I know the 3com 
>    suggestion is a loaded question for this list, but has anybody
>    used the 3com's in this capacity?  We are a 3com shop that is 
>    considering switching to Cisco - this is a significant decision 
>    because switching will require us to continue to maintain the 
>    existing 3com environment (~500 routers) and the new Cisco 
>    routers.

Most recommended Cisco (which is not terribly surprising given the
list membership) citing support as the primary discriminator.  Nobody
discouraged the 3com implementation and several are running similar
environments trouble-free with 3com equipment.  For processing power, 
the 7500 is recommended over the 7200 because a 7200 can only support 
3 Fast-Ethernet cards at wire-speed (we will probably have 2 fast-E, 2 
OC3's in each router).  The Cisco 12000 series router can handle a 
single OC-12 interface (this is not practical for us because we need
router diversity).  A few recommended using 3com's CoreBuilder ATM
switch products (7000/9000 series) to eliminate the need for routers.


The excerpts from individual responses are included below:
===========================================================

From: "Howard C. Berkowitz" <hcb at clark.net>
-------------------------------------------
There's an interesting white paper about a Bell Atlantic SONET deployment
for military organizations at:
 
http://www.bell-atl.atd.net/s-wpaper

From: Austin Schutz <tex at shrubbery.net>
---------------------------------------
Multimode fiber has the disadvantage that it is not capable of as high
bandwidth as single mode - a consideration if you are digging a trench for it.

If you want to make a good ball-park guess as to whether or not multimode 
will work for you it would probably be worthwhile to see:
 
http://www.cisco.com/univercd/cc/td/doc/product/core/cis7505/ipicg/ipicgpos.htm

From: Hal Murray <murray at pa.dec.com>
------------------------------------
I get very suspicious when anybody says "100% uptime".  I would go over the 
routing of the fibers very carefully.  There have been all too many tales of 
one backhoe cutting all supposedly redundant connections between points X 
and Y. You should probably even review it yearly.

You might consider going with 2 sets of routers from different vendors.  
The idea being that software bugs in one may not kill the others.

From: scott w <scott at digisle.net>
---------------------------------
Make sure each 'side' of the ring is run along *physically* different paths.
 
From: ilazar at rpm.com:
---------------------
Make sure the fiber rings are geographically diverse, i.e. they run along
separate paths.  I've seen cases where one cut can take down both parts of
a ring due to their being run in the same conduit.  You might also want to
have a third backup of some sort.

Single-mode allows for greater distances for most protocols.  I don't know
the specifics for ATM, but for gigabit ethernet, the distance difference is
huge.

I'm a Cisco guy, so I'm biased, but for OC-12 I would be looking at the
12000's, not the 7xxx series.  Best bet is to call Cisco and talk to an SE.
I really don't know anything about 3-com's product line so I can't comment.
You might also want to look at the Ascend GSR.

From: David Lesher <wb8foz at nrk.com>
-----------------------------------
Do you have a written guarantee of diversity routing of the fiber?

From: Robert Gibson <wa3pxx at pimmitrun.com>
------------------------------------------
Cisco would be the safest option. Cisco has people available 24hrs
per day who REALLY know hardware, and 4 hour turnaround for failed 
hardware. Remember with even two routers you have software issues
that might not be easy to fix with anyone other than Cisco.
There is no reason to think the 3COM would not work, but I doubt
they could respond/fix things ANY time.

From: bmanning at vacation.karoshi.com:
------------------------------------
You won't get 100%.  3com gear is fine, esp. if you are comfortable 
w/ the UI.

From: Erik <00199740 at bigred.unl.edu>:
-------------------------------------
I work for an ILEC/LEC/CLEC here in the midwest, we use SONET equipment
for data and voice traffic.  Bell Atl was not wrong in their portrayal of
SONET.  But redundancy really depends on what they are using for
switching, such as are they using Bi-directional Line Switched Rings, or
Unidirectional Line Switched rings?  Most LECs use Bi-directional Line
Switched rings for heavy traffic applications, and they are used by 90% or so
of carriers out there.  Based on our experiences, we have never noticed a
failure which caused both the working ring, and protection ring to go
down, and have experienced no downtime whatsoever.  I work primarily with
WAN applications and we had a cut last year and we didn't see any downtime
associated with the cut, but, our carrier dept notified us that we were on
our protection ring, and we were on our working ring within a few hours,
again, with no downtime.  One of the Good Things (tm) about SONET is you
can split your traffic up, and use what is called "virtual rings."  (i.e.
running some traffic over the protection ring.)  But my experience and our
experience with SONET has been a positive one, and Bell Atl is
correct, as far as I can tell.  

The factors to consider are: What type of applications would be running 
over this fibre?  If it is bandwidth intensive, you would want to go 
with single-mode fibre. If it is bursty traffic, and not a constant 
bit-rate application such as video, or multimedia applications, or any 
application that is bandwidth intensive, then multi-mode would be fine.  
Another factor is, single-mode fiber is a bit more expensive than
multi-mode because of the end terminating equipment.

The 7200 would work depending on the application, we use a 75xx series
router with our SONET rings. We have about 7 of them, each connected to a
7xxx or higher series router.  I haven't had any experience with 3com
fiber equipment, so I can't really comment on that.  As far as redundancy,
and reliability Cisco would be the way to go just based on my experience.
If it is going to be bandwidth intensive the way to go would be with 75xx
series, they have the backplane and CPU horsepower to handle streaming
video, and medical imaging and such, which is what this sounds like to me.  

From: Leo Bicknell <bicknell at ufp.org>
-------------------------------------
Unfortunately, no.  Draw your redundant counter-rotating rings, so both 
sides travel down the same fiber bundle:

         _______                      ______
        /       \____________________/      \
station          ____________________         station 
        \_______/                    \______/

A cut of the fiber bundle in the middle will take out both rings, downing 
the circuit.  You need to make sure both rings have redundant physical
paths from end to end for true redundancy, two different building
entrances, two different long haul paths.  Anywhere they are within
50' of each other one backhoe might take them both out.
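
The check described above can be sketched as a comparison of the two ring 
paths segment by segment.  This is only an illustration: the segment names 
are hypothetical, and a real audit would work from the carrier's fiber map 
and would also need to catch segments that are merely close together 
rather than identical.

```python
def shared_segments(path_a, path_b):
    """Return the conduit segments that both ring paths traverse, in path_a order."""
    in_b = set(path_b)
    return [segment for segment in path_a if segment in in_b]


# Hypothetical segment IDs; a real check would use the carrier's fiber map.
working = ["site1-entrance-north", "conduit-17", "site2-entrance-east"]
protect = ["site1-entrance-south", "conduit-17", "site2-entrance-west"]

common = shared_segments(working, protect)
if common:
    print("single points of failure:", common)
else:
    print("paths are fully diverse")
```

Any segment in the result is a place where one backhoe takes out both 
sides of the ring.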

If you have an active mux that can run Single Mode over the long haul 
and multimode locally the multimode will be cheaper.  If you have a
passive mux you'll need single mode interfaces that have a 15km throw 
length.  These will be a little more than multimode.  There is no (uptime) 
advantage to one or the other.

One would hope any router would work to balance between two circuits. 
Personally I would recommend using 7500 class routers, with OC-3 POSIP 
to the sonet side, and appropriate (fast-e, fddi, ATM, or gig-e) to the 
lan side.  In particular, with redundant RSPs they deliver excellent uptime.

From: noc at nso.org (Network Operations Center)
---------------------------------------------
Of what does the assurance consist?  A contractual agreement that they pay 
X-sum if their word wasn't good?  A solid guaranteed 100% uptime
can't be reached in any event, as you depend on a third party...  One might 
reach 99.9999999%, but there's always that risk.... and then there are 
lawyers...

I didn't use 3Com.  I'd also look into the Cisco 12000 as it supports 
OC12.  As in most cases, Peter, it all depends on the 'doshies'.

From: Ron_Johnson at enron.net
---------------------------
This holds true if the ring does not share a single path. I.E. The ring
runs down a right of way on a railroad on both inbound and outbound runs.
SONET works roughly like FDDI in its ability to self heal and route around
failures.  Of course your connection is a single point of failure. If your 
local router dies, or your onsite SONET gear loses power, you are down no 
matter what. Also your choice of ATM, is arguable. SONET is optical "T" 
carrier service. Running ATM over SONET is at an overhead price. I guess to 
make this more clear would be to ask; Would you run ATM over a T-1? Cisco 
makes Packet over SONET cards (POS) for 7500 series routers. Consider running 
in native modes. You will also get better signaling from the SONET rings 
about ring conditions.

We use single mode, but that is only because our SONET gear prefers single
mode. Single mode fiber is dangerous. The power behind the laser on single
mode will definitely cause damage to human optical devices (eyes). Single
mode fiber has advantages of being able to run longer distances without
repeaters. Long haul SONET is always single mode. Multimode conversion is
handled by your SONET ADM box that Bell Atlantic will supply.

From: Steve_Blanchard at 3com.com
------------------------------
3Com has some of the largest ATM networks in the world, in highly mission 
critical environments.  Cisco equipment is not bad either.  Converting to 
cisco will minimally cost you much more and require an additional management
platform as well as the requisite learning curve.  Hopefully, you have 
discussed your requirements with 3Com and also cisco.

From: "Bill St. Arnaud" <bill.st.arnaud at canarie.ca>
---------------------------------------------------
Yes, for the SONET rings themselves. Depending on the architecture, there is
not likely to be the same redundancy after the ADM.

If it is just a short pig tail of a couple of meters it should not make any
difference (whether you use single or multi mode fiber).

It is how the routers are configured that is more critical.  It sounds like
you want to run IP over ATM.  Your ATM switch could give you more problems
than the routers.  That has been our experience on a SONET-based carrier
network.  At a NANOG meeting last spring SPRINT outlined a good cross-linked
architecture of how your routers should be connected to the SONET ADM.

From: "John A. Tamplin" <jat at Traveller.COM>
-------------------------------------------
You also mention ATM but don't say anything about the switches involved.
If you are using a public ATM network, then I doubt you will get better
than 99.9% uptime.  Otherwise, you need to be at least as careful about
the ATM switches as you are the routers.  You mention that you are a 3COM
shop, so you might be using the LinkBuilder 7000 ATM switch (that name may
not be exact, they went through so many names on it) -- if so, you need to
worry about cell loss in high load conditions, especially when you have links
that aren't all the same speed.  We had one here with OC3 connections to 
routers and hosts, and a DS3 to one of our other POPs.  It dropped cells like
crazy on the DS3 because it couldn't properly rate-limit the cell stream,
and I would suspect you would have the same problem mixing OC12 and OC3 
interfaces.  We swapped it out with a Cisco ATM switch and all of the 
problems went away.

Have them show you a fiber map of the path your fiber actually takes.  Too
many times both sides of the ring are in the same place somewhere along the
path, a perfect target for a backhoe.  Other than that, you obviously can't
get 100% uptime (there is always the chance that the last working hardware
fails before the redundant hardware is replaced), but you can get arbitrarily
close with sufficient levels of redundancy.
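
A minimal sketch of the "arbitrarily close" point, under the strong 
assumption that the redundant paths fail independently (which is exactly 
what a shared conduit violates).  The function name is illustrative only.

```python
def parallel_availability(single, copies):
    """Availability of `copies` independent redundant paths, each with availability `single`.

    The service is down only when every copy is down at the same time.
    """
    return 1.0 - (1.0 - single) ** copies


for n in (1, 2, 3):
    print(f"{n} path(s) at 99.9% each -> {parallel_availability(0.999, n):.7%}")
```

Each added independent path multiplies the unavailability by the same 
small factor, so the result approaches but never reaches 100%.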

In our experience (not with BellAtl), we have not had a failure of any sort
on our OC12 ring in over a year, and we don't have redundant muxes on this
end, and there is about 20 feet of non-diverse fiber path in the loop.  Now
telco operator error when adding circuits over the OC12 is another story 
entirely :).

Simply what your equipment can take and the distances you need to go.  If
you have a short distance, you have to have an attenuator on the SM fiber to
avoid overdriving the receiver.  Most telco equipment only support SM.

We are all Cisco here, using 7500s for the core routers and 7200s and 4500s
for the borders.  If you stick to stable IOS releases, they just work.  Our
core routers are only down for IOS upgrades, period -- I can't remember the
last failure.  I've never used 3COM routers, but have had only lackluster
results with their ATM and Ethernet switches.

I don't know if 3COM has anything similar, but with Cisco you can setup
two routers on each end and use HSRP for redundancy.  You probably also
want to get dual RSPs and power supplies since you are so concerned about
downtime. 

From: Ron_Johnson at enron.net
---------------------------
OC-3(c) POS is a released product from Cisco. It is no longer Beta.  But it
still is pretty fresh, while OC-3 ATM has been around in Ciscos for 2+ years.
Cisco offers the OC-3 POS card in channelized and unchannelized forms, and
in single and multimode configurations.  You will likely want to look at OC-3c 
multimode.  Here is some Cisco blurbage about POS from Cisco's web site.

http://www.cisco.com/warp/customer/733/adap/pos/literature.shtml

From: Nikki  Gupta Mehta <nmehta at cisco.com>
-------------------------------------------
How are you splitting the OC12 into OC3?  Why are you running ATM?  You might 
want to consider the 12008.

100% uptime? depends on how intelligently they have built their sonet rings.
Mostly this is true.  I think a ring wraps in less than 25ns.

distance and cost.  SM is used for long distances and is more expensive.
Depending on the manufacturer there are 2 types  of SM: long reach and
intermediate reach.  MM is for shorter distances and is less expensive.

From: Niels Bakker <niels at euro.net>
------------------------------------
12000 series are fully redundant.  One mainboard can go up in flames, the
other will take over with no downtime.  7507 and 7513 are not-so-fully 
redundant: you can equip them with two power supplies and two main processing
boards, but a problem with one of them (not sure about the PSU's though) 
requires an (automatic) reboot.  7200 series can't be made redundant (unless 
you buy two and have a little devil on standby forever, I guess).

Also, try asking on cisco-nsp at qual.net or comp.dcom.sdh-sonet; probably a 
small overlap in audience, but you may get something better out of it
than I am able to provide. :)

If your telco is willing to put 100.0% on paper, I think I'll move.
I certainly wouldn't count on it...

From: Dave Israel <davei at biohazard.demon.digex.net>
---------------------------------------------------
Assuming the primary and backup paths are not in the same cable
anywhere (don't laugh, it happens), the chances of the OC-12 itself
failing inside Bell land are tiny. (Of course, if your local loop gets
assaulted by a backhoe, you're in trouble.) 
 
Single mode goes farther. Multi-mode is cheaper. If you've got
relatively short cables, save some money and use multi-mode. If
you're dragging fiber all the way across the building, or between
buildings on a campus, use single-mode.

If you go Cisco, I'd recommend the 7500 series. The 7200 series won't
handle two OC-3's and two fast ethers. You'll drop packets.

From: "Mark Evans" <evansm at cerf.net>
------------------------------------
Having come from Bell Atlantic NJ in a 'previous life', I'll take a swag at
point #1.

1) Bell Atlantic assures us that, because of the redundancy, we can expect
100% uptime from the OC-12.  I would like feedback as to whether this is a
realistic portrayal of the SONET environment.

It depends on who is telling you this, but it sounds like it is tinged with
sales-speak.  SONET is a good thing, and I am not looking to diminish it -
but SONET is only as good as the implementation it is riding on.  Some of
the things that would need to be in place to make the 100% claim more
supportable are:
a) dual entrances to the facilities in question, preferably coming into
separate sides of the building from different streets
b) going into separate fiber bundles when it leaves the building (versus
multiple strands in the same bundle (aka a collapsed ring), which would be
vulnerable to pole knock-down or backhoe fade)
c) fully redundant electronics - this gets into whether you have interface
diversity on the same SONET mux (vulnerable to chassis failure - which is
infrequent) versus having redundant electronics.  This gets into the dual
entrance discussion as well - having 2 telephone rooms in the facility (on
separate power sources) allows for dual electronics -- if you have both
entrance facilities terminating in the same equipment room, you may have
both strands going into the same mux (albeit on different interfaces).
d) the type of SONET configuration being utilized (ex. bi-directional line
switched ring)

2) Depends on what BA NJ is offering to hand off.  We frequently utilize
multi-mode into the 72xx and the 75xx, and things operate quite nicely.  If
you are dealing with very short lengths (patch cable distances) the optical
characteristics are not terribly different between single- versus multi-.
However, since optic performance characteristics are not my specialty, I'll
defer to what other respondents to your message suggest.

3) Can't give you advice on this one.  We're a Cisco powered network, and
like that flavor of hardware.  I can't say much about 3com's ATM abilities,
with no large base of experience to draw on.

From: "Bruce R. Babcock" <bbabcock at cisco.com>
---------------------------------------------
Even Sonet takes a small amount of time to self heal.  Depending on what 
they do with the facility and how you have the routers configured, the 
router may or may not 'see' an interface flap.  If the interface flaps, the 
route table will clear for that interface and traffic can reroute assuming 
that alternate paths exist.  A few packets will be lost in any kind of link 
event regardless of how you are configured.  During a reroute, there is 
the chance of mis-sequence of a small number of packets.  This is a non-issue 
for TCP and impact varies for UDP but usually it is minimal.  This is a 
very short duration event/possibility.  Our routers can load share on up 
to 6 paths.  We can do this for any IP routing protocol (IGP or EGP/BGP), 
even static routes and are not limited to OSPF.  Load sharing can be 
sequenced delivery of packets between each pair of IP endstations.  We 
also have link-by-link / packet-by-packet load sharing (sequence not 
guaranteed)  There is more about this technology on our web page under 
"CEF - Cisco Express Forwarding".  This is newer switching technology 
that most of the ISPs are implementing now.  It also supports rate shaping 
and IP QOS features at up to OC3 line rates.
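
The difference between the two load-sharing modes described above can be 
sketched as follows.  This is a generic illustration, not Cisco's CEF 
algorithm: the CRC-based hash, the path names, and the function names are 
all assumptions chosen for the example.

```python
import itertools
import zlib


def per_flow_path(src_ip, dst_ip, paths):
    """Pick one path per source/destination pair, so a flow's packets stay in order."""
    key = f"{src_ip}->{dst_ip}".encode()
    return paths[zlib.crc32(key) % len(paths)]


def per_packet_paths(paths):
    """Round-robin over the paths packet by packet; ordering is not guaranteed."""
    return itertools.cycle(paths)


paths = ["oc3-a", "oc3-b"]  # hypothetical names for the two OC-3 links

# Four packets of the same flow always take the same link.
flow = [per_flow_path("10.0.0.1", "10.0.1.1", paths) for _ in range(4)]

# Four packets in per-packet mode alternate between the links.
round_robin = per_packet_paths(paths)
packets = [next(round_robin) for _ in range(4)]

print("per-flow:  ", flow)
print("per-packet:", packets)
```

Per-flow selection keeps each conversation in sequence at the cost of 
uneven utilization when a few flows dominate; per-packet selection evens 
out the load but can deliver packets out of order.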

MM is usually less expensive.  If the Sonet ADM is co-lo on your facility 
or a Km or 2 away, MM would be fine.  Most carriers default to SM but this 
adds a per-port cost you may not need.  The primary deciding factor 
between MM and SM is length (assuming you need to install fiber in either 
case).  If you are within MM limits, verify that it is less expensive than 
SM and go with whatever costs less.  SM supports longer distances than 
MM and is available in various power output levels.  The Sonet ADM's will 
usually run SM on the ring for increased distance but use either SM or MM 
for connection between the DTE and the ADM.

Given that you will run two 100BT interfaces and probably a few OC3 links,
750X is the best choice.

From: Tony Li <tli at juniper.net>
-------------------------------
How many significant digits do you consider acceptable?  Even in an ideal
APS environment, link failure detection and protection switching does take
finite time.  You might get 99.999% uptime, but probably not 99.9999999%.
Methinks that you've been subjected to Marketing.  ;-)

From: Ken_King at 3com.com
------------------------
I thought I would ask you why you wanted to route at all when you have 
the entire OC3 at your disposal.  I have installed several switched 
connections just like the one that you described in Las Vegas & Albuquerque,
and have had no downtime in over 12 months.(Not due to our equipment at 
any rate)  I also have colleagues in the Phoenix office that have had NBII 
routers doing ATM for a couple of years.  Is there a reason why you wish 
to route rather than switch?  Are you planning on implementing voice and/or 
video?

From: Dan Martin <Dan.Martin at anixter.com>
------------------------------------------
1.  the ring they give you will live in a single ddm2000 or fujitsu sonet mux.  
they generally don't break, but stuff happens.  the way to manage their 
guarantee is to give them a chance to give you back a month's payment for every 
hour the circuit is down.  see how much money they are willing to bet on "100%" 
up time. you may even want to look at an atm hand off from bell atlantic into a 
broadband access device to leverage voice and FUNI services.

2.  if you use single mode it will be easier for bell atlantic to trouble shoot 
connections, they will only need test equipment with single mode interfaces.  
if you use multimode your atm interfaces will be a few thousand dollars 
cheaper.  we have one customer with multimode and one with single mode.  if you 
get single mode make sure you don't get any more than intermediate reach.  you 
need to be careful not to over drive their interfaces.  as bell atlantic would 
say "you don't want your stuff too hot"  they speak a secret language.

3.   i have 4 netbuilder IIs with atm interfaces that have been in production 
for over a year in an atm wan environment that have run fine.  the 3com routers 
may not say cisco on them anywhere but they do work.  if you have a bunch of 
frame relay connections and are getting atm you should look into funi.  3com 
did it at olsten on long island and i did it at EHS on long island.  

we've worked on a number of projects like yours and might be of some material 
assistance in getting this thing up and running.  

From: Sean Donelan <SEAN at SDG.DRA.COM>
--------------------------------------
100% uptime meaning Bell Atlantic will pay you some small token amount
when it fails? or 100% uptime meaning it will never fail?  Figure out
how much Bell Atlantic will pay you, and compute your risk factor.  However,
you are getting into the insanity region of availability vs. disaster
planning, category 5 hurricane, regional firestorm, sabotage or
terrorist attack, etc.  Yes, all have happened, and will happen again.
But paying a bigger insurance premium may be cheaper for these extremes.

Even a perfectly functioning SONET ring will have some delay as the
APS switches to the protect circuit.  So make sure your application
can tolerate whatever the maximum switching delay.  Most carriers
consider SONET outages of less than 60ms 'normal' and an overall
SONET end-user restoral of 2 seconds 'acceptable.'

SONET also has its own set of hardware failures (e.g. APS controller
failure), user errors (e.g. improper provisioning, improper maintenance),
and the dreaded multiple failure modes (e.g. a fiber cut during a
Forced Switched, double Forced Switches in the same ring at the same
time, and other types of ring partitions), and software/firmware bugs.
Humans are always a single point of failure in any system.

In the last carrier summary report I saw the number of SONET failures
was very small, but was greater than zero.  The number of carriers
reporting was also small, making any statistical relevance of the
numbers virtually nil.  On average less than 5.5 SONET outages a
year were reported across 7 carriers.

3com is more likely to jump when a customer with 500 3com routers
has a problem than cisco will jump for someone with only a couple of
cisco routers.  Assuming cisco will even agree to sell their favored
products to anyone without their seal of approval.  And you already
know the capabilities and operation of 3com.

On the other hand, cisco's SONET products have had the extensive
testing and feedback from a number of noted luminaries/customers
who haven't been shy about encouraging cisco to rectify any apparent
shortcomings.  So you won't be the leading edge trailblazing customer.
And since it's a good bet Bell Atlantic will blame any problems you
encounter on your CPE equipment, using the CPE equipment which Bell
Atlantic 'related subsidiaries' resells may cut short at least one round
of finger-pointing.

From: "Roeland M.J. Meyer" <rmeyer at mhsc.com>
---------------------------------------------
For total system uptime
90.0% (one nine or less) Desktop systems.
99.0% (two nines) Intermediate business systems
99.9% (three nines) Most business data systems and workgroup servers
99.99% (four nines) High-end business systems and your friendly
neighborhood telco
99.999% (five nines) Bank Data Centers and Telco Data Centers, some ISPs
99.9999% (six nines) Only God and Norad live here.
99.99999% (seven nines) Even God doesn't have pockets this deep.

There is a matching exponential cost increment with each step.
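
To put the "nines" above in terms of clock time, here is a minimal sketch 
that converts each availability level into allowed downtime per year.  It 
assumes a 365-day year, and the function name is illustrative only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a non-leap year


def downtime_minutes_per_year(availability_percent):
    """Return the downtime per year, in minutes, implied by an availability percentage."""
    return MINUTES_PER_YEAR * (1.0 - availability_percent / 100.0)


for pct in (90.0, 99.0, 99.9, 99.99, 99.999, 99.9999, 99.99999):
    minutes = downtime_minutes_per_year(pct)
    print(f"{pct}% -> {minutes:.2f} minutes/year ({minutes * 60:.1f} seconds)")
```

Three nines allows roughly 8.8 hours of downtime a year, four nines under 
an hour, and five nines only about five minutes.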

From: Alex Bligh <amb at gxn.net>
-------------------------------
Don't just buy from one carrier. Even if they give you dig plans,
and diversity warranties, they are likely to reroute things without
telling you. Even if you get around this one, and all the other problems,
and are satisfied you have permanent, true, diversity which will protect
you against any one fiber break, it doesn't protect you against a
procedure break, like someone terminating the wrong circuit on misreading
a circuit ID.

The above is certainly true in the UK, and from my experience in the US
I'd think it's doubly or quadruply true.

Also, use the lowest level routing redundancy you can find. I may get flamed
for this, but if bandwidth is not an issue, you might consider ATM switches
instead especially if cutover time is critical.

From: "David Greer" <david-greer at gnc-hq.com>
--------------------------------------------
If you are going to use Cisco which is a good choice, do not go cheap and
get a 7200 for the job.  A 7200 can only handle three fast ethernet ports.
I would look at the 7500 series or possibly their gigabit router series.
That will give you enough muscle to handle the job.  I would call in your
local Cisco office to check out which model is most appropriate.

Nobody gets 100% uptime from anything.  Can't happen!

Look at the distance, I believe multi mode is only good for a couple of
miles, then you have to go single mode, which will drive your costs up.

From: "Dave Cooper" <dave_cooper at eli.net>
-----------------------------------------
In addition to what Alex has stated, if you are purchasing this OC-12c or 
OC-3c from a single carrier, you might want to check that the carrier has
a 'dual-entrance' into your building.  Although the "main" fiber backbone
may be truly ringed and redundant, it is common practice for RBOCs and
CLECs to spur off the backbone and bring the fibers into the building via 
SINGLE sheath.  This subjects the spur to backhoes or augers that might be 
digging up the sidewalks in front of your building.  Most large data centers 
require the telco/clec to enter the facility (via fiber) from two diverse 
entry points.  This literally brings their main backbone "through" the 
facility, thereby, truly preventing a fiber-cut that will take down your 
OC-12/OC-48 Sonet gear.  Good dual-entries will even land the two diverse 
fiber runs on two separate FODUs in the event that someone is moving fibers 
or reterminating. (However, this kind of fiber build usually requires a
revenue commitment from the customer since it costs three times as much as
a standard spur build.)  Might be a good thing to check out if your 
applications are very mission critical.

From: Ken_King at 3com.com
------------------------
You might want to take a look at the PathBuilder s600/700(formerly known as
the Accessbuilder 9600)  These ATM access devices are being used in RBOCs
all over the company and they are very solid.  I will enclose a couple of
PDF files for you.

From: john heasley <heas at shrubbery.net>
---------------------------------------
fwiw, we found it easier to use SM for all applications.  a wee bit
more expensive, but it is easier to spare parts and distance is
hardly ever an issue.

From: Dave Cooper <dave_cooper at eli.net>
----------------------------------------
>More seriously - SLA's that specify a sampling period then also give an
>indication what is considered too long an outage. If you get just under the
>.1% downtime allowed per year all in one go you may well be pretty pissed
>at being told the 8 hour outage was within the SLA.
 
The quasi Engineering guidelines for many CLECs when calculating average
downtime over a year's span is 52 minutes (meaning 0.01% downtime over
the year).  Anything above and beyond this estimate would be suspect.
Obviously, these Engineering baselines vary from carrier to carrier. 
Also, this 52 minute guideline relates to the SONET ring and the muxes
and not the tributaries (OC-3 or OC-12) or the optical/electrical hand-offs
that might fail due to bad terminations/bad wiring/or misconfigured nodes.
A common failure for OC-3c or OC-12c is the 2-fiber optical handoff to the
customer which has nothing to do with the SONET ring itself or the associated
SONET gear.


