T3 backbone stability
Thu Feb 13 06:34:48 UTC 1992
I have summarized and enclosed below our trouble tickets written over
the last 5 days for the T3 network. The only problem that affected more than
one peer site was a crash of CNSS40 at Cleveland yesterday afternoon due to a
hardware problem. While a CNSS crash of this type would not normally cause
users to lose connectivity, thanks to backbone redundancy, this crash did
result in a connectivity loss since the Ann Arbor interconnect E131, which is
homed into the Cleveland POP, became reachable only via the safety-net T1
links. The
interconnect gateway was switched over to Houston during this time. This
resulted in a 25 minute outage of the interconnect.
The last T3 router crash we had was several weeks ago, and the last
hardware-induced crash was several months ago. While this is an undesirable
event, I suspect that SURAnet's use of the T3 backbone may actually reduce
your dependency on the interconnect gateways.
We have installed new software (build 64 and new rcp_routed) across
the T3 system during the last two weeks with additional performance and
reliability enhancements including improved aggregation of interior routing
updates, faster convergence time, and reduced CPU utilization.
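The "improved aggregation of interior routing updates" mentioned above can be illustrated with a small sketch. The message does not describe rcp_routed's actual algorithm, so this is only an assumed illustration of prefix aggregation using Python's standard library; the example prefixes are hypothetical.

```python
import ipaddress

def aggregate(prefixes):
    """Collapse a list of prefixes into the smallest covering set.

    A rough sketch of what aggregating routing updates might look
    like; rcp_routed's real algorithm is not described in the message.
    """
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]

# Four contiguous /24s collapse into a single /22 announcement,
# shrinking the routing update that must be propagated:
print(aggregate(["192.168.0.0/24", "192.168.1.0/24",
                 "192.168.2.0/24", "192.168.3.0/24"]))
```

Fewer announced prefixes means smaller updates and less CPU spent processing them, which is consistent with the reliability goals listed above.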
Operationally, we have begun the on-call schedule for the new NNAF
engineering group, which has resulted in more detailed NSR reports on
these scheduled and unscheduled events. This event, coupled with the
numerous scheduled software installations, may have given you the false
impression that there was an increase in problems.
My general conclusion is that while we still have some problems
with the current T3 adapter technology, these have been manageable and the
T3 backbone is still very reliable. We will carefully monitor network
reliability for changes as we add additional traffic to the T3 system. The T1
network is not nearly as reliable, and we are busy working on those problems.
Peer network router problems:
18410 - SURAnet sura7.sura.net
18429 - BARRnet equipment move
18441 - Pittsburgh power failure
Backbone router hardware problems:
18423 - cisco serial interface, Xlink (Germany)
18426 - cnss40 spare t3 card removed from backplane to reduce
frequency of black links. We have been getting about
3 black links per month on one interface on this cnss.
18432, 18491, 18509 - enss129 fddi card hang, manual reset
18465 - enss129 fddi card replacement scheduled maintenance
18504 - cnss40 crash resulting in interconnect switchover
Routing configuration problems:
18503 - Missing ibgp line in cnss48 and 49 for new enss164 at IBM Watson
Scheduled Backbone router software upgrades:
18414 - new rcp_routed on enss131 for better route aggregation
18416 - new build 64 on enss163 for performance improvements
18424 - new rcp_routed on cnss83
18425 - new rcp_routed on enss135, 137, 139
18430 - new rcp_routed on enss129, 132
18431 - new build 67 on enss135, 129, 132 (to fix fddi bug in build 64)
Site maintenance tickets, no downtime:
18451, 18452, 18453, 18454, 18455, 18456, 18457 - Perform spare parts
inventory at Cleveland and Hartford POPs, and at ENSS sites
128, 135, 129, 132, 133, 134.
Date: Wed, 12 Feb 92 18:18:11 EST
To: nwg@merit.edu
From: oleary@sura.net
Subject: T3 backbone stability
We are cutting over to use the T3 backbone now, and I am concerned
by the several recent messages about ENSS and CNSS problems.
Could someone at Merit summarize some of the recent outages if
there have been trends, or could some of the other midlevels
provide us with some insight as to how (in)stability has
affected your connections?
We are going with the emerging standard of sending T3 stuff to the T3
and T1 stuff to the T1, explicitly importing T3 routes and defaulting
everything else to the T1.
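The split policy described above (explicitly imported T3 routes, with everything else defaulting to the T1) can be sketched as a simple lookup. This is only an illustrative model, not SURAnet's actual configuration; the gateway names and the 140.222.0.0 prefix are placeholders.

```python
def select_next_hop(destination, t3_routes, t1_default="t1-gateway"):
    """Prefer an explicitly imported T3 route; otherwise fall back
    to the T1 default. Gateway names here are illustrative only."""
    return t3_routes.get(destination, t1_default)

# Hypothetical table with one explicitly imported T3 route:
t3_routes = {"140.222.0.0": "t3-gateway"}
print(select_next_hop("140.222.0.0", t3_routes))  # takes the T3 path
print(select_next_hop("10.0.0.0", t3_routes))     # defaults to the T1
```

The design keeps T1-only destinations off the T3 backbone while letting T3-reachable networks use the faster path.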