RS/960 upgrade status report for Week 2

mak mak
Thu May 7 05:09:27 UTC 1992


		Phase-III RS960 Deployment Status Report - Step 2
		=================================================
		Jordan Becker, ANS		Mark Knopper, Merit

	Step 2 of the phase-III network upgrade was successfully completed
last Saturday 5/2.  The following T3 backbone nodes are currently running with
new T3 hardware and software in a stable configuration:

Seattle POP:	CNSS88, CNSS89, CNSS91
Denver POP:	CNSS96, CNSS97, CNSS99
San Fran. POP:	CNSS8,  CNSS9,  CNSS11
L.A. POP:	CNSS16, CNSS17, CNSS19
Regionals:	ENSS141 (Boulder), ENSS142 (Salt Lake), ENSS143 (U. Washington)
		ENSS128 (Palo Alto), ENSS144 (FIX-W), ENSS135 (San Diego)

	CNSS8, CNSS16, CNSS96 are now running with mixed technology (e.g.
3xRS960 T3 interfaces, 1xHawthorne T3 interface).  Production traffic on the
affected Bay Area ENSS nodes was cut over to the T1 backbone at 2:00 AM EST on
5/2.  Production traffic on ENSS135 was cut over two hours earlier.  The San
Francisco and Los Angeles POP nodes were returned to full service by 10:50 AM
EST on 5/2, well within the planned maintenance window.

	The maintenance in the Los Angeles POP was complicated by the curfew
in effect at that time.  Normally a specially trained 2-3 person deployment
team is scheduled to perform these upgrades at each POP location.  Because of
the circumstances in Los Angeles, a single IBM engineer (Carl Kraft) from
Gaithersburg, Maryland was sent to the Los Angeles POP to perform the
upgrade by himself.  Carl was able to upgrade the node single-handedly on
schedule.

	Several new procedures were developed following the first deployment
step in Seattle and Denver on 4/25.  These procedures helped to shorten the
installation window and to reduce the number of installation problems
relative to those experienced on 4/25.

	The only problem experienced was the suspected failure of a single
RS960 adapter during the installation at ENSS128 (Palo Alto).  This problem
was isolated to the adapter within several minutes and the adapter was swapped,
resulting in successful operation of the node.  A subsequent failure analysis
of the RS960 adapter did not reproduce the problem, which has been attributed
to improper seating of the adapter in ENSS128 during the initial
installation.


Next Steps
==========
	Based upon the successful completion of step 2 of the deployment,
step 3 is currently scheduled to commence at 23:00 local time on 5/8.  Step 3
will involve the following nodes/locations:

Chicago POP:		CNSS24, CNSS25, CNSS27
Cleveland POP:		CNSS40, CNSS41, CNSS43
New York City POP:	CNSS32, CNSS33, CNSS35
Hartford POP:		CNSS48, CNSS49, CNSS51
San Fran. POP:		CNSS8 (Second visit to CNSS8->CNSS24 Interface)

Regionals:		ENSS130 (Argonne), ENSS131 (Ann Arbor),
			ENSS132 (Pittsburgh), ENSS133 (Ithaca),
			ENSS134 (Boston), ENSS137 (Princeton)

Other ENSS's Affected: 	E152, E162, E154, E158, E167, E168, E171, E172, E163,
			E155, E160, E161, E164, E169

	The system software (build 2.78.22) required to support RS960
installations has been fully deployed to all T3 network nodes as of
early this week.  New rcp_routed software has also been installed on
all T3 nodes, although this is not a pre-requisite for any phase-III
deployment activities.  The new rcp_routed software has enhancements
including support for externally administered inter-AS metrics, an
auto-restart capability, and a fix for the invalid acceptance by the
ENSS of a route to itself from a peer.

	Following the step 3 deployment, selected T3 internal link metrics
will be adjusted to support load balancing of traffic across the 5 different
hybrid technology links that will exist.  These link metrics were chosen by
calculating the traffic distribution on each link from an AS<->AS traffic
matrix.  This step of the deployment involves 4 POPs, and will complete the
coast-to-coast RS/960 path for a large proportion of the backbone traffic.
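
	The report does not describe the calculation itself.  As a rough
illustration of the general idea only, the sketch below routes each entry of
a traffic matrix along the lowest-metric path and sums the resulting load on
each link, so that candidate metrics can be compared.  The node names, metric
values, traffic volumes, and function names are all hypothetical, and
single-path shortest-path routing is an assumption, not a description of the
T3 backbone's routing.

```python
import heapq
from collections import defaultdict


def shortest_path(links, src, dst):
    """Dijkstra over an undirected graph; links maps (a, b) -> metric."""
    adj = defaultdict(list)
    for (a, b), metric in links.items():
        adj[a].append((b, metric))
        adj[b].append((a, metric))
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, metric in adj[node]:
            nd = d + metric
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if dst not in dist:
        return None  # destination unreachable
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def link_loads(links, traffic):
    """Sum the traffic placed on each link; traffic maps (src, dst) -> volume."""
    loads = defaultdict(float)
    for (src, dst), volume in traffic.items():
        path = shortest_path(links, src, dst)
        if path is None:
            continue
        for a, b in zip(path, path[1:]):
            key = (a, b) if (a, b) in links else (b, a)
            loads[key] += volume
    return dict(loads)


if __name__ == "__main__":
    # Hypothetical four-node topology and traffic matrix (illustrative only).
    links = {("A", "B"): 1, ("B", "D"): 1, ("A", "C"): 1, ("C", "D"): 2}
    traffic = {("A", "D"): 10.0, ("D", "A"): 5.0}
    print(link_loads(links, traffic))
    # Raising one metric shifts traffic onto the alternate path.
    links[("B", "D")] = 3
    print(link_loads(links, traffic))
```

	Comparing the per-link totals before and after a metric change shows
how much traffic a given adjustment would move between parallel paths.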

	During the step 3 deployment, the Ann Arbor ENSS will be
isolated from the T3 backbone. Since the Merit/ANS NOC is located
in Ann Arbor, and Merit's backup connectivity to the backbone will be
through the T1 network, we are implementing a backup network
management machine. The "rover" monitoring tool is running on an
unused RS/6000 CNSS at the Denver POP, and its data collection
capability will be used if there is any problem with Merit's
connection to the T3 backbone. 

	Also during this deployment the Princeton ENSS will be
isolated from the backbone. This means that both the Ann Arbor
T1/T3 interconnect gateway and its backup at Princeton will not
be operational. Therefore on Friday night we will run the Houston
interconnect as primary, and temporarily configure the San Diego
interconnect gateway as secondary with load sharing being handled
as with the Ann Arbor/Houston configuration.







