August Backbone Engineering Report



                ANSNET/NSFNET Backbone Engineering Report

                                 August 1992

                Jordan Becker, ANS      Mark Knopper, Merit
                becker@ans.net          mak@merit.edu


T3 Backbone Status
==================
        The system software and routing software for the T3 routers
have stabilized.  The new RS/960 FDDI card has completed testing,
and deployment scheduling is in progress.  A new system software
build with support for 10,000 routes maintained locally on the
smart-card interfaces is being tested on the T3 Research Network.

        Planning is now underway for the dismantling of the T1
backbone, which is targeted for November.  Steps to be completed
first include support for OSI CLNP transport over the T3 backbone
and deployment of the redundant backup circuit plan for the T3 ENSS
gateways at each regional network.

	Further activities in support of the Phase IV upgrade to the
T3 backbone are in progress.


Backbone Traffic and Routing Statistics
=======================================
	The total inbound packet count for the T1 network during
August was 3,903,906,145, down 17.9% from July.  298,961,253 of these
packets entered from the T3 network.

	The total inbound packet count for the T3 network was
13,051,979,670, up 1.3% from July.  129,835,094 of these packets
entered from the T1 network.

        The combined total inbound packet count for the T1 and T3
networks (less cross-network traffic) was 16,527,089,468, down 3.1%
from July.  Reports on T3 backbone byte counts for June, July and
August were incorrect due to SNMP reporting problems; corrected
reports will be available soon on the nis.nsf.net machine.  The
corrected byte totals for June, July, and August are 2.279, 2.546,
and 2.548 trillion, respectively.
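
        As a cross-check, the combined figure can be reconciled
directly from the per-network counts above; a minimal sketch in
Python (the variable names are ours):

     # Cross-check of the August combined inbound packet count.
     t1_total = 3_903_906_145    # total inbound packets, T1 network
     t3_total = 13_051_979_670   # total inbound packets, T3 network
     t3_to_t1 = 298_961_253      # packets entering T1 from the T3
     t1_to_t3 = 129_835_094      # packets entering T3 from the T1

     # Cross-network packets appear in both backbone totals, so they
     # are subtracted once to avoid double counting.
     combined = t1_total + t3_total - t3_to_t1 - t1_to_t3
     print(combined)             # 16,527,089,468 as reported above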

	As of August 31, the number of networks configured in the
NSFNET Policy Routing Database was 6360 for the T1 backbone, and 5594
for the T3 backbone. Of these, 1233 networks were never announced to
the T1 backbone and 1102 were never announced to the T3 backbone.  For
the T1, the maximum number of networks announced to the backbone
during the month (from samples collected every 15 minutes) was 4866;
on the T3 the maximum number of announced networks was 4206.  Average
announced networks on 8/31 were 4817 to T1, and 4161 to T3.


New FDDI Interface Adapter for ENSS Nodes 
========================================= 
     We have a new RS960 FDDI adapter for the RS/6000 router that
provides much improved reliability and performance.  We had hoped
that this adapter, targeted to replace the older 'Hawthorne'
technology FDDI adapters in the T3 ENSS routers, would be ready for
deployment in early August.  However, several serious bugs were
encountered during testing in late July, and the upgrade has been
delayed by more than a month.
 
     Fortunately, all of the known bugs have now been corrected or
worked around.  We are re-running our full suite of regression
tests, and a full set of stress tests on the T3 test network, during
the Labor Day weekend.  Pending successful completion of these
tests, we expect that the first set of FDDI adapter upgrades on the
production T3 ENSS nodes could begin during the week of 9/7.  We
plan to begin with the installation of the new interface adapters at
ENSS128 (Palo Alto), ENSS135 (San Diego), ENSS129 (Champaign), and
ENSS132 (Pittsburgh), and will develop plans for any further FDDI
deployments after these first four installations have been
successfully completed.
 
 
Dismantling the T1 Backbone
===========================
	The current target for dismantling the T1 backbone is November
'92.  This can be accomplished once the remaining networks using the
T1 backbone have been cut over to the T3 backbone (these are: ESnet,
EASInet, Mexican Networks at Boulder, and CA*net); an OSI CLNP
transport capability over the T3 backbone is in place; the T3 ENSS
nodes are backed up by additional T1 circuits terminating at alternate
backbone POPs; and the network-to-network source/destination pair
statistics matrix is available on the T3 backbone. These activities
are described below.  Since the RCP nodes on the T1 backbone are
experiencing further congestion and performance problems due to the
growth in networks, we plan to reduce the number of networks
announced to the T1 nodes by the T3 interconnect gateways.  As a
result, for those networks yet to cut over, the T3 network will no
longer back up the T1 network in the event of a T1 failure.

Remaining Network Cutovers
--------------------------
	The ESnet cutover is waiting for a new version of software to
be configured for the ESnet router peers at FIX-West and FIX-East.
The Mexican autonomous system will be cut over soon, pending
communication with the operators in Mexico.  We are developing a
plan that will allow EASInet to peer directly with the T3 network.
The plan for CA*net is to remove the RTs from the token rings on the
NSS nodes at Seattle, Princeton, and Ithaca, configure them to run
the CA*net kernel and gated, and peer directly across the ethernet
to the T3 ENSS at these sites.

OSI Support Plan
----------------
     In order to dismantle the T1 backbone, we need to support the
transport of OSI (CLNP) packets across the T3 network.  Because the
T1 dismantling is targeted for late 1992 while the T3 backbone
software for OSI support is still in test, we will proceed with a
phased (multi-step) migration to OSI switching over the T3 network,
in order to ensure network stability as OSI support is introduced.
The migration plan involves several steps:

1.   Convert the RT/PC EPSP routers that reside on the shared ENSS
     LANs into OSI packet encapsulators.  This would be done at the 8
     or so sites whose regional networks currently support OSI
     switching services.  OSI traffic is encapsulated in an IP packet
     on the RT router, forwarded as an IP packet across the T3
     network, and de-encapsulated by an RT at the destination (see
     the encapsulation sketch following this list).  This software
     already exists and can support the migration of OSI traffic off
     of the T1 backbone with no software changes required to the T3
     backbone.  It is entering test now and could be running in
     production by early October.

2.   Introduce new RS/6000 OSI encapsulator systems running the AIX
     3.2 operating system with native CLNP support.  These machines
     will replace the RT OSI encapsulators on the shared ENSS LANs.
     As the CLNP software becomes more stable, the RS/6000 systems
     can begin to support non-encapsulated dynamic OSI routing.
     There are still no changes required to the production T3 network
     software in this step.  This step could occur sometime in
     mid-fall.

3.   Deploy the AIX 3.2 operating system and native CLNP switching
     software on the T3 routers across the backbone.  The experience
     gained in step 2 above will facilitate this migration.  This
     step is expected sometime in January 1993.
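
     To illustrate the encapsulation in step 1: the CLNP PDU rides
as the payload of an ordinary IP datagram, so the T3 routers forward
it as plain IP.  The minimal Python sketch below shows the idea; it
is not the RT encapsulator code, and the IP protocol number 80
(ISO-IP) and the addresses are assumptions for illustration.

     import socket
     import struct

     ISO_IP = 80   # IANA protocol number for ISO-IP (CLNP)

     def encapsulate_clnp(clnp_pdu, src_ip, dst_ip):
         # Wrap a CLNP PDU in a minimal IPv4 header so the backbone
         # forwards it as an ordinary IP packet.  The header checksum
         # is left zero here; a real encapsulator must compute it.
         header = struct.pack(
             "!BBHHHBBH4s4s",
             (4 << 4) | 5,            # version 4, header length 5 words
             0,                       # type of service
             20 + len(clnp_pdu),      # total length
             0, 0,                    # identification, flags/fragment
             64,                      # time to live
             ISO_IP,                  # payload protocol: CLNP
             0,                       # header checksum (omitted)
             socket.inet_aton(src_ip),
             socket.inet_aton(dst_ip))
         return header + clnp_pdu

     # The de-encapsulator at the destination ENSS strips the IP
     # header and hands the CLNP PDU (NLPID 0x81) to its OSI code.
     packet = encapsulate_clnp(b"\x81" + b"\x00" * 20,
                               "192.0.2.1", "192.0.2.2")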

T1 ENSS Backup Circuits
-----------------------
	The T1 backbone is currently providing backup connectivity in
the event of a problem with the T3 backbone. Since the T3 ENSS nodes
are currently singly-connected to a CNSS at an MCI POP, the single T3
circuit and CNSS node represent a single point of failure.  As a
backup plan, each T3 ENSS will be connected to a new T1 circuit which
terminates at a different backbone POP CNSS.  This will allow bypass
recovery in the event of circuit or CNSS failure.  We are executing a
test plan on the test network to measure internal routing convergence
times and end-user observations during a backup transition.  These
circuits are being ordered now and are expected to be in place by late
October.
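
     One way to observe a backup transition from the edge, in the
spirit of this test plan, is to send periodic probes through the
failover and measure the longest gap in responses.  A minimal Python
sketch, assuming a Linux-style ping utility and a hypothetical
target address behind the ENSS under test:

     import subprocess
     import time

     TARGET = "192.0.2.1"   # hypothetical host behind the ENSS

     def longest_outage(duration_s=300, interval_s=0.5):
         # Probe once per interval while the T3 circuit is failed
         # over to the backup T1; the longest run of lost probes
         # approximates the convergence time seen by end users.
         gap_start, longest = None, 0.0
         end = time.time() + duration_s
         while time.time() < end:
             ok = subprocess.run(["ping", "-c", "1", "-W", "1", TARGET],
                                 capture_output=True).returncode == 0
             now = time.time()
             if not ok and gap_start is None:
                 gap_start = now                  # outage begins
             elif ok and gap_start is not None:
                 longest = max(longest, now - gap_start)
                 gap_start = None                 # outage ends
             time.sleep(interval_s)
         return longest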

Network Source/Destination Statistics
-------------------------------------
     During the migration to the smart card forwarding technology
(RS960/T960) we temporarily lost the ability to collect network
source/destination pair traffic statistics.  This is because packets
were no longer passing through the RS/6000 system processor where the
statistics collection application software ran.  We are now testing
new software for near-term deployment that will allow us to continue
to collect statistics for each network source/destination pair.  These
statistics include packets_in, packets_out, bytes_in, and bytes_out.
The statistics will be cached on the RS960 and T960 interfaces and
uploaded to the RS/6000 system for processing and transmission to a
central collection machine.
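
     The collection itself amounts to maintaining four counters per
network source/destination pair.  A minimal sketch of the
aggregation step (the on-card caching and the upload to the central
collection machine are not shown; the names are illustrative):

     from collections import defaultdict

     # Four counters per source/destination network pair, matching
     # the statistics named above.
     pair_stats = defaultdict(lambda: {
         "packets_in": 0, "packets_out": 0,
         "bytes_in": 0, "bytes_out": 0,
     })

     def count_packet(src_net, dst_net, length, inbound):
         # Update the matrix for one forwarded packet; src_net and
         # dst_net are the packet's source and destination networks.
         entry = pair_stats[(src_net, dst_net)]
         if inbound:
             entry["packets_in"] += 1
             entry["bytes_in"] += length
         else:
             entry["packets_out"] += 1
             entry["bytes_out"] += length

     count_packet("35.0.0.0", "192.0.2.0", 270, inbound=True)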



Increase Routing Table Sizes on T3 Network
==========================================
     We continue to experience an increase in the number of
ANSNET/NSFNET advertised networks (see Backbone Traffic and Routing
Statistics, above).  The current on-card routing table size on the
T3 router RS960 card (T3/FDDI) and T960 card (T1/ethernet) supports
6,000 destination networks with up to 4 alternate routes per
destination.  The current on-card routing tables are managing on the
order of 12K routes (including alternate routes to the same
destination).

     We are now testing new software for the RS960 and T960
interfaces that supports up to 10,000 destination networks with up
to 4 alternate routes per destination.  This software will be
deployed on the T3 network in the near future.

     We also continue to work on support for on-card route caching
which will significantly increase the upper limit on the number of
routes to unique destination networks that may be supported.  This
software will be available with the AIX 3.2 operating system release
of the router software in early 1Q93.
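
     On-card route caching trades table size for a miss path: the
card holds only recently used destinations and falls back to the
full routing table on a miss.  A minimal sketch of the idea; the
fallback to the RS/6000 system processor is our illustrative
assumption, not a description of the actual implementation:

     from collections import OrderedDict

     CARD_CAPACITY = 10_000  # destinations held on-card (new build)

     class RouteCache:
         # LRU cache of destination-network -> next-hop entries,
         # evicting the least recently used route when full.
         def __init__(self, capacity=CARD_CAPACITY):
             self.capacity = capacity
             self.routes = OrderedDict()

         def lookup(self, dest_net):
             if dest_net in self.routes:
                 self.routes.move_to_end(dest_net)  # recently used
                 return self.routes[dest_net]
             next_hop = self._resolve_off_card(dest_net)  # cache miss
             if len(self.routes) >= self.capacity:
                 self.routes.popitem(last=False)    # evict LRU entry
             self.routes[dest_net] = next_hop
             return next_hop

         def _resolve_off_card(self, dest_net):
             # Placeholder for consulting the full routing table,
             # e.g. on the RS/6000 system processor.
             raise NotImplementedError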



Phase-IV T3 Network Upgrade Status
==================================
     The scheduled upgrades to the T3 backbone discussed in the July
report are continuing on schedule and will allow the dismantling of
the T1 backbone.  The major features of this plan include:
 
1)   T3 ENSS FDDI interface upgrades to new RS/960 card.  This is
     currently being scheduled at 4 regional sites.
 
2)   T3 ENSS backup connections are being installed.  A T1 circuit
     will be installed at each T3 ENSS to allow a backup connection
     to a different CNSS.  This will provide some redundancy in the
     case of T3 circuit or primary CNSS failure.  These circuits are
     scheduled for cut-in in October.
 
3)   T3 DSU PROM upgrades.  A problem was uncovered in testing the
     new DSU firmware, which supports additional SNMP function and
     fixes a few non-critical bugs.  A fix for the problem has since
     been provided; however, the testnet has been occupied with FDDI
     and other system testing, so the DSU upgrades that were
     scheduled to begin on 9/14 will be postponed until early
     October.

4)   The existing set of CNSS routers in the Washington D.C. area will
     be moved to an MCI POP in downtown Washington D.C. on 9/12 for
     closer proximity to several ENSS locations.  The tail circuits of the
     existing network attachments to this POP will be reduced to local
     access circuits only. 

5)   The installation of a new CNSS in Atlanta is scheduled for 9/26 to
     reduce the GA Tech T3 tail to local access only, and provide
     expansion capability in the southeast. 



T3 Network Performance Enhancements
===================================
     The general approach to engineering the T3 network has been to
prioritize enhancements that improve stability over those that
improve performance.  Since the T3 network RS960 upgrade in May '92,
the stability of the network has been very good, and we have been
able to devote more resources to the performance of the network,
which has also improved significantly.  With the upcoming deployment
of the new RS960 FDDI adapter, we expect to observe higher peak
bandwidth utilization across the T3 network and higher aggregate
packet traffic rates.  In anticipation of this, we have conducted
baseline performance measurements on the T3 network that serve as a
basis for continued tuning and improvement over time.

T3 Network Delay
----------------
     To analyze the delay across the T3 ANSNET, we start by
measuring the delay incurred by each T3 router hop, and then measure
the circuit propagation delay across all backbone circuits.  We have
MCI T3 circuit route mileage figures, which can be combined with
PING measurements to determine how much each hop through a T3 router
adds to the round trip time.

     A set of round trip delay measurements was made using a special
version of PING that records timestamps from the AIX system clock
with microsecond precision.  The technical details of the
measurements may be described in a future report.  The end result is
that the round trip transit delay through a T3 router was measured
at about 0.33 ms (0.165 ms one way), with a maximum variance across
all samples on the same router of 0.03 ms.  At current load, the T3
routers exhibit very little variance in delay.  The per-hop router
transit delay is therefore negligible compared to the T3 circuit
propagation delay.
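
     The calibration itself is simple arithmetic: subtract the
propagation delay implied by the circuit mileage from the measured
round trip time, and spread the remainder across the router hops.  A
minimal sketch, assuming roughly 8.2 microseconds of one-way fiber
propagation per mile; the mileage and RTT inputs shown are
illustrative, not MCI figures:

     US_PER_MILE = 8.2   # assumed one-way propagation, us per mile

     def per_hop_rtt_ms(rtt_ms, circuit_miles, router_hops):
         # Remove the mileage-based propagation component from the
         # measured RTT and spread the remainder across the hops.
         propagation_rtt_ms = 2 * circuit_miles * US_PER_MILE / 1000.0
         return (rtt_ms - propagation_rtt_ms) / router_hops

     # Illustrative inputs: a 67 ms RTT over roughly 3,900 route
     # miles with 8 router hops leaves about 0.4 ms per router round
     # trip, the same order as the 0.33 ms measurement above.
     print(per_hop_rtt_ms(67.0, 3_900, 8))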

     The round trip delay between the Washington POP and the San
Francisco POP is about 77 ms for packets traversing the southern
route (Washington->Greensboro->Houston->Los Angeles->San Francisco)
and about 67 ms for packets traversing the northern route
(Washington->New York->Cleveland->Chicago->San Francisco).
 
     During the timeframe of the "Hawthorne" technology routers, it
was appropriate to choose internal routing metrics that balanced
load across redundant T3 paths and minimized transit traffic on the
routers.  With RS960 technology, however, the requirement to balance
load, minimize transit traffic and hop count, and maintain equal
cost paths is no longer justified.  With the introduction of the new
Atlanta CNSS, we will explore adjusting the internal T3 link metrics
to minimize ENSS<->ENSS round trip latency.  This will improve
overall network performance as perceived by end users.  In summary,
on T3 network latency:
 
(1)  Delays due to multiple hops in the T3 network are measurable,
     but not large enough to be significant.  The observed T3 ANSNET
     one way delay for a single T3 router hop is 0.165 ms per router
     (about 1.3 ms cross-country one way delay over 8 router hops).
     This is negligible compared with cross-country propagation
     delays (e.g. 35 ms one way).  It would require the addition of
     30 T3 routers to a path to add 10 ms to the unloaded round trip
     time, given constant circuit mileage (see the arithmetic sketch
     following this list).
 
(2)  For small packets, like the default for ping and traceroute, the round 
     trip delay is mostly dependent on circuit mileage, and is relatively
     independent of bandwidth (for T1 and beyond, at least).
 
(3)  All T3 links within the network are maintained at equal cost
     link metrics regardless of physical mileage.  This was designed
     during the timeframe when RS/6000 routers were switching packets
     through the system processor, and hop count and transit traffic
     through the router were important quantities to minimize.  With
     the introduction of pure adapter-level switching (i.e. no
     RS/6000 system processor involved in switching user datagrams),
     minimizing hop count and router transit traffic becomes less
     important, and minimizing overall ENSS<->ENSS delay becomes more
     important.

(4)  The T3 ANSNET maintains two different physical circuit routes
     between Washington D.C. and Palo Alto.  These routes carry equal
     cost metrics and therefore split the traffic load between them;
     however, one of the physical routes is about 600 miles longer
     than the other.  This can introduce problems involving
     asymmetric routes internal to the T3 network, and sub-optimal
     latency.  The T3 ANSNET circuits are physically diverse to avoid
     large scale network failures in the event of a fiber cut, and
     compromising physical route diversity is not planned.  However,
     some reduction of real T3 circuit mileage (and therefore about
     5 ms of delay) may be possible with the installation of the
     Atlanta POP CNSS in September.  ANS is conducting a review with
     MCI to determine whether the
     Washington->Greensboro->Houston->Los Angeles->Hayward physical
     route can be reduced in total circuit miles without compromising
     route diversity.  This might be possible as part of the plan to
     co-locate equipment within Atlanta.
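
     The arithmetic behind items (1) and (4) can be checked directly
from the figures above (the 8.2 microseconds/mile propagation figure
is again an assumption):

     PER_HOP_RTT_MS = 0.33   # measured round trip delay per router
     US_PER_MILE = 8.2       # assumed one-way propagation, us/mile

     # Item (1): routers needed to add 10 ms to the unloaded RTT.
     print(10.0 / PER_HOP_RTT_MS)        # ~30 routers

     # Item (1): cross-country one way delay from 8 router hops.
     print(8 * PER_HOP_RTT_MS / 2)       # ~1.3 ms vs ~35 ms propagation

     # Item (4): one way delay from 600 extra circuit miles.
     print(600 * US_PER_MILE / 1000.0)   # ~4.9 ms, the "about 5 ms" above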

T3 Network Throughput
---------------------
     The RS960 adapter technology will support up to five T3
interfaces per router, with an individual T3 interface operating at
switching rates in excess of 10K packets per second in each direction.
The unit and system tests performed prior to the April '92 network
upgrade required the CNSS routers to operate at 50KPPS+ aggregate
switching rates, and 22Mbps+ in each direction with an average packet
size of 270 bytes on a particular RS960 interface.  The router has
also been configured and tested in the lab to saturate a full 45Mbps
T3 link.
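
     The per-interface figures above are mutually consistent: at the
270 byte average packet size used in testing, 22Mbps in one
direction corresponds to just over the 10K packets per second
switching rate.  A quick check:

     AVG_PACKET_BYTES = 270   # average packet size used in testing
     RATE_MBPS = 22           # per-direction rate on one RS960 port

     pps = RATE_MBPS * 1_000_000 / (AVG_PACKET_BYTES * 8)
     print(round(pps))        # ~10,185 packets/second, i.e. 10K+ pps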
 
     The performance that is currently observed by individual end
users on the T3 network is largely determined by their access to the
network.  Access may be via an ethernet or an FDDI local area network.
Many users have reported peak throughput observations up to 10Mbps
across the T3 network using ethernet access.  Several of the T3
network attachments support an FDDI local area network interface,
which currently yields no more than 14Mbps peak throughput across
the T3 backbone.  With the new RS960 FDDI adapter to be introduced
in September, end-to-end network throughput may exceed 22Mbps in
each direction (limited by the T3 adapter).  The initial RS960 FDDI
card software will support a 4000 byte MTU, which will be increased
later with subsequent performance tuning.  Further performance
enhancements will be applied to the T3 backbone in the fall and
winter to approach peak 45Mbps switching rates for end-user
applications.






