TT60009 -- Merit Network Backbone Service -- Update (fwd)

Betty J. Burke bburke at merit.edu
Fri Aug 17 18:15:50 UTC 2007


FYI,
NANOG Community participants, if you have any questions about this network
event, please contact me.

Betty Burke
NANOG Project Manager
(734) 647-3743 office
(734) 395-1724 cell

....
------------ Forwarded Message ------------
Date: Friday, August 17, 2007 11:15 AM -0400
From: "Elwood J. Downing" <ejd at merit.edu>
To: mjts at merit.edu, netdirs at merit.edu, network-alerts at merit.edu, Merit NOC
<trouble at merit.edu>
Subject: TT60009 -- Merit Network Backbone Service -- Update


We are pleased to provide an update on the recent backbone service outage.
As reported last night at 9:29 PM EDT, Thursday, August 16, 2007, Merit's
backbone began to stabilize around 8:20 PM EDT on Thursday and has
continued to run without any additional network alerts or major problems
reported to us by our Members.

We are continuing to work with our backbone equipment vendors to determine
what caused the problem and how to prevent it from happening in the future.
We expect to have enough information to share the how, what, and why of
this network problem with you.

The main cause of the problem appeared to be the management interface
cards on the Extreme Aspen 10G switches, which were crashing. We would
reset a card and the system would work for a while, then the card would
crash again. This also caused problems where these cards had 1G LAN ports
on them serving our Members and connecting to routers. Some cards did not
come back after a reset and required a hard reset (power cycle). Since
neither we nor Extreme's engineers knew the nature of the problem, we were
also investigating a possible Denial of Service attack on our network.

What we found is that our 10G ring was changing up/down state rapidly,
which led us to think that this was exhausting resources on the hardware
and causing it to stop working. We then determined that the root of the
problem was here in Ann Arbor, where a Cisco switch was connected to our
10G core Extreme Networks switch. We disabled the port and the network
began to stabilize. We have informed Extreme Networks and they are working
to provide feedback on the problem and its resolution. It took additional
time for all the routes to propagate, since many of our networks had been
route dampened because of the ongoing instability of the network.
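
As a rough illustration of why dampened routes are slow to return, here is
a minimal sketch of the penalty-and-decay model used by BGP route flap
damping (RFC 2439). The numbers are assumptions chosen for illustration
(they match commonly cited vendor defaults) and are not the values
configured by Merit or its peers; the function names are hypothetical.

```python
import math

# Illustrative parameters only (assumed, not taken from the outage report).
HALF_LIFE_MIN = 15.0       # penalty halves every 15 minutes
PENALTY_PER_FLAP = 1000.0  # added each time the route flaps
SUPPRESS_LIMIT = 2000.0    # route is suppressed above this penalty
REUSE_LIMIT = 750.0        # route is re-advertised once penalty decays to this


def decayed(penalty, minutes):
    """Exponentially decay a penalty over the given number of minutes."""
    return penalty * 0.5 ** (minutes / HALF_LIFE_MIN)


def penalty_after_flaps(flap_times_min):
    """Accumulate penalty for flaps at the given times (minutes, ascending)."""
    penalty = 0.0
    last = None
    for t in flap_times_min:
        if last is not None:
            penalty = decayed(penalty, t - last)
        penalty += PENALTY_PER_FLAP
        last = t
    return penalty


def minutes_until_reuse(penalty):
    """Stable minutes needed before the penalty decays to the reuse limit."""
    if penalty <= REUSE_LIMIT:
        return 0.0
    return HALF_LIFE_MIN * math.log2(penalty / REUSE_LIMIT)


if __name__ == "__main__":
    p = penalty_after_flaps([0, 0, 0])  # three flaps in quick succession
    print("suppressed:", p > SUPPRESS_LIMIT)
    print("minutes until reuse:", minutes_until_reuse(p))
```

Under these assumed values, a route that flaps three times in quick
succession stays suppressed for about half an hour after it becomes stable.
Real implementations also cap the total suppression time, which this sketch
leaves out.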

If you are continuing to experience any network performance problems, please
contact Merit's Network Operations Center (NOC) immediately. We have
engineering, NOC, and support staff available to work with you on resolving
any issues you are experiencing.

We sincerely apologize for the inconvenience this outage has caused your
organization. We continuously strive to provide the highest level of
service to our Membership and regret this service issue.

Sincere Regards,
--Elwood

---------------------------------------------------------------------------
Elwood J. Downing                      e-mail: ejd at merit.edu
Merit Network                          Phone: (734) 936-2040
Director of Member Services            Fax: (734) 647-3185

     Merit Network -- Connecting Organizations, Building Community


---------- End Forwarded Message ----------






