Spanning tree melt down ?

blitz blitz at macronet.net
Fri Nov 29 05:31:10 UTC 2002


Smells like it to me...sounds like they said, "HALP" to Cisco, and Cisco 
said, "Clean out the warehouse, we've got a live one!"

At 16:08 11/28/02 -0600, you wrote:

>I'm still failing to see why this required a $3M forklift of new equipment
>to correct the problem.  Was this just Cisco sales pouncing on someone's
>misfortune as a way to push new stuff?
>
>On Thu, 28 Nov 2002, Stephen J. Wilcox wrote:
>
> >
> > Heh, so they kept bolting stuff on and a failure somewhere caused a
> > spanning tree change which because of over complexity and out of date
> > config was unable to converge.
> >
> > Ah yes, occam also applies to switch topology :)
> >
> > Steve
> >
> > On Fri, 29 Nov 2002, Simon Lyall wrote:
> >
> > >
> > > On Thu, 28 Nov 2002, Garrett Allen wrote:
> > > > speculating on cause and effect, my first bet would be that someone
> > > > turned off spanning tree on a trunk or trunks immediately prior to the
> > > > flood.  my next bet would be a babbling device - i've seen an
> > > > unauthorized hub on a flat layer 2 net basically shut the network down.
> > > > it was after a power hit.  when we found the bugger and power cycled
> > > > it, all was well.  i don't think that the researcher was the culprit.
> > > > more likely the victim.
> > >
> > > This article had some more information:
> > >
> > > http://www.nwfusion.com/news/2002/1125bethisrael.html
> > >
> > > This slashdot article also seems to have some details:
> > >
> > > http://slashdot.org/comments.pl?sid=46238&cid=4770093
> > >
> > > Text as follows:
> > >
> > >  I contacted Dr. John D. Halamka to see if he could provide more detail
> > > on the network outage. Dr. Halamka is the chief information officer for
> > > CareGroup Health System, the parent company of the Beth Israel Deaconess
> > > medical center. His reply is as follows: "Here's the technical explanation
> > > for you. When TAC was first able to access and assess the network, we
> > > found the Layer 2 structure of the network to be unstable and out of
> > > specification with 802.1d standards. The management vlan (vlan 1) had in
> > > some locations 10 Layer 2 hops from root. The conservative default values
> > > for the Spanning Tree Protocol (STP) impose a maximum network diameter of
> > > seven. This means that two distinct bridges in the network should not be
> > > more than seven hops away from each other. Part of this restriction
> > > comes from the age field that Bridge Protocol Data Units (BPDUs) carry:
> > > when a BPDU is propagated from the root bridge towards the leaves of the
> > > tree, the age field is incremented each time it goes through a bridge.
> > > Eventually, when the age field of a BPDU goes beyond max age, it is
> > > discarded. Typically, this will occur if the root is too far away from
> > > some bridges of the network. This issue will impact convergence of the
> > > spanning tree. A major contributor to this STP issue was the PACS network
> > > and its connection to the CareGroup network. To eliminate its influence
> > > on the CareGroup network we isolated it with a Layer 3 boundary. All
> > > redundancy in the network was removed to ensure no STP loops were
> > > possible. Full connectivity was restored to remote devices and networks
> > > that were disconnected in troubleshooting efforts prior to TAC's
> > > involvement. Redundancy was returned between the core campus devices.
> > > Spanning Tree was stabilized and localized issues were pursued. Thanks
> > > for your support. CIO Magazine will devote the February issue to this
> > > event and Harvard Business School is doing a case study."
> > >
> > >
> > >  --
> > > Simon Lyall.                |  Newsmaster  | Work: simon.lyall at ihug.co.nz
> > > Senior Network/System Admin |  Postmaster  | Home: simon at darkmere.gen.nz
> > > ihug, Auckland, NZ          | Asst Doorman | Web: http://www.darkmere.gen.nz
> > >
> > >
> >
> >
> >
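
For reference, the diameter-of-seven figure in the quoted explanation can
be reproduced from the commonly cited 802.1D timer-tuning relations. This
is only a rough sketch: the two formulas and the function names below are
assumptions that do not appear in the quoted text, and the hello time is
assumed to be the usual 2-second default.

```python
# Sketch of the commonly cited STP timer relations (assumed, not taken
# from the quoted message):
#   max_age       = 4*hello + 2*diameter - 2
#   forward_delay = (4*hello + 3*diameter) / 2
# All times are in seconds; diameter is counted in bridge hops.

def stp_timers(diameter, hello=2):
    """Return (max_age, forward_delay) suggested for a given diameter."""
    max_age = 4 * hello + 2 * diameter - 2
    forward_delay = (4 * hello + 3 * diameter) / 2
    return max_age, forward_delay


def max_diameter(max_age=20, hello=2):
    """Invert the max_age relation: largest diameter a max_age covers."""
    return (max_age + 2 - 4 * hello) // 2


if __name__ == "__main__":
    # Default max age of 20 s with hello of 2 s
    print("diameter covered by defaults:", max_diameter())
    # Timers the same relations would suggest for a 10-hop path
    print("timers for diameter 10:", stp_timers(10))
```

Under these assumed relations the default max age of 20 seconds works out
to a diameter of 7, and a 10-hop path like the one reported on vlan 1
would call for a max age of roughly 26 seconds, which is one way to see
why the default timers did not fit that topology.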



