Links on the blink - what will/should mci & sprint do?

Curtis Villamizar curtis at ans.net
Mon Nov 20 19:57:47 UTC 1995


In message <Pine.LNX.3.91.951119010238.14818B-100000 at okjunc.junction.net>, Michael Dillon writes:
> On Sat, 18 Nov 1995, Sean Doran wrote:
> 
> > 
> > | Sounds like there is a need for a good ip switch.  Something simple, 
> > | very fast, and low cost that you can download "static" routes to.  
> > 
> > It's called an SSP.
> 
> And the problem on the net isn't with SSP's. The problem is that the 
> routing tables are NOT static. Switching is working fine, but the size of 
> the routing tables (CIDRize or die!) and the constant change in the 
> routing tables are the problem. Note that CIDRizing also reduces the 
> amount of change in the routing tables by replacing a set of potentially 
> varying routes with an unvarying aggregate.
> 
> Even building a mondo box to handle huge routing tables and lots of 
> changes is not enough to solve the problem because there is also
> the protocol problem whereby routers communicate these route changes to 
> one another. This limits the number of BGP peering sessions that are 
> practical.
> 
> Of course, most people here already know this but for those who are 
> trying to understand what is going on, I hope my brief explanation helps.
> 
> Michael Dillon                                    Voice: +1-604-546-8022
> Memra Software Inc.                                 Fax: +1-604-542-4130
> http://www.memra.com                             E-mail: michael at memra.com
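Dillon's point that an aggregate hides churn in its component routes can be illustrated with a small Python sketch. The example prefixes, the `advertised` and `churn` helpers, and the assumption that the aggregate stays announced while any component is reachable are all hypothetical simplifications, not a model of any particular router's BGP behavior.

```python
import ipaddress

# Example prefixes chosen for illustration only: four contiguous /24s
# that a site might otherwise announce as separate routes.
SPECIFICS = [ipaddress.ip_network(p) for p in (
    "192.0.0.0/24", "192.0.1.0/24", "192.0.2.0/24", "192.0.3.0/24")]

# collapse_addresses merges the contiguous /24s into one covering prefix.
AGGREGATE = set(ipaddress.collapse_addresses(SPECIFICS))


def advertised(reachable, aggregate):
    """Prefixes announced upstream, given which specifics are reachable."""
    if aggregate:
        # Assumption: the aggregate stays up while any component is
        # reachable, so losing one /24 is invisible to the rest of the net.
        return AGGREGATE if reachable else set()
    return set(reachable)


def churn(before, after):
    """Routing updates seen upstream: new announcements plus withdrawals."""
    return len(before ^ after)


all_up = set(SPECIFICS)
one_down = all_up - {SPECIFICS[2]}   # one /24 flaps

print(sorted(AGGREGATE))                                              # [IPv4Network('192.0.0.0/22')]
print(churn(advertised(all_up, False), advertised(one_down, False)))  # 1
print(churn(advertised(all_up, True), advertised(one_down, True)))    # 0
```

Without aggregation every flap of a component shows up as an update upstream; with it, the single /22 does not change.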


Actually, you don't have the problem quite right.

The problem is not the sheer size of the routing table.  The 64 MB RP
has fixed that for quite a while.  It is not the processing load
associated with the route change.  An RS6K can keep up easily if it
doesn't have to page (enough RAM in the box), and so can a 68020 if it
was allowed enough CPU time to do something.

The problem is that when a large set of routes change, a large set of
routes in the SSP are invalidated.  This results in a large amount of
traffic being forwarded to the RP.  The SSP is bludgeoning the RP in
order to tell it that it needs some cache entries updated.  The RP
then can't keep its adjacencies up and more route changes result,
which can kill other routers.  If it gets far enough out of hand, the
instability can turn into a stable oscillation and you have a melted
backbone.  This is a consequence of the architecture and the cache
design.  I've been pointing this out for years.  Now it blew up.
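The failure mode described above can be sketched as a toy Python model, assuming a demand-filled cache keyed by destination address and a rule that a route change invalidates every cached destination the changed prefix covers. `DemandRouteCache`, its methods, and that invalidation rule are hypothetical; the sketch does not reproduce the SSP's actual cache design, only the shape of the problem: the number of packets punted to the RP after a change scales with the number of active destinations, not with the number of routes that changed.

```python
import ipaddress


class DemandRouteCache:
    """Hypothetical demand-filled forwarding cache on a switch processor.
    Cache hits are forwarded locally; misses are punted to the route
    processor (RP), which resolves them and installs a cache entry."""

    def __init__(self, rib):
        self.rib = rib      # prefix -> next hop, held by the RP
        self.cache = {}     # destination address -> next hop
        self.punts = 0      # packets that had to go to the RP

    def _rp_lookup(self, dst):
        # Slow-path longest-prefix match on the RP (a default route is
        # assumed to exist, so there is always at least one match).
        matches = [p for p in self.rib if dst in p]
        best = max(matches, key=lambda p: p.prefixlen)
        return self.rib[best]

    def forward(self, dst):
        if dst not in self.cache:
            self.punts += 1
            self.cache[dst] = self._rp_lookup(dst)
        return self.cache[dst]

    def route_change(self, prefix, next_hop):
        # Assumed rule: every cached destination covered by the changed
        # prefix is invalidated and must be re-resolved by the RP.
        self.rib[prefix] = next_hop
        self.cache = {d: nh for d, nh in self.cache.items() if d not in prefix}


rib = {ipaddress.ip_network("0.0.0.0/0"): "default",
       ipaddress.ip_network("10.0.0.0/8"): "peer-A"}
sp = DemandRouteCache(rib)
dests = [ipaddress.ip_address("10.0.0.0") + i for i in range(1000)]

for d in dests:          # cold cache: every destination is punted once
    sp.forward(d)
warm = sp.punts
for d in dests:          # warm cache: all hits, nothing reaches the RP
    sp.forward(d)
print(sp.punts - warm)   # 0

# One prefix changes next hop; all 1000 cached destinations are invalidated.
sp.route_change(ipaddress.ip_network("10.0.0.0/8"), "peer-B")
for d in dests:
    sp.forward(d)
print(sp.punts - warm)   # 1000
```

A single changed route produces a thousand punts here; with a large set of routes changing at once, the RP receives that load while it is also trying to process the updates and keep its adjacencies up.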

This is very fixable and Cisco could even fix it without requiring
everyone to throw out their Cisco 7000s.  Just get rid of the cache
completely and push full routing from the RP to the SSP!
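A minimal Python sketch of the proposed alternative, assuming the RP sends one update per changed prefix into a complete forwarding table held on the switch processor. `PushedFib` and its methods are hypothetical, and the per-prefix-length dictionaries are a naive longest-prefix match chosen for brevity; real forwarding hardware would use a trie or a similar structure.

```python
import ipaddress


class PushedFib:
    """Hypothetical full forwarding table on the switch processor, kept
    current by the RP, so forwarding never depends on a cache miss."""

    def __init__(self):
        # prefix length -> {network: next hop}; lookups try longest first.
        self.by_len = {}

    def install(self, prefix, next_hop):
        """RP pushes one route add or change: a single table write."""
        net = ipaddress.ip_network(prefix)
        self.by_len.setdefault(net.prefixlen, {})[net] = next_hop

    def withdraw(self, prefix):
        """RP pushes one route withdrawal."""
        net = ipaddress.ip_network(prefix)
        self.by_len.get(net.prefixlen, {}).pop(net, None)

    def lookup(self, dst):
        """Longest-prefix match done entirely on the switch processor."""
        addr = ipaddress.ip_address(dst)
        for plen in sorted(self.by_len, reverse=True):
            net = ipaddress.ip_network(f"{addr}/{plen}", strict=False)
            next_hop = self.by_len[plen].get(net)
            if next_hop is not None:
                return next_hop
        return None


fib = PushedFib()
fib.install("0.0.0.0/0", "default")
fib.install("10.0.0.0/8", "peer-A")
print(fib.lookup("10.1.2.3"))          # peer-A
fib.install("10.0.0.0/8", "peer-B")    # route change: one write, no punts
print(fib.lookup("10.1.2.3"))          # peer-B
fib.withdraw("10.0.0.0/8")
print(fib.lookup("10.1.2.3"))          # default
```

In this model a route change costs work proportional to the number of changed prefixes, and traffic keeps being forwarded while the RP catches up, which is the property the proposal is after.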

Curtis

ps - This is my guess.  Neither Cisco nor Sprint has yet confirmed or
denied this.  Perhaps Sean or Tony would care to comment.  ;-)


