Links on the blink - reprise

Curtis Villamizar curtis at ans.net
Mon Nov 20 17:42:17 UTC 1995



Sean,

I tried to make some blanket statements without pointing fingers in
any direction.  I was pointing out that everyone has equipment
limitations to deal with.  For example, we have some very severe PPS
limitations, but can still build an extremely stable and low loss
backbone if we can arrange our topology to keep within those limits.
That won't last forever and we know that.  Everyone has to understand
and deal with the limits of the equipment they use.

In message <95Nov17.215523-0000_est.20701+37 at chops.icp.net>, Sean Doran writes:
> 
> Curtis - 
> 
> | The brick wall is that a particular piece of equipment from a particular
> | vendor that a lot of service providers have made a large investment in
> | doesn't really perform all that well in the real world.
> 
> Please allow me to mitigate your politics with a dose
> of reality.
> 
> sl-dc-8.sprintlink.net, a now-fairly-old Cisco 7000 with one
> of the first four 2MB SSP boards ever shipped outside
> Cisco's doors has been observed to switch 125kpps through
> several interfaces over a 15 minute period several times in
> the past three weeks.
> 
> The bottleneck is not in terms of switching capacity nor
> is it in terms of throughput across its backplane at present.

I never said that the problem had anything to do with running out of
PPS.  I said they did not perform well.  Switching major amounts of
traffic for a 15 minute period and then falling over and dying now
and then is not my idea of "performing well".  Particularly if routers
can take others down in the process and create a sustained state of
routing instability.  I feel justified in calling that "poor
performance".  It's a subtle semantic difference.  ;-)

> The latter issue is looming, but we're simply not there yet.
> 
> There have been substantial problems with respect to
> convergence times.  Many of these have been ameliorated with
> experimental code now deployed throughout SprintLink and
> ICM, which does selective packet dropping to assist
> convergence rather than having the box keel over dead
> process-switching packets when the SSE cache is being
> completely repopulated.
> 
> We are no longer hovering close to the practical limits
> of the current limitation, and are not very near the
> reasonable maximum for the current platform.

Fine.  The routers are being improved to make them more stable.  They
certainly need it.  Getting rid of the current caching design that
indicates that it is time to refresh by bombarding the RP with packets
would be a welcome change.

> This is not to say that we have all that much breathing-room,
> but this and other developments in the works does and will
> buy us much more time than we would gain by moving towards
> a system of the kind other providers appear to favour.
> 
> The immediate danger is still in terms of BGP routing on
> defaultless routers, and we are all now keenly aware of that
> and I believe that even you have accepted that despite
> available alternatives like dedicated route servers, 
> we must CIDRize or die.

At the last NANOG I gave a talk about scaling up the Internet and
described CIDR as the single most promising thing that providers can
cooperate on to improve scaling.  We are putting a lot of effort into
upgrading configuration based on the IRR to allow accurate aggregation
and aggregation across provider boundaries with cooperating providers.
Some providers are supportive of these goals.  Some want to undermine them.
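
[ A rough illustration of the aggregation referred to above.  This is
a minimal sketch using Python's standard ipaddress module, not the
IRR-based configuration tooling itself; the function name and the
prefixes are made up for the example. ]

```python
import ipaddress


def aggregate(prefixes):
    """Collapse a list of CIDR prefixes into the fewest covering prefixes.

    Only adjacent or overlapping prefixes are merged, so the result
    covers exactly the same address space as the input.
    """
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]


# Hypothetical announcements: four contiguous /24s plus one stray /24.
announced = [
    "10.1.0.0/24",
    "10.1.1.0/24",
    "10.1.2.0/24",
    "10.1.3.0/24",
    "10.1.8.0/24",
]

# The four contiguous /24s collapse into a single /22; the stray
# /24 cannot be merged and is kept as-is.
print(aggregate(announced))
```

Five routes become two here.  Aggregating across provider boundaries
works the same way, but only if the providers involved agree on who
announces the covering prefix.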

[ .. defensive posturing and personal insults from Sean deleted for
brevity .. ]

> P.S.: You might want to consider some "Sprint-did-it-firsts" which
> 	developed both within ICM and SprintLink vis a vis
> 	Cisco and general router technology deployment:
> 	7000s, 64Mb RPs, BGP4, SSPs, 2MB SSPs, 7500s,
> 	reprioritization of forwarding vs other tasks, 
> 	selective packet drop, and so on and so forth.
> 	Vadim Antonov and Peter Lothberg were and are never idle,
> 	and I fully intend to carry on the tradition of
> 	pushing useful new technology into the field as fast as
> 	it is available, because quite frankly, we need it all.
> 	DS3?  OC3?  Hah.  You ain't seen nothing yet, baby.

Please keep in mind that Sprint was also the first in other things.
First to sustain high backbone packet loss due to Cisco full cache
flush problems.  First to experience Cisco cache overlap bugs on a
running network.  And now first to experience sustained instability
due to sending traffic from the SP to the RP after a major route change.

[ aside: <g> reprioritization of forwarding vs other tasks, selective
packet drop.  Neat ideas.  Did you think of that? ;-) ]

The difference may be one of approach.  ANS has tried and has been
very successful at anticipating problems and convincing our vendors to
fix them before they become operational problems.  We warned Cisco in
1993, when the 7000 first came out and we tested it, about the full
cache flush, the difficulty of getting overlap right in a cache, and
the potential for instability if traffic is sent to the RP to signal
a need for cache refresh.  Cisco did not fix these to our
satisfaction and so we
limited deployment of Cisco 7000s.  At this point we will be skipping
the Cisco 7000 entirely as a backbone router, opting for a later
generation, possibly a Baynet router.  You've chosen to deploy the
Cisco 7000 in your backbone and stepped on many of the bugs that were
the reason we refused to deploy Ciscos in our backbone.

Back to the point of my original message:

In summary, in response to Gordan: There is a tomorrow for the
Internet.  Some people have been very aware of the wall.  The wall
only exists with respect to a particular generation of routers.

In response to Sean: ANS has chosen to skip the Cisco 7000.  There
are better routers on the immediate horizon that will allow ANS to
build a next generation backbone and allow ANS to continue to expand.
Other Internet providers can expand too, in whatever way they deem
appropriate or wise, whether that is Frame Relay, ATM or whatever.

Curtis

PS- Cisco has been very responsive to ISP needs, particularly in terms
of protocol and feature support.  Their next generation looks
extremely promising.  So does Baynet's current generation with some
(fairly major) software change that we expect alpha on soon.  Baynet
has been trying to become very responsive to ISP needs and may be
ready to step to the plate for real.


