Converged Networks Threat (Was: Level3 Outage)

Matthew Crocker matthew at crocker.com
Wed Feb 25 18:57:14 UTC 2004


>
> 	  Is it that sharing fate in the switching fabric (as
> 	  opposed to say, in the transport fabric, or even
> 	  conduit) reduces the resiliency of a given service (in
> 	  this case FR/ATM/TDM), and as such poses the "danger"
> 	  you describe?
>

Sharing fate in the physical layer (multiple fibers in the same 
conduit) or the transport layer (multiple services on the same SONET) 
has clear, well-defined resource limits.  A GigE running down a piece 
of fiber will NEVER jump over to the ATM network fiber and wipe it out. 
Same goes for SONET.  An STS-1 is an STS-1 and will never eat up an 
OC-48 no matter how much traffic it carries.  Both layers have clear, 
well-defined resource requirements, with well-defined protection 
between resources.

Shared fate in the switching fabric won't be as stable until routers 
(the switching fabric) can allocate and manage resources in a clear and 
defined way.  If the resources are being overcommitted, the fabric must 
be able to handle the full burden of resource requests while still 
enforcing appropriate resource limits on services.  QoS plays a part in 
managing the resources of a given link, but what manages the resources 
a service can consume in the fabric itself (CPU, memory, bandwidth)?

With proper traffic engineering you can build/overbuild the network to 
handle 'normal' traffic with a great deal of reliability.  The switch 
fabric and/or the network itself must also be able to protect itself 
from the abnormal, for example by limiting the memory/CPU consumption 
of a flapping BGP peer so you still have enough resources to handle the 
AToM traffic, which is given a higher priority.  Let the BGP peers 
fail, let the Internet traffic drop, to save the high-priority traffic 
and the MPLS glue traffic that keeps the core operational.

Wouldn't it be great if routers had the equivalent of 'User mode 
Linux', with each process handling a service, isolated and protected 
from the others?  The physical router would be nothing more than a 
generic kernel handling resource allocation.  Each virtual router would 
have access to x amount of resources and would either halt, sleep, or 
crash when it exhausts those resources for a given time slice.  I don't 
know of any method in the current router offerings to limit a VRF to x% 
of CPU and y% of memory.
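
To make the per-time-slice budget idea concrete, here is a rough 
sketch in Python.  It is only an illustration of the concept, not a 
feature of any router I know of: the VirtualRouter class, the "work 
unit" budget, and the service names are all made up for the example.  
Each virtual router gets a fixed budget per slice and is put to sleep 
when it would exceed it, so one noisy service cannot eat into 
another's share.

```python
from dataclasses import dataclass


@dataclass
class VirtualRouter:
    """Hypothetical virtual router with a fixed budget per time slice."""
    name: str
    cpu_budget: int          # work units allowed per slice (made-up unit)
    used: int = 0
    state: str = "running"

    def request(self, units: int) -> bool:
        """Grant work while the slice budget lasts; otherwise go to sleep."""
        if self.state != "running":
            return False
        if self.used + units > self.cpu_budget:
            self.state = "sleeping"   # budget exhausted for this slice
            return False
        self.used += units
        return True

    def new_slice(self) -> None:
        """Start a new time slice: reset usage and wake the router up."""
        self.used = 0
        self.state = "running"


def run_slice(routers, demands):
    """Feed each router its demands; return the units actually granted."""
    granted = {}
    for vr in routers:
        granted[vr.name] = 0
        for units in demands.get(vr.name, []):
            if vr.request(units):
                granted[vr.name] += units
    return granted


# The AToM service stays within its budget; the flapping BGP service asks
# for far more than its share and is capped at its own budget.
atom = VirtualRouter("atom", cpu_budget=60)
bgp = VirtualRouter("internet-bgp", cpu_budget=30)
granted = run_slice([atom, bgp], {"atom": [20, 20], "internet-bgp": [10] * 10})
```

In this toy run the BGP service is put to sleep for the rest of the 
slice once it hits its budget, while the AToM service is unaffected.  
Calling new_slice() on each router at the next tick wakes it up again.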

-Matt


> 	Is this an accurate characterization of your point? If
> 	so, why should sharing fate in the switching fabric
> 	necessarily reduce the resiliency of those services
> 	that share that fabric (i.e., why should this be so)? I
> 	have some ideas, but I'm interested in what ideas other
> 	folks have.
>
> 	Thanks,
>
> 	Dave
>
>



