Converged Networks Threat (Was: Level3 Outage)
Matthew Crocker
matthew at crocker.com
Wed Feb 25 18:57:14 UTC 2004
>
> Is it that sharing fate in the switching fabric (as
> opposed to say, in the transport fabric, or even
> conduit) reduces the resiliency of a given service (in
> this case FR/ATM/TDM), and as such poses the "danger"
> you describe?
>
Sharing fate in the physical layer (multiple fibers in the same
conduit) or the transport layer (multiple services on the same SONET)
has clear and well-defined resource limits. A GigE running down a piece
of fiber will NEVER jump over to the ATM network fiber and wipe it out.
The same goes for SONET. An STS-1 is an STS-1 and will never eat up an
OC-48, no matter how much traffic it is offered. There are clear,
well-defined resource requirements, with well-defined protection
between resources.
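
To make the contrast concrete, here is a toy sketch in Python. It is only
an illustration under stated assumptions: the function names are made up,
and the "proportional squeeze" is a simplified stand-in for what happens
when a shared resource has no per-service limit, not a model of any real
device. The STS-1 rate (51.84 Mb/s) and the 48 STS-1s in an OC-48 are the
only real figures in it.

```python
# Illustrative model only: fixed TDM-style slots versus one shared pool.
STS1_MBPS = 51.84   # line rate of one STS-1
OC48_SLOTS = 48     # an OC-48 carries 48 STS-1 slots


def tdm_carried(offered_mbps):
    """Each tributary is clipped to its own slot; excess is dropped locally."""
    return [min(load, STS1_MBPS) for load in offered_mbps]


def shared_carried(offered_mbps, capacity_mbps):
    """Hypothetical shared pool with no per-service limit.

    An overload is modeled as a proportional squeeze on every service.
    """
    total = sum(offered_mbps)
    if total <= capacity_mbps:
        return list(offered_mbps)
    scale = capacity_mbps / total
    return [load * scale for load in offered_mbps]


# Tributary 0 goes wild; the other 47 offer a steady 40 Mb/s each.
offered = [5000.0] + [40.0] * (OC48_SLOTS - 1)
tdm = tdm_carried(offered)
shared = shared_carried(offered, STS1_MBPS * OC48_SLOTS)
```

In the slotted case the misbehaving tributary is capped at its own STS-1
and the other 47 still carry their full 40 Mb/s. In the shared case every
well-behaved service is squeezed along with the offender.
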

Shared fate in the switching fabric won't be as stable until routers
(the switching fabric) can allocate and manage resources in a clear and
defined way. If the resources are being overcommitted, the fabric must
be able to handle the full burden of resource requests while still
managing to provide appropriate resource limits to services. QoS plays
a part in managing the resources of a given link, but what manages the
resources a service can consume in the fabric itself (CPU, memory,
bandwidth)?

With proper traffic engineering you can build/overbuild the network to
handle 'normal' traffic with a great deal of reliability. The switch
fabric and/or the network itself must also be able to protect itself
from the abnormal: for example, by limiting the memory/CPU consumption
of a flapping BGP peer so you still have enough resources to handle the
AToM traffic, which is given a higher priority. Let the BGP peers fail,
let the Internet traffic drop, to save the high-priority traffic and
the MPLS glue traffic and keep the core operational.

Wouldn't it be great if routers had the equivalent of 'User-mode
Linux', with each process handling a service, isolated and protected
from the others? The physical router would be nothing more than a
generic kernel handling resource allocation. Each virtual router would
have access to x amount of resources and would either halt, sleep, or
crash when it exhausts those resources for a given time slice. I don't
know of any method in the current router offerings to limit a VRF to x%
of CPU and y% of memory.
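
A rough sketch of what I mean, in Python. Everything here is hypothetical:
the VirtualRouter class, the run_slice function, the service names, and the
budget numbers are invented for illustration and do not correspond to any
vendor's feature. It only shows the idea of a generic "kernel" that hands
out CPU per time slice, highest priority first, and never lets a service
exceed its own budget.

```python
from dataclasses import dataclass


@dataclass
class VirtualRouter:
    name: str
    priority: int        # lower number = more important
    cpu_budget_ms: int   # CPU this service may use per time slice
    demand_ms: int       # CPU the service wants in this slice
    used_ms: int = 0
    state: str = "running"


def run_slice(routers, slice_ms):
    """Hand out one time slice of CPU, highest priority first.

    No service may exceed its own budget; one that still wants more
    is put to sleep until the next slice. Returns the unused CPU time.
    """
    remaining = slice_ms
    for vr in sorted(routers, key=lambda r: r.priority):
        grant = min(vr.demand_ms, vr.cpu_budget_ms, remaining)
        vr.used_ms = grant
        remaining -= grant
        vr.state = "running" if grant == vr.demand_ms else "sleeping"
    return remaining


routers = [
    # A flapping BGP peer asking for far more CPU than it is allowed.
    VirtualRouter("internet-bgp", priority=3, cpu_budget_ms=30, demand_ms=500),
    VirtualRouter("atom", priority=2, cpu_budget_ms=40, demand_ms=25),
    VirtualRouter("mpls-core", priority=1, cpu_budget_ms=30, demand_ms=10),
]
idle_ms = run_slice(routers, slice_ms=100)
```

With these made-up numbers the MPLS core and AToM services get everything
they ask for, the flapping BGP service is held to its 30 ms budget and put
to sleep, and the box still has CPU to spare.
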
-Matt
> Is this an accurate characterization of your point? If
> so, why should sharing fate in the switching fabric
> necessarily reduce the resiliency of those services
> that share that fabric (i.e., why should this be so)? I
> have some ideas, but I'm interested in what ideas other
> folks have.
>
> Thanks,
>
> Dave
>
>