MCI and SprintLink are partitioned (fwd)

Wed Oct 4 21:05:50 UTC 1995

HWB - 

  The problem is really not so much that the routing
fell over but that other problems were run into, that
are independent of NAPs/MAEs/etc.

|  . are all three (four?) NAPs really being used (I know they are
|    there, but despite repeated requests to at least one NAP service
|    provider I appear to be unable to get an answer). I do know that the
|    NY NAP is heavily used, including as my traffic to the Bay area
|    sites I need access to traverses it (modulo all the losses in
|    Sprintlink for at least weeks (reported to and confirmed by the
|    regional network that serves SDSC, though from rumors I am hearing
|    Sprintlink is rather not the exception, and many natives in the
|    community are starting to get restless)).

SprintLink and MCI exchange traffic at two NAPs, both MAEs,
and FIX-WEST.   We likely will start exchanging traffic at
the PAC*Bell NAP in the very near future.

SprintLink and AGIS will be exchanging traffic there even
before then.

Others at that NAP are either in the queue wrt negotiating
a bilateral agreement with Sprint, or have not yet approached
Sprint with regards to a peering for various reasons.

SprintLink and MCI exchange fairly heavy traffic at the
Chicago NAP, very heavy traffic at the Pennsauken NAP,
and extraordinarily heavy traffic at MAE-EAST, and we
have already been looking at a very strong and purely
technical need to start moving traffic between ourselves
directly in several other locations.

The reason you see so much use of the Pennsauken NAP
is that CERFNET has a DS3 ATM pipe terminated on a 
router there, and that is where CERFNET and SprintLink
exchange a good chunk of traffic, principally because
the bandwidth available to do that in New Yorsey has
been greater than any other path on the west coast
through which CERFNET and SprintLink could have 
exchanged traffic.

I believe that Push could supply you with further details
of CERFNET's near-future plans in this and other regards.

|  . Is there any evidence that the NAPs are really backing each other
|    up? Did someone test and document it, e.g., with a few "test" networks
|    in a bunch of regional networks? What are the time delays for a
|    switch? Does someone have consecutive traceroute outputs where a
|    switch among the NAPs really happened?

What do you mean by backing each other up?  There was never
a requirement for NAPs to do that; what does fall over
is the bilateral routing among each pair of peers at each
touchdown point.

BGP fallover with respect to very large changes
(disconnectivity at a NAP, MAE or FIX-WEST) between
two very big peers adjusts in various ways.  Firstly,
you could have a fast IGP switch, which means
convergence within one side in a few seconds.
Secondly, you could have an eBGP timeout or the like,
which means convergence time in a matter of a couple
of minutes or less.

The key problem here is that convergence eats CPU (lots of
routes to be announced or withdrawn or sent to different
next-hops), and very very bad transitions can take ten to
fifteen minutes, depending on the characteristics of the
failures.
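
As a rough illustration of the eBGP timeout case above, the sketch
below models only how long a silent peer failure can go unnoticed
before the hold timer expires.  The 90-second hold time is an
assumption (the value suggested in the BGP-4 specification; actual
configurations differ), the function names are made up for this
example, and the sketch says nothing about the CPU cost of the
route churn that follows.

```python
from typing import Tuple

# Assumed hold time in seconds; not taken from any router discussed here.
ASSUMED_HOLD_TIME_S = 90.0


def keepalive_interval(hold_time_s: float) -> float:
    """Keepalives are conventionally sent at one third of the hold time."""
    return hold_time_s / 3.0


def detection_window(hold_time_s: float) -> Tuple[float, float]:
    """Return (best, worst) seconds from a silent failure to session teardown.

    The hold timer restarts whenever a KEEPALIVE or UPDATE arrives.
    Worst case: the peer dies right after sending a message, so the full
    hold time must elapse.  Best case: it dies just before its next
    keepalive was due, so one keepalive interval has already elapsed.
    """
    best = hold_time_s - keepalive_interval(hold_time_s)
    worst = hold_time_s
    return best, worst


if __name__ == "__main__":
    best, worst = detection_window(ASSUMED_HOLD_TIME_S)
    print(f"failure noticed after {best:.0f} to {worst:.0f} seconds")
```

With these assumed timers, detection alone fits within the "couple
of minutes or less" figure; the ten-to-fifteen-minute cases come from
processing the resulting announcements and withdrawals, which this
sketch does not model.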

However, fall-over happens fairly frequently and sometimes
as a result of having to make code changes and the like
at edge routers (routers colocated at NAPs/MAEs/FIXes etc),
and we have long established that done right, it's not
very painful.

|  . do we have some regular examples from *any* site A initiating a
|    connection from A to B, A to C, and A to D, where the three
|    verifiably (via traceroute, I guess) traverse different NAPs
|    (and hopefully only one each)?

Sure; if I understand the question correctly, anybody on
SprintLink or MCI should be able to do this without thinking about it.
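
One way to check this from saved traceroute output is sketched
below.  It is only an illustration: the hostname fragments, the
sample hostnames and addresses, and the function name are all
invented for this example, and real router names at the exchange
points would have to be substituted.

```python
import re
from typing import Dict, List

# Hypothetical hostname fragments mapped to exchange point names.
MARKERS: Dict[str, str] = {
    "mae-east": "MAE-EAST",
    "chicago-nap": "Chicago NAP",
    "pennsauken-nap": "Pennsauken NAP",
}

# A hop line starts with a hop number followed by a hostname or "*".
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)")


def exchanges_in_path(traceroute_text: str,
                      markers: Dict[str, str] = MARKERS) -> List[str]:
    """List the exchange points seen in a traceroute, in hop order."""
    found: List[str] = []
    for line in traceroute_text.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue  # header line or blank line
        host = match.group(2).lower()
        for fragment, name in markers.items():
            if fragment in host and name not in found:
                found.append(name)
    return found


# Made-up sample output; no real measurement is implied.
SAMPLE_BEFORE = """\
 1  gw.site-a.example (192.0.2.1)  1 ms
 2  core1.provider-x.example (192.0.2.9)  12 ms
 3  mae-east.provider-y.example (192.0.2.17)  30 ms
 4  host.site-b.example (192.0.2.33)  41 ms
"""

SAMPLE_AFTER = """\
 1  gw.site-a.example (192.0.2.1)  1 ms
 2  core1.provider-x.example (192.0.2.9)  12 ms
 3  pennsauken-nap.provider-y.example (192.0.2.25)  35 ms
 4  host.site-b.example (192.0.2.33)  47 ms
"""

if __name__ == "__main__":
    before = exchanges_in_path(SAMPLE_BEFORE)
    after = exchanges_in_path(SAMPLE_AFTER)
    print("before:", before)
    print("after: ", after)
    if before != after:
        print("path moved to a different exchange point")
```

Running the same check on consecutive traceroutes to B, C and D
would show whether each path crosses a single, different exchange
point, and comparing two runs to the same destination would show a
switch from one exchange point to another.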

|  . Are there routing stability reports accessible online from the RA
|    (or whoever else feels responsible for this) that graph fluctuations
|    at the NAPs, including correlation among them? What are the quality
|    metrics for routing stability?

Not AFAIK.

|  . Do all the NAPs provide online statistics?

The Sprint NAP has a statistics package which is really
nifty but not yet widely publicly available; you should
tap Bilal or one of the other responsible people on the
shoulder to show it off to you.  

|  . Are the NAP and RA regular reports to NSF publicly (hopefully via
|    the Web) available?

Not sure.  It might be a good idea.

|  . Is there any way NANOG can be used to exchange status information
|    about networks, rather than getting comments and rumors second or
|    third hand.

outage-request at sprint.net can put you on our (very
widely-subscribed) list for announcing SprintLink/ICM
outages, innages, root problems and potential solutions.  I
hope that you'll find that the quality of information there
is reasonable and that volume is fairly light, and generally
is what you seem to be asking for.

|    Even better
|    than posting (e.g., via some mailing list) would be an accessible
|    distributed data base covering all the service providers and
|    accessible via the network. Is someone already working on that?
|    Would not NANOG be *the* forum to cooperate on that?

I'm certainly open to the idea.   

The first thing you'll have to realize is that when you
can't get to the database because the network is broken,
it can't help you...

	Sean.