Carrier Circus (was RE: Intermedia (ICIX) brokenness...)

Fri May 4 19:47:09 UTC 2001

On Fri, May 04, 2001 at 12:18:18PM -0700, Jonathan Disher wrote:
>
> Personally, I'm still trying to figure out why Exodus, in all their
> apparent wisdom (or lack thereof), has stopped using the GBLX OC-48's
> in the former GlobalCenter facilities (or at least SNV3), and is now
> shuttling all its traffic out a single Exodus OC-12.  Prior to
> yesterday these traces would've shown gblx.net routers (on different
> IPs), and would never have touched an exodus backbone...

Hrm, let's think about that for a moment, shall we? Could it be, perhaps,
that Exodus purchased GlobalCenter and is integrating those facilities
into their network? Could it also be that Exodus has a well designed
network where most of the traffic is quickly sent to peers and an OC48
backbone is not required? I don't see any congestion on that OC12, so
perhaps that is the case? I also don't see a damn thing wrong with the
traceroute you provided, and an OC12 peer to UU is pretty good. Was there
some other complaint or do you just not like it when your traceroute
changes?

> Of course, this is probably a move I should've expected from Exodus,
> after the mongolian flustercluck that was the AS change in SNV3.  
> You'd think they would do something like that carefully, as you can
> -seriously- bone customers.  But noooooo.  One of our junior admins
> made the change (since I was out of town, but hey, it's cut and
> paste!).  He, and all of the other affected customers in SNV3 on the
> conference call, were left on hold for about half an hour (plus the
> call started half an hour late), whereupon the exodus engineering team
> popped back in and said "We're done with our side, you guys go
> ahead!".

Actually, I was awake for that. I guess your junior engineer wasn't able to
figure out that if he simply put in an additional neighbor statement with
the new AS, your downtime would have been less than 30 seconds as BGP came
back up. 30-second outages are pretty light in the history of GCTR and
GBLX outages. If you can't handle maintenance, then you should have set up
static routes out or multihomed, but you shouldn't blame your stupidity or
lack of forethought on other networks.

> Now.  Does it seem logical to kill connectivity over BOTH of your
> hosting routers at once, thus killing every single BGP-running
> customer you have that isn't physically in their cage at the time?  
> Or would it seem better to do what I assumed they'd do, which is do
> one router, wait for everyone to make changes, then do the other?

ASN changes are not exactly easy or frequent, but I seem to recall that
one going over rather smoothly. Customers were given ample warning and a
conference call was set up to handle any outstanding issues, of which there
were none.

> I guess this is what happens when I assume intelligence at a
> hosting/backbone provider.

Or when we assume intelligent posts to nanog...

-- 
Richard A Steenbergen <ras at e-gerbil.net>       http://www.e-gerbil.net/ras
PGP Key ID: 0x138EA177  (67 29 D7 BC E8 18 3E DA  B2 46 B3 D8 14 36 FE B6)