Any2 LAX

Bryan Holloway bryan at shout.net
Fri Jun 11 18:18:24 UTC 2021


This is what I got from those guys ...

--

CoreSite Incident Notification


Description:  During a planned maintenance event to integrate new 
hardware into our MPLS core, an extreme dip in Any2 traffic was observed. 
After about 4 hours running in a degraded state, an emergency case was 
opened with the hardware vendor. After working with the hardware vendor 
to rule out any possible hardware or software bugs, the network 
engineering team located the source of the traffic loss. It was an 
errant configuration applied by the custom automation written to build 
LSPs in our MPLS network. A formal IR will be provided for this event.




On 6/11/21 8:03 PM, jim deleskie wrote:
> Also saw a major traffic drop. I'm told a root cause report will be
> issued early in the week.
> 
> 
> -jim
> 
> On Fri, Jun 11, 2021 at 2:42 PM Siyuan Miao <aveline at misaka.io> wrote:
> 
>     Yeah, it was down, but both route servers were online and feeding us
>     unreachable nexthops during the outage.
> 
>     On Sat, Jun 12, 2021 at 1:27 AM Seth Mattinen <sethm at rollernet.us>
>     wrote:
> 
>         On 6/11/21 10:16 AM, Jon Lewis wrote:
>          > On Fri, 11 Jun 2021, Seth Mattinen wrote:
>          >
>          >> Did Any2 LAX barf last night between about 1am and 8am
>          >> Pacific time?
>          >
>          > More like 00:00-7:45 (Pacific time).
>          >
>          > Anyone know what broke, and why the IX was dead for nearly 8
>          > hours? This is our second recent issue with "an Any2 IX", having
>          > dealt with an IX partition event at Any2 Denver just a few weeks
>          > ago.
>          >
> 
> 
>         What I saw was a lot of unreachable nexthops (I'm in LA2) on routes
>         advertised through the route servers. Most of my direct BGP sessions
>         were down, but a handful were still working, including the route
>         servers.
> 
>         For example, I was getting routes for AS29791 from the route servers,
>         but nexthop 206.72.211.106 was dead to me. Not to pick on Internap;
>         I only mention them because a mutual customer called me directly at
>         1am wanting to know why things were down.
> 
>         I killed the route server sessions and went back to sleep.
> 
>         Feels like LA1 and LA2 got split, but whatever interconnects the
>         route servers still worked, which was problematic.
> 

