massive route hijack at AS48400, up to 6000 ASes affected?

Sun Jan 25 05:47:21 UTC 2009

On Jan 24, 2009, at 9:47 PM, AKK wrote:

> Hi all,
>
> Between Jan 24 23:20 and Jan 25 01:45 UK time, I saw major
> performance degradation from LINX peers over unusually strange
> routes to some eastern European countries; see the MTR at the
> bottom of this email.
>
> If this is true, it is exactly what a few people told us (and we
> knew) last year. AS48400, which is registered in the RIPE DB as a
> non-transit AS multihomed to two ISPs, probably announced various
> prefixes it had learned from one ISP to the other, effectively
> becoming a transit ISP. I am sure this was not the only point of
> failure, but it took quite a while for ISPs to fix it. It would be
> interesting to know whether it was unintentional. I know that at
> least two countries were almost taken offline internationally this
> way. I wonder whether some sort of action should or will be taken.
> Unfortunately, RIPE's ASinuse doesn't tell the story about 48400,
> but you can find it in the raw update logs and in BGPlay, where I
> counted around 6k ASes affected.
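A rough sketch of how such a count might be pulled from raw update logs, assuming the updates have already been parsed into (prefix, AS path) pairs with the collector-side AS first and the origin last. The function name and the two notions of "affected" (ASes that selected a path through the leaker, and origin ASes whose prefixes were leaked) are illustrative assumptions, not how the 6k figure above was actually derived.

```python
from typing import Iterable, Set, Tuple

# (prefix, AS path) with the collector-side AS first and the origin last.
Update = Tuple[str, Tuple[int, ...]]


def affected_ases(updates: Iterable[Update],
                  leaker: int) -> Tuple[Set[int], Set[int]]:
    """Return (ases_using_leak, origins_leaked) for one suspected leaker."""
    using: Set[int] = set()
    origins: Set[int] = set()
    for _prefix, path in updates:
        if leaker not in path:
            continue
        idx = path.index(leaker)
        # Every AS between the collector and the leaker picked the leaked path.
        using.update(path[:idx])
        # Prefixes originated by someone else but seen via the leaker.
        if path[-1] != leaker:
            origins.add(path[-1])
    return using, origins
```

Both sets matter: the first shows how far the leak propagated, the second whose prefixes were dragged through the leaking AS.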

A cursory look suggests this was nothing more than an
ordinary route leak. That's based on inspection of the
leaking AS and its position in the relevant paths, the
preservation of the original prefix lengths (as opposed
to deaggregation or re-origination), and the proximity of
those leaked routes to the leaking AS's transit providers.
Of course, an ordinary route leak could be the result of
accidental misconfiguration, but it could have been
malicious as well.
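The heuristics above could be sketched roughly as follows. This is a hypothetical classifier comparing an observed route with the expected one: a changed origin suggests re-origination, a more-specific prefix suggests deaggregation, and an unchanged prefix and origin with the suspect AS sitting mid-path directly behind one of its own transit providers looks like an ordinary leak. The `providers` set, the labels, and the function name are assumptions for illustration; paths are assumed non-empty.

```python
import ipaddress
from typing import FrozenSet, Tuple

# (prefix, AS path) with the collector-side AS first and the origin last.
Route = Tuple[str, Tuple[int, ...]]


def classify(expected: Route, observed: Route, suspect: int,
             providers: FrozenSet[int]) -> str:
    """Heuristically label an observed route relative to the expected one.

    `providers` is the (assumed known) set of the suspect AS's transits.
    """
    exp_net = ipaddress.ip_network(expected[0])
    obs_net = ipaddress.ip_network(observed[0])
    exp_path, obs_path = expected[1], observed[1]

    # A different origin AS means the prefix was re-originated, not leaked.
    if obs_path[-1] != exp_path[-1]:
        return "re-origination"

    # Same origin but a different prefix: check for deaggregation.
    if obs_net != exp_net:
        if obs_net.version == exp_net.version and obs_net.subnet_of(exp_net):
            return "deaggregation"
        return "unrelated"

    # Same prefix and origin, with the suspect in the middle of the path.
    if suspect in obs_path[:-1]:
        idx = obs_path.index(suspect)
        upstream = obs_path[idx - 1] if idx > 0 else None
        if upstream in providers:
            return "leak"
        return "unexpected-path"

    return "normal"
```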

When you've got:

1) an AS multi-homed to two different ISPs, AND
2) that AS fails to scope what it announces to those ISPs
(i.e., it doesn't explicitly advertise only locally
originated and downstream prefixes), AND
3) one or both of the ISPs employ neither per-prefix nor
explicit AS path filtering on ingress, OR
4) they do, but they enable a BGP session *before* they
apply that ingress policy to the session

This is exactly what you get....
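A minimal sketch of where conditions 2 through 4 bite, assuming routes are modeled as (prefix, AS path) pairs and the ISP's ingress policy is a simple prefix list plus an allowed-ASN set. `customer_egress` and `isp_ingress` are hypothetical helpers. Treating a missing policy as deny-all is a modeling choice here; the failure in 4) is precisely a session that accepts everything before the policy is attached.

```python
from typing import FrozenSet, List, Optional, Tuple

# (prefix, AS path); an empty AS path means the route is locally originated.
Route = Tuple[str, Tuple[int, ...]]
# Hypothetical ingress policy: (allowed prefixes, allowed ASNs in the path).
Policy = Tuple[FrozenSet[str], FrozenSet[int]]


def customer_egress(my_asn: int, rib: List[Route],
                    downstream_asns: FrozenSet[int]) -> List[Route]:
    """Condition 2 done right: export only locally originated routes and
    routes whose entire AS path lies within our own downstream cone."""
    out = []
    for prefix, path in rib:
        if not path or set(path) <= downstream_asns:
            out.append((prefix, (my_asn,) + path))  # prepend our ASN on export
    return out


def isp_ingress(received: List[Route],
                policy: Optional[Policy]) -> List[Route]:
    """Conditions 3 and 4: per-prefix plus AS path filtering on ingress.
    In this sketch a session with no policy attached accepts nothing."""
    if policy is None:
        return []
    allowed_prefixes, allowed_asns = policy
    return [(prefix, path) for prefix, path in received
            if prefix in allowed_prefixes and set(path) <= allowed_asns]
```

Either side's filter alone stops the leak; the incident needs both to be missing (or the ingress filter to arrive late).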

To complicate things further, common RFC 1998-style routing
policy models result in most clueful ISPs preferring customer
routes over peer routes (e.g., via local preference). Those
leaked routes therefore become the preferred path to ALL of
those prefixes, because local preference trumps AS path
length. The customer, with a T1, E1, 100M Ethernet, or
whatever, is now the sole primary transit data path between
the two networks in question, and all of the ISP's
non-customer egress traffic takes that route. The resulting
congestion and collateral damage make fixing the mistake...
challenging.
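To make the local preference point concrete, here's a simplified best-path comparison that looks only at local preference and then AS path length (real BGP decision processes have many more steps). The local preference values and class names are hypothetical, chosen to follow the usual customer > peer > transit convention.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical RFC 1998-style local preference: customer > peer > transit.
LOCAL_PREF = {"customer": 120, "peer": 100, "transit": 80}


@dataclass(frozen=True)
class Candidate:
    learned_from: str        # neighbor the route came from
    relationship: str        # "customer", "peer" or "transit"
    as_path: Tuple[int, ...]


def best_path(candidates: List[Candidate]) -> Candidate:
    """Highest local preference wins; AS path length only breaks ties."""
    return max(candidates,
               key=lambda c: (LOCAL_PREF[c.relationship], -len(c.as_path)))
```

A leaked route via a small customer beats the direct peer route even with a longer AS path, which is exactly how the customer's link ends up carrying the traffic.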

In the past, we always had fallback AS path filters for peers
that were automatically applied to all new sessions *before*
they were turned up, in order to avert this type of problem.
Basically, the AS path filters listed all the ASNs that we
bilaterally interconnected with, so that if a customer leaked
any of their other transit ISPs' routes to us, they'd be
dropped.
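Roughly what such a fallback filter does, as a sketch: drop any route learned from a customer whose AS path contains the ASN of anyone we interconnect with bilaterally, and attach that filter when the session object is created rather than after it's established. `PEER_ASNS`, `fallback_filter`, and `CustomerSession` are hypothetical names, and the ASNs come from the documentation range.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple

Route = Tuple[str, Tuple[int, ...]]

# Hypothetical: every ASN we interconnect with bilaterally.
PEER_ASNS: FrozenSet[int] = frozenset({64510, 64511})


def fallback_filter(routes: List[Route]) -> List[Route]:
    """Drop customer-learned routes whose AS path contains a peer's ASN;
    a customer should never be our path toward a peer."""
    return [(prefix, path) for prefix, path in routes
            if PEER_ASNS.isdisjoint(path)]


@dataclass
class CustomerSession:
    neighbor_asn: int
    # The filter is attached when the session object is created, so it is
    # already in place before the session is ever established.
    import_filter: Callable[[List[Route]], List[Route]] = fallback_filter
    established: bool = False

    def receive(self, routes: List[Route]) -> List[Route]:
        return self.import_filter(routes)
```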

Oh, and regarding 4) above, I experienced that first-hand
in 1995, when a BGP session between iMCI and Sprint
was turned up before ANY ingress BGP policy was applied.
The T1 customer immediately became the sole transit path
for all iMCI -> Sprint traffic, and all iMCI ->
non-iMCI-customer traffic, as they were taking full routes
from Sprint, and we preferred those paths (because of
default local preference) over all other paths. It took a
while to get the router rebooted to fix the problem.
I suspect you can find some evidence of this event in some
dusty NANOG archives out there somewhere.

Glad to see things have evolved so little...

I'll dig a little deeper and qualify the terse look
I've already had when I get some time.  I suspect others
will be looking as well.

-danny