CloudFlare issues?

James Jun james.jun at towardex.com
Tue Jun 25 00:19:14 UTC 2019


On Mon, Jun 24, 2019 at 08:03:26PM -0400, Tom Beecher wrote:
> 
> You are 100% right that 701 should have had some sort of protection
> mechanism in place to prevent this. But do we know they didn't? Do we know
> it was there and just setup wrong? Did another change at another time break
> what was there? I used 701 many jobs ago and they absolutely had filtering
> in place; it saved my bacon when I screwed up once and started
> readvertising a full table from a 2nd provider. They smacked my session
> down and I got a nice call about it.

In my past (and current) dealings with AS701, I do agree that they have generally
been good about filtering customer sessions and running a tight ship.  But, manual
config changes being what they are, I suppose an honest mistake or oversight occurred
at 701 today, and it made them contribute significantly to today's outage.
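
For readers less familiar with what "filtering customer sessions" involves, here is a
minimal Python sketch of the two usual safeguards: a per-customer prefix allow-list and
a max-prefix limit that tears the session down.  Everything in it is an assumption for
illustration (the function names, the documentation prefixes, the limit of 10); it is
not AS701's actual policy or any vendor's implementation.

```python
import ipaddress

# Hypothetical customer policy. The prefixes are documentation ranges, not real data.
ALLOWED = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]
MAX_PREFIXES = 10  # assumed max-prefix limit for this customer session


def accept_route(prefix, allowed=ALLOWED, max_len=24):
    """Accept a route only if it falls inside the customer's registered space
    and is no more specific than max_len."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen > max_len:
        return False
    # subnet_of() raises TypeError on mixed IPv4/IPv6, so compare versions first.
    return any(net.version == a.version and net.subnet_of(a) for a in allowed)


def process_session(announcements, allowed=ALLOWED, max_prefixes=MAX_PREFIXES):
    """Return (accepted_routes, session_up).

    If the customer sends more prefixes than the limit, drop the whole session
    instead of trying to sort good routes from bad ones.
    """
    if len(announcements) > max_prefixes:
        return [], False
    accepted = [p for p in announcements if accept_route(p, allowed)]
    return accepted, True
```

With this sketch, a customer who re-advertises a large table trips the limit and loses
the session, while a handful of stray prefixes outside the allow-list are dropped
without taking the session down.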


> 
> It also would have been nice, in my opinion, to take a harder stance on the
> BGP optimizer that generated the bogus routes, and the steel company that
> failed BGP 101 and just gladly reannounced one upstream to another. 701 is
> culpable for their mistakes, but there doesn't seem like there is much
> appetite to shame the other contributors.

I think the biggest question to ask here is: why the hell is a BGP optimizer
(Noction in this case) injecting fake more-specifics to steer traffic?  And why did a
regional provider selling IP transit (DQE) use such a dangerous
accident-waiting-to-happen tool in their network, especially when they have other ASNs
taking transit feeds from them, with all these fake man-in-the-middle routes being
injected?

I get that BGP optimizers can have some use cases, but IMO, in most situations
(especially if you are a network provider selling transit and taking peering yourself),
a well-crafted routing policy and interconnection strategy eliminates the need for
implementing flawed route selection optimizers in your network.

The notion of a BGP optimizer generating fake more-specifics is absurd, and it is
definitely not a tool that is designed to "fail -> safe".  Instead of failing safe, it
failed epically and catastrophically today.  I remember a long time ago, when Internap
used to sell their FCP product, Internap SEs were advising customers to make appropriate
adjustments to local-preference to prefer the FCP-generated routes to ensure optimal
selection.  That is a much saner design choice than injecting man-in-the-middle
attacks and relying on customers to prevent a disaster.
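
To make the difference concrete, here is a minimal Python sketch contrasting the two
approaches.  It is a deliberate simplification: forwarding uses longest-prefix match
first, and only then does local-preference break ties among routes for the same prefix
(the real BGP decision process has many more steps).  The Route class, the next-hop
labels and the prefixes are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str          # hypothetical label for where the route was learned
    local_pref: int = 100  # common default value


def best_route(rib, destination):
    """Pick the route used for a destination: longest prefix wins outright,
    local_pref only decides between routes of equal prefix length."""
    addr = ipaddress.ip_address(destination)
    candidates = [r for r in rib if addr in ipaddress.ip_network(r.prefix)]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda r: (ipaddress.ip_network(r.prefix).prefixlen, r.local_pref),
    )


# Local-preference steering: same prefix, no new routes invented.
rib = [
    Route("203.0.113.0/24", "transit_a", local_pref=100),
    Route("203.0.113.0/24", "transit_b", local_pref=200),
]
steered = best_route(rib, "203.0.113.10")

# A leaked fake more-specific beats both, no matter how low its local_pref is.
leaked_rib = rib + [Route("203.0.113.0/25", "leaked_optimizer_route", local_pref=50)]
hijacked = best_route(leaked_rib, "203.0.113.10")
```

In this sketch `steered` goes via `transit_b`, and withdrawing the preference simply
falls back to `transit_a`.  Once the /25 escapes, `hijacked` follows the leaked route
in every network that accepts it, which is why the more-specific approach does not
fail safe.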

Any time I sit down with an engineer who "outsources" the responsibility of
upholding the robustness principle to their customers, it makes me want to puke.

James


