Request comment: list of IPs to block outbound

Lukas Tribus lists at ltri.eu
Fri Oct 18 20:45:27 UTC 2019


Hello,

On Fri, Oct 18, 2019 at 7:40 PM Saku Ytti <saku at ytti.fi> wrote:
> It's interesting to also think, when is good time to break things.
>
> CustomerA buys transit from ProviderB and ProviderA
>
> CustomerA gets new prefix, but does not appropriately register it.
>
> ProviderB doesn't filter anything, so it works. ProviderA does filter
> and does not accept this new prefix. Neither Provider has ACL.
>
>
> Some time passes, and ProviderB connection goes down, the new prefix,
> which is now old prefix experiences total outage. CustomerA is not
> happy.
>
>
> Would it have been better, if ProviderA would have ACLd the traffic
> from CustomerA? Forcing the problem to be evident when the prefix is
> young and not in production. Or was it better that it broke later on?

That's an orthogonal problem, and its solution hopefully doesn't
require a traffic-impacting ingress ACL.

I'm saying this breaks valid configurations because, even with
textbook IRR registrations, there is a race condition between the IRR
registration (not a route object, but a new AS in the AS-MACRO), the
ACL update and the BGP turn-up of a new customer (an AS further down).
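
To make the ordering concrete, here is a minimal Python sketch (with
hypothetical timestamps; this is not anyone's actual tooling) of the
race: the AS-MACRO change only reaches the ingress ACL at the next
scheduled rebuild, so a turn-up before that rebuild gets dropped:

from datetime import datetime

def acl_stale_at_turnup(irr_update, turnup, rebuild_times):
    """True if no ACL rebuild ran between the IRR update and the
    turn-up, i.e. the ingress ACL still reflects the old AS-MACRO."""
    return not any(irr_update <= t <= turnup for t in rebuild_times)

# hypothetical: AS-MACRO fixed at 13:00, nightly rebuild at 02:00,
# BGP turn-up at 14:00 the same day -> ACL is stale, traffic dropped
irr_update = datetime(2019, 10, 18, 13, 0)
turnup     = datetime(2019, 10, 18, 14, 0)
rebuilds   = [datetime(2019, 10, 18, 2, 0), datetime(2019, 10, 19, 2, 0)]
print(acl_stale_at_turnup(irr_update, turnup, rebuilds))  # True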


Here's the environment for the examples below:

Customer C1 uses existing transits Provider P11 and P12 (meaning C1 is
actually a production network; dropping traffic sourced by it in the
DFZ is very bad; P11 and P12 are otherwise irrelevant).
Customer C1 is about to turn up a BGP session to Provider P13.
Provider P13 is a Tier 2 and buys transit from Tier 1 Providers P1 and P2.
Provider P2 deploys ingress ACLs generated from IRR data, based on
P13's AS-MACRO.
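
For clarity, this is roughly what P2's nightly job has to produce (a
toy Python sketch with a hard-coded, hypothetical IRR snapshot instead
of real whois queries, and generic "permit/deny" lines instead of any
vendor's ACL syntax; in practice tools like bgpq3/bgpq4 do the AS-SET
expansion):

# hypothetical IRR snapshot: AS-SET members and route objects per ASN
AS_SET_MEMBERS = {"AS-P13-CUSTOMERS": ["AS64500", "AS64501"]}  # AS64501 = C1
ROUTE_OBJECTS  = {"AS64500": ["192.0.2.0/24"],
                  "AS64501": ["198.51.100.0/24"]}  # C1's new prefix

def expand(as_set):
    """Flatten an AS-SET into the prefixes of its member ASNs."""
    prefixes = []
    for asn in AS_SET_MEMBERS.get(as_set, []):
        prefixes.extend(ROUTE_OBJECTS.get(asn, []))
    return prefixes

def ingress_acl(as_set):
    """Permit only packets sourced from registered prefixes, drop the
    rest - a missing entry here drops forwarded traffic, not just a
    route."""
    return [f"permit src {p}" for p in expand(as_set)] + ["deny any"]

print("\n".join(ingress_acl("AS-P13-CUSTOMERS")))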


Example 1:

P13's AS-MACRO is updated last-minute because:

- provisioning was last minute OR
- provisioning was wrong initially OR
- it's an emergency turn-up
- whatever the case, the IRR records are corrected only 60 minutes
before the turn-up
- and C1 is aware that traffic towards C1 will completely converge only
after an additional 24 hours (but that's accepted, because $reasons;
maybe C1 just needs TX bandwidth - in a hypothetical emergency turn-up,
for example)

At the turn-up of C1_P13, traffic with as-path C1_P13_P2 is dropped,
because the ingress ACL at P2 hasn't been updated yet (it is updated
only once every night). P13 expected the prefixes not to be accepted
at P2 on the BGP session, but never would have imagined that traffic
sourced from valid prefixes present in the DFZ would be dropped.
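
To put rough numbers on it (purely illustrative, assuming P2 runs a
single nightly ACL rebuild at 02:00 and nothing else goes wrong), the
window during which C1_P13_P2 traffic is blackholed looks like this:

from datetime import datetime

# hypothetical timestamps for example 1
irr_fixed    = datetime(2019, 10, 18, 17, 0)  # AS-MACRO corrected
turnup       = datetime(2019, 10, 18, 18, 0)  # BGP turn-up 60 min later
next_rebuild = datetime(2019, 10, 19, 2, 0)   # P2's next nightly ACL run

# the prefix is valid and present in the DFZ, forwarding towards P2 is
# legitimate, but P2's stale ingress ACL drops the packets until the
# next rebuild
print(next_rebuild - turnup)  # 8:00:00 of dropped customer traffic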


Example 2:

Just as in example 1, C1 turns up BGP with P13, but the provisioning
was "normal". P13's AS-MACRO was updated correctly 36 hours before the
turn-up.

However, at P2 the nightly cronjob for IRR updates (prefix-lists and
ingress ACL filters) failed. It is monitored, and a ticket about the
failing cronjob was raised, however the people handling it either:

- did not recognize the severity, because "worst-case some new
prefixes are not allowed in ingress tomorrow"
- were unable to fix it in just a few hours
- did fix it, but did not trigger a subsequent full rerun ("it will
run next time", or "it could not complete anyway before the next run")
- maybe the node was actually just unreachable due to regular
maintenance, so automation could not connect this time around
- or maybe automation just couldn't connect to the $node, because
someone regenerated the SSH key by mistake this morning

Whatever the case, the point is: because of internal problems at P2,
the ACL wasn't updated during the night as it usually is. And at the
turn-up of C1_P13, C1_P13_P2 traffic is again dropped on the floor.
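
Example 2 is at its core a monitoring and rerun problem. A minimal
sketch of the kind of guard P2's nightly job would need (hypothetical
script path; the point is that a failure has to trigger an immediate
retry and escalation, not wait for tomorrow's run):

import subprocess, sys, time

def run_filter_update():
    """Run the (hypothetical) IRR -> prefix-list/ACL generation and push."""
    return subprocess.run(["/usr/local/bin/update-irr-filters"]).returncode == 0

def nightly():
    # Retry instead of silently waiting for tomorrow: with ingress ACLs,
    # a skipped run means blackholed customer traffic, not merely "some
    # new prefixes aren't accepted yet".
    for attempt in range(3):
        if run_filter_update():
            return 0
        time.sleep(600)  # wait 10 minutes, then try again
    print("IRR/ACL update failed repeatedly - escalate now", file=sys.stderr)
    return 1

if __name__ == "__main__":
    sys.exit(nightly())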



When you reject a BGP prefix, you don't blackhole traffic; with an
ingress ACL you do. That is a big difference, and because of it you
*but more importantly every single downstream ASN* need to account for
race conditions and failures in the entire process, including their
immediate resolution, which is not required for strict BGP
prefix-lists and uRPF loose mode.
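
The difference can be shown with a toy forwarding model (illustrative
only, hypothetical prefixes): a stale strict prefix-list just removes
one BGP path, while a stale ingress ACL at P2 drops packets sourced
from the new prefix no matter how valid that prefix is in the DFZ:

# stale strict prefix-list at P2: the new prefix isn't accepted from
# P13, but P2 still learns it from P1, so reachability towards C1 survives
rib_at_p2 = {"198.51.100.0/24": ["P1"]}  # accepted paths after filtering

def reach_c1(prefix):
    paths = rib_at_p2.get(prefix)
    return f"reachable via {paths[0]}" if paths else "unreachable"

# stale ingress ACL at P2: packets sourced from the new prefix arriving
# on the P13 interface are dropped, even though the prefix is valid
permitted_sources = {"192.0.2.0/24"}  # 198.51.100.0/24 still missing

def accept_from_p13(src_prefix):
    return "forwarded" if src_prefix in permitted_sources else "blackholed"

print(reach_c1("198.51.100.0/24"))         # reachable via P1
print(accept_from_p13("198.51.100.0/24"))  # blackholed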


Is this deployed like this in a production transit network? How does
this network handle a failure like the one in example 2? How do its
downstream customers handle race conditions like the one in example 1?


For the record: I'm imagining myself operating P13 and getting blamed
in both examples for partially blackholing C1's traffic at the turn-up.



Thanks,
Lukas


