Cogent Layer 2

Thu Oct 15 07:28:26 UTC 2020

Saku,

My experience with multiple carriers is that reroutes are rare and complete in under a minute. I also have redundant backup circuits to another datacenter, so no traffic is truly lost. If an outage lasts longer than 5 minutes, or the circuit is flapping very frequently, I call the carrier. Last-mile carriers install CPE at the sites, which makes BFD a requirement to detect the CPE's fiber uplink going down, or an issue upstream.
As for security vulnerabilities, none can be leveraged if the BFD sessions use internal IPs. If they do not, a quick ACL can drop BFD traffic from unknown sources, the same way BGP sessions are filtered.
In Juniper speak, the ACL would look like:
(under policy-options)
prefix-list bgp_hosts {
    apply-path "protocols bgp group <*> neighbor <*>";
}

(under firewall family inet(6) filter mgmt_acl)
term allow_bfd {
    from {
        protocol udp;
        destination-port [ 3784 3785 4784 ];
        source-prefix-list bgp_hosts;
    }
    then accept;
}
term deny_bfd {
    from {
        protocol udp;
        destination-port [ 3784 3785 4784 ];
    }
    then discard;
}
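
To illustrate what the two terms above do to a packet, here is a rough Python sketch of the same decision. It is not Junos, just the matching logic as I read it; the function name, the "next-term" return value, and the example neighbor addresses (documentation ranges) are hypothetical and only there for illustration.

```python
import ipaddress

# Same UDP destination ports as in the filter above.
BFD_PORTS = {3784, 3785, 4784}


def bfd_filter_action(protocol, dst_port, src_ip, bgp_hosts):
    """Return 'accept', 'discard', or 'next-term' for a single packet.

    bgp_hosts stands in for the bgp_hosts prefix-list: a list of
    prefixes covering the configured BGP neighbors (assumed input).
    """
    # Neither term matches non-BFD traffic, so later terms decide.
    if protocol != "udp" or dst_port not in BFD_PORTS:
        return "next-term"

    src = ipaddress.ip_address(src_ip)
    for prefix in bgp_hosts:
        net = ipaddress.ip_network(prefix)
        # term allow_bfd: BFD from a known BGP neighbor is accepted.
        if src.version == net.version and src in net:
            return "accept"

    # term deny_bfd: BFD from anyone else is dropped.
    return "discard"


# Hypothetical neighbors, one IPv4 and one IPv6.
bgp_hosts = ["192.0.2.1/32", "2001:db8::1/128"]
```

With that list, bfd_filter_action("udp", 3784, "192.0.2.1", bgp_hosts) is accepted, the same packet from 198.51.100.7 is discarded, and anything that is not UDP to a BFD port falls through to the rest of the filter.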

Ryan

On Oct 14 2020, at 11:29 pm, Saku Ytti <saku at ytti.fi> wrote:
> On Thu, 15 Oct 2020 at 09:11, Ryan Hamel <ryan at rkhtech.org> wrote:
>
>
> > Yep. Make sure you run BFD with your peering protocols, to catch outages very quickly.
>
> Make sure you get higher availability with BFD than without it. It is easy to get this wrong and end up losing availability.
>
> The first issue is that BFD has quite a lot of bug surface because, unlike most of your control-plane protocols, BFD is implemented in your NPU ucode when done right.
> We've had an entire linecard go down on ASR9k due to BFD, and there is a BFD-of-death packet you can send over the internet to crash a JNPR FPC.
> When done in the control plane, poor scheduling can cause false positives more often than it protects from actual outages (CISCO7600).
>
> Even in a world where BFD is perfect, you still need to consider what you are protecting yourself from. Say you bought Martini from someone and run your backbone over that Martini. What is an outage? Is your provider's IGP rerouting due to a backbone outage an outage to you? Or would you rather the provider converge their network while you don't converge and you take the outage?
> If provider rerouting is not an outage, you need to know what their SLA is regarding rerouting time and make BFD less aggressive than that. If provider rerouting is an outage, you can of course run as aggressive timers as you want, but you probably have lower availability than without BFD.
>
> Also, don't add complexity to solve problems you don't have. If you don't know if BFD improved your availability, you didn't need it.
> Networking is full of belief practices. We do things because we believe they help, and faux data is often used to dress the beliefs up as science. The problem space tends to be complex and good-quality data is hard to come by, so we necessarily fly by the seat of our pants a lot, whether we admit it or not.
> My belief is that the majority of BFD implementations in real life reduce availability on average. My belief is that you need a frequently failing link which does not propagate link-down to reliably improve availability by deploying BFD.
>
> --
> ++ytti
>
