Facebook post-mortems...

Bjørn Mork bjorn at mork.no
Wed Oct 6 06:56:33 UTC 2021

Masataka Ohta <mohta at necom830.hpcl.titech.ac.jp> writes:

> As long as name servers with expired zone data won't serve
> request from outside of facebook, whether BGP routes to the
> name servers are announced or not is unimportant.

I am not convinced this is true.  You'd normally serve some semi-static
content, especially wrt stuff you need yourself to manage your network.
Removing all DNS servers at the same time is never a good idea, even in
the situation where you believe they are all failing.

The problem is of course that you can't let the servers take the
decision to withdraw from anycast if you want to prevent this
catastrophe.  The servers have no knowledge of the rest of the network.
They only know that they've lost contact with it.  So they all make the
same stupid decision.

But if the servers can't withdraw, then they will serve stale content if
the data center loses backbone access. And with a large enough network
then that is probably something which happens on a regular basis.

This is a very hard problem to solve.

Thanks a lot to facebook for making the detailed explanation available
to the public.  I'm crossing my fingers hoping they follow up with
details about the solutions they come up with.  The problem affects any
critical anycast DNS service. And it doesn't have to be as big as
facebook to be locally critical to an enterprise, ISP or whatever.


More information about the NANOG mailing list