Facebook post-mortems...

Masataka Ohta mohta at necom830.hpcl.titech.ac.jp
Wed Oct 6 08:51:25 UTC 2021


Hank Nussbacher wrote:

> - "it was not possible to access our data centers through our normal
>  means because their networks were down, and second, the total loss
> of DNS broke many of the internal tools we'd normally use to
> investigate and resolve outages like this.  Our primary and
> out-of-band network access was down..."
> 
> Does this mean that FB acknowledges that the loss of DNS broke their
> OOB access?

It means FB still does not understand what happened.

Lack of BGP announcement does not mean "total loss". Name
servers should still be accessible by internal tools.

But withdrawing the route (in BGP and, maybe, the IGP) of a failing
anycast server is bad engineering, seemingly derived from the commonly
seen misunderstanding that anycast can provide redundancy.

Redundancy of DNS is maintained by multiple (unicast or anycast)
name servers with different addresses, for which withdrawal of
a failing route is an unnecessary complication.
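
As a minimal sketch of that point, assuming a client that is configured
with several name server addresses: the client simply moves on to the
next address when one does not answer, so no route needs to be withdrawn
for the lookup to succeed. The names resolve_with_failover,
AllServersFailed and the query callback are hypothetical, made up for
illustration, and the callback stands in for whatever actually sends
the DNS query.

```python
from typing import Callable, Dict, Sequence


class AllServersFailed(Exception):
    """Raised when every configured name server address failed."""


def resolve_with_failover(
    name: str,
    servers: Sequence[str],
    query: Callable[[str, str], str],
) -> str:
    """Try each name server address in order and return the first answer.

    `query(addr, name)` is a hypothetical callback that sends one DNS
    query to `addr` and raises OSError (for example TimeoutError) when
    the server is unreachable or does not reply.
    """
    errors: Dict[str, Exception] = {}
    for addr in servers:
        try:
            return query(addr, name)
        except OSError as exc:
            # This address failed; remember why and try the next one.
            errors[addr] = exc
    # Only when all distinct addresses fail is the name unresolvable.
    raise AllServersFailed(errors)
```

A failing server behind one address costs the client a timeout, not the
answer, as long as a server behind another address is still reachable.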

						Masataka Ohta

