Facebook post-mortems...

Hank Nussbacher hank at interall.co.il
Wed Oct 6 04:51:52 UTC 2021


On 05/10/2021 21:11, Randy Monroe via NANOG wrote:
> Updated: 
> https://engineering.fb.com/2021/10/05/networking-traffic/outage-details/ 

Let's try to break down this "engineering" blog posting:

- "During one of these routine maintenance jobs, a command was issued 
with the intention to assess the availability of global backbone 
capacity, which unintentionally took down all the connections in our 
backbone network"

Can anyone guess what command FB issued that would cause them to 
withdraw all those prefixes?

- "it was not possible to access our data centers through our normal 
means because their networks were down, and second, the total loss of 
DNS broke many of the internal tools we’d normally use to investigate 
and resolve outages like this.  Our primary and out-of-band network 
access was down..."

Does this mean that FB acknowledges that the loss of DNS broke their OOB 
access?

-Hank

