DNS pulling BGP routes?

Jon Lewis jlewis at lewis.org
Wed Oct 6 22:50:19 UTC 2021


On Wed, 6 Oct 2021, Michael Thomas wrote:

>
> On 10/6/21 3:33 PM, Jon Lewis wrote:
>>  On Wed, 6 Oct 2021, Michael Thomas wrote:
>>
>>>>   People have been anycasting DNS server IPs for years (decades?). So,
>>>>  no.
>>>>
>>>  But it wasn't just their DNS subnets that were pulled, I thought. I'm
>>>  obviously really confused. Anycast to a DNS server makes sense that
>>>  they'd pull out if they couldn't contact the backend. But I thought that
>>>  almost all of their routes to the backend were pulled? That is, the DFZ
>>>  was emptied of FB routes.
>>
>>  Well, as someone else said, DNS wasn't the problem...it was just one of
>>  the more noticeable casualties.  Whatever they did broke the network
>>  rather completely, and that took out all of their DNS, which broke lots of
>>  other things that depend on DNS.
>> 
> Maybe the problem here is that two things happened and the article conflated 
> the two: the DNS infrastructure pulled its routes from the anycast address 
> and something else pulled all of the other routes but wasn't mentioned in the 
> article.

From the engineering.fb.com article:

"This was the source of yesterday’s outage. During one of these routine 
maintenance jobs, a command was issued with the intention to assess the 
availability of global backbone capacity, which unintentionally took down 
all the connections in our backbone network, effectively disconnecting 
Facebook data centers globally."

If you kill the backbone, and every site determines "my connectivity is 
hosed, suppress anycast propagation," then you simultaneously have no 
network and no anycast (which might otherwise propagate to transit/peers 
at each, or at least some subset, of your sites). All of your internal 
data and communication systems that rely on both the network and working 
DNS suddenly stop working, so internal communications likely degraded to 
engineers calling or texting each other.

From one of the earlier articles, it sounds like they don't have true out 
of band access to their routers/switches, which makes it kind of hard to 
fix the network, if it's no longer a network and you have no access to 
console or management ports.

----------------------------------------------------------------------
  Jon Lewis, MCP :)           |  I route
  StackPath, Sr. Neteng       |  therefore you are
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________

