DNS pulling BGP routes?

Christopher Morrow morrowc.lists at gmail.com
Mon Oct 11 15:04:38 UTC 2021


On Sat, Oct 9, 2021 at 11:16 AM Masataka Ohta <mohta at necom830.hpcl.titech.ac.jp> wrote:

> Bill Woodcock wrote:
>
> >> It may be that Facebook uses all four name server IP addresses
> >> in each edge node. But that effectively kills the essential redundancy
> >> of DNS, having two or more name servers (at separate locations),
> >> and the natural consequence is, as you can see, mass disaster.
> >
> > Yep.  I think we even had a NANOG talk on exactly that specific topic a
> long time ago.
> >
> >
> https://www.pch.net/resources/Papers/dns-service-architecture/dns-service-architecture-v10.pdf
>
> Yes, having separate sets of anycast addresses announced by two or more
> pops should be fine.
>
>
To be fair, it looks like FB has 4 /32s (and 4 /128s) for their DNS
authoritatives, all from different /24s or /48s, so they should have decent
routing diversity. They could choose to announce half from one set of pops
and half from another, or play other games like that. I don't know that that
would have solved any of last week's problems, or any future ones.
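
(As a rough illustration, not anything from FB: a few lines of Python to
group a zone's name server addresses by covering /24 and /48 to eyeball that
diversity. The a..d.ns.facebook.com names are my assumption about which
servers to check, not something stated in this thread.)

import socket
import ipaddress

# Assumed server names; adjust to whatever the zone's NS records actually say.
NS_NAMES = [f"{c}.ns.facebook.com" for c in "abcd"]

def covering_prefix(addr: str) -> str:
    # /24 for IPv4, /48 for IPv6 -- the granularity mentioned above.
    ip = ipaddress.ip_address(addr)
    plen = 24 if ip.version == 4 else 48
    return str(ipaddress.ip_network(f"{addr}/{plen}", strict=False))

prefixes = {}
for name in NS_NAMES:
    for family, _, _, _, sockaddr in socket.getaddrinfo(name, None):
        if family in (socket.AF_INET, socket.AF_INET6):
            prefixes.setdefault(covering_prefix(sockaddr[0]), set()).add(name)

for prefix, names in sorted(prefixes.items()):
    print(prefix, "->", ", ".join(sorted(names)))

Four distinct /24s and four distinct /48s in that output would match the
"decent routing diversity" point above.
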
I think Bill's slide 30 is pretty much what FB has/had deployed:
  1) I would think the a/b cloud is really 'as similar a set of paths from
like deployments as possible' (a toy sketch of that split follows this list)
  2) redundant pairs of servers in the same transit/network
  3) hidden masters (almost certainly these are in the depths of the FB
datacenter network)
      (though this part isn't important for the conversation)
  4) control/sync traffic on a different topology than the customer-serving
one
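
(A toy model of that a/b split, under my own assumptions rather than FB's
actual layout: two "clouds" of anycast service prefixes, each announced from
a different set of pops, so losing every pop that carries one cloud still
leaves the other cloud reachable. Addresses are RFC 5737 documentation
examples.)

# Cloud A and cloud B each get half of the (example) anycast service prefixes.
CLOUD_A = ["192.0.2.1/32", "192.0.2.2/32"]
CLOUD_B = ["198.51.100.1/32", "198.51.100.2/32"]

# Each pop announces exactly one cloud; like deployments share like paths.
POPS = {
    "pop-east-1": CLOUD_A,
    "pop-west-1": CLOUD_A,
    "pop-east-2": CLOUD_B,
    "pop-west-2": CLOUD_B,
}

def reachable_prefixes(healthy_pops):
    """Prefixes still announced somewhere, given the set of healthy pops."""
    return {p for pop in healthy_pops for p in POPS[pop]}

# Both cloud-A pops gone: cloud B's two prefixes are still announced.
print(reachable_prefixes({"pop-east-2", "pop-west-2"}))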


> However, if a CDN provider has their own transit backbone, which is,
> seemingly, not assumed by your slides, and retail ISPs are tightly
>

I think it is, actually, in slide 30:
   "We need a network topology to carry control and synchronization traffic
between the nodes."

> connected to only one pop of the CDN provider, the CDN provider
>

It's also not clear that FB is connecting their CDN to single points in any
provider. I'd guess there are some cases of that, but for larger networks I
would imagine there are multiple CDN deployments per network. I can't imagine
that it's safe to deploy a single CDN node for all of 7018 or 3320, for
instance.


> may be motivated to let users access only one pop, killing the essential
> redundancy of DNS (treating it as overengineering), which is my concern
> about the paragraph you quoted.
>
>
It seems that the problem FB ran into was really that there wasn't either:
   a secondary path to communicate "you are the last one standing, do not
die" to an edge node (sketched below),
 or:
   a very long/less-preferred path to a core location (or locations) to
maintain service in case the CDN disappears.
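
(A minimal sketch of that first idea, with my own assumed decision rule
rather than anything from FB's write-ups: an edge node only withdraws its
anycast DNS announcement when it can confirm, over some secondary path, that
at least one sibling is still serving.)

def decide(local_backend_ok: bool, confirmed_healthy_siblings: int) -> str:
    """Announce-or-withdraw decision for one edge node's anycast DNS prefix."""
    if local_backend_ok:
        return "announce"   # normal case: keep serving
    if confirmed_healthy_siblings > 0:
        return "withdraw"   # others can absorb the traffic
    # Backend looks broken AND no sibling is confirmably up (or the secondary
    # path itself is down): stay up rather than take the whole service dark.
    return "announce"

assert decide(True, 0) == "announce"
assert decide(False, 3) == "withdraw"
assert decide(False, 0) == "announce"   # the "last one standing" case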

There are almost certainly more complexities in FB's design/deployment, which
they are not discussing, that affected their services last week; but it
doesn't look like they were very far off on their deployment, given that they
need to maintain back-end connectivity to serve customers from the CDN
locales.

-chris