Facebook post-mortems...

Warren Kumari warren at kumari.net
Tue Oct 5 18:07:46 UTC 2021


On Tue, Oct 5, 2021 at 1:47 PM Miles Fidelman <mfidelman at meetinghouse.net>
wrote:

> jcurran at istaff.org wrote:
>
> Fairly abstract - Facebook Engineering -
> https://m.facebook.com/nt/screen/?params=%7B%22note_id%22%3A10158791436142200%7D&path=%2Fnotes%2Fnote%2F&_rdr
>
> Also, Cloudflare’s take on the outage -
> https://blog.cloudflare.com/october-2021-facebook-outage/
>
> FYI,
> /John
>
> This may be a dumb question, but does this suggest that Facebook publishes
> rather short TTLs for their DNS records?  Otherwise, why would an internal
> failure make them unreachable so quickly?
>

Looks like 60 seconds:

$  dig +norec star-mini.c10r.facebook.com. @d.ns.c10r.facebook.com.

; <<>> DiG 9.10.6 <<>> +norec star-mini.c10r.facebook.com. @d.ns.c10r.facebook.com.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 25582
;; flags: qr aa; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;star-mini.c10r.facebook.com. IN A

;; ANSWER SECTION:
star-mini.c10r.facebook.com. 60 IN A 157.240.229.35

;; Query time: 42 msec
;; SERVER: 185.89.219.11#53(185.89.219.11)
;; WHEN: Tue Oct 05 14:01:06 EDT 2021
;; MSG SIZE  rcvd: 72
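
For anyone who wants to pull the TTL out of output like this programmatically, here is a minimal Python sketch. It assumes the usual dig answer layout of name, TTL, class, type, rdata separated by whitespace; the names AnswerRecord and parse_answer_line are made up for illustration and are not part of dig or any library.

```python
from typing import NamedTuple


class AnswerRecord(NamedTuple):
    """One line of a dig ANSWER section (hypothetical helper type)."""
    name: str
    ttl: int
    rclass: str
    rtype: str
    rdata: str


def parse_answer_line(line: str) -> AnswerRecord:
    # Assumed layout: name, TTL, class, type, rdata, whitespace-separated.
    # rdata is kept whole because some record types contain spaces.
    name, ttl, rclass, rtype, rdata = line.split(None, 4)
    return AnswerRecord(name, int(ttl), rclass, rtype, rdata.strip())


record = parse_answer_line("star-mini.c10r.facebook.com. 60 IN A 157.240.229.35")
print(record.ttl, record.rdata)
```

Because the query was sent with +norec straight to the authoritative server, the 60 here is the full published TTL rather than a resolver's counted-down remainder.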



... and cue the "Bwahahhaha! If *I* ran Facebook I'd make the TTL be [2
sec|30sec|5min|1h|6h+3sec|1day|6months|maxint32]" threads....

Choosing the TTL is a balancing act between stability, agility, load,
politeness, renewal latency, etc -- but I'm sure NANOG can boil it down to
"They did it wrong!..."
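
To put rough numbers on that trade-off, here is a small Python sketch. It is a toy model, not a description of how Facebook or any resolver actually behaves: it assumes each resolver refreshes the record exactly once per TTL and that cache ages are spread uniformly at the moment the authoritative servers become unreachable. The function names are hypothetical.

```python
def fraction_still_cached(seconds_since_outage: float, ttl: float) -> float:
    """Share of resolvers still holding the record, under the toy model.

    Assumes cache ages are uniformly distributed over [0, ttl) when the
    authoritative servers go dark, so the cached share decays linearly
    and reaches zero once a full TTL has elapsed.
    """
    remaining = 1.0 - seconds_since_outage / ttl
    return max(0.0, min(1.0, remaining))


def refreshes_per_hour(ttl: float) -> float:
    """Authoritative queries per resolver per hour, one refresh per TTL."""
    return 3600.0 / ttl


# Compare a few candidate TTLs 30 seconds into an outage.
for ttl in (60, 300, 3600):
    print(ttl, fraction_still_cached(30, ttl), refreshes_per_hour(ttl))
```

Under these assumptions a 60-second TTL means every cache has expired within a minute of the failure, while a one-hour TTL keeps most caches populated for a long time but also makes any intentional change take up to an hour to propagate, at one sixtieth of the query load.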

W


> Miles Fidelman
>
> --
> In theory, there is no difference between theory and practice.
> In practice, there is.  .... Yogi Berra
>
> Theory is when you know everything but nothing works.
> Practice is when everything works but no one knows why.
> In our lab, theory and practice are combined:
> nothing works and no one knows why.  ... unknown
>
>

-- 
The computing scientist’s main challenge is not to get confused by the
complexities of his own making.
  -- E. W. Dijkstra