Facebook post-mortems...

Joe Greco jgreco at ns.sol.net
Tue Oct 5 14:49:12 UTC 2021

On Tue, Oct 05, 2021 at 03:40:39PM +0200, Mark Tinka wrote:
> Yes, total nightmare yesterday, but sure that 9,999 of the helpdesk 
> tickets had nothing to do with DNS. They likely all were - "Your 
> Internet is down, just fix it; we don't wanna know".

Unrealistic user expectations are not the point.  Users can demand
whatever unrealistic claptrap they wish to. 

The point is that there are a lot of helpdesk staff at a lot of
organizations who are responsible for responding to these issues.
When Facebook or Microsoft or Amazon take a dump, you get a storm
of requests.  This is a storm of requests not just to one helpdesk,
but to MANY helpdesks, across a wide number of organizations, and
this means that you have thousands of people trying to investigate
what has happened.

It is very common for large companies to forget (or not care) that
their technical failures impact not just their users, but also
external support organizations.

I totally get your disdain and indifference towards end users in these
instances; for the average end user, yes, it indeed makes no difference
if DNS works or not.

However, some of those end users do have a point of contact up the
chain.  This could be their ISP support, or a company helpdesk, and
most of these are tasked with taking an issue like this to some sort
of resolution.  What I'm talking about here is that it is easier to
debug and make a determination that there is an IP connectivity issue
when DNS works.  If DNS isn't working, then you have to work out
whether it is maybe some sort of DNSSEC issue, or some other arcane
and obscure problem, and that tends to be beyond what front line
helpdesk is capable of.
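
To make the distinction concrete, here is a rough sketch of the kind of
two-step check a helpdesk could run: resolve the name first, then try a
TCP connection to each address returned.  The function name triage, the
default port 443, and the wording of the verdicts are all assumptions
made up for illustration, not an established tool, and a real check
would want more cases than this.

```python
import socket


def triage(host, port=443, timeout=3.0, resolve=None, connect=None):
    """Hypothetical helper: say whether a failure looks DNS-level or IP-level.

    resolve and connect default to the standard library calls; they are
    parameters only so the logic can be exercised without a network.
    """
    resolve = resolve or socket.getaddrinfo
    connect = connect or socket.create_connection

    # Step 1: can the name be resolved at all?
    try:
        infos = resolve(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # Could be the remote zone, DNSSEC, or the local resolver;
        # this check cannot tell which.
        return f"DNS lookup failed ({exc}): cause unclear, escalate"

    # Step 2: DNS answered, so test plain IP reachability per address.
    addresses = sorted({info[4][0] for info in infos})
    for addr in addresses:
        try:
            connect((addr, port), timeout=timeout).close()
            return f"DNS ok, {addr} reachable: no fault seen from here"
        except OSError:
            continue  # try the next address, if any

    return "DNS ok, no address reachable: looks like an IP connectivity problem"


if __name__ == "__main__":
    print(triage("www.facebook.com"))
```

The second verdict is the easy one: the name resolves, nothing answers,
and front line staff can reasonably conclude the problem is upstream.
The first verdict is the one that eats time, because a failed lookup by
itself says very little about where the fault is.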

These issues often cost companies real time and money to figure out.
It is unlikely that Facebook is going to compensate them for this, so
this brings me back around to the point that it's preferable to have
DNS working when you have a BGP problem, because that makes it much
easier for people to test and quickly reach a reasonable determination
that the problem is on Facebook's side.

... JG
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"The strain of anti-intellectualism has been a constant thread winding its way
through our political and cultural life, nurtured by the false notion that
democracy means that 'my ignorance is just as good as your knowledge.'"-Asimov

More information about the NANOG mailing list