Facebook post-mortems...

Mark Tinka mark at tinka.africa
Tue Oct 5 15:06:35 UTC 2021

On 10/5/21 16:49, Joe Greco wrote:

> Unrealistic user expectations are not the point.  Users can demand
> whatever unrealistic claptrap they wish to.

Users' expectations, today, are always going to be unrealistic, 
especially when they are able to enjoy a half-decent service free of charge.

The bar has moved. Nothing we can do about it but adapt.

> The point is that there are a lot of helpdesk staff at a lot of
> organizations who are responsible for responding to these issues.
> When Facebook or Microsoft or Amazon take a dump, you get a storm
> of requests.  This is a storm of requests not just to one helpdesk,
> but to MANY helpdesks, across a wide number of organizations, and
> this means that you have thousands of people trying to investigate
> what has happened.

We are in agreement.

And it's no coincidence that the Facebooks of the world rely almost 
100% on non-human contact to give their users support. So that leaves 
us, infrastructure, in the firing line to pick up the slack for a lack 
of warm-body access to BigContent.

> It is very common for large companies to forget (or not care) that
> their technical failures impact not just their users, but also
> external support organizations.

Not just large companies, but I believe all companies... and worse, not 
at ground level where folk on lists like these tend to keep in touch, 
but higher up, where the money decisions are made and where caring about 
your footprint on other Internet settlers whom you may never meet matters.

You and I can bash our heads till the cows come home, but if the folk 
that need to say "Yes" to the $$$ needed to help external parties 
troubleshoot better don't get it, then perhaps starting a NOG or some 
such is our best bet.

> I totally get your disdain and indifference towards end users in these
> instances; for the average end user, yes, it indeed makes no difference
> if DNS works or not.

On the contrary, I looooooove customers. I wasn't into them, say, 12 
years ago, but since I began to understand that users will respond to 
empathy and value, I fell in love with them. They drive my entire 
thought-process and decision-making.

This is why I keep saying, "Users don't care about how we build the 
Internet", and they shouldn't. And I support that.

BigContent get it, and for better or worse, they are the ones who've set 
the bar higher than what most network operators are happy with.

Infrastructure still doesn't get it, and we are seeing the effects of 
that play out around the world, with the recent SK Broadband/Netflix 
debacle being the latest barbershop gossip.

> However, some of those end users do have a point of contact up the
> chain.  This could be their ISP support, or a company helpdesk, and
> most of these are tasked with taking an issue like this to some sort
> of resolution.  What I'm talking about here is that it is easier to
> debug and make a determination that there is an IP connectivity issue
> when DNS works.  If DNS isn't working, then you get into a bunch of
> stuff where you need to do things like determine if maybe it is some
> sort of DNSSEC issue, or other arcane and obscure issues, which tends
> to be beyond what front line helpdesk is capable of.

We are in agreement.

> These issues often cost companies real time and money to figure out.
> It is unlikely that Facebook is going to compensate them for this, so
> this brings me back around to the point that it's preferable to have
> DNS working when you have a BGP problem, because this is ultimately
> easier for people to test and reach a reasonable determination that
> the problem is on Facebook's side quickly and easily.

We are in agreement.
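
The triage described in the quoted paragraphs can be sketched roughly as 
below. This is only an illustration under assumptions: the names `classify` 
and `probe`, the verdict strings, port 443 and the 3-second timeout are all 
made up for the example, and a real helpdesk runbook would test more than 
one name and one port.

```python
import socket


def classify(dns_ok: bool, tcp_ok: bool) -> str:
    """Turn two probe results into a rough first-line verdict.

    The verdict strings are hypothetical labels, not a standard.
    """
    if not dns_ok:
        # Could be the resolver, DNSSEC, or no route to the authoritative
        # servers; this is the hard case for a front-line helpdesk.
        return "dns"
    if not tcp_ok:
        # The name resolves but nothing answers: points at IP connectivity.
        return "connectivity"
    return "ok"


def probe(hostname: str, port: int = 443, timeout: float = 3.0) -> str:
    """Resolve hostname, then try a TCP connect to each returned address."""
    try:
        infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return classify(dns_ok=False, tcp_ok=False)

    for _family, _type, _proto, _canon, sockaddr in infos:
        try:
            # sockaddr[:2] is (address, port) for both IPv4 and IPv6.
            with socket.create_connection(sockaddr[:2], timeout=timeout):
                return classify(dns_ok=True, tcp_ok=True)
        except OSError:
            continue  # try the next address, if any
    return classify(dns_ok=True, tcp_ok=False)
```

Calling `probe("www.example.com")` from the helpdesk's own network gives one 
of the three labels; only the "dns" outcome needs the deeper digging the 
quoted text describes.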

So let's see if Facebook can fix the scope of their DNS architecture, 
and whether others can learn from it. I know I have... even though we 
provide friendly secondary for a bunch of folk we are friends with, we 
haven't done the same for our own networks... all our stuff sits on just 
our network - granted in many different countries, but still, one AS.
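
A minimal sketch of the audit that paragraph implies: given the origin AS 
of each authoritative nameserver for a zone, flag the case where they all 
sit behind one AS. The hostnames, the AS numbers (taken from the 
documentation range 64496-64511) and the helper names are hypothetical, and 
how the nameserver-to-AS mapping is obtained is deliberately left out.

```python
from collections import defaultdict


def group_by_asn(ns_to_asn: dict[str, int]) -> dict[int, list[str]]:
    """Group nameserver hostnames by the AS that originates their prefix."""
    groups: dict[int, list[str]] = defaultdict(list)
    for nameserver, asn in ns_to_asn.items():
        groups[asn].append(nameserver)
    return dict(groups)


def single_as(ns_to_asn: dict[str, int]) -> bool:
    """True when every nameserver depends on the same origin AS."""
    return len(set(ns_to_asn.values())) <= 1


# Hypothetical zone: three servers in different places, but one AS.
own_network = {
    "ns1.example.net": 64496,
    "ns2.example.net": 64496,
    "ns3.example.net": 64496,
}

# Same zone after adding a secondary hosted by a friendly operator.
with_secondary = dict(own_network, **{"ns4.example.org": 64497})

for label, zone in (("before", own_network), ("after", with_secondary)):
    verdict = "single AS" if single_as(zone) else "multiple ASes"
    print(f"{label}: {verdict} {sorted(group_by_asn(zone))}")
```

Geographic spread does not show up in this check at all, which is the 
point: servers in many countries still share fate if they share one AS.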

It's been nagging at the back of my mind for yonks, but yesterday was 
the nudge I needed to get this organized; so off I go.


More information about the NANOG mailing list