massive facebook outage presently

Baldur Norddahl baldur.norddahl at gmail.com
Mon Oct 4 20:12:06 UTC 2021


On Mon, 4 Oct 2021 at 21:58, Michael Thomas <mike at mtcc.com> wrote:

>
> On 10/4/21 11:48 AM, Luke Guillory wrote:
>
>
> I believe the original change was 'automatic' (as in configuration done
> via a web interface). However, now that the connection to the outside world
> is down, remote access to those tools doesn't exist anymore, so the emergency
> procedure is to gain physical access to the peering routers and do all the
> configuration locally.
>
> Assuming that this is what actually happened, what should FB have done
> differently (beyond the obvious of not screwing up the immediate issue)? This
> seems like it's a single point of failure. Should all of the BGP speakers
> have been dual-homed or something like that? Or should they not have been
> mixing ops and production networks? Sorry if this sounds dumb.
>

Facebook is a huge network. It is doubtful that what is going on is this
simple, so I will make no guesses as to what Facebook is or should be doing.

However, the traditional way for us small-timers is to have a backdoor using
someone else's network. Nowadays this could be a simple 4G/5G router with a
VPN to a terminal server, which lets the operator configure the equipment
through the console port even when the config is completely destroyed.
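
As a rough illustration of the backdoor idea, here is a minimal Python sketch
that tries the normal in-band management address first and falls back to an
out-of-band terminal server reached over someone else's network. The path
names, the addresses (taken from the documentation ranges) and port 22 are
all hypothetical placeholders, not anything Facebook or any particular
operator uses.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_management_path(paths, probe=tcp_reachable):
    """Return the name of the first reachable path, or None if all are down.

    paths is a list of (name, host, port) tuples in order of preference.
    probe is injectable so the selection logic can be exercised offline.
    """
    for name, host, port in paths:
        if probe(host, port):
            return name
    return None


# Hypothetical example: in-band address first, then the terminal server
# behind the 4G/5G router and VPN.
PATHS = [
    ("in-band", "192.0.2.10", 22),
    ("out-of-band", "198.51.100.10", 22),
]

if __name__ == "__main__":
    print(pick_management_path(PATHS))
```

The point of the ordering is simply that the out-of-band path must not depend
on anything the in-band path depends on, otherwise both fail together.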

Regards,

Baldur