massive facebook outage presently

> I believe the original change was 'automatic' (as in configuration done
> via a web interface). However, now that connection to the outside world is
> down, remote access to those tools doesn't exist anymore, so the emergency
> procedure is to gain physical access to the peering routers and do all the
> configuration locally.
> Assuming that this is what actually happened, what should fb have done
> differently (beyond the obvious of not screwing up the immediate issue)? This
> seems like it's a single point of failure. Should all of the BGP speakers
> have been dual homed or something like that? Or should they not have been
> mixing ops and production networks? Sorry if this sounds dumb.

Facebook is a huge network. It is doubtful that what is going on is this
simple. So I will make no guesses as to what Facebook is or should be doing.

However, the traditional way for us small-timers is to have a backdoor using
someone else's network. Nowadays this could be a simple 4G/5G router with a
VPN to a terminal server that allows the operator to configure the
equipment through the monitor port even when the config is completely


