GoDaddy.com shuts down entire data center?
Steve Gibbard
scg at gibbard.org
Tue Jan 17 01:04:48 UTC 2006
On Sun, 15 Jan 2006, Elijah Savage wrote:
>
> Any validity to this? If so, I am surprised that our team has gotten no calls
> about not being able to get to certain websites.
>
> http://webhostingtalk.com/showthread.php?t=477562
Casting blame may be a fun exercise. Listening to others cast blame gets
old fast. The more useful question here is whether there are lessons the
rest of us can learn from this incident.
The most important lesson is probably that your problems will almost
always be more important to you than to somebody else. If you end up with
a business killing problem, it doesn't matter if it's somebody else's
fault -- you're the one who will be out of business. Likewise, you
shouldn't go wandering out into heavy traffic just because the drivers are
required by law to stop for you.
Choosing your vendors carefully is important. Having a backup plan for
what to do if your vendors fail you is a good thing, but it's nice not to
have to use the backup plan. Likewise, if something is really important
to you, make sure your vendors know that. Nobody wants to suddenly find
out in the middle of the night that they're responsible for something
critical.
Knowing what's important to you in advance can help you figure out what
arrangements need to be made. If your hosting operation won't run without
power, Internet connectivity, and DNS, making sure your power,
connectivity, and DNS are robust matters a lot. If your business can
continue to operate for a few days without toner for your laser printer,
choosing a less reliable toner supplier is probably ok.
If you do need to call your vendors, having a clear explanation of what's
going on is often a good thing. "An entire datacenter" is an awfully
vague term. If that were all of, say, Equinix Ashburn, it would be a big
enough deal that government regulators would probably be concerned. But a
room in the back of somebody's office with a rack of servers in it could
also be justifiably called a "datacenter" (and a rack of servers in the
back of somebody's office could also be important to somebody). It's
probably better to be able to say, "x number of domains are down,
representing y amount of revenue for our company and z critical service
that the rest of the Internet relies on. This might put us out of
business." This still may not get the desired response -- it's not your
vendor who is going to be put out of business -- but it at least gives the
person on the other end of the phone call some idea of what they're
dealing with.
Protecting everything you've decided is important may be expensive. It
may not be worth the cost. It's best to have made that calculation before
the problem starts, when there's still time to spend money on protection
if you do decide it's worth it.
Not having all your DNS servers in the same domain, or registered through
the same registrar, isn't a "best practice" that had previously occurred
to me, but it makes a lot of sense now that I think about it. Looking at
the big TLDs, .com and .net have all their servers in the gtld-servers.net
domain, but Verisign controls .net and can presumably fix gtld-servers.net
if it breaks. UltraDNS has their TLD servers (for .org and others) in
several different TLDs. Maybe that is to protect against this sort of
thing.
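For what it's worth, this kind of diversity is easy to check mechanically.
Below is a rough Python sketch, assuming dig is installed and on your PATH.
The parent-domain extraction naively takes the last two labels, so it gets
multi-label suffixes like co.uk wrong; a real tool would consult the public
suffix list.

#!/usr/bin/env python3
"""Rough check of nameserver domain diversity for a zone."""
import subprocess
import sys

def nameservers(zone):
    # Query the zone's NS records; dig +short prints one hostname per line.
    out = subprocess.run(
        ["dig", "+short", "NS", zone],
        capture_output=True, text=True, check=True,
    ).stdout
    return [host.rstrip(".") for host in out.split() if host]

def parent_domain(host):
    # Naive: keep only the last two labels (wrong for suffixes like co.uk).
    return ".".join(host.split(".")[-2:])

if __name__ == "__main__":
    zone = sys.argv[1] if len(sys.argv) > 1 else "org"
    servers = nameservers(zone)
    domains = {parent_domain(ns) for ns in servers}
    tlds = {d.split(".")[-1] for d in domains}
    print("%s: %d NS records, %d parent domain(s), %d TLD(s)"
          % (zone, len(servers), len(domains), len(tlds)))
    for d in sorted(domains):
        print("  ", d)

Run against com, it should report a single parent domain (gtld-servers.net),
matching the observation above; run against a zone whose operator spreads its
servers across domains and TLDs, the counts go up accordingly.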
And there's a PR lesson here, too. I'd never heard of Nectartech before
this, and I'm guessing that's the case for a lot of NANOG readers. Having
heard this story, I'd be hesitant to register a domain with GoDaddy, and
that was presumably the goal. But I'd be hesitant to rely on a company
with a name like GoDaddy anyway, just because of the name. Now that I've
heard of Nectartech, I know them as the company that had the outage.
That's not exactly a selling point.
I've certainly got sympathy for Mr. Perkel. I've learned a lot of the
lessons above the hard way, some due to my own miscalculations and some
due to working for companies that didn't value my time and stress levels
as highly as I would have liked (choosing your employers carefully is
important too...).
These lessons don't apply just to networking. The loss prevention
department of a bank once locked my account for "suspicious activity" on a
Friday afternoon and then left for the weekend. I had two dollars in my
wallet, and didn't have much food. Escalating as far as I could through
the ranks of people working the bank's customer service lines on Friday
evening, I didn't manage to find anybody who didn't think I should just
wait until Monday. Multiple accounts at different banks, none of which is
the bank that locked my account, now seem like a very good idea.
-Steve