.ORG problems this evening

Todd Vierling tv at duh.org
Thu Sep 18 23:11:59 UTC 2003


On Thu, 18 Sep 2003, Majdi S. Abbas wrote:

: > Sucks to be anyone trying to use the service whose routers pick those nodes
: > as the only ones available.  That's the fault of the implementor, not the
: > client.

: 	I think it's out of line to speculate on how UltraDNS has configured
: these clusters,

I don't care what the underlying implementation is.  I care about the
effect:  last night, for at least one hour and possibly up to two, one of
the physical locations went dead but was still considered available via
BGP, and was still considered the best available path to both nets.

: 	First it was two locations, one of which you can't tell us about
: (Deep inside OSPF Area 51?)

I can't provide all the exact source machines for reasons I can discuss
offlist, but I'm happy to do so to a representative of UltraDNS.  My home
machine, though, is 66.56.93.94.

: now it's several?

Three, to be exact, that I verified last night could not get DNS answers
from either IP address: one at my home (Atlanta, GA), one at my employer
(Atlanta, GA), and one in Chicago, IL.  Here, though, are three spot-check
examples from right now of both IPs going to the same place (oddly, my home
machine actually gets two different ones at this moment):

===== Southern CA =====
traceroute to tld1.ultradns.net (204.74.112.1): 1-30 hops, 38 byte packets
...
 .  p4-1-0-0.r00.lsanca01.us.bb.verio.net (129.250.16.80)  16.9 ms (ttl=251!)
 .  p16-1-1-0.r21.lsanca01.us.bb.verio.net (129.250.2.10)  19.5 ms (ttl=250!)
 .  ge-1-0.a01.lsanca02.us.ra.verio.net (129.250.29.131)  3.44 ms
 .  66.238.50.26.ptr.us.xo.net (66.238.50.26)  13.2 ms (ttl=248!)
 .  dellfwisi.ultradns.net (204.74.98.2)  13.8 ms (ttl=57!) !H

traceroute to tld2.ultradns.net (204.74.113.1): 1-30 hops, 38 byte packets
...
 .  p5-1-0-0.RAR1.LA-CA.us.xo.net (65.106.5.13)  2.64 ms (ttl=250!)
 .  p0-0-0.MAR1.LA-CA.us.xo.net (65.106.5.6)  2.73 ms (ttl=249!)
 .  p1-0.CHR1.LA-CA.us.xo.net (207.88.81.166)  2.78 ms
 .  66.238.50.26.ptr.us.xo.net (66.238.50.26)  35.0 ms
 .  dellfwisi.ultradns.net (204.74.98.2)  29.7 ms (ttl=57!) !H

===== Dallas TX =====
traceroute to tld1.ultradns.net (204.74.112.1): 1-30 hops, 38 byte packets
...
 .  p16-0-0-0.r01.atlnga03.us.bb.verio.net (129.250.4.195)  25.3 ms (ttl=250!)
 .  p16-2-0-0.r00.atlnga03.us.bb.verio.net (129.250.5.16)  25.3 ms (ttl=249!)
 .  p16-1-0-0.r01.mclnva02.us.bb.verio.net (129.250.2.48)  40.8 ms (ttl=247!)
 .  ge-1-0-0.a00.mclnva02.us.ra.verio.net (129.250.31.170)  40.8 ms (ttl=246!)
 .  168.143.247.38 (168.143.247.38)  44.1 ms (ttl=246!)
 .  64.124.112.141.ultradns.com (64.124.112.141)  45.0 ms (ttl=244!)
 .  dellfwpxvn.ultradns.net (204.74.104.2)  43.7 ms (ttl=53!) !H

traceroute to tld2.ultradns.net (204.74.113.1): 1-30 hops, 38 byte packets
...
 .  sl-bb26-fw-5-1.sprintlink.net (144.232.20.147)  7.54 ms
 .  sl-bb25-fw-15-0.sprintlink.net (144.232.11.89)  32.0 ms
 .  sl-bb23-atl-10-0.sprintlink.net (144.232.20.60)  36.4 ms
 .  sl-bb26-rly-14-1.sprintlink.net (144.232.20.65)  33.3 ms
 .  sl-st21-ash-14-2.sprintlink.net (144.232.20.3)  34.8 ms
 .  sl-xocomm-5-0.sprintlink.net (144.223.246.50)  34.2 ms
 .  p5-0-0.RAR1.Washington-DC.us.xo.net (65.106.3.133)  35.3 ms (ttl=245!)
 .  p6-1-0.MAR1.Washington-DC.us.xo.net (65.106.3.182)  35.7 ms (ttl=244!)
 .  p0-0.CHR1.Washington-DC.us.xo.net (207.88.87.10)  35.7 ms
 .  64.124.112.141.ultradns.com (64.124.112.141)  39.7 ms (ttl=244!)
 .  dellfwpxvn.ultradns.net (204.74.104.2)  40.0 ms (ttl=53!) !H

===== Chicago IL =====
traceroute to tld1.ultradns.net (204.74.112.1): 1-30 hops, 38 byte packets
...
 .  gige3-2.core2.Chicago1.Level3.net (209.244.8.185)  0.796 ms
 .  so-4-1-0.bbr1.Chicago1.level3.net (209.247.10.165)  0.905 ms (ttl=250!)
 .  so-6-0-0.edge1.Chicago1.Level3.net (209.244.8.10)  1.01 ms (ttl=249!)
 .  verio-level3-oc12.Chicago1.Level3.net (209.0.227.66)  0.860 ms (ttl=251!)
 .  ge-1-2.a00.chcgil07.us.ra.verio.net (129.250.25.136)  0.967 ms (ttl=253!)
 .  fa-2-1.a00.chcgil07.us.ce.verio.net (128.242.186.134)  1.04 ms (ttl=251!)
 .  dellfweqch.ultradns.net (204.74.102.2)  0.881 ms (ttl=60!) !H

traceroute to tld2.ultradns.net (204.74.113.1): 1-30 hops, 38 byte packets
...
 .  0.so-1-0-0.XL2.CHI13.ALTER.NET (152.63.69.182)  1.58 ms (ttl=251!)
 .  POS7-0.BR1.CHI13.ALTER.NET (152.63.73.22)  1.29 ms
 .  a11-0d114.IR1.Chicago2-IL.us.xo.net (206.111.2.73)  1.11 ms (ttl=251!)
 .  p5-0-0.RAR1.Chicago-IL.us.xo.net (65.106.6.133)  1.40 ms
 .  p4-0-0.MAR1.Chicago-IL.us.xo.net (65.106.6.142)  2.03 ms
 .  p0-0.CHR1.Chicago-IL.us.xo.net (207.88.84.10)  1.80 ms (ttl=248!)
 .  *
 .  dellfweqch.ultradns.net (204.74.102.2)  1.48 ms (ttl=60!) !H

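For anyone who wants to repeat this comparison from their own vantage
points, the check I did by eye amounts to "does the last responding hop for
tld1 match the last responding hop for tld2?"  The sketch below is a
hypothetical helper (final_hop and same_endpoint are my names, not any
tool's), and it assumes output shaped like the traces above: one hop per
line, a hop number (shown as "." here) followed by "hostname (address)".
Lines with only "*" are skipped, and a hop marked !H still counts as the
last hop that answered.

```python
import re
from typing import Optional, Tuple

# Matches a hop line such as:
#  " .  dellfweqch.ultradns.net (204.74.102.2)  1.48 ms (ttl=60!) !H"
# Group 1 is the hostname, group 2 the dotted-quad address.
# The "traceroute to ..." header and bare "*" lines do not match.
HOP_RE = re.compile(r"^\s*\S+\s+(\S+)\s+\((\d{1,3}(?:\.\d{1,3}){3})\)")


def final_hop(output: str) -> Optional[Tuple[str, str]]:
    """Return (hostname, address) of the last hop that answered, or None."""
    last = None
    for line in output.splitlines():
        m = HOP_RE.match(line)
        if m:
            last = (m.group(1), m.group(2))
    return last


def same_endpoint(out_a: str, out_b: str) -> bool:
    """True if both traces end at the same responding address."""
    a, b = final_hop(out_a), final_hop(out_b)
    return a is not None and b is not None and a[1] == b[1]
```

Fed the two Chicago traces above, same_endpoint() should return True, since
both end at dellfweqch.ultradns.net (204.74.102.2).  It only tells you where
the packets land, not whether the box there is answering queries; for that
you still need dig aimed directly at each address.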
===

: 	Are you absolutely, positively sure this cluster was responding to 0
: queries,

Yes.  My mail server (its domain is a .org) was more or less dead for an
hour, and I was frantically trying to get DNS to resolve with all kinds of
"dig" queries sent directly to the two IPs, plus traceroute tests, until I
gave up.

: but still propagating those two /24's?

Both traceroutes went to the same place.  I might have gathered more
information had I known this was a more complicated problem; my original
post was just a "did anyone else see this problem?" query.

At first (because my spot checks noted above also timed out), I had thought
that the zone's "only" servers might in fact have been dead and happened to
be located in the same place; I didn't know they were anycasted until I
posted here and received responses.  Effectively, of course, those *were*
the only servers for the zone.

: > On the other hand, if you can't see the fatal flaw in a major Internet
: > infrastructure service depending on a single point of failure, I can point
: > you at a few books that could enlighten you.
:
: 	It isn't a single point of failure,

It's a single point of failure (or a blackhole, if you will) when both
anycast addresses point to the same destination from a given site, and that
destination is dead in the water.  Whether failover is redundant has to be
judged from the querying site's perspective, not the provider's.

What else should I call it?
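To put the argument in concrete terms: what matters to a resolver is which
anycast node each nameserver address is routed to *from where that resolver
sits*.  The toy model below is a sketch under that assumption only; the
function names are hypothetical, and the node labels are the firewall
hostnames from the traceroutes, on the assumption that the last hop
identifies the node.  The "split" mapping is an invented example of what a
healthy setup would look like, not something I observed.

```python
from typing import Dict, Set


def zone_reachable(catchment: Dict[str, str], down: Set[str]) -> bool:
    """catchment maps each nameserver address to the anycast node it is
    routed to from one vantage point.  The zone resolves from there if at
    least one address lands on a node that is still answering."""
    return any(node not in down for node in catchment.values())


def nodes_in_use(catchment: Dict[str, str]) -> int:
    """Number of distinct anycast nodes this vantage point depends on."""
    return len(set(catchment.values()))


# Observed from Chicago above: both addresses land on the same node.
chicago = {"204.74.112.1": "dellfweqch", "204.74.113.1": "dellfweqch"}

# Hypothetical: the two addresses routed to different nodes.
split = {"204.74.112.1": "dellfwisi", "204.74.113.1": "dellfwpxvn"}
```

With down = {"dellfweqch"}, zone_reachable(chicago, down) is False even
though every other node in the cloud may be perfectly healthy, and
nodes_in_use(chicago) is 1.  The split mapping survives any single node
failure.  That is the whole point: the count of nodes the provider runs is
irrelevant if the count a given site can reach is one.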

-- 
-- Todd Vierling <tv at duh.org> <tv at pobox.com>
