Anycast 101

Iljitsch van Beijnum iljitsch at muada.com
Thu Dec 16 23:31:37 UTC 2004


I got some messages from people who weren't exactly clear on how 
anycast works and fails. So let me try to explain...

In IPv6, there are three ways to address a packet: one-to-one 
(unicast), one-to-many (multicast), or one-to-any (anycast). Like 
multicast addresses, anycast addresses are shared by a group of 
systems, but a packet addressed to the group address is only delivered 
to a single member of the group. IPv6 neighbor discovery provides 
"round robin ARP"-style functionality that allows anycast to work on 
local subnets.

Anycast DNS is a very different beast. Unlike IPv6, IPv4 has no 
specific support for anycast, and besides, the point here is to 
distribute the group address very widely rather than over a single 
subnet. So
what happens is that a BGP announcement that covers the service address 
is sourced in different locations, and each location is basically 
configured to think it's the "owner" of the address.
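
In rough pseudo-config terms it looks something like this; a sketch 
only, with placeholder prefix and AS numbers:

    SERVICE_PREFIX = "192.0.2.0/24"   # placeholder anycast service prefix

    # Every site originates the same prefix and locally hosts the service
    # address, so each one "thinks" it owns it.
    anycast_sites = {
        "site-ams": {"origin_as": 64512, "announces": SERVICE_PREFIX},
        "site-nyc": {"origin_as": 64512, "announces": SERVICE_PREFIX},
        "site-tyo": {"origin_as": 64512, "announces": SERVICE_PREFIX},
    }

    # To the rest of the internet these are indistinguishable: several
    # BGP paths to what looks like one destination prefix.
    print(len({site["announces"] for site in anycast_sites.values()}))  # 1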

The idea is that BGP will see the different paths towards the different 
anycast instances, and select the best one. Now note that the only real 
benefit of doing this is reducing the network distance between the 
users and the service. (Some people cite DoS benefits but DoSsers play 
the distribution game too, and they're much better at it.)
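
Here's a toy model of that selection step, assuming equal local policy 
so that AS path length decides; all paths and latencies below are 
invented:

    # BGP prefers the shortest AS path (policy being equal); it never
    # measures latency.
    paths = [
        {"instance": "site-ams", "as_path": [64700, 64512],        "rtt_ms": 90},
        {"instance": "site-nyc", "as_path": [64700, 64600, 64512], "rtt_ms": 20},
    ]

    best = min(paths, key=lambda p: len(p["as_path"]))
    print(best["instance"])  # site-ams: fewer AS hops, but 90 ms, not 20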

Anycast is now deployed for a significant number of root and gTLD 
servers. Before anycast, most of those servers were located in the US, 
and most of the rest of the world suffered significant latency in 
querying them. Due to limitations in the DNS protocol (the full list 
of servers and their addresses must fit in a single 512-byte UDP 
response), it's not possible to increase the number of authoritative 
DNS servers for a zone beyond around 13. With anycast, a much larger 
part of the world now has regional access to the root and com and net 
zones, and probably many more that I don't know about.
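
Where does the ~13 figure come from? The response listing all the 
servers for a zone (NS records plus glue) has to fit in a classic 
512-byte UDP packet. A back-of-envelope with assumed record sizes, 
under the compressed x.root-servers.net naming scheme:

    HEADER = 12       # fixed DNS header
    QUESTION = 5      # the "." NS question: root name + type + class
    FIRST_NS = 32     # NS record spelling out "a.root-servers.net" once
    EXTRA_NS = 16     # later NS records compress against that name
    GLUE_A = 16       # one A (glue) record per server, name compressed

    servers = 13
    size = (HEADER + QUESTION + FIRST_NS
            + (servers - 1) * EXTRA_NS + servers * GLUE_A)
    print(size)       # ~449 bytes: 13 fit under 512, a 15th would overflow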

However, there are some issues. The first one is that different packets 
can end up at different anycast instances. This can happen when BGP 
reconverges after some network event (or after an anycast instance goes 
offline and stops announcing the anycast prefix), but under some very 
specific circumstances it can also happen with per-packet load 
balancing. Most DNS traffic consists of single packets, but the DNS 
also uses TCP for queries sometimes, and when intermediate MTUs are 
small there may be fragmentation; in both cases a single exchange 
spans multiple packets that must all reach the same instance.
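
To illustrate the per-packet case with a made-up two-path topology: 
the packets of one TCP exchange alternate between instances, and only 
one instance holds the connection state:

    import itertools

    # Two equal-cost paths, each ending at a different anycast instance;
    # the router sprays packets across them round robin.
    next_instance = itertools.cycle(["instance-A", "instance-B"])

    for packet in ["SYN", "ACK", "query", "FIN"]:
        print(packet, "->", next(next_instance))

    # The SYN lands on instance-A, the ACK on instance-B, which holds no
    # matching connection state and answers with a RST; fragments split
    # the same way can never be reassembled.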

Another issue is the increased risk of fate sharing. In the old root 
setup, it was very unlikely for a multihomed network to see all the 
root DNS servers behind the same next hop address. With anycast, 
this is much more likely to happen. The pathological case is one where 
a small network connects to one or more transit networks and has 
local/regional peering, and then sees an anycast instance for all root 
servers over peering. If then something bad happens to the peering 
connection (peering router melts down, a peer pulls an AS7007, peering 
fabric goes down, or worse, starts flapping), all the anycasted 
addresses become unreachable at the same time.
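
A sketch of that pathological case, with invented names and a single 
next hop standing in for the peering connection:

    # 13 root server addresses, each heard as an anycast instance over
    # the same local peering connection.
    routes = {letter: "peering-router" for letter in "abcdefghijklm"}

    peering_up = False   # router melts down / fabric flaps / AS7007 redux

    reachable = [l for l, nh in routes.items()
                 if nh != "peering-router" or peering_up]
    print(len(reachable), "of", len(routes), "reachable")  # 0 of 13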

Obviously, complete unreachability like this is unlikely in practice 
(well, unless there are only two addresses that are both anycast for a 
certain TLD, then your mileage may vary), but even if 5 or 8 or 12 
addresses become unreachable, the timeouts get bad enough for users to 
notice.
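
To put rough numbers on "bad enough to notice", here's a crude model 
that assumes a resolver walking the list in order with a 3-second 
timeout per attempt (real resolvers reorder and track server 
performance, but the first lookups after a failure still eat the 
timeouts):

    PER_TRY_TIMEOUT = 3.0   # seconds per attempt; assumed, varies by resolver

    for dead in (5, 8, 12):
        print(f"{dead} dead servers -> ~{dead * PER_TRY_TIMEOUT:.0f} s "
              "before a working server is tried")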

The 64000 ms timeout query is: at what point do the downsides listed 
above (along with troubleshooting hell) start to overtake the benefit 
of better latency? I think the answer lies in the answers to these 
three questions:

- How good is BGP in selecting the lowest latency path?
- How fast is BGP convergence?
- What percentage of queries goes to the first or fastest server in 
the list?



