[dnsop] Re: Root Anycast (fwd)

Sat Apr 23 18:34:44 UTC 2005

Here is a good message on the subject from DNSOP. It explains that 
per-packet load balancing (PPLB) on BGP links is not the only way the 
problem can arise.  PPLB on interior links can also be a problem.

The following statement from RFC 1546 is probably most significant:

   It is important to remember that anycasting is a stateless service.
   An internetwork has no obligation to deliver two successive packets
   sent to the same anycast address to the same host.

Particularly, the second sentence.

		--Dean

-- 
Av8 Internet   Prepared to pay a premium for better service?
www.av8.net         faster, more reliable, better service
617 344 9000   

---------- Forwarded message ----------
Date: Mon, 4 Oct 2004 19:51:45 -0400 (EDT)
From: Dean Anderson <dean at av8.com>
To: Iljitsch van Beijnum <iljitsch at muada.com>
Subject: Re: [dnsop] Re: Root Anycast (fwd)

On Sat, 2 Oct 2004, Iljitsch van Beijnum wrote:

> On 2-okt-04, at 2:48, Dean Anderson wrote:
> 
> > I agree entirely.  Though I'd point out that paths over OSPF redundant
> > links exiting an AS could lead to different border routers.
> 
> Well, I don't see how information in OSPF or another internal protocol 
> could lead to different links being used. (Unless you redistribute BGP 
> in OSPF, which isn't a very good idea.)

It doesn't have to be exported:

      host
      |
      A   OSPF interior:  sends traffic to B1, B2 (PPLB & default route)
     / \
    B1  B2     BGP used to talk to AS D and E
    |    |
    D1    E1   BGP used to talk to AS B and AS F
   / \   / \
     F1    F2   BGP ...
    / \   / \
   f1    f3       f1, f3 are anycasted roots

Any packet sent to B1 will likely always arrive at f1, while any packet 
sent to B2 will probably always arrive at f3...
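
As a rough illustration of the diagram above, here is a minimal Python
sketch. It assumes PPLB at A is a strict round-robin over B1 and B2 (real
implementations may differ) and that each border router's best path ends
at a fixed anycast instance. The names INSTANCE_BEHIND, per_packet and
per_flow are made up for this example.

```python
from itertools import cycle

# Assumption taken from the diagram: traffic handed to B1 ends up at f1,
# traffic handed to B2 ends up at f3.
INSTANCE_BEHIND = {"B1": "f1", "B2": "f3"}


def per_packet(next_hops, n_packets):
    """Round-robin per-packet load balancing (PPLB) at router A."""
    hops = cycle(next_hops)
    return [INSTANCE_BEHIND[next(hops)] for _ in range(n_packets)]


def per_flow(next_hops, flow_id, n_packets):
    """Per-flow balancing: every packet of one flow uses the same next hop."""
    hop = next_hops[flow_id % len(next_hops)]
    return [INSTANCE_BEHIND[hop]] * n_packets


# Four packets of one hypothetical TCP session (flow_id is arbitrary).
pplb_path = per_packet(["B1", "B2"], 4)
flow_path = per_flow(["B1", "B2"], flow_id=7, n_packets=4)
print(pplb_path)
print(flow_path)
```

With per-packet balancing the session alternates between f1 and f3, and
neither instance sees the whole conversation. With per-flow balancing all
four packets reach the same instance.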

BTW, good call on the UDP fragmentation. I didn't think of that.

		--Dean

> > I'm not sure what "significant number of coincidences" is,
> 
> Like this:
>
>     A
>    / \
>   B1  B2
>   |    |
>   C    D
>   |    |
>   E1  E2
>
> 
> AS A connects to two different routers in AS B, and each of these 
> routers prefers a different external path towards different anycast 
> instances of AS E. In order for this to happen the paths from B to both 
> anycast instances must be completely identical, except that for one 
> router in B one path is preferred and for another router the other. 
> This will only happen if these routers connect to ASes C and D 
> themselves, or if one sees a better IGP metric towards the router 
> connecting to C and another sees a better IGP metric towards the router 
> connecting to D.

If PPLB is turned on, and if there is a path from A to E through both B1
and B2, packets will be evenly distributed across those links.  I'm told
this works on IOS.  I've also been told many companies are working on
this.

Next, we need to look in more detail at how anycast works. Essentially, 
anycast is giving two or more computers the same IP address, and then 
distributing those computers so that packets to that IP address will hit 
only one computer. Thus, you get load distribution and not just 
redundancy.

Using your example, AS C and AS D both connect to AS E through different
routers.  Using Vixie's document, we suppose that the destination IP is on
a switched LAN directly attached to both routers. Note that this does not
appear to be the case in practice. In practice, it seems that anycast
servers are physically distributed and that routes to the particular
address just go to different computers that don't share a LAN. But I think
these particulars are not significant.  So, I'll just consider the case of
a switched LAN:

On getting the packet from A to the anycast host (on E), router E1 will ARP
for
the MAC address of the anycast host. It will get one MAC. After that, and
until the expiration of the ARP, it will use that MAC for that IP.  
Packets from that router will go through the switch only to that physical
address. Again, a load distribution is achieved.
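
The ARP behaviour described above can be sketched as a toy cache in
Python. This is only a model: the class ArpCache, the 60-second timeout,
the IP address and the MAC addresses are all invented for illustration,
and which host answers the ARP request first is assumed.

```python
class ArpCache:
    """Toy ARP cache: one MAC per IP until the entry times out."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.entries = {}  # ip -> (mac, time the entry was learned)

    def resolve(self, ip, now, first_reply):
        entry = self.entries.get(ip)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]          # cached: same host as before
        mac = first_reply()          # expired or unknown: ARP again
        self.entries[ip] = (mac, now)
        return mac


# Two hosts share the anycast IP; assume they win the ARP race in turn.
replies = iter(["02:00:00:00:00:01", "02:00:00:00:00:02"])
cache = ArpCache(ttl=60)
macs = [
    cache.resolve("192.0.2.53", t, lambda: next(replies))
    for t in (0, 10, 59, 60)
]
print(macs)
```

The first three lookups return the same MAC. Only the lookup at t=60,
after the entry has expired, can move traffic to the other host.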

At first, I thought a possible solution might be to use anycast on a MAC
address.  However, this approach only achieves replication, not load
distribution.  All packets have to be seen and processed by all anycast
hosts, and there is no distribution of load.  This would still not work 
with TCP, and in fact would be worse.  

> > However, this is a more treacherous problem because a site many AS's
> > away from the roots may configure PPLB and find no problems.  
> > Sometime later, a change at one of the intermediate ASs causes packets
> > from that site to now go over multiple paths instead of one path.  
> > Suddenly, the site is not working but they have made no change, and
> > perhaps their immediate upstreams have also made no change. The only
> > way out of that mess once a deep hole is dug would be to have very
> > strict global regulation of peering and even transit so that this
> > situation is always avoided.

Here's a picture of the hard-to-find problem: (borrowing your pictures)

A deploys PPLB and finds no problems:

     A
    / \
   B1  B2
   |    |
   X    Y
   |    |
   C    D
     \  |
   F    G
   |    |
   E1  E2

Everything is hitting E2, so just one anycast server is in use.

Sometime later, C decides to peer with F and drop peering with G:

     A
    / \
   B1  B2
   |    |
   X    Y
   |    |
   C    D
   |    |
   F    G
   |    |
   E1  E2

Now A has problems.  A calls X and Y and asks them if anything has 
changed. They report no change. A calls E1 and asks them if anything has 
changed. Likewise, they report no change. 
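
The two pictures can also be written down as next-hop tables, which makes
it easy to see that only C changed. This is a simplified sketch: it
assumes one next hop per node toward the anycast prefix, and the names
BEFORE, AFTER and instance_reached are made up for the example.

```python
def instance_reached(next_hop, start):
    """Follow next hops from start until a node with no next hop."""
    node = start
    while node in next_hop:
        node = next_hop[node]
    return node


# First picture: C and D both reach the anycast prefix through G.
BEFORE = {
    "B1": "X", "X": "C", "C": "G",
    "B2": "Y", "Y": "D", "D": "G",
    "F": "E1", "G": "E2",
}
# Second picture: C peers with F and drops its peering with G.
AFTER = dict(BEFORE, C="F")


def instances_seen(next_hop):
    """Anycast instances reachable from A when it splits over B1 and B2."""
    return {instance_reached(next_hop, b) for b in ("B1", "B2")}


before = instances_seen(BEFORE)
after = instances_seen(AFTER)
changed_nodes = sorted(n for n in BEFORE if BEFORE[n] != AFTER[n])
print(before, after, changed_nodes)
```

Before the change both halves of A's traffic end at E2. Afterwards they
end at E1 and E2, and the only table entry that differs belongs to C,
which A has no reason to call.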

> 
> Apart from the small chance of this actually occurring (and that most 
> DNS stuff is UDP that fits in one packet), 

I think this is only true of the network today:
	Few people are using PPLB at present,
	Few are using TCP as a DNS transport.  
	Few are using large UDP packets (which as you pointed out, could be fragmented).  

However, in the future, we anticipate a different usage pattern:
	Many will be using PPLB to improve performance and reliability
	Many will be using TCP for DNS
	Many will be using large UDP packets for DNS

> and the fact that it's 
> extremely unlikely to hit multiple anycasted roots at the same time, I 
> don't believe this problem will be too harmful. In most cases a TCP 
> session will still work even though there is a lot of "packet loss". 
> And as long as there are some non-anycasted roots we'll be fine anyway.

The likelihood of hitting multiple anycasted roots depends on:

	The number of anycasted roots
	The number and configuration of paths to the roots
	The uniqueness of multiple PPLB paths between a user and the roots
	The use of PPLB

The more anycasted roots there are, and the more paths there are, and the
more PPLB is used, the more likely it becomes that there are distinct 
paths to different anycast roots through which PPLB interleaved packets 
are sent.
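
To get a feel for the trend, here is a deliberately crude Python model.
It assumes each of the PPLB paths independently ends at a uniformly
random anycast instance, which real routing certainly does not do, so the
numbers are illustrative only. The function name p_split is made up.

```python
from fractions import Fraction


def p_split(n_instances, n_paths):
    """Chance that the paths do not all end at the same instance.

    Toy assumption: each path independently reaches one of n_instances
    with equal probability. All paths agree with probability
    (1/n_instances) ** (n_paths - 1).
    """
    if n_instances < 1 or n_paths < 1:
        raise ValueError("need at least one instance and one path")
    return 1 - Fraction(1, n_instances) ** (n_paths - 1)


for n, k in [(1, 5), (2, 2), (4, 2), (2, 3)]:
    print(n, "instances,", k, "paths:", p_split(n, k))
```

Under this assumption the chance of a split is zero with a single
instance and rises as either the number of instances or the number of
paths goes up.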

		--Dean