Anycast provider for SMTP?

Thu Jun 18 13:59:01 UTC 2015

On Thu, Jun 18, 2015 at 09:08:13AM -0400, Joe Abley wrote:
> On 18 Jun 2015, at 7:51, Ray Soucy wrote:
> 
> >You can certainly do anycast with TCP, and for small stateless services it
> >can be effective.  You can't do anycast for a stateful application without
> >taking the split-brain problem into account.
> 
> It's really difficult to apply broad "can" or "can't", "works" or "doesn't
> work" advice here since there really are no absolutes. What works and what
> doesn't depends on the intersection between theory and practice (including
> other people's networks), and is broader than the architectural decision to
> use or not use anycast.
> 
> The text I pasted much earlier from RFC 4786 was a result of a lot of
> discussion (and more than a handful of objections to our attempts to answer
> this question, and to the document as a whole existing at all).
> 
> In the general, mathematical sense, it's never safe to use anycast with TCP;
> "safe" here means "entirely safe in all circumstances". Since we live on the
> Internet, we know nowhere is safe, so this answer is unsatisfying and
> doesn't help us make real-world decisions.
> 
> In the pragmatic, throw it at the wall and see what sticks sense, it's
> usually fine to use anycast with TCP; "usually" means things like "pretty
> sure I remember this working just fine at my last job" and "in our very
> particular situation the helpdesk phone didn't seem to ring". There's
> usually very little science attached to this answer, either in terms of
> comprehensive data about failures or in terms of characterising the precise
> environment and considering the ways in which it is similar or dissimilar to
> others.

I think the single greatest issue with anycast is people relying on it too heavily:
traffic fails in a certain location, say through blackholing, and there's no easy or
quick fallback.  An example is two DNS servers for a domain that are both served from
the same location on anycast.  But that can happen without anycast too.

> If anycast is being considered as part of a solution to a particular
> problem, we might consider an answer of the form "anycast, when it works, is
> expected to solve that problem; anycast might introduce new problems,
> though, so we also need to think about a fall-back to a situation where the
> old problems are reintroduced but the new ones are gone". This kind of
> fudges around the difficulty in confidently enumerating all the new problems
> with an anticipation that anycast will work enough of the time to make it
> worth using at all.
> 
> So, in the example at hand, using an MX RRSet that tries first to deliver to
> an SMTP service that is distributed using anycast but will fall back to SMTP
> service that is not might be a reasonable approach, e.g.
> 
> $ORIGIN QUIRKAFLEEG.ORG.
> 
> @  MX 10 ANY.MX   ; service provided at DEFRA, NLAMS, USIAD, HKHKG
>    MX 20 DEFRA.MX ; service provided just at DEFRA
>    MX 20 NLAMS.MX ; service provided just at NLAMS
>    MX 20 USIAD.MX ; service provided just at USIAD
>    MX 20 HKHKG.MX ; service provided just at HKHKG.
> 
> so a client will first attempt to deliver to ANY.MX.QUIRKAFLEEG.ORG, and if
> that fails we'll try one of the others.
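
To make the fallback order in the quoted zone concrete, here is a minimal sketch
of how a sending client might walk that RRSet: lowest preference first, ties in
random order. The function names and the try_host callable are hypothetical, and
a real MTA does considerably more (queueing, retry timers, and so on).

```python
import random

# MX records from the example zone: (preference, host).
MX_RRSET = [
    (10, "ANY.MX.QUIRKAFLEEG.ORG"),
    (20, "DEFRA.MX.QUIRKAFLEEG.ORG"),
    (20, "NLAMS.MX.QUIRKAFLEEG.ORG"),
    (20, "USIAD.MX.QUIRKAFLEEG.ORG"),
    (20, "HKHKG.MX.QUIRKAFLEEG.ORG"),
]


def delivery_order(rrset, rng=random):
    """Return hosts lowest preference first, shuffling equal preferences."""
    by_pref = {}
    for pref, host in rrset:
        by_pref.setdefault(pref, []).append(host)
    order = []
    for pref in sorted(by_pref):
        hosts = list(by_pref[pref])
        rng.shuffle(hosts)  # spread load across equal-preference hosts
        order.extend(hosts)
    return order


def deliver(rrset, try_host, rng=random):
    """Try each host in order; try_host (hypothetical) returns True on success.

    Returns the host that accepted the message, or None if all failed.
    """
    for host in delivery_order(rrset, rng):
        if try_host(host):
            return host
    return None
```

With this ordering the anycast name is always tried first, and a unicast site is
only reached after the attempt against ANY.MX has failed or timed out.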

I think that is the most prudent advice: if using anycast, have a fallback.  But
following this thread, there's something that has been left unsaid and that no one
seems to have mentioned.

If there are two MX hosts that can each receive mail for users in either
location, and one of them is unreliable, what happens when the unreliable one
receives an email and can't pass it on to the relevant place?

One solution is to segregate email into location-dependent domains, so that the
right email goes straight to the right location.  But if you want to pick and choose
what to send on, it might make sense to proxy all email to its destination.  Then, if
a message comes in at the dodgy location, is being forwarded to the less dodgy
location, and the connection breaks mid-transfer, the message can be resent and will
hopefully hit the less dodgy location.

And I think in some ways it might make more sense to get some alternate-path
connectivity at the dodgy location, if it's just the backhaul that's failing.

> For this particular question I still think that geoip/dns is a more
> straightforward approach, since it avoids the possible timeout and retry
> behaviour of the client that might delay delivery of mail in the event that
> the anycast MX is unavailable.

For availability where high performance isn't necessary, I think geoip/dns is
probably a better solution than anycast.
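
As a rough illustration of the geoip/dns idea, the sketch below builds an MX answer
per querying region and leaves unhealthy sites out, so a client never has to time
out against a dead anycast address. The region names, the region-to-site table and
the health set are all made up for the example; a real deployment would get them
from a GeoIP database and monitoring.

```python
# Hypothetical mapping from the querying resolver's region to the nearest site.
REGION_TO_SITE = {
    "eu-central": "DEFRA",
    "eu-west": "NLAMS",
    "na-east": "USIAD",
    "asia-east": "HKHKG",
}
ALL_SITES = ["DEFRA", "NLAMS", "USIAD", "HKHKG"]


def mx_answer(region, healthy, origin="QUIRKAFLEEG.ORG"):
    """Return a list of (preference, host) MX records for this region.

    The nearest healthy site gets preference 10; every other healthy
    site is listed at preference 20 as a fallback. Unhealthy sites are
    simply omitted from the answer.
    """
    nearest = REGION_TO_SITE.get(region)
    answer = []
    if nearest in healthy:
        answer.append((10, f"{nearest}.MX.{origin}"))
    for site in ALL_SITES:
        if site != nearest and site in healthy:
            answer.append((20, f"{site}.MX.{origin}"))
    return answer
```

For example, mx_answer("eu-central", {"NLAMS", "USIAD"}) returns only the two
healthy sites, both at preference 20. The trade-off is that stale answers stay in
resolver caches for the record's TTL.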

But to sidetrack a little, I don't think anycasting mail servers, or even moving them
closer to the user, is happening much yet.  Terminating connections close to where
they enter the network and proxying to the relevant location seems to me a way to
incorporate some smarts without having to hold e-mail close to the edge, and to
slightly improve mail delivery performance for larger emails.  The proxy would hold
mappings of user to location, then open a connection masquerading as the user's
original source address so that any ACLs, rate limiting or such still apply.  And if
the connection from the edge to the mail server breaks, then another connection made
directly to the relevant location may work.
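
A very rough sketch of that proxy idea, under one reading of it: look up the
recipient's location, try the edge-to-site path first, and fall back to a direct
connection if that path breaks. The user table and the two send callables are
hypothetical stand-ins, and the source-address masquerading is not modelled here.

```python
# Hypothetical mapping of recipient to the site that holds the mailbox.
USER_LOCATION = {
    "alice@quirkafleeg.org": "DEFRA",
    "bob@quirkafleeg.org": "HKHKG",
}


def relay(recipient, message, send_via_edge, send_direct):
    """Relay a message to the recipient's site, returning the path used.

    send_via_edge and send_direct are hypothetical callables taking
    (site, message); each raises ConnectionError if its path breaks.
    """
    site = USER_LOCATION.get(recipient)
    if site is None:
        raise LookupError(f"no location known for {recipient}")
    try:
        send_via_edge(site, message)
        return "edge"
    except ConnectionError:
        # The edge path broke mid-transfer; resend the whole message
        # over a direct connection to the same site.
        send_direct(site, message)
        return "direct"
```

Note that the message is resent in full on the fallback path, so the receiving
site needs to cope with the occasional duplicate or partial first attempt.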

Ben.