Content Delivery Networks

Warren Kumari warren at kumari.net
Fri Aug 10 17:15:09 UTC 2007


On Aug 10, 2007, at 1:55 AM, Paul Reubens wrote:

> How do you engineer around enterprise and ISP recursors that don't  
> honor TTL, instead caching DNS records for a week or more?
>

A friend of mine was working for a place that performed some service
on data (not important what; you send them some data through this
really ugly client app that they wrote in-house, and they send you
back something...).

Anyway, for various reasons they needed to move out of their current  
data-center to a new provider. They had this truly monumental plan  
for doing this that they had been working on for months --- MS  
Project printouts that covered entire walls in this huge rainbow of  
colors, 400 or so pages of plans, etc etc etc -- it all boiled down  
to: Decrease the TTL, then swap in the new A record at midnight on  
Friday. As soon as the TTL expired everything would start working in
the new place and it would all be transparent to the end users...

Anyway, my friend calls me at like 3 in the morning on Saturday --  
they have updated DNS and none of their clients are connecting to the  
new place... It seems that they have burnt some bridges with the old  
provider and will be shut off on Saturday evening -- he's really  
desperate, so I agree to wander over and take a look...

I arrive to find utter confusion -- the CEO is screaming at the CTO,  
who appears to have decided that the best way to fix things is by  
getting drunk, random other people are screaming (apparently just for  
fun), etc.... I manage to get someone to calm down for long enough to  
explain the summary of the plan to me and run nslookup. Sure enough
the TTL is really low and the new IP is being handed out, etc.

I ask how long it took for the client to fail over during their tests  
-- "Oh, no, we didn't test like that, we didn't want to impact the  
current service, so we tested with a different domain and checked how  
long it took for IE to pick up the change... It was less than 10
minutes..."

We track down one of the developers and talk to him. He explains this  
long and involved system with the client performing health-checks on
the server and reconnecting with exponential back-off, etc etc etc.
It's all great -- apart from the fact that he calls gethostbyname()
during startup, and then never again....

This is a *really* common issue....
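
For illustration, here is a minimal sketch in Python (the original
client's language isn't stated) of a reconnect loop that looks the name
up again on every attempt instead of once at startup. The function
names, the retry limits and the injectable resolver/connect/sleep hooks
are all assumptions made up for the example, not anything from the
actual client.

```python
import socket
import time


def resolve_addresses(host, port):
    """Ask the system resolver for the host's current addresses."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry's last field is the sockaddr tuple: (ip, port, ...).
    return [info[4] for info in infos]


def default_connect(addr):
    """Open a TCP connection to one resolved (ip, port) pair."""
    return socket.create_connection(addr[:2], timeout=5)


def connect_with_backoff(host, port, resolver=resolve_addresses,
                         connect=default_connect, max_attempts=5,
                         base_delay=1.0, sleep=time.sleep):
    """Reconnect with exponential back-off, re-resolving each time.

    The lookup happens inside the loop, so a changed A record is
    picked up on the next attempt rather than never.
    """
    delay = base_delay
    for attempt in range(max_attempts):
        try:
            addresses = resolver(host, port)  # fresh lookup, not cached
        except OSError:
            addresses = []  # lookup failed; treat as "nothing to try"
        for addr in addresses:
            try:
                return connect(addr)
            except OSError:
                continue  # try the next address, if any
        if attempt < max_attempts - 1:
            sleep(delay)
            delay *= 2  # exponential back-off between attempts
    raise ConnectionError(f"could not connect to {host}:{port}")
```

This only addresses the application holding on to an old address. The
lookup still goes through whatever recursor the machine uses, so a
recursor that ignores TTL remains a separate problem.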

W



> On 8/7/07, Patrick W.Gilmore <patrick at ianai.net> wrote:
> On Aug 7, 2007, at 10:05 AM, Michal Krsek wrote:
>
> >>> 5) User redirection
> >>> - You have to implement a scalable mechanism that redirects
> >>> users to the closest POP. You can use application redirect (fast,
> >>> but not so scalable), DNS redirect (scalable, but not so
> >>> fast) or anycasting (this needs cooperation with the ISP).
> >>
> >> What is slow about handing back different answers to the same
> >> query via DNS, especially when they are pre-calculated?  Seems
> >> very fast to me.
> >
> > Yes, DNS-based redirection scales very well.
> >
> > But there are two problems:
> > 1) Client may not be in same network as DNS server (I'm using my
> > home DNS server even if I'm at IETF or I2 meeting on other side of
> > globe)
>
> This has been discussed.  Operational experience posted here by Owen
> shows < 10% of users are "far" from their recursive NS.
>
> You are the tiny minority.  (Don't feel bad, so am I. :)  Most
> "users" either use the NS handed out by their local DHCP server, or
> they are VPN'ing anyway.
>
>
> > 2) DNS TTL makes realtime traffic management impossible. Remember
> > you may need to distribute not only network traffic, but sometimes
> > also server load. If one server/POP fails or is overloaded, you
> > need to redirect users to another one in realtime.
>
> Define "real time"?  To do it in 1 second or less is nigh
> impossible.  But I challenge you to fail anything over in 1 second
> when IP communication with end users not on your LAN is involved.
>
> I've seen TTLs as low as 20s, giving you a mean fail-over time of 10
> seconds.  That's more than fast enough for most applications these  
> days.
>
> --
> TTFN,
> patrick
>
>



