Content Delivery Networks
warren at kumari.net
Fri Aug 10 17:15:09 UTC 2007
On Aug 10, 2007, at 1:55 AM, Paul Reubens wrote:
> How do you engineer around enterprise and ISP recursors that don't
> honor TTL, instead caching DNS records for a week or more?
A friend of mine was working for a place that performed some service
on data (not important what, you send them some data (through this
really ugly client app that they wrote in-house) and they sent you
Anyway, for various reasons they needed to move out of their current
data-center to a new provider. They had this truly monumental plan
for doing this that they had been working on for months --- MS
Project printouts that covered entire walls in this huge rainbow of
colors, 400 or so pages of plans, etc etc etc -- it all boiled down
to: decrease the TTL, then swap in the new A record at midnight on
Friday. As soon as the TTL expired everything would start working in
the new place and it would all be transparent to the end users...
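For what it's worth, the timing half of a plan like that can be sketched in a few lines. This is only an illustration: the function names, the timestamps and the TTL values (one day before, five minutes after) are made up, not taken from their plan, and it assumes recursors that actually honor TTLs.

```python
from datetime import datetime, timedelta


def earliest_safe_swap(ttl_lowered_at, old_ttl_seconds):
    """Earliest time the A record can be swapped with only the low TTL cached.

    A recursor that fetched the record just before the TTL was lowered may
    keep serving it for the full old TTL, so that is how long to wait.
    """
    return ttl_lowered_at + timedelta(seconds=old_ttl_seconds)


def latest_stale_answer(swapped_at, new_ttl_seconds):
    """Latest time a TTL-honoring recursor may still hand out the old address."""
    return swapped_at + timedelta(seconds=new_ttl_seconds)


# Hypothetical numbers: TTL lowered from 86400s to 300s on a Wednesday noon,
# record swapped at midnight going into Saturday.
lowered = datetime(2007, 8, 1, 12, 0, 0)
swap = datetime(2007, 8, 4, 0, 0, 0)
print("safe to swap from:", earliest_safe_swap(lowered, 86400))
print("old address may linger until:", latest_stale_answer(swap, 300))
```

Note that this says nothing about what the application does with the answer once it has it.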
Anyway, my friend calls me at like 3 in the morning on Saturday --
they have updated DNS and none of their clients are connecting to the
new place... It seems that they have burnt some bridges with the old
provider and will be shut off on Saturday evening -- he's really
desperate, so I agree to wander over and take a look...
I arrive to find utter confusion -- the CEO is screaming at the CTO,
who appears to have decided that the best way to fix things is by
getting drunk, random other people are screaming (apparently just for
fun), etc.... I manage to get someone to calm down for long enough to
explain the summary of the plan to me and run nslookup. Sure enough,
the TTL is really low and the new IP is being handed out, etc.
I ask how long it took for the client to fail over during their tests
-- "Oh, no, we didn't test like that, we didn't want to impact the
current service, so we tested with a different domain and checked how
long it took for IE to pick up the change... It was less than 10
We track down one of the developers and talk to him. He explains this
long and involved system with the client performing health-checks on
the server and reconnecting with exponential back-off, etc etc etc.
It's all great -- apart from the fact that he calls gethostbyname()
during startup, and then never again....
This is a *really* common issue....
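The fix is to do the lookup as part of every reconnect instead of once at startup. A minimal sketch of that idea in Python follows. It is not their client: the function names, the `resolver`/`connect`/`sleep` hooks (there only so it can be exercised without a network) and the retry limits are all assumptions made for the illustration.

```python
import socket
import time


def resolve(host, port):
    """Look the name up now; return a list of (family, sockaddr) candidates."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return [(family, sockaddr) for family, _type, _proto, _canon, sockaddr in infos]


def connect_with_backoff(host, port, resolver=resolve, connect=None,
                         max_attempts=6, base_delay=1.0, max_delay=60.0,
                         sleep=time.sleep):
    """Connect to host:port, re-resolving the name before every attempt."""
    if connect is None:
        connect = _tcp_connect
    delay = base_delay
    last_error = None
    for attempt in range(max_attempts):
        try:
            # The lookup happens inside the loop, so a changed A record is
            # picked up on the next reconnect, not at the next restart.
            for family, sockaddr in resolver(host, port):
                try:
                    return connect(family, sockaddr)
                except OSError as exc:
                    last_error = exc
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            last_error = exc
        if attempt < max_attempts - 1:
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential back-off, capped
    raise ConnectionError(f"could not reach {host}:{port}") from last_error


def _tcp_connect(family, sockaddr, timeout=5.0):
    """Open a TCP connection to one resolved address."""
    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(sockaddr)
    except OSError:
        sock.close()
        raise
    return sock
```

Something like `sock = connect_with_backoff("service.example.com", 443)` (hostname made up) would then follow the A record wherever it goes, subject to whatever caching the local resolver does underneath.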
> On 8/7/07, Patrick W.Gilmore <patrick at ianai.net> wrote:
> On Aug 7, 2007, at 10:05 AM, Michal Krsek wrote:
> >>> 5) User redirection
> >>> - You have to implement a scalable mechanism that redirects
> >>> users to the closest POP. You can use application redirect (fast,
> >>> but not so much scalable), DNS redirect (scalable, but not so
> >>> fast) or anycasting (this needs cooperation with ISP).
> >> What is slow about handing back different answers to the same
> >> query via DNS, especially when they are pre-calculated? Seems
> >> very fast to me.
> > Yes, DNS-based redirection scales very nicely.
> > But there are two problems:
> > 1) Client may not be in same network as DNS server (I'm using my
> > home DNS server even if I'm at IETF or I2 meeting on other side of
> > globe)
> This has been discussed. Operational experience posted here by Owen
> shows < 10% of users are "far" from their recursive NS.
> You are the tiny minority. (Don't feel bad, so am I. :) Most
> "users" either use the NS handed out by their local DHCP server, or
> they are VPN'ing anyway.
> > 2) DNS TTL makes realtime traffic management impossible. Remember
> > you may distribute not only network traffic, but sometimes also
> > server load. If one server/POP fails or is overloaded, you need to
> > redirect users to another one in realtime.
> Define "real time"? To do it in 1 second or less is nigh
> impossible. But I challenge you to fail anything over in 1 second
> when IP communication with end users not on your LAN is involved.
> I've seen TTLs as low as 20s, giving you a mean fail-over time of 10
> seconds. That's more than fast enough for most applications these
More information about the NANOG