dns and software, was Re: Reliable Cloud host ?

William Herrin bill at herrin.us
Fri Mar 2 18:12:56 UTC 2012


On Fri, Mar 2, 2012 at 1:03 AM, Owen DeLong <owen at delong.com> wrote:
> On Mar 1, 2012, at 9:34 PM, William Herrin wrote:
>> You know, when I wrote 'socket=connect("www.google.com",80,TCP);' I
>> stopped and thought to myself, "I wonder if I should change that to
>> 'connectbyname' instead just to make it clear that I'm not replacing
>> the existing connect() call?" But then I thought, "No, there's a
>> thousand ways someone determined to misunderstand what I'm saying will
>> find to misunderstand it. To someone who wants to understand my point,
>> this is crystal clear."

"Hyperbole." If I had remembered the word, I could have skipped the
long description.

> I'm all for additional library functionality. I just don't want connect()
> to stop working the way it does or for getaddrinfo() to stop working the
> way it does.

Good. Let's move on.


First question: who actually maintains the standard for the C sockets
API these days? Is it a POSIX standard?

Next, we have a set of APIs with which, given sufficient caution and
skill (which is rarely the case), it's possible to string together a
reasonable process that starts with some kind of name in a text string
and ends with established communication with a remote server, for any
sort of name and any sort of protocol. These APIs are complete, but we
repeatedly see certain kinds of errors committed while using them.

Is there a common set of activities an application programmer intends
to perform 9 times out of 10 when using getaddrinfo+connect? I think
there is, and it has the following functionality:

Create a [stream] to one of the hosts satisfying [name] + [service]
within [timeout] and return a [socket].
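For reference, here is roughly the getaddrinfo+connect loop that applications put together today. It is sketched in C because that is the API under discussion. connect_stream is a hypothetical name, not an existing library call. The [timeout] is missing on purpose: each blocking connect() can sit through the kernel's full retry schedule before the loop moves on to the next address.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return a TCP socket connected to the first
 * address for name+service that accepts, or -1 on failure. There is no
 * overall timeout. Each blocking connect() waits out the kernel's own
 * retry schedule before the loop tries the next address. */
int connect_stream(const char *name, const char *service)
{
    assert(name != NULL && service != NULL);

    struct addrinfo hints;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* IPv4 or IPv6, whatever the name has */
    hints.ai_socktype = SOCK_STREAM; /* the common case: a TCP stream */

    struct addrinfo *res = NULL;
    int rc = getaddrinfo(name, service, &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "getaddrinfo(%s): %s\n", name, gai_strerror(rc));
        return -1;
    }

    int fd = -1;
    for (struct addrinfo *ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                   /* connected: keep this descriptor */
        close(fd);                   /* failed: try the next address */
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

A call would look something like `int s = connect_stream("www.google.com", "80");`, where -1 means no address accepted the connection.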

Does anybody disagree? Here's my reasoning:

Stream. Better than 9 times out of 10 it's a stream, and usually a TCP
stream at that. The connect() call also designates a receiver for a
connectionless protocol like UDP, but its use for that has always been
a little peculiar since the protocol doesn't actually connect. Indeed,
sendto() can designate a different receiver for each packet sent
through the socket.
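A small loopback sketch makes the UDP peculiarity concrete. The helpers udp_default_peer, udp_send_to and loopback are hypothetical names. On a datagram socket, connect() sends nothing and succeeds whether or not anyone is listening; it only records a default receiver for send(). An unconnected socket can instead name a different receiver on every sendto(). Some systems reject sendto() with an explicit address on an already-connected UDP socket, so the sketch keeps the two cases on separate sockets.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Build a 127.0.0.1 address for the given port (host byte order). */
static struct sockaddr_in loopback(unsigned short port)
{
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    return sa;
}

/* Create a UDP socket whose default receiver is 127.0.0.1:port. No
 * packet is exchanged: connect() only records the peer so send() and
 * recv() need no address. It succeeds even if nothing listens there. */
int udp_default_peer(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    struct sockaddr_in peer = loopback(port);
    if (connect(fd, (struct sockaddr *)&peer, sizeof peer) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* On an unconnected UDP socket, each sendto() names its own receiver. */
ssize_t udp_send_to(int fd, unsigned short port, const void *buf, size_t len)
{
    assert(buf != NULL);
    struct sockaddr_in dst = loopback(port);
    return sendto(fd, buf, len, 0, (struct sockaddr *)&dst, sizeof dst);
}
```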

Name + Service. If TCP, a hostname and a port.

Sometimes you want to start multiple connection attempts in parallel,
or have some not-quite-threaded process implement its own scheduler
for dealing with multiple connections at once, but that's the
exception. Usually the only reason for putting connect() in
non-blocking mode is that you want to implement sensible error
recovery with timeouts.

And the timeout: the instruction that control should be returned to
the caller no later than X. If completing would take longer than X,
fail instead.
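With the current API, the usual way to get that timeout is a non-blocking connect() followed by poll(). This is a minimal sketch assuming POSIX sockets, and connect_with_timeout is a hypothetical name. The detail people most often miss is that poll() reporting the socket writable only means the attempt has finished. You still have to read SO_ERROR to learn whether it succeeded.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect fd to addr, but return control to the
 * caller within timeout_ms. Returns 0 once connected, or -1 with errno
 * set (ETIMEDOUT if the deadline passed first, otherwise the error the
 * attempt itself reported). Leaves fd in non-blocking mode. */
int connect_with_timeout(int fd, const struct sockaddr *addr,
                         socklen_t addrlen, int timeout_ms)
{
    assert(addr != NULL && timeout_ms >= 0);

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    if (connect(fd, addr, addrlen) == 0)
        return 0;                 /* completed at once, e.g. on loopback */
    if (errno != EINPROGRESS)
        return -1;                /* refused, unreachable, ... */

    /* The socket turns writable when the handshake finishes, either way. */
    struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };
    int n = poll(&pfd, 1, timeout_ms);
    if (n < 0)
        return -1;                /* includes EINTR; a caller could retry */
    if (n == 0) {
        errno = ETIMEDOUT;
        return -1;
    }

    int err = 0;
    socklen_t len = sizeof err;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;
    if (err != 0) {
        errno = err;              /* e.g. ECONNREFUSED */
        return -1;
    }
    return 0;
}
```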



Next item: how would this work under the hood?

Well, you have two tasks: find a list of candidate endpoints from the
name, and establish a connection to one of them.

Find the candidates: ask all available name services in parallel
(hosts, NIS, DNS, etc.). Finished when:

1. All services have responded negative (failure)

2. You have a positive answer and all services which have not yet
answered are at a lower priority (e.g. hosts answers, so you don't
need to wait for NIS and DNS).

3. You have a positive answer from at least one name service and half
of the requested timeout has expired.

4. The full timeout has expired (failure).
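The four stopping rules could be written as a pure decision function that the resolver loop consults whenever a name service answers or a timer fires. This sketch rests on assumptions: the enum names and lookup_status are hypothetical, and the services are listed highest priority first (hosts, NIS, DNS).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status of one name service's reply. */
enum ns_state { NS_PENDING, NS_NEGATIVE, NS_POSITIVE };
enum lookup_result { LOOKUP_WAIT, LOOKUP_DONE_OK, LOOKUP_DONE_FAIL };

/* svc[0..n-1] is ordered by priority, highest first (e.g. hosts, NIS,
 * DNS). Times are milliseconds since the lookup started. Applies the
 * four stopping rules and says whether to keep waiting. */
enum lookup_result lookup_status(const enum ns_state *svc, size_t n,
                                 long elapsed_ms, long timeout_ms)
{
    assert(svc != NULL || n == 0);
    bool pending_above = false;   /* a higher-priority service is still out */
    bool any_positive = false, all_negative = true;

    for (size_t i = 0; i < n; i++) {
        if (svc[i] == NS_POSITIVE) {
            if (!pending_above)
                return LOOKUP_DONE_OK;     /* rule 2: nothing better pending */
            any_positive = true;
            all_negative = false;
        } else if (svc[i] == NS_PENDING) {
            pending_above = true;
            all_negative = false;
        }
    }
    if (all_negative)
        return LOOKUP_DONE_FAIL;           /* rule 1: everyone said no */
    if (any_positive && 2 * elapsed_ms >= timeout_ms)
        return LOOKUP_DONE_OK;             /* rule 3: half the timeout used */
    if (elapsed_ms >= timeout_ms)
        return LOOKUP_DONE_FAIL;           /* rule 4: out of time */
    return LOOKUP_WAIT;
}
```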

Cache the knowledge somewhere along with TTLs (locally defined if the
name service doesn't explicitly provide a TTL). This may well be the
first of a series of connection requests for the same host. If a
particular service's cached answer for this name is still within its
TTL, don't ask that service again.
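Here is a minimal sketch of that per-service cache. It assumes a small fixed-size table keyed by (name, name service) that evicts whichever entry expires soonest. cache_put, cache_fresh and the table sizes are hypothetical. The current time is passed in rather than read inside the functions, which makes the expiry logic easy to check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define CACHE_SLOTS 64        /* hypothetical fixed-size table */
#define CACHE_NAME_LEN 256

/* One remembered answer: which name, which name service, valid until
 * when. The answer's addresses would live here too; omitted for brevity. */
struct cache_entry {
    char name[CACHE_NAME_LEN];
    int service;              /* e.g. 0 = hosts, 1 = NIS, 2 = DNS */
    time_t expires;           /* time of answer + TTL */
    bool used;
};

static struct cache_entry cache[CACHE_SLOTS];

static struct cache_entry *cache_find(const char *name, int service)
{
    for (size_t i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].used && cache[i].service == service &&
            strcmp(cache[i].name, name) == 0)
            return &cache[i];
    return NULL;
}

/* Record an answer from one service. ttl < 0 means that service gave
 * no TTL, so the locally configured default_ttl is used instead. */
void cache_put(const char *name, int service, long ttl, long default_ttl,
               time_t now)
{
    assert(name != NULL && strlen(name) < CACHE_NAME_LEN);
    struct cache_entry *slot = cache_find(name, service);
    if (slot == NULL) {
        /* Take a free slot, or else evict the entry that expires first. */
        slot = &cache[0];
        for (size_t i = 0; i < CACHE_SLOTS; i++) {
            if (!cache[i].used) { slot = &cache[i]; break; }
            if (cache[i].expires < slot->expires) slot = &cache[i];
        }
    }
    strcpy(slot->name, name);
    slot->service = service;
    slot->expires = now + (time_t)(ttl >= 0 ? ttl : default_ttl);
    slot->used = true;
}

/* True if this service's cached answer for name is still within its
 * TTL, in which case there's no need to ask that service again. */
bool cache_fresh(const char *name, int service, time_t now)
{
    const struct cache_entry *e = cache_find(name, service);
    return e != NULL && now < e->expires;
}
```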

We also need to let the app tell us to deprioritize a particular result
later on. Why? Let's say I get an HTTP connection to a host but then
that connection times out. If the app is managing the address list, it
can try again to another address for the same name. We're now hiding
that detail from the app, so we need a callback for the app to tell
us, "when I try again, avoid giving me this answer because it didn't
turn out to work."
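Here is one possible shape for that deprioritize callback, as a sketch. The library keeps a candidate list per name. When the app reports a failed address, that address sorts behind every other candidate on the next attempt. struct candidate, candidate_avoid and candidates_order are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-name candidate kept by the library. */
struct candidate {
    char addr[64];   /* textual address, e.g. "192.0.2.1" */
    int rank;        /* position under the normal ordering rules */
    bool avoid;      /* set when the app reports this answer failed */
};

/* The app-facing callback: "next time, avoid giving me this answer".
 * Returns true if addr was among the candidates. */
bool candidate_avoid(struct candidate *c, size_t n, const char *addr)
{
    assert((c != NULL || n == 0) && addr != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(c[i].addr, addr) == 0) {
            c[i].avoid = true;
            return true;
        }
    }
    return false;
}

/* Avoided answers sort after all others; otherwise keep the normal rank. */
static int by_preference(const void *a, const void *b)
{
    const struct candidate *x = a, *y = b;
    if (x->avoid != y->avoid)
        return x->avoid ? 1 : -1;
    return (x->rank > y->rank) - (x->rank < y->rank);
}

void candidates_order(struct candidate *c, size_t n)
{
    qsort(c, n, sizeof *c, by_preference);
}
```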


So, now we have a list of addresses with valid TTLs as of the start of
our connection attempt. Next step: start the connection attempt.

Pick the "first" address (chosen by whatever the ordering rules are),
send the connection request packet, and let the OS follow its normal
retry schedule. Wait one second (system or sysctl configurable) or
until the previous connection request is either accepted or rejected,
whichever is shorter. If not connected yet, background that attempt,
pick the next address and send a connection request. Repeat until one
connection request has been issued to every possible destination
address for the name.

Finished when:

1. Any of the pending connection requests completes (others are aborted).

2. The timeout is reached (all pending requests are aborted).
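The staggered connection race described above can be sketched in C with non-blocking sockets and poll(). This rests on several assumptions. The candidate list is a getaddrinfo() result already in preference order. staggered_connect and MAX_ATTEMPTS are hypothetical. A rejection of any pending attempt, immediate or later, lets the next one start right away, which is one reading of "accepted or rejected, whichever is shorter". stagger_ms plays the role of the configurable one-second wait.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netdb.h>
#include <poll.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

#define MAX_ATTEMPTS 16   /* hypothetical cap on simultaneous attempts */

static long now_ms(void)  /* monotonic clock in milliseconds */
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Issue a non-blocking connect. Returns the fd (*done = 1 if connected
 * already, 0 if still in progress) or -1 if it failed immediately. */
static int start_attempt(const struct addrinfo *ai, int *done)
{
    int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
    if (fd < 0)
        return -1;
    int fl = fcntl(fd, F_GETFL, 0);
    if (fl >= 0 && fcntl(fd, F_SETFL, fl | O_NONBLOCK) == 0) {
        *done = connect(fd, ai->ai_addr, ai->ai_addrlen) == 0;
        if (*done || errno == EINPROGRESS)
            return fd;
    }
    int saved = errno;
    close(fd);
    errno = saved;
    return -1;
}

/* Close every pending attempt except keep (-1 closes them all). */
static int abort_others(const struct pollfd *p, int npend, int keep)
{
    for (int i = 0; i < npend; i++)
        if (p[i].fd != keep)
            close(p[i].fd);
    return keep;
}

/* Returns a connected socket (left non-blocking) or -1 with errno set to
 * ETIMEDOUT or to the last attempt's error. Times are in milliseconds. */
int staggered_connect(const struct addrinfo *list, long stagger_ms, long timeout_ms)
{
    assert(stagger_ms >= 0 && timeout_ms >= 0);
    struct pollfd p[MAX_ATTEMPTS];
    int npend = 0, last_err = ECONNREFUSED;
    const struct addrinfo *next = list;
    long start = now_ms(), launch_at = 0;      /* offsets from start */
    for (;;) {
        long t = now_ms() - start;
        if (t >= timeout_ms) { last_err = ETIMEDOUT; break; }  /* rule 2 */
        if (next && npend < MAX_ATTEMPTS && (t >= launch_at || npend == 0)) {
            int done = 0, fd = start_attempt(next, &done);
            next = next->ai_next;
            if (fd >= 0 && done)
                return abort_others(p, npend, fd);
            if (fd >= 0) {
                p[npend++] = (struct pollfd){ .fd = fd, .events = POLLOUT };
                launch_at = t + stagger_ms;     /* its head start */
            } else {
                last_err = errno;
                launch_at = t;                  /* rejected at once: go on */
            }
            continue;
        }
        if (npend == 0)
            break;                              /* every candidate failed */
        long wait = timeout_ms - t;
        if (next && npend < MAX_ATTEMPTS && launch_at - t < wait)
            wait = launch_at - t;
        int n = poll(p, (nfds_t)npend, (int)wait);
        if (n < 0 && errno != EINTR) { last_err = errno; break; }
        for (int i = npend - 1; n > 0 && i >= 0; i--) {
            if (p[i].revents == 0)
                continue;
            int err = 0; socklen_t len = sizeof err;
            if (getsockopt(p[i].fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
                err = errno;
            if (err == 0)
                return abort_others(p, npend, p[i].fd);   /* rule 1 */
            last_err = err;
            close(p[i].fd);                     /* rejected: drop it and */
            p[i] = p[--npend];
            launch_at = t;                      /* let the next one start */
        }
    }
    abort_others(p, npend, -1);                 /* abort whatever is left */
    errno = last_err;
    return -1;
}
```

The winning socket is returned still in non-blocking mode. A caller that wants ordinary blocking I/O would clear O_NONBLOCK with fcntl().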

Once a connection is established, this should be cached alongside the
address and its TTL so that next time around that address can be tried
first.
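Remembering the winner could be as simple as moving it to the front of the cached address list and keeping the others in their existing order. This is a sketch in which promote_winner is a hypothetical helper and the list is assumed to be an array of address strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: after a successful connection, move the winning address
 * to the front of the cached list (others keep their relative order) so
 * the next attempt for this name tries it first while the TTL is valid.
 * Returns 1 if the address was found, 0 otherwise. */
int promote_winner(const char **addrs, size_t n, const char *winner)
{
    assert((addrs != NULL || n == 0) && winner != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(addrs[i], winner) == 0) {
            const char *w = addrs[i];
            /* Shift entries 0..i-1 down one slot, then put w first. */
            memmove(&addrs[1], &addrs[0], i * sizeof *addrs);
            addrs[0] = w;
            return 1;
        }
    }
    return 0;
}
```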

Thoughts?

The idea here, of course, is that any application which uses this
function to make its connections should, at an operations level, do a
good job of handling both multiple addresses with one of them
unreachable and host renumbering that relies on the DNS TTL.



> Since you were hell bent on calling the existing mechanisms broken rather than
> conceding the point that the current process is not broken, but, could stand some
> improvements in the library

I hold that if an architecture encourages a certain implementation
mistake largely to the exclusion of correct implementations, then that
architecture is in some way broken. The error may be in a particular
component, but it could also be that the components themselves are
correct: there could be a missing component, or the components could
be strung together in a way that doesn't work right. Regardless of the
exact cause, there is an architecture-level mistake which is the root
cause of the consistently broken implementations.


Regards,
Bill Herrin


-- 
William D. Herrin ................ herrin at dirtside.com  bill at herrin.us
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004




More information about the NANOG mailing list