ipaddr, ipv4addr ipv6addr

Sat May 31 18:20:27 CEST 2014

Alan DeKok wrote:
>> Brian Julin wrote:
>> For home servers, the conjectural behavior when DNS fails for
>> all servers in the pool would be that an empty pool is brought up, so
>> the server would still start, but would behave as if it had no non-dead
>> servers in that pool.  Then things would start working when DNS started
>> working again.

>  That isn't the issue.  The issue is that DNS failures can take 30s to
> time out.  So the server will be down for 30s... potentially for each
> DNS lookup.  That's bad.

>  The solution is to delay the DNS lookups until some later time.
> But... when?

Yes, just bring up the servers in a dead-ish state, and resolve the DNS
asynchronously, then mark them alive.
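
As a rough illustration of that startup model, here is a Python sketch (not
FreeRADIUS code). The names HomeServer, Pool, system_resolve, and the
injectable resolve parameter are all hypothetical. The pool comes up at once
with every server dead, the lookups go to a background executor, and the main
loop marks servers alive as answers arrive:

```python
from concurrent.futures import ThreadPoolExecutor
import socket


def system_resolve(hostname):
    """Blocking lookup; returns a sorted list of address strings."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_UDP)
    return sorted({info[4][0] for info in infos})


class HomeServer:
    def __init__(self, hostname):
        self.hostname = hostname
        self.addresses = []
        self.alive = False  # "dead-ish" until DNS has answered


class Pool:
    def __init__(self, hostnames, resolve=system_resolve, workers=4):
        self.servers = [HomeServer(h) for h in hostnames]
        self._resolve = resolve
        self._executor = ThreadPoolExecutor(max_workers=workers)
        self._pending = {}  # Future -> HomeServer

    def start(self):
        """Queue the lookups and return at once; nothing blocks startup."""
        for server in self.servers:
            future = self._executor.submit(self._resolve, server.hostname)
            self._pending[future] = server

    def poll(self):
        """Called from the main loop: apply whatever lookups have finished."""
        for future in [f for f in self._pending if f.done()]:
            server = self._pending.pop(future)
            try:
                server.addresses = future.result()
                server.alive = bool(server.addresses)
            except OSError:
                server.alive = False  # stays dead; a real server would retry

    def live_servers(self):
        return [s for s in self.servers if s.alive]

    def close(self):
        self._executor.shutdown(wait=True)
```

Only poll() touches the server records, so the worker threads never write to
structures owned by the main thread. A slow or failing lookup costs nothing at
startup; that server simply stays dead for longer.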

In DDDS you pretty much have no choice but to look up fresh domains
when the request comes in, because you don't know them until then.  As
long as your DNS is caching and sanely configured, though, this will be
the minority of requests.  For literal servers in the config file, I would
imagine we start up status-server immediately and do the lookup during the
setup of the status-server requests, which would be a clean model for it.

>  And how long does it keep retrying?  Does that go into a
> child thread?  If so, how does it update the structures in the main thread?

The unbound library does all its interim work (calling back for recursion
and DNSSEC and whatnot) in its own threads (or processes),
which can queue callbacks in the application once it has final
results, but the callback queue does not get run until explicitly told to from
an application thread context.  There is also a selectable socket and such
to determine when callbacks are ready to run.

They have to do it that way to support the optional process-based model.
So we have say over whether callbacks get run from the main thread or not,
and they are run sequentially, not in parallel.
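
The same pattern can be modelled in a few lines of Python. This is a sketch of
the idea only, not of libunbound's internals; CallbackQueue, post, and process
are hypothetical names. In libunbound itself the descriptor comes from ub_fd()
and the queued callbacks are run by ub_process(). Workers enqueue a result and
write one byte to a socket pair; the owning thread selects on the read end and
then runs the callbacks one at a time:

```python
import queue
import select
import socket
import threading


class CallbackQueue:
    """Workers post results; only the owning thread runs the callbacks."""

    def __init__(self):
        self._results = queue.Queue()
        # The read end is what the main loop selects on.
        self._rsock, self._wsock = socket.socketpair()
        self._rsock.setblocking(False)

    def fileno(self):
        return self._rsock.fileno()

    def post(self, callback, result):
        """Safe to call from any thread."""
        self._results.put((callback, result))
        self._wsock.send(b"\0")  # wake the main loop

    def process(self):
        """Run queued callbacks one after another in the calling thread."""
        try:
            self._rsock.recv(4096)  # drain wakeup bytes
        except BlockingIOError:
            pass
        ran = 0
        while True:
            try:
                callback, result = self._results.get_nowait()
            except queue.Empty:
                return ran
            callback(result)
            ran += 1

    def close(self):
        self._rsock.close()
        self._wsock.close()
```

Because the object has a fileno() method it can be passed straight to
select.select() alongside the server's other sockets. A wakeup byte can
occasionally outlive its callback, in which case process() just returns 0.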

The example code in my fork does a stable job of delivering current
DDDS results in attributes to the post-realm stages of request processing,
even though that involves making multiple requests from DNS based on
previous results.  (I do still have to refork and adjust that code now that the
experimental event loop is available; that fork is still hacking up the protocol
mechanism and no doubt some freshening is needed.)  At that point the actual
choice of the server has not been made yet but the eligible hosts and their
probability weights are known.
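
For illustration, picking a server from such a list could look like the Python
sketch below. pick_host and the (host, weight) pair layout are hypothetical,
and this covers weights only; it does not address priorities or the question
of hashing when weights change.

```python
import random


def pick_host(candidates, rng=random):
    """candidates: list of (host, weight) pairs with non-negative weights.

    Returns one host, chosen with probability proportional to its weight,
    or None if nothing is eligible.
    """
    eligible = [(host, weight) for host, weight in candidates if weight > 0]
    if not eligible:
        return None
    hosts = [host for host, _ in eligible]
    weights = [weight for _, weight in eligible]
    return rng.choices(hosts, weights=weights, k=1)[0]
```

Passing a seeded random.Random instance as rng makes the choice repeatable,
which helps when testing.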

So other than the structural layout work and how to hash/balance when your
weights can change, the question boils down to what happens when we need
to modify the pools themselves by adding an ad-hoc server -- how do we signal the
main thread to do that work and then defer until it finishes, then do it again if that
fails, then drop the request if it all takes too long and rely on the client
to retry, perhaps completing the work in the meantime?  When doing the lookups
I have libunbound to bounce things off, but then later we need to use
in-house mechanisms.
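
One possible shape for that hand-off, as a Python sketch rather than a
proposal for the actual server: PoolManager, service_one, and ensure_server
are hypothetical names, and a plain queue plus an event stands in for whatever
in-house mechanism is used. The worker asks the main thread to add the server,
waits a bounded time, retries on failure, and gives up at a deadline so the
request can be dropped:

```python
import queue
import threading
import time


class PoolManager:
    """Owned by the main thread; workers only enqueue requests."""

    def __init__(self):
        self.requests = queue.Queue()
        self.pool = {}  # host -> server record, mutated by main thread only

    def service_one(self, add_server, timeout=None):
        """Main thread: handle one pending add request, if any."""
        try:
            host, reply = self.requests.get(timeout=timeout)
        except queue.Empty:
            return False
        try:
            self.pool[host] = add_server(host)
            reply["ok"] = True
        except OSError:
            reply["ok"] = False
        reply["done"].set()
        return True


def ensure_server(manager, host, deadline, attempt_timeout=0.5):
    """Worker thread: ask for `host` to be added, retrying until `deadline`.

    Returns True once the server is in the pool, or False to tell the
    caller to drop the request and let the client retransmit.
    """
    while time.monotonic() < deadline:
        reply = {"done": threading.Event(), "ok": False}
        manager.requests.put((host, reply))
        remaining = deadline - time.monotonic()
        if reply["done"].wait(min(attempt_timeout, max(remaining, 0))):
            if reply["ok"]:
                return True
            continue  # the add failed; try again if time allows
        # Timed out waiting; the main thread may still finish it later.
    return False
```

A request that times out stays queued, so the main thread can still complete
the add after the worker has given up, and the client's retransmission then
finds the server already in place.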

Also naturally this all has to be done while allowing the worker threads enough
atomic read-only access to use existing/cached connections.
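
A common way to get that kind of read-only access is copy-on-write: the main
thread builds a modified copy of the pool and publishes it by swapping a
single reference, while workers keep using whichever snapshot they already
hold. The Python sketch below shows the idea. PoolSnapshots is a hypothetical
name, and a C implementation would need an atomic pointer swap plus some way
to reclaim old copies.

```python
from types import MappingProxyType


class PoolSnapshots:
    def __init__(self):
        self._current = MappingProxyType({})

    def snapshot(self):
        """Worker threads: grab the current read-only view. No lock needed,
        since reading one attribute reference is atomic in CPython."""
        return self._current

    def add_server(self, host, record):
        """Main thread only: copy, modify the copy, then publish it."""
        updated = dict(self._current)
        updated[host] = record
        self._current = MappingProxyType(updated)
```

A worker that took its snapshot before add_server() ran keeps seeing the old
pool until it asks for a new one, so it never observes a half-updated
structure.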

>>  Also as far as AAAA/A records it would try both unless
>> told otherwise, so worst case there is we end up load balancing needlessly
>> between the A and AAAA sockets on the same server, or having
>> a useless marked-dead home server hanging around for one of the
>> address records if ipv4 or ipv6 were broken.
>
>  And when some DNS admin (i.e. non-RADIUS) adds another A record
> without telling you... the proxy is suddenly going to be trying an
> address which has no RADIUS server.

In that case you end up with another dead server, and log entries to match,
and I don't think it should be incumbent on FreeRADIUS to attempt to do anything
other than log that problem at a reasonable rate.  The potential for foot shooting
when the RADIUS and DNS administrators are in different departments is
indeed high.
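
Logging "at a reasonable rate" could be as simple as the Python sketch below.
RateLimitedLog and its parameters are hypothetical names, and the 60 second
default is an arbitrary choice. It emits at most one message per key (say, per
dead address) in each interval and reports how many repeats were swallowed:

```python
import time


class RateLimitedLog:
    """Emit at most one message per key every `interval` seconds."""

    def __init__(self, interval=60.0, clock=time.monotonic, sink=print):
        self.interval = interval
        self._clock = clock
        self._sink = sink
        self._last = {}        # key -> time of last emitted message
        self._suppressed = {}  # key -> messages dropped since then

    def log(self, key, message):
        now = self._clock()
        last = self._last.get(key)
        if last is not None and now - last < self.interval:
            self._suppressed[key] = self._suppressed.get(key, 0) + 1
            return False
        dropped = self._suppressed.pop(key, 0)
        if dropped:
            message += " (%d similar messages suppressed)" % dropped
        self._sink(message)
        self._last[key] = now
        return True
```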

>  The failure cases for DDNS lookups are many, and hard to get right.

Yup.  I had a long back and forth with the RFC editor about just what to do
in the various places where failures can occur, and a few tweaks resulting
from that still need to be made to the code in my fork.  That
handles all the DNS-level failure scenarios but there is also the scenario
of connection attempt failures and how to feed those back into the DDDS
process to cause traversal into backup branches of the DDDS tree.
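
Stripped of all the DNS detail, the feedback loop amounts to ordered fallback.
The Python sketch below is a deliberately simplified assumption: branches are
flattened to lists of hosts, and connect_with_fallback and try_connect are
hypothetical names. The real DDDS traversal would issue further lookups when
it descends into a branch.

```python
def connect_with_fallback(branches, try_connect):
    """branches: list of lists of hosts, most-preferred branch first.

    Tries every host in a branch before descending into the next one.
    Returns (host, failed_hosts); host is None if every branch failed.
    """
    failed = []
    for branch in branches:
        for host in branch:
            try:
                try_connect(host)
                return host, failed
            except OSError:
                failed.append(host)  # remember it so later lookups can skip it
    return None, failed
```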

It's a bear, but I think it is doable.