LDAP timeouts during failure conditions

Tue Jun 28 22:11:58 CEST 2011

On 06/28/2011 08:01 PM, Alexander Clouter wrote:
> Phil Mayers <p.mayers at imperial.ac.uk> wrote:
>>
>>> I'd really like 3.0 to have generic connection pools.  That would
>>> solve this problem by having common code, instead of stuff in
>>> rlm_sql, rlm_ldap, etc.
>>
>> Do you have any pointers how to get started on this? Off the top of my
>> head it seems we'd need something like the code below; a struct to hold
>> module-supplied connection create/keepalive/delete functions, some code
>> in the server core to set and re-set "last used" times and call a
>> keepalive function, and delete
>>
> I probably would not bother with keep alive (or 'last used').  I would

The idea of the keepalive was not to "hold the connection open" at the 
TCP layer. It's to detect dead server(s) in a timely fashion i.e. 
hopefully _before_ someone tries to run a radius packet through the module.

But now that I think about it, it won't work as I envisaged. The "open" 
and "keepalive" call on a connection may block, so can't be run in the 
main event loop - it needs to be in a thread. You don't want to just 
connect on-demand, since the whole point is that a pool with 
connections==0 will fast-fail and let "redundant {}" do its work.

It needs a worker thread, or a proper async LDAP API, which libldap 
doesn't have sadly :o(
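
To make the worker-thread variant concrete, here is a minimal sketch in Python (chosen for brevity; the server itself is C). Everything in it is hypothetical: the Pool and PoolEmpty names, and the create/keepalive callbacks, which stand in for the module-supplied functions and are assumed to return None/False on failure rather than raise. Only the worker thread ever makes a blocking call; acquire() never connects on demand, so an empty pool fails immediately.

```python
import threading


class PoolEmpty(Exception):
    """Raised immediately when the pool holds no idle live connection."""


class Pool:
    def __init__(self, create, keepalive, size=2, interval=1.0):
        self._create = create        # may block; returns a connection or None
        self._keepalive = keepalive  # may block; returns True if still alive
        self._size = size
        self._interval = interval
        self._idle = []              # live connections not currently lent out
        self._total = 0              # idle + lent out + being probed
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def acquire(self):
        # Never connects on demand: an empty pool fails fast.
        with self._lock:
            if not self._idle:
                raise PoolEmpty("no live connections")
            return self._idle.pop()

    def release(self, conn):
        with self._lock:
            self._idle.append(conn)

    def close(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        while not self._stop.is_set():
            self._probe_idle()
            self._top_up()
            self._stop.wait(self._interval)

    def _probe_idle(self):
        with self._lock:
            count = len(self._idle)
        for _ in range(count):
            with self._lock:
                if not self._idle:
                    return
                conn = self._idle.pop(0)   # out of the pool while probed
            alive = self._keepalive(conn)  # blocking call, lock not held
            with self._lock:
                if alive:
                    self._idle.append(conn)
                else:
                    self._total -= 1       # drop the dead connection

    def _top_up(self):
        while not self._stop.is_set():
            with self._lock:
                if self._total >= self._size:
                    return
            conn = self._create()          # blocking call, lock not held
            if conn is None:
                return                     # server down; retry next round
            with self._lock:
                self._idle.append(conn)
                self._total += 1
```

A caller would wrap acquire() in whatever failover logic it has; on PoolEmpty it moves straight on to the next server. Note that this sketch cannot tell "all connections busy" from "all connections dead"; a real pool would probably want to distinguish the two.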

> imagine in practice your 'idle' time should be shorter than any NAT or
> server daemon concept of idleness?  What I'm trying to say is the cost
> of an open idle link is low, tearing it down and rebuilding it is...if
> the connection has been idle for a long time the server (or
> NAT/firewall) would have probably killed it.
>
> If you really want keep alives, it probably would be better to go for
> SO_KEEPALIVE (as NOOP as you can get)?  No doubt this would have to be
> done in the driver rather than the layer you are constructing?

Maybe. Depends if the driver layer offers you any ability to tweak 
underlying TCP parameters. But as I say, I'm not concerned about TCP 
connections; I'm concerned about detecting dead servers.

And SO_KEEPALIVE is way, way too slow to detect dead servers.
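
For a rough sense of scale, assuming the usual Linux defaults (tcp_keepalive_time 7200 s, tcp_keepalive_intvl 75 s, tcp_keepalive_probes 9; other platforms differ), a dead peer is only noticed after roughly two hours. The sketch below turns the option on and works out that figure. The helper names are made up for illustration, and the per-socket TCP_KEEP* options are only set where the platform provides them.

```python
import socket

# Assumed Linux defaults; check your own system's sysctl values.
LINUX_DEFAULTS = {"idle": 7200, "interval": 75, "probes": 9}


def enable_keepalive(sock, idle=None, interval=None, probes=None):
    """Turn on SO_KEEPALIVE and, where supported, tune its timers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tunables = (("TCP_KEEPIDLE", idle),
                ("TCP_KEEPINTVL", interval),
                ("TCP_KEEPCNT", probes))
    for name, value in tunables:
        option = getattr(socket, name, None)  # not defined on every platform
        if option is not None and value is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


def detection_time(idle, interval, probes):
    """Approximate seconds before an idle connection to a dead peer is dropped."""
    return idle + interval * probes
```

With the defaults above, detection_time(**LINUX_DEFAULTS) comes to 7875 seconds. Tuning the timers down helps, but only if the driver layer lets you reach the socket at all.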

>
> As a passing note, I suspect you do not care for async LDAP queries?

On the contrary, I really like async LDAP queries, and async/event 
driven architectures in general. My preferred network programming 
framework is Python Twisted, which is completely non-blocking 
event/callback driven.

Problem is, async LDAP queries are a little (!) more work to implement 
in a threadpool-based server like FreeRADIUS:

  * put the LDAP query params into a struct
  * call the query function; this allocates a semaphore, locks a queue,
    puts the query & semaphore into the queue, unlocks, then waits on
    the semaphore
  * a separate worker thread/threadpool continually locks the queue,
    pulls requests off and unlocks it, issues them & stores the LDAP
    msgid, then polls in a loop over ldap_result(); as each message
    comes back, it finds the corresponding query, copies a pointer to
    the result and flags the semaphore
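
The steps above can be sketched roughly as follows, in Python rather than C to keep it short. The send and poll callbacks are hypothetical stand-ins for issuing an async LDAP search (returning a msgid) and for a ldap_result()-style call with a timeout; PendingQuery and QueryWorker are made-up names. Python's queue.Queue does its own locking, so the explicit lock/unlock steps disappear.

```python
import queue
import threading


class PendingQuery:
    """Query parameters plus the semaphore the caller sleeps on."""

    def __init__(self, params):
        self.params = params
        self.result = None
        self.done = threading.Semaphore(0)


class QueryWorker:
    def __init__(self, send, poll):
        self._send = send      # hypothetical: params -> msgid
        self._poll = poll      # hypothetical: timeout -> (msgid, result) or None
        self._queue = queue.Queue()
        self._inflight = {}    # msgid -> PendingQuery; worker thread only
        threading.Thread(target=self._run, daemon=True).start()

    def query(self, params, timeout=None):
        """Called from a request thread; blocks until the worker has a result."""
        pending = PendingQuery(params)
        self._queue.put(pending)
        if not pending.done.acquire(timeout=timeout):
            raise TimeoutError("no reply for %r" % (params,))
        return pending.result

    def _run(self):
        while True:
            try:
                # Block for new work only when nothing is in flight.
                pending = self._queue.get(block=not self._inflight)
            except queue.Empty:
                pending = None
            if pending is not None:
                self._inflight[self._send(pending.params)] = pending
            reply = self._poll(0.05)
            if reply is None:
                continue
            msgid, result = reply
            waiter = self._inflight.pop(msgid, None)
            if waiter is not None:
                waiter.result = result
                waiter.done.release()  # wake the request thread
```

From the request thread's point of view query() is still a blocking call; the gain is that one worker multiplexes many outstanding queries over a single connection.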

Basically you have to run another thread/pool AFAICT. And I was hoping to 
avoid that, as it seems to me likely to risk stomping all over 
FreeRADIUS's carefully crafted and tested internals.

It's a real shame that libldap doesn't offer a better way to integrate 
the open LDAP TCP socket into a select()-based loop.

Some projects go the whole hog and fork() a child process to do their 
LDAP queries e.g. sssd, but this is just tedious; you have to marshal 
the LDAP query and results across process boundaries.

> It's probably the only database FreeRADIUS supports that supports this
> anyway so probably not worth thinking about.

Postgres, at least, can be used in async mode. And it can cooperate with 
select()
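
What "cooperating with select()" buys you: a client library that hands over its socket can be watched alongside every other descriptor in one loop, with no extra thread. In libpq that is roughly PQsendQuery() to fire the query, PQsocket() to get the descriptor, then PQconsumeInput()/PQisBusy()/PQgetResult() once it turns readable. The sketch below does not talk to Postgres at all; it uses a plain socket as a stand-in for the database connection, and poll_once is a made-up name.

```python
import select
import socket


def poll_once(sources, timeout=1.0):
    """Run one iteration of a select()-based loop.

    sources maps each socket to a callback taking the bytes read from it.
    Returns the number of sockets that were readable.
    """
    readable, _, _ = select.select(list(sources), [], [], timeout)
    for sock in readable:
        sources[sock](sock.recv(4096))
    return len(readable)
```

In a real server the same dict would hold the RADIUS listen sockets next to the database descriptors, so a slow query never stalls packet handling.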