LDAP timeouts during failure conditions
p.mayers at imperial.ac.uk
Wed Jun 29 18:59:05 CEST 2011
On 06/23/2011 05:28 PM, Alan DeKok wrote:
> Phil Mayers wrote:
>> So, some discussion on the JANET-ROAMING list leads me to believe that,
>> during an "ldap server down" condition, rlm_ldap will incur
>> "net_timeout" on every (or many) passes through the module.
> It's better for the module to track when connections are down, and
> return quickly if all are down.
...this is *not* a connection pool, but an example of one way to solve
the problem; spawn a child thread to create connections.
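To make the shape of that concrete, here is a minimal sketch of the idea, not the patch itself. Every name in it (conn_manager, conn_get, conn_mark_failed, manager_step, manager_main, fake_connect) is hypothetical and none of it is rlm_ldap code. The assumption is that only the manager thread ever makes the blocking connect-and-bind call, so the request path just reads a flag and fails fast when the server is down.

```c
#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

/* All names here are hypothetical; none of this is rlm_ldap code. */

/* Stand-in for the blocking connect-and-bind. ctx points at a bool
 * saying whether the (pretend) server is reachable. */
static bool fake_connect(void *ctx)
{
    return *(bool *)ctx;
}

typedef struct {
    pthread_mutex_t lock;
    bool up;                      /* is a usable connection available? */
    bool stop;                    /* ask the manager thread to exit */
    bool (*try_connect)(void *);  /* may block for up to net_timeout */
    void *ctx;
} conn_manager;

static void manager_init(conn_manager *m, bool (*fn)(void *), void *ctx)
{
    pthread_mutex_init(&m->lock, NULL);
    m->up = false;
    m->stop = false;
    m->try_connect = fn;
    m->ctx = ctx;
}

/* Request path: never touches the network, so it cannot stall. */
static bool conn_get(conn_manager *m)
{
    pthread_mutex_lock(&m->lock);
    bool up = m->up;
    pthread_mutex_unlock(&m->lock);
    return up;
}

/* Request path: report a connection that failed while in use. */
static void conn_mark_failed(conn_manager *m)
{
    pthread_mutex_lock(&m->lock);
    m->up = false;
    pthread_mutex_unlock(&m->lock);
}

/* One pass of the manager: reconnect if needed, outside the lock. */
static void manager_step(conn_manager *m)
{
    if (conn_get(m))
        return;
    bool ok = m->try_connect(m->ctx);   /* the only blocking call */
    pthread_mutex_lock(&m->lock);
    m->up = ok;
    pthread_mutex_unlock(&m->lock);
}

/* Thread body: pass to pthread_create() with the conn_manager as arg. */
static void *manager_main(void *arg)
{
    conn_manager *m = arg;
    for (;;) {
        pthread_mutex_lock(&m->lock);
        bool stop = m->stop;
        pthread_mutex_unlock(&m->lock);
        if (stop)
            break;
        manager_step(m);
        sleep(1);   /* dumb poll between passes */
    }
    return NULL;
}
```

A request thread would call conn_get() and return a module failure straight away if it comes back false, so the net_timeout cost is paid only inside manager_main().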
I'm aware the code as-is has big problems but it might inspire something
more useful; off the top of my head:
* the new "failure" flag to ldap_release_conn is used too
aggressively, meaning rlm_ldap will drop a connection in some cases it
doesn't need to
* it doesn't touch the eDir code - I don't have a way to test it
* there's no way to terminate and re-start the connection manager thread
* the connection-manager thread does not obey the "-s" command line
option
* it uses a dumb sleep() rather than a semaphore to wake up
...and probably lots more.
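On the sleep() point, one possible replacement is sketched below. It is an assumption on my part rather than anything in the patch, and it uses a pthread condition variable with a timed wait instead of a POSIX semaphore, purely because that is the more portable of the two. The names (wakeup, wakeup_kick, wakeup_wait) are hypothetical. A worker that releases a connection as failed would "kick" the manager, which then wakes at once instead of waiting out its poll interval.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical names; a sketch, not rlm_ldap code. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            kicked;   /* a worker has asked for attention */
} wakeup;

static void wakeup_init(wakeup *w)
{
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->cond, NULL);
    w->kicked = false;
}

/* Worker side: call when a connection has just been found dead. */
static void wakeup_kick(wakeup *w)
{
    pthread_mutex_lock(&w->lock);
    w->kicked = true;
    pthread_cond_signal(&w->cond);
    pthread_mutex_unlock(&w->lock);
}

/* Manager side: block until kicked or until timeout_ms has passed.
 * Returns true if woken by a kick, false on timeout. */
static bool wakeup_wait(wakeup *w, long timeout_ms)
{
    struct timespec deadline;

    /* pthread_cond_timedwait takes an absolute CLOCK_REALTIME deadline
     * when the condition variable has default attributes. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&w->lock);
    int rc = 0;
    /* Loop to cope with spurious wakeups. */
    while (!w->kicked && rc != ETIMEDOUT)
        rc = pthread_cond_timedwait(&w->cond, &w->lock, &deadline);
    bool kicked = w->kicked;
    w->kicked = false;
    pthread_mutex_unlock(&w->lock);
    return kicked;
}
```

The manager loop would then call wakeup_wait() with its poll interval where it currently sleeps, and a termination request could reuse the same kick to get the thread to notice a stop flag promptly.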
Related to this, connection re-binding in the non-async case should
probably live inside ldap_get_conn and move out of perform_search() and
siblings. But the diff as-is is hopefully easier to read.
This patch also doesn't solve "LDAP-Group == X" pointing at one and only
one module. One possible way to solve that is as per Alex's suggestion, to
manage the TCP connections ourselves (which we could do inside the
worker thread) and when people pass in >1 hostname to the module, do
some kind of round-robin / fastest-wins connection algorithm.
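As a rough illustration of the round-robin half of that, and only as a sketch under my own assumptions, the selection step could look something like the following. The host_list type, host_pick() and the MAX_HOSTS limit are hypothetical, and the "fastest-wins" variant is not shown. It assumes only the worker thread calls it, so there is no locking.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_HOSTS 8   /* arbitrary limit for the sketch */

/* Hypothetical structure; not part of rlm_ldap. */
typedef struct {
    const char *name[MAX_HOSTS];  /* hostnames passed to the module */
    bool        down[MAX_HOSTS];  /* marked by whoever saw the failure */
    size_t      count;
    size_t      next;             /* index to try first on the next call */
} host_list;

/* Return the index of the next host not marked down, rotating the
 * starting point on each call, or -1 if every host is down. */
static int host_pick(host_list *h)
{
    for (size_t i = 0; i < h->count; i++) {
        size_t idx = (h->next + i) % h->count;
        if (!h->down[idx]) {
            h->next = (idx + 1) % h->count;
            return (int)idx;
        }
    }
    return -1;
}
```

A return of -1 is the "all are down" case, which is exactly when the module should return quickly instead of trying to connect.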