FreeRADIUS Thread Behaviour

Arran Cudbard-Bell a.cudbardb at freeradius.org
Tue Oct 17 03:30:41 CEST 2017



> The motivation for looking into this is that I have occasions where freeradius
> reports “Could not start TLS: Can't contact LDAP server”
> which seems to only occur when a new thread fires up and tries to start TLS
> to the LDAP server.   This happens randomly for a few seconds a couple times a month.
> Regardless of whether or not this indicates a problem
> with my LDAP server or the connection to it, I would like to know why
> I have threads continuously dying and spawning every few minutes even
> though my config, as I understand it, is set so that the threads should not die.

That's simply not how the rlm_ldap module and the threading code interact.  It will be in v4.0.x because we'll be binding (very tightly) LDAP connections to worker threads, but it is not how the code works in v3.0.x, v2.0.x or v1.0.x.

What you're seeing is threads being killed because the load on the server does not require that number of threads, i.e. the server is self-pruning threads to free up resources.

If there's an uptick in load, new threads will be created, and new connections may be created to service those threads.

The connection pool for LDAP is in no way tied to the threading code.  The connections do not get closed because a backend thread is killed; they get closed because the LDAP module similarly detects that it has spare connections, and frees them to concentrate requests on the smallest possible subset.

These events occur around the same time because the start/max/min_spare/max_spare parameters are shared between the thread pool and the connection pool.  By default the LDAP module's pool section takes its values from the thread pool:

      start = ${thread[pool].start_servers}
      min   = ${thread[pool].min_spare_servers}
      max   = ${thread[pool].max_servers}
      spare = ${thread[pool].max_spare_servers}

Alan's example config:

      start_servers     = 32
      max_servers       = 32
      min_spare_servers = 0
      max_spare_servers = 32

Will indeed prevent threads and connections being pruned, but be sure to adjust your idle timeouts appropriately so you don't end up with stale connections.
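
As a rough sketch of what that adjustment could look like in the rlm_ldap pool section (v3.0.x syntax): the numeric values below are assumptions chosen for illustration, not recommendations, and should be matched to whatever idle limit your LDAP server or any firewall in between enforces.

      pool {
            # Explicit values instead of ${thread[pool]...} references,
            # so the connection pool is sized independently of the threads.
            start = 32
            min   = 32
            max   = 32
            spare = 32

            # Close connections that have been unused for this many seconds.
            # Assumed value: keep it below the LDAP server's own idle limit,
            # so the pool closes idle connections before the server does.
            idle_timeout = 60

            # Maximum age of a connection in seconds; 0 means no limit.
            lifetime = 0
      }

With min and spare pinned this way the pool should not prune connections for being spare, so idle_timeout and lifetime become the only settings that retire connections.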

Regarding the OpenLDAP API.

There's an issue with ldap_initialize where it does some global initialisation the first time it's called.  We work around that by calling it once when the server starts up before we spawn worker threads. I don't think that's the issue here.

> In multi-threaded software with a persistent connection (pool) I usually
> use a lock to ensure that ldap_initialize() and bind (simple or SASL)
> were successfully done before the connection can be used by another
> thread. That's also the reason why I usually avoid a code path with
> anon-search-without-bind because I want to provoke a connect error with
> an explicit bind right from the beginning.

If you're experiencing the same issue as us, you can just call ldap_initialize with a zero length string as the host argument.

https://github.com/FreeRADIUS/freeradius-server/blob/v4.0.x/src/lib/ldap/libfreeradius-ldap.c#L913

There are multiple other issues with libldap and lazy connections.  Almost all can be worked around, but it makes the calling code more complex.

The major issue we currently have with libldap is that, even with all the magic flags set to invoke async behaviour, TLS negotiation is still blocking.

-Arran


Arran Cudbard-Bell <a.cudbardb at freeradius.org>
FreeRADIUS Development Team

FD31 3077 42EC 7FCD 32FE 5EE2 56CF 27F9 30A8 CAA2



