Memory leak with dynamic clients

Alan DeKok aland at deployingradius.com
Thu Dec 27 22:31:14 CET 2018


On Dec 27, 2018, at 3:06 PM, Jared Cantwell <jared.cantwell at jumpcloud.com> wrote:
> 
> We've noticed our FreeRADIUS (3.0.15 & 3.0.17) servers are leaking memory
> at a pretty steady rate, and I think I've finally root caused it.  I'd like
> some input to make sure I'm not missing something, and also have a few
> potential solutions to discuss.
> 
> The problem seems to be that when a dynamic client's lifetime expires the
> listen code deletes the client, but nothing ever frees the client and thus
> its memory is leaked.
> 
> This listen.c code deletes the client if its lifetime has expired:
> https://github.com/FreeRADIUS/freeradius-server/blob/release_3_0_17/src/main/listen.c#L203-L207

  That's good...

> But the client.c code it calls only removes the client from the rbtree.  It
> does not call client_free:
> https://github.com/FreeRADIUS/freeradius-server/blob/release_3_0_17/src/main/client.c#L379-L395

  That's bad...

> I found this changeset from a few years ago that removed free callback from
> the rbtree.  It seems that before this change, the rbtree_deletebydata in
> client_delete would have called client_free indirectly.  But since this
> change, I don't see anything that is ever calling client_free on deleted
> clients:
> https://github.com/FreeRADIUS/freeradius-server/commit/155ea052fba183b692f8fccccf091371977d2548

  That call was removed because you can't free a client until all requests using that client are finished.  Otherwise, if you free the client, the memory is bad, and the server crashes.

> I think there are a couple of potential solutions:
> 
>   1. Add back the client_free callback into the rbtree_create call.  I
>   have to investigate more why this would have been removed in the first
>   place.
>   2. Call client_free at the bottom of client_delete to ensure that
>   deleted clients get freed.

  Both of those will have the same issue.

  The better solution is once a client is deleted, add *another* timer for "max_request_time + 10", which actually frees the client.  At that point, the requests using the client should be finished, and it can be safely freed.

> Additionally, once this is fixed, I have additional concerns around the
> client_free implementation for dynamic clients.  In particular, the use of
> the bounded-size fifo deleted_clients:
> https://github.com/FreeRADIUS/freeradius-server/blob/release_3_0_17/src/main/client.c#L73-L97
> 
> My concerns are:
> 
>   - The FIFO is limited in size to 1024.  Currently, if a fr_fifo_push is
>   called that would exceed the size, that client is leaked since the error
>   code is not checked. You could set the 1024 to something higher, but how
>   high?  When would we be convinced that it won't ever leak?

  It's better to just check the error code, I think.

>   - The code seems to guarantee that a client won't be freed for 120
>   seconds after the free is requested.  This is probably safe, but how do we
>   know?  I think something more explicit might be nicer.  Also, keeping
>   memory around unnecessarily for 120 seconds isn't ideal either.

  We know it's (mostly) safe because max_request_time is less than 120.  After max_request_time, the request is done, and no more requests are using the client.

>   - Nothing reaps the deleted_clients list outside of client_free
>   requests.  This is probably good enough, but really old things can just get
>   stuck on this queue.

  It's better to just set a per-client timer.  We can then drop the fifo.

> I'm wondering if this can be addressed by using talloc's reference
> counting?  I feel like I saw a comment suggesting this, but I can't find it
> right now.  It would require all callers of client_find to have a memory
> context the returned client can be added under.  There are ~10-15 places
> that would need changed.

  It would be better to just have an explicit counter in the RADCLIENT data structure.  But... the clients can be accessed from multiple threads, so there are locking issues with that approach.

  It's best to just set a timer.  I'll push a fix.

  Alan DeKok.




More information about the Freeradius-Devel mailing list