Memory leak with dynamic clients

Jared Cantwell jared.cantwell at jumpcloud.com
Thu Dec 27 21:06:49 CET 2018


Hi FR devs,

We've noticed our FreeRADIUS (3.0.15 & 3.0.17) servers are leaking memory
at a pretty steady rate, and I think I've finally root caused it.  I'd like
some input to make sure I'm not missing something, and also have a few
potential solutions to discuss.

The problem seems to be that when a dynamic client's lifetime expires the
listen code deletes the client, but nothing ever frees the client and thus
its memory is leaked.

This listen.c code deletes the client if its lifetime has expired:
https://github.com/FreeRADIUS/freeradius-server/blob/release_3_0_17/src/main/listen.c#L203-L207

But the client.c code it calls only removes the client from the rbtree.  It
does not call client_free:
https://github.com/FreeRADIUS/freeradius-server/blob/release_3_0_17/src/main/client.c#L379-L395

I found this changeset from a few years ago that removed free callback from
the rbtree.  It seems that before this change, the rbtree_deletebydata in
client_delete would have called client_free indirectly.  But since this
change, I don't see anything that is ever calling client_free on deleted
clients:
https://github.com/FreeRADIUS/freeradius-server/commit/155ea052fba183b692f8fccccf091371977d2548

I think there are a couple of potential solutions:

   1. Add back the client_free callback into the rbtree_create call.  I
   have to investigate more why this would have been removed in the first
   place.
   2. Call client_free at the bottom of client_delete to ensure that
   deleted clients get freed.


Anyone have thoughts on which is a better approach?

Additionally, once this is fixed, I have additional concerns around the
client_free implementation for dynamic clients.  In particular, the use of
the bounded-size fifo deleted_clients:
https://github.com/FreeRADIUS/freeradius-server/blob/release_3_0_17/src/main/client.c#L73-L97

My concerns are:

   - The FIFO is limited in size to 1024.  Currently, if a fr_fifo_push is
   called that would exceed the size, that client is leaked since the error
   code is not checked. You could set the 1024 to something higher, but how
   high?  When would we be convinced that it won't ever leak?
   - The code seems to guarantee that a client won't be freed for 120
   seconds after the free is requested.  This is probably safe, but how do we
   know?  I think something more explicit might be nicer.  Also, keeping
   memory around unnecessarily for 120 seconds isn't ideal either.
   - Nothing reaps the deleted_clients list outside of client_free
   requests.  This is probably good enough, but really old things can just get
   stuck on this queue.

I'm wondering if this can be addressed by using talloc's reference
counting?  I feel like I saw a comment suggesting this, but I can't find it
right now.  It would require all callers of client_find to have a memory
context the returned client can be added under.  There are ~10-15 places
that would need changed.

Thanks!
Jared Cantwell


More information about the Freeradius-Devel mailing list