radiusd deadlock on recvfrom on port 1814

Ryan Melendez rmelendez at wayport.net
Wed Oct 31 15:05:52 CET 2007


On Wed, 2007-10-31 at 08:13 +0100, Alan DeKok wrote:
> Ryan Melendez wrote:
> > recvfrom() blocks on datagram sockets just like any other type of socket
> > unless SO_RCVTIMEO or O_NONBLOCK is set on it (in which case you
> > would receive an error).
> 
>   Hmm... I guess I hadn't run into that before, because select() never
> lied about data being available.
> 
>   The simplest solution on your system is to set O_NONBLOCK on the
> sockets.  But that is just a work-around for the kernel bug (i.e. race
> condition).  If data is ready on a socket, it means that data is
> ready... blocking on the recvfrom() after telling the application that
> data is ready is not very nice.

I'm not positive that select() is lying about data being available. It
could be that there is data when select() is called, but _something_ out
of line grabs it before recvfrom() can get to it.  The only time I've
run into this in the past (not with freeradius) was when some flavor of
read was called on the socket outside the select loop (bad programming).
I can't see anywhere this is happening in freeradius.

Again, this only started happening when I began running two radiusd
processes on different interfaces on a multihomed system.  I also have
radrelay binding to one interface and replicating acct packets to the
other process.

I suspect you are correct that there is some race condition in the
kernel, possibly involving pthreads.  I'm going to continue investigating;
I'll make the socket non-blocking only as a last resort.
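
For reference, the non-blocking work-around would look roughly like the
sketch below.  This is not freeradius code; set_nonblocking() and
recv_ready() are hypothetical helper names made up for illustration, and
the sketch assumes a POSIX system.  The idea is that if the datagram is
gone by the time recvfrom() runs, the call fails with EAGAIN instead of
hanging, and the caller just goes back to select().

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Hypothetical helper: add O_NONBLOCK to fd, keeping its other flags.
 * Returns 0 on success, -1 on error (errno set by fcntl). */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) ? -1 : 0;
}

/* Hypothetical helper: wait up to timeout_ms for fd to become readable,
 * then try to read one datagram.
 * Returns 1 if a datagram was read (*out_len holds its length),
 *         0 if nothing could be read (timeout, signal, or the datagram
 *           disappeared between select() and recvfrom()),
 *        -1 on any other error. */
int recv_ready(int fd, void *buf, size_t len, int timeout_ms, size_t *out_len)
{
    fd_set rd;
    struct timeval tv;
    ssize_t got;
    int n;

    FD_ZERO(&rd);
    FD_SET(fd, &rd);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    n = select(fd + 1, &rd, NULL, NULL, &tv);
    if (n < 0)
        return (errno == EINTR) ? 0 : -1;
    if (n == 0)
        return 0;                       /* timed out, nothing readable */

    /* On a non-blocking socket this returns immediately either way. */
    got = recvfrom(fd, buf, len, 0, NULL, NULL);
    if (got < 0) {
        if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
            return 0;                   /* select() said ready, but no data */
        return -1;
    }
    *out_len = (size_t)got;
    return 1;
}
```

With this, a return value of 0 from recv_ready() would simply mean "try
again on the next pass through the loop", so a datagram that vanishes
after select() returns can no longer stall the whole process.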

If anyone has experienced this problem before, or has any suggestions,
please let me know.

Thanks,
Ryan

> 
>   Alan DeKok.
> -
> List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html


