3.0.11 - all threads blocked in "linelog"

Chaigneau, Nicolas nicolas.chaigneau at capgemini.com
Thu Jun 23 10:40:57 CEST 2016


Hello again,



We've not rolled back yet.
The issue has occurred a second time, but nothing more since then.



I'm currently trying to reproduce the issue in our test lab with FreeRADIUS 3.0.8 (so that I can then check it no longer occurs with the 3.0.x HEAD), but so far without success,
even under heavy load with linelog writing to many different files.
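
To make "heavy load on many files" concrete, here is a minimal sketch of that access pattern outside FreeRADIUS. It is only an illustration under stated assumptions: it assumes flock-style advisory locks, which may not match what linelog does internally, and the names `write_line_locked` and `run_load` are hypothetical. It is not a known reproducer for the hang.

```python
import fcntl
import os
import tempfile
import threading


def write_line_locked(path, line):
    """Append one line to path while holding an exclusive advisory lock."""
    with open(path, "a") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            handle.write(line + "\n")
            handle.flush()
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def run_load(directory, n_threads=8, n_files=16, writes_per_thread=200):
    """Have several threads append to a shared pool of files.

    Returns the total number of lines found in the files afterwards.
    """
    paths = [os.path.join(directory, "linelog-%d.log" % i) for i in range(n_files)]

    def worker(worker_id):
        for i in range(writes_per_thread):
            # Rotate over the files so that threads regularly collide on one.
            target = paths[(worker_id + i) % n_files]
            write_line_locked(target, "worker=%d request=%d" % (worker_id, i))

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    total = 0
    for path in paths:
        with open(path) as handle:
            total += sum(1 for _ in handle)
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_load(tmp))
```

If every thread returns, the line count equals `n_threads * writes_per_thread`; a run that never finishes would point at a worker stuck waiting for a lock.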

Do you have a test case that triggers the issue and that you could share?



I noticed that the Changelog entry was added on February 18, 2016.
There are several commits from the day before. I assume one of them fixed the issue, but I'm not sure which. Could you tell me which one I should look at?



Thanks for your help.

Regards,
Nicolas.


> On Jun 22, 2016, at 9:34 AM, Chaigneau, Nicolas <nicolas.chaigneau at capgemini.com> wrote:
> > Yesterday we installed FreeRADIUS version 3.0.11 in production (upgraded from 3.0.8).
> > 
> > Today we've noticed that the server got blocked.
> > All the threads got blocked on module "linelog", which we can see in the logs:
> > 
> > Wed Jun 22 11:57:11 2016 : Error: (1774170) Ignoring duplicate packet from client *** port 21687 - ID: 186 due to unfinished request in component post-auth module linelog
> 
>   That issue was fixed in the v3.0.x branch already.
> 
> > So I'm suspecting this is related to FreeRADIUS.
> 
>   Yes.
> 
> > I've noticed the following commit related to locking (which is configurable for detail, but not linelog):
> > 
> > https://github.com/FreeRADIUS/freeradius-server/commit/dd2a06aa6ba8d6b819712a5008213d3d28375158
> > 
> > Note that I didn't see anything wrong with the code, just noticed the "locking" part which made me suspicious... maybe I'm wrong, but I'd like to get your opinion.
> 
>   It's related.
> 
> > Could the locking be responsible for the behaviour we've observed?
> > Should we patch our linelog to force "locking" to false?
> 
>   That may help.
> 
> > Also, I think FreeRADIUS threads should never get blocked forever when trying to acquire a lock. (if that's what is happening)
> 
>   While I agree, the problem is that it's difficult to know whether a thread is blocked due to a lock, or just busy.  And you can't just signal a thread which is in a lock, the code will just retry the lock.  And you can't cancel the thread, because it then leaks all memory which the thread was using.
> 
>   You can't just say "software should work perfectly no matter what".  Things go wrong, and sometimes they just can't be fixed.
> 
>   Alan DeKok.
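
On the point about threads blocking forever on a lock: one way to bound the wait is to poll a non-blocking lock until a deadline instead of calling a blocking lock. The following is a minimal sketch, assuming flock-style advisory locks. It is not how FreeRADIUS handles locking, and `lock_with_deadline` is a hypothetical helper introduced only for illustration.

```python
import errno
import fcntl
import time


def lock_with_deadline(fd, timeout_s, poll_s=0.01):
    """Try to take an exclusive flock on fd, giving up after timeout_s seconds.

    Returns True if the lock was acquired, False if the deadline passed first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # LOCK_NB makes flock fail immediately instead of blocking.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True
        except OSError as exc:
            # Only "lock is held by someone else" is expected here.
            if exc.errno not in (errno.EAGAIN, errno.EWOULDBLOCK, errno.EACCES):
                raise
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

The trade-off is that the caller then has to decide what to do when the lock is not obtained (for example, skip the log line and report an error), and polling adds latency compared with a blocking lock.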

