Zombie accounting processes

Phil Mayers p.mayers at imperial.ac.uk
Mon Feb 13 21:08:22 CET 2006


George-Cristian Bîrzan wrote:
> On Mon, 2006-02-13 at 13:39 -0500, Alan DeKok wrote:
>>> waitpid(22058, NULL, WNOHANG)           = -1 ECHILD (No child processes)
>>> 22058 exists, and is another process that hung.
>>   That's definitely a bug in your OS somewhere.  If 22058 exists, and
>> is "defunct", then the parent process calling "waitpid" on it should
>> *never* get ECHILD. 
> 
> It's not the parent, from what I can tell. At which point it should do
> that, no?
> 

No, it should not do that, and that's why you are having problems.

POSIX MANDATES that any thread in a thread group can reap any child 
process, not just the thread that created the process. The Linux manpage 
for waitpid says this was fixed in the 2.4 kernel, but clearly that is 
not what you are seeing - possibly the problem lies in the pthread 
library (LinuxThreads) you're using, which is bundled with glibc. 
There's also the morass of SIGCHLD handling...
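
To check whether a given box behaves the way POSIX says it should, something like the following rough sketch may help. It is written in Python only because os.waitpid is a thin wrapper around waitpid(2); the function name reap_from_other_thread and the exit status 7 are made up for illustration and have nothing to do with the FreeRADIUS code. One thread forks a child, a different thread tries to reap it:

```python
import os
import threading


def reap_from_other_thread():
    """Fork a child in the calling thread, then reap it from a second thread.

    Returns (child_pid, result), where result holds either the reaped pid
    and exit code, or the exception raised by waitpid.
    """
    pid = os.fork()
    if pid == 0:
        # Child: exit at once with a recognizable status (7 is arbitrary).
        os._exit(7)

    result = {}

    def reaper():
        # This thread did not create the child, but it is in the same process.
        try:
            reaped, status = os.waitpid(pid, 0)  # blocks until the child exits
            result["pid"] = reaped
            if os.WIFEXITED(status):
                result["code"] = os.WEXITSTATUS(status)
        except ChildProcessError as exc:
            # ECHILD, i.e. the failure seen in the quoted strace output.
            result["error"] = exc

    worker = threading.Thread(target=reaper)
    worker.start()
    worker.join()
    return pid, result


if __name__ == "__main__":
    child, outcome = reap_from_other_thread()
    print(child, outcome)
```

If the result contains "error" rather than "pid", the threads are not being treated as one thread group for the purposes of waitpid, which would match the symptom described above.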

I would suggest trying a system with proper NPTL threading (2.6 kernel, 
recent libc compiled appropriately) which will almost certainly work. 
Upgrading your main system to that may be a little more... involved.
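
As a quick way to see which thread library a glibc system is actually using, a small sketch along these lines may be enough. It assumes glibc exposes the CS_GNU_LIBPTHREAD_VERSION configuration string (the same value that "getconf GNU_LIBPTHREAD_VERSION" prints); the function name pthread_implementation is just illustrative, and on other C libraries it simply returns None:

```python
import os


def pthread_implementation():
    """Return the thread library string reported by glibc, or None if unavailable."""
    name = "CS_GNU_LIBPTHREAD_VERSION"
    if name not in os.confstr_names:
        # Not a glibc system, or the name is not known to this Python build.
        return None
    try:
        return os.confstr(name)
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print(pthread_implementation())
```

A value starting with "NPTL" would indicate the newer threading library; a value starting with "linuxthreads" would point at the older one suspected above.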


