rlm_exec triggered hang or local weirdness?

Joe Maimon jmaimon at ttec.com
Fri Oct 28 05:25:48 CEST 2005


(long)

I have been running radiusd CVS 9/29 plus local mods for some time 
(mostly under gdb with -fxx), to see whether I can upgrade all systems 
to this or something later (I plan to re-checkpoint as soon as 9/29 is 
stable for me).

As illustrated by previous unrelated posts, this server forks a lot 
(once for every accounting packet; the NAS is configured for 5-minute 
interim updates per logged-in user) in order to implement a DDNS scheme.
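For anyone following along, the shape of that is fork-and-exec without 
waiting: the accounting section calls the exec module, the child execs 
the DDNS update helper, and the parent returns immediately and has to 
reap the child later. A rough sketch of that pattern, with a made-up 
helper path and argument list (in radiusd the real work is done by 
radius_exec_program() calling rad_fork() with exec_wait=0, as the 
backtrace below shows):

#include <unistd.h>
#include <sys/types.h>

/* Illustrative only: fire off a DDNS update helper without waiting for it.
 * The helper path and arguments are placeholders, not the real configuration. */
pid_t spawn_ddns_update(const char *user, const char *ip)
{
	pid_t pid = fork();

	if (pid == 0) {
		/* child: replace ourselves with the update helper */
		execl("/usr/local/bin/ddns-update", "ddns-update",
		      user, ip, (char *) NULL);
		_exit(127);	/* only reached if exec failed */
	}

	/* parent: returns at once; the child must be reaped later
	 * (in radiusd that is what the wait queue in threads.c is for) */
	return pid;
}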

One issue that appears to be cropping up is that the debug output and 
the server itself appear to freeze.

=====================================================
thread apply all bt shows this:

Thread 11 (Thread -1290798160 (LWP 7382)):
#0  0xb7ca438f in fork () from /lib/tls/libc.so.6
#1  0xb7f73064 in fork () from /lib/tls/libpthread.so.0
#2  0x08058a03 in rad_fork (exec_wait=0) at threads.c:1114
#3  0xb7fa8205 in radius_exec_program () from /usr/lib/freeradius/rlm_exec-1.1.0-pre0.so
#4  0xb7fa8d8c in radius_exec_program () from /usr/lib/freeradius/rlm_exec-1.1.0-pre0.so
#5  0x08054d26 in modcall (component=3, c=0x81b1188, request=0xb51033c8) at modcall.c:254
#6  0x080552d9 in call_one (component=Variable "component" is not available.) at modcall.c:287
#7  0x08054eb5 in modcall (component=3, c=0x81afa08, request=0xb51033c8) at modcall.c:342
#8  0x08053f5f in indexed_modcall (comp=3, idx=Variable "idx" is not available.) at modules.c:428
#9  0x0804f009 in rad_accounting (request=0xb51033c8) at acct.c:65
#10 0x08060f32 in rad_respond (request=0xb51033c8, fun=0x804efa0 <rad_accounting>) at request_process.c:456
#11 0x08057dba in request_handler_thread (arg=0x81b5828) at threads.c:585
#12 0xb7f6cccd in start_thread () from /lib/tls/libpthread.so.0
#13 0xb7ce3b0e in clone () from /lib/tls/libc.so.6

<the rest of the threads are at sem_wait>

Thread 1 (Thread -1212102976 (LWP 27527)):
#0  0xb7cf0d69 in pthread_setcanceltype () from /lib/tls/libc.so.6
#1  0xb7c7ac05 in malloc () from /lib/tls/libc.so.6
#2  0xb7f5a744 in rad_recv (fd=8) at radius.c:1063
#3  0x0805ec18 in proxy_socket_recv (listener=0xfffffffc, pfun=0xfffffffc, prequest=0xfffffffc) at listen.c:802
#4  0x0804d3c1 in main (argc=2, argv=0xbfabde44) at radiusd.c:665

============================================

This tentatively appears to occur with -fxx when NOT running under gdb; 
whether it also happens in normal operation is not currently known.

cvs 3/16 does not have this issue when running normally.

The only change I could detect as being obviously relevant is an added 
call to reap_children() in rad_fork().

Now, reap_children() implements some magic voodoo head/tail queue scheme 
that I haven't managed to wrap my head around yet.
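From the code quoted in the diff below, though, the idea seems to be a 
fixed-size circular buffer of child PIDs: rad_fork() records each new 
child at wait_tail, and reap_children() walks the entries from 
wait_head, waitpid()s each one with WNOHANG, and pushes the ones still 
running back onto the tail. A stripped-down sketch of that idea, with 
illustrative names and buffer size rather than the actual threads.c code:

#include <sys/types.h>
#include <sys/wait.h>
#include <pthread.h>

#define MAX_WAITERS 256		/* illustrative size */

static pid_t           wait_ring[MAX_WAITERS];
static int             wait_head, wait_tail;	/* head == tail means empty */
static pthread_mutex_t wait_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Called in the parent after fork(): remember the child for later reaping.
 * (Overflow handling omitted; the real code rad_assert()s if the ring fills.) */
void remember_child(pid_t pid)
{
	pthread_mutex_lock(&wait_mutex);
	wait_ring[wait_tail] = pid;
	wait_tail = (wait_tail + 1) % MAX_WAITERS;
	pthread_mutex_unlock(&wait_mutex);
}

/* Walk the ring once: reap children that have exited, keep the rest. */
void reap_children_sketch(void)
{
	int num;

	pthread_mutex_lock(&wait_mutex);
	for (num = (wait_tail + MAX_WAITERS - wait_head) % MAX_WAITERS;
	     num != 0; num--) {
		pid_t pid = wait_ring[wait_head];
		wait_head = (wait_head + 1) % MAX_WAITERS;

		if (waitpid(pid, NULL, WNOHANG) == 0) {
			/* still running: push it back onto the tail */
			wait_ring[wait_tail] = pid;
			wait_tail = (wait_tail + 1) % MAX_WAITERS;
		}
		/* else: exited and reaped, or no such child */
	}
	pthread_mutex_unlock(&wait_mutex);
}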

Debug output shows that radutmp (which runs before the DDNS exec module) 
is the last module to return normally before the freeze.

So I have added copious debug output. And I am having trouble 
understanding what I am seeing (nothing new there).

In reap_children I have added some calls like this:


==================================
--- /home/joe/download/radius/cvs/cvs-20050929/radiusd/src/main/threads.c      2005-09-14 21:21:14.000000000 -0400
+++ src/main/threads.c  2005-10-27 22:45:28.000000000 -0400
@@ -246,8 +248,10 @@
 {
        if (thread_pool.wait_head != thread_pool.wait_tail) {
                int num;
+               int rval;

                pthread_mutex_lock(&thread_pool.wait_mutex);
+               FUNC_LINE_FILE(" mutex locked");
                for (num = ((thread_pool.wait_tail + MAX_WAITERS) - thread_pool.wait_head) % MAX_WAITERS;
                     num != 0;
                     num--) {
@@ -259,18 +263,35 @@
                        /*
                         *      Child is still alive: move it to the tail.
                         */
-                       if (waitpid(pid, NULL, WNOHANG) == 0) {
+                       FUNC_LINE_FILE(" wait for %d", pid);
+                       if (waitpid(pid, &rval, WNOHANG) == 0) {
                                if (((thread_pool.wait_tail + 1) % MAX_WAITERS)
                                    == thread_pool.wait_head) {
                                        rad_assert(0 == 1);
                                }
+                               FUNC_LINE_FILE(" wait for %d returned val\n"
+                                               " WIFEXITED(%d)"
+                                               " WEXITSTATUS(%d)"
+                                               " WIFSIGNALED(%d)"
+                                               " WTERMSIG(%d)"
+                                               " WIFSTOPPED(%d)"
+                                               " WSTOPSIG(%d)"
+                                               , pid, WIFEXITED(rval),
+                                               WIFEXITED(rval) ? WEXITSTATUS(rval) : 0,
+                                               WIFSIGNALED(rval),
+                                               WIFSIGNALED(rval) ? WTERMSIG(rval) : 0,
+                                               WIFSTOPPED(rval),
+                                               WIFSTOPPED(rval) ? WSTOPSIG(rval) : 0
+                                               );

                                thread_pool.wait[thread_pool.wait_tail] = pid;
                                thread_pool.wait_tail++;
                                thread_pool.wait_tail %= MAX_WAITERS;
                        } /* else no child, or was already reaped */
+                       else FUNC_LINE_FILE(" wait for %d returned noval", pid);
                }
                pthread_mutex_unlock(&thread_pool.wait_mutex);
+               FUNC_LINE_FILE(" mutex unlocked");
        }
 }

===============================================================

A sample of the output looks like this:

================================================================
reap_children() threads.c:254 mutex locked
reap_children() threads.c:266 wait for 9240
reap_children() threads.c:285 wait for 9240 returned val
  WIFEXITED(0) WEXITSTATUS(0) WIFSIGNALED(1) WTERMSIG(84) WIFSTOPPED(0) WSTOPSIG(0)
reap_children() threads.c:266 wait for 9241
reap_children() threads.c:285 wait for 9241 returned val
  WIFEXITED(0) WEXITSTATUS(0) WIFSIGNALED(1) WTERMSIG(84) WIFSTOPPED(0) WSTOPSIG(0)
reap_children() threads.c:266 wait for 14565
reap_children() threads.c:285 wait for 14565 returned val
  WIFEXITED(0) WEXITSTATUS(0) WIFSIGNALED(1) WTERMSIG(84) WIFSTOPPED(0) WSTOPSIG(0)
reap_children() threads.c:266 wait for 14566
reap_children() threads.c:285 wait for 14566 returned val
  WIFEXITED(0) WEXITSTATUS(0) WIFSIGNALED(1) WTERMSIG(84) WIFSTOPPED(0) WSTOPSIG(0)
reap_children() threads.c:266 wait for 17694
reap_children() threads.c:285 wait for 17694 returned val
  WIFEXITED(0) WEXITSTATUS(0) WIFSIGNALED(1) WTERMSIG(84) WIFSTOPPED(0) WSTOPSIG(0)
reap_children() threads.c:266 wait for 19290
reap_children() threads.c:285 wait for 19290 returned val
  WIFEXITED(0) WEXITSTATUS(0) WIFSIGNALED(1) WTERMSIG(84) WIFSTOPPED(0) WSTOPSIG(0)
reap_children() threads.c:266 wait for 19291
reap_children() threads.c:285 wait for 19291 returned val
  WIFEXITED(0) WEXITSTATUS(0) WIFSIGNALED(1) WTERMSIG(84) WIFSTOPPED(0) WSTOPSIG(0)
reap_children() threads.c:266 wait for 20399
reap_children() threads.c:285 wait for 20399 returned val
  WIFEXITED(0) WEXITSTATUS(0) WIFSIGNALED(1) WTERMSIG(84) WIFSTOPPED(0) WSTOPSIG(0)
reap_children() threads.c:294 mutex unlocked
========================================================================

Question 1:

Signal 84?
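As a reference point for question 1: the WIF*/W* macros are only 
meaningful for a status word that waitpid() has actually filled in, 
which with WNOHANG means a return value equal to the child's PID (a 
return of 0 reports no state change), and Linux signal numbers only go 
up to SIGRTMAX, which is 64. A minimal sketch of the usual checking 
order, purely illustrative and not the radiusd code:

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Illustrative: poll one child non-blockingly and report how it ended.
 * The status word is only examined when waitpid() actually returned the pid. */
void poll_child(pid_t pid)
{
	int status;
	pid_t ret = waitpid(pid, &status, WNOHANG);

	if (ret == 0) {
		printf("%d: still running (status not filled in)\n", pid);
	} else if (ret < 0) {
		perror("waitpid");
	} else if (WIFEXITED(status)) {
		printf("%d: exited with code %d\n", pid, WEXITSTATUS(status));
	} else if (WIFSIGNALED(status)) {
		printf("%d: killed by signal %d\n", pid, WTERMSIG(status));
	} else if (WIFSTOPPED(status)) {
		printf("%d: stopped by signal %d\n", pid, WSTOPSIG(status));
	}
}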

Question 2:

It appears that reap_children() is waiting for thread PIDs, in addition 
to forked children? (See 9240, 9241, 20399, etc. in the process listing 
below.)

======================================================
freerad   6760 32641  6760  0    7 22:45 pts/15   00:00:00 src/main/.libs/radiusd -fxx
freerad   6760 32641  6791  0    7 22:45 pts/15   00:00:00 src/main/.libs/radiusd -fxx
freerad   6760 32641  6792  0    7 22:45 pts/15   00:00:00 src/main/.libs/radiusd -fxx
freerad   6760 32641  6793  0    7 22:45 pts/15   00:00:00 src/main/.libs/radiusd -fxx
freerad   6760 32641  6794  0    7 22:45 pts/15   00:00:00 src/main/.libs/radiusd -fxx
freerad   6760 32641  6795  0    7 22:45 pts/15   00:00:00 src/main/.libs/radiusd -fxx
freerad   6760 32641 19287  0    7 22:58 pts/15   00:00:00 src/main/.libs/radiusd -fxx
freerad   9240  6760  9240  0    1 22:48 pts/15   00:00:00 src/main/.libs/radiusd -fxx
freerad   9241  6760  9241  0    1 22:48 pts/15   00:00:00 src/main/.libs/radiusd -fxx
freerad  14565  6760 14565  0    1 22:53 pts/15   00:00:00 src/main/.libs/radiusd -fxx
freerad  14566  6760 14566  0    1 22:53 pts/15   00:00:00 src/main/.libs/radiusd -fxx
freerad  17694  6760 17694  0    1 22:56 pts/15   00:00:00 src/main/.libs/radiusd -fxx
freerad  19290  6760 19290  0    1 22:58 pts/15   00:00:00 src/main/.libs/radiusd -fxx
freerad  19291  6760 19291  0    1 22:58 pts/15   00:00:00 src/main/.libs/radiusd -fxx
freerad  20399  6760 20399  0    1 22:59 pts/15   00:00:00 src/main/.libs/radiusd -fxx
freerad  22467  6760 22467  0    1 23:01 pts/15   00:00:00 src/main/.libs/radiusd -fxx
=============================================================

How this has anything to do with my possible issue, I am not quite certain.


misc:

FUNC_LINE_FILE() is a local debug modification; it looks like this:

#define FUNC_LINE_FILE(format, ...) do {\
                printf("%s() %s:%d" format "\n",__func__,__FILE__,__LINE__, ## __VA_ARGS__);\
                } while (0);


(Yes, it's missing a mutex lock, but since I put it into libradius.h I 
couldn't easily use DEBUG2.)
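If the only goal is to keep whole lines together on stdout without 
touching the radiusd logging path, one lightweight option is the stdio 
stream lock. A possible variant, keeping the same format-string 
convention (illustrative only; flockfile()/funlockfile() are POSIX, and 
this is not something libradius.h actually provides):

#include <stdio.h>

/* One possible serialized variant: flockfile()/funlockfile() take the
 * stdio stream lock, so the whole line (and the flush) stays together
 * even when other threads are writing to stdout at the same time. */
#define FUNC_LINE_FILE(format, ...) do { \
	flockfile(stdout); \
	printf("%s() %s:%d" format "\n", \
	       __func__, __FILE__, __LINE__, ## __VA_ARGS__); \
	fflush(stdout); \
	funlockfile(stdout); \
} while (0)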

If anyone has any tips I would appreciate it.

Joe


