bin/164526: kill(1) can not kill process despite on -KILL

Коньков Евгений kes-kes at yandex.ru
Wed Feb 1 23:16:39 CET 2012


Hello, Jilles.

You wrote on 28 January 2012, 20:24:07:

>> [stuck process cannot be killed, system hangs when reboot is
>> attempted]

JT> A signal cannot forcibly kill a process that is stuck in the kernel.
JT> Allowing this would put the integrity of the kernel data structures at
JT> risk and likely cause hangs, data corruption or panics later on.

JT> If a process is stuck in the kernel for a long time, this can be things
JT> like broken hardware, a non-responsive NFS server or a bug.

JT> A state 'T' (stopped) probably means the process is multi-threaded and
JT> is trying to suspend but one or more threads will not cooperate
JT> (non-interruptible sleep or running in the kernel).

JT> Useful commands to obtain more information (supposing pid is 45471):

JT> ps Hl45471
JT> procstat -k 45471

JT> Of course, this does not help if you already rebooted.

It happened again. The bug is repeatable:
1. Run radiusd + mod_perl + example.pl (which connects to Firebird) +
Firebird
2. Restart Firebird
3. Try to restart radiusd
4. The radiusd process falls into the STOP state

# ps awx | grep radi
 9438  ??  TLs     5:10.12 /usr/local/sbin/radiusd
27603   2  S+      0:00.00 grep radi
# procstat -k 9438
  PID    TID COMM             TDNAME           KSTACK
 9438 100080 radiusd          -                mi_switch sleepq_switch sleepq_wait _sx_xlock_hard _sx_xlock _vm_map_lock_upgrade vm_map_lookup vm_fault_hold vm_fault trap_pfault trap calltrap
 9438 100195 radiusd          -                mi_switch sleepq_switch sleepq_wait __lockmgr_args ffs_lock VOP_LOCK1_APV _vn_lock vm_object_deallocate unlock_and_deallocate vm_fault_hold vm_fault trap_pfault trap calltrap
 9438 101144 radiusd          -                mi_switch thread_suspend_switch thread_single exit1 sigexit postsig ast doreti_ast
# ps wHl9438
  UID   PID  PPID CPU PRI NI    VSZ    RSS MWCHAN STAT  TT     TIME COMMAND
  133  9438     1   0  20  0 351124 322000 user m TLs   ??  0:03.65 /usr/local/sbin/radiusd
  133  9438     1   0  20  0 351124 322000 ufs    TLs   ??  0:00.00 /usr/local/sbin/radiusd
  133  9438     1   0  20  0 351124 322000 -      TLs   ??  0:05.28 /usr/local/sbin/radiusd
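
To see at a glance which function each sleeping thread is blocked in, the procstat output above could be run through a small filter. This is only a sketch: blocked_threads is a hypothetical helper name, not an existing tool, and it assumes the column layout shown above (PID, TID, COMM, TDNAME, then the kernel stack frames).

```shell
#!/bin/sh
# Hypothetical helper: reads `procstat -k <pid>` output on stdin and, for
# every thread whose kernel stack contains sleepq_wait, prints the TID and
# the frame listed right after sleepq_wait (the caller that went to sleep).
# Assumes the stack frames start at the fifth whitespace-separated column.
blocked_threads() {
    awk '
        $1 == "PID" { next }          # skip the header line
        {
            for (i = 5; i < NF; i++) {
                if ($i == "sleepq_wait") {
                    print $2, $(i + 1)
                    break
                }
            }
        }
    '
}

# Intended use on the affected machine (pid taken from the output above):
#   procstat -k 9438 | blocked_threads
```

Threads that are not in sleepq_wait, such as the one sitting in thread_single above, are simply not listed.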

# top
last pid: 28497;  load averages:  0.56,  2.34,  9.37                                                                    up 0+10:23:14  00:12:5
162 processes: 1 running, 158 sleeping, 3 stopped
CPU:  1.9% user,  0.0% nice,  1.9% system,  5.3% interrupt, 90.8% idle
Mem: 525M Active, 1259M Inact, 182M Wired, 41M Cache, 112M Buf, 1890M Free
Swap: 4096M Total, 4096M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 6893 root          1  26    0 15392K  5580K select  0  21:17  6.10% snmpd
75797 bind          7  20    0   100M 77280K kqread  2   4:27  0.00% named
 5553 root          7  20    0 53544K 39832K select  1   0:19  0.00% mpd5
77411 dhcpd         1  20    0 15032K  5360K select  3   0:18  0.00% dhcpd
 3605 root          1  20    0 10460K  4004K select  3   0:11  0.00% zebra
 5316 root          1  20    0  9616K  1244K select  1   0:10  0.00% syslogd
 9438 freeradius    3  20    0   343M   314M STOP    0   0:09  0.00% radiusd
80843 mysql        26  20    0   402M   333M sbwait  0   0:05  0.00% mysqld
 3611 root          1  20    0 14660K  5348K select  2   0:05  0.00% bgpd
80396 www           1  20    0 37908K 22876K lockf   1   0:01  0.00% httpd
26278 root          1  20    0 33812K 15608K select  2   0:01  0.00% httpd
10559 www           1  20    0 42004K 26768K lockf   1   0:01  0.00% httpd

If I can supply any other useful debug info, please answer as soon as you
can; I cannot wait too long. Thank you.


-- 
Best regards,
 Коньков                          mailto:kes-kes at yandex.ru



