reaped processes still timing out in rad_waitpid
Louis Munro
lmunro at inverse.ca
Tue Oct 28 22:18:55 CET 2014
Hello,
I have hit upon a case where some ntlm_auth processes return (and write the NT_KEY to the connecting pipe), yet FreeRADIUS still reports that they failed and denies authentication (this is on 2.2.5).
This shows up in the logs as follows:
Tue Oct 28 11:10:15 2014 : Auth: Login incorrect (mschap: External script says NT_KEY: 4BCE6CA72058BA7EE500D1A68A8771C0): [tstRad9] (from client 155.98.204.47 port 0 cli 02-00-00-00-00-01 via TLS tunnel)
Since this is actually the output of a valid, successful authentication, the exit code appears to be the real issue.
That exit code is either the process's own exit status, or 2 if rad_waitpid times out while waiting for the child.
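To make the two outcomes concrete, here is a minimal sketch, assuming a caller that polls the child itself with `waitpid(..., WNOHANG)`: it returns the child's own exit code, or 2 if the child has not finished within the polling budget. `wait_child_bounded` and `TIMEOUT_EXIT_CODE` are hypothetical names, and this is not FreeRADIUS's rad_waitpid, which in the threaded server involves reap_children and the thread_pool.waiters table.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical constant mirroring the "2 on timeout" behaviour. */
#define TIMEOUT_EXIT_CODE 2

/*
 * Poll a child up to `tries` times, `interval_ms` apart.
 * Returns the child's exit code if it exited normally,
 * TIMEOUT_EXIT_CODE if it never finished in time, or -1 on error
 * (including ECHILD when someone else already reaped it) or if the
 * child was killed by a signal.
 */
int wait_child_bounded(pid_t pid, int tries, long interval_ms)
{
    struct timespec ts = { interval_ms / 1000, (interval_ms % 1000) * 1000000L };

    assert(pid > 0 && tries > 0 && interval_ms >= 0);

    for (int i = 0; i < tries; i++) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);

        if (r == pid)                   /* child has exited */
            return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        if (r < 0)                      /* error, e.g. ECHILD */
            return -1;
        nanosleep(&ts, NULL);           /* r == 0: still running */
    }
    return TIMEOUT_EXIT_CODE;
}
```

With this shape, a caller cannot tell a program that genuinely exits with 2 apart from a timeout; only the exit code reaches it.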
After adding some debugging statements and recompiling, I found that there were cases where reap_children would reap a child process whose pid was not found in thread_pool.waiters. This only happened under a significant number of authentications per second, and even then not consistently.
Some head scratching ensued, and a colleague then suggested there may be a race condition between rad_fork (where it calls fr_hash_table_insert) and reap_children (where it calls fr_hash_table_finddata): if a child exits and is reaped before rad_fork has inserted its pid, reap_children finds no entry and the exit status is lost, so rad_waitpid keeps waiting until it times out.
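The suspected race can be replayed deterministically in a small, single-threaded model. This is a hypothetical sketch, not FreeRADIUS code: a fixed array stands in for thread_pool.waiters, `register_child()` for the insert done in rad_fork, `collect_exit()` for the lookup in reap_children, and `poll_exit()` for one iteration of the wait loop. Calling the functions in different orders reproduces each interleaving.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAITERS 16

/* Hypothetical stand-in for one entry of thread_pool.waiters. */
typedef struct {
    int  pid;
    bool used;
    bool exited;
    int  status;
} waiter_t;

static waiter_t waiters[MAX_WAITERS];

/* Linear lookup standing in for fr_hash_table_finddata(). */
static waiter_t *find_waiter(int pid)
{
    for (size_t i = 0; i < MAX_WAITERS; i++) {
        if (waiters[i].used && waiters[i].pid == pid) return &waiters[i];
    }
    return NULL;
}

/* Forking side, after fork() has returned in the parent. */
bool register_child(int pid)
{
    assert(pid > 0);
    if (find_waiter(pid)) return false;         /* already present */
    for (size_t i = 0; i < MAX_WAITERS; i++) {
        if (!waiters[i].used) {
            waiters[i] = (waiter_t){ .pid = pid, .used = true };
            return true;
        }
    }
    return false;                               /* table full */
}

/* Reaper side, after waitpid() has returned an exited child. */
void collect_exit(int pid, int status)
{
    waiter_t *w = find_waiter(pid);
    if (!w) return;         /* unknown pid: the exit status is dropped */
    w->exited = true;
    w->status = status;
}

/* Waiting side, one poll: true once the exit has been recorded. */
bool poll_exit(int pid, int *status)
{
    waiter_t *w = find_waiter(pid);
    if (!w || !w->exited) return false;
    *status = w->status;
    return true;
}
```

If `register_child(pid)` runs before `collect_exit(pid, status)`, `poll_exit()` sees the exit. If the reaper gets there first, the status is discarded and `poll_exit()` keeps returning false, so the waiter can only give up on its timeout.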
So I wrote a quick patch, which I submit for your consideration. It adds the pid to the hash table if it is not found.
In our (admittedly limited) tests this has fixed the issue. No more ntlm_auth waiting ten seconds before timing out.
I don't claim this patch is perfect. For one thing, it causes rad_fork to log an error ("Failed to store PID, creating what will be a zombie process") whenever reap_children has already added the pid to the hash table.
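The idea behind the patch can be illustrated in a similar toy model (the attached patch itself is not reproduced here): when the reaper sees an unknown pid, it creates the entry instead of discarding the status. All names are hypothetical. The later insert from the forking side then hits a duplicate and fails, which corresponds to the error rad_fork logs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAITERS 16

/* Hypothetical stand-in for one entry of thread_pool.waiters. */
typedef struct {
    int  pid;
    bool used;
    bool exited;
    int  status;
} waiter_t;

static waiter_t waiters[MAX_WAITERS];

/* Linear lookup standing in for fr_hash_table_finddata(). */
static waiter_t *find_waiter(int pid)
{
    for (size_t i = 0; i < MAX_WAITERS; i++) {
        if (waiters[i].used && waiters[i].pid == pid) return &waiters[i];
    }
    return NULL;
}

/* Claim a free slot; NULL if the table is full. */
static waiter_t *insert_waiter(int pid)
{
    for (size_t i = 0; i < MAX_WAITERS; i++) {
        if (!waiters[i].used) {
            waiters[i] = (waiter_t){ .pid = pid, .used = true };
            return &waiters[i];
        }
    }
    return NULL;
}

/* Forking side: fails on a duplicate, i.e. when the reaper got there first. */
bool register_child(int pid)
{
    assert(pid > 0);
    if (find_waiter(pid)) return false;
    return insert_waiter(pid) != NULL;
}

/* Reaper side, patched: create the entry instead of dropping the status. */
void collect_exit(int pid, int status)
{
    waiter_t *w = find_waiter(pid);
    if (!w) w = insert_waiter(pid);
    if (!w) return;                 /* table full: status is lost */
    w->exited = true;
    w->status = status;
}

/* Waiting side, one poll: true once an exit has been recorded. */
bool poll_exit(int pid, int *status)
{
    waiter_t *w = find_waiter(pid);
    if (!w || !w->exited) return false;
    *status = w->status;
    return true;
}
```

In this sketch, `collect_exit(200, 3)` followed by `register_child(200)` makes `register_child()` return false (the duplicate), while `poll_exit()` still reports the recorded status. One cost of the approach, at least in this model, is that entries created for pids nobody waits on are never removed.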
There may be better ways to do this, perhaps simply with better locking in rad_fork. This is only meant as a fix while the issue is discussed and improved upon.
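One possible shape of the "better locking" alternative, sketched under assumptions: if the forking code held the same mutex the reaper takes around `waitpid()` for the whole fork-and-insert sequence, the reaper could never see a child before it is registered. This is a hypothetical illustration with made-up names (`locked_fork`, `reap_all`, `wait_registered`) and a fixed array instead of a hash table; unlike rad_waitpid it returns -1, not 2, on timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define MAX_WAITERS 16

typedef struct { pid_t pid; bool used, exited; int status; } waiter_t;

static waiter_t waiters[MAX_WAITERS];
static pthread_mutex_t wait_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Caller must hold wait_mutex. */
static waiter_t *find_locked(pid_t pid)
{
    for (size_t i = 0; i < MAX_WAITERS; i++)
        if (waiters[i].used && waiters[i].pid == pid) return &waiters[i];
    return NULL;
}

/* Fork and register under the reaper's mutex, so the reaper cannot
 * call waitpid() between fork() and the insert. */
pid_t locked_fork(void)
{
    pthread_mutex_lock(&wait_mutex);
    pid_t pid = fork();
    if (pid > 0) {
        for (size_t i = 0; i < MAX_WAITERS; i++) {
            if (!waiters[i].used) {     /* table full: left unregistered */
                waiters[i] = (waiter_t){ .pid = pid, .used = true };
                break;
            }
        }
    }
    /* The child's copy stays locked; it must exec or _exit right away. */
    if (pid != 0) pthread_mutex_unlock(&wait_mutex);
    return pid;
}

/* Reap every exited child and record its status if it is registered. */
void reap_all(void)
{
    pthread_mutex_lock(&wait_mutex);
    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, WNOHANG);
        if (pid <= 0) break;
        waiter_t *w = find_locked(pid);
        if (w) { w->exited = true; w->status = status; }
    }
    pthread_mutex_unlock(&wait_mutex);
}

/* Poll up to `tries` times, 10 ms apart. Returns the exit code, or -1
 * on timeout or abnormal termination. */
int wait_registered(pid_t pid, int tries)
{
    struct timespec ts = { 0, 10L * 1000000L };

    assert(pid > 0 && tries > 0);
    for (int i = 0; i < tries; i++) {
        reap_all();

        pthread_mutex_lock(&wait_mutex);
        waiter_t *w = find_locked(pid);
        bool done = (w != NULL && w->exited);
        int status = done ? w->status : 0;
        if (done) w->used = false;      /* drop the entry */
        pthread_mutex_unlock(&wait_mutex);

        if (done) return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        nanosleep(&ts, NULL);
    }
    return -1;                          /* timed out */
}
```

The trade-off is that the reaper (and any other forking thread) blocks for the duration of `fork()`, and the child inherits a locked copy of the mutex, so it should only exec or exit.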
Regards,
--
Louis Munro
lmunro at inverse.ca :: www.inverse.ca
+1.514.447.4918 x125 :: +1 (866) 353-6153 x125
Inverse inc. :: Leaders behind SOGo (www.sogo.nu) and PacketFence (www.packetfence.org)
Attached patch: fr_225.1_debug.patch (1336 bytes)
URL: <http://lists.freeradius.org/pipermail/freeradius-users/attachments/20141028/3689f877/attachment.obj>
More information about the Freeradius-Users mailing list