FreeRADIUS can't make progress under certain load

rihad rihad at mail.ru
Sat Sep 10 17:49:21 CEST 2011


Hi, sometimes when a NAS with many users reboots, FreeRADIUS is unable to 
cope with the number of incoming requests:

Sat Sep 10 13:23:16 2011 : Auth: Login OK: [5702018] (from client 10.10.70.100 port 0)
Sat Sep 10 13:23:16 2011 : Error: Received conflicting packet from client 10.10.70.98 port 1645 - ID: 131 due to unfinished request 2814. Giving up on old request.
Sat Sep 10 13:23:16 2011 : Auth: Login OK: [5026706] (from client 10.10.70.38 port 0)
Sat Sep 10 13:23:16 2011 : Error: Received conflicting packet from client 10.10.70.93 port 1646 - ID: 117 due to unfinished request 1036. Giving up on old request.
Sat Sep 10 13:23:16 2011 : Auth: Login OK: [4925140] (from client 10.10.70.93 port 0)
Sat Sep 10 13:23:16 2011 : Error: Received conflicting packet from client 10.10.70.60 port 1645 - ID: 109 due to unfinished request 3370. Giving up on old request.
Sat Sep 10 13:23:16 2011 : Auth: Login OK: [4977364] (from client 10.10.70.38 port 0)
Sat Sep 10 13:23:16 2011 : Error: Received conflicting packet from client 10.10.70.93 port 1646 - ID: 118 due to unfinished request 1040. Giving up on old request.
Sat Sep 10 13:23:16 2011 : Error: Received conflicting packet from client 10.10.70.60 port 1645 - ID: 110 due to unfinished request 3373. Giving up on old request.
Sat Sep 10 13:23:16 2011 : Auth: Login OK: [4529464] (from client 10.10.70.28 port 0)
Sat Sep 10 13:23:16 2011 : Error: Received conflicting packet from client 10.10.70.98 port 1645 - ID: 132 due to unfinished request 2829. Giving up on old request.

and so on, ad infinitum. The duplicate requests come from PPPoE clients after 
they fail to receive a response within 5 seconds or so. The problem can 
then be solved by restarting the daemon once or twice and watching the 
errors go away. The fact that dropping all current work helped gave me 
this idea: wouldn't it be nice if FreeRADIUS dropped both the old _and_ 
the new request at the time it logged that "giving up" message? That 
would at least allow it to make progress. BTW, I'm not sure why, but 
under comparable workloads openradius does not exhibit this problem.
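
To make the "drop both" idea concrete, here is a minimal standalone sketch in C. It is not FreeRADIUS code: the table, the struct, and the names on_packet() and on_finished() are all made up for illustration, and it treats any packet whose (client, port, ID) matches an unfinished request as a conflict, ignoring the distinction the real server makes between retransmits and conflicting packets.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 8   /* tiny table, for illustration only */

/* Hypothetical record of an unfinished request, keyed the same way
 * the log messages identify a packet: client, source port, RADIUS ID. */
struct pending {
    bool     in_use;
    uint32_t client_ip;
    uint16_t src_port;
    uint8_t  id;
};

enum verdict { ACCEPTED, DROPPED_BOTH, TABLE_FULL };

static struct pending table[MAX_PENDING];

/* Called for every incoming packet.  If an unfinished request with the
 * same key exists, give up on it AND discard the new packet, so the
 * server spends no further work on either of them. */
enum verdict on_packet(uint32_t client_ip, uint16_t src_port, uint8_t id)
{
    struct pending *free_slot = NULL;

    for (size_t i = 0; i < MAX_PENDING; i++) {
        struct pending *p = &table[i];

        if (!p->in_use) {
            if (free_slot == NULL)
                free_slot = p;      /* remember the first free slot */
            continue;
        }
        if (p->client_ip == client_ip &&
            p->src_port == src_port && p->id == id) {
            p->in_use = false;      /* give up on the old request */
            return DROPPED_BOTH;    /* ...and do not queue the new one */
        }
    }

    if (free_slot == NULL)
        return TABLE_FULL;

    free_slot->in_use    = true;
    free_slot->client_ip = client_ip;
    free_slot->src_port  = src_port;
    free_slot->id        = id;
    return ACCEPTED;
}

/* Called when a request has been answered; frees its slot. */
void on_finished(uint32_t client_ip, uint16_t src_port, uint8_t id)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        struct pending *p = &table[i];

        if (p->in_use && p->client_ip == client_ip &&
            p->src_port == src_port && p->id == id)
            p->in_use = false;
    }
}
```

With this policy the client's next retry (a third packet with the same key) finds an empty slot and is accepted as a fresh request, which is the "progress" I am after.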

I looked in the code, namely in src/main/event.c:


         /*
          *      FUTURE: Add checks for system load.  If the system is
          *      busy, start dropping requests...
          *
          *      We can probably keep some statistics ourselves...  if
          *      there are more requests coming in than we can handle,
          *      start dropping some.
          */

Surely the time has come for us :) Here's my attempt to drop them both when 
receiving a duplicate:

--- freeradius-server-2.1.11/src/main/event.c   2011-06-20 19:57:14.000000000 +0500
+++ event.c     2011-09-10 20:44:34.000000000 +0500
@@ -2894,7 +2894,7 @@

                         received_conflicting_request(request, client);
                         request = NULL;
-                       break;
+                       return 0;

                 case REQUEST_REJECT_DELAY:
                 case REQUEST_CLEANUP_DELAY:


Any thoughts?


