FATAL! Server is too busy to process requests

Mitchell, Michael J Michael.Mitchell at team.telstra.com
Thu Feb 16 17:14:52 CET 2006


Hi all,

I'm at a bit of a loss. I'm currently load testing the authentication
proxy performance of FreeRADIUS 1.0.1 in preparation for a deployment
this weekend.

Unfortunately, I'm running into this error "Error: FATAL!  Server is too
busy to process requests".

My scenario is:

An authentication request comes in, we look the username up in OpenLDAP
running on the same server, and if the user doesn't exist, we proxy the
request by setting Proxy-To-Realm using the attribute rewrite module. I
have made changes to the rlm_ldap module for local requirements, and I
am also running a couple of custom modules, so it's not pure 1.0.1. The
error still occurs when I disable all of my custom modules (except
rlm_ldap of course).

Interestingly, this error doesn't seem to occur when the OpenLDAP server
is running on a different server. However, the rate of requests that I
can push through the server is also a lot less in this circumstance
(about 25%).

Oh, and finally, this is running Solaris 9 on a V240.

From what I can tell, it doesn't seem to be related to thread
starvation, and the time it takes to reach this error seems to be
somewhat variable.

The CPU (according to prstat) doesn't need to be at 100% for this to
occur either. However, when it does occur, radiusd is typically using
all or close to all of one of the CPUs.

It also doesn't happen when I run the server with -xx. Presumably this
is because the extra output slows the server down enough that it's
not hitting whatever barrier is causing this.

To do the testing I'm using radclient, sending from multiple threads.
The number of radclient threads does seem to have a bearing, and I can
stop the error from happening by reducing the number of threads. Once
again I presume this is due to the reduced throughput of requests.


I've just worked 11 days straight and averaged at least 15 hours a day,
and at the moment I'm just too tired to trace right through the code to
see what might be causing the issue. It is 3am here after all.

Any advice or help is really appreciated at this stage. What might be
the cause of (*request)->child_pid != NO_SUCH_CHILD_PID in
request_dequeue? Is there anything I should look at, or tune, to reduce
the likelihood of this occurring?

It seems that I can also resolve the issue (at least for the same
request rate) by looping at the "select" in request_dequeue 20 times
instead of 10.

What risk does this present?
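
For anyone following along, here is a rough sketch of the kind of bounded
wait being discussed. It is not the FreeRADIUS source. The names
wait_for_release and NO_OWNER are made up for illustration, and the sleep
interval is a parameter because this post does not state what the real
code uses. It only shows the general pattern: poll a "still owned by
another thread" field a fixed number of times, sleeping via select()
between checks, and give up if it never clears.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical stand-in for the "no worker owns this request" marker. */
#define NO_OWNER 0

/*
 * Poll *owner up to max_polls times, sleeping interval_usec between checks.
 * Returns 1 if the owner field cleared in time, 0 if the wait timed out.
 * The total wait is bounded by roughly max_polls * interval_usec.
 *
 * Note: volatile only keeps the compiler from caching the read; a real
 * multi-threaded server would need proper synchronisation around the field.
 */
static int wait_for_release(const volatile int *owner, int max_polls,
                            long interval_usec)
{
    int i;

    assert(owner != NULL && max_polls > 0 && interval_usec >= 0);

    for (i = 0; i < max_polls; i++) {
        struct timeval tv;

        if (*owner == NO_OWNER)
            return 1;

        /* select() with no descriptors acts as a portable sub-second sleep. */
        tv.tv_sec = interval_usec / 1000000L;
        tv.tv_usec = interval_usec % 1000000L;
        select(0, NULL, NULL, NULL, &tv);
    }

    /* One last check after the final sleep. */
    return *owner == NO_OWNER;
}
```

In a pattern like this, going from 10 polls to 20 doubles the worst-case
time the calling thread can sit blocked before giving up. It raises the
threshold at which the timeout is hit; it does not remove whatever is
keeping the other thread from clearing the field.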

I then get errors like:

Fri Feb 17 03:10:54 2006 : Error: Dropping conflicting packet from
client dbst1:63628 - ID: 198 due to unfinished request 44357

Which is better (to me) than the server stopping. ;-)
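
As background on that message, RFC 2865 describes detecting duplicate
requests by the client source IP address, source UDP port and Identifier.
The sketch below is only an assumption about how a server could tell a
retransmission from a "conflicting" packet (same key, different
authenticator, while the earlier request is still being worked on). The
struct, enum and function names are hypothetical and this is not the
FreeRADIUS implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define AUTH_VECTOR_LEN 16   /* RADIUS authenticator length, per RFC 2865 */

/* Hypothetical, simplified view of a request the server is tracking. */
struct tracked_request {
    uint32_t src_ip;
    uint16_t src_port;
    uint8_t  id;
    uint8_t  vector[AUTH_VECTOR_LEN];
    int      in_progress;   /* non-zero while a worker is still handling it */
};

enum packet_verdict { PKT_NEW, PKT_DUPLICATE, PKT_CONFLICT };

/* Compare an incoming packet against an already tracked request. */
static enum packet_verdict classify_packet(const struct tracked_request *old,
                                           const struct tracked_request *incoming)
{
    assert(old != NULL && incoming != NULL);

    /* Different source address, port or ID: unrelated request. */
    if (old->src_ip != incoming->src_ip ||
        old->src_port != incoming->src_port ||
        old->id != incoming->id)
        return PKT_NEW;

    /* Same key and same authenticator: a retransmission. */
    if (memcmp(old->vector, incoming->vector, AUTH_VECTOR_LEN) == 0)
        return PKT_DUPLICATE;

    /* Same key, new authenticator: the ID was reused. That is a conflict
     * only while the earlier request is still unfinished. */
    return old->in_progress ? PKT_CONFLICT : PKT_NEW;
}
```

Under this reading, a load generator that cycles through its 256 IDs per
source port faster than the server finishes requests would be expected to
trigger the conflict case.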

Thank you kindly for taking the time to read this email. Sorry it was so
long-winded. Hopefully you will be able to offer some advice!

kind regards,
Mike



