FreeRADIUS can't make progress under certain load
rihad at mail.ru
Sun Sep 11 10:20:07 CEST 2011
On 09/11/2011 12:01 PM, rihad wrote:
> On 09/11/2011 11:11 AM, rihad wrote:
>> On 09/10/2011 11:23 PM, Arran Cudbard-Bell wrote:
>>>> I'm not blaming anyone. Thanks for the great software and for
>>>> sharing it with us. The great thing about open source is that I can
>>>> tweak it to my needs. I'm not saying this is the best way to get
>>>> rid of the problem. But it may be the easiest and the quickest.
>>> It's a really bad way to fix the problem. You're just masking the
>>> underlying issue doing this.
>>> You need to figure out why your backend authentication system is
>>> taking more than 5 seconds to complete a request. It's that simple.
>>> I'm suggesting lowering the max thread count to reduce the number of
>>> requests running in parallel to take load off your backend system, so
>>> it starts responding before the NAS retransmits the packet.
>>> Likely there's much more that could be done to deal with high
>>> volumes of requests, but we would need to know what modules you're
>>> using with the server, and so far you've ignored all requests for
>>> this information.
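The thread-count suggestion can be illustrated with a minimal Python sketch (not FreeRADIUS code): a fixed-size worker pool bounds how many requests can be inside the backend at the same time, no matter how many arrive. `run_requests`, `backend_delay` and the sleep-based "backend" are hypothetical stand-ins for radiusd's thread pool and the slow authentication backend; the real server is tuned through its configuration, not through code like this.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_requests(n_requests: int, max_threads: int, backend_delay: float) -> int:
    """Push n_requests through a pool of max_threads workers and return the
    peak number of requests that were inside the simulated backend at once."""
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def handle(_request_id: int) -> None:
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(backend_delay)  # hypothetical stand-in for a slow backend call
        with lock:
            in_flight -= 1

    # The worker count is the only thing bounding backend concurrency here.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        list(pool.map(handle, range(n_requests)))
    return peak


if __name__ == "__main__":
    for threads in (32, 4):
        peak = run_requests(64, threads, 0.05)
        print(f"{threads} threads -> at most {peak} requests in the backend at once")
```

Fewer workers means each backend call competes with fewer others, which is the effect Arran is after: answers come back before the NAS gives up and retransmits.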
>> We're using preprocess, rlm_perl (for AAA), acct_unique, detail,
>>> If you just want to throw new requests away once the number queued
>>> gets stupidly large, use the undocumented parameter 'max_queue_size'
>>> in the threadpool stanza.
>>> Once the server has X pending requests, it'll start
>>> throwing new ones away, relying on the NAS' retransmit behaviour to
>>> eventually get the request processed.
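The drop-when-full behaviour described for 'max_queue_size' amounts to a bounded queue with tail drop. The Python sketch below is a hypothetical model of that policy only; the class and method names are invented for illustration and do not correspond to FreeRADIUS internals.

```python
from __future__ import annotations

from collections import deque


class BoundedRequestQueue:
    """Hypothetical model of a request queue with a hard size limit:
    once it is full, new requests are discarded and the NAS is expected
    to retransmit them later."""

    def __init__(self, max_queue_size: int) -> None:
        self.max_queue_size = max_queue_size
        self.pending: deque[str] = deque()
        self.dropped = 0

    def offer(self, request: str) -> bool:
        """Queue the request, or drop it if the queue is already full."""
        if len(self.pending) >= self.max_queue_size:
            self.dropped += 1
            return False
        self.pending.append(request)
        return True

    def take(self) -> str | None:
        """Hand the oldest pending request to a worker, if there is one."""
        return self.pending.popleft() if self.pending else None
```

With `max_queue_size=3`, offering five requests accepts the first three and drops the last two. Dropping the newest arrival in this sketch leaves work already queued intact; the retransmitted copy gets another chance once workers have drained the queue.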
>> Great. First I thought of tweaking max_clients to the point where it
>> never triggered under "normal" load, but started dropping new
>> requests (and logging the fact) whenever the box couldn't cope with
>> the current load. max_queue_size may turn out to be just as useful.
> I just dropped max_requests to a mere 512. That's around 50 requests
> per second given cleanup_delay=10. Enough for the normal load as it is
> now. I'll see how it goes.
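The 50-per-second figure is a back-of-the-envelope estimate. Assuming each request keeps its slot in the server's request list for roughly cleanup_delay seconds after it is answered (processing time ignored), the sustainable request rate is about max_requests divided by cleanup_delay:

```latex
\[
  r_{\max} \approx \frac{\texttt{max\_requests}}{\texttt{cleanup\_delay}}
  = \frac{512}{10\,\mathrm{s}} \approx 51\ \text{requests/s},
  \qquad
  \frac{1024}{10\,\mathrm{s}} \approx 102\ \text{requests/s}.
\]
```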
Coincidentally, one of the NASen rebooted just now (we've been
experiencing electricity problems on one of the PSTN these days). The
server handled around 350 auth requests, which, together with their
corresponding acct requests, probably added up to max_requests=512. After
that, it started dropping new requests, as expected, thus easing the
Sun Sep 11 12:53:37 2011 : Error: Dropping request (513 is too many): from client 10.10.70.28 port 1646 - ID: 173
Sun Sep 11 12:53:37 2011 : Info: WARNING: Please check the configuration file. The value for 'max_requests' is probably set too low.
So it handled the load gracefully. This parallels the fact that
restarting radiusd also helped, for the same reason: there was no queued-up work left to do.
I might even raise the limit to 1024 and see how that goes.
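The behaviour in the log above (request 513 refused after a burst) can be modelled with a small Python sketch. It assumes, as in the estimate earlier in the thread, that a request holds its slot until cleanup_delay seconds after its reply; `RequestList` and `admit` are hypothetical names, and the log message only imitates the radiusd line quoted above.

```python
import logging

log = logging.getLogger(__name__)


class RequestList:
    """Hypothetical model (not FreeRADIUS source): every request keeps its
    slot until cleanup_delay seconds after its reply, and new requests are
    refused once max_requests slots are occupied."""

    def __init__(self, max_requests: int, cleanup_delay: float) -> None:
        self.max_requests = max_requests
        self.cleanup_delay = cleanup_delay
        self.expires_at: list[float] = []  # one expiry time per occupied slot

    def admit(self, now: float, reply_time: float = 0.0) -> bool:
        """Try to accept a request arriving at `now` and answered
        `reply_time` seconds later; return False if it is dropped."""
        # Free the slots of requests whose cleanup_delay has already passed.
        self.expires_at = [t for t in self.expires_at if t > now]
        if len(self.expires_at) >= self.max_requests:
            log.error("Dropping request (%d is too many)", len(self.expires_at) + 1)
            return False
        self.expires_at.append(now + reply_time + self.cleanup_delay)
        return True


if __name__ == "__main__":
    rl = RequestList(max_requests=512, cleanup_delay=10)
    results = [rl.admit(now=0.0) for _ in range(600)]  # a reboot-style burst
    print(results.count(True), results.count(False))
    print(rl.admit(now=11.0))  # the burst has been cleaned up by now
```

A burst of 600 simultaneous requests gets 512 accepted and 88 dropped, and the first drop reports "513 is too many"; eleven seconds later every slot has been freed and new requests are accepted again. Raising max_requests to 1024 in this model simply doubles the burst it can absorb.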
Thanks for all the useful advice!