FreeRADIUS can't make progress under certain load
rihad at mail.ru
Sun Sep 11 15:14:25 CEST 2011
On 09/11/2011 04:46 PM, Arran Cudbard-Bell wrote:
> On 11 Sep 2011, at 12:51, rihad wrote:
>> On 09/11/2011 01:58 PM, Alan DeKok wrote:
>>> rihad wrote:
>>>> We're using preprocess, rlm_perl (for AAA),
>>> I think it's abundantly clear you don't understand what you're doing.
>>> Perl doesn't do AAA. Perl is a programming language.
>> In case I wasn't clear enough: Perl scripts servicing AAA requests.
>>> What I *suspect* you're doing is using Perl to connect to a DB.
>>> (Notice how I keep mentioning DB, and you keep ignoring it? Maybe it's
>>> Your Perl script is breaking the server. Fix it.
>> I know that. The auth & billing software we're using is admittedly slow. But see how easy it was to lower max_requests and allow FreeRADIUS to make progress on its own during load spikes (like when a NAS reboots). PPPoE clients (most of which are ADSL modems) retry auth anyway. A note in radiusd.conf that max_requests shouldn't be set higher than the system can process within cleanup_delay seconds might save some poor soul their spare time in the future.
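The poster's rule of thumb can be written as a one-line calculation: the ceiling for max_requests is roughly the backend's sustained throughput multiplied by cleanup_delay. This is a minimal sketch of that heuristic only (Arran disputes its general usefulness below); the function name and the throughput figure are hypothetical, and the throughput would have to be measured on your own Perl/billing backend.

```python
import math


def max_requests_ceiling(requests_per_second: float, cleanup_delay: int) -> int:
    """Heuristic from the post above (hypothetical helper, not a FreeRADIUS API):
    keep max_requests no higher than the number of requests the backend can
    finish within cleanup_delay seconds."""
    if requests_per_second <= 0 or cleanup_delay <= 0:
        raise ValueError("throughput and cleanup_delay must be positive")
    # Round down so the ceiling never exceeds what the backend can clear.
    return math.floor(requests_per_second * cleanup_delay)


# Hypothetical figures: a backend finishing about 40 requests/s with
# cleanup_delay = 5 gives a ceiling of 200 tracked requests.
print(max_requests_ceiling(40, 5))
```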
> Cleanup delay just controls the lifetime of the request cache... If it is affecting how the server is performing, that points to your NAS re-using IDs way too quickly. It may be that its pool of src ports is too small, or it only uses a single source port. Either way it's extremely specific to your case, so not really useful advice for a wider audience.
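Why a small source-port pool matters: the RADIUS Identifier is a single byte, so each source port offers only 256 IDs. Assuming (for illustration only) that the NAS hands out IDs in sequence at a steady rate, the time before an ID is reused is 256 × ports / rate. If the server is still working on a request when its ID comes round again, the two collide. The helper below is hypothetical and only does that arithmetic.

```python
RADIUS_ID_SPACE = 256  # the RADIUS Identifier field is one octet


def id_wrap_seconds(requests_per_second: float, source_ports: int = 1) -> float:
    """Seconds until a NAS cycling IDs sequentially reuses the same
    (src port, ID) pair. Sequential, evenly spaced IDs are an assumption."""
    if requests_per_second <= 0 or source_ports <= 0:
        raise ValueError("rate and source_ports must be positive")
    return RADIUS_ID_SPACE * source_ports / requests_per_second


# At 100 requests/s over a single source port, IDs wrap after 2.56 s;
# a backend slower than that will see "new" requests reusing old IDs.
print(id_wrap_seconds(100))
print(id_wrap_seconds(100, source_ports=4))
```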
>> Let me just quote Mr. Arran again:
>>> Your NAS is also behaving very strangely. FreeRADIUS only gives up on processing a request if a request with a duplicate ID, SRC IP, and SRC PORT but a different REQUEST AUTHENTICATOR is received.
>>> When a NAS retransmits it should use the same ID, SRC IP, SRC PORT and REQUEST AUTHENTICATOR.
>> By this it should be clear that it's not the NAS resending unanswered auth requests, but rather ADSL modems issuing _new_ requests.
> Not necessarily, the NAS is using the same request ID, if this were a completely new request you'd expect that to change as well as the request authenticator.
> This behaviour has compounded the issue you're experiencing. Usually the NAS handles retransmits, and a retransmit would not cause the original request to be dropped like you're seeing. It's because the server thinks the NAS has given up waiting on the original request, and has looped through all 256 IDs, and is sending a new, completely unrelated request, that it trashes the original.
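The duplicate-detection behaviour Arran describes can be modelled as a lookup keyed on (src IP, src port, ID): a matching Request Authenticator means a retransmission, while a different one means the NAS is assumed to have wrapped its IDs, so the original request is replaced. This is a simplified sketch of the behaviour as stated in this thread, not FreeRADIUS source code; the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    radius_id: int        # one-octet RADIUS Identifier, 0..255
    authenticator: bytes  # 16-byte Request Authenticator


class DuplicateTracker:
    """Hypothetical model of the in-flight request cache described above."""

    def __init__(self) -> None:
        self.in_flight: dict[tuple[str, int, int], bytes] = {}

    def classify(self, pkt: Packet) -> str:
        key = (pkt.src_ip, pkt.src_port, pkt.radius_id)
        previous = self.in_flight.get(key)
        if previous is None:
            self.in_flight[key] = pkt.authenticator
            return "new"
        if previous == pkt.authenticator:
            # Same key, same authenticator: a true retransmit; keep working
            # on the original request.
            return "retransmit"
        # Same key, different authenticator: treated as an unrelated request
        # after the NAS looped through all 256 IDs, so the original is dropped.
        self.in_flight[key] = pkt.authenticator
        return "replaces-original"

    def finished(self, pkt: Packet) -> None:
        """Forget a request once it has been answered and cleaned up."""
        self.in_flight.pop((pkt.src_ip, pkt.src_port, pkt.radius_id), None)


tracker = DuplicateTracker()
first = Packet("192.0.2.1", 32768, 7, b"\x01" * 16)
again = Packet("192.0.2.1", 32768, 7, b"\x01" * 16)
other = Packet("192.0.2.1", 32768, 7, b"\x02" * 16)
print(tracker.classify(first), tracker.classify(again), tracker.classify(other))
```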
> This is not normal behaviour for a NAS. I would check to see if there's anything you could do to make the ADSL clients back off, and have the NAS handle retransmits, as well as what you've already done (if in fact the ADSL clients are handling the retransmits; I'm not convinced without seeing packet traces).
> Also, if you think processing the accounting data might be slowing down authentication processing you might want to consider using a detail writer and detail reader virtual server (there are examples in sites-available, robust something or other). The server can spool accounting requests to a detail file very quickly, much quicker than rlm_perl could process them. The reader intelligently throttles based on server load, so unless the billing software needs to be notified in real time of a client connecting or disconnecting, it's a very good solution for dealing with load spikes.
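The detail writer/reader idea decouples the fast path (spooling a record to disk) from the slow path (handing it to the billing code). The sketch below only illustrates that pattern under stated assumptions: it is not the FreeRADIUS detail module, the JSON-lines format is a stand-in for the real detail file format, and `current_load` is a hypothetical hook standing in for whatever load signal the real reader uses.

```python
import json
import time
from pathlib import Path
from typing import Callable, Iterator


def spool_accounting(spool: Path, attrs: dict) -> None:
    """Writer side: append one accounting record and return immediately,
    so the request can be acknowledged without waiting for billing."""
    with spool.open("a", encoding="utf-8") as f:
        f.write(json.dumps(attrs) + "\n")


def read_spool(spool: Path, current_load: Callable[[], float],
               base_delay: float = 0.01) -> Iterator[dict]:
    """Reader side: replay spooled records in order, pausing longer between
    records when current_load() (0.0 idle .. 1.0 saturated) is high."""
    with spool.open(encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)
            # Back off in proportion to load before fetching the next record.
            time.sleep(base_delay * current_load())


if __name__ == "__main__":
    import tempfile

    path = Path(tempfile.mkdtemp()) / "detail.spool"
    spool_accounting(path, {"Acct-Status-Type": "Start", "User-Name": "alice"})
    spool_accounting(path, {"Acct-Status-Type": "Stop", "User-Name": "alice"})
    for record in read_spool(path, current_load=lambda: 0.5):
        print(record)  # here the slow billing code would be called instead
```

The point of the split is that a load spike only grows the spool file; the slow consumer catches up afterwards at whatever pace the load allows.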
Thank you, Arran. I think the problem was solved because new requests
are no longer accepted when the system is under heavy load, so
FreeRADIUS can sustain the load spikes until they're over.
> Arran Cudbard-Bell
> a.cudbardb at freeradius.org
> RADIUS - Waging war on ignorance and apathy one Access-Challenge at a time.
More information about the Freeradius-Devel mailing list