Performance or locking issue with rlm_detail
Aleš Rygl
ales at rygl.net
Thu Mar 30 14:41:13 CEST 2017
Hello Alan,
thanks for your answer.
On Wednesday, 29 March 2017 12:30:21 CEST Alan DeKok wrote:
> On Mar 29, 2017, at 4:43 AM, Aleš Rygl <ales at rygl.net> wrote:
> > I would like to kindly ask for help with freeradius server
> > processing Accounting req. I am running freeradius 3.0.11 on Debian
> > stable, kernel 3.16.0-4-amd64. Server is DL380 G9, CPU E5-2680 v3 @
> > 2.50GHz, 256GB RAM, very fast HP SSD RAID5 2.5 TB. The kernel can see 48
> > CPU.
> >
> > Radius is processing ~ 3kqps of Accounting (Telco environment).
>
> That's a lot of packets, and a high-end server.
The spikes can reach about 6-8 kqps in case there is a network outage.
>
> The reality is that FreeRADIUS can do 30K packets/s on a single-core
> system. Until, of course, it touches a database. And then performance
> drops significantly.
> > The daemon receives Acct. req. and using module rlm_detail it writes it
> > into a detail file(s) - there are 32 file queues based on modulo of
> > Calling-Station-Id.
>
> It would likely be better to write to SQL directly, and then to the detail
> files as a backup.
I will try it once more. The DB is pretty fast; nevertheless, I am afraid the performance will then be bound entirely to the DB.
> > Everything works fine, there are ~15
> > connections to DB while processing ~2-3 kqps. Freeradius utilizes about 3
> > CPUs. This setup can load gigs of logs into DB without slowing down
> > Radius responses.
>
> That's good.
>
> > The issue is that freeradius randomly starts to use
> > too much CPU (60-70% of whole server). The following lines
> > appear in the log:
> >
> > Wed Mar 29 09:17:43 2017 : WARNING: (44056176) WARNING: Module rlm_detail became unblocked for request 44056176
> > Wed Mar 29 09:17:43 2017 : WARNING: (44056177) WARNING: Module rlm_detail became unblocked for request 44056177
> > Wed Mar 29 09:17:43 2017 : WARNING: (44056178) WARNING: Module rlm_detail became unblocked for request 44056178
> > Wed Mar 29 09:17:43 2017 : WARNING: (44056179) WARNING: Module rlm_detail became unblocked for request 44056179
> > Wed Mar 29 09:17:43 2017 : WARNING: (44056180) WARNING: Module rlm_detail became unblocked for request 44056180
>
> That's due to the file locking. The short answer is to split up the
> detail files even more, which minimizes lock contention.
>
> And write to SQL directly if you can.
I have increased the number of queues to 48 and reduced the load factor to 90. Unfortunately, it looks like I am running out of something somewhere else...
Thu Mar 30 14:00:29 2017 : ERROR: (9093117) detail.mobile: ERROR: Couldn't open file /var/log/freeradius/radacct/detail.mobile_v3/queue-6/detail-2017033014: Too many different filenames
Thu Mar 30 14:00:29 2017 : ERROR: (9093118) detail.mobile: ERROR: Couldn't open file /var/log/freeradius/radacct/detail.mobile_v3/queue-0/detail-2017033014: Too many different filenames
Could I save some resources by limiting max_servers or any other parameters?
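For reference, the queue selection amounts to something like the sketch below. This is not my actual FreeRADIUS config, just an illustration in Python; the function names and the CRC32 fallback for non-numeric values are assumptions made for the example.

```python
import zlib

NUM_QUEUES = 48  # queue count after the change; the earlier setup used 32


def queue_index(calling_station_id: str, num_queues: int = NUM_QUEUES) -> int:
    """Map a Calling-Station-Id to a queue number in [0, num_queues)."""
    digits = "".join(ch for ch in calling_station_id if ch in "0123456789")
    if digits:
        # Numeric identifier (e.g. an MSISDN): plain modulo.
        return int(digits) % num_queues
    # Assumed fallback for non-numeric values: a stable CRC32 hash.
    return zlib.crc32(calling_station_id.encode("utf-8")) % num_queues


def detail_path(calling_station_id: str, hour_stamp: str) -> str:
    """Build a path shaped like the ones in the error lines above."""
    base = "/var/log/freeradius/radacct/detail.mobile_v3"
    return f"{base}/queue-{queue_index(calling_station_id)}/detail-{hour_stamp}"
```

Every request with the same Calling-Station-Id lands in the same queue directory, so the number of distinct detail files open at any moment grows with the number of queues.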
> > It seems to be related to high spikes in the
> > incoming radius traffic that occur every hour - the issue became worse
> > after the DST change, when a certain number of terminals were restarted at the
> > same moment and their sessions produce Radius Accounting synchronously.
> > But not every time.
>
> Yes. Load spikes are a huge problem for RADIUS servers.
...and not an exception in the wild.
> > I have an old server doing the same thing running
> > Freeradius 3.0.8; the kernel can see 24 CPUs, and it has never
> > happened there, while on the new box it happens on a daily basis. The
> > configuration of Radius is identical.
>
> See the commit log for 3.0.9. :( v3.0.8 didn't do locking on the detail
> files. So there are no complaints, because it doesn't lock anything.
Ah...
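If I understand the locking correctly, each writer now has to take an exclusive lock on the detail file before appending, so all requests mapped to the same file are serialized. A minimal Python sketch of that general pattern (not rlm_detail's actual code; the function name and the use of flock() are assumptions for illustration):

```python
import fcntl
import os


def append_record(path: str, record: str) -> None:
    """Append one record while holding an exclusive lock on the file."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        # Blocks here for as long as another writer holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        os.write(fd, (record.rstrip("\n") + "\n").encode("utf-8"))
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

With N queues, only requests that map to the same file wait on each other, which would explain why splitting the files further reduces the contention.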
> I'd really suggest just writing to SQL directly.
>
> For the future, we're re-architecting v4 to avoid this problem by design.
> The writes to SQL can be queued internally in an async fashion. That
> allows for high sustained throughput with minimal contention. We're also
> re-working the detail handling in a similar fashion.
Is it already available for testing?
Ales Rygl