Performance or locking issue with rlm_detail

Aleš Rygl ales at rygl.net
Thu Mar 30 14:41:13 CEST 2017


Hello Alan, 

thanks for your answer.

On Wednesday, 29 March 2017 12:30:21 CEST Alan DeKok wrote:
> On Mar 29, 2017, at 4:43 AM, Aleš Rygl <ales at rygl.net> wrote:
> > I would like to kindly ask for help with a FreeRADIUS server
> > processing Accounting requests. I am running FreeRADIUS 3.0.11 on Debian
> > stable, kernel 3.16.0-4-amd64. The server is a DL380 G9, CPU E5-2680 v3 @
> > 2.50GHz, 256GB RAM, very fast HP SSD RAID5 2.5 TB. The kernel can see 48
> > CPUs.
> > 
> > Radius is processing ~ 3kqps of Accounting (Telco environment).
> 
>   That's a lot of packets, and a high-end server.

The spikes can reach about 6-8 kqps when there is a network outage.
> 
>   The reality is that FreeRADIUS can do 30K packets/s on a single-core
> system.  Until, of course, it touches a database.  And then performance
> drops significantly.
> > The daemon receives the Acct. req. and, using the rlm_detail module, writes it
> > into detail file(s) - there are 32 file queues based on a modulo of
> > Calling-Station-Id.
> 
>   It would likely be better to write to SQL directly, and then to the detail
> files as a backup.

I will try it - one more time. The DB is pretty fast; nevertheless I am afraid about the performance, as everything will then be bound just to the DB.
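
If I understand the suggestion correctly, it would be roughly this in the accounting section (just a sketch, assuming the stock sql instance and my detail.mobile instance; the detail file would only be written when sql fails):

accounting {
        redundant {
                sql             # primary: write straight to the database
                detail.mobile   # fallback: spool to a detail file if sql fails
        }
}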

> > Everything works fine; there are ~15
> > connections to the DB while processing ~2-3 kqps. FreeRADIUS utilizes about 3
> > CPUs. This setup can load gigs of logs into the DB without slowing down
> > Radius responses.
> 
>   That's good.
> 
> > The issue is that freeradius randomly starts to use
> > too much CPU (60-70% of the whole server). The following lines
> > appear in the log:
> > 
> > Wed Mar 29 09:17:43 2017 : WARNING: (44056176) WARNING: Module rlm_detail became unblocked for request 44056176
> > Wed Mar 29 09:17:43 2017 : WARNING: (44056177) WARNING: Module rlm_detail became unblocked for request 44056177
> > Wed Mar 29 09:17:43 2017 : WARNING: (44056178) WARNING: Module rlm_detail became unblocked for request 44056178
> > Wed Mar 29 09:17:43 2017 : WARNING: (44056179) WARNING: Module rlm_detail became unblocked for request 44056179
> > Wed Mar 29 09:17:43 2017 : WARNING: (44056180) WARNING: Module rlm_detail became unblocked for request 44056180
> 
>   That's due to the file locking.  The short answer is to split up the
> detail files even more, which minimizes lock contention,
> 
>   And, to write to SQL directly if you can.

I have tried increasing the number of queues to 48 and reducing the load factor to 90. Unfortunately, it looks like I am running out of something somewhere else...

Thu Mar 30 14:00:29 2017 : ERROR: (9093117) detail.mobile: ERROR: Couldn't open file /var/log/freeradius/radacct/detail.mobile_v3/queue-6/detail-2017033014: Too many different filenames
Thu Mar 30 14:00:29 2017 : ERROR: (9093118) detail.mobile: ERROR: Couldn't open file /var/log/freeradius/radacct/detail.mobile_v3/queue-0/detail-2017033014: Too many different filenames
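
I guess that with 48 queues plus the hourly rotation the module briefly ends up tracking more different filenames than it is willing to. For reference, each writer instance looks roughly like this (a simplified sketch, not my exact config; the queue index is computed into Tmp-Integer-0 by an unlang policy that I have left out):

detail detail.mobile {
        # one file per queue and per hour; Tmp-Integer-0 holds the queue
        # index derived from Calling-Station-Id
        filename = "${radacctdir}/detail.mobile_v3/queue-%{Tmp-Integer-0}/detail-%Y%m%d%H"
        # the files are read back by detail listeners, so keep locking on
        locking = yes
        permissions = 0600
}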

Could I save some resources by limiting max_servers or by tuning some other parameters?
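
I mean something like the thread pool section in radiusd.conf (the numbers below are only placeholders, not a recommendation):

thread pool {
        start_servers = 5
        max_servers = 32            # cap on worker threads
        min_spare_servers = 3
        max_spare_servers = 10
        max_queue_size = 65536      # queued requests before new ones are dropped
        max_requests_per_server = 0
}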

> > It seems to be related to high spikes in the
> > incoming Radius traffic that occur every hour - the issue became worse
> > after the DST change, when a certain number of terminals were restarted at
> > the same moment and their sessions produced Radius Accounting synchronously.
> > But not every time.
> 
>   Yes.  Load spikes are a huge problem for RADIUS servers.

...and they are not an exception in the wild.

> > I have an old server doing the same thing running
> > FreeRADIUS 3.0.8; the kernel can see 24 CPUs, and it really never
> > happened there, while on the new box it happens on a daily basis. The
> > configuration of Radius is identical.
> 
>   See the commit log for 3.0.9. :(  v3.0.8 didn't do locking on the detail
> files.  So there are no complaints, because it doesn't lock anything.

Ah... 

>   I'd really suggest just writing to SQL directly.
> 
>   For the future, we're re-architecting v4 to avoid this problem by design. 
> The writes to SQL can be queued internally in an async fashion.  That
> allows for high sustained throughput with minimal contention.  We're also
> re-working the detail handling in a similar fashion.

Is it already available for testing?

Ales Rygl



