Performance or locking issue with rlm_detail

Alan DeKok aland at deployingradius.com
Wed Mar 29 18:30:21 CEST 2017


On Mar 29, 2017, at 4:43 AM, Aleš Rygl <ales at rygl.net> wrote:
> I would like to ask for help with FreeRADIUS processing of
> Accounting requests. I am running FreeRADIUS 3.0.11 on Debian
> stable, kernel 3.16.0-4-amd64. The server is a DL380 G9, CPU E5-2680 v3 @
> 2.50GHz, 256GB RAM, very fast HP SSD RAID5 2.5 TB. The kernel can see 48
> CPUs.
> 
> RADIUS is processing ~3 kqps of Accounting requests (Telco environment).

  That's a lot of packets, and a high-end server.

  The reality is that FreeRADIUS can do 30K packets/s on a single-core system.  Until, of course, it touches a database.  And then performance drops significantly.

> The daemon receives Accounting requests and, using the rlm_detail module,
> writes them into detail files; there are 32 file queues based on the
> modulo of Calling-Station-Id.

  It would likely be better to write to SQL directly, and then to the detail files as a backup.

> Everything works fine; there are ~15
> connections to the DB while processing ~2-3 kqps. FreeRADIUS uses about 3
> CPUs. This setup can load gigabytes of logs into the DB without slowing down
> RADIUS responses.

  That's good.

> The issue is that FreeRADIUS randomly starts to use
> too much CPU (60-70% of the whole server). The following lines
> appear in the log:
> 
> Wed Mar 29 09:17:43 2017 : WARNING: (44056176) WARNING: Module rlm_detail became unblocked for request 44056176
> Wed Mar 29 09:17:43 2017 : WARNING: (44056177) WARNING: Module rlm_detail became unblocked for request 44056177
> Wed Mar 29 09:17:43 2017 : WARNING: (44056178) WARNING: Module rlm_detail became unblocked for request 44056178
> Wed Mar 29 09:17:43 2017 : WARNING: (44056179) WARNING: Module rlm_detail became unblocked for request 44056179
> Wed Mar 29 09:17:43 2017 : WARNING: (44056180) WARNING: Module rlm_detail became unblocked for request 44056180

  That's due to the file locking.  The short answer is to split up the detail files even more, which minimizes lock contention.

  And, to write to SQL directly if you can.
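  To illustrate why more files help, here is a small Python sketch (not FreeRADIUS configuration). It maps a Calling-Station-Id to one of N detail files with a stable hash, and estimates how often a writer lands on a file that another concurrent writer also wants. The CRC32 hash, the file path, and the assumption that concurrent writers pick files uniformly and independently are all hypothetical; the original setup's exact modulo expansion isn't shown.

```python
import zlib


def shard_for(calling_station_id: str, num_shards: int) -> int:
    # Stable hash (CRC32) so the same station always maps to the same file.
    # Python's built-in hash() is randomized per process, so it is avoided here.
    return zlib.crc32(calling_station_id.encode("utf-8")) % num_shards


def detail_filename(calling_station_id: str, num_shards: int) -> str:
    # Hypothetical naming scheme: one detail file per shard.
    shard = shard_for(calling_station_id, num_shards)
    return f"/var/log/radius/radacct/detail-{shard:03d}"


def expected_collision_share(num_writers: int, num_shards: int) -> float:
    # Toy model: each of num_writers concurrent writers picks a file uniformly
    # at random. Returns the probability that a given writer's file is also
    # targeted by at least one of the other (num_writers - 1) writers,
    # i.e. 1 - (1 - 1/N)^(W - 1).
    return 1.0 - (1.0 - 1.0 / num_shards) ** (num_writers - 1)


if __name__ == "__main__":
    print(detail_filename("00420123456789", 32))
    for shards in (32, 128, 256):
        print(shards, round(expected_collision_share(32, shards), 3))
```

  Under this toy model, 32 concurrent writers sharing 32 files collide roughly 63% of the time, versus roughly 11% with 256 files. Real contention also depends on how long each lock is held, which the model ignores.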

> It seems to be related to the high spikes in
> incoming RADIUS traffic that occur every hour. The issue became worse
> after the DST change, when a number of terminals were restarted at the
> same moment and their sessions now produce RADIUS Accounting synchronously.
> But it doesn't happen every time.

  Yes.  Load spikes are a huge problem for RADIUS servers.

> I have an old server doing the same thing, running
> FreeRADIUS 3.0.8 (the kernel can see 24 CPUs), and it has never
> happened there, while on the new box it happens daily. The
> RADIUS configuration is identical.

  See the commit log for 3.0.9. :(  v3.0.8 didn't do locking on the detail files.  So there are no complaints, because it doesn't lock anything.

  I'd really suggest just writing to SQL directly.

  For the future, we're re-architecting v4 to avoid this problem by design.  The writes to SQL can be queued internally in an async fashion.  That allows for high sustained throughput with minimal contention.  We're also re-working the detail handling in a similar fashion.
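  The general idea of queuing writes internally can be sketched in Python as a single background writer that drains a queue in batches. This is only an illustration of the technique, not FreeRADIUS v4 code; the class name `BatchingWriter` and the `write_batch` callback (which in practice might issue a multi-row SQL INSERT or append to a detail file) are hypothetical.

```python
import queue
import threading
from typing import Callable, List


class BatchingWriter:
    """Hypothetical illustration: request threads enqueue records and return
    immediately; one background thread drains the queue and writes batches,
    so only a single writer ever touches the output."""

    def __init__(self, write_batch: Callable[[List[str]], None], max_batch: int = 100):
        self._q = queue.Queue()  # unbounded in this sketch
        self._write_batch = write_batch
        self._max_batch = max_batch
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, record: str) -> None:
        # Callers never wait on the output; they only enqueue.
        self._q.put(record)

    def close(self) -> None:
        # None is a sentinel: the writer flushes what it has and exits.
        self._q.put(None)
        self._thread.join()

    def _run(self) -> None:
        while True:
            item = self._q.get()  # block until there is work
            if item is None:
                return
            batch = [item]
            # Drain whatever is already queued, up to max_batch records.
            while len(batch) < self._max_batch:
                try:
                    nxt = self._q.get_nowait()
                except queue.Empty:
                    break
                if nxt is None:
                    self._write_batch(batch)
                    return
                batch.append(nxt)
            self._write_batch(batch)


if __name__ == "__main__":
    writer = BatchingWriter(lambda b: print(len(b), "records"), max_batch=10)
    for i in range(25):
        writer.submit(f"acct-{i}")
    writer.close()
```

  Batching amortizes per-write overhead during load spikes. A production design would bound the queue so that a stalled database applies backpressure instead of consuming unlimited memory.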

  Alan DeKok.
