[4.0.x] radiusd process CPU spikes at 400% when trying to do DHCP concurrently

Chaigneau, Nicolas nicolas.chaigneau at capgemini.com
Fri Dec 20 09:24:34 CET 2019


I've tried latest HEAD (054095b5310d1c7b0994565ddb1d9eda2b45b435).

Things are different now, but I can't say they're better :/

To sum up:

- Now (054095b5310d1c7b0994565ddb1d9eda2b45b435, December 20):
radiusd no longer gets stuck at 400% CPU.
However, it won't go beyond ~60% CPU, and it cannot serve more than ~8.8k DHCP Discover/s (using the dummy DHCP virtual server).

- When issue first appeared (7c2b992cc79c5c2cdd2863eb6d91ffb9559dd0a9, November 24):
radiusd CPU usage rises very rapidly to ~420%. After the load test is stopped, it stays at exactly 400% indefinitely, even though there are no new packets to handle.
(This looks like 4 worker threads stuck in a busy loop?)

- Just before issue appeared (86dec6a917c00d36df0ca5e3a9b04a48e105f488, November 24):
As expected, radiusd CPU usage rises with the packet load it has to handle.
It can handle 60k DHCP Discover/s easily (this is not its limit; at this rate it uses ~310% CPU). After the load test is stopped, radiusd CPU usage goes back to zero.

(I'm just using "top" to watch CPU usage during the tests.)
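
A per-thread view would help confirm or rule out the "4 threads in a busy loop" guess, since the process total alone cannot distinguish four threads at 100% each from some other split. Below is a minimal sketch, assuming Linux and its /proc/<pid>/task/<tid>/stat files; the function names are mine, not part of FreeRADIUS, and this has not been run against radiusd.

```python
import os
import time


def parse_cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/.../stat line."""
    # Field 2 (comm) is parenthesised and may contain spaces,
    # so split on the last ')' before tokenising the rest.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def sample_threads(pid):
    """Map each thread id of `pid` to its cumulative CPU ticks."""
    task_dir = f"/proc/{pid}/task"
    ticks = {}
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                ticks[int(tid)] = parse_cpu_ticks(f.read())
        except OSError:
            pass  # the thread exited between listdir() and open()
    return ticks


def cpu_percent(before, after, interval, hz):
    """Per-thread CPU usage in percent between two samples."""
    return {
        tid: 100.0 * (after[tid] - before[tid]) / (hz * interval)
        for tid in after
        if tid in before
    }


def report(pid, interval=1.0):
    """Print threads of `pid` sorted by CPU usage over `interval` seconds."""
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    before = sample_threads(pid)
    time.sleep(interval)
    after = sample_threads(pid)
    usage = cpu_percent(before, after, interval, hz)
    for tid, pct in sorted(usage.items(), key=lambda kv: -kv[1]):
        print(f"tid {tid}: {pct:.0f}% CPU")
```

Calling report() with the radiusd PID after the load test has stopped should show whether exactly four threads sit near 100% each. "top -H -p <pid>" gives the same per-thread breakdown interactively.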

>  I've pushed some fixes which I *hope* will help.  I've broken them down into a series of tiny commits.  So it should be possible to tell (a) which commit fixes things, or (b) which one makes it worse.