[4.0.x] radiusd process CPU spikes at 400% when trying to do DHCP concurrently
aland at deployingradius.com
Fri Dec 20 12:56:14 CET 2019
On Dec 20, 2019, at 3:24 AM, Chaigneau, Nicolas via Freeradius-Devel <freeradius-devel at lists.freeradius.org> wrote:
> I've tried latest HEAD (054095b5310d1c7b0994565ddb1d9eda2b45b435).
> Now things are... different, but I can't say it's better :/
> To sum up:
> - Now (054095b5310d1c7b0994565ddb1d9eda2b45b435, December 20):
> radiusd does not get stuck at 400% CPU anymore.
> However, it won't go beyond ~60% CPU, and cannot serve more than ~8.8k DHCP Discover/s (using the dummy DHCP virtual server).
We've seen that in our tests too. There are other issues hidden by the previous bug.
> - When issue first appeared (7c2b992cc79c5c2cdd2863eb6d91ffb9559dd0a9, November 24):
> radiusd CPU rises very rapidly to ~420% CPU. Then after the load test is stopped, radiusd CPU stays forever at exactly 400% CPU, even though it receives no new packet to handle.
> (looks like 4 worker threads in a busy loop?)
Yes. We changed the code because we noticed situations where it didn't service timer events correctly, which was a bug. Unfortunately, the change made it busy-loop.
The fixes to the event code are correct (we believe), but seem to have highlighted issues elsewhere.
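As a generic illustration of how a timer change can produce a busy loop, consider how an event loop typically works. It computes how long to block in poll()/kevent() from the next pending timer. If that computation returns 0 ("don't block") when nothing is pending, the loop never sleeps, and each worker thread spins at 100% CPU even with no traffic. This is a hypothetical sketch, not FreeRADIUS's event code, and it makes no claim about what the actual bug was. The function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: compute the wait timeout (ms) to pass to poll()/kevent()
 * given the next timer deadline. -1 means "block until an fd becomes ready". */
static int64_t wait_timeout_ms(bool have_timer, int64_t next_deadline_ms, int64_t now_ms)
{
    if (!have_timer) return -1;                 /* no timers: block indefinitely */
    if (next_deadline_ms <= now_ms) return 0;   /* timer already due: don't block */
    return next_deadline_ms - now_ms;           /* sleep until the timer fires */
}

/* Hypothetical buggy variant: treats "no timer" the same as "timer due now",
 * so an idle loop is told never to block. */
static int64_t wait_timeout_ms_buggy(bool have_timer, int64_t next_deadline_ms, int64_t now_ms)
{
    if (!have_timer || next_deadline_ms <= now_ms) return 0;
    return next_deadline_ms - now_ms;
}

/* Simulate an idle loop (no fds ready, no timers) for at most max_iters
 * iterations and count how many times it returns immediately with no work. */
static unsigned idle_wakeups(int64_t (*timeout_fn)(bool, int64_t, int64_t), unsigned max_iters)
{
    unsigned wakeups = 0;
    for (unsigned i = 0; i < max_iters; i++) {
        int64_t t = timeout_fn(false, 0, 0);
        if (t < 0) break;   /* would block until I/O arrives: loop goes idle */
        wakeups++;          /* returned at once with nothing to do: spinning */
    }
    return wakeups;
}
```

With the correct helper, an idle loop blocks and `idle_wakeups(wait_timeout_ms, 1000)` is 0. With the buggy one, every iteration returns immediately and the count reaches the cap. That cap is the "stuck at 400% CPU with no packets" symptom when four workers do the same thing.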
> - Just before issue appeared (86dec6a917c00d36df0ca5e3a9b04a48e105f488, November 24):
> As expected, radiusd CPU usage rises according to the load of packets it has to handle.
> It can handle 60k DHCP Discover/s easily (this is not its limit, at this rate it's using ~310% CPU). After the load test is stopped, radiusd CPU goes back to zero.
> (I'm just using "top" to watch CPU usage during the tests.)
We've managed to reproduce all of this here, and have a few people looking into it.