[4.0.x] radiusd process CPU spikes at 400% when trying to do DHCP concurrently

Alan DeKok aland at deployingradius.com
Fri Dec 20 12:56:14 CET 2019


On Dec 20, 2019, at 3:24 AM, Chaigneau, Nicolas via Freeradius-Devel <freeradius-devel at lists.freeradius.org> wrote:
> I've tried latest HEAD (054095b5310d1c7b0994565ddb1d9eda2b45b435).
> 
> Now things are... different, but I can't say it's better :/

  Yeah.

> To sum up:
> 
> - Now (054095b5310d1c7b0994565ddb1d9eda2b45b435, December 20):
> radiusd does not get stuck at 400% CPU anymore. 
> However, it won't go beyond ~60% CPU, and cannot serve more than ~8.8k DHCP Discover/s (using the dummy DHCP virtual server).

  We've seen that in our tests too.  There are other issues which were hidden by the previous bug.

> - When issue first appeared (7c2b992cc79c5c2cdd2863eb6d91ffb9559dd0a9, November 24):
> radiusd CPU rises very rapidly to ~420% CPU. Then after the load test is stopped, radiusd CPU stays forever at exactly 400% CPU, even though it receives no new packet to handle.
> (looks like 4 worker threads in a busy loop ?)

  Yes.  The event code was changed because we noticed there were situations where it wouldn't service timer events correctly, which was a bug.  Unfortunately, the change made it busy-loop.

  The fixes to the event code are correct (we believe), but seem to have highlighted issues elsewhere.
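
  As a generic illustration of how timer handling can turn into a busy loop, here is a minimal sketch.  It is not the FreeRADIUS event code, and it does not claim to show the actual cause of this bug.  The names (next_timeout, run_loop, pop_expired) are made up for the example, and the clock is simulated so that it runs instantly.  The idea: if a due timer is never removed from the timer list, the computed wait time stays at zero, so the loop never blocks and spins even with no packets to handle.

```python
import heapq


def next_timeout(timers, now):
    """How long the loop may block before the earliest timer is due.

    None means "no timers, block until I/O arrives"; 0.0 means
    "a timer is already due, do not block at all".
    """
    if not timers:
        return None
    return max(0.0, timers[0] - now)


def run_loop(deadlines, now, max_iterations, pop_expired):
    """Simulate an idle event loop (no I/O) and count non-blocking wakeups.

    pop_expired is a hypothetical switch: True removes a timer once it
    has fired, False leaves it in the heap (the buggy case).
    """
    timers = list(deadlines)
    heapq.heapify(timers)
    spins = 0
    for _ in range(max_iterations):
        timeout = next_timeout(timers, now)
        if timeout is None:
            break               # idle: a real loop would sleep in the kernel
        if timeout > 0:
            now += timeout      # simulated blocking wait until the deadline
            continue
        spins += 1              # woke up without blocking
        if pop_expired:
            heapq.heappop(timers)   # timer serviced and removed
    return spins


if __name__ == "__main__":
    print("correct loop, wakeups:", run_loop([1.0], 0.0, 1000, True))
    print("buggy loop, wakeups:  ", run_loop([1.0], 0.0, 1000, False))
```

  In the correct case the loop wakes once for the timer and then goes idle.  In the buggy case every remaining iteration is a zero-timeout wakeup, which on a real system would look like a thread pinned at 100% CPU.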

> - Just before issue appeared (86dec6a917c00d36df0ca5e3a9b04a48e105f488, November 24):
> As expected, radiusd CPU usage rises according to the load of packets it has to handle.
> It can handle 60k DHCP Discover/s easily (this is not its limit, at this rate it's using ~310% CPU). After the load test is stopped, radiusd CPU goes back to zero.
> 
> 
> (just using a "top" to look at the CPU when doing the tests.)

  We've managed to reproduce all of this here, and have a few people looking into it.

  Alan DeKok.



