System frequently stops responding...

Alan DeKok aland at deployingradius.com
Fri Jul 24 12:38:18 CEST 2015


On Jul 24, 2015, at 1:34 AM, Mohamed Lrhazi <Mohamed.Lrhazi at georgetown.edu> wrote:
> Still trying to get to the bottom of this issue. To summarize:
> - Wireless controllers log that the RADIUS server (a load-balanced VIP) did
> not respond to a query. This is logged in clusters of a dozen or so, several
> times a day.
> - We were using Docker containers, so I decided to try without them.
> - Built two VMs running Red Hat Enterprise Linux 7 with the provided
> FreeRADIUS RPMs (3.0.4).

  Please use 3.0.9.  We're not going to debug issues which were tracked down and fixed six months ago.

> - Sending a quarter of our traffic to this pool of two VMs.
> - Issue still occurs on these VMs.
> - I run radiusd in -Xx mode on both of the RHEL7 VMs, and also run a
> continuous tcpdump on each VM.
> 
> - Problem occurrences seem to reliably coincide with:
> -- tcpdump shows all the requests logged by the controllers having been
> resent a few times (duplicates in Wireshark).
> -- radiusd goes silent (no log at all) for 30 seconds, after which it
> resumes logging and, I presume, working.

  And.... what does the debug log say during this time?  You should be able to correlate timestamps.

  If there's *nothing* in the debug output, then most likely a database is locking up and preventing FreeRADIUS from doing anything.
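
  As a rough aid for correlating timestamps, the sketch below scans a saved debug log and reports every stretch where no timestamped line was written for at least a given number of seconds. It assumes each timestamped line begins with a 24-character ctime-style stamp such as "Fri Jul 24 01:02:03 2015"; that format, and the names parse_stamp and find_silent_gaps, are illustrative assumptions, so adjust STAMP_FORMAT to match what your log actually contains.

```python
from datetime import datetime, timedelta

# Assumption: timestamped lines start with a 24-character ctime-style stamp,
# e.g. "Fri Jul 24 01:02:03 2015". Change these two constants if yours differ.
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"
STAMP_LENGTH = 24


def parse_stamp(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    try:
        return datetime.strptime(line[:STAMP_LENGTH], STAMP_FORMAT)
    except ValueError:
        return None


def find_silent_gaps(lines, threshold=timedelta(seconds=10)):
    """Return (last_seen, resumed) pairs where logging paused >= threshold."""
    gaps = []
    previous = None
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is None:
            # Lines without a parseable stamp are skipped, not treated as gaps.
            continue
        if previous is not None and stamp - previous >= threshold:
            gaps.append((previous, stamp))
        previous = stamp
    return gaps
```

  Passing an open file object as `lines` is enough, since the function only iterates over it. The start and end of each reported gap can then be compared against the tcpdump capture and the database's own logs for the same window.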

> - radiusd logs a line for each missed query, I think, like so:
> Error: (7719) Ignoring duplicate packet from client gu_net_10 port 3010 -
> ID: 96 due to unfinished request in component <core> module
> 
> -- Spikes in CPU usage (as seen in sar output).
> 
> What can I do next to further zoom in on the root cause? Or is this pretty
> clearly CPU starvation? Should I just add more VMs?

  Use 3.0.9.

  Alan DeKok.



