System frequently stops responding...
Alan DeKok
aland at deployingradius.com
Fri Jul 24 12:38:18 CEST 2015
On Jul 24, 2015, at 1:34 AM, Mohamed Lrhazi <Mohamed.Lrhazi at georgetown.edu> wrote:
> Still trying to get to the bottom of this issue... to summarize:
> - Wireless controllers log that the RADIUS server (a load-balanced VIP) did
> not respond to a query. This is logged in clusters of a dozen or so, several
> times a day.
> - We were using Docker containers, so I decided to try without them.
> - Built two VMs, Red Hat Enterprise Linux 7, running the provided FreeRADIUS
> RPMs (3.0.4).
Please use 3.0.9. We're not going to debug issues which were tracked down and fixed six months ago.
> - Sending a quarter of our traffic to this pool of two VMs.
> - Issue still occurs on these VMs.
> - I run radiusd in -Xx mode on both of the RHEL7 VMs, and also run a
> continuous tcpdump on each VM.
>
> - Problem occurrences seem to reliably coincide with:
> -- tcpdump shows all the requests logged by the controllers having been
> resent a few times (duplicates in Wireshark).
> -- radiusd goes silent (no log at all) for 30 seconds, after which it
> resumes logging and, I presume, working.
And... what does the debug log say during this time? You should be able to correlate timestamps.
If there's *nothing* in the debug output, then most likely the database is locking up and preventing FreeRADIUS from doing anything.
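As a rough aid for that correlation, here is a minimal Python sketch that scans a saved debug log for silent periods, so the gaps can be lined up against the tcpdump capture and the controller logs. It assumes each timestamped line starts with a ctime-style prefix followed by " : " (for example "Fri Jul 24 12:38:18 2015 : Debug: ..."); check that against your actual -Xx output and adjust TS_FORMAT if it differs. The function names and the 10-second threshold are made up for illustration.

```python
from datetime import datetime, timedelta

# ASSUMPTION: lines look like "Fri Jul 24 12:38:18 2015 : Debug: ...".
# Adjust this format if your debug output uses a different prefix.
TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_timestamp(line):
    """Return the datetime at the start of a log line, or None if absent."""
    prefix = line.split(" : ", 1)[0].strip()
    try:
        return datetime.strptime(prefix, TS_FORMAT)
    except ValueError:
        return None


def find_gaps(lines, threshold=timedelta(seconds=10)):
    """Yield (last_line_before, first_line_after) times for each silent period."""
    previous = None
    for line in lines:
        current = parse_timestamp(line)
        if current is None:
            continue  # lines without a recognizable timestamp are skipped
        if previous is not None and current - previous >= threshold:
            yield previous, current
        previous = current


def report_gaps(path):
    """Print every silent period found in the log file at `path`."""
    with open(path, errors="replace") as log:
        for start, end in find_gaps(log):
            seconds = int((end - start).total_seconds())
            print(f"silent from {start} to {end} ({seconds}s)")
```

Calling report_gaps() on the saved log prints one line per silent period. If the 30-second gaps match the retransmissions seen in tcpdump, the last few log lines before each gap are the place to look for whatever was blocking.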
> - radiusd logs a line for each missed query, I think, like so:
> Error: (7719) Ignoring duplicate packet from client gu_net_10 port 3010 -
> ID: 96 due to unfinished request in component <core> module
>
> -- Spikes in CPU usage (as seen in sar output).
>
> What can I do next to further zoom in on the root cause? Or is this pretty
> clearly CPU starvation? Just add more VMs?
Use 3.0.9.
Alan DeKok.