RADIUS Monitoring tool
Phil Mayers
p.mayers at imperial.ac.uk
Mon Mar 2 18:28:14 CET 2015
On 02/03/15 16:09, John Douglass wrote:
>
> On 02/25/2015 08:28 AM, Clement Ogedengbe wrote:
>> On two occasions in the last two weeks, our RADIUS server suddenly started to reject ALL users, even though we have set up a failover system. Unfortunately, the failover system did not kick in because the RADIUS service was still running; it was just rejecting all users for some unknown reason.
>>
>> Does anyone know of a monitoring script or tool that can test that the RADIUS server is authenticating properly, and that can send an alert by email or text message if the server rejects a valid user's credentials a number of times?
>>
>> Best Regards
>>
>> Clement Ogedengbe
>>
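A minimal sketch of the alerting logic asked for above, in Python. It assumes a hypothetical `probe` callable that performs one test login with known-good credentials through the whole chain and returns True on Access-Accept. In practice this might wrap FreeRADIUS's `radtest` or wpa_supplicant's `eapol_test`, but that wrapper is not shown. It also assumes a hypothetical `alert` callable that sends the email or text. Because the probe performs a real authentication instead of only checking that the process is up, it catches the "running but rejecting everyone" case that the failover missed.

```python
from typing import Callable


class RejectMonitor:
    """Alert once a login probe has failed `threshold` times in a row.

    `probe` and `alert` are hypothetical hooks supplied by the caller:
    probe() -> True if the test user got Access-Accept,
    alert(msg) -> deliver msg by email/SMS.
    """

    def __init__(self, probe: Callable[[], bool],
                 alert: Callable[[str], None], threshold: int = 3):
        self.probe = probe
        self.alert = alert
        self.threshold = threshold
        self.failures = 0      # consecutive failed probes so far
        self.alerted = False   # avoid re-sending the same alert every cycle

    def run_once(self) -> None:
        if self.probe():
            if self.alerted:
                self.alert("RECOVERY: test user accepted again")
            self.failures = 0
            self.alerted = False
            return
        self.failures += 1
        if self.failures >= self.threshold and not self.alerted:
            self.alert(f"CRITICAL: test user rejected {self.failures} times in a row")
            self.alerted = True


if __name__ == "__main__":
    # Simulated probe results: one success, four rejects, then a recovery.
    results = iter([True, False, False, False, False, True])
    sent = []
    monitor = RejectMonitor(lambda: next(results), sent.append, threshold=3)
    for _ in range(6):
        monitor.run_once()
    print(sent)
```

Calling `run_once()` from cron or a monitoring scheduler every minute or so would then send one alert only after `threshold` consecutive rejects, and one recovery message when logins start working again.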
> Clement,
>
> At Georgia Tech we are currently refining our RADIUS monitoring
> services. We are finding that eapol_test alone is not enough to debug
> the variety of failure scenarios that can occur when using [...]. I
> will be writing some updated PHP monitoring objects; if enough people
> are interested in how we will be monitoring this service for its
> various failure scenarios, I'm glad to share them. We've had a lot of
> people looking into these issues and working through our pain points.
>
> Just to give you some background, we currently have 4 hardware RADIUS
> servers and 1 VM RADIUS server deployed, and we are using 1 "shared" AD
> server and 2 "dedicated" VM AD servers (which only our RADIUS servers
> communicate with, via Samba) between [...].
>
> I'm not here to argue over the use of EAP-PEAP-MSCHAPv2 vs. EAP-TLS; we
> all live within our environments and make the best of them.
>
> If you are using the Controller -> RADIUS -> Samba/ntlm_auth -> AD
> configuration, here are a number of things we have come across that
> you need to consider:
>
> 1) The samba joins to AD are somewhat brittle.
Yep. We check this with a passive nagios service which basically does this:
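import subprocess
import sys
import time

def run(cmd):
    # run a command; return (succeeded, combined stdout+stderr)
    p = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    return p.returncode == 0, p.stdout

ok = False
for attempt in range(4):
    ok, output = run(["wbinfo", "-t"])
    if ok:
        break
    # sometimes the pipe just times out harmlessly; retry
    # if we see that kind of error message, and only that
    if "NT_STATUS_PIPE_NOT_AVAILABLE" in output:
        time.sleep(1)
        continue
    # failed, and not an ignorable error message; fall through
    break
# exit status 2 = CRITICAL in the Nagios plugin convention; the wrapper
# that submits this as a passive result is not shown
if not ok:
    print("CRITICAL: wbinfo -t failed: " + output.strip())
    sys.exit(2)
ok, output = run(["net", "-P", "ads", "status"])
if not ok:
    print("CRITICAL: net -P ads status failed")
    sys.exit(2)
print("OK: trust secret and machine account checks passed")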
> There have been instances when our samba service (winbind) has
> completely lost its privileges to AD. This has happened under numerous
> versions and has happened at random times. I'm sure it's something to do
> with the renegotiation of keys between the joined samba machine and the
> AD servers.
This can happen with Windows servers too; we've had Windows 2012R2
member servers fall out of the domain. It's not Samba-specific; it
appears to be related to AD replication issues occurring at the same
time as an AD machine account password change. But this is all
hypothesis; we haven't proven it.
If you do a bit of googling, you'll see a lot of people run into it.
It's just a bit of Microsoft nonsense we all have to live with :o(
> If anyone has some experience in the above failures of Samba joins to
> AD, I'd love to hear it.
See above!
> 2) AD servers get overloaded when we are talking about large numbers of
> users. We are still learning what our load limits are. Our device base is
> anywhere between 20k and 25k+, with two dedicated AD VMs (scaled large)
> handling RADIUS requests from a controller with about 250 APs (we had to
> scale down due to a RADIUS flaw in the controller software, for which we
> have been testing a fix for Cisco).
There has been some discussion about this on -devel recently. Matthew
Newton has a patch which runs ntlm_auth in "pipe" mode, avoiding the
overhead of a fork/exec/startup on each auth. This seems to make a
substantial difference - you might want to check the patch out.
Short version: it might not be AD. It might be the overhead of starting
ntlm_auth on every mschap request.
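To show why avoiding a fork/exec/startup on every request matters, here is a small Python sketch that compares the two patterns. The helper is a hypothetical stand-in: a tiny Python process that answers `OK <request>` for each input line. It does not speak ntlm_auth's real helper protocol, and it is not Matthew Newton's patch. It only illustrates the general "start once, talk over pipes" idea.

```python
import subprocess
import sys
import time

# Hypothetical stand-in helper: answers "OK <request>" for each input line.
# It does NOT implement ntlm_auth's real helper protocol.
HELPER = [sys.executable, "-u", "-c",
          "import sys\n"
          "for line in sys.stdin:\n"
          "    print('OK ' + line.strip(), flush=True)\n"]


def auth_fork_per_request(requests):
    """Start a fresh helper process for every request (fork/exec/startup each time)."""
    replies = []
    for req in requests:
        done = subprocess.run(HELPER, input=req + "\n", capture_output=True,
                              text=True, check=True)
        replies.append(done.stdout.strip())
    return replies


class PipeHelper:
    """Start the helper once and reuse it over its stdin/stdout pipes."""

    def __init__(self):
        self.proc = subprocess.Popen(HELPER, stdin=subprocess.PIPE,
                                     stdout=subprocess.PIPE, text=True)

    def ask(self, req):
        self.proc.stdin.write(req + "\n")
        self.proc.stdin.flush()          # make sure the helper sees the line now
        return self.proc.stdout.readline().strip()

    def close(self):
        self.proc.stdin.close()          # EOF ends the helper's read loop
        self.proc.wait()
        self.proc.stdout.close()


if __name__ == "__main__":
    reqs = [f"user{i}" for i in range(20)]
    t0 = time.perf_counter()
    auth_fork_per_request(reqs)
    t1 = time.perf_counter()
    helper = PipeHelper()
    for r in reqs:
        helper.ask(r)
    helper.close()
    t2 = time.perf_counter()
    print(f"fork per request: {t1 - t0:.3f}s, persistent pipe: {t2 - t1:.3f}s")
```

The absolute numbers depend on the machine, but the per-request version pays interpreter startup on every call, which is the same kind of overhead a fresh ntlm_auth process pays for each MSCHAP request.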
>
> You absolutely need to use the MaxConcurrentApi setting if you are
> handling any sort of large authentication volume.
>
> http://support.microsoft.com/kb/2688798
This is AD version dependent. We do *not* have it set, and seem to run
without problem, but are on Windows 2012R2 where the default is different.
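Here is a rough way to see why a concurrency limit like this matters. If MaxConcurrentApi is treated as a cap on in-flight Netlogon authentications, Little's law bounds throughput at slots divided by mean latency. The slot counts and the 50 ms latency below are illustrative assumptions, not measured values or documented defaults. The 25k device figure comes from the message above.

```python
def max_auth_rate(concurrent_slots: int, mean_latency_s: float) -> float:
    """Upper bound on authentications/second through a fixed pool of slots.

    Little's law: in-flight requests = arrival rate x time in system, so with
    at most `concurrent_slots` requests in flight, rate <= slots / latency.
    """
    if concurrent_slots < 1 or mean_latency_s <= 0:
        raise ValueError("need at least one slot and a positive latency")
    return concurrent_slots / mean_latency_s


def drain_time_s(devices: int, concurrent_slots: int, mean_latency_s: float) -> float:
    """Seconds to re-authenticate `devices` clients if every slot stays busy."""
    return devices / max_auth_rate(concurrent_slots, mean_latency_s)


if __name__ == "__main__":
    devices = 25_000      # upper end of the device count quoted in the thread
    latency = 0.050       # assumed mean round trip per authentication, seconds
    for slots in (2, 10):  # illustrative concurrency limits, not documented defaults
        rate = max_auth_rate(slots, latency)
        print(f"{slots:>2} slots: ~{rate:.0f} auth/s, "
              f"~{drain_time_s(devices, slots, latency):.0f}s to re-auth {devices} devices")
```

This ignores retries, timeouts, and uneven load across DCs, so it is only a lower bound on how long a mass re-authentication would take. Still, it shows why raising the limit (or running a version where the default is already higher) can change "slow" into "fine".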
More information about the Freeradius-Users mailing list