Degradation of service when authentication fails with Windows AD

Phil Mayers p.mayers at imperial.ac.uk
Tue Feb 5 14:27:42 CET 2013


On 05/02/13 10:20, Antonio Alberola wrote:
> Dear All,
>
> I'm having random authentication failures and I think they are due to an
> internal failure in the Radius server. I use Radius to authenticate email
> users against Windows Active Directory via PAM. Previously I used NTLM and
> Kerberos together; now I use PAM.

This is confusing. FreeRADIUS is calling the "pam" module, yes? So what 
is the PAM stack calling?

> The problem is as follows. Users authenticate properly throughout the day,
> but suddenly authentication begins to fail and an authentication error
> appears even when the credentials are right. After the first failure, the
> service degrades rapidly and validates only 1 of every 20 requests. The
> onset of failure seems to coincide with one of these three messages:

Those messages are a symptom; your PAM module is taking too long to 
respond. You need to investigate what the PAM stack is calling, why it 
is hanging, and how to reduce the timeouts or improve the speed of 
failure detection.

This is not a FreeRADIUS problem.

>
> Tue Jan 30 08:27:38 2013 : Error: Received conflicting packet from client
> localhost port 14038 - ID: 194 due to unfinished request 161451.  Giving up
> on old request.
> Tue Jan 30 08:27:52 2013 : Error: Request 161507 has been waiting in the
> processing queue for 11 seconds.  Check that all databases are running
> properly!
> Fri Feb  1 14:55:15 2013 : Info: WARNING: Child is hung for request 3609 in
> component <core> module <queue>.
>
> The solution we are applying at the moment is restarting Radius. Sometimes
> restarting does not fix the problem and we have to set Radius to allow
> all connections. A few minutes later, we switch it back to the normal
> configuration and it works again. The biggest drawback, besides annoying
> users, is that Windows AD accounts are blocked because of the failures.
>
> I need help to find the cause of the problem and fix it. I do not know yet
> if the problem is in the domain controllers, in the PAM module or in Radius.
> But everything seems to point to Radius.

In short: FreeRADIUS is failing because your authentication mechanism
(PAM) is taking too long to respond. Each stalled request holds a thread
until PAM returns, so the thread pool is eventually exhausted, which
explains the log messages you see.
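
To illustrate why a slow backend produces the "waiting in the processing
queue" messages, here is a minimal sketch in Python. It is not FreeRADIUS
code: it just models a fixed pool of worker threads, and the pool size,
request rate and backend response times are made-up numbers chosen for
illustration.

```python
import heapq

def queue_waits(arrival_times, service_time, pool_size):
    """Return how long each request waits for a free worker thread.

    Toy model: every request occupies one worker for `service_time`
    seconds (the time the backend, e.g. PAM, takes to answer).
    """
    # Each heap entry is the time at which one worker becomes free again.
    free_at = [0.0] * pool_size
    heapq.heapify(free_at)
    waits = []
    for arrival in arrival_times:
        worker_free = heapq.heappop(free_at)   # earliest available worker
        start = max(arrival, worker_free)      # wait if nobody is free yet
        waits.append(start - arrival)
        heapq.heappush(free_at, start + service_time)
    return waits

# Hypothetical load: 10 requests per second for 10 seconds, 5 workers.
arrivals = [i / 10 for i in range(100)]
fast = queue_waits(arrivals, service_time=0.05, pool_size=5)  # healthy backend
slow = queue_waits(arrivals, service_time=5.0, pool_size=5)   # hanging backend

print("max wait, healthy backend:", max(fast))
print("max wait, hanging backend:", max(slow))
```

With the healthy backend no request waits at all. With the hanging one,
every worker is busy within half a second and later requests queue for far
longer than the 11 seconds in your log, even though the server itself is
doing nothing wrong.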

Fix the PAM stack to fail over properly, and this problem will go away.

