Intermittent failures of mod_krb5

Brian Candler b.candler at pobox.com
Fri Feb 23 13:43:38 CET 2018


Hello,

I have been setting up freeradius-3.0.15 (from PPA) under Ubuntu 16.04, 
and using mod_krb5 to authenticate against Samba4.  There are four 
freeradius servers, each with a Samba replica behind it.

Once every few days, I get alerts from nagios saying one or more RADIUS 
servers has stopped working for a few minutes, and then they recover. 
There are two nagios servers making RADIUS checks across all four RADIUS 
servers, and both nagios servers see the problem simultaneously.

I now have authentication logging turn on, and at the time of the last 
event I see:

... all OK
Thu Feb 22 17:21:51 2018 : Auth: (273) Login OK: [nagiostest] (from 
client ix_nagios port 0)
Thu Feb 22 17:22:38 2018 : Auth: (274) Login OK: [nagiostest] (from 
client wrn_mon2 port 0)
Thu Feb 22 17:23:00 2018 : Error: Unresponsive child for request 275, in 
component authenticate module krb5
Thu Feb 22 17:23:10 2018 : WARNING: (275) WARNING: Module rlm_krb5 
became unblocked
Thu Feb 22 17:23:38 2018 : Auth: (276) Login OK: [nagiostest] (from 
client wrn_mon2 port 0)
Thu Feb 22 17:24:00 2018 : Error: Unresponsive child for request 277, in 
component authenticate module krb5
Thu Feb 22 17:24:01 2018 : WARNING: (277) WARNING: Module rlm_krb5 
became unblocked
Thu Feb 22 17:24:46 2018 : Error: Unresponsive child for request 278, in 
component authenticate module krb5
Thu Feb 22 17:24:52 2018 : Info: Need 1 more connections to reach 5 spares
Thu Feb 22 17:24:52 2018 : Info: rlm_krb5 (krb5): Opening additional 
connection (5), 1 of 5 pending slots used
Thu Feb 22 17:24:52 2018 : WARNING: (278) WARNING: Module rlm_krb5 
became unblocked
Thu Feb 22 17:25:00 2018 : Error: Unresponsive child for request 279, in 
component authenticate module krb5
Thu Feb 22 17:25:01 2018 : WARNING: (279) WARNING: Module rlm_krb5 
became unblocked
Thu Feb 22 17:25:45 2018 : Info: rlm_krb5 (krb5): Closing connection 
(4), from 1 unused connections
Thu Feb 22 17:25:45 2018 : Auth: (280) Login OK: [nagiostest] (from 
client wrn_mon2 port 0)
Thu Feb 22 17:26:00 2018 : Error: Unresponsive child for request 281, in 
component authenticate module krb5
Thu Feb 22 17:26:10 2018 : WARNING: (281) WARNING: Module rlm_krb5 
became unblocked
Thu Feb 22 17:26:46 2018 : Error: Unresponsive child for request 282, in 
component authenticate module krb5
Thu Feb 22 17:26:56 2018 : Info: Need 1 more connections to reach 5 spares
Thu Feb 22 17:26:56 2018 : Info: rlm_krb5 (krb5): Opening additional 
connection (6), 1 of 5 pending slots used
Thu Feb 22 17:26:56 2018 : WARNING: (282) WARNING: Module rlm_krb5 
became unblocked
Thu Feb 22 17:27:00 2018 : Error: Unresponsive child for request 283, in 
component authenticate module krb5
Thu Feb 22 17:27:10 2018 : WARNING: (283) WARNING: Module rlm_krb5 
became unblocked
Thu Feb 22 17:27:43 2018 : Info: rlm_krb5 (krb5): Closing connection 
(1), from 1 unused connections
Thu Feb 22 17:27:43 2018 : Auth: (284) Login OK: [nagiostest] (from 
client wrn_mon2 port 0)
Thu Feb 22 17:28:00 2018 : Auth: (285) Login OK: [nagiostest] (from 
client ix_nagios port 0)
... all OK again


I'm not sure about the meaning of the message "became unblocked". 
Looking at the source of call_modsingle()

         blocked = (request->master_state == REQUEST_STOP_PROCESSING);
         if (blocked) return RLM_MODULE_NOOP;
...

         blocked = (request->master_state == REQUEST_STOP_PROCESSING);
         if (blocked) {
                 RWARN("Module %s became unblocked", 
sp->modinst->entry->name);
         }

It seems to me that the module has unexpectedly become blocked (not 
unblocked!)  But does this simply mean that the module is taking longer 
to respond than expected?  Or if not, what else does it indicate?


Apart from this: I wondered if you had any pointers to narrowing down 
the problem.  There are four servers so I can probably leave one running 
under freeradius -X until it recurs.

Now, of course the problem *could* be with Samba4.  However I have a 
separate nagios script (below) which performs LDAP queries using krb5 
authentication, and it never reports a problem - not even an occasional 
"soft" error; that suggests that KRB5 is reliable.

It considered if it was a problem with the RADIUS checking plugin 
(check_radius_adv), but this seems very unlikely given that (a) both 
nagios servers see the same problem simultaneously, and (b) freeradius 
itself is logging problems around the time of the issue.

I have:

[radiusd.conf]

max_request_time = 10

thread pool {
         start_servers = 3


-----------

#!/bin/sh
host="$1"
if [ "$host" = "" ]; then
   echo "Missing hostname"
   exit 2
fi

export KRB5_CONFIG="/tmp/krb5.config.$$"
export KRB5CCNAME="/tmp/krb5.cc_cache.$$"

if expr "$host" : ".*:" >/dev/null; then
   khost="[$host]"
else
   khost="$host"
fi

cat <<EOS >"$KRB5_CONFIG"
[realms]
AD.EXAMPLE.NET = {
   kdc = $khost
}
EOS

# 1. Can we get a ticket from this host?

kinit -k -t /etc/nagiostest.keytab nagiostest at AD.EXAMPLE.NET
if [ "$?" != "0" ]; then
   echo "KINIT FAILED"
   rm "$KRB5_CONFIG"
   exit 2
fi

# 2. Can we use this ticket to do an LDAP query to this host?

res="$(ldapsearch -LLLQ -Y GSSAPI -h "$host" -b 
"cn=users,dc=ad,dc=example,dc=net" "(samaccountname=nagiostest)" 
samaccountname userprincipalname)"
if [ "$?" != "0" ]; then
   echo "LDAPSEARCH FAILED"
   kdestroy
   rm "$KRB5_CONFIG"
   exit 2
fi
if ! expr "$res" : ".*userPrincipalName: nagiostest at ad.example.net" 
 >/dev/null; then
   echo "LDAPSEARCH BAD RESULT: $res"
   kdestroy
   rm "$KRB5_CONFIG"
   exit 2
fi

# 3. Clean up

kdestroy
rm "$KRB5_CONFIG"
echo "OK"




More information about the Freeradius-Users mailing list