Intermittent failures of mod_krb5
Brian Candler
b.candler at pobox.com
Fri Feb 23 13:43:38 CET 2018
Hello,
I have been setting up freeradius-3.0.15 (from PPA) under Ubuntu 16.04,
and using mod_krb5 to authenticate against Samba4. There are four
freeradius servers, each with a Samba replica behind it.
Once every few days, I get alerts from nagios saying one or more RADIUS
servers has stopped working for a few minutes, and then they recover.
There are two nagios servers making RADIUS checks across all four RADIUS
servers, and both nagios servers see the problem simultaneously.
I now have authentication logging turn on, and at the time of the last
event I see:
... all OK
Thu Feb 22 17:21:51 2018 : Auth: (273) Login OK: [nagiostest] (from
client ix_nagios port 0)
Thu Feb 22 17:22:38 2018 : Auth: (274) Login OK: [nagiostest] (from
client wrn_mon2 port 0)
Thu Feb 22 17:23:00 2018 : Error: Unresponsive child for request 275, in
component authenticate module krb5
Thu Feb 22 17:23:10 2018 : WARNING: (275) WARNING: Module rlm_krb5
became unblocked
Thu Feb 22 17:23:38 2018 : Auth: (276) Login OK: [nagiostest] (from
client wrn_mon2 port 0)
Thu Feb 22 17:24:00 2018 : Error: Unresponsive child for request 277, in
component authenticate module krb5
Thu Feb 22 17:24:01 2018 : WARNING: (277) WARNING: Module rlm_krb5
became unblocked
Thu Feb 22 17:24:46 2018 : Error: Unresponsive child for request 278, in
component authenticate module krb5
Thu Feb 22 17:24:52 2018 : Info: Need 1 more connections to reach 5 spares
Thu Feb 22 17:24:52 2018 : Info: rlm_krb5 (krb5): Opening additional
connection (5), 1 of 5 pending slots used
Thu Feb 22 17:24:52 2018 : WARNING: (278) WARNING: Module rlm_krb5
became unblocked
Thu Feb 22 17:25:00 2018 : Error: Unresponsive child for request 279, in
component authenticate module krb5
Thu Feb 22 17:25:01 2018 : WARNING: (279) WARNING: Module rlm_krb5
became unblocked
Thu Feb 22 17:25:45 2018 : Info: rlm_krb5 (krb5): Closing connection
(4), from 1 unused connections
Thu Feb 22 17:25:45 2018 : Auth: (280) Login OK: [nagiostest] (from
client wrn_mon2 port 0)
Thu Feb 22 17:26:00 2018 : Error: Unresponsive child for request 281, in
component authenticate module krb5
Thu Feb 22 17:26:10 2018 : WARNING: (281) WARNING: Module rlm_krb5
became unblocked
Thu Feb 22 17:26:46 2018 : Error: Unresponsive child for request 282, in
component authenticate module krb5
Thu Feb 22 17:26:56 2018 : Info: Need 1 more connections to reach 5 spares
Thu Feb 22 17:26:56 2018 : Info: rlm_krb5 (krb5): Opening additional
connection (6), 1 of 5 pending slots used
Thu Feb 22 17:26:56 2018 : WARNING: (282) WARNING: Module rlm_krb5
became unblocked
Thu Feb 22 17:27:00 2018 : Error: Unresponsive child for request 283, in
component authenticate module krb5
Thu Feb 22 17:27:10 2018 : WARNING: (283) WARNING: Module rlm_krb5
became unblocked
Thu Feb 22 17:27:43 2018 : Info: rlm_krb5 (krb5): Closing connection
(1), from 1 unused connections
Thu Feb 22 17:27:43 2018 : Auth: (284) Login OK: [nagiostest] (from
client wrn_mon2 port 0)
Thu Feb 22 17:28:00 2018 : Auth: (285) Login OK: [nagiostest] (from
client ix_nagios port 0)
... all OK again
I'm not sure about the meaning of the message "became unblocked".
Looking at the source of call_modsingle()
blocked = (request->master_state == REQUEST_STOP_PROCESSING);
if (blocked) return RLM_MODULE_NOOP;
...
blocked = (request->master_state == REQUEST_STOP_PROCESSING);
if (blocked) {
RWARN("Module %s became unblocked",
sp->modinst->entry->name);
}
It seems to me that the module has unexpectedly become blocked (not
unblocked!) But does this simply mean that the module is taking longer
to respond than expected? Or if not, what else does it indicate?
Apart from this: I wondered if you had any pointers to narrowing down
the problem. There are four servers so I can probably leave one running
under freeradius -X until it recurs.
Now, of course the problem *could* be with Samba4. However I have a
separate nagios script (below) which performs LDAP queries using krb5
authentication, and it never reports a problem - not even an occasional
"soft" error; that suggests that KRB5 is reliable.
It considered if it was a problem with the RADIUS checking plugin
(check_radius_adv), but this seems very unlikely given that (a) both
nagios servers see the same problem simultaneously, and (b) freeradius
itself is logging problems around the time of the issue.
I have:
[radiusd.conf]
max_request_time = 10
thread pool {
start_servers = 3
-----------
#!/bin/sh
host="$1"
if [ "$host" = "" ]; then
echo "Missing hostname"
exit 2
fi
export KRB5_CONFIG="/tmp/krb5.config.$$"
export KRB5CCNAME="/tmp/krb5.cc_cache.$$"
if expr "$host" : ".*:" >/dev/null; then
khost="[$host]"
else
khost="$host"
fi
cat <<EOS >"$KRB5_CONFIG"
[realms]
AD.EXAMPLE.NET = {
kdc = $khost
}
EOS
# 1. Can we get a ticket from this host?
kinit -k -t /etc/nagiostest.keytab nagiostest at AD.EXAMPLE.NET
if [ "$?" != "0" ]; then
echo "KINIT FAILED"
rm "$KRB5_CONFIG"
exit 2
fi
# 2. Can we use this ticket to do an LDAP query to this host?
res="$(ldapsearch -LLLQ -Y GSSAPI -h "$host" -b
"cn=users,dc=ad,dc=example,dc=net" "(samaccountname=nagiostest)"
samaccountname userprincipalname)"
if [ "$?" != "0" ]; then
echo "LDAPSEARCH FAILED"
kdestroy
rm "$KRB5_CONFIG"
exit 2
fi
if ! expr "$res" : ".*userPrincipalName: nagiostest at ad.example.net"
>/dev/null; then
echo "LDAPSEARCH BAD RESULT: $res"
kdestroy
rm "$KRB5_CONFIG"
exit 2
fi
# 3. Clean up
kdestroy
rm "$KRB5_CONFIG"
echo "OK"
More information about the Freeradius-Users
mailing list