Behavior of FreeRADIUS auth when SQL backend becomes inaccessible
Patrick Wagner
patrick.wagner at rga-net.de
Wed Mar 5 10:39:27 CET 2014
Hello,
we've got a problem with FreeRADIUS / rlm_sql behavior that prevents us from
enjoying a fully redundant RADIUS service, consisting of 3 FreeRADIUS
servers, probably because of compounded mistakes / misunderstandings in our
configuration that I've attached at the bottom of this post.
Setup:
- each of the 3 servers consists of a FR instance and a MySQL instance that
contains all auth data and the radacct table to store acct data
- each FR instance connects to its local SQL instance only, there's no
redundant or load-balance setup required - going that route and having FR
try to connect to the SQL instance on one of the other nodes in case of
errors with the local one would also solve our problem, but we would deem it
unnecessary if FR were able to handle a failed SQL connection "properly"
- the SQL nodes are configured as Galera multi-master cluster, so any node
is operating on the same set of data
-> this means every node should work perfectly fine on its own, which is why
we don't think we need to implement the redundancy options that FR offers.
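For comparison, the redundancy option mentioned above could look roughly
like the following in FR 2.x. This is only a sketch: "sql_node2" is a
hypothetical second rlm_sql instance pointing at the MySQL server on another
Galera node and does not exist in the configuration attached below.

```
authorize {
    redundant {
        # tried first: the local MySQL instance
        sql_localhost
        # hypothetical second instance, consulted only if sql_localhost returns fail
        sql_node2
    }
}
```

With such a block, a fail from the first module makes FR move on to the next
one instead of ending the section with that fail.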
If SQL is unavailable before we start FR, FR refuses to start and exits
immediately after it finds out it cannot connect to the local SQL instance,
with "Instantiation failed for module sql_localhost". This behavior is
perfectly fine because it means that any NAS client sending requests to that
particular FR node will find that the node does not respond, and the client
will retry the request with the other RADIUS servers it knows of and
hopefully, at least one of them will answer.
However, if we start FR while SQL is running and subsequently shut down the
SQL instance, rlm_sql returns a fail, "SQL query error; rejecting user", and FR
subsequently sends a REJECT response to any NAS request it receives, which
is not at all the behavior we'd like to see as it means that any NAS
querying this particular FR node will deny all requests instead of retrying
the request with another node. I've seen a post on this list by Alan DeKok
suggesting that "fail = ok" (or = invalid or whatever we should use in this
case) is proper unlang or at least was proper syntax back in 2007, but with
FR 2.1 (the respective version packaged with Ubuntu 10.04 LTS and CentOS
6.5) I was unable to start FR with such a statement added in either the
"authorize" or "post-auth" section; FR aborts with "Unknown action 'invalid'"
when parsing the config.
Respective thread which I hoped would give me some pointers:
http://lists.freeradius.org/pipermail/freeradius-users/2007-December/024055.html
Reply by Alan DeKok which I went on to try, unsuccessfully, with the error
message quoted above:
http://lists.freeradius.org/pipermail/freeradius-users/2007-December/024059.html
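For reference, a sketch of the per-module override syntax, assuming the 2.x
parser accepts only an action on the right-hand side ("return", "reject" or
a priority number) and not the name of another return code. That would
explain why "fail = ok" or "fail = invalid" is refused at startup. The
priority 1 below is chosen for illustration only and is untested.

```
authorize {
    sql_localhost {
        # on fail, keep processing the section at the lowest priority
        # instead of returning from it immediately
        fail = 1
    }
}
```

On its own this only keeps the section running after the failure; if no
later module returns a code with higher priority, the section can still end
with fail and the request is still rejected.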
Actually, FR's current behavior is a bit more irritating to us because we
need to use a custom huntgroup SQL query that we placed in an "update
request" section right before we (try to) query SQL for auth in the
"authorize" section, but instead of two times "fail" we get two different
error codes from the two statements when SQL is unavailable:
- "++ [request] returns notfound"
and later
- " ++[sql_localhost] returns fail"
Do we need to suppress / rewrite both of them? Suppressing the first one is
impossible, I think, because in "update request" FR apparently doesn't
differentiate between a query that was executed but returned an
empty result, and a failed query (because SQL was unavailable). The
invocation of sql_localhost, right below, does differentiate, as it returns
fail instead of notfound.
What is the proper way to allow the NAS clients to fail over to another FR
node altogether instead of getting misleading and in most cases outright
wrong information ("Invalid user" is what FR tells the NAS) from FR? Can we
make FR just not reply to the request at all in these cases, or send a
response that signals to the NAS that it should try the FR node next door
instead because this FR node is unable to make any definitive statement?
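One way the "do not reply at all" variant is sometimes expressed in unlang
is sketched below. It assumes that the packaged 2.1 builds know the internal
control attribute Response-Packet-Type with the value Do-Not-Respond, and
that "if (fail)" tests the return code of the module call directly above it;
neither has been verified against the Ubuntu 10.04 or CentOS 6.5 packages.

```
authorize {
    sql_localhost {
        # do not leave the section on fail, so the check below can run
        fail = 1
    }
    if (fail) {
        update control {
            # assumption: tells the server to send no reply for this request
            Response-Packet-Type := Do-Not-Respond
        }
        # stop processing the section without rejecting the user
        handled
    }
}
```

Whether the NAS then fails over to another FR node depends on its own
timeout and retry settings, since from its point of view the request simply
goes unanswered.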
And finally, we're forwarding exactly one particular realm to another RADIUS
server outside of our administrative control, and while any information FR
needs to be able to identify these requests as "to-be-proxied" is configured in
plaintext files and thus should continue to work if SQL fails, requests for
this realm also fail as soon as we shut down SQL, because the explicit
REJECT from SQL makes FR not even proxy the request to the home server
before telling the NAS that the Login request should be denied.
Why does FR try to run the query against SQL (i.e. its own authorize
section) at all if it knows from config that it should simply forward the
request (unmodified even, we don't use pre-proxy or post-proxy at all) and
wait for the reply of the home server for this particular realm?
The last issue doesn't occur if we put a redundant {sql_localhost; handled}
block instead of the single "sql_localhost" statement in the auth section,
but I don't know WHY it works (it probably causes side effects we don't
want), or rather, I figured that somehow the reseller request always gets
checked against the local SQL database first (which it shouldn't or at least
doesn't need to waste any CPU cycles on as it will never find anything in
there about the reseller's customers), no matter whether the SQL connection
works or doesn't work, but somehow a "notfound" from SQL leads FR to finally
proxy the request to the reseller RADIUS server and get a proper answer,
while a "fail" from SQL somehow skips the proxying step and outright denies
the request.
Obviously we don't want the proxy requests to ever get checked locally -
this would solve this issue completely.
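A possible way to express "skip the local SQL lookup for proxied requests"
is sketched below. It assumes that the realm module instance ("realmraute"
here) adds Proxy-To-Realm to the control list when it matches a realm with a
remote home server, and that the 2.1 unlang parser accepts a bare attribute
reference as an existence test. Both are assumptions, not tested against
this configuration.

```
authorize {
    realmraute
    # only consult the local database if the request is not going to be proxied
    if (!control:Proxy-To-Realm) {
        sql_localhost
    }
}
```

The "authorize" section still runs for every request, because that is where
the realm is determined; the proxying decision is only acted on after the
section has finished, which is presumably why a fail from SQL inside it can
prevent the request from ever being forwarded.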
I've attached the configuration we use currently below - if you need radius
-X output for a particular scenario, let me know.
- Patrick Wagner
__________________________
Configuration of the only sites-enabled/ site document:
authorize {
    update request {
        Huntgroup-Name := "%{sql_localhost:select groupname from radhuntgroup where nasipaddress=\"%{NAS-IP-Address}\"}"
    }
    preprocess
    chap
    mschap
    realmraute # realms are differentiated by #suffix
    eap {
        ok = return
    }
    sql_localhost
    # the following allows reseller requests to get proxied to the correct home server, but with unknown side effects
    # redundant {
    #     sql_localhost
    #     handled
    # }
    expiration
    logintime
    pap
}
authenticate {
    Auth-Type PAP {
        pap
    }
    Auth-Type CHAP {
        chap
    }
    Auth-Type MS-CHAP {
        mschap
    }
    unix
    eap
}
preacct {
    preprocess
    acct_unique
    realmraute
}
accounting {
    detail {
        fail = 1
    }
    unix
    radutmp
    sql_localhost
    attr_filter.accounting_response
}
session {
    sql_localhost
}
post-auth {
    exec
    Post-Auth-Type REJECT {
        attr_filter.access_reject
    }
}
pre-proxy {
}
post-proxy {
}
______________________
Configuration of radiusd.conf:
prefix = /usr
exec_prefix = /usr
sysconfdir = /etc
localstatedir = /var
sbindir = ${exec_prefix}/sbin
logdir = /var/log/freeradius
raddbdir = /etc/freeradius
radacctdir = ${logdir}/radacct
name = freeradius
confdir = ${raddbdir}
run_dir = ${localstatedir}/run/${name}
db_dir = ${raddbdir}
libdir = /usr/lib/freeradius
pidfile = ${run_dir}/${name}.pid
user = freerad
group = freerad
max_request_time = 30
cleanup_delay = 5
max_requests = 1024
listen {
    type = auth
    ipaddr = 2.2.2.2
    port = 1645
}
listen {
    ipaddr = 2.2.2.2
    port = 1646
    type = acct
}
listen {
    ipaddr = 192.168.5.2
    port = 1645
    type = auth
}
listen {
    ipaddr = 192.168.5.2
    port = 1646
    type = acct
}
hostname_lookups = no
allow_core_dumps = no
regular_expressions = yes
extended_expressions = yes
log {
    destination = files
    file = ${logdir}/radius.log
    syslog_facility = daemon
    stripped_names = no
    auth = no
    auth_badpass = no
    auth_goodpass = no
}
checkrad = ${sbindir}/checkrad
security {
    max_attributes = 200
    reject_delay = 1
    status_server = yes
}
proxy_requests = yes
$INCLUDE proxy.conf
thread pool {
    start_servers = 5
    max_servers = 50
    min_spare_servers = 3
    max_spare_servers = 10
    max_requests_per_server = 0
}
modules {
    $INCLUDE ${confdir}/modules/
    $INCLUDE eap.conf
    $INCLUDE sql_localhost.conf
    $INCLUDE sql/mysql/counter.conf
}
instantiate {
    exec
    expr
    expiration
    logintime
}
$INCLUDE policy.conf
$INCLUDE sites-enabled/
_________________
Configuration of proxy.conf:
proxy server {
    synchronous = no
    retry_delay = 5
    retry_count = 3
    dead_time = 120
    default_fallback = no
    post_proxy_authorize = no
}
realm resrealm@domain.tld {
    type = radius
    authhost = 10.10.10.10:1645
    accthost = 10.10.10.10:1646
    secret = respw
    nostrip
}
realm LOCAL {
    type = radius
    authhost = LOCAL
    accthost = LOCAL
}