Access-Reject (code 3) on an otherwise valid Accept packet / delayed reject when SQL module fails in post-auth
Jason Antman
jantman at oit.rutgers.edu
Fri Oct 28 16:40:07 CEST 2011
Hello,
I'm have a FreeRADIUS 2.1.10 server on CentOS... with a backend MySQL
database (logging) which is, as Alan would say, Broken - at peak usage
times it's horribly overloaded. Up until yesterday, our post-auth
section had a "sql" call not wrapped in a "redundant { sql / ok }". It
seems that when the MySQL server gets overloaded, and the sql module
fails/times out (can't find a configuration value for timeout, or a note
on what the timeout actually is), the post-auth section stops and
returns a Reject packet. Since it stops before attr_filter, it's
returning a non-RFC-compliant REJECT... i.e. an otherwise valid and
correct Access-Accept packet, but with a Code of 3 (Access-Reject) since
the sql module failed.
For the short-term, I wrapped the sql calls in post-auth and
post-auth-type reject in redundant, followed by "ok", so that users will
not get rejected because of failed logging (administrative/managerial
decision, not mine) and gave up SQL post-auth logging in favor of not
locking out users when the DB is struggling. Using the test described
below, this appears to solve the problem of the improperly formed REJECT
packets, but it still ends up with lots of "There are no DB handles to
use! skipped 0, tried to connect 0" and then lots of "Info: WARNING
Child is hung for request XXXX in component post-auth module sql."
lines... I can only assume that these are a non-critical issue (i.e. the
child is hung but will recover), but 1) I would like to set a much lower
database timeout, i.e. have the sql module fail sooner and move on, and
2) as the "Login (OK|incorrect)" messages don't have the request number,
it's difficult to associate these with a given client.
An acceptable way to reproduce this is to start radiusd, and then issue
a LOCK TABLES radpostauth WRITE; followed by bombarding the radius
server with access-requests. In my case, after the DB handles fill up, I
will start getting these invalid Reject packets out (a valid accept with
a code of 3), along with the above log messages - and "Login OK"
messages even for the clients who are rejected due to the sql failure.
So a few questions:
1) The logging for this is flat-out wrong. When the SQL module fails in
this way in post-auth (because of timeout), I'm still seeing "Auth:
Login OK" in radius.log. I guess this could be "a matter of opinion",
but the fact that I'm seeing a log message for OK but getting a reject
out on the wire, seems like a bug. IMHO, the logging of ok/incorrect
should either a) be delayed until JUST before the packet goes out,
beyond where an Accept could turn into a Reject, or b) if/when an Accept
packet is turned into a Reject and sent out on the wire, there should be
an explicit log message stating such (as opposed to just the current
"Sending delayed reject" which appears only in debug logs).
2) Is it the expected behavior that if a module in post-auth fails, the
packet will be changed to a reject without passing through
Post-Auth-Type REJECT, therefore without passing through attr_filter,
therefore sending out a non-RFC-compliant Access-Reject packet?
3) As far as I can tell, this entire issue does *not* happen when
running "radiusd -fXx"... in that case, radiusd will simply hang
indefinitely until the DB recovers (or the table is unlocked, in
testing). Does this sound right? Does anyone know if rlm_sql_mysql will
behave differently in this way in debug mode as opposed to normal daemon
mode? And does it seem correct that in debug mode, there doesn't seem to
be a timeout for mysql? I was, however, able to replicate the issue when
running "radiusd -fxx", i.e. no single threaded mode.
4) As an example of log output, see the lines below. I know that the
15:19:30 "Login OK" for client mc4k-121 port 8195 actually left as an
Access-Reject packet (code 3), which still had User-Name, Callback-Id,
Session-Timeout, Termination-Action, and two VSAs set, as the proper
Accept for that user should. In certain instances with this problem, the
SQL logging was queued somewhere (presumably on the MySQL side, waiting
for the unlock) and actually made it to the DB, listed as an Accept
packet and all, after radiusd sent out the incompliant Reject.
However, of concern in the logs below is the fact that 3 of those Login
OK lines (outer tunnel) were actually the invalid Access-Reject packets
on the wire, but there's no indication of that here. While there is a
vague "There are no DB handles to use" message, that doesn't specify
which request it relates to (hence no way to tell which Login OK was
actually rejected), nor does it explicitly say that the request will be
rejected.
### BEGIN LOG SNIP
Thu Oct 27 15:19:29 2011 : Auth: Login OK: [usernameRedacted] (from
client mc5k-125 port 8255 cli 00:22:FB:8B:58:04 via TLS tunnel)
Thu Oct 27 15:19:29 2011 : Auth: Login OK: [usernameRedacted] (from
client mc5k-125 port 8255 cli 00:22:FB:8B:58:04)
Thu Oct 27 15:19:29 2011 : Auth: Login OK: [usernameRedacted] (from
client mc5k-125 port 10287 cli 58:B0:35:41:46:A1 via TLS tunnel)
Thu Oct 27 15:19:29 2011 : Auth: Login OK: [usernameRedacted] (from
client mc5k-125 port 10287 cli 58:B0:35:41:46:A1)
Thu Oct 27 15:19:29 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8203 cli 1C:65:9D:16:2D:EB via TLS tunnel)
Thu Oct 27 15:19:29 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8203 cli 1C:65:9D:16:2D:EB)
Thu Oct 27 15:19:29 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8289 cli 70:F1:A1:8B:F2:F8 via TLS tunnel)
Thu Oct 27 15:19:29 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8289 cli 70:F1:A1:8B:F2:F8)
Thu Oct 27 15:19:29 2011 : Auth: Login OK: [usernameRedacted] (from
client mc5k-126 port 6243 cli 88:53:2E:0D:F2:D7 via TLS tunnel)
Thu Oct 27 15:19:29 2011 : Auth: Login OK: [usernameRedacted] (from
client mc5k-126 port 6243 cli 88:53:2E:0D:F2:D7)
Thu Oct 27 15:19:29 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8288 cli 88:9F:FA:77:F7:EE via TLS tunnel)
Thu Oct 27 15:19:29 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8288 cli 88:9F:FA:77:F7:EE)
Thu Oct 27 15:19:29 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8209 cli 8C:58:77:4C:78:CE via TLS tunnel)
Thu Oct 27 15:19:29 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8209 cli 8C:58:77:4C:78:CE)
Thu Oct 27 15:19:29 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8202 cli F0:B4:79:45:14:DD via TLS tunnel)
Thu Oct 27 15:19:29 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8202 cli F0:B4:79:45:14:DD)
Thu Oct 27 15:19:29 2011 : Info: rlm_sql (sql): There are no DB handles
to use! skipped 0, tried to connect 0
Thu Oct 27 15:19:29 2011 : Auth: Login OK: [usernameRedacted] (from
client mc5k-126 port 6278 cli 18:E7:F4:F6:C8:E6 via TLS tunnel)
Thu Oct 27 15:19:29 2011 : Auth: Login OK: [usernameRedacted] (from
client mc5k-126 port 6278 cli 18:E7:F4:F6:C8:E6)
Thu Oct 27 15:19:30 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8273 cli 00:26:C6:83:50:82 via TLS tunnel)
Thu Oct 27 15:19:30 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8273 cli 00:26:C6:83:50:82)
Thu Oct 27 15:19:30 2011 : Info: rlm_sql (sql): There are no DB handles
to use! skipped 0, tried to connect 0
Thu Oct 27 15:19:30 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8295 cli CC:08:E0:EA:1A:23 via TLS tunnel)
Thu Oct 27 15:19:30 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8295 cli CC:08:E0:EA:1A:23)
Thu Oct 27 15:19:30 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8195 cli 00:1E:C2:AB:03:07 via TLS tunnel)
Thu Oct 27 15:19:30 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8195 cli 00:1E:C2:AB:03:07)
Thu Oct 27 15:19:30 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8219 cli 00:21:E9:DE:F2:3A via TLS tunnel)
Thu Oct 27 15:19:30 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8219 cli 00:21:E9:DE:F2:3A)
Thu Oct 27 15:19:30 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8272 cli 00:25:56:51:82:C8 via TLS tunnel)
Thu Oct 27 15:19:30 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8272 cli 00:25:56:51:82:C8)
Thu Oct 27 15:19:30 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8280 cli E4:CE:8F:41:DD:82 via TLS tunnel)
Thu Oct 27 15:19:30 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8280 cli E4:CE:8F:41:DD:82)
Thu Oct 27 15:19:30 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8300 cli 58:1F:AA:69:48:49 via TLS tunnel)
Thu Oct 27 15:19:30 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8300 cli 58:1F:AA:69:48:49)
Thu Oct 27 15:19:30 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 10241 cli 04:0C:CE:D7:B3:28 via TLS tunnel)
Thu Oct 27 15:19:30 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 10241 cli 04:0C:CE:D7:B3:28)
Thu Oct 27 15:19:31 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8301 cli 8C:58:77:66:0C:AE via TLS tunnel)
Thu Oct 27 15:19:31 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8301 cli 8C:58:77:66:0C:AE)
Thu Oct 27 15:19:31 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8202 cli F0:B4:79:45:14:DD via TLS tunnel)
Thu Oct 27 15:19:31 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8202 cli F0:B4:79:45:14:DD)
Thu Oct 27 15:19:31 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8284 cli 34:51:C9:2E:05:DD via TLS tunnel)
Thu Oct 27 15:19:31 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8284 cli 34:51:C9:2E:05:DD)
Thu Oct 27 15:19:31 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8271 cli 00:22:69:68:87:B4 via TLS tunnel)
Thu Oct 27 15:19:31 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8271 cli 00:22:69:68:87:B4)
Thu Oct 27 15:19:31 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8303 cli 00:21:5C:0C:9C:61 via TLS tunnel)
Thu Oct 27 15:19:31 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8303 cli 00:21:5C:0C:9C:61)
Thu Oct 27 15:19:31 2011 : Auth: Login OK: [usernameRedacted] (from
client mc4k-121 port 8213 cli 00:25:00:4D:5D:EA via TLS tunnel)
### END LOG SNIP
So to summarize what I'm asking for/about:
1) In instances of a "delayed reject" caused by a failure in post-auth,
would it be possible to have this clearly logged to radius.log, either
instead of "Login OK", or something along the lines of "WARNING -
delayed reject of previous request (request id, username, Calling
Station Id, client, port)"?
2) Is there any way to maintain attr_filter.access_reject even with a
delayed reject because of module failure in post-auth, so that these
packets are still RFC-compliant?
3) Is there any way to configure the rlm_sql_mysql timeout, or whatever
is causing this module to fail when the table is locked?
4) Can I get some clarification on the "There are no DB handles to use!
skipped 0, tried to connect 0" and "Info: WARNING Child is hung for
request XXXX in component post-auth module sql" radius.log messages, and
the severity of these (i.e. does either of them definitively mean that a
request was rejected?)
Thank you for any advice/insight/comments/suggestions,
Jason Antman
Rutgers University, NJ, USA
More information about the Freeradius-Users
mailing list