Behavior of FreeRADIUS auth when SQL backend becomes inaccessible

Wed Mar 5 19:11:46 CET 2014

Alan DeKok wrote:
> Patrick Wagner wrote:
>> However, if we start FR while SQL is running and subsequently shut down the SQL
>> instance, rlm_sql returns a fail, “SQL query error; rejecting user”, and
>> FR subsequently sends a REJECT response to any NAS request it receives,
>> which is not at all the behavior we’d like to see as it means that any
>> NAS querying this particular FR node will deny all requests instead of
>> retrying the request with another node.
>    In 2.2.3, you can use the "do_not_respond" policy.
>
> 	sql
> 	if (fail) {
> 		do_not_respond
> 		
> 	}
This is entirely equivalent to the implementation Arran has suggested, 
enclosing sql and do_not_respond in a "redundant" block, correct?
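
A minimal sketch of that "redundant" variant, untested and assuming the 
stock "do_not_respond" policy from policy.conf and an SQL module instance 
named "sql":

	authorize {
		# "redundant" moves on to the next entry only when the
		# previous one returns "fail"; any other return code
		# leaves the block.
		redundant {
			sql
			do_not_respond
		}
	}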

>> And finally, we're forwarding exactly one particular realm to another
>> RADIUS server outside of our administrative control, and while any
>> information FR needs to be able to identify these requests as
>> "to-be-proxied" is configured in plaintext files and thus should
>> continue to work if SQL fails, requests for this realm also fail as soon
>> as we shut down SQL, because the explicit REJECT from SQL makes FR not
>> even proxy the request to the home server before telling the NAS that
>> the Login request should be denied.
>    I welcome suggestions for a better way to do things.
>
>    Since you're doing local authentication *and* proxying, you should be
> aware that they both run in the same RADIUS server.  The requests also
> come from one NAS.  So adding a "do_not_respond" policy to the local auth
> policy makes the NAS think that the *entire server* is down.  It then
> may not send *any* requests to the server.
>
>    That's why FR defaults to sending a reject.  The NAS thinks that the
> server is alive, and will continue to send it requests.  Including
> requests which need to be proxied.
>
>    There is *no* way around this problem.  There is *no* solution to it.
>   RADIUS simply isn't capable of that fine-grained level of distinction
> you need.  If you expect it to be capable of that, you're wrong.
OK, understood and agreed with. No, we don't need that level of 
distinction, I'm perfectly comfortable with ALL requests failing over to 
another RADIUS server if the SQL backend on one of them fails.
I was just curious as to why freeradius would make this kind of 
distinction between "fail" and other return codes without even 
considering asking the home server configured for the realm. I hadn't 
realised that falling back to REJECT instead of "do_not_respond" on a 
module error (="fail") is indeed the safer default. I had only ever 
thought about the issue from the angle of multiple RADIUS servers 
providing fail-over for each other. In that case, a "failed" RADIUS 
server pretending to know the answer and replying to a NAS client with a 
valid but factually incorrect RADIUS reply didn't make sense to me. But 
Arran and you cleared that up for me, thanks.

>> Why does FR try to run the query against SQL (i.e. its own authorize
>> section) at all if it knows from config that it should simply forward
>> the request (unmodified even, we don't use pre-proxy or post-proxy at
>> all) and wait for the reply of the home server for this particular realm?
>    Because that's what you told it to do.  It processes the "authorize"
> section from top to bottom.  Read the debug log, this should ALL be clear.
>
>    If you want it to avoid the SQL query when proxying, configure it to
> do that:
>
> 	authorize {
> 		realm
> 		if (updated) {
> 			handled
> 		}
>
> 		... everything else ...
>
> 	}
[...]
Yes, this makes sense now. I was confused as to why it wouldn't evaluate 
realmraute further down the config anymore in cases where SQL returned a 
"fail". It's all been fixed in our config and running fine now, at least 
as far as I'm able to test.
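
A rough sketch of how the two suggestions from this thread might fit 
together, not a copy of our actual config. It assumes a realm module 
instance named "realm" (as in Alan's example), an SQL instance named 
"sql", and the stock "do_not_respond" policy:

	authorize {
		# Realm lookup comes first, so proxied requests never
		# reach the SQL query.
		realm
		if (updated) {
			handled
		}

		# Local users only: stay silent when SQL is down, so the
		# NAS fails over to another RADIUS server.
		redundant {
			sql
			do_not_respond
		}
	}

With this ordering, a request for the proxied realm leaves the section 
before sql is ever called. The caveat Alan describes still applies: once 
the server stops answering local requests, the NAS may treat the whole 
server as down, proxied realm included.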

- Patrick Wagner