Handling unreliable proxy partners

Wed May 19 19:37:14 CEST 2021

On May 19, 2021, at 1:20 PM, Paul Moser via Freeradius-Users <freeradius-users at lists.freeradius.org> wrote:
> We'd also like a manual mechanism that our support team can trigger to cover other failure scenarios, e.g. the remote RADIUS server incorrectly returning Access-Reject for all valid users, and those scenarios that we haven't been able to think of but that will inevitably occur at the most inconvenient of times.
> 
> My first attempt at this was that the support team could use radmin to set the home servers to dead, which would mean packets were routed via the fallback virtual server. I initially thought this worked as a solution, but if FreeRADIUS is doing status checks against the remote servers then it will automatically bring them back into service as long as the status check requests are being answered, which is not what you want if, say, the remote partner is responding with Access-Reject even to valid users.

  Yes.  The server tries very hard to proxy packets.

> One idea I haven't explored is having two copies of each virtual server, in different files, one for the normal situation and one for the failure situation and switching which one to use using symlinks and radmin to reload the configuration.

  That's probably too complex.  I wouldn't recommend it.

> What I have come up with so far is, within a virtual server's pre-proxy section, to use the exec module to call a simple shell script that checks for the presence of flag files indicating which partners, if any, are in a bad state. The support team are responsible for creating these files. If any flag files are present, the script adds a RADIUS attribute for each, with the value indicating which partner. In the pre-proxy section I can then check for this attribute and value, and if it indicates that the partner the virtual server is handling is in a failure state, call accept from the always module, which will cancel the proxying attempt and send an Access-Accept. We can also call any policy that would get called in the fallback virtual server or in Post-Proxy-Type Fail-Authentication if we want common RADIUS attributes to be returned in the response to apply some sort of QoS restriction.
> 
> The rlm_exec documentation states using exec is very slow and something like the perl module would be more appropriate for a live environment. Before I carry on down the path of performance testing this and trying perl/python/rest/custom C module does anyone have any thoughts/observations or alternative suggestions?

  Use "rlm_always".   From the Changelog for 3.0.22:

	* New xlat for setting status of rlm_always instances and new
	  resource-check example virtual server for manipulating control flow
	  in unlang policies based on status of some external resource.
	  Patches from Terry Burton.

https://github.com/FreeRADIUS/freeradius-server/blob/v3.0.x/raddb/mods-available/always#L35
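
  As a rough sketch of how that xlat could be used, going by the comments in that file: expanding the instance name with a status as its argument changes the rcode that the instance returns from then on.  The instance name "server_x_status" and the use of Tmp-String-0 here are assumptions for illustration, not something taken from the shipped example.

update control {
	# Expanding the xlat is what changes the status.  The
	# Tmp-String-0 assignment is only somewhere to put the result.
	&Tmp-String-0 := "%{server_x_status:fail}"
}

  The same expansion with "ok" as the argument would put the instance back to normal.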

  You can also use "radmin" to poke the "always" configuration live:

radmin> set module config always rcode fail

  The idea would be to create an instance of the "always" module, for each home server / pool you're proxying to.  You can then do something like:

always server_x_status {
	rcode = ok
}

  and then

	server_x_status
	if (ok) {
		proxy to server x
	}

  And then you can set the rcode of that instance to ok / fail, depending on whatever you want.
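
  Spelled out a little more, this could look roughly like the following in a v3 virtual server.  Treat it as a sketch under assumptions: the realm name "server_x" is hypothetical and would have to exist in proxy.conf, and accepting the user locally in the failure branch is just one possible policy.  Since a "fail" rcode normally stops processing of the section, the module call overrides that action so the "else" branch is reached.

authorize {
	# Returns whatever rcode the instance is currently set to.
	# "fail = 1" keeps processing instead of rejecting on fail.
	server_x_status {
		fail = 1
	}
	if (ok) {
		# Partner is considered healthy: proxy as usual.
		# "server_x" is a hypothetical realm from proxy.conf.
		update control {
			&Proxy-To-Realm := "server_x"
		}
	}
	else {
		# Partner is flagged as bad: do not proxy, accept locally.
		# Any QoS reply attributes could be added here too.
		update control {
			&Auth-Type := Accept
		}
	}
}

  The support team could then flip the instance from the command line, reusing the radmin command above with the instance name:

radmin> set module config server_x_status rcode fail
radmin> set module config server_x_status rcode ok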

  Alan DeKok.