Robust Authentication Proxying

Fri Jul 10 22:12:55 CEST 2009

Alan DeKok wrote:
> Philip Molter wrote:
>> Thanks for your patience with this.  I'm migrating from an old RADIUS
>> platform that supports this behavior to freeradius, and I'm just trying
>> to make sure I get everything working.
> 
>   What behavior?  Failover from one home server to another?  FreeRADIUS
> does this already.
> 
>   I think what you want is to have the re-transmits switch from one home
> server to another *before* the first one has been marked dead.  This is
> difficult to do automatically.  Something like "send retransmits to a
> backup server" is possible, but can cause other problems.
> 
>   But you can use "radmin" to do this manually.
> 
>> What I really want is just, instead of the request being marked as
>> failed when one of the home servers doesn't respond, for the proxy
>> subsystem to just try sending the request to another configured home
>> server.
> 
>   But it already does that.  Run the server, and watch how it behaves.
> As I said before, the difficulty is determining *when* to do this failover.
> 
>>  If the proxy has tried sending a request to every non-zombie
>> home server in the list and still hasn't gotten anything, then it can
>> mark the request as failed.
> 
>   Sorry, but it takes time to determine that a home server has failed.
> By the time this decision has been made for 2-3 home servers, 30 seconds
> have usually passed, and the NAS has given up on the request.
> 
>> The way I originally thought it was going to work is similar to how
>> modules are load-balanced.  If I have five SQL servers loaded through 5
>> named SQL module configs, it will try the first, then the second, then
>> the third until one of them returns success.  It would be great if the
>> proxy load-balancing could work the same way.
> 
>   Unless I'm really missing something, it already does this.  Just
> configure "type = load-balance" in the home server pool.
> 
>   Have you done this?

Yes, this is the configuration I'm currently running, and it's not 
working for me.  I have a radclient sending a request, retrying 10 times 
on a 5-second timer, and after 10 retries, it still hasn't gotten a 
response.  After the second retry, the proxy has marked the server as at 
least a zombie and started status-checks, but every retransmit after 
that is getting a cached result of no response.

>   What do you expect the proxy to do with requests sent to a home server
> that *might* be down?  How should the proxy decide that the home server
> is down?  Be specific.  Draw flow diagrams...

This is what I want to happen:

client req ->  proxy
                proxy req ->  home server #1
client ret ->  proxy
                proxy ret ->  home server #1
               [proxy fails home server #1 for lack of response]
client ret ->  proxy
                proxy req ->  home server #2
                proxy <- resp home server #2
client <- resp proxy

This is what is happening with my post-proxy config:

client req ->  proxy
                proxy req -> home server #1
client ret ->  proxy
                proxy ret -> home server #1
               [proxy fails home server #1 for lack of response]
client ret ->  proxy
               [proxy detects retransmit, does nothing]
client ret ->  proxy
               [proxy detects retransmit, does nothing]
client ret ->  proxy
               [proxy detects retransmit, does nothing]
...

This is what happens without a post-proxy config:

client req ->  proxy
                proxy req -> home server #1
client ret ->  proxy
                proxy ret -> home server #1
               [proxy fails home server #1 for lack of response]
client <- rej  proxy

>   If you can come up with a better algorithm, then by all means we'll
> implement it.  But coming up with an algorithm that works *well* from
> limited information is hard.
> 
>   The issue with your configuration is that you are trying valiantly to
> game the system.  You're setting the timers *way* too low, and
> marking the requests as failed too early.  When the NAS retransmits, you
> claim you want the proxy to fail over to another server... AFTER you've
> already told it to give up on the request.

My config is not marking any request as failed.  If I do not configure 
anything for Post-Proxy-Type, I get back an Access-Reject right when the 
first home server fails.  There is no failover.  The comments in 
proxy.conf make that clear:

#  If the home server doesn't respond to the request within
#  this time, this server will consider the request dead, and
#  respond to the NAS with an Access-Reject.

In other words, if the home server that the load-balance pool happens to
choose doesn't respond to my request, tough luck.  I might have 19 other
servers configured that are up, but the request I just sent still gets an
Access-Reject.  The Post-Proxy-Type handler is just a hack to at least
avoid sending back an Access-Reject, which breaks the whole process.

>   Your configuration is contradicting your stated needs.  Fix one or the
> other so that there is no contradiction.

Okay, so I obviously do not understand how I can tweak response_window 
and zombie_period to make sure that requests that can be serviced by 
many possible RADIUS home servers do not return an Access-Reject when 
one of those home servers does not respond.

Here are my stated needs.

The client sends a request to the proxy.  If a home server does not 
respond within a short period of time to the request, a second home 
server is chosen.  If the second home server does not respond to the 
same request, then a third is chosen.  This continues until all possible 
home servers are exhausted.  At that point, an Access-Reject packet is 
sent back to the client.  Otherwise, the response from the home server 
is sent back to the client.
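
To make the needs above concrete, here is a minimal Python sketch of the
failover loop I have in mind.  It is only a model of the desired
behavior, not FreeRADIUS code or configuration: the names
proxy_with_failover, send, fake_send and per_server_timeout are all
hypothetical, and the sketch assumes send() returns None when a home
server does not answer in time.

```python
from typing import Callable, Optional, Sequence

# Hypothetical constant: stands in for the reject sent back to the client.
ACCESS_REJECT = "Access-Reject"


def proxy_with_failover(
    request: str,
    home_servers: Sequence[str],
    send: Callable[[str, str, float], Optional[str]],
    per_server_timeout: float = 2.0,
) -> str:
    """Try each home server in order; reject only when all have failed."""
    for server in home_servers:
        # send() is assumed to return the reply, or None if the server
        # did not answer within per_server_timeout seconds.
        reply = send(server, request, per_server_timeout)
        if reply is not None:
            return reply
    # Every candidate home server was tried without a response.
    return ACCESS_REJECT


def fake_send(server: str, request: str, timeout: float) -> Optional[str]:
    """Stand-in transport for illustration: only "home2" ever answers."""
    return "Access-Accept" if server == "home2" else None
```

With fake_send, a pool of home1, home2, home3 yields the reply from
home2, while a pool containing only home1 and home3 yields an
Access-Reject after both have been tried.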

How do I configure that?  It doesn't seem to matter what I set
response_window or zombie_period to: once the first home server fails to
respond, an Access-Reject (or nothing, if I configure a post-proxy
handler) is returned to the client.  My client is not going to retry the
request if it gets an Access-Reject, so I need the proxy to retry it.

Is that possible?

Philip