Robust Authentication Proxying

Philip Molter hrunting at hrunting.org
Sat Jul 11 22:53:43 CEST 2009


On Jul 11, 2009, at 2:23 PM, Alan DeKok wrote:

> Philip Molter wrote:
>> I did try that.  It did not do what I was attempting to do.
>
>  Hmm... "it didn't work".

I apologize for not being more specific.  The retransmits kept getting
sent to the same failed home server rather than the failed home server
being marked dead and the retransmits going to a different home
server.  I have figured out why.  The minimum zombie_period is 20,
hard-coded in realms.c.  The zombie_period of 5 you recommended, which
I tried, was not taking effect, which led to my 20 second test timeout
kicking in before the proxy had waited long enough to actually mark
the server as dead.  (The 5th retransmit would have triggered the
failover, but the proxy only got 3 retransmits.)
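
To illustrate the effect of that floor, here is a minimal sketch in
Python.  It is not the code from realms.c; the function names are made
up for illustration, and it ignores response_window and the retransmit
timing.  It only shows that a configured value of 5 is silently raised
to 20, so a 20 second test timeout expires before the server could be
marked dead.

```python
# Hypothetical model of the clamping described above; not the realms.c code.
MIN_ZOMBIE_PERIOD = 20  # seconds; the hard-coded floor reported in this message


def effective_zombie_period(configured: int) -> int:
    """Return the value the proxy would actually use, given the floor."""
    return max(configured, MIN_ZOMBIE_PERIOD)


def marked_dead_before_timeout(configured: int, test_timeout: int) -> bool:
    """True if the zombie period can elapse strictly before the test gives up.

    Simplification: the only delay modelled is the zombie period itself.
    """
    return effective_zombie_period(configured) < test_timeout


if __name__ == "__main__":
    print(effective_zombie_period(5))         # configured 5, floor applies
    print(marked_dead_before_timeout(5, 20))  # 20 second test timeout
```

With a longer test timeout (say 30 seconds) the same check passes, which
matches what I saw once I understood the floor.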

And that does exactly what I want for this case.  I can provide a  
patch that does the following things:

a) allows lower values than 5 for response_window and 20 for  
zombie_period (I will not change recommendations)
b) makes the post_proxy_fail_handler optional on a pool-by-pool basis

Does that seem acceptable?  You seem hesitant to accept a solution
that you do not think could be used by more than a few people.  This
solution is going to be minimally invasive to the code.

Also, is there a config with which the retransmit proxy failover code
could actually be triggered without the patch?  I cannot see it.
Failover only happens after the response_window is exceeded, and if
the response_window is exceeded, the original request is replied to
with an Access-Reject message, which means any retransmits will
never reach the REQUEST_PROXIED state in received_retransmits() after
the response_window is exceeded.  Am I reading that correctly?
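
My reading can be sketched as a toy model in Python.  This is only an
illustration of the logic as I understand it, not the server's code.
REQUEST_PROXIED is the state named above; REQUEST_REJECTED, the class,
and the reject_on_timeout flag are hypothetical names, the last one
standing in for making the post_proxy_fail_handler optional.

```python
from enum import Enum, auto


class State(Enum):
    REQUEST_PROXIED = auto()   # state name taken from the message above
    REQUEST_REJECTED = auto()  # hypothetical: Access-Reject already sent


class ProxiedRequest:
    def __init__(self, response_window, reject_on_timeout=True):
        self.response_window = response_window
        self.reject_on_timeout = reject_on_timeout  # hypothetical per-pool flag
        self.state = State.REQUEST_PROXIED

    def received_retransmit(self, elapsed):
        """Return True if this retransmit would be sent to another home server."""
        window_exceeded = elapsed > self.response_window
        # Current behaviour as I read it: exceeding the window rejects the request.
        if window_exceeded and self.reject_on_timeout:
            self.state = State.REQUEST_REJECTED
        # Failover needs both a still-proxied request and an exceeded window.
        return self.state is State.REQUEST_PROXIED and window_exceeded


def can_fail_over(response_window, retransmit_times, reject_on_timeout=True):
    req = ProxiedRequest(response_window, reject_on_timeout)
    return any(req.received_retransmit(t) for t in retransmit_times)


if __name__ == "__main__":
    times = [2, 4, 6, 8, 10]  # seconds at which the NAS retransmits
    print(can_fail_over(5, times))                           # current behaviour
    print(can_fail_over(5, times, reject_on_timeout=False))  # with the option
```

In this model the two conditions for failover are mutually exclusive
unless the reject step can be switched off, which is the point of (b).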

>> Does that seem like a method that can work?  Again, not to replace
>> anything but to supplement it.
>
>  It's an *additional* method, rather than a *better* method.  It
> requires additional code, additional state machine checks, and as
> such... I'm biased against it.  It's just too much like a
> site-specific hack for it to be integrated into the main distribution.
>
>  Adding a *better* method is the preferred approach.  It's OK to
> change existing behavior, so long as it's for the better.

You had the retransmit failover code already written.  It seems not  
much needs to be done to allow a pool configuration to continue on  
after the response_window has been exceeded.  Let me submit a patch
and you can see what you think.

>> Many NASes can use an internal user cache as a backup to a
>> non-responding or slowly-responding RADIUS server.  If the proxy
>> returns an actual Access-Reject, the NAS accepts that and says the
>> request is invalid.  If the proxy returns nothing, the NAS can say,
>> "Well, my RADIUS server is down, but I have this record for the same
>> user/pass in my cache and previously, it received an Access-Accept.
>> Let me accept this request."  That does not break any RADIUS
>> specifications I know of,
>
>  Nonsense.  NASes that cache authentication credentials are
> *completely* outside of the RADIUS specification.  It's like adding a
> jet engine to a car.  There's no law *preventing* it, so it must be
> legal, right?

Well, there's nothing in the RADIUS specification that describes or  
even recommends how a lack of response must be handled by the NAS.   
You make it sound as if the NAS is doing something illegal by using a  
previous cached accept.  It's not.  The NAS can implement whatever  
logic it wants, and that particular feature is one that leads to a  
better user experience.  Just because you think a failure-to-contact  
is the same as a denial does not mean that other vendors have not come  
up with solutions that can work around it.

RFC 2607 is clear that the proxy should not respond to the client  
unless it receives a reply from the home server.  At the very least,  
returning a rejection is not an accurate portrayal of the state of the  
authentication.  It would be a better representation to just let it
time out, but I understand returning the rejection so that the NAS can
short-circuit the transaction more quickly.


