Robust Authentication Proxying

Alan DeKok aland at deployingradius.com
Sat Jul 11 17:15:48 CEST 2009


  I think there's a fundamental disconnect here.  I'm trying to explain
that RADIUS is an imperfect protocol.  You're trying to find ways of
configuring FreeRADIUS to work around those imperfections.

  My suggestions are:

  1) Realize that RADIUS is imperfect.  If a home server fails, there
will *always* be a request that is lost, rejected, timed out, etc.  The
client WILL fail authentication and disconnect the user when this
happens.  This is how RADIUS works.

  2) Proxy fail-over DOES work in the server.  Maybe not exactly the way
you want... but I recall asking you for specific suggestions as to how
to make it better, and getting... not much.

  3) If you want to try source code mods, go to src/main/event.c.  Look
in the function no_response_to_proxied_request().  Find the line:

	post_proxy_fail_handler(request);

  Delete it.  Re-compile && re-install radiusd.  Then try the fail-over
tests again.  It SHOULD cause fail-over to backup home servers for one
request.  Do NOT configure the "do_not_respond" policy.  Try setting
"response_window = 5" and "zombie_period = 5".

  4) Try setting the home server pool type to "load-balance" (again, as
I have suggested).  It WILL still fail over from one server to another.
 But the "load-balance" portion will spread the load MUCH more evenly
across all home servers, and there will be FEWER failed requests when a
home server dies.
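
  As a rough sketch of the settings named in points 3 and 4, a
proxy.conf fragment might look like the one below.  The server names,
addresses, secret, and realm are placeholders invented for this example.
Only the pool type and the two timer values come from the suggestions
above, and no "do_not_respond" policy is configured anywhere.

	# proxy.conf (sketch; names, addresses, and secret are placeholders)
	home_server hs_one {
		type = auth
		ipaddr = 192.0.2.10
		port = 1812
		secret = changeme
		# seconds to wait for a reply before giving up on a request
		response_window = 5
		# seconds a silent server stays "zombie" before being marked dead
		zombie_period = 5
	}

	home_server hs_two {
		type = auth
		ipaddr = 192.0.2.11
		port = 1812
		secret = changeme
		response_window = 5
		zombie_period = 5
	}

	home_server_pool example_pool {
		# spread requests across the pool; it still fails over
		type = load-balance
		home_server = hs_one
		home_server = hs_two
	}

	realm example.org {
		auth_pool = example_pool
	}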

  I hope those suggestions will be followed.  They are your best hope
for resolving this issue.

  And now for a detailed response to the rest of your comments.

> Yes, if the NAS sends another separate request with a different ID, it
> will be proxied to a different home server, but that does not help the
> poor guy who had the hard luck of his request hitting the bad home
> server.  He will get an error message.  He will have to retry or call
> support or whatever.

  The suggestions above will help.  The code changes MAY help.

  If you want the users to always get a response, ensure that there are
never any failures in your system.  No amount of poking FreeRADIUS will
fix the problem that home servers go down.

> Yes, a subsequent, different request will go
> to a different home server, but, again, I want to use the proxy to
> provide smarter resiliency across a pool of servers.

  So... use "load-balance", as I suggested.  That can provide a slightly
better response in some situations.

> I guess I do not see those as negatives.  That is exactly what I want to
> happen.  RADIUS network traffic is tiny.  The system load created by
> sending multiple requests to a home server or a bunch of home servers is
> minimal. I am not seeing how you are adding any more load when instead,
> the proxy sends back an Access-Reject, which, in the best case scenario,
> will result in the end-client re-authenticating, generating yet another
> request.

  You will be sending packets to TWO home servers, rather than one.
This might be fine in your situation.  It is definitely not fine in
other situations.

  FreeRADIUS is designed to work in a wide variety of environments.
This means that it might NOT work exactly the way you demand.  The
solution is simple: you have source code.  Fix it.  If we add a fix that
will make *your* situation work, it is likely to break *other* people's
networks.

  We can't take that risk.

> Your argument that the RADIUS server cannot handle a retry does not hold
> water to me, 

  That isn't what I said.  That's part of the disconnect in communication.

> Okay, AT BEST you get 3-6 different home servers in a 30-second period. 
> Right now, AT BEST I get 1.  Which method is more resilient?  Which
> method results in no false rejections being returned to the NAS?  

  Which method is working in 100,000 deployments?

  You should note that I asked you for *specific* suggestions for a
better algorithm.  Your response was "I want it to fail over sooner".
That is unhelpful.

  If you are so set on demanding something better, then offer *concrete*
suggestions for how to fix it.  Look at the code.  It's available,
commented, and reasonably clean.

  Come up with a *better* method, and we'll implement it.  The current
repetition of "it's bad and I want it to work differently" isn't useful.

>  I have my NASes configured to retry for
> up to 60 seconds, once every 2 seconds.  They will retry 30 times.  It
> is more important to me that authentication requests succeed, even if
> they succeed slowly.  It sounds to me like freeradius is making
> assumptions about how NASes should work, and as a result, reducing the
> flexibility it provides.

  That's what "max_request_time" is for.  If your NAS is retransmitting
for 60 seconds, set "max_request_time" to 60 or 62 seconds.
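
  For the NAS described above (a retry every 2 seconds for up to 60
seconds), that would be a radiusd.conf entry along these lines.  This is
a minimal sketch, and the value is the only part taken from this
discussion.

	# radiusd.conf (sketch)
	# match the NAS's total retransmission period, in seconds
	max_request_time = 60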

>>  If you want the proxy to fail over, send it more than ONE request at a
>> time (like a normal proxying system), and do NOT configure the "do not
>> respond" policy.
> 
> So my NAS now has to send two separate requests for the same
> authentication, and pick the one that does not come back with an
> Access-Reject?

  That is not what I meant.  You seemed to be claiming that it NEVER
failed over.  I pointed out that it does fail over, and gave an example
of when and how it fails over.

> Again, I am not arguing that the proxy will not fail over.

  Well... that wasn't at all clear from your messages.

> To a NAS, there is a big difference between a timeout and a reject.  If
> it does not get a response, a NAS will typically handle the client
> differently than if it gets an explicit rejection.

  Huh?  How?  Will it accept the user?  Will it let them in?  Will it
give them some "minimal" service, even if they weren't authenticated?

  That violates all RADIUS specifications, best practices, and network
security guidelines I'm aware of.

>  Right now, a timeout
> event from the home server results in an explicit rejection (unless I
> configure it not to send that reject).  It IS possible to get the number
> down to zero, because I have used RADIUS software that does it.  The
> only time it should ever be non-zero is if all home servers that can
> possibly be tried in a given window (which might not be all of them, but
> is most likely going to be more than one of them) fail to respond.  Like
> I said, I am trying to migrate to freeradius for some other features.  I
> have used two other proprietary RADIUS server software packages that
> implement this behavior.

  Well... offer *specific* suggestions for changes to FreeRADIUS that
will help implement this.  Try the suggested patches, and see if they help.

> I also understand that Accept-Challenge can complicate the proxying, but
> that is solvable as well with standard state tracking.

  That I disagree with.  EAP makes re-routing proxied Access-Challenges
pretty much impossible.  (Except in certain rare situations.)

> I know that networks are imperfect.  The answer to that imperfection is
> to retry, not to give up.  When you tell a NAS that the request has been
> rejected when, in fact, it has not, you are not effectively retrying. 
> You are saying, "Do not retry.  You actually got this failed result."

  No.  That's *your* NAS behavior.  Most NASes authenticate end users,
who will hit the "connect network" button again when something fails.

  This is another cause of the miscommunication.  It seems that your
NASes behave *very* differently from standard RADIUS NASes.  They treat
timeouts as "re-try authentication".

  But... if they behave that way, why did they time out in the first
place?  Why not just set the timeouts to infinity?  That way
authentication will *never* fail.

> But look, I have gone through the code.  Ivan's right, that there is no
> way to get the behavior I want in freeradius without either a module
> (not sure if this is even possible to accomplish via a module because
> proxying is not handled via a module ) or by hacking the code to change
> how proxy no-responses are handled.  It just frustrates me that you
> challenge the value of this.

  Nonsense.  I asked *specifically* for suggestions as to a better
algorithm.  I'm refusing to implement a vague and poorly defined
suggestion.  That shouldn't be a surprise.

  Come up with a well-defined algorithm that's better than the current
one, and we'll implement it.

>  For people like me who use freeradius not
> to serve dial gear but to serve as robust authentication platforms for
> on-network services, where sending a false rejection to a client is an
> SLA issue, having a proxy that can robustly and transparently handle
> transient network failures is very valuable.  With that, we do not have
> to reprogram or replace NAS software (some of which we cannot control)
> to handle those kinds of transient network failures for us.

  I understand that.  Please also understand that it doesn't help to say
"make it better... I don't know HOW, but you guys need to make it better".

  We can't implement magic.  We CAN implement concrete suggestions.

  Alan DeKok.


