Robust Authentication Proxying

Alan DeKok aland at deployingradius.com
Fri Jul 10 16:22:00 CEST 2009


Philip Molter wrote:
> I must be missing something, because even after the home_server has been
> marked as a zombie, the proxy is still ignoring the retransmits.

  Yes... see the debug log.  You have configured the "do_not_respond"
policy.  As a result, the server doesn't respond to retransmits.
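
  For reference, the debug log further down is consistent with a
post-proxy section roughly like the sketch below.  This is a
reconstruction from the log, not your actual configuration, and the
"update control" step that the log shows running before the policy is
left out.

```
post-proxy {
	#  Runs when no home server answers the proxied request.
	Post-Proxy-Type Fail {
		#  Policy defined in raddb/policy.conf.  It sets
		#  Response-Packet-Type := Do-Not-Respond and returns
		#  "handled", so the client gets no reply at all,
		#  including for its retransmits.
		do_not_respond
	}
}
```

  Without that policy in the Fail section, the server should send the
client the reject that the log announces ("Rejecting request 0 due to
lack of any response from home server").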

> Furthermore, it's taking much longer than the response_window for the
> home_server to be marked as a zombie.

  Yes.  The server doesn't set up a timer for each individual packet in
order to mark the home server as unresponsive.  That's just too hard.
Instead, it waits for the client to retransmit the packet, and only THEN
applies the policies.

> I have a response_window of 1, trying to force the home_server to be
> marked zombie as fast as possible. 

  See the documentation in raddb/proxy.conf.  It clearly says that the
minimum useful value is 5.  Even that is likely to be too small.

  And the server will NOT fail over during the zombie period.  It can't.
It has no idea if the home server is really down or not.
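
  As a rough illustration, a home_server entry in raddb/proxy.conf with
these timers might look like the sketch below.  The name, address and
secret are placeholders (192.0.2.12 is a documentation address, not
your server), and the numbers are only illustrative values above the
documented minimum.

```
home_server primary {
	type = auth
	ipaddr = 192.0.2.12		# placeholder address
	port = 1812
	secret = changeme		# placeholder secret

	#  Seconds to wait for a reply before the home server is
	#  treated as a zombie.  Values below 5 are not useful.
	response_window = 20

	#  Seconds the home server stays a zombie, with no replies
	#  at all, before it is marked dead.
	zombie_period = 40

	#  Probe a zombie or dead home server with Status-Server
	#  packets, as seen in the debug log.
	status_check = status-server
	check_interval = 30
	num_answers_to_alive = 3
}
```

  The point of the sketch is that detection time is bounded below by
response_window plus the client's retransmit timing, so a
response_window of 1 does not buy faster fail-over.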

> Here are the log messages (I've
> stripped out test packet contents) for the three client attempts using
> radtest, which sends 3 packets for a total processing time of 15 seconds:
...
> rad_recv: Access-Request packet from host 127.0.0.1 port 39091, id=56,
> length=59
> Sending duplicate proxied request to home server xxx.xxx.xxx.12 port
> 1812 - ID: 175

  As expected...

> Sending Access-Request of id 175 to xxx.xxx.xxx.12 port 1812
> Rejecting request 0 due to lack of any response from home server
> xxx.xxx.xxx.12 port 1812

  But there's no response from the home server.

>   Found Post-Proxy-Type
> +- entering group Fail {...}
> ++[control] returns noop
> ++- entering policy do_not_respond {...}

  So you say DO NOT RESPOND TO THIS PACKET.

> +++[control] returns noop
> +++[handled] returns handled
> ++- policy do_not_respond returns handled
> Going to the next request
> PROXY: Marking home server xxx.xxx.xxx.12 port 1812 as zombie (it looks
> like it is dead).

  And the home server is marked as a zombie, as you want.

> Sending Status-Server of id 81 to xxx.xxx.xxx.12 port 1812
>         Message-Authenticator := 0x00000000000000000000000000000000
>         NAS-Identifier := "Status Check. Are you alive?"
> Waking up in 3.9 seconds.
> Waking up in 3.9 seconds.
> rad_recv: Access-Request packet from host 127.0.0.1 port 39091, id=56,
> length=59
> Ignoring retransmit from client SERVERS port 39091 - ID: 56, no reply
> was configured

  And the server doesn't respond, because that's what you told it to do.


  I think the main issue here is that you're expecting RADIUS to be a
reliable and robust transport protocol.  It's not.  It just doesn't work
that way.

  There will ALWAYS be packets in "limbo" when a home server goes down.
It takes time to determine that the server is down, and during that
time, all of the packets sent to it are in "limbo".

  When the proxy decides that the home server is down, it WILL
retransmit the packets to a backup server.  This is documented in
proxy.conf.  However, it MAY time out some packets, because it took too
long for the home server to respond.

  There is really very little you can do to work around this problem.
FreeRADIUS *will* fail over to a backup home server.  That's what the
"fail-over" configuration is for in the home server pools.  But it takes
time to figure this out.
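
  A minimal sketch of such a pool in raddb/proxy.conf follows.  It
assumes two home_server sections named "primary" and "backup" are
defined elsewhere in the file.  Those names, the pool name and the
realm name are all hypothetical.

```
home_server_pool auth_failover {
	#  Use the first live home server in the list, and move to
	#  the next one only when the earlier ones are marked dead.
	type = fail-over

	home_server = primary
	home_server = backup
}

realm example.org {
	auth_pool = auth_failover
}
```

  Requests for the realm go to "primary" until the proxy has decided it
is down, which is the delay described above.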

  How do you propose that the proxy determine that a home server is
down, without taking any time to do so, and without making any false
positive errors?

  Alan DeKok.


