Proxy realms and home_server_pool fallback not working

Peter Lambrechtsen peter at crypt.co.nz
Mon Mar 7 09:22:36 CET 2016


On Mon, Mar 7, 2016 at 2:55 PM, Alan DeKok <aland at deployingradius.com>
wrote:

> On Mar 6, 2016, at 6:54 PM, Peter Lambrechtsen <peter at crypt.co.nz> wrote:
> > I'm looking to add more robustness into my proxy architecture and noticed
> > in the home_server_pool there is the option for "fallback = virtualrealm"
> > so if all home servers fail then a last resort home_server is used with
> > some config locally to always accept / reject customers based on the
> > realm they are coming from. I'm not using the status_check
>
>   Then you can do "status_check = request".  An Access-Accept or
> Access-Reject response will be accepted as an indication that the home
> server is alive.
>
> > as some of the
> > downstream clients don't support status-server, but I will look into that
> > to see if it makes a difference.
>
>   It should.
>
> > However, for this situation I would expect that using or not using
> > Status-Server checks shouldn't have any impact on how the fallback
> > server works.
>
>   It does.  A lot.
>
>   The problem is that without Status-Server, FreeRADIUS has to *guess*
> when the home server is alive.  And the guess is usually wrong.  Because
> most guesses are wrong.
>

Yes, I have figured that out. I'm now pinging all our downstream RADIUS
servers to see which respond with something sane when sent a Status-Server
request, and then turning on status checks for them.
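
For the ones that don't answer Status-Server, I'll try Alan's "status_check
= request" suggestion. As I understand it from the stock proxy.conf
comments, that looks something like this (the test username / password are
deliberately bogus placeholders):

home_server ProxyDest {
        type = auth+acct
        ipaddr = 192.168.1.113
        port = 1812
        secret = password
        # Probe with a real Access-Request instead of Status-Server.
        # Any response, Accept or Reject, marks the server alive.
        status_check = request
        username = "test_user_please_reject_me"
        password = "this is really secret"
}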


> > In the proxy.conf I have configured:
> >
> > home_server ProxyDest {
> >        type = auth+acct
> >        ipaddr = 192.168.1.113
> >        port = 1812
> >        secret = password
> >        response_window = 1
> >        require_message_authenticator = no
> >        zombie_period = 5
> >        revive_interval = 10
>
>   That's really low.  After 10s, just mark the home server alive?
>
>   It should be 60s at the minimum.  Maybe 5min.
>

It was purely for testing, as waiting around for 10 seconds is much better
than waiting around for 2 mins. Now, with check_interval and status checks
turned on, things are making more sense.
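
Concretely, my lab home_server now looks roughly like this (the timer
values are still short test values for the lab, not recommendations):

home_server ProxyDest {
        type = auth+acct
        ipaddr = 192.168.1.113
        port = 1812
        secret = password
        status_check = status-server
        check_interval = 30
        num_answers_to_alive = 3
        zombie_period = 40
        # no revive_interval: per Alan's advice below, let Status-Server
        # alone decide when the home server is alive again
}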


> > But I would expect the second and subsequent requests to get proxied to
> > the local fallback virtual server once the home_server has been marked
> > as zombie. But that never seems to happen. It keeps on rejecting the
> > requests and fallback never seems to be used.
>
>   Hmm... I'll take a look.
>
> > If I configure a second home server in the pool.
> ...
> > Then the second server is failed over to when the first fails. Which is
> > all good if I wanted to use type fail-over, but if I wanted to use
> > load-balance then I can't have my fallback server as a home server,
> > otherwise a percentage of requests will always be local, which isn't
> > ideal.
>
>   Yes.  You can't do load-balance and fallback.
>
>   You *can* put something into Post-Proxy-Type Fail.  Which is probably
> what we should do.  And remove the fallback virtual server.
>

What could I do in Post-Proxy-Type Fail? I can't call the virtual server,
Proxy-To-Realm doesn't proxy to a new destination, and setting the control
attributes to accept doesn't work either. There doesn't seem to be a way to
turn a Reject from a failed proxied request back into an Accept.
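
For reference, this is roughly the unlang I tried (reconstructed from my
config; "accept" is a policy I defined myself):

# sites-enabled/default
Post-Proxy-Type Fail-Authentication {
        accept
}

# policy.d, my own policy
accept {
        update control {
                &Response-Packet-Type = Access-Accept
        }
        handled
}

And this is the debug output it produces: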

(0) ERROR: Failing proxied request for user "peter", due to lack of any
response from home server 192.168.1.113 port 1812
(0) Clearing existing &reply: attributes
(0) Found Post-Proxy-Type Fail-Authentication
(0) # Executing group from file ./sites-enabled/default
(0)   Post-Proxy-Type Fail-Authentication {
(0)     policy accept {
(0)       update control {
(0)         &Response-Packet-Type = Access-Accept
(0)       } # update control = noop
(0)       [handled] = handled
(0)     } # policy accept = handled
(0)   } # Post-Proxy-Type Fail-Authentication = handled
(0) There was no response configured: rejecting request
(0) Using Post-Auth-Type Reject


>   This allows the same behaviour for all packets, and simplifies the proxy
> code.
>
> > The other interesting thing with the failover is I set the check_interval
> > to 10 seconds, or 30 seconds. But it seems the first client is only
> > re-checked after 60 seconds and then assumed to be back up.
>
>   Because you have revive_interval set.
>
> > Waking up in 0.2 seconds.
> > Marking home server 192.168.1.113 port 1812 alive again... we have no
> > idea if it really is alive or not.
>
>   And that message is printed only when you have revive_interval set.
>
>   The solution is to *not* set revive_interval.  And use Status-Server
> exclusively.
>

> > Waking up in 1.0 seconds.
> >
> > I would have thought that
> >
> >        zombie_period = 5
> >        revive_interval = 10
> >        check_interval = 10
> >
> > Would mean that the client would be re-checked in 10 seconds.
>
>   check_interval and revive_interval should be mutually exclusive.  It
> just doesn't make sense to both check that the home server is alive every
> 10s, and then *always* mark it as alive after 10s.
>
> > Am I mis-understanding how fallback is supposed to work?
>
>   A bit.
>
>   But the fallback virtual server should work.  Tho I'm inclined to remove
> it in 3.1, as it makes everything more complicated.
>

Thanks for all your help on this. The fail-over with the second server
being the virtual server seems to work well; it just means I'm restricted
to a single home server and can't use load-balance. But having this config
would be my ideal:

home_server_pool ProxyDestPool {
        type = load-balance
        home_server = ProxyDest1
        home_server = ProxyDest2
        home_server = ProxyDest3
        fallback = cacheuser
}

Where, if all the home servers go AWOL, I fall back to the local virtual
server cacheuser.
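
As I understand the docs, that fallback entry would be a home_server
pointing at a local virtual server, roughly like this sketch (the
always-accept authorize section is just a stand-in for the real per-realm
accept / reject logic I described at the start):

# proxy.conf
home_server cacheuser {
        virtual_server = cacheuser
}

# sites-enabled/cacheuser
server cacheuser {
        authorize {
                update control {
                        &Auth-Type := Accept
                }
        }
}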

Many thanks

Peter

