home server debugging issues
Alan DeKok
aland at deployingradius.com
Fri Nov 27 11:30:02 CET 2009
Josip Rodin wrote:
> Returning to the original problem, in my pool of two fail-over home servers
> I now have both of them set up with "status_check = none".
2.1.7 has some changes in proxy fail-over. The *first* packet that
discovers a home server is dead is no longer rejected. Instead, it is
failed over to the second home server.
This makes proxying more robust.
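For reference, that fail-over path assumes a pool roughly like the
following in proxy.conf. The names, addresses, and secrets below are
only placeholders, not anything from your setup:

home_server upstream1 {
        type = auth
        ipaddr = 192.0.2.10
        port = 1812
        secret = changeme
}
home_server upstream2 {
        type = auth
        ipaddr = 192.0.2.11
        port = 1812
        secret = changeme
}
home_server_pool upstream_failover {
        type = fail-over
        home_server = upstream1
        home_server = upstream2
}
realm example.com {
        auth_pool = upstream_failover
}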
> My upstream proxy maintainers refuse to implement decent status checks,
> so I'm forced to do this for now. I can do a status check with an entry
> from a particular HL RADIUS that I happen to control, but that just creates
> a daisy-chain of SPoFs. :/ They insist that I not do anything like this,
> but that I set up my server so that it stubbornly tries their first server,
> then if that fails their second server, for each request.
That's stupid. It increases latency and bandwidth use, and decreases
reliability.
The Status-Server draft says that using Status-Server is preferable to
the alternatives. Maybe they'll follow it once it becomes an RFC.
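For what it's worth, enabling Status-Server checks is only a few lines
per home server. Something like this (the numbers are illustrative, and
the name and address are placeholders):

home_server upstream1 {
        type = auth
        ipaddr = 192.0.2.10
        port = 1812
        secret = changeme
        status_check = status-server
        check_interval = 30        # send Status-Server this often while checking
        check_timeout = 4          # seconds to wait for a reply to each check
        num_answers_to_alive = 3   # consecutive replies needed to mark it alive
}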
> Now, when a request comes through that gets discarded by the first proxy
> (because it itself times out on a random HL RADIUS), that one gets marked as
> a zombie. Strangely enough, my server keeps it marked as a zombie even after
> several minutes (long past any of the zombie_period and revive_interval
> periods I've kept in the configuration). My server keeps talking only with
> the second server which is in the 'alive' state, and ignores the zombie.
Hmm... the "zombie_period" timers depend on continued packet streams.
If the NAS doesn't re-transmit packets, then the home server could stay
in the zombie state for a while. I'll have to take a look at that.
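Those timers live in the home_server definition in proxy.conf. Roughly,
and with example values only: a home server becomes "zombie" when it
stops replying within response_window, and the zombie-to-dead decision
then depends on further packets being proxied to it:

home_server upstream1 {
        # ... type/ipaddr/port/secret as above ...
        response_window = 20   # no reply within this many seconds -> server is suspect
        zombie_period = 40     # this long with no replies -> marked dead
}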
> After re-reading proxy.conf comments, this actually looks logical - there is
> no kind of a status check that would unmark it as a zombie. revive_interval
> can resurrect it from the 'dead' state, but not from the zombie state. Also
> this part of the revive_interval comment is a bit confusing:
>
> # As a result, we recommend enabling status checks, and
> # we do NOT recommend using "revive_interval".
> #
> # The "revive_interval" is used ONLY if the "status_check"
> # entry below is not "none". Otherwise, it will not be used,
> # and should be deleted.
>
> So it's supposed to be a crutch only for people who *have* status checks,
> but not a crutch for those of us who do *not* have status checks.
Huh? That's not what it says. It says "revive_interval" is ONLY for
people who have "status_check = none", i.e. no status checks.
> What is a crutch for this situation? A cron job that keeps doing
> radmin -e 'set home_server state X Y alive'? :)
If you don't have status-checks, then the "revive_interval" should
apply. If it's not being applied, that should be fixed.
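i.e. with status checks disabled, something like the following (the
number is only an example) should bring the server back on its own once
it has been marked dead:

home_server upstream1 {
        # ... type/ipaddr/port/secret as above ...
        status_check = none
        revive_interval = 120   # assume it is alive again this many seconds after being marked dead
}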
Alan DeKok.