Zombie servers are never marked as dead or unzombied when using status_check = none

Herwin Weststrate herwin at quarantainenet.nl
Fri Jun 20 14:07:09 CEST 2014

Here is another thing where I'm not really sure whether it's a bug or
whether I'm misinterpreting something. It was tested with version 3.0.x.

I'm using a setup with 3 FreeRADIUS backends (.113, .114 and .115) and a
single inner-proxy-server. All the home servers are marked in the config
with a zombie_period of 20 and a revive_interval of 60. Requests are
load-balanced to the home server using keyed-balance on the username
(yes, I understand this is something you wouldn't normally do when
proxying the inner tunnel, but it makes it easier to reproduce).
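
For illustration, a proxy.conf along these lines would match the setup
described above. This is only a sketch: the section names, the address
and the secret are placeholders, not the actual configuration, and the
entries for the other two backends are assumed to look the same.

  # Hypothetical home server entry; backend_113 and backend_115 are
  # assumed to be defined the same way with their own addresses.
  home_server backend_114 {
      type = auth
      ipaddr = 192.0.2.114    # placeholder address
      port = 1812
      secret = testing123     # placeholder secret
      status_check = none     # no Status-Server or test requests
      zombie_period = 20
      revive_interval = 60
  }

  # Hypothetical pool; keyed-balance picks a home server based on the
  # Load-Balance-Key control attribute (here assumed to be set to the
  # user name elsewhere in the virtual server).
  home_server_pool inner_pool {
      type = keyed-balance
      home_server = backend_113
      home_server = backend_114
      home_server = backend_115
  }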

The first authentication gets proxied to .114. Then we disable that home
server by adding a firewall rule. The next authentication request shows
some logging that the home server is marked as a zombie:

  Marking home server port 1812 as zombie (it has not
  responded in 10.000000 seconds).

The next authentication request with the same user name now gets proxied
to .115. Authentication requests with other user names are now
distributed among .113 and .115 (with something that looks like a bias
towards .115, possibly because all requests that would originally have
gone to .114 are now sent to .115, but it was tested with a very small
set, so quite possibly it's something else).

The strange thing here is that .114 stays a zombie. Even after more than
two minutes (which would expire both zombie_period and revive_interval),
new requests just get proxied to .115 or .113, with no mention of
updating the state of .114 whatsoever.

Looking at the source, I see that after a server is marked as a zombie,
the function ping_home_server (main/process.c) is called. This function
is effectively a no-op if the status_check of the home server is set to
none, which might be the cause.

Herwin Weststrate
