Home servers constantly zombied, and I can't figure out how to fix it

Adam Bultman abultman at mtasolutions.com
Sat Jul 17 01:16:00 CEST 2010

Alan DeKok wrote:
> Adam Bultman wrote:
>> How do I change that functionality?  I'd *love* it if it didn't zombie
>> their servers for no reason.
>   No.. it marks the servers zombie for a reason: they're not responding.
>  But it may be too aggressive.
>> When I do a radiusd -CXXX, I see options I don't see documented for the
>> latest releases of freeradius:
>>  - ping_check
>>  - ping_interval
>>  - num_pings_to_alive
>   Those are for backwards compatibility with pre-releases of 2.0.  They
> should be removed.  They are just different names for the status-server
> checks.
Excellent; I was wondering if I was somehow not "seeing" something as I
went through the documentation.
>>  - max_outstanding  (I can't even find what this is for)
>   You can put a limit on the total number of "outstanding"  packets sent
> to a home server.  i.e. put it at 256, and if there are 256 packets sent
> without a response, the proxy will *not* use that home server again,
> until it gets at least one response.
>   This is a way to do load-limiting on home servers.
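
A minimal sketch of the behavior described above, in Python. It is not FreeRADIUS code. The class and method names are made up for illustration, and it models only the counter: stop using a home server once max_outstanding requests are unanswered, and resume as soon as one response arrives.

```python
class HomeServerSketch:
    """Toy model of the max_outstanding limit (hypothetical names, not FreeRADIUS code)."""

    def __init__(self, max_outstanding=256):
        self.max_outstanding = max_outstanding
        self.outstanding = 0  # requests sent that have no response yet

    def can_send(self):
        # The proxy skips this home server while the limit is reached.
        return self.outstanding < self.max_outstanding

    def on_send(self):
        if not self.can_send():
            raise RuntimeError("home server is at its max_outstanding limit")
        self.outstanding += 1

    def on_response(self):
        # A single response frees one slot, making the server usable again.
        if self.outstanding > 0:
            self.outstanding -= 1


server = HomeServerSketch(max_outstanding=256)
for _ in range(256):
    server.on_send()
print(server.can_send())  # False: 256 requests are unanswered
server.on_response()
print(server.can_send())  # True: one response arrived
```
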
>> As it is, my *.work files are "stuck" (And I've googled for that, and
>> found other list posts regarding that) which seems to indicate that the
>> home servers aren't responding... except that even when my detail.work
>> file is 'stuck' at 24k, and the detail file keeps growing, I'm still
>> sending data to the other side.  So something's working, but only sort of..
>   It's re-transmitting the same packet over and over.  If you install
> 2.1.9, you can use "radmin" to see its progress in reading the detail file.
After some work getting 2.1.9 and v2.1.x from the git repository up and
running, I had to go back to 2.1.7-7, which is patched (hopefully,
anyway!) for the "zombie" problem via the patch you sent me.  The 2.1.9
and 2.1.10 versions would die unexpectedly, right around the time the
"Info: ... ... adding new socket command file
/var/run/radiusd/radiusd.sock " would scroll through the debug output.  I
couldn't figure it out for the life of me, and strace didn't tell me
much - it'd just segfault right around that time.  It did the same on
vanilla installs of 2.1.10, so I gave up on it.

At any rate, "radmin" *does* exist for 2.1.7-7 (from the Red Hat source,
which I patched with the patch you gave me), but it's complaining about
permissions on the sock file.  The permissions appear to be fine, so
perhaps SELinux is blocking it; I have to take a gander.  Once I get that
ironed out, I'll take great pleasure in using radmin and seeing what it sees.

>> I'm about to shoot an email to them to see if they can explain their 4
>> year old radius software, and perhaps maybe that's part of the problem.
>   Yup.  They can upgrade to a (cough) real radius server. :)

Turns out, they were a bit stand-offish. They didn't like their radius
servers being implicated in the mix.  "It's working for 30+ clients, so
we have no plans to upgrade".

One thing I also noticed is that freeradius doesn't seem to give a
packet very many tries before marking the server down.  At least,
that's the way it appears.  I don't know wireshark filters well enough
to find unacked packets, so I have to learn that before I'll be able to
piece it together.

It is also noteworthy that upon pingscanning their network, I found two
IP addresses that are up - and I'm getting packet loss to them.  Between
4 and 7 percent, which, while not a ton, might be enough to cause a
problem if I'm relaying thousands of packets an hour.
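
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It rests on assumptions I haven't verified: that the ping loss rate approximates the chance a single request/response exchange fails, that losses are independent of each other, and that a packet is tried a fixed number of times. The 5000 packets an hour and the 1 to 3 tries are placeholders, not measured values or freeradius defaults.

```python
def expected_unanswered(packets_per_hour, loss_rate, tries):
    """Expected number of packets per hour for which every try is lost.

    Assumes each try fails independently with probability loss_rate.
    """
    return packets_per_hour * loss_rate ** tries


# Placeholder traffic volume standing in for "thousands of packets an hour".
PACKETS_PER_HOUR = 5000

for loss_rate in (0.04, 0.07):
    for tries in (1, 2, 3):
        lost = expected_unanswered(PACKETS_PER_HOUR, loss_rate, tries)
        print(f"loss={loss_rate:.0%} tries={tries}: ~{lost:.1f} unanswered/hour")
```

If those assumptions are anywhere near right, a small number of tries per packet would leave some packets completely unanswered every hour, which would fit with the servers being marked zombie.
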

Thanks for the help, Alan. I appreciate it.

