Home servers constantly zombied, and I can't figure out how to fix it
abultman at mtasolutions.com
Sat Jul 17 01:16:00 CEST 2010
Alan DeKok wrote:
> Adam Bultman wrote:
>> How do I change that functionality? I'd *love* it if it didn't zombie
>> their servers for no reason.
> No.. it marks the servers zombie for a reason: they're not responding.
> But it may be too aggressive.
>> When I do a radiusd -CXXX, I see options I don't see documented for the
>> latest releases of freeradius:
>> - ping_check
>> - ping_interval
>> - num_pings_to_alive
> Those are for backwards compatibility with pre-releases of 2.0. They
> should be removed. They are just different names for the status-server
Excellent; I was wondering if I was somehow not "seeing" something as I
went through the documentation.
>> - max_outstanding (I can't even find what this is for)
> You can put a limit on the total number of "outstanding" packets sent
> to a home server. i.e. put it at 256, and if there are 256 packets sent
> without a response, the proxy will *not* use that home server again,
> until it gets at least one response.
> This is a way to do load-limiting on home servers.
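
A minimal Python sketch of the limit described above, for illustration only; it is not FreeRADIUS's actual code. The class and method names are hypothetical. Only the `max_outstanding` name and the 256 figure come from the thread.

```python
class HomeServer:
    """Toy model of a per-home-server cap on unanswered packets (hypothetical)."""

    def __init__(self, max_outstanding=256):
        self.max_outstanding = max_outstanding
        self.outstanding = 0  # packets sent that have no response yet

    def can_send(self):
        # Once the cap is reached, the proxy stops picking this home server.
        return self.outstanding < self.max_outstanding

    def on_send(self):
        if not self.can_send():
            raise RuntimeError("home server is at its outstanding-packet limit")
        self.outstanding += 1

    def on_response(self):
        # Any response frees a slot, which makes the server usable again.
        if self.outstanding > 0:
            self.outstanding -= 1
```

With a cap of 3, three sends with no reply make `can_send()` return False, and a single `on_response()` call makes it return True again.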
>> As it is, my *.work files are "stuck" (And I've googled for that, and
>> found other list posts regarding that) which seems to indicate that the
>> home servers aren't responding... except that even when my detail.work
>> file is 'stuck' at 24k, and the detail file keeps growing, I'm still
>> sending data to the other side. So something's working, but only sort of..
> It's re-transmitting the same packet over and over. If you install
> 2.1.9, you can use "radmin" to see its progress in reading the detail file.
After some work getting 2.1.9 and v2.1.x from the git repository up and
running, I had to go back to 2.1.7-7, which is patched (hopefully,
anyway!) for the "zombie" problem via the patch you sent me. The 2.1.9
and 2.1.10 versions would die unexpectedly, right around the time the
"Info: ... ... adding new socket command file
/var/run/radiusd/radiusd.sock " line would scroll through the debug
output. I couldn't figure it out for the life of me, and strace didn't
give me much - it'd just segfault right around that time. It also did
it on vanilla installs of 2.1.10, so I just gave up.
At any rate, "radmin" *does* exist for 2.1.7-7 (from the Red Hat source,
which I patched with the patch you gave me), but it's complaining about
permissions on the socket file. The permissions appear to be fine, but
perhaps SELinux is blocking it; I have to take a look. Once I get that
ironed out, I'll take great pleasure in using radmin and seeing what it
sees.
>> I'm about to shoot an email to them to see if they can explain their 4
>> year old radius software, and perhaps maybe that's part of the problem.
> Yup. They can upgrade to a (cough) real radius server. :)
Turns out, they were a bit standoffish. They didn't like their RADIUS
servers being implicated in the mix: "It's working for 30+ clients, so
we have no plans to upgrade."
One thing I also noticed is that it doesn't look like freeradius is
giving a packet very many tries before marking the home server down.
At least, that's the way it appears. I don't know Wireshark filters
well enough to find unacked packets, so I have to learn that before
I'll be able to piece it together.
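
Finding unacked packets can also be done outside Wireshark, once the capture is exported. The sketch below is a rough illustration under stated assumptions: the capture covers a single home server, and each packet has been reduced to a dict with the hypothetical keys `kind`, `id` and `client_port`. It relies on the fact that a RADIUS reply carries the same Identifier as its request and returns to the port the request came from.

```python
def unanswered_requests(packets):
    """Return {(client_port, id): tries} for requests that never got a reply.

    `packets` is a hypothetical, time-ordered list of dicts with:
      "kind":        "request" or "response"
      "id":          the RADIUS Identifier (0-255)
      "client_port": the UDP port on the proxy side of the exchange
    A retransmission reuses the same Identifier and port, so it only
    raises the try count of the entry that is already pending.
    """
    pending = {}  # (client_port, id) -> number of transmissions seen
    for pkt in packets:
        key = (pkt["client_port"], pkt["id"])
        if pkt["kind"] == "request":
            pending[key] = pending.get(key, 0) + 1
        elif pkt["kind"] == "response":
            # A reply clears the entry, so the Identifier can be reused.
            pending.pop(key, None)
    return pending
```

The try counts in the result show how many times each unanswered request was sent, which is the number needed to judge whether the proxy gives up too early.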
It is also noteworthy that upon ping-scanning their network, I found two
IP addresses that are up - and I'm getting packet loss to them: between
4 and 7 percent, which, while not a ton, might be enough to cause a
problem if I'm relaying thousands of packets an hour.
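
A back-of-the-envelope way to size that risk, as a sketch only. It assumes the ping loss rate (already a round-trip figure) applies equally to RADIUS traffic and that losses are independent from one try to the next. Neither assumption is verified here. The function names, the number of tries and the hourly packet count are hypothetical inputs, not values from my setup.

```python
def all_tries_fail(loss, tries):
    """Probability that every one of `tries` attempts gets no reply,
    given a round-trip loss rate `loss` (0.05 means 5 percent)."""
    return loss ** tries


def expected_unanswered_per_hour(loss, tries, packets_per_hour):
    """Expected number of packets per hour that get no reply at all."""
    return packets_per_hour * all_tries_fail(loss, tries)


# Example inputs: 5 percent loss, 3 tries per packet, 5000 packets an hour.
if __name__ == "__main__":
    print(expected_unanswered_per_hour(0.05, 3, 5000))
```

The point of the sketch is that the result falls off quickly as the number of tries per packet grows, so a small try count matters far more than the raw loss figure suggests.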
Thanks for the help, Alan. I appreciate it.