Home servers constantly zombied, and I can't figure out how to fix it
Adam Bultman
abultman at mtasolutions.com
Sat Jul 17 01:36:51 CEST 2010
Oh, I must apologize - I didn't know the 'detail' portion of radmin
didn't exist until 2.1.9. Perhaps I'll work on compiling and testing
that over the weekend.
Adam Bultman wrote:
>
> Alan DeKok wrote:
>> Adam Bultman wrote:
>>> How do I change that functionality? I'd *love* it if it didn't zombie
>>> their servers for no reason.
>> No.. it marks the servers zombie for a reason: they're not responding.
>> But it may be too aggressive.
>>
>>> When I do a radiusd -CXXX, I see options I don't see documented for the
>>> latest releases of freeradius:
>>> - ping_check
>>> - ping_interval
>>> - num_pings_to_alive
>> Those are for backwards compatibility with pre-releases of 2.0. They
>> should be removed. They are just different names for the status-server
>> checks.
>>
> Excellent; I was wondering if I was somehow not "seeing" something as I
> went through the documentation.
>>> - max_outstanding (I can't even find what this is for)
>> You can put a limit on the total number of "outstanding" packets sent
>> to a home server. i.e. put it at 256, and if there are 256 packets sent
>> without a response, the proxy will *not* use that home server again,
>> until it gets at least one response.
>>
>> This is a way to do load-limiting on home servers.
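
The limit Alan describes can be modelled in a few lines. This is only a toy sketch of the behaviour as stated in the message above, not FreeRADIUS code; the `HomeServer` class and its method names are hypothetical.

```python
class HomeServer:
    """Toy model of a max_outstanding limit (hypothetical, not FreeRADIUS code)."""

    def __init__(self, max_outstanding=256):
        self.max_outstanding = max_outstanding
        self.outstanding = 0  # packets sent that have not yet been answered

    def can_send(self):
        # The server is skipped once the number of unanswered packets hits the limit.
        return self.outstanding < self.max_outstanding

    def send(self):
        """Try to send one packet; return False if the limit blocks it."""
        if not self.can_send():
            return False
        self.outstanding += 1
        return True

    def on_response(self):
        # A single response frees one slot, so the server becomes usable again.
        if self.outstanding > 0:
            self.outstanding -= 1
```

With `max_outstanding=3`, the fourth `send()` is refused until `on_response()` has been called at least once.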
>>
>>> As it is, my *.work files are "stuck" (And I've googled for that, and
>>> found other list posts regarding that) which seems to indicate that the
>>> home servers aren't responding... except that even when my detail.work
>>> file is 'stuck' at 24k, and the detail file keeps growing, I'm still
>>> sending data to the other side. So something's working, but only sort of..
>> It's re-transmitting the same packet over and over. If you install
>> 2.1.9, you can use "radmin" to see its progress in reading the detail file.
>>
> After some work getting 2.1.9, and v2.1.x from the git repository up and
> running, I had to go back to 2.1.7-7, that is patched (hopefully,
> anyway!) for the "zombie" problem, via the patch you sent me. The 2.1.9
> and 2.1.10 versions would die unexpectedly, right around the time the
> "Info: ... ... adding new socket command file
> /var/run/radiusd/radiusd.sock " would scroll through the debug. I
> couldn't figure it out for the life of me, and strace didn't give me too
> much - it'd just segfault right around that time. It also did it on
> vanilla installs of 2.1.10, too - so I just gave it up.
>
> At any rate, "radmin" *does* exist for 2.1.7-7 (from the redhat source,
> which I patched with the patch you gave me), but it's complaining about
> permissions on the sock file (which appear to be fine, but perhaps
> selinux is killing it, I have to take a gander) - once I get that ironed
> out, I'll take great pleasure in using radmin and seeing what it sees.
>
>>> I'm about to shoot an email to them to see if they can explain their 4
>>> year old radius software, and perhaps that's part of the problem.
>> Yup. They can upgrade to a (cough) real radius server. :)
>>
>
> Turns out, they were a bit stand-offish. They didn't like their radius
> servers being implicated in the mix. "It's working for 30+ clients, so
> we have no plans to upgrade".
>
> One thing I also noticed was that it doesn't look like freeradius is
> giving it very many tries on a packet before marking the system down.
> At least, that's the way it appears. I don't know how to use wireshark
> filters enough to find unacked packets, so I have to do that before I'll
> be able to piece that together.
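
One way to look for unanswered packets without Wireshark filters is to export the capture and match requests to replies by RADIUS identifier. The sketch below assumes the packets have already been parsed into dicts with hypothetical keys `src`, `dst`, `code` and `id`; the code numbers are the standard RADIUS packet codes. It cannot tell a retransmission from an identifier reused for a different request, which would need the authenticator field as well.

```python
REQUEST_CODES = {1, 4}           # Access-Request, Accounting-Request
RESPONSE_CODES = {2, 3, 5, 11}   # Access-Accept, Access-Reject,
                                 # Accounting-Response, Access-Challenge


def unanswered_requests(packets):
    """Return {(src, dst, id): times_sent} for requests that never got a reply.

    `packets` is an iterable of dicts in capture order. The key names
    (src, dst, code, id) are assumptions made for this sketch.
    """
    pending = {}
    for pkt in packets:
        if pkt["code"] in REQUEST_CODES:
            key = (pkt["src"], pkt["dst"], pkt["id"])
            # Seeing the same key again before a reply counts as a retransmit.
            pending[key] = pending.get(key, 0) + 1
        elif pkt["code"] in RESPONSE_CODES:
            # A reply travels the other way, so swap src and dst to match.
            key = (pkt["dst"], pkt["src"], pkt["id"])
            pending.pop(key, None)
    return pending
```

The counts in the result show how many times each request was sent before the capture ended, which is the number needed to judge how many tries a packet gets before the server is marked down.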
>
> It is also noteworthy that upon pingscanning their network, I found two
> IP addresses that are up - and I'm getting packet loss to them. Between
> 4 and 7 percent, which while not a ton, might be enough to cause a
> problem if I'm relaying thousands of packets an hour.
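
A back-of-the-envelope estimate of what that loss rate could mean. It assumes each try (request plus reply) is lost independently with the measured probability, which is probably optimistic because real loss tends to come in bursts. The request volume and the number of tries are placeholders, not values taken from this setup.

```python
def p_all_tries_lost(loss_rate, tries):
    """Probability that every one of `tries` independent attempts is lost."""
    return loss_rate ** tries


def expected_unanswered(requests_per_hour, loss_rate, tries):
    """Expected number of requests per hour for which no try gets through."""
    return requests_per_hour * p_all_tries_lost(loss_rate, tries)


if __name__ == "__main__":
    # Placeholder inputs: 5000 requests/hour, 3 tries per request.
    for loss in (0.04, 0.07):
        print(loss, expected_unanswered(5000, loss, tries=3))
```

Under these assumptions a few percent of loss gives roughly one request per hour that goes completely unanswered, and any burstiness in the loss pushes that number up.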
>
> Thanks for the help, Alan. I appreciate it.
>
--
Adam