Home servers constantly zombied, and I can't figure out how to fix it

Adam Bultman abultman at mtasolutions.com
Sat Jul 17 01:36:51 CEST 2010


Oh, I must apologize: I didn't realize the 'detail' portion of radmin
wasn't added until 2.1.9. Perhaps I'll work on compiling and testing
that over the weekend.



Adam Bultman wrote:
> 
> Alan DeKok wrote:
>> Adam Bultman wrote:
>>> How do I change that functionality?  I'd *love* it if it didn't zombie
>>> their servers for no reason.
>>   No. It marks the servers zombie for a reason: they're not responding.
>>  But it may be too aggressive.
>>
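A rough way to picture "marked zombie because it isn't responding": the minimal Python model below assumes the decision works like the `response_window` and `num_answers_to_alive` settings in the `home_server` section of proxy.conf. Those two names are borrowed from the 2.x proxy.conf as I understand it. The class, its methods, and the logic are a hypothetical simplification, not FreeRADIUS's actual code. A request left unanswered longer than the window flips the server to zombie, and it only returns to alive after enough replies.

```python
class HomeServerModel:
    """Hypothetical, simplified model of alive/zombie marking.
    Not FreeRADIUS code; parameter names only mirror proxy.conf options."""

    def __init__(self, response_window: float, num_answers_to_alive: int):
        self.response_window = response_window        # seconds to wait for a reply
        self.num_answers_to_alive = num_answers_to_alive
        self.state = "alive"
        self.pending_since = None   # send time of the oldest unanswered request
        self.answers = 0            # replies counted while zombie

    def sent(self, now: float) -> None:
        # Only the oldest unanswered request matters for the timeout.
        if self.pending_since is None:
            self.pending_since = now

    def reply(self) -> None:
        # Any reply clears the pending timer in this simplified model.
        self.pending_since = None
        if self.state == "zombie":
            self.answers += 1
            if self.answers >= self.num_answers_to_alive:
                self.state = "alive"
                self.answers = 0

    def tick(self, now: float) -> None:
        # Mark zombie if the oldest request has waited longer than the window.
        if (self.state == "alive" and self.pending_since is not None
                and now - self.pending_since > self.response_window):
            self.state = "zombie"
            self.answers = 0


hs = HomeServerModel(response_window=20, num_answers_to_alive=3)
hs.sent(0)
hs.tick(21)
print(hs.state)  # zombie
```

In this model a single lost request or lost reply is enough to trip the window, which is one way "too aggressive" could show up on a lossy link.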
>>> When I do a radiusd -CXXX, I see options I don't see documented for the
>>> latest releases of freeradius:
>>>  - ping_check
>>>  - ping_interval
>>>  - num_pings_to_alive
>>   Those are for backwards compatibility with pre-releases of 2.0.  They
>> should be removed.  They are just different names for the status-server
>> checks.
>>
> Excellent; I was wondering if I was somehow not "seeing" something as I
> went through the documentation.
>>>  - max_outstanding  (I can't even find what this is for)
>>   You can put a limit on the total number of "outstanding" packets sent
>> to a home server. For example, set it to 256, and if 256 packets are sent
>> without a response, the proxy will *not* use that home server again
>> until it gets at least one response.
>>
>>   This is a way to do load-limiting on home servers.
>>
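The load-limiting rule Alan describes amounts to a counter of unanswered requests. The following Python sketch is illustrative only; the class and method names are hypothetical and not part of FreeRADIUS. Each send takes a slot, each reply frees one, and the server is skipped while all slots are taken.

```python
class OutstandingLimiter:
    """Sketch of the max_outstanding idea: stop using a home server once
    too many requests are waiting for a reply. Illustrative only."""

    def __init__(self, max_outstanding: int):
        self.max_outstanding = max_outstanding
        self.outstanding = 0  # requests sent but not yet answered

    def usable(self) -> bool:
        return self.outstanding < self.max_outstanding

    def send(self) -> bool:
        """Count a request against the limit; refuse if the limit is hit."""
        if not self.usable():
            return False
        self.outstanding += 1
        return True

    def response(self) -> None:
        """A reply frees one slot, making the server usable again."""
        if self.outstanding > 0:
            self.outstanding -= 1


limiter = OutstandingLimiter(256)
accepted = sum(limiter.send() for _ in range(300))
print(accepted, limiter.usable())   # 256 False
limiter.response()
print(limiter.usable())             # True
```

A single response is enough to bring the server back into use, matching "until it gets at least one response".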
>>> As it is, my *.work files are "stuck" (and I've googled for that and
>>> found other list posts about it), which seems to indicate that the
>>> home servers aren't responding... except that even when my detail.work
>>> file is 'stuck' at 24k and the detail file keeps growing, I'm still
>>> sending data to the other side. So something's working, but only sort of.
>>   It's re-transmitting the same packet over and over.  If you install
>> 2.1.9, you can use "radmin" to see its progress in reading the detail file.
>>
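Alan's explanation fits a reader that will not move past a record until the home server replies to it. The following minimal Python sketch assumes one record in flight at a time. The class, the `send` callback, and the record labels are hypothetical. The reader keeps re-sending the same entry, so its position stays fixed while new records pile up behind it. Packets still go out the whole time, which would look like a detail.work file "stuck" at one size while traffic still flows.

```python
class DetailReader:
    """Hypothetical model of a detail-file reader that only advances after
    the home server acknowledges the current entry."""

    def __init__(self, entries):
        self.entries = list(entries)  # accounting records in file order
        self.pos = 0                  # index of the entry being sent
        self.attempts = 0             # sends of the current entry so far

    def step(self, send):
        """Send the current entry once; `send` returns True on a reply."""
        if self.pos >= len(self.entries):
            return None
        entry = self.entries[self.pos]
        self.attempts += 1
        if send(entry):
            self.pos += 1       # acknowledged: move on to the next record
            self.attempts = 0
        return entry


def home_server_answers(entry):
    """Hypothetical stand-in for a home server that never answers acct-2."""
    return entry != "acct-2"


reader = DetailReader(["acct-1", "acct-2", "acct-3"])
for _ in range(10):
    reader.step(home_server_answers)
print(reader.pos, reader.attempts)  # 1 9
```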
> After some work getting 2.1.9 and v2.1.x from the git repository up and
> running, I had to go back to 2.1.7-7, which is patched (hopefully,
> anyway!) for the "zombie" problem with the patch you sent me. The 2.1.9
> and 2.1.10 versions would die unexpectedly right around the time the
> "Info: ... ... adding new socket command file
> /var/run/radiusd/radiusd.sock" message scrolled through the debug output.
> I couldn't figure it out for the life of me, and strace didn't tell me
> much; it would just segfault right around that time. It did the same on
> vanilla installs of 2.1.10, so I just gave up.
> 
> At any rate, "radmin" *does* exist for 2.1.7-7 (from the Red Hat source,
> which I patched with the patch you gave me), but it's complaining about
> permissions on the sock file. They appear to be fine, but perhaps
> SELinux is blocking it; I have to take a gander. Once I get that ironed
> out, I'll take great pleasure in using radmin and seeing what it sees.
> 
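One way to rule out ordinary permission problems on the control socket is to check it as the same user radmin runs as. The following Python sketch uses the socket path from the debug output above. The function name is hypothetical. It only inspects standard permission bits: connecting to a Unix socket needs write permission on the socket file, but an SELinux denial will not show up here. `ls -Z` shows the socket's SELinux label.

```python
import os
import stat


def check_control_socket(path: str) -> dict:
    """Report basic facts about a Unix socket file (permission bits only)."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return {"exists": False}
    except PermissionError as exc:
        # A parent directory is not searchable by this user.
        return {"exists": None, "error": str(exc)}
    return {
        "exists": True,
        "is_socket": stat.S_ISSOCK(st.st_mode),
        "mode": oct(stat.S_IMODE(st.st_mode)),
        "uid": st.st_uid,
        "gid": st.st_gid,
        # Connecting to a Unix socket needs write permission on the file.
        "writable": os.access(path, os.W_OK),
    }


print(check_control_socket("/var/run/radiusd/radiusd.sock"))
```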
>>> I'm about to shoot an email to them to see if they can explain their
>>> 4-year-old RADIUS software; maybe that's part of the problem.
>>   Yup.  They can upgrade to a (cough) real radius server. :)
>>
> 
> It turns out they were a bit standoffish. They didn't like their RADIUS
> servers being implicated: "It's working for 30+ clients, so we have no
> plans to upgrade."
> 
> One thing I also noticed is that FreeRADIUS doesn't seem to give a
> packet very many tries before marking the home server down. At least,
> that's how it appears. I don't know Wireshark filters well enough to
> find unanswered packets, so I'll have to learn that before I can
> piece it together.
> 
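Whether a handful of tries is enough depends on the loss rate. The following Python sketch assumes each packet is lost independently with the same probability in both directions. Real loss is often bursty, which makes back-to-back failures more likely than this. Under that assumption, the chance that every attempt goes unanswered shrinks geometrically with the number of tries. The function names and the try counts are illustrative only. The 7% figure is the worst case from the ping scan in the next paragraph.

```python
def exchange_fails(loss: float) -> float:
    """Probability one request/reply exchange goes unanswered when each
    direction independently drops a packet with probability `loss`."""
    return 1 - (1 - loss) ** 2


def all_tries_fail(loss: float, tries: int) -> float:
    """Probability that `tries` independent attempts all go unanswered."""
    return exchange_fails(loss) ** tries


# 7% is the worst measured loss; the try counts are arbitrary examples.
for tries in (1, 2, 3, 5):
    print(tries, f"{all_tries_fail(0.07, tries):.4%}")
```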
> It is also noteworthy that upon ping-scanning their network, I found two
> IP addresses that are up, and I'm getting packet loss to them: between
> 4 and 7 percent. While not a ton, that might be enough to cause a
> problem when I'm relaying thousands of packets an hour.
> 
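To put "not a ton" in perspective, here is a back-of-the-envelope Python calculation. It assumes independent loss in each direction, and 3,000 requests per hour is a hypothetical stand-in for "thousands an hour". The calculation estimates how many exchanges per hour would lose either the request or the reply and so need a retransmission.

```python
def expected_unanswered_per_hour(requests_per_hour: int, loss: float) -> float:
    """Expected request/reply exchanges per hour that lose either the
    request or the reply, assuming independent loss in each direction."""
    return requests_per_hour * (1 - (1 - loss) ** 2)


# 3,000 requests/hour is a hypothetical stand-in for "thousands an hour".
for loss in (0.04, 0.07):
    print(f"{loss:.0%} loss: ~{expected_unanswered_per_hour(3000, loss):.0f} per hour")
```

Under those assumptions that comes to about 235 exchanges per hour at 4% loss and about 405 at 7%. Each one is a chance for a retransmission, or a zombie marking if several land close together.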
> Thanks for the help, Alan. I appreciate it.
> 

-- 
Adam


