Home servers constantly zombied, and I can't figure out how to fix it

Adam Bultman abultman at mtasolutions.com
Fri Jul 16 03:57:49 CEST 2010



Alan DeKok wrote:
> Adam Bultman wrote:
>> I have FreeRADIUS 2.1.3 servers that are proxying accounting information
>> to two remote RADIUS servers (radiator, if it matters.)
> 
>   It could matter.
> 

Judging by the Status-Server replies, their servers report themselves as
Radiator version 3.15, which according to Radiator's web site is from
June of 2006. Holy crap.


> 
>   As an indication that this is happening, see:
> 
> http://www.open.com.au/radiator/history.html
> 
>   See Revision 4.0 (2008-01-14), reference to RFC 5080.  They
> implemented the duplicate detection cache which has been in FreeRADIUS
> since day 1. (1999).
> 

Yeah, their release predates 4.0 by about 1.5 years, if the version in
their Status-Server replies is correct.
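
As a rough illustration of what such a duplicate detection cache does (this
is not Radiator's or FreeRADIUS's actual code), the Python sketch below
remembers each request under the key (source IP, source port, packet ID,
Request Authenticator) and replays the stored reply when a retransmit
arrives, instead of processing the request again. The class name, the
30-second lifetime, and the process() callback are assumptions made up for
the example.

```python
import time


class DuplicateCache:
    """Toy duplicate-detection cache; every name here is illustrative."""

    def __init__(self, lifetime=30.0, clock=time.monotonic):
        self.lifetime = lifetime  # assumed cache lifetime, in seconds
        self.clock = clock        # injectable clock so the sketch is testable
        self.entries = {}         # key -> (expiry time, cached reply)

    def handle(self, src_ip, src_port, packet_id, authenticator, process):
        """Return (reply, was_duplicate) for one incoming request."""
        now = self.clock()
        # Forget entries whose lifetime has run out.
        self.entries = {k: v for k, v in self.entries.items() if v[0] > now}

        key = (src_ip, src_port, packet_id, authenticator)
        if key in self.entries:
            # Retransmit: replay the cached reply without reprocessing.
            return self.entries[key][1], True

        reply = process()  # first sighting: do the real work once
        self.entries[key] = (now + self.lifetime, reply)
        return reply, False
```

A server without this kind of cache either ignores or re-processes a
retransmitted request, so a reply lost on the way back is harder to recover
from.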

>   My suggestion is to try replacing one of the home servers with
> FreeRADIUS.  It will respond to any retransmit.  So if there are packet
> loss problems, they should be less problematic.
> 

Er, I can't do that; it's not the same company, not the same state.
They're tight-lipped about their RADIUS server, their firewalls, their
network setups. I have no clue what their systems are doing at any
given time, so it's hard to know what's going on on their end.

>> My problem is that the two servers I am sending to are constantly
>> declared zombies.  Perhaps related is that in packet traces on the
>> RADIUS servers, I see my RADIUS servers sending duplicate packets. I do
>> not know if the duplicate packets are because the NAS is sending
>> duplicate packets to me (it is indeed sending duplicate packets,
>> according to wireshark), or if it is something on the RADIUS server's
>> end.
> 
>   The NAS is likely sending retransmits because it isn't seeing a response.
> 

> 
>> I have been making a lot of configuration changes (esp. with regard to
>> the check interval, number of responses before alive, etc) - so if
>> anything is seriously out of whack, let me know - but it seems that no
>> matter what, those systems get marked as zombies by my RADIUS servers
>> half a dozen times a minute.
> 
>   The servers are marked zombie when the proxy sends a request, and
> doesn't see a response in 30s.
> 
>   That's a little aggressive.  The current logic doesn't take into
> account if *other* packets have received responses.  It should probably
> mark the home server "zombie" only if there have been *no* responses in
> the "zombie" time interval.
> 

How do I change that functionality?  I'd *love* it if it didn't zombie
their servers for no reason.
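
To make the difference concrete, here is a minimal Python model of the two
policies described above. It is only a sketch of the description in this
thread, not the FreeRADIUS implementation; the HomeServer class and its
method names are invented for the example, and the 30-second window is the
figure quoted above.

```python
class HomeServer:
    """Toy model of zombie detection for one home server (illustrative only)."""

    def __init__(self, window=30.0):
        self.window = window    # seconds to wait for a response
        self.pending = {}       # request id -> time the request was sent
        self.last_reply = None  # time of the most recent response, if any

    def sent(self, req_id, now):
        self.pending[req_id] = now

    def replied(self, req_id, now):
        self.pending.pop(req_id, None)
        self.last_reply = now

    def zombie_per_request(self, now):
        # Behaviour as described: one request left unanswered for the
        # whole window is enough to mark the server zombie.
        return any(now - t >= self.window for t in self.pending.values())

    def zombie_per_server(self, now):
        # Suggested behaviour: additionally require that *no* response
        # to any request has arrived within the window.
        if not self.zombie_per_request(now):
            return False
        return self.last_reply is None or now - self.last_reply >= self.window
```

With one request sent at t=0 and never answered, and another sent at t=5
and answered at t=10, the first policy flags the server at t=31 while the
second waits until no response at all has been seen for 30 seconds.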

When I run radiusd -CXXX, I see options that I can't find documented for
the latest releases of FreeRADIUS:
 - ping_check
 - ping_interval
 - num_pings_to_alive
 - max_outstanding  (I can't even find what this is for)

I've googled for these (and some others) and I'm not sure whether they're
still used. Perhaps they are deprecated but show up in the config check
anyway.

As it is, my *.work files are "stuck" (I've googled for that too, and
found other list posts about it), which seems to indicate that the
home servers aren't responding. Yet even when my detail.work file is
"stuck" at 24k and the detail file keeps growing, I'm still sending data
to the other side. So something's working, but only sort of.



>   I'd suggest replacing one of the home servers with FreeRADIUS.  If
> that makes a big difference for the proxy, then the Radiator server is
> borked.
> 
I'm about to shoot them an email to see if they can explain their
4-year-old RADIUS software; perhaps that's part of the problem.

Thanks for your help.

Adam


-- 
Adam


