2.1.8 proxy zombie/dead/alive loops

Alan DeKok aland at deployingradius.com
Mon Jan 4 17:46:46 CET 2010


Craig Campbell wrote:
> There are 2 radius servers (radius-a and radius-b).
> Each server will relay packets it receives to the other server.
> (Currently only accounting packets are being received)
> The packets are collected in detail-relay file.
> The packets are then relayed via the
> sites/enabled/copy-acct-to-home-server config.

  OK...

> What I observe is a single packet being read from the detail-relay.work
> file on radius-b and being sent to radius-a.
> I do not see any response from radius-a being returned to radius-b. 
> After what seems to be about 30 seconds the packet is resent from
> radius-b to radius-a.  Again and again...

  If the home server doesn't respond, the packet will be retried forever.

> On radius-b the following messages are logged (status_check =
> status-server)...
>     Mon Jan  4 10:10:42 2010 : Proxy: Marking home server 192.168.1.225
>     port 1813 as zombie (it looks like it is dead).
>     Mon Jan  4 10:10:42 2010 : Proxy: Received response to status check
>     5938 (1 in current sequence)
>     Mon Jan  4 10:11:11 2010 : Proxy: Received response to status check
>     6013 (2 in current sequence)
>     Mon Jan  4 10:11:40 2010 : Proxy: Received response to status check
>     6048 (3 in current sequence)
>     Mon Jan  4 10:11:40 2010 : Proxy: Marking home server 192.168.1.225
>     port 1813 alive
>     Mon Jan  4 10:11:43 2010 : Proxy: Marking home server 192.168.1.225
>     port 1813 as zombie (it looks like it is dead).

  Uh... the "response_window" is 3 seconds?  Why?
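
  For reference, these timers are set in the home_server definition in
raddb/proxy.conf.  The following is only a sketch: the server name and
secret are placeholders, and the timer values are illustrative, not
taken from the configuration under discussion.  The address and port
are the ones shown in the log.

```
home_server radius-a {
	type = acct
	ipaddr = 192.168.1.225
	port = 1813

	# Placeholder; use the real shared secret.
	secret = testing123

	# Seconds to wait for a reply before the home server is
	# considered unresponsive and marked "zombie".
	response_window = 20

	# Seconds a zombie server may stay silent before it is
	# marked "dead".
	zombie_period = 40

	# Probe a zombie or dead server with Status-Server packets.
	status_check = status-server

	# Seconds between probes.
	check_interval = 30

	# Number of replies needed before the server is marked "alive".
	num_answers_to_alive = 3
}
```

  The roughly 30-second spacing of the status checks and the transition
to "alive" after the third reply in the log would be consistent with
check_interval = 30 and num_answers_to_alive = 3, but that is an
inference from the timestamps, not something the log states.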

>     Mon Jan  4 10:11:43 2010 : Proxy: Received response to status check
>     6051 (4 in current sequence)
>     Mon Jan  4 10:11:43 2010 : Proxy: Marking home server 192.168.1.225
>     port 1813 alive

  Hmm... that last bit shouldn't happen.  The timing is such that it
*can* send more than 3 Status-Server packets.  But it should use those
to make only *one* transition from "dead" to "alive".

>     Mon Jan  4 10:12:13 2010 : Proxy: Marking home server 192.168.1.225
>     port 1813 as zombie (it looks like it is dead).
>     Mon Jan  4 10:12:13 2010 : Proxy: Received response to status check
>     6086 (5 in current sequence)

  OK... the sequence numbers for the home server aren't being reset.
That's an issue, but probably a minor one.

>     Mon Jan  4 11:53:46 2010 : Info: [sql] stop packet with zero session
>     length. [user 'test_user_please_reject_me', nas '192.168.1.226']

  Why are you trying to log a test packet to SQL?  Just make sure that
the 'accounting' section returns 'ok' for the test packet.
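
  One way to do that, sketched in unlang, is to short-circuit the test
user before the "sql" module runs.  This assumes the test packets can be
recognised by the User-Name shown in the log above; the "detail" line
only stands in for whatever other modules the section already lists.

```
accounting {
	detail

	# Assumption: test packets are identifiable by this User-Name.
	if (User-Name == "test_user_please_reject_me") {
		ok
	}
	else {
		sql
	}
}
```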

> I suspect I SHOULD be using status_check=status-server.
> Which then leads to why my server keeps getting marked as
> zombie/dead/alive....

  Because the home server is down, and isn't responding.

> It seems like the accounting stop packet being sent is not generating a
> reply...?

  Yes.  Go fix that.  In 2.1.8, see raddb/sites-available/default.  Go
to the "accounting" section, and read the comments after the "sql" module.
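
  As a rough sketch of what those comments suggest (check the file
shipped with your version rather than relying on this): the "sql" module
returns "noop" for a stop packet with zero session length, and the
result can be turned into "ok" so that a reply is still sent.

```
accounting {
	# Other accounting modules would be listed here.
	sql

	# If the previous module returned "noop", treat the request as
	# handled so that an Accounting-Response is sent.
	if (noop) {
		ok
	}
}
```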

  Alan DeKok.


