Radiusd hangs on redis cluster failover (sometimes)
Alan DeKok
aland at deployingradius.com
Thu Aug 8 16:32:38 CEST 2019
On Aug 7, 2019, at 1:37 PM, Milan Nikolic <gen2brain at gmail.com> wrote:
>
> I have an issue with FreeRADIUS 4.0.x and Redis cluster. When I shut
> down one of the nodes (all of them run FreeRADIUS and use the Redis
> cluster), Redis recovers and the cluster state is OK, but FreeRADIUS
> doesn't seem to refresh the cluster topology. When I send a packet to
> one of the working nodes, it tries to send a command to the node that
> is down, then hangs and doesn't return a response. I cannot stop
> radiusd after that (i.e. Ctrl+C doesn't work), so it must be killed.
That isn't good.
> The last line in log is this, and nothing is printed after that:
>
> Debug : (7) rediswho - [16] >>> Sending command(s) to
> 192.168.1.8:7004 (fr_redis_cluster_state_init)
>
> Btw, I changed the message in cluster.c to confirm which function
> is called (that file has two identical "Sending command(s)" messages).
> It is this line: https://github.com/FreeRADIUS/freeradius-server/blob/master/src/lib/redis/cluster.c#L1784
> So 192.168.1.8 is the node that I shut down to test high availability,
> and I send the packet after Redis has recovered.
>
> This doesn't happen when I shut down the other node,
What "other" node? I know there's a cluster, but what is different between the two nodes?
> I can see in the log
> how FreeRADIUS refreshes the cluster topology and everything continues
> to work. Before every test, I always make sure the cluster state is OK
> and masters/slaves are balanced across all nodes.
>
> Attached is the log file I get with `radiusd -X` on the node that fails
> and hangs after it tries to contact the node that is down.
Please don't attach log files as zips. The mailing list deletes them. Attach log files in-line. If they're too large, put them on a pastebin web site somewhere.
Alan DeKok.