Radiusd hangs on redis cluster failover (sometimes)

Alan DeKok aland at deployingradius.com
Thu Aug 8 16:32:38 CEST 2019


On Aug 7, 2019, at 1:37 PM, Milan Nikolic <gen2brain at gmail.com> wrote:
> 
> I have an issue with FreeRADIUS 4.0.x and Redis cluster. When I shut
> down one of the nodes (all of them run FreeRADIUS and use the Redis
> cluster), Redis recovers and the cluster state is OK, but FreeRADIUS
> doesn't seem to refresh the cluster topology. When I then send a packet
> to one of the working nodes, it tries to send a command to the node
> that is down, then hangs and never returns a response. I cannot stop
> radiusd after that (i.e. Ctrl+C doesn't work), and it must be killed.

  That isn't good.

> The last line in log is this, and nothing is printed after that:
> 
> Debug : (7)        rediswho - [16] >>> Sending command(s) to 192.168.1.8:7004 (fr_redis_cluster_state_init)
> 
> Btw, I changed the message in cluster.c just to confirm which function
> is called (that file contains two identical "Sending command(s)"
> messages); it is this line:
> https://github.com/FreeRADIUS/freeradius-server/blob/master/src/lib/redis/cluster.c#L1784
> So 192.168.1.8 is the node that I shut down to test high availability,
> and I send packet after redis is recovered.
> 
> This doesn't happen when I shut down the other node,

  What "other" node?  I know there's a cluster, but what is different between the two nodes?

> I can see in the log
> how radiusd refreshes the cluster topology and everything just continues
> to work. Before every test, I always make sure the cluster state is OK
> and masters/slaves are balanced across all nodes.
> 
> Attached is the log file I get with `radiusd -X` on the node that fails
> and hangs after it tries to contact the node that is down.

  Please don't attach log files as zips.  The mailing list deletes them.  Attach log files in-line.  If they're too large, put them on a pastebin web site somewhere.

  Alan DeKok.



