Radiusd hangs on redis cluster failover (sometimes)

Milan Nikolic gen2brain at gmail.com
Wed Aug 7 19:37:17 CEST 2019


Hello,

I have an issue with FreeRADIUS 4.0.x and Redis Cluster. All nodes run
FreeRADIUS and use the Redis cluster. When I shut down one of the nodes,
Redis recovers and the cluster state is OK, but FreeRADIUS doesn't seem
to refresh the cluster topology. When I then send a packet to one of the
working nodes, it tries to send a command to the node that is down, then
hangs and never returns a response. After that I cannot stop radiusd
(Ctrl+C doesn't work) and it must be killed.

This is the last line in the log; nothing is printed after it:

Debug : (7)        rediswho - [16] >>> Sending command(s) to 192.168.1.8:7004 (fr_redis_cluster_state_init)

By the way, I changed the message in cluster.c just to confirm which
function is called (that file contains two identical "Sending command(s)"
messages). It is this line:
https://github.com/FreeRADIUS/freeradius-server/blob/master/src/lib/redis/cluster.c#L1784

So 192.168.1.8 is the node that I shut down to test high availability,
and I sent the packet after Redis had recovered.

This doesn't happen when I shut down the other node: I can see in the log
how FreeRADIUS refreshes the cluster topology, and everything just
continues to work. Before every test, I always make sure the cluster
state is OK and masters/slaves are balanced across all nodes.
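
A pre-test check like the one described above could be scripted roughly
as follows. This is only a sketch, not something taken from FreeRADIUS or
from the attached log: it assumes the text output of `redis-cli cluster
nodes` is already available as a string, and the function name
`summarize_cluster_nodes`, the shortened node IDs, and every address in
SAMPLE other than 192.168.1.8:7004 are hypothetical.

```python
from collections import Counter


def summarize_cluster_nodes(text):
    """Summarize `CLUSTER NODES` output: healthy roles per host, unhealthy nodes."""
    roles = {}      # host -> Counter of "master" / "slave" among healthy nodes
    unhealthy = []  # addresses flagged fail / fail? or with a disconnected link
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        addr = fields[1].split("@")[0]   # drop the cluster bus port suffix
        host = addr.rsplit(":", 1)[0]
        flags = set(fields[2].split(","))
        link_state = fields[7]
        if flags & {"fail", "fail?"} or link_state != "connected":
            unhealthy.append(addr)
            continue
        if "master" in flags:
            role = "master"
        elif "slave" in flags:
            role = "slave"
        else:
            continue  # e.g. a node still in handshake
        roles.setdefault(host, Counter())[role] += 1
    return roles, unhealthy


# Hypothetical sample with shortened node IDs; not real cluster output.
SAMPLE = """\
aaa1 192.168.1.6:7001@17001 myself,master - 0 0 1 connected 0-5460
aaa2 192.168.1.7:7002@17002 master - 0 0 2 connected 5461-10922
aaa3 192.168.1.8:7003@17003 master,fail - 0 0 3 disconnected 10923-16383
aaa4 192.168.1.8:7004@17004 slave,fail aaa1 0 0 4 disconnected
aaa5 192.168.1.6:7005@17005 slave aaa2 0 0 5 connected
aaa6 192.168.1.7:7006@17006 slave aaa3 0 0 6 connected
"""

if __name__ == "__main__":
    roles, unhealthy = summarize_cluster_nodes(SAMPLE)
    for host, counts in sorted(roles.items()):
        print(host, dict(counts))
    print("unhealthy:", unhealthy)
```

Running the same summary before and after shutting a node down makes it
easy to see whether the cluster itself still lists the stopped node as
failed at the moment the test packet is sent.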

Attached is the log file I get with `radiusd -X` on the node that fails
and hangs after it tries to contact the node that is down.

Thanks,
Milan