FreeRADIUS in failover - HA setup (question)

Arran Cudbard-Bell a.cudbardb at freeradius.org
Thu Jul 19 16:47:06 CEST 2012


On 19 Jul 2012, at 08:52, Arran Cudbard-Bell wrote:

> 
> On 19 Jul 2012, at 01:11, Aldo Zavala wrote:
> 
>> Hi, everybody. 
>> 
>> I was reading the "Deploying FreeRADIUS with the MySQL Cluster Database" whitepaper, downloaded from the MySQL website. Section "3.1 Deployment Topologies" says that MySQL Cluster can be integrated with FreeRADIUS, but it always shows FreeRADIUS installed on a single node. Is there a way to set up FreeRADIUS to fail over the same way MySQL does, rather than always running on a single node?
> 
> Yes. Define multiple instances of the SQL module and point each one at a different SQL node: the local node first, then any other SQL nodes in the cluster.
> 
> You can then use the redundant construct to switch between them in case of failure:
> 
> redundant {
> 	sql
> 	sql_remote0
> 	sql_remote1
> 	sql_remoteN
> }
> 
> The problem (with 2.1.x and 3.0) is that if all the connections in the SQL connection pool are down, the SQL module only fails after it has tried to establish a new connection and that attempt has failed.
> 
> Only one thread can modify the contents of the connection pool at a time, because the pool is protected by a mutex. So only one thread can try to open a new connection, and until that attempt succeeds or fails, all the other threads block.
> 
> This creates a bottleneck: threads queue up, each waiting for its turn to try to re-establish the connection before failing over. Because the server is largely stateless, this happens on every request.
> 
> I believe a good fix would be to check the state of the pool and the mutex, and if we're at 0 connections and the pool is locked, to fail instantly. I'll talk to Alan and see what he thinks; maybe it should be configurable.
> 
> If you're connecting to SQL via a Unix socket you should be fine, but as soon as you hit a remote server where you have to wait for the TCP connection to time out, I can almost guarantee that the server will lock up completely.
> 
> So, unlike the example above, I'd recommend specifying only one local node and one remote node. Or wait, and we'll try to fix this for 3.0.

A fix is now in 3.0 (the master branch). If you want to use FreeRADIUS with NDB and failover, I highly recommend using this branch over 2.1.x.

-Arran
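The fail-fast idea discussed in the quoted message (if the pool has zero connections and its mutex is already held, fail instantly so the redundant section moves on to the next SQL instance) can be sketched in C with `pthread_mutex_trylock()`. This is a hypothetical, simplified model and not the code that went into 3.0: `pool_t`, `pool_init()`, `pool_spawn()`, `pool_get()` and `redundant()` are invented for illustration, `node_up` stands in for whether a real connect would succeed, and reserving/releasing individual connections is not modelled.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model of one SQL module instance
 * ("sql", "sql_remote0", ...) and its connection pool. Not FreeRADIUS code. */
typedef struct {
    const char      *name;
    pthread_mutex_t  mutex;       /* serialises changes to the pool */
    atomic_int       open_conns;  /* atomic so it can be read without the mutex */
    bool             node_up;     /* stand-in for "connect() would succeed" */
} pool_t;

typedef enum { POOL_OK, POOL_FAIL } pool_rcode_t;

static void pool_init(pool_t *p, const char *name, bool node_up)
{
    p->name = name;
    pthread_mutex_init(&p->mutex, NULL);
    atomic_init(&p->open_conns, 0);
    p->node_up = node_up;
}

/* Stand-in for opening a new SQL connection. Against a dead remote node,
 * this is where a real thread could block until the TCP connect times out.
 * Caller must hold p->mutex. */
static bool pool_spawn(pool_t *p)
{
    if (!p->node_up) return false;
    atomic_fetch_add(&p->open_conns, 1);
    return true;
}

/* Acquire from the pool, failing fast when the pool is empty and another
 * thread already holds the mutex (presumably while it tries to reconnect). */
static pool_rcode_t pool_get(pool_t *p)
{
    if (pthread_mutex_trylock(&p->mutex) != 0) {
        /* Empty pool + busy mutex: don't queue behind the reconnect
         * attempt; return now so the caller can fail over. */
        if (atomic_load(&p->open_conns) == 0) return POOL_FAIL;

        /* Connections exist, so the wait for the mutex should be short. */
        pthread_mutex_lock(&p->mutex);
    }

    pool_rcode_t rcode = POOL_OK;
    if (atomic_load(&p->open_conns) == 0 && !pool_spawn(p)) rcode = POOL_FAIL;

    pthread_mutex_unlock(&p->mutex);
    return rcode;
}

/* Rough analogue of the redundant { } section: try each instance in the
 * listed order and stop at the first success. Returns the index of the
 * instance used, or -1 if every instance failed. */
static int redundant(pool_t *instances[], int count)
{
    for (int i = 0; i < count; i++) {
        if (pool_get(instances[i]) == POOL_OK) return i;
    }
    return -1;
}
```

The design choice is that a thread which finds the mutex busy and the pool empty returns immediately instead of queueing behind a reconnect that may be stuck waiting on a TCP timeout; when connections do exist, it still waits for the mutex as before.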


More information about the Freeradius-Users mailing list