Configure SQL to time out requests rather than reject them when no connections are available

Fri Jan 16 19:18:05 CET 2015

I am using FreeRADIUS 3.0.4, compiled from source on CentOS 6.5.  I'm using a source build rather than the packaged version because my RADIUS servers need to connect to an MS SQL AlwaysOn availability group (cluster), which relies on the Microsoft SQL Native Client 11 ODBC driver.  That driver requires a source-compiled unixODBC, which in turn prevents me from installing the packaged versions of ODBC and FreeRADIUS.

My infrastructure is set up with multiple MikroTik routers as the RADIUS clients, connecting to two FreeRADIUS servers in a round-robin fashion.  The MikroTik routers are configured to wait 1500ms for a response from their primary RADIUS server before they re-submit the request.  If they make a total of 3 requests with no response (the original + 2 resends, so roughly 4.5 seconds) on the primary server, they re-submit the request to the secondary server.  The RADIUS requests are made to authorize DHCP requests and hotspot sessions.  Because the reply information that needs to go out (rate-limit and other similar properties) differs between the two types of request, we use realms to route each request to the appropriate virtual server.  Both virtual servers share the same SQL connection pool, but run different queries.

Occasionally, we have a network disruption (intentional or otherwise) that drops multiple hotspot sessions simultaneously, "crushing" the RADIUS server under a sharp spike in request load.  The SQL module is configured as follows:

	pool {
		start = 15
		min = 0
		max = 45
		spare = 15
		uses = 0
		lifetime = 0
		idle_timeout = 0
	}

Under heavy load, radius.log shows entries saying there aren't enough spare connections and that the server is opening another one.  It then runs out of connections before it can open enough additional ones (it usually only tries to add 1, maybe 2, new connections in total over the course of the spike).
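
For illustration only, one way to avoid opening connections during a spike would be to pre-open the whole pool at startup.  This is a sketch under the assumption that the SQL backend can hold 45 idle connections per RADIUS server; it has not been tested against this setup:

	pool {
		# Assumption: open every connection at startup and never
		# drop below that, so none are opened mid-spike.
		start = 45
		min = 45
		max = 45
		# The pool is always at max, so there is nothing to spare.
		spare = 0
		uses = 0
		lifetime = 0
		idle_timeout = 0
	}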

Looking through the log of the "master" virtual server, which receives the requests and proxies them to the realm-specific handler (another virtual server on the same physical machine), we're seeing entries along the lines of:
Auth: Login incorrect (Home Server says so): [<client MAC>@hotspot<realm>] (from client <nas identity> port 16346 cli <client MAC>)

The realm-specific virtual server is returning:
Auth: Invalid user: [<client MAC>@hotspot<realm>] (from client <nas identity> port 16415 cli <client MAC> via TLS tunnel)

Is there a setting I can change to get the server to simply ignore/drop the request when no SQL connections are available, rather than sending a reject?  I'm obviously still tuning my max concurrency limits to minimize the frequency of these spikes, but it would be nice if RADIUS would also help me by causing the clients to re-send, which would spread out the load and distribute it across both servers, rather than generating a bunch of rejects that block my customers until someone intervenes and cleans up the rejected sessions.
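
To make the behaviour being asked for concrete, here is a rough sketch in unlang, not a confirmed answer.  It assumes that the sql module returns "fail" when it cannot get a connection from the pool, and that a "do_not_respond" policy is available in this build; whether that holds for 3.0.4, and how it interacts with proxying to an internal virtual server, is unverified:

	authorize {
		# Lower the priority of "fail" so processing continues
		# instead of the section returning immediately.
		sql {
			fail = 1
		}
		# Assumption: sql returned "fail" because the pool was empty.
		if (fail) {
			# Assumption: this policy exists; it should suppress
			# the reply so the NAS times out and re-sends.
			do_not_respond
		}
	}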

Thank you,

Noah Engelberth
MetaLINK Technologies