reconnect to mysql on DB write errors
aland at deployingradius.com
Mon Apr 25 16:50:48 UTC 2022
On Apr 25, 2022, at 8:33 AM, Cypher A <cyphera at gmail.com> wrote:
> I can’t find anything in rlm_sql that could kill the current connection.
Because the connection handling is done by the connection pool code, not by rlm_sql. See src/main/connection.c
> I also tested to use an exec xlat to run a script to kill and restart the current radiusd process. Really interesting that this works when radiusd is running in debug mode. But not when it’s running under systemd ..
> What’s the proper way to do this??
The proper way is to not make your database read-only while FreeRADIUS is trying to write to it.
What's happening here is fairly simple. When the DB goes read-only there is pretty much no way for FreeRADIUS (or the SQL driver) to know that. An error of "cannot insert duplicate row" is similar to "cannot write to the DB".
So the database is up, it's just not doing it's job. Since the DB is up, the SQL connections don't fail over.
The solution here is really "don't take the DB down while it' being used in production". There's no simple way to tell FreeRADIUS "close this SQL connection", because no one needs that functionality. For most people, if the DB is up, then it's writable. And when the DB is down, the normal SQL failover mechanism works fine.
This is very much an unusual situation which is caused by the decision to randomly make the DB read-only, but still have it respond to SQL queries.
Another solution which should work is to simply use FW rules to reset all of the old connections to the DB. Ensure that the firewall returns RST, then FreeRADIUS will see that the connections are down, and will fail over.
More information about the Freeradius-Users