MySQL and UTF-8 handling in FreeRADIUS 3.2.x

Fri Apr 4 10:01:07 UTC 2025

Alan DeKok <aland at deployingradius.com> writes:

> On Apr 3, 2025, at 7:35 AM, Bjørn Mork via Freeradius-Users <freeradius-users at lists.freeradius.org> wrote:
>> I have several questions after hitting this hard:
>> 
>> Should FR handle the situation better, detecting character set
>> mismatches and automatically escaping utf8 multi-byte sequences when
>> the target table uses a different charset?
>
>   In general, detecting character sets is impossible.  There is
>   significant overlap between valid characters, so any detection
>   attempt is likely to get things wrong.
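The overlap is easy to reproduce. Below is a minimal Python sketch, not FreeRADIUS code: the bytes of a UTF-8 string also decode without error as latin1, so inspecting the bytes alone cannot tell the two apart. Python's "latin-1" codec is ISO-8859-1, used here as a stand-in for MySQL's latin1, which is an assumption made for illustration.

```python
# The same byte string is valid in both encodings, so looking at the
# bytes alone cannot tell which encoding the sender meant.
raw = "Bjørn".encode("utf-8")       # six bytes: b'Bj\xc3\xb8rn'

as_utf8 = raw.decode("utf-8")       # the intended reading, 5 characters
as_latin1 = raw.decode("latin-1")   # also succeeds, but yields 6 characters

print(as_utf8)
print(as_latin1)
```

Both decodes succeed; only the second produces mojibake, and nothing in the byte values flags it as wrong.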

I was thinking about detecting the mismatch between table schema and
mysql client connection.

But thinking more about this, it's probably not a good idea to add more
automatic "magic" here.  A simple

 i_want_multibyte_utf8 = no

defaulting to "yes" would be better and much simpler.

>> Should there be a way to configure rlm_sql into always escaping utf8
>> multi-byte sequences?
>
>   Not right now.
>
>> Does the default aggressive ascii escaping really align with allowing
>> any multi-byte utf8?  The end results look strange and unexpected IMHO.
>
>   SQL databases use ASCII characters for quotes, statement separation,
>   etc.  SQL databases treat UTF-8 characters the same way they treat
>   upper/lower case.  So valid UTF-8 characters (with high bit set)
>   don't need to be escaped.

Yes, I understand that they don't *need* to be escaped to be SQL safe.
But the end result is a mess, IMHO.

I guess there is some use case I don't see, but if I have to decode the
string to read it then I'd much prefer to have to decode the multi-byte
utf8 chars too.  Dealing with utf8 requires some extra care, as
demonstrated. And there is no gain unless the one byte utf8 chars are
allowed.  At least in my part of the world.
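To illustrate the difference, here is a minimal Python sketch of the two behaviours. It is not the rlm_sql implementation: the =XX hex form and the SAFE set are assumptions made for the example, and the function names escape_ascii_only and escape_all are hypothetical.

```python
# Assumed set of bytes that pass through unescaped (illustrative only).
SAFE = set(b"@abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.-_: /")


def escape_ascii_only(value: str) -> str:
    """Escape unsafe ASCII as =XX, pass every non-ASCII character through."""
    out = []
    for ch in value:
        if ord(ch) < 128 and ord(ch) not in SAFE:
            out.append(f"={ord(ch):02X}")
        else:
            out.append(ch)
    return "".join(out)


def escape_all(value: str) -> str:
    """Escape every byte outside SAFE, including multi-byte UTF-8 sequences."""
    out = []
    for byte in value.encode("utf-8"):
        out.append(chr(byte) if byte in SAFE else f"={byte:02X}")
    return "".join(out)


print(escape_ascii_only("Bjørn's"))  # mixed: raw UTF-8 plus =27
print(escape_all("Bjørn's"))         # pure ASCII: Bj=C3=B8rn=27s
```

The second form is always plain ASCII, so it survives any table or connection charset, and a consumer needs only one decoding step.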

>> Any thoughts outside "don't do that then"?
>
>   This is definitely a "don't do that" scenario.

Thanks for confirming.  It's now on my ever growing list of legacy
config to clean up :-)

> Attributes of type 'octets' can contain anything.  If you're going to
> store them as printable strings, you have to be careful.

Yup.  Storing them in their default hex digit encoding makes much more
sense. Post processing can be done by the data consumers. Safer and much
easier.  People should not access the accounting database directly
anyway.

>   The short term solution is perhaps to write a small patch which
>   double-checks that the data matches the latin1 character set you're
>   using.  Any data which doesn't match can be converted to escape
>   sequences.
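One possible reading of that suggestion, as a Python sketch rather than an actual patch: keep characters that latin1 can represent and turn everything else into escape sequences. The function name to_latin1_safe and the =XX escape form are assumptions, and Python's "latin-1" codec (ISO-8859-1) stands in for MySQL's latin1.

```python
def to_latin1_safe(value: str) -> str:
    """Keep latin1-representable characters, escape the rest as =XX bytes."""
    out = []
    for ch in value:
        try:
            ch.encode("latin-1")   # raises if latin1 has no such character
            out.append(ch)
        except UnicodeEncodeError:
            # Fall back to the UTF-8 bytes of the character, hex-escaped.
            out.extend(f"={b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


print(to_latin1_safe("Bjørn"))     # unchanged, ø exists in latin1
print(to_latin1_safe("Bjørn €5"))  # the euro sign becomes =E2=82=AC
```

This only covers the charset check. It does no SQL quoting, and a literal "=" in the input is left alone, so a real patch would have to run alongside the existing escaping.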

Or maybe add a sql_charset config item and use it in the rlm_sql_mysql
sql_socket_init()?

One stupid problem I hit while trying to work around this in a hurry
was that I put the "[freeradius]" section into the wrong mysql config
file.  Using /etc/my.cnf.d/freeradius.cnf did not work despite
/etc/my.cnf containing

  !includedir /etc/my.cnf.d

Maybe because that's in a section named "[client-server]" which does not
match "[freeradius]"?

Anyway, having a per-instance charset config setting would be nice.

Bjørn