Enhanced regex support
Arran Cudbard-Bell
a.cudbardb at freeradius.org
Sat Dec 13 01:46:40 CET 2014
Regular expressions were one of the few parts of the expression evaluation code that were not binary
safe. The reason is that the POSIX-specified regcomp and regexec functions don't take a length
argument, so they stop parsing at the first embedded \0.
This has been a well-known security vulnerability in the PHP world for the past 10 years: in some
cases form validation could be bypassed by adding an embedded null to the string being validated.
It could have been used in a similar way for RADIUS, though the likelihood of someone using it to
expose a critical vulnerability in a site's configuration was slim, so it was never a priority to fix.
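The truncation described above is easy to reproduce with plain POSIX regcomp and regexec. The
following is a minimal sketch, not FreeRADIUS code: posix_match is a hypothetical helper, and the
pattern and subject mentioned afterwards are made up for illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * Returns 1 if subject matches pattern, 0 if it does not, -1 if the
 * pattern fails to compile.  Both arguments are plain C strings, so
 * regcomp and regexec only see the bytes before the first \0.
 */
static int posix_match(char const *pattern, char const *subject)
{
	regex_t preg;
	int ret;

	assert(pattern && subject);

	if (regcomp(&preg, pattern, REG_EXTENDED | REG_NOSUB) != 0) return -1;

	ret = regexec(&preg, subject, 0, NULL, 0);
	regfree(&preg);

	return (ret == 0) ? 1 : 0;
}
```

Given the 10 byte subject "alice\0';--", posix_match("^[a-z]+$", subject) returns 1, because regexec
only ever sees "alice". The same bytes without the \0 ("alice';--") return 0.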
The latest series of commits introduces some changes in how regexes are handled.
* If you're building on a POSIX-compliant non-BSD system without libpcre (e.g. most Linuxes),
and the regex code finds an embedded \0 in the pattern or subject of the regular expression,
evaluation will fail.
* If you're building on a POSIX-compliant BSD system, regncomp and regnexec are used. These are
non-portable BSD functions which take a length argument. In this case a string with an
embedded \0 will be treated like any other string.
* If you're building on a system with libpcre, the native libpcre functions are used (previously
we relied on the pcreposix shim). As pcre_compile, pcre_study and pcre_exec all take length
arguments, a string with an embedded \0 will be treated like any other string.
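The first case in the list above (plain POSIX, no length-aware functions) amounts to checking both
buffers for a \0 before handing them to regcomp and regexec. A rough sketch of such a check follows.
regex_match_checked is a hypothetical name and signature, not the function used in the server, and
it assumes both buffers also carry a terminating \0 just past their stated length.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * pattern_len and subject_len are the lengths of the data, excluding the
 * terminating \0 that the caller must still provide for regcomp/regexec.
 *
 * Returns 1 on match, 0 on no match, -1 on error.  An embedded \0 inside
 * either buffer counts as an error.
 */
static int regex_match_checked(char const *pattern, size_t pattern_len,
			       char const *subject, size_t subject_len)
{
	regex_t preg;
	int ret;

	assert(pattern && subject);

	/* Refuse to evaluate rather than silently truncate at the \0 */
	if (memchr(pattern, '\0', pattern_len) != NULL) return -1;
	if (memchr(subject, '\0', subject_len) != NULL) return -1;

	if (regcomp(&preg, pattern, REG_EXTENDED | REG_NOSUB) != 0) return -1;

	ret = regexec(&preg, subject, 0, NULL, 0);
	regfree(&preg);

	return (ret == 0) ? 1 : 0;
}
```

Failing outright is the conservative choice here: the alternative, matching only the bytes before
the \0, is exactly the behaviour that allows validation to be bypassed.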
Besides making regex evaluation binary safe everywhere except on plain POSIX, switching to the
native libpcre functions has some other advantages.
If using libpcre > 8.20, the JIT compiler is now used for precompiled expressions (the majority of
unlang if statements). The JIT compiler converts the compiled expression to architecture-specific
machine code, which should execute significantly faster, especially in expressions with lots of
alternation.
Named capture groups are also available if using libpcre, and may be accessed using
%{regex:<named capture group>}.
Reworking the subcapture storage code has removed the performance penalty for large numbers of
subcaptures, so the limit has been increased to 32. The limit is completely arbitrary, but is compiled
in, so can't be changed in the config. If people really want more capture groups then it can be
increased again.
The memory cost is 12 bytes for every PCRE capture group or 16 bytes for every POSIX capture group,
and access is O(1).
-Arran
Arran Cudbard-Bell <a.cudbardb at freeradius.org>
FreeRADIUS development team
FD31 3077 42EC 7FCD 32FE 5EE2 56CF 27F9 30A8 CAA2