Enhanced regex support

Arran Cudbard-Bell a.cudbardb at freeradius.org
Sat Dec 13 05:53:50 CET 2014


> On 12 Dec 2014, at 19:46, Arran Cudbard-Bell <a.cudbardb at freeradius.org> wrote:
> 
> Regular expressions were one of the few non-binary-safe functions of the expression evaluation code.
> 
> The reason for this was that the POSIX-specified regexec and regcomp functions didn't take a length
> argument and would stop parsing if they hit an embedded \0.
> 
> This has been a well-known security vulnerability in the PHP world for the past 10 years, where in some
> cases form validation could be bypassed by adding an embedded null to a string being validated.
> 
> It could have been used in a similar way for RADIUS, though the likelihood of someone using it to 
> expose a critical vulnerability in a site's configuration was slim, so it was never a priority to fix.
> 
> The latest series of commits introduces some changes in how regexes are handled.
> 
> *	If you're building on a POSIX compliant non-BSD system without libpcre (i.e. most Linuxes) 
> 	and if the regex code finds an embedded \0 in the pattern or subject of the regular expression,
> 	evaluation will fail.
> 
> *	If you're building on a POSIX compliant BSD system, regncomp and regnexec are used. These are
> 	non-portable BSD functions which take a length argument. In this case a string with an
> 	embedded \0 will be treated like any other string.
> 
> *	If you're building on a system with libpcre, the native libpcre functions are used (previously
> 	we relied on the pcreposix shim). As pcre_compile, pcre_study and pcre_exec all take length 
> 	arguments, a string with an embedded \0 will be treated like any other string.
> 
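
The three behaviours above can be sketched in a few lines. This is only an
illustration in Python using its standard re module (which is itself length
aware), not the server's C code. The pattern, the payload and all function
names are hypothetical, and the "NUL-terminated" case is simulated by cutting
the subject at the first \0.

```python
import re

# Hypothetical validation rule: the value must be lowercase letters only.
PATTERN = re.compile(rb"[a-z]+")


def c_string_view(data: bytes) -> bytes:
    """Return what an API without a length argument would see:
    everything before the first NUL byte."""
    return data.split(b"\x00", 1)[0]


def validate_nul_terminated(data: bytes) -> bool:
    """Simulate regcomp/regexec style matching that stops at the first NUL."""
    return PATTERN.fullmatch(c_string_view(data)) is not None


def validate_length_aware(data: bytes) -> bool:
    """Simulate a length-aware API: the whole buffer is the subject."""
    return PATTERN.fullmatch(data) is not None


def validate_reject_nul(data: bytes) -> bool:
    """Simulate the strict fallback: an embedded NUL makes evaluation fail."""
    if b"\x00" in data:
        raise ValueError("embedded NUL in subject")
    return PATTERN.fullmatch(data) is not None
```

With a subject such as b"bob\x00; anything", the simulated NUL-terminated
check accepts the value because it only ever sees "bob", the length-aware
check rejects it, and the strict variant refuses to evaluate it at all.
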
> In addition to being mostly (apart from POSIX) binary safe, switching to the native libpcre library
> has some advantages.
> 
> If using libpcre > 8.20 the JIT compiler is now used for precompiled expressions (the majority of
> unlang if statements). The JIT compiler converts the compiled expression to architecture-specific machine
> code, which should execute significantly faster, especially in expressions with lots of alternation.
> 
> Named capture groups are also available if using libpcre, and may be accessed using 
> %{regex:<named capture group>}. 
> 
> Reworking the subcapture storage code has removed the performance penalty for large numbers of 
> subcaptures, so the limit has been increased to 32. The limit is completely arbitrary, but is compiled
> in, so can't be changed in the config. If people really want more capture groups then it can be
> increased again.
> 
> The memory cost is 12 bytes for every PCRE capture group or 16 bytes for every POSIX capture group,
> and access is O(1).
> 
> -Arran

JIT is actually available in libpcre >= 8.20.

The linked version is now shown in radiusd -xv.

Fri Dec 12 21:12:35 2014 : Debug: Server core libs:
Fri Dec 12 21:12:35 2014 : Debug:   talloc : 2.0.*
Fri Dec 12 21:12:35 2014 : Debug:   ssl    : OpenSSL 0.9.8za 5 Jun 2014 0x009081af (0.9.8z release)
Fri Dec 12 21:12:35 2014 : Debug:   pcre   : 8.36 2014-09-26
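
If you want to check a reported version against the 8.20 threshold from a
script, something like the following would do. It's a minimal sketch that
assumes the version string always has the "major.minor date" shape shown in
the output above. The function name is hypothetical and is not part of the
server.

```python
def pcre_supports_jit(version: str) -> bool:
    """Return True if a libpcre version string is 8.20 or later.

    Assumes the "major.minor date" shape, e.g. "8.36 2014-09-26".
    """
    number = version.split()[0]  # drop the release date
    major, minor = (int(part) for part in number.split(".")[:2])
    # Compare as integers so that 8.9 sorts before 8.20.
    return (major, minor) >= (8, 20)
```

Comparing the parts as integers rather than as text or as a float matters
here, since "8.9" is an older release than "8.20".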

A few more points. 

Named captures can be accessed using the numeric indexes too. Named captures also use one subcapture slot,
so it's worth using (?:<pattern>) if you don't need the value.
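
To illustrate the relationship between named groups, numeric indexes and
non-capturing groups, here is a rough sketch using Python's re module rather
than libpcre or unlang. Python spells named groups (?P<name>...), a syntax
libpcre also accepts. The pattern and the sample value are made up for the
example.

```python
import re

# Hypothetical pattern splitting "user@realm" into two named groups.
# The (?:...) around the repeated ".label" part groups it without
# consuming a capture slot.
PATTERN = re.compile(
    r"(?P<user>[^@]+)@(?P<realm>[a-z0-9-]+(?:\.[a-z0-9-]+)*)"
)


def split_identity(value: str):
    """Return the match object for value, or None if it doesn't match."""
    return PATTERN.fullmatch(value)


match = split_identity("bob@example.org")
user_by_name = match.group("user")   # access by name
user_by_index = match.group(1)       # same capture, by numeric index
slots_used = PATTERN.groups          # the non-capturing group isn't counted
```

Here the two named groups take slots 1 and 2, and the (?:...) group takes
none, so only two capture slots are used.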

If the server is built without PCRE, the %{regex: } expansion will not be available, and the server will 
fail to start if it's used in the config.

If headers from version 8.20 or later are used with a library older than 8.20,
pcre_study will fail with an error about invalid options.

UTF8 support is left disabled, but may be enabled by inserting (*UTF8) at the start of the pattern.
This is a standard feature of libpcre. There's no option to turn it off within the pattern, which
is why it's not explicitly enabled by default.

Line endings can also be customised on a per pattern basis. Options for PCRE are described here:
http://www.pcre.org/pcre.txt

libpcre is nicer than the POSIX regex functions in that it also gives us descriptive error messages
and the offset where the issue was found, meaning we can output error markers like this:

/usr/local/freeradius/etc/raddb/sites-enabled/default[250]: Invalid regular expression:
/usr/local/freeradius/etc/raddb/sites-enabled/default[250]: testing123_foo[
/usr/local/freeradius/etc/raddb/sites-enabled/default[250]:                ^ Pattern compilation failed: missing terminating ] for character class
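
The idea of turning an error offset into a marker line is simple enough to
sketch. The example below uses Python's re module, whose compile errors also
carry a message and a position. It is not the server's code, the function
name is hypothetical, and the message text and the exact offset reported will
differ from libpcre's.

```python
import re


def describe_regex_error(pattern: str):
    """Return [pattern, marker] if the pattern fails to compile, else [].

    The marker line puts a caret under the offset reported by the regex
    engine, followed by the engine's own error message.
    """
    try:
        re.compile(pattern)
    except re.error as exc:
        # pos can be None if the engine did not report an offset.
        offset = exc.pos if exc.pos is not None else 0
        return [pattern, " " * offset + "^ " + exc.msg]
    return []


for line in describe_regex_error("testing123_foo["):
    print(line)
```

A caller would normally prefix each returned line with the file name and line
number, as in the output above.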

-Arran

Arran Cudbard-Bell <a.cudbardb at freeradius.org>
FreeRADIUS development team

FD31 3077 42EC 7FCD 32FE 5EE2 56CF 27F9 30A8 CAA2


