Enhanced regex support

Arran Cudbard-Bell a.cudbardb at freeradius.org
Sat Dec 13 01:46:40 CET 2014


Regular expressions were one of the few parts of the expression evaluation code that were not binary safe.

The reason was that the POSIX-specified regcomp and regexec functions don't take a length
argument, and stop parsing when they hit an embedded \0.

This has been a well-known security vulnerability in the PHP world for the past 10 years: in some
cases form validation could be bypassed by adding an embedded \0 to the string being validated.

It could have been used in a similar way for RADIUS, though the likelihood of someone using it to 
expose a critical vulnerability in a site's configuration was slim, so it was never a priority to fix.

The latest series of commits introduces some changes in how regexes are handled.

*	If you're building on a POSIX compliant non-BSD system without libpcre (e.g. most Linux
	distributions), and the regex code finds an embedded \0 in the pattern or subject of the
	regular expression, evaluation will fail.

*	If you're building on a POSIX compliant BSD system, regncomp and regnexec are used. These are
	non-portable BSD functions which take a length argument. In this case a string with an
	embedded \0 will be treated like any other string.

*	If you're building on a system with libpcre, the native libpcre functions (pcre_compile,
	pcre_study and pcre_exec) are used; previously we relied on the pcreposix shim. As pcre_exec
	takes a length argument for the subject, a subject string with an embedded \0 will be
	treated like any other string.
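
To illustrate the first case, here is a minimal sketch in C of why the plain POSIX
functions are not binary safe, and of the kind of length-aware check that makes evaluation
fail instead. It is not the FreeRADIUS implementation: the helper names (has_embedded_nul,
posix_match_unchecked, posix_match_checked) and the return value convention are invented
for this example, and it assumes the caller's buffers are NUL terminated after len bytes.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if the first len bytes of buf contain a \0. */
static bool has_embedded_nul(char const *buf, size_t len)
{
	return memchr(buf, '\0', len) != NULL;
}

/*
 * Hypothetical, deliberately naive matcher: regcomp and regexec have no
 * length argument, so both only see the bytes before the first \0.
 * Returns 1 on match, 0 on no match, -1 on compile error.
 */
static int posix_match_unchecked(char const *pattern, char const *subject)
{
	regex_t preg;
	int ret;

	assert(pattern && subject);

	if (regcomp(&preg, pattern, REG_EXTENDED | REG_NOSUB) != 0) return -1;
	ret = regexec(&preg, subject, 0, NULL, 0);
	regfree(&preg);

	return (ret == 0) ? 1 : 0;
}

/*
 * Hypothetical length-aware wrapper: refuses to evaluate if either the
 * pattern or the subject has an embedded \0 within its stated length.
 * Returns 1 on match, 0 on no match, -1 on error (including embedded \0).
 */
static int posix_match_checked(char const *pattern, size_t pattern_len,
			       char const *subject, size_t subject_len)
{
	assert(pattern && subject);

	if (has_embedded_nul(pattern, pattern_len)) return -1;
	if (has_embedded_nul(subject, subject_len)) return -1;

	return posix_match_unchecked(pattern, subject);
}
```

With the 11 byte subject "alice\0;drop" and the pattern ^[a-z]+$, the unchecked function
reports a match because regexec only ever sees "alice", while the checked wrapper returns
an error rather than validating a truncated string.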

In addition to making regex handling mostly binary safe (the exception being plain POSIX),
switching to the native libpcre API has some other advantages.

If using libpcre > 8.20 the JIT compiler is now used for precompiled expressions (the majority of
unlang if statements). The JIT compiler converts the compiled expression to architecture-specific
machine code, which should execute significantly faster, especially in expressions with lots of
alternation.

Named capture groups are also available if using libpcre, and may be accessed using 
%{regex:<named capture group>}. 

Reworking the subcapture storage code has removed the performance penalty for large numbers of 
subcaptures, so the limit has been increased to 32. The limit is completely arbitrary, but is compiled
in, so can't be changed in the config. If people really want more capture groups then it can be
increased again.

The memory cost is 12 bytes for every PCRE capture group or 16 bytes for every POSIX capture group,
and access is O(1).

-Arran

Arran Cudbard-Bell <a.cudbardb at freeradius.org>
FreeRADIUS development team

FD31 3077 42EC 7FCD 32FE 5EE2 56CF 27F9 30A8 CAA2


