3.0.x HEAD crashing
Phil Mayers
p.mayers at imperial.ac.uk
Wed Jun 18 20:11:17 CEST 2014
On 18/06/14 18:03, Arran Cudbard-Bell wrote:
>
> On 18 Jun 2014, at 16:45, Phil Mayers <p.mayers at IMPERIAL.AC.UK> wrote:
>
>> So run under valgrind, I'm reliably seeing use-after-free errors like this:
>>
>> Invalid read of size 4
>> at 0x36AD402D84: talloc_get_name (talloc.c:349)
>> by 0x36AD4057EA: _talloc_get_type_abort (talloc.c:1206)
>> by 0x4E470EC: fr_verify_vp (debug.c:829)
>
> Git pull.
>
> Set envvar TALLOC_FREE_FILL=B
>
> Talloc should now abort a little more gracefully.
>
> If it doesn't then lib/debug.c:828
Sorry, none of that seems to work; I just get:
Wed Jun 18 19:04:16 2014 : Info: talloc: access after free error - first free may be at src/lib/valuepair.c:171
Wed Jun 18 19:04:16 2014 : Info: Bad talloc magic value - access after free
Wed Jun 18 19:04:16 2014 : Info: talloc abort: Bad talloc magic value - access after free
Wed Jun 18 19:04:16 2014 : Info: CAUGHT SIGNAL: Aborted
Wed Jun 18 19:04:16 2014 : Info: Backtrace of last 17 frames:
/opt/fr3/lib/libfreeradius-radius.so(fr_fault+0xd2)[0x7f19f465d674]
/opt/fr3/lib/libfreeradius-radius.so(+0xa9af)[0x7f19f465d9af]
/usr/lib64/libtalloc.so.2(talloc_get_name+0x58)[0x36ad402dd8]
/usr/lib64/libtalloc.so.2(_talloc_get_type_abort+0x2b)[0x36ad4057eb]
/opt/fr3/lib/libfreeradius-radius.so(fr_verify_vp+0xb1)[0x7f19f465e162]
/opt/fr3/lib/libfreeradius-radius.so(_fr_cursor_init+0x67)[0x7f19f465c91f]
/opt/fr3/lib/libfreeradius-radius.so(fr_verify_list+0x2e)[0x7f19f465e602]
/opt/fr3/lib/libfreeradius-server.so(+0x20a7d)[0x7f19f48b4a7d]
/opt/fr3/lib/libfreeradius-server.so(verify_request+0xd4)[0x7f19f48b4b5b]
/opt/fr3/sbin/radiusd[0x4354b9]
/opt/fr3/sbin/radiusd[0x433251]
/opt/fr3/lib/libfreeradius-radius.so(fr_event_run+0x142)[0x7f19f46803d9]
/opt/fr3/lib/libfreeradius-radius.so(fr_event_loop+0x509)[0x7f19f4680ce6]
/opt/fr3/sbin/radiusd(radius_event_process+0x26)[0x43d488]
/opt/fr3/sbin/radiusd(main+0xbf5)[0x42a1a5]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x379dc1ed1d]
/opt/fr3/sbin/radiusd[0x40cd09]
...and then it calls panic_action and aborts.
I can reliably reproduce it now; the trick seems to be to set a bunch of
eapol_test instances running in a loop, then stop the server and start it
again. I think this means it's a concurrency problem or a race.
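For anyone wanting to try the same thing, here is a rough sketch of that kind
of client loop in Python. It is not the exact setup used above: the config
file name, server address, port, shared secret and number of clients are all
placeholder assumptions, and the helper names (eapol_test_cmd, hammer,
start_clients) are made up for illustration. Only the eapol_test options
-c, -a, -p and -s are taken from the tool itself.

```python
import subprocess
import threading


def eapol_test_cmd(conf, server="127.0.0.1", port=1812, secret="testing123"):
    """Build an eapol_test command line.

    All default values here are placeholders, not values from this thread.
    """
    return ["eapol_test", "-c", conf, "-a", server, "-p", str(port), "-s", secret]


def hammer(cmd, stop, run=subprocess.call):
    """Re-run cmd back to back until stop is set; return the number of runs.

    Failures are expected while the server is down, so exit codes are ignored.
    """
    runs = 0
    while not stop.is_set():
        run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        runs += 1
    return runs


def start_clients(cmd, count, stop, run=subprocess.call):
    """Start count background threads, each looping over cmd until stop is set."""
    threads = [
        threading.Thread(target=hammer, args=(cmd, stop, run), daemon=True)
        for _ in range(count)
    ]
    for thread in threads:
        thread.start()
    return threads
```

With something like start_clients(eapol_test_cmd("ttls.conf"), 8, stop) running
(ttls.conf being a hypothetical eapol_test config, stop a threading.Event), the
server can then be stopped and started by hand while the clients keep retrying.
Call stop.set() to wind the clients down afterwards.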
I have some circumstantial evidence that eap_ttls is implicated, and
that it might be related to the handling of the fake requests for the
inner tunnel - but it's very circumstantial. The heap corruption makes
it really hard to be sure of anything - *someone* is trampling over
memory they shouldn't, but valgrind seems to get very confused when
this happens and swamps me with messages.