3.0.x crash in _talloc_free()

Matthew Newton mcn4 at leicester.ac.uk
Wed Nov 18 13:47:48 CET 2015


Hi,

Just put our first real FR3 deployment live yesterday (3.0.x HEAD
46fb0cd312f0) :) Then quickly hit a crash :(

Initial outputs (was running -X to start so these are single
threaded) gave:

...
(294) eap_tls: ocsp: Cert status: good
(294) eap_tls: ocsp: Certificate is valid
(294) eap_tls: TLS_accept: SSLv3 read client certificate A
(294) eap_tls: <<< recv TLS 1.0 Handshake [length 0046], ClientKeyExchange 
*** glibc detected *** freeradius: free(): invalid pointer: 0x0000000001c7a140 ***
======= Backtrace: =========
...

and

...
Waking up in 0.1 seconds.
(4563) Cleaning up request packet ID 8 with timestamp +1917
(4564) Cleaning up request packet ID 9 with timestamp +1917
Waking up in 0.2 seconds.
(4566) Cleaning up request packet ID 207 with timestamp +1917
*** glibc detected *** /usr/sbin/freeradius: free(): invalid next size (fast): 0x00000000019f4830 ***
======= Backtrace: =========
...

Haven't manage to get a core dump yet, but the panic action is
working. Three crashes so far are the same, one slightly
different. All are in _talloc_free: The three same ones error in
the request_free (process.c:806) at the end of request_done. The
other is in eappeap_process (peap.c:828).

The fact it manages to get all the way through request_free
without crashing, with all the checks that are done in there,
before bombing out in the free seem slightly weird to me, as the
data in *request must be sane.

System is Debian wheezy with talloc 2.0.7. I think my next plan is
to upgrade the system to jessie which has version 2.1.1 available
and see if it's a bug in that old version of talloc that's being
hit. Weird nobody else has reported it before, though; it's not a
busy server and crashed four times so far this morning (though not
at all overnight when less lightly loaded).

Full startup output from -X and four gdb panic action dumps are at
(yeah I know there are a few warnings in the config at the moment):

  https://gist.github.com/mcnewton/63a01685f3748e202e81

One snippet of panic_action output below.

Any ideas?

Cheers,

Matthew



Thread 1 (Thread 0x7fadf28dd720 (LWP 16110)):
#0  0x00007fadf067a0ad in waitpid () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x00007fadf060e989 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#2  0x00007fadf060ecc0 in system () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#3  0x00007fadf22a80ab in fr_fault (sig=<optimized out>)
    at src/lib/debug.c:725
        disable = false
        cmd = "gdb -silent -x /srv/radius/panic.gdb /usr/sbin/freeradius 16110 2>&1 | tee /var/log/freeradius/gdb-freeradius-16110.log", '\000' <repeats 412 times>
        out = 0x7ffcff10ec23 ".log"
        left = 417
        ret = <optimized out>
        p = <optimized out>
        q = 0x0
        code = <optimized out>
#4  <signal handler called>
No symbol table info available.
#5  0x00007fadf0601165 in raise () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#6  0x00007fadf06043e0 in abort () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#7  0x00007fadf063b39b in ?? () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#8  0x00007fadf0644be6 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#9  0x00007fadf064998c in free () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#10 0x00007fadf1831089 in ?? () from /usr/lib/x86_64-linux-gnu/libtalloc.so.2
No symbol table info available.
#11 0x00007fadf1831023 in ?? () from /usr/lib/x86_64-linux-gnu/libtalloc.so.2
No symbol table info available.
#12 0x00007fadf1831023 in ?? () from /usr/lib/x86_64-linux-gnu/libtalloc.so.2
No symbol table info available.
#13 0x00007fadf1831023 in ?? () from /usr/lib/x86_64-linux-gnu/libtalloc.so.2
No symbol table info available.
#14 0x00007fadf182d8b3 in _talloc_free ()
   from /usr/lib/x86_64-linux-gnu/libtalloc.so.2
No symbol table info available.
#15 0x00000000004316fe in request_done (request=request at entry=0x7fade4101d00, 
    action=action at entry=6) at src/main/process.c:806
        now = {tv_sec = 0, tv_usec = 140384832071768}
        when = {tv_sec = 0, tv_usec = 140385039194508}
#16 0x000000000043347d in request_cleanup_delay (request=0x7fade4101d00, 
    action=<optimized out>) at src/main/process.c:1115
        when = {tv_sec = 1447837345, tv_usec = 239972}
        now = {tv_sec = 1447837345, tv_usec = 240304}
        __FUNCTION__ = "request_cleanup_delay"
#17 0x00007fadf22c44f4 in fr_event_run (el=el at entry=0x1ca51d0, 
    when=when at entry=0x7ffcff10fee0) at src/lib/event.c:309
        callback = 0x42f9e0 <request_timer>
        ctx = 0x7fade4101d00
        ev = <optimized out>
#18 0x00007fadf22c4a4e in fr_event_loop (el=0x1ca51d0) at src/lib/event.c:624
        i = <optimized out>
        rcode = 0
        when = {tv_sec = 1447837345, tv_usec = 240304}
        wake = <optimized out>
        maxfd = 20
        read_fds = {fds_bits = {0 <repeats 16 times>}}
        master_fds = {fds_bits = {256000, 0 <repeats 15 times>}}
#19 0x00000000004374c1 in radius_event_process () at src/main/process.c:5640
No locals.
#20 0x000000000041100f in main (argc=3, argv=<optimized out>)
    at src/main/radiusd.c:578
        rcode = 0
        status = <optimized out>
        argval = <optimized out>
        spawn_flag = true
        display_version = false
        flag = 0
        from_child = {4, 5}
        state = 0x668840
        autofree = 0x1a76060
A debugging session is active.

  Inferior 1 [process 16110] will be detached.







-- 
Matthew Newton, Ph.D. <mcn4 at le.ac.uk>

Systems Specialist, Infrastructure Services,
I.T. Services, University of Leicester, Leicester LE1 7RH, United Kingdom

For IT help contact helpdesk extn. 2253, <ithelp at le.ac.uk>


More information about the Freeradius-Devel mailing list