3.0.x crash in _talloc_free()
Matthew Newton
mcn4 at leicester.ac.uk
Wed Nov 18 13:47:48 CET 2015
Hi,
Just put our first real FR3 deployment live yesterday (3.0.x HEAD
46fb0cd312f0) :) Then quickly hit a crash :(
Initial outputs (was running -X to start so these are single
threaded) gave:
...
(294) eap_tls: ocsp: Cert status: good
(294) eap_tls: ocsp: Certificate is valid
(294) eap_tls: TLS_accept: SSLv3 read client certificate A
(294) eap_tls: <<< recv TLS 1.0 Handshake [length 0046], ClientKeyExchange
*** glibc detected *** freeradius: free(): invalid pointer: 0x0000000001c7a140 ***
======= Backtrace: =========
...
and
...
Waking up in 0.1 seconds.
(4563) Cleaning up request packet ID 8 with timestamp +1917
(4564) Cleaning up request packet ID 9 with timestamp +1917
Waking up in 0.2 seconds.
(4566) Cleaning up request packet ID 207 with timestamp +1917
*** glibc detected *** /usr/sbin/freeradius: free(): invalid next size (fast): 0x00000000019f4830 ***
======= Backtrace: =========
...
Haven't manage to get a core dump yet, but the panic action is
working. Three crashes so far are the same, one slightly
different. All are in _talloc_free: The three same ones error in
the request_free (process.c:806) at the end of request_done. The
other is in eappeap_process (peap.c:828).
The fact it manages to get all the way through request_free
without crashing, with all the checks that are done in there,
before bombing out in the free seem slightly weird to me, as the
data in *request must be sane.
System is Debian wheezy with talloc 2.0.7. I think my next plan is
to upgrade the system to jessie which has version 2.1.1 available
and see if it's a bug in that old version of talloc that's being
hit. Weird nobody else has reported it before, though; it's not a
busy server and crashed four times so far this morning (though not
at all overnight when less lightly loaded).
Full startup output from -X and four gdb panic action dumps are at
(yeah I know there are a few warnings in the config at the moment):
https://gist.github.com/mcnewton/63a01685f3748e202e81
One snippet of panic_action output below.
Any ideas?
Cheers,
Matthew
Thread 1 (Thread 0x7fadf28dd720 (LWP 16110)):
#0 0x00007fadf067a0ad in waitpid () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1 0x00007fadf060e989 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#2 0x00007fadf060ecc0 in system () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#3 0x00007fadf22a80ab in fr_fault (sig=<optimized out>)
at src/lib/debug.c:725
disable = false
cmd = "gdb -silent -x /srv/radius/panic.gdb /usr/sbin/freeradius 16110 2>&1 | tee /var/log/freeradius/gdb-freeradius-16110.log", '\000' <repeats 412 times>
out = 0x7ffcff10ec23 ".log"
left = 417
ret = <optimized out>
p = <optimized out>
q = 0x0
code = <optimized out>
#4 <signal handler called>
No symbol table info available.
#5 0x00007fadf0601165 in raise () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#6 0x00007fadf06043e0 in abort () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#7 0x00007fadf063b39b in ?? () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#8 0x00007fadf0644be6 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#9 0x00007fadf064998c in free () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#10 0x00007fadf1831089 in ?? () from /usr/lib/x86_64-linux-gnu/libtalloc.so.2
No symbol table info available.
#11 0x00007fadf1831023 in ?? () from /usr/lib/x86_64-linux-gnu/libtalloc.so.2
No symbol table info available.
#12 0x00007fadf1831023 in ?? () from /usr/lib/x86_64-linux-gnu/libtalloc.so.2
No symbol table info available.
#13 0x00007fadf1831023 in ?? () from /usr/lib/x86_64-linux-gnu/libtalloc.so.2
No symbol table info available.
#14 0x00007fadf182d8b3 in _talloc_free ()
from /usr/lib/x86_64-linux-gnu/libtalloc.so.2
No symbol table info available.
#15 0x00000000004316fe in request_done (request=request at entry=0x7fade4101d00,
action=action at entry=6) at src/main/process.c:806
now = {tv_sec = 0, tv_usec = 140384832071768}
when = {tv_sec = 0, tv_usec = 140385039194508}
#16 0x000000000043347d in request_cleanup_delay (request=0x7fade4101d00,
action=<optimized out>) at src/main/process.c:1115
when = {tv_sec = 1447837345, tv_usec = 239972}
now = {tv_sec = 1447837345, tv_usec = 240304}
__FUNCTION__ = "request_cleanup_delay"
#17 0x00007fadf22c44f4 in fr_event_run (el=el at entry=0x1ca51d0,
when=when at entry=0x7ffcff10fee0) at src/lib/event.c:309
callback = 0x42f9e0 <request_timer>
ctx = 0x7fade4101d00
ev = <optimized out>
#18 0x00007fadf22c4a4e in fr_event_loop (el=0x1ca51d0) at src/lib/event.c:624
i = <optimized out>
rcode = 0
when = {tv_sec = 1447837345, tv_usec = 240304}
wake = <optimized out>
maxfd = 20
read_fds = {fds_bits = {0 <repeats 16 times>}}
master_fds = {fds_bits = {256000, 0 <repeats 15 times>}}
#19 0x00000000004374c1 in radius_event_process () at src/main/process.c:5640
No locals.
#20 0x000000000041100f in main (argc=3, argv=<optimized out>)
at src/main/radiusd.c:578
rcode = 0
status = <optimized out>
argval = <optimized out>
spawn_flag = true
display_version = false
flag = 0
from_child = {4, 5}
state = 0x668840
autofree = 0x1a76060
A debugging session is active.
Inferior 1 [process 16110] will be detached.
--
Matthew Newton, Ph.D. <mcn4 at le.ac.uk>
Systems Specialist, Infrastructure Services,
I.T. Services, University of Leicester, Leicester LE1 7RH, United Kingdom
For IT help contact helpdesk extn. 2253, <ithelp at le.ac.uk>
More information about the Freeradius-Devel
mailing list