Memory leak parsing/relaying from detail files?
Nathan Ward
lists+freeradius at daork.net
Tue Mar 8 06:10:53 CET 2016
Hi,
I am working through debugging a memory leak I am hitting, and I think I have found the culprit.
The trigger appears to be a malformed NAS-IPv6-Address attribute coming from a Cisco ASR9000, and I will certainly contact the vendor, however it occurs to me that we should work to solve this in FreeRADIUS as well.
We use reliable proxied accounting with the detail file reader, however we write everything to a file and read from that file rather than writing only if the backend server pool has no reachable members.
We are running 3.0.11, based on the 3.0.4 Redhat RPMs which we have forward ported.
The ASR9000 sends a NAS-IPv6-Address attribute as follows:
0000 5f 16 1a 10 97 8c 00 00 00 00 00 00 00 00 00 00 _...............
0010 00 00 00 00 00 00 ……
This has a length field of 0x16, while the standard defines this as always 0x12 - an IPv6 address is 16 bytes, plus two bytes for the type/len. I’m not sure what the ASR9000 thinks is going on.. but I guess that’s not in scope for this memory leak per se.
FreeRADIUS interprets this as being not a valid NAS-IPv6-Address, so internally treats it as Attr-95 rather than NAS-IPv6-Address, which is the behaviour I’d expect.
This gets written to a detail file, and some time later the detail file reader calls through to dict_unknown_afrom_fields, which allocates some memory that seems to never get freed.
Running in valgrind, once for 5 mins, once for 25 mins, we note that the following call stack shows up as much larger after 25 mins.
Here it is after about 5 mins:
==31363== 1,379,332 bytes in 5,653 blocks are definitely lost in loss record 1,935 of 1,937
==31363== at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31363== by 0x5B0CA20: __talloc_with_prefix (talloc.c:615)
==31363== by 0x5B0CA20: __talloc (talloc.c:655)
==31363== by 0x5B0CA20: _talloc_named_const (talloc.c:812)
==31363== by 0x5B0CA20: _talloc_zero (talloc.c:2206)
==31363== by 0x50786DE: dict_unknown_afrom_fields (dict.c:2826)
==31363== by 0x508573F: data2vp (radius.c:3794)
==31363== by 0x507F1B1: fr_pair_from_unkown (pair.c:1267)
==31363== by 0x507F1B1: fr_pair_make_unknown (pair.c:1365)
==31363== by 0x507F1B1: fr_pair_make (pair.c:1439)
==31363== by 0x507F807: fr_pair_list_afrom_str (pair.c:1865)
==31363== by 0x146B18: detail_poll (detail.c:735)
==31363== by 0x147028: detail_handler_thread (detail.c:1085)
==31363== by 0x67BBDF4: start_thread (pthread_create.c:308)
==31363== by 0x6F431AC: clone (clone.S:113)
Again after about 25 mins:
==10362== 7,163,840 bytes in 29,360 blocks are definitely lost in loss record 1,943 of 1,944
==10362== at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==10362== by 0x5B0CA20: __talloc_with_prefix (talloc.c:615)
==10362== by 0x5B0CA20: __talloc (talloc.c:655)
==10362== by 0x5B0CA20: _talloc_named_const (talloc.c:812)
==10362== by 0x5B0CA20: _talloc_zero (talloc.c:2206)
==10362== by 0x50786DE: dict_unknown_afrom_fields (dict.c:2826)
==10362== by 0x508573F: data2vp (radius.c:3794)
==10362== by 0x507F1B1: fr_pair_from_unkown (pair.c:1267)
==10362== by 0x507F1B1: fr_pair_make_unknown (pair.c:1365)
==10362== by 0x507F1B1: fr_pair_make (pair.c:1439)
==10362== by 0x507F807: fr_pair_list_afrom_str (pair.c:1865)
==10362== by 0x146B18: detail_poll (detail.c:735)
==10362== by 0x147028: detail_handler_thread (detail.c:1085)
==10362== by 0x67BBDF4: start_thread (pthread_create.c:308)
==10362== by 0x6F431AC: clone (clone.S:113)
I have set up GDB to print out a backtrace whenever dict_unknown_afrom_fields is called with fr_pair_from_unkown 2 frames up the stack. We get this:
#0 dict_unknown_afrom_fields (ctx=ctx at entry=0x0, attr=95, vendor=0) at src/lib/dict.c:2822
#1 0x00007fee4f416740 in data2vp (ctx=ctx at entry=0x0, packet=packet at entry=0x0, original=original at entry=0x0, secret=secret at entry=0x0, da=<optimized out>, start=<optimized out>, attrlen=20, packetlen=20,
pvp=pvp at entry=0x7fee42bec848) at src/lib/radius.c:3794
#2 0x00007fee4f4101b2 in fr_pair_from_unkown (da=<optimized out>, vp=0x7fee340e3690) at src/lib/pair.c:1267
#3 fr_pair_make_unknown (op=<optimized out>, value=0x7fee42bec9d0 "Attr-95", attribute=0x7fee42bec9d0 "Attr-95", ctx=0x7fee5079c3f0) at src/lib/pair.c:1365
#4 fr_pair_make (ctx=ctx at entry=0x7fee5079c3f0, vps=vps at entry=0x0, attribute=attribute at entry=0x7fee42bec9d0 "Attr-95", value=value at entry=0x7fee42becad0 "0x1a108834", '0' <repeats 32 times>, op=<optimized out>)
at src/lib/pair.c:1439
#5 0x00007fee4f410808 in fr_pair_list_afrom_str (ctx=ctx at entry=0x7fee5079c3f0, buffer=buffer at entry=0x7fee42bed570 "\tAttr-95 = 0x1a108834", '0' <repeats 32 times>, "\n", list=list at entry=0x7fee42becf48) at src/lib/pair.c:1865
#6 0x00007fee4facdb19 in detail_poll (listener=listener at entry=0x7fee5079c2a0) at src/main/detail.c:735
#7 0x00007fee4face029 in detail_handler_thread (arg=0x7fee5079c2a0) at src/main/detail.c:1085
#8 0x00007fee4dcd7df5 in start_thread (arg=0x7fee42bee700) at pthread_create.c:308
#9 0x00007fee4d5881ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Note Attr-95 - this is in every one of these backtraces that I looked at.
I have attempted to filter out both Attr-95 and NAS-IPv6-Address at the start of my accounting section before it is written to the detail file - we don’t use this attribute so don’t care.
I do this with the following:
if (Attr-95) {
update request {
Attr-95 !* ANY
}
}
But if (Attr-95) always returns false.
I tried the same logic with NAS-IPv6-Address and get the same result, same trying this in preacct. Preacct has had no configuration changed from the standard “default” site config.
Here is a trimmed output - let me know if you want the full thing, and I will get some test circuits set up which don’t have customer data:
(469) Tue Mar 8 15:33:39 2016: Debug: Received Accounting-Request Id 141 from 103.241.56.176:51487 to 10.55.86.12:1813 length 1100
.. trimmed ..
(469) Tue Mar 8 15:33:39 2016: Debug: Attr-95 = 0x1a10983400000000000000000000000000000000
.. trimmed ..
(469) Tue Mar 8 15:33:39 2016: Debug: # Executing section preacct from file /etc/raddb/sites-enabled/trustpower-frontend
.. trimmed ..
(469) Tue Mar 8 15:33:40 2016: Debug: # Executing section accounting from file /etc/raddb/sites-enabled/trustpower-frontend
(469) Tue Mar 8 15:33:40 2016: Debug: accounting {
(469) Tue Mar 8 15:33:40 2016: Debug: if (Attr-95) {
(469) Tue Mar 8 15:33:40 2016: Debug: if (Attr-95) -> FALSE
I have two questions:
1) Can I filter this out in some way that I am unaware of as a short term fix?
2) I would like to figure out this memory leak myself and submit a patch, but I’m not sure how this part of the code works. I *think* that this should be freed after being proxied to the backend server - so, perhaps in process.c somewhere after a packet is sent, it should have any vp das that have is_unknown set to true freed? I’m not clear to me what impact that would have on other bits of code.
--
Nathan Ward
More information about the Freeradius-Users
mailing list