Solaris 8/SPARC - MySQL 5.0 NDB Cluster - Freeradius 1.1.1 with rlm_sqlippool module: 'radiusd' segmentation fault

Robles Rodriguez,Alejandro Alejandro.RoblesRodriguez at tre.se
Wed May 10 16:01:13 CEST 2006


Hi,

The purpose of this mail is to give you some insight into things I've been trying at work (playing, some could argue) that I'd like to share in case they are useful to any of you out there.

	I won't describe all the issues I've had during compilation, configuration and functional/performance testing, nor ask you for help. I will just describe what I've done and document one of the last problems I had, which kept me awake a few nights (a segmentation fault).

	For the past 4 weeks I have been trying to evaluate whether FreeRadius can be used as an AAA server in a UMTS network with a large number of subscribers for the GPRS data services. By "can be used" I mean essentially whether it can handle:

(1) Functionality: basic Authentication/Authorization/Accounting, IP Address allocation and some GPRS attribute to IP Address mapping storage.

(2) High Availability (no single point of failure HW/SW)

(3) Distributed Architecture (performance target of 250 requests/second peak hour at a reasonable HW/SW cost)


	For the purpose of this test I have decided to use the following (all 32-bit, because I had problems getting it to compile in 64-bit mode on SPARC against the binaries distributed by MySQL):

(a) Solaris 8 on SPARC (selected because these machines were pretty much idle at my company; similar tests were run on x86 PCs running Fedora Core 4).

(b) MySQL 5.0.21 (MAX version) 32 bit SPARC binary distribution.

(c) Freeradius 1.1.1 (originally 1.1.0, but due to bugs in the dictionary I upgraded, following a recommendation from Alan DeKok in the mail archives).

(d) For IP allocation I'm using the rlm_sqlippool module, as per Alan DeKok's recommendation (mail archives). It is hard to tell its version because it's not version controlled as far as I could see; I got it from a Russian website. It will require some customization, as I'm looking into being able to define an IP pool as several (not just one) start/end IP ranges.


	The test bed is basically two physical nodes, each running the same software, i.e. radiusd, mysqld and ndbd (the MySQL clustered storage engine process). The NAS (in UMTS these are called GGSN) will load-balance the requests, either directly, through an IP load balancer, or even through a freeradius proxy; I haven't decided yet which.

	This configuration allows vertical (bigger machines) and horizontal (more machines) scalability, by adding more CPUs or extra nodes to the cluster respectively, for improved performance. I have tested the vertical scalability and it's linear with the CPU utilization. The horizontal scalability will be tested in the coming days (it is hard to get hold of the required HW for the tests). I will publish some results (more quantitative than this email) then.

	Last but not least (and in connection with the subject of this email), one bug I found in my copy of rlm_sqlippool (as I mentioned, it is hard to tell its version) is that during load testing, and given the right circumstances (multiple NAS, Solaris architecture, MySQL Cluster storage engine only and high CPU utilization), I was getting a core dump of the 'radiusd' process.

	The problem occurred during the post-authorization phase of the sqlippool module, while retrieving the result of the 'allocate-find' SQL statement. The result row (just one row is expected, with just one field containing the IP address to allocate) contained an invalid memory reference. A row is modelled as an array of references to result columns; the only reference was invalid, and dereferencing it caused the segmentation fault.

	Looking at the code and debugging it for a while, I noticed that the memory holding the result set was being released before it was used (although a reference to the first and only row had been kept beforehand), hence the unpredictable results.
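
	To illustrate the ownership problem described above, here is a minimal, self-contained sketch. It does not use the MySQL C API or the FreeRADIUS types; `mock_result`, `mock_row`, `mock_store_result`, `mock_free_result` and `copy_first_field` are hypothetical names made up for the illustration. It only assumes what is stated above: a row is an array of pointers into memory owned by the result set, so the field has to be copied out before the result set is released.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for a database result set. These are NOT the
 * MySQL or FreeRADIUS types, just enough to show who owns the memory. */
typedef char **mock_row;        /* a row is an array of pointers to fields */

typedef struct {
    char *field0;               /* storage owned by the result set */
    char *cells[1];             /* the row: one pointer into that storage */
} mock_result;

/* Allocate a one-row, one-column result holding a copy of `value`. */
static mock_result *mock_store_result(const char *value)
{
    mock_result *res = malloc(sizeof(*res));
    if (res == NULL) return NULL;
    res->field0 = malloc(strlen(value) + 1);
    if (res->field0 == NULL) { free(res); return NULL; }
    strcpy(res->field0, value);
    res->cells[0] = res->field0;
    return res;
}

/* Release the result set; every row pointer taken from it dangles afterwards. */
static void mock_free_result(mock_result *res)
{
    if (res == NULL) return;
    free(res->field0);
    free(res);
}

/* Copy the first field into `out`, then release the result set.
 * Returns the copied length, or 0 on failure. */
static int copy_first_field(char *out, size_t outlen, const char *value)
{
    mock_result *res;
    mock_row row;
    size_t len;

    assert(out != NULL && outlen > 0);
    out[0] = '\0';

    res = mock_store_result(value);
    if (res == NULL) return 0;

    row = res->cells;           /* only a reference, not a copy */

    /* Calling mock_free_result(res) at this point would reproduce the bug:
     * row[0] would then point at freed memory. */

    len = strlen(row[0]);
    if (len < outlen) {
        memcpy(out, row[0], len + 1);   /* copy while the storage is alive */
    } else {
        len = 0;                        /* insufficient space */
    }

    mock_free_result(res);      /* safe: nothing reads row[] after this */
    return (int)len;
}
```

	Calling `copy_first_field(buf, sizeof(buf), "10.0.0.1")` with a large enough buffer leaves the address in `buf`. Reading freed memory is undefined behaviour, so the wrong ordering may appear to work most of the time, which would be consistent with a crash that only shows up under load.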

	Anyhow, the fix was simply to move the 'sql_finish_select_query' function call (which indirectly calls the MySQL function 'mysql_free_result' to release the memory allocated to the result set) a few lines down in the 'sqlippool_query1' function, which is the one retrieving the IP address to be allocated, in the 'rlm_sqlippool.c' file. See below for details:

1	/*
2	 * Query the database expecting a single result row
3	 */
4	static int sqlippool_query1(char * out, int outlen, const char * fmt, SQLSOCK * sqlsocket,
5	                            void * instance, REQUEST * request, char * param, int param_len)
6	{
7	        rlm_sqlippool_t * data = (rlm_sqlippool_t *) instance;
8	        char expansion[MAX_STRING_LEN * 4];
9	        char query[MAX_STRING_LEN * 4];
10	        SQL_ROW row;
11	        int r;
12	
13	        sqlippool_expand(expansion, sizeof(expansion), fmt, instance, param, param_len);
14	
15	        /*
16	         * Do an xlat on the provided string
17	         */
18	        if (request) {
19	                if (!radius_xlat(query, sizeof(query), expansion, request, NULL)) {
20	                        radlog(L_ERR, "sqlippool_command: xlat failed.");
21	                        out[0] = '\0';
22	                        return 0;
23	                }
24	        }
25	        else {
26	                strcpy(query, expansion);
27	        }
28	
29	#if 0
30	        DEBUG2("sqlippool_query1: '%s'", query);
31	#endif
32	
33	        if (rlm_sql_select_query(sqlsocket, data->sql_inst, query)){
34	                radlog(L_ERR, "sqlippool_query1: database query error");
35	                out[0] = '\0';
36	                return 0;
37	        }
38	
39	        r = rlm_sql_fetch_row(sqlsocket, data->sql_inst);
40	
41	        if (r) {
42	                DEBUG("sqlippool_query1: SQL query did not succeed");
43	                out[0] = '\0';
44	                return 0;
45	        }
46	
47	        row = sqlsocket->row;
48	        if (row == NULL) {
49	                DEBUG("sqlippool_query1: SQL query did not return any results");
50	                out[0] = '\0';
51	                return 0;
52	        }
53	
54	        if (row[0] == NULL){
55	                DEBUG("sqlippool_query1: row[0] returned NULL");
56	                out[0] = '\0';
57	                return 0;
58	        }
59	
60	        r = strlen(row[0]);
61	        if (r >= outlen){
62	                DEBUG("sqlippool_query1: insufficient string space");
63	                out[0] = '\0';
64	                return 0;
65	        }
66	
67	        strncpy(out, row[0], r);
68	        out[r] = '\0';
69	
70	        (data->sql_inst->module->sql_finish_select_query)(sqlsocket, data->sql_inst->config);
71	
72	        return r;
73	}


	Line number 70 was originally right after line 39 (that is, right after the row was fetched, with only a reference to the first and only result row being kept). The row is a reference to references to memory allocated by the MySQL C API, and that memory gets released whenever the 'mysql_free_result' function is called. The bug only popped up under certain conditions that are hard to re-create.
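
	One related observation on the listing as posted: once the call is at line 70, the early returns after a successful query (lines 44, 51, 57 and 64) no longer appear to reach it, so the result set may stay allocated on those paths, unless the fetch call releases it on failure. A common C idiom for this is a single exit point. The sketch below shows the idea with a mock connection; `mock_conn`, `mock_select`, `mock_finish` and `query_one` are hypothetical names, not the rlm_sql API, and this is not necessarily how the module should be changed.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical mock of a query handle: it records whether a result set
 * is currently held, so the release can be checked. Not the rlm_sql API. */
typedef struct {
    const char *field;  /* value the "query" returns, or NULL for no row */
    int         open;   /* 1 while a result set is held */
} mock_conn;

static int  mock_select(mock_conn *c) { c->open = 1; return 0; }
static void mock_finish(mock_conn *c) { c->open = 0; }

/* Returns the copied length, or 0 on any failure. Once mock_select()
 * has succeeded, every path leaves through the `done` label. */
static int query_one(mock_conn *c, char *out, size_t outlen)
{
    size_t len;
    int rc = 0;

    assert(c != NULL && out != NULL && outlen > 0);
    out[0] = '\0';

    if (mock_select(c) != 0) return 0;  /* nothing to release yet */

    if (c->field == NULL) goto done;    /* no row returned */

    len = strlen(c->field);
    if (len >= outlen) goto done;       /* insufficient string space */

    memcpy(out, c->field, len + 1);     /* copy before releasing */
    rc = (int)len;

done:
    mock_finish(c);                     /* single release point */
    return rc;
}
```

	With this shape, the copy still happens before the release, and the error paths release the result set as well.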


	I'm done for now; more details will come later. Meanwhile I have a question: is the rlm_sqlippool module going to be part of a freeradius release in the near future and, if not, what would be the procedure to follow for that to happen?

Thanks, and I hope I didn't take too much of your time if you have read the whole thing!

Cheers,
Alex.



