Possible radutmp problem in 2.1.4 and/or 2.1.5-git (2009-04-17 state)?

Rozsahegyi Bela rb at externet.hu
Sun Apr 19 16:57:42 CEST 2009


Hello!

I used FreeRADIUS 1.1.x for a few years without any problems, with 
5 Cisco NAS devices and ~15,000 online users in the daytime. It worked with 
real-time MySQL accounting, but more users mean more accounting records, and 
that became a growing problem: rotating the records and searching for users 
without long SQL locks.

A few days ago I switched to version 2.1.4 with a modified 
configuration: traditional authentication and a separate radrelay 
process for accounting. If I need to rotate the radacct table, I only have to 
stop radrelay, and this is ideal for me: I can do the rotation without SQL 
locking problems, and the tech support team can run a search without a long lock :)

But I found some problems. With version 2.1.4 I ran into the "detail file 
polling go crazy" problem and found the fix in 2.1.5-git. This works very 
well, thanks :)

The second problem occurs in both 2.1.4 and 2.1.5, but 2.1.5-git is the one I tested thoroughly:

In any of these situations:
- the network connection between the radius server and the NAS or the SQL server is interrupted for a few seconds or a minute,
- or radiusd has to be stopped, left idle for a few seconds, and started again,
- or after 5-10 hours of running (the longest running time was 16 hours),
- or when the radutmp file size is around 1.8-2.2 Mbyte,

radius stops responding.

I was lucky and caught an strace output at the moment it stopped:


stat("/usr/local/freeradius/var/log/radius/radacct/radrelay/detail", 0x705d786a5ba0) = -1 ENOENT (No file or directory)
stat("/usr/local/freeradius/var/log/radius/radacct/radrelay", {st_mode=S_IFDIR|0755, st_size=6, ...})
open("/usr/local/freeradius/var/log/radius/radacct/radrelay/detail", O_WRONLY|O_CREAT|O_APPEND, 0600)
lseek(127, 0, SEEK_SET)                 = 0
flock(127, LOCK_EX|LOCK_NB)             = 0
fstat(127, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
fcntl(127, F_GETFL)                     = 0x8401 (flags O_WRONLY|O_APPEND|O_LARGEFILE)
fstat(127, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x6a8898c4d000
lseek(127, 0, SEEK_CUR)                 = 0
fstat(127, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
lseek(127, 0, SEEK_SET)                 = 0
write(127, "Fri Apr 17 21:45:34 2009\n\tAcct-S"..., 492) = 492
lseek(127, 0, SEEK_SET)                 = 0
flock(127, LOCK_UN)                     = 0
close(127)                              = 0
munmap(0x6a8898c4d000, 4096)            = 0
open("/usr/local/freeradius/var/log/radius/radutmp", O_RDWR|O_CREAT, 0600) = 127
flock(127, LOCK_EX


...and at this point the process freezes; in rare cases only kill -9 can 
stop it.
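
A minimal sketch (Python, not part of FreeRADIUS) of how one might check, while the server is hung, whether something else still holds a lock on the radutmp file. It assumes the lock is a flock()-style lock, as the trace above suggests; the function name probe_exclusive_lock is made up for illustration. It uses LOCK_NB so that the check itself cannot hang the way the traced call does.

```python
import fcntl
import os


def probe_exclusive_lock(path):
    """Return True if an exclusive flock() can be taken right now, else False.

    LOCK_NB makes flock() fail immediately instead of waiting, so this
    probe never blocks, unlike the plain LOCK_EX call seen in the trace.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            # Another open file description already holds a conflicting lock.
            return False
        # We got the lock, so nobody else held it; release it again.
        fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
```

It could be called with the radutmp path from the trace, e.g. probe_exclusive_lock("/usr/local/freeradius/var/log/radius/radutmp"). A False result would mean that some other process still holds a lock on the file. Note that on Linux, flock() locks and fcntl() record locks do not see each other, so this only detects flock()-style locks.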


A restart does not help: it accepts one or two users after the start 
and then freezes again. The only solution is to delete the radutmp file and start again.

At this moment (when it can handle only 1-2 users) I ran it in debug mode with the -X parameter. 
Here are the last lines from two start attempts:

The first attempt:


[radrelay-detail]       expand: /usr/local/freeradius/var/log/radius/radacct/radrelay/detail -> /usr/local/freeradius/var/log/radius/radacct/radrelay/detail
[radrelay-detail] /usr/local/freeradius/var/log/radius/radacct/radrelay/detail expands to /usr/local/freeradius/var/log/radius/radacct/radrelay/detail
[radrelay-detail] Acquired filelock, tried 1 time(s)
[radrelay-detail]       expand: %t -> Sun Apr 19 16:34:03 2009
[radrelay-detail] Released filelock
++[radrelay-detail] returns ok
[radutmp]       expand: /dev/shm/radutmp -> /dev/shm/radutmp
[radutmp]       expand: %{User-Name} -> myuser at myrealm.hu

...and at this point it freezes and stops responding.

The second start attempt:

[radrelay-detail]       expand: /usr/local/freeradius/var/log/radius/radacct/radrelay/detail -> /usr/local/freeradius/var/log/radius/radacct/radrelay/detail
[radrelay-detail] /usr/local/freeradius/var/log/radius/radacct/radrelay/detail expands to /usr/local/freeradius/var/log/radius/radacct/radrelay/detail
[radrelay-detail] Acquired filelock, tried 1 time(s)
[radrelay-detail]       expand: %t -> Sun Apr 19 16:34:59 2009
[radrelay-detail] Released filelock
++[radrelay-detail] returns ok
[radutmp]       expand: /dev/shm/radutmp -> /dev/shm/radutmp
[radutmp]       expand: %{User-Name} -> myotheruser at myrealm.hu


...and at this point it freezes and stops responding.



When I delete the radutmp file and start radius, authentication is very fast: a 
custom, radclient-style monitoring system reports about 10-12 msec for a login.
As the radutmp file grows, this time goes over 30 msec with the radutmp file in the 
/dev/shm "ramdisk". With the radutmp file on a standard filesystem the effect is 
similar, but responses are a little slower.
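
To line up the login times with the growth of the file, the radutmp size could be logged at intervals alongside the monitoring results. A minimal sketch in Python follows; the helper names sample_size and format_sample are made up for illustration, and 1 Mbyte is taken as 1024*1024 bytes, which is an assumption.

```python
import os
import time


def sample_size(path):
    """Return (unix_timestamp, size_in_bytes) for the file at path."""
    st = os.stat(path)
    return time.time(), st.st_size


def format_sample(timestamp, size_bytes):
    """Render one log line; 1 Mbyte is assumed to be 1024*1024 bytes."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(timestamp))
    return "%s %.2f Mbyte" % (stamp, size_bytes / (1024.0 * 1024.0))
```

For example, print(format_sample(*sample_size("/dev/shm/radutmp"))) run once a minute would give a size series to compare against the measured login times.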

Could someone help? What should I do? I have run out of ideas...


Using the radutmp file suits me better than the SQL backend, because 
radrelay introduces a small delay.


 	Thanks,
 		RB
