possible radutmp problem in 2.1.4 and/or 2.1.5-git(2009-04-17 state)?
Rozsahegyi Bela
rb at externet.hu
Sun Apr 19 16:57:42 CEST 2009
Hello!
I ran FreeRADIUS 1.1.x for a few years without any problems, with
5 Cisco NAS devices and ~15,000 users online in the daytime. It worked with
real-time MySQL accounting, but more users means more accounting records, and
it became a growing problem to rotate the records and to search for users
without long SQL locks.
A few days ago I upgraded to 2.1.4 with a modified
configuration: traditional authentication and a separate radrelay
process for accounting. If I need to rotate the radacct table, I only have to
stop radrelay. This is ideal for me: I can rotate the table without SQL locking
problems, and the tech support team can run a search without long locks :)
But I found some problems. On 2.1.4 I hit the "detail file
polling goes crazy" problem and applied the fix from 2.1.5-git. This works very
well, thanks :)
The second problem occurs in both 2.1.4 and 2.1.5; I have tested it thoroughly on 2.1.5-git.
In any of these situations:
- the network connection between the radius server and the NAS or SQL server is interrupted for a few seconds or a minute,
- or radiusd has to be stopped, left down for a few seconds, and started again,
- or after 5-10 hours of running (the longest running time was 16 hours),
- or when the radutmp file size reaches around 1.8-2.2 Mbyte,
the radius server stops responding.
I was lucky and caught strace output at the moment it stopped:
stat("/usr/local/freeradius/var/log/radius/radacct/radrelay/detail", 0x705d786a5ba0) = -1 ENOENT (No file or directory)
stat("/usr/local/freeradius/var/log/radius/radacct/radrelay", {st_mode=S_IFDIR|0755, st_size=6, ...})
open("/usr/local/freeradius/var/log/radius/radacct/radrelay/detail", O_WRONLY|O_CREAT|O_APPEND, 0600)
lseek(127, 0, SEEK_SET) = 0
flock(127, LOCK_EX|LOCK_NB) = 0
fstat(127, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
fcntl(127, F_GETFL) = 0x8401 (flags O_WRONLY|O_APPEND|O_LARGEFILE)
fstat(127, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x6a8898c4d000
lseek(127, 0, SEEK_CUR) = 0
fstat(127, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
lseek(127, 0, SEEK_SET) = 0
write(127, "Fri Apr 17 21:45:34 2009\n\tAcct-S"..., 492) = 492
lseek(127, 0, SEEK_SET) = 0
flock(127, LOCK_UN) = 0
close(127) = 0
munmap(0x6a8898c4d000, 4096) = 0
open("/usr/local/freeradius/var/log/radius/radutmp", O_RDWR|O_CREAT, 0600) = 127
flock(127, LOCK_EX
...and at this point the process freezes. Sometimes kill -9 is the only way
to stop it.
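The trace above ends inside a blocking flock(LOCK_EX) call on the radutmp file, which would wait indefinitely if something else still holds a lock on that file. A minimal sketch of how one might check this from outside, assuming the lock is a plain flock-style lock as the strace output suggests: the helper name probe_lock, the timeout values and the example path are illustrative and not part of FreeRADIUS.

```python
import errno
import fcntl
import os
import time


def probe_lock(path, timeout=2.0, interval=0.1):
    """Return True if an exclusive flock on `path` can be taken within
    `timeout` seconds, False if it stays held by someone else.

    Hypothetical diagnostic helper; it never blocks forever because it
    uses LOCK_NB and polls instead of a plain LOCK_EX.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        deadline = time.monotonic() + timeout
        while True:
            try:
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except OSError as exc:
                # EAGAIN/EWOULDBLOCK means another open file holds the lock.
                if exc.errno not in (errno.EAGAIN, errno.EACCES):
                    raise
                if time.monotonic() >= deadline:
                    return False
                time.sleep(interval)
            else:
                # Got the lock: release it immediately, we only wanted to probe.
                fcntl.flock(fd, fcntl.LOCK_UN)
                return True
    finally:
        os.close(fd)


if __name__ == "__main__":
    import sys

    # Example path taken from the debug output; adjust to the real location.
    target = sys.argv[1] if len(sys.argv) > 1 else "/dev/shm/radutmp"
    print("lock free" if probe_lock(target) else "lock is held by another process")
```

If the probe reports that the lock is held while radiusd is hung, the next step would be to find out which process has the file open. If the lock is free, the hang more likely comes from inside the server itself, for example a descriptor that was locked earlier and never unlocked.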
A restart does not help: the server accepts one or two users after start
and then freezes again. The only solution is to delete the radutmp file and start again.
In this state (when it can handle only 1-2 users) I ran it in debug mode with the -X parameter.
Here are the last lines from two start attempts.
The first attempt:
[radrelay-detail] expand:
/usr/local/freeradius/var/log/radius/radacct/radrelay/detail ->
/usr/local/freeradius/var/log/radius/radacct/radrelay/detail
[radrelay-detail]
/usr/local/freeradius/var/log/radius/radacct/radrelay/detail expands to
/usr/local/freeradius/var/log/radius/radacct/radrelay/detail
[radrelay-detail] Acquired filelock, tried 1 time(s)
[radrelay-detail] expand: %t -> Sun Apr 19 16:34:03 2009
[radrelay-detail] Released filelock
++[radrelay-detail] returns ok
[radutmp] expand: /dev/shm/radutmp -> /dev/shm/radutmp
[radutmp] expand: %{User-Name} -> myuser at myrealm.hu
...and at this point it freezes and stops responding.
The second attempt:
[radrelay-detail] expand:
/usr/local/freeradius/var/log/radius/radacct/radrelay/detail ->
/usr/local/freeradius/var/log/radius/radacct/radrelay/detail
[radrelay-detail]
/usr/local/freeradius/var/log/radius/radacct/radrelay/detail expands to
/usr/local/freeradius/var/log/radius/radacct/radrelay/detail
[radrelay-detail] Acquired filelock, tried 1 time(s)
[radrelay-detail] expand: %t -> Sun Apr 19 16:34:59 2009
[radrelay-detail] Released filelock
++[radrelay-detail] returns ok
[radutmp] expand: /dev/shm/radutmp -> /dev/shm/radutmp
[radutmp] expand: %{User-Name} -> myotheruser at myrealm.hu
...and at this point it freezes and stops responding.
When I delete the radutmp file and start the radius server, authentication is very fast: a
custom, radclient-style monitoring system reports about 10-12 msec for a login.
As the radutmp file grows, this time goes over 30 msec, and that is with the radutmp file on the /dev/shm
"ramdisk". With the radutmp file on a standard filesystem the effect is similar,
but the responses are a little slower.
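Login time growing with file size is what one would expect if each request scans the session file record by record. The post does not confirm that radutmp works this way, so the following is only a sketch under that assumption. The 128-byte record size, the key layout and the function names are hypothetical and do not reflect the real radutmp format.

```python
import os
import struct
import tempfile
import time

RECORD_SIZE = 128  # hypothetical fixed record size, NOT the real radutmp layout


def make_file(path, count):
    """Write `count` fixed-size records whose first 4 bytes hold an integer key."""
    with open(path, "wb") as f:
        for key in range(count):
            f.write(struct.pack("<I", key).ljust(RECORD_SIZE, b"\0"))


def find_record(path, key):
    """Scan the file sequentially; return the record index for `key`, or -1."""
    with open(path, "rb") as f:
        index = 0
        while True:
            record = f.read(RECORD_SIZE)
            if len(record) < RECORD_SIZE:
                return -1  # reached end of file without a match
            if struct.unpack_from("<I", record)[0] == key:
                return index
            index += 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        # 16000 records * 128 bytes is roughly 2 Mbyte, the size range in the post.
        for count in (1000, 16000):
            path = os.path.join(tmpdir, "utmp_%d" % count)
            make_file(path, count)
            start = time.perf_counter()
            find_record(path, count - 1)  # worst case: match is the last record
            elapsed = time.perf_counter() - start
            print("%6d records: %.4f s per lookup" % (count, elapsed))
```

The absolute numbers depend on the machine and say nothing about FreeRADIUS itself. The point is only that the cost of a sequential lookup scales with the number of records, so a file that is never pruned makes every lookup slower.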
Could someone help? What should I do? I have run out of ideas...
Using the radutmp file suits me better than the SQL backend, because
radrelay introduces a small delay.
Thanks,
RB