Detail file handling
Peter Nixon
listuser at peternixon.net
Sat May 5 11:08:06 CEST 2007
On Sat 05 May 2007, Alan DeKok wrote:
> Peter Nixon wrote:
> > When running detail2db.pl (the perl import script I posted to the list a
> > few weeks ago) which only uses a single thread and a single DB socket,
> > from a remote machine (same machine as freeradius, not same as
> > PostgreSQL) the DB server load spikes to 2+ and the DB sometimes
> > responds slow enough that FreeRADIUS Auth queries fail and errors are
> > logged (not always.. depends on traffic levels)
>
> Ouch. That's not nice. I had thought that the priority queuing would
> ensure that authentication packets get handled before detail files.
>
> On the other hand, if the CPU is pegged from handling the detail
> packets, it won't have time to notice that an authentication packet has
> arrived.
>
> > For that reason I have added a 3000 usec sleep in between processing of
> > each packet, which keeps the DB load at about 0.8 while processing a
> month's worth of detail files. Note that perl should be a bit slower than
> > C so a good default sleep time is probably around 5000usec..
>
> Hmm... Hard-coded sleep times don't adapt well to changing
> circumstances. Maybe changing the code to require *2* waiting threads
> would be better. That way, the "max active threads" limit wouldn't be
> reached. Since that's not reached, more threads won't ever be created
> to handle the flood of detail packets. And, there will always be one
> waiting thread, which can handle any authentication packet that comes in.
Hi Alan
I think you may have misunderstood the problem and my solution slightly. The
CPU on the RADIUS servers NEVER gets pegged. It's the system load on the
backend DB server which goes high, slowing everything down. In fact it's
not even the CPU that is being pegged, but rather the hard disk(s).
Now, you can always add more, faster hard disks and try to trim down the
amount of data you keep in your DB (so you seek less, and have smaller
indexes), but the fact remains that even with a dedicated DB server like I
have, your hard disks will NEVER be able to keep up with a similar-speed
RADIUS server poking at them full speed with multiple threads, unless you have
at least one disk per RADIUS thread in a RAID mirror so that each thread
gets its own disk head to seek with. That is an unlikely setup, as it's
extremely expensive.
Having 2 waiting threads before the detail file reader kicks in is a good
idea.. Maybe it should even be 3, or configurable... But it still doesn't
solve the problem above..
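For reference, the fixed inter-packet sleep I added to detail2db.pl looks roughly like this (a minimal Python sketch of the same idea, not the actual Perl script; the `insert_record` callback stands in for the real DB insert and is hypothetical):

```python
import time

SLEEP_USEC = 3000  # pause between packets; ~5000 for a slower interpreter

def replay_detail_file(path, insert_record):
    """Replay a FreeRADIUS detail file, pacing the database writes.

    `insert_record` is a caller-supplied function that pushes one
    accounting record into the DB (hypothetical helper).
    """
    record = {}
    with open(path) as fh:
        for line in fh:
            if line.startswith("\t"):               # attribute line
                attr, _, value = line.strip().partition(" = ")
                record[attr] = value.strip('"')
            elif record:                            # blank/timestamp line ends a record
                insert_record(record)
                record = {}
                time.sleep(SLEEP_USEC / 1_000_000)  # keep DB load bounded
    if record:
        insert_record(record)
```

The point of the sleep is simply to cap the write rate of a single importer thread so the DB's disks never saturate, at the cost of a longer total import time.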
I have the following queries hitting the database during normal operation:
Auth-Request:
authorize_check_query - select from stored procedure with multiple joins
and selects
authorize_reply_query - select from stored procedure with multiple joins
and selects
authorize_group_check_query - default FR select which includes join
authorize_group_reply_query - default FR select which includes join
authenticate_query - never gets hit in my setup...
duplicate_session_killer - single select from radacct
sqlippool_dynamic - modified default FR select
sqlippool_static - modified default FR select
group_membership_query - default FR select
postauth_query - default FR insert
Now, this chunk of queries happens for EVERY auth packet. At the same time,
on other threads you could have multiple Accounting start, update or stop
packets which are poking new data into both the radacct table as well as the
sqlippool tables slowing things down by locking bits of those tables and
doing index updates...
One possible optimisation I could make is to have my duplicate_session_killer
use data from the sqlippool tables (which are fixed in length and therefore
should be faster to update indexes on) rather than radacct. I need to
benchmark this though, and haven't seen the need yet to break a working system.
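Sketched out, the two variants of that duplicate-session check would look something like this (illustrative SQL only; the column names follow the stock FreeRADIUS schema, whereas my actual queries are customised):

```python
# Current check: scan radacct for a session with no stop time.  radacct
# rows are variable length and the table (and its indexes) grow with
# accounting history, so this gets slower over time.
DUP_CHECK_RADACCT = """
    SELECT acctsessionid
      FROM radacct
     WHERE username = %(user)s
       AND acctstoptime IS NULL
"""

# Proposed: consult the fixed-width radippool table instead.  Its rows
# (and index entries) stay a constant size, so index maintenance under
# heavy accounting load should be cheaper.
DUP_CHECK_SQLIPPOOL = """
    SELECT framedipaddress
      FROM radippool
     WHERE username = %(user)s
       AND expiry_time > NOW()
"""
```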
My current config uses 2 separate sql modules, one for auth queries and one
for acct. Both hit the same database. This ensures that auth can never
overwhelm acct and vice versa.. (Maybe some of your new changes will make
this redundant but it certainly improved things when I did it 6 months ago)
Now, as long as you have a relatively even mix of RADIUS packets, spread apart
in time, everything works well. (As I said, my DB has a 15min load average of
0.12 and in normal operation never goes over 0.3)
However when you have a single thread poking accounting data into the system
at full speed things start to break down. A single thread can push data into
the radacct table at > 1000 records per second, which pegs the hard disks
(and works the CPU pretty well also) making the DB spend all its time
updating indexes (as well as writing to disk of course).
This workload slows down the multi-query auth process to the point where it
can start to time out internally, and it certainly replies late to the NAS.
So, again.. Having the detail reader not inject packets back into the system
unless there are > X threads free is a good idea, but I still think that the
reader needs a configurable delay in between each packet that it injects...
It's not the lack of threads that's the issue (although that IS an issue),
it's the amount of load one dedicated thread can create...
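Put together, the reader I have in mind would gate injection on both free worker threads and a per-packet delay, roughly like this (a sketch only; the `free_threads` hook is hypothetical and not FreeRADIUS's real thread-pool API):

```python
import time

class ThrottledInjector:
    """Re-inject detail-file packets only when the server is idle enough.

    `free_threads` is a callable returning the number of currently idle
    worker threads (hypothetical hook into the thread pool).
    """

    def __init__(self, free_threads, min_free=2, delay_usec=5000):
        self.free_threads = free_threads
        self.min_free = min_free
        self.delay = delay_usec / 1_000_000
        self.injected = 0

    def inject(self, handle_packet, packet):
        # Back off while too few threads are free to serve auth traffic.
        while self.free_threads() < self.min_free:
            time.sleep(0.01)
        handle_packet(packet)
        self.injected += 1
        time.sleep(self.delay)  # pace the DB regardless of thread count
```

The free-thread check protects auth latency on the RADIUS server itself; the fixed delay is what actually protects the DB's disks from one dedicated thread running flat out.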
Cheers
--
Peter Nixon
http://www.peternixon.net/
PGP Key: http://www.peternixon.net/public.asc