Conflicting packets
Alan DeKok
aland at deployingradius.com
Mon Mar 1 14:39:42 CET 2010
rihad wrote:
> We have FreeRADIUS 2.1.3 servicing four Cisco NASses, which in turn
> service hundreds of PPPoE clients. rlm_perl with a custom written script
> is used for authorization/accounting, performing at about 10 auth
> requests/sec on a Dell PowerEdge 2950 box.
That is *incredibly* slow. The server should normally be able to
handle thousands of requests/s going to a DB, and tens of thousands of
requests/s if the user data is cached in RAM.
> At times, when a NAS is
> rebooted, triggering reauthentication of hundreds of PPPoE users, the
> server log is swamped with many lines of this kind:
>
> Error: Received conflicting packet from client 10.10.70.3 port 1645 -
> ID: 86 due to unfinished request 273963. Giving up on old request.
Yup. Your system is too slow to handle the load.
> I've tried all sorts of combinations for the above, with max_requests as
> low as 50, to no avail.
There is no magic configuration option that will make your server faster.
Do the math: hundreds of users dialing in simultaneously, with the server
processing 10 packets/s. With a backlog of 500 requests, it will be
tens of seconds (about 50 for the request at the back of the queue)
before the server gets around to processing a particular request. At
that point, the ADSL modem NAS will have given up, and tried *again*.
This will increase the backlog, likely doubling it.
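The arithmetic can be sketched in a few lines of Python. This is a rough model, not a description of FreeRADIUS internals: the 500-request backlog and the 10 requests/s rate are the figures from this thread, while the 5 second NAS timeout and the function names are assumptions made up for illustration. It also assumes every unanswered request is retransmitted exactly once.

```python
def queue_wait_seconds(backlog, rate_per_s):
    """Seconds until the last request in a FIFO backlog is reached."""
    return backlog / rate_per_s


def backlog_after_retransmit(backlog, rate_per_s, nas_timeout_s):
    """Backlog once the NAS timeout expires (simplified model).

    Assumption: each request still unanswered at the timeout is
    retransmitted once, so it counts twice in the new backlog.
    """
    served = min(backlog, rate_per_s * nas_timeout_s)
    unanswered = backlog - served
    return unanswered * 2


if __name__ == "__main__":
    # Figures from the thread: 500 queued requests, 10 requests/s.
    print(queue_wait_seconds(500, 10))
    # Hypothetical 5 s NAS timeout: most requests are still waiting.
    print(backlog_after_retransmit(500, 10, 5))
    # Same burst on a server doing 1000 requests/s: nothing is left over.
    print(backlog_after_retransmit(500, 1000, 5))
```

Under these assumptions the slow server ends the first timeout window with more queued work than it started with, while a server handling 1000 requests/s clears the burst before any retransmit happens.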
However, if you want to "work around" the problem, set "max_requests"
to something like 128000. The server will use more RAM, but it will
make progress.
> Can freeradius be configured to reply with a (temporary) REJECT to an
> auth request when max_requests is reached, instead of just ignoring the
> request?
No.
> I think that would allow the server to make steady progress.
No. It means that the users will think that they can't get on, and
will call tech support for help.
> What else should I do in terms of radius configuration? Please do not
> suggest that I fix the code to make it faster, it's more of a
> misconfiguration issue (radius or Cisco).
Nonsense. Your code is slow. Don't blame FreeRADIUS or Cisco for
your mistake.
> The server should be making
> some progress no matter how slow it ran, unlike what we're having at the
> times of trouble.
The server *is* making progress. It just doesn't matter. When the
CPU is pegged at 100% due to YOUR SCRIPT BEING SLOW, no amount of
magic RADIUS configuration will make the system run faster.
Fix your script, or install 4x as many servers, and put a load
balancer in front of them. Nothing else will make the system run faster.
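As a back-of-the-envelope check on the "more servers" option, the sketch below estimates how long a burst takes to drain and how many servers would be needed to drain it within a deadline. It assumes the load balancer spreads requests evenly and that capacity scales linearly with server count, which is optimistic. The 5 second deadline and the function names are hypothetical; the 500-request burst and 10 requests/s rate come from the discussion above.

```python
import math


def drain_time_s(burst, per_server_rate, servers):
    """Seconds to work through `burst` requests with `servers` equal servers."""
    return burst / (per_server_rate * servers)


def servers_needed(burst, per_server_rate, deadline_s):
    """Smallest number of equal servers that drains `burst` within `deadline_s`."""
    required_rate = burst / deadline_s
    return math.ceil(required_rate / per_server_rate)


if __name__ == "__main__":
    # One server vs. four servers at 10 requests/s each.
    print(drain_time_s(500, 10, 1))
    print(drain_time_s(500, 10, 4))
    # Servers needed to finish within an assumed 5 s deadline.
    print(servers_needed(500, 10, 5))
    # The same deadline with a script that handles 1000 requests/s.
    print(servers_needed(500, 1000, 5))
```

With these numbers, four servers cut the drain time from 50 seconds to 12.5 seconds, and a single server running a faster script meets the assumed deadline on its own.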
Alan DeKok.
More information about the Freeradius-Users mailing list