Conflicting packets

Alan DeKok aland at deployingradius.com
Mon Mar 1 14:39:42 CET 2010


rihad wrote:
> We have FreeRADIUS 2.1.3 servicing four Cisco NASses, which in turn
> service hundreds of PPPoE clients. rlm_perl with a custom written script
> is used for authorization/accounting, performing at about 10 auth
> requests/sec on a Dell PowerEdge 2950 box.

  That is *incredibly* slow.  The server should normally be able to
handle 1000s of requests/s going to a DB, and 10s of 1000s of
requests/s if the user data is cached in RAM.

> At times, when a NAS is
> rebooted, triggering reauthentication of hundreds of PPPoE users, the
> server log is swamped with many lines of this kind:
> 
> Error: Received conflicting packet from client 10.10.70.3 port 1645 -
> ID: 86 due to unfinished request 273963.  Giving up on old request.

  Yup.  Your system is too slow to handle the load.

> I've tried all sorts of combinations for the above, with max_requests as
> low as 50, to no avail.

  There is no magic configuration option that will make your server faster.

  Do the math: 100s of users dialing in simultaneously, with the server
processing 10 packets/s.  With a backlog of 500 requests, it will be
many 10s of seconds before the server gets around to processing a
particular request.  By that point, the ADSL modem or the NAS will have
given up, and tried *again*.  This will increase the backlog, likely
doubling it.
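
  A rough sketch of that arithmetic follows.  The 500-request backlog
and the 10 requests/s rate are the figures above; the 5 second retry
timeout and the "every unanswered request is resent exactly once" model
are assumptions made purely for illustration, not measured behaviour of
any particular NAS or modem.

```python
def wait_seconds(backlog, rate_per_sec):
    """Seconds until the last request in the queue gets processed."""
    return backlog / rate_per_sec


def backlog_after_retries(backlog, rate_per_sec, retry_timeout):
    """Queue length once every still-unanswered client has retried.

    Assumption: each request not answered within retry_timeout is sent
    again exactly once, and the stale original stays in the queue.
    """
    served = min(backlog, int(rate_per_sec * retry_timeout))
    unanswered = backlog - served
    return unanswered * 2  # stale originals plus their retries


backlog = 500   # requests queued after the NAS reboot
rate = 10       # requests/s the server manages with the slow script

print(wait_seconds(backlog, rate))              # 50.0
print(backlog_after_retries(backlog, rate, 5))  # 900
```

  Under those assumptions the last user waits 50 seconds, and the queue
grows from 500 to 900 entries after one round of retries, which is
close to the doubling described above.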

  However, if you want to "work around" the problem, set "max_requests"
to something like 128000.  The server will use more RAM, but it will
make progress.

> Can freeradius be configured to reply with a (temporary) REJECT to an
> auth request when max_requests is reached, instead of just ignoring the
> request?

  No.

> I think that would allow the server to make steady progress.

  No.  It means that the users will think that they can't get on, and
will call tech support for help.

> What else should I do in terms of radius configuration? Please do not
> suggest that I fix the code to make it faster, it's more of a
> misconfiguration issue (radius or Cisco).

  Nonsense.  Your code is slow.  Don't blame FreeRADIUS or Cisco for
your mistake.

> The server should be making
> some progress no matter how slow it ran, unlike what we're having at the
> times of trouble.

  The server *is* making progress.  It just doesn't matter.  When the
CPU is pegged at 100% due to YOUR SCRIPT BEING SLOW, then no amount of
magic RADIUS configuration will make the system run faster.

  Fix your script, or install 4x as many servers, and put a load
balancer in front of them.  Nothing else will make the system run faster.
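
  To compare the two options, here is a minimal sketch.  It assumes a
load balancer that splits requests evenly across servers and that
throughput scales linearly; the 1000 requests/s figure is the low end of
what the server should manage with a DB, as noted earlier, and is not a
measurement of this installation.

```python
def drain_time(backlog, per_server_rate, servers=1):
    """Seconds to clear a backlog, assuming evenly balanced servers."""
    return backlog / (per_server_rate * servers)


backlog = 500  # requests queued after a NAS reboot

print(drain_time(backlog, 10))             # current setup: 50.0
print(drain_time(backlog, 10, servers=4))  # 4x as many servers: 12.5
print(drain_time(backlog, 1000))           # script fixed: 0.5
```

  Four servers cut the wait to a quarter, but a script that performs at
a normal rate clears the same backlog in well under a second.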

  Alan DeKok.


