error discarding packet

Borislav Dimitrov b.dimitrov at ngsystems.net
Wed Dec 23 15:04:49 CET 2009


It's difficult to say, but I'd guess about 7 to 10 threads. Try
something like this:
thread pool {
	start_servers = 10
	max_servers = 32
	min_spare_servers = 3
	max_spare_servers = 10
	max_requests_per_server = 0
}
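
If you'd rather estimate than guess, here is a minimal sketch (plain Python, not part of FR) based on Little's law: the number of threads busy on average is roughly requests per second times seconds per request. The function name, the headroom factor and the example numbers are my assumptions for illustration only; measure your own peak request rate and per-request time (probably dominated by the MySQL round trip).

```python
import math


def estimate_threads(requests_per_second, seconds_per_request, headroom=2.0):
    """Rough thread count from Little's law, with a safety margin.

    busy threads ~= arrival rate * time spent on each request.
    `headroom` (hypothetical default 2.0) multiplies that to cover bursts.
    """
    if requests_per_second < 0 or seconds_per_request < 0:
        raise ValueError("rate and latency must be non-negative")
    if headroom < 1:
        raise ValueError("headroom must be at least 1")
    busy = requests_per_second * seconds_per_request
    # Never suggest fewer than one thread.
    return max(1, math.ceil(busy * headroom))


if __name__ == "__main__":
    # Hypothetical load: 40 requests/s, 0.25 s per request.
    print(estimate_threads(40, 0.25))
```

Treat the result only as a starting point for max_servers, then keep monitoring and tuning.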

I don't know exactly what your setup is, but in some situations it's
advisable to configure FR not to start and stop threads dynamically,
e.g. if heavy setup procedures have to run on thread startup (such as
a custom rlm_perl module that loads a lot of configuration from a DB
when each thread starts).
I've just seen your second message. The default settings should be good
for about 3500 users and more. For VoIP accounting even less is OK. I'm
starting to think that your problem is elsewhere. Does your CPU
usage stay low? Have you created indexes in your DB? Is
hostname_lookups = no? It's difficult to guess what the problem
is without knowing your setup, but something is definitely slowing
things down. FR is capable of handling many more than 3500 users on
a commodity server when configured properly. Also check for general
network delay, packet loss, etc. It's not normal for the CPU usage to
stay low while requests queue up one after another waiting to be
processed.
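
To check whether requests really are being delayed, you could scan the accounting detail file for Acct-Delay-Time values above 0. A rough Python sketch, assuming the detail file has one "Attribute = value" pair per line, indented with whitespace; the function name and the sample text are made up for illustration:

```python
import re
from typing import List

# One attribute per line, e.g. "\tAcct-Delay-Time = 15" (assumed layout).
DELAY_RE = re.compile(
    r"^[ \t]*Acct-Delay-Time[ \t]*=[ \t]*(\d+)[ \t]*$", re.MULTILINE
)


def delayed_values(detail_text: str, threshold: int = 0) -> List[int]:
    """Return every Acct-Delay-Time value (in seconds) above `threshold`."""
    values = [int(v) for v in DELAY_RE.findall(detail_text)]
    return [v for v in values if v > threshold]


if __name__ == "__main__":
    # Made-up sample resembling two accounting records.
    sample = (
        "\tAcct-Status-Type = Stop\n"
        "\tAcct-Delay-Time = 0\n"
        "\n"
        "\tAcct-Status-Type = Stop\n"
        "\tAcct-Delay-Time = 15\n"
    )
    print(delayed_values(sample))
```

If the list is mostly empty, the NAS is not retransmitting much and the delay is probably not in FR.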

Sincerely,

Borislav Dimitrov
e-mail: b.dimitrov at ngsystems.net
GSM: 0888 51 55 45; 0889 28 54 57
NG Systems
Lavele 32 str, fl: 4,
Sofia, Bulgaria



On 23.12.2009, at 15:49, Alisson wrote:

> OK.
>
> Look, I have 3500 customers authenticating on this server with MySQL.
>
> How many threads do I need to set?
>
> thread pool {
> 	start_servers = 1
> 	max_servers = 1
> 	min_spare_servers = 1
> 	max_spare_servers = 1
> 	max_requests_per_server = 0
> }
>
>
> 2009/12/23 Borislav Dimitrov <b.dimitrov at ngsystems.net>
> Just to add: I hope you are starting FR without the debug
> option (i.e. without -X). When started like that (radiusd -X &),
> it runs in a single thread, so requests have to wait for
> each other to finish.
>
> Regards,
>
> Borislav Dimitrov
> e-mail: b.dimitrov at ngsystems.net
> GSM: 0888 51 55 45; 0889 28 54 57
> NG Systems
> Lavele 32 str, fl: 4,
> Sofia, Bulgaria
>
>
>
>
> On 23.12.2009, at 15:43, Borislav Dimitrov wrote:
>
>> In radiusd.conf:
>>
>> # THREAD POOL CONFIGURATION
>> thread pool {
>> 	start_servers = 1
>> 	max_servers = 1
>> 	min_spare_servers = 1
>> 	max_spare_servers = 1
>> 	max_requests_per_server = 0
>> }
>>
>> ...but instead of the 1s, put values more appropriate for your
>> network usage (like 5 or 7). It's similar to Apache's thread pool
>> settings. Keep monitoring and tuning until the "Discarding
>> duplicate request" error disappears or becomes very rare. Also look
>> at the Acct-Delay-Time attribute sent by the NAS to FR. It should be
>> 0. If it's more than 0, then there's some delay. When you increase
>> your thread pool settings, the CPU usage will start increasing as FR
>> starts processing more requests concurrently. Also check your NAS
>> documentation for the timeout, retransmit and related options.
>> For Cisco they look like this:
>> "radius-server retransmit 0" etc.
>>
>> Regards,
>>
>> Borislav Dimitrov
>> e-mail: b.dimitrov at ngsystems.net
>> GSM: 0888 51 55 45; 0889 28 54 57
>> NG Systems
>> Lavele 32 str, fl: 4,
>> Sofia, Bulgaria
>>
>>
>>
>>
>> On 23.12.2009, at 15:36, Alisson wrote:
>>
>>> Hi, my DB is OK. I tested it with other programs, etc., and it is
>>> running well.
>>>
>>> How do I set the thread pool for better concurrency?
>>>
>>> 2009/12/23 Borislav Dimitrov <b.dimitrov at ngsystems.net>
>>> Hi,
>>>
>>> This question has been answered many times on this ML. I myself
>>> have answered it (or at least tried to) twice. Here are some of my
>>> previous messages:
>>>
>>> Msg1:
>>> Hi,
>>>
>>> I've already tried to answer a similar question some time ago (and
>>> I'm probably not the only one), but anyway...
>>> The cause of the problem is probably some delay or packet loss or
>>> something like that. Notice the Acct-Delay-Time value increasing
>>> as the NAS retries sending the "lost" accounting packet (although,
>>> at least in my case, it wasn't lost; its processing was just
>>> delayed). I've experienced such issues with Cisco VoIP routers:
>>> the router's log is flooded with RADIUS Server DEAD and then ...
>>> ALIVE messages, and in the FR log you can see the retries with the
>>> values of Acct-Delay-Time increasing. The root cause of the
>>> problem may be different, so you'll have to check it in your case.
>>> In my case it was caused by the thread pool settings not being
>>> appropriate for the load. In that situation the CPU usage stays low:
>>> the CPU sits idle because you cannot achieve good concurrency and
>>> requests have to wait for each other to finish. So find the root
>>> cause of your problem and eliminate it. The other thing is that most
>>> NASs have options to configure the RADIUS timeout, dead-time,
>>> retransmit, etc. values. E.g. for Cisco you could try "radius-server
>>> retransmit 0".
>>>
>>> Msg2:
>>> Hi,
>>>
>>> As far as I can see, the people on the list have provided you with
>>> a lot of very useful suggestions on what could cause the problem.
>>> To clarify what I said earlier and help you narrow things down a
>>> little: it's probably due to the RADIUS response timing out,
>>> so the NAS complains that the server is dead, and later, when the
>>> server finally responds, the NAS marks it as alive again. The
>>> reasons can differ depending on your setup: a slow network, database
>>> or custom module (like rlm_perl/python etc.) or, as I suggested from
>>> personal experience, improperly configured concurrency settings
>>> of FR itself. Find which component of your setup is causing the
>>> slow responses (it can be the backend or a messed up FR
>>> configuration) and fix it. Just for completeness, check your NAS
>>> manuals. Most NASs have configurable response timeouts,
>>> retransmits, dead-server marking, etc., but tuning the NAS, while
>>> possibly useful, probably does not address the main issue
>>> in your setup. Check what is slowing things down.
>>>
>>> Msg3:
>>> Hi there,
>>>
>>> I may be mistaken, but these are log messages on the NAS, aren't
>>> they?
>>> If this is the case, I've experienced similar behavior with Cisco
>>> VoIP routers (RADIUS Server DEAD and then ... ALIVE). This happens
>>> if you haven't properly enabled concurrency in FreeRADIUS: the
>>> CPU usage stays low (0%-2%), but if there are many requests they
>>> end up waiting for each other. This happens when you have started
>>> FreeRADIUS with the -X option (I think it starts with a single
>>> thread then) or have values that are too low for the thread pool
>>> parameters (and/or the *_clones options of rlm_perl, which are to
>>> be deprecated soon). If you configure proper values according to
>>> the expected usage (concurrent requests), then requests won't wait
>>> for each other to finish while the CPU stays unused, and you'll
>>> avoid this annoying message in your logs. A sure sign that something
>>> like that is going on is the Acct-Delay-Time attribute with values
>>> greater than 0 (that is for accounting; I'm not sure about auth).
>>> Anyway, if the values of that attribute are high (they are in
>>> seconds, I think), then the requests are waiting too long, hence
>>> the error messages.
>>>
>>> Bottom line:
>>> 1) Check the ML for more info
>>> 2) The NAS can be configured for when to time out and resend the
>>> RADIUS packets
>>> 3) Something is slowing down your setup. It may be the DB or
>>> something else. If your CPU usage stays low (< 5%), check your
>>> thread pool settings and increase them to achieve better
>>> concurrency.
>>>
>>> Sincerely,
>>>
>>> Borislav Dimitrov
>>> e-mail: b.dimitrov at ngsystems.net
>>> GSM: 0888 51 55 45; 0889 28 54 57
>>> NG Systems
>>> Lavele 32 str, fl: 4,
>>> Sofia, Bulgaria
>>>
>>>
>>>
>>>
>>> On 23.12.2009, at 15:10, Alisson wrote:
>>>
>>> Hi, the other day I posted about this same error: 'Error: Discarding
>>> duplicate request from client'
>>>
>>> and the answer was 'your database is slow'.
>>>
>>> So I upgraded my server with more memory and changed the server
>>> variables...
>>>
>>> but I'm still having this problem
>>>
>>> and I don't know what it could be.
>>>
>>> -- 
>>> Att.
>>> Alisson F. Gonçalves
>>> Sistemas de Informação - UFGD
>>> -
>>> List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html



More information about the Freeradius-Users mailing list