error discarding packet

Wed Dec 23 15:16:59 CET 2009

I've been trying to solve this problem for six months.

This is my configuration:

thread pool {
start_servers = 10
max_servers = 32
min_spare_servers = 3
max_spare_servers = 10
max_requests_per_server = 0
}

What can I change?

Should I increase start_servers?
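
One rough way to reason about the thread counts, offered only as a sketch: by Little's law, the average number of requests in flight is the arrival rate multiplied by the time each request takes. The function name, the headroom multiplier and the load figures below are all hypothetical; none of them were measured on this server or come from FreeRADIUS itself.

```python
import math


def estimate_threads(requests_per_second, seconds_per_request, headroom=2.0):
    """Estimate worker threads via Little's law.

    Average requests in flight = arrival rate * time each request takes.
    `headroom` is an arbitrary safety multiplier (an assumption, not a
    FreeRADIUS setting).
    """
    in_flight = requests_per_second * seconds_per_request
    return max(1, math.ceil(in_flight * headroom))


# Hypothetical load: 40 requests/s, each waiting 0.25 s on the database.
print(estimate_threads(40, 0.25, headroom=1.5))  # prints 15
```

With these made-up figures the estimate stays below max_servers = 32, which would point away from the pool size as the limit. If the time per request grows (for example because of a slow query), the number of threads needed grows in proportion.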

2009/12/23 Borislav Dimitrov <b.dimitrov at ngsystems.net>

> It's difficult to say, but I'd say about 7-10... Try something like
> this:
> thread pool {
> start_servers = 10
> max_servers = 32
> min_spare_servers = 3
> max_spare_servers = 10
> max_requests_per_server = 0
> }
>
> I don't know exactly what your setup is, but in some situations it's
> advisable to configure FR not to start and stop processes/threads
> dynamically (e.g. if you have heavy setup procedures that need to be
> executed on thread startup, such as a custom rlm_perl module which loads
> lots of configuration from a DB on thread startup).
>
> I have just seen your second message. The default settings should be good
> for about 3500 users and more. For VoIP accounting even less is OK. I'm
> starting to think that your problem is elsewhere... Does your CPU usage
> stay low? Have you created indexes in your DB? Is hostname_lookups = no? I
> find it difficult to guess what the problem is without knowing your setup,
> but something is definitely slowing things down... FR is capable of
> managing many more than 3500 users on a commodity server when configured
> properly. Also check for general network delay, packet loss, etc... It's
> not normal for CPU usage to stay low while requests queue up one after
> another waiting to be processed...
>
> Sincerely,
>
> Borislav Dimitrov
> e-mail: b.dimitrov at ngsystems.net
> GSM: 0888 51 55 45; 0889 28 54 57
> NG Systems
> Lavele 32 str, fl: 4,
> Sofia, Bulgaria
>
>
>
> On 23.12.2009, at 15:49, Alisson wrote:
>
> OK.
>
> Look, I have 3500 customers authenticating on this server with MySQL.
>
> How many threads do I need to set?
>
> thread pool {
> start_servers = 1
> max_servers = 1
> min_spare_servers = 1
> max_spare_servers = 1
> max_requests_per_server = 0
> }
>
>
> 2009/12/23 Borislav Dimitrov <b.dimitrov at ngsystems.net>
>
>> Just to add: I hope you are starting FR without the debug option (i.e.
>> without -X). When started like that (radiusd -X &) it runs in a single
>> thread, so requests obviously have to wait for each other to finish...
>>
>>   Best regards,
>>
>> Borislav Dimitrov
>> e-mail: b.dimitrov at ngsystems.net
>> GSM: 0888 51 55 45; 0889 28 54 57
>> NG Systems
>> Lavele 32 str, fl: 4,
>> Sofia, Bulgaria
>>
>>
>>
>>
>> On 23.12.2009, at 15:43, Borislav Dimitrov wrote:
>>
>> In radiusd.conf:
>>
>> # THREAD POOL CONFIGURATION
>> thread pool {
>> start_servers = 1
>> max_servers = 1
>> min_spare_servers = 1
>> max_spare_servers = 1
>> max_requests_per_server = 0
>> }
>>
>> ...but instead of ones (1s) put something more appropriate for your
>> network usage (like 5s or 7s). It's similar to Apache's thread pool
>> settings... Keep monitoring and tuning until the "discarding duplicate
>> packet" error disappears or becomes very rare. Also look at the
>> Acct-Delay-Time attribute sent by the NAS to FR. It should be 0. If it's
>> more than 0, then there's some delay. When you increase your thread pool
>> settings, CPU usage will start increasing as FR processes more requests
>> concurrently. Also check your NAS documentation for the configuration
>> options for timeouts, retransmits, etc. For Cisco they look like this:
>>
>> "radius-server retransmit 0" etc
>>
>>
>>   Best regards,
>>
>> Borislav Dimitrov
>> e-mail: b.dimitrov at ngsystems.net
>> GSM: 0888 51 55 45; 0889 28 54 57
>> NG Systems
>> Lavele 32 str, fl: 4,
>> Sofia, Bulgaria
>>
>>
>>
>>
>> On 23.12.2009, at 15:36, Alisson wrote:
>>
>> Hi, my DB is OK. I tested it with other programs, etc., and it is running
>> well.
>>
>> How do I set the thread pool for better concurrency?
>>
>> 2009/12/23 Borislav Dimitrov <b.dimitrov at ngsystems.net>
>>
>>> Hi,
>>>
>>> This question has been answered many times on this ML. I myself have
>>> answered it (or at least tried to) twice. Here are some of my previous
>>> messages:
>>>
>>> Msg1:
>>> Hi,
>>>
>>> I've already tried to answer a similar question some time ago (and I'm
>>> probably not the only one), but anyway...
>>> The cause of the problem is probably some delay, packet loss, or
>>> something similar. Notice the Acct-Delay-Time value increasing as the NAS
>>> retries sending the "lost" accounting packet (although, at least in my
>>> case, it wasn't lost; its processing was just delayed). I've experienced
>>> such issues with Cisco VoIP routers: the router's log is flooded with
>>> RADIUS Server DEAD and then ... ALIVE messages, and in the FR log you can
>>> see the retries with the values of Acct-Delay-Time increasing. The root
>>> cause may be different, so you'll have to check it in your case. In my
>>> case it was caused by thread pool settings that were not appropriate for
>>> the load. In that situation CPU usage stays low, but the spare capacity
>>> goes unused because you cannot achieve good concurrency and requests have
>>> to wait for each other to finish. So find the root cause of your problems
>>> and eliminate it. The other thing is that most NASes have options to
>>> configure the RADIUS timeout, dead time, retransmit count, etc. E.g. for
>>> Cisco you could try "radius-server retransmit 0".
>>>
>>> Msg2:
>>> Hi,
>>>
>>> As far as I can see, the people on the list have provided you with a lot
>>> of very useful suggestions on what could cause the problem. As I said
>>> earlier (let me clarify), and to help you narrow things down a little:
>>> it's probably due to the RADIUS response timing out, so the NAS complains
>>> that the server is dead, and later, when the server finally responds, the
>>> NAS marks it as alive again. The reasons can differ depending on your
>>> setup: a slow network, database, or custom module (like rlm_perl/python
>>> etc.), or, as I suggested from personal experience, improperly configured
>>> concurrency settings in FR itself. Find which component of your setup is
>>> causing the slow responses (it can be the backend, or a messed-up FR
>>> configuration) and fix it. Just for completeness, check your NAS's
>>> manuals: most have these settings configurable (response timeouts,
>>> retransmits, marking the server as dead, etc.), but tuning the NAS, while
>>> possibly useful, is probably not the main issue in your setup. Check what
>>> is slowing things down.
>>>
>>> Msg3:
>>> Hi there,
>>>
>>> I may be mistaken, but... these are log messages on the NAS, aren't they?
>>> If so, I've experienced similar behavior with Cisco VoIP routers (RADIUS
>>> Server DEAD and then... ALIVE). This happens if you haven't properly
>>> enabled concurrency in FreeRADIUS: CPU usage stays low (0%-1%-2%), but if
>>> there are many requests they are obviously waiting for each other...
>>> This happens when you have started FreeRADIUS with the -X option (I think
>>> it starts with a single thread then) or have values that are too low for
>>> the thread pool parameters (and/or the *_clones options of rlm_perl, which
>>> are to be deprecated soon). If you configure proper values according to
>>> the expected usage (concurrent requests), then requests won't wait for
>>> each other to finish while the CPU stays unused, and you'll avoid this
>>> annoying message in your logs. A sure sign that something like that is
>>> going on is the Acct-Delay-Time attribute with values greater than 0
>>> (that is for accounting; I'm not sure about auth etc.). Anyway, if the
>>> values of that attribute are high (they are in seconds, I think), then the
>>> requests are waiting too long, hence the error messages.
>>>
>>> Bottom line:
>>> 1) Check the ML for more info.
>>> 2) The NAS can be configured for when to time out and resend RADIUS
>>> packets.
>>> 3) Something is slowing down your setup. It may be the DB or something
>>> else. If your CPU usage stays low (< 5%), check your thread pool settings
>>> and increase them to achieve better concurrency.
>>>
>>> Sincerely,
>>>
>>> Borislav Dimitrov
>>> e-mail: b.dimitrov at ngsystems.net
>>> GSM: 0888 51 55 45; 0889 28 54 57
>>> NG Systems
>>> Lavele 32 str, fl: 4,
>>> Sofia, Bulgaria
>>>
>>>
>>>
>>>
>>> On 23.12.2009, at 15:10, Alisson wrote:
>>>
>>>> Hi, the other day I posted about this same error, 'Error: Discarding
>>>> duplicate request from client',
>>>>
>>>> and the answer was 'your database is slow'.
>>>>
>>>> So I upgraded my server with more memory and changed the server
>>>> variables...
>>>>
>>>> but I'm still having this problem
>>>>
>>>> and I don't know what it can be.
>>>>
>>>> --
>>>> Att.
>>>> Alisson F. Gonçalves
>>>> Sistemas de Informação - UFGD
>>>> -
>>>> List info/subscribe/unsubscribe? See
>>>> http://www.freeradius.org/list/users.html
>>>>
>>>
>>>
>>
>>
>>
>> --
>> Att.
>> Alisson F. Gonçalves
>> Sistemas de Informação - UFGD
>>
>>
>
>
>
> --
> Att.
> Alisson F. Gonçalves
> Sistemas de Informação - UFGD
>
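
The Acct-Delay-Time check suggested in the quoted replies could be automated with a small script. This is only a sketch: it assumes the accounting requests are available as plain text containing lines of the form "Acct-Delay-Time = N", and the function name and the sample text are made up for illustration.

```python
import re
from typing import List

# Matches lines such as "Acct-Delay-Time = 5" (assumed text format).
DELAY_RE = re.compile(r"Acct-Delay-Time\s*=\s*(\d+)")


def delayed_values(log_text: str) -> List[int]:
    """Return every Acct-Delay-Time value greater than 0 found in the text."""
    values = [int(v) for v in DELAY_RE.findall(log_text)]
    return [v for v in values if v > 0]


# Hypothetical sample, not taken from a real log.
sample = """
Acct-Status-Type = Stop
Acct-Delay-Time = 0
Acct-Status-Type = Stop
Acct-Delay-Time = 5
Acct-Delay-Time = 10
"""
print(delayed_values(sample))  # prints [5, 10]
```

An empty result would mean the NAS is not reporting any delay; growing values would match the retry pattern described in the replies above.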

-- 
Att.
Alisson F. Gonçalves
Sistemas de Informação - UFGD