Dave Aldwinckle daldwinc at
Fri Sep 16 13:44:24 CEST 2016

Thanks for all of the responses. There is a lot of useful info here.

- Aruba controllers, so yes, about 4500 APs on 6 clients (good note 
about max_requests)
- Will look into updating Samba. Right now we are running 3.6, which is
too old to use the winbind_* options from mschap
- Will add logging as suggested to auth type reject
- Will trial eap:caching
- We are running 3.0.10, which I understand is old. I may look into 
upgrading to 3.1, but I was hoping the next upgrade would be 4.0.

The box is a VM. What did you mean by this?

> is this box running in a VM? the UDP performance isn't quite so good,
> you need a few tweaks to that part.

Thanks again!


On 16-09-15 02:44 PM, A.L.M.Buxey at wrote:
> Hi,
>> During periods of high load, we are seeing many messages like the following:
>> radiusd[28187]: rlm_eap: No EAP session matching the State variable.
> The RADIUS protocol only allows a certain number of auths to be in flight from
> a NAS (the packet ID is a single byte, so 256 per source IP/port) - I'm going to guess you
> are using Cisco controllers?   the joy of them using just one single NAS port as the ID -
> no extended IDs - thus all those hundreds of APs are just the one client :/
>> During peak times, we have about 8K wireless logins per minute, for
>> extended periods. We have 6 wireless controllers, from which the
>> Access-Requests are sent. Due to the high load, I am unable to run
>> the server with -X, because it gets crushed while running single
>> threaded. I can use radmin, but I'm not sure what to set the debug
>> condition to.
> most sites find they hit some magic numbers.... upgrading to new code on the controller
> might help (eg cisco moved to using a new NAS port ID for the accounting traffic...double
> your throughput then...) - then you hit a higher one, as we did...
> the other big issue is ntlm_auth - it does take quite some time and is a mix of CPU-bound and
> server-bound - basically you'll find it sticks like glue with older SAMBAs to just one of the
> KDC entries in AD.
>> radiusd.conf: max_request_time = 30
>> radiusd.conf: cleanup_delay = 5
>> radiusd.conf: max_requests = 8000000 #about 30K wireless users at
>> peak * 256 ~= 8million
> max_requests , please note: This should be 256 multiplied by the number of clients - this
> is clients of the RADIUS server - ie NAS devices... 'those' clients, not wireless clients.
> hence this number should be more like 256 x 6 !!  ;-)
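
A minimal sketch of that sizing in radiusd.conf, assuming the six controllers are the only NAS clients defined on this server. The other two values are the ones already quoted above. If a controller sends from more than one source port, treat 1536 as a starting point rather than a hard limit.

```
# radiusd.conf (sketch, not a tested configuration)
max_request_time = 30
cleanup_delay = 5

# 256 outstanding IDs per client * 6 NAS clients = 1536
max_requests = 1536
```
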
>> radiusd.conf: #       max_queue_size = 65536 (unsure why this is
>> commented out)
> it's commented out by default IIRC - you could try tweaking...but if something's in the
> queue it hasn't really been handled - eg I believe its entry will not have
> been added to the state table etc.....
> how many CPUs and cores? you could try increasing the number of threads to eg 64 if you have the
> cores for the threads
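
As a sketch, the thread suggestion maps onto the "thread pool" section of radiusd.conf. The figure of 64 is the example from the text above and assumes the VM actually has the cores to back it. Other settings in the section are left at whatever the shipped file uses.

```
# radiusd.conf (sketch; only the changed setting is shown)
thread pool {
	# upper bound on worker threads; 64 is illustrative
	max_servers = 64

	# left commented out, as in the shipped file
	# max_queue_size = 65536
}
```
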
>> mods-enabled/eap: timer_expire = 60
>> mods-enabled/eap: cache = disabled
> and you really should be using the cache capability. this dramatically improves the re-auth time of
> clients - hence shortcutting them
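
A hedged sketch of turning on TLS session caching in mods-enabled/eap, assuming the stock 3.0 layout where the cache block sits inside "tls-config tls-common". The lifetime and max_entries values are illustrative only; check the comments in the shipped file and size max_entries to the user population.

```
# mods-enabled/eap (sketch; surrounding settings omitted)
eap {
	tls-config tls-common {
		cache {
			enable = yes
			lifetime = 24        # hours; illustrative
			max_entries = 255    # illustrative, likely too small for ~30K users
		}
	}
}
```
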
>> 1. Is there a way to get more info along with the message "rlm_eap:
>> No EAP session matching the State variable." ?
>>      - eg. Which NAS it came from, calling-station-id, etc.
> you can add extra logging eg with linelog or detail log (reject log here)
> for debugging you can just debug particular clients (NAS) or end user clients - just follow the instructions
> for commands as given by 'man radmin' - you may want to up the debug log level to 5 if you don't get enough info.
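
One way the linelog suggestion could look, as a sketch only. The instance name eap_reject_log and the log file name are hypothetical, and the module file has to be linked into mods-enabled before the virtual server can call it. The format string pulls the NAS address, calling station and failure message for each reject.

```
# mods-available/linelog (sketch; instance name and filename are hypothetical)
linelog eap_reject_log {
	filename = ${logdir}/eap-reject.log
	format = "%S user=%{User-Name} nas=%{NAS-IP-Address} src=%{Packet-Src-IP-Address} mac=%{Calling-Station-Id} msg=%{Module-Failure-Message}"
}

# sites-enabled/default (sketch)
post-auth {
	Post-Auth-Type REJECT {
		eap_reject_log
	}
}
```
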
> is this box running in a VM? the UDP performance isn't quite so good, you need a few tweaks to that part.
> I would advise looking at the new features available in 3.1.x - the native winbind client - VERY fast and nice...and
> the caching is improved.
> you can ALSO help things out by using the 'QoS' packet handler "queue_priority" - setting that to EAP will
> ensure that those requests that are further along the EAP process will be handled first - which is nice for
> those auths that have already dealt with eg 8 or 9 of the RADIUS responses out of 11...rather than failing
> at that late stage....
> alan
