Help troubleshooting No EAP session matching...

A.L.M.Buxey at lboro.ac.uk A.L.M.Buxey at lboro.ac.uk
Thu Sep 15 20:44:56 CEST 2016


Hi,

> During periods of high load, we are seeing many messages like the following:
> 
> radiusd[28187]: rlm_eap: No EAP session matching the State variable.

The RADIUS protocol only allows a limited number of auths to be 'in flight' from
a NAS at once (the packet ID field is 8 bits, so 256 per source IP and port) - I'm going
to guess you are using Cisco controllers?   the joy of them using just one single NAS
port for the IDs - no extended IDs - thus all those hundreds of APs are just the one client :/

> During peak times, we have about 8K wireless logins per minute, for
> extended periods. We have 6 wireless controllers, from which the
> Access-Requests are sent. Due to the high load, I am unable to run
> the server with -X, because it gets crushed while running single
> threaded. I can use radmin, but I'm not sure what to set the debug
> condition to.

most sites find they hit some magic numbers.... upgrading to newer code on the controller
might help (eg Cisco moved to using a new NAS port ID for the accounting traffic...which
doubles your throughput) - then you hit a higher limit...as we did...
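
As a rough illustration of where those 'magic numbers' come from: the RADIUS Identifier field is one octet, so a single source IP and port can have at most 256 requests outstanding. The sketch below assumes a simple steady-state model (rate = outstanding slots / time each slot is held). The helper name and the 0.5 s reply time are made up for illustration, not measurements from any site.

```python
# Rough model, not a measurement: each source port has 256 RADIUS IDs,
# and an ID stays busy until the server answers that packet.
IDS_PER_SOURCE_PORT = 256  # 8-bit Identifier field in the RADIUS header


def max_packets_per_second(source_ports: int, seconds_per_reply: float) -> float:
    """Upper bound on request rate for one NAS (hypothetical helper)."""
    return source_ports * IDS_PER_SOURCE_PORT / seconds_per_reply


# One source port, replies taking 0.5 s (assumed figure)
one_port = max_packets_per_second(1, 0.5)
# A second source port doubles the ceiling
two_ports = max_packets_per_second(2, 0.5)
print(one_port, two_ports)
```

Under this model the only ways to raise the ceiling are more source ports on the NAS side or faster replies on the server side.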

the other big issue is ntlm_auth - it does take quite some time and is a mix of CPU-bound and
server-bound - basically you'll find that with older Samba versions it sticks like glue to just
one of the KDC entries in AD.

> radiusd.conf: max_request_time = 30
> radiusd.conf: cleanup_delay = 5
> radiusd.conf: max_requests = 8000000 #about 30K wireless users at
> peak * 256 ~= 8million

max_requests, please note: this should be 256 multiplied by the number of clients - that
is, clients of the RADIUS server - ie NAS devices... 'those' clients, not wireless clients.
hence this number should be more like 256 x 6 !!  ;-)
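
Spelling that arithmetic out, as a trivial sketch using the figures from this thread (six controllers acting as RADIUS clients). The variable names are only for illustration:

```python
# max_requests is sized per RADIUS client (NAS), not per wireless user.
REQUESTS_PER_CLIENT = 256   # multiplier from the guidance above
nas_clients = 6             # the six wireless controllers in this thread

max_requests = REQUESTS_PER_CLIENT * nas_clients
print(f"max_requests = {max_requests}")
```

That gives 1536, rather than the roughly 8 million obtained by multiplying by the number of wireless users.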


> radiusd.conf: #       max_queue_size = 65536 (unsure why this is
> commented out)

it's commented out by default IIRC - you could try tweaking it...but if something's in the
queue it hasn't really been handled - eg I believe its entry will not have
been added to the state table etc.....

how many CPUs and cores? you could try increasing the number of threads to eg 64 if you have the
cores for the threads

> mods-enabled/eap: timer_expire = 60
> mods-enabled/eap: cache = disabled

and you really should be using the cache capability. this dramatically improves the re-auth time of
clients - letting them shortcut the full exchange

> 1. Is there a way to get more info along with the message "rlm_eap:
> No EAP session matching the State variable." ?
>     - eg. Which NAS it came from, calling-station-id, etc.


you can add extra logging eg with linelog or detail log (reject log here)

for debugging you can just debug particular clients (NAS) or end user clients - just follow the instructions
for commands as given by 'man radmin' - you may want to up the debug log level to 5 if you don't get enough info.


is this box running in a VM? the UDP performance isn't quite so good there, you need a few tweaks to that part.

 I would advise looking at new features available in 3.1.x - the native winbind client - VERY fast and nice...and
the caching is improved.  

you can ALSO help things out by using the 'QoS' packet handler "queue_priority" - setting that to EAP will 
ensure that those requests that are further along the EAP process will be handled first - which is nice for
those auths that have already dealt with eg 8 or 9 of the RADIUS responses out of 11...rather than failing
at that late stage....
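
To illustrate the idea only (this is a toy model, not FreeRADIUS code, and says nothing about how the server implements it): a priority queue that serves requests further along in their EAP conversation first. The function names and the round numbers are hypothetical.

```python
import heapq
import itertools

_arrival = itertools.count()  # tie-breaker: earlier arrivals first


def enqueue(queue, name, eap_round):
    # heapq is a min-heap, so negate the round: more progress = served sooner.
    heapq.heappush(queue, (-eap_round, next(_arrival), name))


def dequeue(queue):
    _, _, name = heapq.heappop(queue)
    return name


queue = []
enqueue(queue, "new-client", 1)    # just started EAP
enqueue(queue, "nearly-done", 9)   # 9 round trips already completed
enqueue(queue, "halfway", 5)

order = [dequeue(queue) for _ in range(3)]
print(order)
```

The request that has already completed 9 round trips comes out first, so the work already invested in it is less likely to be thrown away under load.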

alan


