recommendations for max_servers

John Douglass john.douglass at oit.gatech.edu
Tue Sep 23 19:31:42 CEST 2014


Louis,

Be aware that the Cisco WLC software has some major design flaws that 
limit the number of requests a controller can field (Georgia Tech is 
working with Cisco to resolve them).

https://tools.cisco.com/bugsearch/bug/CSCuj88508

These flaws cause an "overrun" of RADIUS IDs when there are too many 
authentications per second, which shows up as "duplicate" and "discard" 
messages in the logs. No amount of tweaking on the RADIUS side will fix 
this. You can, however, improve server performance to give clients a 
better experience.
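
For a rough sense of where that overrun starts, here is a 
back-of-the-envelope sketch in Python. It relies on one fact from RFC 
2865 (the RADIUS Identifier field is a single octet, so 256 values) and 
on an assumption not confirmed in this thread: that the controller sends 
all requests to a given server from one source port. The function names 
are made up for illustration.

```python
# RFC 2865: the Identifier field is one octet, so a client can track at
# most 256 outstanding requests per (source port, server) pair.
RADIUS_ID_SPACE = 256


def max_auth_rate(avg_latency_s: float, source_ports: int = 1) -> float:
    """Highest sustained request rate before identifiers must be reused.

    avg_latency_s is the average time an identifier stays in use
    (until a reply arrives or the request times out).
    source_ports=1 is an ASSUMPTION about the controller's behaviour.
    """
    if avg_latency_s <= 0:
        raise ValueError("avg_latency_s must be positive")
    return RADIUS_ID_SPACE * source_ports / avg_latency_s


def ids_in_flight(rate_per_s: float, avg_latency_s: float) -> float:
    """Average outstanding requests (Little's law: rate * latency)."""
    return rate_per_s * avg_latency_s


if __name__ == "__main__":
    print(max_auth_rate(0.25))        # ceiling at 0.25 s average latency
    print(ids_in_flight(1000, 0.5))   # compare against RADIUS_ID_SPACE
```

Under these assumptions, the ceiling scales inversely with response 
time, which is why shaving latency on the server side still helps even 
though it cannot remove the limit itself.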

As for AD, Phil Mayers had some great suggestions for improving 
ntlm_auth performance. Here are his recommendations:

  1. Upgraded the radius servers.
     Old spec: 3GB RAM, 2x P4-based Xeon 1 core @ 3.2GHz, RHEL5
     New spec: 16GB RAM, 1x Xeon E5-2620 6 core @ 2GHz, RHEL6

  2. Upgraded Samba - went from RHEL5 samba3x-3.5.4 to RHEL6 samba-3.6.9

  3. Set "winbind max domain connections = 12" in smb.conf (restart 
winbind). (At GT we have so many authentications that we set it to 128, 
since we hit the limit during peak times.)

  4. Forced our smb.conf to talk to specific AD controllers which are 
physical, not VMware (most of our DCs are VMware)

  5. Spent a *lot* of time debugging and tracking the Samba->DC RPC 
round-trip times and hassling our AD people to keep these stable; not 
sure what they did, if anything.

  6. Increased radiusd.conf setting to "max_requests = 16384"

  7. Worked really, really hard on getting the Cisco APs, AP radios and 
controllers to STOP CRASHING; their software quality has been abysmal, 
and this was a contributing factor - APs or controllers would crash 
under load, and this would trigger a burst of auths, which would trigger 
the problem.
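
For item 5, one crude way to keep an eye on authentication latency is to 
time a command repeatedly and look at the spread. The sketch below is 
only an illustration: the helper names are hypothetical, and it measures 
the whole command from the outside rather than the Samba->DC RPC 
round-trip itself.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=5):
    """Run cmd `runs` times; return wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; only elapsed time matters here.
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return min, median and max of the measured durations."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }
```

You could pass it an ntlm_auth invocation for a test account, for 
example ["ntlm_auth", "--request-nt-key", "--domain=EXAMPLE", 
"--username=testuser", "--password=..."] (the domain and account are 
placeholders), and watch whether the maximum drifts away from the median 
during busy periods.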

As Alan said before, there are lots of moving parts where issues can 
happen. If you improve server performance within the pieces 
(AD/database/winbind/etc), that's a start.

If you are in a large-scale Cisco deployment, depending on how many APs 
and users you have, you may find yourself having issues regardless. It's 
a hard problem to advise on, but adding more radius servers and 
optimizing ours for performance has helped us immensely.
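
As for a ballpark on thread counts, nobody in this thread gave a 
formula, so take the following as a generic queueing estimate rather than 
a FreeRADIUS recommendation. It applies Little's law (busy workers = 
arrival rate x time per request) and multiplies by a headroom factor; 
the function name and the default headroom of 2.0 are assumptions chosen 
for illustration.

```python
import math


def estimate_threads(peak_requests_per_s, avg_service_time_s, headroom=2.0):
    """Ballpark worker-thread count from Little's law plus headroom.

    peak_requests_per_s: peak authentication rate you expect to see.
    avg_service_time_s:  average time one request occupies a thread
                         (including waits on ntlm_auth, database, etc.).
    headroom:            ASSUMED safety multiplier, not a measured value.
    """
    if peak_requests_per_s < 0 or avg_service_time_s < 0:
        raise ValueError("inputs must be non-negative")
    busy = peak_requests_per_s * avg_service_time_s
    return max(1, math.ceil(busy * headroom))


if __name__ == "__main__":
    # 200 auths/second at 125 ms each keeps about 25 threads busy.
    print(estimate_threads(200, 0.125))
```

The useful part is the dependence on service time: if a slow backend 
doubles the time per request, the same load needs twice the threads, and 
raising the thread limit only hides the latency problem.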

- JohnD

On 09/19/2014 02:58 PM, Louis Munro wrote:
> Hello,
>
> While troubleshooting a system I came upon a case of 'hung children' 
> and duplicate requests.
> I would usually ascribe this to a database issue but in this case the 
> database is mostly unused and properly indexed.
> Accounting is not used, so that's one less thing to consider.
>
> On the other hand the max_servers setting had been set as high as 192 
> by someone with good intentions.
> Tuning it down to 64 seemed to significantly reduce the load on the 
> system and the number of hung children was reduced by a factor of 
> about 100.
>
> While there remains an issue with some (intermittent) slow ntlm_auth 
> to take care of, I wondered how others tune the value of max_servers 
> other than by trial and error. Most of the time the default of 32 has 
> been enough for me. Higher is not necessarily better in my experience 
> since at least in this case it seems to have led to the main thread 
> working harder when under load (with most of the work done in the 
> "system" space).
>
> This is a system running 2.2.5 on RHEL 6.4 in VmWare. It's got 24Gb of 
> RAM and 16 cores so it should still be pretty capable.
>
> Does anyone have an algorithm, rule of thumb or other ballpark way of 
> estimating the "ideal" maximum number of threads?
>
> Regards,
> --
> Louis Munro
> lmunro at inverse.ca <mailto:lmunro at inverse.ca>  :: www.inverse.ca 
> <http://www.inverse.ca>
> +1.514.447.4918 x125  :: +1 (866) 353-6153 x125
> Inverse inc. :: Leaders behind SOGo (www.sogo.nu <http://www.sogo.nu>) 
> and PacketFence (www.packetfence.org <http://www.packetfence.org>)
>
>
>
> -
> List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html
