Antw: Re: How many NAS can radius take?

Anja Ruckdaeschel Anja.Ruckdaeschel at rz.uni-regensburg.de
Thu Feb 13 19:38:33 CET 2014


Hi Phil!

I'm very, very sorry. There was a typo: it was 0.01 s and 0.02 s with ldap. My
fault. Shame on me.
So, again, no ldap problem.


I already reduced ldap and sql as much as possible with

   eap {
     ok = return
   }

and some unlang conditions.

I did this based on huntgroups because I saw that
1. eap returned updated or handled, and ok for only one packet, in default
2. there is an EAP-Message in every Access-Request packet I receive for
peap/mschap now.

But I will look into rlm_cache. We are always looking for optimizations, since
we expect the number of supplicants to keep growing every day.

And we will talk to our wireless vendor about the ports.

And I will look into the preprocess thing.

Thank you all very, very much.








>>> Phil Mayers <p.mayers at imperial.ac.uk> 13.02.2014 19:01 >>>
On 13/02/14 17:03, Anja Ruckdaeschel wrote:

> Tests with exactly one ldap server showed no differences. Ldap response times
> went e.g. from 0.1 s to 0.2 s when the problem appears, but then we do a
> lot more ldap queries, too.
> So I think that's quite normal?

0.2 sec for a query is quite slow. Our median time for an AD MSCHAP auth 
is around 35 milliseconds, and our SQL lookups are closer to 5 milliseconds.

If you do the maths, a thread pool of 100 threads can only serve 500 
auths/sec if you have 0.2 sec blocking per auth. Scale numbers 
appropriately for number of lookups, divided by 11/3 (usual ratio of 
PEAP outer to PEAP inner packets).
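
To make that arithmetic concrete, here is a minimal Python sketch. The
function name is made up for illustration, and it assumes every auth blocks
exactly one worker thread for the whole lookup time; it only restates the
thread pool calculation above and is not taken from the server.

```python
def max_auths_per_sec(threads: int, blocking_sec_per_auth: float) -> float:
    """Upper bound on auths/sec when every auth blocks one worker thread."""
    if threads <= 0 or blocking_sec_per_auth <= 0:
        raise ValueError("threads and blocking time must be positive")
    # Each thread can finish 1 / blocking_sec_per_auth auths per second.
    return threads / blocking_sec_per_auth


if __name__ == "__main__":
    # 0.2 s is the figure discussed above; 35 ms and 5 ms are the
    # AD MSCHAP and SQL timings quoted for comparison.
    for blocking in (0.2, 0.035, 0.005):
        rate = max_auths_per_sec(100, blocking)
        print(f"{blocking * 1000:6.1f} ms blocking -> at most {rate:8.0f} auths/sec")
```

The result is a ceiling, not a prediction: bursts, retransmits and several
lookups per auth all push the usable rate lower.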

>
> Here are our authentication methods:
> PAP with or without ldap for switches, routers and dial-in servers (cisco,
> hp, max ascend) and for Juniper VPN controllers (only those devices do
> accounting)
> CHAP PAP with or without ldap for switches and routers (cisco, hp)
> For Wi-Fi: PEAP/MSCHAPv2 and EAP/TTLS-PAP with 350 lancom aps and 1
> Colubris Controller which does 2 different SSIDs with ldap for our users at
> home and those proxied in via the eduroam-project,
> for which we are service provider and identity provider

Consider splitting these out into separate processes, listening on 
separate IP/ports. This will help with brief thread pool exhaustion 
spikes, packet ID space exhaustion from common NAS source IPs and so on.

>
> 90% of our radius requests are coming in over Wi-Fi.
> ldap, sql and so on are only called in inner-tunnel (3 Packets)

You can, and should, reduce that even further. In later server versions, 
you can do:

authorize {
   eap {
     ok = return
   }
   ...
}

...in the inner tunnel as well, which will skip LDAP/SQL lookups for 
EAP-Identity and EAP-MSCHAP success/fail packets. You only need the 
LDAP/SQL for EAP-MSCHAP auth.

In earlier versions of the server you can do crazy hacks to match the 
EAP-Message against a regexp and ignore identity / mschap success/fail 
packets - search the archives.
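
The idea behind that kind of matching can be sketched in Python. This is not
the unlang regexp from the archives, just an illustration of which inner
packets carry the actual credentials. The byte offsets follow my reading of
the EAP header layout (code, identifier, 2-byte length, type) and the
EAP-MSCHAPv2 opcodes, so check them against the specs before relying on them.
The function name and the sample packets are made up.

```python
EAP_CODE_RESPONSE = 2        # EAP Code field: 2 = Response
EAP_TYPE_IDENTITY = 1        # EAP Type: Identity
EAP_TYPE_MSCHAPV2 = 26       # EAP Type: MS-CHAP-V2
MSCHAPV2_OP_RESPONSE = 2     # opcodes: 1 challenge, 2 response, 3 success, 4 failure


def needs_backend_lookup(eap_message: bytes) -> bool:
    """True only for an EAP-MSCHAPv2 Response, the packet that carries the auth."""
    if len(eap_message) < 6:
        return False  # too short to hold header + type + opcode
    code, eap_type, opcode = eap_message[0], eap_message[4], eap_message[5]
    return (
        code == EAP_CODE_RESPONSE
        and eap_type == EAP_TYPE_MSCHAPV2
        and opcode == MSCHAPV2_OP_RESPONSE
    )


if __name__ == "__main__":
    # Hand-built, truncated packets for illustration only.
    identity = bytes([2, 1, 0, 10, EAP_TYPE_IDENTITY]) + b"alice"
    mschap_response = bytes([2, 2, 0, 6, EAP_TYPE_MSCHAPV2, 2])
    mschap_success_ack = bytes([2, 3, 0, 6, EAP_TYPE_MSCHAPV2, 3])
    for name, pkt in [("identity", identity),
                      ("mschap response", mschap_response),
                      ("mschap success ack", mschap_success_ack)]:
        print(name, "->", "lookup" if needs_backend_lookup(pkt) else "skip")
```

Only the middle packet would trigger LDAP/SQL; the identity and success
acknowledgement packets would be skipped.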

Also investigate rlm_cache. Cache every possible lookup, even if it's 
just for a few seconds.
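
Why even a few seconds helps can be shown with a toy time-to-live cache in
Python. This is not how rlm_cache is implemented or configured; the class
name, the loader callback and the 5 second TTL are assumptions chosen for
illustration.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Remember lookup results for ttl_sec seconds (illustrative only)."""

    def __init__(self, ttl_sec: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_sec
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]      # still fresh: no backend query
        value = loader(key)      # missing or expired: query the backend once
        self._entries[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    backend_queries = 0

    def slow_lookup(user: str) -> str:
        # Stand-in for an LDAP/SQL query.
        global backend_queries
        backend_queries += 1
        return f"attributes-for-{user}"

    cache = TTLCache(ttl_sec=5.0)
    for _ in range(10):          # a device re-authenticating in a tight loop
        cache.get("alice", slow_lookup)
    print("backend queries:", backend_queries)
```

A client that re-authenticates many times within the TTL costs one backend
query instead of one per attempt.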

> I think we have quite a default config, but with a few modules commented
> out (e.g. unix, radutmp, ... we don't need) and a few policies in place to
> reduce unnecessary stuff like querying ldap or sql
> in default, when it's peap.
>
> Short eap timeouts on client devices and sleep mode increased our radius
> requests for the last two years. There are single users doing up to ~ 1500
> login requests per day from one device.
> But there is nothing we can do about that :-(

Maybe, but you need to be aware of it. Keep an eye on fast roaming, 
discuss the issues with your wireless vendor, etc.

>
> "Some NASes e.g. Cisco lightweight wireless use a single UDP source port,
> so at most 256 requests can be in-flight at any given time" could be our
> problem too, because our lancom access points do the same.
>
> Does this mean that when one NAS is sending more than 256 Access-Requests
> from one port freeradius cannot process one more at that time from this NAS?

It's not a freeradius thing - *no* RADIUS server could. It's a protocol 
limitation.

A given source/dest ip/port 4-tuple can only carry 256 requests 
in-flight, so no software could handle more. The NAS would have to open 
more ports.

It would be rare - but not impossible, under heavy load - for 256 
requests to be outstanding. When that happens, the NAS will either start 
dropping auth requests, or re-using IDs. In the latter case, that makes 
things worse, and the backlog can grow until the backend database (SQL, 
LDAP, AD, whatever) unblocks the thread pool.
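
A rough way to see how close a NAS is to that limit is Little's law (average
requests outstanding = arrival rate x response time). The Python sketch below
is back-of-the-envelope only: the function names are made up, and it assumes
steady traffic, so real bursts will hit the limit earlier than the averages
suggest.

```python
RADIUS_ID_SPACE = 256  # the RADIUS Identifier field is a single octet


def in_flight(requests_per_sec: float, response_time_sec: float) -> float:
    """Average outstanding requests (Little's law: rate x time in system)."""
    return requests_per_sec * response_time_sec


def max_rate_per_socket(response_time_sec: float) -> float:
    """Request rate at which one source ip/port pair runs out of IDs."""
    if response_time_sec <= 0:
        raise ValueError("response time must be positive")
    return RADIUS_ID_SPACE / response_time_sec


if __name__ == "__main__":
    for latency in (0.035, 0.2, 1.0):
        print(f"{latency:5.3f} s per request -> IDs exhausted at about "
              f"{max_rate_per_socket(latency):6.0f} req/sec per source port")
    print("outstanding at 1000 req/sec and 0.2 s:", round(in_flight(1000, 0.2)))
```

The slower the backend answers, the lower the request rate at which a single
source port uses up all 256 IDs.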

TBH, I think 0.2 sec is slow for an LDAP lookup; can you run a local 
replica?
-
List info/subscribe/unsubscribe? See
http://www.freeradius.org/list/users.html

