mschap via ntlm_auth over a socket

Matthew Newton mcn4 at leicester.ac.uk
Wed Dec 3 01:33:41 CET 2014


Hi,

We've been hit with the same issue that a lot of other sites seem
to have seen, which is that one freeradius server just doesn't
seem to be able to authenticate over a certain number of users per
second, somewhere around 30 or so, to our AD domain controllers.
Standard ntlm_auth -> winbind -> AD stuff.

We've done things like tweak "winbind max domain connections" and
"winbind max clients", but can't seem to get winbind to connect to
more than one DC, or seemingly parallelise anything in any way.
Though reading archives it looks like we may have to use Samba 4
for that (though I still don't understand the reason for the max
connections option if it can't/won't do it; I must be missing
something).

I've done some digging, and looking at the winbind debug logs, it
seems to be taking around 3ms to do an auth, give or take.
However, using sysdig to watch the ntlm_auth process, it takes
around 30ms to run. That would figure with the maximum ~30 auths
per second. Now I'm not entirely sure either of these times are
completely accurate, but they were far enough off to get me
looking.
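
As a rough sanity check on those two timings, here is a minimal
sketch that assumes auths are handled strictly one after another,
so the rate ceiling is just the inverse of the per-auth time. The
function name is made up for illustration.

```python
def max_serial_rate(seconds_per_auth):
    """Upper bound on auths/second if each auth must finish
    before the next one starts (assumption: no parallelism)."""
    return 1.0 / seconds_per_auth

# ~30ms per ntlm_auth run, as seen with sysdig
exec_rate = max_serial_rate(0.030)
# ~3ms per auth, as seen in the winbind debug logs
winbind_rate = max_serial_rate(0.003)

print(f"exec-bound ceiling:    {exec_rate:.0f} auths/s")
print(f"winbind-bound ceiling: {winbind_rate:.0f} auths/s")
```

If the serial assumption holds, the 30ms figure gives a ceiling in
the low 30s per second, while 3ms would allow roughly ten times
that.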

It seems ntlm_auth has a "--helper-protocol" option to enable it
to start and then process requests over stdin/stdout. This should
at least cut out the process exec time. So I've hacked around and
updated the mschap module in a couple of ways to allow use of
this.
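
To give an idea of what talking to a long-running helper looks
like, here is a minimal sketch that assumes the "ntlm-server-1"
helper protocol as described in the Samba ntlm_auth man page
("Parameter: value" lines, with a single "." ending each message).
Which helper protocol the patch actually uses is not stated above,
and the function names here are hypothetical.

```python
def build_request(username, domain, challenge, nt_response):
    """Format one auth request as text lines for the helper.

    Assumption: ntlm-server-1 style fields, with the challenge and
    NT response sent as hex strings.
    """
    lines = [
        f"Username: {username}",
        f"NT-Domain: {domain}",
        f"LANMAN-Challenge: {challenge.hex()}",
        f"NT-Response: {nt_response.hex()}",
        "Request-User-Session-Key: Yes",
        ".",  # a lone period tells the helper the request is complete
    ]
    return "\n".join(lines) + "\n"


def parse_reply(text):
    """Collect "Key: value" lines from a reply, stopping at the lone '.'."""
    fields = {}
    for line in text.splitlines():
        if line == ".":
            break
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value
    return fields


# Dummy values only: an 8-byte challenge and a 24-byte NT response.
request = build_request("alice", "EXAMPLE", bytes(8), bytes(24))
reply = parse_reply("Authenticated: Yes\n.\n")
print(request, reply)
```

The point is that the same process reads many such requests in
turn, so the exec cost is paid once rather than per auth.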

First is to add a new option to mschap, "method", which specifies
the auth method to use. Currently the method is either internal or
ntlm_auth, chosen implicitly by whether ntlm_auth is defined or
not. That still works for backwards compatibility, but if the new
"method" option is set, it overrides the implicit choice.

The second is to add a new method, "ntlmauth_socket", which uses
a connection pool to talk to a UNIX socket to send/receive auth
data to ntlm_auth.

To use it you then have to add ntlm_auth to inetd to create the
socket. inetd then spawns enough ntlm_auth processes to meet
demand as the connection pool opens sockets.
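
The pooling idea can be sketched roughly as follows. This is not
the FreeRADIUS connection pool (which is C code inside the
server), just a small illustration of lazily opening UNIX socket
connections up to a limit and reusing idle ones. The class and
function names are hypothetical, and the socket path is whatever
inetd is configured to listen on.

```python
import queue
import socket
import threading
from contextlib import contextmanager


class SocketPool:
    """Opens connections on demand, up to max_size, and reuses idle ones."""

    def __init__(self, connect, max_size):
        self._connect = connect              # callable returning a new socket
        self._idle = queue.LifoQueue()       # connections waiting for reuse
        self._slots = threading.BoundedSemaphore(max_size)
        self._lock = threading.Lock()
        self.opened = 0                      # how many sockets were ever opened

    @contextmanager
    def connection(self):
        self._slots.acquire()                # blocks while max_size are in use
        try:
            try:
                conn = self._idle.get_nowait()
            except queue.Empty:
                conn = self._connect()       # each new socket means inetd
                with self._lock:             # starts another helper process
                    self.opened += 1
            try:
                yield conn
            except Exception:
                conn.close()                 # do not reuse a broken connection
                raise
            else:
                self._idle.put(conn)
        finally:
            self._slots.release()


def unix_connector(path):
    """Return a callable that connects to the UNIX socket at path."""
    def connect():
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        s.connect(path)
        return s
    return connect
```

With something like this, the number of ntlm_auth processes
follows the number of sockets the pool has open, which is the
behaviour described above.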

I've not been able to test it in production yet; a) because I only
just wrote it, and b) because our servers are still FR2. So if
anyone else is running FR3 with a heavy load and willing to try,
it would be good. I fear with the amount of change control (and/or
paranoia) we have now, it's going to take me a while to get FR3
near the wireless controllers :(

I have tried it both standalone against Samba, and against the
domain. I'm getting these results. Note that the test (eapol_test)
is run sequentially, so I'm artificially limiting the throughput
there. All tests PEAP/EAP-MSCHAPv2 and FR threaded (not in debug
mode).

Standalone machine, Samba 3, not AD. 1,000 auths:

  internal mschap: 21.7s
  using ntlm_auth exec: 29.7s
  with socket: 22.9s

So the socket looks decidedly faster than exec, and close to the
internal mschap auth speed.

Domain joined machine, Samba 3, 100 auths, eapol_test being run
across the network from another machine:

  internal mschap: 6.3s
  using ntlm_auth exec: 8.1s
  with socket: 6.5s

Only 100 auths in a loop this time (not to hammer the DC too
much), but the socket version still manages to get 3 more auths in
per second.

Finally, domain joined machine, Samba 3, 100 auths, but direct
mschap with radtest on the FR host rather than eapol_test remotely:

  internal mschap: 4.7s
  using ntlm_auth exec: 6.3s
  with socket: 4.9s
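
For comparison, the three sets of timings above can be turned into
auths per second with a few lines. The figures are copied from the
results above; the dictionary layout and names are only for
illustration.

```python
# (number of auths, {method: total seconds}) for each test run
results = {
    "standalone, eapol_test": (1000, {"internal": 21.7, "exec": 29.7, "socket": 22.9}),
    "domain, eapol_test":     (100,  {"internal": 6.3,  "exec": 8.1,  "socket": 6.5}),
    "domain, radtest":        (100,  {"internal": 4.7,  "exec": 6.3,  "socket": 4.9}),
}


def rate(auths, seconds):
    """Average auths per second over a sequential test run."""
    return auths / seconds


for name, (auths, timings) in results.items():
    rates = {method: rate(auths, secs) for method, secs in timings.items()}
    gain = rates["socket"] - rates["exec"]
    print(f"{name}: "
          + ", ".join(f"{m} {r:.1f}/s" for m, r in rates.items())
          + f" (socket vs exec: +{gain:.1f}/s)")
```

Because the tests run sequentially, these are per-client rates and
not a measure of what the server can sustain under parallel load.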

So the real question (apart from whether I've completely missed
something in the above results) is whether it's actually any
better under real load.

I've just done a pull request, but I'm sure there are things that
need looking at or fixing even if the idea is possibly sane. Let me
know.

Cheers,

Matthew


-- 
Matthew Newton, Ph.D. <mcn4 at le.ac.uk>

Systems Specialist, Infrastructure Services,
I.T. Services, University of Leicester, Leicester LE1 7RH, United Kingdom

For IT help contact helpdesk extn. 2253, <ithelp at le.ac.uk>

