mschap via ntlm_auth over a socket

Wed Dec 3 02:28:07 CET 2014

> On 2 Dec 2014, at 19:33, Matthew Newton <mcn4 at LEICESTER.AC.UK> wrote:
> 
> Hi,
> 
> We've been hit with the same issue that a lot of other sites seem
> to have seen, which is that one freeradius server just doesn't
> seem to be able to authenticate over a certain number of users per
> second, somewhere around 30 or so, to our AD domain controllers.
> Standard ntlm_auth -> winbind -> AD stuff.
> 
> We've done things like tweak "winbind max domain connections" and
> "winbind max clients", but can't seem to get winbind to connect to
> more than one DC, or seemingly parallelise anything in any way.
> Though reading archives it looks like we may have to use Samba 4
> for that (though I still don't understand the reason for the max
> connections option if it can't/won't do it; I must be missing
> something).
> 
> I've done some digging, and looking at the winbind debug logs, it
> seems to be taking around 3ms to do an auth, give or take.
> However, using sysdig to watch the ntlm_auth process, it takes
> around 30ms to run. That would figure with the maximum ~30 auths
> per second. Now I'm not entirely sure either of these times are
> completely accurate, but they were far enough off to get me
> looking.
> 
> It seems ntlm_auth has a "--helper-protocol" option to enable it
> to start and then process requests over stdin/stdout. This should
> at least cut out the process exec time. So I've hacked around and
> updated the mschap module in a couple of ways to allow use of
> this.
> 
> First is to add a new option to mschap, "method", which specifies
> the auth method to use. Currently this can be internal, or
> ntlm_auth, selected depending on whether ntlm_auth is defined or
> not. That will now still work for backwards compatibility, but if
> the new "method" option is set, that overrides it.
> 
> The second is to add a new method, "ntlmauth_socket", which uses
> a connection pool to talk to a UNIX socket to send/receive auth
> data to ntlm_auth.
> 
> To use it you then have to add ntlm_auth to inetd to create the
> socket. That handles spawning enough ntlm_auth processes off to
> meet demand as the connection pool stuff opens the sockets.
> 
> I've not been able to test it in production yet; a) because I only
> just wrote it, and b) because our servers are still FR2. So if
> anyone else is running FR3 with a heavy load and willing to try,
> it would be good. I fear with the amount of change control (and/or
> paranoia) we have now, it's going to take me a while to get FR3
> near the wireless controllers :(
> 
> I have tried it both standalone against Samba, and against the
> domain. I'm getting these results. Note that the test (eapol_test)
> is run sequentially, so I'm artificially limiting the throughput
> there. All tests PEAP/EAP-MSCHAPv2 and FR threaded (not in debug
> mode).
> 
> Standalone machine, Samba 3, not AD. 1,000 auths:
> 
>  internal mschap: 21.7s
>  using ntlm_auth exec: 29.7s
>  with socket: 22.9s
> 
> so the socket looks decidedly faster, close to the internal mschap
> auth speed.
> 
> Domain joined machine, Samba 3, 100 auths, eapol_test being run
> across the network from another machine:
> 
>  internal mschap: 6.3s
>  using ntlm_auth exec: 8.1s
>  with socket: 6.5s
> 
> Only 100 auths in a loop this time (not to hammer the DC too
> much), but the socket version still manages to get 3 more auths in
> per second.
> 
> Finally, domain joined machine, Samba 3, 100 auths, but direct
> mschap with radtest on the FR host rather than eapol_test remotely:
> 
>  internal mschap: 4.7s
>  using ntlm_auth exec: 6.3s
>  with socket: 4.9s
> 
> So the real question (apart from whether I've completely missed
> something in the above results) is whether it's actually any
> better under real load.
> 
> I've just done a pull request, but I'm sure there are things that
> need looking at or fixing even if the idea is possibly sane. Let me
> know.
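
For reference, the timings quoted above can be converted into auths per second. This is only a rough sketch of the arithmetic: the run labels and function names are made up for illustration, and the numbers are copied from the quoted figures.

```python
# (auth count, {method: wall-clock seconds}) copied from the quoted results.
# The dictionary keys are illustrative labels, not names used by FreeRADIUS.
RUNS = {
    "standalone, eapol_test": (1000, {"internal": 21.7, "exec": 29.7, "socket": 22.9}),
    "domain, eapol_test": (100, {"internal": 6.3, "exec": 8.1, "socket": 6.5}),
    "domain, radtest": (100, {"internal": 4.7, "exec": 6.3, "socket": 4.9}),
}


def auths_per_second(count, seconds):
    """Sequential throughput: completed auths divided by elapsed time."""
    return count / seconds


def socket_gain(count, times):
    """Extra auths per second of the socket method over exec'ing ntlm_auth."""
    return auths_per_second(count, times["socket"]) - auths_per_second(count, times["exec"])


for label, (count, times) in RUNS.items():
    rates = ", ".join(
        "%s %.1f/s" % (method, auths_per_second(count, secs))
        for method, secs in times.items()
    )
    print("%s: %s (socket vs exec: +%.1f/s)" % (label, rates, socket_gain(count, times)))
```

Because the tests were run sequentially, these rates are a lower bound on what a threaded server might manage, not a measurement of it.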

Very nice! A possible improvement (and I may be completely wrong here):
shouldn't it be possible to fork/exec ntlm_auth and create a pipe with
the ends mapped to stdin/stdout of the exec'd process?

That would mean slightly less configuration, and maybe a slight
performance improvement.
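
To make the pipe idea concrete, here is a minimal Python sketch (the module itself is C, so this only illustrates the pattern). It starts one long-lived child with its stdin/stdout connected to pipes and sends it one request per line. The child is a stand-in script, not ntlm_auth, and the request/reply strings are invented; the real helper protocol is not reproduced here.

```python
import subprocess
import sys

# Stand-in helper (an assumption for the demo): reads one line per request
# and answers with one line. It only mimics a stdin/stdout helper process.
CHILD = (
    "import sys\n"
    "for line in iter(sys.stdin.readline, ''):\n"
    "    sys.stdout.write('OK ' + line)\n"
    "    sys.stdout.flush()\n"
)
STAND_IN = [sys.executable, "-c", CHILD]


class PipeHelper:
    """One long-lived child whose stdin/stdout are pipes owned by the parent."""

    def __init__(self, argv):
        # Popen does the fork/exec and wires both pipe ends for us.
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,
        )

    def request(self, line):
        # Write one request line, then block until one reply line arrives.
        self.proc.stdin.write(line + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def close(self):
        # Closing stdin gives the child EOF so it can exit cleanly.
        self.proc.stdin.close()
        self.proc.wait()


helper = PipeHelper(STAND_IN)
replies = [helper.request("req-%d" % i) for i in range(3)]
helper.close()
print(replies)
```

The process is exec'd once and reused for every request, which is where the saving over a per-auth exec would come from.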

We could just look for the --helper-protocol option in the arguments to
ntlm_auth and enable the connection pool automagically...
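
A rough sketch of that check, in Python for brevity. The function name is hypothetical, the command lines in the usage example are illustrative only, and accepting both the "--opt=value" and "--opt value" spellings is an assumption about how the option might be written in a config.

```python
import shlex

HELPER_OPT = "--helper-protocol"


def uses_helper_protocol(command_line):
    """Return True if the configured command line asks for helper mode.

    Matches both '--helper-protocol=NAME' and '--helper-protocol NAME'.
    """
    args = shlex.split(command_line)
    return any(a == HELPER_OPT or a.startswith(HELPER_OPT + "=") for a in args)


# Illustrative command lines; the path and option values are examples only.
print(uses_helper_protocol("/usr/bin/ntlm_auth --helper-protocol=ntlm-server-1"))
print(uses_helper_protocol("/usr/bin/ntlm_auth --request-nt-key --username=bob"))
```

Splitting with shlex rather than searching the raw string avoids a false match when the option text appears inside a quoted argument value.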

In this case we could likely avoid exposing the knobs of the connection
pool, as there's no benefit to tweaking anything, and just fork/exec 
max_servers instances of ntlm_auth?
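
Roughly what I have in mind, again as a Python sketch and not the module code: start a fixed number of helpers up front and hand them out to worker threads, with no tunables beyond the count. The child is a stand-in script (not ntlm_auth), HelperPool is a hypothetical name, and there is no handling here for a helper that dies.

```python
import queue
import subprocess
import sys
import threading
from contextlib import contextmanager

# Stand-in helper (an assumption for the demo): one reply line per request line.
CHILD = (
    "import sys\n"
    "for line in iter(sys.stdin.readline, ''):\n"
    "    sys.stdout.write('OK ' + line)\n"
    "    sys.stdout.flush()\n"
)
STAND_IN = [sys.executable, "-c", CHILD]


class HelperPool:
    """Fixed-size pool of long-lived helper processes talking over pipes."""

    def __init__(self, argv, max_servers):
        self._idle = queue.Queue()
        self._procs = []
        for _ in range(max_servers):
            proc = subprocess.Popen(
                argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                text=True, bufsize=1,
            )
            self._procs.append(proc)
            self._idle.put(proc)

    @contextmanager
    def borrow(self):
        proc = self._idle.get()  # blocks until some helper is free
        try:
            yield proc
        finally:
            self._idle.put(proc)  # always hand the helper back

    def request(self, line):
        # Only one thread talks to a given helper at a time.
        with self.borrow() as proc:
            proc.stdin.write(line + "\n")
            proc.stdin.flush()
            return proc.stdout.readline().rstrip("\n")

    def close(self):
        for proc in self._procs:
            proc.stdin.close()
            proc.wait()


pool = HelperPool(STAND_IN, max_servers=4)
results = {}


def worker(i):
    results[i] = pool.request("req-%d" % i)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(16)]
for t in threads:
    t.start()
for t in threads:
    t.join()
pool.close()
print(len(results), "replies from", len(pool._procs), "helpers")
```

With one helper per server thread a request would never have to wait for a free process, which is why sizing the pool to max_servers seems like the natural default.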

Arran Cudbard-Bell <a.cudbardb at freeradius.org>
FreeRADIUS development team

FD31 3077 42EC 7FCD 32FE 5EE2 56CF 27F9 30A8 CAA2