mschap via ntlm_auth over a socket

Phil Mayers p.mayers at imperial.ac.uk
Wed Dec 3 14:00:09 CET 2014


On 03/12/14 00:33, Matthew Newton wrote:
> Hi,
>
> We've been hit with the same issue that a lot of other sites seem
> to have seen, which is that one freeradius server just doesn't
> seem to be able to authenticate over a certain number of users per
> second, somewhere around 30 or so, to our AD domain controllers.
> Standard ntlm_auth -> winbind -> AD stuff.

Ah, the AD auth spiral of doom. Very puzzling that one. Never really got 
to the bottom of what exactly the cause was - I think it was a mix of 
hardware, kernel, samba and AD problems at our site.

>
> We've done things like tweak "winbind max domain connections" and
> "winbind max clients", but can't seem to get winbind to connect to
> more than one DC, or seemingly parallelise anything in any way.

Which version of Samba are you using, and how are you determining 
there's no parallelism?

We're running Samba 3.6.9 on RHEL6 with "winbind max domain
connections = 12", and with "lsof -i :445" we see many winbind
processes, each with a separate TCP connection, after spikes of load.
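
A rough way to put numbers on that lsof check is sketched below: it counts the sockets listed per winbind PID. The function names are made up for illustration, and the sketch assumes the usual lsof column layout (COMMAND first, PID second) and that the command name starts with "winbind".

```python
import collections
import subprocess


def count_connections(lsof_output, command_prefix="winbind"):
    """Return {pid: number of sockets listed} for commands matching the prefix."""
    counts = collections.Counter()
    for line in lsof_output.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 2 or not fields[0].startswith(command_prefix):
            continue
        counts[int(fields[1])] += 1
    return dict(counts)


def live_counts():
    """Run lsof against port 445 and count winbind sockets per PID."""
    # -n and -P skip DNS and port-name lookups so the call returns quickly.
    result = subprocess.run(
        ["lsof", "-n", "-P", "-i", ":445"],
        capture_output=True, text=True,
    )
    return count_connections(result.stdout)
```

Calling live_counts() during and after a load spike shows whether more than one winbind child holds a connection to the DC.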

However, the parallelism is complex - in Samba 3.x it's only to one DC, 
not several, and you run into issues with failed auths being punted to 
the PDC emulator.

> Though reading archives it looks like we may have to use Samba 4
> for that (though I still don't understand the reason for the max
> connections option if it can't/won't do it; I must be missing
> something).

As above, works for us.

>
> I've done some digging, and looking at the winbind debug logs, it
> seems to be taking around 3ms to do an auth, give or take.
> However, using sysdig to watch the ntlm_auth process, it takes

I've got C source for a tiny wrapper that logs process start/stop times 
to an append file, which I found useful for instrumentation.
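
The C source is not reproduced here. As a rough illustration of the same idea only, a Python sketch might look like the following; the log path, the log line format and the function names are all assumptions. Note that a Python wrapper adds far more startup cost than a C one would, so the recorded times cover the wrapped command only.

```python
import os
import subprocess
import sys
import time

LOG_PATH = "/tmp/ntlm_auth_times.log"  # hypothetical location


def run_and_log(argv, log_path):
    """Run argv, append 'start stop elapsed exitcode command', return exit code."""
    start = time.time()
    returncode = subprocess.call(argv)  # stdin/stdout/stderr are inherited
    stop = time.time()
    line = "%.6f %.6f %.6f %d %s\n" % (start, stop, stop - start, returncode, argv[0])
    # O_APPEND makes every write land at the current end of the file, so
    # wrappers running concurrently do not overwrite each other's lines.
    fd = os.open(log_path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        os.write(fd, line.encode())
    finally:
        os.close(fd)
    return returncode


def main(argv=None):
    """Wrap the command given on the command line, e.g. the real ntlm_auth."""
    args = sys.argv[1:] if argv is None else argv
    return run_and_log(args, LOG_PATH)
```

A small script that ends with sys.exit(main()) can then be put in front of the real ntlm_auth command line.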

You might also find it useful to set up a rolling tcpdump ringbuffer
capture of traffic to the DCs, and use "tshark -T fields" to dump out
the msrpc header and packet time - although the payload is encrypted,
you can correlate the request ID in the request and response headers
to get on-the-wire auth times for AD.

We found the latter very useful.
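
The correlation step can be sketched as below. This is an illustration under assumptions, not the script used at our site: it expects tab-separated tshark output with the fields frame.time_epoch, tcp.stream, dcerpc.cn_call_id and dcerpc.pkt_type in that order (check the names against "tshark -G fields" for your version), and it relies on DCE/RPC packet type 0 being a request and 2 a response. It reports the round trip of every RPC call in the capture, not only auth calls.

```python
REQUEST = 0   # DCE/RPC connection-oriented PDU type: request
RESPONSE = 2  # DCE/RPC connection-oriented PDU type: response


def rpc_round_trips(lines):
    """Pair requests with responses by (tcp stream, call id); return seconds."""
    pending = {}     # (stream, call id) -> timestamp of the request
    round_trips = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 4:
            continue
        try:
            timestamp = float(parts[0])
            key = (int(parts[1]), int(parts[2]))
            pkt_type = int(parts[3])
        except ValueError:
            # Blank or multi-valued fields (several PDUs in one frame)
            # are simply skipped in this sketch.
            continue
        if pkt_type == REQUEST:
            pending[key] = timestamp
        elif pkt_type == RESPONSE and key in pending:
            round_trips.append(timestamp - pending.pop(key))
    return round_trips
```

Feeding it the lines from something like "tshark -r capture.pcap -Y dcerpc -T fields -e frame.time_epoch -e tcp.stream -e dcerpc.cn_call_id -e dcerpc.pkt_type" gives a list of per-call times that can be sorted or bucketed.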

> The second is to add a new method, "ntlmauth_socket", which uses
> a connection pool to talk to a UNIX socket to send/receive auth
> data to ntlm_auth.

Handy. FWIW I think this is a better solution long-term for big/busy 
sites - it avoids process startup overhead completely.

I have no time to test right now, unfortunately, and as at your site,
ITIL has kept 3.x away from our radius servers :o(

