Troubleshooting apparent failed RADIUS challenges

Alan DeKok aland at deployingradius.com
Wed Nov 8 17:11:21 CET 2017


On Nov 8, 2017, at 9:39 AM, Turner, Ryan H <rhturner at email.unc.edu> wrote:
> 
> Hoping this does not invoke a beating from Alan...

  Only if you work hard to ignore all of the documentation...

> We are a LARGE EAP-TLS shop (one of the first), and authenticate tens of thousands of clients every day.  I am noticing (which doesn't mean it is necessarily our fault) that some percentage of our users who cannot connect appear to be breaking down in the Challenge phase.  Initially it looks like they aren't even trying to authenticate on wireless (we don't log challenges), but a capture will tell me that a challenge is made, and then there are no responses...

  Yup.  That happens.

> A few questions...  I honestly find the RADIUS Challenge section difficult to understand.  In our EAP-TLS environment, exactly what is happening during this challenge phase?  Searching on this has so far returned nebulous answers.  Exactly what is being checked or verified with an EAP-TLS client?

  It's all black magic. :(

  In short, EAP-TLS does this:

- end user system sends EAP over Ethernet to the AP
- the AP packages EAP in RADIUS, and sends it to the RADIUS server
- the RADIUS server responds in EAP over RADIUS
- the AP sends that response to the end user as EAP over Ethernet.

  Simple, right?
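
  As a rough illustration of the "AP packages EAP in RADIUS" step: RADIUS carries EAP in EAP-Message attributes (type 79, per RFC 3579), and since a single attribute holds at most 253 bytes of value, a larger EAP packet is split across several of them.  The sketch below shows only that splitting and reassembly.  The function names are made up for this example, and a real Access-Request also needs a RADIUS header, a Message-Authenticator, and so on.

```python
EAP_MESSAGE = 79      # RADIUS attribute type for EAP-Message (RFC 3579)
MAX_VALUE_LEN = 253   # one RADIUS attribute carries at most 253 bytes of value


def eap_to_radius_attributes(eap_packet):
    """Split one EAP packet into a list of encoded EAP-Message attributes.

    Each attribute is: type (1 byte), length (1 byte, includes the
    2-byte header), then up to 253 bytes of the EAP packet.
    """
    attrs = []
    for offset in range(0, len(eap_packet), MAX_VALUE_LEN):
        chunk = eap_packet[offset:offset + MAX_VALUE_LEN]
        attrs.append(bytes([EAP_MESSAGE, len(chunk) + 2]) + chunk)
    return attrs


def radius_attributes_to_eap(attrs):
    """Reassemble the EAP packet by concatenating EAP-Message values in order."""
    return b"".join(a[2:] for a in attrs if a[0] == EAP_MESSAGE)
```

  The receiving side just concatenates the attribute values in the order they appear, which is what the second function does.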

  The black magic is that EAP contains TLS. i.e. the end system and RADIUS server set up a *TLS* connection between themselves.  This means lots and lots of packets going back and forth.

  Depending on the size of the certificates, it could be 10-40 packets.
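
  To see where a number like that comes from, here is a back-of-the-envelope estimate.  It is a simplified model, not what any server actually computes: it assumes a full TLS 1.2-style handshake, no retransmissions, and that each side's handshake flight is cut into EAP-TLS fragments of a fixed size, with every fragment except the last acknowledged by an otherwise empty packet from the peer.  The 1024-byte default is an assumption for the example; check the fragment size in your own EAP configuration.

```python
import math


def estimate_radius_packets(server_flight_bytes, client_flight_bytes,
                            fragment_size=1024):
    """Rough count of RADIUS packets for one full EAP-TLS authentication.

    server_flight_bytes: size of ServerHello + certificate chain + the rest
    client_flight_bytes: size of client certificate + key exchange + Finished
    fragment_size:       assumed EAP-TLS fragment size in bytes
    """
    server_frags = max(1, math.ceil(server_flight_bytes / fragment_size))
    client_frags = max(1, math.ceil(client_flight_bytes / fragment_size))

    packets = 0
    packets += 2  # Access-Request (EAP Identity) + Access-Challenge (EAP-TLS Start)
    packets += 1  # Access-Request carrying the ClientHello
    packets += server_frags + (server_frags - 1)  # server fragments + client ACKs
    packets += client_frags + (client_frags - 1)  # client fragments + server ACKs
    packets += 2  # Access-Challenge (server Finished) + Access-Request (final ACK)
    packets += 1  # Access-Accept
    return packets
```

  With a 3000-byte server flight and a 2500-byte client flight this model gives 16 packets; with 8000 and 6000 bytes it gives 32.  Bigger certificate chains mean more round trips, and more chances for one of them to get lost.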

  Each packet contains RADIUS + EAP + TLS magic.  Lots, and lots, of TLS magic.

  FreeRADIUS doesn't implement TLS.  Instead, it relies on OpenSSL.  So when something goes wrong in TLS, you hope to high heaven that OpenSSL gives a useful error message.  Which it often doesn't.

  Because it doesn't *have* an error message.  The end user system has just magically decided that it doesn't like the server, and stops talking to it.  No message to the server saying why (much of the time).  And no message to the end user saying "Hey, I expected cert X, but instead got cert Y".

  Nope.  Just packets, and.... failure.

> Secondly, how do you even begin to troubleshoot why certain clients would not progress beyond the challenge?

  Cursing.  Lots, and lots of cursing.

  Most of these clients will not give you ANY indication as to why it failed.  Which is horrible.

> We have an Android user that is onboarded with a certificate that will last a year.  Periodically, it will just stop working.  I will notice an incomplete challenge.  If he re-onboards with a new certificate, everything works again for some period (obviously looks like a client issue).  We also notice that some users fail their challenge when traveling abroad with eduroam.

  Weird.  TBH, I'd blame the client, and tell them to upgrade to something that works.

  Android uses wpa_supplicant as its client.  That code works.  Very, very, very well.  If Android doesn't work, it's because Google or some other vendor mangled the software / config to break it.

  I use eapol_test (from wpa_supplicant) for all of my EAP testing.  It works, and it produces relatively good error messages.  But those messages *don't* come out to (e.g.) end users on android.
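
  For anyone who wants to reproduce a failing client by hand, the sketch below builds a wpa_supplicant-style configuration for EAP-TLS and the matching eapol_test command line.  It only constructs the text and the argument list; it does not run anything.  The function name, file name, identity, certificate paths, and secret are placeholders invented for this example, so substitute the values from your own deployment.

```python
def build_eapol_test_job(identity, ca_cert, client_cert, private_key,
                         key_password, server_ip, shared_secret, port=1812,
                         config_path="eapol_test.conf"):
    """Return (config_text, command) for an EAP-TLS test with eapol_test.

    config_text is a wpa_supplicant network block; command is an argument
    list suitable for subprocess.run() once config_text has been written
    to config_path.  All values are supplied by the caller.
    """
    config_text = "\n".join([
        "network={",
        '    ssid="example"',  # placeholder SSID, not used for the RADIUS test
        "    key_mgmt=WPA-EAP",
        "    eap=TLS",
        f'    identity="{identity}"',
        f'    ca_cert="{ca_cert}"',
        f'    client_cert="{client_cert}"',
        f'    private_key="{private_key}"',
        f'    private_key_passwd="{key_password}"',
        "}",
        "",
    ])
    command = [
        "eapol_test",
        "-c", config_path,      # configuration file
        "-a", server_ip,        # RADIUS server address
        "-p", str(port),        # RADIUS authentication port
        "-s", shared_secret,    # RADIUS shared secret
    ]
    return config_text, command
```

  Running the resulting command against a test server with the same certificate the failing device uses is usually the quickest way to get an actual error message out of the TLS exchange.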

  Because Google et al. *hate* their end users, and don't want anyone to be able to debug anything.

  Maybe that's not *quite* true, but it sure looks true from here.

> We are running one of the most patched 2.X versions of FreeRADIUS (we are actively building for a deployment of 3 at this moment).

  Use 2.2.10.  There shouldn't be any need to patch v2 for anything.

  Alan DeKok.




