Status-Server requests are blocked if an Access-Request is waiting for downstream service to respond
Ignacio Arces
ignacio.arces at gmail.com
Thu Nov 12 19:26:16 CET 2020
> v3 has rlm_rest, which should be good enough for most purposes.
We ended up using rlm_c since our custom authentication requires a couple
of API calls and generates random correlation/request IDs.
> Yes. That's how it works. The Status-Server packets are processed by
> the same threads which process the Access-Requests. So if all of those
> threads are blocked, then Status-Server packets are also blocked.
This was our understanding as well, which is why we didn't expect a
single stuck request to block Status-Server requests.
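
To spell out that expectation, here is a minimal sketch in Python. It is a
toy model of a worker pool, not FreeRADIUS code; the function name
status_latency and its parameters are invented for illustration. It measures
how long a Status-Server job waits while some Access-Request jobs are stuck.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def status_latency(workers, blocked_requests, block_seconds=0.5):
    """Seconds a Status-Server job waits in a pool of `workers` threads
    while `blocked_requests` Access-Request jobs are stuck."""

    def access_request():
        # Stand-in for a blocking call to a slow auth API (hypothetical).
        time.sleep(block_seconds)

    def status_server():
        # Record the moment a worker thread actually picks up the job.
        return time.monotonic()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(blocked_requests):
            pool.submit(access_request)
        sent = time.monotonic()
        answered = pool.submit(status_server).result()
    return answered - sent
```

In this model, status_latency(4, 1) should return almost immediately because
three workers are still free, while status_latency(1, 1) or
status_latency(2, 2) should take roughly block_seconds because every worker
is busy. What we observe looks like the second case even though we send only
one Access-Request.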
> If *one* Access-Request packet is blocked, then other threads can still
> process Status-Server. So no, you don't see a "single stuck auth request
> impacting Status-Server".
We confirmed this scenario in our test environment. We forced the request
handler in our auth API to sleep for 60 seconds and then performed a simple
Access-Request with radtest. As expected, this single Access-Request was
blocked for 60s (we removed the curl timeouts and the container health
check for this test). During this time, all the Status-Server requests we
sent were blocked and returned only after the Access-Request completed.
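
For anyone who wants to repeat the Status-Server side of this test without
our tooling, a probe can be sketched in Python roughly as follows. This is
an illustration under assumptions, not what we ran: the function names, the
host, port, and secret arguments, and the 2-second timeout are all
hypothetical, and the sketch does not verify the Response Authenticator. The
packet layout follows RFC 5997 (code 12 plus a Message-Authenticator
attribute).

```python
import hashlib
import hmac
import os
import socket
import struct

STATUS_SERVER = 12          # RADIUS packet code for Status-Server (RFC 5997)
MESSAGE_AUTHENTICATOR = 80  # attribute type (RFC 2869)


def build_status_server(secret: bytes, identifier: int = 0) -> bytes:
    """Build a Status-Server packet carrying only Message-Authenticator."""
    authenticator = os.urandom(16)  # random Request Authenticator
    length = 20 + 18                # 20-byte header + 18-byte attribute
    header = struct.pack("!BBH", STATUS_SERVER, identifier, length)
    header += authenticator
    attr_head = struct.pack("!BB", MESSAGE_AUTHENTICATOR, 18)
    # HMAC-MD5 over the whole packet with the attribute value zeroed.
    digest = hmac.new(secret, header + attr_head + bytes(16),
                      hashlib.md5).digest()
    return header + attr_head + digest


def probe(host: str, port: int, secret: bytes, timeout: float = 2.0) -> bool:
    """Return True if any plausible reply arrives within `timeout` seconds."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_status_server(secret), (host, port))
            reply, _ = sock.recvfrom(4096)
        except OSError:  # includes socket timeouts
            return False
    # Access-Accept (2) on an auth port, Accounting-Response (5) on acct.
    return len(reply) >= 20 and reply[0] in (2, 5)
```

A call such as probe("127.0.0.1", 1812, b"testing123") (placeholder values)
returning False while an Access-Request is hanging would match what we see.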
> It makes zero sense to have a back-end database (or REST API) take 10
> seconds to respond to a request. The solution here isn't to hack up the
> RADIUS server to do something magical. The solution is to make the
> back-end system *not* crap.
Agreed. Our current focus is to improve our auth API. Nonetheless, I don't
think we are trying to hack up RADIUS; we just want to understand why it's
not working the way it's supposed to work. Maybe we have misconfigured
something that's causing this behavior.
On Thu, Nov 12, 2020 at 7:54 AM Alan DeKok <aland at deployingradius.com>
wrote:
> On Nov 12, 2020, at 1:23 AM, Ignacio Arces <ignacio.arces at gmail.com>
> wrote:
> >
> > I'm running a containerized FreeRADIUS server v3.0.19 with a custom
> > authentication module written in C language that authenticates users
> > through a HTTP API.
>
> v3 has rlm_rest, which should be good enough for most purposes.
>
> > We recently experienced an outage in the auth API and since we didn't
> > have timeouts properly configured in the curl calls in our custom C
> > module, the requests were hanging indefinitely.
>
> Yes, that's the downside of a blocking design. :(
>
> > When this happened, we also noticed that our containerized server was
> > restarted by Docker as the container was set to "Unhealthy" state, so
> > the health checks were failing. Troubleshooting the health checks we
> > found that Status-Server requests were not responding while the auth
> > request was hanging waiting for the auth API to respond.
>
> Yes. That's how it works. The Status-Server packets are processed by
> the same threads which process the Access-Requests. So if all of those
> threads are blocked, then Status-Server packets are also blocked.
>
> > Now that we have a 10s timeout properly configured in our curl
> > requests, we have mitigated the undesired restarts but we still can't
> > understand why even a single stuck auth request is impacting
> > Status-Server requests.
>
> If *one* Access-Request packet is blocked, then other threads can still
> process Status-Server. So no, you don't see a "single stuck auth request
> impacting Status-Server".
>
> The goal of Status-Server is to see if the server is up and *working*.
> Maybe the server is running, but is unable to process any packets. In that
> case, yes, you *do* want it to stop processing Status-Server.
>
> This situation also falls into the standard design requirements for
> RADIUS: If the RADIUS server is critical, then _any_ system which is used
> by RADIUS is also critical. Make sure that those systems are (a) up, and
> (b) responsive.
>
> It makes zero sense to have a back-end database (or REST API) take 10
> seconds to respond to a request. The solution here isn't to hack up the
> RADIUS server to do something magical. The solution is to make the
> back-end system *not* crap.
>
> Alan DeKok.