Error: Unresponsive child for request in component authorize module cache

Wed Aug 20 11:27:02 UTC 2025

On Aug 20, 2025, at 6:57 AM, James Fan <polysorb at gmail.com> wrote:
> We recently did a stress test for the home server proxy.
> However, we found these two kinds of errors that block the server:
> 
> Wed Aug 20 06:58:04 2025 : Error: Unresponsive child for request 396685, in
> component authorize module cache
> Wed Aug 20 06:58:04 2025 : Error: Unresponsive child for request 396689, in
> component <core> module
> 
> I know it could be slow SQL queries causing these errors. But I wonder if
> the slow SQL queries cause the cache module to be unresponsive?

  No.  But the above messages might not be entirely correct, due to a race condition.  For example, the SQL server blocks for 10 seconds, comes back, and then the server starts processing the request through the cache module.  At that point the NAS sends a new packet, and the above message shows up.  So the message can name the cache module even though the time was lost in SQL.

> Because the errors of the cache module are more than the core module. And I
> can't see the SQL module being unresponsive. So, I'm unsure where to find
> the root cause. The SQL server is working fine.

  It might be the cache module, or it might be SQL.  But if you're getting "unresponsive child" messages, then something is seriously wrong.

> We are conducting a stress test with 150 reqs/sec,

  We've tested v3 at 50K+ packets per second (auth) to OpenLDAP.  We've tested it at 4K packets/s to a carefully tuned SQL database.  We've tested it at 5K packets/s to Redis.

> but I'm unsure if a
> deadlock issue arises due to setting the max_requests to 50000

  That's just a limit on the number of requests the server is processing.  Setting a limit doesn't cause the child thread to become unresponsive.

> Please advise how to find the root cause, and I can provide more details
> if needed. Thank you.

  The root cause is something in your local configuration.  Blocking at 150 packets/s means that something is *catastrophically* wrong with your local configuration.

  The root cause is almost always one of a few issues: a slow SQL database, running many shell scripts per packet, using complex Perl / Python scripts, etc.

  Create a new system with a default configuration.  Set it to always return "ok" for both authentication and accounting.  Verify that it handles thousands of packets/s without an issue.

  Then, gradually add in pieces of your existing configuration, one at a time.  Run the tests again after each change.  At some point, performance will drop off a cliff.  The last change you made before that drop is the source of the problem.
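
  As a rough illustration of that process, here is a minimal Python sketch that takes the throughput recorded after each configuration change and reports the first change where it collapsed.  The function name `find_cliff`, the `drop_ratio` threshold, and the sample numbers are all made up for the example; they are not FreeRADIUS tooling or real measurements.

```python
from typing import Optional, Sequence, Tuple


def find_cliff(
    results: Sequence[Tuple[str, float]],
    drop_ratio: float = 0.5,
) -> Optional[str]:
    """Return the label of the first step whose throughput fell below
    drop_ratio times the previous step's throughput, or None if no
    step dropped that sharply.

    results is an ordered list of (change description, packets/s),
    starting with the known-good baseline.
    """
    for (_, prev_rate), (label, rate) in zip(results, results[1:]):
        if rate < prev_rate * drop_ratio:
            return label
    return None


# Hypothetical numbers, one entry per test run (not real measurements).
measurements = [
    ("baseline: always return ok", 5000.0),
    ("+ piece A of local config", 4800.0),
    ("+ piece B of local config", 120.0),
    ("+ piece C of local config", 110.0),
]

print(find_cliff(measurements))  # prints "+ piece B of local config"
```

  The 0.5 threshold is arbitrary.  In practice the drop should be large enough that it is obvious without any script; the point is only to test after every single change, so that the drop can be pinned on one change.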

  Since you've given only a vague description of your configuration, I can only give a vague suggestion for how to fix it.

  Alan DeKok.