rlm_python failing

Andrew Parisio parisioa at gmail.com
Mon Sep 21 23:12:59 CEST 2015


We use rlm_python and AWS DynamoDB as a backend, and over the weekend there
were some problems with DynamoDB:
http://venturebeat.com/2015/09/20/amazons-aws-outage-takes-down-netflix-reddit-medium-and-more/.
During this time we saw lots of dynamo errors, but we also started to see a
lot of freeradius problems:
Sun Sep 20 09:22:18 2015 : WARNING: (27733) WARNING: Module rlm_python
became unblocked for request 27733
Sun Sep 20 09:22:37 2015 : Error: Unresponsive child for request 27751, in
component accounting module python

During this timeframe I see a few things in my statistics:
* Our python code taking as long as 4,500 seconds to respond to requests
(freeradius is configured to 30 seconds max per request)
[image: Inline image 1]
* Accounting requests went from 100/5minutes to 17,000/5minutes (as devices
were resending accounting packets because they didn't get a response)
[image: Inline image 4]
These errors kept accumulating and eventually our server wasn't sending any
incoming requests to python (and it didn't recover when dynamo recovered),
which I know because the python code should have created external logs that
don't exist during this time period.  This morning I turned on debugging on
a node and this is what I see:

Mon Sep 21 18:14:46 2015 : Debug: (870048)   accounting {
Mon Sep 21 18:14:46 2015 : Debug: (870048)     modsingle[accounting]:
calling python (rlm_python) for request 870048
Mon Sep 21 18:14:46 2015 : Debug: (870048) python: Using thread state
0x7f1058000cc0
Mon Sep 21 18:14:46 2015 : Debug: (870048)     modsingle[accounting]:
returned from python (rlm_python) for request 870048
Mon Sep 21 18:14:46 2015 : Debug: (870048)     [python] = fail
Mon Sep 21 18:14:46 2015 : Debug: (870048)   } # accounting = fail
Mon Sep 21 18:14:46 2015 : Debug: (870048) Not sending reply to client.

The first line in our python accounting module is to radlog() so I know if
the request hit our code, and that wasn't even happening so it seems as
though FreeRADIUS+rlm_python had failed in some way.  Restarting freeradius
resolved the problem.

I'm fairly certain the long request time problem is because the default
timeout for the dynamo SDK is none (wait forever).  I'm working on setting
a boto timeout to prevent the long response time from happening again, but
at the moment I'm concerned about how freeradius was failing to even call
python after a while.  Aside from the boto timeout issue how would we go
about debugging the freeradius+python problem to make sure it doesn't
happen in the future?

Thanks!
Andrew
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 91788 bytes
Desc: not available
URL: <http://lists.freeradius.org/pipermail/freeradius-users/attachments/20150921/6e53a6b9/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 180423 bytes
Desc: not available
URL: <http://lists.freeradius.org/pipermail/freeradius-users/attachments/20150921/6e53a6b9/attachment-0003.png>


More information about the Freeradius-Users mailing list