rlm_python failing

Andrew Parisio parisioa at gmail.com
Tue Sep 22 23:11:09 CEST 2015


Thank you, this response is very helpful.  The goal was to see whether
anybody knew of other obvious problems with the freeradius+python combo that
the lack of a (python) timeout might have triggered, or of a way to test
similar situations, so that we could prevent them with some work today rather
than finding them in production down the road.  With the timeout fixed we'll
do some stress testing with long sleeps in python to make sure radius
recovers when python finally responds.
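
For anyone following along, here is a rough sketch of the kind of test I
have in mind.  It is not our real module: slow_backend_lookup is a
hypothetical stand-in for the backend call, simulated with time.sleep, and
the pool size and deadlines are made-up values.  It assumes Python 3 and
only shows the caller getting control back after a deadline.

```python
import concurrent.futures
import time

# Shared pool: worker threads are reused across calls.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def slow_backend_lookup(delay_seconds):
    # Hypothetical stand-in for the real backend call; it just sleeps.
    time.sleep(delay_seconds)
    return "ok"


def lookup_with_deadline(delay_seconds, deadline_seconds):
    """Return the backend result, or None if it misses the deadline."""
    future = _pool.submit(slow_backend_lookup, delay_seconds)
    try:
        return future.result(timeout=deadline_seconds)
    except concurrent.futures.TimeoutError:
        # The caller regains control here, but the worker thread keeps
        # running until the backend call itself returns.
        return None


fast = lookup_with_deadline(0.01, deadline_seconds=1.0)
slow = lookup_with_deadline(0.5, deadline_seconds=0.05)
print(fast, slow)
```

This only bounds how long the caller waits.  If the backend never returns,
the pool's threads still fill up, so the timeout on the client connection
itself remains the real fix.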

Thanks!

On Tue, Sep 22, 2015 at 11:26 AM, Arran Cudbard-Bell <
a.cudbardb at freeradius.org> wrote:

>
> > On 22 Sep 2015, at 01:46, Alan DeKok <aland at deployingradius.com> wrote:
> >
> > On Sep 21, 2015, at 7:06 PM, Andrew Parisio <parisioa at gmail.com> wrote:
> >> Like I said in my original email we are aware of the problems caused by
> >> the lack of a timeout and are actively working to fix that, so you don't
> >> need to be condescending and rude about the lack of a timeout.
> >
> > His message was neither condescending nor rude.  It simply points out
> > that you're blaming FreeRADIUS, when the real blame lies elsewhere.  I
> > suggest learning how to take feedback without getting offended.
>
> Right, I was pointing out the issue with the mental model OP constructed
> to explain what was occurring.
>
> I then explained the technical limitation that prevents FreeRADIUS from
> intervening: control is never returned to rlm_python during blocking calls.
>
> For that you'd need a python/C coroutine implementation, which probably
> doesn't exist, or a python interpreter on the end of a socket, which would
> allow you to implement an asynchronous interface.  Even there, there'd
> still be issues with blocking, but at least it'd allow FreeRADIUS to
> control the timeout.
>
> >> My question was that even after dynamo recovered FreeRADIUS continued to
> >> fail to call python, and I wanted to make sure that even after we fix the
> >> timeout issue that there isn't another problem that could be triggered.
> >> Are you suggesting the only possible explanation for what we saw was that
> >> all of the worker threads were still waiting for a response from dynamo 2
> >> days later, and that there is no point in debugging why FreeRADIUS was
> >> unable to execute rlm_python?
> >
> > That is what Arran said.
> >
> > FreeRADIUS calls Python, which calls your software.  If your software
> > doesn't return execution to FreeRADIUS for 2 days, it's the fault of *your
> > software*.  Go fix it.
>
> They're blocked with an infinite timeout.  What could cause them to
> unblock? Probably nothing.  Fix the blindingly obvious issue first, before
> going hunting for other issues you're not sure exist.
>
> >> I suppose that's possible but it seems unlikely that that many sockets
> >> were hung open for that long.  Is there any way to see what the worker
> >> threads were doing?
> >
> > Do python / C debuggers exist?  If so, use them.  Use "strace".  These
> > are all tools which are available to you as an administrator.  There is
> > nothing FreeRADIUS-specific about debugging a program.
>
> I think in this case it'd be more python debugger.  Given the weird way
> Python deals with I/O strace probably won't show a blocking system call,
> but it may allow examination of the sequence of calls.
>
> -Arran
>
> Arran Cudbard-Bell <a.cudbardb at freeradius.org>
> FreeRADIUS development team
>
> FD31 3077 42EC 7FCD 32FE 5EE2 56CF 27F9 30A8 CAA2
>
>
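
On the question quoted above about seeing what the worker threads were
doing: one option on the Python side might be the standard faulthandler
module, which can dump the Python stack of every thread.  The sketch below
assumes a Python 3 interpreter, which may not match the one rlm_python
embeds.  blocked_worker is a hypothetical stand-in for a stuck call.

```python
import faulthandler
import tempfile
import threading
import time


def blocked_worker(stop_event):
    # Hypothetical stand-in for a worker stuck in a call that never returns.
    stop_event.wait()


def dump_all_thread_stacks():
    """Return the current Python stack of every thread as text."""
    # faulthandler writes to a real file descriptor, so use a temp file.
    with tempfile.TemporaryFile(mode="w+") as out:
        faulthandler.dump_traceback(file=out, all_threads=True)
        out.seek(0)
        return out.read()


stop = threading.Event()
worker = threading.Thread(target=blocked_worker, args=(stop,), daemon=True)
worker.start()
time.sleep(0.2)  # give the worker time to reach the blocking wait

report = dump_all_thread_stacks()
print(report)

stop.set()
worker.join()
```

The dump names the function each thread is sitting in, which should be
enough to tell whether the workers are all parked in the same backend call.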

