Seg fault
Gabriel Blanchard
gabe at teksavvy.ca
Sun Aug 16 14:18:46 CEST 2009
On 16-Aug-09, at 3:12 AM, Alan DeKok wrote:
> Gabriel Blanchard wrote:
> >
> > Something was committed recently that's causing it to seg fault.
>
> When? What else is happening with the old request?
>
I downloaded the stable snapshot on Aug 5th from http://git.freeradius.org/pre/
Sorry, that's pretty much all I've got. I'll have to diff the snapshot
with the latest commit to see what's going on.
>
> > Aug 15 21:41:42 rad03 radiusd[62455]: Received conflicting packet from
> > client ERXes port 50000 - ID: 98 due to unfinished request 390. Giving
> > up on old request.
> > Aug 15 21:41:42 rad03 radiusd[62455]: ASSERT FAILED event.c[2730]:
> > request->ev != NULL
> > Aug 15 21:41:42 rad03 kernel: pid 62455 (radiusd), uid 133: exited on
> > signal 6
> >
> > I'll have to dig up a bit deeper to find the cause, but this definitely
> > doesn't happen with a stable snapshot from a few days ago.
>
> Maybe the network situation has changed, too.
>
I don't think so. I ran the daemon a few times in a row and the server
does exactly the same thing every time. Not to mention that simply
recompiling the earlier snapshot fixed the problem without any
configuration change.
> > Aug 15 21:41:42 rad03 radiusd[62455]: Received conflicting packet from
> > client ERXes port 50000 - ID: 98 due to unfinished request 390. Giving
> > up on old request.
>
I actually do get this error with the snapshot from a few days ago,
as I mentioned, but it does NOT abort.
I'm honestly not sure why I'm getting this error in the first place.
Is it because the NAS is retransmitting the request before the radius
server is done with the first one?
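
For illustration only, here is a minimal sketch of how a server might
tell a plain retransmission apart from a "conflicting" packet. RFC 2865
describes duplicate detection in terms of the client source IP address,
source UDP port and Identifier. Comparing the 16-byte authenticator to
separate the two cases is an assumption made for this sketch, and all
type and function names below are hypothetical, not FreeRADIUS code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical types for illustration; not FreeRADIUS structures. */
enum pkt_verdict { PKT_NEW, PKT_DUPLICATE, PKT_CONFLICT };

struct pkt_key {
    uint32_t src_ip;             /* client source address */
    uint16_t src_port;           /* client source UDP port */
    uint8_t  id;                 /* RADIUS Identifier field */
    uint8_t  authenticator[16];  /* request authenticator */
};

struct pending {
    struct pkt_key key;
    bool in_progress;            /* true while the request is unfinished */
};

/* Two packets occupy the same tracking slot if address, port and ID match. */
static bool same_slot(const struct pkt_key *a, const struct pkt_key *b)
{
    return a->src_ip == b->src_ip &&
           a->src_port == b->src_port &&
           a->id == b->id;
}

/*
 * Classify an incoming packet against an unfinished request.
 * Assumption: identical authenticator means retransmission,
 * a different one means the client reused the ID for a new request.
 */
enum pkt_verdict classify_packet(const struct pending *old,
                                 const struct pkt_key *incoming)
{
    if (old == NULL || !old->in_progress || !same_slot(&old->key, incoming))
        return PKT_NEW;

    if (memcmp(old->key.authenticator, incoming->authenticator,
               sizeof(incoming->authenticator)) == 0)
        return PKT_DUPLICATE;

    return PKT_CONFLICT;
}
```

Under these assumptions, a NAS that resends the same packet would be
classified as a duplicate, while a NAS that reuses ID 98 for a new
request before request 390 has finished would be the conflicting case.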
>
> That assert could be changed to a debug warning. The code removes the
> "old" request from the internal tracking table, but only under certain
> conditions. If those conditions aren't met, it hits the assert.
>
> e.g. The packet was proxied, etc.
>
> Or, maybe there's a race condition.
>
I think that may be the case: the issue doesn't happen in debugging
mode (single thread), which makes it very hard to debug. Not to
mention that the issue only happens under the heavy load of running
live on our network. It doesn't happen in my lab.
We only have two NASes, but each of them has over 20,000 users. So
the radius servers are definitely busy.
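
The suggestion quoted above, turning the assert into a debug warning,
could look roughly like the sketch below. It is not the code in
event.c: the struct layouts, the function name and the log text are
hypothetical stand-ins, and only the checked condition
(request->ev != NULL) comes from the log output.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; not the FreeRADIUS request or event types. */
struct event {
    int id;
};

struct request {
    unsigned int number;  /* request number, e.g. 390 in the log */
    struct event *ev;     /* pending event, NULL if already removed */
};

/*
 * Give up on an old request.
 * Returns 1 if a pending event was cancelled, 0 if there was none.
 */
int give_up_on_request(struct request *request)
{
    if (request->ev == NULL) {
        /*
         * An assert(request->ev != NULL) here would abort the process
         * (signal 6). Log a warning and carry on instead.
         */
        fprintf(stderr,
                "WARNING: request %u has no pending event, skipping\n",
                request->number);
        return 0;
    }

    /* Stands in for removing the event from the event list. */
    request->ev = NULL;
    return 1;
}
```

This only avoids the abort. If the underlying cause is a race between
threads, access to the event pointer would still need to be serialized,
which this sketch does not attempt.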
More information about the Freeradius-Devel
mailing list