Freeradius crashes with SIGABRT
aland at deployingradius.com
Wed Dec 4 15:14:19 CET 2019
On Dec 4, 2019, at 8:24 AM, Daniel Feuchtinger <daniel.feuchtinger at lrz.de> wrote:
> Am 04.12.19 um 13:05 schrieb Daniel Feuchtinger:
>> The first run crashed after a few minutes (see AddressSanitizer output below),
>> a second run with higher max_requests still runs without a crash
>> for a few hours now. I guess there shouldn't be a crash with
>> double-free, even if max_requests is too low?
> The issue is not related to max_requests, but maybe to
> threading? With -s the server ran stable for some
> hours, without -s it crashes after minutes.
The issue seems to be that the server is slow and is receiving conflicting packets: the NAS sends packet A, gives up some time later, and then sends packet B.
But the server is still processing packet A, likely because it's blocked on a database query.
So the server tries to cancel request A, but for some reason that cancellation is delayed. During that delay, the NAS retransmits packet B, and the server tries to cancel request A a second time. And then things go boom.
It's not clear to me *why* this is happening. The issue is buried deep inside of some fairly complex state machine work in the server.
I've pushed a patch which should help. It updates the main "check for duplicate / conflicting packet" code. When the *first* conflicting packet B comes in, packet A is now removed entirely from the conflicting packet list. That means when the next packet B comes in, any lookup will find B, not A, and there's no double free.
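As a rough illustration of the idea (a toy model, not the actual server code), the Python sketch below keeps in-flight requests in a dictionary keyed by a hypothetical (source address, source port, packet ID) tuple. The names `Request`, `PacketList`, and `remove_on_conflict` are made up for this example, and raising an error on a second cancel stands in for the double free. It assumes a conflicting packet is one with the same key but a different authenticator.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical stand-in for one in-flight request."""
    authenticator: bytes
    cancel_count: int = 0

    def cancel(self) -> None:
        # Cancelling the same request twice models the double free.
        self.cancel_count += 1
        if self.cancel_count > 1:
            raise RuntimeError("request cancelled twice")


class PacketList:
    """Toy lookup table for duplicate / conflicting packet checks."""

    def __init__(self, remove_on_conflict: bool) -> None:
        # True models the patched behaviour, False the old one.
        self.remove_on_conflict = remove_on_conflict
        self.by_key = {}

    def receive(self, key, authenticator: bytes) -> str:
        old = self.by_key.get(key)
        if old is None:
            self.by_key[key] = Request(authenticator)
            return "new"
        if old.authenticator == authenticator:
            # Same packet again: a plain retransmission.
            return "duplicate"
        # Same key, different contents: cancel the old request.
        old.cancel()
        if self.remove_on_conflict:
            # Patched: drop A right away so later lookups find B.
            self.by_key[key] = Request(authenticator)
        # Unpatched: A stays in the table while its cancellation is pending.
        return "conflict"
```

With `remove_on_conflict=True`, feeding in A, B, B for the same key gives "new", "conflict", "duplicate". With `remove_on_conflict=False`, the second B finds A again and the second cancel raises, which is the failure described above.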
One reason this has been so hard to track down is that it doesn't occur in most situations. It looks like it happens only when (1) the back-end DB is very slow, and (2) the NAS retransmits very aggressively.
Please test the latest code from GitHub:
If that works, it should be the definitive fix.