rlm_kafka accounting module for Freeradius 3.x

Terry Burton terry.burton at gmail.com
Sun May 19 17:59:30 UTC 2024


On Sun, 19 May 2024 at 17:54, Arran Cudbard-Bell via Freeradius-Devel <
freeradius-devel at lists.freeradius.org> wrote:

> > On May 19, 2024, at 10:49, The Binary <binary4bytes at gmail.com> wrote:
> > I had created a FreeRADIUS 3.x module a year ago, for pushing Accounting
> > Records to Kafka (uses librdkafka). I created a separate repo outside of
> > FreeRADIUS project, which I used to compile separately by cloning inside
> > the FreeRADIUS source 'src/modules' path.
> >
> > I would like to submit the module if considered useful by the community.
>
> I believe there's already work underway to pull it in :)
>
> This was the one you were basing the v3 module on right Terry?
>

Indeed, I was looking at this over the last week to ensure that there is a
path for inclusion of the module into FreeRADIUS. We are grateful for the
FRv3 submission. We also need to ensure that we have a plan for FRv4...

There are some things we need to get into shape for FRv3.2.x, mainly:


1. Durability of the events:

The Kafka Producer in librdkafka is an asynchronous implementation that
creates an in-application queue serviced by a dedicated IO thread. There
exists the possibility for queued events to be lost (delivery failures,
application crashes) after the module has returned OK (and FreeRADIUS acked
the accounting event). Currently the module does not poll for delivery
status reports (DSRs) for messages, resulting in silent data loss. We
should at least log which messages were lost, even if FreeRADIUS is no
longer handling the request.

In v3 there is little that we can do to link FreeRADIUS's request handling
to the Kafka queue. When a Kafka DSR is received the original request will
be long gone. We may be able to integrate much more tightly with the
librdkafka-managed queue in FRv4, which is async by design (to be
investigated). This is slightly complicated because Kafka uses a polling
interface to fetch notifications rather than an event-driven interface.

So for FRv3, in order to match the usual durability properties for
accounting messages, we likely want a default "synchronous mode" that does
not tap into the full power of librdkafka (plugging of requests to allow
batching, larger compression windows, etc.) but in which the module reports
OK (=> Accounting-Response) only upon receiving a successful DSR (durably
received by a set of in-sync brokers), and FAIL otherwise.

The user can then opt to enable async delivery, allowing >10x throughput,
provided that they are prepared to relax the durability properties for
accounting events. (In practice the user may take a hybrid approach, e.g.
Accounting Start/Stops via a slow, durable synchronous route and Accounting
Interim-Updates via the fast asynchronous route, selected by control
attributes or separate module instances.)
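
To make the trade-off concrete, here is a toy model in Python. It is not
librdkafka and not the module's code: ToyProducer, account_sync,
account_async and account_hybrid are hypothetical names, and delivery
failures are simulated by listing payloads that should fail. It only
illustrates when OK or FAIL would be reported relative to the delivery
report in each mode.

```python
from collections import deque


class ToyProducer:
    """Stand-in for an asynchronous producer: produce() only enqueues."""

    def __init__(self, failing_payloads=()):
        self.queue = deque()
        self.failing = set(failing_payloads)

    def produce(self, payload, on_delivery):
        # Returns immediately; nothing has reached a broker yet.
        self.queue.append((payload, on_delivery))

    def poll(self):
        # Serve one pending delivery report; return how many were served.
        if not self.queue:
            return 0
        payload, on_delivery = self.queue.popleft()
        error = "delivery failed" if payload in self.failing else None
        on_delivery(error, payload)
        return 1


def account_sync(producer, payload):
    """Report OK only after a successful delivery report, otherwise FAIL."""
    result = {}

    def on_delivery(error, _payload):
        result["error"] = error

    producer.produce(payload, on_delivery)
    while "error" not in result:
        if producer.poll() == 0:
            return "FAIL"  # no report available: treat as a failure
    return "OK" if result["error"] is None else "FAIL"


def account_async(producer, payload, lost):
    """Report OK at enqueue time; a loss can only be logged afterwards."""

    def on_delivery(error, delivered_payload):
        if error is not None:
            lost.append(delivered_payload)

    producer.produce(payload, on_delivery)
    return "OK"


def account_hybrid(producer, status_type, payload, lost):
    """Start/Stop take the durable route, everything else the fast one."""
    if status_type in ("Start", "Stop"):
        return account_sync(producer, payload)
    return account_async(producer, payload, lost)


producer = ToyProducer(failing_payloads={"stop-2", "interim-2"})
lost = []
results = [
    account_hybrid(producer, "Start", "start-1", lost),
    account_hybrid(producer, "Interim-Update", "interim-2", lost),
    account_hybrid(producer, "Stop", "stop-2", lost),
]
while producer.poll():  # drain any reports still pending
    pass
print(results, lost)
```

In this model the failed Stop is reported as FAIL to the caller, whereas the
failed Interim-Update was already acknowledged with OK and only shows up in
the "lost" log afterwards.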


2. Example module option sets (for sync vs async; small vs large queues,
i.e. risk appetite vs throughput), with careful descriptions of the
durability model vs performance.
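
As a rough idea of what two such option sets might contain, the sketch
below lists librdkafka producer properties as Python dictionaries. The
property names come from librdkafka's configuration documentation; the
values are illustrative guesses rather than tested recommendations, the
profile names are hypothetical, and the module's own option names are not
shown because they may differ.

```python
# Hypothetical profiles. Keys are librdkafka configuration properties;
# values are illustrative only and have not been benchmarked.

# Durable profile: favour confirmed delivery over throughput.
SYNC_DURABLE = {
    "acks": "all",                 # wait for the full set of in-sync replicas
    "enable.idempotence": "true",  # avoid duplicates when retrying
    "linger.ms": "0",              # send immediately, no batching delay
    "message.timeout.ms": "5000",  # bound the wait for a delivery report
}

# Throughput profile: accept weaker durability for larger batches.
ASYNC_THROUGHPUT = {
    "acks": "1",                               # leader acknowledgement only
    "linger.ms": "50",                         # allow batches to fill
    "batch.num.messages": "10000",             # larger batches
    "compression.type": "lz4",                 # compress whole batches
    "queue.buffering.max.messages": "100000",  # larger in-application queue
}


def differing_properties(a, b):
    """Return the sorted property names whose values differ between profiles."""
    return sorted(k for k in set(a) | set(b) if a.get(k) != b.get(k))


print(differing_properties(SYNC_DURABLE, ASYNC_THROUGHPUT))
```

The larger the in-application queue and linger time, the more events are at
risk if the process crashes before they are delivered.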


3. Integration with the FR build system: Autoconf configuration (library
detection), etc.


4. Further testing and analysis to ensure that the implementation of the
background IO thread does not cause issues for FreeRADIUS as a whole.


5. CI tests: End-to-end testing within GitHub Actions.


I have made progress with points 1 to 3. We (NetworkRADIUS) are currently
performing a review prior to deciding to complete the work required to
accept the module into the Open Source project.

As soon as we have something further to share I will reach out.

