Observing a common error in Freeradius with Postgres but not justification why it happens

Sat Sep 2 09:10:10 UTC 2023

Hello all,

We have implemented freeradius 3.2.3 with Postgresql 12.15 to store
accounting packets i.e start, interim-update, stop for reporting purposes.
We customized the queries files to store fewer columns than the original
schema since this is what we need.

The following columns are customized in the accounting queries and same
columns in postgres radacct table:

radacctid, acctsessionid, acctuniqueid, acctstarttime, acctudpatetime,
acctstoptime, callingstationid, framedipaddress, imei, 3gpplocation.

Also the following indexes are in radacct table:

1. radacct_active_session_idx (this is the original from FR schema of
postgresql)
2. framedipaddress, starttime, stoptime, updatetime (This one we added to
use upon need in our read queries on report generation)

Currently, from postgres dashboard we observe around 300-400 write/update
transactions.

The issue we noticed a couple of days ago is there were a lot of messages
in radius log stating "Error: xxx requests have been waiting in the
processing queue for x seconds.  Check that all databases are running
properly!"
And sometimes we see another warning stating "Warning: Please check the
configuration file. The value for 'max_requests' is probably set too low. "

Based on the above observed messages, we tuned two matters:
1. the postgresql db by changing the default shared_buffer from 128MB to
8GB since we have plenty of RAM in the server, and this clearly reflected
in the performance and removed around 98% of the above error messages.
However we sometimes see the same error message showing up in peak hours
and KPI of radius is impacted per the NAS report. After tuning the
Postgres, we used pgbench and the following is what we get:

[postgres at postgresql-syslog01 ~]$ pgbench -c 10 -j 2 -t 1000
my_benchmark_test_db
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 2
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
latency average = 10.931 ms
tps = 914.869517 (including connections establishing)
tps = 915.175572 (excluding connections establishing)

2. In radiusd.conf we have the following, and we tuned max request to
32000,
start_servers = 5
max_servers = 32
min_spare_servers = 3
max_spare_servers = 10
max_requests_per_server = 0
cleanup_delay = 5
max_request_time = 30
max_requests = 32000

Radius server specs:
We have 4 sockets 1.87Ghz and RAM 8GB
In Linux top command, load average is almost always : 0.17, 0.14, 0.13

Postgres server specs:
8 sockets 2.5Ghz and RAM 16GB
In Linux top command, load average is around most of the time : 0.59, 0.50,
0.45

The questions to summarize are :
1. Is ~900 TPS and 10ms acceptable DB performance for accounting?
2. In the max_requests setting, what does the clients mean here, does it
mean NAS client or SQL connections based on max requests? how to size it?
3. What could be the culprit given the fact all the servers are relaxed?

Really appreciate your guidance here

Best,