annoying stop retransmissions.

Alan DeKok aland at deployingradius.com
Mon Nov 28 13:53:23 CET 2011


Alexandre Chapellon wrote:
> This works as expected for most of my NASes. Unfortunately, I have some
> NASes that are behind a satellite link, which is a very unreliable link
> with regular packet loss. UDP retransmission of packets makes the system
> work even with that kind of link, but I have one scenario that creates
> errors:

  This is common in RADIUS.  Accounting is... awkward, to put it politely.

> When a Stop ticket is transmitted once and correctly reaches the
> freeradius servers (nas -> front -> back), the session record is deleted
> from the "live acct" table, the packet is then proxied to the 2nd
> freeradius, and the session in the Acct table is marked as stopped
> (acctstoptime=something). If the front freeradius acks the Stop packet
> and that Ack is lost on the link, the NAS retransmits the Stop.

  As it should.  It's the responsibility of the RADIUS server to deal
with retransmissions from the NAS.

> The same thing occurs:
> - the front radius tries to delete the session using its acct_stop_query,
> which results in no rows being modified, so it tries to run its
> acct_stop_query_alt (which basically does the same thing: delete).

  It really should delete the record ONLY if it exists.  Or, UPDATE the
record to say "session stopped".  After a suitable delay (10-20 min),
the "stopped" sessions can safely be deleted.
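
  As a minimal sketch of that "mark as stopped, purge later" approach:
the example below uses Python with an in-memory SQLite database purely
for illustration.  The table name, column names, and the 15-minute
delay are assumptions, not the stock FreeRADIUS schema or queries.  The
point is that a retransmitted Stop changes zero rows and is treated as
normal, not as an error.

```python
import sqlite3

# Hypothetical schema; table and column names are illustrative only.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE live_acct (session_id TEXT PRIMARY KEY, stopped_at INTEGER)"
)

def handle_stop(session_id, now):
    """Mark a live session as stopped. Returns the number of rows changed."""
    cur = db.execute(
        "UPDATE live_acct SET stopped_at = ? "
        "WHERE session_id = ? AND stopped_at IS NULL",
        (now, session_id),
    )
    # 0 rows means a retransmitted Stop or an unknown session: not an error.
    return cur.rowcount

def purge_stopped(now, delay=15 * 60):
    """Delete sessions that were stopped at least `delay` seconds ago."""
    cur = db.execute(
        "DELETE FROM live_acct "
        "WHERE stopped_at IS NOT NULL AND stopped_at <= ?",
        (now - delay,),
    )
    return cur.rowcount
```

  Running purge_stopped() from a periodic job keeps the table small,
while the delay leaves stopped sessions around long enough to absorb
late retransmissions.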

> alt
> query also modifies no rows, but no error is logged. The retransmitted
> packet is then proxied to the back server, which in turn tries to run its
> acct_stop_query (tries to update a session with no acctstoptime). That
> query fails as the previous Stop ticket for that session already updated
> the record. It then tries to run the acct_stop_query_alt, which is
> designed to try to insert a new session record based on the content of
> the Stop ticket (this is done to deal with the case where the Start
> ticket is lost and only the Stop ticket is received, I guess). In my case
> this last query fails because of a uniqueness constraint in the Oracle
> database (to prevent one session from being recorded multiple times),
> and an error is logged in freeradius.

  The solution is to fix the queries so that they deal with non-existent
sessions.  This is no different from a NAS sending a STOP for sessions
that *never* existed.
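
  One way to picture such a query pair on the back server is sketched
below, again in Python with SQLite standing in for the real database.
The schema and the record_stop() helper are hypothetical.  SQLite's
"INSERT OR IGNORE" is used here so that a duplicate Stop hits the
uniqueness constraint silently; Oracle would need its own equivalent,
which is not shown.

```python
import sqlite3

# Hypothetical schema; the PRIMARY KEY stands in for the uniqueness
# constraint described in the original post.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE acct ("
    " session_id TEXT PRIMARY KEY,"
    " start_time INTEGER,"
    " stop_time INTEGER)"
)

def record_stop(session_id, stop_time):
    """Apply a Stop; return 'updated', 'inserted', or 'duplicate'."""
    # First attempt: close a session that is still open.
    cur = db.execute(
        "UPDATE acct SET stop_time = ? "
        "WHERE session_id = ? AND stop_time IS NULL",
        (stop_time, session_id),
    )
    if cur.rowcount:
        return "updated"
    # Fallback: the Start was lost, or this Stop is a retransmission.
    # OR IGNORE turns a constraint violation into a no-op.
    cur = db.execute(
        "INSERT OR IGNORE INTO acct (session_id, stop_time) VALUES (?, ?)",
        (session_id, stop_time),
    )
    return "inserted" if cur.rowcount else "duplicate"
```

  A "duplicate" result covers both the retransmitted Stop and a Stop
for a session that was already closed, so neither case needs to log an
error.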

> Does anybody have an idea of how to deal with that (minor) problem so I
> have no more regular error messages?
> I was maybe thinking of not proxying to the back server packets
> retransmitted due to ACK loss, but I can't really find out how to do that...
> 
> Thanks for reading that long post (I hope it's understandable enough).

  It is.

  There is no real solution other than building a smarter system to
handle accounting packets.

  I suggest writing a detailed state machine describing what happens for
each session, and how each kind of packet is handled.  Until that's
done, no good solution is possible.
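
  As a starting point only, such a state machine could be written down
as a transition table.  The sketch below is in Python; the states, the
action names, and every transition in it are assumptions made for
illustration, not existing FreeRADIUS behaviour.  The packet types are
the usual Acct-Status-Type values.

```python
from enum import Enum

class State(Enum):
    NONE = "none"        # no record of this session
    ACTIVE = "active"    # session seen, not yet stopped
    STOPPED = "stopped"  # Stop already processed

# (current state, packet type) -> (next state, action).
# Every entry is an illustrative policy choice.
TRANSITIONS = {
    (State.NONE, "Start"): (State.ACTIVE, "create"),
    (State.NONE, "Interim-Update"): (State.ACTIVE, "create"),
    (State.NONE, "Stop"): (State.STOPPED, "create-stopped"),
    (State.ACTIVE, "Start"): (State.ACTIVE, "ignore"),
    (State.ACTIVE, "Interim-Update"): (State.ACTIVE, "update"),
    (State.ACTIVE, "Stop"): (State.STOPPED, "close"),
    (State.STOPPED, "Start"): (State.STOPPED, "ignore"),
    (State.STOPPED, "Interim-Update"): (State.STOPPED, "ignore"),
    (State.STOPPED, "Stop"): (State.STOPPED, "ignore"),
}

def step(state, packet_type):
    """Return (next_state, action) for one accounting packet."""
    return TRANSITIONS[(state, packet_type)]
```

  With a table like this, the retransmitted Stop from the original
post is just the (STOPPED, "Stop") row: acknowledge it and do nothing
else.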

  We can take such a state machine and use it to update the handling of
accounting packets for 3.0.

  Alan DeKok.



