rlm_sql: default Acct-On/Off query for all backends is somewhat bogus

Mon May 5 13:16:23 CEST 2008

A.L.M.Buxey at lboro.ac.uk wrote:
...
> rlm_sql.c 
> 
> "stop packet with zero session length" as found

  Ok... but how do you normally deal with that?

  In some cases, the server just doesn't respond to accounting packets,
e.g. if there's a failure of the SQL database.  But for the "detail"
file reader, a failure of the SQL database means that it should keep
retrying.

  i.e. the "detail" file reader is a lot like a NAS.  It will
re-transmit accounting packets forever... until it's told to shut up.

> 1)main server sees accounting packet but simply dumps it to detail file

  Exactly.

> 2)the second virtual server sucks in the detail file when the server
> is not busy (working much better with 2.0.4 now delay is changed) 
> and then uses the chosen method to handle it....

  Exactly.

> 2a) our chosen method is using postgresql to put the detail into
> database, so DB module is used (rlm_sql). 

  Yup.

> however, i fear that the problem is that the detail code reads in
> the detail file in possibly a too basic way - and when rlm_sql
> detects a problem instead of just ditching the packet like
> the real live server would, the detail module keeps just reading
> the same one in...over and over and over. 10Gb error log file
> and a very busy thread for no purpose. 

  Because that's what a real NAS does.  If the server doesn't respond to
a NAS, then the NAS re-transmits until the sun runs out of fuel.

> so, is my understanding of the logic correct and from a quick
> parse/view of the detail/listener code would my theory that
> a bad packet just gets continually munged rather than being
> dropped away hold any water? :-)

  Yes.  That's the design intent... because the detail file reader
doesn't know what a "bad packet" is.  *You* do.  If you don't tell it
"yes, this packet was dealt with", then it will keep bugging you until
you tell it that the packet was dealt with.
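
  As a rough illustration of that behaviour (not the actual detail
reader code), here is a minimal Python sketch.  The names
replay_detail, handle and max_attempts are made up for this example.
The real reader has no retry cap; the cap is only there so the demo
terminates.

```python
def replay_detail(packets, handle, max_attempts=5):
    """Replay packets in order, retrying each one until handle() says
    it was dealt with.

    Hypothetical sketch: returns (done, stuck), where `stuck` is the
    first packet that was never acknowledged, or None.  The cap on
    attempts is an assumption for demonstration only.
    """
    done = []
    for packet in packets:
        for _attempt in range(max_attempts):
            if handle(packet):
                # Acknowledged: move on to the next packet.
                done.append(packet)
                break
        else:
            # Never acknowledged: everything behind this packet is
            # blocked, which is the "over and over" loop described.
            return done, packet
    return done, None


packets = [
    {"Acct-Status-Type": "Start"},
    {},                              # no Acct-Status-Type at all
    {"Acct-Status-Type": "Stop"},
]

# A handler that never acknowledges the bad packet gets stuck on it.
strict_done, strict_stuck = replay_detail(
    packets, lambda p: "Acct-Status-Type" in p)

# A handler that acknowledges everything lets the reader finish.
lenient_done, lenient_stuck = replay_detail(packets, lambda p: True)
```

  The point is that the reader only moves on when the handler says
"dealt with", so deciding what counts as a bad packet is up to the
policy, not the reader.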

  Maybe this has to be made more obvious in the configuration...

> as far as code and any side effects...... the main server would
> have just ditched the dodgy packet (or am I wrong?) - 
> 
> ret = RLM_MODULE_NOOP  (for session length)
> return RLM_MODULE_INVALID;  (for accounting status)
> 
> (ps why different return type structures?)

  If a session has zero time, it's still a valid accounting packet.  But
a packet without Acct-Status-Type is non-compliant... and invalid.
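
  Roughly, the distinction looks like the following Python sketch.
This is not the rlm_sql.c code; the function name classify and the
plain strings standing in for the RLM_MODULE_* codes are assumptions
made for illustration.

```python
def classify(packet):
    """Return a stand-in for the module return code (hypothetical helper).

    "invalid" : no Acct-Status-Type, so the packet is non-compliant
    "noop"    : a Stop with zero session length; valid, nothing to record
    "ok"      : anything else
    """
    if "Acct-Status-Type" not in packet:
        return "invalid"
    if (packet["Acct-Status-Type"] == "Stop"
            and packet.get("Acct-Session-Time", 0) == 0):
        return "noop"
    return "ok"


no_status = classify({"Acct-Session-Time": 30})
zero_stop = classify({"Acct-Status-Type": "Stop", "Acct-Session-Time": 0})
normal_stop = classify({"Acct-Status-Type": "Stop", "Acct-Session-Time": 30})
```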

> - so therefore if the 'out of band' method using a detail
> file was to do the same, then its the same behaviour.
> the only problem is that the out of band was designed
> for times when the database was down, slow or problematic..
> so therefore you'd have to distinguish between fail codes
> due to database issues - ie keep the packet and retry.... - 
> and fail codes due to bad packets in the detail file - which
> need to be ditched.

  Maybe a sample policy would help:

	if ((!Acct-Status-Type) || (!Acct-Session-Time) || (Acct-Session-Time == 0)) {
		ok
	}
	else {
		sql
		...
	}
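
  The effect of that policy on the detail reader can be sketched in
Python.  This is an illustration only: is_bad, handle and sql_insert
are hypothetical names, and using ConnectionError to represent a
database failure is an assumption.

```python
def is_bad(packet):
    """Same test as the policy: missing attributes or zero session time."""
    return ("Acct-Status-Type" not in packet
            or "Acct-Session-Time" not in packet
            or packet["Acct-Session-Time"] == 0)


def handle(packet, sql_insert):
    """Return True if the packet was dealt with, False to be retried."""
    if is_bad(packet):
        # "ok": acknowledge the packet without touching SQL, so the
        # detail reader drops it and moves on.
        return True
    try:
        sql_insert(packet)
    except ConnectionError:
        # Database problem: no acknowledgement, the reader retries later.
        return False
    return True


calls = []

def failing_sql(packet):
    # Stand-in for a database that is down.
    calls.append(packet)
    raise ConnectionError("database is down")


good = {"Acct-Status-Type": "Stop", "Acct-Session-Time": 120}
bad = {"Acct-Status-Type": "Stop", "Acct-Session-Time": 0}

bad_result = handle(bad, failing_sql)    # acknowledged, SQL never called
good_result = handle(good, failing_sql)  # not acknowledged, kept for retry
```

  So bad packets are ditched, and packets that failed only because of
the database stay in the detail file until the database comes back.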

  Alan DeKok.