v2.x.x redundant-load-balance broken
Brian De Wolf
bldewolf at cpp.edu
Wed Mar 25 03:14:01 CET 2015
On Tue, 24 Mar 2015 17:19:40 -0500
Alan DeKok <aland at deployingradius.com> wrote:
> > This causes the other tests to randomly fail, as it sometimes load
> > balances to the second member, which causes it to only try the three
> > fail modules.
>
> It should loop around to the beginning if there’s a failure. The
> code should do that...
>
The code to loop around is there and works, but because of the
off-by-one error the loop stops before it wraps. It only makes N-1
tries, so it tries the 2nd, 3rd and 4th members, then gives up. If it
made N tries, it would reach the 1st module again and succeed.
> > What puzzles me is that, when I add this config instead:
> >
> > redundant-load-balance {
> > fail
> > ok
> > fail
> > fail
> > }
> >
> > I stop getting random failures. When I add logging to record which
> > one we picked and to identify the module before we call
> > modcall_child, it says:
> >
> > ++load-balance redundant-load-balance {
> > pick is 3
> > ++redundant-load-balance group redundant-load-balance {
> > trying 0x13a7780
> > +++[fail] = fail
> > +++[fail] = fail
> > trying 0x13a77e0
> > +++[fail] = fail
> > trying 0x13a7550
> > +++[fail] = fail
> > +++[ok] = ok
> > ++} # redundant-load-balance group redundant-load-balance = ok
> >
> > It's not clear to me why it's listing fail multiple times for some
> > modcalls, or where that last ok comes from.
>
> Maybe your instrumentation code is wrong?
>
Even with the standard debugging output, it printed more than three
"+++[fail] = fail" lines before the "+++[ok] = ok" line, on a group
with only four modules. I added the extra debugging lines to try to
clear things up, and they only made me more confused. I was hoping it
was something obvious, like a quirk of using fail/ok in a
redundant-load-balance (because what kind of silly person would do that?).
> > Anyway, I checked v3.x.x for the off-by-one error and it looks like
> > the loop was re-done to avoid count entirely. Maybe more of the
> > v3.x.x code needs to be backported?
>
> Try the change. If it fixes the problem, send a patch, and I’ll
> put it in 5 min later.
>
I'll try to poke at this some more next week.