num_answers_to_alive

Gary Gatten Ggatten at waddell.com
Thu Aug 4 17:52:13 CEST 2011


Yup.  Typically, once something fails, I consider it questionable / unstable until it proves itself to me again.  The routing / circuit analogy is a perfect example.

Many HA "things" let the user enable or disable preemption.  Once the primary node fails and the secondary takes over, and the primary is later believed to be healthy again, does it "automatically" become the primary again - OR - must the admin manually make it the primary again?  Personally, I have preemption disabled on all my HA routers, firewalls, etc.  Once something fails, I want to review / analyze the failure and validate that it's stable before I trust it again and start running traffic through it!
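
To make the preemption choice concrete, here is a rough Python sketch.  It is purely illustrative: the FailoverPair class and its method names are made up for this example and do not correspond to any vendor's HA implementation.

```python
class FailoverPair:
    """Hypothetical two-node HA pair with an optional preemption setting."""

    def __init__(self, preempt=False):
        self.preempt = preempt          # True: primary takes over again automatically
        self.primary_healthy = True
        self.active = "primary"

    def primary_failed(self):
        # The secondary takes over as soon as the primary is seen to fail.
        self.primary_healthy = False
        self.active = "secondary"

    def primary_recovered(self):
        # The primary looks healthy again; only switch back if preemption is on.
        self.primary_healthy = True
        if self.preempt:
            self.active = "primary"

    def admin_failback(self):
        # Manual fail-back, done after the admin has reviewed the failure.
        if not self.primary_healthy:
            raise RuntimeError("primary is still marked unhealthy")
        self.active = "primary"


pair = FailoverPair(preempt=False)
pair.primary_failed()
pair.primary_recovered()
print(pair.active)   # traffic stays on the secondary
pair.admin_failback()
print(pair.active)   # the admin has moved traffic back to the primary
```

With preempt=False the recovered primary just sits there until someone calls admin_failback(), which is the behaviour I prefer.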

G


-----Original Message-----
From: freeradius-users-bounces+ggatten=waddell.com at lists.freeradius.org [mailto:freeradius-users-bounces+ggatten=waddell.com at lists.freeradius.org] On Behalf Of Alexander Clouter
Sent: Thursday, August 04, 2011 9:20 AM
To: freeradius-users at lists.freeradius.org
Subject: Re: num_answers_to_alive

Stefan Winter <stefan.winter at restena.lu> wrote:
> 
> The documentation says that 3..10 are *useful* ranges, but doesn't
> mention that everything else is forbidden. In particular, I would like
> to use 1, not 3. The idea is: the server was dead before, but now it
> managed to send a reply back - so it must have been fixed. I would like
> to mark it alive immediately. Is that unreasonable?
>
Similar to 'link flapping' (think OSPF/BGP), you should use heuristics 
as things are not just black and white.  If a service simply had two 
states "up" and "down" then that probably would be okay, but we also 
have 'unstable'.  Imagine this state coming from:
 * overloaded RADIUS server (or backend DB)
 * link congestion between RADIUS servers

A value of three says not just "alive" but also "alive and has been 
for a while"; this can be further interpreted as the service being 
stable as well as alive.  If the system briefly came back and then died 
again, you would likely have seen a failure on attempt two or three.
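
The counting idea can be sketched in a few lines of Python.  This is only 
an illustration of the heuristic, not how FreeRADIUS implements it: the 
HomeServerState class and record_probe() are hypothetical names, and the 
sketch assumes that a missed answer resets the count to zero.

```python
class HomeServerState:
    """Hypothetical tracker: a dead server is marked alive only after
    num_answers_to_alive consecutive answered probes."""

    def __init__(self, num_answers_to_alive=3):
        if num_answers_to_alive < 1:
            raise ValueError("num_answers_to_alive must be at least 1")
        self.num_answers_to_alive = num_answers_to_alive
        self.alive = False              # assume the server starts out marked dead
        self.consecutive_answers = 0

    def record_probe(self, answered):
        """Record one probe result and return whether the server is alive."""
        if self.alive:
            return True                 # detecting a new failure is out of scope here
        if answered:
            self.consecutive_answers += 1
        else:
            self.consecutive_answers = 0   # assumption: one miss restarts the count
        if self.consecutive_answers >= self.num_answers_to_alive:
            self.alive = True
        return self.alive


# A server that answers once, drops a probe, then answers three times in a row.
state = HomeServerState(num_answers_to_alive=3)
print([state.record_probe(r) for r in (True, False, True, True, True)])
# prints [False, False, False, False, True]
```

With a threshold of 1, the very first answer marks the server alive, so a 
flapping server would be put straight back into service.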

Hope I am explaining myself well :)

Cheers

-- 
Alexander Clouter
.sigmonster says: BOFH excuse #256:
                  You need to install an RTFM interface.










