FreeRadius high availability lab

Tue Nov 7 14:54:28 CET 2017

On Nov 7, 2017, at 6:49 AM, Nathan Ward <lists+freeradius at daork.net> wrote:
> First thing I’d say is don’t have both load balancers sending traffic to all the RADIUS servers. You don’t gain much resiliency there except in the load balancer - if the pool of RADIUS servers dies for some reason (it will, at some point) you lose your service. Split the backends into two pools, if you don’t have enough in each pool, get more - don’t merge the pools.

  The approach I often take is:

- two data centres.  Each NAS points to one DC as primary, and the other as secondary
- each DC is the same
  - two RADIUS servers, sharing an IP via VRRP
  - two DBs.
  - each RADIUS server can write to both SQL DBs, in a fail-over configuration.
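  The "fail-over configuration" above, like the NAS primary/secondary arrangement, is ordered failover rather than load-balancing: the second target only sees traffic when the first one fails. A minimal Python sketch of that pattern follows. The names `failover`, `write_accounting`, `sql-a` and `sql-b` are hypothetical, and the simulated outage stands in for a real SQL connection error. In FreeRADIUS itself this behaviour can be expressed with a `redundant` section listing the SQL module instances rather than written by hand.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


class AllTargetsFailed(Exception):
    """Raised when every target in the failover list has failed."""


def failover(targets: Sequence[str], attempt: Callable[[str], T]) -> T:
    """Try each target strictly in order and return the first success.

    The secondary is only used when the primary raises ConnectionError;
    no traffic is spread across targets while the primary is healthy.
    """
    errors: List[Tuple[str, Exception]] = []
    for target in targets:
        try:
            return attempt(target)
        except ConnectionError as exc:
            errors.append((target, exc))  # remember why, then try the next one
    raise AllTargetsFailed(errors)


# Hypothetical stand-in for an SQL INSERT; "sql-a" is simulated as down.
down = {"sql-a"}


def write_accounting(db: str) -> str:
    if db in down:
        raise ConnectionError(f"{db} unreachable")
    return db  # pretend the write succeeded on this DB


print(failover(["sql-a", "sql-b"], write_accounting))  # prints: sql-b
```

  Note that the order of `targets` is the whole policy: each RADIUS server (or NAS) simply lists its preferred DB (or DC) first.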

  That's good enough for most common uses.  If people need higher performance, the front-end RADIUS servers become proxies, and they load-balance to a set of back-ends, which are all identical (RADIUS + SQL).  The load-balancing is done on a hash of User-Name, for simple sharing and consistency.
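  Load-balancing on a hash of User-Name means every request for a given user lands on the same back-end, so that back-end's SQL DB holds all of that user's sessions. A minimal sketch, assuming hypothetical back-end names. It uses `hashlib` because Python's built-in `hash()` is randomised per process for strings and would not give the same answer on two front-end proxies.

```python
import hashlib
from typing import Sequence


def pick_backend(user_name: str, backends: Sequence[str]) -> str:
    """Map a User-Name to one back-end, identically on every front-end proxy."""
    # SHA-256 gives a stable digest across processes and hosts,
    # unlike Python's per-process randomised hash() for str.
    digest = hashlib.sha256(user_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


backends = ["backend-1", "backend-2", "backend-3"]  # hypothetical names
for user in ["alice", "bob", "alice"]:
    print(user, "->", pick_backend(user, backends))  # "alice" maps the same both times
```

  Plain modulo hashing remaps most users when a back-end is added or removed; consistent hashing reduces that churn, at the cost of more code.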

  The main issue here is merging all of the data which is split into different SQL DBs.  But that can be done off-line, i.e. *not* live, and *not* by the RADIUS server.
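  The off-line merge can be as simple as copying each back-end's accounting rows into one central DB and de-duplicating on a per-session key. A sketch using Python's built-in `sqlite3` in-memory databases as stand-ins for the real SQL servers. The `radacct` table here is a cut-down, hypothetical schema loosely modelled on FreeRADIUS's (only `acctuniqueid`, `username` and `acctsessiontime`), and the rule "keep the row with the longest session time" is an assumption about which duplicate should win. The upsert syntax needs SQLite 3.24 or later.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS radacct (
    acctuniqueid    TEXT PRIMARY KEY,  -- one row per accounting session
    username        TEXT NOT NULL,
    acctsessiontime INTEGER NOT NULL   -- seconds; grows with interim updates
)
"""


def merge_shards(shards, central):
    """Copy every shard's rows into central, keeping the longest session time."""
    central.execute(SCHEMA)
    for shard in shards:
        rows = shard.execute(
            "SELECT acctuniqueid, username, acctsessiontime FROM radacct"
        ).fetchall()
        central.executemany(
            """
            INSERT INTO radacct (acctuniqueid, username, acctsessiontime)
            VALUES (?, ?, ?)
            ON CONFLICT(acctuniqueid) DO UPDATE SET
                acctsessiontime = excluded.acctsessiontime
            WHERE excluded.acctsessiontime > radacct.acctsessiontime
            """,
            rows,
        )
    central.commit()


def make_shard(rows):
    """Build an in-memory stand-in for one back-end's SQL DB."""
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    db.executemany("INSERT INTO radacct VALUES (?, ?, ?)", rows)
    return db


shard_a = make_shard([("s1", "alice", 300), ("s2", "bob", 60)])
shard_b = make_shard([("s1", "alice", 600), ("s3", "carol", 120)])
central = sqlite3.connect(":memory:")
merge_shards([shard_a, shard_b], central)
print(central.execute("SELECT * FROM radacct ORDER BY acctuniqueid").fetchall())
# [('s1', 'alice', 600), ('s2', 'bob', 60), ('s3', 'carol', 120)]
```

  Because the merge is idempotent, it can be re-run on a schedule without the RADIUS servers ever waiting on it.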

  Much of "high performance" system design ends up working around the performance limitations of databases.

> Other than that, there are so, so many questions that need to be asked before you start drawing this sort of picture. In addition to the above:
> - What is RADIUS doing? Auth? Accounting? Both?
> - Can you accept lag in accounting getting to the DB? How much?
> - Can you accept lag in password/user changes? How much?
> - Can you accept accounting black spots? How long/often?
> - What conditions cause your NAS to use the secondary server?
> - Are you doing anything that requires shared state (i.e. IP address assignment, multiple login prevention, etc.)? How important is integrity here if you have to trade it for availability?

  i.e. What problem are you trying to solve?  Knowing that will lead you to a solution.

> These are just some that I came up with off the top of my head. There are many, many more that would come from the answers to these questions. Don’t try and design your solution to cater for failure modes that don’t cause problems in your specific environment by adding needless complexity, as adding complexity will cause failures of its own.

  Very true.

  Alan DeKok.