DHCP code in 2.0.4+

Alan DeKok aland at deployingradius.com
Sun Jun 7 22:25:58 CEST 2009

Karl Auer wrote:
> DHCP failover and load-balancing are not simple *at all*.

  As evidenced by the fact that the ISC fail-over protocol is horrible,
and the implementation is almost as bad.

  Scratch that.. it's *terrible*.

  After toasting the leases once accidentally, I managed to prove to
myself that this was a design feature.  Let's say that the primary and
secondary have the same configuration, and have synchronized leases.
Stop the secondary, delete *all* leases, and bring it back up again.
You can get into a state where the fail-over protocol does this:

S: Send me the leases
P: I did already!  We're in sync!

  And the secondary has *zero* leases, so it sits there burning CPU
cycles without ever handing out a lease.

  WTF?  I mean.. really.  Is it that hard?

  Oh, and the server is O(N^2) in the number of leases.  Why?  Well...
they don't use fancy concepts like "dynamically resizable hash tables".
Fixed-size hash tables were good enough in 1995, so they're good enough
now, right?
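  To make the O(N^2) point concrete, here is a minimal Python sketch.  It is
not ISC's code.  The class, the 16-bucket starting size and the 0.75
load-factor limit are illustrative assumptions.  It loads N leases into a
separate-chaining hash table twice, once with the bucket count fixed and
once with the table doubling whenever it gets too full, and counts key
comparisons.  With a fixed table every chain grows linearly with N, so
loading N leases costs roughly N^2/32 comparisons for 16 buckets.  With
resizing, chains stay short and the total stays roughly linear.

```python
class ChainedHashTable:
    """Separate-chaining hash table mapping lease IPs to lease records.

    With resizable=False the bucket count never changes, so chains grow
    linearly with the number of entries.  With resizable=True the bucket
    array doubles once the load factor exceeds max_load, keeping chains short.
    (Illustrative sketch; not taken from any real DHCP server.)
    """

    def __init__(self, buckets=16, resizable=True, max_load=0.75):
        self.buckets = [[] for _ in range(buckets)]
        self.count = 0
        self.resizable = resizable
        self.max_load = max_load
        self.comparisons = 0  # key comparisons made while scanning chains

    def _chain(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def get(self, key):
        for k, v in self._chain(key):
            self.comparisons += 1
            if k == key:
                return v
        return None

    def put(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            self.comparisons += 1
            if k == key:
                chain[i] = (key, value)  # existing lease: overwrite in place
                return
        chain.append((key, value))
        self.count += 1
        if self.resizable and self.count > self.max_load * len(self.buckets):
            self._grow()

    def _grow(self):
        # Rehash every entry into twice as many buckets.
        old = self.buckets
        self.buckets = [[] for _ in range(2 * len(old))]
        for chain in old:
            for k, v in chain:
                self.buckets[hash(k) % len(self.buckets)].append((k, v))


def load_leases(n, resizable):
    """Insert n distinct, hypothetical leases (10.x.y.z) and return the table."""
    table = ChainedHashTable(resizable=resizable)
    for i in range(n):
        ip = f"10.{(i >> 16) & 255}.{(i >> 8) & 255}.{i & 255}"
        table.put(ip, {"ip": ip, "lease_id": i})
    return table


if __name__ == "__main__":
    for resizable in (False, True):
        t = load_leases(4000, resizable)
        print(f"resizable={resizable}: {t.comparisons} comparisons, "
              f"{len(t.buckets)} buckets")
```

  With N = 4000 the fixed table does on the order of half a million
comparisons, while the resizable one does roughly two thousand.  Exact
counts vary from run to run because Python randomizes string hashes.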

  About 4 years ago I had a series of 200-400 line patches that would
dramatically improve the performance of the ISC server.  I got told that (1) it's
impossible, and (2) if it was possible, it would require a drastic
re-design of the server.  When I told them that the patches were proven
to work, and quoted *their own* code back at them showing where the
patches could go, there was... nothing.

  They weren't malicious, they just had different priorities.

> I would be very interested to hear how freeradius does it (or plans to
> do it) hence my interest in the discussion. Are there any docs on how
> freeradius implements DHCP? And especially how it implements failover?

  There is little to no documentation on how it does DHCP.  And it
doesn't do fail-over.

  Why?  Fail-over is hard.  My experience is that the fail-over protocol
doesn't help.  From what I recall the last time I looked at it, it was
missing key things, like "transaction numbers".  Hence the failure case
noted above.

  The *correct* conversation should have been:

S: I'm at transaction #0: I have no leases!
P: Geez.. my last recollection is that you were at 1000.  Let me send
   you all of the updates from 0 to where we are now: 1010.
S: Thanks!

  It's really not that hard.  Database books describe replication
protocols.  They look very different from the DHCP fail-over protocol.
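  The conversation above can be sketched as log-based replication keyed
on transaction numbers.  This is a minimal Python model under stated
assumptions.  It is not the DHCP fail-over protocol and not anything
FreeRADIUS implements.  The class names, the (ip, mac) update format and
the in-memory log are all hypothetical.  The key design choice is that the
secondary reports the last transaction it actually applied, and the
primary replays everything after that instead of trusting its own memory
of where the secondary should be.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Hypothetical primary: every lease change gets the next transaction number."""
    log: list = field(default_factory=list)  # log[n - 1] holds transaction n
    peer_txn: int = 0  # where the primary *thinks* the secondary is (advisory only)

    def commit(self, ip, mac):
        self.log.append((ip, mac))
        return len(self.log)  # transaction number just committed

    def updates_since(self, txn):
        """Return (number, update) pairs for every transaction after `txn`."""
        return [(n, self.log[n - 1]) for n in range(txn + 1, len(self.log) + 1)]

    def handle_sync(self, secondary_txn):
        # The secondary's own report wins over our recollection (peer_txn),
        # so a secondary that lost its leases gets everything re-sent.
        updates = self.updates_since(secondary_txn)
        self.peer_txn = len(self.log)
        return updates


@dataclass
class Secondary:
    """Hypothetical secondary: a lease table plus the last transaction applied."""
    leases: dict = field(default_factory=dict)
    txn: int = 0

    def sync(self, primary):
        # "I'm at transaction #txn": apply every update the primary sends back.
        for n, (ip, mac) in primary.handle_sync(self.txn):
            self.leases[ip] = mac
            self.txn = n


def ip_for(i):
    """Hypothetical address for lease number i (unique for i < 65536)."""
    return f"10.0.{i // 256}.{i % 256}"


if __name__ == "__main__":
    primary, secondary = Primary(), Secondary()
    for i in range(1000):
        primary.commit(ip_for(i), f"mac-{i}")
    secondary.sync(primary)      # secondary is now at transaction 1000
    secondary = Secondary()      # lose all leases: back at transaction 0
    for i in range(1000, 1010):
        primary.commit(ip_for(i), f"mac-{i}")
    print(primary.peer_txn)      # 1000: the primary's stale recollection
    secondary.sync(primary)      # "I'm at #0"; primary replays 1..1010
    print(secondary.txn, len(secondary.leases))
```

  Running it prints 1000 (the primary's out-of-date idea of the
secondary) and then "1010 1010": the wiped secondary asked for everything
after transaction 0 and got all of it, rather than being told it was
already in sync.  A secondary that was merely behind, say at 1000, would
receive only the ten missing updates.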

  So... for now, DHCP in FreeRADIUS is still experimental.  If you want
to use it, see raddb/sites-available/dhcp.  DHCP fail-over will be
supported when there's a fail-over protocol that *works*.

  And for most enterprise sites, you *don't* need a fail-over protocol.

 Alan DeKok.
