DHCP code in 2.0.4+

Mon Jun 8 15:57:25 CEST 2009

OK, I'm going to try to draw this back into a central thread.

On 7/6/09 17:57, Karl Auer wrote:
> On Sun, 2009-06-07 at 17:20 +0100, Arran Cudbard-Bell wrote:
>> For purposes of resilience there is absolutely no requirement for DHCP
>> servers to communicate with each other directly. They just need a common
>> source of knowledge about DHCP lease allocation.
>
> OK - but lets not talk about "DHCP failover" then, which has very
> specific meaning in a DHCP context. "Resilience" is a good term.

Karl; It has a very specific meaning in an ISC DHCP context. A clustered system could replicate fail-over behavior, but there would be little advantage in doing so.

Alan; I'd be interested in hearing what the 'corner cases' are where fail-over is advantageous.

>
> You are basically talking about making sure the DHCP service stays up by
> using multiple servers accessing a common data source, and then making
> sure the data source stays available. Fair enough - but it isn't
> failover.

I can't find a definition in the OED or Merriam-Webster, but dictionary.com (*shudder*) defines it as:

"Automatically switching to a different, redundant system upon failure or abnormal termination of the currently active system"

There is no reason why this behavior couldn't be replicated on a request-by-request basis in a clustered environment. You just need to record the fact that a server has made an offer, as well as recording the fact that a lease has been granted for an IP address.
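
As a rough illustration of that idea, here is a minimal Python sketch in which two servers share one lease table. Everything in it is an assumption made for the example: the `leases` schema, the state names, the server names `dhcp-a` and `dhcp-b`, and the use of an in-memory SQLite database as a stand-in for a shared SQL server. It is not how any particular DHCP server stores its leases.

```python
import sqlite3

# Hypothetical schema: one row per address in the pool.
conn = sqlite3.connect(":memory:")  # stands in for the shared SQL server
conn.execute(
    "CREATE TABLE leases ("
    " ip TEXT PRIMARY KEY,"
    " state TEXT NOT NULL DEFAULT 'free',"  # free, offered or leased
    " mac TEXT,"
    " server TEXT)"  # which server last touched the row
)
conn.executemany(
    "INSERT INTO leases (ip) VALUES (?)",
    [("10.0.0.%d" % i,) for i in range(1, 4)],
)
conn.commit()


def make_offer(db, server, mac):
    """DISCOVER: record that one free address has been offered to a client."""
    row = db.execute(
        "SELECT ip FROM leases WHERE state = 'free' ORDER BY ip LIMIT 1"
    ).fetchone()
    if row is None:
        return None  # pool exhausted
    # The state check in the WHERE clause means the offer is only recorded
    # if nobody else claimed the address in the meantime.
    cur = db.execute(
        "UPDATE leases SET state = 'offered', mac = ?, server = ?"
        " WHERE ip = ? AND state = 'free'",
        (mac, server, row[0]),
    )
    db.commit()
    return row[0] if cur.rowcount == 1 else None


def confirm_lease(db, server, mac, ip):
    """REQUEST: any server may turn a recorded offer into a lease."""
    cur = db.execute(
        "UPDATE leases SET state = 'leased', server = ?"
        " WHERE ip = ? AND mac = ? AND state = 'offered'",
        (server, ip, mac),
    )
    db.commit()
    return cur.rowcount == 1


offered_ip = make_offer(conn, "dhcp-a", "00:11:22:33:44:55")
# Suppose dhcp-a fails here; dhcp-b answers the client's REQUEST instead.
leased = confirm_lease(conn, "dhcp-b", "00:11:22:33:44:55", offered_ip)
final_row = conn.execute(
    "SELECT state, server FROM leases WHERE ip = ?", (offered_ip,)
).fetchone()
```

Because the offer is stored in the shared table rather than in the memory of the server that made it, the second server can complete the exchange without any server-to-server protocol.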

>
> And of course it means you can make the actual DHCP servers much simpler
> - for a start they don't have to implement failover :-) Probably a very
> good thing because as I believe I may have mentioned, it isn't trivial
> to do.
>
>> All this goes away. You run multiple servers in active/active mode;
>> there is no takeover or recovery. All DHCP servers act as one, they all
>> have an identical view of the state of the various leases.
>
> Yes indeed. However the supply of that resilient common view requires
> quite a bit of cooperating hardware and software. You have made the DHCP
> side of things much easier, but the back end is now more complex. I'm
> not sure you get to call a resilient, high-availability clustered
> database "trivial" either :-)

It depends how seamless you want to make the fail-over process. You don't have to run truly clustered SQL; you can get away with a master -> slaves type arrangement. With MySQL at least, it's possible to nominate one of the replicas as the new master should the current master fail, and this process can be automated to a large degree. Fail-back is much harder (apparently), and would be difficult to automate.

I think one of the points Alan was making about using an SQL database as the backend lease store is that the mainstream SQL databases have proven track records when it comes to high-availability configurations and scalability. The actual code base is maintained by people who know far more about the concepts behind synchronous replication of data than the people maintaining the ISC code or the other DHCP server implementations.

Alex C; A small point about LDAP and DHCP leases. IIRC LDAP doesn't have any kind of 'locking' mechanism for objects/properties, or any way of defining index constraints. This makes it unsuitable for storing DHCP lease information in a clustered (or threaded) environment, as you cannot guarantee, at the point of modifying the lease state, that the directory contents have not been modified in the meantime.
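
For contrast, this is roughly what an index constraint buys you in SQL. The sketch below is only an illustration under assumed names: the `active_leases` table and the `try_lease` helper are hypothetical, and SQLite is used purely so the example runs anywhere. The point is that the database itself rejects a second lease for the same address, so the servers do not have to trust a read they made a moment earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the shared SQL server
conn.execute(
    "CREATE TABLE active_leases ("
    " ip TEXT NOT NULL UNIQUE,"  # the constraint: one lease per address
    " mac TEXT NOT NULL)"
)


def try_lease(db, ip, mac):
    """Attempt to record a lease; return False if the address is taken."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO active_leases (ip, mac) VALUES (?, ?)",
                (ip, mac),
            )
        return True
    except sqlite3.IntegrityError:
        # Another server (or thread) recorded a lease for this address first.
        return False


first = try_lease(conn, "10.0.0.1", "00:11:22:33:44:55")
second = try_lease(conn, "10.0.0.1", "66:77:88:99:aa:bb")
lease_count = conn.execute("SELECT COUNT(*) FROM active_leases").fetchone()[0]
```

Here the first attempt succeeds and the second is refused by the constraint, leaving a single row in the table.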

>
> One down side is that you need all your DHCP servers to be the same; you
> lose interoperability. That may or may not be a problem.

Well, no. There's this lovely layer of abstraction between the DHCP server and the database. SQL is actually an ANSI standard; databases may implement extensions to it, but the base language can be used with any SQL-compliant server and/or client.

So I'd argue that you mitigate interoperability issues by using an SQL database as a lease store. It's certainly better defined than the ISC Failover protocol, and with some databases you also have stored procedures, which offer an additional layer of abstraction on top.

So long as you can modify the SQL queries used by the DHCP server, you should be able to get two servers from completely different vendors using the same database structure.

>
> Another downside might be DHCP response time - how fast can you get
> addresses out when obtaining one involves a call into the database?

Our current production SQL server comes in at 0.0004 seconds for a SELECT statement against an indexed column containing 13,000 unique IP addresses. INSERT times will vary depending on the number of indexes defined, but should be less than 50 ms at the absolute maximum. With MySQL InnoDB you can run multiple concurrent INSERT and SELECT statements on the same table.
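
If you want to get a feel for this sort of measurement yourself, a minimal Python sketch follows. It is an assumption-laden stand-in, not a reproduction of the figure above: it uses an in-memory SQLite table rather than MySQL, a hypothetical `leases` table, and generated 10.0.x.x addresses, so the absolute numbers will differ from those of a production server.

```python
import ipaddress
import sqlite3
import time

conn = sqlite3.connect(":memory:")
# PRIMARY KEY gives the ip column an index, as in the case described above.
conn.execute("CREATE TABLE leases (ip TEXT PRIMARY KEY, mac TEXT)")

# Generate 13,000 unique addresses starting at 10.0.0.0 (arbitrary choice).
base = ipaddress.IPv4Address("10.0.0.0")
addresses = [str(base + i) for i in range(13000)]
conn.executemany(
    "INSERT INTO leases (ip) VALUES (?)", [(a,) for a in addresses]
)
conn.commit()

# Time a single indexed lookup of the last address inserted.
target = addresses[-1]
start = time.perf_counter()
row = conn.execute("SELECT ip FROM leases WHERE ip = ?", (target,)).fetchone()
elapsed = time.perf_counter() - start

print("lookup of %s took %.6f seconds" % (target, elapsed))
```

A single timing like this is noisy; repeating the lookup many times and averaging would give a more useful figure.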

>  Have
> you tested that side of things yet? (I'm not sure whether you have DHCP
> servers actually working this way or are talking about how they could or
> should work).

It's a little of both. I keep trying to find the time to give the DHCP server a thorough vetting, but fighting vendor issues has greatly reduced the amount of time I have to spend on fun things.

>
>>>   in the case where
>>> a server drops out, is not reachable by its peer or is deliberately
>>> taken offline. Not to mention the possibility of having several servers
>>> participating in various failover relationships.
>>>
>>>
>> *sigh* don't overcomplicate it.
>
> Life complicates things. You have to deal with it.

Complexity has its root in simplicity. Complexity itself isn't a bad thing; sometimes things are necessarily complex. What makes a complex system bad (read: unmaintainable) is poor stratification of the (simple) concepts it's built upon.

>  However, with a
> common data source driving your DHCP, you also don't have to worry about
> creating meshes of DHCP failover relationships, because failover has
> disappeared.
>
> It's one of the great things about DHCPv6, by the way - no more
> failover!

Is anyone actually using that? What advantages does it have over the stateless auto-configuration protocol? (I've not really done much reading on IPv6 yet.)

>
>> One of the good points you made was about load balancing; Any idea how
>> this is done in existing DHCP server solutions?
>
> Er - packet or DHCP-level balancing? We have never needed packet level
> load balancing; the servers we use have never come remotely close to
> needing it. I suppose a bigger network might need it,

We have a subnet with ~3000 hosts. After a campus-wide power failure, it is conceivable that they'd all be trying to acquire leases at the same time, especially once the distribution layer is UPS-backed. This would probably make the DHCP server sad.

> though in that
> case I'd be wanting a better DHCP server distribution rather than load
> balancing hardware. For DHCP balancing we've always just distributed the
> addresses 50/50.

Ok.

Regards,
Arran
-- 
Arran Cudbard-Bell (A.Cudbard-Bell at sussex.ac.uk),
Authentication, Authorisation and Accounting Officer,
Infrastructure Services (IT Services),
E1-1-08, Engineering 1, University Of Sussex, Brighton, BN1 9QT
DDI+FAX: +44 1273 873900 | INT: 3900
GPG: 86FF A285 1AA1 EE40 D228 7C2E 71A9 25BB 1E68 54A2