DHCP code in 2.0.4+

Tue Jun 9 15:40:49 CEST 2009

Karl Auer wrote:
> Maybe - but it's the way a good many, in fact most, of the main
> protocols we use today have become what they are. People do their best,
> then the real world comes along and reminds them of all the things they
> forgot. It's normal for stuff to need fixing.

  That's nice.  Except that database replication was already a solved
problem when the protocol was designed.

>>   See earlier messages in this thread.  I (a) found a theoretical issue
>> with the protocol, and (b) demonstrated it in a live system.
> 
> I missed it. What was it again?

  It doesn't have transaction numbers.  Parts of the request/ack
protocol are missing anything *other* than request/ack.  It can't say
"get me all leases since time T".  It can't say "get me the leases since
I last synced".

  What I did was to configure a primary and secondary.  Let them sync.
Then, take down the secondary, and delete its lease database.  When the
secondary comes back up, the fail-over protocol does this:

S: Send me leases
P: I did!
S: OK.

  And the secondary is quite happy to sit there with *zero* leases.
It's really mind-boggling.  Maybe they've fixed it in more recent
versions, but it's still a catastrophic design error.

  A real replication protocol, using techniques known since at least
1990, looks like this:

S: send me transactions since time 0
P: Hmm... I recall sending you transactions until time T, but OK...
P: <here's all the leases from time 0..T'>
S: OK.  I'm synced at time T'
P: Thanks, I'll remember that
...
S: Can you send me updates since time T'?
P: OK, here they are
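
  The exchange above can be sketched in a few lines of Python.  This is
only an illustration under stated assumptions: the class and method
names (Primary, Secondary, updates_since) are hypothetical and do not
come from any DHCP server, and integer transaction numbers stand in for
the times T and T'.

```python
# Hypothetical sketch: every name here is illustrative, not a real DHCP API.

class Primary:
    def __init__(self):
        self.txn = 0           # transaction number, bumped on every change
        self.log = []          # ordered list of (txn, ip, state)
        self.peer_synced = 0   # last transaction the peer acknowledged

    def record(self, ip, state):
        """Record a lease change under a new transaction number."""
        self.txn += 1
        self.log.append((self.txn, ip, state))

    def updates_since(self, since):
        """Return every change after `since`, plus the current transaction.

        The request is honoured even if we believe the peer is further
        along ("I recall sending you transactions until T, but OK...").
        """
        updates = [entry for entry in self.log if entry[0] > since]
        return updates, self.txn

    def ack(self, txn):
        """Remember how far the peer says it is synced."""
        self.peer_synced = txn


class Secondary:
    def __init__(self):
        self.leases = {}
        self.synced = 0        # 0 means "I have nothing"

    def sync(self, primary):
        """Ask for everything since our last sync point and apply it."""
        updates, upto = primary.updates_since(self.synced)
        for _txn, ip, state in updates:
            self.leases[ip] = state
        self.synced = upto
        primary.ack(upto)


primary = Primary()
primary.record("192.0.2.10", "active")
primary.record("192.0.2.11", "active")

secondary = Secondary()
secondary.sync(primary)

# The secondary loses its lease database and comes back empty.
secondary = Secondary()
secondary.sync(primary)        # asks for everything since transaction 0

primary.record("192.0.2.10", "released")
secondary.sync(primary)        # incremental: only the newest transaction
```

  The point of the sketch is that the secondary states where it is, and
the primary answers from there.  A secondary that has lost its database
says "0" and gets everything back, instead of being told "I did!".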

> You do need quite a few states for leases, and you need some mechanism
> for transitioning between those states in an orderly fashion, in a way
> that does not invalidate the contract you have with your DHCP clients.

  Yes.  So long as both servers can share the same view of what the
client should be doing, they will work together seamlessly.  Note that
this does *not* mean that they share *all* information before responding
to the client.  Replication can be lazy in many, many cases.
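
  A minimal sketch of what "lazy" could mean here, assuming a server
that answers the client first and pushes the update to its peer
afterwards.  The LazyServer class and its methods are hypothetical,
made up for this example, and it ignores the cases where replication
cannot be lazy.

```python
# Hypothetical sketch of lazy replication; not taken from any DHCP server.
from collections import deque


class LazyServer:
    def __init__(self):
        self.leases = {}         # mac -> ip
        self.pending = deque()   # updates not yet sent to the peer

    def handle_request(self, mac, ip):
        """Answer the client at once; queue the update for the peer."""
        self.leases[mac] = ip
        self.pending.append((mac, ip))
        return ("ACK", ip)

    def apply_update(self, mac, ip):
        """Apply an update received from the peer."""
        self.leases[mac] = ip

    def flush_to(self, peer):
        """Send all queued updates to the peer; return how many were sent."""
        sent = 0
        while self.pending:
            mac, ip = self.pending.popleft()
            peer.apply_update(mac, ip)
            sent += 1
        return sent


a, b = LazyServer(), LazyServer()
reply = a.handle_request("00:11:22:33:44:55", "192.0.2.20")

peer_before = dict(b.leases)     # the peer has not heard about it yet
sent = a.flush_to(b)             # later, the update catches up
```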

> But these lease states aren't the same states as those used in the DHCP
> failover protocol. Seems to me you don't need *any* of those, because
> the servers simply do not have to communicate directly. They
> "communicate", if at all, through changing state in a shared database.

  Yes.

  Alan DeKok.