[Gluster-devel] Glusterd 'Management Volume' proposal

Thu Nov 27 17:42:13 UTC 2014


On November 27, 2014 8:13:01 AM PST, Jeff Darcy <jdarcy at redhat.com> wrote:
>> > To be sure, maintaining external daemons such as etcd or consul
>> > creates its own problems.  I think the ideal might be to embed a
>> > consensus protocol implementation (Paxos, Raft, or Viewstamped
>> > Replication) directly into glusterd, so it's guaranteed to start up
>> > and die exactly when those daemons do and be subject to the same
>> > permission or resource limits.  I'm not sure it's even more work
>> > than managing either an external daemon or a management volume
>> > (with its own daemons).
>>
>> To solve the store consistency problem it may be enough to implement
>> Raft (or any other consensus algorithm), but if we aspire to make
>> acquiring data (among the servers) from the distributed store
>> scale, we would need a 'watch' functionality. IMO, it is easier to
>> implement a key-value interface than a POSIX-y interface for the
>> store. The end solution would be dangerously (sic) similar to
>> consul/etcd/ZK. This brings us to whether we piggyback on reasonably
>> mature solutions that provide the same functionality or build one
>> ourselves. I am torn between the two. Which approach would be
>> practical? Thoughts? MV was born while we were exploring a middle
>> ground.
>
>I've thought quite a bit about what it would take to implement our own
>embeddable consensus-protocol implementation.  It's not something to be
>undertaken lightly.  The difficulty of managing etcd/consul/whatever
>daemons is fairly well understood; it's not trivial, but a lot of what
>we need is there already for other purposes (or done as part of NSR).
>There are just a few "rough edges" left.  What we don't know is the
>difficulty of filling the gaps in the MV approach - e.g. self heal,
>lack of a "watch" function.  Therefore, we should probably try to
>characterize those gaps and estimate (as well as we can) the effort to
>overcome them.  Then we can compare that to the effort required to
>complete the external daemon approach, and see which one will get us to
>our goal more quickly.

... and reliably. 
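
To make the "watch" gap concrete, here is a rough sketch of what a
key-value interface with a watch function could look like. It is a
single-process, in-memory toy with no consensus or replication, and
every name in it (WatchableStore, watch, put, delete) is hypothetical;
it is not the etcd, consul, or glusterd API. The idea, as I read it, is
that a server registers interest in a key and gets called back on
change, presumably so it need not poll the store.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

# Hypothetical callback shape: (key, old_value, new_value).
# new_value is None when the key has been deleted.
WatchCallback = Callable[[str, Optional[str], Optional[str]], None]


class WatchableStore:
    """Toy in-memory key-value store that notifies watchers on change.

    Illustration only: no consensus, no persistence, no networking.
    """

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._watchers: Dict[str, List[WatchCallback]] = defaultdict(list)

    def get(self, key: str) -> Optional[str]:
        """Return the current value for key, or None if it is absent."""
        return self._data.get(key)

    def watch(self, key: str, callback: WatchCallback) -> None:
        """Register callback to be invoked whenever key changes."""
        self._watchers[key].append(callback)

    def put(self, key: str, value: str) -> None:
        """Store value under key; notify watchers only on a real change."""
        old = self._data.get(key)
        self._data[key] = value
        if old != value:
            self._notify(key, old, value)

    def delete(self, key: str) -> None:
        """Remove key if present and notify watchers of the removal."""
        if key in self._data:
            old = self._data.pop(key)
            self._notify(key, old, None)

    def _notify(self, key: str, old: Optional[str], new: Optional[str]) -> None:
        # Iterate over a copy so a callback may register further watchers.
        for callback in list(self._watchers.get(key, [])):
            callback(key, old, new)
```

A caller would do something like store.watch("volinfo", on_change) once
and then react inside on_change, instead of re-reading "volinfo" on a
timer. The hard parts discussed in this thread (agreeing on the order
of puts across servers, and delivering notifications across the
network) are exactly what this sketch leaves out.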

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.