[Gluster-devel] [Gluster-users] Proposal for GlusterD-2.0

Mike S mike at luminatewireless.com
Mon Sep 8 21:05:19 UTC 2014

Is there any reason not to consider ZooKeeper? The 3.4 release is
quite stable, and thanks to its large user base, bugs get fixed and
its quirks are well known.

I like the idea of Raft. The paper is well written and very
compelling. The last time I read it, a number of critical issues were
glossed over - for instance, log compaction and pruning. Systems must
be correct in both theory and implementation. Although many Raft-based
systems have cropped up in the year or so since Raft was published,
I don't judge their use to be significant compared to ZooKeeper.
Quality only comes with maturity and many workloads bashing on it.

The last time I needed to build up a new distributed system, I had
written up some notes about etcd vs. ZooKeeper. Perhaps you will find
them helpful, or they will motivate some new questions before you make
your decision.


On Monday, September 8, 2014, Jonathan Barber <jonathan.barber at gmail.com> wrote:
> On 8 September 2014 05:05, Krishnan Parthasarathi <kparthas at redhat.com> wrote:
>> > Bulk of current GlusterD code deals with keeping the configuration of the
>> > cluster and the volumes in it consistent and available across the nodes. The
>> > current algorithm is not scalable (N^2 in no. of nodes) and doesn't prevent
>> > split-brain of configuration. This is the problem area we are targeting for
>> > the first phase.
>> >
>> > As part of the first phase, we aim to delegate the distributed configuration
>> > store. We are exploring consul [1] as a replacement for the existing
>> > distributed configuration store (sum total of /var/lib/glusterd/* across all
>> > nodes). Consul provides a distributed configuration store which is consistent
>> > and partition tolerant. By moving all Gluster-related configuration
>> > information into Consul we could avoid split-brain situations.
>> > Did you get a chance to go over the following questions while making the
>> > decision? If yes could you please share the info.
>> > What are the consistency guarantees for changing the configuration in case of
>> > network partitions?
>> > specifically when there are 2 nodes and 1 of them is not reachable?
>> > consistency guarantees when there are more than 2 nodes?
>> > What are the consistency guarantees for reading configuration in case of
>> > network partitions?
>> Consul uses the Raft[1] distributed consensus algorithm internally for maintaining
>> consistency. The Raft consensus algorithm is proven to be correct. I will be
>> going through the workings of the algorithm soon. I will share my answers to
>> the above questions after that. Thanks for the questions; it is important
>> for the user to understand the behaviour of a system, especially under failure.
>> I am considering adding a FAQ section to this proposal, where questions like the above would
>> go, once it gets accepted and makes it to the feature page.
>> [1] - https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf
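The partition questions above largely come down to Raft's majority-quorum rule: a cluster of N members needs a strict majority (floor(N/2) + 1) of them reachable to elect a leader and commit writes. A minimal sketch of that arithmetic (the helper names here are illustrative, not part of Consul's API):

```python
def quorum_size(cluster_size: int) -> int:
    """Raft requires a strict majority of the cluster to commit a write."""
    return cluster_size // 2 + 1

def can_commit_writes(cluster_size: int, reachable: int) -> bool:
    """A partition can accept configuration changes only if it holds a majority."""
    return reachable >= quorum_size(cluster_size)

# 2-node cluster, 1 node unreachable: no majority, so no config changes anywhere
assert not can_commit_writes(2, 1)
# 3-node cluster tolerates the loss of one node
assert can_commit_writes(3, 2)
# 5-node cluster tolerates the loss of two nodes
assert can_commit_writes(5, 3)
```

So with exactly 2 nodes, losing either one makes the whole cluster read-only for configuration changes; odd-sized clusters of 3 or more are the useful deployments. Reads are a separate question: a linearizable read must also involve the leader or a quorum, while whether stale reads from a minority partition are allowed depends on the store's configuration.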
> The following article reports test results on how Consul behaves under network partitioning, including whether it recovers successfully:
> http://aphyr.com/posts/316-call-me-maybe-etcd-and-consul
> It gives Consul a positive review.
>> ~KP
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
> --
> Jonathan Barber <jonathan.barber at gmail.com>
