[Gluster-users] [Gluster-devel] Proposal for GlusterD-2.0

Mon Sep 8 21:05:19 UTC 2014

Is there any reason not to consider ZooKeeper? The 3.4 release is
quite stable, and because it has a large number of users, bugs get
fixed and its quirks are well known.

I like the idea of Raft. The paper is well written and very
compelling. The last time I read it, though, a number of critical issues
were glossed over, for instance log compaction and pruning. Systems must
be correct in both theory and implementation. Although many Raft-based
systems have cropped up in the year or so since Raft was published,
I don't judge their use to be significant compared to ZooKeeper's.
Quality only comes with maturity and many workloads bashing on it.

The last time I needed to build a new distributed system, I
wrote up some notes about etcd vs. ZooKeeper. Perhaps you will find
them helpful, or they may prompt some new questions before you make
your decision.

https://docs.google.com/document/d/1FOnLD26W9iQ2CUZ-jVCn7o0OrX8KPH7QGeokN4tA_j4/edit

On Monday, September 8, 2014, Jonathan Barber <jonathan.barber at gmail.com> wrote:
>
> On 8 September 2014 05:05, Krishnan Parthasarathi <kparthas at redhat.com> wrote:
>>
>>
>>
>> > The bulk of the current GlusterD code deals with keeping the configuration of
>> > the cluster and of the volumes in it consistent and available across the
>> > nodes. The current algorithm is not scalable (O(N^2) in the number of nodes)
>> > and does not prevent split-brain of the configuration. This is the problem
>> > area we are targeting in the first phase.
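To make the scaling claim above concrete, here is a toy count of one-way messages needed to propagate a single configuration change. It assumes, purely for illustration and without confirmation from the proposal, that the quadratic cost comes from every node exchanging the change with every other peer. The "store" model is an equally hypothetical simplification: one write to a small consensus cluster, leader-to-follower replication, and one read per node, ignoring acknowledgements. Function names are illustrative only.

```python
# Toy model (hypothetical): one-way messages to propagate one config change.
# Assumption, not confirmed by the proposal: each node exchanges the change
# with every other peer, which is where an N^2 term would come from.


def full_mesh_messages(n_nodes: int) -> int:
    """Each of n nodes sends to each of the other n - 1 peers: n * (n - 1)."""
    return n_nodes * (n_nodes - 1)


def store_messages(n_nodes: int, n_servers: int) -> int:
    """Hypothetical alternative: the change is written once to a consensus
    cluster of n_servers, the leader replicates it to the n_servers - 1
    followers, and each of the n_nodes reads it back once.
    Acknowledgements are ignored."""
    client_write = 1
    replication = n_servers - 1
    reads = n_nodes
    return client_write + replication + reads


if __name__ == "__main__":
    for n in (4, 16, 64, 256):
        mesh = full_mesh_messages(n)
        store = store_messages(n, 3)
        print(f"{n:>4} nodes: mesh={mesh:>6}  store={store:>4}")
```

Under these assumptions the mesh count grows quadratically while the store count grows linearly with the number of nodes; the real constants depend on GlusterD's actual protocol, which this sketch does not model.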
>> >
>> > As part of the first phase, we aim to delegate the distributed configuration
>> > store to an external system. We are exploring Consul [1] as a replacement for
>> > the existing distributed configuration store (the sum total of
>> > /var/lib/glusterd/* across all nodes). Consul provides a distributed
>> > configuration store that is consistent and partition tolerant. By moving all
>> > Gluster-related configuration information into Consul, we could avoid
>> > split-brain situations.
>> > Did you get a chance to go over the following questions while making the
>> > decision? If so, could you please share the details?
>> > What are the consistency guarantees for changing the configuration in case of
>> > network partitions? Specifically, when there are 2 nodes and 1 of them is not
>> > reachable? And when there are more than 2 nodes?
>> > What are the consistency guarantees for reading the configuration in case of
>> > network partitions?
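For the two-node question above, the majority-quorum rule that Raft (and therefore a Consul server cluster) relies on can be worked through directly. This sketch only models whether writes can commit; read guarantees also depend on which read consistency mode is requested, which it does not model. The function names are illustrative, not part of any Consul or GlusterD API.

```python
# Majority-quorum arithmetic for Raft-style consensus: a write commits only
# once a majority of servers has it, and only a partition holding a majority
# can elect a leader.


def quorum(n_servers: int) -> int:
    """Smallest majority of n_servers."""
    return n_servers // 2 + 1


def tolerated_failures(n_servers: int) -> int:
    """Servers that can fail or be partitioned away while a majority remains."""
    return n_servers - quorum(n_servers)


def side_can_commit(reachable: int, n_servers: int) -> bool:
    """True if a partition containing `reachable` servers can still commit."""
    return reachable >= quorum(n_servers)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"{n} servers: quorum={quorum(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
    # The two-node case: with one node unreachable, neither side has a
    # majority, so configuration changes block rather than diverge.
    print("2 nodes, 1 reachable, can commit:", side_can_commit(1, 2))
```

Under this rule a 2-server cluster has no fault tolerance: if one server is unreachable, configuration changes stall instead of splitting the brain. This is why consensus clusters are usually sized at an odd number of servers, such as 3 or 5.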
>>
>> Consul uses the Raft [1] distributed consensus algorithm internally to maintain
>> consistency. The Raft consensus algorithm is proven to be correct. I will be
>> going through the workings of the algorithm soon and will share my answers to
>> the above questions after that. Thanks for the questions; it is important for
>> the user to understand the behaviour of a system, especially under failure.
>> I am considering adding a FAQ section to this proposal, where questions like
>> the above would go, once it gets accepted and makes it to the feature page.
>>
>> [1] - https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf
>>
>
> The following article reports results on how Consul behaves under network partitions, actually testing whether it recovers successfully:
> http://aphyr.com/posts/316-call-me-maybe-etcd-and-consul
>
> It gives Consul a positive review.
>
> HTH
>
>> ~KP
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
>
>
>
> --
> Jonathan Barber <jonathan.barber at gmail.com>