[Gluster-devel] ZkFarmer

Wed May 9 01:33:50 UTC 2012

On Mon, May 7, 2012 at 9:33 PM, Anand Babu Periasamy <ab at gluster.com> wrote:

> On Mon, May 7, 2012 at 7:43 AM, Jeff Darcy <jdarcy at redhat.com> wrote:
> > I've long felt that our ways of dealing with cluster membership and staging
> > of config changes are not quite as robust and scalable as we might want.
> > Accordingly, I spent a bit of time a couple of weeks ago looking into the
> > possibility of using ZooKeeper to do some of this stuff.  Yeah, it brings in
> > a heavy Java dependency, but when I looked at some lighter-weight
> > alternatives they all seemed to be lacking in more important ways.
> > Basically the idea was to do this:
> >
> > * Set up the first N (e.g. N=3) nodes in our cluster as ZooKeeper servers,
> > or point everyone at an existing ZooKeeper cluster.
> >
> > * Use ZK ephemeral nodes as a way to track cluster membership ("peer probe"
> > merely updates ZK, and "peer status" merely reads from it).
> >
> > * Store config information in ZK *once* instead of regenerating volfiles
> > etc. on every node (and dealing with the ugly cases where a node was down
> > when the config change happened).
> >
> > * Set watches on ZK nodes to be notified when config changes happen, and
> > respond appropriately.
> >
> > I eventually ran out of time and moved on to other things, but this or
> > something like it (e.g. using Riak Core) still seems like a better approach
> > than what we have.  In that context, it looks like ZkFarmer[1] might be a
> > big help.  AFAICT someone else was trying to solve almost exactly the same
> > kind of server/config problem that we have, and wrapped their solution into
> > a library.  Is this a direction other devs might be interested in pursuing
> > some day, if/when time allows?
> >
> >
> > [1] https://github.com/rs/zkfarmer
>
> The real issue here is: GlusterFS is a fully distributed system. It is
> OK for config files to be in one place (centralized). It is easier to
> manage and back up. Avati still claims that making distributed copies
> is not a problem (volume operations are fast, versioned and
> checksummed). Also, the code base for replicating 3-way or to all nodes is
> the same. We all need to come to agreement on the demerits of replicating
> the volume spec on every node.
>

My claim is close to what you said literally, but slightly different in
meaning. What I mean is: while it is true that keeping multiple copies of
the volfile is more expensive/resource-consuming in theory, what is the
breaking point, in terms of number of servers, where it begins to matter?
There are trivial (low-hanging) enhancements possible (e.g., storing the
volfiles of a volume only on participating servers instead of all servers)
which could address a class of concerns. There are clear advantages to
having volfiles on at least all the participating nodes: it takes away the
dependency on the boot order of servers in your data centre. If volfiles
are available locally, you don't have to wait/retry for the "central
servers" to come up first. Whether it is volfiles managed by glusterd or
the "storage servers" of ZK, it is a big advantage to have the startup of a
given server decoupled from the others (of course the coupling comes in at
an operational level at the time of volume modifications, but that is much
more acceptable).
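
To make the boot-order point concrete, here is a minimal sketch in Python.
It is not glusterd code: load_volfile, fetch_from_central, CentralStoreDown
and the file name are all hypothetical names made up for illustration, and
the volfile text is a placeholder. It only shows that a server holding a
local copy can start without contacting anyone, while a server without one
is stuck until the central store is reachable.

```python
import os
import tempfile


class CentralStoreDown(Exception):
    """Raised by the hypothetical central config store when unreachable."""


def load_volfile(local_path, fetch_from_central):
    """Return (volfile_text, source).

    Prefer the local copy so startup does not depend on any other
    server being up; fall back to the central store only when there
    is no local copy.
    """
    if os.path.exists(local_path):
        with open(local_path) as f:
            return f.read(), "local"
    # No local copy: startup is now coupled to the central servers
    # and fails (or must retry) until they come up.
    return fetch_from_central(), "central"


def central_is_down():
    """Stand-in for a central store that has not booted yet."""
    raise CentralStoreDown("central servers not booted yet")


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "vol0.vol")  # hypothetical file name

    # Without a local copy, boot fails while the central store is down.
    try:
        load_volfile(path, central_is_down)
        booted_without_copy = True
    except CentralStoreDown:
        booted_without_copy = False

    # With a local copy, boot succeeds regardless of the central store.
    with open(path, "w") as f:
        f.write("volume vol0\nend-volume\n")  # placeholder content
    text, source = load_volfile(path, central_is_down)
```

The same reasoning applies whether the "central store" is a set of ZK
servers or anything else: the local copy is what removes the ordering
constraint at boot.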

If storing volfiles on all servers really seems unnecessary, we should
first come up with real hard numbers (number of servers vs. latency of
volume operations) and then figure out at what point it starts becoming
unacceptably slow. Maybe a good solution is to just propagate the volfiles
in the background while still retaining version info, rather than
introducing a more intrusive change? But we really need the numbers first.
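
As a rough illustration of what "propagate in the background while
retaining version info" could look like, here is a toy Python model. It is
an assumption-laden sketch, not how glusterd behaves today: Node, commit and
propagate are hypothetical names, there is no networking or locking, and
the checksum is just SHA-256 over the volfile text. The point is that the
volume operation itself touches only the originating node, and stale or
previously-down peers catch up on a later pass by comparing versions.

```python
import hashlib


class Node:
    """Hypothetical server holding one volfile plus its version info."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.version = 0
        self.volfile = ""

    def checksum(self):
        return hashlib.sha256(self.volfile.encode()).hexdigest()

    def accept(self, version, volfile):
        """Take an update only if it is newer than what we hold."""
        if version > self.version:
            self.version, self.volfile = version, volfile
            return True
        return False


def commit(origin, new_volfile):
    """Volume operation: bump the version on the originating node only."""
    origin.version += 1
    origin.volfile = new_volfile


def propagate(origin, peers):
    """One background pass: push to reachable peers that are stale.

    Returns the names of peers that could not be updated (here: peers
    that are down), so a later pass can retry them.
    """
    behind = []
    for peer in peers:
        if not peer.up:
            behind.append(peer.name)
        elif peer.version < origin.version:
            peer.accept(origin.version, origin.volfile)
    return behind


a, b, c = Node("a"), Node("b"), Node("c")
c.up = False                      # c is down during the config change

commit(a, "volume vol0 v1")       # returns without waiting on b or c
first_pass = propagate(a, [b, c])

c.up = True                       # c comes back later
second_pass = propagate(a, [b, c])
```

After the second pass all three nodes hold the same version and checksum.
A model like this says nothing about real latency, which is exactly why the
measurements are still needed.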

>
> If we are convinced to keep the config info in one place, ZK is
> certainly a good idea. I personally hate the Java dependency. I still
> struggle with Java dependencies for browser and clojure. I can digest
> that if we are going to adopt Java over Python for future external
> modules. Alternatively we can also look at creating a replicated meta
> system volume. Whatever we adopt, we should keep dependencies and
> installation steps to the bare minimum and simple.
>
>
It is true that other projects have figured out the problem of membership
and configuration management and specialize in doing that. That is very
good for the computing community as a whole. If there are components we can
incorporate, building upon their work, that is very desirable. At the same
time we also need to check what other baggage we inherit along with the
specialized expertise we take on. One of the biggest strengths of Gluster
has been its "lightweight"edness and lack of dependencies, which in turn
has driven our adoption significantly, which in turn results in more
feedback, bug reports, etc. (i.e., it is not an isolated strength in
itself). Forcing a Java dependency down the throats of users who want a
simple distributed filesystem puts that strength at risk. (Yes, a "simple"
distributed filesystem may technically be an oxymoron, but I guess you know
what I mean :) The moment we stop thinking of gluster as a "simple"
distributed filesystem, it's a slippery slope towards it becoming "yet
another" distributed filesystem.) The simplicity is, to a large extent,
what "makes" gluster what it is. This makes the developer's life miserable
to a fair degree, but it always is anyway, one way or another ;)

I am not against adopting external projects. There are often good reasons
to do so. If there are external projects which are "compatible in
personality" with gluster and help us avoid reinventing the wheel, we must
definitely adopt them. If they are not compatible, I'm sure there are
lessons and ideas we can adopt, if not code.

Avati