[Gluster-devel] Glusterd: A New Hope

Tue Mar 26 03:59:04 UTC 2013

On Mon, Mar 25, 2013 at 6:07 AM, Jeff Darcy <jdarcy at redhat.com> wrote:
> On 03/25/2013 05:38 AM, Vidar Hokstad wrote:
>> I see a number of complaints about this as some sort of admission of
>> failure.
>
> I wouldn't quite characterize it as failure.  It does work, after all.
> However, glusterd has kind of reached its limits.  Moving it forward has
> become increasingly difficult, and it must move forward to support
> future scale and features.  There's nothing wrong with hand saws and
> axes for small jobs, but at a certain point you're going to need a
> chainsaw.  We're at that point for glusterd IMO.
>
>>     under its care.  The best known example of such a coordination service
>>     is Apache's ZooKeeper[1], but there are others that don't have the
>>     noxious Java dependency
>>
>> I'm happy you recognise the issue of Java. I'd see having to drag that
>> around as a major barrier. One of the major benefits of glusterfs is the
>> simplicity of deployment compared to many alternatives, and that benefit
>> would be massively diminished if I needed to deal with a Java dependency.
>
> Yeah, I think it's a non-starter.  It's a shame, really, because the
> functionality is good and the people working on ZK are doing a good job.
> Nonetheless, I think the Java dependency is a deal killer.  For what
> it's worth (and this is more to AB's point) I wouldn't favor *any*
> solution that requires users to maintain another component.  I think
> anything we use has to be fully embedded, with low resource needs and
> management completely "under the covers" as far as users are concerned.
> I don't think that's possible with a big ball of Java like ZK.
>
>> I like the Gluster on Gluster idea you mention later on.
>
> I'm a little surprised by the positive reactions to the "Gluster on
> Gluster" approach.  Even though Kaleb and I considered it for HekaFS,
> it's still a bit of a hack.  In particular, we'd still have to solve the
> problems of keeping that private instance available, restarting daemons
> and initiating repair etc. - exactly the problems it's supposed to be
> solving for the rest of the system.
>
>> Apart from
>> that, have you considered pulling out the parts of Glusterd that you'd
>> like to be able to ditch and try to generalize it and see if there'd be
>> any interest in it as a standalone project? Or is too much of what
>> you're looking for new functionality that is not already covered by part
>> of your current codebase?
>
> We don't have anything like ZK ephemerals, and we'd need to add inotify
> support (or something equivalent) as well.  Then again, those features
> would then be exposed to users as well, so it might be worth it.  Maybe
> we should consider how this might be arranged so that parts would be
> useful for things other than GlusterFS itself.  Thanks for the idea.
>
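
For anyone not familiar with the ZK terms: an ephemeral entry is one
that disappears automatically when the session that created it ends,
and a watch is a callback fired when an entry changes.  A minimal
in-process sketch of those two semantics follows.  The Registry class
and all of its method names are hypothetical, made up for illustration;
this is not ZooKeeper's API or anything that exists in glusterd.

```python
from typing import Callable, Dict, List, Tuple

Watcher = Callable[[str, str], None]  # called with (event, key)


class Registry:
    """Hypothetical registry with session-scoped (ephemeral) keys."""

    def __init__(self) -> None:
        self._owner: Dict[str, int] = {}  # key -> owning session id
        self._watchers: List[Watcher] = []
        self._next_session = 0

    def open_session(self) -> int:
        self._next_session += 1
        return self._next_session

    def watch(self, watcher: Watcher) -> None:
        # inotify-like: the watcher hears about every create and delete.
        self._watchers.append(watcher)

    def create_ephemeral(self, session: int, key: str) -> None:
        self._owner[key] = session
        self._notify("created", key)

    def close_session(self, session: int) -> None:
        # Closing a session (or detecting that its holder has died)
        # removes every key the session owned and tells the watchers.
        for key in [k for k, s in self._owner.items() if s == session]:
            del self._owner[key]
            self._notify("deleted", key)

    def exists(self, key: str) -> bool:
        return key in self._owner

    def _notify(self, event: str, key: str) -> None:
        for watcher in self._watchers:
            watcher(event, key)


registry = Registry()
events: List[Tuple[str, str]] = []
registry.watch(lambda event, key: events.append((event, key)))

session = registry.open_session()
registry.create_ephemeral(session, "bricks/server1")
alive_before = registry.exists("bricks/server1")
registry.close_session(session)  # e.g. the daemon on server1 went away
alive_after = registry.exists("bricks/server1")
```

The point of the pattern is that liveness tracking needs no explicit
cleanup: the entry for a dead daemon goes away with its session, and
whoever is watching finds out.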
>>     * Membership: a certain small set of servers (three or more) would be
>>     manually set up as coordination-service masters (e.g. via "peer probe
>>     xxx as master").
>>
>> Careful here. Again, a big advantage of Gluster to users is to not need
>> any "special" servers that require other treatment. I recognise there's
>> a bootstrap problem, but to whatever extent possible, at the very least
>> try to make this transparent to users (e.g. have the cluster
>> automatically make more of the nodes take on coordination-service roles
>> if any are lost etc.).
>
> I'm a little wary of trying to hide this from users.  The coordination
> servers should be chosen to minimize the risk of correlated failure, and
> we currently lack the topological awareness (e.g. which server is in
> which rack or attached to which switch) to do that properly.  If we just
> do something like "first three servers to be configured become
> configuration servers" then we run a very high risk of choosing exactly
> those servers that are most likely to fail together.  :(  As long as the
> extra configuration is limited to one option on "peer probe", is it
> really a problem?
>
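
To make the correlated-failure concern concrete, here is a minimal
sketch comparing "first three servers configured" with a rack-aware
pick.  It assumes the admin supplies a server-to-rack map, which is
exactly the topology information we do not have today; the function
names and the server and rack labels are hypothetical.

```python
from collections import defaultdict
from itertools import zip_longest
from typing import Dict, List


def first_n(rack_of: Dict[str, str], count: int = 3) -> List[str]:
    """Naive policy: the first `count` servers in name order."""
    return sorted(rack_of)[:count]


def pick_coordinators(rack_of: Dict[str, str], count: int = 3) -> List[str]:
    """Pick `count` servers, spread over as many racks as possible."""
    by_rack: Dict[str, List[str]] = defaultdict(list)
    for server in sorted(rack_of):
        by_rack[rack_of[server]].append(server)
    # Round-robin over racks: take one server from every rack before
    # taking a second server from any rack.
    picked: List[str] = []
    for layer in zip_longest(*(by_rack[rack] for rack in sorted(by_rack))):
        picked.extend(server for server in layer if server is not None)
    return picked[:count]


servers = {
    "s1": "rackA", "s2": "rackA", "s3": "rackA",
    "s4": "rackB", "s5": "rackC",
}
naive = first_n(servers)
spread = pick_coordinators(servers)
naive_racks = {servers[s] for s in naive}
spread_racks = {servers[s] for s in spread}
```

With this sample map the naive pick lands entirely in one rack, while
the rack-aware pick uses three different racks.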

A gluster meta-volume plus ZeroMQ for notification (pub/sub) will
largely solve our problems and still be lightweight.  In a large-scale
deployment, it is not a good idea to declare every server a
coordination server.  Since the meta-volume is a regular distributed
replicated gluster volume, it can always be expanded later, depending
on the load and availability requirements.
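
A rough sketch of the shape I have in mind, under heavy assumptions:
a plain dict stands in for files on the meta-volume, and a small
in-process bus stands in for the pub/sub channel.  This is not ZeroMQ
code; ConfigBus, MetaVolume and the key names are hypothetical and
only illustrate the "write config, then notify subscribers by topic
prefix" pattern.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Callback = Callable[[str, str], None]  # called with (topic, payload)


class ConfigBus:
    """In-process stand-in for a pub/sub notification channel."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, prefix: str, callback: Callback) -> None:
        # Subscribers register interest in a topic prefix.
        self._subs[prefix].append(callback)

    def publish(self, topic: str, payload: str) -> int:
        # Deliver to every subscriber whose prefix matches the topic;
        # return the number of deliveries.
        delivered = 0
        for prefix, callbacks in list(self._subs.items()):
            if topic.startswith(prefix):
                for callback in callbacks:
                    callback(topic, payload)
                    delivered += 1
        return delivered


class MetaVolume:
    """Hypothetical config store; a dict stands in for the volume."""

    def __init__(self, bus: ConfigBus) -> None:
        self._bus = bus
        self._data: Dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        # Store first, then notify, so a subscriber that reads back on
        # notification sees the new value.
        self._data[key] = value
        self._bus.publish(key, value)

    def read(self, key: str) -> str:
        return self._data[key]


bus = ConfigBus()
store = MetaVolume(bus)
seen: List[Tuple[str, str]] = []
bus.subscribe("volumes/", lambda topic, payload: seen.append((topic, payload)))

store.write("volumes/vol0/replica", "3")
store.write("peers/server1", "connected")  # nobody subscribed to peers/
```

Only servers that care about a given part of the configuration need to
subscribe to it, which is why not every server has to be a
coordination server.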

-- 
-ab

Imagination is more important than knowledge --Albert Einstein