[Gluster-devel] Design for lookup-optimize made default

Raghavendra Gowdappa rgowdapp at redhat.com
Tue Dec 15 09:34:04 UTC 2015


> > > 
> > > Sakshi,
> > > 
> > > In the doc there is a reference to the fact that when a client fixes a
> > > layout it assigns the layout a dir commit hash equal to the vol commit
> > > hash. I think this assumption is incorrect: when a client heals the
> > > layout, the commit hash is set to 1 (DHT_LAYOUT_HASH_INVALID) [1].
> > 
> > Yes. You are correct. That's an oversight on my part. Sorry about it :).
> > 
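To make this concrete for anyone following the thread: below is a minimal sketch of the check lookup-optimize depends on, with simplified placeholder types rather than the actual dht structures. The point is that a commit hash of DHT_LAYOUT_HASH_INVALID does not match the volume commit hash, so a client-healed layout always falls back to the fan-out lookup.

/* Sketch only: simplified placeholder types, not the actual dht
 * structures or code. lookup-optimize can only trust "file absent on
 * the hashed subvolume => ENOENT" when the directory layout carries the
 * current volume commit hash. A client-healed layout carries
 * DHT_LAYOUT_HASH_INVALID, which does not match the volume commit hash,
 * so such directories fall back to the fan-out lookup. */

#include <stdbool.h>
#include <stdint.h>

#define DHT_LAYOUT_HASH_INVALID 1       /* stamped by client-side layout heal */

struct dir_layout {
        uint32_t commit_hash;           /* recorded on the directory layout */
};

struct volume_state {
        uint32_t vol_commit_hash;       /* bumped by rebalance/fix-layout */
};

bool
lookup_optimize_can_skip_fanout(const struct dir_layout *dl,
                                const struct volume_state *vs)
{
        if (dl->commit_hash == DHT_LAYOUT_HASH_INVALID)
                return false;           /* healed layout: be pessimistic */

        return dl->commit_hash == vs->vol_commit_hash;
}
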
> > > 
> > > What the above basically means is that when anyone other than rebalance
> > > changes the layout of an existing directory, its commit-hash will start
> > > disagreeing with the volume commit hash. So that part is already handled
> > > (unless I am missing something, in which case it is a bug and we need it
> > > fixed).
> > > 
> > > The other part, the self-heal, I would consider *not* needed. If a
> > > client heals a layout, it is because a previous layout creator
> > > (rebalance or mkdir) was incomplete, and hence the client needs to set
> > > the layout. If this was by rebalance, the rebalance process would have
> > > failed and hence would need to be rerun. For abnormal failures on
> > > directory creations, I think the proposed solution is heavyweight, as
> > > lookup-optimize is an *optimization* and so it can devolve into
> > > non-optimized modes in such cases. IOW, I am stating we do not need to
> > > do this self-healing.
> > 
> > If this happens to a large number of directories, then the performance hit
> > can be large (and it's not just an optimization: hashing should've helped
> > us conclusively say when a file is absent, and that is the basic design of
> > dht, from which we had strayed because of bugs). However, the question, as
> > you pointed out, is: can it happen often enough? As of now, healing can be
> > triggered for the following reasons:
> > 
> > 1. As of now, there is no synchronization between rename (src, dst) and
> >    healing. There are two cases here:
> >    a. Healing of src by a racing lookup on src. This falls in the class of
> >       bugs similar to lookup-heal creating directories deleted by a racing
> >       rmdir, and hence will be fixed when we fix that class of bugs (the
> >       solution to which is ready; implementation is pending).
> >    b. Healing of dst (as the layout of src breaks the continuum of the dst
> >       layout). But again this is not a problem: rename overwrites dst only
> >       if it is an empty directory, and no children need to be healed for
> >       an empty directory.
> > 
> > 2. Race b/w fix-layout from a rebalance process and lookup-heal from a
> >    client. We don't have synchronization b/w these two as of now and
> >    *might* end up with too many directories with DHT_LAYOUT_HASH_INVALID
> >    set, resulting in poor performance.
> > 
> > 3. Any failures in layout setting (because of a node going down after we
> >    choose to heal the layout, setxattr failures, etc.).
> > 
> > Given the above considerations, I conservatively chose to heal the
> > children of directories. I am not sure whether these considerations are
> > just theoretical or something realistic that can be hit in the field. With
> > the above details, do you still think healing from the self-heal daemon is
> > not worth the effort?
> 
> And the other thing to note is that once a directory ends up with
> DHT_LAYOUT_HASH_INVALID (in a non add/remove-brick scenario), it stays in
> that state till a fix-layout is run, or for the entire lifetime of the
> directory.

Another case where we might end up with DHT_LAYOUT_HASH_INVALID for a directory is a race b/w a lookup on a directory and a mkdir of the same name. If lookup wins the race and sets the layout, we'll have the invalid hash set on the layout.
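To spell out who writes what in these cases, a rough sketch (simplified placeholder types and names, not the real dht code): only a rebalance/fix-layout stamps the directory with the current volume commit hash, whereas a client-side heal, like the lookup that wins the race above, stamps DHT_LAYOUT_HASH_INVALID, and the directory stays that way until a fix-layout runs.

/* Sketch only: simplified placeholder types and names, not the real dht
 * code. */

#include <stdint.h>

#define DHT_LAYOUT_HASH_INVALID 1

struct dir_layout   { uint32_t commit_hash; };
struct volume_state { uint32_t vol_commit_hash; };

/* rebalance/fix-layout: the one writer that stamps the directory with
 * the current volume commit hash, re-enabling lookup-optimize for it */
void
rebalance_fix_layout(struct dir_layout *dl, const struct volume_state *vs)
{
        dl->commit_hash = vs->vol_commit_hash;
}

/* client-side heal (e.g. the lookup that wins the race against mkdir):
 * it cannot claim the layout is consistent cluster-wide, so it stamps
 * the invalid hash, and the directory stays that way until a fix-layout
 * is run */
void
client_lookup_heal(struct dir_layout *dl)
{
        dl->commit_hash = DHT_LAYOUT_HASH_INVALID;
}
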

I was worried about these unknowns, and I tried to solve this by having a fall-back option: a heal by the self-heal daemon in case we end up with an invalid hash. It seemed easier to
1. identify the scenarios where we might heal, and add the directory to an index, and
2. poll the index, heal the children of the entries found, and remove each entry from the index (a rough sketch of this loop follows below).

This fall-back option, I think, helps us recover, instead of assuming that not many use cases lead to an invalid hash on a directory.
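Roughly, the self-heal daemon side of this fall-back would behave like the sketch below. It is only illustrative: the in-memory list, the function names and the trivial heal helper are placeholders; the real implementation would sit on the on-disk index and the existing dht layout-heal paths.

/* Sketch only: the in-memory list, names and the trivial heal helper
 * are placeholders; the real implementation would use the on-disk index
 * and the existing dht layout-heal code. */

#include <stdio.h>
#include <stdlib.h>

struct index_entry {
        char                path[256];
        struct index_entry *next;
};

static struct index_entry *heal_index;

/* 1. when we detect a scenario that left an invalid commit hash,
 *    add the directory to the index */
static void
index_add(const char *path)
{
        struct index_entry *e = calloc(1, sizeof(*e));

        if (!e)
                return;
        snprintf(e->path, sizeof(e->path), "%s", path);
        e->next = heal_index;
        heal_index = e;
}

/* placeholder for "fix the layouts of the children and stamp the
 * commit hash" */
static int
heal_children(const char *path)
{
        printf("healing children of %s\n", path);
        return 0;
}

/* 2. poll the index, heal the children of each entry and remove the
 *    entry on success; failures stay queued for the next poll */
static void
poll_and_heal(void)
{
        struct index_entry **pp = &heal_index;

        while (*pp) {
                struct index_entry *e = *pp;

                if (heal_children(e->path) == 0) {
                        *pp = e->next;
                        free(e);
                } else {
                        pp = &e->next;
                }
        }
}

int
main(void)
{
        index_add("/dir-with-invalid-hash");
        poll_and_heal();
        return 0;
}

The main point is that detection (adding to the index) and healing (the poll loop) are decoupled, so a failed heal simply stays in the index for the next poll.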
> 
> > 
> > > 
> > > I think we still need to handle stale layouts and the lookup (and other
> > > problems).
> > 
> > Yes, the more we avoid spurious heals, the less we need healing from the
> > self-heal daemon. In fact we need healing from the self-heal daemon only
> > for those directories where self-heal was triggered spuriously.
> > 
> > > 
> > > [1]
> > > https://github.com/gluster/glusterfs/blob/master/xlators/cluster/dht/src/dht-selfheal.c#L1685
> > > 
> > > On 12/11/2015 06:08 AM, Sakshi Bansal wrote:
> > > > The above link may not be accessible to all. In that case please refer
> > > > to
> > > > this:
> > > > https://public.pad.fsfe.org/p/dht_lookup_optimize