[Gluster-devel] Design for lookup-optimize made default

Tue Dec 15 06:08:03 UTC 2015

----- Original Message -----
> From: "Shyam" <srangana at redhat.com>
> To: "Sakshi Bansal" <sabansal at redhat.com>, "Gluster Devel" <gluster-devel at gluster.org>
> Sent: Monday, December 14, 2015 10:40:09 PM
> Subject: Re: [Gluster-devel] Design for lookup-optimize made default
> 
> Sakshi,
> 
> In the doc there is a reference to the fact that when a client fixes a
> layout it assigns the same dircommit hash to the layout which is
> equivalent to the vol commit hash. I think this assumption is incorrect:
> when a client heals the layout, the commit hash is set to 1
> (DHT_LAYOUT_HASH_INVALID) [1].

Yes, you are correct. That's an oversight on my part. Sorry about it :).
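
To make the corrected behaviour concrete, here is a minimal Python model (not the actual DHT C code). Only DHT_LAYOUT_HASH_INVALID and its value 1 come from this thread. The function names and the dict-based layout are hypothetical. The sketch assumes lookup-optimize treats a miss on the hashed subvolume as conclusive only when the directory commit hash equals the volume commit hash.

```python
DHT_LAYOUT_HASH_INVALID = 1  # value cited in this thread


def can_skip_lookup_everywhere(dir_commit_hash, vol_commit_hash):
    """Return True if a miss on the hashed subvolume can be treated as final.

    Hypothetical helper: models the assumed lookup-optimize check only.
    """
    if dir_commit_hash == DHT_LAYOUT_HASH_INVALID:
        return False
    return dir_commit_hash == vol_commit_hash


def heal_layout_by_client(layout):
    """Model a client-side heal: the commit hash is invalidated,
    not copied from the volume commit hash."""
    healed = dict(layout)
    healed["commit_hash"] = DHT_LAYOUT_HASH_INVALID
    return healed
```

With a made-up volume commit hash of 42, a directory last written by rebalance passes the check, while the same directory after a client heal does not, so lookups under it fall back to the non-optimized path.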

> 
> What the above basically means is that when anyone other than rebalance
> changes the layout of an existing directory, its commit-hash will start
> disagreeing with the volume commit hash. So that part is already handled
> (unless I am missing something, in which case it is a bug and we need it
> fixed).
> 
> The other part of the self-heal, I would consider *not* needed. If a
> client heals a layout, it is because a previous layout creator
> (rebalance or mkdir) was incomplete, and hence the client needs to set
> the layout. If this was by rebalance, the rebalance process would have
> failed and hence would need to be rerun. For abnormal failures on
> directory creations, I think the proposed solution is heavy weight, as
> lookup-optimize is an *optimization* and so it can devolve into
> non-optimized modes in such cases. IOW, I am stating we do not need to
> do this self healing.

If this happens to a large number of directories, the performance hit can be large. (Also, it's not really an optimization: hashing should have let us say conclusively when a file is absent. That is the basic design of dht, which we had strayed away from because of bugs.) However, the question, as you pointed out, is whether it can happen often enough. As of now, healing can be triggered for the following reasons:

1. As of now, there is no synchronization between rename (src, dst) and healing. There are two cases here:
   a. Healing of src by a racing lookup on src. This falls in the class of bugs similar to lookup-heal creating directories deleted by a racing rmdir, and hence will be fixed when we fix that class of bugs (the solution is ready; implementation is pending).
   b. Healing of dst (as the layout of src breaks the continuum of the dst layout). But again this is not a problem, since rename overwrites dst only if it is an empty directory, and no children need to be healed for an empty directory.

2. Race b/w fix-layout from a rebalance process and lookup-heal from a client.
   We don't have synchronization b/w these two as of now and *might* end up with too many directories with DHT_LAYOUT_HASH_INVALID set, resulting in poor performance.

3. Any failures in layout setting (because of a node going down after we choose to heal the layout, setxattr failures, etc.).
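
Case 2 above can be modelled with a toy last-writer-wins sketch in Python. This is an illustration, not DHT code. It assumes both writers set the directory commit hash with no lock between them. The commit hash value 42 and all names except DHT_LAYOUT_HASH_INVALID are made up.

```python
DHT_LAYOUT_HASH_INVALID = 1  # value cited in this thread
VOL_COMMIT_HASH = 42         # hypothetical volume commit hash


def apply_in_order(writes):
    """Apply unsynchronized layout writes; the last one to land wins."""
    commit_hash = None
    for _writer, value in writes:
        commit_hash = value
    return commit_hash


# Two writers racing on the same directory, with no lock between them.
rebalance = ("rebalance fix-layout", VOL_COMMIT_HASH)
client = ("client lookup-heal", DHT_LAYOUT_HASH_INVALID)

# If the client's heal lands last, the directory is left invalid.
after_client_last = apply_in_order([rebalance, client])
# If rebalance lands last, the directory matches the volume again.
after_rebalance_last = apply_in_order([client, rebalance])
```

Which ordering happens depends on timing, which is why the number of directories left with DHT_LAYOUT_HASH_INVALID is hard to predict without synchronization.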

Given the above considerations, I conservatively chose to heal children of directories. I am not sure whether these considerations are just theoretical or something realistic that can be hit in the field. With the above details, do you still think healing from the self-heal daemon is not worth the effort?

> 
> I think we still need to handle stale layouts and the lookup (and other
> problems).

Yes, the more we avoid spurious heals, the less we need healing from the self-heal daemon. In fact, we need healing from the self-heal daemon only for those directories where self-heal was triggered spuriously.

> 
> [1]
> https://github.com/gluster/glusterfs/blob/master/xlators/cluster/dht/src/dht-selfheal.c#L1685
> 
> On 12/11/2015 06:08 AM, Sakshi Bansal wrote:
> > The above link may not be accessible to all. In that case please refer to
> > this:
> > https://public.pad.fsfe.org/p/dht_lookup_optimize
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel at gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-devel
> >