[Gluster-devel] Design for lookup-optimize made default

Raghavendra Gowdappa rgowdapp at redhat.com
Tue Dec 15 06:18:24 UTC 2015



----- Original Message -----
> From: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> To: "Shyam" <srangana at redhat.com>
> Cc: "Gluster Devel" <gluster-devel at gluster.org>
> Sent: Tuesday, December 15, 2015 11:38:03 AM
> Subject: Re: [Gluster-devel] Design for lookup-optimize made default
> 
> 
> 
> ----- Original Message -----
> > From: "Shyam" <srangana at redhat.com>
> > To: "Sakshi Bansal" <sabansal at redhat.com>, "Gluster Devel"
> > <gluster-devel at gluster.org>
> > Sent: Monday, December 14, 2015 10:40:09 PM
> > Subject: Re: [Gluster-devel] Design for lookup-optimize made default
> > 
> > Sakshi,
> > 
> > In the doc there is a reference to the claim that when a client fixes
> > a layout, it assigns the directory a commit hash equal to the volume
> > commit hash. I think this assumption is incorrect: when a client heals
> > the layout, the commit hash is set to 1 (DHT_LAYOUT_HASH_INVALID) [1].
> 
> Yes, you are correct. That's an oversight on my part. Sorry about it :).
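> 
> For reference, a minimal sketch of what the client-side heal does to
> the commit hash (names simplified; the actual assignment is in
> dht-selfheal.c [1]):
> 
>     /* Sketch, not verbatim DHT code: a layout written by a client
>      * heal is stamped with DHT_LAYOUT_HASH_INVALID (1) rather than
>      * the current volume commit hash, so it can never match
>      * conf->vol_commit_hash in a later lookup. */
>     layout->commit_hash = DHT_LAYOUT_HASH_INVALID;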
> 
> > 
> > What the above basically means is that when anyone other than rebalance
> > changes the layout of an existing directory, its commit hash will start
> > disagreeing with the volume commit hash. So that part is already handled
> > (unless I am missing something, in which case it is a bug and we need it
> > fixed).
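> > 
> > A minimal sketch of how that disagreement is caught (the helper name
> > is hypothetical and the logic is illustrative, not verbatim DHT lookup
> > code; the types are as in dht-common.h):
> > 
> >     /* A miss on the hashed subvolume is conclusive only when the
> >      * parent layout carries the current volume commit hash; a
> >      * healed or stale layout forces the lookup to fan out to all
> >      * subvolumes before ENOENT can be returned. */
> >     static gf_boolean_t
> >     dht_lookup_is_conclusive (dht_layout_t *layout, dht_conf_t *conf)
> >     {
> >             return (layout->commit_hash == conf->vol_commit_hash);
> >     }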
> > 
> > The other part, the self-heal, I would consider *not* needed. If a
> > client heals a layout, it is because a previous layout creator
> > (rebalance or mkdir) was incomplete, and hence the client needs to set
> > the layout. If this was rebalance, the rebalance process would have
> > failed and hence would need to be rerun. For abnormal failures on
> > directory creation, I think the proposed solution is heavyweight:
> > lookup-optimize is an *optimization*, so it can devolve into
> > non-optimized modes in such cases. IOW, I am stating that we do not
> > need to do this self-healing.
> 
> If this happens to a large number of directories, the performance hit
> can be large (and it is not merely an optimization: hashing should have
> let us conclusively say when a file is absent, which is part of the
> basic design of DHT that we had strayed away from because of bugs).
> However, the question, as you pointed out, is: can it happen often
> enough? As of now, healing can be triggered for the following reasons:
> 
> 1. There is currently no synchronization between rename (src, dst) and
> healing. There are two cases here:
>    a. Healing of src by a racing lookup on src. This falls in the class
>    of bugs similar to lookup-heal creating directories deleted by a
>    racing rmdir, and hence will be fixed when we fix that class of bugs
>    (the solution is ready; the implementation is pending).
>    b. Healing of dst (as the layout of src breaks the continuum of the
>    dst layout). But again, this is not a problem: rename overwrites dst
>    only if it is an empty directory, and no children need to be healed
>    for an empty directory.
> 
> 2. A race between fix-layout from a rebalance process and lookup-heal
>    from a client. We don't have synchronization between these two as of
>    now, and we *might* end up with too many directories with
>    DHT_LAYOUT_HASH_INVALID set, resulting in poor performance.
> 
> 3. Any failures in layout setting (because of a node going down after
> we choose to heal the layout, setxattr failures, etc.).
> 
> Given the above considerations, I conservatively chose to heal the
> children of directories. I am not sure whether these considerations are
> just theoretical or something realistic that can be hit in the field.
> With the above details, do you still think healing from the self-heal
> daemon is not worth the effort?

And the other thing to note is that once a directory ends up with
DHT_LAYOUT_HASH_INVALID (in a non add/remove-brick scenario), it stays in
that state until a fix-layout is run, or else for the entire lifetime of
the directory.
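
Concretely, every negative lookup under such a directory keeps paying the
all-subvolume fan-out cost. As a hedged sketch (simplified names, not
verbatim rebalance code), the only writer that clears the state is
fix-layout:

    /* Rebalance fix-layout recomputes the directory's layout and
     * stamps it with the current volume commit hash, replacing
     * DHT_LAYOUT_HASH_INVALID so that lookup-optimize can again
     * trust a miss on the hashed subvolume. */
    new_layout->commit_hash = conf->vol_commit_hash;
    /* ... the layout is then written back to disk via setxattr ... */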

> 
> > 
> > I think we still need to handle stale layouts and the lookup (and other
> > problems).
> 
> Yes, the more we avoid spurious heals, the less we need healing from
> the self-heal daemon. In fact, we need healing from the self-heal
> daemon only for those directories on which self-heal was triggered
> spuriously.
> 
> > 
> > [1]
> > https://github.com/gluster/glusterfs/blob/master/xlators/cluster/dht/src/dht-selfheal.c#L1685
> > 
> > On 12/11/2015 06:08 AM, Sakshi Bansal wrote:
> > > The above link may not be accessible to all. In that case please refer to
> > > this:
> > > https://public.pad.fsfe.org/p/dht_lookup_optimize
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
> 

