[Gluster-devel] question on glustershd

Emmanuel Dreyfus manu at netbsd.org
Tue Dec 2 16:59:01 UTC 2014


Hi

I have been tracking down a bug reported by /tests/basic/afr/entry-self-heal.t 
on NetBSD, and now I wonder how glustershd is supposed to work. 

In xlators/cluster/afr/src/afr-self-heald.c, we create a healer for
each AFR subvolume. In afr_selfheal_tryinodelk(), each healer performs 
the INODELK on every AFR subvolume, using AFR_ONALL().

The result is that the healers compete for the locks on the same inodes
in the subvolumes. They sometimes conflict, and if we have only two 
subvolumes, we run into this condition:
                if (ret < AFR_SH_MIN_PARTICIPANTS) {
                        /* Either less than two subvols available, or another
                           selfheal (from another server) is in progress. Skip
                           for now in any case there isn't anything to do.
                        */             
                        ret = -ENOTCONN;
                        goto unlock;
                }

Since there is no glustershd doing the work on another server, the entry
will remain unhealed. I believe this is exactly the same problem I am 
trying to address in http://review.gluster.org/9074

What is wrong here? Should there really be a healer for each subvolume, 
or is it the AFR_ONALL() usage that is wrong? Or did I completely miss
something?

-- 
Emmanuel Dreyfus
manu at netbsd.org
