[Gluster-devel] AFR conservative merge portability

Ravishankar N ravishankar at redhat.com
Mon Dec 15 07:12:44 UTC 2014


On 12/15/2014 11:10 AM, Krutika Dhananjay wrote:
> Seems OK to me, as long as the appropriate locks are taken.
> 
> -Krutika
> 
> ------------------------------------------------------------
> 
>     *From: *"Emmanuel Dreyfus" <manu at netbsd.org>
>     *To: *"Gluster Devel" <gluster-devel at gluster.org>
>     *Sent: *Saturday, December 13, 2014 8:08:03 PM
>     *Subject: *[Gluster-devel] AFR conservative merge portability
> 
>     Hello
> 
>     On NetBSD, tests/basic/afr/entry-self-heal.t always fails in this
>     scenario:
> 
>     mkdir spb_heal
>     kill_brick brick0
>     touch spb_heal/0
>     gluster volume start force
>     kill_brick brick1
>     touch spb_heal/1
>     gluster volume start force
> 
>     At that point, conservative merge kicks in and copies spb_heal/0 and
>     spb_heal/1 to each brick where it is missing. That works, but on NetBSD
>     we are left with AFR xattrs on the spb_heal directory saying that each
>     brick accuses the other of pending metadata operations. This is a
>     metadata split brain that will not self-heal.
> 
>     This happens because after adding an entry, the parent directory's
>     (spb_heal here) mtime/ctime must be updated. On Linux, it seems the
>     filesystem itself is responsible for that. On NetBSD, the kernel's
>     filesystem-independent code takes care of it and sends a SETATTR to
>     update ctime/mtime on the parent directory.
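A quick way to observe the Linux side of this (plain Python, nothing Gluster-specific; illustrative only):

```python
import os
import tempfile

# On Linux, the filesystem itself bumps the parent directory's mtime/ctime
# when an entry is added; no separate SETATTR arrives from the kernel's
# filesystem-independent layer.
d = tempfile.mkdtemp()
os.utime(d, (0, 0))                           # pin the directory to a known old mtime
before = os.stat(d).st_mtime
open(os.path.join(d, "child"), "w").close()   # add an entry, like `touch spb_heal/0`
after = os.stat(d).st_mtime
print(after > before)                         # the parent's mtime moved with no explicit setattr
```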
> 
>     So when we touch spb_heal/0 and spb_heal/1, the NetBSD kernel sends a
>     SETATTR for spb_heal's ctime/mtime, and since the other brick is down
>     each time, we end up with a metadata split brain.
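A toy model of the resulting bookkeeping (names and structure invented for illustration; this is not the real AFR implementation, which records pending counts in trusted.afr.* xattrs):

```python
# Each brick that is up records a pending metadata operation against every
# brick that is down when the parent directory's SETATTR arrives.
pending = {"brick0": {"brick1": 0}, "brick1": {"brick0": 0}}

def parent_setattr(up_bricks):
    for brick in up_bricks:
        for other in pending[brick]:
            if other not in up_bricks:
                pending[brick][other] += 1

parent_setattr({"brick1"})   # touch spb_heal/0: brick0 is down
parent_setattr({"brick0"})   # touch spb_heal/1: brick1 is down

# Each brick now accuses the other -- a metadata split brain neither can win.
print(pending["brick0"]["brick1"], pending["brick1"]["brick0"])
```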
> 
>     In http://review.gluster.org/9267, Krutika Dhananjay fixes the test by
>     clearing the AFR xattrs to remove the split-brain state. While that
>     lets the test pass, it does not address the real-world problem: a
>     metadata split brain that does not self-heal.
> 
>     Here is a proposal: we know that at the end of a conservative merge we
>     should end up with the directory's ctime/mtime equal to the ctime of
>     the most recently added child. Fortunately, as the conservative merge
>     proceeds, the parent directory's ctime/mtime are updated on each child
>     addition, so we finish in the desired state.
> 
>     In other words, after a conservative merge, a parent directory
>     metadata split brain involving only ctime/mtime can be cleared by AFR
>     without any harm.
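Following that argument, the invariant can be sketched with made-up values (illustrative only, not real code):

```python
# As the conservative merge re-creates each missing entry, the parent's
# mtime is bumped on every addition, so the directory converges on the
# change time of the most recently added child.
child_ctimes = [100.0, 105.0, 103.0]   # made-up ctimes of the merged entries

parent_mtime = 0.0
for ctime in sorted(child_ctimes):     # entries re-created during the merge
    parent_mtime = ctime               # each creation updates the parent

print(parent_mtime == max(child_ctimes))
```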
> 


Seems like adding a lot of code just to do this. For a given gfid, AFR performs data self-heal, metadata self-heal and entry self-heal, in that order.
After the (conservative) entry self-heal completes, we would again have to examine the looked-up iatts of the dir to see whether the metadata split brain
is due only to a ctime/mtime mismatch, and not to a uid/gid/permissions mismatch, and only then clear the metadata xattrs.
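A hypothetical sketch of such a check (invented field names; not the actual AFR code):

```python
# Given the looked-up iatts of a directory from two bricks, the mismatch is
# safely clearable only if ownership and permissions agree, i.e. the
# difference is confined to ctime/mtime.
def mismatch_is_time_only(iatt1, iatt2):
    significant = ("uid", "gid", "mode")
    return all(iatt1[k] == iatt2[k] for k in significant)

a = {"uid": 0, "gid": 0, "mode": 0o755, "mtime": 100, "ctime": 100}
b = {"uid": 0, "gid": 0, "mode": 0o755, "mtime": 200, "ctime": 200}
print(mismatch_is_time_only(a, b))   # only the times differ: clearable

b["mode"] = 0o700
print(mismatch_is_time_only(a, b))   # permissions differ too: real split brain
```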

The check could be done in metadata self-heal itself, but I don't think AFR is to blame if user space in NetBSD sends a setattr on
the parent dir. As far as AFR is concerned, it witnessed a pending metadata fop on the dir and recorded it in the xattrs. We could end up in the same
situation on Linux too, if we kill the bricks alternately and just do a `touch /mount/existing_dir_name`.

Since the extra setattr comes from user space, the directory is still listed in the output of the `heal info` command, and we can resolve it from there.

-Ravi





>     Does this look reasonable? Any opinions?
> 
>     -- 
>     Emmanuel Dreyfus
>     http://hcpnet.free.fr/pubz
>     manu at netbsd.org
>     _______________________________________________
>     Gluster-devel mailing list
>     Gluster-devel at gluster.org
>     http://supercolony.gluster.org/mailman/listinfo/gluster-devel
> 
> 
> 
> 
> 


