[Gluster-devel] AFR conservative merge portability
Ravishankar N
ravishankar at redhat.com
Mon Dec 15 07:12:44 UTC 2014
On 12/15/2014 11:10 AM, Krutika Dhananjay wrote:
> Seems OK to me, as long as the appropriate locks are taken.
>
> -Krutika
>
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> *From: *"Emmanuel Dreyfus" <manu at netbsd.org>
> *To: *"Gluster Devel" <gluster-devel at gluster.org>
> *Sent: *Saturday, December 13, 2014 8:08:03 PM
> *Subject: *[Gluster-devel] AFR conservative merge portability
>
> Hello
>
> On NetBSD, tests/basic/afr/entry-self-heal.t always fails in this
> scenario:
>
> mkdir spb_heal
> kill_brick brick0
> touch spb_heal/0
> gluster volume start force
> kill_brick brick1
> touch spb_heal/1
> gluster volume start force
>
> At that point, conservative merge kicks in and copies spb_heal/0 and
> spb_heal/1 to each brick where they are missing. That works, but on NetBSD
> we are left with AFR xattrs on the spb_heal directory in which each brick
> accuses the other for metadata. This is a metadata split brain that will
> not self-heal.
>
> This happens because after an entry is added, the parent directory's
> (spb_heal here) mtime/ctime must be updated. On Linux, it seems the
> filesystem is responsible for that. On NetBSD, the kernel's
> filesystem-independent code takes care of it and sends a SETATTR to
> update ctime/mtime on the parent directory.
>
> So when we touch spb_heal/0 and spb_heal/1, the NetBSD kernel sends a
> SETATTR for the spb_heal ctime/mtime, and since the other brick is down,
> there is our metadata split brain.
>
> In http://review.gluster.org/9267, Krutika Dhananjay fixes the test by
> clearing the AFR xattrs to remove the split-brain state. While that lets
> the test pass, it does not address the real-world problem: a metadata
> split brain that does not self-heal.
>
> Here is a proposal: we know that at the end of conservative merge, the
> directory ctime/mtime should be the ctime of the most recently added
> child. Fortunately, as conservative merge proceeds, the parent
> directory's ctime/mtime are updated on each child addition, so we finish
> in the desired state.
>
> In other words, after conservative merge, a parent directory metadata
> split brain involving only ctime/mtime can simply be cleared by AFR
> without any harm.
>
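To see why neither brick can be chosen as a heal source in the state described above, here is a simplified model. It assumes a two-brick replica in which `pending[i][j]` is the metadata pending count that brick i's changelog xattr records against brick j. The names `NBRICKS`, `pending`, and `find_metadata_source()` are hypothetical. This is not AFR's actual source-selection code. It only illustrates the "each brick accuses the other" rule.

```c
#include <stdbool.h>

#define NBRICKS 2 /* hypothetical two-way replica: brick0 and brick1 */

/*
 * pending[i][j] > 0 means brick i's changelog xattr records that brick j
 * missed a metadata operation, i.e. brick i "accuses" brick j.
 * Returns the index of a brick that no other brick accuses (a usable heal
 * source), or -1 if every brick is accused by another one (split brain).
 */
int find_metadata_source(unsigned int pending[NBRICKS][NBRICKS])
{
    for (int j = 0; j < NBRICKS; j++) {
        bool accused = false;

        for (int i = 0; i < NBRICKS; i++) {
            if (i != j && pending[i][j] > 0) {
                accused = true;
                break;
            }
        }
        if (!accused)
            return j;
    }
    return -1;
}
```

In the test sequence, brick1 records a pending count against brick0 for the first SETATTR, and brick0 records one against brick1 for the second. That gives `{{0, 1}, {1, 0}}`, and the sketch returns -1. If only one of those counts were set, the unaccused brick would be returned as the source.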
That seems like a lot of code just to do this. AFR does data self-heal, metadata self-heal and entry self-heal for a gfid, in that order.
After (conservative) entry self-heal completes, we would have to examine the looked-up iatts of the dir again to see whether the metadata split brain
is due only to a c/mtime mismatch and not to a uid/gid/permission mismatch, and only then clear the metadata xattrs.

The check could be done in metadata self-heal itself, but I don't think AFR is to blame if the user-space app on NetBSD sends a setattr on
the parent dir. As far as AFR is concerned, it witnessed a pending metadata fop on the dir and recorded it in the xattrs. We could end up in this
situation on Linux too if we kill the bricks alternately and just do a `touch /mount/existing_dir_name`.
Since the extra setattr comes from user space, the directory is still listed in the output of the `heal info` command, and we can resolve it.
-Ravi
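The check discussed above decides whether a directory's metadata split brain differs only in ctime/mtime and not in ownership or permissions. A minimal sketch follows. The `dir_attrs` struct and `times_only_mismatch()` are hypothetical stand-ins for the looked-up iatts. They are not GlusterFS's `struct iatt` or any existing AFR function.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical subset of the per-brick attributes looked up for a directory. */
struct dir_attrs {
    uid_t   uid;
    gid_t   gid;
    mode_t  mode;  /* file type and permission bits */
    int64_t mtime; /* seconds since the epoch; ignored by the check */
    int64_t ctime; /* seconds since the epoch; ignored by the check */
};

/*
 * Returns true when the two bricks agree on owner, group and mode, so any
 * metadata mismatch between them is confined to mtime/ctime. Returns false
 * if uid, gid or permissions differ, in which case the split brain must not
 * be cleared automatically.
 */
bool times_only_mismatch(const struct dir_attrs *a, const struct dir_attrs *b)
{
    return a->uid == b->uid && a->gid == b->gid && a->mode == b->mode;
}
```

In the spb_heal scenario, both bricks would report the same owner and mode and differ only in timestamps, so the sketch returns true. If the permission bits differ, it returns false.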
> Does this look reasonable? Any opinion?
>
> --
> Emmanuel Dreyfus
> http://hcpnet.free.fr/pubz
> manu at netbsd.org
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-devel