[Gluster-devel] [RFC] Consistency issues with DHT after snapshots are taken

Raghavendra G raghavendra at gluster.com
Fri Apr 11 06:55:08 UTC 2014


On Thu, Apr 10, 2014 at 11:02 AM, Raghavendra Gowdappa
<rgowdapp at redhat.com> wrote:

> Hi all,
>
> I was trying to list the consistency issues that can arise. I am not sure
> whether case 5 is a valid one, since lookup would succeed and mkdir would
> fail with EEXIST (scroll down to the case for a more detailed explanation).
>

Case 5 is a valid one. This comment was based on an earlier test case which
seemed to be invalid. Sorry about the confusion.


>
> We are considering a distribute of 3 bricks - b1, b2, b3.
>
> Case 1:
> =======
>
> Operation: rename (src, dst) - dst does not exist
>
> T0: rename successful on Hashed subvol but not on other bricks
> T1: Snapshot on b1, b2, b3
>
> Result: After the snapshot is restored and healing is complete on src and
> dst, we end up with two directories, src and dst, both having the gfid of src
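
To make the Case 1 timeline concrete, here is a toy replay of it. This is
only a sketch, not GlusterFS code: each brick is modelled as a dict of
name -> gfid, b1 is assumed to be the hashed subvol, and heal() is a
hypothetical stand-in for the directory heal done as part of lookup.

```python
# Toy model: each brick maps directory name -> gfid. Not GlusterFS code.
bricks = {b: {"src": "gfid-src"} for b in ("b1", "b2", "b3")}

# T0: rename (src, dst) succeeds only on the hashed subvol (assumed to be b1).
bricks["b1"]["dst"] = bricks["b1"].pop("src")

# T1: snapshot on b1, b2, b3, which is later restored.
restored = {b: dict(entries) for b, entries in bricks.items()}


def heal(bricks, name):
    """Assumed heal behaviour: if `name` is found with a single gfid on some
    brick, recreate it with that gfid on every brick where it is missing."""
    gfids = {entries[name] for entries in bricks.values() if name in entries}
    if len(gfids) != 1:
        return  # absent everywhere, or conflicting gfids: this toy does nothing
    (gfid,) = gfids
    for entries in bricks.values():
        entries.setdefault(name, gfid)


heal(restored, "src")
heal(restored, "dst")
for brick, entries in sorted(restored.items()):
    print(brick, sorted(entries.items()))
```

After both heals, every brick in the model lists src and dst with the same
gfid, which is the result described for this case.
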
>
> Case 2:
> =======
>
> Operation: Two parallel rename (src, dst) and rename (dst, src). Both src
> and dst exist and hash to b1 and b2 respectively
>
> T0: rename (src, dst) successful on b1
> T1: rename (dst, src) successful on b2
> T2: Snapshot on b1, b2, b3
>
> Result:
> After restore, if lookup happens on src and is healed to b1 from b2, gfids
> of src on each brick will be,
> b1 - (src, dst-gfid)
> b2 - (src, dst-gfid)
> b3 - (src, src-gfid)
>
> Case 3:
> =======
>
> Operation: Parallel rename and two mkdirs. Only src exists. Both src and
> dst hash to the same brick, b1.
>
> T0: two lookups triggered as part of application mkdir1 and mkdir2
> complete with ENOENT.
> T1: mkdir2 goes ahead and creates directory with gfid, gfid1
> T2: rename (src, dst) on b1
> T3: mkdir1 (src) on b1
> T4: snapshot on b1, b2 and b3
>
> Result:
> After restore and healing of src and dst, we end up with,
> b1 - (src, gfid2) and (dst, gfid1)
> b2 - (src, gfid1) and (dst, gfid1)
> b3 - (src, gfid1) and (dst, gfid1)
>
> Another reason for this inconsistency is that DHT doesn't consider mkdir
> failures with EEXIST on subvols as failures. More details can be found in
> [2].
>
> Case 4:
> =======
>
> Operation: Parallel rename (src, dst) and rmdir (src). Both src and dst
> exist with gfids gfid1 and gfid2 respectively
>
> T0: rename (src, dst) on b1
> T1: rmdir (src) on b2 and b3
> T2: snapshot on b1, b2 and b3
>
> Result: After restore and healing,
> b1 - (dst, gfid1)
> b2 - (dst, gfid2)
> b3 - (dst, gfid2)
>
> Case 5:
> =======
>
> This bug was hit, and a fix is being reviewed at [1].
>
> Operation: Parallel two rmdir and two mkdirs. Directory dir does not exist
> to start with.
>
> T0: two lookups triggered as part of application mkdir1 and mkdir2
> complete with ENOENT.
> T1: mkdir2 goes ahead and creates directory with gfid, gfid1
> T2: rmdir1 (dir) on b1
> T3: lookup (dir) triggered as part of rmdir2 (or any name based
> operation), heals dir on b1 with gfid, gfid2
> T4: mkdir1 (dir, gfid2) on b2 and b3
> T5: snapshots on b1, b2 and b3
>
> Result:
> b1 - (dir, gfid1)
> b2 - (dir, gfid2)
> b3 - (dir, gfid2)
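
The end states listed in these cases can be checked mechanically. Below is a
small sketch, assuming the same dict-per-brick toy model (name -> gfid);
gfid_mismatches() is a hypothetical helper, not something that exists in
GlusterFS.

```python
from collections import defaultdict


def gfid_mismatches(bricks):
    """Return {name: set of gfids} for every name whose gfid differs
    between bricks. `bricks` maps brick name -> {directory name: gfid}."""
    seen = defaultdict(set)
    for entries in bricks.values():
        for name, gfid in entries.items():
            seen[name].add(gfid)
    return {name: gfids for name, gfids in seen.items() if len(gfids) > 1}


# End state of Case 5 as listed above.
case5 = {
    "b1": {"dir": "gfid1"},
    "b2": {"dir": "gfid2"},
    "b3": {"dir": "gfid2"},
}
print(gfid_mismatches(case5))
```

For the Case 5 state, the helper reports dir with both gfid1 and gfid2.
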
>
> Considering all these issues, the following set of fixes has been proposed:
>
> 1. In posix, if we receive mkdir (dir1) on an existing gfid (with name
> dir2), posix will convert mkdir (dir1) into rename (dir1, dir2). This
> solves case 1.
>
> 2. In case of rename (src, dst), if dst already exists, rmdir (dst), so
> that we don't bring inconsistency into the dst gfid space. This solves all
> the cases of inconsistencies in dst gfid with rename failing.
>
> 3. Hold entrylks in directory heal (part of lookup) and rmdir. This solves
> consistency issues caused by races between mkdir and rmdir.
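
As a rough illustration of fix 1 on a single brick, here is a sketch in the
same toy model. The Brick class is hypothetical and is not the posix
translator. The proposal writes the conversion as rename (dir1, dir2); the
sketch assumes the intent is that the directory already holding the gfid
ends up under the requested name, so that no second directory with the same
gfid is created.

```python
class Brick:
    """Toy stand-in for one posix brick: directory name -> gfid."""

    def __init__(self):
        self.entries = {}

    def mkdir(self, name, gfid):
        if name in self.entries:
            raise FileExistsError(name)
        # Assumed reading of fix 1: if the gfid already exists under another
        # name, move that directory instead of creating a duplicate.
        old = next((n for n, g in self.entries.items() if g == gfid), None)
        if old is not None:
            del self.entries[old]
        self.entries[name] = gfid


# b2 after the Case 1 restore still holds src; the heal of dst arrives as
# mkdir (dst) carrying the gfid of src.
b2 = Brick()
b2.mkdir("src", "gfid-src")
b2.mkdir("dst", "gfid-src")
print(b2.entries)  # {'dst': 'gfid-src'}
```

With this behaviour the brick ends up with a single directory for the gfid
rather than both src and dst.
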
>
> [1] http://review.gluster.org/#/c/4846/
> [2] http://review.gluster.org/4459
>
> regards,
> Raghavendra.



-- 
Raghavendra G