[Bugs] [Bug 1216303] Fixes for data self-heal in ec
bugzilla at redhat.com
Fri May 8 22:04:22 UTC 2015
https://bugzilla.redhat.com/show_bug.cgi?id=1216303
--- Comment #17 from Anand Avati <aavati at redhat.com> ---
COMMIT: http://review.gluster.org/10691 committed in release-3.7 by
Pranith Kumar Karampuri (pkarampu at redhat.com)
------
commit fae1e70ff3309d2b64febaafc70abcaa2771ecf0
Author: Pranith Kumar K <pkarampu at redhat.com>
Date: Thu Apr 16 09:25:31 2015 +0530
cluster/ec: metadata/name/entry heal implementation for ec
Metadata self-heal (a simplified sketch of steps 3 and 7 follows the list):
1) Take inode lock in domain 'this->name' on the 0-0 range (full file)
2) Perform lookup and get the xattrs on all the bricks
3) Choose the brick with the highest version as the source
4) Setattr uid/gid/permissions
5) Removexattr stale xattrs
6) Setxattr existing/new xattrs
7) Xattrop with negative values for 'dirty' and, for the version xattr, the
difference between the highest version and the brick's own version
8) Unlock the lock acquired in 1)
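
A simplified, hypothetical C sketch of steps 3 and 7 (not the actual ec
sources; the struct and function names are illustrative only):

/*
 * Pick the source brick by highest version (step 3) and compute the
 * per-brick xattrop deltas that clear 'dirty' and bring a lagging
 * brick's version up to the source's (step 7).
 */
#include <stdint.h>

struct brick_state {
        uint64_t version;   /* value of the version xattr read in step 2 */
        uint64_t dirty;     /* value of the 'dirty' xattr read in step 2 */
};

/* Step 3: the brick with the highest version becomes the source. */
static int
pick_source(const struct brick_state *bricks, int count)
{
        int src = 0;

        for (int i = 1; i < count; i++)
                if (bricks[i].version > bricks[src].version)
                        src = i;
        return src;
}

/* Step 7: negate 'dirty' and add the difference between the highest
 * version and the brick's own version. */
static void
heal_deltas(const struct brick_state *b, uint64_t src_version,
            int64_t *version_delta, int64_t *dirty_delta)
{
        *version_delta = (int64_t)(src_version - b->version);
        *dirty_delta   = -(int64_t)b->dirty;
}
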
Entry self-heal (a simplified sketch of the lock ordering follows the list):
1) Take directory lock in domain 'this->name:self-heal' on 'NULL' to prevent
more than one self-heal
2) Take directory lock in domain 'this->name' on 'NULL'
3) Perform lookup, get the version and dirty xattrs and remember the values
4) Unlock the lock acquired in 2)
5) Readdir on all the bricks and trigger name heals
6) Xattrop with negative values for 'dirty' and, for the version xattr, the
difference between the highest version and the brick's own version
7) Unlock the lock acquired in 1)
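
A hypothetical, compilable sketch of the lock ordering in steps 1-7;
lock_dir()/unlock_dir() are stubs standing in for the real lock calls, and
only the two lock domains and their ordering are the point:

#include <stdio.h>

static void lock_dir(const char *domain)   { printf("lock   %s\n", domain); }
static void unlock_dir(const char *domain) { printf("unlock %s\n", domain); }

static void
entry_self_heal(void)
{
        lock_dir("this->name:self-heal");   /* 1) only one self-heal at a time */

        lock_dir("this->name");             /* 2) lock the I/O domain          */
        /* 3) lookup version/dirty xattrs on all bricks and remember them */
        unlock_dir("this->name");           /* 4) release the I/O-domain lock  */

        /* 5) readdir on all the bricks, trigger a name heal per entry */
        /* 6) xattrop: negate 'dirty', add the version differences     */

        unlock_dir("this->name:self-heal"); /* 7) */
}

int main(void)
{
        entry_self_heal();
        return 0;
}
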
Name heal (a simplified sketch of the gfid_db idea follows the list):
1) Take 'name' lock in 'this->name' on 'NULL'
2) Perform lookup on 'name' and get the stat and xattr structures
3) Build gfid_db, where for each gfid we know which subvolumes/bricks have
a file with 'name'
4) Delete all the stale files, i.e. files that do not exist on more than
ec->redundancy bricks
5) On all the subvolumes/bricks where the entry is missing, create 'name'
with the same type, gfid, permissions, etc.
6) Unlock the lock acquired in 1)
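
A hypothetical sketch of the gfid_db idea from steps 3 and 4; the fixed
arrays and the brick bitmask are simplifications, not the real ec
structures:

#include <stdint.h>
#include <string.h>

#define GFID_LEN  16
#define MAX_GFIDS 8

struct gfid_entry {
        uint8_t  gfid[GFID_LEN];
        uint64_t bricks;          /* bitmask of bricks that have this gfid */
};

struct gfid_db {
        struct gfid_entry entries[MAX_GFIDS];
        int               count;
};

/* Step 3: record that 'brick' has a file named 'name' with this gfid. */
static void
gfid_db_add(struct gfid_db *db, const uint8_t *gfid, int brick)
{
        for (int i = 0; i < db->count; i++) {
                if (memcmp(db->entries[i].gfid, gfid, GFID_LEN) == 0) {
                        db->entries[i].bricks |= 1ULL << brick;
                        return;
                }
        }
        if (db->count < MAX_GFIDS) {
                memcpy(db->entries[db->count].gfid, gfid, GFID_LEN);
                db->entries[db->count].bricks = 1ULL << brick;
                db->count++;
        }
}

/* Step 4: a gfid is stale if it exists on no more than 'redundancy'
 * bricks; such entries are the ones that get deleted. */
static int
gfid_is_stale(const struct gfid_entry *e, int redundancy)
{
        return __builtin_popcountll(e->bricks) <= redundancy;
}
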
Known limitation: With the present design, it conservatively preserves the
'name' when it cannot decide whether to delete it. This can happen in the
following scenario:
1) We have a 3=2+1 (bricks: A, B, C) ec volume and one brick is down
(let's say A)
2) rename d1/f1 -> d2/f2 is performed, but the rename succeeds on only one
of the bricks (let's say B)
3) Now name self-heal on d1 and d2 would re-create the file in both d1 and
d2, resulting in both d1/f1 and d2/f2.
Because we wanted to prevent data loss in the case above, the following
scenario is not healable, i.e. it needs manual intervention:
1) We have a 3=2+1 (bricks: A, B, C) ec volume and one brick is down
(let's say A)
2) We have two hard links, d1/a and d2/b, and another file d3/c, even before
the brick went down
3) rename d3/c -> d2/b is performed
4) Now name self-heal on d2/b does not heal, because the d2/b with the older
gfid will not be deleted. One could ask why not delete the link when there
is more than one hard link, but that leads to a data-loss issue similar to
the one described earlier:
Scenario:
1) We have a 3=2+1 (bricks: A, B, C) ec volume and one brick is down
(let's say A)
2) We have two hard links: d1/a, d2/b
3) rename d1/a -> d3/c and rename d2/b -> d4/d are performed, and both
operations succeed on only one of the bricks (let's say B)
4) Now the name self-heals on the 'names' above, which can run in parallel,
can each decide to delete the file thinking it has two links, but after all
the self-heals do their unlinks we are left with data loss.
Change-Id: I3a68218a47bb726bd684604efea63cf11cfd11be
BUG: 1216303
Signed-off-by: Pranith Kumar K <pkarampu at redhat.com>
Reviewed-on: http://review.gluster.org/10298
Reviewed-on: http://review.gluster.org/10691
Tested-by: Gluster Build System <jenkins at build.gluster.com>
Tested-by: NetBSD Build System