[Gluster-devel] [RFC] storage/posix: healing parent gfid xattrs

Raghavendra Gowdappa rgowdapp at redhat.com
Mon Dec 2 05:40:15 UTC 2013

----- Original Message -----
> From: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> To: "Gluster Devel" <gluster-devel at nongnu.org>
> Cc: "Brian Foster" <bfoster at redhat.com>, "Pranith Kumar Karampuri" <pkarampu at redhat.com>, "Krishnan Parthasarathi"
> <kparthas at redhat.com>, "Ramana Raja" <rraja at redhat.com>
> Sent: Monday, December 2, 2013 10:59:26 AM
> Subject: [RFC] storage/posix: healing parent gfid xattrs
> Hi all,
> With nameless lookups [1], all files and directories present in gluster file
> systems can be accessed in a flat namespace addressed by gfids. This lets
> us bypass hierarchical path-based resolution and operate normally
> with just gfid-based resolution. One of the primary users of this feature is
> the gluster-nfs server. However, there are other scenarios too (like a brick
> rebooting while clients are connected to it) where we end up doing
> only gfid-based resolution. While this works for most of our use cases,
> some features rely on the path-based hierarchy. Quota and
> (an earlier version of) geo-replication are two such features. Hence we
> brought in a feature which, given a gfid, can return all possible paths from
> root to that gfid [2]. This feature stores one key per parent directory of
> the file, where the key name is a function of the parent directory's gfid.
> The value of each key is the number of hardlinks to the file in that parent
> directory.
> [root@unused gfs]# mkdir dir1 dir2
> [root@unused gfs]# touch dir1/file
> [root@unused gfs]# link dir1/file dir1/link
> [root@unused gfs]# link dir1/file dir2/link
> # pgfid keys and their values:
> [root@unused gfs]# getfattr -e hex -d -m "trusted.pgfid.*" dir1/file
> # file: dir1/file
> trusted.pgfid.117d8ae9-4111-44dc-9c67-29ad676876ec=0x00000001
> trusted.pgfid.addcacb9-30cb-4555-a450-9aa45598f445=0x00000002
> # gfids of directories dir1 and dir2, read from the backend directories:
> [root@unused gfs]# getfattr -e hex -n "trusted.gfid" /home/export/ptop/dir1
> trusted.gfid=0xaddcacb930cb4555a4509aa45598f445
> [root@unused gfs]# getfattr -e hex -n "trusted.gfid" /home/export/ptop/dir2
> trusted.gfid=0x117d8ae9411144dc9c6729ad676876ec
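To make the naming in the transcript concrete, here is a small C sketch (not GlusterFS code; `pgfid_key`, `pgfid_encode`, `pgfid_decode` and `PGFID_KEY_SIZE` are hypothetical names). It builds the key from a parent directory's 16-byte gfid in the canonical 8-4-4-4-12 UUID form and encodes the link count as 4 big-endian bytes, which is what the `0x00000001`/`0x00000002` values printed by getfattr above suggest. Fed dir1's gfid it produces `trusted.pgfid.addcacb9-30cb-4555-a450-9aa45598f445`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PGFID_PREFIX "trusted.pgfid."
/* prefix + 36-character UUID string + terminating NUL */
#define PGFID_KEY_SIZE (sizeof(PGFID_PREFIX) - 1 + 36 + 1)

/* Build the pgfid key for a parent directory from its 16-byte gfid. */
void pgfid_key(const unsigned char g[16], char *buf, size_t len)
{
    assert(len >= PGFID_KEY_SIZE);
    snprintf(buf, len,
             PGFID_PREFIX "%02x%02x%02x%02x-%02x%02x-%02x%02x-"
             "%02x%02x-%02x%02x%02x%02x%02x%02x",
             g[0], g[1], g[2], g[3], g[4], g[5], g[6], g[7],
             g[8], g[9], g[10], g[11], g[12], g[13], g[14], g[15]);
}

/* Encode a link count as 4 big-endian bytes (assumed on-disk format). */
void pgfid_encode(uint32_t nlinks, unsigned char out[4])
{
    out[0] = (unsigned char)(nlinks >> 24);
    out[1] = (unsigned char)(nlinks >> 16);
    out[2] = (unsigned char)(nlinks >> 8);
    out[3] = (unsigned char)nlinks;
}

/* Inverse of pgfid_encode. */
uint32_t pgfid_decode(const unsigned char in[4])
{
    return (uint32_t)in[0] << 24 | (uint32_t)in[1] << 16 |
           (uint32_t)in[2] << 8 | (uint32_t)in[3];
}
```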
> For more details on how these pgfid xattrs are used to construct paths,
> please refer to [2]. The problem this RFC tries to solve is: if pgfid xattrs
> are absent on a file, how can we self-heal them? A couple of use cases where
> we can run into this problem are:
> 1. Quota is enabled on pre-existing data.
> 2. A brick crashes after completing an operation that creates/deletes a
> dentry, but before that operation is reflected in the pgfid xattrs. One way
> of bringing in crash consistency is to remove the pgfid xattrs and heal them
> afresh.
> The fundamental problem here is calculating the correct number of links to
> a file in a directory without serializing other dentry operations against
> the self-heal operation. It seems inevitable that we have to read the entire
> parent directory to figure out the number of links, since the inode's
> st_nlink only gives the total across all parent directories. The following
> solution (it seems to me) can solve the issue with the least impact on other
> dentry operations:
> 1. In a path-based lookup (where we have the parent gfid), check for the
> pgfid xattr. If it is present, no self-heal is needed; if it is absent,
> proceed with the self-heal operation outlined below. Self-heal can also be
> triggered for all the files in a directory during a lookup on the directory.
> 2. Before starting the heal, record the current time (in seconds, to compare
> against st_ctime):
>     gettimeofday (&tv, NULL); self_heal_start = tv.tv_sec
> 3. Read the entire contents of the directory and store each file (probably in
> an in-memory data structure like a list or an rb-tree) along with its correct
> link count in the directory represented by pgfid.
> 4. For each entry in the above list, do
>    a. acquire a lock used exclusively for synchronization between the
>    self-heal code and other operations modifying dentries that point to this
>    file (say inode->dentry_lock)
>       lock (inode->dentry_lock)
>    b. stat the file into inode_stbuf (while holding the lock), then
>       if (inode_stbuf.st_ctime < self_heal_start) {
>           setxattr pgfid key with correct link count on the file
>       }
>    c. unlock (inode->dentry_lock)
> 5. All the other operations modifying dentries (like create, link, rename,
> unlink, mknod, symlink) have to acquire inode->dentry_lock before
> adding/deleting dentries.
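Steps 2 to 4 could be sketched in C roughly as below. This is a minimal illustration under stated assumptions, not GlusterFS code: `scan_links`, `heal_dir`, `struct link_rec` and `struct heal_ops` are hypothetical names, the `lock`/`unlock` hooks stand in for inode->dentry_lock, and `set_pgfid` stands in for the setxattr of the pgfid key (a real brick would also skip files whose key is already present). It reads the directory once, counting names per inode number, then re-stats each file under the lock and trusts the count only if the file's ctime predates the start of the heal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* One record per inode seen in the directory (flat array, linear search). */
struct link_rec {
    ino_t ino;
    unsigned nlinks;   /* names in THIS directory that point at ino */
    char name[256];    /* one of those names, used to re-stat in step 4 */
};

/* Hypothetical hooks; any of them may be NULL. */
struct heal_ops {
    void (*lock)(ino_t ino);                               /* dentry_lock */
    void (*unlock)(ino_t ino);
    void (*set_pgfid)(const char *name, unsigned nlinks);  /* setxattr */
};

/* Step 3: count links per inode. Returns record count or -1; caller frees. */
ssize_t scan_links(DIR *d, struct link_rec **out)
{
    struct link_rec *recs = NULL;
    size_t n = 0, cap = 0;
    struct dirent *de;
    struct stat st;

    while ((de = readdir(d)) != NULL) {
        /* Skip vanished entries and directories (never hard-linked). */
        if (fstatat(dirfd(d), de->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0 ||
            S_ISDIR(st.st_mode))
            continue;
        size_t i = 0;
        while (i < n && recs[i].ino != st.st_ino)
            i++;
        if (i == n) {                    /* first name seen for this inode */
            if (n == cap) {
                cap = cap ? 2 * cap : 16;
                struct link_rec *tmp = realloc(recs, cap * sizeof *recs);
                if (tmp == NULL) {
                    free(recs);
                    return -1;
                }
                recs = tmp;
            }
            recs[n].ino = st.st_ino;
            recs[n].nlinks = 0;
            snprintf(recs[n].name, sizeof recs[n].name, "%s", de->d_name);
            n++;
        }
        recs[i].nlinks++;
    }
    *out = recs;
    return (ssize_t)n;
}

/* Steps 2 and 4. Returns how many pgfid keys were set, or -1 on error. */
int heal_dir(const char *path, const struct heal_ops *ops)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    time_t self_heal_start = tv.tv_sec;              /* step 2 */

    DIR *d = opendir(path);
    if (d == NULL)
        return -1;
    struct link_rec *recs;
    ssize_t n = scan_links(d, &recs);
    if (n < 0) {
        closedir(d);
        return -1;
    }

    int healed = 0;
    struct stat st;
    for (ssize_t i = 0; i < n; i++) {
        if (ops->lock)
            ops->lock(recs[i].ino);                  /* 4a */
        /* 4b: a ctime at or after the heal start means a dentry operation
         * may have raced with the scan, so the count is not trusted. */
        if (fstatat(dirfd(d), recs[i].name, &st, AT_SYMLINK_NOFOLLOW) == 0 &&
            st.st_ino == recs[i].ino && st.st_ctime < self_heal_start) {
            if (ops->set_pgfid)
                ops->set_pgfid(recs[i].name, recs[i].nlinks);
            healed++;
        }
        if (ops->unlock)
            ops->unlock(recs[i].ino);                /* 4c */
    }
    free(recs);
    closedir(d);
    return healed;
}
```

Because st_ctime and tv_sec are both whole seconds, a file changed in the same second the heal started is treated as racing and skipped, which errs on the side of not healing rather than writing a stale count.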
> This solution has a caveat: because of the check in 4b, the pgfid xattr may
> never get healed if the file keeps changing during every heal attempt. We
> can work around this situation by storing the number of failed heal
> attempts in the same pgfid xattr key

As negative values, to distinguish self-heal failure counts from link counts.

> and when failures cross a certain
> value, we can:
> # remember: this code executes while holding inode->dentry_lock
> 4b.1 if ((inode_stbuf.st_ctime >= self_heal_start) && (self_heal_failures >=
> permissible_failures)) {
>             read the entire parent directory
>             calculate the link count for this file
>             set the pgfid key with correct link count
>      }
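The failure-count encoding and the 4b/4b.1 decision could be expressed as below. Again this is a hypothetical sketch: `pgfid_failures`, `pgfid_record_failure`, `pgfid_heal_action` and the `permissible` parameter are illustrative names, and it assumes the xattr value has already been decoded into a signed 32-bit integer where a positive value is a link count and a negative value -k records k failed heal attempts. The full rescan itself (re-reading the parent directory while holding inode->dentry_lock) is left to the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Number of recorded heal failures; 0 for a link count (or 0). */
int32_t pgfid_failures(int32_t v)
{
    if (v >= 0)
        return 0;
    return v == INT32_MIN ? INT32_MAX : -v;   /* avoid overflow on -INT32_MIN */
}

/* Value to store after a skipped heal. Call only when the key is absent
 * (pass 0) or already holds a failure count; saturates at -INT32_MAX. */
int32_t pgfid_record_failure(int32_t v)
{
    int32_t f = pgfid_failures(v);
    return f == INT32_MAX ? -INT32_MAX : -(f + 1);
}

enum heal_action {
    HEAL_SET_COUNT,      /* 4b: count from the directory scan is trusted */
    HEAL_FULL_RESCAN,    /* 4b.1: recount under the lock, then set it */
    HEAL_RECORD_FAILURE  /* store pgfid_record_failure(cur) instead */
};

/* Decide what to do for one file; must run with inode->dentry_lock held. */
enum heal_action pgfid_heal_action(time_t file_ctime, time_t self_heal_start,
                                   int32_t cur, int32_t permissible)
{
    assert(permissible > 0);
    if (file_ctime < self_heal_start)
        return HEAL_SET_COUNT;
    if (pgfid_failures(cur) >= permissible)
        return HEAL_FULL_RESCAN;
    return HEAL_RECORD_FAILURE;
}
```

Since the rescan in 4b.1 runs with inode->dentry_lock held, no dentry pointing at this file can be added or removed meanwhile, so the recount is exact; the price is that dentry operations on this one file stall for the duration of a full directory read.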
> Improvements to the above solution or different solutions solving the same
> problem are welcome.
> [1] http://review.gluster.com/#/c/669/
> [2] http://review.gluster.org/#/c/5951/
> regards,
> Raghavendra.
