[Gluster-devel] [RFC] storage/posix: healing parent gfid xattrs

Raghavendra Gowdappa rgowdapp at redhat.com
Mon Dec 2 05:29:26 UTC 2013


Hi all,

With nameless lookups [1], every file and directory in a gluster file system can be accessed in a flat namespace addressed by gfid. This lets us bypass hierarchical path-based resolution and operate normally with gfid-based resolution alone. One of the primary users of this feature is the gluster-nfs server, but there are other scenarios too (such as a brick rebooting while clients are connected to it) where we end up doing only gfid-based resolution. While this works for most of our use cases, certain features rely on the path-based hierarchy; quota and (earlier versions of) geo-replication are two of them. Hence we brought in a feature which, given a gfid, can give out all possible paths to that gfid from root [2]. This feature stores a set of xattr keys on the file, one for each parent directory, where the key name is a function of the gfid of the parent directory. The value of each key is the number of hardlinks to the file in that parent directory. For example:

[root at unused gfs]# mkdir dir1 dir2
[root at unused gfs]# touch dir1/file
[root at unused gfs]# link dir1/file dir1/link
[root at unused gfs]# link dir1/file dir2/link

#pgfid keys and their values.
[root at unused gfs]# getfattr -e hex -d -m "trusted.pgfid.*" dir1/file
# file: dir1/file
trusted.pgfid.117d8ae9-4111-44dc-9c67-29ad676876ec=0x00000001
trusted.pgfid.addcacb9-30cb-4555-a450-9aa45598f445=0x00000002

#gfid of directories dir1 and dir2 from backend directories.
[root at unused gfs]# getfattr -e hex -n "trusted.gfid" /home/export/ptop/dir1
trusted.gfid=0xaddcacb930cb4555a4509aa45598f445

[root at unused gfs]# getfattr -e hex -n "trusted.gfid" /home/export/ptop/dir2
trusted.gfid=0x117d8ae9411144dc9c6729ad676876ec
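
The relationship between the two getfattr outputs above can be checked mechanically. What follows is a minimal Python sketch, not GlusterFS code. It assumes the key is the string "trusted.pgfid." followed by the parent's gfid in canonical UUID form, and that the value is a 32-bit big-endian link count; both assumptions are inferred only from the output shown here. The helper names pgfid_key and pgfid_link_count are hypothetical.

```python
import struct
import uuid


def _strip_0x(hex_text):
    # getfattr -e hex prints a leading "0x"; drop it before parsing.
    return hex_text[2:] if hex_text.startswith("0x") else hex_text


def pgfid_key(parent_gfid_hex):
    """Build the pgfid xattr name from a parent's trusted.gfid hex value."""
    return "trusted.pgfid." + str(uuid.UUID(hex=_strip_0x(parent_gfid_hex)))


def pgfid_link_count(value_hex):
    """Decode the xattr value, assumed here to be a 32-bit big-endian integer."""
    (count,) = struct.unpack(">I", bytes.fromhex(_strip_0x(value_hex)))
    return count


# Values copied from the transcript above.
dir1_key = pgfid_key("0xaddcacb930cb4555a4509aa45598f445")
dir2_key = pgfid_key("0x117d8ae9411144dc9c6729ad676876ec")
dir1_links = pgfid_link_count("0x00000002")
dir2_links = pgfid_link_count("0x00000001")
print(dir1_key, dir1_links)
print(dir2_key, dir2_links)
```

Under these assumptions the key derived from dir1's gfid carries the value 2 (the names dir1/file and dir1/link), and the key derived from dir2's gfid carries the value 1 (dir2/link), which matches the xattrs listed on dir1/file.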

For more details on how these pgfid xattrs are used to construct paths, please refer to [2]. The problem this RFC is trying to solve is how to self-heal pgfid xattrs when they are absent on a file. A couple of use cases where we can run into this problem are:
1. Quota is enabled on pre-existing data.
2. A brick crashes after completing an operation that creates or deletes a dentry, but before that operation is reflected in the pgfid xattrs. One way of bringing in crash consistency is to remove the pgfid xattrs and heal them afresh.

The fundamental problem here is calculating the correct number of links to a file in a directory without serializing other dentry operations against the self-heal operation. It seems inevitable that we have to read the entire parent directory to figure out the number of links. The following solution, it seems to me, can solve the issue with the least impact on other dentry operations:

1. In a path-based lookup (where we have the parent gfid), check for the pgfid xattr. If it is present, no self-heal is needed; if it is absent, proceed with the self-heal operation outlined below. Self-heal can also be triggered for all the files in a directory during a lookup on the directory.

2. Before starting the heal, record the current time:
    self_heal_start = gettimeofday()

3. Read the entire contents of the directory and store each file (probably in an in-memory data structure like a list or rb-tree) along with the correct link count of that file in the directory represented by the pgfid.

4. For each entry in the above list, do
   a. acquire a lock exclusively used for synchronization between self-heal code and other operations modifying dentries pointing to this file (say inode->dentry_lock)
      
      lock (inode->dentry_lock)

   b. if (inode_stbuf.st_ctime < self_heal_start) {
          setxattr pgfid key with correct link count on the file
      }

   c. unlock (inode->dentry_lock)

5. All the other operations modifying dentries (like create, link, rename, unlink, mknod, symlink) have to acquire inode->dentry_lock before adding/deleting dentries.
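
Steps 2 to 4 can be sketched in user space as follows. This is only an illustration in Python, not the posix translator code, and it rests on several assumptions. A plain dict (pgfid_store) stands in for the pgfid xattrs. A per-inode threading.Lock stands in for inode->dentry_lock. The names dentry_lock and heal_directory are hypothetical. The demo passes explicit start times rather than the real clock so that the result does not depend on timestamp granularity.

```python
import os
import tempfile
import threading
import time
from collections import defaultdict

# Hypothetical stand-in for inode->dentry_lock: one lock per inode number.
_dentry_locks = {}
_dentry_locks_guard = threading.Lock()


def dentry_lock(ino):
    with _dentry_locks_guard:
        return _dentry_locks.setdefault(ino, threading.Lock())


def heal_directory(dirpath, pgfid_store, self_heal_start=None):
    """Heal link counts for every file in dirpath; returns the healed inodes."""
    # Step 2: note the start time before reading the directory.
    if self_heal_start is None:
        self_heal_start = time.time()

    # Step 3: read the whole directory, grouping names by inode so that
    # hardlinks to the same file are counted together.
    names = defaultdict(list)
    with os.scandir(dirpath) as entries:
        for entry in entries:
            names[entry.inode()].append(entry.name)

    healed = []
    for ino, links in names.items():
        # Steps 4a and 4c: hold the per-inode lock around check and update.
        with dentry_lock(ino):
            try:
                st = os.lstat(os.path.join(dirpath, links[0]))
            except FileNotFoundError:
                continue  # name removed after the scan; skip it in this pass
            # Step 4b: trust the scanned count only if the inode has not
            # changed since the heal started.
            if st.st_ctime < self_heal_start:
                pgfid_store[(dirpath, ino)] = len(links)
                healed.append(ino)
    return healed


with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "file"), "w").close()
    os.link(os.path.join(d, "file"), os.path.join(d, "link"))
    open(os.path.join(d, "other"), "w").close()
    ctimes = [os.lstat(os.path.join(d, n)).st_ctime for n in os.listdir(d)]

    # Heal "started" after every change: all entries are healed.
    healed_store = {}
    heal_directory(d, healed_store, self_heal_start=max(ctimes) + 1.0)
    # Heal "started" before every change: the 4b check skips all entries.
    skipped_store = {}
    heal_directory(d, skipped_store, self_heal_start=min(ctimes) - 1.0)

link_counts = sorted(healed_store.values())
print(link_counts, skipped_store)
```

The ctime comparison is what lets the directory scan run without the lock: link() and unlink() update the ctime of the affected inode, so a dentry change made after the scan began shows up as a ctime at or after self_heal_start and the stale count is discarded. Step 5 is not modelled here; in a real implementation every dentry-modifying operation would take the same lock.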

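This solution has a caveat: because of the check in 4b, a pgfid may never be healed if the file's ctime is found to be at or past self_heal_start on every attempt (for example, when dentries pointing to the file keep changing). We can work around this situation by storing the number of failed heal attempts in the same pgfid xattr key and, when the failures cross a certain value, we can: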

# remember this code is executed, holding inode->dentry_lock
4b.1 if ((inode_stbuf.st_ctime >= self_heal_start) && (self_heal_failures >= permissible_failures)) {
            read the entire parent directory
            calculate the link count for this file
            set the pgfid key with correct link count
     }
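
The fallback can be sketched as a small state machine. This is a hedged Python illustration only. The PgfidState class, the try_heal function and the recount callback are hypothetical names. How the failure counter would actually be encoded alongside the link count in the xattr value is not specified above and is not modelled. The sketch also assumes the counter is incremented on each failed attempt before it is compared with permissible_failures.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PgfidState:
    # None means the pgfid xattr has not been healed yet.
    link_count: Optional[int] = None
    # Number of heal attempts rejected by the ctime check in 4b.
    failures: int = 0


def try_heal(state: PgfidState, ctime: float, self_heal_start: float,
             scanned_count: int, recount: Callable[[], int],
             permissible_failures: int = 3) -> bool:
    """Apply 4b and the 4b.1 fallback. The caller is assumed to hold
    inode->dentry_lock for the whole call."""
    if ctime < self_heal_start:
        # 4b: the file did not change during the scan, so the count is valid.
        state.link_count = scanned_count
        state.failures = 0
        return True

    # The scanned count may be stale; remember the failed attempt.
    state.failures += 1
    if state.failures >= permissible_failures:
        # 4b.1: re-read the parent directory while holding the lock, so no
        # dentry operation on this file can race with the recount.
        state.link_count = recount()
        state.failures = 0
        return True
    return False


# A file whose ctime is always newer than the heal start time.
state = PgfidState()
results = [
    try_heal(state, ctime=200.0, self_heal_start=100.0,
             scanned_count=1, recount=lambda: 2)
    for _ in range(3)
]
print(results, state.link_count)
```

With permissible_failures set to 3, the first two attempts are rejected and the third falls back to the locked recount. The cost is that other dentry operations on that file wait for a full directory read, which is why the fallback is reserved for files that repeatedly fail the cheap path.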

Improvements to the above solution or different solutions solving the same problem are welcome.
           
[1] http://review.gluster.com/#/c/669/
[2] http://review.gluster.org/#/c/5951/

regards,
Raghavendra.



