[Gluster-devel] GFID to Path Conversion

Tue Oct 27 07:07:46 UTC 2015

Hi,

We have a volume option "build-pgfid:on" that enables recording the
parent GFID (PGFID) in the file's xattrs. This simplifies GFID to Path
conversion. Is it possible to also save the basename in the xattr along
with the PGFID? That would make converting a GFID to a path easy, without
a crawl.

Example structure:

dir1 (3c789e71-24b0-4723-92a2-7eb3c14b4114)
     - f1 (0aa94a0a-62aa-4afc-9d59-eb68ad39f78c)
     - f2 (f1e7ad00-6500-4284-b21c-d02766ecc336)
dir2 (6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed)
     - h1 (0aa94a0a-62aa-4afc-9d59-eb68ad39f78c)

Files f1 and h1 are hardlinks; note that they have the same GFID.

Backend:

.glusterfs
      - 3c/78/3c789e71-24b0-4723-92a2-7eb3c14b4114
      - 0a/a9/0aa94a0a-62aa-4afc-9d59-eb68ad39f78c
      - f1/e7/f1e7ad00-6500-4284-b21c-d02766ecc336
      - 6c/3b/6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed

Since f1 and h1 are hardlinks across directories, the file's xattrs will
hold two parent GFIDs. The xattr dump will be:

trusted.pgfid.3c789e71-24b0-4723-92a2-7eb3c14b4114=1
trusted.pgfid.6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed=1

The value is the number of hardlinks under each parent GFID.
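A minimal Python sketch of reading such a dump: it keeps only names with the `trusted.pgfid.` prefix and maps each parent GFID to its link count. It assumes the values have already been decoded to integers (the on-disk encoding of the count is not covered here), and `parse_pgfid_xattrs` is a hypothetical helper, not a GlusterFS API.

```python
from typing import Dict

PGFID_PREFIX = "trusted.pgfid."

def parse_pgfid_xattrs(xattrs: Dict[str, int]) -> Dict[str, int]:
    """Map each parent GFID to the number of hardlinks the file has under it.

    Assumption: `xattrs` maps xattr names to already-decoded integer values.
    Other xattrs (e.g. trusted.gfid) are ignored.
    """
    return {
        name[len(PGFID_PREFIX):]: count
        for name, count in xattrs.items()
        if name.startswith(PGFID_PREFIX)
    }

dump = {
    "trusted.pgfid.3c789e71-24b0-4723-92a2-7eb3c14b4114": 1,
    "trusted.pgfid.6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed": 1,
}
parents = parse_pgfid_xattrs(dump)
# Total hardlinks of the file is the sum of the per-parent counts.
print(sum(parents.values()))  # 2
```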

Given the GFID of a file, to get its path:
1. Identify which brick has the file, using the pathinfo xattr.
2. Get all parent GFIDs (using listxattr on the backend GFID path
.glusterfs/0a/a9/0aa94a0a-62aa-4afc-9d59-eb68ad39f78c).
3. Crawl those directories to find files with the same inode as
.glusterfs/0a/a9/0aa94a0a-62aa-4afc-9d59-eb68ad39f78c
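Steps 2 and 3 can be sketched in Python as below, run directly on a brick. This is an illustration under assumptions, not GlusterFS code: the helper names are hypothetical, `parents` is taken to be the already-parsed result of the listxattr in step 2 (reading `trusted.*` xattrs needs root), and each parent GFID is assumed to resolve to its directory by following that directory's handle under .glusterfs, which on a brick is a symlink. The per-parent count lets the scan of each directory stop early.

```python
import os
from typing import Dict, List

def gfid_backend_path(brick: str, gfid: str) -> str:
    # Backend handle: <brick>/.glusterfs/<gfid[0:2]>/<gfid[2:4]>/<gfid>
    return os.path.join(brick, ".glusterfs", gfid[0:2], gfid[2:4], gfid)

def find_names_by_inode(parent_dir: str, target_ino: int, max_hits: int) -> List[str]:
    """Names in parent_dir whose inode is target_ino; stops after max_hits."""
    hits: List[str] = []
    with os.scandir(parent_dir) as entries:
        for entry in entries:
            # lstat-based inode, comparable with os.stat() of the handle
            if entry.stat(follow_symlinks=False).st_ino == target_ino:
                hits.append(entry.name)
                if len(hits) >= max_hits:
                    break  # all links under this parent found
    return hits

def gfid_to_paths(brick: str, gfid: str, parents: Dict[str, int]) -> List[str]:
    """Brick-relative paths of all hardlinks of `gfid`.

    `parents` maps parent GFID -> link count (hypothetical, from step 2).
    """
    # The file's handle under .glusterfs is a hardlink, so it shares the inode.
    target_ino = os.stat(gfid_backend_path(brick, gfid)).st_ino
    root = os.path.realpath(brick)
    paths: List[str] = []
    for pgfid, count in parents.items():
        # Assumption: a directory handle is a symlink into the brick tree.
        parent_dir = os.path.realpath(gfid_backend_path(brick, pgfid))
        for name in find_names_by_inode(parent_dir, target_ino, count):
            paths.append(os.path.relpath(os.path.join(parent_dir, name), root))
    return paths
```

Only the directories named by the PGFID xattrs are opened, which is where the saving over a full crawl comes from.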

The PGFID needs to be updated on:
1. CREATE/MKNOD - Add the xattr.
2. RENAME - If moved to a different directory, update the PGFID.
3. UNLINK - If the number of links is more than 1, reduce the link count
and remove the respective parent's PGFID once its count reaches zero.
4. LINK - Add a PGFID if linking into a different directory; otherwise
increment that parent's count.
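These update rules can be modelled as operations on an in-memory map from parent GFID to link count. This is a hypothetical sketch, not the brick-side implementation: writing the map back to xattrs is omitted, and short labels stand in for real GFIDs.

```python
from typing import Dict

# Hypothetical model of one file's PGFID xattrs: parent GFID -> link count.
PgfidMap = Dict[str, int]

def on_create(pgfids: PgfidMap, parent: str) -> None:
    # CREATE/MKNOD: a new file has exactly one link, in `parent`.
    pgfids[parent] = 1

def on_link(pgfids: PgfidMap, new_parent: str) -> None:
    # LINK: a new parent gets an entry; an existing parent's count goes up.
    pgfids[new_parent] = pgfids.get(new_parent, 0) + 1

def on_unlink(pgfids: PgfidMap, parent: str) -> None:
    # UNLINK: decrement; drop the parent's xattr once no links remain there.
    pgfids[parent] -= 1
    if pgfids[parent] == 0:
        del pgfids[parent]

def on_rename(pgfids: PgfidMap, old_parent: str, new_parent: str) -> None:
    # RENAME: only a move to another directory changes the PGFID xattrs.
    if old_parent != new_parent:
        on_unlink(pgfids, old_parent)
        on_link(pgfids, new_parent)

xattrs: PgfidMap = {}
on_create(xattrs, "dir1-gfid")  # f1 created in dir1
on_link(xattrs, "dir2-gfid")    # h1 linked in dir2
print(xattrs)  # {'dir1-gfid': 1, 'dir2-gfid': 1}
```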

Advantages:
1. Crawling is limited to a few directories instead of a full file system
crawl.
2. The crawl can stop early once the number of matches found under a
parent reaches the hardlink count stored in its xattr value.

Disadvantages:
1. Crawling is expensive if a directory has a lot of files.
2. The PGFID must be updated on every CREATE/MKNOD/RENAME/UNLINK/LINK.
3. This method of conversion does not work if the file has been deleted.

We can improve the performance of GFID to Path conversion if we also
record the basename in the file xattr:

trusted.pgfid.3c789e71-24b0-4723-92a2-7eb3c14b4114=f1
trusted.pgfid.6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed=h1

Note: multiple basenames under the same parent (hardlinks in the same
directory) are delimited by a zero byte.
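With basenames recorded, no crawl is needed: each (parent GFID, basename) pair yields one path once the parent's own path is known. In the sketch below the helpers are hypothetical, names are converted with `os.fsencode`/`os.fsdecode`, and the parent-path lookup is a caller-supplied function (`parent_path`, assumed; in practice it might walk up via each directory's own parent GFID). A zero byte is a safe delimiter because POSIX filenames cannot contain NUL.

```python
import os
from typing import Callable, Dict, List

def encode_basenames(names: List[str]) -> bytes:
    # Several basenames under one parent, delimited by a zero byte.
    return b"\x00".join(os.fsencode(n) for n in names)

def decode_basenames(value: bytes) -> List[str]:
    return [os.fsdecode(n) for n in value.split(b"\x00") if n]

def paths_from_xattrs(pgfid_values: Dict[str, bytes],
                      parent_path: Callable[[str], str]) -> List[str]:
    """One path per (parent GFID, basename) pair; no directory is scanned."""
    return [
        parent_path(pgfid) + "/" + name
        for pgfid, value in pgfid_values.items()
        for name in decode_basenames(value)
    ]

values = {
    "3c789e71-24b0-4723-92a2-7eb3c14b4114": b"f1",
    "6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed": b"h1",
}
# Hypothetical parent-path resolver for the example tree.
dirs = {
    "3c789e71-24b0-4723-92a2-7eb3c14b4114": "/dir1",
    "6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed": "/dir2",
}
print(paths_from_xattrs(values, dirs.__getitem__))  # ['/dir1/f1', '/dir2/h1']
```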

Additional overhead compared to storing only the PGFID:
1. Space.
2. The number of xattrs grows with the number of hardlinks.
3. Possible issues with the maximum xattr value size?
4. The xattr must be updated even when a file is renamed within the same
directory.
5. Updating the xattr value involves parsing when there are multiple
hardlinks.
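Points 4 and 5 can be illustrated with hypothetical helpers that edit one xattr value in memory: even a rename inside the same directory rewrites the value, and with several hardlinks under one parent the value must be split, edited and re-joined. Reading and writing the actual xattr (for example with Python's `os.getxattr`/`os.setxattr` on Linux) is left out of this sketch.

```python
import os
from typing import Optional

SEP = b"\x00"  # basenames under one parent are NUL-delimited

def rename_in_value(value: bytes, old: str, new: str) -> bytes:
    """Rename within the same directory: PGFID key unchanged, value rewritten."""
    names = value.split(SEP)
    names[names.index(os.fsencode(old))] = os.fsencode(new)  # ValueError if absent
    return SEP.join(names)

def remove_from_value(value: bytes, name: str) -> Optional[bytes]:
    """Unlink one basename; None means the whole xattr should be removed."""
    names = value.split(SEP)
    names.remove(os.fsencode(name))  # ValueError if absent
    return SEP.join(names) if names else None

print(rename_in_value(b"f1\x00f1.bak", "f1", "g1"))  # b'g1\x00f1.bak'
```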

Are there any performance issues other than during initial indexing?
(Assume the PGFIDs and basenames are populated by a separate script.)

Comments and suggestions are welcome.

-- 
regards
Aravinda