[Gluster-devel] GFID to Path Conversion

Tue Oct 27 14:02:18 UTC 2015

Aravinda, List,

The topic is interesting, and it is also relevant to DHT2, where we 
lose the hierarchy on a single brick (unlike the older DHT). Some of 
the thoughts here are along the same lines as what we are debating 
for DHT2.

Here is another option I would like to put forward. It extends the 
current thought and is largely inspired by the Linux kernel NFS 
implementation (based on my current understanding of it) [1] [2].

If gluster server/brick processes handed out handles that encode 
pGFID/GFID (today a handle is just the GFID, or inode #, of the file), 
then on any handle-based operation we would get the pGFID/GFID of the 
object being operated on. This solves the first part of the problem, 
the part addressed by encoding the pGFID in the xattr: here we not 
only do that but also hand out a handle that carries the relationship.
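
To make the handle idea concrete, here is a minimal sketch of a handle 
that carries both IDs. The 32-byte layout (pGFID followed by GFID) and 
the function names are assumptions for illustration only, not the 
current gluster handle format.

```python
import uuid
from typing import Tuple

# Hypothetical layout: 16 bytes of pGFID followed by 16 bytes of GFID.
HANDLE_LEN = 32


def encode_handle(pgfid: uuid.UUID, gfid: uuid.UUID) -> bytes:
    """Pack the parent GFID and the object's GFID into one opaque handle."""
    return pgfid.bytes + gfid.bytes


def decode_handle(handle: bytes) -> Tuple[uuid.UUID, uuid.UUID]:
    """Recover (pGFID, GFID) from a handle built by encode_handle()."""
    if len(handle) != HANDLE_LEN:
        raise ValueError("unexpected handle length: %d" % len(handle))
    return uuid.UUID(bytes=handle[:16]), uuid.UUID(bytes=handle[16:])
```

With a handle like this, the brick learns the parent from the handle 
itself on every handle-based operation, with no extra lookup.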

It also helps when an object is renamed and we still allow the older 
handle to be used for operations. Not a bad thing in some cases, and 
possibly not the best thing to do in some other cases (say access).

To take this knowledge back to a name, what you propose can be stored 
on the object itself. That gives us the ability to build a short 
dentry tree of the form pGFID->name(GFID).
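
As a rough sketch of that pGFID->name(GFID) chain, assume every object 
stores its (pGFID, basename) pairs as in the proposal quoted below. 
The in-memory `parents` map and the function name are hypothetical 
stand-ins for reading those xattrs from the brick; a real version 
would also have to guard against stale or cyclic entries.

```python
from typing import Dict, List, Tuple

# GFID of the volume root.
ROOT_GFID = "00000000-0000-0000-0000-000000000001"

# GFID -> list of (parent GFID, basename); hardlinks have several entries.
ParentMap = Dict[str, List[Tuple[str, str]]]


def paths_for(gfid: str, parents: ParentMap) -> List[str]:
    """Return every path that leads to gfid by walking up to the root."""
    if gfid == ROOT_GFID:
        return ["/"]
    paths = []
    for pgfid, name in parents.get(gfid, []):
        for parent_path in paths_for(pgfid, parents):
            paths.append(parent_path.rstrip("/") + "/" + name)
    return sorted(paths)
```

A file with hardlinks in two directories yields two paths, one per 
(pGFID, basename) pair, and no directory crawl is needed.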

This of course changes the gluster RPC wire protocol, as we need to 
encode/send the pGFID as well in some cases (or this could be done by 
adding it to the xdata payload).

Shyam

[1] http://nfs.sourceforge.net/#faq_c7
[2] https://www.kernel.org/doc/Documentation/filesystems/nfs/Exporting

On 10/27/2015 03:07 AM, Aravinda wrote:
> Hi,
>
> We have a volume option called "build-pgfid:on" to enable recording
> parent gfid in file xattr. This simplifies the GFID to Path conversion.
> Is it possible to also save the basename in the xattr along with the
> PGFID? That would allow converting GFID to Path without a crawl.
>
> Example structure,
>
> dir1 (3c789e71-24b0-4723-92a2-7eb3c14b4114)
>      - f1 (0aa94a0a-62aa-4afc-9d59-eb68ad39f78c)
>      - f2 (f1e7ad00-6500-4284-b21c-d02766ecc336)
> dir2 (6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed)
>      - h1 (0aa94a0a-62aa-4afc-9d59-eb68ad39f78c)
>
> Where file f1 and h1 are hardlinks. Note the same GFID.
>
> Backend,
>
> .glusterfs
>       - 3c/78/3c789e71-24b0-4723-92a2-7eb3c14b4114
>       - 0a/a9/0aa94a0a-62aa-4afc-9d59-eb68ad39f78c
>       - f1/e7/f1e7ad00-6500-4284-b21c-d02766ecc336
>       - 6c/3b/6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed
>
> Since f1 and h1 are hardlinks across directories, the file xattr will
> have two parent GFIDs. The xattr dump will be,
>
> trusted.pgfid.3c789e71-24b0-4723-92a2-7eb3c14b4114=1
> trusted.pgfid.6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed=1
>
> The value is the number of hardlinks under that parent GFID.
>
> If we know GFID of a file, to get path,
> 1. Identify which brick has that file using pathinfo xattr.
> 2. Get all parent GFIDs (using listxattr on backend gfid path
> .glusterfs/0a/a9/0aa94a0a-62aa-4afc-9d59-eb68ad39f78c)
> 3. Crawl those directories to find files with same inode as
> .glusterfs/0a/a9/0aa94a0a-62aa-4afc-9d59-eb68ad39f78c
>
> Updating PGFID to be done when,
> 1. CREATE/MKNOD - Add xattr
> 2. RENAME - If moved to different directory, Update PGFID
> 3. UNLINK - If the number of links is more than 1, reduce the link
> count; remove the respective PGFID when its count reaches zero
> 4. LINK - Add PGFID if link to different directory, Increment count
>
> Advantages:
> 1. Crawling is limited to a few directories instead of a full file
> system crawl.
> 2. The crawl can stop early once the number of matches reaches the
> hardlink count stored in the xattr value.
>
> Disadvantages:
> 1. Crawling is expensive if a directory has a lot of files.
> 2. PGFID must be updated on CREATE/MKNOD/RENAME/UNLINK/LINK.
> 3. This method of conversion will not work if the file is deleted.
>
> We can improve performance of GFID to Path conversion if we record
> Basename also in file xattr.
>
> trusted.pgfid.3c789e71-24b0-4723-92a2-7eb3c14b4114=f1
> trusted.pgfid.6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed=h1
>
> Note: Multiple basenames are delimited by a zero byte.
>
> Additional overhead compared to storing only the PGFID:
> 1. Space
> 2. Number of xattrs will grow with the number of hardlinks
> 3. Max size issue for xattr value?
> 4. The xattr must be updated even when a file is renamed within the
> same directory.
> 5. Updating the xattr value involves parsing in case of multiple
> hardlinks.
>
> Are there any performance issues except during initial indexing?
> (Assume pgfid and basenames are populated by a separate script.)
>
> Comments and Suggestions Welcome.
>
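
For reference, the basename-in-xattr layout quoted above could be 
parsed and maintained roughly as follows. This is only a sketch over an 
in-memory dict standing in for the file's xattrs; the helper names are 
made up for illustration, and the key prefix and zero-byte delimiter 
are taken from the proposal, not from any existing gluster code.

```python
from typing import Dict, List, Tuple

PGFID_PREFIX = "trusted.pgfid."


def _names(xattrs: Dict[str, bytes], key: str) -> List[bytes]:
    """Split a zero-byte delimited xattr value into its basenames."""
    return [n for n in xattrs.get(key, b"").split(b"\0") if n]


def parse_pgfid_xattrs(xattrs: Dict[str, bytes]) -> List[Tuple[str, str]]:
    """Return (pGFID, basename) pairs, one per hardlink."""
    links = []
    for key in sorted(xattrs):
        if key.startswith(PGFID_PREFIX):
            pgfid = key[len(PGFID_PREFIX):]
            links.extend((pgfid, n.decode("utf-8")) for n in _names(xattrs, key))
    return links


def add_link(xattrs: Dict[str, bytes], pgfid: str, name: str) -> None:
    """CREATE/MKNOD/LINK: record one more basename under pgfid."""
    key = PGFID_PREFIX + pgfid
    xattrs[key] = b"\0".join(_names(xattrs, key) + [name.encode("utf-8")])


def remove_link(xattrs: Dict[str, bytes], pgfid: str, name: str) -> None:
    """UNLINK: drop one basename; remove the xattr when none are left."""
    key = PGFID_PREFIX + pgfid
    names = _names(xattrs, key)
    names.remove(name.encode("utf-8"))  # raises ValueError if not recorded
    if names:
        xattrs[key] = b"\0".join(names)
    else:
        del xattrs[key]
```

A RENAME would then be a remove_link() on the old (pGFID, basename) 
followed by an add_link() on the new one, which is where the parsing 
overhead listed in point 5 shows up.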