[Gluster-devel] GFID to Path Conversion

Tue Nov 24 17:55:27 UTC 2015

There seem to be other consumers in gluster interested in the same 
information, and I guess we need a good base design for this on-disk 
change, so that the various use cases can leverage it appropriately.

I request a few folks to list how they would use this feature, and 
what performance characteristics they expect from it.

- gluster find class of utilities
- change log processors
- swift on file
- inotify support on gluster
- Others?

[3] is an attempt in XFS to do the same; possibly there is a more recent 
thread that discusses later approaches.

[4] slide 13 onwards talks about how cephfs does this. (see cephfs inode 
backtraces)

Aravinda, could you put up a design for the same, covering how and where 
this information is added, etc.? That would help review it from the 
perspective of other xlators (like the existing DHT).

Shyam
[3] http://oss.sgi.com/archives/xfs/2014-01/msg00224.html
[4] http://events.linuxfoundation.org/sites/events/files/slides/CephFS-Vault.pdf

On 10/27/2015 10:02 AM, Shyam wrote:
> Aravinda, List,
>
> The topic is interesting and also relevant in the case of DHT2 where we
> lose the hierarchy on a single brick (unlike the older DHT) and so some
> of the thoughts here are along the same lines as what we are debating
> w.r.t DHT2 as well.
>
> Here is another option that extends the current thought, that I would
> like to put forward, that is pretty much inspired from the Linux kernel
> NFS implementation (based on my current understanding of the same) [1] [2].
>
> If gluster server/brick processes handed out handles (which are
> currently just the GFID (or inode #) of the file) that encode pGFID/GFID,
> then on any handle based operation, we get the pGFID/GFID for the object
> being operated on. This solves the first part of the problem where we
> are encoding the pGFID in the xattr, and here we not only do that but
> further hand out the handle with that relationship.
>
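
To make the handle idea above concrete, here is a minimal sketch that packs pGFID and GFID into one opaque 32-byte handle. The layout (two raw 16-byte UUIDs, parent first) and the names encode_handle/decode_handle are assumptions for illustration only, not an existing gluster wire format.

```python
import uuid

HANDLE_LEN = 32  # assumed layout: 16 bytes of pGFID followed by 16 bytes of GFID


def encode_handle(pgfid: str, gfid: str) -> bytes:
    """Pack a parent GFID and a GFID into one opaque handle (hypothetical format)."""
    return uuid.UUID(pgfid).bytes + uuid.UUID(gfid).bytes


def decode_handle(handle: bytes):
    """Split a handle back into (pgfid, gfid) strings."""
    if len(handle) != HANDLE_LEN:
        raise ValueError("unexpected handle length: %d" % len(handle))
    return str(uuid.UUID(bytes=handle[:16])), str(uuid.UUID(bytes=handle[16:]))
```

With such a layout, a handle handed out before a rename would still decode to the old pGFID, which is the stale-parent case mentioned in the next paragraph.
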
> It also helps when an object is renamed and we still allow the older
> handle to be used for operations. Not a bad thing in some cases, and
> possibly not the best thing to do in some other cases (say access).
>
> To further this knowledge back to a name, what you propose can be stored
> on the object itself. Thus giving us a short dentry tree creation
> ability of pGFID->name(GFID).
>
> This of course changes the gluster RPC wire protocol, as we need to
> encode/send pGFID as well in some cases (or this could be done by adding
> it to the xdata payload).
>
> Shyam
>
> [1] http://nfs.sourceforge.net/#faq_c7
> [2] https://www.kernel.org/doc/Documentation/filesystems/nfs/Exporting
>
> On 10/27/2015 03:07 AM, Aravinda wrote:
>> Hi,
>>
>> We have a volume option called "build-pgfid:on" to enable recording
>> parent gfid in file xattr. This simplifies the GFID to Path conversion.
>> Is it possible to save base name also in xattr along with PGFID? It
>> helps in converting GFID to Path easily without doing crawl.
>>
>> Example structure,
>>
>> dir1 (3c789e71-24b0-4723-92a2-7eb3c14b4114)
>>      - f1 (0aa94a0a-62aa-4afc-9d59-eb68ad39f78c)
>>      - f2 (f1e7ad00-6500-4284-b21c-d02766ecc336)
>> dir2 (6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed)
>>      - h1 (0aa94a0a-62aa-4afc-9d59-eb68ad39f78c)
>>
>> Where file f1 and h1 are hardlinks. Note the same GFID.
>>
>> Backend,
>>
>> .glusterfs
>>       - 3c/78/3c789e71-24b0-4723-92a2-7eb3c14b4114
>>       - 0a/a9/0aa94a0a-62aa-4afc-9d59-eb68ad39f78c
>>       - f1/e7/f1e7ad00-6500-4284-b21c-d02766ecc336
>>       - 6c/3b/6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed
>>
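
The backend layout above follows a simple rule: the first two and the next two hex characters of the GFID become two directory levels under .glusterfs. A small sketch of that mapping follows; the function name and the brick path used in the note below are made up for illustration.

```python
import os
import uuid


def gfid_backend_path(brick_root: str, gfid: str) -> str:
    """Return the .glusterfs entry for a GFID under the given brick root."""
    # uuid.UUID validates the input and yields the canonical lowercase form.
    canonical = str(uuid.UUID(gfid))
    return os.path.join(brick_root, ".glusterfs",
                        canonical[0:2], canonical[2:4], canonical)
```

For a hypothetical brick at /bricks/b1, the GFID of f1 maps to /bricks/b1/.glusterfs/0a/a9/0aa94a0a-62aa-4afc-9d59-eb68ad39f78c.
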
>> Since f1 and h1 are hardlinks across directories, the file xattr will
>> have two parent GFIDs. The xattr dump will be,
>>
>> trusted.pgfid.3c789e71-24b0-4723-92a2-7eb3c14b4114=1
>> trusted.pgfid.6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed=1
>>
>> The value is the number of hardlinks under that parent GFID.
>>
>> If we know GFID of a file, to get path,
>> 1. Identify which brick has that file using pathinfo xattr.
>> 2. Get all parent GFIDs (using listxattr on backend gfid path
>> .glusterfs/0a/a9/0aa94a0a-62aa-4afc-9d59-eb68ad39f78c)
>> 3. Crawl those directories to find files with same inode as
>> .glusterfs/0a/a9/0aa94a0a-62aa-4afc-9d59-eb68ad39f78c
>>
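
Steps 2 and 3 above could be sketched roughly as below. This is only an illustration: it takes the xattr names as a plain list (on a brick they would come from os.listxattr on the backend gfid path, which needs root for the trusted namespace), and it assumes the caller has already resolved each parent GFID to a directory path. The helper names are hypothetical.

```python
import os

PGFID_PREFIX = "trusted.pgfid."


def parent_gfids(xattr_names):
    """Step 2: pick the parent GFIDs out of a list of xattr names."""
    return [name[len(PGFID_PREFIX):] for name in xattr_names
            if name.startswith(PGFID_PREFIX)]


def find_links(directory, target, max_links=None):
    """Step 3: names in `directory` that share an inode with the file `target`."""
    want = os.stat(target)
    found = []
    with os.scandir(directory) as entries:
        for entry in entries:
            st = entry.stat(follow_symlinks=False)
            if (st.st_ino, st.st_dev) == (want.st_ino, want.st_dev):
                found.append(entry.name)
                # Stop early once as many links are found as the xattr records.
                if max_links is not None and len(found) >= max_links:
                    break
    return found
```

Passing the xattr value as max_links gives the early break when the expected number of hardlinks in that directory has been seen.
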
>> PGFID needs to be updated on,
>> 1. CREATE/MKNOD - Add xattr
>> 2. RENAME - If moved to different directory, Update PGFID
>> 3. UNLINK - If number of links is more than 1. Reduce number of link,
>> Remove respective parent PGFID
>> 4. LINK - Add PGFID if link to different directory, Increment count
>>
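
The update rules above can be modelled with a plain dict of parent GFID to link count, standing in for the trusted.pgfid.* xattrs. This is a sketch of one reading of those rules (in particular, UNLINK drops a parent entry once its count reaches zero); the function names are illustrative and not taken from the posix xlator.

```python
def on_create(pgfids, parent):
    """CREATE/MKNOD: the new file has exactly one link, under `parent`."""
    pgfids[parent] = 1


def on_link(pgfids, parent):
    """LINK: add the parent if it is new, otherwise increment its count."""
    pgfids[parent] = pgfids.get(parent, 0) + 1


def on_unlink(pgfids, parent):
    """UNLINK: decrement the count; drop the parent when no links remain there."""
    remaining = pgfids[parent] - 1
    if remaining > 0:
        pgfids[parent] = remaining
    else:
        del pgfids[parent]


def on_rename(pgfids, old_parent, new_parent):
    """RENAME: only a move to a different directory changes the bookkeeping."""
    if old_parent != new_parent:
        on_unlink(pgfids, old_parent)
        on_link(pgfids, new_parent)
```
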
>> Advantages:
>> 1. Crawling is limited to a few directories instead of full file system
>> crawl.
>> 2. Break early during the crawl once the search has found as many
>> hardlinks as the xattr value records.
>>
>> Disadvantages:
>> 1. Crawling is expensive if a directory has a lot of files.
>> 2. PGFID has to be updated on CREATE/MKNOD/RENAME/UNLINK/LINK.
>> 3. This method of conversion will not work if file is deleted.
>>
>> We can improve performance of GFID to Path conversion if we record
>> Basename also in file xattr.
>>
>> trusted.pgfid.3c789e71-24b0-4723-92a2-7eb3c14b4114=f1
>> trusted.pgfid.6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed=h1
>>
>> Note: Multiple base names are delimited by a zero byte.
>>
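
A rough sketch of how the basename variant could be parsed and updated, with the xattrs represented as a dict of name to bytes value. The UTF-8 decoding and the helper names are assumptions for illustration; file names on a brick are really arbitrary byte strings.

```python
PGFID_PREFIX = "trusted.pgfid."


def decode_basenames(value: bytes):
    """Split a zero-byte delimited xattr value into base names."""
    return [part.decode("utf-8") for part in value.split(b"\0") if part]


def encode_basenames(names) -> bytes:
    """Join base names into a single zero-byte delimited xattr value."""
    return b"\0".join(name.encode("utf-8") for name in names)


def rename_basename(value: bytes, old: str, new: str) -> bytes:
    """Rename within the same directory: parse, replace one name, re-encode."""
    names = decode_basenames(value)
    names[names.index(old)] = new
    return encode_basenames(names)


def parent_entries(xattrs):
    """Return sorted (parent GFID, basename) pairs from a {name: value} dict."""
    pairs = []
    for name, value in xattrs.items():
        if name.startswith(PGFID_PREFIX):
            pgfid = name[len(PGFID_PREFIX):]
            pairs.extend((pgfid, base) for base in decode_basenames(value))
    return sorted(pairs)
```

Each (parent GFID, basename) pair gives one path component directly; repeating the lookup on the parent GFID walks up to the root without any directory crawl.
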
>> Additional overhead compared to storing only PGFID,
>> 1. Space
>> 2. Number of xattrs will grow with the number of hardlinks
>> 3. Max size issue for xattr value?
>> 4. Xattr update is needed even when renamed within the same directory.
>> 5. Updating the value of an xattr involves parsing in case of multiple
>> hardlinks.
>>
>> Are there any performance issues other than during initial indexing?
>> (Assume pgfid and basenames are populated by a separate script.)
>>
>> Comments and Suggestions Welcome.
>>