[Gluster-devel] GFID to Path Conversion

Wed Nov 25 04:38:07 UTC 2015

regards
Aravinda

On 11/24/2015 11:25 PM, Shyam wrote:
> There seem to be other interested consumers in gluster for the same
> information, and I guess we need a good base design to address this
> on-disk change, so that it can be leveraged in the various use cases
> appropriately.
>
> Request a few folks to list out how they would use this feature and 
> also what performance characteristics they expect around the same.
>
> - gluster find class of utilities
> - change log processors
> - swift on file
> - inotify support on gluster
> - Others?
- Debugging utilities for users/admins (show the path for GFIDs displayed
  in log files)
- Retriggering sync in Geo-replication (Geo-rep reports failed GFIDs in
  logs; we can retrigger sync if the path is known instead of the GFID)
>
> [3] is an attempt in XFS to do the same; possibly there is a
> later thread that discusses newer approaches.
>
> [4] slide 13 onwards talks about how cephfs does this. (see cephfs 
> inode backtraces)
>
> Aravinda, could you put up a design for the same, and how and where
> this information is added etc. It would help to review it from other
> xlators' perspective (like existing DHT).
>
> Shyam
> [3] http://oss.sgi.com/archives/xfs/2014-01/msg00224.html
> [4] 
> http://events.linuxfoundation.org/sites/events/files/slides/CephFS-Vault.pdf
>
> On 10/27/2015 10:02 AM, Shyam wrote:
>> Aravinda, List,
>>
>> The topic is interesting and also relevant in the case of DHT2 where we
>> lose the hierarchy on a single brick (unlike the older DHT) and so some
>> of the thoughts here are along the same lines as what we are debating
>> w.r.t DHT2 as well.
>>
>> Here is another option that extends the current thought, that I would
>> like to put forward, that is pretty much inspired from the Linux kernel
>> NFS implementation (based on my current understanding of the same) 
>> [1] [2].
>>
>> If gluster server/brick processes handed out handles, (which are
>> currently just GFID (or inode #) of the file), that encode pGFID/GFID,
>> then on any handle based operation, we get the pGFID/GFID for the object
>> being operated on. This solves the first part of the problem where we
>> are encoding the pGFID in the xattr, and here we not only do that but
>> further hand out the handle with that relationship.
>>
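To make the handle idea above a little more concrete, here is a rough
sketch in Python. It is purely illustrative: the 32-byte layout (16 bytes
of pGFID followed by 16 bytes of GFID) and the function names are
assumptions for this example, not gluster's actual handle or wire format.

```python
import uuid


def encode_handle(pgfid, gfid):
    """Pack a parent GFID and a GFID into one opaque 32-byte handle.

    Hypothetical layout: 16 bytes of pGFID followed by 16 bytes of GFID.
    """
    return uuid.UUID(pgfid).bytes + uuid.UUID(gfid).bytes


def decode_handle(handle):
    """Split a 32-byte handle back into (pGFID, GFID) strings."""
    if len(handle) != 32:
        raise ValueError("expected a 32-byte handle")
    return str(uuid.UUID(bytes=handle[:16])), str(uuid.UUID(bytes=handle[16:]))
```

With such a handle, any handle-based operation would carry the parent
relationship along without a separate xattr lookup.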
>> It also helps when an object is renamed and we still allow the older
>> handle to be used for operations. Not a bad thing in some cases, and
>> possibly not the best thing to do in some other cases (say access).
>>
>> To further this knowledge back to a name, what you propose can be stored
>> on the object itself. Thus giving us a short dentry tree creation
>> ability of pGFID->name(GFID).
>>
>> This of course changes the gluster RPC wire protocol, as we need to
>> encode/send pGFID as well in some cases (or it could be done by adding
>> this to the xdata payload).
>>
>> Shyam
>>
>> [1] http://nfs.sourceforge.net/#faq_c7
>> [2] https://www.kernel.org/doc/Documentation/filesystems/nfs/Exporting
>>
>> On 10/27/2015 03:07 AM, Aravinda wrote:
>>> Hi,
>>>
>>> We have a volume option called "build-pgfid:on" to enable recording
>>> parent gfid in file xattr. This simplifies the GFID to Path conversion.
>>> Is it possible to save base name also in xattr along with PGFID? It
>>> helps in converting GFID to Path easily without doing crawl.
>>>
>>> Example structure,
>>>
>>> dir1 (3c789e71-24b0-4723-92a2-7eb3c14b4114)
>>>      - f1 (0aa94a0a-62aa-4afc-9d59-eb68ad39f78c)
>>>      - f2 (f1e7ad00-6500-4284-b21c-d02766ecc336)
>>> dir2 (6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed)
>>>      - h1 (0aa94a0a-62aa-4afc-9d59-eb68ad39f78c)
>>>
>>> Where file f1 and h1 are hardlinks. Note the same GFID.
>>>
>>> Backend,
>>>
>>> .glusterfs
>>>       - 3c/78/3c789e71-24b0-4723-92a2-7eb3c14b4114
>>>       - 0a/a9/0aa94a0a-62aa-4afc-9d59-eb68ad39f78c
>>>       - f1/e7/f1e7ad00-6500-4284-b21c-d02766ecc336
>>>       - 6c/3b/6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed
>>>
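The backend layout above puts each GFID under two directory levels taken
from its first four hex digits. A minimal sketch of that mapping in Python
(the function name and the brick root used below are hypothetical, not a
gluster API):

```python
import posixpath
import uuid


def gfid_backend_path(brick_root, gfid):
    """Return the .glusterfs path for a GFID, following the layout shown above."""
    # Normalise to the canonical lowercase, hyphenated form.
    # uuid.UUID raises ValueError if the input is not a valid UUID.
    g = str(uuid.UUID(gfid))
    return posixpath.join(brick_root, ".glusterfs", g[0:2], g[2:4], g)
```

For example, with an assumed brick root of /bricks/b1, the GFID of f1
maps to /bricks/b1/.glusterfs/0a/a9/0aa94a0a-62aa-4afc-9d59-eb68ad39f78c.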
>>> Since f1 and h1 are hardlinks across directories, file xattr will have
>>> two parent GFIDs. Xattr dump will be,
>>>
>>> trusted.pgfid.3c789e71-24b0-4723-92a2-7eb3c14b4114=1
>>> trusted.pgfid.6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed=1
>>>
>>> The value is the number of hardlinks under that parent GFID.
>>>
>>> If we know GFID of a file, to get path,
>>> 1. Identify which brick has that file using pathinfo xattr.
>>> 2. Get all parent GFIDs (using listxattr on backend gfid path
>>> .glusterfs/0a/a9/0aa94a0a-62aa-4afc-9d59-eb68ad39f78c)
>>> 3. Crawl those directories to find files with same inode as
>>> .glusterfs/0a/a9/0aa94a0a-62aa-4afc-9d59-eb68ad39f78c
>>>
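Steps 2 and 3 above could look roughly like this in Python. This is a
sketch only: the helper names are made up for illustration, and the xattr
names are passed in as a plain list so that the logic can be tried without
a brick. On a real brick they would come from something like
os.listxattr() on the backend gfid path (Linux only, and trusted.* xattrs
need root).

```python
import os

PGFID_PREFIX = "trusted.pgfid."


def parent_gfids(xattr_names):
    """Step 2: extract parent GFIDs from names like trusted.pgfid.<PGFID>."""
    return [n[len(PGFID_PREFIX):] for n in xattr_names
            if n.startswith(PGFID_PREFIX)]


def names_in_dir_for_inode(parent_dir, target, max_links=None):
    """Step 3: scan one directory for entries sharing an inode with `target`.

    `max_links` is the hardlink count recorded for this parent; when given,
    the scan stops as soon as that many matches are found.
    """
    want = os.lstat(target)
    found = []
    with os.scandir(parent_dir) as entries:
        for entry in entries:
            st = entry.stat(follow_symlinks=False)
            if (st.st_dev, st.st_ino) == (want.st_dev, want.st_ino):
                found.append(entry.name)
                if max_links is not None and len(found) >= max_links:
                    break  # all links under this parent have been found
    return found
```

Each parent GFID from step 2 would first have to be mapped to a directory
on the brick before it can be passed as `parent_dir`.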
>>> Updating PGFID to be done when,
>>> 1. CREATE/MKNOD - Add xattr
>>> 2. RENAME - If moved to different directory, Update PGFID
>>> 3. UNLINK - If the number of links is more than 1, reduce the link
>>> count; remove the respective parent PGFID
>>> 4. LINK - Add PGFID if link to different directory, Increment count
>>>
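The bookkeeping above can be modelled with a plain dict that maps each
parent GFID to its link count. This is only a sketch of the idea in
Python, not the posix xlator code. It assumes that a parent entry is
dropped once its count reaches zero, which is my reading of step 3. The
function names are hypothetical.

```python
def on_create(pgfids, parent):
    """CREATE/MKNOD: record the first link under `parent`."""
    pgfids[parent] = 1


def on_link(pgfids, parent):
    """LINK: add the parent if it is new, otherwise increment its count."""
    pgfids[parent] = pgfids.get(parent, 0) + 1


def on_unlink(pgfids, parent):
    """UNLINK: decrement the count; drop the parent when no links remain."""
    remaining = pgfids[parent] - 1
    if remaining > 0:
        pgfids[parent] = remaining
    else:
        del pgfids[parent]


def on_rename(pgfids, old_parent, new_parent):
    """RENAME: only a move to a different directory changes the xattrs."""
    if old_parent != new_parent:
        on_unlink(pgfids, old_parent)
        on_link(pgfids, new_parent)
```

Replaying the example (create f1 under dir1, then link h1 under dir2)
gives one entry per parent, each with a count of 1, matching the xattr
dump shown earlier.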
>>> Advantages:
>>> 1. Crawling is limited to a few directories instead of full file system
>>> crawl.
>>> 2. The crawl can break early once the number of matches reaches the
>>> hardlink count stored in the xattr value.
>>>
>>> Disadvantages:
>>> 1. Crawling is expensive if a directory has a lot of files.
>>> 2. PGFID must be updated on CREATE/MKNOD/RENAME/UNLINK/LINK.
>>> 3. This method of conversion will not work if the file is deleted.
>>>
>>> We can improve performance of GFID to Path conversion if we record
>>> Basename also in file xattr.
>>>
>>> trusted.pgfid.3c789e71-24b0-4723-92a2-7eb3c14b4114=f1
>>> trusted.pgfid.6c3bf2ea-9b52-4bda-a1db-01f3ed5e3fed=h1
>>>
>>> Note: Multiple base names are delimited by a zero byte.
>>>
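With basenames stored in the value, the conversion needs no crawl at all:
each xattr yields one or more (parent GFID, basename) pairs directly. A
small Python sketch of the zero-byte encoding and its parsing follows. The
helper names are hypothetical, and the xattrs are passed in as a dict of
name to raw bytes so the logic can be tried without a brick.

```python
import os

PGFID_PREFIX = "trusted.pgfid."


def encode_basenames(names):
    """Join basenames into one xattr value, delimited by a zero byte."""
    return b"\0".join(os.fsencode(n) for n in names)


def decode_basenames(value):
    """Split an xattr value back into basenames, ignoring empty pieces."""
    return [os.fsdecode(p) for p in value.split(b"\0") if p]


def parent_and_names(xattrs):
    """Turn {xattr name: raw value} into a list of (parent GFID, basename)."""
    pairs = []
    for key, value in xattrs.items():
        if key.startswith(PGFID_PREFIX):
            parent = key[len(PGFID_PREFIX):]
            pairs.extend((parent, name) for name in decode_basenames(value))
    return pairs
```

A zero byte works as a delimiter because it cannot appear inside a file
name. This parsing step is also the extra cost on every update when a
parent holds more than one hardlink.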
>>> Additional overhead compared to storing only PGFID:
>>> 1. Space
>>> 2. Number of xattrs will grow with the number of hardlinks
>>> 3. Max size issue for xattr value?
>>> 4. Xattr must be updated even when a file is renamed within the same
>>> directory.
>>> 5. Updating the value of an xattr involves parsing in case of multiple
>>> hardlinks.
>>>
>>> Are there any performance issues except during initial indexing? (Assume
>>> pgfid and basenames are populated by a separate script.)
>>>
>>> Comments and Suggestions Welcome.
>>>