[Gluster-devel] compound fop design first cut

Pranith Kumar Karampuri pkarampu at redhat.com
Wed Dec 9 17:07:03 UTC 2015

On 12/09/2015 08:08 PM, Shyam wrote:
> On 12/09/2015 12:52 AM, Pranith Kumar Karampuri wrote:
>> On 12/09/2015 10:39 AM, Prashanth Pai wrote:
>>>> However, I’d be even more comfortable with an even simpler approach
>>>> that avoids the need to solve what the database folks (who have dealt
>>>> with complex transactions for years) would tell us is a really hard
>>>> problem. Instead of designing for every case we can imagine, let’s
>>>> design for the cases that we know would be useful for improving
>>>> performance. Open plus read/write plus close is an obvious one.
>>>> Raghavendra mentions create+inodelk as well.
>>> From an object interface (Swift/S3) perspective, this is the fop order
>>> and flow for object operations:
>>> GET: open(), fstat(), fgetxattr()s, read()s, close()
>> Krutika implemented fstat+fgetxattr (http://review.gluster.org/10180).
>> In posix there is an implementation of GF_CONTENT_KEY, which quick-read
>> uses to read a file in lookup. I think this needs to be exposed for fds
>> as well. Then you can do all of this using fstat on an anon-fd.
>>> HEAD: stat(), getxattr()s
>> Krutika already implemented this for sharding
>> (http://review.gluster.org/10158). You can do this using the stat fop.
> I believe we need to fork this part of the conversation, i.e. the
> clubbing of stat + xattr information.
> My view on a stat for gluster is that it should return POSIX stat plus
> gluster extended information. That is, when a file system stats its
> inode, it should get all information regarding the inode, not just
> the POSIX fields. In other local file systems, the inode structure has
> more fields than POSIX needs, so when the inode is *read* the FS can
> populate all its internal inode information and return to the
> application/syscall only the relevant fields that it needs.
> I believe gluster should do the same. In the cases above, we should
> extend our stat information (not elaborating how) to include all
> information from the brick, i.e. stat from POSIX and all the extended
> attrs for the inode (file or dir). This can then be consumed by any
> layer as needed.
> Currently, each layer adds what it needs, in addition to the stat
> information, to the xdata as an xattr request. This can continue, or
> go away if the relevant FOPs return the whole inode information upward.
> This also has useful outcomes in readdirp calls, where we get the
> extended stat information for each entry.
You can use "list-xattr" in the xdata request to get this.
> With the patches referred to, and older patches, this seems to be the
> direction sought (around 2013). Are there any reasons why this was not
> made prevalent across the stack? Or am I mistaken?
No reason. We can revive it. There didn't seem to be any interest, so I
didn't follow up to get it in.

>>> PUT: creat(), write()s, setxattr(), fsync(), close(), rename()
>> This I think should be a new compound fop. Nothing similar exists.
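To make the PUT case concrete, here is a toy sketch (not Gluster code) of
what compounding buys: the same six fops sent one per request versus
batched into a single request. ToyBrick, call(), compound(), the
"user.etag" key, and the stop-at-first-failure rule are all assumptions
made up for illustration; the actual error semantics are exactly the part
the design still has to settle.

```python
# Toy in-memory model. Every name here is hypothetical, not a Gluster API.
class ToyBrick:
    def __init__(self):
        self.files = {}        # path -> bytearray of file contents
        self.xattrs = {}       # path -> dict of extended attributes
        self.round_trips = 0   # number of requests received

    # Individual fops, reduced to the bare minimum.
    def creat(self, path):
        self.files[path] = bytearray()
        self.xattrs[path] = {}
        return path

    def write(self, path, data):
        self.files[path] += data
        return len(data)

    def setxattr(self, path, key, value):
        self.xattrs[path][key] = value
        return 0

    def fsync(self, path):
        return 0  # nothing to flush in an in-memory model

    def close(self, path):
        return 0

    def rename(self, old, new):
        self.files[new] = self.files.pop(old)
        self.xattrs[new] = self.xattrs.pop(old)
        return new

    def call(self, fop, *args):
        """One fop per request: each call costs one round trip."""
        self.round_trips += 1
        return getattr(self, fop)(*args)

    def compound(self, requests):
        """Several fops in one request. Assumed rule: stop at first failure."""
        self.round_trips += 1
        results = []
        for fop, *args in requests:
            try:
                results.append(getattr(self, fop)(*args))
            except KeyError as err:
                results.append(err)
                break
        return results


# The PUT flow from the list above, with a single write for brevity.
PUT = [
    ("creat", "obj.tmp"),
    ("write", "obj.tmp", b"hello"),
    ("setxattr", "obj.tmp", "user.etag", "abc"),
    ("fsync", "obj.tmp"),
    ("close", "obj.tmp"),
    ("rename", "obj.tmp", "obj"),
]

one_by_one = ToyBrick()
for fop, *args in PUT:
    one_by_one.call(fop, *args)

batched = ToyBrick()
results = batched.compound(PUT)

print(one_by_one.round_trips, batched.round_trips)
```

Both bricks end up with the same file and xattrs; only the number of
requests differs. A failure in the middle of the batch is where this gets
hard, which is why the toy simply stops and returns what it has.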
>>> DELETE: getxattr(), unlink()
>> This can also be clubbed into unlink today, because xdata already
>> exists on the wire.
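A rough sketch of the idea, again as a toy model rather than real code:
the unlink request carries an xdata dict naming the xattrs the caller
wants, and the reply hands them back, so no separate getxattr is needed.
The "want-xattrs" key, the "user.manifest" and "user.etag" names, and the
(ret, xdata) return shape are invented for this example.

```python
# Toy model. The xdata key and xattr names are hypothetical.
def unlink(store, path, xdata=None):
    """Remove path; return (ret, xdata_out) carrying the requested xattrs."""
    xdata = xdata or {}
    if path not in store:
        return -1, {}
    xattrs = store.pop(path)                # the unlink itself
    wanted = xdata.get("want-xattrs", [])   # what the caller asked for
    return 0, {k: xattrs[k] for k in wanted if k in xattrs}


# store maps path -> dict of extended attributes
store = {"obj": {"user.manifest": "seg-1,seg-2", "user.etag": "abc"}}
ret, out = unlink(store, "obj", xdata={"want-xattrs": ["user.manifest"]})
print(ret, out)
```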
>>> Compounding some of these ops and exposing them as consumable libgfapi
>>> APIs like glfs_get() and glfs_put() similar to librados compound
>>> APIs[1] would greatly improve performance for object based access.
>>> [1]:
>>> https://github.com/ceph/ceph/blob/master/src/include/rados/librados.h#L2219 
>>> Thanks.
>>> - Prashanth Pai
