[Gluster-devel] compound fop design first cut

Wed Dec 9 14:38:21 UTC 2015

On 12/09/2015 12:52 AM, Pranith Kumar Karampuri wrote:
>
>
> On 12/09/2015 10:39 AM, Prashanth Pai wrote:
>>> However, I’d be even more comfortable with an even simpler approach that
>>> avoids the need to solve what the database folks (who have dealt with
>>> complex transactions for years) would tell us is a really hard problem.
>>> Instead of designing for every case we can imagine, let’s design for the
>>> cases that we know would be useful for improving performance.  Open plus
>>> read/write plus close is an obvious one.  Raghavendra mentions
>>> create+inodelk as well.
>> From the object interface (Swift/S3) perspective, this is the fop order
>> and flow for object operations:
>>
>> GET: open(), fstat(), fgetxattr()s, read()s, close()
> Krutika implemented fstat+fgetxattr (http://review.gluster.org/10180). In
> posix there is an implementation of GF_CONTENT_KEY, which quick-read uses
> to read a file during lookup. I think this needs to be exposed for fds as
> well. Then you could do all of this using fstat on an anon-fd.
>> HEAD: stat(), getxattr()s
> Krutika already implemented this for sharding
> http://review.gluster.org/10158. You can do this using stat fop.

I believe we need to fork this part of the conversation, i.e., the
clubbing of stat and xattr information.

My view is that a stat in Gluster should return the POSIX stat plus
Gluster's extended information. My reasoning is that when a file system
stats an inode, it should get all the information about that inode, not
just the POSIX fields. In other local file systems, the inode structure
has more fields than POSIX needs, so when the inode is *read* the FS can
populate all of its internal inode information and return to the
application/syscall only the fields it needs.
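To make the local-FS analogy concrete, here is a minimal sketch, assuming a made-up on-disk inode layout (`struct disk_inode` and `fill_posix_stat()` are hypothetical names, not taken from any real file system). The whole inode has already been read; the stat path copies out only the POSIX fields and leaves the FS-internal ones alone:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical on-disk inode: carries more than POSIX stat exposes. */
struct disk_inode {
    uint64_t ino;
    uint32_t mode;
    uint32_t nlink;
    uint64_t size;
    uint64_t generation; /* FS-internal: not part of struct stat */
    uint32_t flags;      /* FS-internal: e.g. immutable/append-only bits */
};

/* The whole inode has already been read into 'in'; the stat() path copies
 * out only the fields POSIX defines and ignores generation/flags. */
static void fill_posix_stat(const struct disk_inode *in, struct stat *st)
{
    assert(in != NULL && st != NULL);
    memset(st, 0, sizeof(*st));
    st->st_ino = in->ino;
    st->st_mode = in->mode;
    st->st_nlink = in->nlink;
    st->st_size = in->size;
}
```

The FS-internal fields are still in memory for the FS's own use; they simply are not part of what stat() hands back.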

I believe Gluster should do the same. In the cases above, we should
extend our stat information (I am not elaborating on how here) to include
all the information from the brick, i.e., the POSIX stat and all the
extended attributes of the inode (file or directory). Any layer can then
consume whatever it needs.
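Purely as an illustration of the in-memory shape (not the wire format, which is deliberately left open above), an extended stat could pair the POSIX stat with every xattr on the inode. `struct ext_stat`, `struct ext_xattr`, `EXT_MAX_XATTRS` and `ext_stat_find()` are hypothetical names, and the fixed-size array is a simplification:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/stat.h>

#define EXT_MAX_XATTRS 16 /* hypothetical cap, for simplicity only */

/* Hypothetical: one extended attribute as returned by the brick. */
struct ext_xattr {
    const char *key;
    const void *value;
    size_t      len;
};

/* Hypothetical "extended stat": POSIX stat plus every xattr on the inode. */
struct ext_stat {
    struct stat      st;
    size_t           nxattrs;
    struct ext_xattr xattrs[EXT_MAX_XATTRS];
};

/* Any layer can look up the xattr it cares about without another fop.
 * Returns NULL if the brick did not report that key. */
static const struct ext_xattr *
ext_stat_find(const struct ext_stat *es, const char *key)
{
    assert(es != NULL && key != NULL);
    for (size_t i = 0; i < es->nxattrs; i++) {
        if (strcmp(es->xattrs[i].key, key) == 0)
            return &es->xattrs[i];
    }
    return NULL;
}
```

A layer such as sharding would then read its own keys out of the same reply that every other layer sees, instead of issuing its own request.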

Currently, each layer that needs more than the stat information adds an
xattr request to xdata. This can continue, or it can go away if the
relevant FOPs return the whole inode information upward.
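For contrast, the current request-driven model can be sketched roughly as follows. A plain array of key names stands in for the dict carried in xdata, and `struct xattr_req`, `xattr_req_add()` and `brick_reply_count()` are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define REQ_MAX 8 /* hypothetical limit for the sketch */

/* Hypothetical stand-in for the xattr-request keys a layer puts in xdata. */
struct xattr_req {
    size_t      nkeys;
    const char *keys[REQ_MAX];
};

/* Each layer on the way down adds the keys it needs (duplicates skipped).
 * Returns 0 on success, -1 if the request is full. */
static int xattr_req_add(struct xattr_req *req, const char *key)
{
    assert(req != NULL && key != NULL);
    for (size_t i = 0; i < req->nkeys; i++)
        if (strcmp(req->keys[i], key) == 0)
            return 0;
    if (req->nkeys == REQ_MAX)
        return -1;
    req->keys[req->nkeys++] = key;
    return 0;
}

/* The brick answers only what was asked: count how many of the inode's
 * xattrs (all_keys) make it back up the stack. */
static size_t brick_reply_count(const struct xattr_req *req,
                                const char *const *all_keys, size_t nall)
{
    size_t n = 0;
    for (size_t i = 0; i < nall; i++) {
        for (size_t j = 0; j < req->nkeys; j++) {
            if (strcmp(all_keys[i], req->keys[j]) == 0) {
                n++;
                break;
            }
        }
    }
    return n;
}
```

Under the proposal above, the reply would always carry all `nall` xattrs, so no layer would need to build a request list at all.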

This would also benefit readdirp calls, which would then return the
extended stat information for each entry.

Judging by the patches referred to above, and older ones (around 2013),
this seems to be the direction that was sought. Is there any reason why it
has not been made prevalent across the stack? Or am I mistaken?

>> PUT: creat(), write()s, setxattr(), fsync(), close(), rename()
> I think this should be a new compound fop. Nothing similar exists.
>> DELETE: getxattr(), unlink()
> This can also be clubbed into unlink, because xdata already exists on
> the wire.
>>
>> Compounding some of these ops and exposing them as consumable libgfapi
>> APIs like glfs_get() and glfs_put() similar to librados compound
>> APIs[1] would greatly improve performance for object based access.
>>
>> [1]:
>> https://github.com/ceph/ceph/blob/master/src/include/rados/librados.h#L2219
>>
>>
>> Thanks.
>>
>> - Prashanth Pai
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
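On the compound side, here is a minimal sketch of how the object GET and PUT sequences listed in the thread could be carried as one ordered op list, loosely in the spirit of the librados operation API referenced in [1]. Everything here (`enum cop`, `struct compound_req`, `compound_add()`, `compound_execute()`, `build_get()`, `build_put()`) is hypothetical and not an existing Gluster or libgfapi symbol. A real design would also need per-op arguments and results, plus fd passing between ops:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical op codes, one per fop in the object GET/PUT flows. */
enum cop {
    COP_OPEN, COP_CREATE, COP_FSTAT, COP_FGETXATTR, COP_READ,
    COP_WRITE, COP_SETXATTR, COP_FSYNC, COP_CLOSE, COP_RENAME
};

#define COMPOUND_MAX 16

/* Hypothetical compound request: an ordered list of ops sent in one RPC. */
struct compound_req {
    size_t   nops;
    enum cop ops[COMPOUND_MAX];
};

/* Append one op; returns 0 on success, -1 if the request is full. */
int compound_add(struct compound_req *req, enum cop op)
{
    assert(req != NULL);
    if (req->nops == COMPOUND_MAX)
        return -1;
    req->ops[req->nops++] = op;
    return 0;
}

/* Server side (hypothetical): run the ops in order and stop at the first
 * failure. exec_one returns 0 on success. Returns how many ops succeeded. */
typedef int (*cop_exec_fn)(enum cop op, void *ctx);

size_t compound_execute(const struct compound_req *req,
                        cop_exec_fn exec_one, void *ctx)
{
    size_t done = 0;
    for (size_t i = 0; i < req->nops; i++) {
        if (exec_one(req->ops[i], ctx) != 0)
            break;
        done++;
    }
    return done;
}

/* GET with one fgetxattr and one read: open, fstat, fgetxattr, read, close */
void build_get(struct compound_req *req)
{
    req->nops = 0;
    compound_add(req, COP_OPEN);
    compound_add(req, COP_FSTAT);
    compound_add(req, COP_FGETXATTR);
    compound_add(req, COP_READ);
    compound_add(req, COP_CLOSE);
}

/* PUT with one write: creat, write, setxattr, fsync, close, rename */
void build_put(struct compound_req *req)
{
    req->nops = 0;
    compound_add(req, COP_CREATE);
    compound_add(req, COP_WRITE);
    compound_add(req, COP_SETXATTR);
    compound_add(req, COP_FSYNC);
    compound_add(req, COP_CLOSE);
    compound_add(req, COP_RENAME);
}
```

The gain is in round trips: the GET built by `build_get()` is five fops, so sent one at a time it costs five network round trips, while as a single compound request it costs one. Stopping at the first failed op is only one possible semantic; whether later ops should be skipped or undone is exactly the transaction question raised at the top of the thread.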