[Gluster-devel] compound fop design first cut
Pranith Kumar Karampuri
pkarampu at redhat.com
Wed Dec 9 17:07:03 UTC 2015
On 12/09/2015 08:08 PM, Shyam wrote:
> On 12/09/2015 12:52 AM, Pranith Kumar Karampuri wrote:
>>
>>
>> On 12/09/2015 10:39 AM, Prashanth Pai wrote:
>>>> However, I’d be even more comfortable with an even simpler approach that
>>>> avoids the need to solve what the database folks (who have dealt with
>>>> complex transactions for years) would tell us is a really hard problem.
>>>> Instead of designing for every case we can imagine, let’s design for the
>>>> cases that we know would be useful for improving performance. Open plus
>>>> read/write plus close is an obvious one. Raghavendra mentions
>>>> create+inodelk as well.
>>> From object interface (Swift/S3) perspective, this is the fop order
>>> and flow for object operations:
>>>
>>> GET: open(), fstat(), fgetxattr()s, read()s, close()
>> Krutika implemented fstat+fgetxattr (http://review.gluster.org/10180). In
>> posix there is an implementation of GF_CONTENT_KEY, which quick-read uses
>> to read a file in lookup. I think this needs to be exposed for fds as
>> well. Then you can do all of this using fstat on an anon-fd.
>>> HEAD: stat(), getxattr()s
>> Krutika already implemented this for sharding
>> (http://review.gluster.org/10158). You can do this using the stat fop.
>
> I believe we need to fork this part of the conversation, i.e. the stat
> + xattr information clubbing.
>
> My view is that a stat in gluster should return the POSIX stat plus
> gluster's extended information. Put differently: when a file system
> stats its inode, it should get all information regarding the inode,
> not just the POSIX fields. In other local file systems, the inode
> structure has more fields than POSIX needs, so when the inode is
> *read* the FS can populate all its internal inode information and
> return to the application/syscall only the fields that it needs.
>
> I believe gluster should do the same, so in the cases above, we should
> actually extend our stat information (not elaborating how) to include
> all information from the brick, i.e. the stat from POSIX and all the
> extended attrs for the inode (file or dir). This can then be consumed
> by any layer as needed.
>
> Currently, each layer asks for what it needs beyond the stat
> information by adding an xattr request to the xdata. This can continue,
> or go away if the relevant FOPs return the whole inode information upward.
>
> This also has useful outcomes in readdirp calls, where we get the
> extended stat information for each entry.
You can use "list-xattr" in xdata request to get this.
>
> Going by the patches referred to, and older patches, this seems to be
> the direction that was sought (around 2013). Is there any reason why it
> was not adopted across the stack? Or am I mistaken?
No reason. We can revive it. There didn't seem to be any interest, so I
didn't follow up to get it in.
Pranith
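
To make the "stat plus all extended attrs" idea above concrete, here is a
minimal local sketch, not Gluster code. It assumes a Linux host where Python
exposes os.listxattr/os.getxattr; the function name extended_stat is
hypothetical. Locally this costs one stat call plus one listxattr and one
getxattr per attribute, which is exactly the chatter a single extended-stat
reply from the brick would avoid.

```python
import os


def extended_stat(path):
    """Return (stat_result, {xattr_name: value}) for path.

    Hypothetical helper: illustrates the shape of an "extended stat"
    reply, not any existing Gluster fop or xdata layout.
    """
    st = os.stat(path)
    xattrs = {}
    # os.listxattr/os.getxattr are only available on Linux.
    if hasattr(os, "listxattr"):
        try:
            names = os.listxattr(path)
        except OSError:
            # The file system may not support xattrs at all.
            names = []
        for name in names:
            try:
                xattrs[name] = os.getxattr(path, name)
            except OSError:
                # Attribute vanished or is not readable; skip it.
                pass
    return st, xattrs
```

A caller would then pick whatever it needs from the pair, for example
`st, xattrs = extended_stat("/some/file")`, instead of issuing a separate
getxattr for each key.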
>
>>> PUT: creat(), write()s, setxattr(), fsync(), close(), rename()
>> This I think should be a new compound fop. Nothing similar exists.
>>> DELETE: getxattr(), unlink()
>> This can also be clubbed into unlink, because xdata already exists on
>> the wire.
>>>
>>> Compounding some of these ops and exposing them as consumable libgfapi
>>> APIs like glfs_get() and glfs_put() similar to librados compound
>>> APIs[1] would greatly improve performance for object based access.
>>>
>>> [1]:
>>> https://github.com/ceph/ceph/blob/master/src/include/rados/librados.h#L2219
>>>
>>>
>>>
>>> Thanks.
>>>
>>> - Prashanth Pai
>>
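
As a rough illustration of why compounding the GET sequence from the thread
(open, fstat, read, close) helps, here is a toy model that only counts
requests. Everything in it is an assumption made for illustration: ToyBrick,
get_separate and get_compound are hypothetical names, the "brick" just calls
the local OS, and nothing here reflects Gluster's wire protocol or any
libgfapi call. The chain stops at the first failing fop and makes no attempt
at rollback, in line with the "keep it simple" suggestion quoted above.

```python
import os


class ToyBrick:
    """Stand-in for a brick; counts the requests it receives."""

    def __init__(self):
        self.round_trips = 0

    def _run(self, op, args, fd):
        # Execute one fop. 'fd' is the handle from an earlier open.
        if op == "open":
            return os.open(args["path"], os.O_RDONLY)
        if op == "fstat":
            return os.fstat(fd)
        if op == "read":
            return os.read(fd, args["size"])
        if op == "close":
            os.close(fd)
            return None
        raise ValueError("unknown fop: " + op)

    def single(self, op, fd=None, **args):
        """One fop per request."""
        self.round_trips += 1
        return self._run(op, args, fd)

    def compound(self, ops):
        """A list of (op, args) pairs in one request, run in order."""
        self.round_trips += 1
        fd = None
        results = []
        for op, args in ops:
            result = self._run(op, args, fd)
            if op == "open":
                fd = result  # later fops in the chain reuse this fd
            results.append(result)
        return results


def get_separate(brick, path, max_size=1 << 20):
    fd = brick.single("open", path=path)
    st = brick.single("fstat", fd=fd)
    data = brick.single("read", fd=fd, size=max_size)
    brick.single("close", fd=fd)
    return st.st_size, data


def get_compound(brick, path, max_size=1 << 20):
    _, st, data, _ = brick.compound([
        ("open", {"path": path}),
        ("fstat", {}),
        ("read", {"size": max_size}),
        ("close", {}),
    ])
    return st.st_size, data
```

For a small object both functions return the same size and data, but
get_separate costs four requests while get_compound costs one. A PUT chain
(creat, write, setxattr, fsync, close, rename) could be modelled the same way.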