[Gluster-devel] optimizing gluster fuse

Manoj Pillai mpillai at redhat.com
Tue Apr 10 05:24:16 UTC 2018


On Tue, Apr 10, 2018 at 10:02 AM, riya khanna <riyakhanna1983 at gmail.com>
wrote:

> On Mon, Apr 9, 2018 at 10:42 PM, Raghavendra Gowdappa <rgowdapp at redhat.com
> > wrote:
>
>> +Manoj.
>>
>> On Mon, Apr 9, 2018 at 10:18 PM, riya khanna <riyakhanna1983 at gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> I'm trying to use the new framework to speed up lookup/attr/xattr
>>> operations by splitting functionality between fast and slow execution
>>> paths. I'd highly appreciate it if you could suggest experiments to
>>> evaluate the performance improvement.
>>>
>>
How about a software build workload, varying the number of source files?
Especially the case where nothing needs to be done because no files have
changed since the last build -- this case should be all metadata operations.

-- Manoj


>> As you've pointed out already, this is a good place for read caches (both
>> data and metadata). While there is an overlap between things cached by the
>> kernel and things cached by glusterfs, there are some things which are
>> cached only by glusterfs and not by the VFS/kernel. I think this is the
>> area we can explore to move these caches into the kernel. Things I can
>> think of:
>>
>>
> Even if things are cached by the VFS (e.g., dir entries, attributes,
> etc.), the size of the VFS dcache is limited and performance can suffer
> when it is under pressure. Have you ever experienced such a case?
> Nevertheless, the new framework can help you create your own dir/attr
> cache managed by the user-space daemon - let's call it a self-managed
> dcache.
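>
> Just to make the idea concrete, here is a minimal kernel-side sketch of
> what a self-managed dcache entry could look like - all names here are
> hypothetical, not the actual framework:
>
> #include <linux/types.h>
> #include <linux/list.h>
> #include <linux/fuse.h>                /* struct fuse_entry_out */
>
> /* Hypothetical entry in a daemon-managed dcache kept inside the fuse
>  * kernel module, keyed by <parent nodeid, entry name>. */
> struct fp_dentry {
>         struct hlist_node hash;        /* chaining in the cache hash table */
>         u64 parent_nodeid;             /* FUSE nodeid of the parent directory */
>         char *name;                    /* NUL-terminated copy of the entry name */
>         struct fuse_entry_out entry;   /* reply blob originally built by the daemon */
>         u64 gen;                       /* generation, bumped on invalidation */
> };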
>
>
>> * xattr caching - done by md-cache in glusterfs. I am not sure whether
>> the VFS caches xattrs. If not, this can yield good returns for workloads
>> involving xattrs (like POSIX ACLs, etc.).
>>
>
> Thanks! Similar to attr, xattr caching should be doable as well. I can
> start by looking at the existing implementation in md-cache.
>
>
>> * A GET-like interface for small files - done by quick-read in
>> glusterfs. Note that we fetch the file in lookup. If we couple this with
>> pushing open-behind into the kernel, we can prevent open/readv/flush/release
>> calls from reaching glusterfs at all in suitable workloads (we had earlier
>> found that this boosts performance for webserver use cases). I think in the
>> lookup response we would have to populate the page cache. Also, the lookup
>> response signature doesn't provide for holding this data. Not sure whether
>> this can be done.
>>
>
> This one is tricky. There are some limitations imposed by the framework.
> Let me think about it.
>
>
>> * Dirent prefetching for directories - done by readdir-ahead.
>>
> The user-space daemon can populate the self-managed dcache in readdir().
> Future lookups can then be served from this cache entirely within the
> kernel. What kind of workload would benefit from this?
>
>
>> * As you've already pointed out, we can improve on our invalidation
>> strategies.
>> * Since the page cache is already present in the VFS, I don't think
>> read-ahead/io-cache would bring much benefit.
>>
>
> The framework can also bypass the fuse user-space daemon during data I/O
> (e.g., read, write) if the file is stored locally by the lower file system.
> This design is called pass-through I/O and has been discussed numerous
> times on the fuse-devel mailing list. Recent discussion:
> https://lwn.net/Articles/674286/
> Does this apply to glusterfs as well, perhaps when a file is cached locally
> by the client?
>
>
>>> As I mentioned in my previous email, I'm caching replies from the fuse
>>> daemon (hashed key/value blobs) in the kernel so that for the same key
>>> (e.g., <parent ino, child name> in the case of FUSE_LOOKUP), the reply
>>> (e.g., fuse_entry_out) is served from the kernel itself and no call is
>>> delivered to user space.
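>>>
>>> Roughly, the fast path for FUSE_LOOKUP works as sketched below (fp_dentry
>>> is a hypothetical cache-entry struct holding the parent nodeid, the name
>>> and the cached fuse_entry_out; fp_cache and fp_hash are likewise
>>> illustrative names, not existing fuse code; locking and refcounting are
>>> omitted for brevity):
>>>
>>> #define FP_HASH_BITS 10
>>> static struct hlist_head fp_cache[1 << FP_HASH_BITS];
>>>
>>> /* On a hit, the reply the daemon cached earlier is copied out and no
>>>  * request is queued to user space; on a miss the caller falls back to
>>>  * the regular slow path (an ordinary FUSE_LOOKUP upcall). */
>>> static int fp_lookup_fast(u64 parent, const char *name,
>>>                           struct fuse_entry_out *out)
>>> {
>>>         struct fp_dentry *d;
>>>         u32 bkt = fp_hash(parent, name);   /* hash of <parent ino, name> */
>>>
>>>         hlist_for_each_entry(d, &fp_cache[bkt], hash) {
>>>                 if (d->parent_nodeid == parent && !strcmp(d->name, name)) {
>>>                         *out = d->entry;
>>>                         return 0;
>>>                 }
>>>         }
>>>         return -ENOENT;
>>> }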
>>>
>>> While this may seem redundant due to entry_timeout/attr_timeout caching
>>> that already exists in FUSE, this design provides more control to the
>>> user-space daemon over when/what to invalidate. For instance, entry_timeout
>>> caching is only valid until a timeout or until the kernel removes the
>>> dentry from its dcache.
>>>
>>> For invalidation, fuse_lowlevel_notify_inval_entry() can also remove
>>> entries from the hash table. Please refer to the figure attached in my last
>>> email.
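>>>
>>> On the daemon side this is the standard libfuse notification - a small
>>> sketch below, using the libfuse 3 signature (libfuse 2 takes a struct
>>> fuse_chan * instead of the session); in this design the same call would
>>> also drop the entry from the in-kernel hash table:
>>>
>>> #define FUSE_USE_VERSION 30
>>> #include <fuse_lowlevel.h>
>>> #include <string.h>
>>> #include <stdio.h>
>>> #include <errno.h>
>>>
>>> /* Ask the kernel to forget the dentry for <parent, name>; -ENOENT just
>>>  * means the kernel had nothing cached under that name. */
>>> static void invalidate_entry(struct fuse_session *se,
>>>                              fuse_ino_t parent, const char *name)
>>> {
>>>         int ret = fuse_lowlevel_notify_inval_entry(se, parent, name,
>>>                                                    strlen(name));
>>>         if (ret && ret != -ENOENT)
>>>                 fprintf(stderr, "inval_entry(%llu, %s): %s\n",
>>>                         (unsigned long long)parent, name, strerror(-ret));
>>> }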
>>>
>>> Thanks,
>>> Riya
>>>
>>> On Tue, Apr 3, 2018 at 1:45 PM, riya khanna <riyakhanna1983 at gmail.com>
>>> wrote:
>>>
>>>> I'm attaching a figure that depicts the architecture of my optimized
>>>> fuse framework. Kindly let me know if you have any questions.
>>>>
>>>> On Mon, Apr 2, 2018 at 10:57 AM, riya khanna <riyakhanna1983 at gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks Amar! Please see my answers inline.
>>>>>
>>>>> On Mon, Apr 2, 2018 at 5:41 AM, Amar Tumballi <atumball at redhat.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Riya,
>>>>>>
>>>>>> Thanks for writing to us. Some questions before we start on this.
>>>>>>
>>>>>> * Where can we see your work on modifying the fuse module to cache
>>>>>> the calls? Some reference would help us provide more specific pointers
>>>>>> (or ask better questions).
>>>>>>
>>>>> I've created a fast path framework for FUSE, where the user-space
>>>>> daemon can load a module and register handlers for file operations
>>>>> (lookup, open, r/w, etc.) that must be handled in the kernel itself
>>>>> without an upcall to user space. I call them fast path handlers. This
>>>>> design also retains the regular FUSE handlers for file system operations
>>>>> in user space (slow path). The fast path and slow path can communicate
>>>>> with each other over shared memory or using syscalls to enable/invalidate
>>>>> caching of data structs (e.g., results of getattr, getxattr, etc.).
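>>>>>
>>>>> To give a feel for the interface (purely illustrative - the names and
>>>>> signatures below are not the actual framework), the fs-specific module
>>>>> would register something like this:
>>>>>
>>>>> #include <linux/types.h>
>>>>> #include <linux/fuse.h>
>>>>>
>>>>> struct fuse_conn;                  /* in-kernel per-mount connection */
>>>>>
>>>>> /* Hypothetical fast-path handler table; any op left NULL falls through
>>>>>  * to the normal FUSE slow path (an upcall to the user-space daemon). */
>>>>> struct fp_ops {
>>>>>         int (*lookup)(u64 parent, const char *name,
>>>>>                       struct fuse_entry_out *out);
>>>>>         int (*getattr)(u64 nodeid, struct fuse_attr_out *out);
>>>>>         ssize_t (*getxattr)(u64 nodeid, const char *name,
>>>>>                             void *buf, size_t size);
>>>>>         /* ... other operations a filesystem chooses to accelerate ... */
>>>>> };
>>>>>
>>>>> /* Called by the loaded fs-specific module (glusterfs here) to attach
>>>>>  * its fast handlers to a mounted FUSE connection. */
>>>>> int fp_register_ops(struct fuse_conn *fc, const struct fp_ops *ops);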
>>>>>
>>>>> There's a process I need to follow in order to make the code available
>>>>> publicly. I've already started, but it will take some time. I will try
>>>>> to do this asap.
>>>>>
>>>>>> * If the caching happens in the fuse module, and it expects the regular
>>>>>> arguments as parameters, then there may not be any work required at all
>>>>>> in glusterfs, as it works on the low-level fuse API.
>>>>>>
>>>>>>
>>>>> The fast handlers expect the same interface and args (fuse_args) as the
>>>>> regular user-space daemon. The fast handler code is fs-specific and,
>>>>> therefore, must come from glusterfs. Changes are also needed in the
>>>>> glusterfs code to communicate with the fast path for enabling/invalidating
>>>>> caching.
>>>>>
>>>>>
>>>>>> * Also, how do you invalidate caches from the user-space program?
>>>>>> Because GlusterFS can be accessed from multiple clients, this becomes
>>>>>> an important piece to have.
>>>>>>
>>>>>>
>>>>> A server-side invalidation can trigger a system call into the fast path
>>>>> framework to invalidate caches.
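>>>>>
>>>>> For example (purely hypothetical - the control fd, FP_IOC_INVAL_ENTRY
>>>>> and the arg struct are illustrative, not an existing interface), the
>>>>> glusterfs fuse-bridge could issue something like this when a server
>>>>> upcall invalidates an entry:
>>>>>
>>>>> #include <stdint.h>
>>>>> #include <stdio.h>
>>>>> #include <limits.h>
>>>>> #include <sys/ioctl.h>
>>>>>
>>>>> /* Argument for the hypothetical fast-path invalidation ioctl. */
>>>>> struct fp_inval_arg {
>>>>>         uint64_t parent;               /* FUSE nodeid of the parent dir */
>>>>>         char     name[NAME_MAX + 1];   /* entry name to drop */
>>>>> };
>>>>>
>>>>> #define FP_IOC_INVAL_ENTRY _IOW('f', 1, struct fp_inval_arg)
>>>>>
>>>>> /* fp_ctl_fd is a control fd opened on the fast-path module. */
>>>>> static int fp_invalidate_entry(int fp_ctl_fd, uint64_t parent,
>>>>>                                const char *name)
>>>>> {
>>>>>         struct fp_inval_arg arg = { .parent = parent };
>>>>>
>>>>>         snprintf(arg.name, sizeof(arg.name), "%s", name);
>>>>>         return ioctl(fp_ctl_fd, FP_IOC_INVAL_ENTRY, &arg);
>>>>> }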
>>>>>
>>>>>
>>>>>> To see how the codebase integrates with the fuse module, please check
>>>>>> the directory 'xlators/mount/fuse/src/', mostly the file
>>>>>> 'fuse-bridge.c'.
>>>>>>
>>>>>> Thanks for your interest in the project; it would be great to
>>>>>> collaborate on this effort, as it can enhance the performance of
>>>>>> glusterfs in many use cases.
>>>>>>
>>>>>
>>>>> I'm still going through the gluster developer documentation, but it'd
>>>>> be helpful if you could mention what kinds of use cases the fast/slow
>>>>> split FUSE framework would enable. I've already applied the framework to
>>>>> accelerate multiple FUSE-based stackable file systems, but I want the
>>>>> interface to be generic enough for all FUSE file systems to take
>>>>> advantage of it.
>>>>>
>>>>>
>>>>>> Regards,
>>>>>> Amar
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Apr 2, 2018 at 6:34 AM, riya khanna <riyakhanna1983 at gmail.com
>>>>>> > wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I've modified the FUSE framework to take part of the user-space daemon
>>>>>>> code and move it into the kernel fuse driver, to minimize
>>>>>>> user-kernel-user switches during file system operations. An example
>>>>>>> would be caching getattr/getxattr/lookup/security checks, etc. This
>>>>>>> design, therefore, creates a fast (served directly from the kernel) and
>>>>>>> a slow (regular fuse) execution path. The fast and slow paths can also
>>>>>>> communicate with each other using shared memory.
>>>>>>>
>>>>>>> I was wondering if it is possible to accelerate glusterfs using this
>>>>>>> design. What pieces could (should) be easily moved to kernel space?
>>>>>>> Any pointers would be highly appreciated. Thanks!
>>>>>>>
>>>>>>> -Riya
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Gluster-devel mailing list
>>>>>>> Gluster-devel at gluster.org
>>>>>>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Amar Tumballi (amarts)
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>