[Gluster-devel] Fuse mounts and inodes
Csaba Henk
chenk at redhat.com
Wed Sep 6 05:46:24 UTC 2017
Thanks Du, nice bit of info! It made me wonder about the following:
- Could it then be the default answer we give to "glusterfs client high
memory usage" type complaints to set vfs_cache_pressure to 100 + x?
- And then x = ? Was proper performance testing done to see how
performance / memory consumption changes in terms of vfs_cache_pressure?
- vfs_cache_pressure is a system-wide tunable. If 100 + x is ideal for
GlusterFS, can we dare to propose this? Is there no risk of trashing
other (disk-based) filesystems' performance?
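
For concreteness, a minimal sketch of setting the knob from code (an
illustration only: it assumes root privileges, and 200 merely stands in
for some 100 + x, not a tested recommendation):

    #include <stdio.h>
    #include <stdlib.h>

    int
    main(void)
    {
        FILE *fp = fopen("/proc/sys/vm/vfs_cache_pressure", "r+");
        int cur = 0;

        if (!fp) {
            perror("fopen");
            return EXIT_FAILURE;
        }
        if (fscanf(fp, "%d", &cur) == 1)
            printf("vfs_cache_pressure was %d\n", cur);
        rewind(fp);
        /* system-wide knob: affects dentry/inode reclaim for every
         * filesystem, not just the gluster fuse mount */
        fprintf(fp, "%d\n", 200);
        fclose(fp);
        return EXIT_SUCCESS;
    }
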
Csaba
On Wed, Sep 6, 2017 at 6:57 AM, Raghavendra G <raghavendra at gluster.com> wrote:
> Another parallel effort could be trying to configure the number of
> inodes/dentries cached by the kernel VFS using the /proc/sys/vm
> interface.
>
> ==============================================================
>
> vfs_cache_pressure
> ------------------
>
> This percentage value controls the tendency of the kernel to reclaim
> the memory which is used for caching of directory and inode objects.
>
> At the default value of vfs_cache_pressure=100 the kernel will attempt
> to reclaim dentries and inodes at a "fair" rate with respect to
> pagecache and swapcache reclaim. Decreasing vfs_cache_pressure causes
> the kernel to prefer to retain dentry and inode caches. When
> vfs_cache_pressure=0, the kernel will never reclaim dentries and
> inodes due to memory pressure and this can easily lead to
> out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
> causes the kernel to prefer to reclaim dentries and inodes.
>
> Increasing vfs_cache_pressure significantly beyond 100 may have
> negative performance impact. Reclaim code needs to take various locks
> to find freeable directory and inode objects. With
> vfs_cache_pressure=1000, it will look for ten times more freeable
> objects than there are.
>
> Also, we have an article for sysadmins which has a relevant section:
>
> <quote>
>
> With GlusterFS, many users with a lot of storage and many small files
> easily end up using a lot of RAM on the server side due to
> 'inode/dentry' caching, leading to decreased performance when the kernel
> keeps crawling through data-structures on a 40GB RAM system. Changing
> this value higher than 100 has helped many users to achieve fair caching
> and more responsiveness from the kernel.
>
> </quote>
>
> The complete article can be found at:
> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Linux%20Kernel%20Tuning/
>
> regards,
>
>
> On Tue, Sep 5, 2017 at 5:20 PM, Raghavendra Gowdappa
> <rgowdapp at redhat.com> wrote:
>>
>> +gluster-devel
>>
>> Ashish just spoke to me about the need for GC of inodes due to some
>> state in the inode that is being proposed in EC. Hence I'm adding
>> more people to the conversation.
>>
>> > > On 4 September 2017 at 12:34, Csaba Henk <chenk at redhat.com> wrote:
>> > >
>> > > > I don't know, it depends on how sophisticated a GC we
>> > > > need/want/can get by with. I guess the complexity will be
>> > > > inherent, i.e. that of the algorithm chosen and of how we
>> > > > address concurrency & performance impacts, but once that's got
>> > > > right, the other aspects of the implementation won't be hard.
>> > > >
>> > > > E.g., would it be good just to maintain a simple LRU list?
>> > > >
>> >
>> > Yes. I was also thinking of leveraging the lru list. We can
>> > invalidate the first "n" inodes from the lru list of the fuse
>> > inode table, roughly as in the sketch below.
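>> >
>> > A rough sketch of that pruning (inode_table_t, table->lru and
>> > inode_invalidate follow libglusterfs's inode.h, but the locking and
>> > race handling here are simplifying assumptions, not the real
>> > implementation):
>> >
>> >     /* assumes libglusterfs's inode.h and list.h */
>> >     static void
>> >     fuse_prune_lru_inodes(inode_table_t *table, uint32_t n)
>> >     {
>> >         inode_t *inode = NULL;
>> >         inode_t *tmp = NULL;
>> >
>> >         pthread_mutex_lock(&table->lock);
>> >         {
>> >             list_for_each_entry_safe(inode, tmp, &table->lru, list) {
>> >                 if (n-- == 0)
>> >                     break;
>> >                 /* triggers the registered fuse_invalidate_*
>> >                  * callback so the kernel gives up its reference
>> >                  * and the inode can go through the normal forget
>> >                  * path; a real implementation would take a ref
>> >                  * and invalidate outside the table lock */
>> >                 inode_invalidate(inode);
>> >             }
>> >         }
>> >         pthread_mutex_unlock(&table->lock);
>> >     }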
>> >
>> > >
>> > > That might work for starters.
>> > >
>> > > >
>> > > > Csaba
>> > > >
>> > > > On Mon, Sep 4, 2017 at 8:48 AM, Nithya Balachandran
>> > > > <nbalacha at redhat.com> wrote:
>> > > >
>> > > >>
>> > > >>
>> > > >> On 4 September 2017 at 12:14, Csaba Henk <chenk at redhat.com> wrote:
>> > > >>
>> > > >>> Basically, this is how I see the fuse invalidate calls: as
>> > > >>> rescuers of sanity.
>> > > >>>
>> > > >>> Normally, when you have a lot of a certain kind of stuff that
>> > > >>> tends to accumulate, the immediate thought is: let's set up
>> > > >>> some garbage collection mechanism that will take care of
>> > > >>> keeping the accumulation at bay. But that's what doesn't work
>> > > >>> with inodes in a naive way, as they are referenced from the
>> > > >>> kernel, so we have to keep them around until the kernel tells
>> > > >>> us it's giving up its reference. However, with the fuse
>> > > >>> invalidate calls we can take the initiative and instruct the
>> > > >>> kernel: "hey, kernel, give up your references to this thing!"
>> > > >>>
>> > > >>> So we are actually free to implement any kind of inode GC in
>> > > >>> glusterfs; we just have to take care to add the proper
>> > > >>> callback to fuse_invalidate_* and we are good to go.
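>> > > >>>
>> > > >>> For illustration only (gluster's fuse bridge writes the
>> > > >>> FUSE_NOTIFY_INVAL_* messages to /dev/fuse itself; libfuse 3's
>> > > >>> low-level API just shows the shape of the same reverse
>> > > >>> invalidation):
>> > > >>>
>> > > >>>     #define FUSE_USE_VERSION 30
>> > > >>>     #include <fuse_lowlevel.h>
>> > > >>>     #include <string.h>
>> > > >>>
>> > > >>>     static void
>> > > >>>     drop_kernel_refs(struct fuse_session *se, fuse_ino_t parent,
>> > > >>>                      const char *name, fuse_ino_t ino)
>> > > >>>     {
>> > > >>>         /* make the kernel forget the dentry (parent, name) */
>> > > >>>         fuse_lowlevel_notify_inval_entry(se, parent, name,
>> > > >>>                                          strlen(name));
>> > > >>>         /* drop the inode's cached data too; a negative offset
>> > > >>>          * would invalidate attributes only */
>> > > >>>         fuse_lowlevel_notify_inval_inode(se, ino, 0, 0);
>> > > >>>     }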
>> > > >>>
>> > > >>>
>> > > >> That sounds good, and it's something we need to do in the near
>> > > >> future. Is this something that is easy to implement?
>> > > >>
>> > > >>
>> > > >>> Csaba
>> > > >>>
>> > > >>> On Mon, Sep 4, 2017 at 7:00 AM, Nithya Balachandran
>> > > >>> <nbalacha at redhat.com> wrote:
>> > > >>>
>> > > >>>>
>> > > >>>>
>> > > >>>> On 4 September 2017 at 10:25, Raghavendra Gowdappa
>> > > >>>> <rgowdapp at redhat.com> wrote:
>> > > >>>>
>> > > >>>>>
>> > > >>>>>
>> > > >>>>> ----- Original Message -----
>> > > >>>>> > From: "Nithya Balachandran" <nbalacha at redhat.com>
>> > > >>>>> > Sent: Monday, September 4, 2017 10:19:37 AM
>> > > >>>>> > Subject: Fuse mounts and inodes
>> > > >>>>> >
>> > > >>>>> > Hi,
>> > > >>>>> >
>> > > >>>>> > One of the reasons for the memory consumption in gluster
>> > > >>>>> > fuse mounts is the number of inodes in the table which
>> > > >>>>> > are never kicked out.
>> > > >>>>> >
>> > > >>>>> > Is there any way to default to an entry-timeout and
>> > > >>>>> > attribute-timeout value while mounting Gluster using
>> > > >>>>> > Fuse? Say 60s each, so those entries will be purged
>> > > >>>>> > periodically?
>> > > >>>>>
>> > > >>>>> Once the entry times out, inodes won't be purged. The
>> > > >>>>> kernel sends a lookup to revalidate the mapping of path to
>> > > >>>>> inode. AFAIK, reverse invalidation (see inode_invalidate)
>> > > >>>>> is the only way to make the kernel forget
>> > > >>>>> inodes/attributes.
>> > > >>>>>
>> > > >>>> Is that something that can be done from the Fuse mount? Or
>> > > >>>> is this something that needs to be added to Fuse?
>> > > >>>>
>> > > >>>>> >
>> > > >>>>> > Regards,
>> > > >>>>> > Nithya
>> > > >>>>> >
>> > > >>>>>
>> > > >>>>
>> > > >>>>
>> > > >>>
>> > > >>
>> > > >
>> > >
>> >
>
>
>
>
> --
> Raghavendra G
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel