[Gluster-devel] Fuse mounts and inodes

Raghavendra G raghavendra at gluster.com
Wed Sep 6 14:49:16 UTC 2017


On Wed, Sep 6, 2017 at 11:16 AM, Csaba Henk <chenk at redhat.com> wrote:

> Thanks Du, nice bit of info! It made me wonder about the following:
>
> - Could setting vfs_cache_pressure to 100 + x then be the default answer
>   we give to "glusterfs client high memory usage" type complaints?
>
> - And then x = ? Was there proper performance testing done to see how
>   performance / memory consumption changes in terms of vfs_cache_pressure?
>

I had a discussion with Manoj on this. One drawback with the
vfs_cache_pressure tunable is that it feeds a dynamic algorithm which decides
whether to purge from the page cache or the inode cache based on the current
memory pressure. An obvious drawback for glusterfs is that the various
glusterfs caches are not visible to the kernel (memory consumed by glusterfs
is reflected in neither the page cache nor the inode cache). This _might_
result in the algorithm working poorly.
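
To make this concrete: a userspace cache shows up only as process RSS, so the
kernel's reclaim heuristic never "sees" it. A minimal sketch (Python,
illustrative only; the glusterfs caches themselves are in C, this just models
the accounting gap):

```python
import resource

# Build a ~4 MiB userspace "cache", standing in for glusterfs's own
# inode/dentry/data caches. This memory is plain process RSS: it shows
# up in neither the kernel page cache nor the kernel dentry/inode
# caches, so the vfs_cache_pressure heuristic cannot account for it
# when deciding what to reclaim.
cache = [b"x" * 4096 for _ in range(1024)]

# Peak RSS of this process (KiB on Linux) -- the only place the
# kernel "sees" the cache above.
rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(len(cache), rss)
```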

> - vfs_cache_pressure is a system-wide tunable. If 100 + x is ideal for
>   GlusterFS, can we take the courage to propose this? Is there no risk of
>   trashing other (disk-based) filesystems' performance?
>

That's a valid point. The behavior of other filesystems would be a concern.

I've not really thought this suggestion of tuning the /proc/sys/vm knobs
through, and I am not an expert on what tunables are at our disposal. I just
wanted to bring the idea to the notice of a wider audience.
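
For anyone who wants to experiment: the tunable lives at
/proc/sys/vm/vfs_cache_pressure and can be set (as root) with
"sysctl -w vm.vfs_cache_pressure=<value>". Below is a small sketch that
reads and sanity-checks the value; the helper names are my own, for
illustration only:

```python
PRESSURE_PATH = "/proc/sys/vm/vfs_cache_pressure"  # Linux only

def parse_pressure(text: str) -> int:
    """Parse the single integer the kernel exposes.

    0 = never reclaim dentries/inodes under memory pressure (OOM risk),
    100 = the default "fair" rate, > 100 = bias reclaim toward dentries
    and inodes (what 100 + x would mean for glusterfs clients).
    """
    value = int(text.strip())
    if value < 0:
        raise ValueError("vfs_cache_pressure cannot be negative")
    return value

def read_pressure(path: str = PRESSURE_PATH) -> int:
    """Read the live value from /proc (requires a Linux host)."""
    with open(path) as f:
        return parse_pressure(f.read())

print(parse_pressure("100\n"))
```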


> Csaba
>
> On Wed, Sep 6, 2017 at 6:57 AM, Raghavendra G <raghavendra at gluster.com>
> wrote:
> > Another parallel effort could be trying to configure the number of
> > inodes/dentries cached by kernel VFS using /proc/sys/vm interface.
> >
> > ==============================================================
> >
> > vfs_cache_pressure
> > ------------------
> >
> > This percentage value controls the tendency of the kernel to reclaim
> > the memory which is used for caching of directory and inode objects.
> >
> > At the default value of vfs_cache_pressure=100 the kernel will attempt to
> > reclaim dentries and inodes at a "fair" rate with respect to pagecache and
> > swapcache reclaim.  Decreasing vfs_cache_pressure causes the kernel to
> > prefer to retain dentry and inode caches. When vfs_cache_pressure=0, the
> > kernel will never reclaim dentries and inodes due to memory pressure and
> > this can easily lead to out-of-memory conditions. Increasing
> > vfs_cache_pressure beyond 100 causes the kernel to prefer to reclaim
> > dentries and inodes.
> >
> > Increasing vfs_cache_pressure significantly beyond 100 may have negative
> > performance impact. Reclaim code needs to take various locks to find
> > freeable directory and inode objects. With vfs_cache_pressure=1000, it
> > will look for ten times more freeable objects than there are.
> >
> > Also, we have an article for sysadmins with a relevant section:
> >
> > <quote>
> >
> > With GlusterFS, many users with a lot of storage and many small files
> > easily end up using a lot of RAM on the server side due to
> > 'inode/dentry' caching, leading to decreased performance when the kernel
> > keeps crawling through data-structures on a 40GB RAM system. Changing
> > this value higher than 100 has helped many users to achieve fair caching
> > and more responsiveness from the kernel.
> >
> > </quote>
> >
> > The complete article can be found at:
> > https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Linux%20Kernel%20Tuning/
> >
> > regards,
> >
> >
> > On Tue, Sep 5, 2017 at 5:20 PM, Raghavendra Gowdappa
> > <rgowdapp at redhat.com> wrote:
> >>
> >> +gluster-devel
> >>
> >> Ashish just spoke to me about the need for GC of inodes due to some
> >> state in the inode that is being proposed in EC. Hence I am adding more
> >> people to the conversation.
> >>
> >> > > On 4 September 2017 at 12:34, Csaba Henk <chenk at redhat.com> wrote:
> >> > >
> >> > > > I don't know, it depends on how sophisticated a GC we
> >> > > > need/want/can get by with. I guess the complexity will be
> >> > > > inherent, i.e. that of the algorithm chosen and how we address
> >> > > > concurrency & performance impacts, but once that's got right the
> >> > > > other aspects of the implementation won't be hard.
> >> > > >
> >> > > > Eg. would it be good just to maintain a simple LRU list?
> >> > > >
> >> >
> >> > Yes. I was also thinking of leveraging the lru list. We can
> >> > invalidate the first "n" inodes from the lru list of the fuse inode
> >> > table.
> >> >
> >> > >
> >> > > That might work for starters.
> >> > >
> >> > > >
> >> > > > Csaba
> >> > > >
> >> > > > On Mon, Sep 4, 2017 at 8:48 AM, Nithya Balachandran
> >> > > > <nbalacha at redhat.com>
> >> > > > wrote:
> >> > > >
> >> > > >>
> >> > > >>
> >> > > >> On 4 September 2017 at 12:14, Csaba Henk
> >> > > >> <chenk at redhat.com> wrote:
> >> > > >>
> >> > > >>> Basically, this is how I see the fuse invalidate calls: as
> >> > > >>> rescuers of sanity.
> >> > > >>>
> >> > > >>> Normally, when you have a lot of a certain kind of stuff that
> >> > > >>> tends to accumulate, the immediate thought is: let's set up some
> >> > > >>> garbage collection mechanism that will take care of keeping the
> >> > > >>> accumulation at bay. But that doesn't work naively with inodes,
> >> > > >>> as they are referenced from the kernel, so we have to keep them
> >> > > >>> around until the kernel tells us it's giving up its reference.
> >> > > >>> However, with the fuse invalidate calls we can take the
> >> > > >>> initiative and instruct the kernel: "hey, kernel, give up your
> >> > > >>> references to this thing!"
> >> > > >>>
> >> > > >>> So we are actually free to implement any kind of inode GC in
> >> > > >>> glusterfs; we just have to take care to add the proper callback
> >> > > >>> to fuse_invalidate_* and we are good to go.
> >> > > >>>
> >> > > >>>
> >> > > >> That sounds good, and something we need to do in the near
> >> > > >> future. Is this something that is easy to implement?
> >> > > >>
> >> > > >>
> >> > > >>> Csaba
> >> > > >>>
> >> > > >>> On Mon, Sep 4, 2017 at 7:00 AM, Nithya Balachandran
> >> > > >>> <nbalacha at redhat.com> wrote:
> >> > > >>>
> >> > > >>>>
> >> > > >>>>
> >> > > >>>> On 4 September 2017 at 10:25, Raghavendra Gowdappa
> >> > > >>>> <rgowdapp at redhat.com> wrote:
> >> > > >>>>
> >> > > >>>>>
> >> > > >>>>>
> >> > > >>>>> ----- Original Message -----
> >> > > >>>>> > From: "Nithya Balachandran" <nbalacha at redhat.com>
> >> > > >>>>> > Sent: Monday, September 4, 2017 10:19:37 AM
> >> > > >>>>> > Subject: Fuse mounts and inodes
> >> > > >>>>> >
> >> > > >>>>> > Hi,
> >> > > >>>>> >
> >> > > >>>>> > One of the reasons for the memory consumption in gluster
> >> > > >>>>> > fuse mounts is the number of inodes in the table which are
> >> > > >>>>> > never kicked out.
> >> > > >>>>> >
> >> > > >>>>> > Is there any way to default to an entry-timeout and
> >> > > >>>>> > attribute-timeout value while mounting Gluster using Fuse?
> >> > > >>>>> > Say 60s each, so those entries will be purged periodically?
> >> > > >>>>>
> >> > > >>>>> Once the entry times out, inodes won't be purged. The kernel
> >> > > >>>>> sends a lookup to revalidate the mapping of path to inode.
> >> > > >>>>> AFAIK, reverse invalidation (see inode_invalidate) is the only
> >> > > >>>>> way to make the kernel forget inodes/attributes.
> >> > > >>>>>
> >> > > >>>> Is that something that can be done from the Fuse mount? Or is
> >> > > >>>> this something that needs to be added to Fuse?
> >> > > >>>>
> >> > > >>>>> >
> >> > > >>>>> > Regards,
> >> > > >>>>> > Nithya
> >> > > >>>>> >
> >> > > >>>>>
> >> > > >>>>
> >> > > >>>>
> >> > > >>>
> >> > > >>
> >> > > >
> >> > >
> >> >
> >> _______________________________________________
> >> Gluster-devel mailing list
> >> Gluster-devel at gluster.org
> >> http://lists.gluster.org/mailman/listinfo/gluster-devel
> >
> >
> >
> >
> > --
> > Raghavendra G
> >
>



-- 
Raghavendra G