[Bugs] [Bug 1657743] Very high memory usage (25GB) on Gluster FUSE mountpoint

bugzilla at redhat.com bugzilla at redhat.com
Wed Jul 10 07:40:26 UTC 2019


https://bugzilla.redhat.com/show_bug.cgi?id=1657743

Nithya Balachandran <nbalacha at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |needinfo?(ryan at magenta.tv)



--- Comment #15 from Nithya Balachandran <nbalacha at redhat.com> ---
(In reply to ryan from comment #14)
> Hi Nithya,
> 
> We set the value of the ping timer lower to reduce the delay in failover
> when a node or storage device fails. Normally we only put it on clusters
> that are distribute-replicate, but it seems we've put it on a distribute
> volume too. Am I misunderstanding the role of that option? Would you be
> able to quickly explain its usage?


I found an old email thread that explains this in detail. Please go through
https://lists.gluster.org/pipermail/gluster-users/2017-December/033123.html
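
In short, network.ping-timeout is how long a client waits for a response
from a brick before declaring it unreachable, and very low values (5
seconds here) can cause spurious disconnects under load. As a rough
sketch, checking the option and restoring the shipped default of 42
seconds on the mcv01 volume shown below would look like:

    # show the current ping-timeout on the volume
    gluster volume get mcv01 network.ping-timeout

    # restore the default of 42 seconds
    gluster volume set mcv01 network.ping-timeout 42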



> 
> I've grep'd all gluster logs for that string and cannot find any occurrences
> of it. I've also just grep'd for the string 'failed' and cannot find "failed
> to unserialize xattr dict" in the mount point logs.


That is probably because these messages are logged at the Warning level, but
diagnostics.client-log-level is set to ERROR.
Can you set it to WARNING, try to reproduce the issue, and search the logs
again?
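
For example, something along these lines on the client (the mount log file
name depends on the mount path, so treat the path below as a placeholder):

    # record Warning-level messages in the FUSE client log
    gluster volume set mcv01 diagnostics.client-log-level WARNING

    # after reproducing, search the mount log for the message
    grep "failed to unserialize xattr dict" /var/log/glusterfs/<mountpoint>.log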


Any information that would help us reproduce this in-house would be
appreciated; it is difficult to debug memory leaks otherwise.
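
If you can reproduce the growth, a statedump of the FUSE client taken while
memory is high would also help. A rough sketch, with <pid> as a placeholder
for your glusterfs client process:

    # find the glusterfs FUSE client process for the mount
    pgrep -fa glusterfs

    # ask that process to write a statedump (usually under /var/run/gluster)
    kill -USR1 <pid>

    # confirm where statedumps are written on this system
    gluster --print-statedumpdir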


> 
> Here's the gluster vol info you requested:
> Volume Name: mcv01
> Type: Distribute
> Volume ID: 66d50a1a-7c87-4712-8d7b-bddb19d76498
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 3
> Transport-type: tcp
> Bricks:
> Brick1: node01:/mnt/h1a/data
> Brick2: node02:/mnt/h1a/data
> Brick3: node03:/mnt/h1a/data
> Options Reconfigured:
> server.outstanding-rpc-limit: 128
> performance.readdir-ahead: off
> cluster.rebal-throttle: lazy
> features.quota-deem-statfs: on
> features.inode-quota: on
> features.quota: on
> nfs.disable: on
> transport.address-family: inet
> auth.allow: 172.30.30.*
> performance.client-io-threads: on
> performance.write-behind-window-size: 1MB
> performance.nl-cache-timeout: 600
> performance.nl-cache: on
> performance.io-thread-count: 16
> performance.md-cache-timeout: 600
> performance.cache-samba-metadata: on
> performance.cache-invalidation: on
> features.cache-invalidation-timeout: 600
> features.cache-invalidation: on
> performance.stat-prefetch: on
> performance.cache-size: 1000MB
> storage.batch-fsync-delay-usec: 0
> network.ping-timeout: 5
> performance.md-cache-statfs: off
> client.event-threads: 8
> server.event-threads: 8
> diagnostics.client-log-level: ERROR
> diagnostics.brick-log-level: ERROR
> 
> Many thanks,
> Ryan

-- 
You are receiving this mail because:
You are on the CC list for the bug.

