[Gluster-users] Run away memory with gluster mount
Raghavendra Gowdappa
rgowdapp at redhat.com
Mon Feb 5 19:45:26 UTC 2018
I missed your reply :). Sorry about that.
----- Original Message -----
> From: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> To: "Dan Ragle" <daniel at Biblestuph.com>
> Cc: "Csaba Henk" <chenk at redhat.com>, "gluster-users" <gluster-users at gluster.org>
> Sent: Tuesday, February 6, 2018 1:14:10 AM
> Subject: Re: [Gluster-users] Run away memory with gluster mount
>
> Hi Dan,
>
> I had a suggestion and a question in my previous response. Let us know
> whether the suggestion helps and please let us know about your data-set
> (like how many directories/files and how these directories/files are
> organised) to understand the problem better.
>
> <snip>
>
> > In the
> > meantime can you remount glusterfs with options
> > --entry-timeout=0 and --attribute-timeout=0? This will make sure
> > that kernel won't cache inodes/attributes of the file and should
> > bring down the memory usage.
> >
> > I am curious to know what is your data-set like? Is it the case
> > of too many directories and files present in deep directories? I
> > am wondering whether a significant number of inodes cached by
> > kernel are there to hold dentry structure in kernel.
>
> </snip>
>
> regards,
> Raghavendra
>
> ----- Original Message -----
> > From: "Dan Ragle" <daniel at Biblestuph.com>
> > To: "Nithya Balachandran" <nbalacha at redhat.com>
> > Cc: "gluster-users" <gluster-users at gluster.org>, "Csaba Henk"
> > <chenk at redhat.com>
> > Sent: Saturday, February 3, 2018 7:28:15 PM
> > Subject: Re: [Gluster-users] Run away memory with gluster mount
> >
> >
> >
> > On 2/2/2018 2:13 AM, Nithya Balachandran wrote:
> > > Hi Dan,
> > >
> > > It sounds like you might be running into [1]. The patch has been posted
> > > upstream and the fix should be in the next release.
> > > In the meantime, I'm afraid there is no way to get around this without
> > > restarting the process.
> > >
> > > Regards,
> > > Nithya
> > >
> > > [1]https://bugzilla.redhat.com/show_bug.cgi?id=1541264
> > >
> >
> > Much appreciated. Will watch for the next release and retest then.
> >
> > Cheers!
> >
> > Dan
> >
> > >
> > > > On 2 February 2018 at 02:57, Dan Ragle <daniel at biblestuph.com> wrote:
> > >
> > >
> > >
> > > On 1/30/2018 6:31 AM, Raghavendra Gowdappa wrote:
> > >
> > >
> > >
> > > ----- Original Message -----
> > >
> > > > From: "Dan Ragle" <daniel at Biblestuph.com>
> > > > To: "Raghavendra Gowdappa" <rgowdapp at redhat.com>,
> > > > "Ravishankar N" <ravishankar at redhat.com>
> > > > Cc: gluster-users at gluster.org, "Csaba Henk" <chenk at redhat.com>,
> > > > "Niels de Vos" <ndevos at redhat.com>,
> > > > "Nithya Balachandran" <nbalacha at redhat.com>
> > > > Sent: Monday, January 29, 2018 9:02:21 PM
> > > > Subject: Re: [Gluster-users] Run away memory with gluster mount
> > >
> > >
> > >
> > > On 1/29/2018 2:36 AM, Raghavendra Gowdappa wrote:
> > >
> > >
> > >
> > > ----- Original Message -----
> > >
> > > > From: "Ravishankar N" <ravishankar at redhat.com>
> > > > To: "Dan Ragle" <daniel at Biblestuph.com>, gluster-users at gluster.org
> > > > Cc: "Csaba Henk" <chenk at redhat.com>, "Niels de Vos" <ndevos at redhat.com>,
> > > > "Nithya Balachandran" <nbalacha at redhat.com>,
> > > > "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> > > > Sent: Saturday, January 27, 2018 10:23:38 AM
> > > > Subject: Re: [Gluster-users] Run away memory with gluster mount
> > >
> > >
> > >
> > > On 01/27/2018 02:29 AM, Dan Ragle wrote:
> > >
> > >
> > > On 1/25/2018 8:21 PM, Ravishankar N wrote:
> > >
> > >
> > >
> > > On 01/25/2018 11:04 PM, Dan Ragle wrote:
> > >
> > > *sigh* trying again to correct
> > > formatting ... apologize for the
> > > earlier mess.
> > >
> > > Having a memory issue with Gluster
> > > 3.12.4 and not sure how to
> > > troubleshoot. I don't *think* this is
> > > expected behavior.
> > >
> > > This is on an updated CentOS 7 box. The
> > > setup is a simple two node
> > > replicated layout where the two nodes
> > > act as both server and
> > > client.
> > >
> > > The volume in question:
> > >
> > > > Volume Name: GlusterWWW
> > > > Type: Replicate
> > > > Volume ID: 8e9b0e79-f309-4d9b-a5bb-45d065faaaa3
> > > > Status: Started
> > > > Snapshot Count: 0
> > > > Number of Bricks: 1 x 2 = 2
> > > > Transport-type: tcp
> > > > Bricks:
> > > > Brick1: vs1dlan.mydomain.com:/glusterfs_bricks/brick1/www
> > > > Brick2: vs2dlan.mydomain.com:/glusterfs_bricks/brick1/www
> > > > Options Reconfigured:
> > > > nfs.disable: on
> > > > cluster.favorite-child-policy: mtime
> > > > transport.address-family: inet
> > >
> > > I had some other performance options in
> > > there, (increased
> > > cache-size, md invalidation, etc) but
> > > stripped them out in an
> > > attempt to
> > > isolate the issue. Still got the problem
> > > without them.
> > >
> > > The volume currently contains over 1M
> > > files.
> > >
> > > When mounting the volume, I get (among
> > > other things) a process as such:
> > >
> > > > /usr/sbin/glusterfs --volfile-server=localhost --volfile-id=/GlusterWWW /var/www
> > >
> > > This process begins with little memory,
> > > but then as files are
> > > accessed in the volume the memory
> > > > increases. I set up a script that
> > > simply reads the files in the volume one
> > > at a time (no writes). It's
> > > been running on and off about 12 hours
> > > now and the resident
> > > memory of the above process is already
> > > at 7.5G and continues to grow
> > > slowly. If I stop the test script the
> > > memory stops growing,
> > > but does not reduce. Restart the test
> > > script and the memory begins
> > > slowly growing again.
> > >
> > > This is obviously a contrived app
> > > environment. With my intended
> > > application load it takes about a week
> > > or so for the memory to get
> > > high enough to invoke the oom killer.
> > >
> > >
> > > Can you try debugging with the statedump
> > > > (https://gluster.readthedocs.io/en/latest/Troubleshooting/statedump/#read-a-statedump)
> > > of
> > > the fuse mount process and see what member
> > > is leaking? Take the
> > > statedumps in succession, maybe once
> > > initially during the I/O and
> > > once the memory gets high enough to hit the
> > > OOM mark.
> > > Share the dumps here.
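
One way to compare two successive statedumps is sketched below. It is only a rough sketch and rests on an assumption about the dump layout: that each "[... memusage]" section header is followed by key=value lines including a "num_allocs=" line. The function names are hypothetical helpers, not part of Gluster.

```python
import re

# Matches section headers such as
# "[mount/fuse.fuse - usage-type gf_fuse_mt_gids_t memusage]".
SECTION = re.compile(r"^\[(.+ memusage)\]$")


def num_allocs_by_section(text):
    """Map each memusage section name to its num_allocs value.

    Assumes (not verified against every Gluster release) that a section
    header is followed by key=value lines, one of them num_allocs=<int>.
    """
    result = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = SECTION.match(line)
        if match:
            current = match.group(1)
        elif line.startswith("["):
            # Some other kind of section; stop attributing lines to it.
            current = None
        elif current and line.startswith("num_allocs="):
            result[current] = int(line.split("=", 1)[1])
    return result


def growing_sections(older_text, newer_text):
    """Return {section: (old, new)} for sections whose num_allocs grew."""
    before = num_allocs_by_section(older_text)
    after = num_allocs_by_section(newer_text)
    return {
        name: (before[name], count)
        for name, count in after.items()
        if name in before and count > before[name]
    }
```

Reading two dump files into strings and passing them to growing_sections() should list only the allocation types that increased between the two samples, which narrows down where to look for a leak.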
> > >
> > > Regards,
> > > Ravi
> > >
> > >
> > > Thanks for the reply. I noticed yesterday that
> > > an update (3.12.5) had
> > > been posted so I went ahead and updated and
> > > repeated the test
> > > overnight. The memory usage does not appear to
> > > be growing as quickly
> > > > as it was with 3.12.4, but does still appear to
> > > be growing.
> > >
> > > I should also mention that there is another
> > > process beyond my test app
> > > that is reading the files from the volume.
> > > Specifically, there is an
> > > rsync that runs from the second node 2-4 times
> > > an hour that reads from
> > > the GlusterWWW volume mounted on node 1. Since
> > > none of the files in
> > > that mount are changing it doesn't actually
> > > rsync anything, but
> > > nonetheless it is running and reading the files
> > > in addition to my test
> > > script. (It's a part of my intended production
> > > setup that I forgot was
> > > still running.)
> > >
> > > The mount process appears to be gaining memory
> > > at a rate of about 1GB
> > > every 4 hours or so. At that rate it'll take
> > > several days before it
> > > runs the box out of memory. But I took your
> > > suggestion and made some
> > > statedumps today anyway, about 2 hours apart, 4
> > > total so far. It looks
> > > like there may already be some actionable
> > > information. These are the
> > > only registers where the num_allocs have grown
> > > with each of the four
> > > samples:
> > >
> > > > [mount/fuse.fuse - usage-type gf_fuse_mt_gids_t memusage]
> > > > ---> num_allocs at Fri Jan 26 08:57:31 2018: 784
> > > > ---> num_allocs at Fri Jan 26 10:55:50 2018: 831
> > > > ---> num_allocs at Fri Jan 26 12:55:15 2018: 877
> > > > ---> num_allocs at Fri Jan 26 14:58:27 2018: 908
> > > >
> > > > [mount/fuse.fuse - usage-type gf_common_mt_fd_lk_ctx_t memusage]
> > > > ---> num_allocs at Fri Jan 26 08:57:31 2018: 5
> > > > ---> num_allocs at Fri Jan 26 10:55:50 2018: 10
> > > > ---> num_allocs at Fri Jan 26 12:55:15 2018: 15
> > > > ---> num_allocs at Fri Jan 26 14:58:27 2018: 17
> > > >
> > > > [cluster/distribute.GlusterWWW-dht - usage-type gf_dht_mt_dht_layout_t memusage]
> > > > ---> num_allocs at Fri Jan 26 08:57:31 2018: 24243596
> > > > ---> num_allocs at Fri Jan 26 10:55:50 2018: 27902622
> > > > ---> num_allocs at Fri Jan 26 12:55:15 2018: 30678066
> > > > ---> num_allocs at Fri Jan 26 14:58:27 2018: 33801036
> > >
> > > Not sure the best way to get you the full dumps.
> > > They're pretty big,
> > > over 1G for all four. Also, I noticed some
> > > filepath information in
> > > there that I'd rather not share. What's the
> > > recommended next step?
> > >
> > >
> > > > Please run the following queries on the statedump files and
> > > > report the results to us:
> > > # grep itable <client-statedump> | grep active | wc -l
> > > # grep itable <client-statedump> | grep active_size
> > > # grep itable <client-statedump> | grep lru | wc -l
> > > # grep itable <client-statedump> | grep lru_size
> > > # grep itable <client-statedump> | grep purge | wc -l
> > > # grep itable <client-statedump> | grep purge_size
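
The three *_size counters can also be pulled out in one pass. The sketch below assumes they appear as lines of the form "xlator.mount.fuse.itable.lru_size=<n>", which is the shape of the results reported elsewhere in this thread; the sample values are those reported for the first dump, and the function name is hypothetical.

```python
import re

# Matches lines such as "xlator.mount.fuse.itable.lru_size=998508".
ITABLE_SIZE = re.compile(
    r"^xlator\.mount\.fuse\.itable\.(active|lru|purge)_size=(\d+)$"
)


def itable_sizes(lines):
    """Return {"active": n, "lru": n, "purge": n} for the counters found."""
    sizes = {}
    for line in lines:
        match = ITABLE_SIZE.match(line.strip())
        if match:
            sizes[match.group(1)] = int(match.group(2))
    return sizes


# Counter lines as reported for the first dump in this thread.
sample = [
    "xlator.mount.fuse.itable.active_size=53864",
    "xlator.mount.fuse.itable.lru_size=998508",
    "xlator.mount.fuse.itable.purge_size=0",
]
print(itable_sizes(sample))
```

For a real dump, passing an open file object (which iterates line by line) in place of the sample list should work the same way.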
> > >
> > >
> > > Had to restart the test and have been running for 36 hours
> > > now. RSS is
> > > currently up to 23g.
> > >
> > > > Working on getting a bug report with a link to the dumps. In the
> > > > meantime, I'm including the results of your above queries for the
> > > > first dump, the 18 hour dump, and the 36 hour dump:
> > >
> > > > # grep itable glusterdump.153904.dump.1517104561 | grep active | wc -l
> > > > 53865
> > > > # grep itable glusterdump.153904.dump.1517169361 | grep active | wc -l
> > > > 53864
> > > > # grep itable glusterdump.153904.dump.1517234161 | grep active | wc -l
> > > > 53864
> > > >
> > > > # grep itable glusterdump.153904.dump.1517104561 | grep active_size
> > > > xlator.mount.fuse.itable.active_size=53864
> > > > # grep itable glusterdump.153904.dump.1517169361 | grep active_size
> > > > xlator.mount.fuse.itable.active_size=53863
> > > > # grep itable glusterdump.153904.dump.1517234161 | grep active_size
> > > > xlator.mount.fuse.itable.active_size=53863
> > > >
> > > > # grep itable glusterdump.153904.dump.1517104561 | grep lru | wc -l
> > > > 998510
> > > > # grep itable glusterdump.153904.dump.1517169361 | grep lru | wc -l
> > > > 998510
> > > > # grep itable glusterdump.153904.dump.1517234161 | grep lru | wc -l
> > > > 995992
> > > >
> > > > # grep itable glusterdump.153904.dump.1517104561 | grep lru_size
> > > > xlator.mount.fuse.itable.lru_size=998508
> > > > # grep itable glusterdump.153904.dump.1517169361 | grep lru_size
> > > > xlator.mount.fuse.itable.lru_size=998508
> > > > # grep itable glusterdump.153904.dump.1517234161 | grep lru_size
> > > > xlator.mount.fuse.itable.lru_size=995990
> > >
> > >
> > > > Around 1 million inodes in the lru table!! These are inodes the
> > > > kernel has merely cached; no operation is currently in progress on
> > > > these inodes. This could be the reason for high memory usage.
> > > > We have a patch being worked on (currently merged on the experimental
> > > > branch) [1] that will help in these scenarios. In the
> > > meantime can you remount glusterfs with options
> > > --entry-timeout=0 and --attribute-timeout=0? This will make sure
> > > that kernel won't cache inodes/attributes of the file and should
> > > bring down the memory usage.
> > >
> > > I am curious to know what is your data-set like? Is it the case
> > > of too many directories and files present in deep directories? I
> > > am wondering whether a significant number of inodes cached by
> > > kernel are there to hold dentry structure in kernel.
> > >
> > > > [1] https://review.gluster.org/#/c/18665/
> > >
> > >
> > > OK, remounted with your recommended attributes and repeated the
> > > test. Now the mount process looks like this:
> > >
> > > /usr/sbin/glusterfs --attribute-timeout=0 --entry-timeout=0
> > > --volfile-server=localhost --volfile-id=/GlusterWWW /var/www
> > >
> > > However after running for 36 hours it's again at about 23g (about
> > > the same place it was on the first test).
> > >
> > > A few metrics from the 36 hour mark:
> > >
> > > num_allocs for [cluster/distribute.GlusterWWW-dht - usage-type
> > > gf_dht_mt_dht_layout_t memusage] is 109140094. Seems at least
> > > somewhat similar to the original test, which had 117901593 at the 36
> > > hour mark.
> > >
> > > The dump file at the 36 hour mark had nothing for lru or lru_size.
> > > However, at the dump two hours prior it had:
> > >
> > > # grep itable glusterdump.67299.dump.1517493361 | grep lru | wc -l
> > > 998510
> > > # grep itable glusterdump.67299.dump.1517493361 | grep lru_size
> > > xlator.mount.fuse.itable.lru_size=998508
> > >
> > > and the same thing for the dump four hours later. Are these values
> > > only relevant when the ls -R is actually running? I'm thinking the
> > > 36 hour dump may have caught the ls -R between runs there (?)
> > >
> > > The data set is multiple Web sites. I know there's some litter there
> > > we can clean up, but I'd guess not more than 200-300k files or so.
> > > The biggest culprit is a single directory that we use as a
> > > multi-purpose file store, with filenames stored as GUIDs and linked
> > > to a DB. That directory currently has 500k+ files. Another directory
> > > serves a similar purpose and has about 66k files in it. The rest is
> > > > generally distributed more "normally", i.e., a mixed nesting of
> > > directories and files.
> > >
> > > Cheers!
> > >
> > > Dan
> > >
> > >
> > >
> > > > # grep itable glusterdump.153904.dump.1517104561 | grep purge | wc -l
> > > > 1
> > > > # grep itable glusterdump.153904.dump.1517169361 | grep purge | wc -l
> > > > 1
> > > > # grep itable glusterdump.153904.dump.1517234161 | grep purge | wc -l
> > > > 1
> > > >
> > > > # grep itable glusterdump.153904.dump.1517104561 | grep purge_size
> > > > xlator.mount.fuse.itable.purge_size=0
> > > > # grep itable glusterdump.153904.dump.1517169361 | grep purge_size
> > > > xlator.mount.fuse.itable.purge_size=0
> > > > # grep itable glusterdump.153904.dump.1517234161 | grep purge_size
> > > > xlator.mount.fuse.itable.purge_size=0
> > >
> > > Cheers,
> > >
> > > Dan
> > >
> > >
> > >
> > > > I've CC'd the fuse/dht devs to see if these data
> > > types have potential
> > > leaks. Could you raise a bug with the volume info
> > > and a (dropbox?) link
> > > from which we can download the dumps? You can
> > > remove/replace the
> > > filepaths from them.
> > >
> > > Regards.
> > > Ravi
> > >
> > >
> > > Cheers!
> > >
> > > Dan
> > >
> > >
> > > Is there potentially something
> > > misconfigured here?
> > >
> > > I did see a reference to a memory leak
> > > in another thread in this
> > > list, but that had to do with the
> > > setting of quotas, I don't have
> > > any quotas set on my system.
> > >
> > > Thanks,
> > >
> > > Dan Ragle
> > > daniel at Biblestuph.com
> > >
> > >
> > >
> > >
> > >
> > > > _______________________________________________
> > > > Gluster-users mailing list
> > > > Gluster-users at gluster.org
> > > > http://lists.gluster.org/mailman/listinfo/gluster-users
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >