[Gluster-users] Run away memory with gluster mount
Dan Ragle
daniel at Biblestuph.com
Sat Feb 3 13:58:15 UTC 2018
On 2/2/2018 2:13 AM, Nithya Balachandran wrote:
> Hi Dan,
>
> It sounds like you might be running into [1]. The patch has been posted
> upstream and the fix should be in the next release.
> In the meantime, I'm afraid there is no way to get around this without
> restarting the process.
>
> Regards,
> Nithya
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1541264
>
Much appreciated. Will watch for the next release and retest then.
Cheers!
Dan
>
> On 2 February 2018 at 02:57, Dan Ragle <daniel at biblestuph.com> wrote:
>
>
>
> On 1/30/2018 6:31 AM, Raghavendra Gowdappa wrote:
>
>
>
> ----- Original Message -----
>
> From: "Dan Ragle" <daniel at Biblestuph.com>
> To: "Raghavendra Gowdappa" <rgowdapp at redhat.com>, "Ravishankar N" <ravishankar at redhat.com>
> Cc: gluster-users at gluster.org, "Csaba Henk" <chenk at redhat.com>, "Niels de Vos" <ndevos at redhat.com>,
> "Nithya Balachandran" <nbalacha at redhat.com>
> Sent: Monday, January 29, 2018 9:02:21 PM
> Subject: Re: [Gluster-users] Run away memory with gluster mount
>
>
>
> On 1/29/2018 2:36 AM, Raghavendra Gowdappa wrote:
>
>
>
> ----- Original Message -----
>
> From: "Ravishankar N" <ravishankar at redhat.com>
> To: "Dan Ragle" <daniel at Biblestuph.com>, gluster-users at gluster.org
> Cc: "Csaba Henk" <chenk at redhat.com>, "Niels de Vos" <ndevos at redhat.com>,
> "Nithya Balachandran" <nbalacha at redhat.com>,
> "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> Sent: Saturday, January 27, 2018 10:23:38 AM
> Subject: Re: [Gluster-users] Run away memory with gluster mount
>
>
>
> On 01/27/2018 02:29 AM, Dan Ragle wrote:
>
>
> On 1/25/2018 8:21 PM, Ravishankar N wrote:
>
>
>
> On 01/25/2018 11:04 PM, Dan Ragle wrote:
>
> *sigh* trying again to correct
> formatting ... apologize for the
> earlier mess.
>
> Having a memory issue with Gluster
> 3.12.4 and not sure how to
> troubleshoot. I don't *think* this is
> expected behavior.
>
> This is on an updated CentOS 7 box. The
> setup is a simple two node
> replicated layout where the two nodes
> act as both server and
> client.
>
> The volume in question:
>
> Volume Name: GlusterWWW
> Type: Replicate
> Volume ID:
> 8e9b0e79-f309-4d9b-a5bb-45d065faaaa3
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1:
> vs1dlan.mydomain.com:/glusterfs_bricks/brick1/www
> Brick2:
> vs2dlan.mydomain.com:/glusterfs_bricks/brick1/www
> Options Reconfigured:
> nfs.disable: on
> cluster.favorite-child-policy: mtime
> transport.address-family: inet
>
> I had some other performance options in
> there, (increased
> cache-size, md invalidation, etc) but
> stripped them out in an
> attempt to
> isolate the issue. Still got the problem
> without them.
>
> The volume currently contains over 1M files.
>
> When mounting the volume, I get (among
> other things) a process as such:
>
> /usr/sbin/glusterfs
> --volfile-server=localhost
> --volfile-id=/GlusterWWW /var/www
>
> This process begins with little memory,
> but then as files are
> accessed in the volume the memory
> increases. I set up a script that
> simply reads the files in the volume one
> at a time (no writes). It's
> been running on and off about 12 hours
> now and the resident
> memory of the above process is already
> at 7.5G and continues to grow
> slowly. If I stop the test script the
> memory stops growing,
> but does not reduce. Restart the test
> script and the memory begins
> slowly growing again.
>
> This is obviously a contrived app
> environment. With my intended
> application load it takes about a week
> or so for the memory to get
> high enough to invoke the oom killer.
>
>
> Can you try debugging with the statedump
> (https://gluster.readthedocs.io/en/latest/Troubleshooting/statedump/#read-a-statedump)
> of
> the fuse mount process and see what member
> is leaking? Take the
> statedumps in succession, maybe once
> initially during the I/O and
> once the memory gets high enough to hit the
> OOM mark.
> Share the dumps here.
>
> Regards,
> Ravi
>
>
> Thanks for the reply. I noticed yesterday that
> an update (3.12.5) had
> been posted so I went ahead and updated and
> repeated the test
> overnight. The memory usage does not appear to
> be growing as quickly
> as it was with 3.12.4, but does still appear to
> be growing.
>
> I should also mention that there is another
> process beyond my test app
> that is reading the files from the volume.
> Specifically, there is an
> rsync that runs from the second node 2-4 times
> an hour that reads from
> the GlusterWWW volume mounted on node 1. Since
> none of the files in
> that mount are changing it doesn't actually
> rsync anything, but
> nonetheless it is running and reading the files
> in addition to my test
> script. (It's a part of my intended production
> setup that I forgot was
> still running.)
>
> The mount process appears to be gaining memory
> at a rate of about 1GB
> every 4 hours or so. At that rate it'll take
> several days before it
> runs the box out of memory. But I took your
> suggestion and made some
> statedumps today anyway, about 2 hours apart, 4
> total so far. It looks
> like there may already be some actionable
> information. These are the
> only registers where the num_allocs have grown
> with each of the four
> samples:
>
> [mount/fuse.fuse - usage-type gf_fuse_mt_gids_t
> memusage]
> ---> num_allocs at Fri Jan 26 08:57:31 2018: 784
> ---> num_allocs at Fri Jan 26 10:55:50 2018: 831
> ---> num_allocs at Fri Jan 26 12:55:15 2018: 877
> ---> num_allocs at Fri Jan 26 14:58:27 2018: 908
>
> [mount/fuse.fuse - usage-type
> gf_common_mt_fd_lk_ctx_t memusage]
> ---> num_allocs at Fri Jan 26 08:57:31 2018: 5
> ---> num_allocs at Fri Jan 26 10:55:50 2018: 10
> ---> num_allocs at Fri Jan 26 12:55:15 2018: 15
> ---> num_allocs at Fri Jan 26 14:58:27 2018: 17
>
> [cluster/distribute.GlusterWWW-dht - usage-type
> gf_dht_mt_dht_layout_t
> memusage]
> ---> num_allocs at Fri Jan 26 08:57:31 2018:
> 24243596
> ---> num_allocs at Fri Jan 26 10:55:50 2018:
> 27902622
> ---> num_allocs at Fri Jan 26 12:55:15 2018:
> 30678066
> ---> num_allocs at Fri Jan 26 14:58:27 2018:
> 33801036
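
The comparison above (finding the sections whose num_allocs rose in every sample) can be sketched in Python. This is a minimal sketch, not a Gluster tool. It assumes each memusage section in a statedump file starts with a single-line "[... memusage]" header followed by key=value lines that include "num_allocs=". The function names are hypothetical.

```python
import re

# Assumption: section headers are on one line in the dump file, e.g.
# [mount/fuse.fuse - usage-type gf_fuse_mt_gids_t memusage]
SECTION_RE = re.compile(r"^\[(.+ memusage)\]$")


def parse_num_allocs(dump_text):
    """Map each '[... memusage]' section name to its num_allocs value."""
    allocs = {}
    section = None
    for raw in dump_text.splitlines():
        line = raw.strip()
        match = SECTION_RE.match(line)
        if match:
            section = match.group(1)
        elif line.startswith("["):
            # Some other kind of section: stop attributing lines to memusage.
            section = None
        elif section and line.startswith("num_allocs="):
            allocs[section] = int(line.split("=", 1)[1])
    return allocs


def steadily_growing(dumps):
    """Return sections whose num_allocs strictly increased in every dump.

    'dumps' is a list of statedump texts in chronological order.
    """
    if len(dumps) < 2:
        return []
    parsed = [parse_num_allocs(text) for text in dumps]
    # Only sections present in every dump can be compared.
    common = set(parsed[0]).intersection(*parsed[1:])
    return sorted(
        name
        for name in common
        if all(a[name] < b[name] for a, b in zip(parsed, parsed[1:]))
    )
```

To use it, read each dump file into a string (oldest first) and pass the list to steadily_growing(). Sections that merely fluctuate, or that are missing from any one dump, are left out.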
>
> Not sure the best way to get you the full dumps.
> They're pretty big,
> over 1G for all four. Also, I noticed some
> filepath information in
> there that I'd rather not share. What's the
> recommended next step?
>
>
> Please run the following queries on the statedump files and
> report the results to us:
> # grep itable <client-statedump> | grep active | wc -l
> # grep itable <client-statedump> | grep active_size
> # grep itable <client-statedump> | grep lru | wc -l
> # grep itable <client-statedump> | grep lru_size
> # grep itable <client-statedump> | grep purge | wc -l
> # grep itable <client-statedump> | grep purge_size
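
The three *_size queries above can also be answered in one pass over a dump. The following is a minimal Python sketch, assuming the size fields appear one per line in the form "xlator.mount.fuse.itable.<name>_size=<n>". The function name is hypothetical. It reads only the summary size fields and does not reproduce the "wc -l" line counts.

```python
import re

# Assumption: summary fields look like
# xlator.mount.fuse.itable.lru_size=998508
ITABLE_SIZE_RE = re.compile(
    r"^xlator\.mount\.fuse\.itable\.(active|lru|purge)_size=(\d+)$"
)


def itable_sizes(dump_text):
    """Return {'active': n, 'lru': n, 'purge': n} for the fields found."""
    sizes = {}
    for raw in dump_text.splitlines():
        match = ITABLE_SIZE_RE.match(raw.strip())
        if match:
            sizes[match.group(1)] = int(match.group(2))
    return sizes
```

Calling itable_sizes() on each dump in chronological order gives a quick view of whether the active, lru, or purge list is the one that keeps growing.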
>
>
> Had to restart the test and have been running for 36 hours
> now. RSS is
> currently up to 23g.
>
> Working on getting a bug report with link to the dumps. In
> the
> meantime, I'm including the results of your above queries for
> the first
> dump, the 18 hour dump, and the 36 hour dump:
>
> # grep itable glusterdump.153904.dump.1517104561 | grep
> active | wc -l
> 53865
> # grep itable glusterdump.153904.dump.1517169361 | grep
> active | wc -l
> 53864
> # grep itable glusterdump.153904.dump.1517234161 | grep
> active | wc -l
> 53864
>
> # grep itable glusterdump.153904.dump.1517104561 | grep
> active_size
> xlator.mount.fuse.itable.active_size=53864
> # grep itable glusterdump.153904.dump.1517169361 | grep
> active_size
> xlator.mount.fuse.itable.active_size=53863
> # grep itable glusterdump.153904.dump.1517234161 | grep
> active_size
> xlator.mount.fuse.itable.active_size=53863
>
> # grep itable glusterdump.153904.dump.1517104561 | grep lru
> | wc -l
> 998510
> # grep itable glusterdump.153904.dump.1517169361 | grep lru
> | wc -l
> 998510
> # grep itable glusterdump.153904.dump.1517234161 | grep lru
> | wc -l
> 995992
>
> # grep itable glusterdump.153904.dump.1517104561 | grep lru_size
> xlator.mount.fuse.itable.lru_size=998508
> # grep itable glusterdump.153904.dump.1517169361 | grep lru_size
> xlator.mount.fuse.itable.lru_size=998508
> # grep itable glusterdump.153904.dump.1517234161 | grep lru_size
> xlator.mount.fuse.itable.lru_size=995990
>
>
> Around 1 million inodes in the lru table!! These are inodes the
> kernel has just cached; no operation is currently in progress on
> them. This could be the reason for the high memory usage.
> We have a patch being worked on (currently merged on the experimental
> branch) [1] that will help in these scenarios. In the
> meantime, can you remount glusterfs with the options
> --entry-timeout=0 and --attribute-timeout=0? This will make sure
> that the kernel won't cache inodes/attributes of the file and should
> bring down the memory usage.
>
> I am curious to know what your data set is like. Is it a case
> of many directories, with files present in deep directories? I
> am wondering whether a significant number of the inodes cached by
> the kernel are there to hold the dentry structure in the kernel.
>
> [1] https://review.gluster.org/#/c/18665/
>
>
> OK, remounted with your recommended attributes and repeated the
> test. Now the mount process looks like this:
>
> /usr/sbin/glusterfs --attribute-timeout=0 --entry-timeout=0
> --volfile-server=localhost --volfile-id=/GlusterWWW /var/www
>
> However after running for 36 hours it's again at about 23g (about
> the same place it was on the first test).
>
> A few metrics from the 36 hour mark:
>
> num_allocs for [cluster/distribute.GlusterWWW-dht - usage-type
> gf_dht_mt_dht_layout_t memusage] is 109140094. Seems at least
> somewhat similar to the original test, which had 117901593 at the 36
> hour mark.
>
> The dump file at the 36 hour mark had nothing for lru or lru_size.
> However, at the dump two hours prior it had:
>
> # grep itable glusterdump.67299.dump.1517493361 | grep lru | wc -l
> 998510
> # grep itable glusterdump.67299.dump.1517493361 | grep lru_size
> xlator.mount.fuse.itable.lru_size=998508
>
> and the same thing for the dump four hours later. Are these values
> only relevant when the ls -R is actually running? I'm thinking the
> 36 hour dump may have caught the ls -R between runs there (?)
>
> The data set is multiple Web sites. I know there's some litter there
> we can clean up, but I'd guess not more than 200-300k files or so.
> The biggest culprit is a single directory that we use as a
> multi-purpose file store, with filenames stored as GUIDs and linked
> to a DB. That directory currently has 500k+ files. Another directory
> serves a similar purpose and has about 66k files in it. The rest is
> generally distributed more "normally", i.e., a mixed nesting of
> directories and files.
>
> Cheers!
>
> Dan
>
>
>
> # grep itable glusterdump.153904.dump.1517104561 | grep
> purge | wc -l
> 1
> # grep itable glusterdump.153904.dump.1517169361 | grep
> purge | wc -l
> 1
> # grep itable glusterdump.153904.dump.1517234161 | grep
> purge | wc -l
> 1
>
> # grep itable glusterdump.153904.dump.1517104561 | grep
> purge_size
> xlator.mount.fuse.itable.purge_size=0
> # grep itable glusterdump.153904.dump.1517169361 | grep
> purge_size
> xlator.mount.fuse.itable.purge_size=0
> # grep itable glusterdump.153904.dump.1517234161 | grep
> purge_size
> xlator.mount.fuse.itable.purge_size=0
>
> Cheers,
>
> Dan
>
>
>
> I've CC'd the fuse/dht devs to see if these data
> types have potential
> leaks. Could you raise a bug with the volume info
> and a (dropbox?) link
> from which we can download the dumps? You can
> remove/replace the
> filepaths from them.
>
> Regards.
> Ravi
>
>
> Cheers!
>
> Dan
>
>
> Is there potentially something
> misconfigured here?
>
> I did see a reference to a memory leak
> in another thread in this
> list, but that had to do with the
> setting of quotas; I don't have
> any quotas set on my system.
>
> Thanks,
>
> Dan Ragle
> daniel at Biblestuph.com
>
>
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>