[Gluster-users] Run away memory with gluster mount
Nithya Balachandran
nbalacha at redhat.com
Thu Feb 22 03:37:44 UTC 2018
On 21 February 2018 at 21:11, Dan Ragle <daniel at biblestuph.com> wrote:
>
>
> On 2/3/2018 8:58 AM, Dan Ragle wrote:
>
>>
>>
>> On 2/2/2018 2:13 AM, Nithya Balachandran wrote:
>>
>>> Hi Dan,
>>>
>>> It sounds like you might be running into [1]. The patch has been posted
>>> upstream and the fix should be in the next release.
>>> In the meantime, I'm afraid there is no way to get around this without
>>> restarting the process.
>>>
>>> Regards,
>>> Nithya
>>>
>>> [1]https://bugzilla.redhat.com/show_bug.cgi?id=1541264
>>>
>>>
>> Much appreciated. Will watch for the next release and retest then.
>>
>> Cheers!
>>
>> Dan
>>
>>
> FYI, this looks like it's fixed in 3.12.6. Ran the test setup with
> repeated ls listings for just shy of 48 hours with no increase in RAM
> usage. Next will try my production application load for a while to see
> if it holds steady.
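>
> (For reference, the repeated listing test was essentially a loop along
> these lines; a sketch, not the exact command used:)
>
> # while true; do ls -R /var/www > /dev/null; done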
>
> The gf_dht_mt_dht_layout_t memusage num_allocs went quickly up to 105415
> and then stayed there for the entire 48 hours.
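>
> (If anyone wants to track that counter themselves, something along
> these lines should pull it out of successive statedumps; a sketch
> assuming the default dump location /var/run/gluster and the usual
> num_allocs=<n> lines in the dump:)
>
> # for f in /var/run/gluster/glusterdump.*.dump.*; do
> #     echo "$f:"
> #     grep -A5 'gf_dht_mt_dht_layout_t memusage' "$f" | grep -w num_allocs
> # done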
>
>
Excellent. Thanks for letting us know.
Nithya
> Thanks for the quick response,
>
> Dan
>
>
>>> On 2 February 2018 at 02:57, Dan Ragle <daniel at biblestuph.com> wrote:
>>>
>>>
>>>
>>> On 1/30/2018 6:31 AM, Raghavendra Gowdappa wrote:
>>>
>>>
>>>
>>> ----- Original Message -----
>>>
>>> From: "Dan Ragle" <daniel at Biblestuph.com>
>>> To: "Raghavendra Gowdappa" <rgowdapp at redhat.com
>>> <mailto:rgowdapp at redhat.com>>, "Ravishankar N"
>>> <ravishankar at redhat.com <mailto:ravishankar at redhat.com>>
>>> Cc: gluster-users at gluster.org
>>> <mailto:gluster-users at gluster.org>, "Csaba Henk"
>>> <chenk at redhat.com <mailto:chenk at redhat.com>>, "Niels de Vos"
>>> <ndevos at redhat.com <mailto:ndevos at redhat.com>>, "Nithya
>>> Balachandran" <nbalacha at redhat.com <mailto:
>>> nbalacha at redhat.com>>
>>> Sent: Monday, January 29, 2018 9:02:21 PM
>>> Subject: Re: [Gluster-users] Run away memory with gluster
>>> mount
>>>
>>>
>>>
>>> On 1/29/2018 2:36 AM, Raghavendra Gowdappa wrote:
>>>
>>>
>>>
>>> ----- Original Message -----
>>>
>>> From: "Ravishankar N" <ravishankar at redhat.com
>>> <mailto:ravishankar at redhat.com>>
>>> To: "Dan Ragle" <daniel at Biblestuph.com>,
>>> gluster-users at gluster.org
>>> <mailto:gluster-users at gluster.org>
>>> Cc: "Csaba Henk" <chenk at redhat.com
>>> <mailto:chenk at redhat.com>>, "Niels de Vos"
>>> <ndevos at redhat.com <mailto:ndevos at redhat.com>>,
>>> "Nithya Balachandran" <nbalacha at redhat.com
>>> <mailto:nbalacha at redhat.com>>,
>>> "Raghavendra Gowdappa" <rgowdapp at redhat.com
>>> <mailto:rgowdapp at redhat.com>>
>>> Sent: Saturday, January 27, 2018 10:23:38 AM
>>> Subject: Re: [Gluster-users] Run away memory with
>>> gluster mount
>>>
>>>
>>>
>>> On 01/27/2018 02:29 AM, Dan Ragle wrote:
>>>
>>>
>>> On 1/25/2018 8:21 PM, Ravishankar N wrote:
>>>
>>>
>>>
>>> On 01/25/2018 11:04 PM, Dan Ragle wrote:
>>>
>>> *sigh* trying again to correct formatting ... apologize for the
>>> earlier mess.
>>>
>>> Having a memory issue with Gluster 3.12.4 and not sure how to
>>> troubleshoot. I don't *think* this is expected behavior.
>>>
>>> This is on an updated CentOS 7 box. The setup is a simple two-node
>>> replicated layout where the two nodes act as both server and client.
>>>
>>> The volume in question:
>>>
>>> Volume Name: GlusterWWW
>>> Type: Replicate
>>> Volume ID: 8e9b0e79-f309-4d9b-a5bb-45d065faaaa3
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: vs1dlan.mydomain.com:/glusterfs_bricks/brick1/www
>>> Brick2: vs2dlan.mydomain.com:/glusterfs_bricks/brick1/www
>>> Options Reconfigured:
>>> nfs.disable: on
>>> cluster.favorite-child-policy: mtime
>>> transport.address-family: inet
>>>
>>> I had some other performance options in there (increased cache-size,
>>> md invalidation, etc.) but stripped them out in an attempt to
>>> isolate the issue. Still got the problem without them.
>>>
>>> The volume currently contains over 1M files.
>>>
>>> When mounting the volume, I get (among other things) a process as
>>> such:
>>>
>>> /usr/sbin/glusterfs --volfile-server=localhost
>>> --volfile-id=/GlusterWWW /var/www
>>>
>>> This process begins with little memory, but then as files are
>>> accessed in the volume the memory increases. I set up a script that
>>> simply reads the files in the volume one at a time (no writes). It's
>>> been running on and off about 12 hours now and the resident memory
>>> of the above process is already at 7.5G and continues to grow
>>> slowly. If I stop the test script the memory stops growing, but does
>>> not reduce. Restart the test script and the memory begins slowly
>>> growing again.
>>>
>>> This is obviously a contrived app environment. With my intended
>>> application load it takes about a week or so for the memory to get
>>> high enough to invoke the oom killer.
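>>>
>>> (For anyone trying to reproduce this, a read-only loop along these
>>> lines should exercise the same path; a sketch, not the actual script
>>> used:)
>>>
>>> # find /var/www -type f -print0 | xargs -0 -n1 cat > /dev/null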
>>>
>>>
>>> Can you try debugging with the statedump
>>> (https://gluster.readthedocs.io/en/latest/Troubleshooting/statedump/#read-a-statedump)
>>> of the fuse mount process and see what member is leaking? Take the
>>> statedumps in succession, maybe once initially during the I/O and
>>> once the memory gets high enough to hit the OOM mark.
>>> Share the dumps here.
>>>
>>> Regards,
>>> Ravi
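>>>
>>> (For anyone following along, a statedump of the fuse client can
>>> normally be triggered by sending it SIGUSR1; a sketch, assuming a
>>> single mount process for this volume and the default dump directory
>>> /var/run/gluster:)
>>>
>>> # kill -USR1 $(pgrep -f 'volfile-id=/GlusterWWW')
>>> # ls /var/run/gluster/glusterdump.*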
>>>
>>>
>>> Thanks for the reply. I noticed yesterday that an update (3.12.5)
>>> had been posted so I went ahead and updated and repeated the test
>>> overnight. The memory usage does not appear to be growing as quickly
>>> as it was with 3.12.4, but does still appear to be growing.
>>>
>>> I should also mention that there is another process beyond my test
>>> app that is reading the files from the volume. Specifically, there
>>> is an rsync that runs from the second node 2-4 times an hour and
>>> reads from the GlusterWWW volume mounted on node 1. Since none of
>>> the files in that mount are changing it doesn't actually rsync
>>> anything, but nonetheless it is running and reading the files in
>>> addition to my test script. (It's a part of my intended production
>>> setup that I forgot was still running.)
>>>
>>> The mount process appears to be gaining memory at a rate of about
>>> 1GB every 4 hours or so. At that rate it'll take several days before
>>> it runs the box out of memory. But I took your suggestion and made
>>> some statedumps today anyway, about 2 hours apart, 4 total so far.
>>> It looks like there may already be some actionable information.
>>> These are the only registers where the num_allocs have grown with
>>> each of the four samples:
>>>
>>> [mount/fuse.fuse - usage-type gf_fuse_mt_gids_t memusage]
>>> ---> num_allocs at Fri Jan 26 08:57:31 2018: 784
>>> ---> num_allocs at Fri Jan 26 10:55:50 2018: 831
>>> ---> num_allocs at Fri Jan 26 12:55:15 2018: 877
>>> ---> num_allocs at Fri Jan 26 14:58:27 2018: 908
>>>
>>> [mount/fuse.fuse - usage-type gf_common_mt_fd_lk_ctx_t memusage]
>>> ---> num_allocs at Fri Jan 26 08:57:31 2018: 5
>>> ---> num_allocs at Fri Jan 26 10:55:50 2018: 10
>>> ---> num_allocs at Fri Jan 26 12:55:15 2018: 15
>>> ---> num_allocs at Fri Jan 26 14:58:27 2018: 17
>>>
>>> [cluster/distribute.GlusterWWW-dht - usage-type gf_dht_mt_dht_layout_t memusage]
>>> ---> num_allocs at Fri Jan 26 08:57:31 2018: 24243596
>>> ---> num_allocs at Fri Jan 26 10:55:50 2018: 27902622
>>> ---> num_allocs at Fri Jan 26 12:55:15 2018: 30678066
>>> ---> num_allocs at Fri Jan 26 14:58:27 2018: 33801036
>>>
>>> Not sure of the best way to get you the full dumps. They're pretty
>>> big, over 1G for all four. Also, I noticed some filepath information
>>> in there that I'd rather not share. What's the recommended next
>>> step?
>>>
>>>
>>> Please run the following queries on the statedump files and report
>>> the results to us:
>>> # grep itable <client-statedump> | grep active | wc -l
>>> # grep itable <client-statedump> | grep active_size
>>> # grep itable <client-statedump> | grep lru | wc -l
>>> # grep itable <client-statedump> | grep lru_size
>>> # grep itable <client-statedump> | grep purge | wc -l
>>> # grep itable <client-statedump> | grep purge_size
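>>>
>>> (A small wrapper makes it easy to run the same queries against every
>>> dump in one pass; a sketch, assuming the dumps follow the usual
>>> glusterdump.<pid>.dump.<timestamp> naming:)
>>>
>>> # for d in glusterdump.*.dump.*; do
>>> #     echo "== $d =="
>>> #     grep itable "$d" | grep -c active
>>> #     grep itable "$d" | grep active_size
>>> #     grep itable "$d" | grep -c lru
>>> #     grep itable "$d" | grep lru_size
>>> # done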
>>>
>>>
>>> Had to restart the test and have been running for 36 hours
>>> now. RSS is
>>> currently up to 23g.
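>>>
>>> (If it helps anyone reproduce this, RSS can be tracked over time
>>> with something like the following; a sketch that assumes a single
>>> glusterfs mount process for this volume:)
>>>
>>> # while true; do
>>> #     date
>>> #     grep VmRSS /proc/$(pgrep -f 'volfile-id=/GlusterWWW')/status
>>> #     sleep 600
>>> # done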
>>>
>>> Working on getting a bug report with a link to the dumps. In the
>>> meantime, I'm including the results of your above queries for the
>>> first dump, the 18 hour dump, and the 36 hour dump:
>>>
>>> # grep itable glusterdump.153904.dump.1517104561 | grep active | wc -l
>>> 53865
>>> # grep itable glusterdump.153904.dump.1517169361 | grep active | wc -l
>>> 53864
>>> # grep itable glusterdump.153904.dump.1517234161 | grep active | wc -l
>>> 53864
>>>
>>> # grep itable glusterdump.153904.dump.1517104561 | grep active_size
>>> xlator.mount.fuse.itable.active_size=53864
>>> # grep itable glusterdump.153904.dump.1517169361 | grep active_size
>>> xlator.mount.fuse.itable.active_size=53863
>>> # grep itable glusterdump.153904.dump.1517234161 | grep active_size
>>> xlator.mount.fuse.itable.active_size=53863
>>>
>>> # grep itable glusterdump.153904.dump.1517104561 | grep lru | wc -l
>>> 998510
>>> # grep itable glusterdump.153904.dump.1517169361 | grep lru | wc -l
>>> 998510
>>> # grep itable glusterdump.153904.dump.1517234161 | grep lru | wc -l
>>> 995992
>>>
>>> # grep itable glusterdump.153904.dump.1517104561 | grep lru_size
>>> xlator.mount.fuse.itable.lru_size=998508
>>> # grep itable glusterdump.153904.dump.1517169361 | grep lru_size
>>> xlator.mount.fuse.itable.lru_size=998508
>>> # grep itable glusterdump.153904.dump.1517234161 | grep lru_size
>>> xlator.mount.fuse.itable.lru_size=995990
>>>
>>>
>>> Around 1 million inodes in the lru table!! These are inodes the
>>> kernel has cached and on which no operation is currently in
>>> progress. This could be the reason for the high memory usage.
>>> We have a patch being worked on (currently merged on the
>>> experimental branch) [1] that will help in these scenarios. In the
>>> meantime, can you remount glusterfs with the options
>>> --entry-timeout=0 and --attribute-timeout=0? This will make sure the
>>> kernel won't cache inodes/attributes of the files and should bring
>>> down the memory usage.
>>>
>>> I am curious to know what your data set is like. Is it a case of
>>> many directories and files present in deep directories? I am
>>> wondering whether a significant number of the inodes cached by the
>>> kernel are there to hold dentry structures in the kernel.
>>>
>>> [1] https://review.gluster.org/#/c/18665/
>>>
>>>
>>> OK, remounted with your recommended attributes and repeated the
>>> test. Now the mount process looks like this:
>>>
>>> /usr/sbin/glusterfs --attribute-timeout=0 --entry-timeout=0
>>> --volfile-server=localhost --volfile-id=/GlusterWWW /var/www
>>>
>>> However after running for 36 hours it's again at about 23g (about
>>> the same place it was on the first test).
>>>
>>> A few metrics from the 36 hour mark:
>>>
>>> num_allocs for [cluster/distribute.GlusterWWW-dht - usage-type
>>> gf_dht_mt_dht_layout_t memusage] is 109140094. Seems at least
>>> somewhat similar to the original test, which had 117901593 at the 36
>>> hour mark.
>>>
>>> The dump file at the 36 hour mark had nothing for lru or lru_size.
>>> However, at the dump two hours prior it had:
>>>
>>> # grep itable glusterdump.67299.dump.1517493361 | grep lru | wc -l
>>> 998510
>>> # grep itable glusterdump.67299.dump.1517493361 | grep lru_size
>>> xlator.mount.fuse.itable.lru_size=998508
>>>
>>> and the same thing for the dump four hours later. Are these values
>>> only relevant when the ls -R is actually running? I'm thinking the
>>> 36 hour dump may have caught the ls -R between runs there (?)
>>>
>>> The data set is multiple Web sites. I know there's some litter there
>>> we can clean up, but I'd guess not more than 200-300k files or so.
>>> The biggest culprit is a single directory that we use as a
>>> multi-purpose file store, with filenames stored as GUIDs and linked
>>> to a DB. That directory currently has 500k+ files. Another directory
>>> serves a similar purpose and has about 66k files in it. The rest is
>>> generally distributed more "normally", i.e., a mixed nesting of
>>> directories and files.
>>>
>>> Cheers!
>>>
>>> Dan
>>>
>>>
>>>
>>> # grep itable glusterdump.153904.dump.1517104561 | grep purge | wc -l
>>> 1
>>> # grep itable glusterdump.153904.dump.1517169361 | grep purge | wc -l
>>> 1
>>> # grep itable glusterdump.153904.dump.1517234161 | grep purge | wc -l
>>> 1
>>>
>>> # grep itable glusterdump.153904.dump.1517104561 | grep purge_size
>>> xlator.mount.fuse.itable.purge_size=0
>>> # grep itable glusterdump.153904.dump.1517169361 | grep purge_size
>>> xlator.mount.fuse.itable.purge_size=0
>>> # grep itable glusterdump.153904.dump.1517234161 | grep purge_size
>>> xlator.mount.fuse.itable.purge_size=0
>>>
>>> Cheers,
>>>
>>> Dan
>>>
>>>
>>>
>>> I've CC'd the fuse/dht devs to see if these data types have
>>> potential leaks. Could you raise a bug with the volume info and a
>>> (dropbox?) link from which we can download the dumps? You can
>>> remove/replace the filepaths from them.
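>>>
>>> (If it helps, something along these lines should strip the path
>>> details before uploading; a sketch that assumes the dumps carry the
>>> usual "path=..." entries and that you keep the originals untouched:)
>>>
>>> # for d in glusterdump.*.dump.*; do
>>> #     sed 's|path=.*|path=<redacted>|' "$d" > "scrubbed.$d"
>>> # done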
>>>
>>> Regards.
>>> Ravi
>>>
>>>
>>> Cheers!
>>>
>>> Dan
>>>
>>>
>>> Is there potentially something misconfigured here?
>>>
>>> I did see a reference to a memory leak in another thread in this
>>> list, but that had to do with the setting of quotas; I don't have
>>> any quotas set on my system.
>>>
>>> Thanks,
>>>
>>> Dan Ragle
>>> daniel at Biblestuph.com
>>>
>>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>