[Gluster-users] Run away memory with gluster mount

Wed Feb 21 15:41:41 UTC 2018

On 2/3/2018 8:58 AM, Dan Ragle wrote:
> 
> 
> On 2/2/2018 2:13 AM, Nithya Balachandran wrote:
>> Hi Dan,
>>
>> It sounds like you might be running into [1]. The patch has been 
>> posted upstream and the fix should be in the next release.
>> In the meantime, I'm afraid there is no way to get around this without 
>> restarting the process.
>>
>> Regards,
>> Nithya
>>
>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1541264
>>
> 
> Much appreciated. Will watch for the next release and retest then.
> 
> Cheers!
> 
> Dan
> 

FYI, this looks like it's fixed in 3.12.6. Ran the test setup with 
repeated ls listings for just shy of 48 hours with no increase in RAM 
usage. Next will try my production application load for a while to see if 
it holds steady.

The gf_dht_mt_dht_layout_t memusage num_allocs went quickly up to 105415 
and then stayed there for the entire 48 hours.

Thanks for the quick response,

Dan

>>
>> On 2 February 2018 at 02:57, Dan Ragle <daniel at biblestuph.com> wrote:
>>
>>
>>
>>     On 1/30/2018 6:31 AM, Raghavendra Gowdappa wrote:
>>
>>
>>
>>         ----- Original Message -----
>>
>>             From: "Dan Ragle" <daniel at Biblestuph.com>
>>             To: "Raghavendra Gowdappa" <rgowdapp at redhat.com>,
>>             "Ravishankar N" <ravishankar at redhat.com>
>>             Cc: gluster-users at gluster.org, "Csaba Henk"
>>             <chenk at redhat.com>, "Niels de Vos" <ndevos at redhat.com>,
>>             "Nithya Balachandran" <nbalacha at redhat.com>
>>             Sent: Monday, January 29, 2018 9:02:21 PM
>>             Subject: Re: [Gluster-users] Run away memory with gluster mount
>>
>>
>>
>>             On 1/29/2018 2:36 AM, Raghavendra Gowdappa wrote:
>>
>>
>>
>>                 ----- Original Message -----
>>
>>                     From: "Ravishankar N" <ravishankar at redhat.com>
>>                     To: "Dan Ragle" <daniel at Biblestuph.com>,
>>                     gluster-users at gluster.org
>>                     Cc: "Csaba Henk" <chenk at redhat.com>, "Niels de Vos"
>>                     <ndevos at redhat.com>, "Nithya Balachandran"
>>                     <nbalacha at redhat.com>, "Raghavendra Gowdappa"
>>                     <rgowdapp at redhat.com>
>>                     Sent: Saturday, January 27, 2018 10:23:38 AM
>>                     Subject: Re: [Gluster-users] Run away memory with
>>                     gluster mount
>>
>>
>>
>>                     On 01/27/2018 02:29 AM, Dan Ragle wrote:
>>
>>
>>                         On 1/25/2018 8:21 PM, Ravishankar N wrote:
>>
>>
>>
>>                             On 01/25/2018 11:04 PM, Dan Ragle wrote:
>>
>>                                 *sigh* Trying again to correct the
>>                                 formatting ... apologies for the
>>                                 earlier mess.
>>
>>                                 Having a memory issue with Gluster
>>                                 3.12.4 and not sure how to
>>                                 troubleshoot. I don't *think* this is
>>                                 expected behavior.
>>
>>                                 This is on an updated CentOS 7 box. The
>>                                 setup is a simple two node
>>                                 replicated layout where the two nodes
>>                                 act as both server and
>>                                 client.
>>
>>                                 The volume in question:
>>
>>                                 Volume Name: GlusterWWW
>>                                 Type: Replicate
>>                                 Volume ID: 8e9b0e79-f309-4d9b-a5bb-45d065faaaa3
>>                                 Status: Started
>>                                 Snapshot Count: 0
>>                                 Number of Bricks: 1 x 2 = 2
>>                                 Transport-type: tcp
>>                                 Bricks:
>>                                 Brick1: vs1dlan.mydomain.com:/glusterfs_bricks/brick1/www
>>                                 Brick2: vs2dlan.mydomain.com:/glusterfs_bricks/brick1/www
>>                                 Options Reconfigured:
>>                                 nfs.disable: on
>>                                 cluster.favorite-child-policy: mtime
>>                                 transport.address-family: inet
>>
>>                                 I had some other performance options in
>>                                 there (increased cache-size, md
>>                                 invalidation, etc.) but stripped them out
>>                                 in an attempt to isolate the issue. I still
>>                                 get the problem without them.
>>
>>                                 The volume currently contains over 1M files.
>>
>>                                 When mounting the volume, I get (among
>>                                 other things) a process as such:
>>
>>                                 /usr/sbin/glusterfs
>>                                 --volfile-server=localhost
>>                                 --volfile-id=/GlusterWWW /var/www
>>
>>                                 This process begins with little memory,
>>                                 but the memory increases as files in the
>>                                 volume are accessed. I set up a script
>>                                 that simply reads the files in the volume
>>                                 one at a time (no writes). It has been
>>                                 running on and off for about 12 hours
>>                                 now, and the resident memory of the above
>>                                 process is already at 7.5G and continues
>>                                 to grow slowly. If I stop the test script,
>>                                 the memory stops growing but does not
>>                                 shrink. If I restart the test script, the
>>                                 memory begins slowly growing again.
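To put numbers on growth like this, it can help to log the mount process's resident memory at a fixed interval. Below is a minimal sketch, assuming a Linux host (it reads `VmRSS`, reported in kB, from `/proc/<pid>/status`); the function name `watch_rss` and its arguments are hypothetical and not part of Gluster.

```shell
#!/bin/sh
# Hypothetical helper: log the resident set size (VmRSS, in kB) of one process.
# Arguments: PID, interval in seconds (default 3600),
#            number of samples (default 0 = keep going until the process exits).
# Output: one tab-separated line per sample: UTC timestamp, PID, RSS in kB.
watch_rss() {
  local pid="$1" interval="${2:-3600}" count="${3:-0}" taken=0 rss_kb
  while :; do
    # Linux-specific: /proc/<pid>/status has a line like "VmRSS:   123456 kB".
    rss_kb=$(awk '/^VmRSS:/ { print $2 }' "/proc/$pid/status" 2>/dev/null) || rss_kb=""
    if [ -z "$rss_kb" ]; then
      echo "watch_rss: process $pid not found" >&2
      return 1
    fi
    printf '%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$pid" "$rss_kb"
    taken=$((taken + 1))
    # Stop after the requested number of samples (0 means no limit).
    if [ "$count" -gt 0 ] && [ "$taken" -ge "$count" ]; then
      return 0
    fi
    sleep "$interval"
  done
}
```

For example, `watch_rss "$(pgrep -o -f 'volfile-id=/GlusterWWW')" 3600 >> /var/tmp/glusterfs-rss.log` would append one sample per hour; the `pgrep` pattern assumes the mount command line shown above, and the log path is arbitrary.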
>>
>>                                 This is obviously a contrived app
>>                                 environment. With my intended
>>                                 application load it takes about a week
>>                                 or so for the memory to get
>>                                 high enough to invoke the OOM killer.
>>
>>
>>                             Can you try debugging with the statedump
>>                             (https://gluster.readthedocs.io/en/latest/Troubleshooting/statedump/#read-a-statedump)
>>                             of the fuse mount process and see what member
>>                             is leaking? Take the statedumps in succession,
>>                             maybe once initially during the I/O and once
>>                             the memory gets high enough to hit the OOM
>>                             mark. Share the dumps here.
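Per the Gluster statedump documentation, a statedump of a fuse client is triggered by sending SIGUSR1 to the glusterfs mount process, and it is written as `glusterdump.<pid>.dump.<timestamp>` (the file names that appear later in this thread). Below is a sketch that sends the signal and waits for the new file. It assumes the default dump directory `/var/run/gluster`, which is configurable, so treat it as an assumption; `take_statedump` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: request a statedump from a glusterfs client process and
# print the path of the newest glusterdump.<pid>.dump.* file once one appears.
# Arguments: PID, dump directory (default /var/run/gluster, an assumption),
#            timeout in seconds (default 30).
take_statedump() {
  local pid="$1" dir="${2:-/var/run/gluster}" timeout="${3:-30}"
  local before after waited=0
  before=$(find "$dir" -maxdepth 1 -name "glusterdump.$pid.dump.*" | wc -l)
  # SIGUSR1 asks the glusterfs process to write a statedump.
  kill -s USR1 "$pid" || return 1
  while [ "$waited" -lt "$timeout" ]; do
    sleep 1
    waited=$((waited + 1))
    after=$(find "$dir" -maxdepth 1 -name "glusterdump.$pid.dump.*" | wc -l)
    if [ "$after" -gt "$before" ]; then
      # The epoch-seconds suffix has a fixed width, so a reverse lexical
      # sort puts the most recent dump first.
      find "$dir" -maxdepth 1 -name "glusterdump.$pid.dump.*" | sort -r | head -n 1
      return 0
    fi
  done
  echo "take_statedump: no new dump for pid $pid in $dir after ${timeout}s" >&2
  return 1
}
```

Running it once early in the I/O and again after the memory has grown gives the pair of dumps asked for above. The file may still be in the middle of being written when it first appears, so give it a moment before reading it.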
>>
>>                             Regards,
>>                             Ravi
>>
>>
>>                         Thanks for the reply. I noticed yesterday that
>>                         an update (3.12.5) had been posted, so I went
>>                         ahead and updated and repeated the test
>>                         overnight. The memory usage does not appear to
>>                         be growing as quickly as it was with 3.12.4,
>>                         but it does still appear to be growing.
>>
>>                         I should also mention that there is another
>>                         process beyond my test app
>>                         that is reading the files from the volume.
>>                         Specifically, there is an
>>                         rsync that runs from the second node 2-4 times
>>                         an hour that reads from
>>                         the GlusterWWW volume mounted on node 1. Since
>>                         none of the files in
>>                         that mount are changing it doesn't actually
>>                         rsync anything, but
>>                         nonetheless it is running and reading the files
>>                         in addition to my test
>>                         script. (It's a part of my intended production
>>                         setup that I forgot was
>>                         still running.)
>>
>>                         The mount process appears to be gaining memory
>>                         at a rate of about 1GB
>>                         every 4 hours or so. At that rate it'll take
>>                         several days before it
>>                         runs the box out of memory. But I took your
>>                         suggestion and made some
>>                         statedumps today anyway, about 2 hours apart, 4
>>                         total so far. It looks
>>                         like there may already be some actionable
>>                         information. These are the only usage types
>>                         whose num_allocs grew in each of the four
>>                         samples:
>>
>>                         [mount/fuse.fuse - usage-type gf_fuse_mt_gids_t memusage]
>>                             ---> num_allocs at Fri Jan 26 08:57:31 2018: 784
>>                             ---> num_allocs at Fri Jan 26 10:55:50 2018: 831
>>                             ---> num_allocs at Fri Jan 26 12:55:15 2018: 877
>>                             ---> num_allocs at Fri Jan 26 14:58:27 2018: 908
>>
>>                         [mount/fuse.fuse - usage-type gf_common_mt_fd_lk_ctx_t memusage]
>>                             ---> num_allocs at Fri Jan 26 08:57:31 2018: 5
>>                             ---> num_allocs at Fri Jan 26 10:55:50 2018: 10
>>                             ---> num_allocs at Fri Jan 26 12:55:15 2018: 15
>>                             ---> num_allocs at Fri Jan 26 14:58:27 2018: 17
>>
>>                         [cluster/distribute.GlusterWWW-dht - usage-type gf_dht_mt_dht_layout_t memusage]
>>                             ---> num_allocs at Fri Jan 26 08:57:31 2018: 24243596
>>                             ---> num_allocs at Fri Jan 26 10:55:50 2018: 27902622
>>                             ---> num_allocs at Fri Jan 26 12:55:15 2018: 30678066
>>                             ---> num_allocs at Fri Jan 26 14:58:27 2018: 33801036
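Pulling one counter out of a series of dumps by hand gets tedious. Below is a sketch that assumes each memusage section in a statedump is a bracketed header line followed by `key=value` lines that include `num_allocs=` (the layout described in the statedump documentation). `num_allocs` is a hypothetical function name.

```shell
#!/bin/sh
# Hypothetical helper: print num_allocs for one memusage section across dumps.
# Arguments: SECTION (header text without the brackets), then dump files in order.
# Output: "<file>\t<num_allocs>" per dump, then "change\t<last - first>".
num_allocs() {
  local section="$1"
  shift
  awk -v hdr="[$section]" '
    FNR == 1 { in_sec = 0 }                  # reset at the start of every dump file
    /^\[/    { in_sec = ($0 == hdr); next }  # a header line starts a new section
    in_sec && /^num_allocs=/ {
      v = substr($0, length("num_allocs=") + 1)
      print FILENAME "\t" v
      if (n++ == 0) first = v
      last = v
      in_sec = 0                             # only the first num_allocs per section
    }
    END { if (n > 1) print "change\t" (last - first) }
  ' "$@"
}
```

For example, `num_allocs 'cluster/distribute.GlusterWWW-dht - usage-type gf_dht_mt_dht_layout_t memusage' glusterdump.*` prints one row per dump plus the overall change. Because the timestamp suffix has a fixed width, the shell's sorted glob expansion also lists the files in chronological order. For the four samples above, the change is 33801036 - 24243596 = 9557440 allocations in about six hours.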
>>
>>                         Not sure the best way to get you the full dumps.
>>                         They're pretty big,
>>                         over 1G for all four. Also, I noticed some
>>                         filepath information in
>>                         there that I'd rather not share. What's the
>>                         recommended next step?
>>
>>
>>                 Please run the following queries on the statedump files
>>                 and report the results:
>>                 # grep itable <client-statedump> | grep active | wc -l
>>                 # grep itable <client-statedump> | grep active_size
>>                 # grep itable <client-statedump> | grep lru | wc -l
>>                 # grep itable <client-statedump> | grep lru_size
>>                 # grep itable <client-statedump> | grep purge | wc -l
>>                 # grep itable <client-statedump> | grep purge_size
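The six queries can be run over any number of dumps at once. Below is a sketch that keeps the exact grep filters above and prints one tab-separated row per dump. Because the filters are unchanged, a count such as `grep lru | wc -l` also includes the `lru_size` line itself. `itable_summary` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: one tab-separated row of inode-table (itable) counters
# per statedump, using the same grep filters as the queries above.
itable_summary() {
  local f active lru purge active_size lru_size purge_size
  printf 'file\tactive\tactive_size\tlru\tlru_size\tpurge\tpurge_size\n'
  for f in "$@"; do
    # Line counts, equivalent to: grep itable FILE | grep WORD | wc -l
    # ("|| true" because grep -c exits non-zero when the count is 0).
    active=$(grep itable "$f" | grep -c active || true)
    lru=$(grep itable "$f" | grep -c lru || true)
    purge=$(grep itable "$f" | grep -c purge || true)
    # Size lines look like: xlator.mount.fuse.itable.lru_size=998508
    active_size=$(grep itable "$f" | grep active_size | cut -d= -f2)
    lru_size=$(grep itable "$f" | grep lru_size | cut -d= -f2)
    purge_size=$(grep itable "$f" | grep purge_size | cut -d= -f2)
    printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' "$(basename "$f")" \
      "$active" "${active_size:-n/a}" "$lru" "${lru_size:-n/a}" \
      "$purge" "${purge_size:-n/a}"
  done
}
```

For example, `itable_summary glusterdump.153904.dump.*` would put the first, 18-hour, and 36-hour results side by side.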
>>
>>
>>             Had to restart the test, and it has now been running for
>>             36 hours. RSS is currently up to 23g.
>>
>>             Working on getting a bug report with a link to the dumps.
>>             In the meantime, I'm including the results of your above
>>             queries for the first dump, the 18-hour dump, and the
>>             36-hour dump:
>>
>>             # grep itable glusterdump.153904.dump.1517104561 | grep active | wc -l
>>             53865
>>             # grep itable glusterdump.153904.dump.1517169361 | grep active | wc -l
>>             53864
>>             # grep itable glusterdump.153904.dump.1517234161 | grep active | wc -l
>>             53864
>>
>>             # grep itable glusterdump.153904.dump.1517104561 | grep active_size
>>             xlator.mount.fuse.itable.active_size=53864
>>             # grep itable glusterdump.153904.dump.1517169361 | grep active_size
>>             xlator.mount.fuse.itable.active_size=53863
>>             # grep itable glusterdump.153904.dump.1517234161 | grep active_size
>>             xlator.mount.fuse.itable.active_size=53863
>>
>>             # grep itable glusterdump.153904.dump.1517104561 | grep lru | wc -l
>>             998510
>>             # grep itable glusterdump.153904.dump.1517169361 | grep lru | wc -l
>>             998510
>>             # grep itable glusterdump.153904.dump.1517234161 | grep lru | wc -l
>>             995992
>>
>>             # grep itable glusterdump.153904.dump.1517104561 | grep lru_size
>>             xlator.mount.fuse.itable.lru_size=998508
>>             # grep itable glusterdump.153904.dump.1517169361 | grep lru_size
>>             xlator.mount.fuse.itable.lru_size=998508
>>             # grep itable glusterdump.153904.dump.1517234161 | grep lru_size
>>             xlator.mount.fuse.itable.lru_size=995990
>>
>>
>>         Around 1 million inodes in the lru table!! These are inodes the
>>         kernel has cached and on which no operation is currently in
>>         progress. This could be the reason for the high memory usage.
>>         We have a patch in the works (currently merged on the
>>         experimental branch) [1] that will help in these scenarios. In
>>         the meantime, can you remount glusterfs with the options
>>         --entry-timeout=0 and --attribute-timeout=0? This will make sure
>>         that the kernel won't cache inodes/attributes of the files and
>>         should bring down the memory usage.
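If the volume is mounted from /etc/fstab rather than by running glusterfs directly, the same two options might be expressed as mount options. This is a sketch under assumptions: it assumes the mount.glusterfs helper accepts `entry-timeout` and `attribute-timeout` as `-o` options and passes them through to the glusterfs process (verify against the helper installed with your release), and it reuses the volume and mount point from this thread.

```
# /etc/fstab (hypothetical entry; assumes mount.glusterfs passes these options through)
localhost:/GlusterWWW  /var/www  glusterfs  defaults,_netdev,entry-timeout=0,attribute-timeout=0  0 0
```

The one-off equivalent would be `mount -t glusterfs -o entry-timeout=0,attribute-timeout=0 localhost:/GlusterWWW /var/www`. Note that later in this thread the remount did not stop the growth; the fix that held was the upgrade to 3.12.6.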
>>
>>         I am curious to know what your data set is like. Is it a case of
>>         too many directories and files present in deep directories? I
>>         am wondering whether a significant number of the inodes cached
>>         by the kernel are there to hold the dentry structure in the kernel.
>>
>>         [1] https://review.gluster.org/#/c/18665/
>>
>>
>>     OK, remounted with your recommended attributes and repeated the
>>     test. Now the mount process looks like this:
>>
>>     /usr/sbin/glusterfs --attribute-timeout=0 --entry-timeout=0
>>     --volfile-server=localhost --volfile-id=/GlusterWWW /var/www
>>
>>     However, after running for 36 hours it's again at about 23g (about
>>     the same place it was in the first test).
>>
>>     A few metrics from the 36 hour mark:
>>
>>     num_allocs for [cluster/distribute.GlusterWWW-dht - usage-type
>>     gf_dht_mt_dht_layout_t memusage] is 109140094. Seems at least
>>     somewhat similar to the original test, which had 117901593 at the 36
>>     hour mark.
>>
>>     The dump file at the 36 hour mark had nothing for lru or lru_size.
>>     However, at the dump two hours prior it had:
>>
>>     # grep itable glusterdump.67299.dump.1517493361 | grep lru | wc -l
>>     998510
>>     # grep itable glusterdump.67299.dump.1517493361 | grep lru_size
>>     xlator.mount.fuse.itable.lru_size=998508
>>
>>     and the same thing for the dump four hours later. Are these values
>>     only relevant when the ls -R is actually running? I'm thinking the
>>     36 hour dump may have caught the ls -R between runs there (?)
>>
>>     The data set is multiple Web sites. I know there's some litter there
>>     we can clean up, but I'd guess not more than 200-300k files or so.
>>     The biggest culprit is a single directory that we use as a
>>     multi-purpose file store, with filenames stored as GUIDs and linked
>>     to a DB. That directory currently has 500k+ files. Another directory
>>     serves a similar purpose and has about 66k files in it. The rest is
>>     generally distributed more "normally", i.e., a mixed nesting of
>>     directories and files.
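To answer the data-set question with numbers, the directories holding the most files can be listed directly. Below is a sketch using GNU find's `-printf` (`%h` is the directory part of each path). It assumes you point it at a brick path from the volume info, such as `/glusterfs_bricks/brick1/www`, so that the scan does not itself add lookups to the fuse mount's inode table. It prunes `.glusterfs`, Gluster's internal metadata directory on bricks. `files_per_dir` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: list the directories under ROOT that directly contain
# the most regular files, skipping the brick-internal .glusterfs directory.
# Arguments: ROOT, number of rows to show (default 10).
files_per_dir() {
  local root="$1" top="${2:-10}"
  # -xdev: stay on one filesystem; %h prints the directory containing each file.
  find "$root" -xdev -path "$root/.glusterfs" -prune -o -type f -printf '%h\n' \
    | sort | uniq -c | sort -rn | head -n "$top"
}
```

For example, `files_per_dir /glusterfs_bricks/brick1/www 5` would be expected to show the 500k-file GUID directory at the top.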
>>
>>     Cheers!
>>
>>     Dan
>>
>>
>>
>>             # grep itable glusterdump.153904.dump.1517104561 | grep purge | wc -l
>>             1
>>             # grep itable glusterdump.153904.dump.1517169361 | grep purge | wc -l
>>             1
>>             # grep itable glusterdump.153904.dump.1517234161 | grep purge | wc -l
>>             1
>>
>>             # grep itable glusterdump.153904.dump.1517104561 | grep purge_size
>>             xlator.mount.fuse.itable.purge_size=0
>>             # grep itable glusterdump.153904.dump.1517169361 | grep purge_size
>>             xlator.mount.fuse.itable.purge_size=0
>>             # grep itable glusterdump.153904.dump.1517234161 | grep purge_size
>>             xlator.mount.fuse.itable.purge_size=0
>>
>>             Cheers,
>>
>>             Dan
>>
>>
>>
>>                     I've CC'd the fuse/DHT devs to see if these data
>>                     types have potential
>>                     leaks. Could you raise a bug with the volume info
>>                     and a (dropbox?) link
>>                     from which we can download the dumps? You can
>>                     remove/replace the
>>                     filepaths from them.
>>
>>                     Regards.
>>                     Ravi
>>
>>
>>                         Cheers!
>>
>>                         Dan
>>
>>
>>                                 Is there potentially something
>>                                 misconfigured here?
>>
>>                                 I did see a reference to a memory leak
>>                                 in another thread on this list, but that
>>                                 had to do with setting quotas, and I
>>                                 don't have any quotas set on my system.
>>
>>                                 Thanks,
>>
>>                                 Dan Ragle
>>                                 daniel at Biblestuph.com
>>
>>
>>
>>
>>
>>                                     
>> _______________________________________________
>>                                     Gluster-users mailing list
>>                                     Gluster-users at gluster.org
>>                                     <mailto:Gluster-users at gluster.org>
>>                                     
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>                                     
>> <http://lists.gluster.org/mailman/listinfo/gluster-users>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>