[Gluster-users] possible memory leak in client/fuse mount
Ravishankar N
ravishankar at redhat.com
Wed Nov 25 13:14:23 UTC 2020
On 25/11/20 5:50 pm, Olaf Buitelaar wrote:
> Hi Ashish,
>
> Thank you for looking into this. I also suspect it has something to
> do with the 7.X client, because on the 6.X clients the issue doesn't
> really seem to occur.
> I would love to update everything to 7.X, but since the self-heal
> daemons won't start
> (https://lists.gluster.org/pipermail/gluster-users/2020-November/038917.html),
> I halted the full upgrade.
Olaf, based on your email I did try to upgrade one node of a 3-node
replica 3 setup from 6.10 to 7.8 on my test VMs, and I found that the
self-heal daemon (and the bricks) came online after I restarted glusterd
post-upgrade on that node. (I did not touch the op-version, and I did
not spend further time on it.) So I don't think the problem is related
to the shd mux changes I referred to. But if you have a test setup where
you can reproduce this, please raise a GitHub issue with the details.
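For reference, roughly the sequence I used on the test node, in case you
want to compare. This is only a sketch, assuming an rpm-based install
with the 7.x repo already enabled; <volname> is a placeholder for the
volume name:

# on the node being upgraded (one node at a time, the others untouched)
systemctl stop glusterd
pkill glusterfs                  # stop remaining glusterfs/glusterfsd processes on this node
yum update 'glusterfs*'          # 6.10 -> 7.8 packages
systemctl restart glusterd       # after this restart, shd and the bricks came back online for me
gluster volume status <volname>  # check that bricks and Self-heal Daemon show Online = Y
gluster volume heal <volname> info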
Thanks,
Ravi
> Hopefully that issue will be addressed in the upcoming release. Once
> i've everything running on the same version i'll check if the issue
> still occurs and reach out, if that's the case.
>
> Thanks Olaf
>
> On Wed, 25 Nov 2020 at 10:42, Ashish Pandey <aspandey at redhat.com>
> wrote:
>
>
> Hi,
>
> I checked the statedump and found some very high memory allocations.
> grep -rwn "num_allocs" glusterdump.17317.dump.1605* | cut -d'=' -f2 | sort
>
> 30003616
> 30003616
> 3305
> 3305
> 36960008
> 36960008
> 38029944
> 38029944
> 38450472
> 38450472
> 39566824
> 39566824
> 4
> I did check those lines in the statedump and it could be happening in
> protocol/client. However, I did not find anything suspicious in my
> quick code exploration.
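> To see which xlator and memory type those counts belong to, you could
> grep the same dumps with a couple of lines of leading context around
> one of the large values, something like (the value below is just one
> of the numbers above):
>
> grep -B2 "num_allocs=36960008" glusterdump.17317.dump.1605*
>
> The section header printed above each hit should name the xlator and
> memory type holding those allocations.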
> I would suggest upgrading all the nodes to the latest version, then
> starting your workload again and checking whether memory usage still
> grows this high.
> That way it will also be easier to debug this issue.
>
> ---
> Ashish
>
> ------------------------------------------------------------------------
> *From: *"Olaf Buitelaar" <olaf.buitelaar at gmail.com>
> *To: *"gluster-users" <gluster-users at gluster.org>
> *Sent: *Thursday, November 19, 2020 10:28:57 PM
> *Subject: *[Gluster-users] possible memory leak in client/fuse mount
>
> Dear Gluster Users,
>
> I have a glusterfs process which consumes nearly all the memory of
> the machine (~58GB):
>
> # ps -faxu|grep 17317
> root 17317 3.1 88.9 59695516 58479708 ? Ssl Oct31 839:36
> /usr/sbin/glusterfs --process-name fuse
> --volfile-server=10.201.0.1
> --volfile-server=10.201.0.8:10.201.0.5:10.201.0.6:10.201.0.7:10.201.0.9
> --volfile-id=/docker2 /mnt/docker2
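>
> If it helps, I can also sample the RSS of that pid over time to show
> how fast it grows, with a quick ad-hoc loop like this (the interval
> is arbitrary):
>
> # while true; do echo "$(date -u +%FT%TZ) $(ps -o rss= -p 17317)"; sleep 600; done >> /var/tmp/glusterfs-17317-rss.log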
>
> The gluster version on this machine is 7.8, but I'm currently
> running a mixed cluster of 6.10 and 7.8, while waiting to proceed
> with the upgrade because of the self-heal daemon issue mentioned
> earlier.
>
> The affected volume's info looks like this:
>
> # gluster v info docker2
>
> Volume Name: docker2
> Type: Distributed-Replicate
> Volume ID: 4e0670a0-3d00-4360-98bd-3da844cedae5
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 3 x (2 + 1) = 9
> Transport-type: tcp
> Bricks:
> Brick1: 10.201.0.5:/data0/gfs/bricks/brick1/docker2
> Brick2: 10.201.0.9:/data0/gfs/bricks/brick1/docker2
> Brick3: 10.201.0.3:/data0/gfs/bricks/bricka/docker2 (arbiter)
> Brick4: 10.201.0.6:/data0/gfs/bricks/brick1/docker2
> Brick5: 10.201.0.7:/data0/gfs/bricks/brick1/docker2
> Brick6: 10.201.0.4:/data0/gfs/bricks/bricka/docker2 (arbiter)
> Brick7: 10.201.0.1:/data0/gfs/bricks/brick1/docker2
> Brick8: 10.201.0.8:/data0/gfs/bricks/brick1/docker2
> Brick9: 10.201.0.2:/data0/gfs/bricks/bricka/docker2 (arbiter)
> Options Reconfigured:
> performance.cache-size: 128MB
> transport.address-family: inet
> nfs.disable: on
> cluster.brick-multiplex: on
>
> The issue seems to be triggered by a program called Zammad, which
> has an init process that runs in a loop; on each cycle it re-compiles
> the Ruby on Rails application.
>
> I've attached 2 statedumps, but as I only recently noticed the
> high memory usage, I believe both statedumps already show an
> escalated state of the glusterfs process. If you also need dumps
> from a fresh start, let me know. The dumps were taken about an hour
> apart.
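> If more dumps at a fixed interval would be useful I can keep taking
> them; as far as I understand, a statedump of the fuse client can be
> triggered by sending SIGUSR1 to the process, and it ends up under
> /var/run/gluster by default:
>
> # kill -USR1 17317    # writes glusterdump.17317.dump.<timestamp>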
> I've also included glusterd.log. I couldn't include
> mnt-docker2.log since it's too large, being littered with
> "I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-docker2-dht"
> entries. However, I've inspected the log and it contains no Error
> messages; all are of the Info kind and look like these:
> [2020-11-19 03:29:05.406766] I
> [glusterfsd-mgmt.c:2282:mgmt_getspec_cbk] 0-glusterfs: No change
> in volfile,continuing
> [2020-11-19 03:29:21.271886] I [socket.c:865:__socket_shutdown]
> 0-docker2-client-8: intentional socket shutdown(5)
> [2020-11-19 03:29:24.479738] I [socket.c:865:__socket_shutdown]
> 0-docker2-client-2: intentional socket shutdown(5)
> [2020-11-19 03:30:12.318146] I [socket.c:865:__socket_shutdown]
> 0-docker2-client-5: intentional socket shutdown(5)
> [2020-11-19 03:31:27.381720] I [socket.c:865:__socket_shutdown]
> 0-docker2-client-8: intentional socket shutdown(5)
> [2020-11-19 03:31:30.579630] I [socket.c:865:__socket_shutdown]
> 0-docker2-client-2: intentional socket shutdown(5)
> [2020-11-19 03:32:18.427364] I [socket.c:865:__socket_shutdown]
> 0-docker2-client-5: intentional socket shutdown(5)
>
> The rename messages look like these:
> [2020-11-19 03:29:05.402663] I [MSGID: 109066]
> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/95/75f93c20e375c5.tmp.eVcE5D
> (fe083b7e-b0d5-485c-8666-e1f7cdac33e2)
> (hash=docker2-replicate-2/cache=docker2-replicate-2) =>
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/95/75f93c20e375c5
> ((null)) (hash=docker2-replicate-2/cache=<nul>)
> [2020-11-19 03:29:05.410972] I [MSGID: 109066]
> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/0d/86dd25f3d238ff.tmp.AdDTLu
> (b1edadad-1d48-4bf4-be85-ffbe9d69d338)
> (hash=docker2-replicate-1/cache=docker2-replicate-1) =>
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/0d/86dd25f3d238ff
> ((null)) (hash=docker2-replicate-2/cache=<nul>)
> [2020-11-19 03:29:05.420064] I [MSGID: 109066]
> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f2/6e44f76b508fd3.tmp.QKmxul
> (31f80fcb-977c-433b-9259-5fdfcad1171c)
> (hash=docker2-replicate-0/cache=docker2-replicate-0) =>
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f2/6e44f76b508fd3
> ((null)) (hash=docker2-replicate-0/cache=<nul>)
> [2020-11-19 03:29:05.427537] I [MSGID: 109066]
> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/b0/1d7303d9dfe009.tmp.qLUMec
> (e2fdf971-731f-4765-80e8-3165433488ea)
> (hash=docker2-replicate-2/cache=docker2-replicate-2) =>
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/b0/1d7303d9dfe009
> ((null)) (hash=docker2-replicate-1/cache=<nul>)
> [2020-11-19 03:29:05.440576] I [MSGID: 109066]
> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/bd/952a089e164b36.tmp.4qvl22
> (3e0bc6d1-13ac-47c6-b221-1256b4b506ef)
> (hash=docker2-replicate-2/cache=docker2-replicate-2) =>
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/bd/952a089e164b36
> ((null)) (hash=docker2-replicate-1/cache=<nul>)
> [2020-11-19 03:29:05.452407] I [MSGID: 109066]
> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/a3/b587dd08f35e2e.tmp.iIweTT
> (9685b5f3-4b14-4050-9b00-1163856239b5)
> (hash=docker2-replicate-1/cache=docker2-replicate-1) =>
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/a3/b587dd08f35e2e
> ((null)) (hash=docker2-replicate-0/cache=<nul>)
> [2020-11-19 03:29:05.460720] I [MSGID: 109066]
> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/48/89cfb1b971c025.tmp.0W7jMK
> (d0a8d0a4-c783-45db-bb4a-68e24044d830)
> (hash=docker2-replicate-0/cache=docker2-replicate-0) =>
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/48/89cfb1b971c025
> ((null)) (hash=docker2-replicate-1/cache=<nul>)
> [2020-11-19 03:29:05.468800] I [MSGID: 109066]
> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/d9/759d55e8da66eb.tmp.2yXtHB
> (e5b61ef5-a3c2-4a2c-aa47-c377a6c090d7)
> (hash=docker2-replicate-0/cache=docker2-replicate-0) =>
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/d9/759d55e8da66eb
> ((null)) (hash=docker2-replicate-0/cache=<nul>)
> [2020-11-19 03:29:05.476745] I [MSGID: 109066]
> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/1c/f3a658342e36b7.tmp.gSkiEs
> (17181a40-f9b2-438f-9dfc-7bb159c516e6)
> (hash=docker2-replicate-2/cache=docker2-replicate-2) =>
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/1c/f3a658342e36b7
> ((null)) (hash=docker2-replicate-0/cache=<nul>)
> [2020-11-19 03:29:05.486729] I [MSGID: 109066]
> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f1/6bef7cb6446c7a.tmp.sVT0Dj
> (cb6b1d52-b1c0-420c-86b7-2ceb8e8e73db)
> (hash=docker2-replicate-0/cache=docker2-replicate-0) =>
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f1/6bef7cb6446c7a
> ((null)) (hash=docker2-replicate-1/cache=<nul>)
> [2020-11-19 03:29:05.495115] I [MSGID: 109066]
> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/45/73ba226559961b.tmp.QdPTFa
> (d8450d9e-62a7-4fd5-9dd2-e072e318d9a0)
> (hash=docker2-replicate-0/cache=docker2-replicate-0) =>
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/45/73ba226559961b
> ((null)) (hash=docker2-replicate-1/cache=<nul>)
> [2020-11-19 03:29:05.503424] I [MSGID: 109066]
> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/13/29c0df35961ca0.tmp.s1xUJ1
> (ffc57a77-8b91-4264-8e2d-a9966f0f37ef)
> (hash=docker2-replicate-1/cache=docker2-replicate-1) =>
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/13/29c0df35961ca0
> ((null)) (hash=docker2-replicate-2/cache=<nul>)
> [2020-11-19 03:29:05.513532] I [MSGID: 109066]
> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/be/8d6a07b6a0d6ad.tmp.A5DzQS
> (5a595a65-372d-4377-b547-2c4e23f7be3a)
> (hash=docker2-replicate-1/cache=docker2-replicate-1) =>
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/be/8d6a07b6a0d6ad
> ((null)) (hash=docker2-replicate-0/cache=<nul>)
> [2020-11-19 03:29:05.526885] I [MSGID: 109066]
> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/ec/4208216d993cbe.tmp.IMXg0J
> (2fa99fcd-64f8-4934-aeda-b356816f1132)
> (hash=docker2-replicate-2/cache=docker2-replicate-2) =>
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/ec/4208216d993cbe
> ((null)) (hash=docker2-replicate-2/cache=<nul>)
> [2020-11-19 03:29:05.537637] I [MSGID: 109066]
> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/57/1527c482cf2d6b.tmp.Y2L0cB
> (db24d7bf-4a06-4356-a52e-1ab9537d1c3a)
> (hash=docker2-replicate-0/cache=docker2-replicate-0) =>
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/57/1527c482cf2d6b
> ((null)) (hash=docker2-replicate-1/cache=<nul>)
> [2020-11-19 03:29:05.547878] I [MSGID: 109066]
> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/88/1b60ead8d4c4e5.tmp.u47rss
> (b12f041b-5bbd-4e3d-b700-8f673830393f)
> (hash=docker2-replicate-1/cache=docker2-replicate-1) =>
> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/88/1b60ead8d4c4e5
> ((null)) (hash=docker2-replicate-1/cache=<nul>)
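>
> If the full mnt-docker2.log is still wanted, I could strip the rename
> noise before sending it, with something along these lines:
>
> # grep -v 'MSGID: 109066' mnt-docker2.log | gzip > mnt-docker2.filtered.log.gz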
>
> If I can provide any more information, please let me know.
>
> Thanks Olaf