[Gluster-users] possible memory leak in client/fuse mount

Thu Nov 26 10:52:49 UTC 2020

On 26/11/20 4:00 pm, Olaf Buitelaar wrote:
> Hi Ravi,
>
> I could try that, but I can only try a setup on VMs, and I will not be
> able to set up an environment like our production environment,
> which runs on physical machines and carries actual production load.
> So the two setups would be quite different.
> Personally I think it would be best to debug the actual machines instead
> of trying to reproduce it, since reproducing the issue on the
> physical machines is just a matter of swapping the repositories and
> upgrading the packages.
> Let me know what you think.

Physical machines or VMs - anything is fine. The only thing is that I cannot
guarantee quick responses, so if it is a production machine, that will be
an issue for you. So any setup you can use for experimenting is fine.
You don't need any clients for the testing. Just create a 1x2 replica
volume using 2 nodes and start it. Then upgrade one node and see if shd
and the bricks come up on that node.
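
A minimal sketch of that test setup, written as a small Python helper that
only builds and prints the gluster CLI commands rather than running them.
The volume name, host names and brick directory are hypothetical
placeholders, not values from this thread; adjust them to your own nodes.

```python
import shlex


def replica_test_commands(volume, node1, node2, brick_dir):
    """Return the gluster CLI invocations for a 1x2 replica test volume.

    All arguments are placeholders chosen by the caller; nothing is executed.
    """
    bricks = [f"{node}:{brick_dir}/{volume}" for node in (node1, node2)]
    return [
        # Run on node1: add the second node to the trusted pool.
        ["gluster", "peer", "probe", node2],
        # --mode=script suppresses the interactive confirmation prompt
        # that replica 2 volumes normally trigger.
        ["gluster", "--mode=script", "volume", "create", volume,
         "replica", "2", *bricks],
        ["gluster", "volume", "start", volume],
        # After upgrading one node and restarting glusterd there, rerun this
        # and check the Online column for its brick and Self-heal Daemon rows.
        ["gluster", "volume", "status", volume],
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in replica_test_commands("testvol", "node1", "node2", "/bricks"):
        print(shlex.join(cmd))
```

If the brick directory sits on the root filesystem, the create command will
likely also need a trailing "force".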

-Ravi

>
> Thanks Olaf
>
> On Thu, 26 Nov 2020 at 02:43, Ravishankar N
> <ravishankar at redhat.com> wrote:
>
>
>     On 25/11/20 7:17 pm, Olaf Buitelaar wrote:
>>     Hi Ravi,
>>
>>     Thanks for checking. Unfortunately this is our production system.
>>     What I've done is simply change the yum repo from gluster-6 to
>>     http://mirror.centos.org/centos/$releasever/storage/$basearch/gluster-7/
>>     and run a yum upgrade. I restarted the glusterd process
>>     several times, and I've also tried rebooting the machine. I haven't
>>     touched the op-version yet, which is still at 60000; usually I
>>     only do this when all nodes are upgraded and running stable.
>>     We're running multiple volumes with different configurations, but
>>     the shd doesn't start on the upgraded nodes for any of the volumes.
>>     Is there anything further I could check/do to get to the bottom
>>     of this?
>
>     Hi Olaf, like I said, would it be possible to create a test setup
>     to see if you can recreate it?
>
>     Regards,
>     Ravi
>>
>>     Thanks Olaf
>>
>>     On Wed, 25 Nov 2020 at 14:14, Ravishankar N
>>     <ravishankar at redhat.com> wrote:
>>
>>
>>         On 25/11/20 5:50 pm, Olaf Buitelaar wrote:
>>>         Hi Ashish,
>>>
>>>         Thank you for looking into this. I also suspect it
>>>         has something to do with the 7.X client, because on the 6.X
>>>         clients the issue doesn't really seem to occur.
>>>         I would love to update everything to 7.X, but since the
>>>         self-heal daemons
>>>         (https://lists.gluster.org/pipermail/gluster-users/2020-November/038917.html)
>>>         won't start, I halted the full upgrade.
>>
>>         Olaf, based on your email, I did try to upgrade one node of a
>>         3-node replica 3 setup from 6.10 to 7.8 on my test VMs, and I
>>         found that the self-heal daemon (and the bricks) came online
>>         after I restarted glusterd post-upgrade on that node. (I did
>>         not touch the op-version.) I did not spend further time on
>>         it. So I don't think the problem is related to the shd
>>         mux changes I referred to. But if you have a test setup where
>>         you can reproduce this, please raise a GitHub issue with the
>>         details.
>>
>>         Thanks,
>>         Ravi
>>>         Hopefully that issue will be addressed in the upcoming
>>>         release. Once I have everything running on the same version,
>>>         I'll check whether the issue still occurs and reach out if
>>>         that's the case.
>>>
>>>         Thanks Olaf
>>>
>>>         On Wed, 25 Nov 2020 at 10:42, Ashish Pandey
>>>         <aspandey at redhat.com> wrote:
>>>
>>>
>>>             Hi,
>>>
>>>             I checked the statedump and found some very high memory
>>>             allocations.
>>>             grep -rwn "num_allocs" glusterdump.17317.dump.1605* |
>>>             cut -d'=' -f2 | sort
>>>
>>>             30003616
>>>             30003616
>>>             3305
>>>             3305
>>>             36960008
>>>             36960008
>>>             38029944
>>>             38029944
>>>             38450472
>>>             38450472
>>>             39566824
>>>             39566824
>>>             4
>>>             I checked those lines in the statedump, and it could be
>>>             happening in protocol/client. However, I did not find
>>>             anything suspicious in my quick code exploration.
>>>             I would suggest upgrading all the nodes to the latest
>>>             version, then starting your workload and seeing if there is
>>>             any high memory usage.
>>>             That way it will also be easier to debug this issue.
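
The plain "sort" in the pipeline above orders the counts as strings, which
is why 3305 lands between 30003616 and 36960008. A minimal sketch that ranks
num_allocs numerically and keeps the section name next to each count,
assuming the usual statedump layout of bracketed section headers followed by
key=value lines; the function names are illustrative, not part of any
gluster tooling.

```python
import sys


def parse_statedump(lines):
    """Map each '[section]' header to a dict of its integer key=value fields."""
    sections, current = {}, None
    for raw in lines:
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            if value.isdigit():
                current[key] = int(value)
    return sections


def top_allocators(sections, limit=10):
    """Return (num_allocs, section) pairs, largest count first."""
    rows = [(fields["num_allocs"], name)
            for name, fields in sections.items()
            if "num_allocs" in fields]
    return sorted(rows, reverse=True)[:limit]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python top_allocs.py <statedump file>
    with open(sys.argv[1]) as fh:
        for count, name in top_allocators(parse_statedump(fh)):
            print(f"{count:>12}  {name}")
```

Keeping the section header next to each count shows which translator and
allocation type owns the largest numbers, which the bare grep output drops.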
>>>
>>>             ---
>>>             Ashish
>>>
>>>             ------------------------------------------------------------------------
>>>             *From: *"Olaf Buitelaar" <olaf.buitelaar at gmail.com>
>>>             *To: *"gluster-users" <gluster-users at gluster.org>
>>>             *Sent: *Thursday, November 19, 2020 10:28:57 PM
>>>             *Subject: *[Gluster-users] possible memory leak in
>>>             client/fuse mount
>>>
>>>             Dear Gluster Users,
>>>
>>>             I have a glusterfs process which consumes almost all the memory
>>>             of the machine (~58GB):
>>>
>>>             # ps -faxu|grep 17317
>>>             root     17317  3.1 88.9 59695516 58479708 ?   Ssl
>>>              Oct31 839:36 /usr/sbin/glusterfs --process-name fuse
>>>             --volfile-server=10.201.0.1
>>>             --volfile-server=10.201.0.8:10.201.0.5:10.201.0.6:10.201.0.7:10.201.0.9
>>>             --volfile-id=/docker2 /mnt/docker2
>>>
>>>             The gluster version on this machine is 7.8, but I'm
>>>             currently running a mixed cluster of 6.10 and 7.8, while
>>>             waiting to proceed with the upgrade because of the issue
>>>             mentioned earlier with the self-heal daemon.
>>>
>>>             The affected volume info looks like this:
>>>
>>>             # gluster v info docker2
>>>
>>>             Volume Name: docker2
>>>             Type: Distributed-Replicate
>>>             Volume ID: 4e0670a0-3d00-4360-98bd-3da844cedae5
>>>             Status: Started
>>>             Snapshot Count: 0
>>>             Number of Bricks: 3 x (2 + 1) = 9
>>>             Transport-type: tcp
>>>             Bricks:
>>>             Brick1: 10.201.0.5:/data0/gfs/bricks/brick1/docker2
>>>             Brick2: 10.201.0.9:/data0/gfs/bricks/brick1/docker2
>>>             Brick3: 10.201.0.3:/data0/gfs/bricks/bricka/docker2 (arbiter)
>>>             Brick4: 10.201.0.6:/data0/gfs/bricks/brick1/docker2
>>>             Brick5: 10.201.0.7:/data0/gfs/bricks/brick1/docker2
>>>             Brick6: 10.201.0.4:/data0/gfs/bricks/bricka/docker2 (arbiter)
>>>             Brick7: 10.201.0.1:/data0/gfs/bricks/brick1/docker2
>>>             Brick8: 10.201.0.8:/data0/gfs/bricks/brick1/docker2
>>>             Brick9: 10.201.0.2:/data0/gfs/bricks/bricka/docker2 (arbiter)
>>>             Options Reconfigured:
>>>             performance.cache-size: 128MB
>>>             transport.address-family: inet
>>>             nfs.disable: on
>>>             cluster.brick-multiplex: on
>>>
>>>             The issue seems to be triggered by a program called
>>>             zammad, which has an init process that runs in a loop.
>>>             On each cycle it recompiles the Ruby on Rails application.
>>>
>>>             I've attached 2 statedumps, but as I only recently
>>>             noticed the high memory usage, I believe both
>>>             statedumps already show an escalated state of the
>>>             glusterfs process. If you also need dumps from
>>>             the beginning, let me know. The dumps were taken about an
>>>             hour apart.
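
Two dumps taken some time apart can be compared to see which allocation
types are still growing. A minimal sketch, assuming each memusage section in
a statedump is a bracketed header followed by a num_allocs=<integer> line;
the function names are illustrative only.

```python
import re
import sys

SECTION = re.compile(r"^\[(.+)\]$")
NUM_ALLOCS = re.compile(r"^num_allocs=(\d+)$")


def num_allocs_by_section(text):
    """Return {section header: num_allocs} for one statedump's text."""
    counts, section = {}, None
    for line in text.splitlines():
        line = line.strip()
        header = SECTION.match(line)
        if header:
            section = header.group(1)
            continue
        value = NUM_ALLOCS.match(line)
        if value and section is not None:
            counts[section] = int(value.group(1))
    return counts


def growth(older, newer):
    """Return (delta, section) pairs for sections that grew, largest first."""
    deltas = [(newer[name] - older.get(name, 0), name) for name in newer]
    return sorted((d for d in deltas if d[0] > 0), reverse=True)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python dump_growth.py <older dump> <newer dump>
    with open(sys.argv[1]) as a, open(sys.argv[2]) as b:
        older = num_allocs_by_section(a.read())
        newer = num_allocs_by_section(b.read())
    for delta, name in growth(older, newer)[:10]:
        print(f"+{delta:>11}  {name}")
```

Sections whose count keeps rising between dumps while the workload is
steady are the likeliest candidates for a leak.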
>>>             I've also included the glusterd.log. I couldn't include
>>>             mnt-docker2.log since it's too large; it's
>>>             littered with: " I [MSGID: 109066]
>>>             [dht-rename.c:1951:dht_rename] 0-docker2-dht"
>>>             However, I've inspected the log and it contains no Error
>>>             messages; all are of the Info kind
>>>             and look like these:
>>>             [2020-11-19 03:29:05.406766] I
>>>             [glusterfsd-mgmt.c:2282:mgmt_getspec_cbk] 0-glusterfs:
>>>             No change in volfile,continuing
>>>             [2020-11-19 03:29:21.271886] I
>>>             [socket.c:865:__socket_shutdown] 0-docker2-client-8:
>>>             intentional socket shutdown(5)
>>>             [2020-11-19 03:29:24.479738] I
>>>             [socket.c:865:__socket_shutdown] 0-docker2-client-2:
>>>             intentional socket shutdown(5)
>>>             [2020-11-19 03:30:12.318146] I
>>>             [socket.c:865:__socket_shutdown] 0-docker2-client-5:
>>>             intentional socket shutdown(5)
>>>             [2020-11-19 03:31:27.381720] I
>>>             [socket.c:865:__socket_shutdown] 0-docker2-client-8:
>>>             intentional socket shutdown(5)
>>>             [2020-11-19 03:31:30.579630] I
>>>             [socket.c:865:__socket_shutdown] 0-docker2-client-2:
>>>             intentional socket shutdown(5)
>>>             [2020-11-19 03:32:18.427364] I
>>>             [socket.c:865:__socket_shutdown] 0-docker2-client-5:
>>>             intentional socket shutdown(5)
>>>
>>>             The rename messages look like these;
>>>             [2020-11-19 03:29:05.402663] I [MSGID: 109066]
>>>             [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/95/75f93c20e375c5.tmp.eVcE5D
>>>             (fe083b7e-b0d5-485c-8666-e1f7cdac33e2)
>>>             (hash=docker2-replicate-2/cache=docker2-replicate-2) =>
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/95/75f93c20e375c5
>>>             ((null)) (hash=docker2-replicate-2/cache=<nul>)
>>>             [2020-11-19 03:29:05.410972] I [MSGID: 109066]
>>>             [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/0d/86dd25f3d238ff.tmp.AdDTLu
>>>             (b1edadad-1d48-4bf4-be85-ffbe9d69d338)
>>>             (hash=docker2-replicate-1/cache=docker2-replicate-1) =>
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/0d/86dd25f3d238ff
>>>             ((null)) (hash=docker2-replicate-2/cache=<nul>)
>>>             [2020-11-19 03:29:05.420064] I [MSGID: 109066]
>>>             [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f2/6e44f76b508fd3.tmp.QKmxul
>>>             (31f80fcb-977c-433b-9259-5fdfcad1171c)
>>>             (hash=docker2-replicate-0/cache=docker2-replicate-0) =>
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f2/6e44f76b508fd3
>>>             ((null)) (hash=docker2-replicate-0/cache=<nul>)
>>>             [2020-11-19 03:29:05.427537] I [MSGID: 109066]
>>>             [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/b0/1d7303d9dfe009.tmp.qLUMec
>>>             (e2fdf971-731f-4765-80e8-3165433488ea)
>>>             (hash=docker2-replicate-2/cache=docker2-replicate-2) =>
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/b0/1d7303d9dfe009
>>>             ((null)) (hash=docker2-replicate-1/cache=<nul>)
>>>             [2020-11-19 03:29:05.440576] I [MSGID: 109066]
>>>             [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/bd/952a089e164b36.tmp.4qvl22
>>>             (3e0bc6d1-13ac-47c6-b221-1256b4b506ef)
>>>             (hash=docker2-replicate-2/cache=docker2-replicate-2) =>
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/bd/952a089e164b36
>>>             ((null)) (hash=docker2-replicate-1/cache=<nul>)
>>>             [2020-11-19 03:29:05.452407] I [MSGID: 109066]
>>>             [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/a3/b587dd08f35e2e.tmp.iIweTT
>>>             (9685b5f3-4b14-4050-9b00-1163856239b5)
>>>             (hash=docker2-replicate-1/cache=docker2-replicate-1) =>
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/a3/b587dd08f35e2e
>>>             ((null)) (hash=docker2-replicate-0/cache=<nul>)
>>>             [2020-11-19 03:29:05.460720] I [MSGID: 109066]
>>>             [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/48/89cfb1b971c025.tmp.0W7jMK
>>>             (d0a8d0a4-c783-45db-bb4a-68e24044d830)
>>>             (hash=docker2-replicate-0/cache=docker2-replicate-0) =>
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/48/89cfb1b971c025
>>>             ((null)) (hash=docker2-replicate-1/cache=<nul>)
>>>             [2020-11-19 03:29:05.468800] I [MSGID: 109066]
>>>             [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/d9/759d55e8da66eb.tmp.2yXtHB
>>>             (e5b61ef5-a3c2-4a2c-aa47-c377a6c090d7)
>>>             (hash=docker2-replicate-0/cache=docker2-replicate-0) =>
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/d9/759d55e8da66eb
>>>             ((null)) (hash=docker2-replicate-0/cache=<nul>)
>>>             [2020-11-19 03:29:05.476745] I [MSGID: 109066]
>>>             [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/1c/f3a658342e36b7.tmp.gSkiEs
>>>             (17181a40-f9b2-438f-9dfc-7bb159c516e6)
>>>             (hash=docker2-replicate-2/cache=docker2-replicate-2) =>
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/1c/f3a658342e36b7
>>>             ((null)) (hash=docker2-replicate-0/cache=<nul>)
>>>             [2020-11-19 03:29:05.486729] I [MSGID: 109066]
>>>             [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f1/6bef7cb6446c7a.tmp.sVT0Dj
>>>             (cb6b1d52-b1c0-420c-86b7-2ceb8e8e73db)
>>>             (hash=docker2-replicate-0/cache=docker2-replicate-0) =>
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f1/6bef7cb6446c7a
>>>             ((null)) (hash=docker2-replicate-1/cache=<nul>)
>>>             [2020-11-19 03:29:05.495115] I [MSGID: 109066]
>>>             [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/45/73ba226559961b.tmp.QdPTFa
>>>             (d8450d9e-62a7-4fd5-9dd2-e072e318d9a0)
>>>             (hash=docker2-replicate-0/cache=docker2-replicate-0) =>
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/45/73ba226559961b
>>>             ((null)) (hash=docker2-replicate-1/cache=<nul>)
>>>             [2020-11-19 03:29:05.503424] I [MSGID: 109066]
>>>             [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/13/29c0df35961ca0.tmp.s1xUJ1
>>>             (ffc57a77-8b91-4264-8e2d-a9966f0f37ef)
>>>             (hash=docker2-replicate-1/cache=docker2-replicate-1) =>
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/13/29c0df35961ca0
>>>             ((null)) (hash=docker2-replicate-2/cache=<nul>)
>>>             [2020-11-19 03:29:05.513532] I [MSGID: 109066]
>>>             [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/be/8d6a07b6a0d6ad.tmp.A5DzQS
>>>             (5a595a65-372d-4377-b547-2c4e23f7be3a)
>>>             (hash=docker2-replicate-1/cache=docker2-replicate-1) =>
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/be/8d6a07b6a0d6ad
>>>             ((null)) (hash=docker2-replicate-0/cache=<nul>)
>>>             [2020-11-19 03:29:05.526885] I [MSGID: 109066]
>>>             [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/ec/4208216d993cbe.tmp.IMXg0J
>>>             (2fa99fcd-64f8-4934-aeda-b356816f1132)
>>>             (hash=docker2-replicate-2/cache=docker2-replicate-2) =>
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/ec/4208216d993cbe
>>>             ((null)) (hash=docker2-replicate-2/cache=<nul>)
>>>             [2020-11-19 03:29:05.537637] I [MSGID: 109066]
>>>             [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/57/1527c482cf2d6b.tmp.Y2L0cB
>>>             (db24d7bf-4a06-4356-a52e-1ab9537d1c3a)
>>>             (hash=docker2-replicate-0/cache=docker2-replicate-0) =>
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/57/1527c482cf2d6b
>>>             ((null)) (hash=docker2-replicate-1/cache=<nul>)
>>>             [2020-11-19 03:29:05.547878] I [MSGID: 109066]
>>>             [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/88/1b60ead8d4c4e5.tmp.u47rss
>>>             (b12f041b-5bbd-4e3d-b700-8f673830393f)
>>>             (hash=docker2-replicate-1/cache=docker2-replicate-1) =>
>>>             /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/88/1b60ead8d4c4e5
>>>             ((null)) (hash=docker2-replicate-1/cache=<nul>)
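
For a log too large to read by eye, the "Info only" observation can be
checked by tallying entries per level and per source. A minimal sketch,
assuming each entry sits on one line in the form shown in the samples above
(timestamp, one-letter level, optional MSGID, then [file:line:function]);
the regular expression and function name are illustrative.

```python
import re
import sys
from collections import Counter

# [timestamp] LEVEL [MSGID: n] [file:line:function] ...
ENTRY = re.compile(
    r"^\[\d{4}-\d{2}-\d{2} [\d:.]+\] (?P<level>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?"
    r"\[(?P<origin>[^\]]+)\]"
)


def tally(lines):
    """Count log entries by level and by (msgid, origin) pair."""
    levels, sources = Counter(), Counter()
    for line in lines:
        match = ENTRY.match(line)
        if match:
            levels[match["level"]] += 1
            sources[(match["msgid"], match["origin"])] += 1
    return levels, sources


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python log_tally.py <client log file>
    with open(sys.argv[1], errors="replace") as fh:
        levels, sources = tally(fh)
    print(dict(levels))
    for (msgid, origin), count in sources.most_common(5):
        print(f"{count:>10}  MSGID={msgid}  {origin}")
```

Any level other than "I" in the first line of output would point at entries
worth reading in full.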
>>>
>>>             If I can provide any more information, please let me know.
>>>
>>>             Thanks Olaf
>>>
>>>
>>>             ________
>>>
>>>
>>>
>>>             Community Meeting Calendar:
>>>
>>>             Schedule -
>>>             Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>>>             Bridge: https://meet.google.com/cpu-eiue-hvk
>>>             <https://meet.google.com/cpu-eiue-hvk>
>>>             Gluster-users mailing list
>>>             Gluster-users at gluster.org <mailto:Gluster-users at gluster.org>
>>>             https://lists.gluster.org/mailman/listinfo/gluster-users
>>>             <https://lists.gluster.org/mailman/listinfo/gluster-users>
>>>
>>>
>>