[Gluster-users] possible memory leak in client/fuse mount

Olaf Buitelaar olaf.buitelaar at gmail.com
Thu Nov 26 10:30:04 UTC 2020


Hi Ravi,

I could try that, but I can only build a test setup on VMs, and I won't be
able to set up an environment like our production one, which runs on
physical machines and carries actual production load. So the two setups
would be quite different.
Personally I think it would be best to debug the actual machines instead of
trying to reproduce it elsewhere, since reproducing the issue on the
physical machines only takes swapping the repositories and upgrading the
packages.
Let me know what you think.

Thanks Olaf

On Thu, 26 Nov 2020 at 02:43, Ravishankar N <ravishankar at redhat.com> wrote:

>
> On 25/11/20 7:17 pm, Olaf Buitelaar wrote:
>
> Hi Ravi,
>
> Thanks for checking. Unfortunately this is our production system. What
> I've done is simply change the yum repo from gluster-6 to
> http://mirror.centos.org/centos/$releasever/storage/$basearch/gluster-7/
> and run a yum upgrade. I restarted the glusterd process several times and
> also tried rebooting the machine. I haven't touched the op-version yet,
> which is still at 60000; I usually only bump it once all nodes are
> upgraded and running stably.
> We're running multiple volumes with different configurations, but the shd
> doesn't start on the upgraded nodes for any of them.
> Is there anything further I could check or do to get to the bottom of this?
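One way to see at a glance which nodes are missing the self-heal daemon is to filter the output of `gluster volume status`. The sketch below is only an illustration: the function name `shd_offline` is made up, and it assumes the usual table layout in which each daemon row starts with `Self-heal Daemon on <host>` and ends with the Online flag followed by the Pid.

```shell
# shd_offline: read `gluster volume status` output on stdin and print the host
# of every self-heal daemon row whose Online column is not "Y".
# Assumed row layout: Self-heal Daemon on <host>  <port>  <rdma>  <Y|N>  <pid>
shd_offline() {
    awk '/^Self-heal Daemon on / && $(NF-1) != "Y" { print $4 }'
}
```

It could be used as `gluster volume status | shd_offline`; empty output would mean every listed shd reports as online.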
>
> Hi Olaf, like I said, would it be possible to create a test setup to see
> if you can recreate it?
> Regards,
> Ravi
>
>
> Thanks Olaf
>
> On Wed, 25 Nov 2020 at 14:14, Ravishankar N <ravishankar at redhat.com>
> wrote:
>
>>
>> On 25/11/20 5:50 pm, Olaf Buitelaar wrote:
>>
>> Hi Ashish,
>>
>> Thank you for looking into this. I also suspect it has something
>> to do with the 7.X client, because on the 6.X clients the issue doesn't
>> really seem to occur.
>> I would love to update everything to 7.X, but since the self-heal daemons
>> (https://lists.gluster.org/pipermail/gluster-users/2020-November/038917.html)
>> won't start, I halted the full upgrade.
>>
>> Olaf, based on your email, I tried upgrading one node of a 3-node
>> replica 3 setup from 6.10 to 7.8 on my test VMs, and I found that the
>> self-heal daemon (and the bricks) came online after I restarted glusterd
>> post-upgrade on that node. I did not touch the op-version, and I did not
>> spend further time on it. So I don't think the problem is related to the
>> shd mux changes I referred to. But if you have a test setup where you can
>> reproduce this, please raise a GitHub issue with the details.
>> Thanks,
>> Ravi
>>
>> Hopefully that issue will be addressed in the upcoming release. Once I
>> have everything running on the same version, I'll check whether the issue
>> still occurs and reach out if it does.
>>
>> Thanks Olaf
>>
>> On Wed, 25 Nov 2020 at 10:42, Ashish Pandey <aspandey at redhat.com> wrote:
>>
>>>
>>> Hi,
>>>
>>> I checked the statedump and found some very high memory allocations.
>>> grep -rwn "num_allocs" glusterdump.17317.dump.1605* | cut -d'=' -f2 |
>>> sort
>>>
>>> 30003616
>>> 30003616
>>> 3305
>>> 3305
>>> 36960008
>>> 36960008
>>> 38029944
>>> 38029944
>>> 38450472
>>> 38450472
>>> 39566824
>>> 39566824
>>> 4
>>>
>>> I checked the corresponding lines in the statedump, and the allocations
>>> could be happening in protocol/client. However, I did not find anything
>>> suspicious in my quick look through the code.
>>> I would suggest upgrading all the nodes to the latest version, then
>>> starting your workload and checking whether memory usage is still high.
>>> That way it will also be easier to debug this issue.
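The grep/cut/sort pipeline quoted above uses plain `sort`, which orders the counts as text (that is why 3305 lands between 30003616 and 36960008), and it drops the section each count belongs to. A minimal sketch that keeps both, assuming the statedump layout of a bracketed section header followed by `key=value` lines such as `num_allocs=`; the function name `top_allocs` is made up for illustration:

```shell
# top_allocs FILE [N]: print the N largest num_allocs values in a statedump,
# each followed by the section header it was found under (default N=10).
# Assumes sections start with a "[...]" line and counters are "key=value" lines.
top_allocs() {
    awk -F= '
        /^\[/          { section = $0 }            # remember the current section header
        /^num_allocs=/ { print $2 "\t" section }   # count, then the owning section
    ' "$1" | sort -k1,1nr | head -n "${2:-10}"
}
```

Called as `top_allocs glusterdump.17317.dump.<timestamp> 5`, it would list the five sections with the most live allocations, which narrows down which translator and memory type to look at.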
>>>
>>> ---
>>> Ashish
>>>
>>> ------------------------------
>>> *From: *"Olaf Buitelaar" <olaf.buitelaar at gmail.com>
>>> *To: *"gluster-users" <gluster-users at gluster.org>
>>> *Sent: *Thursday, November 19, 2020 10:28:57 PM
>>> *Subject: *[Gluster-users] possible memory leak in client/fuse mount
>>>
>>> Dear Gluster Users,
>>>
>>> I have a glusterfs process which consumes nearly all the memory of the
>>> machine (~58GB):
>>>
>>> # ps -faxu|grep 17317
>>> root     17317  3.1 88.9 59695516 58479708 ?   Ssl  Oct31 839:36
>>> /usr/sbin/glusterfs --process-name fuse --volfile-server=10.201.0.1
>>> --volfile-server=10.201.0.8:10.201.0.5:10.201.0.6:10.201.0.7:10.201.0.9
>>> --volfile-id=/docker2 /mnt/docker2
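For reference, in `ps aux` output the second column is the PID and the sixth is the resident set size in KiB. A small sketch that converts that column to GiB (the helper name `rss_gib` is made up for illustration):

```shell
# rss_gib: read `ps aux`-style lines on stdin and print PID and RSS in GiB.
# Column 2 is the PID and column 6 is the RSS in KiB; 1 GiB = 1048576 KiB.
rss_gib() {
    awk '{ printf "%s %.1f GiB\n", $2, $6 / 1048576 }'
}
```

Fed the line above, this prints `17317 55.8 GiB` (about 59.9 GB in decimal units). To watch the process over time it could be run periodically as `ps aux | grep '[g]lusterfs --process-name fuse' | rss_gib`.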
>>>
>>> The gluster version on this machine is 7.8, but I'm currently running a
>>> mixed cluster of 6.10 and 7.8; the rest of the upgrade is on hold because
>>> of the self-heal daemon issue mentioned earlier.
>>>
>>> The affected volume info looks like this:
>>>
>>> # gluster v info docker2
>>>
>>> Volume Name: docker2
>>> Type: Distributed-Replicate
>>> Volume ID: 4e0670a0-3d00-4360-98bd-3da844cedae5
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 3 x (2 + 1) = 9
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: 10.201.0.5:/data0/gfs/bricks/brick1/docker2
>>> Brick2: 10.201.0.9:/data0/gfs/bricks/brick1/docker2
>>> Brick3: 10.201.0.3:/data0/gfs/bricks/bricka/docker2 (arbiter)
>>> Brick4: 10.201.0.6:/data0/gfs/bricks/brick1/docker2
>>> Brick5: 10.201.0.7:/data0/gfs/bricks/brick1/docker2
>>> Brick6: 10.201.0.4:/data0/gfs/bricks/bricka/docker2 (arbiter)
>>> Brick7: 10.201.0.1:/data0/gfs/bricks/brick1/docker2
>>> Brick8: 10.201.0.8:/data0/gfs/bricks/brick1/docker2
>>> Brick9: 10.201.0.2:/data0/gfs/bricks/bricka/docker2 (arbiter)
>>> Options Reconfigured:
>>> performance.cache-size: 128MB
>>> transport.address-family: inet
>>> nfs.disable: on
>>> cluster.brick-multiplex: on
>>>
>>> The issue seems to be triggered by a program called zammad, which has an
>>> init process that runs in a loop; on each cycle it re-compiles the
>>> Ruby on Rails application.
>>>
>>> I've attached 2 statedumps, but as I only recently noticed the high
>>> memory usage, I believe both statedumps already show an escalated state of
>>> the glusterfs process. If dumps from the beginning are also needed,
>>> let me know. The dumps were taken about an hour apart.
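With two statedumps taken some time apart, the sections whose allocation count keeps growing are usually the interesting ones. A rough sketch of such a comparison, assuming the same statedump layout as elsewhere in this thread (bracketed section headers followed by `num_allocs=` lines); the function name `alloc_growth` is hypothetical:

```shell
# alloc_growth OLD NEW: list sections whose num_allocs grew between two dumps,
# largest increase first. Output columns: delta, old count, new count, section.
alloc_growth() {
    awk -F= '
        /^\[/          { section = $0; next }      # track the current section header
        /^num_allocs=/ {
            count = $2 + 0                          # force a numeric value
            if (FNR == NR) { old[section] = count; next }   # first file: baseline
            if (count > old[section])               # second file: report growth only
                printf "%d\t%d\t%d\t%s\n", count - old[section], old[section], count, section
        }
    ' "$1" "$2" | sort -k1,1nr
}
```

For example, `alloc_growth dump.earlier dump.later | head` would show the ten fastest-growing sections. A section that only exists in the later dump is reported with an old count of 0.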
>>> I've also included the glusterd.log. I couldn't include mnt-docker2.log
>>> because it's too large; it's littered with: " I [MSGID: 109066]
>>> [dht-rename.c:1951:dht_rename] 0-docker2-dht"
>>> However, I've inspected the log and it contains no Error messages; all
>>> are of the Info kind,
>>> and look like these:
>>> [2020-11-19 03:29:05.406766] I [glusterfsd-mgmt.c:2282:mgmt_getspec_cbk]
>>> 0-glusterfs: No change in volfile,continuing
>>> [2020-11-19 03:29:21.271886] I [socket.c:865:__socket_shutdown]
>>> 0-docker2-client-8: intentional socket shutdown(5)
>>> [2020-11-19 03:29:24.479738] I [socket.c:865:__socket_shutdown]
>>> 0-docker2-client-2: intentional socket shutdown(5)
>>> [2020-11-19 03:30:12.318146] I [socket.c:865:__socket_shutdown]
>>> 0-docker2-client-5: intentional socket shutdown(5)
>>> [2020-11-19 03:31:27.381720] I [socket.c:865:__socket_shutdown]
>>> 0-docker2-client-8: intentional socket shutdown(5)
>>> [2020-11-19 03:31:30.579630] I [socket.c:865:__socket_shutdown]
>>> 0-docker2-client-2: intentional socket shutdown(5)
>>> [2020-11-19 03:32:18.427364] I [socket.c:865:__socket_shutdown]
>>> 0-docker2-client-5: intentional socket shutdown(5)
>>>
>>> The rename messages look like these;
>>> [2020-11-19 03:29:05.402663] I [MSGID: 109066]
>>> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/95/75f93c20e375c5.tmp.eVcE5D
>>> (fe083b7e-b0d5-485c-8666-e1f7cdac33e2)
>>> (hash=docker2-replicate-2/cache=docker2-replicate-2) =>
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/95/75f93c20e375c5
>>> ((null)) (hash=docker2-replicate-2/cache=<nul>)
>>> [2020-11-19 03:29:05.410972] I [MSGID: 109066]
>>> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/0d/86dd25f3d238ff.tmp.AdDTLu
>>> (b1edadad-1d48-4bf4-be85-ffbe9d69d338)
>>> (hash=docker2-replicate-1/cache=docker2-replicate-1) =>
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/0d/86dd25f3d238ff
>>> ((null)) (hash=docker2-replicate-2/cache=<nul>)
>>> [2020-11-19 03:29:05.420064] I [MSGID: 109066]
>>> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f2/6e44f76b508fd3.tmp.QKmxul
>>> (31f80fcb-977c-433b-9259-5fdfcad1171c)
>>> (hash=docker2-replicate-0/cache=docker2-replicate-0) =>
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f2/6e44f76b508fd3
>>> ((null)) (hash=docker2-replicate-0/cache=<nul>)
>>> [2020-11-19 03:29:05.427537] I [MSGID: 109066]
>>> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/b0/1d7303d9dfe009.tmp.qLUMec
>>> (e2fdf971-731f-4765-80e8-3165433488ea)
>>> (hash=docker2-replicate-2/cache=docker2-replicate-2) =>
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/b0/1d7303d9dfe009
>>> ((null)) (hash=docker2-replicate-1/cache=<nul>)
>>> [2020-11-19 03:29:05.440576] I [MSGID: 109066]
>>> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/bd/952a089e164b36.tmp.4qvl22
>>> (3e0bc6d1-13ac-47c6-b221-1256b4b506ef)
>>> (hash=docker2-replicate-2/cache=docker2-replicate-2) =>
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/bd/952a089e164b36
>>> ((null)) (hash=docker2-replicate-1/cache=<nul>)
>>> [2020-11-19 03:29:05.452407] I [MSGID: 109066]
>>> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/a3/b587dd08f35e2e.tmp.iIweTT
>>> (9685b5f3-4b14-4050-9b00-1163856239b5)
>>> (hash=docker2-replicate-1/cache=docker2-replicate-1) =>
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/a3/b587dd08f35e2e
>>> ((null)) (hash=docker2-replicate-0/cache=<nul>)
>>> [2020-11-19 03:29:05.460720] I [MSGID: 109066]
>>> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/48/89cfb1b971c025.tmp.0W7jMK
>>> (d0a8d0a4-c783-45db-bb4a-68e24044d830)
>>> (hash=docker2-replicate-0/cache=docker2-replicate-0) =>
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/48/89cfb1b971c025
>>> ((null)) (hash=docker2-replicate-1/cache=<nul>)
>>> [2020-11-19 03:29:05.468800] I [MSGID: 109066]
>>> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/d9/759d55e8da66eb.tmp.2yXtHB
>>> (e5b61ef5-a3c2-4a2c-aa47-c377a6c090d7)
>>> (hash=docker2-replicate-0/cache=docker2-replicate-0) =>
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/d9/759d55e8da66eb
>>> ((null)) (hash=docker2-replicate-0/cache=<nul>)
>>> [2020-11-19 03:29:05.476745] I [MSGID: 109066]
>>> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/1c/f3a658342e36b7.tmp.gSkiEs
>>> (17181a40-f9b2-438f-9dfc-7bb159c516e6)
>>> (hash=docker2-replicate-2/cache=docker2-replicate-2) =>
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/1c/f3a658342e36b7
>>> ((null)) (hash=docker2-replicate-0/cache=<nul>)
>>> [2020-11-19 03:29:05.486729] I [MSGID: 109066]
>>> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f1/6bef7cb6446c7a.tmp.sVT0Dj
>>> (cb6b1d52-b1c0-420c-86b7-2ceb8e8e73db)
>>> (hash=docker2-replicate-0/cache=docker2-replicate-0) =>
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f1/6bef7cb6446c7a
>>> ((null)) (hash=docker2-replicate-1/cache=<nul>)
>>> [2020-11-19 03:29:05.495115] I [MSGID: 109066]
>>> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/45/73ba226559961b.tmp.QdPTFa
>>> (d8450d9e-62a7-4fd5-9dd2-e072e318d9a0)
>>> (hash=docker2-replicate-0/cache=docker2-replicate-0) =>
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/45/73ba226559961b
>>> ((null)) (hash=docker2-replicate-1/cache=<nul>)
>>> [2020-11-19 03:29:05.503424] I [MSGID: 109066]
>>> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/13/29c0df35961ca0.tmp.s1xUJ1
>>> (ffc57a77-8b91-4264-8e2d-a9966f0f37ef)
>>> (hash=docker2-replicate-1/cache=docker2-replicate-1) =>
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/13/29c0df35961ca0
>>> ((null)) (hash=docker2-replicate-2/cache=<nul>)
>>> [2020-11-19 03:29:05.513532] I [MSGID: 109066]
>>> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/be/8d6a07b6a0d6ad.tmp.A5DzQS
>>> (5a595a65-372d-4377-b547-2c4e23f7be3a)
>>> (hash=docker2-replicate-1/cache=docker2-replicate-1) =>
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/be/8d6a07b6a0d6ad
>>> ((null)) (hash=docker2-replicate-0/cache=<nul>)
>>> [2020-11-19 03:29:05.526885] I [MSGID: 109066]
>>> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/ec/4208216d993cbe.tmp.IMXg0J
>>> (2fa99fcd-64f8-4934-aeda-b356816f1132)
>>> (hash=docker2-replicate-2/cache=docker2-replicate-2) =>
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/ec/4208216d993cbe
>>> ((null)) (hash=docker2-replicate-2/cache=<nul>)
>>> [2020-11-19 03:29:05.537637] I [MSGID: 109066]
>>> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/57/1527c482cf2d6b.tmp.Y2L0cB
>>> (db24d7bf-4a06-4356-a52e-1ab9537d1c3a)
>>> (hash=docker2-replicate-0/cache=docker2-replicate-0) =>
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/57/1527c482cf2d6b
>>> ((null)) (hash=docker2-replicate-1/cache=<nul>)
>>> [2020-11-19 03:29:05.547878] I [MSGID: 109066]
>>> [dht-rename.c:1951:dht_rename] 0-docker2-dht: renaming
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/88/1b60ead8d4c4e5.tmp.u47rss
>>> (b12f041b-5bbd-4e3d-b700-8f673830393f)
>>> (hash=docker2-replicate-1/cache=docker2-replicate-1) =>
>>> /corporate/zammad/tmp/init/cache/bootsnap-compile-cache/88/1b60ead8d4c4e5
>>> ((null)) (hash=docker2-replicate-1/cache=<nul>)
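When a client log is too large to attach, a per-source tally can stand in for it. The sketch below counts lines by severity letter and source location; it assumes each entry is a single line in the log file (the entries above are wrapped by the mail client) in the form `[timestamp] SEVERITY [MSGID: n] [file.c:line:func] ...`, with the MSGID part optional. The function name `log_summary` is made up.

```shell
# log_summary FILE: count log lines per severity and source location,
# e.g. "I [dht-rename.c:1951:dht_rename]", most frequent first.
log_summary() {
    awk '
        /^\[/ {
            loc = ($4 == "[MSGID:") ? $6 : $4   # skip the optional "[MSGID: n]" pair
            count[$3 " " loc]++                 # key: severity + source location
        }
        END { for (key in count) print count[key] "\t" key }
    ' "$1" | sort -k1,1nr
}
```

Running `log_summary /var/log/glusterfs/mnt-docker2.log` (path assumed from the mount point) would show how much of the file is dht_rename noise, and whether any W or E lines are hiding among the Info messages.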
>>>
>>> If I can provide any more information, please let me know.
>>>
>>> Thanks Olaf
>>>
>>>
>>> ________
>>>
>>>
>>>
>>> Community Meeting Calendar:
>>>
>>> Schedule -
>>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>>> Bridge: https://meet.google.com/cpu-eiue-hvk
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>>