<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p><br>
</p>
<div class="moz-cite-prefix">On 26/11/20 4:00 pm, Olaf Buitelaar
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CACH-y5SNXAFM3MnBg91JQeXTAcUCYtnOQJOpFxWbd=pcdC09xw@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">Hi Ravi,
<div><br>
</div>
          <div>I could try that, but I can only try a setup on VMs, and
            will not be able to set up an environment like our production
            environment, which runs on physical machines and carries
            actual production load. So the two setups would be quite
            different.</div>
          <div>Personally I think it would be best to debug the actual
            machines instead of trying to reproduce it, since reproducing
            the issue on the physical machines is just a matter of
            swapping the repositories and upgrading the packages.</div>
          <div>Let me know what you think.</div>
</div>
</blockquote>
    <p>Physical machines or VMs - anything is fine. The only thing is I
      cannot guarantee quick responses, so if it is a production
      machine, that will be an issue for you. So any setup you can use
      for experimenting is fine. You don't need any clients for the
      testing. Just create a 1x2 replica volume using 2 nodes and start
      it. Then upgrade one node and see if the shd and bricks come up on
      that node.</p>
<p>-Ravi<br>
</p>
<blockquote type="cite"
cite="mid:CACH-y5SNXAFM3MnBg91JQeXTAcUCYtnOQJOpFxWbd=pcdC09xw@mail.gmail.com">
<div dir="ltr">
<div><br>
</div>
<div>Thanks Olaf</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">Op do 26 nov. 2020 om 02:43
schreef Ravishankar N <<a
href="mailto:ravishankar@redhat.com" moz-do-not-send="true">ravishankar@redhat.com</a>>:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 25/11/20 7:17 pm, Olaf Buitelaar wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Hi Ravi,
<div><br>
</div>
            <div>Thanks for checking. Unfortunately this is our
              production system. What I've done is simply change the
              yum repo from gluster-6 to <a
                href="http://mirror.centos.org/centos/$releasever/storage/$basearch/gluster-7/"
                target="_blank" moz-do-not-send="true">http://mirror.centos.org/centos/$releasever/storage/$basearch/gluster-7/</a>,
              then run a yum upgrade. I restarted the glusterd
              process several times and also tried rebooting the
              machine. I didn't touch the op-version yet, which is
              still at 60000; usually I only raise it once all
              nodes are upgraded and running stable.</div>
            <div>We're running multiple volumes with different
              configurations, but the shd does not start for any of
              the volumes on the upgraded nodes.</div>
            <div>Is there anything further I could check or do to get
              to the bottom of this?</div>
</div>
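            <p>For reference, a minimal sketch of that repo swap on one
              node, assuming the CentOS Storage SIG repo file is named
              CentOS-Gluster-6.repo (the file name is a placeholder):</p>
            <pre>
# point the repo at gluster-7 instead of gluster-6
sed -i 's/gluster-6/gluster-7/g' /etc/yum.repos.d/CentOS-Gluster-6.repo
yum clean all && yum upgrade 'glusterfs*'
systemctl restart glusterd

# the cluster op-version stays untouched until all nodes run 7.x
gluster volume get all cluster.op-version
</pre>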
</blockquote>
<p>Hi Olaf, like I said, would it be possible to create a
test setup to see if you can recreate it?<br>
</p>
Regards,<br>
Ravi<br>
<blockquote type="cite">
<div dir="ltr">
<div><br>
</div>
<div>Thanks Olaf</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">Op wo 25 nov. 2020 om
14:14 schreef Ravishankar N <<a
href="mailto:ravishankar@redhat.com" target="_blank"
moz-do-not-send="true">ravishankar@redhat.com</a>>:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px
0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 25/11/20 5:50 pm, Olaf Buitelaar wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Hi Ashish,
<div><br>
</div>
                    <div>Thank you for looking into this. I indeed
                      also suspect it has something to do with the
                      7.X client, because on the 6.X clients the
                      issue doesn't really seem to occur.</div>
                    <div>I would love to update everything to 7.X,
                      but since the self-heal daemons (<a
                        href="https://lists.gluster.org/pipermail/gluster-users/2020-November/038917.html"
                        target="_blank" moz-do-not-send="true">https://lists.gluster.org/pipermail/gluster-users/2020-November/038917.html</a>)
                      won't start, I halted the full upgrade. <br>
</div>
</div>
</blockquote>
                  <p>Olaf, based on your email, I did try upgrading
                    one node of a 3-node replica 3 setup from 6.10 to
                    7.8 on my test VMs, and I found that the self-heal
                    daemon (and the bricks) came online after I
                    restarted glusterd post-upgrade on that node (I
                    did not touch the op-version), so I did not spend
                    further time on it. I therefore don't think the
                    problem is related to the shd mux changes I
                    referred to. But if you have a test setup where
                    you can reproduce this, please raise a github
                    issue with the details.<br>
</p>
Thanks,<br>
Ravi<br>
<blockquote type="cite">
<div dir="ltr">
                    <div>Hopefully that issue will be addressed in
                      the upcoming release. Once I have everything
                      running on the same version, I'll check if the
                      issue still occurs and reach out if that's
                      the case.</div>
<div><br>
</div>
<div>Thanks Olaf</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">Op wo 25 nov.
2020 om 10:42 schreef Ashish Pandey <<a
href="mailto:aspandey@redhat.com"
target="_blank" moz-do-not-send="true">aspandey@redhat.com</a>>:<br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<div>
<div><br>
</div>
<div>Hi,<br>
</div>
<div><br>
</div>
<div>I checked the statedump and found
some very high memory allocations.<br>
</div>
<div>grep -rwn "num_allocs"
glusterdump.17317.dump.1605* | cut -d'='
-f2 | sort</div>
<div><br>
30003616 <br>
30003616 <br>
3305 <br>
3305 <br>
36960008 <br>
36960008 <br>
38029944 <br>
38029944 <br>
38450472 <br>
38450472 <br>
39566824 <br>
39566824 <br>
4 <br>
I did check the lines on statedump and
it could be happening in
protocol/clinet. However, I did not find
anything suspicious in my quick code
exploration.<br>
</div>
                      <div>I would suggest upgrading all the nodes
                        to the latest version, then starting your
                        workload and seeing whether memory usage
                        still grows this high.<br>
                      </div>
</div>
<div>That way it will also be easier to
debug this issue.<br>
</div>
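                      <div>A possible follow-up sketch for narrowing
                        this down, using one of the large counts from
                        the sorted list above (the -B4 context should
                        include the owning section header):</div>
                      <pre>
# show which statedump sections own the largest allocation counts
grep -B4 "num_allocs=36960008" glusterdump.17317.dump.1605*
</pre>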
<div><br>
</div>
<div>---<br>
</div>
<div>Ashish<br>
</div>
<div><br>
</div>
<hr
id="gmail-m_-8005825012383939613gmail-m_-926496664757219733gmail-m_-5447166865019259697zwchr">
<div
style="color:rgb(0,0,0);font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt"><b>From:
</b>"Olaf Buitelaar" <<a
href="mailto:olaf.buitelaar@gmail.com"
target="_blank" moz-do-not-send="true">olaf.buitelaar@gmail.com</a>><br>
<b>To: </b>"gluster-users" <<a
href="mailto:gluster-users@gluster.org"
target="_blank" moz-do-not-send="true">gluster-users@gluster.org</a>><br>
<b>Sent: </b>Thursday, November 19,
2020 10:28:57 PM<br>
<b>Subject: </b>[Gluster-users]
possible memory leak in client/fuse
mount<br>
<div><br>
</div>
<div dir="ltr">Dear Gluster Users,
<div><br>
</div>
                          <div>I have a glusterfs process which
                            is consuming almost all the memory of
                            the machine (~58GB):</div>
<div><br>
</div>
<div># ps -faxu|grep 17317<br>
root 17317 3.1 88.9 59695516
58479708 ? Ssl Oct31 839:36
/usr/sbin/glusterfs --process-name
fuse --volfile-server=10.201.0.1
--volfile-server=10.201.0.8:10.201.0.5:10.201.0.6:10.201.0.7:10.201.0.9
--volfile-id=/docker2 /mnt/docker2<br>
</div>
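                          <div>A small sketch, assuming pid 17317 as
                            above, to confirm the RSS keeps growing
                            over time:</div>
                          <pre>
# log the resident set size (KB) every 10 minutes
while true; do echo "$(date +%F_%T) $(ps -o rss= -p 17317)"; sleep 600; done >> /var/tmp/glusterfs-rss.log
</pre>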
<div><br>
</div>
                          <div>The gluster version on this
                            machine is 7.8, but I'm currently
                            running a mixed cluster of 6.10 and
                            7.8, since the upgrade is on hold due
                            to the self-heal daemon issue
                            mentioned earlier.</div>
<div><br>
</div>
                          <div>The affected volume's info looks
                            like this:</div>
<div><br>
</div>
# gluster v info docker2<br>
<div><br>
</div>
Volume Name: docker2<br>
Type: Distributed-Replicate<br>
Volume ID:
4e0670a0-3d00-4360-98bd-3da844cedae5<br>
Status: Started<br>
Snapshot Count: 0<br>
Number of Bricks: 3 x (2 + 1) = 9<br>
Transport-type: tcp<br>
Bricks:<br>
Brick1:
10.201.0.5:/data0/gfs/bricks/brick1/docker2<br>
Brick2:
10.201.0.9:/data0/gfs/bricks/brick1/docker2<br>
Brick3:
10.201.0.3:/data0/gfs/bricks/bricka/docker2
(arbiter)<br>
Brick4:
10.201.0.6:/data0/gfs/bricks/brick1/docker2<br>
Brick5:
10.201.0.7:/data0/gfs/bricks/brick1/docker2<br>
Brick6:
10.201.0.4:/data0/gfs/bricks/bricka/docker2
(arbiter)<br>
Brick7:
10.201.0.1:/data0/gfs/bricks/brick1/docker2<br>
Brick8:
10.201.0.8:/data0/gfs/bricks/brick1/docker2<br>
Brick9:
10.201.0.2:/data0/gfs/bricks/bricka/docker2
(arbiter)<br>
Options Reconfigured:<br>
performance.cache-size: 128MB<br>
transport.address-family: inet<br>
nfs.disable: on<br>
<div>cluster.brick-multiplex: on</div>
<div><br>
</div>
                          <div>The issue seems to be triggered
                            by a program called zammad, which
                            has an init process that runs in a
                            loop; each cycle it re-compiles the
                            Ruby on Rails application.</div>
<div><br>
</div>
                          <div>I've attached 2 statedumps, but
                            as I only recently noticed the high
                            memory usage, I believe both
                            statedumps already show an escalated
                            state of the glusterfs process. If
                            dumps from a fresh start are needed
                            as well, let me know. The dumps
                            were taken about an hour apart.</div>
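                          <div>For reference, a sketch of how such a
                            statedump can be taken for the fuse
                            client (assuming pid 17317 as above; the
                            dump should land under /var/run/gluster
                            by default):</div>
                          <pre>
# ask the glusterfs client process to write a statedump
kill -USR1 17317
ls /var/run/gluster/glusterdump.17317.dump.*
</pre>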
                          <div>Also, I've included
                            glusterd.log. I couldn't include
                            mnt-docker2.log since it's too
                            large; it's littered with: "I
                            [MSGID: 109066]
                            [dht-rename.c:1951:dht_rename]
                            0-docker2-dht"</div>
                          <div>However, I've inspected the log
                            and it contains no Error messages;
                            all are of the Info kind,</div>
                          <div>which look like these:</div>
<div>[2020-11-19 03:29:05.406766] I
[glusterfsd-mgmt.c:2282:mgmt_getspec_cbk]
0-glusterfs: No change in
volfile,continuing<br>
[2020-11-19 03:29:21.271886] I
[socket.c:865:__socket_shutdown]
0-docker2-client-8: intentional
socket shutdown(5)<br>
[2020-11-19 03:29:24.479738] I
[socket.c:865:__socket_shutdown]
0-docker2-client-2: intentional
socket shutdown(5)<br>
[2020-11-19 03:30:12.318146] I
[socket.c:865:__socket_shutdown]
0-docker2-client-5: intentional
socket shutdown(5)<br>
[2020-11-19 03:31:27.381720] I
[socket.c:865:__socket_shutdown]
0-docker2-client-8: intentional
socket shutdown(5)<br>
[2020-11-19 03:31:30.579630] I
[socket.c:865:__socket_shutdown]
0-docker2-client-2: intentional
socket shutdown(5)<br>
[2020-11-19 03:32:18.427364] I
[socket.c:865:__socket_shutdown]
0-docker2-client-5: intentional
socket shutdown(5)<br>
</div>
<div><br>
</div>
                          <div>The rename messages look like
                            these:</div>
<div>[2020-11-19 03:29:05.402663] I
[MSGID: 109066]
[dht-rename.c:1951:dht_rename]
0-docker2-dht: renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/95/75f93c20e375c5.tmp.eVcE5D
(fe083b7e-b0d5-485c-8666-e1f7cdac33e2)
(hash=docker2-replicate-2/cache=docker2-replicate-2)
=>
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/95/75f93c20e375c5
((null))
(hash=docker2-replicate-2/cache=<nul>)<br>
[2020-11-19 03:29:05.410972] I
[MSGID: 109066]
[dht-rename.c:1951:dht_rename]
0-docker2-dht: renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/0d/86dd25f3d238ff.tmp.AdDTLu
(b1edadad-1d48-4bf4-be85-ffbe9d69d338)
(hash=docker2-replicate-1/cache=docker2-replicate-1)
=>
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/0d/86dd25f3d238ff
((null))
(hash=docker2-replicate-2/cache=<nul>)<br>
[2020-11-19 03:29:05.420064] I
[MSGID: 109066]
[dht-rename.c:1951:dht_rename]
0-docker2-dht: renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f2/6e44f76b508fd3.tmp.QKmxul
(31f80fcb-977c-433b-9259-5fdfcad1171c)
(hash=docker2-replicate-0/cache=docker2-replicate-0)
=>
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f2/6e44f76b508fd3
((null))
(hash=docker2-replicate-0/cache=<nul>)<br>
[2020-11-19 03:29:05.427537] I
[MSGID: 109066]
[dht-rename.c:1951:dht_rename]
0-docker2-dht: renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/b0/1d7303d9dfe009.tmp.qLUMec
(e2fdf971-731f-4765-80e8-3165433488ea)
(hash=docker2-replicate-2/cache=docker2-replicate-2)
=>
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/b0/1d7303d9dfe009
((null))
(hash=docker2-replicate-1/cache=<nul>)<br>
[2020-11-19 03:29:05.440576] I
[MSGID: 109066]
[dht-rename.c:1951:dht_rename]
0-docker2-dht: renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/bd/952a089e164b36.tmp.4qvl22
(3e0bc6d1-13ac-47c6-b221-1256b4b506ef)
(hash=docker2-replicate-2/cache=docker2-replicate-2)
=>
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/bd/952a089e164b36
((null))
(hash=docker2-replicate-1/cache=<nul>)<br>
[2020-11-19 03:29:05.452407] I
[MSGID: 109066]
[dht-rename.c:1951:dht_rename]
0-docker2-dht: renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/a3/b587dd08f35e2e.tmp.iIweTT
(9685b5f3-4b14-4050-9b00-1163856239b5)
(hash=docker2-replicate-1/cache=docker2-replicate-1)
=>
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/a3/b587dd08f35e2e
((null))
(hash=docker2-replicate-0/cache=<nul>)<br>
[2020-11-19 03:29:05.460720] I
[MSGID: 109066]
[dht-rename.c:1951:dht_rename]
0-docker2-dht: renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/48/89cfb1b971c025.tmp.0W7jMK
(d0a8d0a4-c783-45db-bb4a-68e24044d830)
(hash=docker2-replicate-0/cache=docker2-replicate-0)
=>
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/48/89cfb1b971c025
((null))
(hash=docker2-replicate-1/cache=<nul>)<br>
[2020-11-19 03:29:05.468800] I
[MSGID: 109066]
[dht-rename.c:1951:dht_rename]
0-docker2-dht: renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/d9/759d55e8da66eb.tmp.2yXtHB
(e5b61ef5-a3c2-4a2c-aa47-c377a6c090d7)
(hash=docker2-replicate-0/cache=docker2-replicate-0)
=>
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/d9/759d55e8da66eb
((null))
(hash=docker2-replicate-0/cache=<nul>)<br>
[2020-11-19 03:29:05.476745] I
[MSGID: 109066]
[dht-rename.c:1951:dht_rename]
0-docker2-dht: renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/1c/f3a658342e36b7.tmp.gSkiEs
(17181a40-f9b2-438f-9dfc-7bb159c516e6)
(hash=docker2-replicate-2/cache=docker2-replicate-2)
=>
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/1c/f3a658342e36b7
((null))
(hash=docker2-replicate-0/cache=<nul>)<br>
[2020-11-19 03:29:05.486729] I
[MSGID: 109066]
[dht-rename.c:1951:dht_rename]
0-docker2-dht: renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f1/6bef7cb6446c7a.tmp.sVT0Dj
(cb6b1d52-b1c0-420c-86b7-2ceb8e8e73db)
(hash=docker2-replicate-0/cache=docker2-replicate-0)
=>
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f1/6bef7cb6446c7a
((null))
(hash=docker2-replicate-1/cache=<nul>)<br>
[2020-11-19 03:29:05.495115] I
[MSGID: 109066]
[dht-rename.c:1951:dht_rename]
0-docker2-dht: renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/45/73ba226559961b.tmp.QdPTFa
(d8450d9e-62a7-4fd5-9dd2-e072e318d9a0)
(hash=docker2-replicate-0/cache=docker2-replicate-0)
=>
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/45/73ba226559961b
((null))
(hash=docker2-replicate-1/cache=<nul>)<br>
[2020-11-19 03:29:05.503424] I
[MSGID: 109066]
[dht-rename.c:1951:dht_rename]
0-docker2-dht: renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/13/29c0df35961ca0.tmp.s1xUJ1
(ffc57a77-8b91-4264-8e2d-a9966f0f37ef)
(hash=docker2-replicate-1/cache=docker2-replicate-1)
=>
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/13/29c0df35961ca0
((null))
(hash=docker2-replicate-2/cache=<nul>)<br>
[2020-11-19 03:29:05.513532] I
[MSGID: 109066]
[dht-rename.c:1951:dht_rename]
0-docker2-dht: renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/be/8d6a07b6a0d6ad.tmp.A5DzQS
(5a595a65-372d-4377-b547-2c4e23f7be3a)
(hash=docker2-replicate-1/cache=docker2-replicate-1)
=>
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/be/8d6a07b6a0d6ad
((null))
(hash=docker2-replicate-0/cache=<nul>)<br>
[2020-11-19 03:29:05.526885] I
[MSGID: 109066]
[dht-rename.c:1951:dht_rename]
0-docker2-dht: renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/ec/4208216d993cbe.tmp.IMXg0J
(2fa99fcd-64f8-4934-aeda-b356816f1132)
(hash=docker2-replicate-2/cache=docker2-replicate-2)
=>
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/ec/4208216d993cbe
((null))
(hash=docker2-replicate-2/cache=<nul>)<br>
[2020-11-19 03:29:05.537637] I
[MSGID: 109066]
[dht-rename.c:1951:dht_rename]
0-docker2-dht: renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/57/1527c482cf2d6b.tmp.Y2L0cB
(db24d7bf-4a06-4356-a52e-1ab9537d1c3a)
(hash=docker2-replicate-0/cache=docker2-replicate-0)
=>
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/57/1527c482cf2d6b
((null))
(hash=docker2-replicate-1/cache=<nul>)<br>
[2020-11-19 03:29:05.547878] I
[MSGID: 109066]
[dht-rename.c:1951:dht_rename]
0-docker2-dht: renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/88/1b60ead8d4c4e5.tmp.u47rss
(b12f041b-5bbd-4e3d-b700-8f673830393f)
(hash=docker2-replicate-1/cache=docker2-replicate-1)
=>
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/88/1b60ead8d4c4e5
((null))
(hash=docker2-replicate-1/cache=<nul>)<br>
</div>
<div><br>
</div>
                          <div>If I can provide any more
                            information, please let me know.</div>
<div><br>
</div>
<div>Thanks Olaf</div>
<div><br>
</div>
</div>
</div>
<div><br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
<fieldset></fieldset>
<pre>________
Community Meeting Calendar:
Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: <a href="https://meet.google.com/cpu-eiue-hvk" target="_blank" moz-do-not-send="true">https://meet.google.com/cpu-eiue-hvk</a>
Gluster-users mailing list
<a href="mailto:Gluster-users@gluster.org" target="_blank" moz-do-not-send="true">Gluster-users@gluster.org</a>
<a href="https://lists.gluster.org/mailman/listinfo/gluster-users" target="_blank" moz-do-not-send="true">https://lists.gluster.org/mailman/listinfo/gluster-users</a>
</pre>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</body>
</html>