<div dir="ltr">Hi Ravi,<div><br></div><div>Thanks for checking. Unfortunately this is our production system, what i&#39;ve done is simple change the yum repo from gluter-6 to <a href="http://mirror.centos.org/centos/$releasever/storage/$basearch/gluster-7/">http://mirror.centos.org/centos/$releasever/storage/$basearch/gluster-7/</a>. Did a yum upgrade. And did restart the glusterd process several times, i&#39;ve also tried rebooting the machine. And didn&#39;t touch the op-version yet, which is still at (60000), usually i only do this when all nodes are upgraded, and are running stable.</div><div>We&#39;re running multiple volumes with different configurations, but for none of the volumes the shd starts on the upgraded nodes.</div><div>Is there anything further i could check/do to get to the bottom of this?</div><div><br></div><div>Thanks Olaf</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Op wo 25 nov. 2020 om 14:14 schreef Ravishankar N &lt;<a href="mailto:ravishankar@redhat.com">ravishankar@redhat.com</a>&gt;:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div>
    <p><br>
    </p>
    <div>On 25/11/20 5:50 pm, Olaf Buitelaar
      wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="ltr">Hi Ashish,
        <div><br>
        </div>
        <div>Thank you for looking into this. I also suspect it
          has something to do with the 7.x client, because on the 6.x
          clients the issue doesn&#39;t really seem to occur.</div>
        <div>I would love to update everything to 7.x, but since the
          self-heal daemons (<a href="https://lists.gluster.org/pipermail/gluster-users/2020-November/038917.html" target="_blank">https://lists.gluster.org/pipermail/gluster-users/2020-November/038917.html</a>)
          won&#39;t start, I halted the full upgrade.<br>
        </div>
      </div>
    </blockquote>
    <p>Olaf, based on your email, I tried upgrading one node of a
      3-node replica 3 setup from 6.10 to 7.8 on my test VMs, and I found
      that the self-heal daemon (and the bricks) came online after I
      restarted glusterd post-upgrade on that node (I did not touch the
      op-version), so I did not spend further time on it. I therefore
      don&#39;t think the problem is related to the shd mux changes I
      referred to. But if you have a test setup where you can reproduce
      this, please raise a github issue with the details; a sketch of
      what would be worth capturing is below.<br>
    </p>
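    <p>A sketch of what might be worth capturing from the upgraded node
      for such an issue (assuming the default log location under
      /var/log/glusterfs/):</p>
    <pre>gluster --version
gluster volume status                 # does it list the Self-heal Daemon as online on this node?
ps aux | grep glustershd              # is the shd process running at all?
tail -n 100 /var/log/glusterfs/glustershd.log
tail -n 100 /var/log/glusterfs/glusterd.log</pre>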
    Thanks,<br>
    Ravi<br>
    <blockquote type="cite">
      <div dir="ltr">
        <div>Hopefully that issue will be addressed in the upcoming
          release. Once I have everything running on the same version,
          I&#39;ll check whether the issue still occurs and reach out if
          that&#39;s the case.</div>
        <div><br>
        </div>
        <div>Thanks, Olaf</div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">Op wo 25 nov. 2020 om 10:42
          schreef Ashish Pandey &lt;<a href="mailto:aspandey@redhat.com" target="_blank">aspandey@redhat.com</a>&gt;:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
          <div>
            <div>
              <div><br>
              </div>
              <div>Hi,<br>
              </div>
              <div><br>
              </div>
              <div>I checked the statedump and found some very high
                memory allocation counts.<br>
              </div>
              <div>grep -rwn &quot;num_allocs&quot; glusterdump.17317.dump.1605* |
                cut -d&#39;=&#39; -f2 | sort</div>
              <div><br>
                30003616 <br>
                30003616 <br>
                3305 <br>
                3305 <br>
                36960008 <br>
                36960008 <br>
                38029944 <br>
                38029944 <br>
                38450472 <br>
                38450472 <br>
                39566824 <br>
                39566824 <br>
                4 <br>
                I checked those lines in the statedump; it could be
                happening in protocol/client. However, I did not find
                anything suspicious in my quick code exploration.<br>
              </div>
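              <div>To see which xlator and usage-type those big counts
                belong to, something like this should work (a sketch; it
                assumes the usual statedump layout, where the num_allocs=
                lines sit a few lines below a [component - usage-type ...
                memusage] header):<br>
              </div>
              <pre># show each huge num_allocs line together with the memusage header above it
grep -B5 -E 'num_allocs=3[0-9]{7}$' glusterdump.17317.dump.1605* | grep -E 'usage-type|num_allocs'</pre>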
              <div>I would suggest upgrading all the nodes to the latest
                version, then starting your workload again and watching
                whether memory usage climbs that high again (a simple way
                to track it is sketched below).<br>
              </div>
              <div>That way it will also be easier to debug this issue.<br>
              </div>
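              <div>To watch for growth over time, logging the RSS of the
                fuse mount process periodically should be enough (a
                sketch; substitute the actual PID of the mount process):<br>
              </div>
              <pre># append a timestamped RSS sample (in KiB) every 10 minutes
while true; do
    echo "$(date '+%F %T') $(ps -o rss= -p 17317)" &gt;&gt; /var/log/glusterfs-rss.log
    sleep 600
done</pre>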
              <div><br>
              </div>
              <div>---<br>
              </div>
              <div>Ashish<br>
              </div>
              <div><br>
              </div>
              <hr id="gmail-m_-926496664757219733gmail-m_-5447166865019259697zwchr">
              <div style="color:rgb(0,0,0);font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt"><b>From:
                </b>&quot;Olaf Buitelaar&quot; &lt;<a href="mailto:olaf.buitelaar@gmail.com" target="_blank">olaf.buitelaar@gmail.com</a>&gt;<br>
                <b>To: </b>&quot;gluster-users&quot; &lt;<a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a>&gt;<br>
                <b>Sent: </b>Thursday, November 19, 2020 10:28:57 PM<br>
                <b>Subject: </b>[Gluster-users] possible memory leak in
                client/fuse mount<br>
                <div><br>
                </div>
                <div dir="ltr">Dear Gluster Users,
                  <div><br>
                  </div>
                  <div>I have a glusterfs process which consumes nearly
                    all the memory of the machine (~58GB):</div>
                  <div><br>
                  </div>
                  <div># ps -faxu|grep 17317<br>
                    root     17317  3.1 88.9 59695516 58479708 ?   Ssl
                     Oct31 839:36 /usr/sbin/glusterfs --process-name
                    fuse --volfile-server=10.201.0.1
                    --volfile-server=10.201.0.8:10.201.0.5:10.201.0.6:10.201.0.7:10.201.0.9
                    --volfile-id=/docker2 /mnt/docker2<br>
                  </div>
                  <div><br>
                  </div>
                  <div>The gluster version on this machine is 7.8, but
                    I&#39;m currently running a mixed cluster of 6.10 and
                    7.8, as the upgrade is on hold because of the
                    self-heal daemon issue mentioned earlier.</div>
                  <div><br>
                  </div>
                  <div>The affected volume info looks like this:</div>
                  <div><br>
                  </div>
                  # gluster v info docker2<br>
                  <div><br>
                  </div>
                  Volume Name: docker2<br>
                  Type: Distributed-Replicate<br>
                  Volume ID: 4e0670a0-3d00-4360-98bd-3da844cedae5<br>
                  Status: Started<br>
                  Snapshot Count: 0<br>
                  Number of Bricks: 3 x (2 + 1) = 9<br>
                  Transport-type: tcp<br>
                  Bricks:<br>
                  Brick1: 10.201.0.5:/data0/gfs/bricks/brick1/docker2<br>
                  Brick2: 10.201.0.9:/data0/gfs/bricks/brick1/docker2<br>
                  Brick3: 10.201.0.3:/data0/gfs/bricks/bricka/docker2
                  (arbiter)<br>
                  Brick4: 10.201.0.6:/data0/gfs/bricks/brick1/docker2<br>
                  Brick5: 10.201.0.7:/data0/gfs/bricks/brick1/docker2<br>
                  Brick6: 10.201.0.4:/data0/gfs/bricks/bricka/docker2
                  (arbiter)<br>
                  Brick7: 10.201.0.1:/data0/gfs/bricks/brick1/docker2<br>
                  Brick8: 10.201.0.8:/data0/gfs/bricks/brick1/docker2<br>
                  Brick9: 10.201.0.2:/data0/gfs/bricks/bricka/docker2
                  (arbiter)<br>
                  Options Reconfigured:<br>
                  performance.cache-size: 128MB<br>
                  transport.address-family: inet<br>
                  nfs.disable: on<br>
                  <div>cluster.brick-multiplex: on</div>
                  <div><br>
                  </div>
                  <div>The issue seems to be triggered by a program
                    called zammad, which has an init process that runs
                    in a loop; on each cycle it re-compiles the
                    ruby-on-rails application.</div>
                  <div><br>
                  </div>
                  <div>I&#39;ve attached 2 statedumps, but as I only
                    noticed the high memory usage recently, I believe
                    both statedumps already show an escalated state of
                    the glusterfs process. If you also need dumps from
                    the beginning, let me know. The dumps were taken
                    about an hour apart.</div>
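                  <div>(For reference, I generated the dumps roughly like
                    this; a sketch, assuming the default statedump output
                    directory of /var/run/gluster:)</div>
                  <pre># ask the fuse client to write a statedump of itself
kill -USR1 17317
# the dump should appear as glusterdump.&lt;pid&gt;.dump.&lt;timestamp&gt;
ls /var/run/gluster/glusterdump.17317.dump.*</pre>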
                  <div>Also, I&#39;ve included the glusterd.log. I
                    couldn&#39;t include mnt-docker2.log since it&#39;s too
                    large, because it&#39;s littered with &quot;I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht&quot;
                    entries.</div>
                  <div>However, I&#39;ve inspected the log and it contains
                    no Error messages; all are of the Info kind,</div>
                  <div>and look like these:</div>
                  <div>[2020-11-19 03:29:05.406766] I
                    [glusterfsd-mgmt.c:2282:mgmt_getspec_cbk]
                    0-glusterfs: No change in volfile,continuing<br>
                    [2020-11-19 03:29:21.271886] I
                    [socket.c:865:__socket_shutdown] 0-docker2-client-8:
                    intentional socket shutdown(5)<br>
                    [2020-11-19 03:29:24.479738] I
                    [socket.c:865:__socket_shutdown] 0-docker2-client-2:
                    intentional socket shutdown(5)<br>
                    [2020-11-19 03:30:12.318146] I
                    [socket.c:865:__socket_shutdown] 0-docker2-client-5:
                    intentional socket shutdown(5)<br>
                    [2020-11-19 03:31:27.381720] I
                    [socket.c:865:__socket_shutdown] 0-docker2-client-8:
                    intentional socket shutdown(5)<br>
                    [2020-11-19 03:31:30.579630] I
                    [socket.c:865:__socket_shutdown] 0-docker2-client-2:
                    intentional socket shutdown(5)<br>
                    [2020-11-19 03:32:18.427364] I
                    [socket.c:865:__socket_shutdown] 0-docker2-client-5:
                    intentional socket shutdown(5)<br>
                  </div>
                  <div><br>
                  </div>
                  <div>The rename messages look like these:</div>
                  <div>[2020-11-19 03:29:05.402663] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/95/75f93c20e375c5.tmp.eVcE5D
                    (fe083b7e-b0d5-485c-8666-e1f7cdac33e2)
                    (hash=docker2-replicate-2/cache=docker2-replicate-2)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/95/75f93c20e375c5
                    ((null))
                    (hash=docker2-replicate-2/cache=&lt;nul&gt;)<br>
                    [2020-11-19 03:29:05.410972] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/0d/86dd25f3d238ff.tmp.AdDTLu
                    (b1edadad-1d48-4bf4-be85-ffbe9d69d338)
                    (hash=docker2-replicate-1/cache=docker2-replicate-1)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/0d/86dd25f3d238ff
                    ((null))
                    (hash=docker2-replicate-2/cache=&lt;nul&gt;)<br>
                    [2020-11-19 03:29:05.420064] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f2/6e44f76b508fd3.tmp.QKmxul
                    (31f80fcb-977c-433b-9259-5fdfcad1171c)
                    (hash=docker2-replicate-0/cache=docker2-replicate-0)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f2/6e44f76b508fd3
                    ((null))
                    (hash=docker2-replicate-0/cache=&lt;nul&gt;)<br>
                    [2020-11-19 03:29:05.427537] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/b0/1d7303d9dfe009.tmp.qLUMec
                    (e2fdf971-731f-4765-80e8-3165433488ea)
                    (hash=docker2-replicate-2/cache=docker2-replicate-2)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/b0/1d7303d9dfe009
                    ((null))
                    (hash=docker2-replicate-1/cache=&lt;nul&gt;)<br>
                    [2020-11-19 03:29:05.440576] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/bd/952a089e164b36.tmp.4qvl22
                    (3e0bc6d1-13ac-47c6-b221-1256b4b506ef)
                    (hash=docker2-replicate-2/cache=docker2-replicate-2)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/bd/952a089e164b36
                    ((null))
                    (hash=docker2-replicate-1/cache=&lt;nul&gt;)<br>
                    [2020-11-19 03:29:05.452407] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/a3/b587dd08f35e2e.tmp.iIweTT
                    (9685b5f3-4b14-4050-9b00-1163856239b5)
                    (hash=docker2-replicate-1/cache=docker2-replicate-1)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/a3/b587dd08f35e2e
                    ((null))
                    (hash=docker2-replicate-0/cache=&lt;nul&gt;)<br>
                    [2020-11-19 03:29:05.460720] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/48/89cfb1b971c025.tmp.0W7jMK
                    (d0a8d0a4-c783-45db-bb4a-68e24044d830)
                    (hash=docker2-replicate-0/cache=docker2-replicate-0)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/48/89cfb1b971c025
                    ((null))
                    (hash=docker2-replicate-1/cache=&lt;nul&gt;)<br>
                    [2020-11-19 03:29:05.468800] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/d9/759d55e8da66eb.tmp.2yXtHB
                    (e5b61ef5-a3c2-4a2c-aa47-c377a6c090d7)
                    (hash=docker2-replicate-0/cache=docker2-replicate-0)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/d9/759d55e8da66eb
                    ((null))
                    (hash=docker2-replicate-0/cache=&lt;nul&gt;)<br>
                    [2020-11-19 03:29:05.476745] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/1c/f3a658342e36b7.tmp.gSkiEs
                    (17181a40-f9b2-438f-9dfc-7bb159c516e6)
                    (hash=docker2-replicate-2/cache=docker2-replicate-2)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/1c/f3a658342e36b7
                    ((null))
                    (hash=docker2-replicate-0/cache=&lt;nul&gt;)<br>
                    [2020-11-19 03:29:05.486729] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f1/6bef7cb6446c7a.tmp.sVT0Dj
                    (cb6b1d52-b1c0-420c-86b7-2ceb8e8e73db)
                    (hash=docker2-replicate-0/cache=docker2-replicate-0)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f1/6bef7cb6446c7a
                    ((null))
                    (hash=docker2-replicate-1/cache=&lt;nul&gt;)<br>
                    [2020-11-19 03:29:05.495115] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/45/73ba226559961b.tmp.QdPTFa
                    (d8450d9e-62a7-4fd5-9dd2-e072e318d9a0)
                    (hash=docker2-replicate-0/cache=docker2-replicate-0)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/45/73ba226559961b
                    ((null))
                    (hash=docker2-replicate-1/cache=&lt;nul&gt;)<br>
                    [2020-11-19 03:29:05.503424] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/13/29c0df35961ca0.tmp.s1xUJ1
                    (ffc57a77-8b91-4264-8e2d-a9966f0f37ef)
                    (hash=docker2-replicate-1/cache=docker2-replicate-1)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/13/29c0df35961ca0
                    ((null))
                    (hash=docker2-replicate-2/cache=&lt;nul&gt;)<br>
                    [2020-11-19 03:29:05.513532] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/be/8d6a07b6a0d6ad.tmp.A5DzQS
                    (5a595a65-372d-4377-b547-2c4e23f7be3a)
                    (hash=docker2-replicate-1/cache=docker2-replicate-1)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/be/8d6a07b6a0d6ad
                    ((null))
                    (hash=docker2-replicate-0/cache=&lt;nul&gt;)<br>
                    [2020-11-19 03:29:05.526885] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/ec/4208216d993cbe.tmp.IMXg0J
                    (2fa99fcd-64f8-4934-aeda-b356816f1132)
                    (hash=docker2-replicate-2/cache=docker2-replicate-2)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/ec/4208216d993cbe
                    ((null))
                    (hash=docker2-replicate-2/cache=&lt;nul&gt;)<br>
                    [2020-11-19 03:29:05.537637] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/57/1527c482cf2d6b.tmp.Y2L0cB
                    (db24d7bf-4a06-4356-a52e-1ab9537d1c3a)
                    (hash=docker2-replicate-0/cache=docker2-replicate-0)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/57/1527c482cf2d6b
                    ((null))
                    (hash=docker2-replicate-1/cache=&lt;nul&gt;)<br>
                    [2020-11-19 03:29:05.547878] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/88/1b60ead8d4c4e5.tmp.u47rss
                    (b12f041b-5bbd-4e3d-b700-8f673830393f)
                    (hash=docker2-replicate-1/cache=docker2-replicate-1)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/88/1b60ead8d4c4e5
                    ((null))
                    (hash=docker2-replicate-1/cache=&lt;nul&gt;)<br>
                  </div>
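                  <div>(As an aside: to get a shareable log, I could
                    probably silence these INFO messages by raising the
                    client log level; a sketch, assuming the standard
                    diagnostics option:)</div>
                  <pre>gluster volume set docker2 diagnostics.client-log-level WARNING</pre>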
                  <div><br>
                  </div>
                  <div>If I can provide any more information, please
                    let me know.</div>
                  <div><br>
                  </div>
                  <div>Thanks, Olaf</div>
                  <div><br>
                  </div>
                </div>
              </div>
              <div><br>
              </div>
            </div>
          </div>
        </blockquote>
      </div>
    </blockquote>
  </div>

</blockquote></div>
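<br><div dir="ltr">P.S. a rough sketch of the exact upgrade steps I ran on the node (the repo file name under /etc/yum.repos.d/ is an assumption and may differ per setup):</div>
<pre># point the CentOS Storage SIG repo at gluster-7 (repo file name assumed)
sed -i 's/gluster-6/gluster-7/g' /etc/yum.repos.d/CentOS-Gluster-6.repo

# upgrade the gluster packages and restart the management daemon
yum upgrade 'glusterfs*'
systemctl restart glusterd

# op-version deliberately left untouched; verify it is still 60000
gluster volume get all cluster.op-version</pre>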