<div dir="ltr">Hi Ravi,<div><br></div><div>Thanks for checking. Unfortunately this is our production system, what i&#39;ve done is simple change the yum repo from gluter-6 to <a href="http://mirror.centos.org/centos/$releasever/storage/$basearch/gluster-7/">http://mirror.centos.org/centos/$releasever/storage/$basearch/gluster-7/</a>. Did a yum upgrade. And did restart the glusterd process several times, i&#39;ve also tried rebooting the machine. And didn&#39;t touch the op-version yet, which is still at (60000), usually i only do this when all nodes are upgraded, and are running stable.</div><div>We&#39;re running multiple volumes with different configurations, but for none of the volumes the shd starts on the upgraded nodes.</div><div>Is there anything further i could check/do to get to the bottom of this?</div><div><br></div><div>Thanks Olaf</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Op wo 25 nov. 2020 om 14:14 schreef Ravishankar N &lt;<a href="mailto:ravishankar@redhat.com">ravishankar@redhat.com</a>&gt;:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div>
    <p><br>
    </p>
    <div>On 25/11/20 5:50 pm, Olaf Buitelaar
      wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="ltr">Hi Ashish,
        <div><br>
        </div>
        <div>Thank you for looking into this. I also suspect it
          has something to do with the 7.x client, because on the 6.x
          clients the issue doesn&#39;t really seem to occur.</div>
        <div>I would love to update everything to 7.x, but since the
          self-heal daemons (<a href="https://lists.gluster.org/pipermail/gluster-users/2020-November/038917.html" target="_blank">https://lists.gluster.org/pipermail/gluster-users/2020-November/038917.html</a>)
          won&#39;t start, I halted the full upgrade.<br>
        </div>
      </div>
    </blockquote>
    <p>Olaf, based on your email, I tried upgrading one node of a
      3-node replica 3 setup from 6.10 to 7.8 on my test VMs, and I found
      that the self-heal daemon (and the bricks) came online after I
      restarted glusterd post-upgrade on that node (I did not touch the
      op-version), so I did not spend further time on it. I therefore
      don&#39;t think the problem is related to the shd mux changes I
      referred to. But if you have a test setup where you can reproduce
      this, please raise a github issue with the details; a sketch of
      what would be worth capturing is below.<br>
    </p>
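    <p>A sketch of what might be worth capturing from the upgraded node
      for such an issue (assuming the default log location under
      /var/log/glusterfs/):</p>
    <pre>gluster --version
gluster volume status                 # does it list the Self-heal Daemon as online on this node?
ps aux | grep glustershd              # is the shd process running at all?
tail -n 100 /var/log/glusterfs/glustershd.log
tail -n 100 /var/log/glusterfs/glusterd.log</pre>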
    Thanks,<br>
    Ravi<br>
    <blockquote type="cite">
      <div dir="ltr">
        <div>Hopefully that issue will be addressed in the upcoming
          release. Once I have everything running on the same version,
          I&#39;ll check whether the issue still occurs and reach out if
          that&#39;s the case.</div>
        <div><br>
        </div>
        <div>Thanks, Olaf</div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">Op wo 25 nov. 2020 om 10:42
          schreef Ashish Pandey &lt;<a href="mailto:aspandey@redhat.com" target="_blank">aspandey@redhat.com</a>&gt;:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
          <div>
            <div>
              <div><br>
              </div>
              <div>Hi,<br>
              </div>
              <div><br>
              </div>
              <div>I checked the statedump and found some very high
                memory allocation counts.<br>
              </div>
              <div>grep -rwn &quot;num_allocs&quot; glusterdump.17317.dump.1605* |
                cut -d&#39;=&#39; -f2 | sort</div>
              <div><br>
                30003616 <br>
                30003616 <br>
                3305 <br>
                3305 <br>
                36960008 <br>
                36960008 <br>
                38029944 <br>
                38029944 <br>
                38450472 <br>
                38450472 <br>
                39566824 <br>
                39566824 <br>
                4 <br>
                I checked those lines in the statedump; it could be
                happening in protocol/client. However, I did not find
                anything suspicious in my quick code exploration.<br>
              </div>
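              <div>To see which xlator and usage-type those big counts
                belong to, something like this should work (a sketch; it
                assumes the usual statedump layout, where the num_allocs=
                lines sit a few lines below a [component - usage-type ...
                memusage] header):<br>
              </div>
              <pre># show each huge num_allocs line together with the memusage header above it
grep -B5 -E 'num_allocs=3[0-9]{7}$' glusterdump.17317.dump.1605* | grep -E 'usage-type|num_allocs'</pre>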
              <div>I would suggest upgrading all the nodes to the latest
                version, then starting your workload again and watching
                whether memory usage climbs that high again (a simple way
                to track it is sketched below).<br>
              </div>
              <div>That way it will also be easier to debug this issue.<br>
              </div>
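              <div>To watch for growth over time, logging the RSS of the
                fuse mount process periodically should be enough (a
                sketch; substitute the actual PID of the mount process):<br>
              </div>
              <pre># append a timestamped RSS sample (in KiB) every 10 minutes
while true; do
    echo "$(date '+%F %T') $(ps -o rss= -p 17317)" &gt;&gt; /var/log/glusterfs-rss.log
    sleep 600
done</pre>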
              <div><br>
              </div>
              <div>---<br>
              </div>
              <div>Ashish<br>
              </div>
              <div><br>
              </div>
              <hr id="gmail-m_-926496664757219733gmail-m_-5447166865019259697zwchr">
              <div style="color:rgb(0,0,0);font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt"><b>From:
                </b>&quot;Olaf Buitelaar&quot; &lt;<a href="mailto:olaf.buitelaar@gmail.com" target="_blank">olaf.buitelaar@gmail.com</a>&gt;<br>
                <b>To: </b>&quot;gluster-users&quot; &lt;<a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a>&gt;<br>
                <b>Sent: </b>Thursday, November 19, 2020 10:28:57 PM<br>
                <b>Subject: </b>[Gluster-users] possible memory leak in
                client/fuse mount<br>
                <div><br>
                </div>
                <div dir="ltr">Dear Gluster Users,
                  <div><br>
                  </div>
                  <div>I have a glusterfs process which consumes nearly
                    all the memory of the machine (~58GB):</div>
                  <div><br>
                  </div>
                  <div># ps -faxu|grep 17317<br>
                    root     17317  3.1 88.9 59695516 58479708 ?   Ssl
                     Oct31 839:36 /usr/sbin/glusterfs --process-name
                    fuse --volfile-server=10.201.0.1
                    --volfile-server=10.201.0.8:10.201.0.5:10.201.0.6:10.201.0.7:10.201.0.9
                    --volfile-id=/docker2 /mnt/docker2<br>
                  </div>
                  <div><br>
                  </div>
                  <div>The gluster version on this machine is 7.8, but
                    I&#39;m currently running a mixed cluster of 6.10 and
                    7.8, as the upgrade is on hold because of the
                    self-heal daemon issue mentioned earlier.</div>
                  <div><br>
                  </div>
                  <div>The affected volume info looks like this:</div>
                  <div><br>
                  </div>
                  # gluster v info docker2<br>
                  <div><br>
                  </div>
                  Volume Name: docker2<br>
                  Type: Distributed-Replicate<br>
                  Volume ID: 4e0670a0-3d00-4360-98bd-3da844cedae5<br>
                  Status: Started<br>
                  Snapshot Count: 0<br>
                  Number of Bricks: 3 x (2 + 1) = 9<br>
                  Transport-type: tcp<br>
                  Bricks:<br>
                  Brick1: 10.201.0.5:/data0/gfs/bricks/brick1/docker2<br>
                  Brick2: 10.201.0.9:/data0/gfs/bricks/brick1/docker2<br>
                  Brick3: 10.201.0.3:/data0/gfs/bricks/bricka/docker2
                  (arbiter)<br>
                  Brick4: 10.201.0.6:/data0/gfs/bricks/brick1/docker2<br>
                  Brick5: 10.201.0.7:/data0/gfs/bricks/brick1/docker2<br>
                  Brick6: 10.201.0.4:/data0/gfs/bricks/bricka/docker2
                  (arbiter)<br>
                  Brick7: 10.201.0.1:/data0/gfs/bricks/brick1/docker2<br>
                  Brick8: 10.201.0.8:/data0/gfs/bricks/brick1/docker2<br>
                  Brick9: 10.201.0.2:/data0/gfs/bricks/bricka/docker2
                  (arbiter)<br>
                  Options Reconfigured:<br>
                  performance.cache-size: 128MB<br>
                  transport.address-family: inet<br>
                  nfs.disable: on<br>
                  <div>cluster.brick-multiplex: on</div>
                  <div><br>
                  </div>
                  <div>The issue seems to be triggered by a program
                    called zammad, which has an init process that runs
                    in a loop; on each cycle it re-compiles the
                    ruby-on-rails application.</div>
                  <div><br>
                  </div>
                  <div>I&#39;ve attached 2 statedumps, but as I only
                    noticed the high memory usage recently, I believe
                    both statedumps already show an escalated state of
                    the glusterfs process. If you also need dumps from
                    the beginning, let me know. The dumps were taken
                    about an hour apart.</div>
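                  <div>(For reference, I generated the dumps roughly like
                    this; a sketch, assuming the default statedump output
                    directory of /var/run/gluster:)</div>
                  <pre># ask the fuse client to write a statedump of itself
kill -USR1 17317
# the dump should appear as glusterdump.&lt;pid&gt;.dump.&lt;timestamp&gt;
ls /var/run/gluster/glusterdump.17317.dump.*</pre>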
                  <div>Also, I&#39;ve included the glusterd.log. I
                    couldn&#39;t include mnt-docker2.log since it&#39;s too
                    large, because it&#39;s littered with &quot;I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht&quot;
                    entries.</div>
                  <div>However, I&#39;ve inspected the log and it contains
                    no Error messages; all are of the Info kind,</div>
                  <div>and look like these:</div>
                  <div>[2020-11-19 03:29:05.406766] I
                    [glusterfsd-mgmt.c:2282:mgmt_getspec_cbk]
                    0-glusterfs: No change in volfile,continuing<br>
                    [2020-11-19 03:29:21.271886] I
                    [socket.c:865:__socket_shutdown] 0-docker2-client-8:
                    intentional socket shutdown(5)<br>
                    [2020-11-19 03:29:24.479738] I
                    [socket.c:865:__socket_shutdown] 0-docker2-client-2:
                    intentional socket shutdown(5)<br>
                    [2020-11-19 03:30:12.318146] I
                    [socket.c:865:__socket_shutdown] 0-docker2-client-5:
                    intentional socket shutdown(5)<br>
                    [2020-11-19 03:31:27.381720] I
                    [socket.c:865:__socket_shutdown] 0-docker2-client-8:
                    intentional socket shutdown(5)<br>
                    [2020-11-19 03:31:30.579630] I
                    [socket.c:865:__socket_shutdown] 0-docker2-client-2:
                    intentional socket shutdown(5)<br>
                    [2020-11-19 03:32:18.427364] I
                    [socket.c:865:__socket_shutdown] 0-docker2-client-5:
                    intentional socket shutdown(5)<br>
                  </div>
                  <div><br>
                  </div>
                  <div>The rename messages look like these:</div>
                  <div>[2020-11-19 03:29:05.402663] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/95/75f93c20e375c5.tmp.eVcE5D
                    (fe083b7e-b0d5-485c-8666-e1f7cdac33e2)
                    (hash=docker2-replicate-2/cache=docker2-replicate-2)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/95/75f93c20e375c5
                    ((null))
                    (hash=docker2-replicate-2/cache=&lt;nul&gt;)<br>
                    [2020-11-19 03:29:05.410972] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/0d/86dd25f3d238ff.tmp.AdDTLu
                    (b1edadad-1d48-4bf4-be85-ffbe9d69d338)
                    (hash=docker2-replicate-1/cache=docker2-replicate-1)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/0d/86dd25f3d238ff
                    ((null))
                    (hash=docker2-replicate-2/cache=&lt;nul&gt;)<br>
                    [2020-11-19 03:29:05.420064] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f2/6e44f76b508fd3.tmp.QKmxul
                    (31f80fcb-977c-433b-9259-5fdfcad1171c)
                    (hash=docker2-replicate-0/cache=docker2-replicate-0)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f2/6e44f76b508fd3
                    ((null))
                    (hash=docker2-replicate-0/cache=&lt;nul&gt;)<br>
                    [2020-11-19 03:29:05.427537] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/b0/1d7303d9dfe009.tmp.qLUMec
                    (e2fdf971-731f-4765-80e8-3165433488ea)
                    (hash=docker2-replicate-2/cache=docker2-replicate-2)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/b0/1d7303d9dfe009
                    ((null))
                    (hash=docker2-replicate-1/cache=&lt;nul&gt;)<br>
                    [2020-11-19 03:29:05.440576] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/bd/952a089e164b36.tmp.4qvl22
                    (3e0bc6d1-13ac-47c6-b221-1256b4b506ef)
                    (hash=docker2-replicate-2/cache=docker2-replicate-2)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/bd/952a089e164b36
                    ((null))
                    (hash=docker2-replicate-1/cache=&lt;nul&gt;)<br>
                    [2020-11-19 03:29:05.452407] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/a3/b587dd08f35e2e.tmp.iIweTT
                    (9685b5f3-4b14-4050-9b00-1163856239b5)
                    (hash=docker2-replicate-1/cache=docker2-replicate-1)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/a3/b587dd08f35e2e
                    ((null))
                    (hash=docker2-replicate-0/cache=&lt;nul&gt;)<br>
                    [2020-11-19 03:29:05.460720] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/48/89cfb1b971c025.tmp.0W7jMK
                    (d0a8d0a4-c783-45db-bb4a-68e24044d830)
                    (hash=docker2-replicate-0/cache=docker2-replicate-0)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/48/89cfb1b971c025
                    ((null))
                    (hash=docker2-replicate-1/cache=&lt;nul&gt;)<br>
                    [2020-11-19 03:29:05.468800] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/d9/759d55e8da66eb.tmp.2yXtHB
                    (e5b61ef5-a3c2-4a2c-aa47-c377a6c090d7)
                    (hash=docker2-replicate-0/cache=docker2-replicate-0)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/d9/759d55e8da66eb
                    ((null))
                    (hash=docker2-replicate-0/cache=&lt;nul&gt;)<br>
                    [2020-11-19 03:29:05.476745] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/1c/f3a658342e36b7.tmp.gSkiEs
                    (17181a40-f9b2-438f-9dfc-7bb159c516e6)
                    (hash=docker2-replicate-2/cache=docker2-replicate-2)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/1c/f3a658342e36b7
                    ((null))
                    (hash=docker2-replicate-0/cache=&lt;nul&gt;)<br>
                    [2020-11-19 03:29:05.486729] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f1/6bef7cb6446c7a.tmp.sVT0Dj
                    (cb6b1d52-b1c0-420c-86b7-2ceb8e8e73db)
                    (hash=docker2-replicate-0/cache=docker2-replicate-0)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f1/6bef7cb6446c7a
                    ((null))
                    (hash=docker2-replicate-1/cache=&lt;nul&gt;)<br>
                    [2020-11-19 03:29:05.495115] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/45/73ba226559961b.tmp.QdPTFa
                    (d8450d9e-62a7-4fd5-9dd2-e072e318d9a0)
                    (hash=docker2-replicate-0/cache=docker2-replicate-0)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/45/73ba226559961b
                    ((null))
                    (hash=docker2-replicate-1/cache=&lt;nul&gt;)<br>
                    [2020-11-19 03:29:05.503424] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/13/29c0df35961ca0.tmp.s1xUJ1
                    (ffc57a77-8b91-4264-8e2d-a9966f0f37ef)
                    (hash=docker2-replicate-1/cache=docker2-replicate-1)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/13/29c0df35961ca0
                    ((null))
                    (hash=docker2-replicate-2/cache=&lt;nul&gt;)<br>
                    [2020-11-19 03:29:05.513532] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/be/8d6a07b6a0d6ad.tmp.A5DzQS
                    (5a595a65-372d-4377-b547-2c4e23f7be3a)
                    (hash=docker2-replicate-1/cache=docker2-replicate-1)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/be/8d6a07b6a0d6ad
                    ((null))
                    (hash=docker2-replicate-0/cache=&lt;nul&gt;)<br>
                    [2020-11-19 03:29:05.526885] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/ec/4208216d993cbe.tmp.IMXg0J
                    (2fa99fcd-64f8-4934-aeda-b356816f1132)
                    (hash=docker2-replicate-2/cache=docker2-replicate-2)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/ec/4208216d993cbe
                    ((null))
                    (hash=docker2-replicate-2/cache=&lt;nul&gt;)<br>
                    [2020-11-19 03:29:05.537637] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/57/1527c482cf2d6b.tmp.Y2L0cB
                    (db24d7bf-4a06-4356-a52e-1ab9537d1c3a)
                    (hash=docker2-replicate-0/cache=docker2-replicate-0)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/57/1527c482cf2d6b
                    ((null))
                    (hash=docker2-replicate-1/cache=&lt;nul&gt;)<br>
                    [2020-11-19 03:29:05.547878] I [MSGID: 109066]
                    [dht-rename.c:1951:dht_rename] 0-docker2-dht:
                    renaming
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/88/1b60ead8d4c4e5.tmp.u47rss
                    (b12f041b-5bbd-4e3d-b700-8f673830393f)
                    (hash=docker2-replicate-1/cache=docker2-replicate-1)
                    =&gt;
/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/88/1b60ead8d4c4e5
                    ((null))
                    (hash=docker2-replicate-1/cache=&lt;nul&gt;)<br>
                  </div>
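                  <div>(As an aside: to get a shareable log, I could
                    probably silence these INFO messages by raising the
                    client log level; a sketch, assuming the standard
                    diagnostics option:)</div>
                  <pre>gluster volume set docker2 diagnostics.client-log-level WARNING</pre>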
                  <div><br>
                  </div>
                  <div>If I can provide any more information, please
                    let me know.</div>
                  <div><br>
                  </div>
                  <div>Thanks, Olaf</div>
                  <div><br>
                  </div>
                </div>
              </div>
              <div><br>
              </div>
            </div>
          </div>
        </blockquote>
      </div>
    </blockquote>
  </div>

</blockquote></div>
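<br><div dir="ltr">P.S. a rough sketch of the exact upgrade steps I ran on the node (the repo file name under /etc/yum.repos.d/ is an assumption and may differ per setup):</div>
<pre># point the CentOS Storage SIG repo at gluster-7 (repo file name assumed)
sed -i 's/gluster-6/gluster-7/g' /etc/yum.repos.d/CentOS-Gluster-6.repo

# upgrade the gluster packages and restart the management daemon
yum upgrade 'glusterfs*'
systemctl restart glusterd

# op-version deliberately left untouched; verify it is still 60000
gluster volume get all cluster.op-version</pre>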