<div dir="ltr">Hi Ravi,<div><br></div><div>I could try that, but I can only set up a test on VMs; I won&#39;t be able to build an environment like our production one, which runs on physical machines under actual production load. So the two setups would be quite different.</div><div>Personally, I think it would be best to debug the actual machines instead of trying to reproduce the issue elsewhere, since reproducing it on the physical machines only takes swapping the repositories and upgrading the packages.</div><div>Let me know what you think.</div><div><br></div><div>Thanks Olaf</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu 26 Nov 2020 at 02:43, Ravishankar N &lt;<a href="mailto:ravishankar@redhat.com">ravishankar@redhat.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

  <div>

    <p><br>

    </p>

    <div>On 25/11/20 7:17 pm, Olaf Buitelaar

      wrote:<br>

    </div>

    <blockquote type="cite">

      <div dir="ltr">Hi Ravi,

        <div><br>

        </div>

        <div>Thanks for checking. Unfortunately, this is our production
          system. What I&#39;ve done is simply change the yum repo from
          gluster-6 to <a href="http://mirror.centos.org/centos/$releasever/storage/$basearch/gluster-7/" target="_blank">http://mirror.centos.org/centos/$releasever/storage/$basearch/gluster-7/</a>,
          run a yum upgrade, and restart the glusterd process several
          times; I&#39;ve also tried rebooting the machine. I haven&#39;t
          touched the op-version yet, which is still at 60000; I
          usually only bump it once all nodes are upgraded and running
          stable.</div>

        <div>We&#39;re running multiple volumes with different
          configurations, but the self-heal daemon (shd) doesn&#39;t
          start on the upgraded nodes for any of them.</div>

        <div>Is there anything else I could check or do to get to the
          bottom of this?</div>

      </div>

    </blockquote>

    <p>Hi Olaf, like I said, would it be possible to create a test setup

      to see if you can recreate it?<br>

    </p>

    Regards,<br>

    Ravi<br>

    <blockquote type="cite">

      <div dir="ltr">

        <div><br>

        </div>

        <div>Thanks Olaf</div>

      </div>

      <br>

      <div class="gmail_quote">

        <div dir="ltr" class="gmail_attr">On Wed 25 Nov 2020 at 14:14,
          Ravishankar N &lt;<a href="mailto:ravishankar@redhat.com" target="_blank">ravishankar@redhat.com</a>&gt; wrote:<br>

        </div>

        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

          <div>

            <p><br>

            </p>

            <div>On 25/11/20 5:50 pm, Olaf Buitelaar wrote:<br>

            </div>

            <blockquote type="cite">

              <div dir="ltr">Hi Ashish,

                <div><br>

                </div>

                <div>Thank you for looking into this. I also suspect it
                  has something to do with the 7.x client, because the
                  issue doesn&#39;t really seem to occur on the 6.x
                  clients.</div>

                <div>I would love to upgrade everything to 7.x, but since
                  the self-heal daemons (<a href="https://lists.gluster.org/pipermail/gluster-users/2020-November/038917.html" target="_blank">https://lists.gluster.org/pipermail/gluster-users/2020-November/038917.html</a>)
                  won&#39;t start, I halted the full upgrade. <br>

                </div>

              </div>

            </blockquote>

            <p>Olaf, based on your email, I tried upgrading 1 node of a
              3-node replica 3 setup from 6.10 to 7.8 on my test VMs,
              and found that the self-heal daemon (and the bricks) came
              online after I restarted glusterd on that node
              post-upgrade. I did not touch the op-version, and I did
              not spend further time on it. So I don&#39;t think the
              problem is related to the shd mux changes I referred to.
              But if you have a test setup where you can reproduce this,
              please raise a GitHub issue with the details.<br>

            </p>

            Thanks,<br>

            Ravi<br>

            <blockquote type="cite">

              <div dir="ltr">

                <div>Hopefully that issue will be addressed in the
                  upcoming release. Once I have everything running on
                  the same version, I&#39;ll check whether the issue still
                  occurs and reach out if it does.</div>

                <div><br>

                </div>

                <div>Thanks Olaf</div>

              </div>

              <br>

              <div class="gmail_quote">

                <div dir="ltr" class="gmail_attr">On Wed 25 Nov 2020 at
                  10:42, Ashish Pandey &lt;<a href="mailto:aspandey@redhat.com" target="_blank">aspandey@redhat.com</a>&gt; wrote:<br>

                </div>

                <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                  <div>

                    <div>

                      <div><br>

                      </div>

                      <div>Hi,<br>

                      </div>

                      <div><br>

                      </div>

                      <div>I checked the statedump and found some very

                        high memory allocations.<br>

                      </div>

                      <div>grep -rwn &quot;num_allocs&quot;

                        glusterdump.17317.dump.1605* | cut -d&#39;=&#39; -f2 |

                        sort</div>

                      <div><br>

                        30003616 <br>

                        30003616 <br>

                        3305 <br>

                        3305 <br>

                        36960008 <br>

                        36960008 <br>

                        38029944 <br>

                        38029944 <br>

                        38450472 <br>

                        38450472 <br>

                        39566824 <br>

                        39566824 <br>

                        4 <br>

                        I checked the corresponding lines in the
                        statedump, and the allocations could be coming
                        from protocol/client. However, I did not find
                        anything suspicious in my quick code
                        exploration.<br>

                      </div>
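<div>Note that plain sort orders these counts as text, which is why 3305 appears between 30003616 and 36960008 in the output above, and the pipeline does not show which section each count belongs to. A minimal shell sketch that keeps the section header and sorts numerically, assuming the statedump lists each memory-accounting section as a bracketed header line followed by its num_allocs=N line; top_allocs is a hypothetical helper name, not a gluster tool:</div>
<pre>
```shell
#!/bin/sh
# top_allocs FILE [N]: print the N statedump sections with the largest
# num_allocs, one per line as "count TAB section-header", largest first.
# Assumes a header line starting with "[" precedes its "num_allocs=" line
# (hypothetical helper, not part of glusterfs).
top_allocs() {
  file=$1
  n=${2:-10}
  awk '
    /^\[/ { section = $0; next }          # remember the current section header
    /^num_allocs=/ {
      count = substr($0, length("num_allocs=") + 1) + 0
      print count "\t" section
    }
  ' "$file" | sort -k1,1nr | head -n "$n"
}
```
</pre>
<div>For example, for f in glusterdump.17317.dump.1605*; do top_allocs &quot;$f&quot; 5; done would list the five largest counts in each dump.</div>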

                      <div>I would suggest upgrading all the nodes to the
                        latest version, then resuming your workload and
                        checking whether memory usage is still high.<br>

                      </div>

                      <div>That way it will also be easier to debug this

                        issue.<br>

                      </div>

                      <div><br>

                      </div>

                      <div>---<br>

                      </div>

                      <div>Ashish<br>

                      </div>

                      <div><br>

                      </div>

                      <hr id="gmail-m_-8005825012383939613gmail-m_-926496664757219733gmail-m_-5447166865019259697zwchr">

                      <div style="color:rgb(0,0,0);font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt"><b>From:

                        </b>&quot;Olaf Buitelaar&quot; &lt;<a href="mailto:olaf.buitelaar@gmail.com" target="_blank">olaf.buitelaar@gmail.com</a>&gt;<br>

                        <b>To: </b>&quot;gluster-users&quot; &lt;<a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a>&gt;<br>

                        <b>Sent: </b>Thursday, November 19, 2020

                        10:28:57 PM<br>

                        <b>Subject: </b>[Gluster-users] possible memory

                        leak in client/fuse mount<br>

                        <div><br>

                        </div>

                        <div dir="ltr">Dear Gluster Users,

                          <div><br>

                          </div>

                          <div>I have a glusterfs process that consumes
                            almost all of the machine&#39;s memory (~58GB):</div>

                          <div><br>

                          </div>

                          <div># ps -faxu|grep 17317<br>

                            root     17317  3.1 88.9 59695516 58479708 ?

                              Ssl  Oct31 839:36 /usr/sbin/glusterfs

                            --process-name fuse

                            --volfile-server=10.201.0.1

                            --volfile-server=10.201.0.8:10.201.0.5:10.201.0.6:10.201.0.7:10.201.0.9

                            --volfile-id=/docker2 /mnt/docker2<br>

                          </div>
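<div>The RSS column above (58479708 KiB, roughly 55.8 GiB) is a single snapshot. To see whether it keeps climbing or levels off, the process can be sampled over time. A minimal Linux-only shell sketch, assuming /proc/PID/status is readable; its VmRSS line is in KiB, the same unit ps uses for RSS. The helper names rss_kb and sample_rss are hypothetical:</div>
<pre>
```shell
#!/bin/sh
# rss_kb PID: print the resident set size of PID in KiB, taken from the
# VmRSS line of /proc/PID/status (Linux only).
rss_kb() {
  awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# sample_rss PID COUNT INTERVAL: print COUNT lines of "epoch-seconds rss-kib",
# sleeping INTERVAL seconds between samples.
sample_rss() {
  pid=$1 count=$2 interval=$3 i=0
  while [ "$i" -lt "$count" ]; do
    printf '%s %s\n' "$(date +%s)" "$(rss_kb "$pid")"
    i=$((i + 1))
    # no sleep after the final sample
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
  done
}
```
</pre>
<div>For example, sample_rss 17317 12 300 would take twelve samples five minutes apart from the fuse client shown above.</div>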

                          <div><br>

                          </div>

                          <div>The gluster version on this machine is
                            7.8, but I&#39;m currently running a mixed
                            cluster of 6.10 and 7.8, because I&#39;m holding
                            off on the rest of the upgrade until the
                            self-heal daemon issue I mentioned earlier is
                            resolved.</div>

                          <div><br>

                          </div>

                          <div>The affected volume info looks like this:</div>

                          <div><br>

                          </div>

                          # gluster v info docker2<br>

                          <div><br>

                          </div>

                          Volume Name: docker2<br>

                          Type: Distributed-Replicate<br>

                          Volume ID:

                          4e0670a0-3d00-4360-98bd-3da844cedae5<br>

                          Status: Started<br>

                          Snapshot Count: 0<br>

                          Number of Bricks: 3 x (2 + 1) = 9<br>

                          Transport-type: tcp<br>

                          Bricks:<br>

                          Brick1:

                          10.201.0.5:/data0/gfs/bricks/brick1/docker2<br>

                          Brick2:

                          10.201.0.9:/data0/gfs/bricks/brick1/docker2<br>

                          Brick3:

                          10.201.0.3:/data0/gfs/bricks/bricka/docker2

                          (arbiter)<br>

                          Brick4:

                          10.201.0.6:/data0/gfs/bricks/brick1/docker2<br>

                          Brick5:

                          10.201.0.7:/data0/gfs/bricks/brick1/docker2<br>

                          Brick6:

                          10.201.0.4:/data0/gfs/bricks/bricka/docker2

                          (arbiter)<br>

                          Brick7:

                          10.201.0.1:/data0/gfs/bricks/brick1/docker2<br>

                          Brick8:

                          10.201.0.8:/data0/gfs/bricks/brick1/docker2<br>

                          Brick9:

                          10.201.0.2:/data0/gfs/bricks/bricka/docker2

                          (arbiter)<br>

                          Options Reconfigured:<br>

                          performance.cache-size: 128MB<br>

                          transport.address-family: inet<br>

                          nfs.disable: on<br>

                          <div>cluster.brick-multiplex: on</div>
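<div>The rename messages further down refer to DHT subvolumes such as docker2-replicate-2. Gluster groups consecutive bricks into replica sets, so with 3 x (2 + 1) bricks, Brick1-3 form docker2-replicate-0, Brick4-6 docker2-replicate-1 and Brick7-9 docker2-replicate-2. A minimal shell sketch that prints this mapping from saved gluster v info output, assuming one &quot;BrickN: host:path&quot; entry per line and that the replica count (here 3) is passed in; brick_subvols is a hypothetical helper name:</div>
<pre>
```shell
#!/bin/sh
# brick_subvols FILE REPLICA: map each brick listed in saved
# "gluster volume info" output to its replica subvolume name,
# assuming every REPLICA consecutive bricks form one replica set.
brick_subvols() {
  awk -v replica="$2" '
    /^Volume Name:/ { vol = $3 }
    /^Brick[0-9]+:/ {
      n = substr($1, 6) + 0              # "Brick7:" gives 7
      set = int((n - 1) / replica)       # bricks 1-3 give 0, 4-6 give 1, ...
      role = ($3 == "(arbiter)") ? " arbiter" : ""
      print vol "-replicate-" set " " $2 role
    }
  ' "$1"
}
```
</pre>
<div>Run against the docker2 volume info above with a replica count of 3, it would list 10.201.0.1, 10.201.0.8 and 10.201.0.2 (arbiter) under docker2-replicate-2.</div>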

                          <div><br>

                          </div>

                          <div>The issue seems to be triggered by a
                            program called zammad, whose init process
                            runs in a loop; on each cycle it recompiles
                            the Ruby on Rails application.</div>

                          <div><br>

                          </div>

                          <div>I&#39;ve attached 2 statedumps, but since I
                            only recently noticed the high memory usage, I
                            believe both already show the glusterfs
                            process in an escalated state. If you also
                            need dumps from the beginning, let me know.
                            The dumps were taken about an hour apart.</div>
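<div>With two dumps of the same process taken about an hour apart, the allocation counts that grow between them are the more interesting leak candidates. A minimal shell sketch comparing num_allocs per section across two statedumps, assuming the same bracketed-header layout in both files; alloc_growth is a hypothetical helper, and sections missing from the first dump are skipped:</div>
<pre>
```shell
#!/bin/sh
# alloc_growth OLD NEW: print "delta old new section" for every section whose
# num_allocs appears in both statedumps, largest growth first.
# Relies on the FNR == NR idiom, so OLD must not be empty.
alloc_growth() {
  awk '
    FNR == 1 { section = "" }                  # reset at the start of each file
    /^\[/ { section = $0; next }
    /^num_allocs=/ {
      count = substr($0, length("num_allocs=") + 1) + 0
      if (FNR == NR) old[section] = count      # still reading OLD
      else if (section in old) print count - old[section], old[section], count, section
    }
  ' "$1" "$2" | sort -k1,1nr
}
```
</pre>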

                          <div>I&#39;ve also included glusterd.log. I
                            couldn&#39;t include mnt-docker2.log because
                            it&#39;s too large; it&#39;s littered with &quot;I
                            [MSGID: 109066]
                            [dht-rename.c:1951:dht_rename]
                            0-docker2-dht&quot; messages.</div>

                          <div>However, I&#39;ve inspected the log, and it
                            contains no error messages; all entries are
                            at the Info level.</div>
                          <div>They look like these:</div>

                          <div>[2020-11-19 03:29:05.406766] I

                            [glusterfsd-mgmt.c:2282:mgmt_getspec_cbk]

                            0-glusterfs: No change in volfile,continuing<br>

                            [2020-11-19 03:29:21.271886] I

                            [socket.c:865:__socket_shutdown]

                            0-docker2-client-8: intentional socket

                            shutdown(5)<br>

                            [2020-11-19 03:29:24.479738] I

                            [socket.c:865:__socket_shutdown]

                            0-docker2-client-2: intentional socket

                            shutdown(5)<br>

                            [2020-11-19 03:30:12.318146] I

                            [socket.c:865:__socket_shutdown]

                            0-docker2-client-5: intentional socket

                            shutdown(5)<br>

                            [2020-11-19 03:31:27.381720] I

                            [socket.c:865:__socket_shutdown]

                            0-docker2-client-8: intentional socket

                            shutdown(5)<br>

                            [2020-11-19 03:31:30.579630] I

                            [socket.c:865:__socket_shutdown]

                            0-docker2-client-2: intentional socket

                            shutdown(5)<br>

                            [2020-11-19 03:32:18.427364] I

                            [socket.c:865:__socket_shutdown]

                            0-docker2-client-5: intentional socket

                            shutdown(5)<br>

                          </div>
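<div>To confirm that the whole mount log contains nothing above Info level without reading it, the messages can be counted by severity letter. A minimal shell sketch, assuming every message starts with a bracketed timestamp followed by a single-letter level (I for info, W for warning, E for error, and so on), as in the excerpts above; log_level_counts is a hypothetical helper name:</div>
<pre>
```shell
#!/bin/sh
# log_level_counts FILE: print "LEVEL COUNT" for each severity letter found
# in a glusterfs log, e.g. "I 1234" or "W 2". Lines that do not start with
# a timestamp (such as backtrace continuations) are ignored.
log_level_counts() {
  awk '
    # "[2020-11-19 03:29:05.406766] I [...]": fields 1-2 are the timestamp
    /^\[[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / { count[$3]++ }
    END { for (level in count) print level, count[level] }
  ' "$1" | sort
}
```
</pre>
<div>Assuming the default client log location, log_level_counts /var/log/glusterfs/mnt-docker2.log would show whether any W or E entries exist among the rename noise.</div>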

                          <div><br>

                          </div>

                          <div>The rename messages look like these:</div>

                          <div>[2020-11-19 03:29:05.402663] I [MSGID:

                            109066] [dht-rename.c:1951:dht_rename]

                            0-docker2-dht: renaming

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/95/75f93c20e375c5.tmp.eVcE5D

                            (fe083b7e-b0d5-485c-8666-e1f7cdac33e2)

                            (hash=docker2-replicate-2/cache=docker2-replicate-2)

                            =&gt;

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/95/75f93c20e375c5

                            ((null))

                            (hash=docker2-replicate-2/cache=&lt;nul&gt;)<br>

                            [2020-11-19 03:29:05.410972] I [MSGID:

                            109066] [dht-rename.c:1951:dht_rename]

                            0-docker2-dht: renaming

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/0d/86dd25f3d238ff.tmp.AdDTLu

                            (b1edadad-1d48-4bf4-be85-ffbe9d69d338)

                            (hash=docker2-replicate-1/cache=docker2-replicate-1)

                            =&gt;

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/0d/86dd25f3d238ff

                            ((null))

                            (hash=docker2-replicate-2/cache=&lt;nul&gt;)<br>

                            [2020-11-19 03:29:05.420064] I [MSGID:

                            109066] [dht-rename.c:1951:dht_rename]

                            0-docker2-dht: renaming

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f2/6e44f76b508fd3.tmp.QKmxul

                            (31f80fcb-977c-433b-9259-5fdfcad1171c)

                            (hash=docker2-replicate-0/cache=docker2-replicate-0)

                            =&gt;

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f2/6e44f76b508fd3

                            ((null))

                            (hash=docker2-replicate-0/cache=&lt;nul&gt;)<br>

                            [2020-11-19 03:29:05.427537] I [MSGID:

                            109066] [dht-rename.c:1951:dht_rename]

                            0-docker2-dht: renaming

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/b0/1d7303d9dfe009.tmp.qLUMec

                            (e2fdf971-731f-4765-80e8-3165433488ea)

                            (hash=docker2-replicate-2/cache=docker2-replicate-2)

                            =&gt;

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/b0/1d7303d9dfe009

                            ((null))

                            (hash=docker2-replicate-1/cache=&lt;nul&gt;)<br>

                            [2020-11-19 03:29:05.440576] I [MSGID:

                            109066] [dht-rename.c:1951:dht_rename]

                            0-docker2-dht: renaming

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/bd/952a089e164b36.tmp.4qvl22

                            (3e0bc6d1-13ac-47c6-b221-1256b4b506ef)

                            (hash=docker2-replicate-2/cache=docker2-replicate-2)

                            =&gt;

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/bd/952a089e164b36

                            ((null))

                            (hash=docker2-replicate-1/cache=&lt;nul&gt;)<br>

                            [2020-11-19 03:29:05.452407] I [MSGID:

                            109066] [dht-rename.c:1951:dht_rename]

                            0-docker2-dht: renaming

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/a3/b587dd08f35e2e.tmp.iIweTT

                            (9685b5f3-4b14-4050-9b00-1163856239b5)

                            (hash=docker2-replicate-1/cache=docker2-replicate-1)

                            =&gt;

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/a3/b587dd08f35e2e

                            ((null))

                            (hash=docker2-replicate-0/cache=&lt;nul&gt;)<br>

                            [2020-11-19 03:29:05.460720] I [MSGID:

                            109066] [dht-rename.c:1951:dht_rename]

                            0-docker2-dht: renaming

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/48/89cfb1b971c025.tmp.0W7jMK

                            (d0a8d0a4-c783-45db-bb4a-68e24044d830)

                            (hash=docker2-replicate-0/cache=docker2-replicate-0)

                            =&gt;

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/48/89cfb1b971c025

                            ((null))

                            (hash=docker2-replicate-1/cache=&lt;nul&gt;)<br>

                            [2020-11-19 03:29:05.468800] I [MSGID:

                            109066] [dht-rename.c:1951:dht_rename]

                            0-docker2-dht: renaming

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/d9/759d55e8da66eb.tmp.2yXtHB

                            (e5b61ef5-a3c2-4a2c-aa47-c377a6c090d7)

                            (hash=docker2-replicate-0/cache=docker2-replicate-0)

                            =&gt;

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/d9/759d55e8da66eb

                            ((null))

                            (hash=docker2-replicate-0/cache=&lt;nul&gt;)<br>

                            [2020-11-19 03:29:05.476745] I [MSGID:

                            109066] [dht-rename.c:1951:dht_rename]

                            0-docker2-dht: renaming

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/1c/f3a658342e36b7.tmp.gSkiEs

                            (17181a40-f9b2-438f-9dfc-7bb159c516e6)

                            (hash=docker2-replicate-2/cache=docker2-replicate-2)

                            =&gt;

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/1c/f3a658342e36b7

                            ((null))

                            (hash=docker2-replicate-0/cache=&lt;nul&gt;)<br>

                            [2020-11-19 03:29:05.486729] I [MSGID:

                            109066] [dht-rename.c:1951:dht_rename]

                            0-docker2-dht: renaming

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f1/6bef7cb6446c7a.tmp.sVT0Dj

                            (cb6b1d52-b1c0-420c-86b7-2ceb8e8e73db)

                            (hash=docker2-replicate-0/cache=docker2-replicate-0)

                            =&gt;

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/f1/6bef7cb6446c7a

                            ((null))

                            (hash=docker2-replicate-1/cache=&lt;nul&gt;)<br>

                            [2020-11-19 03:29:05.495115] I [MSGID:

                            109066] [dht-rename.c:1951:dht_rename]

                            0-docker2-dht: renaming

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/45/73ba226559961b.tmp.QdPTFa

                            (d8450d9e-62a7-4fd5-9dd2-e072e318d9a0)

                            (hash=docker2-replicate-0/cache=docker2-replicate-0)

                            =&gt;

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/45/73ba226559961b

                            ((null))

                            (hash=docker2-replicate-1/cache=&lt;nul&gt;)<br>

                            [2020-11-19 03:29:05.503424] I [MSGID:

                            109066] [dht-rename.c:1951:dht_rename]

                            0-docker2-dht: renaming

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/13/29c0df35961ca0.tmp.s1xUJ1

                            (ffc57a77-8b91-4264-8e2d-a9966f0f37ef)

                            (hash=docker2-replicate-1/cache=docker2-replicate-1)

                            =&gt;

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/13/29c0df35961ca0

                            ((null))

                            (hash=docker2-replicate-2/cache=&lt;nul&gt;)<br>

                            [2020-11-19 03:29:05.513532] I [MSGID:

                            109066] [dht-rename.c:1951:dht_rename]

                            0-docker2-dht: renaming

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/be/8d6a07b6a0d6ad.tmp.A5DzQS

                            (5a595a65-372d-4377-b547-2c4e23f7be3a)

                            (hash=docker2-replicate-1/cache=docker2-replicate-1)

                            =&gt;

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/be/8d6a07b6a0d6ad

                            ((null))

                            (hash=docker2-replicate-0/cache=&lt;nul&gt;)<br>

                            [2020-11-19 03:29:05.526885] I [MSGID:

                            109066] [dht-rename.c:1951:dht_rename]

                            0-docker2-dht: renaming

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/ec/4208216d993cbe.tmp.IMXg0J

                            (2fa99fcd-64f8-4934-aeda-b356816f1132)

                            (hash=docker2-replicate-2/cache=docker2-replicate-2)

                            =&gt;

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/ec/4208216d993cbe

                            ((null))

                            (hash=docker2-replicate-2/cache=&lt;nul&gt;)<br>

                            [2020-11-19 03:29:05.537637] I [MSGID:

                            109066] [dht-rename.c:1951:dht_rename]

                            0-docker2-dht: renaming

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/57/1527c482cf2d6b.tmp.Y2L0cB

                            (db24d7bf-4a06-4356-a52e-1ab9537d1c3a)

                            (hash=docker2-replicate-0/cache=docker2-replicate-0)

                            =&gt;

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/57/1527c482cf2d6b

                            ((null))

                            (hash=docker2-replicate-1/cache=&lt;nul&gt;)<br>

                            [2020-11-19 03:29:05.547878] I [MSGID:

                            109066] [dht-rename.c:1951:dht_rename]

                            0-docker2-dht: renaming

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/88/1b60ead8d4c4e5.tmp.u47rss

                            (b12f041b-5bbd-4e3d-b700-8f673830393f)

                            (hash=docker2-replicate-1/cache=docker2-replicate-1)

                            =&gt;

/corporate/zammad/tmp/init/cache/bootsnap-compile-cache/88/1b60ead8d4c4e5

                            ((null))

                            (hash=docker2-replicate-1/cache=&lt;nul&gt;)<br>

                          </div>
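<div>Each rename message records the source&#39;s hashed and cached subvolumes and the destination&#39;s hashed subvolume. When the destination name hashes to a different subvolume than the one caching the file (as in the second message above, docker2-replicate-1 to docker2-replicate-2), DHT typically has to create a link-to file on the destination&#39;s hashed subvolume. Whether that relates to the memory growth is not established; this is a minimal shell sketch to measure how often it happens, assuming one message per log line in the format shown (count_cross_subvol_renames is a hypothetical helper name):</div>
<pre>
```shell
#!/bin/sh
# count_cross_subvol_renames FILE: print "total cross", where total is the
# number of dht_rename messages and cross the number whose destination hashed
# subvolume differs from the source cached subvolume.
count_cross_subvol_renames() {
  awk '
    /dht_rename/ {
      line = $0
      # the first hash=.../cache=... pair describes the source
      if (!match(line, "hash=[^/]*/cache=[^)]*")) next
      split(substr(line, RSTART, RLENGTH), src, "[=/]")   # src[2]=hash, src[4]=cache
      line = substr(line, RSTART + RLENGTH)
      # the second pair describes the destination
      if (!match(line, "hash=[^/]*/cache=[^)]*")) next
      split(substr(line, RSTART, RLENGTH), dst, "[=/]")   # dst[2]=hash
      total++
      if (dst[2] != src[4]) cross++
    }
    END { printf "%d %d\n", total, cross }
  ' "$1"
}
```
</pre>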

                          <div><br>

                          </div>

                          <div>If I can provide any more information,
                            please let me know.</div>

                          <div><br>

                          </div>

                          <div>Thanks Olaf</div>

                          <div><br>

                          </div>

                        </div>

                        <br>

                        ________<br>

                        <div><br>

                        </div>

                        <br>

                        <div><br>

                        </div>

                        Community Meeting Calendar:<br>

                        <div><br>

                        </div>

                        Schedule -<br>

                        Every 2nd and 4th Tuesday at 14:30 IST / 09:00

                        UTC<br>

                        Bridge: <a href="https://meet.google.com/cpu-eiue-hvk" target="_blank">https://meet.google.com/cpu-eiue-hvk</a><br>

                        Gluster-users mailing list<br>

                        <a href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a><br>

                        <a href="https://lists.gluster.org/mailman/listinfo/gluster-users" target="_blank">https://lists.gluster.org/mailman/listinfo/gluster-users</a><br>

                      </div>

                      <div><br>

                      </div>

                    </div>

                  </div>

                </blockquote>

              </div>

              <br>


            </blockquote>

          </div>

        </blockquote>

      </div>

    </blockquote>

  </div>

</blockquote></div>