<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <p>I haven't found any disconnections so far. We analyzed the
      traffic on the port to see whether too much data was going
      through, but it looked fine. Since I can't see any other
      disconnections either, we will keep checking the network for now
      in case I missed something.</p>

    <p>Thanks for all the help! If I have any other news I will let you

      know.<br>

    </p>

    <div class="moz-signature">Pablo.<br>

      <br>

    </div>

    <div class="moz-cite-prefix">On 08/16/2018 01:06 AM, Ravishankar N

      wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:088fe525-26a7-9804-1108-0fe71fcec20e@redhat.com">

      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

      <p><br>

      </p>

      <br>

      <div class="moz-cite-prefix">On 08/15/2018 11:07 PM, Pablo

        Schandin wrote:<br>

      </div>

      <blockquote type="cite"

        cite="mid:d4fba638-c664-0547-5d84-75f9a822c302@avature.net">

        <meta http-equiv="Content-Type" content="text/html;

          charset=utf-8">

        <p>I found another log that I wasn't aware of in

          /var/log/glusterfs/brick, that is the mount log, I confused the

          log files. In this file I see a lot of entries like this one:</p>

        <p><font size="-1">[2018-08-15 16:41:19.568477] I

            [addr.c:55:compare_addr_and_update] 0-/mnt/brick1/gv1:

            allowed = "172.20.36.10", received addr = "172.20.36.11"<br>

            [2018-08-15 16:41:19.568527] I

            [addr.c:55:compare_addr_and_update] 0-/mnt/brick1/gv1:

            allowed = "172.20.36.11", received addr = "172.20.36.11"<br>

            [2018-08-15 16:41:19.568547] I [login.c:76:gf_auth]

            0-auth/login: allowed user names:

            7107ccfa-0ba1-4172-aa5a-031568927bf1<br>

            [2018-08-15 16:41:19.568564] I [MSGID: 115029]

            [server-handshake.c:793:server_setvolume] 0-gv1-server:

            accepted client from

physinfra-hb2.xcade.net-21091-2018/08/15-16:41:03:103872-gv1-client-0-0-0

            (version: 3.12.6)<br>

            [2018-08-15 16:41:19.582710] I [MSGID: 115036]

            [server.c:527:server_rpc_notify] 0-gv1-server: disconnecting

            connection from

physinfra-hb2.xcade.net-21091-2018/08/15-16:41:03:103872-gv1-client-0-0-0<br>

            [2018-08-15 16:41:19.582830] I [MSGID: 101055]

            [client_t.c:443:gf_client_unref] 0-gv1-server: Shutting down

            connection

physinfra-hb2.xcade.net-21091-2018/08/15-16:41:03:103872-gv1-client-0-0-0</font><br>

          <br>

        </p>

        <p>So I see a lot of disconnections, right? This might be why

          the self healing is triggered all the time? <br>

        </p>

      </blockquote>

      Not necessarily. These disconnects could also come from the
      glfsheal binary, which is invoked when you run `gluster vol heal
      volname info` etc.; those connections do not cause heals. It
      would be better to check your client mount logs for disconnect
      messages like these:<br>

      <br>

      <tt>[2018-08-16 03:59:32.289763] I [MSGID: 114018]

        [client.c:2285:client_rpc_notify] 0-testvol-client-4:

        disconnected from testvol-client-0. Client process will keep

        trying to connect to glusterd until brick's port is available<br>

        <br>

      </tt>If there are no disconnects and you are still seeing files

      undergoing heal, then you might want to check the brick logs to

      see if there are any write failures.<br>

      Thanks,<br>

      Ravi<tt><br>

      </tt>
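      <p>A minimal sketch of that check, assuming each log entry sits
        on a single line in the layout quoted in this thread and that
        MSGID 114018 is the id to look for (it is simply the id of the
        sample line above). The names <tt>ENTRY_RE</tt>,
        <tt>find_disconnects</tt> and
        <tt>disconnects_per_translator</tt> are made up for this
        illustration and are not part of GlusterFS:</p>
<pre>
```python
import re
from collections import Counter

# Layout of the MSGID-tagged entries quoted in this thread:
# [timestamp] LEVEL [MSGID: n] [file:line:function] translator: message
ENTRY_RE = re.compile(
    r"^\[([^\]]+)\]\s+([A-Z])\s+"   # 1: timestamp, 2: log level
    r"\[MSGID:\s*(\d+)\]\s+"        # 3: message id
    r"\[([^\]]+)\]\s+"              # 4: source file, line, function
    r"(\S+):\s+(.*)$"               # 5: translator, 6: message text
)

# Assumption: the id carried by the sample disconnect line above.
DISCONNECT_MSGID = "114018"


def find_disconnects(lines):
    """Return (timestamp, translator) pairs for client disconnect entries."""
    hits = []
    for line in lines:
        match = ENTRY_RE.match(line.strip())
        if match and match.group(3) == DISCONNECT_MSGID:
            hits.append((match.group(1), match.group(5)))
    return hits


def disconnects_per_translator(lines):
    """Count disconnect entries per translator name."""
    return Counter(xlator for _ts, xlator in find_disconnects(lines))


# Two entries copied from this thread: one disconnect, one unrelated line.
sample_log = [
    "[2018-08-16 03:59:32.289763] I [MSGID: 114018] "
    "[client.c:2285:client_rpc_notify] 0-testvol-client-4: "
    "disconnected from testvol-client-0. Client process will keep "
    "trying to connect to glusterd until brick's port is available",
    "[2018-08-13 06:25:02.246187] I "
    "[glusterfsd-mgmt.c:1821:mgmt_getspec_cbk] 0-glusterfs: "
    "No change in volfile,continuing",
]

for timestamp, translator in find_disconnects(sample_log):
    print(timestamp, translator)
```
</pre>
      <p>To scan a real mount log, an open file object can be passed
        in place of <tt>sample_log</tt>, since the functions only
        iterate over lines.</p>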

      <blockquote type="cite"

        cite="mid:d4fba638-c664-0547-5d84-75f9a822c302@avature.net">

        <p> </p>

        <p>Thanks! <br>

        </p>

        <div class="moz-signature">Pablo.<br>


          <br>

          <br>

        </div>

        <div class="moz-cite-prefix">On 08/14/2018 09:15 AM, Pablo

          Schandin wrote:<br>

        </div>

        <blockquote type="cite"

          cite="mid:d8866307-346f-d7e4-1734-93250b0bae32@avature.net">

          <meta http-equiv="Content-Type" content="text/html;

            charset=utf-8">

          <p>Thanks for the info!</p>

          <p>I cannot see any logs in the mount log besides one line

            every time it rotates<br>

          </p>

          <p><font size="-1">[2018-08-13 06:25:02.246187] I

              [glusterfsd-mgmt.c:1821:mgmt_getspec_cbk] 0-glusterfs: No

              change in volfile,continuing</font><br>

            <br>

          </p>

          <p>But in the volume's glfsheal-gv1.log I did find some kind
            of server-client connection that was disconnected and now
            connects using a different port. The log block for each
            run is rather long, so I'm copying it into a pastebin.<br>

          </p>

          <p><a class="moz-txt-link-freetext"

              href="https://pastebin.com/bp06rrsT"

              moz-do-not-send="true">https://pastebin.com/bp06rrsT</a></p>

          <p>Maybe this has something to do with it?</p>

          <p>Thanks!<br>

          </p>

          Pablo.<br>

          <div class="moz-signature"><br>

          </div>

          <div class="moz-cite-prefix">On 08/11/2018 12:19 AM,

            Ravishankar N wrote:<br>

          </div>

          <blockquote type="cite"

            cite="mid:961663b5-76f9-98fd-a04e-4dfb5fbef33d@redhat.com">

            <meta http-equiv="Content-Type" content="text/html;

              charset=utf-8">

            <p><br>

            </p>

            <br>

            <div class="moz-cite-prefix">On 08/10/2018 11:25 PM, Pablo

              Schandin wrote:<br>

            </div>

            <blockquote type="cite"

              cite="mid:737ac110-72ef-8499-db9a-a0be0a5134c7@avature.net">

              <meta http-equiv="content-type" content="text/html;

                charset=utf-8">

              <p>Hello everyone!</p>

              <p>I'm having some trouble with something, but I'm not
                quite sure what yet. I'm running GlusterFS 3.12.6 on
                Ubuntu 16.04. I have two servers (nodes) in the cluster
                in replica mode. Each server has 2 bricks. As the
                servers are KVM hosts running several VMs, one brick
                holds the VMs defined locally on that server and the
                second brick is the replica of the other server's
                brick. It holds data, but nothing writes to it apart
                from the replication.<br>

              </p>

              <p>                            Server 1                   

                                                                Server 2<br>

                Volume 1 (gv1): Brick 1 defined VMs (read/write)   

                ----&gt;                  Brick 1 replicated qcow2 files<br>

                Volume 2 (gv2): Brick 2 replicated qcow2 files       

                &lt;-----                 Brick 2 defined VMs

                (read/write)</p>

              <p>So, the main issue arose when I got a nagios alarm that

                warned about a file listed to be healed. And then it

                disappeared. I came to find out that every 5 minutes,

                the self heal daemon triggers the healing and this fixes

                it. But looking at the logs I have a lot of entries in

                the glustershd.log file like this:</p>

              <p><font size="-1">[2018-08-09 14:23:37.689403] I [MSGID:

                  108026] [afr-self-heal-common.c:1656:afr_log_selfheal]

                  0-gv1-replicate-0: Completed data selfheal on

                  407bd97b-e76c-4f81-8f59-7dae11507b0c. sources=[0] 

                  sinks=1 <br>

                  [2018-08-09 14:44:37.933143] I [MSGID: 108026]

                  [afr-self-heal-common.c:1656:afr_log_selfheal]

                  0-gv2-replicate-0: Completed data selfheal on

                  73713556-5b63-4f91-b83d-d7d82fee111f. sources=[0] 

                  sinks=1 </font><br>

              </p>

              <p>The qcow2 files are being healed several times a day
                (up to 30 on occasion). As I understand it, this means
                that a data heal occurred on the files with gfid 407b...
                and 7371..., from source to sink. Is that local server
                to replica server? Is it OK for the shd to heal files in
                the replicated brick, which supposedly receives no
                writes besides the mirroring? How does that work?</p>
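              <p>To put a number on how often each file is healed, the
                glustershd.log entries can be tallied per day and per
                gfid. This is only a sketch: it assumes one entry per
                line in the format quoted above, and the names
                <tt>HEAL_RE</tt> and <tt>heals_per_day</tt> are
                invented for the example.</p>
<pre>
```python
import re
from collections import Counter

# Matches the date at the start of the entry, the heal type and the gfid in
# lines such as "... Completed data selfheal on 407bd97b-... sources=[0]".
HEAL_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2}) [^\]]*\].*"
    r"Completed (\w+) selfheal on "
    r"([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
)


def heals_per_day(lines):
    """Count completed self-heal entries keyed by (date, gfid)."""
    counts = Counter()
    for line in lines:
        match = HEAL_RE.search(line.strip())
        if match:
            day, _kind, gfid = match.groups()
            counts[(day, gfid)] += 1
    return counts


# The two entries quoted above, joined back into single lines.
sample_log = [
    "[2018-08-09 14:23:37.689403] I [MSGID: 108026] "
    "[afr-self-heal-common.c:1656:afr_log_selfheal] 0-gv1-replicate-0: "
    "Completed data selfheal on 407bd97b-e76c-4f81-8f59-7dae11507b0c. "
    "sources=[0]  sinks=1",
    "[2018-08-09 14:44:37.933143] I [MSGID: 108026] "
    "[afr-self-heal-common.c:1656:afr_log_selfheal] 0-gv2-replicate-0: "
    "Completed data selfheal on 73713556-5b63-4f91-b83d-d7d82fee111f. "
    "sources=[0]  sinks=1",
]

for (day, gfid), count in sorted(heals_per_day(sample_log).items()):
    print(day, gfid, count)
```
</pre>
              <p>Keying on the date as well as the gfid makes it easy to
                see whether the heals cluster on particular days or are
                spread evenly.</p>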

            </blockquote>

            In AFR, for writes, there is no notion of local/remote

            brick. No matter from which client you write to the volume,

            it gets sent to both bricks. i.e. the replication is

            synchronous and real time. <br>

             <br>

            <blockquote type="cite"

              cite="mid:737ac110-72ef-8499-db9a-a0be0a5134c7@avature.net">

              <p>How does AFR replication work? The file with gfid
                7371... is the qcow2 root disk of an owncloud server
                with 17GB of data. I don't think that is big enough to
                be a bottleneck of any sort.</p>

              <p>Also, I was investigating the directory tree in
                brick/.glusterfs/indices and I noticed that in both
                xattrop and dirty there is always a file named
                xattrop-xxxxxx or dirty-xxxxxx. I read that the xattrop
                file is like a parent file or handle: the other files
                created there are hardlinks to it, named after the gfid
                that the shd has to heal. Is it the same for the files
                in the dirty dir?</p>

            </blockquote>

            Yes, before the write, the gfid gets captured inside dirty

            on all bricks. If the write is successful, it gets removed.

            In addition, if the write fails on one brick, the other

            brick will capture the gfid inside xattrop.<br>
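            <p>The bookkeeping described above can be pictured with a
              small toy model. This is not GlusterFS code: the
              <tt>Brick</tt> class, the <tt>fail_writes</tt> switch and
              the <tt>heal</tt> helper are hypothetical, and the model
              assumes that the dirty entry simply stays behind on the
              brick where the write failed.</p>
<pre>
```python
class Brick:
    """Toy stand-in for one brick and its two index directories."""

    def __init__(self, name):
        self.name = name
        self.fail_writes = False  # hypothetical switch simulating a disconnect
        self.dirty = set()        # gfids with a write in flight
        self.xattrop = set()      # gfids that some other brick still needs
        self.data = {}

    def write(self, gfid, payload):
        if self.fail_writes:
            raise ConnectionError(self.name + " is unreachable")
        self.data[gfid] = payload


def replicated_write(bricks, gfid, payload):
    """Send one write to every brick; return the names of bricks that failed."""
    # Before the write: capture the gfid in dirty on all bricks.
    for brick in bricks:
        brick.dirty.add(gfid)
    succeeded, failed = [], []
    for brick in bricks:
        try:
            brick.write(gfid, payload)
            succeeded.append(brick)
        except ConnectionError:
            failed.append(brick)
    # After the write: clear dirty where it worked, and record the gfid in
    # xattrop on the good bricks if any other brick missed the write.
    for brick in succeeded:
        brick.dirty.discard(gfid)
        if failed:
            brick.xattrop.add(gfid)
    return [brick.name for brick in failed]


def heal(source, sink, gfid):
    """Assumed heal step: copy source to sink, then clear the markers."""
    sink.data[gfid] = source.data[gfid]
    sink.dirty.discard(gfid)
    source.xattrop.discard(gfid)


brick0, brick1 = Brick("brick-0"), Brick("brick-1")
replicated_write([brick0, brick1], "gfid-a", b"v1")  # succeeds on both
brick1.fail_writes = True                            # simulate a disconnect
failed = replicated_write([brick0, brick1], "gfid-a", b"v2")
print(failed, brick0.xattrop, brick1.dirty)
```
</pre>
            <p>After the second write, brick-0 holds the gfid in
              xattrop and brick-1 still has the older data, which is the
              state a later heal from source to sink would repair.</p>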

            <blockquote type="cite"

              cite="mid:737ac110-72ef-8499-db9a-a0be0a5134c7@avature.net">

              <p>Any help will be greatly appreciated. Thanks!<br>

              </p>

            </blockquote>

            If frequent heals are triggered, it could mean there are

            frequent network disconnects from the clients to the bricks

            as writes happen. You can check the mount logs and see if

            that is the case and investigate possible network issues.<br>

            <br>

            HTH,<br>

            Ravi <br>

            <blockquote type="cite"

              cite="mid:737ac110-72ef-8499-db9a-a0be0a5134c7@avature.net">

              <p> </p>

              <p>Pablo.<br>

              </p>

              <div class="moz-signature"><br>

                <br>

              </div>

              <br>

              <fieldset class="mimeAttachmentHeader"></fieldset>

              <br>

              <pre wrap="">_______________________________________________

Gluster-users mailing list

<a class="moz-txt-link-abbreviated" href="mailto:Gluster-users@gluster.org" moz-do-not-send="true">Gluster-users@gluster.org</a>

<a class="moz-txt-link-freetext" href="https://lists.gluster.org/mailman/listinfo/gluster-users" moz-do-not-send="true">https://lists.gluster.org/mailman/listinfo/gluster-users</a></pre>

            </blockquote>

            <br>

          </blockquote>

          <br>

        </blockquote>

        <br>

      </blockquote>

      <br>

    </blockquote>

    <br>

  </body>

</html>