<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <p>I found another log that I wasn't aware of in
      /var/log/glusterfs/brick. That is the mount log; I had confused the
      log files. In this file I see a lot of entries like this one:</p>

    <p><font size="-1">[2018-08-15 16:41:19.568477] I

        [addr.c:55:compare_addr_and_update] 0-/mnt/brick1/gv1: allowed =

        "172.20.36.10", received addr = "172.20.36.11"<br>

        [2018-08-15 16:41:19.568527] I

        [addr.c:55:compare_addr_and_update] 0-/mnt/brick1/gv1: allowed =

        "172.20.36.11", received addr = "172.20.36.11"<br>

        [2018-08-15 16:41:19.568547] I [login.c:76:gf_auth]

        0-auth/login: allowed user names:

        7107ccfa-0ba1-4172-aa5a-031568927bf1<br>

        [2018-08-15 16:41:19.568564] I [MSGID: 115029]

        [server-handshake.c:793:server_setvolume] 0-gv1-server: accepted

        client from

physinfra-hb2.xcade.net-21091-2018/08/15-16:41:03:103872-gv1-client-0-0-0

        (version: 3.12.6)<br>

        [2018-08-15 16:41:19.582710] I [MSGID: 115036]

        [server.c:527:server_rpc_notify] 0-gv1-server: disconnecting

        connection from

physinfra-hb2.xcade.net-21091-2018/08/15-16:41:03:103872-gv1-client-0-0-0<br>

        [2018-08-15 16:41:19.582830] I [MSGID: 101055]

        [client_t.c:443:gf_client_unref] 0-gv1-server: Shutting down

        connection

physinfra-hb2.xcade.net-21091-2018/08/15-16:41:03:103872-gv1-client-0-0-0</font><br>

      <br>

    </p>

    <p>So I see a lot of disconnections, right? Could this be why
      self-healing is triggered all the time?<br>

    </p>
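    <p>A minimal sketch for counting these events per client, assuming the
      brick log keeps the line format quoted above (GlusterFS 3.12.6). The
      function name and the sample list are only illustrative, and other
      versions may word the messages differently:</p>

```python
import re
from collections import Counter

# Patterns are taken from the brick-log lines quoted above; this is an
# assumption about the message wording, not an official log schema.
ACCEPTED = re.compile(r"accepted client from (\S+)")
DISCONNECTED = re.compile(r"disconnecting connection from (\S+)")

# Sample lines copied from the log excerpt, unwrapped to one entry per line.
CLIENT = "physinfra-hb2.xcade.net-21091-2018/08/15-16:41:03:103872-gv1-client-0-0-0"
SAMPLE_LINES = [
    "[2018-08-15 16:41:19.568564] I [MSGID: 115029] "
    "[server-handshake.c:793:server_setvolume] 0-gv1-server: accepted "
    "client from " + CLIENT + " (version: 3.12.6)",
    "[2018-08-15 16:41:19.582710] I [MSGID: 115036] "
    "[server.c:527:server_rpc_notify] 0-gv1-server: disconnecting "
    "connection from " + CLIENT,
    "[2018-08-15 16:41:19.582830] I [MSGID: 101055] "
    "[client_t.c:443:gf_client_unref] 0-gv1-server: Shutting down "
    "connection " + CLIENT,
]


def count_connection_events(lines):
    """Return (accepted, disconnected) Counters keyed by client identifier."""
    accepted, disconnected = Counter(), Counter()
    for line in lines:
        match = ACCEPTED.search(line)
        if match:
            accepted[match.group(1)] += 1
            continue
        match = DISCONNECTED.search(line)
        if match:
            disconnected[match.group(1)] += 1
    return accepted, disconnected


accepted, disconnected = count_connection_events(SAMPLE_LINES)
for client, count in disconnected.items():
    print(client, "accepted:", accepted[client], "disconnected:", count)
```

    <p>To run it on a real log, the same function can be fed an open file
      object instead of the sample list, since it only iterates over lines.</p>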

    <p>Thanks! <br>

    </p>

    <div class="moz-signature">Pablo.<br>


    </div>

    <div class="moz-cite-prefix">On 08/14/2018 09:15 AM, Pablo Schandin

      wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:d8866307-346f-d7e4-1734-93250b0bae32@avature.net">


      <p>Thanks for the info!</p>

      <p>I cannot see any entries in the mount log besides one line every
        time it rotates:<br>

      </p>

      <p><font size="-1">[2018-08-13 06:25:02.246187] I

          [glusterfsd-mgmt.c:1821:mgmt_getspec_cbk] 0-glusterfs: No

          change in volfile,continuing</font><br>

        <br>

      </p>

      <p>But in the volume's glfsheal-gv1.log I did find what looks like a
        server-client connection that was disconnected and now
        reconnects using a different port. The block of log lines for each
        run is rather long, so I'm copying it into a pastebin.<br>

      </p>

      <p><a class="moz-txt-link-freetext"

          href="https://pastebin.com/bp06rrsT" moz-do-not-send="true">https://pastebin.com/bp06rrsT</a></p>

      <p>Maybe this has something to do with it?</p>

      <p>Thanks!<br>

      </p>

      Pablo.<br>

      <div class="moz-signature"><br>

      </div>

      <div class="moz-cite-prefix">On 08/11/2018 12:19 AM, Ravishankar N

        wrote:<br>

      </div>

      <blockquote type="cite"

        cite="mid:961663b5-76f9-98fd-a04e-4dfb5fbef33d@redhat.com">


        <p><br>

        </p>

        <br>

        <div class="moz-cite-prefix">On 08/10/2018 11:25 PM, Pablo

          Schandin wrote:<br>

        </div>

        <blockquote type="cite"

          cite="mid:737ac110-72ef-8499-db9a-a0be0a5134c7@avature.net">


          <p>Hello everyone!</p>

          <p>I'm having some trouble, but I'm not quite sure with what
            yet. I'm running GlusterFS 3.12.6 on Ubuntu 16.04. I have two
            servers (nodes) in the cluster in replica mode. Each server
            has 2 bricks. As the servers are KVM hosts running several
            VMs, one brick holds VMs defined locally and the second brick
            is the replica of the other server's brick. It has data, but
            no actual writing is done to it except for the replication.<br>

          </p>

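          <table>
            <tr><th></th><th>Server 1</th><th></th><th>Server 2</th></tr>
            <tr><td>Volume 1 (gv1):</td><td>Brick 1 defined VMs (read/write)</td><td>----&gt;</td><td>Brick 1 replicated qcow2 files</td></tr>
            <tr><td>Volume 2 (gv2):</td><td>Brick 2 replicated qcow2 files</td><td>&lt;-----</td><td>Brick 2 defined VMs (read/write)</td></tr>
          </table>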

          <p>The main issue arose when I got a Nagios alarm warning about
            a file listed to be healed, and then it disappeared. I found
            out that every 5 minutes the self-heal daemon triggers a
            heal, and this fixes it. But looking at the logs, I have a
            lot of entries in the glustershd.log file like this:</p>

          <p><font size="-1">[2018-08-09 14:23:37.689403] I [MSGID:

              108026] [afr-self-heal-common.c:1656:afr_log_selfheal]

              0-gv1-replicate-0: Completed data selfheal on

              407bd97b-e76c-4f81-8f59-7dae11507b0c. sources=[0]  sinks=1

              <br>

              [2018-08-09 14:44:37.933143] I [MSGID: 108026]

              [afr-self-heal-common.c:1656:afr_log_selfheal]

              0-gv2-replicate-0: Completed data selfheal on

              73713556-5b63-4f91-b83d-d7d82fee111f. sources=[0]  sinks=1

            </font><br>

          </p>
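          <p>A minimal sketch for tallying entries like these per day and
            per gfid, assuming the glustershd.log line format shown above.
            The regular expression and the function name are my own
            illustration, not part of GlusterFS:</p>

```python
import re
from collections import Counter

# Matches lines such as:
# [2018-08-09 14:23:37.689403] I [MSGID: 108026] [...afr_log_selfheal]
#   0-gv1-replicate-0: Completed data selfheal on GFID. sources=[0]  sinks=1
# The format is inferred from the excerpt above and may differ by version.
HEAL = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2}) [^\]]+\].*"
    r"0-(\S+)-replicate-\d+: Completed (\w+) selfheal on ([0-9a-f-]{36})"
)

SAMPLE_LINES = [
    "[2018-08-09 14:23:37.689403] I [MSGID: 108026] "
    "[afr-self-heal-common.c:1656:afr_log_selfheal] 0-gv1-replicate-0: "
    "Completed data selfheal on 407bd97b-e76c-4f81-8f59-7dae11507b0c. "
    "sources=[0]  sinks=1",
    "[2018-08-09 14:44:37.933143] I [MSGID: 108026] "
    "[afr-self-heal-common.c:1656:afr_log_selfheal] 0-gv2-replicate-0: "
    "Completed data selfheal on 73713556-5b63-4f91-b83d-d7d82fee111f. "
    "sources=[0]  sinks=1",
]


def heals_per_day(lines):
    """Count completed self-heals keyed by (day, volume, heal type, gfid)."""
    counts = Counter()
    for line in lines:
        match = HEAL.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


for key, count in sorted(heals_per_day(SAMPLE_LINES).items()):
    print(key, count)
```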

          <p>The qcow2 files are being healed several times a day (up to
            30 on occasion). As I understand it, this means that a data
            heal occurred on the files with gfid 407b... and 7371... from
            source to sink. Is that local server to replica server? Is it
            OK for the shd to heal files in the replicated brick that
            supposedly has no writes on it besides the mirroring? How
            does that work?</p>

        </blockquote>

        In AFR, for writes, there is no notion of local/remote brick. No
        matter from which client you write to the volume, the write gets
        sent to both bricks, i.e. the replication is synchronous and
        real-time. <br>

         <br>

        <blockquote type="cite"

          cite="mid:737ac110-72ef-8499-db9a-a0be0a5134c7@avature.net">

          <p>How does AFR replication work? The file with gfid 7371...
            is the qcow2 root disk of an ownCloud server with 17GB of
            data. It does not seem big enough to be a bottleneck of
            some sort, I think.</p>

          <p>Also, I was investigating the directory tree in
            brick/.glusterfs/indices and I noticed that in both xattrop
            and dirty I always have a file named xattrop-xxxxxx and
            dirty-xxxxxx, respectively. I read that the xattrop file is
            like a parent file or handle, and that other files are
            created there as hardlinks to it, named by gfid, for the shd
            to heal. Is it the same case for the ones in the dirty dir?</p>

        </blockquote>

        Yes, before the write, the gfid gets captured inside dirty on

        all bricks. If the write is successful, it gets removed. In

        addition, if the write fails on one brick, the other brick will

        capture the gfid inside xattrop.<br>
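        <p>The bookkeeping described above can be sketched as a toy model.
          This is not GlusterFS code: the Brick class, the
          replicated_write function and the in-memory sets standing in
          for the dirty and xattrop index directories are all
          hypothetical, and a brick that is already offline is simply
          skipped in the pre-op step.</p>

```python
class Brick:
    """Toy stand-in for one brick and its two index directories."""

    def __init__(self, name, online=True):
        self.name = name
        self.online = online
        self.dirty = set()    # gfids with a write in flight
        self.xattrop = set()  # gfids whose write failed on another brick
        self.data = {}

    def write(self, gfid, payload):
        """Apply the write; return False when the brick is unreachable."""
        if not self.online:
            return False
        self.data[gfid] = payload
        return True


def replicated_write(bricks, gfid, payload):
    """Send one write to every brick and update the toy indices."""
    # Pre-op: record the gfid in dirty on every reachable brick.
    for brick in bricks:
        if brick.online:
            brick.dirty.add(gfid)

    # The write is sent to all bricks; none of them is "local".
    results = {brick.name: brick.write(gfid, payload) for brick in bricks}
    any_failed = not all(results.values())

    # Post-op: clear dirty where the write succeeded; if some other brick
    # failed, the surviving bricks remember the gfid in xattrop.
    for brick in bricks:
        if results[brick.name]:
            brick.dirty.discard(gfid)
            if any_failed:
                brick.xattrop.add(gfid)
    return results


b1, b2 = Brick("brick1"), Brick("brick2", online=False)
print(replicated_write([b1, b2], "example-gfid", b"data"))
print("pending heal recorded on brick1:", sorted(b1.xattrop))
```

        <p>In this model the entry left in xattrop on the surviving brick
          is what a later heal pass would pick up.</p>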

        <blockquote type="cite"

          cite="mid:737ac110-72ef-8499-db9a-a0be0a5134c7@avature.net">

          <p>Any help will be greatly appreciated. Thanks!<br>

          </p>

        </blockquote>

        If frequent heals are triggered, it could mean there are

        frequent network disconnects from the clients to the bricks as

        writes happen. You can check the mount logs and see if that is

        the case and investigate possible network issues.<br>

        <br>

        HTH,<br>

        Ravi <br>

        <blockquote type="cite"

          cite="mid:737ac110-72ef-8499-db9a-a0be0a5134c7@avature.net">

          <p> </p>

          <p>Pablo.<br>

          </p>

          <div class="moz-signature"><br>

            <br>

          </div>

          <br>


          <br>

          <pre wrap="">_______________________________________________

Gluster-users mailing list

<a class="moz-txt-link-abbreviated" href="mailto:Gluster-users@gluster.org" moz-do-not-send="true">Gluster-users@gluster.org</a>

<a class="moz-txt-link-freetext" href="https://lists.gluster.org/mailman/listinfo/gluster-users" moz-do-not-send="true">https://lists.gluster.org/mailman/listinfo/gluster-users</a></pre>

        </blockquote>

        <br>

      </blockquote>

      <br>

    </blockquote>

    <br>

  </body>

</html>