<div class="moz-cite-prefix">On 05/09/2017 12:59 PM, Alessandro
Briosi wrote:<br>
</div>
> On 08/05/2017 15:49, Alessandro Briosi wrote:
>> On 08/05/2017 12:57, Jesper Led Lauridsen TS Infra server wrote:
>>> I don't know if this has any relation to your issue, but I have
>>> seen several times during gluster healing that my VMs fail or are
>>> marked unresponsive in RHEV. My conclusion is that the load
>>> gluster puts on the VM images during checksumming while healing
>>> results in too much latency, and the VMs fail.
>>>
>>> My plan is to try using sharding, so the VM images/files are
>>> split into smaller files, to change the number of allowed
>>> concurrent heals ('cluster.background-self-heal-count'), and to
>>> disable 'cluster.self-heal-daemon'.
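>>>
>>> Roughly like this, I think (the volume name is a placeholder, the
>>> values are only examples, and note that sharding applies only to
>>> files created after it is enabled):
>>>
>>>   # split new VM images into smaller pieces so heals touch less data
>>>   gluster volume set <volname> features.shard on
>>>   gluster volume set <volname> features.shard-block-size 64MB
>>>   # cap the number of heals running in the background at once
>>>   gluster volume set <volname> cluster.background-self-heal-count 4
>>>   # stop the self-heal daemon from kicking off heals on its own
>>>   gluster volume set <volname> cluster.self-heal-daemon off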
>>
>> The thing is that there are no heal processes running, and no log
>> entries either.
>> A few days ago I had a failure, and the heal process started and
>> finished without any problems.
>>
>> I do not use sharding yet.
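>>
>> (For reference, anything still pending heal should show up in the
>> output of 'gluster volume heal <volname> info'.)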
>
> Well, it happened again, on a different volume and with a different
> VM.
>
> This time a self-heal process was started.
>
> Why is this happening? There are no network problems on the hosts,
> and they all have bonded 2x1Gbit NICs dedicated to gluster...
>
> Is there any information I can give you to find out what happened?
>
> This is the only mention of heal in the logs:
> [2017-05-08 17:34:40.474774] I [MSGID: 108026]
> [afr-self-heal-common.c:1254:afr_log_selfheal]
> 0-datastore1-replicate-0: Completed data selfheal on
> bc8f6a7e-31e5-4b48-946c-f779a4b2e64f. sources=[1] sinks=0 2
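> (For context: messages like this come from the afr translator and,
> for daemon-driven heals, are typically written to
> /var/log/glusterfs/glustershd.log; a search along the lines of
> 'grep -i selfheal /var/log/glusterfs/glustershd.log' should turn up
> any others.)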
>
> The VM went down an hour and a half earlier:
> [2017-05-08 15:54:11.781749] I [MSGID: 115036]
> [server.c:548:server_rpc_notify] 0-datastore1-server:
> disconnecting connection from
> srvpve1-59243-2017/05/04-00:51:23:914158-datastore1-client-0-0-0
> [2017-05-08 15:54:11.781749] I [MSGID: 115036]
> [server.c:548:server_rpc_notify] 0-datastore1-server:
> disconnecting connection from
> srvpve1-59243-2017/05/04-00:51:23:790337-datastore1-client-0-0-0
> [2017-05-08 15:54:11.781840] I [MSGID: 115013]
> [server-helpers.c:293:do_fd_cleanup] 0-datastore1-server: fd
> cleanup on /images/101/vm-101-disk-2.qcow2
> [2017-05-08 15:54:11.781838] W
> [inodelk.c:399:pl_inodelk_log_cleanup] 0-datastore1-server:
> releasing lock on bc8f6a7e-31e5-4b48-946c-f779a4b2e64f held by
> {client=0x7ffa7c0051f0, pid=0 lk-owner=5c600023827f0000}
> [2017-05-08 15:54:11.781863] I [MSGID: 115013]
> [server-helpers.c:293:do_fd_cleanup] 0-datastore1-server: fd
> cleanup on /images/101/vm-101-disk-1.qcow2
> [2017-05-08 15:54:11.781947] I [MSGID: 101055]
> [client_t.c:415:gf_client_unref] 0-datastore1-server: Shutting
> down connection
> srvpve1-59243-2017/05/04-00:51:23:914158-datastore1-client-0-0-0
> [2017-05-08 15:54:11.781971] I [MSGID: 101055]
> [client_t.c:415:gf_client_unref] 0-datastore1-server: Shutting
> down connection
> srvpve1-59243-2017/05/04-00:51:23:790337-datastore1-client-0-0-0
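> (These entries are server-side, so they come from the brick log;
> something like
>   grep -E 'server_rpc_notify|fd_cleanup|inodelk' /var/log/glusterfs/bricks/<brick>.log
> should pull out the related lines, with the exact log name
> depending on the brick path.)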
>
> Any hint would be greatly appreciated.

Can you share the log of the fuse mount
(/var/log/glusterfs/<path-to-mount-point>.log) on the host the VM was
running on? When you say 'VM going down', do you mean it
paused/became unresponsive?
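For example, if the volume is fuse-mounted at /mnt/pve/datastore1 (a
guess at your mount point), the client log is named after the mount
path with the slashes replaced by dashes:

  /var/log/glusterfs/mnt-pve-datastore1.log
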
-Ravi

> Alessandro
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users