<html>
  <head>
    <meta content="text/html; charset=windows-1252"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">On 05/09/2017 12:59 PM, Alessandro
      Briosi wrote:<br>
    </div>
    <blockquote
      cite="mid:e278d975-3fd7-0fd2-3aa7-4d95fe081384@metalit.com"
      type="cite">
      <meta content="text/html; charset=windows-1252"
        http-equiv="Content-Type">
      <div class="moz-cite-prefix">Il 08/05/2017 15:49, Alessandro
        Briosi ha scritto:<br>
      </div>
      <blockquote
        cite="mid:5a70610f-75ff-453b-355a-ae0a5037a9d2@metalit.com"
        type="cite">
        <div class="moz-cite-prefix">Il 08/05/2017 12:57, Jesper Led
          Lauridsen TS Infra server ha scritto:<br>
        </div>
        <blockquote
cite="mid:DB6PR0202MB2568C7EFC5390C8DEADAFE0AB2EE0@DB6PR0202MB2568.eurprd02.prod.outlook.com"
          type="cite">
          <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;mso-fareast-language:EN-US"
              lang="EN-US">I dont know if this has any relation to you
              issue. But I have seen several times during gluster
              healing that my wm’s fail or are marked unresponsive in
              rhev. My conclusion is that the load gluster puts on the
              wm-images during checksum while healing, result in to much
              latency and wm’s fail.<o:p></o:p></span></p>
          <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;mso-fareast-language:EN-US"
              lang="EN-US"><o:p> </o:p></span></p>
          <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;mso-fareast-language:EN-US"
              lang="EN-US">My plans is to try using sharding, so the
              wm-images/files are split into smaller files, changing the
              number of allowed concurrent heals
              ‘cluster.background-self-heal-count’ and disabling
              ‘cluster.self-heal-daemon’.<o:p></o:p></span></p>
          <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;mso-fareast-language:EN-US"
              lang="EN-US"><o:p></o:p></span></p>
        </blockquote>
        <br>
        The thing is that there are no heal processes running and no
        log entries either.<br>
        A few days ago I had a failure, and the heal process started
        and finished without any problems.<br>
        <br>
        I do not use sharding yet.<br>
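        <br>
        (For what it's worth, a quick way to confirm whether anything
        is pending heal would be something like the following; the
        volume name is taken from the logs below and may need
        adjusting:)<br>
        <pre wrap=""># list files/gfids that still need healing, per brick
gluster volume heal datastore1 info

# summary count of entries pending heal
gluster volume heal datastore1 statistics heal-count</pre>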
      </blockquote>
      <br>
      Well, it happened again on a different volume and a different VM.<br>
      <br>
      This time a self-heal process was started.<br>
      <br>
      Why is this happening? There are no network problems on the
      hosts, and they all have bonded 2x1Gbit NICs dedicated to
      Gluster...<br>
      <br>
      Is there any information I can give you to find out what happened?<br>
      <br>
      This is the only mention of heal in the logs:<br>
      [2017-05-08 17:34:40.474774] I [MSGID: 108026]
      [afr-self-heal-common.c:1254:afr_log_selfheal]
      0-datastore1-replicate-0: Completed data selfheal on
      bc8f6a7e-31e5-4b48-946c-f779a4b2e64f. sources=[1]  sinks=0 2<br>
      <br>
      The VM had gone down an hour and a half earlier:<br>
      [2017-05-08 15:54:11.781749] I [MSGID: 115036]
      [server.c:548:server_rpc_notify] 0-datastore1-server:
      disconnecting connection from
      srvpve1-59243-2017/05/04-00:51:23:914158-datastore1-client-0-0-0<br>
      [2017-05-08 15:54:11.781749] I [MSGID: 115036]
      [server.c:548:server_rpc_notify] 0-datastore1-server:
      disconnecting connection from
      srvpve1-59243-2017/05/04-00:51:23:790337-datastore1-client-0-0-0<br>
      [2017-05-08 15:54:11.781840] I [MSGID: 115013]
      [server-helpers.c:293:do_fd_cleanup] 0-datastore1-server: fd
      cleanup on /images/101/vm-101-disk-2.qcow2<br>
      [2017-05-08 15:54:11.781838] W
      [inodelk.c:399:pl_inodelk_log_cleanup] 0-datastore1-server:
      releasing lock on bc8f6a7e-31e5-4b48-946c-f779a4b2e64f held by
      {client=0x7ffa7c0051f0, pid=0 lk-owner=5c600023827f0000}<br>
      [2017-05-08 15:54:11.781863] I [MSGID: 115013]
      [server-helpers.c:293:do_fd_cleanup] 0-datastore1-server: fd
      cleanup on /images/101/vm-101-disk-1.qcow2<br>
      [2017-05-08 15:54:11.781947] I [MSGID: 101055]
      [client_t.c:415:gf_client_unref] 0-datastore1-server: Shutting
      down connection
      srvpve1-59243-2017/05/04-00:51:23:914158-datastore1-client-0-0-0<br>
      [2017-05-08 15:54:11.781971] I [MSGID: 101055]
      [client_t.c:415:gf_client_unref] 0-datastore1-server: Shutting
      down connection
      srvpve1-59243-2017/05/04-00:51:23:790337-datastore1-client-0-0-0<br>
      <br>
      <br>
      Any hint would be greatly appreciated.<br>
    </blockquote>
    <br>
    Can you share the log of the fuse mount
    (/var/log/glusterfs/&lt;path-to-mount-point&gt;.log) on which the VM
    was running? When you say 'VM going down', do you mean it
    paused/became unresponsive?<br>
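    (For reference, the client log file is normally named after the
    mount point with the slashes replaced by hyphens; the mount point
    below is only a hypothetical example:)<br>
    <pre wrap=""># e.g. a volume mounted at /mnt/pve/datastore1 would log to:
#   /var/log/glusterfs/mnt-pve-datastore1.log
# look for disconnect / ping timer messages around the time the VM went down
grep -iE "disconnect|ping timer" /var/log/glusterfs/mnt-pve-datastore1.log</pre>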
    -Ravi<br>
    <blockquote
      cite="mid:e278d975-3fd7-0fd2-3aa7-4d95fe081384@metalit.com"
      type="cite"> <br>
      Alessandro<br>
      <br>
      <br>
      <fieldset class="mimeAttachmentHeader"></fieldset>
      <br>
      <pre wrap="">_______________________________________________
Gluster-users mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a>
<a class="moz-txt-link-freetext" href="http://lists.gluster.org/mailman/listinfo/gluster-users">http://lists.gluster.org/mailman/listinfo/gluster-users</a></pre>
    </blockquote>
    <p><br>
    </p>
  </body>
</html>