On 08/05/2017 15:49, Alessandro Briosi wrote:
> On 08/05/2017 12:57, Jesper Led Lauridsen TS Infra server wrote:
>> I don't know if this has any relation to your issue, but I have seen it
>> several times during gluster healing that my VMs fail or are marked
>> unresponsive in RHEV. My conclusion is that the load gluster puts on the
>> VM images during checksumming while healing results in too much latency,
>> and the VMs fail.
>>
>> My plan is to try using sharding, so the VM images/files are split into
>> smaller files, changing the number of allowed concurrent heals
>> ('cluster.background-self-heal-count'), and disabling
>> 'cluster.self-heal-daemon'.
>
> The thing is that there are no heal processes running, and no log entries
> either.
> A few days ago I had a failure, and the heal process started and finished
> without any problems.
>
> I do not use sharding yet.
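
As a side note on the sharding / self-heal tuning suggested above (which I have not tried yet): if I read the option names correctly, on my volume it would be roughly the following. The 64MB block size and the heal count of 4 are just guesses on my part, not tested values:

  # enable sharding (as far as I understand, this only affects newly created files)
  gluster volume set datastore1 features.shard on
  # shard block size; 64MB is, I believe, the default
  gluster volume set datastore1 features.shard-block-size 64MB
  # limit the number of concurrent background heals
  gluster volume set datastore1 cluster.background-self-heal-count 4
  # disable the self-heal daemon
  gluster volume set datastore1 cluster.self-heal-daemon off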

Well, it happened again, on a different volume and a different VM.

This time a self-heal process was started.

Why is this happening? There are no network problems on the hosts, and they
all have bonded 2x1Gbit NICs dedicated to gluster...

Is there any information I can give you to find out what happened?

This is the only mention of a heal in the logs:
[2017-05-08 17:34:40.474774] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-datastore1-replicate-0: Completed data selfheal on bc8f6a7e-31e5-4b48-946c-f779a4b2e64f. sources=[1] sinks=0 2

The VM went down about an hour and a half before that:
[2017-05-08 15:54:11.781749] I [MSGID: 115036] [server.c:548:server_rpc_notify] 0-datastore1-server: disconnecting connection from srvpve1-59243-2017/05/04-00:51:23:914158-datastore1-client-0-0-0
[2017-05-08 15:54:11.781749] I [MSGID: 115036] [server.c:548:server_rpc_notify] 0-datastore1-server: disconnecting connection from srvpve1-59243-2017/05/04-00:51:23:790337-datastore1-client-0-0-0
[2017-05-08 15:54:11.781840] I [MSGID: 115013] [server-helpers.c:293:do_fd_cleanup] 0-datastore1-server: fd cleanup on /images/101/vm-101-disk-2.qcow2
[2017-05-08 15:54:11.781838] W [inodelk.c:399:pl_inodelk_log_cleanup] 0-datastore1-server: releasing lock on bc8f6a7e-31e5-4b48-946c-f779a4b2e64f held by {client=0x7ffa7c0051f0, pid=0 lk-owner=5c600023827f0000}
[2017-05-08 15:54:11.781863] I [MSGID: 115013] [server-helpers.c:293:do_fd_cleanup] 0-datastore1-server: fd cleanup on /images/101/vm-101-disk-1.qcow2
[2017-05-08 15:54:11.781947] I [MSGID: 101055] [client_t.c:415:gf_client_unref] 0-datastore1-server: Shutting down connection srvpve1-59243-2017/05/04-00:51:23:914158-datastore1-client-0-0-0
[2017-05-08 15:54:11.781971] I [MSGID: 101055] [client_t.c:415:gf_client_unref] 0-datastore1-server: Shutting down connection srvpve1-59243-2017/05/04-00:51:23:790337-datastore1-client-0-0-0
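
If it helps, I can collect and attach more state from the hosts; I am only guessing at what would be useful, but I was thinking of something like:

  gluster volume info datastore1
  gluster volume status datastore1
  gluster volume heal datastore1 info

plus the brick and glustershd logs from /var/log/glusterfs/ on the hosts.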

Any hint would be greatly appreciated.

Alessandro