<html>
  <head>

    <meta http-equiv="content-type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p>Hi,</p>
    <p>A client has a glusterfs cluster that's behaving weirdly after
      some issues during upgrade.</p>
    <p>They upgraded a GlusterFS 2+1 cluster (replica with arbiter) from
      3.10.9 to 3.12.4 on CentOS and now have weird issues, and some
      files may be corrupted. They also switched from NFS-Ganesha,
      which crashed every couple of days, to GlusterFS subdirectory
      mounting. Subdirectory mounting was the point of the upgrade.</p>
    <p>On a client, when I switch to a user and go to the home folder,
      the first "ls -lart" lists the files, but a second run returns
      "ls: cannot open directory .: Permission denied" until I log out
      and log in again. After that it mostly works once more, though
      sometimes it fails on the first attempt. Deploying something
      into this folder works sometimes; other times I get a write
      permission denied error on /home/user.<br>
    </p>
    <p>I've enabled the bitrot daemon, but the scrub is still running:</p>
    <blockquote>
      <p>Volume name : home<br>
        <br>
        State of scrub: Active (In Progress)<br>
        <br>
        Scrub impact: normal<br>
        <br>
        Scrub frequency: biweekly<br>
        <br>
        Bitrot error log location: /var/log/glusterfs/bitd.log<br>
        <br>
        Scrubber error log location: /var/log/glusterfs/scrub.log<br>
        <br>
        <br>
        =========================================================<br>
        <br>
        Node: localhost<br>
        <br>
        Number of Scrubbed files: 152451<br>
        <br>
        Number of Skipped files: 197<br>
        <br>
        Last completed scrub time: Scrubber pending to complete.<br>
        <br>
        Duration of last scrub (D:M:H:M:S): 0:0:0:0<br>
        <br>
        Error count: 0<br>
        <br>
        <br>
        =========================================================<br>
        <br>
        Node: gluster01<br>
        <br>
        Number of Scrubbed files: 150198<br>
        <br>
        Number of Skipped files: 190<br>
        <br>
        Last completed scrub time: Scrubber pending to complete.<br>
        <br>
        Duration of last scrub (D:M:H:M:S): 0:0:0:0<br>
        <br>
        Error count: 0<br>
        <br>
        <br>
        =========================================================<br>
        <br>
        Node: gluster02<br>
        <br>
        Number of Scrubbed files: 0<br>
        <br>
        Number of Skipped files: 153939<br>
        <br>
        Last completed scrub time: Scrubber pending to complete.<br>
        <br>
        Duration of last scrub (D:M:H:M:S): 0:0:0:0<br>
        <br>
        Error count: 0<br>
        <br>
        =========================================================<br>
        <br>
      </p>
    </blockquote>
    <p>Gluster volume heal reports one failed entry:<br>
    </p>
    <blockquote>Ending time of crawl: Thu Jan 18 11:49:04 2018<br>
      <br>
      Type of crawl: INDEX<br>
      No. of entries healed: 0<br>
      No. of entries in split-brain: 0<br>
      No. of heal failed entries: 0<br>
      <br>
      Starting time of crawl: Thu Jan 18 11:59:04 2018<br>
      <br>
      Ending time of crawl: Thu Jan 18 11:59:06 2018<br>
      <br>
      Type of crawl: INDEX<br>
      No. of entries healed: 0<br>
      No. of entries in split-brain: 0<br>
      <b>No. of heal failed entries: 1</b><br>
      <br>
      Starting time of crawl: Thu Jan 18 12:09:06 2018<br>
      <br>
      Ending time of crawl: Thu Jan 18 12:09:07 2018<br>
      <br>
      Type of crawl: INDEX<br>
      No. of entries healed: 0<br>
      No. of entries in split-brain: 0<br>
      No. of heal failed entries: 0<br>
      <br>
    </blockquote>
    <p>But I'm unable to trace which file it is, because "gluster volume
      heal &lt;volume&gt; info heal-failed" no longer exists:<br>
    </p>
    <blockquote>
      <p><br>
        gluster volume heal home info heal-failed<br>
        <br>
        Usage:<br>
        volume heal &lt;VOLNAME&gt; [enable | disable | full |statistics
        [heal-count [replica &lt;HOSTNAME:BRICKNAME&gt;]] |info
        [split-brain] |split-brain {bigger-file &lt;FILE&gt; |
        latest-mtime &lt;FILE&gt; |source-brick
        &lt;HOSTNAME:BRICKNAME&gt; [&lt;FILE&gt;]} |granular-entry-heal
        {enable | disable}]<br>
        <br>
      </p>
    </blockquote>
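    <p>As a stopgap, the crawl that reported the failed entry can at
      least be picked out of the statistics output quoted above, which
      gives a time window to search in the self-heal daemon log
      (presumably /var/log/glusterfs/glustershd.log). This is only a
      minimal sketch, assuming the output format shown above; the
      function name and the embedded sample are mine, for illustration.</p>
<pre>
```python
# Sample copied from the heal statistics output quoted earlier in this mail.
# In practice this text would be the captured output of the statistics command.
SAMPLE = """\
Starting time of crawl: Thu Jan 18 11:59:04 2018

Ending time of crawl: Thu Jan 18 11:59:06 2018

Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 1

Starting time of crawl: Thu Jan 18 12:09:06 2018

Ending time of crawl: Thu Jan 18 12:09:07 2018

Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 0
"""


def failed_crawls(text):
    """Return (start, end, failed_count) for each crawl with failed entries."""
    results = []
    start = end = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Starting time of crawl:"):
            start = line.split(":", 1)[1].strip()
        elif line.startswith("Ending time of crawl:"):
            end = line.split(":", 1)[1].strip()
        elif line.startswith("No. of heal failed entries:"):
            count = int(line.split(":", 1)[1])
            if count:
                results.append((start, end, count))
            # This line closes a crawl record, so reset for the next one.
            start = end = None
    return results


if __name__ == "__main__":
    for start, end, count in failed_crawls(SAMPLE):
        print(f"{count} failed entries between {start} and {end}")
```
</pre>
    <p>This does not name the file; it only narrows down when the
      failure happened, so the logs can be searched for that interval.</p>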
    <p>The things I'm seeing in the logfiles on the client:<br>
    </p>
    <p>       [2018-01-18 08:59:53.360864] W [MSGID: 114031]
      [client-rpc-fops.c:2151:client3_3_seek_cbk] 0-home-client-0:
      remote operation failed [No such device or address]<br>
             [2018-01-18 09:00:17.512636] W [MSGID: 114031]
      [client-rpc-fops.c:2151:client3_3_seek_cbk] 0-home-client-0:
      remote operation failed [No such device or address]<br>
             [2018-01-18 09:00:27.473702] W [MSGID: 114031]
      [client-rpc-fops.c:2151:client3_3_seek_cbk] 0-home-client-0:
      remote operation failed [No such device or address]<br>
             [2018-01-18 09:00:40.372756] W [MSGID: 114031]
      [client-rpc-fops.c:2151:client3_3_seek_cbk] 0-home-client-0:
      remote operation failed [No such device or address]<br>
             [2018-01-18 09:00:50.344597] W [MSGID: 114031]
      [client-rpc-fops.c:2151:client3_3_seek_cbk] 0-home-client-0:
      remote operation failed [No such device or address]<br>
    </p>
    <p>These worry me; there are several of them, all without a gfid:<br>
    </p>
    <blockquote> [MSGID: 109063] [dht-layout.c:716:dht_layout_normalize]
      5-home-dht: Found anomalies in (null) (gfid =
      00000000-0000-0000-0000-000000000000). Holes=1 overlaps=0<br>
    </blockquote>
    Also, it seems to disconnect and reconnect:<br>
    <blockquote>[2018-01-18 08:38:41.210848] I [MSGID: 114057]
      [client-handshake.c:1478:select_server_supported_programs]
      0-home-client-1: Using Program GlusterFS 3.3, Num (1298437),
      Version (330)<br>
      [2018-01-18 08:38:41.214548] I [MSGID: 114057]
      [client-handshake.c:1478:select_server_supported_programs]
      0-home-client-2: Using Program GlusterFS 3.3, Num (1298437),
      Version (330)<br>
      [2018-01-18 08:38:41.255458] I [MSGID: 114046]
      [client-handshake.c:1231:client_setvolume_cbk] 0-home-client-0:
      Connected to home-client-0, attached to remote volume
      '/data/home/brick1'.<br>
      [2018-01-18 08:38:41.255505] I [MSGID: 114047]
      [client-handshake.c:1242:client_setvolume_cbk] 0-home-client-0:
      Server and Client lk-version numbers are not same, reopening the
      fds<br>
      [2018-01-18 08:38:41.255643] I [MSGID: 108005]
      [afr-common.c:4929:__afr_handle_child_up_event] 0-home-replicate-0<b>:
        Subvolume 'home-client-0' came back up; going online.</b><br>
      [2018-01-18 08:38:41.255791] I [MSGID: 114035]
      [client-handshake.c:202:client_set_lk_version_cbk]
      0-home-client-0: Server lk version = 1<br>
      [2018-01-18 08:38:41.285774] I [MSGID: 114046]
      [client-handshake.c:1231:client_setvolume_cbk] 0-home-client-1:
      Connected to home-client-1, attached to remote volume
      '/data/home/brick1'.<br>
      [2018-01-18 08:38:41.285815] I [MSGID: 114047]
      [client-handshake.c:1242:client_setvolume_cbk] 0-home-client-1:
      Server and Client lk-version numbers are not same, reopening the
      fds<br>
      [2018-01-18 08:38:41.286672] I [MSGID: 114035]
      [client-handshake.c:202:client_set_lk_version_cbk]
      0-home-client-1: Server lk version = 1<br>
      [2018-01-18 08:38:41.291928] I [MSGID: 114046]
      [client-handshake.c:1231:client_setvolume_cbk] 0-home-client-2:
      Connected to home-client-2, attached to remote volume
      '/data/home/brick1'.<br>
      [2018-01-18 08:38:41.291970] I [MSGID: 114047]
      [client-handshake.c:1242:client_setvolume_cbk] 0-home-client-2:
      Server and Client lk-version numbers are not same, reopening the
      fds<br>
      [2018-01-18 08:38:41.292750] I [MSGID: 114035]
      [client-handshake.c:202:client_set_lk_version_cbk]
      0-home-client-2: Server lk version = 1<br>
      [2018-01-18 08:38:41.764822] I [MSGID: 108031]
      [afr-common.c:2376:afr_local_discovery_cbk] 0-home-replicate-0:
      selecting local read_child home-client-0<br>
    </blockquote>
    <br>
    <blockquote>[2018-01-18 08:38:39.039054] I
      [addr.c:55:compare_addr_and_update] 0-/data/home/brick1: allowed =
      "*", received addr = "*.*.*.*"<br>
      [2018-01-18 08:38:39.040813] I [login.c:76:gf_auth] 0-auth/login:
      allowed user names: 2d225def-3f34-472a-b8e4-c183acafc151<br>
      [2018-01-18 08:38:39.040853] I [MSGID: 115029]
      [server-handshake.c:793:server_setvolume] 0-home-server: accepted
      client from
gluster00.cluster.local-21570-2018/01/18-08:38:38:940331-home-client-0-0-0
      (version: 3.12.4)<br>
      [2018-01-18 08:38:39.412234] I [MSGID: 115036]
      [server.c:527:server_rpc_notify] 0-home-server: disconnecting
      connection from
gluster00.cluster.local-21570-2018/01/18-08:38:38:940331-home-client-0-0-0<br>
      [2018-01-18 08:38:39.435180] I [MSGID: 101055]
      [client_t.c:443:gf_client_unref] 0-home-server: Shutting down
      connection
gluster00.cluster.local-21570-2018/01/18-08:38:38:940331-home-client-0-0-0<br>
      [2018-01-18 08:38:41.203430] I [addr.c:55:compare_addr_and_update]
      0-/data/home/brick1: allowed = "*", received addr = "*.*.*.*"<br>
      [2018-01-18 08:38:41.203464] I [login.c:76:gf_auth] 0-auth/login:
      allowed user names: 2d225def-3f34-472a-b8e4-c183acafc151<br>
      [2018-01-18 08:38:41.203502] I [MSGID: 115029]
      [server-handshake.c:793:server_setvolume] 0-home-server: accepted
      client from
gluster00.cluster.local-21590-2018/01/18-08:38:41:174726-home-client-0-0-0
      (version: 3.12.4)<br>
      [2018-01-18 08:38:41.787891] I [MSGID: 115036]
      [server.c:527:server_rpc_notify] 0-home-server: disconnecting
      connection from
gluster00.cluster.local-21590-2018/01/18-08:38:41:174726-home-client-0-0-0<br>
      [2018-01-18 08:38:41.790071] I [MSGID: 101055]
      [client_t.c:443:gf_client_unref] 0-home-server: Shutting down
      connection
gluster00.cluster.local-21590-2018/01/18-08:38:41:174726-home-client-0-0-0<br>
      [2018-01-18 08:38:56.298125] I
      [glusterfsd-mgmt.c:52:mgmt_cbk_spec] 0-mgmt: Volume file changed<br>
      [2018-01-18 08:38:56.384120] I
      [glusterfsd-mgmt.c:52:mgmt_cbk_spec] 0-mgmt: Volume file changed<br>
      [2018-01-18 08:38:56.394284] I
      [glusterfsd-mgmt.c:1821:mgmt_getspec_cbk] 0-glusterfs: No change
      in volfile,continuing<br>
      [2018-01-18 08:38:56.450621] I
      [glusterfsd-mgmt.c:1821:mgmt_getspec_cbk] 0-glusterfs: No change
      in volfile,continuing<br>
      [2018-01-18 08:39:17.606237] E [MSGID: 113107]
      [posix.c:1150:posix_seek] 0-home-posix: seek failed on fd 639
      length 1166 [No such device or address]<br>
      [2018-01-18 08:39:17.606310] E [MSGID: 115089]
      [server-rpc-fops.c:2090:server_seek_cbk] 0-home-server: 1432655:
      SEEK-2 (c20376ae-7db9-4340-950e-dba6aa95e848), client:
      client1.cluster.local-5710-2018/01/17-22:02:07:122452-home-client-0-0-0,
      error-xlator: home-posix [No such device or address]<br>
      [2018-01-18 08:39:17.610233] E [MSGID: 113107]
      [posix.c:1150:posix_seek] 0-home-posix: seek failed on fd 639
      length 1166 [No such device or address]<br>
      [2018-01-18 08:39:17.610285] E [MSGID: 115089]
      [server-rpc-fops.c:2090:server_seek_cbk] 0-home-server: 1432656:
      SEEK-2 (c20376ae-7db9-4340-950e-dba6aa95e848), client:
      client1.cluster.local-5710-2018/01/17-22:02:07:122452-home-client-0-0-0,
      error-xlator: home-posix [No such device or address]<br>
      [2018-01-18 08:37:53.592700] E [MSGID: 113107]
      [posix.c:1150:posix_seek] 0-home-posix: seek failed on fd 1993
      length 2279 [No such device or address]<br>
      [2018-01-18 08:39:53.583741] E [MSGID: 113107]
      [posix.c:1150:posix_seek] 0-home-posix: seek failed on fd 1948
      length 1758 [No such device or address]<br>
      [2018-01-18 08:39:53.583822] E [MSGID: 115089]
      [server-rpc-fops.c:2090:server_seek_cbk] 0-home-server: 641734:
      SEEK-2 (96dc4c82-ab85-4477-a6e6-1116466f68ba), client:
      client1.cluster.local-5772-2018/01/17-22:02:07:201351-home-client-0-0-0,
      error-xlator: home-posix [No such device or address]<br>
      <br>
    </blockquote>
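    <p>To see which files the seek errors refer to, the GFIDs in the
      brick log can be counted and mapped to their backend entries.
      This is a rough sketch, assuming each log entry sits on one line
      in the real log (the lines above are wrapped by the mail client)
      and that the brick root is /data/home/brick1 as in the log; the
      function names are hypothetical.</p>
<pre>
```python
import re
from collections import Counter

# Matches the GFID in brick-log entries such as
# "... SEEK-2 (c20376ae-7db9-4340-950e-dba6aa95e848), client: ..."
SEEK_GFID = re.compile(
    r"SEEK-2 \(([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\)"
)

# The three server_seek_cbk entries from the brick log above, unwrapped.
SAMPLE_LOG = """\
[2018-01-18 08:39:17.606310] E [MSGID: 115089] [server-rpc-fops.c:2090:server_seek_cbk] 0-home-server: 1432655: SEEK-2 (c20376ae-7db9-4340-950e-dba6aa95e848), client: client1.cluster.local-5710-2018/01/17-22:02:07:122452-home-client-0-0-0, error-xlator: home-posix [No such device or address]
[2018-01-18 08:39:17.610285] E [MSGID: 115089] [server-rpc-fops.c:2090:server_seek_cbk] 0-home-server: 1432656: SEEK-2 (c20376ae-7db9-4340-950e-dba6aa95e848), client: client1.cluster.local-5710-2018/01/17-22:02:07:122452-home-client-0-0-0, error-xlator: home-posix [No such device or address]
[2018-01-18 08:39:53.583822] E [MSGID: 115089] [server-rpc-fops.c:2090:server_seek_cbk] 0-home-server: 641734: SEEK-2 (96dc4c82-ab85-4477-a6e6-1116466f68ba), client: client1.cluster.local-5772-2018/01/17-22:02:07:201351-home-client-0-0-0, error-xlator: home-posix [No such device or address]
"""


def seek_failures_by_gfid(log_text):
    """Count failed SEEK operations per GFID in a brick log."""
    return Counter(SEEK_GFID.findall(log_text))


def gfid_backend_path(brick_root, gfid):
    """Build the .glusterfs entry for a GFID on a brick.

    For regular files this entry is a hard link to the data file.
    """
    return f"{brick_root}/.glusterfs/{gfid[:2]}/{gfid[2:4]}/{gfid}"


if __name__ == "__main__":
    for gfid, count in seek_failures_by_gfid(SAMPLE_LOG).most_common():
        print(count, gfid_backend_path("/data/home/brick1", gfid))
```
</pre>
    <p>The printed paths can then be inspected on the brick itself to
      find out which user files are affected.</p>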
    <br>
    <p>With kind regards,<br>
      <br>
      Jorick Astrego<br>
      Netbulae Virtualization Experts</p>
</body>
</html>