<html><head><style>#x8f9ef8de84f5426e85fa8d9e48372580{
        font-family:Tahoma;
        font-size:12pt;
}</style>
<style id="signatureStyle"><!--#x64746296573c472
{font-family: 'Segoe UI'; font-size: 12pt;}
--></style>
<style id="css_styles">
blockquote.cite { margin-left: 5px; margin-right: 0px; padding-left: 10px; padding-right: 0px; border-left: 1px solid #cccccc }
blockquote.cite2 { margin-left: 5px; margin-right: 0px; padding-left: 10px; padding-right: 0px; border-left: 1px solid #cccccc; margin-top: 3px; padding-top: 0px; }
a img { border: 0px; }
li[style='text-align: center;'], li[style='text-align: right;'] { list-style-position: inside; }
body { font-family: Tahoma; font-size: 12pt; }
</style>
</head>
<body><div>Hello all,</div><div><br /></div><div>We are having a rather interesting problem with one of our VM storage systems. The GlusterFS client is throwing errors related to GFID mismatches. We traced these to duplicate shards present on the gluster nodes, each with a different GFID.</div><div><br /></div><div>Hypervisor gluster mount log:</div><div><br /></div><div>[2018-03-25 18:54:19.261733] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-ovirt-zone1-shard: Lookup on shard 7 failed. Base file gfid = 87137cac-49eb-492a-8f33-8e33470d8cb7 [Stale file handle]</div><div>The message "W [MSGID: 109009] [dht-common.c:2162:dht_lookup_linkfile_cbk] 0-ovirt-zone1-dht: /.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7: gfid different on data file on ovirt-zone1-replicate-3, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 57c6fcdf-52bb-4f7a-aea4-02f0dc81ff56 " repeated 2 times between [2018-03-25 18:54:19.253748] and [2018-03-25 18:54:19.263576]</div><div>[2018-03-25 18:54:19.264349] W [MSGID: 109009] [dht-common.c:1901:dht_lookup_everywhere_cbk] 0-ovirt-zone1-dht: /.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7: gfid differs on subvolume ovirt-zone1-replicate-3, gfid local = fdf0813b-718a-4616-a51b-6999ebba9ec3, gfid node = 57c6fcdf-52bb-4f7a-aea4-02f0dc81ff56</div><div><br /></div><div><br /></div><div>On the storage nodes, we found this:</div><div><br /></div><div>[root@n1 gluster]# find -name 87137cac-49eb-492a-8f33-8e33470d8cb7.7</div><div>./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7</div><div>./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7</div><div><br /></div><div>[root@n1 gluster]# ls -lh ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7</div><div>---------T. 2 root root 0 Mar 25 13:55 ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7</div><div>[root@n1 gluster]# ls -lh ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7</div><div>-rw-rw----. 
2 root root 3.8G Mar 25 13:55 ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7</div>
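<div><br /></div><div>(For context, not from the original diagnostics: the 0-byte file on brick2 with mode "---------T" — sticky bit set, no rwx bits — is DHT's internal linkto file, a pointer to the subvolume that holds the real data, while brick4 holds the actual 3.8G shard. The trusted.glusterfs.dht.linkto xattr in the getfattr output below is the hex-encoded name of that subvolume; a minimal sketch of decoding it, assuming xxd is available:)</div>

```shell
# Decode the trusted.glusterfs.dht.linkto value seen on brick2.
# The value is hex-encoded ASCII; the trailing 00 is a NUL terminator,
# which tr strips so the result is a plain string.
printf '%s' 6f766972742d3335302d7a6f6e65312d7265706c69636174652d3300 \
  | xxd -r -p | tr -d '\0'
# -> ovirt-350-zone1-replicate-3
```

<div>So the stale linkfile on brick2 points at the replicate-3 subvolume, which matches the "gfid differs on subvolume ovirt-zone1-replicate-3" messages in the mount log above.</div>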
<div><br /></div><div id="signature_old"><div id="x64746296573c472"><div>[root@n1 gluster]# getfattr -d -m . -e hex ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7</div><div># file: brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7</div><div>security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000</div><div>trusted.gfid=0xfdf0813b718a4616a51b6999ebba9ec3</div><div>trusted.glusterfs.dht.linkto=0x6f766972742d3335302d7a6f6e65312d7265706c69636174652d3300</div><div><br /></div><div>[root@n1 gluster]# getfattr -d -m . -e hex ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7</div><div># file: brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7</div><div>security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000</div><div>trusted.afr.dirty=0x000000000000000000000000</div><div>trusted.bit-rot.version=0x020000000000000059914190000ce672</div><div>trusted.gfid=0x57c6fcdf52bb4f7aaea402f0dc81ff56</div><div><br /></div><div><br /></div><div>I'm wondering how these duplicate shards got created in the first place. Does anyone have any insight into how to fix this?</div><div><br /></div><div>Storage nodes:</div><div>[root@n1 gluster]# gluster --version</div><div>glusterfs 4.0.0</div><div><br /></div><div>[root@n1 gluster]# gluster volume info</div><div><br /></div><div>Volume Name: ovirt-350-zone1</div><div>Type: Distributed-Replicate</div><div>Volume ID: 106738ed-9951-4270-822e-63c9bcd0a20e</div><div>Status: Started</div><div>Snapshot Count: 0</div><div>Number of Bricks: 7 x (2 + 1) = 21</div><div>Transport-type: tcp</div><div>Bricks:</div><div>Brick1: 10.0.6.100:/gluster/brick1/brick</div><div>Brick2: 10.0.6.101:/gluster/brick1/brick</div><div>Brick3: 10.0.6.102:/gluster/arbrick1/brick (arbiter)</div><div>Brick4: 10.0.6.100:/gluster/brick2/brick</div><div>Brick5: 10.0.6.101:/gluster/brick2/brick</div><div>Brick6: 10.0.6.102:/gluster/arbrick2/brick (arbiter)</div><div>Brick7: 
10.0.6.100:/gluster/brick3/brick</div><div>Brick8: 10.0.6.101:/gluster/brick3/brick</div><div>Brick9: 10.0.6.102:/gluster/arbrick3/brick (arbiter)</div><div>Brick10: 10.0.6.100:/gluster/brick4/brick</div><div>Brick11: 10.0.6.101:/gluster/brick4/brick</div><div>Brick12: 10.0.6.102:/gluster/arbrick4/brick (arbiter)</div><div>Brick13: 10.0.6.100:/gluster/brick5/brick</div><div>Brick14: 10.0.6.101:/gluster/brick5/brick</div><div>Brick15: 10.0.6.102:/gluster/arbrick5/brick (arbiter)</div><div>Brick16: 10.0.6.100:/gluster/brick6/brick</div><div>Brick17: 10.0.6.101:/gluster/brick6/brick</div><div>Brick18: 10.0.6.102:/gluster/arbrick6/brick (arbiter)</div><div>Brick19: 10.0.6.100:/gluster/brick7/brick</div><div>Brick20: 10.0.6.101:/gluster/brick7/brick</div><div>Brick21: 10.0.6.102:/gluster/arbrick7/brick (arbiter)</div><div>Options Reconfigured:</div><div>cluster.min-free-disk: 50GB</div><div>performance.strict-write-ordering: off</div><div>performance.strict-o-direct: off</div><div>nfs.disable: off</div><div>performance.readdir-ahead: on</div><div>transport.address-family: inet</div><div>performance.cache-size: 1GB</div><div>features.shard: on</div><div>features.shard-block-size: 5GB</div><div>server.event-threads: 8</div><div>server.outstanding-rpc-limit: 128</div><div>storage.owner-uid: 36</div><div>storage.owner-gid: 36</div><div>performance.quick-read: off</div><div>performance.read-ahead: off</div><div>performance.io-cache: off</div><div>performance.stat-prefetch: on</div><div>cluster.eager-lock: enable</div><div>network.remote-dio: enable</div><div>cluster.quorum-type: auto</div><div>cluster.server-quorum-type: server</div><div>cluster.data-self-heal-algorithm: full</div><div>performance.flush-behind: off</div><div>performance.write-behind-window-size: 8MB</div><div>client.event-threads: 8</div><div>server.allow-insecure: on</div><div><br /></div><div><br /></div><div>Client version: </div><div>[root@kvm573 ~]# gluster --version</div><div>glusterfs 
3.12.5</div><div><br /></div><div><br /></div><div>Thanks!</div><div><br /></div>- Ian</div></div><div><br /></div>
</body></html>