Raghavendra,

Thanks! I'll get you this info within the next few days and will file a bug report at the same time.

For what it's worth, we were able to reproduce the issue on a completely new cluster running 3.13. The I/O pattern that most easily causes it to fail is a VM image format with XFS. Formatting VMs with ext4 will also create the additional shard files, but the GFIDs will usually match. I'm not sure whether there are supposed to be two identical shard filenames with one of them empty, but they don't seem to cause VMs to pause or fail when the GFIDs match.

Both of these clusters are pure SSD (one replica 3 arbiter 1, the other replica 3). I haven't seen any issues with our non-SSD clusters yet, but they aren't pushed as hard.

Ian
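For reference, here is a rough outline of the workload that most reliably triggers this for us, plus the commands we plan to use to capture the diagnostics requested below. Treat it as a sketch: the volume name is ours, but device names, PIDs, and paths are illustrative placeholders.

# inside a throwaway test VM, on a spare virtio disk (device/VG names illustrative)
pvcreate /dev/vdb
vgcreate testvg /dev/vdb
lvcreate -l 100%FREE -n testlv testvg
mkfs.xfs /dev/testvg/testlv          # the XFS format step is what usually triggers the mismatch

# on the gluster/hypervisor side, while the format is running
gluster volume set ovirt-350-zone1 diagnostics.client-log-level TRACE
gluster volume set ovirt-350-zone1 diagnostics.brick-log-level TRACE
strace -f -tt -o /tmp/vm-io.strace -p <qemu-pid>          # I/O pattern of the VM process
# separate test mount with FUSE traffic dumping, as suggested below
glusterfs --volfile-server=10.0.6.100 --volfile-id=ovirt-350-zone1 \
    --dump-fuse=/tmp/fuse-dump.bin /mnt/glustertest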
------ Original Message ------
From: "Raghavendra Gowdappa" <rgowdapp@redhat.com>
To: "Ian Halliday" <ihalliday@ndevix.com>
Cc: "Krutika Dhananjay" <kdhananj@redhat.com>; "gluster-user" <gluster-users@gluster.org>; "Nithya Balachandran" <nbalacha@redhat.com>
Sent: 4/5/2018 10:39:47 PM
Subject: Re: Re[2]: [Gluster-users] Sharding problem - multiple shard copies with mismatching gfids
<div id="x5827566a1e7a4b8"><blockquote cite="CAFkORY_3koVaY7i2Y7eRTvgSVAxrm6Vb5g3NDszzH4Ukuk0bbw@mail.gmail.com" type="cite" class="cite2">
<div dir="ltr"><div><div><div><div><div><div><div>Sorry for the delay, Ian :).</div><div><br /></div><div>This looks to be a genuine issue which requires some effort in fixing it. Can you file a bug? I need following information attached to bug:<br /><br /></div>* Client and bricks logs. If you can reproduce the issue, please set diagnostics.client-log-level and diagnostics.brick-log-level to TRACE. If you cannot reproduce the issue or if you cannot accommodate such big logs, please set the log-level to DEBUG.<br /></div>* If possible a simple reproducer. A simple script or steps are appreciated.<br /></div>* strace of VM (to find out I/O pattern). If possible, dump of traffic between kernel and glusterfs. This can be captured by mounting glusterfs using --dump-fuse option.<br /></div> <br /></div>Note that the logs you've posted here captures the scenario _after_ the shard file has gone into bad state. But I need information on what led to that situation. So, please start collecting this diagnostic information as early as you can.<br /><br /></div>regards,<br /></div>Raghavendra<br /></div><div class="gmail_extra"><br /><div class="gmail_quote">On Tue, Apr 3, 2018 at 7:52 AM, Ian Halliday <span dir="ltr"><<a href="mailto:ihalliday@ndevix.com">ihalliday@ndevix.com</a>></span> wrote:<br /><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div>Raghavendra,</div><div><br /></div><div>Sorry for the late follow up. I have some more data on the issue.</div><div><br /></div><div>The issue tends to happen when the shards are created. The easiest time to reproduce this is during an initial VM disk format. This is a log from a test VM that was launched, and then partitioned and formatted with LVM / XFS:</div><div><br /></div><div>[2018-04-03 02:05:00.838440] W [MSGID: 109048] [dht-common.c:9732:dht_rmdir_<wbr>cached_lookup_cbk] 0-ovirt-350-zone1-dht: /489c6fb7-fe61-4407-8160-<wbr>35c0aac40c85/images/_remove_<wbr>me_9a0660e1-bd86-47ea-8e09-<wbr>865c14f11f26/e2645bd1-a7f3-<wbr>4cbd-9036-3d3cbc7204cd.meta found on cached subvol ovirt-350-zone1-replicate-5</wbr></wbr></wbr></wbr></wbr></div><div>[2018-04-03 02:07:57.967489] I [MSGID: 109070] [dht-common.c:2796:dht_lookup_<wbr>linkfile_cbk] 0-ovirt-350-zone1-dht: Lookup of /.shard/927c6620-848b-4064-<wbr>8c88-68a332b645c2.7 on ovirt-350-zone1-replicate-3 (following linkfile) failed ,gfid = 00000000-0000-0000-0000-<wbr>000000000000 [No such file or directory]</wbr></wbr></wbr></div><div>[2018-04-03 02:07:57.974815] I [MSGID: 109069] [dht-common.c:2095:dht_lookup_<wbr>unlink_stale_linkto_cbk] 0-ovirt-350-zone1-dht: Returned with op_ret 0 and op_errno 0 for /.shard/927c6620-848b-4064-<wbr>8c88-68a332b645c2.3</wbr></wbr></div><div>[2018-04-03 02:07:57.979851] W [MSGID: 109009] [dht-common.c:2831:dht_lookup_<wbr>linkfile_cbk] 0-ovirt-350-zone1-dht: /.shard/927c6620-848b-4064-<wbr>8c88-68a332b645c2.3: gfid different on data file on ovirt-350-zone1-replicate-3, gfid local = 00000000-0000-0000-0000-<wbr>000000000000, gfid node = 55f86aa0-e7a0-4075-b46b-<wbr>a11f8bdbbceb</wbr></wbr></wbr></wbr></div><div>[2018-04-03 02:07:57.980716] W [MSGID: 109009] [dht-common.c:2570:dht_lookup_<wbr>everywhere_cbk] 0-ovirt-350-zone1-dht: /.shard/927c6620-848b-4064-<wbr>8c88-68a332b645c2.3: gfid differs on subvolume ovirt-350-zone1-replicate-3, gfid local = b1e3f299-32ff-497e-918b-<wbr>090e957090f6, gfid node = 55f86aa0-e7a0-4075-b46b-<wbr>a11f8bdbbceb</wbr></wbr></wbr></wbr></div><div>[2018-04-03 02:07:57.980763] E 
[MSGID: 133010] [shard.c:1724:shard_common_<wbr>lookup_shards_cbk] 0-ovirt-350-zone1-shard: Lookup on shard 3 failed. Base file gfid = 927c6620-848b-4064-8c88-<wbr>68a332b645c2 [Stale file handle]</wbr></wbr></div><div>[2018-04-03 02:07:57.983016] I [MSGID: 109069] [dht-common.c:2095:dht_lookup_<wbr>unlink_stale_linkto_cbk] 0-ovirt-350-zone1-dht: Returned with op_ret 0 and op_errno 0 for /.shard/927c6620-848b-4064-<wbr>8c88-68a332b645c2.7</wbr></wbr></div><div>[2018-04-03 02:07:57.988761] W [MSGID: 109009] [dht-common.c:2570:dht_lookup_<wbr>everywhere_cbk] 0-ovirt-350-zone1-dht: /.shard/927c6620-848b-4064-<wbr>8c88-68a332b645c2.3: gfid differs on subvolume ovirt-350-zone1-replicate-3, gfid local = b1e3f299-32ff-497e-918b-<wbr>090e957090f6, gfid node = 55f86aa0-e7a0-4075-b46b-<wbr>a11f8bdbbceb</wbr></wbr></wbr></wbr></div><div>[2018-04-03 02:07:57.988844] W [MSGID: 109009] [dht-common.c:2831:dht_lookup_<wbr>linkfile_cbk] 0-ovirt-350-zone1-dht: /.shard/927c6620-848b-4064-<wbr>8c88-68a332b645c2.7: gfid different on data file on ovirt-350-zone1-replicate-3, gfid local = 00000000-0000-0000-0000-<wbr>000000000000, gfid node = 955a5e78-ab4c-499a-89f8-<wbr>511e041167fb</wbr></wbr></wbr></wbr></div><div>[2018-04-03 02:07:57.989748] W [MSGID: 109009] [dht-common.c:2570:dht_lookup_<wbr>everywhere_cbk] 0-ovirt-350-zone1-dht: /.shard/927c6620-848b-4064-<wbr>8c88-68a332b645c2.7: gfid differs on subvolume ovirt-350-zone1-replicate-3, gfid local = efbb9be5-0744-4883-8f3e-<wbr>e8f7ce8d7741, gfid node = 955a5e78-ab4c-499a-89f8-<wbr>511e041167fb</wbr></wbr></wbr></wbr></div><div>[2018-04-03 02:07:57.989827] I [MSGID: 109069] [dht-common.c:2095:dht_lookup_<wbr>unlink_stale_linkto_cbk] 0-ovirt-350-zone1-dht: Returned with op_ret -1 and op_errno 2 for /.shard/927c6620-848b-4064-<wbr>8c88-68a332b645c2.3</wbr></wbr></div><div>[2018-04-03 02:07:57.989832] E [MSGID: 133010] [shard.c:1724:shard_common_<wbr>lookup_shards_cbk] 0-ovirt-350-zone1-shard: Lookup on shard 7 failed. Base file gfid = 927c6620-848b-4064-8c88-<wbr>68a332b645c2 [Stale file handle]</wbr></wbr></div><div>The message "W [MSGID: 109009] [dht-common.c:2831:dht_lookup_<wbr>linkfile_cbk] 0-ovirt-350-zone1-dht: /.shard/927c6620-848b-4064-<wbr>8c88-68a332b645c2.3: gfid different on data file on ovirt-350-zone1-replicate-3, gfid local = 00000000-0000-0000-0000-<wbr>000000000000, gfid node = 55f86aa0-e7a0-4075-b46b-<wbr>a11f8bdbbceb " repeated 2 times between [2018-04-03 02:07:57.979851] and [2018-04-03 02:07:57.995739]</wbr></wbr></wbr></wbr></div><div>[2018-04-03 02:07:57.996644] W [MSGID: 109009] [dht-common.c:2570:dht_lookup_<wbr>everywhere_cbk] 0-ovirt-350-zone1-dht: /.shard/927c6620-848b-4064-<wbr>8c88-68a332b645c2.3: gfid differs on subvolume ovirt-350-zone1-replicate-3, gfid local = 0a701104-e9a2-44c0-8181-<wbr>4a9a6edecf9f, gfid node = 55f86aa0-e7a0-4075-b46b-<wbr>a11f8bdbbceb</wbr></wbr></wbr></wbr></div><div>[2018-04-03 02:07:57.996761] E [MSGID: 133010] [shard.c:1724:shard_common_<wbr>lookup_shards_cbk] 0-ovirt-350-zone1-shard: Lookup on shard 3 failed. 
Base file gfid = 927c6620-848b-4064-8c88-<wbr>68a332b645c2 [Stale file handle]</wbr></wbr></div><div>[2018-04-03 02:07:57.998986] W [MSGID: 109009] [dht-common.c:2831:dht_lookup_<wbr>linkfile_cbk] 0-ovirt-350-zone1-dht: /.shard/927c6620-848b-4064-<wbr>8c88-68a332b645c2.3: gfid different on data file on ovirt-350-zone1-replicate-3, gfid local = 00000000-0000-0000-0000-<wbr>000000000000, gfid node = 55f86aa0-e7a0-4075-b46b-<wbr>a11f8bdbbceb</wbr></wbr></wbr></wbr></div><div>[2018-04-03 02:07:57.999857] W [MSGID: 109009] [dht-common.c:2570:dht_lookup_<wbr>everywhere_cbk] 0-ovirt-350-zone1-dht: /.shard/927c6620-848b-4064-<wbr>8c88-68a332b645c2.3: gfid differs on subvolume ovirt-350-zone1-replicate-3, gfid local = 0a701104-e9a2-44c0-8181-<wbr>4a9a6edecf9f, gfid node = 55f86aa0-e7a0-4075-b46b-<wbr>a11f8bdbbceb</wbr></wbr></wbr></wbr></div><div>[2018-04-03 02:07:57.999899] E [MSGID: 133010] [shard.c:1724:shard_common_<wbr>lookup_shards_cbk] 0-ovirt-350-zone1-shard: Lookup on shard 3 failed. Base file gfid = 927c6620-848b-4064-8c88-<wbr>68a332b645c2 [Stale file handle]</wbr></wbr></div><div>[2018-04-03 02:07:57.999942] W [fuse-bridge.c:896:fuse_attr_<wbr>cbk] 0-glusterfs-fuse: 22338: FSTAT() /489c6fb7-fe61-4407-8160-<wbr>35c0aac40c85/images/a717e25c-<wbr>f108-4367-9d28-9235bd432bb7/<wbr>5a8e541e-8883-4dec-8afd-<wbr>aa29f38ef502 => -1 (Stale file handle)</wbr></wbr></wbr></wbr></wbr></div><div>[2018-04-03 02:07:57.987941] I [MSGID: 109069] [dht-common.c:2095:dht_lookup_<wbr>unlink_stale_linkto_cbk] 0-ovirt-350-zone1-dht: Returned with op_ret 0 and op_errno 0 for /.shard/927c6620-848b-4064-<wbr>8c88-68a332b645c2.3</wbr></wbr></div><div><br /></div><div><br /></div><div>Duplicate shards are created. Output from one of the gluster nodes:</div><div><br /></div><div># find -name 927c6620-848b-4064-8c88-<wbr>68a332b645c2.*</wbr></div><div>./brick1/brick/.shard/<wbr>927c6620-848b-4064-8c88-<wbr>68a332b645c2.19</wbr></wbr></div><div>./brick1/brick/.shard/<wbr>927c6620-848b-4064-8c88-<wbr>68a332b645c2.9</wbr></wbr></div><div>./brick1/brick/.shard/<wbr>927c6620-848b-4064-8c88-<wbr>68a332b645c2.7</wbr></wbr></div><div>./brick3/brick/.shard/<wbr>927c6620-848b-4064-8c88-<wbr>68a332b645c2.5</wbr></wbr></div><div>./brick3/brick/.shard/<wbr>927c6620-848b-4064-8c88-<wbr>68a332b645c2.3</wbr></wbr></div><div>./brick4/brick/.shard/<wbr>927c6620-848b-4064-8c88-<wbr>68a332b645c2.19</wbr></wbr></div><div>./brick4/brick/.shard/<wbr>927c6620-848b-4064-8c88-<wbr>68a332b645c2.9</wbr></wbr></div><div>./brick4/brick/.shard/<wbr>927c6620-848b-4064-8c88-<wbr>68a332b645c2.5</wbr></wbr></div><div>./brick4/brick/.shard/<wbr>927c6620-848b-4064-8c88-<wbr>68a332b645c2.3</wbr></wbr></div><div>./brick4/brick/.shard/<wbr>927c6620-848b-4064-8c88-<wbr>68a332b645c2.7</wbr></wbr></div><div><br /></div><div>[root@n1 gluster]# getfattr -d -m . 
-e hex ./brick1/brick/.shard/<wbr>927c6620-848b-4064-8c88-<wbr>68a332b645c2.19</wbr></wbr></div><div># file: brick1/brick/.shard/927c6620-<wbr>848b-4064-8c88-68a332b645c2.19</wbr></div><div>security.selinux=<wbr>0x73797374656d5f753a6f626a6563<wbr>745f723a756e6c6162656c65645f74<wbr>3a733000</wbr></wbr></wbr></div><div>trusted.gfid=<wbr>0x46083184a0e5468e89e6cc1db0bf<wbr>c63b</wbr></wbr></div><div>trusted.gfid2path.<wbr>77528eefc6a11c45=<wbr>0x62653331383633382d653861302d<wbr>346336642d393737642d3761393337<wbr>616138343830362f39323763363632<wbr>302d383438622d343036342d386338<wbr>382d3638613333326236343563322e<wbr>3139</wbr></wbr></wbr></wbr></wbr></wbr></wbr></div><div>trusted.glusterfs.dht.linkto=<wbr>0x6f766972742d3335302d7a6f6e65<wbr>312d7265706c69636174652d3300</wbr></wbr></div><div><br /></div><div>[root@n1 gluster]# getfattr -d -m . -e hex ./brick4/brick/.shard/<wbr>927c6620-848b-4064-8c88-<wbr>68a332b645c2.19</wbr></wbr></div><div># file: brick4/brick/.shard/927c6620-<wbr>848b-4064-8c88-68a332b645c2.19</wbr></div><div>security.selinux=<wbr>0x73797374656d5f753a6f626a6563<wbr>745f723a756e6c6162656c65645f74<wbr>3a733000</wbr></wbr></wbr></div><div>trusted.afr.dirty=<wbr>0x000000000000000000000000</wbr></div><div>trusted.gfid=<wbr>0x46083184a0e5468e89e6cc1db0bf<wbr>c63b</wbr></wbr></div><div>trusted.gfid2path.<wbr>77528eefc6a11c45=<wbr>0x62653331383633382d653861302d<wbr>346336642d393737642d3761393337<wbr>616138343830362f39323763363632<wbr>302d383438622d343036342d386338<wbr>382d3638613333326236343563322e<wbr>3139</wbr></wbr></wbr></wbr></wbr></wbr></wbr></div><div><br /></div><div><br /></div><div>In the above example, the shard on Brick 1 is the bad one. </div><div><br /></div><div>At this point, the VM will pause with an unknown storage error and will not boot until the offending shards are removed. 
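A side note on reading the getfattr output above: the trusted.glusterfs.dht.linkto value is just the name of the target subvolume encoded in hex, and trusted.gfid is the file's GFID. A quick decode (values copied from the output above; xxd assumed to be available):

# decode the dht.linkto xattr on the brick1 copy -> it is a DHT link file for this subvolume
echo 6f766972742d3335302d7a6f6e65312d7265706c69636174652d3300 | xxd -r -p
# prints: ovirt-350-zone1-replicate-3 (with a trailing NUL byte)

# trusted.gfid is identical on both copies of shard .19:
# 0x46083184a0e5468e89e6cc1db0bfc63b  ->  46083184-a0e5-468e-89e6-cc1db0bfc63b

So for shard .19 the brick1 entry is only a link file whose GFID happens to match the data file; the GFID mismatches reported in the log above are on shards .3 and .7.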
</div><div><div class="h5"><div><br /></div><div><br /></div><div># gluster volume info</div><div>Volume Name: ovirt-350-zone1</div><div>Type: Distributed-Replicate</div><div>Volume ID: 106738ed-9951-4270-822e-<wbr>63c9bcd0a20e</wbr></div><div>Status: Started</div><div>Snapshot Count: 0</div><div>Number of Bricks: 7 x (2 + 1) = 21</div><div>Transport-type: tcp</div><div>Bricks:</div><div>Brick1: 10.0.6.100:/gluster/brick1/<wbr>brick</wbr></div><div>Brick2: 10.0.6.101:/gluster/brick1/<wbr>brick</wbr></div><div>Brick3: 10.0.6.102:/gluster/arbrick1/<wbr>brick (arbiter)</wbr></div><div>Brick4: 10.0.6.100:/gluster/brick2/<wbr>brick</wbr></div><div>Brick5: 10.0.6.101:/gluster/brick2/<wbr>brick</wbr></div><div>Brick6: 10.0.6.102:/gluster/arbrick2/<wbr>brick (arbiter)</wbr></div><div>Brick7: 10.0.6.100:/gluster/brick3/<wbr>brick</wbr></div><div>Brick8: 10.0.6.101:/gluster/brick3/<wbr>brick</wbr></div><div>Brick9: 10.0.6.102:/gluster/arbrick3/<wbr>brick (arbiter)</wbr></div><div>Brick10: 10.0.6.100:/gluster/brick4/<wbr>brick</wbr></div><div>Brick11: 10.0.6.101:/gluster/brick4/<wbr>brick</wbr></div><div>Brick12: 10.0.6.102:/gluster/arbrick4/<wbr>brick (arbiter)</wbr></div><div>Brick13: 10.0.6.100:/gluster/brick5/<wbr>brick</wbr></div><div>Brick14: 10.0.6.101:/gluster/brick5/<wbr>brick</wbr></div><div>Brick15: 10.0.6.102:/gluster/arbrick5/<wbr>brick (arbiter)</wbr></div><div>Brick16: 10.0.6.100:/gluster/brick6/<wbr>brick</wbr></div><div>Brick17: 10.0.6.101:/gluster/brick6/<wbr>brick</wbr></div><div>Brick18: 10.0.6.102:/gluster/arbrick6/<wbr>brick (arbiter)</wbr></div><div>Brick19: 10.0.6.100:/gluster/brick7/<wbr>brick</wbr></div><div>Brick20: 10.0.6.101:/gluster/brick7/<wbr>brick</wbr></div><div>Brick21: 10.0.6.102:/gluster/arbrick7/<wbr>brick (arbiter)</wbr></div><div>Options Reconfigured:</div></div></div><div>cluster.server-quorum-type: server</div><div>cluster.data-self-heal-<wbr>algorithm: full</wbr></div><div>performance.client-io-threads: off</div><div>server.allow-insecure: on</div><div>client.event-threads: 8</div><div>storage.owner-gid: 36</div><div>storage.owner-uid: 36</div><div>server.event-threads: 16</div><div>features.shard-block-size: 5GB</div><div>features.shard: on</div><div>transport.address-family: inet</div><div>nfs.disable: yes</div><div><br /></div>
Any suggestions?

-- Ian
------ Original Message ------
From: "Raghavendra Gowdappa" <rgowdapp@redhat.com>
To: "Krutika Dhananjay" <kdhananj@redhat.com>
Cc: "Ian Halliday" <ihalliday@ndevix.com>; "gluster-user" <gluster-users@gluster.org>; "Nithya Balachandran" <nbalacha@redhat.com>
Sent: 3/26/2018 2:37:21 AM
Subject: Re: [Gluster-users] Sharding problem - multiple shard copies with mismatching gfids
</span><div><div class="h5"><div id="m_-7404469986021662363x0a2b82748261429"><blockquote cite="http://CAFkORY9MuBM1RYNm3RHORvdFzmnF9C+Ed5k+q5=YuuPX7Sq4Rw@mail.gmail.com" type="cite" class="m_-7404469986021662363cite2">
<div dir="ltr"><div><div><div>Ian,<br /><br /></div>Do you've a reproducer for this bug? If not a specific one, a general outline of what operations where done on the file will help.<br /><br /></div>regards,<br /></div>Raghavendra<br /></div><div class="gmail_extra"><br /><div class="gmail_quote">On Mon, Mar 26, 2018 at 12:55 PM, Raghavendra Gowdappa <span dir="ltr"><<a href="mailto:rgowdapp@redhat.com">rgowdapp@redhat.com</a>></span> wrote:<br /><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br /><div class="gmail_extra"><br /><div class="gmail_quote"><span>On Mon, Mar 26, 2018 at 12:40 PM, Krutika Dhananjay <span dir="ltr"><<a href="mailto:kdhananj@redhat.com">kdhananj@redhat.com</a>></span> wrote:<br /><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><div>The gfid mismatch here is between the shard and its "link-to" file, the creation of which happens at a layer below that of shard translator on the stack.<br /></div><div><br /></div><div>Adding DHT devs to take a look.<br /></div></div></div></blockquote><div><br /></div></span><div>Thanks Krutika. I assume shard doesn't do any dentry operations like rename, link, unlink on the path of file (not the gfid handle based path) internally while managing shards. Can you confirm? If it does these operations, what fops does it do?<br /></div><div><br /></div><div>@Ian,</div><div><br /></div><div>I can suggest following way to fix the problem:<br /></div><div>* Since one of files listed is a DHT linkto file, I am assuming there is only one shard of the file. If not, please list out gfids of other shards and don't proceed with healing procedure.<br /></div><div>* If gfids of all shards happen to be same and only linkto has a different gfid, please proceed to step 3. Otherwise abort the healing procedure.</div><div>* If cluster.lookup-optimize is set to true abort the healing procedure</div><div>* Delete the linkto file - the file with permissions -------T and xattr trusted.dht.linkto and do a lookup on the file from mount point after turning off readdriplus [1].</div><div><br /></div><div>As to reasons on how we ended up in this situation, Can you explain me what is the I/O pattern on this file - like are there lots of entry operations like rename, link, unlink etc on the file? There have been known races in rename/lookup-heal-creating-li<wbr>nkto where linkto and data file have different gfids. 
[2] fixes some of these cases<br /></wbr></div><div><br /></div><div>[1] <a href="http://lists.gluster.org/pipermail/gluster-users/2017-March/030148.html">http://lists.gluster.org/piper<wbr>mail/gluster-users/2017-March/<wbr>030148.html</wbr></wbr></a></div><div>[2] <a href="https://review.gluster.org/#/c/19547/">https://review.gluster.org/#/c<wbr>/19547/</wbr></a><br /></div><div><br /></div><div>regards,</div><div>Raghavendra<br /></div><div><div class="m_-7404469986021662363h5"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><div><br /></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br /></blockquote></div>-Krutika<br /></div><div class="gmail_extra"><br /><div class="gmail_quote"><div><div class="m_-7404469986021662363m_-2217322752226999361gmail-h5">On Mon, Mar 26, 2018 at 1:09 AM, Ian Halliday <span dir="ltr"><<a href="mailto:ihalliday@ndevix.com">ihalliday@ndevix.com</a>></span> wrote:<br /></div></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div class="m_-7404469986021662363m_-2217322752226999361gmail-h5">
Hello all,

We are having a rather interesting problem with one of our VM storage systems. The GlusterFS client is throwing errors relating to GFID mismatches. We traced this down to multiple shards being present on the gluster nodes, with different gfids.

Hypervisor gluster mount log:

[2018-03-25 18:54:19.261733] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-ovirt-zone1-shard: Lookup on shard 7 failed. Base file gfid = 87137cac-49eb-492a-8f33-8e33470d8cb7 [Stale file handle]
The message "W [MSGID: 109009] [dht-common.c:2162:dht_lookup_linkfile_cbk] 0-ovirt-zone1-dht: /.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7: gfid different on data file on ovirt-zone1-replicate-3, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 57c6fcdf-52bb-4f7a-aea4-02f0dc81ff56 " repeated 2 times between [2018-03-25 18:54:19.253748] and [2018-03-25 18:54:19.263576]
[2018-03-25 18:54:19.264349] W [MSGID: 109009] [dht-common.c:1901:dht_lookup_everywhere_cbk] 0-ovirt-zone1-dht: /.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7: gfid differs on subvolume ovirt-zone1-replicate-3, gfid local = fdf0813b-718a-4616-a51b-6999ebba9ec3, gfid node = 57c6fcdf-52bb-4f7a-aea4-02f0dc81ff56

On the storage nodes, we found this:

[root@n1 gluster]# find -name 87137cac-49eb-492a-8f33-8e33470d8cb7.7
./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7

[root@n1 gluster]# ls -lh ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
---------T. 2 root root 0 Mar 25 13:55 ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
[root@n1 gluster]# ls -lh ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
-rw-rw----. 2 root root 3.8G Mar 25 13:55 ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
<div><br /></div><div id="m_-7404469986021662363m_-2217322752226999361gmail-m_-7008808090284224248m_4796567358248921674signature_old"><div id="m_-7404469986021662363m_-2217322752226999361gmail-m_-7008808090284224248m_4796567358248921674x64746296573c472"><div>[root@n1 gluster]# getfattr -d -m . -e hex ./brick2/brick/.shard/87137cac<wbr>-49eb-492a-8f33-8e33470d8cb7.7</wbr></div><div># file: brick2/brick/.shard/87137cac-4<wbr>9eb-492a-8f33-8e33470d8cb7.7</wbr></div><div>security.selinux=0x73797374656<wbr>d5f753a6f626a6563745f723a756e6<wbr>c6162656c65645f743a733000</wbr></wbr></div><div>trusted.gfid=0xfdf0813b718a461<wbr>6a51b6999ebba9ec3</wbr></div><div>trusted.glusterfs.dht.linkto=0<wbr>x6f766972742d3335302d7a6f6e653<wbr>12d7265706c69636174652d3300</wbr></wbr></div><div><br /></div><div>[root@n1 gluster]# getfattr -d -m . -e hex ./brick4/brick/.shard/87137cac<wbr>-49eb-492a-8f33-8e33470d8cb7.7</wbr></div><div># file: brick4/brick/.shard/87137cac-4<wbr>9eb-492a-8f33-8e33470d8cb7.7</wbr></div><div>security.selinux=0x73797374656<wbr>d5f753a6f626a6563745f723a756e6<wbr>c6162656c65645f743a733000</wbr></wbr></div><div>trusted.afr.dirty=0x0000000000<wbr>00000000000000</wbr></div><div>trusted.bit-rot.version=0x0200<wbr>00000000000059914190000ce672</wbr></div><div>trusted.gfid=0x57c6fcdf52bb4f7<wbr>aaea402f0dc81ff56</wbr></div><div><br /></div><div><br /></div><div>I'm wondering how they got created in the first place, and if anyone has any insight on how to fix it?</div><div><br /></div><div>Storage nodes:</div><div>[root@n1 gluster]# gluster --version</div><div>glusterfs 4.0.0</div><div><br /></div><div>[root@n1 gluster]# gluster volume info</div><div><br /></div><div>Volume Name: ovirt-350-zone1</div><div>Type: Distributed-Replicate</div><div>Volume ID: 106738ed-9951-4270-822e-63c9bc<wbr>d0a20e</wbr></div><div>Status: Started</div><div>Snapshot Count: 0</div><div>Number of Bricks: 7 x (2 + 1) = 21</div><div>Transport-type: tcp</div><div>Bricks:</div><div>Brick1: 10.0.6.100:/gluster/brick1/bri<wbr>ck</wbr></div><div>Brick2: 10.0.6.101:/gluster/brick1/bri<wbr>ck</wbr></div><div>Brick3: 10.0.6.102:/gluster/arbrick1/b<wbr>rick (arbiter)</wbr></div><div>Brick4: 10.0.6.100:/gluster/brick2/bri<wbr>ck</wbr></div><div>Brick5: 10.0.6.101:/gluster/brick2/bri<wbr>ck</wbr></div><div>Brick6: 10.0.6.102:/gluster/arbrick2/b<wbr>rick (arbiter)</wbr></div><div>Brick7: 10.0.6.100:/gluster/brick3/bri<wbr>ck</wbr></div><div>Brick8: 10.0.6.101:/gluster/brick3/bri<wbr>ck</wbr></div><div>Brick9: 10.0.6.102:/gluster/arbrick3/b<wbr>rick (arbiter)</wbr></div><div>Brick10: 10.0.6.100:/gluster/brick4/bri<wbr>ck</wbr></div><div>Brick11: 10.0.6.101:/gluster/brick4/bri<wbr>ck</wbr></div><div>Brick12: 10.0.6.102:/gluster/arbrick4/b<wbr>rick (arbiter)</wbr></div><div>Brick13: 10.0.6.100:/gluster/brick5/bri<wbr>ck</wbr></div><div>Brick14: 10.0.6.101:/gluster/brick5/bri<wbr>ck</wbr></div><div>Brick15: 10.0.6.102:/gluster/arbrick5/b<wbr>rick (arbiter)</wbr></div><div>Brick16: 10.0.6.100:/gluster/brick6/bri<wbr>ck</wbr></div><div>Brick17: 10.0.6.101:/gluster/brick6/bri<wbr>ck</wbr></div><div>Brick18: 10.0.6.102:/gluster/arbrick6/b<wbr>rick (arbiter)</wbr></div><div>Brick19: 10.0.6.100:/gluster/brick7/bri<wbr>ck</wbr></div><div>Brick20: 10.0.6.101:/gluster/brick7/bri<wbr>ck</wbr></div><div>Brick21: 10.0.6.102:/gluster/arbrick7/b<wbr>rick (arbiter)</wbr></div><div>Options Reconfigured:</div><div>cluster.min-free-disk: 50GB</div><div>performance.strict-write-order<wbr>ing: off</wbr></div><div>performance.strict-o-direct: 
off</div><div>nfs.disable: off</div><div>performance.readdir-ahead: on</div><div>transport.address-family: inet</div><div>performance.cache-size: 1GB</div><div>features.shard: on</div><div>features.shard-block-size: 5GB</div><div>server.event-threads: 8</div><div>server.outstanding-rpc-limit: 128</div><div>storage.owner-uid: 36</div><div>storage.owner-gid: 36</div><div>performance.quick-read: off</div><div>performance.read-ahead: off</div><div>performance.io-cache: off</div><div>performance.stat-prefetch: on</div><div>cluster.eager-lock: enable</div><div>network.remote-dio: enable</div><div>cluster.quorum-type: auto</div><div>cluster.server-quorum-type: server</div><div>cluster.data-self-heal-algorit<wbr>hm: full</wbr></div><div>performance.flush-behind: off</div><div>performance.write-behind-windo<wbr>w-size: 8MB</wbr></div><div>client.event-threads: 8</div><div>server.allow-insecure: on</div><div><br /></div><div><br /></div><div>Client version: </div><div>[root@kvm573 ~]# gluster --version</div><div>glusterfs 3.12.5</div><div><br /></div><div><br /></div><div>Thanks!</div><span class="m_-7404469986021662363m_-2217322752226999361gmail-m_-7008808090284224248HOEnZb"><font color="#888888"><div><br /></div>- Ian</font></span></div></div><div><br /></div>
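Tying back to Raghavendra's removal procedure quoted earlier in this thread, here is a minimal sketch of how the stale DHT link files could be located and a fresh lookup triggered afterwards. It assumes the brick paths shown in the volume info above and that the native client accepts the use-readdirp mount option; the mount point and image path are placeholders, and [1] should be checked before deleting anything on a brick.

# on each storage node: zero-byte DHT link files (mode ---------T) for the affected shards
find /gluster/brick*/brick/.shard -name '87137cac-49eb-492a-8f33-8e33470d8cb7.*' -perm 1000 -size 0 -ls
# confirm the trusted.glusterfs.dht.linkto xattr before touching anything
getfattr -d -m . -e hex /gluster/brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7

# after removing the stale link file per the quoted steps, force a fresh lookup from a
# client mounted with readdirplus disabled [1]
mount -t glusterfs -o use-readdirp=no 10.0.6.100:/ovirt-350-zone1 /mnt/glustertest
stat /mnt/glustertest/<path-to-the-affected-VM-image>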
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users