[Gluster-users] Sharding problem - multiple shard copies with mismatching gfids
Ian Halliday
ihalliday at ndevix.com
Mon Mar 26 08:39:52 UTC 2018
Raghavendra,
The issue typically appears during heavy write operations to the VM
image. It's most noticeable during filesystem creation on a virtual
machine image. I'll get some specific data while executing that process
and will get back to you soon.
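Roughly, the workload that triggers it looks like the following (a sketch
only - the image path, size and filesystem here are hypothetical, not our
exact setup):

# create a new disk image on the gluster-backed storage domain
qemu-img create -f raw /path/to/gluster/mount/test-disk.img 20G
# attach the disk to a guest, then create a filesystem on it from inside
# the guest; this generates the heavy sequential writes mentioned above
mkfs.xfs /dev/vdb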
thanks
-- Ian
------ Original Message ------
From: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
To: "Krutika Dhananjay" <kdhananj at redhat.com>
Cc: "Ian Halliday" <ihalliday at ndevix.com>; "gluster-user"
<gluster-users at gluster.org>; "Nithya Balachandran" <nbalacha at redhat.com>
Sent: 3/26/2018 2:37:21 AM
Subject: Re: [Gluster-users] Sharding problem - multiple shard copies
with mismatching gfids
>Ian,
>
>Do you have a reproducer for this bug? If not a specific one, a general
>outline of what operations were done on the file will help.
>
>regards,
>Raghavendra
>
>On Mon, Mar 26, 2018 at 12:55 PM, Raghavendra Gowdappa
><rgowdapp at redhat.com> wrote:
>>
>>
>>On Mon, Mar 26, 2018 at 12:40 PM, Krutika Dhananjay
>><kdhananj at redhat.com> wrote:
>>>The gfid mismatch here is between the shard and its "link-to" file,
>>>the creation of which happens at a layer below the shard translator
>>>on the stack.
>>>
>>>Adding DHT devs to take a look.
>>
>>Thanks Krutika. I assume shard doesn't do any dentry operations like
>>rename, link or unlink on the path of the file (not the gfid-handle-based
>>path) internally while managing shards. Can you confirm? If it does do
>>such operations, which fops does it do?
>>
>>@Ian,
>>
>>I can suggest the following way to fix the problem (see the sketch after
>>these steps):
>>1. Since one of the files listed is a DHT linkto file, I am assuming there
>>is only one shard of the file. If not, please list out the gfids of the
>>other shards and don't proceed with the healing procedure.
>>2. If the gfids of all shards happen to be the same and only the linkto
>>file has a different gfid, please proceed to step 3. Otherwise abort the
>>healing procedure.
>>3. If cluster.lookup-optimize is set to true, abort the healing procedure.
>>4. Delete the linkto file - the file with permissions ---------T and the
>>xattr trusted.glusterfs.dht.linkto - and do a lookup on the file from the
>>mount point after turning off readdirplus [1].
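>>A rough sketch of those steps, using the shard and brick paths from your
>>listing below (the client mount path is a placeholder and the readdirplus
>>knobs are the ones described in [1], so please adapt before running
>>anything):
>>
>># steps 1-2: compare trusted.gfid of every copy of the shard across the
>># bricks on each storage node; only the linkto file may have a differing gfid
>>getfattr -n trusted.gfid -e hex \
>>    /gluster/brick*/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>
>># step 3: confirm lookup-optimize is not enabled on the volume
>>gluster volume get ovirt-350-zone1 cluster.lookup-optimize
>>
>># step 4: remove the linkto file (mode ---------T, carrying
>># trusted.glusterfs.dht.linkto) directly on the brick that holds it...
>>rm /gluster/brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>># ...then, from a client mount with readdirplus turned off as per [1],
>># look the affected file up again so DHT can recreate a consistent layout
>>stat /path/to/client/mount/affected-image   # hypothetical mount path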
>>
>>As to how we ended up in this situation, can you tell me what the I/O
>>pattern on this file is - for example, are there lots of entry operations
>>like rename, link and unlink on the file? There have been known races in
>>rename and in lookup-heal creating linkto files, where the linkto and data
>>file end up with different gfids; [2] fixes some of these cases.
>>
>>[1]
>>http://lists.gluster.org/pipermail/gluster-users/2017-March/030148.html
>>[2] https://review.gluster.org/#/c/19547/
>>
>>regards,
>>Raghavendra
>>>
>>>-Krutika
>>>
>>>On Mon, Mar 26, 2018 at 1:09 AM, Ian Halliday <ihalliday at ndevix.com>
>>>wrote:
>>>>Hello all,
>>>>
>>>>We are having a rather interesting problem with one of our VM
>>>>storage systems. The GlusterFS client is throwing errors about
>>>>GFID mismatches. We traced this down to multiple copies of the same
>>>>shard being present on the gluster nodes, with different gfids.
>>>>
>>>>Hypervisor gluster mount log:
>>>>
>>>>[2018-03-25 18:54:19.261733] E [MSGID: 133010]
>>>>[shard.c:1724:shard_common_lookup_shards_cbk] 0-ovirt-zone1-shard:
>>>>Lookup on shard 7 failed. Base file gfid =
>>>>87137cac-49eb-492a-8f33-8e33470d8cb7 [Stale file handle]
>>>>The message "W [MSGID: 109009]
>>>>[dht-common.c:2162:dht_lookup_linkfile_cbk] 0-ovirt-zone1-dht:
>>>>/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7: gfid different on
>>>>data file on ovirt-zone1-replicate-3, gfid local =
>>>>00000000-0000-0000-0000-000000000000, gfid node =
>>>>57c6fcdf-52bb-4f7a-aea4-02f0dc81ff56 " repeated 2 times between
>>>>[2018-03-25 18:54:19.253748] and [2018-03-25 18:54:19.263576]
>>>>[2018-03-25 18:54:19.264349] W [MSGID: 109009]
>>>>[dht-common.c:1901:dht_lookup_everywhere_cbk] 0-ovirt-zone1-dht:
>>>>/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7: gfid differs on
>>>>subvolume ovirt-zone1-replicate-3, gfid local =
>>>>fdf0813b-718a-4616-a51b-6999ebba9ec3, gfid node =
>>>>57c6fcdf-52bb-4f7a-aea4-02f0dc81ff56
>>>>
>>>>
>>>>On the storage nodes, we found this:
>>>>
>>>>[root at n1 gluster]# find -name 87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>>>./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>>>./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>>>
>>>>[root at n1 gluster]# ls -lh
>>>>./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>>>---------T. 2 root root 0 Mar 25 13:55
>>>>./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>>>[root at n1 gluster]# ls -lh
>>>>./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>>>-rw-rw----. 2 root root 3.8G Mar 25 13:55
>>>>./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>>>
>>>>[root at n1 gluster]# getfattr -d -m . -e hex
>>>>./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>>># file: brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>>>security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>>>>trusted.gfid=0xfdf0813b718a4616a51b6999ebba9ec3
>>>>trusted.glusterfs.dht.linkto=0x6f766972742d3335302d7a6f6e65312d7265706c69636174652d3300
>>>>
>>>>[root at n1 gluster]# getfattr -d -m . -e hex
>>>>./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>>># file: brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>>>>security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>>>>trusted.afr.dirty=0x000000000000000000000000
>>>>trusted.bit-rot.version=0x020000000000000059914190000ce672
>>>>trusted.gfid=0x57c6fcdf52bb4f7aaea402f0dc81ff56
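>>>>
>>>>For reference, the hex xattr values decode back to what the client log
>>>>reports (a quick sanity check, assuming xxd is available on the node):
>>>>the linkto file's trusted.gfid is the "gfid local"
>>>>fdf0813b-718a-4616-a51b-6999ebba9ec3, the data file's trusted.gfid is
>>>>the "gfid node" 57c6fcdf-52bb-4f7a-aea4-02f0dc81ff56, and the linkto
>>>>xattr names the replica subvolume that holds the data copy:
>>>>
>>>># decode trusted.glusterfs.dht.linkto from the brick2 copy above
>>>>echo 6f766972742d3335302d7a6f6e65312d7265706c69636174652d3300 | xxd -r -p
>>>># -> ovirt-350-zone1-replicate-3 (plus a trailing NUL)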
>>>>
>>>>
>>>>I'm wondering how they got created in the first place. Does anyone
>>>>have any insight on how to fix this?
>>>>
>>>>Storage nodes:
>>>>[root at n1 gluster]# gluster --version
>>>>glusterfs 4.0.0
>>>>
>>>>[root at n1 gluster]# gluster volume info
>>>>
>>>>Volume Name: ovirt-350-zone1
>>>>Type: Distributed-Replicate
>>>>Volume ID: 106738ed-9951-4270-822e-63c9bcd0a20e
>>>>Status: Started
>>>>Snapshot Count: 0
>>>>Number of Bricks: 7 x (2 + 1) = 21
>>>>Transport-type: tcp
>>>>Bricks:
>>>>Brick1: 10.0.6.100:/gluster/brick1/brick
>>>>Brick2: 10.0.6.101:/gluster/brick1/brick
>>>>Brick3: 10.0.6.102:/gluster/arbrick1/brick (arbiter)
>>>>Brick4: 10.0.6.100:/gluster/brick2/brick
>>>>Brick5: 10.0.6.101:/gluster/brick2/brick
>>>>Brick6: 10.0.6.102:/gluster/arbrick2/brick (arbiter)
>>>>Brick7: 10.0.6.100:/gluster/brick3/brick
>>>>Brick8: 10.0.6.101:/gluster/brick3/brick
>>>>Brick9: 10.0.6.102:/gluster/arbrick3/brick (arbiter)
>>>>Brick10: 10.0.6.100:/gluster/brick4/brick
>>>>Brick11: 10.0.6.101:/gluster/brick4/brick
>>>>Brick12: 10.0.6.102:/gluster/arbrick4/brick (arbiter)
>>>>Brick13: 10.0.6.100:/gluster/brick5/brick
>>>>Brick14: 10.0.6.101:/gluster/brick5/brick
>>>>Brick15: 10.0.6.102:/gluster/arbrick5/brick (arbiter)
>>>>Brick16: 10.0.6.100:/gluster/brick6/brick
>>>>Brick17: 10.0.6.101:/gluster/brick6/brick
>>>>Brick18: 10.0.6.102:/gluster/arbrick6/brick (arbiter)
>>>>Brick19: 10.0.6.100:/gluster/brick7/brick
>>>>Brick20: 10.0.6.101:/gluster/brick7/brick
>>>>Brick21: 10.0.6.102:/gluster/arbrick7/brick (arbiter)
>>>>Options Reconfigured:
>>>>cluster.min-free-disk: 50GB
>>>>performance.strict-write-ordering: off
>>>>performance.strict-o-direct: off
>>>>nfs.disable: off
>>>>performance.readdir-ahead: on
>>>>transport.address-family: inet
>>>>performance.cache-size: 1GB
>>>>features.shard: on
>>>>features.shard-block-size: 5GB
>>>>server.event-threads: 8
>>>>server.outstanding-rpc-limit: 128
>>>>storage.owner-uid: 36
>>>>storage.owner-gid: 36
>>>>performance.quick-read: off
>>>>performance.read-ahead: off
>>>>performance.io-cache: off
>>>>performance.stat-prefetch: on
>>>>cluster.eager-lock: enable
>>>>network.remote-dio: enable
>>>>cluster.quorum-type: auto
>>>>cluster.server-quorum-type: server
>>>>cluster.data-self-heal-algorithm: full
>>>>performance.flush-behind: off
>>>>performance.write-behind-window-size: 8MB
>>>>client.event-threads: 8
>>>>server.allow-insecure: on
>>>>
>>>>
>>>>Client version:
>>>>[root at kvm573 ~]# gluster --version
>>>>glusterfs 3.12.5
>>>>
>>>>
>>>>Thanks!
>>>>
>>>>- Ian
>>>>
>>>>
>>>>_______________________________________________
>>>>Gluster-users mailing list
>>>>Gluster-users at gluster.org
>>>>http://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>
>