[Gluster-users] Sharding problem - multiple shard copies with mismatching gfids

Krutika Dhananjay kdhananj at redhat.com
Mon Mar 26 07:10:54 UTC 2018


The gfid mismatch here is between the shard and its "link-to" file, which is
created at a layer below the shard translator in the stack.
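
For context: a DHT "link-to" file is a zero-byte placeholder whose permission
bits are just the sticky bit (the ---------T entry in the ls output further
down), carrying a trusted.glusterfs.dht.linkto xattr naming the subvolume that
actually holds the data. A minimal sketch of spotting one by mode and size
alone (the helper name is mine, not Gluster's):

```python
import os
import stat

def looks_like_dht_linkfile(path: str) -> bool:
    """Heuristic only: DHT link-to files are zero-byte regular files whose
    permission bits are exactly the sticky bit (shown by ls as ---------T).
    A complete check would also read the trusted.glusterfs.dht.linkto xattr."""
    st = os.lstat(path)
    return (
        stat.S_ISREG(st.st_mode)
        and st.st_size == 0
        and (st.st_mode & 0o7777) == 0o1000
    )
```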

Adding DHT devs to take a look.

-Krutika

On Mon, Mar 26, 2018 at 1:09 AM, Ian Halliday <ihalliday at ndevix.com> wrote:

> Hello all,
>
> We are having a rather interesting problem with one of our VM storage
> systems. The GlusterFS client is throwing errors relating to GFID
> mismatches. We traced this down to multiple shards being present on the
> gluster nodes, with different gfids.
>
> Hypervisor gluster mount log:
>
> [2018-03-25 18:54:19.261733] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-ovirt-zone1-shard: Lookup on shard 7 failed. Base file gfid = 87137cac-49eb-492a-8f33-8e33470d8cb7 [Stale file handle]
> The message "W [MSGID: 109009] [dht-common.c:2162:dht_lookup_linkfile_cbk] 0-ovirt-zone1-dht: /.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7: gfid different on data file on ovirt-zone1-replicate-3, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 57c6fcdf-52bb-4f7a-aea4-02f0dc81ff56" repeated 2 times between [2018-03-25 18:54:19.253748] and [2018-03-25 18:54:19.263576]
> [2018-03-25 18:54:19.264349] W [MSGID: 109009] [dht-common.c:1901:dht_lookup_everywhere_cbk] 0-ovirt-zone1-dht: /.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7: gfid differs on subvolume ovirt-zone1-replicate-3, gfid local = fdf0813b-718a-4616-a51b-6999ebba9ec3, gfid node = 57c6fcdf-52bb-4f7a-aea4-02f0dc81ff56
>
>
> On the storage nodes, we found this:
>
> [root@n1 gluster]# find -name 87137cac-49eb-492a-8f33-8e33470d8cb7.7
> ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
> ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>
> [root@n1 gluster]# ls -lh ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
> ---------T. 2 root root 0 Mar 25 13:55 ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
> [root@n1 gluster]# ls -lh ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
> -rw-rw----. 2 root root 3.8G Mar 25 13:55 ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>
> [root@n1 gluster]# getfattr -d -m . -e hex ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
> # file: brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.gfid=0xfdf0813b718a4616a51b6999ebba9ec3
> trusted.glusterfs.dht.linkto=0x6f766972742d3335302d7a6f6e65312d7265706c69636174652d3300
>
> [root@n1 gluster]# getfattr -d -m . -e hex ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
> # file: brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.dirty=0x000000000000000000000000
> trusted.bit-rot.version=0x020000000000000059914190000ce672
> trusted.gfid=0x57c6fcdf52bb4f7aaea402f0dc81ff56
>
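
The hex xattr values above decode cleanly and line up with the client log; a
quick sketch using only values copied from the outputs above:

```python
import uuid

# trusted.gfid on the zero-byte brick2 copy -- the "gfid local" in the log
gfid_brick2 = uuid.UUID(bytes=bytes.fromhex("fdf0813b718a4616a51b6999ebba9ec3"))

# trusted.gfid on the 3.8G brick4 copy -- the "gfid node" in the log
gfid_brick4 = uuid.UUID(bytes=bytes.fromhex("57c6fcdf52bb4f7aaea402f0dc81ff56"))

# trusted.glusterfs.dht.linkto is a NUL-terminated ASCII subvolume name
linkto = bytes.fromhex(
    "6f766972742d3335302d7a6f6e65312d7265706c69636174652d3300"
).rstrip(b"\x00").decode("ascii")

print(gfid_brick2)  # fdf0813b-718a-4616-a51b-6999ebba9ec3
print(gfid_brick4)  # 57c6fcdf-52bb-4f7a-aea4-02f0dc81ff56
print(linkto)       # ovirt-350-zone1-replicate-3
```

So the brick2 copy is a link-to file pointing at replicate-3, but its gfid
disagrees with the gfid of the data file it points to.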
>
> I'm wondering how these duplicate shards got created in the first place.
> Does anyone have insight into how to fix this?
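
Not an authoritative fix, but the manual remediation usually suggested for a
stale DHT linkfile is to remove the zero-byte link-to copy on the brick
together with its hardlink under .glusterfs, then stat the shard from a mount
so DHT can recreate a consistent linkfile. A hedged sketch (the helper name
and steps are my assumptions, not a confirmed procedure; wait for the DHT
devs before touching live bricks):

```python
import os

def remove_stale_linkfile(brick_root: str, shard_relpath: str, gfid: str) -> None:
    """Hypothetical helper: remove a stale link-to file on one brick.

    Assumes Gluster's backend layout, where every file is also hardlinked
    at .glusterfs/<first-2-chars>/<next-2-chars>/<full-gfid-string>.
    """
    linkfile = os.path.join(brick_root, shard_relpath)
    gfid_link = os.path.join(brick_root, ".glusterfs", gfid[:2], gfid[2:4], gfid)
    for path in (linkfile, gfid_link):
        if os.path.lexists(path):
            os.remove(path)
```

Against this thread's data that would target the ---------T copy under
./brick2/brick/.shard/ and its fdf0813b-... gfid hardlink.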
>
> Storage nodes:
> [root@n1 gluster]# gluster --version
> glusterfs 4.0.0
>
> [root@n1 gluster]# gluster volume info
>
> Volume Name: ovirt-350-zone1
> Type: Distributed-Replicate
> Volume ID: 106738ed-9951-4270-822e-63c9bcd0a20e
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 7 x (2 + 1) = 21
> Transport-type: tcp
> Bricks:
> Brick1: 10.0.6.100:/gluster/brick1/brick
> Brick2: 10.0.6.101:/gluster/brick1/brick
> Brick3: 10.0.6.102:/gluster/arbrick1/brick (arbiter)
> Brick4: 10.0.6.100:/gluster/brick2/brick
> Brick5: 10.0.6.101:/gluster/brick2/brick
> Brick6: 10.0.6.102:/gluster/arbrick2/brick (arbiter)
> Brick7: 10.0.6.100:/gluster/brick3/brick
> Brick8: 10.0.6.101:/gluster/brick3/brick
> Brick9: 10.0.6.102:/gluster/arbrick3/brick (arbiter)
> Brick10: 10.0.6.100:/gluster/brick4/brick
> Brick11: 10.0.6.101:/gluster/brick4/brick
> Brick12: 10.0.6.102:/gluster/arbrick4/brick (arbiter)
> Brick13: 10.0.6.100:/gluster/brick5/brick
> Brick14: 10.0.6.101:/gluster/brick5/brick
> Brick15: 10.0.6.102:/gluster/arbrick5/brick (arbiter)
> Brick16: 10.0.6.100:/gluster/brick6/brick
> Brick17: 10.0.6.101:/gluster/brick6/brick
> Brick18: 10.0.6.102:/gluster/arbrick6/brick (arbiter)
> Brick19: 10.0.6.100:/gluster/brick7/brick
> Brick20: 10.0.6.101:/gluster/brick7/brick
> Brick21: 10.0.6.102:/gluster/arbrick7/brick (arbiter)
> Options Reconfigured:
> cluster.min-free-disk: 50GB
> performance.strict-write-ordering: off
> performance.strict-o-direct: off
> nfs.disable: off
> performance.readdir-ahead: on
> transport.address-family: inet
> performance.cache-size: 1GB
> features.shard: on
> features.shard-block-size: 5GB
> server.event-threads: 8
> server.outstanding-rpc-limit: 128
> storage.owner-uid: 36
> storage.owner-gid: 36
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: on
> cluster.eager-lock: enable
> network.remote-dio: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> performance.flush-behind: off
> performance.write-behind-window-size: 8MB
> client.event-threads: 8
> server.allow-insecure: on
>
>
> Client version:
> [root@kvm573 ~]# gluster --version
> glusterfs 3.12.5
>
>
> Thanks!
>
> - Ian
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>