[Gluster-users] [Stale file handle] in shard volume

Wed Dec 19 20:56:42 UTC 2018

Dear All,

It appears i've a stale file in one of the volumes, on 2 files. These files
are qemu images (1 raw and 1 qcow2).
I'll just focus on 1 file since the situation on the other seems the same.

The VM get's paused more or less directly after being booted with error;
[2018-12-18 14:05:05.275713] E [MSGID: 133010]
[shard.c:1724:shard_common_lookup_shards_cbk] 0-ovirt-backbone-2-shard:
Lookup on shard 51500 failed. Base file gfid =
f28cabcb-d169-41fc-a633-9bef4c4a8e40 [Stale file handle]

investigating the shard;

#on the arbiter node:

[root at lease-05 ovirt-backbone-2]# getfattr -n glusterfs.gfid.string
/mnt/ovirt-backbone-2/b1c2c949-aef4-4aec-999b-b179efeef732/images/f6ac9660-a84e-469e-a17c-c6dbc538af4b/d6b09501-5b79-4c92-bf10-2ed3bda1b425
getfattr: Removing leading '/' from absolute path names
# file:
mnt/ovirt-backbone-2/b1c2c949-aef4-4aec-999b-b179efeef732/images/f6ac9660-a84e-469e-a17c-c6dbc538af4b/d6b09501-5b79-4c92-bf10-2ed3bda1b425
glusterfs.gfid.string="f28cabcb-d169-41fc-a633-9bef4c4a8e40"

[root at lease-05 ovirt-backbone-2]# getfattr -d -m . -e hex
.shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
# file: .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x1f86b4328ec6424699aa48cc6d7b5da0
trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d613633332d3962656634633461386534302e3531353030

[root at lease-05 ovirt-backbone-2]# getfattr -d -m . -e hex
.glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
# file: .glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x1f86b4328ec6424699aa48cc6d7b5da0
trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d613633332d3962656634633461386534302e3531353030

[root at lease-05 ovirt-backbone-2]# stat
.glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
  File: ‘.glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0’
  Size: 0               Blocks: 0          IO Block: 4096   regular empty
file
Device: fd01h/64769d    Inode: 537277306   Links: 2
Access: (0660/-rw-rw----)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:etc_runtime_t:s0
Access: 2018-12-17 21:43:36.361984810 +0000
Modify: 2018-12-17 21:43:36.361984810 +0000
Change: 2018-12-18 20:55:29.908647417 +0000
 Birth: -

[root at lease-05 ovirt-backbone-2]# find . -inum 537277306
./.glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
./.shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500

#on the data nodes:

[root at lease-08 ~]# getfattr -n glusterfs.gfid.string
/mnt/ovirt-backbone-2/b1c2c949-aef4-4aec-999b-b179efeef732/images/f6ac9660-a84e-469e-a17c-c6dbc538af4b/d6b09501-5b79-4c92-bf10-2ed3bda1b425
getfattr: Removing leading '/' from absolute path names
# file:
mnt/ovirt-backbone-2/b1c2c949-aef4-4aec-999b-b179efeef732/images/f6ac9660-a84e-469e-a17c-c6dbc538af4b/d6b09501-5b79-4c92-bf10-2ed3bda1b425
glusterfs.gfid.string="f28cabcb-d169-41fc-a633-9bef4c4a8e40"

[root at lease-08 ovirt-backbone-2]# getfattr -d -m . -e hex
.shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
# file: .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x1f86b4328ec6424699aa48cc6d7b5da0
trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d613633332d3962656634633461386534302e3531353030

[root at lease-08 ovirt-backbone-2]# getfattr -d -m . -e hex
.glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
# file: .glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x1f86b4328ec6424699aa48cc6d7b5da0
trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d613633332d3962656634633461386534302e3531353030

[root at lease-08 ovirt-backbone-2]# stat
.glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
  File: ‘.glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0’
  Size: 2166784         Blocks: 4128       IO Block: 4096   regular file
Device: fd03h/64771d    Inode: 12893624759  Links: 3
Access: (0660/-rw-rw----)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:etc_runtime_t:s0
Access: 2018-12-18 18:52:38.070776585 +0000
Modify: 2018-12-17 21:43:36.388054443 +0000
Change: 2018-12-18 21:01:47.810506528 +0000
 Birth: -

[root at lease-08 ovirt-backbone-2]# find . -inum 12893624759
./.glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
./.shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500

========================

[root at lease-11 ovirt-backbone-2]# getfattr -n glusterfs.gfid.string
/mnt/ovirt-backbone-2/b1c2c949-aef4-4aec-999b-b179efeef732/images/f6ac9660-a84e-469e-a17c-c6dbc538af4b/d6b09501-5b79-4c92-bf10-2ed3bda1b425
getfattr: Removing leading '/' from absolute path names
# file:
mnt/ovirt-backbone-2/b1c2c949-aef4-4aec-999b-b179efeef732/images/f6ac9660-a84e-469e-a17c-c6dbc538af4b/d6b09501-5b79-4c92-bf10-2ed3bda1b425
glusterfs.gfid.string="f28cabcb-d169-41fc-a633-9bef4c4a8e40"

[root at lease-11 ovirt-backbone-2]#  getfattr -d -m . -e hex
.shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
# file: .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x1f86b4328ec6424699aa48cc6d7b5da0
trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d613633332d3962656634633461386534302e3531353030

[root at lease-11 ovirt-backbone-2]# getfattr -d -m . -e hex
.glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
# file: .glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x1f86b4328ec6424699aa48cc6d7b5da0
trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d613633332d3962656634633461386534302e3531353030

[root at lease-11 ovirt-backbone-2]# stat
.glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
  File: ‘.glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0’
  Size: 2166784         Blocks: 4128       IO Block: 4096   regular file
Device: fd03h/64771d    Inode: 12956094809  Links: 3
Access: (0660/-rw-rw----)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:etc_runtime_t:s0
Access: 2018-12-18 20:11:53.595208449 +0000
Modify: 2018-12-17 21:43:36.391580259 +0000
Change: 2018-12-18 19:19:25.888055392 +0000
 Birth: -

[root at lease-11 ovirt-backbone-2]# find . -inum 12956094809
./.glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
./.shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500

================

I don't really see any inconsistencies, except the dates on the stat.
However this is only after i tried moving the file out of the volumes to
force a heal, which does happen on the data nodes, but not on the arbiter
node. Before that they were also the same.
I've also compared the file
./.shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500 on the 2 nodes and they
are exactly the same.

Things i've further tried;
- gluster v heal ovirt-backbone-2 full => gluster v heal ovirt-backbone-2
info reports 0 entries on all nodes

- stop each glusterd and glusterfsd, pause around 40sec and start them
again on each node, 1 at a time, waiting for the heal to recover before
moving to the next node

- force a heal by stopping glusterd on a node and perform these steps;
mkdir /mnt/ovirt-backbone-2/trigger
rmdir /mnt/ovirt-backbone-2/trigger
setfattr -n trusted.non-existent-key -v abc /mnt/ovirt-backbone-2/
setfattr -x trusted.non-existent-key /mnt/ovirt-backbone-2/
start glusterd

- gluster volume rebalance ovirt-backbone-2 start => success

Whats further interesting is that according the mount log, the volume is in
split-brain;
[2018-12-18 10:06:04.606870] E [MSGID: 108008]
[afr-read-txn.c:90:afr_read_txn_refresh_done]
0-ovirt-backbone-2-replicate-2: Failing FSTAT on gfid
2a57d87d-fe49-4034-919b-fdb79531bf68: split-brain observed. [Input/output
error]
[2018-12-18 10:06:04.606908] E [MSGID: 133014]
[shard.c:1248:shard_common_stat_cbk] 0-ovirt-backbone-2-shard: stat failed:
2a57d87d-fe49-4034-919b-fdb79531bf68 [Input/output error]
[2018-12-18 10:06:04.606927] W [fuse-bridge.c:871:fuse_attr_cbk]
0-glusterfs-fuse: 428090: FSTAT()
/b1c2c949-aef4-4aec-999b-b179efeef732/dom_md/ids => -1 (Input/output error)
[2018-12-18 10:06:05.107729] E [MSGID: 108008]
[afr-read-txn.c:90:afr_read_txn_refresh_done]
0-ovirt-backbone-2-replicate-2: Failing FSTAT on gfid
2a57d87d-fe49-4034-919b-fdb79531bf68: split-brain observed. [Input/output
error]
[2018-12-18 10:06:05.107770] E [MSGID: 133014]
[shard.c:1248:shard_common_stat_cbk] 0-ovirt-backbone-2-shard: stat failed:
2a57d87d-fe49-4034-919b-fdb79531bf68 [Input/output error]
[2018-12-18 10:06:05.107791] W [fuse-bridge.c:871:fuse_attr_cbk]
0-glusterfs-fuse: 428091: FSTAT()
/b1c2c949-aef4-4aec-999b-b179efeef732/dom_md/ids => -1 (Input/output error)
[2018-12-18 10:06:05.537244] I [MSGID: 108006]
[afr-common.c:5494:afr_local_init] 0-ovirt-backbone-2-replicate-1: no
subvolumes up
[2018-12-18 10:06:05.538523] E [MSGID: 108008]
[afr-read-txn.c:90:afr_read_txn_refresh_done]
0-ovirt-backbone-2-replicate-2: Failing STAT on gfid
00000000-0000-0000-0000-000000000001: split-brain observed. [Input/output
error]
[2018-12-18 10:06:05.538685] I [MSGID: 108006]
[afr-common.c:5494:afr_local_init] 0-ovirt-backbone-2-replicate-1: no
subvolumes up
[2018-12-18 10:06:05.538794] I [MSGID: 108006]
[afr-common.c:5494:afr_local_init] 0-ovirt-backbone-2-replicate-1: no
subvolumes up
[2018-12-18 10:06:05.539342] I [MSGID: 109063]
[dht-layout.c:716:dht_layout_normalize] 0-ovirt-backbone-2-dht: Found
anomalies in /b1c2c949-aef4-4aec-999b-b179efeef732 (gfid =
8c8598ce-1a52-418e-a7b4-435fee34bae8). Holes=2 overlaps=0
[2018-12-18 10:06:05.539372] W [MSGID: 109005]
[dht-selfheal.c:2158:dht_selfheal_directory] 0-ovirt-backbone-2-dht:
Directory selfheal failed: 2 subvolumes down.Not fixing. path =
/b1c2c949-aef4-4aec-999b-b179efeef732, gfid =
8c8598ce-1a52-418e-a7b4-435fee34bae8
[2018-12-18 10:06:05.539694] I [MSGID: 108006]
[afr-common.c:5494:afr_local_init] 0-ovirt-backbone-2-replicate-1: no
subvolumes up
[2018-12-18 10:06:05.540652] I [MSGID: 108006]
[afr-common.c:5494:afr_local_init] 0-ovirt-backbone-2-replicate-1: no
subvolumes up
[2018-12-18 10:06:05.608612] E [MSGID: 108008]
[afr-read-txn.c:90:afr_read_txn_refresh_done]
0-ovirt-backbone-2-replicate-2: Failing FSTAT on gfid
2a57d87d-fe49-4034-919b-fdb79531bf68: split-brain observed. [Input/output
error]
[2018-12-18 10:06:05.608657] E [MSGID: 133014]
[shard.c:1248:shard_common_stat_cbk] 0-ovirt-backbone-2-shard: stat failed:
2a57d87d-fe49-4034-919b-fdb79531bf68 [Input/output error]
[2018-12-18 10:06:05.608672] W [fuse-bridge.c:871:fuse_attr_cbk]
0-glusterfs-fuse: 428096: FSTAT()
/b1c2c949-aef4-4aec-999b-b179efeef732/dom_md/ids => -1 (Input/output error)
[2018-12-18 10:06:06.109339] E [MSGID: 108008]
[afr-read-txn.c:90:afr_read_txn_refresh_done]
0-ovirt-backbone-2-replicate-2: Failing FSTAT on gfid
2a57d87d-fe49-4034-919b-fdb79531bf68: split-brain observed. [Input/output
error]
[2018-12-18 10:06:06.109378] E [MSGID: 133014]
[shard.c:1248:shard_common_stat_cbk] 0-ovirt-backbone-2-shard: stat failed:
2a57d87d-fe49-4034-919b-fdb79531bf68 [Input/output error]
[2018-12-18 10:06:06.109399] W [fuse-bridge.c:871:fuse_attr_cbk]
0-glusterfs-fuse: 428097: FSTAT()
/b1c2c949-aef4-4aec-999b-b179efeef732/dom_md/ids => -1 (Input/output error)

#note i'm able to see ; /b1c2c949-aef4-4aec-999b-b179efeef732/dom_md/ids
[root at lease-11 ovirt-backbone-2]# stat
/mnt/ovirt-backbone-2/b1c2c949-aef4-4aec-999b-b179efeef732/dom_md/ids
  File:
‘/mnt/ovirt-backbone-2/b1c2c949-aef4-4aec-999b-b179efeef732/dom_md/ids’
  Size: 1048576         Blocks: 2048       IO Block: 131072 regular file
Device: 41h/65d Inode: 10492258721813610344  Links: 1
Access: (0660/-rw-rw----)  Uid: (   36/    vdsm)   Gid: (   36/     kvm)
Context: system_u:object_r:fusefs_t:s0
Access: 2018-12-19 20:07:39.917573869 +0000
Modify: 2018-12-19 20:07:39.928573917 +0000
Change: 2018-12-19 20:07:39.929573921 +0000
 Birth: -

however checking: gluster v heal ovirt-backbone-2 info split-brain
reports no entries.

I've also tried mounting the qemu image, and this works fine, i'm able to
see all contents;
 losetup /dev/loop0
/mnt/ovirt-backbone-2/b1c2c949-aef4-4aec-999b-b179efeef732/images/f6ac9660-a84e-469e-a17c-c6dbc538af4b/d6b09501-5b79-4c92-bf10-2ed3bda1b425
 kpartx -a /dev/loop0
 vgscan
 vgchange -ay slave-data
 mkdir /mnt/slv01
 mount /dev/mapper/slave--data-lvol0 /mnt/slv01/

Possible causes for this issue;
1. the machine "lease-11" suffered from a faulty RAM module (ECC), which
halted the machine and causes an invalid state. (this machine also hosts
other volumes, with similar configurations, which report no issue)
2. after the RAM module was replaced, the VM using the backing qemu image,
was restored from a backup (the backup was file based within the VM on a
different directory). This is because some files were corrupted. The
backup/recovery obviously causes extra IO, possible introducing race
conditions? The machine did run for about 12h without issues, and in total
for about 36h.
3. since only the client (maybe only gfapi?) reports errors, something is
broken there?

The volume info;
root at lease-06 ~# gluster v info ovirt-backbone-2

Volume Name: ovirt-backbone-2
Type: Distributed-Replicate
Volume ID: 85702d35-62c8-4c8c-930d-46f455a8af28
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: 10.32.9.7:/data/gfs/bricks/brick1/ovirt-backbone-2
Brick2: 10.32.9.3:/data/gfs/bricks/brick1/ovirt-backbone-2
Brick3: 10.32.9.4:/data/gfs/bricks/bricka/ovirt-backbone-2 (arbiter)
Brick4: 10.32.9.8:/data0/gfs/bricks/brick1/ovirt-backbone-2
Brick5: 10.32.9.21:/data0/gfs/bricks/brick1/ovirt-backbone-2
Brick6: 10.32.9.5:/data/gfs/bricks/bricka/ovirt-backbone-2 (arbiter)
Brick7: 10.32.9.9:/data0/gfs/bricks/brick1/ovirt-backbone-2
Brick8: 10.32.9.20:/data0/gfs/bricks/brick1/ovirt-backbone-2
Brick9: 10.32.9.6:/data/gfs/bricks/bricka/ovirt-backbone-2 (arbiter)
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
storage.owner-uid: 36
storage.owner-gid: 36
features.shard-block-size: 64MB
performance.write-behind-window-size: 512MB
performance.cache-size: 384MB
cluster.brick-multiplex: on

The volume status;
root at lease-06 ~# gluster v status ovirt-backbone-2
Status of volume: ovirt-backbone-2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.32.9.7:/data/gfs/bricks/brick1/ovi
rt-backbone-2                               49152     0          Y
7727
Brick 10.32.9.3:/data/gfs/bricks/brick1/ovi
rt-backbone-2                               49152     0          Y
12620
Brick 10.32.9.4:/data/gfs/bricks/bricka/ovi
rt-backbone-2                               49152     0          Y
8794
Brick 10.32.9.8:/data0/gfs/bricks/brick1/ov
irt-backbone-2                              49161     0          Y
22333
Brick 10.32.9.21:/data0/gfs/bricks/brick1/o
virt-backbone-2                             49152     0          Y
15030
Brick 10.32.9.5:/data/gfs/bricks/bricka/ovi
rt-backbone-2                               49166     0          Y
24592
Brick 10.32.9.9:/data0/gfs/bricks/brick1/ov
irt-backbone-2                              49153     0          Y
20148
Brick 10.32.9.20:/data0/gfs/bricks/brick1/o
virt-backbone-2                             49154     0          Y
15413
Brick 10.32.9.6:/data/gfs/bricks/bricka/ovi
rt-backbone-2                               49152     0          Y
43120
Self-heal Daemon on localhost               N/A       N/A        Y
44587
Self-heal Daemon on 10.201.0.2              N/A       N/A        Y
8401
Self-heal Daemon on 10.201.0.5              N/A       N/A        Y
11038
Self-heal Daemon on 10.201.0.8              N/A       N/A        Y
9513
Self-heal Daemon on 10.32.9.4               N/A       N/A        Y
23736
Self-heal Daemon on 10.32.9.20              N/A       N/A        Y
2738
Self-heal Daemon on 10.32.9.3               N/A       N/A        Y
25598
Self-heal Daemon on 10.32.9.5               N/A       N/A        Y       511
Self-heal Daemon on 10.32.9.9               N/A       N/A        Y
23357
Self-heal Daemon on 10.32.9.8               N/A       N/A        Y
15225
Self-heal Daemon on 10.32.9.7               N/A       N/A        Y
25781
Self-heal Daemon on 10.32.9.21              N/A       N/A        Y
5034

Task Status of Volume ovirt-backbone-2
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : 6dfbac43-0125-4568-9ac3-a2c453faaa3d
Status               : completed

gluster version is @3.12.15 and cluster.op-version=31202

========================

It would be nice to know if it's possible to mark the files as not stale or
if i should investigate other things?
Or should we consider this volume lost?
Also checking the code at;
https://github.com/gluster/glusterfs/blob/master/xlators/features/shard/src/shard.c
it seems the functions shifted quite some (line 1724 vs. 2243), so maybe
it's fixed in a future version?
Any thoughts are welcome.

Thanks Olaf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20181219/21c71048/attachment.html>