[Gluster-users] FUSE Client Crashes On Shared File

Timothy Orme torme at ancestry.com
Mon Oct 21 19:02:00 UTC 2019

Some further info on this.  I've found a lot of similar reports related to linking, but my setup doesn't seem to show any of the problems described in them.

I've checked the gfids on all 3 of the replicas and they are the same, as are all extended attributes.
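
For anyone wanting to repeat this check, here is a minimal sketch of how the comparison could be scripted. It assumes the raw 16-byte `trusted.gfid` value has already been read from the file on each brick (for example with `os.getxattr` as root on each server). The brick labels and both helper functions are hypothetical names for illustration, not part of Gluster.

```python
import uuid


def format_gfid(raw: bytes) -> str:
    """Render a 16-byte trusted.gfid value in the dashed form seen in the logs."""
    return str(uuid.UUID(bytes=raw))


def gfids_match(per_brick: dict) -> bool:
    """Return True when every brick reports the same gfid."""
    return len({format_gfid(raw) for raw in per_brick.values()}) == 1


# Hypothetical brick labels; the value is a gfid taken from the client log,
# repeated once per replica to mimic the "all three agree" case.
raw = uuid.UUID("793507db-42e1-4b9e-9ce0-b2c2451f78dd").bytes
replicas = {"brick-a": raw, "brick-b": raw, "brick-c": raw}
print(gfids_match(replicas))
```

A `False` result would point at a replica whose gfid has diverged, which is not what I am seeing here.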

I did notice that the gfid is present in the `.remove_me` directory, even though the file itself has still not been removed.  Perhaps this is the root of the problem?  I don't know much about the `.remove_me` directory, and I couldn't find any similar issues involving it.
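
To make it easier to look at the same places on each brick, here is a rough sketch that builds the two backend paths involved. The `.glusterfs/<aa>/<bb>/<gfid>` handle layout is the usual brick layout; the `.shard/.remove_me/<base gfid>` location is my assumption about where the marker lives, and the brick root `/bricks/scratch` is made up for the example.

```python
from pathlib import PurePosixPath


def gfid_handle_path(brick_root: str, gfid: str) -> PurePosixPath:
    """Path of the gfid handle under the brick's .glusterfs directory."""
    return PurePosixPath(brick_root) / ".glusterfs" / gfid[0:2] / gfid[2:4] / gfid


def remove_me_marker_path(brick_root: str, base_gfid: str) -> PurePosixPath:
    """Assumed location of the pending-deletion marker for a sharded file."""
    return PurePosixPath(brick_root) / ".shard" / ".remove_me" / base_gfid


# Hypothetical brick root; the gfid is the base file gfid from the client log.
base_gfid = "4b6d0aab-aa33-44dd-8d3f-2054712702dd"
print(gfid_handle_path("/bricks/scratch", base_gfid))
print(remove_me_marker_path("/bricks/scratch", base_gfid))
```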

From: Timothy Orme
Sent: Monday, October 21, 2019 10:50 AM
To: gluster-users <gluster-users at gluster.org>
Subject: FUSE Client Crashes On Shared File


I'm running gluster 6.5 on Amazon Linux 2 (CentOS 7 variant).  I have a distributed-replicated cluster running, with sharding enabled for files over 512 MB.

I tried issuing an `rm` for a large number of files, and the client consistently crashes on a specific set of files.  I see the following in the logs:

[2019-10-21 17:43:19.875880] I [fuse-bridge.c:5142:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.26
[2019-10-21 17:43:19.875896] I [fuse-bridge.c:5753:fuse_graph_sync] 0-fuse: switched to graph 0
[2019-10-21 17:44:16.372054] W [MSGID: 109009] [dht-common.c:2807:dht_lookup_linkfile_cbk] 0-scratch-dht: /.shard/4b6d0aab-aa33-44dd-8d3f-2054712702dd.1: gfid different on data file on scratch-replicate-3, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 793507db-42e1-4b9e-9ce0-b2c2451f78dd
[2019-10-21 17:44:16.373429] W [MSGID: 109009] [dht-common.c:2562:dht_lookup_everywhere_cbk] 0-scratch-dht: /.shard/4b6d0aab-aa33-44dd-8d3f-2054712702dd.1: gfid differs on subvolume scratch-replicate-3, gfid local = 5c52fe2a-c580-42ae-b2cb-ce3cae39ffeb, gfid node = 793507db-42e1-4b9e-9ce0-b2c2451f78dd
[2019-10-21 17:44:16.373730] E [MSGID: 133010] [shard.c:2326:shard_common_lookup_shards_cbk] 0-scratch-shard: Lookup on shard 1 failed. Base file gfid = 4b6d0aab-aa33-44dd-8d3f-2054712702dd [Stale file handle]
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(UNLINK)
frame : type(1) op(UNLINK)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2019-10-21 17:44:16
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 6.5
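
In case it helps with triage, the gfid mismatch warnings (MSGID 109009) in the log above can be pulled out with a small script. This is only a sketch: the regular expression is written against the two warning formats shown here and may not cover other variants, and `find_gfid_mismatches` is a hypothetical helper name.

```python
import re

# Matches the two MSGID 109009 message shapes that appear in the log above.
MISMATCH = re.compile(
    r"(?P<path>/\S+): gfid (?:different on data file on|differs on subvolume) "
    r"(?P<subvol>[\w-]+), gfid local = (?P<local>[0-9a-f-]{36}), "
    r"gfid node = (?P<node>[0-9a-f-]{36})"
)


def find_gfid_mismatches(log_text: str) -> list:
    """Return one dict per mismatch warning: path, subvol, local gfid, node gfid."""
    return [m.groupdict() for m in MISMATCH.finditer(log_text)]


sample = """\
[2019-10-21 17:44:16.372054] W [MSGID: 109009] [dht-common.c:2807:dht_lookup_linkfile_cbk] 0-scratch-dht: /.shard/4b6d0aab-aa33-44dd-8d3f-2054712702dd.1: gfid different on data file on scratch-replicate-3, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 793507db-42e1-4b9e-9ce0-b2c2451f78dd
[2019-10-21 17:44:16.373429] W [MSGID: 109009] [dht-common.c:2562:dht_lookup_everywhere_cbk] 0-scratch-dht: /.shard/4b6d0aab-aa33-44dd-8d3f-2054712702dd.1: gfid differs on subvolume scratch-replicate-3, gfid local = 5c52fe2a-c580-42ae-b2cb-ce3cae39ffeb, gfid node = 793507db-42e1-4b9e-9ce0-b2c2451f78dd
"""

for hit in find_gfid_mismatches(sample):
    print(hit["path"], hit["subvol"], hit["local"], "!=", hit["node"])
```

Both warnings name the same shard and the same subvolume, but they report two different local gfids against the same node gfid.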

The error is consistent and reproducible.  Every time I remount and try to delete the files, the client crashes again.  I'm assuming something is wrong with the shards.  How do I correct this, or is this a bug in sharding?
