[Gluster-users] Possible reason for "meta-data data entry missing-entry gfid self-heal failed"?

Laurent Chouinard laurent.chouinard at ubisoft.com
Mon Jul 8 19:51:29 UTC 2013


Hi,

I found this thread while searching for a problem we're having. After a certain amount of time under sustained write load, our GlusterFS system experiences what appears to be the same problem Marc Seeger described.

We don't know exactly what triggers the event, but here are the last few lines of the fuse mount log (/var/log/mnt-gv0.log):

[2013-07-08 10:41:22.222438] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-gv0-client-3: remote operation failed: File exists. Path: /folder/ba/5d/ba5d2531274e683f8aa67494ebae8955 (00000000-0000-0000-0000-000000000000)
[2013-07-08 10:41:25.837297] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-gv0-client-3: remote operation failed: File exists. Path: /folder/8d/00/8d00b4d980cec0e2e6ebd800927e01de (00000000-0000-0000-0000-000000000000)
[2013-07-08 10:41:25.837408] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-gv0-client-2: remote operation failed: File exists. Path: /folder/8d/00/8d00b4d980cec0e2e6ebd800927e01de (00000000-0000-0000-0000-000000000000)
[2013-07-08 10:41:25.837724] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 99513056: OPEN() /folder/8d/00/8d00b4d980cec0e2e6ebd800927e01de => -1 (Invalid argument)
[2013-07-08 10:41:25.875407] I [afr-self-heal-common.c:1941:afr_sh_post_nb_entrylk_conflicting_sh_cbk] 0-gv0-replicate-1: Non blocking entrylks failed.
[2013-07-08 10:41:25.875456] E [afr-self-heal-common.c:2160:afr_self_heal_completion_cbk] 0-gv0-replicate-1: background  meta-data data entry missing-entry gfid self-heal failed on /folder/d3/4a/d34a2601138ea0b7d243d146e411c9a2
[2013-07-08 10:41:26.032067] W [inode.c:914:inode_lookup] (-->/usr/lib/glusterfs/3.3.1/xlator/debug/io-stats.so(io_stats_lookup_cbk+0x104) [0x7fc0723553e4] (-->/usr/lib/glusterfs/3.3.1/xlator/mount/fuse.so(+0x14232) [0x7fc075ada232] (-->/usr/lib/glusterfs/3.3.1/xlator/mount/fuse.so(+0x140a7) [0x7fc075ada0a7]))) 0-fuse: inode not found
[2013-07-08 10:45:01.928054] I [fuse-bridge.c:4091:fuse_thread_proc] 0-fuse: unmounting /mnt/gv0
[2013-07-08 10:45:01.943226] W [glusterfsd.c:831:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fc0776e44bd] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7fc0779b6e9a] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xd5) [0x7fc07849ca55]))) 0-: received signum (15), shutting down
[2013-07-08 10:45:01.943290] I [fuse-bridge.c:4648:fini] 0-fuse: Unmounting '/mnt/gv0'.

In the past few months, every node has had about 5 to 10 occurrences of this. The sequence of events seems to be (a rough log-scanning sketch follows the list):
- Warnings about other clients doing mknod on a file that already exists?
- A failed OPEN on a file that exists
- Replication: "Non blocking entrylks failed"
- Replication: "background meta-data data entry missing-entry gfid self-heal failed"
- "inode not found", followed by the mount being unmounted either immediately or a few minutes later
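In case anyone wants to look for the same pattern on their own nodes, a rough scan like the one below is how we tally the occurrences. The log path is the one shown above, and the marker strings are simply taken from our log lines; adjust both for your setup.

#!/usr/bin/env python
# Rough scan of the fuse mount log for the failure sequence described above.
# Log path and marker strings are taken from our logs; adjust as needed.

LOG = "/var/log/mnt-gv0.log"

# One marker per step of the sequence we keep seeing
MARKERS = [
    "remote operation failed: File exists",   # mknod on an existing file
    "OPEN()",                                  # failed OPEN returning -1
    "Non blocking entrylks failed",            # replicate entry locks
    "self-heal failed",                        # background self-heal failure
    "inode not found",                         # fuse loses the inode
    "unmounting",                              # mount goes away
]

counts = dict.fromkeys(MARKERS, 0)
with open(LOG) as f:
    for line in f:
        for marker in MARKERS:
            if marker in line:
                counts[marker] += 1

for marker in MARKERS:
    print("%6d  %s" % (counts[marker], marker))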

Here's another event:

[2013-07-08 10:57:29.762798] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-gv0-client-3: remote operation failed: File exists. Path: /assets/acu/79/95/79952dc84cc096d79dad886f16ef1c58 (00000000-0000-0000-0000-000000000000)
[2013-07-08 10:57:29.763110] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 24903: OPEN() /assets/acu/79/95/79952dc84cc096d79dad886f16ef1c58 => -1 (Invalid argument)
[2013-07-08 10:57:29.779663] I [afr-self-heal-common.c:1941:afr_sh_post_nb_entrylk_conflicting_sh_cbk] 0-gv0-replicate-1: Non blocking entrylks failed. 
[2013-07-08 10:57:29.779713] E [afr-self-heal-common.c:2160:afr_self_heal_completion_cbk] 0-gv0-replicate-1: background  meta-data data entry missing-entry gfid self-heal failed on /assets/acu/cd/4f/cd4ffce6846aa50bfcf233b4b0ebbe7e
[2013-07-08 10:57:42.796401] I [afr-self-heal-entry.c:2333:afr_sh_entry_fix] 0-gv0-replicate-0: /assets/acu/b8/a0: Performing conservative merge
[2013-07-08 10:57:55.036157] I [afr-self-heal-common.c:1970:afr_sh_post_nb_entrylk_gfid_sh_cbk] 0-gv0-replicate-1: Non blocking entrylks failed.
[2013-07-08 10:57:55.036775] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-gv0-client-3: remote operation failed: File exists. Path: /assets/acu/4a/eb/4aeb9e8041f36e55f64e6761d4d2afde (00000000-0000-0000-0000-000000000000)
[2013-07-08 10:57:55.036855] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-gv0-client-2: remote operation failed: File exists. Path: /assets/acu/4a/eb/4aeb9e8041f36e55f64e6761d4d2afde (00000000-0000-0000-0000-000000000000)
[2013-07-08 10:57:55.037211] W [inode.c:914:inode_lookup] (-->/usr/lib/glusterfs/3.3.1/xlator/debug/io-stats.so(io_stats_lookup_cbk+0x104) [0x7feec832c3e4] (-->/usr/lib/glusterfs/3.3.1/xlator/mount/fuse.so(+0x14232) [0x7feecbab1232] (-->/usr/lib/glusterfs/3.3.1/xlator/mount/fuse.so(+0x140a7) [0x7feecbab10a7]))) 0-fuse: inode not found
[2013-07-08 11:00:01.989980] I [fuse-bridge.c:4091:fuse_thread_proc] 0-fuse: unmounting /mnt/gv0
[2013-07-08 11:00:01.990196] W [glusterfsd.c:831:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7feecd6bb4bd] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7feecd98de9a] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xd5) [0x7feece473a55]))) 0-: received signum (15), shutting down
[2013-07-08 11:00:01.990222] I [fuse-bridge.c:4648:fini] 0-fuse: Unmounting '/mnt/gv0'.

I checked through all other log files and could not find anything relevant or indicative of the problem around those time stamps.
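If it helps narrow this down, one thing we can check the next time this happens is whether the GFID xattr of the affected file matches across the bricks. Below is a minimal sketch; the brick paths are made up for illustration (they are not our real layout), trusted.gfid is the GlusterFS GFID xattr, and os.getxattr needs Python 3.3+ on Linux.

# Compare the trusted.gfid xattr of one file across bricks.
# BRICKS are hypothetical example paths; REL_PATH is one of the files
# named in the self-heal error above. A mismatch or a missing xattr
# would point at the gfid self-heal failure reported in the client log.
import os
import binascii

BRICKS = ["/export/brick1/gv0", "/export/brick2/gv0"]   # example paths only
REL_PATH = "folder/d3/4a/d34a2601138ea0b7d243d146e411c9a2"

for brick in BRICKS:
    full = os.path.join(brick, REL_PATH)
    try:
        gfid = os.getxattr(full, "trusted.gfid")
        print("%s  gfid=%s" % (full, binascii.hexlify(gfid).decode()))
    except OSError as e:
        print("%s  (could not read gfid: %s)" % (full, e))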

Is this related to the bug that Marc Seeger mentioned? (http://review.gluster.org/#/c/4689/)

Thanks,

Laurent Chouinard
