[Gluster-users] Fuse client dying after "gfid different on subvolume" ?
Marc Seeger
marc.seeger at acquia.com
Mon Jun 3 09:07:01 UTC 2013
Hey gluster-users,
I just stumbled on a problem in our current test setup of gluster 3.3.2.
This is a simple replicated setup with 2 bricks (on XFS) in 1 volume, running glusterfs version 3.3.2qa3 on Ubuntu Lucid.
The client mounting this volume on /mnt/gfs runs on another machine and uses FUSE (Version: 2.8.1-1.1ubuntu3.1).
From the glusterfs FUSE client mount log:
[2013-06-02 21:23:26.677069] W [afr-common.c:1196:afr_detect_self_heal_by_iatt] 0-test-fs-cluster-1-replicate-0: /home/filesshared/README.txt.lock: gfid different on subvolume
[2013-06-02 21:23:26.677069] I [afr-self-heal-common.c:1970:afr_sh_post_nb_entrylk_gfid_sh_cbk] 0-test-fs-cluster-1-replicate-0: Non blocking entrylks failed.
[2013-06-02 21:23:26.697068] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-test-fs-cluster-1-client-0: remote operation failed: File exists. Path: /home/filesshared/README.txt.lock (00000000-0000-0000-0000-000000000000)
[2013-06-02 21:23:26.697068] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-test-fs-cluster-1-client-1: remote operation failed: File exists. Path: /home/filesshared/README.txt.lock (00000000-0000-0000-0000-000000000000)
[2013-06-02 21:23:26.697068] W [inode.c:914:inode_lookup] (-->/usr/lib/glusterfs/3.3.2qa3/xlator/debug/io-stats.so(io_stats_lookup_cbk+0xff) [0x7fb16c310d8f] (-->/usr/lib/glusterfs/3.3.2qa3/xlator/mount/fuse.so(+0xf248) [0x7fb16fa95248] (-->/usr/lib/glusterfs/3.3.2qa3/xlator/mount/fuse.so(+0xf0b1) [0x7fb16fa950b1]))) 0-fuse: inode not found
What the application side was doing when this happened:
1. Created /home/filesshared
2. Created /mnt/gfs/home/filesshared
3. Deleted /home/filesshared and replaced it with a symlink from /home/filesshared to /mnt/gfs/home/filesshared
4. Tried to write some files
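The four steps above can be sketched in Python for anyone who wants to replay them against scratch directories. This is only an illustration, not a confirmed reproducer: the function name deploy_shared_dir is hypothetical, the modes are taken from the application log, and the chown to the filesshared user and group is left out.

```python
import os


def deploy_shared_dir(local_dir, gluster_dir):
    """Replay the four application steps; not a confirmed reproducer."""
    # 1. Create the local directory (mode taken from the application log).
    os.makedirs(local_dir, mode=0o550, exist_ok=True)
    # 2. Create the matching directory on the gluster mount.
    os.makedirs(gluster_dir, mode=0o700, exist_ok=True)
    # 3. Delete the (empty) local directory and replace it with a symlink.
    os.rmdir(local_dir)
    os.symlink(gluster_dir, local_dir)
    # 4. Write a file through the symlink.
    lock_path = os.path.join(local_dir, "README.txt.lock")
    with open(lock_path, "w") as handle:
        handle.write("lock\n")
    return lock_path
```

Calling deploy_shared_dir("/home/filesshared", "/mnt/gfs/home/filesshared") would mirror the paths in this report; on a healthy mount it should simply succeed.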
Here's the log for that:
2013-06-02T21:23:26+00:00 daemon.notice web-14 f-c-w[4842]: deploying filesshared.prod
2013-06-02T21:23:26+00:00 daemon.notice web-14 f-c-w[4842]: creating directory: dir=/home/filesshared, user=0, group=filesshared, mode=0550
2013-06-02T21:23:26+00:00 daemon.notice web-14 f-c-w[4842]: creating directory: dir=/mnt/gfs/home/filesshared, user=filesshared, group=filesshared, mode=0700
2013-06-02T21:23:26+00:00 daemon.notice web-14 f-c-w[4842]: created /home/filesshared -> /mnt/gfs/home/filesshared
2013-06-02T21:23:26+00:00 daemon.notice web-14 f-c-w[4842]: PHP Warning: stat(): stat failed for /home/filesshared/README.txt.lock in /usr/ah/lib/ah-lib.php on line 701
2013-06-02T21:23:27+00:00 daemon.notice web-14 f-c-w[4842]: PHP Warning: stat(): stat failed for /home/filesshared/README.txt.lock in /usr/ah/lib/ah-lib.php on line 701
2013-06-02T21:23:27+00:00 daemon.notice web-14 f-c-w[4842]: PHP Warning: stat(): stat failed for /home/filesshared/README.txt.lock in /usr/ah/lib/ah-lib.php on line 701
2013-06-02T21:23:28+00:00 daemon.notice web-14 f-c-w[4842]: PHP Warning: stat(): stat failed for /home/filesshared/README.txt.lock in /usr/ah/lib/ah-lib.php on line 701
What this resulted in:
The mount point became completely unresponsive.
In PHP, file_exists('/mnt/gfs') returns false and stat() calls fail. In Ruby, File.directory?('/mnt/gfs') returns false.
The only fix so far is to call "umount /mnt/gfs" and then remount the share from fstab ("mount /mnt/gfs").
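As a stopgap, the check and the remount described above could be wrapped in a small watchdog. The following is a minimal sketch, assuming the failure always shows up as a failing stat() on the mount point, as it did here; the function names are hypothetical, and remounting needs root.

```python
import os
import subprocess


def mount_is_responsive(mount_point):
    """Return True if the mount point can be stat()ed and is a directory."""
    try:
        os.stat(mount_point)
    except OSError:
        return False
    return os.path.isdir(mount_point)


def remount(mount_point):
    """Unmount, then mount again using the fstab entry."""
    subprocess.run(["umount", mount_point], check=True)
    subprocess.run(["mount", mount_point], check=True)


def ensure_mounted(mount_point="/mnt/gfs"):
    """Remount only when the check fails; return True if a remount ran."""
    if mount_is_responsive(mount_point):
        return False
    remount(mount_point)
    return True
```

Note that this check cannot tell a healthy mount from an empty, unmounted directory, and it only works around the symptom; it does not explain why the client dies.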
I could not find any relevant log entries on the bricks themselves. Unfortunately, I also wasn't able to come up with a test case that reproduces it.
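Beyond the logs, one thing that could be checked on the bricks is the trusted.gfid extended attribute that glusterfs stores on each file, since the client warning says it differs between subvolumes. The following is a rough sketch, assuming Linux, root access on each brick server, and a hypothetical brick path such as /export/brick1/home/filesshared/README.txt.lock; the helper names are made up for illustration.

```python
import os


def read_gfid(brick_file):
    """Return the trusted.gfid xattr as hex, or None if it cannot be read."""
    try:
        # os.getxattr is Linux-only; reading trusted.* attributes needs root.
        return os.getxattr(brick_file, "trusted.gfid").hex()
    except OSError:
        return None


def gfids_match(gfids):
    """True when every brick reported the same, non-missing gfid."""
    values = set(gfids.values())
    return len(values) == 1 and None not in values
```

Since the two bricks live on different servers, read_gfid would have to be run on each server and the results collected by hand, for example into a dict like {"brick-0": "...", "brick-1": "..."}, before passing them to gfids_match.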
It seems somewhat similar to http://gluster.org/pipermail/gluster-users/2013-March/035662.html
I initially thought that this could have been fixed in http://review.gluster.org/#/c/4689/ , but the qa branch we run has this fix backported.
Any idea what could cause this behaviour?
Cheers,
Marc