[Gluster-users] Directory metadata inconsistencies and missing output ("mismatched layout" and "no dentry for inode" errors)

Douglas Colkitt douglas.colkitt at gmail.com
Mon Feb 18 19:29:19 UTC 2013


Hi, I'm running into a rather strange and frustrating bug and am wondering if
anyone on the mailing list has some insight into what might be causing it. I'm
running a cluster of two dozen nodes, where the processing nodes are also the
gluster bricks (using the SLURM resource manager). Each node has the gluster
volume mounted natively via FUSE (not NFS). All nodes are running v3.2.7. Each
job on a node runs a shell script (runGroupGen) like so:

#!/bin/bash
# runGroupGen: write one group's output into the container directory
containerDir=$1
groupNum=$2
mkdir -p "$containerDir"
./generateGroupGen.py "$groupNum" > "$containerDir/$groupNum.out"

I then run the following jobs:

runGroupGen [glusterDirectory] 1
runGroupGen [glusterDirectory] 2
runGroupGen [glusterDirectory] 3
...

Typically about 200 jobs launch within milliseconds of each other, so the
glusterfs/fuse mount receives a large number of simultaneous create-directory
and create-file system calls in a very short window.
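
For concreteness, the submission loop is roughly the following (the sbatch
options are illustrative rather than my exact invocation, and
$glusterDirectory stands in for the real mount path):

#!/bin/bash
# Submit ~200 independent jobs within moments of each other, all writing
# into the same directory on the gluster mount.
glusterDirectory=$1
for groupNum in $(seq 1 200); do
    sbatch --ntasks=1 runGroupGen "$glusterDirectory" "$groupNum"
done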

Some of the output files inside the directory exist but contain no output.
When this occurs it is always the case that either all jobs on a node behave
normally or all of them fail to produce output. It should be noted that the
processes themselves generate no error messages, and all processes on a
no-output node exit with a zero return code. In that sense the failure is
silent, yet it corrupts the data, which is dangerous. The only indication of
a problem is a series of errors (on the no-output nodes) in
/var/log/distrib-glusterfs.log of the form:

[2013-02-18 05:55:31.382279] E [client3_1-fops.c:2228:client3_1_lookup_cbk]
0-volume1-client-16: remote operation failed: Stale NFS file handle
[2013-02-18 05:55:31.382302] E [client3_1-fops.c:2228:client3_1_lookup_cbk]
0-volume1-client-17: remote operation failed: Stale NFS file handle
[2013-02-18 05:55:31.382327] E [client3_1-fops.c:2228:client3_1_lookup_cbk]
0-volume1-client-18: remote operation failed: Stale NFS file handle
[2013-02-18 05:55:31.640791] W [inode.c:1044:inode_path]
(-->/usr/lib/glusterfs/3.2.7/xlator/mount/fuse.so(+0xe8fd) [0x7fa8341868fd]
(-->/usr/lib/glusterfs/3.2.7/xlator/mount/fuse.so(+0xa6bb) [0x7fa8341826bb]
(-->/usr/lib/glusterfs/3.2.7/xlator/mount/fuse.so(fuse_loc_fill+0x1c6)
[0x7fa83417d156]))) 0-volume1/inode: no dentry for non-root inode
-69777006931: 0a37836d-e9e5-4cc1-8bd2-e8a49947959b
[2013-02-18 05:55:31.640865] W [fuse-bridge.c:561:fuse_getattr]
0-glusterfs-fuse: 2298073: GETATTR 140360215569520 (fuse_loc_fill() failed)
[2013-02-18 05:55:31.641672] W [inode.c:1044:inode_path]
(-->/usr/lib/glusterfs/3.2.7/xlator/mount/fuse.so(+0xe8fd) [0x7fa8341868fd]
(-->/usr/lib/glusterfs/3.2.7/xlator/mount/fuse.so(+0xa6bb) [0x7fa8341826bb]
(-->/usr/lib/glusterfs/3.2.7/xlator/mount/fuse.so(fuse_loc_fill+0x1c6)
[0x7fa83417d156]))) 0-volume1/inode: no dentry for non-root inode
-69777006931: 0a37836d-e9e5-4cc1-8bd2-e8a49947959b
[2013-02-18 05:55:31.641724] W [fuse-bridge.c:561:fuse_getattr]
0-glusterfs-fuse: 2298079: GETATTR 140360215569520 (fuse_loc_fill() failed)
...
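
For reference, a check along these lines exposes the silently failed runs
after the fact (just a sketch; $containerDir is whatever directory the batch
wrote into):

#!/bin/bash
# List .out files that were created but ended up empty (the silent failures).
containerDir=$1
find "$containerDir" -name '*.out' -size 0 -print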

Sometimes, though not always, these events are also accompanied by log
entries (on both normal and abnormal nodes) of the form:

[2013-02-18 03:35:28.679681] I [dht-common.c:525:dht_revalidate_cbk]
0-volume1-dht: mismatching layouts for /inSample/pred/20110831

I understand from reading the mailing list that the dentry errors and the
mismatched-layout errors are both non-fatal warnings and that the metadata
will eventually become internally consistent regardless. But these errors
only appear at times when I'm slamming the glusterfs system with the creation
of a bunch of small files in a very short burst, as described above. So their
presence seems to be related to the failure.

I think the issue is almost assuredly related to the delayed propagation of
glusterfs directory metadata. Several nodes create the same directory
simultaneously, and this produces inconsistencies in the dht layout
information. My hypothesis is that while Node A is still writing, the process
of resolving those inconsistencies and propagating the metadata from Node B
leaves the location Node A is writing to disconnected from its supposed path
(hence the "no dentry" errors).
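
If that hypothesis is right, one workaround I'm considering (only a guess, not
something I've verified) is to create the container directory once, from the
submission node, before any jobs are dispatched, so the jobs themselves never
race on mkdir:

# Pre-create the output directory from a single node before submitting the
# batch. Untested; this only sidesteps the suspected concurrent-mkdir race.
mkdir -p "$glusterDirectory"
# ... then submit the runGroupGen jobs as before ...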

I've spent some effort going through the glusterfs source code, particularly
the dht-related files. The way dht normalizes anomalies could be the problem,
but I've failed to find anything specific.

Has anyone else run into a problem like this, or have insight about what
might be causing it or how to avoid it?