[Gluster-users] Gluster FUSE mount sometimes reports that files do not exist until ls is performed on parent directory

Mon Apr 16 08:37:26 UTC 2018

On Mon, Apr 16, 2018 at 1:54 PM, Niels Hendriks <niels at nuvini.com> wrote:

> Hi,
>
> We have a 3-node Gluster setup where each node is both a server and a
> client.
> Every few days some random file or directory does not exist according to
> the FUSE mount point. When we try to access it (stat, cat, etc.), the
> filesystem reports that the file or directory does not exist, even though
> it does. When we try to create the file or directory, we receive the
> following error, which is also logged in
> /var/log/glusterfs/bricks/$brick.log:
>
> [2018-04-10 12:51:26.755928] E [MSGID: 113027] [posix.c:1779:posix_mkdir] 0-www-posix: mkdir of /storage/gluster/path/to/dir failed [File exists]
>
> We don't see this issue on all of the servers, but only on the servers that
> did not create the file/directory (so 2 of the 3 gluster nodes).
>
> We found that this issue does not resolve itself. However, when we run ls
> on the parent directory, the issue is resolved on the other nodes.
>
> We are running GlusterFS 3.12.6 on Debian 8.
>
> Mount options in /etc/fstab:
> /dev/storage-gluster/gluster /storage/gluster xfs rw,inode64,noatime,nouuid 0 2
> localhost:/www /var/www glusterfs backup-volfile-servers=10.0.0.2:10.0.0.3,log-level=WARNING 0 0
>
> gluster volume info www
>
> Volume Name: www
> Type: Replicate
> Volume ID: e0579d53-f671-4868-863b-ba85c4cfacb3
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: n01c01-gluster:/storage/gluster/www
> Brick2: n02c01-gluster:/storage/gluster/www
> Brick3: n03c01-gluster:/storage/gluster/www
> Options Reconfigured:
> performance.read-ahead: on
> performance.client-io-threads: on
> nfs.disable: on
> transport.address-family: inet
> performance.md-cache-timeout: 600
> diagnostics.brick-log-level: WARNING
> network.ping-timeout: 3
> features.cache-invalidation: on
> server.event-threads: 4
> performance.cache-invalidation: on
> performance.quick-read: on
> features.cache-invalidation-timeout: 600
> network.inode-lru-limit: 90000
> performance.cache-priority: *.php:3,*.temp:3,*:1
> performance.nl-cache: on
> performance.cache-size: 1GB
> performance.readdir-ahead: on
> performance.write-behind: on
> cluster.readdir-optimize: on
> performance.io-thread-count: 64
> client.event-threads: 4
> cluster.lookup-optimize: on
> performance.parallel-readdir: off
> performance.write-behind-window-size: 4MB
> performance.flush-behind: on
> features.bitrot: on
> features.scrub: Active
> performance.io-cache: off
> performance.stat-prefetch: on
>
> We suspected md-cache, but its timeout is 600 seconds, which makes it an
> unlikely cause: the issue can persist for hours (until we run ls to fix
> it).
>
> Does anyone have an idea of what could be the cause of this?
>
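
The ls-on-parent workaround described above could be automated as a stopgap. A minimal Python sketch, assuming the client retries once after listing the parent directory through the FUSE mount; the function name is hypothetical, os.listdir() only approximates a plain `ls`, and this hides the symptom rather than fixing the lookup problem.

```python
import os


def stat_with_parent_listing(path):
    """stat() a path on the FUSE mount; on ENOENT, list the parent and retry.

    Hypothetical helper. Listing the parent directory approximates the
    manual `ls` that makes the entry visible again on the affected nodes.
    If the path still cannot be found, FileNotFoundError propagates.
    """
    try:
        return os.stat(path)
    except FileNotFoundError:
        parent = os.path.dirname(os.path.abspath(path))
        os.listdir(parent)  # readdir on the parent, roughly a plain `ls`
        return os.stat(path)
```
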

For files, this can happen when both of the following are true:
* cluster.lookup-optimize is on
* the data file is present on a non-hashed subvolume, but the linkto file
is missing on the hashed subvolume

I see that lookup-optimize is on. Can you get the following information
for the problematic file?

* Name of the file
* All xattrs on the parent directory from all bricks
* stat of the file from all bricks where it is present
* All xattrs on the file from all bricks where it is present

If you are seeing the problem on a directory:
* Name of the directory
* xattrs of the directory and its parent from all bricks
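
A minimal sketch for gathering the information above on each brick, assuming Python 3 on Linux, run as root directly against the brick path (for example under /storage/gluster/www), not the FUSE mount; trusted.* xattrs are only readable by root. The script and its output format are hypothetical; running `stat <path>` and `getfattr -d -m . -e hex <path>` on each brick, for the entry and its parent, collects the same data.

```python
#!/usr/bin/env python3
# Hypothetical helper: dump stat info and all xattrs for a path and its
# parent directory on a brick. Run as root on every brick.
import os
import stat
import sys


def describe(path):
    """Return basic stat fields and hex-encoded xattrs for one path."""
    st = os.lstat(path)
    info = {
        "path": path,
        "type": "dir" if stat.S_ISDIR(st.st_mode) else "file",
        "mode": oct(stat.S_IMODE(st.st_mode)),  # includes the sticky bit
        "size": st.st_size,
        "nlink": st.st_nlink,
        "xattrs": {},
    }
    try:
        names = os.listxattr(path, follow_symlinks=False)
    except OSError as exc:
        # No xattr support on this filesystem, or a permission problem.
        info["xattr_error"] = exc.strerror
        return info
    for name in sorted(names):
        try:
            value = os.getxattr(path, name, follow_symlinks=False)
            info["xattrs"][name] = "0x" + value.hex()  # like getfattr -e hex
        except OSError as exc:
            info["xattrs"][name] = "<error: %s>" % exc.strerror
    return info


def describe_with_parent(path):
    """Describe the parent directory first, then the path itself."""
    parent = os.path.dirname(os.path.abspath(path))
    return [describe(parent), describe(path)]


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        for entry in describe_with_parent(arg):
            print(entry["type"], entry["path"], "mode", entry["mode"],
                  "size", entry["size"], "nlink", entry["nlink"])
            for name, value in entry["xattrs"].items():
                print("    %s=%s" % (name, value))
            if "xattr_error" in entry:
                print("    (xattrs unavailable: %s)" % entry["xattr_error"])
```

For example, run it with /storage/gluster/www/path/to/file as the argument on each of the three bricks. A DHT linkto file, if present, shows up with mode 0o1000 (sticky bit only) and a trusted.glusterfs.dht.linkto xattr.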

regards,
Raghavendra

> Thanks!
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>