[Gluster-devel] stat() returns invalid file size when self healing

Ravishankar N ravishankar at redhat.com
Wed Apr 12 11:10:59 UTC 2017


On 04/12/2017 01:57 PM, Mateusz Slupny wrote:
> Hi,
>
> I'm observing strange behavior when accessing a glusterfs 3.10.0 volume 
> through a FUSE mount: while self-healing is in progress, stat() on a file 
> that I know has non-zero size and is being appended to returns 0 
> (success), but with st_size set to 0 as well.
>
> Next week I'm planning to find a minimal reproducible example and file 
> a bug report. I wasn't able to find any references to similar issues, 
> but I wanted to make sure that it isn't an already known problem.
>
> Some notes about my current setup:
> - Multiple applications are writing to multiple FUSE mounts pointing 
> to the same gluster volume. Only one of those applications writes to a 
> given file at a time. I am only appending to files, or to be specific, 
> calling pwrite() with the offset set to the file size obtained by 
> stat(); see the sketch after this list. (I'm not sure if using O_APPEND 
> would change anything, but even then it would only be a workaround, so 
> it shouldn't matter.)
> - The issue happens even if no reads are performed on those files, 
> i.e. the load is no higher than usual.
> - Since I call stat() only before writing, and only one node writes to 
> a given file, this means stat() returns an invalid size even to the 
> client that writes to the file.
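
For illustration, a minimal sketch of the append pattern described in the 
first point above; the file path, payload and error handling are 
placeholders, not taken from the reporter's application:

#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Append one record by stat()-ing the file for its current size and then
 * pwrite()-ing at that offset, without O_APPEND. */
static int append_record(const char *path, const void *buf, size_t len)
{
    struct stat st;
    if (stat(path, &st) < 0)        /* size obtained via stat() */
        return -1;

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    /* While self-heal is in progress, st.st_size has been observed as 0
     * here even though the file is non-empty, so the write would land at
     * offset 0. */
    ssize_t n = pwrite(fd, buf, len, st.st_size);
    close(fd);
    return (n == (ssize_t)len) ? 0 : -1;
}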
>
> Steps to reproduce:
> 0. Have multiple processes constantly appending data to files.
> 1. Stop one replica.
> 2. Wait a few minutes.
> 3. Start that replica again - shd starts self healing.
> 4. stat() on some of the files that are being healed returns st_size 
> equal to 0 (a probe for this check is sketched below).
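
For step 4, a small probe one could run against the FUSE mount while the 
heal is in progress; the mount path below is a hypothetical placeholder:

#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    /* Placeholder path to a file on the FUSE mount that is known to be
     * non-empty and is being appended to. */
    const char *path = "/mnt/glustervol/appended-file";

    for (;;) {
        struct stat st;
        if (stat(path, &st) == 0 && st.st_size == 0)
            printf("stat() succeeded but st_size == 0\n");
        sleep(1);   /* poll once per second */
    }
}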
>
> Setup:
> - glusterfs 3.10.0
>
> - volume type: replicas with arbiters
> Type: Distributed-Replicate
> Number of Bricks: 12 x (2 + 1) = 36
>
> - FUSE mount configuration:
> -o direct-io-mode=on passed explicitly to mount
>
> - volume configuration:
> cluster.consistent-metadata: yes
> cluster.eager-lock: on
> cluster.readdir-optimize: on
> cluster.self-heal-readdir-size: 64KB
> cluster.self-heal-daemon: on
> cluster.read-hash-mode: 2
> cluster.use-compound-fops: on
> cluster.ensure-durability: on
> cluster.granular-entry-heal: enable
> cluster.entry-self-heal: off
> cluster.data-self-heal: off
> cluster.metadata-self-heal: off
> performance.quick-read: off
> performance.md-cache-timeout: 600
> performance.cache-invalidation: on
> performance.stat-prefetch: on
> features.cache-invalidation-timeout: 600
> features.cache-invalidation: on
> performance.flush-behind: off
> performance.write-behind: off
> performance.open-behind: off
> cluster.background-self-heal-count: 1
> network.inode-lru-limit: 1024
> network.ping-timeout: 1
> performance.io-cache: off
> transport.address-family: inet
> nfs.disable: on
> cluster.locking-scheme: granular
>
> I have already verified that the following options do not influence this 
> behavior:
> - cluster.data-self-heal-algorithm (all possible values)
> - cluster.eager-lock
> - cluster.consistent-metadata
> - performance.stat-prefetch
>
> I would greatly appreciate any hints on what may be wrong with the 
> current setup, or what to focus on (or not) in a minimal reproducible 
> example.


Would you be able to try and see if you can reproduce this on a plain 
replica-3 volume? Since you are observing it on an arbiter configuration, 
the bug could be that the stat is being served from the arbiter brick, but 
that was fixed (http://review.gluster.org/13609) in one of the 3.7 
releases, so maybe this is a new bug. In any case, please do raise the bug 
with the gluster logs attached.

Regards,
Ravi

>
> thanks and best regards,
> Matt
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel



