[Gluster-users] missing files on FUSE mount

Martín Lorenzo mlorenzo at gmail.com
Fri Oct 23 15:50:17 UTC 2020


Hi Eli, remounting the volume fixes it.
So, regarding cache invalidation, which volume options should I modify in
order to minimize these stale entries?
I cannot use the GlusterFS VFS module with Samba, since it is broken in 4.10.5:
https://lists.samba.org/archive/samba/2019-June/223683.html
Also, is it correlated with system load? I'm planning to upgrade CPU/RAM on
a pair of nodes to see what happens...
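
For reference, this is the kind of change I have in mind (only a sketch: the
volume name and option names are taken from the volume info quoted below, but
the lower timeout values are guesses on my part, and whether shortening these
client-side caches actually helps is exactly what I'm asking):

  # shorten the md-cache / negative-lookup cache timeouts (currently 600 s)
  gluster volume set tapeless performance.md-cache-timeout 60
  gluster volume set tapeless performance.nl-cache-timeout 60
  gluster volume set tapeless features.cache-invalidation-timeout 60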

On Fri, Oct 23, 2020 at 12:33 PM Eli V <eliventer at gmail.com> wrote:

> On Tue, Oct 20, 2020 at 8:41 AM Martín Lorenzo <mlorenzo at gmail.com> wrote:
> >
> > Hi, I have the following problem: I have a distributed-replicated
> cluster set up with Samba and CTDB over FUSE mount points.
> > I am seeing inconsistencies across the FUSE mounts; users report that
> files are disappearing after being copied/moved. When I take a look at the mount
> points on each node, they don't display the same data:
> >
> > #### faulty mount point####
> > [root@gluster6 ARRIBA GENTE martes 20 de octubre]# ll
> > ls: cannot access PANEO VUELTA A CLASES CON TAPABOCAS.mpg: No such file
> or directory
> > ls: cannot access PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg: No such file
> or directory
> > total 633723
> > drwxr-xr-x. 5 arribagente PN      4096 Oct 19 10:52 COMERCIAL AG martes
> 20 de octubre
> > -rw-r--r--. 1 arribagente PN 648927236 Jun  3 07:16 PANEO FACHADA
> PALACIO LEGISLATIVO DRONE DIA Y NOCHE.mpg
> > -?????????? ? ?           ?          ?            ? PANEO NIÑOS ESCUELAS
> CON TAPABOCAS.mpg
> > -?????????? ? ?           ?          ?            ? PANEO VUELTA A
> CLASES CON TAPABOCAS.mpg
> >
> >
> > ###healthy mount point###
> > [root@gluster7 ARRIBA GENTE martes 20 de octubre]# ll
> > total 3435596
> > drwxr-xr-x. 5 arribagente PN       4096 Oct 19 10:52 COMERCIAL AG martes
> 20 de octubre
> > -rw-r--r--. 1 arribagente PN  648927236 Jun  3 07:16 PANEO FACHADA
> PALACIO LEGISLATIVO DRONE DIA Y NOCHE.mpg
> > -rw-r--r--. 1 arribagente PN 2084415492 Aug 18 09:14 PANEO NIÑOS
> ESCUELAS CON TAPABOCAS.mpg
> > -rw-r--r--. 1 arribagente PN  784701444 Sep  4 07:23 PANEO VUELTA A
> CLASES CON TAPABOCAS.mpg
> >
> >  - So far the only way to solve this is to create a directory in the
> healthy mount point, on the same path:
> > [root@gluster7 ARRIBA GENTE martes 20 de octubre]# mkdir hola
> >
> > - When you refresh the other mount point, the issue is resolved:
> > [root@gluster6 ARRIBA GENTE martes 20 de octubre]# ll
> > total 3435600
> > drwxr-xr-x. 5 arribagente PN         4096 Oct 19 10:52 COMERCIAL AG
> martes 20 de octubre
> > drwxr-xr-x. 2 root        root       4096 Oct 20 08:45 hola
> > -rw-r--r--. 1 arribagente PN    648927236 Jun  3 07:16 PANEO FACHADA
> PALACIO LEGISLATIVO DRONE DIA Y NOCHE.mpg
> > -rw-r--r--. 1 arribagente PN   2084415492 Aug 18 09:14 PANEO NIÑOS
> ESCUELAS CON TAPABOCAS.mpg
> > -rw-r--r--. 1 arribagente PN    784701444 Sep  4 07:23 PANEO VUELTA A
> CLASES CON TAPABOCAS.mpg
> >
> > Interestingly, the error occurs on the mount point where the files were
> copied. They don't show up as pending heal entries. I have around 15 people
> using the volume over Samba, and this issue gets reported roughly every two
> days.
> >
> > I have an older cluster with similar issues: a different gluster version,
> but a very similar topology (4 bricks, initially two bricks, then expanded).
> > Please note, the bricks aren't the same size (but their replicas are),
> so my other suspicion is that rebalancing has something to do with it.
> >
> > I'm trying to reproduce it over a small virtualized cluster, so far no
> results.
> >
> > Here are the cluster details:
> > four nodes, replica 2, plus one arbiter node hosting 2 bricks
> > I have 2 bricks with ~20 TB capacity and the other pair is ~48 TB
> > Volume Name: tapeless
> > Type: Distributed-Replicate
> > Volume ID: 53bfa86d-b390-496b-bbd7-c4bba625c956
> > Status: Started
> > Snapshot Count: 0
> > Number of Bricks: 2 x (2 + 1) = 6
> > Transport-type: tcp
> > Bricks:
> > Brick1: gluster6.glustersaeta.net:/data/glusterfs/tapeless/brick_6/brick
> > Brick2: gluster7.glustersaeta.net:/data/glusterfs/tapeless/brick_7/brick
> > Brick3: kitchen-store.glustersaeta.net:/data/glusterfs/tapeless/brick_1a/brick (arbiter)
> > Brick4: gluster12.glustersaeta.net:/data/glusterfs/tapeless/brick_12/brick
> > Brick5: gluster13.glustersaeta.net:/data/glusterfs/tapeless/brick_13/brick
> > Brick6: kitchen-store.glustersaeta.net:/data/glusterfs/tapeless/brick_2a/brick (arbiter)
> > Options Reconfigured:
> > features.quota-deem-statfs: on
> > performance.client-io-threads: on
> > nfs.disable: on
> > transport.address-family: inet
> > features.quota: on
> > features.inode-quota: on
> > features.cache-invalidation: on
> > features.cache-invalidation-timeout: 600
> > performance.cache-samba-metadata: on
> > performance.stat-prefetch: on
> > performance.cache-invalidation: on
> > performance.md-cache-timeout: 600
> > network.inode-lru-limit: 200000
> > performance.nl-cache: on
> > performance.nl-cache-timeout: 600
> > performance.readdir-ahead: on
> > performance.parallel-readdir: on
> > performance.cache-size: 1GB
> > client.event-threads: 4
> > server.event-threads: 4
> > performance.normal-prio-threads: 16
> > performance.io-thread-count: 32
> > performance.write-behind-window-size: 8MB
> > storage.batch-fsync-delay-usec: 0
> > cluster.data-self-heal: on
> > cluster.metadata-self-heal: on
> > cluster.entry-self-heal: on
> > cluster.self-heal-daemon: on
> > performance.write-behind: on
> > performance.open-behind: on
> >
> > Log section from the faulty mount point. I think the [File exists] entries
> are from people trying to copy the missing files over and over.
> >
> >
> > [2020-10-20 11:31:03.034220] I [MSGID: 108031]
> [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0:
> selecting local read_child tapeless-client-0
> > [2020-10-20 11:32:06.684329] I [MSGID: 108031]
> [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0:
> selecting local read_child tapeless-client-0
> > [2020-10-20 11:33:02.191863] I [MSGID: 108031]
> [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0:
> selecting local read_child tapeless-client-0
> > [2020-10-20 11:34:05.841608] I [MSGID: 108031]
> [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0:
> selecting local read_child tapeless-client-0
> > [2020-10-20 11:35:20.736633] I [MSGID: 108026]
> [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
> 0-tapeless-replicate-1: performing metadata selfheal on
> 958dbd7a-3cd7-4b66-9038-76e5c5669644
> > [2020-10-20 11:35:20.741213] I [MSGID: 108026]
> [afr-self-heal-common.c:1750:afr_log_selfheal] 0-tapeless-replicate-1:
> Completed metadata selfheal on 958dbd7a-3cd7-4b66-9038-76e5c5669644.
> sources=[0] 1  sinks=2
> > [2020-10-20 11:35:04.278043] I [MSGID: 108031]
> [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0:
> selecting local read_child tapeless-client-0
> > The message "I [MSGID: 108026]
> [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
> 0-tapeless-replicate-1: performing metadata selfheal on
> 958dbd7a-3cd7-4b66-9038-76e5c5669644" repeated 3 times between [2020-10-20
> 11:35:20.736633] and [2020-10-20 11:35:26.733298]
> > The message "I [MSGID: 108026]
> [afr-self-heal-common.c:1750:afr_log_selfheal] 0-tapeless-replicate-1:
> Completed metadata selfheal on 958dbd7a-3cd7-4b66-9038-76e5c5669644.
> sources=[0] 1  sinks=2 " repeated 3 times between [2020-10-20
> 11:35:20.741213] and [2020-10-20 11:35:26.737629]
> > [2020-10-20 11:36:02.548350] I [MSGID: 108031]
> [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0:
> selecting local read_child tapeless-client-0
> > [2020-10-20 11:36:57.365537] I [MSGID: 108026]
> [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
> 0-tapeless-replicate-1: performing metadata selfheal on
> f4907af2-1775-4c46-89b5-e9776df6d5c7
> > [2020-10-20 11:36:57.370824] I [MSGID: 108026]
> [afr-self-heal-common.c:1750:afr_log_selfheal] 0-tapeless-replicate-1:
> Completed metadata selfheal on f4907af2-1775-4c46-89b5-e9776df6d5c7.
> sources=[0] 1  sinks=2
> > [2020-10-20 11:37:01.363925] I [MSGID: 108026]
> [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
> 0-tapeless-replicate-1: performing metadata selfheal on
> f4907af2-1775-4c46-89b5-e9776df6d5c7
> > [2020-10-20 11:37:01.368069] I [MSGID: 108026]
> [afr-self-heal-common.c:1750:afr_log_selfheal] 0-tapeless-replicate-1:
> Completed metadata selfheal on f4907af2-1775-4c46-89b5-e9776df6d5c7.
> sources=[0] 1  sinks=2
> > The message "I [MSGID: 108031]
> [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0:
> selecting local read_child tapeless-client-0" repeated 3 times between
> [2020-10-20 11:36:02.548350] and [2020-10-20 11:37:36.389208]
> > [2020-10-20 11:38:07.367113] I [MSGID: 108031]
> [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0:
> selecting local read_child tapeless-client-0
> > [2020-10-20 11:39:01.595981] I [MSGID: 108031]
> [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0:
> selecting local read_child tapeless-client-0
> > [2020-10-20 11:40:04.184899] I [MSGID: 108031]
> [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0:
> selecting local read_child tapeless-client-0
> > [2020-10-20 11:41:07.833470] I [MSGID: 108031]
> [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0:
> selecting local read_child tapeless-client-0
> > [2020-10-20 11:42:01.871621] I [MSGID: 108031]
> [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0:
> selecting local read_child tapeless-client-0
> > [2020-10-20 11:43:04.399194] I [MSGID: 108031]
> [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0:
> selecting local read_child tapeless-client-0
> > [2020-10-20 11:44:04.558647] I [MSGID: 108031]
> [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0:
> selecting local read_child tapeless-client-0
> > [2020-10-20 11:44:15.953600] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2114:client4_0_create_cbk] 0-tapeless-client-5:
> remote operation failed. Path: /PN/arribagente/PLAYER 2020/ARRIBA GENTE
> martes 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg [File exists]
> > [2020-10-20 11:44:15.953819] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2114:client4_0_create_cbk] 0-tapeless-client-2:
> remote operation failed. Path: /PN/arribagente/PLAYER 2020/ARRIBA GENTE
> martes 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg [File exists]
> > [2020-10-20 11:44:15.954072] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2114:client4_0_create_cbk] 0-tapeless-client-3:
> remote operation failed. Path: /PN/arribagente/PLAYER 2020/ARRIBA GENTE
> martes 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg [File exists]
> > [2020-10-20 11:44:15.954680] W [fuse-bridge.c:2606:fuse_create_cbk]
> 0-glusterfs-fuse: 31043294: /PN/arribagente/PLAYER 2020/ARRIBA GENTE martes
> 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg => -1 (File exists)
> > [2020-10-20 11:44:15.963175] W [fuse-bridge.c:2606:fuse_create_cbk]
> 0-glusterfs-fuse: 31043306: /PN/arribagente/PLAYER 2020/ARRIBA GENTE martes
> 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg => -1 (File exists)
> > [2020-10-20 11:44:15.971839] W [fuse-bridge.c:2606:fuse_create_cbk]
> 0-glusterfs-fuse: 31043318: /PN/arribagente/PLAYER 2020/ARRIBA GENTE martes
> 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg => -1 (File exists)
> > [2020-10-20 11:44:16.010242] W [fuse-bridge.c:2606:fuse_create_cbk]
> 0-glusterfs-fuse: 31043403: /PN/arribagente/PLAYER 2020/ARRIBA GENTE martes
> 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg => -1 (File exists)
> > [2020-10-20 11:44:16.020291] W [fuse-bridge.c:2606:fuse_create_cbk]
> 0-glusterfs-fuse: 31043415: /PN/arribagente/PLAYER 2020/ARRIBA GENTE martes
> 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg => -1 (File exists)
> > [2020-10-20 11:44:16.028857] W [fuse-bridge.c:2606:fuse_create_cbk]
> 0-glusterfs-fuse: 31043427: /PN/arribagente/PLAYER 2020/ARRIBA GENTE martes
> 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg => -1 (File exists)
> > The message "W [MSGID: 114031]
> [client-rpc-fops_v2.c:2114:client4_0_create_cbk] 0-tapeless-client-5:
> remote operation failed. Path: /PN/arribagente/PLAYER 2020/ARRIBA GENTE
> martes 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg [File exists]"
> repeated 5 times between [2020-10-20 11:44:15.953600] and [2020-10-20
> 11:44:16.027785]
> > The message "W [MSGID: 114031]
> [client-rpc-fops_v2.c:2114:client4_0_create_cbk] 0-tapeless-client-2:
> remote operation failed. Path: /PN/arribagente/PLAYER 2020/ARRIBA GENTE
> martes 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg [File exists]"
> repeated 5 times between [2020-10-20 11:44:15.953819] and [2020-10-20
> 11:44:16.028331]
> > The message "W [MSGID: 114031]
> [client-rpc-fops_v2.c:2114:client4_0_create_cbk] 0-tapeless-client-3:
> remote operation failed. Path: /PN/arribagente/PLAYER 2020/ARRIBA GENTE
> martes 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg [File exists]"
> repeated 5 times between [2020-10-20 11:44:15.954072] and [2020-10-20
> 11:44:16.028355]
> > [2020-10-20 11:45:03.572106] I [MSGID: 108031]
> [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0:
> selecting local read_child tapeless-client-0
> > [2020-10-20 11:45:40.080010] I [MSGID: 108031]
> [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0:
> selecting local read_child tapeless-client-0
> > The message "I [MSGID: 108031]
> [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0:
> selecting local read_child tapeless-client-0" repeated 2 times between
> [2020-10-20 11:45:40.080010] and [2020-10-20 11:47:10.871801]
> > [2020-10-20 11:48:03.913129] I [MSGID: 108031]
> [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0:
> selecting local read_child tapeless-client-0
> > [2020-10-20 11:49:05.082165] I [MSGID: 108031]
> [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0:
> selecting local read_child tapeless-client-0
> > [2020-10-20 11:50:06.725722] I [MSGID: 108031]
> [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0:
> selecting local read_child tapeless-client-0
> > [2020-10-20 11:51:04.254685] I [MSGID: 108031]
> [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0:
> selecting local read_child tapeless-client-0
> > [2020-10-20 11:52:07.903617] I [MSGID: 108031]
> [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0:
> selecting local read_child tapeless-client-0
> > [2020-10-20 11:53:01.420513] I [MSGID: 108026]
> [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
> 0-tapeless-replicate-0: performing metadata selfheal on
> 3c316533-5f47-4267-ac19-58b3be305b94
> > [2020-10-20 11:53:01.428657] I [MSGID: 108026]
> [afr-self-heal-common.c:1750:afr_log_selfheal] 0-tapeless-replicate-0:
> Completed metadata selfheal on 3c316533-5f47-4267-ac19-58b3be305b94.
> sources=[0]  sinks=1 2
> > The message "I [MSGID: 108031]
> [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0:
> selecting local read_child tapeless-client-0" repeated 3 times between
> [2020-10-20 11:52:07.903617] and [2020-10-20 11:53:12.037835]
> > [2020-10-20 11:54:02.208354] I [MSGID: 108031]
> [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0:
> selecting local read_child tapeless-client-0
> > [2020-10-20 11:55:04.360284] I [MSGID: 108031]
> [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0:
> selecting local read_child tapeless-client-0
> > [2020-10-20 11:56:09.508092] I [MSGID: 108031]
> [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0:
> selecting local read_child tapeless-client-0
> > [2020-10-20 11:57:02.580970] I [MSGID: 108031]
> [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0:
> selecting local read_child tapeless-client-0
> > [2020-10-20 11:58:06.230698] I [MSGID: 108031]
> [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0:
> selecting local read_child tapeless-client-0
> >
> >
> > Let me know if you need anything else. Thank you for your support!
> > Best Regards,
> > Martin Lorenzo
> >
> >
>
> I've seen very similar issues with changes; specifically, in my case, a
> large directory that was deleted and then recreated did not appear on a
> host using the FUSE client. My gluster has all bricks of identical
> size, so I don't think that's the issue in and of itself. It seems
> like some sort of FUSE client cache-invalidation bug to me, since a
> umount && mount on the client with the problem always fixes it.
>
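
(For anyone else hitting this: the remount workaround Eli describes is, for
example,

  umount /srv/tapeless
  mount -t glusterfs gluster6.glustersaeta.net:/tapeless /srv/tapeless

where /srv/tapeless is only a placeholder for whatever mount point the
affected FUSE client actually uses.)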

