[Gluster-users] GlusterFS 9.3 - Replicate Volume (2 Bricks / 1 Arbiter) - Self-healing does not always work

Ravishankar N ravishankar.n at pavilion.io
Sun Oct 31 06:35:23 UTC 2021


On Sat, Oct 30, 2021 at 10:47 PM Strahil Nikolov <hunter86_bg at yahoo.com>
wrote:

> Hi,
>
> Based on the output, it seems that for some reason the file was created
> locally but not on the second brick and the arbiter, which is strange for
> a 'replica 3 arbiter 1' (a.k.a. replica 2 arbiter 1) volume.
>
> It seems that cluster.eager-lock is enabled as per the virt group:
> https://github.com/gluster/glusterfs/blob/devel/extras/group-virt.example
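>
> This can be verified directly on the volume (volume name taken from
> Thorsten's commands below), e.g.:
>
> gluster volume get glusterfs-1-volume cluster.eager-lock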
>
> @Ravi,
>
> do you think that it should not be enabled by default in the virt group?
>

It should be enabled alright, but we have noticed some issues with stale
locks (https://github.com/gluster/glusterfs/issues/2198, /2211 and /2027)
which could prevent self-heal (or any other I/O that takes a blocking lock)
from happening. But the problem here is different, as you noticed. Thorsten
needs to find the actual file (`find -samefile`) corresponding to this gfid
and check its file size, hard-link count, etc. If it is a zero-byte file,
then it should be safe to just delete the file and its hard link from the
brick.
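
Something along these lines on pve01 should do it (the brick root
/data/glusterfs is taken from the output below; adjust if the brick path
differs):

find /data/glusterfs -samefile \
    /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
stat /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768

For a regular file the .glusterfs entry is a hard link to the real file, so
a link count of 1 in the stat output means the gfid file is an orphan with
no corresponding file on the brick.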

Regards,
Ravi


> Best Regards,
> Strahil Nikolov
>
>
>
> On Sat, Oct 30, 2021 at 16:14, Thorsten Walk
> <darkiop at gmail.com> wrote:
> Hi Ravi & Strahil, thanks a lot for your answers!
>
> The file under the path .glusterfs/26/c5/.. exists only on node1 (=pve01).
> On node2 (pve02) and the arbiter (freya), the file does not exist:
>
>
>
> ┬[14:35:48] [ssh:root at pve01(192.168.1.50): ~ (700)]
> ╰─># getfattr -d -m. -e hex
>  /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
> getfattr: Removing leading '/' from absolute path names
> # file:
> data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.glusterfs-1-volume-client-1=0x000000010000000100000000
> trusted.afr.glusterfs-1-volume-client-2=0x000000010000000100000000
> trusted.gfid=0x26c5396c86ff408d9cda106acd2b0768
>
> trusted.glusterfs.mdata=0x01000000000000000000000000617880a3000000003b2f011700000000617880a3000000003b2f011700000000617880a3000000003983a635
>
> ┬[14:36:49] [ssh:root at pve02(192.168.1.51):
> /data/glusterfs/.glusterfs/26/c5 (700)]
> ╰─># ll
> drwx------ root root   6B 3 days ago   ./
> drwx------ root root 8.0K 6 hours ago  ../
>
> ┬[14:36:58] [ssh:root at freya(192.168.1.40):
> /data/glusterfs/.glusterfs/26/c5 (700)]
> ╰─># ll
> drwx------ root root   6B 3 days ago   ./
> drwx------ root root 8.0K 3 hours ago  ../
>
>
>
> After this, I disabled the option you mentioned:
>
> gluster volume set glusterfs-1-volume cluster.eager-lock off
>
> After that I manually triggered another heal, unfortunately without
> success.
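>
> The heal was triggered with the standard CLI, something along the lines of:
>
> gluster volume heal glusterfs-1-volume full
> gluster volume heal glusterfs-1-volume info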
>
> @Strahil: For your idea with
> https://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/ I need
> more time; maybe I can try it tomorrow. I'll be in touch.
>
> Thanks again and best regards,
> Thorsten
>
>