[Gluster-users] GlusterFS 9.3 - Replicate Volume (2 Bricks / 1 Arbiter) - Self-healing does not always work

Thorsten Walk darkiop at gmail.com
Tue Nov 30 15:26:18 UTC 2021


Hello all,

I have now rebuilt my cluster and am still in the process of putting it
back into operation. Should the error occur again, I will get back to you.

I would like to switch directly to GlusterFS 10. My two Intel NUCs are
running Proxmox 7.1, so GlusterFS 10 is not an issue - there is a Debian
repo for it.

My arbiter (a Raspberry Pi) is also running Debian Bullseye, but I couldn't
find a repo for GlusterFS 10 on arm. Can I run the arbiter on v9 together
with v10, or is it better to stay on v9?
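
A quick sketch of how I would sanity-check this before and after the upgrade (run on
any node; as far as I know the cluster negotiates an op-version limited by its oldest
member, so please treat this as an assumption on my side, not a confirmed answer):

# gluster --version                              (package version installed on this node)
# gluster volume get all cluster.op-version      (op-version the cluster currently runs at)
# gluster volume get all cluster.max-op-version  (highest op-version all connected peers support)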

Thanks & Regards,
Thorsten

On Fri, Nov 5, 2021 at 20:46, Strahil Nikolov <hunter86_bg at yahoo.com> wrote:

> You can mount the volume via:
>
> # mount -t glusterfs -o aux-gfid-mount vm1:test /mnt/testvol
>
> And then obtain the path:
>
> getfattr -n trusted.glusterfs.pathinfo -e text /mnt/testvol/.gfid/<GFID>
>
>
> Source: https://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/
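>
> For example, with one of the GFIDs you reported (a minimal sketch; volume name and
> server are taken from your gvi output, the mount point is just a placeholder):
>
> # mount -t glusterfs -o aux-gfid-mount 192.168.1.50:glusterfs-1-volume /mnt/testvol
> # getfattr -n trusted.glusterfs.pathinfo -e text \
>     /mnt/testvol/.gfid/ade6f31c-b80b-457e-a054-6ca1548d9cd3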
>
> Best Regards,
> Strahil Nikolov
>
>
> On Fri, Nov 5, 2021 at 19:29, Thorsten Walk <darkiop at gmail.com> wrote:
> Hi Guys,
>
> I pushed some VMs to the GlusterFS storage this week and ran them there.
> For a maintenance task, I moved these VMs to Proxmox-Node-2 and took Node-1
> offline for a short time.
> After moving them back to Node-1, some orphaned file entries were left
> behind (see attachment). In the logs I can't find anything about these GFIDs :)
>
>
> ┬[15:36:51] [ssh:root at pve02(192.168.1.51): /home/darkiop (755)]
> ╰─># gvi
>
> Cluster:
>          Status: Healthy                 GlusterFS: 9.3
>          Nodes: 3/3                      Volumes: 1/1
>
> Volumes:
>
> glusterfs-1-volume
>                 Replicate          Started (UP) - 3/3 Bricks Up - (Arbiter Volume)
>                                    Capacity: (17.89% used) 83.00 GiB/466.00 GiB (used/total)
>                                    Self-Heal:
>                                       192.168.1.51:/data/glusterfs (4 File(s) to heal).
>                                    Bricks:
>                                       Distribute Group 1:
>                                          192.168.1.50:/data/glusterfs (Online)
>                                          192.168.1.51:/data/glusterfs (Online)
>                                          192.168.1.40:/data/glusterfs (Online)
>
>
> Brick 192.168.1.50:/data/glusterfs
> Status: Connected
> Number of entries: 0
>
> Brick 192.168.1.51:/data/glusterfs
> <gfid:ade6f31c-b80b-457e-a054-6ca1548d9cd3>
> <gfid:39365c96-296b-4270-9cdb-1b751e40ad86>
> <gfid:54774d44-26a7-4954-a657-6e4fa79f2b97>
> <gfid:d5a8ae04-7301-4876-8d32-37fcd6093977>
> Status: Connected
> Number of entries: 4
>
> Brick 192.168.1.40:/data/glusterfs
> Status: Connected
> Number of entries: 0
>
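> Not sure whether this is the right next step, but for reference a sketch of
> manually triggering the self-heal for these entries ("full" crawls the whole volume):
>
> # gluster volume heal glusterfs-1-volume           (heal the pending entries)
> # gluster volume heal glusterfs-1-volume full      (or crawl the entire volume)
> # gluster volume heal glusterfs-1-volume info      (re-check what is still pending)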
>
> ┬[15:37:03] [ssh:root at pve02(192.168.1.51): /home/darkiop (755)]
> ╰─># cat /data/glusterfs/.glusterfs/ad/e6/ade6f31c-b80b-457e-a054-6ca1548d9cd3
> 22962
>
>
> ┬[15:37:13] [ssh:root at pve02(192.168.1.51): /home/darkiop (755)]
> ╰─># grep -ir 'ade6f31c-b80b-457e-a054-6ca1548d9cd3' /var/log/glusterfs/*.log
>
> On Mon, Nov 1, 2021 at 07:51, Thorsten Walk <darkiop at gmail.com> wrote:
>
> After deleting the file, the heal info output is clean.
>
> >Not sure why you ended up in this situation (maybe unlink partially
> failed on this brick?)
>
> Neither am I; this was a completely fresh setup with 1-2 VMs and 1-2
> Proxmox LXC templates. I let it run for a few days and at some point it
> ended up in the state described above. I will continue to monitor and
> start filling the bricks with data.
> Thanks for your help!
>
> On Mon, Nov 1, 2021 at 02:54, Ravishankar N <ravishankar.n at pavilion.io> wrote:
>
>
>
> On Mon, Nov 1, 2021 at 12:02 AM Thorsten Walk <darkiop at gmail.com> wrote:
>
> Hi Ravi, the file only exists on pve01, and only once:
>
> ┬[19:22:10] [ssh:root at pve01(192.168.1.50): ~ (700)]
> ╰─># stat /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
>   File: /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
>   Size: 6               Blocks: 8          IO Block: 4096   regular file
> Device: fd12h/64786d    Inode: 528         Links: 1
> Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
> Access: 2021-10-30 14:34:50.385893588 +0200
> Modify: 2021-10-27 00:26:43.988756557 +0200
> Change: 2021-10-27 00:26:43.988756557 +0200
>  Birth: -
>
> ┬[19:24:41] [ssh:root at pve01(192.168.1.50): ~ (700)]
> ╰─># ls -l /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
> .rw-r--r-- root root 6B 4 days ago /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
>
> ┬[19:24:54] [ssh:root at pve01(192.168.1.50): ~ (700)]
> ╰─># cat /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
> 28084
>
> Hi Thorsten, you can delete the file. From the file size and contents, it
> looks like it belongs to ovirt sanlock. Not sure why you ended up in this
> situation (maybe unlink partially failed on this brick?). You can check the
> mount, brick and self-heal daemon logs for this gfid to see if you find
> related error/warning messages.
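>
> For example (a sketch assuming the default log locations; the exact brick and mount
> log file names depend on your brick path and mount point):
>
> # grep -i 26c5396c-86ff-408d-9cda-106acd2b0768 /var/log/glusterfs/bricks/*.log
> # grep -i 26c5396c-86ff-408d-9cda-106acd2b0768 /var/log/glusterfs/glustershd.log
> # grep -i 26c5396c-86ff-408d-9cda-106acd2b0768 /var/log/glusterfs/<your-mount-log>.log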
>
> -Ravi
>
>