[Gluster-users] GlusterFS 9.3 - Replicate Volume (2 Bricks / 1 Arbiter) - Self-healing does not always work
Thorsten Walk
darkiop at gmail.com
Fri Nov 5 17:28:56 UTC 2021
Hi Guys,
I pushed some VMs to the GlusterFS storage this week and ran them there.
For a maintenance task, I moved these VMs to Proxmox-Node-2 and took Node-1
offline for a short time.
After moving them back to Node-1, a few orphaned file entries were left
behind (see attachment). I can't find anything about these gfids in the logs :)
┬[15:36:51] [ssh:root at pve02(192.168.1.51): /home/darkiop (755)]
╰─># gvi
Cluster:
    Status: Healthy          GlusterFS: 9.3
    Nodes: 3/3               Volumes: 1/1

Volumes:
  glusterfs-1-volume
    Replicate    Started (UP) - 3/3 Bricks Up - (Arbiter Volume)
                 Capacity: (17.89% used) 83.00 GiB/466.00 GiB (used/total)
                 Self-Heal: 192.168.1.51:/data/glusterfs (4 File(s) to heal).
    Bricks:
      Distribute Group 1:
        192.168.1.50:/data/glusterfs (Online)
        192.168.1.51:/data/glusterfs (Online)
        192.168.1.40:/data/glusterfs (Online)
Brick 192.168.1.50:/data/glusterfs
Status: Connected
Number of entries: 0
Brick 192.168.1.51:/data/glusterfs
<gfid:ade6f31c-b80b-457e-a054-6ca1548d9cd3>
<gfid:39365c96-296b-4270-9cdb-1b751e40ad86>
<gfid:54774d44-26a7-4954-a657-6e4fa79f2b97>
<gfid:d5a8ae04-7301-4876-8d32-37fcd6093977>
Status: Connected
Number of entries: 4
Brick 192.168.1.40:/data/glusterfs
Status: Connected
Number of entries: 0
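For completeness, here is roughly what can be run to kick off a heal of the
pending entries and to re-check them afterwards (just a sketch, volume name
glusterfs-1-volume taken from the output above; 'heal info summary' is
available in 9.x):

# trigger an index heal of the entries listed above
gluster volume heal glusterfs-1-volume

# re-check the pending entries, plus a per-brick summary
gluster volume heal glusterfs-1-volume info
gluster volume heal glusterfs-1-volume info summary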
┬[15:37:03] [ssh:root at pve02(192.168.1.51): /home/darkiop (755)]
╰─># cat /data/glusterfs/.glusterfs/ad/e6/ade6f31c-b80b-457e-a054-6ca1548d9cd3
22962
┬[15:37:13] [ssh:root at pve02(192.168.1.51): /home/darkiop (755)]
╰─># grep -ir 'ade6f31c-b80b-457e-a054-6ca1548d9cd3' /var/log/glusterfs/*.log
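Since the grep over the client logs comes back empty, a rough way to map such
a gfid back to its path on the brick (for regular files the .glusterfs entry
is a hard link to the real file, so find can match on the inode) and to grep
the other default logs (self-heal daemon and brick logs, assuming default
locations):

# resolve the gfid to its path on the brick (regular files share an inode
# with their .glusterfs hard link; directories are symlinks instead)
find /data/glusterfs -samefile \
    /data/glusterfs/.glusterfs/ad/e6/ade6f31c-b80b-457e-a054-6ca1548d9cd3 \
    -not -path '*/.glusterfs/*'

# besides the client log, the self-heal daemon and brick logs are worth checking
grep -i 'ade6f31c-b80b-457e-a054-6ca1548d9cd3' \
    /var/log/glusterfs/glustershd.log /var/log/glusterfs/bricks/*.log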
On Mon, Nov 1, 2021 at 07:51, Thorsten Walk <darkiop at gmail.com> wrote:
> After deleting the file, the output of heal info is clean.
>
> >Not sure why you ended up in this situation (maybe unlink partially
> failed on this brick?)
>
> Neither am I; this was a completely fresh setup with 1-2 VMs and 1-2
> Proxmox LXC templates. I let it run for a few days and at some point it
> ended up in the state mentioned above. I will keep monitoring it and start
> filling the bricks with data.
> Thanks for your help!
>
> On Mon, Nov 1, 2021 at 02:54, Ravishankar N <ravishankar.n at pavilion.io> wrote:
>
>>
>>
>> On Mon, Nov 1, 2021 at 12:02 AM Thorsten Walk <darkiop at gmail.com> wrote:
>>
>>> Hi Ravi, the file only exists on pve01, and only once:
>>>
>>> ┬[19:22:10] [ssh:root at pve01(192.168.1.50): ~ (700)]
>>> ╰─># stat /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
>>>   File: /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
>>> Size: 6 Blocks: 8 IO Block: 4096 regular file
>>> Device: fd12h/64786d Inode: 528 Links: 1
>>> Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
>>> Access: 2021-10-30 14:34:50.385893588 +0200
>>> Modify: 2021-10-27 00:26:43.988756557 +0200
>>> Change: 2021-10-27 00:26:43.988756557 +0200
>>> Birth: -
>>>
>>> ┬[19:24:41] [ssh:root at pve01(192.168.1.50): ~ (700)]
>>> ╰─># ls -l /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
>>> .rw-r--r-- root root 6B 4 days ago /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
>>>
>>> ┬[19:24:54] [ssh:root at pve01(192.168.1.50): ~ (700)]
>>> ╰─># cat /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
>>> 28084
>>>
>> Hi Thorsten, you can delete the file. From the file size and contents,
>> it looks like it belongs to ovirt sanlock. Not sure why you ended up in
>> this situation (maybe unlink partially failed on this brick?). You can
>> check the mount, brick and self-heal daemon logs for this gfid to see if
>> you find related error/warning messages.
>>
>> -Ravi
>>
>