[Gluster-users] GlusterFS 9.3 - Replicate Volume (2 Bricks / 1 Arbiter) - Self-healing does not always work

Strahil Nikolov hunter86_bg at yahoo.com
Fri Nov 5 19:45:51 UTC 2021


You can mount the volume with the aux-gfid-mount option:
# mount -t glusterfs -o aux-gfid-mount vm1:test /mnt/testvol
and then obtain the path:
getfattr -n trusted.glusterfs.pathinfo -e text /mnt/testvol/.gfid/<GFID>
 
Source: https://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/
Best Regards,
Strahil Nikolov
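The two steps above can be combined to resolve every GFID listed by "gluster volume heal <VOLNAME> info" in one pass. This is a minimal sketch, not an official tool: it assumes the volume is already mounted with -o aux-gfid-mount at the mount point you pass in, and the function names extract_heal_gfids and gfid_to_pathinfo are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: resolve GFIDs reported by "gluster volume heal <VOL> info" to paths.
# Assumption: the volume is mounted with -o aux-gfid-mount (see mount command above).

# Read heal-info text on stdin and print each unique bare GFID, one per line.
# Heal-info entries look like "<gfid:ade6f31c-b80b-457e-a054-6ca1548d9cd3>".
extract_heal_gfids() {
    grep -oE '<gfid:[0-9a-f-]{36}>' | sed -E 's/^<gfid:(.*)>$/\1/' | sort -u
}

# Print the trusted.glusterfs.pathinfo xattr for one GFID via the aux-gfid mount.
# $1 = mount point (e.g. /mnt/testvol), $2 = GFID
gfid_to_pathinfo() {
    getfattr -n trusted.glusterfs.pathinfo -e text "$1/.gfid/$2"
}

# Usage (volume name and mount point are placeholders):
#   gluster volume heal glusterfs-1-volume info | extract_heal_gfids |
#       while read -r gfid; do gfid_to_pathinfo /mnt/testvol "$gfid"; done
```

Deduplicating with sort -u keeps each GFID to a single getfattr call even if several bricks report the same entry.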

 
  On Fri, Nov 5, 2021 at 19:29, Thorsten Walk <darkiop at gmail.com> wrote:
Hi guys,
I moved some VMs to the GlusterFS storage this week and ran them there. For a maintenance task, I migrated these VMs to Proxmox-Node-2 and took Node-1 offline for a short time. After moving them back to Node-1, some orphaned files were left behind (see attachment). I can't find anything about these GFIDs in the logs :)


┬[15:36:51] [ssh:root at pve02(192.168.1.51): /home/darkiop (755)]
╰─># gvi

Cluster:
         Status: Healthy                 GlusterFS: 9.3
         Nodes: 3/3                      Volumes: 1/1

Volumes: 

glusterfs-1-volume
                Replicate          Started (UP) - 3/3 Bricks Up  - (Arbiter Volume)
                                   Capacity: (17.89% used) 83.00 GiB/466.00 GiB (used/total)
                                   Self-Heal:
                                      192.168.1.51:/data/glusterfs (4 File(s) to heal).
                                   Bricks:
                                      Distribute Group 1:
                                         192.168.1.50:/data/glusterfs   (Online)
                                         192.168.1.51:/data/glusterfs   (Online)
                                         192.168.1.40:/data/glusterfs   (Online)


Brick 192.168.1.50:/data/glusterfs
Status: Connected
Number of entries: 0

Brick 192.168.1.51:/data/glusterfs
<gfid:ade6f31c-b80b-457e-a054-6ca1548d9cd3> 
<gfid:39365c96-296b-4270-9cdb-1b751e40ad86> 
<gfid:54774d44-26a7-4954-a657-6e4fa79f2b97> 
<gfid:d5a8ae04-7301-4876-8d32-37fcd6093977> 
Status: Connected
Number of entries: 4

Brick 192.168.1.40:/data/glusterfs
Status: Connected
Number of entries: 0


┬[15:37:03] [ssh:root at pve02(192.168.1.51): /home/darkiop (755)]
╰─># cat /data/glusterfs/.glusterfs/ad/e6/ade6f31c-b80b-457e-a054-6ca1548d9cd3
22962


┬[15:37:13] [ssh:root at pve02(192.168.1.51): /home/darkiop (755)]
╰─># grep -ir 'ade6f31c-b80b-457e-a054-6ca1548d9cd3' /var/log/glusterfs/*.log
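The path in the cat command above follows the brick's .glusterfs layout: the first two hex characters of the GFID, then the next two, then the full GFID. For a regular file, that entry is a hard link to the named file on the brick. A small bash helper for building the path (the function name gfid_backend_path is hypothetical):

```shell
#!/usr/bin/env bash
# Build the brick-side path of a GFID entry under .glusterfs.
# $1 = brick root (e.g. /data/glusterfs), $2 = GFID
gfid_backend_path() {
    local brick=$1 gfid=$2
    # Layout: <brick>/.glusterfs/<chars 1-2>/<chars 3-4>/<full GFID>
    printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "${gfid:0:2}" "${gfid:2:2}" "$gfid"
}
```

For example, gfid_backend_path /data/glusterfs ade6f31c-b80b-457e-a054-6ca1548d9cd3 yields the same path that was passed to cat above.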

On Mon, Nov 1, 2021 at 07:51, Thorsten Walk <darkiop at gmail.com> wrote:

After deleting the file, output of heal info is clear.
>Not sure why you ended up in this situation (maybe unlink partially failed on this brick?)
I'm not sure either. This was a completely fresh setup with 1-2 VMs and 1-2 Proxmox LXC templates. I let it run for a few days, and at some point it ended up in the state described. I'll keep monitoring it and start filling the bricks with data.
Thanks for your help!

On Mon, Nov 1, 2021 at 02:54, Ravishankar N <ravishankar.n at pavilion.io> wrote:



On Mon, Nov 1, 2021 at 12:02 AM Thorsten Walk <darkiop at gmail.com> wrote:

Hi Ravi, the file exists only on pve01, and only once:
┬[19:22:10] [ssh:root at pve01(192.168.1.50): ~ (700)]
╰─># stat /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
  File: /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
  Size: 6               Blocks: 8          IO Block: 4096   regular file
Device: fd12h/64786d    Inode: 528         Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2021-10-30 14:34:50.385893588 +0200
Modify: 2021-10-27 00:26:43.988756557 +0200
Change: 2021-10-27 00:26:43.988756557 +0200
 Birth: -

┬[19:24:41] [ssh:root at pve01(192.168.1.50): ~ (700)]
╰─># ls -l /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
.rw-r--r-- root root 6B 4 days ago  /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768

┬[19:24:54] [ssh:root at pve01(192.168.1.50): ~ (700)]
╰─># cat /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
28084
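The stat output above shows "Links: 1". Since a regular file's entry under .glusterfs/xx/yy/ is normally a hard link to the named file on the brick, a link count of 1 suggests that no named file points at it anymore. A sketch for listing such candidates on one brick, assuming GNU or BSD find; the function name find_orphan_gfid_files is hypothetical:

```shell
#!/usr/bin/env bash
# List regular files in a brick's GFID store whose hard-link count is 1,
# i.e. GFID entries that no named file on the brick links to anymore.
# Only the two-level hex directories (.glusterfs/xx/yy/) are considered,
# so internal directories such as .glusterfs/indices are skipped.
# $1 = brick root (e.g. /data/glusterfs)
find_orphan_gfid_files() {
    find "$1/.glusterfs" -type f -links 1 \
        -path "$1/.glusterfs/[0-9a-f][0-9a-f]/[0-9a-f][0-9a-f]/*"
}
```

On pve01 this would be run as find_orphan_gfid_files /data/glusterfs. Entries can be transiently in this state while files are being created or healed, so treat the output as candidates to check against heal info, not as a deletion list.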


Hi Thorsten, you can delete the file. Judging from its size and contents, it looks like it belongs to oVirt sanlock. I'm not sure why you ended up in this situation (maybe the unlink partially failed on this brick?). You can check the mount, brick and self-heal daemon logs for this GFID to see whether there are related error or warning messages.
-Ravi
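The log check can be scripted as a recursive search of the whole log directory. A glob such as /var/log/glusterfs/*.log does not descend into /var/log/glusterfs/bricks/, where brick logs are kept by default, so a recursive grep over the directory covers mount, brick and self-heal daemon (glustershd.log) logs together. A sketch, assuming the default log directory (overridable as a parameter); the function name grep_gfid_logs is hypothetical:

```shell
#!/usr/bin/env bash
# Search every GlusterFS log for one GFID, including subdirectories such as bricks/.
# $1 = GFID, $2 = log directory (defaults to /var/log/glusterfs)
grep_gfid_logs() {
    local gfid=$1 logdir=${2:-/var/log/glusterfs}
    # -r: descend into subdirectories; -i: ignore case;
    # -l: print only names of matching files; -F: treat the GFID as a fixed string.
    grep -rilF -- "$gfid" "$logdir"
}
```

For example, grep_gfid_logs 26c5396c-86ff-408d-9cda-106acd2b0768 prints the names of all log files that mention that GFID. If rotated logs are gzip-compressed, grep will not match inside them; zgrep can be used for those.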

  