[Gluster-users] GlusterFS 9.3 - Replicate Volume (2 Bricks / 1 Arbiter) - Self-healing does not always work

Fri Oct 29 06:57:47 UTC 2021

Hello GlusterFS Community,

I am using GlusterFS version 9.3 on two Intel NUCs, with a Raspberry Pi as
arbiter, for a replicate volume. The setup serves as distributed storage
for a Proxmox cluster.

I use version 9.3 because I could not find a more recent ARM package for
the Raspberry Pi (Debian 11).

The partitions for the volume:

NUC1
nvme0n1                      259:0    0 465.8G  0 disk
└─vg_glusterfs-lv_glusterfs  253:18   0 465.8G  0 lvm  /data/glusterfs

NUC2
nvme0n1                      259:0    0 465.8G  0 disk
└─vg_glusterfs-lv_glusterfs  253:14   0 465.8G  0 lvm  /data/glusterfs

RPI
sda           8:0    1 29,8G  0 disk
└─sda1        8:1    1 29,8G  0 part /data/glusterfs

The volume was created with:

mkfs.xfs -f -i size=512 -n size=8192 -d su=128K,sw=10 -L GlusterFS \
    /dev/vg_glusterfs/lv_glusterfs

gluster volume create glusterfs-1-volume transport tcp replica 3 arbiter 1 \
    192.168.1.50:/data/glusterfs \
    192.168.1.51:/data/glusterfs \
    192.168.1.40:/data/glusterfs \
    force

After some time the volume always ends up with files that cannot be
healed (in the example below:
<gfid:26c5396c-86ff-408d-9cda-106acd2b0768>).
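For anyone who wants to look at that entry directly on a brick: each brick keeps a per-gfid entry under its .glusterfs directory, in two subdirectories named after the first and second pair of hex characters of the gfid. The following is only a small sketch of that mapping; the function name gfid_backend_path is made up for illustration, and the brick root is the /data/glusterfs mount point from the layout above.

```python
import uuid
from pathlib import PurePosixPath


def gfid_backend_path(brick_root: str, gfid: str) -> PurePosixPath:
    """Return where a gfid entry lives below a brick's .glusterfs directory.

    Hypothetical helper: it only builds the path, it does not touch the disk.
    """
    cleaned = gfid.strip()
    # Accept the "<gfid:...>" form printed by `gluster volume heal <vol> info`.
    if cleaned.startswith("<gfid:") and cleaned.endswith(">"):
        cleaned = cleaned[len("<gfid:"):-1]
    # uuid.UUID validates the string and normalises it to lowercase.
    canonical = str(uuid.UUID(cleaned))
    return (
        PurePosixPath(brick_root)
        / ".glusterfs"
        / canonical[:2]
        / canonical[2:4]
        / canonical
    )


print(gfid_backend_path("/data/glusterfs",
                        "<gfid:26c5396c-86ff-408d-9cda-106acd2b0768>"))
```

The resulting path can then be inspected on each brick (for example with ls -l or stat) to see on which bricks the entry exists at all.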

The volume is currently in test mode with only one or two VMs running on
it, and so far there have been no negative effects. Replication and
self-heal basically work; only now and then an entry remains that cannot
be healed.

Does anyone have an idea how to prevent or heal this? I have already
completely rebuilt the volume, including the partitions and glusterd, to
rule out leftovers from an earlier setup.

If you need more information, please let me know.

Thanks a lot!

================

And here is some more info about the volume and the healing attempts:

>$ gstatus -ab
Cluster:
         Status: Healthy                 GlusterFS: 9.3
         Nodes: 3/3                      Volumes: 1/1

Volumes:

glusterfs-1-volume
                Replicate          Started (UP) - 3/3 Bricks Up  - (Arbiter Volume)
                                   Capacity: (1.82% used) 8.00 GiB/466.00 GiB (used/total)
                                   Self-Heal:
                                      192.168.1.50:/data/glusterfs (1 File(s) to heal).
                                   Bricks:
                                      Distribute Group 1:
                                         192.168.1.50:/data/glusterfs   (Online)
                                         192.168.1.51:/data/glusterfs   (Online)
                                         192.168.1.40:/data/glusterfs   (Online)

>$ gluster volume info
Volume Name: glusterfs-1-volume
Type: Replicate
Volume ID: f70d9b2c-b30d-4a36-b8ff-249c09c8b45d
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.1.50:/data/glusterfs
Brick2: 192.168.1.51:/data/glusterfs
Brick3: 192.168.1.40:/data/glusterfs (arbiter)
Options Reconfigured:
cluster.lookup-optimize: off
server.keepalive-count: 5
server.keepalive-interval: 2
server.keepalive-time: 10
server.tcp-user-timeout: 20
network.ping-timeout: 20
server.event-threads: 4
client.event-threads: 4
cluster.choose-local: off
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
performance.strict-o-direct: on
network.remote-dio: disable
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: on

>$ gluster volume heal glusterfs-1-volume
Launching heal operation to perform index self heal on volume glusterfs-1-volume has been successful
Use heal info commands to check status.

>$ gluster volume heal glusterfs-1-volume info
Brick 192.168.1.50:/data/glusterfs
<gfid:26c5396c-86ff-408d-9cda-106acd2b0768>
Status: Connected
Number of entries: 1

Brick 192.168.1.51:/data/glusterfs
Status: Connected
Number of entries: 0

Brick 192.168.1.40:/data/glusterfs
Status: Connected
Number of entries: 0
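
To watch over time whether the same entry keeps coming back, the heal info output above can be collected per brick. The following is a minimal sketch that parses text in exactly the format shown above; parse_heal_info is a hypothetical helper name, and the sample text is the output pasted in this mail, not a new measurement.

```python
from typing import Dict, List, Optional


def parse_heal_info(text: str) -> Dict[str, List[str]]:
    """Map each brick to the entries listed under it in `heal info` output."""
    pending: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            # A "Brick host:/path" line starts a new per-brick section.
            current = line[len("Brick "):]
            pending[current] = []
        elif current is None or not line:
            continue
        elif line.startswith("Status:") or line.startswith("Number of entries"):
            # Bookkeeping lines, not entries that need healing.
            continue
        else:
            pending[current].append(line)
    return pending


SAMPLE = """\
Brick 192.168.1.50:/data/glusterfs
<gfid:26c5396c-86ff-408d-9cda-106acd2b0768>
Status: Connected
Number of entries: 1

Brick 192.168.1.51:/data/glusterfs
Status: Connected
Number of entries: 0

Brick 192.168.1.40:/data/glusterfs
Status: Connected
Number of entries: 0
"""

for brick, entries in parse_heal_info(SAMPLE).items():
    print(brick, len(entries), entries)
```

In practice the text would come from running the heal info command itself; the sketch leaves that part out on purpose so it stays independent of how the command is invoked.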

>$ gluster volume heal glusterfs-1-volume info split-brain
Brick 192.168.1.50:/data/glusterfs
Status: Connected
Number of entries in split-brain: 0

Brick 192.168.1.51:/data/glusterfs
Status: Connected
Number of entries in split-brain: 0

Brick 192.168.1.40:/data/glusterfs
Status: Connected
Number of entries in split-brain: 0
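
Since no brick reports split-brain, the next thing to look at would be the pending-heal counters that the replicate translator stores on each brick as trusted.afr.<volname>-client-<N> extended attributes (readable with getfattr -d -m . -e hex on the brick-side file). Each value is 12 bytes: three big-endian 32-bit counters for pending data, metadata and entry operations. The sketch below decodes one such value; decode_afr_pending is a hypothetical helper name, and the hex string in the example is invented for illustration, not taken from this volume.

```python
import struct
from typing import Dict


def decode_afr_pending(hex_value: str) -> Dict[str, int]:
    """Split a trusted.afr.* xattr value into its three pending counters."""
    text = hex_value.strip()
    if text.lower().startswith("0x"):
        text = text[2:]
    raw = bytes.fromhex(text)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    # Three unsigned 32-bit integers in network (big-endian) byte order.
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


# Invented example value: 2 pending data operations, 1 pending metadata operation.
print(decode_afr_pending("0x000000020000000100000000"))
```

A value of all zeroes means that brick holds nothing against the brick named in the attribute; non-zero counters show which brick is blaming which.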