[Gluster-users] Glusterfs 10.5-1 healing issues

Ilias Chasapakis forumZFD chasapakis at forumZFD.de
Mon Apr 22 14:59:44 UTC 2024


Dear Darrell,

Thanks again for the suggestions. We applied some of the suggested 
options to our case, and in the meantime we also set up the round-robin 
configuration, so that the samba share configuration has a single 
pointer which then picks one of the bricks (a single host name in our 
DNS that resolves to each brick node).
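
To make this concrete: the idea is a single DNS name that resolves to 
the gluster nodes and is then used when mounting on the ctdb side. A 
minimal sketch only (host names and mount point are placeholders; gv-ho 
is the volume name as it appears in our logs):

   # /etc/fstab on a ctdb node -- "gluster-rr" is the round-robin DNS name
   gluster-rr:/gv-ho  /srv/gluster/gv-ho  glusterfs  defaults,_netdev,backup-volfile-servers=node2:node3  0 0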

This has improved the situation and the bricks are accumulating fewer 
entries. But fewer is not really "none", and unfortunately we still 
observe the following behaviour:

Very frequently we have directories not healing, and then neither do 
the files and directories under that "base path".

Sometimes one brick still reports "New Folder" in the path, while the 
others already have a proper name for it.

So we also see a pattern with renamed directories. I think we have an 
issue with the rename being applied slowly on one brick (and it is not 
necessarily the same one each time) while the others already have the 
correct name.

Rebooting solves this. Alternatively we can fix it by renaming on the 
volume: in the "New Folder" case we recreate a "New Folder" at the path 
where the wrong brick expects it, issue a listing, delete that "New 
Folder" again, rename the directory back to the old name the "wrong" 
brick still holds, list again, rename it once more, and the entries are 
healed. At that point the names match and everything is back to normal. 
What I do not fully understand (because I surely miss something 
fundamental) is that the gfid should be the same everywhere, and I would 
not expect a rename to change it. So what is seen as mismatching is just 
the "name attribute"; once the names match, everything is fine again.
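
In case it helps reproduce the check, a rough sketch of how the 
directory can be compared across the bricks before applying that 
workaround (brick path and directory name are placeholders; gv-ho is 
the volume name from our logs):

   # on each gluster node, directly against the brick backend
   getfattr -d -m . -e hex /srv/brick/gv-ho/some/parent/Problem-Dir
   #  -> compare trusted.gfid and any trusted.afr.gv-ho-client-* values

   # what the self-heal daemon still considers pending
   gluster volume heal gv-ho info
   gluster volume heal gv-ho info summary

   # trigger an index heal after the manual rename dance
   gluster volume heal gv-ho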

Sorry for the simplistic presentation of the facts, but I hope you have 
some suggestions anyway. Perhaps it would be better if I sent a separate 
post about the renames. I saw folks setting open-behind to off and tried 
that, but with no results. Could it also be that we mount the volume 
incorrectly on our ctdbs? Again, I apologise, as I understand the points 
of failure can be many, but we are trying to work with the data, logs 
and observations that we have available.
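
For reference, a quick sketch of what I mean (gv-ho being the volume 
name from our logs):

   # the open-behind toggle we refer to above
   gluster volume set gv-ho performance.open-behind off

   # and the obvious check on the ctdb nodes for how the volume is mounted
   grep gluster /proc/mounts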

Best regards
Ilias

On 10.04.24 at 18:40, Darrell Budic wrote:
> I would strongly recommend running the glusterfs servers directly on 
> bare metal instead of in VMs. Check out Ovirt, especially its hybrid 
> cluster model. While it’s not currently well maintained, it works fine 
> on your class of hardware and fully supports this model of gluster on 
> the bare metal and VMs running on the same hosts. And we may see it 
> get some more support after the VMWare buyout, who knows?
>
> Gluster isn’t known for small file performance, but hunt through the 
> archives for specific tuning hints. And if you’re using it to host the 
> VM image files, you’re masking that problem, because the files shared by 
> gluster are large. More cache and write (behind) buffers can help, and 
> 10G or better networking would be something you want to do if you can 
> afford it. Going to 2x1G LAGs can help a tiny bit, but you really want 
> the lower latency from a faster physical media if you can get it.
>
> If you are not already using tuned to set virtual-guest profiles on 
> your VMs (and virtual-host on the hosts), I’d look into that as well. 
> Set the disk elevator to ’none’ on the VMs as well.
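>
> Roughly, and only as a sketch (the device name below is a placeholder, 
> check your own):
>
> #  on the hypervisor hosts
> tuned-adm profile virtual-host
> #  inside the gluster VMs
> tuned-adm profile virtual-guest
> #  disk elevator inside the VMs ("vda" is a placeholder device)
> cat /sys/block/vda/queue/scheduler
> echo none > /sys/block/vda/queue/scheduler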
>
>
>> On Apr 10, 2024, at 10:07 AM, Ilias Chasapakis forumZFD 
>> <chasapakis at forumZFD.de> wrote:
>>
>> Dear Darrell,
>>
>> Dear ...,
>>
>> Many thanks for the prompt reply. Here is some of the additional 
>> information you requested (please feel free to ask for more if needed):
>>
>>> CPU info:
>>> Hosts
>>> 1. Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (20 cores) hw RAID 1 
>>> (adaptec)
>>> 2. Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (20 cores) hw RAID 1 
>>> (adaptec)
>>> 3. Intel(R) Xeon(R) Silver 4112 CPU @ 2.60GHz (8 cores) hw RAID 1 
>>> (adaptec)
>>>
>>> GlusterFS VMs
>>> 1. 4 cores,  10 GB RAM, CPU model Broadwell
>>> 2. 4 cores,  10 GB RAM, CPU model Broadwell
>>> 3. 4 cores,  10 GB RAM, CPU model: host passthrough value
>>>
>>> Network info
>>> Physical connection between gluster nodes in a heartbeat network 
>>> that comprises a new Cisco switch.
>>> TCP connections with 1Gbit links
>>> Default virtual connectivity with virtio drivers for the NICs and a 
>>> macvtap connection to use the host's connectivity.
>>> No errors or lost packets are recorded between the VMs or the 
>>> hosts. Quick iperf tests (between the gluster nodes, and between the 
>>> ctdbs and the gluster nodes) show no evident issues.
>>>
>>> Workload:
>>> A snapshot from this moment, which can be considered a peak 
>>> time, shows around 450 files open on the volume.
>>> In terms of literal load, we notice CPU peaks mostly related to 
>>> the shd process.
>>
>> All disks use virtIO drivers (virtIO disks).
>>
>> The file system on all nodes is XFS (not ZFS)
>>
>> Apart from the clients on the gluster nodes themselves, there are 
>> clients on the ctdbs that mount the gluster volume and then expose it 
>> via smb to Windows clients (including user profiles for roaming profiles).
>> The ctdbs reach the gluster nodes through the heartbeat network.
>>
>> We are considering moving the gluster nodes to a network with existing 
>> DNS capabilities in order to create a round-robin configuration, 
>> assigning the hosts' IPs to a single hostname and then using that 
>> hostname in the mount configuration of the ctdbs.
>> The reasoning/hope behind this is that it would minimize access time 
>> and sync issues.
>>
>> Thank you for the information about sharding; we will take it into 
>> account and consider the pros and cons in our current situation, 
>> especially because turning back is not easy afterwards. Also, our main 
>> problem is not with big files, but with a large quantity of 
>> small files.
>>
>> We would gladly make use of some of the options you suggested once we 
>> have assessed the situation again. We welcome any further suggestions 
>> in the meantime.
>>
>> Ilias
>>
>> On 09.04.24 at 18:26, Darrell Budic wrote:
>>> The big one I see for you is to investigate and enable sharding. It 
>>> can improve performance and makes it much easier to heal VM style 
>>> workloads. Be aware that once you turn it on, you can’t go back 
>>> easily, and you need to copy the VM disk images around to get them 
>>> to be sharded before it will show any real effect. A couple of other 
>>> recommendations from my main volume (three dedicated host servers 
>>> with HDDs and SSD/NVMe caching and log volumes on ZFS). The 
>>> cluster.shd-* entries are especially recommended. This is on gluster 
>>> 9.4 at the moment, so some of these won’t map exactly.
>>>
>>> Volume Name: gv1
>>> Type: Replicate
>>> Number of Bricks: 1 x 3 = 3
>>> Transport-type: tcp
>>> Options Reconfigured:
>>> cluster.read-hash-mode: 3
>>> performance.client-io-threads: on
>>> performance.write-behind-window-size: 64MB
>>> performance.cache-size: 1G
>>> nfs.disable: on
>>> performance.readdir-ahead: on
>>> performance.quick-read: off
>>> performance.read-ahead: on
>>> performance.io-cache: off
>>> performance.stat-prefetch: on
>>> cluster.eager-lock: enable
>>> network.remote-dio: enable
>>> server.event-threads: 4
>>> client.event-threads: 8
>>> performance.io-thread-count: 64
>>> performance.low-prio-threads: 32
>>> features.shard: on
>>> features.shard-block-size: 64MB
>>> cluster.locking-scheme: granular
>>> cluster.data-self-heal-algorithm: full
>>> cluster.shd-max-threads: 8
>>> cluster.shd-wait-qlength: 10240
>>> cluster.choose-local: false
>>> cluster.granular-entry-heal: enable
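>>>
>>> (If it helps, these are applied with plain volume set commands; gv1 is 
>>> my volume name from the listing above, substitute your own, and weigh 
>>> the sharding caveat first:)
>>>
>>> gluster volume set gv1 cluster.shd-max-threads 8
>>> gluster volume set gv1 cluster.shd-wait-qlength 10240
>>> gluster volume set gv1 cluster.granular-entry-heal enable
>>> gluster volume set gv1 cluster.locking-scheme granular
>>> gluster volume set gv1 features.shard on    # effectively irreversible, see above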
>>>
>>> Otherwise, more details about your servers, CPU, RAM, and Disks 
>>> would be useful for suggestions, and details of your network as 
>>> well. And if you haven’t done kernel level tuning on the servers, 
>>> you should address that as well. These all vary a lot with your 
>>> workload and hardware setup, so there aren’t many generic 
>>> recommendations I can give other than to make sure you tuned your 
>>> tcp stack and enabled the none disk elevator on SSDs or disks used 
>>> by ZFS.
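>>>
>>> (For the tcp side, purely as an illustration and not values to copy 
>>> blindly, I mean the sort of sysctl settings below:)
>>>
>>> # /etc/sysctl.d/90-net-tuning.conf -- illustrative starting point only
>>> net.core.rmem_max = 16777216
>>> net.core.wmem_max = 16777216
>>> net.ipv4.tcp_rmem = 4096 87380 16777216
>>> net.ipv4.tcp_wmem = 4096 65536 16777216
>>> net.core.netdev_max_backlog = 30000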
>>>
>>> There’s a lot of tuning suggested in the archives if you go 
>>> searching as well.
>>>
>>>   -Darrell
>>>
>>>
>>>> On Apr 9, 2024, at 3:05 AM, Ilias Chasapakis forumZFD 
>>>> <chasapakis at forumZFD.de> wrote:
>>>>
>>>> Dear all,
>>>>
>>>> we would like to describe a situation that we have had for a long 
>>>> time and that has not been resolved, even after many minor
>>>> and major upgrades of GlusterFS.
>>>>
>>>> We use a KVM environment for the glusterfs VMs, and the host servers are 
>>>> updated regularly. The hosts are heterogeneous hardware,
>>>> but configured with the same characteristics.
>>>>
>>>> The VMs have also been harmonized to use the virtio drivers where 
>>>> available for their devices, and the reserved resources are the same
>>>> on each host.
>>>>
>>>> The physical switch for the hosts has been replaced with a reliable one.
>>>>
>>>> Probing peers has been and remains quite quick in the heartbeat 
>>>> network, and communication between the servers apparently has no 
>>>> issues or disruptions.
>>>>
>>>> And I say apparently, because what we actually observe is:
>>>>
>>>> - always pending failed heals, which used to resolve after a rolling 
>>>> reboot of the gluster VMs (replica 3). Restarting only the
>>>> glusterfs-related services (daemon, events etc.) has no effect; 
>>>> only a reboot brings results
>>>> - very often the failed heals are directories
>>>>
>>>> We recently removed a brick that was on a VM on a host that has been 
>>>> entirely replaced. We re-added the brick, the sync ran, and
>>>> all data was eventually synced; the brick started with 0 pending failed 
>>>> heals. Now it develops failed heals too, like its fellow
>>>> bricks. Please take into account that we healed all the failed entries 
>>>> (manually, with various methods) before adding the third brick.
>>>>
>>>> After some days of operation, the count of failed heals rises 
>>>> again, not really fast, but definitely with new entries (which may or 
>>>> may not resolve with rolling reboots).
>>>>
>>>> We also have gluster clients on the ctdbs, which connect to the gluster 
>>>> nodes and mount the volume via the glusterfs client. Windows roaming 
>>>> profiles shared via smb frequently become corrupted (they are composed 
>>>> of a great number of small files, but are of large total size). The 
>>>> gluster bricks are formatted with xfs.
>>>>
>>>> What we also observe is that mounting with the vfs option in smb 
>>>> on the ctdbs shows some kind of delay. This means that you can see 
>>>> the shared folder, for example from a Windows client machine, via one 
>>>> ctdb, but not via another ctdb in the 
>>>> cluster, and then after a while it appears there too. And this 
>>>> frequently st
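>>>>
>>>> (By "the vfs option" I mean a share stanza along these lines; the share 
>>>> name and the localhost volfile server are placeholders, gv-ho is the 
>>>> volume:)
>>>>
>>>> [profiles]
>>>>     path = /
>>>>     vfs objects = glusterfs
>>>>     glusterfs:volume = gv-ho
>>>>     glusterfs:volfile_server = localhost
>>>>     kernel share modes = no
>>>>     read only = no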
>>>>
>>>>
>>>> This is an excerpt of entries on our shd logs:
>>>>
>>>>> [2024-04-08 10:13:26.213596 +0000] I [MSGID: 108026] 
>>>>> [afr-self-heal-entry.c:1080:afr_selfheal_entry_do] 
>>>>> 0-gv-ho-replicate-0: performing full entry selfheal on 
>>>>> 2c621415-6223-4b66-a4ca-3f6f267a448d
>>>>> [2024-04-08 10:14:08.135911 +0000] W [MSGID: 114031] 
>>>>> [client-rpc-fops_v2.c:2457:client4_0_link_cbk] 0-gv-ho-client-5: 
>>>>> remote operation failed. 
>>>>> [{source=<gfid:91d83f0e-1864-4ff3-9174-b7c956e20596>}, 
>>>>> {target=(null)}, {errno=116}, {error=Veraltete Dateizugriffsnummer 
>>>>> (file handle)}]
>>>>> [2024-04-08 10:15:59.135908 +0000] W [MSGID: 114061] 
>>>>> [client-common.c:2992:client_pre_readdir_v2] 0-gv-ho-client-5: 
>>>>> remote_fd is -1. EBADFD 
>>>>> [{gfid=6b5e599e-c836-4ebe-b16a-8224425b88c7}, {errno=77}, 
>>>>> {error=Die Dateizugriffsnummer ist in schlechter Verfassung}]
>>>>> [2024-04-08 10:30:25.013592 +0000] I [MSGID: 108026] 
>>>>> [afr-self-heal-entry.c:1080:afr_selfheal_entry_do] 
>>>>> 0-gv-ho-replicate-0: performing full entry selfheal on 
>>>>> 24e82e12-5512-4679-9eb3-8bd098367db7
>>>>> [2024-04-08 10:33:17.613594 +0000] W [MSGID: 114031] 
>>>>> [client-rpc-fops_v2.c:2457:client4_0_link_cbk] 0-gv-ho-client-5: 
>>>>> remote operation failed. 
>>>>> [{source=<gfid:ef9068fc-a329-4a21-88d2-265ecd3d208c>}, 
>>>>> {target=(null)}, {errno=116}, {error=Veraltete Dateizugriffsnummer 
>>>>> (file handle)}]
>>>>> [2024-04-08 10:33:21.201359 +0000] W [MSGID: 114031] 
>>>>> [client-rpc-fops_v2.c:2457:client4_0_link_cbk] 0-gv-ho-client-5: 
>>>>> remote operation failed. [{source=
>>>>
>>>> How are the clients mapped to real hosts, so that we know which 
>>>> host's logs to look at?
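>>>>
>>>> (My current guess, which I would be glad to have confirmed, is that the 
>>>> N in gv-ho-client-N is the client translator index in the volfile, so 
>>>> something like this should reveal the node and brick behind client-5:)
>>>>
>>>> grep -A 6 'volume gv-ho-client-5' /var/lib/glusterd/vols/gv-ho/*.vol
>>>> #  -> look for the remote-host / remote-subvolume options in that block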
>>>>
>>>> We would like to proceed by exclusion to finally eradicate this, 
>>>> possibly in a conservative way (not rebuilding everything), but we 
>>>> are becoming clueless as to where to look, as we have also tried 
>>>> various option settings regarding performance etc.
>>>>
>>>> Here is the option set on our main volume:
>>>>
>>>>> cluster.lookup-unhashed on (DEFAULT)
>>>>> cluster.lookup-optimize                  on (DEFAULT)
>>>>> cluster.min-free-disk                    10% (DEFAULT)
>>>>> cluster.min-free-inodes                  5% (DEFAULT)
>>>>> cluster.rebalance-stats                  off (DEFAULT)
>>>>> cluster.subvols-per-directory (null) (DEFAULT)
>>>>> cluster.readdir-optimize                 off (DEFAULT)
>>>>> cluster.rsync-hash-regex (null) (DEFAULT)
>>>>> cluster.extra-hash-regex (null) (DEFAULT)
>>>>> cluster.dht-xattr-name trusted.glusterfs.dht (DEFAULT)
>>>>> cluster.randomize-hash-range-by-gfid     off (DEFAULT)
>>>>> cluster.rebal-throttle normal (DEFAULT)
>>>>> cluster.lock-migration off
>>>>> cluster.force-migration off
>>>>> cluster.local-volume-name (null) (DEFAULT)
>>>>> cluster.weighted-rebalance               on (DEFAULT)
>>>>> cluster.switch-pattern (null) (DEFAULT)
>>>>> cluster.entry-change-log                 on (DEFAULT)
>>>>> cluster.read-subvolume (null) (DEFAULT)
>>>>> cluster.read-subvolume-index             -1 (DEFAULT)
>>>>> cluster.read-hash-mode                   1 (DEFAULT)
>>>>> cluster.background-self-heal-count       8 (DEFAULT)
>>>>> cluster.metadata-self-heal on
>>>>> cluster.data-self-heal on
>>>>> cluster.entry-self-heal on
>>>>> cluster.self-heal-daemon enable
>>>>> cluster.heal-timeout                     600 (DEFAULT)
>>>>> cluster.self-heal-window-size            8 (DEFAULT)
>>>>> cluster.data-change-log                  on (DEFAULT)
>>>>> cluster.metadata-change-log              on (DEFAULT)
>>>>> cluster.data-self-heal-algorithm (null) (DEFAULT)
>>>>> cluster.eager-lock                       on (DEFAULT)
>>>>> disperse.eager-lock                      on (DEFAULT)
>>>>> disperse.other-eager-lock                on (DEFAULT)
>>>>> disperse.eager-lock-timeout              1 (DEFAULT)
>>>>> disperse.other-eager-lock-timeout        1 (DEFAULT)
>>>>> cluster.quorum-type auto
>>>>> cluster.quorum-count 2
>>>>> cluster.choose-local true (DEFAULT)
>>>>> cluster.self-heal-readdir-size           1KB (DEFAULT)
>>>>> cluster.post-op-delay-secs               1 (DEFAULT)
>>>>> cluster.ensure-durability                on (DEFAULT)
>>>>> cluster.consistent-metadata              no (DEFAULT)
>>>>> cluster.heal-wait-queue-length           128 (DEFAULT)
>>>>> cluster.favorite-child-policy none
>>>>> cluster.full-lock                        yes (DEFAULT)
>>>>> cluster.optimistic-change-log            on (DEFAULT)
>>>>> diagnostics.latency-measurement off
>>>>> diagnostics.dump-fd-stats                off (DEFAULT)
>>>>> diagnostics.count-fop-hits off
>>>>> diagnostics.brick-log-level INFO
>>>>> diagnostics.client-log-level INFO
>>>>> diagnostics.brick-sys-log-level CRITICAL (DEFAULT)
>>>>> diagnostics.client-sys-log-level CRITICAL (DEFAULT)
>>>>> diagnostics.brick-logger (null) (DEFAULT)
>>>>> diagnostics.client-logger (null) (DEFAULT)
>>>>> diagnostics.brick-log-format (null) (DEFAULT)
>>>>> diagnostics.client-log-format (null) (DEFAULT)
>>>>> diagnostics.brick-log-buf-size           5 (DEFAULT)
>>>>> diagnostics.client-log-buf-size          5 (DEFAULT)
>>>>> diagnostics.brick-log-flush-timeout      120 (DEFAULT)
>>>>> diagnostics.client-log-flush-timeout     120 (DEFAULT)
>>>>> diagnostics.stats-dump-interval          0 (DEFAULT)
>>>>> diagnostics.fop-sample-interval          0 (DEFAULT)
>>>>> diagnostics.stats-dump-format json (DEFAULT)
>>>>> diagnostics.fop-sample-buf-size 65535 (DEFAULT)
>>>>> diagnostics.stats-dnscache-ttl-sec 86400 (DEFAULT)
>>>>> performance.cache-max-file-size 10
>>>>> performance.cache-min-file-size          0 (DEFAULT)
>>>>> performance.cache-refresh-timeout        1 (DEFAULT)
>>>>> performance.cache-priority (DEFAULT)
>>>>> performance.io-cache-size 32MB (DEFAULT)
>>>>> performance.cache-size 32MB (DEFAULT)
>>>>> performance.io-thread-count              16 (DEFAULT)
>>>>> performance.high-prio-threads            16 (DEFAULT)
>>>>> performance.normal-prio-threads          16 (DEFAULT)
>>>>> performance.low-prio-threads             16 (DEFAULT)
>>>>> performance.least-prio-threads           1 (DEFAULT)
>>>>> performance.enable-least-priority        on (DEFAULT)
>>>>> performance.iot-watchdog-secs (null) (DEFAULT)
>>>>> performance.iot-cleanup-disconnected-reqs off (DEFAULT)
>>>>> performance.iot-pass-through false (DEFAULT)
>>>>> performance.io-cache-pass-through false (DEFAULT)
>>>>> performance.quick-read-cache-size 128MB (DEFAULT)
>>>>> performance.cache-size 128MB (DEFAULT)
>>>>> performance.quick-read-cache-timeout     1 (DEFAULT)
>>>>> performance.qr-cache-timeout 600
>>>>> performance.quick-read-cache-invalidation false (DEFAULT)
>>>>> performance.ctime-invalidation false (DEFAULT)
>>>>> performance.flush-behind                 on (DEFAULT)
>>>>> performance.nfs.flush-behind             on (DEFAULT)
>>>>> performance.write-behind-window-size 4MB
>>>>> performance.resync-failed-syncs-after-fsync off (DEFAULT)
>>>>> performance.nfs.write-behind-window-size 1MB (DEFAULT)
>>>>> performance.strict-o-direct              off (DEFAULT)
>>>>> performance.nfs.strict-o-direct          off (DEFAULT)
>>>>> performance.strict-write-ordering        off (DEFAULT)
>>>>> performance.nfs.strict-write-ordering    off (DEFAULT)
>>>>> performance.write-behind-trickling-writes on (DEFAULT)
>>>>> performance.aggregate-size 128KB (DEFAULT)
>>>>> performance.nfs.write-behind-trickling-writes on (DEFAULT)
>>>>> performance.lazy-open                    yes (DEFAULT)
>>>>> performance.read-after-open              yes (DEFAULT)
>>>>> performance.open-behind-pass-through false (DEFAULT)
>>>>> performance.read-ahead-page-count        4 (DEFAULT)
>>>>> performance.read-ahead-pass-through false (DEFAULT)
>>>>> performance.readdir-ahead-pass-through false (DEFAULT)
>>>>> performance.md-cache-pass-through false (DEFAULT)
>>>>> performance.write-behind-pass-through false (DEFAULT)
>>>>> performance.md-cache-timeout 600
>>>>> performance.cache-swift-metadata false (DEFAULT)
>>>>> performance.cache-samba-metadata on
>>>>> performance.cache-capability-xattrs true (DEFAULT)
>>>>> performance.cache-ima-xattrs true (DEFAULT)
>>>>> performance.md-cache-statfs              off (DEFAULT)
>>>>> performance.xattr-cache-list (DEFAULT)
>>>>> performance.nl-cache-pass-through false (DEFAULT)
>>>>> network.frame-timeout 1800 (DEFAULT)
>>>>> network.ping-timeout 20
>>>>> network.tcp-window-size (null) (DEFAULT)
>>>>> client.ssl off
>>>>> network.remote-dio disable (DEFAULT)
>>>>> client.event-threads 4
>>>>> client.tcp-user-timeout 0
>>>>> client.keepalive-time 20
>>>>> client.keepalive-interval 2
>>>>> client.keepalive-count 9
>>>>> client.strict-locks off
>>>>> network.tcp-window-size (null) (DEFAULT)
>>>>> network.inode-lru-limit 200000
>>>>> auth.allow *
>>>>> auth.reject (null) (DEFAULT)
>>>>> transport.keepalive 1
>>>>> server.allow-insecure                    on (DEFAULT)
>>>>> server.root-squash                       off (DEFAULT)
>>>>> server.all-squash                        off (DEFAULT)
>>>>> server.anonuid 65534 (DEFAULT)
>>>>> server.anongid 65534 (DEFAULT)
>>>>> server.statedump-path /var/run/gluster (DEFAULT)
>>>>> server.outstanding-rpc-limit             64 (DEFAULT)
>>>>> server.ssl off
>>>>> auth.ssl-allow *
>>>>> server.manage-gids                       off (DEFAULT)
>>>>> server.dynamic-auth                      on (DEFAULT)
>>>>> client.send-gids                         on (DEFAULT)
>>>>> server.gid-timeout                       300 (DEFAULT)
>>>>> server.own-thread (null) (DEFAULT)
>>>>> server.event-threads 4
>>>>> server.tcp-user-timeout                  42 (DEFAULT)
>>>>> server.keepalive-time 20
>>>>> server.keepalive-interval 2
>>>>> server.keepalive-count 9
>>>>> transport.listen-backlog 1024
>>>>> ssl.own-cert (null) (DEFAULT)
>>>>> ssl.private-key (null) (DEFAULT)
>>>>> ssl.ca-list (null) (DEFAULT)
>>>>> ssl.crl-path (null) (DEFAULT)
>>>>> ssl.certificate-depth (null) (DEFAULT)
>>>>> ssl.cipher-list (null) (DEFAULT)
>>>>> ssl.dh-param (null) (DEFAULT)
>>>>> ssl.ec-curve (null) (DEFAULT)
>>>>> transport.address-family inet
>>>>> performance.write-behind off
>>>>> performance.read-ahead on
>>>>> performance.readdir-ahead on
>>>>> performance.io-cache off
>>>>> performance.open-behind on
>>>>> performance.quick-read on
>>>>> performance.nl-cache on
>>>>> performance.stat-prefetch on
>>>>> performance.client-io-threads off
>>>>> performance.nfs.write-behind on
>>>>> performance.nfs.read-ahead off
>>>>> performance.nfs.io-cache off
>>>>> performance.nfs.quick-read off
>>>>> performance.nfs.stat-prefetch off
>>>>> performance.nfs.io-threads off
>>>>> performance.force-readdirp true (DEFAULT)
>>>>> performance.cache-invalidation on
>>>>> performance.global-cache-invalidation true (DEFAULT)
>>>>> features.uss off
>>>>> features.snapshot-directory .snaps
>>>>> features.show-snapshot-directory off
>>>>> features.tag-namespaces off
>>>>> network.compression off
>>>>> network.compression.window-size          -15 (DEFAULT)
>>>>> network.compression.mem-level            8 (DEFAULT)
>>>>> network.compression.min-size             0 (DEFAULT)
>>>>> network.compression.compression-level    -1 (DEFAULT)
>>>>> network.compression.debug false (DEFAULT)
>>>>> features.default-soft-limit              80% (DEFAULT)
>>>>> features.soft-timeout                    60 (DEFAULT)
>>>>> features.hard-timeout                    5 (DEFAULT)
>>>>> features.alert-time 86400 (DEFAULT)
>>>>> features.quota-deem-statfs off
>>>>> geo-replication.indexing off
>>>>> geo-replication.indexing off
>>>>> geo-replication.ignore-pid-check off
>>>>> geo-replication.ignore-pid-check off
>>>>> features.quota off
>>>>> features.inode-quota off
>>>>> features.bitrot disable
>>>>> debug.trace off
>>>>> debug.log-history                        no (DEFAULT)
>>>>> debug.log-file                           no (DEFAULT)
>>>>> debug.exclude-ops (null) (DEFAULT)
>>>>> debug.include-ops (null) (DEFAULT)
>>>>> debug.error-gen off
>>>>> debug.error-failure (null) (DEFAULT)
>>>>> debug.error-number (null) (DEFAULT)
>>>>> debug.random-failure                     off (DEFAULT)
>>>>> debug.error-fops (null) (DEFAULT)
>>>>> nfs.disable on
>>>>> features.read-only                       off (DEFAULT)
>>>>> features.worm off
>>>>> features.worm-file-level off
>>>>> features.worm-files-deletable on
>>>>> features.default-retention-period        120 (DEFAULT)
>>>>> features.retention-mode relax (DEFAULT)
>>>>> features.auto-commit-period              180 (DEFAULT)
>>>>> storage.linux-aio                        off (DEFAULT)
>>>>> storage.linux-io_uring                   off (DEFAULT)
>>>>> storage.batch-fsync-mode reverse-fsync (DEFAULT)
>>>>> storage.batch-fsync-delay-usec           0 (DEFAULT)
>>>>> storage.owner-uid                        -1 (DEFAULT)
>>>>> storage.owner-gid                        -1 (DEFAULT)
>>>>> storage.node-uuid-pathinfo               off (DEFAULT)
>>>>> storage.health-check-interval            30 (DEFAULT)
>>>>> storage.build-pgfid                      off (DEFAULT)
>>>>> storage.gfid2path                        on (DEFAULT)
>>>>> storage.gfid2path-separator              : (DEFAULT)
>>>>> storage.reserve                          1 (DEFAULT)
>>>>> storage.health-check-timeout             20 (DEFAULT)
>>>>> storage.fips-mode-rchecksum on
>>>>> storage.force-create-mode 0000 (DEFAULT)
>>>>> storage.force-directory-mode 0000 (DEFAULT)
>>>>> storage.create-mask 0777 (DEFAULT)
>>>>> storage.create-directory-mask 0777 (DEFAULT)
>>>>> storage.max-hardlinks                    100 (DEFAULT)
>>>>> features.ctime                           on (DEFAULT)
>>>>> config.gfproxyd off
>>>>> cluster.server-quorum-type server
>>>>> cluster.server-quorum-ratio 51
>>>>> changelog.changelog                      off (DEFAULT)
>>>>> changelog.changelog-dir                  {{ brick.path 
>>>>> }}/.glusterfs/changelogs (DEFAULT)
>>>>> changelog.encoding ascii (DEFAULT)
>>>>> changelog.rollover-time                  15 (DEFAULT)
>>>>> changelog.fsync-interval                 5 (DEFAULT)
>>>>> changelog.changelog-barrier-timeout 120
>>>>> changelog.capture-del-path               off (DEFAULT)
>>>>> features.barrier disable
>>>>> features.barrier-timeout 120
>>>>> features.trash                           off (DEFAULT)
>>>>> features.trash-dir .trashcan (DEFAULT)
>>>>> features.trash-eliminate-path (null) (DEFAULT)
>>>>> features.trash-max-filesize              5MB (DEFAULT)
>>>>> features.trash-internal-op               off (DEFAULT)
>>>>> cluster.enable-shared-storage disable
>>>>> locks.trace                              off (DEFAULT)
>>>>> locks.mandatory-locking                  off (DEFAULT)
>>>>> cluster.disperse-self-heal-daemon enable (DEFAULT)
>>>>> cluster.quorum-reads                     no (DEFAULT)
>>>>> client.bind-insecure (null) (DEFAULT)
>>>>> features.timeout                         45 (DEFAULT)
>>>>> features.failover-hosts (null) (DEFAULT)
>>>>> features.shard off
>>>>> features.shard-block-size 64MB (DEFAULT)
>>>>> features.shard-lru-limit 16384 (DEFAULT)
>>>>> features.shard-deletion-rate             100 (DEFAULT)
>>>>> features.scrub-throttle lazy
>>>>> features.scrub-freq biweekly
>>>>> features.scrub false (DEFAULT)
>>>>> features.expiry-time 120
>>>>> features.signer-threads 4
>>>>> features.cache-invalidation on
>>>>> features.cache-invalidation-timeout 600
>>>>> ganesha.enable off
>>>>> features.leases off
>>>>> features.lease-lock-recall-timeout       60 (DEFAULT)
>>>>> disperse.background-heals                8 (DEFAULT)
>>>>> disperse.heal-wait-qlength               128 (DEFAULT)
>>>>> cluster.heal-timeout                     600 (DEFAULT)
>>>>> dht.force-readdirp                       on (DEFAULT)
>>>>> disperse.read-policy gfid-hash (DEFAULT)
>>>>> cluster.shd-max-threads 4
>>>>> cluster.shd-wait-qlength 1024 (DEFAULT)
>>>>> cluster.locking-scheme full (DEFAULT)
>>>>> cluster.granular-entry-heal              no (DEFAULT)
>>>>> features.locks-revocation-secs           0 (DEFAULT)
>>>>> features.locks-revocation-clear-all false (DEFAULT)
>>>>> features.locks-revocation-max-blocked    0 (DEFAULT)
>>>>> features.locks-monkey-unlocking false (DEFAULT)
>>>>> features.locks-notify-contention         yes (DEFAULT)
>>>>> features.locks-notify-contention-delay   5 (DEFAULT)
>>>>> disperse.shd-max-threads                 1 (DEFAULT)
>>>>> disperse.shd-wait-qlength 4096
>>>>> disperse.cpu-extensions auto (DEFAULT)
>>>>> disperse.self-heal-window-size           32 (DEFAULT)
>>>>> cluster.use-compound-fops off
>>>>> performance.parallel-readdir on
>>>>> performance.rda-request-size 131072
>>>>> performance.rda-low-wmark 4096 (DEFAULT)
>>>>> performance.rda-high-wmark 128KB (DEFAULT)
>>>>> performance.rda-cache-limit 10MB
>>>>> performance.nl-cache-positive-entry false (DEFAULT)
>>>>> performance.nl-cache-limit 10MB
>>>>> performance.nl-cache-timeout 600
>>>>> cluster.brick-multiplex disable
>>>>> cluster.brick-graceful-cleanup disable
>>>>> glusterd.vol_count_per_thread 100
>>>>> cluster.max-bricks-per-process 250
>>>>> disperse.optimistic-change-log           on (DEFAULT)
>>>>> disperse.stripe-cache                    4 (DEFAULT)
>>>>> cluster.halo-enabled False (DEFAULT)
>>>>> cluster.halo-shd-max-latency 99999 (DEFAULT)
>>>>> cluster.halo-nfsd-max-latency            5 (DEFAULT)
>>>>> cluster.halo-max-latency                 5 (DEFAULT)
>>>>> cluster.halo-max-replicas 99999 (DEFAULT)
>>>>> cluster.halo-min-replicas                2 (DEFAULT)
>>>>> features.selinux on
>>>>> cluster.daemon-log-level INFO
>>>>> debug.delay-gen off
>>>>> delay-gen.delay-percentage               10% (DEFAULT)
>>>>> delay-gen.delay-duration 100000 (DEFAULT)
>>>>> delay-gen.enable (DEFAULT)
>>>>> disperse.parallel-writes                 on (DEFAULT)
>>>>> disperse.quorum-count                    0 (DEFAULT)
>>>>> features.sdfs off
>>>>> features.cloudsync off
>>>>> features.ctime on
>>>>> ctime.noatime on
>>>>> features.cloudsync-storetype (null) (DEFAULT)
>>>>> features.enforce-mandatory-lock off
>>>>> config.global-threading off
>>>>> config.client-threads 16
>>>>> config.brick-threads 16
>>>>> features.cloudsync-remote-read off
>>>>> features.cloudsync-store-id (null) (DEFAULT)
>>>>> features.cloudsync-product-id (null) (DEFAULT)
>>>>> features.acl enable
>>>>> cluster.use-anonymous-inode yes
>>>>> rebalance.ensure-durability              on (DEFAULT)
>>>>
>>>> Again, sorry for the long post. We would be happy to have this 
>>>> solved, as we are excited about using glusterfs and would like to 
>>>> get back to a stable configuration.
>>>>
>>>> We always appreciate the spirit of collaboration and reciprocal 
>>>> help on this list.
>>>>
>>>> Best
>>>> Ilias
>>>>
>>>
>
-- 
forumZFD
Entschieden für Frieden | Committed to Peace

Ilias Chasapakis
Referent IT | IT Consultant

Forum Ziviler Friedensdienst e.V. | Forum Civil Peace Service
Am Kölner Brett 8 | 50825 Köln | Germany

Tel 0221 91273243 | Fax 0221 91273299 |http://www.forumZFD.de

Vorstand nach § 26 BGB, einzelvertretungsberechtigt|Executive Board:
Alexander Mauz, Sonja Wiekenberg-Mlalandle, Jens von Bargen
VR 17651 Amtsgericht Köln

Spenden|Donations: IBAN DE90 4306 0967 4103 7264 00   BIC GENODEM1GLS
