From hunter86_bg at yahoo.com Tue Apr 2 05:13:05 2024
From: hunter86_bg at yahoo.com (Strahil Nikolov)
Date: Tue, 2 Apr 2024 05:13:05 +0000 (UTC)
Subject: [Gluster-users] Adding storage capacity to a production disperse volume
In-Reply-To: 
References: 
Message-ID: <1486901479.2533488.1712034785928@mail.yahoo.com>

Hi Ted,

What do you mean with one unit ?

Best Regards,
Strahil Nikolov

On Fri, Mar 29, 2024 at 4:33, Theodore Buchwald wrote:

________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users at gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From buchwald at ucsc.edu Thu Apr 4 20:28:18 2024
From: buchwald at ucsc.edu (Theodore Buchwald)
Date: Thu, 4 Apr 2024 13:28:18 -0700
Subject: [Gluster-users] Adding storage capacity to a production disperse volume
In-Reply-To: 
References: 
Message-ID: 

Hi Strahil,

Sorry for the late reply. I did not catch this response until just now. For the "What do you mean with one unit ?" I meant making 5 bricks (of equal size to the original 5) within that unit and adding them to the existing gluster volume "Number of Bricks: 1 x (4 + 1) = 5". Those 5 bricks just happen to be individual storage units.

Thanks, Ted

On Thu, Mar 28, 2024 at 4:28 PM Theodore Buchwald wrote:

> Hello,
>
> And thank you to the users of the group. That clarified what I was wondering and answered my question on the additional storage space.
>
> I had another question in regard to adding a unit to this gluster disperse setup. I may be able to add a large-capacity JBOD to this gluster configuration. So I would be able to divide the single storage unit into 5 bricks of the same size as each of the existing 5 bricks that are configured in the gluster cluster.
>
> My question is: is this possible to do with one unit, adding it to the single volume "Volume Name: researchdata" in my existing setup? Reading the documentation on adding to a disperse volume, I am not sure how to proceed with this type of addition.
>
> Thanks Tbuck!
>
> On Wed, Mar 13, 2024 at 5:39 PM Theodore Buchwald wrote:
>
>> Hi,
>>
>> This is the first time I have tried to expand the storage of a live gluster volume. I was able to get another supermicro storage unit for a gluster cluster that I built. The current clustered storage configuration contains five supermicro units. And the cluster volume is set up with the following configuration:
>>
>> node-6[/var/log/glusterfs]# gluster volume info
>>
>> Volume Name: researchdata
>> Type: Disperse
>> Volume ID: 93d4-482a-8933-2d81298d5b3b
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x (4 + 1) = 5
>> Transport-type: tcp
>> Bricks:
>> Brick1: node-1:/mnt/data/researchdata-1
>> Brick2: node-2:/mnt/data/researchdata-2
>> Brick3: node-3:/mnt/data/researchdata-3
>> Brick4: node-4:/mnt/data/researchdata-4
>> Brick5: node-5:/mnt/data/researchdata-5
>> Options Reconfigured:
>> features.quota-deem-statfs: on
>> features.inode-quota: on
>> features.quota: on
>> storage.fips-mode-rchecksum: on
>> transport.address-family: inet
>> nfs.disable: on
>> locks.mandatory-locking: optimal
>>
>> Adding the node to the cluster was no problem. But adding a brick using 'add-brick' to the volume resulted in "volume add-brick: failed: Incorrect number of bricks supplied 1 with count 5". So my question is: what would be the correct number of bricks needed to expand the storage on the current configuration of 'Number of Bricks: 1 x (4 + 1) = 5', without reconfiguring the volume altogether?
>>
>> Thanks in advance for any pointers on how to expand this volume's storage capabilities.
>>
>> Thanks, Tbuck
>
> --
> Ted Buchwald
> Divisional LIT support of PBSci IT Staff
> buchwald at ucsc.edu
> 831-459-1298
> Earth & Marine Sciences Bldg, Room A309

--
Ted Buchwald
Divisional LIT support of PBSci IT Staff
buchwald at ucsc.edu
831-459-1298
Earth & Marine Sciences Bldg, Room A309
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
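For a disperse volume laid out as "1 x (4 + 1) = 5", bricks can only be added in multiples of the disperse set size (5 here); the volume then becomes a distributed-disperse volume of "2 x (4 + 1) = 10", and existing data is spread onto the new set with a rebalance. A minimal sketch of the expansion discussed in this thread, assuming the new unit is node-6 and using hypothetical brick paths:

    gluster volume add-brick researchdata \
        node-6:/mnt/data1/researchdata-6 \
        node-6:/mnt/data2/researchdata-7 \
        node-6:/mnt/data3/researchdata-8 \
        node-6:/mnt/data4/researchdata-9 \
        node-6:/mnt/data5/researchdata-10

    # spread existing data across both disperse sets afterwards
    gluster volume rebalance researchdata start
    gluster volume rebalance researchdata status

Because all five new bricks would sit on one server, gluster normally warns that losing that server takes the whole second disperse set offline, and the add-brick may have to be confirmed by appending "force".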
From chasapakis at forumZFD.de Tue Apr 9 08:05:25 2024
From: chasapakis at forumZFD.de (Ilias Chasapakis forumZFD)
Date: Tue, 9 Apr 2024 10:05:25 +0200
Subject: [Gluster-users] Glusterfs 10.5-1 healing issues
Message-ID: 

Dear all,

we would like to describe a situation that we have and that has not been resolved for a long time, that is, across many minor and major upgrades of GlusterFS.

We use a KVM environment for the glusterfs VMs, and the host servers are updated regularly. The hosts are heterogeneous hardware, but configured with the same characteristics.

The VMs have also been harmonized to use the virtio drivers where available for devices, and the resources reserved are the same on each host.

The physical switch for the hosts has been substituted with a reliable one.

Probing peers has been and is quite quick in the heartbeat network, and communication between the servers apparently has no issues or disruptions.

And I say apparently, because what we have is:

- always pending failed heals that used to resolve after a rotated reboot of the gluster VMs (replica 3). Restarting only the glusterfs related services (daemon, events etc.) has no effect; only a reboot brings results
- very often the failed heals are directories

We lately removed a brick that was on a VM on a host that has been entirely substituted. We re-added the brick, the sync went on, all data was eventually synced, and it started with 0 pending failed heals. Now it develops failed heals too, like its fellow bricks. Please take into account that we healed all the failed entries (manually, with various methods) before adding the third brick.

After some days of operating, the count of failed heals rises again, not really fast but with new entries for sure (which might resolve with rotated reboots, or not).

We also have gluster clients on ctdbs that connect to the gluster and mount via the glusterfs client. Windows roaming profiles shared via smb become frequently corrupted (they are composed of a great number of small files but have a big total size). The gluster nodes are formatted with xfs.

Also, what we observe is that mounting with the vfs option in smb on the ctdbs has some kind of delay. This means that you can see the shared folder on, for example, a Windows client machine on one ctdb, but not on another ctdb in the cluster, and then after a while it appears there too. And this frequently st
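For reference, the commands typically used to inspect and re-trigger pending heals on a replica 3 volume are sketched below; the volume name gv-ho is only inferred from the shd log excerpt that follows and may not match the real one:

    gluster volume heal gv-ho info summary      # per-brick counts of entries pending, in split-brain, or failed
    gluster volume heal gv-ho info              # list the gfids/paths that are still pending
    gluster volume heal gv-ho info split-brain  # check whether any of them are genuine split-brain
    gluster volume heal gv-ho statistics heal-count
    gluster volume heal gv-ho full              # force a full self-heal crawl
    gluster volume status gv-ho clients         # show which client connections each brick currently sees

A translator name like gv-ho-client-5 in these logs corresponds to one of the client-N blocks in the volume's client volfile (typically under /var/lib/glusterd/vols/gv-ho/ on the servers); that block's remote-host and remote-subvolume options tell you which host and brick it points to.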
This is an excerpt of entries from our shd logs:

> 2024-04-08 10:13:26.213596 +0000] I [MSGID: 108026] [afr-self-heal-entry.c:1080:afr_selfheal_entry_do] 0-gv-ho-replicate-0: performing full entry selfheal on 2c621415-6223-4b66-a4ca-3f6f267a448d
> [2024-04-08 10:14:08.135911 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2457:client4_0_link_cbk] 0-gv-ho-client-5: remote operation failed. [{source=}, {target=(null)}, {errno=116}, {error=Veraltete Dateizugriffsnummer (file handle)}]
> [2024-04-08 10:15:59.135908 +0000] W [MSGID: 114061] [client-common.c:2992:client_pre_readdir_v2] 0-gv-ho-client-5: remote_fd is -1. EBADFD [{gfid=6b5e599e-c836-4ebe-b16a-8224425b88c7}, {errno=77}, {error=Die Dateizugriffsnummer ist in schlechter Verfassung}]
> [2024-04-08 10:30:25.013592 +0000] I [MSGID: 108026] [afr-self-heal-entry.c:1080:afr_selfheal_entry_do] 0-gv-ho-replicate-0: performing full entry selfheal on 24e82e12-5512-4679-9eb3-8bd098367db7
> [2024-04-08 10:33:17.613594 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2457:client4_0_link_cbk] 0-gv-ho-client-5: remote operation failed. [{source=}, {target=(null)}, {errno=116}, {error=Veraltete Dateizugriffsnummer (file handle)}]
> [2024-04-08 10:33:21.201359 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2457:client4_0_link_cbk] 0-gv-ho-client-5: remote operation failed. [{source=

How are the clients mapped to real hosts, in order to know on which one's logs to look?

We would like to go by exclusion to finally eradicate this, possibly in a conservative way (not rebuilding everything), and we are becoming clueless as to where to look, as we have also tried various option settings regarding performance etc.

Here is the set on our main volume:

> cluster.lookup-unhashed on (DEFAULT)
> cluster.lookup-optimize on (DEFAULT)
> cluster.min-free-disk 10% (DEFAULT)
> cluster.min-free-inodes 5% (DEFAULT)
> cluster.rebalance-stats off (DEFAULT)
> cluster.subvols-per-directory (null) (DEFAULT)
> cluster.readdir-optimize off (DEFAULT)
> cluster.rsync-hash-regex (null) (DEFAULT)
> cluster.extra-hash-regex (null) (DEFAULT)
> cluster.dht-xattr-name trusted.glusterfs.dht (DEFAULT)
> cluster.randomize-hash-range-by-gfid off (DEFAULT)
> cluster.rebal-throttle normal (DEFAULT)
> cluster.lock-migration off
> cluster.force-migration off
> cluster.local-volume-name (null) (DEFAULT)
> cluster.weighted-rebalance on (DEFAULT)
> cluster.switch-pattern (null) (DEFAULT)
> cluster.entry-change-log on (DEFAULT)
> cluster.read-subvolume (null) (DEFAULT)
> cluster.read-subvolume-index -1 (DEFAULT)
> cluster.read-hash-mode 1 (DEFAULT)
> cluster.background-self-heal-count 8 (DEFAULT)
> cluster.metadata-self-heal on
> cluster.data-self-heal on
> cluster.entry-self-heal on
> cluster.self-heal-daemon enable
> cluster.heal-timeout 600 (DEFAULT)
> cluster.self-heal-window-size 8 (DEFAULT)
> cluster.data-change-log on (DEFAULT)
> cluster.metadata-change-log on (DEFAULT)
> cluster.data-self-heal-algorithm (null) (DEFAULT)
> cluster.eager-lock on (DEFAULT)
> disperse.eager-lock
on (DEFAULT) > disperse.other-eager-lock??????????????? on (DEFAULT) > disperse.eager-lock-timeout????????????? 1 (DEFAULT) > disperse.other-eager-lock-timeout??????? 1 (DEFAULT) > cluster.quorum-type auto > cluster.quorum-count 2 > cluster.choose-local???????????????????? true (DEFAULT) > cluster.self-heal-readdir-size?????????? 1KB (DEFAULT) > cluster.post-op-delay-secs?????????????? 1 (DEFAULT) > cluster.ensure-durability??????????????? on (DEFAULT) > cluster.consistent-metadata????????????? no (DEFAULT) > cluster.heal-wait-queue-length?????????? 128 (DEFAULT) > cluster.favorite-child-policy none > cluster.full-lock??????????????????????? yes (DEFAULT) > cluster.optimistic-change-log??????????? on (DEFAULT) > diagnostics.latency-measurement off > diagnostics.dump-fd-stats??????????????? off (DEFAULT) > diagnostics.count-fop-hits off > diagnostics.brick-log-level INFO > diagnostics.client-log-level INFO > diagnostics.brick-sys-log-level????????? CRITICAL (DEFAULT) > diagnostics.client-sys-log-level???????? CRITICAL (DEFAULT) > diagnostics.brick-logger???????????????? (null) (DEFAULT) > diagnostics.client-logger??????????????? (null) (DEFAULT) > diagnostics.brick-log-format???????????? (null) (DEFAULT) > diagnostics.client-log-format??????????? (null) (DEFAULT) > diagnostics.brick-log-buf-size?????????? 5 (DEFAULT) > diagnostics.client-log-buf-size????????? 5 (DEFAULT) > diagnostics.brick-log-flush-timeout????? 120 (DEFAULT) > diagnostics.client-log-flush-timeout???? 120 (DEFAULT) > diagnostics.stats-dump-interval????????? 0 (DEFAULT) > diagnostics.fop-sample-interval????????? 0 (DEFAULT) > diagnostics.stats-dump-format??????????? json (DEFAULT) > diagnostics.fop-sample-buf-size????????? 65535 (DEFAULT) > diagnostics.stats-dnscache-ttl-sec?????? 86400 (DEFAULT) > performance.cache-max-file-size 10 > performance.cache-min-file-size????????? 0 (DEFAULT) > performance.cache-refresh-timeout??????? 1 (DEFAULT) > performance.cache-priority (DEFAULT) > performance.io-cache-size??????????????? 32MB (DEFAULT) > performance.cache-size?????????????????? 32MB (DEFAULT) > performance.io-thread-count????????????? 16 (DEFAULT) > performance.high-prio-threads??????????? 16 (DEFAULT) > performance.normal-prio-threads????????? 16 (DEFAULT) > performance.low-prio-threads???????????? 16 (DEFAULT) > performance.least-prio-threads?????????? 1 (DEFAULT) > performance.enable-least-priority??????? on (DEFAULT) > performance.iot-watchdog-secs??????????? (null) (DEFAULT) > performance.iot-cleanup-disconnected-reqs off (DEFAULT) > performance.iot-pass-through???????????? false (DEFAULT) > performance.io-cache-pass-through??????? false (DEFAULT) > performance.quick-read-cache-size??????? 128MB (DEFAULT) > performance.cache-size?????????????????? 128MB (DEFAULT) > performance.quick-read-cache-timeout???? 1 (DEFAULT) > performance.qr-cache-timeout 600 > performance.quick-read-cache-invalidation false (DEFAULT) > performance.ctime-invalidation?????????? false (DEFAULT) > performance.flush-behind???????????????? on (DEFAULT) > performance.nfs.flush-behind???????????? on (DEFAULT) > performance.write-behind-window-size 4MB > performance.resync-failed-syncs-after-fsync off (DEFAULT) > performance.nfs.write-behind-window-size 1MB (DEFAULT) > performance.strict-o-direct????????????? off (DEFAULT) > performance.nfs.strict-o-direct????????? off (DEFAULT) > performance.strict-write-ordering??????? off (DEFAULT) > performance.nfs.strict-write-ordering??? 
off (DEFAULT) > performance.write-behind-trickling-writes on (DEFAULT) > performance.aggregate-size?????????????? 128KB (DEFAULT) > performance.nfs.write-behind-trickling-writes on (DEFAULT) > performance.lazy-open??????????????????? yes (DEFAULT) > performance.read-after-open????????????? yes (DEFAULT) > performance.open-behind-pass-through???? false (DEFAULT) > performance.read-ahead-page-count??????? 4 (DEFAULT) > performance.read-ahead-pass-through????? false (DEFAULT) > performance.readdir-ahead-pass-through?? false (DEFAULT) > performance.md-cache-pass-through??????? false (DEFAULT) > performance.write-behind-pass-through??? false (DEFAULT) > performance.md-cache-timeout 600 > performance.cache-swift-metadata???????? false (DEFAULT) > performance.cache-samba-metadata on > performance.cache-capability-xattrs????? true (DEFAULT) > performance.cache-ima-xattrs???????????? true (DEFAULT) > performance.md-cache-statfs????????????? off (DEFAULT) > performance.xattr-cache-list (DEFAULT) > performance.nl-cache-pass-through??????? false (DEFAULT) > network.frame-timeout??????????????????? 1800 (DEFAULT) > network.ping-timeout 20 > network.tcp-window-size????????????????? (null) (DEFAULT) > client.ssl off > network.remote-dio?????????????????????? disable (DEFAULT) > client.event-threads 4 > client.tcp-user-timeout 0 > client.keepalive-time 20 > client.keepalive-interval 2 > client.keepalive-count 9 > client.strict-locks off > network.tcp-window-size????????????????? (null) (DEFAULT) > network.inode-lru-limit 200000 > auth.allow * > auth.reject????????????????????????????? (null) (DEFAULT) > transport.keepalive 1 > server.allow-insecure??????????????????? on (DEFAULT) > server.root-squash?????????????????????? off (DEFAULT) > server.all-squash??????????????????????? off (DEFAULT) > server.anonuid?????????????????????????? 65534 (DEFAULT) > server.anongid?????????????????????????? 65534 (DEFAULT) > server.statedump-path??????????????????? /var/run/gluster (DEFAULT) > server.outstanding-rpc-limit???????????? 64 (DEFAULT) > server.ssl off > auth.ssl-allow * > server.manage-gids?????????????????????? off (DEFAULT) > server.dynamic-auth????????????????????? on (DEFAULT) > client.send-gids???????????????????????? on (DEFAULT) > server.gid-timeout?????????????????????? 300 (DEFAULT) > server.own-thread??????????????????????? (null) (DEFAULT) > server.event-threads 4 > server.tcp-user-timeout????????????????? 42 (DEFAULT) > server.keepalive-time 20 > server.keepalive-interval 2 > server.keepalive-count 9 > transport.listen-backlog 1024 > ssl.own-cert???????????????????????????? (null) (DEFAULT) > ssl.private-key????????????????????????? (null) (DEFAULT) > ssl.ca-list????????????????????????????? (null) (DEFAULT) > ssl.crl-path???????????????????????????? (null) (DEFAULT) > ssl.certificate-depth??????????????????? (null) (DEFAULT) > ssl.cipher-list????????????????????????? (null) (DEFAULT) > ssl.dh-param???????????????????????????? (null) (DEFAULT) > ssl.ec-curve???????????????????????????? 
(null) (DEFAULT) > transport.address-family inet > performance.write-behind off > performance.read-ahead on > performance.readdir-ahead on > performance.io-cache off > performance.open-behind on > performance.quick-read on > performance.nl-cache on > performance.stat-prefetch on > performance.client-io-threads off > performance.nfs.write-behind on > performance.nfs.read-ahead off > performance.nfs.io-cache off > performance.nfs.quick-read off > performance.nfs.stat-prefetch off > performance.nfs.io-threads off > performance.force-readdirp?????????????? true (DEFAULT) > performance.cache-invalidation on > performance.global-cache-invalidation??? true (DEFAULT) > features.uss off > features.snapshot-directory .snaps > features.show-snapshot-directory off > features.tag-namespaces off > network.compression off > network.compression.window-size????????? -15 (DEFAULT) > network.compression.mem-level??????????? 8 (DEFAULT) > network.compression.min-size???????????? 0 (DEFAULT) > network.compression.compression-level??? -1 (DEFAULT) > network.compression.debug??????????????? false (DEFAULT) > features.default-soft-limit????????????? 80% (DEFAULT) > features.soft-timeout??????????????????? 60 (DEFAULT) > features.hard-timeout??????????????????? 5 (DEFAULT) > features.alert-time????????????????????? 86400 (DEFAULT) > features.quota-deem-statfs off > geo-replication.indexing off > geo-replication.indexing off > geo-replication.ignore-pid-check off > geo-replication.ignore-pid-check off > features.quota off > features.inode-quota off > features.bitrot disable > debug.trace off > debug.log-history??????????????????????? no (DEFAULT) > debug.log-file?????????????????????????? no (DEFAULT) > debug.exclude-ops??????????????????????? (null) (DEFAULT) > debug.include-ops??????????????????????? (null) (DEFAULT) > debug.error-gen off > debug.error-failure????????????????????? (null) (DEFAULT) > debug.error-number?????????????????????? (null) (DEFAULT) > debug.random-failure???????????????????? off (DEFAULT) > debug.error-fops???????????????????????? (null) (DEFAULT) > nfs.disable on > features.read-only?????????????????????? off (DEFAULT) > features.worm off > features.worm-file-level off > features.worm-files-deletable on > features.default-retention-period??????? 120 (DEFAULT) > features.retention-mode????????????????? relax (DEFAULT) > features.auto-commit-period????????????? 180 (DEFAULT) > storage.linux-aio??????????????????????? off (DEFAULT) > storage.linux-io_uring?????????????????? off (DEFAULT) > storage.batch-fsync-mode???????????????? reverse-fsync (DEFAULT) > storage.batch-fsync-delay-usec?????????? 0 (DEFAULT) > storage.owner-uid??????????????????????? -1 (DEFAULT) > storage.owner-gid??????????????????????? -1 (DEFAULT) > storage.node-uuid-pathinfo?????????????? off (DEFAULT) > storage.health-check-interval??????????? 30 (DEFAULT) > storage.build-pgfid????????????????????? off (DEFAULT) > storage.gfid2path??????????????????????? on (DEFAULT) > storage.gfid2path-separator????????????? : (DEFAULT) > storage.reserve????????????????????????? 1 (DEFAULT) > storage.health-check-timeout???????????? 20 (DEFAULT) > storage.fips-mode-rchecksum on > storage.force-create-mode??????????????? 0000 (DEFAULT) > storage.force-directory-mode???????????? 0000 (DEFAULT) > storage.create-mask????????????????????? 0777 (DEFAULT) > storage.create-directory-mask??????????? 0777 (DEFAULT) > storage.max-hardlinks??????????????????? 100 (DEFAULT) > features.ctime?????????????????????????? 
on (DEFAULT) > config.gfproxyd off > cluster.server-quorum-type server > cluster.server-quorum-ratio 51 > changelog.changelog????????????????????? off (DEFAULT) > changelog.changelog-dir????????????????? {{ brick.path > }}/.glusterfs/changelogs (DEFAULT) > changelog.encoding?????????????????????? ascii (DEFAULT) > changelog.rollover-time????????????????? 15 (DEFAULT) > changelog.fsync-interval???????????????? 5 (DEFAULT) > changelog.changelog-barrier-timeout 120 > changelog.capture-del-path?????????????? off (DEFAULT) > features.barrier disable > features.barrier-timeout 120 > features.trash?????????????????????????? off (DEFAULT) > features.trash-dir?????????????????????? .trashcan (DEFAULT) > features.trash-eliminate-path??????????? (null) (DEFAULT) > features.trash-max-filesize????????????? 5MB (DEFAULT) > features.trash-internal-op?????????????? off (DEFAULT) > cluster.enable-shared-storage disable > locks.trace????????????????????????????? off (DEFAULT) > locks.mandatory-locking????????????????? off (DEFAULT) > cluster.disperse-self-heal-daemon??????? enable (DEFAULT) > cluster.quorum-reads???????????????????? no (DEFAULT) > client.bind-insecure???????????????????? (null) (DEFAULT) > features.timeout???????????????????????? 45 (DEFAULT) > features.failover-hosts????????????????? (null) (DEFAULT) > features.shard off > features.shard-block-size??????????????? 64MB (DEFAULT) > features.shard-lru-limit???????????????? 16384 (DEFAULT) > features.shard-deletion-rate???????????? 100 (DEFAULT) > features.scrub-throttle lazy > features.scrub-freq biweekly > features.scrub?????????????????????????? false (DEFAULT) > features.expiry-time 120 > features.signer-threads 4 > features.cache-invalidation on > features.cache-invalidation-timeout 600 > ganesha.enable off > features.leases off > features.lease-lock-recall-timeout?????? 60 (DEFAULT) > disperse.background-heals??????????????? 8 (DEFAULT) > disperse.heal-wait-qlength?????????????? 128 (DEFAULT) > cluster.heal-timeout???????????????????? 600 (DEFAULT) > dht.force-readdirp?????????????????????? on (DEFAULT) > disperse.read-policy???????????????????? gfid-hash (DEFAULT) > cluster.shd-max-threads 4 > cluster.shd-wait-qlength???????????????? 1024 (DEFAULT) > cluster.locking-scheme?????????????????? full (DEFAULT) > cluster.granular-entry-heal????????????? no (DEFAULT) > features.locks-revocation-secs?????????? 0 (DEFAULT) > features.locks-revocation-clear-all????? false (DEFAULT) > features.locks-revocation-max-blocked??? 0 (DEFAULT) > features.locks-monkey-unlocking????????? false (DEFAULT) > features.locks-notify-contention???????? yes (DEFAULT) > features.locks-notify-contention-delay?? 5 (DEFAULT) > disperse.shd-max-threads???????????????? 1 (DEFAULT) > disperse.shd-wait-qlength 4096 > disperse.cpu-extensions????????????????? auto (DEFAULT) > disperse.self-heal-window-size?????????? 32 (DEFAULT) > cluster.use-compound-fops off > performance.parallel-readdir on > performance.rda-request-size 131072 > performance.rda-low-wmark??????????????? 4096 (DEFAULT) > performance.rda-high-wmark?????????????? 128KB (DEFAULT) > performance.rda-cache-limit 10MB > performance.nl-cache-positive-entry????? false (DEFAULT) > performance.nl-cache-limit 10MB > performance.nl-cache-timeout 600 > cluster.brick-multiplex disable > cluster.brick-graceful-cleanup disable > glusterd.vol_count_per_thread 100 > cluster.max-bricks-per-process 250 > disperse.optimistic-change-log?????????? on (DEFAULT) > disperse.stripe-cache??????????????????? 
4 (DEFAULT)
> cluster.halo-enabled False (DEFAULT)
> cluster.halo-shd-max-latency 99999 (DEFAULT)
> cluster.halo-nfsd-max-latency 5 (DEFAULT)
> cluster.halo-max-latency 5 (DEFAULT)
> cluster.halo-max-replicas 99999 (DEFAULT)
> cluster.halo-min-replicas 2 (DEFAULT)
> features.selinux on
> cluster.daemon-log-level INFO
> debug.delay-gen off
> delay-gen.delay-percentage 10% (DEFAULT)
> delay-gen.delay-duration 100000 (DEFAULT)
> delay-gen.enable (DEFAULT)
> disperse.parallel-writes on (DEFAULT)
> disperse.quorum-count 0 (DEFAULT)
> features.sdfs off
> features.cloudsync off
> features.ctime on
> ctime.noatime on
> features.cloudsync-storetype (null) (DEFAULT)
> features.enforce-mandatory-lock off
> config.global-threading off
> config.client-threads 16
> config.brick-threads 16
> features.cloudsync-remote-read off
> features.cloudsync-store-id (null) (DEFAULT)
> features.cloudsync-product-id (null) (DEFAULT)
> features.acl enable
> cluster.use-anonymous-inode yes
> rebalance.ensure-durability on (DEFAULT)

Again, sorry for the long post. We would be happy to have this solved, as we are excited about using glusterfs and we would like to go back to having a stable configuration.

We always appreciate the spirit of collaboration and reciprocal help on this list.

Best
Ilias

--
forumZFD
Entschieden für Frieden | Committed to Peace

Ilias Chasapakis
Referent IT | IT Consultant

Forum Ziviler Friedensdienst e.V. | Forum Civil Peace Service
Am Kölner Brett 8 | 50825 Köln | Germany

Tel 0221 91273243 | Fax 0221 91273299 | http://www.forumZFD.de

Vorstand nach § 26 BGB, einzelvertretungsberechtigt|Executive Board:
Alexander Mauz, Sonja Wiekenberg-Mlalandle, Jens von Bargen
VR 17651 Amtsgericht Köln

Spenden|Donations: IBAN DE90 4306 0967 4103 7264 00 BIC GENODEM1GLS
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature.asc
Type: application/pgp-signature
Size: 665 bytes
Desc: OpenPGP digital signature
URL: 

From budic at onholyground.com Tue Apr 9 16:26:19 2024
From: budic at onholyground.com (Darrell Budic)
Date: Tue, 9 Apr 2024 11:26:19 -0500
Subject: [Gluster-users] Glusterfs 10.5-1 healing issues
In-Reply-To: 
References: 
Message-ID: 

The big one I see for you is to investigate and enable sharding. It can improve performance and makes it much easier to heal VM style workloads. Be aware that once you turn it on, you can't go back easily, and you need to copy the VM disk images around to get them to be sharded before it will show any real effect. A couple other recommendations from my main volume (three dedicated host servers with HDDs and SSD/NVMe caching and log volumes on ZFS). The cluster.shd-* entries are especially recommended. This is on gluster 9.4 at the moment, so some of these won't map exactly.
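Enabling sharding as recommended here usually comes down to two volume options; a minimal sketch, with <volname> standing in for the actual volume name:

    gluster volume set <volname> features.shard on
    gluster volume set <volname> features.shard-block-size 64MB

Only files written after the option is enabled get sharded, which is why existing VM disk images have to be copied (or otherwise rewritten) on the volume before the change shows any real effect, as noted above.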
Volume Name: gv1 Type: Replicate Number of Bricks: 1 x 3 = 3 Transport-type: tcp Options Reconfigured: cluster.read-hash-mode: 3 performance.client-io-threads: on performance.write-behind-window-size: 64MB performance.cache-size: 1G nfs.disable: on performance.readdir-ahead: on performance.quick-read: off performance.read-ahead: on performance.io-cache: off performance.stat-prefetch: on cluster.eager-lock: enable network.remote-dio: enable server.event-threads: 4 client.event-threads: 8 performance.io-thread-count: 64 performance.low-prio-threads: 32 features.shard: on features.shard-block-size: 64MB cluster.locking-scheme: granular cluster.data-self-heal-algorithm: full cluster.shd-max-threads: 8 cluster.shd-wait-qlength: 10240 cluster.choose-local: false cluster.granular-entry-heal: enable Otherwise, more details about your servers, CPU, RAM, and Disks would be useful for suggestions, and details of your network as well. And if you haven?t done kernel level tuning on the servers, you should address that as well. These all vary a lot by your work load and hardware setup, so there aren?t many generic recommendations I can give other than to make sure you tuned your tcp stack and enabled the none disk elevator on SSDs or disks used by ZFS. There?s a lot of tuning suggesting in the archives if you go searching as well. -Darrell > On Apr 9, 2024, at 3:05?AM, Ilias Chasapakis forumZFD wrote: > > Dear all, > > we would like to describe the situation that we have and that does not solve since a long time, that means after many minor > and major upgrades of GlusterFS > > We use a KVM environment for VMs for glusterfs and host servers are updated regularly. Hosts are disomogeneous hardware, > but configured with same characteristics. > > The VMs have been also harmonized to use the virtio drivers where available for devices and resources reserved are the same > on each host. > > Physical switch for hosts has been substituted with a reliable one. > > Probing peers has been and is quite quick in the heartbeat network and communication between the servers for apparently has no issues on disruptions. > > And I say apparently because what we have is: > > - always pending failed heals that used to resolve by a rotated reboot of the gluster vms (replica 3). Restarting only > glusterfs related services (daemon, events etc.) has no effect, only reboot brings results > - very often failed heals are directories > > We lately removed a brick that was on a vm on a host that has been entirely substituted. Re-added the brick, sync went on and > all data was eventually synced and started with 0 pending failed heals. Now it develops failed heals too like its fellow > bricks. Please take into account we healed all the failed entries (manually with various methods) before adding the third brick. > > After some days of operating, the count of failed heals rises again, not really fast but with new entries for sure (which might solve > with rotated reboots, or not). > > We have gluster clients also on ctdbs that connect to the gluster and mount via glusterfs client. Windows roaming profiles shared via smb become frequently corrupted,(they are composed of a great number small files and are though of big total dimension). Gluster nodes are formatted with xfs. > > Also what we observer is that mounting with the vfs option in smb on the ctdbs has some kind of delay. 
This means that you can see the shared folder on for example > a Windows client machine on a ctdb, but not on another ctdb in the cluster and then after a while it appears there too. And this frequently st > > > This is an excerpt of entries on our shd logs: > >> 2024-04-08 10:13:26.213596 +0000] I [MSGID: 108026] [afr-self-heal-entry.c:1080:afr_selfheal_entry_do] 0-gv-ho-replicate-0: performing full entry selfheal on 2c621415-6223-4b66-a4ca-3f6f267a448d >> [2024-04-08 10:14:08.135911 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2457:client4_0_link_cbk] 0-gv-ho-client-5: remote operation failed. [{source=}, {target=(null)}, {errno=116}, {error=Veraltete Dateizugriffsnummer (file handle)}] >> [2024-04-08 10:15:59.135908 +0000] W [MSGID: 114061] [client-common.c:2992:client_pre_readdir_v2] 0-gv-ho-client-5: remote_fd is -1. EBADFD [{gfid=6b5e599e-c836-4ebe-b16a-8224425b88c7}, {errno=77}, {error=Die Dateizugriffsnummer ist in schlechter Verfassung}] >> [2024-04-08 10:30:25.013592 +0000] I [MSGID: 108026] [afr-self-heal-entry.c:1080:afr_selfheal_entry_do] 0-gv-ho-replicate-0: performing full entry selfheal on 24e82e12-5512-4679-9eb3-8bd098367db7 >> [2024-04-08 10:33:17.613594 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2457:client4_0_link_cbk] 0-gv-ho-client-5: remote operation failed. [{source=}, {target=(null)}, {errno=116}, {error=Veraltete Dateizugriffsnummer (file handle)}] >> [2024-04-08 10:33:21.201359 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2457:client4_0_link_cbk] 0-gv-ho-client-5: remote operation failed. [{source= > > How are he clients mapped to real hosts in order to know on which one?s logs to look at? > > We would like to go by exclusion to finally eradicate this, possibly in a conservative way (not rebuilding everything) and we > > are becoming clueless as to where to look at as we also tried various options settings regarding performance etc. 
> Here is the set on our main volume:
>
>> [...]
>
> Again, sorry for the long post. We would be happy to have this solved, as we are excited about using glusterfs and we would like to go back to having a stable configuration.
>
> We always appreciate the spirit of collaboration and reciprocal help on this list.
>
> Best
> Ilias
>
> --
> forumZFD
> Entschieden für Frieden | Committed to Peace
>
> Ilias Chasapakis
> Referent IT | IT Consultant
>
> Forum Ziviler Friedensdienst e.V. | Forum Civil Peace Service
> Am Kölner Brett 8 | 50825 Köln | Germany
>
> Tel 0221 91273243 | Fax 0221 91273299 | http://www.forumZFD.de
>
> Vorstand nach § 26 BGB, einzelvertretungsberechtigt|Executive Board:
> Alexander Mauz, Sonja Wiekenberg-Mlalandle, Jens von Bargen
> VR 17651 Amtsgericht Köln
>
> Spenden|Donations: IBAN DE90 4306 0967 4103 7264 00 BIC GENODEM1GLS
>
> ________
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://meet.google.com/cpu-eiue-hvk
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From chasapakis at forumZFD.de Wed Apr 10 15:07:26 2024
From: chasapakis at forumZFD.de (Ilias Chasapakis forumZFD)
Date: Wed, 10 Apr 2024 17:07:26 +0200
Subject: [Gluster-users] Glusterfs 10.5-1 healing issues
In-Reply-To: 
References: 
Message-ID: <8d37f1ed-87ba-43af-b491-b37c642333e6@forumZFD.de>

Dear Darrell, Dear ...,

Many thanks for the prompt reply. Here is some of the additional information requested (please feel free to ask for more if needed):

> CPU info:
> Hosts
> 1. Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (20 cores) hw RAID 1 (adaptec)
> 2. Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (20 cores) hw RAID 1 (adaptec)
> 3. Intel(R) Xeon(R) Silver 4112 CPU @ 2.60GHz (8 cores) hw RAID 1 (adaptec)
>
> GlusterFS VMs
> 1. 4 cores, 10 GB RAM, CPU model Broadwell
> 2. 4 cores, 10 GB RAM, CPU model Broadwell
> 3. 4 cores, 10 GB RAM, CPU model: host passthrough value
>
> Network info
> Physical connection between gluster nodes in a heartbeat network that comprises a new Cisco switch.
> TCP connections with 1Gbit links
> Virtual default connectivity with virtio drivers for the NICs and Macvtap connection to use the host's connectivity.
> No errors or lost packets are recorded between the VMs or the hosts.
> Quick iperf tests (between glusters, and between ctdbs and glusters) show no evident issues.
>
> Workload:
> An instantaneous measurement from this moment, which can be considered a peak time, shows around 450 files open on the volume.
> In terms of "literal" load we notice CPU peaks mostly related to the shd process

All disks use virtIO drivers (virtIO disks). The file system on all nodes is XFS (not ZFS).

Other than the clients on the gluster nodes themselves, there are clients on ctdbs that mount the gluster volume and then expose it via smb to Windows clients (user profiles included, for roaming profiles). The ctdbs reach the glusters through the heartbeat network.

We are considering moving the glusters to a network with existing DNS capabilities in order to create a round-robin configuration, by assigning the hosts' IPs to a single hostname and then using that hostname in the mount configuration of the ctdbs. The reasoning/hope behind this is that we would minimize access time and sync issues.

Thank you for the information about "sharding"; we will take this into account and consider the pros and cons in the current situation, especially because turning back is not easy afterwards. Also, our main problem is mainly not with big files, but with a large quantity of small files.

We could gladly make use of some of the options you suggested after we assess the situation again. We welcome any further suggestions in the meantime.

Ilias

On 09.04.24 at 18:26, Darrell Budic wrote:
> The big one I see for you is to investigate and enable sharding.
It can > improve performance and makes it much easier to heal VM style > workloads. Be aware that once you turn it on, you can?t go back > easily, and you need to copy the VM disk images around to get them to > be sharded before it will show any real effect. A couple other > recommendations from my main volume (three dedicated host servers with > HDDs and SDD/NVM caching and log volumes on ZFS ). The cluster.shd-* > entries are especially recommended. This is on gluster 9.4 at the > moment, so some of these won?t map exactly. > > Volume Name: gv1 > > Type: Replicate > > Number of Bricks: 1 x 3 = 3 > > Transport-type: tcp > > Options Reconfigured: > > cluster.read-hash-mode: 3 > > performance.client-io-threads: on > > performance.write-behind-window-size: 64MB > > performance.cache-size: 1G > > nfs.disable: on > > performance.readdir-ahead: on > > performance.quick-read: off > > performance.read-ahead: on > > performance.io-cache: off > > performance.stat-prefetch: on > > cluster.eager-lock: enable > > network.remote-dio: enable > > server.event-threads: 4 > > client.event-threads: 8 > > performance.io-thread-count: 64 > > performance.low-prio-threads: 32 > > features.shard: on > > features.shard-block-size: 64MB > > cluster.locking-scheme: granular > > cluster.data-self-heal-algorithm: full > > cluster.shd-max-threads: 8 > > cluster.shd-wait-qlength: 10240 > > cluster.choose-local: false > > cluster.granular-entry-heal: enable > > > Otherwise, more details about your servers, CPU, RAM, and Disks would > be useful for suggestions, and details of your network as well. And if > you haven?t done kernel level tuning on the servers, you should > address that as well. These all vary a lot by your work load and > hardware setup, so there aren?t many generic recommendations I can > give other than to make sure you tuned your tcp stack and enabled the > none disk elevator on SSDs or disks used by ZFS. > > There?s a lot of tuning suggesting in the archives if you go searching > as well. > > ? -Darrell > > >> On Apr 9, 2024, at 3:05?AM, Ilias Chasapakis forumZFD >> wrote: >> >> Dear all, >> >> we would like to describe the situation that we have and that does >> not solve since a long time, that means after many minor >> and major upgrades of GlusterFS >> >> We use a KVM environment for VMs for glusterfs and host servers are >> updated regularly. Hosts are disomogeneous hardware, >> but configured with same characteristics. >> >> The VMs have been also harmonized to use the virtio drivers where >> available for devices and resources reserved are the same >> on each host. >> >> Physical switch for hosts has been substituted with a reliable one. >> >> Probing peers has been and is quite quick in the heartbeat network >> and communication between the servers for apparently has no issues on >> disruptions. >> >> And I say apparently because what we have is: >> >> - always pending failed heals that used to resolve by a rotated >> reboot of the gluster vms (replica 3). Restarting only >> glusterfs related services (daemon, events etc.) has no effect, only >> reboot brings results >> - very often failed heals are directories >> >> We lately removed a brick that was on a vm on a host that has been >> entirely substituted. Re-added the brick, sync went on and >> all data was eventually synced and started with 0 pending failed >> heals. Now it develops failed heals too like its fellow >> bricks. 
Please take into account we healed all the failed entries >> (manually with various methods) before adding the third brick. >> >> After some days of operating, the count of failed heals rises again, >> not really fast but with new entries for sure (which might solve >> with rotated reboots, or not). >> >> We have gluster clients also on ctdbs that connect to the gluster and >> mount via glusterfs client. Windows roaming profiles shared via smb >> become frequently corrupted,(they are composed of a great number >> small files and are though of big total dimension). Gluster nodes are >> formatted with xfs. >> >> Also what we observer is that mounting with the vfs option in smb on >> the ctdbs has some kind of delay. This means that you can see the >> shared folder on for example >> a Windows client machine on a ctdb, but not on another ctdb in the >> cluster and then after a while it appears there too. And this >> frequently st >> >> >> This is an excerpt of entries on our shd logs: >> >>> 2024-04-08 10:13:26.213596 +0000] I [MSGID: 108026] >>> [afr-self-heal-entry.c:1080:afr_selfheal_entry_do] >>> 0-gv-ho-replicate-0: performing full entry selfheal on >>> 2c621415-6223-4b66-a4ca-3f6f267a448d >>> [2024-04-08 10:14:08.135911 +0000] W [MSGID: 114031] >>> [client-rpc-fops_v2.c:2457:client4_0_link_cbk] 0-gv-ho-client-5: >>> remote operation failed. >>> [{source=}, >>> {target=(null)}, {errno=116}, {error=Veraltete Dateizugriffsnummer >>> (file handle)}] >>> [2024-04-08 10:15:59.135908 +0000] W [MSGID: 114061] >>> [client-common.c:2992:client_pre_readdir_v2] 0-gv-ho-client-5: >>> remote_fd is -1. EBADFD >>> [{gfid=6b5e599e-c836-4ebe-b16a-8224425b88c7}, {errno=77}, {error=Die >>> Dateizugriffsnummer ist in schlechter Verfassung}] >>> [2024-04-08 10:30:25.013592 +0000] I [MSGID: 108026] >>> [afr-self-heal-entry.c:1080:afr_selfheal_entry_do] >>> 0-gv-ho-replicate-0: performing full entry selfheal on >>> 24e82e12-5512-4679-9eb3-8bd098367db7 >>> [2024-04-08 10:33:17.613594 +0000] W [MSGID: 114031] >>> [client-rpc-fops_v2.c:2457:client4_0_link_cbk] 0-gv-ho-client-5: >>> remote operation failed. >>> [{source=}, >>> {target=(null)}, {errno=116}, {error=Veraltete Dateizugriffsnummer >>> (file handle)}] >>> [2024-04-08 10:33:21.201359 +0000] W [MSGID: 114031] >>> [client-rpc-fops_v2.c:2457:client4_0_link_cbk] 0-gv-ho-client-5: >>> remote operation failed. [{source= >> >> How are he clients mapped to real hosts in order to know on which >> one?s logs to look at? >> >> We would like to go by exclusion to finally eradicate this, possibly >> in a conservative way (not rebuilding everything) and we >> >> are becoming clueless as to where to look at as we also tried various >> options settings regarding performance etc. >> >> Here is the set on our main volume: >> >>> cluster.lookup-unhashed on (DEFAULT) >>> cluster.lookup-optimize????????????????? on (DEFAULT) >>> cluster.min-free-disk??????????????????? 10% (DEFAULT) >>> cluster.min-free-inodes????????????????? 5% (DEFAULT) >>> cluster.rebalance-stats????????????????? off (DEFAULT) >>> cluster.subvols-per-directory??????????? (null) (DEFAULT) >>> cluster.readdir-optimize???????????????? off (DEFAULT) >>> cluster.rsync-hash-regex???????????????? (null) (DEFAULT) >>> cluster.extra-hash-regex???????????????? (null) (DEFAULT) >>> cluster.dht-xattr-name trusted.glusterfs.dht (DEFAULT) >>> cluster.randomize-hash-range-by-gfid???? off (DEFAULT) >>> cluster.rebal-throttle?????????????????? 
normal (DEFAULT) >>> cluster.lock-migration off >>> cluster.force-migration off >>> cluster.local-volume-name??????????????? (null) (DEFAULT) >>> cluster.weighted-rebalance?????????????? on (DEFAULT) >>> cluster.switch-pattern?????????????????? (null) (DEFAULT) >>> cluster.entry-change-log???????????????? on (DEFAULT) >>> cluster.read-subvolume?????????????????? (null) (DEFAULT) >>> cluster.read-subvolume-index???????????? -1 (DEFAULT) >>> cluster.read-hash-mode?????????????????? 1 (DEFAULT) >>> cluster.background-self-heal-count?????? 8 (DEFAULT) >>> cluster.metadata-self-heal on >>> cluster.data-self-heal on >>> cluster.entry-self-heal on >>> cluster.self-heal-daemon enable >>> cluster.heal-timeout???????????????????? 600 (DEFAULT) >>> cluster.self-heal-window-size??????????? 8 (DEFAULT) >>> cluster.data-change-log????????????????? on (DEFAULT) >>> cluster.metadata-change-log????????????? on (DEFAULT) >>> cluster.data-self-heal-algorithm???????? (null) (DEFAULT) >>> cluster.eager-lock?????????????????????? on (DEFAULT) >>> disperse.eager-lock????????????????????? on (DEFAULT) >>> disperse.other-eager-lock??????????????? on (DEFAULT) >>> disperse.eager-lock-timeout????????????? 1 (DEFAULT) >>> disperse.other-eager-lock-timeout??????? 1 (DEFAULT) >>> cluster.quorum-type auto >>> cluster.quorum-count 2 >>> cluster.choose-local???????????????????? true (DEFAULT) >>> cluster.self-heal-readdir-size?????????? 1KB (DEFAULT) >>> cluster.post-op-delay-secs?????????????? 1 (DEFAULT) >>> cluster.ensure-durability??????????????? on (DEFAULT) >>> cluster.consistent-metadata????????????? no (DEFAULT) >>> cluster.heal-wait-queue-length?????????? 128 (DEFAULT) >>> cluster.favorite-child-policy none >>> cluster.full-lock??????????????????????? yes (DEFAULT) >>> cluster.optimistic-change-log??????????? on (DEFAULT) >>> diagnostics.latency-measurement off >>> diagnostics.dump-fd-stats??????????????? off (DEFAULT) >>> diagnostics.count-fop-hits off >>> diagnostics.brick-log-level INFO >>> diagnostics.client-log-level INFO >>> diagnostics.brick-sys-log-level????????? CRITICAL (DEFAULT) >>> diagnostics.client-sys-log-level???????? CRITICAL (DEFAULT) >>> diagnostics.brick-logger???????????????? (null) (DEFAULT) >>> diagnostics.client-logger??????????????? (null) (DEFAULT) >>> diagnostics.brick-log-format???????????? (null) (DEFAULT) >>> diagnostics.client-log-format??????????? (null) (DEFAULT) >>> diagnostics.brick-log-buf-size?????????? 5 (DEFAULT) >>> diagnostics.client-log-buf-size????????? 5 (DEFAULT) >>> diagnostics.brick-log-flush-timeout????? 120 (DEFAULT) >>> diagnostics.client-log-flush-timeout???? 120 (DEFAULT) >>> diagnostics.stats-dump-interval????????? 0 (DEFAULT) >>> diagnostics.fop-sample-interval????????? 0 (DEFAULT) >>> diagnostics.stats-dump-format??????????? json (DEFAULT) >>> diagnostics.fop-sample-buf-size????????? 65535 (DEFAULT) >>> diagnostics.stats-dnscache-ttl-sec?????? 86400 (DEFAULT) >>> performance.cache-max-file-size 10 >>> performance.cache-min-file-size????????? 0 (DEFAULT) >>> performance.cache-refresh-timeout??????? 1 (DEFAULT) >>> performance.cache-priority (DEFAULT) >>> performance.io-cache-size??????????????? 32MB (DEFAULT) >>> performance.cache-size?????????????????? 32MB (DEFAULT) >>> performance.io-thread-count????????????? 16 (DEFAULT) >>> performance.high-prio-threads??????????? 16 (DEFAULT) >>> performance.normal-prio-threads????????? 16 (DEFAULT) >>> performance.low-prio-threads???????????? 16 (DEFAULT) >>> performance.least-prio-threads?????????? 
1 (DEFAULT) >>> performance.enable-least-priority??????? on (DEFAULT) >>> performance.iot-watchdog-secs??????????? (null) (DEFAULT) >>> performance.iot-cleanup-disconnected-reqs off (DEFAULT) >>> performance.iot-pass-through???????????? false (DEFAULT) >>> performance.io-cache-pass-through??????? false (DEFAULT) >>> performance.quick-read-cache-size??????? 128MB (DEFAULT) >>> performance.cache-size?????????????????? 128MB (DEFAULT) >>> performance.quick-read-cache-timeout???? 1 (DEFAULT) >>> performance.qr-cache-timeout 600 >>> performance.quick-read-cache-invalidation false (DEFAULT) >>> performance.ctime-invalidation?????????? false (DEFAULT) >>> performance.flush-behind???????????????? on (DEFAULT) >>> performance.nfs.flush-behind???????????? on (DEFAULT) >>> performance.write-behind-window-size 4MB >>> performance.resync-failed-syncs-after-fsync off (DEFAULT) >>> performance.nfs.write-behind-window-size 1MB (DEFAULT) >>> performance.strict-o-direct????????????? off (DEFAULT) >>> performance.nfs.strict-o-direct????????? off (DEFAULT) >>> performance.strict-write-ordering??????? off (DEFAULT) >>> performance.nfs.strict-write-ordering??? off (DEFAULT) >>> performance.write-behind-trickling-writes on (DEFAULT) >>> performance.aggregate-size?????????????? 128KB (DEFAULT) >>> performance.nfs.write-behind-trickling-writes on (DEFAULT) >>> performance.lazy-open??????????????????? yes (DEFAULT) >>> performance.read-after-open????????????? yes (DEFAULT) >>> performance.open-behind-pass-through???? false (DEFAULT) >>> performance.read-ahead-page-count??????? 4 (DEFAULT) >>> performance.read-ahead-pass-through????? false (DEFAULT) >>> performance.readdir-ahead-pass-through?? false (DEFAULT) >>> performance.md-cache-pass-through??????? false (DEFAULT) >>> performance.write-behind-pass-through??? false (DEFAULT) >>> performance.md-cache-timeout 600 >>> performance.cache-swift-metadata???????? false (DEFAULT) >>> performance.cache-samba-metadata on >>> performance.cache-capability-xattrs????? true (DEFAULT) >>> performance.cache-ima-xattrs???????????? true (DEFAULT) >>> performance.md-cache-statfs????????????? off (DEFAULT) >>> performance.xattr-cache-list (DEFAULT) >>> performance.nl-cache-pass-through??????? false (DEFAULT) >>> network.frame-timeout??????????????????? 1800 (DEFAULT) >>> network.ping-timeout 20 >>> network.tcp-window-size????????????????? (null) (DEFAULT) >>> client.ssl off >>> network.remote-dio?????????????????????? disable (DEFAULT) >>> client.event-threads 4 >>> client.tcp-user-timeout 0 >>> client.keepalive-time 20 >>> client.keepalive-interval 2 >>> client.keepalive-count 9 >>> client.strict-locks off >>> network.tcp-window-size????????????????? (null) (DEFAULT) >>> network.inode-lru-limit 200000 >>> auth.allow * >>> auth.reject????????????????????????????? (null) (DEFAULT) >>> transport.keepalive 1 >>> server.allow-insecure??????????????????? on (DEFAULT) >>> server.root-squash?????????????????????? off (DEFAULT) >>> server.all-squash??????????????????????? off (DEFAULT) >>> server.anonuid?????????????????????????? 65534 (DEFAULT) >>> server.anongid?????????????????????????? 65534 (DEFAULT) >>> server.statedump-path /var/run/gluster (DEFAULT) >>> server.outstanding-rpc-limit???????????? 64 (DEFAULT) >>> server.ssl off >>> auth.ssl-allow * >>> server.manage-gids?????????????????????? off (DEFAULT) >>> server.dynamic-auth????????????????????? on (DEFAULT) >>> client.send-gids???????????????????????? on (DEFAULT) >>> server.gid-timeout?????????????????????? 
300 (DEFAULT) >>> server.own-thread??????????????????????? (null) (DEFAULT) >>> server.event-threads 4 >>> server.tcp-user-timeout????????????????? 42 (DEFAULT) >>> server.keepalive-time 20 >>> server.keepalive-interval 2 >>> server.keepalive-count 9 >>> transport.listen-backlog 1024 >>> ssl.own-cert???????????????????????????? (null) (DEFAULT) >>> ssl.private-key????????????????????????? (null) (DEFAULT) >>> ssl.ca-list????????????????????????????? (null) (DEFAULT) >>> ssl.crl-path???????????????????????????? (null) (DEFAULT) >>> ssl.certificate-depth??????????????????? (null) (DEFAULT) >>> ssl.cipher-list????????????????????????? (null) (DEFAULT) >>> ssl.dh-param???????????????????????????? (null) (DEFAULT) >>> ssl.ec-curve???????????????????????????? (null) (DEFAULT) >>> transport.address-family inet >>> performance.write-behind off >>> performance.read-ahead on >>> performance.readdir-ahead on >>> performance.io-cache off >>> performance.open-behind on >>> performance.quick-read on >>> performance.nl-cache on >>> performance.stat-prefetch on >>> performance.client-io-threads off >>> performance.nfs.write-behind on >>> performance.nfs.read-ahead off >>> performance.nfs.io-cache off >>> performance.nfs.quick-read off >>> performance.nfs.stat-prefetch off >>> performance.nfs.io-threads off >>> performance.force-readdirp?????????????? true (DEFAULT) >>> performance.cache-invalidation on >>> performance.global-cache-invalidation??? true (DEFAULT) >>> features.uss off >>> features.snapshot-directory .snaps >>> features.show-snapshot-directory off >>> features.tag-namespaces off >>> network.compression off >>> network.compression.window-size????????? -15 (DEFAULT) >>> network.compression.mem-level??????????? 8 (DEFAULT) >>> network.compression.min-size???????????? 0 (DEFAULT) >>> network.compression.compression-level??? -1 (DEFAULT) >>> network.compression.debug??????????????? false (DEFAULT) >>> features.default-soft-limit????????????? 80% (DEFAULT) >>> features.soft-timeout??????????????????? 60 (DEFAULT) >>> features.hard-timeout??????????????????? 5 (DEFAULT) >>> features.alert-time????????????????????? 86400 (DEFAULT) >>> features.quota-deem-statfs off >>> geo-replication.indexing off >>> geo-replication.indexing off >>> geo-replication.ignore-pid-check off >>> geo-replication.ignore-pid-check off >>> features.quota off >>> features.inode-quota off >>> features.bitrot disable >>> debug.trace off >>> debug.log-history??????????????????????? no (DEFAULT) >>> debug.log-file?????????????????????????? no (DEFAULT) >>> debug.exclude-ops??????????????????????? (null) (DEFAULT) >>> debug.include-ops??????????????????????? (null) (DEFAULT) >>> debug.error-gen off >>> debug.error-failure????????????????????? (null) (DEFAULT) >>> debug.error-number?????????????????????? (null) (DEFAULT) >>> debug.random-failure???????????????????? off (DEFAULT) >>> debug.error-fops???????????????????????? (null) (DEFAULT) >>> nfs.disable on >>> features.read-only?????????????????????? off (DEFAULT) >>> features.worm off >>> features.worm-file-level off >>> features.worm-files-deletable on >>> features.default-retention-period??????? 120 (DEFAULT) >>> features.retention-mode????????????????? relax (DEFAULT) >>> features.auto-commit-period????????????? 180 (DEFAULT) >>> storage.linux-aio??????????????????????? off (DEFAULT) >>> storage.linux-io_uring?????????????????? off (DEFAULT) >>> storage.batch-fsync-mode???????????????? reverse-fsync (DEFAULT) >>> storage.batch-fsync-delay-usec?????????? 
0 (DEFAULT) >>> storage.owner-uid??????????????????????? -1 (DEFAULT) >>> storage.owner-gid??????????????????????? -1 (DEFAULT) >>> storage.node-uuid-pathinfo?????????????? off (DEFAULT) >>> storage.health-check-interval??????????? 30 (DEFAULT) >>> storage.build-pgfid????????????????????? off (DEFAULT) >>> storage.gfid2path??????????????????????? on (DEFAULT) >>> storage.gfid2path-separator????????????? : (DEFAULT) >>> storage.reserve????????????????????????? 1 (DEFAULT) >>> storage.health-check-timeout???????????? 20 (DEFAULT) >>> storage.fips-mode-rchecksum on >>> storage.force-create-mode??????????????? 0000 (DEFAULT) >>> storage.force-directory-mode???????????? 0000 (DEFAULT) >>> storage.create-mask????????????????????? 0777 (DEFAULT) >>> storage.create-directory-mask??????????? 0777 (DEFAULT) >>> storage.max-hardlinks??????????????????? 100 (DEFAULT) >>> features.ctime?????????????????????????? on (DEFAULT) >>> config.gfproxyd off >>> cluster.server-quorum-type server >>> cluster.server-quorum-ratio 51 >>> changelog.changelog????????????????????? off (DEFAULT) >>> changelog.changelog-dir????????????????? {{ brick.path >>> }}/.glusterfs/changelogs (DEFAULT) >>> changelog.encoding?????????????????????? ascii (DEFAULT) >>> changelog.rollover-time????????????????? 15 (DEFAULT) >>> changelog.fsync-interval???????????????? 5 (DEFAULT) >>> changelog.changelog-barrier-timeout 120 >>> changelog.capture-del-path?????????????? off (DEFAULT) >>> features.barrier disable >>> features.barrier-timeout 120 >>> features.trash?????????????????????????? off (DEFAULT) >>> features.trash-dir?????????????????????? .trashcan (DEFAULT) >>> features.trash-eliminate-path??????????? (null) (DEFAULT) >>> features.trash-max-filesize????????????? 5MB (DEFAULT) >>> features.trash-internal-op?????????????? off (DEFAULT) >>> cluster.enable-shared-storage disable >>> locks.trace????????????????????????????? off (DEFAULT) >>> locks.mandatory-locking????????????????? off (DEFAULT) >>> cluster.disperse-self-heal-daemon??????? enable (DEFAULT) >>> cluster.quorum-reads???????????????????? no (DEFAULT) >>> client.bind-insecure???????????????????? (null) (DEFAULT) >>> features.timeout???????????????????????? 45 (DEFAULT) >>> features.failover-hosts????????????????? (null) (DEFAULT) >>> features.shard off >>> features.shard-block-size??????????????? 64MB (DEFAULT) >>> features.shard-lru-limit???????????????? 16384 (DEFAULT) >>> features.shard-deletion-rate???????????? 100 (DEFAULT) >>> features.scrub-throttle lazy >>> features.scrub-freq biweekly >>> features.scrub?????????????????????????? false (DEFAULT) >>> features.expiry-time 120 >>> features.signer-threads 4 >>> features.cache-invalidation on >>> features.cache-invalidation-timeout 600 >>> ganesha.enable off >>> features.leases off >>> features.lease-lock-recall-timeout?????? 60 (DEFAULT) >>> disperse.background-heals??????????????? 8 (DEFAULT) >>> disperse.heal-wait-qlength?????????????? 128 (DEFAULT) >>> cluster.heal-timeout???????????????????? 600 (DEFAULT) >>> dht.force-readdirp?????????????????????? on (DEFAULT) >>> disperse.read-policy???????????????????? gfid-hash (DEFAULT) >>> cluster.shd-max-threads 4 >>> cluster.shd-wait-qlength???????????????? 1024 (DEFAULT) >>> cluster.locking-scheme?????????????????? full (DEFAULT) >>> cluster.granular-entry-heal????????????? no (DEFAULT) >>> features.locks-revocation-secs?????????? 0 (DEFAULT) >>> features.locks-revocation-clear-all????? false (DEFAULT) >>> features.locks-revocation-max-blocked??? 
0 (DEFAULT) >>> features.locks-monkey-unlocking????????? false (DEFAULT) >>> features.locks-notify-contention???????? yes (DEFAULT) >>> features.locks-notify-contention-delay?? 5 (DEFAULT) >>> disperse.shd-max-threads???????????????? 1 (DEFAULT) >>> disperse.shd-wait-qlength 4096 >>> disperse.cpu-extensions????????????????? auto (DEFAULT) >>> disperse.self-heal-window-size?????????? 32 (DEFAULT) >>> cluster.use-compound-fops off >>> performance.parallel-readdir on >>> performance.rda-request-size 131072 >>> performance.rda-low-wmark??????????????? 4096 (DEFAULT) >>> performance.rda-high-wmark?????????????? 128KB (DEFAULT) >>> performance.rda-cache-limit 10MB >>> performance.nl-cache-positive-entry????? false (DEFAULT) >>> performance.nl-cache-limit 10MB >>> performance.nl-cache-timeout 600 >>> cluster.brick-multiplex disable >>> cluster.brick-graceful-cleanup disable >>> glusterd.vol_count_per_thread 100 >>> cluster.max-bricks-per-process 250 >>> disperse.optimistic-change-log?????????? on (DEFAULT) >>> disperse.stripe-cache??????????????????? 4 (DEFAULT) >>> cluster.halo-enabled???????????????????? False (DEFAULT) >>> cluster.halo-shd-max-latency???????????? 99999 (DEFAULT) >>> cluster.halo-nfsd-max-latency??????????? 5 (DEFAULT) >>> cluster.halo-max-latency???????????????? 5 (DEFAULT) >>> cluster.halo-max-replicas??????????????? 99999 (DEFAULT) >>> cluster.halo-min-replicas??????????????? 2 (DEFAULT) >>> features.selinux on >>> cluster.daemon-log-level INFO >>> debug.delay-gen off >>> delay-gen.delay-percentage?????????????? 10% (DEFAULT) >>> delay-gen.delay-duration???????????????? 100000 (DEFAULT) >>> delay-gen.enable (DEFAULT) >>> disperse.parallel-writes???????????????? on (DEFAULT) >>> disperse.quorum-count??????????????????? 0 (DEFAULT) >>> features.sdfs off >>> features.cloudsync off >>> features.ctime on >>> ctime.noatime on >>> features.cloudsync-storetype???????????? (null) (DEFAULT) >>> features.enforce-mandatory-lock off >>> config.global-threading off >>> config.client-threads 16 >>> config.brick-threads 16 >>> features.cloudsync-remote-read off >>> features.cloudsync-store-id????????????? (null) (DEFAULT) >>> features.cloudsync-product-id??????????? (null) (DEFAULT) >>> features.acl enable >>> cluster.use-anonymous-inode yes >>> rebalance.ensure-durability????????????? on (DEFAULT) >> >> Again, sorry for the long post. We would be happy to have this solved >> as we are excited using glusterfs and we would like to go back to >> having a stable configuration. >> >> We always appreciate the spirit of collaboration and reciprocal help >> on this list. >> >> Best >> Ilias >> >> -- >> ?forumZFD >> Entschieden f?r Frieden | Committed to Peace >> >> Ilias Chasapakis >> Referent IT | IT Consultant >> >> Forum Ziviler Friedensdienst e.V. | Forum Civil Peace Service >> Am K?lner Brett 8 | 50825 K?ln | Germany >> >> Tel 0221 91273243 | Fax 0221 91273299 | http://www.forumZFD.de >> >> Vorstand nach ? 
26 BGB, einzelvertretungsberechtigt|Executive Board:
>> Alexander Mauz, Sonja Wiekenberg-Mlalandle, Jens von Bargen
>> VR 17651 Amtsgericht Köln
>>
>> Spenden|Donations: IBAN DE90 4306 0967 4103 7264 00 BIC GENODEM1GLS
>>
>> ________
>>
>>
>>
>> Community Meeting Calendar:
>>
>> Schedule -
>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>> Bridge: https://meet.google.com/cpu-eiue-hvk
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>

--
forumZFD
Entschieden für Frieden | Committed to Peace

Ilias Chasapakis
Referent IT | IT Consultant

Forum Ziviler Friedensdienst e.V. | Forum Civil Peace Service
Am Kölner Brett 8 | 50825 Köln | Germany

Tel 0221 91273243 | Fax 0221 91273299 | http://www.forumZFD.de

Vorstand nach § 26 BGB, einzelvertretungsberechtigt|Executive Board:
Alexander Mauz, Sonja Wiekenberg-Mlalandle, Jens von Bargen
VR 17651 Amtsgericht Köln

Spenden|Donations: IBAN DE90 4306 0967 4103 7264 00 BIC GENODEM1GLS
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature.asc
Type: application/pgp-signature
Size: 665 bytes
Desc: OpenPGP digital signature
URL: 

From budic at onholyground.com  Wed Apr 10 16:40:58 2024
From: budic at onholyground.com (Darrell Budic)
Date: Wed, 10 Apr 2024 11:40:58 -0500
Subject: [Gluster-users] Glusterfs 10.5-1 healing issues
In-Reply-To: <8d37f1ed-87ba-43af-b491-b37c642333e6@forumZFD.de>
References: <8d37f1ed-87ba-43af-b491-b37c642333e6@forumZFD.de>
Message-ID: <38664E17-2F70-4DA0-8231-DF2D71092B29@onholyground.com>

I would strongly recommend running the glusterfs servers directly on bare metal instead of in VMs. Check out Ovirt, especially its hybrid cluster model. While it's not currently well maintained, it works fine on your class of hardware and fully supports this model of gluster on the bare metal and VMs running on the same hosts. And we may see it get some more support after the VMWare buyout, who knows?

Gluster isn't known for small file performance, but hunt through the archives for specific tuning hints. And if you're using it to host the VM image files, you're making that problem because the files shared by gluster are large. More cache and write (behind) buffers can help, and 10G or better networking would be something you want to do if you can afford it. Going to 2x1G LAGs can help a tiny bit, but you really want the lower latency from a faster physical media if you can get it.

If you are not already using tuned to set virtual-guest profiles on your VMs (and virtual-host on the hosts), I'd look into that as well. Set the disk elevator to 'none' on the VMs as well.

> On Apr 10, 2024, at 10:07 AM, Ilias Chasapakis forumZFD wrote:
>
> Dear Darrell,
>
> Dear ...,
>
> Many thanks for the prompt reply. Here some of the additional information requested (please feel free to ask for more if needed)
>
>
>> CPU info:
>> Hosts
>> 1. Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (20 cores) hw RAID 1 (adaptec)
>> 2. Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (20 cores) hw RAID 1 (adaptec)
>> 3. Intel(R) Xeon(R) Silver 4112 CPU @ 2.60GHz (8 cores) hw RAID 1 (adaptec)
>>
>> GlusterFS VMs
>> 1. 4 cores, 10 GB RAM, CPU model Broadwell
>> 2. 4 cores, 10 GB RAM, CPU model Broadwell
>> 3. 4 cores, 10 GB RAM, CPU model: host passthrough value
>>
>> Network info
>> Physical connection between gluster nodes in a heartbeat network that comprises a new Cisco switch.
>> TCP connections with 1Gbit links >> Virtual default connectivity with virtio drivers for the NICs and Macvtap connection to use the host?s connectivity. >> No errors or lost packages are recorded between the VMs or the hosts. Quick iperf tests (between glusters and ctdbs-glusters show now evident issues). >> >> Workload: >> An instantaneous from this moment which can be considered a peak time is around 450 files open on the volume. >> In terms of "litteral" load we notice cpu peaks mostly related to the shd process > > All disks use virtIO drivers (virtIO disks). > > The file system on all nodes is XFS (not ZFS) > > Other than the clients on the gluster nodes themselves there are clients on ctdbs that mount the gluster volume and then expose it via smb to Windows clients (user profiles included for roaming profiles). > ctdbs reach the glusters through the heartbeat network > > We are considering to move the glusters to a network with existing DNS capabilities in order to create a round-robin configuration by assigning hosts by IP to a single hostname to use it then for the mounts configuration of the ctdbs. > The reasoning/hope behind that we would minimize access time and sync issues. > > Thank you for the information about the "sharding" we will take this into account and consider pros and cons in the current situation, epsecially because turning back is not easy afterwards. Also our main problem is mainly not with big files, but with a large quantity of small files. > > We could gladly make use of some of the options you suggested after we assess the situation again. We welcome any further suggestion in the meantime. > > Ilias > > Am 09.04.24 um 18:26 schrieb Darrell Budic: >> The big one I see of you is to investigate and enable sharding. It can improve performance and makes it much easier to heal VM style workloads. Be aware that once you turn it on, you can?t go back easily, and you need to copy the VM disk images around to get them to be sharded before it will show any real effect. A couple other recommendations from my main volume (three dedicated host servers with HDDs and SDD/NVM caching and log volumes on ZFS ). The cluster.shd-* entries are especially recommended. This is on gluster 9.4 at the moment, so some of these won?t map exactly. >> >> Volume Name: gv1 >> Type: Replicate >> Number of Bricks: 1 x 3 = 3 >> Transport-type: tcp >> Options Reconfigured: >> cluster.read-hash-mode: 3 >> performance.client-io-threads: on >> performance.write-behind-window-size: 64MB >> performance.cache-size: 1G >> nfs.disable: on >> performance.readdir-ahead: on >> performance.quick-read: off >> performance.read-ahead: on >> performance.io-cache: off >> performance.stat-prefetch: on >> cluster.eager-lock: enable >> network.remote-dio: enable >> server.event-threads: 4 >> client.event-threads: 8 >> performance.io-thread-count: 64 >> performance.low-prio-threads: 32 >> features.shard: on >> features.shard-block-size: 64MB >> cluster.locking-scheme: granular >> cluster.data-self-heal-algorithm: full >> cluster.shd-max-threads: 8 >> cluster.shd-wait-qlength: 10240 >> cluster.choose-local: false >> cluster.granular-entry-heal: enable >> >> Otherwise, more details about your servers, CPU, RAM, and Disks would be useful for suggestions, and details of your network as well. And if you haven?t done kernel level tuning on the servers, you should address that as well. 
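To make "kernel level tuning" a bit more concrete: on Linux that usually means a small sysctl drop-in on the gluster nodes. The values below are only an illustrative sketch, not settings taken from this thread, and need to be sized to each machine's RAM and link speed:

# e.g. /etc/sysctl.d/90-gluster-net.conf (placeholder values)
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
# keep long-lived brick connections from falling back to slow start when idle
net.ipv4.tcp_slow_start_after_idle = 0

# reload all sysctl configuration
sysctl --system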
These all vary a lot by your work load and hardware setup, so there aren?t many generic recommendations I can give other than to make sure you tuned your tcp stack and enabled the none disk elevator on SSDs or disks used by ZFS. >> >> There?s a lot of tuning suggesting in the archives if you go searching as well. >> >> -Darrell >> >> >>> On Apr 9, 2024, at 3:05?AM, Ilias Chasapakis forumZFD wrote: >>> >>> Dear all, >>> >>> we would like to describe the situation that we have and that does not solve since a long time, that means after many minor >>> and major upgrades of GlusterFS >>> >>> We use a KVM environment for VMs for glusterfs and host servers are updated regularly. Hosts are disomogeneous hardware, >>> but configured with same characteristics. >>> >>> The VMs have been also harmonized to use the virtio drivers where available for devices and resources reserved are the same >>> on each host. >>> >>> Physical switch for hosts has been substituted with a reliable one. >>> >>> Probing peers has been and is quite quick in the heartbeat network and communication between the servers for apparently has no issues on disruptions. >>> >>> And I say apparently because what we have is: >>> >>> - always pending failed heals that used to resolve by a rotated reboot of the gluster vms (replica 3). Restarting only >>> glusterfs related services (daemon, events etc.) has no effect, only reboot brings results >>> - very often failed heals are directories >>> >>> We lately removed a brick that was on a vm on a host that has been entirely substituted. Re-added the brick, sync went on and >>> all data was eventually synced and started with 0 pending failed heals. Now it develops failed heals too like its fellow >>> bricks. Please take into account we healed all the failed entries (manually with various methods) before adding the third brick. >>> >>> After some days of operating, the count of failed heals rises again, not really fast but with new entries for sure (which might solve >>> with rotated reboots, or not). >>> >>> We have gluster clients also on ctdbs that connect to the gluster and mount via glusterfs client. Windows roaming profiles shared via smb become frequently corrupted,(they are composed of a great number small files and are though of big total dimension). Gluster nodes are formatted with xfs. >>> >>> Also what we observer is that mounting with the vfs option in smb on the ctdbs has some kind of delay. This means that you can see the shared folder on for example >>> a Windows client machine on a ctdb, but not on another ctdb in the cluster and then after a while it appears there too. And this frequently st >>> >>> >>> This is an excerpt of entries on our shd logs: >>> >>>> 2024-04-08 10:13:26.213596 +0000] I [MSGID: 108026] [afr-self-heal-entry.c:1080:afr_selfheal_entry_do] 0-gv-ho-replicate-0: performing full entry selfheal on 2c621415-6223-4b66-a4ca-3f6f267a448d >>>> [2024-04-08 10:14:08.135911 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2457:client4_0_link_cbk] 0-gv-ho-client-5: remote operation failed. [{source=}, {target=(null)}, {errno=116}, {error=Veraltete Dateizugriffsnummer (file handle)}] >>>> [2024-04-08 10:15:59.135908 +0000] W [MSGID: 114061] [client-common.c:2992:client_pre_readdir_v2] 0-gv-ho-client-5: remote_fd is -1. 
EBADFD [{gfid=6b5e599e-c836-4ebe-b16a-8224425b88c7}, {errno=77}, {error=Die Dateizugriffsnummer ist in schlechter Verfassung}] >>>> [2024-04-08 10:30:25.013592 +0000] I [MSGID: 108026] [afr-self-heal-entry.c:1080:afr_selfheal_entry_do] 0-gv-ho-replicate-0: performing full entry selfheal on 24e82e12-5512-4679-9eb3-8bd098367db7 >>>> [2024-04-08 10:33:17.613594 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2457:client4_0_link_cbk] 0-gv-ho-client-5: remote operation failed. [{source=}, {target=(null)}, {errno=116}, {error=Veraltete Dateizugriffsnummer (file handle)}] >>>> [2024-04-08 10:33:21.201359 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2457:client4_0_link_cbk] 0-gv-ho-client-5: remote operation failed. [{source= >>> >>> How are he clients mapped to real hosts in order to know on which one?s logs to look at? >>> >>> We would like to go by exclusion to finally eradicate this, possibly in a conservative way (not rebuilding everything) and we >>> >>> are becoming clueless as to where to look at as we also tried various options settings regarding performance etc. >>> >>> Here is the set on our main volume: >>> >>>> cluster.lookup-unhashed on (DEFAULT) >>>> cluster.lookup-optimize on (DEFAULT) >>>> cluster.min-free-disk 10% (DEFAULT) >>>> cluster.min-free-inodes 5% (DEFAULT) >>>> cluster.rebalance-stats off (DEFAULT) >>>> cluster.subvols-per-directory (null) (DEFAULT) >>>> cluster.readdir-optimize off (DEFAULT) >>>> cluster.rsync-hash-regex (null) (DEFAULT) >>>> cluster.extra-hash-regex (null) (DEFAULT) >>>> cluster.dht-xattr-name trusted.glusterfs.dht (DEFAULT) >>>> cluster.randomize-hash-range-by-gfid off (DEFAULT) >>>> cluster.rebal-throttle normal (DEFAULT) >>>> cluster.lock-migration off >>>> cluster.force-migration off >>>> cluster.local-volume-name (null) (DEFAULT) >>>> cluster.weighted-rebalance on (DEFAULT) >>>> cluster.switch-pattern (null) (DEFAULT) >>>> cluster.entry-change-log on (DEFAULT) >>>> cluster.read-subvolume (null) (DEFAULT) >>>> cluster.read-subvolume-index -1 (DEFAULT) >>>> cluster.read-hash-mode 1 (DEFAULT) >>>> cluster.background-self-heal-count 8 (DEFAULT) >>>> cluster.metadata-self-heal on >>>> cluster.data-self-heal on >>>> cluster.entry-self-heal on >>>> cluster.self-heal-daemon enable >>>> cluster.heal-timeout 600 (DEFAULT) >>>> cluster.self-heal-window-size 8 (DEFAULT) >>>> cluster.data-change-log on (DEFAULT) >>>> cluster.metadata-change-log on (DEFAULT) >>>> cluster.data-self-heal-algorithm (null) (DEFAULT) >>>> cluster.eager-lock on (DEFAULT) >>>> disperse.eager-lock on (DEFAULT) >>>> disperse.other-eager-lock on (DEFAULT) >>>> disperse.eager-lock-timeout 1 (DEFAULT) >>>> disperse.other-eager-lock-timeout 1 (DEFAULT) >>>> cluster.quorum-type auto >>>> cluster.quorum-count 2 >>>> cluster.choose-local true (DEFAULT) >>>> cluster.self-heal-readdir-size 1KB (DEFAULT) >>>> cluster.post-op-delay-secs 1 (DEFAULT) >>>> cluster.ensure-durability on (DEFAULT) >>>> cluster.consistent-metadata no (DEFAULT) >>>> cluster.heal-wait-queue-length 128 (DEFAULT) >>>> cluster.favorite-child-policy none >>>> cluster.full-lock yes (DEFAULT) >>>> cluster.optimistic-change-log on (DEFAULT) >>>> diagnostics.latency-measurement off >>>> diagnostics.dump-fd-stats off (DEFAULT) >>>> diagnostics.count-fop-hits off >>>> diagnostics.brick-log-level INFO >>>> diagnostics.client-log-level INFO >>>> diagnostics.brick-sys-log-level CRITICAL (DEFAULT) >>>> diagnostics.client-sys-log-level CRITICAL (DEFAULT) >>>> diagnostics.brick-logger (null) (DEFAULT) >>>> diagnostics.client-logger 
(null) (DEFAULT) >>>> diagnostics.brick-log-format (null) (DEFAULT) >>>> diagnostics.client-log-format (null) (DEFAULT) >>>> diagnostics.brick-log-buf-size 5 (DEFAULT) >>>> diagnostics.client-log-buf-size 5 (DEFAULT) >>>> diagnostics.brick-log-flush-timeout 120 (DEFAULT) >>>> diagnostics.client-log-flush-timeout 120 (DEFAULT) >>>> diagnostics.stats-dump-interval 0 (DEFAULT) >>>> diagnostics.fop-sample-interval 0 (DEFAULT) >>>> diagnostics.stats-dump-format json (DEFAULT) >>>> diagnostics.fop-sample-buf-size 65535 (DEFAULT) >>>> diagnostics.stats-dnscache-ttl-sec 86400 (DEFAULT) >>>> performance.cache-max-file-size 10 >>>> performance.cache-min-file-size 0 (DEFAULT) >>>> performance.cache-refresh-timeout 1 (DEFAULT) >>>> performance.cache-priority (DEFAULT) >>>> performance.io-cache-size 32MB (DEFAULT) >>>> performance.cache-size 32MB (DEFAULT) >>>> performance.io-thread-count 16 (DEFAULT) >>>> performance.high-prio-threads 16 (DEFAULT) >>>> performance.normal-prio-threads 16 (DEFAULT) >>>> performance.low-prio-threads 16 (DEFAULT) >>>> performance.least-prio-threads 1 (DEFAULT) >>>> performance.enable-least-priority on (DEFAULT) >>>> performance.iot-watchdog-secs (null) (DEFAULT) >>>> performance.iot-cleanup-disconnected-reqs off (DEFAULT) >>>> performance.iot-pass-through false (DEFAULT) >>>> performance.io-cache-pass-through false (DEFAULT) >>>> performance.quick-read-cache-size 128MB (DEFAULT) >>>> performance.cache-size 128MB (DEFAULT) >>>> performance.quick-read-cache-timeout 1 (DEFAULT) >>>> performance.qr-cache-timeout 600 >>>> performance.quick-read-cache-invalidation false (DEFAULT) >>>> performance.ctime-invalidation false (DEFAULT) >>>> performance.flush-behind on (DEFAULT) >>>> performance.nfs.flush-behind on (DEFAULT) >>>> performance.write-behind-window-size 4MB >>>> performance.resync-failed-syncs-after-fsync off (DEFAULT) >>>> performance.nfs.write-behind-window-size 1MB (DEFAULT) >>>> performance.strict-o-direct off (DEFAULT) >>>> performance.nfs.strict-o-direct off (DEFAULT) >>>> performance.strict-write-ordering off (DEFAULT) >>>> performance.nfs.strict-write-ordering off (DEFAULT) >>>> performance.write-behind-trickling-writes on (DEFAULT) >>>> performance.aggregate-size 128KB (DEFAULT) >>>> performance.nfs.write-behind-trickling-writes on (DEFAULT) >>>> performance.lazy-open yes (DEFAULT) >>>> performance.read-after-open yes (DEFAULT) >>>> performance.open-behind-pass-through false (DEFAULT) >>>> performance.read-ahead-page-count 4 (DEFAULT) >>>> performance.read-ahead-pass-through false (DEFAULT) >>>> performance.readdir-ahead-pass-through false (DEFAULT) >>>> performance.md-cache-pass-through false (DEFAULT) >>>> performance.write-behind-pass-through false (DEFAULT) >>>> performance.md-cache-timeout 600 >>>> performance.cache-swift-metadata false (DEFAULT) >>>> performance.cache-samba-metadata on >>>> performance.cache-capability-xattrs true (DEFAULT) >>>> performance.cache-ima-xattrs true (DEFAULT) >>>> performance.md-cache-statfs off (DEFAULT) >>>> performance.xattr-cache-list (DEFAULT) >>>> performance.nl-cache-pass-through false (DEFAULT) >>>> network.frame-timeout 1800 (DEFAULT) >>>> network.ping-timeout 20 >>>> network.tcp-window-size (null) (DEFAULT) >>>> client.ssl off >>>> network.remote-dio disable (DEFAULT) >>>> client.event-threads 4 >>>> client.tcp-user-timeout 0 >>>> client.keepalive-time 20 >>>> client.keepalive-interval 2 >>>> client.keepalive-count 9 >>>> client.strict-locks off >>>> network.tcp-window-size (null) (DEFAULT) >>>> 
network.inode-lru-limit 200000 >>>> auth.allow * >>>> auth.reject (null) (DEFAULT) >>>> transport.keepalive 1 >>>> server.allow-insecure on (DEFAULT) >>>> server.root-squash off (DEFAULT) >>>> server.all-squash off (DEFAULT) >>>> server.anonuid 65534 (DEFAULT) >>>> server.anongid 65534 (DEFAULT) >>>> server.statedump-path /var/run/gluster (DEFAULT) >>>> server.outstanding-rpc-limit 64 (DEFAULT) >>>> server.ssl off >>>> auth.ssl-allow * >>>> server.manage-gids off (DEFAULT) >>>> server.dynamic-auth on (DEFAULT) >>>> client.send-gids on (DEFAULT) >>>> server.gid-timeout 300 (DEFAULT) >>>> server.own-thread (null) (DEFAULT) >>>> server.event-threads 4 >>>> server.tcp-user-timeout 42 (DEFAULT) >>>> server.keepalive-time 20 >>>> server.keepalive-interval 2 >>>> server.keepalive-count 9 >>>> transport.listen-backlog 1024 >>>> ssl.own-cert (null) (DEFAULT) >>>> ssl.private-key (null) (DEFAULT) >>>> ssl.ca-list (null) (DEFAULT) >>>> ssl.crl-path (null) (DEFAULT) >>>> ssl.certificate-depth (null) (DEFAULT) >>>> ssl.cipher-list (null) (DEFAULT) >>>> ssl.dh-param (null) (DEFAULT) >>>> ssl.ec-curve (null) (DEFAULT) >>>> transport.address-family inet >>>> performance.write-behind off >>>> performance.read-ahead on >>>> performance.readdir-ahead on >>>> performance.io-cache off >>>> performance.open-behind on >>>> performance.quick-read on >>>> performance.nl-cache on >>>> performance.stat-prefetch on >>>> performance.client-io-threads off >>>> performance.nfs.write-behind on >>>> performance.nfs.read-ahead off >>>> performance.nfs.io-cache off >>>> performance.nfs.quick-read off >>>> performance.nfs.stat-prefetch off >>>> performance.nfs.io-threads off >>>> performance.force-readdirp true (DEFAULT) >>>> performance.cache-invalidation on >>>> performance.global-cache-invalidation true (DEFAULT) >>>> features.uss off >>>> features.snapshot-directory .snaps >>>> features.show-snapshot-directory off >>>> features.tag-namespaces off >>>> network.compression off >>>> network.compression.window-size -15 (DEFAULT) >>>> network.compression.mem-level 8 (DEFAULT) >>>> network.compression.min-size 0 (DEFAULT) >>>> network.compression.compression-level -1 (DEFAULT) >>>> network.compression.debug false (DEFAULT) >>>> features.default-soft-limit 80% (DEFAULT) >>>> features.soft-timeout 60 (DEFAULT) >>>> features.hard-timeout 5 (DEFAULT) >>>> features.alert-time 86400 (DEFAULT) >>>> features.quota-deem-statfs off >>>> geo-replication.indexing off >>>> geo-replication.indexing off >>>> geo-replication.ignore-pid-check off >>>> geo-replication.ignore-pid-check off >>>> features.quota off >>>> features.inode-quota off >>>> features.bitrot disable >>>> debug.trace off >>>> debug.log-history no (DEFAULT) >>>> debug.log-file no (DEFAULT) >>>> debug.exclude-ops (null) (DEFAULT) >>>> debug.include-ops (null) (DEFAULT) >>>> debug.error-gen off >>>> debug.error-failure (null) (DEFAULT) >>>> debug.error-number (null) (DEFAULT) >>>> debug.random-failure off (DEFAULT) >>>> debug.error-fops (null) (DEFAULT) >>>> nfs.disable on >>>> features.read-only off (DEFAULT) >>>> features.worm off >>>> features.worm-file-level off >>>> features.worm-files-deletable on >>>> features.default-retention-period 120 (DEFAULT) >>>> features.retention-mode relax (DEFAULT) >>>> features.auto-commit-period 180 (DEFAULT) >>>> storage.linux-aio off (DEFAULT) >>>> storage.linux-io_uring off (DEFAULT) >>>> storage.batch-fsync-mode reverse-fsync (DEFAULT) >>>> storage.batch-fsync-delay-usec 0 (DEFAULT) >>>> storage.owner-uid -1 (DEFAULT) >>>> 
storage.owner-gid -1 (DEFAULT) >>>> storage.node-uuid-pathinfo off (DEFAULT) >>>> storage.health-check-interval 30 (DEFAULT) >>>> storage.build-pgfid off (DEFAULT) >>>> storage.gfid2path on (DEFAULT) >>>> storage.gfid2path-separator : (DEFAULT) >>>> storage.reserve 1 (DEFAULT) >>>> storage.health-check-timeout 20 (DEFAULT) >>>> storage.fips-mode-rchecksum on >>>> storage.force-create-mode 0000 (DEFAULT) >>>> storage.force-directory-mode 0000 (DEFAULT) >>>> storage.create-mask 0777 (DEFAULT) >>>> storage.create-directory-mask 0777 (DEFAULT) >>>> storage.max-hardlinks 100 (DEFAULT) >>>> features.ctime on (DEFAULT) >>>> config.gfproxyd off >>>> cluster.server-quorum-type server >>>> cluster.server-quorum-ratio 51 >>>> changelog.changelog off (DEFAULT) >>>> changelog.changelog-dir {{ brick.path }}/.glusterfs/changelogs (DEFAULT) >>>> changelog.encoding ascii (DEFAULT) >>>> changelog.rollover-time 15 (DEFAULT) >>>> changelog.fsync-interval 5 (DEFAULT) >>>> changelog.changelog-barrier-timeout 120 >>>> changelog.capture-del-path off (DEFAULT) >>>> features.barrier disable >>>> features.barrier-timeout 120 >>>> features.trash off (DEFAULT) >>>> features.trash-dir .trashcan (DEFAULT) >>>> features.trash-eliminate-path (null) (DEFAULT) >>>> features.trash-max-filesize 5MB (DEFAULT) >>>> features.trash-internal-op off (DEFAULT) >>>> cluster.enable-shared-storage disable >>>> locks.trace off (DEFAULT) >>>> locks.mandatory-locking off (DEFAULT) >>>> cluster.disperse-self-heal-daemon enable (DEFAULT) >>>> cluster.quorum-reads no (DEFAULT) >>>> client.bind-insecure (null) (DEFAULT) >>>> features.timeout 45 (DEFAULT) >>>> features.failover-hosts (null) (DEFAULT) >>>> features.shard off >>>> features.shard-block-size 64MB (DEFAULT) >>>> features.shard-lru-limit 16384 (DEFAULT) >>>> features.shard-deletion-rate 100 (DEFAULT) >>>> features.scrub-throttle lazy >>>> features.scrub-freq biweekly >>>> features.scrub false (DEFAULT) >>>> features.expiry-time 120 >>>> features.signer-threads 4 >>>> features.cache-invalidation on >>>> features.cache-invalidation-timeout 600 >>>> ganesha.enable off >>>> features.leases off >>>> features.lease-lock-recall-timeout 60 (DEFAULT) >>>> disperse.background-heals 8 (DEFAULT) >>>> disperse.heal-wait-qlength 128 (DEFAULT) >>>> cluster.heal-timeout 600 (DEFAULT) >>>> dht.force-readdirp on (DEFAULT) >>>> disperse.read-policy gfid-hash (DEFAULT) >>>> cluster.shd-max-threads 4 >>>> cluster.shd-wait-qlength 1024 (DEFAULT) >>>> cluster.locking-scheme full (DEFAULT) >>>> cluster.granular-entry-heal no (DEFAULT) >>>> features.locks-revocation-secs 0 (DEFAULT) >>>> features.locks-revocation-clear-all false (DEFAULT) >>>> features.locks-revocation-max-blocked 0 (DEFAULT) >>>> features.locks-monkey-unlocking false (DEFAULT) >>>> features.locks-notify-contention yes (DEFAULT) >>>> features.locks-notify-contention-delay 5 (DEFAULT) >>>> disperse.shd-max-threads 1 (DEFAULT) >>>> disperse.shd-wait-qlength 4096 >>>> disperse.cpu-extensions auto (DEFAULT) >>>> disperse.self-heal-window-size 32 (DEFAULT) >>>> cluster.use-compound-fops off >>>> performance.parallel-readdir on >>>> performance.rda-request-size 131072 >>>> performance.rda-low-wmark 4096 (DEFAULT) >>>> performance.rda-high-wmark 128KB (DEFAULT) >>>> performance.rda-cache-limit 10MB >>>> performance.nl-cache-positive-entry false (DEFAULT) >>>> performance.nl-cache-limit 10MB >>>> performance.nl-cache-timeout 600 >>>> cluster.brick-multiplex disable >>>> cluster.brick-graceful-cleanup disable >>>> glusterd.vol_count_per_thread 100 
>>>> cluster.max-bricks-per-process 250 >>>> disperse.optimistic-change-log on (DEFAULT) >>>> disperse.stripe-cache 4 (DEFAULT) >>>> cluster.halo-enabled False (DEFAULT) >>>> cluster.halo-shd-max-latency 99999 (DEFAULT) >>>> cluster.halo-nfsd-max-latency 5 (DEFAULT) >>>> cluster.halo-max-latency 5 (DEFAULT) >>>> cluster.halo-max-replicas 99999 (DEFAULT) >>>> cluster.halo-min-replicas 2 (DEFAULT) >>>> features.selinux on >>>> cluster.daemon-log-level INFO >>>> debug.delay-gen off >>>> delay-gen.delay-percentage 10% (DEFAULT) >>>> delay-gen.delay-duration 100000 (DEFAULT) >>>> delay-gen.enable (DEFAULT) >>>> disperse.parallel-writes on (DEFAULT) >>>> disperse.quorum-count 0 (DEFAULT) >>>> features.sdfs off >>>> features.cloudsync off >>>> features.ctime on >>>> ctime.noatime on >>>> features.cloudsync-storetype (null) (DEFAULT) >>>> features.enforce-mandatory-lock off >>>> config.global-threading off >>>> config.client-threads 16 >>>> config.brick-threads 16 >>>> features.cloudsync-remote-read off >>>> features.cloudsync-store-id (null) (DEFAULT) >>>> features.cloudsync-product-id (null) (DEFAULT) >>>> features.acl enable >>>> cluster.use-anonymous-inode yes >>>> rebalance.ensure-durability on (DEFAULT) >>> >>> Again, sorry for the long post. We would be happy to have this solved as we are excited using glusterfs and we would like to go back to having a stable configuration. >>> >>> We always appreciate the spirit of collaboration and reciprocal help on this list. >>> >>> Best >>> Ilias >>> >>> -- >>> ?forumZFD >>> Entschieden f?r Frieden | Committed to Peace >>> >>> Ilias Chasapakis >>> Referent IT | IT Consultant >>> >>> Forum Ziviler Friedensdienst e.V. | Forum Civil Peace Service >>> Am K?lner Brett 8 | 50825 K?ln | Germany >>> >>> Tel 0221 91273243 | Fax 0221 91273299 | http://www.forumZFD.de >>> >>> Vorstand nach ? 26 BGB, einzelvertretungsberechtigt|Executive Board: >>> Alexander Mauz, Sonja Wiekenberg-Mlalandle, Jens von Bargen >>> VR 17651 Amtsgericht K?ln >>> >>> Spenden|Donations: IBAN DE90 4306 0967 4103 7264 00 BIC GENODEM1GLS >>> >>> ________ >>> >>> >>> >>> Community Meeting Calendar: >>> >>> Schedule - >>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >>> Bridge: https://meet.google.com/cpu-eiue-hvk >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> > -- > ?forumZFD > Entschieden f?r Frieden | Committed to Peace > > Ilias Chasapakis > Referent IT | IT Consultant > > Forum Ziviler Friedensdienst e.V. | Forum Civil Peace Service > Am K?lner Brett 8 | 50825 K?ln | Germany > > Tel 0221 91273243 | Fax 0221 91273299 | http://www.forumZFD.de > > Vorstand nach ? 26 BGB, einzelvertretungsberechtigt|Executive Board: > Alexander Mauz, Sonja Wiekenberg-Mlalandle, Jens von Bargen > VR 17651 Amtsgericht K?ln > > Spenden|Donations: IBAN DE90 4306 0967 4103 7264 00 BIC GENODEM1GLS > ________ > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://meet.google.com/cpu-eiue-hvk > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From chasapakis at forumZFD.de  Mon Apr 22 14:59:44 2024
From: chasapakis at forumZFD.de (Ilias Chasapakis forumZFD)
Date: Mon, 22 Apr 2024 16:59:44 +0200
Subject: [Gluster-users] Glusterfs 10.5-1 healing issues
In-Reply-To: <38664E17-2F70-4DA0-8231-DF2D71092B29@onholyground.com>
References: <8d37f1ed-87ba-43af-b491-b37c642333e6@forumZFD.de> <38664E17-2F70-4DA0-8231-DF2D71092B29@onholyground.com>
Message-ID: <0113927e-59ff-4e1c-88ff-5adb660500a4@forumZFD.de>

Dear Darrell,

Thanks again for the suggestions. We used some of the suggested options for our case, and in the meantime we also set up the round-robin configuration so that the samba share configuration has a single pointer that then picks one of the bricks (a single host name in our DNS resolving to each brick). This has improved the situation and the bricks are developing fewer entries. But fewer is still not "none"; unfortunately we observe the following behaviour:

We very frequently have directories that do not heal, and then neither do the files and dirs under that "base path". Sometimes one brick reports "New Folder" in the path while the others already have a name for it, so we also see a pattern with renamed directories. I think we have an issue with renaming taking place slowly on one brick (and it is not necessarily the same one) while the others already have the correct name. Rebooting solves this, or renaming the file on the volume: in the case of a new folder, create a "New Folder" in the path where the mismatching brick expects it, issue a listing, delete the "New Folder", rename the folder with the old name on the "wrong" brick, list again, re-rename, and the entries are healed. At that point the names match and everything is back.

What I do not fully understand (I am surely missing something fundamental) is that the gfid should be the same everywhere, and I would not expect a rename to change it. So what is seen as mismatching is just the "name attribute"; once that matches, everything is fine again.

Sorry for the simplistic presentation of the facts, but I hope you have some suggestions anyway. Perhaps it would be better if I send a separate post about the renamings. I saw folks setting open-behind to off and tried that, but with no results. Could it also be that we mount incorrectly on our ctdbs?

Again I apologize, as I understand that the points of failure can be many, but we are trying to work with the data/logs and observations that we have available.

Best regards
Ilias

Am 10.04.24 um 18:40 schrieb Darrell Budic:
> I would strongly recommend running the glusterfs servers directly on
> bare metal instead of in VMs. Check out Ovirt, especially its hybrid
> cluster model. While it's not currently well maintained, it works fine
> on your class of hardware and fully supports this model of gluster on
> the bare metal and VMs running on the same hosts. And we may see it
> get some more support after the VMWare buyout, who knows?
>
> Gluster isn't known for small file performance, but hunt through the
> archives for specific tuning hints. And if you're using it to host the
> VM image files, you're making that problem because the files shared by
> gluster are large. More cache and write (behind) buffers can help, and
> 10G or better networking would be something you want to do if you can
> afford it. Going to 2x1G LAGs can help a tiny bit, but you really want
> the lower latency from a faster physical media if you can get it.
>
> If you are not already using tuned to set virtual-guest profiles on
> your VMs (and virtual-host on the hosts), I'd look into that as well.
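Applying those profiles is a one-liner per machine; a minimal sketch, assuming the tuned package is installed on both the hosts and the VMs:

# on each KVM host
tuned-adm profile virtual-host

# inside each gluster VM
tuned-adm profile virtual-guest

# confirm which profile is active
tuned-adm active

These profiles mostly adjust memory writeback and CPU/power behaviour, so they complement the volume-level options rather than replace them.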
> Set the disk elevator to ?none? on the VMs as well. > > >> On Apr 10, 2024, at 10:07?AM, Ilias Chasapakis forumZFD >> wrote: >> >> Dear Darrell, >> >> Dear ..., >> >> Many thanks for the prompt reply. Here some of the additional >> information requested (please feel free to ask for more if needed) >> >>> CPU info: >>> Hosts >>> 1. Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (20 cores) hw RAID 1 >>> (adaptec) >>> 2. Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (20 cores) hw RAID 1 >>> (adaptec) >>> 3. Intel(R) Xeon(R) Silver 4112 CPU @ 2.60GHz (8 cores) hw RAID 1 >>> (adaptec) >>> >>> GlusterFS VMs >>> 1. 4 cores,? 10 GB RAM, CPU model Broadwell >>> 2. 4 cores,? 10 GB RAM, CPU model Broadwell >>> 3. 4 cores,? 10 GB RAM, CPU model: host passthrough value >>> >>> Network info >>> Physical connection between gluster nodes in a heartbeat network >>> that comprises a new Cisco switch. >>> TCP connections with 1Gbit links >>> Virtual default connectivity with virtio drivers for the NICs and >>> Macvtap connection to use the host?s connectivity. >>> No errors or lost packages are recorded between the VMs or the >>> hosts. Quick iperf tests (between glusters and ctdbs-glusters show >>> now evident issues). >>> >>> Workload: >>> An instantaneous from this moment which can be considered a peak >>> time is around 450 files open on the volume. >>> In terms of "litteral" load we notice cpu peaks mostly related to >>> the shd process >> >> All disks use virtIO drivers (virtIO disks). >> >> The file system on all nodes is XFS (not ZFS) >> >> Other than the clients on the gluster nodes themselves there are >> clients on ctdbs that mount the gluster volume and then expose it via >> smb to Windows clients (user profiles included for roaming profiles). >> ctdbs reach the glusters through the heartbeat network >> >> We are considering to move the glusters to a network with existing >> DNS capabilities in order to create a round-robin configuration by >> assigning hosts by IP to a single hostname to use it then for the >> mounts configuration of the ctdbs. >> The reasoning/hope behind that we would minimize access time and sync >> issues. >> >> Thank you for the information about the "sharding" we will take this >> into account and consider pros and cons in the current situation, >> epsecially because turning back is not easy afterwards. Also our main >> problem is mainly not with big files, but with a large quantity of >> small files. >> >> We could gladly make use of some of the options you suggested after >> we assess the situation again. We welcome any further suggestion in >> the meantime. >> >> Ilias >> >> Am 09.04.24 um 18:26 schrieb Darrell Budic: >>> The big one I see of you is to investigate and enable sharding. It >>> can improve performance and makes it much easier to heal VM style >>> workloads. Be aware that once you turn it on, you can?t go back >>> easily, and you need to copy the VM disk images around to get them >>> to be sharded before it will show any real effect. A couple other >>> recommendations from my main volume (three dedicated host servers >>> with HDDs and SDD/NVM caching and log volumes on ZFS ). The >>> cluster.shd-* entries are especially recommended. This is on gluster >>> 9.4 at the moment, so some of these won?t map exactly. 
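The full option set from that volume follows below. As a sketch of how settings of that kind are applied and checked per volume with the gluster CLI, with <volname> standing in for the actual volume name:

# apply an option on the volume
gluster volume set <volname> cluster.shd-max-threads 8
gluster volume set <volname> cluster.granular-entry-heal enable

# confirm the value that is now in effect
gluster volume get <volname> cluster.shd-max-threads

# watch whether the pending/failed heal counts actually drop after a change
gluster volume heal <volname> info summary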
>>> >>> Volume Name: gv1 >>> Type: Replicate >>> Number of Bricks: 1 x 3 = 3 >>> Transport-type: tcp >>> Options Reconfigured: >>> cluster.read-hash-mode: 3 >>> performance.client-io-threads: on >>> performance.write-behind-window-size: 64MB >>> performance.cache-size: 1G >>> nfs.disable: on >>> performance.readdir-ahead: on >>> performance.quick-read: off >>> performance.read-ahead: on >>> performance.io-cache: off >>> performance.stat-prefetch: on >>> cluster.eager-lock: enable >>> network.remote-dio: enable >>> server.event-threads: 4 >>> client.event-threads: 8 >>> performance.io-thread-count: 64 >>> performance.low-prio-threads: 32 >>> features.shard: on >>> features.shard-block-size: 64MB >>> cluster.locking-scheme: granular >>> cluster.data-self-heal-algorithm: full >>> cluster.shd-max-threads: 8 >>> cluster.shd-wait-qlength: 10240 >>> cluster.choose-local: false >>> cluster.granular-entry-heal: enable >>> >>> Otherwise, more details about your servers, CPU, RAM, and Disks >>> would be useful for suggestions, and details of your network as >>> well. And if you haven?t done kernel level tuning on the servers, >>> you should address that as well. These all vary a lot by your work >>> load and hardware setup, so there aren?t many generic >>> recommendations I can give other than to make sure you tuned your >>> tcp stack and enabled the none disk elevator on SSDs or disks used >>> by ZFS. >>> >>> There?s a lot of tuning suggesting in the archives if you go >>> searching as well. >>> >>> ? -Darrell >>> >>> >>>> On Apr 9, 2024, at 3:05?AM, Ilias Chasapakis forumZFD >>>> wrote: >>>> >>>> Dear all, >>>> >>>> we would like to describe the situation that we have and that does >>>> not solve since a long time, that means after many minor >>>> and major upgrades of GlusterFS >>>> >>>> We use a KVM environment for VMs for glusterfs and host servers are >>>> updated regularly. Hosts are disomogeneous hardware, >>>> but configured with same characteristics. >>>> >>>> The VMs have been also harmonized to use the virtio drivers where >>>> available for devices and resources reserved are the same >>>> on each host. >>>> >>>> Physical switch for hosts has been substituted with a reliable one. >>>> >>>> Probing peers has been and is quite quick in the heartbeat network >>>> and communication between the servers for apparently has no issues >>>> on disruptions. >>>> >>>> And I say apparently because what we have is: >>>> >>>> - always pending failed heals that used to resolve by a rotated >>>> reboot of the gluster vms (replica 3). Restarting only >>>> glusterfs related services (daemon, events etc.) has no effect, >>>> only reboot brings results >>>> - very often failed heals are directories >>>> >>>> We lately removed a brick that was on a vm on a host that has been >>>> entirely substituted. Re-added the brick, sync went on and >>>> all data was eventually synced and started with 0 pending failed >>>> heals. Now it develops failed heals too like its fellow >>>> bricks. Please take into account we healed all the failed entries >>>> (manually with various methods) before adding the third brick. >>>> >>>> After some days of operating, the count of failed heals rises >>>> again, not really fast but with new entries for sure (which might solve >>>> with rotated reboots, or not). >>>> >>>> We have gluster clients also on ctdbs that connect to the gluster >>>> and mount via glusterfs client. 
Windows roaming profiles shared via >>>> smb become frequently corrupted,(they are composed of a great >>>> number small files and are though of big total dimension). Gluster >>>> nodes are formatted with xfs. >>>> >>>> Also what we observer is that mounting with the vfs option in smb >>>> on the ctdbs has some kind of delay. This means that you can see >>>> the shared folder on for example >>>> a Windows client machine on a ctdb, but not on another ctdb in the >>>> cluster and then after a while it appears there too. And this >>>> frequently st >>>> >>>> >>>> This is an excerpt of entries on our shd logs: >>>> >>>>> 2024-04-08 10:13:26.213596 +0000] I [MSGID: 108026] >>>>> [afr-self-heal-entry.c:1080:afr_selfheal_entry_do] >>>>> 0-gv-ho-replicate-0: performing full entry selfheal on >>>>> 2c621415-6223-4b66-a4ca-3f6f267a448d >>>>> [2024-04-08 10:14:08.135911 +0000] W [MSGID: 114031] >>>>> [client-rpc-fops_v2.c:2457:client4_0_link_cbk] 0-gv-ho-client-5: >>>>> remote operation failed. >>>>> [{source=}, >>>>> {target=(null)}, {errno=116}, {error=Veraltete Dateizugriffsnummer >>>>> (file handle)}] >>>>> [2024-04-08 10:15:59.135908 +0000] W [MSGID: 114061] >>>>> [client-common.c:2992:client_pre_readdir_v2] 0-gv-ho-client-5: >>>>> remote_fd is -1. EBADFD >>>>> [{gfid=6b5e599e-c836-4ebe-b16a-8224425b88c7}, {errno=77}, >>>>> {error=Die Dateizugriffsnummer ist in schlechter Verfassung}] >>>>> [2024-04-08 10:30:25.013592 +0000] I [MSGID: 108026] >>>>> [afr-self-heal-entry.c:1080:afr_selfheal_entry_do] >>>>> 0-gv-ho-replicate-0: performing full entry selfheal on >>>>> 24e82e12-5512-4679-9eb3-8bd098367db7 >>>>> [2024-04-08 10:33:17.613594 +0000] W [MSGID: 114031] >>>>> [client-rpc-fops_v2.c:2457:client4_0_link_cbk] 0-gv-ho-client-5: >>>>> remote operation failed. >>>>> [{source=}, >>>>> {target=(null)}, {errno=116}, {error=Veraltete Dateizugriffsnummer >>>>> (file handle)}] >>>>> [2024-04-08 10:33:21.201359 +0000] W [MSGID: 114031] >>>>> [client-rpc-fops_v2.c:2457:client4_0_link_cbk] 0-gv-ho-client-5: >>>>> remote operation failed. [{source= >>>> >>>> How are he clients mapped to real hosts in order to know on which >>>> one?s logs to look at? >>>> >>>> We would like to go by exclusion to finally eradicate this, >>>> possibly in a conservative way (not rebuilding everything) and we >>>> >>>> are becoming clueless as to where to look at as we also tried >>>> various options settings regarding performance etc. >>>> >>>> Here is the set on our main volume: >>>> >>>>> cluster.lookup-unhashed on (DEFAULT) >>>>> cluster.lookup-optimize????????????????? on (DEFAULT) >>>>> cluster.min-free-disk??????????????????? 10% (DEFAULT) >>>>> cluster.min-free-inodes????????????????? 5% (DEFAULT) >>>>> cluster.rebalance-stats????????????????? off (DEFAULT) >>>>> cluster.subvols-per-directory (null) (DEFAULT) >>>>> cluster.readdir-optimize???????????????? off (DEFAULT) >>>>> cluster.rsync-hash-regex (null) (DEFAULT) >>>>> cluster.extra-hash-regex (null) (DEFAULT) >>>>> cluster.dht-xattr-name trusted.glusterfs.dht (DEFAULT) >>>>> cluster.randomize-hash-range-by-gfid???? off (DEFAULT) >>>>> cluster.rebal-throttle normal (DEFAULT) >>>>> cluster.lock-migration off >>>>> cluster.force-migration off >>>>> cluster.local-volume-name (null) (DEFAULT) >>>>> cluster.weighted-rebalance?????????????? on (DEFAULT) >>>>> cluster.switch-pattern (null) (DEFAULT) >>>>> cluster.entry-change-log???????????????? on (DEFAULT) >>>>> cluster.read-subvolume (null) (DEFAULT) >>>>> cluster.read-subvolume-index???????????? 
-1 (DEFAULT) >>>>> cluster.read-hash-mode?????????????????? 1 (DEFAULT) >>>>> cluster.background-self-heal-count?????? 8 (DEFAULT) >>>>> cluster.metadata-self-heal on >>>>> cluster.data-self-heal on >>>>> cluster.entry-self-heal on >>>>> cluster.self-heal-daemon enable >>>>> cluster.heal-timeout???????????????????? 600 (DEFAULT) >>>>> cluster.self-heal-window-size??????????? 8 (DEFAULT) >>>>> cluster.data-change-log????????????????? on (DEFAULT) >>>>> cluster.metadata-change-log????????????? on (DEFAULT) >>>>> cluster.data-self-heal-algorithm (null) (DEFAULT) >>>>> cluster.eager-lock?????????????????????? on (DEFAULT) >>>>> disperse.eager-lock????????????????????? on (DEFAULT) >>>>> disperse.other-eager-lock??????????????? on (DEFAULT) >>>>> disperse.eager-lock-timeout????????????? 1 (DEFAULT) >>>>> disperse.other-eager-lock-timeout??????? 1 (DEFAULT) >>>>> cluster.quorum-type auto >>>>> cluster.quorum-count 2 >>>>> cluster.choose-local true (DEFAULT) >>>>> cluster.self-heal-readdir-size?????????? 1KB (DEFAULT) >>>>> cluster.post-op-delay-secs?????????????? 1 (DEFAULT) >>>>> cluster.ensure-durability??????????????? on (DEFAULT) >>>>> cluster.consistent-metadata????????????? no (DEFAULT) >>>>> cluster.heal-wait-queue-length?????????? 128 (DEFAULT) >>>>> cluster.favorite-child-policy none >>>>> cluster.full-lock??????????????????????? yes (DEFAULT) >>>>> cluster.optimistic-change-log??????????? on (DEFAULT) >>>>> diagnostics.latency-measurement off >>>>> diagnostics.dump-fd-stats??????????????? off (DEFAULT) >>>>> diagnostics.count-fop-hits off >>>>> diagnostics.brick-log-level INFO >>>>> diagnostics.client-log-level INFO >>>>> diagnostics.brick-sys-log-level CRITICAL (DEFAULT) >>>>> diagnostics.client-sys-log-level CRITICAL (DEFAULT) >>>>> diagnostics.brick-logger (null) (DEFAULT) >>>>> diagnostics.client-logger (null) (DEFAULT) >>>>> diagnostics.brick-log-format (null) (DEFAULT) >>>>> diagnostics.client-log-format (null) (DEFAULT) >>>>> diagnostics.brick-log-buf-size?????????? 5 (DEFAULT) >>>>> diagnostics.client-log-buf-size????????? 5 (DEFAULT) >>>>> diagnostics.brick-log-flush-timeout????? 120 (DEFAULT) >>>>> diagnostics.client-log-flush-timeout???? 120 (DEFAULT) >>>>> diagnostics.stats-dump-interval????????? 0 (DEFAULT) >>>>> diagnostics.fop-sample-interval????????? 0 (DEFAULT) >>>>> diagnostics.stats-dump-format json (DEFAULT) >>>>> diagnostics.fop-sample-buf-size 65535 (DEFAULT) >>>>> diagnostics.stats-dnscache-ttl-sec 86400 (DEFAULT) >>>>> performance.cache-max-file-size 10 >>>>> performance.cache-min-file-size????????? 0 (DEFAULT) >>>>> performance.cache-refresh-timeout??????? 1 (DEFAULT) >>>>> performance.cache-priority (DEFAULT) >>>>> performance.io-cache-size 32MB (DEFAULT) >>>>> performance.cache-size 32MB (DEFAULT) >>>>> performance.io-thread-count????????????? 16 (DEFAULT) >>>>> performance.high-prio-threads??????????? 16 (DEFAULT) >>>>> performance.normal-prio-threads????????? 16 (DEFAULT) >>>>> performance.low-prio-threads???????????? 16 (DEFAULT) >>>>> performance.least-prio-threads?????????? 1 (DEFAULT) >>>>> performance.enable-least-priority??????? on (DEFAULT) >>>>> performance.iot-watchdog-secs (null) (DEFAULT) >>>>> performance.iot-cleanup-disconnected-reqs off (DEFAULT) >>>>> performance.iot-pass-through false (DEFAULT) >>>>> performance.io-cache-pass-through false (DEFAULT) >>>>> performance.quick-read-cache-size 128MB (DEFAULT) >>>>> performance.cache-size 128MB (DEFAULT) >>>>> performance.quick-read-cache-timeout???? 
1 (DEFAULT) >>>>> performance.qr-cache-timeout 600 >>>>> performance.quick-read-cache-invalidation false (DEFAULT) >>>>> performance.ctime-invalidation false (DEFAULT) >>>>> performance.flush-behind???????????????? on (DEFAULT) >>>>> performance.nfs.flush-behind???????????? on (DEFAULT) >>>>> performance.write-behind-window-size 4MB >>>>> performance.resync-failed-syncs-after-fsync off (DEFAULT) >>>>> performance.nfs.write-behind-window-size 1MB (DEFAULT) >>>>> performance.strict-o-direct????????????? off (DEFAULT) >>>>> performance.nfs.strict-o-direct????????? off (DEFAULT) >>>>> performance.strict-write-ordering??????? off (DEFAULT) >>>>> performance.nfs.strict-write-ordering??? off (DEFAULT) >>>>> performance.write-behind-trickling-writes on (DEFAULT) >>>>> performance.aggregate-size 128KB (DEFAULT) >>>>> performance.nfs.write-behind-trickling-writes on (DEFAULT) >>>>> performance.lazy-open??????????????????? yes (DEFAULT) >>>>> performance.read-after-open????????????? yes (DEFAULT) >>>>> performance.open-behind-pass-through false (DEFAULT) >>>>> performance.read-ahead-page-count??????? 4 (DEFAULT) >>>>> performance.read-ahead-pass-through false (DEFAULT) >>>>> performance.readdir-ahead-pass-through false (DEFAULT) >>>>> performance.md-cache-pass-through false (DEFAULT) >>>>> performance.write-behind-pass-through false (DEFAULT) >>>>> performance.md-cache-timeout 600 >>>>> performance.cache-swift-metadata false (DEFAULT) >>>>> performance.cache-samba-metadata on >>>>> performance.cache-capability-xattrs true (DEFAULT) >>>>> performance.cache-ima-xattrs true (DEFAULT) >>>>> performance.md-cache-statfs????????????? off (DEFAULT) >>>>> performance.xattr-cache-list (DEFAULT) >>>>> performance.nl-cache-pass-through false (DEFAULT) >>>>> network.frame-timeout 1800 (DEFAULT) >>>>> network.ping-timeout 20 >>>>> network.tcp-window-size (null) (DEFAULT) >>>>> client.ssl off >>>>> network.remote-dio disable (DEFAULT) >>>>> client.event-threads 4 >>>>> client.tcp-user-timeout 0 >>>>> client.keepalive-time 20 >>>>> client.keepalive-interval 2 >>>>> client.keepalive-count 9 >>>>> client.strict-locks off >>>>> network.tcp-window-size (null) (DEFAULT) >>>>> network.inode-lru-limit 200000 >>>>> auth.allow * >>>>> auth.reject (null) (DEFAULT) >>>>> transport.keepalive 1 >>>>> server.allow-insecure??????????????????? on (DEFAULT) >>>>> server.root-squash?????????????????????? off (DEFAULT) >>>>> server.all-squash??????????????????????? off (DEFAULT) >>>>> server.anonuid 65534 (DEFAULT) >>>>> server.anongid 65534 (DEFAULT) >>>>> server.statedump-path /var/run/gluster (DEFAULT) >>>>> server.outstanding-rpc-limit???????????? 64 (DEFAULT) >>>>> server.ssl off >>>>> auth.ssl-allow * >>>>> server.manage-gids?????????????????????? off (DEFAULT) >>>>> server.dynamic-auth????????????????????? on (DEFAULT) >>>>> client.send-gids???????????????????????? on (DEFAULT) >>>>> server.gid-timeout?????????????????????? 300 (DEFAULT) >>>>> server.own-thread (null) (DEFAULT) >>>>> server.event-threads 4 >>>>> server.tcp-user-timeout????????????????? 
42 (DEFAULT) >>>>> server.keepalive-time 20 >>>>> server.keepalive-interval 2 >>>>> server.keepalive-count 9 >>>>> transport.listen-backlog 1024 >>>>> ssl.own-cert (null) (DEFAULT) >>>>> ssl.private-key (null) (DEFAULT) >>>>> ssl.ca-list (null) (DEFAULT) >>>>> ssl.crl-path (null) (DEFAULT) >>>>> ssl.certificate-depth (null) (DEFAULT) >>>>> ssl.cipher-list (null) (DEFAULT) >>>>> ssl.dh-param (null) (DEFAULT) >>>>> ssl.ec-curve (null) (DEFAULT) >>>>> transport.address-family inet >>>>> performance.write-behind off >>>>> performance.read-ahead on >>>>> performance.readdir-ahead on >>>>> performance.io-cache off >>>>> performance.open-behind on >>>>> performance.quick-read on >>>>> performance.nl-cache on >>>>> performance.stat-prefetch on >>>>> performance.client-io-threads off >>>>> performance.nfs.write-behind on >>>>> performance.nfs.read-ahead off >>>>> performance.nfs.io-cache off >>>>> performance.nfs.quick-read off >>>>> performance.nfs.stat-prefetch off >>>>> performance.nfs.io-threads off >>>>> performance.force-readdirp true (DEFAULT) >>>>> performance.cache-invalidation on >>>>> performance.global-cache-invalidation true (DEFAULT) >>>>> features.uss off >>>>> features.snapshot-directory .snaps >>>>> features.show-snapshot-directory off >>>>> features.tag-namespaces off >>>>> network.compression off >>>>> network.compression.window-size????????? -15 (DEFAULT) >>>>> network.compression.mem-level??????????? 8 (DEFAULT) >>>>> network.compression.min-size???????????? 0 (DEFAULT) >>>>> network.compression.compression-level??? -1 (DEFAULT) >>>>> network.compression.debug false (DEFAULT) >>>>> features.default-soft-limit????????????? 80% (DEFAULT) >>>>> features.soft-timeout??????????????????? 60 (DEFAULT) >>>>> features.hard-timeout??????????????????? 5 (DEFAULT) >>>>> features.alert-time 86400 (DEFAULT) >>>>> features.quota-deem-statfs off >>>>> geo-replication.indexing off >>>>> geo-replication.indexing off >>>>> geo-replication.ignore-pid-check off >>>>> geo-replication.ignore-pid-check off >>>>> features.quota off >>>>> features.inode-quota off >>>>> features.bitrot disable >>>>> debug.trace off >>>>> debug.log-history??????????????????????? no (DEFAULT) >>>>> debug.log-file?????????????????????????? no (DEFAULT) >>>>> debug.exclude-ops (null) (DEFAULT) >>>>> debug.include-ops (null) (DEFAULT) >>>>> debug.error-gen off >>>>> debug.error-failure (null) (DEFAULT) >>>>> debug.error-number (null) (DEFAULT) >>>>> debug.random-failure???????????????????? off (DEFAULT) >>>>> debug.error-fops (null) (DEFAULT) >>>>> nfs.disable on >>>>> features.read-only?????????????????????? off (DEFAULT) >>>>> features.worm off >>>>> features.worm-file-level off >>>>> features.worm-files-deletable on >>>>> features.default-retention-period??????? 120 (DEFAULT) >>>>> features.retention-mode relax (DEFAULT) >>>>> features.auto-commit-period????????????? 180 (DEFAULT) >>>>> storage.linux-aio??????????????????????? off (DEFAULT) >>>>> storage.linux-io_uring?????????????????? off (DEFAULT) >>>>> storage.batch-fsync-mode reverse-fsync (DEFAULT) >>>>> storage.batch-fsync-delay-usec?????????? 0 (DEFAULT) >>>>> storage.owner-uid??????????????????????? -1 (DEFAULT) >>>>> storage.owner-gid??????????????????????? -1 (DEFAULT) >>>>> storage.node-uuid-pathinfo?????????????? off (DEFAULT) >>>>> storage.health-check-interval??????????? 30 (DEFAULT) >>>>> storage.build-pgfid????????????????????? off (DEFAULT) >>>>> storage.gfid2path??????????????????????? on (DEFAULT) >>>>> storage.gfid2path-separator????????????? 
: (DEFAULT) >>>>> storage.reserve????????????????????????? 1 (DEFAULT) >>>>> storage.health-check-timeout???????????? 20 (DEFAULT) >>>>> storage.fips-mode-rchecksum on >>>>> storage.force-create-mode 0000 (DEFAULT) >>>>> storage.force-directory-mode 0000 (DEFAULT) >>>>> storage.create-mask 0777 (DEFAULT) >>>>> storage.create-directory-mask 0777 (DEFAULT) >>>>> storage.max-hardlinks??????????????????? 100 (DEFAULT) >>>>> features.ctime?????????????????????????? on (DEFAULT) >>>>> config.gfproxyd off >>>>> cluster.server-quorum-type server >>>>> cluster.server-quorum-ratio 51 >>>>> changelog.changelog????????????????????? off (DEFAULT) >>>>> changelog.changelog-dir????????????????? {{ brick.path >>>>> }}/.glusterfs/changelogs (DEFAULT) >>>>> changelog.encoding ascii (DEFAULT) >>>>> changelog.rollover-time????????????????? 15 (DEFAULT) >>>>> changelog.fsync-interval???????????????? 5 (DEFAULT) >>>>> changelog.changelog-barrier-timeout 120 >>>>> changelog.capture-del-path?????????????? off (DEFAULT) >>>>> features.barrier disable >>>>> features.barrier-timeout 120 >>>>> features.trash?????????????????????????? off (DEFAULT) >>>>> features.trash-dir .trashcan (DEFAULT) >>>>> features.trash-eliminate-path (null) (DEFAULT) >>>>> features.trash-max-filesize????????????? 5MB (DEFAULT) >>>>> features.trash-internal-op?????????????? off (DEFAULT) >>>>> cluster.enable-shared-storage disable >>>>> locks.trace????????????????????????????? off (DEFAULT) >>>>> locks.mandatory-locking????????????????? off (DEFAULT) >>>>> cluster.disperse-self-heal-daemon enable (DEFAULT) >>>>> cluster.quorum-reads???????????????????? no (DEFAULT) >>>>> client.bind-insecure (null) (DEFAULT) >>>>> features.timeout???????????????????????? 45 (DEFAULT) >>>>> features.failover-hosts (null) (DEFAULT) >>>>> features.shard off >>>>> features.shard-block-size 64MB (DEFAULT) >>>>> features.shard-lru-limit 16384 (DEFAULT) >>>>> features.shard-deletion-rate???????????? 100 (DEFAULT) >>>>> features.scrub-throttle lazy >>>>> features.scrub-freq biweekly >>>>> features.scrub false (DEFAULT) >>>>> features.expiry-time 120 >>>>> features.signer-threads 4 >>>>> features.cache-invalidation on >>>>> features.cache-invalidation-timeout 600 >>>>> ganesha.enable off >>>>> features.leases off >>>>> features.lease-lock-recall-timeout?????? 60 (DEFAULT) >>>>> disperse.background-heals??????????????? 8 (DEFAULT) >>>>> disperse.heal-wait-qlength?????????????? 128 (DEFAULT) >>>>> cluster.heal-timeout???????????????????? 600 (DEFAULT) >>>>> dht.force-readdirp?????????????????????? on (DEFAULT) >>>>> disperse.read-policy gfid-hash (DEFAULT) >>>>> cluster.shd-max-threads 4 >>>>> cluster.shd-wait-qlength 1024 (DEFAULT) >>>>> cluster.locking-scheme full (DEFAULT) >>>>> cluster.granular-entry-heal????????????? no (DEFAULT) >>>>> features.locks-revocation-secs?????????? 0 (DEFAULT) >>>>> features.locks-revocation-clear-all false (DEFAULT) >>>>> features.locks-revocation-max-blocked??? 0 (DEFAULT) >>>>> features.locks-monkey-unlocking false (DEFAULT) >>>>> features.locks-notify-contention???????? yes (DEFAULT) >>>>> features.locks-notify-contention-delay?? 5 (DEFAULT) >>>>> disperse.shd-max-threads???????????????? 1 (DEFAULT) >>>>> disperse.shd-wait-qlength 4096 >>>>> disperse.cpu-extensions auto (DEFAULT) >>>>> disperse.self-heal-window-size?????????? 
32 (DEFAULT) >>>>> cluster.use-compound-fops off >>>>> performance.parallel-readdir on >>>>> performance.rda-request-size 131072 >>>>> performance.rda-low-wmark 4096 (DEFAULT) >>>>> performance.rda-high-wmark 128KB (DEFAULT) >>>>> performance.rda-cache-limit 10MB >>>>> performance.nl-cache-positive-entry false (DEFAULT) >>>>> performance.nl-cache-limit 10MB >>>>> performance.nl-cache-timeout 600 >>>>> cluster.brick-multiplex disable >>>>> cluster.brick-graceful-cleanup disable >>>>> glusterd.vol_count_per_thread 100 >>>>> cluster.max-bricks-per-process 250 >>>>> disperse.optimistic-change-log on (DEFAULT) >>>>> disperse.stripe-cache 4 (DEFAULT) >>>>> cluster.halo-enabled False (DEFAULT) >>>>> cluster.halo-shd-max-latency 99999 (DEFAULT) >>>>> cluster.halo-nfsd-max-latency 5 (DEFAULT) >>>>> cluster.halo-max-latency 5 (DEFAULT) >>>>> cluster.halo-max-replicas 99999 (DEFAULT) >>>>> cluster.halo-min-replicas 2 (DEFAULT) >>>>> features.selinux on >>>>> cluster.daemon-log-level INFO >>>>> debug.delay-gen off >>>>> delay-gen.delay-percentage 10% (DEFAULT) >>>>> delay-gen.delay-duration 100000 (DEFAULT) >>>>> delay-gen.enable (DEFAULT) >>>>> disperse.parallel-writes on (DEFAULT) >>>>> disperse.quorum-count 0 (DEFAULT) >>>>> features.sdfs off >>>>> features.cloudsync off >>>>> features.ctime on >>>>> ctime.noatime on >>>>> features.cloudsync-storetype (null) (DEFAULT) >>>>> features.enforce-mandatory-lock off >>>>> config.global-threading off >>>>> config.client-threads 16 >>>>> config.brick-threads 16 >>>>> features.cloudsync-remote-read off >>>>> features.cloudsync-store-id (null) (DEFAULT) >>>>> features.cloudsync-product-id (null) (DEFAULT) >>>>> features.acl enable >>>>> cluster.use-anonymous-inode yes >>>>> rebalance.ensure-durability on (DEFAULT) >>>> >>>> Again, sorry for the long post. We would be happy to have this >>>> solved, as we are excited about using glusterfs and would like to go >>>> back to having a stable configuration. >>>> >>>> We always appreciate the spirit of collaboration and reciprocal >>>> help on this list. >>>> >>>> Best >>>> Ilias >>>> >>>> -- >>>> forumZFD >>>> Entschieden für Frieden | Committed to Peace >>>> >>>> Ilias Chasapakis >>>> Referent IT | IT Consultant >>>> >>>> Forum Ziviler Friedensdienst e.V. | Forum Civil Peace Service >>>> Am Kölner Brett 8 | 50825 Köln | Germany >>>> >>>> Tel 0221 91273243 | Fax 0221 91273299 | http://www.forumZFD.de >>>> >>>> Vorstand nach § 26 BGB, einzelvertretungsberechtigt|Executive Board: >>>> Alexander Mauz, Sonja Wiekenberg-Mlalandle, Jens von Bargen >>>> VR 17651 Amtsgericht Köln >>>> >>>> Spenden|Donations: IBAN DE90 4306 0967 4103 7264 00 BIC GENODEM1GLS >>>> >>>> ________ >>>> >>>> >>>> >>>> Community Meeting Calendar: >>>> >>>> Schedule - >>>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >>>> Bridge: https://meet.google.com/cpu-eiue-hvk >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >> -- >> forumZFD >> Entschieden für Frieden | Committed to Peace >> >> Ilias Chasapakis >> Referent IT | IT Consultant >> >> Forum Ziviler Friedensdienst e.V. | Forum Civil Peace Service >> Am Kölner Brett 8 | 50825 Köln | Germany >> >> Tel 0221 91273243 | Fax 0221 91273299 | http://www.forumZFD.de >> >> Vorstand nach §
26 BGB, einzelvertretungsberechtigt|Executive Board: >> Alexander Mauz, Sonja Wiekenberg-Mlalandle, Jens von Bargen >> VR 17651 Amtsgericht Köln >> >> Spenden|Donations: IBAN DE90 4306 0967 4103 7264 00 BIC GENODEM1GLS >> ________ >> >> >> >> Community Meeting Calendar: >> >> Schedule - >> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >> Bridge: https://meet.google.com/cpu-eiue-hvk >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > -- forumZFD Entschieden für Frieden | Committed to Peace Ilias Chasapakis Referent IT | IT Consultant Forum Ziviler Friedensdienst e.V. | Forum Civil Peace Service Am Kölner Brett 8 | 50825 Köln | Germany Tel 0221 91273243 | Fax 0221 91273299 | http://www.forumZFD.de Vorstand nach § 26 BGB, einzelvertretungsberechtigt|Executive Board: Alexander Mauz, Sonja Wiekenberg-Mlalandle, Jens von Bargen VR 17651 Amtsgericht Köln Spenden|Donations: IBAN DE90 4306 0967 4103 7264 00 BIC GENODEM1GLS -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: OpenPGP_signature.asc Type: application/pgp-signature Size: 665 bytes Desc: OpenPGP digital signature URL: From revirii at googlemail.com Tue Apr 23 06:46:53 2024 From: revirii at googlemail.com (Hu Bert) Date: Tue, 23 Apr 2024 08:46:53 +0200 Subject: [Gluster-users] Gluster 11.1 - heal hangs (again) Message-ID: Hi, referring to this thread: https://lists.gluster.org/pipermail/gluster-users/2024-January/040465.html especially: https://lists.gluster.org/pipermail/gluster-users/2024-January/040513.html I've updated+rebooted 3 servers (debian bookworm) with gluster 11.1 running. The first 2 servers went fine, gluster volume ok, no heals, so after a couple of minutes i rebooted the 3rd server. And having the same problem again: heals are counting up, no heals happen. gluster volume status+info ok, gluster peer status ok.
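For reference, the kind of checks meant here, as a sketch using the standard gluster CLI and the volume name shown further down:

gluster volume heal sourceimages info summary          # per-brick counts: pending, split-brain, possibly healing
gluster volume heal sourceimages statistics heal-count # pending heal entries per brick
gluster volume status sourceimages                     # bricks online, PIDs, ports
gluster peer status                                    # all peers in "Peer in Cluster (Connected)"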
Full volume status+info: https://pastebin.com/aEEEKn7h Volume Name: sourceimages Type: Replicate Volume ID: d6a559a1-ca4c-48c7-8adf-89048333bb58 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: gluster188:/gluster/md3/sourceimages Brick2: gluster189:/gluster/md3/sourceimages Brick3: gluster190:/gluster/md3/sourceimages Internal IPs: gluster188: 192.168.0.188 gluster189: 192.168.0.189 gluster190: 192.168.0.190 After rebooting the 3rd server (gluster190) the client info looks like this: gluster volume status sourceimages clients Client connections for volume sourceimages ---------------------------------------------- Brick : gluster188:/gluster/md3/sourceimages Clients connected : 17 Hostname BytesRead BytesWritten OpVersion -------- --------- ------------ --------- 192.168.0.188:49151 1047856 988364 110000 192.168.0.189:49149 930792 654096 110000 192.168.0.109:49147 271598 279908 110000 192.168.0.223:49147 126764 130964 110000 192.168.0.222:49146 125848 130144 110000 192.168.0.2:49147 273756 43400387 110000 192.168.0.15:49147 57248531 14327465 110000 192.168.0.126:49147 32282645 671284763 110000 192.168.0.94:49146 125520 128864 110000 192.168.0.66:49146 34086248 666519388 110000 192.168.0.99:49146 3051076 522652843 110000 192.168.0.16:49146 149773024 1049035 110000 192.168.0.110:49146 1574768 566124922 110000 192.168.0.106:49146 152640790 146483580 110000 192.168.0.91:49133 89548971 82709793 110000 192.168.0.190:49149 4132 6540 110000 192.168.0.118:49133 92176 92884 110000 ---------------------------------------------- Brick : gluster189:/gluster/md3/sourceimages Clients connected : 17 Hostname BytesRead BytesWritten OpVersion -------- --------- ------------ --------- 192.168.0.188:49146 935172 658268 110000 192.168.0.189:49151 1039048 977920 110000 192.168.0.126:49146 27106555 231766764 110000 192.168.0.110:49147 1121696 226426262 110000 192.168.0.16:49147 147165735 994015 110000 192.168.0.106:49147 152476618 1091156 110000 192.168.0.94:49147 109612 112688 110000 192.168.0.109:49146 180819 1489715 110000 192.168.0.223:49146 110708 114316 110000 192.168.0.99:49147 2573412 157737429 110000 192.168.0.2:49145 242696 26088710 110000 192.168.0.222:49145 109728 113064 110000 192.168.0.66:49145 27003740 215124678 110000 192.168.0.15:49145 57217513 594699 110000 192.168.0.91:49132 89463431 2714920 110000 192.168.0.190:49148 4132 6540 110000 192.168.0.118:49131 92380 94996 110000 ---------------------------------------------- Brick : gluster190:/gluster/md3/sourceimages Clients connected : 2 Hostname BytesRead BytesWritten OpVersion -------- --------- ------------ --------- 192.168.0.190:49151 21252 27988 110000 192.168.0.118:49132 92176 92884 110000 The bad server (gluster190) has only 2 clients: itself and 192.168.0.118 (was rebooted after gluster190). Well, i remounted the volume on the other clients (without reboot), they appear now - but the most important thing: the other 2 gluster servers are missing. 
Output shortened, removed the connected clients: gluster volume status sourceimages clients Client connections for volume sourceimages ---------------------------------------------- Brick : gluster188:/gluster/md3/sourceimages Clients connected : 17 Hostname BytesRead BytesWritten OpVersion -------- --------- ------------ --------- 192.168.0.188:49151 3707272 3387700 110000 192.168.0.189:49149 3346388 2264688 110000 192.168.0.190:49149 4132 6540 110000 ---------------------------------------------- Brick : gluster189:/gluster/md3/sourceimages Clients connected : 17 Hostname BytesRead BytesWritten OpVersion -------- --------- ------------ --------- 192.168.0.189:49151 3698464 3377496 110000 192.168.0.188:49146 3350768 2268260 110000 192.168.0.190:49148 4132 6540 110000 ---------------------------------------------- Brick : gluster190:/gluster/md3/sourceimages Clients connected : 15 Hostname BytesRead BytesWritten OpVersion -------- --------- ------------ --------- 192.168.0.190:49151 38692 49988 110000 ---------------------------------------------- The 2 good (peer) cluster are missing on the 3rd/bad server. As these are not normal clients: how do i re-add/re-connect them? The 3 servers do not mount the volume to some mountpoint during normal service. Best regards, Hubert From revirii at googlemail.com Tue Apr 23 06:54:51 2024 From: revirii at googlemail.com (Hu Bert) Date: Tue, 23 Apr 2024 08:54:51 +0200 Subject: [Gluster-users] Gluster 11.1 - heal hangs (again) In-Reply-To: References: Message-ID: Ah, logs: nothing in the glustershd.log on the 3 gluster servers. But on one client in /var/log/glusterfs/data-sourceimages.log : [2024-04-23 06:54:21.456157 +0000] W [MSGID: 114061] [client-common.c:796:client_pre_lk_v2] 0-sourceimages-client-2: remote_fd is -1. EBADFD [{gfid=a1817071-2949-4145-a96a-874159e46511}, {errno=77}, {error=File descriptor in bad state}] [2024-04-23 06:54:21.456195 +0000] E [MSGID: 108028] [afr-open.c:361:afr_is_reopen_allowed_cbk] 0-sourceimages-replicate-0: Failed getlk for a1817071-2949-4145-a96a-874159e46511 [File descriptor in bad state] [2024-04-23 06:54:21.488511 +0000] W [MSGID: 114061] [client-common.c:530:client_pre_flush_v2] 0-sourceimages-client-2: remote_fd is -1. EBADFD [{gfid=a1817071-2949-4145-a96a-874159e46511}, {errno=77}, {error=File descriptor in bad stat e}] Am Di., 23. Apr. 2024 um 08:46 Uhr schrieb Hu Bert : > > Hi, > > referring to this thread: > https://lists.gluster.org/pipermail/gluster-users/2024-January/040465.html > especially: https://lists.gluster.org/pipermail/gluster-users/2024-January/040513.html > > I've updated+rebooted 3 servers (debian bookworm) with gluster 11.1 > running. The first 2 servers went fine, gluster volume ok, no heals, > so after a couple of minutes i rebooted the 3rd server. And having the > same problem again: heals are counting up, no heals happen. gluster > volume status+info ok, gluster peer status ok. 
> > Full volume status+info: https://pastebin.com/aEEEKn7h > > Volume Name: sourceimages > Type: Replicate > Volume ID: d6a559a1-ca4c-48c7-8adf-89048333bb58 > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 x 3 = 3 > Transport-type: tcp > Bricks: > Brick1: gluster188:/gluster/md3/sourceimages > Brick2: gluster189:/gluster/md3/sourceimages > Brick3: gluster190:/gluster/md3/sourceimages > > Internal IPs: > gluster188: 192.168.0.188 > gluster189: 192.168.0.189 > gluster190: 192.168.0.190 > > After rebooting the 3rd server (gluster190) the client info looks like this: > > gluster volume status sourceimages clients > Client connections for volume sourceimages > ---------------------------------------------- > Brick : gluster188:/gluster/md3/sourceimages > Clients connected : 17 > Hostname BytesRead > BytesWritten OpVersion > -------- --------- > ------------ --------- > 192.168.0.188:49151 1047856 > 988364 110000 > 192.168.0.189:49149 930792 > 654096 110000 > 192.168.0.109:49147 271598 > 279908 110000 > 192.168.0.223:49147 126764 > 130964 110000 > 192.168.0.222:49146 125848 > 130144 110000 > 192.168.0.2:49147 273756 > 43400387 110000 > 192.168.0.15:49147 57248531 > 14327465 110000 > 192.168.0.126:49147 32282645 > 671284763 110000 > 192.168.0.94:49146 125520 > 128864 110000 > 192.168.0.66:49146 34086248 > 666519388 110000 > 192.168.0.99:49146 3051076 > 522652843 110000 > 192.168.0.16:49146 149773024 > 1049035 110000 > 192.168.0.110:49146 1574768 > 566124922 110000 > 192.168.0.106:49146 152640790 > 146483580 110000 > 192.168.0.91:49133 89548971 > 82709793 110000 > 192.168.0.190:49149 4132 > 6540 110000 > 192.168.0.118:49133 92176 > 92884 110000 > ---------------------------------------------- > Brick : gluster189:/gluster/md3/sourceimages > Clients connected : 17 > Hostname BytesRead > BytesWritten OpVersion > -------- --------- > ------------ --------- > 192.168.0.188:49146 935172 > 658268 110000 > 192.168.0.189:49151 1039048 > 977920 110000 > 192.168.0.126:49146 27106555 > 231766764 110000 > 192.168.0.110:49147 1121696 > 226426262 110000 > 192.168.0.16:49147 147165735 > 994015 110000 > 192.168.0.106:49147 152476618 > 1091156 110000 > 192.168.0.94:49147 109612 > 112688 110000 > 192.168.0.109:49146 180819 > 1489715 110000 > 192.168.0.223:49146 110708 > 114316 110000 > 192.168.0.99:49147 2573412 > 157737429 110000 > 192.168.0.2:49145 242696 > 26088710 110000 > 192.168.0.222:49145 109728 > 113064 110000 > 192.168.0.66:49145 27003740 > 215124678 110000 > 192.168.0.15:49145 57217513 > 594699 110000 > 192.168.0.91:49132 89463431 > 2714920 110000 > 192.168.0.190:49148 4132 > 6540 110000 > 192.168.0.118:49131 92380 > 94996 110000 > ---------------------------------------------- > Brick : gluster190:/gluster/md3/sourceimages > Clients connected : 2 > Hostname BytesRead > BytesWritten OpVersion > -------- --------- > ------------ --------- > 192.168.0.190:49151 21252 > 27988 110000 > 192.168.0.118:49132 92176 > 92884 110000 > > The bad server (gluster190) has only 2 clients: itself and > 192.168.0.118 (was rebooted after gluster190). Well, i remounted the > volume on the other clients (without reboot), they appear now - but > the most important thing: the other 2 gluster servers are missing. 
> Output shortened, removed the connected clients: > > gluster volume status sourceimages clients > Client connections for volume sourceimages > ---------------------------------------------- > Brick : gluster188:/gluster/md3/sourceimages > Clients connected : 17 > Hostname BytesRead > BytesWritten OpVersion > -------- --------- > ------------ --------- > 192.168.0.188:49151 3707272 > 3387700 110000 > 192.168.0.189:49149 3346388 > 2264688 110000 > 192.168.0.190:49149 4132 > 6540 110000 > ---------------------------------------------- > Brick : gluster189:/gluster/md3/sourceimages > Clients connected : 17 > Hostname BytesRead > BytesWritten OpVersion > -------- --------- > ------------ --------- > 192.168.0.189:49151 3698464 > 3377496 110000 > 192.168.0.188:49146 3350768 > 2268260 110000 > 192.168.0.190:49148 4132 > 6540 110000 > ---------------------------------------------- > Brick : gluster190:/gluster/md3/sourceimages > Clients connected : 15 > Hostname BytesRead > BytesWritten OpVersion > -------- --------- > ------------ --------- > 192.168.0.190:49151 38692 > 49988 110000 > ---------------------------------------------- > > The 2 good (peer) cluster are missing on the 3rd/bad server. As these > are not normal clients: how do i re-add/re-connect them? The 3 servers > do not mount the volume to some mountpoint during normal service. > > > Best regards, > Hubert From revirii at googlemail.com Tue Apr 23 09:18:38 2024 From: revirii at googlemail.com (Hu Bert) Date: Tue, 23 Apr 2024 11:18:38 +0200 Subject: [Gluster-users] Gluster 11.1 - heal hangs (again) In-Reply-To: References: Message-ID: Howdy, was able to solve the problem. I had 2 options: reset-brick (i.e. reconfigure) or replace-brick (i.e. full sync). Tried reset-brick first... gluster volume reset-brick sourceimages gluster190:/gluster/md3/sourceimages start [... do nothing ...] gluster volume reset-brick sourceimages gluster190:/gluster/md3/sourceimages gluster190:/gluster/md3/sourceimages commit force After that the pending heals started, going to 0 pretty fast, and the connected clients are now identical for all 3 servers. Thx for reading, Hubert Am Di., 23. Apr. 2024 um 08:46 Uhr schrieb Hu Bert : > > Hi, > > referring to this thread: > https://lists.gluster.org/pipermail/gluster-users/2024-January/040465.html > especially: https://lists.gluster.org/pipermail/gluster-users/2024-January/040513.html > > I've updated+rebooted 3 servers (debian bookworm) with gluster 11.1 > running. The first 2 servers went fine, gluster volume ok, no heals, > so after a couple of minutes i rebooted the 3rd server. And having the > same problem again: heals are counting up, no heals happen. gluster > volume status+info ok, gluster peer status ok. 
> > Full volume status+info: https://pastebin.com/aEEEKn7h > > Volume Name: sourceimages > Type: Replicate > Volume ID: d6a559a1-ca4c-48c7-8adf-89048333bb58 > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 x 3 = 3 > Transport-type: tcp > Bricks: > Brick1: gluster188:/gluster/md3/sourceimages > Brick2: gluster189:/gluster/md3/sourceimages > Brick3: gluster190:/gluster/md3/sourceimages > > Internal IPs: > gluster188: 192.168.0.188 > gluster189: 192.168.0.189 > gluster190: 192.168.0.190 > > After rebooting the 3rd server (gluster190) the client info looks like this: > > gluster volume status sourceimages clients > Client connections for volume sourceimages > ---------------------------------------------- > Brick : gluster188:/gluster/md3/sourceimages > Clients connected : 17 > Hostname BytesRead > BytesWritten OpVersion > -------- --------- > ------------ --------- > 192.168.0.188:49151 1047856 > 988364 110000 > 192.168.0.189:49149 930792 > 654096 110000 > 192.168.0.109:49147 271598 > 279908 110000 > 192.168.0.223:49147 126764 > 130964 110000 > 192.168.0.222:49146 125848 > 130144 110000 > 192.168.0.2:49147 273756 > 43400387 110000 > 192.168.0.15:49147 57248531 > 14327465 110000 > 192.168.0.126:49147 32282645 > 671284763 110000 > 192.168.0.94:49146 125520 > 128864 110000 > 192.168.0.66:49146 34086248 > 666519388 110000 > 192.168.0.99:49146 3051076 > 522652843 110000 > 192.168.0.16:49146 149773024 > 1049035 110000 > 192.168.0.110:49146 1574768 > 566124922 110000 > 192.168.0.106:49146 152640790 > 146483580 110000 > 192.168.0.91:49133 89548971 > 82709793 110000 > 192.168.0.190:49149 4132 > 6540 110000 > 192.168.0.118:49133 92176 > 92884 110000 > ---------------------------------------------- > Brick : gluster189:/gluster/md3/sourceimages > Clients connected : 17 > Hostname BytesRead > BytesWritten OpVersion > -------- --------- > ------------ --------- > 192.168.0.188:49146 935172 > 658268 110000 > 192.168.0.189:49151 1039048 > 977920 110000 > 192.168.0.126:49146 27106555 > 231766764 110000 > 192.168.0.110:49147 1121696 > 226426262 110000 > 192.168.0.16:49147 147165735 > 994015 110000 > 192.168.0.106:49147 152476618 > 1091156 110000 > 192.168.0.94:49147 109612 > 112688 110000 > 192.168.0.109:49146 180819 > 1489715 110000 > 192.168.0.223:49146 110708 > 114316 110000 > 192.168.0.99:49147 2573412 > 157737429 110000 > 192.168.0.2:49145 242696 > 26088710 110000 > 192.168.0.222:49145 109728 > 113064 110000 > 192.168.0.66:49145 27003740 > 215124678 110000 > 192.168.0.15:49145 57217513 > 594699 110000 > 192.168.0.91:49132 89463431 > 2714920 110000 > 192.168.0.190:49148 4132 > 6540 110000 > 192.168.0.118:49131 92380 > 94996 110000 > ---------------------------------------------- > Brick : gluster190:/gluster/md3/sourceimages > Clients connected : 2 > Hostname BytesRead > BytesWritten OpVersion > -------- --------- > ------------ --------- > 192.168.0.190:49151 21252 > 27988 110000 > 192.168.0.118:49132 92176 > 92884 110000 > > The bad server (gluster190) has only 2 clients: itself and > 192.168.0.118 (was rebooted after gluster190). Well, i remounted the > volume on the other clients (without reboot), they appear now - but > the most important thing: the other 2 gluster servers are missing. 
> Output shortened, removed the connected clients: > > gluster volume status sourceimages clients > Client connections for volume sourceimages > ---------------------------------------------- > Brick : gluster188:/gluster/md3/sourceimages > Clients connected : 17 > Hostname BytesRead > BytesWritten OpVersion > -------- --------- > ------------ --------- > 192.168.0.188:49151 3707272 > 3387700 110000 > 192.168.0.189:49149 3346388 > 2264688 110000 > 192.168.0.190:49149 4132 > 6540 110000 > ---------------------------------------------- > Brick : gluster189:/gluster/md3/sourceimages > Clients connected : 17 > Hostname BytesRead > BytesWritten OpVersion > -------- --------- > ------------ --------- > 192.168.0.189:49151 3698464 > 3377496 110000 > 192.168.0.188:49146 3350768 > 2268260 110000 > 192.168.0.190:49148 4132 > 6540 110000 > ---------------------------------------------- > Brick : gluster190:/gluster/md3/sourceimages > Clients connected : 15 > Hostname BytesRead > BytesWritten OpVersion > -------- --------- > ------------ --------- > 192.168.0.190:49151 38692 > 49988 110000 > ---------------------------------------------- > > The 2 good (peer) cluster are missing on the 3rd/bad server. As these > are not normal clients: how do i re-add/re-connect them? The 3 servers > do not mount the volume to some mountpoint during normal service. > > > Best regards, > Hubert
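For reference, a minimal sketch of how the outcome of such a reset-brick can be verified afterwards (standard gluster CLI, volume and brick names as used in the mails above):

gluster volume status sourceimages                     # the gluster190 brick should be online again with a PID and port
gluster volume status sourceimages clients             # all three servers should appear as clients of every brick
gluster volume heal sourceimages info summary          # pending heal counts should drain to 0

The replace-brick alternative mentioned above would have meant a full resync onto a different brick path rather than a reconfiguration; the path sourceimages-new below is only a hypothetical example:

gluster volume replace-brick sourceimages gluster190:/gluster/md3/sourceimages gluster190:/gluster/md3/sourceimages-new commit force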