[Gluster-users] Weird Issue / Gluster not really synced
Christian Reiss
email at christian-reiss.de
Wed Sep 14 07:47:42 UTC 2022
Hey folks,
I am having a weird issue here. I am running a 3-node gluster setup
with these versions:
glusterfs-selinux-2.0.1-1.el8s.noarch
glusterfs-9.6-1.el8s.x86_64
centos-release-gluster9-1.0-1.el8.noarch
libglusterfs0-9.6-1.el8s.x86_64
libglusterd0-9.6-1.el8s.x86_64
glusterfs-cli-9.6-1.el8s.x86_64
glusterfs-server-9.6-1.el8s.x86_64
glusterfs-client-xlators-9.6-1.el8s.x86_64
glusterfs-fuse-9.6-1.el8s.x86_64
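(For reference, that package list is just the output of something along the lines of

# rpm -qa | grep -i gluster

on one of the nodes.)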
My volume info:
Volume Name: web-dir
Type: Replicate
Volume ID: 4ff57154-6ccb-45b0-97da-c12b8b5afa2b
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: wc-srv01.eulie.de:/var/lib/gluster/brick01
Brick2: wc-srv02.eulie.de:/var/lib/gluster/brick01
Brick3: wc-srv03.eulie.de:/var/lib/gluster/brick01
Options Reconfigured:
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 200000
performance.readdir-ahead: on
performance.parallel-readdir: on
performance.nl-cache: on
performance.nl-cache-timeout: 600
performance.nl-cache-positive-entry: on
performance.qr-cache-timeout: 600
performance.cache-size: 4096MB
performance.cache-max-file-size: 512KB
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
performance.io-cache: on
performance.io-thread-count: 16
server.allow-insecure: on
cluster.lookup-optimize: on
client.event-threads: 8
server.event-threads: 4
cluster.readdir-optimize: on
performance.write-behind-window-size: 32MB
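(For completeness: these options were applied with the usual gluster volume set, e.g.

# gluster volume set web-dir performance.md-cache-timeout 600

and I can dump the full effective option list with

# gluster volume get web-dir all

in case the reconfigured options above are missing something relevant.)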
and all bricks are online:
Status of volume: web-dir
Gluster process                                    TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------------
Brick wc-srv01.eulie.de:/var/lib/gluster/brick01   49152     0          Y       2671
Brick wc-srv02.eulie.de:/var/lib/gluster/brick01   49152     0          Y       2614
Brick wc-srv03.eulie.de:/var/lib/gluster/brick01   49152     0          Y       3223
Self-heal Daemon on localhost                      N/A       N/A        Y       2679
Self-heal Daemon on wc-srv02.dc-dus.dalason.net    N/A       N/A        Y       41537
Self-heal Daemon on wc-srv03.dc-dus.dalason.net    N/A       N/A        Y       78473

Task Status of Volume web-dir
------------------------------------------------------------------------------------
There are no active volume tasks
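(If useful, I can also post the per-client connection list from

# gluster volume status web-dir clients

which should show whether each FUSE mount is actually connected to all three bricks.)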
SELinux is set to permissive.
The systems run AlmaLinux 8 with current patches (as of today).
The three servers wc-srv01, wc-srv02 and wc-srv03 are connected via
10Gbit, can reach each other, and there are no connectivity issues.
Measured network throughput is close to 10Gbit.
I mounted the volume on each server from itself:
wc-srv01 fstab:
wc-srv01.eulie.de:/web-dir /var/www glusterfs defaults,_netdev 0 0
wc-srv02 fstab:
wc-srv02.eulie.de:/web-dir /var/www glusterfs defaults,_netdev 0 0
wc-srv03 fstab:
wc-srv03.eulie.de:/web-dir /var/www glusterfs defaults,_netdev 0 0
Mounting works, and the reported size is identical on all servers:
# df -h /var/www/
Filesystem                  Size  Used  Avail  Use%  Mounted on
wc-srv01.eulie.de:/web-dir  100G  31G   70G    31%   /var/www
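(The equivalent manual mount, in case the fstab line hides anything, would simply be

# mount -t glusterfs wc-srv01.eulie.de:/web-dir /var/www

with default mount options.)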
Here is the weird issue:
wc-srv01 (writing the current time into testfile on the mounted volume):
while sleep 1; do date > testfile ; done
wc-srv02 (each iteration prints the local time, then the file content):
while sleep 1 ; do date ; cat testfile ; done
Wed 14 Sep 09:43:47 CEST 2022
Wed 14 Sep 09:43:45 CEST 2022
Wed 14 Sep 09:43:48 CEST 2022
Wed 14 Sep 09:43:45 CEST 2022
Wed 14 Sep 09:43:49 CEST 2022
Wed 14 Sep 09:43:45 CEST 2022
Wed 14 Sep 09:43:50 CEST 2022
Wed 14 Sep 09:43:45 CEST 2022
Wed 14 Sep 09:43:51 CEST 2022
Wed 14 Sep 09:43:45 CEST 2022
Wed 14 Sep 09:43:52 CEST 2022
Wed 14 Sep 09:43:45 CEST 2022
wc-srv03 (same read loop):
while sleep 1 ; do date ; cat testfile ; done
Wed 14 Sep 09:43:43 CEST 2022
Wed 14 Sep 09:41:12 CEST 2022
Wed 14 Sep 09:43:45 CEST 2022
Wed 14 Sep 09:41:12 CEST 2022
Wed 14 Sep 09:43:46 CEST 2022
Wed 14 Sep 09:41:12 CEST 2022
Wed 14 Sep 09:43:47 CEST 2022
Wed 14 Sep 09:41:12 CEST 2022
Wed 14 Sep 09:43:48 CEST 2022
Wed 14 Sep 09:41:12 CEST 2022
Wed 14 Sep 09:43:49 CEST 2022
Wed 14 Sep 09:41:12 CEST 2022
So the file exists, and on the initial write the timestamps are correct.
From the second write onward, the three servers see three different
contents for the same file: the writer has the current timestamp, while
wc-srv02 and wc-srv03 each keep serving an older one. Deleting the file
is instantly visible on all nodes, and saving a file in vim (doing :w)
also updates it on all nodes immediately.
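If it helps, I can also compare the copies directly on the bricks,
bypassing the FUSE mounts, along these lines (assuming testfile sits in
the root of the volume):

# md5sum /var/lib/gluster/brick01/testfile
# getfattr -d -m . -e hex /var/lib/gluster/brick01/testfile

run on each node, to see whether the bricks themselves already differ
or whether it is only the client-side view.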
# gluster volume heal web-dir info
Brick wc-srv01.eulie.de:/var/lib/gluster/brick01
Status: Connected
Number of entries: 0
Brick wc-srv02.eulie.de:/var/lib/gluster/brick01
Status: Connected
Number of entries: 0
Brick wc-srv03.eulie.de:/var/lib/gluster/brick01
Status: Connected
Number of entries: 0
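(I can run the split-brain variant as well if that tells you more:

# gluster volume heal web-dir info split-brain

but plain heal info above already reports zero entries everywhere.)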
What... Why... How? :-)
I need a synced three-way active-active-active cluster with consistent
data across all nodes.
Any pointers from you gurus?
--
with kind regards,
mit freundlichen Gruessen,
Christian Reiss