<div dir="ltr">Hi Jiri,<div><br></div><div>Your problem looks pretty similar to mine; see: <a href="https://lists.gluster.org/pipermail/gluster-users/2021-February/039134.html">https://lists.gluster.org/pipermail/gluster-users/2021-February/039134.html</a></div><div>Any chance you also see the XFS errors in the brick logs?</div><div>For me the situation improved once I disabled brick multiplexing, but I don't see that option in your volume configuration.</div><div><br></div><div>Cheers, Olaf</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 8 Jul 2021 at 12:28, Jiří Sléžka <<a href="mailto:jiri.slezka@slu.cz">jiri.slezka@slu.cz</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hello gluster community,<br>
<br>
I am new to this list, but I have been using glusterfs for a long time as <br>
our SDS solution for storing 80+ TiB of data. I'm also using glusterfs <br>
for a small 3-node HCI cluster with oVirt 4.4.6 and CentOS 8 (not Stream <br>
yet). The glusterfs version here is 8.5-2.el8.x86_64.<br>
<br>
From time to time a (I believe) random brick on a random host goes down <br>
because of a failed health-check. It looks like this:<br>
<br>
[root@ovirt-hci02 ~]# grep "posix_health_check" /var/log/glusterfs/bricks/*<br>
/var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-07 <br>
07:13:37.408184] M [MSGID: 113075] <br>
[posix-helpers.c:2214:posix_health_check_thread_proc] 0-vms-posix: <br>
health-check failed, going down<br>
/var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-07 <br>
07:13:37.408407] M [MSGID: 113075] <br>
[posix-helpers.c:2232:posix_health_check_thread_proc] 0-vms-posix: still <br>
alive! -> SIGTERM<br>
/var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-07 <br>
16:11:14.518971] M [MSGID: 113075] <br>
[posix-helpers.c:2214:posix_health_check_thread_proc] 0-vms-posix: <br>
health-check failed, going down<br>
/var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-07 <br>
16:11:14.519200] M [MSGID: 113075] <br>
[posix-helpers.c:2232:posix_health_check_thread_proc] 0-vms-posix: still <br>
alive! -> SIGTERM<br>
<br>
On another host:<br>
<br>
[root@ovirt-hci01 ~]# grep "posix_health_check" /var/log/glusterfs/bricks/*<br>
/var/log/glusterfs/bricks/gluster_bricks-engine-engine.log:[2021-07-05 <br>
13:15:51.983327] M [MSGID: 113075] <br>
[posix-helpers.c:2214:posix_health_check_thread_proc] 0-engine-posix: <br>
health-check failed, going down<br>
/var/log/glusterfs/bricks/gluster_bricks-engine-engine.log:[2021-07-05 <br>
13:15:51.983728] M [MSGID: 113075] <br>
[posix-helpers.c:2232:posix_health_check_thread_proc] 0-engine-posix: <br>
still alive! -> SIGTERM<br>
/var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-05 <br>
01:53:35.769129] M [MSGID: 113075] <br>
[posix-helpers.c:2214:posix_health_check_thread_proc] 0-vms-posix: <br>
health-check failed, going down<br>
/var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-05 <br>
01:53:35.769819] M [MSGID: 113075] <br>
[posix-helpers.c:2232:posix_health_check_thread_proc] 0-vms-posix: still <br>
alive! -> SIGTERM<br>
<br>
I cannot link these errors to any storage/filesystem issue (in dmesg or <br>
/var/log/messages), and the brick devices look healthy (according to smartd).<br>
<br>
I can force-start the brick with<br>
<br>
gluster volume start vms|engine force<br>
<br>
and after some healing everything works fine again for a few days.<br>
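As a sketch, the recovery steps above, plus a way to relax the posix health-check that is taking the brick down, look like this (`storage.health-check-interval` and `storage.health-check-timeout` are existing glusterfs volume options; the values shown are illustrative, not recommendations — check `gluster volume set help` on your version):<br>
<br>
```shell
# Confirm which brick process is down
gluster volume status vms

# Force-start the volume to bring the failed brick back online
gluster volume start vms force

# Watch the self-heal queue drain before relying on the volume again
gluster volume heal vms info summary

# Optionally relax the posix health-check (interval defaults to 30s;
# setting it to 0 disables the check entirely)
gluster volume set vms storage.health-check-interval 60
```
<br>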
<br>
Has anybody else observed this behavior?<br>
<br>
The vms volume has this structure (two bricks per host, each a separate <br>
JBOD SSD disk); the engine volume has one brick on each host...<br>
<br>
gluster volume info vms<br>
<br>
Volume Name: vms<br>
Type: Distributed-Replicate<br>
Volume ID: 52032ec6-99d4-4210-8fb8-ffbd7a1e0bf7<br>
Status: Started<br>
Snapshot Count: 0<br>
Number of Bricks: 2 x 3 = 6<br>
Transport-type: tcp<br>
Bricks:<br>
Brick1: 10.0.4.11:/gluster_bricks/vms/vms<br>
Brick2: 10.0.4.13:/gluster_bricks/vms/vms<br>
Brick3: 10.0.4.12:/gluster_bricks/vms/vms<br>
Brick4: 10.0.4.11:/gluster_bricks/vms2/vms2<br>
Brick5: 10.0.4.13:/gluster_bricks/vms2/vms2<br>
Brick6: 10.0.4.12:/gluster_bricks/vms2/vms2<br>
Options Reconfigured:<br>
cluster.granular-entry-heal: enable<br>
performance.stat-prefetch: off<br>
cluster.eager-lock: enable<br>
performance.io-cache: off<br>
performance.read-ahead: off<br>
performance.quick-read: off<br>
user.cifs: off<br>
network.ping-timeout: 30<br>
network.remote-dio: off<br>
performance.strict-o-direct: on<br>
performance.low-prio-threads: 32<br>
features.shard: on<br>
storage.owner-gid: 36<br>
storage.owner-uid: 36<br>
transport.address-family: inet<br>
storage.fips-mode-rchecksum: on<br>
nfs.disable: on<br>
performance.client-io-threads: off<br>
<br>
<br>
Cheers,<br>
<br>
Jiri<br>
<br>
________<br>
<br>
<br>
<br>
Community Meeting Calendar:<br>
<br>
Schedule -<br>
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC<br>
Bridge: <a href="https://meet.google.com/cpu-eiue-hvk" rel="noreferrer" target="_blank">https://meet.google.com/cpu-eiue-hvk</a><br>
Gluster-users mailing list<br>
<a href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a><br>
<a href="https://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">https://lists.gluster.org/mailman/listinfo/gluster-users</a><br>
</blockquote></div>