<div dir="ltr">Hi Jiri,<div><br></div><div>Your problem looks pretty similar to mine; see: <a href="https://lists.gluster.org/pipermail/gluster-users/2021-February/039134.html">https://lists.gluster.org/pipermail/gluster-users/2021-February/039134.html</a></div><div>Any chance you also see the XFS errors in the brick logs?</div><div>For me the situation improved once I disabled brick multiplexing, but I don't see that option in your volume configuration.</div><div><br></div><div>Cheers, Olaf</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 8 Jul 2021 at 12:28, Jiří Sléžka <<a href="mailto:jiri.slezka@slu.cz">jiri.slezka@slu.cz</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hello gluster community,<br>
<br>
I am new to this list, but I have been using glusterfs for a long time as <br>
our SDS solution for storing 80+ TiB of data. I'm also using glusterfs <br>
for a small 3-node HCI cluster with oVirt 4.4.6 and CentOS 8 (not Stream <br>
yet). The glusterfs version here is 8.5-2.el8.x86_64.<br>
<br>
From time to time a (I believe) random brick on a random host goes down <br>
because of a failed health-check. It looks like this:<br>
<br>
[root@ovirt-hci02 ~]# grep "posix_health_check" /var/log/glusterfs/bricks/*<br>
/var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-07 <br>
07:13:37.408184] M [MSGID: 113075] <br>
[posix-helpers.c:2214:posix_health_check_thread_proc] 0-vms-posix: <br>
health-check failed, going down<br>
/var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-07 <br>
07:13:37.408407] M [MSGID: 113075] <br>
[posix-helpers.c:2232:posix_health_check_thread_proc] 0-vms-posix: still <br>
alive! -> SIGTERM<br>
/var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-07 <br>
16:11:14.518971] M [MSGID: 113075] <br>
[posix-helpers.c:2214:posix_health_check_thread_proc] 0-vms-posix: <br>
health-check failed, going down<br>
/var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-07 <br>
16:11:14.519200] M [MSGID: 113075] <br>
[posix-helpers.c:2232:posix_health_check_thread_proc] 0-vms-posix: still <br>
alive! -> SIGTERM<br>
<br>
On another host:<br>
<br>
[root@ovirt-hci01 ~]# grep "posix_health_check" /var/log/glusterfs/bricks/*<br>
/var/log/glusterfs/bricks/gluster_bricks-engine-engine.log:[2021-07-05 <br>
13:15:51.983327] M [MSGID: 113075] <br>
[posix-helpers.c:2214:posix_health_check_thread_proc] 0-engine-posix: <br>
health-check failed, going down<br>
/var/log/glusterfs/bricks/gluster_bricks-engine-engine.log:[2021-07-05 <br>
13:15:51.983728] M [MSGID: 113075] <br>
[posix-helpers.c:2232:posix_health_check_thread_proc] 0-engine-posix: <br>
still alive! -> SIGTERM<br>
/var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-05 <br>
01:53:35.769129] M [MSGID: 113075] <br>
[posix-helpers.c:2214:posix_health_check_thread_proc] 0-vms-posix: <br>
health-check failed, going down<br>
/var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-05 <br>
01:53:35.769819] M [MSGID: 113075] <br>
[posix-helpers.c:2232:posix_health_check_thread_proc] 0-vms-posix: still <br>
alive! -> SIGTERM<br>
<br>
I cannot link these errors to any storage/filesystem issue (in dmesg or <br>
/var/log/messages), and the brick devices look healthy (according to smartd).<br>
<br>
I can force-start the brick with<br>
<br>
gluster volume start vms|engine force<br>
<br>
and after some healing everything works fine again for a few days.<br>
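As a sketch, the recovery steps above, plus a way to relax the posix health-check that is taking the brick down, look like this (`storage.health-check-interval` and `storage.health-check-timeout` are existing glusterfs volume options; the values shown are illustrative, not recommendations — check `gluster volume set help` on your version):<br>
<br>
```shell
# Confirm which brick process is down
gluster volume status vms

# Force-start the volume to bring the failed brick back online
gluster volume start vms force

# Watch the self-heal queue drain before relying on the volume again
gluster volume heal vms info summary

# Optionally relax the posix health-check (interval defaults to 30s;
# setting it to 0 disables the check entirely)
gluster volume set vms storage.health-check-interval 60
```
<br>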
<br>
Has anybody else observed this behavior?<br>
<br>
The vms volume has this structure (two bricks per host, each a separate <br>
JBOD SSD disk); the engine volume has one brick on each host...<br>
<br>
gluster volume info vms<br>
<br>
Volume Name: vms<br>
Type: Distributed-Replicate<br>
Volume ID: 52032ec6-99d4-4210-8fb8-ffbd7a1e0bf7<br>
Status: Started<br>
Snapshot Count: 0<br>
Number of Bricks: 2 x 3 = 6<br>
Transport-type: tcp<br>
Bricks:<br>
Brick1: 10.0.4.11:/gluster_bricks/vms/vms<br>
Brick2: 10.0.4.13:/gluster_bricks/vms/vms<br>
Brick3: 10.0.4.12:/gluster_bricks/vms/vms<br>
Brick4: 10.0.4.11:/gluster_bricks/vms2/vms2<br>
Brick5: 10.0.4.13:/gluster_bricks/vms2/vms2<br>
Brick6: 10.0.4.12:/gluster_bricks/vms2/vms2<br>
Options Reconfigured:<br>
cluster.granular-entry-heal: enable<br>
performance.stat-prefetch: off<br>
cluster.eager-lock: enable<br>
performance.io-cache: off<br>
performance.read-ahead: off<br>
performance.quick-read: off<br>
user.cifs: off<br>
network.ping-timeout: 30<br>
network.remote-dio: off<br>
performance.strict-o-direct: on<br>
performance.low-prio-threads: 32<br>
features.shard: on<br>
storage.owner-gid: 36<br>
storage.owner-uid: 36<br>
transport.address-family: inet<br>
storage.fips-mode-rchecksum: on<br>
nfs.disable: on<br>
performance.client-io-threads: off<br>
<br>
<br>
Cheers,<br>
<br>
Jiri<br>
<br>
________<br>
<br>
<br>
<br>
Community Meeting Calendar:<br>
<br>
Schedule -<br>
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC<br>
Bridge: <a href="https://meet.google.com/cpu-eiue-hvk" rel="noreferrer" target="_blank">https://meet.google.com/cpu-eiue-hvk</a><br>
Gluster-users mailing list<br>
<a href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a><br>
<a href="https://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">https://lists.gluster.org/mailman/listinfo/gluster-users</a><br>
</blockquote></div>