[Gluster-users] Gluster endless heal
mahdi.adnan at outlook.com
Wed Jan 17 18:50:51 UTC 2018
I have an issue with Gluster 3.8.14.
The cluster has 4 nodes with replica count 2. One of the nodes went offline for around 15 minutes; when it came back online, self-heal was triggered and has not stopped since. It has been running for 3 days now, maxing out brick utilization without actually healing anything.
The bricks are all SSDs, and the log on the source node is being flooded with the following messages:
[2018-01-17 18:37:11.815247] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-ovirt_imgs-replicate-0: Completed data selfheal on 450fb07a-e95d-48ef-a229-48917557c278. sources= sinks=1
[2018-01-17 18:37:12.830887] I [MSGID: 108026] [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do] 0-ovirt_imgs-replicate-0: performing metadata selfheal on ce0f545d-635a-40c0-95eb-ccfc71971f78
[2018-01-17 18:37:12.845978] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-ovirt_imgs-replicate-0: Completed metadata selfheal on ce0f545d-635a-40c0-95eb-ccfc71971f78. sources= sinks=1
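One way to tell whether the heal is making progress or looping over the same files is to tally how often each GFID shows up in the "Completed ... selfheal" messages. The following is only a minimal sketch under stated assumptions: the regular expression is derived solely from the three log lines quoted above, the function names `count_completed_heals` and `repeated_heals` are made up for illustration, and the log file path has to be supplied by whoever runs it.

```python
import re
import sys
from collections import Counter

# Pattern derived only from the log lines quoted above (an assumption:
# other Gluster versions may word these messages differently).
COMPLETED = re.compile(
    r"Completed (?P<kind>\w+) selfheal on "
    r"(?P<gfid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
)

# The three messages from this post, used as sample input.
SAMPLE = [
    "[2018-01-17 18:37:11.815247] I [MSGID: 108026] "
    "[afr-self-heal-common.c:1254:afr_log_selfheal] 0-ovirt_imgs-replicate-0: "
    "Completed data selfheal on 450fb07a-e95d-48ef-a229-48917557c278. "
    "sources= sinks=1",
    "[2018-01-17 18:37:12.830887] I [MSGID: 108026] "
    "[afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do] "
    "0-ovirt_imgs-replicate-0: performing metadata selfheal on "
    "ce0f545d-635a-40c0-95eb-ccfc71971f78",
    "[2018-01-17 18:37:12.845978] I [MSGID: 108026] "
    "[afr-self-heal-common.c:1254:afr_log_selfheal] 0-ovirt_imgs-replicate-0: "
    "Completed metadata selfheal on ce0f545d-635a-40c0-95eb-ccfc71971f78. "
    "sources= sinks=1",
]


def count_completed_heals(lines):
    """Count "Completed <kind> selfheal" messages per (gfid, kind) pair."""
    counts = Counter()
    for line in lines:
        match = COMPLETED.search(line)
        if match:
            counts[(match.group("gfid"), match.group("kind"))] += 1
    return counts


def repeated_heals(counts, threshold=2):
    """Return only the (gfid, kind) pairs completed at least `threshold` times."""
    return {key: n for key, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    # With a log file path as the first argument, read that file;
    # otherwise fall back to the sample lines from this post.
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as log:
            counts = count_completed_heals(log)
    else:
        counts = count_completed_heals(SAMPLE)
    for (gfid, kind), n in counts.most_common(20):
        print(f"{n:6d}  {kind:<9} {gfid}")
```

If the same few GFIDs come out on top with large counts, the heal is probably being repeated on those files rather than working through a backlog. That would narrow down where to look, though it does not explain the cause by itself.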
I tried restarting glusterd and rebooting the node after about 24 hours of healing, but it did not help. Before the reboot several bricks were healing; after the reboot only 4 bricks are healing.
The volume is used as an oVirt storage domain with sharding enabled.
There are no errors or warnings on either node, just info messages about AFR healing.
Any idea what's going on, or where I should start looking?
Mahdi A. Mahdi