[Gluster-users] Healing Delays

Sun Oct 2 00:19:55 UTC 2016

On 2/10/2016 12:48 AM, Lindsay Mathieson wrote:
> Only the heal count does not change, it just does not seem to start. 
> It can take hours before it shifts, but once it does, it's quite rapid. 
> Node 1 has restarted and the heal count has been static at 511 shards 
> for 45 minutes now. Nodes 1 & 2 have low CPU load, node 3 has 
> glusterfsd pegged at 800% CPU. 

OK, I tried to reproduce it systematically this morning and was unable
to, which is quite weird. Testing was the same as last night: move all
the VMs off a server, reboot it, and wait for the healing to finish.
This time I tried it with various different settings.
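
A rough sketch of how the "wait for the healing to finish" step could be
timed, in case anyone wants to repeat the test. It polls
`gluster volume heal <VOLNAME> statistics heal-count` and sums the
per-brick counts. The volume name, the polling interval and the helper
names are placeholders of mine, and the parsing assumes each brick
reports a line of the form "Number of entries: N".

```python
import re
import subprocess
import time

# Assumption: each brick section of the heal-count output contains a line
# "Number of entries: N". Lines without a numeric count are skipped.
ENTRY_RE = re.compile(r"^Number of entries:\s*(\d+)", re.MULTILINE)


def total_pending(output: str) -> int:
    """Sum the per-brick pending-heal counts found in the CLI output."""
    return sum(int(n) for n in ENTRY_RE.findall(output))


def heal_count(volume: str) -> int:
    """Run the gluster CLI and return the total number of pending entries."""
    result = subprocess.run(
        ["gluster", "volume", "heal", volume, "statistics", "heal-count"],
        capture_output=True, text=True, check=True,
    )
    return total_pending(result.stdout)


def wait_for_heal(volume: str, interval: float = 60.0) -> float:
    """Poll until nothing is pending; return the elapsed time in minutes."""
    start = time.monotonic()
    while True:
        pending = heal_count(volume)
        elapsed = (time.monotonic() - start) / 60
        print(f"{elapsed:6.1f} min  pending={pending}")
        if pending == 0:
            return elapsed
        time.sleep(interval)

# Example (hypothetical volume name): wait_for_heal("datastore")
```

Logging the count at each poll also shows whether it sits static for a
long stretch before it starts dropping.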

Test 1
------
cluster.granular-entry-heal no
cluster.locking-scheme full
Shards / Min: 350 / 8

Test 2
------
cluster.granular-entry-heal yes
cluster.locking-scheme granular
Shards / Min: 391 / 10

Test 3
------
cluster.granular-entry-heal yes
cluster.locking-scheme granular
heal command issued
Shards / Min: 358 / 11

Test 4
------
cluster.granular-entry-heal yes
cluster.locking-scheme granular
heal full command issued
Shards / Min: 358 / 27
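
For comparing the runs, here is a small sketch that turns the figures
above into a rate. It assumes each "Shards / Min" pair means shards
healed and minutes taken, which is only one reading of the numbers; the
labels are shorthand for the settings listed in each test.

```python
# (label, shards, minutes) copied from the four runs above.
# Assumption: each pair is "shards healed" and "minutes taken".
RUNS = [
    ("entry-heal no, locking full", 350, 8),
    ("granular, granular", 391, 10),
    ("granular, heal issued", 358, 11),
    ("granular, heal full issued", 358, 27),
]


def shards_per_minute(shards: int, minutes: int) -> float:
    """Average heal rate over the whole run."""
    return shards / minutes


for label, shards, minutes in RUNS:
    rate = shards_per_minute(shards, minutes)
    print(f"{label:30s} {rate:6.1f} shards/min")
```

The runs healed different numbers of shards, so a per-minute rate is a
fairer comparison than the raw minutes.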

Best results were with cluster.granular-entry-heal=yes and
cluster.locking-scheme=granular, but they were all quite good.

I don't know why it was so much worse last night; I/O load, CPU and
memory were the same. However, one thing that was different, and which
I can't easily reproduce, is that the cluster had been running for
several weeks, but last night I rebooted all nodes. Could gluster be
developing an issue after running for some time?

-- 
Lindsay Mathieson