[Gluster-users] Maximising Heal Speed

Thu Apr 14 14:07:38 UTC 2016

On 14/04/2016 7:47 PM, Krutika Dhananjay wrote:
> Just curious, are you seeing poor heal performance of VMs by any 
> chance in 3.7.9 *even* with sharding?

No, it's actually been pretty good, apart from the quirks noted earlier :)
but I have to work to trigger them.

The test I ran tonight was with a volume hosting 8 running VMs for a 
total of 500GB of data. I killed one brick and let damaged shards 
accumulate to around 5000, basically 20GB of data. It finished healing 
that in under two hours and I believe it transferred a lot less than 
20GB of data, thanks to checksum comparisons I guess. I/O and CPU were 
barely impacted, and all VMs kept running normally.
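
As a rough check on those figures: about 5000 shards adding up to about 
20GB implies a shard size of roughly 4MB. The volume's actual 
features.shard-block-size is not stated here, so the 4MB below is an 
assumption, and shards_to_gb is just an illustrative helper name.

```shell
#!/bin/sh
# Estimate how much data a given number of shards covers.
#   $1 = number of shards needing heal
#   $2 = shard size in MB (defaults to 4; ASSUMED, not stated in the post)
# Uses decimal units (1 GB = 1000 MB) and integer arithmetic.
shards_to_gb() {
    echo $(( $1 * ${2:-4} / 1000 ))
}

shards_to_gb 5000 4    # prints 20
```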

A non-sharded volume would have needed to check all 500GB; it would have 
taken all night and seriously hogged the disks and CPU or network.

Performance is really good now that I can disable 
performance.strict-write-ordering and enable performance.stat-prefetch. 
VM is getting 110MB/s sequential writes and pretty good IOPS as well. 
This is on a 3-node replica 3 cluster with 2 x 1G LACP-bonded ports.
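
For reference, those two options would typically be changed with 
"gluster volume set". The sketch below assumes the volume is called 
"datastore" (the name used in the heal command later in this message). 
The tune_volume function and the GLUSTER variable are illustrative 
only; by default it just prints the commands so they can be reviewed 
before anything is applied.

```shell
#!/bin/sh
# Volume name; "datastore" is an assumption, substitute your own.
VOL="${VOL:-datastore}"

# Dry run by default: the commands are echoed, not executed.
# Run with GLUSTER=gluster in the environment to really apply them.
GLUSTER="${GLUSTER:-echo gluster}"

tune_volume() {
    # Turn strict write ordering off and stat-prefetch on for volume $1.
    $GLUSTER volume set "$1" performance.strict-write-ordering off
    $GLUSTER volume set "$1" performance.stat-prefetch on
}

tune_volume "$VOL"
```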

And I cannot stress enough how great "heal datastore statistics 
heal-count" is. It returns immediately, which makes it perfect for 
placing in a watch statement, and it gives you a good feel for progress: 
you can see it ticking down. A good confidence booster.
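
A minimal sketch of that kind of monitoring, assuming the volume is 
named "datastore" and that the command prints one "Number of entries: N" 
line per brick (the exact output format is an assumption and may differ 
between versions). total_heal_count is a hypothetical helper that adds 
the per-brick counts into a single number.

```shell
#!/bin/sh
# Sum the per-brick "Number of entries: N" lines read from stdin.
# Prints 0 if no such lines are found.
total_heal_count() {
    awk -F': *' '/^Number of entries:/ { sum += $2 } END { print sum + 0 }'
}

# On a live cluster (not run here):
#   watch -n 10 'gluster volume heal datastore statistics heal-count'
#   gluster volume heal datastore statistics heal-count | total_heal_count
```

The watch form shows the per-brick counts refreshing in place; the 
piped form is handy if you want one total to log or alert on.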

-- 
Lindsay Mathieson