[Gluster-users] poor performance during healing

Tue Feb 24 01:41:08 UTC 2015

On 02/24/2015 05:00 AM, Craig Yoshioka wrote:
> I’m using Gluster 3.6 to host a volume with some KVM images.  I’d seen before that other people were having terrible performance while Gluster was auto-healing but that a rewrite in 3.6 had potentially solved this problem.
>
> Well, it hasn’t (for me).  If my Gluster volume starts to auto-heal, performance can get so bad that some of the VMs essentially lock up.  In top I can see the glusterfsd process sometimes hitting 700% CPU.  Is there anything I can do to prevent this by throttling the healing process?
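For VM workloads, you could set the 'cluster.data-self-heal-algorithm'
option to 'full'. The checksum computation in the 'diff' algorithm can
be CPU-intensive, especially since VM images are big files. With 'full',
the entire file is copied from source to sink without computing
checksums, so the heal moves more data but uses less CPU.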

[root@tuxpad glusterfs]# gluster v set help|grep algorithm
Option: cluster.data-self-heal-algorithm
Description: Select between "full", "diff". The "full" algorithm copies 
the entire file from source to sink. The "diff" algorithm copies to sink 
only those blocks whose checksums don't match with those of source. If 
no option is configured the option is chosen dynamically as follows: If 
the file does not exist on one of the sinks or empty file exists or if 
the source file size is about the same as page size the entire file will 
be read and written i.e "full" algo, otherwise "diff" algo is chosen.
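
As a minimal sketch of applying this to the volume from the original
post: the volume name vm-images is taken from the quoted output below,
while the SET_CMD variable and the check for the gluster binary are
only illustrative. Something along these lines should work when run on
one of the storage nodes:

```shell
#!/bin/sh
# Volume name from the original post; adjust for your setup.
VOLUME="vm-images"

# Print the command first so it can be reviewed before it is applied.
SET_CMD="gluster volume set $VOLUME cluster.data-self-heal-algorithm full"
echo "$SET_CMD"

# Apply only where the gluster CLI is present (i.e. on a storage node).
if command -v gluster >/dev/null 2>&1; then
    $SET_CMD
    # The option should now be listed under "Options Reconfigured".
    gluster volume info "$VOLUME" | grep data-self-heal-algorithm
fi
```

To return to the default dynamic selection later,
'gluster volume reset vm-images cluster.data-self-heal-algorithm'
should clear the setting.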

Hope this helps.
Ravi

> Here are my volume options:
>
> Volume Name: vm-images
> Type: Replicate
> Volume ID: 5b38ddbe-a1ae-4e10-b0ad-dcd785a44493
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: vmhost-1:/gfs/brick-0
> Brick2: vmhost-2:/gfs/brick-0
> Options Reconfigured:
> nfs.disable: on
> cluster.quorum-count: 1
> network.frame-timeout: 1800
> network.ping-timeout: 15
> server.allow-insecure: on
> storage.owner-gid: 36
> storage.owner-uid: 107
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> cluster.eager-lock: enable
> network.remote-dio: enable
> cluster.quorum-type: fixed
> cluster.server-quorum-type: server
> cluster.server-quorum-ratio: 51%
>
> Thanks!
> -Craig