[Gluster-users] 100% cpu on brick replication

Fri May 29 07:07:19 UTC 2015

On 05/29/2015 12:34 PM, Pedro Oriani wrote:
> Hi Pranith,
>
> it's for sure related to a replication / healing task, because it
> occurs when you create a new replicated brick or when you bring an
> old one back online.
> The problem is that the cpu load on the online brick is so high that I 
> cannot do normal operations.
> In my case when a replication / healing occurs, the cluster cannot 
> serve content.
> I'm asking if there is a way to limit cpu usage in this case, or set a 
> less aggressive mode, because otherwise I have to rethink the image 
> repository.
Disable self-heal. I see that you already did that for the self-heal
daemon. Let's do that for the mounts as well:
gluster volume set <volname> cluster.entry-self-heal off
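
A sketch of what turning off self-heal on the mounts could look like for all three client-side heal types, not only entries. This goes beyond the single command above and is an assumption about the intent: the cluster.data-self-heal line is an addition (cluster.metadata-self-heal is already off in the volume options quoted below), and the run helper plus the DRY_RUN and VOLNAME variables are hypothetical names used only for illustration. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: disable client-side (mount) self-heal for a volume.
# VOLNAME and DRY_RUN are illustrative variables, not gluster settings.
VOLNAME="${VOLNAME:-vol1}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Turn off data, metadata and entry self-heal triggered from the mounts.
disable_client_self_heal() {
    for opt in cluster.data-self-heal \
               cluster.metadata-self-heal \
               cluster.entry-self-heal; do
        run gluster volume set "$1" "$opt" off
    done
}

disable_client_self_heal "$VOLNAME"
```

To apply the settings rather than print them, run the script with DRY_RUN=0 on one of the servers. Setting the same options back to on reverts the change.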

Let me know how that goes.

Pranith
>
> thanks,
> Pedro
>
> ------------------------------------------------------------------------
> Date: Fri, 29 May 2015 11:14:29 +0530
> From: pkarampu at redhat.com
> To: sgunfio at hotmail.com; gluster-users at gluster.org
> Subject: Re: [Gluster-users] 100% cpu on brick replication
>
>
>
> On 05/27/2015 08:48 PM, Pedro Oriani wrote:
>
>     Hi All,
>     I'm writing because I'm experiencing an issue with gluster's
>     replication feature.
>     I have a brick on srv1 with about 2TB of mixed-size files, ranging
>     from 10k to 300k.
>     When I add a new replication brick on srv2, the glusterfs process
>     takes all the cpu.
>     This is unacceptable because the volume stops responding to normal
>     r/w requests.
>
>     Glusterfs version is 3.7.0
>
> Is it because of self-heals? Was the brick offline until then?
>
> Pranith
>
>
>     The underlying volume is xfs.
>
>
>     Volume Name: vol1
>     Type: Replicate
>     Volume ID:
>     Status: Started
>     Number of Bricks: 1 x 2 = 2
>     Transport-type: tcp
>     Bricks:
>     Brick1: 172.16.0.1:/data/glusterfs/vol1/brick1/brick
>     Brick2: 172.16.0.2:/data/glusterfs/vol1/brick1/brick
>     Options Reconfigured:
>     performance.cache-size: 1gb
>     cluster.self-heal-daemon: off
>     cluster.data-self-heal-algorithm: full
>     cluster.metadata-self-heal: off
>     performance.cache-max-file-size: 2MB
>     performance.cache-refresh-timeout: 1
>     performance.stat-prefetch: off
>     performance.read-ahead: on
>     performance.quick-read: off
>     performance.write-behind-window-size: 4MB
>     performance.flush-behind: on
>     performance.write-behind: on
>     performance.io-thread-count: 32
>     performance.io-cache: on
>     network.ping-timeout: 2
>     nfs.addr-namelookup: off
>     performance.strict-write-ordering: on
>
>
>     Is there any parameter or hint that I can follow to limit cpu
>     usage, so that replication causes little lag on normal operations?
>
>     thanks
>
>
