[Gluster-users] 100% cpu on brick replication

Fri May 29 08:16:52 UTC 2015

Could you give gluster volume info output?
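
For reference, a minimal sketch of how that output could be gathered. The volume name vol1 comes from this thread. The VOLNAME and GLUSTER variables and the collect_volume_info helper are hypothetical names used only for illustration. Setting GLUSTER=echo prints the commands instead of running them.

```shell
# Hypothetical helper for illustration; not part of the Gluster CLI.
VOLNAME="${VOLNAME:-vol1}"      # volume name used in this thread
GLUSTER="${GLUSTER:-gluster}"   # set GLUSTER=echo for a dry run

collect_volume_info() {
    # Volume layout plus the "Options Reconfigured" list.
    $GLUSTER volume info "$VOLNAME"
    # Entries still pending heal on each brick.
    $GLUSTER volume heal "$VOLNAME" info
}
```

Calling collect_volume_info on one of the servers (typically as root) should print both reports one after the other.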

Pranith

On 05/29/2015 01:18 PM, Pedro Oriani wrote:
> I've set
>
> cluster.entry-self-heal: off
>
> Maybe I missed something: when I started the service on srv02, it 
> seemed to do the job.
> Then I restarted the service.
>
> on srv02
>
> 11607 ?        Ssl    0:00 /usr/sbin/glusterfs -s localhost 
> --volfile-id gluster/glustershd -p 
> /var/lib/glusterd/glustershd/run/glustershd.pid -l 
> /var/log/glusterfs/glustershd.log -S 
> /var/run/gluster/eb93ca526d4559069efc40da9c71b3a4.socket 
> --xlator-option *replicate*.node-uuid=7207ea30-41e9-4344-8fc3-47743b83629e
> 11612 ?        Ssl    0:03 /usr/sbin/glusterfsd -s 172.16.0.2 
> --volfile-id vol1.172.16.0.2.data-glusterfs-vol1-brick1-brick -p 
> /var/lib/glusterd/vols/vol1/run/172.16.0.2-data-glusterfs-vol1-brick1-brick.pid 
> -S /var/run/gluster/09285d60c2c8c9aa546602147a99a347.socket 
> --brick-name /data/glusterfs/vol1/brick1/brick -l 
> /var/log/glusterfs/bricks/data-glusterfs-vol1-brick1-brick.log 
> --xlator-option 
> *-posix.glusterd-uuid=7207ea30-41e9-4344-8fc3-47743b83629e 
> --brick-port 49154 --xlator-option vol1-server.listen-port=49154
>
>
> It seems that self-healing starts and brings down srv01, with 600% load.
>
> thanks,
> Pedro
>
> ------------------------------------------------------------------------
> Date: Fri, 29 May 2015 12:37:19 +0530
> From: pkarampu at redhat.com
> To: sgunfio at hotmail.com
> CC: Gluster-users at gluster.org
> Subject: Re: [Gluster-users] 100% cpu on brick replication
>
>
>
> On 05/29/2015 12:34 PM, Pedro Oriani wrote:
>
>     Hi Pranith,
>
>     It's certainly related to a replication / healing task, because it
>     occurs when you create a new replicated brick or when you bring
>     an old one back online.
>     The problem is that the cpu load on the online brick is so high
>     that I cannot do normal operations.
>     In my case when a replication / healing occurs, the cluster cannot
>     serve content.
>     I'm asking if there is a way to limit cpu usage in this case, or
>     set a less aggressive mode, because otherwise I have to rethink
>     the image repository.
>
> Disable self-heal. I see that you already did that for the self-heal 
> daemon. Let's do that for mounts as well.
> gluster volume set <volname> cluster.entry-self-heal off
>
> Let me know how that goes.
>
> Pranith
>
>
>     thanks,
>     Pedro
>
>     ------------------------------------------------------------------------
>     Date: Fri, 29 May 2015 11:14:29 +0530
>     From: pkarampu at redhat.com
>     To: sgunfio at hotmail.com; gluster-users at gluster.org
>     Subject: Re: [Gluster-users] 100% cpu on brick replication
>
>
>
>     On 05/27/2015 08:48 PM, Pedro Oriani wrote:
>
>         Hi All,
>         I'm writing because I'm experiencing an issue with Gluster's
>         replication feature.
>         I have a brick on srv1 with about 2TB of mixed-size files,
>         ranging from 10k to 300k.
>         When I add a new replica brick on srv2, the glusterfs
>         process takes all the CPU.
>         This is unacceptable because the volume does not respond to
>         normal r/w requests.
>
>         Glusterfs version is 3.7.0
>
>     Is it because of self-heals? Was the brick offline until then?
>
>     Pranith
>
>
>         The underlying volume is xfs.
>
>
>         Volume Name: vol1
>         Type: Replicate
>         Volume ID:
>         Status: Started
>         Number of Bricks: 1 x 2 = 2
>         Transport-type: tcp
>         Bricks:
>         Brick1: 172.16.0.1:/data/glusterfs/vol1/brick1/brick
>         Brick2: 172.16.0.2:/data/glusterfs/vol1/brick1/brick
>         Options Reconfigured:
>         performance.cache-size: 1gb
>         cluster.self-heal-daemon: off
>         cluster.data-self-heal-algorithm: full
>         cluster.metadata-self-heal: off
>         performance.cache-max-file-size: 2MB
>         performance.cache-refresh-timeout: 1
>         performance.stat-prefetch: off
>         performance.read-ahead: on
>         performance.quick-read: off
>         performance.write-behind-window-size: 4MB
>         performance.flush-behind: on
>         performance.write-behind: on
>         performance.io-thread-count: 32
>         performance.io-cache: on
>         network.ping-timeout: 2
>         nfs.addr-namelookup: off
>         performance.strict-write-ordering: on
>
>
>         Is there any parameter or hint I can follow to limit CPU
>         usage, so that replication causes little lag on normal
>         operations?
>
>         Thanks
>
>
