[Bugs] [Bug 1403984] New: One node high CPU - healing entries increasing

bugzilla at redhat.com bugzilla at redhat.com
Mon Dec 12 19:36:00 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1403984

            Bug ID: 1403984
           Summary: One node high CPU - healing entries increasing
           Product: GlusterFS
           Version: 3.8
         Component: core
          Severity: urgent
          Assignee: bugs at gluster.org
          Reporter: tu2Bgone at gmail.com
                CC: bugs at gluster.org



Created attachment 1230893
  --> https://bugzilla.redhat.com/attachment.cgi?id=1230893&action=edit
statedump from gluster node with high load
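
For reference, a statedump like the attached one can be taken with the standard CLI (a sketch, not the exact command used here; the volume name is from the volume info below, and the dump files typically land under /var/run/gluster):

sudo gluster volume statedump marketplace_nfs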

Description of problem:

3 x node Fedora Cluster in AWS (m4.xlarge) (Fedora 23 (Cloud Edition))
2.5Tb volume

One node out of the 3 gets high CPU and the number of healing entries keeps
increasing. The log on one of the nodes with low CPU usage keeps showing
messages similar to this:

I [MSGID: 115072] [server-rpc-fops.c:1640:server_setattr_cbk]
0-marketplace_nfs-server: 6954047: SETATTR /ftpdata/<removed>/2_kamih2.zip
(c46c2f49-9688-4617-9541-a7181b495f80) ==> (Operation not permitted) [Operation
not permitted]

I previously asked about this message on the mailing list but got no answer.

The logs on the other two nodes (one with high CPU and one with low) are quiet,
with only occasional heal messages.

To reduce the load we have to stop reading from and writing to the volumes.
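
The growing heal backlog can be watched with the usual heal commands (a sketch;
the volume name is from the volume info below):

sudo gluster volume heal marketplace_nfs statistics heal-count
sudo gluster volume heal marketplace_nfs info | grep 'Number of entries'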

After the trouble described in
https://bugzilla.redhat.com/show_bug.cgi?id=1402621 we upgraded the cluster to
3.8. The cluster of 3x m4.xlarge hosts (4 vCPU, 16 GB RAM) supports only 12
clients at most.

Version-Release number of selected component (if applicable):
3.8

How reproducible:
Performance issue occurring regularly.

Steps to Reproduce:
Running find /path -type f -exec stat {} \; from just one host is enough to
show a significant increase in load.
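
A minimal way to drive that metadata load from a single client looks roughly
like this (a sketch; /mnt/marketplace_nfs is a hypothetical mount point and any
of the three servers can be used for the mount):

sudo mount -t glusterfs 10.90.5.105:/marketplace_nfs /mnt/marketplace_nfs
find /mnt/marketplace_nfs -type f -exec stat {} \;

While that runs, CPU on the affected node can be compared against the other two
with top or uptime on each host.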


Additional info:

sudo gluster volume info

Volume Name: marketplace_nfs
Type: Distributed-Replicate
Volume ID: 528de1b5-0bd5-488b-83cf-c4f3f747e6cd
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.90.5.105:/data/data0/marketplace_nfs
Brick2: 10.90.3.14:/data/data3/marketplace_nfs
Brick3: 10.90.4.195:/data/data0/marketplace_nfs
Brick4: 10.90.5.105:/data/data1/marketplace_nfs
Brick5: 10.90.3.14:/data/data1/marketplace_nfs
Brick6: 10.90.4.195:/data/data1/marketplace_nfs
Options Reconfigured:
performance.client-io-threads: on
performance.io-thread-count: 12
server.event-threads: 3
client.event-threads: 3
server.outstanding-rpc-limit: 256
cluster.self-heal-readdir-size: 16KB
cluster.self-heal-window-size: 3
diagnostics.brick-log-level: INFO
network.ping-timeout: 15
cluster.quorum-type: none
performance.readdir-ahead: on
cluster.self-heal-daemon: enable
performance.cache-size: 1024MB
cluster.lookup-optimize: on
cluster.data-self-heal-algorithm: diff
nfs.disable: off
cluster.server-quorum-ratio: 51%
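
Each of the options above was reconfigured with the usual volume set command;
for example (an illustrative sketch, not the exact commands that were run):

sudo gluster volume set marketplace_nfs cluster.data-self-heal-algorithm diff
sudo gluster volume set marketplace_nfs performance.cache-size 1024MB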

sudo gluster volume status
Status of volume: marketplace_nfs
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.90.5.105:/data/data0/marketplace_n
fs                                          49155     0          Y       20611
Brick 10.90.3.14:/data/data3/marketplace_nf
s                                           49158     0          Y       23161
Brick 10.90.4.195:/data/data0/marketplace_n
fs                                          49155     0          Y       5504
Brick 10.90.5.105:/data/data1/marketplace_n
fs                                          49156     0          Y       20616
Brick 10.90.3.14:/data/data1/marketplace_nf
s                                           49159     0          Y       23166
Brick 10.90.4.195:/data/data1/marketplace_n
fs                                          49156     0          Y       5509
NFS Server on localhost                     2049      0          Y       23250
Self-heal Daemon on localhost               N/A       N/A        Y       23262
NFS Server on ip-10-90-4-195.ec2.internal   2049      0          Y       25289
Self-heal Daemon on ip-10-90-4-195.ec2.inte
rnal                                        N/A       N/A        Y       25297
NFS Server on ip-10-90-5-105.ec2.internal   2049      0          Y       8405
Self-heal Daemon on ip-10-90-5-105.ec2.inte
rnal                                        N/A       N/A        Y       8416

Task Status of Volume marketplace_nfs
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: marketplace_uploads
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.90.4.195:/data/data2/uploads       49157     0          Y       5528
Brick 10.90.3.14:/data/data2/uploads        49160     0          Y       23180
Brick 10.90.5.105:/data/data2/uploads       49157     0          Y       20621
NFS Server on localhost                     2049      0          Y       23250
Self-heal Daemon on localhost               N/A       N/A        Y       23262
NFS Server on ip-10-90-4-195.ec2.internal   2049      0          Y       25289
Self-heal Daemon on ip-10-90-4-195.ec2.inte
rnal                                        N/A       N/A        Y       25297
NFS Server on ip-10-90-5-105.ec2.internal   2049      0          Y       8405
Self-heal Daemon on ip-10-90-5-105.ec2.inte
rnal                                        N/A       N/A        Y       8416

Task Status of Volume marketplace_uploads
------------------------------------------------------------------------------
There are no active volume tasks
