[Bugs] [Bug 1402621] New: High load one node, gluster fuse clients hang, heal info does not complete
bugzilla at redhat.com
Thu Dec 8 00:22:04 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1402621
Bug ID: 1402621
Summary: High load one node, gluster fuse clients hang, heal
info does not complete
Product: GlusterFS
Version: 3.7.16
Component: glusterd
Severity: high
Assignee: bugs at gluster.org
Reporter: tu2Bgone at gmail.com
CC: bugs at gluster.org
Created attachment 1229281
--> https://bugzilla.redhat.com/attachment.cgi?id=1229281&action=edit
ftp gluster fuse client log (redacted personal information)
Description of problem:
We have a problem that has occurred twice in the last two days and more than
once before that.
3-node Fedora cluster in AWS (m4.xlarge, Fedora 23 Cloud Edition)
2.5 TB volume
Volume Name: marketplace_nfs
Type: Distributed-Replicate
Volume ID: 528de1b5-0bd5-488b-83cf-c4f3f747e6cd
Status: Started
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.90.5.105:/data/data0/marketplace_nfs
Brick2: 10.90.3.14:/data/data3/marketplace_nfs
Brick3: 10.90.4.195:/data/data0/marketplace_nfs
Brick4: 10.90.5.105:/data/data1/marketplace_nfs
Brick5: 10.90.3.14:/data/data1/marketplace_nfs
Brick6: 10.90.4.195:/data/data1/marketplace_nfs
Options Reconfigured:
server.outstanding-rpc-limit: 128
cluster.self-heal-readdir-size: 16KB
cluster.self-heal-window-size: 3
diagnostics.brick-log-level: INFO
network.ping-timeout: 15
cluster.quorum-type: none
performance.readdir-ahead: on
cluster.self-heal-daemon: enable
performance.cache-size: 512MB
cluster.lookup-optimize: on
cluster.data-self-heal-algorithm: diff
cluster.server-quorum-ratio: 51%
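For reference, here is a small Python sketch of how the "2 x 3 = 6" layout above maps bricks to replica sets. It assumes the usual GlusterFS rule that consecutive bricks, in the order listed, form one replica set. The function name replica_sets is made up for illustration and is not part of any Gluster tooling.

```python
def replica_sets(bricks, replica_count):
    """Group bricks into replica sets, in the order they are listed.

    Assumption: consecutive bricks in the volume info output form one
    replica set (the usual rule for Distributed-Replicate volumes).
    """
    if replica_count <= 0 or len(bricks) % replica_count != 0:
        raise ValueError("brick count must be a positive multiple of replica_count")
    return [bricks[i:i + replica_count]
            for i in range(0, len(bricks), replica_count)]


# Brick list copied from the volume info above.
BRICKS = [
    "10.90.5.105:/data/data0/marketplace_nfs",
    "10.90.3.14:/data/data3/marketplace_nfs",
    "10.90.4.195:/data/data0/marketplace_nfs",
    "10.90.5.105:/data/data1/marketplace_nfs",
    "10.90.3.14:/data/data1/marketplace_nfs",
    "10.90.4.195:/data/data1/marketplace_nfs",
]

for index, group in enumerate(replica_sets(BRICKS, 3), start=1):
    # Collect the distinct hosts in this replica set.
    hosts = sorted({brick.split(":", 1)[0] for brick in group})
    print("replica set %d: hosts=%s" % (index, ", ".join(hosts)))
    for brick in group:
        print("  " + brick)
```

Under that assumption, each replica set has one brick on each of the three nodes, so every file is stored on all three servers.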
Status of volume: marketplace_nfs
Gluster process                                  TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.90.5.105:/data/data0/marketplace_nfs    49152     0          Y       3426
Brick 10.90.3.14:/data/data3/marketplace_nfs     49154     0          Y       3402
Brick 10.90.4.195:/data/data0/marketplace_nfs    49152     0          Y       4868
Brick 10.90.5.105:/data/data1/marketplace_nfs    49153     0          Y       31636
Brick 10.90.3.14:/data/data1/marketplace_nfs     49153     0          Y       348
Brick 10.90.4.195:/data/data1/marketplace_nfs    49153     0          Y       31238
NFS Server on localhost                          2049      0          Y       3999
Self-heal Daemon on localhost                    N/A       N/A        Y       4008
NFS Server on ip-10-90-5-105.ec2.internal        2049      0          Y       1488
Self-heal Daemon on ip-10-90-5-105.ec2.internal  N/A       N/A        Y       1496
NFS Server on ip-10-90-4-195.ec2.internal        2049      0          Y       20526
Self-heal Daemon on ip-10-90-4-195.ec2.internal  N/A       N/A        Y       20534
Task Status of Volume marketplace_nfs
------------------------------------------------------------------------------
There are no active volume tasks
Version-Release number of selected component (if applicable):
3.7.16
How reproducible:
Cannot reproduce on demand but occurs frequently.
Actual results:
Client processes hang and cannot list the GlusterFS mount.
$ gluster volume heal marketplace_nfs info hangs and never lists the healing
information.
To recover, we shut down the clients (halting them, not just umount).
The gluster volume heal command then completes.
The load then starts to drop and we can remount.
Recovery takes around 20 minutes and causes significant problems.
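A rough way to detect the hang described above, rather than waiting on a stuck terminal, is to run the heal info command with a timeout. This is only a sketch: the helper names run_with_timeout and heal_info_responds and the 60 second default are assumptions for illustration. The only gluster command used is the one quoted above.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (finished, output).

    finished is False if the command was still running after timeout_s
    seconds; subprocess.run kills the child in that case.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False, ""
    return True, result.stdout


def heal_info_responds(volume, timeout_s=60):
    """Return True if 'gluster volume heal <volume> info' finishes in time.

    The 60 second default is an arbitrary choice, not a Gluster setting.
    """
    finished, _output = run_with_timeout(
        ["gluster", "volume", "heal", volume, "info"], timeout_s
    )
    return finished
```

On one of the Gluster servers, heal_info_responds("marketplace_nfs") returning False would correspond to the state where heal info does not complete.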
Expected results:
Clients do not hang, and heal info completes.
Additional info:
The average file size is about 13 MB; the largest files are around 5 GB. We do
some post-processing after the initial upload (mv, unzip, mv, delete). We have
the logs from the FTP server. Web servers also mount and work off this volume,
but we do not have logs from them.
The Gluster servers provide no useful logging during this time. I will attach
statedumps as well as the client log.
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.