[Gluster-users] heal hanging

Wed Jan 20 15:56:24 UTC 2016

Resending with parsed logs...

>>I am having issues with 3.6.6 where the load will spike up to 800% for 
>>one of the glusterfsd processes and the users can no longer access the 
>>system.  If I reboot the node, the heal will finish normally after a 
>>few minutes and the system will be responsive, but a few hours later 
>>the issue will start again.  It looks like it is hanging in a heal and 
>>spinning up the load on one of the bricks.  The heal gets stuck and 
>>says it is crawling and never returns.  After a few minutes of the 
>>heal saying it is crawling, the load spikes up and the mounts become 
>>unresponsive.
>>
>>Any suggestions on how to fix this?  It has us stopped cold, as the 
>>users can no longer access the systems when the load spikes... Logs 
>>attached.
>>
>>System setup info is:
>>
>>[root@gfs01a ~]# gluster volume info homegfs
>>
>>Volume Name: homegfs
>>Type: Distributed-Replicate
>>Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
>>Status: Started
>>Number of Bricks: 4 x 2 = 8
>>Transport-type: tcp
>>Bricks:
>>Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
>>Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
>>Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
>>Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
>>Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs
>>Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs
>>Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs
>>Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs
>>Options Reconfigured:
>>performance.io-thread-count: 32
>>performance.cache-size: 128MB
>>performance.write-behind-window-size: 128MB
>>server.allow-insecure: on
>>network.ping-timeout: 42
>>storage.owner-gid: 100
>>geo-replication.indexing: off
>>geo-replication.ignore-pid-check: on
>>changelog.changelog: off
>>changelog.fsync-interval: 3
>>changelog.rollover-time: 15
>>server.manage-gids: on
>>diagnostics.client-log-level: WARNING
>>
>>[root@gfs01a ~]# rpm -qa | grep gluster
>>gluster-nagios-common-0.1.1-0.el6.noarch
>>glusterfs-fuse-3.6.6-1.el6.x86_64
>>glusterfs-debuginfo-3.6.6-1.el6.x86_64
>>glusterfs-libs-3.6.6-1.el6.x86_64
>>glusterfs-geo-replication-3.6.6-1.el6.x86_64
>>glusterfs-api-3.6.6-1.el6.x86_64
>>glusterfs-devel-3.6.6-1.el6.x86_64
>>glusterfs-api-devel-3.6.6-1.el6.x86_64
>>glusterfs-3.6.6-1.el6.x86_64
>>glusterfs-cli-3.6.6-1.el6.x86_64
>>glusterfs-rdma-3.6.6-1.el6.x86_64
>>samba-vfs-glusterfs-4.1.11-2.el6.x86_64
>>glusterfs-server-3.6.6-1.el6.x86_64
>>glusterfs-extra-xlators-3.6.6-1.el6.x86_64
>>
>>
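For anyone reading the brick list above: in a Distributed-Replicate volume the bricks are grouped into replica sets in the order they are listed, so "4 x 2 = 8" means four pairs of two. Since a heal runs between the members of a replica set, it may help to know which brick mirrors the one whose glusterfsd is spinning. The sketch below only regroups the brick list from the volume info output; the function name replica_sets is made up for illustration and nothing here talks to gluster itself.

```python
# Minimal sketch: group the bricks of a Distributed-Replicate volume into
# replica sets. Assumption: bricks are consumed in listed order,
# replica_count at a time, as shown by "Number of Bricks: 4 x 2 = 8".

def replica_sets(bricks, replica_count):
    """Split an ordered brick list into consecutive groups of replica_count."""
    if replica_count < 1 or len(bricks) % replica_count != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica_count]
            for i in range(0, len(bricks), replica_count)]

# Brick list copied from the 'gluster volume info homegfs' output above.
bricks = [
    "gfsib01a.corvidtec.com:/data/brick01a/homegfs",
    "gfsib01b.corvidtec.com:/data/brick01b/homegfs",
    "gfsib01a.corvidtec.com:/data/brick02a/homegfs",
    "gfsib01b.corvidtec.com:/data/brick02b/homegfs",
    "gfsib02a.corvidtec.com:/data/brick01a/homegfs",
    "gfsib02b.corvidtec.com:/data/brick01b/homegfs",
    "gfsib02a.corvidtec.com:/data/brick02a/homegfs",
    "gfsib02b.corvidtec.com:/data/brick02b/homegfs",
]

pairs = replica_sets(bricks, 2)
for n, pair in enumerate(pairs):
    # The set index printed here is just a counter for this listing.
    print("replica set %d: %s" % (n, " <-> ".join(pair)))
```

With this layout each "a" host brick is paired with the matching "b" host brick, so a busy brick on gfs01a would be healing against its partner on gfs01b.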
-------------- next part --------------
A non-text attachment was scrubbed...
Name: glusterfs-log.tgz
Type: application/x-compressed
Size: 880609 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160120/34032f34/attachment-0001.bin>