[Gluster-users] heal hanging

Wed Jan 20 15:42:55 UTC 2016


>I am having issues with 3.6.6 where the load will spike up to 800% for 
>one of the glusterfsd processes and the users can no longer access the 
>system.  If I reboot the node, the heal will finish normally after a 
>few minutes and the system will be responsive, but a few hours later 
>the issue will start again.  It looks like it is hanging in a heal and 
>spinning up the load on one of the bricks.  The heal gets stuck and 
>says it is crawling and never returns.  After a few minutes of the heal 
>saying it is crawling, the load spikes up and the mounts become 
>unresponsive.
>
>Any suggestions on how to fix this?  It has us stopped cold, as users 
>can no longer access the systems when the load spikes. Logs attached.
>
>System setup info is:
>
>[root@gfs01a ~]# gluster volume info homegfs
>
>Volume Name: homegfs
>Type: Distributed-Replicate
>Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
>Status: Started
>Number of Bricks: 4 x 2 = 8
>Transport-type: tcp
>Bricks:
>Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
>Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
>Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
>Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
>Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs
>Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs
>Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs
>Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs
>Options Reconfigured:
>performance.io-thread-count: 32
>performance.cache-size: 128MB
>performance.write-behind-window-size: 128MB
>server.allow-insecure: on
>network.ping-timeout: 42
>storage.owner-gid: 100
>geo-replication.indexing: off
>geo-replication.ignore-pid-check: on
>changelog.changelog: off
>changelog.fsync-interval: 3
>changelog.rollover-time: 15
>server.manage-gids: on
>diagnostics.client-log-level: WARNING
>
>[root@gfs01a ~]# rpm -qa | grep gluster
>gluster-nagios-common-0.1.1-0.el6.noarch
>glusterfs-fuse-3.6.6-1.el6.x86_64
>glusterfs-debuginfo-3.6.6-1.el6.x86_64
>glusterfs-libs-3.6.6-1.el6.x86_64
>glusterfs-geo-replication-3.6.6-1.el6.x86_64
>glusterfs-api-3.6.6-1.el6.x86_64
>glusterfs-devel-3.6.6-1.el6.x86_64
>glusterfs-api-devel-3.6.6-1.el6.x86_64
>glusterfs-3.6.6-1.el6.x86_64
>glusterfs-cli-3.6.6-1.el6.x86_64
>glusterfs-rdma-3.6.6-1.el6.x86_64
>samba-vfs-glusterfs-4.1.11-2.el6.x86_64
>glusterfs-server-3.6.6-1.el6.x86_64
>glusterfs-extra-xlators-3.6.6-1.el6.x86_64
>
>
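To capture the state the next time the heal reports "crawling" and the load climbs, a few read-only commands may help. This is only a sketch: the helper name heal_diag_cmds and the VOL/RUN variables are hypothetical, the volume name homegfs is taken from the output above, and the commands print without executing unless RUN=1 is set and the gluster CLI is present.

```shell
#!/bin/sh
# Hypothetical helper (not part of GlusterFS): list a few read-only
# diagnostics for a heal that appears stuck.
# VOL defaults to the volume named in the post above.
# RUN=1 executes the commands; the default only prints them.
VOL="${VOL:-homegfs}"
RUN="${RUN:-0}"

heal_diag_cmds() {
    # $1 = volume name; emits one command per line
    printf '%s\n' \
        "gluster volume status $1" \
        "gluster volume heal $1 info" \
        "gluster volume heal $1 info split-brain"
}

heal_diag_cmds "$VOL" | while IFS= read -r cmd; do
    if [ "$RUN" = "1" ] && command -v gluster >/dev/null 2>&1; then
        echo "### $cmd"
        # Intentionally unquoted so the string splits into command + args.
        $cmd
    else
        echo "$cmd"
    fi
done
```

The status output lists the PID of each brick process, which can be matched against the glusterfsd process showing the high load; the two heal queries list the entries still pending heal and any in split-brain.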
-------------- next part --------------
A non-text attachment was scrubbed...
Name: glusterfs-log.tgz
Type: application/x-compressed
Size: 6004421 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160120/11cdb723/attachment-0001.bin>