[Gluster-users] 3.8.2 : Node not healing

Mon Aug 15 06:49:24 UTC 2016

Moved to a new subject, as it's now an issue on our cluster.

As an experiment I killed glusterfsd on one node. The system kept
chugging along fine with no hiccups. I ran a few disk-intensive VMs
on that node and others, with no real slowdown. I monitored it with
"heal statistics heal-count".
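
For anyone wanting to track the same number over time, here is a rough
sketch in Python. It assumes the full command is
"gluster volume heal <VOLNAME> statistics heal-count" and that each brick
is reported on a line of the form "Number of entries: N"; check both
against your own version's output. The function names are my own, not
part of gluster.

```python
import re
import subprocess


def total_heal_count(output: str) -> int:
    """Sum the per-brick counts found in heal-count output.

    Assumes each brick is reported as 'Number of entries: N'.
    Lines with a non-numeric count (e.g. a brick that is down)
    are skipped.
    """
    counts = re.findall(r"^Number of entries:\s*(\d+)\s*$", output, re.MULTILINE)
    return sum(int(n) for n in counts)


def read_heal_count(volume: str) -> int:
    """Run the gluster CLI once and return the summed heal count."""
    result = subprocess.run(
        ["gluster", "volume", "heal", volume, "statistics", "heal-count"],
        capture_output=True,
        text=True,
        check=True,  # raise if the CLI reports an error
    )
    return total_heal_count(result.stdout)
```

Calling read_heal_count("myvol") every minute or so (the volume name is a
placeholder) and logging the result makes it easy to see whether the
count is actually falling or just sitting still.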

heal-count got up to approx 2500 shards, and I then restarted glusterfsd
by restarting the gluster service (glusterd).

heal-count stopped rising, but what is concerning is that it doesn't
seem to be going back down. 45 minutes later it's stable at 2439 files
needing heal, and glusterfsd is thrashing the CPUs on that node
(1000%!)

The glfsheal log has no entries at all.

Previously (3.7.x) when I've done this test, heals kicked in very rapidly.

Three hours later, still no progress in the heal at all. VMs on other
nodes are getting occasional read timeouts.

heal-count = 2550, and not changing.

-- 
Lindsay