[Gluster-users] Healing queue rarely empty

Nicolas Ecarnot nicolas at ecarnot.net
Thu Dec 17 09:10:30 UTC 2015


Our setup: 3 CentOS 7.2 nodes running Gluster 3.7.6 in replica 3, used as
storage+compute for an oVirt 3.5.6 DC.

Two days ago, we added some Nagios/Centreon monitoring that checks the
state of the heal queue every 5 minutes
(something like "gluster volume heal some_vol info" with the adequate grep).
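
To make the check concrete, here is a minimal sketch of that kind of probe. It
is not our actual plugin: the helper names and the sample text are
hypothetical, and I am assuming the command prints one "Brick ..." line per
brick, followed by the pending paths and a "Number of entries: N" line.

```python
import re
import subprocess

# Assumed output layout: "Brick <host>:<path>", pending paths, then
# "Number of entries: <N>". Adjust the patterns if your version differs.
ENTRY_RE = re.compile(r"^Number of entries:\s*(\d+)\s*$")


def parse_heal_info(text):
    """Return {brick: pending entry count} parsed from 'heal info' output."""
    counts = {}
    brick = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Brick "):
            # Remember which brick the following lines belong to.
            brick = line[len("Brick "):]
            continue
        match = ENTRY_RE.match(line)
        if match and brick is not None:
            counts[brick] = int(match.group(1))
    return counts


def heal_queue(volume):
    """Run the CLI on a cluster node and parse its output (hypothetical helper)."""
    result = subprocess.run(
        ["gluster", "volume", "heal", volume, "info"],
        capture_output=True, text=True, check=True,
    )
    return parse_heal_info(result.stdout)


# Illustrative sample only, not captured from our cluster.
SAMPLE = """\
Brick node1:/gluster/brick
/images/vm1.img
/images/vm2.img
Number of entries: 2

Brick node2:/gluster/brick
Number of entries: 0
"""
```

Calling parse_heal_info(SAMPLE) gives one counter per brick, which is the
value graphed for each node.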

I expected the "Number of entries" of every node to appear in the graph
as a flat zero line most of the time, except in the rare case of a
node reboot, after which healing is launched and takes some minutes
(sometimes hours) but completes fine.

Instead, we see the heal queue holding 2 or 3 files, about 4 times an
hour, all day long.

Our DC is a small one with few VMs, so no more than 8 big files are
stored in GlusterFS.
I'm very surprised to see that these files constantly need healing, as I
thought I had understood that reads/writes were synchronous at all times,
and that replica 3 meant that every file was fully synced and committed
at all times.

I've also read about the self-heal daemon's cron-like job that runs every
10 minutes, which we use with its default settings, but this is a second
point.

The first point leads to these questions:
- Why do we see such frequent desynchronization between nodes?
- Which logs can I read to confirm it?
- What must I check?


