[Gluster-users] Healing queue rarely empty

Nicolas Ecarnot nicolas at ecarnot.net
Thu Dec 17 09:51:46 UTC 2015

On 17/12/2015 10:10, Nicolas Ecarnot wrote:
> Hello,
> Our setup: 3 CentOS 7.2 nodes with gluster 3.7.6 in replica-3, used as
> storage+compute for an oVirt 3.5.6 DC.
> Two days ago, we added some nagios/centreon monitoring that checks the
> state of the heal queue every 5 minutes
> (something like "gluster volume heal some_vol info" with the adequate
> grep).
> I expected the "Number of entries" of every node to appear in the graph
> as a flat zero line most of the time, except for the rare cases of a
> node reboot, after which healing is launched and takes some minutes
> (sometimes hours) but completes fine.
> Instead, we see the heal queue holding 2 or 3 files to heal, about
> 4 times an hour, all day long.
> Our DC is a small one with few VMs, so no more than 8 big
> files are stored in glusterfs.
> I'm very surprised to see that these files constantly need healing, as I
> thought I had understood that reads/writes were synchronous at all times,
> and that replica-3 meant that every file was fully synced and committed
> at all times.
> I've also read about the 10-minute cron-like job of the self-healing
> daemon, which we are using by default, but this is a second point.
> The first point leads to:
> - Why do we see such frequent desynchronizations between nodes?
> - Which logs can I read to confirm that?
> - What must I check?
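
For reference, the check described in the quoted message could be sketched roughly as below. This is only a minimal sketch, not the actual nagios/centreon plugin: the function names, the sample text and the brick paths are made up for illustration, and it assumes the heal info output lists each brick on a "Brick ..." line followed by a "Number of entries: N" line.

```python
import re
import subprocess

# Matches the per-brick summary line of `gluster volume heal <vol> info`.
ENTRY_RE = re.compile(r"^Number of entries:\s*(\d+)\s*$")


def parse_heal_info(text):
    """Return {brick: pending_entries} parsed from heal-info text."""
    counts = {}
    brick = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Brick "):
            brick = line[len("Brick "):]
        else:
            match = ENTRY_RE.match(line)
            # Non-numeric counts (e.g. an unreachable brick) are skipped.
            if match and brick is not None:
                counts[brick] = int(match.group(1))
    return counts


def heal_counts(volume):
    """Run the gluster CLI (must be installed; usually needs root)."""
    result = subprocess.run(
        ["gluster", "volume", "heal", volume, "info"],
        check=True, capture_output=True, text=True,
    )
    return parse_heal_info(result.stdout)


# Illustrative sample only; NOT output from the cluster discussed here.
HEAL_SAMPLE = """\
Brick node1:/gluster/data/brick
/images/vm1.img
/images/vm2.img
Number of entries: 2

Brick node2:/gluster/data/brick
Number of entries: 0
"""

if __name__ == "__main__":
    for brick, pending in parse_heal_info(HEAL_SAMPLE).items():
        print(brick, pending)
```

Graphing one value per brick rather than a single total makes it easier to see whether the recurring entries always come from the same node.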

Replying to myself, as I found something:

does it make sense to be surprised to see this:

gluster volume get data cluster.op-version
Option                                  Value
------                                  -----
cluster.op-version                      30600

in a 3.7.6 gluster cluster?

I have absolutely no idea what this means or whether it changes
anything, but I see many messages in my logs like:

Server and Client lk-version numbers are not same, reopening the fds


There are also many errors in etc-glusterfs-glusterd.vol.log about
missing options, other messages like 'Unable to release lock', and very
frequent vol reqs.

What is op-version used for?
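
To make the comparison concrete, here is a small sketch that parses the "volume get" output shown above and compares it with the number one might expect for the installed release. It rests on an assumption I have not confirmed: that a release X.Y.Z maps to op-version X*10000 + Y*100 + Z. The function names are hypothetical.

```python
def parse_volume_get(text):
    """Return {option: value} from `gluster volume get` table output."""
    options = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the "Option Value" header and the dashed rule.
        if len(parts) != 2 or parts[0] == "Option" or parts[0].startswith("-"):
            continue
        options[parts[0]] = parts[1]
    return options


def expected_op_version(release):
    """ASSUMPTION: release X.Y.Z maps to X*10000 + Y*100 + Z."""
    major, minor, patch = (int(p) for p in release.split("."))
    return major * 10000 + minor * 100 + patch


# The output quoted earlier in this message.
OPVERSION_SAMPLE = """\
Option                                  Value
------                                  -----
cluster.op-version                      30600
"""

current = int(parse_volume_get(OPVERSION_SAMPLE)["cluster.op-version"])
wanted = expected_op_version("3.7.6")

if __name__ == "__main__":
    if current < wanted:
        print("op-version %d is lower than %d" % (current, wanted))
```

If that convention holds, 30600 would look like a 3.6.0 level rather than a 3.7.6 one, which is what surprised me.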

