[Gluster-users] Healing queue rarely empty
Nicolas Ecarnot
nicolas at ecarnot.net
Thu Dec 17 09:51:46 UTC 2015
On 17/12/2015 10:10, Nicolas Ecarnot wrote:
> Hello,
>
> Our setup: 3 CentOS 7.2 nodes, with gluster 3.7.6 in replica-3, used as
> storage+compute for an oVirt 3.5.6 DC.
>
> Two days ago, we added some nagios/centreon monitoring that checks the
> state of the heal queue every 5 minutes
> (something like "gluster volume heal some_vol info" with the appropriate
> grep).
>
> I expected the "Number of entries" of every node to appear in the graph
> as a flat zero line most of the time, except for the rare cases of a
> node reboot, after which healing is launched and takes some minutes
> (sometimes hours) but completes fine.
>
> Instead, we see the heal queue holding 2 or 3 files to heal about
> 4 times an hour, all day long.
>
> Our DC is a small one with few VMs, so no more than 8 big
> files are stored in glusterfs.
> I'm very surprised to see that these files constantly need healing, as I
> thought I had understood that reads/writes were synchronous at all times,
> and that replica-3 meant every file was fully synced and committed
> at all times.
>
> I've also read about the 10-minute cron-like job of the self-healing
> daemon, which we are using by default, but that is a secondary point.
>
> The first point leads to these questions:
> - Why do we see such frequent desynchronizations between nodes?
> - Which logs can I read to confirm this?
> - What must I check?
>
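
The heal-queue check described in the quoted message (running "gluster volume heal some_vol info" and extracting the "Number of entries" lines) could be sketched roughly as below. This is only an illustration, not the actual nagios/centreon plugin: the function names are hypothetical, and it assumes the output lists each brick on a line starting with "Brick " followed later by a "Number of entries: N" line.

```python
import re
import subprocess

# Matches lines such as "Number of entries: 3"; non-numeric values are skipped.
ENTRY_RE = re.compile(r"^Number of entries:\s*(\d+)\s*$")


def parse_heal_info(text):
    """Return {brick: pending_entries} from 'gluster volume heal VOL info' output.

    Assumes each brick section starts with a 'Brick <name>' line and contains
    one 'Number of entries: N' line (an assumption about the output layout).
    """
    counts = {}
    brick = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            brick = line[len("Brick "):]
            continue
        match = ENTRY_RE.match(line)
        if match and brick is not None:
            counts[brick] = int(match.group(1))
    return counts


def heal_counts(volume):
    """Run the heal info command for one volume and parse its output."""
    result = subprocess.run(
        ["gluster", "volume", "heal", volume, "info"],
        capture_output=True, text=True, check=True,
    )
    return parse_heal_info(result.stdout)
```

Calling heal_counts("some_vol") every 5 minutes and graphing the per-brick values would give the kind of graph described above; logging the file names listed in each brick section as well might help to see whether it is always the same files that show up.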
Replying to myself, as I found:
https://www.mail-archive.com/gluster-users%40gluster.org/msg20611.html
Should I be surprised to see this:

gluster volume get data cluster.op-version
Option                  Value
------                  -----
cluster.op-version      30600

in a 3.7.6 gluster cluster?
I have absolutely no idea what this means or how it changes
anything. But I see many things in my logs like:
Server and Client lk-version numbers are not same, reopening the fds
and many, many errors in etc-glusterfs-glusterd.vol.log about
missing options, other messages like 'Unable to release lock', and very
frequent vol reqs:
http://pastebin.com/e6nQfeLx
What is op-version used for?
--
Nicolas ECARNOT