[Gluster-users] Healing queue rarely empty
nicolas at ecarnot.net
Tue Dec 29 15:12:54 UTC 2015
On 17/12/2015 10:51, Nicolas Ecarnot wrote:
> On 17/12/2015 10:10, Nicolas Ecarnot wrote:
>> Our setup: 3 CentOS 7.2 nodes, with Gluster 3.7.6 in replica-3, used as
>> storage+compute for an oVirt 3.5.6 DC.
>> Two days ago, we added some Nagios/Centreon monitoring that checks the
>> state of the heal queue every 5 minutes
>> (something like "gluster volume heal some_vol info" with the adequate
>> I expected the "Number of entries" of every node to appear in the graph
>> as a flat zero line most of the time, except for the rare cases of a
>> node reboot, after which healing is launched and takes some minutes
>> (sometimes hours) but completes fine.
>> Instead, we see the heal queue holding 2 or 3 files to heal, say 4
>> times an hour, all day long.
>> Our DC is a small one with few VMs, so no more than 8 big files are
>> stored in GlusterFS.
>> I'm very surprised to see that these files constantly need healing, as I
>> thought I had understood that reads/writes were synchronous at all
>> times, and that replica-3 meant that every file was fully synced and
>> committed at all times.
>> I've also read about the 10-minute cron-like job of the self-heal
>> daemon, which we are using with its defaults, but that is a second point.
>> The first point leads to:
>> - Why do we see such frequent desynchronizations between nodes?
>> - Which logs can I read to confirm that?
>> - What must I check?
> Replying to myself with something I just found:
> should I be surprised to see this:
> gluster volume get data cluster.op-version
> Option                                  Value
> ------                                  -----
> cluster.op-version                      30600
> in a 3.7.6 gluster cluster?
OK, I bumped cluster.op-version up, but there is no improvement.
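
For reference, a minimal sketch of the kind of check described above: it sums the "Number of entries" lines printed by "gluster volume heal some_vol info" and maps the total to a Nagios exit code. This is not the actual probe used here. The function names, the thresholds, and the assumed output layout (a "Brick ..." line followed later by a "Number of entries: N" line for each brick) are illustrative assumptions and may need adjusting for other Gluster versions.

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_heal_info(text):
    """Return {brick: pending_entries} parsed from heal-info text.

    Assumes each brick section starts with a line "Brick <name>" and
    contains a line "Number of entries: <n>" (assumed layout).
    """
    counts = {}
    brick = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Brick "):
            brick = line[len("Brick "):]
        elif line.startswith("Number of entries:") and brick is not None:
            value = line.split(":", 1)[1].strip()
            # A non-numeric value is kept as None so the caller can
            # report UNKNOWN instead of silently counting it as zero.
            counts[brick] = int(value) if value.isdigit() else None
    return counts


def nagios_status(counts, warn=1, crit=10):
    """Map per-brick counts to (exit_code, message). Thresholds are hypothetical."""
    if not counts or any(v is None for v in counts.values()):
        return UNKNOWN, "UNKNOWN: could not read heal queue"
    total = sum(counts.values())
    message = "%d entries pending heal | entries=%d" % (total, total)
    if total >= crit:
        return CRITICAL, "CRITICAL: " + message
    if total >= warn:
        return WARNING, "WARNING: " + message
    return OK, "OK: " + message


def check_volume(volume):
    """Run the gluster CLI for one volume, print the status, return the exit code."""
    result = subprocess.run(
        ["gluster", "volume", "heal", volume, "info"],
        capture_output=True, text=True, check=True,
    )
    code, message = nagios_status(parse_heal_info(result.stdout))
    print(message)
    return code
```

A probe script would end with something like sys.exit(check_volume("some_vol")). Keeping the per-brick counts separate, rather than only the total, makes it easier to see whether the pending entries always show up on the same brick.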