[Gluster-users] Issues in AFR and self healing

Fri Aug 10 17:55:54 UTC 2018

Hello everyone!

I'm having trouble with something, though I'm not yet sure exactly
what. I'm running GlusterFS 3.12.6 on Ubuntu 16.04. I have two servers
(nodes) in the cluster in replica mode, and each server has 2 bricks.
The servers are KVM hosts running several VMs: on each server, one brick
holds the locally defined VMs, and the second brick is the replica of
the other server's VM brick. That second brick holds data, but nothing
writes to it apart from the replication.

                 Server 1                                   Server 2
Volume 1 (gv1):  Brick 1: defined VMs (read/write)   ---->  Brick 1: replicated qcow2 files
Volume 2 (gv2):  Brick 2: replicated qcow2 files     <----  Brick 2: defined VMs (read/write)

The main issue arose when I got a Nagios alarm warning about a file
listed as needing a heal, and then the alarm cleared on its own. I found
out that the self-heal daemon triggers a heal every 5 minutes, and this
fixes it. But looking at the logs, I see a lot of entries like this in
the glustershd.log file:

[2018-08-09 14:23:37.689403] I [MSGID: 108026] [afr-self-heal-common.c:1656:afr_log_selfheal] 0-gv1-replicate-0: Completed data selfheal on 407bd97b-e76c-4f81-8f59-7dae11507b0c. sources=[0]  sinks=1
[2018-08-09 14:44:37.933143] I [MSGID: 108026] [afr-self-heal-common.c:1656:afr_log_selfheal] 0-gv2-replicate-0: Completed data selfheal on 73713556-5b63-4f91-b83d-d7d82fee111f. sources=[0]  sinks=1

The qcow2 files are being healed several times a day (up to 30 times on
some days). As I understand it, this means that a data heal occurred on
the files with gfids 407b... and 7371..., from source to sink. Is that
from the local server to the replica server? Is it OK for the shd to
heal files in the replicated brick, which supposedly sees no writes
besides the mirroring? How does that work?
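
A minimal sketch of how such glustershd.log entries could be tallied per
day and gfid, assuming the "Completed ... selfheal" line format quoted
above. The names HEAL_RE and count_heals are illustrative and not part
of GlusterFS; whitespace is collapsed first so that entries wrapped by a
mail client still match.

```python
import re
from collections import Counter

# Matches "Completed <kind> selfheal on <gfid>" entries (MSGID 108026).
# The pattern is an assumption derived from the two log lines quoted above.
HEAL_RE = re.compile(
    r"\[(?P<date>\d{4}-\d{2}-\d{2}) [\d:.]+\] I \[MSGID: 108026\] "
    r"\[[^\]]*afr_log_selfheal\] (?P<subvol>\S+): "
    r"Completed (?P<kind>\w+) selfheal on (?P<gfid>[0-9a-f-]{36})"
)


def count_heals(log_text):
    """Count completed self-heals per (date, subvolume, kind, gfid)."""
    # Collapse every whitespace run (including newlines) to a single space,
    # so an entry split across several lines is seen as one record.
    flat = " ".join(log_text.split())
    counts = Counter()
    for m in HEAL_RE.finditer(flat):
        key = (m.group("date"), m.group("subvol"),
               m.group("kind"), m.group("gfid"))
        counts[key] += 1
    return counts
```

Feeding it the contents of glustershd.log (typically found under
/var/log/glusterfs) and sorting the result by count would show which
gfids are healed most often on each day.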

How does AFR replication work? The file with gfid 7371... is the qcow2
root disk of an ownCloud server with 17GB of data. I don't think that is
big enough to be a bottleneck of some sort.
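
For reference, a gfid from the log can be tied back to a file on the
brick. A minimal sketch, assuming the usual brick layout in which every
regular file has a hardlink at
.glusterfs/<first two hex chars>/<next two>/<gfid>. Both function names
are illustrative, and any brick path passed in is up to the reader.

```python
import os
import uuid


def gfid_backend_path(brick_root, gfid):
    """Return the .glusterfs hardlink path for a gfid on a brick."""
    g = str(uuid.UUID(gfid))  # validates and normalises to lowercase 8-4-4-4-12
    return os.path.join(brick_root, ".glusterfs", g[0:2], g[2:4], g)


def find_named_paths(brick_root, gfid):
    """Find the user-visible path(s) sharing an inode with the gfid hardlink.

    Similar in spirit to `find <brick> -samefile <gfid path>`, skipping the
    .glusterfs tree itself. Only meaningful for regular files.
    """
    target = os.stat(gfid_backend_path(brick_root, gfid))
    matches = []
    for dirpath, dirnames, filenames in os.walk(brick_root):
        if dirpath == brick_root and ".glusterfs" in dirnames:
            dirnames.remove(".glusterfs")  # do not descend into the metadata tree
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if (st.st_ino, st.st_dev) == (target.st_ino, target.st_dev):
                matches.append(path)
    return matches
```

Run on a brick (normally as root), find_named_paths would return the
qcow2 path that corresponds to a gfid such as the 7371... one above.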

Also, I was looking into the directory tree under
brick/.glusterfs/indices and noticed that both xattrop and dirty always
contain a file named xattrop-xxxxxx and dirty-xxxxxx, respectively. I
read that the xattrop file is like a parent file or handle, and that the
other files created there are hardlinks to it, named by gfid, for the
shd to heal. Is the same true for the ones in the dirty dir?
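
To see what is sitting in those index directories at a given moment, a
sketch along these lines could separate the base file from the
gfid-named entries. It assumes the layout described above (a base file
named <subdir>-<something> plus entries named by gfid); the name
pending_index_entries is hypothetical, and reading a real brick normally
needs root.

```python
import os
import uuid


def _is_gfid(name):
    """True if name is a canonical lowercase UUID string."""
    try:
        return str(uuid.UUID(name)) == name
    except ValueError:
        return False


def pending_index_entries(brick_root, subdir):
    """List base files and gfid-named entries under indices/<subdir>.

    subdir is "xattrop" or "dirty". Returns (base_files, entries), where
    entries is a list of (gfid, link_count) tuples.
    """
    index_dir = os.path.join(brick_root, ".glusterfs", "indices", subdir)
    base_files, entries = [], []
    for name in sorted(os.listdir(index_dir)):
        if name.startswith(subdir + "-"):
            base_files.append(name)
        elif _is_gfid(name):
            nlink = os.lstat(os.path.join(index_dir, name)).st_nlink
            entries.append((name, nlink))
    return base_files, entries
```

If the gfid-named entries really are hardlinks to the base file, each
reported link count should equal one plus the number of gfid entries in
that directory.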

Any help will be greatly appreciated. Thanks!

Pablo.
