[Gluster-users] Issues in AFR and self healing

Tue Aug 14 12:15:53 UTC 2018

Thanks for the info!

I can't see anything in the mount log besides one line every time it
rotates:

[2018-08-13 06:25:02.246187] I [glusterfsd-mgmt.c:1821:mgmt_getspec_cbk] 0-glusterfs: No change in volfile,continuing

However, in the volumes' glfsheal-gv1.log I did find a server-client
connection that was disconnected and now reconnects using a different
port. The log block for each run is rather long, so I've put it in a
pastebin:

https://pastebin.com/bp06rrsT

Maybe this has something to do with it?

Thanks!

Pablo.

On 08/11/2018 12:19 AM, Ravishankar N wrote:
>
>
>
> On 08/10/2018 11:25 PM, Pablo Schandin wrote:
>>
>> Hello everyone!
>>
>> I'm having some trouble, but I'm not quite sure with what yet. I'm
>> running GlusterFS 3.12.6 on Ubuntu 16.04. I have two servers (nodes)
>> in the cluster in replica mode. Each server has 2 bricks. As the
>> servers are KVM hosts running several VMs, one brick has some VMs
>> locally defined in it and the second brick holds the replica of the
>> other server's VMs. It has data, but no actual writes are done to it
>> except for the replication.
>>
>>                 Server 1                                 Server 2
>> Volume 1 (gv1): Brick 1 defined VMs (read/write)  ---->  Brick 1 replicated qcow2 files
>> Volume 2 (gv2): Brick 2 replicated qcow2 files    <----  Brick 2 defined VMs (read/write)
>>
>> So, the main issue arose when I got a Nagios alarm warning about a
>> file listed as needing heal, which then disappeared. I found out that
>> every 5 minutes the self-heal daemon triggers a heal and this fixes
>> it. But looking at the logs, I have a lot of entries like this in the
>> glustershd.log file:
>>
>> [2018-08-09 14:23:37.689403] I [MSGID: 108026] [afr-self-heal-common.c:1656:afr_log_selfheal] 0-gv1-replicate-0: Completed data selfheal on 407bd97b-e76c-4f81-8f59-7dae11507b0c. sources=[0]  sinks=1
>> [2018-08-09 14:44:37.933143] I [MSGID: 108026] [afr-self-heal-common.c:1656:afr_log_selfheal] 0-gv2-replicate-0: Completed data selfheal on 73713556-5b63-4f91-b83d-d7d82fee111f. sources=[0]  sinks=1
>>
>> The qcow2 files are being healed several times a day (up to 30 times
>> on some days). As I understand it, this means that a data heal occurred
>> on the files with gfid 407b... and 7371..., from source to sink. Local
>> server to replica server? Is it OK for the shd to heal files on the
>> replicated brick, which supposedly gets no writes besides the
>> mirroring? How does that work?
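To see how often each file is actually healed, the "Completed ... selfheal" lines can be tallied per day. This is a minimal Python sketch, assuming the glustershd.log line format quoted above; `HEAL_RE` and `count_heals` are hypothetical helpers, and the assumption that metadata and entry heals are logged with the same wording (only the heal kind differs) is mine, not confirmed by the log excerpt.

```python
import re
from collections import Counter
from typing import Iterable

# Modeled on the afr_log_selfheal lines quoted above, e.g.
# "[2018-08-09 14:23:37.689403] I [MSGID: 108026] [...] 0-gv1-replicate-0:
#  Completed data selfheal on <gfid>. sources=[0]  sinks=1"
# Assumption: other heal kinds use the same wording with a different <kind>.
HEAL_RE = re.compile(
    r"^\[(?P<date>\d{4}-\d{2}-\d{2}) [^\]]*\]"   # timestamp; keep only the day
    r".*\s(?P<subvol>[\w.-]+): Completed (?P<kind>\w+) selfheal on "
    r"(?P<gfid>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
)


def count_heals(lines: Iterable[str]) -> Counter:
    """Count completed heals per (day, replicate subvolume, heal kind, gfid)."""
    counts: Counter = Counter()
    for line in lines:
        m = HEAL_RE.search(line)
        if m:  # lines that are not heal completions are ignored
            counts[(m["date"], m["subvol"], m["kind"], m["gfid"])] += 1
    return counts
```

For example, opening the self-heal daemon log (usually /var/log/glusterfs/glustershd.log) and calling `count_heals(f).most_common(10)` lists the files healed most often on each day; a qcow2 file that shows up dozens of times a day would be consistent with writes repeatedly missing one brick.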
>>
> In AFR, for writes, there is no notion of a local/remote brick. No
> matter from which client you write to the volume, the write gets sent to
> both bricks, i.e. the replication is synchronous and real time.
>
>> How does AFR replication work? The file with gfid 7371... is the
>> qcow2 root disk of an ownCloud server with 17 GB of data. I don't think
>> it is big enough to be a bottleneck of some sort.
>>
>> Also, I was investigating the directory tree in
>> brick/.glusterfs/indices and I noticed that in both xattrop and dirty
>> there is always a file named xattrop-xxxxxx and dirty-xxxxxx,
>> respectively. I read that the xattrop file is a kind of base file or
>> handle that other files are created against as hardlinks named by
>> gfid, for the shd to heal. Is it the same for the ones in the dirty dir?
>>
> Yes, before the write, the gfid gets captured inside dirty on all 
> bricks. If the write is successful, it gets removed. In addition, if 
> the write fails on one brick, the other brick will capture the gfid 
> inside xattrop.
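The dirty/xattrop sequence described above can be illustrated with a toy model. This Python sketch is a simplification and not how AFR is implemented: real AFR tracks pending operations in `trusted.afr.*` extended attributes and keeps gfid-named hardlinks under `.glusterfs/indices`, whereas here each index is just a set. `Brick`, `afr_write`, `self_heal` and the shortened gfid "7371" are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Brick:
    """Toy brick: file contents plus the two index 'directories' as sets."""
    name: str
    data: dict = field(default_factory=dict)     # gfid -> file contents
    dirty: set = field(default_factory=set)      # stands in for indices/dirty
    xattrop: set = field(default_factory=set)    # stands in for indices/xattrop


def afr_write(bricks, gfid, contents, failing=frozenset()):
    """Send one write to every brick, regardless of which client issued it.

    `failing` names the bricks on which the write fails (e.g. a disconnect).
    """
    for b in bricks:                 # pre-op: capture the gfid in dirty everywhere
        b.dirty.add(gfid)
    ok = [b for b in bricks if b.name not in failing]
    for b in ok:                     # the write itself
        b.data[gfid] = contents
    for b in ok:                     # post-op on the bricks where it succeeded
        b.dirty.discard(gfid)
        if len(ok) < len(bricks):    # a replica missed the write: record it
            b.xattrop.add(gfid)


def self_heal(bricks):
    """Heal every gfid found in an xattrop index; return (gfid, sources, sinks)."""
    healed = []
    pending = set().union(*(b.xattrop for b in bricks))
    for gfid in sorted(pending):
        sources = [i for i, b in enumerate(bricks) if gfid in b.xattrop]
        sinks = [i for i in range(len(bricks)) if i not in sources]
        good = bricks[sources[0]].data[gfid]
        for i in sinks:              # copy the good data and clear the stale mark
            bricks[i].data[gfid] = good
            bricks[i].dirty.discard(gfid)
        for i in sources:
            bricks[i].xattrop.discard(gfid)
        healed.append((gfid, sources, sinks))
    return healed


bricks = [Brick("brick0"), Brick("brick1")]
afr_write(bricks, "7371", b"v1")                        # both bricks reachable
afr_write(bricks, "7371", b"v2", failing={"brick1"})   # brick1 misses this write
print(self_heal(bricks))                                # [('7371', [0], [1])]
```

In this model a single write that misses brick 1 is enough to produce a heal with source 0 and sink 1, the same shape as the `sources=[0]  sinks=1` entries in glustershd.log, which is why frequent heals point towards frequent disconnects.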
>>
>> Any help will be greatly appreciated. Thanks!
>>
> If frequent heals are triggered, it could mean there are frequent 
> network disconnects from the clients to the bricks as writes happen. 
> You can check the mount logs and see if that is the case and 
> investigate possible network issues.
>
> HTH,
> Ravi
>>
>> Pablo.
>
