[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request

Ravishankar N ravishankar at redhat.com
Wed Apr 15 08:35:31 UTC 2020


On 10/04/20 2:06 am, Erik Jacobson wrote:
> Once again thanks for sticking with us. Here is a reply from Scott
> Titus. If you have something for us to try, we'd love it. The code had
> your patch applied when gdb was run:
>
>
> Here is the addr2line output for those addresses.  Very interesting command, of
> which I was not aware.
>
> [root@leader3 ~]# addr2line -f -e/usr/lib64/glusterfs/7.2/xlator/cluster/afr.so 0x6f735
> afr_lookup_metadata_heal_check
> afr-common.c:2803
> [root@leader3 ~]# addr2line -f -e/usr/lib64/glusterfs/7.2/xlator/cluster/afr.so 0x6f0b9
> afr_lookup_done
> afr-common.c:2455
> [root@leader3 ~]# addr2line -f -e/usr/lib64/glusterfs/7.2/xlator/cluster/afr.so 0x5c701
> afr_inode_event_gen_reset
> afr-common.c:755
>
Right, so afr_lookup_done() is resetting the inode's event generation to
zero. This looks like a race between the lookup and inode-refresh code
paths. We have made some changes to the event-generation logic in AFR.
Can you apply the attached patch and see whether it fixes the split-brain
issue? It should apply cleanly on glusterfs-7.4.
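
To make the window concrete, here is a tiny illustrative pthread sketch.
This is not GlusterFS source and the names (lookup_path, refresh_path,
event_gen) are made up for illustration only; it just models the kind of
interleaving described above, where one thread resets a shared generation
counter while another is comparing it against a value it sampled earlier.

/* Illustrative only -- NOT GlusterFS code. One thread stands in for the
 * lookup path (cf. afr_lookup_done/afr_inode_event_gen_reset) and resets
 * a per-inode "event generation" counter; the other stands in for the
 * inode-refresh path and compares its cached generation against the
 * current one to decide whether its view is still fresh. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int event_gen = 5;        /* per-inode generation counter */

static void *lookup_path(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    event_gen = 0;               /* lookup resets the generation ... */
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void *refresh_path(void *arg)
{
    int cached = *(int *)arg;    /* generation sampled earlier */
    pthread_mutex_lock(&lock);
    /* ... so this comparison may see 0 and draw the wrong conclusion
     * about whether the cached state is still valid. */
    if (event_gen != cached)
        printf("refresh: gen %d != cached %d -> re-resolve state\n",
               event_gen, cached);
    else
        printf("refresh: gen unchanged, keep cached state\n");
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    int cached = event_gen;
    pthread_t t1, t2;

    pthread_create(&t1, NULL, lookup_path, NULL);
    pthread_create(&t2, NULL, refresh_path, &cached);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

Built with gcc -pthread and run a few times, which message prints depends
purely on thread scheduling; that nondeterminism is the essence of the
race being described.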

Thanks,
Ravi
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-afr-mark-pending-xattrs-as-a-part-of-metadata-heal.patch
Type: text/x-patch
Size: 3813 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20200415/404b33fd/attachment.bin>
