[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request

Sun Apr 5 08:35:29 UTC 2020

On 04/04/20 9:12 pm, Erik Jacobson wrote:
> This leaves us with afr_quorum_errno() returning the error.
>
> afr_final_errno() iterates through the 'children', looking for
> valid errors within the replies for the transaction (refresh transaction?).
> The function returns the highest valued error, which must be EIO (value of 5)
> in this case.
>
> I have not looked into how or what would set the error value in the
> replies array,

The error numbers that you see in the replies array in 
afr_final_errno() are set in afr_inode_refresh_subvol_cbk().

During inode refresh (which is essentially a lookup), AFR sends the 
lookup request to all its connected children, and the reply from each 
one is captured in afr_inode_refresh_subvol_cbk(). So adding a log 
there can identify whether we got EIO from any of the children. See 
attached patch for an example.
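
As a rough illustration of the idea (this is not the attached patch and 
not GlusterFS source), the standalone model below records each child's 
reply in a replies array and logs any failure. The names 
model_reply and model_refresh_subvol_cbk are hypothetical, and fprintf() 
stands in for GlusterFS's own logging facilities.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MODEL_MAX_CHILDREN 3

/* Hypothetical stand-in for one entry of AFR's replies array. */
struct model_reply {
    int valid;    /* 1 once this child has answered */
    int op_ret;   /* 0 on success, -1 on failure */
    int op_errno; /* errno reported by the child when op_ret is -1 */
};

/*
 * Model of a per-child refresh callback: store the reply, then log it
 * if the child failed. Returns 1 if this child reported EIO, else 0.
 */
static int
model_refresh_subvol_cbk(struct model_reply *replies, int child,
                         int op_ret, int op_errno)
{
    assert(child >= 0 && child < MODEL_MAX_CHILDREN);

    replies[child].valid = 1;
    replies[child].op_ret = op_ret;
    replies[child].op_errno = (op_ret < 0) ? op_errno : 0;

    if (op_ret < 0) {
        /* This is the kind of log line that shows which child failed. */
        fprintf(stderr, "inode refresh: child %d failed: errno=%d (%s)\n",
                child, op_errno, strerror(op_errno));
    }

    return (op_ret < 0 && op_errno == EIO);
}
```

With a log like this in place, a grep for the failing errno shows 
which child, if any, actually returned EIO.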

After we hear from all children, afr_inode_refresh_subvol_cbk() then 
calls afr_inode_refresh_done() --> afr_txn_refresh_done() --> 
afr_read_txn_refresh_done(). But you already know this flow now.
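
To make the "wait for every child, then decide" step concrete, here is 
a small standalone model, again not GlusterFS code. It counts 
outstanding replies and, when the last one arrives, picks the 
highest-valued errno, following the description of afr_final_errno() 
quoted above; the actual function may weigh errors differently. All 
sketch_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_MAX_CHILDREN 3

/* Hypothetical per-refresh state: one slot per child plus a counter. */
struct sketch_refresh {
    int pending;                        /* replies still outstanding */
    int op_ret[SKETCH_MAX_CHILDREN];
    int op_errno[SKETCH_MAX_CHILDREN];
    int done;                           /* 1 once the last reply arrived */
    int final_errno;                    /* highest errno seen, 0 if none */
};

/* Start a refresh that expects replies from 'connected' children. */
static void
sketch_refresh_init(struct sketch_refresh *r, int connected)
{
    assert(connected > 0 && connected <= SKETCH_MAX_CHILDREN);
    r->pending = connected;
    r->done = 0;
    r->final_errno = 0;
    for (int i = 0; i < SKETCH_MAX_CHILDREN; i++) {
        r->op_ret[i] = 0;
        r->op_errno[i] = 0;
    }
}

/* Return the highest-valued errno among the failed replies. */
static int
sketch_final_errno(const struct sketch_refresh *r)
{
    int highest = 0;
    for (int i = 0; i < SKETCH_MAX_CHILDREN; i++) {
        if (r->op_ret[i] < 0 && r->op_errno[i] > highest)
            highest = r->op_errno[i];
    }
    return highest;
}

/* Record one child's reply; the last reply triggers the "done" step. */
static void
sketch_refresh_reply(struct sketch_refresh *r, int child,
                     int op_ret, int op_errno)
{
    assert(child >= 0 && child < SKETCH_MAX_CHILDREN);
    assert(r->pending > 0);

    r->op_ret[child] = op_ret;
    r->op_errno[child] = (op_ret < 0) ? op_errno : 0;

    if (--r->pending == 0) {
        r->final_errno = sketch_final_errno(r);
        r->done = 1;
    }
}
```

In this model a single child replying with EIO is enough for the 
final errno to be EIO, unless another child reports a higher-valued 
error.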
-------------- next part --------------
A non-text attachment was scrubbed...
Name: log.patch
Type: text/x-patch
Size: 840 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20200405/fbd75665/attachment.bin>