[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request

Erik Jacobson erik.jacobson at hpe.com
Sat Apr 4 15:42:53 UTC 2020


I had a co-worker (Scott Titus) look through this with me. He has a more
analytical mind than I do. Here is what he said, with some edits by me;
my edits were limited to formatting and adjusting some words. We were hoping
that, given this analysis, the community could let us know if it raises
any red flags that would lead to a solution to the problem (whether it
be setup, settings, or code). If needed, I can get Scott to work with me
and dig further, but it was starting to get painful where Scott stopped.

Scott's words (edited):

(all backtraces match - at least up to the point I'm concerned with at this
time)

The error was passed from afr_inode_refresh_done() into afr_txn_refresh_done():
afr_inode_refresh_done()'s call frame has 'error=0',
while afr_txn_refresh_done() has 'err=5' in its call frame.


#0  afr_read_txn_refresh_done (frame=0x7ffc949cf7c8, this=0x7fff640137b0,
    err=5) at afr-read-txn.c:281
#1  0x00007fff68901fdb in afr_txn_refresh_done (
    frame=frame@entry=0x7ffc949cf7c8, this=this@entry=0x7fff640137b0, err=5,
    err@entry=0) at afr-common.c:1223
#2  0x00007fff689022b3 in afr_inode_refresh_done (
    frame=frame@entry=0x7ffc949cf7c8, this=this@entry=0x7fff640137b0, error=0)
    at afr-common.c:1295
#3  0x00007fff6890f3fb in afr_inode_refresh_subvol_cbk (frame=0x7ffc949cf7c8,
    cookie=<optimized out>, this=0x7fff640137b0, op_ret=<optimized out>,
    op_errno=<optimized out>, buf=buf@entry=0x7ffd53ffdaa0,
    xdata=0x7ffd3c6764f8, par=0x7ffd53ffdb40) at afr-common.c:1333


Within afr_inode_refresh_done(), there are only two ways an error can be
generated: by setting it to EINVAL, or as the result of a failure status from
afr_has_quorum().  Since EINVAL is 22, not 5, the quorum test failed.

Within the afr_has_quorum() conditional, an error could be set
from afr_final_errno() or afr_quorum_errno().  Digging reveals
afr_quorum_errno() just returns ENOTCONN, which is 107, so that is not it.
This leaves us with afr_final_errno() returning the error.

(Scott provided me with source code with pieces bolded but I don't think
you need that).

afr_final_errno() iterates through the 'children', looking for
valid errors within the replies for the transaction (refresh transaction?).
The function returns the highest valued error, which must be EIO (value of 5)
in this case.

I have not looked into how or what would set the error value in the
replies array; since this is a distributed system, the error could have been
generated on another server. Unless this path needs to be investigated, I'd
rather not get mired in finding which iteration (value of 'i') has the error
and which system or thread added it to the reply.



Any suggested next steps?

> 
> On 01/04/20 8:57 am, Erik Jacobson wrote:
> > Here are some back traces. They make my head hurt. Maybe you can suggest
> > something else to try next? In the morning I'll try to unwind this
> > myself too in the source code but I suspect it will be tough for me.
> > 
> > 
> > (gdb) break xlators/cluster/afr/src/afr-read-txn.c:280 if err == 5
> > Breakpoint 1 at 0x7fff688e057b: file afr-read-txn.c, line 281.
> > (gdb) continue
> > Continuing.
> > [Switching to Thread 0x7ffecffff700 (LWP 50175)]
> > 
> > Thread 15 "glfs_epoll007" hit Breakpoint 1, afr_read_txn_refresh_done (
> >      frame=0x7fff48325d78, this=0x7fff640137b0, err=5) at afr-read-txn.c:281
> > 281	    if (err) {
> > (gdb) bt
> > #0  afr_read_txn_refresh_done (frame=0x7fff48325d78, this=0x7fff640137b0,
> >      err=5) at afr-read-txn.c:281
> > #1  0x00007fff68901fdb in afr_txn_refresh_done (
> >      frame=frame@entry=0x7fff48325d78, this=this@entry=0x7fff640137b0, err=5,
> >      err@entry=0) at afr-common.c:1223
> > #2  0x00007fff689022b3 in afr_inode_refresh_done (
> >      frame=frame@entry=0x7fff48325d78, this=this@entry=0x7fff640137b0, error=0)
> >      at afr-common.c:1295
> Hmm, afr_inode_refresh_done() is called with error=0 and by the time we
> reach afr_txn_refresh_done(), it becomes 5(i.e. EIO).
> So afr_inode_refresh_done() is changing it to 5. Maybe you can put
> breakpoints/ log messages in afr_inode_refresh_done() at the places where
> error is getting changed and see where the assignment happens.
> 
> 
> Regards,
> Ravi


