[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request

Wed Apr 1 04:18:51 UTC 2020

On 01/04/20 8:57 am, Erik Jacobson wrote:
> Here are some back traces. They make my head hurt. Maybe you can suggest
> something else to try next? In the morning I'll try to unwind this
> myself too in the source code but I suspect it will be tough for me.
>
>
> (gdb) break xlators/cluster/afr/src/afr-read-txn.c:280 if err == 5
> Breakpoint 1 at 0x7fff688e057b: file afr-read-txn.c, line 281.
> (gdb) continue
> Continuing.
> [Switching to Thread 0x7ffecffff700 (LWP 50175)]
>
> Thread 15 "glfs_epoll007" hit Breakpoint 1, afr_read_txn_refresh_done (
>      frame=0x7fff48325d78, this=0x7fff640137b0, err=5) at afr-read-txn.c:281
> 281	    if (err) {
> (gdb) bt
> #0  afr_read_txn_refresh_done (frame=0x7fff48325d78, this=0x7fff640137b0,
>      err=5) at afr-read-txn.c:281
> #1  0x00007fff68901fdb in afr_txn_refresh_done (
>      frame=frame@entry=0x7fff48325d78, this=this@entry=0x7fff640137b0, err=5,
>      err@entry=0) at afr-common.c:1223
> #2  0x00007fff689022b3 in afr_inode_refresh_done (
>      frame=frame@entry=0x7fff48325d78, this=this@entry=0x7fff640137b0, error=0)
>      at afr-common.c:1295
Hmm, afr_inode_refresh_done() is called with error=0, and by the time we
reach afr_txn_refresh_done(), it becomes 5 (i.e. EIO).
So afr_inode_refresh_done() is changing it to 5. Maybe you can put
breakpoints or log messages in afr_inode_refresh_done() at the places
where error is getting changed and see where the assignment happens.
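
One way to catch the assignment without knowing the line in advance is a
gdb watchpoint on the variable. This is only a sketch: it assumes the build
has debug info and that `error` is not optimized out in that frame. Frame #1
above also shows `err=5, err@entry=0`, so it may be worth repeating the same
steps in afr_txn_refresh_done() with `watch err`.

```text
(gdb) break afr_inode_refresh_done
(gdb) continue
      ... wait until the breakpoint is hit ...
(gdb) watch error
(gdb) continue
```

Each time the value changes, gdb stops and prints the old and new values
along with the source line it stopped at. A watchpoint on a local variable
is removed automatically when that frame returns, so it has to be set again
on the next hit.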


Regards,
Ravi