[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
erik.jacobson at hpe.com
Thu Apr 2 07:02:46 UTC 2020
> Hmm, afr_inode_refresh_done() is called with error=0 and by the time we
> reach afr_txn_refresh_done(), it becomes 5(i.e. EIO).
> So afr_inode_refresh_done() is changing it to 5. Maybe you can put
> breakpoints/ log messages in afr_inode_refresh_done() at the places where
> error is getting changed and see where the assignment happens.
I had a lot of struggles tonight getting the system ready to go. I hit
segfaults (signal 11) in glusterfs (gNFS), but I think that was because
not all brick processes stopped along with glusterd, or possibly related
to my re-install and/or the added print statements; I'm not sure. I'm
not used to seeing that.
I put print statements everywhere I thought error could change and got
no printed log messages.
I put breakpoints where error would change and they were never hit.
I then set a conditional breakpoint at the call site:

break xlators/cluster/afr/src/afr-common.c:1298 if error != 0

(that line is the call: afr_txn_refresh_done(frame, this, error);)
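An alternative to breakpoints on fixed lines is a watchpoint, which stops on the instruction that writes the variable, wherever it is. A sketch using only standard gdb commands (the function and parameter names are taken from the backtrace below; line numbers are from my build):

(gdb) break afr_inode_refresh_done
(gdb) run
...
(gdb) watch error      # hardware watchpoint scoped to this frame
(gdb) continue         # gdb halts when 'error' is written, printing old and new values

This should catch the 0-to-5 transition even if it happens somewhere my print statements didn't cover.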
And it never triggered, despite split-brain messages and my debug prints
showing up in the log.
So I'm unable to explain this transition. I'm also not a gdb expert.
I still see the same back trace though.
#1  0x00007fff68938d7b in afr_txn_refresh_done (
    frame=frame@entry=0x7ffd540ed8e8, this=this@entry=0x7fff64013720, err=5,
    err@entry=0) at afr-common.c:1222
#2  0x00007fff689391f0 in afr_inode_refresh_done (
    frame=frame@entry=0x7ffd540ed8e8, this=this@entry=0x7fff64013720, error=0)
Is there other advice you might have for me to try?
I'm very eager to solve this problem, which is why I'm staying up late
to get machine time. I must go to bed now. I look forward to another
shot tomorrow night if you have more ideas to try.