[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
Ravishankar N
ravishankar at redhat.com
Tue Mar 31 09:57:59 UTC 2020
From your reply in the other thread, I'm assuming that the file/gfid in
question is not in genuine split-brain or needing heal. i.e. for example
with that 1 brick down and 2 bricks up test case, if you tried to read
the file from, say, a temporary fuse mount (which is also now connected
to only 2 bricks since the 3rd one is down) it works fine and there is
no EIO error...
...which means that what you have observed is true, i.e.
afr_read_txn_refresh_done() is called with err=EIO. You can add logs to
see at what point EIO is set. The call graph is like this:
afr_inode_refresh_done()-->afr_txn_refresh_done()-->afr_read_txn_refresh_done().
Maybe
https://github.com/gluster/glusterfs/blob/v7.4/xlators/cluster/afr/src/afr-common.c#L1188
in afr_txn_refresh_done() is causing it either due to ret being -EIO or
event_generation being zero.
If you are comfortable with gdb, you can put a conditional breakpoint in
afr_read_txn_refresh_done() at
https://github.com/gluster/glusterfs/blob/v7.4/xlators/cluster/afr/src/afr-read-txn.c#L283
when err=EIO and then check the backtrace for who is setting err to EIO.
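
To make the path under discussion concrete, here is a minimal standalone
sketch of the branch logic visible in the diff quoted further down. It is
not the real AFR code: struct refresh_inputs, model_refresh_done() and
MODEL_TA_QUERY are hypothetical names invented for illustration, the real
frame/inode/priv plumbing is replaced by plain ints, and the code between
the two hunks that is not shown in the diff is skipped.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical sentinel: the real code hands off to the thin-arbiter
 * synctask here and returns 0. */
#define MODEL_TA_QUERY (-2)

/* Hypothetical stand-ins for values the real function reads from
 * priv, local and the inode context. */
struct refresh_inputs {
    int err;                /* err passed to afr_read_txn_refresh_done() */
    int thin_arbiter_count; /* priv->thin_arbiter_count */
    int policy_subvol;      /* afr_read_subvol_select_by_policy() result */
    int spb_ret;            /* afr_inode_split_brain_choice_get() return */
    int spb_choice;         /* split-brain choice it reports, or -1 */
};

/* Returns the read subvolume that would be wound to, -1 when the
 * split-brain error path is reached, or MODEL_TA_QUERY. On -1,
 * *out_err holds the errno that would be reported. */
int model_refresh_done(const struct refresh_inputs *in, int *out_err)
{
    int read_subvol = -1;
    int err = in->err;

    *out_err = 0;

    if (err) {
        if (!in->thin_arbiter_count)
            goto readfn;            /* the branch hit in the 1x3 case */
        if (err != EINVAL)
            goto readfn;
        return MODEL_TA_QUERY;      /* query good bricks / thin-arbiter */
    }

    read_subvol = in->policy_subvol;
    if (read_subvol == -1) {
        err = EIO;
        goto readfn;
    }

readfn:
    if (read_subvol == -1) {
        /* Last chance: a split-brain choice set on the inode. */
        if (in->spb_ret == 0 && in->spb_choice >= 0)
            read_subvol = in->spb_choice;
    }

    if (read_subvol == -1) {
        *out_err = err;             /* where the split-brain message fires */
        return -1;
    }
    return read_subvol;             /* afr_read_txn_wind() to this subvol */
}
```

With err = EIO, thin_arbiter_count = 0 and spb_choice = -1 the model
returns -1 with *out_err == EIO, whatever policy_subvol is. In this model
only two things avoid that result: err arriving as 0, or a split-brain
choice being set on the inode.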
Regards,
Ravi
On 31/03/20 12:20 pm, Erik Jacobson wrote:
> I note that this part of afr_read_txn() gets triggered a lot.
>
> if (afr_is_inode_refresh_reqd(inode, this, local->event_generation,
> event_generation)) {
>
> Maybe that's normal when one of the three servers is down (but why
> isn't it using its local copy by default?)
>
> The comment in that if block is:
> /* servers have disconnected / reconnected, and possibly
> rebooted, very likely changing the state of freshness
> of copies */
>
> But we have one server consistently down, not a changing situation.
>
> digging digging digging seemed to show this related to cache
> invalidation.... Because the paths seemed to suggest the inode needed
> refreshing and that seems handled by a case statement named
> GF_UPCALL_CACHE_INVALIDATION
>
> However, that must have been a wrong turn since turning off
> cache invalidation didn't help.
>
> I'm struggling to wrap my head around the code base and without the
> background in these concepts it's a tough hill to climb.
>
> I am going to have to try this again some day with fresh eyes and go to
> bed; the machine I have easy access to is going away in the morning.
> Now I'll have to reserve time on a contended one but I will do that and
> continue digging.
>
> Any suggestions would be greatly appreciated as I think I'm starting to
> tip over here on this one.
>
>
> On Mon, Mar 30, 2020 at 04:04:39PM -0500, Erik Jacobson wrote:
>>> Sadly I am not a developer, so I can't answer your questions.
>> I'm not a FS or network developer either. I think there is a joke about
>> playing one on TV but maybe it's netflix now.
>>
>> Enabling certain debug options made too much information for me to watch
>> personally (but an expert could probably get through it).
>>
>> So I started putting targeted 'print' (gf_msg) statements in the code to
>> see how it got its way to split-brain. Maybe this will ring a bell
>> for someone.
>>
>> I can tell the only way we enter the split-brain path is through the
>> first if statement of afr_read_txn_refresh_done().
>>
>> This means afr_read_txn_refresh_done() itself was passed "err" and
>> that it appears thin_arbiter_count was not set (which makes sense,
>> I'm using 1x3, not a thin arbiter).
>>
>> So we jump to the readfn label, and read_subvol should still be -1.
>> If I read right, it must mean that this if didn't return true because
>> my print statement didn't appear:
>> if ((ret == 0) && spb_choice >= 0) {
>>
>> So we're still with the original read_subvol == -1,
>> which gets us to the split_brain message.
>>
>> So now I will try to learn why afr_read_txn_refresh_done() would have
>> 'err' set in the first place. I will also learn about
>> afr_inode_split_brain_choice_get(). Those seem to be the two ways we
>> could have avoided falling into the split-brain hole here.
>>
>>
>> I put debug statements in these locations. I will mark with !!!!!! what
>> I see:
>>
>>
>>
>> diff -Narup glusterfs-7.2-orig/xlators/cluster/afr/src/afr-read-txn.c glusterfs-7.2-new/xlators/cluster/afr/src/afr-read-txn.c
>> --- glusterfs-7.2-orig/xlators/cluster/afr/src/afr-read-txn.c 2020-01-15 11:43:53.887894293 -0600
>> +++ glusterfs-7.2-new/xlators/cluster/afr/src/afr-read-txn.c 2020-03-30 15:45:02.917104321 -0500
>> @@ -279,10 +279,14 @@ afr_read_txn_refresh_done(call_frame_t *
>> priv = this->private;
>>
>> if (err) {
>> - if (!priv->thin_arbiter_count)
>> + if (!priv->thin_arbiter_count) {
>> + gf_msg(this->name, GF_LOG_ERROR,0,0,"erikj dbg crapola 1st if in afr_read_txn_refresh_done() !priv->thin_arbiter_count -- goto to readfn");
>> !!!!!!!!!!!!!!!!!!!!!!
>> We hit this error condition and jump to readfn below
>> !!!!!!!!!!!!!!!!!!!!!!!
>> goto readfn;
>> - if (err != EINVAL)
>> + }
>> + if (err != EINVAL) {
>> + gf_msg(this->name, GF_LOG_ERROR,0,0,"erikj 2nd if in afr_read_txn_refresh_done() err != EINVAL, goto readfn");
>> goto readfn;
>> + }
>> /* We need to query the good bricks and/or thin-arbiter.*/
>> afr_ta_read_txn_synctask(frame, this);
>> return 0;
>> @@ -291,6 +295,8 @@ afr_read_txn_refresh_done(call_frame_t *
>> read_subvol = afr_read_subvol_select_by_policy(inode, this, local->readable,
>> NULL);
>> if (read_subvol == -1) {
>> + gf_msg(this->name, GF_LOG_ERROR,0,0,"erikj dbg whoops read_subvol returned -1, going to readfn");
>> +
>> err = EIO;
>> goto readfn;
>> }
>> @@ -304,11 +310,15 @@ afr_read_txn_refresh_done(call_frame_t *
>> readfn:
>> if (read_subvol == -1) {
>> ret = afr_inode_split_brain_choice_get(inode, this, &spb_choice);
>> - if ((ret == 0) && spb_choice >= 0)
>> + if ((ret == 0) && spb_choice >= 0) {
>> !!!!!!!!!!!!!!!!!!!!!!
>> We never get here, afr_inode_split_brain_choice_get() must not have
>> returned what was needed to enter.
>> !!!!!!!!!!!!!!!!!!!!!!
>> + gf_msg(this->name, GF_LOG_ERROR,0,0,"erikj dbg read_subvol was -1 to begin with split brain choice found: %d", spb_choice);
>> read_subvol = spb_choice;
>> + }
>> }
>>
>> if (read_subvol == -1) {
>> + gf_msg(this->name, GF_LOG_ERROR,0,0,"erikj dbg verify this shows up above split-brain error");
>> !!!!!!!!!!!!!!!!!!!!!!
>> We hit here. Game over player.
>> !!!!!!!!!!!!!!!!!!!!!!
>> +
>> AFR_SET_ERROR_AND_CHECK_SPLIT_BRAIN(-1, err);
>> }
>> afr_read_txn_wind(frame, this, read_subvol);
>