[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
Erik Jacobson
erik.jacobson at hpe.com
Tue Mar 31 06:50:11 UTC 2020
I note that this part of afr_read_txn() gets triggered a lot.
if (afr_is_inode_refresh_reqd(inode, this, local->event_generation,
event_generation)) {
Maybe that's normal when one of the three servers is down (but why
isn't it using its local copy by default?)
The comment in that if block is:
/* servers have disconnected / reconnected, and possibly
rebooted, very likely changing the state of freshness
of copies */
But we have one server consistently down, not a changing situation.
More digging seemed to show this is related to cache invalidation,
because the paths seemed to suggest the inode needed refreshing, and
that seems to be handled by a case statement named
GF_UPCALL_CACHE_INVALIDATION.
However, that must have been a wrong turn since turning off
cache invalidation didn't help.
I'm struggling to wrap my head around the code base and without the
background in these concepts it's a tough hill to climb.
I am going to have to go to bed and try this again another day with
fresh eyes; the machine I have easy access to is going away in the morning.
Now I'll have to reserve time on a contended one but I will do that and
continue digging.
Any suggestions would be greatly appreciated as I think I'm starting to
tip over here on this one.
On Mon, Mar 30, 2020 at 04:04:39PM -0500, Erik Jacobson wrote:
> > Sadly I am not a developer, so I can't answer your questions.
>
> I'm not a FS or network developer either. I think there is a joke about
> playing one on TV but maybe it's Netflix now.
>
> Enabling certain debug options made too much information for me to watch
> personally (but an expert could probably get through it).
>
> So I started putting targeted 'print' (gf_msg) statements in the code to
> see how it got its way to split-brain. Maybe this will ring a bell
> for someone.
>
> I can tell the only way we enter the split-brain path is through the
> first if statement of afr_read_txn_refresh_done().
>
> This means afr_read_txn_refresh_done() itself was passed "err" and
> that it appears thin_arbiter_count was not set (which makes sense,
> I'm using 1x3, not a thin arbiter).
>
> So we jump to the readfn label, and read_subvol should still be -1.
> If I read right, it must mean that this if didn't return true because
> my print statement didn't appear:
> if ((ret == 0) && spb_choice >= 0) {
>
> So we're still with the original read_subvol == -1,
> which gets us to the split-brain message.
>
> So now I will try to learn why afr_read_txn_refresh_done() would have
> 'err' set in the first place. I will also learn about
> afr_inode_split_brain_choice_get(). Those seem to be the two ways to
> have avoided falling into the split-brain hole here.
>
>
> I put debug statements in these locations. I will mark with !!!!!! what
> I see:
>
>
>
> diff -Narup glusterfs-7.2-orig/xlators/cluster/afr/src/afr-read-txn.c glusterfs-7.2-new/xlators/cluster/afr/src/afr-read-txn.c
> --- glusterfs-7.2-orig/xlators/cluster/afr/src/afr-read-txn.c 2020-01-15 11:43:53.887894293 -0600
> +++ glusterfs-7.2-new/xlators/cluster/afr/src/afr-read-txn.c 2020-03-30 15:45:02.917104321 -0500
> @@ -279,10 +279,14 @@ afr_read_txn_refresh_done(call_frame_t *
> priv = this->private;
>
> if (err) {
> - if (!priv->thin_arbiter_count)
> + if (!priv->thin_arbiter_count) {
> + gf_msg(this->name, GF_LOG_ERROR,0,0,"erikj dbg crapola 1st if in afr_read_txn_refresh_done() !priv->thin_arbiter_count -- goto to readfn");
> !!!!!!!!!!!!!!!!!!!!!!
> We hit this error condition and jump to readfn below
> !!!!!!!!!!!!!!!!!!!!!!!
> goto readfn;
> - if (err != EINVAL)
> + }
> + if (err != EINVAL) {
> + gf_msg(this->name, GF_LOG_ERROR,0,0,"erikj 2nd if in afr_read_txn_refresh_done() err != EINVAL, goto readfn");
> goto readfn;
> + }
> /* We need to query the good bricks and/or thin-arbiter.*/
> afr_ta_read_txn_synctask(frame, this);
> return 0;
> @@ -291,6 +295,8 @@ afr_read_txn_refresh_done(call_frame_t *
> read_subvol = afr_read_subvol_select_by_policy(inode, this, local->readable,
> NULL);
> if (read_subvol == -1) {
> + gf_msg(this->name, GF_LOG_ERROR,0,0,"erikj dbg whoops read_subvol returned -1, going to readfn");
> +
> err = EIO;
> goto readfn;
> }
> @@ -304,11 +310,15 @@ afr_read_txn_refresh_done(call_frame_t *
> readfn:
> if (read_subvol == -1) {
> ret = afr_inode_split_brain_choice_get(inode, this, &spb_choice);
> - if ((ret == 0) && spb_choice >= 0)
> + if ((ret == 0) && spb_choice >= 0) {
> !!!!!!!!!!!!!!!!!!!!!!
> We never get here, afr_inode_split_brain_choice_get() must not have
> returned what was needed to enter.
> !!!!!!!!!!!!!!!!!!!!!!
> + gf_msg(this->name, GF_LOG_ERROR,0,0,"erikj dbg read_subvol was -1 to begin with split brain choice found: %d", spb_choice);
> read_subvol = spb_choice;
> + }
> }
>
> if (read_subvol == -1) {
> + gf_msg(this->name, GF_LOG_ERROR,0,0,"erikj dbg verify this shows up above split-brain error");
> !!!!!!!!!!!!!!!!!!!!!!
> We hit here. Game over player.
> !!!!!!!!!!!!!!!!!!!!!!
> +
> AFR_SET_ERROR_AND_CHECK_SPLIT_BRAIN(-1, err);
> }
> afr_read_txn_wind(frame, this, read_subvol);