<div dir="ltr"><span style="color:rgb(0,0,0);font-family:Calibri,Helvetica,sans-serif;font-size:16px">Unsubscribe</span><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Apr 6, 2020 at 6:10 PM Oskar Pienkos <<a href="mailto:oskarp10@hotmail.com">oskarp10@hotmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Unsubscribe</div>
<div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div id="gmail-m_4928644625267831244Signature">
<p>Sent from <a href="http://aka.ms/weboutlook" target="_blank">Outlook</a><br>
</p>
<div>
<div id="gmail-m_4928644625267831244appendonsend"></div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<hr style="display:inline-block;width:98%">
<div id="gmail-m_4928644625267831244divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> <a href="mailto:gluster-users-bounces@gluster.org" target="_blank">gluster-users-bounces@gluster.org</a> <<a href="mailto:gluster-users-bounces@gluster.org" target="_blank">gluster-users-bounces@gluster.org</a>> on behalf of <a href="mailto:gluster-users-request@gluster.org" target="_blank">gluster-users-request@gluster.org</a> <<a href="mailto:gluster-users-request@gluster.org" target="_blank">gluster-users-request@gluster.org</a>><br>
<b>Sent:</b> April 6, 2020 5:00 AM<br>
<b>To:</b> <a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a> <<a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a>><br>
<b>Subject:</b> Gluster-users Digest, Vol 144, Issue 6</font>
<div> </div>
</div>
<div><font size="2"><span style="font-size:11pt">
Send Gluster-users mailing list submissions to
	gluster-users@gluster.org

To subscribe or unsubscribe via the World Wide Web, visit
	https://lists.gluster.org/mailman/listinfo/gluster-users
or, via email, send a message with subject or body 'help' to
	gluster-users-request@gluster.org

You can reach the person managing the list at
	gluster-users-owner@gluster.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Gluster-users digest..."


Today's Topics:

   1. Re: gnfs split brain when 1 server in 3x1 down (high load) -
      help request (Erik Jacobson)
   2. Re: gnfs split brain when 1 server in 3x1 down (high load) -
      help request (Erik Jacobson)
   3. Re: Repository down ? (Hu Bert)
   4. One error/warning message after upgrade 5.11 -> 6.8 (Hu Bert)
   5. Re: gnfs split brain when 1 server in 3x1 down (high load) -
      help request (Ravishankar N)
   6. Gluster testcase Hackathon (Hari Gowtham)
   7. gluster v6.8: systemd units disabled after install (Hu Bert)


----------------------------------------------------------------------

Message: 1
Date: Sun, 5 Apr 2020 18:49:56 -0500
From: Erik Jacobson <erik.jacobson@hpe.com>
To: Ravishankar N <ravishankar@redhat.com>
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] gnfs split brain when 1 server in 3x1
	down (high load) - help request
Message-ID: <20200405234956.GB29598@metalio.americas.hpqcorp.net>
Content-Type: text/plain; charset=us-ascii

First, it's possible our analysis is off somewhere. I never get to your
print message. I put a debug statement at the start of the function so I
know we get there (just to verify my print statements were taking
effect).

I put a print statement in the if (call_count == 0) { block, right
after the if. I ran some tests.

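For reference, the second print looks roughly like the sketch below. I am
reconstructing it from memory rather than pasting from my tree, so the
surrounding lines and the message text are approximate:

    /* sketch only: extra logging once the last child reply arrives,
     * inside afr_inode_refresh_subvol_cbk(); the message text is mine */
    call_count = afr_frame_return(frame);
    if (call_count == 0) {
        gf_msg(this->name, GF_LOG_ERROR, op_errno, AFR_MSG_SPLIT_BRAIN,
               "erikj dbg last reply: op_ret:%d op_errno:%d",
               op_ret, op_errno);
        /* the existing completion path (afr_inode_refresh_done() and
         * friends) continues unchanged here */
    }
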
I suspect that isn't a problem area. There were some interesting results
with an NFS stale file handle error going through that path. Otherwise
it's always errno=0 even in the heavy test case. I'm not concerned about
a stale NFS file handle at the moment. That print was also hit heavily when
one server was down (which surprised me, but I don't know the internals).

I'm trying to re-read and work through Scott's message to see if any
other print statements might be helpful.

Thank you for your help so far. I will reply back if I find something.
Otherwise, suggestions welcome!

The MFG system I can access got smaller this weekend but is still large
enough to reproduce the error.

As you can tell, I work mostly at a level well above filesystem code, so
thank you for staying with me as I struggle through this.

Erik

> After we hear from all children, afr_inode_refresh_subvol_cbk() then calls afr_inode_refresh_done()-->afr_txn_refresh_done()-->afr_read_txn_refresh_done().
> But you already know this flow now.

> diff --git a/xlators/cluster/afr/src/afr-common.c b/xlators/cluster/afr/src/afr-common.c
> index 4bfaef9e8..096ce06f0 100644
> --- a/xlators/cluster/afr/src/afr-common.c
> +++ b/xlators/cluster/afr/src/afr-common.c
> @@ -1318,6 +1318,12 @@ afr_inode_refresh_subvol_cbk(call_frame_t *frame, void *cookie, xlator_t *this,
>          if (xdata)
>              local->replies[call_child].xdata = dict_ref(xdata);
>      }
> +    if (op_ret == -1)
> +        gf_msg_callingfn(
> +            this->name, GF_LOG_ERROR, op_errno, AFR_MSG_SPLIT_BRAIN,
> +            "Inode refresh on child:%d failed with errno:%d for %s(%s) ",
> +            call_child, op_errno, local->loc.name,
> +            uuid_utoa(local->loc.inode->gfid));
>      if (xdata) {
>          ret = dict_get_int8(xdata, "link-count", &need_heal);
>          local->replies[call_child].need_heal = need_heal;



------------------------------

Message: 2
Date: Sun, 5 Apr 2020 20:22:21 -0500
From: Erik Jacobson <erik.jacobson@hpe.com>
To: Ravishankar N <ravishankar@redhat.com>
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] gnfs split brain when 1 server in 3x1
	down (high load) - help request
Message-ID: <20200406012221.GD29598@metalio.americas.hpqcorp.net>
Content-Type: text/plain; charset=us-ascii

During the problem case, near as I can tell, in afr_final_errno(),
in the loop where tmp_errno = local->replies[i].op_errno is set,
the errno is always "2" when it gets to that point on server 3 (where
the NFS load is).

I never see a value other than 2.

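For context, the loop I mean is the one in afr_final_errno() in
afr-common.c. Paraphrased from memory -- the real source may differ in
details -- it is roughly:

    /* paraphrased sketch of afr_final_errno(), not the exact source */
    int
    afr_final_errno(afr_local_t *local, afr_private_t *priv)
    {
        int i = 0;
        int op_errno = 0;
        int tmp_errno = 0;

        for (i = 0; i < priv->child_count; i++) {
            if (!local->replies[i].valid)
                continue;
            if (local->replies[i].op_ret != -1)
                continue;
            tmp_errno = local->replies[i].op_errno;  /* always 2 (ENOENT) in my runs */
            op_errno = afr_higher_errno(op_errno, tmp_errno);
        }

        return op_errno;  /* the extra print described below went right before this */
    }
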
I later simply put the print at the end of the function too, to double-verify
non-zero return codes. There are thousands of non-zero return
codes, all 2 when not zero. Here is an example flow right before a
split-brain. I do not wish to imply the split-brain is related; it's
just an example log snip:


[2020-04-06 00:54:21.125373] E [MSGID: 0] [afr-common.c:2546:afr_final_errno] 0-erikj-afr_final_errno: erikj dbg afr_final_errno() errno from loop before afr_higher_errno was: 2
[2020-04-06 00:54:21.125374] E [MSGID: 0] [afr-common.c:2551:afr_final_errno] 0-erikj-afr_final_errno: erikj dbg returning non-zero: 2
[2020-04-06 00:54:23.315397] E [MSGID: 0] [afr-read-txn.c:283:afr_read_txn_refresh_done] 0-cm_shared-replicate-0: erikj dbg crapola 1st if in afr_read_txn_refresh_done() !priv->thin_arbiter_count -- goto to readfn
[2020-04-06 00:54:23.315432] E [MSGID: 108008] [afr-read-txn.c:314:afr_read_txn_refresh_done] 0-cm_shared-replicate-0: Failing READLINK on gfid 57f269ef-919d-40ec-b7fc-a7906fee648b: split-brain observed. [Input/output error]
[2020-04-06 00:54:23.315450] W [MSGID: 112199] [nfs3-helpers.c:3327:nfs3_log_readlink_res] 0-nfs-nfsv3: /image/images_ro_nfs/rhel8.0/usr/lib64/libmlx5.so.1 => (XID: 1fdba2bc, READLINK: NFS: 5(I/O error), POSIX: 5(Input/output error)) target: (null)


I am missing something. I will see if Scott and I can work together
tomorrow. Happy for any more ideas. Thank you!!

------------------------------

Message: 3
Date: Mon, 6 Apr 2020 06:03:22 +0200
From: Hu Bert <revirii@googlemail.com>
To: Renaud Fortier <Renaud.Fortier@fsaa.ulaval.ca>
Cc: "gluster-users@gluster.org" <gluster-users@gluster.org>
Subject: Re: [Gluster-users] Repository down ?
Message-ID:
	<CAAV-989P_=hNXj_PJdEBB8rG29b=jrzwDY3X+MBHi40zddzPgg@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

Good morning,

upgraded from 5.11 to 6.8 today; 2 servers worked smoothly, one again
had connection problems:

Err:1 https://download.gluster.org/pub/gluster/glusterfs/6/6.8/Debian/buster/amd64/apt buster/main amd64 libglusterfs-dev amd64 6.8-1
  Could not connect to download.gluster.org:443 (8.43.85.185), connection timed out
Err:2 https://download.gluster.org/pub/gluster/glusterfs/6/6.8/Debian/buster/amd64/apt buster/main amd64 libgfxdr0 amd64 6.8-1
  Unable to connect to download.gluster.org:https:

As a workaround I downloaded the packages manually on one of the other
2 servers, copied them to server3 and installed them manually.

Any idea why this happens? /etc/hosts and /etc/resolv.conf are identical.
The servers are behind the same gateway (switch in the datacenter of the
provider), and the server IPs differ only in the last octet.


Best regards,
Hubert

On Fri, Apr 3, 2020 at 10:33 AM Hu Bert <revirii@googlemail.com> wrote:
>
> ok, half an hour later it worked. Not funny during an upgrade. Strange... :-)
>
>
> Regards,
> Hubert
>
> On Fri, Apr 3, 2020 at 10:19 AM Hu Bert <revirii@googlemail.com> wrote:
> >
> > Hi,
> >
> > I'm currently preparing an upgrade 5.x -> 6.8; the download of the
> > repository key works on 2 of 3 servers. Nameserver settings are
> > identical. On the 3rd server I get this:
> >
> > wget -O - https://download.gluster.org/pub/gluster/glusterfs/6/rsa.pub
> > | apt-key add -
> > --2020-04-03 10:15:43--
> > https://download.gluster.org/pub/gluster/glusterfs/6/rsa.pub
> > Resolving download.gluster.org (download.gluster.org)... 8.43.85.185
> > Connecting to download.gluster.org
> > (download.gluster.org)|8.43.85.185|:443... failed: Connection timed
> > out.
> > Retrying.
> >
> > and this goes on and on... Which errors do you see?
> >
> >
> > Regards,
> > Hubert
> >
> > On Mon, Mar 30, 2020 at 8:40 PM Renaud Fortier
> > <Renaud.Fortier@fsaa.ulaval.ca> wrote:
> > >
> > > Hi,
> > >
> > > I'm trying to download packages from the gluster repository https://download.gluster.org/ but it failed for every download I've tried.
> > >
> > > Is it happening only to me?
> > >
> > > Thank you
> > >
> > > Renaud Fortier
> > >


------------------------------

Message: 4
Date: Mon, 6 Apr 2020 06:13:23 +0200
From: Hu Bert <revirii@googlemail.com>
To: gluster-users <gluster-users@gluster.org>
Subject: [Gluster-users] One error/warning message after upgrade 5.11
	-> 6.8
Message-ID:
	<CAAV-988kaD0bat2p3Xfw8vHYhY-pcekeoPOyLq297XEqRosSyw@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

Hello,

I just upgraded my servers and clients from 5.11 to 6.8; besides one
connection problem to the gluster download server everything went
fine.

On the 3 gluster servers I mount the 2 volumes as well, and only there
(and not on all the other clients) there are some messages in both
mount logs:

[2020-04-06 04:10:53.552561] W [MSGID: 114031]
[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
0-persistent-client-2: remote operation failed [Permission denied]
[2020-04-06 04:10:53.552635] W [MSGID: 114031]
[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
0-persistent-client-1: remote operation failed [Permission denied]
[2020-04-06 04:10:53.552639] W [MSGID: 114031]
[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
0-persistent-client-0: remote operation failed [Permission denied]
[2020-04-06 04:10:53.553226] E [MSGID: 148002]
[utime.c:146:gf_utime_set_mdata_setxattr_cbk] 0-persistent-utime: dict
set of key for set-ctime-mdata failed [Permission denied]
The message "W [MSGID: 114031]
[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
0-persistent-client-2: remote operation failed [Permission denied]"
repeated 4 times between [2020-04-06 04:10:53.552561] and [2020-04-06
04:10:53.745542]
The message "W [MSGID: 114031]
[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
0-persistent-client-1: remote operation failed [Permission denied]"
repeated 4 times between [2020-04-06 04:10:53.552635] and [2020-04-06
04:10:53.745610]
The message "W [MSGID: 114031]
[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
0-persistent-client-0: remote operation failed [Permission denied]"
repeated 4 times between [2020-04-06 04:10:53.552639] and [2020-04-06
04:10:53.745632]
The message "E [MSGID: 148002]
[utime.c:146:gf_utime_set_mdata_setxattr_cbk] 0-persistent-utime: dict
set of key for set-ctime-mdata failed [Permission denied]" repeated 4
times between [2020-04-06 04:10:53.553226] and [2020-04-06
04:10:53.746080]

Anything to worry about?


Regards,
Hubert

------------------------------

Message: 5
Date: Mon, 6 Apr 2020 10:06:41 +0530
From: Ravishankar N <ravishankar@redhat.com>
To: Erik Jacobson <erik.jacobson@hpe.com>
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] gnfs split brain when 1 server in 3x1
	down (high load) - help request
Message-ID: <4f5c5a73-69b9-f575-5974-d582e2a06051@redhat.com>
Content-Type: text/plain; charset=windows-1252; format=flowed

afr_final_errno() is called from many places other than the inode
refresh code path, so the 2 (ENOENT) could be from one of those (mostly
afr_lookup_done), but it is puzzling that you are not seeing EIO even
once when it is called from the afr_inode_refresh_subvol_cbk() code path.
Not sure what is happening here.

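If it helps narrow this down, one option would be something along the lines
of the sketch below inside afr_final_errno(), just before it returns.
gf_msg_callingfn() logs the calling function, so afr_lookup_done and the
inode refresh path would show up separately in the log. This is only an
illustration; the message text and msgid are placeholders:

    /* illustration: report the caller whenever afr_final_errno() is
     * about to return a non-zero errno */
    if (op_errno)
        gf_msg_callingfn(THIS->name, GF_LOG_ERROR, op_errno, 0,
                         "afr_final_errno() returning %d", op_errno);
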
------------------------------

Message: 6
Date: Mon, 6 Apr 2020 10:54:10 +0530
From: Hari Gowtham <hgowtham@redhat.com>
To: gluster-users <gluster-users@gluster.org>, gluster-devel
	<gluster-devel@gluster.org>
Subject: [Gluster-users] Gluster testcase Hackathon
Message-ID:
	<CAKh1kXshSzz1FxmoitCPLyX7_jXe+=j0Nqwt3__hfr=671jN8w@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi all,

We have been seeing a good number of CI test cases failing. This has
become a problem for taking in new fixes, and marking them as bad tests
reduces the test coverage. As a result, we are planning to have a
hackathon on 9th April. It will be a virtual one, happening on Google
Meet. Information to join can be found below [4].

We will be working on the tests that are currently failing spuriously [1]
and also on the test cases that have been marked as bad [2] in the past,
and fix these. The consolidated list [2] has all the test cases we will
look into during this hackathon. If any of you have come across a test
which has failed and was missed in the list, feel free to add the test
case and the link to the failure to the consolidated list [2].

The prerequisite is a CentOS machine where you can clone gluster and
work on it.
The action item is to take up as many test cases as possible (write
down your name against the test case to avoid rework), run each in your
local environment, find out why it fails, send out a fix for it, and
review others' patches. If you want to add more test cases as per your
usage, feel free to do so.

You can go through the link [3] to get a basic idea of how to set up and
contribute. If there are more questions, please feel free to ask them;
you can also discuss them with us during the hackathon.

We will send out a Google Calendar invite soon. The tentative timing is
from 11am to 4:30pm IST.

[1] https://fstat.gluster.org/summary
[2] https://docs.google.com/spreadsheets/d/1_j_JfJw1YjEziVT1pe8I_8-kMf7LVVmlleaukEikHw0/edit?usp=sharing
[3] https://docs.gluster.org/en/latest/Developer-guide/Simplified-Development-Workflow/
[4] To join the video meeting, click this link:
    https://meet.google.com/cde-uycs-koz
    Otherwise, to join by phone, dial +1 501-939-4201 and enter this PIN: 803 155 592#
    To view more phone numbers, click this link:
    https://tel.meet/cde-uycs-koz?hs=5

-- 
Regards,
Hari Gowtham.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20200406/4a25d280/attachment-0001.html>

------------------------------

Message: 7
Date: Mon, 6 Apr 2020 12:30:41 +0200
From: Hu Bert <revirii@googlemail.com>
To: gluster-users <gluster-users@gluster.org>
Subject: [Gluster-users] gluster v6.8: systemd units disabled after
	install
Message-ID:
	<CAAV-98-KT5iv63N32S_ta0A9Fw1YaKZQWhNjB5x=R4jU4NTbeg@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

Hello,

after a server reboot (with a fresh gluster 6.8 install) I noticed
that the gluster services weren't running.

systemctl status glusterd.service
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/lib/systemd/system/glusterd.service; disabled;
vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:glusterd(8)

Apr 06 11:34:18 glfsserver1 systemd[1]:
/lib/systemd/system/glusterd.service:9: PIDFile= references path below
legacy directory /var/run/, updating /var/run/glusterd.pid →
/run/glusterd.pid; please update the unit file accordingly.

systemctl status glustereventsd.service
● glustereventsd.service - Gluster Events Notifier
   Loaded: loaded (/lib/systemd/system/glustereventsd.service;
disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:glustereventsd(8)

Apr 06 11:34:27 glfsserver1 systemd[1]:
/lib/systemd/system/glustereventsd.service:11: PIDFile= references
path below legacy directory /var/run/, updating
/var/run/glustereventsd.pid → /run/glustereventsd.pid; please update
the unit file accordingly.

You have to enable them manually:

systemctl enable glusterd.service
Created symlink
/etc/systemd/system/multi-user.target.wants/glusterd.service →
/lib/systemd/system/glusterd.service.
systemctl enable glustereventsd.service
Created symlink
/etc/systemd/system/multi-user.target.wants/glustereventsd.service →
/lib/systemd/system/glustereventsd.service.

Is this a bug? If so, is it already known?


Regards,
Hubert

------------------------------

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

End of Gluster-users Digest, Vol 144, Issue 6
*********************************************
________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

-- 
Suvendu Mitra
GSM - +358504821066