<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>

</head>

<body dir="ltr">

<div style="font-family: Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

Unsubscribe</div>

<div>

<div style="font-family: Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

<br>

</div>

<div id="Signature">

<p>Sent from <a href="http://aka.ms/weboutlook">Outlook</a><br>

</p>

<div>

<div id="appendonsend"></div>

<div style="font-family:Calibri,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">

<br>

</div>

<hr tabindex="-1" style="display:inline-block; width:98%">

<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> gluster-users-bounces@gluster.org &lt;gluster-users-bounces@gluster.org&gt; on behalf of gluster-users-request@gluster.org &lt;gluster-users-request@gluster.org&gt;<br>

<b>Sent:</b> April 6, 2020 5:00 AM<br>

<b>To:</b> gluster-users@gluster.org &lt;gluster-users@gluster.org&gt;<br>

<b>Subject:</b> Gluster-users Digest, Vol 144, Issue 6</font>

<div>&nbsp;</div>

</div>

<div class="BodyFragment"><font size="2"><span style="font-size:11pt">

<div class="PlainText">Send Gluster-users mailing list submissions to<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; gluster-users@gluster.org<br>

<br>

To subscribe or unsubscribe via the World Wide Web, visit<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <a href="https://lists.gluster.org/mailman/listinfo/gluster-users">https://lists.gluster.org/mailman/listinfo/gluster-users</a><br>

or, via email, send a message with subject or body 'help' to<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; gluster-users-request@gluster.org<br>

<br>

You can reach the person managing the list at<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; gluster-users-owner@gluster.org<br>

<br>

When replying, please edit your Subject line so it is more specific<br>

than &quot;Re: Contents of Gluster-users digest...&quot;<br>

<br>

<br>

Today's Topics:<br>

<br>

&nbsp;&nbsp; 1. Re: gnfs split brain when 1 server in 3x1 down (high load) -<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; help request (Erik Jacobson)<br>

&nbsp;&nbsp; 2. Re: gnfs split brain when 1 server in 3x1 down (high load) -<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; help request (Erik Jacobson)<br>

&nbsp;&nbsp; 3. Re: Repository down ? (Hu Bert)<br>

&nbsp;&nbsp; 4. One error/warning message after upgrade 5.11 -&gt; 6.8 (Hu Bert)<br>

&nbsp;&nbsp; 5. Re: gnfs split brain when 1 server in 3x1 down (high load) -<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; help request (Ravishankar N)<br>

&nbsp;&nbsp; 6. Gluster testcase Hackathon (Hari Gowtham)<br>

&nbsp;&nbsp; 7. gluster v6.8: systemd units disabled after install (Hu Bert)<br>

<br>

<br>

----------------------------------------------------------------------<br>

<br>

Message: 1<br>

Date: Sun, 5 Apr 2020 18:49:56 -0500<br>

From: Erik Jacobson &lt;erik.jacobson@hpe.com&gt;<br>

To: Ravishankar N &lt;ravishankar@redhat.com&gt;<br>

Cc: gluster-users@gluster.org<br>

Subject: Re: [Gluster-users] gnfs split brain when 1 server in 3x1<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; down (high load) - help request<br>

Message-ID: &lt;20200405234956.GB29598@metalio.americas.hpqcorp.net&gt;<br>

Content-Type: text/plain; charset=us-ascii<br>

<br>

First, it's possible our analysis is off somewhere. I never get to your<br>

print message. I put a debug statement at the start of the function so I<br>

know we get there (just to verify my print statements were taking<br>

effect).<br>

<br>

I also put a print statement inside the if (call_count == 0) { block, right<br>

after the if. I ran some tests.<br>

<br>

I suspect that isn't a problem area. There were some interesting results<br>

with an NFS stale file handle error going through that path. Otherwise<br>

it's always errno=0 even in the heavy test case. I'm not concerned about<br>

a stale NFS file handle at this moment. That print was also hit heavily when<br>

one server was down (which surprised me but I don't know the internals).<br>

<br>

I'm trying to re-read and work through Scott's message to see if any<br>

other print statements might be helpful.<br>

<br>

Thank you for your help so far. I will reply back if I find something.<br>

Otherwise suggestions welcome!<br>

<br>

The MFG system I can access got smaller this weekend but is still large<br>

enough to reproduce the error.<br>

<br>

As you can tell, I work mostly at a level well above filesystem code so<br>

thank you for staying with me as I struggle through this.<br>

<br>

Erik<br>

<br>

&gt; After we hear from all children, afr_inode_refresh_subvol_cbk() then calls afr_inode_refresh_done()--&gt;afr_txn_refresh_done()--&gt;afr_read_txn_refresh_done().<br>

&gt; But you already know this flow now.<br>

<br>

&gt; diff --git a/xlators/cluster/afr/src/afr-common.c b/xlators/cluster/afr/src/afr-common.c<br>

&gt; index 4bfaef9e8..096ce06f0 100644<br>

&gt; --- a/xlators/cluster/afr/src/afr-common.c<br>

&gt; &#43;&#43;&#43; b/xlators/cluster/afr/src/afr-common.c<br>

&gt; @@ -1318,6 &#43;1318,12 @@ afr_inode_refresh_subvol_cbk(call_frame_t *frame, void *cookie, xlator_t *this,<br>

&gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if (xdata)<br>

&gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; local-&gt;replies[call_child].xdata = dict_ref(xdata);<br>

&gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }<br>

&gt; &#43;&nbsp;&nbsp;&nbsp; if (op_ret == -1)<br>

&gt; &#43;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; gf_msg_callingfn(<br>

&gt; &#43;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; this-&gt;name, GF_LOG_ERROR, op_errno, AFR_MSG_SPLIT_BRAIN,<br>

&gt; &#43;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &quot;Inode refresh on child:%d failed with errno:%d for %s(%s) &quot;,<br>

&gt; &#43;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; call_child, op_errno, local-&gt;loc.name,<br>

&gt; &#43;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; uuid_utoa(local-&gt;loc.inode-&gt;gfid));<br>

&gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if (xdata) {<br>

&gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ret = dict_get_int8(xdata, &quot;link-count&quot;, &amp;need_heal);<br>

&gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; local-&gt;replies[call_child].need_heal = need_heal;<br>

<br>

<br>

<br>

------------------------------<br>

<br>

Message: 2<br>

Date: Sun, 5 Apr 2020 20:22:21 -0500<br>

From: Erik Jacobson &lt;erik.jacobson@hpe.com&gt;<br>

To: Ravishankar N &lt;ravishankar@redhat.com&gt;<br>

Cc: gluster-users@gluster.org<br>

Subject: Re: [Gluster-users] gnfs split brain when 1 server in 3x1<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; down (high load) - help request<br>

Message-ID: &lt;20200406012221.GD29598@metalio.americas.hpqcorp.net&gt;<br>

Content-Type: text/plain; charset=us-ascii<br>

<br>

During the problem case, as near as I can tell, in afr_final_errno(),<br>

in the loop where tmp_errno = local-&gt;replies[i].op_errno is set,<br>

the errno is always &quot;2&quot; when it gets to that point on server 3 (where<br>

the NFS load is).<br>

<br>

I never see a value other than 2.<br>

<br>

I later simply put the print at the end of the function too, to double<br>

verify non-zero exit codes. There are thousands of non-zero return<br>

codes, all 2 when not zero. Here is an example flow right before a<br>

split-brain. I do not wish to imply the split-brain is related, it's<br>

just an example log snip:<br>

<br>

<br>

[2020-04-06 00:54:21.125373] E [MSGID: 0] [afr-common.c:2546:afr_final_errno] 0-erikj-afr_final_errno: erikj dbg afr_final_errno() errno from loop before afr_higher_errno was: 2<br>

[2020-04-06 00:54:21.125374] E [MSGID: 0] [afr-common.c:2551:afr_final_errno] 0-erikj-afr_final_errno: erikj dbg returning non-zero: 2<br>

[2020-04-06 00:54:23.315397] E [MSGID: 0] [afr-read-txn.c:283:afr_read_txn_refresh_done] 0-cm_shared-replicate-0: erikj dbg crapola 1st if in afr_read_txn_refresh_done() !priv-&gt;thin_arbiter_count -- goto to readfn<br>

[2020-04-06 00:54:23.315432] E [MSGID: 108008] [afr-read-txn.c:314:afr_read_txn_refresh_done] 0-cm_shared-replicate-0: Failing READLINK on gfid 57f269ef-919d-40ec-b7fc-a7906fee648b: split-brain observed. [Input/output error]<br>

[2020-04-06 00:54:23.315450] W [MSGID: 112199] [nfs3-helpers.c:3327:nfs3_log_readlink_res] 0-nfs-nfsv3: /image/images_ro_nfs/rhel8.0/usr/lib64/libmlx5.so.1 =&gt; (XID: 1fdba2bc, READLINK: NFS: 5(I/O error), POSIX: 5(Input/output error)) target: (null)<br>

<br>

<br>

I am missing something. I will see if Scott and I can work together<br>

tomorrow. Happy for any more ideas, Thank you!!<br>

<br>

<br>

On Sun, Apr 05, 2020 at 06:49:56PM -0500, Erik Jacobson wrote:<br>

&gt; First, it's possible our analysis is off somewhere. I never get to your<br>

&gt; print message. I put a debug statement at the start of the function so I<br>

&gt; know we get there (just to verify my print statements were taking<br>

&gt; effect).<br>

&gt; <br>

&gt; I put a print statement for the if (call_count == 0) { call there, right<br>

&gt; after the if. I ran some tests.<br>

&gt; <br>

&gt; I suspect that isn't a problem area. There were some interesting results<br>

&gt; with an NFS stale file handle error going through that path. Otherwise<br>

&gt; it's always errno=0 even in the heavy test case. I'm not concerned about<br>

&gt; a stale NFS file handle this moment. That print was also hit heavily when<br>

&gt; one server was down (which surprised me but I don't know the internals).<br>

&gt; <br>

&gt; I'm trying to re-read and work through Scott's message to see if any<br>

&gt; other print statements might be helpful.<br>

&gt; <br>

&gt; Thank you for your help so far. I will reply back if I find something.<br>

&gt; Otherwise suggestions welcome!<br>

&gt; <br>

&gt; The MFG system I can access got smaller this weekend but is still large<br>

&gt; enough to reproduce the error.<br>

&gt; <br>

&gt; As you can tell, I work mostly at a level well above filesystem code so<br>

&gt; thank you for staying with me as I struggle through this.<br>

&gt; <br>

&gt; Erik<br>

&gt; <br>

&gt; &gt; After we hear from all children, afr_inode_refresh_subvol_cbk() then calls afr_inode_refresh_done()--&gt;afr_txn_refresh_done()--&gt;afr_read_txn_refresh_done().<br>

&gt; &gt; But you already know this flow now.<br>

&gt; <br>

&gt; &gt; diff --git a/xlators/cluster/afr/src/afr-common.c b/xlators/cluster/afr/src/afr-common.c<br>

&gt; &gt; index 4bfaef9e8..096ce06f0 100644<br>

&gt; &gt; --- a/xlators/cluster/afr/src/afr-common.c<br>

&gt; &gt; &#43;&#43;&#43; b/xlators/cluster/afr/src/afr-common.c<br>

&gt; &gt; @@ -1318,6 &#43;1318,12 @@ afr_inode_refresh_subvol_cbk(call_frame_t *frame, void *cookie, xlator_t *this,<br>

&gt; &gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if (xdata)<br>

&gt; &gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; local-&gt;replies[call_child].xdata = dict_ref(xdata);<br>

&gt; &gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }<br>

&gt; &gt; &#43;&nbsp;&nbsp;&nbsp; if (op_ret == -1)<br>

&gt; &gt; &#43;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; gf_msg_callingfn(<br>

&gt; &gt; &#43;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; this-&gt;name, GF_LOG_ERROR, op_errno, AFR_MSG_SPLIT_BRAIN,<br>

&gt; &gt; &#43;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &quot;Inode refresh on child:%d failed with errno:%d for %s(%s) &quot;,<br>

&gt; &gt; &#43;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; call_child, op_errno, local-&gt;loc.name,<br>

&gt; &gt; &#43;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; uuid_utoa(local-&gt;loc.inode-&gt;gfid));<br>

&gt; &gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if (xdata) {<br>

&gt; &gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ret = dict_get_int8(xdata, &quot;link-count&quot;, &amp;need_heal);<br>

&gt; &gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; local-&gt;replies[call_child].need_heal = need_heal;<br>

<br>

<br>

<br>

<br>

------------------------------<br>

<br>

Message: 3<br>

Date: Mon, 6 Apr 2020 06:03:22 &#43;0200<br>

From: Hu Bert &lt;revirii@googlemail.com&gt;<br>

To: Renaud Fortier &lt;Renaud.Fortier@fsaa.ulaval.ca&gt;<br>

Cc: &quot;gluster-users@gluster.org&quot; &lt;gluster-users@gluster.org&gt;<br>

Subject: Re: [Gluster-users] Repository down ?<br>

Message-ID:<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;CAAV-989P_=hNXj_PJdEBB8rG29b=jrzwDY3X&#43;MBHi40zddzPgg@mail.gmail.com&gt;<br>

Content-Type: text/plain; charset=&quot;UTF-8&quot;<br>

<br>

Good morning,<br>

<br>

upgraded from 5.11 to 6.8 today; 2 servers worked smoothly, one again<br>

had connection problems:<br>

<br>

Err:1 <a href="https://download.gluster.org/pub/gluster/glusterfs/6/6.8/Debian/buster/amd64/apt">

https://download.gluster.org/pub/gluster/glusterfs/6/6.8/Debian/buster/amd64/apt</a><br>

buster/main amd64 libglusterfs-dev amd64 6.8-1<br>

&nbsp; Could not connect to download.gluster.org:443 (8.43.85.185),<br>

connection timed out<br>

Err:2 <a href="https://download.gluster.org/pub/gluster/glusterfs/6/6.8/Debian/buster/amd64/apt">

https://download.gluster.org/pub/gluster/glusterfs/6/6.8/Debian/buster/amd64/apt</a><br>

buster/main amd64 libgfxdr0 amd64 6.8-1<br>

&nbsp; Unable to connect to download.gluster.org:https:<br>

<br>

As a workaround i downloaded the packages manually on one of the other<br>

2 servers, copied them to server3 and installed them manually.<br>

<br>

Any idea why this happens? /etc/hosts, /etc/resolv.conf are identical.<br>

The servers are behind the same gateway (switch in datacenter of<br>

provider), the server IPs differ only in the last number.<br>

<br>

<br>

Best regards,<br>

Hubert<br>

<br>

On Fri, 3 Apr 2020 at 10:33, Hu Bert &lt;revirii@googlemail.com&gt; wrote:<br>

&gt;<br>

&gt; ok, half an hour later it worked. Not funny during an upgrade. Strange... :-)<br>

&gt;<br>

&gt;<br>

&gt; Regards,<br>

&gt; Hubert<br>

&gt;<br>

&gt; On Fri, 3 Apr 2020 at 10:19, Hu Bert &lt;revirii@googlemail.com&gt; wrote:<br>

&gt; &gt;<br>

&gt; &gt; Hi,<br>

&gt; &gt;<br>

&gt; &gt; i'm currently preparing an upgrade 5.x -&gt; 6.8; the download of the<br>

&gt; &gt; repository key works on 2 of 3 servers. nameserver settings are<br>

&gt; &gt; identical. On the 3rd server i get this:<br>

&gt; &gt;<br>

&gt; &gt; wget -O - <a href="https://download.gluster.org/pub/gluster/glusterfs/6/rsa.pub">

https://download.gluster.org/pub/gluster/glusterfs/6/rsa.pub</a><br>

&gt; &gt; | apt-key add -<br>

&gt; &gt; --2020-04-03 10:15:43--<br>

&gt; &gt; <a href="https://download.gluster.org/pub/gluster/glusterfs/6/rsa.pub">https://download.gluster.org/pub/gluster/glusterfs/6/rsa.pub</a><br>

&gt; &gt; Resolving download.gluster.org (download.gluster.org)... 8.43.85.185<br>

&gt; &gt; Connecting to download.gluster.org<br>

&gt; &gt; (download.gluster.org)|8.43.85.185|:443... failed: Connection timed<br>

&gt; &gt; out.<br>

&gt; &gt; Retrying.<br>

&gt; &gt;<br>

&gt; &gt; and this goes on and on... Which errors do you see?<br>

&gt; &gt;<br>

&gt; &gt;<br>

&gt; &gt; Regards,<br>

&gt; &gt; Hubert<br>

&gt; &gt;<br>

&gt; &gt; On Mon, 30 Mar 2020 at 20:40, Renaud Fortier<br>

&gt; &gt; &lt;Renaud.Fortier@fsaa.ulaval.ca&gt; wrote:<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; Hi,<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; I'm trying to download packages from the gluster repository <a href="https://download.gluster.org/">

https://download.gluster.org/</a> but it failed for every download I've tried.<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; Is it happening only to me ?<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; Thank you<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; Renaud Fortier<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; ________<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; Community Meeting Calendar:<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; Schedule -<br>

&gt; &gt; &gt; Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC<br>

&gt; &gt; &gt; Bridge: <a href="https://bluejeans.com/441850968">https://bluejeans.com/441850968</a><br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; Gluster-users mailing list<br>

&gt; &gt; &gt; Gluster-users@gluster.org<br>

&gt; &gt; &gt; <a href="https://lists.gluster.org/mailman/listinfo/gluster-users">https://lists.gluster.org/mailman/listinfo/gluster-users</a><br>

<br>

<br>

------------------------------<br>

<br>

Message: 4<br>

Date: Mon, 6 Apr 2020 06:13:23 &#43;0200<br>

From: Hu Bert &lt;revirii@googlemail.com&gt;<br>

To: gluster-users &lt;gluster-users@gluster.org&gt;<br>

Subject: [Gluster-users] One error/warning message after upgrade 5.11<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -&gt; 6.8<br>

Message-ID:<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;CAAV-988kaD0bat2p3Xfw8vHYhY-pcekeoPOyLq297XEqRosSyw@mail.gmail.com&gt;<br>

Content-Type: text/plain; charset=&quot;UTF-8&quot;<br>

<br>

Hello,<br>

<br>

i just upgraded my servers and clients from 5.11 to 6.8; besides one<br>

connection problem to the gluster download server everything went<br>

fine.<br>

<br>

On the 3 gluster servers i mount the 2 volumes as well, and only there<br>

(and not on all the other clients) there are some messages in the log<br>

file of both mount logs:<br>

<br>

[2020-04-06 04:10:53.552561] W [MSGID: 114031]<br>

[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]<br>

0-persistent-client-2: remote operation failed [Permission denied]<br>

[2020-04-06 04:10:53.552635] W [MSGID: 114031]<br>

[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]<br>

0-persistent-client-1: remote operation failed [Permission denied]<br>

[2020-04-06 04:10:53.552639] W [MSGID: 114031]<br>

[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]<br>

0-persistent-client-0: remote operation failed [Permission denied]<br>

[2020-04-06 04:10:53.553226] E [MSGID: 148002]<br>

[utime.c:146:gf_utime_set_mdata_setxattr_cbk] 0-persistent-utime: dict<br>

set of key for set-ctime-mdata failed [Permission denied]<br>

The message &quot;W [MSGID: 114031]<br>

[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]<br>

0-persistent-client-2: remote operation failed [Permission denied]&quot;<br>

repeated 4 times between [2020-04-06 04:10:53.552561] and [2020-04-06<br>

04:10:53.745542]<br>

The message &quot;W [MSGID: 114031]<br>

[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]<br>

0-persistent-client-1: remote operation failed [Permission denied]&quot;<br>

repeated 4 times between [2020-04-06 04:10:53.552635] and [2020-04-06<br>

04:10:53.745610]<br>

The message &quot;W [MSGID: 114031]<br>

[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]<br>

0-persistent-client-0: remote operation failed [Permission denied]&quot;<br>

repeated 4 times between [2020-04-06 04:10:53.552639] and [2020-04-06<br>

04:10:53.745632]<br>

The message &quot;E [MSGID: 148002]<br>

[utime.c:146:gf_utime_set_mdata_setxattr_cbk] 0-persistent-utime: dict<br>

set of key for set-ctime-mdata failed [Permission denied]&quot; repeated 4<br>

times between [2020-04-06 04:10:53.553226] and [2020-04-06<br>

04:10:53.746080]<br>

<br>

Anything to worry about?<br>

<br>

<br>

Regards,<br>

Hubert<br>

<br>

<br>

------------------------------<br>

<br>

Message: 5<br>

Date: Mon, 6 Apr 2020 10:06:41 &#43;0530<br>

From: Ravishankar N &lt;ravishankar@redhat.com&gt;<br>

To: Erik Jacobson &lt;erik.jacobson@hpe.com&gt;<br>

Cc: gluster-users@gluster.org<br>

Subject: Re: [Gluster-users] gnfs split brain when 1 server in 3x1<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; down (high load) - help request<br>

Message-ID: &lt;4f5c5a73-69b9-f575-5974-d582e2a06051@redhat.com&gt;<br>

Content-Type: text/plain; charset=windows-1252; format=flowed<br>

<br>

afr_final_errno() is called from many other places other than the inode <br>

refresh code path, so the 2 (ENOENT) could be from one of those (mostly <br>

afr_lookup_done) but it is puzzling that you are not seeing EIO even <br>

once when it is called from afr_inode_refresh_subvol_cbk() code path. <br>

Not sure what is happening here.<br>
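<br>
The errno names mentioned above can be checked against their numeric values. A minimal sketch, assuming Python 3 on a Linux host; describe() is a hypothetical helper and not part of Gluster:<br>
<pre>
```python
import errno
import os


def describe(code):
    """Return (symbolic name, message) for a numeric errno value."""
    # errno.errorcode maps numbers to names such as "ENOENT";
    # os.strerror returns the platform's message text for the number.
    return errno.errorcode[code], os.strerror(code)


# 2 is the value reported from afr_final_errno(); EIO is what the
# split-brain READLINK failure returns to the NFS client.
for code in (errno.ENOENT, errno.EIO):
    print(code, *describe(code))
```
</pre>
On Linux this maps 2 to ENOENT and 5 to EIO, which matches the values in the quoted log lines.<br>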

<br>

On 06/04/20 6:52 am, Erik Jacobson wrote:<br>

&gt; During the problem case, near as I can tell, afr_final_errno(),<br>

&gt; in the loop where tmp_errno = local-&gt;replies[i].op_errno is set,<br>

&gt; the errno is always &quot;2&quot; when it gets to that point on server 3 (where<br>

&gt; the NFS load is).<br>

&gt;<br>

&gt; I never see a value other than 2.<br>

&gt;<br>

&gt; I later simply put the print at the end of the function too, to double<br>

&gt; verify non-zero exit codes. There are thousands of non-zero return<br>

&gt; codes, all 2 when not zero. Here is an example flow right before a<br>

&gt; split-brain. I do not wish to imply the split-brain is related, it's<br>

&gt; just an example log snip:<br>

&gt;<br>

&gt;<br>

&gt; [2020-04-06 00:54:21.125373] E [MSGID: 0] [afr-common.c:2546:afr_final_errno] 0-erikj-afr_final_errno: erikj dbg afr_final_errno() errno from loop before afr_higher_errno was: 2<br>

&gt; [2020-04-06 00:54:21.125374] E [MSGID: 0] [afr-common.c:2551:afr_final_errno] 0-erikj-afr_final_errno: erikj dbg returning non-zero: 2<br>

&gt; [2020-04-06 00:54:23.315397] E [MSGID: 0] [afr-read-txn.c:283:afr_read_txn_refresh_done] 0-cm_shared-replicate-0: erikj dbg crapola 1st if in afr_read_txn_refresh_done() !priv-&gt;thin_arbiter_count -- goto to readfn<br>

&gt; [2020-04-06 00:54:23.315432] E [MSGID: 108008] [afr-read-txn.c:314:afr_read_txn_refresh_done] 0-cm_shared-replicate-0: Failing READLINK on gfid 57f269ef-919d-40ec-b7fc-a7906fee648b: split-brain observed. [Input/output error]<br>

&gt; [2020-04-06 00:54:23.315450] W [MSGID: 112199] [nfs3-helpers.c:3327:nfs3_log_readlink_res] 0-nfs-nfsv3: /image/images_ro_nfs/rhel8.0/usr/lib64/libmlx5.so.1 =&gt; (XID: 1fdba2bc, READLINK: NFS: 5(I/O error), POSIX: 5(Input/output error)) target: (null)<br>

&gt;<br>

&gt;<br>

&gt; I am missing something. I will see if Scott and I can work together<br>

&gt; tomorrow. Happy for any more ideas, Thank you!!<br>

&gt;<br>

&gt;<br>

&gt; On Sun, Apr 05, 2020 at 06:49:56PM -0500, Erik Jacobson wrote:<br>

&gt;&gt; First, it's possible our analysis is off somewhere. I never get to your<br>

&gt;&gt; print message. I put a debug statement at the start of the function so I<br>

&gt;&gt; know we get there (just to verify my print statements were taking<br>

&gt;&gt; effect).<br>

&gt;&gt;<br>

&gt;&gt; I put a print statement for the if (call_count == 0) { call there, right<br>

&gt;&gt; after the if. I ran some tests.<br>

&gt;&gt;<br>

&gt;&gt; I suspect that isn't a problem area. There were some interesting results<br>

&gt;&gt; with an NFS stale file handle error going through that path. Otherwise<br>

&gt;&gt; it's always errno=0 even in the heavy test case. I'm not concerned about<br>

&gt;&gt; a stale NFS file handle this moment. That print was also hit heavily when<br>

&gt;&gt; one server was down (which surprised me but I don't know the internals).<br>

&gt;&gt;<br>

&gt;&gt; I'm trying to re-read and work through Scott's message to see if any<br>

&gt;&gt; other print statements might be helpful.<br>

&gt;&gt;<br>

&gt;&gt; Thank you for your help so far. I will reply back if I find something.<br>

&gt;&gt; Otherwise suggestions welcome!<br>

&gt;&gt;<br>

&gt;&gt; The MFG system I can access got smaller this weekend but is still large<br>

&gt;&gt; enough to reproduce the error.<br>

&gt;&gt;<br>

&gt;&gt; As you can tell, I work mostly at a level well above filesystem code so<br>

&gt;&gt; thank you for staying with me as I struggle through this.<br>

&gt;&gt;<br>

&gt;&gt; Erik<br>

&gt;&gt;<br>

&gt;&gt;&gt; After we hear from all children, afr_inode_refresh_subvol_cbk() then calls afr_inode_refresh_done()--&gt;afr_txn_refresh_done()--&gt;afr_read_txn_refresh_done().<br>

&gt;&gt;&gt; But you already know this flow now.<br>

&gt;&gt;&gt; diff --git a/xlators/cluster/afr/src/afr-common.c b/xlators/cluster/afr/src/afr-common.c<br>

&gt;&gt;&gt; index 4bfaef9e8..096ce06f0 100644<br>

&gt;&gt;&gt; --- a/xlators/cluster/afr/src/afr-common.c<br>

&gt;&gt;&gt; &#43;&#43;&#43; b/xlators/cluster/afr/src/afr-common.c<br>

&gt;&gt;&gt; @@ -1318,6 &#43;1318,12 @@ afr_inode_refresh_subvol_cbk(call_frame_t *frame, void *cookie, xlator_t *this,<br>

&gt;&gt;&gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if (xdata)<br>

&gt;&gt;&gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; local-&gt;replies[call_child].xdata = dict_ref(xdata);<br>

&gt;&gt;&gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }<br>

&gt;&gt;&gt; &#43;&nbsp;&nbsp;&nbsp; if (op_ret == -1)<br>

&gt;&gt;&gt; &#43;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; gf_msg_callingfn(<br>

&gt;&gt;&gt; &#43;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; this-&gt;name, GF_LOG_ERROR, op_errno, AFR_MSG_SPLIT_BRAIN,<br>

&gt;&gt;&gt; &#43;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &quot;Inode refresh on child:%d failed with errno:%d for %s(%s) &quot;,<br>

&gt;&gt;&gt; &#43;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; call_child, op_errno, local-&gt;loc.name,<br>

&gt;&gt;&gt; &#43;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; uuid_utoa(local-&gt;loc.inode-&gt;gfid));<br>

&gt;&gt;&gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if (xdata) {<br>

&gt;&gt;&gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ret = dict_get_int8(xdata, &quot;link-count&quot;, &amp;need_heal);<br>

&gt;&gt;&gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; local-&gt;replies[call_child].need_heal = need_heal;<br>

&gt;<br>

<br>

<br>

<br>

------------------------------<br>

<br>

Message: 6<br>

Date: Mon, 6 Apr 2020 10:54:10 &#43;0530<br>

From: Hari Gowtham &lt;hgowtham@redhat.com&gt;<br>

To: gluster-users &lt;gluster-users@gluster.org&gt;,&nbsp; gluster-devel<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;gluster-devel@gluster.org&gt;<br>

Subject: [Gluster-users] Gluster testcase Hackathon<br>

Message-ID:<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;CAKh1kXshSzz1FxmoitCPLyX7_jXe&#43;=j0Nqwt3__hfr=671jN8w@mail.gmail.com&gt;<br>

Content-Type: text/plain; charset=&quot;utf-8&quot;<br>

<br>

Hi all,<br>

<br>

We have been seeing a good number of CI test cases failing. This becomes<br>

a problem when taking in new fixes, and marking the failing cases as bad<br>

tests reduces the test coverage. As a result, we are planning to have a<br>

hackathon on 9th April. This will be a virtual one, held on Google<br>

Meet. Information to join can be found below[4].<br>

<br>

We will be working on the tests that are currently failing spuriously[1]<br>

and also on the test cases that have been marked as bad[2] in the past and<br>

fix these. The consolidated list [2] has all the test cases we will look<br>

into during this hackathon. If any of you have come across a test which has<br>

failed and was missed in the list, feel free to link the test case and the<br>

link to failure in the consolidated list [2].<br>

<br>

The prerequisite is a CentOS machine where you can clone Gluster<br>

and work on it.<br>

The action item would be to take up as many test cases as possible (write<br>

down your name against the test case to avoid rework), run it on the local<br>

environment, find out why it fails, send out a fix for it and review<br>

others' patches. If you want to add more test cases as per your usage, feel<br>

free to do so.<br>

<br>

You can go through the link[3] to get the basic idea on how to set up and<br>

contribute. If you have more questions, please feel free to ask them; you<br>

can also discuss and sort them out with us during the hackathon.<br>

<br>

We will send out a Google Calendar invite soon. The tentative timing will<br>

be from 11am to 4:30pm IST.<br>

<br>

[1] <a href="https://fstat.gluster.org/summary">https://fstat.gluster.org/summary</a><br>

[2]<br>

<a href="https://docs.google.com/spreadsheets/d/1_j_JfJw1YjEziVT1pe8I_8-kMf7LVVmlleaukEikHw0/edit?usp=sharing">https://docs.google.com/spreadsheets/d/1_j_JfJw1YjEziVT1pe8I_8-kMf7LVVmlleaukEikHw0/edit?usp=sharing</a><br>

[3]<br>

<a href="https://docs.gluster.org/en/latest/Developer-guide/Simplified-Development-Workflow/">https://docs.gluster.org/en/latest/Developer-guide/Simplified-Development-Workflow/</a><br>

[4] To join the video meeting, click this link:<br>

<a href="https://meet.google.com/cde-uycs-koz">https://meet.google.com/cde-uycs-koz</a><br>

Otherwise, to join by phone, dial &#43;1 501-939-4201 and enter this PIN: 803<br>

155 592#<br>

To view more phone numbers, click this link:<br>

<a href="https://tel.meet/cde-uycs-koz?hs=5">https://tel.meet/cde-uycs-koz?hs=5</a><br>

<br>

-- <br>

Regards,<br>

Hari Gowtham.<br>

<br>

------------------------------<br>

<br>

Message: 7<br>

Date: Mon, 6 Apr 2020 12:30:41 &#43;0200<br>

From: Hu Bert &lt;revirii@googlemail.com&gt;<br>

To: gluster-users &lt;gluster-users@gluster.org&gt;<br>

Subject: [Gluster-users] gluster v6.8: systemd units disabled after<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; install<br>

Message-ID:<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;CAAV-98-KT5iv63N32S_ta0A9Fw1YaKZQWhNjB5x=R4jU4NTbeg@mail.gmail.com&gt;<br>

Content-Type: text/plain; charset=&quot;UTF-8&quot;<br>

<br>

Hello,<br>

<br>

after a server reboot (with a fresh gluster 6.8 install) i noticed<br>

that the gluster services weren't running.<br>

<br>

systemctl status glusterd.service<br>

&#9679; glusterd.service - GlusterFS, a clustered file-system server<br>

&nbsp;&nbsp; Loaded: loaded (/lib/systemd/system/glusterd.service; disabled;<br>

vendor preset: enabled)<br>

&nbsp;&nbsp; Active: inactive (dead)<br>

&nbsp;&nbsp;&nbsp;&nbsp; Docs: man:glusterd(8)<br>

<br>

Apr 06 11:34:18 glfsserver1 systemd[1]:<br>

/lib/systemd/system/glusterd.service:9: PIDFile= references path below<br>

legacy directory /var/run/, updating /var/run/glusterd.pid &rarr;<br>

/run/glusterd.pid; please update the unit file accordingly.<br>

<br>

systemctl status glustereventsd.service<br>

&#9679; glustereventsd.service - Gluster Events Notifier<br>

&nbsp;&nbsp; Loaded: loaded (/lib/systemd/system/glustereventsd.service;<br>

disabled; vendor preset: enabled)<br>

&nbsp;&nbsp; Active: inactive (dead)<br>

&nbsp;&nbsp;&nbsp;&nbsp; Docs: man:glustereventsd(8)<br>

<br>

Apr 06 11:34:27 glfsserver1 systemd[1]:<br>

/lib/systemd/system/glustereventsd.service:11: PIDFile= references<br>

path below legacy directory /var/run/, updating<br>

/var/run/glustereventsd.pid &rarr; /run/glustereventsd.pid; please update<br>

the unit file accordingly.<br>

<br>

You have to enable them manually:<br>

<br>

systemctl enable glusterd.service<br>

Created symlink<br>

/etc/systemd/system/multi-user.target.wants/glusterd.service &rarr;<br>

/lib/systemd/system/glusterd.service.<br>

systemctl enable glustereventsd.service<br>

Created symlink<br>

/etc/systemd/system/multi-user.target.wants/glustereventsd.service &rarr;<br>

/lib/systemd/system/glustereventsd.service.<br>
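<br>
To check several hosts after an install, the unit-file state can be pulled out of the status output shown above. A minimal sketch, assuming Python 3; unit_file_state() is a hypothetical helper, and the sample text is copied from the glusterd output in this message:<br>
<pre>
```python
def unit_file_state(status_text):
    """Return the unit-file state (e.g. 'disabled') from systemctl status text."""
    # Rejoin wrapped lines so the "Loaded:" entry sits on a single line.
    flat = " ".join(status_text.split())
    marker = "Loaded: loaded ("
    start = flat.find(marker)
    if start == -1:
        return None
    end = flat.find(")", start)
    if end == -1:
        return None
    # Fields inside the parentheses: unit path; state; vendor preset.
    fields = flat[start + len(marker):end].split(";")
    if len(fields) == 1:
        return None
    return fields[1].strip()


sample = """glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/lib/systemd/system/glusterd.service; disabled;
vendor preset: enabled)
   Active: inactive (dead)
"""
print(unit_file_state(sample))
```
</pre>
On a live system, systemctl is-enabled glusterd.service reports the same state directly, so parsing is only useful when all you have is captured output.<br>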

<br>

Is this a bug? If so: already known?<br>

<br>

<br>

Regards,<br>

Hubert<br>

<br>

<br>

------------------------------<br>

<br>

_______________________________________________<br>

Gluster-users mailing list<br>

Gluster-users@gluster.org<br>

<a href="https://lists.gluster.org/mailman/listinfo/gluster-users">https://lists.gluster.org/mailman/listinfo/gluster-users</a><br>

<br>

End of Gluster-users Digest, Vol 144, Issue 6<br>

*********************************************<br>

</div>

</span></font></div>

</div>

</div>

</div>

</body>

</html>