<div dir="ltr"><span style="color:rgb(0,0,0);font-family:Calibri,Helvetica,sans-serif;font-size:16px">Unsubscribe</span><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Apr 6, 2020 at 6:10 PM Oskar Pienkos <<a href="mailto:oskarp10@hotmail.com">oskarp10@hotmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Unsubscribe</div>
<div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div id="gmail-m_4928644625267831244Signature">
<p>Sent from <a href="http://aka.ms/weboutlook" target="_blank">Outlook</a><br>
</p>
<div>
<div id="gmail-m_4928644625267831244appendonsend"></div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<hr style="display:inline-block;width:98%">
<div id="gmail-m_4928644625267831244divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> <a href="mailto:gluster-users-bounces@gluster.org" target="_blank">gluster-users-bounces@gluster.org</a> <<a href="mailto:gluster-users-bounces@gluster.org" target="_blank">gluster-users-bounces@gluster.org</a>> on behalf of <a href="mailto:gluster-users-request@gluster.org" target="_blank">gluster-users-request@gluster.org</a> <<a href="mailto:gluster-users-request@gluster.org" target="_blank">gluster-users-request@gluster.org</a>><br>
<b>Sent:</b> April 6, 2020 5:00 AM<br>
<b>To:</b> <a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a> <<a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a>><br>
<b>Subject:</b> Gluster-users Digest, Vol 144, Issue 6</font>
<div> </div>
</div>
<div><font size="2"><span style="font-size:11pt">
Send Gluster-users mailing list submissions to
	gluster-users@gluster.org

To subscribe or unsubscribe via the World Wide Web, visit
	https://lists.gluster.org/mailman/listinfo/gluster-users
or, via email, send a message with subject or body 'help' to
	gluster-users-request@gluster.org

You can reach the person managing the list at
	gluster-users-owner@gluster.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Gluster-users digest..."


Today's Topics:

   1. Re: gnfs split brain when 1 server in 3x1 down (high load) -
      help request (Erik Jacobson)
   2. Re: gnfs split brain when 1 server in 3x1 down (high load) -
      help request (Erik Jacobson)
   3. Re: Repository down ? (Hu Bert)
   4. One error/warning message after upgrade 5.11 -> 6.8 (Hu Bert)
   5. Re: gnfs split brain when 1 server in 3x1 down (high load) -
      help request (Ravishankar N)
   6. Gluster testcase Hackathon (Hari Gowtham)
   7. gluster v6.8: systemd units disabled after install (Hu Bert)


----------------------------------------------------------------------

Message: 1
Date: Sun, 5 Apr 2020 18:49:56 -0500
From: Erik Jacobson <erik.jacobson@hpe.com>
To: Ravishankar N <ravishankar@redhat.com>
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] gnfs split brain when 1 server in 3x1
	down (high load) - help request
Message-ID: <20200405234956.GB29598@metalio.americas.hpqcorp.net>
Content-Type: text/plain; charset=us-ascii

First, it's possible our analysis is off somewhere. I never get to your
print message. I put a debug statement at the start of the function so I
know we get there (just to verify my print statements were taking
effect).

I put a print statement in the if (call_count == 0) { block, right
after the if. I ran some tests.

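For reference, the second print looks roughly like the sketch below. I am
reconstructing it from memory rather than pasting from my tree, so the
surrounding lines and the message text are approximate:

    /* sketch only: extra logging once the last child reply arrives,
     * inside afr_inode_refresh_subvol_cbk(); the message text is mine */
    call_count = afr_frame_return(frame);
    if (call_count == 0) {
        gf_msg(this->name, GF_LOG_ERROR, op_errno, AFR_MSG_SPLIT_BRAIN,
               "erikj dbg last reply: op_ret:%d op_errno:%d",
               op_ret, op_errno);
        /* the existing completion path (afr_inode_refresh_done() and
         * friends) continues unchanged here */
    }
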
I suspect that isn't a problem area. There were some interesting results
with an NFS stale file handle error going through that path. Otherwise
it's always errno=0 even in the heavy test case. I'm not concerned about
a stale NFS file handle at the moment. That print was also hit heavily when
one server was down (which surprised me, but I don't know the internals).

I'm trying to re-read and work through Scott's message to see if any
other print statements might be helpful.

Thank you for your help so far. I will reply back if I find something.
Otherwise, suggestions welcome!

The MFG system I can access got smaller this weekend but is still large
enough to reproduce the error.

As you can tell, I work mostly at a level well above filesystem code, so
thank you for staying with me as I struggle through this.

Erik

> After we hear from all children, afr_inode_refresh_subvol_cbk() then calls afr_inode_refresh_done()-->afr_txn_refresh_done()-->afr_read_txn_refresh_done().
> But you already know this flow now.

> diff --git a/xlators/cluster/afr/src/afr-common.c b/xlators/cluster/afr/src/afr-common.c
> index 4bfaef9e8..096ce06f0 100644
> --- a/xlators/cluster/afr/src/afr-common.c
> +++ b/xlators/cluster/afr/src/afr-common.c
> @@ -1318,6 +1318,12 @@ afr_inode_refresh_subvol_cbk(call_frame_t *frame, void *cookie, xlator_t *this,
>          if (xdata)
>              local->replies[call_child].xdata = dict_ref(xdata);
>      }
> +    if (op_ret == -1)
> +        gf_msg_callingfn(
> +            this->name, GF_LOG_ERROR, op_errno, AFR_MSG_SPLIT_BRAIN,
> +            "Inode refresh on child:%d failed with errno:%d for %s(%s) ",
> +            call_child, op_errno, local->loc.name,
> +            uuid_utoa(local->loc.inode->gfid));
>      if (xdata) {
>          ret = dict_get_int8(xdata, "link-count", &need_heal);
>          local->replies[call_child].need_heal = need_heal;



------------------------------

Message: 2
Date: Sun, 5 Apr 2020 20:22:21 -0500
From: Erik Jacobson <erik.jacobson@hpe.com>
To: Ravishankar N <ravishankar@redhat.com>
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] gnfs split brain when 1 server in 3x1
	down (high load) - help request
Message-ID: <20200406012221.GD29598@metalio.americas.hpqcorp.net>
Content-Type: text/plain; charset=us-ascii

During the problem case, near as I can tell, in afr_final_errno(),
in the loop where tmp_errno = local->replies[i].op_errno is set,
the errno is always "2" when it gets to that point on server 3 (where
the NFS load is).

I never see a value other than 2.

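For context, the loop I mean is the one in afr_final_errno() in
afr-common.c. Paraphrased from memory -- the real source may differ in
details -- it is roughly:

    /* paraphrased sketch of afr_final_errno(), not the exact source */
    int
    afr_final_errno(afr_local_t *local, afr_private_t *priv)
    {
        int i = 0;
        int op_errno = 0;
        int tmp_errno = 0;

        for (i = 0; i < priv->child_count; i++) {
            if (!local->replies[i].valid)
                continue;
            if (local->replies[i].op_ret != -1)
                continue;
            tmp_errno = local->replies[i].op_errno;  /* always 2 (ENOENT) in my runs */
            op_errno = afr_higher_errno(op_errno, tmp_errno);
        }

        return op_errno;  /* the extra print described below went right before this */
    }
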
I later simply put the print at the end of the function too, to double-verify
non-zero return codes. There are thousands of non-zero return
codes, all 2 when not zero. Here is an example flow right before a
split-brain. I do not wish to imply the split-brain is related; it's
just an example log snip:


[2020-04-06 00:54:21.125373] E [MSGID: 0] [afr-common.c:2546:afr_final_errno] 0-erikj-afr_final_errno: erikj dbg afr_final_errno() errno from loop before afr_higher_errno was: 2
[2020-04-06 00:54:21.125374] E [MSGID: 0] [afr-common.c:2551:afr_final_errno] 0-erikj-afr_final_errno: erikj dbg returning non-zero: 2
[2020-04-06 00:54:23.315397] E [MSGID: 0] [afr-read-txn.c:283:afr_read_txn_refresh_done] 0-cm_shared-replicate-0: erikj dbg crapola 1st if in afr_read_txn_refresh_done() !priv->thin_arbiter_count -- goto to readfn
[2020-04-06 00:54:23.315432] E [MSGID: 108008] [afr-read-txn.c:314:afr_read_txn_refresh_done] 0-cm_shared-replicate-0: Failing READLINK on gfid 57f269ef-919d-40ec-b7fc-a7906fee648b: split-brain observed. [Input/output error]
[2020-04-06 00:54:23.315450] W [MSGID: 112199] [nfs3-helpers.c:3327:nfs3_log_readlink_res] 0-nfs-nfsv3: /image/images_ro_nfs/rhel8.0/usr/lib64/libmlx5.so.1 => (XID: 1fdba2bc, READLINK: NFS: 5(I/O error), POSIX: 5(Input/output error)) target: (null)


I am missing something. I will see if Scott and I can work together
tomorrow. Happy for any more ideas. Thank you!!

------------------------------

Message: 3
Date: Mon, 6 Apr 2020 06:03:22 +0200
From: Hu Bert <revirii@googlemail.com>
To: Renaud Fortier <Renaud.Fortier@fsaa.ulaval.ca>
Cc: "gluster-users@gluster.org" <gluster-users@gluster.org>
Subject: Re: [Gluster-users] Repository down ?
Message-ID:
	<CAAV-989P_=hNXj_PJdEBB8rG29b=jrzwDY3X+MBHi40zddzPgg@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

Good morning,

upgraded from 5.11 to 6.8 today; 2 servers worked smoothly, one again
had connection problems:

Err:1 https://download.gluster.org/pub/gluster/glusterfs/6/6.8/Debian/buster/amd64/apt buster/main amd64 libglusterfs-dev amd64 6.8-1
  Could not connect to download.gluster.org:443 (8.43.85.185), connection timed out
Err:2 https://download.gluster.org/pub/gluster/glusterfs/6/6.8/Debian/buster/amd64/apt buster/main amd64 libgfxdr0 amd64 6.8-1
  Unable to connect to download.gluster.org:https:

As a workaround I downloaded the packages manually on one of the other
2 servers, copied them to server3 and installed them manually.

Any idea why this happens? /etc/hosts and /etc/resolv.conf are identical.
The servers are behind the same gateway (switch in the datacenter of the
provider), and the server IPs differ only in the last octet.


Best regards,
Hubert

On Fri, Apr 3, 2020 at 10:33 AM Hu Bert <revirii@googlemail.com> wrote:
>
> ok, half an hour later it worked. Not funny during an upgrade. Strange... :-)
>
>
> Regards,
> Hubert
>
> On Fri, Apr 3, 2020 at 10:19 AM Hu Bert <revirii@googlemail.com> wrote:
> >
> > Hi,
> >
> > I'm currently preparing an upgrade 5.x -> 6.8; the download of the
> > repository key works on 2 of 3 servers. Nameserver settings are
> > identical. On the 3rd server I get this:
> >
> > wget -O - https://download.gluster.org/pub/gluster/glusterfs/6/rsa.pub
> > | apt-key add -
> > --2020-04-03 10:15:43--
> > https://download.gluster.org/pub/gluster/glusterfs/6/rsa.pub
> > Resolving download.gluster.org (download.gluster.org)... 8.43.85.185
> > Connecting to download.gluster.org
> > (download.gluster.org)|8.43.85.185|:443... failed: Connection timed
> > out.
> > Retrying.
> >
> > and this goes on and on... Which errors do you see?
> >
> >
> > Regards,
> > Hubert
> >
> > On Mon, Mar 30, 2020 at 8:40 PM Renaud Fortier
> > <Renaud.Fortier@fsaa.ulaval.ca> wrote:
> > >
> > > Hi,
> > >
> > > I'm trying to download packages from the gluster repository https://download.gluster.org/ but it failed for every download I've tried.
> > >
> > > Is it happening only to me?
> > >
> > > Thank you
> > >
> > > Renaud Fortier
> > >


------------------------------

Message: 4
Date: Mon, 6 Apr 2020 06:13:23 +0200
From: Hu Bert <revirii@googlemail.com>
To: gluster-users <gluster-users@gluster.org>
Subject: [Gluster-users] One error/warning message after upgrade 5.11
	-> 6.8
Message-ID:
	<CAAV-988kaD0bat2p3Xfw8vHYhY-pcekeoPOyLq297XEqRosSyw@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

Hello,

I just upgraded my servers and clients from 5.11 to 6.8; besides one
connection problem to the gluster download server everything went
fine.

On the 3 gluster servers I mount the 2 volumes as well, and only there
(and not on all the other clients) there are some messages in both
mount logs:

[2020-04-06 04:10:53.552561] W [MSGID: 114031]
[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
0-persistent-client-2: remote operation failed [Permission denied]
[2020-04-06 04:10:53.552635] W [MSGID: 114031]
[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
0-persistent-client-1: remote operation failed [Permission denied]
[2020-04-06 04:10:53.552639] W [MSGID: 114031]
[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
0-persistent-client-0: remote operation failed [Permission denied]
[2020-04-06 04:10:53.553226] E [MSGID: 148002]
[utime.c:146:gf_utime_set_mdata_setxattr_cbk] 0-persistent-utime: dict
set of key for set-ctime-mdata failed [Permission denied]
The message "W [MSGID: 114031]
[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
0-persistent-client-2: remote operation failed [Permission denied]"
repeated 4 times between [2020-04-06 04:10:53.552561] and [2020-04-06
04:10:53.745542]
The message "W [MSGID: 114031]
[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
0-persistent-client-1: remote operation failed [Permission denied]"
repeated 4 times between [2020-04-06 04:10:53.552635] and [2020-04-06
04:10:53.745610]
The message "W [MSGID: 114031]
[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
0-persistent-client-0: remote operation failed [Permission denied]"
repeated 4 times between [2020-04-06 04:10:53.552639] and [2020-04-06
04:10:53.745632]
The message "E [MSGID: 148002]
[utime.c:146:gf_utime_set_mdata_setxattr_cbk] 0-persistent-utime: dict
set of key for set-ctime-mdata failed [Permission denied]" repeated 4
times between [2020-04-06 04:10:53.553226] and [2020-04-06
04:10:53.746080]

Anything to worry about?


Regards,
Hubert

------------------------------

Message: 5
Date: Mon, 6 Apr 2020 10:06:41 +0530
From: Ravishankar N <ravishankar@redhat.com>
To: Erik Jacobson <erik.jacobson@hpe.com>
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] gnfs split brain when 1 server in 3x1
	down (high load) - help request
Message-ID: <4f5c5a73-69b9-f575-5974-d582e2a06051@redhat.com>
Content-Type: text/plain; charset=windows-1252; format=flowed

afr_final_errno() is called from many places other than the inode
refresh code path, so the 2 (ENOENT) could be from one of those (mostly
afr_lookup_done), but it is puzzling that you are not seeing EIO even
once when it is called from the afr_inode_refresh_subvol_cbk() code path.
Not sure what is happening here.

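If it helps narrow this down, one option would be something along the lines
of the sketch below inside afr_final_errno(), just before it returns.
gf_msg_callingfn() logs the calling function, so afr_lookup_done and the
inode refresh path would show up separately in the log. This is only an
illustration; the message text and msgid are placeholders:

    /* illustration: report the caller whenever afr_final_errno() is
     * about to return a non-zero errno */
    if (op_errno)
        gf_msg_callingfn(THIS->name, GF_LOG_ERROR, op_errno, 0,
                         "afr_final_errno() returning %d", op_errno);
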
------------------------------

Message: 6
Date: Mon, 6 Apr 2020 10:54:10 +0530
From: Hari Gowtham <hgowtham@redhat.com>
To: gluster-users <gluster-users@gluster.org>, gluster-devel
	<gluster-devel@gluster.org>
Subject: [Gluster-users] Gluster testcase Hackathon
Message-ID:
	<CAKh1kXshSzz1FxmoitCPLyX7_jXe+=j0Nqwt3__hfr=671jN8w@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi all,

We have been seeing a good number of CI test cases failing. This has
become a problem for taking in new fixes, and marking them as bad tests
reduces the test coverage. As a result, we are planning to have a
hackathon on 9th April. It will be a virtual one, happening on Google
Meet. Information to join can be found below [4].

We will be working on the tests that are currently failing spuriously [1]
and also on the test cases that have been marked as bad [2] in the past,
and fix these. The consolidated list [2] has all the test cases we will
look into during this hackathon. If any of you have come across a test
which has failed and was missed in the list, feel free to add the test
case and the link to the failure to the consolidated list [2].

The prerequisite is a CentOS machine where you can clone gluster and
work on it.
The action item is to take up as many test cases as possible (write
down your name against the test case to avoid rework), run each in your
local environment, find out why it fails, send out a fix for it, and
review others' patches. If you want to add more test cases as per your
usage, feel free to do so.

You can go through the link [3] to get a basic idea of how to set up and
contribute. If there are more questions, please feel free to ask them;
you can also discuss them with us during the hackathon.

We will send out a Google Calendar invite soon. The tentative timing is
from 11am to 4:30pm IST.

[1] https://fstat.gluster.org/summary
[2] https://docs.google.com/spreadsheets/d/1_j_JfJw1YjEziVT1pe8I_8-kMf7LVVmlleaukEikHw0/edit?usp=sharing
[3] https://docs.gluster.org/en/latest/Developer-guide/Simplified-Development-Workflow/
[4] To join the video meeting, click this link:
    https://meet.google.com/cde-uycs-koz
    Otherwise, to join by phone, dial +1 501-939-4201 and enter this PIN: 803 155 592#
    To view more phone numbers, click this link:
    https://tel.meet/cde-uycs-koz?hs=5

-- 
Regards,
Hari Gowtham.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20200406/4a25d280/attachment-0001.html>

------------------------------

Message: 7
Date: Mon, 6 Apr 2020 12:30:41 +0200
From: Hu Bert <revirii@googlemail.com>
To: gluster-users <gluster-users@gluster.org>
Subject: [Gluster-users] gluster v6.8: systemd units disabled after
	install
Message-ID:
	<CAAV-98-KT5iv63N32S_ta0A9Fw1YaKZQWhNjB5x=R4jU4NTbeg@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

Hello,

after a server reboot (with a fresh gluster 6.8 install) I noticed
that the gluster services weren't running.

systemctl status glusterd.service
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/lib/systemd/system/glusterd.service; disabled;
vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:glusterd(8)

Apr 06 11:34:18 glfsserver1 systemd[1]:
/lib/systemd/system/glusterd.service:9: PIDFile= references path below
legacy directory /var/run/, updating /var/run/glusterd.pid →
/run/glusterd.pid; please update the unit file accordingly.

systemctl status glustereventsd.service
● glustereventsd.service - Gluster Events Notifier
   Loaded: loaded (/lib/systemd/system/glustereventsd.service;
disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:glustereventsd(8)

Apr 06 11:34:27 glfsserver1 systemd[1]:
/lib/systemd/system/glustereventsd.service:11: PIDFile= references
path below legacy directory /var/run/, updating
/var/run/glustereventsd.pid → /run/glustereventsd.pid; please update
the unit file accordingly.

You have to enable them manually:

systemctl enable glusterd.service
Created symlink
/etc/systemd/system/multi-user.target.wants/glusterd.service →
/lib/systemd/system/glusterd.service.
systemctl enable glustereventsd.service
Created symlink
/etc/systemd/system/multi-user.target.wants/glustereventsd.service →
/lib/systemd/system/glustereventsd.service.

Is this a bug? If so, is it already known?


Regards,
Hubert

------------------------------

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

End of Gluster-users Digest, Vol 144, Issue 6
*********************************************
________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

-- 
Suvendu Mitra
GSM - +358504821066