<div dir="ltr">I've also moved this to github: <a href="https://github.com/gluster/glusterfs/issues/1257">https://github.com/gluster/glusterfs/issues/1257</a>.<br clear="all"><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><br>Sincerely,<br>Artem<br><br>--<br>Founder, <a href="http://www.androidpolice.com" target="_blank">Android Police</a>, <a href="http://www.apkmirror.com/" style="font-size:12.8px" target="_blank">APK Mirror</a><span style="font-size:12.8px">, Illogical Robot LLC</span></div><div dir="ltr"><a href="http://beerpla.net/" target="_blank">beerpla.net</a> | <a href="http://twitter.com/ArtemR" target="_blank">@ArtemR</a><br></div></div></div></div></div></div></div></div></div></div></div></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, May 15, 2020 at 2:51 PM Artem Russakovskii <<a href="mailto:archon810@gmail.com">archon810@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi,<div><br></div><div>I see the team met up recently and one of the discussed items was issues upgrading to v7. 
What were the results of this discussion?</div><div><br></div><div>Is the team going to respond to this thread with their thoughts and analysis?</div><div><br></div><div>Thanks.<br clear="all"><div><div dir="ltr"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><br>Sincerely,<br>Artem<br><br>--<br>Founder, <a href="http://www.androidpolice.com" target="_blank">Android Police</a>, <a href="http://www.apkmirror.com/" style="font-size:12.8px" target="_blank">APK Mirror</a><span style="font-size:12.8px">, Illogical Robot LLC</span></div><div dir="ltr"><a href="http://beerpla.net/" target="_blank">beerpla.net</a> | <a href="http://twitter.com/ArtemR" target="_blank">@ArtemR</a><br></div></div></div></div></div></div></div></div></div></div></div></div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, May 4, 2020 at 10:23 PM Strahil Nikolov <<a href="mailto:hunter86_bg@yahoo.com" target="_blank">hunter86_bg@yahoo.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On May 4, 2020 4:26:32 PM GMT+03:00, Amar Tumballi <<a href="mailto:amar@kadalu.io" target="_blank">amar@kadalu.io</a>> wrote:<br>
>On Sat, May 2, 2020 at 10:49 PM Artem Russakovskii<br>
><<a href="mailto:archon810@gmail.com" target="_blank">archon810@gmail.com</a>><br>
>wrote:<br>
><br>
>> I don't have geo replication.<br>
>><br>
>> Still waiting for someone from the gluster team to chime in. They used to<br>
>> be a lot more responsive here. Do you know if there is perhaps a holiday,<br>
>> or have working hours been cut due to the Coronavirus?<br>
>><br>
>><br>
>It was a holiday on May 1st, and the 2nd and 3rd were weekend days! And I<br>
>guess many of the developers from Red Hat were attending the Virtual Summit!<br>
><br>
><br>
><br>
>> I'm not inclined to try a v6 upgrade without their word first.<br>
>><br>
><br>
>Fair bet! I will bring this topic up in one of the community meetings, and<br>
>ask developers if they have some feedback! I personally have not seen these<br>
>errors, and don't have a hunch on which patch would have caused an increase<br>
>in logs!<br>
><br>
>-Amar<br>
><br>
><br>
>><br>
>> On Sat, May 2, 2020, 12:47 AM Strahil Nikolov <<a href="mailto:hunter86_bg@yahoo.com" target="_blank">hunter86_bg@yahoo.com</a>><br>
>> wrote:<br>
>><br>
>>> On May 1, 2020 8:03:50 PM GMT+03:00, Artem Russakovskii <<br>
>>> <a href="mailto:archon810@gmail.com" target="_blank">archon810@gmail.com</a>> wrote:<br>
>>> >The good news is the downgrade seems to have worked and was<br>
>painless.<br>
>>> ><br>
>>> >zypper install --oldpackage glusterfs-5.13, restart gluster, and<br>
>almost<br>
>>> >immediately there are no heal pending entries anymore.<br>
>>> ><br>
>>> >The only things still showing up in the logs, besides some healing<br>
>is<br>
>>> >0-glusterfs-fuse:<br>
>>> >writing to fuse device failed: No such file or directory:<br>
>>> >==> mnt-androidpolice_data3.log <==<br>
>>> >[2020-05-01 16:54:21.085643] E<br>
>>> >[fuse-bridge.c:219:check_and_dump_fuse_W]<br>
>>> >(--><br>
>>><br>
>>/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7fd13d50624d]<br>
>>> >(--><br>
>>><br>
>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7fd1398e949a]<br>
>>> >(--><br>
>>><br>
>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7fd1398e97bb]<br>
>>> >(--> /lib64/libpthread.so.0(+0x84f9)[0x7fd13ca564f9] (--><br>
>>> >/lib64/libc.so.6(clone+0x3f)[0x7fd13c78ef2f] )))))<br>
>0-glusterfs-fuse:<br>
>>> >writing to fuse device failed: No such file or directory<br>
>>> >==> mnt-apkmirror_data1.log <==<br>
>>> >[2020-05-01 16:54:21.268842] E<br>
>>> >[fuse-bridge.c:219:check_and_dump_fuse_W]<br>
>>> >(--><br>
>>><br>
>>/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7fdf2b0a624d]<br>
>>> >(--><br>
>>><br>
>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7fdf2748949a]<br>
>>> >(--><br>
>>><br>
>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7fdf274897bb]<br>
>>> >(--> /lib64/libpthread.so.0(+0x84f9)[0x7fdf2a5f64f9] (--><br>
>>> >/lib64/libc.so.6(clone+0x3f)[0x7fdf2a32ef2f] )))))<br>
>0-glusterfs-fuse:<br>
>>> >writing to fuse device failed: No such file or directory<br>
>>> ><br>
>>> >It'd be very helpful if it had more info about what failed to write<br>
>and<br>
>>> >why.<br>
>>> ><br>
>>> >I'd still really love to see the analysis of this failed upgrade<br>
>from<br>
>>> >core<br>
>>> >gluster maintainers to see what needs fixing and how we can upgrade<br>
>in<br>
>>> >the<br>
>>> >future.<br>
>>> ><br>
>>> >Thanks.<br>
>>> ><br>
>>> >Sincerely,<br>
>>> >Artem<br>
>>> ><br>
>>> >--<br>
>>> >Founder, Android Police <<a href="http://www.androidpolice.com" rel="noreferrer" target="_blank">http://www.androidpolice.com</a>>, APK Mirror<br>
>>> ><<a href="http://www.apkmirror.com/" rel="noreferrer" target="_blank">http://www.apkmirror.com/</a>>, Illogical Robot LLC<br>
>>> ><a href="http://beerpla.net" rel="noreferrer" target="_blank">beerpla.net</a> | @ArtemR <<a href="http://twitter.com/ArtemR" rel="noreferrer" target="_blank">http://twitter.com/ArtemR</a>><br>
>>> ><br>
>>> ><br>
>>> >On Fri, May 1, 2020 at 7:25 AM Artem Russakovskii<br>
><<a href="mailto:archon810@gmail.com" target="_blank">archon810@gmail.com</a>><br>
>>> >wrote:<br>
>>> ><br>
>>> >> I do not have snapshots, no. I have a general file based backup,<br>
>but<br>
>>> >also<br>
>>> >> the other 3 nodes are up.<br>
>>> >><br>
>>> >> OpenSUSE 15.1.<br>
>>> >><br>
>>> >> If I try to downgrade and it doesn't work, what's the brick<br>
>>> >replacement<br>
>>> >> scenario - is this still accurate?<br>
>>> >><br>
>>> ><br>
>>><br>
><a href="https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-brick" rel="noreferrer" target="_blank">https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-brick</a><br>
>>> >><br>
>>> >> Any feedback about the issues themselves yet please?<br>
>Specifically, is<br>
>>> >> there a chance this is happening because of the mismatched<br>
>gluster<br>
>>> >> versions? Though, what's the solution then?<br>
>>> >><br>
>>> >> On Fri, May 1, 2020, 1:07 AM Strahil Nikolov<br>
><<a href="mailto:hunter86_bg@yahoo.com" target="_blank">hunter86_bg@yahoo.com</a>><br>
>>> >> wrote:<br>
>>> >><br>
>>> >>> On May 1, 2020 1:25:17 AM GMT+03:00, Artem Russakovskii <<br>
>>> >>> <a href="mailto:archon810@gmail.com" target="_blank">archon810@gmail.com</a>> wrote:<br>
>>> >>> >If more time is needed to analyze this, is this an option? Shut<br>
>>> >down<br>
>>> >>> >7.5,<br>
>>> >>> >downgrade it back to 5.13 and restart, or would this screw<br>
>>> >something up<br>
>>> >>> >badly? I didn't up the op-version yet.<br>
>>> >>> ><br>
>>> >>> >Thanks.<br>
>>> >>> ><br>
>>> >>> >Sincerely,<br>
>>> >>> >Artem<br>
>>> >>> ><br>
>>> >>> >--<br>
>>> >>> >Founder, Android Police <<a href="http://www.androidpolice.com" rel="noreferrer" target="_blank">http://www.androidpolice.com</a>>, APK<br>
>Mirror<br>
>>> >>> ><<a href="http://www.apkmirror.com/" rel="noreferrer" target="_blank">http://www.apkmirror.com/</a>>, Illogical Robot LLC<br>
>>> >>> ><a href="http://beerpla.net" rel="noreferrer" target="_blank">beerpla.net</a> | @ArtemR <<a href="http://twitter.com/ArtemR" rel="noreferrer" target="_blank">http://twitter.com/ArtemR</a>><br>
>>> >>> ><br>
>>> >>> ><br>
>>> >>> >On Thu, Apr 30, 2020 at 3:13 PM Artem Russakovskii<br>
>>> >>> ><<a href="mailto:archon810@gmail.com" target="_blank">archon810@gmail.com</a>><br>
>>> >>> >wrote:<br>
>>> >>> ><br>
>>> >>> >> The number of heal pending on citadel, the one that was<br>
>upgraded<br>
>>> >to<br>
>>> >>> >7.5,<br>
>>> >>> >> has now gone to 10s of thousands and continues to go up.<br>
>>> >>> >><br>
>>> >>> >> Sincerely,<br>
>>> >>> >> Artem<br>
>>> >>> >><br>
>>> >>> >> --<br>
>>> >>> >> Founder, Android Police <<a href="http://www.androidpolice.com" rel="noreferrer" target="_blank">http://www.androidpolice.com</a>>, APK<br>
>>> >Mirror<br>
>>> >>> >> <<a href="http://www.apkmirror.com/" rel="noreferrer" target="_blank">http://www.apkmirror.com/</a>>, Illogical Robot LLC<br>
>>> >>> >> <a href="http://beerpla.net" rel="noreferrer" target="_blank">beerpla.net</a> | @ArtemR <<a href="http://twitter.com/ArtemR" rel="noreferrer" target="_blank">http://twitter.com/ArtemR</a>><br>
>>> >>> >><br>
>>> >>> >><br>
>>> >>> >> On Thu, Apr 30, 2020 at 2:57 PM Artem Russakovskii<br>
>>> >>> ><<a href="mailto:archon810@gmail.com" target="_blank">archon810@gmail.com</a>><br>
>>> >>> >> wrote:<br>
>>> >>> >><br>
>>> >>> >>> Hi all,<br>
>>> >>> >>><br>
>>> >>> >>> Today, I decided to upgrade one of the four servers<br>
>(citadel) we<br>
>>> >>> >have to<br>
>>> >>> >>> 7.5 from 5.13. There are 2 volumes, 1x4 replicate, and fuse<br>
>>> >mounts<br>
>>> >>> >(I sent<br>
>>> >>> >>> the full details earlier in another message). If everything<br>
>>> >looked<br>
>>> >>> >OK, I<br>
>>> >>> >>> would have proceeded the rolling upgrade for all of them,<br>
>>> >following<br>
>>> >>> >the<br>
>>> >>> >>> full heal.<br>
>>> >>> >>><br>
>>> >>> >>> However, as soon as I upgraded and restarted, the logs<br>
>filled<br>
>>> >with<br>
>>> >>> >>> messages like these:<br>
>>> >>> >>><br>
>>> >>> >>> [2020-04-30 21:39:21.316149] E<br>
>>> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc<br>
>actor<br>
>>> >>> >>> (1298437:400:17) failed to complete successfully<br>
>>> >>> >>> [2020-04-30 21:39:21.382891] E<br>
>>> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc<br>
>actor<br>
>>> >>> >>> (1298437:400:17) failed to complete successfully<br>
>>> >>> >>> [2020-04-30 21:39:21.442440] E<br>
>>> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc<br>
>actor<br>
>>> >>> >>> (1298437:400:17) failed to complete successfully<br>
>>> >>> >>> [2020-04-30 21:39:21.445587] E<br>
>>> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc<br>
>actor<br>
>>> >>> >>> (1298437:400:17) failed to complete successfully<br>
>>> >>> >>> [2020-04-30 21:39:21.571398] E<br>
>>> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc<br>
>actor<br>
>>> >>> >>> (1298437:400:17) failed to complete successfully<br>
>>> >>> >>> [2020-04-30 21:39:21.668192] E<br>
>>> >>> >>> [rpcsvc.c:567:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc<br>
>actor<br>
>>> >>> >>> (1298437:400:17) failed to complete successfully<br>
>>> >>> >>><br>
>>> >>> >>><br>
>>> >>> >>> The message "I [MSGID: 108031]<br>
>>> >>> >>> [afr-common.c:2581:afr_local_discovery_cbk]<br>
>>> >>> >>> 0-androidpolice_data3-replicate-0: selecting local<br>
>read_child<br>
>>> >>> >>> androidpolice_data3-client-3" repeated 10 times between<br>
>>> >[2020-04-30<br>
>>> >>> >>> 21:46:41.854675] and [2020-04-30 21:48:20.206323]<br>
>>> >>> >>> The message "W [MSGID: 114031]<br>
>>> >>> >>> [client-rpc-fops_v2.c:850:client4_0_setxattr_cbk]<br>
>>> >>> >>> 0-androidpolice_data3-client-1: remote operation failed<br>
>>> >[Transport<br>
>>> >>> >endpoint<br>
>>> >>> >>> is not connected]" repeated 264 times between [2020-04-30<br>
>>> >>> >21:46:32.129567]<br>
>>> >>> >>> and [2020-04-30 21:48:29.905008]<br>
>>> >>> >>> The message "W [MSGID: 114031]<br>
>>> >>> >>> [client-rpc-fops_v2.c:850:client4_0_setxattr_cbk]<br>
>>> >>> >>> 0-androidpolice_data3-client-0: remote operation failed<br>
>>> >[Transport<br>
>>> >>> >endpoint<br>
>>> >>> >>> is not connected]" repeated 264 times between [2020-04-30<br>
>>> >>> >21:46:32.129602]<br>
>>> >>> >>> and [2020-04-30 21:48:29.905040]<br>
>>> >>> >>> The message "W [MSGID: 114031]<br>
>>> >>> >>> [client-rpc-fops_v2.c:850:client4_0_setxattr_cbk]<br>
>>> >>> >>> 0-androidpolice_data3-client-2: remote operation failed<br>
>>> >[Transport<br>
>>> >>> >endpoint<br>
>>> >>> >>> is not connected]" repeated 264 times between [2020-04-30<br>
>>> >>> >21:46:32.129512]<br>
>>> >>> >>> and [2020-04-30 21:48:29.905047]<br>
>>> >>> >>><br>
>>> >>> >>><br>
>>> >>> >>><br>
>>> >>> >>> Once in a while, I'm seeing this:<br>
>>> >>> >>> ==> bricks/mnt-hive_block4-androidpolice_data3.log <==<br>
>>> >>> >>> [2020-04-30 21:45:54.251637] I [MSGID: 115072]<br>
>>> >>> >>> [server-rpc-fops_v2.c:1681:server4_setattr_cbk]<br>
>>> >>> >>> 0-androidpolice_data3-server: 5725811: SETATTR /<br>
>>> >>> >>><br>
>>> >>> ><br>
>>> >>><br>
>>> ><br>
>>><br>
><a href="http://androidpolice.com/public/wp-content/uploads/2019/03/cielo-breez-plus-hero.png" rel="noreferrer" target="_blank">androidpolice.com/public/wp-content/uploads/2019/03/cielo-breez-plus-hero.png</a><br>
>>> >>> >>> (d4556eb4-f15b-412c-a42a-32b4438af557), client:<br>
>>> >>> >>><br>
>>> >>><br>
>>> >>><br>
>>><br>
>>><br>
>>>CTX_ID:32e2d636-038a-472d-8199-007555d1805f-GRAPH_ID:0-PID:14265-HOST:nexus2-PC_NAME:androidpolice_data3-client-2-RECON_NO:-1,<br>
>>> >>> >>> error-xlator: androidpolice_data3-access-control [Operation<br>
>not<br>
>>> >>> >permitted]<br>
>>> >>> >>> [2020-04-30 21:49:10.439701] I [MSGID: 115072]<br>
>>> >>> >>> [server-rpc-fops_v2.c:1680:server4_setattr_cbk]<br>
>>> >>> >>> 0-androidpolice_data3-server: 201833: SETATTR /<br>
>>> >>> >>> <a href="http://androidpolice.com/public/wp-content/uploads" rel="noreferrer" target="_blank">androidpolice.com/public/wp-content/uploads</a><br>
>>> >>> >>> (2692eeba-1ebe-49b6-927f-1dfbcd227591), client:<br>
>>> >>> >>><br>
>>> >>><br>
>>> >>><br>
>>><br>
>>><br>
>>>CTX_ID:af341e80-70ff-4d23-99ef-3d846a546fc9-GRAPH_ID:0-PID:2358-HOST:forge-PC_NAME:androidpolice_data3-client-3-RECON_NO:-2,<br>
>>> >>> >>> error-xlator: androidpolice_data3-access-control [Operation<br>
>not<br>
>>> >>> >permitted]<br>
>>> >>> >>> [2020-04-30 21:49:10.453724] I [MSGID: 115072]<br>
>>> >>> >>> [server-rpc-fops_v2.c:1680:server4_setattr_cbk]<br>
>>> >>> >>> 0-androidpolice_data3-server: 201842: SETATTR /<br>
>>> >>> >>> <a href="http://androidpolice.com/public/wp-content/uploads" rel="noreferrer" target="_blank">androidpolice.com/public/wp-content/uploads</a><br>
>>> >>> >>> (2692eeba-1ebe-49b6-927f-1dfbcd227591), client:<br>
>>> >>> >>><br>
>>> >>><br>
>>> >>><br>
>>><br>
>>><br>
>>>CTX_ID:af341e80-70ff-4d23-99ef-3d846a546fc9-GRAPH_ID:0-PID:2358-HOST:forge-PC_NAME:androidpolice_data3-client-3-RECON_NO:-2,<br>
>>> >>> >>> error-xlator: androidpolice_data3-access-control [Operation<br>
>not<br>
>>> >>> >permitted]<br>
>>> >>> >>> [2020-04-30 21:49:16.224662] I [MSGID: 115072]<br>
>>> >>> >>> [server-rpc-fops_v2.c:1680:server4_setattr_cbk]<br>
>>> >>> >>> 0-androidpolice_data3-server: 202865: SETATTR /<br>
>>> >>> >>> <a href="http://androidpolice.com/public/wp-content/uploads" rel="noreferrer" target="_blank">androidpolice.com/public/wp-content/uploads</a><br>
>>> >>> >>> (2692eeba-1ebe-49b6-927f-1dfbcd227591), client:<br>
>>> >>> >>><br>
>>> >>><br>
>>> >>><br>
>>><br>
>>><br>
>>>CTX_ID:32e2d636-038a-472d-8199-007555d1805f-GRAPH_ID:0-PID:14265-HOST:nexus2-PC_NAME:androidpolice_data3-client-3-RECON_NO:-2,<br>
>>> >>> >>> error-xlator: androidpolice_data3-access-control [Operation<br>
>not<br>
>>> >>> >permitted]<br>
>>> >>> >>><br>
>>> >>> >>> There's also lots of self-healing happening that I didn't<br>
>expect<br>
>>> >at<br>
>>> >>> >all,<br>
>>> >>> >>> since the upgrade only took ~10-15s.<br>
>>> >>> >>> [2020-04-30 21:47:38.714448] I [MSGID: 108026]<br>
>>> >>> >>> [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]<br>
>>> >>> >>> 0-apkmirror_data1-replicate-0: performing metadata selfheal<br>
>on<br>
>>> >>> >>> 4a6ba2d7-7ad8-4113-862b-02e4934a3461<br>
>>> >>> >>> [2020-04-30 21:47:38.765033] I [MSGID: 108026]<br>
>>> >>> >>> [afr-self-heal-common.c:1723:afr_log_selfheal]<br>
>>> >>> >>> 0-apkmirror_data1-replicate-0: Completed metadata selfheal<br>
>on<br>
>>> >>> >>> 4a6ba2d7-7ad8-4113-862b-02e4934a3461. sources=[3] sinks=0 1<br>
>2<br>
>>> >>> >>> [2020-04-30 21:47:38.765289] I [MSGID: 108026]<br>
>>> >>> >>> [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]<br>
>>> >>> >>> 0-apkmirror_data1-replicate-0: performing metadata selfheal<br>
>on<br>
>>> >>> >>> f3c62a41-1864-4e75-9883-4357a7091296<br>
>>> >>> >>> [2020-04-30 21:47:38.800987] I [MSGID: 108026]<br>
>>> >>> >>> [afr-self-heal-common.c:1723:afr_log_selfheal]<br>
>>> >>> >>> 0-apkmirror_data1-replicate-0: Completed metadata selfheal<br>
>on<br>
>>> >>> >>> f3c62a41-1864-4e75-9883-4357a7091296. sources=[3] sinks=0 1<br>
>2<br>
>>> >>> >>><br>
>>> >>> >>><br>
>>> >>> >>> I'm also seeing "remote operation failed" and "writing to<br>
>fuse<br>
>>> >>> >device<br>
>>> >>> >>> failed: No such file or directory" messages<br>
>>> >>> >>> [2020-04-30 21:46:34.891957] I [MSGID: 108026]<br>
>>> >>> >>> [afr-self-heal-common.c:1723:afr_log_selfheal]<br>
>>> >>> >>> 0-androidpolice_data3-replicate-0: Completed metadata<br>
>selfheal<br>
>>> >on<br>
>>> >>> >>> 2692eeba-1ebe-49b6-927f-1dfbcd227591. sources=0 1 [2] <br>
>sinks=3<br>
>>> >>> >>> [2020-04-30 21:45:36.127412] W [MSGID: 114031]<br>
>>> >>> >>> [client-rpc-fops_v2.c:1985:client4_0_setattr_cbk]<br>
>>> >>> >>> 0-androidpolice_data3-client-0: remote operation failed<br>
>>> >[Operation<br>
>>> >>> >not<br>
>>> >>> >>> permitted]<br>
>>> >>> >>> [2020-04-30 21:45:36.345924] W [MSGID: 114031]<br>
>>> >>> >>> [client-rpc-fops_v2.c:1985:client4_0_setattr_cbk]<br>
>>> >>> >>> 0-androidpolice_data3-client-1: remote operation failed<br>
>>> >[Operation<br>
>>> >>> >not<br>
>>> >>> >>> permitted]<br>
>>> >>> >>> [2020-04-30 21:46:35.291853] I [MSGID: 108031]<br>
>>> >>> >>> [afr-common.c:2543:afr_local_discovery_cbk]<br>
>>> >>> >>> 0-androidpolice_data3-replicate-0: selecting local<br>
>read_child<br>
>>> >>> >>> androidpolice_data3-client-2<br>
>>> >>> >>> [2020-04-30 21:46:35.977342] I [MSGID: 108026]<br>
>>> >>> >>> [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]<br>
>>> >>> >>> 0-androidpolice_data3-replicate-0: performing metadata<br>
>selfheal<br>
>>> >on<br>
>>> >>> >>> 2692eeba-1ebe-49b6-927f-1dfbcd227591<br>
>>> >>> >>> [2020-04-30 21:46:36.006607] I [MSGID: 108026]<br>
>>> >>> >>> [afr-self-heal-common.c:1723:afr_log_selfheal]<br>
>>> >>> >>> 0-androidpolice_data3-replicate-0: Completed metadata<br>
>selfheal<br>
>>> >on<br>
>>> >>> >>> 2692eeba-1ebe-49b6-927f-1dfbcd227591. sources=0 1 [2] <br>
>sinks=3<br>
>>> >>> >>> [2020-04-30 21:46:37.245599] E<br>
>>> >>> >[fuse-bridge.c:219:check_and_dump_fuse_W]<br>
>>> >>> >>> (--><br>
>>> >>><br>
>>><br>
>>>/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7fd13d50624d]<br>
>>> >>> >>> (--><br>
>>> >>> >>><br>
>>> >>><br>
>>><br>
>>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7fd1398e949a]<br>
>>> >>> >>> (--><br>
>>> >>> >>><br>
>>> >>><br>
>>><br>
>>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7fd1398e97bb]<br>
>>> >>> >>> (--> /lib64/libpthread.so.0(+0x84f9)[0x7fd13ca564f9] (--><br>
>>> >>> >>> /lib64/libc.so.6(clone+0x3f)[0x7fd13c78ef2f] )))))<br>
>>> >0-glusterfs-fuse:<br>
>>> >>> >>> writing to fuse device failed: No such file or directory<br>
>>> >>> >>> [2020-04-30 21:46:50.864797] E<br>
>>> >>> >[fuse-bridge.c:219:check_and_dump_fuse_W]<br>
>>> >>> >>> (--><br>
>>> >>><br>
>>><br>
>>>/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7fd13d50624d]<br>
>>> >>> >>> (--><br>
>>> >>> >>><br>
>>> >>><br>
>>><br>
>>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7fd1398e949a]<br>
>>> >>> >>> (--><br>
>>> >>> >>><br>
>>> >>><br>
>>><br>
>>>/usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7fd1398e97bb]<br>
>>> >>> >>> (--> /lib64/libpthread.so.0(+0x84f9)[0x7fd13ca564f9] (--><br>
>>> >>> >>> /lib64/libc.so.6(clone+0x3f)[0x7fd13c78ef2f] )))))<br>
>>> >0-glusterfs-fuse:<br>
>>> >>> >>> writing to fuse device failed: No such file or directory<br>
>>> >>> >>><br>
>>> >>> >>> The number of items being healed is going up and down<br>
>wildly,<br>
>>> >from 0<br>
>>> >>> >to<br>
>>> >>> >>> 8000+ and sometimes taking a really long time to return a<br>
>value.<br>
>>> >I'm<br>
>>> >>> >really<br>
>>> >>> >>> worried as this is a production system, and I didn't observe<br>
>>> >this in<br>
>>> >>> >our<br>
>>> >>> >>> test system.<br>
>>> >>> >>><br>
>>> >>> >>><br>
>>> >>> >>><br>
>>> >>> >>> gluster v heal apkmirror_data1 info summary<br>
>>> >>> >>> Brick nexus2:/mnt/nexus2_block1/apkmirror_data1<br>
>>> >>> >>> Status: Connected<br>
>>> >>> >>> Total Number of entries: 27<br>
>>> >>> >>> Number of entries in heal pending: 27<br>
>>> >>> >>> Number of entries in split-brain: 0<br>
>>> >>> >>> Number of entries possibly healing: 0<br>
>>> >>> >>><br>
>>> >>> >>> Brick forge:/mnt/forge_block1/apkmirror_data1<br>
>>> >>> >>> Status: Connected<br>
>>> >>> >>> Total Number of entries: 27<br>
>>> >>> >>> Number of entries in heal pending: 27<br>
>>> >>> >>> Number of entries in split-brain: 0<br>
>>> >>> >>> Number of entries possibly healing: 0<br>
>>> >>> >>><br>
>>> >>> >>> Brick hive:/mnt/hive_block1/apkmirror_data1<br>
>>> >>> >>> Status: Connected<br>
>>> >>> >>> Total Number of entries: 27<br>
>>> >>> >>> Number of entries in heal pending: 27<br>
>>> >>> >>> Number of entries in split-brain: 0<br>
>>> >>> >>> Number of entries possibly healing: 0<br>
>>> >>> >>><br>
>>> >>> >>> Brick citadel:/mnt/citadel_block1/apkmirror_data1<br>
>>> >>> >>> Status: Connected<br>
>>> >>> >>> Total Number of entries: 8540<br>
>>> >>> >>> Number of entries in heal pending: 8540<br>
>>> >>> >>> Number of entries in split-brain: 0<br>
>>> >>> >>> Number of entries possibly healing: 0<br>
>>> >>> >>><br>
>>> >>> >>><br>
>>> >>> >>><br>
>>> >>> >>> gluster v heal androidpolice_data3 info summary<br>
>>> >>> >>> Brick nexus2:/mnt/nexus2_block4/androidpolice_data3<br>
>>> >>> >>> Status: Connected<br>
>>> >>> >>> Total Number of entries: 1<br>
>>> >>> >>> Number of entries in heal pending: 1<br>
>>> >>> >>> Number of entries in split-brain: 0<br>
>>> >>> >>> Number of entries possibly healing: 0<br>
>>> >>> >>><br>
>>> >>> >>> Brick forge:/mnt/forge_block4/androidpolice_data3<br>
>>> >>> >>> Status: Connected<br>
>>> >>> >>> Total Number of entries: 1<br>
>>> >>> >>> Number of entries in heal pending: 1<br>
>>> >>> >>> Number of entries in split-brain: 0<br>
>>> >>> >>> Number of entries possibly healing: 0<br>
>>> >>> >>><br>
>>> >>> >>> Brick hive:/mnt/hive_block4/androidpolice_data3<br>
>>> >>> >>> Status: Connected<br>
>>> >>> >>> Total Number of entries: 1<br>
>>> >>> >>> Number of entries in heal pending: 1<br>
>>> >>> >>> Number of entries in split-brain: 0<br>
>>> >>> >>> Number of entries possibly healing: 0<br>
>>> >>> >>><br>
>>> >>> >>> Brick citadel:/mnt/citadel_block4/androidpolice_data3<br>
>>> >>> >>> Status: Connected<br>
>>> >>> >>> Total Number of entries: 1149<br>
>>> >>> >>> Number of entries in heal pending: 1149<br>
>>> >>> >>> Number of entries in split-brain: 0<br>
>>> >>> >>> Number of entries possibly healing: 0<br>
>>> >>> >>><br>
>>> >>> >>><br>
>>> >>> >>> What should I do at this point? The files I tested seem to<br>
>be<br>
>>> >>> >replicating<br>
>>> >>> >>> correctly, but I don't know if it's the case for all of<br>
>them,<br>
>>> >and<br>
>>> >>> >the heals<br>
>>> >>> >>> going up and down, and all these log messages are making me<br>
>very<br>
>>> >>> >nervous.<br>
>>> >>> >>><br>
>>> >>> >>> Thank you.<br>
>>> >>> >>><br>
>>> >>> >>> Sincerely,<br>
>>> >>> >>> Artem<br>
>>> >>> >>><br>
>>> >>> >>> --<br>
>>> >>> >>> Founder, Android Police <<a href="http://www.androidpolice.com" rel="noreferrer" target="_blank">http://www.androidpolice.com</a>>, APK<br>
>>> >Mirror<br>
>>> >>> >>> <<a href="http://www.apkmirror.com/" rel="noreferrer" target="_blank">http://www.apkmirror.com/</a>>, Illogical Robot LLC<br>
>>> >>> >>> <a href="http://beerpla.net" rel="noreferrer" target="_blank">beerpla.net</a> | @ArtemR <<a href="http://twitter.com/ArtemR" rel="noreferrer" target="_blank">http://twitter.com/ArtemR</a>><br>
>>> >>> >>><br>
>>> >>> >><br>
>>> >>><br>
>>> It's not supported, but usually it works.<br>
>>> >>><br>
>>> In the worst-case scenario, you can remove the node, wipe gluster on the<br>
>>> node, reinstall the packages, and add it back - it will require a full heal<br>
>>> of the brick and, as you have previously reported, could lead to performance<br>
>>> degradation.<br>
>>> >>><br>
>>> I think you are on SLES, but I could be wrong. Do you have btrfs or LVM<br>
>>> snapshots to revert from?<br>
>>> >>><br>
>>> >>> Best Regards,<br>
>>> >>> Strahil Nikolov<br>
>>> >>><br>
>>> >><br>
>>><br>
>>> Hi Artem,<br>
>>><br>
>>> You can increase the brick log level following<br>
>>><br>
><a href="https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/configuring_the_log_level" rel="noreferrer" target="_blank">https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/configuring_the_log_level</a><br>
>>> but keep in mind that logs grow quite fast - so don't keep them above the<br>
>>> current level for too long.<br>
>>><br>
>>><br>
>>> Do you have a geo replication running ?<br>
>>><br>
>>> About the migration issue - I have no clue why this happened. Last time I<br>
>>> skipped a major release (3.12 to 5.5) I got into huge trouble (all file<br>
>>> ownership was switched to root), and I have the feeling that it won't<br>
>>> happen again if you go through v6.<br>
>>><br>
>>> Best Regards,<br>
>>> Strahil Nikolov<br>
>>><br>
>> ________<br>
>><br>
>><br>
>><br>
>> Community Meeting Calendar:<br>
>><br>
>> Schedule -<br>
>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC<br>
>> Bridge: <a href="https://bluejeans.com/441850968" rel="noreferrer" target="_blank">https://bluejeans.com/441850968</a><br>
>><br>
>> Gluster-users mailing list<br>
>> <a href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a><br>
>> <a href="https://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">https://lists.gluster.org/mailman/listinfo/gluster-users</a><br>
>><br>
<br>
Hey Artem,<br>
<br>
I just checked whether 'replica 4' is causing the issue, but that's not it (I also tested with 1 node down, and it's the same situation).<br>
<br>
I created 4 VMs on CentOS 7 with Gluster v7.5 (the brick has only the noatime mount option) and created a 'replica 4' volume.<br>
Then I created a dir and placed 50000 very small files in it via:<br>
for i in {1..50000}; do echo $RANDOM > $i ; done<br>
<br>
The find command 'finds' them in 4s, and after some tuning I managed to lower that to 2.5s.<br>
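The file-creation and timing test above can be recreated with a short script (the directory name is my own placeholder; run it inside a fuse mount of the volume to measure gluster's metadata overhead):

```shell
#!/bin/bash
# Recreate the small-file test: one directory holding 50000 tiny files,
# then time how long 'find' takes to enumerate them.
testdir=smallfile-test
mkdir -p "$testdir"

# Each file holds a single random number, as in the original one-liner.
for i in $(seq 1 50000); do
    echo $RANDOM > "$testdir/$i"
done

# On a gluster fuse mount, the elapsed time is dominated by metadata
# round trips; 'wc -l' confirms the file count.
time find "$testdir" -type f | wc -l
```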
<br>
What caused some improvement was:<br>
A) Activating the rhgs-random-io tuned profile, which you can take from <a href="ftp://ftp.redhat.com/redhat/linux/enterprise/7Server/en/RHS/SRPMS/redhat-storage-server-3.5.0.0-1.el7rhgs.src.rpm" rel="noreferrer" target="_blank">ftp://ftp.redhat.com/redhat/linux/enterprise/7Server/en/RHS/SRPMS/redhat-storage-server-3.5.0.0-1.el7rhgs.src.rpm</a><br>
B) Using noatime as a mount option; if you use SELinux, you could add the 'context=system_u:object_r:glusterd_brick_t:s0' mount option to prevent selinux context lookups<br>
C) Activating the gluster settings group 'metadata-cache' or 'nl-cache' brought 'find' to the same result - lowered from 3.5s to 2.5s after an initial run.<br>
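For reference, here is a rough sketch of how A) through C) translate into commands; the volume name 'myvol' and the fstab device/mount point are placeholders, and A) assumes the tuned profile from the SRPM above is already installed:

```shell
# (A) Switch to the random-io tuned profile (assumes it is installed):
tuned-adm profile rhgs-random-io

# (B) Example fstab entry for a brick with noatime and a pinned SELinux
# context (device and mount point are placeholders):
#   /dev/vg0/brick1  /bricks/brick1  xfs  noatime,context="system_u:object_r:glusterd_brick_t:s0"  0 0

# (C) Enable one of gluster's predefined option groups on the volume;
# each group applies a bundle of volume options in one command:
gluster volume set myvol group metadata-cache
# or, alternatively:
gluster volume set myvol group nl-cache
```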
<br>
I know that I'm not comparing apples to apples, but it still might help.<br>
<br>
I would like to learn what gluster actually does when a 'find' or 'ls' is invoked, as I doubt it just executes them on the bricks.<br>
<br>
Best Regards,<br>
Strahil Nikolov<br>
</blockquote></div>
</blockquote></div>