[Gluster-users] Upgrade from 5.13 to 7.5 full of weird messages
    Artem Russakovskii 
    archon810 at gmail.com
       
    Fri Jun 26 21:26:42 UTC 2020
    
    
  
Hi Mahdi,
I already listed the steps that it took - simply upgrading one of the four
nodes from 5.13 to 7.5 and observing the log.
Sincerely,
Artem
--
Founder, Android Police <http://www.androidpolice.com>, APK Mirror
<http://www.apkmirror.com/>, Illogical Robot LLC
beerpla.net | @ArtemR <http://twitter.com/ArtemR>
On Sun, Jun 21, 2020 at 12:03 PM Mahdi Adnan <mahdi at sysmin.io> wrote:
> I think if it's reproducible then someone can look into it. Can you list
> the steps to reproduce it?
>
> On Sun, Jun 21, 2020 at 9:12 PM Artem Russakovskii <archon810 at gmail.com>
> wrote:
>
>> There's been 0 progress or attention to this issue in a month on github
>> or otherwise.
>>
>> Sincerely,
>> Artem
>>
>>
>>
>> On Thu, May 21, 2020 at 12:43 PM Artem Russakovskii <archon810 at gmail.com>
>> wrote:
>>
>>> I've also moved this to github:
>>> https://github.com/gluster/glusterfs/issues/1257.
>>>
>>> Sincerely,
>>> Artem
>>>
>>>
>>>
>>> On Fri, May 15, 2020 at 2:51 PM Artem Russakovskii <archon810 at gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I see the team met up recently and one of the discussed items was
>>>> issues upgrading to v7. What were the results of this discussion?
>>>>
>>>> Is the team going to respond to this thread with their thoughts and
>>>> analysis?
>>>>
>>>> Thanks.
>>>>
>>>> Sincerely,
>>>> Artem
>>>>
>>>>
>>>>
>>>> On Mon, May 4, 2020 at 10:23 PM Strahil Nikolov <hunter86_bg at yahoo.com>
>>>> wrote:
>>>>
>>>>> On May 4, 2020 4:26:32 PM GMT+03:00, Amar Tumballi <amar at kadalu.io>
>>>>> wrote:
>>>>> >On Sat, May 2, 2020 at 10:49 PM Artem Russakovskii <archon810 at gmail.com>
>>>>> >wrote:
>>>>> >
>>>>> >> I don't have geo replication.
>>>>> >>
>>>>> >> Still waiting for someone from the gluster team to chime in. They used to
>>>>> >> be a lot more responsive here. Do you know if there is a holiday perhaps,
>>>>> >> or have the working hours been cut due to Coronavirus currently?
>>>>> >>
>>>>> >>
>>>>> >It was a holiday on May 1st, and the 2nd and 3rd were weekend days! Also, I
>>>>> >guess many of the developers from Red Hat were attending the Virtual Summit!
>>>>> >
>>>>> >
>>>>> >
>>>>> >> I'm not inclined to try a v6 upgrade without their word first.
>>>>> >>
>>>>> >
>>>>> >Fair bet! I will bring this topic up in one of the community meetings and ask
>>>>> >the developers if they have some feedback! I personally have not seen these
>>>>> >errors, and don't have a hunch about which patch would have caused an increase
>>>>> >in logs!
>>>>> >
>>>>> >-Amar
>>>>> >
>>>>> >
>>>>> >>
>>>>> >> On Sat, May 2, 2020, 12:47 AM Strahil Nikolov <hunter86_bg at yahoo.com>
>>>>> >> wrote:
>>>>> >>
>>>>> >>> On May 1, 2020 8:03:50 PM GMT+03:00, Artem Russakovskii <
>>>>> >>> archon810 at gmail.com> wrote:
>>>>> >>> >The good news is the downgrade seems to have worked and was painless.
>>>>> >>> >
>>>>> >>> >zypper install --oldpackage glusterfs-5.13, restart gluster, and almost
>>>>> >>> >immediately there are no heal pending entries anymore.
>>>>> >>> >
>>>>> >>> >The only thing still showing up in the logs, besides some healing, is
>>>>> >>> >"0-glusterfs-fuse: writing to fuse device failed: No such file or
>>>>> >>> >directory":
>>>>> >>> >==> mnt-androidpolice_data3.log <==
>>>>> >>> >[2020-05-01 16:54:21.085643] E [fuse-bridge.c:219:check_and_dump_fuse_W]
>>>>> >>> >(--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7fd13d50624d]
>>>>> >>> >(--> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7fd1398e949a]
>>>>> >>> >(--> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7fd1398e97bb]
>>>>> >>> >(--> /lib64/libpthread.so.0(+0x84f9)[0x7fd13ca564f9]
>>>>> >>> >(--> /lib64/libc.so.6(clone+0x3f)[0x7fd13c78ef2f] )))))
>>>>> >>> >0-glusterfs-fuse: writing to fuse device failed: No such file or directory
>>>>> >>> >==> mnt-apkmirror_data1.log <==
>>>>> >>> >[2020-05-01 16:54:21.268842] E [fuse-bridge.c:219:check_and_dump_fuse_W]
>>>>> >>> >(--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7fdf2b0a624d]
>>>>> >>> >(--> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7fdf2748949a]
>>>>> >>> >(--> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7fdf274897bb]
>>>>> >>> >(--> /lib64/libpthread.so.0(+0x84f9)[0x7fdf2a5f64f9]
>>>>> >>> >(--> /lib64/libc.so.6(clone+0x3f)[0x7fdf2a32ef2f] )))))
>>>>> >>> >0-glusterfs-fuse: writing to fuse device failed: No such file or directory
>>>>> >>> >
>>>>> >>> >It'd be very helpful if it had more info about what failed to write and
>>>>> >>> >why.
>>>>> >>> >
>>>>> >>> >I'd still really love to see the analysis of this failed upgrade from core
>>>>> >>> >gluster maintainers to see what needs fixing and how we can upgrade in the
>>>>> >>> >future.
>>>>> >>> >
>>>>> >>> >Thanks.
>>>>> >>> >
>>>>> >>> >Sincerely,
>>>>> >>> >Artem
>>>>> >>> >
>>>>> >>> >
>>>>> >>> >
>>>>> >>> >On Fri, May 1, 2020 at 7:25 AM Artem Russakovskii <archon810 at gmail.com>
>>>>> >>> >wrote:
>>>>> >>> >
>>>>> >>> >> I do not have snapshots, no. I have a general file based backup, but
>>>>> >>> >> also the other 3 nodes are up.
>>>>> >>> >>
>>>>> >>> >> OpenSUSE 15.1.
>>>>> >>> >>
>>>>> >>> >> If I try to downgrade and it doesn't work, what's the brick replacement
>>>>> >>> >> scenario - is this still accurate?
>>>>> >>> >>
>>>>> >>> >> https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-brick
>>>>> >>> >>
>>>>> >>> >> Any feedback about the issues themselves yet please? Specifically, is
>>>>> >>> >> there a chance this is happening because of the mismatched gluster
>>>>> >>> >> versions? Though, what's the solution then?
>>>>> >>> >>
>>>>> >>> >> On Fri, May 1, 2020, 1:07 AM Strahil Nikolov <hunter86_bg at yahoo.com>
>>>>> >>> >> wrote:
>>>>> >>> >>
>>>>> >>> >>> On May 1, 2020 1:25:17 AM GMT+03:00, Artem Russakovskii <
>>>>> >>> >>> archon810 at gmail.com> wrote:
>>>>> >>> >>> >If more time is needed to analyze this, is this an option? Shut down
>>>>> >>> >>> >7.5, downgrade it back to 5.13 and restart, or would this screw
>>>>> >>> >>> >something up badly? I didn't up the op-version yet.
>>>>> >>> >>> >
>>>>> >>> >>> >Thanks.
>>>>> >>> >>> >
>>>>> >>> >>> >Sincerely,
>>>>> >>> >>> >Artem
>>>>> >>> >>> >
>>>>> >>> >>> >
>>>>> >>> >>> >
>>>>> >>> >>> >On Thu, Apr 30, 2020 at 3:13 PM Artem Russakovskii
>>>>> >>> >>> ><archon810 at gmail.com> wrote:
>>>>> >>> >>> >
>>>>> >>> >>> >> The number of heal pending on citadel, the one that was upgraded to
>>>>> >>> >>> >> 7.5, has now gone to 10s of thousands and continues to go up.
>>>>> >>> >>> >>
>>>>> >>> >>> >> Sincerely,
>>>>> >>> >>> >> Artem
>>>>> >>> >>> >>
>>>>> >>> >>> >>
>>>>> >>> >>> >>
>>>>> >>> >>> >> On Thu, Apr 30, 2020 at 2:57 PM Artem Russakovskii
>>>>> >>> >>> >> <archon810 at gmail.com> wrote:
>>>>> >>> >>> >>
>>>>> >>> >>> >>> Hi all,
>>>>> >>> >>> >>>
>>>>> >>> >>> >>> Today, I decided to upgrade one of the four servers (citadel) we have
>>>>> >>> >>> >>> to 7.5 from 5.13. There are 2 volumes, 1x4 replicate, and fuse mounts
>>>>> >>> >>> >>> (I sent the full details earlier in another message). If everything
>>>>> >>> >>> >>> looked OK, I would have proceeded with the rolling upgrade for all of
>>>>> >>> >>> >>> them, following the full heal.
>>>>> >>> >>> >>>
>>>>> >>> >>> >>> However, as soon as I upgraded and restarted, the logs filled with
>>>>> >>> >>> >>> messages like these:
>>>>> >>> >>> >>>
>>>>> >>> >>> >>> [2020-04-30 21:39:21.316149] E [rpcsvc.c:567:rpcsvc_check_and_reply_error]
>>>>> >>> >>> >>> 0-rpcsvc: rpc actor (1298437:400:17) failed to complete successfully
>>>>> >>> >>> >>> [2020-04-30 21:39:21.382891] E [rpcsvc.c:567:rpcsvc_check_and_reply_error]
>>>>> >>> >>> >>> 0-rpcsvc: rpc actor (1298437:400:17) failed to complete successfully
>>>>> >>> >>> >>> [2020-04-30 21:39:21.442440] E [rpcsvc.c:567:rpcsvc_check_and_reply_error]
>>>>> >>> >>> >>> 0-rpcsvc: rpc actor (1298437:400:17) failed to complete successfully
>>>>> >>> >>> >>> [2020-04-30 21:39:21.445587] E [rpcsvc.c:567:rpcsvc_check_and_reply_error]
>>>>> >>> >>> >>> 0-rpcsvc: rpc actor (1298437:400:17) failed to complete successfully
>>>>> >>> >>> >>> [2020-04-30 21:39:21.571398] E [rpcsvc.c:567:rpcsvc_check_and_reply_error]
>>>>> >>> >>> >>> 0-rpcsvc: rpc actor (1298437:400:17) failed to complete successfully
>>>>> >>> >>> >>> [2020-04-30 21:39:21.668192] E [rpcsvc.c:567:rpcsvc_check_and_reply_error]
>>>>> >>> >>> >>> 0-rpcsvc: rpc actor (1298437:400:17) failed to complete successfully
>>>>> >>> >>> >>>
>>>>> >>> >>> >>>
>>>>> >>> >>> >>> The message "I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk]
>>>>> >>> >>> >>> 0-androidpolice_data3-replicate-0: selecting local read_child
>>>>> >>> >>> >>> androidpolice_data3-client-3" repeated 10 times between
>>>>> >>> >>> >>> [2020-04-30 21:46:41.854675] and [2020-04-30 21:48:20.206323]
>>>>> >>> >>> >>> The message "W [MSGID: 114031] [client-rpc-fops_v2.c:850:client4_0_setxattr_cbk]
>>>>> >>> >>> >>> 0-androidpolice_data3-client-1: remote operation failed [Transport endpoint
>>>>> >>> >>> >>> is not connected]" repeated 264 times between
>>>>> >>> >>> >>> [2020-04-30 21:46:32.129567] and [2020-04-30 21:48:29.905008]
>>>>> >>> >>> >>> The message "W [MSGID: 114031] [client-rpc-fops_v2.c:850:client4_0_setxattr_cbk]
>>>>> >>> >>> >>> 0-androidpolice_data3-client-0: remote operation failed [Transport endpoint
>>>>> >>> >>> >>> is not connected]" repeated 264 times between
>>>>> >>> >>> >>> [2020-04-30 21:46:32.129602] and [2020-04-30 21:48:29.905040]
>>>>> >>> >>> >>> The message "W [MSGID: 114031] [client-rpc-fops_v2.c:850:client4_0_setxattr_cbk]
>>>>> >>> >>> >>> 0-androidpolice_data3-client-2: remote operation failed [Transport endpoint
>>>>> >>> >>> >>> is not connected]" repeated 264 times between
>>>>> >>> >>> >>> [2020-04-30 21:46:32.129512] and [2020-04-30 21:48:29.905047]
>>>>> >>> >>> >>>
>>>>> >>> >>> >>>
>>>>> >>> >>> >>>
>>>>> >>> >>> >>> Once in a while, I'm seeing this:
>>>>> >>> >>> >>> ==> bricks/mnt-hive_block4-androidpolice_data3.log <==
>>>>> >>> >>> >>> [2020-04-30 21:45:54.251637] I [MSGID: 115072]
>>>>> >>> >>> >>> [server-rpc-fops_v2.c:1681:server4_setattr_cbk]
>>>>> >>> >>> >>> 0-androidpolice_data3-server: 5725811: SETATTR
>>>>> >>> >>> >>> /androidpolice.com/public/wp-content/uploads/2019/03/cielo-breez-plus-hero.png
>>>>> >>> >>> >>> (d4556eb4-f15b-412c-a42a-32b4438af557), client:
>>>>> >>> >>> >>> CTX_ID:32e2d636-038a-472d-8199-007555d1805f-GRAPH_ID:0-PID:14265-HOST:nexus2-PC_NAME:androidpolice_data3-client-2-RECON_NO:-1,
>>>>> >>> >>> >>> error-xlator: androidpolice_data3-access-control [Operation not permitted]
>>>>> >>> >>> >>> [2020-04-30 21:49:10.439701] I [MSGID: 115072]
>>>>> >>> >>> >>> [server-rpc-fops_v2.c:1680:server4_setattr_cbk]
>>>>> >>> >>> >>> 0-androidpolice_data3-server: 201833: SETATTR
>>>>> >>> >>> >>> /androidpolice.com/public/wp-content/uploads
>>>>> >>> >>> >>> (2692eeba-1ebe-49b6-927f-1dfbcd227591), client:
>>>>> >>> >>> >>> CTX_ID:af341e80-70ff-4d23-99ef-3d846a546fc9-GRAPH_ID:0-PID:2358-HOST:forge-PC_NAME:androidpolice_data3-client-3-RECON_NO:-2,
>>>>> >>> >>> >>> error-xlator: androidpolice_data3-access-control [Operation not permitted]
>>>>> >>> >>> >>> [2020-04-30 21:49:10.453724] I [MSGID: 115072]
>>>>> >>> >>> >>> [server-rpc-fops_v2.c:1680:server4_setattr_cbk]
>>>>> >>> >>> >>> 0-androidpolice_data3-server: 201842: SETATTR
>>>>> >>> >>> >>> /androidpolice.com/public/wp-content/uploads
>>>>> >>> >>> >>> (2692eeba-1ebe-49b6-927f-1dfbcd227591), client:
>>>>> >>> >>> >>> CTX_ID:af341e80-70ff-4d23-99ef-3d846a546fc9-GRAPH_ID:0-PID:2358-HOST:forge-PC_NAME:androidpolice_data3-client-3-RECON_NO:-2,
>>>>> >>> >>> >>> error-xlator: androidpolice_data3-access-control [Operation not permitted]
>>>>> >>> >>> >>> [2020-04-30 21:49:16.224662] I [MSGID: 115072]
>>>>> >>> >>> >>> [server-rpc-fops_v2.c:1680:server4_setattr_cbk]
>>>>> >>> >>> >>> 0-androidpolice_data3-server: 202865: SETATTR
>>>>> >>> >>> >>> /androidpolice.com/public/wp-content/uploads
>>>>> >>> >>> >>> (2692eeba-1ebe-49b6-927f-1dfbcd227591), client:
>>>>> >>> >>> >>> CTX_ID:32e2d636-038a-472d-8199-007555d1805f-GRAPH_ID:0-PID:14265-HOST:nexus2-PC_NAME:androidpolice_data3-client-3-RECON_NO:-2,
>>>>> >>> >>> >>> error-xlator: androidpolice_data3-access-control [Operation not permitted]
>>>>> >>> >>> >>>
>>>>> >>> >>> >>> There's also lots of self-healing happening that I didn't expect at all,
>>>>> >>> >>> >>> since the upgrade only took ~10-15s.
>>>>> >>> >>> >>> [2020-04-30 21:47:38.714448] I [MSGID: 108026]
>>>>> >>> >>> >>> [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
>>>>> >>> >>> >>> 0-apkmirror_data1-replicate-0: performing metadata selfheal on
>>>>> >>> >>> >>> 4a6ba2d7-7ad8-4113-862b-02e4934a3461
>>>>> >>> >>> >>> [2020-04-30 21:47:38.765033] I [MSGID: 108026]
>>>>> >>> >>> >>> [afr-self-heal-common.c:1723:afr_log_selfheal]
>>>>> >>> >>> >>> 0-apkmirror_data1-replicate-0: Completed metadata selfheal on
>>>>> >>> >>> >>> 4a6ba2d7-7ad8-4113-862b-02e4934a3461. sources=[3]  sinks=0 1 2
>>>>> >>> >>> >>> [2020-04-30 21:47:38.765289] I [MSGID: 108026]
>>>>> >>> >>> >>> [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
>>>>> >>> >>> >>> 0-apkmirror_data1-replicate-0: performing metadata selfheal on
>>>>> >>> >>> >>> f3c62a41-1864-4e75-9883-4357a7091296
>>>>> >>> >>> >>> [2020-04-30 21:47:38.800987] I [MSGID: 108026]
>>>>> >>> >>> >>> [afr-self-heal-common.c:1723:afr_log_selfheal]
>>>>> >>> >>> >>> 0-apkmirror_data1-replicate-0: Completed metadata selfheal on
>>>>> >>> >>> >>> f3c62a41-1864-4e75-9883-4357a7091296. sources=[3]  sinks=0 1 2
>>>>> >>> >>> >>>
>>>>> >>> >>> >>>
>>>>> >>> >>> >>> I'm also seeing "remote operation failed" and "writing to fuse device
>>>>> >>> >>> >>> failed: No such file or directory" messages:
>>>>> >>> >>> >>> [2020-04-30 21:46:34.891957] I [MSGID: 108026]
>>>>> >>> >>> >>> [afr-self-heal-common.c:1723:afr_log_selfheal]
>>>>> >>> >>> >>> 0-androidpolice_data3-replicate-0: Completed metadata selfheal on
>>>>> >>> >>> >>> 2692eeba-1ebe-49b6-927f-1dfbcd227591. sources=0 1 [2]  sinks=3
>>>>> >>> >>> >>> [2020-04-30 21:45:36.127412] W [MSGID: 114031]
>>>>> >>> >>> >>> [client-rpc-fops_v2.c:1985:client4_0_setattr_cbk]
>>>>> >>> >>> >>> 0-androidpolice_data3-client-0: remote operation failed [Operation not
>>>>> >>> >>> >>> permitted]
>>>>> >>> >>> >>> [2020-04-30 21:45:36.345924] W [MSGID: 114031]
>>>>> >>> >>> >>> [client-rpc-fops_v2.c:1985:client4_0_setattr_cbk]
>>>>> >>> >>> >>> 0-androidpolice_data3-client-1: remote operation failed [Operation not
>>>>> >>> >>> >>> permitted]
>>>>> >>> >>> >>> [2020-04-30 21:46:35.291853] I [MSGID: 108031]
>>>>> >>> >>> >>> [afr-common.c:2543:afr_local_discovery_cbk]
>>>>> >>> >>> >>> 0-androidpolice_data3-replicate-0: selecting local read_child
>>>>> >>> >>> >>> androidpolice_data3-client-2
>>>>> >>> >>> >>> [2020-04-30 21:46:35.977342] I [MSGID: 108026]
>>>>> >>> >>> >>> [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
>>>>> >>> >>> >>> 0-androidpolice_data3-replicate-0: performing metadata selfheal on
>>>>> >>> >>> >>> 2692eeba-1ebe-49b6-927f-1dfbcd227591
>>>>> >>> >>> >>> [2020-04-30 21:46:36.006607] I [MSGID: 108026]
>>>>> >>> >>> >>> [afr-self-heal-common.c:1723:afr_log_selfheal]
>>>>> >>> >>> >>> 0-androidpolice_data3-replicate-0: Completed metadata selfheal on
>>>>> >>> >>> >>> 2692eeba-1ebe-49b6-927f-1dfbcd227591. sources=0 1 [2]  sinks=3
>>>>> >>> >>> >>> [2020-04-30 21:46:37.245599] E [fuse-bridge.c:219:check_and_dump_fuse_W]
>>>>> >>> >>> >>> (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7fd13d50624d]
>>>>> >>> >>> >>> (--> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7fd1398e949a]
>>>>> >>> >>> >>> (--> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7fd1398e97bb]
>>>>> >>> >>> >>> (--> /lib64/libpthread.so.0(+0x84f9)[0x7fd13ca564f9]
>>>>> >>> >>> >>> (--> /lib64/libc.so.6(clone+0x3f)[0x7fd13c78ef2f] )))))
>>>>> >>> >>> >>> 0-glusterfs-fuse: writing to fuse device failed: No such file or directory
>>>>> >>> >>> >>> [2020-04-30 21:46:50.864797] E [fuse-bridge.c:219:check_and_dump_fuse_W]
>>>>> >>> >>> >>> (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17d)[0x7fd13d50624d]
>>>>> >>> >>> >>> (--> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x849a)[0x7fd1398e949a]
>>>>> >>> >>> >>> (--> /usr/lib64/glusterfs/5.13/xlator/mount/fuse.so(+0x87bb)[0x7fd1398e97bb]
>>>>> >>> >>> >>> (--> /lib64/libpthread.so.0(+0x84f9)[0x7fd13ca564f9]
>>>>> >>> >>> >>> (--> /lib64/libc.so.6(clone+0x3f)[0x7fd13c78ef2f] )))))
>>>>> >>> >>> >>> 0-glusterfs-fuse: writing to fuse device failed: No such file or directory
>>>>> >>> >>> >>>
>>>>> >>> >>> >>> The number of items being healed is going up and down wildly, from 0 to
>>>>> >>> >>> >>> 8000+ and sometimes taking a really long time to return a value. I'm
>>>>> >>> >>> >>> really worried as this is a production system, and I didn't observe
>>>>> >>> >>> >>> this in our test system.
>>>>> >>> >>> >>>
>>>>> >>> >>> >>>
>>>>> >>> >>> >>>
>>>>> >>> >>> >>> gluster v heal apkmirror_data1 info summary
>>>>> >>> >>> >>> Brick nexus2:/mnt/nexus2_block1/apkmirror_data1
>>>>> >>> >>> >>> Status: Connected
>>>>> >>> >>> >>> Total Number of entries: 27
>>>>> >>> >>> >>> Number of entries in heal pending: 27
>>>>> >>> >>> >>> Number of entries in split-brain: 0
>>>>> >>> >>> >>> Number of entries possibly healing: 0
>>>>> >>> >>> >>>
>>>>> >>> >>> >>> Brick forge:/mnt/forge_block1/apkmirror_data1
>>>>> >>> >>> >>> Status: Connected
>>>>> >>> >>> >>> Total Number of entries: 27
>>>>> >>> >>> >>> Number of entries in heal pending: 27
>>>>> >>> >>> >>> Number of entries in split-brain: 0
>>>>> >>> >>> >>> Number of entries possibly healing: 0
>>>>> >>> >>> >>>
>>>>> >>> >>> >>> Brick hive:/mnt/hive_block1/apkmirror_data1
>>>>> >>> >>> >>> Status: Connected
>>>>> >>> >>> >>> Total Number of entries: 27
>>>>> >>> >>> >>> Number of entries in heal pending: 27
>>>>> >>> >>> >>> Number of entries in split-brain: 0
>>>>> >>> >>> >>> Number of entries possibly healing: 0
>>>>> >>> >>> >>>
>>>>> >>> >>> >>> Brick citadel:/mnt/citadel_block1/apkmirror_data1
>>>>> >>> >>> >>> Status: Connected
>>>>> >>> >>> >>> Total Number of entries: 8540
>>>>> >>> >>> >>> Number of entries in heal pending: 8540
>>>>> >>> >>> >>> Number of entries in split-brain: 0
>>>>> >>> >>> >>> Number of entries possibly healing: 0
>>>>> >>> >>> >>>
>>>>> >>> >>> >>>
>>>>> >>> >>> >>>
>>>>> >>> >>> >>> gluster v heal androidpolice_data3 info summary
>>>>> >>> >>> >>> Brick nexus2:/mnt/nexus2_block4/androidpolice_data3
>>>>> >>> >>> >>> Status: Connected
>>>>> >>> >>> >>> Total Number of entries: 1
>>>>> >>> >>> >>> Number of entries in heal pending: 1
>>>>> >>> >>> >>> Number of entries in split-brain: 0
>>>>> >>> >>> >>> Number of entries possibly healing: 0
>>>>> >>> >>> >>>
>>>>> >>> >>> >>> Brick forge:/mnt/forge_block4/androidpolice_data3
>>>>> >>> >>> >>> Status: Connected
>>>>> >>> >>> >>> Total Number of entries: 1
>>>>> >>> >>> >>> Number of entries in heal pending: 1
>>>>> >>> >>> >>> Number of entries in split-brain: 0
>>>>> >>> >>> >>> Number of entries possibly healing: 0
>>>>> >>> >>> >>>
>>>>> >>> >>> >>> Brick hive:/mnt/hive_block4/androidpolice_data3
>>>>> >>> >>> >>> Status: Connected
>>>>> >>> >>> >>> Total Number of entries: 1
>>>>> >>> >>> >>> Number of entries in heal pending: 1
>>>>> >>> >>> >>> Number of entries in split-brain: 0
>>>>> >>> >>> >>> Number of entries possibly healing: 0
>>>>> >>> >>> >>>
>>>>> >>> >>> >>> Brick citadel:/mnt/citadel_block4/androidpolice_data3
>>>>> >>> >>> >>> Status: Connected
>>>>> >>> >>> >>> Total Number of entries: 1149
>>>>> >>> >>> >>> Number of entries in heal pending: 1149
>>>>> >>> >>> >>> Number of entries in split-brain: 0
>>>>> >>> >>> >>> Number of entries possibly healing: 0
>>>>> >>> >>> >>>
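Not part of the original report, just a rough sketch: when the heal counts swing this much, it can help to tally the "heal pending" lines per brick rather than eyeball the full summary. The helper name `pending_per_brick` is made up for illustration, and the sample text is copied from two bricks of the apkmirror_data1 summary above.

```shell
#!/usr/bin/env bash
# pending_per_brick (hypothetical helper): reads "heal info summary" text on
# stdin, prints "<brick> <pending>" for each brick, then a TOTAL line.
pending_per_brick() {
  awk '
    /^Brick /                             { brick = $2 }
    /^Number of entries in heal pending:/ { n = $NF; print brick, n; total += n }
    END                                   { print "TOTAL", total + 0 }
  '
}

# Sample input copied from the summary quoted above (two bricks only).
sample='Brick hive:/mnt/hive_block1/apkmirror_data1
Status: Connected
Total Number of entries: 27
Number of entries in heal pending: 27
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick citadel:/mnt/citadel_block1/apkmirror_data1
Status: Connected
Total Number of entries: 8540
Number of entries in heal pending: 8540
Number of entries in split-brain: 0
Number of entries possibly healing: 0'

printf '%s\n' "$sample" | pending_per_brick
```

On a live node the input would presumably come from `gluster v heal apkmirror_data1 info summary | pending_per_brick` instead of the sample text.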
>>>>> >>> >>> >>>
>>>>> >>> >>> >>> What should I do at this point? The files I tested seem to be replicating
>>>>> >>> >>> >>> correctly, but I don't know if it's the case for all of them, and the
>>>>> >>> >>> >>> heals going up and down, and all these log messages are making me very
>>>>> >>> >>> >>> nervous.
>>>>> >>> >>> >>>
>>>>> >>> >>> >>> Thank you.
>>>>> >>> >>> >>>
>>>>> >>> >>> >>> Sincerely,
>>>>> >>> >>> >>> Artem
>>>>> >>> >>> >>>
>>>>> >>> >>> >>>
>>>>> >>> >>> >>
>>>>> >>> >>>
>>>>> >>> >>> It's not supported, but usually it works.
>>>>> >>> >>>
>>>>> >>> >>> In the worst-case scenario, you can remove the node, wipe gluster on the
>>>>> >>> >>> node, reinstall the packages and add it back. It will require a full heal
>>>>> >>> >>> of the brick and, as you have previously reported, could lead to
>>>>> >>> >>> performance degradation.
>>>>> >>> >>>
>>>>> >>> >>> I think you are on SLES, but I could be wrong. Do you have btrfs or LVM
>>>>> >>> >>> snapshots to revert from?
>>>>> >>> >>>
>>>>> >>> >>> Best Regards,
>>>>> >>> >>> Strahil Nikolov
>>>>> >>> >>>
>>>>> >>> >>
>>>>> >>>
>>>>> >>> Hi Artem,
>>>>> >>>
>>>>> >>> You can increase the brick log level following
>>>>> >>> https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/configuring_the_log_level
>>>>> >>> but keep in mind that logs grow quite fast, so don't keep them above the
>>>>> >>> current level for too long.
>>>>> >>>
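For reference, a minimal sketch of what raising and later resetting the brick log level might look like, assuming the `diagnostics.brick-log-level` volume option. The volume name is just one of those mentioned in this thread, and the `run` wrapper is a hypothetical dry-run helper, so nothing is executed unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Dry-run by default: commands are printed, not executed. Set DRY_RUN=0 to run them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# VOL is an assumption: one of the volume names mentioned in this thread.
VOL="${VOL:-androidpolice_data3}"

# Raise the brick log level while reproducing the problem.
raise_brick_log_level() {
  run gluster volume set "$VOL" diagnostics.brick-log-level DEBUG
}

# Reset the option to its default afterwards, since DEBUG logs grow quickly.
reset_brick_log_level() {
  run gluster volume reset "$VOL" diagnostics.brick-log-level
}

raise_brick_log_level
reset_brick_log_level
```

Keeping the reset next to the raise is deliberate: it makes it harder to forget the second step after the logs have been collected.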
>>>>> >>>
>>>>> >>> Do you have geo-replication running?
>>>>> >>>
>>>>> >>> About the migration issue - I have no clue why this happened. Last time I
>>>>> >>> skipped a major release (3.12 to 5.5) I got into huge trouble (all file
>>>>> >>> ownership was switched to root), and I have the feeling that it won't
>>>>> >>> happen again if you go through v6.
>>>>> >>>
>>>>> >>> Best Regards,
>>>>> >>> Strahil Nikolov
>>>>> >>>
>>>>> >> ________
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> Community Meeting Calendar:
>>>>> >>
>>>>> >> Schedule -
>>>>> >> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>>>>> >> Bridge: https://bluejeans.com/441850968
>>>>> >>
>>>>> >> Gluster-users mailing list
>>>>> >> Gluster-users at gluster.org
>>>>> >> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>> >>
>>>>>
>>>>> Hey Artem,
>>>>>
>>>>> I just checked whether the 'replica 4' is causing the issue, but that's
>>>>> not the case (tested with 1 node down, but it's the same situation).
>>>>>
>>>>> I created 4 VMs on CentOS 7 & Gluster v7.5 (brick has only noatime
>>>>> mount option) and created a 'replica 4' volume.
>>>>> Then I created a dir and placed 50000 very small files there via:
>>>>> for i in {1..50000}; do echo $RANDOM > $i ; done
>>>>>
>>>>> The find command 'finds' them in 4s and after some tuning I have
>>>>> managed to lower it to 2.5s.
>>>>>
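A rough, self-contained version of that small-file test, in case someone wants to repeat it. The function names are made up for illustration and the demo runs against a throwaway temp directory with only 100 files; for a real comparison the directory would be one on the gluster FUSE mount (the thread does not give that path) and the count would be 50000.

```shell
#!/usr/bin/env bash
# make_small_files DIR COUNT: create COUNT tiny files named 1..COUNT in DIR.
make_small_files() {
  local dir="$1" count="$2" i=1
  mkdir -p "$dir"
  while [ "$i" -le "$count" ]; do
    echo "$RANDOM" > "$dir/$i"
    i=$((i + 1))
  done
}

# count_files DIR: number of regular files that find reports under DIR.
count_files() {
  find "$1" -type f | wc -l | tr -d '[:space:]'
}

# Demo against a temp dir; whole-second resolution only, which is coarse but
# enough to compare runs that take several seconds on a slow mount.
DIR="$(mktemp -d)"
make_small_files "$DIR" 100
start=$(date +%s)
n=$(count_files "$DIR")
end=$(date +%s)
echo "find reported $n files in $((end - start))s"
rm -rf "$DIR"
```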
>>>>> What has caused some improvement was:
>>>>> A) Activating the rhgs-random-io tuned profile, which you can take from
>>>>> ftp://ftp.redhat.com/redhat/linux/enterprise/7Server/en/RHS/SRPMS/redhat-storage-server-3.5.0.0-1.el7rhgs.src.rpm
>>>>> B) Using noatime as a mount option; if you use SELinux, you could also
>>>>> use the 'context=system_u:object_r:glusterd_brick_t:s0' mount option to
>>>>> prevent SELinux context lookups.
>>>>> C) Activating the gluster group of settings 'metadata-cache' or
>>>>> 'nl-cache' brought 'find' to the same results - lowered from 3.5s to 2.5s
>>>>> after an initial run.
>>>>>
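Items A and C above could be applied roughly as follows. This is only a sketch: it assumes the rhgs-random-io profile has already been installed for tuned, the volume name is just one from this thread, and the `run` wrapper is a hypothetical dry-run helper that prints the commands unless DRY_RUN=0 is set. Item B is a change to the brick mount options (for example in fstab) and is not covered here.

```shell
#!/usr/bin/env bash
# Dry-run by default: commands are printed, not executed. Set DRY_RUN=0 to run them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# VOL is an assumption: one of the volume names mentioned in this thread.
VOL="${VOL:-apkmirror_data1}"

apply_tuning() {
  # A) switch to the tuned profile (assumes rhgs-random-io is installed).
  run tuned-adm profile rhgs-random-io
  # C) apply the predefined gluster option groups named above.
  run gluster volume set "$VOL" group metadata-cache
  run gluster volume set "$VOL" group nl-cache
}

apply_tuning
```

Since C says either group gave the same result, it may be enough to apply just one of them and re-time 'find' before trying the other.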
>>>>> I know that I'm not comparing apples to apples, but still it might
>>>>> help.
>>>>>
>>>>> I would like to learn what gluster actually does when a 'find' or 'ls'
>>>>> is invoked, as I doubt it just executes it on the bricks.
>>>>>
>>>>> Best Regards,
>>>>> Strahil Nikolov
>>>>>
>
>
> --
> Respectfully
> Mahdi
>