<div dir="ltr"><div>Hello Niels,<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 17 May 2019 at 10:21, Niels de Vos <<a href="mailto:ndevos@redhat.com">ndevos@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Fri, May 17, 2019 at 09:50:28AM +0200, David Spisla wrote:<br>
> Hello Vijay,<br>
> thank you for the clarification. Yes, there is an unconditional dereference<br>
> of stbuf. It seems plausible that this causes the crash. I think a check<br>
> like this should help:<br>
> <br>
> if (buf == NULL) {<br>
> goto out;<br>
> }<br>
> map_atime_from_server(this, buf);<br>
> <br>
> Is there a reason why buf can be NULL?<br>
<br>
It seems LOOKUP returned an error (errno=13: EACCES: Permission denied).<br>
This is probably something you need to handle in worm_lookup_cbk. There<br>
can be many reasons for a FOP to return an error; why it happened in<br>
this case is a little difficult to say without (much) more details.<br></blockquote><div>Yes, I will look for a way to handle that case.<br></div><div>Is it intended that the struct stbuf is NULL when an error happens?</div><div><br></div><div>Regards</div><div>David Spisla<br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
HTH,<br>
Niels<br>
<br>
<br>
> <br>
> Regards<br>
> David Spisla<br>
> <br>
> <br>
> On Fri, 17 May 2019 at 01:51, Vijay Bellur <<a href="mailto:vbellur@redhat.com" target="_blank">vbellur@redhat.com</a>> wrote:<br>
> <br>
> > Hello David,<br>
> ><br>
> > From the backtrace it looks like stbuf is NULL in map_atime_from_server()<br>
> > as worm_lookup_cbk has got an error (op_ret = -1, op_errno = 13). Can you<br>
> > please check if there is an unconditional dereference of stbuf in<br>
> > map_atime_from_server()?<br>
> ><br>
> > Regards,<br>
> > Vijay<br>
> ><br>
> > On Thu, May 16, 2019 at 2:36 AM David Spisla <<a href="mailto:spisla80@gmail.com" target="_blank">spisla80@gmail.com</a>> wrote:<br>
> ><br>
> >> Hello Vijay,<br>
> >><br>
> >> yes, we are using custom patches. It is a helper function, which is<br>
> >> defined in xlator_helper.c and used in worm_lookup_cbk.<br>
> >> Do you think this could be the problem? The function only manipulates<br>
> >> the atime in struct iattr.<br>
> >><br>
> >> Regards<br>
> >> David Spisla<br>
> >><br>
> >> On Thu, 16 May 2019 at 10:05, Vijay Bellur <<br>
> >> <a href="mailto:vbellur@redhat.com" target="_blank">vbellur@redhat.com</a>> wrote:<br>
> >><br>
> >>> Hello David,<br>
> >>><br>
> >>> Do you have any custom patches in your deployment? I looked up v5.5 but<br>
> >>> could not find the following functions referred to in the core:<br>
> >>><br>
> >>> map_atime_from_server()<br>
> >>> worm_lookup_cbk()<br>
> >>><br>
> >>> Neither do I see xlator_helper.c in the codebase.<br>
> >>><br>
> >>> Thanks,<br>
> >>> Vijay<br>
> >>><br>
> >>><br>
> >>> #0 map_atime_from_server (this=0x7fdef401af00, stbuf=0x0) at<br>
> >>> ../../../../xlators/lib/src/xlator_helper.c:21<br>
> >>> __FUNCTION__ = "map_to_atime_from_server"<br>
> >>> #1 0x00007fdef39a0382 in worm_lookup_cbk (frame=frame@entry=0x7fdeac0015c8,<br>
> >>> cookie=<optimized out>, this=0x7fdef401af00, op_ret=op_ret@entry=-1,<br>
> >>> op_errno=op_errno@entry=13,<br>
> >>> inode=inode@entry=0x0, buf=0x0, xdata=0x0, postparent=0x0) at<br>
> >>> worm.c:531<br>
> >>> priv = 0x7fdef4075378<br>
> >>> ret = 0<br>
> >>> __FUNCTION__ = "worm_lookup_cbk"<br>
> >>><br>
> >>> On Thu, May 16, 2019 at 12:53 AM David Spisla <<a href="mailto:spisla80@gmail.com" target="_blank">spisla80@gmail.com</a>><br>
> >>> wrote:<br>
> >>><br>
> >>>> Hello Vijay,<br>
> >>>><br>
> >>>> I could reproduce the issue. After doing a simple directory listing from<br>
> >>>> Win10 PowerShell, all brick processes crash. It's not the same scenario<br>
> >>>> mentioned before, but the crash report in the brick logs is the same.<br>
> >>>> Attached you will find the backtrace.<br>
> >>>><br>
> >>>> Regards<br>
> >>>> David Spisla<br>
> >>>><br>
> >>>> On Tue, 7 May 2019 at 20:08, Vijay Bellur <<br>
> >>>> <a href="mailto:vbellur@redhat.com" target="_blank">vbellur@redhat.com</a>> wrote:<br>
> >>>><br>
> >>>>> Hello David,<br>
> >>>>><br>
> >>>>> On Tue, May 7, 2019 at 2:16 AM David Spisla <<a href="mailto:spisla80@gmail.com" target="_blank">spisla80@gmail.com</a>><br>
> >>>>> wrote:<br>
> >>>>><br>
> >>>>>> Hello Vijay,<br>
> >>>>>><br>
> >>>>>> how can I create such a core file? Or will it be created<br>
> >>>>>> automatically if a gluster process crashes?<br>
> >>>>>> Maybe you can give me a hint and I will try to get a backtrace.<br>
> >>>>>><br>
> >>>>><br>
> >>>>> Generation of a core file depends on the system configuration.<br>
> >>>>> `man 5 core` contains useful information to generate a core file in a<br>
> >>>>> directory. Once a core file is generated, you can use gdb to get a<br>
> >>>>> backtrace of all threads (using "thread apply all bt full").<br>
> >>>>><br>
> >>>>><br>
> >>>>>> Unfortunately this bug is not easy to reproduce because it appears<br>
> >>>>>> only sometimes.<br>
> >>>>>><br>
> >>>>><br>
> >>>>> If the bug is not easy to reproduce, having a backtrace from the<br>
> >>>>> generated core would be very useful!<br>
> >>>>><br>
> >>>>> Thanks,<br>
> >>>>> Vijay<br>
> >>>>><br>
> >>>>><br>
> >>>>>><br>
> >>>>>> Regards<br>
> >>>>>> David Spisla<br>
> >>>>>><br>
> >>>>>> On Mon, 6 May 2019 at 19:48, Vijay Bellur <<br>
> >>>>>> <a href="mailto:vbellur@redhat.com" target="_blank">vbellur@redhat.com</a>> wrote:<br>
> >>>>>><br>
> >>>>>>> Thank you for the report, David. Do you have core files available on<br>
> >>>>>>> any of the servers? If yes, would it be possible for you to provide a<br>
> >>>>>>> backtrace?<br>
> >>>>>>><br>
> >>>>>>> Regards,<br>
> >>>>>>> Vijay<br>
> >>>>>>><br>
> >>>>>>> On Mon, May 6, 2019 at 3:09 AM David Spisla <<a href="mailto:spisla80@gmail.com" target="_blank">spisla80@gmail.com</a>><br>
> >>>>>>> wrote:<br>
> >>>>>>><br>
> >>>>>>>> Hello folks,<br>
> >>>>>>>><br>
> >>>>>>>> we have a client application (runs on Win10) which does some FOPs<br>
> >>>>>>>> on a gluster volume which is accessed by SMB.<br>
> >>>>>>>><br>
> >>>>>>>> *Scenario 1* is a READ operation which reads all files<br>
> >>>>>>>> successively and checks whether each file's data was copied correctly. While doing<br>
> >>>>>>>> this, all brick processes crash, and every brick log contains this crash<br>
> >>>>>>>> report:<br>
> >>>>>>>><br>
> >>>>>>>>> CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, gfid: 00000000-0000-0000-0000-000000000001, req(uid:2000,gid:2000,perm:1,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission denied]<br>
> >>>>>>>>> pending frames:<br>
> >>>>>>>>> frame : type(0) op(27)<br>
> >>>>>>>>> frame : type(0) op(40)<br>
> >>>>>>>>> patchset: git://<a href="http://git.gluster.org/glusterfs.git" rel="noreferrer" target="_blank">git.gluster.org/glusterfs.git</a><br>
> >>>>>>>>> signal received: 11<br>
> >>>>>>>>> time of crash:<br>
> >>>>>>>>> 2019-04-16 08:32:21<br>
> >>>>>>>>> configuration details:<br>
> >>>>>>>>> argp 1<br>
> >>>>>>>>> backtrace 1<br>
> >>>>>>>>> dlfcn 1<br>
> >>>>>>>>> libpthread 1<br>
> >>>>>>>>> llistxattr 1<br>
> >>>>>>>>> setfsid 1<br>
> >>>>>>>>> spinlock 1<br>
> >>>>>>>>> epoll.h 1<br>
> >>>>>>>>> xattr.h 1<br>
> >>>>>>>>> st_atim.tv_nsec 1<br>
> >>>>>>>>> package-string: glusterfs 5.5<br>
> >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c]<br>
> >>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26]<br>
> >>>>>>>>> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0]<br>
> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910]<br>
> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118]<br>
> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6]<br>
> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b]<br>
> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3]<br>
> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2]<br>
> >>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c]<br>
> >>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c]<br>
> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548]<br>
> >>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22]<br>
> >>>>>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5]<br>
> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088]<br>
> >>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569]<br>
> >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af]<br>
> >>>>>>>>><br>
> >>>>>>>> *Scenario 2*: The application just sets Read-Only on each file<br>
> >>>>>>>> successively. After the 70th file was set, all the bricks crashed, and again<br>
> >>>>>>>> this crash report appears in every brick log:<br>
> >>>>>>>><br>
> >>>>>>>>> [2019-05-02 07:43:39.953591] I [MSGID: 139001]<br>
> >>>>>>>>> [posix-acl.c:263:posix_acl_log_permit_denied] 0-longterm-access-control:<br>
> >>>>>>>>> client:<br>
> >>>>>>>>> CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0,<br>
> >>>>>>>>> gfid: 00000000-0000-0000-0000-000000000001,<br>
> >>>>>>>>> req(uid:2000,gid:2000,perm:1,ngrps:1),<br>
> >>>>>>>>> ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission<br>
> >>>>>>>>> denied]<br>
> >>>>>>>>> pending frames:<br>
> >>>>>>>>> frame : type(0) op(27)<br>
> >>>>>>>>> patchset: git://<a href="http://git.gluster.org/glusterfs.git" rel="noreferrer" target="_blank">git.gluster.org/glusterfs.git</a><br>
> >>>>>>>>> signal received: 11<br>
> >>>>>>>>> time of crash:<br>
> >>>>>>>>> 2019-05-02 07:43:39<br>
> >>>>>>>>> configuration details:<br>
> >>>>>>>>> argp 1<br>
> >>>>>>>>> backtrace 1<br>
> >>>>>>>>> dlfcn 1<br>
> >>>>>>>>> libpthread 1<br>
> >>>>>>>>> llistxattr 1<br>
> >>>>>>>>> setfsid 1<br>
> >>>>>>>>> spinlock 1<br>
> >>>>>>>>> epoll.h 1<br>
> >>>>>>>>> xattr.h 1<br>
> >>>>>>>>> st_atim.tv_nsec 1<br>
> >>>>>>>>> package-string: glusterfs 5.5<br>
> >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c]<br>
> >>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26]<br>
> >>>>>>>>> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0]<br>
> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910]<br>
> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118]<br>
> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6]<br>
> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b]<br>
> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3]<br>
> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2]<br>
> >>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c]<br>
> >>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c]<br>
> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548]<br>
> >>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22]<br>
> >>>>>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5]<br>
> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088]<br>
> >>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569]<br>
> >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef]<br>
> >>>>>>>>><br>
> >>>>>>>><br>
> >>>>>>>> This happens on a 3-node Gluster v5.5 cluster on two different<br>
> >>>>>>>> volumes. Both volumes have the same settings:<br>
> >>>>>>>><br>
> >>>>>>>>> Volume Name: shortterm<br>
> >>>>>>>>> Type: Replicate<br>
> >>>>>>>>> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee<br>
> >>>>>>>>> Status: Started<br>
> >>>>>>>>> Snapshot Count: 0<br>
> >>>>>>>>> Number of Bricks: 1 x 3 = 3<br>
> >>>>>>>>> Transport-type: tcp<br>
> >>>>>>>>> Bricks:<br>
> >>>>>>>>> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick<br>
> >>>>>>>>> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick<br>
> >>>>>>>>> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick<br>
> >>>>>>>>> Options Reconfigured:<br>
> >>>>>>>>> storage.reserve: 1<br>
> >>>>>>>>> performance.client-io-threads: off<br>
> >>>>>>>>> nfs.disable: on<br>
> >>>>>>>>> transport.address-family: inet<br>
> >>>>>>>>> user.smb: disable<br>
> >>>>>>>>> features.read-only: off<br>
> >>>>>>>>> features.worm: off<br>
> >>>>>>>>> features.worm-file-level: on<br>
> >>>>>>>>> features.retention-mode: enterprise<br>
> >>>>>>>>> features.default-retention-period: 120<br>
> >>>>>>>>> network.ping-timeout: 10<br>
> >>>>>>>>> features.cache-invalidation: on<br>
> >>>>>>>>> features.cache-invalidation-timeout: 600<br>
> >>>>>>>>> performance.nl-cache: on<br>
> >>>>>>>>> performance.nl-cache-timeout: 600<br>
> >>>>>>>>> client.event-threads: 32<br>
> >>>>>>>>> server.event-threads: 32<br>
> >>>>>>>>> cluster.lookup-optimize: on<br>
> >>>>>>>>> performance.stat-prefetch: on<br>
> >>>>>>>>> performance.cache-invalidation: on<br>
> >>>>>>>>> performance.md-cache-timeout: 600<br>
> >>>>>>>>> performance.cache-samba-metadata: on<br>
> >>>>>>>>> performance.cache-ima-xattrs: on<br>
> >>>>>>>>> performance.io-thread-count: 64<br>
> >>>>>>>>> cluster.use-compound-fops: on<br>
> >>>>>>>>> performance.cache-size: 512MB<br>
> >>>>>>>>> performance.cache-refresh-timeout: 10<br>
> >>>>>>>>> performance.read-ahead: off<br>
> >>>>>>>>> performance.write-behind-window-size: 4MB<br>
> >>>>>>>>> performance.write-behind: on<br>
> >>>>>>>>> storage.build-pgfid: on<br>
> >>>>>>>>> features.utime: on<br>
> >>>>>>>>> storage.ctime: on<br>
> >>>>>>>>> cluster.quorum-type: fixed<br>
> >>>>>>>>> cluster.quorum-count: 2<br>
> >>>>>>>>> features.bitrot: on<br>
> >>>>>>>>> features.scrub: Active<br>
> >>>>>>>>> features.scrub-freq: daily<br>
> >>>>>>>>> cluster.enable-shared-storage: enable<br>
> >>>>>>>>><br>
> >>>>>>>>><br>
> >>>>>>>> Why can this happen to all brick processes? I don't understand the<br>
> >>>>>>>> crash report. The FOPs are nothing special, and after restarting the brick<br>
> >>>>>>>> processes everything works fine and our application succeeded.<br>
> >>>>>>>><br>
> >>>>>>>> Regards<br>
> >>>>>>>> David Spisla<br>
> >>>>>>>><br>
> >>>>>>>><br>
> >>>>>>>><br>
> >>>>>>>> _______________________________________________<br>
> >>>>>>>> Gluster-users mailing list<br>
> >>>>>>>> <a href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a><br>
> >>>>>>>> <a href="https://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">https://lists.gluster.org/mailman/listinfo/gluster-users</a><br>
> >>>>>>><br>
> >>>>>>><br>
<br>
<br>
</blockquote></div></div>
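<div>The guard discussed in this thread (skip the atime helper when LOOKUP fails and the stat buffer is NULL) could look roughly like the sketch below. It is only an illustration under stated assumptions: <code>struct iatt_sketch</code>, <code>map_atime_sketch</code> and <code>lookup_cbk_sketch</code> are hypothetical stand-ins, not the real GlusterFS types or the custom helper from xlator_helper.c, and returning <code>op_ret</code> merely stands in for unwinding the frame to the parent translator.</div>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the stat buffer; only the field used here. */
struct iatt_sketch {
    int64_t ia_atime;
};

/* Hypothetical stand-in for the custom atime helper.
 * It dereferences stbuf, so callers must never pass NULL. */
static void map_atime_sketch(struct iatt_sketch *stbuf, int64_t offset)
{
    assert(stbuf != NULL);
    stbuf->ia_atime += offset;
}

/* Simplified lookup callback. Returning op_ret stands in for
 * unwinding the call frame, which must happen on success and on error. */
static int32_t lookup_cbk_sketch(int32_t op_ret, int32_t op_errno,
                                 struct iatt_sketch *buf, int64_t offset)
{
    (void)op_errno; /* a real callback would pass this on unchanged */

    /* On a failed FOP (e.g. op_errno = 13, EACCES) buf may be NULL,
     * so skip the helper instead of dereferencing it. */
    if (op_ret < 0 || buf == NULL) {
        goto out;
    }

    map_atime_sketch(buf, offset);

out:
    return op_ret;
}
```

<div>Checking <code>op_ret</code> as well as the pointer keeps the helper away from a buffer that may not be valid after an error. The <code>out</code> label still reaches the return, so the error is propagated rather than swallowed.</div>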