[Gluster-users] Brick-Xlators crashes after Set-RO and Read
Vijay Bellur
vbellur at redhat.com
Thu May 16 08:05:22 UTC 2019
Hello David,
Do you have any custom patches in your deployment? I looked up v5.5 but
could not find the following functions referred to in the core:
map_atime_from_server()
worm_lookup_cbk()
Neither do I see xlator_helper.c in the codebase.
Thanks,
Vijay
#0 map_atime_from_server (this=0x7fdef401af00, stbuf=0x0) at
../../../../xlators/lib/src/xlator_helper.c:21
__FUNCTION__ = "map_to_atime_from_server"
#1 0x00007fdef39a0382 in worm_lookup_cbk (frame=frame@entry=0x7fdeac0015c8,
cookie=<optimized out>, this=0x7fdef401af00, op_ret=op_ret@entry=-1,
op_errno=op_errno@entry=13,
inode=inode@entry=0x0, buf=0x0, xdata=0x0, postparent=0x0) at worm.c:531
priv = 0x7fdef4075378
ret = 0
__FUNCTION__ = "worm_lookup_cbk"
On Thu, May 16, 2019 at 12:53 AM David Spisla <spisla80 at gmail.com> wrote:
> Hello Vijay,
>
> I could reproduce the issue. After doing a simple DIR listing from the
> Win10 powershell, all brick processes crash. It's not the same scenario as
> mentioned before, but the crash report in the brick logs is the same.
> Attached you will find the backtrace.
>
> Regards
> David Spisla
>
> On Tue, May 7, 2019 at 8:08 PM Vijay Bellur <vbellur at redhat.com>
> wrote:
>
>> Hello David,
>>
>> On Tue, May 7, 2019 at 2:16 AM David Spisla <spisla80 at gmail.com> wrote:
>>
>>> Hello Vijay,
>>>
>>> How can I create such a core file? Or will it be created automatically
>>> if a gluster process crashes?
>>> Maybe you can give me a hint and I will try to get a backtrace.
>>>
>>
>> Generation of a core file depends on the system configuration. `man 5
>> core` contains useful information on how to generate a core file in a
>> directory of your choice. Once a core file is generated, you can use gdb
>> to get a backtrace of all threads (using "thread apply all bt full").
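>>
>> For example (the core location, brick binary path and PID below are only
>> placeholders; on systemd installations the brick processes may also need
>> LimitCORE=infinity in their unit file to dump core at all):
>>
>>   # allow core dumps and send them to a predictable location
>>   ulimit -c unlimited
>>   sysctl -w kernel.core_pattern=/var/log/cores/core.%e.%p
>>
>>   # after the next crash, open the core together with the brick binary
>>   gdb /usr/sbin/glusterfsd /var/log/cores/core.glusterfsd.12345
>>   (gdb) thread apply all bt full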
>>
>>
>>> Unfortunately this bug is not easy to reproduce because it appears only
>>> sometimes.
>>>
>>
>> If the bug is not easy to reproduce, having a backtrace from the
>> generated core would be very useful!
>>
>> Thanks,
>> Vijay
>>
>>
>>>
>>> Regards
>>> David Spisla
>>>
>>> On Mon, May 6, 2019 at 7:48 PM Vijay Bellur <vbellur at redhat.com>
>>> wrote:
>>>
>>>> Thank you for the report, David. Do you have core files available on
>>>> any of the servers? If yes, would it be possible for you to provide a
>>>> backtrace?
>>>>
>>>> Regards,
>>>> Vijay
>>>>
>>>> On Mon, May 6, 2019 at 3:09 AM David Spisla <spisla80 at gmail.com> wrote:
>>>>
>>>>> Hello folks,
>>>>>
>>>>> We have a client application (running on Win10) which performs some
>>>>> FOPs on a gluster volume that is accessed via SMB.
>>>>>
>>>>> *Scenario 1* is a READ operation which reads all files successively
>>>>> and checks whether the file data was copied correctly. While doing
>>>>> this, all brick processes crash, and every brick log contains this
>>>>> crash report:
>>>>>
>>>>>> CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, gfid: 00000000-0000-0000-0000-000000000001, req(uid:2000,gid:2000,perm:1,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission denied]
>>>>>> pending frames:
>>>>>> frame : type(0) op(27)
>>>>>> frame : type(0) op(40)
>>>>>> patchset: git://git.gluster.org/glusterfs.git
>>>>>> signal received: 11
>>>>>> time of crash:
>>>>>> 2019-04-16 08:32:21
>>>>>> configuration details:
>>>>>> argp 1
>>>>>> backtrace 1
>>>>>> dlfcn 1
>>>>>> libpthread 1
>>>>>> llistxattr 1
>>>>>> setfsid 1
>>>>>> spinlock 1
>>>>>> epoll.h 1
>>>>>> xattr.h 1
>>>>>> st_atim.tv_nsec 1
>>>>>> package-string: glusterfs 5.5
>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c]
>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26]
>>>>>> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0]
>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910]
>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118]
>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6]
>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b]
>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3]
>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2]
>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c]
>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c]
>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548]
>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22]
>>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5]
>>>>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088]
>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569]
>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af]
>>>>>>
>>>>> *Scenario 2*: The application just sets Read-Only on each file
>>>>> successively. After the 70th file was set, all the bricks crash and,
>>>>> again, one can read this crash report in every brick log:
>>>>>
>>>>>>
>>>>>> [2019-05-02 07:43:39.953591] I [MSGID: 139001]
>>>>>> [posix-acl.c:263:posix_acl_log_permit_denied] 0-longterm-access-control:
>>>>>> client:
>>>>>> CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0,
>>>>>> gfid: 00000000-0000-0000-0000-000000000001,
>>>>>> req(uid:2000,gid:2000,perm:1,ngrps:1),
>>>>>> ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission
>>>>>> denied]
>>>>>> pending frames:
>>>>>> frame : type(0) op(27)
>>>>>> patchset: git://git.gluster.org/glusterfs.git
>>>>>> signal received: 11
>>>>>> time of crash:
>>>>>> 2019-05-02 07:43:39
>>>>>> configuration details:
>>>>>> argp 1
>>>>>> backtrace 1
>>>>>> dlfcn 1
>>>>>> libpthread 1
>>>>>> llistxattr 1
>>>>>> setfsid 1
>>>>>> spinlock 1
>>>>>> epoll.h 1
>>>>>> xattr.h 1
>>>>>> st_atim.tv_nsec 1
>>>>>> package-string: glusterfs 5.5
>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c]
>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26]
>>>>>> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0]
>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910]
>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118]
>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6]
>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b]
>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3]
>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2]
>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c]
>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c]
>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548]
>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22]
>>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5]
>>>>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088]
>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569]
>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef]
>>>>>
>>>>> This happens on a 3-node Gluster v5.5 cluster on two different
>>>>> volumes, but both volumes have the same settings:
>>>>>> Volume Name: shortterm
>>>>>> Type: Replicate
>>>>>> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee
>>>>>> Status: Started
>>>>>> Snapshot Count: 0
>>>>>> Number of Bricks: 1 x 3 = 3
>>>>>> Transport-type: tcp
>>>>>> Bricks:
>>>>>> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick
>>>>>> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick
>>>>>> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick
>>>>>> Options Reconfigured:
>>>>>> storage.reserve: 1
>>>>>> performance.client-io-threads: off
>>>>>> nfs.disable: on
>>>>>> transport.address-family: inet
>>>>>> user.smb: disable
>>>>>> features.read-only: off
>>>>>> features.worm: off
>>>>>> features.worm-file-level: on
>>>>>> features.retention-mode: enterprise
>>>>>> features.default-retention-period: 120
>>>>>> network.ping-timeout: 10
>>>>>> features.cache-invalidation: on
>>>>>> features.cache-invalidation-timeout: 600
>>>>>> performance.nl-cache: on
>>>>>> performance.nl-cache-timeout: 600
>>>>>> client.event-threads: 32
>>>>>> server.event-threads: 32
>>>>>> cluster.lookup-optimize: on
>>>>>> performance.stat-prefetch: on
>>>>>> performance.cache-invalidation: on
>>>>>> performance.md-cache-timeout: 600
>>>>>> performance.cache-samba-metadata: on
>>>>>> performance.cache-ima-xattrs: on
>>>>>> performance.io-thread-count: 64
>>>>>> cluster.use-compound-fops: on
>>>>>> performance.cache-size: 512MB
>>>>>> performance.cache-refresh-timeout: 10
>>>>>> performance.read-ahead: off
>>>>>> performance.write-behind-window-size: 4MB
>>>>>> performance.write-behind: on
>>>>>> storage.build-pgfid: on
>>>>>> features.utime: on
>>>>>> storage.ctime: on
>>>>>> cluster.quorum-type: fixed
>>>>>> cluster.quorum-count: 2
>>>>>> features.bitrot: on
>>>>>> features.scrub: Active
>>>>>> features.scrub-freq: daily
>>>>>> cluster.enable-shared-storage: enable
>>>>>>
>>>>>>
>>>>> Why can this happen to all brick processes? I don't understand the
>>>>> crash report. The FOPs are nothing special, and after restarting the
>>>>> brick processes everything works fine and our application succeeds.
>>>>>
>>>>> Regards
>>>>> David Spisla
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>
>>>>