[Bugs] [Bug 1686009] New: gluster fuse crashed with segmentation fault possibly due to dentry not found

bugzilla at redhat.com bugzilla at redhat.com
Wed Mar 6 14:25:09 UTC 2019


https://bugzilla.redhat.com/show_bug.cgi?id=1686009

            Bug ID: 1686009
           Summary: gluster fuse crashed with segmentation fault possibly
                    due to dentry not found
           Product: GlusterFS
           Version: mainline
            Status: NEW
         Component: core
          Keywords: ZStream
          Severity: urgent
          Priority: high
          Assignee: atumball at redhat.com
          Reporter: atumball at redhat.com
                CC: bmekala at redhat.com, bugs at gluster.org,
                    jahernan at redhat.com, nbalacha at redhat.com,
                    nchilaka at redhat.com, pkarampu at redhat.com,
                    rcyriac at redhat.com, rgowdapp at redhat.com,
                    rhs-bugs at redhat.com, sankarshan at redhat.com,
                    srangana at redhat.com, vbellur at redhat.com
        Depends On: 1685078
  Target Milestone: ---
    Classification: Community



+++ This bug was initially created as a clone of Bug #1685078 +++

Description of problem:
=====================
I was performing some non-functional tests to exercise the RPC fixes and found
that one of the FUSE clients crashed as below:
[2019-03-01 13:37:02.398653] I [dict.c:471:dict_get]
(-->/usr/lib64/glusterfs/3.12.2/xlator/cluster/replicate.so(+0x6228d)
[0x7f5147de128d]
-->/usr/lib64/glusterfs/3.12.2/xlator/cluster/distribute.so(+0x202f7)
[0x7f5147afc2f7] -->/lib64/libglusterfs.so.0(dict_get+0x10c) [0x7f515a5b2d3c] )
13-dict: !this || key=trusted.glusterfs.dht.mds [Invalid argument]
[2019-03-01 13:37:02.711187] W [inode.c:197:__is_dentry_hashed]
(-->/lib64/libglusterfs.so.0(__inode_path+0x68) [0x7f515a5cd1c8]
-->/lib64/libglusterfs.so.0(+0x3add4) [0x7f515a5cadd4]
-->/lib64/libglusterfs.so.0(+0x3ad7e) [0x7f515a5cad7e] ) 0-fuse: dentry not
found
pending frames:
frame : type(1) op(UNLINK)
frame : type(1) op(UNLINK)
frame : type(1) op(READDIRP)
frame : type(1) op(OPEN)
frame : type(1) op(STAT)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2019-03-01 13:37:02
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.2
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x9d)[0x7f515a5bbb9d]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f515a5c6114]
/lib64/libc.so.6(+0x36280)[0x7f5158bf8280]
/lib64/libglusterfs.so.0(+0x3adc2)[0x7f515a5cadc2]
/lib64/libglusterfs.so.0(__inode_path+0x68)[0x7f515a5cd1c8]
/lib64/libglusterfs.so.0(inode_path+0x31)[0x7f515a5cd551]
/lib64/libglusterfs.so.0(loc_touchup+0x7a)[0x7f515a5b9dba]
/usr/lib64/glusterfs/3.12.2/xlator/mount/fuse.so(+0x6f8b)[0x7f515196df8b]
/usr/lib64/glusterfs/3.12.2/xlator/mount/fuse.so(+0x7665)[0x7f515196e665]
/usr/lib64/glusterfs/3.12.2/xlator/mount/fuse.so(+0x7bbd)[0x7f515196ebbd]
/usr/lib64/glusterfs/3.12.2/xlator/mount/fuse.so(+0x7c8e)[0x7f515196ec8e]
/usr/lib64/glusterfs/3.12.2/xlator/mount/fuse.so(+0x7cd0)[0x7f515196ecd0]
/usr/lib64/glusterfs/3.12.2/xlator/mount/fuse.so(+0x1f702)[0x7f5151986702]
/lib64/libpthread.so.0(+0x7dd5)[0x7f51593f7dd5]
/lib64/libc.so.6(clone+0x6d)[0x7f5158cbfead]
---------



warning: core file may not match specified executable file.
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from
/usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.
Missing separate debuginfo for 
Try: yum --enablerepo='*debug*' install
/usr/lib/debug/.build-id/73/3d1c681cfbd8bbeb11e8b7f80876a9aed6bb74
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs
--volfile-server=my-machine.redhat.com --volf'.
Program terminated with signal 11, Segmentation fault.
#0  __dentry_search_arbit (inode=inode at entry=0x7f50ec000e98) at inode.c:1450
1450            list_for_each_entry (trav, &inode->dentry_list, inode_list) {
Missing separate debuginfos, use: debuginfo-install
glibc-2.17-260.el7_6.3.x86_64 keyutils-libs-1.5.8-3.el7.x86_64
krb5-libs-1.15.1-37.el7_6.x86_64 libcom_err-1.42.9-13.el7.x86_64
libgcc-4.8.5-36.el7.x86_64 libselinux-2.5-14.1.el7.x86_64
libuuid-2.23.2-59.el7.x86_64 openssl-libs-1.0.2k-16.el7.x86_64
pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0  __dentry_search_arbit (inode=inode at entry=0x7f50ec000e98) at inode.c:1450
#1  0x00007f515a5cd1c8 in __inode_path (inode=inode at entry=0x7f50dc01dfc8,
name=name at entry=0x0, bufp=bufp at entry=0x7f5145ed6d30) at inode.c:1551
#2  0x00007f515a5cd551 in inode_path (inode=0x7f50dc01dfc8,
name=name at entry=0x0, bufp=bufp at entry=0x7f5145ed6d30) at inode.c:1642
#3  0x00007f515a5b9dba in loc_touchup (loc=0x7f5069ee43c0, name=<optimized
out>) at xlator.c:880
#4  0x00007f515196df8b in fuse_resolve_loc_touchup (state=0x7f5069ee43a0) at
fuse-resolve.c:33
#5  fuse_resolve_continue (state=0x7f5069ee43a0) at fuse-resolve.c:704
#6  0x00007f515196e665 in fuse_resolve_inode (state=0x7f5069ee43a0) at
fuse-resolve.c:364
#7  0x00007f515196ebbd in fuse_resolve (state=0x7f5069ee43a0) at
fuse-resolve.c:651
#8  0x00007f515196ec8e in fuse_resolve_all (state=<optimized out>) at
fuse-resolve.c:679
#9  0x00007f515196ecd0 in fuse_resolve_and_resume (state=0x7f5069ee43a0,
fn=0x7f5151972c10 <fuse_getattr_resume>) at fuse-resolve.c:718
#10 0x00007f5151986702 in fuse_thread_proc (data=0x563689a35f10) at
fuse-bridge.c:5781
#11 0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f5158cbfead in clone () from /lib64/libc.so.6
(gdb) t a a bt

Thread 11 (Thread 0x7f515aaa6780 (LWP 4229)):
#0  0x00007f51593f8f47 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f515a61af78 in event_dispatch_epoll (event_pool=0x563689a2e250) at
event-epoll.c:846
#2  0x0000563687bed538 in main (argc=4, argv=<optimized out>) at
glusterfsd.c:2692

Thread 10 (Thread 0x7f51456d6700 (LWP 4257)):
#0  0x00007f51593fb965 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007f515197039b in timed_response_loop (data=0x563689a35f10) at
fuse-bridge.c:4660
#2  0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5158cbfead in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f514cc97700 (LWP 4251)):
#0  0x00007f5158cc0483 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f515a61a6e8 in event_dispatch_epoll_worker (data=0x563689a8bd60) at
event-epoll.c:749
#2  0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5158cbfead in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f514d498700 (LWP 4250)):
#0  0x00007f5158cc0483 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f515a61a6e8 in event_dispatch_epoll_worker (data=0x563689a8ba90) at
event-epoll.c:749
#2  0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5158cbfead in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f5150163700 (LWP 4234)):
#0  0x00007f51593fbd12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007f515a5f7dd8 in syncenv_task (proc=proc at entry=0x563689a49b20) at
syncop.c:603
#2  0x00007f515a5f8ca0 in syncenv_processor (thdata=0x563689a49b20) at
syncop.c:695
#3  0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f5158cbfead in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f5151165700 (LWP 4231)):
#0  0x00007f51593ff361 in sigwait () from /lib64/libpthread.so.0
#1  0x0000563687bf0c7b in glusterfs_sigwaiter (arg=<optimized out>) at
glusterfsd.c:2242
#2  0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5158cbfead in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f51322ff700 (LWP 22247)):
#0  0x00007f51593fbd12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007f515a5f7dd8 in syncenv_task (proc=proc at entry=0x563689a4a2a0) at
syncop.c:603
#2  0x00007f515a5f8ca0 in syncenv_processor (thdata=0x563689a4a2a0) at
syncop.c:695
#3  0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f5158cbfead in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f5150964700 (LWP 4232)):
#0  0x00007f5158c86e2d in nanosleep () from /lib64/libc.so.6
#1  0x00007f5158c86cc4 in sleep () from /lib64/libc.so.6
#2  0x00007f515a5e4b9d in pool_sweeper (arg=<optimized out>) at mem-pool.c:470
#3  0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f5158cbfead in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f5144ed5700 (LWP 4258)):
#0  0x00007f51593fb965 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007f51519708d3 in notify_kernel_loop (data=<optimized out>) at
fuse-bridge.c:4584
#2  0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5158cbfead in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f5151966700 (LWP 4230)):
#0  0x00007f51593fee3d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f515a5c9f86 in gf_timer_proc (data=0x563689a49300) at timer.c:174
#2  0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5158cbfead in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f5145ed7700 (LWP 4256)):
#0  __dentry_search_arbit (inode=inode at entry=0x7f50ec000e98) at inode.c:1450
#1  0x00007f515a5cd1c8 in __inode_path (inode=inode at entry=0x7f50dc01dfc8,
name=name at entry=0x0, bufp=bufp at entry=0x7f5145ed6d30) at inode.c:1551
#2  0x00007f515a5cd551 in inode_path (inode=0x7f50dc01dfc8,
name=name at entry=0x0, bufp=bufp at entry=0x7f5145ed6d30) at inode.c:1642
#3  0x00007f515a5b9dba in loc_touchup (loc=0x7f5069ee43c0, name=<optimized
out>) at xlator.c:880
#4  0x00007f515196df8b in fuse_resolve_loc_touchup (state=0x7f5069ee43a0) at
fuse-resolve.c:33
#5  fuse_resolve_continue (state=0x7f5069ee43a0) at fuse-resolve.c:704
#6  0x00007f515196e665 in fuse_resolve_inode (state=0x7f5069ee43a0) at
fuse-resolve.c:364
#7  0x00007f515196ebbd in fuse_resolve (state=0x7f5069ee43a0) at
fuse-resolve.c:651
#8  0x00007f515196ec8e in fuse_resolve_all (state=<optimized out>) at
fuse-resolve.c:679
#9  0x00007f515196ecd0 in fuse_resolve_and_resume (state=0x7f5069ee43a0,
fn=0x7f5151972c10 <fuse_getattr_resume>) at fuse-resolve.c:718
#10 0x00007f5151986702 in fuse_thread_proc (data=0x563689a35f10) at
fuse-bridge.c:5781
#11 0x00007f51593f7dd5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f5158cbfead in clone () from /lib64/libc.so.6
(gdb) q



################# another client hit the same crash ##########
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs --acl
--volfile-server=my-machine.redhat.com'.
Program terminated with signal 11, Segmentation fault.
#0  __dentry_search_arbit (inode=inode at entry=0x7f0a5002eab8) at inode.c:1450
1450            list_for_each_entry (trav, &inode->dentry_list, inode_list) {
Missing separate debuginfos, use: debuginfo-install
glusterfs-fuse-3.12.2-43.el7.x86_64
(gdb) bt
#0  __dentry_search_arbit (inode=inode at entry=0x7f0a5002eab8) at inode.c:1450
#1  0x00007f0ac23a01c8 in __inode_path (inode=inode at entry=0x7f0a50009e68, 
    name=name at entry=0x7f0a028f1b50 "fresh_top.log",
bufp=bufp at entry=0x7f0985ba36e8)
    at inode.c:1551
#2  0x00007f0ac23a0551 in inode_path (inode=0x7f0a50009e68, 
    name=0x7f0a028f1b50 "fresh_top.log", bufp=bufp at entry=0x7f0985ba36e8)
    at inode.c:1642
#3  0x00007f0ab9740489 in fuse_resolve_entry (state=0x7f0985ba3570) at
fuse-resolve.c:99
#4  0x00007f0ab974162d in fuse_resolve_parent
(state=state at entry=0x7f0985ba3570) at fuse-resolve.c:312
#5  0x00007f0ab9741998 in fuse_resolve (state=0x7f0985ba3570) at
fuse-resolve.c:647
#6  0x00007f0ab9741c8e in fuse_resolve_all (state=<optimized out>) at
fuse-resolve.c:679
#7  0x00007f0ab9741cd0 in fuse_resolve_and_resume (state=0x7f0985ba3570,
fn=0x7f0ab9744e40 <fuse_lookup_resume>)
    at fuse-resolve.c:718
#8  0x00007f0ab9759702 in fuse_thread_proc (data=0x559b1ebcf0c0) at
fuse-bridge.c:5781
#9  0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f0ac0a92ead in clone () from /lib64/libc.so.6
(gdb) t a a bt

Thread 11 (Thread 0x7f0ac2879780 (LWP 4246)):
#0  0x00007f0ac11cbf47 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f0ac23edf78 in event_dispatch_epoll (event_pool=0x559b1ebc7250) at
event-epoll.c:846
#2  0x0000559b1de33538 in main (argc=5, argv=<optimized out>) at
glusterfsd.c:2692

Thread 10 (Thread 0x7f0aad2ae700 (LWP 4260)):
#0  0x00007f0ac11ce965 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007f0ab974339b in timed_response_loop (data=0x559b1ebcf0c0) at
fuse-bridge.c:4660
#2  0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f0ac0a92ead in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f0ab7f36700 (LWP 4251)):
#0  0x00007f0ac11ced12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007f0ac23cadd8 in syncenv_task (proc=proc at entry=0x559b1ebe2e40) at
syncop.c:603
#2  0x00007f0ac23cbca0 in syncenv_processor (thdata=0x559b1ebe2e40) at
syncop.c:695
#3  0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f0ac0a92ead in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f0ab8737700 (LWP 4250)):
#0  0x00007f0ac0a59e2d in nanosleep () from /lib64/libc.so.6
#1  0x00007f0ac0a59cc4 in sleep () from /lib64/libc.so.6
#2  0x00007f0ac23b7b9d in pool_sweeper (arg=<optimized out>) at mem-pool.c:470
#3  0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f0ac0a92ead in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f0ab9739700 (LWP 4248)):
#0  0x00007f0ac11d1e3d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f0ac239cf86 in gf_timer_proc (data=0x559b1ebe2620) at timer.c:174
#2  0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f0ac0a92ead in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f0ab4a6a700 (LWP 4256)):
#0  0x00007f0ac11d14ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f0ac11ccdcb in _L_lock_883 () from /lib64/libpthread.so.0
#2  0x00007f0ac11ccc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f0ac239eee9 in inode_unref (inode=0x7f09850f0d98) at inode.c:668
#4  0x00007f0ac238ca02 in loc_wipe (loc=loc at entry=0x7f09865af178) at
xlator.c:768
#5  0x00007f0ab40231ee in client_local_wipe (local=local at entry=0x7f09865af178)
at client-helpers.c:127
#6  0x00007f0ab4032f0d in client3_3_lookup_cbk (req=<optimized out>,
iov=<optimized out>, count=<optimized out>, 
    myframe=0x7f098676f668) at client-rpc-fops.c:2872
#7  0x00007f0ac2134a00 in rpc_clnt_handle_reply
(clnt=clnt at entry=0x7f097ca19cb0, pollin=pollin at entry=0x7f097ccffd50)
    at rpc-clnt.c:778
#8  0x00007f0ac2134d6b in rpc_clnt_notify (trans=<optimized out>,
mydata=0x7f097ca19ce0, event=<optimized out>, 
    data=0x7f097ccffd50) at rpc-clnt.c:971
#9  0x00007f0ac2130ae3 in rpc_transport_notify (this=this at entry=0x7f098e451610, 
    event=event at entry=RPC_TRANSPORT_MSG_RECEIVED,
data=data at entry=0x7f097ccffd50) at rpc-transport.c:557
#10 0x00007f0ab6d22586 in socket_event_poll_in (this=this at entry=0x7f098e451610,
notify_handled=<optimized out>)
    at socket.c:2322
#11 0x00007f0ab6d24bca in socket_event_handler (fd=33, idx=26, gen=4,
data=0x7f098e451610, poll_in=<optimized out>, 
    poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000') at
socket.c:2482
#12 0x00007f0ac23ed870 in event_dispatch_epoll_handler (event=0x7f0ab4a69e70,
event_pool=0x559b1ebc7250)
    at event-epoll.c:643
#13 event_dispatch_epoll_worker (data=0x559b1ec250a0) at event-epoll.c:759
#14 0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f0ac0a92ead in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f0ab526b700 (LWP 4255)):
#0  0x00007f0ac11d14ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f0ac11ccdcb in _L_lock_883 () from /lib64/libpthread.so.0
#2  0x00007f0ac11ccc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f0ac23a1819 in inode_is_linked (inode=inode at entry=0x7f0a4c03b8f8) at
inode.c:2490
#4  0x00007f0aafdd2693 in afr_read_subvol_select_by_policy
(inode=inode at entry=0x7f0a4c03b8f8, 
    this=this at entry=0x7f097d715a70, readable=readable at entry=0x7f0ab526a420
"\001\001\001\265\n\177", 
    args=args at entry=0x0) at afr-common.c:1685
#5  0x00007f0aafdd29d6 in afr_read_subvol_get
(inode=inode at entry=0x7f0a4c03b8f8, this=0x7f097d715a70, 
    subvol_p=subvol_p at entry=0x0, readables=readables at entry=0x0,
event_p=event_p at entry=0x0, 
    type=type at entry=AFR_DATA_TRANSACTION, args=args at entry=0x0) at
afr-common.c:1771
#6  0x00007f0aafde1050 in afr_get_parent_read_subvol (readable=0x7f0ab526a540
"\001\001\001|\t\177", 
    replies=<optimized out>, parent=0x7f0a4c03b8f8, this=<optimized out>) at
afr-common.c:2167
#7  afr_lookup_done (frame=frame at entry=0x7f098664dcb8,
this=this at entry=0x7f097d715a70) at afr-common.c:2319
#8  0x00007f0aafde2058 in afr_lookup_metadata_heal_check
(frame=frame at entry=0x7f098664dcb8, 
    this=this at entry=0x7f097d715a70) at afr-common.c:2771
#9  0x00007f0aafde2a5b in afr_lookup_entry_heal
(frame=frame at entry=0x7f098664dcb8, this=this at entry=0x7f097d715a70)
    at afr-common.c:2920
#10 0x00007f0aafde2e3d in afr_lookup_cbk (frame=frame at entry=0x7f098664dcb8,
cookie=<optimized out>, 
    this=0x7f097d715a70, op_ret=<optimized out>, op_errno=<optimized out>,
inode=inode at entry=0x7f09850f0d98, 
    buf=buf at entry=0x7f0ab526a900, xdata=0x7f0993075e88,
postparent=postparent at entry=0x7f0ab526a970)
    at afr-common.c:2968
#11 0x00007f0ab4032efd in client3_3_lookup_cbk (req=<optimized out>,
iov=<optimized out>, count=<optimized out>, 
    myframe=0x7f0985df1b58) at client-rpc-fops.c:2872
#12 0x00007f0ac2134a00 in rpc_clnt_handle_reply
(clnt=clnt at entry=0x7f097c49ee60, pollin=pollin at entry=0x7f09937ef6c0)
    at rpc-clnt.c:778
#13 0x00007f0ac2134d6b in rpc_clnt_notify (trans=<optimized out>,
mydata=0x7f097c49ee90, event=<optimized out>, 
    data=0x7f09937ef6c0) at rpc-clnt.c:971
#14 0x00007f0ac2130ae3 in rpc_transport_notify (this=this at entry=0x7f097e3ea6a0, 
    event=event at entry=RPC_TRANSPORT_MSG_RECEIVED,
data=data at entry=0x7f09937ef6c0) at rpc-transport.c:557
#15 0x00007f0ab6d22586 in socket_event_poll_in (this=this at entry=0x7f097e3ea6a0,
notify_handled=<optimized out>)
    at socket.c:2322
#16 0x00007f0ab6d24bca in socket_event_handler (fd=29, idx=20, gen=7,
data=0x7f097e3ea6a0, poll_in=<optimized out>, 
    poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000') at
socket.c:2482
#17 0x00007f0ac23ed870 in event_dispatch_epoll_handler (event=0x7f0ab526ae70,
event_pool=0x559b1ebc7250)
    at event-epoll.c:643
#18 event_dispatch_epoll_worker (data=0x559b1ec24dd0) at event-epoll.c:759
#19 0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#20 0x00007f0ac0a92ead in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f0ab7735700 (LWP 4252)):
#0  0x00007f0ac11ced12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007f0ac23cadd8 in syncenv_task (proc=proc at entry=0x559b1ebe3200) at
syncop.c:603
#2  0x00007f0ac23cbca0 in syncenv_processor (thdata=0x559b1ebe3200) at
syncop.c:695
#3  0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f0ac0a92ead in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f0aacaad700 (LWP 4261)):
#0  0x00007f0ac11ce965 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007f0ab97438d3 in notify_kernel_loop (data=<optimized out>) at
fuse-bridge.c:4584
#2  0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f0ac0a92ead in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f0ab8f38700 (LWP 4249)):
#0  0x00007f0ac11d2361 in sigwait () from /lib64/libpthread.so.0
#1  0x0000559b1de36c7b in glusterfs_sigwaiter (arg=<optimized out>) at
glusterfsd.c:2242
#2  0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f0ac0a92ead in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f0aadaaf700 (LWP 4259)):
#0  __dentry_search_arbit (inode=inode at entry=0x7f0a5002eab8) at inode.c:1450
#1  0x00007f0ac23a01c8 in __inode_path (inode=inode at entry=0x7f0a50009e68, 
    name=name at entry=0x7f0a028f1b50 "fresh_top.log",
bufp=bufp at entry=0x7f0985ba36e8)
    at inode.c:1551
#2  0x00007f0ac23a0551 in inode_path (inode=0x7f0a50009e68, 
    name=0x7f0a028f1b50 "fresh_top.log", bufp=bufp at entry=0x7f0985ba36e8)
    at inode.c:1642
#3  0x00007f0ab9740489 in fuse_resolve_entry (state=0x7f0985ba3570) at
fuse-resolve.c:99
#4  0x00007f0ab974162d in fuse_resolve_parent
(state=state at entry=0x7f0985ba3570) at fuse-resolve.c:312
#5  0x00007f0ab9741998 in fuse_resolve (state=0x7f0985ba3570) at
fuse-resolve.c:647
#6  0x00007f0ab9741c8e in fuse_resolve_all (state=<optimized out>) at
fuse-resolve.c:679
#7  0x00007f0ab9741cd0 in fuse_resolve_and_resume (state=0x7f0985ba3570,
fn=0x7f0ab9744e40 <fuse_lookup_resume>)
    at fuse-resolve.c:718
#8  0x00007f0ab9759702 in fuse_thread_proc (data=0x559b1ebcf0c0) at
fuse-bridge.c:5781
#9  0x00007f0ac11cadd5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f0ac0a92ead in clone () from /lib64/libc.so.6
(gdb) 


Steps leading up to the crash:
1. 3x3 volume
2. IOs triggered from 8 different clients, both as root and non-root users, for
about a week, with quota/USS enabled and set after 2 days
3. after about a week, did an add-brick with 3 replica sets to make it 6x3,
triggered rebalance, and left it running over the weekend



--- Additional comment from Amar Tumballi on 2019-03-04 11:49:51 UTC ---

Checking the core. My initial suspicion was on lru-limit, but it doesn't look
like that is the case.

---------------
(gdb) p name
$22 = 0x7f0a028f1b50 "fresh_top.log"
(gdb) p *inode
$23 = {table = 0x7f09971d46f0, gfid =
"\353\034\020'DZG\266\271\034\240\031\205ׂL", lock = {spinlock = 0, mutex =
{__data = {__lock = 0, __count = 0, 
        __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0,
__list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, 
      __align = 0}}, nlookup = 0, fd_count = 0, active_fd_count = 0, ref = 3,
ia_type = IA_IFDIR, fd_list = {next = 0x7f0a50009ec0, prev = 0x7f0a50009ec0}, 
  dentry_list = {next = 0x7f0a4c0386f8, prev = 0x7f0a4c0386f8}, hash = {next =
0x7f097f4971e0, prev = 0x7f097f4971e0}, list = {next = 0x7f0985631490, 
    prev = 0x7f098536bcf0}, _ctx = 0x7f0a50011380, invalidate_sent = _gf_false}

(gdb) p *inode->table
$24 = {lock = {__data = {__lock = 2, __count = 0, __owner = 4259, __nusers = 1,
__kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, 
    __size = "\002\000\000\000\000\000\000\000\243\020\000\000\001", '\000'
<repeats 26 times>, __align = 2}, hashsize = 14057, 
  name = 0x7f098fc5b040 "meta-autoload/inode", root = 0x7f097ca5a168, xl =
0x7f097e043300, lru_limit = 131072, inode_hash = 0x7f097f414d20, 
  name_hash = 0x7f097f514d70, active = {next = 0x7f098536bcf0, prev =
0x7f097ca5a1f0}, active_size = 26, lru = {next = 0x7f0985f9ccd0, prev =
0x7f09803d4020}, 
  lru_size = 70, purge = {next = 0x7f09971d4780, prev = 0x7f09971d4780},
purge_size = 0, inode_pool = 0x7f09971d4830, dentry_pool = 0x7f09971d48f0, 
  fd_mem_pool = 0x7f098df5eb80, ctxcount = 33, invalidator_fn = 0x7f0ab97425d0
<fuse_inode_invalidate_fn>, invalidator_xl = 0x559b1ebcf0c0, invalidate = {
    next = 0x7f09971d47c8, prev = 0x7f09971d47c8}, invalidate_size = 0}

(gdb) p *inode->table->root
$29 = {table = 0x7f09971d46f0, gfid = '\000' <repeats 15 times>, "\001", lock =
{spinlock = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, 
        __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev
= 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}, 
  nlookup = 0, fd_count = 0, active_fd_count = 0, ref = 1, ia_type = IA_IFDIR,
fd_list = {next = 0x7f097ca5a1c0, prev = 0x7f097ca5a1c0}, dentry_list = {
    next = 0x7f097ca5a1d0, prev = 0x7f097ca5a1d0}, hash = {next =
0x7f097f414d30, prev = 0x7f097f414d30}, list = {next = 0x7f09971d4750, prev =
0x7f0a50012840}, 
  _ctx = 0x7f097c72e1a0, invalidate_sent = _gf_false}

(gdb) p *((dentry_t *)inode->dentry_list)
$26 = {inode_list = {next = 0x7f0a50009ed0, prev = 0x7f0a50009ed0}, hash =
{next = 0x7f097f536fc0, prev = 0x7f097f536fc0}, inode = 0x7f0a50009e68, 
  name = 0x7f0a4c00d340 "top.dir", parent = 0x7f0a5002eab8}

<CORRUPTED>
(gdb) p *((dentry_t *)inode->dentry_list)->parent
$27 = {table = 0x7f0a5004f6a8, gfid =
"\000\000\000\000\000\000\000\000\310\352\002P\n\177\000", lock = {spinlock =
1342368456, mutex = {__data = {
        __lock = 1342368456, __count = 32522, __owner = 0, __nusers = 0, __kind
= 515698880, __spins = 21915, __elision = 0, __list = {__prev = 0x0, 
          __next = 0x0}}, __size =
"\310\352\002P\n\177\000\000\000\000\000\000\000\000\000\000\300\360\274\036\233U",
'\000' <repeats 17 times>, 
      __align = 139682268768968}}, nlookup = 0, fd_count = 0, active_fd_count =
0, ref = 4294967295, ia_type = IA_INVAL, fd_list = {next = 0x0, prev = 0x0}, 
  dentry_list = {next = 0x0, prev = 0x100000000}, hash = {next = 0x0, prev =
0x0}, list = {next = 0x0, prev = 0x0}, _ctx = 0x0, invalidate_sent = _gf_false}


---------------

Notice that lru_size is nowhere close to lru-limit.

Also, the inode by itself is fine. The issue is that, even though everything
happens under the table lock, the parent dentry appears to be gone, or rather
freed. A possible case of an extra unref()?

Looking at the info, since this is 'top.dir' it should mostly have been linked
to the root inode, yet its dentry has been freed. I will look into this further
and keep the bug updated. At first look, this also appears to be an issue that
has existed for a long time.
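
To make the extra-unref hypothesis easier to picture, here is a small
standalone sketch. Everything in it (sk_inode, sk_dentry, parent_is_sane) is
hypothetical and simplified, not the real libglusterfs types; only the
relationships mirror the structures in the dump above: inode->dentry_list holds
the dentries that name that inode (linked through dentry->inode_list, the first
member), and dentry->parent points to the directory inode, which __inode_path()
walks next. If an extra unref destroys that parent while the dentry still
points at it, the walk hits exactly the kind of clobbered list the core shows.

----
/* Hypothetical, simplified sketch -- not the real libglusterfs structures. */
#include <assert.h>
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

struct sk_inode {
        int               ref;          /* simplified refcount                */
        struct list_head  dentry_list;  /* dentries that name this inode      */
};

struct sk_dentry {
        struct list_head  inode_list;   /* first member, as in dentry_t       */
        const char       *name;
        struct sk_inode  *parent;       /* directory this name lives in       */
};

/* What each step of the path walk implicitly relies on before it iterates
 * over parent->dentry_list (compare the clobbered parent in the core dump:
 * ref = 4294967295, dentry_list = {next = 0x0, prev = 0x100000000}). */
static int
parent_is_sane (const struct sk_dentry *d)
{
        return d->parent != NULL
            && d->parent->ref > 0
            && d->parent->dentry_list.next != NULL;
}

int
main (void)
{
        struct sk_inode  grandparent = { .ref = 1 };
        struct sk_dentry top_dir     = { .name = "top.dir",
                                         .parent = &grandparent };

        grandparent.dentry_list.next = &grandparent.dentry_list;
        grandparent.dentry_list.prev = &grandparent.dentry_list;

        assert (parent_is_sane (&top_dir));   /* fine while the parent lives */

        /* Model an extra unref: the parent inode is destroyed and its memory
         * reused, which is roughly what the core dump shows. */
        grandparent.ref = 0;
        grandparent.dentry_list.next = NULL;

        /* A real path walk would now dereference the clobbered list and
         * crash, as at inode.c:1450. */
        printf ("parent sane after simulated free: %d\n",
                parent_is_sane (&top_dir));
        return 0;
}
----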

--- Additional comment from Amar Tumballi on 2019-03-04 12:47:00 UTC ---

> [2019-03-01 13:37:02.711187] W [inode.c:197:__is_dentry_hashed] (-->/lib64/libglusterfs.so.0(__inode_path+0x68) [0x7f515a5cd1c8] -->/lib64/libglusterfs.so.0(+0x3add4) [0x7f515a5cadd4] -->/lib64/libglusterfs.so.0(+0x3ad7e) [0x7f515a5cad7e] ) 0-fuse: dentry not found
> pending frames:
> frame : type(1) op(UNLINK)
> frame : type(1) op(UNLINK)
> frame : type(1) op(READDIRP)

----
(gdb) bt
#0  __dentry_search_arbit (inode=inode at entry=0x7f0a5002eab8) at inode.c:1450
(gdb) l
1445            dentry_t *trav = NULL;
1446
1447            if (!inode)
1448                    return NULL;
1449
1450            list_for_each_entry (trav, &inode->dentry_list, inode_list) {
1451                    if (__is_dentry_hashed (trav)) {
1452                            dentry = trav;
1453                            break;

----

It looks like we need to check whether trav is NULL and break out of the loop.
The main point here is that __is_dentry_hashed() returned 0, and we still
continued traversing the list; that traversal should probably have stopped.

Still checking.
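
A rough sketch of what such a guard could look like, built around the listing
above (only the loop from inode.c:1450 is shown; the rest of
__dentry_search_arbit() is unchanged, and this is not a committed fix). Since
inode_list is the first member of dentry_t, a dentry_list head whose next
pointer is NULL, as in the corrupted parent above, makes list_for_each_entry()
hand back a NULL trav, so the check stops the walk before anything is
dereferenced. It does nothing about why the dentry was freed in the first
place.

----
        /* sketch: guard inside the loop at inode.c:1450 */
        list_for_each_entry (trav, &inode->dentry_list, inode_list) {
                if (!trav)                /* clobbered list node: bail out */
                        break;
                if (__is_dentry_hashed (trav)) {
                        dentry = trav;
                        break;
                }
        }
----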


--- Additional comment from Amar Tumballi on 2019-03-04 14:23:18 UTC ---

As per comment #4, 

> At first look, this also appears to be an issue that has existed for a long time.

I suspect this is a bug that has been in the code for a very long time. If a
dentry in these lists is NULL, then by the flow of execution the next iteration
will definitely crash, which is what happened here. We need to trace back how
this happened, given that every operation that changes an inode is done under
the table lock.


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1685078
[Bug 1685078] systemic: gluster fuse crashed with segmentation fault possibly
due to dentry not found
-- 
You are receiving this mail because:
You are on the CC list for the bug.

