[Bugs] [Bug 1596513] New: glustershd crashes when index heal is launched before graph is initialized.

bugzilla at redhat.com
Fri Jun 29 07:23:53 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1596513

            Bug ID: 1596513
           Summary: glustershd crashes when index heal is launched before
                    graph is initialized.
           Product: GlusterFS
           Version: mainline
         Component: core
          Assignee: bugs at gluster.org
          Reporter: ravishankar at redhat.com
                CC: bugs at gluster.org



Description of problem:
glustershd crashes when index heal is launched via the CLI before the graph
is initialized.
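
For context: the CLI-initiated heal reaches glustershd as a translator-op
request handled by glusterfs_handle_translator_op(), which dereferences the
active graph unconditionally. ctx->active is only assigned once the freshly
fetched volfile has been processed (in glusterfs_graph_activate(), as far as
I can tell), so a request that races with graph initialization reads a NULL
pointer. A condensed, illustrative view of the racing paths, reconstructed
from the backtrace below:

    /* Thread 7 (epoll): still preparing the graph from the volfile just
     * fetched from glusterd; ctx->active has not been assigned yet.
     *   mgmt_getspec_cbk -> glusterfs_process_volfp -> glusterfs_graph_prepare
     *
     * Thread 1 (synctask): already servicing the CLI heal request:
     */
    ctx    = glusterfsd_ctx;
    active = ctx->active;      /* still NULL: graph not yet activated */
    any    = active->first;    /* NULL dereference -> SIGSEGV */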

Version-Release number of selected component (if applicable) / How reproducible:
I can reproduce this easily on glusterfs-3.8.4, and only very rarely on
glusterfs-3.12.2 (seen just once on 3.12.2).

Steps to Reproduce:
1. Create a replica 2 volume and start it.
2. Run `while true; do gluster volume heal <volname>; sleep 0.5; done` in one
terminal.
3. In another terminal, keep running `service glusterd restart`.

Actual results:
Once in a while, glustershd crashes and does not come back up until it is
manually restarted:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/local/sbin/glusterfs -s localhost --volfile-id
gluster/glustershd -p /var/'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000000000040cdfa in glusterfs_handle_translator_op (req=0x7f0fa4003490)
at glusterfsd-mgmt.c:793
793             any = active->first;
[Current thread is 1 (Thread 0x7f0face2c700 (LWP 3716))]
Missing separate debuginfos, use: dnf debuginfo-install
glibc-2.25-12.fc26.x86_64 libgcc-7.2.1-2.fc26.x86_64
libuuid-2.30.2-1.fc26.x86_64 openssl-libs-1.1.0f-7.fc26.x86_64
sssd-client-1.16.0-1.fc26.x86_64 zlib-1.2.11-2.fc26.x86_64
(gdb) t a a bt

Thread 7 (Thread 0x7f0fabd86700 (LWP 3717)):
#0  0x00007f0fb65adce6 in fnmatch@@GLIBC_2.2.5 () from /lib64/libc.so.6
#1  0x00007f0fb7f43f42 in gf_add_cmdline_options (graph=0x7f0fa4000c40,
cmd_args=0x15c2010) at graph.c:299
#2  0x00007f0fb7f449c0 in glusterfs_graph_prepare (graph=0x7f0fa4000c40,
ctx=0x15c2010, volume_name=0x0) at graph.c:588
#3  0x000000000040a74b in glusterfs_process_volfp (ctx=0x15c2010,
fp=0x7f0fa4006920) at glusterfsd.c:2368
#4  0x000000000040fc81 in mgmt_getspec_cbk (req=0x7f0fa4001d10,
iov=0x7f0fa4001d50, count=1, myframe=0x7f0fa4001560) at glusterfsd-mgmt.c:1989
#5  0x00007f0fb7cc26b5 in rpc_clnt_handle_reply (clnt=0x163fef0,
pollin=0x7f0fa40061b0) at rpc-clnt.c:778
#6  0x00007f0fb7cc2c53 in rpc_clnt_notify (trans=0x1640120, mydata=0x163ff20,
event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f0fa40061b0) at rpc-clnt.c:971
#7  0x00007f0fb7cbecb8 in rpc_transport_notify (this=0x1640120,
event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f0fa40061b0) at rpc-transport.c:538
#8  0x00007f0fac41919e in socket_event_poll_in (this=0x1640120,
notify_handled=_gf_true) at socket.c:2315
#9  0x00007f0fac4197c3 in socket_event_handler (fd=10, idx=1, gen=1,
data=0x1640120, poll_in=1, poll_out=0, poll_err=0) at socket.c:2467
#10 0x00007f0fb7f6d367 in event_dispatch_epoll_handler (event_pool=0x15f9240,
event=0x7f0fabd85e94) at event-epoll.c:583
#11 0x00007f0fb7f6d63e in event_dispatch_epoll_worker (data=0x1642f90) at
event-epoll.c:659
#12 0x00007f0fb6d3736d in start_thread () from /lib64/libpthread.so.0
#13 0x00007f0fb65e0e1f in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f0fad62d700 (LWP 3715)):
#0  0x00007f0fb6d3deb6 in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007f0fb7f48274 in syncenv_task (proc=0x16033c0) at syncop.c:603
#2  0x00007f0fb7f4850f in syncenv_processor (thdata=0x16033c0) at syncop.c:695
#3  0x00007f0fb6d3736d in start_thread () from /lib64/libpthread.so.0
#4  0x00007f0fb65e0e1f in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f0fade2e700 (LWP 3714)):
#0  0x00007f0fb65a4c0d in nanosleep () from /lib64/libc.so.6
#1  0x00007f0fb65a4b4a in sleep () from /lib64/libc.so.6
#2  0x00007f0fb7f32762 in pool_sweeper (arg=0x0) at mem-pool.c:481
#3  0x00007f0fb6d3736d in start_thread () from /lib64/libpthread.so.0
#4  0x00007f0fb65e0e1f in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f0fae62f700 (LWP 3713)):
#0  0x00007f0fb6d41f56 in sigwait () from /lib64/libpthread.so.0
#1  0x000000000040a001 in glusterfs_sigwaiter (arg=0x7fff4608dcd0) at
glusterfsd.c:2137
#2  0x00007f0fb6d3736d in start_thread () from /lib64/libpthread.so.0
#3  0x00007f0fb65e0e1f in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f0faee30700 (LWP 3712)):
#0  0x00007f0fb6d4192d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f0fb7f0ee1c in gf_timer_proc (data=0x1600ed0) at timer.c:174
#2  0x00007f0fb6d3736d in start_thread () from /lib64/libpthread.so.0
#3  0x00007f0fb65e0e1f in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f0fb83fe780 (LWP 3711)):
#0  0x00007f0fb6d3883d in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f0fb7f6d89c in event_dispatch_epoll (event_pool=0x15f9240) at
event-epoll.c:746
#2  0x00007f0fb7f30f3a in event_dispatch (event_pool=0x15f9240) at event.c:124
#3  0x000000000040acce in main (argc=13, argv=0x7fff4608eec8) at
glusterfsd.c:2550

Thread 1 (Thread 0x7f0face2c700 (LWP 3716)):
#0  0x000000000040cdfa in glusterfs_handle_translator_op (req=0x7f0fa4003490)
at glusterfsd-mgmt.c:793
#1  0x00007f0fb7f47a44 in synctask_wrap () at syncop.c:375
#2  0x00007f0fb651c950 in ?? () from /lib64/libc.so.6
#3  0x0000000000000000 in ?? ()
(gdb) l
788                     goto out;
789             }
790
791             ctx = glusterfsd_ctx;
792             active = ctx->active;
793             any = active->first;
794             input = dict_new ();
795             ret = dict_unserialize (xlator_req.input.input_val,
796                                     xlator_req.input.input_len,
797                                     &input);
(gdb) p ctx->active
$1 = (glusterfs_graph_t *) 0x0
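
So glusterfs_handle_translator_op() dereferences ctx->active while it is
still NULL. A minimal sketch of a defensive check that would avoid the crash
(illustrative only; the actual fix may differ):

    ctx = glusterfsd_ctx;
    active = ctx ? ctx->active : NULL;
    if (!active) {
            /* Graph is not initialized yet; fail the op instead of
             * dereferencing a NULL graph pointer. */
            gf_log (THIS->name, GF_LOG_ERROR,
                    "not processing translator op: graph not yet initialized");
            ret = -1;
            goto out;
    }
    any = active->first;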


Expected results:
glustershd must not crash.


