[Bugs] [Bug 1421721] New: volume start command hangs

bugzilla at redhat.com bugzilla at redhat.com
Mon Feb 13 14:16:49 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1421721

            Bug ID: 1421721
           Summary: volume start command hangs
           Product: GlusterFS
           Version: mainline
         Component: glusterd
          Assignee: bugs at gluster.org
          Reporter: amukherj at redhat.com
                CC: bugs at gluster.org



Description of problem:

If cluster.brick-multiplex is turned on and a volume is created and stopped, then switching brick multiplexing off and starting the volume causes the command to hang with the following backtrace:

(gdb) t a a bt

Thread 8 (Thread 0x7f42977fe700 (LWP 21259)):
#0  0x00007f42a3ab4bc0 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f42a4c87bfb in __synclock_lock (lock=lock@entry=0x7f42a4fac828) at syncop.c:924
#2  0x00007f42a4c8a7c6 in synclock_lock (lock=lock@entry=0x7f42a4fac828) at syncop.c:950
#3  0x00007f429fc1b65b in glusterd_big_locked_notify (rpc=0x7f4298001fc0, mydata=0x7f4298198730, event=RPC_CLNT_CONNECT, data=0x0, notify_fn=0x7f429fc18c20 <__glusterd_brick_rpc_notify>) at glusterd-handler.c:68
#4  0x00007f42a4a1fc14 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f4298002018, event=<optimized out>, data=0x7f4298194800) at rpc-clnt.c:1004
#5  0x00007f42a4a1c223 in rpc_transport_notify (this=this@entry=0x7f4298194800, event=event@entry=RPC_TRANSPORT_CONNECT, data=data@entry=0x7f4298194800) at rpc-transport.c:538
#6  0x00007f429d4f7d59 in socket_connect_finish (this=this@entry=0x7f4298194800) at socket.c:2344
#7  0x00007f429d4fc047 in socket_event_handler (fd=<optimized out>, idx=4, data=0x7f4298194800, poll_in=0, poll_out=4, poll_err=0) at socket.c:2387
#8  0x00007f42a4ca621a in event_dispatch_epoll_handler (event=0x7f42977fde90, event_pool=0x1a67f20) at event-epoll.c:572
#9  event_dispatch_epoll_worker (data=0x1a88f50) at event-epoll.c:675
#10 0x00007f42a3aaf5ba in start_thread () from /lib64/libpthread.so.0
#11 0x00007f42a33887cd in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f4297fff700 (LWP 21258)):
#0  0x00007f42a3ab4bc0 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f429fcc415b in hooks_worker (args=<optimized out>) at glusterd-hooks.c:529
#2  0x00007f42a3aaf5ba in start_thread () from /lib64/libpthread.so.0
#3  0x00007f42a33887cd in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f42a0780700 (LWP 21130)):
#0  0x00007f42a3ab4f69 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f42a4c89929 in syncenv_task (proc=proc@entry=0x1a70fc0) at syncop.c:603
#2  0x00007f42a4c8a500 in syncenv_processor (thdata=0x1a70fc0) at syncop.c:695
#3  0x00007f42a3aaf5ba in start_thread () from /lib64/libpthread.so.0
#4  0x00007f42a33887cd in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f42a0f81700 (LWP 21129)):
#0  0x00007f42a3ab7add in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f42a3ab19fd in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007f42a4c614f5 in gf_timer_call_cancel (ctx=ctx@entry=0x1a31010, event=0x7f4280000920) at timer.c:98
#3  0x00007f42a4c58173 in gf_log_disable_suppression_before_exit (ctx=0x1a31010) at logging.c:428
#4  0x00007f42a4c5daf0 in gf_print_trace (signum=11, ctx=0x1a31010) at common-utils.c:714
#5  <signal handler called>
#6  0x00007f42a4c61481 in list_del (old=0x7f4280000a40) at list.h:76
#7  gf_timer_call_cancel (ctx=<optimized out>, event=0x7f4280000a40) at timer.c:103
#8  0x00007f429fc425fe in glusterd_volume_start_glusterfs (volinfo=volinfo@entry=0x7f428c01b120, brickinfo=brickinfo@entry=0x7f428c011d40, wait=wait@entry=_gf_true) at glusterd-utils.c:1879
#9  0x00007f429fc44a24 in glusterd_brick_start (volinfo=volinfo@entry=0x7f428c01b120, brickinfo=brickinfo@entry=0x7f428c011d40, wait=wait@entry=_gf_true) at glusterd-utils.c:5382
#10 0x00007f429fcb2f01 in glusterd_start_volume (volinfo=0x7f428c01b120, flags=<optimized out>, wait=wait@entry=_gf_true) at glusterd-volume-ops.c:2508
#11 0x00007f429fcb3351 in glusterd_op_start_volume (dict=dict@entry=0x7f42981882e0, op_errstr=op_errstr@entry=0x7f4288206ba8) at glusterd-volume-ops.c:2636
#12 0x00007f429fce9a1b in gd_mgmt_v3_commit_fn (op=op@entry=GD_OP_START_VOLUME, dict=dict@entry=0x7f42981882e0, op_errstr=op_errstr@entry=0x7f4288206ba8, op_errno=op_errno@entry=0x7f4288206b9c, rsp_dict=rsp_dict@entry=0x7f4298185ba0) at glusterd-mgmt.c:319
#13 0x00007f429fcec7f4 in glusterd_mgmt_v3_commit (op=op@entry=GD_OP_START_VOLUME, op_ctx=op_ctx@entry=0x7f429819ac70, req_dict=0x7f42981882e0, op_errstr=op_errstr@entry=0x7f4288206ba8, op_errno=op_errno@entry=0x7f4288206b9c, txn_generation=txn_generation@entry=0) at glusterd-mgmt.c:1524
#14 0x00007f429fcee360 in glusterd_mgmt_v3_initiate_all_phases (req=req@entry=0x7f4288003750, op=op@entry=GD_OP_START_VOLUME, dict=0x7f429819ac70) at glusterd-mgmt.c:2141
#15 0x00007f429fcac532 in __glusterd_handle_cli_start_volume (req=req@entry=0x7f4288003750) at glusterd-volume-ops.c:549
#16 0x00007f429fc1b6cd in glusterd_big_locked_handler (req=0x7f4288003750, actor_fn=0x7f429fcac2d0 <__glusterd_handle_cli_start_volume>) at glusterd-handler.c:81
#17 0x00007f42a4c87b10 in synctask_wrap (old_task=<optimized out>) at syncop.c:375
#18 0x00007f42a32ce2c0 in ?? () from /lib64/libc.so.6
#19 0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7f42a1782700 (LWP 21128)):
#0  0x00007f42a334e14d in nanosleep () from /lib64/libc.so.6
#1  0x00007f42a334e09a in sleep () from /lib64/libc.so.6
#2  0x00007f42a4c7927a in pool_sweeper (arg=<optimized out>) at mem-pool.c:464
#3  0x00007f42a3aaf5ba in start_thread () from /lib64/libpthread.so.0
#4  0x00007f42a33887cd in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f42a1f83700 (LWP 21127)):
#0  0x00007f42a3ab89d6 in sigwait () from /lib64/libpthread.so.0
#1  0x00000000004085c7 in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2095
#2  0x00007f42a3aaf5ba in start_thread () from /lib64/libpthread.so.0
#3  0x00007f42a33887cd in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f42a2784700 (LWP 21126)):
#0  0x00007f42a3ab7add in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f42a3ab19fd in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007f42a4c60f65 in gf_timer_proc (data=0x1a6e7c0) at timer.c:135
#3  0x00007f42a3aaf5ba in start_thread () from /lib64/libpthread.so.0
#4  0x00007f42a33887cd in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f42a512e780 (LWP 21125)):
#0  0x00007f42a3ab06ad in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f42a4ca6668 in event_dispatch_epoll (event_pool=0x1a67f20) at event-epoll.c:759
#2  0x00000000004059b0 in main (argc=<optimized out>, argv=<optimized out>) at glusterfsd.c:2505

The above trace indicates a deadlock: thread 5 took a SIGSEGV inside gf_timer_call_cancel() (list_del() on a timer event, frame #6) while holding the timer-registry mutex, and the crash handler's cleanup path (gf_print_trace() -> gf_log_disable_suppression_before_exit()) calls gf_timer_call_cancel() again, which blocks trying to re-acquire that same non-recursive mutex (frames #0-#2). Thread 8 is in turn blocked on the big synclock held by thread 5's synctask.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Enable brick multiplexing (cluster.brick-multiplex on), then create and stop a volume.
2. Disable brick multiplexing.
3. Start the volume.

Actual results:
The volume start command hangs (glusterd deadlocks).

Expected results:
The volume starts successfully with brick multiplexing disabled.

Additional info:

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.


More information about the Bugs mailing list