[Bugs] [Bug 1431175] New: volume start command hangs

bugzilla at redhat.com bugzilla at redhat.com
Fri Mar 10 14:20:11 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1431175

            Bug ID: 1431175
           Summary: volume start command hangs
           Product: GlusterFS
           Version: 3.10
         Component: glusterd
          Keywords: Triaged
          Severity: high
          Priority: high
          Assignee: bugs at gluster.org
          Reporter: jdarcy at redhat.com
                CC: amukherj at redhat.com, bugs at gluster.org,
                    jdarcy at redhat.com
        Depends On: 1421721



+++ This bug was initially created as a clone of Bug #1421721 +++

Description of problem:

If cluster.brick-multiplex is turned on and a volume is created and stopped,
and someone then switches brick multiplexing off and starts the volume, the
command hangs with the following backtrace:

(gdb) t a a bt

Thread 8 (Thread 0x7f42977fe700 (LWP 21259)):
#0  0x00007f42a3ab4bc0 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f42a4c87bfb in __synclock_lock (lock=lock@entry=0x7f42a4fac828) at syncop.c:924
#2  0x00007f42a4c8a7c6 in synclock_lock (lock=lock@entry=0x7f42a4fac828) at syncop.c:950
#3  0x00007f429fc1b65b in glusterd_big_locked_notify (rpc=0x7f4298001fc0, mydata=0x7f4298198730, event=RPC_CLNT_CONNECT, data=0x0, notify_fn=0x7f429fc18c20 <__glusterd_brick_rpc_notify>) at glusterd-handler.c:68
#4  0x00007f42a4a1fc14 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f4298002018, event=<optimized out>, data=0x7f4298194800) at rpc-clnt.c:1004
#5  0x00007f42a4a1c223 in rpc_transport_notify (this=this@entry=0x7f4298194800, event=event@entry=RPC_TRANSPORT_CONNECT, data=data@entry=0x7f4298194800) at rpc-transport.c:538
#6  0x00007f429d4f7d59 in socket_connect_finish (this=this@entry=0x7f4298194800) at socket.c:2344
#7  0x00007f429d4fc047 in socket_event_handler (fd=<optimized out>, idx=4, data=0x7f4298194800, poll_in=0, poll_out=4, poll_err=0) at socket.c:2387
#8  0x00007f42a4ca621a in event_dispatch_epoll_handler (event=0x7f42977fde90, event_pool=0x1a67f20) at event-epoll.c:572
#9  event_dispatch_epoll_worker (data=0x1a88f50) at event-epoll.c:675
#10 0x00007f42a3aaf5ba in start_thread () from /lib64/libpthread.so.0
#11 0x00007f42a33887cd in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f4297fff700 (LWP 21258)):
#0  0x00007f42a3ab4bc0 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f429fcc415b in hooks_worker (args=<optimized out>) at glusterd-hooks.c:529
#2  0x00007f42a3aaf5ba in start_thread () from /lib64/libpthread.so.0
#3  0x00007f42a33887cd in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f42a0780700 (LWP 21130)):
#0  0x00007f42a3ab4f69 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f42a4c89929 in syncenv_task (proc=proc@entry=0x1a70fc0) at syncop.c:603
#2  0x00007f42a4c8a500 in syncenv_processor (thdata=0x1a70fc0) at syncop.c:695
#3  0x00007f42a3aaf5ba in start_thread () from /lib64/libpthread.so.0
#4  0x00007f42a33887cd in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f42a0f81700 (LWP 21129)):
#0  0x00007f42a3ab7add in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f42a3ab19fd in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007f42a4c614f5 in gf_timer_call_cancel (ctx=ctx@entry=0x1a31010, event=0x7f4280000920) at timer.c:98
#3  0x00007f42a4c58173 in gf_log_disable_suppression_before_exit (ctx=0x1a31010) at logging.c:428
#4  0x00007f42a4c5daf0 in gf_print_trace (signum=11, ctx=0x1a31010) at common-utils.c:714
#5  <signal handler called>
#6  0x00007f42a4c61481 in list_del (old=0x7f4280000a40) at list.h:76
#7  gf_timer_call_cancel (ctx=<optimized out>, event=0x7f4280000a40) at timer.c:103
#8  0x00007f429fc425fe in glusterd_volume_start_glusterfs (volinfo=volinfo@entry=0x7f428c01b120, brickinfo=brickinfo@entry=0x7f428c011d40, wait=wait@entry=_gf_true) at glusterd-utils.c:1879
#9  0x00007f429fc44a24 in glusterd_brick_start (volinfo=volinfo@entry=0x7f428c01b120, brickinfo=brickinfo@entry=0x7f428c011d40, wait=wait@entry=_gf_true) at glusterd-utils.c:5382
#10 0x00007f429fcb2f01 in glusterd_start_volume (volinfo=0x7f428c01b120, flags=<optimized out>, wait=wait@entry=_gf_true) at glusterd-volume-ops.c:2508
#11 0x00007f429fcb3351 in glusterd_op_start_volume (dict=dict@entry=0x7f42981882e0, op_errstr=op_errstr@entry=0x7f4288206ba8) at glusterd-volume-ops.c:2636
#12 0x00007f429fce9a1b in gd_mgmt_v3_commit_fn (op=op@entry=GD_OP_START_VOLUME, dict=dict@entry=0x7f42981882e0, op_errstr=op_errstr@entry=0x7f4288206ba8, op_errno=op_errno@entry=0x7f4288206b9c, rsp_dict=rsp_dict@entry=0x7f4298185ba0) at glusterd-mgmt.c:319
#13 0x00007f429fcec7f4 in glusterd_mgmt_v3_commit (op=op@entry=GD_OP_START_VOLUME, op_ctx=op_ctx@entry=0x7f429819ac70, req_dict=0x7f42981882e0, op_errstr=op_errstr@entry=0x7f4288206ba8, op_errno=op_errno@entry=0x7f4288206b9c, txn_generation=txn_generation@entry=0) at glusterd-mgmt.c:1524
#14 0x00007f429fcee360 in glusterd_mgmt_v3_initiate_all_phases (req=req@entry=0x7f4288003750, op=op@entry=GD_OP_START_VOLUME, dict=0x7f429819ac70) at glusterd-mgmt.c:2141
#15 0x00007f429fcac532 in __glusterd_handle_cli_start_volume (req=req@entry=0x7f4288003750) at glusterd-volume-ops.c:549
#16 0x00007f429fc1b6cd in glusterd_big_locked_handler (req=0x7f4288003750, actor_fn=0x7f429fcac2d0 <__glusterd_handle_cli_start_volume>) at glusterd-handler.c:81
#17 0x00007f42a4c87b10 in synctask_wrap (old_task=<optimized out>) at syncop.c:375
#18 0x00007f42a32ce2c0 in ?? () from /lib64/libc.so.6
#19 0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7f42a1782700 (LWP 21128)):
#0  0x00007f42a334e14d in nanosleep () from /lib64/libc.so.6
#1  0x00007f42a334e09a in sleep () from /lib64/libc.so.6
#2  0x00007f42a4c7927a in pool_sweeper (arg=<optimized out>) at mem-pool.c:464
#3  0x00007f42a3aaf5ba in start_thread () from /lib64/libpthread.so.0
#4  0x00007f42a33887cd in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f42a1f83700 (LWP 21127)):
#0  0x00007f42a3ab89d6 in sigwait () from /lib64/libpthread.so.0
#1  0x00000000004085c7 in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2095
#2  0x00007f42a3aaf5ba in start_thread () from /lib64/libpthread.so.0
#3  0x00007f42a33887cd in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f42a2784700 (LWP 21126)):
#0  0x00007f42a3ab7add in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f42a3ab19fd in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007f42a4c60f65 in gf_timer_proc (data=0x1a6e7c0) at timer.c:135
#3  0x00007f42a3aaf5ba in start_thread () from /lib64/libpthread.so.0
#4  0x00007f42a33887cd in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f42a512e780 (LWP 21125)):
#0  0x00007f42a3ab06ad in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f42a4ca6668 in event_dispatch_epoll (event_pool=0x1a67f20) at event-epoll.c:759
#2  0x00000000004059b0 in main (argc=<optimized out>, argv=<optimized out>) at glusterfsd.c:2505

The above trace indicates a deadlock: Thread 5 took a SIGSEGV inside list_del
(frame #6, called from gf_timer_call_cancel at timer.c:103), and the signal
handler's cleanup path (gf_print_trace -> gf_log_disable_suppression_before_exit
-> gf_timer_call_cancel) then blocks trying to re-acquire the timer-registry
mutex that the faulting call already holds. Thread 2 (gf_timer_proc) and
Thread 8 are in turn stuck behind locks held by the crashed thread.

Version-Release number of selected component (if applicable):

GlusterFS 3.10

How reproducible:

Not reproducible on demand for the developers who investigated (see comments
below); it appears to depend on winning a timing race.

Steps to Reproduce:
1. Enable cluster.brick-multiplex.
2. Create a volume, then stop it.
3. Disable cluster.brick-multiplex.
4. Start the stopped volume.

Actual results:

The volume start command hangs; glusterd is deadlocked (see the backtrace
above).

Expected results:

The volume starts successfully.


Additional info:

--- Additional comment from Jeff Darcy on 2017-02-13 13:50:15 EST ---

Looks a bit racy.  I ran this sequence a few times and didn't see any failures.
This part of the trace (Thread 5) seems most important:

#4  0x00007f42a4c5daf0 in gf_print_trace (signum=11, ctx=0x1a31010) at common-utils.c:714
#5  <signal handler called>

So we're probably looking at some sort of list/memory corruption.  Is it
possible that the rpc isn't completely valid (even though the pointer is
non-null) or that the timer has already been removed from its list?

--- Additional comment from Worker Ant on 2017-02-16 17:01:39 EST ---

REVIEW: https://review.gluster.org/16650 (tests: add test for toggling MPX and
restarting a volume) posted (#1) for review on master by Jeff Darcy
(jdarcy at redhat.com)

--- Additional comment from Jeff Darcy on 2017-02-16 17:04:48 EST ---

I have tested this more than a hundred times, with zero failures.  You can see
the script here.

https://review.gluster.org/#/c/16650/

Is there something else that you did, that's missing from the script?

--- Additional comment from Jeff Darcy on 2017-02-16 21:20:19 EST ---

There is one place in the multiplexing code where I added a call to
gf_timer_call_cancel.  It's in glusterd_volume_start_glusterfs, which is part
of the stack above.  Suspiciously, this is done without a lock on conn->lock as
is done in other places (e.g. rpc_clnt_reconnect_cleanup).  This fits with the
theory that it's a race.  In gf_timer_call_cancel we do call list_del instead
of list_del_init, and there seems to be no other protection against being
called twice, so I suspect that when we hit the race (because of the missing
lock) we corrupt the list.  Unfortunately, since the test already passes
consistently for me, I won't be able to test whether the fix has any effect.

--- Additional comment from Worker Ant on 2017-02-17 09:44:55 EST ---

REVIEW: https://review.gluster.org/16662 (glusterd: take conn->lock around
operations on conn->reconnect) posted (#1) for review on master by Jeff Darcy
(jdarcy at redhat.com)

--- Additional comment from Worker Ant on 2017-02-17 10:00:12 EST ---

REVIEW: https://review.gluster.org/16662 (glusterd: take conn->lock around
operations on conn->reconnect) posted (#2) for review on master by Jeff Darcy
(jdarcy at redhat.com)

--- Additional comment from Worker Ant on 2017-02-18 01:59:14 EST ---

COMMIT: https://review.gluster.org/16662 committed in master by Atin Mukherjee
(amukherj at redhat.com) 
------
commit 4e0d4b15717da1f6466133158a26927fb91384b8
Author: Jeff Darcy <jdarcy at redhat.com>
Date:   Fri Feb 17 09:42:46 2017 -0500

    glusterd: take conn->lock around operations on conn->reconnect

    Failure to do this could lead to a race in which a timer would be
    removed twice concurrently, corrupting the timer list (because
    gf_timer_call_cancel has no internal protection against this) and
    possibly causing a crash.

    Change-Id: Ic1a8b612d436daec88fd6cee935db0ae81a47d5c
    BUG: 1421721
    Signed-off-by: Jeff Darcy <jdarcy at redhat.com>
    Reviewed-on: https://review.gluster.org/16662
    Smoke: Gluster Build System <jenkins at build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
    Reviewed-by: Atin Mukherjee <amukherj at redhat.com>


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1421721
[Bug 1421721] volume start command hangs
-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.
