[Bugs] [Bug 1445408] New: gluster volume stop hangs
bugzilla at redhat.com
Tue Apr 25 15:35:20 UTC 2017
https://bugzilla.redhat.com/show_bug.cgi?id=1445408
Bug ID: 1445408
Summary: gluster volume stop hangs
Product: GlusterFS
Version: 3.10
Component: glusterd
Keywords: Triaged
Assignee: bugs at gluster.org
Reporter: amukherj at redhat.com
CC: bugs at gluster.org
Depends On: 1441910
Blocks: 1441932
+++ This bug was initially created as a clone of Bug #1441910 +++
Description of problem:
While I was testing some of the glusterd basic commands, I ended up in a
situation where volume stop hung:
(gdb) t a a bt
Thread 8 (Thread 0x7f3f213d5700 (LWP 31710)):
#0 0x00007f3f28a56dd3 in epoll_wait () from /lib64/libc.so.6
#1 0x00007f3f2a3748ef in event_dispatch_epoll_worker (data=0x250df70) at
event-epoll.c:665
#2 0x00007f3f2917d5ba in start_thread () from /lib64/libpthread.so.0
#3 0x00007f3f28a567cd in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7f3f21bd6700 (LWP 31709)):
#0 0x00007f3f29182bc0 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1 0x00007f3f2539349b in hooks_worker (args=<optimized out>) at
glusterd-hooks.c:529
#2 0x00007f3f2917d5ba in start_thread () from /lib64/libpthread.so.0
#3 0x00007f3f28a567cd in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7f3f25e4e700 (LWP 31581)):
#0 0x00007f3f28a1c14d in nanosleep () from /lib64/libc.so.6
#1 0x00007f3f28a1c09a in sleep () from /lib64/libc.so.6
#2 0x00007f3f252f92e2 in glusterd_wait_for_blockers (priv=0x7f3f2a653050) at
glusterd-op-sm.c:6052
#3 0x00007f3f253014ec in glusterd_op_commit_perform
(op=op@entry=GD_OP_STOP_VOLUME,
dict=dict@entry=0x7f3f100790b0, op_errstr=op_errstr@entry=0x7f3f1840c040,
rsp_dict=rsp_dict@entry=0x7f3f10004f30) at glusterd-op-sm.c:6075
#4 0x00007f3f25390c6d in gd_commit_op_phase (op=GD_OP_STOP_VOLUME,
op_ctx=op_ctx@entry=0x7f3f1c01c860,
req_dict=0x7f3f100790b0, op_errstr=op_errstr@entry=0x7f3f1840c040,
txn_opinfo=txn_opinfo@entry=0x7f3f1840c060)
at glusterd-syncop.c:1413
#5 0x00007f3f253920ed in gd_sync_task_begin
(op_ctx=op_ctx@entry=0x7f3f1c01c860, req=req@entry=0x7f3f18006ee0)
at glusterd-syncop.c:1942
#6 0x00007f3f253923bc in glusterd_op_begin_synctask
(req=req@entry=0x7f3f18006ee0, op=op@entry=GD_OP_STOP_VOLUME,
dict=0x7f3f1c01c860) at glusterd-syncop.c:2007
#7 0x00007f3f2537bdac in __glusterd_handle_cli_stop_volume
(req=req@entry=0x7f3f18006ee0)
at glusterd-volume-ops.c:628
#8 0x00007f3f252ec7bd in glusterd_big_locked_handler (req=0x7f3f18006ee0,
actor_fn=0x7f3f2537bbd0 <__glusterd_handle_cli_stop_volume>) at
glusterd-handler.c:81
#9 0x00007f3f2a355f20 in synctask_wrap () at syncop.c:375
#10 0x00007f3f2899c2c0 in ?? () from /lib64/libc.so.6
#11 0x0000000000000000 in ?? ()
Thread 5 (Thread 0x7f3f2664f700 (LWP 31580)):
#0 0x00007f3f29182f69 in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1 0x00007f3f2a357d49 in syncenv_task (proc=proc@entry=0x24f5c10) at
syncop.c:603
#2 0x00007f3f2a358920 in syncenv_processor (thdata=0x24f5c10) at syncop.c:695
#3 0x00007f3f2917d5ba in start_thread () from /lib64/libpthread.so.0
#4 0x00007f3f28a567cd in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f3f26e50700 (LWP 31579)):
#0 0x00007f3f28a1c14d in nanosleep () from /lib64/libc.so.6
#1 0x00007f3f28a1c09a in sleep () from /lib64/libc.so.6
#2 0x00007f3f2a34750a in pool_sweeper (arg=<optimized out>) at mem-pool.c:465
#3 0x00007f3f2917d5ba in start_thread () from /lib64/libpthread.so.0
#4 0x00007f3f28a567cd in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f3f27651700 (LWP 31578)):
#0 0x00007f3f291869d6 in sigwait () from /lib64/libpthread.so.0
#1 0x00000000004085c7 in glusterfs_sigwaiter (arg=<optimized out>) at
glusterfsd.c:2095
#2 0x00007f3f2917d5ba in start_thread () from /lib64/libpthread.so.0
#3 0x00007f3f28a567cd in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f3f27e52700 (LWP 31577)):
#0 0x00007f3f2918648d in nanosleep () from /lib64/libpthread.so.0
#1 0x00007f3f2a32f2e6 in gf_timer_proc (data=0x24f37d0) at timer.c:164
#2 0x00007f3f2917d5ba in start_thread () from /lib64/libpthread.so.0
#3 0x00007f3f28a567cd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f3f2a7fe780 (LWP 31576)):
#0 0x00007f3f2917e6ad in pthread_join () from /lib64/libpthread.so.0
#1 0x00007f3f2a374e48 in event_dispatch_epoll (event_pool=0x24ecf30) at
event-epoll.c:759
#2 0x00000000004059b0 in main (argc=<optimized out>, argv=<optimized out>) at
glusterfsd.c:2505
(gdb) t 6
[Switching to thread 6 (Thread 0x7f3f25e4e700 (LWP 31581))]
#0 0x00007f3f28a1c14d in nanosleep () from /lib64/libc.so.6
(gdb) f 4
#4 0x00007f3f25390c6d in gd_commit_op_phase (op=GD_OP_STOP_VOLUME,
op_ctx=op_ctx@entry=0x7f3f1c01c860,
req_dict=0x7f3f100790b0, op_errstr=op_errstr@entry=0x7f3f1840c040,
txn_opinfo=txn_opinfo@entry=0x7f3f1840c060)
at glusterd-syncop.c:1413
1413 ret = glusterd_op_commit_perform (op, req_dict, op_errstr,
rsp_dict);
(gdb) f 3
#3 0x00007f3f253014ec in glusterd_op_commit_perform
(op=op@entry=GD_OP_STOP_VOLUME,
dict=dict@entry=0x7f3f100790b0, op_errstr=op_errstr@entry=0x7f3f1840c040,
rsp_dict=rsp_dict@entry=0x7f3f10004f30) at glusterd-op-sm.c:6075
6075 glusterd_wait_for_blockers (this->private);
(gdb) f 2
#2 0x00007f3f252f92e2 in glusterd_wait_for_blockers (priv=0x7f3f2a653050) at
glusterd-op-sm.c:6052
6052 sleep (1);
(gdb) p priv.blockers
$1 = 4294967294
priv.blockers holds an unexpectedly large value, so glusterd_wait_for_blockers
keeps sleeping and never sees the counter reach zero. This counter was
introduced in https://review.gluster.org/#/c/16927 . Further debugging is
needed.
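The printed value is 2^32 - 2, which is what a 32-bit unsigned counter shows
after two more decrements than increments. The sketch below illustrates that
arithmetic and why a "poll until zero" loop would then never finish. It is
only an illustration: blockers_after and wait_would_finish are hypothetical
helpers, the uint32_t type is an assumption inferred from the value gdb
printed, and the loop is a bounded model rather than the glusterd code.

```c
#include <stdint.h>

/* Hypothetical helper: apply incs increments and decs decrements to a
 * 32-bit unsigned counter (type assumed from the gdb output). */
uint32_t blockers_after(uint32_t start, uint32_t incs, uint32_t decs)
{
    uint32_t blockers = start;
    blockers += incs;
    blockers -= decs;   /* unsigned arithmetic wraps modulo 2^32 */
    return blockers;
}

/* Bounded model of a loop that polls until the counter is zero.
 * Returns 1 if zero is seen within max_polls checks, otherwise 0.
 * Nothing decrements the counter here, mirroring a state in which
 * no outstanding request is left to bring it back down. */
int wait_would_finish(uint32_t blockers, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if (blockers == 0)
            return 1;
        /* the hung thread sits in sleep (1) at this point */
    }
    return 0;
}
```

With a start of 0, two increments and four decrements, blockers_after returns
4294967294, the value seen in gdb, and wait_would_finish reports that the loop
does not terminate however many polls are allowed.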
--- Additional comment from Worker Ant on 2017-04-13 03:59:47 EDT ---
REVIEW: https://review.gluster.org/17055 (glusterd: fix
glusterd_wait_for_blockers to go in infinite loop) posted (#1) for review on
master by Atin Mukherjee (amukherj at redhat.com)
--- Additional comment from Worker Ant on 2017-04-13 08:11:11 EDT ---
REVIEW: https://review.gluster.org/17055 (glusterd: fix
glusterd_wait_for_blockers to go in infinite loop) posted (#2) for review on
master by Atin Mukherjee (amukherj at redhat.com)
--- Additional comment from Worker Ant on 2017-04-13 08:11:18 EDT ---
REVIEW: https://review.gluster.org/17055 (glusterd: fix
glusterd_wait_for_blockers to go in infinite loop) posted (#3) for review on
master by Atin Mukherjee (amukherj at redhat.com)
--- Additional comment from Worker Ant on 2017-04-13 14:15:17 EDT ---
COMMIT: https://review.gluster.org/17055 committed in master by Jeff Darcy
(jeff at pl.atyp.us)
------
commit 090c8866eb3ae174be50dec8d9d5ecf978d18a45
Author: Atin Mukherjee <amukherj at redhat.com>
Date: Thu Apr 13 13:20:18 2017 +0530
glusterd: fix glusterd_wait_for_blockers to go in infinite loop
In send_attach_req (), conf->blockers is incremented before
rpc_clnt_submit. However, if the rpc submit fails, the counter is
decremented twice: once from the callback and once from the negative
ret handling.
Change-Id: Icb820694034cbfcb3d427911e192ac4a0f4540f6
BUG: 1441910
Signed-off-by: Atin Mukherjee <amukherj at redhat.com>
Reviewed-on: https://review.gluster.org/17055
Smoke: Gluster Build System <jenkins at build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
Reviewed-by: Jeff Darcy <jeff at pl.atyp.us>
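The imbalance described in the commit message can be sketched as follows. This
is a minimal model, not the code from review 17055: conf_sketch, attach_cbk,
mock_submit, send_req_buggy and send_req_fixed are all hypothetical names, the
real signatures of send_attach_req and rpc_clnt_submit are not reproduced, and
the assumption that a failed submit still runs the callback is taken from the
commit message. The "fixed" variant shows one way to restore the balance (let
the callback own the decrement), which may differ from the actual patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the glusterd private config. */
struct conf_sketch {
    uint32_t blockers;
};

typedef void (*done_cbk)(struct conf_sketch *conf);

/* Callback: releases the blocker taken before the submit. */
void attach_cbk(struct conf_sketch *conf)
{
    conf->blockers--;
}

/* Hypothetical submit. Assumption: on failure it still invokes the
 * callback before returning -1. On success the reply is modelled as
 * arriving immediately. */
int mock_submit(struct conf_sketch *conf, done_cbk cbk, int should_fail)
{
    cbk(conf);
    return should_fail ? -1 : 0;
}

/* Buggy pattern: the error path decrements again after the callback
 * has already done so. */
int send_req_buggy(struct conf_sketch *conf, int should_fail)
{
    conf->blockers++;
    int ret = mock_submit(conf, attach_cbk, should_fail);
    if (ret < 0)
        conf->blockers--;   /* second decrement for the same request */
    return ret;
}

/* Balanced pattern: exactly one decrement per increment. */
int send_req_fixed(struct conf_sketch *conf, int should_fail)
{
    conf->blockers++;
    return mock_submit(conf, attach_cbk, should_fail);
}
```

In this model, two failed calls to send_req_buggy on a zeroed counter leave it
at 4294967294, while send_req_fixed leaves it at 0 whether or not the submit
fails.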
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1441910
[Bug 1441910] gluster volume stop hangs
https://bugzilla.redhat.com/show_bug.cgi?id=1441932
[Bug 1441932] Gluster operations fails with another transaction in progress
as volume delete acquires lock and won't release