[Bugs] [Bug 1631128] rpc marks brick disconnected from glusterd & volume stop transaction gets timed out
bugzilla at redhat.com
Thu Sep 20 03:27:05 UTC 2018
https://bugzilla.redhat.com/show_bug.cgi?id=1631128
Atin Mukherjee <amukherj at redhat.com> changed:
What       |Removed                      |Added
----------------------------------------------------------------------------
Keywords   |ZStream                      |
Blocks     |1516598, 1571620, 1573304,   |1628651
           |1584639, 1589070, 1590389,   |
           |1598890                      |
Depends On |1628651                      |
Summary    |rpc marks brick              |rpc marks brick
           |disconnected from glusterd   |disconnected from glusterd
           |                             |& volume stop transaction
           |                             |gets timed out
Whiteboard |ocs-dependency-issue         |
--- Comment #1 from Atin Mukherjee <amukherj at redhat.com> ---
Steps to reproduce:
1. Create 90 1 x 3 volumes with brick multiplexing (brick mux) enabled in a
3-node cluster and start all the volumes.
2. Trigger volume stop and delete commands in parallel from the CLI of N1, N2
and N3, split as follows:
node 1: vol1 - vol30
node 2: vol31 - vol60
node 3: vol61 - vol90
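For anyone retrying step 2, here is a rough sketch of a per-node driver. It is not the script used in the original test. The node labels, the `NODE_RANGES` table and the helper names are made up for illustration. It assumes that each node runs its own copy at the same time, and that stop and delete are issued one after another within a node. It defaults to a dry run that only prints the commands.

```python
import subprocess
from typing import Dict, List

# Hypothetical mapping of node label -> volume numbers, mirroring the split
# described in the steps (30 volumes per node, 90 in total).
NODE_RANGES: Dict[str, range] = {
    "N1": range(1, 31),
    "N2": range(31, 61),
    "N3": range(61, 91),
}


def stop_delete_commands(node: str) -> List[List[str]]:
    """Build the stop/delete command pairs for the volumes owned by `node`."""
    commands: List[List[str]] = []
    for index in NODE_RANGES[node]:
        volume = f"vol{index}"
        # --mode=script suppresses the interactive y/n confirmation prompt.
        commands.append(["gluster", "--mode=script", "volume", "stop", volume])
        commands.append(["gluster", "--mode=script", "volume", "delete", volume])
    return commands


def run(node: str, dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them in order on this node."""
    for command in stop_delete_commands(node):
        if dry_run:
            print(" ".join(command))
        else:
            # check=False: a timed-out stop should not abort the remaining
            # volumes, since the timeouts are what is being reproduced.
            subprocess.run(command, check=False)


if __name__ == "__main__":
    run("N1")  # on node 2 use "N2", on node 3 use "N3"
```

Start the script on all three nodes at roughly the same time with `dry_run=False` to approximate the parallel load described above.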
Observation:
Some of the volume stop commands started timing out. On probing the glusterd
processes further, one glusterd thread was observed in the following state:
Thread 8 (Thread 0x7f9baadfb700 (LWP 12643)):
#0  0x00007f9bb202d460 in nanosleep () from /lib64/libc.so.6
#1  0x00007f9bb202d36a in sleep () from /lib64/libc.so.6
#2  0x00007f9bae7d0882 in glusterd_wait_for_blockers (priv=0x7f9bb3c92050)
    at glusterd-op-sm.c:6264
#3  0x00007f9bae7d912d in glusterd_op_commit_perform (op=GD_OP_STOP_VOLUME,
    dict=dict@entry=0x7f9ba4a0ec70, op_errstr=op_errstr@entry=0x7f9baadfa978,
    rsp_dict=rsp_dict@entry=0x7f9ba0226d30) at glusterd-op-sm.c:6287
#4  0x00007f9bae7e24b0 in glusterd_op_ac_commit_op (event=0x7f9ba0212980,
    ctx=0x7f9ba4d1f3a0) at glusterd-op-sm.c:6019
#5  0x00007f9bae7df6ab in glusterd_op_sm () at glusterd-op-sm.c:8391
#6  0x00007f9bae80c10c in __glusterd_brick_op_cbk
    (req=req@entry=0x7f9ba4cc47a0, iov=iov@entry=0x7f9ba4cc47e0,
    count=count@entry=1, myframe=myframe@entry=0x7f9ba4b15b70)
    at glusterd-rpc-ops.c:2241
#7  0x00007f9bae80f0e9 in glusterd_big_locked_cbk (req=0x7f9ba4cc47a0,
    iov=0x7f9ba4cc47e0, count=1, myframe=0x7f9ba4b15b70,
    fn=0x7f9bae80bd50 <__glusterd_brick_op_cbk>) at glusterd-rpc-ops.c:223
#8  0x00007f9bb3762a50 in rpc_clnt_handle_reply
    (clnt=clnt@entry=0x7f9ba46547a0, pollin=pollin@entry=0x7f9ba00c4170)
    at rpc-clnt.c:778
#9  0x00007f9bb3762da3 in rpc_clnt_notify (trans=<optimized out>,
    mydata=0x7f9ba46547d0, event=<optimized out>, data=0x7f9ba00c4170)
    at rpc-clnt.c:971
#10 0x00007f9bb375f313 in rpc_transport_notify
    (this=this@entry=0x7f9ba44f5190,
    event=event@entry=RPC_TRANSPORT_MSG_RECEIVED,
    data=data@entry=0x7f9ba00c4170) at rpc-transport.c:538
#11 0x00007f9baba70bb2 in socket_event_poll_in
    (this=this@entry=0x7f9ba44f5190, notify_handled=<optimized out>)
    at socket.c:2315
#12 0x00007f9baba73023 in socket_event_handler (fd=13, idx=7, gen=1,
    data=0x7f9ba44f5190, poll_in=<optimized out>, poll_out=0, poll_err=0)
    at socket.c:2467
#13 0x00007f9bb39ecad9 in event_dispatch_epoll_handler
    (event=0x7f9baadfae74, event_pool=0xeca570) at event-epoll.c:583
#14 event_dispatch_epoll_worker (data=0xf2ab00) at event-epoll.c:659
#15 0x00007f9bb27af50b in start_thread () from /lib64/libpthread.so.0
#16 0x00007f9bb205f16f in clone () from /lib64/libc.so.6
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1571620
[Bug 1571620] arbiter brick is not getting unmounted
https://bugzilla.redhat.com/show_bug.cgi?id=1573304
[Bug 1573304] [Tracker-RHGS-BZ#1628651] Cant delete PV - Stuck in Failed
status
https://bugzilla.redhat.com/show_bug.cgi?id=1584639
[Bug 1584639] During parallel node and device removal testing, for device
removal operation, found one Stale brick at gluster backend
https://bugzilla.redhat.com/show_bug.cgi?id=1589070
[Bug 1589070] [Tracker-RHGS-BZ#1628651] Difference in volume count in
heketi and gluster volume list
https://bugzilla.redhat.com/show_bug.cgi?id=1590389
[Bug 1590389] [Tracker-RHGS-BZ#1524336] Node remove leaves behind a stale
brick in its gluster pod
https://bugzilla.redhat.com/show_bug.cgi?id=1598890
[Bug 1598890] Deleting 50 file volumes succeeded but 1 volume did not get
deleted.
https://bugzilla.redhat.com/show_bug.cgi?id=1628651
[Bug 1628651] rpc marks brick disconnected from glusterd
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.