[Bugs] [Bug 1631128] rpc marks brick disconnected from glusterd & volume stop transaction gets timed out

bugzilla at redhat.com bugzilla at redhat.com
Thu Sep 20 03:27:05 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1631128

Atin Mukherjee <amukherj at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|ZStream                     |
             Blocks|1516598, 1571620, 1573304,  |1628651
                   |1584639, 1589070, 1590389,  |
                   |1598890                     |
         Depends On|1628651                     |
            Summary|rpc marks brick             |rpc marks brick
                   |disconnected from glusterd  |disconnected from glusterd
                   |                            |& volume stop transaction
                   |                            |gets timed out
         Whiteboard|ocs-dependency-issue        |



--- Comment #1 from Atin Mukherjee <amukherj at redhat.com> ---
Steps to reproduce:

1. Create 90 1 x 3 volumes with brick multiplexing enabled in a 3-node cluster
   and start all the volumes.
2. Trigger volume stop & delete commands in parallel from the CLI of N1, N2
   and N3 as follows:
   node 1: vol1 - vol30
   node 2: vol31 - vol60
   node 3: vol61 - vol90
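The parallel split in step 2 can be sketched as follows. This is a minimal C illustration. The helper names `node_for_volume()` and `format_cmd()` are hypothetical and not part of GlusterFS. Only the command shapes `gluster volume stop <VOLNAME>` and `gluster volume delete <VOLNAME>` correspond to the real CLI.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helpers: the 90-volume / 3-node split comes from the
 * reproduction steps; the function names are illustrative only. */
#define NUM_VOLUMES   90
#define NUM_NODES     3
#define VOLS_PER_NODE (NUM_VOLUMES / NUM_NODES) /* 30 */

/* Map volume N (vol1..vol90) to the node (1..3) whose CLI issues its
 * stop/delete: node 1 -> vol1-vol30, node 2 -> vol31-vol60,
 * node 3 -> vol61-vol90. */
static int node_for_volume(int vol)
{
    assert(vol >= 1 && vol <= NUM_VOLUMES);
    return (vol - 1) / VOLS_PER_NODE + 1;
}

/* Write "gluster volume <action> vol<N>" into buf. The action must be
 * "stop" or "delete"; returns snprintf's result, or -1 on a bad action. */
static int format_cmd(char *buf, size_t len, const char *action, int vol)
{
    if (strcmp(action, "stop") != 0 && strcmp(action, "delete") != 0)
        return -1;
    return snprintf(buf, len, "gluster volume %s vol%d", action, vol);
}
```

For example, node 2 would loop over vol1..vol90, keep the volumes for which `node_for_volume()` returns 2 (vol31 - vol60), and run the stop command followed by the delete command for each one. Nodes 1 and 3 would do the same for their own ranges at the same time.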

Observation:

Some of the volume stop commands started timing out. Further probing of the
glusterd processes showed one of glusterd's threads in the following state:

Thread 8 (Thread 0x7f9baadfb700 (LWP 12643)):
#0  0x00007f9bb202d460 in nanosleep () from /lib64/libc.so.6
#1  0x00007f9bb202d36a in sleep () from /lib64/libc.so.6
#2  0x00007f9bae7d0882 in glusterd_wait_for_blockers (priv=0x7f9bb3c92050)
    at glusterd-op-sm.c:6264
#3  0x00007f9bae7d912d in glusterd_op_commit_perform (op=GD_OP_STOP_VOLUME,
    dict=dict@entry=0x7f9ba4a0ec70, op_errstr=op_errstr@entry=0x7f9baadfa978,
    rsp_dict=rsp_dict@entry=0x7f9ba0226d30) at glusterd-op-sm.c:6287
#4  0x00007f9bae7e24b0 in glusterd_op_ac_commit_op (event=0x7f9ba0212980,
    ctx=0x7f9ba4d1f3a0) at glusterd-op-sm.c:6019
#5  0x00007f9bae7df6ab in glusterd_op_sm () at glusterd-op-sm.c:8391
#6  0x00007f9bae80c10c in __glusterd_brick_op_cbk (req=req@entry=0x7f9ba4cc47a0,
    iov=iov@entry=0x7f9ba4cc47e0, count=count@entry=1,
    myframe=myframe@entry=0x7f9ba4b15b70) at glusterd-rpc-ops.c:2241
#7  0x00007f9bae80f0e9 in glusterd_big_locked_cbk (req=0x7f9ba4cc47a0,
    iov=0x7f9ba4cc47e0, count=1, myframe=0x7f9ba4b15b70,
    fn=0x7f9bae80bd50 <__glusterd_brick_op_cbk>) at glusterd-rpc-ops.c:223
#8  0x00007f9bb3762a50 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f9ba46547a0,
    pollin=pollin@entry=0x7f9ba00c4170) at rpc-clnt.c:778
#9  0x00007f9bb3762da3 in rpc_clnt_notify (trans=<optimized out>,
    mydata=0x7f9ba46547d0, event=<optimized out>, data=0x7f9ba00c4170)
    at rpc-clnt.c:971
#10 0x00007f9bb375f313 in rpc_transport_notify (this=this@entry=0x7f9ba44f5190,
    event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f9ba00c4170)
    at rpc-transport.c:538
#11 0x00007f9baba70bb2 in socket_event_poll_in (this=this@entry=0x7f9ba44f5190,
    notify_handled=<optimized out>) at socket.c:2315
#12 0x00007f9baba73023 in socket_event_handler (fd=13, idx=7, gen=1,
    data=0x7f9ba44f5190, poll_in=<optimized out>, poll_out=0, poll_err=0)
    at socket.c:2467
#13 0x00007f9bb39ecad9 in event_dispatch_epoll_handler (event=0x7f9baadfae74,
    event_pool=0xeca570) at event-epoll.c:583
#14 event_dispatch_epoll_worker (data=0xf2ab00) at event-epoll.c:659
#15 0x00007f9bb27af50b in start_thread () from /lib64/libpthread.so.0
#16 0x00007f9bb205f16f in clone () from /lib64/libc.so.6
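Frames #0 - #2 show the commit of GD_OP_STOP_VOLUME parked in glusterd_wait_for_blockers(), which calls sleep(). Below is a hypothetical, simplified C model of a sleep-poll wait of that kind. It is not glusterd's implementation. `blockers`, `wait_for_blockers_model()`, `release_blockers()` and `run_model()` are illustrative names, and the 10 ms poll interval is an assumption chosen to keep the model fast.

The model shows that the waiter makes progress only when some other thread decrements the counter. In the backtrace, the wait runs on an epoll worker thread (frames #13 - #14) from inside an RPC reply callback (frames #6 - #11). One plausible reading, not confirmed in this comment, is that this thread cannot service other events while it is parked.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical, simplified model of the wait in frames #0-#2; NOT
 * glusterd's code. `blockers` stands in for a count of in-flight
 * operations that must drain before the commit can proceed. */
static atomic_int blockers;

/* Sleep-poll until the counter drops to zero and return the number of
 * sleeps taken. The 10 ms interval is an assumption; frame #1 shows
 * the real thread inside sleep(). */
static int wait_for_blockers_model(void)
{
    struct timespec ts = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
    int polls = 0;
    while (atomic_load(&blockers) > 0) {
        nanosleep(&ts, NULL);
        polls++;
    }
    return polls;
}

/* A second thread that finishes its work and releases one blocker
 * every 20 ms. */
static void *release_blockers(void *arg)
{
    int n = *(int *)arg;
    struct timespec ts = { .tv_sec = 0, .tv_nsec = 20 * 1000 * 1000 };
    for (int i = 0; i < n; i++) {
        nanosleep(&ts, NULL);
        atomic_fetch_sub(&blockers, 1);
    }
    return NULL;
}

/* Start with n blockers, release them from another thread, and wait.
 * Returns the poll count, or -1 if the thread cannot be created. */
static int run_model(int n)
{
    pthread_t t;
    assert(n >= 0);
    atomic_store(&blockers, n);
    if (pthread_create(&t, NULL, release_blockers, &n) != 0)
        return -1;
    int polls = wait_for_blockers_model();
    pthread_join(t, NULL); /* n stays valid until the thread is joined */
    return polls;
}
```

With this model, run_model(3) returns after roughly 60 ms of polling. If nothing ever decrements the counter, the loop never exits. That is how a wait of this shape can outlast a transaction timeout.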


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1571620
[Bug 1571620] arbiter brick is not getting unmounted
https://bugzilla.redhat.com/show_bug.cgi?id=1573304
[Bug 1573304] [Tracker-RHGS-BZ#1628651] Cant delete PV - Stuck in Failed
status
https://bugzilla.redhat.com/show_bug.cgi?id=1584639
[Bug 1584639] During parallel node and device removal testing, for device
removal operation, found one Stale brick at gluster backend
https://bugzilla.redhat.com/show_bug.cgi?id=1589070
[Bug 1589070] [Tracker-RHGS-BZ#1628651] Difference in volume count in
heketi and gluster volume list
https://bugzilla.redhat.com/show_bug.cgi?id=1590389
[Bug 1590389] [Tracker-RHGS-BZ#1524336] Node remove leaves behind a stale
brick in its gluster pod
https://bugzilla.redhat.com/show_bug.cgi?id=1598890
[Bug 1598890] Deleting 50 file volumes succeeded but 1 volume did not get
deleted.
https://bugzilla.redhat.com/show_bug.cgi?id=1628651
[Bug 1628651] rpc marks brick disconnected from glusterd