[Bugs] [Bug 1537362] glustershd/ glusterd is not using right port when connecting to glusterfsd process

Tue Apr 17 01:34:48 UTC 2018

https://bugzilla.redhat.com/show_bug.cgi?id=1537362

zhou lin <zz.sh.cynthia at gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(zz.sh.cynthia at gma |
                   |il.com)                     |

--- Comment #6 from zhou lin <zz.sh.cynthia at gmail.com> ---
The version I am using is glusterfs 3.12.3, and I have applied
https://review.gluster.org/#/c/19263/
but the issue still occurs occasionally. I think the real root cause is that
the glusterfsd process is started twice. According to the previous analysis:

What I can see is that every time I restart all the gluster processes
(simulating a reboot scenario), the respective brick disconnect events are
received at glusterd.

#0  __glusterd_brick_rpc_notify (rpc=rpc@entry=0x7fffe8008810,
mydata=mydata@entry=0x7fffe800e0d0, 
    event=event@entry=RPC_CLNT_DISCONNECT, data=data@entry=0x0) at
glusterd-handler.c:6007
#1  0x00007ffff2936169 in glusterd_big_locked_notify (rpc=0x7fffe8008810,
mydata=0x7fffe800e0d0, event=RPC_CLNT_DISCONNECT, 
    data=0x0, notify_fn=0x7ffff2932440 <__glusterd_brick_rpc_notify>) at
glusterd-handler.c:68
#2  0x00007ffff78c0343 in rpc_clnt_handle_disconnect (conn=0x7fffe8008840,
clnt=0x7fffe8008810) at rpc-clnt.c:876
#3  rpc_clnt_notify (trans=0x7fffe8008ae0, mydata=0x7fffe8008840,
event=RPC_TRANSPORT_DISCONNECT, data=<optimized out>)
    at rpc-clnt.c:939
#4  0x00007ffff78bca73 in rpc_transport_notify (this=this@entry=0x7fffe8008ae0,
event=event@entry=RPC_TRANSPORT_DISCONNECT, 
    data=data@entry=0x7fffe8008ae0) at rpc-transport.c:545
#5  0x00007fffefc252bf in socket_event_poll_err (idx=<optimized out>,
gen=<optimized out>, this=0x7fffe8008ae0) at socket.c:1210
#6  socket_event_handler (fd=9, idx=<optimized out>, gen=<optimized out>,
data=0x7fffe8008ae0, poll_in=0, poll_out=<optimized out>, 
    poll_err=16) at socket.c:2488
#7  0x00007ffff7b5097c in event_dispatch_epoll_handler (event=0x7fffee9dce84,
event_pool=0x64f040) at event-epoll.c:583
#8  event_dispatch_epoll_worker (data=0x70b820) at event-epoll.c:659
#9  0x00007ffff693f36d in start_thread () from /lib64/libpthread.so.0
#10 0x00007ffff61ecbbf in clone () from /lib64/libc.so.6
My question for the RPC team is: why do we see a disconnect event in
this case?  Any pointers, Mohit/Raghu/Milind?
The side effect of this disconnect event is what caused this issue. In
__glusterd_brick_rpc_notify we set brickinfo->start_triggered to false. So
when two glusterd_brick_start threads were racing with each other, a
disconnect arrived in between and reset this flag, and we ended up trying
to spawn the same brick twice.
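
To make the interleaving described above concrete, here is a minimal sketch in Python rather than glusterd's C. The names BrickInfo, brick_start and on_disconnect are hypothetical stand-ins, not the real glusterd structures or functions. The sketch only models the start_triggered flag; it assumes the flag is what keeps a second start attempt from spawning the brick again, and it does no RPC or process handling.

```python
from dataclasses import dataclass, field


@dataclass
class BrickInfo:
    """Hypothetical stand-in for glusterd's per-brick state."""
    start_triggered: bool = False
    spawned: list = field(default_factory=list)  # records every spawn attempt


def brick_start(brick: BrickInfo, caller: str) -> bool:
    """Spawn the brick unless a start has already been triggered."""
    if brick.start_triggered:
        return False  # another caller already started this brick
    brick.start_triggered = True
    brick.spawned.append(caller)
    return True


def on_disconnect(brick: BrickInfo) -> None:
    """Mimics the flag reset described for __glusterd_brick_rpc_notify."""
    brick.start_triggered = False


# No disconnect between the two starts: the second start is suppressed.
ok = BrickInfo()
brick_start(ok, "thread-1")
brick_start(ok, "thread-2")

# A disconnect lands between the two starts: the guard is lost.
racy = BrickInfo()
brick_start(racy, "thread-1")
on_disconnect(racy)
brick_start(racy, "thread-2")

print(ok.spawned)
print(racy.spawned)
```

In the first sequence only one spawn is recorded. In the second, the reset lets both callers spawn the same brick, which matches the double glusterfsd start reported here.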
