[Gluster-users] 3.9.1 in docker: problems when one of the peers is unavailable.

Rafał Radecki radecki.rafal at gmail.com
Tue May 16 05:33:14 UTC 2017


Hi All.

I have a 9-node dockerized GlusterFS cluster and I am seeing the following
situation:
1) the docker daemon on the 8th node fails and, as a result, glusterd on this
node leaves the cluster
2) consequently, on the 1st node I see a message about the 8th node being
unavailable:

[2017-05-15 12:48:22.142865] I [MSGID: 106004] [glusterd-handler.c:5808:__glusterd_peer_rpc_notify] 0-management: Peer <10.10.10.8> (<5cb55b7a-1e04-4fb8-bd1d-55ee647719d2>), in state <Peer in Cluster>, has disconnected from glusterd.
[2017-05-15 12:48:22.167746] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.9.1/xlator/mgmt/glusterd.so(+0x2035a) [0x7f7d9d62535a] -->/usr/lib64/glusterfs/3.9.1/xlator/mgmt/glusterd.so(+0x29f48) [0x7f7d9d62ef48] -->/usr/lib64/glusterfs/3.9.1/xlator/mgmt/glusterd.so(+0xd50aa) [0x7f7d9d6da0aa] ) 0-management: Lock for vol csv not held
[2017-05-15 12:48:22.167767] W [MSGID: 106118] [glusterd-handler.c:5833:__glusterd_peer_rpc_notify] 0-management: Lock not released for csv


At this point the gluster share is unavailable; when I try to list it I get:

Transport endpoint is not connected
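
For reference, this is roughly how I check the peer and mount state when this
happens (the container name and mount point below are just examples from my
setup):

    # on the 1st node, inside the glusterd container:
    docker exec -it gluster1 gluster peer status       # 8th peer shows as disconnected
    docker exec -it gluster1 gluster volume status csv
    # on a client, listing the mount point fails:
    ls /mnt/csv                                        # -> Transport endpoint is not connected
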
3) then on the 5th node I see a message similar to the one in 2), this time
about the 1st node being unavailable, and the 5th node also disconnects from
the cluster:

[2017-05-15 12:52:54.321189] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.9.1/xlator/mgmt/glusterd.so(+0x2035a) [0x7f7fda22335a] -->/usr/lib64/glusterfs/3.9.1/xlator/mgmt/glusterd.so(+0x29f48) [0x7f7fda22cf48] -->/usr/lib64/glusterfs/3.9.1/xlator/mgmt/glusterd.so(+0xd50aa) [0x7f7fda2d80aa] ) 0-management: Lock for vol csv not held

[2017-05-15 12:52:54.321200] W [MSGID: 106118] [glusterd-handler.c:5833:__glusterd_peer_rpc_notify] 0-management: Lock not released for csv

[2017-05-15 12:53:04.659418] E [socket.c:2307:socket_connect_finish] 0-management: connection to 10.10.10.:24007 failed (Connection refused)
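
The "Connection refused" on port 24007 suggests glusterd itself was not
reachable at that point (24007 is glusterd's management port). This is how I
verify it by hand; the IP and container name below are just examples, the real
IP is truncated in the log above:

    # check reachability of the management port from another node:
    nc -zv 10.10.10.1 24007
    # check that glusterd is actually listening inside the container:
    docker exec -it gluster5 ss -ltn | grep 24007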


I am quite new to gluster, but as far as I can see this is a chain reaction in
which the failure of a single node leads to the disconnection of two other
nodes. Any hints on how to solve this? Are there any settings for
retries/timeouts/reconnects in gluster which could help in my case?
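
For example, I came across network.ping-timeout (default 42 seconds), but as
far as I understand it applies to client-to-brick connections rather than the
glusterd-to-glusterd management link, so I am not sure it is the right knob.
The commands below are just what I would try; 120 is an arbitrary value:

    # show the current value for the volume:
    gluster volume get csv network.ping-timeout
    # raise it, hoping to ride out short docker daemon outages:
    gluster volume set csv network.ping-timeout 120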

Thanks for any help!

BR,
Rafal.

