[Gluster-users] Gluster failure due to "0-management: Lock not released for <volumename>"

Atin Mukherjee amukherj at redhat.com
Tue Jun 27 07:28:43 UTC 2017


I had looked at the logs Victor shared privately, and it appears there is
a network glitch in the cluster that is causing glusterd to lose its
connection with the other peers. As a side effect, a lot of RPC requests
are getting bailed out, leaving glusterd with a stale lock, which is why
some of the commands failed with "another transaction is in progress" or
"locking failed".

Some examples of the symptom highlighted:

[2017-06-21 23:02:03.826858] E [rpc-clnt.c:200:call_bail] 0-management:
bailing out frame type(Peer mgmt) op(--(2)) xid = 0x4 sent = 2017-06-21
22:52:02.719068. timeout = 600 for 192.168.150.53:24007
[2017-06-21 23:02:03.826888] E [rpc-clnt.c:200:call_bail] 0-management:
bailing out frame type(Peer mgmt) op(--(2)) xid = 0x4 sent = 2017-06-21
22:52:02.716782. timeout = 600 for 192.168.150.52:24007
[2017-06-21 23:02:53.836936] E [rpc-clnt.c:200:call_bail] 0-management:
bailing out frame type(glusterd mgmt v3) op(--(1)) xid = 0x5 sent =
2017-06-21 22:52:47.909169. timeout = 600 for 192.168.150.53:24007
[2017-06-21 23:02:53.836991] E [MSGID: 106116]
[glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Locking
failed on gfsnode3. Please check log file for details.
[2017-06-21 23:02:53.837016] E [rpc-clnt.c:200:call_bail] 0-management:
bailing out frame type(glusterd mgmt v3) op(--(1)) xid = 0x5 sent =
2017-06-21 22:52:47.909175. timeout = 600 for 192.168.150.52:24007

I'd like to request that you first look at the network layer and rectify
the problems there.
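As a concrete starting point, here is a minimal sketch of two checks that follow from the diagnosis above. The peer IPs are taken from the bail-out messages in this thread; the port (24007/tcp) is glusterd's default, and the log path is the default location on most installs — adjust all of them to your environment.

```shell
#!/bin/sh
# Check whether each peer's glusterd port (24007/tcp by default) is
# reachable; the IPs are the ones from the bail-out messages above.
for ip in 192.168.150.52 192.168.150.53; do
  if timeout 3 bash -c "echo > /dev/tcp/$ip/24007" 2>/dev/null; then
    echo "$ip:24007 reachable"
  else
    echo "$ip:24007 NOT reachable"
  fi
done

# Count bailed-out RPC frames in glusterd's log to gauge how often the
# management connection is dropping (default log location assumed).
LOG=/var/log/glusterfs/glusterd.log
if [ -r "$LOG" ]; then
  grep -c 'call_bail' "$LOG"
else
  echo "log not found: $LOG"
fi
```

If the ports are intermittently unreachable, or the call_bail count keeps growing over time, that points at the network layer rather than at glusterd itself.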





On Thu, Jun 22, 2017 at 9:30 PM, Atin Mukherjee <amukherj at redhat.com> wrote:

> Could you attach glusterd.log and cmd_history.log files from all the nodes?
>
> On Wed, Jun 21, 2017 at 11:40 PM, Victor Nomura <victor at mezine.com> wrote:
>
>> Hi All,
>>
>>
>>
>> I'm fairly new to Gluster (3.10.3) and had it running for a couple of
>> months, but after a power failure in our building it all came crashing
>> down. No client is able to connect after powering the 3 nodes I have
>> set up back on.
>>
>>
>>
>> Looking at the logs, it looks like there’s some sort of “Lock” placed on
>> the volume which prevents all the clients from connecting to the Gluster
>> endpoint.
>>
>>
>>
>> I can't even run a "gluster volume status all" command if more than 1
>> node is powered up. I have to shut down nodes 2 and 3, and then I am
>> able to issue the command on node1 to see the volume status. When all
>> nodes are powered up and I check the peer status, it says that all
>> peers are connected. Trying to connect to the Gluster volume from any
>> client says the Gluster endpoint is not available and times out. There
>> are no network issues: each node can ping the others, and there are no
>> firewalls or any other device between the nodes and clients.
>>
>>
>>
>> Please help if you think you know how to fix this. I have a feeling
>> it's this "lock" that's not "released" due to the whole setup losing
>> power all of a sudden. I've tried restarting all the nodes, restarting
>> glusterfs-server, etc. I'm out of ideas.
>>
>>
>>
>> Thanks in advance!
>>
>>
>>
>> Victor
>>
>>
>>
>> Volume Name: teravolume
>>
>> Type: Distributed-Replicate
>>
>> Volume ID: 85af74d0-f1bc-4b0d-8901-4dea6e4efae5
>>
>> Status: Started
>>
>> Snapshot Count: 0
>>
>> Number of Bricks: 3 x 2 = 6
>>
>> Transport-type: tcp
>>
>> Bricks:
>>
>> Brick1: gfsnode1:/media/brick1
>>
>> Brick2: gfsnode2:/media/brick1
>>
>> Brick3: gfsnode3:/media/brick1
>>
>> Brick4: gfsnode1:/media/brick2
>>
>> Brick5: gfsnode2:/media/brick2
>>
>> Brick6: gfsnode3:/media/brick2
>>
>> Options Reconfigured:
>>
>> nfs.disable: on
>>
>>
>>
>>
>>
>> [2017-06-21 16:02:52.376709] W [MSGID: 106118]
>> [glusterd-handler.c:5913:__glusterd_peer_rpc_notify] 0-management: Lock
>> not released for teravolume
>>
>> [2017-06-21 16:03:03.429032] I [MSGID: 106163]
>> [glusterd-handshake.c:1309:__glusterd_mgmt_hndsk_versions_ack]
>> 0-management: using the op-version 31000
>>
>> [2017-06-21 16:13:13.326478] E [rpc-clnt.c:200:call_bail] 0-management:
>> bailing out frame type(Peer mgmt) op(--(2)) xid = 0x105 sent = 2017-06-21
>> 16:03:03.202284. timeout = 600 for 192.168.150.52:$
>>
>> [2017-06-21 16:13:13.326519] E [rpc-clnt.c:200:call_bail] 0-management:
>> bailing out frame type(Peer mgmt) op(--(2)) xid = 0x105 sent = 2017-06-21
>> 16:03:03.204555. timeout = 600 for 192.168.150.53:$
>>
>> [2017-06-21 16:18:34.456522] I [MSGID: 106004]
>> [glusterd-handler.c:5888:__glusterd_peer_rpc_notify] 0-management: Peer
>> <gfsnode2> (<e1e1caa5-9842-40d8-8492-a82b079879a3>), in state <Peer in
>> Cluste$
>>
>> [2017-06-21 16:18:34.456619] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock]
>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x1f879)
>> [0x7fee6bc22879] -->/usr/lib/x86_64-l$
>>
>> [2017-06-21 16:18:34.456638] W [MSGID: 106118]
>> [glusterd-handler.c:5913:__glusterd_peer_rpc_notify] 0-management: Lock
>> not released for teravolume
>>
>> [2017-06-21 16:18:34.456661] I [MSGID: 106004]
>> [glusterd-handler.c:5888:__glusterd_peer_rpc_notify] 0-management: Peer
>> <gfsnode3> (<59b9effa-2b88-4764-9130-4f31c14c362e>), in state <Peer in
>> Cluste$
>>
>> [2017-06-21 16:18:34.456692] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock]
>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x1f879)
>> [0x7fee6bc22879] -->/usr/lib/x86_64-l$
>>
>> [2017-06-21 16:18:43.323944] I [MSGID: 106163]
>> [glusterd-handshake.c:1309:__glusterd_mgmt_hndsk_versions_ack]
>> 0-management: using the op-version 31000
>>
>> [2017-06-21 16:18:34.456699] W [MSGID: 106118]
>> [glusterd-handler.c:5913:__glusterd_peer_rpc_notify] 0-management: Lock
>> not released for teravolume
>>
>> [2017-06-21 16:18:45.628552] I [MSGID: 106163]
>> [glusterd-handshake.c:1309:__glusterd_mgmt_hndsk_versions_ack]
>> 0-management: using the op-version 31000
>>
>> [2017-06-21 16:23:40.607173] I [MSGID: 106499]
>> [glusterd-handler.c:4363:__glusterd_handle_status_volume] 0-management:
>> Received status volume req for volume teravolume
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
>