[Gluster-users] held cluster lock blocking volume operations

Matthew Nicholson matthew_nicholson at harvard.edu
Tue Jun 4 16:59:58 UTC 2013


Even more info:

I only see the "Unable to get lock" messages on the same node I'm running the
gluster volume command on (status, in this instance), and it always
complains about itself. For example:

I run:
[root@ox60-gstore10 ~]# gluster volume status
[root@ox60-gstore10 ~]#
(it sits for a bit, then just comes back empty).
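
To watch the two side by side, tailing the daemon log in a second shell while
re-running the command should show the same sequence (path as in the snippet
below; adjust if your logs live elsewhere):

tail -f /var/log/glusterfs/etc-glusterfs-glusterd.vol.log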

The logs on that system (ox60-gstore10) yield:

==> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log <==
[2013-06-04 12:55:13.447584] I [glusterd-utils.c:285:glusterd_lock]
0-glusterd: Cluster lock held by 0edce15e-0de2-4496-a520-58c65dbbc7da
[2013-06-04 12:55:13.447637] I
[glusterd-handler.c:463:glusterd_op_txn_begin] 0-management: Acquired local
lock
[2013-06-04 12:55:13.447868] I
[glusterd-handler.c:502:glusterd_handle_cluster_lock] 0-glusterd: Received
LOCK from uuid: 0edce15e-0de2-4496-a520-58c65dbbc7da
[2013-06-04 12:55:13.447898] E [glusterd-utils.c:277:glusterd_lock]
0-glusterd: Unable to get lock for uuid:
0edce15e-0de2-4496-a520-58c65dbbc7da, lock held by:
0edce15e-0de2-4496-a520-58c65dbbc7da
[2013-06-04 12:55:13.447932] I
[glusterd-handler.c:1322:glusterd_op_lock_send_resp] 0-glusterd: Responded,
ret: 0
[2013-06-04 12:55:13.447945] E [glusterd-op-sm.c:4624:glusterd_op_sm]
0-glusterd: handler returned: -1
[2013-06-04 12:55:13.447971] I
[glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received
ACC from uuid: 725a2567-b668-4a5f-b2c9-5c7dcc90c846
[2013-06-04 12:55:13.447993] I
[glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received
ACC from uuid: 757297b4-5648-4e31-88f4-00fc167a43e4
[2013-06-04 12:55:13.448013] I
[glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received
RJT from uuid: 0edce15e-0de2-4496-a520-58c65dbbc7da
[2013-06-04 12:55:13.448035] I
[glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received
ACC from uuid: a5de08c0-e761-45ee-a7ad-e8c556f2540b
[2013-06-04 12:55:13.448056] I
[glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received
ACC from uuid: 303f4cc4-c8ae-42c7-b8cd-eafee8f95122
[2013-06-04 12:55:13.448143] I
[glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received
ACC from uuid: a327cd38-f98a-4554-ae62-97a21153f4d3
[2013-06-04 12:55:13.448166] I
[glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received
ACC from uuid: cdba3b89-e804-4bf1-afb9-d7c231399955
[2013-06-04 12:55:13.448191] I
[glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received
ACC from uuid: 055a13fe-e40a-46ff-9011-6c81832e3ba1
[2013-06-04 12:55:13.448231] I
[glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received
ACC from uuid: e0c267e6-3dc2-4623-89f1-4516f1285c1a
[2013-06-04 12:55:13.448257] I
[glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received
ACC from uuid: 6456206b-fe19-4b65-b7ab-0c9e7ce6221e
[2013-06-04 12:55:13.448282] I
[glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received
ACC from uuid: 439f3ffa-e468-4a8b-801e-e2f20062e6f0
[2013-06-04 12:55:13.448303] I
[glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received
ACC from uuid: 2225df4c-4510-457c-9958-0b6506ff25e4
[2013-06-04 12:55:13.448322] I
[glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received
ACC from uuid: e503bd2e-b2b2-49d4-ae05-45090e24acca
[2013-06-04 12:55:13.448340] I
[glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received
ACC from uuid: 5517a055-c5f5-41b7-95d2-dedf6900be21
[2013-06-04 12:55:13.448358] I
[glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received
ACC from uuid: 974a503e-4f0f-44f2-81df-5383c28cdf20
[2013-06-04 12:55:13.448376] I
[glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received
ACC from uuid: 428e11bc-5a80-41cb-af1d-a9023e2bc11b

So it sees that something is holding the lock and rejects the request.

If I look up that UUID:

[root@ox60-gstore10 ~]# gluster peer status | grep 0edce15e-0de2-4496-a520-58c65dbbc7da --context=3
Number of Peers: 20

Hostname: ox60-gstore10
Uuid: 0edce15e-0de2-4496-a520-58c65dbbc7da
State: Peer in Cluster (Connected)

So it seems the node is holding the lock itself. If I do this on another node in
the cluster, I see the same thing (the node I'm checking the status from is
holding a lock, gets rejected, and never gets any info back).
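
For reference, the "restart the glusterd that owns the lock" approach Vijay
suggested (quoted below) boils down to roughly this -- a sketch, assuming a
RHEL-style init script named glusterd, with <lock-uuid> being whatever UUID
the log complains about; restarting glusterd alone shouldn't touch the brick
or NFS processes:

# map the lock-owning UUID back to a hostname
gluster peer status | grep -B1 <lock-uuid>
# then, on that host, bounce just the management daemon
service glusterd restart

We've already been down that road (and even rebooted whole nodes, per the
quoted messages below), and the lock errors are still showing up.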




--
Matthew Nicholson
Research Computing Specialist
Harvard FAS Research Computing
matthew_nicholson at harvard.edu



On Tue, Jun 4, 2013 at 12:21 PM, Matthew Nicholson <
matthew_nicholson at harvard.edu> wrote:

> If I strace a "gluster volume status", it hangs here:
>
> epoll_wait(3, {{EPOLLOUT, {u32=5, u64=5}}}, 257, 4294967295) = 1
> getsockopt(5, SOL_SOCKET, SO_ERROR, [150710196258209792], [4]) = 0
> getsockname(5, {sa_family=AF_INET, sin_port=htons(964),
> sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
> futex(0x63b7a4, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x63b760, 2) = 1
> futex(0x63b760, FUTEX_WAKE_PRIVATE, 1)  = 1
> epoll_ctl(3, EPOLL_CTL_MOD, 5, {EPOLLIN|EPOLLPRI, {u32=5, u64=5}}) = 0
> epoll_wait(3,
>
> so it's talking to localhost on port 964
>
> All nodes do that, but with different ports.
>
>
>
> --
> Matthew Nicholson
> Research Computing Specialist
> Harvard FAS Research Computing
> matthew_nicholson at harvard.edu
>
>
>
> On Tue, Jun 4, 2013 at 12:19 PM, Matthew Nicholson <
> matthew_nicholson at harvard.edu> wrote:
>
>> No, no duplicate UUIDs:
>>
>> [root@ox60-gstore01 ~]# gluster peer status | grep -i uuid | uniq -c
>>       1 Uuid: 055a13fe-e40a-46ff-9011-6c81832e3ba1
>>       1 Uuid: e0c267e6-3dc2-4623-89f1-4516f1285c1a
>>       1 Uuid: e503bd2e-b2b2-49d4-ae05-45090e24acca
>>       1 Uuid: 974a503e-4f0f-44f2-81df-5383c28cdf20
>>       1 Uuid: 5517a055-c5f5-41b7-95d2-dedf6900be21
>>       1 Uuid: 13cfacc1-65a4-4151-91d5-bc7977e01654
>>       1 Uuid: a5de08c0-e761-45ee-a7ad-e8c556f2540b
>>       1 Uuid: 428e11bc-5a80-41cb-af1d-a9023e2bc11b
>>       1 Uuid: 113562a1-e521-4747-ae75-477614ea28cf
>>       1 Uuid: 04c6c37b-743d-4f87-9bdc-3dfe1b573709
>>       1 Uuid: 2225df4c-4510-457c-9958-0b6506ff25e4
>>       1 Uuid: 6456206b-fe19-4b65-b7ab-0c9e7ce6221e
>>       1 Uuid: 0edce15e-0de2-4496-a520-58c65dbbc7da
>>       1 Uuid: a327cd38-f98a-4554-ae62-97a21153f4d3
>>       1 Uuid: a7d3a064-1bb4-4da0-a680-180db8150e4c
>>       1 Uuid: 757297b4-5648-4e31-88f4-00fc167a43e4
>>       1 Uuid: 725a2567-b668-4a5f-b2c9-5c7dcc90c846
>>       1 Uuid: 303f4cc4-c8ae-42c7-b8cd-eafee8f95122
>>       1 Uuid: 439f3ffa-e468-4a8b-801e-e2f20062e6f0
>>       1 Uuid: cdba3b89-e804-4bf1-afb9-d7c231399955
>>
>> glusterd (as well as glusterfs and the NFS server, which seemingly never
>> dies if glusterd is shut down) have all been restarted. Actually, we just
>> went so far as to bounce one replica and then another (full reboot).
>>
>>
>>
>> --
>> Matthew Nicholson
>> Research Computing Specialist
>> Harvard FAS Research Computing
>> matthew_nicholson at harvard.edu
>>
>>
>>
>> On Tue, Jun 4, 2013 at 10:30 AM, Vijay Bellur <vbellur at redhat.com> wrote:
>>
>>> On 06/04/2013 07:57 PM, Matthew Nicholson wrote:
>>>
>>>> So, we've got a volume that is mostly functioning fine (it's up,
>>>> accessible, etc.). However, volume operations fail or don't return on
>>>> it.
>>>>
>>>>
>>>> What I mean is:
>>>>
>>>> gluster peer status/probe/etc : works
>>>> gluster volume info : works
>>>> gluster volume status/remove-brick/etc : sit for a long time and return
>>>> nothing.
>>>>
>>>> The only things coming up in logs are:
>>>>
>>>> [2013-06-04 10:21:36.398072] I [glusterd-utils.c:285:glusterd_lock]
>>>> 0-glusterd: Cluster lock held by 757297b4-5648-4e31-88f4-00fc167a43e4
>>>> [2013-06-04 10:21:36.398123] I
>>>> [glusterd-handler.c:463:glusterd_op_txn_begin] 0-management: Acquired
>>>> local lock
>>>> [2013-06-04 10:21:36.398424] I
>>>> [glusterd-handler.c:502:glusterd_handle_cluster_lock] 0-glusterd:
>>>> Received LOCK from uuid: 757297b4-5648-4e31-88f4-00fc167a43e4
>>>> [2013-06-04 10:21:36.398448] E [glusterd-utils.c:277:glusterd_lock]
>>>> 0-glusterd: Unable to get lock for uuid:
>>>> 757297b4-5648-4e31-88f4-00fc167a43e4, lock held by:
>>>> 757297b4-5648-4e31-88f4-00fc167a43e4
>>>> [2013-06-04 10:21:36.398483] I
>>>> [glusterd-handler.c:1322:glusterd_op_lock_send_resp] 0-glusterd:
>>>> Responded, ret: 0
>>>> [2013-06-04 10:21:36.398498] E [glusterd-op-sm.c:4624:glusterd_op_sm]
>>>> 0-glusterd: handler returned: -1
>>>>
>>>> If you notice, the UUID holding the lock and the UUID requesting the
>>>> lock are the same. So it seems like a lock was "forgotten" about?
>>>>
>>>> Any thoughts on clearing this?
>>>>
>>>
>>> Does gluster peer status list the same UUID more than once?
>>>
>>> If not, restarting the glusterd which is the lock owner should address
>>> it.
>>>
>>> -Vijay
>>>
>>>
>>
>