[Gluster-users] stuck lock

Mario Kadastik mario.kadastik at cern.ch
Thu Dec 13 13:10:14 UTC 2012


Hi

I still don't know what caused it, whether it was the failure of one gluster node (it lost a SATA controller and was rebooted) or some user activity, but gluster became quite unusable. Even basic commands like "gluster volume heal home0 info" didn't work, either hanging or returning "operation failed". After numerous attempts I finally managed to stop the volume and restarted gluster on all nodes. However, I still can't do anything useful. Most commands fail and the log shows:

==> etc-glusterfs-glusterd.vol.log <==
[2012-12-13 14:59:49.713103] I [glusterd-volume-ops.c:492:glusterd_handle_cli_heal_volume] 0-management: Received heal vol req for volume home0
[2012-12-13 14:59:49.713194] E [glusterd-utils.c:277:glusterd_lock] 0-glusterd: Unable to get lock for uuid: c3ce6b9c-6297-4e77-924c-b44e2c13e58f, lock held by: c3ce6b9c-6297-4e77-924c-b44e2c13e58f
[2012-12-13 14:59:49.713234] E [glusterd-handler.c:458:glusterd_op_txn_begin] 0-management: Unable to acquire local lock, ret: -1

I've googled and seen others hit this at times, but never any resolutions. Is there some way to clear this lock? It has been in effect for well over an hour, so the generic 30-minute lock timeout claimed in one of the search results doesn't seem to be at work here.
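If it helps, one thing that stands out in the failing log line is that the UUID requesting the lock and the UUID holding it are identical, i.e. the node appears to be blocked on a lock it holds itself. A quick sketch of that check (the log line is pasted in as a string here for illustration; on a live node the local UUID would normally come from /var/lib/glusterd/glusterd.info):

```shell
# Paste the failing log line and pull out the two UUIDs.
logline='[2012-12-13 15:10:00.841910] E [glusterd-utils.c:277:glusterd_lock] 0-glusterd: Unable to get lock for uuid: c3ce6b9c-6297-4e77-924c-b44e2c13e58f, lock held by: c3ce6b9c-6297-4e77-924c-b44e2c13e58f'

# UUID that asked for the lock:
want=$(echo "$logline" | sed 's/.*lock for uuid: \([0-9a-f-]*\),.*/\1/')
# UUID that currently holds it:
held=$(echo "$logline" | sed 's/.*lock held by: \([0-9a-f-]*\)$/\1/')

echo "requester: $want"
echo "holder:    $held"
# Identical UUIDs: the node is stuck waiting on its own lock.
[ "$want" = "$held" ] && echo "node is blocked on its own lock"
```

Since the lock holder is the local glusterd itself, I'd have expected restarting glusterd on that node to release it, but as described above the stale lock survived a restart of gluster on all nodes.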

Any help would be appreciated.

[root@se1 home0]# gluster volume info
 
Volume Name: home0
Type: Distributed-Replicate
Volume ID: 8e594854-16e1-445e-8434-1d597cef1749
Status: Started
Number of Bricks: 4 x 3 = 12
Transport-type: tcp
Bricks:
Brick1: 192.168.1.241:/d35
Brick2: 192.168.1.242:/d35
Brick3: 192.168.1.243:/d35
Brick4: 192.168.1.244:/d35
Brick5: 192.168.1.245:/d35
Brick6: 192.168.1.240:/d35
Brick7: 192.168.1.241:/d36
Brick8: 192.168.1.242:/d36
Brick9: 192.168.1.243:/d36
Brick10: 192.168.1.244:/d36
Brick11: 192.168.1.245:/d36
Brick12: 192.168.1.240:/d36
Options Reconfigured:
cluster.quorum-type: auto
cluster.lookup-unhashed: off
performance.client-io-threads: on
cluster.data-self-heal: on
performance.stat-prefetch

[root@se1 home0]# gluster volume status
Status of volume: home0
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick 192.168.1.241:/d35				24009	Y	7137
Brick 192.168.1.242:/d35				24009	Y	6804
Brick 192.168.1.243:/d35				24009	Y	5763
Brick 192.168.1.244:/d35				24009	Y	10378
Brick 192.168.1.245:/d35				24009	Y	3770
Brick 192.168.1.240:/d35				24009	Y	21112
Brick 192.168.1.241:/d36				24010	Y	7143
Brick 192.168.1.242:/d36				24010	Y	6810
Brick 192.168.1.243:/d36				24010	Y	5771
Brick 192.168.1.244:/d36				24010	Y	10384
Brick 192.168.1.245:/d36				24010	Y	3781
Brick 192.168.1.240:/d36				24010	Y	21120
NFS Server on localhost					38467	Y	13552
Self-heal Daemon on localhost				N/A	Y	13792
NFS Server on 192.168.1.242				38467	Y	21254
Self-heal Daemon on 192.168.1.242			N/A	Y	21267
NFS Server on 192.168.1.243				38467	Y	8865
Self-heal Daemon on 192.168.1.243			N/A	Y	8871
NFS Server on 192.168.1.240				38467	Y	18806
Self-heal Daemon on 192.168.1.240			N/A	Y	19045
NFS Server on 192.168.1.244				38467	Y	536
Self-heal Daemon on 192.168.1.244			N/A	Y	745
NFS Server on 192.168.1.245				38467	Y	8689
Self-heal Daemon on 192.168.1.245			N/A	Y	8955
 
[root@se1 home0]# 


[root@se1 home0]# gluster volume heal home0 info

==> cli.log <==
[2012-12-13 15:09:33.476616] W [rpc-transport.c:174:rpc_transport_load] 0-rpc-transport: missing 'option transport-type'. defaulting to "socket"

==> etc-glusterfs-glusterd.vol.log <==
[2012-12-13 15:09:33.565022] I [glusterd-volume-ops.c:492:glusterd_handle_cli_heal_volume] 0-management: Received heal vol req for volume home0
[2012-12-13 15:09:33.565122] I [glusterd-utils.c:285:glusterd_lock] 0-glusterd: Cluster lock held by c3ce6b9c-6297-4e77-924c-b44e2c13e58f
[2012-12-13 15:09:33.565136] I [glusterd-handler.c:463:glusterd_op_txn_begin] 0-management: Acquired local lock
[2012-12-13 15:09:33.565938] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 663ecbfb-4209-417e-a955-6c9f72751dbc
[2012-12-13 15:09:33.565999] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: f1a89ed2-a2f5-49a9-9482-1c6984c37945
[2012-12-13 15:09:33.566024] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: b1ce84be-de0b-4ae1-a1e8-758d828b8872
[2012-12-13 15:09:33.566047] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 0f61d484-0f93-4144-b166-2145f4ea4427
[2012-12-13 15:09:33.566069] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: d9b48655-4b25-4ad2-be19-c5ec8768a789
[2012-12-13 15:09:33.566224] I [glusterd-op-sm.c:2039:glusterd_op_ac_send_stage_op] 0-glusterd: Sent op req to 5 peers
[2012-12-13 15:09:33.566420] I [glusterd-rpc-ops.c:881:glusterd3_1_stage_op_cbk] 0-glusterd: Received ACC from uuid: b1ce84be-de0b-4ae1-a1e8-758d828b8872
[2012-12-13 15:09:33.566450] I [glusterd-rpc-ops.c:881:glusterd3_1_stage_op_cbk] 0-glusterd: Received ACC from uuid: d9b48655-4b25-4ad2-be19-c5ec8768a789
[2012-12-13 15:09:33.566499] I [glusterd-rpc-ops.c:881:glusterd3_1_stage_op_cbk] 0-glusterd: Received ACC from uuid: f1a89ed2-a2f5-49a9-9482-1c6984c37945
[2012-12-13 15:09:33.566524] I [glusterd-rpc-ops.c:881:glusterd3_1_stage_op_cbk] 0-glusterd: Received ACC from uuid: 0f61d484-0f93-4144-b166-2145f4ea4427
[2012-12-13 15:09:33.566667] I [glusterd-rpc-ops.c:881:glusterd3_1_stage_op_cbk] 0-glusterd: Received ACC from uuid: 663ecbfb-4209-417e-a955-6c9f72751dbc

<hangs here>
Ctrl+C

[root@se1 home0]# gluster volume heal home0
operation failed
[root@se1 home0]# 
==> cli.log <==
[2012-12-13 15:10:00.686308] W [rpc-transport.c:174:rpc_transport_load] 0-rpc-transport: missing 'option transport-type'. defaulting to "socket"
[2012-12-13 15:10:00.842108] I [cli-rpc-ops.c:5928:gf_cli3_1_heal_volume_cbk] 0-cli: Received resp to heal volume
[2012-12-13 15:10:00.842187] I [input.c:46:cli_batch] 0-: Exiting with: -1

==> etc-glusterfs-glusterd.vol.log <==
[2012-12-13 15:10:00.841789] I [glusterd-volume-ops.c:492:glusterd_handle_cli_heal_volume] 0-management: Received heal vol req for volume home0
[2012-12-13 15:10:00.841910] E [glusterd-utils.c:277:glusterd_lock] 0-glusterd: Unable to get lock for uuid: c3ce6b9c-6297-4e77-924c-b44e2c13e58f, lock held by: c3ce6b9c-6297-4e77-924c-b44e2c13e58f
[2012-12-13 15:10:00.841926] E [glusterd-handler.c:458:glusterd_op_txn_begin] 0-management: Unable to acquire local lock, ret: -1

Mario Kadastik, PhD
Researcher

---
  "Physics is like sex, sure it may have practical reasons, but that's not why we do it" 
     -- Richard P. Feynman



