[Gluster-users] Replace-brick on 3.3.1 hangs entire volume for several minutes and then hangs glusterfs on destination brick

Hans Lambermont hans at shapeways.com
Mon Apr 8 14:17:34 UTC 2013


Hi gluster users,

I just upgraded from 3.2.5 to 3.3.1 on a Distributed-Replicate volume
with about 2M directories in order to get a working replace-brick.
Running replace-brick now hangs the entire gluster volume for all
clients for several minutes, and subsequently hangs the glusterfs
process on the destination brick.

I suspect the gluster volume hangup to be related to
https://bugzilla.redhat.com/show_bug.cgi?id=832609 "Glusterfsd hangs if
brick filesystem becomes unresponsive, causing all clients to lock up".

The hung replace-brick glusterfs process on the destination brick sits
at 100% CPU and shows no strace output.

gluster volume replace-brick xxx status
Number of files migrated = 3       Current file= /xxx 

%CPU %MEM    TIME+  P COMMAND
100  0.2   2238:48 2 //sbin/glusterfs -f/var/lib/glusterd/vols/vol01/rb_dst_brick.vol ...

The target brick received about 1% of the intended directories.

The log file -etc-glusterfs-glusterd.vol.log shows only that the
replace-brick has started:

I [glusterd-replace-brick.c:98:glusterd_handle_replace_brick] 0-glusterd: Received replace brick req
I [glusterd-replace-brick.c:147:glusterd_handle_replace_brick] 0-glusterd: Received replace brick status request
I [glusterd-utils.c:285:glusterd_lock] 0-glusterd: Cluster lock held by 3*
I [glusterd-handler.c:463:glusterd_op_txn_begin] 0-management: Acquired local lock
I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 9*
I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: c*
I [glusterd-utils.c:857:glusterd_volume_brickinfo_get_by_brick] 0-: brick: s1:/g/c
I [glusterd-utils.c:814:glusterd_volume_brickinfo_get] 0-management: Found brick
I [glusterd-op-sm.c:2039:glusterd_op_ac_send_stage_op] 0-glusterd: Sent op req to 2 peers
I [glusterd-rpc-ops.c:881:glusterd3_1_stage_op_cbk] 0-glusterd: Received ACC from uuid: c*
I [glusterd-rpc-ops.c:881:glusterd3_1_stage_op_cbk] 0-glusterd: Received ACC from uuid: 9*
I [glusterd-utils.c:857:glusterd_volume_brickinfo_get_by_brick] 0-: brick: s1:/g/c
I [glusterd-utils.c:814:glusterd_volume_brickinfo_get] 0-management: Found brick
I [glusterd-replace-brick.c:1288:rb_update_dstbrick_port] 0-: adding dst-brick port no
I [glusterd-op-sm.c:2384:glusterd_op_ac_send_commit_op] 0-management: Sent op req to 2 peers
I [glusterd-rpc-ops.c:1317:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC from uuid: c*
I [glusterd-rpc-ops.c:1317:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC from uuid: 9*
I [glusterd-rpc-ops.c:607:glusterd3_1_cluster_unlock_cbk] 0-glusterd: Received ACC from uuid: 9*
I [glusterd-rpc-ops.c:607:glusterd3_1_cluster_unlock_cbk] 0-glusterd: Received ACC from uuid: c*
I [glusterd-op-sm.c:2653:glusterd_op_txn_complete] 0-glusterd: Cleared local lock
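
Going by the function names in the excerpt above, the glusterd management transaction appears to run through lock, stage, commit and unlock, with both peers acknowledging each step. The following is a minimal Python sketch that summarizes such a log by phase. The phase labels and the mapping from function names to phases are assumptions based only on the names seen in this log, not on the glusterd source.

```python
import re

# Assumed mapping from glusterd function names (as seen in the log
# excerpt) to a transaction phase. The labels are illustrative only.
PHASES = {
    "glusterd_op_txn_begin": "lock",
    "glusterd3_1_cluster_lock_cbk": "lock",
    "glusterd_op_ac_send_stage_op": "stage",
    "glusterd3_1_stage_op_cbk": "stage",
    "glusterd_op_ac_send_commit_op": "commit",
    "glusterd3_1_commit_op_cbk": "commit",
    "glusterd3_1_cluster_unlock_cbk": "unlock",
    "glusterd_op_txn_complete": "unlock",
}

# Matches the "[file.c:123:function_name]" part of a log line.
LOG_RE = re.compile(r"\[([\w.-]+\.c):(\d+):(\w+)\]")


def phases_seen(lines):
    """Return the distinct phases in the order they first appear."""
    seen = []
    for line in lines:
        m = LOG_RE.search(line)
        if not m:
            continue
        phase = PHASES.get(m.group(3))
        if phase and phase not in seen:
            seen.append(phase)
    return seen


def peer_acks(lines):
    """Count 'Received ACC' lines per phase."""
    counts = {}
    for line in lines:
        m = LOG_RE.search(line)
        if m and "Received ACC" in line:
            phase = PHASES.get(m.group(3))
            if phase:
                counts[phase] = counts.get(phase, 0) + 1
    return counts


# Lines copied from the log excerpt in this mail.
SAMPLE = """\
I [glusterd-handler.c:463:glusterd_op_txn_begin] 0-management: Acquired local lock
I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 9*
I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: c*
I [glusterd-op-sm.c:2039:glusterd_op_ac_send_stage_op] 0-glusterd: Sent op req to 2 peers
I [glusterd-rpc-ops.c:881:glusterd3_1_stage_op_cbk] 0-glusterd: Received ACC from uuid: c*
I [glusterd-rpc-ops.c:881:glusterd3_1_stage_op_cbk] 0-glusterd: Received ACC from uuid: 9*
I [glusterd-op-sm.c:2384:glusterd_op_ac_send_commit_op] 0-management: Sent op req to 2 peers
I [glusterd-rpc-ops.c:1317:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC from uuid: c*
I [glusterd-rpc-ops.c:1317:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC from uuid: 9*
I [glusterd-rpc-ops.c:607:glusterd3_1_cluster_unlock_cbk] 0-glusterd: Received ACC from uuid: 9*
I [glusterd-rpc-ops.c:607:glusterd3_1_cluster_unlock_cbk] 0-glusterd: Received ACC from uuid: c*
I [glusterd-op-sm.c:2653:glusterd_op_txn_complete] 0-glusterd: Cleared local lock
"""

if __name__ == "__main__":
    lines = SAMPLE.splitlines()
    print("phases:", phases_seen(lines))
    print("acks per phase:", peer_acks(lines))
```

If all four phases show up with an ACC from each peer, as they seem to here, the management side of the replace-brick start completed, which would suggest the hang is somewhere other than this glusterd transaction.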

Any hints on how to proceed from here and get replace-brick to work are welcome.

regards,
   Hans Lambermont
-- 
Hans Lambermont | Senior Architect
(t) +31407370104 (w) www.shapeways.com


