[Gluster-users] Volume add-brick: failed: (with no error message)

Iain Milne glusterfs at noognet.org
Wed Apr 16 14:56:41 UTC 2014


A further investigation of this turned up a solution...

The add-brick attempt was being done from the first server in the array
(server1). On the off chance of something different happening, we also
tried from server3. This time the error was:

[root@server3 data]# gluster volume add-brick gfs server3:/mnt/data
volume add-brick: failed: The brick server3:/mnt/data is a mount point.
Please create a sub-directory under the mount point and use that as the
brick directory. Or use 'force' at the end of the command if you want to
override this behavior.
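
For what it's worth, the sub-directory route it suggests is just something
like this (the "brick" directory name below is our choice, not anything
mandated):

[root@server3 data]# mkdir /mnt/data/brick
[root@server3 data]# gluster volume add-brick gfs server3:/mnt/data/brick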


I guess there are two issues here:

1) Why didn't the first server report the error? (I assume it's a bug;
should we file a report for it?)
2) Using 'force' worked (exact command below), but now we're worried that
we shouldn't be mounting the bricks this way, despite the existing two
servers having run like that for the last year.
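
The force variant, for the record, is just the same command with 'force'
appended, as the error message suggests:

gluster volume add-brick gfs server3:/mnt/data force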

The mounts are basically:
/dev/sdb1 on /mnt/data type xfs (rw,logbufs=8,inode64,nobarrier)
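
In /etc/fstab terms that would presumably be something along these lines
(a sketch only, not copied from the actual servers):

/dev/sdb1  /mnt/data  xfs  rw,logbufs=8,inode64,nobarrier  0 0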

We also tried the sub-directory option, and it worked fine without the
force, but a nagging need for consistency across the servers led us to
remove-brick (with data migration) that brick again. That took all night
to run, despite only 7MB or so supposedly being on the new server. As the
storage pool is 44TB in use out of 110TB total, maybe it has to scan it
all anyway...
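
For anyone repeating this, the remove-brick-with-migration sequence is the
standard three-step one (sub-directory path as in the example above):

gluster volume remove-brick gfs server3:/mnt/data/brick start
gluster volume remove-brick gfs server3:/mnt/data/brick status
gluster volume remove-brick gfs server3:/mnt/data/brick commit

i.e. start the migration, poll status until the rebalance completes, then
commit to drop the brick.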


> -----Original Message-----
> From: Iain Milne
> Sent: 16 April 2014 15:49
> To: Iain Milne
> Subject: Volume add-brick: failed: (with no error message)
>
> Hi folks,
>
> We've had a 2 node gluster array working great for the last year. Each
> brick is a 37TB xfs mount. It's now on CentOS 6.5 (x64) running gluster
> 3.4.3-2.
>
> Volume Name: gfs
> Type: Distribute
> Volume ID: ddbb46bb-821e-44db-bc7e-32f43334f62c
> Status: Started
> Number of Bricks: 2
> Transport-type: tcp
> Bricks:
> Brick1: server1:/mnt/data
> Brick2: server2:/mnt/data
>
>
> We've just bought a new server (identical in every way to the previous
> two) and we're trying to get it added to the volume.
>
> The peering process goes fine:
>
> Number of Peers: 2
>
> Hostname: server2
> Uuid: 02f1a25b-afd8-49e2-8708-95456f6b8473
> State: Peer in Cluster (Connected)
>
> Hostname: server3
> Port: 24007
> Uuid: 3fc9df26-bb49-4c74-8eae-4b3f37389224
> State: Peer in Cluster (Connected)
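>
> For reference, the peering itself was just the standard probe, run from
> one of the existing nodes:
>
> gluster peer probe server3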
>
>
> The only thing of interest (?) there is the addition of the port number
> for the new server. Neither of the old servers shows a port, even when
> running the peer status command on any of the boxes.
>
> The main problem is the addition of the new server/brick:
>
> [root@server1 glusterfs]# gluster volume add-brick gfs server3:/mnt/data
> volume add-brick: failed:
>
>
> There's no error there at all: just a blank after the colon.
>
> The logs on server1 (the one trying to do the add):
>
> W [rpc-transport.c:175:rpc_transport_load] 0-rpc-transport: missing 'option transport-type'. defaulting to "socket"
> I [socket.c:3480:socket_init] 0-glusterfs: SSL support is NOT enabled
> I [socket.c:3495:socket_init] 0-glusterfs: using system polling thread
> I [cli-cmd-volume.c:1336:cli_check_gsync_present] 0-: geo-replication not installed
> I [cli-rpc-ops.c:1695:gf_cli_add_brick_cbk] 0-cli: Received resp to add brick
> I [input.c:36:cli_batch] 0-: Exiting with: -1
>
>
> And the logs on server3 (the one being added):
>
> E [glusterd-op-sm.c:3719:glusterd_op_ac_stage_op] 0-management: Stage
> failed on operation 'Volume Add brick', Status : -1
>
>
> The current storage array is live and in use by users, so it can't be taken
> offline at short notice.
>
> For completeness, here's glusterd on server3 running in debug mode when
> the add-brick command was attempted:
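>
> (To capture this, glusterd on server3 was restarted in the foreground
> with debug logging, i.e. something like:
>
> service glusterd stop
> glusterd --debug
>
> where --debug implies running non-daemonised with the log level set to
> DEBUG.)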
>
> [2014-04-15 15:03:33.133976] D [glusterd-handler.c:549:__glusterd_handle_cluster_lock] 0-management: Received LOCK from uuid: 881743a9-b71e-45a9-8528-cc932837ebb8
> [2014-04-15 15:03:33.134013] D [glusterd-utils.c:4936:glusterd_friend_find_by_uuid] 0-management: Friend found... state: Peer in Cluster
> [2014-04-15 15:03:33.134031] D [glusterd-op-sm.c:5355:glusterd_op_sm_inject_event] 0-management: Enqueue event: 'GD_OP_EVENT_LOCK'
> [2014-04-15 15:03:33.134051] D [glusterd-handler.c:572:__glusterd_handle_cluster_lock] 0-management: Returning 0
> [2014-04-15 15:03:33.134065] D [glusterd-op-sm.c:5432:glusterd_op_sm] 0-management: Dequeued event of type: 'GD_OP_EVENT_LOCK'
> [2014-04-15 15:03:33.134083] D [glusterd-utils.c:340:glusterd_lock] 0-management: Cluster lock held by 881743a9-b71e-45a9-8528-cc932837ebb8
> [2014-04-15 15:03:33.134096] D [glusterd-op-sm.c:2445:glusterd_op_ac_lock] 0-management: Lock Returned 0
> [2014-04-15 15:03:33.134153] D [glusterd-handler.c:1776:glusterd_op_lock_send_resp] 0-management: Responded to lock, ret: 0
> [2014-04-15 15:03:33.134171] D [glusterd-utils.c:5598:glusterd_sm_tr_log_transition_add] 0-management: Transitioning from 'Default' to 'Locked' due to event 'GD_OP_EVENT_LOCK'
> [2014-04-15 15:03:33.134187] D [glusterd-utils.c:5600:glusterd_sm_tr_log_transition_add] 0-management: returning 0
> [2014-04-15 15:03:33.135409] D [glusterd-utils.c:4936:glusterd_friend_find_by_uuid] 0-management: Friend found... state: Peer in Cluster
> [2014-04-15 15:03:33.135452] D [glusterd-handler.c:604:glusterd_req_ctx_create] 0-management: Received op from uuid 881743a9-b71e-45a9-8528-cc932837ebb8
> [2014-04-15 15:03:33.135481] D [glusterd-op-sm.c:5355:glusterd_op_sm_inject_event] 0-management: Enqueue event: 'GD_OP_EVENT_STAGE_OP'
> [2014-04-15 15:03:33.135497] D [glusterd-op-sm.c:5432:glusterd_op_sm] 0-management: Dequeued event of type: 'GD_OP_EVENT_STAGE_OP'
> [2014-04-15 15:03:33.135524] D [glusterd-utils.c:1209:glusterd_volinfo_find] 0-: Volume gfs found
> [2014-04-15 15:03:33.135537] D [glusterd-utils.c:1216:glusterd_volinfo_find] 0-: Returning 0
> [2014-04-15 15:03:33.135554] D [glusterd-utils.c:5223:glusterd_is_rb_started] 0-: is_rb_started:status=0
> [2014-04-15 15:03:33.135600] D [glusterd-utils.c:5232:glusterd_is_rb_paused] 0-: is_rb_paused:status=0
> [2014-04-15 15:03:33.135643] D [glusterd-utils.c:803:glusterd_brickinfo_new] 0-management: Returning 0
> [2014-04-15 15:03:33.135662] D [glusterd-utils.c:865:glusterd_brickinfo_new_from_brick] 0-management: Returning 0
> [2014-04-15 15:03:33.135677] D [glusterd-utils.c:665:glusterd_volinfo_new] 0-management: Returning 0
> [2014-04-15 15:03:33.135698] D [glusterd-utils.c:749:glusterd_volume_brickinfos_delete] 0-management: Returning 0
> [2014-04-15 15:03:33.135713] D [glusterd-utils.c:777:glusterd_volinfo_delete] 0-management: Returning 0
> [2014-04-15 15:03:33.135729] D [glusterd-utils.c:803:glusterd_brickinfo_new] 0-management: Returning 0
> [2014-04-15 15:03:33.135742] D [glusterd-utils.c:865:glusterd_brickinfo_new_from_brick] 0-management: Returning 0
> [2014-04-15 15:03:33.135755] D [glusterd-utils.c:665:glusterd_volinfo_new] 0-management: Returning 0
> [2014-04-15 15:03:33.135771] D [glusterd-utils.c:749:glusterd_volume_brickinfos_delete] 0-management: Returning 0
> [2014-04-15 15:03:33.135784] D [glusterd-utils.c:777:glusterd_volinfo_delete] 0-management: Returning 0
> [2014-04-15 15:03:33.135797] D [glusterd-utils.c:803:glusterd_brickinfo_new] 0-management: Returning 0
> [2014-04-15 15:03:33.135810] D [glusterd-utils.c:865:glusterd_brickinfo_new_from_brick] 0-management: Returning 0
> [2014-04-15 15:03:33.136093] D [glusterd-utils.c:5029:glusterd_friend_find_by_hostname] 0-management: Unable to find friend: server3
> [2014-04-15 15:03:33.136194] D [glusterd-utils.c:290:glusterd_is_local_addr] 0-management: 10.0.0.244
> [2014-04-15 15:03:33.136755] D [glusterd-utils.c:257:glusterd_interface_search] 0-management: 10.0.0.244 is local address at interface em1
> [2014-04-15 15:03:33.136778] D [glusterd-utils.c:5064:glusterd_hostname_to_uuid] 0-management: returning 0
> [2014-04-15 15:03:33.136790] D [glusterd-utils.c:819:glusterd_resolve_brick] 0-management: Returning 0
> [2014-04-15 15:03:33.136818] D [glusterd-utils.c:5215:glusterd_new_brick_validate] 0-management: returning 0
> [2014-04-15 15:03:33.136849] D [glusterd-brick-ops.c:1177:glusterd_op_stage_add_brick] 0-management: Returning -1
> [2014-04-15 15:03:33.136866] D [glusterd-op-sm.c:3975:glusterd_op_stage_validate] 0-management: Returning -1
> [2014-04-15 15:03:33.136878] E [glusterd-op-sm.c:3719:glusterd_op_ac_stage_op] 0-management: Stage failed on operation 'Volume Add brick', Status : -1
> [2014-04-15 15:03:33.136940] D [glusterd-handler.c:1891:glusterd_op_stage_send_resp] 0-management: Responded to stage, ret: 0
> [2014-04-15 15:03:33.136959] D [glusterd-op-sm.c:3728:glusterd_op_ac_stage_op] 0-management: Returning with 0
> [2014-04-15 15:03:33.136975] D [glusterd-utils.c:5598:glusterd_sm_tr_log_transition_add] 0-management: Transitioning from 'Locked' to 'Staged' due to event 'GD_OP_EVENT_STAGE_OP'
> [2014-04-15 15:03:33.136989] D [glusterd-utils.c:5600:glusterd_sm_tr_log_transition_add] 0-management: returning 0
> [2014-04-15 15:03:33.138024] D [glusterd-handler.c:1824:__glusterd_handle_cluster_unlock] 0-management: Received UNLOCK from uuid: 881743a9-b71e-45a9-8528-cc932837ebb8
> [2014-04-15 15:03:33.138063] D [glusterd-utils.c:4936:glusterd_friend_find_by_uuid] 0-management: Friend found... state: Peer in Cluster
> [2014-04-15 15:03:33.138105] D [glusterd-op-sm.c:5355:glusterd_op_sm_inject_event] 0-management: Enqueue event: 'GD_OP_EVENT_UNLOCK'
> [2014-04-15 15:03:33.138123] D [glusterd-op-sm.c:5432:glusterd_op_sm] 0-management: Dequeued event of type: 'GD_OP_EVENT_UNLOCK'
> [2014-04-15 15:03:33.138139] D [glusterd-op-sm.c:2469:glusterd_op_ac_unlock] 0-management: Unlock Returned 0
> [2014-04-15 15:03:33.138192] D [glusterd-handler.c:1795:glusterd_op_unlock_send_resp] 0-management: Responded to unlock, ret: 0
> [2014-04-15 15:03:33.138209] D [glusterd-utils.c:5598:glusterd_sm_tr_log_transition_add] 0-management: Transitioning from 'Staged' to 'Default' due to event 'GD_OP_EVENT_UNLOCK'
> [2014-04-15 15:03:33.138224] D [glusterd-utils.c:5600:glusterd_sm_tr_log_transition_add] 0-management: returning 0




