[Gluster-users] Problem in replicating an existing gluster volume from single brick setup to two brick setup

Atin Mukherjee amukherj at redhat.com
Thu Aug 25 15:55:18 UTC 2016


On Thu, Aug 25, 2016 at 7:00 PM, Jabran Asghar <jabran.asghar at futureit.ch>
wrote:

> Greetings,
>
>
>
> I have a problem replicating an existing gluster volume from a single-brick
> setup to a two-brick setup. The background of the problem is as follows:
>
>
>
> OS: Ubuntu 14.04
>
> Gluster version (from gluster repos): glusterfs 3.7.14 built on Aug  1
> 2016 16:57:28
>
>
>
> 1. I had a replication setup consisting of two Gluster nodes (srv100,
> srv102) and two volumes (gv0, gv100).
>
> 2. I had to completely rebuild the RAID/disks of one of the nodes (srv100)
> due to a hardware failure. I did it by doing the following on the faulty node:
>
> 2.1 Removed the failed brick from the replication setup (reduced the replica
> count from 2 to 1, and detached the node). I executed the following commands
> on the *good* node:
>
>     sudo gluster volume remove-brick gv100 replica 1 srv100:/pool01/gfs/brick1/gv100 force
>
>     sudo gluster volume remove-brick gv0 replica 1 srv100:/pool01/gfs/brick1/gv0 force
>
>     sudo gluster vol info   # make sure the faulty node bricks are not listed, and brick count is 1 for each volume
>
>     sudo gluster peer detach srv100 force
>
>     sudo gluster peer status   # --> OK, only one node/brick
>
>
>
> 2.2 Stopped glusterd, killed all gluster processes
>
> 2.3 Replaced the hard disks and recreated the RAID. This means all
> GlusterFS data directories were lost on the faulty node (srv100), while the
> GlusterFS service installation and config files were untouched (including
> host name and IP address).
>
> 2.4 After rebuilding, I created the volume directories on the rebuilt node.
>
> 2.5 Then I started the gluster service and added the node back to the
> gluster cluster. Peer status is OK (in cluster).
>
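
For reference, the rejoin in 2.5 was presumably done with a plain peer probe
from the surviving node, roughly along these lines (the exact commands are not
shown above; the host names are taken from the thread):

    # on srv102 (the surviving node): re-add the rebuilt node to the trusted pool
    sudo gluster peer probe srv100

    # confirm both peers show up as connected
    sudo gluster peer status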

This step should have synced the existing volumes as well. Could you share
the glusterd log file of srv100 with us so we can check what went wrong at
that point? Does restarting glusterd on srv100 bring back the existing
volumes under /var/lib/glusterd?
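
A quick way to check that on srv100 could look roughly like the following
(the service name below is the usual Debian/Ubuntu one and is an assumption;
it may be plain "glusterd" depending on the packaging):

    # on srv100: restart the management daemon
    sudo service glusterfs-server restart

    # the volume definitions glusterd knows about live here;
    # gv0 and gv100 should reappear if the configuration was synced from the peer
    sudo ls /var/lib/glusterd/vols

    # cross-check with the CLI view
    sudo gluster volume info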


>
> 2.6 Then I attempted to replicate one of the existing volumes (gv0), and
> *there* came the problem. The replication could not be set up properly and
> gave the following error:
>
>     sudo gluster volume add-brick gv0 replica 2 srv100:/pool01/gfs/brick1/gv0
>
>     volume add-brick: failed: Staging failed on srv100. Please check log file for details.
>
>
>
>     The relevant gluster log file says:
>
>
>
> [2016-08-25 12:32:29.499708] I [MSGID: 106499] [glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management: Received status volume req for volume gv-temp
>
> [2016-08-25 12:32:29.501881] E [MSGID: 106301] [glusterd-syncop.c:1274:gd_stage_op_phase] 0-management: Staging of operation 'Volume Status' failed on localhost : Volume gv-temp is not started
>
> [2016-08-25 12:32:29.505033] I [MSGID: 106499] [glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management: Received status volume req for volume gv0
>
> [2016-08-25 12:32:29.508585] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on srv100. Error: Volume gv0 does not exist
>
> [2016-08-25 12:32:29.511062] I [MSGID: 106499] [glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management: Received status volume req for volume gv100
>
> [2016-08-25 12:32:29.514556] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on srv100. Error: Volume gv100 does not exist
>
> [2016-08-25 12:33:15.865773] I [MSGID: 106499] [glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management: Received status volume req for volume gv0
>
> [2016-08-25 12:33:15.869441] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on srv100. Error: Volume gv0 does not exist
>
> [2016-08-25 12:33:15.872630] I [MSGID: 106499] [glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management: Received status volume req for volume gv100
>
> [2016-08-25 12:33:15.876199] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on srv100. Error: Volume gv100 does not exist
>
> [2016-08-25 12:34:14.716735] I [MSGID: 106482] [glusterd-brick-ops.c:442:__glusterd_handle_add_brick] 0-management: Received add brick req
>
> [2016-08-25 12:34:14.716787] I [MSGID: 106062] [glusterd-brick-ops.c:494:__glusterd_handle_add_brick] 0-management: replica-count is 2
>
> [2016-08-25 12:34:14.716809] I [MSGID: 106447] [glusterd-brick-ops.c:240:gd_addbr_validate_replica_count] 0-management: Changing the type of volume gv0 from 'distribute' to 'replica'
>
> [2016-08-25 12:34:14.720133] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on srv100. Please check log file for details.
>
>
>
> 3. I tried to create a new replicated volume (gv-temp) over the nodes, and
> it was created and replicated fine. It is only the existing volumes that I
> cannot replicate again!
>
> 4. I also observed that the /var/lib/glusterd/vols directory on the rebuilt
> node contains a directory for the newly created volume (gv-temp), but none
> for the existing volumes (gv100, gv0).
>
>
>
>
>
> *Questions:*
>
> a. How can I re-replicate the existing volumes, for which I set the replica
> count to 1 (see point 2.1)?
>
> b. Is there a "glusterfs" way to recreate the missing volume directories
> (under /var/lib/glusterd/vols) on the rebuilt node (see point 4)?
>
> c. Any other pointers or hints?
>
>
>
> Thanks.
>
>
>
> Kind regards,
>
> JAsghar
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
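
For questions a. and b., one possible recovery path, sketched under the
assumption that srv102 still has intact configuration and data for gv0 and
gv100 (please double-check before running anything), would be to sync the
volume definitions over to srv100 first and only then re-add the bricks:

    # on srv100: pull the volume definitions from the healthy peer
    # (this should repopulate /var/lib/glusterd/vols on the rebuilt node)
    sudo gluster volume sync srv102 all

    # then re-add the rebuilt bricks and let self-heal copy the data across
    sudo gluster volume add-brick gv0 replica 2 srv100:/pool01/gfs/brick1/gv0
    sudo gluster volume heal gv0 full

    sudo gluster volume add-brick gv100 replica 2 srv100:/pool01/gfs/brick1/gv100
    sudo gluster volume heal gv100 full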



-- 

--Atin