[Gluster-users] How to remove a dead node and re-balance volume?

Joe Julian joe at julianfamily.org
Sun Sep 8 19:34:06 UTC 2013


On 09/05/2013 02:16 AM, Anup Nair wrote:
> On Thu, Sep 5, 2013 at 12:41 AM, Vijay Bellur <vbellur at redhat.com 
> <mailto:vbellur at redhat.com>> wrote:
>
>     On 09/03/2013 01:18 PM, Anup Nair wrote:
>
>         Glusterfs version 3.2.2
>
>         I have a Gluster volume in which one out of the 4 peers/nodes had
>         crashed some time ago, prior to my joining service here.
>
>         I see from volume info that the crashed (non-existing) node is
>         still
>         listed as one of the peers and the bricks are also listed. I
>         would like
>         to detach this node and its bricks and rebalance the volume with
>         the remaining 3 peers. But I am unable to do so. Here are my steps:
>
>         1. #gluster peer status
>            Number of Peers: 3 -- (note: excluding the one I run this
>         command from)
>
>            Hostname: dbstore4r294 --- (note: node/peer that is down)
>            Uuid: 8bf13458-1222-452c-81d3-565a563d768a
>            State: Peer in Cluster (Disconnected)
>
>            Hostname: 172.16.1.90
>            Uuid: 77ebd7e4-7960-4442-a4a4-00c5b99a61b4
>            State: Peer in Cluster (Connected)
>
>            Hostname: dbstore3r294
>            Uuid: 23d7a18c-fe57-47a0-afbc-1e1a5305c0eb
>            State: Peer in Cluster (Connected)
>
>         2. #gluster peer detach dbstore4r294
>            Brick(s) with the peer dbstore4r294 exist in cluster
>
>         3. #gluster volume info
>
>            Volume Name: test-volume
>            Type: Distributed-Replicate
>            Status: Started
>            Number of Bricks: 4 x 2 = 8
>            Transport-type: tcp
>            Bricks:
>            Brick1: dbstore1r293:/datastore1
>            Brick2: dbstore2r293:/datastore1
>            Brick3: dbstore3r294:/datastore1
>            Brick4: dbstore4r294:/datastore1
>            Brick5: dbstore1r293:/datastore2
>            Brick6: dbstore2r293:/datastore2
>            Brick7: dbstore3r294:/datastore2
>            Brick8: dbstore4r294:/datastore2
>            Options Reconfigured:
>            network.ping-timeout: 42s
>            performance.cache-size: 64MB
>            performance.write-behind-window-size: 3MB
>            performance.io-thread-count: 8
>            performance.cache-refresh-timeout: 2
>
>         Note that the non-existent node/peer is dbstore4r294 (its
>         bricks are /datastore1 and /datastore2, i.e. Brick4 and Brick8).
>
>         4. #gluster volume remove-brick test-volume
>         dbstore4r294:/datastore1
>            Removing brick(s) can result in data loss. Do you want to
>         Continue?
>         (y/n) y
>            Remove brick incorrect brick count of 1 for replica 2
>
>         5. #gluster volume remove-brick test-volume
>         dbstore4r294:/datastore1
>         dbstore4r294:/datastore2
>            Removing brick(s) can result in data loss. Do you want to
>         Continue?
>         (y/n) y
>            Bricks not from same subvol for replica
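
A note on why steps 2, 4 and 5 fail: "peer detach" refuses to remove a 
peer that still has bricks in a volume, and on a distributed-replicate 
volume "remove-brick" only accepts whole replica subvolumes. Going by 
the brick order in your volume info, dbstore4r294:/datastore1 is paired 
with dbstore3r294:/datastore1, and dbstore4r294:/datastore2 with 
dbstore3r294:/datastore2, so the only form the CLI would accept is a 
complete pair, something like:

#gluster volume remove-brick test-volume dbstore3r294:/datastore1 dbstore4r294:/datastore1

That is only meant to illustrate the syntax, though. It would also drop 
the healthy copy on dbstore3r294, and 3.2 does not migrate data off 
removed bricks first, so it is not what you want here.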
>
>         How do I remove the peer? What are the steps considering that
>         the node
>         is non-existent?
>
>
>
>     Do you plan to replace the dead server with a new server? If so,
>     this could be a possible sequence of steps:
>
>
> No. We are not going to replace it. So, I need to resize it to a 
> 3-node cluster.
>
> I discovered the issue when one of the nodes hung and I had to reboot 
> it. I expected the Gluster volume to remain available with one node 
> down, but the volume was non-responsive.
> Surprised at that, I checked the details and found it had been running 
> with one node missing for many months now, perhaps a year!
>
> I have no node to replace it with. So, I am looking for a method by 
> which I can resize it.
>
The problem is that you want to run a replica 2 volume on an odd number 
of servers. This can be done, but it requires that you think of bricks 
individually rather than tying sets of bricks to servers. Your goal is 
simply to have each pair of replica bricks on two unique servers.

See 
http://joejulian.name/blog/how-to-expand-glusterfs-replicated-clusters-by-one-server/ 
for an example.
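
Applied to your volume, one approach (a sketch only; I have not 
re-checked the exact replace-brick behaviour on 3.2.2, and the 
/datastore3 paths are just placeholders for new, empty brick 
directories you would create) is to move the dead node's two bricks 
onto the surviving servers so that every replica pair still spans two 
different hosts, and only then detach the peer:

#gluster volume replace-brick test-volume dbstore4r294:/datastore1 dbstore1r293:/datastore3 commit force
#gluster volume replace-brick test-volume dbstore4r294:/datastore2 dbstore2r293:/datastore3 commit force
#gluster peer detach dbstore4r294

Then trigger self-heal from a client mount (a recursive find/stat over 
the volume) so the new bricks are populated from their surviving 
replica partners. That keeps the volume at 4 x 2 = 8 bricks on three 
servers, with each replica pair on two different machines as described 
above.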