[Gluster-users] Issues removing then adding a brick to a replica volume (Gluster 3.7.6)
Krutika Dhananjay
kdhananj at redhat.com
Mon Jan 18 12:24:47 UTC 2016
----- Original Message -----
> From: "Lindsay Mathieson" <lindsay.mathieson at gmail.com>
> To: "gluster-users" <gluster-users at gluster.org>
> Sent: Monday, January 18, 2016 11:19:22 AM
> Subject: [Gluster-users] Issues removing then adding a brick to a replica
> volume (Gluster 3.7.6)
> Been running through my eternal testing regime ... and experimenting with
> removing/adding bricks - to me, a necessary part of volume maintenance for
> dealing with failed disks. The datastore is a VM host and all the following
> is done live. Sharding is active with a 512MB shard size.
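[For reference: sharding at that block size is normally switched on with the stock shard options; a minimal sketch, using the volume name from this thread:

    gluster volume set datastore1 features.shard on
    gluster volume set datastore1 features.shard-block-size 512MB
]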
> So I started off with a replica 3 volume:
> > // recreated from memory
> > Volume Name: datastore1
> > Type: Replicate
> > Volume ID: bf882533-f1a9-40bf-a13e-d26d934bfa8b
> > Status: Started
> > Number of Bricks: 1 x 3 = 3
> > Transport-type: tcp
> > Bricks:
> > Brick1: vnb.proxmox.softlog:/vmdata/datastore1
> > Brick2: vng.proxmox.softlog:/vmdata/datastore1
> > Brick3: vna.proxmox.softlog:/vmdata/datastore1
> I remove a brick with:
> gluster volume remove-brick datastore1 replica 2
> vng.proxmox.softlog:/vmdata/datastore1 force
> so we end up with:
> > Volume Name: datastore1
> > Type: Replicate
> > Volume ID: bf882533-f1a9-40bf-a13e-d26d934bfa8b
> > Status: Started
> > Number of Bricks: 1 x 2 = 2
> > Transport-type: tcp
> > Bricks:
> > Brick1: vna.proxmox.softlog:/vmdata/datastore1
> > Brick2: vnb.proxmox.softlog:/vmdata/datastore1
> All well and good. No heal issues, VMs running OK.
> Then I clean the brick off the vng host:
> rm -rf /vmdata/datastore1
> I then add the brick back with:
> > gluster volume add-brick datastore1 replica 3
> > vng.proxmox.softlog:/vmdata/datastore1
>
> > Volume Name: datastore1
> > Type: Replicate
> > Volume ID: bf882533-f1a9-40bf-a13e-d26d934bfa8b
> > Status: Started
> > Number of Bricks: 1 x 3 = 3
> > Transport-type: tcp
> > Bricks:
> > Brick1: vna.proxmox.softlog:/vmdata/datastore1
> > Brick2: vnb.proxmox.softlog:/vmdata/datastore1
> > Brick3: vng.proxmox.softlog:/vmdata/datastore1
> This recreates the brick directory "datastore1". Unfortunately this is where
> things start to go wrong :( Heal info:
> > gluster volume heal datastore1 info
> >
> > Brick vna.proxmox.softlog:/vmdata/datastore1
> > /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.57
> > /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.5
> > Number of entries: 2
> >
> > Brick vnb.proxmox.softlog:/vmdata/datastore1
> > /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.5
> > /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.57
> > Number of entries: 2
> >
> > Brick vng.proxmox.softlog:/vmdata/datastore1
> > /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.1
> > /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.6
> > /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.15
> > /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.18
> > /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.5
> It's my understanding that there shouldn't be any heal entries on vng, as
> that is where all the shards should be sent *to*.
Lindsay,
Heal _is_ necessary when you add a brick that changes the replica count from n to (n+1). The newly added brick, though now part of the existing replica set, is lagging behind the existing bricks
and needs to be brought in sync with them. In your case, all files and directories on vna and/or vnb will be healed to vng.
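To kick off that sync and watch it drain, the usual heal commands should be enough, e.g.:

    gluster volume heal datastore1 full   # queue a full sweep onto the new brick
    gluster volume heal datastore1 info   # entry count should shrink as vng catches up

(subject to the full-heal launch bug you ran into, which I address below).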
> Also, running qemu-img check on the hosted VM images results in an I/O
> error. Eventually the VMs themselves crash - I suspect this is due to
> individual shards being unreadable.
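[For reference, a typical consistency check against an image on the FUSE mount looks like the following; the path here is hypothetical:

    qemu-img check /mnt/datastore1/images/101/vm-101-disk-1.qcow2
]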
> Another odd behaviour: if I run a full heal on vnb, I get the following
> error:
> > Launching heal operation to perform full self heal on volume datastore1 has
> > been unsuccessful
> However, if I run it on vna, it succeeds.
Yes, there is a bug report for this @ https://bugzilla.redhat.com/show_bug.cgi?id=1112158 .
The workaround, as you yourself figured, is to run the command on the node with the highest UUID.
Steps (a consolidated sketch follows the list):
1) Collect the output of `grep UUID /var/lib/glusterd/glusterd.info` from each of the nodes, perhaps into a file named 'uuid.txt'.
2) Sort it: `sort uuid.txt`
3) Pick the last UUID in the sorted output.
4) Find out which node's glusterd.info file has that UUID.
5) Run the full heal ('gluster volume heal <volname> full') on that same node.
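A rough consolidation of those steps, assuming password-less SSH to each node (the hostnames are the ones from your setup; adjust to taste):

    for h in vna.proxmox.softlog vnb.proxmox.softlog vng.proxmox.softlog; do
        printf '%s %s\n' "$(ssh "$h" grep UUID /var/lib/glusterd/glusterd.info)" "$h"
    done | sort
    # The last line printed names the node with the highest UUID;
    # on that node run: gluster volume heal datastore1 full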
Let me know if this works for you.
-Krutika
> Lastly - if I remove the brick, everything returns to normal immediately.
> Heal info shows no issues and qemu-img check returns no errors.
> --
> Lindsay Mathieson
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users