[Gluster-users] add/replace brick corrupting data

Sat May 14 14:45:28 UTC 2016

I am testing brick replacement in a replica 3 test volume running Gluster
3.7.11. The volume hosts two VMs and spans three nodes: vna, vnb and vng.

*First off I tried removing/adding a brick.*

     gluster v remove-brick test1 replica 2 vng.proxmox.softlog:/tank/vmdata/test1 force

That worked fine; the VMs (running on another node) kept going without a hiccup.

I deleted /tank/vmdata/test1, then

     gluster v add-brick test1 replica 3 vng.proxmox.softlog:/tank/vmdata/test1 force

This succeeded, and heal statistics immediately showed 3000+ shards being
healed on vna and vnb.

Unfortunately, it also showed hundreds of shards being healed on vng, which
should not happen since that brick had no data on it. Basically a reverse heal.

Eventually all the heals completed, but the VMs were hopelessly corrupted.

*Then I retried the above, but with all VMs shut down*,
i.e. no reads or writes happening on the volume.

This worked: all the shards on vna and vnb healed, with nothing healing in
reverse. Once the heal completed, the data (VMs) was fine.

Unfortunately this isn't practical in production: we can't bring all the
VMs down for the 1-2 days the heal would take.

*Replacing the brick*

I killed the glusterfsd process on vng, then ran:
     gluster v replace-brick test1 vng.proxmox.softlog:/tank/vmdata/test1 vng.proxmox.softlog:/tank/vmdata/test1.1 commit force

*The shards on vna and vnb started healing, but vng showed 5 reverse heals
happening. Eventually it got down to 4-5 shards needing healing on each brick
and stopped. They didn't go away until I removed the test1.1 brick.*

*Currently the replace-brick process seems to be unusable except when
the volume is not in use.*

-- 
Lindsay Mathieson
