[Gluster-users] Remove a brick, rebuild it, put it back in

Fri Oct 7 03:25:45 UTC 2016

I've simulated the problem on 4 VMs in a distributed replicated setup with
a replica factor of 2. In each test I tore down a VM and brought it back up
from a snapshot.

What has worked so far is this:

   1. Make a copy of /var/lib/glusterd from the affected machine, save it
   elsewhere.
   2. Configure your new machine (in my case I reverted to a VM snapshot).
   Assign the same ip and hostname!
   3. Install gluster.
   4. Stop the daemons if they are running.
   5. Nuke the /var/lib/glusterd directory and replace it with the saved
   copy in step 1.
   6. Create the brick directory.
   7. Get the volume-id extended attribute from a healthy node like so:
      getfattr -e base64 -n trusted.glusterfs.volume-id /data/brick_dir
   8. Apply the volume-id extended attribute to the new brick like so:
      setfattr -n trusted.glusterfs.volume-id -v 'the_value_you_got_in_7==' /data/brick_dir
   9. Start the daemons.
   10. FUSE mount the gluster partition through the daemons running
   locally. So the /etc/fstab would contain something like:
   localhost:/gluster_volume /mnt/gluster  glusterfs _netdev,defaults  0 0
   11. On the healthy partner machine with another fuse mount point to the
   same volume do something like: find /mnt/fuse | xargs stat.
   12. Step 11 will make the files appear under the mount point on the new
   box, but they are not going to be physically in the brick directory yet.
   See step 13.
   13. Run the heal command from the same host where you ran find. That
   will finally sync the files to the brick. Run the heal info command
   periodically and the number of files being healed should eventually go down
   to 0.
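
Steps 6 through 8 can be sketched as a few shell functions. This is only a
rough sketch: the function names are made up for illustration, and
/data/brick_dir is the example path from the steps above. It assumes getfattr
prints the value as "trusted.glusterfs.volume-id=0s<base64>", where the 0s
prefix marks base64 and is understood by setfattr as well.

```shell
#!/bin/sh
# Sketch only: copy the volume-id xattr from a healthy brick to a rebuilt one.
# Function names are hypothetical; the brick path is passed as an argument.

# Keep only the encoded value from getfattr output read on stdin.
extract_volume_id() {
    sed -n 's/^trusted\.glusterfs\.volume-id=//p'
}

# Run on the healthy node: print the encoded volume id of brick "$1".
read_volume_id() {
    getfattr --absolute-names -e base64 \
        -n trusted.glusterfs.volume-id "$1" | extract_volume_id
}

# Run on the rebuilt node while the daemons are stopped:
# create brick "$1" if needed and stamp it with the value "$2".
apply_volume_id() {
    mkdir -p "$1" &&
        setfattr -n trusted.glusterfs.volume-id -v "$2" "$1"
}
```

On the healthy node, something like "read_volume_id /data/brick_dir" prints
the value. On the rebuilt node, pass that string unchanged (0s prefix
included) as the second argument to apply_volume_id.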

That's my experience with the VMs today.
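
For the last step, the "run heal info periodically" part can be scripted.
Again just a sketch under assumptions: the function names are hypothetical,
the 30 second interval is arbitrary, and it assumes heal info prints one
"Number of entries: N" line per brick. The volume name gluster_volume is the
one from the fstab example above.

```shell
#!/bin/sh
# Sketch only: trigger a heal and poll until no entries are pending.

# Sum the "Number of entries: N" lines from heal info output on stdin.
# Non-numeric counts (for example "-") are counted as 0 by awk.
count_pending_heals() {
    awk -F': *' '/^Number of entries:/ { total += $2 }
                 END { print total + 0 }'
}

# Start a heal on volume "$1" and poll heal info until the total is 0.
wait_for_heal() {
    vol=$1
    gluster volume heal "$vol" || return 1
    while :; do
        pending=$(gluster volume heal "$vol" info | count_pending_heals)
        echo "pending heal entries: $pending"
        [ "$pending" -eq 0 ] && break
        sleep 30
    done
}
```

Run it as "wait_for_heal gluster_volume" on the host where you ran find.
A brick that is down also contributes 0 to the sum here, so check the heal
info output by hand at least once before trusting the total.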

On Wed, Oct 5, 2016 at 4:46 PM, Joe Julian <joe at julianfamily.org> wrote:

> What I always do is just shut it down, repair (or replace) the brick, then
> start it up again with "... start $volname force".
>
> On October 5, 2016 11:27:36 PM GMT+02:00, Sergei Gerasenko <
> sgerasenko74 at gmail.com> wrote:
>>
>> Hi, sorry if this has been asked before but the documentation is a bit
>> conflicting in various sources on what to do exactly.
>>
>> I have a 6-node, distributed replicated cluster with a replica factor of
>> 2. So it's 3 pairs of servers. I need to remove a server from one of those
>> replica sets, rebuild it and put it back in.
>>
>> What's the tried and proven sequence of steps for this? Any pointers
>> would be very useful.
>>
>> Thanks!
>>   Sergei
>>
>> ------------------------------
>>
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
>>
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.