[Gluster-users] Remove a brick, rebuild it, put it back in

Mon Oct 10 21:31:06 UTC 2016

To answer my own question: I now realize why the guide creates a directory
rather than a file. Directories appear to be created on every replica pair
regardless of the hash, so if a host is down, the change is marked as
pending for that host. Files are different: there the name of the file *is*
important, and the file is placed on only one pair.
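
A minimal sketch of how this could be checked on a test volume. It assumes the
FUSE mount is /mnt/r2 as in the guide and the brick is /data/brick_dir as in
my earlier steps; the marker directory name and the DRY_RUN switch are made up
for illustration. The trusted.afr.* attributes are where the replicate
translator keeps its pending counters, and reading them needs root.

```shell
#!/bin/sh
# Sketch only: mark pending changes for a down replica partner, then
# inspect the AFR changelog xattrs on the healthy brick.
# The mount, brick path and marker name are assumptions; adjust to your setup.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

mark_pending() {
    mount=$1                        # FUSE mount of the volume, e.g. /mnt/r2
    brick=$2                        # brick dir on the healthy host
    marker="$mount/heal-marker.$$"  # hypothetical name, must not exist yet

    # A directory is created on every replica pair, whatever its hash.
    run mkdir "$marker" || return 1
    run rmdir "$marker" || return 1

    # Metadata change on the volume root, same as in the guide.
    run setfattr -n trusted.non-existent-key -v abc "$mount" || return 1
    run setfattr -x trusted.non-existent-key "$mount" || return 1

    # Dump the trusted.afr.* pending counters stored on the brick root.
    run getfattr -d -m trusted.afr -e hex "$brick"
}
```

Calling it as `DRY_RUN=1; mark_pending /mnt/r2 /data/brick_dir` only prints the
five commands. Run for real on the healthy partner, non-zero trusted.afr.*
values on the brick root would suggest that changes are pending for the
partner host.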

On Mon, Oct 10, 2016 at 3:47 PM, Sergei Gerasenko <sgerasenko74 at gmail.com>
wrote:

> The guide here:
> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick
> suggests running the following while the partner host is down:
>
> mkdir /mnt/r2/<name-of-nonexistent-dir>
> rmdir /mnt/r2/<name-of-nonexistent-dir>
> setfattr -n trusted.non-existent-key -v abc /mnt/r2
> setfattr -x trusted.non-existent-key /mnt/r2
>
> That should set an extended attribute on the healthy replica partner
> indicating that there are pending changes for the partner host.
>
> Since this is a distributed, replicated setup, I don't quite understand
> this: the new directory can be created on any pair, not necessarily the
> one we're fixing. I think the name of the directory should be chosen such
> that its DHT hash lands it on the affected brick (the healthy one of the
> two replica hosts). That's not easy to do.
>
> Does somebody have any suggestions?
>
>
> On Thu, Oct 6, 2016 at 10:47 PM, Sergei Gerasenko <sgerasenko74 at gmail.com>
> wrote:
>
>> Step 10 isn't really necessary. The changes should probably be monitored
>> under the brick directory.
>>
>> On Thu, Oct 6, 2016 at 10:25 PM, Sergei Gerasenko <sgerasenko74 at gmail.com>
>> wrote:
>>
>>> I've simulated the problem on 4 VMs in a distributed replicated setup
>>> with a replica factor of 2. In each test I tore down a VM and brought
>>> it back up from a snapshot.
>>>
>>> What has worked so far is this:
>>>
>>>
>>>    1. Make a copy of /var/lib/glusterd from the affected machine, save
>>>    it elsewhere.
>>>    2. Configure your new machine (in my case I reverted to a VM
>>>    snapshot). Assign the same ip and hostname!
>>>    3. Install gluster.
>>>    4. Stop the daemons if they are running.
>>>    5. Nuke the /var/lib/glusterd directory and replace it with the
>>>    saved copy in step 1.
>>>    6. Create the brick directory.
>>>    7. Get the volume-id extended attribute from the brick directory on a
>>>    healthy node like so:
>>>    getfattr -e base64 -n trusted.glusterfs.volume-id /data/brick_dir
>>>    8. Apply the volume-id extended attribute to the new brick directory
>>>    like so:
>>>    setfattr -n trusted.glusterfs.volume-id -v 'the_value_you_got_in_7==' /data/brick_dir
>>>    9. Start the daemons.
>>>    10. FUSE mount the gluster volume through the daemons running
>>>    locally. So the /etc/fstab would contain something like:
>>>    localhost:/gluster_volume /mnt/gluster glusterfs _netdev,defaults 0 0
>>>    11. On the healthy partner machine, which has its own FUSE mount of
>>>    the same volume, do something like: find /mnt/fuse | xargs stat.
>>>    12. Step 11 will make files appear under the mount point on the new
>>>    box, but the files are not going to be physically in the brick
>>>    directory yet. See 13.
>>>    13. Run the heal command from the same host where you ran find. That
>>>    will finally sync the files to the brick. Run the heal info command
>>>    periodically and the number of files being healed should eventually go down
>>>    to 0.
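
Steps 7 and 8 can be sketched as one small script run as root on the rebuilt
box. This is an illustration under assumptions, not a tested procedure: the
healthy host name passed in is hypothetical, root ssh access to it is assumed,
and the brick path is assumed to be the same on both hosts.

```shell
#!/bin/sh
# Sketch of steps 7 and 8: copy trusted.glusterfs.volume-id from the brick
# on a healthy node to the freshly created brick directory on this box.

ATTR=trusted.glusterfs.volume-id

# Pull the value out of getfattr output, which looks like this
# (the value shown here is made up):
#   # file: data/brick_dir
#   trusted.glusterfs.volume-id=0sq83vEjRWeJCrze8SNFZ4kA==
parse_volume_id() {
    sed -n "s/^${ATTR}=//p"
}

copy_volume_id() {
    healthy_host=$1   # hypothetical host name of a healthy node
    brick=$2          # brick directory, e.g. /data/brick_dir

    # Step 7: read the attribute remotely, base64-encoded.
    value=$(ssh "root@$healthy_host" \
        getfattr -e base64 -n "$ATTR" "$brick" 2>/dev/null | parse_volume_id)
    [ -n "$value" ] || { echo "could not read $ATTR" >&2; return 1; }

    # Step 8: the leading 0s tells setfattr the value is base64.
    setfattr -n "$ATTR" -v "$value" "$brick"
}
```

It would be called as `copy_volume_id <healthy-host> /data/brick_dir` after
step 6 and before the daemons are started in step 9.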
>>>
>>> That's my experience with the VMs today.
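
Steps 11 and 13 could be wrapped up roughly as follows, to be run on the
healthy partner. The volume name and mount point are placeholders taken from
the examples above, the polling interval is arbitrary, and the parsing assumes
that heal info prints one "Number of entries: N" line per brick. The find
call uses -print0 with xargs -0 (GNU/BSD extensions) so that unusual file
names survive.

```shell
#!/bin/sh
# Sketch of steps 11 and 13: trigger lookups through the FUSE mount, start
# a heal, then poll heal info until nothing is left to heal.

# Sum the "Number of entries: N" lines that heal info prints per brick.
pending_entries() {
    awk -F': ' '/^Number of entries:/ { n += $2 } END { print n + 0 }'
}

heal_and_wait() {
    volname=$1          # e.g. gluster_volume
    mount=$2            # FUSE mount on the healthy host, e.g. /mnt/fuse
    interval=${3:-30}   # seconds between checks (arbitrary default)

    # Step 11: stat every entry through the mount.
    find "$mount" -print0 | xargs -0 stat >/dev/null

    # Step 13: start the heal and watch the pending count go down.
    gluster volume heal "$volname"
    while :; do
        left=$(gluster volume heal "$volname" info | pending_entries)
        echo "entries still to heal: $left"
        [ "$left" -eq 0 ] && break
        sleep "$interval"
    done
}
```

For example: `heal_and_wait gluster_volume /mnt/fuse 60`.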
>>>
>>> On Wed, Oct 5, 2016 at 4:46 PM, Joe Julian <joe at julianfamily.org> wrote:
>>>
>>>> What I always do is just shut it down, repair (or replace) the brick,
>>>> then start it up again with "... start $volname force".
>>>>
>>>> On October 5, 2016 11:27:36 PM GMT+02:00, Sergei Gerasenko <
>>>> sgerasenko74 at gmail.com> wrote:
>>>>>
>>>>> Hi, sorry if this has been asked before, but the various sources of
>>>>> documentation conflict somewhat on what exactly to do.
>>>>>
>>>>> I have a 6-node, distributed replicated cluster with a replica factor
>>>>> of 2. So it's 3 pairs of servers. I need to remove a server from one of
>>>>> those replica sets, rebuild it and put it back in.
>>>>>
>>>>> What's the tried and proven sequence of steps for this? Any pointers
>>>>> would be very useful.
>>>>>
>>>>> Thanks!
>>>>>   Sergei
>>>>>
>>>>> ------------------------------
>>>>>
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>>