[Gluster-users] Clarification on common tasks

Anuradha Talur atalur at redhat.com
Thu Aug 11 14:13:15 UTC 2016



----- Original Message -----
> From: "Anuradha Talur" <atalur at redhat.com>
> To: "Gandalf Corvotempesta" <gandalf.corvotempesta at gmail.com>
> Cc: "gluster-users" <Gluster-users at gluster.org>
> Sent: Thursday, August 11, 2016 5:47:12 PM
> Subject: Re: [Gluster-users] Clarification on common tasks
> 
> 
> 
> ----- Original Message -----
> > From: "Gandalf Corvotempesta" <gandalf.corvotempesta at gmail.com>
> > To: "gluster-users" <Gluster-users at gluster.org>
> > Sent: Thursday, August 11, 2016 2:43:34 PM
> > Subject: [Gluster-users] Clarification on common tasks
> > 
> > I would like to make some clarification on common tasks needed by
> > gluster administrators.
> > 
> > A) Let's assume a disk/brick has failed (or is going to fail) and I
> > would like to replace it.
> > What is the proper way to do so with no data loss or downtime?
> > 
> > Looking at the mailing list, the procedure seems to be the following:
> > 
> > 1) kill the brick process (how can I tell which brick process to
> > kill?). I have the following on a test cluster (with just one
> > brick):
> > # ps ax -o command | grep gluster
> > /usr/sbin/glusterfsd -s 1.2.3.112 --volfile-id
> > gv0.1.2.3.112.export-sdb1-brick -p
> > /var/lib/glusterd/vols/gv0/run/1.2.3.112-export-sdb1-brick.pid -S
> > /var/run/gluster/27555a68c738d9841879991c725e92e0.socket --brick-name
> > /export/sdb1/brick -l /var/log/glusterfs/bricks/export-sdb1-brick.log
> > --xlator-option
> > *-posix.glusterd-uuid=c97606ac-f6b7-4fdc-a401-6c2d04dd73a8
> > --brick-port 49152 --xlator-option gv0-server.listen-port=49152
> > /usr/sbin/glusterd -p /var/run/glusterd.pid
> > /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
> > /var/lib/glusterd/glustershd/run/glustershd.pid -l
> > /var/log/glusterfs/glustershd.log -S
> > /var/run/gluster/5f3713389b19487b6c7d6efca6102987.socket
> > --xlator-option
> > *replicate*.node-uuid=c97606ac-f6b7-4fdc-a401-6c2d04dd73a8
> > 
> > Which of these is the "brick process"?
> > 
> As clarified by Lindsay, you can find the correct brick process to kill
> by mapping the output of gluster v status to the brick that has failed.
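> 
> For example, on the node with the failed brick (just a sketch; "gv0" is
> the volume from your ps output, the brick path is illustrative):
> 
>   gluster volume status gv0
>   # note the PID listed for the failed brick, e.g. 1.2.3.112:/export/sdb1/brick
>   kill <pid-of-that-brick>
> 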
> > 2) unmount the brick, for example:
> > umount /dev/sdc
> > 
> > 3) remove the failed disk
> > 
> > 4) insert the new disk
> > 5) create an XFS filesystem on the new disk
> > 6) mount the new disk where the previous one was
> > 7) add the new brick to the gluster volume. How?
> > 8) run "gluster v start force".
> 
> If this is a replicate volume, then these steps alone are not enough.
> 
> If you are okay with the new brick being mounted at a different path
> than the previous one:
> 
> After you mount the new brick, you will have to run:
> gluster v replace-brick <volname> <old-brick> <new-brick> commit force
> 
> By doing this you add the new brick to the gluster cluster
> and also let the replicate translator know that
> the brick has been replaced and needs to be healed.
> 
> Once this is done, self-heal-daemon will start the healing process
> automatically.
> 
> If this step is done, you won't have to run step 8 (gluster v start
> force), as the replace-brick command takes care of bringing the new
> brick up.
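> 
> For example (a minimal sketch; "gv0" and the brick paths below are only
> illustrative):
> 
>   gluster volume replace-brick gv0 \
>       server1:/export/sdb1/brick server1:/export/sdc1/brick \
>       commit force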
> 
> In case you want to mount the new brick at the same path as the previous one,
> then after step 6, I'd suggest you:
> a) Create a dummy (previously non-existent) directory under '/' of the volume's mount point.
> b) Create a dummy (previously non-existent) xattr on '/' of the volume's mount point.
> The above steps again let the replicate translator know
> that some healing has to be done on the brick that is down. The replace-brick
> command would do this for you, but as it doesn't support the same path for old
> and new bricks, this is a work-around. (Support for replacing bricks with the
> same path will be provided in upcoming releases; it is being worked on.)
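> 
> For example, from a client that has the volume mounted (the mount path
> and names below are only illustrative):
> 
>   mkdir /mnt/gv0/dummy-heal-dir
>   setfattr -n user.dummy-heal-xattr -v "1" /mnt/gv0/
> 
> The dummy directory and xattr only serve to mark '/' as needing heal on
> the brick that is down; they can be removed afterwards.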
> 
Sorry, there was a mistake in this mail.
As I said, replace-brick can't be used when the old and new paths are the same,
and yet I mistakenly suggested replace-brick again after all the steps!

There was a document that I'm not able to locate right now.
The first step after mounting the brick was to set the volume ID using:
setfattr -n trusted.glusterfs.volume-id -v <volume-id> <brickpath>
I think there were more steps; I will update once I find the doc.
Once all the required xattrs are set, gluster v start force was supposed to be run.
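
For example, to set the volume ID (a sketch; the brick paths below are
only illustrative):

  # read the volume ID off a surviving brick of the same volume
  getfattr -n trusted.glusterfs.volume-id -e hex /export/sdb1/brick
  # set the same value on the newly mounted brick
  setfattr -n trusted.glusterfs.volume-id -v 0x<volume-id-in-hex> /export/sdc1/brick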

start force needs to be done here because the volume is already in the started
state, but the management daemon, glusterd, is not aware that the failed brick
has been fixed with a new disk. start force is a way of letting glusterd know
that there is a brick that is down and needs to be started. This is done without
affecting the bricks that are already up.
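
For example, with the volume name from the ps output above:

  gluster volume start gv0 force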

> Once this is done, run the replace-brick command mentioned above.
> This should add some volume uuids to the brick, start the brick, and then
> trigger a heal to the new brick.
> >
> > Why do I need step 8? If the volume is already started and
> > working (remember that I would like to change the disk with no downtime,
> > thus I can't stop the volume), why should I "start" it again?
> > 
> > 
> > 
> > 
> > B) Let's assume I would like to add a bunch of new bricks on existing
> > servers. What is the proper procedure to do so?
> 
> Do you mean increasing the capacity of the volume by adding new bricks?
> You can use: gluster v add-brick <volname> <new-brick(s)>
> 
> The options provided to add-brick are going to vary based on how you plan to
> add these bricks (whether you want to increase replica-count or add a new
> replica set etc).
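> 
> For example, assuming a simple replica 2 volume with one replica pair
> (host and brick names below are only illustrative):
> 
>   # add another replica pair, growing capacity (distributes over more bricks)
>   gluster volume add-brick gv0 server3:/export/sdb1/brick server4:/export/sdb1/brick
> 
>   # or increase the replica count of the existing set from 2 to 3
>   gluster volume add-brick gv0 replica 3 server3:/export/sdb1/brick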
> > 
> > 
> > Ceph has a good documentation page where some common tasks are explained:
> > http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/
> > I've not found anything similar in gluster.
> 
> I found this for GlusterFS:
> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-brick
> 
> Hope this helps.
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-users
> > 
> 
> --
> Thanks,
> Anuradha.
> 

-- 
Thanks,
Anuradha.

