[Gluster-users] Replacing failed node (2node replication)

Mario Splivalo mario at splivalo.hr
Sun May 1 12:03:34 UTC 2016


I have set up glusterfs on two nodes, with a volume replicated across two
bricks (one on each server):

root@glu-tru:~# gluster volume info

Volume Name: gv01
Type: Replicate
Volume ID: 28dcf1d4-8a6c-4b70-a075-9e1dc4215271
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Brick1: glu-tru:/srv/gfs-bucket
Brick2: glu-pre:/srv/gfs-bucket
root@glu-tru:~#

Now, I lost one server (glu-pre) and replaced it with a fresh one. The
new server has the same hostname and IP address as the old one. This is
how I re-added the fresh server:

glu-tru# gluster volume remove-brick gv01 replica 1 glu-pre:/srv/gfs-bucket
glu-tru# gluster peer detach glu-pre

Then I installed glusterfs on glu-pre (a freshly installed box), created
the brick directory, and then ran:

glu-tru# gluster peer probe glu-pre

glu-pre# gluster peer probe glu-tru

glu-pre# gluster volume add-brick gv01 replica 2 glu-pre:/srv/gfs-bucket

glu-tru# gluster volume heal gv01 full

After the last command, /srv/gfs-bucket started to get populated on
the newly added server (glu-pre).
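
Two checks could bracket the steps above: confirming that glu-pre shows
up as a connected peer before the add-brick, and counting how many
entries are still waiting to be healed after "heal gv01 full". The
following is only a rough sketch. The helper names peer_connected and
heal_pending are made up for illustration, and the sketch assumes that
"gluster peer status" prints "Hostname:" and "State:" lines and that
"gluster volume heal gv01 info" prints one "Number of entries: N" line
per brick. The exact output may differ between versions, so treat the
parsing as an assumption to verify on your own install.

```shell
#!/bin/sh
# Sketch only: both helpers read command output on stdin, so they can be
# tried on saved output without a running cluster. The line formats they
# look for are assumptions about the gluster CLI output.

# peer_connected HOST
# Succeeds if the `gluster peer status` text on stdin lists HOST with
# the state "Peer in Cluster (Connected)".
peer_connected() {
    awk -v host="$1" '
        $1 == "Hostname:" { current = $2 }
        $1 == "State:" && current == host && /Peer in Cluster \(Connected\)/ { found = 1 }
        END { exit !found }
    '
}

# heal_pending
# Prints the sum of all "Number of entries: N" lines found in the
# `gluster volume heal VOLUME info` text on stdin (0 if there are none).
heal_pending() {
    awk '/^Number of entries:/ { total += $4 } END { print total + 0 }'
}

# Intended use on glu-tru (not executed here):
#   gluster peer status | peer_connected glu-pre && echo "glu-pre connected"
#   gluster volume heal gv01 info | heal_pending
```

A count that drops to 0 would suggest the self-heal has caught up,
although it is worth cross-checking against the brick contents before
relying on it.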

Now, is this the proper procedure for replacing a failed server? I've read
the documentation (http://tinyurl.com/j3pnjza, and for instance:
- but those mention that the volume will be inaccessible during the healing
period. That procedure also seems more complicated: changing UUIDs in
config files and so on.

With the commands I pasted above I had a perfectly fine running volume,
which was accessible the whole time while the new server was being
re-added and also during the healing period. (I'm using this for an
HA setup for a Django application, which writes a lot of custom files
while working. While the volume was being healed I made sure that all
the webapp traffic was hitting only the glu-tru node, the one which hadn't
been replaced.)

I'd appreciate some comments from the more experienced glusterfs users.


P.S. This is glusterfs 3.4 on Ubuntu 14.04.
