[Gluster-users] Setup scenario for Cluster 4 node cluster.

Tue Mar 8 21:17:06 UTC 2011

Hello Pranithk

Thanks for your help. I really appreciate it. The following is our proof of
concept setup. Hopefully through this you can guide us on how best to work
around disasters and node failures.

4 nodes, distributed replication. All nodes are on a 1 Gbps private network
and each has a 1 TB SATA HDD.

192.168.2.100
192.168.2.101
192.168.2.102
192.168.2.103

A single access client

192.168.2.104

Scenario
On Node1 (192.168.2.100) we issued the peer probe command to the rest of
the nodes, and 1 brick is created. Now the client (192.168.2.104) writes
data over the cluster, and each node gets a replicated copy. All nodes run
Ucarp to provide a single IP address for the client to access. We use the
GlusterFS native client (FUSE).
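
For concreteness, the setup above could be written out roughly as the
commands below. This is only a dry-run sketch that prints the commands and
executes nothing. The volume name, brick path, mount point, replica count and
the Ucarp virtual IP are placeholders chosen for illustration, not values
from our actual setup.

```python
# Dry-run sketch: builds the gluster CLI commands for the setup described
# above and prints them. Nothing is executed.
# Assumptions (hypothetical, not from the original post): volume name
# "testvol", brick path "/export/brick1", mount point "/mnt/gluster",
# replica count 2, and Ucarp virtual IP 192.168.2.200.

NODES = ["192.168.2.100", "192.168.2.101", "192.168.2.102", "192.168.2.103"]


def setup_commands(nodes, volume="testvol", brick="/export/brick1", replica=2):
    """Return the commands to run on the first node, in order."""
    if len(nodes) % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    # The first node probes every other node to form the trusted pool.
    cmds = [f"gluster peer probe {ip}" for ip in nodes[1:]]
    # One brick per node, listed in node order.
    bricks = " ".join(f"{ip}:{brick}" for ip in nodes)
    cmds.append(
        f"gluster volume create {volume} replica {replica} transport tcp {bricks}"
    )
    cmds.append(f"gluster volume start {volume}")
    return cmds


def client_mount_command(server, volume="testvol", mountpoint="/mnt/gluster"):
    """Return the native (FUSE) client mount command for the access client."""
    return f"mount -t glusterfs {server}:/{volume} {mountpoint}"


if __name__ == "__main__":
    for cmd in setup_commands(NODES):
        print(cmd)
    # The client mounts through the shared Ucarp address (placeholder IP).
    print(client_mount_command("192.168.2.200"))
```

The replica count is left as a parameter because the right value depends on
how many copies of each file the volume is meant to keep.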

Now say around midnight Node1 fails (total failure -- disk dies -- processor
dies -- everything on this node dies -- no chance of data recovery on this
node -- total node loss). Our staff add another node to the private network.
This node is blank and has the same hardware spec as Node1. We load the
partition tables onto this new node, so it is similar to the lost node
except that it no longer has the gluster data. Now, what should I do to add
this node to the cluster and get the cluster back to normal?

Would the following be OK?

1. Run peer probe again on the re-gained Node1.

2. Run the rebalance command.

3. According to you, Pranithk, the system is self-healing. So do the other
nodes keep pinging Node1's IP again and again until they get a response?

4. What are the exact steps we need to take to make sure that the data is
not lost? The way I see it, RAID 10 etc. are not needed, simply because
there are so many replicas of the initial data that RAID 10 feels like
overkill. Personally, in our tests the 4 node cluster actually outperformed
our old RAID array.

5. We got the setup part right. We do not know the proper procedure to
bring the cluster back to its full strength. One can deploy gluster on an
AMI or VMware image, but the underlying codebase is the same every time. So
what do we do to get this proof of concept done?
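
To make items 1 and 2 concrete, this is roughly how they would look as
commands. It is a dry-run sketch that only prints the commands. The volume
name is a placeholder, and it assumes the probe is issued from a surviving
node toward the replacement. Whether this sequence is correct or sufficient
is exactly what we are asking.

```python
# Dry-run sketch of items 1 and 2 only. Prints commands, executes nothing.
# Assumptions (hypothetical, not from the original post): the volume is
# named "testvol", and the commands are run from a surviving node.


def proposed_recovery_commands(replacement_ip, volume="testvol"):
    """Return the commands from items 1 and 2, in the order listed."""
    return [
        # Item 1: probe the replacement node so it joins the trusted pool.
        f"gluster peer probe {replacement_ip}",
        # Item 2: start a rebalance, then check on its progress.
        f"gluster volume rebalance {volume} start",
        f"gluster volume rebalance {volume} status",
    ]


if __name__ == "__main__":
    # The replacement takes over Node1's address in this scenario.
    for cmd in proposed_recovery_commands("192.168.2.100"):
        print(cmd)
```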

Best Regards
Hareem. Haque