[Gluster-users] Single Point of failure in geo Replication

Thu Oct 17 03:02:03 UTC 2019

On Wed, Oct 16, 2019 at 11:08 PM deepu srinivasan <sdeepugd at gmail.com>
wrote:

> Hi Users
> Is there a single point of failure in GeoReplication for gluster?
> My Case:
> I use 3 nodes in both the master and slave volumes.
> Master volume : Node1,Node2,Node3
> Slave Volume : Node4,Node5,Node6
> I tried to recreate the scenario to test a single point of failure.
>
> Geo-Replication Status:
>
> *Master Node         Slave Node         Status *
> Node1                   Node4                  Active
> Node2                   Node4                  Passive
> Node3                   Node4                  Passive
>
> Step 1: Stopped the glusterd daemon on Node4.
> Result: Only two nodes were listed in the status, as shown below.
>
> *Master Node         Slave Node         Status *
> Node2                   Node4                  Passive
> Node3                   Node4                  Passive
>
>
> Will the GeoReplication session go down if the primary slave is down?
>

Hi Deepu,

Geo-replication depends on a primary slave node to get information about
the other nodes that are part of the slave volume.

Once the workers have started, they no longer depend on the primary slave
node, so a worker will not fail just because the primary goes down. But if
any other slave node goes down, the affected worker tries to connect to a
different node. To find one, it runs the volume status command through the
primary slave node, as follows:

```
ssh -i <georep-pem> <primary-node> gluster volume status <slavevol>
```

If the primary node is down at that point, the above command fails and the
worker cannot get the list of slave nodes it can connect to.

This is only a temporary failure that lasts until the primary node comes
back online. If the primary node is permanently down, run the Geo-rep delete
and Geo-rep create commands again with the new primary node. (Note: Geo-rep
delete and create will remember the last sync time, and syncing resumes from
there once the session starts.)
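
As a rough illustration of the delete and create sequence, here is a dry-run
sketch that only prints the commands. The names mastervol, slavevol and Node5
(as the new primary) are placeholders, not values from this setup. The stop
and start steps are an assumption based on the usual geo-rep workflow, and
the create step assumes passwordless SSH to the new primary is already set up.

```shell
#!/bin/sh
# Dry-run sketch: prints the geo-rep commands instead of running them.
# All volume and node names passed in are placeholders (assumptions).
georep_recreate_cmds() {
    mastervol=$1
    slavevol=$2
    old_primary=$3
    new_primary=$4

    # Stop and delete the session that points at the old primary.
    # (stop is assumed here; it may need "force" if the slave is unreachable)
    echo "gluster volume geo-replication $mastervol $old_primary::$slavevol stop"
    echo "gluster volume geo-replication $mastervol $old_primary::$slavevol delete"

    # Recreate and start the session against the new primary node.
    echo "gluster volume geo-replication $mastervol $new_primary::$slavevol create push-pem"
    echo "gluster volume geo-replication $mastervol $new_primary::$slavevol start"
}

georep_recreate_cmds mastervol slavevol Node4 Node5
```

Review the printed commands before running them by hand on a master node.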

I will evaluate the possibility of caching the list of slave nodes so that
one of them can be used as a backup primary node in case of failures. I will
open a GitHub issue for this.
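
To make the idea concrete, a minimal sketch of such a fallback could look like
the following. This is not how Geo-replication works today. The function
names, the GEOREP_PEM and SLAVEVOL variables, and the node list are all
hypothetical. The probe simply reuses the ssh command shown above.

```shell
#!/bin/sh
# Hypothetical sketch: walk a cached list of slave nodes and print the
# first one for which the probe command succeeds.
# Usage: first_reachable_node <probe-command> <node>...
first_reachable_node() {
    probe=$1
    shift
    for node in "$@"; do
        if "$probe" "$node" >/dev/null 2>&1; then
            echo "$node"
            return 0
        fi
    done
    # No cached node answered.
    return 1
}

# Probe based on the ssh command above. GEOREP_PEM and SLAVEVOL are
# assumed to be set by the caller; they are placeholders in this sketch.
ssh_probe() {
    ssh -i "$GEOREP_PEM" "$1" gluster volume status "$SLAVEVOL"
}

# Example call (not executed here):
#   first_reachable_node ssh_probe Node4 Node5 Node6
```

Passing the probe as an argument keeps the node-selection loop separate from
ssh, so the loop can be tried out with a stub probe before touching a cluster.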

Thanks for reporting the issue.

-- 
regards
Aravinda VK