[Gluster-users] Single Point of failure in geo Replication

Aravinda Vishwanathapura Krishna Murthy avishwan at redhat.com
Thu Oct 17 15:33:43 UTC 2019


On Thu, Oct 17, 2019 at 11:44 AM deepu srinivasan <sdeepugd at gmail.com>
wrote:

> Thank you for your response.
> We have tried the above use case you mentioned.
>
> Case 1: The primary node is permanently down (hardware failure).
> In this case, the Geo-replication session cannot be stopped; the stop
> command fails with a message like "start the primary node and then stop"
> (or similar).
> Since I cannot stop the session, I cannot delete it either.
>

Please try "stop force" and let us know if that works.
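
For example, something like this (substitute your own master volume, slave
host and slave volume names for the placeholders):

```
gluster volume geo-replication <mastervol> <slave-host>::<slavevol> stop force
```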


> On Thu, Oct 17, 2019 at 8:32 AM Aravinda Vishwanathapura Krishna Murthy <
> avishwan at redhat.com> wrote:
>
>>
>> On Wed, Oct 16, 2019 at 11:08 PM deepu srinivasan <sdeepugd at gmail.com>
>> wrote:
>>
>>> Hi Users
>>> Is there a single point of failure in Geo-replication for Gluster?
>>> My case:
>>> I use 3 nodes in both the master and the slave volume.
>>> Master volume: Node1, Node2, Node3
>>> Slave volume:  Node4, Node5, Node6
>>> I tried to recreate the scenario to test for a single point of failure.
>>>
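>>> The status below comes from the geo-replication status command, e.g.
>>> (placeholder names):
>>>
>>> ```
>>> gluster volume geo-replication <mastervol> <slave-host>::<slavevol> status
>>> ```
>>>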
>>> Geo-Replication Status:
>>>
>>> Master Node    Slave Node    Status
>>> Node1          Node4         Active
>>> Node2          Node4         Passive
>>> Node3          Node4         Passive
>>>
>>> Step 1: Stopped the glusterd daemon on Node4.
>>> Result: Only two node statuses were shown, as below.
>>>
>>> Master Node    Slave Node    Status
>>> Node2          Node4         Passive
>>> Node3          Node4         Passive
>>>
>>>
>>> Will the Geo-replication session go down if the primary slave is down?
>>>
>>
>>
>> Hi Deepu,
>>
>> Geo-replication depends on a primary slave node to get information about
>> the other nodes that are part of the Slave Volume.
>>
>> Once the workers are started, they no longer depend on the primary slave
>> node, so they will not fail if the primary goes down. But if any other
>> slave node goes down, the worker tries to connect to some other node; to
>> get the list of candidates, it runs the volume status command on the
>> primary slave node:
>>
>> ```
>> ssh -i <georep-pem> <primary-node> gluster volume status <slavevol>
>> ```
>>
>> If the primary slave node is down, the above command fails and the worker
>> does not get the list of Slave nodes it can connect to.
>>
>> This is only a temporary failure until the primary node comes back
>> online. If the primary node is permanently down, run the Geo-rep delete
>> and Geo-rep create commands again with a new primary node, as sketched
>> below. (Note: Geo-rep delete and create remember the last sync time and
>> resume from there once the session starts.)
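>>
>> A rough sketch of that recovery, with placeholder volume and host names
>> (the exact options may vary with your Gluster version):
>>
>> ```
>> # Delete the old session (the last sync time is preserved by default)
>> gluster volume geo-replication <mastervol> <old-primary>::<slavevol> delete
>>
>> # Recreate the session against a new primary slave node, then start it
>> gluster volume geo-replication <mastervol> <new-primary>::<slavevol> create push-pem force
>> gluster volume geo-replication <mastervol> <new-primary>::<slavevol> start
>> ```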
>>
>> I will evaluate the possibility of caching the list of Slave nodes so that
>> one of them can be used as a backup primary node in case of failures, and
>> will open a GitHub issue for the same.
>>
>> Thanks for reporting the issue.
>>
>> --
>> regards
>> Aravinda VK
>>
>

-- 
regards
Aravinda VK