[Bugs] [Bug 1258831] New: [RFE] Failure Handling of Primary Slave Node

bugzilla at redhat.com bugzilla at redhat.com
Tue Sep 1 11:32:57 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1258831

            Bug ID: 1258831
           Summary: [RFE] Failure Handling of Primary Slave Node
           Product: GlusterFS
           Version: mainline
         Component: geo-replication
          Assignee: bugs at gluster.org
          Reporter: avishwan at redhat.com
                CC: bugs at gluster.org, gluster-bugs at redhat.com



Description of problem:
------------------------
When the primary slave node (the one specified in the Geo-rep command) goes down,
geo-rep cannot get the other slave nodes' information and fails to start
Geo-replication.

If Geo-rep has already been started and the primary slave node goes down, the
affected worker remains Faulty because it cannot get the other nodes'
information.

Solution:
---------
Save the slave hosts' details in the config file. When a worker goes Faulty, it
tries to get the volume status using --remote-host. Use this pool of cached hosts
as the candidates for --remote-host.
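A minimal Python sketch of querying one host from the pool and falling back to
the next. It assumes the gluster CLI's `--remote-host` and `--xml` options and
assumes the volume UUID sits at `volInfo/volumes/volume/id` in the XML output;
the helper names (`volinfo_cmd`, `parse_volume_id`, `query_volume_id`,
`first_reachable`) are hypothetical and not part of gsyncd.

```python
import subprocess
import xml.etree.ElementTree as ET


def volinfo_cmd(remote_host, volume):
    """Build a gluster CLI command that asks glusterd on remote_host about volume."""
    # --remote-host points the local CLI at another node's glusterd;
    # --xml requests machine-readable output.
    return ["gluster", "--xml", "--remote-host=" + remote_host,
            "volume", "info", volume]


def parse_volume_id(xml_text):
    """Return the volume UUID from `volume info --xml` output, or None."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    # Assumed layout: <cliOutput><volInfo><volumes><volume><id>UUID</id>...
    node = root.find("volInfo/volumes/volume/id")
    return node.text if node is not None else None


def query_volume_id(remote_host, volume, timeout=60):
    """Query one host; return the volume UUID, or None on any failure."""
    try:
        res = subprocess.run(volinfo_cmd(remote_host, volume),
                             capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return None  # CLI missing or host unresponsive
    if res.returncode != 0:
        return None
    return parse_volume_id(res.stdout)


def first_reachable(hosts, volume):
    """Try each host in the cached pool in order; return (host, uuid) or None."""
    for host in hosts:
        uuid = query_volume_id(host, volume)
        if uuid is not None:
            return host, uuid
    return None
```

Returning None instead of raising lets the caller simply move on to the next
cached host when one node is down.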

Cache the slave nodes/cluster info in the config file as `slave_nodes`. If the
primary slave node is not available, use another available node.
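One way to cache `slave_nodes` together with `prev_main_node` (the second config
item named in the pseudo code), sketched with Python's configparser. The section
name `slave_cache` and the comma-separated encoding of `slave_nodes` are
assumptions for illustration, not the actual gsyncd config layout. The functions
work on text, so the caller decides where the file lives.

```python
import configparser
import io

SECTION = "slave_cache"  # hypothetical section name


def parse_cache(text):
    """Return (prev_main_node, slave_nodes); missing entries give (None, [])."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    if not cfg.has_section(SECTION):
        return None, []
    prev = cfg.get(SECTION, "prev_main_node", fallback=None)
    raw = cfg.get(SECTION, "slave_nodes", fallback="")
    # slave_nodes is stored as a comma-separated host list (assumed encoding).
    return prev, [n.strip() for n in raw.split(",") if n.strip()]


def render_cache(text, prev_main_node, slave_nodes):
    """Return the config text with both cached values updated."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)  # keep any existing sections and options
    if not cfg.has_section(SECTION):
        cfg.add_section(SECTION)
    cfg.set(SECTION, "prev_main_node", prev_main_node)
    cfg.set(SECTION, "slave_nodes", ",".join(slave_nodes))
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()
```

Rewriting the whole list on every successful lookup keeps the cache in step with
nodes that are added to or removed from the slave cluster.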

Pseudo code:
-------------
Two new config items: prev_main_node, slave_nodes

1. If prev_main_node is not in CONFIG:
    Set prev_main_node = Slave node passed in the Geo-rep command

2. Try to get the Slave Volinfo from prev_main_node.
3. If that fails, try to get the Slave Volinfo from the node specified in the
Geo-rep command (if that node != prev_main_node).
4. If that fails, check whether `slave_nodes` is available in CONFIG.
5. If it is not available, FAIL.
6. If it is available, try to get the Slave Volinfo from the remote hosts in
`slave_nodes` one at a time, skipping the hosts that already failed.
7. If Volinfo is available, match the Slave Vol UUID with the results to make
sure it is the same Slave Volume.
8. If the Volinfo is valid, return it, update prev_main_node in the config file
and re-update `slave_nodes`.
9. If the Volinfo is invalid, FAIL.
10. If Volinfo is not available (from step 7) from any node, FAIL.
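The steps above can be sketched in Python as follows. Assumptions: `cache` is a
dict standing in for the config file, `fetch(host)` is a hypothetical callable
that returns a Volinfo dict with `uuid` and `nodes` keys (or None on failure),
and the UUID check of step 7 is applied to every successful lookup, including
steps 2-3. Step 6 is read as trying the cached hosts one by one.

```python
class GeoRepError(Exception):
    """Raised when no usable Slave Volinfo can be obtained."""


def _accept(host, info, slave_vol_uuid, cache):
    # Steps 7 and 9: reject Volinfo that belongs to a different volume.
    if info["uuid"] != slave_vol_uuid:
        raise GeoRepError("Volinfo from %s is for a different volume" % host)
    # Step 8: remember the working node and refresh the cached host list.
    cache["prev_main_node"] = host
    cache["slave_nodes"] = list(info["nodes"])
    return info


def get_slave_volinfo(cmd_node, slave_vol_uuid, cache, fetch):
    # Step 1: default prev_main_node to the node from the Geo-rep command.
    if "prev_main_node" not in cache:
        cache["prev_main_node"] = cmd_node
    prev = cache["prev_main_node"]

    # Steps 2-3: prev_main_node first, then the command node if different.
    first_hosts = [prev] if prev == cmd_node else [prev, cmd_node]
    tried = []
    for host in first_hosts:
        tried.append(host)
        info = fetch(host)
        if info is not None:
            return _accept(host, info, slave_vol_uuid, cache)

    # Steps 4-5: without a cached host list there is nothing else to try.
    if not cache.get("slave_nodes"):
        raise GeoRepError("slave_nodes not in config; cannot fall back")

    # Step 6: try the remaining cached hosts, skipping those that failed.
    for host in cache["slave_nodes"]:
        if host in tried:
            continue
        info = fetch(host)
        if info is not None:
            return _accept(host, info, slave_vol_uuid, cache)

    # Step 10: no node returned Volinfo.
    raise GeoRepError("Slave Volinfo not available from any node")
```

Injecting `fetch` keeps the fallback logic testable without a live cluster; in
practice it could wrap a `--remote-host` volume info query.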
