[Gluster-devel] Geo-replication: Reduce overlap in Changelogs processing when Geo-rep worker becomes Active from Passive

Aravinda avishwan at redhat.com
Fri Jul 17 08:37:28 UTC 2015


With the recent enhancement in Geo-rep, the Active/Passive switchover
is improved. But as a side effect, it introduces more overlap in
Changelogs processing.

When an Active worker goes down, a Passive worker becomes Active and
participates in syncing. Earlier this switchover happened only when the
Active node or the Active brick process went down. Now it happens on
any worker failure.

A worker can fail for the following reasons:

1. Brick/Node goes down
2. Unhandled Python tracebacks
3. Network failure between the Master cluster and the Slave cluster
4. The Slave node to which a Master node is connected goes down

When a Passive worker becomes Active, it starts syncing from the stime
recorded in its brick root.

While a worker is Passive, it periodically updates its stime with the
mount point stime. The mount point stime is the Min across Distribute
subvolumes of the Max within each Replica pair.
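
To make these terms concrete, a small illustration in plain Python (not
Geo-rep code; the stime values and layout are invented for the example):

    # Each Distribute subvolume is a list of brick stimes (sec, nsec),
    # one entry per replica brick.
    subvols = [
        [(100, 0), (200, 0)],  # subvol-0: one brick lags its replica pair
        [(150, 0), (160, 0)],  # subvol-1
    ]

    # Mount point stime: Min across subvolumes of Max within each pair.
    mount_stime = min(max(pair) for pair in subvols)
    print(mount_stime)  # (160, 0) -- lagging subvol-1 defines mount stime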

If workers on other subvolumes are lagging in sync, the stime of a
Passive worker will be less than that of its replica pair. In the
example above, the Passive worker in subvol-0 records (160, 0) even
though its replica pair has already synced up to (200, 0). Due to this,
when a Passive worker becomes Active it always starts from a point
earlier than the actually synced time and re-processes Changelogs its
replica pair already handled.

We need more intelligence when updating the stime at each periodic
interval: instead of recording the mount point stime, a Passive worker
should record the Max stime of its own replica pair.

Approaches:
===========
Introduce a virtual xattr to query replica-stime
------------------------------------------------
getfattr -n glusterfs.replica-stime.<BRICK_NODE:BRICK_ROOT> /mnt/

This returns the Max stime across all replica pairs of
<BRICK_NODE:BRICK_ROOT>. Passive workers update their brick stime with
this value:

passive brick stime = MAX(getfattr replica-stime, mount point stime)

MAX is required because, if the Active brick is down when replica-stime
is queried, the returned replica-stime may be stale (less than the
replica/mount stime).
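
A minimal sketch of how a Passive worker could apply this, assuming the
proposed virtual xattr encodes its value as two network-order 32-bit
integers (sec, nsec), like the existing stime xattr; the function names
here are hypothetical:

    import os
    import struct

    def get_stime(path, key):
        """Read an stime xattr and decode it to (sec, nsec).
        Returns (0, 0) if the xattr is absent."""
        try:
            return struct.unpack("!II", os.getxattr(path, key))
        except OSError:
            return (0, 0)

    def passive_brick_stime(mnt, brick_node, brick_root, mount_stime_key):
        # The proposed virtual xattr, queried on the mount point
        replica_key = "glusterfs.replica-stime.%s:%s" % (brick_node,
                                                         brick_root)
        replica_stime = get_stime(mnt, replica_key)
        mount_stime = get_stime(mnt, mount_stime_key)
        # MAX guards against a stale replica-stime when the Active brick
        # is down; tuple comparison orders (sec, nsec) correctly.
        return max(replica_stime, mount_stime)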


Maintain the Active brick's stime in a state file inside the Meta Volume
------------------------------------------------------------------------
Whenever an Active brick completes a sync and updates its stime, it also
updates the state file present in the Meta Volume
($META/geo-rep/<MASTER_UUID>.<SLAVE_UUID>.replica.stime-<SUBVOL_NUM>).

The Passive worker periodically looks for the above file and updates
its stime as:

passive brick stime = MAX(value from state file, mount point stime)
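
A sketch of both sides of this approach, again plain Python rather than
actual Geo-rep code; the state-file format (two integers as text) and
the function names are assumptions for illustration:

    import os

    def write_replica_stime(statefile, stime):
        """Active worker: record (sec, nsec) after each completed sync."""
        tmp = statefile + ".tmp"
        with open(tmp, "w") as f:
            f.write("%d %d" % stime)
        # rename within the same (Meta Volume) mount is atomic, so the
        # Passive worker never sees a half-written file
        os.rename(tmp, statefile)

    def read_replica_stime(statefile):
        """Passive worker: read the recorded stime; (0, 0) if absent."""
        try:
            with open(statefile) as f:
                sec, nsec = f.read().split()
            return (int(sec), int(nsec))
        except (IOError, ValueError):
            return (0, 0)

    def passive_brick_stime(statefile, mount_stime):
        return max(read_replica_stime(statefile), mount_stime)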


---

With either of the above approaches, when a Passive worker becomes
Active, it will continue from where its replica workers stopped.

Let me know your thoughts. Thanks.

-- 
regards
Aravinda


