[Gluster-devel] Geo-rep: Solving changelog ordering problem!

Thu Sep 3 06:55:24 UTC 2015

Hi DHT Team and Others,

Changelog is a server-side translator that sits above POSIX and records FOPs.
Hence, the order of operations is reliable only within a single brick; the
order of operations across bricks is lost.

e.g., (f1 hashes to brick1 and f2 to brick2)
          brick1                         brick2
          CREATE f1
          RENAME f1, f2    
    >>>>> Re-balance happens, which is very common with Tiering in place<<<<
                                         RENAME f2, f3
                                         DATA f3

The moment re-balance happens, the changelogs related to the same entry are distributed
across bricks. Since geo-rep syncs the changes from each brick independently, it may well
process them in the wrong order and leave the slave in an inconsistent state.
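
To illustrate the failure, here is a toy Python sketch of the example above. It
is not geo-rep code: the record format, the replay() helper and the set used as
the slave namespace are all assumptions made for illustration only.

```python
# Toy model (hypothetical): each brick's changelog is an ordered list of records.
brick1 = [("CREATE", "f1"), ("RENAME", "f1", "f2")]
brick2 = [("RENAME", "f2", "f3"), ("DATA", "f3")]


def replay(records):
    """Apply records to an empty toy slave namespace.

    Returns (names present on the slave, records that could not be applied).
    """
    slave = set()
    skipped = []
    for rec in records:
        op = rec[0]
        if op == "CREATE":
            slave.add(rec[1])
        elif op == "RENAME":
            src, dst = rec[1], rec[2]
            if src in slave:
                slave.remove(src)
                slave.add(dst)
            else:
                skipped.append(rec)  # source name does not exist yet
        elif op == "DATA":
            if rec[1] not in slave:
                skipped.append(rec)  # nothing to write data into
    return slave, skipped


# brick1's changelog processed first: the slave matches the master.
in_order, skipped_ok = replay(brick1 + brick2)

# brick2's changelog processed first: the RENAME and DATA find no file.
out_of_order, skipped_bad = replay(brick2 + brick1)
```

With brick1 replayed first, the slave ends with f3 and nothing is skipped. With
brick2 replayed first, both of its records are skipped and the slave is left
with f2, which no longer exists on the master.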

SOLUTION APPROACHES:

1. Capture re-balance traffic as well and work out all combinations of FOPs so that we end
   up in the correct state. Though we started thinking along these lines, one corner case or
   another always remains, and we still end up syncing out of order.

2. The changes related to an 'entry' (file) should always be captured on the brick
   where the entry was first recorded, no matter where the file moves because of re-balance.
   This implicitly retains the ordering for an entry, and geo-rep can still sync in a distributed
   manner from each brick, keeping the performance up.

   DHT needs to maintain, for each entry, the state of where it was first cached (to be precise,
   the brick whose changelog first recorded it) and always notify the changelog on that brick
   of the FOP.

   I think if we can achieve the second solution, it would solve geo-rep's out-of-order syncing
   problem forever.

   Let me know your comments and suggestions on this!
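
To make approach 2 a little more concrete, here is a rough Python sketch of the
bookkeeping it implies. This is only a model under stated assumptions, not DHT
or changelog code: the ChangelogRouter class, its record() method and the
"gfid-1" entry identifier are hypothetical names, and where DHT would actually
persist this state is left open.

```python
from collections import defaultdict


class ChangelogRouter:
    """Hypothetical model of DHT remembering where an entry was first recorded."""

    def __init__(self):
        # entry id -> brick whose changelog first recorded the entry
        self.first_brick = {}
        # brick -> ordered list of (entry id, FOP) records
        self.changelogs = defaultdict(list)

    def record(self, entry_id, current_brick, fop):
        """Log the FOP on the entry's first brick, not the brick it lives on now."""
        brick = self.first_brick.setdefault(entry_id, current_brick)
        self.changelogs[brick].append((entry_id, fop))
        return brick


router = ChangelogRouter()
router.record("gfid-1", "brick1", ("CREATE", "f1"))
router.record("gfid-1", "brick1", ("RENAME", "f1", "f2"))
# Re-balance moves the file to brick2; its FOPs are still logged on brick1.
router.record("gfid-1", "brick2", ("RENAME", "f2", "f3"))
logged_on = router.record("gfid-1", "brick2", ("DATA", "f3"))
```

In this model all four records for the entry land, in order, in brick1's
changelog, so a worker syncing brick1 alone sees the complete history of the
entry, while entries first recorded on other bricks are still synced in
parallel from those bricks.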

Thanks and Regards,
Kotresh H R