[Gluster-devel] Geo-replication and Tiering - Solution for Rebalance races

Saravanakumar Arumugam sarumuga at redhat.com
Mon Oct 5 13:00:39 UTC 2015


Hi,

Some Background:

Geo-replication on a tiered volume is prone to races, because the
changelogs of each brick are processed independently while files move
frequently between the cold and hot tiers.

Below is one such example:
==================================
Brick1             Brick2
==================================
Create file        (file moved due to rebalance)
                   Data file
                   Delete file
==================================
If Brick2's changelogs are processed first, followed by Brick1's, the
file may end up being created.
But we expect the file to be deleted (as per the last operation).
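
To make the ordering problem concrete, here is a minimal sketch in Python.
It is only a toy model: the replay() function and the (op, name) tuples are
made up for illustration and are not the geo-replication code. It replays
the two changelogs from the table above one brick at a time, in both orders.

```python
def replay(changelogs_in_order):
    """Apply whole per-brick changelogs one after another.

    Returns the set of file names that exist at the end.
    Toy model only: names and record layout are hypothetical.
    """
    files = set()
    for changelog in changelogs_in_order:
        for op, name in changelog:
            if op == "CREATE":
                files.add(name)
            elif op == "UNLINK":
                files.discard(name)
            # DATA changes file content only, not existence.
    return files


# Changelogs from the example table above.
brick1 = [("CREATE", "f")]
brick2 = [("DATA", "f"), ("UNLINK", "f")]

brick1_first = replay([brick1, brick2])
brick2_first = replay([brick2, brick1])

print(brick1_first)  # set()  -> file deleted, as expected
print(brick2_first)  # {'f'}  -> file wrongly left behind
```

The same operations give different end states depending only on which
brick is processed first, which is the race described above.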


Solution:

Step 1.

Record all fops in the HOT tier, and record only Data/Metadata
operations in the COLD tier.

Why?

a. If the file is placed directly in the Hot tier, all fops will be
recorded in the HOT tier.

b. If the file is *already* present in the Cold tier and any fop is
carried out on it, a linkto file is created in the Hot tier.
   Operations like UNLINK and RENAME are then captured in the Hot tier
   (by means of the linkto file).

This way, we can get both tiers' operations in the HOT tier itself.

Step 2.

From gluster volume info, figure out whether the brick belongs to the
COLD subvolume.
(This is possible using gluster volume info <tiervol> --xml)

If so, IGNORE all file ops except DATA and METADATA.
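
A rough sketch of the filtering in Step 2, in Python. The record layout
(kind, fop, path), the fop names and the filter_changelog() helper are
assumptions made for illustration; the brick_is_cold flag is assumed to
have been derived already from the volume info output.

```python
# Record kinds kept for a cold-tier brick (per Step 2).
COLD_TIER_ALLOWED = {"DATA", "METADATA"}


def filter_changelog(records, brick_is_cold):
    """Return the records geo-replication should process for one brick.

    records: iterable of (kind, fop, path) tuples (hypothetical layout).
    brick_is_cold: True if the brick belongs to the cold subvolume.
    """
    if not brick_is_cold:
        # Hot tier: every fop is kept.
        return list(records)
    # Cold tier: drop everything except DATA and METADATA records.
    return [r for r in records if r[0] in COLD_TIER_ALLOWED]


records = [
    ("ENTRY", "CREATE", "f"),
    ("DATA", "WRITE", "f"),
    ("METADATA", "SETATTR", "f"),
    ("ENTRY", "UNLINK", "f"),
]

cold_kept = filter_changelog(records, brick_is_cold=True)
hot_kept = filter_changelog(records, brick_is_cold=False)
```

With this input, the cold brick keeps only the DATA and METADATA
records, while the hot brick keeps all four.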


Help from DHT:

Now, we need some help from (tiering) DHT for Step 1.

There is one issue in Step 1: if the file was created on a COLD subvolume,
we will miss the "CREATE" operation in the Hot subvolume.

So, if the linkto file is created in the HOT tier (Hash) due to lookup alone,
the changelog xlator needs to be informed, so that it records this
as a CREATE.


IIUC, there are multiple places where a linkto file is created.
So, this should be done only in the case where lookup creates the linkto
file in the Hot tier (Hash).
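
The rule being asked of DHT could be sketched as below, in Python. This
is only an illustration of the condition: the function name and the
"created_by" values are hypothetical, and the real change would live in
the DHT and changelog xlators.

```python
def changelog_record_for_linkto(tier, created_by):
    """Decide what the changelog should record for a new linkto file.

    tier: "hot" or "cold" (tier where the linkto file was created).
    created_by: the code path that created it, e.g. "lookup" or
                "rebalance" (hypothetical labels).
    Returns "CREATE" only for a lookup-created linkto file in the hot
    tier, otherwise None (nothing extra is recorded).
    """
    if tier == "hot" and created_by == "lookup":
        return "CREATE"
    return None


lookup_in_hot = changelog_record_for_linkto("hot", "lookup")
rebalance_in_hot = changelog_record_for_linkto("hot", "rebalance")
lookup_in_cold = changelog_record_for_linkto("cold", "lookup")
```

Only the first case yields a CREATE record; linkto files created from
the other places are left alone.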


Please provide your feedback on this.
Thanks!

Regards,
Saravana


