[Gluster-devel] Geo-replication: Improving the performance during History Crawl
Aravinda
avishwan at redhat.com
Tue Aug 9 05:54:36 UTC 2016
Thanks Vijay. Posted the initial patch for the same.
http://review.gluster.org/15110
Answers inline.
regards
Aravinda
On 08/09/2016 01:38 AM, Vijay Bellur wrote:
> On 08/05/2016 04:45 AM, Aravinda wrote:
>> Hi,
>>
>> Geo-replication has three types of change detection (to identify the
>> list of changed files and sync only those files):
>>
>> 1. XTime based brick-backend crawl for the initial sync
>> 2. Historical Changelogs to sync the backlog (files created/modified/
>> deleted between worker down and restart)
>> 3. Live Changelogs - as and when a changelog is rolled over, process it
>> and sync the changes
>>
>> If data already exists in the Master Volume before the Geo-replication
>> session is created, it does an XTime based crawl (Hybrid Crawl) and
>> then switches to Live Changelog mode.
>> After the initial sync, the XTime crawl is not used again. On worker
>> restart it uses Historical Changelogs and then switches to Live Changelogs.
>>
>> Geo-replication is very slow during History Crawl if the backlog of
>> changelogs grows large (i.e. if the Geo-rep session was down for a long time).
>>
>
> Do we need an upper bound on the duration allowed for the backlog
> changelog to grow? If the backlog grows beyond a certain threshold,
> should we resort to xtime based crawl as in the initial sync?
Added a 15-day cap for processing. The initial sync part is not changed;
Geo-rep will still use XSync for the initial sync. This optimization applies
only when a worker has been down for a long time after the initial sync.
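The cap can be sketched roughly as follows. This is an illustrative sketch, not the actual patch: the function name and the flat 15-day window are assumptions; the real code bounds each History Crawl iteration against the changelog API.

```python
# Hedged sketch: cap one History Crawl iteration so it never covers
# more than MAX_HISTORY_DAYS of backlog changelogs. "history_window"
# and its arguments are illustrative names, not the Geo-rep API.
MAX_HISTORY_DAYS = 15

def history_window(last_synced, now, max_days=MAX_HISTORY_DAYS):
    """Return the (start, end) time range for one History Crawl pass.

    last_synced and now are Unix timestamps in seconds. If the backlog
    is larger than the cap, only the first max_days worth is processed;
    the next iteration continues from the new last_synced time.
    """
    cap_seconds = max_days * 24 * 60 * 60
    end = min(now, last_synced + cap_seconds)
    return (last_synced, end)
```

A worker that was down for months would thus walk the backlog in 15-day slices instead of loading it all at once.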
>
>> - If the same file is created, deleted and created again, Geo-rep
>> replays the changelogs in the same order on the Slave side.
>> - Data sync happens GFID to GFID, so except for the final GFID, all the
>> other syncs will fail since the file no longer exists in the Master (a
>> file may exist, but with a different GFID).
>> Due to these failed data syncs and retries, Geo-rep performance suffers.
>>
>> Kotresh and I discussed this and came up with the following changes
>> to Geo-replication:
>>
>> While processing History,
>>
>> - Collect all the entry, data and meta operations in a temporary
>> database
>
> Depending on the number of changelogs and operations, creation of this
> database itself might take a non-trivial amount of time. If there is
> an archival/WORM workload without any deletions, would this step be
> counter-productive from a performance perspective?
The temp database is purged and recreated for each iteration. One small
change here: entry operations are not stored in the db; only Data and Meta
GFIDs are stored. Entry operations are processed as and when the
Changelogs are processed.
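The per-iteration database described above can be sketched like this. The schema and function names are assumptions for illustration; the point is that a GFID modified in many changelogs is recorded once, so it is synced once.

```python
import sqlite3

# Hedged sketch of the per-iteration temp database: only Data and Meta
# GFIDs are stored (entry ops are handled while parsing the changelogs).
# PRIMARY KEY + "INSERT OR IGNORE" deduplicates repeated changes to the
# same GFID across many backlog changelogs.

def create_db():
    # purged and recreated for each History Crawl iteration
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (gfid TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE meta (gfid TEXT PRIMARY KEY)")
    return conn

def add_data(conn, gfid):
    conn.execute("INSERT OR IGNORE INTO data VALUES (?)", (gfid,))

def add_meta(conn, gfid):
    conn.execute("INSERT OR IGNORE INTO meta VALUES (?)", (gfid,))

def remove_gfid(conn, gfid):
    # called when a changelog records an UNLINK for this GFID
    conn.execute("DELETE FROM data WHERE gfid = ?", (gfid,))
    conn.execute("DELETE FROM meta WHERE gfid = ?", (gfid,))
```

With this, the create/delete/create sequence from the earlier example leaves only the final GFID in the table, avoiding the doomed GFID-to-GFID retries.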
>
>> - Delete all Data and Meta GFIDs that are already unlinked according
>> to the Changelogs
>
> We need to delete only those GFIDs whose link count happens to be zero
> after the unlink. Would this need an additional stat()?
Valid point; will add a stat() before removing GFIDs from the data/meta list.
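A rough sketch of that check, assuming the usual brick-backend `.glusterfs/<xx>/<yy>/<gfid>` layout (the helper name is hypothetical): if the backend GFID path is gone, the link count reached zero and the GFID can be dropped; if it still exists, other hard links remain and the data must still be synced.

```python
import errno
import os

def gfid_alive(brick_path, gfid):
    """Return True if the GFID still has links on the brick backend.

    Hedged sketch: stats the .glusterfs backend path for the GFID
    (conventionally .glusterfs/<first 2 chars>/<next 2 chars>/<gfid>).
    ENOENT means the link count dropped to zero after the UNLINK, so
    the GFID can safely be removed from the data/meta list.
    """
    backend = os.path.join(brick_path, ".glusterfs",
                           gfid[0:2], gfid[2:4], gfid)
    try:
        st = os.stat(backend)
    except OSError as e:
        if e.errno == errno.ENOENT:
            return False  # no links left; safe to drop
        raise
    return st.st_nlink > 0
```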
>
>
>> - Process all entry operations in batches
>> - Process data and meta operations in batches
>> - Once the sync is complete, update the last Changelog's time as the
>> last_synced time, as usual.
>>
>> Challenges:
>> - If a worker crashes in between while doing the above steps, the same
>> changelogs will be reprocessed on restart. (In the existing code the
>> crawl is done in small batches, so on failure only the last partially
>> completed batch is reprocessed.)
>> Some of these retries can be avoided if we start maintaining
>> entry_last_synced (entry_stime) and data_last_synced (stime) separately.
>>
>
> Right, this can be a significant challenge if we keep crashing at the
> same point due to an external factor or a bug in code. Having a more
> granular tracker can help in reducing the cost of a retry.
Entry operation retries can be optimized by maintaining an entry_stime
xattr separate from the stime xattr.
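The idea can be sketched as below. The xattr name and the `"!II"` (seconds, nanoseconds) packing are assumptions for illustration, not taken verbatim from the Geo-rep code; the point is that a restarted worker can compare the two marks and skip entry operations that were already replayed even when data sync had not finished.

```python
import os
import struct

# Assumed xattr name, for illustration only
ENTRY_STIME_XATTR = "trusted.glusterfs.geo-rep.entry-stime"

def pack_stime(sec, nsec):
    # stime values are (seconds, nanoseconds) pairs; network byte order
    return struct.pack("!II", sec, nsec)

def unpack_stime(blob):
    return struct.unpack("!II", blob)

def set_entry_stime(path, sec, nsec):
    # os.setxattr is Linux-only and needs a filesystem with xattr support
    os.setxattr(path, ENTRY_STIME_XATTR, pack_stime(sec, nsec))

def get_entry_stime(path):
    try:
        return unpack_stime(os.getxattr(path, ENTRY_STIME_XATTR))
    except OSError:
        return None  # not set yet: fall back to replaying entry ops
```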
>
> -Vijay
>
>
>