[Gluster-devel] Performance improvements to Gluster Geo-replication
Shyam
srangana at redhat.com
Mon Aug 31 13:47:55 UTC 2015
On 08/31/2015 03:17 AM, Aravinda wrote:
> Following Changes/ideas identified to improve the Geo-replication
> Performance. Please add your ideas/issues to the list
>
> 1. Entry stime and Data/Meta stime
> ----------------------------------
> Now we use only one xattr to maintain the state of sync, called
> stime. When a Geo-replication worker restarts, it starts from that
> stime and syncs files.
>
> get_changes from <STIME> to <CURRENT TIME>
> perform <ENTRY> operations
> perform <META> operations
> perform <DATA> operations
>
> If a data operation fails, the worker crashes, restarts and
> reprocesses the changelogs again. Entry, Meta and Data operations will
> be retried. If we maintain entry_stime separately, then we can avoid
> reprocessing entry operations which were completed previously.
This seems like a good thing to do.
Here is something more that could be done (I am not well aware of
geo-rep internals, so maybe this cannot be done):
- Why not maintain a 'mark' up to which ENTRY/META operations have been
performed, so that even when failures occur in the ENTRY/META operation
queue, we only need to restart from the mark and not all the way from
the beginning STIME?
I am not sure where such a 'mark' can be maintained, unless the
processed get_changes are ordered and written to disk, or ordered
idempotently in memory each time.
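To make the 'mark' idea concrete, here is a rough Python sketch of how a
separately persisted entry_stime could gate the replay on worker
restart. The variable names and helpers below are my assumptions, not
the actual geo-rep code:

    # Hypothetical sketch: persist an entry_stime "mark" so completed
    # ENTRY/META batches are not replayed after a worker restart.
    entry_stime = 0   # would really be an xattr on the brick root (assumed)
    stime = 0         # the existing overall stime

    def sync_batch(changes, batch_end_time):
        global entry_stime, stime
        if batch_end_time > entry_stime:
            # ENTRY/META ops for this batch never completed; replay them.
            for change in changes:
                print("entry/meta op:", change)
            entry_stime = batch_end_time      # persist the mark here
        # DATA sync always runs; stime advances only after it succeeds.
        for change in changes:
            print("data sync:", change)
        stime = batch_end_time

    # On restart, batches with end time <= entry_stime skip ENTRY/META.
    sync_batch(["CREATE f1", "DATA f1"], batch_end_time=100)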
>
>
> 2. In case of Rsync/Tar failure, do not repeat Entry Operations
> ---------------------------------------------------------------
> In case of Rsync/Tar failures, Changelogs are reprocessed
> again. Instead, re-trigger only the Rsync/Tar job for the list of
> files which failed.
(this is more for my understanding)
I assume that this retry is within the same STIME -> NOW1 period. IOW,
if the re-trigger of the tar/rsync is going to occur in the next sync
interval, then I would assume that ENTRY/META for NOW1 -> NOW would be
repeated, correct? The same is true for the above as well, i.e. all
ENTRY/META operations that are completed between STIME and NOW1 are not
repeated, but events between NOW1 and NOW are, correct?
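For my own clarity, here is a rough sketch of what "retry only the
failed files" could look like, built on plain rsync with --files-from.
The mount path and retry policy are placeholders, not the actual
geo-rep implementation:

    # Sketch only: retry just the pending file list instead of
    # reprocessing the whole changelog batch when rsync fails.
    import subprocess

    def rsync_files(file_list, slave_url, master_mount="/mnt/master",
                    max_retries=3):
        pending = list(file_list)
        for _ in range(max_retries):
            if not pending:
                return []
            proc = subprocess.run(
                ["rsync", "-aR", "--files-from=-", ".", slave_url],
                input="\n".join(pending).encode(),
                cwd=master_mount,
            )
            if proc.returncode == 0:
                return []        # everything synced; changelogs untouched
            # Real geo-rep would narrow 'pending' to the exact failures;
            # here the whole pending list is retried conservatively.
        return pending           # caller re-queues only these files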
>
>
> 3. Better Rsync Queue
> ---------------------
> Currently Geo-rep has a Rsync/Tar queue called PostBox. Sync
> jobs (configurable, default is 3) empty the PostBox and feed it
> to the Rsync/Tar process. The second sync job may not find any items
> to sync, while only the first job may be overloaded. To avoid this,
> introduce a batch size to the PostBox so that each sync job gets an
> equal number of files to sync.
Do you want to consider round-robin of entries to the sync jobs,
something that we did in rebalance, instead of a batch size?
A batch size can again be consumed by a single sync process, and the
next batch by the next one, and so on. Maybe a round-robin distribution
of files to sync from the post-box to each sync thread may help; a
rough sketch is below.
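Something like the following round-robin hand-out is what I had in mind
(names are illustrative only, not the actual PostBox code):

    # Distribute PostBox entries round-robin across N sync jobs instead
    # of handing fixed-size batches to whichever job asks first.
    from itertools import cycle

    def distribute_round_robin(postbox_files, num_sync_jobs=3):
        buckets = [[] for _ in range(num_sync_jobs)]
        for filename, bucket in zip(postbox_files, cycle(buckets)):
            bucket.append(filename)
        return buckets

    # Each sync job now gets roughly an equal share of the files.
    for job_id, files in enumerate(distribute_round_robin(
            ["file%d" % i for i in range(10)])):
        print("sync job", job_id, "->", files)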
>
>
> 4. Handling the Tracebacks
> --------------------------
> Collect the list of Tracebacks which are not yet handled, and look for
> the possibility of handling them at run time. With this, worker
> crashes will be minimized so that we can avoid re-initialization and
> changelog reprocessing efforts.
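On this one: some of the known failure modes could perhaps be handled
in place rather than crashing the worker. A rough sketch of what I mean
(the exception types here are just examples, not an audit of the actual
tracebacks):

    # Sketch: treat known, recoverable errors as no-ops instead of
    # letting them crash the worker and force a changelog reprocess.
    import errno
    import logging

    def safe_entry_op(op):
        try:
            op()
        except OSError as e:
            if e.errno in (errno.ENOENT, errno.EEXIST):
                # e.g. the file was already created/removed by an earlier
                # (partially replayed) run; safe to skip.
                logging.warning("ignoring recoverable error: %s", e)
            else:
                raise   # unknown errors still surface and get collected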
>
>
> 5. SSH failure handling
> -----------------------
> If a Slave node goes down, the Master worker connected to it will go
> Faulty and restart. If we can handle SSH failures intelligently, we
> can re-establish the SSH connection instead of restarting the Geo-rep
> worker. With this change, the Active/Passive switch for network
> failures can be avoided.
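If it helps, re-establishing the connection could be as simple as a
bounded retry loop before declaring the worker Faulty. A sketch, with
placeholder host/command handling rather than the actual geo-rep SSH
code:

    # Sketch: probe the slave over SSH with backoff before falling back
    # to the Faulty/restart path.
    import subprocess
    import time

    def ssh_alive(slave_host, timeout=10):
        return subprocess.call(
            ["ssh", "-o", "ConnectTimeout=%d" % timeout, slave_host, "true"]
        ) == 0

    def reconnect_ssh(slave_host, attempts=5, delay=10):
        for attempt in range(attempts):
            if ssh_alive(slave_host):
                return True              # keep the worker, no restart
            time.sleep(delay * (attempt + 1))
        return False                     # only now go Faulty and restart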
>
>
> 6. On Worker restart, Utilizing Changelogs which are in .processing
> directory
> --------------------------------------------------------------------
> On Worker restart, the start time for Geo-rep is the previously
> updated stime. Geo-rep re-parses the Changelogs from the Brick backend
> into the Working directory, even though those changelogs were parsed
> previously but the stime was not updated due to failures in sync.
>
> 1. On Geo-rep restart, delete all files in .processing/cache and
>    move all the changelogs available in the .processing directory to
>    .processing/cache
> 2. In the Changelog API, look for the Changelog file name in the cache
>    before parsing it.
> 3. If available in the cache, move it back to .processing
> 4. Else, parse it and generate the parsed changelog in .processing
>
I did not understand the above fully, but that's probably just me as I
am not fully aware of the changelog process yet :)
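Trying to restate it for myself: on restart, an already-parsed
changelog is picked up from .processing/cache instead of being
re-parsed from the brick. A rough sketch of that lookup (directory
names follow the description above; the parsing step is stubbed):

    # Sketch: reuse parsed changelogs cached from the previous run.
    import os
    import shutil

    def get_parsed_changelog(name, workdir):
        processing = os.path.join(workdir, ".processing")
        cache = os.path.join(processing, "cache")
        os.makedirs(cache, exist_ok=True)
        target = os.path.join(processing, name)
        cached = os.path.join(cache, name)
        if os.path.exists(cached):
            # Steps 2-3: already parsed before the restart; move it back.
            shutil.move(cached, target)
        else:
            # Step 4: not in cache; parse from the brick backend (stubbed).
            with open(target, "w") as f:
                f.write("parsed changelog placeholder\n")
        return target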