[Gluster-users] no progress in geo-replication

Felix Kölzow felix.koelzow at gmx.de
Wed Mar 3 19:41:23 UTC 2021


Dear Dietmar,


I am very interested in helping you with this geo-replication issue,
since we also run a setup in which geo-replication is crucial for the
backup procedure. I just had a quick look at it, and for the moment I
can only suggest:

> Is there any suitable setting in the gluster environment which would
> influence the speed of the processing (current settings attached)?
gluster volume geo-replication mvol1 gl-slave-01-int::svol1 config sync_jobs 9


in order to increase the number of rsync processes.
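
You can also list the session's current configuration first to see what
sync_jobs is currently set to; a minimal check, assuming the session name
from your status output below, would be:

gluster volume geo-replication mvol1 gl-slave-01-int::svol1 config | grep -i sync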

Furthermore, taken from
https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.5/html/administration_guide/recommended_practices3


> Performance Tuning
>
> When the following option is set, it has been observed that there is
> an increase in geo-replication performance. On the slave volume, run
> the following command:
>
> # gluster volume set SLAVE_VOL batch-fsync-delay-usec 0
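
Applied to your setup that would be (assuming svol1 is the slave volume
and the command is run on one of the slave nodes):

gluster volume set svol1 batch-fsync-delay-usec 0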

Can you verify that the changelog files are actually being consumed?
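
A simple way to check is to see whether the backlog of changelogs shrinks
over time, e.g. (paths taken from your mail):

ls /var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick1-mvol1/.history/.processing/ | wc -l
ls /var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick1-mvol1/.processing/ | wc -l

If those counts go down, and files accumulate in the corresponding
.processed directories (if present), the changelogs are being consumed.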


Regards,

Felix

On 03/03/2021 17:28, Dietmar Putz wrote:
>
> Hi,
>
> I'm having a problem with geo-replication. A short summary...
> About two months ago I added two further nodes to a distributed
> replicated volume. For that purpose I stopped the geo-replication,
> added two nodes on mvol and svol, and started a rebalance process on
> both sides. Once the rebalance process was finished I started the
> geo-replication again.
>
> After a few days, and besides some Unicode errors, the status of the
> newly added brick changed from hybrid crawl to history crawl. Since
> then there has been no progress; no files or directories have been
> created on svol for a couple of days.
>
> Looking for a possible reason, I noticed that there was no
> /var/log/glusterfs/geo-replication-slaves/mvol1_gl-slave-01-int_svol1
> directory on the newly added slave nodes.
> Obviously I had forgotten to add the new svol node IP addresses to
> /etc/hosts on all masters. After fixing that I ran the '... execute
> gsec_create' and '... create push-pem force' commands again and the
> corresponding directories were created. Geo-replication started
> normally, all active sessions were in history crawl (as shown below),
> and for a short while some data was transferred to svol. But for about
> a week nothing has changed on svol, 0 bytes transferred.
>
> Meanwhile I have deleted (without reset-sync-time) and recreated the
> geo-rep session. The current status is as shown below, but without any
> last_synced date.
> An entry like "last_synced_entry": 1609283145 is still visible in
> /var/lib/glusterd/geo-replication/mvol1_gl-slave-01-int_svol1/*status
> and changelog files are continuously created in
> /var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/<brick>/.processing.
>
>
> A short time ago I changed log_level to DEBUG for a moment.
> Unfortunately I got an 'EOFError: Ran out of input' in gsyncd.log and
> the rebuild of .processing started from the beginning.
> But one of the first very long lines in gsyncd.log looks like:
>
> [2021-03-03 11:59:39.503881] D [repce(worker
> /brick1/mvol1):215:__call__] RepceClient: call
> 9163:139944064358208:1614772779.4982471 history_getchanges ->
> ['/var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick1-mvol1/.history/.processing/CHANGELOG.1609280278',...
>
> 1609280278 corresponds to Tuesday, December 29, 2020 10:17:58 PM and
> would roughly match the last_synced date.
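
(Side note: the epoch can be cross-checked quickly with GNU date:

date -ud @1609280278
# Tue Dec 29 22:17:58 UTC 2020 (C locale)

which matches the 10:17:58 PM you mention.)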
>
> However, I have nearly 300k files in <brick>/.history/.processing, and
> in the log/trace it seems that every file in <brick>/.history/.processing
> is processed and transferred to <brick>/.processing.
> My questions so far...
> First of all, is everything still ok with this geo-replication?
> Do I have to wait until all changelog files in
> <brick>/.history/.processing are processed before transfers to svol start?
> What happens if another error appears in geo-replication while these
> changelog files are being processed, i.e. while the crawl status is
> history crawl ... does the entire process start from the beginning?
> Would a checkpoint be helpful ... for future decisions ...?
> Is there any suitable setting in the gluster environment which would
> influence the speed of the processing (current settings attached)?
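
Regarding the checkpoint question: as far as I know, a checkpoint can be
set at any time with

gluster volume geo-replication mvol1 gl-slave-01-int::svol1 config checkpoint now

and its completion then shows up in 'status detail'. It does not change
how the changelogs are processed, but it gives you a clear marker to
verify later whether everything up to that point has reached the slave.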
>
>
> I hope someone can help...
>
> best regards
> dietmar
>
>
>
> [ 15:17:47 ] - root at gl-master-01
> /var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick1-mvol1/.history
> $ls .processing/ | wc -l
> 294669
>
> [ 12:56:31 ] - root at gl-master-01  ~ $gluster volume geo-replication
> mvol1 gl-slave-01-int::svol1 status
>
> MASTER NODE         MASTER VOL    MASTER BRICK     SLAVE USER    SLAVE                     SLAVE NODE         STATUS     CRAWL STATUS     LAST_SYNCED
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
> gl-master-01-int    mvol1         /brick1/mvol1    root          gl-slave-01-int::svol1    gl-slave-05-int    Active     History Crawl    2020-12-29 23:00:48
> gl-master-01-int    mvol1         /brick2/mvol1    root          gl-slave-01-int::svol1    gl-slave-03-int    Active     History Crawl    2020-12-29 23:05:45
> gl-master-05-int    mvol1         /brick1/mvol1    root          gl-slave-01-int::svol1    gl-slave-03-int    Active     History Crawl    2021-02-20 17:38:38
> gl-master-06-int    mvol1         /brick1/mvol1    root          gl-slave-01-int::svol1    gl-slave-06-int    Passive    N/A              N/A
> gl-master-03-int    mvol1         /brick1/mvol1    root          gl-slave-01-int::svol1    gl-slave-05-int    Passive    N/A              N/A
> gl-master-03-int    mvol1         /brick2/mvol1    root          gl-slave-01-int::svol1    gl-slave-04-int    Active     History Crawl    2020-12-29 23:07:34
> gl-master-04-int    mvol1         /brick1/mvol1    root          gl-slave-01-int::svol1    gl-slave-06-int    Active     History Crawl    2020-12-29 23:07:22
> gl-master-04-int    mvol1         /brick2/mvol1    root          gl-slave-01-int::svol1    gl-slave-01-int    Passive    N/A              N/A
> gl-master-02-int    mvol1         /brick1/mvol1    root          gl-slave-01-int::svol1    gl-slave-01-int    Passive    N/A              N/A
> gl-master-02-int    mvol1         /brick2/mvol1    root          gl-slave-01-int::svol1    gl-slave-06-int    Passive    N/A              N/A
> [ 13:14:47 ] - root at gl-master-01  ~ $
>
>

