[Gluster-users] no progress in geo-replication
Dietmar Putz
dietmar.putz at 3q.video
Thu Mar 4 00:12:44 UTC 2021
Hello Felix,
thank you for your reply...
batch-fsync-delay-usec was already set to 0, and I increased sync_jobs
from 3 to 6 (both commands are sketched below the log excerpt). The
moment I increased sync_jobs, the following error appeared in gsyncd.log:
[2021-03-03 23:17:46.59727] E [syncdutils(worker
/brick1/mvol1):312:log_raise_exception] <top>: connection to peer is broken
[2021-03-03 23:17:46.59912] E [syncdutils(worker
/brick2/mvol1):312:log_raise_exception] <top>: connection to peer is broken
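
For reference, a sketch of the commands for these two settings. The
session name mvol1 gl-slave-01-int::svol1 and the slave volume svol1 are
taken from elsewhere in this thread, and the option spelling may differ
slightly between Gluster releases:

    # on the slave side, as recommended in the Red Hat guide quoted below
    gluster volume set svol1 batch-fsync-delay-usec 0

    # on the master side, raise the number of parallel sync jobs for the session
    gluster volume geo-replication mvol1 gl-slave-01-int::svol1 config sync_jobs 6
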
The passive nodes became active and the content of <brick>/.processing
was removed; new changelog files are currently being created in this
directory. Shortly before I changed sync_jobs I had checked the
<brick>/.processing directory on the master nodes; the result was the
same on every master node.
Since the last error about 12 hours ago nearly 2400 changelog files have
been created on each node, but it looks like none of them have been
consumed. At the moment I'm not sure what is right and what is wrong...
shouldn't at least the oldest changelog files in this directory have been
processed gradually? (A simple check is sketched after the listing below.)
best regards
Dietmar
[ 00:12:57 ] - putz at centreon-3qmedien ~/central $./mycommand.sh -H
gl-master -c "ls -l
/var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick1-mvol1/.processing/"
Host : gl-master-01
total 9824
-rw-r--r-- 1 root root 268 Mar 3 12:02 CHANGELOG.1614772950
-rw-r--r-- 1 root root 1356 Mar 3 12:02 CHANGELOG.1614772965
-rw-r--r-- 1 root root 1319 Mar 3 12:03 CHANGELOG.1614772980
...
-rw-r--r-- 1 root root 693 Mar 3 23:10 CHANGELOG.1614813002
-rw-r--r-- 1 root root 48 Mar 3 23:12 CHANGELOG.1614813170
-rw-r--r-- 1 root root 1222 Mar 3 23:13 CHANGELOG.1614813226
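
For what it's worth, a simple way to watch whether this backlog is being
drained (a plain shell sketch, run on a master node against the
.processing directory listed above):

    cd /var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick1-mvol1/.processing
    # if geo-replication is consuming the backlog, the count should shrink and the
    # oldest CHANGELOG.<epoch> should move forward over time
    watch -n 60 'ls | wc -l; ls | sort | head -1'
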
On 03.03.21 20:41, Felix Kölzow wrote:
>
> Dear Dietmar,
>
>
> I am very interested in helping you with this geo-replication issue,
> since we also have a geo-replication setup that is crucial for our
> backup procedure. I just had a quick look at this and for the moment I
> can only suggest:
>
>> Is there any suitable setting in the Gluster environment which would
>> influence the speed of this processing (current settings attached)?
> gluster volume geo-replication mvol1 gl-slave-01-int::svol1 config
> sync_jobs 9
>
>
> in order to increase the number of rsync processes.
>
> Furthermore, taken from
> https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.5/html/administration_guide/recommended_practices3
>
>
>> Performance Tuning
>>
>> When the following option is set, it has been observed that there is
>> an increase in geo-replication performance. On the slave volume, run
>> the following command:
>>
>> # gluster volume set SLAVE_VOL batch-fsync-delay-usec 0
>
> Can you verify that the changelog files are being consumed?
>
>
> Regards,
>
> Felix
>
> On 03/03/2021 17:28, Dietmar Putz wrote:
>>
>> Hi,
>>
>> I'm having a problem with geo-replication. A short summary...
>> About two months ago I added two further nodes to a distributed
>> replicated volume. For that purpose I stopped the geo-replication,
>> added two nodes on mvol and svol and started a rebalance process on
>> both sides. Once the rebalance process was finished I started the
>> geo-replication again (roughly the sequence sketched below).
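>>
>> For reference, the sequence was roughly as follows (a sketch only; the
>> brick paths are placeholders, not taken from this thread):
>>
>>     gluster volume geo-replication mvol1 gl-slave-01-int::svol1 stop
>>     gluster volume add-brick mvol1 <new-node-1>:/brick... <new-node-2>:/brick...
>>     gluster volume rebalance mvol1 start     # and the same add-brick/rebalance on svol1
>>     gluster volume geo-replication mvol1 gl-slave-01-int::svol1 start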
>>
>> After a few days, and besides some Unicode errors, the status of the
>> newly added brick changed from hybrid crawl to history crawl. Since
>> then there has been no progress; no files or directories have been
>> created on svol for a couple of days.
>>
>> Looking for a possible reason, I noticed that there was no
>> /var/log/glusterfs/geo-replication-slaves/mvol1_gl-slave-01-int_svol1
>> directory on the newly added slave nodes.
>> Obviously I had forgotten to add the new svol node IP addresses to
>> /etc/hosts on all master nodes. After fixing that I ran the '... execute
>> gsec_create' and '... create push-pem force' commands again (sketched
>> below) and the corresponding directories were created. Geo-replication
>> started normally, all active sessions were in history crawl (as shown
>> below) and for a short while some data was transferred to svol. But for
>> about a week nothing had changed on svol, 0 bytes transferred.
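>>
>> For reference, a sketch of those two commands (the session name is the
>> one from the status output below; syntax may vary slightly by version):
>>
>>     gluster system:: execute gsec_create
>>     gluster volume geo-replication mvol1 gl-slave-01-int::svol1 create push-pem force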
>>
>> Meanwhile I have deleted (without reset-sync-time) and recreated the
>> geo-rep session. The current status is as shown below, but without any
>> last_synced date.
>> An entry like "last_synced_entry": 1609283145 is still visible in
>> /var/lib/glusterd/geo-replication/mvol1_gl-slave-01-int_svol1/*status
>> and changelog files are continuously created in
>> /var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/<brick>/.processing.
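>>
>> A quick way to inspect that value (a plain shell sketch; the status-file
>> path is the one mentioned above, and date simply converts the epoch
>> seconds into a readable UTC time):
>>
>>     grep -o '"last_synced_entry": [0-9]*' /var/lib/glusterd/geo-replication/mvol1_gl-slave-01-int_svol1/*status
>>     date -u -d @1609283145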
>>
>>
>> A short time ago I changed log_level to DEBUG for a moment.
>> Unfortunately I got an 'EOFError: Ran out of input' in gsyncd.log and
>> the rebuild of .processing started from the beginning.
>> But one of the first very long lines in gsyncd.log looks like this:
>>
>> [2021-03-03 11:59:39.503881] D [repce(worker
>> /brick1/mvol1):215:__call__] RepceClient: call
>> 9163:139944064358208:1614772779.4982471 history_getchanges ->
>> ['/var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick1-mvol1/.history/.processing/CHANGELOG.1609280278',...
>>
>> 1609280278 is Tuesday, December 29, 2020, 10:17:58 PM, which would
>> roughly fit the last_synced date.
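>>
>> For reference, a sketch of the two steps mentioned above (the option may
>> be spelled log-level on newer releases; date is only used to turn the
>> changelog suffix, epoch seconds, into a readable UTC time):
>>
>>     gluster volume geo-replication mvol1 gl-slave-01-int::svol1 config log_level DEBUG
>>     date -u -d @1609280278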
>>
>> However, I have nearly 300k files in <brick>/.history/.processing, and
>> in the log/trace it seems that every file in
>> <brick>/.history/.processing is processed and transferred to
>> <brick>/.processing.
>> My questions so far...
>> First of all, is everything still ok with this geo-replication?
>> Do I have to wait until all changelog files in
>> <brick>/.history/.processing are processed before transfers to svol
>> start?
>> What happens if another error appears in geo-replication while these
>> changelog files are being processed, i.e. while the crawl status is
>> history crawl... does the entire process start from the beginning?
>> Would a checkpoint be helpful... for future decisions...? (See the
>> sketch right after these questions.)
>> Is there any suitable setting in the Gluster environment which would
>> influence the speed of this processing (current settings attached)?
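>>
>> Regarding the checkpoint question, a sketch of how one could be set and
>> watched (standard geo-replication commands; the session name is the one
>> from the status output below):
>>
>>     gluster volume geo-replication mvol1 gl-slave-01-int::svol1 config checkpoint now
>>     gluster volume geo-replication mvol1 gl-slave-01-int::svol1 status detail
>>
>> status detail also shows the ENTRY/DATA/META counters, which indicate
>> whether anything is actually being synced while the history crawl runs.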
>>
>>
>> I hope someone can help...
>>
>> best regards
>> dietmar
>>
>>
>>
>> [ 15:17:47 ] - root at gl-master-01
>> /var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick1-mvol1/.history
>> $ls .processing/ | wc -l
>> 294669
>>
>> [ 12:56:31 ] - root at gl-master-01 ~ $gluster volume geo-replication
>> mvol1 gl-slave-01-int::svol1 status
>>
>> MASTER NODE         MASTER VOL    MASTER BRICK     SLAVE USER    SLAVE                     SLAVE NODE         STATUS     CRAWL STATUS     LAST_SYNCED
>> ---------------------------------------------------------------------------------------------------------------------------------------------------
>> gl-master-01-int    mvol1         /brick1/mvol1    root          gl-slave-01-int::svol1    gl-slave-05-int    Active     History Crawl    2020-12-29 23:00:48
>> gl-master-01-int    mvol1         /brick2/mvol1    root          gl-slave-01-int::svol1    gl-slave-03-int    Active     History Crawl    2020-12-29 23:05:45
>> gl-master-05-int    mvol1         /brick1/mvol1    root          gl-slave-01-int::svol1    gl-slave-03-int    Active     History Crawl    2021-02-20 17:38:38
>> gl-master-06-int    mvol1         /brick1/mvol1    root          gl-slave-01-int::svol1    gl-slave-06-int    Passive    N/A              N/A
>> gl-master-03-int    mvol1         /brick1/mvol1    root          gl-slave-01-int::svol1    gl-slave-05-int    Passive    N/A              N/A
>> gl-master-03-int    mvol1         /brick2/mvol1    root          gl-slave-01-int::svol1    gl-slave-04-int    Active     History Crawl    2020-12-29 23:07:34
>> gl-master-04-int    mvol1         /brick1/mvol1    root          gl-slave-01-int::svol1    gl-slave-06-int    Active     History Crawl    2020-12-29 23:07:22
>> gl-master-04-int    mvol1         /brick2/mvol1    root          gl-slave-01-int::svol1    gl-slave-01-int    Passive    N/A              N/A
>> gl-master-02-int    mvol1         /brick1/mvol1    root          gl-slave-01-int::svol1    gl-slave-01-int    Passive    N/A              N/A
>> gl-master-02-int    mvol1         /brick2/mvol1    root          gl-slave-01-int::svol1    gl-slave-06-int    Passive    N/A              N/A
>> [ 13:14:47 ] - root at gl-master-01 ~ $
>>
>>
>
--
Mit freundlichen Grüßen / Kind Regards
Dietmar Putz
Head of Infrastructure
dietmar.putz at 3q.video
www.3q.video
3Q GmbH
Kurfürstendamm 102 | 10711 Berlin
CEO Julius Thomas
Amtsgericht Charlottenburg
Registernummer HRB 217706 B