[Bugs] [Bug 1217930] New: Geo-replication very slow, not able to sync all the files to slave

bugzilla at redhat.com bugzilla at redhat.com
Sun May 3 05:07:07 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1217930

            Bug ID: 1217930
           Summary: Geo-replication very slow, not able to sync all the
                    files to slave
           Product: GlusterFS
           Version: 3.7.0
         Component: geo-replication
          Severity: urgent
          Priority: urgent
          Assignee: bugs at gluster.org
          Reporter: avishwan at redhat.com
                CC: aavati at redhat.com, bkunal at redhat.com,
                    bugs at gluster.org, csaba at redhat.com,
                    gluster-bugs at redhat.com, nlevinki at redhat.com,
                    rhs-bugs at redhat.com, sasundar at redhat.com,
                    storage-qa-internal at redhat.com
        Depends On: 1210719, 1210965



+++ This bug was initially created as a clone of Bug #1210965 +++

+++ This bug was initially created as a clone of Bug #1210719 +++

Description of problem:
=======================

Geo-replication is very slow, not able to sync all the files to slave.
Geo-replication state moving to faulty very frequently.


Reproduction:

1. Create 2 node replica master volume and slave volume.
2. Set up geo-replication from master volume to slave volume.
3. Create a large number of files on the master volume (see the file-generation
sketch after these steps). I am currently testing with 52 million files of 5 KB
each; the customer's case has the same file size with 45 million files.
4. Check the geo-replication status, 'df -h', 'df -i' command output.
5. Initially, the geo-replication crawl status is Changelog crawl.
6. After some time (from my testing, 1-2 days in our lab environment), the
geo-replication crawl status changes to History Crawl and the file transfer to
the slave volume appears to stop working.
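
(Not part of the original report.) A minimal Python sketch, only to illustrate
step 3, of how the small test files could be generated; the mount point,
directory count and file count are assumptions and must be adjusted to the
actual setup.

~~~
#!/usr/bin/env python3
# Hypothetical helper for reproduction step 3: fill the master volume with a
# large number of ~5 KB files. MOUNT, DIRS and FILES_PER_DIR are placeholders.
import os

MOUNT = "/mnt/master"        # assumed FUSE mount of the master volume
DIRS = 1000                  # spread the files across directories
FILES_PER_DIR = 1000         # 1000 x 1000 = 1 million files per run
PAYLOAD = b"x" * 5 * 1024    # ~5 KB per file

for d in range(DIRS):
    dirpath = os.path.join(MOUNT, "dir%04d" % d)
    os.makedirs(dirpath, exist_ok=True)
    for f in range(FILES_PER_DIR):
        with open(os.path.join(dirpath, "file%06d" % f), "wb") as fh:
            fh.write(PAYLOAD)
~~~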

~~~
[Sample result]
<Master side>
[root at master1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_master1-lv_root
                       17G  4.8G   11G  31% /
tmpfs                 4.0G     0  4.0G   0% /dev/shm
/dev/vda1             477M   28M  424M   7% /boot
/dev/vdb              3.0T  130G  2.9T   5% /data

[root at master1 ~]# df -i
Filesystem              Inodes    IUsed     IFree IUse% Mounted on
/dev/mapper/vg_master1-lv_root
                       1093440    64936   1028504    6% /
tmpfs                  1023965        2   1023963    1% /dev/shm
/dev/vda1               128016       38    127978    1% /boot
/dev/vdb             322122496 15507582 306614914    5% /data

<Slave side>
[root at slave1 geo-replication-slaves]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_slave1-lv_root
                      139G   20G  112G  16% /
tmpfs                 4.0G     0  4.0G   0% /dev/shm
/dev/vda1             477M   28M  424M   7% /boot
/dev/vdb              3.0T   35G  3.0T   2% /data
localhost:*******  3.0T   35G  3.0T   2% /mnt

[root at slave1 geo-replication-slaves]# df -i
Filesystem              Inodes   IUsed     IFree IUse% Mounted on
/dev/mapper/vg_slave1-lv_root
                       9224192   46456   9177736    1% /
tmpfs                  1023966       2   1023964    1% /dev/shm
/dev/vda1               128016      38    127978    1% /boot
/dev/vdb             322122496 5145872 316976624    2% /data
localhost:****** 322122496 5145872 316976624    2% /mnt   <--- huge difference between master and slave
~~~


~~~
<Master side>
[root at master1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_master1-lv_root
                       17G  3.6G   12G  23% /
tmpfs                 4.0G     0  4.0G   0% /dev/shm
/dev/vda1             477M   28M  424M   7% /boot
/dev/vdb              3.0T   78G  3.0T   3% /data
~~~
~~~
<Slave side>
[root at slave1 geo-replication-slaves]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_slave1-lv_root
                      139G  7.5G  124G   6% /
tmpfs                 4.0G     0  4.0G   0% /dev/shm
/dev/vda1             477M   28M  424M   7% /boot
/dev/vdb              3.0T   29G  3.0T   1% /data
~~~

Status shows History Crawl.

--- Additional comment from Bipin Kunal on 2015-04-10 08:31:35 EDT ---

RCA done by Aravinda:

Hi Bipin,

Thanks for the setup. I root caused the issue.

Changelog processing is done in batches; for example, if 10 changelogs are
available for processing, they are all processed at once. Entry, Meta
and Data operations are collected separately. All the entry operations,
such as CREATE, MKDIR, MKNOD, LINK and UNLINK, are executed first, and
then rsync is triggered for the whole batch. Stime is updated only once
the complete batch is done.

In this setup, a large number of changelogs is available for processing
during History Changelog processing (since the batch size is automatic).
The entry operations complete, but while attempting rsync the worker goes
faulty for some reason and is restarted. Since stime was not updated, it
has to process all the changelogs again. While processing the same
changelogs again, every CREATE gets EEXIST because all the files were
already created in the previous run.

To understand this better, consider a batch of three changelogs:

CHANGELOG.1428375039, CHANGELOG.1428375054 and CHANGELOG.1428375069, each
with 850 creates and 850 data operations recorded. Geo-rep picks all
three files for processing and collects all the entry and data
operations (2550 creates and 2550 data operations). If the worker fails
after processing the 2550 entries, Geo-rep has to process all 2550
entries again (and gets EEXIST for each).
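
To make the failure mode concrete, here is a simplified Python sketch (not
the actual gsyncd code) of the batch flow described above; process_entry_ops,
trigger_rsync and update_stime are hypothetical stand-ins for the real steps.

~~~
# Simplified sketch of the pre-fix batch flow (not the real gsyncd code).
# All changelogs form one batch and stime is updated only at the very end,
# so a crash during rsync forces a full replay in which every CREATE
# returns EEXIST.
def process_batch(changelogs, process_entry_ops, trigger_rsync, update_stime):
    entry_ops, data_ops = [], []
    for cl in changelogs:
        entry_ops.extend(cl["entries"])   # CREATE, MKDIR, MKNOD, LINK, UNLINK
        data_ops.extend(cl["data"])       # files whose data rsync must copy

    process_entry_ops(entry_ops)          # 2550 entries in the example above
    trigger_rsync(data_ops)               # worker may go faulty here ...
    update_stime(changelogs[-1]["ts"])    # ... so stime is never advanced and
                                          # the whole batch is replayed
~~~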

Solution:
Geo-rep should limit the batch size, either based on the number of changelogs
or on the number of records to be processed. Once each smaller, optimal
batch is processed, stime is updated. Even if the worker crashes, Geo-rep
then has to repeat only a small delta instead of the whole set.
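
A rough sketch of that batching idea, using hypothetical stand-ins
(process_batch here runs the entry operations plus rsync for one batch,
update_stime records the checkpoint). Note that the committed patch limits
the batch by changelog file size, whereas this illustration uses a simple
changelog-count limit.

~~~
# Sketch of the proposed batching (process_batch and update_stime are
# hypothetical stand-ins). Changelogs are split into bounded batches and
# stime advances after each one, so a crash replays only the last small batch.
def process_in_batches(changelogs, batch_size, process_batch, update_stime):
    for i in range(0, len(changelogs), batch_size):
        batch = changelogs[i:i + batch_size]
        process_batch(batch)              # entry ops + rsync for this batch
        update_stime(batch[-1]["ts"])     # checkpoint after every batch
~~~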

Workaround:
Not available. :(
Geo-rep has not failed; it is just taking more time to get up to speed. If
the workers do not crash, it will complete the sync. If workers crash in
between, it retries the same batch again, which is time consuming. :(

I will work on this as a priority and will post the patch soon.

--
regards
Aravinda

--- Additional comment from Anand Avati on 2015-04-11 12:02:24 EDT ---

REVIEW: http://review.gluster.org/10202 (geo-rep: Limit number of changelogs to
process in batch) posted (#1) for review on master by Aravinda VK
(avishwan at redhat.com)

--- Additional comment from Anand Avati on 2015-04-27 07:23:44 EDT ---

REVIEW: http://review.gluster.org/10202 (geo-rep: Limit number of changelogs to
process in batch) posted (#2) for review on master by Aravinda VK
(avishwan at redhat.com)

--- Additional comment from Anand Avati on 2015-04-28 13:39:45 EDT ---

COMMIT: http://review.gluster.org/10202 committed in master by Vijay Bellur
(vbellur at redhat.com) 
------
commit 428933dce2c87ea62b4f58af7d260064fade6a8b
Author: Aravinda VK <avishwan at redhat.com>
Date:   Sat Apr 11 20:03:47 2015 +0530

    geo-rep: Limit number of changelogs to process in batch

    Changelog processing is done in batch, for example if 10 changelogs
    available for processing then process all at once. Collect Entry, Meta
    and Data operations separately, All the entry operations like CREATE,
    MKDIR, MKNOD, LINK, UNLINK will be executed first then rsync will be
    triggered for whole batch. Stime will get updated once the complete
    batch is complete.

    In case of a large number of changelogs in a batch, if geo-rep fails after
    the entry operations but before rsync, then on restart it again starts from
    the beginning since stime is not updated. It has to process all the
    changelogs again. While processing the same changelogs again, all CREATEs
    get EEXIST since all the files were created in the previous run. This is a
    big hit for performance.

    With this patch, Geo-rep limits the number of changelogs per batch based on
    changelog file size, so that when geo-rep fails it has to retry only the
    changelogs of the last batch, since stime gets updated after each batch.

    BUG: 1210965
    Change-Id: I844448c4cdcce38a3a2e2cca7c9a50db8f5a9062
    Signed-off-by: Aravinda VK <avishwan at redhat.com>
    Reviewed-on: http://review.gluster.org/10202
    Reviewed-by: Kotresh HR <khiremat at redhat.com>
    Tested-by: NetBSD Build System
    Tested-by: Gluster Build System <jenkins at build.gluster.com>
    Reviewed-by: Vijay Bellur <vbellur at redhat.com>


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1210719
[Bug 1210719] [GSS]Geo-replication very slow, not able to sync all the
files to slave
https://bugzilla.redhat.com/show_bug.cgi?id=1210965
[Bug 1210965] Geo-replication very slow, not able to sync all the files to
slave