[Bugs] [Bug 1287107] New: [georep][sharding] Unable to resume geo-rep session after previous errors

bugzilla at redhat.com bugzilla at redhat.com
Tue Dec 1 14:07:53 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1287107

            Bug ID: 1287107
           Summary: [georep][sharding] Unable to resume geo-rep session
                    after previous errors
           Product: GlusterFS
           Version: mainline
         Component: geo-replication
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: sabose at redhat.com
                CC: bugs at gluster.org, gluster-bugs at redhat.com



Created attachment 1100927
  --> https://bugzilla.redhat.com/attachment.cgi?id=1100927&action=edit
georep-master-log

Description of problem:

A geo-replication session running on a sharded volume started failing because
the slave volume ran out of disk space.

The geo-rep session was stopped, the slave volume's disk space was extended (by
running lvextend on the logical volume backing the brick mount point), and the
geo-replication session was resumed.

However, geo-rep status detail still shows failures, and files do not appear to
be synced.

Status detail output and volume info are included under Additional info below.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Set up a geo-replication session between a master and a slave (the slave
volume has less capacity than the master)
2. Start geo-rep
3. Create more data on the master volume than the slave can hold (a command
sketch of steps 1-3 follows this list)
4. geo-rep status reports failures (visible as the failure count in status
detail)
5. Stop the geo-rep session
6. Increase the capacity of the slave volume (in my case, I extended the brick
LV after adding an additional vdisk to the VM hosting the slave)
7. Start the geo-rep session again
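
A command sketch of steps 1-3, assuming passwordless SSH to the slave is already
set up; the mount point and data size are illustrative assumptions only:

  # steps 1-2: create and start the session
  gluster volume geo-replication data1 10.70.40.112::hc-slavevol create push-pem
  gluster volume geo-replication data1 10.70.40.112::hc-slavevol start

  # step 3: from a client mount of the master, write more data than the slave can hold
  mkdir -p /mnt/data1
  mount -t glusterfs rhsdev9.lab.eng.blr.redhat.com:/data1 /mnt/data1
  dd if=/dev/urandom of=/mnt/data1/bigfile bs=1M count=102400   # ~100 GB; pick a size larger than the slave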

Actual results:

After restarting the session, geo-rep status detail still reports failures
(failure count of 107 on the active node) and files are not synced to the slave.

Expected results:

Once the slave volume capacity is extended and the session restarted,
geo-replication should resume syncing the pending files without further
failures.

Additional info:

# gluster vol geo-replication data1 10.70.40.112::hc-slavevol status detail

MASTER NODE: rhsdev-docker1.lab.eng.blr.redhat.com
  MASTER VOL: data1    MASTER BRICK: /rhgs/data1/b1    SLAVE USER: root
  SLAVE: 10.70.40.112::hc-slavevol    SLAVE NODE: 10.70.40.112
  STATUS: Passive    CRAWL STATUS: N/A    LAST_SYNCED: N/A
  ENTRY: N/A    DATA: N/A    META: N/A    FAILURES: N/A
  CHECKPOINT TIME: N/A    CHECKPOINT COMPLETED: N/A    CHECKPOINT COMPLETION TIME: N/A

MASTER NODE: rhsdev9.lab.eng.blr.redhat.com
  MASTER VOL: data1    MASTER BRICK: /rhgs/data1/b1    SLAVE USER: root
  SLAVE: 10.70.40.112::hc-slavevol    SLAVE NODE: 10.70.40.112
  STATUS: Active    CRAWL STATUS: History Crawl    LAST_SYNCED: 2015-11-26 15:18:56
  ENTRY: 0    DATA: 7226    META: 0    FAILURES: 107
  CHECKPOINT TIME: 2015-12-01 18:09:11    CHECKPOINT COMPLETED: No    CHECKPOINT COMPLETION TIME: N/A

MASTER NODE: rhsdev-docker2.lab.eng.blr.redhat.com
  MASTER VOL: data1    MASTER BRICK: /rhgs/data1/b1    SLAVE USER: root
  SLAVE: 10.70.40.112::hc-slavevol    SLAVE NODE: 10.70.40.112
  STATUS: Passive    CRAWL STATUS: N/A    LAST_SYNCED: N/A
  ENTRY: N/A    DATA: N/A    META: N/A    FAILURES: N/A
  CHECKPOINT TIME: N/A    CHECKPOINT COMPLETED: N/A    CHECKPOINT COMPLETION TIME: N/A

Master volume:
Volume Name: data1
Type: Replicate
Volume ID: 55bd10b0-f05a-446b-a481-6590cc400263
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: rhsdev9.lab.eng.blr.redhat.com:/rhgs/data1/b1
Brick2: rhsdev-docker2.lab.eng.blr.redhat.com:/rhgs/data1/b1
Brick3: rhsdev-docker1.lab.eng.blr.redhat.com:/rhgs/data1/b1
Options Reconfigured:
performance.readdir-ahead: on
performance.low-prio-threads: 32
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on
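
For reference, a sketch of how the sharding options shown above are typically
enabled on the master volume; the geo-replication.indexing and
changelog.changelog options are normally turned on by the geo-rep session
itself, so only the shard settings are shown:

  # enable sharding on the master volume with 512MB shards
  gluster volume set data1 features.shard on
  gluster volume set data1 features.shard-block-size 512MB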

Slave volume:
Volume Name: hc-slavevol
Type: Distribute
Volume ID: 56a3d4d9-51bc-4daf-9257-bd13e10511ae
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.70.40.112:/brick/hc1
Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.readdir-ahead: on
features.shard: on
features.shard-block-size: 512MB
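
A sketch of how a single-brick distribute slave like this could be created and
started, assuming the brick path shown above already exists on the slave node:

  # create and start the slave volume on 10.70.40.112
  gluster volume create hc-slavevol 10.70.40.112:/brick/hc1
  gluster volume start hc-slavevol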
