[Bugs] [Bug 1577862] New: [geo-rep]: Upgrade fails, session in FAULTY state

Mon May 14 09:59:31 UTC 2018

https://bugzilla.redhat.com/show_bug.cgi?id=1577862

            Bug ID: 1577862
           Summary: [geo-rep]:  Upgrade fails, session in FAULTY state
           Product: GlusterFS
           Version: 3.12
         Component: geo-replication
          Keywords: Regression
          Severity: urgent
          Assignee: bugs at gluster.org
          Reporter: khiremat at redhat.com
                CC: amukherj at redhat.com, bugs at gluster.org,
                    csaba at redhat.com, rallan at redhat.com,
                    rhinduja at redhat.com, rhs-bugs at redhat.com,
                    sankarshan at redhat.com, storage-qa-internal at redhat.com
        Depends On: 1569490, 1575490
            Blocks: 1474012, 1503137

+++ This bug was initially created as a clone of Bug #1575490 +++

Description of problem:
=======================
While upgrading from GlusterFS 3.8 to 3.12, the geo-replication session
went FAULTY, with only one worker remaining ACTIVE.

[root at dhcp42-53 master]# gluster volume geo-replication master 10.70.42.164::slave status

MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                  SLAVE NODE      STATUS    CRAWL STATUS     LAST_SYNCED
-------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.53     master        /rhs/brick1/b1    root          10.70.42.164::slave    N/A             Faulty    N/A              N/A
10.70.42.53     master        /rhs/brick2/b4    root          10.70.42.164::slave    N/A             Faulty    N/A              N/A
10.70.42.138    master        /rhs/brick1/b3    root          10.70.42.164::slave    10.70.42.164    Active    History Crawl    N/A
10.70.42.138    master        /rhs/brick2/b6    root          10.70.42.164::slave    N/A             Faulty    N/A              N/A
10.70.42.160    master        /rhs/brick1/b2    root          10.70.42.164::slave    N/A             Faulty    N/A              N/A
10.70.42.160    master        /rhs/brick2/b5    root          10.70.42.164::slave    N/A             Faulty    N/A              N/A

Traceback in geo-rep logs:
--------------------------------
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 210, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 802, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1676, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 597, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1470, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1370, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1204, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1123, in process_change
    entry_stime_to_update[0])
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncdstatus.py", line 200, in set_field
    return self._update(merger)
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncdstatus.py", line 161, in _update
    data = mergerfunc(data)
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncdstatus.py", line 194, in merger
    if data[key] == value:
KeyError: 'last_synced_entry'

Version-Release number of selected component (if applicable):
=============================================================

How reproducible:
=================
1/1

Actual results:
===============
Session is FAULTY.

Expected results:
=================
Session should not be FAULTY.

--- Additional comment from Worker Ant on 2018-05-07 02:06:22 EDT ---

REVIEW: https://review.gluster.org/19969 (geo-rep: Fix upgrade issue) posted
(#1) for review on master by Kotresh HR

--- Additional comment from Worker Ant on 2018-05-07 06:17:41 EDT ---

COMMIT: https://review.gluster.org/19969 committed in master by "Aravinda VK"
<avishwan at redhat.com> with a commit message- geo-rep: Fix upgrade issue

Cause and Analysis:
The current version records the last synced changelog
for entry operations, so that entry operations already
processed in a batch are not re-processed when geo-rep
crashes or restarts. Previous versions did not have
this marker.

The marker is kept in a dictionary under the key
'last_synced_entry', and the dictionary is persisted
to the status file. A status file written by a previous
version lacks this key, so after upgrading to the
current version the lookup failed with KeyError.

Solution:
Load the dictionary with the default keys first, which
cover every key including the latest ones, and then
load the values from the status file on top of them,
rather than the reverse.

fixes: bz#1575490
Change-Id: Ic654e6f9a3c97f616761f1362f890352a2186fb4
Signed-off-by: Kotresh HR <khiremat at redhat.com>
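
The cause and fix described in the commit message above can be illustrated
with a minimal sketch. This is not the code from gsyncdstatus.py: the names
DEFAULT_STATUS, load_status_without_defaults and load_status_with_defaults,
the reduced key set, and the sample status contents are all assumptions made
for illustration. Only the key 'last_synced_entry' and the comparison
`data[key] == value` come from the report.

```python
import json

# Hypothetical defaults; the real status dictionary has more keys.
DEFAULT_STATUS = {
    "worker_status": "N/A",
    "crawl_status": "N/A",
    "last_synced": 0,
    "last_synced_entry": 0,  # marker that older versions never wrote
}


def load_status_without_defaults(raw):
    """Assumed failure mode: use the persisted dictionary as-is."""
    return json.loads(raw)


def load_status_with_defaults(raw):
    """Approach from the commit message: defaults first, file values on top."""
    data = dict(DEFAULT_STATUS)   # copy so the defaults stay untouched
    data.update(json.loads(raw))  # persisted values override the defaults
    return data


def set_field(data, key, value):
    """Same comparison as in the traceback; True means the value changed."""
    if data[key] == value:
        return False
    data[key] = value
    return True


# Sample status contents as an older version might have persisted them
# (invented values, no 'last_synced_entry' key).
old_file = json.dumps({"worker_status": "Active", "last_synced": 100})

try:
    set_field(load_status_without_defaults(old_file), "last_synced_entry", 5)
except KeyError as err:
    print("KeyError:", err)

status = load_status_with_defaults(old_file)
print(set_field(status, "last_synced_entry", 5), status["last_synced_entry"])
```

Copying the defaults before calling update() keeps values already present in
the status file, while any key added in a later version starts at its default
instead of being absent.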

Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1474012
[Bug 1474012] [geo-rep]: Incorrect last sync "0" during hystory crawl after
upgrade/stop-start
https://bugzilla.redhat.com/show_bug.cgi?id=1569490
[Bug 1569490] [geo-rep]: in-service upgrade fails, session in FAULTY state
https://bugzilla.redhat.com/show_bug.cgi?id=1575490
[Bug 1575490] [geo-rep]:  Upgrade fails, session in FAULTY state