[Bugs] [Bug 1500853] [geo-rep]: Incorrect last sync "0" during history crawl after upgrade/stop-start
bugzilla at redhat.com
Wed Oct 11 15:18:06 UTC 2017
https://bugzilla.redhat.com/show_bug.cgi?id=1500853
Kotresh HR <khiremat at redhat.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|bugs at gluster.org          |khiremat at redhat.com
--- Comment #1 from Kotresh HR <khiremat at redhat.com> ---
Description of problem:
=======================
Observed a scenario where the last sync became zero after an upgrade/reboot,
during history crawl. Before the upgrade started, the crawl status was
"Changelog Crawl" with a last sync time of "2017-07-21 12:51:55". However,
after the upgrade and after starting geo-rep again, the last sync for a few
workers was shown as "0". The corresponding status files also show "0".
[root at dhcp42-79 ~]# gluster volume geo-replication master 10.70.41.209::slave status

MASTER NODE     MASTER VOL    MASTER BRICK       SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS     LAST_SYNCED
------------------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.79     master        /rhs/brick1/b1     root          10.70.41.209::slave    10.70.41.209    Active     History Crawl    2017-07-21 12:51:55
10.70.42.79     master        /rhs/brick2/b5     root          10.70.41.209::slave    10.70.41.209    Active     History Crawl    2017-07-21 12:51:55
10.70.42.79     master        /rhs/brick3/b9     root          10.70.41.209::slave    10.70.41.209    Active     History Crawl    2017-07-21 12:51:55
10.70.42.74     master        /rhs/brick1/b3     root          10.70.41.209::slave    10.70.41.202    Active     History Crawl    N/A
10.70.42.74     master        /rhs/brick2/b7     root          10.70.41.209::slave    10.70.41.202    Active     History Crawl    N/A
10.70.42.74     master        /rhs/brick3/b11    root          10.70.41.209::slave    10.70.41.202    Active     History Crawl    N/A
10.70.41.217    master        /rhs/brick1/b4     root          10.70.41.209::slave    10.70.42.177    Passive    N/A              N/A
10.70.41.217    master        /rhs/brick2/b8     root          10.70.41.209::slave    10.70.42.177    Passive    N/A              N/A
10.70.41.217    master        /rhs/brick3/b12    root          10.70.41.209::slave    10.70.42.177    Passive    N/A              N/A
10.70.43.210    master        /rhs/brick1/b2     root          10.70.41.209::slave    10.70.41.194    Passive    N/A              N/A
10.70.43.210    master        /rhs/brick2/b6     root          10.70.41.209::slave    10.70.41.194    Passive    N/A              N/A
10.70.43.210    master        /rhs/brick3/b10    root          10.70.41.209::slave    10.70.41.194    Passive    N/A              N/A
[root at dhcp42-79 ~]#
[root at dhcp42-79 ~]# date
Sun Jul 23 11:04:25 IST 2017
[root at dhcp42-79 ~]#
[root at dhcp42-74 ~]# cd /var/lib/glusterd/geo-replication/master_10.70.41.209_slave/
[root at dhcp42-74 master_10.70.41.209_slave]# ls
brick_%2Frhs%2Fbrick1%2Fb3.status   brick_%2Frhs%2Fbrick2%2Fb7.status
brick_%2Frhs%2Fbrick3%2Fb11.status  gsyncd.conf  monitor.pid  monitor.status
[root at dhcp42-74 master_10.70.41.209_slave]# cat brick_%2Frhs%2Fbrick1%2Fb3.status
{"checkpoint_time": 0, "last_synced": 0, "checkpoint_completed": "N/A", "meta": 0,
"failures": 0, "entry": 583, "slave_node": "10.70.41.202", "data": 2083,
"worker_status": "Active", "crawl_status": "History Crawl",
"checkpoint_completion_time": 0}
[root at dhcp42-74 master_10.70.41.209_slave]# cat brick_%2Frhs%2Fbrick2%2Fb7.status
{"checkpoint_time": 0, "last_synced": 0, "checkpoint_completed": "N/A", "meta": 0,
"failures": 0, "entry": 584, "slave_node": "10.70.41.202", "data": 2059,
"worker_status": "Active", "crawl_status": "History Crawl",
"checkpoint_completion_time": 0}
[root at dhcp42-74 master_10.70.41.209_slave]# cat brick_%2Frhs%2Fbrick3%2Fb11.status
{"checkpoint_time": 0, "last_synced": 0, "checkpoint_completed": "N/A", "meta": 0,
"failures": 0, "entry": 586, "slave_node": "10.70.41.202", "data": 2101,
"worker_status": "Active", "crawl_status": "History Crawl",
"checkpoint_completion_time": 0}
[root at dhcp42-74 master_10.70.41.209_slave]# cat monitor.status
Started
[root at dhcp42-74 master_10.70.41.209_slave]#
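The per-brick status files above are plain JSON, so "last_synced" can be
inspected directly. Below is a minimal sketch (not gsyncd code; the path is
copied from the session above and the assumption that a non-zero
"last_synced" is a Unix timestamp is mine) that prints the field the way the
CLI output suggests: a readable timestamp when non-zero, "N/A" when it is
still 0, which is the buggy state described in this report.

#!/usr/bin/env python
# Minimal sketch: read one per-brick geo-rep status file and report
# last_synced. Assumes a non-zero value is a Unix timestamp.
import json
from datetime import datetime

# Path taken from the session above; adjust for your own session/brick.
STATUS_FILE = ("/var/lib/glusterd/geo-replication/"
               "master_10.70.41.209_slave/brick_%2Frhs%2Fbrick1%2Fb3.status")

with open(STATUS_FILE) as f:
    status = json.load(f)

last_synced = status.get("last_synced", 0)
if last_synced == 0:
    # Buggy state: status file was rewritten with 0, CLI shows "N/A".
    print("LAST_SYNCED: N/A (last_synced is 0 in the status file)")
else:
    print("LAST_SYNCED: " +
          datetime.fromtimestamp(last_synced).strftime("%Y-%m-%d %H:%M:%S"))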
The status remained the same for more than 10 minutes, until one batch had synced:
MASTER NODE     MASTER VOL    MASTER BRICK       SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS     LAST_SYNCED
------------------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.79     master        /rhs/brick1/b1     root          10.70.41.209::slave    10.70.41.209    Active     History Crawl    2017-07-21 12:51:55
10.70.42.79     master        /rhs/brick2/b5     root          10.70.41.209::slave    10.70.41.209    Active     History Crawl    2017-07-21 12:51:55
10.70.42.79     master        /rhs/brick3/b9     root          10.70.41.209::slave    10.70.41.209    Active     History Crawl    2017-07-21 12:51:55
10.70.41.217    master        /rhs/brick1/b4     root          10.70.41.209::slave    10.70.42.177    Passive    N/A              N/A
10.70.41.217    master        /rhs/brick2/b8     root          10.70.41.209::slave    10.70.42.177    Passive    N/A              N/A
10.70.41.217    master        /rhs/brick3/b12    root          10.70.41.209::slave    10.70.42.177    Passive    N/A              N/A
10.70.42.74     master        /rhs/brick1/b3     root          10.70.41.209::slave    10.70.41.202    Active     History Crawl    N/A
10.70.42.74     master        /rhs/brick2/b7     root          10.70.41.209::slave    10.70.41.202    Active     History Crawl    N/A
10.70.42.74     master        /rhs/brick3/b11    root          10.70.41.209::slave    10.70.41.202    Active     History Crawl    N/A
10.70.43.210    master        /rhs/brick1/b2     root          10.70.41.209::slave    10.70.41.194    Passive    N/A              N/A
10.70.43.210    master        /rhs/brick2/b6     root          10.70.41.209::slave    10.70.41.194    Passive    N/A              N/A
10.70.43.210    master        /rhs/brick3/b10    root          10.70.41.209::slave    10.70.41.194    Passive    N/A              N/A
Sun Jul 23 11:14:50 IST 2017
Version-Release number of selected component (if applicable):
=============================================================
mainline
How reproducible:
=================
I remember seeing this only once before, upon a stop/start. I have tried the
upgrade twice and seen this once.
Steps to Reproduce:
===================
No specific steps; the systems were upgraded, and as part of the upgrade
geo-replication was stopped and started.
Actual results:
===============
Last sync is "0"
Expected results:
=================
Last sync should be what it was before geo-rep was stopped. It looks like the
brick status file was overwritten with "0" as the last synced time.
--- Additional comment from Worker Ant on 2017-10-10 08:34:12 EDT ---
REVIEW: https://review.gluster.org/18468 (geo-rep: Fix passive brick's last
sync time) posted (#1) for review on master by Kotresh HR (khiremat at redhat.com)
--- Additional comment from Worker Ant on 2017-10-11 11:16:39 EDT ---
COMMIT: https://review.gluster.org/18468 committed in master by Kotresh HR
(khiremat at redhat.com)
------
commit f18a47ee7e6e06c9a9a8893aef7957f23a18de53
Author: Kotresh HR <khiremat at redhat.com>
Date: Tue Oct 10 08:25:19 2017 -0400
geo-rep: Fix passive brick's last sync time
Passive brick's stime was not updated to the
status file immediately after updating the brick
root. As a result the last sync time was showing
'0' until it finishes first crawl if passive
worker becomes active after restart. Fix is to
the status file immediately after updating
the brick root.
Change-Id: I248339497303bad20b7f5a1d42ab44a1fe6bca99
BUG: 1500346
Signed-off-by: Kotresh HR <khiremat at redhat.com>
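As a rough illustration of the ordering change the commit message above
describes, the sketch below persists the stime into the per-brick status file
as soon as a formerly passive worker takes over, rather than waiting for the
first crawl batch to finish. All names here (BrickStatus, become_active, the
demo path) are hypothetical stand-ins, not gsyncd's actual API.

# Hypothetical sketch of the fix's ordering; illustrative names only.
import json
import time


class BrickStatus(object):
    """Tiny stand-in for a per-brick .status file like the ones shown above."""

    def __init__(self, path):
        self.path = path

    def set_last_synced(self, stime):
        try:
            with open(self.path) as f:
                data = json.load(f)
        except (IOError, ValueError):
            data = {}
        data["last_synced"] = stime
        with open(self.path, "w") as f:
            json.dump(data, f)


def become_active(status, brick_root_stime):
    """Called when a formerly passive worker takes over after a restart."""
    # The fix: persist the stime carried over from the brick root into the
    # status file immediately, so the CLI reports the pre-restart last-synced
    # time instead of 0/N/A while the history crawl is still catching up.
    status.set_last_synced(brick_root_stime)
    # ... the history crawl would then start from brick_root_stime ...


if __name__ == "__main__":
    status = BrickStatus("/tmp/brick_demo.status")
    become_active(status, int(time.time()) - 3600)  # pretend stime is 1h old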
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.