[Bugs] [Bug 1500346] [geo-rep]: Incorrect last sync "0" during history crawl after upgrade/stop-start
bugzilla at redhat.com
Tue Oct 10 12:33:25 UTC 2017
https://bugzilla.redhat.com/show_bug.cgi?id=1500346
Kotresh HR <khiremat at redhat.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
Assignee|bugs at gluster.org |khiremat at redhat.com
--- Comment #1 from Kotresh HR <khiremat at redhat.com> ---
Description of problem:
=======================
Observed a scenario where the last sync time became zero during the history
crawl after an upgrade/reboot. Before the upgrade started, the crawl status was
"Changelog Crawl" with a last sync time of "2017-07-21 12:51:55". After the
upgrade and restart of geo-rep, however, the last sync for a few workers was
shown as "0", and the corresponding brick status files also record "0":
[root@dhcp42-79 ~]# gluster volume geo-replication master 10.70.41.209::slave status

MASTER NODE     MASTER VOL    MASTER BRICK       SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS     LAST_SYNCED
----------------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.79     master        /rhs/brick1/b1     root          10.70.41.209::slave    10.70.41.209    Active     History Crawl    2017-07-21 12:51:55
10.70.42.79     master        /rhs/brick2/b5     root          10.70.41.209::slave    10.70.41.209    Active     History Crawl    2017-07-21 12:51:55
10.70.42.79     master        /rhs/brick3/b9     root          10.70.41.209::slave    10.70.41.209    Active     History Crawl    2017-07-21 12:51:55
10.70.42.74     master        /rhs/brick1/b3     root          10.70.41.209::slave    10.70.41.202    Active     History Crawl    N/A
10.70.42.74     master        /rhs/brick2/b7     root          10.70.41.209::slave    10.70.41.202    Active     History Crawl    N/A
10.70.42.74     master        /rhs/brick3/b11    root          10.70.41.209::slave    10.70.41.202    Active     History Crawl    N/A
10.70.41.217    master        /rhs/brick1/b4     root          10.70.41.209::slave    10.70.42.177    Passive    N/A              N/A
10.70.41.217    master        /rhs/brick2/b8     root          10.70.41.209::slave    10.70.42.177    Passive    N/A              N/A
10.70.41.217    master        /rhs/brick3/b12    root          10.70.41.209::slave    10.70.42.177    Passive    N/A              N/A
10.70.43.210    master        /rhs/brick1/b2     root          10.70.41.209::slave    10.70.41.194    Passive    N/A              N/A
10.70.43.210    master        /rhs/brick2/b6     root          10.70.41.209::slave    10.70.41.194    Passive    N/A              N/A
10.70.43.210    master        /rhs/brick3/b10    root          10.70.41.209::slave    10.70.41.194    Passive    N/A              N/A

[root@dhcp42-79 ~]# date
Sun Jul 23 11:04:25 IST 2017
[root@dhcp42-79 ~]#
[root@dhcp42-74 ~]# cd /var/lib/glusterd/geo-replication/master_10.70.41.209_slave/
[root@dhcp42-74 master_10.70.41.209_slave]# ls
brick_%2Frhs%2Fbrick1%2Fb3.status   brick_%2Frhs%2Fbrick2%2Fb7.status
brick_%2Frhs%2Fbrick3%2Fb11.status  gsyncd.conf  monitor.pid  monitor.status
[root@dhcp42-74 master_10.70.41.209_slave]# cat brick_%2Frhs%2Fbrick1%2Fb3.status
{"checkpoint_time": 0, "last_synced": 0, "checkpoint_completed": "N/A", "meta": 0,
 "failures": 0, "entry": 583, "slave_node": "10.70.41.202", "data": 2083,
 "worker_status": "Active", "crawl_status": "History Crawl",
 "checkpoint_completion_time": 0}
[root@dhcp42-74 master_10.70.41.209_slave]# cat brick_%2Frhs%2Fbrick2%2Fb7.status
{"checkpoint_time": 0, "last_synced": 0, "checkpoint_completed": "N/A", "meta": 0,
 "failures": 0, "entry": 584, "slave_node": "10.70.41.202", "data": 2059,
 "worker_status": "Active", "crawl_status": "History Crawl",
 "checkpoint_completion_time": 0}
[root@dhcp42-74 master_10.70.41.209_slave]# cat brick_%2Frhs%2Fbrick3%2Fb11.status
{"checkpoint_time": 0, "last_synced": 0, "checkpoint_completed": "N/A", "meta": 0,
 "failures": 0, "entry": 586, "slave_node": "10.70.41.202", "data": 2101,
 "worker_status": "Active", "crawl_status": "History Crawl",
 "checkpoint_completion_time": 0}
[root@dhcp42-74 master_10.70.41.209_slave]# cat monitor.status
Started
[root@dhcp42-74 master_10.70.41.209_slave]#
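For quick triage, the per-brick status files can be checked programmatically.
A minimal sketch in Python (the session directory and JSON keys are taken from
the output above; adjust the path for other master/slave sessions):

import glob
import json
import os

# Session directory as shown in the transcript above.
SESSION_DIR = "/var/lib/glusterd/geo-replication/master_10.70.41.209_slave"

for path in glob.glob(os.path.join(SESSION_DIR, "brick_*.status")):
    with open(path) as f:
        status = json.load(f)
    # A last_synced of 0 is rendered as "N/A" in the CLI status output.
    if status.get("last_synced") == 0:
        print(f"{os.path.basename(path)}: last_synced reset to 0 "
              f"(crawl_status={status.get('crawl_status')})")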
The status remained the same for more than 10 minutes, until at least one
batch had synced (a polling sketch follows the output below):
MASTER NODE     MASTER VOL    MASTER BRICK       SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS     LAST_SYNCED
----------------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.79     master        /rhs/brick1/b1     root          10.70.41.209::slave    10.70.41.209    Active     History Crawl    2017-07-21 12:51:55
10.70.42.79     master        /rhs/brick2/b5     root          10.70.41.209::slave    10.70.41.209    Active     History Crawl    2017-07-21 12:51:55
10.70.42.79     master        /rhs/brick3/b9     root          10.70.41.209::slave    10.70.41.209    Active     History Crawl    2017-07-21 12:51:55
10.70.41.217    master        /rhs/brick1/b4     root          10.70.41.209::slave    10.70.42.177    Passive    N/A              N/A
10.70.41.217    master        /rhs/brick2/b8     root          10.70.41.209::slave    10.70.42.177    Passive    N/A              N/A
10.70.41.217    master        /rhs/brick3/b12    root          10.70.41.209::slave    10.70.42.177    Passive    N/A              N/A
10.70.42.74     master        /rhs/brick1/b3     root          10.70.41.209::slave    10.70.41.202    Active     History Crawl    N/A
10.70.42.74     master        /rhs/brick2/b7     root          10.70.41.209::slave    10.70.41.202    Active     History Crawl    N/A
10.70.42.74     master        /rhs/brick3/b11    root          10.70.41.209::slave    10.70.41.202    Active     History Crawl    N/A
10.70.43.210    master        /rhs/brick1/b2     root          10.70.41.209::slave    10.70.41.194    Passive    N/A              N/A
10.70.43.210    master        /rhs/brick2/b6     root          10.70.41.209::slave    10.70.41.194    Passive    N/A              N/A
10.70.43.210    master        /rhs/brick3/b10    root          10.70.41.209::slave    10.70.41.194    Passive    N/A              N/A
Sun Jul 23 11:14:50 IST 2017
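Rather than re-running the status command by hand, the transition can be
watched directly. A small polling sketch (the file path is from the session
above; the helper is hypothetical, not part of gsyncd):

import json
import time

# Path from the transcript above; %2F is the URL-encoded brick path separator.
STATUS_FILE = ("/var/lib/glusterd/geo-replication/"
               "master_10.70.41.209_slave/brick_%2Frhs%2Fbrick1%2Fb3.status")

def wait_for_sync(path, interval=10):
    """Poll a brick status file until last_synced becomes nonzero."""
    while True:
        with open(path) as f:
            last_synced = json.load(f).get("last_synced", 0)
        if last_synced:
            return last_synced
        time.sleep(interval)

print(f"first batch synced at epoch {wait_for_sync(STATUS_FILE)}")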
Version-Release number of selected component (if applicable):
=============================================================
mainline
How reproducible:
=================
Intermittent. This was seen once before on a stop/start; the upgrade has been
tried twice, and the issue was seen once.
Steps to Reproduce:
===================
No specific steps; the systems were upgraded, and as part of the upgrade
geo-replication was stopped and started.
Actual results:
===============
Last sync is shown as "0".
Expected results:
=================
Last sync should be what it was before geo-rep was stopped. It looks like the
brick status file was overwritten with "0" as the last synced time.
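A minimal sketch of the suspected failure mode, purely illustrative (these
helpers are hypothetical and not gsyncd's actual code): on worker start the
status file is rewritten from a defaults dict, clobbering the persisted
last_synced, whereas the defaults should only fill in keys that are missing:

import json
import os

# Default field values; last_synced defaults to 0, shown as "N/A" by the CLI.
DEFAULTS = {"checkpoint_time": 0, "last_synced": 0, "meta": 0,
            "failures": 0, "entry": 0, "data": 0,
            "checkpoint_completed": "N/A", "checkpoint_completion_time": 0}

def reset_status(path):
    # Suspected buggy behaviour: the file is rewritten from the defaults
    # on worker start, so last_synced comes back as 0 after stop/start.
    with open(path, "w") as f:
        json.dump(DEFAULTS, f)

def init_status(path):
    # Expected behaviour: defaults only fill in missing keys; persisted
    # values such as last_synced survive the restart.
    merged = dict(DEFAULTS)
    if os.path.exists(path):
        with open(path) as f:
            merged.update(json.load(f))
    with open(path, "w") as f:
        json.dump(merged, f)

With the merge variant, a stop/start would leave last_synced intact.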
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.