[Bugs] [Bug 1668118] New: Failure to start geo-replication for tiered volume.
bugzilla@redhat.com
Mon Jan 21 23:21:09 UTC 2019
https://bugzilla.redhat.com/show_bug.cgi?id=1668118
Bug ID: 1668118
Summary: Failure to start geo-replication for tiered volume.
Product: GlusterFS
Version: 5
Hardware: x86_64
OS: Linux
Status: NEW
Component: geo-replication
Severity: high
Assignee: bugs@gluster.org
Reporter: vnosov@stonefly.com
CC: bugs@gluster.org
Target Milestone: ---
Classification: Community
Description of problem: The status of the geo-replication workers on the
master node is reported as "inconsistent" if the master volume is tiered.
Version-Release number of selected component (if applicable):
GlusterFS 5.2, installed from the source code tarball
How reproducible: 100%
Steps to Reproduce:
1. Set up two nodes. One will host the geo-replication master volume, which
must be tiered; the other will host the geo-replication slave volume.
[root@SC-10-10-63-182 log]# glusterfsd --version
glusterfs 5.2
[root@SC-10-10-63-183 log]# glusterfsd --version
glusterfs 5.2
2. On the master node, create a tiered volume (a sketch of the likely
creation commands follows the status output below):
[root@SC-10-10-63-182 log]# gluster volume info master-volume-1
Volume Name: master-volume-1
Type: Tier
Volume ID: aa95df34-f181-456c-aa26-9756b68ed679
Status: Started
Snapshot Count: 0
Number of Bricks: 2
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distribute
Number of Bricks: 1
Brick1: 10.10.60.182:/exports/master-hot-tier/master-volume-1
Cold Tier:
Cold Tier Type : Distribute
Number of Bricks: 1
Brick2: 10.10.60.182:/exports/master-segment-1/master-volume-1
Options Reconfigured:
features.ctr-sql-db-wal-autocheckpoint: 25000
features.ctr-sql-db-cachesize: 12500
cluster.tier-mode: cache
features.ctr-enabled: on
server.allow-insecure: on
performance.quick-read: off
performance.stat-prefetch: off
nfs.addr-namelookup: off
transport.address-family: inet
nfs.disable: on
cluster.enable-shared-storage: disable
snap-activate-on-create: enable
[root@SC-10-10-63-182 log]# gluster volume status master-volume-1
Status of volume: master-volume-1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick 10.10.60.182:/exports/master-hot-tier
/master-volume-1                            62001     0          Y       15690
Cold Bricks:
Brick 10.10.60.182:/exports/master-segment-
1/master-volume-1                           62000     0          Y       9762
Tier Daemon on localhost                    N/A       N/A        Y       15713

Task Status of Volume master-volume-1
------------------------------------------------------------------------------
There are no active volume tasks
[root@SC-10-10-63-182 log]# gluster volume tier master-volume-1 status
Node                 Promoted files       Demoted files        Status               run time in h:m:s
---------            ---------            ---------            ---------            ---------
localhost            0                    0                    in progress          0:3:40
Tiering Migration Functionality: master-volume-1: success
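For reference, a tiered volume with the layout shown above is normally built
by creating the cold-tier volume and then attaching the hot-tier brick. The
exact creation commands were not captured in this report, so the following is
only a sketch using the brick paths shown above:

# on the master node
gluster volume create master-volume-1 10.10.60.182:/exports/master-segment-1/master-volume-1
gluster volume start master-volume-1
gluster volume tier master-volume-1 attach 10.10.60.182:/exports/master-hot-tier/master-volume-1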
3. On the slave node, create the slave volume (a sketch of the creation
commands follows the status output below):
[root@SC-10-10-63-183 log]# gluster volume info slave-volume-1
Volume Name: slave-volume-1
Type: Distribute
Volume ID: 569a340b-35f8-4109-8816-720982b11806
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.10.60.183:/exports/slave-segment-1/slave-volume-1
Options Reconfigured:
server.allow-insecure: on
performance.quick-read: off
performance.stat-prefetch: off
nfs.addr-namelookup: off
transport.address-family: inet
nfs.disable: on
cluster.enable-shared-storage: disable
snap-activate-on-create: enable
[root@SC-10-10-63-183 log]# gluster volume status slave-volume-1
Status of volume: slave-volume-1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.60.183:/exports/slave-segment-1
/slave-volume-1                             62000     0          Y       2532

Task Status of Volume slave-volume-1
------------------------------------------------------------------------------
There are no active volume tasks
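The slave volume is a plain single-brick distribute volume; its creation
(again not captured in the report) would look something like:

# on the slave node
gluster volume create slave-volume-1 10.10.60.183:/exports/slave-segment-1/slave-volume-1
gluster volume start slave-volume-1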
4. Set up passwordless SSH access from the master node to the slave node (a
sketch of the usual setup follows the log excerpt below):
SSH from 182 to 183:
20660 01/21/2019 13:58:54.930122501 1548107934 command: /usr/bin/ssh nasgorep@10.10.60.183 /bin/pwd
20660 01/21/2019 13:58:55.021906148 1548107935 status=0 /usr/bin/ssh nasgorep@10.10.60.183 /bin/pwd
20694 01/21/2019 13:58:56.169890800 1548107936 command: /usr/bin/ssh -q -oConnectTimeout=5 nasgorep@10.10.60.183 /bin/pwd 2>&1
20694 01/21/2019 13:58:56.256032202 1548107936 status=0 /usr/bin/ssh -q -oConnectTimeout=5 nasgorep@10.10.60.183 /bin/pwd 2>&1
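The audit log above only shows the connectivity checks that succeeded. For a
non-root geo-replication user such as nasgorep, the usual preparation is a
key pair on the master plus a mountbroker registration on the slave; a sketch
assuming the standard glusterfs-geo-replication tooling (these steps were not
captured in the report):

# on the master node: generate a key and push it to the slave user
ssh-keygen
ssh-copy-id nasgorep@10.10.60.183

# on the slave node: register the mountbroker root, group, and slave volume
gluster-mountbroker setup /var/mountbroker-root nasgorep
gluster-mountbroker add slave-volume-1 nasgorep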
5. Initialize geo-replication from the master volume to the slave volume (the
commands are restated after the log excerpt below):
[root@SC-10-10-63-182 log]# vi /var/log/glusterfs/cmd_history.log
[2019-01-21 21:59:08.942567] : system:: execute gsec_create : SUCCESS
[2019-01-21 21:59:42.722194] : volume geo-replication master-volume-1 nasgorep@10.10.60.183::slave-volume-1 create push-pem : SUCCESS
[2019-01-21 21:59:49.527353] : volume geo-replication master-volume-1 nasgorep@10.10.60.183::slave-volume-1 start : SUCCESS
[2019-01-21 21:59:55.636198] : volume geo-replication master-volume-1 nasgorep@10.10.60.183::slave-volume-1 status detail : SUCCESS
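Stripped of timestamps, the sequence recorded in cmd_history.log corresponds
to these commands on the master node:

gluster system:: execute gsec_create
gluster volume geo-replication master-volume-1 nasgorep@10.10.60.183::slave-volume-1 create push-pem
gluster volume geo-replication master-volume-1 nasgorep@10.10.60.183::slave-volume-1 start
gluster volume geo-replication master-volume-1 nasgorep@10.10.60.183::slave-volume-1 status detail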
6. Check the status of the geo-replication session:
Actual results:
[root@SC-10-10-63-183 log]# /usr/sbin/gluster-mountbroker status
+-----------+-------------+---------------------------+--------------+--------------------------+
|    NODE   | NODE STATUS |         MOUNT ROOT        |     GROUP    |           USERS          |
+-----------+-------------+---------------------------+--------------+--------------------------+
| localhost |      UP     | /var/mountbroker-root(OK) | nasgorep(OK) | nasgorep(slave-volume-1) |
+-----------+-------------+---------------------------+--------------+--------------------------+
[root@SC-10-10-63-182 log]# gluster volume geo-replication master-volume-1 nasgorep@10.10.60.183::slave-volume-1 status

MASTER NODE     MASTER VOL         MASTER BRICK                                 SLAVE USER    SLAVE                                    SLAVE NODE    STATUS     CRAWL STATUS    LAST_SYNCED
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
10.10.60.182    master-volume-1    /exports/master-hot-tier/master-volume-1    nasgorep      nasgorep@10.10.60.183::slave-volume-1    N/A           Stopped    N/A             N/A
10.10.60.182    master-volume-1    /exports/master-segment-1/master-volume-1   nasgorep      nasgorep@10.10.60.183::slave-volume-1    N/A           Stopped    N/A             N/A
Expected results:
The status of the geo-replication workers on the master node should be
"Active".
Additional info:
The contents of
/var/log/glusterfs/geo-replication/master-volume-1_10.10.60.183_slave-volume-1/gsyncd.log
on the master node explain what went wrong:
[root@SC-10-10-63-182 log]# vi /var/log/glusterfs/geo-replication/master-volume-1_10.10.60.183_slave-volume-1/gsyncd.log
[2019-01-21 21:59:39.347943] W [gsyncd(config-get):304:main] <top>: Session
config file not exists, using the default config
path=/var/lib/glusterd/geo-replication/master-volume-1_10.10.60.183_slave-volume-1/gsyncd.conf
[2019-01-21 21:59:42.438145] I [gsyncd(monitor-status):308:main] <top>: Using
session config file
path=/var/lib/glusterd/geo-replication/master-volume-1_10.10.60.183_slave-volume-1/gsyncd.conf
[2019-01-21 21:59:42.454929] I
[subcmds(monitor-status):29:subcmd_monitor_status] <top>: Monitor Status Change
status=Created
[2019-01-21 21:59:48.756702] I [gsyncd(config-get):308:main] <top>: Using
session config file
path=/var/lib/glusterd/geo-replication/master-volume-1_10.10.60.183_slave-volume-1/gsyncd.conf
[2019-01-21 21:59:49.4720] I [gsyncd(config-get):308:main] <top>: Using session
config file
path=/var/lib/glusterd/geo-replication/master-volume-1_10.10.60.183_slave-volume-1/gsyncd.conf
[2019-01-21 21:59:49.239733] I [gsyncd(config-get):308:main] <top>: Using
session config file
path=/var/lib/glusterd/geo-replication/master-volume-1_10.10.60.183_slave-volume-1/gsyncd.conf
[2019-01-21 21:59:49.475193] I [gsyncd(monitor):308:main] <top>: Using session
config file
path=/var/lib/glusterd/geo-replication/master-volume-1_10.10.60.183_slave-volume-1/gsyncd.conf
[2019-01-21 21:59:49.868150] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change status=Initializing...
[2019-01-21 21:59:49.868396] I [monitor(monitor):157:monitor] Monitor: starting
gsyncd worker slave_node=10.10.60.183
brick=/exports/master-segment-1/master-volume-1
[2019-01-21 21:59:49.871593] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change status=Initializing...
[2019-01-21 21:59:49.871963] I [monitor(monitor):157:monitor] Monitor: starting
gsyncd worker slave_node=10.10.60.183
brick=/exports/master-hot-tier/master-volume-1
[2019-01-21 21:59:50.4395] I [monitor(monitor):268:monitor] Monitor: worker
died before establishing connection
brick=/exports/master-segment-1/master-volume-1
[2019-01-21 21:59:50.7447] I [monitor(monitor):268:monitor] Monitor: worker
died before establishing connection
brick=/exports/master-hot-tier/master-volume-1
[2019-01-21 21:59:50.8415] I [gsyncd(agent
/exports/master-segment-1/master-volume-1):308:main] <top>: Using session
config file
path=/var/lib/glusterd/geo-replication/master-volume-1_10.10.60.183_slave-volume-1/gsyncd.conf
[2019-01-21 21:59:50.10383] I [gsyncd(agent
/exports/master-hot-tier/master-volume-1):308:main] <top>: Using session config
file
path=/var/lib/glusterd/geo-replication/master-volume-1_10.10.60.183_slave-volume-1/gsyncd.conf
[2019-01-21 21:59:50.14039] I [repce(agent
/exports/master-segment-1/master-volume-1):97:service_loop] RepceServer:
terminating on reaching EOF.
[2019-01-21 21:59:50.15556] I [changelogagent(agent
/exports/master-hot-tier/master-volume-1):72:__init__] ChangelogAgent: Agent
listining...
[2019-01-21 21:59:50.15964] I [repce(agent
/exports/master-hot-tier/master-volume-1):97:service_loop] RepceServer:
terminating on reaching EOF.
[2019-01-21 21:59:55.141768] I [gsyncd(config-get):308:main] <top>: Using
session config file
path=/var/lib/glusterd/geo-replication/master-volume-1_10.10.60.183_slave-volume-1/gsyncd.conf
[2019-01-21 21:59:55.380496] I [gsyncd(status):308:main] <top>: Using session
config file
path=/var/lib/glusterd/geo-replication/master-volume-1_10.10.60.183_slave-volume-1/gsyncd.conf
[2019-01-21 21:59:55.625045] I [gsyncd(status):308:main] <top>: Using session
config file
path=/var/lib/glusterd/geo-replication/master-volume-1_10.10.60.183_slave-volume-1/gsyncd.conf
[2019-01-21 22:00:00.66032] I [gsyncdstatus(monitor):248:set_worker_status]
GeorepStatus: Worker Status Change status=inconsistent
[2019-01-21 22:00:00.66289] E [syncdutils(monitor):338:log_raise_exception]
<top>: FAIL:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 368, in
twrap
tf(*aargs)
File "/usr/libexec/glusterfs/python/syncdaemon/monitor.py", line 339, in wmon
slave_host, master, suuid, slavenodes)
TypeError: 'int' object is not iterable
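The "'int' object is not iterable" message is consistent with a
tuple-unpacking failure: if a function that normally returns a pair instead
takes an early-return path (for example, for a tier brick) and hands back a
bare integer, the unpacking assignment in the caller raises exactly this
TypeError. A minimal, hypothetical Python sketch of that pattern, not the
actual gsyncd source:

def monitor_worker(brick, on_tier):
    # Hypothetical early-return path that yields a bare int...
    if on_tier:
        return -1
    # ...while the normal path yields a (pid, agent) pair.
    return (12345, "agent")

# The caller unpacks the result, which fails on the int path:
cpid, agent = monitor_worker("/exports/master-hot-tier/master-volume-1", True)
# raises: TypeError: 'int' object is not iterable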
A similar test on GlusterFS 3.12.14 does not show this failure.