[Bugs] [Bug 1186286] New: Geo-Replication Faulty state

bugzilla at redhat.com bugzilla at redhat.com
Tue Jan 27 11:05:02 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1186286

            Bug ID: 1186286
           Summary: Geo-Replication Faulty state
           Product: GlusterFS
           Version: 3.6.1
         Component: geo-replication
          Assignee: bugs at gluster.org
          Reporter: pierre-marie.janvre at agoda.com
                CC: bugs at gluster.org, gluster-bugs at redhat.com



Description of problem: geo-replication goes into a Faulty state when starting it


Version-Release number of selected component (if applicable): 3.6.1


How reproducible: Every time I try


Steps to Reproduce:
1. gluster volume geo-replication master_volume root@slave_node1::slave_volume create push-pem
2. gluster volume geo-replication master_volume root@slave_node1::slave_volume start
3. gluster volume geo-replication master_volume root@slave_node1::slave_volume status

Actual results:
Faulty

Expected results:
OK

Additional info:
Here is the setup:

Datacenter A
2 nodes:
-master_node1
-master_node2
1 brick per node (replica)

Datacenter B
2 nodes:
-slave_node1
-slave_node2
1 brick per node (replica)

OS: CentOS 6.6
Gluster: glusterfs 3.6.1 built on Nov  7 2014 15:15:48
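
For reference, a minimal sketch of how a 2-node replica volume like this would
be created on each side (volume and brick names are taken from this report;
everything else is assumed):

# Datacenter A (master side); 3.6 may require 'force' for bricks directly under /
gluster volume create master_volume replica 2 master_node1:/master_brick1 master_node2:/master_brick2
gluster volume start master_volume

# Datacenter B (slave side)
gluster volume create slave_volume replica 2 slave_node1:/slave_brick1 slave_node2:/slave_brick2
gluster volume start slave_volume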

The bricks were set up without any error.
Passwordless SSH authentication between node 1 of datacenter A and node 1 of
datacenter B was set up successfully, along the lines of the sketch below.
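
For completeness, a minimal sketch of how that passwordless root SSH is
typically established (standard OpenSSH tooling; the key path is the usual
default, not something stated in this report):

# On master_node1, as root: generate a key pair if one does not exist yet
ssh-keygen -t rsa -f /root/.ssh/id_rsa -N ""
# Copy the public key to the slave-side gateway node
ssh-copy-id root@slave_node1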
Geo-replication was then set up as below:
gluster system:: execute gsec_create
gluster volume geo-replication master_volume root@slave_node1::slave_volume create push-pem
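
As a sanity check (an assumption based on the usual 3.x layout, not part of
the original report), gsec_create should assemble the master nodes' public
keys into a common pem file, which 'create push-pem' then distributes to the
slave nodes:

# On the master: the combined key file gsec_create is expected to produce
ls -l /var/lib/glusterd/geo-replication/common_secret.pem.pub
# On each slave node: push-pem should have appended gsyncd-restricted entries
grep gsyncd /root/.ssh/authorized_keys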

I can start geo-replication successfully:
gluster volume geo-replication master_volume root@slave_node1::slave_volume start

But when checking the status, I get the following:
gluster volume geo-replication master_volume root@slave_node1::slave_volume status

MASTER NODE     MASTER VOL       MASTER BRICK      SLAVE                             STATUS    CHECKPOINT STATUS    CRAWL STATUS
---------------------------------------------------------------------------------------------------------------------------------
master_node1    master_volume    /master_brick1    root@slave_node1::slave_volume    faulty    N/A                  N/A
master_node2    master_volume    /master_brick2    root@slave_node1::slave_volume    faulty    N/A                  N/A
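
To dig further, the geo-replication log level can be raised to DEBUG. A
sketch of how this is typically done on 3.x (the exact option spelling may
differ by release):

gluster volume geo-replication master_volume root@slave_node1::slave_volume config log-level DEBUG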

From master node 1, I ran geo-replication in debug mode and found the
following in the logs:
[2015-01-26 15:33:29.247694] D [monitor(monitor):280:distribute] <top>: master
bricks: [{'host': 'master_node1', 'dir': '/master_brick1'}, {'host':
'master_node2', 'dir': '/master_brick2'}]
[2015-01-26 15:33:29.248047] D [monitor(monitor):286:distribute] <top>: slave
SSH gateway: root at slave_node1
[2015-01-26 15:33:29.721532] I [monitor(monitor):296:distribute] <top>: slave
bricks: [{'host': 'slave_node1', 'dir': '/slave_brick1'}, {'host':
'slave_node2', 'dir': '/ slave_brick2'}]
[2015-01-26 15:33:29.729722] I [monitor(monitor):316:distribute] <top>: worker
specs: [('/master_brick1',
'ssh://root@slave_node2:gluster://localhost:slave_volume')]
[2015-01-26 15:33:29.730287] I [monitor(monitor):109:set_state] Monitor: new
state: Initializing...
[2015-01-26 15:33:29.731513] I [monitor(monitor):163:monitor] Monitor:
------------------------------------------------------------
[2015-01-26 15:33:29.731647] I [monitor(monitor):164:monitor] Monitor: starting
gsyncd worker
[2015-01-26 15:33:29.830656] D [gsyncd(agent):627:main_i] <top>: rpc_fd:
'7,10,9,8'
[2015-01-26 15:33:29.831882] I [monitor(monitor):214:monitor] Monitor:
worker(/master_brick1) died before establishing connection
[2015-01-26 15:33:29.831476] I [changelogagent(agent):72:__init__]
ChangelogAgent: Agent listining...
[2015-01-26 15:33:29.832392] I [repce(agent):92:service_loop] RepceServer:
terminating on reaching EOF.
[2015-01-26 15:33:29.832693] I [syncdutils(agent):214:finalize] <top>: exiting.
[2015-01-26 15:33:29.834060] I [monitor(monitor):109:set_state] Monitor: new
state: faulty
[2015-01-26 15:33:39.846858] I [monitor(monitor):163:monitor] Monitor:
------------------------------------------------------------
[2015-01-26 15:33:39.847105] I [monitor(monitor):164:monitor] Monitor: starting
gsyncd worker
[2015-01-26 15:33:39.941967] D [gsyncd(agent):627:main_i] <top>: rpc_fd:
'7,10,9,8'
[2015-01-26 15:33:39.942630] I [changelogagent(agent):72:__init__]
ChangelogAgent: Agent listining...
[2015-01-26 15:33:39.945791] I [repce(agent):92:service_loop] RepceServer:
terminating on reaching EOF.
[2015-01-26 15:33:39.945941] I [syncdutils(agent):214:finalize] <top>: exiting.
[2015-01-26 15:33:39.945904] I [monitor(monitor):214:monitor] Monitor:
worker(/master_brick1) died before establishing connection
[2015-01-26 15:33:49.959361] I [monitor(monitor):163:monitor] Monitor:
------------------------------------------------------------
[2015-01-26 15:33:49.959599] I [monitor(monitor):164:monitor] Monitor: starting
gsyncd worker
[2015-01-26 15:33:50.56200] D [gsyncd(agent):627:main_i] <top>: rpc_fd:
'7,10,9,8'
[2015-01-26 15:33:50.56809] I [changelogagent(agent):72:__init__]
ChangelogAgent: Agent listining...
[2015-01-26 15:33:50.58903] I [repce(agent):92:service_loop] RepceServer:
terminating on reaching EOF.
[2015-01-26 15:33:50.59078] I [syncdutils(agent):214:finalize] <top>: exiting.
[2015-01-26 15:33:50.59039] I [monitor(monitor):214:monitor] Monitor:
worker(/master_brick1) died before establishing connection
[2015-01-26 15:34:00.72674] I [monitor(monitor):163:monitor] Monitor:
------------------------------------------------------------
[2015-01-26 15:34:00.72926] I [monitor(monitor):164:monitor] Monitor: starting
gsyncd worker
[2015-01-26 15:34:00.169071] D [gsyncd(agent):627:main_i] <top>: rpc_fd:
'7,10,9,8'
[2015-01-26 15:34:00.169931] I [changelogagent(agent):72:__init__]
ChangelogAgent: Agent listining...
[2015-01-26 15:34:00.170466] I [repce(agent):92:service_loop] RepceServer:
terminating on reaching EOF.
[2015-01-26 15:34:00.170526] I [monitor(monitor):214:monitor] Monitor:
worker(/master_brick1) died before establishing connection
[2015-01-26 15:34:00.170938] I [syncdutils(agent):214:finalize] <top>: exiting.
[2015-01-26 15:34:10.183361] I [monitor(monitor):163:monitor] Monitor:
------------------------------------------------------------
[2015-01-26 15:34:10.183614] I [monitor(monitor):164:monitor] Monitor: starting
gsyncd worker
[2015-01-26 15:34:10.278914] D [monitor(monitor):217:monitor] Monitor:
worker(/master_brick1) connected
[2015-01-26 15:34:10.279994] I [monitor(monitor):222:monitor] Monitor:
worker(/master_brick1) died in startup phase
[2015-01-26 15:34:10.282217] D [gsyncd(agent):627:main_i] <top>: rpc_fd:
'7,10,9,8'
[2015-01-26 15:34:10.282943] I [repce(agent):92:service_loop] RepceServer:
terminating on reaching EOF.
[2015-01-26 15:34:10.283098] I [changelogagent(agent):72:__init__]
ChangelogAgent: Agent listining...
[2015-01-26 15:34:10.283303] I [syncdutils(agent):214:finalize] <top>: exiting.
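
One detail worth noting in the debug output above: the worker spec points at
slave_node2 ('ssh://root@slave_node2:...') even though the command line names
slave_node1 as the SSH gateway, while passwordless authentication was only set
up towards slave_node1. A quick, hypothetical check from master_node1 is
whether the key geo-replication itself uses can reach every slave node
non-interactively (the secret.pem path is assumed from the usual 3.x layout):

# Both of these should log in without prompting for a password
ssh -i /var/lib/glusterd/geo-replication/secret.pem root@slave_node1 true
ssh -i /var/lib/glusterd/geo-replication/secret.pem root@slave_node2 true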
