[Bugs] [Bug 1435587] New: [GSS]geo-replication faulty

Fri Mar 24 10:10:35 UTC 2017

https://bugzilla.redhat.com/show_bug.cgi?id=1435587

            Bug ID: 1435587
           Summary: [GSS]geo-replication faulty
           Product: Red Hat Gluster Storage
           Version: 3.0
         Component: geo-replication
          Keywords: Triaged
          Severity: high
          Assignee: avishwan at redhat.com
          Reporter: rnalakka at redhat.com
        QA Contact: rhinduja at redhat.com
                CC: avishwan at redhat.com, bugs at gluster.org,
                    csaba at redhat.com, ihkim at osci.kr, khoj at osci.kr,
                    rhs-bugs at redhat.com, sarumuga at redhat.com,
                    storage-qa-internal at redhat.com
        Depends On: 1261689

+++ This bug was initially created as a clone of Bug #1261689 +++

Description of problem:
Geo-replication goes into Faulty state on a passive worker.

Version-Release number of selected component (if applicable):
glusterfs-3.6.3

How reproducible:
Did not try to reproduce.

Steps to Reproduce:
1. Create a GlusterFS distributed-replicated volume.
2. Create a geo-replication session for it.

Actual results:

Expected results:

Additional info:
ssh%3A%2F%2Fgeorepuser1%4052.74.184.17%3Agluster%3A%2F%2F127.0.0.1%3Acnprddrnas.log
[2015-08-31 00:31:18.726161] E [syncdutils(/estore_disk02):246:log_raise_exception] <top>: connection to peer is broken
[2015-08-31 00:31:18.726675] I [syncdutils(/estore_disk02):214:finalize] <top>: exiting.
[2015-08-31 00:31:18.728502] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-08-31 00:31:18.729090] I [syncdutils(agent):214:finalize] <top>: exiting.
[2015-08-31 00:31:18.801343] I [monitor(monitor):141:set_state] Monitor: new state: faulty
[2015-08-31 00:31:48.324974] I [monitor(monitor):215:monitor] Monitor: ------------------------------------------------------------
[2015-08-31 00:31:48.325562] I [monitor(monitor):216:monitor] Monitor: starting gsyncd worker
[2015-08-31 00:31:48.540290] I [gsyncd(/estore_disk02):633:main_i] <top>: syncing: gluster://localhost:cnprdnas -> ssh://georepuser1@sg1-cndr-fst01:gluster://localhost:cnprddrnas
[2015-08-31 00:31:48.542256] I [changelogagent(agent):72:__init__] ChangelogAgent: Agent listining...
[2015-08-31 00:32:17.621978] I [master(/estore_disk01):514:crawlwrap] _GMaster: 0 crawls, 0 turns
[2015-08-31 00:32:48.387411] I [monitor(monitor):281:monitor] Monitor: worker(/estore_disk02) not confirmed in 60 sec, aborting it
[2015-08-31 00:32:48.389297] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-08-31 00:32:48.389467] I [syncdutils(agent):214:finalize] <top>: exiting.
[2015-08-31 00:32:59.851869] I [monitor(monitor):215:monitor] Monitor: ------------------------------------------------------------
[2015-08-31 00:32:59.852358] I [monitor(monitor):216:monitor] Monitor: starting gsyncd worker
[2015-08-31 00:32:59.914304] I [gsyncd(/estore_disk02):633:main_i] <top>: syncing: gluster://localhost:cnprdnas -> ssh://georepuser1@sg1-cndr-fst01:gluster://localhost:cnprddrnas
[2015-08-31 00:32:59.915914] I [changelogagent(agent):72:__init__] ChangelogAgent: Agent listining...
[2015-08-31 00:33:17.995424] I [master(/estore_disk01):514:crawlwrap] _GMaster: 0 crawls, 0 turns
[2015-08-31 00:33:59.911894] I [monitor(monitor):281:monitor] Monitor: worker(/estore_disk02) not confirmed in 60 sec, aborting it
[2015-08-31 00:33:59.914036] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-08-31 00:33:59.914247] I [syncdutils(agent):214:finalize] <top>: exiting.
[2015-08-31 00:34:18.636343] I [master(/estore_disk01):514:crawlwrap] _GMaster: 0 crawls, 0 turns
[2015-08-31 00:34:57.938266] I [monitor(monitor):215:monitor] Monitor: ------------------------------------------------------------
[2015-08-31 00:34:57.938764] I [monitor(monitor):216:monitor] Monitor: starting gsyncd worker
[2015-08-31 00:34:58.2572] I [gsyncd(/estore_disk02):633:main_i] <top>: syncing: gluster://localhost:cnprdnas -> ssh://georepuser1@sg1-cndr-fst01:gluster://localhost:cnprddrnas
[2015-08-31 00:34:58.3433] I [changelogagent(agent):72:__init__] ChangelogAgent: Agent listining...
[2015-08-31 00:35:19.155519] I [master(/estore_disk01):514:crawlwrap] _GMaster: 0 crawls, 0 turns
[2015-08-31 00:35:58.533] I [monitor(monitor):281:monitor] Monitor: worker(/estore_disk02) not confirmed in 60 sec, aborting it
[2015-08-31 00:35:58.2600] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-08-31 00:35:58.2794] I [syncdutils(agent):214:finalize] <top>: exiting.
[2015-08-31 00:36:08.497905] I [monitor(monitor):215:monitor] Monitor: ------------------------------------------------------------
[2015-08-31 00:36:08.498428] I [monitor(monitor):216:monitor] Monitor: starting gsyncd worker
[2015-08-31 00:36:08.559051] I [gsyncd(/estore_disk02):633:main_i] <top>: syncing: gluster://localhost:cnprdnas -> ssh://georepuser1@sg1-cndr-fst01:gluster://localhost:cnprddrnas
[2015-08-31 00:36:08.559923] I [changelogagent(agent):72:__init__] ChangelogAgent: Agent listining...
[2015-08-31 00:36:19.614520] I [master(/estore_disk01):514:crawlwrap] _GMaster: 0 crawls, 0 turns
[2015-08-31 00:37:08.559808] I [monitor(monitor):281:monitor] Monitor: worker(/estore_disk02) not confirmed in 60 sec, aborting it
[2015-08-31 00:37:08.561731] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-08-31 00:37:08.561908] I [syncdutils(agent):214:finalize] <top>: exiting.
[2015-08-31 00:37:20.161235] I [master(/estore_disk01):514:crawlwrap] _GMaster: 0 crawls, 0 turns
[2015-08-31 00:37:34.803252] I [monitor(monitor):215:monitor] Monitor: ------------------------------------------------------------
[2015-08-31 00:37:34.803802] I [monitor(monitor):216:monitor] Monitor: starting gsyncd worker
[2015-08-31 00:37:34.866637] I [gsyncd(/estore_disk02):633:main_i] <top>: syncing: gluster://localhost:cnprdnas -> ssh://georepuser1@sg1-cndr-fst01:gluster://localhost:cnprddrnas
[2015-08-31 00:37:34.867227] I [changelogagent(agent):72:__init__] ChangelogAgent: Agent listining...
[2015-08-31 00:38:20.776204] I [master(/estore_disk01):514:crawlwrap] _GMaster: 0 crawls, 0 turns
[2015-08-31 00:38:34.865513] I [monitor(monitor):281:monitor] Monitor: worker(/estore_disk02) not confirmed in 60 sec, aborting it
[2015-08-31 00:38:34.867564] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-08-31 00:38:34.867752] I [syncdutils(agent):214:finalize] <top>: exiting.
[2015-08-31 00:39:21.417596] I [master(/estore_disk01):514:crawlwrap] _GMaster: 0 crawls, 0 turns
[2015-08-31 00:39:37.110416] I [monitor(monitor):215:monitor] Monitor: ------------------------------------------------------------
[2015-08-31 00:39:37.110945] I [monitor(monitor):216:monitor] Monitor: starting gsyncd worker
[2015-08-31 00:39:37.173197] I [gsyncd(/estore_disk02):633:main_i] <top>: syncing: gluster://localhost:cnprdnas -> ssh://georepuser1@sg1-cndr-fst01:gluster://localhost:cnprddrnas
[2015-08-31 00:39:37.173186] I [changelogagent(agent):72:__init__] ChangelogAgent: Agent listining...
[2015-08-31 00:40:21.983525] I [master(/estore_disk01):514:crawlwrap] _GMaster: 0 crawls, 0 turns
[2015-08-31 00:40:37.118439] I [monitor(monitor):281:monitor] Monitor: worker(/estore_disk02) not confirmed in 60 sec, aborting it
[2015-08-31 00:40:37.120224] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-08-31 00:40:37.120392] I [syncdutils(agent):214:finalize] <top>: exiting.
[2015-08-31 00:40:48.605340] I [monitor(monitor):215:monitor] Monitor: ------------------------------------------------------------
[2015-08-31 00:40:48.605844] I [monitor(monitor):216:monitor] Monitor: starting gsyncd worker
[2015-08-31 00:40:48.668005] I [changelogagent(agent):72:__init__] ChangelogAgent: Agent listining...
[2015-08-31 00:40:48.668332] I [gsyncd(/estore_disk02):633:main_i] <top>: syncing: gluster://localhost:cnprdnas -> ssh://georepuser1@sg1-cndr-fst01:gluster://localhost:cnprddrnas
[2015-08-31 00:41:22.499874] I [master(/estore_disk01):514:crawlwrap] _GMaster: 0 crawls, 0 turns
[2015-08-31 00:41:48.667587] I [monitor(monitor):281:monitor] Monitor: worker(/estore_disk02) not confirmed in 60 sec, aborting it
[2015-08-31 00:41:48.669570] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-08-31 00:41:48.669785] I [syncdutils(agent):214:finalize] <top>: exiting.
[2015-08-31 00:42:13.974661] I [monitor(monitor):215:monitor] Monitor: ------------------------------------------------------------
[2015-08-31 00:42:13.975204] I [monitor(monitor):216:monitor] Monitor: starting gsyncd worker
[2015-08-31 00:42:14.37354] I [gsyncd(/estore_disk02):633:main_i] <top>: syncing: gluster://localhost:cnprdnas -> ssh://georepuser1@sg1-cndr-fst01:gluster://localhost:cnprddrnas
[2015-08-31 00:42:14.37564] I [changelogagent(agent):72:__init__] ChangelogAgent: Agent listining...
[2015-08-31 00:42:22.983730] I [master(/estore_disk01):514:crawlwrap] _GMaster: 0 crawls, 0 turns
[2015-08-31 00:43:14.36992] I [monitor(monitor):281:monitor] Monitor: worker(/estore_disk02) not confirmed in 60 sec, aborting it
[2015-08-31 00:43:14.38670] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-08-31 00:43:14.38840] I [syncdutils(agent):214:finalize] <top>: exiting.
[2015-08-31 00:43:23.795301] I [master(/estore_disk01):514:crawlwrap] _GMaster: 0 crawls, 0 turns
[2015-08-31 00:44:24.395556] I [master(/estore_disk01):514:crawlwrap] _GMaster: 0 crawls, 0 turns
[2015-08-31 00:45:24.291826] E [resource(monitor):221:errlog] Popen: command "gluster --xml --remote-host=sg1-cndr-fst01 volume status cnprddrnas detail" returned with 146
[2015-08-31 00:45:24.292504] I [syncdutils(monitor):214:finalize] <top>: exiting.
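The monitor log above repeats one cycle: the monitor starts a gsyncd worker for /estore_disk02, the worker is never confirmed, and the monitor aborts it 60 seconds later. The Python sketch below is one way to measure that cycle in a log of the same format. The regular expression and the helper names `parse_ts` and `abort_cycles` are hypothetical and based only on the lines quoted here. The sketch also assumes the fractional part of each timestamp is an unpadded microsecond count, because lines such as `00:35:58.533` followed by `00:35:58.2600` only sort in order when read that way.

```python
import re
from datetime import datetime, timedelta

# Matches lines shaped like the gsyncd log quoted above, e.g.
# "[2015-08-31 00:32:48.387411] I [monitor(monitor):281:monitor] ..."
LINE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.(\d+)\] (\w) (.*)$")


def parse_ts(base: str, frac: str) -> datetime:
    # Assumption: the fraction is an unpadded microsecond count
    # ("58.2600" means 2600 us), so it is added as an integer, not a decimal.
    return datetime.strptime(base, "%Y-%m-%d %H:%M:%S") + timedelta(microseconds=int(frac))


def abort_cycles(lines):
    """Hypothetical helper: pair each 'starting gsyncd worker' line with the
    next 'not confirmed in ... aborting it' line and return the gaps in seconds."""
    started = None
    gaps = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip wrapped or unrelated lines
        ts = parse_ts(m.group(1), m.group(2))
        msg = m.group(4)
        if "starting gsyncd worker" in msg:
            started = ts
        elif "not confirmed in" in msg and started is not None:
            gaps.append((ts - started).total_seconds())
            started = None
    return gaps


# Two lines copied from the log above.
SAMPLE = [
    "[2015-08-31 00:34:57.938764] I [monitor(monitor):216:monitor] Monitor: starting gsyncd worker",
    "[2015-08-31 00:35:58.533] I [monitor(monitor):281:monitor] Monitor: worker(/estore_disk02) not confirmed in 60 sec, aborting it",
]

if __name__ == "__main__":
    print(abort_cycles(SAMPLE))  # prints [60.061769]
```

Run against the whole monitor log, a list of gaps all just over 60 seconds would match the repeating start/abort pattern shown above.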

ssh%3A%2F%2Fgeorepuser1%4052.74.184.17%3Agluster%3A%2F%2F127.0.0.1%3Acnprddrnas.%2Festore_disk02.gluster.log
[2015-08-31 00:31:18.728293] I [fuse-bridge.c:4921:fuse_thread_proc] 0-fuse: unmounting /tmp/gsyncd-aux-mount-BehNtW
[2015-08-31 00:31:18.797937] W [glusterfsd.c:1194:cleanup_and_exit] (--> 0-: received signum (15), shutting down
[2015-08-31 00:31:18.797957] I [fuse-bridge.c:5599:fini] 0-fuse: Unmounting '/tmp/gsyncd-aux-mount-BehNtW'.
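The two log file names above are percent-encoded. Decoding them with Python's standard `urllib.parse.unquote` shows which session they belong to: slave user georepuser1 at 52.74.184.17, slave volume cnprddrnas and, for the second file, the brick path /estore_disk02. The helper name `decode_log_name` is hypothetical; the sketch relies only on the names as quoted in this report, not on how gsyncd builds them.

```python
from urllib.parse import unquote

# Log file names exactly as quoted in this report.
LOG_NAMES = [
    "ssh%3A%2F%2Fgeorepuser1%4052.74.184.17%3Agluster%3A%2F%2F127.0.0.1%3Acnprddrnas.log",
    "ssh%3A%2F%2Fgeorepuser1%4052.74.184.17%3Agluster%3A%2F%2F127.0.0.1%3Acnprddrnas.%2Festore_disk02.gluster.log",
]


def decode_log_name(name: str) -> str:
    # %3A -> ':', %2F -> '/', %40 -> '@'
    return unquote(name)


if __name__ == "__main__":
    for name in LOG_NAMES:
        print(decode_log_name(name))
```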

Attached files: sosreport and glusterfs logs.

--- Additional comment from hojin kim on 2015-09-15 22:22:36 EDT ---

Hi Venky, the configuration is as follows:

--------------------------------------------------------------
mount volume

server1: CN1-PRD-FS01    CN2-PRD-FS01 ==> replicated volume0
              |               |
         distributed     distributed
              |               |
server2: CN1-PRD-FS02    CN2-PRD-FS02 ==> replicated volume1
              |
              |
              |
              |
    geo-replicated to cndr

We restarted geo-replication and the problem was resolved, but we would like to
know what caused the session to go into the Faulty state.
Thanks in advance.

--- Additional comment from hojin kim on 2015-09-17 03:08:52 EDT ---

We hit the same error again. The session went into Faulty state, and we
restarted it with the force option.
Please let us know the cause as soon as possible. Thanks.

--- Additional comment from Aravinda VK on 2016-08-19 07:30:43 EDT ---

GlusterFS 3.6 is nearing its End-Of-Life; only important security bugs still
have a chance of getting fixed. Moving this to the mainline 'version'. If this
needs to be fixed in 3.7 or 3.8, this bug should be cloned.

Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1261689
[Bug 1261689] geo-replication faulty