[Bugs] [Bug 1435587] [GSS]geo-replication faulty

bugzilla at redhat.com bugzilla at redhat.com
Mon Jul 24 12:04:40 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1435587

Rochelle <rallan at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ON_QA                       |VERIFIED
                 CC|                            |rallan at redhat.com



--- Comment #18 from Rochelle <rallan at redhat.com> ---


    Verified with the build: glusterfs-geo-replication-3.8.4-35.el7rhgs.x86_64

    The expected worker start-up sequence:
    1. The monitor starts the worker and the agent.
    2. The worker establishes the SSH connection between master and slave;
gsyncd on the slave mounts the slave volume on the local slave host.
    3. The worker mounts the master volume locally.
    4. Upon successful completion of steps 2 and 3, the worker acknowledges
back to the monitor (see the sketch after the logs below).

    [2017-06-29 11:33:33.563153] I [monitor(monitor):275:monitor] Monitor:
starting gsyncd worker(/rhs/brick2/b4). Slave node:
ssh://root@10.70.43.150:gluster://localhost:slave
    [2017-06-29 11:33:33.588865] I [monitor(monitor):275:monitor] Monitor:
starting gsyncd worker(/rhs/brick1/b1). Slave node:
ssh://root@10.70.43.52:gluster://localhost:slave
    [2017-06-29 11:33:33.815712] I
[resource(/rhs/brick2/b4):1676:connect_remote] SSH: Initializing SSH connection
between master and slave...
    [2017-06-29 11:33:33.818607] I [changelogagent(/rhs/brick2/b4):73:__init__]
ChangelogAgent: Agent listining...
    [2017-06-29 11:33:33.824904] I
[resource(/rhs/brick1/b1):1676:connect_remote] SSH: Initializing SSH connection
between master and slave...
    [2017-06-29 11:33:33.840003] I [changelogagent(/rhs/brick1/b1):73:__init__]
ChangelogAgent: Agent listining...
    [2017-06-29 11:33:40.65862] I
[resource(/rhs/brick2/b4):1683:connect_remote] SSH: SSH connection between
master and slave established. Time taken: 6.2496 secs
    [2017-06-29 11:33:40.66544] I [resource(/rhs/brick2/b4):1416:connect]
GLUSTER: Mounting gluster volume locally...
    [2017-06-29 11:33:40.139708] I
[resource(/rhs/brick1/b1):1683:connect_remote] SSH: SSH connection between
master and slave established. Time taken: 6.3142 secs
    [2017-06-29 11:33:40.140361] I [resource(/rhs/brick1/b1):1416:connect]
GLUSTER: Mounting gluster volume locally...
    [2017-06-29 11:33:45.215692] I [resource(/rhs/brick2/b4):1429:connect]
GLUSTER: Mounted gluster volume. Time taken: 5.1487 secs
    [2017-06-29 11:33:45.216334] I [gsyncd(/rhs/brick2/b4):778:main_i] <top>:
Closing feedback fd, waking up the monitor
    [2017-06-29 11:33:45.280860] I [resource(/rhs/brick1/b1):1429:connect]
GLUSTER: Mounted gluster volume. Time taken: 5.1400 secs
    [2017-06-29 11:33:45.281521] I [gsyncd(/rhs/brick1/b1):778:main_i] <top>:
Closing feedback fd, waking up the monitor
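
    The "Closing feedback fd" entries correspond to step 4. A minimal
sketch of that handshake, under simplifying assumptions (the real gsyncd
spawns one worker and agent per brick, and connect_remote/connect do real
SSH and mount work rather than sleeping):

        import os
        import time

        def connect_remote():
            # placeholder for step 2: SSH connection between master and slave
            time.sleep(0.1)

        def connect():
            # placeholder for step 3: mounting the master volume locally
            time.sleep(0.1)

        def worker(feedback_fd):
            connect_remote()
            connect()
            # step 4: closing the feedback fd is the acknowledgement
            os.close(feedback_fd)

        def monitor():
            r, w = os.pipe()        # the feedback fd pair
            pid = os.fork()
            if pid == 0:            # child process acts as the worker
                os.close(r)
                worker(w)
                os._exit(0)
            os.close(w)
            # blocks until the worker closes its end, i.e. until both
            # the SSH connection and the local mount have succeeded
            os.read(r, 1)
            print("worker acknowledged; monitor resumes")
            os.waitpid(pid, 0)

        monitor()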


    In case the failure is in the SSH connection, the worker fails and logs
the errors:


    [2017-07-24 11:56:07.717888] I [gsyncd(conf):688:main_i] <top>: Config Set:
session-owner = a7039831-0bb5-40ec-a6aa-1e568b0ba516
    [2017-07-24 11:56:08.607244] I
[gsyncdstatus(monitor):240:set_worker_status] GeorepStatus: Worker Status:
Initializing...
    [2017-07-24 11:56:08.611022] I
[gsyncdstatus(monitor):240:set_worker_status] GeorepStatus: Worker Status:
Initializing...
    [2017-07-24 11:56:08.625727] I
[gsyncdstatus(monitor):240:set_worker_status] GeorepStatus: Worker Status:
Initializing...
    [2017-07-24 11:56:08.897754] I
[monitor(monitor):74:get_slave_bricks_status] <top>: Unable to get list of up
nodes of slave, returning empty list: Another transaction is in progress for
slave. Please try again after sometime.
    [2017-07-24 11:56:08.898608] I [monitor(monitor):275:monitor] Monitor:
starting gsyncd worker(/rhs/brick2/b5). Slave node:
ssh://root@10.70.41.209:gluster://localhost:slave
    [2017-07-24 11:56:08.923662] I [monitor(monitor):275:monitor] Monitor:
starting gsyncd worker(/rhs/brick1/b1). Slave node:
ssh://root@10.70.41.209:gluster://localhost:slave
    [2017-07-24 11:56:08.939045] I [monitor(monitor):275:monitor] Monitor:
starting gsyncd worker(/rhs/brick3/b9). Slave node:
ssh://root@10.70.41.209:gluster://localhost:slave
    [2017-07-24 11:56:09.225071] I [changelogagent(/rhs/brick1/b1):73:__init__]
ChangelogAgent: Agent listining...
    [2017-07-24 11:56:09.231733] I
[resource(/rhs/brick2/b5):1676:connect_remote] SSH: Initializing SSH connection
between master and slave...
    [2017-07-24 11:56:09.241522] I [changelogagent(/rhs/brick2/b5):73:__init__]
ChangelogAgent: Agent listining...
    [2017-07-24 11:56:09.261339] I [changelogagent(/rhs/brick3/b9):73:__init__]
ChangelogAgent: Agent listining...
    [2017-07-24 11:56:09.262459] I
[resource(/rhs/brick3/b9):1676:connect_remote] SSH: Initializing SSH connection
between master and slave...
    [2017-07-24 11:56:09.274967] I
[resource(/rhs/brick1/b1):1676:connect_remote] SSH: Initializing SSH connection
between master and slave...
    [2017-07-24 11:56:14.779021] E
[syncdutils(/rhs/brick3/b9):269:log_raise_exception] <top>: connection to peer
is broken
    [2017-07-24 11:56:14.778943] E
[syncdutils(/rhs/brick2/b5):269:log_raise_exception] <top>: connection to peer
is broken
    [2017-07-24 11:56:14.779907] E [resource(/rhs/brick3/b9):234:errlog] Popen:
command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
/var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S
/tmp/gsyncd-aux-ssh-OPaCBo/aa544c64f039a6d7349da94e0501bc58.sock
root@10.70.41.209 /nonexistent/gsyncd --session-owner
a7039831-0bb5-40ec-a6aa-1e568b0ba516 --local-id .%2Frhs%2Fbrick3%2Fb9
--local-node 10.70.42.79 -N --listen --timeout 120 gluster://localhost:slave"
returned with 1, saying:
    [2017-07-24 11:56:14.780435] E [resource(/rhs/brick3/b9):238:logerr] Popen:
ssh> [2017-07-24 11:56:10.085599] I [cli.c:762:main] 0-cli: Started running
/usr/sbin/gluster with version 3.8.4
    [2017-07-24 11:56:14.780502] E [resource(/rhs/brick2/b5):234:errlog] Popen:
command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
/var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S
/tmp/gsyncd-aux-ssh-UJB2EP/aa544c64f039a6d7349da94e0501bc58.sock
root@10.70.41.209 /nonexistent/gsyncd --session-owner
a7039831-0bb5-40ec-a6aa-1e568b0ba516 --local-id .%2Frhs%2Fbrick2%2Fb5
--local-node 10.70.42.79 -N --listen --timeout 120 gluster://localhost:slave"
returned with 1, saying:
    [2017-07-24 11:56:14.780728] E [resource(/rhs/brick3/b9):238:logerr] Popen:
ssh> [2017-07-24 11:56:10.085653] I [cli.c:645:cli_rpc_init] 0-cli: Connecting
to remote glusterd at localhost
    [2017-07-24 11:56:14.780967] E [resource(/rhs/brick3/b9):238:logerr] Popen:
ssh> [2017-07-24 11:56:10.298145] I [MSGID: 101190]
[event-epoll.c:602:event_dispatch_epoll_worker] 0-epoll: Started thread with
index 1
    [2017-07-24 11:56:14.780844] E [resource(/rhs/brick2/b5):238:logerr] Popen:
ssh> [2017-07-24 11:56:10.082411] I [cli.c:762:main] 0-cli: Started running
/usr/sbin/gluster with version 3.8.4
    [2017-07-24 11:56:14.781389] E [resource(/rhs/brick2/b5):238:logerr] Popen:
ssh> [2017-07-24 11:56:10.082532] I [cli.c:645:cli_rpc_init] 0-cli: Connecting
to remote glusterd at localhost
    [2017-07-24 11:56:14.781356] E [resource(/rhs/brick3/b9):238:logerr] Popen:
ssh> [2017-07-24 11:56:10.300407] I [socket.c:2465:socket_event_handler]
0-transport: EPOLLERR - disconnecting now
    [2017-07-24 11:56:14.781673] E [resource(/rhs/brick2/b5):238:logerr] Popen:
ssh> [2017-07-24 11:56:10.300755] I [MSGID: 101190]
[event-epoll.c:602:event_dispatch_epoll_worker] 0-epoll: Started thread with
index 1
    [2017-07-24 11:56:14.781827] E [resource(/rhs/brick3/b9):238:logerr] Popen:
ssh> [2017-07-24 11:56:10.302962] I [cli-rpc-ops.c:7020:gf_cli_getwd_cbk]
0-cli: Received resp to getwd
    [2017-07-24 11:56:14.782166] E [resource(/rhs/brick2/b5):238:logerr] Popen:
ssh> [2017-07-24 11:56:10.300919] I [socket.c:2465:socket_event_handler]
0-transport: EPOLLERR - disconnecting now
    [2017-07-24 11:56:14.782174] E [resource(/rhs/brick3/b9):238:logerr] Popen:
ssh> [2017-07-24 11:56:10.303072] I [input.c:31:cli_batch] 0-: Exiting with: 0
    [2017-07-24 11:56:14.782412] E [resource(/rhs/brick2/b5):238:logerr] Popen:
ssh> [2017-07-24 11:56:10.302254] I [cli-rpc-ops.c:7020:gf_cli_getwd_cbk]
0-cli: Received resp to getwd
    [2017-07-24 11:56:14.782664] E [resource(/rhs/brick2/b5):238:logerr] Popen:
ssh> [2017-07-24 11:56:10.302529] I [input.c:31:cli_batch] 0-: Exiting with: 0
    [2017-07-24 11:56:14.782872] I [syncdutils(/rhs/brick3/b9):237:finalize]
<top>: exiting.
    [2017-07-24 11:56:14.783211] I [syncdutils(/rhs/brick2/b5):237:finalize]
<top>: exiting.
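
    The paired errlog/logerr entries above follow one pattern: when the
ssh child exits nonzero, the worker logs the full command once and then
each captured stderr line with an "ssh> " prefix. A minimal sketch of
that pattern (run_remote is a hypothetical name, not the gsyncd API):

        import logging
        import subprocess

        logging.basicConfig(level=logging.ERROR,
                            format="[%(asctime)s] E [%(name)s] %(message)s")
        log = logging.getLogger("Popen")

        def run_remote(args):
            p = subprocess.Popen(args, stderr=subprocess.PIPE,
                                 universal_newlines=True)
            _, err = p.communicate()
            if p.returncode != 0:
                # errlog: one entry naming the failed command and its status
                log.error('command "%s" returned with %d, saying:',
                          " ".join(args), p.returncode)
                # logerr: one entry per line of the remote side's stderr
                for line in err.splitlines():
                    log.error("ssh> %s", line)
            return p.returncode

        # e.g. run_remote(["ssh", "-p", "22", "root@10.70.41.209",
        #                  "/nonexistent/gsyncd", "--listen"])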

    Since the fix consists only of logging improvements, moving the bug to
VERIFIED based on the above logs.
