[Gluster-users] Geo-replication 'faulty' status during initial sync, 'Transport endpoint is not connected'

Tom Fite tomfite at gmail.com
Mon May 15 19:44:35 UTC 2017


Hi all,

I've hit a strange problem with geo-replication.

On gluster 3.10.1, I have set up geo replication between my replicated /
distributed instance and a remote replicated / distributed instance. The
master and slave instances are connected via VPN. Initially the
geo-replication setup worked fine: after the initial sync the status was
"Active" with "Changelog Crawl", and I confirmed that files were synced
between the two gluster instances.
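
For reference, the session was set up the usual push-pem way, roughly:

    # generate the common pem keys, then create and start the session
    gluster system:: execute gsec_create
    gluster volume geo-replication gv0 root@slave-gfs1.tomfite.com::gv0 create push-pem
    gluster volume geo-replication gv0 root@slave-gfs1.tomfite.com::gv0 start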

Something must have changed between then and now, because about a week
after the instance had been online it switched to a "Faulty" status.

[root@master-gfs1 ~]# gluster volume geo-replication gv0 root@slave-gfs1.tomfite.com::gv0 status

MASTER NODE                MASTER VOL    MASTER BRICK        SLAVE USER    SLAVE                          SLAVE NODE                STATUS     CRAWL STATUS    LAST_SYNCED
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
master-gfs1.tomfite.com    gv0           /data/brick1/gv0    root          slave-gfs1.tomfite.com::gv0    N/A                       Faulty     N/A             N/A
master-gfs1.tomfite.com    gv0           /data/brick2/gv0    root          slave-gfs1.tomfite.com::gv0    N/A                       Faulty     N/A             N/A
master-gfs1.tomfite.com    gv0           /data/brick3/gv0    root          slave-gfs1.tomfite.com::gv0    N/A                       Faulty     N/A             N/A
master-gfs2.tomfite.com    gv0           /data/brick1/gv0    root          slave-gfs1.tomfite.com::gv0    slave-gfs1.tomfite.com    Passive    N/A             N/A
master-gfs2.tomfite.com    gv0           /data/brick2/gv0    root          slave-gfs1.tomfite.com::gv0    slave-gfs1.tomfite.com    Passive    N/A             N/A
master-gfs2.tomfite.com    gv0           /data/brick3/gv0    root          slave-gfs1.tomfite.com::gv0    slave-gfs1.tomfite.com    Passive    N/A             N/A

From the logs (see below) it seems there is an issue syncing files to the
slave: I get a "Transport endpoint is not connected" error when gsyncd
attempts to sync the first set of files.

Here's what I've tried so far (a sketch of the commands follows the list):

1. ssh_port is configured on a non-standard port. I switched it back to the
standard port 22 but observed no change in behavior.
2. I verified that SELinux is disabled on all boxes and that no firewalls
are running.
3. remote_gsyncd was set to "/nonexistent/gsyncd", which looked incorrect,
so I changed it to the valid location of that executable,
/usr/libexec/glusterfs/gsyncd.
4. In an attempt to start the slave from scratch, I removed all files from
the slave and reset geo-replication by deleting and recreating the session.
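
Roughly, the commands for steps 1, 3 and 4 (config key spellings as in the
config dump further down; "force" may not be needed everywhere):

    # 1. back to the standard SSH port
    gluster volume geo-replication gv0 root@slave-gfs1.tomfite.com::gv0 config ssh_port 22
    # 3. point remote_gsyncd at the real executable
    gluster volume geo-replication gv0 root@slave-gfs1.tomfite.com::gv0 config remote_gsyncd /usr/libexec/glusterfs/gsyncd
    # 4. tear down and recreate the session (slave files removed by hand)
    gluster volume geo-replication gv0 root@slave-gfs1.tomfite.com::gv0 stop force
    gluster volume geo-replication gv0 root@slave-gfs1.tomfite.com::gv0 delete
    gluster volume geo-replication gv0 root@slave-gfs1.tomfite.com::gv0 create push-pem force
    gluster volume geo-replication gv0 root@slave-gfs1.tomfite.com::gv0 start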

Below are debug logs from trying to start geo-replication; to capture them I
first raised the log level with something like:
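
    gluster volume geo-replication gv0 root@slave-gfs1.tomfite.com::gv0 config log_level DEBUG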

[2017-05-15 16:31:32.940068] I [gsyncd(conf):689:main_i] <top>: Config Set:
session-owner = d37a7455-0b1b-402e-985b-cf1ace4e513e
[2017-05-15 16:31:33.293926] D [monitor(monitor):434:distribute] <top>:
master bricks: [{'host': 'master-gfs1.tomfite.com', 'uuid':
'e0d9d624-5383-4c43-aca4-e946e7de296d', 'dir': '/data/brick1/gv0'},
{'host': 'master-gfs2.tomfite.com', 'uuid':
'bdbb7a18-3ecf-4733-a5df-447d8c712af5', 'dir': '/data/brick1/gv0'},
{'host': 'master-gfs1.tomfite.com', 'uuid':
'e0d9d624-5383-4c43-aca4-e946e7de296d', 'dir': '/data/brick2/gv0'},
{'host': 'master-gfs2.tomfite.com', 'uuid':
'bdbb7a18-3ecf-4733-a5df-447d8c712af5', 'dir': '/data/brick2/gv0'},
{'host': 'master-gfs1.tomfite.com', 'uuid':
'e0d9d624-5383-4c43-aca4-e946e7de296d', 'dir': '/data/brick3/gv0'},
{'host': 'master-gfs2.tomfite.com', 'uuid':
'bdbb7a18-3ecf-4733-a5df-447d8c712af5', 'dir': '/data/brick3/gv0'}]
[2017-05-15 16:31:33.294250] D [monitor(monitor):443:distribute] <top>:
slave SSH gateway: slave-gfs1.tomfite.com
[2017-05-15 16:31:33.424451] D [monitor(monitor):464:distribute] <top>:
slave bricks: [{'host': 'slave-gfs1.tomfite.com', 'uuid':
'c184bc78-cff0-4cef-8c6a-e637ab52b324', 'dir': '/data/brick1/gv0'},
{'host': 'slave-gfs2.tomfite.com', 'uuid':
'7290f265-0709-45fc-86ef-2ff5125d31e1', 'dir': '/data/brick1/gv0'},
{'host': 'slave-gfs1.tomfite.com', 'uuid':
'c184bc78-cff0-4cef-8c6a-e637ab52b324', 'dir': '/data/brick2/gv0'},
{'host': 'slave-gfs2.tomfite.com', 'uuid':
'7290f265-0709-45fc-86ef-2ff5125d31e1', 'dir': '/data/brick2/gv0'},
{'host': 'slave-gfs1.tomfite.com', 'uuid':
'c184bc78-cff0-4cef-8c6a-e637ab52b324', 'dir': '/data/brick3/gv0'},
{'host': 'slave-gfs2.tomfite.com', 'uuid':
'7290f265-0709-45fc-86ef-2ff5125d31e1', 'dir': '/data/brick3/gv0'}]
[2017-05-15 16:31:33.424927] D [monitor(monitor):119:is_hot] Volinfo:
brickpath: 'master-gfs1.tomfite.com:/data/brick1/gv0'
[2017-05-15 16:31:33.425452] D [monitor(monitor):119:is_hot] Volinfo:
brickpath: 'master-gfs1.tomfite.com:/data/brick2/gv0'
[2017-05-15 16:31:33.425790] D [monitor(monitor):119:is_hot] Volinfo:
brickpath: 'master-gfs1.tomfite.com:/data/brick3/gv0'
[2017-05-15 16:31:33.426130] D [monitor(monitor):489:distribute] <top>:
worker specs: [({'host': 'master-gfs1.tomfite.com', 'uuid':
'e0d9d624-5383-4c43-aca4-e946e7de296d', 'dir': '/data/brick1/gv0'},
'ssh://root@slave-gfs2.tomfite.com:gluster://localhost:gv0', '1', False),
({'host': 'master-gfs1.tomfite.com', 'uuid':
'e0d9d624-5383-4c43-aca4-e946e7de296d', 'dir': '/data/brick2/gv0'},
'ssh://root@slave-gfs2.tomfite.com:gluster://localhost:gv0', '2', False),
({'host': 'master-gfs1.tomfite.com', 'uuid':
'e0d9d624-5383-4c43-aca4-e946e7de296d', 'dir': '/data/brick3/gv0'},
'ssh://root@slave-gfs2.tomfite.com:gluster://localhost:gv0', '3', False)]
[2017-05-15 16:31:33.429359] I
[gsyncdstatus(monitor):241:set_worker_status] GeorepStatus: Worker Status:
Initializing...
[2017-05-15 16:31:33.432882] I
[gsyncdstatus(monitor):241:set_worker_status] GeorepStatus: Worker Status:
Initializing...
[2017-05-15 16:31:33.435489] I
[gsyncdstatus(monitor):241:set_worker_status] GeorepStatus: Worker Status:
Initializing...
[2017-05-15 16:31:33.574393] I
[monitor(monitor):74:get_slave_bricks_status] <top>: Unable to get list of
up nodes of gv0, returning empty list: Another transaction is in progress
for gv0. Please try again after sometime.
[2017-05-15 16:31:33.574764] I [monitor(monitor):275:monitor] Monitor:
starting gsyncd worker(/data/brick2/gv0). Slave node:
ssh://root@slave-gfs2.tomfite.com:gluster://localhost:gv0
[2017-05-15 16:31:33.578641] I
[monitor(monitor):74:get_slave_bricks_status] <top>: Unable to get list of
up nodes of gv0, returning empty list: Another transaction is in progress
for gv0. Please try again after sometime.
[2017-05-15 16:31:33.579119] I [monitor(monitor):275:monitor] Monitor:
starting gsyncd worker(/data/brick1/gv0). Slave node:
ssh://root@slave-gfs2.tomfite.com:gluster://localhost:gv0
[2017-05-15 16:31:33.585609] I [monitor(monitor):275:monitor] Monitor:
starting gsyncd worker(/data/brick3/gv0). Slave node:
ssh://root@slave-gfs2.tomfite.com:gluster://localhost:gv0
[2017-05-15 16:31:33.671281] D [gsyncd(/data/brick1/gv0):765:main_i] <top>:
rpc_fd: '9,12,11,10'
[2017-05-15 16:31:33.672070] I
[changelogagent(/data/brick1/gv0):73:__init__] ChangelogAgent: Agent
listining...
[2017-05-15 16:31:33.673501] D [gsyncd(/data/brick3/gv0):765:main_i] <top>:
rpc_fd: '8,11,10,9'
[2017-05-15 16:31:33.674078] I
[changelogagent(/data/brick3/gv0):73:__init__] ChangelogAgent: Agent
listining...
[2017-05-15 16:31:33.676042] D [gsyncd(/data/brick2/gv0):765:main_i] <top>:
rpc_fd: '9,14,13,11'
[2017-05-15 16:31:33.676713] I
[changelogagent(/data/brick2/gv0):73:__init__] ChangelogAgent: Agent
listining...
[2017-05-15 16:31:33.695128] D [repce(/data/brick2/gv0):191:push]
RepceClient: call 12632:140397039490880:1494865893.7 __repce_version__() ...
[2017-05-15 16:31:33.696594] D [repce(/data/brick1/gv0):191:push]
RepceClient: call 12634:139689683523392:1494865893.7 __repce_version__() ...
[2017-05-15 16:31:33.706545] D [repce(/data/brick3/gv0):191:push]
RepceClient: call 12636:140598056400704:1494865893.71 __repce_version__()
...
[2017-05-15 16:31:39.342730] D [repce(/data/brick2/gv0):209:__call__]
RepceClient: call 12632:140397039490880:1494865893.7 __repce_version__ ->
1.0
[2017-05-15 16:31:39.343020] D [repce(/data/brick2/gv0):191:push]
RepceClient: call 12632:140397039490880:1494865899.34 version() ...
[2017-05-15 16:31:39.343569] D [repce(/data/brick3/gv0):209:__call__]
RepceClient: call 12636:140598056400704:1494865893.71 __repce_version__ ->
1.0
[2017-05-15 16:31:39.343859] D [repce(/data/brick3/gv0):191:push]
RepceClient: call 12636:140598056400704:1494865899.34 version() ...
[2017-05-15 16:31:39.349275] D [repce(/data/brick1/gv0):209:__call__]
RepceClient: call 12634:139689683523392:1494865893.7 __repce_version__ ->
1.0
[2017-05-15 16:31:39.349540] D [repce(/data/brick1/gv0):191:push]
RepceClient: call 12634:139689683523392:1494865899.35 version() ...
[2017-05-15 16:31:39.349998] D [repce(/data/brick2/gv0):209:__call__]
RepceClient: call 12632:140397039490880:1494865899.34 version -> 1.0
[2017-05-15 16:31:39.350292] D [repce(/data/brick2/gv0):191:push]
RepceClient: call 12632:140397039490880:1494865899.35 pid() ...
[2017-05-15 16:31:39.350780] D [repce(/data/brick3/gv0):209:__call__]
RepceClient: call 12636:140598056400704:1494865899.34 version -> 1.0
[2017-05-15 16:31:39.351070] D [repce(/data/brick3/gv0):191:push]
RepceClient: call 12636:140598056400704:1494865899.35 pid() ...
[2017-05-15 16:31:39.356405] D [repce(/data/brick1/gv0):209:__call__]
RepceClient: call 12634:139689683523392:1494865899.35 version -> 1.0
[2017-05-15 16:31:39.356715] D [repce(/data/brick1/gv0):191:push]
RepceClient: call 12634:139689683523392:1494865899.36 pid() ...
[2017-05-15 16:31:39.357254] D [repce(/data/brick2/gv0):209:__call__]
RepceClient: call 12632:140397039490880:1494865899.35 pid -> 19304
[2017-05-15 16:31:39.357983] D [repce(/data/brick3/gv0):209:__call__]
RepceClient: call 12636:140598056400704:1494865899.35 pid -> 19305
[2017-05-15 16:31:39.363502] D [repce(/data/brick1/gv0):209:__call__]
RepceClient: call 12634:139689683523392:1494865899.36 pid -> 19303
[2017-05-15 16:31:43.453656] D [resource(/data/brick3/gv0):1332:inhibit]
DirectMounter: auxiliary glusterfs mount in place
[2017-05-15 16:31:43.462914] D [resource(/data/brick1/gv0):1332:inhibit]
DirectMounter: auxiliary glusterfs mount in place
[2017-05-15 16:31:43.464389] D [resource(/data/brick2/gv0):1332:inhibit]
DirectMounter: auxiliary glusterfs mount in place
[2017-05-15 16:31:44.478801] D [resource(/data/brick3/gv0):1387:inhibit]
DirectMounter: auxiliary glusterfs mount prepared
[2017-05-15 16:31:44.479312] D
[master(/data/brick3/gv0):101:gmaster_builder] <top>: setting up xsync
change detection mode
[2017-05-15 16:31:44.479366] D [monitor(monitor):350:monitor] Monitor:
worker(/data/brick3/gv0) connected
[2017-05-15 16:31:44.480387] D
[master(/data/brick3/gv0):101:gmaster_builder] <top>: setting up changelog
change detection mode
[2017-05-15 16:31:44.481631] D
[master(/data/brick3/gv0):101:gmaster_builder] <top>: setting up
changeloghistory change detection mode
[2017-05-15 16:31:44.485300] D [repce(/data/brick3/gv0):191:push]
RepceClient: call 12636:140598056400704:1494865904.49 version() ...
[2017-05-15 16:31:44.485999] D [repce(/data/brick3/gv0):209:__call__]
RepceClient: call 12636:140598056400704:1494865904.49 version -> 1.0
[2017-05-15 16:31:44.486202] D
[master(/data/brick3/gv0):752:setup_working_dir] _GMaster: changelog
working dir
/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/9cffa7778b4f82aafb982c0f3eb3d650
[2017-05-15 16:31:44.486382] D [repce(/data/brick3/gv0):191:push]
RepceClient: call 12636:140598056400704:1494865904.49 init() ...
[2017-05-15 16:31:44.487781] D [resource(/data/brick1/gv0):1387:inhibit]
DirectMounter: auxiliary glusterfs mount prepared
[2017-05-15 16:31:44.488292] D
[master(/data/brick1/gv0):101:gmaster_builder] <top>: setting up xsync
change detection mode
[2017-05-15 16:31:44.488245] D [monitor(monitor):350:monitor] Monitor:
worker(/data/brick1/gv0) connected
[2017-05-15 16:31:44.489343] D
[master(/data/brick1/gv0):101:gmaster_builder] <top>: setting up changelog
change detection mode
[2017-05-15 16:31:44.489279] D [resource(/data/brick2/gv0):1387:inhibit]
DirectMounter: auxiliary glusterfs mount prepared
[2017-05-15 16:31:44.489826] D
[master(/data/brick2/gv0):101:gmaster_builder] <top>: setting up xsync
change detection mode
[2017-05-15 16:31:44.489825] D [monitor(monitor):350:monitor] Monitor:
worker(/data/brick2/gv0) connected
[2017-05-15 16:31:44.490509] D
[master(/data/brick1/gv0):101:gmaster_builder] <top>: setting up
changeloghistory change detection mode
[2017-05-15 16:31:44.491131] D
[master(/data/brick2/gv0):101:gmaster_builder] <top>: setting up changelog
change detection mode
[2017-05-15 16:31:44.493197] D
[master(/data/brick2/gv0):101:gmaster_builder] <top>: setting up
changeloghistory change detection mode
[2017-05-15 16:31:44.493820] D [repce(/data/brick1/gv0):191:push]
RepceClient: call 12634:139689683523392:1494865904.49 version() ...
[2017-05-15 16:31:44.494577] D [repce(/data/brick1/gv0):209:__call__]
RepceClient: call 12634:139689683523392:1494865904.49 version -> 1.0
[2017-05-15 16:31:44.494801] D
[master(/data/brick1/gv0):752:setup_working_dir] _GMaster: changelog
working dir
/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/430af6dc2d4f6e41e4786764428f83dd
[2017-05-15 16:31:44.494982] D [repce(/data/brick1/gv0):191:push]
RepceClient: call 12634:139689683523392:1494865904.49 init() ...
[2017-05-15 16:31:44.495695] D [repce(/data/brick2/gv0):191:push]
RepceClient: call 12632:140397039490880:1494865904.5 version() ...
[2017-05-15 16:31:44.496423] D [repce(/data/brick2/gv0):209:__call__]
RepceClient: call 12632:140397039490880:1494865904.5 version -> 1.0
[2017-05-15 16:31:44.496617] D
[master(/data/brick2/gv0):752:setup_working_dir] _GMaster: changelog
working dir
/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/dff20e5176b24a4185f15b3e2c70fad8
[2017-05-15 16:31:44.496607] D [repce(/data/brick3/gv0):209:__call__]
RepceClient: call 12636:140598056400704:1494865904.49 init -> None
[2017-05-15 16:31:44.496813] D [repce(/data/brick2/gv0):191:push]
RepceClient: call 12632:140397039490880:1494865904.5 init() ...
[2017-05-15 16:31:44.496891] D [repce(/data/brick3/gv0):191:push]
RepceClient: call 12636:140598056400704:1494865904.5
register('/data/brick3/gv0',
'/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/9cffa7778b4f82aafb982c0f3eb3d650',
'/var/log/glusterfs/geo-replication/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0.%2Fdata%2Fbrick3%2Fgv0-changes.log',
7, 5) ...
[2017-05-15 16:31:44.505940] D [repce(/data/brick1/gv0):209:__call__]
RepceClient: call 12634:139689683523392:1494865904.49 init -> None
[2017-05-15 16:31:44.506314] D [repce(/data/brick1/gv0):191:push]
RepceClient: call 12634:139689683523392:1494865904.51
register('/data/brick1/gv0',
'/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/430af6dc2d4f6e41e4786764428f83dd',
'/var/log/glusterfs/geo-replication/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0.%2Fdata%2Fbrick1%2Fgv0-changes.log',
7, 5) ...
[2017-05-15 16:31:44.507751] D [repce(/data/brick2/gv0):209:__call__]
RepceClient: call 12632:140397039490880:1494865904.5 init -> None
[2017-05-15 16:31:44.508045] D [repce(/data/brick2/gv0):191:push]
RepceClient: call 12632:140397039490880:1494865904.51
register('/data/brick2/gv0',
'/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/dff20e5176b24a4185f15b3e2c70fad8',
'/var/log/glusterfs/geo-replication/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0.%2Fdata%2Fbrick2%2Fgv0-changes.log',
7, 5) ...
[2017-05-15 16:31:46.605554] D [repce(/data/brick3/gv0):209:__call__]
RepceClient: call 12636:140598056400704:1494865904.5 register -> None
[2017-05-15 16:31:46.605916] D
[master(/data/brick3/gv0):752:setup_working_dir] _GMaster: changelog
working dir
/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/9cffa7778b4f82aafb982c0f3eb3d650
[2017-05-15 16:31:46.606117] D
[master(/data/brick3/gv0):752:setup_working_dir] _GMaster: changelog
working dir
/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/9cffa7778b4f82aafb982c0f3eb3d650
[2017-05-15 16:31:46.606285] D
[master(/data/brick3/gv0):752:setup_working_dir] _GMaster: changelog
working dir
/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/9cffa7778b4f82aafb982c0f3eb3d650
[2017-05-15 16:31:46.606420] I [master(/data/brick3/gv0):1328:register]
_GMaster: Working dir:
/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/9cffa7778b4f82aafb982c0f3eb3d650
[2017-05-15 16:31:46.606653] I
[resource(/data/brick3/gv0):1604:service_loop] GLUSTER: Register time:
1494865906
[2017-05-15 16:31:46.607355] D [repce(/data/brick3/gv0):191:push]
RepceClient: call 12636:140597264365312:1494865906.61 keep_alive(None,) ...
[2017-05-15 16:31:46.610795] D [master(/data/brick3/gv0):540:crawlwrap]
_GMaster: primary master with volume id
d37a7455-0b1b-402e-985b-cf1ace4e513e ...
[2017-05-15 16:31:46.615416] D [repce(/data/brick3/gv0):209:__call__]
RepceClient: call 12636:140597264365312:1494865906.61 keep_alive -> 1
[2017-05-15 16:31:46.622519] I
[gsyncdstatus(/data/brick3/gv0):272:set_active] GeorepStatus: Worker
Status: Active
[2017-05-15 16:31:46.623460] I
[gsyncdstatus(/data/brick3/gv0):245:set_worker_crawl_status] GeorepStatus:
Crawl Status: History Crawl
[2017-05-15 16:31:46.623876] I [master(/data/brick3/gv0):1244:crawl]
_GMaster: starting history crawl... turns: 1, stime: (1492459926, 0),
etime: 1494865906, entry_stime: (1492459926, 0)
[2017-05-15 16:31:46.624118] D [repce(/data/brick3/gv0):191:push]
RepceClient: call 12636:140598056400704:1494865906.62
history('/data/brick3/gv0/.glusterfs/changelogs', 1492459926, 1494865906,
3) ...
[2017-05-15 16:31:46.639169] D [repce(/data/brick3/gv0):209:__call__]
RepceClient: call 12636:140598056400704:1494865906.62 history -> (0,
1494865893L)
[2017-05-15 16:31:46.639429] D [repce(/data/brick3/gv0):191:push]
RepceClient: call 12636:140598056400704:1494865906.64 history_scan() ...
[2017-05-15 16:31:46.671082] D [repce(/data/brick1/gv0):209:__call__]
RepceClient: call 12634:139689683523392:1494865904.51 register -> None
[2017-05-15 16:31:46.671462] D
[master(/data/brick1/gv0):752:setup_working_dir] _GMaster: changelog
working dir
/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/430af6dc2d4f6e41e4786764428f83dd
[2017-05-15 16:31:46.671639] D
[master(/data/brick1/gv0):752:setup_working_dir] _GMaster: changelog
working dir
/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/430af6dc2d4f6e41e4786764428f83dd
[2017-05-15 16:31:46.671840] D
[master(/data/brick1/gv0):752:setup_working_dir] _GMaster: changelog
working dir
/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/430af6dc2d4f6e41e4786764428f83dd
[2017-05-15 16:31:46.671979] I [master(/data/brick1/gv0):1328:register]
_GMaster: Working dir:
/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/430af6dc2d4f6e41e4786764428f83dd
[2017-05-15 16:31:46.671940] D [repce(/data/brick2/gv0):209:__call__]
RepceClient: call 12632:140397039490880:1494865904.51 register -> None
[2017-05-15 16:31:46.672233] I
[resource(/data/brick1/gv0):1604:service_loop] GLUSTER: Register time:
1494865906
[2017-05-15 16:31:46.672239] D
[master(/data/brick2/gv0):752:setup_working_dir] _GMaster: changelog
working dir
/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/dff20e5176b24a4185f15b3e2c70fad8
[2017-05-15 16:31:46.672440] D
[master(/data/brick2/gv0):752:setup_working_dir] _GMaster: changelog
working dir
/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/dff20e5176b24a4185f15b3e2c70fad8
[2017-05-15 16:31:46.672616] D
[master(/data/brick2/gv0):752:setup_working_dir] _GMaster: changelog
working dir
/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/dff20e5176b24a4185f15b3e2c70fad8
[2017-05-15 16:31:46.672787] I [master(/data/brick2/gv0):1328:register]
_GMaster: Working dir:
/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/dff20e5176b24a4185f15b3e2c70fad8
[2017-05-15 16:31:46.673033] I
[resource(/data/brick2/gv0):1604:service_loop] GLUSTER: Register time:
1494865906
[2017-05-15 16:31:46.673294] D [repce(/data/brick1/gv0):191:push]
RepceClient: call 12634:139688752957184:1494865906.67 keep_alive(None,) ...
[2017-05-15 16:31:46.674438] D [repce(/data/brick2/gv0):191:push]
RepceClient: call 12632:140395904235264:1494865906.67 keep_alive(None,) ...
[2017-05-15 16:31:46.675556] D [master(/data/brick1/gv0):540:crawlwrap]
_GMaster: primary master with volume id
d37a7455-0b1b-402e-985b-cf1ace4e513e ...
[2017-05-15 16:31:46.677221] D [master(/data/brick2/gv0):540:crawlwrap]
_GMaster: primary master with volume id
d37a7455-0b1b-402e-985b-cf1ace4e513e ...
[2017-05-15 16:31:46.680387] D [repce(/data/brick1/gv0):209:__call__]
RepceClient: call 12634:139688752957184:1494865906.67 keep_alive -> 1
[2017-05-15 16:31:46.681812] I
[gsyncdstatus(/data/brick1/gv0):272:set_active] GeorepStatus: Worker
Status: Active
[2017-05-15 16:31:46.682248] D [repce(/data/brick2/gv0):209:__call__]
RepceClient: call 12632:140395904235264:1494865906.67 keep_alive -> 1
[2017-05-15 16:31:46.682954] I
[gsyncdstatus(/data/brick1/gv0):245:set_worker_crawl_status] GeorepStatus:
Crawl Status: History Crawl
[2017-05-15 16:31:46.683324] I [master(/data/brick1/gv0):1244:crawl]
_GMaster: starting history crawl... turns: 1, stime: (1492459922, 0),
etime: 1494865906, entry_stime: (1492459922, 0)
[2017-05-15 16:31:46.683530] D [repce(/data/brick1/gv0):191:push]
RepceClient: call 12634:139689683523392:1494865906.68
history('/data/brick1/gv0/.glusterfs/changelogs', 1492459922, 1494865906,
3) ...
[2017-05-15 16:31:46.683958] I
[gsyncdstatus(/data/brick2/gv0):272:set_active] GeorepStatus: Worker
Status: Active
[2017-05-15 16:31:46.684827] I
[gsyncdstatus(/data/brick2/gv0):245:set_worker_crawl_status] GeorepStatus:
Crawl Status: History Crawl
[2017-05-15 16:31:46.685203] I [master(/data/brick2/gv0):1244:crawl]
_GMaster: starting history crawl... turns: 1, stime: (1492459925, 0),
etime: 1494865906, entry_stime: (1492459925, 0)
[2017-05-15 16:31:46.685420] D [repce(/data/brick2/gv0):191:push]
RepceClient: call 12632:140397039490880:1494865906.69
history('/data/brick2/gv0/.glusterfs/changelogs', 1492459925, 1494865906,
3) ...
[2017-05-15 16:31:46.702970] D [repce(/data/brick1/gv0):209:__call__]
RepceClient: call 12634:139689683523392:1494865906.68 history -> (0,
1494865893L)
[2017-05-15 16:31:46.703003] D [repce(/data/brick2/gv0):209:__call__]
RepceClient: call 12632:140397039490880:1494865906.69 history -> (0,
1494865897L)
[2017-05-15 16:31:46.703197] D [repce(/data/brick1/gv0):191:push]
RepceClient: call 12634:139689683523392:1494865906.7 history_scan() ...
[2017-05-15 16:31:46.703249] D [repce(/data/brick2/gv0):191:push]
RepceClient: call 12632:140397039490880:1494865906.7 history_scan() ...
[2017-05-15 16:31:46.703787] D [repce(/data/brick1/gv0):209:__call__]
RepceClient: call 12634:139689683523392:1494865906.7 history_scan -> 1
[2017-05-15 16:31:46.703988] D [repce(/data/brick1/gv0):191:push]
RepceClient: call 12634:139689683523392:1494865906.7 history_getchanges()
...
[2017-05-15 16:31:46.704641] D [repce(/data/brick1/gv0):209:__call__]
RepceClient: call 12634:139689683523392:1494865906.7 history_getchanges ->
['/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/430af6dc2d4f6e41e4786764428f83dd/.history/.processing/CHANGELOG.1492459923']
[2017-05-15 16:31:46.704828] I [master(/data/brick1/gv0):1272:crawl]
_GMaster: slave's time: (1492459922, 0)
[2017-05-15 16:31:46.704973] D
[master(/data/brick1/gv0):1183:changelogs_batch_process] _GMaster:
processing changes
['/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/430af6dc2d4f6e41e4786764428f83dd/.history/.processing/CHANGELOG.1492459923']
[2017-05-15 16:31:46.705100] D [master(/data/brick1/gv0):1038:process]
_GMaster: processing change
/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/430af6dc2d4f6e41e4786764428f83dd/.history/.processing/CHANGELOG.1492459923
[2017-05-15 16:31:46.706136] D
[master(/data/brick1/gv0):948:process_change] _GMaster: entries: [{'uid':
10000006, 'gfid': 'bf3b90bd-34a5-4265-98a6-54e7a783c142', 'gid': 25961,
'mode': 33200, 'entry':
'.gfid/598cc6d2-b95e-4ba2-9a70-d1a9c0f752ce/file-946-of-5000-at-1.00KB',
'op': 'CREATE'}, ...
...
/* omitted many file paths to sync */
...
[2017-05-15 16:31:46.737530] D [repce(/data/brick1/gv0):209:__call__]
RepceClient: call 12634:139689683523392:1494865906.71 entry_ops -> []
[2017-05-15 16:31:46.741244] E
[syncdutils(/data/brick1/gv0):291:log_raise_exception] <top>: glusterfs
session went down [ENOTCONN]
[2017-05-15 16:31:46.741379] E
[syncdutils(/data/brick1/gv0):297:log_raise_exception] <top>: FULL
EXCEPTION TRACE:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 204, in
main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 780, in
main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1610,
in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 600, in
crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1281, in
crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1184, in
changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1039, in
process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 986, in
process_change
    st = lstat(go[0])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 490,
in lstat
    return errno_wrap(os.lstat, [e], [ENOENT], [ESTALE])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 473,
in errno_wrap
    return call(*arg)
OSError: [Errno 107] Transport endpoint is not connected:
'.gfid/accf7915-d1dc-4869-86d9-60722ccdf9c4'
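
The lstat() that fails here runs against one of gsyncd's private aux-gfid
FUSE mounts (the "auxiliary glusterfs mount" in the logs above), so the
ENOTCONN presumably means that mount dropped mid-crawl. As a sanity check,
an aux-gfid mount can be made by hand and the same gfid stat'ed; a sketch,
with a made-up mount point:

    mkdir -p /mnt/gfid-check
    mount -t glusterfs -o aux-gfid-mount localhost:/gv0 /mnt/gfid-check
    stat /mnt/gfid-check/.gfid/accf7915-d1dc-4869-86d9-60722ccdf9c4
    umount /mnt/gfid-check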

Current geo-replication config, dumped with the stock config command:
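
    gluster volume geo-replication gv0 root@slave-gfs1.tomfite.com::gv0 config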

special_sync_mode: partial
gluster_log_file:
/var/log/glusterfs/geo-replication/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0.gluster.log
ssh_command: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
/var/lib/glusterd/geo-replication/secret.pem
ssh_port: 20022
change_detector: changelog
session_owner: d37a7455-0b1b-402e-985b-cf1ace4e513e
state_file:
/var/lib/glusterd/geo-replication/gv0_slave-gfs1.tomfite.com_gv0/monitor.status
gluster_params: aux-gfid-mount acl
log_level: DEBUG
remote_gsyncd: /usr/libexec/glusterfs/gsyncd
working_dir:
/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0
state_detail_file:
/var/lib/glusterd/geo-replication/gv0_slave-gfs1.tomfite.com_gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0-detail.status
gluster_command_dir: /usr/sbin/
pid_file:
/var/lib/glusterd/geo-replication/gv0_slave-gfs1.tomfite.com_gv0/monitor.pid
georep_session_working_dir:
/var/lib/glusterd/geo-replication/gv0_slave-gfs1.tomfite.com_gv0/
ssh_command_tar: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no
-i /var/lib/glusterd/geo-replication/tar_ssh.pem
master.stime_xattr_name:
trusted.glusterfs.d37a7455-0b1b-402e-985b-cf1ace4e513e.30970990-6acb-4f33-a1f2-5c2056004818.stime
changelog_log_file:
/var/log/glusterfs/geo-replication/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0-changes.log
socketdir: /var/run/gluster
volume_id: d37a7455-0b1b-402e-985b-cf1ace4e513e
ignore_deletes: false
state_socket_unencoded:
/var/lib/glusterd/geo-replication/gv0_slave-gfs1.tomfite.com_gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0.socket
log_file:
/var/log/glusterfs/geo-replication/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0.log


Gluster volume status on master

Gluster process                                 TCP Port  RDMA Port  Online  Pid
---------------------------------------------------------------------------------
Brick master-gfs1.tomfite.com:/data/brick1/gv0  49152     0          Y       3989
Brick master-gfs2.tomfite.com:/data/brick1/gv0  49152     0          Y       3610
Brick master-gfs1.tomfite.com:/data/brick2/gv0  49153     0          Y       4000
Brick master-gfs2.tomfite.com:/data/brick2/gv0  49153     0          Y       3621
Brick master-gfs1.tomfite.com:/data/brick3/gv0  49154     0          Y       4010
Brick master-gfs2.tomfite.com:/data/brick3/gv0  49154     0          Y       3632
Snapshot Daemon on localhost                    49197     0          Y       4946
NFS Server on localhost                         N/A       N/A        N       N/A
Self-heal Daemon on localhost                   N/A       N/A        Y       2885
Snapshot Daemon on master-gfs2.tomfite.com      49197     0          Y       4600
NFS Server on master-gfs2.tomfite.com           N/A       N/A        N       N/A
Self-heal Daemon on master-gfs2.tomfite.com     N/A       N/A        Y       2856

Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks


Gluster volume status on slave

Gluster process                                 TCP Port  RDMA Port  Online  Pid
---------------------------------------------------------------------------------
Brick slave-gfs1.tomfite.com:/data/brick1/gv0   49152     0          Y       3688
Brick slave-gfs2.tomfite.com:/data/brick1/gv0   49152     0          Y       3701
Brick slave-gfs1.tomfite.com:/data/brick2/gv0   49153     0          Y       3696
Brick slave-gfs2.tomfite.com:/data/brick2/gv0   49153     0          Y       3695
Brick slave-gfs1.tomfite.com:/data/brick3/gv0   49154     0          Y       3702
Brick slave-gfs2.tomfite.com:/data/brick3/gv0   49154     0          Y       3707
NFS Server on localhost                         N/A       N/A        N       N/A
Self-heal Daemon on localhost                   N/A       N/A        Y       2630
NFS Server on slave-gfs2.tomfite.com            N/A       N/A        N       N/A
Self-heal Daemon on slave-gfs2.tomfite.com      N/A       N/A        Y       2635

Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks


Anybody have any other ideas for me to check out?