<div dir="ltr"><div>Hi all,<br></div><div><br></div><div>I&#39;ve hit a strange problem with geo-replication.</div><div><br></div><div>On gluster 3.10.1, I have set up geo-replication between my replicated/distributed instance and a remote replicated/distributed instance. The master and slave instances are connected via VPN. Initially the geo-replication session was working fine: after the initial setup it had a status of &quot;Active&quot; with a crawl status of &quot;Changelog Crawl&quot;, and I confirmed that files were synced between the two gluster instances.</div><div><br></div><div>Something must have changed between then and now, because about a week after the instance came online it switched to a &quot;Faulty&quot; status.</div><div><br></div><div>[root@master-gfs1 ~]# gluster volume geo-replication gv0 root@slave-gfs1.tomfite.com::gv0 status</div><div> </div><div>MASTER NODE                       MASTER VOL    MASTER BRICK        SLAVE USER    SLAVE                                 SLAVE NODE                       STATUS     CRAWL STATUS    LAST_SYNCED          </div><div>----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------</div><div><a href="http://master-gfs1.tomfite.com">master-gfs1.tomfite.com</a>    gv0           /data/brick1/gv0    root          slave-gfs1.tomfite.com::gv0    N/A                              Faulty     N/A             N/A                  </div><div><a href="http://master-gfs1.tomfite.com">master-gfs1.tomfite.com</a>    gv0           /data/brick2/gv0    root          slave-gfs1.tomfite.com::gv0    N/A                              Faulty     N/A             N/A                  </div><div><a href="http://master-gfs1.tomfite.com">master-gfs1.tomfite.com</a>    gv0           /data/brick3/gv0    root          slave-gfs1.tomfite.com::gv0    N/A                              Faulty     N/A             N/A  
                </div><div><a href="http://master-gfs2.tomfite.com">master-gfs2.tomfite.com</a>    gv0           /data/brick1/gv0    root          slave-gfs1.tomfite.com::gv0    <a href="http://slave-gfs1.tomfite.com">slave-gfs1.tomfite.com</a>    Passive    N/A             N/A                  </div><div><a href="http://master-gfs2.tomfite.com">master-gfs2.tomfite.com</a>    gv0           /data/brick2/gv0    root          slave-gfs1.tomfite.com::gv0    <a href="http://slave-gfs1.tomfite.com">slave-gfs1.tomfite.com</a>    Passive    N/A             N/A                  </div><div><a href="http://master-gfs2.tomfite.com">master-gfs2.tomfite.com</a>    gv0           /data/brick3/gv0    root          slave-gfs1.tomfite.com::gv0    <a href="http://slave-gfs1.tomfite.com">slave-gfs1.tomfite.com</a>    Passive    N/A             N/A   </div><div><br></div><div>From the logs (see below), it seems there is an issue syncing files to the slave: I get a &quot;Transport endpoint is not connected&quot; error when gsyncd attempts to sync the first set of files.</div><div><br></div><div>Here&#39;s what I&#39;ve tried so far:</div><div><br></div><div>1. ssh_port is currently configured to use a non-standard port. I switched it to the standard port 22 but observed no change in behavior.</div><div>2. I verified that SELinux is disabled on all boxes and that there are no firewalls running.</div><div>3. The remote_gsyncd setting was set to &quot;/nonexistent/gsyncd&quot;, which looked incorrect, so I changed it to the valid location of that executable, /usr/libexec/glusterfs/gsyncd.</div><div>4. 
In an attempt to start the slave from scratch, I removed all files from the slave and reset the geo-replication instance by deleting and recreating the session.</div><div><br></div><div>Debug logs when trying to start geo-replication: </div><div><br></div><div>[2017-05-15 16:31:32.940068] I [gsyncd(conf):689:main_i] &lt;top&gt;: Config Set: session-owner = d37a7455-0b1b-402e-985b-cf1ace4e513e</div><div>[2017-05-15 16:31:33.293926] D [monitor(monitor):434:distribute] &lt;top&gt;: master bricks: [{&#39;host&#39;: &#39;<a href="http://master-gfs1.tomfite.com">master-gfs1.tomfite.com</a>&#39;, &#39;uuid&#39;: &#39;e0d9d624-5383-4c43-aca4-e946e7de296d&#39;, &#39;dir&#39;: &#39;/data/brick1/gv0&#39;}, {&#39;host&#39;: &#39;<a href="http://master-gfs2.tomfite.com">master-gfs2.tomfite.com</a>&#39;, &#39;uuid&#39;: &#39;bdbb7a18-3ecf-4733-a5df-447d8c712af5&#39;, &#39;dir&#39;: &#39;/data/brick1/gv0&#39;}, {&#39;host&#39;: &#39;<a href="http://master-gfs1.tomfite.com">master-gfs1.tomfite.com</a>&#39;, &#39;uuid&#39;: &#39;e0d9d624-5383-4c43-aca4-e946e7de296d&#39;, &#39;dir&#39;: &#39;/data/brick2/gv0&#39;}, {&#39;host&#39;: &#39;<a href="http://master-gfs2.tomfite.com">master-gfs2.tomfite.com</a>&#39;, &#39;uuid&#39;: &#39;bdbb7a18-3ecf-4733-a5df-447d8c712af5&#39;, &#39;dir&#39;: &#39;/data/brick2/gv0&#39;}, {&#39;host&#39;: &#39;<a href="http://master-gfs1.tomfite.com">master-gfs1.tomfite.com</a>&#39;, &#39;uuid&#39;: &#39;e0d9d624-5383-4c43-aca4-e946e7de296d&#39;, &#39;dir&#39;: &#39;/data/brick3/gv0&#39;}, {&#39;host&#39;: &#39;<a href="http://master-gfs2.tomfite.com">master-gfs2.tomfite.com</a>&#39;, &#39;uuid&#39;: &#39;bdbb7a18-3ecf-4733-a5df-447d8c712af5&#39;, &#39;dir&#39;: &#39;/data/brick3/gv0&#39;}]</div><div>[2017-05-15 16:31:33.294250] D [monitor(monitor):443:distribute] &lt;top&gt;: slave SSH gateway: <a href="http://slave-gfs1.tomfite.com">slave-gfs1.tomfite.com</a></div><div>[2017-05-15 16:31:33.424451] D [monitor(monitor):464:distribute] &lt;top&gt;: slave 
bricks: [{&#39;host&#39;: &#39;<a href="http://slave-gfs1.tomfite.com">slave-gfs1.tomfite.com</a>&#39;, &#39;uuid&#39;: &#39;c184bc78-cff0-4cef-8c6a-e637ab52b324&#39;, &#39;dir&#39;: &#39;/data/brick1/gv0&#39;}, {&#39;host&#39;: &#39;<a href="http://slave-gfs2.tomfite.com">slave-gfs2.tomfite.com</a>&#39;, &#39;uuid&#39;: &#39;7290f265-0709-45fc-86ef-2ff5125d31e1&#39;, &#39;dir&#39;: &#39;/data/brick1/gv0&#39;}, {&#39;host&#39;: &#39;<a href="http://slave-gfs1.tomfite.com">slave-gfs1.tomfite.com</a>&#39;, &#39;uuid&#39;: &#39;c184bc78-cff0-4cef-8c6a-e637ab52b324&#39;, &#39;dir&#39;: &#39;/data/brick2/gv0&#39;}, {&#39;host&#39;: &#39;<a href="http://slave-gfs2.tomfite.com">slave-gfs2.tomfite.com</a>&#39;, &#39;uuid&#39;: &#39;7290f265-0709-45fc-86ef-2ff5125d31e1&#39;, &#39;dir&#39;: &#39;/data/brick2/gv0&#39;}, {&#39;host&#39;: &#39;<a href="http://slave-gfs1.tomfite.com">slave-gfs1.tomfite.com</a>&#39;, &#39;uuid&#39;: &#39;c184bc78-cff0-4cef-8c6a-e637ab52b324&#39;, &#39;dir&#39;: &#39;/data/brick3/gv0&#39;}, {&#39;host&#39;: &#39;<a href="http://slave-gfs2.tomfite.com">slave-gfs2.tomfite.com</a>&#39;, &#39;uuid&#39;: &#39;7290f265-0709-45fc-86ef-2ff5125d31e1&#39;, &#39;dir&#39;: &#39;/data/brick3/gv0&#39;}]</div><div>[2017-05-15 16:31:33.424927] D [monitor(monitor):119:is_hot] Volinfo: brickpath: &#39;master-gfs1.tomfite.com:/data/brick1/gv0&#39;</div><div>[2017-05-15 16:31:33.425452] D [monitor(monitor):119:is_hot] Volinfo: brickpath: &#39;master-gfs1.tomfite.com:/data/brick2/gv0&#39;</div><div>[2017-05-15 16:31:33.425790] D [monitor(monitor):119:is_hot] Volinfo: brickpath: &#39;master-gfs1.tomfite.com:/data/brick3/gv0&#39;</div><div>[2017-05-15 16:31:33.426130] D [monitor(monitor):489:distribute] &lt;top&gt;: worker specs: [({&#39;host&#39;: &#39;<a href="http://master-gfs1.tomfite.com">master-gfs1.tomfite.com</a>&#39;, &#39;uuid&#39;: &#39;e0d9d624-5383-4c43-aca4-e946e7de296d&#39;, &#39;dir&#39;: &#39;/data/brick1/gv0&#39;}, 
&#39;ssh://root@slave-gfs2.tomfite.com:gluster://localhost:gv0&#39;, &#39;1&#39;, False), ({&#39;host&#39;: &#39;<a href="http://master-gfs1.tomfite.com">master-gfs1.tomfite.com</a>&#39;, &#39;uuid&#39;: &#39;e0d9d624-5383-4c43-aca4-e946e7de296d&#39;, &#39;dir&#39;: &#39;/data/brick2/gv0&#39;}, &#39;ssh://root@slave-gfs2.tomfite.com:gluster://localhost:gv0&#39;, &#39;2&#39;, False), ({&#39;host&#39;: &#39;<a href="http://master-gfs1.tomfite.com">master-gfs1.tomfite.com</a>&#39;, &#39;uuid&#39;: &#39;e0d9d624-5383-4c43-aca4-e946e7de296d&#39;, &#39;dir&#39;: &#39;/data/brick3/gv0&#39;}, &#39;ssh://root@slave-gfs2.tomfite.com:gluster://localhost:gv0&#39;, &#39;3&#39;, False)]</div><div>[2017-05-15 16:31:33.429359] I [gsyncdstatus(monitor):241:set_worker_status] GeorepStatus: Worker Status: Initializing...</div><div>[2017-05-15 16:31:33.432882] I [gsyncdstatus(monitor):241:set_worker_status] GeorepStatus: Worker Status: Initializing...</div><div>[2017-05-15 16:31:33.435489] I [gsyncdstatus(monitor):241:set_worker_status] GeorepStatus: Worker Status: Initializing...</div><div>[2017-05-15 16:31:33.574393] I [monitor(monitor):74:get_slave_bricks_status] &lt;top&gt;: Unable to get list of up nodes of gv0, returning empty list: Another transaction is in progress for gv0. Please try again after sometime.</div><div>[2017-05-15 16:31:33.574764] I [monitor(monitor):275:monitor] Monitor: starting gsyncd worker(/data/brick2/gv0). Slave node: ssh://root@slave-gfs2.tomfite.com:gluster://localhost:gv0</div><div>[2017-05-15 16:31:33.578641] I [monitor(monitor):74:get_slave_bricks_status] &lt;top&gt;: Unable to get list of up nodes of gv0, returning empty list: Another transaction is in progress for gv0. Please try again after sometime.</div><div>[2017-05-15 16:31:33.579119] I [monitor(monitor):275:monitor] Monitor: starting gsyncd worker(/data/brick1/gv0). 
Slave node: ssh://root@slave-gfs2.tomfite.com:gluster://localhost:gv0</div><div>[2017-05-15 16:31:33.585609] I [monitor(monitor):275:monitor] Monitor: starting gsyncd worker(/data/brick3/gv0). Slave node: ssh://root@slave-gfs2.tomfite.com:gluster://localhost:gv0</div><div>[2017-05-15 16:31:33.671281] D [gsyncd(/data/brick1/gv0):765:main_i] &lt;top&gt;: rpc_fd: &#39;9,12,11,10&#39;</div><div>[2017-05-15 16:31:33.672070] I [changelogagent(/data/brick1/gv0):73:__init__] ChangelogAgent: Agent listining...</div><div>[2017-05-15 16:31:33.673501] D [gsyncd(/data/brick3/gv0):765:main_i] &lt;top&gt;: rpc_fd: &#39;8,11,10,9&#39;</div><div>[2017-05-15 16:31:33.674078] I [changelogagent(/data/brick3/gv0):73:__init__] ChangelogAgent: Agent listining...</div><div>[2017-05-15 16:31:33.676042] D [gsyncd(/data/brick2/gv0):765:main_i] &lt;top&gt;: rpc_fd: &#39;9,14,13,11&#39;</div><div>[2017-05-15 16:31:33.676713] I [changelogagent(/data/brick2/gv0):73:__init__] ChangelogAgent: Agent listining...</div><div>[2017-05-15 16:31:33.695128] D [repce(/data/brick2/gv0):191:push] RepceClient: call 12632:140397039490880:1494865893.7 __repce_version__() ...</div><div>[2017-05-15 16:31:33.696594] D [repce(/data/brick1/gv0):191:push] RepceClient: call 12634:139689683523392:1494865893.7 __repce_version__() ...</div><div>[2017-05-15 16:31:33.706545] D [repce(/data/brick3/gv0):191:push] RepceClient: call 12636:140598056400704:1494865893.71 __repce_version__() ...</div><div>[2017-05-15 16:31:39.342730] D [repce(/data/brick2/gv0):209:__call__] RepceClient: call 12632:140397039490880:1494865893.7 __repce_version__ -&gt; 1.0</div><div>[2017-05-15 16:31:39.343020] D [repce(/data/brick2/gv0):191:push] RepceClient: call 12632:140397039490880:1494865899.34 version() ...</div><div>[2017-05-15 16:31:39.343569] D [repce(/data/brick3/gv0):209:__call__] RepceClient: call 12636:140598056400704:1494865893.71 __repce_version__ -&gt; 1.0</div><div>[2017-05-15 16:31:39.343859] D [repce(/data/brick3/gv0):191:push] 
RepceClient: call 12636:140598056400704:1494865899.34 version() ...</div><div>[2017-05-15 16:31:39.349275] D [repce(/data/brick1/gv0):209:__call__] RepceClient: call 12634:139689683523392:1494865893.7 __repce_version__ -&gt; 1.0</div><div>[2017-05-15 16:31:39.349540] D [repce(/data/brick1/gv0):191:push] RepceClient: call 12634:139689683523392:1494865899.35 version() ...</div><div>[2017-05-15 16:31:39.349998] D [repce(/data/brick2/gv0):209:__call__] RepceClient: call 12632:140397039490880:1494865899.34 version -&gt; 1.0</div><div>[2017-05-15 16:31:39.350292] D [repce(/data/brick2/gv0):191:push] RepceClient: call 12632:140397039490880:1494865899.35 pid() ...</div><div>[2017-05-15 16:31:39.350780] D [repce(/data/brick3/gv0):209:__call__] RepceClient: call 12636:140598056400704:1494865899.34 version -&gt; 1.0</div><div>[2017-05-15 16:31:39.351070] D [repce(/data/brick3/gv0):191:push] RepceClient: call 12636:140598056400704:1494865899.35 pid() ...</div><div>[2017-05-15 16:31:39.356405] D [repce(/data/brick1/gv0):209:__call__] RepceClient: call 12634:139689683523392:1494865899.35 version -&gt; 1.0</div><div>[2017-05-15 16:31:39.356715] D [repce(/data/brick1/gv0):191:push] RepceClient: call 12634:139689683523392:1494865899.36 pid() ...</div><div>[2017-05-15 16:31:39.357254] D [repce(/data/brick2/gv0):209:__call__] RepceClient: call 12632:140397039490880:1494865899.35 pid -&gt; 19304</div><div>[2017-05-15 16:31:39.357983] D [repce(/data/brick3/gv0):209:__call__] RepceClient: call 12636:140598056400704:1494865899.35 pid -&gt; 19305</div><div>[2017-05-15 16:31:39.363502] D [repce(/data/brick1/gv0):209:__call__] RepceClient: call 12634:139689683523392:1494865899.36 pid -&gt; 19303</div><div>[2017-05-15 16:31:43.453656] D [resource(/data/brick3/gv0):1332:inhibit] DirectMounter: auxiliary glusterfs mount in place</div><div>[2017-05-15 16:31:43.462914] D [resource(/data/brick1/gv0):1332:inhibit] DirectMounter: auxiliary glusterfs mount in place</div><div>[2017-05-15 
16:31:43.464389] D [resource(/data/brick2/gv0):1332:inhibit] DirectMounter: auxiliary glusterfs mount in place</div><div>[2017-05-15 16:31:44.478801] D [resource(/data/brick3/gv0):1387:inhibit] DirectMounter: auxiliary glusterfs mount prepared</div><div>[2017-05-15 16:31:44.479312] D [master(/data/brick3/gv0):101:gmaster_builder] &lt;top&gt;: setting up xsync change detection mode</div><div>[2017-05-15 16:31:44.479366] D [monitor(monitor):350:monitor] Monitor: worker(/data/brick3/gv0) connected</div><div>[2017-05-15 16:31:44.480387] D [master(/data/brick3/gv0):101:gmaster_builder] &lt;top&gt;: setting up changelog change detection mode</div><div>[2017-05-15 16:31:44.481631] D [master(/data/brick3/gv0):101:gmaster_builder] &lt;top&gt;: setting up changeloghistory change detection mode</div><div>[2017-05-15 16:31:44.485300] D [repce(/data/brick3/gv0):191:push] RepceClient: call 12636:140598056400704:1494865904.49 version() ...</div><div>[2017-05-15 16:31:44.485999] D [repce(/data/brick3/gv0):209:__call__] RepceClient: call 12636:140598056400704:1494865904.49 version -&gt; 1.0</div><div>[2017-05-15 16:31:44.486202] D [master(/data/brick3/gv0):752:setup_working_dir] _GMaster: changelog working dir /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/9cffa7778b4f82aafb982c0f3eb3d650</div><div>[2017-05-15 16:31:44.486382] D [repce(/data/brick3/gv0):191:push] RepceClient: call 12636:140598056400704:1494865904.49 init() ...</div><div>[2017-05-15 16:31:44.487781] D [resource(/data/brick1/gv0):1387:inhibit] DirectMounter: auxiliary glusterfs mount prepared</div><div>[2017-05-15 16:31:44.488292] D [master(/data/brick1/gv0):101:gmaster_builder] &lt;top&gt;: setting up xsync change detection mode</div><div>[2017-05-15 16:31:44.488245] D [monitor(monitor):350:monitor] Monitor: worker(/data/brick1/gv0) connected</div><div>[2017-05-15 16:31:44.489343] D [master(/data/brick1/gv0):101:gmaster_builder] &lt;top&gt;: setting up changelog change 
detection mode</div><div>[2017-05-15 16:31:44.489279] D [resource(/data/brick2/gv0):1387:inhibit] DirectMounter: auxiliary glusterfs mount prepared</div><div>[2017-05-15 16:31:44.489826] D [master(/data/brick2/gv0):101:gmaster_builder] &lt;top&gt;: setting up xsync change detection mode</div><div>[2017-05-15 16:31:44.489825] D [monitor(monitor):350:monitor] Monitor: worker(/data/brick2/gv0) connected</div><div>[2017-05-15 16:31:44.490509] D [master(/data/brick1/gv0):101:gmaster_builder] &lt;top&gt;: setting up changeloghistory change detection mode</div><div>[2017-05-15 16:31:44.491131] D [master(/data/brick2/gv0):101:gmaster_builder] &lt;top&gt;: setting up changelog change detection mode</div><div>[2017-05-15 16:31:44.493197] D [master(/data/brick2/gv0):101:gmaster_builder] &lt;top&gt;: setting up changeloghistory change detection mode</div><div>[2017-05-15 16:31:44.493820] D [repce(/data/brick1/gv0):191:push] RepceClient: call 12634:139689683523392:1494865904.49 version() ...</div><div>[2017-05-15 16:31:44.494577] D [repce(/data/brick1/gv0):209:__call__] RepceClient: call 12634:139689683523392:1494865904.49 version -&gt; 1.0</div><div>[2017-05-15 16:31:44.494801] D [master(/data/brick1/gv0):752:setup_working_dir] _GMaster: changelog working dir /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/430af6dc2d4f6e41e4786764428f83dd</div><div>[2017-05-15 16:31:44.494982] D [repce(/data/brick1/gv0):191:push] RepceClient: call 12634:139689683523392:1494865904.49 init() ...</div><div>[2017-05-15 16:31:44.495695] D [repce(/data/brick2/gv0):191:push] RepceClient: call 12632:140397039490880:1494865904.5 version() ...</div><div>[2017-05-15 16:31:44.496423] D [repce(/data/brick2/gv0):209:__call__] RepceClient: call 12632:140397039490880:1494865904.5 version -&gt; 1.0</div><div>[2017-05-15 16:31:44.496617] D [master(/data/brick2/gv0):752:setup_working_dir] _GMaster: changelog working dir 
/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/dff20e5176b24a4185f15b3e2c70fad8</div><div>[2017-05-15 16:31:44.496607] D [repce(/data/brick3/gv0):209:__call__] RepceClient: call 12636:140598056400704:1494865904.49 init -&gt; None</div><div>[2017-05-15 16:31:44.496813] D [repce(/data/brick2/gv0):191:push] RepceClient: call 12632:140397039490880:1494865904.5 init() ...</div><div>[2017-05-15 16:31:44.496891] D [repce(/data/brick3/gv0):191:push] RepceClient: call 12636:140598056400704:1494865904.5 register(&#39;/data/brick3/gv0&#39;, &#39;/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/9cffa7778b4f82aafb982c0f3eb3d650&#39;, &#39;/var/log/glusterfs/geo-replication/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0.%2Fdata%2Fbrick3%2Fgv0-changes.log&#39;, 7, 5) ...</div><div>[2017-05-15 16:31:44.505940] D [repce(/data/brick1/gv0):209:__call__] RepceClient: call 12634:139689683523392:1494865904.49 init -&gt; None</div><div>[2017-05-15 16:31:44.506314] D [repce(/data/brick1/gv0):191:push] RepceClient: call 12634:139689683523392:1494865904.51 register(&#39;/data/brick1/gv0&#39;, &#39;/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/430af6dc2d4f6e41e4786764428f83dd&#39;, &#39;/var/log/glusterfs/geo-replication/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0.%2Fdata%2Fbrick1%2Fgv0-changes.log&#39;, 7, 5) ...</div><div>[2017-05-15 16:31:44.507751] D [repce(/data/brick2/gv0):209:__call__] RepceClient: call 12632:140397039490880:1494865904.5 init -&gt; None</div><div>[2017-05-15 16:31:44.508045] D [repce(/data/brick2/gv0):191:push] RepceClient: call 12632:140397039490880:1494865904.51 register(&#39;/data/brick2/gv0&#39;, &#39;/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/dff20e5176b24a4185f15b3e2c70fad8&#39;, 
&#39;/var/log/glusterfs/geo-replication/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0.%2Fdata%2Fbrick2%2Fgv0-changes.log&#39;, 7, 5) ...</div><div>[2017-05-15 16:31:46.605554] D [repce(/data/brick3/gv0):209:__call__] RepceClient: call 12636:140598056400704:1494865904.5 register -&gt; None</div><div>[2017-05-15 16:31:46.605916] D [master(/data/brick3/gv0):752:setup_working_dir] _GMaster: changelog working dir /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/9cffa7778b4f82aafb982c0f3eb3d650</div><div>[2017-05-15 16:31:46.606117] D [master(/data/brick3/gv0):752:setup_working_dir] _GMaster: changelog working dir /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/9cffa7778b4f82aafb982c0f3eb3d650</div><div>[2017-05-15 16:31:46.606285] D [master(/data/brick3/gv0):752:setup_working_dir] _GMaster: changelog working dir /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/9cffa7778b4f82aafb982c0f3eb3d650</div><div>[2017-05-15 16:31:46.606420] I [master(/data/brick3/gv0):1328:register] _GMaster: Working dir: /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/9cffa7778b4f82aafb982c0f3eb3d650</div><div>[2017-05-15 16:31:46.606653] I [resource(/data/brick3/gv0):1604:service_loop] GLUSTER: Register time: 1494865906</div><div>[2017-05-15 16:31:46.607355] D [repce(/data/brick3/gv0):191:push] RepceClient: call 12636:140597264365312:1494865906.61 keep_alive(None,) ...</div><div>[2017-05-15 16:31:46.610795] D [master(/data/brick3/gv0):540:crawlwrap] _GMaster: primary master with volume id d37a7455-0b1b-402e-985b-cf1ace4e513e ...</div><div>[2017-05-15 16:31:46.615416] D [repce(/data/brick3/gv0):209:__call__] RepceClient: call 12636:140597264365312:1494865906.61 keep_alive -&gt; 1</div><div>[2017-05-15 16:31:46.622519] I [gsyncdstatus(/data/brick3/gv0):272:set_active] GeorepStatus: Worker 
Status: Active</div><div>[2017-05-15 16:31:46.623460] I [gsyncdstatus(/data/brick3/gv0):245:set_worker_crawl_status] GeorepStatus: Crawl Status: History Crawl</div><div>[2017-05-15 16:31:46.623876] I [master(/data/brick3/gv0):1244:crawl] _GMaster: starting history crawl... turns: 1, stime: (1492459926, 0), etime: 1494865906, entry_stime: (1492459926, 0)</div><div>[2017-05-15 16:31:46.624118] D [repce(/data/brick3/gv0):191:push] RepceClient: call 12636:140598056400704:1494865906.62 history(&#39;/data/brick3/gv0/.glusterfs/changelogs&#39;, 1492459926, 1494865906, 3) ...</div><div>[2017-05-15 16:31:46.639169] D [repce(/data/brick3/gv0):209:__call__] RepceClient: call 12636:140598056400704:1494865906.62 history -&gt; (0, 1494865893L)</div><div>[2017-05-15 16:31:46.639429] D [repce(/data/brick3/gv0):191:push] RepceClient: call 12636:140598056400704:1494865906.64 history_scan() ...</div><div>[2017-05-15 16:31:46.671082] D [repce(/data/brick1/gv0):209:__call__] RepceClient: call 12634:139689683523392:1494865904.51 register -&gt; None</div><div>[2017-05-15 16:31:46.671462] D [master(/data/brick1/gv0):752:setup_working_dir] _GMaster: changelog working dir /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/430af6dc2d4f6e41e4786764428f83dd</div><div>[2017-05-15 16:31:46.671639] D [master(/data/brick1/gv0):752:setup_working_dir] _GMaster: changelog working dir /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/430af6dc2d4f6e41e4786764428f83dd</div><div>[2017-05-15 16:31:46.671840] D [master(/data/brick1/gv0):752:setup_working_dir] _GMaster: changelog working dir /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/430af6dc2d4f6e41e4786764428f83dd</div><div>[2017-05-15 16:31:46.671979] I [master(/data/brick1/gv0):1328:register] _GMaster: Working dir: 
/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/430af6dc2d4f6e41e4786764428f83dd</div><div>[2017-05-15 16:31:46.671940] D [repce(/data/brick2/gv0):209:__call__] RepceClient: call 12632:140397039490880:1494865904.51 register -&gt; None</div><div>[2017-05-15 16:31:46.672233] I [resource(/data/brick1/gv0):1604:service_loop] GLUSTER: Register time: 1494865906</div><div>[2017-05-15 16:31:46.672239] D [master(/data/brick2/gv0):752:setup_working_dir] _GMaster: changelog working dir /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/dff20e5176b24a4185f15b3e2c70fad8</div><div>[2017-05-15 16:31:46.672440] D [master(/data/brick2/gv0):752:setup_working_dir] _GMaster: changelog working dir /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/dff20e5176b24a4185f15b3e2c70fad8</div><div>[2017-05-15 16:31:46.672616] D [master(/data/brick2/gv0):752:setup_working_dir] _GMaster: changelog working dir /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/dff20e5176b24a4185f15b3e2c70fad8</div><div>[2017-05-15 16:31:46.672787] I [master(/data/brick2/gv0):1328:register] _GMaster: Working dir: /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/dff20e5176b24a4185f15b3e2c70fad8</div><div>[2017-05-15 16:31:46.673033] I [resource(/data/brick2/gv0):1604:service_loop] GLUSTER: Register time: 1494865906</div><div>[2017-05-15 16:31:46.673294] D [repce(/data/brick1/gv0):191:push] RepceClient: call 12634:139688752957184:1494865906.67 keep_alive(None,) ...</div><div>[2017-05-15 16:31:46.674438] D [repce(/data/brick2/gv0):191:push] RepceClient: call 12632:140395904235264:1494865906.67 keep_alive(None,) ...</div><div>[2017-05-15 16:31:46.675556] D [master(/data/brick1/gv0):540:crawlwrap] _GMaster: primary master with volume id d37a7455-0b1b-402e-985b-cf1ace4e513e ...</div><div>[2017-05-15 
16:31:46.677221] D [master(/data/brick2/gv0):540:crawlwrap] _GMaster: primary master with volume id d37a7455-0b1b-402e-985b-cf1ace4e513e ...</div><div>[2017-05-15 16:31:46.680387] D [repce(/data/brick1/gv0):209:__call__] RepceClient: call 12634:139688752957184:1494865906.67 keep_alive -&gt; 1</div><div>[2017-05-15 16:31:46.681812] I [gsyncdstatus(/data/brick1/gv0):272:set_active] GeorepStatus: Worker Status: Active</div><div>[2017-05-15 16:31:46.682248] D [repce(/data/brick2/gv0):209:__call__] RepceClient: call 12632:140395904235264:1494865906.67 keep_alive -&gt; 1</div><div>[2017-05-15 16:31:46.682954] I [gsyncdstatus(/data/brick1/gv0):245:set_worker_crawl_status] GeorepStatus: Crawl Status: History Crawl</div><div>[2017-05-15 16:31:46.683324] I [master(/data/brick1/gv0):1244:crawl] _GMaster: starting history crawl... turns: 1, stime: (1492459922, 0), etime: 1494865906, entry_stime: (1492459922, 0)</div><div>[2017-05-15 16:31:46.683530] D [repce(/data/brick1/gv0):191:push] RepceClient: call 12634:139689683523392:1494865906.68 history(&#39;/data/brick1/gv0/.glusterfs/changelogs&#39;, 1492459922, 1494865906, 3) ...</div><div>[2017-05-15 16:31:46.683958] I [gsyncdstatus(/data/brick2/gv0):272:set_active] GeorepStatus: Worker Status: Active</div><div>[2017-05-15 16:31:46.684827] I [gsyncdstatus(/data/brick2/gv0):245:set_worker_crawl_status] GeorepStatus: Crawl Status: History Crawl</div><div>[2017-05-15 16:31:46.685203] I [master(/data/brick2/gv0):1244:crawl] _GMaster: starting history crawl... 
turns: 1, stime: (1492459925, 0), etime: 1494865906, entry_stime: (1492459925, 0)</div><div>[2017-05-15 16:31:46.685420] D [repce(/data/brick2/gv0):191:push] RepceClient: call 12632:140397039490880:1494865906.69 history(&#39;/data/brick2/gv0/.glusterfs/changelogs&#39;, 1492459925, 1494865906, 3) ...</div><div>[2017-05-15 16:31:46.702970] D [repce(/data/brick1/gv0):209:__call__] RepceClient: call 12634:139689683523392:1494865906.68 history -&gt; (0, 1494865893L)</div><div>[2017-05-15 16:31:46.703003] D [repce(/data/brick2/gv0):209:__call__] RepceClient: call 12632:140397039490880:1494865906.69 history -&gt; (0, 1494865897L)</div><div>[2017-05-15 16:31:46.703197] D [repce(/data/brick1/gv0):191:push] RepceClient: call 12634:139689683523392:1494865906.7 history_scan() ...</div><div>[2017-05-15 16:31:46.703249] D [repce(/data/brick2/gv0):191:push] RepceClient: call 12632:140397039490880:1494865906.7 history_scan() ...</div><div>[2017-05-15 16:31:46.703787] D [repce(/data/brick1/gv0):209:__call__] RepceClient: call 12634:139689683523392:1494865906.7 history_scan -&gt; 1</div><div>[2017-05-15 16:31:46.703988] D [repce(/data/brick1/gv0):191:push] RepceClient: call 12634:139689683523392:1494865906.7 history_getchanges() ...</div><div>[2017-05-15 16:31:46.704641] D [repce(/data/brick1/gv0):209:__call__] RepceClient: call 12634:139689683523392:1494865906.7 history_getchanges -&gt; [&#39;/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/430af6dc2d4f6e41e4786764428f83dd/.history/.processing/CHANGELOG.1492459923&#39;]</div><div>[2017-05-15 16:31:46.704828] I [master(/data/brick1/gv0):1272:crawl] _GMaster: slave&#39;s time: (1492459922, 0)</div><div>[2017-05-15 16:31:46.704973] D [master(/data/brick1/gv0):1183:changelogs_batch_process] _GMaster: processing changes 
[&#39;/var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/430af6dc2d4f6e41e4786764428f83dd/.history/.processing/CHANGELOG.1492459923&#39;]</div><div>[2017-05-15 16:31:46.705100] D [master(/data/brick1/gv0):1038:process] _GMaster: processing change /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0/430af6dc2d4f6e41e4786764428f83dd/.history/.processing/CHANGELOG.1492459923</div><div>[2017-05-15 16:31:46.706136] D [master(/data/brick1/gv0):948:process_change] _GMaster: entries: [{&#39;uid&#39;: 10000006, &#39;gfid&#39;: &#39;bf3b90bd-34a5-4265-98a6-54e7a783c142&#39;, &#39;gid&#39;: 25961, &#39;mode&#39;: 33200, &#39;entry&#39;: &#39;.gfid/598cc6d2-b95e-4ba2-9a70-d1a9c0f752ce/file-946-of-5000-at-1.00KB&#39;, &#39;op&#39;: &#39;CREATE&#39;}, ... </div><div>...</div><div>/* omitted many file paths to sync */</div><div>...</div><div>[2017-05-15 16:31:46.737530] D [repce(/data/brick1/gv0):209:__call__] RepceClient: call 12634:139689683523392:1494865906.71 entry_ops -&gt; []</div><div>[2017-05-15 16:31:46.741244] E [syncdutils(/data/brick1/gv0):291:log_raise_exception] &lt;top&gt;: glusterfs session went down [ENOTCONN]</div><div>[2017-05-15 16:31:46.741379] E [syncdutils(/data/brick1/gv0):297:log_raise_exception] &lt;top&gt;: FULL EXCEPTION TRACE: </div><div>Traceback (most recent call last):</div><div>  File &quot;/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py&quot;, line 204, in main</div><div>    main_i()</div><div>  File &quot;/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py&quot;, line 780, in main_i</div><div>    local.service_loop(*[r for r in [remote] if r])</div><div>  File &quot;/usr/libexec/glusterfs/python/syncdaemon/resource.py&quot;, line 1610, in service_loop</div><div>    g3.crawlwrap(oneshot=True)</div><div>  File &quot;/usr/libexec/glusterfs/python/syncdaemon/master.py&quot;, line 600, in crawlwrap</div><div>    self.crawl()</div><div>  File 
&quot;/usr/libexec/glusterfs/python/syncdaemon/master.py&quot;, line 1281, in crawl</div><div>    self.changelogs_batch_process(changes)</div><div>  File &quot;/usr/libexec/glusterfs/python/syncdaemon/master.py&quot;, line 1184, in changelogs_batch_process</div><div>    self.process(batch)</div><div>  File &quot;/usr/libexec/glusterfs/python/syncdaemon/master.py&quot;, line 1039, in process</div><div>    self.process_change(change, done, retry)</div><div>  File &quot;/usr/libexec/glusterfs/python/syncdaemon/master.py&quot;, line 986, in process_change</div><div>    st = lstat(go[0])</div><div>  File &quot;/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py&quot;, line 490, in lstat</div><div>    return errno_wrap(os.lstat, [e], [ENOENT], [ESTALE])</div><div>  File &quot;/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py&quot;, line 473, in errno_wrap</div><div>    return call(*arg)</div><div>OSError: [Errno 107] Transport endpoint is not connected: &#39;.gfid/accf7915-d1dc-4869-86d9-60722ccdf9c4&#39;</div><div><br></div><div>Current geo-replication config:</div><div><br></div><div>special_sync_mode: partial</div><div>gluster_log_file: /var/log/glusterfs/geo-replication/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0.gluster.log</div><div>ssh_command: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem</div><div>ssh_port: 20022</div><div>change_detector: changelog</div><div>session_owner: d37a7455-0b1b-402e-985b-cf1ace4e513e</div><div>state_file: /var/lib/glusterd/geo-replication/gv0_slave-gfs1.tomfite.com_gv0/monitor.status</div><div>gluster_params: aux-gfid-mount acl</div><div>log_level: DEBUG</div><div>remote_gsyncd: /usr/libexec/glusterfs/gsyncd</div><div>working_dir: /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0</div><div>state_detail_file: 
/var/lib/glusterd/geo-replication/gv0_slave-gfs1.tomfite.com_gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0-detail.status</div><div>gluster_command_dir: /usr/sbin/</div><div>pid_file: /var/lib/glusterd/geo-replication/gv0_slave-gfs1.tomfite.com_gv0/monitor.pid</div><div>georep_session_working_dir: /var/lib/glusterd/geo-replication/gv0_slave-gfs1.tomfite.com_gv0/</div><div>ssh_command_tar: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem</div><div>master.stime_xattr_name: trusted.glusterfs.d37a7455-0b1b-402e-985b-cf1ace4e513e.30970990-6acb-4f33-a1f2-5c2056004818.stime</div><div>changelog_log_file: /var/log/glusterfs/geo-replication/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0-changes.log</div><div>socketdir: /var/run/gluster</div><div>volume_id: d37a7455-0b1b-402e-985b-cf1ace4e513e</div><div>ignore_deletes: false</div><div>state_socket_unencoded: /var/lib/glusterd/geo-replication/gv0_slave-gfs1.tomfite.com_gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0.socket</div><div>log_file: /var/log/glusterfs/geo-replication/gv0/ssh%3A%2F%2Froot%40172.17.20.60%3Agluster%3A%2F%2F127.0.0.1%3Agv0.log</div><div><br></div><div><br></div><div>Gluster volume status on master:</div><div><br></div><div>Gluster process                                   TCP Port  RDMA Port  Online  Pid</div><div>------------------------------------------------------------------------------</div><div>Brick master-gfs1.tomfite.com:/data/brick1/gv0    49152     0          Y       3989</div><div>Brick master-gfs2.tomfite.com:/data/brick1/gv0    49152     0          Y       3610</div><div>Brick master-gfs1.tomfite.com:/data/brick2/gv0    49153     0          Y       4000</div><div>Brick master-gfs2.tomfite.com:/data/brick2/gv0    49153     0          Y       3621</div><div>Brick master-gfs1.tomfite.com:/data/brick3/gv0    49154     0          Y       4010</div><div>Brick master-gfs2.tomfite.com:/data/brick3/gv0    49154     0          Y       3632</div><div>Snapshot Daemon on localhost                      49197     0          Y       4946</div><div>NFS Server on localhost                           N/A       N/A        N       N/A</div><div>Self-heal Daemon on localhost                     N/A       N/A        Y       2885</div><div>Snapshot Daemon on master-gfs2.tomfite.com        49197     0          Y       4600</div><div>NFS Server on master-gfs2.tomfite.com             N/A       N/A        N       N/A</div><div>Self-heal Daemon on master-gfs2.tomfite.com       N/A       N/A        Y       2856</div><div> </div><div>Task Status of Volume gv0</div><div>------------------------------------------------------------------------------</div><div>There are no active volume tasks</div><div><br></div><div><br></div><div>Gluster volume status on slave:</div><div><br></div><div>Gluster process                                   TCP Port  RDMA Port  Online  Pid</div><div>------------------------------------------------------------------------------</div><div>Brick slave-gfs1.tomfite.com:/data/brick1/gv0     49152     0          Y       3688</div><div>Brick slave-gfs2.tomfite.com:/data/brick1/gv0     49152     0          Y       3701</div><div>Brick slave-gfs1.tomfite.com:/data/brick2/gv0     49153     0          Y       3696</div><div>Brick slave-gfs2.tomfite.com:/data/brick2/gv0     49153     0          Y       3695</div><div>Brick slave-gfs1.tomfite.com:/data/brick3/gv0     49154     0          Y       3702</div><div>Brick slave-gfs2.tomfite.com:/data/brick3/gv0     49154     0          Y       3707</div><div>NFS Server on localhost                           N/A       N/A        N       N/A</div><div>Self-heal Daemon on localhost                     N/A       N/A        Y       2630</div><div>NFS Server on slave-gfs2.tomfite.com              N/A       N/A        N       N/A</div><div>Self-heal Daemon on slave-gfs2.tomfite.com        N/A       N/A        Y       2635</div><div> </div><div>Task Status of Volume gv0</div><div>------------------------------------------------------------------------------</div><div>There are no active volume tasks</div><div><br></div><div><br></div><div>Does anybody have any other ideas for me to check out?</div></div>
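<div><br></div><div>For context on why this traceback ends the worker rather than being retried: the failing call in the trace is errno_wrap(os.lstat, [e], [ENOENT], [ESTALE]) in syncdutils.py, which tolerates ENOENT, retries ESTALE, and re-raises everything else. ENOTCONN (errno 107, "Transport endpoint is not connected") is not on either list, so the lstat on the worker's aux-gfid FUSE mount raises straight through, log_raise_exception reports "glusterfs session went down", and the monitor marks the brick Faulty. A simplified, hypothetical sketch of that behavior (not the actual gsyncd source; names and signature are approximations):</div>

```python
import errno

def errno_wrap(call, args, tolerate=(errno.ENOENT,), retry=(errno.ESTALE,), retries=5):
    """Simplified sketch of gsyncd's errno_wrap: swallow tolerated errnos,
    retry transient ones a few times, and re-raise everything else."""
    for _ in range(retries):
        try:
            return call(*args)
        except OSError as e:
            if e.errno in tolerate:
                return None   # e.g. file vanished between changelog record and sync
            if e.errno in retry:
                continue      # e.g. stale handle: try the call again
            raise             # ENOTCONN falls through here -> worker goes Faulty

def lstat_on_dead_mount(path):
    # Hypothetical stand-in for os.lstat against a dead/unmounted FUSE mount
    raise OSError(errno.ENOTCONN, "Transport endpoint is not connected", path)
```

<div>If that reading is right, the ENOTCONN is a symptom of the worker's auxiliary mount of the master volume dying mid-crawl, so the mount/brick logs on the master around 16:31:46 may show the underlying disconnect.</div>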