<div dir="auto">The problem occurred on the slave side, and its error is propagated to the master. Almost any traceback that involves repce points to a problem on the slave. Check a few lines above in the log to find the slave node that the crashed worker is connected to, then get the geo-replication logs from that node to debug further.<div dir="auto"><br></div><div dir="auto"><br><div dir="auto"><br></div><div dir="auto"><br></div></div></div><br><div class="gmail_quote"><div dir="ltr">On Fri, 21 Sep 2018, 20:10 Kotte, Christian (Ext), &lt;<a href="mailto:christian.kotte@novartis.com">christian.kotte@novartis.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">





<div lang="DE" link="#0563C1" vlink="#954F72">
<div class="m_8699061385450587676WordSection1">
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt">Hi,<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt">Any idea how to troubleshoot this?<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt">New folders and files were created on the master and the replication went faulty. They were created via Samba.
<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt">Version: GlusterFS 4.1.3<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">[root@master]# gluster volume geo-replication status<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">MASTER NODE                         MASTER VOL     MASTER BRICK            SLAVE USER    SLAVE                                                             SLAVE NODE   
 STATUS    CRAWL STATUS    LAST_SYNCED<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">master    glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount@slave_1::glustervol1       N/A           Faulty    N/A             N/A<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">master    glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount@slave_2::glustervol1       N/A           Faulty    N/A             N/A<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">master    glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount@interimmaster::glustervol1   N/A           Faulty    N/A             N/A<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt">The following error is repeatedly logged in the gsyncd.logs:<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">[2018-09-21 14:26:38.611479] I [repce(agent /bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">[2018-09-21 14:26:39.211527] I [monitor(monitor):279:monitor] Monitor: worker died in startup phase     brick=/bricks/brick1/brick<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">[2018-09-21 14:26:39.214322] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">[2018-09-21 14:26:49.318953] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker   brick=/bricks/brick1/brick      slave_node=<a href="http://nrchbs-slp2020.nibr.novartis.net" target="_blank" rel="noreferrer">nrchbs-slp2020.nibr.novartis.net</a><u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">[2018-09-21 14:26:49.471532] I [gsyncd(agent /bricks/brick1/brick):297:main] &lt;top&gt;: Using session config file   path=/var/lib/glusterd/geo-replication/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/gsyncd.conf<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">[2018-09-21 14:26:49.473917] I [changelogagent(agent /bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining...<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">[2018-09-21 14:26:49.491359] I [gsyncd(worker /bricks/brick1/brick):297:main] &lt;top&gt;: Using session config file  path=/var/lib/glusterd/geo-replication/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/gsyncd.conf<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">[2018-09-21 14:26:49.538049] I [resource(worker /bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection between master and slave...<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">[2018-09-21 14:26:53.5017] I [resource(worker /bricks/brick1/brick):1424:connect_remote] SSH: SSH connection between master and slave established.      duration=3.4665<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">[2018-09-21 14:26:53.5419] I [resource(worker /bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">[2018-09-21 14:26:54.120374] I [resource(worker /bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume     duration=1.1146<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">[2018-09-21 14:26:54.121012] I [subcmds(worker /bricks/brick1/brick):70:subcmd_worker] &lt;top&gt;: Worker spawn successful. Acknowledging back to monitor<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">[2018-09-21 14:26:56.144460] I [master(worker /bricks/brick1/brick):1593:register] _GMaster: Working dir        path=/var/lib/misc/gluster/gsyncd/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/bricks-brick1-brick<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">[2018-09-21 14:26:56.145145] I [resource(worker /bricks/brick1/brick):1282:service_loop] GLUSTER: Register time time=1537540016<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">[2018-09-21 14:26:56.160064] I [gsyncdstatus(worker /bricks/brick1/brick):277:set_active] GeorepStatus: Worker Status Change    status=Active<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">[2018-09-21 14:26:56.161175] I [gsyncdstatus(worker /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl Status Change        status=History Crawl<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">[2018-09-21 14:26:56.161536] I [master(worker /bricks/brick1/brick):1507:crawl] _GMaster: starting history crawl        turns=1 stime=(1537522637, 0)   entry_stime=(1537537141,
 0)     etime=1537540016<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">[2018-09-21 14:26:56.164277] I [master(worker /bricks/brick1/brick):1536:crawl] _GMaster: slave&#39;s time  stime=(1537522637, 0)<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">[2018-09-21 14:26:56.197065] I [master(worker /bricks/brick1/brick):1360:process] _GMaster: Skipping already processed entry ops        to_changelog=1537522638 num_changelogs=1       
 from_changelog=1537522638<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">[2018-09-21 14:26:56.197402] I [master(worker /bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken    MKD=0   MKN=0   LIN=0   SYM=0   REN=0   RMD=0   CRE=0  
 duration=0.0000 UNL=1<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">[2018-09-21 14:26:56.197623] I [master(worker /bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken    SETA=0  SETX=0  meta_duration=0.0000    data_duration=0.0284   
 DATA=0  XATT=0<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">[2018-09-21 14:26:56.198230] I [master(worker /bricks/brick1/brick):1394:process] _GMaster: Batch Completed     changelog_end=1537522638        entry_stime=(1537537141,
 0)     changelog_start=1537522638      stime=(1537522637, 0)   duration=0.0333 num_changelogs=1        mode=history_changelog<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">[2018-09-21 14:26:57.200436] I [master(worker /bricks/brick1/brick):1536:crawl] _GMaster: slave&#39;s time  stime=(1537522637, 0)<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">[2018-09-21 14:26:57.528625] E [repce(worker /bricks/brick1/brick):197:__call__] RepceClient: call failed       call=17209:140650361157440:1537540017.21        method=entry_ops       
 error=OSError<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">[2018-09-21 14:26:57.529371] E [syncdutils(worker /bricks/brick1/brick):332:log_raise_exception] &lt;top&gt;: FAIL:<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">Traceback (most recent call last):<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">  File &quot;/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py&quot;, line 311, in main<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">    func(args)<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">  File &quot;/usr/libexec/glusterfs/python/syncdaemon/subcmds.py&quot;, line 72, in subcmd_worker<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">    local.service_loop(remote)<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">  File &quot;/usr/libexec/glusterfs/python/syncdaemon/resource.py&quot;, line 1288, in service_loop<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">    g3.crawlwrap(oneshot=True)<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">  File &quot;/usr/libexec/glusterfs/python/syncdaemon/master.py&quot;, line 615, in crawlwrap<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">    self.crawl()<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">  File &quot;/usr/libexec/glusterfs/python/syncdaemon/master.py&quot;, line 1545, in crawl<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">    self.changelogs_batch_process(changes)<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">  File &quot;/usr/libexec/glusterfs/python/syncdaemon/master.py&quot;, line 1445, in changelogs_batch_process<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">    self.process(batch)<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">  File &quot;/usr/libexec/glusterfs/python/syncdaemon/master.py&quot;, line 1280, in process<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">    self.process_change(change, done, retry)<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">  File &quot;/usr/libexec/glusterfs/python/syncdaemon/master.py&quot;, line 1179, in process_change<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">    failures = self.slave.server.entry_ops(entries)<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">  File &quot;/usr/libexec/glusterfs/python/syncdaemon/repce.py&quot;, line 216, in __call__<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">    return self.ins(self.meth, *a)<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">  File &quot;/usr/libexec/glusterfs/python/syncdaemon/repce.py&quot;, line 198, in __call__<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">    raise res<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">OSError: [Errno 13] Permission denied: &#39;/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e&#39;<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt">The permissions look fine. Replication runs as the geo user (geoaccount) rather than root. That user should be able to read the path, but I’m not sure whether the syncdaemon actually runs as geoaccount.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">[root@master vRealize Operation Manager]# ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">lrwxrwxrwx. 1 root root 75 Sep 21 09:39 /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e -&gt; ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize
 Operation Manager<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">[root@master vRealize Operation Manager]# ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">total 4<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">drwxrwxr-x. 2 AD+user AD+group  131 Sep 21 10:14 6.7<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">drwxrwxr-x. 2 AD+user AD+group 4096 Sep 21 09:43 7.0<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">drwxrwxr-x. 2 AD+user AD+group   57 Sep 21 10:28 7.5<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:&quot;Courier New&quot;">[root@master vRealize Operation Manager]#<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt">The folder may have been renamed. I have had 3 similar issues since I migrated to GlusterFS 4.x but couldn’t investigate much. I needed to completely wipe GlusterFS and geo-replication to get rid of this error…<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt">Any help is appreciated.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;color:black">Regards,<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;color:black"> <u></u><u></u></span></p>
<p class="MsoNormal"><b><span lang="EN-GB" style="font-size:11.0pt;color:black">Christian Kotte</span></b><span lang="EN-GB" style="font-size:11.0pt;color:black"><u></u><u></u></span></p>
</div>
</div>

_______________________________________________<br>
Gluster-users mailing list<br>
<a href="mailto:Gluster-users@gluster.org" target="_blank" rel="noreferrer">Gluster-users@gluster.org</a><br>
<a href="https://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer noreferrer" target="_blank">https://lists.gluster.org/mailman/listinfo/gluster-users</a></blockquote></div>
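<div dir="auto">The log check described in the reply at the top (look a few lines above the repce failure for the slave node) could be scripted roughly as follows. This is only a sketch: the function name slave_node_before_failure and the sample list are made up for illustration, and the two sample lines are copied from the gsyncd log quoted above. It assumes a single worker per master node, as in that log. With several bricks, the last slave_node line may belong to a different worker, so the brick= field should be compared as well.</div>
<pre>
```python
import re

# Matches the "slave_node=HOST" field that the monitor logs when it starts a worker.
SLAVE_NODE_RE = re.compile(r"slave_node=(\S+)")


def slave_node_before_failure(log_lines):
    """Return the most recent slave_node seen before the first repce failure.

    Returns None if there is no "RepceClient: call failed" line, or if no
    slave_node line precedes it.
    """
    last_node = None
    for line in log_lines:
        match = SLAVE_NODE_RE.search(line)
        if match:
            last_node = match.group(1)
        # The RPC client on the master logs this when the slave raised an error.
        if "RepceClient: call failed" in line:
            return last_node
    return None


# Two lines taken from the log quoted above (the wrapped line is joined).
sample = [
    "[2018-09-21 14:26:49.318953] I [monitor(monitor):158:monitor] Monitor: "
    "starting gsyncd worker   brick=/bricks/brick1/brick      "
    "slave_node=nrchbs-slp2020.nibr.novartis.net",
    "[2018-09-21 14:26:57.528625] E [repce(worker /bricks/brick1/brick):197:__call__] "
    "RepceClient: call failed       call=17209:140650361157440:1537540017.21        "
    "method=entry_ops        error=OSError",
]

print(slave_node_before_failure(sample))  # prints nrchbs-slp2020.nibr.novartis.net
```
</pre>
<div dir="auto">To run it against a real log, pass the open gsyncd log file object instead of the sample list. The geo-replication logs on the node it returns are the ones to look at next.</div>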