<div dir="auto">The problem occured on slave side whose error is propagated to master. Mostly any traceback with repce involved is related to problem in slave. Just check few lines above in the log to find the slave node, the crashed worker is connected to and get geo replication logs to further debug.<div dir="auto"><br></div><div dir="auto"><br><div dir="auto"><br></div><div dir="auto"><br></div></div></div><br><div class="gmail_quote"><div dir="ltr">On Fri, 21 Sep 2018, 20:10 Kotte, Christian (Ext), <<a href="mailto:christian.kotte@novartis.com">christian.kotte@novartis.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div lang="DE" link="#0563C1" vlink="#954F72">
<div class="m_8699061385450587676WordSection1">
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt">Hi,<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt">Any idea how to troubleshoot this?<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt">New folders and files were created on the master and the replication went faulty. They were created via Samba.
<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt">Version: GlusterFS 4.1.3<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">[root@master]# gluster volume geo-replication status<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New""><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">MASTER NODE MASTER VOL MASTER BRICK SLAVE USER SLAVE SLAVE NODE
STATUS CRAWL STATUS LAST_SYNCED<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">master glustervol1 /bricks/brick1/brick geoaccount ssh://geoaccount@slave_1::glustervol1 N/A Faulty N/A N/A<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">master glustervol1 /bricks/brick1/brick geoaccount ssh://geoaccount@slave_2::glustervol1 N/A Faulty N/A N/A<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">master glustervol1 /bricks/brick1/brick geoaccount ssh://geoaccount@interimmaster::glustervol1 N/A Faulty N/A N/A<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt">The following error is repeatedly logged in the gsyncd.logs:<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">[2018-09-21 14:26:38.611479] I [repce(agent /bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">[2018-09-21 14:26:39.211527] I [monitor(monitor):279:monitor] Monitor: worker died in startup phase brick=/bricks/brick1/brick<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">[2018-09-21 14:26:39.214322] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">[2018-09-21 14:26:49.318953] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/bricks/brick1/brick slave_node=<a href="http://nrchbs-slp2020.nibr.novartis.net" target="_blank" rel="noreferrer">nrchbs-slp2020.nibr.novartis.net</a><u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">[2018-09-21 14:26:49.471532] I [gsyncd(agent /bricks/brick1/brick):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/gsyncd.conf<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">[2018-09-21 14:26:49.473917] I [changelogagent(agent /bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining...<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">[2018-09-21 14:26:49.491359] I [gsyncd(worker /bricks/brick1/brick):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/gsyncd.conf<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">[2018-09-21 14:26:49.538049] I [resource(worker /bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection between master and slave...<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">[2018-09-21 14:26:53.5017] I [resource(worker /bricks/brick1/brick):1424:connect_remote] SSH: SSH connection between master and slave established. duration=3.4665<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">[2018-09-21 14:26:53.5419] I [resource(worker /bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">[2018-09-21 14:26:54.120374] I [resource(worker /bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume duration=1.1146<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">[2018-09-21 14:26:54.121012] I [subcmds(worker /bricks/brick1/brick):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">[2018-09-21 14:26:56.144460] I [master(worker /bricks/brick1/brick):1593:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/bricks-brick1-brick<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">[2018-09-21 14:26:56.145145] I [resource(worker /bricks/brick1/brick):1282:service_loop] GLUSTER: Register time time=1537540016<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">[2018-09-21 14:26:56.160064] I [gsyncdstatus(worker /bricks/brick1/brick):277:set_active] GeorepStatus: Worker Status Change status=Active<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">[2018-09-21 14:26:56.161175] I [gsyncdstatus(worker /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">[2018-09-21 14:26:56.161536] I [master(worker /bricks/brick1/brick):1507:crawl] _GMaster: starting history crawl turns=1 stime=(1537522637, 0) entry_stime=(1537537141,
0) etime=1537540016<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">[2018-09-21 14:26:56.164277] I [master(worker /bricks/brick1/brick):1536:crawl] _GMaster: slave's time stime=(1537522637, 0)<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">[2018-09-21 14:26:56.197065] I [master(worker /bricks/brick1/brick):1360:process] _GMaster: Skipping already processed entry ops to_changelog=1537522638 num_changelogs=1
from_changelog=1537522638<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">[2018-09-21 14:26:56.197402] I [master(worker /bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0
duration=0.0000 UNL=1<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">[2018-09-21 14:26:56.197623] I [master(worker /bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken SETA=0 SETX=0 meta_duration=0.0000 data_duration=0.0284
DATA=0 XATT=0<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">[2018-09-21 14:26:56.198230] I [master(worker /bricks/brick1/brick):1394:process] _GMaster: Batch Completed changelog_end=1537522638 entry_stime=(1537537141,
0) changelog_start=1537522638 stime=(1537522637, 0) duration=0.0333 num_changelogs=1 mode=history_changelog<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">[2018-09-21 14:26:57.200436] I [master(worker /bricks/brick1/brick):1536:crawl] _GMaster: slave's time stime=(1537522637, 0)<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">[2018-09-21 14:26:57.528625] E [repce(worker /bricks/brick1/brick):197:__call__] RepceClient: call failed call=17209:140650361157440:1537540017.21 method=entry_ops
error=OSError<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">[2018-09-21 14:26:57.529371] E [syncdutils(worker /bricks/brick1/brick):332:log_raise_exception] <top>: FAIL:<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">Traceback (most recent call last):<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New""> File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New""> func(args)<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New""> File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New""> local.service_loop(remote)<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New""> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1288, in service_loop<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New""> g3.crawlwrap(oneshot=True)<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New""> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615, in crawlwrap<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New""> self.crawl()<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New""> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1545, in crawl<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New""> self.changelogs_batch_process(changes)<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New""> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1445, in changelogs_batch_process<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New""> self.process(batch)<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New""> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1280, in process<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New""> self.process_change(change, done, retry)<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New""> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1179, in process_change<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New""> failures = self.slave.server.entry_ops(entries)<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New""> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216, in __call__<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New""> return self.ins(self.meth, *a)<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New""> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198, in __call__<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New""> raise res<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">OSError: [Errno 13] Permission denied: '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt">The permissions look fine. The replication is done via geo user instead of root. It should be able to read, but I’m not sure if the syncdaemon runs under geoaccount!?<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">[root@master vRealize Operation Manager]# ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">lrwxrwxrwx. 1 root root 75 Sep 21 09:39 /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e -> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize
Operation Manager<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New""><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">[root@master vRealize Operation Manager]# ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">total 4<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">drwxrwxr-x. 2 AD+user AD+group 131 Sep 21 10:14 6.7<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">drwxrwxr-x. 2 AD+user AD+group 4096 Sep 21 09:43 7.0<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">drwxrwxr-x. 2 AD+user AD+group 57 Sep 21 10:28 7.5<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Courier New"">[root@master vRealize Operation Manager]#<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt">It could be possible that the folder was renamed. I had 3 similar issues since I migrated to GlusterFS 4.x but couldn’t investigate much. I needed to completely wipe GlusterFS and geo-repliction
to get rid of this error…<u></u><u></u></span></p>
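
For the geo-replication part, the teardown and re-creation would be roughly the following (a sketch with the session names from above, one slave shown; I don't know whether this alone, without wiping GlusterFS itself, clears the error):

    gluster volume geo-replication glustervol1 geoaccount@slave_1::glustervol1 stop force
    gluster volume geo-replication glustervol1 geoaccount@slave_1::glustervol1 delete reset-sync-time
    gluster volume geo-replication glustervol1 geoaccount@slave_1::glustervol1 create push-pem force
    gluster volume geo-replication glustervol1 geoaccount@slave_1::glustervol1 start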
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt">Any help is appreciated.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;color:black">Regards,<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;color:black"> <u></u><u></u></span></p>
<p class="MsoNormal"><b><span lang="EN-GB" style="font-size:11.0pt;color:black">Christian Kotte</span></b><span lang="EN-GB" style="font-size:11.0pt;color:black"><u></u><u></u></span></p>