[Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied
Kotresh Hiremath Ravishankar
khiremat at redhat.com
Mon Sep 24 14:20:07 UTC 2018
I think I get what’s happening. The geo-rep session is non-root.
Could you do a readlink on the brick path mentioned above,
/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e,
as the geoaccount user and see if you get "Permission Denied" errors?
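Tracing which directory in the chain denies search to the non-root user can also be scripted. A minimal sketch (not part of gsyncd; it only tests the classic other-execute bit, so directories granted via POSIX ACLs — the "+" in `ls -l` — would additionally need a `getfacl` check):

```python
import os
import stat

def check_search_path(path):
    """Return (dir, mode) pairs for every parent directory of `path`
    that lacks the other-execute (search) bit, i.e. candidates for the
    EACCES a non-root, non-owner user hits while resolving the path."""
    blocked = []
    cur = os.sep
    for part in os.path.normpath(path).split(os.sep):
        if not part:
            continue
        cur = os.path.join(cur, part)
        try:
            mode = os.lstat(cur).st_mode
        except FileNotFoundError:
            break  # rest of the path does not exist; nothing more to check
        if stat.S_ISDIR(mode) and not mode & stat.S_IXOTH:
            blocked.append((cur, oct(mode & 0o7777)))
    return blocked
```

The `drwx--S---+` directories shown in the `ls -al` output later in this thread would be flagged by exactly this test.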
Thanks,
Kotresh HR
On Mon, Sep 24, 2018 at 7:35 PM Kotte, Christian (Ext) <
christian.kotte at novartis.com> wrote:
> Ok. It happens on all slave nodes (and on the interimmaster as well).
>
>
>
> It’s as I assumed. These are the logs of one of the slaves:
>
>
>
> gsyncd.log
>
> [2018-09-24 13:52:25.418382] I [repce(slave
> slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on
> reaching EOF.
>
> [2018-09-24 13:52:37.95297] W [gsyncd(slave
> slave/bricks/brick1/brick):293:main] <top>: Session config file not exists,
> using the default config
> path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf
>
> [2018-09-24 13:52:37.109643] I [resource(slave
> slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume
> locally...
>
> [2018-09-24 13:52:38.303920] I [resource(slave
> slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster
> volume duration=1.1941
>
> [2018-09-24 13:52:38.304771] I [resource(slave
> slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
>
> [2018-09-24 13:52:41.981554] I [resource(slave
> slave/bricks/brick1/brick):598:entry_ops] <top>: Special case: rename on
> mkdir gfid=29d1d60d-1ad6-45fc-87e0-93d478f7331e
> entry='.gfid/6b97b987-8aef-46c3-af27-20d3aa883016/New folder'
>
> [2018-09-24 13:52:42.45641] E [repce(slave
> slave/bricks/brick1/brick):105:worker] <top>: call failed:
>
> Traceback (most recent call last):
>
> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in
> worker
>
> res = getattr(self.obj, rmeth)(*in_data[2:])
>
> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 599,
> in entry_ops
>
> src_entry = get_slv_dir_path(slv_host, slv_volume, gfid)
>
> File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 682,
> in get_slv_dir_path
>
> [ENOENT], [ESTALE])
>
> File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 540,
> in errno_wrap
>
> return call(*arg)
>
> OSError: [Errno 13] Permission denied:
> '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'
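For context on why this errno escapes: `errno_wrap` at this call site is only told to tolerate [ENOENT] and retry [ESTALE], so EACCES (errno 13) is re-raised and travels up through repce to the master. A simplified sketch of that behavior (not the actual syncdutils code, which also does backoff and logging):

```python
import errno  # callers name the errnos they tolerate, e.g. errno.ENOENT

def errno_wrap(call, args=(), tolerable=(), retriable=(), max_retries=5):
    # Simplified model of gsyncd's errno_wrap: some errnos are harmless
    # (entry already gone), some are worth retrying (stale handle), and
    # everything else -- including EACCES / "Permission denied" -- is raised.
    for _ in range(max_retries):
        try:
            return call(*args)
        except OSError as e:
            if e.errno in tolerable:
                return None   # e.g. ENOENT: ignore and move on
            if e.errno in retriable:
                continue      # e.g. ESTALE: retry the call
            raise             # anything else propagates to the caller
```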
>
> [2018-09-24 13:52:42.81794] I [repce(slave
> slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on
> reaching EOF.
>
> [2018-09-24 13:52:53.459676] W [gsyncd(slave
> slave/bricks/brick1/brick):293:main] <top>: Session config file not exists,
> using the default config
> path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf
>
> [2018-09-24 13:52:53.473500] I [resource(slave
> slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume
> locally...
>
> [2018-09-24 13:52:54.659044] I [resource(slave
> slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume
> duration=1.1854
>
> [2018-09-24 13:52:54.659837] I [resource(slave
> slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
>
>
>
> The folder “New folder” was created via Samba and renamed by my colleague
> right away after creation.
>
> [root at slave glustervol1_slave_glustervol1]# ls
> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/
>
> [root at slave glustervol1_slave_glustervol1]# ls
> /bricks/brick1/brick/.glusterfs/29/d1/ -al
>
> total 0
>
> drwx--S---+ 2 root AD+group 50 Sep 21 09:39 .
>
> drwx--S---+ 11 root AD+group 96 Sep 21 09:39 ..
>
> lrwxrwxrwx. 1 root AD+group 75 Sep 21 09:39
> 29d1d60d-1ad6-45fc-87e0-93d478f7331e ->
> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation Manager
>
>
>
> I tried creating the folder in
> /bricks/brick1/brick/.glusterfs/6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/,
> but it didn’t change anything.
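For reference, the symlink that fails here is how Gluster stores directories under `.glusterfs`: the gfid entry is a symlink pointing into the parent directory's gfid path, with the directory's basename at the end. A rough sketch of the resolution that `get_slv_dir_path` performs (simplified; the real helper lives in syncdutils.py and goes through errno_wrap):

```python
import os

def slv_dir_path(brick, gfid):
    # Directories are stored in .glusterfs as a symlink named after their
    # gfid, pointing at <parent-gfid dir>/<basename>, e.g.
    #   29/d1/29d1d60d-... -> ../../6b/97/6b97b987-.../New folder
    link = os.path.join(brick, ".glusterfs", gfid[0:2], gfid[2:4], gfid)
    target = os.readlink(link)  # this readlink is what raises EACCES here
    parent_gfid = target.split(os.sep)[-2]
    basename = os.path.basename(target)
    return parent_gfid, basename
```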
>
>
>
> mnt-slave-bricks-brick1-brick.log
>
> [2018-09-24 13:51:10.625723] W [rpc-clnt.c:1753:rpc_clnt_submit]
> 0-glustervol1-client-0: error returned while attempting to connect to
> host:(null), port:0
>
> [2018-09-24 13:51:10.626092] W [rpc-clnt.c:1753:rpc_clnt_submit]
> 0-glustervol1-client-0: error returned while attempting to connect to
> host:(null), port:0
>
> [2018-09-24 13:51:10.626181] I [rpc-clnt.c:2105:rpc_clnt_reconfig]
> 0-glustervol1-client-0: changing port to 49152 (from 0)
>
> [2018-09-24 13:51:10.643111] W [rpc-clnt.c:1753:rpc_clnt_submit]
> 0-glustervol1-client-0: error returned while attempting to connect to
> host:(null), port:0
>
> [2018-09-24 13:51:10.643489] W [dict.c:923:str_to_data]
> (-->/usr/lib64/glusterfs/4.1.3/xlator/protocol/client.so(+0x4131a)
> [0x7fafb023831a] -->/lib64/libglusterfs.so.0(dict_set_str+0x16)
> [0x7fafbdb83266] -->/lib64/libglusterfs.so.0(str_to_data+0x91)
> [0x7fafbdb7fea1] ) 0-dict: value is NULL [Invalid argument]
>
> [2018-09-24 13:51:10.643507] I [MSGID: 114006]
> [client-handshake.c:1308:client_setvolume] 0-glustervol1-client-0: failed
> to set process-name in handshake msg
>
> [2018-09-24 13:51:10.643541] W [rpc-clnt.c:1753:rpc_clnt_submit]
> 0-glustervol1-client-0: error returned while attempting to connect to
> host:(null), port:0
>
> [2018-09-24 13:51:10.671460] I [MSGID: 114046]
> [client-handshake.c:1176:client_setvolume_cbk] 0-glustervol1-client-0:
> Connected to glustervol1-client-0, attached to remote volume
> '/bricks/brick1/brick'.
>
> [2018-09-24 13:51:10.672694] I [fuse-bridge.c:4294:fuse_init]
> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel
> 7.22
>
> [2018-09-24 13:51:10.672715] I [fuse-bridge.c:4927:fuse_graph_sync]
> 0-fuse: switched to graph 0
>
> [2018-09-24 13:51:10.673329] I [MSGID: 109005]
> [dht-selfheal.c:2342:dht_selfheal_directory] 0-glustervol1-dht: Directory
> selfheal failed: Unable to form layout for directory /
>
> [2018-09-24 13:51:16.116458] I [fuse-bridge.c:5199:fuse_thread_proc]
> 0-fuse: initating unmount of
> /var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E
>
> [2018-09-24 13:51:16.116595] W [glusterfsd.c:1514:cleanup_and_exit]
> (-->/lib64/libpthread.so.0(+0x7e25) [0x7fafbc9eee25]
> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55d5dac5dd65]
> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55d5dac5db8b] ) 0-:
> received signum (15), shutting down
>
> [2018-09-24 13:51:16.116616] I [fuse-bridge.c:5981:fini] 0-fuse:
> Unmounting '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'.
>
> [2018-09-24 13:51:16.116625] I [fuse-bridge.c:5986:fini] 0-fuse: Closing
> fuse connection to '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'.
>
>
>
> Regards,
>
> Christian
>
>
>
> *From: *Kotresh Hiremath Ravishankar <khiremat at redhat.com>
> *Date: *Saturday, 22. September 2018 at 06:52
> *To: *"Kotte, Christian (Ext)" <christian.kotte at novartis.com>
> *Cc: *Gluster Users <gluster-users at gluster.org>
> *Subject: *Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log
> OSError: [Errno 13] Permission denied
>
>
>
> The problem occurred on the slave side, and its error is propagated to the
> master. Almost any traceback involving repce points to a problem on the
> slave. Check a few lines above in the log to find the slave node the
> crashed worker is connected to, and get the geo-replication logs from that
> node to debug further.
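To locate that slave node quickly: the monitor lines in the master's gsyncd.log carry a `slave_node=` field, so a small scan is enough (a sketch — the log path varies by distribution, typically under /var/log/glusterfs/geo-replication/):

```python
import re

def slave_nodes(log_text):
    # Monitor log lines look like:
    #   ... Monitor: starting gsyncd worker brick=... slave_node=<host>
    return re.findall(r"slave_node=(\S+)", log_text)
```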
>
>
>
>
>
>
>
>
>
>
>
> On Fri, 21 Sep 2018, 20:10 Kotte, Christian (Ext), <
> christian.kotte at novartis.com> wrote:
>
> Hi,
>
>
>
> Any idea how to troubleshoot this?
>
>
>
> New folders and files were created on the master and the replication went
> faulty. They were created via Samba.
>
>
>
> Version: GlusterFS 4.1.3
>
>
>
> [root at master]# gluster volume geo-replication status
>
>
>
> MASTER NODE    MASTER VOL     MASTER BRICK            SLAVE USER    SLAVE                                          SLAVE NODE    STATUS    CRAWL STATUS    LAST_SYNCED
>
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> master         glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount@slave_1::glustervol1          N/A           Faulty    N/A             N/A
>
> master         glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount@slave_2::glustervol1          N/A           Faulty    N/A             N/A
>
> master         glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount@interimmaster::glustervol1    N/A           Faulty    N/A             N/A
>
>
>
> The following error is repeatedly logged in the gsyncd.logs:
>
> [2018-09-21 14:26:38.611479] I [repce(agent
> /bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching
> EOF.
>
> [2018-09-21 14:26:39.211527] I [monitor(monitor):279:monitor] Monitor:
> worker died in startup phase brick=/bricks/brick1/brick
>
> [2018-09-21 14:26:39.214322] I
> [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status
> Change status=Faulty
>
> [2018-09-21 14:26:49.318953] I [monitor(monitor):158:monitor] Monitor:
> starting gsyncd worker brick=/bricks/brick1/brick slave_node=
> nrchbs-slp2020.nibr.novartis.net
>
> [2018-09-21 14:26:49.471532] I [gsyncd(agent
> /bricks/brick1/brick):297:main] <top>: Using session config file
> path=/var/lib/glusterd/geo-replication/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/gsyncd.conf
>
> [2018-09-21 14:26:49.473917] I [changelogagent(agent
> /bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining...
>
> [2018-09-21 14:26:49.491359] I [gsyncd(worker
> /bricks/brick1/brick):297:main] <top>: Using session config file
> path=/var/lib/glusterd/geo-replication/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/gsyncd.conf
>
> [2018-09-21 14:26:49.538049] I [resource(worker
> /bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection
> between master and slave...
>
> [2018-09-21 14:26:53.5017] I [resource(worker
> /bricks/brick1/brick):1424:connect_remote] SSH: SSH connection between
> master and slave established. duration=3.4665
>
> [2018-09-21 14:26:53.5419] I [resource(worker
> /bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume
> locally...
>
> [2018-09-21 14:26:54.120374] I [resource(worker
> /bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume
> duration=1.1146
>
> [2018-09-21 14:26:54.121012] I [subcmds(worker
> /bricks/brick1/brick):70:subcmd_worker] <top>: Worker spawn successful.
> Acknowledging back to monitor
>
> [2018-09-21 14:26:56.144460] I [master(worker
> /bricks/brick1/brick):1593:register] _GMaster: Working dir
> path=/var/lib/misc/gluster/gsyncd/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/bricks-brick1-brick
>
> [2018-09-21 14:26:56.145145] I [resource(worker
> /bricks/brick1/brick):1282:service_loop] GLUSTER: Register time
> time=1537540016
>
> [2018-09-21 14:26:56.160064] I [gsyncdstatus(worker
> /bricks/brick1/brick):277:set_active] GeorepStatus: Worker Status Change
> status=Active
>
> [2018-09-21 14:26:56.161175] I [gsyncdstatus(worker
> /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl
> Status Change status=History Crawl
>
> [2018-09-21 14:26:56.161536] I [master(worker
> /bricks/brick1/brick):1507:crawl] _GMaster: starting history crawl
> turns=1 stime=(1537522637, 0) entry_stime=(1537537141, 0)
> etime=1537540016
>
> [2018-09-21 14:26:56.164277] I [master(worker
> /bricks/brick1/brick):1536:crawl] _GMaster: slave's time
> stime=(1537522637, 0)
>
> [2018-09-21 14:26:56.197065] I [master(worker
> /bricks/brick1/brick):1360:process] _GMaster: Skipping already processed
> entry ops to_changelog=1537522638 num_changelogs=1
> from_changelog=1537522638
>
> [2018-09-21 14:26:56.197402] I [master(worker
> /bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken MKD=0
> MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=1
>
> [2018-09-21 14:26:56.197623] I [master(worker
> /bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken
> SETA=0 SETX=0 meta_duration=0.0000 data_duration=0.0284 DATA=0
> XATT=0
>
> [2018-09-21 14:26:56.198230] I [master(worker
> /bricks/brick1/brick):1394:process] _GMaster: Batch Completed
> changelog_end=1537522638 entry_stime=(1537537141, 0)
> changelog_start=1537522638 stime=(1537522637, 0) duration=0.0333
> num_changelogs=1 mode=history_changelog
>
> [2018-09-21 14:26:57.200436] I [master(worker
> /bricks/brick1/brick):1536:crawl] _GMaster: slave's time
> stime=(1537522637, 0)
>
> [2018-09-21 14:26:57.528625] E [repce(worker
> /bricks/brick1/brick):197:__call__] RepceClient: call failed
> call=17209:140650361157440:1537540017.21 method=entry_ops
> error=OSError
>
> [2018-09-21 14:26:57.529371] E [syncdutils(worker
> /bricks/brick1/brick):332:log_raise_exception] <top>: FAIL:
>
> Traceback (most recent call last):
>
> File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in
> main
>
> func(args)
>
> File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in
> subcmd_worker
>
> local.service_loop(remote)
>
> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1288,
> in service_loop
>
> g3.crawlwrap(oneshot=True)
>
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615, in
> crawlwrap
>
> self.crawl()
>
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1545, in
> crawl
>
> self.changelogs_batch_process(changes)
>
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1445, in
> changelogs_batch_process
>
> self.process(batch)
>
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1280, in
> process
>
> self.process_change(change, done, retry)
>
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1179, in
> process_change
>
> failures = self.slave.server.entry_ops(entries)
>
> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216, in
> __call__
>
> return self.ins(self.meth, *a)
>
> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198, in
> __call__
>
> raise res
>
> OSError: [Errno 13] Permission denied:
> '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'
>
>
>
> The permissions look fine. The replication is done via the geo user instead
> of root. It should be able to read, but I’m not sure if the syncdaemon runs
> under geoaccount!?
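For a non-root session the slave worker does run as the unprivileged account (through the mountbroker mount visible in the slave log), so normal permission checks apply. One way to test is an `os.access` walk run as the geo-rep user rather than root — a sketch, with the path from the traceback as the candidate:

```python
import os

def first_blocked_dir(path):
    """Return the first parent directory the *calling* user cannot search
    (X_OK), or None if the whole chain is traversable. Run this as the
    geo-rep user (e.g. via `sudo -u geoaccount python ...`), not as root --
    root bypasses directory search permissions."""
    cur = os.sep
    for part in os.path.normpath(path).split(os.sep):
        if not part:
            continue
        if not os.access(cur, os.X_OK):
            return cur
        cur = os.path.join(cur, part)
    return None
```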
>
>
>
> [root at master vRealize Operation Manager]# ll
> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
>
> lrwxrwxrwx. 1 root root 75 Sep 21 09:39
> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
> -> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation
> Manager
>
>
>
> [root at master vRealize Operation Manager]# ll
> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/
>
> total 4
>
> drwxrwxr-x. 2 AD+user AD+group 131 Sep 21 10:14 6.7
>
> drwxrwxr-x. 2 AD+user AD+group 4096 Sep 21 09:43 7.0
>
> drwxrwxr-x. 2 AD+user AD+group 57 Sep 21 10:28 7.5
>
> [root at master vRealize Operation Manager]#
>
>
>
> It could be possible that the folder was renamed. I’ve had 3 similar issues
> since I migrated to GlusterFS 4.x, but couldn’t investigate much. I needed
> to completely wipe GlusterFS and geo-replication to get rid of this error…
>
>
>
> Any help is appreciated.
>
>
>
> Regards,
>
>
>
> *Christian Kotte*
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>
--
Thanks and Regards,
Kotresh H R