[Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied

Andy Coates andy.coates at gmail.com
Wed Mar 20 06:21:47 UTC 2019


We're seeing the same permission denied errors when running as a non-root
geosync user.

Does anyone know what the underlying issue is?

On Wed, 26 Sep 2018 at 00:40, Kotte, Christian (Ext) <
christian.kotte at novartis.com> wrote:

> I changed the replication to use the root user and re-created it with
> “create force”. Now the files and folders are replicated, and the
> “Permission denied” and “New folder” errors are gone, but old files are
> not deleted.
>
>
>
> Looks like the history crawl is in some kind of a loop:
>
>
>
> [root at master ~]# gluster volume geo-replication status
>
>
>
> MASTER NODE    MASTER VOL     MASTER BRICK            SLAVE USER    SLAVE                         SLAVE NODE    STATUS    CRAWL STATUS    LAST_SYNCED
> -----------------------------------------------------------------------------------------------------------------------------------------------------
> master         glustervol1    /bricks/brick1/brick    root          ssh://slave_1::glustervol1    slave_1       Active    Hybrid Crawl    N/A
> master         glustervol1    /bricks/brick1/brick    root          ssh://slave_2::glustervol1    slave_2       Active    Hybrid Crawl    N/A
> master         glustervol1    /bricks/brick1/brick    root          ssh://slave_3::glustervol1    slave_3       Active    Hybrid Crawl    N/A
>
>
>
> tail -f
> /var/log/glusterfs/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.log
>
>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line
> 104, in cl_history_changelog
>
>     raise ChangelogHistoryNotAvailable()
>
> ChangelogHistoryNotAvailable
>
> [2018-09-25 14:10:44.196011] E [repce(worker
> /bricks/brick1/brick):197:__call__] RepceClient: call failed
> call=29945:139700517484352:1537884644.19  method=history
> error=ChangelogHistoryNotAvailable
>
> [2018-09-25 14:10:44.196405] I [resource(worker
> /bricks/brick1/brick):1295:service_loop] GLUSTER: Changelog history not
> available, using xsync
>
> [2018-09-25 14:10:44.221385] I [master(worker
> /bricks/brick1/brick):1623:crawl] _GMaster: starting hybrid crawl
> stime=(0, 0)
>
> [2018-09-25 14:10:44.223382] I [gsyncdstatus(worker
> /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl
> Status Change      status=Hybrid Crawl
>
> [2018-09-25 14:10:46.225296] I [master(worker
> /bricks/brick1/brick):1634:crawl] _GMaster: processing xsync changelog
> path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick/xsync/XSYNC-CHANGELOG.1537884644
>
> [2018-09-25 14:13:36.157408] I [gsyncd(config-get):297:main] <top>: Using
> session config file
> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>
> [2018-09-25 14:13:36.377880] I [gsyncd(status):297:main] <top>: Using
> session config file
> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>
> [2018-09-25 14:31:10.145035] I [master(worker
> /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
> duration=1212.5316      num_files=1474    job=2 return_code=11
>
> [2018-09-25 14:31:10.152637] E [syncdutils(worker
> /bricks/brick1/brick):801:errlog] Popen: command returned error cmd=rsync
> -aR0 --inplace --files-from=- --super --stats --numeric-ids
> --no-implied-dirs --xattrs --acls . -e ssh -oPasswordAuthentication=no
> -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
> -p 22 -oControlMaster=auto -S
> /tmp/gsyncd-aux-ssh-gg758Z/caec4d1d03cc28bc1853f692e291164f.sock
> slave_3:/proc/15919/cwd error=11
>
> [2018-09-25 14:31:10.237371] I [repce(agent
> /bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching
> EOF.
>
> [2018-09-25 14:31:10.430820] I
> [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status
> Change  status=Faulty
>
> [2018-09-25 14:31:20.541475] I [monitor(monitor):158:monitor] Monitor:
> starting gsyncd worker      brick=/bricks/brick1/brick    slave_node=slave_3
>
> [2018-09-25 14:31:20.806518] I [gsyncd(agent
> /bricks/brick1/brick):297:main] <top>: Using session config file
> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>
> [2018-09-25 14:31:20.816536] I [changelogagent(agent
> /bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining...
>
> [2018-09-25 14:31:20.821574] I [gsyncd(worker
> /bricks/brick1/brick):297:main] <top>: Using session config file
> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>
> [2018-09-25 14:31:20.882128] I [resource(worker
> /bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection
> between master and slave...
>
> [2018-09-25 14:31:24.169857] I [resource(worker
> /bricks/brick1/brick):1424:connect_remote] SSH: SSH connection between
> master and slave established.      duration=3.2873
>
> [2018-09-25 14:31:24.170401] I [resource(worker
> /bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume
> locally...
>
> [2018-09-25 14:31:25.354633] I [resource(worker
> /bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume
> duration=1.1839
>
> [2018-09-25 14:31:25.355073] I [subcmds(worker
> /bricks/brick1/brick):70:subcmd_worker] <top>: Worker spawn successful.
> Acknowledging back to monitor
>
> [2018-09-25 14:31:27.439034] I [master(worker
> /bricks/brick1/brick):1593:register] _GMaster: Working dir
> path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick
>
> [2018-09-25 14:31:27.441847] I [resource(worker
> /bricks/brick1/brick):1282:service_loop] GLUSTER: Register time
> time=1537885887
>
> [2018-09-25 14:31:27.465053] I [gsyncdstatus(worker
> /bricks/brick1/brick):277:set_active] GeorepStatus: Worker Status
> Change      status=Active
>
> [2018-09-25 14:31:27.471021] I [gsyncdstatus(worker
> /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl
> Status Change      status=History Crawl
>
> [2018-09-25 14:31:27.471484] I [master(worker
> /bricks/brick1/brick):1507:crawl] _GMaster: starting history crawl
> turns=1     stime=(0, 0)      entry_stime=None      etime=1537885887
>
> [2018-09-25 14:31:27.472564] E [repce(agent
> /bricks/brick1/brick):105:worker] <top>: call failed:
>
> Traceback (most recent call last):
>
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in
> worker
>
>     res = getattr(self.obj, rmeth)(*in_data[2:])
>
>   File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line
> 53, in history
>
>     num_parallel)
>
>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line
> 104, in cl_history_changelog
>
>     raise ChangelogHistoryNotAvailable()
>
> ChangelogHistoryNotAvailable
>
> [2018-09-25 14:31:27.480632] E [repce(worker
> /bricks/brick1/brick):197:__call__] RepceClient: call failed
> call=31250:140272364406592:1537885887.47  method=history
> error=ChangelogHistoryNotAvailable
>
> [2018-09-25 14:31:27.480958] I [resource(worker
> /bricks/brick1/brick):1295:service_loop] GLUSTER: Changelog history not
> available, using xsync
>
> [2018-09-25 14:31:27.495117] I [master(worker
> /bricks/brick1/brick):1623:crawl] _GMaster: starting hybrid crawl
> stime=(0, 0)
>
> [2018-09-25 14:31:27.502083] I [gsyncdstatus(worker
> /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl
> Status Change      status=Hybrid Crawl
>
> [2018-09-25 14:31:29.505284] I [master(worker
> /bricks/brick1/brick):1634:crawl] _GMaster: processing xsync changelog
> path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick/xsync/XSYNC-CHANGELOG.1537885887
>
>
>
> tail -f
> /var/log/glusterfs/geo-replication-slaves/glustervol1_slave_3_glustervol1/gsyncd.log
>
> [2018-09-25 13:49:24.141303] I [repce(slave
> master/bricks/brick1/brick):80:service_loop] RepceServer: terminating on
> reaching EOF.
>
> [2018-09-25 13:49:36.602051] W [gsyncd(slave
> master/bricks/brick1/brick):293:main] <top>: Session config file not
> exists, using the default config
> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>
> [2018-09-25 13:49:36.629415] I [resource(slave
> master/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume
> locally...
>
> [2018-09-25 13:49:37.701642] I [resource(slave
> master/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster
> volume     duration=1.0718
>
> [2018-09-25 13:49:37.704282] I [resource(slave
> master/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
>
> [2018-09-25 14:10:27.70952] I [repce(slave
> master/bricks/brick1/brick):80:service_loop] RepceServer: terminating on
> reaching EOF.
>
> [2018-09-25 14:10:39.632124] W [gsyncd(slave
> master/bricks/brick1/brick):293:main] <top>: Session config file not
> exists, using the default config
> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>
> [2018-09-25 14:10:39.650958] I [resource(slave
> master/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume
> locally...
>
> [2018-09-25 14:10:40.729355] I [resource(slave
> master/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster
> volume     duration=1.0781
>
> [2018-09-25 14:10:40.730650] I [resource(slave
> master/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
>
> [2018-09-25 14:31:10.291064] I [repce(slave
> master/bricks/brick1/brick):80:service_loop] RepceServer: terminating on
> reaching EOF.
>
> [2018-09-25 14:31:22.802237] W [gsyncd(slave
> master/bricks/brick1/brick):293:main] <top>: Session config file not
> exists, using the default config
> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>
> [2018-09-25 14:31:22.828418] I [resource(slave
> master/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume
> locally...
>
> [2018-09-25 14:31:23.910206] I [resource(slave
> master/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster
> volume     duration=1.0813
>
> [2018-09-25 14:31:23.913369] I [resource(slave
> master/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
>
>
>
> Any ideas how to resolve this without re-creating everything again? Can I
> reset the changelog history?
>
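> For reference, the reset I have in mind would drop only the geo-rep session
> and its synced time, not the volumes themselves. A rough sketch (untested on
> my side, assuming the 4.1 CLI still accepts “delete reset-sync-time”), run
> on the master once per slave session:
>
> gluster volume geo-replication glustervol1 slave_1::glustervol1 stop force
> gluster volume geo-replication glustervol1 slave_1::glustervol1 delete reset-sync-time
> gluster volume geo-replication glustervol1 slave_1::glustervol1 create push-pem force
> gluster volume geo-replication glustervol1 slave_1::glustervol1 start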
>
>
> Regards,
>
> Christian
>
>
>
> *From: *<gluster-users-bounces at gluster.org> on behalf of "Kotte,
> Christian (Ext)" <christian.kotte at novartis.com>
> *Date: *Monday, 24. September 2018 at 17:20
> *To: *Kotresh Hiremath Ravishankar <khiremat at redhat.com>
> *Cc: *Gluster Users <gluster-users at gluster.org>
> *Subject: *Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log
> OSError: [Errno 13] Permission denied
>
>
>
> I don’t configure the permissions of /bricks/brick1/brick/.glusterfs. I
> don’t even see it on the local GlusterFS mount.
>
>
>
> Not sure why the permissions are configured with the setgid bit (S) and the AD group…
>
>
>
> Regards,
>
> Christian
>
>
>
> *From: *<gluster-users-bounces at gluster.org> on behalf of "Kotte,
> Christian (Ext)" <christian.kotte at novartis.com>
> *Date: *Monday, 24. September 2018 at 17:10
> *To: *Kotresh Hiremath Ravishankar <khiremat at redhat.com>
> *Cc: *Gluster Users <gluster-users at gluster.org>
> *Subject: *Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log
> OSError: [Errno 13] Permission denied
>
>
>
> Yeah right. I get permission denied.
>
>
>
> [geoaccount at slave ~]$ ll
> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
>
> ls: cannot access
> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e:
> Permission denied
>
> [geoaccount at slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/d1/
>
> ls: cannot access /bricks/brick1/brick/.glusterfs/29/d1/: Permission denied
>
> [geoaccount at slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/
>
> ls: cannot access /bricks/brick1/brick/.glusterfs/29/: Permission denied
>
> [geoaccount at slave ~]$ ll /bricks/brick1/brick/.glusterfs/
>
> ls: cannot open directory /bricks/brick1/brick/.glusterfs/: Permission
> denied
>
>
>
> [root at slave ~]# ll /bricks/brick1/brick/.glusterfs/29
>
> total 0
>
> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 16
>
> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 33
>
> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 5e
>
> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 73
>
> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 b2
>
> drwx--S---+ 2 root AD+group 50 Sep 21 09:39 d1
>
> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 d7
>
> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 e6
>
> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 eb
>
> [root at slave ~]#
>
>
>
> However, the strange thing is that new files and folders replicated fine
> before. The replication has been broken since the “New folder” was created.
>
>
>
> These are the permissions on a dev/test system:
>
> [root at slave-dev ~]# ll /bricks/brick1/brick/.glusterfs/
>
> total 3136
>
> drwx------. 44 root root    4096 Aug 22 18:19 00
>
> drwx------. 50 root root    4096 Sep 12 13:14 01
>
> drwx------. 54 root root    4096 Sep 13 11:33 02
>
> drwx------. 59 root root    4096 Aug 22 18:21 03
>
> drwx------. 60 root root    4096 Sep 12 13:14 04
>
> drwx------. 68 root root    4096 Aug 24 12:36 05
>
> drwx------. 56 root root    4096 Aug 22 18:21 06
>
> drwx------. 46 root root    4096 Aug 22 18:21 07
>
> drwx------. 51 root root    4096 Aug 22 18:21 08
>
> drwx------. 42 root root    4096 Aug 22 18:21 09
>
> drwx------. 44 root root    4096 Sep 13 11:16 0a
>
>
>
> I’ve configured an AD group, SGID bit, and ACLs via Ansible on the local
> mount point. Could this be an issue? Should I avoid configuring the
> permissions on .glusterfs and below?
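>
> For reference, this is the check I would run on a slave brick to see whether
> the setgid bit or the AD group leaked into the backend .glusterfs tree
> (diagnostic only, run as root with GNU find; nothing is modified):
>
> find /bricks/brick1/brick/.glusterfs -maxdepth 1 -type d \
>     \( -perm /2000 -o ! -user root \) -ls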
>
>
>
> # ll /mnt/glustervol1/
>
> total 12
>
> drwxrwsr-x+  4 AD+user AD+group 4096 Jul 13 07:46 Scripts
>
> drwxrwxr-x+ 10 AD+user AD+group 4096 Jun 12 12:03 Software
>
> -rw-rw-r--+  1 root    AD+group    0 Aug  8 08:44 test
>
> drwxr-xr-x+  6 AD+user AD+group 4096 Apr 18 10:58 tftp
>
>
>
> glusterfs_volumes:
>
> […]
>
>     permissions:
>
>       mode: "02775"
>
>       owner: root
>
>       group: "AD+group"
>
>       acl_permissions: rw
>
> […]
>
>
>
> # root directory is owned by root.
>
> # set permissions to 'g+s' to automatically set the group to "AD+group"
>
> # permissions of individual files will be set by Samba during creation
>
> - name: Configure volume directory permission 1/2
>
>   tags: glusterfs
>
>   file:
>
>     path: /mnt/{{ item.volume }}
>
>     state: directory
>
>     mode: "{{ item.permissions.mode }}"
>
>     owner: "{{ item.permissions.owner }}"
>
>     group: "{{ item.permissions.group }}"
>
>   with_items: "{{ glusterfs_volumes }}"
>
>   loop_control:
>
>     label: "{{ item.volume }}"
>
>   when: item.permissions is defined
>
>
>
> # ACL needs to be set to override default umask and grant "AD+group" write
> permissions
>
> - name: Configure volume directory permission 2/2 (ACL)
>
>   tags: glusterfs
>
>   acl:
>
>     path: /mnt/{{ item.volume }}
>
>     default: yes
>
>     entity: "{{ item.permissions.group }}"
>
>     etype: group
>
>     permissions: "{{ item.permissions.acl_permissions }}"
>
>     state: present
>
>   with_items: "{{ glusterfs_volumes }}"
>
>   loop_control:
>
>     label: "{{ item.volume }}"
>
>   when: item.permissions is defined
>
>
>
> Regards,
>
> Christian
>
>
>
> *From: *Kotresh Hiremath Ravishankar <khiremat at redhat.com>
> *Date: *Monday, 24. September 2018 at 16:20
> *To: *"Kotte, Christian (Ext)" <christian.kotte at novartis.com>
> *Cc: *Gluster Users <gluster-users at gluster.org>
> *Subject: *Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log
> OSError: [Errno 13] Permission denied
>
>
>
> I think I get what's happening. The geo-rep session is non-root.
>
> Could you do a readlink on the brick path mentioned above,
> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e,
> as the geoaccount user and see if you get "Permission denied" errors?
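>
> Something along these lines should do it (assuming sudo is available on the
> slave node; adjust the account name if yours differs):
>
> sudo -u geoaccount readlink \
>     /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e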
>
>
>
> Thanks,
>
> Kotresh HR
>
>
>
> On Mon, Sep 24, 2018 at 7:35 PM Kotte, Christian (Ext) <
> christian.kotte at novartis.com> wrote:
>
> Ok. It happens on all slave nodes (and on the interimmaster as well).
>
>
>
> It’s as I assumed. These are the logs from one of the slaves:
>
>
>
> gsyncd.log
>
> [2018-09-24 13:52:25.418382] I [repce(slave
> slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on
> reaching EOF.
>
> [2018-09-24 13:52:37.95297] W [gsyncd(slave
> slave/bricks/brick1/brick):293:main] <top>: Session config file not exists,
> using the default config
> path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf
>
> [2018-09-24 13:52:37.109643] I [resource(slave
> slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume
> locally...
>
> [2018-09-24 13:52:38.303920] I [resource(slave
> slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster
> volume      duration=1.1941
>
> [2018-09-24 13:52:38.304771] I [resource(slave
> slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
>
> [2018-09-24 13:52:41.981554] I [resource(slave
> slave/bricks/brick1/brick):598:entry_ops] <top>: Special case: rename on
> mkdir        gfid=29d1d60d-1ad6-45fc-87e0-93d478f7331e
> entry='.gfid/6b97b987-8aef-46c3-af27-20d3aa883016/New folder'
>
> [2018-09-24 13:52:42.45641] E [repce(slave
> slave/bricks/brick1/brick):105:worker] <top>: call failed:
>
> Traceback (most recent call last):
>
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in
> worker
>
>     res = getattr(self.obj, rmeth)(*in_data[2:])
>
>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 599,
> in entry_ops
>
>     src_entry = get_slv_dir_path(slv_host, slv_volume, gfid)
>
>   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 682,
> in get_slv_dir_path
>
>     [ENOENT], [ESTALE])
>
>   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 540,
> in errno_wrap
>
>     return call(*arg)
>
> OSError: [Errno 13] Permission denied:
> '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'
>
> [2018-09-24 13:52:42.81794] I [repce(slave
> slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on
> reaching EOF.
>
> [2018-09-24 13:52:53.459676] W [gsyncd(slave
> slave/bricks/brick1/brick):293:main] <top>: Session config file not exists,
> using the default config
> path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf
>
> [2018-09-24 13:52:53.473500] I [resource(slave
> slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume
> locally...
>
> [2018-09-24 13:52:54.659044] I [resource(slave
> slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume
>    duration=1.1854
>
> [2018-09-24 13:52:54.659837] I [resource(slave
> slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
>
>
>
> The folder “New folder” was created via Samba and was renamed by my
> colleague right after creation.
>
> [root at slave glustervol1_slave_glustervol1]# ls
> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/
>
> [root at slave glustervol1_slave_glustervol1]# ls
> /bricks/brick1/brick/.glusterfs/29/d1/ -al
>
> total 0
>
> drwx--S---+  2 root AD+group 50 Sep 21 09:39 .
>
> drwx--S---+ 11 root AD+group 96 Sep 21 09:39 ..
>
> lrwxrwxrwx.  1 root AD+group 75 Sep 21 09:39
> 29d1d60d-1ad6-45fc-87e0-93d478f7331e ->
> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation Manager
>
>
>
> I tried creating the folder in
> /bricks/brick1/brick/.glusterfs/6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/,
> but it didn’t change anything.
>
>
>
> mnt-slave-bricks-brick1-brick.log
>
> [2018-09-24 13:51:10.625723] W [rpc-clnt.c:1753:rpc_clnt_submit]
> 0-glustervol1-client-0: error returned while attempting to connect to
> host:(null), port:0
>
> [2018-09-24 13:51:10.626092] W [rpc-clnt.c:1753:rpc_clnt_submit]
> 0-glustervol1-client-0: error returned while attempting to connect to
> host:(null), port:0
>
> [2018-09-24 13:51:10.626181] I [rpc-clnt.c:2105:rpc_clnt_reconfig]
> 0-glustervol1-client-0: changing port to 49152 (from 0)
>
> [2018-09-24 13:51:10.643111] W [rpc-clnt.c:1753:rpc_clnt_submit]
> 0-glustervol1-client-0: error returned while attempting to connect to
> host:(null), port:0
>
> [2018-09-24 13:51:10.643489] W [dict.c:923:str_to_data]
> (-->/usr/lib64/glusterfs/4.1.3/xlator/protocol/client.so(+0x4131a)
> [0x7fafb023831a] -->/lib64/libglusterfs.so.0(dict_set_str+0x16)
> [0x7fafbdb83266] -->/lib64/libglusterfs.so.0(str_to_data+0x91)
> [0x7fafbdb7fea1] ) 0-dict: value is NULL [Invalid argument]
>
> [2018-09-24 13:51:10.643507] I [MSGID: 114006]
> [client-handshake.c:1308:client_setvolume] 0-glustervol1-client-0: failed
> to set process-name in handshake msg
>
> [2018-09-24 13:51:10.643541] W [rpc-clnt.c:1753:rpc_clnt_submit]
> 0-glustervol1-client-0: error returned while attempting to connect to
> host:(null), port:0
>
> [2018-09-24 13:51:10.671460] I [MSGID: 114046]
> [client-handshake.c:1176:client_setvolume_cbk] 0-glustervol1-client-0:
> Connected to glustervol1-client-0, attached to remote volume
> '/bricks/brick1/brick'.
>
> [2018-09-24 13:51:10.672694] I [fuse-bridge.c:4294:fuse_init]
> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel
> 7.22
>
> [2018-09-24 13:51:10.672715] I [fuse-bridge.c:4927:fuse_graph_sync]
> 0-fuse: switched to graph 0
>
> [2018-09-24 13:51:10.673329] I [MSGID: 109005]
> [dht-selfheal.c:2342:dht_selfheal_directory] 0-glustervol1-dht: Directory
> selfheal failed: Unable to form layout for directory /
>
> [2018-09-24 13:51:16.116458] I [fuse-bridge.c:5199:fuse_thread_proc]
> 0-fuse: initating unmount of
> /var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E
>
> [2018-09-24 13:51:16.116595] W [glusterfsd.c:1514:cleanup_and_exit]
> (-->/lib64/libpthread.so.0(+0x7e25) [0x7fafbc9eee25]
> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55d5dac5dd65]
> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55d5dac5db8b] ) 0-:
> received signum (15), shutting down
>
> [2018-09-24 13:51:16.116616] I [fuse-bridge.c:5981:fini] 0-fuse:
> Unmounting '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'.
>
> [2018-09-24 13:51:16.116625] I [fuse-bridge.c:5986:fini] 0-fuse: Closing
> fuse connection to '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'.
>
>
>
> Regards,
>
> Christian
>
>
>
> *From: *Kotresh Hiremath Ravishankar <khiremat at redhat.com>
> *Date: *Saturday, 22. September 2018 at 06:52
> *To: *"Kotte, Christian (Ext)" <christian.kotte at novartis.com>
> *Cc: *Gluster Users <gluster-users at gluster.org>
> *Subject: *Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log
> OSError: [Errno 13] Permission denied
>
>
>
> The problem occurred on the slave side and its error is propagated to the
> master. Any traceback involving repce usually points to a problem on the
> slave. Check a few lines above in the log to find the slave node the crashed
> worker was connected to, then get that node's geo-replication logs to debug
> further.
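>
> For example, something like this (the paths follow the session naming that
> is visible in your logs; adjust to your setup):
>
> # on the master: the monitor log line shows the slave node the worker used
> grep "slave_node=" /var/log/glusterfs/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.log | tail -5
>
> # on that slave node: watch its side of the session
> tail -f /var/log/glusterfs/geo-replication-slaves/glustervol1_slave_3_glustervol1/gsyncd.log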
>
>
>
>
>
>
>
>
>
>
>
> On Fri, 21 Sep 2018, 20:10 Kotte, Christian (Ext), <
> christian.kotte at novartis.com> wrote:
>
> Hi,
>
>
>
> Any idea how to troubleshoot this?
>
>
>
> New folders and files were created on the master and the replication went
> faulty. They were created via Samba.
>
>
>
> Version: GlusterFS 4.1.3
>
>
>
> [root at master]# gluster volume geo-replication status
>
>
>
> MASTER NODE    MASTER VOL     MASTER BRICK            SLAVE USER    SLAVE                                          SLAVE NODE    STATUS    CRAWL STATUS    LAST_SYNCED
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> master         glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount@slave_1::glustervol1          N/A           Faulty    N/A             N/A
> master         glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount@slave_2::glustervol1          N/A           Faulty    N/A             N/A
> master         glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount@interimmaster::glustervol1    N/A           Faulty    N/A             N/A
>
>
>
> The following error is repeatedly logged in the gsyncd.logs:
>
> [2018-09-21 14:26:38.611479] I [repce(agent
> /bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching
> EOF.
>
> [2018-09-21 14:26:39.211527] I [monitor(monitor):279:monitor] Monitor:
> worker died in startup phase     brick=/bricks/brick1/brick
>
> [2018-09-21 14:26:39.214322] I
> [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status
> Change status=Faulty
>
> [2018-09-21 14:26:49.318953] I [monitor(monitor):158:monitor] Monitor:
> starting gsyncd worker   brick=/bricks/brick1/brick      slave_node=
> slave_3
>
> [2018-09-21 14:26:49.471532] I [gsyncd(agent
> /bricks/brick1/brick):297:main] <top>: Using session config file
> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>
> [2018-09-21 14:26:49.473917] I [changelogagent(agent
> /bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining...
>
> [2018-09-21 14:26:49.491359] I [gsyncd(worker
> /bricks/brick1/brick):297:main] <top>: Using session config file
> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>
> [2018-09-21 14:26:49.538049] I [resource(worker
> /bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection
> between master and slave...
>
> [2018-09-21 14:26:53.5017] I [resource(worker
> /bricks/brick1/brick):1424:connect_remote] SSH: SSH connection between
> master and slave established.      duration=3.4665
>
> [2018-09-21 14:26:53.5419] I [resource(worker
> /bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume
> locally...
>
> [2018-09-21 14:26:54.120374] I [resource(worker
> /bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume
> duration=1.1146
>
> [2018-09-21 14:26:54.121012] I [subcmds(worker
> /bricks/brick1/brick):70:subcmd_worker] <top>: Worker spawn successful.
> Acknowledging back to monitor
>
> [2018-09-21 14:26:56.144460] I [master(worker
> /bricks/brick1/brick):1593:register] _GMaster: Working dir
> path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick
>
> [2018-09-21 14:26:56.145145] I [resource(worker
> /bricks/brick1/brick):1282:service_loop] GLUSTER: Register time
> time=1537540016
>
> [2018-09-21 14:26:56.160064] I [gsyncdstatus(worker
> /bricks/brick1/brick):277:set_active] GeorepStatus: Worker Status Change
> status=Active
>
> [2018-09-21 14:26:56.161175] I [gsyncdstatus(worker
> /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl
> Status Change        status=History Crawl
>
> [2018-09-21 14:26:56.161536] I [master(worker
> /bricks/brick1/brick):1507:crawl] _GMaster: starting history crawl
> turns=1 stime=(1537522637, 0)   entry_stime=(1537537141, 0)
> etime=1537540016
>
> [2018-09-21 14:26:56.164277] I [master(worker
> /bricks/brick1/brick):1536:crawl] _GMaster: slave's time
> stime=(1537522637, 0)
>
> [2018-09-21 14:26:56.197065] I [master(worker
> /bricks/brick1/brick):1360:process] _GMaster: Skipping already processed
> entry ops        to_changelog=1537522638 num_changelogs=1
> from_changelog=1537522638
>
> [2018-09-21 14:26:56.197402] I [master(worker
> /bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken    MKD=0
> MKN=0   LIN=0   SYM=0   REN=0   RMD=0   CRE=0   duration=0.0000 UNL=1
>
> [2018-09-21 14:26:56.197623] I [master(worker
> /bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken
> SETA=0  SETX=0  meta_duration=0.0000    data_duration=0.0284    DATA=0
> XATT=0
>
> [2018-09-21 14:26:56.198230] I [master(worker
> /bricks/brick1/brick):1394:process] _GMaster: Batch Completed
> changelog_end=1537522638        entry_stime=(1537537141, 0)
> changelog_start=1537522638      stime=(1537522637, 0)   duration=0.0333
> num_changelogs=1        mode=history_changelog
>
> [2018-09-21 14:26:57.200436] I [master(worker
> /bricks/brick1/brick):1536:crawl] _GMaster: slave's time
> stime=(1537522637, 0)
>
> [2018-09-21 14:26:57.528625] E [repce(worker
> /bricks/brick1/brick):197:__call__] RepceClient: call failed
> call=17209:140650361157440:1537540017.21        method=entry_ops
> error=OSError
>
> [2018-09-21 14:26:57.529371] E [syncdutils(worker
> /bricks/brick1/brick):332:log_raise_exception] <top>: FAIL:
>
> Traceback (most recent call last):
>
>   File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in
> main
>
>     func(args)
>
>   File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in
> subcmd_worker
>
>     local.service_loop(remote)
>
>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1288,
> in service_loop
>
>     g3.crawlwrap(oneshot=True)
>
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615, in
> crawlwrap
>
>     self.crawl()
>
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1545, in
> crawl
>
>     self.changelogs_batch_process(changes)
>
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1445, in
> changelogs_batch_process
>
>     self.process(batch)
>
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1280, in
> process
>
>     self.process_change(change, done, retry)
>
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1179, in
> process_change
>
>     failures = self.slave.server.entry_ops(entries)
>
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216, in
> __call__
>
>     return self.ins(self.meth, *a)
>
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198, in
> __call__
>
>     raise res
>
> OSError: [Errno 13] Permission denied:
> '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'
>
>
>
> The permissions look fine. The replication is done via the geo user instead
> of root. It should be able to read, but I’m not sure whether the syncdaemon
> runs as geoaccount!?
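>
> For reference, this is how I would check which account the sync daemon runs
> as on the slave (assuming standard ps/grep):
>
> ps -ef | grep -i gsyncd | grep -v grep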
>
>
>
> [root at master vRealize Operation Manager]# ll
> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
>
> lrwxrwxrwx. 1 root root 75 Sep 21 09:39
> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
> -> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation
> Manager
>
>
>
> [root at master vRealize Operation Manager]# ll
> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/
>
> total 4
>
> drwxrwxr-x. 2 AD+user AD+group  131 Sep 21 10:14 6.7
>
> drwxrwxr-x. 2 AD+user AD+group 4096 Sep 21 09:43 7.0
>
> drwxrwxr-x. 2 AD+user AD+group   57 Sep 21 10:28 7.5
>
> [root at master vRealize Operation Manager]#
>
>
>
> It’s possible that the folder was renamed. I have had 3 similar issues since
> I migrated to GlusterFS 4.x but couldn’t investigate much. I needed to
> completely wipe GlusterFS and the geo-replication to get rid of this error…
>
>
>
> Any help is appreciated.
>
>
>
> Regards,
>
>
>
> *Christian Kotte*
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
> --
>
> Thanks and Regards,
>
> Kotresh H R
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

