[Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied

Kotresh Hiremath Ravishankar khiremat at redhat.com
Wed Mar 20 06:42:39 UTC 2019


Hi Andy,

This is an issue with non-root geo-rep sessions and is not fixed yet. Could
you please raise a bug for it?

Thanks,
Kotresh HR

On Wed, Mar 20, 2019 at 11:53 AM Andy Coates <andy.coates at gmail.com> wrote:

> We're seeing the same permission denied errors when running as a non-root
> geosync user.
>
> Does anyone know what the underlying issue is?
>
> On Wed, 26 Sep 2018 at 00:40, Kotte, Christian (Ext) <
> christian.kotte at novartis.com> wrote:
>
>> I changed the replication to use the root user and re-created the
>> session with “create force”. Now the files and folders are replicated,
>> and the “permission denied” and “New folder” errors have disappeared, but
>> old files are not deleted.
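>>
>> Roughly, the re-creation was done along these lines (sketch only; host name
>> slave_1 taken from the status output below, exact options may have differed):
>>
>> # on the master node, re-create the root-based session and start it
>> gluster volume geo-replication glustervol1 slave_1::glustervol1 create force
>> gluster volume geo-replication glustervol1 slave_1::glustervol1 start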
>>
>>
>>
>> Looks like the history crawl is stuck in some kind of loop:
>>
>>
>>
>> [root at master ~]# gluster volume geo-replication status
>>
>>
>>
>> MASTER NODE    MASTER VOL     MASTER BRICK            SLAVE USER    SLAVE                         SLAVE NODE    STATUS    CRAWL STATUS    LAST_SYNCED
>> ----------------------------------------------------------------------------------------------------------------------------------------------------
>> master         glustervol1    /bricks/brick1/brick    root          ssh://slave_1::glustervol1    slave_1       Active    Hybrid Crawl    N/A
>> master         glustervol1    /bricks/brick1/brick    root          ssh://slave_2::glustervol1    slave_2       Active    Hybrid Crawl    N/A
>> master         glustervol1    /bricks/brick1/brick    root          ssh://slave_3::glustervol1    slave_3       Active    Hybrid Crawl    N/A
>>
>>
>>
>> tail -f
>> /var/log/glusterfs/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.log
>>
>>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line
>> 104, in cl_history_changelog
>>
>>     raise ChangelogHistoryNotAvailable()
>>
>> ChangelogHistoryNotAvailable
>>
>> [2018-09-25 14:10:44.196011] E [repce(worker
>> /bricks/brick1/brick):197:__call__] RepceClient: call failed
>> call=29945:139700517484352:1537884644.19  method=history
>> error=ChangelogHistoryNotAvailable
>>
>> [2018-09-25 14:10:44.196405] I [resource(worker
>> /bricks/brick1/brick):1295:service_loop] GLUSTER: Changelog history not
>> available, using xsync
>>
>> [2018-09-25 14:10:44.221385] I [master(worker
>> /bricks/brick1/brick):1623:crawl] _GMaster: starting hybrid crawl
>> stime=(0, 0)
>>
>> [2018-09-25 14:10:44.223382] I [gsyncdstatus(worker
>> /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl
>> Status Change      status=Hybrid Crawl
>>
>> [2018-09-25 14:10:46.225296] I [master(worker
>> /bricks/brick1/brick):1634:crawl] _GMaster: processing xsync changelog
>> path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick/xsync/XSYNC-CHANGELOG.1537884644
>>
>> [2018-09-25 14:13:36.157408] I [gsyncd(config-get):297:main] <top>: Using
>> session config file
>> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>>
>> [2018-09-25 14:13:36.377880] I [gsyncd(status):297:main] <top>: Using
>> session config file
>> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>>
>> [2018-09-25 14:31:10.145035] I [master(worker
>> /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
>> duration=1212.5316      num_files=1474    job=2 return_code=11
>>
>> [2018-09-25 14:31:10.152637] E [syncdutils(worker
>> /bricks/brick1/brick):801:errlog] Popen: command returned error cmd=rsync
>> -aR0 --inplace --files-from=- --super --stats --numeric-ids
>> --no-implied-dirs --xattrs --acls . -e ssh -oPasswordAuthentication=no
>> -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
>> -p 22 -oControlMaster=auto -S
>> /tmp/gsyncd-aux-ssh-gg758Z/caec4d1d03cc28bc1853f692e291164f.sock
>> slave_3:/proc/15919/cwd error=11
>>
>> [2018-09-25 14:31:10.237371] I [repce(agent
>> /bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching
>> EOF.
>>
>> [2018-09-25 14:31:10.430820] I
>> [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status
>> Change  status=Faulty
>>
>> [2018-09-25 14:31:20.541475] I [monitor(monitor):158:monitor] Monitor:
>> starting gsyncd worker      brick=/bricks/brick1/brick    slave_node=slave_3
>>
>> [2018-09-25 14:31:20.806518] I [gsyncd(agent
>> /bricks/brick1/brick):297:main] <top>: Using session config file
>> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>>
>> [2018-09-25 14:31:20.816536] I [changelogagent(agent
>> /bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining...
>>
>> [2018-09-25 14:31:20.821574] I [gsyncd(worker
>> /bricks/brick1/brick):297:main] <top>: Using session config file
>> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>>
>> [2018-09-25 14:31:20.882128] I [resource(worker
>> /bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection
>> between master and slave...
>>
>> [2018-09-25 14:31:24.169857] I [resource(worker
>> /bricks/brick1/brick):1424:connect_remote] SSH: SSH connection between
>> master and slave established.      duration=3.2873
>>
>> [2018-09-25 14:31:24.170401] I [resource(worker
>> /bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume
>> locally...
>>
>> [2018-09-25 14:31:25.354633] I [resource(worker
>> /bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume
>> duration=1.1839
>>
>> [2018-09-25 14:31:25.355073] I [subcmds(worker
>> /bricks/brick1/brick):70:subcmd_worker] <top>: Worker spawn successful.
>> Acknowledging back to monitor
>>
>> [2018-09-25 14:31:27.439034] I [master(worker
>> /bricks/brick1/brick):1593:register] _GMaster: Working dir
>> path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick
>>
>> [2018-09-25 14:31:27.441847] I [resource(worker
>> /bricks/brick1/brick):1282:service_loop] GLUSTER: Register time
>> time=1537885887
>>
>> [2018-09-25 14:31:27.465053] I [gsyncdstatus(worker
>> /bricks/brick1/brick):277:set_active] GeorepStatus: Worker Status
>> Change      status=Active
>>
>> [2018-09-25 14:31:27.471021] I [gsyncdstatus(worker
>> /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl
>> Status Change      status=History Crawl
>>
>> [2018-09-25 14:31:27.471484] I [master(worker
>> /bricks/brick1/brick):1507:crawl] _GMaster: starting history crawl
>> turns=1     stime=(0, 0)      entry_stime=None      etime=1537885887
>>
>> [2018-09-25 14:31:27.472564] E [repce(agent
>> /bricks/brick1/brick):105:worker] <top>: call failed:
>>
>> Traceback (most recent call last):
>>
>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in
>> worker
>>
>>     res = getattr(self.obj, rmeth)(*in_data[2:])
>>
>>   File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line
>> 53, in history
>>
>>     num_parallel)
>>
>>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line
>> 104, in cl_history_changelog
>>
>>     raise ChangelogHistoryNotAvailable()
>>
>> ChangelogHistoryNotAvailable
>>
>> [2018-09-25 14:31:27.480632] E [repce(worker
>> /bricks/brick1/brick):197:__call__] RepceClient: call failed
>> call=31250:140272364406592:1537885887.47  method=history
>> error=ChangelogHistoryNotAvailable
>>
>> [2018-09-25 14:31:27.480958] I [resource(worker
>> /bricks/brick1/brick):1295:service_loop] GLUSTER: Changelog history not
>> available, using xsync
>>
>> [2018-09-25 14:31:27.495117] I [master(worker
>> /bricks/brick1/brick):1623:crawl] _GMaster: starting hybrid crawl
>> stime=(0, 0)
>>
>> [2018-09-25 14:31:27.502083] I [gsyncdstatus(worker
>> /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl
>> Status Change      status=Hybrid Crawl
>>
>> [2018-09-25 14:31:29.505284] I [master(worker
>> /bricks/brick1/brick):1634:crawl] _GMaster: processing xsync changelog
>> path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick/xsync/XSYNC-CHANGELOG.1537885887
>>
>>
>>
>> tail -f
>> /var/log/glusterfs/geo-replication-slaves/glustervol1_slave_3_glustervol1/gsyncd.log
>>
>> [2018-09-25 13:49:24.141303] I [repce(slave
>> master/bricks/brick1/brick):80:service_loop] RepceServer: terminating on
>> reaching EOF.
>>
>> [2018-09-25 13:49:36.602051] W [gsyncd(slave
>> master/bricks/brick1/brick):293:main] <top>: Session config file not
>> exists, using the default config
>> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>>
>> [2018-09-25 13:49:36.629415] I [resource(slave
>> master/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume
>> locally...
>>
>> [2018-09-25 13:49:37.701642] I [resource(slave
>> master/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster
>> volume     duration=1.0718
>>
>> [2018-09-25 13:49:37.704282] I [resource(slave
>> master/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
>>
>> [2018-09-25 14:10:27.70952] I [repce(slave
>> master/bricks/brick1/brick):80:service_loop] RepceServer: terminating on
>> reaching EOF.
>>
>> [2018-09-25 14:10:39.632124] W [gsyncd(slave
>> master/bricks/brick1/brick):293:main] <top>: Session config file not
>> exists, using the default config
>> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>>
>> [2018-09-25 14:10:39.650958] I [resource(slave
>> master/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume
>> locally...
>>
>> [2018-09-25 14:10:40.729355] I [resource(slave
>> master/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster
>> volume     duration=1.0781
>>
>> [2018-09-25 14:10:40.730650] I [resource(slave
>> master/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
>>
>> [2018-09-25 14:31:10.291064] I [repce(slave
>> master/bricks/brick1/brick):80:service_loop] RepceServer: terminating on
>> reaching EOF.
>>
>> [2018-09-25 14:31:22.802237] W [gsyncd(slave
>> master/bricks/brick1/brick):293:main] <top>: Session config file not
>> exists, using the default config
>> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>>
>> [2018-09-25 14:31:22.828418] I [resource(slave
>> master/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume
>> locally...
>>
>> [2018-09-25 14:31:23.910206] I [resource(slave
>> master/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster
>> volume     duration=1.0813
>>
>> [2018-09-25 14:31:23.913369] I [resource(slave
>> master/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
>>
>>
>>
>> Any ideas on how to resolve this without re-creating everything? Can I
>> reset the changelog history?
>>
>>
>>
>> Regards,
>>
>> Christian
>>
>>
>>
>> *From: *<gluster-users-bounces at gluster.org> on behalf of "Kotte,
>> Christian (Ext)" <christian.kotte at novartis.com>
>> *Date: *Monday, 24. September 2018 at 17:20
>> *To: *Kotresh Hiremath Ravishankar <khiremat at redhat.com>
>> *Cc: *Gluster Users <gluster-users at gluster.org>
>> *Subject: *Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log
>> OSError: [Errno 13] Permission denied
>>
>>
>>
>> I don’t configure the permissions of /bricks/brick1/brick/.glusterfs. I
>> don’t even see it on the local GlusterFS mount.
>>
>>
>>
>> Not sure why the permissions are configured with the setgid bit (S) and the
>> AD group…
>>
>>
>>
>> Regards,
>>
>> Christian
>>
>>
>>
>> *From: *<gluster-users-bounces at gluster.org> on behalf of "Kotte,
>> Christian (Ext)" <christian.kotte at novartis.com>
>> *Date: *Monday, 24. September 2018 at 17:10
>> *To: *Kotresh Hiremath Ravishankar <khiremat at redhat.com>
>> *Cc: *Gluster Users <gluster-users at gluster.org>
>> *Subject: *Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log
>> OSError: [Errno 13] Permission denied
>>
>>
>>
>> Yeah right. I get permission denied.
>>
>>
>>
>> [geoaccount at slave ~]$ ll
>> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
>>
>> ls: cannot access
>> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e:
>> Permission denied
>>
>> [geoaccount at slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/d1/
>>
>> ls: cannot access /bricks/brick1/brick/.glusterfs/29/d1/: Permission
>> denied
>>
>> [geoaccount at slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/
>>
>> ls: cannot access /bricks/brick1/brick/.glusterfs/29/: Permission denied
>>
>> [geoaccount at slave ~]$ ll /bricks/brick1/brick/.glusterfs/
>>
>> ls: cannot open directory /bricks/brick1/brick/.glusterfs/: Permission
>> denied
>>
>>
>>
>> [root at slave ~]# ll /bricks/brick1/brick/.glusterfs/29
>>
>> total 0
>>
>> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 16
>>
>> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 33
>>
>> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 5e
>>
>> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 73
>>
>> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 b2
>>
>> drwx--S---+ 2 root AD+group 50 Sep 21 09:39 d1
>>
>> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 d7
>>
>> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 e6
>>
>> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 eb
>>
>> [root at slave ~]#
>>
>>
>>
>> However, the strange thing is that I could replicate new files and
>> folders before. The replication has been broken since the “New folder” was
>> created.
>>
>>
>>
>> These are the permissions on a dev/test system:
>>
>> [root at slave-dev ~]# ll /bricks/brick1/brick/.glusterfs/
>>
>> total 3136
>>
>> drwx------. 44 root root    4096 Aug 22 18:19 00
>>
>> drwx------. 50 root root    4096 Sep 12 13:14 01
>>
>> drwx------. 54 root root    4096 Sep 13 11:33 02
>>
>> drwx------. 59 root root    4096 Aug 22 18:21 03
>>
>> drwx------. 60 root root    4096 Sep 12 13:14 04
>>
>> drwx------. 68 root root    4096 Aug 24 12:36 05
>>
>> drwx------. 56 root root    4096 Aug 22 18:21 06
>>
>> drwx------. 46 root root    4096 Aug 22 18:21 07
>>
>> drwx------. 51 root root    4096 Aug 22 18:21 08
>>
>> drwx------. 42 root root    4096 Aug 22 18:21 09
>>
>> drwx------. 44 root root    4096 Sep 13 11:16 0a
>>
>>
>>
>> I’ve configured an AD group, SGID bit, and ACLs via Ansible on the local
>> mount point. Could this be an issue? Should I avoid configuring the
>> permissions on .glusterfs and below?
>>
>>
>>
>> # ll /mnt/glustervol1/
>>
>> total 12
>>
>> drwxrwsr-x+  4 AD+user AD+group 4096 Jul 13 07:46 Scripts
>>
>> drwxrwxr-x+ 10 AD+user AD+group 4096 Jun 12 12:03 Software
>>
>> -rw-rw-r--+  1 root    AD+group    0 Aug  8 08:44 test
>>
>> drwxr-xr-x+  6 AD+user AD+group 4096 Apr 18 10:58 tftp
>>
>>
>>
>> glusterfs_volumes:
>> […]
>>     permissions:
>>       mode: "02775"
>>       owner: root
>>       group: "AD+group"
>>       acl_permissions: rw
>> […]
>>
>>
>>
>> # root directory is owned by root.
>> # set permissions to 'g+s' to automatically set the group to "AD+group"
>> # permissions of individual files will be set by Samba during creation
>> - name: Configure volume directory permission 1/2
>>   tags: glusterfs
>>   file:
>>     path: /mnt/{{ item.volume }}
>>     state: directory
>>     mode: "{{ item.permissions.mode }}"
>>     owner: "{{ item.permissions.owner }}"
>>     group: "{{ item.permissions.group }}"
>>   with_items: "{{ glusterfs_volumes }}"
>>   loop_control:
>>     label: "{{ item.volume }}"
>>   when: item.permissions is defined
>>
>>
>>
>> # ACL needs to be set to override the default umask and grant "AD+group"
>> # write permissions
>> - name: Configure volume directory permission 2/2 (ACL)
>>   tags: glusterfs
>>   acl:
>>     path: /mnt/{{ item.volume }}
>>     default: yes
>>     entity: "{{ item.permissions.group }}"
>>     etype: group
>>     permissions: "{{ item.permissions.acl_permissions }}"
>>     state: present
>>   with_items: "{{ glusterfs_volumes }}"
>>   loop_control:
>>     label: "{{ item.volume }}"
>>   when: item.permissions is defined
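>>
>> On a plain shell, the two tasks roughly correspond to something like this
>> (sketch; group name as above):
>>
>> chmod 2775 /mnt/glustervol1                        # setgid bit + group write
>> chown root:"AD+group" /mnt/glustervol1
>> setfacl -d -m g:"AD+group":rw /mnt/glustervol1     # default ACL for new entries
>> getfacl /mnt/glustervol1                           # verify the result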
>>
>>
>>
>> Regards,
>>
>> Christian
>>
>>
>>
>> *From: *Kotresh Hiremath Ravishankar <khiremat at redhat.com>
>> *Date: *Monday, 24. September 2018 at 16:20
>> *To: *"Kotte, Christian (Ext)" <christian.kotte at novartis.com>
>> *Cc: *Gluster Users <gluster-users at gluster.org>
>> *Subject: *Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log
>> OSError: [Errno 13] Permission denied
>>
>>
>>
>> I think I get what's happening. The geo-rep session is non-root.
>>
>> Could you do a readlink on the brick path mentioned above,
>> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e,
>> as the geoaccount user and see if you get "Permission denied" errors?
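>>
>> For example, something like this on the slave, run as the geo-rep user
>> (sketch; path copied from the traceback):
>>
>> # run as geoaccount on the slave node
>> readlink /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
>> stat /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e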
>>
>>
>>
>> Thanks,
>>
>> Kotresh HR
>>
>>
>>
>> On Mon, Sep 24, 2018 at 7:35 PM Kotte, Christian (Ext) <
>> christian.kotte at novartis.com> wrote:
>>
>> Ok. It happens on all slave nodes (and on the interimmaster as well).
>>
>>
>>
>> It’s as I assumed. These are the logs from one of the slaves:
>>
>>
>>
>> gsyncd.log
>>
>> [2018-09-24 13:52:25.418382] I [repce(slave
>> slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on
>> reaching EOF.
>>
>> [2018-09-24 13:52:37.95297] W [gsyncd(slave
>> slave/bricks/brick1/brick):293:main] <top>: Session config file not exists,
>> using the default config
>> path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf
>>
>> [2018-09-24 13:52:37.109643] I [resource(slave
>> slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume
>> locally...
>>
>> [2018-09-24 13:52:38.303920] I [resource(slave
>> slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster
>> volume      duration=1.1941
>>
>> [2018-09-24 13:52:38.304771] I [resource(slave
>> slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
>>
>> [2018-09-24 13:52:41.981554] I [resource(slave
>> slave/bricks/brick1/brick):598:entry_ops] <top>: Special case: rename on
>> mkdir        gfid=29d1d60d-1ad6-45fc-87e0-93d478f7331e
>> entry='.gfid/6b97b987-8aef-46c3-af27-20d3aa883016/New folder'
>>
>> [2018-09-24 13:52:42.45641] E [repce(slave
>> slave/bricks/brick1/brick):105:worker] <top>: call failed:
>>
>> Traceback (most recent call last):
>>
>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in
>> worker
>>
>>     res = getattr(self.obj, rmeth)(*in_data[2:])
>>
>>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 599,
>> in entry_ops
>>
>>     src_entry = get_slv_dir_path(slv_host, slv_volume, gfid)
>>
>>   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line
>> 682, in get_slv_dir_path
>>
>>     [ENOENT], [ESTALE])
>>
>>   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line
>> 540, in errno_wrap
>>
>>     return call(*arg)
>>
>> OSError: [Errno 13] Permission denied:
>> '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'
>>
>> [2018-09-24 13:52:42.81794] I [repce(slave
>> slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on
>> reaching EOF.
>>
>> [2018-09-24 13:52:53.459676] W [gsyncd(slave
>> slave/bricks/brick1/brick):293:main] <top>: Session config file not exists,
>> using the default config
>> path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf
>>
>> [2018-09-24 13:52:53.473500] I [resource(slave
>> slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume
>> locally...
>>
>> [2018-09-24 13:52:54.659044] I [resource(slave
>> slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume
>>    duration=1.1854
>>
>> [2018-09-24 13:52:54.659837] I [resource(slave
>> slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
>>
>>
>>
>> The folder “New folder” was created via Samba and was renamed by
>> my colleague right after creation.
>>
>> [root at slave glustervol1_slave_glustervol1]# ls
>> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/
>>
>> [root at slave glustervol1_slave_glustervol1]# ls
>> /bricks/brick1/brick/.glusterfs/29/d1/ -al
>>
>> total 0
>>
>> drwx--S---+  2 root AD+group 50 Sep 21 09:39 .
>>
>> drwx--S---+ 11 root AD+group 96 Sep 21 09:39 ..
>>
>> lrwxrwxrwx.  1 root AD+group 75 Sep 21 09:39
>> 29d1d60d-1ad6-45fc-87e0-93d478f7331e ->
>> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation Manager
>>
>>
>>
>> I tried creating the folder in
>> /bricks/brick1/brick/.glusterfs/6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/,
>> but it didn’t change anything.
>>
>>
>>
>> mnt-slave-bricks-brick1-brick.log
>>
>> [2018-09-24 13:51:10.625723] W [rpc-clnt.c:1753:rpc_clnt_submit]
>> 0-glustervol1-client-0: error returned while attempting to connect to
>> host:(null), port:0
>>
>> [2018-09-24 13:51:10.626092] W [rpc-clnt.c:1753:rpc_clnt_submit]
>> 0-glustervol1-client-0: error returned while attempting to connect to
>> host:(null), port:0
>>
>> [2018-09-24 13:51:10.626181] I [rpc-clnt.c:2105:rpc_clnt_reconfig]
>> 0-glustervol1-client-0: changing port to 49152 (from 0)
>>
>> [2018-09-24 13:51:10.643111] W [rpc-clnt.c:1753:rpc_clnt_submit]
>> 0-glustervol1-client-0: error returned while attempting to connect to
>> host:(null), port:0
>>
>> [2018-09-24 13:51:10.643489] W [dict.c:923:str_to_data]
>> (-->/usr/lib64/glusterfs/4.1.3/xlator/protocol/client.so(+0x4131a)
>> [0x7fafb023831a] -->/lib64/libglusterfs.so.0(dict_set_str+0x16)
>> [0x7fafbdb83266] -->/lib64/libglusterfs.so.0(str_to_data+0x91)
>> [0x7fafbdb7fea1] ) 0-dict: value is NULL [Invalid argument]
>>
>> [2018-09-24 13:51:10.643507] I [MSGID: 114006]
>> [client-handshake.c:1308:client_setvolume] 0-glustervol1-client-0: failed
>> to set process-name in handshake msg
>>
>> [2018-09-24 13:51:10.643541] W [rpc-clnt.c:1753:rpc_clnt_submit]
>> 0-glustervol1-client-0: error returned while attempting to connect to
>> host:(null), port:0
>>
>> [2018-09-24 13:51:10.671460] I [MSGID: 114046]
>> [client-handshake.c:1176:client_setvolume_cbk] 0-glustervol1-client-0:
>> Connected to glustervol1-client-0, attached to remote volume
>> '/bricks/brick1/brick'.
>>
>> [2018-09-24 13:51:10.672694] I [fuse-bridge.c:4294:fuse_init]
>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel
>> 7.22
>>
>> [2018-09-24 13:51:10.672715] I [fuse-bridge.c:4927:fuse_graph_sync]
>> 0-fuse: switched to graph 0
>>
>> [2018-09-24 13:51:10.673329] I [MSGID: 109005]
>> [dht-selfheal.c:2342:dht_selfheal_directory] 0-glustervol1-dht: Directory
>> selfheal failed: Unable to form layout for directory /
>>
>> [2018-09-24 13:51:16.116458] I [fuse-bridge.c:5199:fuse_thread_proc]
>> 0-fuse: initating unmount of
>> /var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E
>>
>> [2018-09-24 13:51:16.116595] W [glusterfsd.c:1514:cleanup_and_exit]
>> (-->/lib64/libpthread.so.0(+0x7e25) [0x7fafbc9eee25]
>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55d5dac5dd65]
>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55d5dac5db8b] ) 0-:
>> received signum (15), shutting down
>>
>> [2018-09-24 13:51:16.116616] I [fuse-bridge.c:5981:fini] 0-fuse:
>> Unmounting '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'.
>>
>> [2018-09-24 13:51:16.116625] I [fuse-bridge.c:5986:fini] 0-fuse: Closing
>> fuse connection to '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'.
>>
>>
>>
>> Regards,
>>
>> Christian
>>
>>
>>
>> *From: *Kotresh Hiremath Ravishankar <khiremat at redhat.com>
>> *Date: *Saturday, 22. September 2018 at 06:52
>> *To: *"Kotte, Christian (Ext)" <christian.kotte at novartis.com>
>> *Cc: *Gluster Users <gluster-users at gluster.org>
>> *Subject: *Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log
>> OSError: [Errno 13] Permission denied
>>
>>
>>
>> The problem occurred on the slave side and its error is propagated to the
>> master. Almost any traceback involving repce points to a problem on the slave.
>> Check a few lines above in the log to find the slave node the crashed worker
>> was connected to, then get that slave's geo-replication logs to debug further.
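>>
>> Roughly (sketch; session directory names as they appear in the logs):
>>
>> # on the master: find the slave node the crashed worker was connected to
>> grep "starting gsyncd worker" /var/log/glusterfs/geo-replication/glustervol1_*_glustervol1/gsyncd.log | tail -5
>>
>> # then on that slave node: check the slave-side geo-rep log
>> tail -100 /var/log/glusterfs/geo-replication-slaves/glustervol1_*_glustervol1/gsyncd.log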
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Fri, 21 Sep 2018, 20:10 Kotte, Christian (Ext), <
>> christian.kotte at novartis.com> wrote:
>>
>> Hi,
>>
>>
>>
>> Any idea how to troubleshoot this?
>>
>>
>>
>> New folders and files were created on the master and the replication went
>> faulty. They were created via Samba.
>>
>>
>>
>> Version: GlusterFS 4.1.3
>>
>>
>>
>> [root at master]# gluster volume geo-replication status
>>
>>
>>
>> MASTER NODE    MASTER VOL     MASTER BRICK            SLAVE USER    SLAVE                                          SLAVE NODE    STATUS    CRAWL STATUS    LAST_SYNCED
>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> master         glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount@slave_1::glustervol1          N/A           Faulty    N/A             N/A
>> master         glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount@slave_2::glustervol1          N/A           Faulty    N/A             N/A
>> master         glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount@interimmaster::glustervol1    N/A           Faulty    N/A             N/A
>>
>>
>>
>> The following error is repeatedly logged in the gsyncd.logs:
>>
>> [2018-09-21 14:26:38.611479] I [repce(agent
>> /bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching
>> EOF.
>>
>> [2018-09-21 14:26:39.211527] I [monitor(monitor):279:monitor] Monitor:
>> worker died in startup phase     brick=/bricks/brick1/brick
>>
>> [2018-09-21 14:26:39.214322] I
>> [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status
>> Change status=Faulty
>>
>> [2018-09-21 14:26:49.318953] I [monitor(monitor):158:monitor] Monitor:
>> starting gsyncd worker   brick=/bricks/brick1/brick      slave_node=
>> slave_3
>>
>> [2018-09-21 14:26:49.471532] I [gsyncd(agent
>> /bricks/brick1/brick):297:main] <top>: Using session config file
>> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>>
>> [2018-09-21 14:26:49.473917] I [changelogagent(agent
>> /bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining...
>>
>> [2018-09-21 14:26:49.491359] I [gsyncd(worker
>> /bricks/brick1/brick):297:main] <top>: Using session config file
>> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>>
>> [2018-09-21 14:26:49.538049] I [resource(worker
>> /bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection
>> between master and slave...
>>
>> [2018-09-21 14:26:53.5017] I [resource(worker
>> /bricks/brick1/brick):1424:connect_remote] SSH: SSH connection between
>> master and slave established.      duration=3.4665
>>
>> [2018-09-21 14:26:53.5419] I [resource(worker
>> /bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume
>> locally...
>>
>> [2018-09-21 14:26:54.120374] I [resource(worker
>> /bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume
>> duration=1.1146
>>
>> [2018-09-21 14:26:54.121012] I [subcmds(worker
>> /bricks/brick1/brick):70:subcmd_worker] <top>: Worker spawn successful.
>> Acknowledging back to monitor
>>
>> [2018-09-21 14:26:56.144460] I [master(worker
>> /bricks/brick1/brick):1593:register] _GMaster: Working dir
>> path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick
>>
>> [2018-09-21 14:26:56.145145] I [resource(worker
>> /bricks/brick1/brick):1282:service_loop] GLUSTER: Register time
>> time=1537540016
>>
>> [2018-09-21 14:26:56.160064] I [gsyncdstatus(worker
>> /bricks/brick1/brick):277:set_active] GeorepStatus: Worker Status Change
>> status=Active
>>
>> [2018-09-21 14:26:56.161175] I [gsyncdstatus(worker
>> /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl
>> Status Change        status=History Crawl
>>
>> [2018-09-21 14:26:56.161536] I [master(worker
>> /bricks/brick1/brick):1507:crawl] _GMaster: starting history crawl
>> turns=1 stime=(1537522637, 0)   entry_stime=(1537537141, 0)
>> etime=1537540016
>>
>> [2018-09-21 14:26:56.164277] I [master(worker
>> /bricks/brick1/brick):1536:crawl] _GMaster: slave's time
>> stime=(1537522637, 0)
>>
>> [2018-09-21 14:26:56.197065] I [master(worker
>> /bricks/brick1/brick):1360:process] _GMaster: Skipping already processed
>> entry ops        to_changelog=1537522638 num_changelogs=1
>> from_changelog=1537522638
>>
>> [2018-09-21 14:26:56.197402] I [master(worker
>> /bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken    MKD=0
>> MKN=0   LIN=0   SYM=0   REN=0   RMD=0   CRE=0   duration=0.0000 UNL=1
>>
>> [2018-09-21 14:26:56.197623] I [master(worker
>> /bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken
>> SETA=0  SETX=0  meta_duration=0.0000    data_duration=0.0284    DATA=0
>> XATT=0
>>
>> [2018-09-21 14:26:56.198230] I [master(worker
>> /bricks/brick1/brick):1394:process] _GMaster: Batch Completed
>> changelog_end=1537522638        entry_stime=(1537537141, 0)
>> changelog_start=1537522638      stime=(1537522637, 0)   duration=0.0333
>> num_changelogs=1        mode=history_changelog
>>
>> [2018-09-21 14:26:57.200436] I [master(worker
>> /bricks/brick1/brick):1536:crawl] _GMaster: slave's time
>> stime=(1537522637, 0)
>>
>> [2018-09-21 14:26:57.528625] E [repce(worker
>> /bricks/brick1/brick):197:__call__] RepceClient: call failed
>> call=17209:140650361157440:1537540017.21        method=entry_ops
>> error=OSError
>>
>> [2018-09-21 14:26:57.529371] E [syncdutils(worker
>> /bricks/brick1/brick):332:log_raise_exception] <top>: FAIL:
>>
>> Traceback (most recent call last):
>>
>>   File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in
>> main
>>
>>     func(args)
>>
>>   File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in
>> subcmd_worker
>>
>>     local.service_loop(remote)
>>
>>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1288,
>> in service_loop
>>
>>     g3.crawlwrap(oneshot=True)
>>
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615, in
>> crawlwrap
>>
>>     self.crawl()
>>
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1545,
>> in crawl
>>
>>     self.changelogs_batch_process(changes)
>>
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1445,
>> in changelogs_batch_process
>>
>>     self.process(batch)
>>
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1280,
>> in process
>>
>>     self.process_change(change, done, retry)
>>
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1179,
>> in process_change
>>
>>     failures = self.slave.server.entry_ops(entries)
>>
>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216, in
>> __call__
>>
>>     return self.ins(self.meth, *a)
>>
>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198, in
>> __call__
>>
>>     raise res
>>
>> OSError: [Errno 13] Permission denied:
>> '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'
>>
>>
>>
>> The permissions look fine. The replication is done via the geo user instead
>> of root. It should be able to read, but I’m not sure whether the syncdaemon
>> actually runs under geoaccount.
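>>
>> One way to check would be to look at the gsyncd worker processes on the
>> master and slave nodes and see which user they run as, e.g. (sketch):
>>
>> ps -ef | grep -i gsyncd | grep -v grep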
>>
>>
>>
>> [root at master vRealize Operation Manager]# ll
>> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
>>
>> lrwxrwxrwx. 1 root root 75 Sep 21 09:39
>> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
>> -> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation
>> Manager
>>
>>
>>
>> [root at master vRealize Operation Manager]# ll
>> /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/
>>
>> total 4
>>
>> drwxrwxr-x. 2 AD+user AD+group  131 Sep 21 10:14 6.7
>>
>> drwxrwxr-x. 2 AD+user AD+group 4096 Sep 21 09:43 7.0
>>
>> drwxrwxr-x. 2 AD+user AD+group   57 Sep 21 10:28 7.5
>>
>> [root at master vRealize Operation Manager]#
>>
>>
>>
>> It is possible that the folder was renamed. I have had 3 similar issues
>> since I migrated to GlusterFS 4.x but couldn’t investigate much. I needed to
>> completely wipe GlusterFS and the geo-replication to get rid of this error…
>>
>>
>>
>> Any help is appreciated.
>>
>>
>>
>> Regards,
>>
>>
>>
>> *Christian Kotte*
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>> --
>>
>> Thanks and Regards,
>>
>> Kotresh H R
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users



-- 
Thanks and Regards,
Kotresh H R