[Gluster-users] Faulty status in geo-replication session of a sub-volume
Strahil Nikolov
hunter86_bg at yahoo.com
Sat May 30 14:51:55 UTC 2020
Hello Naranderan,
What OS are you using? Do you have SELinux in enforcing mode (verify via 'sestatus')?
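A quick way to check, in case it helps (a sketch; assumes a RHEL-family host, falling back through the usual SELinux tools if 'sestatus' is not installed):

```shell
# Report the SELinux mode. 'sestatus' (policycoreutils) is the usual tool;
# 'getenforce' and the selinuxfs node are fallbacks on minimal installs.
command -v sestatus >/dev/null 2>&1 && sestatus \
  || getenforce 2>/dev/null \
  || cat /sys/fs/selinux/enforce 2>/dev/null \
  || echo "SELinux tooling not found"
```

If it reports Enforcing, temporarily switching to permissive with `setenforce 0` is a quick way to confirm whether SELinux is the cause (remember to switch back).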
Best Regards,
Strahil Nikolov
On Saturday, May 30, 2020 at 13:33:05 GMT+3, Naranderan Ramakrishnan <rnaranbe at gmail.com> wrote:
Dear Developers/Users,
A geo-rep session of a sub-volume is in 'faulty' status. Please find the setup and log details below.
Setup Details:
> Gluster version - 7.0
> Volume configuration - 2x3 (DxR)
> gsyncd permission (master) - root
> gsyncd permission (slave) - sas (non-root)
> glusterd, glusterfsd permissions (master) - root
> glusterd, glusterfsd permissions (slave) - root
Log details:
In the master gsyncd log, this traceback is printed repeatedly.
> [2020-05-22 12:09:43.838727] I [master(worker /home/sas/gluster/data/code-ide):1991:syncjob] Syncer: Sync Time Taken duration=0.4240 num_files=1 job=1 return_code=0
> [2020-05-22 12:09:43.944392] E [repce(worker /home/sas/gluster/data/code-ide):214:__call__] RepceClient: call failed call=261471:140535761106752:1590149383.8 method=entry_ops error=OSError
> [2020-05-22 12:09:43.944746] E [syncdutils(worker /home/sas/gluster/data/code-ide):338:log_raise_exception] <top>: FAIL:
> Traceback (most recent call last):
> File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 325, in main
> func(args)
> File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 86, in subcmd_worker
> local.service_loop(remote)
> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1305, in service_loop
> g3.crawlwrap(oneshot=True)
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 602, in crawlwrap
> self.crawl()
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1592, in crawl
> self.changelogs_batch_process(changes)
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1492, in changelogs_batch_process
> self.process(batch)
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1327, in process
> self.process_change(change, done, retry)
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1221, in process_change
> failures = self.slave.server.entry_ops(entries)
> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 233, in __call__
> return self.ins(self.meth, *a)
> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 215, in __call__
> raise res
> OSError: [Errno 13] Permission denied: '/home/sas/gluster/data/code-ide/.glusterfs/c2/bf/c2bff066-b10e-468a-a67e-b8b501a8951e'
> [2020-05-22 12:09:43.968710] I [repce(agent /home/sas/gluster/data/code-ide):97:service_loop] RepceServer: terminating on reaching EOF.
> [2020-05-22 12:09:44.912470] I [monitor(monitor):280:monitor] Monitor: worker died in startup phase brick=/home/sas/gluster/data/code-ide
> [2020-05-22 12:09:44.913692] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
On the slave end, these messages are printed repeatedly.
> [2020-05-22 11:23:26.65115] W [gsyncd(slave 10.47.8.153/home/sas/gluster/data/code-ide):307:main] <top>: Session config file not exists, using the default config path=/var/lib/glusterd/geo-replication/code-ide_10.37.11.252_code-ide/gsyncd.conf
> [2020-05-22 11:23:26.77414] I [resource(slave 10.47.8.153/home/sas/gluster/data/code-ide):1105:connect] GLUSTER: Mounting gluster volume locally...
> [2020-05-22 11:23:27.297466] I [resource(slave 10.47.8.153/home/sas/gluster/data/code-ide):1128:connect] GLUSTER: Mounted gluster volume duration=1.2199
> [2020-05-22 11:23:27.298125] I [resource(slave 10.47.8.153/home/sas/gluster/data/code-ide):1155:service_loop] GLUSTER: slave listening
> [2020-05-22 11:23:32.654939] E [repce(slave 10.47.8.153/home/sas/gluster/data/code-ide):122:worker] <top>: call failed:
> Traceback (most recent call last):
> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in worker
> res = getattr(self.obj, rmeth)(*in_data[2:])
> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 706, in entry_ops
> collect_failure(e, cmd_ret, uid, gid)
> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 444, in collect_failure
> disk_gfid)
> File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 687, in get_slv_dir_path
> [ENOENT], [ESTALE])
> File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 546, in errno_wrap
> return call(*arg)
> OSError: [Errno 13] Permission denied: '/home/sas/gluster/data/code-ide/.glusterfs/c2/bf/c2bff066-b10e-468a-a67e-b8b501a8951e'
> [2020-05-22 11:23:32.741317] I [repce(slave 10.47.8.153/home/sas/gluster/data/code-ide):97:service_loop] RepceServer: terminating on reaching EOF.
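The failing call is a stat-like operation on the GFID backend path, performed by the non-root geo-rep user. A sketch of how that access can be checked outside gsyncd on the slave brick host (the user name 'sas' and the path are taken from the logs above; adjust for your setup):

```shell
# Sketch: check whether the GFID backend entry from the traceback is
# accessible, both as the current user and as the geo-rep user 'sas'.
# Path and user name are from the logs above; 'sudo -n' avoids hanging
# on a password prompt if sudo is not configured.
GFID_PATH=/home/sas/gluster/data/code-ide/.glusterfs/c2/bf/c2bff066-b10e-468a-a67e-b8b501a8951e
stat "$GFID_PATH" 2>/dev/null || echo "stat as $(id -un) failed: $GFID_PATH"
if sudo -n -u sas stat "$GFID_PATH" >/dev/null 2>&1; then
  echo "readable by sas"
else
  echo "not readable by sas (or sudo/user unavailable on this host)"
fi
```

If the entry is readable by root but not by sas, the EACCES in the traceback is explained by ownership/permissions on the .glusterfs backend entry rather than by geo-rep itself.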
Additional info:
In parallel with this GFID (/home/sas/gluster/data/code-ide/.glusterfs/c2/bf/c2bff066-b10e-468a-a67e-b8b501a8951e) mentioned in the master gsyncd log, there are some files with ---------T permissions and the trusted.glusterfs.dht.linkto extended attribute in the master sub-volume whose geo-rep session is in 'faulty' status. I am not sure whether this is related to the geo-rep issue or not.
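For reference, such link-to files can be listed with something like the following (a sketch: `list_linkto_files` is our own helper name, and the brick path is the one from this setup; the files carry only the sticky bit, mode ---------T, i.e. octal 1000):

```shell
# Sketch: list DHT link-to files under a brick. Link-to files have exactly
# mode 1000 (only the sticky bit set); the .glusterfs backend tree is skipped.
list_linkto_files() {
  find "$1" -type f -perm 1000 -not -path "*/.glusterfs/*" 2>/dev/null
}
# Brick path from this setup; '|| true' keeps the sketch from failing on a
# machine where the path does not exist.
list_linkto_files /home/sas/gluster/data/code-ide || true
```

`getfattr -n trusted.glusterfs.dht.linkto --absolute-names <file>` (from the attr package) then shows which sub-volume each link-to file points at.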
I have attached a few screenshots and log statements for further info. Please let us know how we should solve this.
Thanks in advance.
Regards,
Naranderan R