[Gluster-users] Gluster7 GeoReplication Operation not permitted and incomplete sync
Felix Kölzow
felix.koelzow at gmx.de
Mon Oct 5 08:39:48 UTC 2020
Dear Matthew,
can you provide more information from the geo-replication brick logs?
These files are also located in:
/var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/
Usually, these log files are more precise and make it easier to figure
out the root cause of the error.
Additionally, it is also worth looking at the log files on the slave side.
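For example, something along these lines (just a sketch - adjust the
session directory to your setup; the slave-side logs usually live under
/var/log/glusterfs/geo-replication-slaves/):

    # on each master node: scan the per-brick geo-replication logs
    grep -E "OSError|Operation not permitted|incomplete sync" \
        /var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/*.log

    # on the slave nodes: scan the slave-side session logs
    # (directory is the usual default; adjust if yours differs)
    grep -E "OSError|Operation not permitted" \
        /var/log/glusterfs/geo-replication-slaves/*/*.log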
Regards,
Felix
On 01/10/2020 23:08, Matthew Benstead wrote:
> Hello,
>
> I'm looking for some help with a GeoReplication error in my Gluster
> 7/CentOS 7 setup. Replication progress has basically stopped, and the
> replication status keeps switching between Active and Faulty.
>
> The gsyncd log has errors like "Operation not permitted", "incomplete
> sync", etc. - help? I'm not sure how to proceed in troubleshooting this.
>
> The log is below; it basically just repeats. It is from:
> /var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.log
>
> [2020-10-01 20:52:15.291923] I [master(worker
> /data/storage_a/storage):1991:syncjob] Syncer: Sync Time Taken
> duration=32.8466 num_files=1749 job=3 return_code=23
> [2020-10-01 20:52:18.700062] I [master(worker
> /data/storage_c/storage):1991:syncjob] Syncer: Sync Time Taken
> duration=43.1210 num_files=3167 job=6 return_code=23
> [2020-10-01 20:52:23.383234] W [master(worker
> /data/storage_c/storage):1393:process] _GMaster: incomplete sync,
> retrying changelogs files=['XSYNC-CHANGELOG.1601585397']
> [2020-10-01 20:52:28.537657] E [repce(worker
> /data/storage_b/storage):213:__call__] RepceClient: call failed
> call=258187:140538843596608:1601585515.63 method=entry_ops
> error=OSError
> [2020-10-01 20:52:28.538064] E [syncdutils(worker
> /data/storage_b/storage):339:log_raise_exception] <top>: FAIL:
> Traceback (most recent call last):
> File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 332,
> in main
> func(args)
> File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 86,
> in subcmd_worker
> local.service_loop(remote)
> File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line
> 1308, in service_loop
> g1.crawlwrap(oneshot=True, register_time=register_time)
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 602,
> in crawlwrap
> self.crawl()
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1682,
> in crawl
> self.process([item[1]], 0)
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1327,
> in process
> self.process_change(change, done, retry)
> File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1221,
> in process_change
> failures = self.slave.server.entry_ops(entries)
> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 232, in
> __call__
> return self.ins(self.meth, *a)
> File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 214, in
> __call__
> raise res
> OSError: [Errno 1] Operation not permitted
> [2020-10-01 20:52:28.570316] I [repce(agent
> /data/storage_b/storage):96:service_loop] RepceServer: terminating on
> reaching EOF.
> [2020-10-01 20:52:28.613603] I
> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker
> Status Change status=Faulty
> [2020-10-01 20:52:29.619797] I [master(worker
> /data/storage_c/storage):1991:syncjob] Syncer: Sync Time Taken
> duration=5.6458 num_files=455 job=3 return_code=23
> [2020-10-01 20:52:38.286245] I [master(worker
> /data/storage_c/storage):1991:syncjob] Syncer: Sync Time Taken
> duration=14.1824 num_files=1333 job=2 return_code=23
> [2020-10-01 20:52:38.628156] I
> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker
> Status Change status=Initializing...
> [2020-10-01 20:52:38.628325] I [monitor(monitor):159:monitor] Monitor:
> starting gsyncd worker brick=/data/storage_b/storage
> slave_node=10.0.231.82
> [2020-10-01 20:52:38.684736] I [gsyncd(agent
> /data/storage_b/storage):318:main] <top>: Using session config
> file
> path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf
> [2020-10-01 20:52:38.687213] I [gsyncd(worker
> /data/storage_b/storage):318:main] <top>: Using session config
> file
> path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf
> [2020-10-01 20:52:38.687401] I [changelogagent(agent
> /data/storage_b/storage):72:__init__] ChangelogAgent: Agent listining...
> [2020-10-01 20:52:38.703295] I [resource(worker
> /data/storage_b/storage):1386:connect_remote] SSH: Initializing SSH
> connection between master and slave...
> [2020-10-01 20:52:40.388372] I [resource(worker
> /data/storage_b/storage):1435:connect_remote] SSH: SSH connection
> between master and slave established. duration=1.6849
> [2020-10-01 20:52:40.388582] I [resource(worker
> /data/storage_b/storage):1105:connect] GLUSTER: Mounting gluster volume
> locally...
> [2020-10-01 20:52:41.501105] I [resource(worker
> /data/storage_b/storage):1128:connect] GLUSTER: Mounted gluster volume
> duration=1.1123
> [2020-10-01 20:52:41.501405] I [subcmds(worker
> /data/storage_b/storage):84:subcmd_worker] <top>: Worker spawn
> successful. Acknowledging back to monitor
> [2020-10-01 20:52:43.531146] I [master(worker
> /data/storage_b/storage):1640:register] _GMaster: Working dir
> path=/var/lib/misc/gluster/gsyncd/storage_10.0.231.81_pcic-backup/data-storage_b-storage
> [2020-10-01 20:52:43.533953] I [resource(worker
> /data/storage_b/storage):1291:service_loop] GLUSTER: Register time
> time=1601585563
> [2020-10-01 20:52:43.547092] I [gsyncdstatus(worker
> /data/storage_b/storage):281:set_active] GeorepStatus: Worker Status
> Change status=Active
> [2020-10-01 20:52:43.561920] I [gsyncdstatus(worker
> /data/storage_b/storage):253:set_worker_crawl_status] GeorepStatus:
> Crawl Status Change status=History Crawl
> [2020-10-01 20:52:43.562184] I [master(worker
> /data/storage_b/storage):1554:crawl] _GMaster: starting history
> crawl turns=1 stime=None entry_stime=None etime=1601585563
> [2020-10-01 20:52:43.562269] I [resource(worker
> /data/storage_b/storage):1307:service_loop] GLUSTER: No stime available,
> using xsync crawl
> [2020-10-01 20:52:43.569799] I [master(worker
> /data/storage_b/storage):1670:crawl] _GMaster: starting hybrid
> crawl stime=None
> [2020-10-01 20:52:43.573528] I [gsyncdstatus(worker
> /data/storage_b/storage):253:set_worker_crawl_status] GeorepStatus:
> Crawl Status Change status=Hybrid Crawl
> [2020-10-01 20:52:44.370985] I [master(worker
> /data/storage_c/storage):1991:syncjob] Syncer: Sync Time Taken
> duration=20.4307 num_files=2609 job=5 return_code=23
> [2020-10-01 20:52:49.431854] W [master(worker
> /data/storage_c/storage):1393:process] _GMaster: incomplete sync,
> retrying changelogs files=['XSYNC-CHANGELOG.1601585397']
> [2020-10-01 20:52:54.801500] I [master(worker
> /data/storage_a/storage):1991:syncjob] Syncer: Sync Time Taken
> duration=72.7492 num_files=4227 job=2 return_code=23
> [2020-10-01 20:52:56.766547] I [master(worker
> /data/storage_a/storage):1991:syncjob] Syncer: Sync Time Taken
> duration=74.3569 num_files=4674 job=5 return_code=23
> [2020-10-01 20:53:18.853333] I [master(worker
> /data/storage_c/storage):1991:syncjob] Syncer: Sync Time Taken
> duration=28.7125 num_files=4397 job=3 return_code=23
> [2020-10-01 20:53:21.224921] W [master(worker
> /data/storage_a/storage):1393:process] _GMaster: incomplete sync,
> retrying changelogs files=['CHANGELOG.1601044033',
> 'CHANGELOG.1601044048', 'CHANGELOG.1601044063', 'CHANGELOG.1601044078',
> 'CHANGELOG.1601044093', 'CHANGELOG.1601044108', 'CHANGELOG.1601044123']
> [2020-10-01 20:53:22.134536] I [master(worker
> /data/storage_a/storage):1991:syncjob] Syncer: Sync Time Taken
> duration=0.2159 num_files=3 job=3 return_code=23
> [2020-10-01 20:53:25.615712] I [master(worker
> /data/storage_b/storage):1681:crawl] _GMaster: processing xsync
> changelog
> path=/var/lib/misc/gluster/gsyncd/storage_10.0.231.81_pcic-backup/data-storage_b-storage/xsync/XSYNC-CHANGELOG.1601585563
> [2020-10-01 20:53:25.634970] W [master(worker
> /data/storage_c/storage):1393:process] _GMaster: incomplete sync,
> retrying changelogs files=['XSYNC-CHANGELOG.1601585397']
>
> GeoReplication status - see it change from Active to Faulty:
>
> [root@storage01 ~]# gluster volume geo-replication status
>
> MASTER NODE    MASTER VOL    MASTER BRICK               SLAVE USER    SLAVE                                        SLAVE NODE     STATUS    CRAWL STATUS       LAST_SYNCED
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
> 10.0.231.91    storage       /data/storage_a/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active    Changelog Crawl    2020-09-25 07:26:57
> 10.0.231.91    storage       /data/storage_c/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.82    Active    Hybrid Crawl       N/A
> 10.0.231.91    storage       /data/storage_b/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.82    Active    Hybrid Crawl       N/A
> 10.0.231.92    storage       /data/storage_b/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.82    Active    History Crawl      2020-09-23 01:56:05
> 10.0.231.92    storage       /data/storage_a/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.82    Active    Hybrid Crawl       N/A
> 10.0.231.92    storage       /data/storage_c/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active    Hybrid Crawl       N/A
> 10.0.231.93    storage       /data/storage_c/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active    Changelog Crawl    2020-09-25 06:55:57
> 10.0.231.93    storage       /data/storage_b/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active    Hybrid Crawl       N/A
> 10.0.231.93    storage       /data/storage_a/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active    Hybrid Crawl       N/A
>
> [root@storage01 ~]# gluster volume geo-replication status
>
> MASTER NODE    MASTER VOL    MASTER BRICK               SLAVE USER    SLAVE                                        SLAVE NODE     STATUS    CRAWL STATUS       LAST_SYNCED
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
> 10.0.231.91    storage       /data/storage_a/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active    Changelog Crawl    2020-09-25 07:26:57
> 10.0.231.91    storage       /data/storage_c/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.82    Active    Hybrid Crawl       N/A
> 10.0.231.91    storage       /data/storage_b/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    N/A            Faulty    N/A                N/A
> 10.0.231.92    storage       /data/storage_b/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.82    Active    History Crawl      2020-09-23 01:58:05
> 10.0.231.92    storage       /data/storage_a/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.82    Active    Hybrid Crawl       N/A
> 10.0.231.92    storage       /data/storage_c/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    N/A            Faulty    N/A                N/A
> 10.0.231.93    storage       /data/storage_c/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active    Changelog Crawl    2020-09-25 06:58:56
> 10.0.231.93    storage       /data/storage_b/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active    Hybrid Crawl       N/A
> 10.0.231.93    storage       /data/storage_a/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    N/A            Faulty    N/A                N/A
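>
> The flapping shows up by simply re-running the status command; a
> session-scoped loop such as the following (interval chosen arbitrarily)
> makes it easy to watch:
>
> watch -n 30 'gluster volume geo-replication storage geoaccount@10.0.231.81::pcic-backup status'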
>
>
> Cluster information: (Note - disabled performance.open-behind to work
> around https://github.com/gluster/glusterfs/issues/1440 )
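>
> For reference, that workaround was the standard volume-set command with
> this volume's name:
>
> gluster volume set storage performance.open-behind off
>
> which matches the "performance.open-behind: off" option shown below.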
>
> [root@storage01 ~]# gluster --version | head -1; cat /etc/centos-release; uname -r
> glusterfs 7.7
> CentOS Linux release 7.8.2003 (Core)
> 3.10.0-1127.10.1.el7.x86_64
>
> [root@storage01 ~]# df -h /storage2/
> Filesystem Size Used Avail Use% Mounted on
> 10.0.231.91:/storage 328T 228T 100T 70% /storage2
>
> [root@storage01 ~]# gluster volume info
>
> Volume Name: storage
> Type: Distributed-Replicate
> Volume ID: cf94a8f2-324b-40b3-bf72-c3766100ea99
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 3 x (2 + 1) = 9
> Transport-type: tcp
> Bricks:
> Brick1: 10.0.231.91:/data/storage_a/storage
> Brick2: 10.0.231.92:/data/storage_b/storage
> Brick3: 10.0.231.93:/data/storage_c/storage (arbiter)
> Brick4: 10.0.231.92:/data/storage_a/storage
> Brick5: 10.0.231.93:/data/storage_b/storage
> Brick6: 10.0.231.91:/data/storage_c/storage (arbiter)
> Brick7: 10.0.231.93:/data/storage_a/storage
> Brick8: 10.0.231.91:/data/storage_b/storage
> Brick9: 10.0.231.92:/data/storage_c/storage (arbiter)
> Options Reconfigured:
> changelog.changelog: on
> geo-replication.ignore-pid-check: on
> geo-replication.indexing: on
> network.ping-timeout: 10
> features.inode-quota: on
> features.quota: on
> nfs.disable: on
> features.quota-deem-statfs: on
> storage.fips-mode-rchecksum: on
> performance.readdir-ahead: on
> performance.parallel-readdir: on
> cluster.lookup-optimize: on
> client.event-threads: 4
> server.event-threads: 4
> performance.cache-size: 256MB
> performance.open-behind: off
>
> Thanks,
> -Matthew