[Gluster-users] Gluster7 GeoReplication Operation not permitted and incomplete sync

Matthew Benstead matthewb at uvic.ca
Thu Oct 1 21:08:04 UTC 2020


Hello,

I'm looking for some help with a geo-replication error in my Gluster 7 /
CentOS 7 setup. Replication progress has basically stopped, and the
replication status keeps flipping between Active and Faulty.

The gsyncd log shows errors like "Operation not permitted" and "incomplete
sync", and I'm not sure how to proceed with troubleshooting - any help
would be appreciated.

The log below basically just repeats; it is from
/var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.log:

[2020-10-01 20:52:15.291923] I [master(worker
/data/storage_a/storage):1991:syncjob] Syncer: Sync Time Taken   
duration=32.8466        num_files=1749  job=3   return_code=23
[2020-10-01 20:52:18.700062] I [master(worker
/data/storage_c/storage):1991:syncjob] Syncer: Sync Time Taken   
duration=43.1210        num_files=3167  job=6   return_code=23
[2020-10-01 20:52:23.383234] W [master(worker
/data/storage_c/storage):1393:process] _GMaster: incomplete sync,
retrying changelogs     files=['XSYNC-CHANGELOG.1601585397']
[2020-10-01 20:52:28.537657] E [repce(worker
/data/storage_b/storage):213:__call__] RepceClient: call failed   
call=258187:140538843596608:1601585515.63       method=entry_ops       
error=OSError
[2020-10-01 20:52:28.538064] E [syncdutils(worker
/data/storage_b/storage):339:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 332,
in main
    func(args)
  File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 86,
in subcmd_worker
    local.service_loop(remote)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line
1308, in service_loop
    g1.crawlwrap(oneshot=True, register_time=register_time)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 602,
in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1682,
in crawl
    self.process([item[1]], 0)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1327,
in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1221,
in process_change
    failures = self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 232, in
__call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 214, in
__call__
    raise res
OSError: [Errno 1] Operation not permitted
[2020-10-01 20:52:28.570316] I [repce(agent
/data/storage_b/storage):96:service_loop] RepceServer: terminating on
reaching EOF.
[2020-10-01 20:52:28.613603] I
[gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker
Status Change status=Faulty
[2020-10-01 20:52:29.619797] I [master(worker
/data/storage_c/storage):1991:syncjob] Syncer: Sync Time Taken   
duration=5.6458 num_files=455   job=3   return_code=23
[2020-10-01 20:52:38.286245] I [master(worker
/data/storage_c/storage):1991:syncjob] Syncer: Sync Time Taken   
duration=14.1824        num_files=1333  job=2   return_code=23
[2020-10-01 20:52:38.628156] I
[gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker
Status Change status=Initializing...
[2020-10-01 20:52:38.628325] I [monitor(monitor):159:monitor] Monitor:
starting gsyncd worker   brick=/data/storage_b/storage  
slave_node=10.0.231.82
[2020-10-01 20:52:38.684736] I [gsyncd(agent
/data/storage_b/storage):318:main] <top>: Using session config
file       
path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf
[2020-10-01 20:52:38.687213] I [gsyncd(worker
/data/storage_b/storage):318:main] <top>: Using session config
file      
path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf
[2020-10-01 20:52:38.687401] I [changelogagent(agent
/data/storage_b/storage):72:__init__] ChangelogAgent: Agent listining...
[2020-10-01 20:52:38.703295] I [resource(worker
/data/storage_b/storage):1386:connect_remote] SSH: Initializing SSH
connection between master and slave...
[2020-10-01 20:52:40.388372] I [resource(worker
/data/storage_b/storage):1435:connect_remote] SSH: SSH connection
between master and slave established. duration=1.6849
[2020-10-01 20:52:40.388582] I [resource(worker
/data/storage_b/storage):1105:connect] GLUSTER: Mounting gluster volume
locally...
[2020-10-01 20:52:41.501105] I [resource(worker
/data/storage_b/storage):1128:connect] GLUSTER: Mounted gluster volume 
duration=1.1123
[2020-10-01 20:52:41.501405] I [subcmds(worker
/data/storage_b/storage):84:subcmd_worker] <top>: Worker spawn
successful. Acknowledging back to monitor
[2020-10-01 20:52:43.531146] I [master(worker
/data/storage_b/storage):1640:register] _GMaster: Working dir    
path=/var/lib/misc/gluster/gsyncd/storage_10.0.231.81_pcic-backup/data-storage_b-storage
[2020-10-01 20:52:43.533953] I [resource(worker
/data/storage_b/storage):1291:service_loop] GLUSTER: Register time     
time=1601585563
[2020-10-01 20:52:43.547092] I [gsyncdstatus(worker
/data/storage_b/storage):281:set_active] GeorepStatus: Worker Status
Change status=Active
[2020-10-01 20:52:43.561920] I [gsyncdstatus(worker
/data/storage_b/storage):253:set_worker_crawl_status] GeorepStatus:
Crawl Status Change     status=History Crawl
[2020-10-01 20:52:43.562184] I [master(worker
/data/storage_b/storage):1554:crawl] _GMaster: starting history
crawl     turns=1 stime=None      entry_stime=None        etime=1601585563
[2020-10-01 20:52:43.562269] I [resource(worker
/data/storage_b/storage):1307:service_loop] GLUSTER: No stime available,
using xsync crawl
[2020-10-01 20:52:43.569799] I [master(worker
/data/storage_b/storage):1670:crawl] _GMaster: starting hybrid
crawl      stime=None
[2020-10-01 20:52:43.573528] I [gsyncdstatus(worker
/data/storage_b/storage):253:set_worker_crawl_status] GeorepStatus:
Crawl Status Change     status=Hybrid Crawl
[2020-10-01 20:52:44.370985] I [master(worker
/data/storage_c/storage):1991:syncjob] Syncer: Sync Time Taken   
duration=20.4307        num_files=2609  job=5   return_code=23
[2020-10-01 20:52:49.431854] W [master(worker
/data/storage_c/storage):1393:process] _GMaster: incomplete sync,
retrying changelogs     files=['XSYNC-CHANGELOG.1601585397']
[2020-10-01 20:52:54.801500] I [master(worker
/data/storage_a/storage):1991:syncjob] Syncer: Sync Time Taken   
duration=72.7492        num_files=4227  job=2   return_code=23
[2020-10-01 20:52:56.766547] I [master(worker
/data/storage_a/storage):1991:syncjob] Syncer: Sync Time Taken   
duration=74.3569        num_files=4674  job=5   return_code=23
[2020-10-01 20:53:18.853333] I [master(worker
/data/storage_c/storage):1991:syncjob] Syncer: Sync Time Taken   
duration=28.7125        num_files=4397  job=3   return_code=23
[2020-10-01 20:53:21.224921] W [master(worker
/data/storage_a/storage):1393:process] _GMaster: incomplete sync,
retrying changelogs     files=['CHANGELOG.1601044033',
'CHANGELOG.1601044048', 'CHANGELOG.1601044063', 'CHANGELOG.1601044078',
'CHANGELOG.1601044093', 'CHANGELOG.1601044108', 'CHANGELOG.1601044123']
[2020-10-01 20:53:22.134536] I [master(worker
/data/storage_a/storage):1991:syncjob] Syncer: Sync Time Taken   
duration=0.2159 num_files=3     job=3   return_code=23
[2020-10-01 20:53:25.615712] I [master(worker
/data/storage_b/storage):1681:crawl] _GMaster: processing xsync
changelog
path=/var/lib/misc/gluster/gsyncd/storage_10.0.231.81_pcic-backup/data-storage_b-storage/xsync/XSYNC-CHANGELOG.1601585563
[2020-10-01 20:53:25.634970] W [master(worker
/data/storage_c/storage):1393:process] _GMaster: incomplete sync,
retrying changelogs     files=['XSYNC-CHANGELOG.1601585397']
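
The OSError is raised from entry_ops on the slave side (see the traceback
above), so my assumption is that the matching slave-side worker log on the
backup nodes - under /var/log/glusterfs/geo-replication-slaves/, if I have
the path right for this version - should name the entry that is returning
EPERM. Roughly what I was planning to run on 10.0.231.81 / 10.0.231.82:

# run on the slave nodes; the exact file names under this directory are a guess
grep -iE "operation not permitted|entry_ops" /var/log/glusterfs/geo-replication-slaves/*/*.log | tail -n 20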

Geo-replication status - two consecutive runs below, showing workers change from Active to Faulty:

[root@storage01 ~]# gluster volume geo-replication status
 
MASTER NODE    MASTER VOL    MASTER BRICK               SLAVE USER   
SLAVE                                        SLAVE NODE     STATUS   
CRAWL STATUS       LAST_SYNCED                 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
10.0.231.91    storage       /data/storage_a/storage    geoaccount   
ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active   
Changelog Crawl    2020-09-25 07:26:57         
10.0.231.91    storage       /data/storage_c/storage    geoaccount   
ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.82    Active   
Hybrid Crawl       N/A                         
10.0.231.91    storage       /data/storage_b/storage    geoaccount   
ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.82    Active   
Hybrid Crawl       N/A                         
10.0.231.92    storage       /data/storage_b/storage    geoaccount   
ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.82    Active   
History Crawl      2020-09-23 01:56:05         
10.0.231.92    storage       /data/storage_a/storage    geoaccount   
ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.82    Active   
Hybrid Crawl       N/A                         
10.0.231.92    storage       /data/storage_c/storage    geoaccount   
ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active   
Hybrid Crawl       N/A                         
10.0.231.93    storage       /data/storage_c/storage    geoaccount   
ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active   
Changelog Crawl    2020-09-25 06:55:57         
10.0.231.93    storage       /data/storage_b/storage    geoaccount   
ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active   
Hybrid Crawl       N/A                         
10.0.231.93    storage       /data/storage_a/storage    geoaccount   
ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active   
Hybrid Crawl       N/A   

[root@storage01 ~]# gluster volume geo-replication status
 
MASTER NODE    MASTER VOL    MASTER BRICK               SLAVE USER   
SLAVE                                        SLAVE NODE     STATUS   
CRAWL STATUS       LAST_SYNCED                 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
10.0.231.91    storage       /data/storage_a/storage    geoaccount   
ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active   
Changelog Crawl    2020-09-25 07:26:57         
10.0.231.91    storage       /data/storage_c/storage    geoaccount   
ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.82    Active   
Hybrid Crawl       N/A                         
10.0.231.91    storage       /data/storage_b/storage    geoaccount   
ssh://geoaccount@10.0.231.81::pcic-backup    N/A            Faulty   
N/A                N/A                         
10.0.231.92    storage       /data/storage_b/storage    geoaccount   
ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.82    Active   
History Crawl      2020-09-23 01:58:05         
10.0.231.92    storage       /data/storage_a/storage    geoaccount   
ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.82    Active   
Hybrid Crawl       N/A                         
10.0.231.92    storage       /data/storage_c/storage    geoaccount   
ssh://geoaccount@10.0.231.81::pcic-backup    N/A            Faulty   
N/A                N/A                         
10.0.231.93    storage       /data/storage_c/storage    geoaccount   
ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active   
Changelog Crawl    2020-09-25 06:58:56         
10.0.231.93    storage       /data/storage_b/storage    geoaccount   
ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active   
Hybrid Crawl       N/A                         
10.0.231.93    storage       /data/storage_a/storage    geoaccount   
ssh://geoaccount@10.0.231.81::pcic-backup    N/A            Faulty   
N/A                N/A    
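
I've just been re-running the status command in a loop to catch the workers
cycling through Faulty, Initializing and Active, e.g.:

[root@storage01 ~]# watch -n 30 'gluster volume geo-replication status'   # re-runs the status every 30 seconds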


Cluster information follows. Note that performance.open-behind is disabled
to work around https://github.com/gluster/glusterfs/issues/1440.
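
For reference, that was done with the standard volume set command (it shows
as off in the volume info below):

[root@storage01 ~]# gluster volume set storage performance.open-behind off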

[root@storage01 ~]# gluster --version | head -1; cat /etc/centos-release; uname -r
glusterfs 7.7
CentOS Linux release 7.8.2003 (Core)
3.10.0-1127.10.1.el7.x86_64

[root@storage01 ~]# df -h /storage2/
Filesystem            Size  Used Avail Use% Mounted on
10.0.231.91:/storage  328T  228T  100T  70% /storage2

[root@storage01 ~]# gluster volume info
 
Volume Name: storage
Type: Distributed-Replicate
Volume ID: cf94a8f2-324b-40b3-bf72-c3766100ea99
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: 10.0.231.91:/data/storage_a/storage
Brick2: 10.0.231.92:/data/storage_b/storage
Brick3: 10.0.231.93:/data/storage_c/storage (arbiter)
Brick4: 10.0.231.92:/data/storage_a/storage
Brick5: 10.0.231.93:/data/storage_b/storage
Brick6: 10.0.231.91:/data/storage_c/storage (arbiter)
Brick7: 10.0.231.93:/data/storage_a/storage
Brick8: 10.0.231.91:/data/storage_b/storage
Brick9: 10.0.231.92:/data/storage_c/storage (arbiter)
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
network.ping-timeout: 10
features.inode-quota: on
features.quota: on
nfs.disable: on
features.quota-deem-statfs: on
storage.fips-mode-rchecksum: on
performance.readdir-ahead: on
performance.parallel-readdir: on
cluster.lookup-optimize: on
client.event-threads: 4
server.event-threads: 4
performance.cache-size: 256MB
performance.open-behind: off

Thanks,
 -Matthew

