[Gluster-users] Gluster7 GeoReplication Operation not permitted and incomplete sync

Felix Kölzow felix.koelzow at gmx.de
Mon Oct 5 19:53:23 UTC 2020


Dear Matthew,

this is our configuration:

zfs get all mypool

NAME    PROPERTY  VALUE     SOURCE
mypool  xattr     sa        local
mypool  acltype   posixacl  local
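
If those two properties are not set on the backup pools yet, they can be
enabled with the usual zfs commands (the pool name here is just ours):

zfs set xattr=sa mypool
zfs set acltype=posixacl mypool

One caveat: as far as I know, xattr=sa only applies to newly written
xattrs, so files created before the change may still carry the old
directory-based xattrs.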


Is there anything more to consider?


Regards,

Felix



On 05/10/2020 21:11, Matthew Benstead wrote:
> Thanks Felix - looking through some more of the logs I may have found
> the reason...
>
> From
> /var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/mnt-data-storage_a-storage.log
>
> [2020-10-05 18:13:35.736838] E [fuse-bridge.c:4288:fuse_xattr_cbk]
> 0-glusterfs-fuse: extended attribute not supported by the backend storage
> [2020-10-05 18:18:53.885591] E [fuse-bridge.c:4288:fuse_xattr_cbk]
> 0-glusterfs-fuse: extended attribute not supported by the backend storage
> [2020-10-05 18:22:14.405234] E [fuse-bridge.c:4288:fuse_xattr_cbk]
> 0-glusterfs-fuse: extended attribute not supported by the backend storage
> [2020-10-05 18:25:53.971679] E [fuse-bridge.c:4288:fuse_xattr_cbk]
> 0-glusterfs-fuse: extended attribute not supported by the backend storage
> [2020-10-05 18:31:44.571557] E [fuse-bridge.c:4288:fuse_xattr_cbk]
> 0-glusterfs-fuse: extended attribute not supported by the backend storage
> [2020-10-05 18:36:36.508772] E [fuse-bridge.c:4288:fuse_xattr_cbk]
> 0-glusterfs-fuse: extended attribute not supported by the backend storage
> [2020-10-05 18:40:10.401055] E [fuse-bridge.c:4288:fuse_xattr_cbk]
> 0-glusterfs-fuse: extended attribute not supported by the backend storage
> [2020-10-05 18:42:57.833536] E [fuse-bridge.c:4288:fuse_xattr_cbk]
> 0-glusterfs-fuse: extended attribute not supported by the backend storage
> [2020-10-05 18:45:19.691953] E [fuse-bridge.c:4288:fuse_xattr_cbk]
> 0-glusterfs-fuse: extended attribute not supported by the backend storage
> [2020-10-05 18:48:26.478532] E [fuse-bridge.c:4288:fuse_xattr_cbk]
> 0-glusterfs-fuse: extended attribute not supported by the backend storage
> [2020-10-05 18:52:24.466914] E [fuse-bridge.c:4288:fuse_xattr_cbk]
> 0-glusterfs-fuse: extended attribute not supported by the backend storage
>
>
> The slave nodes are running gluster on top of ZFS, but I had
> configured ACLs - is there something else missing to make this work
> with ZFS?
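>
> As a sanity check, I can try setting and reading back an xattr by hand
> on a brick (the attribute name below is just a throwaway test value):
>
> setfattr -n trusted.glustertest -v check /pcic-backup01-zpool/brick
> getfattr -n trusted.glustertest /pcic-backup01-zpool/brick
> setfattr -x trusted.glustertest /pcic-backup01-zpool/brick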
>
> [root at pcic-backup01 ~]# gluster volume info
>
> Volume Name: pcic-backup
> Type: Distribute
> Volume ID: 7af8a424-f4b6-4405-bba1-0dbafb0fa231
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 2
> Transport-type: tcp
> Bricks:
> Brick1: 10.0.231.81:/pcic-backup01-zpool/brick
> Brick2: 10.0.231.82:/pcic-backup02-zpool/brick
> Options Reconfigured:
> network.ping-timeout: 10
> performance.cache-size: 256MB
> server.event-threads: 4
> client.event-threads: 4
> cluster.lookup-optimize: on
> performance.parallel-readdir: on
> performance.readdir-ahead: on
> features.quota-deem-statfs: on
> features.inode-quota: on
> features.quota: on
> transport.address-family: inet
> nfs.disable: on
> features.read-only: off
> performance.open-behind: off
>
>
> [root at pcic-backup01 ~]# zfs get acltype pcic-backup01-zpool
> NAME                 PROPERTY  VALUE     SOURCE
> pcic-backup01-zpool  acltype   posixacl  local
>
> [root at pcic-backup01 ~]# grep "pcic-backup0" /proc/mounts
> pcic-backup01-zpool /pcic-backup01-zpool zfs rw,seclabel,xattr,posixacl 0 0
>
>
> [root at pcic-backup02 ~]# zfs get acltype pcic-backup02-zpool
> NAME                 PROPERTY  VALUE     SOURCE
> pcic-backup02-zpool  acltype   posixacl  local
>
> [root at pcic-backup02 ~]# grep "pcic-backup0" /proc/mounts
> pcic-backup02-zpool /pcic-backup02-zpool zfs rw,seclabel,xattr,posixacl 0 0
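>
> I don't think the mount options distinguish directory-based (xattr=on)
> from SA-based (xattr=sa) xattr storage, so it's probably worth checking
> that property directly as well:
>
> [root at pcic-backup01 ~]# zfs get xattr pcic-backup01-zpool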
>
> Thanks,
>  -Matthew
>
>
> --
> Matthew Benstead
> System Administrator
> Pacific Climate Impacts Consortium <https://pacificclimate.org/>
> University of Victoria, UH1
> PO Box 1800, STN CSC
> Victoria, BC, V8W 2Y2
> Phone: +1-250-721-8432
> Email: matthewb at uvic.ca
>
> On 10/5/20 1:39 AM, Felix Kölzow wrote:
>> Dear Matthew,
>>
>>
>> can you provide more information regarding the geo-replication brick
>> logs?
>>
>> These files are also located in:
>>
>> /var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/
>>
>>
>> Usually, these log files are more helpful for figuring out the root
>> cause of the error.
>>
>> Additionally, it is also worth looking at the log files on the slave
>> side.
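>>
>> A quick way to pull just the errors out of all worker logs at once is
>> something like (paths from your setup):
>>
>> grep " E " /var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/*.log
>>
>> On the slave nodes, the matching logs should be under
>> /var/log/glusterfs/geo-replication-slaves/.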
>>
>>
>> Regards,
>>
>> Felix
>>
>>
>> On 01/10/2020 23:08, Matthew Benstead wrote:
>>> Hello,
>>>
>>> I'm looking for some help with a GeoReplication Error in my Gluster
>>> 7/CentOS 7 setup. Replication progress has basically stopped, and the
>>> status of the replication keeps switching.
>>>
>>> The gsyncd log has errors like "Operation not permitted" and "incomplete
>>> sync" - I'm not sure how to proceed in troubleshooting this. Any help
>>> would be appreciated.
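>>>
>>> To catch the status flapping as it happens I've just been re-running
>>> the status command, e.g.:
>>>
>>> watch -n 30 'gluster volume geo-replication storage geoaccount@10.0.231.81::pcic-backup status'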
>>>
>>> The log is here, it basically just repeats - from:
>>> /var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.log
>>>
>>>
>>> [2020-10-01 20:52:15.291923] I [master(worker
>>> /data/storage_a/storage):1991:syncjob] Syncer: Sync Time Taken
>>> duration=32.8466        num_files=1749  job=3   return_code=23
>>> [2020-10-01 20:52:18.700062] I [master(worker
>>> /data/storage_c/storage):1991:syncjob] Syncer: Sync Time Taken
>>> duration=43.1210        num_files=3167  job=6   return_code=23
>>> [2020-10-01 20:52:23.383234] W [master(worker
>>> /data/storage_c/storage):1393:process] _GMaster: incomplete sync,
>>> retrying changelogs     files=['XSYNC-CHANGELOG.1601585397']
>>> [2020-10-01 20:52:28.537657] E [repce(worker
>>> /data/storage_b/storage):213:__call__] RepceClient: call failed
>>> call=258187:140538843596608:1601585515.63 method=entry_ops
>>> error=OSError
>>> [2020-10-01 20:52:28.538064] E [syncdutils(worker
>>> /data/storage_b/storage):339:log_raise_exception] <top>: FAIL:
>>> Traceback (most recent call last):
>>>    File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 332,
>>> in main
>>>      func(args)
>>>    File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 86,
>>> in subcmd_worker
>>>      local.service_loop(remote)
>>>    File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line
>>> 1308, in service_loop
>>>      g1.crawlwrap(oneshot=True, register_time=register_time)
>>>    File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 602,
>>> in crawlwrap
>>>      self.crawl()
>>>    File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line
>>> 1682,
>>> in crawl
>>>      self.process([item[1]], 0)
>>>    File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line
>>> 1327,
>>> in process
>>>      self.process_change(change, done, retry)
>>>    File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line
>>> 1221,
>>> in process_change
>>>      failures = self.slave.server.entry_ops(entries)
>>>    File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line
>>> 232, in
>>> __call__
>>>      return self.ins(self.meth, *a)
>>>    File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line
>>> 214, in
>>> __call__
>>>      raise res
>>> OSError: [Errno 1] Operation not permitted
>>> [2020-10-01 20:52:28.570316] I [repce(agent
>>> /data/storage_b/storage):96:service_loop] RepceServer: terminating on
>>> reaching EOF.
>>> [2020-10-01 20:52:28.613603] I
>>> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker
>>> Status Change status=Faulty
>>> [2020-10-01 20:52:29.619797] I [master(worker
>>> /data/storage_c/storage):1991:syncjob] Syncer: Sync Time Taken
>>> duration=5.6458 num_files=455   job=3   return_code=23
>>> [2020-10-01 20:52:38.286245] I [master(worker
>>> /data/storage_c/storage):1991:syncjob] Syncer: Sync Time Taken
>>> duration=14.1824        num_files=1333  job=2   return_code=23
>>> [2020-10-01 20:52:38.628156] I
>>> [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker
>>> Status Change status=Initializing...
>>> [2020-10-01 20:52:38.628325] I [monitor(monitor):159:monitor] Monitor:
>>> starting gsyncd worker   brick=/data/storage_b/storage
>>> slave_node=10.0.231.82
>>> [2020-10-01 20:52:38.684736] I [gsyncd(agent
>>> /data/storage_b/storage):318:main] <top>: Using session config
>>> file
>>> path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf
>>>
>>> [2020-10-01 20:52:38.687213] I [gsyncd(worker
>>> /data/storage_b/storage):318:main] <top>: Using session config
>>> file
>>> path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf
>>>
>>> [2020-10-01 20:52:38.687401] I [changelogagent(agent
>>> /data/storage_b/storage):72:__init__] ChangelogAgent: Agent
>>> listining...
>>> [2020-10-01 20:52:38.703295] I [resource(worker
>>> /data/storage_b/storage):1386:connect_remote] SSH: Initializing SSH
>>> connection between master and slave...
>>> [2020-10-01 20:52:40.388372] I [resource(worker
>>> /data/storage_b/storage):1435:connect_remote] SSH: SSH connection
>>> between master and slave established. duration=1.6849
>>> [2020-10-01 20:52:40.388582] I [resource(worker
>>> /data/storage_b/storage):1105:connect] GLUSTER: Mounting gluster volume
>>> locally...
>>> [2020-10-01 20:52:41.501105] I [resource(worker
>>> /data/storage_b/storage):1128:connect] GLUSTER: Mounted gluster volume
>>> duration=1.1123
>>> [2020-10-01 20:52:41.501405] I [subcmds(worker
>>> /data/storage_b/storage):84:subcmd_worker] <top>: Worker spawn
>>> successful. Acknowledging back to monitor
>>> [2020-10-01 20:52:43.531146] I [master(worker
>>> /data/storage_b/storage):1640:register] _GMaster: Working dir
>>> path=/var/lib/misc/gluster/gsyncd/storage_10.0.231.81_pcic-backup/data-storage_b-storage
>>>
>>> [2020-10-01 20:52:43.533953] I [resource(worker
>>> /data/storage_b/storage):1291:service_loop] GLUSTER: Register time
>>> time=1601585563
>>> [2020-10-01 20:52:43.547092] I [gsyncdstatus(worker
>>> /data/storage_b/storage):281:set_active] GeorepStatus: Worker Status
>>> Change status=Active
>>> [2020-10-01 20:52:43.561920] I [gsyncdstatus(worker
>>> /data/storage_b/storage):253:set_worker_crawl_status] GeorepStatus:
>>> Crawl Status Change     status=History Crawl
>>> [2020-10-01 20:52:43.562184] I [master(worker
>>> /data/storage_b/storage):1554:crawl] _GMaster: starting history
>>> crawl     turns=1 stime=None      entry_stime=None etime=1601585563
>>> [2020-10-01 20:52:43.562269] I [resource(worker
>>> /data/storage_b/storage):1307:service_loop] GLUSTER: No stime
>>> available,
>>> using xsync crawl
>>> [2020-10-01 20:52:43.569799] I [master(worker
>>> /data/storage_b/storage):1670:crawl] _GMaster: starting hybrid
>>> crawl      stime=None
>>> [2020-10-01 20:52:43.573528] I [gsyncdstatus(worker
>>> /data/storage_b/storage):253:set_worker_crawl_status] GeorepStatus:
>>> Crawl Status Change     status=Hybrid Crawl
>>> [2020-10-01 20:52:44.370985] I [master(worker
>>> /data/storage_c/storage):1991:syncjob] Syncer: Sync Time Taken
>>> duration=20.4307        num_files=2609  job=5   return_code=23
>>> [2020-10-01 20:52:49.431854] W [master(worker
>>> /data/storage_c/storage):1393:process] _GMaster: incomplete sync,
>>> retrying changelogs     files=['XSYNC-CHANGELOG.1601585397']
>>> [2020-10-01 20:52:54.801500] I [master(worker
>>> /data/storage_a/storage):1991:syncjob] Syncer: Sync Time Taken
>>> duration=72.7492        num_files=4227  job=2   return_code=23
>>> [2020-10-01 20:52:56.766547] I [master(worker
>>> /data/storage_a/storage):1991:syncjob] Syncer: Sync Time Taken
>>> duration=74.3569        num_files=4674  job=5   return_code=23
>>> [2020-10-01 20:53:18.853333] I [master(worker
>>> /data/storage_c/storage):1991:syncjob] Syncer: Sync Time Taken
>>> duration=28.7125        num_files=4397  job=3   return_code=23
>>> [2020-10-01 20:53:21.224921] W [master(worker
>>> /data/storage_a/storage):1393:process] _GMaster: incomplete sync,
>>> retrying changelogs     files=['CHANGELOG.1601044033',
>>> 'CHANGELOG.1601044048', 'CHANGELOG.1601044063', 'CHANGELOG.1601044078',
>>> 'CHANGELOG.1601044093', 'CHANGELOG.1601044108', 'CHANGELOG.1601044123']
>>> [2020-10-01 20:53:22.134536] I [master(worker
>>> /data/storage_a/storage):1991:syncjob] Syncer: Sync Time Taken
>>> duration=0.2159 num_files=3     job=3   return_code=23
>>> [2020-10-01 20:53:25.615712] I [master(worker
>>> /data/storage_b/storage):1681:crawl] _GMaster: processing xsync
>>> changelog
>>> path=/var/lib/misc/gluster/gsyncd/storage_10.0.231.81_pcic-backup/data-storage_b-storage/xsync/XSYNC-CHANGELOG.1601585563
>>>
>>> [2020-10-01 20:53:25.634970] W [master(worker
>>> /data/storage_c/storage):1393:process] _GMaster: incomplete sync,
>>> retrying changelogs     files=['XSYNC-CHANGELOG.1601585397']
>>>
>>> GeoReplication status - see it change from Active to Faulty:
>>>
>>> [root at storage01 ~]# gluster volume geo-replication status
>>>
>>> MASTER NODE    MASTER VOL    MASTER BRICK               SLAVE USER    SLAVE                                        SLAVE NODE     STATUS    CRAWL STATUS       LAST_SYNCED
>>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> 10.0.231.91    storage       /data/storage_a/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active    Changelog Crawl    2020-09-25 07:26:57
>>> 10.0.231.91    storage       /data/storage_c/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.82    Active    Hybrid Crawl       N/A
>>> 10.0.231.91    storage       /data/storage_b/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.82    Active    Hybrid Crawl       N/A
>>> 10.0.231.92    storage       /data/storage_b/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.82    Active    History Crawl      2020-09-23 01:56:05
>>> 10.0.231.92    storage       /data/storage_a/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.82    Active    Hybrid Crawl       N/A
>>> 10.0.231.92    storage       /data/storage_c/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active    Hybrid Crawl       N/A
>>> 10.0.231.93    storage       /data/storage_c/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active    Changelog Crawl    2020-09-25 06:55:57
>>> 10.0.231.93    storage       /data/storage_b/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active    Hybrid Crawl       N/A
>>> 10.0.231.93    storage       /data/storage_a/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active    Hybrid Crawl       N/A
>>>
>>> [root at storage01 ~]# gluster volume geo-replication status
>>>
>>> MASTER NODE    MASTER VOL    MASTER BRICK               SLAVE USER    SLAVE                                        SLAVE NODE     STATUS    CRAWL STATUS       LAST_SYNCED
>>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> 10.0.231.91    storage       /data/storage_a/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active    Changelog Crawl    2020-09-25 07:26:57
>>> 10.0.231.91    storage       /data/storage_c/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.82    Active    Hybrid Crawl       N/A
>>> 10.0.231.91    storage       /data/storage_b/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    N/A            Faulty    N/A                N/A
>>> 10.0.231.92    storage       /data/storage_b/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.82    Active    History Crawl      2020-09-23 01:58:05
>>> 10.0.231.92    storage       /data/storage_a/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.82    Active    Hybrid Crawl       N/A
>>> 10.0.231.92    storage       /data/storage_c/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    N/A            Faulty    N/A                N/A
>>> 10.0.231.93    storage       /data/storage_c/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active    Changelog Crawl    2020-09-25 06:58:56
>>> 10.0.231.93    storage       /data/storage_b/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    10.0.231.81    Active    Hybrid Crawl       N/A
>>> 10.0.231.93    storage       /data/storage_a/storage    geoaccount    ssh://geoaccount@10.0.231.81::pcic-backup    N/A            Faulty    N/A                N/A
>>>
>>>
>>> Cluster information: (Note - disabled performance.open-behind to work
>>> around https://github.com/gluster/glusterfs/issues/1440)
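>>>
>>> For reference, that option was flipped with the standard volume-set
>>> command:
>>>
>>> gluster volume set storage performance.open-behind off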
>>>
>>> [root at storage01 ~]# gluster --version | head -1; cat /etc/centos-release; uname -r
>>> glusterfs 7.7
>>> CentOS Linux release 7.8.2003 (Core)
>>> 3.10.0-1127.10.1.el7.x86_64
>>>
>>> [root at storage01 ~]# df -h /storage2/
>>> Filesystem            Size  Used Avail Use% Mounted on
>>> 10.0.231.91:/storage  328T  228T  100T  70% /storage2
>>>
>>> [root at storage01 ~]# gluster volume info
>>>
>>> Volume Name: storage
>>> Type: Distributed-Replicate
>>> Volume ID: cf94a8f2-324b-40b3-bf72-c3766100ea99
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 3 x (2 + 1) = 9
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: 10.0.231.91:/data/storage_a/storage
>>> Brick2: 10.0.231.92:/data/storage_b/storage
>>> Brick3: 10.0.231.93:/data/storage_c/storage (arbiter)
>>> Brick4: 10.0.231.92:/data/storage_a/storage
>>> Brick5: 10.0.231.93:/data/storage_b/storage
>>> Brick6: 10.0.231.91:/data/storage_c/storage (arbiter)
>>> Brick7: 10.0.231.93:/data/storage_a/storage
>>> Brick8: 10.0.231.91:/data/storage_b/storage
>>> Brick9: 10.0.231.92:/data/storage_c/storage (arbiter)
>>> Options Reconfigured:
>>> changelog.changelog: on
>>> geo-replication.ignore-pid-check: on
>>> geo-replication.indexing: on
>>> network.ping-timeout: 10
>>> features.inode-quota: on
>>> features.quota: on
>>> nfs.disable: on
>>> features.quota-deem-statfs: on
>>> storage.fips-mode-rchecksum: on
>>> performance.readdir-ahead: on
>>> performance.parallel-readdir: on
>>> cluster.lookup-optimize: on
>>> client.event-threads: 4
>>> server.event-threads: 4
>>> performance.cache-size: 256MB
>>> performance.open-behind: off
>>>
>>> Thanks,
>>>   -Matthew