[Gluster-users] Geo-replication failed to delete from slave file partially written to master volume.
Viktor Nosov
vnosov at stonefly.com
Tue Dec 6 01:43:22 UTC 2016
Hi,
I hit a problem while testing geo-replication. Does anybody know how to fix it
other than by deleting and recreating the geo-replication session?
Geo-replication failed to delete from the slave a file that had been only
partially written to the master volume.
I have geo-replication running between two nodes with glusterfs 3.7.16,
with this master volume:
[root@SC-182 log]# gluster volume info master-for-183-0003
Volume Name: master-for-183-0003
Type: Distribute
Volume ID: 84501a83-b07c-4768-bfaa-418b038e1a9e
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.10.60.182:/exports/nas-segment-0012/master-for-183-0003
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
server.allow-insecure: on
performance.quick-read: off
performance.stat-prefetch: off
nfs.disable: on
nfs.addr-namelookup: off
performance.readdir-ahead: on
cluster.enable-shared-storage: enable
snap-activate-on-create: enable
and slave volume:
[root@SC-183 log]# gluster volume info rem-volume-0001
Volume Name: rem-volume-0001
Type: Distribute
Volume ID: 7680de7a-d0e2-42f2-96a9-4da29adba73c
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.10.60.183:/exports/nas183-segment-0001/rem-volume-0001
Options Reconfigured:
performance.readdir-ahead: on
nfs.addr-namelookup: off
nfs.disable: on
performance.stat-prefetch: off
performance.quick-read: off
server.allow-insecure: on
snap-activate-on-create: enable
The master volume is mounted on the node:
[root@SC-182 log]# mount
127.0.0.1:/master-for-183-0003 on /samba/master-for-183-0003 type fuse.glusterfs (rw,allow_other,max_read=131072)
Let's fill up the space on the master volume:
[root@SC-182 log]# mkdir /samba/master-for-183-0003/cifs_share/dir3
[root@SC-182 log]# cp big.file /samba/master-for-183-0003/cifs_share/dir3/
[root@SC-182 log]# cp big.file /samba/master-for-183-0003/cifs_share/dir3/big.file.1
cp: writing `/samba/master-for-183-0003/cifs_share/dir3/big.file.1': No space left on device
cp: closing `/samba/master-for-183-0003/cifs_share/dir3/big.file.1': No space left on device
The file "big.file.1" contains only part of the original file:
[root@SC-182 log]# ls -l /samba/master-for-183-0003/cifs_share/dir3/*
-rwx------ 1 root root 78930370 Dec 5 16:49 /samba/master-for-183-0003/cifs_share/dir3/big.file
-rwx------ 1 root root 22155264 Dec 5 16:49 /samba/master-for-183-0003/cifs_share/dir3/big.file.1
Both new files are geo-replicated to the slave volume successfully:
[root@SC-183 log]# ls -l /exports/nas183-segment-0001/rem-volume-0001/cifs_share/dir3/
total 98720
-rwx------ 2 root root 78930370 Dec 5 16:49 big.file
-rwx------ 2 root root 22155264 Dec 5 16:49 big.file.1
[root@SC-182 log]# /usr/sbin/gluster volume geo-replication master-for-183-0003 nasgorep@10.10.60.183::rem-volume-0001 status detail
MASTER NODE:                 10.10.60.182
MASTER VOL:                  master-for-183-0003
MASTER BRICK:                /exports/nas-segment-0012/master-for-183-0003
SLAVE USER:                  nasgorep
SLAVE:                       nasgorep@10.10.60.183::rem-volume-0001
SLAVE NODE:                  10.10.60.183
STATUS:                      Active
CRAWL STATUS:                Changelog Crawl
LAST_SYNCED:                 2016-12-05 16:49:48
ENTRY:                       0
DATA:                        0
META:                        0
FAILURES:                    0
CHECKPOINT TIME:             N/A
CHECKPOINT COMPLETED:        N/A
CHECKPOINT COMPLETION TIME:  N/A
Let's delete the partially written file from the master mount:
[root@SC-182 log]# rm /samba/master-for-183-0003/cifs_share/dir3/big.file.1
rm: remove regular file `/samba/master-for-183-0003/cifs_share/dir3/big.file.1'? y
[root@SC-182 log]# ls -l /samba/master-for-183-0003/cifs_share/dir3/*
-rwx------ 1 root root 78930370 Dec 5 16:49 /samba/master-for-183-0003/cifs_share/dir3/big.file
Set a checkpoint:
32643 12/05/2016 16:57:46.540390536 1480985866 command: /usr/sbin/gluster volume geo-replication master-for-183-0003 nasgorep@10.10.60.183::rem-volume-0001 config checkpoint now 2>&1
32643 12/05/2016 16:57:48.770820909 1480985868 status=0 /usr/sbin/gluster volume geo-replication master-for-183-0003 nasgorep@10.10.60.183::rem-volume-0001 config checkpoint now 2>&1
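The trace lines above carry both a local wall-clock time and an epoch value (1480985866), while the gsyncd log in this report appears to use UTC timestamps and names changelogs by epoch (CHANGELOG.1480985389). A minimal Python sketch for lining the two up follows; the helper names `epoch_to_utc` and `changelog_epoch` are illustrative and not part of gluster.

```python
from datetime import datetime, timezone


def epoch_to_utc(epoch_seconds):
    """Format a Unix epoch value as a 'YYYY-MM-DD HH:MM:SS' string in UTC."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


def changelog_epoch(name):
    """Extract the numeric suffix from a name such as 'CHANGELOG.1480985389'."""
    return int(name.rsplit(".", 1)[1])


if __name__ == "__main__":
    # Values copied from this report.
    print(epoch_to_utc(1480985866))                               # checkpoint
    print(epoch_to_utc(changelog_epoch("CHANGELOG.1480985389")))  # retried changelog
```

For the values in this report, the checkpoint epoch maps to 2016-12-06 00:57:46 UTC and the retried changelog to 2016-12-06 00:49:49 UTC. On this node that is 16:49:49 local time, which is about when the copy ran out of space.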
Check geo-replication status:
[root@SC-182 log]# /usr/sbin/gluster volume geo-replication master-for-183-0003 nasgorep@10.10.60.183::rem-volume-0001 status detail
MASTER NODE:                 10.10.60.182
MASTER VOL:                  master-for-183-0003
MASTER BRICK:                /exports/nas-segment-0012/master-for-183-0003
SLAVE USER:                  nasgorep
SLAVE:                       nasgorep@10.10.60.183::rem-volume-0001
SLAVE NODE:                  10.10.60.183
STATUS:                      Active
CRAWL STATUS:                Changelog Crawl
LAST_SYNCED:                 2016-12-05 16:57:48
ENTRY:                       0
DATA:                        0
META:                        0
FAILURES:                    0
CHECKPOINT TIME:             2016-12-05 16:57:46
CHECKPOINT COMPLETED:        Yes
CHECKPOINT COMPLETION TIME:  2016-12-05 16:57:50
But the partially written file "big.file.1" is still present on the slave
volume:
[root@SC-183 log]# ls -l /exports/nas183-segment-0001/rem-volume-0001/cifs_share/dir3/
total 98720
-rwx------ 2 root root 78930370 Dec 5 16:49 big.file
-rwx------ 2 root root 22155264 Dec 5 16:49 big.file.1
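To check for more leftovers than this one file, the two directory trees can be compared programmatically. The sketch below is only an illustration under stated assumptions: both trees must be reachable from the host running it (for example, the slave volume mounted locally, or a copy of each listing), and the function names are hypothetical. It skips the `.glusterfs` directory that exists on bricks but not on a client mount.

```python
import os


def relative_files(root, skip_dirs=(".glusterfs",)):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune internal directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def only_on_slave(master_root, slave_root):
    """List files that exist under slave_root but not under master_root."""
    return sorted(relative_files(slave_root) - relative_files(master_root))
```

Called with the master mount and the slave tree from this report, a non-empty result would name files such as cifs_share/dir3/big.file.1 that were deleted on the master but still exist on the slave.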
The Gluster geo-replication logs give no indication of a failure to delete
the file:
[root@SC-182 log]# view /var/log/glusterfs/geo-replication/master-for-183-0003/ssh%3A%2F%2Fnasgorep%4010.10.60.183%3Agluster%3A%2F%2F127.0.0.1%3Arem-volume-0001.log
[2016-12-06 00:49:40.267956] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 17 crawls, 1 turns
[2016-12-06 00:49:52.348413] I [master(/exports/nas-segment-0012/master-for-183-0003):1121:crawl] _GMaster: slave's time: (1480985358, 0)
[2016-12-06 00:49:53.296811] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
[2016-12-06 00:49:53.901186] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
[2016-12-06 00:49:54.760957] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
[2016-12-06 00:49:55.384705] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
[2016-12-06 00:49:55.987873] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
[2016-12-06 00:49:56.848361] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
[2016-12-06 00:49:57.471925] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
[2016-12-06 00:49:58.76416] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
[2016-12-06 00:49:58.935801] W [master(/exports/nas-segment-0012/master-for-183-0003):1058:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1480985389
[2016-12-06 00:49:59.560571] E [resource(/exports/nas-segment-0012/master-for-183-0003):1021:rsync] SSH: SYNC Error(Rsync): rsync: rsync_xal_set: lsetxattr(".gfid/103b87ff-3b7a-4f2b-8bc5-a2f9c1d3fc0e","trusted.glusterfs.84501a83-b07c-4768-bfaa-418b038e1a9e.xtime") failed: Operation not permitted (1)
[2016-12-06 00:49:59.560972] E [master(/exports/nas-segment-0012/master-for-183-0003):1037:process] _GMaster: changelogs CHANGELOG.1480985389 could not be processed completely - moving on...
[2016-12-06 00:50:41.839792] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 18 crawls, 1 turns
[2016-12-06 00:51:42.203411] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns
[2016-12-06 00:52:42.600800] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns
[2016-12-06 00:53:42.983913] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns
[2016-12-06 00:54:43.381218] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns
[2016-12-06 00:55:43.749927] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns
[2016-12-06 00:56:44.113914] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns
[2016-12-06 00:57:44.494354] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns
[2016-12-06 00:57:48.528424] I [gsyncd(conf):671:main_i] <top>: checkpoint 1480985866 set
[2016-12-06 00:57:48.528704] I [syncdutils(conf):220:finalize] <top>: exiting.
[2016-12-06 00:57:50.530714] I [master(/exports/nas-segment-0012/master-for-183-0003):1121:crawl] _GMaster: slave's time: (1480985388, 0)
[2016-12-06 00:58:44.802122] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 1 turns
[2016-12-06 00:59:45.181669] I [master(/exports/nas-segment-0012/master-for-183-0003):532:crawlwrap] _GMaster: 20 crawls, 0 turns
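Since the status output reports 0 failures, the only trace of the skipped changelog is in this log. A small Python sketch for pulling the warnings, errors, and skipped changelog names out of such a log is given below. It assumes the `[timestamp] LEVEL ...` layout and the message wording seen in the excerpt above (other gluster versions may differ), and the function names are illustrative.

```python
import re

# Header of a log entry, e.g. "[2016-12-06 00:49:59.560972] E [master(...)] ..."
ENTRY_START = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] ([A-Z]) ?(.*)$"
)
# Wording copied from the excerpt in this report.
SKIPPED = re.compile(r"(CHANGELOG\.\d+) could not be processed")


def parse_entries(lines):
    """Group raw lines into (timestamp, level, message) tuples.

    A line without a '[timestamp] LEVEL' header is treated as a
    continuation of the previous entry and joined to it with a space.
    """
    entries = []
    for raw in lines:
        line = raw.strip()
        match = ENTRY_START.match(line)
        if match:
            entries.append([match.group(1), match.group(2), match.group(3)])
        elif entries and line:
            entries[-1][2] = (entries[-1][2] + " " + line).strip()
    return [tuple(entry) for entry in entries]


def problems(entries, levels=("E", "W")):
    """Keep only entries whose level is in levels."""
    return [entry for entry in entries if entry[1] in levels]


def skipped_changelogs(entries):
    """Names of changelogs that an error entry reports as not fully processed."""
    names = []
    for _, level, message in entries:
        found = SKIPPED.search(message)
        if level == "E" and found and found.group(1) not in names:
            names.append(found.group(1))
    return names
```

Run over the log quoted here, `skipped_changelogs` would report CHANGELOG.1480985389, the changelog that was retried and then abandoned after the rsync `lsetxattr` error.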
Best regards,
Viktor Nosov