[Bugs] [Bug 1499520] New: Geo-rep: Lot of changelog(rsync) retries and failure during directory syncing.
bugzilla at redhat.com
bugzilla at redhat.com
Sun Oct 8 11:23:09 UTC 2017
https://bugzilla.redhat.com/show_bug.cgi?id=1499520
Bug ID: 1499520
Summary: Geo-rep: Lot of changelog(rsync) retries and failure
during directory syncing.
Product: GlusterFS
Version: mainline
Component: distribute
Assignee: bugs at gluster.org
Reporter: khiremat at redhat.com
CC: bugs at gluster.org
Description of problem:
Directory syncing is causing changelog retries because of rsync errors on
directory gfids.
Version-Release number of selected component (if applicable):
mainline
How reproducible:
Always
Steps to Reproduce:
1. Setup geo-replication
2. Create directory on master
3. Let geo-rep sync to slave
Actual results:
Directory syncing is causing retries. Directory is synced but lot of retries.
I think rsync on directory gfid is failing. (rsync /mastermnt/.gfid/<dirgfid>
/slavemnt/)
Expected results:
Directory syncing should not involve any retries. It would cause performance
degradation
Analysis:
I found out that the issue is introduced with the below patch.
commit 9b4de61a136b8e5ba7bf0e48690cdb1292d0dee8
cluster/dht : User xattrs are not healed after brick stop/start
Additional info:
Geo-replication log:
[2017-10-08 11:12:31.657124] I [master(/bricks/brick0/b0):1382:crawl] _GMaster:
slave's time stime=(1507460923, 0)
[2017-10-08 11:12:31.867773] I [master(/bricks/brick0/b0):1860:syncjob] Syncer:
Sync Time Taken duration=0.1440 num_files=1 job=1 return_code=23
[2017-10-08 11:12:31.871272] W [master(/bricks/brick0/b0):1267:process]
_GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1507461149']
[2017-10-08 11:12:32.875348] I [master(/bricks/brick0/b0):1860:syncjob] Syncer:
Sync Time Taken duration=0.1471 num_files=1 job=3 return_code=23
[2017-10-08 11:12:32.880616] W [master(/bricks/brick0/b0):1267:process]
_GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1507461149']
[2017-10-08 11:12:33.907165] I [master(/bricks/brick0/b0):1860:syncjob] Syncer:
Sync Time Taken duration=0.1759 num_files=1 job=2 return_code=23
[2017-10-08 11:12:33.908428] W [master(/bricks/brick0/b0):1267:process]
_GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1507461149']
[2017-10-08 11:12:35.39695] I [master(/bricks/brick0/b0):1860:syncjob] Syncer:
Sync Time Taken duration=0.1532 num_files=1 job=1 return_code=23
[2017-10-08 11:12:35.42959] W [master(/bricks/brick0/b0):1267:process]
_GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1507461149']
[2017-10-08 11:12:36.42484] I [master(/bricks/brick0/b0):1860:syncjob] Syncer:
Sync Time Taken duration=0.1516 num_files=1 job=3 return_code=23
[2017-10-08 11:12:36.47372] W [master(/bricks/brick0/b0):1267:process]
_GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1507461149']
[2017-10-08 11:12:37.95174] I [master(/bricks/brick0/b0):1860:syncjob] Syncer:
Sync Time Taken duration=0.1764 num_files=1 job=2 return_code=23
[2017-10-08 11:12:37.98913] W [master(/bricks/brick0/b0):1267:process]
_GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1507461149']
[2017-10-08 11:12:38.209673] I [master(/bricks/brick0/b0):1860:syncjob] Syncer:
Sync Time Taken duration=0.1562 num_files=1 job=1 return_code=23
[2017-10-08 11:12:38.214554] W [master(/bricks/brick0/b0):1267:process]
_GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1507461149']
[2017-10-08 11:12:39.236735] I [master(/bricks/brick0/b0):1860:syncjob] Syncer:
Sync Time Taken duration=0.1782 num_files=1 job=3 return_code=23
[2017-10-08 11:12:39.241420] W [master(/bricks/brick0/b0):1267:process]
_GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1507461149']
[2017-10-08 11:12:40.267714] I [master(/bricks/brick0/b0):1860:syncjob] Syncer:
Sync Time Taken duration=0.1557 num_files=1 job=2 return_code=23
[2017-10-08 11:12:40.272130] W [master(/bricks/brick0/b0):1267:process]
_GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1507461149']
[2017-10-08 11:12:41.384635] I [master(/bricks/brick0/b0):1860:syncjob] Syncer:
Sync Time Taken duration=0.1578 num_files=1 job=1 return_code=23
[2017-10-08 11:12:41.389205] E [master(/bricks/brick0/b0):1246:process]
_GMaster: changelogs could not be processed completely - moving on...
files=['CHANGELOG.1507461149']
[2017-10-08 11:12:41.408358] I [master(/bricks/brick0/b0):1295:process]
_GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0
CRE=0 duration=0.0000 UNL=0
[2017-10-08 11:12:41.410573] I [master(/bricks/brick0/b0):1305:process]
_GMaster: Data/Metadata Time Taken SETA=0 SETX=0 meta_duration=0.0000
data_duration=1507461161.4104 DATA=0 XATT=0
[2017-10-08 11:12:41.412946] I [master(/bricks/brick0/b0):1315:process]
_GMaster: Batch Completed changelog_end=1507461149
entry_stime=(1507461148, 0) changelog_start=1507461149
stime=(1507461148, 0) duration=9.7488 num_changelogs=1
mode=live_changelog
/var/log/glusterfs/geo-replication/master/ssh\%3A\%2F\%2Froot\%40192.168.122.13\%3Agluster\%3A\%2F\%2F127.0.0.1\%3Aslave.log
(END)
Slave mount logs:
[2017-10-08 11:12:31.680512] E [MSGID: 101046]
[dht-common.c:702:dht_discover_complete] 0-slave-dht: dict is null
[2017-10-08 11:12:31.687599] I [dict.c:471:dict_get]
(-->/usr/local/lib/glusterfs/4.0dev/xlator/cluster/distribute.so(+0x28c31)
[0x7f4af9601c31]
-->/usr/local/lib/glusterfs/4.0dev/xlator/cluster/distribute.so(+0x28448)
[0x7f4af9601448] -->/usr/local/lib/libglusterfs.so.0(dict_get+0x79)
[0x7f4b0765662d] ) 0-dict: !this || key=trusted.glusterfs.dht.mds [Invalid
argument]
/var/log/glusterfs/geo-replication-slaves/3ab6f770-d2a9-440f-ba28-83bcf1a40eb6:fedora3.\%2Fbricks\%2Fbrick0\%2Fb0.slave.gluster.log
(END)
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.
More information about the Bugs
mailing list