<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>Hi Shwetha,</p>
<p>thank you for your reply...<br>
</p>
<p>I ran a few tests in debug mode but found no real indication of
the problem. After each start of the geo-replication a few files
are transferred at the beginning, but after that no further transfers happen.</p>
<p>A few minutes after the start, the number of changelog files in
<brick> looks like this:<br>
</p>
<p><font face="monospace">[ 06:42:26 ] - root@gl-master-02 ~/tmp
$./var_gluster.sh <br>
/var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick1-mvol1/.processed
: 0 <br>
/var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick1-mvol1/.processing
: 27 ### growing the whole time<br>
/var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick1-mvol1/.history
: 324861<br>
/var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick1-mvol1/.history/.processed
: 1<br>
/var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick1-mvol1/.history/.processing
: 324859 ### finished building changelog files<br>
/var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick1-mvol1/.history/.current
: 0<br>
/var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick1-mvol1/.current
: 0<br>
/var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick1-mvol1/xsync
: 0<br>
---</font></p>
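<p>(var_gluster.sh is only a small helper that counts the files below the
gsyncd working directories of the bricks; it is essentially something like this:)</p>
<p><font face="monospace">#!/bin/bash<br>
# count the files below each gsyncd working directory, per brick<br>
for base in /var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick*-mvol1; do<br>
&nbsp;&nbsp;for d in .processed .processing .history .history/.processed \<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.history/.processing .history/.current .current xsync; do<br>
&nbsp;&nbsp;&nbsp;&nbsp;echo "$base/$d : $(find "$base/$d" -type f 2>/dev/null | wc -l)"<br>
&nbsp;&nbsp;done<br>
&nbsp;&nbsp;echo "---"<br>
done</font></p>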
<p>As far as I remember, at the beginning I saw a few changelog
files in <brick>/.processed for a short moment, but always
with size 0. Even after several hours there are no files in
<brick>/.processed.<br>
</p>
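<p>(I keep an eye on that directory with a plain watch/ls, nothing fancy:)</p>
<p><font face="monospace">watch -n 60 'ls -ltr /var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick1-mvol1/.processed | tail'</font></p>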
<p>In strace there are a lot of messages like 'failed: No data
available' and 'rsync error: some files/attrs were not transferred
... (code 23)' for about the first 5-10 minutes after the geo-rep
start.</p>
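<p>(For context, the trace was captured by attaching strace to the brick1
worker process, roughly like below; the string size and output file name
are just examples.)</p>
<p><font face="monospace">strace -f -ttt -s 4096 -p 19352 -o /root/tmp/worker-brick1.strace</font></p>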
<p>For example, for gfid 8d601e5b-180c....:<br>
</p>
<p><font face="monospace">19361 1615530800.812727 select(7, [6], [],
[], NULL <unfinished ...><br>
19357 1615530800.812779 select(0, NULL, NULL, NULL, {tv_sec=0,
tv_usec=235797} <unfinished ...><br>
19352 1615530800.816522
lstat(".gfid/f0ed7d0e-83be-4c3f-b2c8-f763e9aada12",
{st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0<br>
19352 1615530800.817723
lstat(".gfid/c5b44852-9cf9-441b-8766-d87bfac774c8",
{st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0<br>
19352 1615530800.819507
lstat(".gfid/b0b71bcc-7653-4ab8-b863-a83d395f5e91",
{st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0<br>
19352 1615530800.821106
lstat(".gfid/c4e80ff5-2e08-4e68-9a4d-ea7f45ed290d",
{st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0<br>
19352 1615530800.822874
lstat(".gfid/aaa468d8-9d6a-4aaf-8344-c57440286f5c",
<unfinished ...><br>
19418 1615530800.823466 <... restart_syscall resumed> ) =
1<br>
19418 1615530800.823519 read(14, "rsync: get_xattr_data:
lgetxattr(\"\"/tmp/gsyncd-aux-mount-46pc26b9/.gfid/8d601e5b-180c-46c8-b64f-ae6224542234\"\",\"trusted.glusterfs.mdata\",0)
failed: No data available (61)\n", 32768) = 171<br>
19418 1615530800.823587 poll([{fd=14, events=POLLIN}], 1, -1
<unfinished ...><br>
19352 1615530800.823830 <... lstat resumed>
{st_mode=S_IFREG|0644, st_size=4226767, ...}) = 0<br>
19352 1615530800.823882
lstat(".gfid/8164ea3b-44f6-4ea2-a75f-501cea0024cc",
{st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0<br>
19352 1615530800.897938
lstat(".gfid/01da73ae-1f88-498d-8fe5-84ea76db12f3",
{st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0<br>
19352 1615530800.934281
lstat(".gfid/be48f891-cdc1-4e4c-a141-7001ae3f592e",
{st_mode=S_IFDIR|0777, st_size=4096, ...}) = 0<br>
19352 1615530800.935938
lstat(".gfid/501fab77-5e83-42cb-9edf-ce30bc3a86a9",
{st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0<br>
19352 1615530800.937481
lstat(".gfid/668f6bd1-bdb0-46e0-9cd4-c7ebc38fbaf9",
<unfinished ...><br>
19417 1615530800.961937 <... restart_syscall resumed> ) =
1<br>
19417 1615530800.962042 read(13, "rsync error: some files/attrs
were not transferred (see previous errors) (code 23) at
main.c(1196) [sender=3.1.2]\n", 32768) = 114</font></p>
<p>According to gsyncd.log this gfid is a candidate for syncing, like
many others; only very few others are actually 'synced':<br>
</p>
<p><font face="monospace"> <br>
[2021-03-12 06:33:20.651147] D [master(worker
/brick1/mvol1):318:a_syncdata] _GMaster: candidate for
syncing file=.gfid/8d601e5b-180c-46c8-b64f-ae6224542234<br>
[2021-03-12 06:35:17.419920] D [master(worker
/brick1/mvol1):318:a_syncdata] _GMaster: candidate for
syncing file=.gfid/8d601e5b-180c-46c8-b64f-ae6224542234</font></p>
<p><font face="monospace">[2021-03-12 06:35:03.382977] D
[master(worker /brick1/mvol1):324:regjob] _GMaster: synced
file=.gfid/a3656075-784c-4377-a482-4aad8378acf0<br>
</font></p>
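<p>(This is just how I pull those numbers out of gsyncd.log, to back up
the impression of many candidates but very few synced files:)</p>
<p><font face="monospace">grep -c 'candidate for syncing' gsyncd.log<br>
grep -c '_GMaster: synced' gsyncd.log<br>
grep 'candidate for syncing' gsyncd.log | grep -c 8d601e5b-180c-46c8-b64f-ae6224542234</font></p>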
<p>When I try to fetch the attributes I get 'No such attribute'
for the mentioned gfid, while for a synced gfid the attributes
are available:<br>
</p>
<font face="monospace">[ 09:58:54 ] - root@gl-master-02 ~/tmp
$getfattr -m . -d -e hex
/tmp/gsyncd-aux-mount-46pc26b9/.gfid/8d601e5b-180c-46c8-b64f-ae6224542234</font><br>
<font face="monospace">/tmp/gsyncd-aux-mount-46pc26b9/.gfid/8d601e5b-180c-46c8-b64f-ae6224542234:
trusted.glusterfs.mdata: No such attribute</font><br>
<br>
<font face="monospace">[ 09:59:38 ] - root@gl-master-02 ~/tmp
$getfattr -m . -d -e hex
/tmp/gsyncd-aux-mount-46pc26b9/.gfid/a3656075-784c-4377-a482-4aad8378acf0</font><br>
<font face="monospace">getfattr: Removing leading '/' from absolute
path names</font><br>
<font face="monospace"># file:
tmp/gsyncd-aux-mount-46pc26b9/.gfid/a3656075-784c-4377-a482-4aad8378acf0</font><br>
<font face="monospace">trusted.glusterfs.mdata=0x010000000000000000000000005d1f73ff000000000da35668000000005d1f73fd0000000015811b46000000005f29050b000000001d5363a6</font><br>
<p>I can run a stat on that directory, for example:</p>
<p><font face="monospace">[ 10:07:19 ] - root@gl-master-02 ~/tmp
$stat
/tmp/gsyncd-aux-mount-46pc26b9/.gfid/8d601e5b-180c-46c8-b64f-ae6224542234<br>
File:
/tmp/gsyncd-aux-mount-46pc26b9/.gfid/8d601e5b-180c-46c8-b64f-ae6224542234<br>
Size: 4096 Blocks: 8 IO Block: 131072
directory<br>
Device: 37h/55d Inode: 9394601311212820456 Links: 2<br>
Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: (
0/ root)<br>
Access: 2021-01-17 07:26:09.596743288 +0000<br>
Modify: 2021-03-12 07:34:39.382122663 +0000<br>
Change: 2021-03-12 07:34:39.383446790 +0000<br>
Birth: -</font><br>
</p>
<p>Currently I have no clue how to get this fixed. Obviously
extended attributes are missing for most of the entries in
/tmp/gsyncd-aux-mount.../ while they exist for some others.</p>
<p>I believe rsync itself is not the cause; it rather fails because
of the missing attributes?!</p>
<p>Lastly, the gfid points to a directory. When I try to get the
attributes for this directory via the brick path, it succeeds:<br>
</p>
<br>
<font face="monospace">[ 10:15:25 ] - root@gl-master-02 ~/tmp $ls
-l
/brick1/mvol1/.glusterfs/8d/60/8d601e5b-180c-46c8-b64f-ae6224542234<br>
lrwxrwxrwx 1 root root 56 Nov 5 17:54
/brick1/mvol1/.glusterfs/8d/60/8d601e5b-180c-46c8-b64f-ae6224542234
-> ../../f0/94/f094bf06-2806-4f90-9a79-489827c6cdf9/2217547</font>
<p><font face="monospace"><br>
</font></p>
<font face="monospace">[ 10:38:01 ] - root@gl-master-02 ~ $getfattr
-m . -d -e hex /brick1/mvol1/2137/files/20/11/2217547<br>
getfattr: Removing leading '/' from absolute path names<br>
# file: brick1/mvol1/2137/files/20/11/2217547<br>
trusted.gfid=0x8d601e5b180c46c8b64fae6224542234<br>
trusted.glusterfs.2f5de6e4-66de-40a7-9f24-4762aad3ca96.xtime=0x604b198f0005e528<br>
trusted.glusterfs.dht=0x001ed359000000007a2d37c1a8b9af89<br>
trusted.glusterfs.dht.mds=0x00000000<br>
<br>
[ 10:38:42 ] - root@gl-master-02 ~ $getfattr -m . -d -e hex
/brick1/mvol1/2137/files/20/11<br>
getfattr: Removing leading '/' from absolute path names<br>
# file: brick1/mvol1/2137/files/20/11<br>
trusted.gfid=0xf094bf0628064f909a79489827c6cdf9<br>
trusted.glusterfs.2f5de6e4-66de-40a7-9f24-4762aad3ca96.xtime=0x604b198f0005e528<br>
trusted.glusterfs.dht=0x001ed35900000000d1738834ffffffff<br>
trusted.glusterfs.mdata=0x010000000000000000000000005fc5378000000000077ba08a000000005fc535b60000000038d942cc000000005f9ffc610000000007b08744<br>
<br>
[ 10:39:54 ] - root@gl-master-02 ~ $<br>
</font>
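<p>To narrow it down, I also check the mdata xattr for that gfid directly
on the .glusterfs handle of both local bricks (getfattr follows the
symlink), along these lines:</p>
<p><font face="monospace">for b in /brick1/mvol1 /brick2/mvol1; do<br>
&nbsp;&nbsp;getfattr -n trusted.glusterfs.mdata -e hex \<br>
&nbsp;&nbsp;&nbsp;&nbsp;$b/.glusterfs/8d/60/8d601e5b-180c-46c8-b64f-ae6224542234<br>
done</font></p>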
<p><br>
</p>
<p>But the geo-rep ended up in a loop, though without any 'E' (error)
entries in the log:</p>
<p><font face="monospace">[2021-03-12 10:46:40.572500] D
[repce(worker /brick1/mvol1):215:__call__] RepceClient: call
19352:140387951818496:1615546000.5609794 keep_alive -> 256<br>
[2021-03-12 10:46:41.23154] D [master(worker
/brick2/mvol1):554:crawlwrap] _GMaster: ... crawl #0 done, took
5.017846 seconds<br>
[2021-03-12 10:46:41.35729] D [master(worker
/brick2/mvol1):578:crawlwrap] _GMaster: Crawl info
cluster_stime=(1609281098, 0) brick_stime=(1609281900, 0)<br>
[2021-03-12 10:46:46.41012] D [master(worker
/brick2/mvol1):554:crawlwrap] _GMaster: ... crawl #0 done, took
5.017512 seconds<br>
[2021-03-12 10:46:46.53818] D [master(worker
/brick2/mvol1):578:crawlwrap] _GMaster: Crawl info
cluster_stime=(1609281098, 0) brick_stime=(1609281900, 0)<br>
[2021-03-12 10:46:48.269174] D [repce(worker
/brick2/mvol1):195:push] RepceClient: call
19354:140476158043904:1615546008.2690222 keep_alive({'version':
(1, 0), 'uuid': '2f5de6e4-66de-40a7-9f24-4762aad3ca96',
'retval': 0, 'volume_mark': (1609275788, 819193), 'timeout':
1615546128},) ...</font><br>
</p>
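<p>(If it helps, I can also post the session status; I check it with the
usual command, volume and host names as in the gsyncd working directory
shown above:)</p>
<p><font face="monospace">gluster volume geo-replication mvol1 gl-slave-01-int::svol1 status detail</font></p>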
<p><br>
</p>
<p>Does anyone have any idea how to solve this problem?</p>
<p>Best regards,</p>
<p>Dietmar<br>
</p>
<p><br>
</p>
<div class="moz-cite-prefix">On 04.03.21 08:48, Shwetha Acharya
wrote:<br>
</div>
<blockquote type="cite" cite="mid:CAERh03oi__zcHuHXtvjYHaa41HSot-ySydzMtKCnw9fVPPw-FQ@mail.gmail.com">
<div dir="ltr">
<div dir="ltr">Hi Dietmar,<br>
<br>
</div>
<div dir="ltr">batch-fsync-delay-usec was already set to 0 and I
increased the sync_jobs from 3 to 6. In the moment I increased
the sync_jobs following error appeared in gsyncd.log :<br>
</div>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<p>[2021-03-03 23:17:46.59727] E [syncdutils(worker
/brick1/mvol1):312:log_raise_exception] <top>:
connection to peer is broken<br>
[2021-03-03 23:17:46.59912] E [syncdutils(worker
/brick2/mvol1):312:log_raise_exception] <top>:
connection to peer is broken</p>
</blockquote>
<div>If the geo-rep session is currently not in a faulty state,
we should not be bothered about this log message. It is normal
that when the config is updated, a geo-rep restart occurs and the
above message pops up.</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p>Passive nodes became active and the content in
<brick>/.processing was removed. Currently new
changelog files are created in this directory. Shortly
before I changed the sync_jobs I had checked the
<brick>/.processing directory on the master nodes;
the result was the same for every master node.</p>
</div>
</blockquote>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<p>Since the last error about 12 hours ago, nearly 2400
changelog files were created on each node, but it looks
like none of them were consumed.</p>
</blockquote>
<div> Processed changelogs that are synced are archived under
<brick>/.processed directory. Verify if the latest
file is created there.</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p>At the moment I'm not sure what is right and what is
wrong... should at least the oldest changelog files in this
directory have been processed gradually?</p>
</div>
</blockquote>
<div>Also, you can try to set the log-level to debug for a
while and then set it back to info (to avoid flooding the logs), and
check the logs to get a better picture of the scenario.</div>
<div>#gluster volume geo-replication <primary>
<ip>::<secondary> config log-level DEBUG<br>
#gluster volume geo-replication <primary>
<ip>::<secondary> config log-level INFO<br>
<br>
Regards,<br>
Shwetha<br>
<br>
</div>
</div>
</div>
</blockquote>
<pre class="moz-signature" cols="72">
</pre>
</body>
</html>