<html><head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body>

    <p>Hi Shwetha,</p>

    <p>Thank you for your reply.</p>

    <p>I ran a few tests in debug mode and found no clear indication of
      the problem. After each start of geo-replication, some files are
      transferred at the beginning, and after that nothing more is
      transferred.</p>

    <p>A few minutes after the start, the number of changelog files in
      &lt;brick&gt; looks like this:</p>

    <p><font face="monospace">[ 06:42:26 ] - root@gl-master-02&nbsp; ~/tmp

        $./var_gluster.sh <br>

/var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick1-mvol1/.processed

        : 0&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp; <br>

/var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick1-mvol1/.processing

        : 27&nbsp;&nbsp;&nbsp; ### keeps growing the whole time<br>

/var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick1-mvol1/.history

        : 324861<br>

/var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick1-mvol1/.history/.processed

        : 1<br>

/var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick1-mvol1/.history/.processing

        : 324859&nbsp;&nbsp;&nbsp; ### finished building changelog files<br>

/var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick1-mvol1/.history/.current

        : 0<br>

/var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick1-mvol1/.current

        : 0<br>

/var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick1-mvol1/xsync

        : 0<br>

        ---</font></p>
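<p>The var_gluster.sh script itself is not shown here. For anyone who wants to reproduce these counts, a rough Python equivalent could look like the following. It is a hypothetical sketch: the subdirectory names are taken from the output above, it counts only regular files directly inside each directory (so its numbers may differ slightly from the original script), and it also prints the newest file in .processed, which is the check suggested in the quoted reply below.</p>
<pre>
```python
"""Count changelog files in the gsyncd working directory of one brick.

Hypothetical stand-in for var_gluster.sh: the subdirectory list is taken
from that script's output, and only regular files directly inside each
subdirectory are counted.
"""
import os
import sys

SUBDIRS = [
    ".processed",
    ".processing",
    ".history",
    ".history/.processed",
    ".history/.processing",
    ".history/.current",
    ".current",
    "xsync",
]


def count_entries(workdir):
    """Return {subdir: number of regular files} for each subdir that exists."""
    counts = {}
    for sub in SUBDIRS:
        path = os.path.join(workdir, sub)
        if not os.path.isdir(path):
            continue  # skip missing directories instead of failing
        with os.scandir(path) as entries:
            counts[sub] = sum(1 for e in entries if e.is_file(follow_symlinks=False))
    return counts


def newest_file(path):
    """Return the name of the most recently modified file in path, or None."""
    with os.scandir(path) as entries:
        files = [e for e in entries if e.is_file(follow_symlinks=False)]
    if not files:
        return None
    return max(files, key=lambda e: e.stat().st_mtime).name


if __name__ == "__main__" and len(sys.argv) > 1:
    workdir = sys.argv[1]
    for sub, n in count_entries(workdir).items():
        print(f"{os.path.join(workdir, sub)} : {n}")
    processed = os.path.join(workdir, ".processed")
    if os.path.isdir(processed):
        print("newest file in .processed:", newest_file(processed))
```
</pre>
<p>It would be called with the per-brick working directory as its only argument, e.g. /var/lib/misc/gluster/gsyncd/mvol1_gl-slave-01-int_svol1/brick1-mvol1.</p>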

    <p>As far as I remember, at the beginning I saw a few changelog
      files in &lt;brick&gt;/.processed for a short moment, but always
      with size 0. Even after several hours there are no files in
      &lt;brick&gt;/.processed.</p>

    <p>In the strace output there are a lot of messages like 'failed: No
      data available' and 'rsync error: some files/attrs were not
      transferred ... (code 23)' during roughly the first 5-10 minutes
      after geo-rep starts.</p>

    <p>For example, gfid 8d601e5b-180c....:</p>

    <p><font face="monospace">19361 1615530800.812727 select(7, [6], [],

        [], NULL &lt;unfinished ...&gt;<br>

        19357 1615530800.812779 select(0, NULL, NULL, NULL, {tv_sec=0,

        tv_usec=235797} &lt;unfinished ...&gt;<br>

        19352 1615530800.816522

        lstat(&quot;.gfid/f0ed7d0e-83be-4c3f-b2c8-f763e9aada12&quot;,

        {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0<br>

        19352 1615530800.817723

        lstat(&quot;.gfid/c5b44852-9cf9-441b-8766-d87bfac774c8&quot;,

        {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0<br>

        19352 1615530800.819507

        lstat(&quot;.gfid/b0b71bcc-7653-4ab8-b863-a83d395f5e91&quot;,

        {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0<br>

        19352 1615530800.821106

        lstat(&quot;.gfid/c4e80ff5-2e08-4e68-9a4d-ea7f45ed290d&quot;,

        {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0<br>

        19352 1615530800.822874

        lstat(&quot;.gfid/aaa468d8-9d6a-4aaf-8344-c57440286f5c&quot;,&nbsp;

        &lt;unfinished ...&gt;<br>

        19418 1615530800.823466 &lt;... restart_syscall resumed&gt; ) =

        1<br>

        19418 1615530800.823519 read(14, &quot;rsync: get_xattr_data:

lgetxattr(\&quot;\&quot;/tmp/gsyncd-aux-mount-46pc26b9/.gfid/8d601e5b-180c-46c8-b64f-ae6224542234\&quot;\&quot;,\&quot;trusted.glusterfs.mdata\&quot;,0)

        failed: No data available (61)\n&quot;, 32768) = 171<br>

        19418 1615530800.823587 poll([{fd=14, events=POLLIN}], 1, -1

        &lt;unfinished ...&gt;<br>

        19352 1615530800.823830 &lt;... lstat resumed&gt;

        {st_mode=S_IFREG|0644, st_size=4226767, ...}) = 0<br>

        19352 1615530800.823882

        lstat(&quot;.gfid/8164ea3b-44f6-4ea2-a75f-501cea0024cc&quot;,

        {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0<br>

        19352 1615530800.897938

        lstat(&quot;.gfid/01da73ae-1f88-498d-8fe5-84ea76db12f3&quot;,

        {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0<br>

        19352 1615530800.934281

        lstat(&quot;.gfid/be48f891-cdc1-4e4c-a141-7001ae3f592e&quot;,

        {st_mode=S_IFDIR|0777, st_size=4096, ...}) = 0<br>

        19352 1615530800.935938

        lstat(&quot;.gfid/501fab77-5e83-42cb-9edf-ce30bc3a86a9&quot;,

        {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0<br>

        19352 1615530800.937481

        lstat(&quot;.gfid/668f6bd1-bdb0-46e0-9cd4-c7ebc38fbaf9&quot;,&nbsp;

        &lt;unfinished ...&gt;<br>

        19417 1615530800.961937 &lt;... restart_syscall resumed&gt; ) =

        1<br>

        19417 1615530800.962042 read(13, &quot;rsync error: some files/attrs

        were not transferred (see previous errors) (code 23) at

        main.c(1196) [sender=3.1.2]\n&quot;, 32768) = 114</font></p>
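<p>In this trace, rsync's lgetxattr() of trusted.glusterfs.mdata on .gfid/8d601e5b-180c-46c8-b64f-ae6224542234 fails with "No data available (61)", i.e. ENODATA, and rsync then exits with code 23, which the rsync manual describes as "Partial transfer due to error". To see how many gfids are affected, a small diagnostic sketch like the following could check a list of gfids on the aux mount. It assumes Linux, root privileges (trusted.* attributes) and a still-mounted aux mount such as /tmp/gsyncd-aux-mount-46pc26b9; the helper name missing_mdata is made up for illustration.</p>
<pre>
```python
"""List gfids on the gsyncd aux mount whose trusted.glusterfs.mdata is missing.

Diagnostic sketch: assumes Linux, root privileges (trusted.* xattrs) and
a mounted aux mount such as /tmp/gsyncd-aux-mount-46pc26b9.
"""
import errno
import os
import sys

MDATA_XATTR = "trusted.glusterfs.mdata"


def missing_mdata(mount, gfids, getxattr=None):
    """Return the gfids whose .gfid/ entry has no mdata xattr (ENODATA).

    Any other error (ENOENT, EACCES, ...) is re-raised, since it means
    something different from "attribute not set".
    """
    getxattr = getxattr or os.getxattr  # injectable for testing
    missing = []
    for gfid in gfids:
        path = os.path.join(mount, ".gfid", gfid)
        try:
            # follow_symlinks=False mirrors rsync's lgetxattr() call
            getxattr(path, MDATA_XATTR, follow_symlinks=False)
        except OSError as exc:
            if exc.errno != errno.ENODATA:
                raise
            missing.append(gfid)
    return missing


if __name__ == "__main__" and len(sys.argv) > 2:
    # usage: script MOUNT GFID [GFID ...]
    for gfid in missing_mdata(sys.argv[1], sys.argv[2:]):
        print(gfid)
```
</pre>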

    <p>According to gsyncd.log, this gfid is a candidate for syncing, like
      many others; only very few are reported as 'synced':</p>

    <p><font face="monospace">&nbsp;<br>

        [2021-03-12 06:33:20.651147] D [master(worker

        /brick1/mvol1):318:a_syncdata] _GMaster: candidate for

        syncing&nbsp;&nbsp;&nbsp; file=.gfid/8d601e5b-180c-46c8-b64f-ae6224542234<br>

        [2021-03-12 06:35:17.419920] D [master(worker

        /brick1/mvol1):318:a_syncdata] _GMaster: candidate for

        syncing&nbsp;&nbsp;&nbsp; file=.gfid/8d601e5b-180c-46c8-b64f-ae6224542234</font></p>

    <p><font face="monospace">[2021-03-12 06:35:03.382977] D

        [master(worker /brick1/mvol1):324:regjob] _GMaster: synced&nbsp;&nbsp;&nbsp;

        file=.gfid/a3656075-784c-4377-a482-4aad8378acf0<br>

      </font></p>
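<p>To get a rough overview of how many gfids are listed as candidates but never reported as synced, the debug log could be scanned with a short script such as this one. It is only a sketch: the regular expression assumes the message format of the excerpts above ("candidate for syncing" or "synced" followed by file=...), which may differ between gsyncd versions, and unsynced_candidates is a hypothetical helper name.</p>
<pre>
```python
"""Find files that gsyncd lists as sync candidates but never logs as synced.

Sketch: the regex assumes the debug message format seen in gsyncd.log
excerpts ("candidate for syncing" / "synced" followed by file=...).
"""
import re
import sys
from collections import Counter

EVENT_RE = re.compile(r"_GMaster: (candidate for syncing|synced)\s+file=(\S+)")


def unsynced_candidates(lines):
    """Return {file: number of candidate entries} for files never logged as synced."""
    candidates = Counter()
    synced = set()
    for line in lines:
        match = EVENT_RE.search(line)
        if match is None:
            continue
        event, name = match.groups()
        if event == "synced":
            synced.add(name)
        else:
            candidates[name] += 1
    return {name: n for name, n in candidates.items() if name not in synced}


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        result = unsynced_candidates(log)
    # most frequently retried files first
    for name, n in sorted(result.items(), key=lambda kv: -kv[1]):
        print(n, name)
```
</pre>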

    <p>When I try to fetch the extended attributes, I get 'No such
      attribute' for the mentioned gfid, while for a synced gfid the
      attributes are available:</p>

    <font face="monospace">[ 09:58:54 ] - root@gl-master-02&nbsp; ~/tmp

      $getfattr -m . -d -e hex

/tmp/gsyncd-aux-mount-46pc26b9/.gfid/8d601e5b-180c-46c8-b64f-ae6224542234</font><br>

    <font face="monospace">/tmp/gsyncd-aux-mount-46pc26b9/.gfid/8d601e5b-180c-46c8-b64f-ae6224542234:

      trusted.glusterfs.mdata: No such attribute</font><br>

    <br>

    <font face="monospace">[ 09:59:38 ] - root@gl-master-02&nbsp; ~/tmp

      $getfattr -m . -d -e hex

/tmp/gsyncd-aux-mount-46pc26b9/.gfid/a3656075-784c-4377-a482-4aad8378acf0</font><br>

    <font face="monospace">getfattr: Removing leading '/' from absolute

      path names</font><br>

    <font face="monospace"># file:

      tmp/gsyncd-aux-mount-46pc26b9/.gfid/a3656075-784c-4377-a482-4aad8378acf0</font><br>

    <font face="monospace">trusted.glusterfs.mdata=0x010000000000000000000000005d1f73ff000000000da35668000000005d1f73fd0000000015811b46000000005f29050b000000001d5363a6</font><br>
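<p>The hex value of trusted.glusterfs.mdata (114 hex digits, i.e. 57 bytes) can be decoded to see which timestamps it carries. The sketch below assumes a packed, big-endian layout of a 1-byte version, a 64-bit flags field and three (seconds, nanoseconds) pairs of 64-bit integers for ctime, mtime and atime. This layout is an assumption based on the GlusterFS posix xlator source and has not been verified against the installed version; decode_mdata is a hypothetical helper.</p>
<pre>
```python
"""Decode a trusted.glusterfs.mdata value printed by getfattr -e hex.

Assumed layout (packed, big-endian): version (u8), flags (u64), then
ctime, mtime and atime, each as (seconds, nanoseconds) u64 pairs.
"""
import struct
from datetime import datetime, timezone

MDATA_FMT = ">B7Q"
MDATA_LEN = struct.calcsize(MDATA_FMT)  # 1 + 8 + 6 * 8 = 57 bytes


def decode_mdata(hex_value):
    """Return version, flags and (UTC datetime, nanoseconds) per timestamp."""
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != MDATA_LEN:
        raise ValueError(f"expected {MDATA_LEN} bytes, got {len(raw)}")
    version, flags, *fields = struct.unpack(MDATA_FMT, raw)
    result = {"version": version, "flags": flags}
    for i, name in enumerate(("ctime", "mtime", "atime")):
        sec, nsec = fields[2 * i], fields[2 * i + 1]
        result[name] = (datetime.fromtimestamp(sec, tz=timezone.utc), nsec)
    return result
```
</pre>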

    <p>I can stat that directory, for example:</p>

    <p><font face="monospace">[ 10:07:19 ] - root@gl-master-02&nbsp; ~/tmp

        $stat

/tmp/gsyncd-aux-mount-46pc26b9/.gfid/8d601e5b-180c-46c8-b64f-ae6224542234<br>

        &nbsp; File:

/tmp/gsyncd-aux-mount-46pc26b9/.gfid/8d601e5b-180c-46c8-b64f-ae6224542234<br>

        &nbsp; Size: 4096&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; Blocks: 8&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; IO Block: 131072

        directory<br>

        Device: 37h/55d&nbsp;&nbsp;&nbsp; Inode: 9394601311212820456&nbsp; Links: 2<br>

        Access: (0755/drwxr-xr-x)&nbsp; Uid: (&nbsp;&nbsp;&nbsp; 0/&nbsp;&nbsp;&nbsp; root)&nbsp;&nbsp; Gid: (&nbsp;&nbsp;&nbsp;

        0/&nbsp;&nbsp;&nbsp; root)<br>

        Access: 2021-01-17 07:26:09.596743288 +0000<br>

        Modify: 2021-03-12 07:34:39.382122663 +0000<br>

        Change: 2021-03-12 07:34:39.383446790 +0000<br>

        &nbsp;Birth: -</font><br>

    </p>

    <p>Currently I have no clue how to fix this. Apparently the extended
      attributes are missing for most entries in
      /tmp/gsyncd-aux-mount.../, while they exist for some others.</p>

    <p>I believe rsync itself is not the cause; it is rather the missing
      attributes?!</p>

    <p>Finally, the gfid points to a directory. When I try to get the
      attributes for this directory in the brick path, it succeeds:</p>

    <br>

    <font face="monospace">[ 10:15:25 ] - root@gl-master-02&nbsp; ~/tmp $ls

      -l

      /brick1/mvol1/.glusterfs/8d/60/8d601e5b-180c-46c8-b64f-ae6224542234<br>

      lrwxrwxrwx 1 root root 56 Nov&nbsp; 5 17:54

      /brick1/mvol1/.glusterfs/8d/60/8d601e5b-180c-46c8-b64f-ae6224542234

      -&gt; ../../f0/94/f094bf06-2806-4f90-9a79-489827c6cdf9/2217547</font>

    <p><font face="monospace"><br>

      </font></p>

    <font face="monospace">[ 10:38:01 ] - root@gl-master-02&nbsp; ~ $getfattr

      -m . -d -e hex /brick1/mvol1/2137/files/20/11/2217547<br>

      getfattr: Removing leading '/' from absolute path names<br>

      # file: brick1/mvol1/2137/files/20/11/2217547<br>

      trusted.gfid=0x8d601e5b180c46c8b64fae6224542234<br>

trusted.glusterfs.2f5de6e4-66de-40a7-9f24-4762aad3ca96.xtime=0x604b198f0005e528<br>

      trusted.glusterfs.dht=0x001ed359000000007a2d37c1a8b9af89<br>

      trusted.glusterfs.dht.mds=0x00000000<br>

      <br>

      [ 10:38:42 ] - root@gl-master-02&nbsp; ~ $getfattr -m . -d -e hex

      /brick1/mvol1/2137/files/20/11<br>

      getfattr: Removing leading '/' from absolute path names<br>

      # file: brick1/mvol1/2137/files/20/11<br>

      trusted.gfid=0xf094bf0628064f909a79489827c6cdf9<br>

trusted.glusterfs.2f5de6e4-66de-40a7-9f24-4762aad3ca96.xtime=0x604b198f0005e528<br>

      trusted.glusterfs.dht=0x001ed35900000000d1738834ffffffff<br>

trusted.glusterfs.mdata=0x010000000000000000000000005fc5378000000000077ba08a000000005fc535b60000000038d942cc000000005f9ffc610000000007b08744<br>

      <br>

      [ 10:39:54 ] - root@gl-master-02&nbsp; ~ $<br>

    </font>
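<p>The brick-side outputs can be cross-checked against each other: for a directory, the entry under .glusterfs is a symlink of the form ../../XX/YY/&lt;parent-gfid&gt;/&lt;name&gt;, and here the parent gfid in the link target (f094bf06-2806-4f90-9a79-489827c6cdf9) matches trusted.gfid of /brick1/mvol1/2137/files/20/11. Note also that the listing for 2217547 itself contains no trusted.glusterfs.mdata entry, while its parent directory has one, so the attribute seems to be missing on the brick as well, not only on the aux mount. The helpers below (hypothetical names) are just a sketch that automates this cross-check.</p>
<pre>
```python
"""Cross-check gfids between getfattr output and the .glusterfs tree.

Sketch with hypothetical helper names; paths follow the brick layout
shown above.
"""
import os
import uuid


def gfid_from_xattr(hex_value):
    """Convert trusted.gfid (getfattr -e hex) into the dashed gfid form."""
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    return str(uuid.UUID(hex=digits))


def backend_path(brick, gfid):
    """Return BRICK/.glusterfs/XX/YY/GFID for a dashed gfid."""
    return os.path.join(brick, ".glusterfs", gfid[0:2], gfid[2:4], gfid)


def parse_dir_link(target):
    """Split a directory gfid symlink target into (parent gfid, name)."""
    parts = target.split("/")
    # expected shape: ../../XX/YY/PARENT_GFID/NAME
    if len(parts) != 6 or parts[:2] != ["..", ".."]:
        raise ValueError(f"unexpected link target: {target!r}")
    return parts[4], parts[5]


def describe_dir_gfid(brick, gfid):
    """Read the symlink for a directory gfid and resolve it to (parent, name)."""
    return parse_dir_link(os.readlink(backend_path(brick, gfid)))
```
</pre>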

    <p><br>

    </p>

    <p>However, geo-rep ended up in a loop, without any 'E' (error) messages:</p>

    <p><font face="monospace">[2021-03-12 10:46:40.572500] D

        [repce(worker /brick1/mvol1):215:__call__] RepceClient: call

        19352:140387951818496:1615546000.5609794 keep_alive -&gt; 256<br>

        [2021-03-12 10:46:41.23154] D [master(worker

        /brick2/mvol1):554:crawlwrap] _GMaster: ... crawl #0 done, took

        5.017846 seconds<br>

        [2021-03-12 10:46:41.35729] D [master(worker

        /brick2/mvol1):578:crawlwrap] _GMaster: Crawl info&nbsp;&nbsp;&nbsp;

        cluster_stime=(1609281098, 0)&nbsp;&nbsp;&nbsp; brick_stime=(1609281900, 0)<br>

        [2021-03-12 10:46:46.41012] D [master(worker

        /brick2/mvol1):554:crawlwrap] _GMaster: ... crawl #0 done, took

        5.017512 seconds<br>

        [2021-03-12 10:46:46.53818] D [master(worker

        /brick2/mvol1):578:crawlwrap] _GMaster: Crawl info&nbsp;&nbsp;&nbsp;

        cluster_stime=(1609281098, 0)&nbsp;&nbsp;&nbsp; brick_stime=(1609281900, 0)<br>

        [2021-03-12 10:46:48.269174] D [repce(worker

        /brick2/mvol1):195:push] RepceClient: call

        19354:140476158043904:1615546008.2690222 keep_alive({'version':

        (1, 0), 'uuid': '2f5de6e4-66de-40a7-9f24-4762aad3ca96',

        'retval': 0, 'volume_mark': (1609275788, 819193), 'timeout':

        1615546128},) ...</font><br>

    </p>
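<p>The stime values in the "Crawl info" lines are Unix timestamps. Converted to UTC, cluster_stime=(1609281098, 0) is 2020-12-29 22:31:38 and brick_stime=(1609281900, 0) is 2020-12-29 22:45:00, so in these March 2021 log lines the sync time still points to the end of December 2020. The same conversion puts the strace timestamp 1615530800 at 2021-03-12 06:33:20 UTC, the same second as the first 'candidate for syncing' entry above (assuming gsyncd.log is written in UTC). A minimal sketch with hypothetical helper names; only the seconds field is converted, since the unit of the second tuple element is not assumed here.</p>
<pre>
```python
"""Convert the stime values from gsyncd "Crawl info" lines to UTC.

Only the seconds field is converted; the unit of the second tuple
element is not assumed here.
"""
import re
from datetime import datetime, timezone

CRAWL_RE = re.compile(
    r"cluster_stime=\((\d+), (\d+)\)\s+brick_stime=\((\d+), (\d+)\)"
)


def parse_crawl_info(line):
    """Return (cluster_stime, brick_stime) tuples, or None if no match."""
    m = CRAWL_RE.search(line)
    if m is None:
        return None
    a, b, c, d = (int(x) for x in m.groups())
    return (a, b), (c, d)


def stime_to_utc(stime):
    """Convert the seconds part of an stime tuple to an aware UTC datetime."""
    return datetime.fromtimestamp(stime[0], tz=timezone.utc)
```
</pre>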

    <p><br>

    </p>

    <p>Does anyone have any idea how to solve this problem?</p>

    <p>Best regards,</p>

    <p>Dietmar<br>

    </p>

    <p><br>

    </p>

    <div class="moz-cite-prefix">On 04.03.21 08:48, Shwetha Acharya

      wrote:<br>

    </div>

    <blockquote type="cite" cite="mid:CAERh03oi__zcHuHXtvjYHaa41HSot-ySydzMtKCnw9fVPPw-FQ@mail.gmail.com">

      <div dir="ltr">

        <div dir="ltr">Hi&nbsp;Dietmar,<br>

          <br>

        </div>

        <div dir="ltr">batch-fsync-delay-usec was already set to 0 and I
          increased sync_jobs from 3 to 6. At the moment I increased
          sync_jobs, the following error appeared in gsyncd.log:<br>

        </div>

        <div class="gmail_quote">

          <blockquote class="gmail_quote" style="margin:0px 0px 0px

            0.8ex;border-left:1px solid

            rgb(204,204,204);padding-left:1ex">

            <p>[2021-03-03 23:17:46.59727] E [syncdutils(worker

              /brick1/mvol1):312:log_raise_exception] &lt;top&gt;:

              connection to peer is broken<br>

              [2021-03-03 23:17:46.59912] E [syncdutils(worker

              /brick2/mvol1):312:log_raise_exception] &lt;top&gt;:

              connection to peer is broken</p>

          </blockquote>

          <div>If the geo-rep session is currently not in a faulty
            state, we should not be bothered about this log message. It
            is normal: when the config is updated, geo-rep restarts and
            the above message&nbsp;pops up.</div>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px

            0.8ex;border-left:1px solid

            rgb(204,204,204);padding-left:1ex">

            <div>

              <p>Passive nodes became active and the content in
                &lt;brick&gt;/.processing was removed. Currently new
                changelog files are created in this directory. Shortly
                before I changed sync_jobs, I checked the
                &lt;brick&gt;/.processing directory on the master nodes.
                The result was the same for every master node.</p>

            </div>

          </blockquote>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px

            0.8ex;border-left:1px solid

            rgb(204,204,204);padding-left:1ex">

            <p>Since the last error, about 12 hours ago, nearly 2400
              changelog files were created on each node, but it looks
              like none of them were consumed.</p>

          </blockquote>

          <div>Processed changelogs that have been synced are archived
            under the &lt;brick&gt;/.processed directory. Verify whether
            the latest file is created there.</div>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px

            0.8ex;border-left:1px solid

            rgb(204,204,204);padding-left:1ex">

            <div>

              <p>At the moment I'm not sure what is right and what is
                wrong...<span lang="en"><span><span> Shouldn't at least
                      the oldest changelog files in this directory have
                      been processed gradually?</span></span></span></p>

            </div>

          </blockquote>

          <div>You can also set the log level to DEBUG for a while, then
            set it back to INFO (to avoid flooding the logs), and check
            the logs to get a better picture of the scenario.</div>

          <div>#gluster volume geo-replication &lt;primary&gt;

            &lt;ip&gt;::&lt;secondary&gt; config log-level DEBUG<br>

            #gluster volume geo-replication &lt;primary&gt;

            &lt;ip&gt;::&lt;secondary&gt; config log-level INFO<br>

            <br>

            Regards,<br>

            Shwetha<br>

            <br>

          </div>

        </div>

      </div>

    </blockquote>

    <pre class="moz-signature" cols="72">

</pre>

  </body>

</html>