<div dir="ltr"><div>Hi Strahil and Sunny,</div><div><br></div><div>Thank you for the replies. I checked the gfid on the master and slaves and they are the same. After moving the file away and back again it doesn&#39;t seem to be having the issue with that file any more.</div><div><br></div><div>We are still getting higher CPU usage on one of the master nodes than the others. It logs this every few seconds:<br></div><div><br></div><div>[2020-06-02 03:10:15.637815] I [master(worker /nodirectwritedata/gluster/gvol0):1384:process] _GMaster: Entry Time Taken        MKD=0   MKN=0   LIN=0   SYM=0   REN=0   RMD=0CRE=0   duration=0.0000 UNL=0<br>[2020-06-02 03:10:15.638010] I [master(worker /nodirectwritedata/gluster/gvol0):1394:process] _GMaster: Data/Metadata Time Taken        SETA=0  SETX=0  meta_duration=0.0000data_duration=12.7878    DATA=4  XATT=0<br>[2020-06-02 03:10:15.638286] I [master(worker /nodirectwritedata/gluster/gvol0):1404:process] _GMaster: Batch Completed changelog_end=1591067378        entry_stime=(1591067167, 0)  changelog_start=1591067364      stime=(1591067377, 0)   duration=12.8068        num_changelogs=2        mode=live_changelog<br>[2020-06-02 03:10:20.658601] I [master(worker /nodirectwritedata/gluster/gvol0):1470:crawl] _GMaster: slave&#39;s time      stime=(1591067377, 0)<br>[2020-06-02 03:10:34.21799] I [master(worker /nodirectwritedata/gluster/gvol0):1954:syncjob] Syncer: Sync Time Taken    duration=0.3826 num_files=8     job=1   return_code=0<br>[2020-06-02 03:10:46.440535] I [master(worker /nodirectwritedata/gluster/gvol0):1384:process] _GMaster: Entry Time Taken        MKD=0   MKN=0   LIN=0   SYM=0   REN=1   RMD=0CRE=2   duration=0.1314 UNL=1<br>[2020-06-02 03:10:46.440809] I [master(worker /nodirectwritedata/gluster/gvol0):1394:process] _GMaster: Data/Metadata Time Taken        SETA=0  SETX=0  meta_duration=0.0000data_duration=13.0171    DATA=14 XATT=0<br>[2020-06-02 03:10:46.441205] I [master(worker 
/nodirectwritedata/gluster/gvol0):1404:process] _GMaster: Batch Completed changelog_end=1591067420        entry_stime=(1591067419, 0)  changelog_start=1591067392      stime=(1591067419, 0)   duration=13.0322        num_changelogs=3        mode=live_changelog<br>[2020-06-02 03:10:51.460925] I [master(worker /nodirectwritedata/gluster/gvol0):1470:crawl] _GMaster: slave&#39;s time      stime=(1591067419, 0)<br><br>[2020-06-02 03:11:04.448913] I [master(worker /nodirectwritedata/gluster/gvol0):1954:syncjob] Syncer: Sync Time Taken   duration=0.3466 num_files=3     job=1   return_code=0<br></div><div><br></div><div>Whereas the other master nodes only log this:</div><div><br></div><div>[2020-06-02 03:11:33.886938] I [gsyncd(config-get):308:main] &lt;top&gt;: Using session config file   path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf<br>[2020-06-02 03:11:33.993175] I [gsyncd(status):308:main] &lt;top&gt;: Using session config file       path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf</div><div><br></div><div>Can anyone help with what might cause the high CPU usage on one master node? 
The process is this one, and is using 70-100% of CPU:</div><div><br></div><div>python2 /usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py worker gvol0 nvfs10::gvol0 --feedback-fd 15 --local-path /nodirectwritedata/gluster/gvol0 --local-node cafs30 --local-node-id b7521445-ee93-4fed-8ced-6a609fa8c7d4 --slave-id cdcdb210-839c-4306-a4dc-e696b165ed17 --rpc-fd 12,11,9,13 --subvol-num 1 --resource-remote nvfs30 --resource-remote-id 1e698ccd-aeec-4ec4-96fe-383da8fc3b78</div><div><br></div><div>Thank you in advance!</div><div><br></div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, 30 May 2020 at 20:20, Strahil Nikolov &lt;<a href="mailto:hunter86_bg@yahoo.com" target="_blank">hunter86_bg@yahoo.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hey David,<br>

<br>

To me, a gfid mismatch means that the file was replaced/recreated, just as vim does on Linux (and that is expected for a config file).<br>

<br>

Have you checked the gfid of the file on both source and destination? Do they really match, or are they different?<br>

<br>
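One way to make that comparison is to read the gfid extended attribute directly from the brick on each side. The sketch below assumes Linux, root privileges (the trusted.* namespace is not readable otherwise), and that the file is accessed through its brick path rather than through a client mount. The function names are illustrative and not part of GlusterFS.

```python
import os
import uuid

# Extended attribute that GlusterFS stores on each file on the brick.
GFID_XATTR = "trusted.gfid"


def format_gfid(raw):
    """Render the 16 raw xattr bytes in the hyphenated form used in gsyncd.log."""
    if len(raw) != 16:
        raise ValueError("expected 16 bytes, got %d" % len(raw))
    return str(uuid.UUID(bytes=raw))


def read_gfid(brick_file):
    """Read the gfid of a file given its path on the brick (Linux only, needs root)."""
    return format_gfid(os.getxattr(brick_file, GFID_XATTR))


if __name__ == "__main__":
    import sys
    # Pass one or more brick-side file paths on the command line.
    for path in sys.argv[1:]:
        print(path, read_gfid(path))
```

Running it against the same file on a master brick and on the slave brick and comparing the two printed values shows whether the gfids match.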

What happens when you move the file away from the slave? Does that fix the issue?<br>

<br>

Best Regards,<br>

Strahil Nikolov<br>

<br>

On 30 May 2020 at 1:10:56 GMT+03:00, David Cunningham &lt;<a href="mailto:dcunningham@voisonics.com" target="_blank">dcunningham@voisonics.com</a>&gt; wrote:<br>

&gt;Hello,<br>

&gt;<br>

&gt;We&#39;re having an issue with a geo-replication process with unusually high<br>

&gt;CPU use and giving &quot;Entry not present on master. Fixing gfid mismatch in<br>

&gt;slave&quot; errors. Can anyone help with this?<br>

&gt;<br>

&gt;We have 3 GlusterFS replica nodes (which we&#39;ll call the master), which also push<br>

&gt;data to a remote server (the slave) using geo-replication. This has been<br>

&gt;running fine for a couple of months, but yesterday one of the master nodes<br>

&gt;started having unusually high CPU use. It&#39;s this process:<br>

&gt;<br>

&gt;root@cafs30:/var/log/glusterfs# ps aux | grep 32048<br>

&gt;root     32048 68.7  0.6 1843140 845756 ?      Rl   02:51 493:51<br>

&gt;python2<br>

&gt;/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py worker<br>

&gt;gvol0 nvfs10::gvol0 --feedback-fd 15 --local-path<br>

&gt;/nodirectwritedata/gluster/gvol0 --local-node cafs30 --local-node-id<br>

&gt;b7521445-ee93-4fed-8ced-6a609fa8c7d4 --slave-id<br>

&gt;cdcdb210-839c-4306-a4dc-e696b165ed17 --rpc-fd 12,11,9,13 --subvol-num 1<br>

&gt;--resource-remote nvfs30 --resource-remote-id<br>

&gt;1e698ccd-aeec-4ec4-96fe-383da8fc3b78<br>

&gt;<br>

&gt;Here&#39;s what is being logged in<br>

&gt;/var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log:<br>

&gt;<br>

&gt;[2020-05-29 21:57:18.843524] I [master(worker<br>

&gt;/nodirectwritedata/gluster/gvol0):1470:crawl] _GMaster: slave&#39;s time<br>

&gt; stime=(1590789408, 0)<br>

&gt;[2020-05-29 21:57:30.626172] I [master(worker<br>

&gt;/nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures]<br>

&gt;_GMaster: Entry not present on master. Fixing gfid mismatch in slave.<br>

&gt;Deleting the entry    retry_count=1   entry=({u&#39;uid&#39;: 108, u&#39;gfid&#39;:<br>

&gt;u&#39;7c0b75e5-d8b7-454f-8010-112d613c599e&#39;, u&#39;gid&#39;: 117, u&#39;mode&#39;: 33204,<br>

&gt;u&#39;entry&#39;:<br>

&gt;u&#39;.gfid/c5422396-1578-4b50-a29d-315be2a9c5d8/00a859f7xxxx.cfg&#39;,<br>

&gt;u&#39;op&#39;: u&#39;CREATE&#39;}, 17, {u&#39;slave_isdir&#39;: False, u&#39;gfid_mismatch&#39;: True,<br>

&gt;u&#39;slave_name&#39;: None, u&#39;slave_gfid&#39;:<br>

&gt;u&#39;ec4b0ace-2ec4-4ea5-adbc-9f519b81917c&#39;, u&#39;name_mismatch&#39;: False,<br>

&gt;u&#39;dst&#39;:<br>

&gt;False})<br>

&gt;[2020-05-29 21:57:30.627893] I [master(worker<br>

&gt;/nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures]<br>

&gt;_GMaster: Entry not present on master. Fixing gfid mismatch in slave.<br>

&gt;Deleting the entry    retry_count=1   entry=({u&#39;uid&#39;: 108, u&#39;gfid&#39;:<br>

&gt;u&#39;a4d52e40-2e2f-4885-be5f-65fe95a8ebd7&#39;, u&#39;gid&#39;: 117, u&#39;mode&#39;: 33204,<br>

&gt;u&#39;entry&#39;:<br>

&gt;u&#39;.gfid/f857c42e-22f1-4ce4-8f2e-13bdadedde45/polycom_00a859f7xxxx.cfg&#39;,<br>

&gt;u&#39;op&#39;: u&#39;CREATE&#39;}, 17, {u&#39;slave_isdir&#39;: False, u&#39;gfid_mismatch&#39;: True,<br>

&gt;u&#39;slave_name&#39;: None, u&#39;slave_gfid&#39;:<br>

&gt;u&#39;ece8da77-b5ea-45a7-9af7-7d4d8f55f74a&#39;, u&#39;name_mismatch&#39;: False,<br>

&gt;u&#39;dst&#39;:<br>

&gt;False})<br>

&gt;[2020-05-29 21:57:30.629532] I [master(worker<br>

&gt;/nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures]<br>

&gt;_GMaster: Entry not present on master. Fixing gfid mismatch in slave.<br>

&gt;Deleting the entry    retry_count=1   entry=({u&#39;uid&#39;: 108, u&#39;gfid&#39;:<br>

&gt;u&#39;3c525ad8-aeb2-46b6-9c41-7fb4987916f8&#39;, u&#39;gid&#39;: 117, u&#39;mode&#39;: 33204,<br>

&gt;u&#39;entry&#39;:<br>

&gt;u&#39;.gfid/f857c42e-22f1-4ce4-8f2e-13bdadedde45/00a859f7xxxx-directory.xml&#39;,<br>

&gt;u&#39;op&#39;: u&#39;CREATE&#39;}, 17, {u&#39;slave_isdir&#39;: False, u&#39;gfid_mismatch&#39;: True,<br>

&gt;u&#39;slave_name&#39;: None, u&#39;slave_gfid&#39;:<br>

&gt;u&#39;06717b5a-d842-495d-bd25-aab9cd454490&#39;, u&#39;name_mismatch&#39;: False,<br>

&gt;u&#39;dst&#39;:<br>

&gt;False})<br>

&gt;[2020-05-29 21:57:30.659123] I [master(worker<br>

&gt;/nodirectwritedata/gluster/gvol0):942:handle_entry_failures] _GMaster:<br>

&gt;Sucessfully fixed entry ops with gfid mismatch     retry_count=1<br>

&gt;[2020-05-29 21:57:30.659343] I [master(worker<br>

&gt;/nodirectwritedata/gluster/gvol0):1194:process_change] _GMaster: Retry<br>

&gt;original entries. count = 1<br>

&gt;[2020-05-29 21:57:30.725810] I [master(worker<br>

&gt;/nodirectwritedata/gluster/gvol0):1197:process_change] _GMaster:<br>

&gt;Sucessfully fixed all entry ops with gfid mismatch<br>

&gt;[2020-05-29 21:57:31.747319] I [master(worker<br>

&gt;/nodirectwritedata/gluster/gvol0):1954:syncjob] Syncer: Sync Time Taken<br>

&gt;duration=0.7409 num_files=18    job=1   return_code=0<br>

&gt;<br>
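To compare sync durations and file counts across many such log lines, the key=value fields can be pulled out with a small parser. This is a minimal sketch, assuming the fields are whitespace-separated as in the "Sync Time Taken" line above. The names parse_fields and FIELD_RE are illustrative, not part of gsyncd.

```python
import re

# A field is key=value, where the value is either a parenthesised tuple
# such as (1590789408, 0) or a run of non-space characters.
FIELD_RE = re.compile(r"(\w+)=(\([^)]*\)|\S+)")


def parse_fields(line):
    """Return the key=value fields of one gsyncd.log line as a dict of strings."""
    return dict(FIELD_RE.findall(line))


sample = (
    "[2020-05-29 21:57:31.747319] I [master(worker "
    "/nodirectwritedata/gluster/gvol0):1954:syncjob] Syncer: Sync Time Taken "
    "duration=0.7409 num_files=18    job=1   return_code=0"
)
fields = parse_fields(sample)
print(fields["duration"], fields["num_files"])
```

Values are left as strings so that tuple-valued fields such as stime survive unchanged. Convert duration or num_files to numbers only where needed.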

&gt;We&#39;ve verified that files like polycom_00a859f7xxxx.cfg referred to in<br>

&gt;the error do exist on the master nodes and the slave.<br>

&gt;<br>

&gt;We found this bug fix:<br>

&gt;<a href="https://bugzilla.redhat.com/show_bug.cgi?id=1642865" rel="noreferrer" target="_blank">https://bugzilla.redhat.com/show_bug.cgi?id=1642865</a><br>

&gt;<br>

&gt;However, that fix went into 5.1, and we&#39;re running 5.12 on the master nodes<br>

&gt;and slave. A couple of GlusterFS clients connected to the master nodes are<br>

&gt;running 5.13.<br>

&gt;<br>

&gt;Would anyone have any suggestions? Thank you in advance.<br>

</blockquote></div><br clear="all"><br>-- <br><div dir="ltr"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div>David Cunningham, Voisonics Limited<br><a href="http://voisonics.com/" target="_blank">http://voisonics.com/</a><br>USA: +1 213 221 1092<br>New Zealand: +64 (0)28 2558 3782</div></div></div></div></div></div></div></div></div></div></div>