<div dir="ltr"><div>Hi Strahil,</div><div><br></div><div>Is there a trick to getting the .gfid directory to appear besides adding "-o aux-gfid-mount" to the mount? I mounted it using "mount -t glusterfs -o aux-gfid-mount cafs30:/gvol0 /mnt/glusterfs" and there's no .gfid directory under /mnt/glusterfs.</div><div><br></div><div>I haven't tried joining a gluster meeting. Are bugs/problems usually discussed on such things? I usually find that people need to look into things and respond in their own time so email can be better.</div><div><br></div><div>Thanks for your help.</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 11 Jun 2020 at 15:16, Strahil Nikolov <<a href="mailto:hunter86_bg@yahoo.com">hunter86_bg@yahoo.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">You can try the path of a file based on gfid (method 2) via:<br>
<br>
<a href="https://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/" rel="noreferrer" target="_blank">https://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/</a><br>
<br>
The gfids from the strace should be there, but if the file was renamed/deleted it is normal for them to be missing.<br>
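<br>
For example, method 1 through an aux-gfid mount and method 2 directly on<br>
a brick - a sketch, using the volume, brick path and a gfid that appear<br>
elsewhere in this thread:<br>
<br>
# method 1: resolve a gfid through the virtual .gfid/ tree on the mount<br>
mount -t glusterfs -o aux-gfid-mount cafs30:/gvol0 /mnt/glusterfs<br>
getfattr -n trusted.glusterfs.pathinfo -e text /mnt/glusterfs/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba<br>
<br>
# method 2: for regular files the .glusterfs entry is a hard link,<br>
# so -samefile should print the real path on that brick<br>
find /nodirectwritedata/gluster/gvol0 -samefile /nodirectwritedata/gluster/gvol0/.glusterfs/d4/26/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba<br>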
<br>
Have you joined the last gluster meeting to discuss the problem?<br>
<br>
<br>
Best Regards,<br>
Strahil Nikolov<br>
<br>
On 11 June 2020 at 3:15:36 GMT+03:00, David Cunningham <<a href="mailto:dcunningham@voisonics.com" target="_blank">dcunningham@voisonics.com</a>> wrote:<br>
>Hi Strahil,<br>
><br>
>Thanks for that. I did search for a file with the gfid in the name, on<br>
>both<br>
>the master nodes and geo-replication slave, but none of them had such a<br>
>file. I guess maybe by the time I looked the file had been deleted?<br>
>Either<br>
>that or something is more seriously wrong with invalid gfids.<br>
><br>
>BTW, I used strace to try and see what gsyncd was up to when using all<br>
>that<br>
>CPU. Running strace attached to gsyncd for 10 seconds gave 168,000<br>
>lines of<br>
>output, mostly like the following:<br>
><br>
>read(6, "CHANGELOG.1585775398\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096<br>
>lseek(6, 1088012288, SEEK_SET) = 1088012288<br>
>lseek(6, 1088012288, SEEK_SET) = 1088012288<br>
>read(6, "est\nE da6ed6e8-2b49-4a56-b783-d0"..., 4096) = 4096<br>
>lseek(6, 1088016384, SEEK_SET) = 1088016384<br>
>lseek(6, 1088016384, SEEK_SET) = 1088016384<br>
>read(6, "lock-05380315\nE cf5fe292-2ebf-43"..., 4096) = 4096<br>
>lseek(6, 1088020480, SEEK_SET) = 1088020480<br>
>lseek(6, 1088020480, SEEK_SET) = 1088020480<br>
>read(6, "7 10a14313-4f92-4071-83cb-c900ef"..., 4096) = 4096<br>
>lseek(6, 1088024576, SEEK_SET) = 1088024576<br>
>lseek(6, 1088024576, SEEK_SET) = 1088024576<br>
>read(6, "D b70ba2e8-d954-4fb2-b17a-77c8cc"..., 4096) = 4096<br>
>lseek(6, 1088028672, SEEK_SET) = 1088028672<br>
>lseek(6, 1088028672, SEEK_SET) = 1088028672<br>
>read(6, "01681-e324-4f13-ab3a-0e8ae50ff95"..., 4096) = 4096<br>
>lseek(6, 1088032768, SEEK_SET) = 1088032768<br>
>read(6, "09ef519/voicemail_1585336530_158"..., 4096) = 4096<br>
>lseek(6, 1088036864, SEEK_SET) = 1088036864<br>
>read(6, "6-4539-8d7f-d17fb8f71d6d\nD 1236c"..., 4096) = 4096<br>
>lseek(6, 1088040960, SEEK_SET) = 1088040960<br>
>lseek(6, 1088040960, SEEK_SET) = 1088040960<br>
>read(6, "6-4d54-8b9b-4146930b3a2d\nD 88287"..., 4096) = 4096<br>
><br>
>I'm guessing those reads are mostly on the files under .glusterfs? I<br>
>did<br>
>check for files matching the names above and there aren't any matching<br>
>"da6ed6e8-2b49-4a56-b783-d0", "cf5fe292-2ebf-43", or<br>
>"10a14313-4f92-4071-83cb-c900ef" though.<br>
><br>
>Any guidance would be appreciated.<br>
><br>
><br>
>On Wed, 10 Jun 2020 at 16:06, Strahil Nikolov <<a href="mailto:hunter86_bg@yahoo.com" target="_blank">hunter86_bg@yahoo.com</a>><br>
>wrote:<br>
><br>
>> Hey David,<br>
>><br>
>> Sadly I just have a feeling that on one of the bricks there is a gfid<br>
>> mismatch, but I could be wrong.<br>
>><br>
>> As you have the gfid list, please check on all bricks (both master and<br>
>> slave) that the file exists (not the one in .glusterfs, but the real<br>
>> one) and that it has the same gfid.<br>
>><br>
>> You can find the inode via ls and then run a find (don't forget the<br>
>> ionice) against the brick and that inode number - see the sketch after<br>
>> the checklist below.<br>
>><br>
>> Once you have the full path to the file, test:<br>
>> - Mount with FUSE<br>
>> - Check the file exists (no '??????' for permissions, size, etc) and<br>
>> can be manipulated (maybe 'touch' can be used?)<br>
>> - Find (on all replica sets) the file and check the gfid<br>
>> - Check for heals pending for that gfid<br>
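>><br>
>> A sketch of those checks - the brick path is the one from this thread;<br>
>> xx/yy, <gfid>, <inode> and <path-to-file> are placeholders:<br>
>><br>
>> # inode of the gfid hard link on the brick (xx/yy = first two hex<br>
>> # pairs of the gfid)<br>
>> ls -i /nodirectwritedata/gluster/gvol0/.glusterfs/xx/yy/<gfid><br>
>> # locate the real file on the brick by that inode, gently<br>
>> ionice -c2 -n7 find /nodirectwritedata/gluster/gvol0 -inum <inode><br>
>> # compare the gfid xattr of the real file across bricks<br>
>> getfattr -n trusted.gfid -e hex /nodirectwritedata/gluster/gvol0/<path-to-file><br>
>> # pending heals<br>
>> gluster volume heal gvol0 info<br>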
>><br>
>><br>
>> Best Regards,<br>
>> Strahil Nikolov<br>
>><br>
>> On 10 June 2020 at 6:37:35 GMT+03:00, David Cunningham <<br>
>> <a href="mailto:dcunningham@voisonics.com" target="_blank">dcunningham@voisonics.com</a>> wrote:<br>
>> >Hi Strahil,<br>
>> ><br>
>> >Thank you for that. Do you know if these "Stale file handle" errors<br>
>> >on the geo-replication slave could be related?<br>
>> ><br>
>> >[2020-06-10 01:02:32.268989] E [MSGID: 109040]<br>
>> >[dht-helper.c:1332:dht_migration_complete_check_task] 0-gvol0-dht:<br>
>> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba: failed to lookup the<br>
>file<br>
>> >on<br>
>> >gvol0-dht [Stale file handle]<br>
>> >[2020-06-10 01:02:32.269092] W [fuse-bridge.c:897:fuse_attr_cbk]<br>
>> >0-glusterfs-fuse: 7434237: STAT()<br>
>> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file<br>
>handle)<br>
>> >[2020-06-10 01:02:32.329280] W [fuse-bridge.c:897:fuse_attr_cbk]<br>
>> >0-glusterfs-fuse: 7434251: STAT()<br>
>> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file<br>
>handle)<br>
>> >[2020-06-10 01:02:32.387129] W [fuse-bridge.c:897:fuse_attr_cbk]<br>
>> >0-glusterfs-fuse: 7434264: STAT()<br>
>> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file<br>
>handle)<br>
>> >[2020-06-10 01:02:32.448838] W [fuse-bridge.c:897:fuse_attr_cbk]<br>
>> >0-glusterfs-fuse: 7434277: STAT()<br>
>> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file<br>
>handle)<br>
>> >[2020-06-10 01:02:32.507196] W [fuse-bridge.c:897:fuse_attr_cbk]<br>
>> >0-glusterfs-fuse: 7434290: STAT()<br>
>> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file<br>
>handle)<br>
>> >[2020-06-10 01:02:32.566033] W [fuse-bridge.c:897:fuse_attr_cbk]<br>
>> >0-glusterfs-fuse: 7434303: STAT()<br>
>> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file<br>
>handle)<br>
>> >[2020-06-10 01:02:32.625168] W [fuse-bridge.c:897:fuse_attr_cbk]<br>
>> >0-glusterfs-fuse: 7434316: STAT()<br>
>> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file<br>
>handle)<br>
>> >[2020-06-10 01:02:32.772442] W [fuse-bridge.c:897:fuse_attr_cbk]<br>
>> >0-glusterfs-fuse: 7434329: STAT()<br>
>> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file<br>
>handle)<br>
>> >[2020-06-10 01:02:32.832481] W [fuse-bridge.c:897:fuse_attr_cbk]<br>
>> >0-glusterfs-fuse: 7434342: STAT()<br>
>> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file<br>
>handle)<br>
>> >[2020-06-10 01:02:32.891835] W [fuse-bridge.c:897:fuse_attr_cbk]<br>
>> >0-glusterfs-fuse: 7434403: STAT()<br>
>> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file<br>
>handle)<br>
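>> ><br>
>> >One quick check might be whether that gfid still exists on the slave<br>
>> >bricks at all - a sketch, assuming the slave brick uses the same path<br>
>> >layout as the masters:<br>
>> ><br>
>> >ls -l /nodirectwritedata/gluster/gvol0/.glusterfs/d4/26/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba<br>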
>> ><br>
>> ><br>
>> ><br>
>> >On Tue, 9 Jun 2020 at 16:31, Strahil Nikolov <<a href="mailto:hunter86_bg@yahoo.com" target="_blank">hunter86_bg@yahoo.com</a>><br>
>> >wrote:<br>
>> ><br>
>> >> Hey David,<br>
>> >><br>
>> >> Can you check the cpu usage in sar on the rest of the cluster<br>
>> >> (going backwards from the day you found the high cpu usage), so we<br>
>> >> can know if this behaviour was observed on other nodes.<br>
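>> >><br>
>> >> For example - a sketch; the sysstat archive location varies by<br>
>> >> distro, e.g. /var/log/sysstat/saDD on Ubuntu or /var/log/sa/saDD on<br>
>> >> RHEL:<br>
>> >><br>
>> >> sar -u -f /var/log/sysstat/sa05    # CPU usage recorded for day 05<br>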
>> >><br>
>> >> Maybe that behaviour was "normal" for the push node (which could be<br>
>> >> another one).<br>
>> >><br>
>> >> As this script is python, I guess you can put some debug print<br>
>> >> statements in it.<br>
>> >><br>
>> >> Best Regards,<br>
>> >> Strahil Nikolov<br>
>> >><br>
>> >> On 9 June 2020 at 5:07:11 GMT+03:00, David Cunningham <<br>
>> >> <a href="mailto:dcunningham@voisonics.com" target="_blank">dcunningham@voisonics.com</a>> wrote:<br>
>> >> >Hi Sankarshan,<br>
>> >> ><br>
>> >> >Thanks for that. So what should we look for to figure out what<br>
>> >> >this process is doing? In<br>
>> >> >/var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log we<br>
>> >> >see something like the following logged regularly:<br>
>> >> ><br>
>> >> ><br>
>> >> >[[2020-06-09 02:01:19.670595] D [master(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):1454:changelogs_batch_process]<br>
>> >> >_GMaster:<br>
>> >> >processing changes<br>
>> >><br>
>> >><br>
>><br>
>><br>
>>>batch=['/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668040',<br>
>> >><br>
>> >><br>
>><br>
>><br>
>>>'/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668055']<br>
>> >> >[2020-06-09 02:01:19.674927] D [master(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):1289:process] _GMaster:<br>
>processing<br>
>> >> >change<br>
>> >> ><br>
>> >><br>
>> >><br>
>><br>
>><br>
>>>changelog=/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668040<br>
>> >> >[2020-06-09 02:01:19.683098] D [master(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):1170:process_change] _GMaster:<br>
>> >> >entries: []<br>
>> >> >[2020-06-09 02:01:19.695125] D [master(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):312:a_syncdata] _GMaster: files<br>
>> >> >files=set(['.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17',<br>
>> >> >'.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77',<br>
>> >> >'.gfid/779cd2b3-1571-446a-8903-48d6183d3dd0',<br>
>> >> >'.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435'])<br>
>> >> >[2020-06-09 02:01:19.695344] D [master(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster:<br>
>> >candidate<br>
>> >> >for<br>
>> >> >syncing file=.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17<br>
>> >> >[2020-06-09 02:01:19.695508] D [master(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster:<br>
>> >candidate<br>
>> >> >for<br>
>> >> >syncing file=.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77<br>
>> >> >[2020-06-09 02:01:19.695638] D [master(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster:<br>
>> >candidate<br>
>> >> >for<br>
>> >> >syncing file=.gfid/779cd2b3-1571-446a-8903-48d6183d3dd0<br>
>> >> >[2020-06-09 02:01:19.695759] D [master(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster:<br>
>> >candidate<br>
>> >> >for<br>
>> >> >syncing file=.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435<br>
>> >> >[2020-06-09 02:01:19.695883] D [master(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):1289:process] _GMaster:<br>
>processing<br>
>> >> >change<br>
>> >> ><br>
>> >><br>
>> >><br>
>><br>
>><br>
>>>changelog=/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668055<br>
>> >> >[2020-06-09 02:01:19.696170] D [master(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):1170:process_change] _GMaster:<br>
>> >> >entries: []<br>
>> >> >[2020-06-09 02:01:19.714097] D [master(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):312:a_syncdata] _GMaster: files<br>
>> >> >files=set(['.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17',<br>
>> >> >'.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77',<br>
>> >> >'.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435'])<br>
>> >> >[2020-06-09 02:01:19.714286] D [master(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster:<br>
>> >candidate<br>
>> >> >for<br>
>> >> >syncing file=.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17<br>
>> >> >[2020-06-09 02:01:19.714433] D [master(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster:<br>
>> >candidate<br>
>> >> >for<br>
>> >> >syncing file=.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77<br>
>> >> >[2020-06-09 02:01:19.714577] D [master(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster:<br>
>> >candidate<br>
>> >> >for<br>
>> >> >syncing file=.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435<br>
>> >> >[2020-06-09 02:01:20.179656] D [resource(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):1419:rsync] SSH: files:<br>
>> >> >.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17,<br>
>> >> >.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77,<br>
>> >> >.gfid/779cd2b3-1571-446a-8903-48d6183d3dd0,<br>
>> >> >.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435,<br>
>> >> >.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17,<br>
>> >> >.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77,<br>
>> >> >.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435<br>
>> >> >[2020-06-09 02:01:20.738632] I [master(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):1954:syncjob] Syncer: Sync Time<br>
>> >Taken<br>
>> >> >duration=0.5588 num_files=7 job=2 return_code=0<br>
>> >> >[2020-06-09 02:01:20.739650] D [master(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced<br>
>> >> > file=.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17<br>
>> >> >[2020-06-09 02:01:20.740041] D [master(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced<br>
>> >> > file=.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77<br>
>> >> >[2020-06-09 02:01:20.740200] D [master(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced<br>
>> >> > file=.gfid/779cd2b3-1571-446a-8903-48d6183d3dd0<br>
>> >> >[2020-06-09 02:01:20.740343] D [master(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced<br>
>> >> > file=.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435<br>
>> >> >[2020-06-09 02:01:20.740482] D [master(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced<br>
>> >> > file=.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17<br>
>> >> >[2020-06-09 02:01:20.740616] D [master(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced<br>
>> >> > file=.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77<br>
>> >> >[2020-06-09 02:01:20.740741] D [master(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced<br>
>> >> > file=.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435<br>
>> >> >[2020-06-09 02:01:20.745385] D [repce(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):196:push] RepceClient: call<br>
>> >> >22499:140085349934848:1591668080.75<br>
>> >><br>
>> >><br>
>><br>
>><br>
>>>done('/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668040',)<br>
>> >> >...<br>
>> >> >[2020-06-09 02:01:20.746252] D [repce(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):216:__call__] RepceClient: call<br>
>> >> >22499:140085349934848:1591668080.75 done -> None<br>
>> >> >[2020-06-09 02:01:20.746515] D [repce(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):196:push] RepceClient: call<br>
>> >> >22499:140085349934848:1591668080.75<br>
>> >><br>
>> >><br>
>><br>
>><br>
>>>done('/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668055',)<br>
>> >> >...<br>
>> >> >[2020-06-09 02:01:20.747045] D [repce(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):216:__call__] RepceClient: call<br>
>> >> >22499:140085349934848:1591668080.75 done -> None<br>
>> >> >[2020-06-09 02:01:31.364780] I [gsyncd(config-get):308:main]<br>
><top>:<br>
>> >> >Using<br>
>> >> >session config file<br>
>> >><br>
>><br>
>>>path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf<br>
>> >> >[2020-06-09 02:01:31.458915] I [gsyncd(status):308:main] <top>:<br>
>> >Using<br>
>> >> >session config file<br>
>> >><br>
>><br>
>>>path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf<br>
>> >> >[2020-06-09 02:01:32.156028] I [gsyncd(config-get):308:main]<br>
><top>:<br>
>> >> >Using<br>
>> >> >session config file<br>
>> >><br>
>><br>
>>>path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf<br>
>> >> >[2020-06-09 02:01:32.250082] I [gsyncd(status):308:main] <top>:<br>
>> >Using<br>
>> >> >session config file<br>
>> >><br>
>><br>
>>>path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf<br>
>> >> >[2020-06-09 02:01:33.778866] I [master(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):1384:process] _GMaster: Entry<br>
>Time<br>
>> >> >Taken<br>
>> >> > MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0<br>
>> >> >duration=0.0000 UNL=0<br>
>> >> >[2020-06-09 02:01:33.779000] I [master(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):1394:process] _GMaster:<br>
>> >Data/Metadata<br>
>> >> >Time Taken SETA=0 SETX=0 meta_duration=0.0000<br>
>> >> > data_duration=14.0839 DATA=11 XATT=0<br>
>> >> >[2020-06-09 02:01:33.779195] I [master(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):1404:process] _GMaster: Batch<br>
>> >> >Completed<br>
>> >> >changelog_end=1591668055 entry_stime=(1591668024, 0)<br>
>> >> >changelog_start=1591668040 stime=(1591668054, 0)<br>
>> >> >duration=14.1043<br>
>> >> > num_changelogs=2 mode=live_changelog<br>
>> >> >[2020-06-09 02:01:35.337511] I [gsyncd(config-get):308:main]<br>
><top>:<br>
>> >> >Using<br>
>> >> >session config file<br>
>> >><br>
>><br>
>>>path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf<br>
>> >> >[2020-06-09 02:01:35.432674] I [gsyncd(status):308:main] <top>:<br>
>> >Using<br>
>> >> >session config file<br>
>> >><br>
>><br>
>>>path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf<br>
>> >> >[2020-06-09 02:01:38.784449] D [master(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):551:crawlwrap] _GMaster: ...<br>
>crawl<br>
>> >> >#18195<br>
>> >> >done, took 32.533401 seconds<br>
>> >> >[2020-06-09 02:01:38.790709] D [repce(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):196:push] RepceClient: call<br>
>> >> >22499:140085349934848:1591668098.79 scan() ...<br>
>> >> >[2020-06-09 02:01:38.791342] D [repce(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):216:__call__] RepceClient: call<br>
>> >> >22499:140085349934848:1591668098.79 scan -> None<br>
>> >> >[2020-06-09 02:01:38.791479] D [repce(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):196:push] RepceClient: call<br>
>> >> >22499:140085349934848:1591668098.79 getchanges() ...<br>
>> >> >[2020-06-09 02:01:38.791989] D [repce(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):216:__call__] RepceClient: call<br>
>> >> >22499:140085349934848:1591668098.79 getchanges -><br>
>> >><br>
>> >><br>
>><br>
>><br>
>>>['/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668069',<br>
>> >><br>
>> >><br>
>><br>
>><br>
>>>'/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668083',<br>
>> >><br>
>> >><br>
>><br>
>><br>
>>>'/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668097']<br>
>> >> >[2020-06-09 02:01:38.792114] I [master(worker<br>
>> >> >/nodirectwritedata/gluster/gvol0):1470:crawl] _GMaster: slave's<br>
>time<br>
>> >> > stime=(1591668054, 0)<br>
>> >> ><br>
>> >> >Something very similar is logged around every 32 seconds, and I<br>
>> >> >notice the log above says the crawl took 32 seconds.<br>
>> >> ><br>
>> >> ><br>
>> >> >On Mon, 8 Jun 2020 at 16:22, sankarshan <<a href="mailto:sankarshan@kadalu.io" target="_blank">sankarshan@kadalu.io</a>><br>
>> >wrote:<br>
>> >> ><br>
>> >> >> Reading through the thread it occurs to me that it would be a<br>
>> >> >> stronger approach to understand the workload (a general<br>
>> >> >> description of the application) and, in terms of the releases of<br>
>> >> >> GlusterFS running, assess if there are new issues to be addressed<br>
>> >> >> or if existing sets of patches tend to work. Thanks for setting up<br>
>> >> >> the debug level in the log. High CPU usage by a geo-replication<br>
>> >> >> process would need to be traced back to why it really requires<br>
>> >> >> that %-age of CPU if it was not doing so previously.<br>
>> >> >><br>
>> >> >> On Mon, 8 Jun 2020 at 05:29, David Cunningham<br>
>> >> ><<a href="mailto:dcunningham@voisonics.com" target="_blank">dcunningham@voisonics.com</a>><br>
>> >> >> wrote:<br>
>> >> >> ><br>
>> >> >> > Hi Strahil,<br>
>> >> >> ><br>
>> >> >> > The CPU is still quite high, with "top" regularly showing 100%<br>
>> >> >> > CPU usage by that process. However it's not clear whether this<br>
>> >> >> > is really a problem, or if it's just normal geo-replication<br>
>> >> >> > activity. While CPU usage was not previously as high on this<br>
>> >> >> > server, it's not clear whether GlusterFS might have made this<br>
>> >> >> > the "push" node for geo-replication when it used to be some<br>
>> >> >> > other server. Having said that, no other server shows a sudden<br>
>> >> >> > drop in CPU usage.<br>
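>> >> >> ><br>
>> >> >> > One way to confirm which master is the active pusher should be<br>
>> >> >> > the geo-replication status output, which reports each master<br>
>> >> >> > node as Active or Passive:<br>
>> >> >> ><br>
>> >> >> > gluster volume geo-replication gvol0 nvfs10::gvol0 status<br>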
>> >> >> ><br>
>> >> >> > Ideally we could find out what the gsyncd process is doing, and<br>
>> >> >> > therefore whether it's expected or not. Any ideas on that? We<br>
>> >> >> > did set the log level to DEBUG as Sunny suggested.<br>
>> >> >> ><br>
>> >> >> > Looking at<br>
>> >> >> > /var/log/glusterfs/geo-replication-slaves/gvol0_nvfs10_gvol0/mnt-cafs30-nodirectwritedata-gluster-gvol0.log<br>
>> >> >> > on the geo-replication slave, we have a lot of lines like the<br>
>> >> >> > following logged.<br>
>> >> >> ><br>
>> >> >> > [2020-06-06 00:32:43.155856] W<br>
>[fuse-bridge.c:897:fuse_attr_cbk]<br>
>> >> >> 0-glusterfs-fuse: 875853: STAT()<br>
>> >> >> /.gfid/25d78a6c-41a9-4364-84a8-31f5571223ac => -1 (Stale file<br>
>> >handle)<br>
>> >> >> > [2020-06-06 00:32:43.219759] W<br>
>[fuse-bridge.c:897:fuse_attr_cbk]<br>
>> >> >> 0-glusterfs-fuse: 875923: STAT()<br>
>> >> >> /.gfid/25d78a6c-41a9-4364-84a8-31f5571223ac => -1 (Stale file<br>
>> >handle)<br>
>> >> >> > [2020-06-06 00:32:43.280357] W<br>
>[fuse-bridge.c:897:fuse_attr_cbk]<br>
>> >> >> 0-glusterfs-fuse: 876001: STAT()<br>
>> >> >> /.gfid/25d78a6c-41a9-4364-84a8-31f5571223ac => -1 (Stale file<br>
>> >handle)<br>
>> >> >> > The message "E [MSGID: 109040]<br>
>> >> >> [dht-helper.c:1332:dht_migration_complete_check_task]<br>
>0-gvol0-dht:<br>
>> >> >> /.gfid/25d78a6c-41a9-4364-84a8-31f5571223ac: failed to lookup<br>
>the<br>
>> >> >file on<br>
>> >> >> gvol0-dht [Stale file handle]" repeated 9 times between<br>
>> >[2020-06-06<br>
>> >> >> 00:32:42.689780] and [2020-06-06 00:32:43.280322]<br>
>> >> >> > [2020-06-06 09:03:03.660956] E [MSGID: 109040]<br>
>> >> >> [dht-helper.c:1332:dht_migration_complete_check_task]<br>
>0-gvol0-dht:<br>
>> >> >> /.gfid/decdd552-d58b-4ddf-b27a-0da9a6fbc38b: failed to lookup<br>
>the<br>
>> >> >file on<br>
>> >> >> gvol0-dht [Stale file handle]<br>
>> >> >> > [2020-06-06 09:03:03.661057] W<br>
>[fuse-bridge.c:897:fuse_attr_cbk]<br>
>> >> >> 0-glusterfs-fuse: 965375: STAT()<br>
>> >> >> /.gfid/decdd552-d58b-4ddf-b27a-0da9a6fbc38b => -1 (Stale file<br>
>> >handle)<br>
>> >> >> > [2020-06-06 09:03:10.258798] E [MSGID: 109040]<br>
>> >> >> [dht-helper.c:1332:dht_migration_complete_check_task]<br>
>0-gvol0-dht:<br>
>> >> >> /.gfid/1d90e129-d40f-4ca4-bea8-0bb3c4c7985a: failed to lookup<br>
>the<br>
>> >> >file on<br>
>> >> >> gvol0-dht [Stale file handle]<br>
>> >> >> > [2020-06-06 09:03:10.258880] W<br>
>[fuse-bridge.c:897:fuse_attr_cbk]<br>
>> >> >> 0-glusterfs-fuse: 969455: STAT()<br>
>> >> >> /.gfid/1d90e129-d40f-4ca4-bea8-0bb3c4c7985a => -1 (Stale file<br>
>> >handle)<br>
>> >> >> > [2020-06-06 09:09:41.259362] E [MSGID: 109040]<br>
>> >> >> [dht-helper.c:1332:dht_migration_complete_check_task]<br>
>0-gvol0-dht:<br>
>> >> >> /.gfid/740efff6-74dd-45be-847f-66919d1179e0: failed to lookup<br>
>the<br>
>> >> >file on<br>
>> >> >> gvol0-dht [Stale file handle]<br>
>> >> >> > [2020-06-06 09:09:41.259458] W<br>
>[fuse-bridge.c:897:fuse_attr_cbk]<br>
>> >> >> 0-glusterfs-fuse: 1040904: STAT()<br>
>> >> >> /.gfid/740efff6-74dd-45be-847f-66919d1179e0 => -1 (Stale file<br>
>> >handle)<br>
>> >> >> ><br>
>> >> >> > Could these errors be part of the problem?<br>
>> >> >> ><br>
>> >> >> > Thanks again.<br>
>> >> >> ><br>
>> >> >> ><br>
>> >> >> > On Sat, 6 Jun 2020 at 21:21, Strahil Nikolov<br>
>> >> ><<a href="mailto:hunter86_bg@yahoo.com" target="_blank">hunter86_bg@yahoo.com</a>><br>
>> >> >> wrote:<br>
>> >> >> >><br>
>> >> >> >> Hey David,<br>
>> >> >> >><br>
>> >> >> >> Can you check the old logs for gfid mismatch and get a list of<br>
>> >> >> >> files that were causing the high cpu.<br>
>> >> >> >> Maybe they are related somehow (maybe created by the same<br>
>> >> >> >> software, same client version or something else), which could<br>
>> >> >> >> help with that.<br>
>> >> >> >><br>
>> >> >> >> Also take a look in geo-replication-slave logs.<br>
>> >> >> >><br>
>> >> >> >> Does the issue still occur?<br>
>> >> >> >><br>
>> >> >> >> Best Regards,<br>
>> >> >> >> Strahil Nikolov<br>
>> >> >> >><br>
>> >> >> On 6 June 2020 at 1:21:55 GMT+03:00, David Cunningham <<br>
>> >> >> <a href="mailto:dcunningham@voisonics.com" target="_blank">dcunningham@voisonics.com</a>> wrote:<br>
>> >> >> >> >Hi Sunny and Strahil,<br>
>> >> >> >> ><br>
>> >> >> >> >Thanks again for your responses. We don't have a lot of<br>
>> >> >> >> >renaming activity - maybe some, but not a lot. We do have<br>
>> >> >> >> >files which are open for writing for quite a while - they're<br>
>> >> >> >> >call recordings being written as the call happens.<br>
>> >> >> >> ><br>
>> >> >> >> >We've installed GlusterFS using the Ubuntu packages and I'd<br>
>> >> >> >> >really like to avoid compiling and applying patches.<br>
>> >> >> >> ><br>
>> >> >> >> >After enabling DEBUG the log at<br>
>> >> >> >> >/var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log<br>
>> >> >> >> >doesn't seem to show anything very unusual:<br>
>> >> >> >> ><br>
>> >> >> >> >[2020-06-03 02:49:01.992177] I [master(worker<br>
>> >> >> >> >/nodirectwritedata/gluster/gvol0):1384:process] _GMaster:<br>
>> >Entry<br>
>> >> >Time<br>
>> >> >> >> >Taken<br>
>> >> >> >> > MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0CRE=0<br>
>> >> >duration=0.0000<br>
>> >> >> >> >UNL=0<br>
>> >> >> >> >[2020-06-03 02:49:01.992465] I [master(worker<br>
>> >> >> >> >/nodirectwritedata/gluster/gvol0):1394:process] _GMaster:<br>
>> >> >Data/Metadata<br>
>> >> >> >> >Time Taken SETA=0 SETX=0<br>
>> >> >> >> >meta_duration=0.0000data_duration=13.0954<br>
>> >> >> >> > DATA=8 XATT=0<br>
>> >> >> >> >[2020-06-03 02:49:01.992863] I [master(worker<br>
>> >> >> >> >/nodirectwritedata/gluster/gvol0):1404:process] _GMaster:<br>
>> >Batch<br>
>> >> >> >> >Completed<br>
>> >> >> >> >changelog_end=1591152508 entry_stime=(1591152352, 0)<br>
>> >> >> >> >changelog_start=1591152494 stime=(1591152507, 0)<br>
>> >> >> >> >duration=13.1077<br>
>> >> >> >> > num_changelogs=2 mode=live_changelog<br>
>> >> >> >> >[2020-06-03 02:49:02.958687] D [repce(worker<br>
>> >> >> >> >/nodirectwritedata/gluster/gvol0):196:push] RepceClient:<br>
>call<br>
>> >> >> >> >19017:139678812452608:1591152542.96 keep_alive({'version':<br>
>(1,<br>
>> >> >0),<br>
>> >> >> >> >'retval': 0, 'uuid':<br>
>'8ee85fae-f3aa-4285-ad48-67a1dc17ed73',<br>
>> >> >'timeout':<br>
>> >> >> >> >1591152662, 'volume_mark': (1583043396, 161632)},) ...<br>
>> >> >> >> >[2020-06-03 02:49:02.979139] D [repce(worker<br>
>> >> >> >> >/nodirectwritedata/gluster/gvol0):216:__call__]<br>
>RepceClient:<br>
>> >call<br>
>> >> >> >> >19017:139678812452608:1591152542.96 keep_alive -> 28<br>
>> >> >> >> >[2020-06-03 02:49:06.998127] D [master(worker<br>
>> >> >> >> >/nodirectwritedata/gluster/gvol0):551:crawlwrap] _GMaster:<br>
>...<br>
>> >> >crawl<br>
>> >> >> >> >#114<br>
>> >> >> >> >done, took 30.180089 seconds<br>
>> >> >> >> >[2020-06-03 02:49:07.10132] D [repce(worker<br>
>> >> >> >> >/nodirectwritedata/gluster/gvol0):196:push] RepceClient:<br>
>call<br>
>> >> >> >> >19017:139679441716992:1591152547.01 scan() ...<br>
>> >> >> >> >[2020-06-03 02:49:07.10781] D [repce(worker<br>
>> >> >> >> >/nodirectwritedata/gluster/gvol0):216:__call__]<br>
>RepceClient:<br>
>> >call<br>
>> >> >> >> >19017:139679441716992:1591152547.01 scan -> None<br>
>> >> >> >> >[2020-06-03 02:49:07.10935] D [repce(worker<br>
>> >> >> >> >/nodirectwritedata/gluster/gvol0):196:push] RepceClient:<br>
>call<br>
>> >> >> >> >19017:139679441716992:1591152547.01 getchanges() ...<br>
>> >> >> >> >[2020-06-03 02:49:07.11579] D [repce(worker<br>
>> >> >> >> >/nodirectwritedata/gluster/gvol0):216:__call__]<br>
>RepceClient:<br>
>> >call<br>
>> >> >> >> >19017:139679441716992:1591152547.01 getchanges -><br>
>> >> >> >><br>
>> >> >><br>
>> >><br>
>> >><br>
>><br>
>><br>
>>>>['/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591152522']<br>
>> >> >> >> >[2020-06-03 02:49:07.11720] I [master(worker<br>
>> >> >> >> >/nodirectwritedata/gluster/gvol0):1470:crawl] _GMaster:<br>
>> >slave's<br>
>> >> >time<br>
>> >> >> >> >stime=(1591152507, 0)<br>
>> >> >> >> ><br>
>> >> >> >> >Am I looking in the right place to find out what that<br>
>> >> >> >> >gsyncd.py process is doing?<br>
>> >> >> >> ><br>
>> >> >> >> ><br>
>> >> >> >> >On Tue, 2 Jun 2020 at 21:58, Sunny Kumar <<a href="mailto:sunkumar@redhat.com" target="_blank">sunkumar@redhat.com</a>> wrote:<br>
>> >> >> >> ><br>
>> >> >> >> >> Hi David,<br>
>> >> >> >> >><br>
>> >> >> >> >> You haven't answered my previous question regarding the<br>
>> >> >> >> >> type of your workload.<br>
>> >> >> >> >> ---<br>
>> >> >> >> >> You can use the below command to enable debug log.<br>
>> >> >> >> >><br>
>> >> >> >> >> `gluster vol geo-rep <mastervol> <slavehost>::<slavevol><br>
>> >> >> >> >> config log-level DEBUG`<br>
>> >> >> >> >><br>
>> >> >> >> >> and after capturing the log, switch back to INFO mode:<br>
>> >> >> >> >><br>
>> >> >> >> >> `gluster vol geo-rep <mastervol> <slavehost>::<slavevol><br>
>> >> >> >> >> config log-level INFO`<br>
>> >> >> >> >><br>
>> >> >> >> >> Please share the debug log and geo-rep config to debug<br>
>> >> >> >> >> further; for the config:<br>
>> >> >> >> >><br>
>> >> >> >> >> `gluster vol geo-rep <mastervol> <slavehost>::<slavevol><br>
>> >> >> >> >> config`<br>
>> >> >> >> >><br>
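>> >> >> >> >> For the session in this thread that should look something<br>
>> >> >> >> >> like:<br>
>> >> >> >> >><br>
>> >> >> >> >> `gluster vol geo-rep gvol0 nvfs10::gvol0 config log-level DEBUG`<br>
>> >> >> >> >> `gluster vol geo-rep gvol0 nvfs10::gvol0 config`<br>
>> >> >> >> >><br>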
>> >> >> >> >> /sunny<br>
>> >> >> >> >><br>
>> >> >> >> >><br>
>> >> >> >> >> On Tue, Jun 2, 2020 at 10:18 AM Strahil Nikolov<br>
>> >> >> >> ><<a href="mailto:hunter86_bg@yahoo.com" target="_blank">hunter86_bg@yahoo.com</a>><br>
>> >> >> >> >> wrote:<br>
>> >> >> >> >> ><br>
>> >> >> >> >> > Hi David,<br>
>> >> >> >> >> ><br>
>> >> >> >> >> > In which log do you see the entries?<br>
>> >> >> >> >> ><br>
>> >> >> >> >> > I think I got an explanation why you see the process only<br>
>> >> >> >> >> > on one of the master nodes - a geo-rep session is<br>
>> >> >> >> >> > established from only 1 master node /I hope someone<br>
>> >> >> >> >> > corrects me if I'm wrong/ to one slave node. Thus it will<br>
>> >> >> >> >> > be natural to see the high CPU usage on only 1 master node<br>
>> >> >> >> >> > in your situation.<br>
>> >> >> >> >> ><br>
>> >> >> >> >> > Do you see anything else in<br>
>> >> >> >> >> > /var/log/glusterfs/geo-replication/<logs> (master nodes) or<br>
>> >> >> >> >> > in /var/log/glusterfs/geo-replication-slaves (slaves) that<br>
>> >> >> >> >> > could hint at the exact issue? I have a vague feeling that<br>
>> >> >> >> >> > that python script is constantly looping over some data,<br>
>> >> >> >> >> > causing the CPU hog.<br>
>> >> >> >> >> ><br>
>> >> >> >> >> > Sadly, I can't find an instruction for increasing the log<br>
>> >> >> >> >> > level of the geo-rep log.<br>
>> >> >> >> >> ><br>
>> >> >> >> >> ><br>
>> >> >> >> >> > Best Regards,<br>
>> >> >> >> >> > Strahil Nikolov<br>
>> >> >> >> >> ><br>
>> >> >> >> >> ><br>
>> >> >> >> >> > On 2 June 2020 at 6:14:46 GMT+03:00, David Cunningham <<br>
>> >> >> >> >> <a href="mailto:dcunningham@voisonics.com" target="_blank">dcunningham@voisonics.com</a>> wrote:<br>
>> >> >> >> >> > >Hi Strahil and Sunny,<br>
>> >> >> >> >> > ><br>
>> >> >> >> >> > >Thank you for the replies. I checked the gfid on the<br>
>> >> >> >> >> > >master and slaves and they are the same. After moving the<br>
>> >> >> >> >> > >file away and back again it doesn't seem to be having the<br>
>> >> >> >> >> > >issue with that file any more.<br>
>> >> >> >> >> > ><br>
>> >> >> >> >> > >We are still getting higher CPU usage on one of the<br>
>> >> >> >> >> > >master nodes than the others. It logs this every few<br>
>> >> >> >> >> > >seconds:<br>
>> >> >> >> >> > ><br>
>> >> >> >> >> > >[2020-06-02 03:10:15.637815] I [master(worker<br>
>> >> >> >> >> > >/nodirectwritedata/gluster/gvol0):1384:process]<br>
>_GMaster:<br>
>> >> >Entry<br>
>> >> >> >> >Time<br>
>> >> >> >> >> > >Taken<br>
>> >> >> >> >> > > MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0CRE=0<br>
>> >> >> >> >duration=0.0000<br>
>> >> >> >> >> > >UNL=0<br>
>> >> >> >> >> > >[2020-06-02 03:10:15.638010] I [master(worker<br>
>> >> >> >> >> > >/nodirectwritedata/gluster/gvol0):1394:process]<br>
>_GMaster:<br>
>> >> >> >> >Data/Metadata<br>
>> >> >> >> >> > >Time Taken SETA=0 SETX=0<br>
>> >> >> >> >> > >meta_duration=0.0000data_duration=12.7878<br>
>> >> >> >> >> > > DATA=4 XATT=0<br>
>> >> >> >> >> > >[2020-06-02 03:10:15.638286] I [master(worker<br>
>> >> >> >> >> > >/nodirectwritedata/gluster/gvol0):1404:process]<br>
>_GMaster:<br>
>> >> >Batch<br>
>> >> >> >> >> > >Completed<br>
>> >> >> >> >> > >changelog_end=1591067378 <br>
>entry_stime=(1591067167,<br>
>> >0)<br>
>> >> >> >> >> > >changelog_start=1591067364 stime=(1591067377, 0)<br>
>> >> >> >> >> > >duration=12.8068<br>
>> >> >> >> >> > > num_changelogs=2 mode=live_changelog<br>
>> >> >> >> >> > >[2020-06-02 03:10:20.658601] I [master(worker<br>
>> >> >> >> >> > >/nodirectwritedata/gluster/gvol0):1470:crawl]<br>
>_GMaster:<br>
>> >> >slave's<br>
>> >> >> >> >time<br>
>> >> >> >> >> > > stime=(1591067377, 0)<br>
>> >> >> >> >> > >[2020-06-02 03:10:34.21799] I [master(worker<br>
>> >> >> >> >> > >/nodirectwritedata/gluster/gvol0):1954:syncjob]<br>
>Syncer:<br>
>> >Sync<br>
>> >> >Time<br>
>> >> >> >> >Taken<br>
>> >> >> >> >> > > duration=0.3826 num_files=8 job=1 return_code=0<br>
>> >> >> >> >> > >[2020-06-02 03:10:46.440535] I [master(worker<br>
>> >> >> >> >> > >/nodirectwritedata/gluster/gvol0):1384:process]<br>
>_GMaster:<br>
>> >> >Entry<br>
>> >> >> >> >Time<br>
>> >> >> >> >> > >Taken<br>
>> >> >> >> >> > > MKD=0 MKN=0 LIN=0 SYM=0 REN=1 RMD=0CRE=2<br>
>> >> >> >> >duration=0.1314<br>
>> >> >> >> >> > >UNL=1<br>
>> >> >> >> >> > >[2020-06-02 03:10:46.440809] I [master(worker<br>
>> >> >> >> >> > >/nodirectwritedata/gluster/gvol0):1394:process]<br>
>_GMaster:<br>
>> >> >> >> >Data/Metadata<br>
>> >> >> >> >> > >Time Taken SETA=0 SETX=0<br>
>> >> >> >> >> > >meta_duration=0.0000data_duration=13.0171<br>
>> >> >> >> >> > > DATA=14 XATT=0<br>
>> >> >> >> >> > >[2020-06-02 03:10:46.441205] I [master(worker<br>
>> >> >> >> >> > >/nodirectwritedata/gluster/gvol0):1404:process]<br>
>_GMaster:<br>
>> >> >Batch<br>
>> >> >> >> >> > >Completed<br>
>> >> >> >> >> > >changelog_end=1591067420 <br>
>entry_stime=(1591067419,<br>
>> >0)<br>
>> >> >> >> >> > >changelog_start=1591067392 stime=(1591067419, 0)<br>
>> >> >> >> >> > >duration=13.0322<br>
>> >> >> >> >> > > num_changelogs=3 mode=live_changelog<br>
>> >> >> >> >> > >[2020-06-02 03:10:51.460925] I [master(worker<br>
>> >> >> >> >> > >/nodirectwritedata/gluster/gvol0):1470:crawl]<br>
>_GMaster:<br>
>> >> >slave's<br>
>> >> >> >> >time<br>
>> >> >> >> >> > > stime=(1591067419, 0)<br>
>> >> >> >> >> > ><br>
>> >> >> >> >> > >[2020-06-02 03:11:04.448913] I [master(worker<br>
>> >> >> >> >> > >/nodirectwritedata/gluster/gvol0):1954:syncjob]<br>
>Syncer:<br>
>> >Sync<br>
>> >> >Time<br>
>> >> >> >> >Taken<br>
>> >> >> >> >> > >duration=0.3466 num_files=3 job=1 return_code=0<br>
>> >> >> >> >> > ><br>
>> >> >> >> >> > >Whereas the other master nodes only log this:<br>
>> >> >> >> >> > ><br>
>> >> >> >> >> > >[2020-06-02 03:11:33.886938] I<br>
>> >[gsyncd(config-get):308:main]<br>
>> >> >> >> ><top>:<br>
>> >> >> >> >> > >Using<br>
>> >> >> >> >> > >session config file<br>
>> >> >> >> >> ><br>
>> >> >> >><br>
>> >><br>
>><br>
>>>>>path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf<br>
>> >> >> >> >> > >[2020-06-02 03:11:33.993175] I<br>
>[gsyncd(status):308:main]<br>
>> >> ><top>:<br>
>> >> >> >> >Using<br>
>> >> >> >> >> > >session config file<br>
>> >> >> >> >> ><br>
>> >> >> >><br>
>> >><br>
>><br>
>>>>>path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf<br>
>> >> >> >> >> > ><br>
>> >> >> >> >> > >Can anyone help with what might cause the high CPU usage<br>
>> >> >> >> >> > >on one master node? The process is this one, and is using<br>
>> >> >> >> >> > >70-100% of CPU:<br>
>> >> >> >> >> > ><br>
>> >> >> >> >> > >python2<br>
>> >> >> >> >> > >/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py<br>
>> >> >> >> >> > >worker gvol0 nvfs10::gvol0 --feedback-fd 15 --local-path<br>
>> >> >> >> >> > >/nodirectwritedata/gluster/gvol0 --local-node cafs30<br>
>> >> >> >> >> > >--local-node-id b7521445-ee93-4fed-8ced-6a609fa8c7d4<br>
>> >> >> >> >> > >--slave-id cdcdb210-839c-4306-a4dc-e696b165ed17 --rpc-fd<br>
>> >> >> >> >> > >12,11,9,13 --subvol-num 1 --resource-remote nvfs30<br>
>> >> >> >> >> > >--resource-remote-id 1e698ccd-aeec-4ec4-96fe-383da8fc3b78<br>
>> >> >> >> >> > ><br>
>> >> >> >> >> > >Thank you in advance!<br>
>> >> >> >> >> > ><br>
>> >> >> >> >> > ><br>
>> >> >> >> >> > ><br>
>> >> >> >> >> > ><br>
>> >> >> >> >> > >On Sat, 30 May 2020 at 20:20, Strahil Nikolov<br>
>> >> >> >> ><<a href="mailto:hunter86_bg@yahoo.com" target="_blank">hunter86_bg@yahoo.com</a>><br>
>> >> >> >> >> > >wrote:<br>
>> >> >> >> >> > ><br>
>> >> >> >> >> > >> Hey David,<br>
>> >> >> >> >> > >><br>
>> >> >> >> >> > >> For me a gfid mismatch means that the file was<br>
>> >> >> >> >> > >> replaced/recreated - just like vim in linux does (and it<br>
>> >> >> >> >> > >> is expected for a config file).<br>
>> >> >> >> >> > >><br>
>> >> >> >> >> > >> Have you checked the gfid of the file on both source<br>
>> >> >> >> >> > >> and destination - do they really match, or are they<br>
>> >> >> >> >> > >> different?<br>
>> >> >> >> >> > >><br>
>> >> >> >> >> > >> What happens when you move the file away from the<br>
>> >> >> >> >> > >> slave - does it fix the issue?<br>
>> >> >> >> >> > >><br>
>> >> >> >> >> > >> Best Regards,<br>
>> >> >> >> >> > >> Strahil Nikolov<br>
>> >> >> >> >> > >><br>
>> >> >> >> >> > >> On 30 May 2020 at 1:10:56 GMT+03:00, David Cunningham<br>
>> >> >> >> >> > >> <<a href="mailto:dcunningham@voisonics.com" target="_blank">dcunningham@voisonics.com</a>> wrote:<br>
>> >> >> >> >> > >> >Hello,<br>
>> >> >> >> >> > >> ><br>
>> >> >> >> >> > >> >We're having an issue with a geo-replication process<br>
>> >> >> >> >> > >> >with unusually high CPU use and giving "Entry not<br>
>> >> >> >> >> > >> >present on master. Fixing gfid mismatch in slave"<br>
>> >> >> >> >> > >> >errors. Can anyone help on this?<br>
>> >> >> >> >> > >> ><br>
>> >> >> >> >> > >> >We have 3 GlusterFS replica nodes (we'll call the<br>
>> >> >> >> >> > >> >master), which also push data to a remote server<br>
>> >> >> >> >> > >> >(slave) using geo-replication. This has been running<br>
>> >> >> >> >> > >> >fine for a couple of months, but yesterday one of the<br>
>> >> >> >> >> > >> >master nodes started having unusually high CPU use.<br>
>> >> >> >> >> > >> >It's this process:<br>
>> >> >> >> >> > >> ><br>
>> >> >> >> >> > >> >root@cafs30:/var/log/glusterfs# ps aux | grep 32048<br>
>> >> >> >> >> > >> >root 32048 68.7 0.6 1843140 845756 ? Rl 02:51 493:51<br>
>> >> >> >> >> > >> >python2<br>
>> >> >> >> >> > >> >/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py<br>
>> >> >> >> >> > >> >worker gvol0 nvfs10::gvol0 --feedback-fd 15 --local-path<br>
>> >> >> >> >> > >> >/nodirectwritedata/gluster/gvol0 --local-node cafs30<br>
>> >> >> >> >> > >> >--local-node-id b7521445-ee93-4fed-8ced-6a609fa8c7d4<br>
>> >> >> >> >> > >> >--slave-id cdcdb210-839c-4306-a4dc-e696b165ed17<br>
>> >> >> >> >> > >> >--rpc-fd 12,11,9,13 --subvol-num 1 --resource-remote<br>
>> >> >> >> >> > >> >nvfs30 --resource-remote-id<br>
>> >> >> >> >> > >> >1e698ccd-aeec-4ec4-96fe-383da8fc3b78<br>
>> >> >> >> >> > >> ><br>
>> >> >> >> >> > >> >Here's what is being logged in<br>
>> >> >> >> >> > >> >/var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log:<br>
>> >> >> >> >> > >> ><br>
>> >> >> >> >> > >> >[2020-05-29 21:57:18.843524] I [master(worker<br>
>> >> >> >> >> > >> >/nodirectwritedata/gluster/gvol0):1470:crawl]<br>
>> >_GMaster:<br>
>> >> >slave's<br>
>> >> >> >> >time<br>
>> >> >> >> >> > >> > stime=(1590789408, 0)<br>
>> >> >> >> >> > >> >[2020-05-29 21:57:30.626172] I [master(worker<br>
>> >> >> >> >> > >><br>
>> >> >> >><br>
>> >><br>
>>>>/nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures]<br>
>> >> >> >> >> > >> >_GMaster: Entry not present on master. Fixing gfid<br>
>> >> >mismatch in<br>
>> >> >> >> >> > >slave.<br>
>> >> >> >> >> > >> >Deleting the entry retry_count=1 <br>
>entry=({u'uid':<br>
>> >> >108,<br>
>> >> >> >> >u'gfid':<br>
>> >> >> >> >> > >> >u'7c0b75e5-d8b7-454f-8010-112d613c599e', u'gid':<br>
>117,<br>
>> >> >u'mode':<br>
>> >> >> >> >> > >33204,<br>
>> >> >> >> >> > >> >u'entry':<br>
>> >> >> >> >> > >><br>
>> >> >>u'.gfid/c5422396-1578-4b50-a29d-315be2a9c5d8/00a859f7xxxx.cfg',<br>
>> >> >> >> >> > >> >u'op': u'CREATE'}, 17, {u'slave_isdir': False,<br>
>> >> >> >> >u'gfid_mismatch':<br>
>> >> >> >> >> > >True,<br>
>> >> >> >> >> > >> >u'slave_name': None, u'slave_gfid':<br>
>> >> >> >> >> > >> >u'ec4b0ace-2ec4-4ea5-adbc-9f519b81917c',<br>
>> >> >u'name_mismatch':<br>
>> >> >> >> >False,<br>
>> >> >> >> >> > >> >u'dst':<br>
>> >> >> >> >> > >> >False})<br>
>> >> >> >> >> > >> >[2020-05-29 21:57:30.627893] I [master(worker<br>
>> >> >> >> >> > >><br>
>> >> >> >><br>
>> >><br>
>>>>/nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures]<br>
>> >> >> >> >> > >> >_GMaster: Entry not present on master. Fixing gfid<br>
>> >> >mismatch in<br>
>> >> >> >> >> > >slave.<br>
>> >> >> >> >> > >> >Deleting the entry retry_count=1 <br>
>entry=({u'uid':<br>
>> >> >108,<br>
>> >> >> >> >u'gfid':<br>
>> >> >> >> >> > >> >u'a4d52e40-2e2f-4885-be5f-65fe95a8ebd7', u'gid':<br>
>117,<br>
>> >> >u'mode':<br>
>> >> >> >> >> > >33204,<br>
>> >> >> >> >> > >> >u'entry':<br>
>> >> >> >> >> > >><br>
>> >> >> >> >> ><br>
>> >> >> >><br>
>> >> >><br>
>> >><br>
>><br>
>><br>
>>>>>>u'.gfid/f857c42e-22f1-4ce4-8f2e-13bdadedde45/polycom_00a859f7xxxx.cfg',<br>
>> >> >> >> >> > >> >u'op': u'CREATE'}, 17, {u'slave_isdir': False,<br>
>> >> >> >> >u'gfid_mismatch':<br>
>> >> >> >> >> > >True,<br>
>> >> >> >> >> > >> >u'slave_name': None, u'slave_gfid':<br>
>> >> >> >> >> > >> >u'ece8da77-b5ea-45a7-9af7-7d4d8f55f74a',<br>
>> >> >u'name_mismatch':<br>
>> >> >> >> >False,<br>
>> >> >> >> >> > >> >u'dst':<br>
>> >> >> >> >> > >> >False})<br>
>> >> >> >> >> > >> >[2020-05-29 21:57:30.629532] I [master(worker<br>
>> >> >> >> >> > >><br>
>> >> >> >><br>
>> >><br>
>>>>/nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures]<br>
>> >> >> >> >> > >> >_GMaster: Entry not present on master. Fixing gfid<br>
>> >> >mismatch in<br>
>> >> >> >> >> > >slave.<br>
>> >> >> >> >> > >> >Deleting the entry retry_count=1 <br>
>entry=({u'uid':<br>
>> >> >108,<br>
>> >> >> >> >u'gfid':<br>
>> >> >> >> >> > >> >u'3c525ad8-aeb2-46b6-9c41-7fb4987916f8', u'gid':<br>
>117,<br>
>> >> >u'mode':<br>
>> >> >> >> >> > >33204,<br>
>> >> >> >> >> > >> >u'entry':<br>
>> >> >> >> >> > >><br>
>> >> >> >> >> ><br>
>> >> >> >> >><br>
>> >> >> >><br>
>> >> >><br>
>> >><br>
>> >><br>
>><br>
>><br>
>>>>>>u'.gfid/f857c42e-22f1-4ce4-8f2e-13bdadedde45/00a859f7xxxx-directory.xml',<br>
>> >> >> >> >> > >> >u'op': u'CREATE'}, 17, {u'slave_isdir': False,<br>
>> >> >> >> >u'gfid_mismatch':<br>
>> >> >> >> >> > >True,<br>
>> >> >> >> >> > >> >u'slave_name': None, u'slave_gfid':<br>
>> >> >> >> >> > >> >u'06717b5a-d842-495d-bd25-aab9cd454490',<br>
>> >> >u'name_mismatch':<br>
>> >> >> >> >False,<br>
>> >> >> >> >> > >> >u'dst':<br>
>> >> >> >> >> > >> >False})<br>
>> >> >> >> >> > >> >[2020-05-29 21:57:30.659123] I [master(worker<br>
>> >> >> >> >> > >><br>
>> >> >>/nodirectwritedata/gluster/gvol0):942:handle_entry_failures]<br>
>> >> >> >> >> > >_GMaster:<br>
>> >> >> >> >> > >> >Sucessfully fixed entry ops with gfid mismatch<br>
>> >> >> >> >retry_count=1<br>
>> >> >> >> >> > >> >[2020-05-29 21:57:30.659343] I [master(worker<br>
>> >> >> >> >> > >><br>
>>/nodirectwritedata/gluster/gvol0):1194:process_change]<br>
>> >> >> >> >_GMaster:<br>
>> >> >> >> >> > >Retry<br>
>> >> >> >> >> > >> >original entries. count = 1<br>
>> >> >> >> >> > >> >[2020-05-29 21:57:30.725810] I [master(worker<br>
>> >> >> >> >> > >><br>
>>/nodirectwritedata/gluster/gvol0):1197:process_change]<br>
>> >> >> >> >_GMaster:<br>
>> >> >> >> >> > >> >Sucessfully fixed all entry ops with gfid mismatch<br>
>> >> >> >> >> > >> >[2020-05-29 21:57:31.747319] I [master(worker<br>
>> >> >> >> >> > >> >/nodirectwritedata/gluster/gvol0):1954:syncjob]<br>
>> >Syncer:<br>
>> >> >Sync<br>
>> >> >> >> >Time<br>
>> >> >> >> >> > >Taken<br>
>> >> >> >> >> > >> >duration=0.7409 num_files=18 job=1 <br>
>return_code=0<br>
>> >> >> >> >> > >> ><br>
>> >> >> >> >> > >> >We've verified that the files like<br>
>> >> >> >> >> > >> >polycom_00a859f7xxxx.cfg referred to in the error do<br>
>> >> >> >> >> > >> >exist on the master nodes and slave.<br>
>> >> >> >> >> > >> ><br>
>> >> >> >> >> > >> >We found this bug fix:<br>
>> >> >> >> >> > >> ><a href="https://bugzilla.redhat.com/show_bug.cgi?id=1642865" rel="noreferrer" target="_blank">https://bugzilla.redhat.com/show_bug.cgi?id=1642865</a><br>
>> >> >> >> >> > >> ><br>
>> >> >> >> >> > >> >However that fix went in 5.1, and we're running 5.12<br>
>> >> >> >> >> > >> >on the master nodes and slave. A couple of GlusterFS<br>
>> >> >> >> >> > >> >clients connected to the master nodes are running 5.13.<br>
>> >> >> >> >> > >> ><br>
>> >> >> >> >> > >> >Would anyone have any suggestions? Thank you in<br>
>> >> >> >> >> > >> >advance.<br>
>> >> >> >> >> > >><br>
>> >> >> >> >><br>
>> >> >> >> >><br>
>> >> >> ><br>
>> >> >> ><br>
>> >> >> ><br>
>> >> >> > --<br>
>> >> >> > David Cunningham, Voisonics Limited<br>
>> >> >> > <a href="http://voisonics.com/" rel="noreferrer" target="_blank">http://voisonics.com/</a><br>
>> >> >> > USA: +1 213 221 1092<br>
>> >> >> > New Zealand: +64 (0)28 2558 3782<br>
>> >> >><br>
>> >> >><br>
>> >> >><br>
>> >> >> --<br>
>> >> >> <a href="mailto:sankarshan@kadalu.io" target="_blank">sankarshan@kadalu.io</a> | TZ: UTC+0530 | +91 99606 03294<br>
>> >> >> <a href="http://kadalu.io" rel="noreferrer" target="_blank">kadalu.io</a> : Making it easy to provision storage in k8s!<br>
>> >> >><br>
>> >><br>
>><br>
</blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div>David Cunningham, Voisonics Limited<br><a href="http://voisonics.com/" target="_blank">http://voisonics.com/</a><br>USA: +1 213 221 1092<br>New Zealand: +64 (0)28 2558 3782</div></div></div></div></div></div></div></div></div></div></div>