[Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave
    David Cunningham 
    dcunningham at voisonics.com
       
    Tue Jun  9 02:07:11 UTC 2020
    
    
  
Hi Sankarshan,
Thanks for that. So what should we look for to figure out what this process
is doing? In
/var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log we see
something like the following logged regularly:
[[2020-06-09 02:01:19.670595] D [master(worker
/nodirectwritedata/gluster/gvol0):1454:changelogs_batch_process] _GMaster:
processing changes
batch=['/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668040',
'/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668055']
[2020-06-09 02:01:19.674927] D [master(worker
/nodirectwritedata/gluster/gvol0):1289:process] _GMaster: processing change
changelog=/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668040
[2020-06-09 02:01:19.683098] D [master(worker
/nodirectwritedata/gluster/gvol0):1170:process_change] _GMaster: entries: []
[2020-06-09 02:01:19.695125] D [master(worker
/nodirectwritedata/gluster/gvol0):312:a_syncdata] _GMaster: files
files=set(['.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17',
'.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77',
'.gfid/779cd2b3-1571-446a-8903-48d6183d3dd0',
'.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435'])
[2020-06-09 02:01:19.695344] D [master(worker
/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate for
syncing file=.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17
[2020-06-09 02:01:19.695508] D [master(worker
/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate for
syncing file=.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77
[2020-06-09 02:01:19.695638] D [master(worker
/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate for
syncing file=.gfid/779cd2b3-1571-446a-8903-48d6183d3dd0
[2020-06-09 02:01:19.695759] D [master(worker
/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate for
syncing file=.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435
[2020-06-09 02:01:19.695883] D [master(worker
/nodirectwritedata/gluster/gvol0):1289:process] _GMaster: processing change
changelog=/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668055
[2020-06-09 02:01:19.696170] D [master(worker
/nodirectwritedata/gluster/gvol0):1170:process_change] _GMaster: entries: []
[2020-06-09 02:01:19.714097] D [master(worker
/nodirectwritedata/gluster/gvol0):312:a_syncdata] _GMaster: files
files=set(['.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17',
'.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77',
'.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435'])
[2020-06-09 02:01:19.714286] D [master(worker
/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate for
syncing file=.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17
[2020-06-09 02:01:19.714433] D [master(worker
/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate for
syncing file=.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77
[2020-06-09 02:01:19.714577] D [master(worker
/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster: candidate for
syncing file=.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435
[2020-06-09 02:01:20.179656] D [resource(worker
/nodirectwritedata/gluster/gvol0):1419:rsync] SSH: files:
.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17,
.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77,
.gfid/779cd2b3-1571-446a-8903-48d6183d3dd0,
.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435,
.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17,
.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77,
.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435
[2020-06-09 02:01:20.738632] I [master(worker
/nodirectwritedata/gluster/gvol0):1954:syncjob] Syncer: Sync Time Taken
duration=0.5588 num_files=7     job=2   return_code=0
[2020-06-09 02:01:20.739650] D [master(worker
/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced
 file=.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17
[2020-06-09 02:01:20.740041] D [master(worker
/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced
 file=.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77
[2020-06-09 02:01:20.740200] D [master(worker
/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced
 file=.gfid/779cd2b3-1571-446a-8903-48d6183d3dd0
[2020-06-09 02:01:20.740343] D [master(worker
/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced
 file=.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435
[2020-06-09 02:01:20.740482] D [master(worker
/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced
 file=.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17
[2020-06-09 02:01:20.740616] D [master(worker
/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced
 file=.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77
[2020-06-09 02:01:20.740741] D [master(worker
/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced
 file=.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435
[2020-06-09 02:01:20.745385] D [repce(worker
/nodirectwritedata/gluster/gvol0):196:push] RepceClient: call
22499:140085349934848:1591668080.75
done('/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668040',)
...
[2020-06-09 02:01:20.746252] D [repce(worker
/nodirectwritedata/gluster/gvol0):216:__call__] RepceClient: call
22499:140085349934848:1591668080.75 done -> None
[2020-06-09 02:01:20.746515] D [repce(worker
/nodirectwritedata/gluster/gvol0):196:push] RepceClient: call
22499:140085349934848:1591668080.75
done('/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668055',)
...
[2020-06-09 02:01:20.747045] D [repce(worker
/nodirectwritedata/gluster/gvol0):216:__call__] RepceClient: call
22499:140085349934848:1591668080.75 done -> None
[2020-06-09 02:01:31.364780] I [gsyncd(config-get):308:main] <top>: Using
session config file
path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf
[2020-06-09 02:01:31.458915] I [gsyncd(status):308:main] <top>: Using
session config file
path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf
[2020-06-09 02:01:32.156028] I [gsyncd(config-get):308:main] <top>: Using
session config file
path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf
[2020-06-09 02:01:32.250082] I [gsyncd(status):308:main] <top>: Using
session config file
path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf
[2020-06-09 02:01:33.778866] I [master(worker
/nodirectwritedata/gluster/gvol0):1384:process] _GMaster: Entry Time Taken
       MKD=0   MKN=0   LIN=0   SYM=0   REN=0   RMD=0   CRE=0
duration=0.0000 UNL=0
[2020-06-09 02:01:33.779000] I [master(worker
/nodirectwritedata/gluster/gvol0):1394:process] _GMaster: Data/Metadata
Time Taken        SETA=0  SETX=0  meta_duration=0.0000
 data_duration=14.0839   DATA=11 XATT=0
[2020-06-09 02:01:33.779195] I [master(worker
/nodirectwritedata/gluster/gvol0):1404:process] _GMaster: Batch Completed
changelog_end=1591668055        entry_stime=(1591668024, 0)
changelog_start=1591668040      stime=(1591668054, 0)   duration=14.1043
     num_changelogs=2        mode=live_changelog
[2020-06-09 02:01:35.337511] I [gsyncd(config-get):308:main] <top>: Using
session config file
path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf
[2020-06-09 02:01:35.432674] I [gsyncd(status):308:main] <top>: Using
session config file
path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf
[2020-06-09 02:01:38.784449] D [master(worker
/nodirectwritedata/gluster/gvol0):551:crawlwrap] _GMaster: ... crawl #18195
done, took 32.533401 seconds
[2020-06-09 02:01:38.790709] D [repce(worker
/nodirectwritedata/gluster/gvol0):196:push] RepceClient: call
22499:140085349934848:1591668098.79 scan() ...
[2020-06-09 02:01:38.791342] D [repce(worker
/nodirectwritedata/gluster/gvol0):216:__call__] RepceClient: call
22499:140085349934848:1591668098.79 scan -> None
[2020-06-09 02:01:38.791479] D [repce(worker
/nodirectwritedata/gluster/gvol0):196:push] RepceClient: call
22499:140085349934848:1591668098.79 getchanges() ...
[2020-06-09 02:01:38.791989] D [repce(worker
/nodirectwritedata/gluster/gvol0):216:__call__] RepceClient: call
22499:140085349934848:1591668098.79 getchanges ->
['/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668069',
'/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668083',
'/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668097']
[2020-06-09 02:01:38.792114] I [master(worker
/nodirectwritedata/gluster/gvol0):1470:crawl] _GMaster: slave's time
 stime=(1591668054, 0)
Something very similar is logged around every 32 seconds, and I notice the
log above says the crawl took 32 seconds.
On Mon, 8 Jun 2020 at 16:22, sankarshan <sankarshan at kadalu.io> wrote:
> Reading through the thread it occurs to me that it would be a stronger
> approach to understand the workload (a general description of the
> application) and in terms of the releases of GlusterFS running, assess
> if there are new issues to be addressed or if existing sets of patches
> tend to work. Thanks for setting up the debug level in the log. High
> CPU usage by a geo-replication process would need to be traced back to
> why it really requires that %-age of CPU if it was not doing so
> previously.
>
> On Mon, 8 Jun 2020 at 05:29, David Cunningham <dcunningham at voisonics.com>
> wrote:
> >
> > Hi Strahil,
> >
> > The CPU is still quite high, with "top" regularly showing 100% CPU usage
> by that process. However it's not clear whether this is really a problem,
> or if it's just normal geo-replication activity. While CPU usage was not
> previously as high on this server, it's not clear whether GlusterFS might
> have made this the "push" node for geo-replication when it used to be some
> other server. Having said that, no other server shows a sudden drop in CPU
> usage.
> >
> > Ideally we could find out what the gsyncd process is doing, and
> therefore whether it's expected or not. Any ideas on that? We did set the
> log level to DEBUG as Sunny suggested.
> >
> > Looking at
> /var/log/glusterfs/geo-replication-slaves/gvol0_nvfs10_gvol0/mnt-cafs30-nodirectwritedata-gluster-gvol0.log
> on the geo-replication slave, we have a lot of lines like the following
> logged.
> >
> > [2020-06-06 00:32:43.155856] W [fuse-bridge.c:897:fuse_attr_cbk]
> 0-glusterfs-fuse: 875853: STAT()
> /.gfid/25d78a6c-41a9-4364-84a8-31f5571223ac => -1 (Stale file handle)
> > [2020-06-06 00:32:43.219759] W [fuse-bridge.c:897:fuse_attr_cbk]
> 0-glusterfs-fuse: 875923: STAT()
> /.gfid/25d78a6c-41a9-4364-84a8-31f5571223ac => -1 (Stale file handle)
> > [2020-06-06 00:32:43.280357] W [fuse-bridge.c:897:fuse_attr_cbk]
> 0-glusterfs-fuse: 876001: STAT()
> /.gfid/25d78a6c-41a9-4364-84a8-31f5571223ac => -1 (Stale file handle)
> > The message "E [MSGID: 109040]
> [dht-helper.c:1332:dht_migration_complete_check_task] 0-gvol0-dht:
> /.gfid/25d78a6c-41a9-4364-84a8-31f5571223ac: failed to lookup the file on
> gvol0-dht [Stale file handle]" repeated 9 times between [2020-06-06
> 00:32:42.689780] and [2020-06-06 00:32:43.280322]
> > [2020-06-06 09:03:03.660956] E [MSGID: 109040]
> [dht-helper.c:1332:dht_migration_complete_check_task] 0-gvol0-dht:
> /.gfid/decdd552-d58b-4ddf-b27a-0da9a6fbc38b: failed to lookup the file on
> gvol0-dht [Stale file handle]
> > [2020-06-06 09:03:03.661057] W [fuse-bridge.c:897:fuse_attr_cbk]
> 0-glusterfs-fuse: 965375: STAT()
> /.gfid/decdd552-d58b-4ddf-b27a-0da9a6fbc38b => -1 (Stale file handle)
> > [2020-06-06 09:03:10.258798] E [MSGID: 109040]
> [dht-helper.c:1332:dht_migration_complete_check_task] 0-gvol0-dht:
> /.gfid/1d90e129-d40f-4ca4-bea8-0bb3c4c7985a: failed to lookup the file on
> gvol0-dht [Stale file handle]
> > [2020-06-06 09:03:10.258880] W [fuse-bridge.c:897:fuse_attr_cbk]
> 0-glusterfs-fuse: 969455: STAT()
> /.gfid/1d90e129-d40f-4ca4-bea8-0bb3c4c7985a => -1 (Stale file handle)
> > [2020-06-06 09:09:41.259362] E [MSGID: 109040]
> [dht-helper.c:1332:dht_migration_complete_check_task] 0-gvol0-dht:
> /.gfid/740efff6-74dd-45be-847f-66919d1179e0: failed to lookup the file on
> gvol0-dht [Stale file handle]
> > [2020-06-06 09:09:41.259458] W [fuse-bridge.c:897:fuse_attr_cbk]
> 0-glusterfs-fuse: 1040904: STAT()
> /.gfid/740efff6-74dd-45be-847f-66919d1179e0 => -1 (Stale file handle)
> >
> > Could these errors be part of the problem?
> >
> > Thanks again.
> >
> >
> > On Sat, 6 Jun 2020 at 21:21, Strahil Nikolov <hunter86_bg at yahoo.com>
> wrote:
> >>
> >> Hey David,
> >>
> >> can you check the old logs for gfid mismatch and get a list  of files
> that were causing the high cpu .
> >> Maybe they are  related  somehow (maybe created by the same software
> ,  same client version or something else) which could help about that.
> >>
> >> Also  take  a  look in geo-replication-slave logs.
> >>
> >> Does the issue still occurs ?
> >>
> >> Best Regards,
> >> Strahil Nikolov
> >>
> >> На 6 юни 2020 г. 1:21:55 GMT+03:00, David Cunningham <
> dcunningham at voisonics.com> написа:
> >> >Hi Sunny and Strahil,
> >> >
> >> >Thanks again for your responses. We don't have a lot of renaming
> >> >activity -
> >> >maybe some, but not a lot. We do have files which are open for writing
> >> >for
> >> >quite a while - they're call recordings being written as the call
> >> >happens.
> >> >
> >> >We've installed GlusterFS using the Ubuntu packages and I'd really like
> >> >to
> >> >avoid compiling and applying patches.
> >> >
> >> >After enabling DEBUG the log at
> >> >/var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log
> >> >doesn't
> >> >seem to show anything very unusual:
> >> >
> >> >[2020-06-03 02:49:01.992177] I [master(worker
> >> >/nodirectwritedata/gluster/gvol0):1384:process] _GMaster: Entry Time
> >> >Taken
> >> >   MKD=0   MKN=0   LIN=0   SYM=0   REN=0   RMD=0CRE=0   duration=0.0000
> >> >UNL=0
> >> >[2020-06-03 02:49:01.992465] I [master(worker
> >> >/nodirectwritedata/gluster/gvol0):1394:process] _GMaster: Data/Metadata
> >> >Time Taken        SETA=0  SETX=0
> >> >meta_duration=0.0000data_duration=13.0954
> >> >   DATA=8  XATT=0
> >> >[2020-06-03 02:49:01.992863] I [master(worker
> >> >/nodirectwritedata/gluster/gvol0):1404:process] _GMaster: Batch
> >> >Completed
> >> >changelog_end=1591152508        entry_stime=(1591152352, 0)
> >> >changelog_start=1591152494      stime=(1591152507, 0)
> >> >duration=13.1077
> >> >     num_changelogs=2        mode=live_changelog
> >> >[2020-06-03 02:49:02.958687] D [repce(worker
> >> >/nodirectwritedata/gluster/gvol0):196:push] RepceClient: call
> >> >19017:139678812452608:1591152542.96 keep_alive({'version': (1, 0),
> >> >'retval': 0, 'uuid': '8ee85fae-f3aa-4285-ad48-67a1dc17ed73', 'timeout':
> >> >1591152662, 'volume_mark': (1583043396, 161632)},) ...
> >> >[2020-06-03 02:49:02.979139] D [repce(worker
> >> >/nodirectwritedata/gluster/gvol0):216:__call__] RepceClient: call
> >> >19017:139678812452608:1591152542.96 keep_alive -> 28
> >> >[2020-06-03 02:49:06.998127] D [master(worker
> >> >/nodirectwritedata/gluster/gvol0):551:crawlwrap] _GMaster: ... crawl
> >> >#114
> >> >done, took 30.180089 seconds
> >> >[2020-06-03 02:49:07.10132] D [repce(worker
> >> >/nodirectwritedata/gluster/gvol0):196:push] RepceClient: call
> >> >19017:139679441716992:1591152547.01 scan() ...
> >> >[2020-06-03 02:49:07.10781] D [repce(worker
> >> >/nodirectwritedata/gluster/gvol0):216:__call__] RepceClient: call
> >> >19017:139679441716992:1591152547.01 scan -> None
> >> >[2020-06-03 02:49:07.10935] D [repce(worker
> >> >/nodirectwritedata/gluster/gvol0):196:push] RepceClient: call
> >> >19017:139679441716992:1591152547.01 getchanges() ...
> >> >[2020-06-03 02:49:07.11579] D [repce(worker
> >> >/nodirectwritedata/gluster/gvol0):216:__call__] RepceClient: call
> >> >19017:139679441716992:1591152547.01 getchanges ->
> >>
> >['/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591152522']
> >> >[2020-06-03 02:49:07.11720] I [master(worker
> >> >/nodirectwritedata/gluster/gvol0):1470:crawl] _GMaster: slave's time
> >> >stime=(1591152507, 0)
> >> >
> >> >Am I looking in the right place to find out what that gsyncd.py process
> >> >is
> >> >doing?
> >> >
> >> >
> >> >On Tue, 2 Jun 2020 at 21:58, Sunny Kumar <sunkumar at redhat.com> wrote:
> >> >
> >> >> Hi David,
> >> >>
> >> >> You haven't answered my previous question regarding the type of your
> >> >> workload.
> >> >> ---
> >> >> You can use the below command to enable debug log.
> >> >>
> >> >> `gluster vol geo-rep <mastervol> <slavehost>::<slavevol> config
> >> >log-level
> >> >> DEBUG`
> >> >>
> >> >> and after capturing log again switch back to info mode:
> >> >>
> >> >> `gluster vol geo-rep <mastervol> <slavehost>::<slavevol> config
> >> >log-level
> >> >> INFO`
> >> >>
> >> >> Please share the debug log and geo-rep config to debug further:
> >> >> for config:
> >> >>
> >> >> `gluster vol geo-rep <mastervol> <slavehost>::<slavevol> config`
> >> >>
> >> >> /sunny
> >> >>
> >> >>
> >> >> On Tue, Jun 2, 2020 at 10:18 AM Strahil Nikolov
> >> ><hunter86_bg at yahoo.com>
> >> >> wrote:
> >> >> >
> >> >> > Hi David,
> >> >> >
> >> >> > in which log do you see the entries ?
> >> >> >
> >> >> > I think I got an explanation why you see the process  only on one
> >> >of the
> >> >> master nodes -  geo-rep session is established from only 1 master
> >> >node  /I
> >> >> hope someone corrects me if I'm wrong/ to one slave node. Thus it
> >> >will be
> >> >> natural to see the high CPU  usage on only 1 master node in your
> >> >situation.
> >> >> >
> >> >> > Do you see anything else  in the :
> >> >> var/log/glusterfs/geo-replication/<logs> (master nodes) or in
> >> >> /var/log/glusterfs/geo-replication-slaves (slaves) that could hint of
> >> >the
> >> >> exact issue. I have  a vague feeling that that python script is
> >> >constantly
> >> >> looping over some data causing the CPU hog.
> >> >> >
> >> >> > Sadly, I can't find an instruction for increasing the log level of
> >> >the
> >> >> geo rep log .
> >> >> >
> >> >> >
> >> >> > Best  Regards,
> >> >> > Strahil  Nikolov
> >> >> >
> >> >> >
> >> >> > На 2 юни 2020 г. 6:14:46 GMT+03:00, David Cunningham <
> >> >> dcunningham at voisonics.com> написа:
> >> >> > >Hi Strahil and Sunny,
> >> >> > >
> >> >> > >Thank you for the replies. I checked the gfid on the master and
> >> >slaves
> >> >> > >and
> >> >> > >they are the same. After moving the file away and back again it
> >> >doesn't
> >> >> > >seem to be having the issue with that file any more.
> >> >> > >
> >> >> > >We are still getting higher CPU usage on one of the master nodes
> >> >than
> >> >> > >the
> >> >> > >others. It logs this every few seconds:
> >> >> > >
> >> >> > >[2020-06-02 03:10:15.637815] I [master(worker
> >> >> > >/nodirectwritedata/gluster/gvol0):1384:process] _GMaster: Entry
> >> >Time
> >> >> > >Taken
> >> >> > >   MKD=0   MKN=0   LIN=0   SYM=0   REN=0   RMD=0CRE=0
> >> >duration=0.0000
> >> >> > >UNL=0
> >> >> > >[2020-06-02 03:10:15.638010] I [master(worker
> >> >> > >/nodirectwritedata/gluster/gvol0):1394:process] _GMaster:
> >> >Data/Metadata
> >> >> > >Time Taken        SETA=0  SETX=0
> >> >> > >meta_duration=0.0000data_duration=12.7878
> >> >> > >   DATA=4  XATT=0
> >> >> > >[2020-06-02 03:10:15.638286] I [master(worker
> >> >> > >/nodirectwritedata/gluster/gvol0):1404:process] _GMaster: Batch
> >> >> > >Completed
> >> >> > >changelog_end=1591067378        entry_stime=(1591067167, 0)
> >> >> > >changelog_start=1591067364      stime=(1591067377, 0)
> >> >> > >duration=12.8068
> >> >> > >     num_changelogs=2        mode=live_changelog
> >> >> > >[2020-06-02 03:10:20.658601] I [master(worker
> >> >> > >/nodirectwritedata/gluster/gvol0):1470:crawl] _GMaster: slave's
> >> >time
> >> >> > > stime=(1591067377, 0)
> >> >> > >[2020-06-02 03:10:34.21799] I [master(worker
> >> >> > >/nodirectwritedata/gluster/gvol0):1954:syncjob] Syncer: Sync Time
> >> >Taken
> >> >> > > duration=0.3826 num_files=8     job=1   return_code=0
> >> >> > >[2020-06-02 03:10:46.440535] I [master(worker
> >> >> > >/nodirectwritedata/gluster/gvol0):1384:process] _GMaster: Entry
> >> >Time
> >> >> > >Taken
> >> >> > >   MKD=0   MKN=0   LIN=0   SYM=0   REN=1   RMD=0CRE=2
> >> >duration=0.1314
> >> >> > >UNL=1
> >> >> > >[2020-06-02 03:10:46.440809] I [master(worker
> >> >> > >/nodirectwritedata/gluster/gvol0):1394:process] _GMaster:
> >> >Data/Metadata
> >> >> > >Time Taken        SETA=0  SETX=0
> >> >> > >meta_duration=0.0000data_duration=13.0171
> >> >> > >   DATA=14 XATT=0
> >> >> > >[2020-06-02 03:10:46.441205] I [master(worker
> >> >> > >/nodirectwritedata/gluster/gvol0):1404:process] _GMaster: Batch
> >> >> > >Completed
> >> >> > >changelog_end=1591067420        entry_stime=(1591067419, 0)
> >> >> > >changelog_start=1591067392      stime=(1591067419, 0)
> >> >> > >duration=13.0322
> >> >> > >     num_changelogs=3        mode=live_changelog
> >> >> > >[2020-06-02 03:10:51.460925] I [master(worker
> >> >> > >/nodirectwritedata/gluster/gvol0):1470:crawl] _GMaster: slave's
> >> >time
> >> >> > > stime=(1591067419, 0)
> >> >> > >
> >> >> > >[2020-06-02 03:11:04.448913] I [master(worker
> >> >> > >/nodirectwritedata/gluster/gvol0):1954:syncjob] Syncer: Sync Time
> >> >Taken
> >> >> > >duration=0.3466 num_files=3     job=1   return_code=0
> >> >> > >
> >> >> > >Whereas the other master nodes only log this:
> >> >> > >
> >> >> > >[2020-06-02 03:11:33.886938] I [gsyncd(config-get):308:main]
> >> ><top>:
> >> >> > >Using
> >> >> > >session config file
> >> >> >
> >> >>path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf
> >> >> > >[2020-06-02 03:11:33.993175] I [gsyncd(status):308:main] <top>:
> >> >Using
> >> >> > >session config file
> >> >> >
> >> >>path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf
> >> >> > >
> >> >> > >Can anyone help with what might cause the high CPU usage on one
> >> >master
> >> >> > >node? The process is this one, and is using 70-100% of CPU:
> >> >> > >
> >> >> > >python2
> >> >/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py
> >> >> > >worker gvol0 nvfs10::gvol0 --feedback-fd 15 --local-path
> >> >> > >/nodirectwritedata/gluster/gvol0 --local-node cafs30
> >> >--local-node-id
> >> >> > >b7521445-ee93-4fed-8ced-6a609fa8c7d4 --slave-id
> >> >> > >cdcdb210-839c-4306-a4dc-e696b165ed17 --rpc-fd 12,11,9,13
> >> >--subvol-num 1
> >> >> > >--resource-remote nvfs30 --resource-remote-id
> >> >> > >1e698ccd-aeec-4ec4-96fe-383da8fc3b78
> >> >> > >
> >> >> > >Thank you in advance!
> >> >> > >
> >> >> > >
> >> >> > >
> >> >> > >
> >> >> > >On Sat, 30 May 2020 at 20:20, Strahil Nikolov
> >> ><hunter86_bg at yahoo.com>
> >> >> > >wrote:
> >> >> > >
> >> >> > >> Hey David,
> >> >> > >>
> >> >> > >> for me a gfid  mismatch means  that the file  was
> >> >replaced/recreated
> >> >> > > -
> >> >> > >> just like  vim in linux does (and it is expected for config
> >> >file).
> >> >> > >>
> >> >> > >> Have  you checked the gfid  of  the file on both source and
> >> >> > >destination,
> >> >> > >> do they really match or they are different ?
> >> >> > >>
> >> >> > >> What happens  when you move away the file  from the slave ,
> >> >does it
> >> >> > >fixes
> >> >> > >> the issue ?
> >> >> > >>
> >> >> > >> Best Regards,
> >> >> > >> Strahil Nikolov
> >> >> > >>
> >> >> > >> На 30 май 2020 г. 1:10:56 GMT+03:00, David Cunningham <
> >> >> > >> dcunningham at voisonics.com> написа:
> >> >> > >> >Hello,
> >> >> > >> >
> >> >> > >> >We're having an issue with a geo-replication process with
> >> >unusually
> >> >> > >> >high
> >> >> > >> >CPU use and giving "Entry not present on master. Fixing gfid
> >> >> > >mismatch
> >> >> > >> >in
> >> >> > >> >slave" errors. Can anyone help on this?
> >> >> > >> >
> >> >> > >> >We have 3 GlusterFS replica nodes (we'll call the master),
> >> >which
> >> >> > >also
> >> >> > >> >push
> >> >> > >> >data to a remote server (slave) using geo-replication. This has
> >> >been
> >> >> > >> >running fine for a couple of months, but yesterday one of the
> >> >master
> >> >> > >> >nodes
> >> >> > >> >started having unusually high CPU use. It's this process:
> >> >> > >> >
> >> >> > >> >root at cafs30:/var/log/glusterfs# ps aux | grep 32048
> >> >> > >> >root     32048 68.7  0.6 1843140 845756 ?      Rl   02:51
> >> >493:51
> >> >> > >> >python2
> >> >> > >> >/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py
> >> >> > >worker
> >> >> > >> >gvol0 nvfs10::gvol0 --feedback-fd 15 --local-path
> >> >> > >> >/nodirectwritedata/gluster/gvol0 --local-node cafs30
> >> >--local-node-id
> >> >> > >> >b7521445-ee93-4fed-8ced-6a609fa8c7d4 --slave-id
> >> >> > >> >cdcdb210-839c-4306-a4dc-e696b165ed17 --rpc-fd 12,11,9,13
> >> >> > >--subvol-num 1
> >> >> > >> >--resource-remote nvfs30 --resource-remote-id
> >> >> > >> >1e698ccd-aeec-4ec4-96fe-383da8fc3b78
> >> >> > >> >
> >> >> > >> >Here's what is being logged in
> >> >> > >>
> >> >>/var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log:
> >> >> > >> >
> >> >> > >> >[2020-05-29 21:57:18.843524] I [master(worker
> >> >> > >> >/nodirectwritedata/gluster/gvol0):1470:crawl] _GMaster: slave's
> >> >time
> >> >> > >> > stime=(1590789408, 0)
> >> >> > >> >[2020-05-29 21:57:30.626172] I [master(worker
> >> >> > >>
> >> >>/nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures]
> >> >> > >> >_GMaster: Entry not present on master. Fixing gfid mismatch in
> >> >> > >slave.
> >> >> > >> >Deleting the entry    retry_count=1   entry=({u'uid': 108,
> >> >u'gfid':
> >> >> > >> >u'7c0b75e5-d8b7-454f-8010-112d613c599e', u'gid': 117, u'mode':
> >> >> > >33204,
> >> >> > >> >u'entry':
> >> >> > >> >u'.gfid/c5422396-1578-4b50-a29d-315be2a9c5d8/00a859f7xxxx.cfg',
> >> >> > >> >u'op': u'CREATE'}, 17, {u'slave_isdir': False,
> >> >u'gfid_mismatch':
> >> >> > >True,
> >> >> > >> >u'slave_name': None, u'slave_gfid':
> >> >> > >> >u'ec4b0ace-2ec4-4ea5-adbc-9f519b81917c', u'name_mismatch':
> >> >False,
> >> >> > >> >u'dst':
> >> >> > >> >False})
> >> >> > >> >[2020-05-29 21:57:30.627893] I [master(worker
> >> >> > >>
> >> >>/nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures]
> >> >> > >> >_GMaster: Entry not present on master. Fixing gfid mismatch in
> >> >> > >slave.
> >> >> > >> >Deleting the entry    retry_count=1   entry=({u'uid': 108,
> >> >u'gfid':
> >> >> > >> >u'a4d52e40-2e2f-4885-be5f-65fe95a8ebd7', u'gid': 117, u'mode':
> >> >> > >33204,
> >> >> > >> >u'entry':
> >> >> > >>
> >> >> >
> >>
> >>>u'.gfid/f857c42e-22f1-4ce4-8f2e-13bdadedde45/polycom_00a859f7xxxx.cfg',
> >> >> > >> >u'op': u'CREATE'}, 17, {u'slave_isdir': False,
> >> >u'gfid_mismatch':
> >> >> > >True,
> >> >> > >> >u'slave_name': None, u'slave_gfid':
> >> >> > >> >u'ece8da77-b5ea-45a7-9af7-7d4d8f55f74a', u'name_mismatch':
> >> >False,
> >> >> > >> >u'dst':
> >> >> > >> >False})
> >> >> > >> >[2020-05-29 21:57:30.629532] I [master(worker
> >> >> > >>
> >> >>/nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures]
> >> >> > >> >_GMaster: Entry not present on master. Fixing gfid mismatch in
> >> >> > >slave.
> >> >> > >> >Deleting the entry    retry_count=1   entry=({u'uid': 108,
> >> >u'gfid':
> >> >> > >> >u'3c525ad8-aeb2-46b6-9c41-7fb4987916f8', u'gid': 117, u'mode':
> >> >> > >33204,
> >> >> > >> >u'entry':
> >> >> > >>
> >> >> >
> >> >>
> >>
> >>>u'.gfid/f857c42e-22f1-4ce4-8f2e-13bdadedde45/00a859f7xxxx-directory.xml',
> >> >> > >> >u'op': u'CREATE'}, 17, {u'slave_isdir': False,
> >> >u'gfid_mismatch':
> >> >> > >True,
> >> >> > >> >u'slave_name': None, u'slave_gfid':
> >> >> > >> >u'06717b5a-d842-495d-bd25-aab9cd454490', u'name_mismatch':
> >> >False,
> >> >> > >> >u'dst':
> >> >> > >> >False})
> >> >> > >> >[2020-05-29 21:57:30.659123] I [master(worker
> >> >> > >> >/nodirectwritedata/gluster/gvol0):942:handle_entry_failures]
> >> >> > >_GMaster:
> >> >> > >> >Sucessfully fixed entry ops with gfid mismatch
> >> >retry_count=1
> >> >> > >> >[2020-05-29 21:57:30.659343] I [master(worker
> >> >> > >> >/nodirectwritedata/gluster/gvol0):1194:process_change]
> >> >_GMaster:
> >> >> > >Retry
> >> >> > >> >original entries. count = 1
> >> >> > >> >[2020-05-29 21:57:30.725810] I [master(worker
> >> >> > >> >/nodirectwritedata/gluster/gvol0):1197:process_change]
> >> >_GMaster:
> >> >> > >> >Sucessfully fixed all entry ops with gfid mismatch
> >> >> > >> >[2020-05-29 21:57:31.747319] I [master(worker
> >> >> > >> >/nodirectwritedata/gluster/gvol0):1954:syncjob] Syncer: Sync
> >> >Time
> >> >> > >Taken
> >> >> > >> >duration=0.7409 num_files=18    job=1   return_code=0
> >> >> > >> >
> >> >> > >> >We've verified that the files like polycom_00a859f7xxxx.cfg
> >> >referred
> >> >> > >to
> >> >> > >> >in
> >> >> > >> >the error do exist on the master nodes and slave.
> >> >> > >> >
> >> >> > >> >We found this bug fix:
> >> >> > >> >https://bugzilla.redhat.com/show_bug.cgi?id=1642865
> >> >> > >> >
> >> >> > >> >However that fix went in 5.1, and we're running 5.12 on the
> >> >master
> >> >> > >> >nodes
> >> >> > >> >and slave. A couple of GlusterFS clients connected to the
> >> >master
> >> >> > >nodes
> >> >> > >> >are
> >> >> > >> >running 5.13.
> >> >> > >> >
> >> >> > >> >Would anyone have any suggestions? Thank you in advance.
> >> >> > >>
> >> >> > ________
> >> >> >
> >> >> >
> >> >> >
> >> >> > Community Meeting Calendar:
> >> >> >
> >> >> > Schedule -
> >> >> > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> >> >> > Bridge: https://bluejeans.com/441850968
> >> >> >
> >> >> > Gluster-users mailing list
> >> >> > Gluster-users at gluster.org
> >> >> > https://lists.gluster.org/mailman/listinfo/gluster-users
> >> >>
> >> >>
> >
> >
> >
> > --
> > David Cunningham, Voisonics Limited
> > http://voisonics.com/
> > USA: +1 213 221 1092
> > New Zealand: +64 (0)28 2558 3782
> > ________
> >
> >
> >
> > Community Meeting Calendar:
> >
> > Schedule -
> > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> > Bridge: https://bluejeans.com/441850968
> >
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
> --
> sankarshan at kadalu.io | TZ: UTC+0530 | +91 99606 03294
> kadalu.io : Making it easy to provision storage in k8s!
> ________
>
>
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://bluejeans.com/441850968
>
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>
-- 
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20200609/1908deea/attachment.html>
    
    
More information about the Gluster-users
mailing list