[Gluster-users] Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

Sunny Kumar sunkumar at redhat.com
Sat May 30 21:54:18 UTC 2020


Hi David,

It looks like you are running a workload that involves lots of renames,
and geo-rep is trying to handle those. You can try the patches below,
which should give you a performance benefit.

[1]. https://review.gluster.org/#/c/glusterfs/+/23570/
[2]. https://review.gluster.org/#/c/glusterfs/+/23459/
[3]. https://review.gluster.org/#/c/glusterfs/+/22720/
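If you first want to confirm that a reported mismatch is real, you can read
the trusted.gfid xattr on a master brick and on the slave brick (e.g. with
`getfattr -n trusted.gfid -e hex <path-on-brick>`) and normalise the hex
output to the dashed UUID form that gsyncd.log prints. A minimal sketch; the
helper name is mine, and the gfid values are taken from the log excerpt below:

```python
# Hedged sketch (helper name is mine): normalise and compare gfids
# reported by `getfattr -n trusted.gfid -e hex <path-on-brick>`
# on a master brick and on the slave brick.
import uuid

def gfid_from_hex(hex_value):
    """Convert getfattr's hex output (e.g. 0x7c0b75e5...) to the
    dashed UUID form that appears in gsyncd.log."""
    return str(uuid.UUID(hex_value.replace("0x", "", 1)))

# gfid values taken from the gsyncd.log excerpt in this thread:
master_gfid = gfid_from_hex("0x7c0b75e5d8b7454f8010112d613c599e")
slave_gfid = gfid_from_hex("0xec4b0ace2ec44ea5adbc9f519b81917c")
print(master_gfid)                # 7c0b75e5-d8b7-454f-8010-112d613c599e
print(master_gfid != slave_gfid)  # True -> the slave's copy has a different gfid
```

If the two values differ, the slave's file really is a different object than
the master's, which is what geo-rep is repairing in the log.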

/sunny

On Sat, May 30, 2020 at 9:20 AM Strahil Nikolov <hunter86_bg at yahoo.com> wrote:
>
> Hey David,
>
> For me, a gfid mismatch means that the file was replaced/recreated - just as vim on Linux does (and that is expected for a config file).
>
> Have you checked the gfid of the file on both source and destination - do they really match, or are they different?
>
> What happens when you move the file away from the slave - does it fix the issue?
>
> Best Regards,
> Strahil Nikolov
>
> На 30 май 2020 г. 1:10:56 GMT+03:00, David Cunningham <dcunningham at voisonics.com> написа:
> >Hello,
> >
> >We're having an issue with a geo-replication process: unusually high CPU
> >use, and "Entry not present on master. Fixing gfid mismatch in slave"
> >errors. Can anyone help with this?
> >
> >We have 3 GlusterFS replica nodes (we'll call them the master), which
> >also push data to a remote server (the slave) using geo-replication.
> >This has been running fine for a couple of months, but yesterday one of
> >the master nodes started showing unusually high CPU use. It's this
> >process:
> >
> >root at cafs30:/var/log/glusterfs# ps aux | grep 32048
> >root     32048 68.7  0.6 1843140 845756 ?      Rl   02:51 493:51
> >python2
> >/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py worker
> >gvol0 nvfs10::gvol0 --feedback-fd 15 --local-path
> >/nodirectwritedata/gluster/gvol0 --local-node cafs30 --local-node-id
> >b7521445-ee93-4fed-8ced-6a609fa8c7d4 --slave-id
> >cdcdb210-839c-4306-a4dc-e696b165ed17 --rpc-fd 12,11,9,13 --subvol-num 1
> >--resource-remote nvfs30 --resource-remote-id
> >1e698ccd-aeec-4ec4-96fe-383da8fc3b78
> >
> >Here's what is being logged in
> >/var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log:
> >
> >[2020-05-29 21:57:18.843524] I [master(worker
> >/nodirectwritedata/gluster/gvol0):1470:crawl] _GMaster: slave's time
> > stime=(1590789408, 0)
> >[2020-05-29 21:57:30.626172] I [master(worker
> >/nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures]
> >_GMaster: Entry not present on master. Fixing gfid mismatch in slave.
> >Deleting the entry    retry_count=1   entry=({u'uid': 108, u'gfid':
> >u'7c0b75e5-d8b7-454f-8010-112d613c599e', u'gid': 117, u'mode': 33204,
> >u'entry':
> >u'.gfid/c5422396-1578-4b50-a29d-315be2a9c5d8/00a859f7xxxx.cfg',
> >u'op': u'CREATE'}, 17, {u'slave_isdir': False, u'gfid_mismatch': True,
> >u'slave_name': None, u'slave_gfid':
> >u'ec4b0ace-2ec4-4ea5-adbc-9f519b81917c', u'name_mismatch': False,
> >u'dst':
> >False})
> >[2020-05-29 21:57:30.627893] I [master(worker
> >/nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures]
> >_GMaster: Entry not present on master. Fixing gfid mismatch in slave.
> >Deleting the entry    retry_count=1   entry=({u'uid': 108, u'gfid':
> >u'a4d52e40-2e2f-4885-be5f-65fe95a8ebd7', u'gid': 117, u'mode': 33204,
> >u'entry':
> >u'.gfid/f857c42e-22f1-4ce4-8f2e-13bdadedde45/polycom_00a859f7xxxx.cfg',
> >u'op': u'CREATE'}, 17, {u'slave_isdir': False, u'gfid_mismatch': True,
> >u'slave_name': None, u'slave_gfid':
> >u'ece8da77-b5ea-45a7-9af7-7d4d8f55f74a', u'name_mismatch': False,
> >u'dst':
> >False})
> >[2020-05-29 21:57:30.629532] I [master(worker
> >/nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures]
> >_GMaster: Entry not present on master. Fixing gfid mismatch in slave.
> >Deleting the entry    retry_count=1   entry=({u'uid': 108, u'gfid':
> >u'3c525ad8-aeb2-46b6-9c41-7fb4987916f8', u'gid': 117, u'mode': 33204,
> >u'entry':
> >u'.gfid/f857c42e-22f1-4ce4-8f2e-13bdadedde45/00a859f7xxxx-directory.xml',
> >u'op': u'CREATE'}, 17, {u'slave_isdir': False, u'gfid_mismatch': True,
> >u'slave_name': None, u'slave_gfid':
> >u'06717b5a-d842-495d-bd25-aab9cd454490', u'name_mismatch': False,
> >u'dst':
> >False})
> >[2020-05-29 21:57:30.659123] I [master(worker
> >/nodirectwritedata/gluster/gvol0):942:handle_entry_failures] _GMaster:
> >Sucessfully fixed entry ops with gfid mismatch     retry_count=1
> >[2020-05-29 21:57:30.659343] I [master(worker
> >/nodirectwritedata/gluster/gvol0):1194:process_change] _GMaster: Retry
> >original entries. count = 1
> >[2020-05-29 21:57:30.725810] I [master(worker
> >/nodirectwritedata/gluster/gvol0):1197:process_change] _GMaster:
> >Sucessfully fixed all entry ops with gfid mismatch
> >[2020-05-29 21:57:31.747319] I [master(worker
> >/nodirectwritedata/gluster/gvol0):1954:syncjob] Syncer: Sync Time Taken
> >duration=0.7409 num_files=18    job=1   return_code=0
> >
> >We've verified that files like polycom_00a859f7xxxx.cfg referred to in
> >the errors do exist on both the master nodes and the slave.
> >
> >We found this bug fix:
> >https://bugzilla.redhat.com/show_bug.cgi?id=1642865
> >
> >However, that fix went into 5.1, and we're running 5.12 on the master
> >nodes and slave, so we should already have it. A couple of GlusterFS
> >clients connected to the master nodes are running 5.13.
> >
> >Would anyone have any suggestions? Thank you in advance.
> ________
>
>
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://bluejeans.com/441850968
>
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users


