[Bugs] [Bug 1644518] New: [Geo-Replication] Geo-rep faulty session because directories are not synced to the slave.

bugzilla at redhat.com bugzilla at redhat.com
Wed Oct 31 04:29:20 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1644518

            Bug ID: 1644518
           Summary: [Geo-Replication] Geo-rep faulty session because
                    directories are not synced to the slave.
           Product: GlusterFS
           Version: 4.1
         Component: geo-replication
          Keywords: ZStream
          Severity: urgent
          Priority: urgent
          Assignee: bugs at gluster.org
          Reporter: khiremat at redhat.com
                CC: abhishku at redhat.com, avishwan at redhat.com,
                    bkunal at redhat.com, bugs at gluster.org, csaba at redhat.com,
                    khiremat at redhat.com, rhinduja at redhat.com,
                    rhs-bugs at redhat.com, sankarshan at redhat.com,
                    sarora at redhat.com, storage-qa-internal at redhat.com
        Depends On: 1638069, 1643402
   External Bug ID: Gluster.org Gerrit 21498



+++ This bug was initially created as a clone of Bug #1643402 +++

+++ This bug was initially created as a clone of Bug #1638069 +++

Description of problem:

Geo-replication becomes 'Faulty' with ENTRY FAILED errors.


----------------------------
[2018-10-08 12:24:23.669809] E [master(/rhgs/brick2/data):785:log_failures]
_GMaster: ENTRY FAILED      data=({'uid': 0, 'gfid':
'2f863cd4-bb15-488c-bb35-d2df007b689c', 'gid': 0, 'mode': 33152, 'entry':
'.gfid/ecb73262-b6a7-417e-81d6-b7b2ab8eef06/.B77357E56302BB9311E7D5A81DAC4349.XLS.0nepGW',
'op': 'CREATE'}, 2, {'slave_isdir': False, 'gfid_mismatch': False,
'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})
[2018-10-08 12:24:23.670046] E [master(/rhgs/brick2/data):785:log_failures]
_GMaster: ENTRY FAILED      data=({'stat': {'atime': 1512662436.052415, 'gid':
0, 'mtime': 1512030372.0, 'mode': 33188, 'uid': 0}, 'entry1':
'.gfid/ecb73262-b6a7-417e-81d6-b7b2ab8eef06/B77357E56302BB9311E7D5A81DAC4349.XLS',
'gfid': '2f863cd4-bb15-488c-bb35-d2df007b689c', 'link': None, 'entry':
'.gfid/ecb73262-b6a7-417e-81d6-b7b2ab8eef06/.B77357E56302BB9311E7D5A81DAC4349.XLS.0nepGW',
'op': 'RENAME'}, 2, {'slave_isdir': False, 'gfid_mismatch': False,
'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})

[2018-10-08 12:24:23.671149] E [master(/rhgs/brick2/data):785:log_failures]
_GMaster: ENTRY FAILED      data=({'uid': 0, 'gfid':
'a1c70b62-ae66-4f66-9f18-f620c543526e', 'gid': 0, 'mode': 33152, 'entry':
'.gfid/dd818b54-335c-4108-9c93-b748e9d61fc5/.2970861B16119A5611E7DAF1AEDCD19B.XLS.40iqzc',
'op': 'CREATE'}, 2, {'slave_isdir': False, 'gfid_mismatch': False,
'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})

[2018-10-08 12:24:23.671329] E [master(/rhgs/brick2/data):785:log_failures]
_GMaster: ENTRY FAILED      data=({'stat': {'atime': 1512662436.225416, 'gid':
0, 'mtime': 1512611725.0, 'mode': 33188, 'uid': 0}, 'entry1':
'.gfid/dd818b54-335c-4108-9c93-b748e9d61fc5/2970861B16119A5611E7DAF1AEDCD19B.XLS',
'gfid': 'a1c70b62-ae66-4f66-9f18-f620c543526e', 'link': None, 'entry':
'.gfid/dd818b54-335c-4108-9c93-b748e9d61fc5/.2970861B16119A5611E7DAF1AEDCD19B.XLS.40iqzc',
'op': 'RENAME'}, 2, {'slave_isdir': False, 'gfid_mismatch': False,
'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})

[2018-10-08 12:24:23.671480] E [master(/rhgs/brick2/data):785:log_failures]
_GMaster: ENTRY FAILED      data=({'uid': 0, 'gfid':
'63e4a4ec-b5aa-4f14-834c-2751247a1262', 'gid': 0, 'mode': 16832, 'entry':
'.gfid/35e4b3ce-0485-47ae-8163-df8a5b45bb3f/201712', 'op': 'MKDIR'}, 2,
{'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': None,
'slave_gfid': None, 'name_mismatch': False, 'dst': False})
-------------------------------------------
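For triage, each record carries everything needed: the failed op, the errno
(the bare integer between the two dicts), and the entry path in
.gfid/<parent-gfid>/<basename> form. Below is a minimal, hypothetical parsing
helper (not part of geo-rep) that extracts those fields; it assumes each
ENTRY FAILED record occupies a single line, as it does in the actual gsyncd
log (the wrapping above is from the mail).

----
import ast
import errno
import re
import sys

# Matches the python-literal tuple logged after "data=".
RECORD = re.compile(r"ENTRY FAILED\s+data=(\(.*\))\s*$")

for line in sys.stdin:
    m = RECORD.search(line)
    if not m:
        continue
    entry, err, info = ast.literal_eval(m.group(1))  # (dict, errno, dict)
    # entry['entry'] is .gfid/<parent-gfid>/<basename>; the parent gfid
    # is what must already exist on the slave.
    parent = entry["entry"].split("/")[1]
    print("%-6s errno=%d(%s) parent=%s" % (entry["op"], err,
          errno.errorcode.get(err, "?"), parent))
----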


Analysis:

The error number logged is 2 (ENOENT): the parent directory is missing on the
slave, so the entry could not be created under it. As you mentioned, some
directories are missing on the slave, and that is why these errors occur.

I see this on all nodes. For files, geo-rep logs the ENTRY FAILED error and
proceeds, but when a directory entry fails because its parent is missing
(ENOENT), the session goes Faulty until the problem is fixed. I will look into
handling this class of errors in the code itself. For now, the missing
directories have to be created on the slave with exactly the gfids logged in
the errors; I will come up with the steps to create them on the slave and
share them with you.
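To locate those directories on the master, a gfid from the logs can be
resolved to a path directly on the brick: for directories, the gfid handle
under .glusterfs is a symlink whose target encodes the parent gfid and the
basename. A minimal sketch, to be run as root on a brick (directories only;
file handles are hardlinks, not symlinks):

----
import os

ROOT_GFID = "00000000-0000-0000-0000-000000000001"  # gfid of the brick root

def handle(brick, gfid):
    # Handle path is .glusterfs/<aa>/<bb>/<full-gfid>
    return os.path.join(brick, ".glusterfs", gfid[0:2], gfid[2:4], gfid)

def gfid_to_path(brick, gfid):
    parts = []
    while gfid != ROOT_GFID:
        # Directory handles are symlinks like ../../aa/bb/<pgfid>/<name>
        target = os.readlink(handle(brick, gfid))
        parts.append(os.path.basename(target))
        gfid = os.path.basename(os.path.dirname(target))
    return os.path.join(*reversed(parts)) if parts else "/"

# e.g. the parent of the failed MKDIR above:
print(gfid_to_path("/rhgs/brick2/data",
                   "35e4b3ce-0485-47ae-8163-df8a5b45bb3f"))
----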

Note that automatic gfid conflict resolution only handles gfid mismatch
scenarios, and this case does not fall under that. Please change the topic to

"geo-rep faulty because directories are not synced to the slave"

Version-Release number of selected component (if applicable):

mainline


How reproducible:

--- Additional comment from Worker Ant on 2018-10-26 04:04:21 EDT ---

REVIEW: https://review.gluster.org/21498 (geo-rep: Add more intelligence to
automatic error handling) posted (#1) for review on master by Kotresh HR

--- Additional comment from Worker Ant on 2018-10-30 09:14:15 EDT ---

REVIEW: https://review.gluster.org/21498 (geo-rep: Add more intelligence to
automatic error handling) posted (#2) for review on master by Amar Tumballi


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1643402
[Bug 1643402] [Geo-Replication] Geo-rep faulty session because directories
are not synced to the slave.