[Bugs] [Bug 1644518] New: [Geo-Replication] Geo-rep faulty session because directories are not synced to slave.
bugzilla at redhat.com
Wed Oct 31 04:29:20 UTC 2018
https://bugzilla.redhat.com/show_bug.cgi?id=1644518
Bug ID: 1644518
Summary: [Geo-Replication] Geo-rep faulty session because
directories are not synced to slave.
Product: GlusterFS
Version: 4.1
Component: geo-replication
Keywords: ZStream
Severity: urgent
Priority: urgent
Assignee: bugs at gluster.org
Reporter: khiremat at redhat.com
CC: abhishku at redhat.com, avishwan at redhat.com,
bkunal at redhat.com, bugs at gluster.org, csaba at redhat.com,
khiremat at redhat.com, rhinduja at redhat.com,
rhs-bugs at redhat.com, sankarshan at redhat.com,
sarora at redhat.com, storage-qa-internal at redhat.com
Depends On: 1638069, 1643402
External Bug ID: Gluster.org Gerrit 21498
+++ This bug was initially created as a clone of Bug #1643402 +++
+++ This bug was initially created as a clone of Bug #1638069 +++
Description of problem:
geo-replication becomes 'Faulty' with ENTRY FAILUREs.
----------------------------
[2018-10-08 12:24:23.669809] E [master(/rhgs/brick2/data):785:log_failures]
_GMaster: ENTRY FAILED data=({'uid': 0, 'gfid':
'2f863cd4-bb15-488c-bb35-d2df007b689c', 'gid': 0, 'mode': 33152, 'entry':
'.gfid/ecb73262-b6a7-417e-81d6-b7b2ab8eef06/.B77357E56302BB9311E7D5A81DAC4349.XLS.0nepGW',
'op': 'CREATE'}, 2, {'slave_isdir': False, 'gfid_mismatch': False,
'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})
[2018-10-08 12:24:23.670046] E [master(/rhgs/brick2/data):785:log_failures]
_GMaster: ENTRY FAILED data=({'stat': {'atime': 1512662436.052415, 'gid':
0, 'mtime': 1512030372.0, 'mode': 33188, 'uid': 0}, 'entry1':
'.gfid/ecb73262-b6a7-417e-81d6-b7b2ab8eef06/B77357E56302BB9311E7D5A81DAC4349.XLS',
'gfid': '2f863cd4-bb15-488c-bb35-d2df007b689c', 'link': None, 'entry':
'.gfid/ecb73262-b6a7-417e-81d6-b7b2ab8eef06/.B77357E56302BB9311E7D5A81DAC4349.XLS.0nepGW',
'op': 'RENAME'}, 2, {'slave_isdir': False, 'gfid_mismatch': False,
'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})
[2018-10-08 12:24:23.671149] E [master(/rhgs/brick2/data):785:log_failures]
_GMaster: ENTRY FAILED data=({'uid': 0, 'gfid':
'a1c70b62-ae66-4f66-9f18-f620c543526e', 'gid': 0, 'mode': 33152, 'entry':
'.gfid/dd818b54-335c-4108-9c93-b748e9d61fc5/.2970861B16119A5611E7DAF1AEDCD19B.XLS.40iqzc',
'op': 'CREATE'}, 2, {'slave_isdir': False, 'gfid_mismatch': False,
'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})
[2018-10-08 12:24:23.671329] E [master(/rhgs/brick2/data):785:log_failures]
_GMaster: ENTRY FAILED data=({'stat': {'atime': 1512662436.225416, 'gid':
0, 'mtime': 1512611725.0, 'mode': 33188, 'uid': 0}, 'entry1':
'.gfid/dd818b54-335c-4108-9c93-b748e9d61fc5/2970861B16119A5611E7DAF1AEDCD19B.XLS',
'gfid': 'a1c70b62-ae66-4f66-9f18-f620c543526e', 'link': None, 'entry':
'.gfid/dd818b54-335c-4108-9c93-b748e9d61fc5/.2970861B16119A5611E7DAF1AEDCD19B.XLS.40iqzc',
'op': 'RENAME'}, 2, {'slave_isdir': False, 'gfid_mismatch': False,
'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})
[2018-10-08 12:24:23.671480] E [master(/rhgs/brick2/data):785:log_failures]
_GMaster: ENTRY FAILED data=({'uid': 0, 'gfid':
'63e4a4ec-b5aa-4f14-834c-2751247a1262', 'gid': 0, 'mode': 16832, 'entry':
'.gfid/35e4b3ce-0485-47ae-8163-df8a5b45bb3f/201712', 'op': 'MKDIR'}, 2,
{'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': None,
'slave_gfid': None, 'name_mismatch': False, 'dst': False})
-------------------------------------------
Analysis:
The error number logged is 2, which is ENOENT: the parent directory is missing
on the slave, so the file could not be created under it there. As you
mentioned, some directories are missing on the slave, and that is what causes
these errors.
I see this on all nodes. For a file, geo-rep logs the ENTRY FAILURE and
proceeds, but for a directory ENTRY FAILURE caused by ENOENT of the parent,
the session goes Faulty until the problem is fixed. I will look into handling
this kind of error in the code itself, but for now those directories have to
be created on the slave with the exact gfid logged in the errors. I will come
up with the steps to create them on the slave and share them with you.
Note that automatic gfid conflict resolution only handles gfid-mismatch
scenarios, and this case does not fall under it. Please change the topic to
"geo-rep faulty because directories are not synced to slave"
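The triage described above can be sketched in a few lines of Python. This is
not the geo-rep worker's actual code, just an illustration of the rule stated
in the analysis: for errno 2 (ENOENT of the parent on the slave), a failed
file entry is logged and skipped, while a failed directory entry leaves the
session Faulty. The `mode` values come straight from the log lines above
(16832 is a directory, 33152 a regular file), and `stat.S_ISDIR` is what
distinguishes them; the function name is hypothetical.

```python
import stat

def classify_entry_failure(entry, errno):
    """Triage an ENTRY FAILED record as described in the analysis:
    errno 2 (ENOENT) on a regular file is logged and skipped, but on a
    directory it leaves the session Faulty until the parent is created
    on the slave. (Illustrative sketch, not the geo-rep worker code.)"""
    if errno != 2:          # only ENOENT of the parent is considered here
        return "other"
    if stat.S_ISDIR(entry["mode"]):
        return "faulty"     # missing parent directory: worker cannot proceed
    return "skip"           # file failure: log it and move on

# Entries taken from the log excerpts above.
mkdir_entry = {"gfid": "63e4a4ec-b5aa-4f14-834c-2751247a1262",
               "mode": 16832, "op": "MKDIR"}    # directory, mode 0700
create_entry = {"gfid": "2f863cd4-bb15-488c-bb35-d2df007b689c",
                "mode": 33152, "op": "CREATE"}  # regular file, mode 0600

print(classify_entry_failure(mkdir_entry, 2))   # faulty
print(classify_entry_failure(create_entry, 2))  # skip
```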
Version-Release number of selected component (if applicable):
mainline
How reproducible:
--- Additional comment from Worker Ant on 2018-10-26 04:04:21 EDT ---
REVIEW: https://review.gluster.org/21498 (geo-rep: Add more intelligence to
automatic error handling) posted (#1) for review on master by Kotresh HR
--- Additional comment from Worker Ant on 2018-10-30 09:14:15 EDT ---
REVIEW: https://review.gluster.org/21498 (geo-rep: Add more intelligence to
automatic error handling) posted (#2) for review on master by Amar Tumballi
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1643402
[Bug 1643402] [Geo-Replication] Geo-rep faulty session because directories
are not synced to slave.
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.