[Bugs] [Bug 1644163] New: geo-rep: geo-replication gets stuck after file rename and gfid conflict
bugzilla at redhat.com
Tue Oct 30 06:26:57 UTC 2018
https://bugzilla.redhat.com/show_bug.cgi?id=1644163
Bug ID: 1644163
Summary: geo-rep: geo-replication gets stuck after file rename
and gfid conflict
Product: GlusterFS
Version: 4.1
Component: geo-replication
Keywords: ZStream
Severity: high
Priority: high
Assignee: bugs at gluster.org
Reporter: khiremat at redhat.com
CC: atoborek at redhat.com, atumball at redhat.com,
avishwan at redhat.com, bkunal at redhat.com,
bugs at gluster.org, csaba at redhat.com,
khiremat at redhat.com, rhinduja at redhat.com,
rhs-bugs at redhat.com, sankarshan at redhat.com,
storage-qa-internal at redhat.com, sunkumar at redhat.com
Depends On: 1640347, 1642865
Blocks: 1644158
+++ This bug was initially created as a clone of Bug #1642865 +++
+++ This bug was initially created as a clone of Bug #1640347 +++
Description of problem:
Version-Release number of selected component (if applicable):
master
How reproducible:
Rename the file on master while geo-replication is in place
Steps to Reproduce:
1. Create file
2. Geo-replicate the file
3. Rename file on master
Actual results:
Geo-replication gets stuck with errors:
[2018-10-17 14:59:44.454014] I
[master(/gluster/brick1/brick1):814:fix_possible_entry_failures] _GMaster:
Entry not present on master. Fixing gfid mismatch in slave. Deleting the entry
retry_count=1 entry=({'stat': {'atime': 1539323311.2722738, 'gid': 0,
'mtime': 1539323311.2792735, 'mode': 33277, 'uid': 0}, 'entry1':
'.gfid/08bcd5e4-b2f9-459d-a549-3fd4a303aa25/koala.jpg', 'gfid':
'917fd4ff-c476-46b2-a805-b8212bd3635a', 'link': None, 'entry':
'.gfid/d2de43dd-07c1-4413-b7eb-e793e66dc610/koala.jpg', 'op': 'RENAME'}, 17,
{'slave_isdir': False, 'gfid_mismatch': True, 'slave_name': None, 'slave_gfid':
'61e447a7-ad37-4acf-a0ba-3d570368803d', 'name_mismatch': False, 'dst': False})
[2018-10-17 14:59:44.455204] I
[master(/gluster/brick1/brick1):814:fix_possible_entry_failures] _GMaster:
Entry not present on master. Fixing gfid mismatch in slave. Deleting the entry
retry_count=1 entry=({'stat': {'atime': 1539617414.1750674, 'gid': 0,
'mtime': 1539617414.1960669, 'mode': 33204, 'uid': 0}, 'entry1':
'.gfid/d2de43dd-07c1-4413-b7eb-e793e66dc610/koala.jpg', 'gfid':
'8899e426-7709-4351-8181-b663eb57a6a7', 'link': None, 'entry':
'.gfid/d2de43dd-07c1-4413-b7eb-e793e66dc610/koala_0.jpg', 'op': 'RENAME'}, 17,
{'slave_isdir': False, 'gfid_mismatch': True, 'slave_name': None, 'slave_gfid':
'61e447a7-ad37-4acf-a0ba-3d570368803d', 'name_mismatch': False, 'dst': True})
[2018-10-17 14:59:44.457129] I
[master(/gluster/brick1/brick1):834:fix_possible_entry_failures] _GMaster:
Fixing gfid mismatch in slave. Safe to ignore, take out entry
retry_count=1 entry=({'stat': {'atime': 1539323311.2722738, 'gid': 0,
'mtime': 1539323311.2792735, 'mode': 33277, 'uid': 0}, 'entry1':
'.gfid/d2de43dd-07c1-4413-b7eb-e793e66dc610/koala_0.jpg', 'gfid':
'917fd4ff-c476-46b2-a805-b8212bd3635a', 'link': None, 'entry':
'.gfid/08bcd5e4-b2f9-459d-a549-3fd4a303aa25/koala.jpg', 'op': 'RENAME'}, 17,
{'slave_isdir': False, 'gfid_mismatch': True, 'slave_name': None, 'slave_gfid':
'8899e426-7709-4351-8181-b663eb57a6a7', 'name_mismatch': False, 'dst': True})
[2018-10-17 14:59:44.457343] E
[syncdutils(/gluster/brick1/brick1):349:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 210, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 805, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1588, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 597, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1535, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1435, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1269, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1171, in process_change
    self.handle_entry_failures(failures, entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 926, in handle_entry_failures
    failures1, retries, entry_ops1)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 835, in fix_possible_entry_failures
    entries.remove(failure[0])
ValueError: list.remove(x): x not in list
Expected results:
Geo-replication proceeds without getting stuck.
Additional info:
--- Additional comment from Worker Ant on 2018-10-25 05:01:34 EDT ---
REVIEW: https://review.gluster.org/21483 (geo-rep: Fix issue in
gfid-conflict-resolution) posted (#1) for review on master by Kotresh HR
--- Additional comment from Kotresh HR on 2018-10-26 01:00 EDT ---
Testcase:
1. Set up a geo-rep session and mount the master volume at "/mastermnt"
2. Create a directory and change its ownership to a normal user
   mkdir -p /mastermnt/logrotate
   chown geoaccount:geoaccount /mastermnt/logrotate
3. Log in as the normal user (geoaccount) and run the logrotate_simulate.sh script
Observation:
geo-rep will crash without the fix.
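
The logrotate_simulate.sh script is not included in this report. As a rough illustration only, the following Python sketch shows the kind of rename churn a log-rotation workload produces: files are shifted to numbered names and the original name is recreated. The function name `rotate`, the file name `app.log`, and the `keep` parameter are hypothetical, and the demo runs in a temporary directory rather than on a GlusterFS mount.

```python
import os
import tempfile


def rotate(directory, name, keep=3):
    """Hypothetical logrotate-style rotation, not the script from the report.

    Shifts name.(keep-1) -> name.keep, ..., name.1 -> name.2,
    then name -> name.1, and finally recreates an empty `name`.
    """
    for i in range(keep - 1, 0, -1):
        src = os.path.join(directory, f"{name}.{i}")
        if os.path.exists(src):
            # os.replace overwrites the destination if it already exists.
            os.replace(src, os.path.join(directory, f"{name}.{i + 1}"))

    current = os.path.join(directory, name)
    if os.path.exists(current):
        os.replace(current, os.path.join(directory, f"{name}.1"))

    # Recreating the original name produces a brand-new file.
    with open(current, "w") as fh:
        fh.write("")


# Demo in a scratch directory (assumption: stands in for /mastermnt/logrotate).
with tempfile.TemporaryDirectory() as workdir:
    for _ in range(5):
        rotate(workdir, "app.log")
    rotated_names = sorted(os.listdir(workdir))
```

Each rotation reuses existing names for different files, which is the pattern (renames plus a new file at an old name) under which a slave can still hold an older file with a different gfid at the same path.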
--- Additional comment from Worker Ant on 2018-10-26 05:26:21 EDT ---
COMMIT: https://review.gluster.org/21483 committed in master by "Sunny Kumar"
<sunkumar at redhat.com> with a commit message- geo-rep: Fix issue in
gfid-conflict-resolution
Problem:
During gfid-conflict-resolution, geo-rep crashes
with 'ValueError: list.remove(x): x not in list'
Cause and Analysis:
During gfid-conflict-resolution, the entry blob is
passed back to master along with additional
information to verify its integrity. If everything
looks fine, the entry creation is ignored and the
entry is deleted from the original list. But the
removal crashes, reporting that the entry is not in
the list. The reason is that the stat information
in the entry blob, if present, was modified before
being sent back to master.
Fix:
Send back the correct stat information for
gfid-conflict-resolution.
fixes: bz#1642865
Change-Id: I47a6aa60b2a495465aa9314eebcb4085f0b1c4fd
Signed-off-by: Kotresh HR <khiremat at redhat.com>
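
The failure mode described under "Cause and Analysis" can be reproduced in isolation. The sketch below is not gsyncd code: the dictionaries are simplified, hypothetical stand-ins for the entry blobs in the log, and `try_remove` is an illustrative helper. It relies only on the fact that `list.remove` locates its argument by equality, so a blob whose nested `stat` was altered no longer matches the original entry.

```python
import copy


def try_remove(entries, returned_entry):
    """Return True if returned_entry was found (by ==) and removed."""
    try:
        entries.remove(returned_entry)
        return True
    except ValueError:
        # Raised by list.remove when no element compares equal.
        return False


# Simplified stand-in for an entry queued on the master side.
original = {
    "op": "RENAME",
    "gfid": "917fd4ff-c476-46b2-a805-b8212bd3635a",
    "stat": {"uid": 0, "gid": 0, "mode": 33277},
}
entries = [original]

# Blob sent back with its stat modified (assumed change, for illustration).
altered = copy.deepcopy(original)
altered["stat"]["mode"] = 33204

# Blob sent back unchanged.
faithful = copy.deepcopy(original)

removed_altered = try_remove(entries, altered)    # no equal element found
removed_faithful = try_remove(entries, faithful)  # equal by value, removed
```

Here the altered blob is not found in the list, while the unchanged copy is removed even though it is a different object, which is consistent with the fix of sending back the correct stat information.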
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1642865
[Bug 1642865] geo-rep: geo-replication gets stuck after file rename and
gfid conflict
https://bugzilla.redhat.com/show_bug.cgi?id=1644158
[Bug 1644158] geo-rep: geo-replication gets stuck after file rename and
gfid conflict