[Bugs] [Bug 1348085] New: [geo-rep]: Worker crashed with "KeyError: "
bugzilla at redhat.com
Mon Jun 20 06:38:18 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1348085
Bug ID: 1348085
Summary: [geo-rep]: Worker crashed with "KeyError: "
Product: GlusterFS
Version: 3.7.11
Component: geo-replication
Severity: high
Assignee: bugs at gluster.org
Reporter: avishwan at redhat.com
CC: bugs at gluster.org, csaba at redhat.com,
rhinduja at redhat.com, rhs-bugs at redhat.com,
storage-qa-internal at redhat.com
Depends On: 1344826, 1345744
+++ This bug was initially created as a clone of Bug #1345744 +++
+++ This bug was initially created as a clone of Bug #1344826 +++
Description of problem:
=======================
While performing rm -rf on a cascaded setup, found a worker crash on the primary
master and on the intermediate master volume with the following traceback:
Master Volume:
==============
[2016-06-11 09:41:17.359086] E [syncdutils(/rhs/brick1/b1):276:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 201, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 720, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1497, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 571, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1201, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1107, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 984, in process
    self.datas_in_batch.remove(unlinked_gfid)
KeyError: '.gfid/757b0ad8-b6f5-44da-b71a-1b1c25a72988'
Intermediate Master:
====================
[2016-06-11 09:41:51.681622] E [syncdutils(/rhs/brick1/b1):276:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 201, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 720, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1497, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 571, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1201, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1107, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 984, in process
    self.datas_in_batch.remove(unlinked_gfid)
KeyError: '.gfid/757b0ad8-b6f5-44da-b71a-1b1c25a72988'
[2016-06-11 09:41:51.684969] I [syncdutils(/rhs/brick1/b1):220:finalize] <top>: exiting.
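Both tracebacks end at self.datas_in_batch.remove(unlinked_gfid). In Python, set.remove() raises KeyError when the element is absent, while list.remove() raises ValueError. The exception type therefore suggests that datas_in_batch is a set-like collection and that the unlinked GFID was never added to it. Below is a minimal illustration, assuming a plain Python set. The GFID strings and the helper try_remove are made up for this example.

```python
def try_remove(container, item):
    """Call container.remove(item); return the name of the exception raised, or None."""
    try:
        container.remove(item)
        return None
    except (KeyError, ValueError) as exc:
        return type(exc).__name__


# Hypothetical batch contents: only one GFID has pending data to sync.
datas_in_batch = {".gfid/aaaa"}
unlinked_gfid = ".gfid/bbbb"  # unlinked, but never added to the data set

print(try_remove(set(datas_in_batch), unlinked_gfid))   # KeyError   (set)
print(try_remove(list(datas_in_batch), unlinked_gfid))  # ValueError (list)
print(try_remove(set(datas_in_batch), ".gfid/aaaa"))    # None: element was present
```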
Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.9-10
How reproducible:
=================
Always, on a cascaded setup when running rm -rf
Steps to Reproduce:
===================
1. Create a cascaded geo-rep setup with three volumes (vol0, vol1, vol2) such
that vol0=>vol1 and vol1=>vol2.
2. Mount vol0 and perform fops such as cp, create, chmod, chown, chgrp,
symlink, hardlink and truncate on it.
3. Let the changes sync to the slaves (vol1 and vol2).
4. Calculate the arequal checksum after every fop. It should match between
master and slaves.
5. Perform rm -rf on vol0.
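Step 4 relies on Gluster's arequal-checksum tool. If that tool is not available, a rough Python stand-in can compare two mount points. This is an assumption-laden sketch: it hashes relative paths, mode bits, uid/gid, symlink targets and regular-file contents. It does not reproduce arequal's algorithm or output format. The function name tree_checksum and the mount paths below are hypothetical.

```python
import hashlib
import os
import stat


def tree_checksum(root):
    """Return a SHA-256 hex digest over every entry below `root`.

    Hypothetical stand-in for arequal-checksum (not its algorithm): hashes the
    relative path, mode bits, uid and gid of each entry, plus the symlink
    target or the regular-file contents. Timestamps and link counts are ignored.
    """
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so os.walk descends in a stable order
        for name in sorted(dirnames + filenames):
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: describe symlinks themselves, not targets
            rel = os.path.relpath(path, root)
            digest.update(f"{rel}\0{st.st_mode}\0{st.st_uid}\0{st.st_gid}\0".encode())
            if stat.S_ISLNK(st.st_mode):
                digest.update(os.readlink(path).encode())
            elif stat.S_ISREG(st.st_mode):
                with open(path, "rb") as fh:
                    for chunk in iter(lambda: fh.read(65536), b""):
                        digest.update(chunk)
    return digest.hexdigest()
```

Hypothetical usage after each fop: `tree_checksum("/mnt/vol0") == tree_checksum("/mnt/vol1")`, and the same comparison against vol2. Because timestamps and hard-link counts are not hashed, this check is weaker than a real arequal comparison.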
Actual results:
===============
Worker crashed on vol0 and vol1 with KeyError.
Expected results:
=================
Worker shouldn't crash
Additional info:
================
Performed rm -rf on a non-cascaded setup and did not see the crash. Also,
the files are eventually removed from all masters and slaves.
--- Additional comment from Vijay Bellur on 2016-06-13 02:33:20 EDT ---
REVIEW: http://review.gluster.org/14706 (geo-rep: Safely handle if unliked GFID
not present in data list) posted (#1) for review on master by Aravinda VK
(avishwan at redhat.com)
--- Additional comment from Vijay Bellur on 2016-06-20 02:37:06 EDT ---
COMMIT: http://review.gluster.org/14706 committed in master by Aravinda VK
(avishwan at redhat.com)
------
commit 4797ca3778d82a671716d4913c14f285591ae959
Author: Aravinda VK <avishwan at redhat.com>
Date: Mon Jun 13 12:00:40 2016 +0530
geo-rep: Safely handle if unliked GFID not present in data list
If unlinked GFID is not present in data list to be synced then
Geo-rep worker was crashing with KeyError. Handled KeyError with
this patch.
BUG: 1345744
Change-Id: I5a1c9ca4473e32606df2e5c7e26c95faf55d44c0
Signed-off-by: Aravinda VK <avishwan at redhat.com>
Reviewed-on: http://review.gluster.org/14706
Smoke: Gluster Build System <jenkins at build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
Reviewed-by: Kotresh HR <khiremat at redhat.com>
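The commit message only says that the KeyError is now handled. The actual change is in review 14706 linked above. Below is a minimal sketch of that kind of defensive removal, assuming datas_in_batch is a plain set of ".gfid/<uuid>" strings. The helper drop_unlinked and its loop are hypothetical and are not taken from master.py.

```python
def drop_unlinked(datas_in_batch, unlinked_gfids):
    """Remove unlinked GFIDs from the set of entries whose data still needs syncing.

    Hypothetical helper: an unlinked GFID may legitimately be absent from the
    data set (as in the cascaded rm -rf case), so a missing entry is skipped
    instead of crashing the worker. Returns the GFIDs that were not present.
    """
    missing = []
    for gfid in unlinked_gfids:
        try:
            datas_in_batch.remove(gfid)
        except KeyError:
            missing.append(gfid)  # nothing to drop; remember it, e.g. for debug logging
    return missing


datas = {".gfid/1111", ".gfid/2222"}
missing = drop_unlinked(datas, [".gfid/2222", ".gfid/757b0ad8-b6f5-44da-b71a-1b1c25a72988"])
print(sorted(datas))  # ['.gfid/1111']
print(missing)        # ['.gfid/757b0ad8-b6f5-44da-b71a-1b1c25a72988']
```

Calling set.discard(gfid) would leave the set in the same end state without the try/except. The explicit except KeyError form is shown because it also records which GFIDs were missing.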
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1344826
[Bug 1344826] [geo-rep]: Worker crashed with "KeyError: "
https://bugzilla.redhat.com/show_bug.cgi?id=1345744
[Bug 1345744] [geo-rep]: Worker crashed with "KeyError: "