[Bugs] [Bug 1500841] New: [geo-rep]: Worker crashes with OSError: [Errno 61] No data available
bugzilla at redhat.com
Wed Oct 11 15:07:27 UTC 2017
https://bugzilla.redhat.com/show_bug.cgi?id=1500841
Bug ID: 1500841
Summary: [geo-rep]: Worker crashes with OSError: [Errno 61] No
data available
Product: GlusterFS
Version: 3.12
Component: geo-replication
Keywords: ZStream
Severity: medium
Assignee: bugs at gluster.org
Reporter: khiremat at redhat.com
CC: amukherj at redhat.com, avishwan at redhat.com,
bugs at gluster.org, csaba at redhat.com,
rhinduja at redhat.com, rhs-bugs at redhat.com,
storage-qa-internal at redhat.com
Depends On: 1499391
+++ This bug was initially created as a clone of Bug #1499391 +++
+++ This bug was initially created as a clone of Bug #1375094 +++
Description of problem:
=======================
While running the automation sanity check, which performs "create, chmod, chown,
chgrp, symlink, hardlink, rename, truncate, rm" during the changelog, xsync and
history crawls, the following worker crash was observed:
[2016-09-11 13:52:43.422640] E
[syncdutils(/bricks/brick1/master_brick5):276:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 306, in
twrap
tf(*aa)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1267, in
Xsyncer
self.Xcrawl()
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1424, in
Xcrawl
self.Xcrawl(e, xtr_root)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1424, in
Xcrawl
self.Xcrawl(e, xtr_root)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1424, in
Xcrawl
self.Xcrawl(e, xtr_root)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1424, in
Xcrawl
self.Xcrawl(e, xtr_root)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1424, in
Xcrawl
self.Xcrawl(e, xtr_root)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1424, in
Xcrawl
self.Xcrawl(e, xtr_root)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1406, in
Xcrawl
gfid = self.master.server.gfid(e)
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1414, in
gfid
return super(brickserver, cls).gfid(e)
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 327, in ff
return f(*a)
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 369, in
gfid
buf = Xattr.lgetxattr(path, cls.GFID_XATTR, 16)
File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 55, in
lgetxattr
return cls._query_xattr(path, siz, 'lgetxattr', attr)
File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 47, in
_query_xattr
cls.raise_oserr()
File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 37, in
raise_oserr
raise OSError(errn, os.strerror(errn))
OSError: [Errno 61] No data available
[2016-09-11 13:52:43.428107] I
[syncdutils(/bricks/brick1/master_brick5):220:finalize] <top>: exiting.
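For reference, the gfid() call at the bottom of the traceback is a plain
lgetxattr of the gfid xattr on the brick backend, and ENODATA from that syscall
is what surfaces as the OSError above. Below is a minimal sketch of that failure
mode using the standard library's os.getxattr instead of the syncdaemon's
ctypes-based Xattr.lgetxattr; the xattr key and brick path are illustrative
assumptions, not copied from the actual resource.py constants.

import errno
import os

# Illustrative xattr key: the real key comes from cls.GFID_XATTR in resource.py;
# 'trusted.gfid' on the brick backend is an assumption here.
GFID_XATTR = 'trusted.gfid'

def get_gfid(path):
    """Read the 16-byte gfid xattr of a brick entry."""
    # follow_symlinks=False mirrors the lgetxattr() call in libcxattr.py.
    return os.getxattr(path, GFID_XATTR, follow_symlinks=False)

if __name__ == '__main__':
    try:
        print(get_gfid('/bricks/brick1/master_brick5'))
    except OSError as e:
        if e.errno == errno.ENODATA:
            # The "[Errno 61] No data available" seen in the traceback above.
            print('gfid xattr transiently unavailable:', e)
        else:
            raise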
Version-Release number of selected component (if applicable):
=============================================================
mainline
How reproducible:
=================
Happened to see it once, while the same test suite was executed multiple times.
Steps:
Can't be very certain, but it happens somewhere in between the following:
1. Perform rm -rf on the master. Let it complete on the master.
2. Compare the files between the master and the slave.
3. Files match on master and slave, and arequal matches.
4. Set the change_detector to xsync.
It happens somewhere between steps 2 and 4.
This was caught via the automation health check, which runs the fops in the
changelog, xsync and history crawls one after another.
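For step 4 above, the crawl is switched by setting the change_detector
geo-replication config option. A hedged sketch of doing that from Python
(volume and slave names are placeholders, and the exact CLI shape should be
checked against the installed gluster version):

import subprocess

# Step 4: switch the crawl from changelog to xsync. Volume/slave names are
# placeholders; the CLI syntax is assumed, not copied from the test suite.
subprocess.check_call([
    'gluster', 'volume', 'geo-replication',
    'master_vol', 'slave-host::slave_vol',
    'config', 'change_detector', 'xsync',
])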
Slave Log at the same time:
[2016-09-11 13:52:43.433715] I [fuse-bridge.c:5007:fuse_thread_proc] 0-fuse:
unmounting /tmp/gsyncd-aux-mount-MvkqZP
[2016-09-11 13:52:43.436595] W [glusterfsd.c:1251:cleanup_and_exit]
(-->/lib64/libpthread.so.0(+0x7dc5) [0x7fa62ba77dc5]
-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7fa62d0ef915]
-->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x7fa62d0ef78b] ) 0-: received
signum (15), shutting down
[2016-09-11 13:52:43.436617] I [fuse-bridge.c:5714:fini] 0-fuse: Unmounting
'/tmp/gsyncd-aux-mount-MvkqZP'.
--- Additional comment from Worker Ant on 2017-10-06 23:17:56 EDT ---
REVIEW: https://review.gluster.org/18445 (geo-rep: Add ENODATA to retry list on
gfid getxattr) posted (#1) for review on master by Kotresh HR
(khiremat at redhat.com)
--- Additional comment from Worker Ant on 2017-10-10 01:53:45 EDT ---
REVIEW: https://review.gluster.org/18445 (geo-rep: Add ENODATA to retry list on
gfid getxattr) posted (#2) for review on master by Kotresh HR
(khiremat at redhat.com)
--- Additional comment from Worker Ant on 2017-10-11 06:16:13 EDT ---
COMMIT: https://review.gluster.org/18445 committed in master by Aravinda VK
(avishwan at redhat.com)
------
commit b56bdb34dafd1a87c5bbb2c9a75d1a088d82b1f4
Author: Kotresh HR <khiremat at redhat.com>
Date: Fri Oct 6 22:42:43 2017 -0400
geo-rep: Add ENODATA to retry list on gfid getxattr
During the xsync crawl, the worker occasionally crashed
with ENODATA while getting the gfid from the backend. The
error is transient, not persistent, and a worker restart
involves re-processing a few entries in the changelogs.
So ENODATA is added to the retry list to avoid the worker
restart.
Change-Id: Ib78d1e925c0a83c78746f28f7c79792a327dfd3e
BUG: 1499391
Signed-off-by: Kotresh HR <khiremat at redhat.com>
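The idea of the fix, in rough terms, is to treat ENODATA from the gfid getxattr
as a transient, retriable error instead of letting the worker die. A
hypothetical sketch of that pattern (not the actual patch; names and retry
policy here are made up for illustration):

import errno
import logging
import time

# Errnos treated as transient: retry the entry instead of crashing the worker.
# The committed patch effectively adds ENODATA to such a list for the gfid
# getxattr path; the exact list and policy below are illustrative.
RETRIABLE_ERRNOS = (errno.ENOENT, errno.ENODATA)

def gfid_with_retry(get_gfid, path, attempts=3, delay=1):
    """Hypothetical helper: retry a gfid lookup on transient errors."""
    for attempt in range(1, attempts + 1):
        try:
            return get_gfid(path)
        except OSError as e:
            if e.errno in RETRIABLE_ERRNOS and attempt < attempts:
                logging.warning("transient %s on %s, retrying (%d/%d)",
                                errno.errorcode.get(e.errno, e.errno),
                                path, attempt, attempts)
                time.sleep(delay)
                continue
            raise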
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1499391
[Bug 1499391] [geo-rep]: Worker crashes with OSError: [Errno 61] No data
available