[Bugs] [Bug 1500841] New: [geo-rep]: Worker crashes with OSError: [Errno 61] No data available
bugzilla at redhat.com
Wed Oct 11 15:07:27 UTC 2017
https://bugzilla.redhat.com/show_bug.cgi?id=1500841
Bug ID: 1500841
Summary: [geo-rep]: Worker crashes with OSError: [Errno 61] No
data available
Product: GlusterFS
Version: 3.12
Component: geo-replication
Keywords: ZStream
Severity: medium
Assignee: bugs at gluster.org
Reporter: khiremat at redhat.com
CC: amukherj at redhat.com, avishwan at redhat.com,
bugs at gluster.org, csaba at redhat.com,
rhinduja at redhat.com, rhs-bugs at redhat.com,
storage-qa-internal at redhat.com
Depends On: 1499391
+++ This bug was initially created as a clone of Bug #1499391 +++
+++ This bug was initially created as a clone of Bug #1375094 +++
Description of problem:
=======================
While running the automation sanity check, which performs "create, chmod, chown,
chgrp, symlink, hardlink, rename, truncate, rm" during the changelog, xsync and
history crawls, the following worker crash was observed:
[2016-09-11 13:52:43.422640] E
[syncdutils(/bricks/brick1/master_brick5):276:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 306, in
twrap
tf(*aa)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1267, in
Xsyncer
self.Xcrawl()
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1424, in
Xcrawl
self.Xcrawl(e, xtr_root)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1424, in
Xcrawl
self.Xcrawl(e, xtr_root)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1424, in
Xcrawl
self.Xcrawl(e, xtr_root)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1424, in
Xcrawl
self.Xcrawl(e, xtr_root)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1424, in
Xcrawl
self.Xcrawl(e, xtr_root)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1424, in
Xcrawl
self.Xcrawl(e, xtr_root)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1406, in
Xcrawl
gfid = self.master.server.gfid(e)
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1414, in
gfid
return super(brickserver, cls).gfid(e)
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 327, in ff
return f(*a)
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 369, in
gfid
buf = Xattr.lgetxattr(path, cls.GFID_XATTR, 16)
File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 55, in
lgetxattr
return cls._query_xattr(path, siz, 'lgetxattr', attr)
File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 47, in
_query_xattr
cls.raise_oserr()
File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 37, in
raise_oserr
raise OSError(errn, os.strerror(errn))
OSError: [Errno 61] No data available
[2016-09-11 13:52:43.428107] I
[syncdutils(/bricks/brick1/master_brick5):220:finalize] <top>: exiting.
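For reference, the gfid() call at the bottom of the traceback is a plain
lgetxattr of the gfid xattr on the brick backend, and ENODATA from that syscall
is what surfaces as the OSError above. Below is a minimal sketch of that failure
mode using the standard library's os.getxattr instead of the syncdaemon's
ctypes-based Xattr.lgetxattr; the xattr key and brick path are illustrative
assumptions, not copied from the actual resource.py constants.

import errno
import os

# Illustrative xattr key: the real key comes from cls.GFID_XATTR in resource.py;
# 'trusted.gfid' on the brick backend is an assumption here.
GFID_XATTR = 'trusted.gfid'

def get_gfid(path):
    """Read the 16-byte gfid xattr of a brick entry."""
    # follow_symlinks=False mirrors the lgetxattr() call in libcxattr.py.
    return os.getxattr(path, GFID_XATTR, follow_symlinks=False)

if __name__ == '__main__':
    try:
        print(get_gfid('/bricks/brick1/master_brick5'))
    except OSError as e:
        if e.errno == errno.ENODATA:
            # The "[Errno 61] No data available" seen in the traceback above.
            print('gfid xattr transiently unavailable:', e)
        else:
            raise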
Version-Release number of selected component (if applicable):
=============================================================
mainline
How reproducible:
=================
Happened to see it once, while the same test suite was executed multiple times.
Steps:
Can't be very certain, but it happens somewhere in between the following:
1. Perform rm -rf on the master. Let it complete on the master.
2. Compare the files between the master and the slave.
3. Files match on master and slave, and arequal matches.
4. Set the change_detector to xsync.
It happens somewhere between steps 2 and 4.
This was caught via the automation health check, which runs the fops in the
changelog, xsync and history crawls one after another.
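For step 4 above, the crawl is switched by setting the change_detector
geo-replication config option. A hedged sketch of doing that from Python
(volume and slave names are placeholders, and the exact CLI shape should be
checked against the installed gluster version):

import subprocess

# Step 4: switch the crawl from changelog to xsync. Volume/slave names are
# placeholders; the CLI syntax is assumed, not copied from the test suite.
subprocess.check_call([
    'gluster', 'volume', 'geo-replication',
    'master_vol', 'slave-host::slave_vol',
    'config', 'change_detector', 'xsync',
])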
Slave Log at the same time:
[2016-09-11 13:52:43.433715] I [fuse-bridge.c:5007:fuse_thread_proc] 0-fuse:
unmounting /tmp/gsyncd-aux-mount-MvkqZP
[2016-09-11 13:52:43.436595] W [glusterfsd.c:1251:cleanup_and_exit]
(-->/lib64/libpthread.so.0(+0x7dc5) [0x7fa62ba77dc5]
-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7fa62d0ef915]
-->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x7fa62d0ef78b] ) 0-: received
signum (15), shutting down
[2016-09-11 13:52:43.436617] I [fuse-bridge.c:5714:fini] 0-fuse: Unmounting
'/tmp/gsyncd-aux-mount-MvkqZP'.
--- Additional comment from Worker Ant on 2017-10-06 23:17:56 EDT ---
REVIEW: https://review.gluster.org/18445 (geo-rep: Add ENODATA to retry list on
gfid getxattr) posted (#1) for review on master by Kotresh HR
(khiremat at redhat.com)
--- Additional comment from Worker Ant on 2017-10-10 01:53:45 EDT ---
REVIEW: https://review.gluster.org/18445 (geo-rep: Add ENODATA to retry list on
gfid getxattr) posted (#2) for review on master by Kotresh HR
(khiremat at redhat.com)
--- Additional comment from Worker Ant on 2017-10-11 06:16:13 EDT ---
COMMIT: https://review.gluster.org/18445 committed in master by Aravinda VK
(avishwan at redhat.com)
------
commit b56bdb34dafd1a87c5bbb2c9a75d1a088d82b1f4
Author: Kotresh HR <khiremat at redhat.com>
Date: Fri Oct 6 22:42:43 2017 -0400
geo-rep: Add ENODATA to retry list on gfid getxattr
During the xsync crawl, the worker occasionally crashed
with ENODATA while getting the gfid from the backend. The
error is transient, not persistent, and a worker restart
involves re-processing a few entries in the changelogs.
So ENODATA is added to the retry list to avoid the worker
restart.
Change-Id: Ib78d1e925c0a83c78746f28f7c79792a327dfd3e
BUG: 1499391
Signed-off-by: Kotresh HR <khiremat at redhat.com>
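The idea of the fix, in rough terms, is to treat ENODATA from the gfid getxattr
as a transient, retriable error instead of letting the worker die. A
hypothetical sketch of that pattern (not the actual patch; names and retry
policy here are made up for illustration):

import errno
import logging
import time

# Errnos treated as transient: retry the entry instead of crashing the worker.
# The committed patch effectively adds ENODATA to such a list for the gfid
# getxattr path; the exact list and policy below are illustrative.
RETRIABLE_ERRNOS = (errno.ENOENT, errno.ENODATA)

def gfid_with_retry(get_gfid, path, attempts=3, delay=1):
    """Hypothetical helper: retry a gfid lookup on transient errors."""
    for attempt in range(1, attempts + 1):
        try:
            return get_gfid(path)
        except OSError as e:
            if e.errno in RETRIABLE_ERRNOS and attempt < attempts:
                logging.warning("transient %s on %s, retrying (%d/%d)",
                                errno.errorcode.get(e.errno, e.errno),
                                path, attempt, attempts)
                time.sleep(delay)
                continue
            raise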
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1499391
[Bug 1499391] [geo-rep]: Worker crashes with OSError: [Errno 61] No data
available