[Bugs] [Bug 1345883] New: [geo-rep]: Worker died with [Errno 2] No such file or directory
bugzilla at redhat.com
Mon Jun 13 11:28:35 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1345883
Bug ID: 1345883
Summary: [geo-rep]: Worker died with [Errno 2] No such file or
directory
Product: GlusterFS
Version: 3.8.0
Component: geo-replication
Severity: high
Assignee: bugs at gluster.org
Reporter: avishwan at redhat.com
CC: bugs at gluster.org, csaba at redhat.com,
rhinduja at redhat.com, rhs-bugs at redhat.com,
storage-qa-internal at redhat.com
Depends On: 1339159, 1339471
Blocks: 1345882
+++ This bug was initially created as a clone of Bug #1339471 +++
+++ This bug was initially created as a clone of Bug #1339159 +++
Description of problem:
=======================
Upon running the geo-rep regression cases, found the following traceback while
it was processing xsync changelog:
[2016-05-23 15:13:28.683130] I
[resource(/bricks/brick0/master_brick0):1491:service_loop] GLUSTER: Register
time: 1464016408
[2016-05-23 15:13:28.712944] I
[master(/bricks/brick0/master_brick0):510:crawlwrap] _GMaster: primary master
with volume id 7590ca29-59de-403a-95ff-10e229a403b6 ...
[2016-05-23 15:13:28.864242] I
[master(/bricks/brick0/master_brick0):519:crawlwrap] _GMaster: crawl interval:
1 seconds
[2016-05-23 15:13:28.870495] I
[master(/bricks/brick0/master_brick0):466:mgmt_lock] _GMaster: Got lock :
/bricks/brick0/master_brick0 : Becoming ACTIVE
[2016-05-23 15:13:29.163460] I
[master(/bricks/brick0/master_brick0):1163:crawl] _GMaster: starting history
crawl... turns: 1, stime: (1464016374, 0), etime: 1464016409
[2016-05-23 15:13:30.165673] I
[master(/bricks/brick0/master_brick0):1192:crawl] _GMaster: slave's time:
(1464016374, 0)
[2016-05-23 15:13:31.970442] I
[master(/bricks/brick0/master_brick0):1206:crawl] _GMaster: finished history
crawl syncing, endtime: 1464016405, stime: (1464016404, 0)
[2016-05-23 15:13:34.646481] I
[master(/bricks/brick1/master_brick6):1121:crawl] _GMaster: slave's time:
(1464016396, 0)
[2016-05-23 15:13:43.984873] I
[master(/bricks/brick0/master_brick0):1163:crawl] _GMaster: starting history
crawl... turns: 2, stime: (1464016404, 0), etime: 1464016423
[2016-05-23 15:13:43.986049] I
[master(/bricks/brick0/master_brick0):1206:crawl] _GMaster: finished history
crawl syncing, endtime: 1464016405, stime: (1464016404, 0)
[2016-05-23 15:13:43.986222] I
[resource(/bricks/brick0/master_brick0):1500:service_loop] GLUSTER: Partial
history available, using xsync crawl after consuming history till 1464016405
[2016-05-23 15:13:43.993215] I
[master(/bricks/brick0/master_brick0):510:crawlwrap] _GMaster: primary master
with volume id 7590ca29-59de-403a-95ff-10e229a403b6 ...
[2016-05-23 15:13:44.8985] I
[master(/bricks/brick0/master_brick0):519:crawlwrap] _GMaster: crawl interval:
60 seconds
[2016-05-23 15:13:44.16269] I [master(/bricks/brick0/master_brick0):1271:crawl]
_GMaster: starting hybrid crawl..., stime: (1464016404, 0)
[2016-05-23 15:13:45.20493] I [master(/bricks/brick0/master_brick0):1281:crawl]
_GMaster: processing xsync changelog
/var/lib/misc/glusterfsd/master/ssh%3A%2F%2Froot%4010.70.37.196%3Agluster%3A%2F%2F127.0.0.1
%3Aslave/4b7a065288ce3187adad4d6439fb4f75/xsync/XSYNC-CHANGELOG.1464016424
[2016-05-23 15:13:45.234045] E
[syncdutils(/bricks/brick0/master_brick0):276:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 201, in main
main_i()
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 708, in
main_i
local.service_loop(*[r for r in [remote] if r])
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1501, in
service_loop
g1.crawlwrap(oneshot=True, register_time=register_time)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 571, in
crawlwrap
self.crawl()
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1286, in
crawl
self.upd_stime(item[1][1], item[1][0])
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1069, in
upd_stime
self.sendmark(path, stime)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 641, in
sendmark
self.set_slave_xtime(path, mark)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 182, in
set_slave_xtime
self.slave.server.set_stime(path, self.uuid, mark)
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1439, in
<lambda>
mark)
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 327, in ff
return f(*a)
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 539, in
set_stime
struct.pack('!II', *mark))
File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 79, in
lsetxattr
cls.raise_oserr()
File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 37, in
raise_oserr
raise OSError(errn, os.strerror(errn))
OSError: [Errno 2] No such file or directory
[2016-05-23 15:13:45.240242] I
[syncdutils(/bricks/brick0/master_brick0):220:finalize] <top>: exiting.
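The failure can be reproduced outside of gsyncd. Below is a minimal sketch (not
Gluster code; the xattr key name is made up for illustration) of the same class
of error on Linux: the stime tuple is packed the way gsyncd does with
struct.pack('!II', ...), but the target path has already been removed, so the
xattr call fails with ENOENT.

import errno
import os
import struct
import tempfile

# Create a directory and delete it again, mimicking an rmdir on the master
# that lands before the stime update for that directory is attempted.
path = tempfile.mkdtemp()
os.rmdir(path)

try:
    # Illustrative xattr key only; gsyncd writes its own GlusterFS-namespace
    # key via lsetxattr, as seen in the traceback above.
    os.setxattr(path, "user.example.stime", struct.pack("!II", 1464016404, 0))
except OSError as e:
    assert e.errno == errno.ENOENT
    print("worker would die here:", e)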
When the worker restarted, it registered with the time 1464016408 and started a
history crawl with stime 1464016374 and etime (the current time requested)
1464016409. In the first trial the history crawl returned changelogs only up to
1464016405.
In another attempt the history changelogs were requested with stime 1464016404
and etime 1464016423. Since the changelog rollover had not yet happened, it
again returned only up to 1464016405.
Because of this, only partial history was available and the worker fell back to
an xsync crawl, which again died with "No such file or directory".
In the next history crawl the rollover had completed, the returned endtime
(1464016457) was greater than the register time (1464016440), and the sync
completed using the history crawl.
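A rough sketch of the decision described above (illustrative, not the actual
resource.py logic): the worker treats the history as partial, and falls back to
the xsync/hybrid crawl, whenever the endtime returned by the history API is
older than its register time.

def choose_crawl(history_endtime, register_time):
    # Partial history: changelog rollover has not happened yet, so the
    # history API cannot cover the interval up to register_time.
    if history_endtime < register_time:
        return "xsync"
    return "history"

print(choose_crawl(1464016405, 1464016409))  # failing run above -> xsync
print(choose_crawl(1464016457, 1464016440))  # later run -> history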
Version-Release number of selected component (if applicable):
=============================================================
glusterfs-geo-replication-3.7.9-5.el7rhgs.x86_64
glusterfs-3.7.9-5.el7rhgs.x86_64
How reproducible:
=================
Observed Once.
Steps to Reproduce:
===================
Will work on the exact steps and update the BZ. In general the scenario is:
1. Two history crawl attempts finish before the changelog rollover happens,
causing a partial history crawl.
2. rmdir is issued on the master before the corresponding changelog is
processed and synced to the slave, causing the "No such file or directory"
error.
--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-05-24
05:27:19 EDT ---
This bug is automatically being proposed for the current z-stream release of
Red Hat Gluster Storage 3 by setting the release flag 'rhgs-3.1.z' to '?'.
If this bug should be proposed for a different release, please manually change
the proposed release flag.
--- Additional comment from Vijay Bellur on 2016-05-25 02:45:02 EDT ---
REVIEW: http://review.gluster.org/14529 (geo-rep: Handle stime/xtime set
failures) posted (#1) for review on master by Aravinda VK (avishwan at redhat.com)
--- Additional comment from Vijay Bellur on 2016-06-13 07:27:17 EDT ---
COMMIT: http://review.gluster.org/14529 committed in master by Aravinda VK
(avishwan at redhat.com)
------
commit 1a348bfaeb9f2a50ec8ce27e5477e9b430c58b3c
Author: Aravinda VK <avishwan at redhat.com>
Date: Wed May 25 11:56:56 2016 +0530
geo-rep: Handle stime/xtime set failures
While setting stime/xtime, if the file or directory is already
deleted then Geo-rep will crash with ENOENT.
With this patch, Geo-rep ignores ENOENT since stime/xtime can't
be applied on a deleted file/directory.
Change-Id: I2d90569e51565f81ae53fcb23323e4f47c9e9672
Signed-off-by: Aravinda VK <avishwan at redhat.com>
BUG: 1339471
Reviewed-on: http://review.gluster.org/14529
Smoke: Gluster Build System <jenkins at build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
CentOS-regression: Gluster Build System <jenkins at build.gluster.com>
Reviewed-by: Saravanakumar Arumugam <sarumuga at redhat.com>
Reviewed-by: Kotresh HR <khiremat at redhat.com>
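One way to picture the fix (the helper name below is a placeholder, not the
exact code merged in review 14529): wrap the stime/xtime update and swallow
ENOENT, since there is nothing left to stamp once the file or directory has
been deleted.

import errno

def set_stime_ignoring_enoent(set_stime, path, uuid, mark):
    try:
        set_stime(path, uuid, mark)
    except OSError as e:
        if e.errno == errno.ENOENT:
            # Entry already deleted on the brick; skip the update instead of
            # letting the worker die as in the traceback above.
            return
        raise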
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1339159
[Bug 1339159] [geo-rep]: Worker died with [Errno 2] No such file or
directory
https://bugzilla.redhat.com/show_bug.cgi?id=1339471
[Bug 1339471] [geo-rep]: Worker died with [Errno 2] No such file or
directory
https://bugzilla.redhat.com/show_bug.cgi?id=1345882
[Bug 1345882] [geo-rep]: Worker died with [Errno 2] No such file or
directory
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.