[Bugs] [Bug 1236093] New: [geo-rep]: worker died with "ESTALE" when performed rm -rf on a directory from mount of master volume
bugzilla at redhat.com
Fri Jun 26 14:07:49 UTC 2015
https://bugzilla.redhat.com/show_bug.cgi?id=1236093
Bug ID: 1236093
Summary: [geo-rep]: worker died with "ESTALE" when performed rm -rf on a directory from mount of master volume
Product: GlusterFS
Version: 3.7.1
Component: geo-replication
Severity: urgent
Priority: high
Assignee: bugs at gluster.org
Reporter: khiremat at redhat.com
CC: aavati at redhat.com, annair at redhat.com,
asrivast at redhat.com, avishwan at redhat.com,
bugs at gluster.org, csaba at redhat.com,
gluster-bugs at redhat.com, nlevinki at redhat.com,
rhinduja at redhat.com, storage-qa-internal at redhat.com,
vagarwal at redhat.com
Depends On: 1222856, 1232912, 1223286
Blocks: 1202842, 1223636
+++ This bug was initially created as a clone of Bug #1232912 +++
+++ This bug was initially created as a clone of Bug #1222856 +++
Description of problem:
=======================
Whenever rm -rf was performed on the master volume, the worker died with the
following backtrace:
[2015-05-19 15:33:13.868683] E [syncdutils(/rhs/brick2/b2):276:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 165, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 659, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1440, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 580, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1150, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1059, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 946, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 902, in process_change
    failures = self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
    raise res
OSError: [Errno 116] Stale file handle
[2015-05-19 15:33:13.870326] I [syncdutils(/rhs/brick2/b2):220:finalize] <top>: exiting.
[2015-05-19 15:33:13.874784] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
And every time the monitor tries to respawn the worker, it dies again in the
startup phase.
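Why a single slave-side error takes down the whole worker: the traceback ends in repce.py's __call__, which re-raises whatever exception object the slave sent back as the result of entry_ops. A minimal sketch of that pattern (not the actual repce code; call_remote() below is a hypothetical transport stub):

import errno

def call_remote(meth, *args):
    # Hypothetical transport stub: pretend the slave's entry_ops() failed
    # and the RPC layer shipped the exception object back as the result.
    return OSError(errno.ESTALE, "Stale file handle")

def rpc_call(meth, *args):
    res = call_remote(meth, *args)
    if isinstance(res, Exception):
        raise res  # same pattern as the `raise res` line in the traceback
    return res

# Left unhandled, this terminates the worker exactly as the log shows:
#   OSError: [Errno 116] Stale file handle
rpc_call("entry_ops", [])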
Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.0-2.el6rhs.x86_64
How reproducible:
================
Tried a couple of times and reproduced it every time.
Steps Carried:
==============
1. Created master cluster
2. Created and started master volume
3. Created shared volume (gluster_shared_storage)
4. Mounted the shared volume on /var/run/gluster/shared_storage
5. Created Slave cluster
6. Created and Started slave volume
7. Created geo-rep session between master and slave
8. Configured use_meta_volume true
9. Started geo-rep
10. Mounted the master volume on a client over FUSE and NFS
11. Copied files /etc{1..10} from the FUSE mount
12. Copied files /etc{11..20} from the NFS mount
13. Sync completed successfully
14. Removed etc.2 from the FUSE mount and etc.12 from the NFS mount (see the sketch after this list)
15. Checked the geo-rep session; it was Faulty
16. Checked the logs; they showed a continuous traceback
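Step 14 as a minimal Python sketch; the mount points below are hypothetical stand-ins for wherever the master volume was mounted in step 10:

import shutil

FUSE_MOUNT = "/mnt/master-fuse"  # hypothetical FUSE mount point
NFS_MOUNT = "/mnt/master-nfs"    # hypothetical NFS mount point

# Recursive removals equivalent to `rm -rf`; replaying these entry
# operations on the slave is what raised the ESTALE.
shutil.rmtree(FUSE_MOUNT + "/etc.2")
shutil.rmtree(NFS_MOUNT + "/etc.12")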
Actual results:
===============
The worker crashed and came back with the crawl type as History.
Expected results:
=================
The worker should not crash; it should handle ESTALE gracefully.
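To illustrate what handling ESTALE gracefully could look like on the slave side during an rm -rf replay, here is a minimal sketch (remove_entry() is a hypothetical helper, not the code from the patches referenced below): a stale handle while unlinking or removing a directory usually just means the entry is already gone, so it can be skipped the same way ENOENT is.

import errno
import os

def remove_entry(path):
    # Hypothetical helper: tolerate "already gone" conditions during an
    # rm -rf replay instead of letting them terminate the worker.
    try:
        if os.path.isdir(path):
            os.rmdir(path)
        else:
            os.unlink(path)
    except OSError as e:
        if e.errno in (errno.ENOENT, errno.ESTALE):
            return  # entry already removed or handle stale: nothing to do
        raise       # any other error is still a real failure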
--- Additional comment from Rahul Hinduja on 2015-05-19 06:29:45 EDT ---
sosreport @ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1222856/
rm -rf of a directory is a common use case, hence proposing this as a blocker
for 3.1.
--- Additional comment from Aravinda VK on 2015-06-02 04:11:09 EDT ---
Patches:
master: http://review.gluster.org/#/c/10837/
release-3.7: http://review.gluster.org/10913
downstream: https://code.engineering.redhat.com/gerrit/#/c/49674/
--- Additional comment from errata-xmlrpc on 2015-06-05 02:23:26 EDT ---
Bug report changed to ON_QA status by Errata System.
A QE request has been submitted for advisory RHEA-2015:20560-02
https://errata.devel.redhat.com/advisory/20560
--- Additional comment from Rahul Hinduja on 2015-06-11 07:42:23 EDT ---
I still see the issue with build glusterfs-3.7.1-1.
Moving the bug back to the ASSIGNED state.
[root at georep1 scripts]# rpm -qa | grep gluster
glusterfs-client-xlators-3.7.1-1.el6rhs.x86_64
glusterfs-server-3.7.1-1.el6rhs.x86_64
glusterfs-3.7.1-1.el6rhs.x86_64
glusterfs-api-3.7.1-1.el6rhs.x86_64
glusterfs-cli-3.7.1-1.el6rhs.x86_64
glusterfs-geo-replication-3.7.1-1.el6rhs.x86_64
glusterfs-libs-3.7.1-1.el6rhs.x86_64
glusterfs-fuse-3.7.1-1.el6rhs.x86_64
glusterfs-debuginfo-3.7.1-1.el6rhs.x86_64
[root at georep1 scripts]# cat /var/log/glusterfs/geo-replication/master/ssh%3A%2F%2Froot%4010.70.46.154%3Agluster%3A%2F%2F127.0.0.1%3Aslave.log | grep "OSError"
[2015-06-11 22:34:23.111248] E [repce(/rhs/brick2/b2):207:__call__] RepceClient: call 20852:140282122651392:1434042220.8 (entry_ops) failed on peer with OSError
[2015-06-11 22:34:46.175925] E [repce(/rhs/brick2/b2):207:__call__] RepceClient: call 21689:140594955093760:1434042280.85 (entry_ops) failed on peer with OSError
OSError: [Errno 116] Stale file handle
[2015-06-11 22:35:08.149015] E [repce(/rhs/brick2/b2):207:__call__] RepceClient: call 21766:140460004030208:1434042303.43 (entry_ops) failed on peer with OSError
OSError: [Errno 116] Stale file handle
[root at georep1 scripts]#
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1202842
[Bug 1202842] [TRACKER] RHGS 3.1 Tracker BZ
https://bugzilla.redhat.com/show_bug.cgi?id=1222856
[Bug 1222856] [geo-rep]: worker died with "ESTALE" when performed rm -rf on a directory from mount of master volume
https://bugzilla.redhat.com/show_bug.cgi?id=1223286
[Bug 1223286] [geo-rep]: worker died with "ESTALE" when performed rm -rf on a directory from mount of master volume
https://bugzilla.redhat.com/show_bug.cgi?id=1223636
[Bug 1223636] 3.1 QE Tracker
https://bugzilla.redhat.com/show_bug.cgi?id=1232912
[Bug 1232912] [geo-rep]: worker died with "ESTALE" when performed rm -rf on a directory from mount of master volume
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.