[Bugs] [Bug 1434018] [geo-rep]: Worker crashes with [Errno 16] Device or resource busy: '.gfid/00000000-0000-0000-0000-000000000001/dir.166' while renaming directories
bugzilla at redhat.com
Mon Mar 20 14:50:14 UTC 2017
https://bugzilla.redhat.com/show_bug.cgi?id=1434018
Kotresh HR <khiremat at redhat.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
Blocks|1385589, 1417147 |
--- Comment #1 from Kotresh HR <khiremat at redhat.com> ---
Description of problem:
=======================
While renaming directories in a loop, I am seeing the worker crash with the
following traceback:
Master:
=======
[2017-03-01 07:34:23.844472] E [master(/rhs/brick3/b5):785:log_failures]
_GMaster: ENTRY FAILED: ({'stat': {'atime': 1488353577.9969134, 'gid': 0,
'mtime': 1488353577.9969134, 'mode': 16877, 'uid': 0}, 'entry1':
'.gfid/00000000-0000-0000-0000-000000000001/rename_dir.124', 'gfid':
'a9adc254-3ec0-402d-945d-f1dcddbe411d', 'link': None, 'entry':
'.gfid/00000000-0000-0000-0000-000000000001/dir.124', 'op': 'RENAME'}, 2)
[2017-03-01 07:34:28.105679] E [repce(/rhs/brick3/b5):207:__call__]
RepceClient: call 21221:140592415500096:1488353664.61 (entry_ops) failed on
peer with OSError
[2017-03-01 07:34:28.109591] E [syncdutils(/rhs/brick3/b5):296:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 204, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 757, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1555, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 573, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1136, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1111, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 994, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 935, in process_change
    failures = self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
    raise res
OSError: [Errno 16] Device or resource busy: '.gfid/00000000-0000-0000-0000-000000000001/dir.166'
[2017-03-01 07:34:28.117834] I [syncdutils(/rhs/brick3/b5):237:finalize] <top>:
exiting.
[2017-03-01 07:34:28.138552] I [repce(agent):92:service_loop] RepceServer:
terminating on reaching EOF.
[2017-03-01 07:34:28.141488] I [syncdutils(agent):237:finalize] <top>: exiting.
[2017-03-01 07:34:36.280246] E [master(/rhs/brick1/b1):785:log_failures]
_GMaster: ENTRY FAILED: ({'stat': {'atime': 1488353579.1139069, 'gid': 0,
'mtime': 1488353579.1139069, 'mode': 16877, 'uid': 0}, 'entry1':
'.gfid/00000000-0000-0000-0000-000000000001/rename_dir.135', 'gfid':
'e15667ad-e647-4253-a84e-0a0c6143e730', 'link': None, 'entry':
'.gfid/00000000-0000-0000-0000-000000000001/dir.135', 'op': 'RENAME'}, 2)
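The ENTRY FAILED payloads above are printed as Python literals of the shape (entry_dict, integer). A minimal sketch for decoding them, assuming (as the tuple shape suggests, not confirmed by the gsyncd source) that the trailing integer is an errno value; on Linux 2 is ENOENT. For a RENAME, 'entry' holds the old name and 'entry1' the new one, which matches the "mv dir.$i rename_dir.$i" loop below. The helper name parse_entry_failure is hypothetical.

```python
import ast
import errno
import stat


def parse_entry_failure(payload):
    """Decode the text after 'ENTRY FAILED: ' (hypothetical helper).

    Assumes the payload is a Python literal (entry_dict, errno_value).
    """
    entry, err = ast.literal_eval(payload)
    mode = entry["stat"]["mode"]
    return {
        "op": entry["op"],
        "old": entry["entry"],       # source name of the rename
        "new": entry["entry1"],      # destination name of the rename
        "gfid": entry["gfid"],
        "is_dir": stat.S_ISDIR(mode),           # 16877 == 0o40755
        "perms": oct(stat.S_IMODE(mode)),       # permission bits only
        "errno": errno.errorcode.get(err, str(err)),
    }


# Payload copied from the /rhs/brick1/b1 log line above.
payload = (
    "({'stat': {'atime': 1488353579.1139069, 'gid': 0, "
    "'mtime': 1488353579.1139069, 'mode': 16877, 'uid': 0}, "
    "'entry1': '.gfid/00000000-0000-0000-0000-000000000001/rename_dir.135', "
    "'gfid': 'e15667ad-e647-4253-a84e-0a0c6143e730', 'link': None, "
    "'entry': '.gfid/00000000-0000-0000-0000-000000000001/dir.135', "
    "'op': 'RENAME'}, 2)"
)

info = parse_entry_failure(payload)
print(info["op"], info["is_dir"], info["perms"], info["errno"])
# prints: RENAME True 0o755 ENOENT
```

Read this way, both logged failures are directory renames (mode 0o40755) under the root gfid that failed with ENOENT, which the worker logs and skips; it is the separate EBUSY in entry_ops that kills it.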
Slave:
======
[2017-03-01 07:20:33.380264] I [resource(slave):932:service_loop] GLUSTER:
slave listening
[2017-03-01 07:34:28.50796] E [repce(slave):117:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 766, in entry_ops
    st = lstat(entry)
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 512, in lstat
    return errno_wrap(os.lstat, [e], [ENOENT], [ESTALE])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 495, in errno_wrap
    return call(*arg)
OSError: [Errno 16] Device or resource busy: '.gfid/00000000-0000-0000-0000-000000000001/dir.166'
[2017-03-01 07:34:28.146219] I [repce(slave):92:service_loop] RepceServer:
terminating on reaching EOF.
[2017-03-01 07:34:28.147622] I [syncdutils(slave):237:finalize] <top>: exiting.
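In the slave traceback, entry_ops calls lstat(entry), which wraps os.lstat as errno_wrap(os.lstat, [e], [ENOENT], [ESTALE]). Reading the two lists as "errnos to tolerate" and "errnos to retry" (an assumption from the call site, not from the syncdutils source), EBUSY falls into neither, so it escapes entry_ops; the master's RepceClient then re-raises it ("raise res" in repce.py) and the worker exits. The sketch below approximates that behaviour. It is not the real syncdutils code: MAX_RETRIES, the 0.25 s back-off, fake_lstat and the call parameter of lstat are hypothetical.

```python
import errno
import os
import time

MAX_RETRIES = 10  # hypothetical retry budget


def errno_wrap(call, args=(), errnos=(), retry_errnos=()):
    """Approximation of the wrapper seen in the traceback (assumed semantics)."""
    tries = 0
    while True:
        try:
            return call(*args)
        except OSError as ex:
            if ex.errno in errnos:
                # Tolerated error: return the errno instead of raising.
                return ex.errno
            if ex.errno not in retry_errnos:
                # Anything else, e.g. EBUSY, propagates to the caller.
                raise
            tries += 1
            if tries >= MAX_RETRIES:
                raise
            time.sleep(0.25)  # back off, then retry the call


def fake_lstat(path):
    # Hypothetical stand-in for os.lstat that fails the way the slave did.
    raise OSError(errno.EBUSY, os.strerror(errno.EBUSY), path)


def lstat(path, call=os.lstat):
    # Same lists as the call site in the slave traceback.
    return errno_wrap(call, [path], [errno.ENOENT], [errno.ESTALE])


if __name__ == "__main__":
    # A missing path yields errno.ENOENT as a return value, not an exception.
    print(lstat("/nonexistent/geo-rep/demo"))
    try:
        lstat(".gfid/00000000-0000-0000-0000-000000000001/dir.166",
              call=fake_lstat)
    except OSError as ex:
        print("raised", errno.errorcode[ex.errno])  # raised EBUSY
```

Passing errno.EBUSY in retry_errnos would make this sketch retry instead of raising; whether that is the right change for gsyncd is not established by this report.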
Version-Release number of selected component (if applicable):
=============================================================
mainline
How reproducible:
=================
Always
Steps to Reproduce:
===================
Seen on a non-root fan-out setup, but it should also reproduce on a normal setup.
The exact steps carried out:
1. Create the master (2 nodes) and slave (4 nodes) clusters.
2. Create and start the master volume and 2 slave volumes (each 2x2).
3. Create mount-broker geo-rep sessions between the master volume and the 2 slave volumes.
4. Mount the master and slave volumes (NFS and FUSE).
5. Create directories on the master and rename them:
From one client: for i in {1..1999}; do mkdir $i ; sleep 1 ; mv $i rs.$i ; done
From a second client: for i in {1..1000}; do mv dir.$i rename_dir.$i; done
From a third client: for i in {1..500}; do mkdir h.$i ; mv h.$i rsh.$i ; done
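The three client loops in step 5 can be approximated in one process with threads, which is handy for generating the same rename-heavy changelog traffic against a single master mount. This is a hypothetical sketch: the function names, the thread-per-client model and the pre-creation of dir.1..dir.N are assumptions (the report does not say how the dir.* directories were created), and the counts and delay are parameters defaulting to the values in the report.

```python
import os
import tempfile
import threading
import time


def client1(root, n, delay):
    # for i in {1..n}; do mkdir $i ; sleep 1 ; mv $i rs.$i ; done
    for i in range(1, n + 1):
        os.mkdir(os.path.join(root, str(i)))
        time.sleep(delay)
        os.rename(os.path.join(root, str(i)), os.path.join(root, f"rs.{i}"))


def client2(root, n):
    # for i in {1..n}; do mv dir.$i rename_dir.$i; done
    for i in range(1, n + 1):
        os.rename(os.path.join(root, f"dir.{i}"),
                  os.path.join(root, f"rename_dir.{i}"))


def client3(root, n):
    # for i in {1..n}; do mkdir h.$i ; mv h.$i rsh.$i ; done
    for i in range(1, n + 1):
        os.mkdir(os.path.join(root, f"h.{i}"))
        os.rename(os.path.join(root, f"h.{i}"), os.path.join(root, f"rsh.{i}"))


def run(root, n1=1999, n2=1000, n3=500, delay=1.0):
    # Assumption: dir.1..dir.n2 must exist before client 2 starts.
    for i in range(1, n2 + 1):
        os.mkdir(os.path.join(root, f"dir.{i}"))
    threads = [
        threading.Thread(target=client1, args=(root, n1, delay)),
        threading.Thread(target=client2, args=(root, n2)),
        threading.Thread(target=client3, args=(root, n3)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    # Scaled-down local dry run; on a real setup root would be the master mount.
    with tempfile.TemporaryDirectory() as d:
        run(d, n1=5, n2=5, n3=5, delay=0)
        print(sorted(os.listdir(d)))
```

Threads in one process only approximate three separate clients; to match the report more closely, run each loop from a different mount or host.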
Actual results:
===============
Multiple worker crashes are seen during the renames.
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1385589
[Bug 1385589] [geo-rep]: Worker crashes seen while renaming directories in
loop