[Bugs] [Bug 1247529] [geo-rep]: rename followed by deletes causes ESTALE

Tue Jul 28 09:04:24 UTC 2015

https://bugzilla.redhat.com/show_bug.cgi?id=1247529

--- Comment #1 from Kotresh HR <khiremat at redhat.com> ---
Description of problem:
=======================
Ran the tests, which perform the following FOPs in order:

Create, chmod, chown, chgrp, symlink, hardlink, truncate, rename, remove. 

The above fops succeed and are synced to the slave. However, the Master and
Slave logs show the following errors:

Master:
=======

[2015-07-03 13:36:43.154763] E [syncdutils(/bricks/brick0/master_brick0):276:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 165, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 659, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1438, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 580, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1161, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1070, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 948, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 903, in process_change
    failures = self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
    raise res
OSError: [Errno 116] Stale file handle: '.gfid/fece7967-616b-4d13-add7-96f6a4022e11/55958721%%BO54CXD7RN'
[2015-07-03 13:36:43.156742] I [syncdutils(/bricks/brick0/master_brick0):220:finalize] <top>: exiting.
[2015-07-03 13:36:43.159702] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.

Slave:
======

[2015-07-03 13:36:38.359909] I [resource(slave):844:service_loop] GLUSTER: slave listening
[2015-07-03 13:36:43.149735] E [repce(slave):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 717, in entry_ops
    st = lstat(entry)
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 493, in lstat
    return os.lstat(e)
OSError: [Errno 116] Stale file handle: '.gfid/fece7967-616b-4d13-add7-96f6a4022e11/55958721%%BO54CXD7RN'
[2015-07-03 13:36:43.158221] I [repce(slave):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-07-03 13:36:43.158576] I [syncdutils(slave):220:finalize] <top>: exiting.
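
For illustration only: the slave traceback above shows os.lstat() raising
ESTALE (errno 116) out of entry_ops, which terminates the worker. The sketch
below shows one way a wrapper could hand back the errno instead of raising for
a chosen set of error codes. The name lstat_tolerant and the decision to treat
ESTALE like ENOENT are assumptions made for this example; they are not taken
from the gsyncd source or from any fix for this bug.

```python
import errno
import os


def lstat_tolerant(path, ignorable=(errno.ENOENT, errno.ESTALE)):
    """Hypothetical helper: lstat that reports selected errors as a value.

    Returns the os.stat_result on success. If os.lstat() fails with an errno
    listed in `ignorable`, the errno (an int) is returned instead of raising.
    Any other OSError propagates unchanged.
    """
    try:
        return os.lstat(path)
    except OSError as exc:
        if exc.errno in ignorable:
            return exc.errno
        raise


if __name__ == "__main__":
    # The .gfid path from the log is only meaningful on a gluster aux-gfid
    # mount; on any other filesystem this simply reports ENOENT.
    result = lstat_tolerant(
        ".gfid/fece7967-616b-4d13-add7-96f6a4022e11/55958721%%BO54CXD7RN")
    if isinstance(result, int):
        print("entry not usable:", errno.errorcode.get(result, result))
    else:
        print("entry exists, mode:", oct(result.st_mode))
```

A caller would check whether the return value is an int before using it as a
stat result. Whether skipping a stale entry is safe for a given entry operation
would need to be decided by someone familiar with the geo-rep code.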

Version-Release number of selected component (if applicable):
=============================================================

How reproducible:
=================
2/2

Steps to Reproduce:
===================
1. Create geo-rep session between Master (3x2) and Slave (3x2)
2. Run the following fops in sequential order and check the arequal after each
fop:

2015-07-03 13:03:31,870 INFO run Executing crefi --multi -n 5 -b 5 -d 5 --max=10k --min=5k --random -T 5 -t text --fop=create /mnt/glusterfs 1>/dev/null 2>&1 on wingo.lab.eng.blr.redhat.com
2015-07-03 13:05:53,581 INFO run Executing crefi --multi -n 5 -b 5 -d 5 --max=10k --min=5k --random -T 5 -t text --fop=chmod /mnt/glusterfs 1>/dev/null 2>&1 on wingo.lab.eng.blr.redhat.com
2015-07-03 13:08:17,690 INFO run Executing crefi --multi -n 5 -b 5 -d 5 --max=10k --min=5k --random -T 5 -t text --fop=chown /mnt/glusterfs 1>/dev/null 2>&1 on wingo.lab.eng.blr.redhat.com
2015-07-03 13:10:41,876 INFO run Executing crefi --multi -n 5 -b 5 -d 5 --max=10k --min=5k --random -T 5 -t text --fop=chgrp /mnt/glusterfs 1>/dev/null 2>&1 on wingo.lab.eng.blr.redhat.com
2015-07-03 13:13:06,050 INFO run Executing crefi --multi -n 5 -b 5 -d 5 --max=10k --min=5k --random -T 5 -t text --fop=symlink /mnt/glusterfs 1>/dev/null 2>&1 on wingo.lab.eng.blr.redhat.com
2015-07-03 13:15:37,194 INFO run Executing crefi --multi -n 5 -b 5 -d 5 --max=10k --min=5k --random -T 5 -t text --fop=hardlink /mnt/glusterfs 1>/dev/null 2>&1 on wingo.lab.eng.blr.redhat.com
2015-07-03 13:18:16,751 INFO run Executing crefi --multi -n 5 -b 5 -d 5 --max=10k --min=5k --random -T 5 -t text --fop=truncate /mnt/glusterfs 1>/dev/null 2>&1 on wingo.lab.eng.blr.redhat.com
2015-07-03 13:21:06,530 INFO run Executing crefi --multi -n 5 -b 5 -d 5 --max=10k --min=5k --random -T 5 -t text --fop=rename /mnt/glusterfs 1>/dev/null 2>&1 on wingo.lab.eng.blr.redhat.com
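
The fop sequence above could be scripted roughly as follows. This is a minimal
sketch that only rebuilds the crefi command lines exactly as they appear in the
log; the helper names (crefi_command, all_commands) are made up for this
example. The remove step and the arequal check are not shown in the log, so
they are left out rather than guessed.

```python
import shlex

# Fops whose crefi invocations appear in the log, in the order they were run.
FOPS = ["create", "chmod", "chown", "chgrp",
        "symlink", "hardlink", "truncate", "rename"]


def crefi_command(fop, mountpoint="/mnt/glusterfs"):
    """Return the crefi argv for one fop, using the options seen in the log."""
    return ["crefi", "--multi", "-n", "5", "-b", "5", "-d", "5",
            "--max=10k", "--min=5k", "--random", "-T", "5",
            "-t", "text", "--fop=" + fop, mountpoint]


def all_commands(mountpoint="/mnt/glusterfs"):
    """Return the argv lists for every logged fop, in sequence."""
    return [crefi_command(fop, mountpoint) for fop in FOPS]


if __name__ == "__main__":
    # Print the commands instead of running them; to execute, pass each
    # argv list to subprocess.run() on a client with the volume mounted and
    # run the checksum comparison between fops.
    for argv in all_commands():
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing rather than executing keeps the sketch safe to run on a machine
without crefi or a mounted volume.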
