[Bugs] [Bug 1221175] New: [geo-rep]: Session goes to faulty with "Cannot allocate memory" traceback when deletes were performed having trash translators ON

bugzilla at redhat.com
Wed May 13 12:29:37 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1221175

            Bug ID: 1221175
           Summary: [geo-rep]: Session goes to faulty with "Cannot
                    allocate memory" traceback when deletes were performed
                    having trash translators ON
           Product: GlusterFS
           Version: 3.7.0
         Component: geo-replication
          Severity: urgent
          Assignee: bugs at gluster.org
          Reporter: rhinduja at redhat.com
                CC: bugs at gluster.org, gluster-bugs at redhat.com



Description of problem:
=======================

The geo-rep session is going into a Faulty state with the following traceback:

[2015-05-13 17:47:26.465090] W [master(/rhs/brick1/b1):792:log_failures] _GMaster: META FAILED: ({'go': '.gfid/8db3cbe6-946e-45cb-bf74-d233c4091003', 'stat': {'atime': 1431518244.1599989, 'gid': 0, 'mtime': 1431518665.03, 'mode': 16877, 'uid': 0}, 'op': 'META'}, 2)
[2015-05-13 17:47:26.669015] E [repce(/rhs/brick1/b1):207:__call__] RepceClient: call 23144:139914667464448:1431519446.51 (entry_ops) failed on peer with OSError
[2015-05-13 17:47:26.669274] E [syncdutils(/rhs/brick1/b1):276:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 165, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 659, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1440, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 580, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1150, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1059, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 946, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 902, in process_change
    failures = self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
    raise res
OSError: [Errno 12] Cannot allocate memory
[2015-05-13 17:47:26.670993] I [syncdutils(/rhs/brick1/b1):220:finalize] <top>: exiting.
[2015-05-13 17:47:26.674686] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-05-13 17:47:26.675065] I [syncdutils(agent):220:finalize] <top>: exiting.
[2015-05-13 17:47:27.299925] I [monitor(monitor):282:monitor] Monitor: worker(/rhs/brick1/b1) died in startup phase


I was performing deletes from the FUSE and NFS mounts with the trash translator ON. The
session is configured to use a meta volume.
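
For reference, the session state and the meta-volume setting can be inspected as
below; "master", "slavehost", and "slave" are placeholder names, not the exact
ones from this setup:

# Expected to show the session as Faulty in this state
gluster volume geo-replication master slavehost::slave status

# Confirm the session is configured to use the meta volume
gluster volume geo-replication master slavehost::slave config use_meta_volume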


Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.7.0beta2-0.0.el6.x86_64


How reproducible:
=================

Reproduced in both of two attempts so far (2/2).


Steps Carried:
==============
1. Created master cluster 
2. Created and started master volume
3. Created shared volume (gluster_shared_storage)
4. Mounted the shared volume on /var/run/gluster/shared_storage
5. Created Slave cluster
6. Created and started slave volume
7. Created geo-rep session between master and slave
8. Configured use_meta_volume true
9. Started geo-rep
10. Mounted the master volume on a client over FUSE and NFS
11. Copied /etc as etc.{1..10} from the FUSE mount
12. Copied /etc as etc.{11..20} from the NFS mount
13. Sync completed successfully
14. Set features.trash to on for both the master and slave volumes
15. Removed etc.2 from the FUSE mount and etc.12 from the NFS mount
16. Both mounts first errored with "Directory not empty"; a second attempt
    removed them successfully
17. Checked the geo-rep session status; it was Faulty
18. Checked the logs; they showed the above traceback repeating continuously
    (a command sketch of these steps follows)
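
A minimal shell sketch of the steps above; host names (m1, s1), volume names,
brick paths, and mount points are placeholders, not the exact ones used here:

# Master side: create and start the master volume
gluster volume create master m1:/bricks/master/b1
gluster volume start master

# Shared storage volume used as the meta volume, mounted at the expected path
gluster volume create gluster_shared_storage m1:/bricks/shared/b1
gluster volume start gluster_shared_storage
mount -t glusterfs m1:/gluster_shared_storage /var/run/gluster/shared_storage

# Slave side: create and start the slave volume
gluster volume create slave s1:/bricks/slave/b1
gluster volume start slave

# Create, configure, and start the geo-rep session
gluster volume geo-replication master s1::slave create push-pem
gluster volume geo-replication master s1::slave config use_meta_volume true
gluster volume geo-replication master s1::slave start

# Client mounts of the master volume over FUSE and NFS
mount -t glusterfs m1:/master /mnt/fuse
mount -t nfs -o vers=3 m1:/master /mnt/nfs

# Data set: ten copies of /etc from each mount
for i in $(seq 1 10);  do cp -a /etc /mnt/fuse/etc.$i; done
for i in $(seq 11 20); do cp -a /etc /mnt/nfs/etc.$i; done

# Enable the trash translator on both volumes, then delete
gluster volume set master features.trash on
gluster volume set slave features.trash on
rm -rf /mnt/fuse/etc.2
rm -rf /mnt/nfs/etc.12

# Session status goes to Faulty after the deletes
gluster volume geo-replication master s1::slave status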

Actual results:
===============

The worker died and the geo-rep session went to Faulty.


Expected results:
================

The worker should keep running and propagate the deletions to the slave volume.


Additional info:
================

The geo-rep session goes to a Faulty state. A similar traceback was reported
earlier for renames and tracked as bug 1144428 (now closed); that bug did not
involve the trash translator or the meta volume.
