[Bugs] [Bug 1159190] New: dist-geo-rep: Session going into faulty with "Cannot allocate memory" backtrace when pause, rename and resume is performed
bugzilla at redhat.com
Fri Oct 31 07:52:34 UTC 2014
https://bugzilla.redhat.com/show_bug.cgi?id=1159190
Bug ID: 1159190
Summary: dist-geo-rep: Session going into faulty with "Cannot allocate memory" backtrace when pause, rename and resume is performed
Product: GlusterFS
Version: 3.6.0
Component: geo-replication
Severity: high
Assignee: bugs at gluster.org
Reporter: khiremat at redhat.com
CC: aavati at redhat.com, asrivast at redhat.com,
avishwan at redhat.com, bugs at gluster.org,
csaba at redhat.com, gluster-bugs at redhat.com,
khiremat at redhat.com, nlevinki at redhat.com,
rhs-bugs at redhat.com, smanjara at redhat.com,
ssamanta at redhat.com, storage-qa-internal at redhat.com,
vbhat at redhat.com
Depends On: 1144428, 1146823
Blocks: 1147422
+++ This bug was initially created as a clone of Bug #1146823 +++
+++ This bug was initially created as a clone of Bug #1144428 +++
Description of problem:
The session is going into faulty with an "OSError: [Errno 12] Cannot allocate memory" backtrace in the logs. The operations performed were: sync existing data -> pause the session -> rename all the files -> resume the session.
Version-Release number of selected component (if applicable):
mainline
How reproducible:
Hit only once. Not sure whether it can be reproduced again.
Steps to Reproduce:
1. Create and start a geo-rep session between a 2*2 dist-rep master volume and a 2*2 dist-rep slave volume.
2. Create and sync some 5k files in some directory structure.
3. Pause the session.
4. Rename all the files.
5. Resume the session (a scripted sketch of steps 3-5 follows this list).
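For reference, steps 3-5 can be scripted. A minimal sketch, assuming the master volume is FUSE-mounted at /mnt/master (the mount path and the ".renamed" suffix are illustrative; the volume and slave names match the status output below):

#!/usr/bin/env python
# Hypothetical driver for steps 3-5: pause geo-rep, rename every file
# on the master mount, then resume. Adjust names/paths to the setup.
import os
import subprocess

MASTER_VOL = "master"
SLAVE = "nirvana::slave"   # slave host::volume, as in the status output
MOUNT = "/mnt/master"      # assumed FUSE mount of the master volume

def georep(action):
    # CLI syntax: gluster volume geo-replication <master> <slave> <action>
    subprocess.check_call(["gluster", "volume", "geo-replication",
                           MASTER_VOL, SLAVE, action])

georep("pause")
for dirpath, _, filenames in os.walk(MOUNT):
    for name in filenames:
        old = os.path.join(dirpath, name)
        os.rename(old, old + ".renamed")   # rename all files while paused
georep("resume")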
Actual results:
The session went faulty:
MASTER NODE               MASTER VOL    MASTER BRICK      SLAVE             STATUS     CHECKPOINT STATUS    CRAWL STATUS
------------------------------------------------------------------------------------------------------------------------
ccr.blr.redhat.com        master        /bricks/brick0    nirvana::slave    faulty     N/A                  N/A
metallica.blr.redhat.com  master        /bricks/brick1    acdc::slave       Passive    N/A                  N/A
beatles.blr.redhat.com    master        /bricks/brick3    rammstein::slave  Passive    N/A                  N/A
pinkfloyd.blr.redhat.com  master        /bricks/brick2    led::slave        faulty     N/A                  N/A
The backtrace in the master log:
[2014-09-19 16:19:53.933645] I [master(/bricks/brick2):1225:crawl] _GMaster: slave's time: (1411061833, 0)
[2014-09-19 16:20:33.653033] E [repce(/bricks/brick2):207:__call__] RepceClient: call 18787:139727562630912:1411123833.64 (entry_ops) failed on peer with OSError
[2014-09-19 16:20:33.653924] E [syncdutils(/bricks/brick2):270:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 164, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 643, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1324, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 524, in crawlwrap
    self.crawl(no_stime_update=no_stime_update)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1236, in crawl
    self.process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 927, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 891, in process_change
    self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
    raise res
OSError: [Errno 12] Cannot allocate memory
[2014-09-19 16:20:33.657620] I [syncdutils(/bricks/brick2):214:finalize] <top>: exiting.
[2014-09-19 16:20:33.663028] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2014-09-19 16:20:33.663907] I [syncdutils(agent):214:finalize] <top>: exiting.
[2014-09-19 16:20:33.795839] I [monitor(monitor):222:monitor] Monitor: worker(/bricks/brick2) died in startup phase
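The final "raise res" frame shows how the failure crosses the wire: the slave-side worker catches the exception, ships the exception object back over the RPC channel, and the master-side __call__ re-raises it locally. A minimal sketch of that propagation pattern, with an in-process stand-in for the channel (class names and the pickle transport are illustrative, not repce's actual protocol):

import pickle

class Server:
    """Slave side: run the requested method; return the result,
    or the caught exception object flagged as a failure."""
    def __init__(self, obj):
        self.obj = obj
    def handle(self, payload):
        method, args = pickle.loads(payload)
        try:
            reply = (True, getattr(self.obj, method)(*args))
        except Exception as exc:
            reply = (False, exc)       # ship the exception itself back
        return pickle.dumps(reply)

def remote_call(server, method, *args):
    """Master side: re-raise a shipped exception locally
    (cf. the "raise res" frame in repce.py)."""
    ok, res = pickle.loads(server.handle(pickle.dumps((method, args))))
    if not ok:
        raise res
    return res

class FakeSlave:
    def entry_ops(self, entries):
        raise OSError(12, "Cannot allocate memory")

try:
    remote_call(Server(FakeSlave()), "entry_ops", [])
except OSError as e:
    print("master sees:", e)   # master sees: [Errno 12] Cannot allocate memory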
This backtrace is thus remote, propagated to the master over that RPC channel; the actual backtrace in the slave log is:
[2014-09-19 16:27:45.780600] E [repce(slave):117:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 662, in entry_ops
    [ENOENT, ESTALE, EINVAL])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 470, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 78, in lsetxattr
    cls.raise_oserr()
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 37, in raise_oserr
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 12] Cannot allocate memory
[2014-09-19 16:27:45.794786] I [repce(slave):92:service_loop] RepceServer: terminating on reaching EOF.
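The failing frame is the ctypes wrapper in libcxattr.py: it calls the C library's lsetxattr(2), and raise_oserr converts errno into an OSError on failure. Note also from the errno_wrap frame that only ENOENT, ESTALE and EINVAL are absorbed there, so the slave brick's ENOMEM propagates all the way up. A minimal sketch of the wrapper pattern (simplified, not the verbatim libcxattr code):

import ctypes
import ctypes.util
import os

# Load libc with use_errno=True so ctypes.get_errno() reflects the call.
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

def lsetxattr(path, attr, value):
    """Set an extended attribute without following symlinks; raise
    OSError from errno on failure, like libcxattr.raise_oserr."""
    ret = libc.lsetxattr(path.encode(), attr.encode(),
                         value, len(value), 0)
    if ret == -1:
        errn = ctypes.get_errno()
        raise OSError(errn, os.strerror(errn))  # e.g. [Errno 12] ENOMEM

# Example call (hypothetical path and xattr name):
# lsetxattr("/bricks/brick0/f", "user.test", b"v")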
Expected results:
There should be no backtraces and no faulty sessions.
Additional info:
The slave volume had cluster.hash-range-gfid enabled.
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1144428
[Bug 1144428] dist-geo-rep: Session going into faulty with "Cannot allocate memory" backtrace when pause, rename and resume is performed
https://bugzilla.redhat.com/show_bug.cgi?id=1146823
[Bug 1146823] dist-geo-rep: Session going into faulty with "Cannot allocate memory" backtrace when pause, rename and resume is performed
https://bugzilla.redhat.com/show_bug.cgi?id=1147422
[Bug 1147422] dist-geo-rep: Session going into faulty with "Cannot allocate memory" backtrace when pause, rename and resume is performed