[Bugs] [Bug 1694002] New: Geo-rep: Geo replication failing with "cannot allocate memory"

bugzilla at redhat.com bugzilla at redhat.com
Fri Mar 29 09:04:51 UTC 2019


https://bugzilla.redhat.com/show_bug.cgi?id=1694002

            Bug ID: 1694002
           Summary: Geo-rep: Geo replication failing with "cannot allocate memory"
           Product: GlusterFS
           Version: 6
          Hardware: x86_64
                OS: Linux
            Status: NEW
         Component: geo-replication
          Keywords: ZStream
          Severity: medium
          Priority: medium
          Assignee: bugs at gluster.org
          Reporter: khiremat at redhat.com
                CC: abhishku at redhat.com, avishwan at redhat.com,
                    bkunal at redhat.com, bugs at gluster.org, csaba at redhat.com,
                    khiremat at redhat.com, rhinduja at redhat.com,
                    rhs-bugs at redhat.com, sankarshan at redhat.com,
                    skandark at redhat.com, smulay at redhat.com,
                    storage-qa-internal at redhat.com, sunkumar at redhat.com
        Depends On: 1670429, 1693648
  Target Milestone: ---
    Classification: Community



Description of the problem:
Geo-replication is in 'Faulty' state and not syncing.

Slave worker crash:

[2019-01-21 14:46:36.338450] I [resource(slave):1422:connect] GLUSTER: Mounting gluster volume locally...
[2019-01-21 14:46:47.581492] I [resource(slave):1435:connect] GLUSTER: Mounted gluster volume  duration=11.2428
[2019-01-21 14:46:47.582036] I [resource(slave):905:service_loop] GLUSTER: slave listening
[2019-01-21 14:47:36.831804] E [repce(slave):117:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 756, in entry_ops
    [ESTALE, EINVAL, EBUSY])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 553, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 79, in lsetxattr
    cls.raise_oserr()
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 37, in raise_oserr
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 12] Cannot allocate memory
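
The slave-side traceback shows the failure path: entry_ops invokes lsetxattr through errno_wrap, which tolerates only the errnos handed to it (here ESTALE, EINVAL and EBUSY); any other errno reaches raise_oserr and becomes a fatal OSError. Errno 12 (ENOMEM) is not on that list, so the worker crashes. A minimal sketch of that pattern, simplified and not the actual syncdutils/libcxattr code:

    # Illustrative sketch only -- the real errno_wrap lives in
    # /usr/libexec/glusterfs/python/syncdaemon/syncdutils.py.
    import time

    def errno_wrap(call, args=(), tolerated=(), retries=5):
        # Invoke call(*args); back off and retry tolerated errnos,
        # re-raise everything else.
        for _ in range(retries):
            try:
                return call(*args)
            except OSError as e:
                if e.errno not in tolerated:
                    raise          # e.g. ENOMEM: propagates, worker dies
                time.sleep(1)      # tolerated errno: retry after a pause

    # lsetxattr is wrapped with tolerated=[ESTALE, EINVAL, EBUSY], so the
    # ENOMEM from the xattr call escapes exactly as the log above shows.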


Master worker crash:

[2019-01-21 14:46:36.7253] I [resource(/glusterfs/glprd01-vsb-pil-modshape000/brick1):1700:connect_remote] SSH: Initializing SSH connection between master and slave...
[2019-01-21 14:46:36.7440] I [changelogagent(/glusterfs/glprd01-vsb-pil-modshape000/brick1):73:__init__] ChangelogAgent: Agent listining...
[2019-01-21 14:46:47.585638] I [resource(/glusterfs/glprd01-vsb-pil-modshape000/brick1):1707:connect_remote] SSH: SSH connection between master and slave established.  duration=11.5781
[2019-01-21 14:46:47.585905] I [resource(/glusterfs/glprd01-vsb-pil-modshape000/brick1):1422:connect] GLUSTER: Mounting gluster volume locally...
[2019-01-21 14:46:48.650470] I [resource(/glusterfs/glprd01-vsb-pil-modshape000/brick1):1435:connect] GLUSTER: Mounted gluster volume  duration=1.0644
[2019-01-21 14:46:48.650816] I [gsyncd(/glusterfs/glprd01-vsb-pil-modshape000/brick1):803:main_i] <top>: Worker spawn successful. Acknowledging back to monitor
[2019-01-21 14:46:50.675277] I [master(/glusterfs/glprd01-vsb-pil-modshape000/brick1):1583:register] _GMaster: Working dir  path=/var/lib/misc/glusterfsd/pil-vbs-modshape/ssh%3A%2F%2Fgeoaccount%40172.21.142.33%3Agluster%3A%2F%2F127.0.0.1%3Apil-vbs-modshape/5eaac78a29ba1e2e24b401621c5240c3
[2019-01-21 14:46:50.675633] I [resource(/glusterfs/glprd01-vsb-pil-modshape000/brick1):1582:service_loop] GLUSTER: Register time  time=1548082010
[2019-01-21 14:46:50.690826] I [master(/glusterfs/glprd01-vsb-pil-modshape000/brick1):482:mgmt_lock] _GMaster: Didn't get lock Becoming PASSIVE  brick=/glusterfs/glprd01-vsb-pil-modshape000/brick1
[2019-01-21 14:46:50.703552] I [gsyncdstatus(/glusterfs/glprd01-vsb-pil-modshape000/brick1):282:set_passive] GeorepStatus: Worker Status Change  status=Passive
[2019-01-21 14:47:35.797741] I [master(/glusterfs/glprd01-vsb-pil-modshape000/brick1):436:mgmt_lock] _GMaster: Got lock Becoming ACTIVE  brick=/glusterfs/glprd01-vsb-pil-modshape000/brick1
[2019-01-21 14:47:35.802330] I [gsyncdstatus(/glusterfs/glprd01-vsb-pil-modshape000/brick1):276:set_active] GeorepStatus: Worker Status Change  status=Active
[2019-01-21 14:47:35.804092] I [gsyncdstatus(/glusterfs/glprd01-vsb-pil-modshape000/brick1):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change  status=History Crawl
[2019-01-21 14:47:35.804485] I [master(/glusterfs/glprd01-vsb-pil-modshape000/brick1):1497:crawl] _GMaster: starting history crawl  turns=1  stime=(1548059316, 0)  entry_stime=(1548059310, 0)  etime=1548082055
[2019-01-21 14:47:36.808142] I [master(/glusterfs/glprd01-vsb-pil-modshape000/brick1):1526:crawl] _GMaster: slave's time  stime=(1548059316, 0)
[2019-01-21 14:47:36.833885] E [repce(/glusterfs/glprd01-vsb-pil-modshape000/brick1):209:__call__] RepceClient: call failed  call=32116:139676615182144:1548082056.82  method=entry_ops  error=OSError
[2019-01-21 14:47:36.834212] E [syncdutils(/glusterfs/glprd01-vsb-pil-modshape000/brick1):349:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 210, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 805, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1588, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 597, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1535, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1435, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1269, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1165, in process_change
    failures = self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 228, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 210, in __call__
    raise res
OSError: [Errno 12] Cannot allocate memory
[2019-01-21 14:47:36.846298] I [syncdutils(/glusterfs/glprd01-vsb-pil-modshape000/brick1):289:finalize] <top>: exiting.
[2019-01-21 14:47:36.849236] I [repce(/glusterfs/glprd01-vsb-pil-modshape000/brick1):92:service_loop] RepceServer: terminating
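
The master-side OSError is not a second, independent failure: RepceClient.__call__ receives the exception object serialized by the slave and ends with "raise res", so the slave's ENOMEM resurfaces verbatim in the master worker and kills it too. A hypothetical minimal sketch of that error-propagation pattern, assuming a pickle transport (the names below are illustrative, not the actual repce API):

    import pickle

    def slave_side(obj, method, *args):
        # Run the requested method; ship back a result or the exception.
        try:
            return pickle.dumps(("ok", getattr(obj, method)(*args)))
        except OSError as exc:
            return pickle.dumps(("err", exc))

    def master_side(wire_bytes):
        # Unpack the reply; re-raise a remote failure locally,
        # mirroring "raise res" in repce.py's __call__.
        kind, payload = pickle.loads(wire_bytes)
        if kind == "err":
            raise payload
        return payload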

--- Additional comment from Worker Ant on 2019-03-29 07:24:23 UTC ---

REVIEW: https://review.gluster.org/22438 (geo-rep: Fix syncing multiple rename
of symlink) merged (#2) on master by Amar Tumballi
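
The one-line summary of the merged change points at entry replay of repeated symlink renames. A hedged sketch of that kind of workload on a master mountpoint (the path below is hypothetical): each rename emits a RENAME entry in the changelog for the same gfid, and it is this chain that the slave's entry_ops has to replay:

    import os

    mnt = "/mnt/master-vol"  # hypothetical master volume mountpoint

    # Create a symlink, then rename it several times between sync
    # cycles; geo-rep records one RENAME changelog entry per rename.
    os.symlink("target-file", os.path.join(mnt, "link0"))
    for i in range(3):
        os.rename(os.path.join(mnt, "link%d" % i),
                  os.path.join(mnt, "link%d" % (i + 1)))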


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1693648
[Bug 1693648] Geo-rep: Geo replication failing with "cannot allocate memory"