[Bugs] [Bug 1293595] New: [geo-rep]: ChangelogException: [Errno 22] Invalid argument observed upon rebooting the ACTIVE master node

Tue Dec 22 10:21:18 UTC 2015

https://bugzilla.redhat.com/show_bug.cgi?id=1293595

            Bug ID: 1293595
           Summary: [geo-rep]: ChangelogException: [Errno 22] Invalid
                    argument observed upon rebooting the ACTIVE master
                    node
           Product: GlusterFS
           Version: 3.7.6
         Component: geo-replication
          Keywords: ZStream
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: khiremat at redhat.com
                CC: bugs at gluster.org, chrisw at redhat.com, csaba at redhat.com,
                    gluster-bugs at redhat.com, nlevinki at redhat.com,
                    rhinduja at redhat.com, rhs-bugs at redhat.com,
                    storage-qa-internal at redhat.com
        Depends On: 1291988, 1292463

+++ This bug was initially created as a clone of Bug #1292463 +++

+++ This bug was initially created as a clone of Bug #1291988 +++

Description of problem:
=======================

In a geo-rep setup if a ACTIVE master is rebooted observed ChangelogException
as:

[2015-12-15 12:30:39.725385] I
[resource(/rhgs/brick1/master):1433:service_loop] GLUSTER: Register time:
1450162839
[2015-12-15 12:30:39.741044] I [master(/rhgs/brick1/master):549:crawlwrap]
_GMaster: primary master with volume id e560607b-131b-4e35-b6b7-4e9b1e369224
...
[2015-12-15 12:30:39.744363] I [master(/rhgs/brick1/master):558:crawlwrap]
_GMaster: crawl interval: 1 seconds
[2015-12-15 12:30:39.748029] I [master(/rhgs/brick1/master):1173:crawl]
_GMaster: starting history crawl... turns: 1, stime: (1450086953, 0), etime:
1450162839
[2015-12-15 12:30:39.748600] E [repce(agent):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 54,
in history
    num_parallel)
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 100,
in cl_history_changelog
    cls.raise_changelog_err()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 27,
in raise_changelog_err
    raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 2] No such file or directory
[2015-12-15 12:30:39.749222] E [repce(/rhgs/brick1/master):207:__call__]
RepceClient: call 13048:139759777986368:1450162839.75 (history) failed on peer
with ChangelogException
[2015-12-15 12:30:39.749366] E
[resource(/rhgs/brick1/master):1448:service_loop] GLUSTER: Changelog History
Crawl failed, [Errno 2] No such file or directory
[2015-12-15 12:30:39.749612] I [syncdutils(/rhgs/brick1/master):220:finalize]
<top>: exiting.
[2015-12-15 12:30:39.753611] I [repce(agent):92:service_loop] RepceServer:
terminating on reaching EOF.
[2015-12-15 12:30:39.753813] I [syncdutils(agent):220:finalize] <top>: exiting.
[2015-12-15 12:30:40.705105] I [monitor(monitor):229:monitor] Monitor:
------------------------------------------------------------
[2015-12-15 12:30:40.705337] I [monitor(monitor):230:monitor] Monitor: starting
gsyncd worker
[2015-12-15 12:30:40.717370] I [monitor(monitor):290:monitor] Monitor:
worker(/rhgs/brick1/master) died in startup phase
[2015-12-15 12:30:40.780814] I [gsyncd(/rhgs/brick2/master):652:main_i] <top>:
syncing: gluster://localhost:Master_vol ->
ssh://root@10.70.35.196:gluster://localhost:Slave_vol
[2015-12-15 12:30:40.781518] I [changelogagent(agent):75:__init__]
ChangelogAgent: Agent listining...
[2015-12-15 12:30:43.313604] I [master(/rhgs/brick2/master):83:gmaster_builder]
<top>: setting up xsync change detection mode
[2015-12-15 12:30:43.314054] I [master(/rhgs/brick2/master):404:__init__]
_GMaster: using 'rsync' as the sync engine
[2015-12-15 12:30:43.314963] I [master(/rhgs/brick2/master):83:gmaster_builder]
<top>: setting up changelog change detection mode
[2015-12-15 12:30:43.315217] I [master(/rhgs/brick2/master):404:__init__]
_GMaster: using 'rsync' as the sync engine
[2015-12-15 12:30:43.316255] I [master(/rhgs/brick2/master):83:gmaster_builder]
<top>: setting up changeloghistory change detection mode
[2015-12-15 12:30:43.316457] I [master(/rhgs/brick2/master):404:__init__]
_GMaster: using 'rsync' as the sync engine
[2015-12-15 12:30:45.333230] I [master(/rhgs/brick2/master):1259:register]
_GMaster: xsync temp directory:
/var/lib/misc/glusterfsd/Master_vol/ssh%3A%2F%2Froot%4010.70.35.196%3Agluster%3A%2F%2F127.0.0.1%3ASla
ve_vol/78ce16e9d2384148a6abf384c52b8ab1/xsync
[2015-12-15 12:30:45.333684] I
[resource(/rhgs/brick2/master):1433:service_loop] GLUSTER: Register time:
1450162845
[2015-12-15 12:30:45.346733] I [master(/rhgs/brick2/master):549:crawlwrap]
_GMaster: primary master with volume id e560607b-131b-4e35-b6b7-4e9b1e369224
...
[2015-12-15 12:30:45.349627] I [master(/rhgs/brick2/master):558:crawlwrap]
_GMaster: crawl interval: 1 seconds
[2015-12-15 12:30:45.353549] I [master(/rhgs/brick2/master):1173:crawl]
_GMaster: starting history crawl... turns: 1, stime: (1450086953, 0), etime:
1450162845
[2015-12-15 12:30:46.354907] E [repce(agent):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 57,
in history_scan
    return Changes.cl_history_scan()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 89,
in cl_history_scan
    cls.raise_changelog_err()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 27,
in raise_changelog_err
    raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 22] Invalid argument
[2015-12-15 12:30:46.355663] E [repce(/rhgs/brick2/master):207:__call__]
RepceClient: call 13117:140461545350976:1450162845.35 (history_scan) failed on
peer with ChangelogException
[2015-12-15 12:30:46.355832] E
[resource(/rhgs/brick2/master):1448:service_loop] GLUSTER: Changelog History
Crawl failed, [Errno 22] Invalid argument
[2015-12-15 12:30:46.356125] I [syncdutils(/rhgs/brick2/master):220:finalize]
<top>: exiting.
[2015-12-15 12:30:46.359954] I [repce(agent):92:service_loop] RepceServer:
terminating on reaching EOF.
[2015-12-15 12:30:46.360156] I [syncdutils(agent):220:finalize] <top>: exiting.

At the same time the errors logged in the changes.log
=====================================================

[2015-12-15 07:02:21.780179] E [mem-pool.c:417:mem_get0]
(-->/lib64/libglusterfs.so.0(+0x876c2) [0x7f24942e06c2]
-->/lib64/libglusterfs.so.0(_gf_msg+0x79f) [0x7f24942805df]
-->/lib64/libglusterfs.so.0(mem_get
0+0x81) [0x7f24942b2811] ) 0-mem-pool: invalid argument [Invalid argument]
[2015-12-15 07:02:21.780396] E [mem-pool.c:417:mem_get0]
(-->/lib64/libglusterfs.so.0(+0x876c2) [0x7f24942e06c2]
-->/lib64/libglusterfs.so.0(_gf_msg+0x79f) [0x7f24942805df]
-->/lib64/libglusterfs.so.0(mem_get
0+0x81) [0x7f24942b2811] ) 0-mem-pool: invalid argument [Invalid argument]
[2015-12-15 07:02:21.780516] E [mem-pool.c:417:mem_get0]
(-->/lib64/libglusterfs.so.0(+0x876c2) [0x7f24942e06c2]
-->/lib64/libglusterfs.so.0(_gf_msg+0x79f) [0x7f24942805df]
-->/lib64/libglusterfs.so.0(mem_get
0+0x81) [0x7f24942b2811] ) 0-mem-pool: invalid argument [Invalid argument]
[2015-12-15 07:02:21.781449] E [mem-pool.c:417:mem_get0]
(-->/lib64/libglusterfs.so.0(recursive_rmdir+0x192) [0x7f249429dc12]
-->/lib64/libglusterfs.so.0(_gf_msg+0x79f) [0x7f24942805df]
-->/lib64/libglusterfs
.so.0(mem_get0+0x81) [0x7f24942b2811] ) 0-mem-pool: invalid argument [Invalid
argument]
[2015-12-15 07:02:21.782716] E [mem-pool.c:417:mem_get0]
(-->/lib64/libglusterfs.so.0(recursive_rmdir+0x192) [0x7f249429dc12]
-->/lib64/libglusterfs.so.0(_gf_msg+0x79f) [0x7f24942805df]
-->/lib64/libglusterfs
.so.0(mem_get0+0x81) [0x7f24942b2811] ) 0-mem-pool: invalid argument [Invalid
argument]

ve_vol/78ce16e9d2384148a6abf384c52b8ab1/.history/.processing/CHANGELOG.1450086954
to processed directory
[2015-12-15 07:29:57.229711] I
[gf-changelog.c:542:gf_changelog_register_generic] 0-gfchangelog: Registering
brick: /rhgs/brick2/master [notify filter: 1]
[2015-12-15 07:29:57.230785] E [mem-pool.c:417:mem_get0]
(-->/lib64/libglusterfs.so.0(+0x876c2) [0x7f49899686c2]
-->/lib64/libglusterfs.so.0(_gf_msg+0x79f) [0x7f49899085df]
-->/lib64/libglusterfs.so.0(mem_get0+0x81) [0x7f498993a811] ) 0-mem-pool:
invalid argument [Invalid argument]
[2015-12-15 07:29:57.230955] E [mem-pool.c:417:mem_get0]
(-->/lib64/libglusterfs.so.0(+0x876c2) [0x7f49899686c2]
-->/lib64/libglusterfs.so.0(_gf_msg+0x79f) [0x7f49899085df]
-->/lib64/libglusterfs.so.0(mem_get0+0x81) [0x7f498993a811] ) 0-mem-pool:
invalid argument [Invalid argument]
[2015-12-15 07:29:57.232027] E [mem-pool.c:417:mem_get0]
(-->/lib64/libglusterfs.so.0(recursive_rmdir+0x192) [0x7f4989925c12]
-->/lib64/libglusterfs.so.0(_gf_msg+0x79f) [0x7f49899085df]
-->/lib64/libglusterfs.so.0(mem_get0+0x81) [0x7f498993a811] ) 0-mem-pool:
invalid argument [Invalid argument]

Version-Release number of selected component (if applicable):
=============================================================

How reproducible:
=================

Observed Traceback only on one setup and only once while able to reproduce
invalid argument error with every reboot

Steps to Reproduce:
===================
1. Create 2 node master and slave clusters
2. Create and start 2x2 master volume
3. Create and start 2x2 slave volume
4. Setup geo-rep between master and slave volume
5. Reboot master ACTIVE node

Actual results:
===============

Once the node is up the Worker goes faulty for sometime with traceback and
errors [Invalid argument]

Expected results:
=================

Once the node is down, the bricks on peer node should be ACTIVE. Once the node
comes back, the bricks should be PASSIVE without going to faulty or reporting
error.

--- Additional comment from Vijay Bellur on 2015-12-17 09:26:16 EST ---

REVIEW: http://review.gluster.org/12997 (libgfchangelog: Allocate logbuf_pool
in master xlator) posted (#1) for review on master by Kotresh HR
(khiremat at redhat.com)

--- Additional comment from Vijay Bellur on 2015-12-22 04:33:17 EST ---

COMMIT: http://review.gluster.org/12997 committed in master by Venky Shankar
(vshankar at redhat.com) 
------
commit d73f5c9eae0f5c8796598fa1930844c15c2c37d4
Author: Kotresh HR <khiremat at redhat.com>
Date:   Tue Dec 15 19:55:05 2015 +0530

    libgfchangelog: Allocate logbuf_pool in master xlator

    The master xlator needs to allocate 'logbuf_pool'
    else 'gf_msg' fails with EINVAL.

    Change-Id: I6b2d3450250de7e77126d12b75b0dbc4db414bfb
    BUG: 1292463
    Signed-off-by: Kotresh HR <khiremat at redhat.com>
    Reviewed-on: http://review.gluster.org/12997
    Tested-by: NetBSD Build System <jenkins at build.gluster.org>
    Tested-by: Gluster Build System <jenkins at build.gluster.com>
    Reviewed-by: Venky Shankar <vshankar at redhat.com>

Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1291988
[Bug 1291988] [geo-rep]: ChangelogException: [Errno 22] Invalid argument
observed upon rebooting the ACTIVE master node
https://bugzilla.redhat.com/show_bug.cgi?id=1292463
[Bug 1292463] [geo-rep]: ChangelogException: [Errno 22] Invalid argument
observed upon rebooting the ACTIVE master node
-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.