[Bugs] [Bug 1247882] New: [geo-rep]: killing brick from replica pair makes geo-rep session faulty with Traceback "ChangelogException"

bugzilla at redhat.com bugzilla at redhat.com
Wed Jul 29 07:25:41 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1247882

            Bug ID: 1247882
           Summary: [geo-rep]: killing brick from replica pair makes
                    geo-rep session faulty with Traceback
                    "ChangelogException"
           Product: GlusterFS
           Version: 3.7.0
         Component: geo-replication
          Keywords: ZStream
          Severity: urgent
          Assignee: bugs at gluster.org
          Reporter: khiremat at redhat.com
                CC: aavati at redhat.com, bugs at gluster.org, csaba at redhat.com,
                    gluster-bugs at redhat.com, nlevinki at redhat.com,
                    nsathyan at redhat.com, rcyriac at redhat.com,
                    rhinduja at redhat.com
        Depends On: 1236546, 1239044
            Blocks: 1236554



+++ This bug was initially created as a clone of Bug #1239044 +++

+++ This bug was initially created as a clone of Bug #1236546 +++

Description of problem:
=======================
Even when NTP is configured and the systems are in sync and in the same
timezone, killing the Active bricks makes the passive brick faulty too, with
the history crawl failing:

[2015-07-01 15:31:06.146286] I [master(/rhs/brick1/b1):1123:crawl] _GMaster:
starting history crawl... turns: 1, stime: (1435744752, 0)
[2015-07-01 15:31:06.147336] E [repce(agent):117:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 54,
in history
    num_parallel)
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 100,
in cl_history_changelog
    cls.raise_changelog_err()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 27,
in raise_changelog_err
    raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 2] No such file or directory
[2015-07-01 15:31:06.149779]

It fails the first time and succeeds on later attempts.

Version-Release number of selected component (if applicable):
=============================================================

mainline


How reproducible:
=================

Always


Steps Carried:
==============

1. Create Master and Slave Cluster
2. Create and Start Master volume (4x2) from four nodes (node1..node4)
3. Create and Start slave volume (2x2)
4. Create Meta volume (1x3) (node1..node3)
5. Create geo-rep session between master and slave volume
6. Set the config use_meta_volume to true
7. Start the geo-rep session
8. Mount the volume on Fuse
9. Start creating data from fuse client
10. While data creation is in progress, kill a few active bricks (kill -9 pid).
    Make sure that the corresponding replica brick is UP.
11. Check the geo-rep status and log.
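
Steps 5 to 7 and step 11 can be sketched as the command lines below. This is
only a sketch: it assumes the usual gluster CLI syntax for geo-replication, the
volume and host names (mastervol, slavehost, slavevol) are hypothetical, and
"create push-pem" assumes passwordless SSH to the slave is already set up. The
helper only builds the commands and does not run them.

```python
import shlex


def georep_cmd(master_vol, slave_host, slave_vol, *action):
    """Build one 'gluster volume geo-replication' command as an argv list."""
    slave = f"{slave_host}::{slave_vol}"
    return ["gluster", "volume", "geo-replication", master_vol, slave, *action]


def session_commands(master_vol, slave_host, slave_vol):
    """Commands for steps 5, 6, 7 and 11 of the reproduction steps, in order."""
    args = (master_vol, slave_host, slave_vol)
    return [
        georep_cmd(*args, "create", "push-pem"),                # step 5
        georep_cmd(*args, "config", "use_meta_volume", "true"), # step 6
        georep_cmd(*args, "start"),                             # step 7
        georep_cmd(*args, "status"),                            # step 11
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in session_commands("mastervol", "slavehost", "slavevol"):
        print(shlex.join(cmd))
```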

--- Additional comment from Kotresh HR on 2015-07-03 07:02:44 EDT ---

I found the reason for the first-time failure. The register time is the end
time we pass to the history API. The PASSIVE worker registers much earlier,
along with the ACTIVE worker, and it passes stime as the start time, so
register time < stime.

For the history API this means start time > end time, which obviously fails.

When the worker registers a second time, register time > stime, and hence the
call passes.

There are no side effects with respect to data sync. It is just the worker
going down and coming back. We will fix this, but it is definitely not a
BLOCKER.
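
The ordering problem described in this comment can be illustrated with a small
model. This is a sketch under stated assumptions, not the gsyncd code:
history_changelog() and crawl() are hypothetical stand-ins, the ENOENT error
simply mirrors the traceback above, and the two register times are made-up
values placed on either side of the stime from the log (1435744752).

```python
import errno
import os


class ChangelogException(OSError):
    """Stand-in for the exception type seen in the traceback."""


def history_changelog(start, end):
    """Hypothetical model of the history call: a range whose start is later
    than its end cannot be served, so it raises like the failing call did."""
    if start > end:
        raise ChangelogException(errno.ENOENT, os.strerror(errno.ENOENT))
    return (start, end)


def crawl(stime, register_time):
    """The worker passes stime as the start and its register time as the end."""
    try:
        history_changelog(stime, register_time)
        return "ok"
    except ChangelogException:
        return "faulty"


if __name__ == "__main__":
    stime = 1435744752            # from the log line above
    first_register = 1435744700   # assumed: registered before stime
    second_register = 1435744800  # assumed: re-registered after stime
    print(crawl(stime, first_register))   # faulty
    print(crawl(stime, second_register))  # ok
```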


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1236546
[Bug 1236546] [geo-rep]: killing brick from replica pair makes geo-rep
session faulty with Traceback "ChangelogException"
https://bugzilla.redhat.com/show_bug.cgi?id=1236554
[Bug 1236554] [geo-rep]: Once the bricks are killed, worker dies; after a few
retries the worker comes back and session becomes active without the brick
online
https://bugzilla.redhat.com/show_bug.cgi?id=1239044
[Bug 1239044] [geo-rep]: killing brick from replica pair makes geo-rep
session faulty with Traceback "ChangelogException"

