[Bugs] [Bug 1412914] New: Spurious split-brain error messages are seen in rebalance logs

bugzilla at redhat.com
Fri Jan 13 06:09:01 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1412914

            Bug ID: 1412914
           Summary: Spurious split-brain error messages are seen in
                    rebalance logs
           Product: GlusterFS
           Version: 3.9
         Component: replicate
          Keywords: Triaged
          Severity: medium
          Priority: medium
          Assignee: bugs at gluster.org
          Reporter: kdhananj at redhat.com
                CC: bugs at gluster.org, kdhananj at redhat.com,
                    nchilaka at redhat.com, pkarampu at redhat.com,
                    rhs-bugs at redhat.com, storage-qa-internal at redhat.com,
                    tdesala at redhat.com
        Depends On: 1411617, 1411625



+++ This bug was initially created as a clone of Bug #1411625 +++

+++ This bug was initially created as a clone of Bug #1411617 +++

Description of problem:
=======================
On an nfs-ganesha setup, while rm -rf and a remove-brick operation are in
progress, spurious "split-brain observed" error messages appear in the
rebalance logs.

Rebalance logs error snippet:
=============================
[2017-01-09 06:50:36.232738] E [MSGID: 108008]
[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-distrep-replicate-6: Failing
GETXATTR on gfid 5ab6a290-3127-4662-86e7-c52d32949c67: split-brain observed.
[Input/output error]
[2017-01-09 06:50:36.244473] E [MSGID: 108008]
[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-distrep-replicate-6: Failing
STAT on gfid 5ab6a290-3127-4662-86e7-c52d32949c67: split-brain observed.
[Input/output error]
[2017-01-09 06:50:38.930970] E [MSGID: 108008]
[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-distrep-replicate-8: Failing
GETXATTR on gfid 000feb2a-2a8f-40f1-ae9e-926f0d0ae323: split-brain observed.
[Input/output error]
[2017-01-09 06:50:38.944043] E [MSGID: 108008]
[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-distrep-replicate-8: Failing
STAT on gfid 000feb2a-2a8f-40f1-ae9e-926f0d0ae323: split-brain observed.
[Input/output error]
[2017-01-09 06:50:43.595767] E [MSGID: 108008]
[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-distrep-replicate-8: Failing
GETXATTR on gfid a6f9d15e-969b-4630-867d-d7a402f242b2: split-brain observed.
[Input/output error]
[2017-01-09 06:50:43.611669] E [MSGID: 108008]
[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-distrep-replicate-8: Failing
STAT on gfid a6f9d15e-969b-4630-867d-d7a402f242b2: split-brain observed.
[Input/output error]
[2017-01-09 06:50:46.798033] E [MSGID: 108008]
[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-distrep-replicate-6: Failing
GETXATTR on gfid b0a4fef7-bd4c-472f-9027-eb6aef268e29: split-brain observed.
[Input/output error]
[2017-01-09 06:50:46.810447] E [MSGID: 108008]
[afr-read-txn.c:80:afr_read_txn_refresh_done] 0-distrep-replicate-6: Failing
STAT on gfid b0a4fef7-bd4c-472f-9027-eb6aef268e29: split-brain observed.
[Input/output error]


Version-Release number of selected component (if applicable):
3.8.4-10.el7rhgs.x86_64

Steps to Reproduce:
===================
1) Create a ganesha cluster and create a distributed-replicate volume.
2) Enable nfs-ganesha on the volume with mdcache settings.
3) Mount the volume.
4) Create files and folders.
5) From the mount point, issue rm -rf * and start removing bricks.

Split-brain error messages then appear in the rebalance logs.

Actual results:
===============
During rebalance, spurious split-brain error messages are seen in rebalance
logs.

Expected results:
=================
There should not be any split-brain error messages, since no split-brain has
actually occurred.

--- Additional comment from Worker Ant on 2017-01-10 22:22:31 EST ---

REVIEW: http://review.gluster.org/16362 (cluster/afr: Do not log of split-brain
when there isn't one) posted (#2) for review on master by Krutika Dhananjay
(kdhananj at redhat.com)

--- Additional comment from Worker Ant on 2017-01-10 23:09:32 EST ---

REVIEW: http://review.gluster.org/16362 (cluster/afr: Do not log of split-brain
when there isn't one) posted (#3) for review on master by Krutika Dhananjay
(kdhananj at redhat.com)

--- Additional comment from Worker Ant on 2017-01-12 01:42:06 EST ---

COMMIT: http://review.gluster.org/16362 committed in master by Pranith Kumar
Karampuri (pkarampu at redhat.com) 
------
commit 5b24934668adb89e1dcd3888ac19555056508f06
Author: Krutika Dhananjay <kdhananj at redhat.com>
Date:   Tue Jan 10 13:26:02 2017 +0530

    cluster/afr: Do not log of split-brain when there isn't one

    * Even on errors like ENOENT, AFR logs split-brain after
      read-txn refresh, introduced by commit a07ddd8f.
      This can be a cause of much panic and confusion and needs to be fixed.

    * Also fixed this issue in write-txns.

    * Fixed afr read txns to log about split-brain only after knowing that
      there is no split-brain choice configured.

    * Removed code duplication

    * Fixed incorrect passing of error code in afr_write_txn_refresh_done()
      (the function was passing -0 as errno to gf_msg()).

    Change-Id: I354f454ce5bf0e5f00bc27916eb597367cb7d927
    BUG: 1411625
    Signed-off-by: Krutika Dhananjay <kdhananj at redhat.com>
    Reviewed-on: http://review.gluster.org/16362
    Smoke: Gluster Build System <jenkins at build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
    Reviewed-by: Ravishankar N <ravishankar at redhat.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu at redhat.com>
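
The logging rule described in the commit message above can be sketched
roughly as follows. This is a minimal illustration and not the actual AFR
code: struct refresh_result, its fields, and should_log_split_brain() are
hypothetical names, errors are assumed to be positive errno values, and a
genuine split-brain is assumed to be reported as EIO while errors such as
ENOENT are passed through unchanged.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for what is known after a read-txn refresh. */
struct refresh_result {
    int err;                /* positive errno from the refresh, 0 on success */
    int split_brain_choice; /* index of a configured choice, -1 if none */
};

/*
 * Decide whether a "split-brain observed" message is warranted.
 * Assumption for this sketch: only EIO indicates a real split-brain.
 */
static bool should_log_split_brain(const struct refresh_result *r)
{
    /* Any other error (for example ENOENT after rm -rf) is not split-brain. */
    if (r->err != EIO)
        return false;

    /* A configured split-brain choice means the read can still be served. */
    if (r->split_brain_choice >= 0)
        return false;

    return true;
}
```

With this shape, a refresh that fails with ENOENT because the file has
already been removed returns false, so no split-brain message is logged for
it. Only an EIO with no split-brain choice configured returns true.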


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1411617
[Bug 1411617] Spurious split-brain error messages are seen in rebalance
logs
https://bugzilla.redhat.com/show_bug.cgi?id=1411625
[Bug 1411625] Spurious split-brain error messages are seen in rebalance
logs