[Bugs] [Bug 1246987] New: Deceiving log messages like "Failing STAT on gfid : split-brain observed. [Input/output error]" reported

bugzilla at redhat.com bugzilla at redhat.com
Mon Jul 27 06:10:09 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1246987

            Bug ID: 1246987
           Summary: Deceiving log messages like "Failing STAT on gfid :
                    split-brain observed. [Input/output error]" reported
           Product: GlusterFS
           Version: 3.7.3
         Component: replicate
          Keywords: Triaged
          Severity: medium
          Assignee: bugs at gluster.org
          Reporter: kdhananj at redhat.com
                CC: bugs at gluster.org, gluster-bugs at redhat.com,
                    ndevos at redhat.com, pkarampu at redhat.com,
                    saujain at redhat.com, storage-qa-internal at redhat.com
        Depends On: 1240657, 1246052



+++ This bug was initially created as a clone of Bug #1246052 +++

+++ This bug was initially created as a clone of Bug #1240657 +++

Description of problem:
I try to delete a directory and I see error messages in ganesha-gfapi.log, like
these ones:

[2015-07-07 18:04:34.786903] W [MSGID: 114031]
[client-rpc-fops.c:531:client3_3_stat_cbk] 0-vol3-client-8: remote operation
failed [No such file or directory]
[2015-07-07 18:04:34.787612] E [MSGID: 108008]
[afr-read-txn.c:76:afr_read_txn_refresh_done] 0-vol3-replicate-3: Failing STAT
on gfid 18a973c4-73d3-48b8-942c-33a6f1a8e6b4: split-brain observed.
[Input/output error]
[2015-07-07 18:04:34.787954] E [MSGID: 108008]
[afr-read-txn.c:76:afr_read_txn_refresh_done] 0-vol3-replicate-1: Failing STAT
on gfid 18a973c4-73d3-48b8-942c-33a6f1a8e6b4: split-brain observed.
[Input/output error]
[2015-07-07 18:04:34.788090] E [MSGID: 108008]
[afr-read-txn.c:76:afr_read_txn_refresh_done] 0-vol3-replicate-5: Failing STAT
on gfid 18a973c4-73d3-48b8-942c-33a6f1a8e6b4: split-brain observed.
[Input/output error]
[2015-07-07 18:04:34.788191] E [MSGID: 108008]
[afr-read-txn.c:76:afr_read_txn_refresh_done] 0-vol3-replicate-0: Failing STAT
on gfid 18a973c4-73d3-48b8-942c-33a6f1a8e6b4: split-brain observed.
[Input/output error]
[2015-07-07 18:04:34.788240] E [MSGID: 108008]
[afr-read-txn.c:76:afr_read_txn_refresh_done] 0-vol3-replicate-2: Failing STAT
on gfid 18a973c4-73d3-48b8-942c-33a6f1a8e6b4: split-brain observed.
[Input/output error]
[2015-07-07 18:04:34.788478] E [MSGID: 108008]
[afr-read-txn.c:76:afr_read_txn_refresh_done] 0-vol3-replicate-4: Failing STAT
on gfid 18a973c4-73d3-48b8-942c-33a6f1a8e6b4: split-brain observed.
[Input/output error]


The directory deletion itself is successful, though. The test was done with vers=4.

Version-Release number of selected component (if applicable):
nfs-ganesha-2.2.0-4.el6rhs.x86_64
glusterfs-3.7.1-7.el6rhs.x86_64

How reproducible:
always

Actual results:
as described above

Expected results:
The above logs may be confusing while debugging an issue, hence we should try
to avoid this kind of confusing log message.

Additional info:

--- Additional comment from Saurabh on 2015-07-07 08:49:18 EDT ---



--- Additional comment from Soumya Koduri on 2015-07-08 06:48:54 EDT ---

Could you please provide the steps which led to this issue? Normal directory
removal operations work for us.

Also, please CC the nfs team on such bugs so that we do not miss them.
Thanks!

--- Additional comment from Saurabh on 2015-07-08 07:03:37 EDT ---

rm -rf /mount-point/dir-name
or rmdir /mount-point/dir-name

--- Additional comment from Soumya Koduri on 2015-07-08 07:05:40 EDT ---

Please provide the tests you were running before you hit the issue, whether
it is consistently reproducible, and the volume setup details (for example,
whether any other features are on or any bricks are unavailable).

--- Additional comment from Saurabh on 2015-07-08 07:20:52 EDT ---

It is pretty straightforward, hence I just wrote the description.

1. create a volume of type 6x2 and start it
2. configure nfs-ganesha, then mount the volume with vers=4
3. mkdir /mount-point/<dirname>
4. rmdir /mount-point/<dirname>
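
For reference, a 6x2 volume has six replica pairs (twelve bricks), which lines
up with the replicate-0 through replicate-5 subvolumes in the log above. A
minimal sketch in Python of that grouping, assuming bricks are paired into
replica sets in the order they are listed at volume creation. The host names
and brick paths are hypothetical and not taken from the actual setup:

```python
def replica_sets(bricks, replica=2):
    """Group an ordered brick list into consecutive replica sets."""
    if len(bricks) % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


# Hypothetical layout: 12 bricks alternating between two servers, so that
# each replica pair spans both servers.
bricks = [f"server{(i % 2) + 1}:/bricks/vol3/b{i}" for i in range(12)]
sets = replica_sets(bricks, replica=2)

for n, pair in enumerate(sets):
    print(f"vol3-replicate-{n}: {pair}")
```

With 12 bricks and replica 2 this gives six sets, one per replicate subvolume
that logged the error.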

--- Additional comment from Soumya Koduri on 2015-07-08 07:55:14 EDT ---

Thanks Saurabh. Have changed the bug summary to reflect that.

--- Additional comment from Niels de Vos on 2015-07-20 08:45:10 EDT ---

These messages are related to AFR, changing the component.

When a directory (or file) over NFS gets removed, a stat() on the filehandle
gets done afterwards. This is needed for updating the inode-cache that could
still be valid for hardlinks.

It is not clear to me why a stat() on a GFID could return EIO instead of
ENOENT.
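
To illustrate the expected errno, here is a small sketch in Python that does a
mkdir, rmdir and then a stat() on a local temporary directory. It is only an
analogy: it uses a path-based stat() on a local filesystem, not a stat() on an
NFS filehandle or a GFID, and the directory name is made up. On a local
filesystem the stat() after rmdir is expected to fail with ENOENT, not EIO:

```python
import errno
import os
import tempfile

with tempfile.TemporaryDirectory() as parent:
    path = os.path.join(parent, "dirname")  # hypothetical directory name
    os.mkdir(path)
    os.rmdir(path)
    try:
        os.stat(path)
        stat_errno = 0  # stat unexpectedly succeeded
    except OSError as exc:
        stat_errno = exc.errno

# Print the symbolic errno name of the failed stat()
print(errno.errorcode.get(stat_errno, stat_errno))
```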

--- Additional comment from Anand Avati on 2015-07-24 04:39:20 EDT ---

REVIEW: http://review.gluster.org/11756 (cluster/afr: Fix incorrect logging in
read transactions) posted (#1) for review on master by Krutika Dhananjay
(kdhananj at redhat.com)

--- Additional comment from Anand Avati on 2015-07-27 02:05:53 EDT ---

COMMIT: http://review.gluster.org/11756 committed in master by Pranith Kumar
Karampuri (pkarampu at redhat.com) 
------
commit 7bec717f8850135368609fccf1b1c697af60c546
Author: Krutika Dhananjay <kdhananj at redhat.com>
Date:   Thu Jul 23 18:08:34 2015 +0530

    cluster/afr: Fix incorrect logging in read transactions

    afr_read_txn_refresh_done() at its entry point can fail for
    reasons like ENOENT/ESTALE but seldom due to EIO, which is something
    _AFR_ would internally generate and not receive in response from
    a child translator. AFR is reporting "split-brain" for _any_
    kind of failure in read txn, of the following kind:

    [2015-07-07 18:04:34.787612] E [MSGID: 108008]
    [afr-read-txn.c:76:afr_read_txn_refresh_done] 0-vol3-replicate-3:
    Failing STAT on gfid 18a973c4-73d3-48b8-942c-33a6f1a8e6b4:
    split-brain observed. [Input/output error]

    This patch fixes such misleading errors.

    To-Do:
    Avoid logging EIO if/when split-brain choice is set.
    Will do that as part of a separate commit.

    Change-Id: Ib513c75168f7026118ad5b3f0b35e9dd498cfe1e
    BUG: 1246052
    Signed-off-by: Krutika Dhananjay <kdhananj at redhat.com>
    Reviewed-on: http://review.gluster.org/11756
    Tested-by: NetBSD Build System <jenkins at build.gluster.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu at redhat.com>
    Reviewed-by: Anuradha Talur <atalur at redhat.com>
    Reviewed-by: Ravishankar N <ravishankar at redhat.com>
    Tested-by: Gluster Build System <jenkins at build.gluster.com>
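
The idea described in the commit message can be sketched as follows. This is
not the actual patch (which is C code in afr-read-txn.c); it is a simplified
Python model, and the helper name and exact message wording for the non-EIO
case are assumptions. The point is only that "split-brain observed" is tied to
EIO, while other errors such as ENOENT/ESTALE are reported as what they are:

```python
import errno
import os


def read_txn_failure_message(fop, gfid, err):
    """Build a log message for a failed read transaction (hypothetical helper)."""
    reason = os.strerror(err)
    if err == errno.EIO:
        # Only EIO, which AFR generates internally, indicates split-brain.
        return f"Failing {fop} on gfid {gfid}: split-brain observed. [{reason}]"
    # Any other error (e.g. ENOENT, ESTALE) is logged without the
    # split-brain claim.
    return f"Failing {fop} on gfid {gfid}: [{reason}]"


gfid = "18a973c4-73d3-48b8-942c-33a6f1a8e6b4"
print(read_txn_failure_message("STAT", gfid, errno.ENOENT))
print(read_txn_failure_message("STAT", gfid, errno.EIO))
```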


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1240657
[Bug 1240657] Deceiving log messages like "Failing STAT on gfid :
split-brain observed. [Input/output error]" reported
https://bugzilla.redhat.com/show_bug.cgi?id=1246052
[Bug 1246052] Deceiving log messages like "Failing STAT on gfid :
split-brain observed. [Input/output error]" reported