[Bugs] [Bug 1367294] IO ERROR when multiple graph switches

bugzilla at redhat.com bugzilla at redhat.com
Thu Aug 25 04:38:26 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1367294



--- Comment #4 from Worker Ant <bugzilla-bot at gluster.org> ---
COMMIT: http://review.gluster.org/14835 committed in release-3.7 by Kaushal M
(kaushal at redhat.com) 
------
commit 9cd5066226770cf3c06a21757b963d315b8fe32b
Author: Poornima G <pgurusid at redhat.com>
Date:   Mon Jun 6 06:29:40 2016 -0400

    gfapi: Fix IO error caused by consecutive graph switches

    Issue:
    Consider a simple situation where glfs_init() is done, i.e. the initial
    graph is up. Now perform 2 volume sets that result in 2 client-side
    graph changes. If IO is performed after this, it fails with ENOTCONN.
    The only way to recover this client is, I guess, another graph switch
    or a restart.

    What is actually happening from the code perspective:
    Let's say the initial graph is A, followed by 2 consecutive graph
    switches to B and C without any IO between those two switches.

    - graph_setup (A) as a result of GF_EVENT_CHILD_UP, and
    fs->next_subvol = A

    - glfs_init() results in fs->active_subvol = A, fs->next_subvol = NULL

    - graph_setup (B) as a result of GF_EVENT_CHILD_UP, and
    fs->next_subvol = B

    - graph_setup (C) as a result of GF_EVENT_CHILD_UP, and
    fs->next_subvol = C. It also sees that the previous graph B was never
    set as fs->active_subvol, i.e. no IO or anything else happened on B, so
    it can safely send GF_EVENT_PARENT_DOWN (by calling glfs_subvol_done(B)).
    This parent down on B results in child_down(B), which is fine.
    But child_down also triggers graph_setup(B).

    - graph_setup(B) as a result of GF_EVENT_CHILD_DOWN, and
    fs->next_subvol = B, and GF_EVENT_PARENT_DOWN on C as explained
    above. This again leads to GF_EVENT_CHILD_DOWN on C.

    - graph_setup(C) as a result of GF_EVENT_CHILD_DOWN, and
    fs->next_subvol = C, and GF_EVENT_PARENT_DOWN on B as explained
    above.

    Thus both graphs B and C are disconnected, hence the ENOTCONN.

    Solution:
    Remove the call to graph_setup() when the event is GF_EVENT_CHILD_DOWN.
    I don't see any reason why graph_setup should be called when there is
    a child_down. Not sure what the original reason was for having
    graph_setup in child_down; git history shows the first patch itself had
    this call.
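
The sequence described under "Issue" and the effect of the proposed change can be sketched as a toy model. This is not the gfapi code: the structs, notify(), subvol_done(), simulate() and the setup_on_child_down switch are hypothetical stand-ins, and the model assumes that a graph which is already down does not emit a second child_down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy model only: hypothetical stand-ins, not the real gfapi types. */
enum event { CHILD_UP, CHILD_DOWN };

struct graph {
    const char *name;
    int connected;            /* 1 while the graph is up */
};

struct fs {
    struct graph *active_subvol;
    struct graph *next_subvol;
    int setup_on_child_down;  /* 1 = behaviour before the patch, 0 = after */
};

static void notify(struct fs *fs, struct graph *g, enum event ev);

/* Models PARENT_DOWN: the graph disconnects and answers with CHILD_DOWN.
 * Assumption: a graph that is already down stays silent. */
static void subvol_done(struct fs *fs, struct graph *g)
{
    if (!g->connected)
        return;
    printf("PARENT_DOWN -> %s\n", g->name);
    g->connected = 0;
    notify(fs, g, CHILD_DOWN);
}

/* Makes g the pending graph and retires a pending graph that was never used. */
static void graph_setup(struct fs *fs, struct graph *g)
{
    struct graph *prev = fs->next_subvol;

    fs->next_subvol = g;
    if (prev != NULL && prev != g)
        subvol_done(fs, prev);
}

static void notify(struct fs *fs, struct graph *g, enum event ev)
{
    if (ev == CHILD_UP) {
        g->connected = 1;
        graph_setup(fs, g);
    } else if (fs->setup_on_child_down) {
        graph_setup(fs, g);   /* the call the patch removes */
    }
}

/* Replays A, glfs_init(), B, C with no IO in between.
 * Returns 1 if the pending graph is still connected afterwards. */
int simulate(int setup_on_child_down)
{
    struct graph a = {"A", 0}, b = {"B", 0}, c = {"C", 0};
    struct fs fs = {NULL, NULL, setup_on_child_down};

    notify(&fs, &a, CHILD_UP);
    fs.active_subvol = fs.next_subvol;  /* glfs_init() adopts A */
    fs.next_subvol = NULL;
    notify(&fs, &b, CHILD_UP);
    notify(&fs, &c, CHILD_UP);

    assert(fs.next_subvol == &c);
    return fs.next_subvol->connected;
}
```

In this model, simulate(1) ends with both B and C down (the pending graph C is disconnected), while simulate(0) retires only B and leaves C connected.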

    > Reviewed-on: http://review.gluster.org/14656
    > Smoke: Gluster Build System <jenkins at build.gluster.org>
    > CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
    > NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    > Reviewed-by: Jeff Darcy <jdarcy at redhat.com>

    BUG: 1367294
    Change-Id: I9de86555f66cc94a05649ac863b40ed3426ffd4b
    Signed-off-by: Poornima G <pgurusid at redhat.com>
    Signed-off-by: Oleksandr Natalenko <oleksandr at natalenko.name>
    Reviewed-on: http://review.gluster.org/14835
    Smoke: Gluster Build System <jenkins at build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    Reviewed-by: Kaushal M <kaushal at redhat.com>
