[Bugs] [Bug 1398888] self-heal info command hangs after triggering self-heal

bugzilla at redhat.com bugzilla at redhat.com
Sun Nov 27 08:38:03 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1398888



--- Comment #2 from Worker Ant <bugzilla-bot at gluster.org> ---
COMMIT: http://review.gluster.org/15932 committed in release-3.9 by Pranith
Kumar Karampuri (pkarampu at redhat.com) 
------
commit 13725b3f30f90a11771602c546875eb70831ae5d
Author: Krutika Dhananjay <kdhananj at redhat.com>
Date:   Fri Nov 25 15:54:30 2016 +0530

    cluster/afr: Fix deadlock due to compound fops

    Backport of: http://review.gluster.org/15929

    When an AFR data transaction is eligible for eager-lock, this is
    recorded in local->transaction.eager_lock_on. However, if the
    non-blocking inodelk attempt (which is a full lock) fails, AFR falls
    back to blocking locks, which are range locks. At this point, the
    per-brick local->transaction.eager_lock[] is reset, but
    local->transaction.eager_lock_on is still true.
    AFR decides to compound post-op and unlock only after confirming
    that the transaction did not use eager lock (except for a small bug
    where local->transaction.locks_acquired[] is not considered).

    But within afr_post_op_unlock_do(), AFR again incorrectly sets the
    lock range to full-lock based on the value of
    local->transaction.eager_lock_on. This is a bug and can lead to
    deadlock: the locks acquired were range locks, but a full unlock is
    sent, so the unlock fails. Every other lock request (from SHD, other
    clients, or glfsheal) then blocks forever, and the user perceives a
    hang.

    FIX:
    Unconditionally rely on the range locks in the inodelk object for
    unlocking when using compounded post-op + unlock.

    Big thanks to Pranith for helping with the debugging.

    Change-Id: I2edcc13ac00bc1ba2e3558891ba98d0cd410b47a
    BUG: 1398888
    Signed-off-by: Krutika Dhananjay <kdhananj at redhat.com>
    Reviewed-on: http://review.gluster.org/15932
    Smoke: Gluster Build System <jenkins at build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu at redhat.com>
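
The failure mode described in the commit message can be illustrated with a
small toy model. This is only a sketch and is not GlusterFS code: the class
ToyInodeLocks, the helper unlock_range(), the trust_flag parameter and the
byte offsets are all hypothetical, and the model assumes that an unlock
releases a lock only when owner and range match exactly. It shows a range
lock being left behind when a full-range unlock is sent, and the same lock
being released when the unlock uses the range that was actually acquired.

```python
# Toy model only: not GlusterFS code. All names here are hypothetical.
from dataclasses import dataclass, field

FULL = (0, 0)  # (start, length); length 0 stands for "to end of file"


def overlaps(a, b):
    """Return True if two (start, length) ranges share at least one byte."""
    a_end = float("inf") if a[1] == 0 else a[0] + a[1]
    b_end = float("inf") if b[1] == 0 else b[0] + b[1]
    return a[0] < b_end and b[0] < a_end


@dataclass
class ToyInodeLocks:
    held: list = field(default_factory=list)

    def lock(self, owner, rng):
        """Grant the lock unless it conflicts with another owner's lock."""
        if any(o != owner and overlaps(r, rng) for o, r in self.held):
            return False  # a blocking request would wait here
        self.held.append((owner, rng))
        return True

    def unlock(self, owner, rng):
        """Release only a lock whose owner and range match exactly (assumed)."""
        if (owner, rng) in self.held:
            self.held.remove((owner, rng))
            return True
        return False  # nothing matched; the held lock stays in place


def unlock_range(eager_lock_on, acquired_range, trust_flag):
    """Pick the range to send with the compounded post-op + unlock."""
    if trust_flag and eager_lock_on:
        return FULL        # the behaviour the commit message calls a bug
    return acquired_range  # the fix: always use the range that was locked


if __name__ == "__main__":
    locks = ToyInodeLocks()
    write_range = (4096, 1024)           # arbitrary example offsets
    locks.lock("client-1", write_range)  # fallback to a blocking range lock
    stale_flag = True                    # stands in for eager_lock_on

    buggy = unlock_range(stale_flag, write_range, trust_flag=True)
    print(locks.unlock("client-1", buggy))  # False: full unlock matches nothing
    print(locks.lock("glfsheal", FULL))     # False: request would stay blocked

    fixed = unlock_range(stale_flag, write_range, trust_flag=False)
    print(locks.unlock("client-1", fixed))  # True: range lock released
    print(locks.lock("glfsheal", FULL))     # True: other requests can proceed
```

In this model the stale flag alone is harmless; the hang only appears once
the unlock path derives its range from the flag rather than from the lock
that was actually taken.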
