[Bugs] [Bug 1638159] New: data-self-heal in arbiter volume results in stale locks.

bugzilla at redhat.com
Thu Oct 11 00:53:18 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1638159

            Bug ID: 1638159
           Summary: data-self-heal in arbiter volume results in stale
                    locks.
           Product: GlusterFS
           Version: 5
         Component: replicate
          Keywords: Triaged
          Assignee: bugs at gluster.org
          Reporter: ravishankar at redhat.com
                CC: bugs at gluster.org
        Depends On: 1637802
            Blocks: 1636902, 1637953, 1637989, 1638026



+++ This bug was initially created as a clone of Bug #1637802 +++

Description of problem:
Commit eb472d82a083883335bc494b87ea175ac43471ff in master introduced a bug
in which a data self-heal of a file in an arbiter volume leaves a stale
inodelk behind on the bricks. As a result, any new write to the file from a
client can hang.

How reproducible:
Always.

Steps to Reproduce:
1. Create a 1x(2+1) arbiter volume, FUSE-mount it, and create a file.
2. Kill the arbiter brick, write to the file, bring the arbiter brick back,
and let self-heal complete.
3. The next write to the file from the mount hangs, because its inodelk is
blocked by the stale locks that self-heal left behind.
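
The steps above can be sketched as a list of commands. This is only an
illustration: the volume name, host names, brick paths, and mount point are
hypothetical, the script prints the commands without running them, and
killing the arbiter brick is left as a manual action (its PID is shown by
"gluster volume status").

```python
def repro_commands(vol="testvol",
                   hosts=("server1", "server2", "server3"),
                   mnt="/mnt/testvol"):
    """Return the reproduction steps as argv lists; nothing is executed."""
    # With "replica 3 arbiter 1" the last brick listed becomes the arbiter.
    # Host names and brick paths are hypothetical placeholders.
    bricks = [f"{h}:/bricks/{vol}" for h in hosts]
    target = f"{mnt}/file"
    write = ["dd", "if=/dev/urandom", f"of={target}", "bs=1M", "count=1"]
    return [
        # Step 1: create and start a 1x(2+1) volume, mount it, create a file.
        ["gluster", "volume", "create", vol,
         "replica", "3", "arbiter", "1", *bricks],
        ["gluster", "volume", "start", vol],
        ["mount", "-t", "glusterfs", f"{hosts[0]}:/{vol}", mnt],
        ["touch", target],
        # Step 2: look up the arbiter brick's PID and kill it by hand, then
        # write, restart the brick, and check whether the heal has finished.
        ["gluster", "volume", "status", vol],
        write,
        ["gluster", "volume", "start", vol, "force"],
        ["gluster", "volume", "heal", vol, "info"],
        # Step 3: on an affected build this write is expected to hang.
        write,
    ]


commands = repro_commands()
for argv in commands:
    print(" ".join(argv))
```

The heal-info command may need to be repeated until it reports no pending
entries before attempting the final write.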


Additional info:
Downstream bug in which the issue was found: BZ 1636902

--- Additional comment from Worker Ant on 2018-10-10 02:56:21 EDT ---

REVIEW: https://review.gluster.org/21380 (afr: prevent winding inodelks twice
for arbiter volumes) posted (#1) for review on master by Ravishankar N

--- Additional comment from Worker Ant on 2018-10-10 12:19:31 EDT ---

COMMIT: https://review.gluster.org/21380 committed in master by "Amar Tumballi"
<amarts at redhat.com> with a commit message- afr: prevent winding inodelks twice
for arbiter volumes

Problem:
In an arbiter volume, if a data heal of a file is pending only on the
arbiter brick, self-heal takes the inodelk twice due to a code bug but
releases it only once, leaving a stale lock behind on the brick. This causes
the next write to the file to hang.

Fix:
Fix the code bug so that the lock is taken only once. This bug was introduced
in master by commit eb472d82a083883335bc494b87ea175ac43471ff.
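
The lock accounting described above can be illustrated with a toy model.
This is a sketch, not GlusterFS code: the BrickLocks class, the heal()
helper, and the owner names "shd" and "client" are all hypothetical. The one
behaviour it assumes is that locks from the same owner do not conflict with
each other, while a lock requested by a different owner does.

```python
from collections import Counter


class BrickLocks:
    """Toy stand-in for the inodelk table of one file on one brick."""

    def __init__(self):
        self.held = Counter()  # lock owner -> number of granted locks

    def lock(self, owner):
        # Assumption: a request conflicts only with locks of other owners.
        if any(o != owner and n > 0 for o, n in self.held.items()):
            return False  # a real client would block here
        self.held[owner] += 1
        return True

    def unlock(self, owner):
        if self.held[owner] > 0:
            self.held[owner] -= 1


def heal(brick, lock_twice):
    """Simulate a data self-heal; lock_twice=True mimics the bug."""
    brick.lock("shd")
    if lock_twice:
        brick.lock("shd")  # the redundant second inodelk
    # ... the data heal itself would happen here ...
    brick.unlock("shd")    # only one unlock is issued in either case


buggy, fixed = BrickLocks(), BrickLocks()
heal(buggy, lock_twice=True)
heal(fixed, lock_twice=False)

# The next write from a client needs its own lock on the file.
write_ok_buggy = buggy.lock("client")
write_ok_fixed = fixed.lock("client")
print(write_ok_buggy, write_ok_fixed)  # prints: False True
```

In the buggy path one "shd" lock is never released, so the client's request
is refused (in a real volume it would wait, which is the hang seen from the
mount). Taking the lock once leaves nothing behind.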

Thanks to Pranith Kumar K <pkarampu at redhat.com> for finding the RCA.

fixes: bz#1637802
Change-Id: I15ad969e10a6a3c4bd255e2948b6be6dcddc61e1
Signed-off-by: Ravishankar N <ravishankar at redhat.com>


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1636902
[Bug 1636902] Healing is not completed on Distributed-Replicated ( Arbiter )
https://bugzilla.redhat.com/show_bug.cgi?id=1637802
[Bug 1637802] data-self-heal in arbiter volume results in stale locks.
https://bugzilla.redhat.com/show_bug.cgi?id=1637953
[Bug 1637953] data-self-heal in arbiter volume results in stale locks.
https://bugzilla.redhat.com/show_bug.cgi?id=1637989
[Bug 1637989] data-self-heal in arbiter volume results in stale locks.
https://bugzilla.redhat.com/show_bug.cgi?id=1638026
[Bug 1638026] Healing is not completed on Distributed-Replicated ( Arbiter )