[Bugs] [Bug 1177418] New: entry self-heal in 3.5 and 3.6 are not compatible

bugzilla at redhat.com bugzilla at redhat.com
Sat Dec 27 07:30:50 UTC 2014


https://bugzilla.redhat.com/show_bug.cgi?id=1177418

            Bug ID: 1177418
           Summary: entry self-heal in 3.5 and 3.6 are not compatible
           Product: GlusterFS
           Version: 3.6.1
         Component: replicate
          Assignee: bugs at gluster.org
          Reporter: pkarampu at redhat.com
                CC: bugs at gluster.org, gluster-bugs at redhat.com
        Depends On: 1168189
            Blocks: 1177339



+++ This bug was initially created as a clone of Bug #1168189 +++

Description of problem:
Entry self-heal in 3.6 and above takes a full lock on the directory only for the
duration of figuring out the xattrs of the directories, whereas 3.5 holds the
locks throughout the entry self-heal. If the cluster is heterogeneous, there is
a chance that a 3.6 self-heal is triggered and then a 3.5 self-heal is also
triggered, so the self-heal daemons of both 3.5 and 3.6 heal the same directory.
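
To confirm the cluster is actually heterogeneous before reproducing, the version
running on each node can be checked; the commands below are only an illustrative
sketch (host names m1 and m2 come from the steps further down, the volume name
r2 from the brick logs):

# print the glusterfs version on each node (run locally if ssh is not available)
ssh m1 glusterfs --version | head -1
ssh m2 glusterfs --version | head -1

# entries still pending heal on the volume can be listed from either node
gluster volume heal r2 info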

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a replicate volume consisting of 2 bricks on machines m1 and m2, both
running version 3.5 (see the sketch below).
2. Create a directory 'd' inside the mount on m2 and cd into it.
3. While a brick is down, create a lot of files in this directory 'd' from m2. I
created 10000 files.
4. Upgrade m1 to 3.6.
5. Bring the bricks up and initiate self-heal of the directories.
6. The 3.6 self-heal daemon will start healing.
7. Accessing 'd' on the mount on m2 will sometimes also trigger a heal.
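
The setup in steps 1-5 can be sketched with standard gluster CLI commands; the
brick paths and mount point are placeholders, and the volume name r2 is taken
from the brick logs below:

# on a 3.5 node: create and start a 2-brick replicate volume
gluster volume create r2 replica 2 m1:/bricks/b0 m2:/bricks/b1
gluster volume start r2

# on m2: mount the volume and create the test directory
mkdir -p /mnt/r2 && mount -t glusterfs m2:/r2 /mnt/r2
mkdir /mnt/r2/d && cd /mnt/r2/d

# with one brick killed (its PID is shown by 'gluster volume status r2'), create many files
for i in $(seq 1 10000); do touch f$i; done

# after upgrading m1 to 3.6: restart the killed brick and kick off a heal
gluster volume start r2 force
gluster volume heal r2 full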

Actual results:
Self-heal on directory 'd' is performed by both the 3.5 and the 3.6 self-heal
daemons.

Expected results:


Additional info:

--- Additional comment from Pranith Kumar K on 2014-11-26 06:26:03 EST ---

With the patch, the 3.6 heal is prevented while a 3.5 heal is in progress:
[root@localhost ~]# grep d0555511ee7f0000 /var/log/glusterfs/bricks/brick.log |
grep ENTRYLK
[2014-11-26 11:01:51.165591] I [entrylk.c:244:entrylk_trace_in] 0-r2-locks:
[REQUEST] Locker = {Pid=18446744073709551610, lk-owner=d0555511ee7f0000,
Client=0x7f51d6d21ac0, Frame=19} Lockee =
{gfid=26625058-b5f2-4561-97da-ec9e7268119e, fd=(nil), path=/d} Lock =
{lock=ENTRYLK, cmd=LOCK_NB, type=WRITE, basename=(null), domain:
r2-replicate-0:self-heal}
[2014-11-26 11:01:51.165633] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks:
[GRANTED] Locker = {Pid=18446744073709551610, lk-owner=d0555511ee7f0000,
Client=0x7f51d6d21ac0, Frame=19} Lockee =
{gfid=26625058-b5f2-4561-97da-ec9e7268119e, fd=(nil), path=/d} Lock =
{lock=ENTRYLK, cmd=LOCK_NB, type=WRITE, basename=(null), domain:
r2-replicate-0:self-heal}
[2014-11-26 11:01:51.173176] I [entrylk.c:244:entrylk_trace_in] 0-r2-locks:
[REQUEST] Locker = {Pid=18446744073709551610, lk-owner=d0555511ee7f0000,
Client=0x7f51d6d21ac0, Frame=21} Lockee =
{gfid=26625058-b5f2-4561-97da-ec9e7268119e, fd=(nil), path=/d} Lock =
{lock=ENTRYLK, cmd=LOCK_NB, type=WRITE,
basename=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[2014-11-26 11:01:51.173242] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks:
[TRYAGAIN] Locker = {Pid=18446744073709551610, lk-owner=d0555511ee7f0000,
Client=0x7f51d6d21ac0, Frame=21} Lockee =
{gfid=26625058-b5f2-4561-97da-ec9e7268119e, fd=(nil), path=/d} Lock =
{lock=ENTRYLK, cmd=LOCK_NB, type=WRITE,
basename=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[2014-11-26 11:01:51.184939] I [entrylk.c:244:entrylk_trace_in] 0-r2-locks:
[REQUEST] Locker = {Pid=18446744073709551610, lk-owner=d0555511ee7f0000,
Client=0x7f51d6d21ac0, Frame=23} Lockee =
{gfid=26625058-b5f2-4561-97da-ec9e7268119e, fd=(nil), path=/d} Lock =
{lock=ENTRYLK, cmd=UNLOCK, type=WRITE, basename=(null), domain:
r2-replicate-0:self-heal}
[2014-11-26 11:01:51.184989] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks:
[GRANTED] Locker = {Pid=18446744073709551610, lk-owner=d0555511ee7f0000,
Client=0x7f51d6d21ac0, Frame=23} Lockee =
{gfid=26625058-b5f2-4561-97da-ec9e7268119e, fd=(nil), path=/d} Lock =
{lock=ENTRYLK, cmd=UNLOCK, type=WRITE, basename=(null), domain:
r2-replicate-0:self-heal}

--- Additional comment from Pranith Kumar K on 2014-12-02 01:37:54 EST ---

In this test case, 3.6 does the healing, whereas the 3.5 heal does not get the
locks:

[root@localhost ~]# egrep "(aaaaaaaaaaa|TRYAGAIN)"
/var/log/glusterfs/bricks/brick.log | grep ENTRY
[2014-12-02 06:14:54.502242] I [entrylk.c:244:entrylk_trace_in] 0-r2-locks:
[REQUEST] Locker = {Pid=18446744073709551610, lk-owner=a41e327d2e7f0000,
Client=0x7f831e1a4ac0, Frame=1064} Lockee =
{gfid=fab813d6-2ef2-4885-a293-91476cc5d167, fd=(nil),
path=<gfid:fab813d6-2ef2-4885-a293-91476cc5d167>} Lock = {lock=ENTRYLK,
cmd=LOCK_NB, type=WRITE,
basename=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[2014-12-02 06:14:54.502261] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks:
[GRANTED] Locker = {Pid=18446744073709551610, lk-owner=a41e327d2e7f0000,
Client=0x7f831e1a4ac0, Frame=1064} Lockee =
{gfid=fab813d6-2ef2-4885-a293-91476cc5d167, fd=(nil),
path=<gfid:fab813d6-2ef2-4885-a293-91476cc5d167>} Lock = {lock=ENTRYLK,
cmd=LOCK_NB, type=WRITE,
basename=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[2014-12-02 06:15:01.491651] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks:
[TRYAGAIN] Locker = {Pid=18446744073709551615, lk-owner=08b4e7739a7f0000,
Client=0x7f831e1e7880, Frame=1621} Lockee =
{gfid=fab813d6-2ef2-4885-a293-91476cc5d167, fd=(nil),
path=<gfid:fab813d6-2ef2-4885-a293-91476cc5d167>} Lock = {lock=ENTRYLK,
cmd=LOCK_NB, type=WRITE, basename=(null), domain: r2-replicate-0}
[2014-12-02 06:18:36.741434] I [entrylk.c:244:entrylk_trace_in] 0-r2-locks:
[REQUEST] Locker = {Pid=18446744073709551610, lk-owner=a41e327d2e7f0000,
Client=0x7f831e1a4ac0, Frame=91112} Lockee =
{gfid=fab813d6-2ef2-4885-a293-91476cc5d167, fd=(nil),
path=<gfid:fab813d6-2ef2-4885-a293-91476cc5d167>} Lock = {lock=ENTRYLK,
cmd=UNLOCK, type=WRITE,
basename=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[2014-12-02 06:18:36.741604] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks:
[GRANTED] Locker = {Pid=18446744073709551610, lk-owner=a41e327d2e7f0000,
Client=0x7f831e1a4ac0, Frame=91112} Lockee =
{gfid=fab813d6-2ef2-4885-a293-91476cc5d167, fd=(nil),
path=<gfid:fab813d6-2ef2-4885-a293-91476cc5d167>} Lock = {lock=ENTRYLK,
cmd=UNLOCK, type=WRITE,
basename=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}

--- Additional comment from Anand Avati on 2014-12-02 01:44:27 EST ---

REVIEW: http://review.gluster.org/9125 (features/locks: Add lk-owner checks in
entrylk) posted (#2) for review on master by Pranith Kumar Karampuri
(pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2014-12-02 01:44:32 EST ---

REVIEW: http://review.gluster.org/9227 (cluster/afr: Make entry-self-heal in
afr-v2 compatible with afr-v1) posted (#1) for review on master by Pranith
Kumar Karampuri (pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2014-12-26 04:45:53 EST ---

REVIEW: http://review.gluster.org/9227 (cluster/afr: Make entry-self-heal in
afr-v2 compatible with afr-v1) posted (#2) for review on master by Pranith
Kumar Karampuri (pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2014-12-26 04:46:01 EST ---

REVIEW: http://review.gluster.org/9125 (features/locks: Add lk-owner checks in
entrylk) posted (#3) for review on master by Pranith Kumar Karampuri
(pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2014-12-26 04:46:04 EST ---

REVIEW: http://review.gluster.org/9351 (mgmt/glusterd: Add option to enable
lock trace) posted (#1) for review on master by Pranith Kumar Karampuri
(pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2014-12-27 02:20:41 EST ---

COMMIT: http://review.gluster.org/9125 committed in master by Pranith Kumar
Karampuri (pkarampu at redhat.com) 
------
commit 02b2172d9bc1557b3459388969077c75b659da82
Author: Pranith Kumar K <pkarampu at redhat.com>
Date:   Fri Nov 14 14:23:31 2014 +0530

    features/locks: Add lk-owner checks in entrylk

    For backward compatibility of entry-self-heal we need
    entrylks to be accepted by same lk-owner and same client.
    This patch introduces these changes.

    Change-Id: I67004cc5e657ba5ac09ceefbea823afdf06929e0
    BUG: 1168189
    Signed-off-by: Pranith Kumar K <pkarampu at redhat.com>
    Reviewed-on: http://review.gluster.org/9125
    Reviewed-by: Krutika Dhananjay <kdhananj at redhat.com>
    Tested-by: Gluster Build System <jenkins at build.gluster.com>
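
The check above keys on the lk-owner and client fields that appear in the
entrylk trace lines earlier in this bug. As a hedged aid for inspecting them
(default brick log path as in the excerpts; each trace record is a single line
in the actual log):

# list the lk-owner/client pairs behind granted entry locks
grep ENTRYLK /var/log/glusterfs/bricks/brick.log | grep GRANTED \
  | grep -oE 'lk-owner=[0-9a-f]+|Client=0x[0-9a-f]+' | paste - - | sort | uniq -c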

--- Additional comment from Anand Avati on 2014-12-27 02:21:03 EST ---

COMMIT: http://review.gluster.org/9227 committed in master by Pranith Kumar
Karampuri (pkarampu at redhat.com) 
------
commit 2947752836bd3ddbc572b59cecd24557050ec2a5
Author: Pranith Kumar K <pkarampu at redhat.com>
Date:   Mon Nov 17 14:27:47 2014 +0530

    cluster/afr: Make entry-self-heal in afr-v2 compatible with afr-v1

    Problem:
    Entry self-heal in 3.6 and above takes a full lock on the directory only
    for the duration of figuring out the xattrs of the directories, whereas
    3.5 holds the locks throughout the entry self-heal. If the cluster is
    heterogeneous, there is a chance that a 3.6 self-heal is triggered and
    then a 3.5 self-heal is also triggered, and the self-heal daemons of
    both 3.5 and 3.6 heal the same directory.

    Fix:
    In 3.6.x and above, take an entry lock on a very long name before entry
    self-heal begins, so that the 3.5 entry self-heal cannot get its locks
    until the 3.6.x entry self-heal completes.

    Change-Id: I71b6958dfe33056ed0a5a237e64e8506c3b0fccc
    BUG: 1168189
    Signed-off-by: Pranith Kumar K <pkarampu at redhat.com>
    Reviewed-on: http://review.gluster.org/9227
    Reviewed-by: Krutika Dhananjay <kdhananj at redhat.com>
    Tested-by: Gluster Build System <jenkins at build.gluster.com>
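
With this fix, the second log excerpt above shows the intended behaviour: the
3.6 heal holds a non-blocking entrylk on the very long "aaa..." basename, and
the 3.5 locker is answered with TRYAGAIN until it is released. A hedged way to
re-check this from the brick log (same log path as in the excerpts):

# the long-basename entrylk taken by the 3.6 heal before it starts
grep ENTRYLK /var/log/glusterfs/bricks/brick.log | grep aaaaaaaaaa | grep GRANTED

# entrylk requests answered with TRYAGAIN (the blocked 3.5 attempts show up here)
grep ENTRYLK /var/log/glusterfs/bricks/brick.log | grep TRYAGAIN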


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1168189
[Bug 1168189] entry self-heal in 3.5 and 3.6 are not compatible
https://bugzilla.redhat.com/show_bug.cgi?id=1177339
[Bug 1177339] entry self-heal in 3.5 and 3.6 are not compatible

