[Bugs] [Bug 1188522] New: entry, metadata self-heal in 3.0 and 3.1 are not compatible

bugzilla at redhat.com
Tue Feb 3 06:34:25 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1188522

            Bug ID: 1188522
           Summary: entry, metadata self-heal in 3.0 and 3.1 are not
                    compatible
           Product: Red Hat Storage
           Version: 3.0
         Component: gluster-afr
          Assignee: pkarampu at redhat.com
          Reporter: pkarampu at redhat.com
        QA Contact: storage-qa-internal at redhat.com
                CC: bugs at gluster.org, gluster-bugs at redhat.com
        Depends On: 1168189, 1177339, 1177418



+++ This bug was initially created as a clone of Bug #1177339 +++

+++ This bug was initially created as a clone of Bug #1168189 +++

Description of problem:
Entry self-heal in 3.6 and above takes a full lock on the directory only while
it inspects the directory's xattrs, whereas 3.5 holds the lock throughout the
entry self-heal. In a heterogeneous cluster there is a chance that a 3.6
self-heal is triggered and a 3.5 self-heal is triggered as well, so the 3.5 and
3.6 self-heal daemons end up healing the same directory at the same time.
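
For reference, one way to confirm the mixed-version setup and that a self-heal
daemon is running on each node (only a sketch; the volume name 'r2' matches the
lock traces below and is illustrative, and the exact output differs between 3.5
and 3.6):

# On each node, check the installed glusterfs version
[root@m1 ~]# glusterfs --version
[root@m2 ~]# glusterfs --version

# Confirm a self-heal daemon is listed for the volume on both nodes
[root@m1 ~]# gluster volume status r2

# List entries still pending heal
[root@m1 ~]# gluster volume heal r2 info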

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a 2-brick replicate volume on machines m1 and m2, both running
version 3.5 (a command sketch follows these steps).
2. Create a directory 'd' inside the mount on m2 and cd into it.
3. While the brick on m2 is down, create a lot of files in this directory 'd'.
I created 10000 files.
4. Upgrade m1 to 3.6.
5. Bring the bricks up and initiate self-heal of the directories.
6. The 3.6 self-heal daemon will start healing.
7. Accessing 'd' on the mount on m2 will sometimes also trigger a heal.
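
The steps above translate to roughly the following commands. This is only a
sketch: the brick paths, volume name 'r2', mount point and upgrade procedure
are illustrative and depend on the environment.

# On m1 (both nodes still on 3.5): create and start a 2-brick replicate volume
[root@m1 ~]# gluster volume create r2 replica 2 m1:/bricks/r2 m2:/bricks/r2
[root@m1 ~]# gluster volume start r2

# On m2: mount the volume, create the test directory and cd into it
[root@m2 ~]# mount -t glusterfs m2:/r2 /mnt/r2
[root@m2 ~]# mkdir /mnt/r2/d && cd /mnt/r2/d

# Kill the brick process on m2 (find its pid via 'gluster volume status r2'),
# then create the files from the mount on m2
[root@m2 ~]# kill <pid-of-m2-brick-process>
[root@m2 d]# for i in $(seq 1 10000); do touch f$i; done

# Upgrade m1 to 3.6, then restart the dead brick and trigger self-heal
[root@m1 ~]# gluster volume start r2 force
[root@m1 ~]# gluster volume heal r2 full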

Actual results:
Self-heal on directory 'd' is performed by both the 3.5 and 3.6 self-heal
daemons at the same time.

Expected results:
Only one self-heal daemon should heal directory 'd' at a time; the other should
be denied the entry locks until the in-progress heal completes.

Additional info:

--- Additional comment from Pranith Kumar K on 2014-11-26 06:26:03 EST ---

With the patch, while the 3.5 heal is in progress the 3.6 heal is prevented:
[root@localhost ~]# grep d0555511ee7f0000 /var/log/glusterfs/bricks/brick.log | grep ENTRYLK
[2014-11-26 11:01:51.165591] I [entrylk.c:244:entrylk_trace_in] 0-r2-locks:
[REQUEST] Locker = {Pid=18446744073709551610, lk-owner=d0555511ee7f0000,
Client=0x7f51d6d21ac0, Frame=19} Lockee =
{gfid=26625058-b5f2-4561-97da-ec9e7268119e, fd=(nil), path=/d} Lock =
{lock=ENTRYLK, cmd=LOCK_NB, type=WRITE, basename=(null), domain:
r2-replicate-0:self-heal}
[2014-11-26 11:01:51.165633] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks:
[GRANTED] Locker = {Pid=18446744073709551610, lk-owner=d0555511ee7f0000,
Client=0x7f51d6d21ac0, Frame=19} Lockee =
{gfid=26625058-b5f2-4561-97da-ec9e7268119e, fd=(nil), path=/d} Lock =
{lock=ENTRYLK, cmd=LOCK_NB, type=WRITE, basename=(null), domain:
r2-replicate-0:self-heal}
[2014-11-26 11:01:51.173176] I [entrylk.c:244:entrylk_trace_in] 0-r2-locks:
[REQUEST] Locker = {Pid=18446744073709551610, lk-owner=d0555511ee7f0000,
Client=0x7f51d6d21ac0, Frame=21} Lockee =
{gfid=26625058-b5f2-4561-97da-ec9e7268119e, fd=(nil), path=/d} Lock =
{lock=ENTRYLK, cmd=LOCK_NB, type=WRITE,
basename=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[2014-11-26 11:01:51.173242] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks:
[TRYAGAIN] Locker = {Pid=18446744073709551610, lk-owner=d0555511ee7f0000,
Client=0x7f51d6d21ac0, Frame=21} Lockee =
{gfid=26625058-b5f2-4561-97da-ec9e7268119e, fd=(nil), path=/d} Lock =
{lock=ENTRYLK, cmd=LOCK_NB, type=WRITE,
basename=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[2014-11-26 11:01:51.184939] I [entrylk.c:244:entrylk_trace_in] 0-r2-locks:
[REQUEST] Locker = {Pid=18446744073709551610, lk-owner=d0555511ee7f0000,
Client=0x7f51d6d21ac0, Frame=23} Lockee =
{gfid=26625058-b5f2-4561-97da-ec9e7268119e, fd=(nil), path=/d} Lock =
{lock=ENTRYLK, cmd=UNLOCK, type=WRITE, basename=(null), domain:
r2-replicate-0:self-heal}
[2014-11-26 11:01:51.184989] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks:
[GRANTED] Locker = {Pid=18446744073709551610, lk-owner=d0555511ee7f0000,
Client=0x7f51d6d21ac0, Frame=23} Lockee =
{gfid=26625058-b5f2-4561-97da-ec9e7268119e, fd=(nil), path=/d} Lock =
{lock=ENTRYLK, cmd=UNLOCK, type=WRITE, basename=(null), domain:
r2-replicate-0:self-heal}

--- Additional comment from Pranith Kumar K on 2014-12-02 01:37:54 EST ---

In this test case, 3.6 does the healing, whereas the 3.5 heal does not get the locks:

[root@localhost ~]# egrep "(aaaaaaaaaaa|TRYAGAIN)" /var/log/glusterfs/bricks/brick.log | grep ENTRY
[2014-12-02 06:14:54.502242] I [entrylk.c:244:entrylk_trace_in] 0-r2-locks:
[REQUEST] Locker = {Pid=18446744073709551610, lk-owner=a41e327d2e7f0000,
Client=0x7f831e1a4ac0, Frame=1064} Lockee =
{gfid=fab813d6-2ef2-4885-a293-91476cc5d167, fd=(nil),
path=<gfid:fab813d6-2ef2-4885-a293-91476cc5d167>} Lock = {lock=ENTRYLK,
cmd=LOCK_NB, type=WRITE,
basename=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[2014-12-02 06:14:54.502261] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks:
[GRANTED] Locker = {Pid=18446744073709551610, lk-owner=a41e327d2e7f0000,
Client=0x7f831e1a4ac0, Frame=1064} Lockee =
{gfid=fab813d6-2ef2-4885-a293-91476cc5d167, fd=(nil),
path=<gfid:fab813d6-2ef2-4885-a293-91476cc5d167>} Lock = {lock=ENTRYLK,
cmd=LOCK_NB, type=WRITE,
basename=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[2014-12-02 06:15:01.491651] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks:
[TRYAGAIN] Locker = {Pid=18446744073709551615, lk-owner=08b4e7739a7f0000,
Client=0x7f831e1e7880, Frame=1621} Lockee =
{gfid=fab813d6-2ef2-4885-a293-91476cc5d167, fd=(nil),
path=<gfid:fab813d6-2ef2-4885-a293-91476cc5d167>} Lock = {lock=ENTRYLK,
cmd=LOCK_NB, type=WRITE, basename=(null), domain: r2-replicate-0}
[2014-12-02 06:18:36.741434] I [entrylk.c:244:entrylk_trace_in] 0-r2-locks:
[REQUEST] Locker = {Pid=18446744073709551610, lk-owner=a41e327d2e7f0000,
Client=0x7f831e1a4ac0, Frame=91112} Lockee =
{gfid=fab813d6-2ef2-4885-a293-91476cc5d167, fd=(nil),
path=<gfid:fab813d6-2ef2-4885-a293-91476cc5d167>} Lock = {lock=ENTRYLK,
cmd=UNLOCK, type=WRITE,
basename=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[2014-12-02 06:18:36.741604] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks:
[GRANTED] Locker = {Pid=18446744073709551610, lk-owner=a41e327d2e7f0000,
Client=0x7f831e1a4ac0, Frame=91112} Lockee =
{gfid=fab813d6-2ef2-4885-a293-91476cc5d167, fd=(nil),
path=<gfid:fab813d6-2ef2-4885-a293-91476cc5d167>} Lock = {lock=ENTRYLK,
cmd=UNLOCK, type=WRITE,
basename=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}

--- Additional comment from Anand Avati on 2014-12-02 01:44:27 EST ---

REVIEW: http://review.gluster.org/9125 (features/locks: Add lk-owner checks in
entrylk) posted (#2) for review on master by Pranith Kumar Karampuri
(pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2014-12-02 01:44:32 EST ---

REVIEW: http://review.gluster.org/9227 (cluster/afr: Make entry-self-heal in
afr-v2 compatible with afr-v1) posted (#1) for review on master by Pranith
Kumar Karampuri (pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2014-12-26 04:45:53 EST ---

REVIEW: http://review.gluster.org/9227 (cluster/afr: Make entry-self-heal in
afr-v2 compatible with afr-v1) posted (#2) for review on master by Pranith
Kumar Karampuri (pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2014-12-26 04:46:01 EST ---

REVIEW: http://review.gluster.org/9125 (features/locks: Add lk-owner checks in
entrylk) posted (#3) for review on master by Pranith Kumar Karampuri
(pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2014-12-26 04:46:04 EST ---

REVIEW: http://review.gluster.org/9351 (mgmt/glusterd: Add option to enable
lock trace) posted (#1) for review on master by Pranith Kumar Karampuri
(pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2014-12-26 04:49:15 EST ---

REVIEW: http://review.gluster.org/9352 (features/locks: Add lk-owner checks in
entrylk) posted (#1) for review on release-3.5 by Pranith Kumar Karampuri
(pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2014-12-28 23:58:15 EST ---

REVIEW: http://review.gluster.org/9352 (features/locks: Add lk-owner checks in
entrylk) posted (#2) for review on release-3.5 by Pranith Kumar Karampuri
(pkarampu at redhat.com)


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1168189
[Bug 1168189] entry self-heal in 3.5 and 3.6 are not compatible
https://bugzilla.redhat.com/show_bug.cgi?id=1177339
[Bug 1177339] entry self-heal in 3.5 and 3.6 are not compatible
https://bugzilla.redhat.com/show_bug.cgi?id=1177418
[Bug 1177418] entry self-heal in 3.5 and 3.6 are not compatible

