[Bugs] [Bug 1177339] New: entry self-heal in 3.5 and 3.6 are not compatible

bugzilla at redhat.com bugzilla at redhat.com
Fri Dec 26 09:47:27 UTC 2014


https://bugzilla.redhat.com/show_bug.cgi?id=1177339

            Bug ID: 1177339
           Summary: entry self-heal in 3.5 and 3.6 are not compatible
           Product: GlusterFS
           Version: 3.5.3
         Component: replicate
          Assignee: bugs at gluster.org
          Reporter: pkarampu at redhat.com
                CC: bugs at gluster.org, gluster-bugs at redhat.com
        Depends On: 1168189



+++ This bug was initially created as a clone of Bug #1168189 +++

Description of problem:
Entry self-heal in 3.6 and above takes a full lock on the directory only for
the duration of examining the xattrs of the directories, whereas 3.5 holds its
locks throughout the entry self-heal. If the cluster is heterogeneous, there is
a chance that a 3.6 self-heal is triggered, a 3.5 self-heal is then triggered
as well, and the self-heal daemons of both 3.5 and 3.6 heal the same directory.
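
To make the race concrete, here is a toy model of the locking difference
described above. It is only a sketch: the class, the function and the owner
names "shd-3.5" and "shd-3.6" are invented for illustration, it is not
GlusterFS code, and the flag new_healer_keeps_lock is a hypothetical knob for
comparison, not the approach taken by any patch.

```python
class EntryLockTable:
    """Toy table of non-blocking full-directory locks (illustrative only)."""

    def __init__(self):
        self.holder = {}  # directory -> owner currently holding the full lock

    def try_lock(self, directory, owner):
        # Non-blocking: fail at once if a different owner holds the lock.
        if self.holder.get(directory, owner) != owner:
            return False
        self.holder[directory] = owner
        return True

    def unlock(self, directory, owner):
        if self.holder.get(directory) == owner:
            del self.holder[directory]


def simulate(new_healer_keeps_lock):
    """Return the set of healers that end up healing directory 'd'."""
    locks = EntryLockTable()
    healing = set()

    # 3.6-style healer: takes the full lock to examine the xattrs.
    if locks.try_lock("d", "shd-3.6"):
        healing.add("shd-3.6")
        if not new_healer_keeps_lock:
            # Lock dropped after the xattr check, while the heal continues.
            locks.unlock("d", "shd-3.6")

    # 3.5-style healer arrives while the 3.6 heal is still running and
    # would hold the full lock for its whole heal.
    if locks.try_lock("d", "shd-3.5"):
        healing.add("shd-3.5")
    return healing


print(sorted(simulate(new_healer_keeps_lock=False)))
print(sorted(simulate(new_healer_keeps_lock=True)))
```

With the lock released early, both healers end up in the returned set, which
is the situation reported here; with the lock kept, only the first one does.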

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a replicate volume consisting of 2 bricks on machines m1 and m2,
both on version 3.5.
2. Create a directory 'd' inside the mount on m2 and cd into it.
3. While a brick is down on m2, create a lot of files in this directory 'd'. I
created 10000 files.
4. Upgrade m1 to 3.6.
5. Bring the bricks up and initiate self-heal of directories.
6. The 3.6 self-heal daemon will start healing.
7. Access 'd' on the mount on m2; this will sometimes trigger a heal as well.
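
Step 3 can be scripted. The following is a minimal sketch, assuming the
volume is mounted and 'd' is reachable at the path passed in; the function
name, the file-name prefix and the example mount path are made up for
illustration and are not taken from the report.

```python
import os


def create_files(directory, count=10000, prefix="f"):
    """Create `count` empty files in `directory` and return the count."""
    os.makedirs(directory, exist_ok=True)
    for i in range(count):
        # Opening for write and closing immediately leaves an empty file.
        with open(os.path.join(directory, f"{prefix}{i:05d}"), "w"):
            pass
    return count


# Example (hypothetical mount point):
# create_files("/mnt/r2/d")
```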

Actual results:
Self-heal is performed on directory 'd' by the self-heal daemons of both 3.5
and 3.6.

Expected results:


Additional info:

--- Additional comment from Pranith Kumar K on 2014-11-26 06:26:03 EST ---

With the patch, the 3.6 heal is prevented while a 3.5 heal is in progress:
[root@localhost ~]# grep d0555511ee7f0000 /var/log/glusterfs/bricks/brick.log | grep ENTRYLK
[2014-11-26 11:01:51.165591] I [entrylk.c:244:entrylk_trace_in] 0-r2-locks:
[REQUEST] Locker = {Pid=18446744073709551610, lk-owner=d0555511ee7f0000,
Client=0x7f51d6d21ac0, Frame=19} Lockee =
{gfid=26625058-b5f2-4561-97da-ec9e7268119e, fd=(nil), path=/d} Lock =
{lock=ENTRYLK, cmd=LOCK_NB, type=WRITE, basename=(null), domain:
r2-replicate-0:self-heal}
[2014-11-26 11:01:51.165633] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks:
[GRANTED] Locker = {Pid=18446744073709551610, lk-owner=d0555511ee7f0000,
Client=0x7f51d6d21ac0, Frame=19} Lockee =
{gfid=26625058-b5f2-4561-97da-ec9e7268119e, fd=(nil), path=/d} Lock =
{lock=ENTRYLK, cmd=LOCK_NB, type=WRITE, basename=(null), domain:
r2-replicate-0:self-heal}
[2014-11-26 11:01:51.173176] I [entrylk.c:244:entrylk_trace_in] 0-r2-locks:
[REQUEST] Locker = {Pid=18446744073709551610, lk-owner=d0555511ee7f0000,
Client=0x7f51d6d21ac0, Frame=21} Lockee =
{gfid=26625058-b5f2-4561-97da-ec9e7268119e, fd=(nil), path=/d} Lock =
{lock=ENTRYLK, cmd=LOCK_NB, type=WRITE,
basename=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[2014-11-26 11:01:51.173242] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks:
[TRYAGAIN] Locker = {Pid=18446744073709551610, lk-owner=d0555511ee7f0000,
Client=0x7f51d6d21ac0, Frame=21} Lockee =
{gfid=26625058-b5f2-4561-97da-ec9e7268119e, fd=(nil), path=/d} Lock =
{lock=ENTRYLK, cmd=LOCK_NB, type=WRITE,
basename=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[2014-11-26 11:01:51.184939] I [entrylk.c:244:entrylk_trace_in] 0-r2-locks:
[REQUEST] Locker = {Pid=18446744073709551610, lk-owner=d0555511ee7f0000,
Client=0x7f51d6d21ac0, Frame=23} Lockee =
{gfid=26625058-b5f2-4561-97da-ec9e7268119e, fd=(nil), path=/d} Lock =
{lock=ENTRYLK, cmd=UNLOCK, type=WRITE, basename=(null), domain:
r2-replicate-0:self-heal}
[2014-11-26 11:01:51.184989] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks:
[GRANTED] Locker = {Pid=18446744073709551610, lk-owner=d0555511ee7f0000,
Client=0x7f51d6d21ac0, Frame=23} Lockee =
{gfid=26625058-b5f2-4561-97da-ec9e7268119e, fd=(nil), path=/d} Lock =
{lock=ENTRYLK, cmd=UNLOCK, type=WRITE, basename=(null), domain:
r2-replicate-0:self-heal}
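
For longer logs it can help to pull the interesting fields out of each trace
entry. The sketch below is based only on the entry layout visible in the
excerpt above; the regular expression and the function name are assumptions,
not a GlusterFS tool, and other versions may format these entries differently.

```python
import re

# Fields in the order they appear in an entrylk trace entry in the excerpt.
TRACE_RE = re.compile(
    r"\[(?P<state>REQUEST|GRANTED|TRYAGAIN)\]"
    r".*?lk-owner=(?P<owner>[0-9a-f]+)"
    r".*?Frame=(?P<frame>\d+)"
    r".*?cmd=(?P<cmd>\w+)"
    r".*?basename=(?P<basename>[^,}]+)"
)


def parse_trace(entry):
    """Return the matched fields of one trace entry as a dict, or None."""
    # Collapse mail line-wrapping so the entry is a single line again.
    match = TRACE_RE.search(" ".join(entry.split()))
    return match.groupdict() if match else None


# One entry copied from the excerpt above, still wrapped as in the mail.
SAMPLE = """[2014-11-26 11:01:51.165633] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks:
[GRANTED] Locker = {Pid=18446744073709551610, lk-owner=d0555511ee7f0000,
Client=0x7f51d6d21ac0, Frame=19} Lockee =
{gfid=26625058-b5f2-4561-97da-ec9e7268119e, fd=(nil), path=/d} Lock =
{lock=ENTRYLK, cmd=LOCK_NB, type=WRITE, basename=(null), domain:
r2-replicate-0:self-heal}"""

print(parse_trace(SAMPLE))
```

Entries that share a Frame value are a request and its reply, so grouping the
parsed dicts by "frame" shows which lock attempts were answered with TRYAGAIN.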

--- Additional comment from Pranith Kumar K on 2014-12-02 01:37:54 EST ---

In this test case, 3.6 does the healing, whereas the 3.5 heal does not get the
locks:

[root@localhost ~]# egrep "(aaaaaaaaaaa|TRYAGAIN)" /var/log/glusterfs/bricks/brick.log | grep ENTRY
[2014-12-02 06:14:54.502242] I [entrylk.c:244:entrylk_trace_in] 0-r2-locks:
[REQUEST] Locker = {Pid=18446744073709551610, lk-owner=a41e327d2e7f0000,
Client=0x7f831e1a4ac0, Frame=1064} Lockee =
{gfid=fab813d6-2ef2-4885-a293-91476cc5d167, fd=(nil),
path=<gfid:fab813d6-2ef2-4885-a293-91476cc5d167>} Lock = {lock=ENTRYLK,
cmd=LOCK_NB, type=WRITE,
basename=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[2014-12-02 06:14:54.502261] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks:
[GRANTED] Locker = {Pid=18446744073709551610, lk-owner=a41e327d2e7f0000,
Client=0x7f831e1a4ac0, Frame=1064} Lockee =
{gfid=fab813d6-2ef2-4885-a293-91476cc5d167, fd=(nil),
path=<gfid:fab813d6-2ef2-4885-a293-91476cc5d167>} Lock = {lock=ENTRYLK,
cmd=LOCK_NB, type=WRITE,
basename=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[2014-12-02 06:15:01.491651] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks:
[TRYAGAIN] Locker = {Pid=18446744073709551615, lk-owner=08b4e7739a7f0000,
Client=0x7f831e1e7880, Frame=1621} Lockee =
{gfid=fab813d6-2ef2-4885-a293-91476cc5d167, fd=(nil),
path=<gfid:fab813d6-2ef2-4885-a293-91476cc5d167>} Lock = {lock=ENTRYLK,
cmd=LOCK_NB, type=WRITE, basename=(null), domain: r2-replicate-0}
[2014-12-02 06:18:36.741434] I [entrylk.c:244:entrylk_trace_in] 0-r2-locks:
[REQUEST] Locker = {Pid=18446744073709551610, lk-owner=a41e327d2e7f0000,
Client=0x7f831e1a4ac0, Frame=91112} Lockee =
{gfid=fab813d6-2ef2-4885-a293-91476cc5d167, fd=(nil),
path=<gfid:fab813d6-2ef2-4885-a293-91476cc5d167>} Lock = {lock=ENTRYLK,
cmd=UNLOCK, type=WRITE,
basename=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[2014-12-02 06:18:36.741604] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks:
[GRANTED] Locker = {Pid=18446744073709551610, lk-owner=a41e327d2e7f0000,
Client=0x7f831e1a4ac0, Frame=91112} Lockee =
{gfid=fab813d6-2ef2-4885-a293-91476cc5d167, fd=(nil),
path=<gfid:fab813d6-2ef2-4885-a293-91476cc5d167>} Lock = {lock=ENTRYLK,
cmd=UNLOCK, type=WRITE,
basename=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}

--- Additional comment from Anand Avati on 2014-12-02 01:44:27 EST ---

REVIEW: http://review.gluster.org/9125 (features/locks: Add lk-owner checks in
entrylk) posted (#2) for review on master by Pranith Kumar Karampuri
(pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2014-12-02 01:44:32 EST ---

REVIEW: http://review.gluster.org/9227 (cluster/afr: Make entry-self-heal in
afr-v2 compatible with afr-v1) posted (#1) for review on master by Pranith
Kumar Karampuri (pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2014-12-26 04:45:53 EST ---

REVIEW: http://review.gluster.org/9227 (cluster/afr: Make entry-self-heal in
afr-v2 compatible with afr-v1) posted (#2) for review on master by Pranith
Kumar Karampuri (pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2014-12-26 04:46:01 EST ---

REVIEW: http://review.gluster.org/9125 (features/locks: Add lk-owner checks in
entrylk) posted (#3) for review on master by Pranith Kumar Karampuri
(pkarampu at redhat.com)

--- Additional comment from Anand Avati on 2014-12-26 04:46:04 EST ---

REVIEW: http://review.gluster.org/9351 (mgmt/glusterd: Add option to enable
lock trace) posted (#1) for review on master by Pranith Kumar Karampuri
(pkarampu at redhat.com)


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1168189
[Bug 1168189] entry self-heal in 3.5 and 3.6 are not compatible