[Bugs] [Bug 1177418] New: entry self-heal in 3.5 and 3.6 are not compatible
bugzilla at redhat.com
Sat Dec 27 07:30:50 UTC 2014
https://bugzilla.redhat.com/show_bug.cgi?id=1177418
Bug ID: 1177418
Summary: entry self-heal in 3.5 and 3.6 are not compatible
Product: GlusterFS
Version: 3.6.1
Component: replicate
Assignee: bugs at gluster.org
Reporter: pkarampu at redhat.com
CC: bugs at gluster.org, gluster-bugs at redhat.com
Depends On: 1168189
Blocks: 1177339
+++ This bug was initially created as a clone of Bug #1168189 +++
Description of problem:
Entry self-heal in 3.6 and above takes a full lock on the directory only while
it examines the xattrs of the directory, whereas 3.5 holds the lock throughout
the entry self-heal. In a heterogeneous cluster there is a chance that a 3.6
self-heal is triggered, then a 3.5 self-heal is also triggered, and the
self-heal daemons of both 3.5 and 3.6 heal the same directory.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Create a replicate volume consisting of 2 bricks on machines m1, m2 on
version 3.5
2. Create a directory 'd' inside the mount on m2 and cd into it
3. While a brick on m2 is down, create a lot of files in this directory 'd'. I
created 10000 files.
4. Upgrade m1 to 3.6
5. Bring the bricks up and initiate self-heal of directories.
6. The 3.6 self-heal daemon will start healing
7. Access 'd' on the mount on m2; this will sometimes also trigger a heal.
Actual results:
Directory 'd' is self-healed by both the 3.5 and the 3.6 self-heal daemons.
Expected results:
Additional info:
--- Additional comment from Pranith Kumar K on 2014-11-26 06:26:03 EST ---
With the patch:
While a 3.5 heal is in progress, the 3.6 heal is prevented:
[root@localhost ~]# grep d0555511ee7f0000 /var/log/glusterfs/bricks/brick.log | grep ENTRYLK
[2014-11-26 11:01:51.165591] I [entrylk.c:244:entrylk_trace_in] 0-r2-locks:
[REQUEST] Locker = {Pid=18446744073709551610, lk-owner=d0555511ee7f0000,
Client=0x7f51d6d21ac0, Frame=19} Lockee =
{gfid=26625058-b5f2-4561-97da-ec9e7268119e, fd=(nil), path=/d} Lock =
{lock=ENTRYLK, cmd=LOCK_NB, type=WRITE, basename=(null), domain:
r2-replicate-0:self-heal}
[2014-11-26 11:01:51.165633] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks:
[GRANTED] Locker = {Pid=18446744073709551610, lk-owner=d0555511ee7f0000,
Client=0x7f51d6d21ac0, Frame=19} Lockee =
{gfid=26625058-b5f2-4561-97da-ec9e7268119e, fd=(nil), path=/d} Lock =
{lock=ENTRYLK, cmd=LOCK_NB, type=WRITE, basename=(null), domain:
r2-replicate-0:self-heal}
[2014-11-26 11:01:51.173176] I [entrylk.c:244:entrylk_trace_in] 0-r2-locks:
[REQUEST] Locker = {Pid=18446744073709551610, lk-owner=d0555511ee7f0000,
Client=0x7f51d6d21ac0, Frame=21} Lockee =
{gfid=26625058-b5f2-4561-97da-ec9e7268119e, fd=(nil), path=/d} Lock =
{lock=ENTRYLK, cmd=LOCK_NB, type=WRITE,
basename=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[2014-11-26 11:01:51.173242] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks:
[TRYAGAIN] Locker = {Pid=18446744073709551610, lk-owner=d0555511ee7f0000,
Client=0x7f51d6d21ac0, Frame=21} Lockee =
{gfid=26625058-b5f2-4561-97da-ec9e7268119e, fd=(nil), path=/d} Lock =
{lock=ENTRYLK, cmd=LOCK_NB, type=WRITE,
basename=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[2014-11-26 11:01:51.184939] I [entrylk.c:244:entrylk_trace_in] 0-r2-locks:
[REQUEST] Locker = {Pid=18446744073709551610, lk-owner=d0555511ee7f0000,
Client=0x7f51d6d21ac0, Frame=23} Lockee =
{gfid=26625058-b5f2-4561-97da-ec9e7268119e, fd=(nil), path=/d} Lock =
{lock=ENTRYLK, cmd=UNLOCK, type=WRITE, basename=(null), domain:
r2-replicate-0:self-heal}
[2014-11-26 11:01:51.184989] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks:
[GRANTED] Locker = {Pid=18446744073709551610, lk-owner=d0555511ee7f0000,
Client=0x7f51d6d21ac0, Frame=23} Lockee =
{gfid=26625058-b5f2-4561-97da-ec9e7268119e, fd=(nil), path=/d} Lock =
{lock=ENTRYLK, cmd=UNLOCK, type=WRITE, basename=(null), domain:
r2-replicate-0:self-heal}
--- Additional comment from Pranith Kumar K on 2014-12-02 01:37:54 EST ---
In this test case, 3.6 does the healing, whereas the 3.5 heal does not get locks:
[root@localhost ~]# egrep "(aaaaaaaaaaa|TRYAGAIN)" /var/log/glusterfs/bricks/brick.log | grep ENTRY
[2014-12-02 06:14:54.502242] I [entrylk.c:244:entrylk_trace_in] 0-r2-locks:
[REQUEST] Locker = {Pid=18446744073709551610, lk-owner=a41e327d2e7f0000,
Client=0x7f831e1a4ac0, Frame=1064} Lockee =
{gfid=fab813d6-2ef2-4885-a293-91476cc5d167, fd=(nil),
path=<gfid:fab813d6-2ef2-4885-a293-91476cc5d167>} Lock = {lock=ENTRYLK,
cmd=LOCK_NB, type=WRITE,
basename=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[2014-12-02 06:14:54.502261] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks:
[GRANTED] Locker = {Pid=18446744073709551610, lk-owner=a41e327d2e7f0000,
Client=0x7f831e1a4ac0, Frame=1064} Lockee =
{gfid=fab813d6-2ef2-4885-a293-91476cc5d167, fd=(nil),
path=<gfid:fab813d6-2ef2-4885-a293-91476cc5d167>} Lock = {lock=ENTRYLK,
cmd=LOCK_NB, type=WRITE,
basename=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[2014-12-02 06:15:01.491651] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks:
[TRYAGAIN] Locker = {Pid=18446744073709551615, lk-owner=08b4e7739a7f0000,
Client=0x7f831e1e7880, Frame=1621} Lockee =
{gfid=fab813d6-2ef2-4885-a293-91476cc5d167, fd=(nil),
path=<gfid:fab813d6-2ef2-4885-a293-91476cc5d167>} Lock = {lock=ENTRYLK,
cmd=LOCK_NB, type=WRITE, basename=(null), domain: r2-replicate-0}
[2014-12-02 06:18:36.741434] I [entrylk.c:244:entrylk_trace_in] 0-r2-locks:
[REQUEST] Locker = {Pid=18446744073709551610, lk-owner=a41e327d2e7f0000,
Client=0x7f831e1a4ac0, Frame=91112} Lockee =
{gfid=fab813d6-2ef2-4885-a293-91476cc5d167, fd=(nil),
path=<gfid:fab813d6-2ef2-4885-a293-91476cc5d167>} Lock = {lock=ENTRYLK,
cmd=UNLOCK, type=WRITE,
basename=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[2014-12-02 06:18:36.741604] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks:
[GRANTED] Locker = {Pid=18446744073709551610, lk-owner=a41e327d2e7f0000,
Client=0x7f831e1a4ac0, Frame=91112} Lockee =
{gfid=fab813d6-2ef2-4885-a293-91476cc5d167, fd=(nil),
path=<gfid:fab813d6-2ef2-4885-a293-91476cc5d167>} Lock = {lock=ENTRYLK,
cmd=UNLOCK, type=WRITE,
basename=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
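The entrylk trace records above are easier to compare once the fields are pulled apart. The following is a rough sketch of a parser for them, not part of GlusterFS; the field layout is inferred only from the records quoted in this bug, and the names `TRACE_RE`, `EntrylkTrace` and `parse_entrylk` are made up for illustration. It collapses whitespace first, so a record that was wrapped by the mail archive can be passed in as-is. Records with the very long basename carry no `domain:` field in the quoted logs, so that field is treated as optional.

```python
import re
from typing import NamedTuple, Optional

# Field layout inferred from the trace records quoted in this bug (assumption).
TRACE_RE = re.compile(
    r"\[(?P<state>REQUEST|GRANTED|TRYAGAIN)\] "
    r"Locker = \{Pid=(?P<pid>\d+), lk-owner=(?P<lk_owner>[0-9a-f]+), "
    r"Client=(?P<client>0x[0-9a-f]+), Frame=(?P<frame>\d+)\} "
    r"Lockee = \{gfid=(?P<gfid>[0-9a-f-]+), fd=(?P<fd>[^,]+), "
    r"path=(?P<path>[^}]*)\} "
    r"Lock = \{lock=ENTRYLK, cmd=(?P<cmd>\w+), type=(?P<type>\w+), "
    r"basename=(?P<basename>[^,}]*)(?:, domain: (?P<domain>[^}]*))?\}"
)


class EntrylkTrace(NamedTuple):
    state: str               # REQUEST, GRANTED or TRYAGAIN
    lk_owner: str
    client: str
    frame: int
    gfid: str
    cmd: str                 # LOCK_NB or UNLOCK in the quoted logs
    basename: Optional[str]  # None means the whole directory, logged as (null)
    domain: Optional[str]    # None when the record has no domain field


def parse_entrylk(record: str) -> Optional[EntrylkTrace]:
    """Parse one entrylk trace record; return None if it does not match."""
    # Collapse line wraps and repeated spaces before matching.
    m = TRACE_RE.search(" ".join(record.split()))
    if m is None:
        return None
    basename = m["basename"]
    return EntrylkTrace(
        state=m["state"],
        lk_owner=m["lk_owner"],
        client=m["client"],
        frame=int(m["frame"]),
        gfid=m["gfid"],
        cmd=m["cmd"],
        basename=None if basename == "(null)" else basename,
        domain=m["domain"],
    )


# The TRYAGAIN record from the 3.5 client in the second test case.
SAMPLE = """[2014-12-02 06:15:01.491651] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks:
[TRYAGAIN] Locker = {Pid=18446744073709551615, lk-owner=08b4e7739a7f0000,
Client=0x7f831e1e7880, Frame=1621} Lockee =
{gfid=fab813d6-2ef2-4885-a293-91476cc5d167, fd=(nil),
path=<gfid:fab813d6-2ef2-4885-a293-91476cc5d167>} Lock = {lock=ENTRYLK,
cmd=LOCK_NB, type=WRITE, basename=(null), domain: r2-replicate-0}"""

trace = parse_entrylk(SAMPLE)
print(trace)
```

Grouping the parsed records by `frame` pairs each REQUEST with its GRANTED or TRYAGAIN reply, which makes it quick to see which lk-owner was refused.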
--- Additional comment from Anand Avati on 2014-12-02 01:44:27 EST ---
REVIEW: http://review.gluster.org/9125 (features/locks: Add lk-owner checks in
entrylk) posted (#2) for review on master by Pranith Kumar Karampuri
(pkarampu at redhat.com)
--- Additional comment from Anand Avati on 2014-12-02 01:44:32 EST ---
REVIEW: http://review.gluster.org/9227 (cluster/afr: Make entry-self-heal in
afr-v2 compatible with afr-v1) posted (#1) for review on master by Pranith
Kumar Karampuri (pkarampu at redhat.com)
--- Additional comment from Anand Avati on 2014-12-26 04:45:53 EST ---
REVIEW: http://review.gluster.org/9227 (cluster/afr: Make entry-self-heal in
afr-v2 compatible with afr-v1) posted (#2) for review on master by Pranith
Kumar Karampuri (pkarampu at redhat.com)
--- Additional comment from Anand Avati on 2014-12-26 04:46:01 EST ---
REVIEW: http://review.gluster.org/9125 (features/locks: Add lk-owner checks in
entrylk) posted (#3) for review on master by Pranith Kumar Karampuri
(pkarampu at redhat.com)
--- Additional comment from Anand Avati on 2014-12-26 04:46:04 EST ---
REVIEW: http://review.gluster.org/9351 (mgmt/glusterd: Add option to enable
lock trace) posted (#1) for review on master by Pranith Kumar Karampuri
(pkarampu at redhat.com)
--- Additional comment from Anand Avati on 2014-12-27 02:20:41 EST ---
COMMIT: http://review.gluster.org/9125 committed in master by Pranith Kumar
Karampuri (pkarampu at redhat.com)
------
commit 02b2172d9bc1557b3459388969077c75b659da82
Author: Pranith Kumar K <pkarampu at redhat.com>
Date: Fri Nov 14 14:23:31 2014 +0530
features/locks: Add lk-owner checks in entrylk
For backward compatibility of entry-self-heal we need
entrylks to be accepted by same lk-owner and same client.
This patch introduces these changes.
Change-Id: I67004cc5e657ba5ac09ceefbea823afdf06929e0
BUG: 1168189
Signed-off-by: Pranith Kumar K <pkarampu at redhat.com>
Reviewed-on: http://review.gluster.org/9125
Reviewed-by: Krutika Dhananjay <kdhananj at redhat.com>
Tested-by: Gluster Build System <jenkins at build.gluster.com>
--- Additional comment from Anand Avati on 2014-12-27 02:21:03 EST ---
COMMIT: http://review.gluster.org/9227 committed in master by Pranith Kumar
Karampuri (pkarampu at redhat.com)
------
commit 2947752836bd3ddbc572b59cecd24557050ec2a5
Author: Pranith Kumar K <pkarampu at redhat.com>
Date: Mon Nov 17 14:27:47 2014 +0530
cluster/afr: Make entry-self-heal in afr-v2 compatible with afr-v1
Problem:
entry self-heal in 3.6 and above, takes full lock on the directory only for
the duration of figuring out the xattrs of the directories where as 3.5 takes
locks through out the entry-self-heal. If the cluster is heterogeneous then
there is a chance that 3.6 self-heal is triggered and then 3.5 self-heal will
also triggered and both the self-heal daemons of 3.5 and 3.6 do self-heal.
Fix:
In 3.6.x and above get an entry lock on a very long name before entry
self-heal begins so that 3.5 entry self-heal will not get locks until 3.6.x
entry self-heal completes.
Change-Id: I71b6958dfe33056ed0a5a237e64e8506c3b0fccc
BUG: 1168189
Signed-off-by: Pranith Kumar K <pkarampu at redhat.com>
Reviewed-on: http://review.gluster.org/9227
Reviewed-by: Krutika Dhananjay <kdhananj at redhat.com>
Tested-by: Gluster Build System <jenkins at build.gluster.com>
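The way the two commits above fit together can be illustrated with a toy model of non-blocking WRITE entry locks on one directory in one lock domain. This is only a sketch under stated assumptions and not the locks translator code: it ignores lock types other than WRITE, blocking requests and multiple domains, and the names `Owner`, `EntryLockTable` and `LONG_NAME` are hypothetical. The rule that a request from the same client and lk-owner is accepted is one reading of the 9125 commit message; the length of the long name is a placeholder and not the value used by the patch.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Owner:
    client: str
    lk_owner: str


@dataclass
class EntryLockTable:
    """Toy table of WRITE entry locks held on one directory in one domain."""
    held: List[Tuple[Owner, Optional[str]]] = field(default_factory=list)

    def lock_nb(self, owner: Owner, basename: Optional[str]) -> str:
        """Non-blocking lock; basename=None means the whole directory."""
        for holder, held_name in self.held:
            if holder == owner:
                # Assumed reading of change 9125: the same client and
                # lk-owner is accepted even if the locks overlap.
                continue
            # A whole-directory lock overlaps every name; two name locks
            # overlap only when the names are equal.
            if basename is None or held_name is None or basename == held_name:
                return "TRYAGAIN"
        self.held.append((owner, basename))
        return "GRANTED"

    def unlock(self, owner: Owner, basename: Optional[str]) -> None:
        self.held.remove((owner, basename))


LONG_NAME = "a" * 200  # placeholder for the "very long name"; length assumed

v36 = Owner(client="3.6-shd", lk_owner="a41e327d2e7f0000")
v35 = Owner(client="3.5-shd", lk_owner="08b4e7739a7f0000")
table = EntryLockTable()

first = table.lock_nb(v36, LONG_NAME)  # 3.6 marks the heal as in progress
blocked = table.lock_nb(v35, None)     # 3.5 wants the whole directory
nested = table.lock_nb(v36, None)      # 3.6 takes its own short full lock
table.unlock(v36, None)
table.unlock(v36, LONG_NAME)           # 3.6 heal finished
retry = table.lock_nb(v35, None)       # 3.5 can proceed now

print(first, blocked, nested, retry)
```

In this model the long-name lock works as a marker: it is a name no real entry is expected to use, so it only gets in the way of the whole-directory lock that the 3.5 heal asks for, which matches the TRYAGAIN seen in the second test case above.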
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1168189
[Bug 1168189] entry self-heal in 3.5 and 3.6 are not compatible
https://bugzilla.redhat.com/show_bug.cgi?id=1177339
[Bug 1177339] entry self-heal in 3.5 and 3.6 are not compatible