[Bugs] [Bug 1402482] New: The directories get renamed when data bricks are offline in 4*(2+1) volume
bugzilla at redhat.com
Wed Dec 7 16:07:40 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1402482
Bug ID: 1402482
Summary: The directories get renamed when data bricks are
offline in 4*(2+1) volume
Product: GlusterFS
Version: 3.9
Component: replicate
Keywords: Triaged
Severity: high
Assignee: bugs at gluster.org
Reporter: pkarampu at redhat.com
CC: bugs at gluster.org, ksandha at redhat.com,
ravishankar at redhat.com
Depends On: 1369077
Blocks: 1393694
+++ This bug was initially created as a clone of Bug #1369077 +++
Description of problem:
Killed the data bricks that held the directory and its data, then renamed the
directory from the mount point. The rename was successful even though it
should have failed.
Note: see the steps to reproduce below for details.
Version-Release number of selected component (if applicable):
gluster --version
glusterfs 3.8.2 built on Aug 10 2016 15:34:37
How reproducible:
3/3
[root at dhcp43-223 new]# gluster vol info
Volume Name: arbiter
Type: Distributed-Replicate
Volume ID: 70c7113e-2223-4cd2-acfd-b08b1c376ea4
Status: Started
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.43.223:/bricks/brick0/abc
Brick2: 10.70.42.58:/bricks/brick0/abc
Brick3: 10.70.43.142:/bricks/brick0/abc (arbiter)
Brick4: 10.70.43.223:/bricks/brick1/abc
Brick5: 10.70.42.58:/bricks/brick1/abc
Brick6: 10.70.43.142:/bricks/brick1/abc (arbiter)
Brick7: 10.70.43.223:/bricks/brick2/abc
Brick8: 10.70.42.58:/bricks/brick2/abc
Brick9: 10.70.43.142:/bricks/brick2/abc (arbiter)
Brick10: 10.70.43.223:/bricks/brick3/abc
Brick11: 10.70.42.58:/bricks/brick3/abc
Brick12: 10.70.43.142:/bricks/brick3/abc (arbiter)
Options Reconfigured:
client.event-threads: 4
server.event-threads: 4
cluster.lookup-optimize: on
transport.address-family: inet
performance.readdir-ahead: on
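For reference, a volume with the layout above can be created and mounted
roughly as follows (a sketch based on the brick list above; verify the exact
CLI syntax against the installed GlusterFS version):
# gluster volume create arbiter replica 3 arbiter 1 \
    10.70.43.223:/bricks/brick0/abc 10.70.42.58:/bricks/brick0/abc 10.70.43.142:/bricks/brick0/abc \
    10.70.43.223:/bricks/brick1/abc 10.70.42.58:/bricks/brick1/abc 10.70.43.142:/bricks/brick1/abc \
    10.70.43.223:/bricks/brick2/abc 10.70.42.58:/bricks/brick2/abc 10.70.43.142:/bricks/brick2/abc \
    10.70.43.223:/bricks/brick3/abc 10.70.42.58:/bricks/brick3/abc 10.70.43.142:/bricks/brick3/abc
# gluster volume start arbiter
# mount -t glusterfs 10.70.43.223:/arbiter /mnt/arbiter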
Steps to Reproduce:
1. Create a 4 x (2 + 1) arbiter volume and mount it using FUSE (volume name:
arbiter, as in the vol info output above).
2. On the mount point, create a directory "dir1" and a file "abc" inside it.
3. Write 100M to the file using dd:
dd if=/dev/urandom of=abc bs=1M count=100
4. Kill the data bricks on which the data (the file "abc") is present; in
this case brick10 and brick11 were the data bricks and brick12 was the
arbiter brick.
5. Leave all the other bricks online.
6. Rename the directory from dir1 to dir2 on the mount point using "mv dir1
dir2" (a shell sketch of these steps follows).
Actual results:
The directory got renamed despite the file system reporting read-only
(client quorum not met):
# mv dir1 dir2
mv: cannot move ‘dir1’ to ‘dir2’: Read-only file system
# ls
dir2
Expected results:
The directory should not be renamed; since the mv fails with "Read-only file
system", "dir1" should remain intact.
Additional info:
Tried the same on a plain distribute volume and on a plain 1 x 3 replicate
volume; the issue was not reproducible there.
Reproduced the same issue on a 2 x (2 + 1) volume and observed the following
after renaming the directory:
[root at dhcp43-165 super]# mv new one
mv: cannot move ‘new’ to ‘one’: Read-only file system
[root at dhcp43-165 super]#
[root at dhcp43-165 super]# ls
ls: cannot access new: No such file or directory
new one
Two directories are now listed; the old one ("new") is no longer accessible.
--- Additional comment from Ravishankar N on 2016-08-24 10:02:28 EDT ---
Changing the component to replicate as it occurs on distribute-replicate
also (Karan, feel free to correct me if I am wrong). Also assigning it to
Pranith as he said he'd work on the fix.
Relevant technical discussions on IRC:
<itisravi> pranithk1: are you free to talk about the bug Karan raised?
<itisravi> its a day one issue IMO and not specific to afr.
<itisravi> s/afr/arbiter
<pranithk1> itisravi: He said the bug is not recreatable in 3-way
replication?
<itisravi> pranithk1: It is..I've requested him to check again.
<itisravi> pranithk1: so if mkdir fails on one replica subvol due to
quorum not met etc , dht has no roll back
<itisravi> thats the issue.
<pranithk1> itisravi: Does it happen on plain replicate?
<itisravi> pranithk1: no
<itisravi> pranithk1: its dht renamedir thing..
<pranithk1> itisravi: okay, assign the bug to DHT giving the reason
<itisravi> pranithk1: nithya was saying if afr_inodelk can also have
quorum checks, then renamedir will not happen
<itisravi> so we will be good.
<itisravi> instead of partially creating it on the up subvols of DHT
<pranithk1> itisravi: That is not a bad idea, send out a patch. Please
tell her it only prevents the odds, won't fix the problem completely
<itisravi> pranithk1: we can do it for afr_entrylk also then no?
<pranithk1> itisravi: Actually the inodelk/finodelk needs to be reworked.
I will send the patch
<pranithk1> itisravi: yeah, that too
<itisravi> pranithk1: I see , okay.
--- Additional comment from Niels de Vos on 2016-09-12 01:39:42 EDT ---
All 3.8.x bugs are now reported against version 3.8 (without .x). For more
information, see
http://www.gluster.org/pipermail/gluster-devel/2016-September/050859.html
--- Additional comment from Worker Ant on 2016-11-08 07:21:51 EST ---
REVIEW: http://review.gluster.org/15802 (cluster/afr: Fix bugs in
[f]inodelk/[f]entrylk) posted (#1) for review on master by Pranith Kumar
Karampuri (pkarampu at redhat.com)
--- Additional comment from Worker Ant on 2016-11-25 05:14:30 EST ---
REVIEW: http://review.gluster.org/15802 (cluster/afr: Fix bugs in
[f]inodelk/[f]entrylk) posted (#2) for review on master by Pranith Kumar
Karampuri (pkarampu at redhat.com)
--- Additional comment from Worker Ant on 2016-11-26 10:35:02 EST ---
COMMIT: http://review.gluster.org/15802 committed in master by Pranith Kumar
Karampuri (pkarampu at redhat.com)
------
commit 6be7bd936eb30aa8d2b908061f60e1534e797657
Author: Pranith Kumar K <pkarampu at redhat.com>
Date: Mon Nov 7 14:47:34 2016 +0530
cluster/afr: Fix bugs in [f]inodelk/[f]entrylk
Problems:
1) Inodelk is not taking quorum into account
2) finodelk, [f]entrylk are not implemented correctly
3) By default afr doesn't go for non-blocking parallel locks.
Fix:
Implemented a common framework which can be used by
[f]inodelk/[f]entrylk. Used quorum for the same.
Change-Id: I239f13875a065298630d266941df10cfa3addc85
BUG: 1369077
Signed-off-by: Pranith Kumar K <pkarampu at redhat.com>
Reviewed-on: http://review.gluster.org/15802
Tested-by: Krutika Dhananjay <kdhananj at redhat.com>
Reviewed-by: Krutika Dhananjay <kdhananj at redhat.com>
Smoke: Gluster Build System <jenkins at build.gluster.org>
Reviewed-by: Ravishankar N <ravishankar at redhat.com>
CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
--- Additional comment from Worker Ant on 2016-12-01 01:24:01 EST ---
REVIEW: http://review.gluster.org/15984 (cluster/afr: Serialize conflicting
locks on all subvols) posted (#1) for review on master by Pranith Kumar
Karampuri (pkarampu at redhat.com)
--- Additional comment from Worker Ant on 2016-12-01 03:38:11 EST ---
REVIEW: http://review.gluster.org/15984 (cluster/afr: Serialize conflicting
locks on all subvols) posted (#2) for review on master by Pranith Kumar
Karampuri (pkarampu at redhat.com)
--- Additional comment from Worker Ant on 2016-12-01 03:46:21 EST ---
REVIEW: http://review.gluster.org/15984 (cluster/afr: Serialize conflicting
locks on all subvols) posted (#3) for review on master by Pranith Kumar
Karampuri (pkarampu at redhat.com)
--- Additional comment from Worker Ant on 2016-12-01 06:48:43 EST ---
REVIEW: http://review.gluster.org/15984 (cluster/afr: Serialize conflicting
locks on all subvols) posted (#4) for review on master by Pranith Kumar
Karampuri (pkarampu at redhat.com)
--- Additional comment from Worker Ant on 2016-12-06 07:42:04 EST ---
REVIEW: http://review.gluster.org/15984 (cluster/afr: Serialize conflicting
locks on all subvols) posted (#5) for review on master by Pranith Kumar
Karampuri (pkarampu at redhat.com)
--- Additional comment from Worker Ant on 2016-12-06 07:42:08 EST ---
REVIEW: http://review.gluster.org/16044 (tests: test parallel rmdirs to be
successful) posted (#1) for review on master by Pranith Kumar Karampuri
(pkarampu at redhat.com)
--- Additional comment from Worker Ant on 2016-12-07 01:47:45 EST ---
COMMIT: http://review.gluster.org/15984 committed in master by Pranith Kumar
Karampuri (pkarampu at redhat.com)
------
commit a7d7ed90c9272a42168a91f92754d3a4be605da5
Author: Pranith Kumar K <pkarampu at redhat.com>
Date: Thu Dec 1 09:42:19 2016 +0530
cluster/afr: Serialize conflicting locks on all subvols
Problem:
1) When a blocking lock is issued and the parallel lock phase fails
on all subvolumes with EAGAIN, it is not switching to serialized
locking phase.
2) When quorum is enabled and locks fail partially it is better
to give errno returned by brick rather than the default
quorum errno.
Fix:
Handled this error case and changed op_errno to reflect the actual
errno in case of quorum error.
BUG: 1369077
Change-Id: Ifac2e4a13686e9fde601873012700966d56a7f31
Signed-off-by: Pranith Kumar K <pkarampu at redhat.com>
Reviewed-on: http://review.gluster.org/15984
Smoke: Gluster Build System <jenkins at build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
Reviewed-by: Ravishankar N <ravishankar at redhat.com>
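Taken together, the two patches give the lock phase of such entry operations
roughly this shape: wind non-blocking locks to all subvolumes (in parallel in
the real code), fall back to a serialized blocking phase when every attempt
returns EAGAIN, and finally enforce quorum, propagating a brick's actual
errno instead of a generic quorum error. A simplified, self-contained sketch
of that flow (the function names and stubs are illustrative, not the real
afr symbols):

#include <errno.h>

/* Illustrative stand-ins for the real afr lock calls (hypothetical). */
static int try_nonblocking_lock(int subvol) { (void)subvol; return -EAGAIN; }
static int take_blocking_lock(int subvol)   { (void)subvol; return 0; }
static void unlock_subvol(int subvol)       { (void)subvol; }

/*
 * Acquire entry locks on all subvolumes:
 *   phase 1: non-blocking attempts on every subvolume;
 *   phase 2: if every attempt failed with EAGAIN, retry serially with
 *            blocking locks so conflicting lockers queue in a fixed
 *            order instead of repeatedly failing against each other;
 *   phase 3: enforce quorum, returning the errno a brick actually
 *            reported rather than a generic quorum error.
 */
static int
acquire_entry_locks(int child_count, int quorum_count)
{
    int granted = 0, eagain = 0, saved_errno = 0;

    for (int i = 0; i < child_count; i++) {            /* phase 1 */
        int ret = try_nonblocking_lock(i);
        if (ret == 0)
            granted++;
        else if (ret == -EAGAIN)
            eagain++;
        else
            saved_errno = -ret;
    }

    if (granted == 0 && eagain == child_count) {       /* phase 2 */
        for (int i = 0; i < child_count; i++) {
            int ret = take_blocking_lock(i);
            if (ret == 0)
                granted++;
            else
                saved_errno = -ret;
        }
    }

    if (granted < quorum_count) {                      /* phase 3 */
        /* Release whatever was granted (unlocking is a no-op here). */
        for (int i = 0; i < child_count; i++)
            unlock_subvol(i);
        return -(saved_errno ? saved_errno : EROFS);
    }
    return 0;
}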
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1369077
[Bug 1369077] The directories get renamed when data bricks are offline in
4*(2+1) volume
https://bugzilla.redhat.com/show_bug.cgi?id=1393694
[Bug 1393694] The directories get renamed when data bricks are offline in
4*(2+1) volume