[Bugs] [Bug 1393694] New: The directories get renamed when data bricks are offline in 4*(2+1) volume
bugzilla at redhat.com
Thu Nov 10 07:50:11 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1393694
Bug ID: 1393694
Summary: The directories get renamed when data bricks are
offline in 4*(2+1) volume
Product: Red Hat Gluster Storage
Version: 3.2
Component: replicate
Keywords: Triaged
Severity: high
Assignee: pkarampu at redhat.com
Reporter: ksandha at redhat.com
QA Contact: nchilaka at redhat.com
CC: bugs at gluster.org, pkarampu at redhat.com,
ravishankar at redhat.com, rhs-bugs at redhat.com,
storage-qa-internal at redhat.com
Depends On: 1369077
+++ This bug was initially created as a clone of Bug #1369077 +++
Description of problem:
Killed the data bricks that held the directory and its data, then renamed the
directory from the mount point; the rename was successful even though the
volume reported a read-only error.
Note: see "Steps to Reproduce" and "Additional info" below for details.
Version-Release number of selected component (if applicable):
gluster --version
glusterfs 3.8.2 built on Aug 10 2016 15:34:37
How reproducible:
3/3
[root at dhcp43-223 new]# gluster vol info
Volume Name: arbiter
Type: Distributed-Replicate
Volume ID: 70c7113e-2223-4cd2-acfd-b08b1c376ea4
Status: Started
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.43.223:/bricks/brick0/abc
Brick2: 10.70.42.58:/bricks/brick0/abc
Brick3: 10.70.43.142:/bricks/brick0/abc (arbiter)
Brick4: 10.70.43.223:/bricks/brick1/abc
Brick5: 10.70.42.58:/bricks/brick1/abc
Brick6: 10.70.43.142:/bricks/brick1/abc (arbiter)
Brick7: 10.70.43.223:/bricks/brick2/abc
Brick8: 10.70.42.58:/bricks/brick2/abc
Brick9: 10.70.43.142:/bricks/brick2/abc (arbiter)
Brick10: 10.70.43.223:/bricks/brick3/abc
Brick11: 10.70.42.58:/bricks/brick3/abc
Brick12: 10.70.43.142:/bricks/brick3/abc (arbiter)
Options Reconfigured:
client.event-threads: 4
server.event-threads: 4
cluster.lookup-optimize: on
transport.address-family: inet
performance.readdir-ahead: on
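For reference, a layout like the one above can be created roughly as follows.
The exact commands used for this setup are not recorded in the report, so this
is only a sketch reconstructed from the brick list and reconfigured options:

# Sketch: recreate the 4 x (2 + 1) arbiter layout shown above.
# Every third brick in the list becomes the arbiter of its replica set.
gluster volume create arbiter replica 3 arbiter 1 \
  10.70.43.223:/bricks/brick0/abc 10.70.42.58:/bricks/brick0/abc 10.70.43.142:/bricks/brick0/abc \
  10.70.43.223:/bricks/brick1/abc 10.70.42.58:/bricks/brick1/abc 10.70.43.142:/bricks/brick1/abc \
  10.70.43.223:/bricks/brick2/abc 10.70.42.58:/bricks/brick2/abc 10.70.43.142:/bricks/brick2/abc \
  10.70.43.223:/bricks/brick3/abc 10.70.42.58:/bricks/brick3/abc 10.70.43.142:/bricks/brick3/abc
gluster volume set arbiter client.event-threads 4
gluster volume set arbiter server.event-threads 4
gluster volume set arbiter cluster.lookup-optimize on
gluster volume start arbiter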
Steps to Reproduce:
1. Create a 4 x (2 + 1) arbiter volume (volume name "arbiter") and mount it
using FUSE.
2. On the mount point create a directory "dir1" and create a file "abc"
inside it.
3. Write 100M to the file using dd:
dd if=/dev/urandom of=abc bs=1M count=100
4. Now kill the data bricks of the replica set that holds the "abc" file.
In my case brick10 and brick11 were the data bricks and brick12 was the
arbiter brick.
5. All the other bricks stayed online.
6. Now rename the directory from the mount point using "mv dir1 dir2"
(a scripted sketch of these steps follows below).
Actual results:
The directory got renamed in spite of the rename failing with a read-only
file system error (client quorum not met on the affected subvolume):
# mv dir1 dir2
mv: cannot move ‘dir1’ to ‘dir2’: Read-only file system
# ls
dir2
Expected results:
The directory shouldn't be renamed.
Additional info:
Tried the same on a plain distribute volume and a plain 1 x 3 replicate
volume; the issue was not reproducible there.
Reproduced the same issue on a 2 x (2 + 1) volume and observed that after the
rename the mount shows two directory entries:
[root at dhcp43-165 super]# mv new one
mv: cannot move ‘new’ to ‘one’: Read-only file system
[root at dhcp43-165 super]#
[root at dhcp43-165 super]# ls
ls: cannot access new: No such file or directory
new one
Two directory entries are listed, but "new" is no longer accessible.
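To see where the partial rename landed, the directory can also be listed
directly on each brick backend. A sketch against the 4 x (2 + 1) layout above
(ssh access to the brick nodes is assumed, and the exact output depends on
which replica set was below quorum):

# Subvolumes where the rename went through show the new name; the set whose
# data bricks were down still carries the old one, so the mount lists both.
for h in 10.70.43.223 10.70.42.58 10.70.43.142; do
  for b in brick0 brick1 brick2 brick3; do
    echo "== $h:/bricks/$b/abc =="
    ssh "$h" "ls /bricks/$b/abc"
  done
done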
--- Additional comment from Karan Sandha on 2016-08-22 08:47 EDT ---
--- Additional comment from Karan Sandha on 2016-08-22 08:48 EDT ---
--- Additional comment from Karan Sandha on 2016-08-22 08:49 EDT ---
--- Additional comment from Karan Sandha on 2016-08-22 08:53 EDT ---
--- Additional comment from Ravishankar N on 2016-08-24 10:02:28 EDT ---
Changing the component to replicate as it occurs on distribute replicate also.
(Karan, feel free to correct me if I am wrong). Also assigning it to Pranith as
he said he'd work on the fix:
Relevant technical discussions on IRC:
<itisravi> pranithk1: are you free to talk about the bug Karan raised?
<itisravi> its a day one issue IMO and not specific to afr.
<itisravi> s/afr/arbiter
<pranithk1> itisravi: He said the bug is not recreatable in 3-way
replication?
<itisravi> pranithk1: It is..I've requested him to check again.
<itisravi> pranithk1: so if mkdir fails on one replica subvol due to
quorum not met etc , dht has no roll back
<itisravi> thats the issue.
<pranithk1> itisravi: Does it happen on plain replicate?
<itisravi> pranithk1: no
<itisravi> pranithk1: its dht renamedir thing..
<pranithk1> itisravi: okay, assign the bug to DHT giving the reason
<itisravi> pranithk1: nithya was saying if afr_inodelk can also have
quorum checks, then renamedir will not happen
<itisravi> so we will be good.
<itisravi> instead of partially creating it on the up subvols of DHT
<pranithk1> itisravi: That is not a bad idea, send out a patch. Please
tell her it only prevents the odds, won't fix the problem completely
<itisravi> pranithk1: we can do it for afr_entrylk also then no?
<pranithk1> itisravi: Actually the inodelk/finodelk needs to be reworked.
I will send the patch
<pranithk1> itisravi: yeah, that too
<itisravi> pranithk1: I see , okay.
--- Additional comment from Niels de Vos on 2016-09-12 01:39:42 EDT ---
All 3.8.x bugs are now reported against version 3.8 (without .x). For more
information, see
http://www.gluster.org/pipermail/gluster-devel/2016-September/050859.html
--- Additional comment from Worker Ant on 2016-11-08 07:21:51 EST ---
REVIEW: http://review.gluster.org/15802 (cluster/afr: Fix bugs in
[f]inodelk/[f]entrylk) posted (#1) for review on master by Pranith Kumar
Karampuri (pkarampu at redhat.com)
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1369077
[Bug 1369077] The directories get renamed when data bricks are offline in
4*(2+1) volume