[Bugs] [Bug 1190058] New: folder "trusted.ec.version" can't be healed after lookup

bugzilla at redhat.com
Fri Feb 6 08:21:54 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1190058

            Bug ID: 1190058
           Summary: folder "trusted.ec.version" can't be healed after
                    lookup
           Product: GlusterFS
           Version: mainline
         Component: disperse
          Severity: medium
          Assignee: bugs at gluster.org
          Reporter: download007 at sina.com
                CC: bugs at gluster.org, gluster-bugs at redhat.com



Description of problem:

On a disperse volume, the "trusted.ec.version" xattr sometimes differs between
bricks and is not healed after a lookup, which causes lookups to fail when a
brick server goes down.

Version-Release number of selected component (if applicable):
3.6.2

How reproducible:


Steps to Reproduce:
1. Create a disperse volume (2+1); a possible create command is sketched after
the volume info below.

[root@localhost ~]# gluster volume info

Volume Name: test
Type: Disperse
Volume ID: 24bcea9a-31b3-4333-a17d-776d27d89e8a
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.130.132:/data/brick1
Brick2: 192.168.130.132:/data/brick2
Brick3: 192.168.130.132:/data/brick3
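
For reference, the report does not show the create command; a 2+1 volume like
this can presumably be created along these lines ("force" is an assumption,
needed if the bricks sit on the root filesystem):

[root@localhost ~]# gluster volume create test disperse 3 redundancy 1 \
    192.168.130.132:/data/brick{1,2,3} force
[root@localhost ~]# gluster volume start test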

2. Change the trusted.ec.version of one brick manually:
[root@localhost ~]# setfattr -n trusted.ec.version -v 0x0000000000000003 /data/brick2/
In fact, "trusted.ec.version" gets changed to an invalid value when I kill a
brick server during testing, but I can't reproduce that every time, so here I
changed it manually.

[root@localhost ~]# getfattr -m . -d -e hex /data/brick{1,2,3}
getfattr: Removing leading '/' from absolute path names
# file: data/brick1
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x24bcea9a31b34333a17d776d27d89e8a
trusted.ec.version=0x000000000000000c
trusted.gfid=0x00000000000000000000000000000001

# file: data/brick2
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x24bcea9a31b34333a17d776d27d89e8a
trusted.ec.version=0x0000000000000003
trusted.gfid=0x00000000000000000000000000000001

# file: data/brick3
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x24bcea9a31b34333a17d776d27d89e8a
trusted.ec.version=0x000000000000000c
trusted.gfid=0x00000000000000000000000000000001


3. Mount the disperse volume at /home/mnt (a possible mount command is
sketched below).
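
The mount command itself is not in the report; presumably something like:

[root@localhost ~]# mkdir -p /home/mnt
[root@localhost ~]# mount -t glusterfs 192.168.130.132:/test /home/mnt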

4. ll /home/mnt, then check the xattrs on the bricks again:
[root@localhost ~]# getfattr -m . -d -e hex /data/brick{1,2,3}
getfattr: Removing leading '/' from absolute path names
# file: data/brick1
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x24bcea9a31b34333a17d776d27d89e8a
trusted.ec.version=0x000000000000000c
trusted.gfid=0x00000000000000000000000000000001

# file: data/brick2
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x24bcea9a31b34333a17d776d27d89e8a
trusted.ec.version=0x0000000000000003
trusted.gfid=0x00000000000000000000000000000001

# file: data/brick3
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x24bcea9a31b34333a17d776d27d89e8a
trusted.ec.version=0x000000000000000c
trusted.gfid=0x00000000000000000000000000000001

Actual results:
The trusted.ec.version of data/brick2 is still the invalid value
0x0000000000000003; the lookup did not heal it.

Expected results:
The trusted.ec.version of data/brick2 should have been healed to
0x000000000000000c.
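
Until the lookup-triggered heal works, the only workaround seems to be to
reset the xattr by hand on the bad brick, mirroring the setfattr from step 2
(a sketch, not a real fix):

[root@localhost ~]# setfattr -n trusted.ec.version -v 0x000000000000000c /data/brick2/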

Additional info:
If no brick is down, ll /home/mnt reports no error. But if the /data/brick3
brick server process is killed (one way to do this is sketched after the
output below), ll /home/mnt fails with an Input/output error like this:
[root@localhost ~]# gluster volume status
Status of volume: test
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick 192.168.130.132:/data/brick1                      49152   Y       6158
Brick 192.168.130.132:/data/brick2                      49153   Y       4031
Brick 192.168.130.132:/data/brick3                      N/A     N       N/A
NFS Server on localhost                                 2049    Y       10921

Task Status of Volume test
------------------------------------------------------------------------------
There are no active volume tasks

[root@localhost ~]# ll /home/mnt
ls: cannot access /home/mnt: Input/output error
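
For completeness, the brick server was taken down by killing its glusterfsd
process; one way to do that (the pgrep pattern is an assumption about the
brick's command line):

[root@localhost ~]# kill -9 $(pgrep -f 'glusterfsd.*brick3')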

I hope this can be fixed. Thank you.
