[Gluster-devel] brick multiplexing regression is broken

Ravishankar N ravishankar at redhat.com
Fri Oct 6 05:34:33 UTC 2017

The test is failing on master without any patches:

[root at tuxpad glusterfs]# prove tests/bugs/bug-1371806_1.t
tests/bugs/bug-1371806_1.t .. 7/9 setfattr: ./tmp1: No such file or directory
setfattr: ./tmp2: No such file or directory
setfattr: ./tmp3: No such file or directory
setfattr: ./tmp4: No such file or directory
setfattr: ./tmp5: No such file or directory
setfattr: ./tmp6: No such file or directory
setfattr: ./tmp7: No such file or directory
setfattr: ./tmp8: No such file or directory
setfattr: ./tmp9: No such file or directory
setfattr: ./tmp10: No such file or directory
./tmp1: user.foo: No such attribute
tests/bugs/bug-1371806_1.t .. Failed 2/9 subtests

Mount log for one of the directories:
[2017-10-06 05:32:10.059798] I [MSGID: 109005] 
[dht-selfheal.c:2458:dht_selfheal_directory] 0-patchy-dht: Directory 
selfheal failed: Unable to form layout for directory /tmp1
[2017-10-06 05:32:10.060013] E [MSGID: 109011] 
[dht-common.c:5011:dht_dir_common_setxattr] 0-patchy-dht: Failed to get 
mds subvol for path /tmp1gfid is 00000000-0000-0000-0000-000000000000
[2017-10-06 05:32:10.060041] W [fuse-bridge.c:1377:fuse_err_cbk] 
0-glusterfs-fuse: 99: SETXATTR() /tmp1 => -1 (No such file or directory)
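
For anyone triaging the same failure, here is a minimal Python sketch that re-joins mail-wrapped log lines like the ones above and splits each record into fields. The record layout is an assumption inferred only from the three records quoted here, and the names RECORD, join_wrapped and parse_records are made up for this example; none of this is part of GlusterFS.

```python
import re

# Assumed record layout, inferred from the quoted mount log only:
#   [timestamp] LEVEL [MSGID: n] [file:line:function] domain: message
# The MSGID part is optional (the fuse-bridge record has none).
RECORD = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} [\d:.]+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"(?:\[MSGID: (?P<msgid>\d+)\]\s+)?"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<domain>\S+):\s+(?P<msg>.*)",
    re.DOTALL,
)

# A new record is assumed to start with "[YYYY-MM-DD ".
RECORD_START = re.compile(r"\[\d{4}-\d{2}-\d{2} ")


def join_wrapped(lines):
    """Merge wrapped continuation lines back into one string per record."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def parse_records(lines):
    """Return one dict per record that matches the assumed layout."""
    parsed = []
    for record in join_wrapped(lines):
        match = RECORD.match(record)
        if match:
            parsed.append(match.groupdict())
    return parsed
```

Filtering the parsed records on level "E" or on the function name (for example dht_dir_common_setxattr) is then a one-line list comprehension.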

I request the patch authors to take a look at it.

On 10/05/2017 06:04 PM, Atin Mukherjee wrote:
> The following commit has broken the brick multiplexing regression job.
> tests/bugs/bug-1371806_1.t has failed a couple of times. One of the
> latest regression job reports is at
> https://build.gluster.org/job/regression-test-with-multiplex/406/console .
> commit 9b4de61a136b8e5ba7bf0e48690cdb1292d0dee8
> Author: Mohit Agrawal <moagrawa at redhat.com>
> Date:   Fri May 12 21:12:47 2017 +0530
>     cluster/dht : User xattrs are not healed after brick stop/start
>     Problem: In a distributed volume, a custom extended attribute on a
>              directory does not show the correct value after a brick is
>              stopped and started or a new brick is added. If an extended
>              (acl) attribute is set on a directory after a brick is
>              stopped or added, the attribute (user|acl|quota) value is
>              not updated on that brick after the brick is started.
>     Solution: First store the hashed subvol, or the subvol that has the
>               internal xattr, on the inode ctx and consider it the MDS
>               subvol. When updating a custom xattr (user, quota, acl,
>               selinux) on a directory, first check for the MDS in the
>               inode ctx. If the MDS is not present in the inode ctx,
>               throw an EINVAL error to the application; otherwise set the
>               xattr on the MDS subvol with an internal xattr value of -1
>               and then try to update the attribute on the other non-MDS
>               volumes as well. If the MDS subvol is down, throw the
>               error "Transport endpoint is not connected". In
>               dht_dir_lookup_cbk|dht_revalidate_cbk|dht_discover_complete,
>               call dht_call_dir_xattr_heal to heal the custom extended
>               attributes.
>               In the case of a gnfs server, if the hashed subvol is not
>               found based on loc, wind a call on all subvols to update
>               the xattr.
>     Fix:    1) Save MDS subvol on inode ctx
>             2) Check if mds subvol is present on inode ctx
>             3) If mds subvol is down then call unwind with error ENOTCONN,
>                and if it is up then set new xattr "GF_DHT_XATTR_MDS" to -1
>                and wind a call on other subvol.
>             4) If setxattr fop is successful on non-mds subvol then
>                increment the value of internal xattr by 1
>             5) At the time of directory_lookup check the value of new 
>             6) If value is not 0 in dht_lookup_dir_cbk (or other cbk)
>                functions then call heal function to heal user xattr
>             7) syncop_setxattr on hashed_subvol to reset the value of
>                xattr to 0 if heal is successful on all subvol.
>     Test : To reproduce the issue, follow the steps below
>            1) Create a distributed volume and create a mount point
>            2) Create some directories from the mount point: mkdir tmp{1..5}
>            3) Kill any one brick of the volume
>            4) Set an extended attribute on the directories from the mount point
>               setfattr -n user.foo -v "abc" ./tmp{1..5}
>               It will throw the error "Transport endpoint is not connected"
>               for the directories whose hashed subvol is down
>            5) Start the volume with the force option to start the brick process
>            6) Execute the getfattr command on the mount point for the directories
>            7) Check the extended attribute on the brick
>               getfattr -n user.foo <volume-location>/tmp{1..5}
>               It shows the correct value for the directories on which the
>               xattr fops were executed successfully.
>     Note: The patch resolves the xattr healing problem only for fuse
>           mounts, not for nfs mounts.
>     BUG: 1371806
>     Signed-off-by: Mohit Agrawal <moagrawa at redhat.com>
>     Change-Id: I4eb137eace24a8cb796712b742f1d177a65343d5
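
To make the "Fix" steps in the quoted commit message easier to follow, here is a toy Python model of the marker-xattr scheme as I read it. It is only a sketch under stated assumptions and not the actual DHT code: the Subvol and Directory classes are hypothetical, the marker is assumed to be incremented once when setxattr succeeds on every non-MDS subvol, and the heal is assumed to run only when all subvols are up.

```python
import errno

MDS_XATTR = "GF_DHT_XATTR_MDS"  # marker name taken from the commit message


class Subvol:
    """Toy stand-in for a DHT subvolume; not a GlusterFS type."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.xattrs = {}  # xattrs of one directory as stored on this subvol


class Directory:
    """Toy directory spread over several subvols."""

    def __init__(self, subvols, mds=None):
        self.subvols = subvols
        self.mds = mds  # step 1: MDS subvol remembered on the "inode ctx"
        if mds is not None:
            mds.xattrs[MDS_XATTR] = 0

    def setxattr(self, key, value):
        """Return 0 or a negative errno, following steps 2 to 4."""
        if self.mds is None:  # step 2: no MDS known for this inode
            return -errno.EINVAL
        if not self.mds.up:  # step 3: MDS subvol is down
            return -errno.ENOTCONN
        self.mds.xattrs[key] = value
        self.mds.xattrs[MDS_XATTR] = -1  # step 3: mark "heal pending"
        all_ok = True
        for subvol in self.subvols:
            if subvol is self.mds:
                continue
            if subvol.up:
                subvol.xattrs[key] = value
            else:
                all_ok = False
        if all_ok:  # step 4 (assumed: one increment when all succeeded)
            self.mds.xattrs[MDS_XATTR] += 1
        return 0

    def lookup(self):
        """Steps 5 to 7: heal from the MDS if the marker is non-zero.

        Returns True if a heal was performed.
        """
        if self.mds is None or not self.mds.up:
            return False
        if self.mds.xattrs[MDS_XATTR] == 0:  # steps 5 and 6: nothing pending
            return False
        if not all(subvol.up for subvol in self.subvols):
            return False  # cannot heal everywhere yet, keep the marker
        user = {k: v for k, v in self.mds.xattrs.items() if k != MDS_XATTR}
        for subvol in self.subvols:
            if subvol is not self.mds:
                subvol.xattrs.update(user)
        self.mds.xattrs[MDS_XATTR] = 0  # step 7: reset after a full heal
        return True


def demo():
    """Set an xattr while a non-MDS brick is down, then heal on lookup."""
    a, b = Subvol("brick-a"), Subvol("brick-b")
    d = Directory([a, b], mds=a)
    b.up = False
    rc = d.setxattr("user.foo", "abc")
    pending = a.xattrs[MDS_XATTR]
    b.up = True
    healed = d.lookup()
    return rc, pending, healed, b.xattrs.get("user.foo"), a.xattrs[MDS_XATTR]
```

In this model a Directory created without an MDS returns -EINVAL from setxattr, which corresponds to the "mds is not present on inode ctx" branch of the commit message; whether that is the branch the failing test hits is not something the sketch can tell.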
