[Gluster-devel] brick multiplexing regression is broken

Raghavendra Gowdappa rgowdapp at redhat.com
Fri Oct 6 04:45:48 UTC 2017


The test that is failing was also introduced by the same patch; it is supposed to validate the functionality the patch adds. The history of earlier patchsets of the same patch shows the same test failing there too, albeit inconsistently (though the merged version did pass the CentOS regression). So it looks like the patch is not working as intended under some race conditions. The frequent earlier failures should have served as an alert, but I failed to notice them. Sorry about that.

----- Original Message -----
> From: "Atin Mukherjee" <amukherj at redhat.com>
> To: "Mohit Agrawal" <moagrawa at redhat.com>, "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> Cc: "Gluster Devel" <gluster-devel at gluster.org>
> Sent: Thursday, October 5, 2017 6:04:57 PM
> Subject: brick multiplexing regression is broken
> 
> The following commit has broken the brick multiplexing regression job.
> tests/bugs/bug-1371806_1.t has failed a couple of times.  One of the latest
> regression job reports is at
> https://build.gluster.org/job/regression-test-with-multiplex/406/console .
> 
> 
> commit 9b4de61a136b8e5ba7bf0e48690cdb1292d0dee8
> Author: Mohit Agrawal <moagrawa at redhat.com>
> Date:   Fri May 12 21:12:47 2017 +0530
> 
>     cluster/dht : User xattrs are not healed after brick stop/start
> 
>     Problem: In a distributed volume, the custom extended attribute
>              value set on a directory does not show the correct value
>              after a brick stop/start or after a brick is newly added.
>              If any extended (acl) attribute value is set on a directory
>              after the brick was stopped/added, the attribute
>              (user|acl|quota) value is not updated on that brick after
>              the brick is started.
> 
>     Solution: First store the hashed subvol (or the subvol that holds
>               the internal xattr) in the inode ctx and treat it as the
>               MDS subvol. When updating a custom xattr (user, quota,
>               acl, selinux) on a directory, first check the MDS from
>               the inode ctx; if no MDS is present in the inode ctx,
>               return an EINVAL error to the application, otherwise set
>               the xattr on the MDS subvol with an internal xattr value
>               of -1 and then try to update the attribute on the other,
>               non-MDS subvols as well. If the MDS subvol is down, return
>               the error "Transport endpoint is not connected". In
>               dht_dir_lookup_cbk|dht_revalidate_cbk|dht_discover_complete,
>               call dht_call_dir_xattr_heal to heal the custom extended
>               attributes. In the case of the gnfs server, if the hashed
>               subvol is not found based on the loc, wind the call on all
>               subvols to update the xattr.
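> 
>     A minimal, self-contained sketch of this setxattr flow (the types
>     and names below are illustrative toys, not the patch's actual
>     code):
> 
>         #include <errno.h>
>         #include <stddef.h>
> 
>         typedef struct { int up; int user_xattr; } subvol_t;
> 
>         typedef struct {
>             subvol_t *mds;       /* MDS subvol cached in the inode ctx */
>             int       mds_count; /* internal GF_DHT_XATTR_MDS counter  */
>         } dir_inode_t;
> 
>         /* Returns 0 on success, negative errno on failure. */
>         int dir_setxattr(dir_inode_t *dir, subvol_t **subvols, int n,
>                          int value)
>         {
>             int i, all_ok = 1;
> 
>             if (dir->mds == NULL)
>                 return -EINVAL;   /* no MDS known in the inode ctx */
>             if (!dir->mds->up)
>                 return -ENOTCONN; /* "Transport endpoint is not
>                                    * connected" */
> 
>             /* Mark the update in flight: internal xattr = -1 on the
>              * MDS, then apply the user xattr on the MDS itself. */
>             dir->mds_count = -1;
>             dir->mds->user_xattr = value;
> 
>             /* Wind the update to the non-MDS subvols. */
>             for (i = 0; i < n; i++) {
>                 if (subvols[i] == dir->mds)
>                     continue;
>                 if (subvols[i]->up)
>                     subvols[i]->user_xattr = value;
>                 else
>                     all_ok = 0;
>             }
> 
>             /* Only if every subvol took the update does the counter
>              * return to 0; otherwise it stays -1 and a later lookup
>              * triggers the heal. */
>             if (all_ok)
>                 dir->mds_count += 1;
>             return 0;
>         }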
> 
>     Fix:    1) Save the MDS subvol in the inode ctx.
>             2) Check whether an MDS subvol is present in the inode ctx.
>             3) If the MDS subvol is down, unwind with the error ENOTCONN;
>                if it is up, set the new xattr "GF_DHT_XATTR_MDS" to -1
>                and wind the call on the other subvols.
>             4) If the setxattr fop is successful on the non-MDS subvols,
>                increment the value of the internal xattr by +1.
>             5) At directory-lookup time, check the value of the new
>                xattr GF_DHT_XATTR_MDS.
>             6) If the value is not 0 in dht_lookup_dir_cbk (and the
>                other cbk functions), call the heal function to heal the
>                user xattrs (steps 5-7 are sketched after this list).
>             7) syncop_setxattr on the hashed_subvol to reset the xattr
>                value to 0 once the heal is successful on all subvols.
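> 
>     Continuing the toy sketch above, the lookup-side heal of steps
>     5)-7) could look like the following (dir_lookup_heal stands in
>     for dht_call_dir_xattr_heal; again illustrative, not the patch
>     code):
> 
>         /* Heal user xattrs if the internal counter is non-zero. */
>         int dir_lookup_heal(dir_inode_t *dir, subvol_t **subvols, int n)
>         {
>             int i, all_ok = 1;
> 
>             /* Counter 0 means every subvol already took the update;
>              * without an MDS that is up, the heal cannot run. */
>             if (dir->mds_count == 0 || dir->mds == NULL || !dir->mds->up)
>                 return 0;
> 
>             /* Replay the MDS copy onto the other subvols. */
>             for (i = 0; i < n; i++) {
>                 if (subvols[i] == dir->mds)
>                     continue;
>                 if (subvols[i]->up)
>                     subvols[i]->user_xattr = dir->mds->user_xattr;
>                 else
>                     all_ok = 0;
>             }
> 
>             /* Reset the counter only if every subvol healed, mirroring
>              * the syncop_setxattr reset in step 7). */
>             if (all_ok)
>                 dir->mds_count = 0;
>             return 0;
>         }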
> 
>     Test : To reproduce the issue, follow the steps below:
>            1) Create a distributed volume and a mount point.
>            2) Create some directories from the mount point:
>               mkdir tmp{1..5}
>            3) Kill any one brick of the volume.
>            4) Set an extended attribute on the directories from the
>               mount point:
>               setfattr -n user.foo -v "abc" ./tmp{1..5}
>               It throws the error "Transport endpoint is not connected"
>               for those directories whose hashed subvol is down.
>            5) Start the volume with the force option to restart the
>               brick process.
>            6) Execute the getfattr command on the mount point for the
>               directories.
>            7) Check the extended attribute on the brick:
>               getfattr -n user.foo <volume-location>/tmp{1..5}
>               It shows the correct value for those directories on which
>               the xattr fop was executed successfully.
> 
>     Note: The patch resolves the xattr healing problem only for fuse
>           mounts, not for nfs mounts.
> 
>     BUG: 1371806
>     Signed-off-by: Mohit Agrawal <moagrawa at redhat.com>
> 
>     Change-Id: I4eb137eace24a8cb796712b742f1d177a65343d5
> 

