[Gluster-devel] NetBSD's read-subvol-entry.t spurious failures explained

Ravishankar N ravishankar at redhat.com
Fri Mar 6 12:25:34 UTC 2015


On 03/06/2015 04:31 PM, Emmanuel Dreyfus wrote:
> Hi
>
> I tracked down the spurious failures of read-subvol-entry.t on NetBSD.
>
> Here is what should happen: we have a volume with brick0 and brick1.
> We disable self-heal, kill brick0, create a file in a directory,
> restart brick0, and we list directory content to check we find the file.
>
> The tested mechanism is that in brick1, trusted.afr.patchy-client-0
> accuses brick0 of being outdated, hence AFR should rule out brick0
> for listing directory content, and it should use brick1 which contains
> the file we look for.
>
> On NetBSD I can see that AFR never gets trusted.afr.patchy-client-0
> and always thinks brick0 is fine. AFR randomly picks brick0 or brick1
> to list directory content, and when it picks brick0 the test fails.
After bringing brick0 up and performing "ls abc/def", does
afr_do_readdir() get called for "def"?
If it does, then AFR will send a lookup to both bricks via
afr_inode_refresh(), and it will pick brick1 as the source.
Like I suggested earlier, we could put a print in afr_readdir_wind() and
see whether it indeed goes to brick0 when the test fails.
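
As a side check, the accusation can also be read directly from the brick
backend. The sketch below is only an illustration: afr_pending is a
hypothetical helper, not part of the test framework, and it assumes the
usual AFR changelog layout of three 32-bit big-endian counters (data,
metadata, entry) in a 12-byte xattr value.

```shell
# Hypothetical helper: decode an AFR changelog xattr value given in hex.
# Assumed layout: 12 bytes = data, metadata, entry pending counters,
# each a 32-bit big-endian integer.
afr_pending() {
    afr_hex=${1#0x}
    # Reject anything that is not exactly 24 hex digits.
    [ "${#afr_hex}" -eq 24 ] || { echo "expected 24 hex digits" >&2; return 1; }
    case $afr_hex in
        *[!0-9a-fA-F]*) echo "not a hex value" >&2; return 1 ;;
    esac
    afr_data=$(printf '%s\n' "$afr_hex" | cut -c1-8)
    afr_meta=$(printf '%s\n' "$afr_hex" | cut -c9-16)
    afr_entry=$(printf '%s\n' "$afr_hex" | cut -c17-24)
    echo "data=$((0x$afr_data)) metadata=$((0x$afr_meta)) entry=$((0x$afr_entry))"
}
```

The input would be the hex value printed by something like
"getfattr -n trusted.afr.patchy-client-0 -e hex $B0/brick1/abc/def" (the
brick1 path is assumed from the test layout). A non-zero entry counter
there means brick1 blames brick0 for "def".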
> The reason why trusted.afr.patchy-client-0 is not there is that the
> node is cached in kernel FUSE from an earlier lookup. The TTL obtained
> at that time tells the kernel this node is still valid, hence the
> kernel does not send the new lookup to GlusterFS. Since GlusterFS uses
> lookups to refresh the client's view of xattrs, it sticks with the older
> value where brick0 was not yet outdated, and trusted.afr.patchy-client-0 is
> unset.
If readdir comes on def, then it is AFR that initiates the lookup. So no 
fuse caching should be involved.
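
For reference, the TTL behaviour described in the quoted text boils down
to a check like the one below. This is a toy model in shell, not kernel
or GlusterFS code; entry_needs_lookup and its arguments are made up for
illustration.

```shell
# Toy model of a cached directory entry with a TTL (all names hypothetical).
# Args: time the entry was cached, TTL, current time (all in seconds).
# Returns 0 (true) when the entry has expired and a new lookup is due.
entry_needs_lookup() {
    cached_at=$1
    ttl=$2
    now=$3
    [ $(( now - cached_at )) -ge "$ttl" ]
}
```

With a TTL of 0 the check always asks for a fresh lookup, whereas a
non-zero TTL lets an entry cached before brick0 was killed stay valid
across the brick restart.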

>
> Questions:
>
> 1) Is NetBSD behavior wrong here? It got a TTL for a node, I understand
> it should not send lookups to the filesystem until the TTL is expired.
>
> 2) How to fix it? If NetBSD behavior is correct, then I guess the test
> only succeeds on Linux by chance and we only need to fix the test.
> The change below flushes the kernel cache before looking for the file:
>
> --- a/tests/basic/afr/read-subvol-entry.t
> +++ b/tests/basic/afr/read-subvol-entry.t
> @@ -26,6 +26,7 @@ TEST kill_brick $V0 $H0 $B0/brick0
>   
>   TEST touch $M0/abc/def/ghi
>   TEST $CLI volume start $V0 force
> +( cd $M0 && umount $M0 )
>   EXPECT_WITHIN $PROCESS_UP_TIMEOUT "ghi" echo `ls $M0/abc/def/`
>   
>   #Cleanup
>
>
>


