[Gluster-devel] Moratorium on new patch acceptance

Mon May 18 19:49:03 UTC 2015

On 05/18/2015 10:33 AM, Vijay Bellur wrote:
> On 05/16/2015 03:34 PM, Vijay Bellur wrote:
>
>>
>> I will send daily status updates from Monday (05/18) about this so that
>> we are clear about where we are and what needs to be done to remove this
>> moratorium. Appreciate your help in having a clean set of regression
>> tests going forward!
>>
>
> We have made some progress since Saturday. The problem with glupy.t has
> been fixed - thanks to Niels! All but the following tests have developers
> looking into them:
>
>      ./tests/basic/afr/entry-self-heal.t
>
>      ./tests/bugs/replicate/bug-976800.t
>
>      ./tests/bugs/replicate/bug-1015990.t
>
>      ./tests/bugs/quota/bug-1038598.t
>
>      ./tests/basic/ec/quota.t
>
>      ./tests/basic/quota-nfs.t
>
>      ./tests/bugs/glusterd/bug-974007.t

The etherpad did not call out ./tests/bugs/distribute/bug-1161156.t,
which did not have an owner, so I took a stab at it; the results are
below.

I also think the failure in ./tests/bugs/quota/bug-1038598.t has the same
cause as the observation below.

NOTE: Anyone with better knowledge of Quota, please chip in on what we
should expect in this case and how to correct the expectations of these
test cases.

(Details of ./tests/bugs/distribute/bug-1161156.t)
1) Failure is in TEST #20
    Failed line: TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=1k count=10240 conv=fdatasync

2) The above line is expected to fail (i.e. dd is expected to fail), as
the quota is set to 20MB and we are attempting to exceed it by another
5MB at this point in the test case.

3) The failure is easily reproducible on my laptop, in 2 out of 10 runs.

4) On debugging, I see that when the above dd succeeds (i.e. the test
fails, which means dd wrote more than the set quota), there are no write
errors from the bricks, nor any errors on the final COMMIT RPC call to
NFS.

As a result the expectation of this test fails.

NOTE: Sometimes there is a write failure from one of the bricks (the
above test uses AFR as well), but AFR self-heal kicks in and fixes the
problem, as expected, since the write succeeded on one of the replicas.
I add this observation because the logs of the failed regression run
show some EDQUOT errors reported in the client xlator, but only from one
of the client bricks, followed by AFR self-heal messages.

5) When the test case succeeds, the writes fail with EDQUOT as expected.
There are times when the quota is exceeded by, say, 1MB - 4.8MB, but the
test case still passes. This means that if we were to try to exceed the
quota by only 1MB (instead of the 5MB in the test case), this test case
might always fail.

6) Note on dd with conv=fdatasync
As one of the fixes attempts to overcome this issue by adding
"conv=fdatasync", I wanted to cover that behavior here.

What the above parameter does is send an NFS_COMMIT (which internally
becomes a flush FOP) once the blocks have been written to the NFS share.
This commit in turn triggers any pending writes for this file and sends
the flush to the brick. At times all of these succeed, and the test case
fails as a result.
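
The sizes in points 2) and 5) can be sanity-checked with a little shell
arithmetic. This is only a rough sketch: dd_bytes and expect_edquot are
hypothetical helper names, the 15MB "already used" figure is inferred from
"exceed it by another 5MB" rather than read from the test, and the 4.8MB
slack is simply the largest overshoot I observed.

```shell
# Bytes written by one dd run: block size in bytes times block count.
dd_bytes() {
    echo $(( $1 * $2 ))
}

# Usage: expect_edquot LIMIT_KB USED_KB WRITE_KB SLACK_KB
# Returns 0 when the write would end beyond the limit by more than the
# slack, i.e. when enforcement should reject it even with some overshoot.
expect_edquot() {
    [ $(( $2 + $3 )) -gt $(( $1 + $4 )) ]
}

# Figures from this mail, in KiB; USED_KB is an inferred assumption.
LIMIT_KB=20480
USED_KB=15360
SLACK_KB=4915
WRITE_KB=$(( $(dd_bytes 1024 10240) / 1024 ))   # bs=1k count=10240

if expect_edquot "$LIMIT_KB" "$USED_KB" "$WRITE_KB" "$SLACK_KB"; then
    echo "a write of ${WRITE_KB} KiB should hit EDQUOT"
else
    echo "a write of ${WRITE_KB} KiB may slip under enforcement"
fi
```

With these numbers the test's write lands only about 0.2MB past the
largest overshoot seen, which may be why the result is so timing
sensitive.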

NOTE: In the TC ./tests/bugs/quota/bug-1038598.t the failed line is in
pretty much the same context (LINE 26: TEST ! dd if=/dev/zero
of=$M0/test_dir/file1.txt bs=1024k count=15). The test expects the hard
limit to be exceeded, yet there are no write failures in the logs, where
EDQUOT (122) should be expected.
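
To put a number on how often either test fails, as in point 3), a loop
along these lines may help. It is a minimal sketch: count_failures is a
hypothetical helper, and the prove invocation in the comment assumes the
.t file is run from the top of the source tree on a machine set up for
the regression tests.

```shell
# Usage: count_failures RUNS COMMAND [ARGS...]
# Runs COMMAND RUNS times and prints how many runs exited non-zero.
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Output is discarded; only the exit status is counted.
        if ! "$@" >/dev/null 2>&1; then
            failures=$(( failures + 1 ))
        fi
        i=$(( i + 1 ))
    done
    echo "$failures"
}

# Example (assumed invocation, not taken from this thread):
#   count_failures 10 prove -vf ./tests/bugs/distribute/bug-1161156.t
```

Counting exit statuses rather than parsing output keeps the loop
independent of how the test harness formats its results.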

>
> Can submitters of these test cases or current feature owners pick these
> up and start looking into the failures please? Do update the spurious
> failures etherpad [1] once you pick up a particular test.
>
> Thanks,
> Vijay
>
> [1] https://public.pad.fsfe.org/p/gluster-spurious-failures