[Gluster-devel] Moratorium on new patch acceptance

Shyam srangana at redhat.com
Tue May 19 00:43:06 UTC 2015


On 05/18/2015 07:05 PM, Shyam wrote:
> On 05/18/2015 03:49 PM, Shyam wrote:
>> On 05/18/2015 10:33 AM, Vijay Bellur wrote:
>>
>> The etherpad did not call out ./tests/bugs/distribute/bug-1161156.t,
>> which did not have an owner, so I took a stab at it; the results are
>> below.
>>
>> I also think the failure in ./tests/bugs/quota/bug-1038598.t has the
>> same cause as the one described below.
>>
>> NOTE: Anyone with better knowledge of Quota, please chip in on what we
>> should expect in this case and how to correct the expectations of
>> these test cases.
>>
>> (Details of ./tests/bugs/distribute/bug-1161156.t)
>> 1) Failure is in TEST #20
>>     Failed line: TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=1k count=10240 conv=fdatasync
>>
>> 2) The dd in the above line is expected to fail, as the quota is set to
>> 20MB and we are attempting to exceed it by 5MB at this point in the
>> test case.
>>
>> 3) The failure is easily reproducible on my laptop, 2 out of 10 times.
>>
>> 4) On debugging, I see that when the above dd succeeds (i.e., the test
>> fails, because dd wrote more than the set quota), there are no write
>> errors from the bricks, nor any errors on the final COMMIT RPC call to
>> NFS.
>>
>> As a result the expectation of this test fails.
>>
>> NOTE: Sometimes there is a write failure from one of the bricks (the
>> above test uses AFR as well), but AFR self-heal kicks in and fixes
>> the problem, as expected, since the write succeeded on one of the
>> replicas. I add this observation because the logs of the failed
>> regression run have some EDQUOT errors reported in the client xlator,
>> but only from one of the client bricks, followed by AFR self-heal
>> messages.
>>
>> 5) When the test case succeeds, the writes fail with EDQUOT as expected.
>> There are times when the quota is exceeded by, say, 1MB - 4.8MB, but the
>> test case still passes. This means that if we were to try to exceed
>> the quota by 1MB (instead of the 5MB in the test case), the test
>> case might always fail.
>
> Here is why I think the write gets past quota sometimes and not others,
> making this and the other test case mentioned below spurious:
> - Each write is 256K from the client (that is what is sent over the wire)
> - If more IO was queued by io-threads after passing quota checks, which
> in this 5MB case requires >20 IOs to be queued (16 IOs could be active
> in io-threads itself), we could end up writing more than the quota amount
>
> So, if quota checks whether a write violates the quota, lets it
> through, and only updates the space used (for future checks) on the
> UNWIND, we could have more IO outstanding than the quota allows. As a
> result, such a larger write could pass through, once the io-threads
> queue and active IOs are taken into account. Would this be a fair
> assumption of how quota works?
>
> I believe this is what is happening in this case. I am checking a fix
> on my machine, and will post it if it proves to help the situation.

Posted a patch to fix the problem: http://review.gluster.org/#/c/10811/

There are arguably other ways to fix/overcome this, but this one seemed
apt for this test case.
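
To make the suspected race concrete, here is a toy model in Python. It
is only a sketch and not Gluster code: the function simulate_dd and its
parameters are hypothetical, the 15MB of prior usage is inferred from
the "20MB quota, exceed by 5MB" description, and it assumes the quota
check sees only writes that have already been accounted. The 256K write
size, the 10MB dd and the figure of 16 io-threads come from the
analysis quoted above.

```python
from collections import deque

KIB = 1024
MIB = 1024 * KIB


def simulate_dd(limit, used, total, write_size, max_outstanding):
    """Toy check-then-account quota model (hypothetical, not Gluster code).

    Returns (bytes_admitted, hit_edquot). A write is admitted if the
    usage visible to the check plus the write fits within the limit.
    Usage is only updated when a write completes, and up to
    max_outstanding admitted writes may still be pending at check time.
    """
    accounted = used        # usage the quota check can see
    outstanding = deque()   # admitted writes not yet accounted
    admitted = 0
    for _ in range(total // write_size):
        if accounted + write_size > limit:
            return admitted, True      # the writer sees EDQUOT and stops
        outstanding.append(write_size)
        admitted += write_size
        # Complete the oldest writes once the window is over its bound.
        while len(outstanding) > max_outstanding:
            accounted += outstanding.popleft()
    return admitted, False             # every write got through


if __name__ == "__main__":
    limit, used = 20 * MIB, 15 * MIB   # 15MB prior usage is an assumption
    for window in (0, 16, 20):
        admitted, edquot = simulate_dd(limit, used, 10 * MIB, 256 * KIB, window)
        over = max(0, used + admitted - limit)
        print(f"outstanding={window:2d} admitted={admitted // KIB}K "
              f"EDQUOT={edquot} over_quota={over // KIB}K")
```

Under these assumptions, with nothing outstanding the dd is stopped
exactly at the limit. With 16 writes outstanding the dd still fails, but
only after the quota has been exceeded by 4MB. With 20 outstanding, all
10MB get through and the test fails. The exact boundary depends on how
the real check compares against the limit, which this sketch does not
claim to know.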

>
>>
>> 6) Note on dd with conv=fdatasync
>> As one of the fixes attempts to overcome this issue by adding
>> "conv=fdatasync", I wanted to cover that behavior here.
>>
>> This parameter sends an NFS_COMMIT (which internally becomes a flush
>> FOP) at the end of writing the blocks to the NFS share. The commit
>> triggers any pending writes for this file and sends the flush to the
>> brick, all of which succeeds at times, resulting in the failure of the
>> test case.
>>
>> NOTE: In the TC ./tests/bugs/quota/bug-1038598.t the failed line is in
>> pretty much the same context (LINE 26: TEST ! dd if=/dev/zero
>> of=$M0/test_dir/file1.txt bs=1024k count=15). The test expects the hard
>> limit to be exceeded, yet there are no write failures in the logs
>> (these would be expected, with EDQUOT (122)).