[Gluster-devel] Moratorium on new patch acceptance
Vijaikumar M
vmallika at redhat.com
Tue May 19 15:23:37 UTC 2015
On Tuesday 19 May 2015 08:36 PM, Shyam wrote:
> On 05/19/2015 08:10 AM, Raghavendra G wrote:
>> After discussion with Vijaykumar Mallikarjuna and other inputs in this
>> thread, we are proposing that all quota tests comply with the following
>> criteria:
>>
>> * always use dd with oflag=append (to make sure there are no parallel
>> writes) and conv=fdatasync (to make sure errors, if any, are delivered
>> to the application; turning off flush-behind is optional since
>> fdatasync acts as a barrier)
>>
>> OR
>>
>> * turn off write-behind in the NFS client and the glusterfs server.
>>
>> Which of the two do you think is the better test scenario? (A rough
>> sketch of both variants follows below.)
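>>
>> A rough sketch of the two variants as they might appear in a .t test
>> (the volume name $V0, $CLI, and the sizes are placeholders, not taken
>> from any specific test):
>>
>>     # Variant 1: appending, synced dd; writes are serialized and any
>>     # error is reported back to dd
>>     # (the dd man page suggests conv=notrunc together with oflag=append)
>>     TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=1M count=10 \
>>         oflag=append conv=notrunc,fdatasync
>>
>>     # Variant 2: disable write-behind on both the NFS client side and
>>     # the glusterfs server side, then run the plain dd
>>     TEST $CLI volume set $V0 performance.nfs.write-behind off
>>     TEST $CLI volume set $V0 performance.write-behind off
>>     TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=1M count=10 \
>>         conv=fdatasync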
>>
>> Also, we don't yet have confirmation of the RCA that parallel writes
>> are indeed the culprit. We are trying to reproduce the issue locally.
>> @Shyam, it would be helpful if you could confirm the hypothesis :).
>
> Ummm... I thought we had acknowledged that quota checks are done during
> the WIND and the usage is updated during the UNWIND, that io-threads has
> in-flight IOs (as well as possible IOs sitting in its queue), and that
> we have 256K writes in the case mentioned. Put together, in my head this
> forms a good RCA: we write more than needed because of the in-flight IOs
> on the brick. The resolution is for the application to limit the
> in-flight IOs.
>
> In terms of actual proof, we would need to instrument the code and
> check. When you say it does not fail for you, does the file stop growing
> once the quota is reached, or does it end up at a random size greater
> than the quota? That by itself may point to the RCA.
>
> The basic things needed from the application are:
> - Sync IOs, so that there aren't too many in-flight IOs and the
> application waits for each IO to complete
> - Based on the tests below, if we keep the dd block size low and use
> oflag=sync we can achieve this; with larger block sizes we cannot
> (see the example after this list)
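>
> For example (the block size, count, and implied quota limit here are
> illustrative numbers only, not from the actual test):
>
>     # small synchronous writes: at most one write is in flight, so the
>     # brick can return EDQUOT before we overshoot the limit by much
>     TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=256k count=40 \
>         oflag=sync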
>
> Test results:
> 1) noac:
> - NFS sends a COMMIT (internally translated to a flush) after each IO
> request (NFS WRITEs are still sent with the UNSTABLE flag)
> - Ensures the prior IO is complete before the next IO request is sent
> (due to waiting on the COMMIT)
> - Fails if the IO size is large, i.e., in the test case being discussed
> I changed the failing dd line to "TEST ! dd if=/dev/zero
> of=$N0/$mydir/newfile_2 *bs=10M* count=1 conv=fdatasync" and this
> fails at times, as the writes here are still sent as 256k chunks to the
> server and we see the same behavior
> - noac + performance.nfs.flush-behind: off +
> performance.flush-behind: off + performance.nfs.strict-write-ordering:
> on + performance.strict-write-ordering: on +
> performance.nfs.write-behind: off + performance.write-behind: off
> (the full combination is sketched after this list)
> - Still see similar failures, i.e., at times the 10MB file is created
> successfully by the modified dd command above
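>
> For reference, the combination above translates to roughly the
> following commands (the volume name $V0 and the NFS mount line are
> placeholders for whatever the test framework actually sets up):
>
>     # client side: noac disables attribute caching, forcing a COMMIT
>     # after each IO request
>     mount -t nfs -o vers=3,noac $H0:/$V0 $N0
>
>     # server side: turn off the write aggregation/caching knobs listed
>     # above and enforce strict write ordering
>     for opt in performance.nfs.flush-behind performance.flush-behind \
>                performance.nfs.write-behind performance.write-behind; do
>         gluster volume set $V0 $opt off
>     done
>     gluster volume set $V0 performance.nfs.strict-write-ordering on
>     gluster volume set $V0 performance.strict-write-ordering on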
>
> Overall, the switch works, but not always. If we are to use this
> variant, then we need to announce that quota tests using dd must not
> try to go beyond the configured quota limit in a single dd IO.
>
> 2) oflag=sync:
> - Exactly the same behavior as above.
>
> 3) Added all of the above (and possibly the kitchen sink) to the test
> case, as attached, and still see failures:
> - Yes, I have made the test fail intentionally (of sorts) by using
> 3M per dd IO and 2 IOs to go beyond the quota limit.
> - The intention is to demonstrate that we still get parallel IOs
> from the NFS client
> - The test would work if we reduce the block size per IO (reliability
> is borderline here, and we would need specific rules such as the block
> size and how many blocks before we state the quota is exceeded, etc.)
> - The test would work if we just go beyond the quota, and then check
> that a separate dd instance is *not* able to exceed the quota. Which
> is why I put up that patch. (A sketch of that approach follows below.)
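>
> A sketch of that last approach (this is just the idea, not the actual
> patch; file names and sizes are made up):
>
>     # step 1: write until we cross the limit; don't assert on the exact
>     # outcome, since in-flight IOs may overshoot the quota
>     dd if=/dev/zero of=$N0/$mydir/fill bs=256k count=40 conv=fdatasync
>
>     # step 2: by now quota accounting has caught up, so a fresh dd must
>     # fail with EDQUOT
>     TEST ! dd if=/dev/zero of=$N0/$mydir/overflow bs=256k count=4 \
>         conv=fdatasync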
>
> What next?
>
Hi Shyam,
I tried running the test with the dd option 'oflag=append' and didn't see
the issue. Can you please try this option and see if it works?
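
For example, the failing dd line could be changed to something like this
(keeping the original bs/count; the dd man page suggests conv=notrunc
alongside oflag=append):

    TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=10M count=1 \
        oflag=append conv=notrunc,fdatasync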
Thanks,
Vijay
>>
>> regards,
>> Raghavendra.
>>
>> On Tue, May 19, 2015 at 5:27 PM, Raghavendra G
>> <raghavendra at gluster.com> wrote:
>>
>>
>>
>> On Tue, May 19, 2015 at 4:26 PM, Jeff Darcy <jdarcy at redhat.com>
>> wrote:
>>
>> > No, my suggestion was aimed at not having parallel writes. In this
>> > case quota won't even fail the writes with EDQUOT because of reasons
>> > explained above. Yes, we need to disable flush-behind along with
>> > this so that errors are delivered to application.
>>
>> Would conv=sync help here? That should prevent any kind of
>> write parallelism.
>>
>>
>> An strace of dd shows that
>>
>> * fdatasync is issued only once at the end of all writes when
>> conv=fdatasync
>> * no fsync or fdatasync is issued at all when conv=sync (this is
>> expected, since conv=sync only pads short input blocks with NULs and
>> implies no syncing; oflag=sync is what opens the output with O_SYNC)
>>
>> So, using conv=fdatasync in the test cannot prevent
>> write-parallelism induced by write-behind. Parallelism would've been
>> prevented only if dd had issued fdatasync after each write or opened
>> the file with O_SYNC.
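>>
>> (To see this for yourself, something along these lines works; the
>> output path is just a scratch file:)
>>
>>     # one fdatasync() appears after all the write()s:
>>     strace -e trace=write,fsync,fdatasync \
>>         dd if=/dev/zero of=/tmp/dd-scratch bs=256k count=4 conv=fdatasync
>>
>>     # no fsync()/fdatasync() at all (conv=sync only pads input blocks):
>>     strace -e trace=write,fsync,fdatasync \
>>         dd if=/dev/zero of=/tmp/dd-scratch bs=256k count=4 conv=sync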
>>
>> If it doesn't, I'd say that's a true test failure somewhere in our
>> stack. A similar possibility would be to invoke dd multiple times
>> with oflag=append.
>>
>>
>> Yes, appending writes curb parallelism (at least in glusterfs, though
>> I am not sure how the NFS client behaves) and hence can be used as an
>> alternative solution.
>>
>> On a slightly unrelated note, flush-behind is immaterial in this test
>> since fdatasync anyway acts as a barrier.
>>