[Gluster-devel] Moratorium on new patch acceptance
Vijaikumar M
vmallika at redhat.com
Tue May 19 15:23:37 UTC 2015
On Tuesday 19 May 2015 08:36 PM, Shyam wrote:
> On 05/19/2015 08:10 AM, Raghavendra G wrote:
>> After discussion with Vijaykumar Mallikarjuna and other inputs in this
>> thread, we are proposing that all quota tests comply with the following
>> criteria:
>>
>> * always use dd with oflag=append (to make sure there are no parallel
>> writes) and conv=fdatasync (to make sure errors, if any, are delivered
>> to the application; turning off flush-behind is optional since
>> fdatasync acts as a barrier)
>>
>> OR
>>
>> * turn off write-behind in the NFS client and the glusterfs server.
>>
>> Which of the two do you think is the better test scenario? (A rough
>> sketch of both variants follows below.)
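>>
>> A rough sketch of the two variants as they might appear in a .t test
>> (the volume name $V0, $CLI, and the sizes are placeholders, not taken
>> from any specific test):
>>
>>     # Variant 1: appending, synced dd; writes are serialized and any
>>     # error is reported back to dd
>>     # (the dd man page suggests conv=notrunc together with oflag=append)
>>     TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=1M count=10 \
>>         oflag=append conv=notrunc,fdatasync
>>
>>     # Variant 2: disable write-behind on both the NFS client side and
>>     # the glusterfs server side, then run the plain dd
>>     TEST $CLI volume set $V0 performance.nfs.write-behind off
>>     TEST $CLI volume set $V0 performance.write-behind off
>>     TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=1M count=10 \
>>         conv=fdatasync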
>>
>> Also, we don't yet have confirmation of the RCA that parallel writes
>> are indeed the culprit. We are trying to reproduce the issue locally.
>> @Shyam, it would be helpful if you could confirm the hypothesis :).
>
> Ummm... I thought we had acknowledged that quota checks are done during
> the WIND and the usage is updated during the UNWIND, that io-threads has
> in-flight IOs (as well as possible IOs sitting in its queue), and that
> we have 256K writes in the case mentioned. Put together, in my head this
> forms a good RCA: we write more than needed because of the in-flight IOs
> on the brick. The resolution is for the application to limit the
> in-flight IOs.
>
> In terms of actual proof, we would need to instrument the code and
> check. When you say it does not fail for you, does the file stop growing
> once the quota is reached, or does it end up at a random size greater
> than the quota? That by itself may point to the RCA.
>
> The basic things needed from the application are:
> - Sync IOs, so that there aren't too many in-flight IOs and the
> application waits for each IO to complete
> - Based on the tests below, if we keep the dd block size low and use
> oflag=sync we can achieve this; with larger block sizes we cannot
> (see the example after this list)
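>
> For example (the block size, count, and implied quota limit here are
> illustrative numbers only, not from the actual test):
>
>     # small synchronous writes: at most one write is in flight, so the
>     # brick can return EDQUOT before we overshoot the limit by much
>     TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=256k count=40 \
>         oflag=sync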
>
> Test results:
> 1) noac:
> - NFS sends a COMMIT (internally translated to a flush) after each IO
> request (NFS WRITEs are still sent with the UNSTABLE flag)
> - Ensures the prior IO is complete before the next IO request is sent
> (due to waiting on the COMMIT)
> - Fails if the IO size is large, i.e., in the test case being discussed
> I changed the failing dd line to "TEST ! dd if=/dev/zero
> of=$N0/$mydir/newfile_2 *bs=10M* count=1 conv=fdatasync" and this
> fails at times, as the writes here are still sent as 256k chunks to the
> server and we see the same behavior
> - noac + performance.nfs.flush-behind: off +
> performance.flush-behind: off + performance.nfs.strict-write-ordering:
> on + performance.strict-write-ordering: on +
> performance.nfs.write-behind: off + performance.write-behind: off
> (the full combination is sketched after this list)
> - Still see similar failures, i.e., at times the 10MB file is created
> successfully by the modified dd command above
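>
> For reference, the combination above translates to roughly the
> following commands (the volume name $V0 and the NFS mount line are
> placeholders for whatever the test framework actually sets up):
>
>     # client side: noac disables attribute caching, forcing a COMMIT
>     # after each IO request
>     mount -t nfs -o vers=3,noac $H0:/$V0 $N0
>
>     # server side: turn off the write aggregation/caching knobs listed
>     # above and enforce strict write ordering
>     for opt in performance.nfs.flush-behind performance.flush-behind \
>                performance.nfs.write-behind performance.write-behind; do
>         gluster volume set $V0 $opt off
>     done
>     gluster volume set $V0 performance.nfs.strict-write-ordering on
>     gluster volume set $V0 performance.strict-write-ordering on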
>
> Overall, the switch works, but not always. If we are to use this
> variant, then we need to announce that quota tests using dd must not
> try to go beyond the configured quota limit in a single dd IO.
>
> 2) oflag=sync:
> - Exactly the same behavior as above.
>
> 3) Added all of the above (and possibly the kitchen sink) to the test
> case, as attached, and still see failures:
> - Yes, I have made the test fail intentionally (of sorts) by using
> 3M per dd IO and 2 IOs to go beyond the quota limit.
> - The intention is to demonstrate that we still get parallel IOs
> from the NFS client
> - The test would work if we reduce the block size per IO (reliability
> is borderline here, and we would need specific rules such as the block
> size and how many blocks before we state the quota is exceeded, etc.)
> - The test would work if we just go beyond the quota, and then check
> that a separate dd instance is *not* able to exceed the quota. Which
> is why I put up that patch. (A sketch of that approach follows below.)
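>
> A sketch of that last approach (this is just the idea, not the actual
> patch; file names and sizes are made up):
>
>     # step 1: write until we cross the limit; don't assert on the exact
>     # outcome, since in-flight IOs may overshoot the quota
>     dd if=/dev/zero of=$N0/$mydir/fill bs=256k count=40 conv=fdatasync
>
>     # step 2: by now quota accounting has caught up, so a fresh dd must
>     # fail with EDQUOT
>     TEST ! dd if=/dev/zero of=$N0/$mydir/overflow bs=256k count=4 \
>         conv=fdatasync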
>
> What next?
>
Hi Shyam,
I tried running the test with the dd option 'oflag=append' and didn't see
the issue. Can you please try this option and see if it works?
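
For example, the failing dd line could be changed to something like this
(keeping the original bs/count; the dd man page suggests conv=notrunc
alongside oflag=append):

    TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=10M count=1 \
        oflag=append conv=notrunc,fdatasync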
Thanks,
Vijay
>>
>> regards,
>> Raghavendra.
>>
>> On Tue, May 19, 2015 at 5:27 PM, Raghavendra G
>> <raghavendra at gluster.com> wrote:
>>
>>
>>
>> On Tue, May 19, 2015 at 4:26 PM, Jeff Darcy <jdarcy at redhat.com>
>> wrote:
>>
>> > No, my suggestion was aimed at not having parallel writes. In this
>> > case quota won't even fail the writes with EDQUOT because of reasons
>> > explained above. Yes, we need to disable flush-behind along with
>> > this so that errors are delivered to application.
>>
>> Would conv=sync help here? That should prevent any kind of
>> write parallelism.
>>
>>
>> An strace of dd shows that
>>
>> * fdatasync is issued only once at the end of all writes when
>> conv=fdatasync
>> * no fsync or fdatasync is issued at all when conv=sync (this is
>> expected, since conv=sync only pads short input blocks with NULs and
>> implies no syncing; oflag=sync is what opens the output with O_SYNC)
>>
>> So, using conv=fdatasync in the test cannot prevent
>> write-parallelism induced by write-behind. Parallelism would've been
>> prevented only if dd had issued fdatasync after each write or opened
>> the file with O_SYNC.
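>>
>> (To see this for yourself, something along these lines works; the
>> output path is just a scratch file:)
>>
>>     # one fdatasync() appears after all the write()s:
>>     strace -e trace=write,fsync,fdatasync \
>>         dd if=/dev/zero of=/tmp/dd-scratch bs=256k count=4 conv=fdatasync
>>
>>     # no fsync()/fdatasync() at all (conv=sync only pads input blocks):
>>     strace -e trace=write,fsync,fdatasync \
>>         dd if=/dev/zero of=/tmp/dd-scratch bs=256k count=4 conv=sync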
>>
>> If it doesn't, I'd say that's a true test failure somewhere in our
>> stack. A similar possibility would be to invoke dd multiple times
>> with oflag=append.
>>
>>
>> Yes, appending writes curb parallelism (at least in glusterfs, though
>> I am not sure how the NFS client behaves) and hence can be used as an
>> alternative solution.
>>
>> On a slightly unrelated note, flush-behind is immaterial in this test
>> since fdatasync anyway acts as a barrier.
>>