[Gluster-devel] Moratorium on new patch acceptance
Vijaikumar M
vmallika at redhat.com
Wed May 20 08:31:35 UTC 2015
On Tuesday 19 May 2015 09:50 PM, Shyam wrote:
> On 05/19/2015 11:23 AM, Vijaikumar M wrote:
>>
>>
>> On Tuesday 19 May 2015 08:36 PM, Shyam wrote:
>>> On 05/19/2015 08:10 AM, Raghavendra G wrote:
>>>> After discussion with Vijaykumar Mallikarjuna and other inputs in this
>>>> thread, we are proposing that all quota tests comply with the following
>>>> criteria:
>>>>
>>>> * always use dd with oflag=append (to make sure there are no parallel
>>>> writes) and conv=fdatasync (to make sure errors, if any, are delivered
>>>> to the application; turning off flush-behind is optional since
>>>> fdatasync acts as a barrier)
>>>>
>>>> OR
>>>>
>>>> * turn off write-behind on both the nfs client and the glusterfs
>>>> server.
>>>>
>>>> What do you people think is a better test scenario?
>>>>
>>>> Also, we don't have confirmation on the RCA that parallel writes are
>>>> indeed the culprits. We are trying to reproduce the issue locally.
>>>> @Shyam, it would be helpful if you can confirm the hypothesis :).
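(For reference, a rough sketch of what the two proposed criteria could look
like in a .t test, assuming the usual test-framework variables $CLI and $V0
and the NFS mount $N0; sizes and option values are only illustrative:)

    # Option 1: appending, fdatasync'd dd writes
    TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_1 bs=256k count=40 \
             oflag=append conv=fdatasync

    # Option 2: instead, disable write-behind on the nfs client and the bricks
    TEST $CLI volume set $V0 performance.nfs.write-behind off
    TEST $CLI volume set $V0 performance.write-behind off
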
>>>
>>> Ummm... I thought we acknowledged that quota checks are done during the
>>> WIND and updates happen during the UNWIND, that io-threads has in-flight
>>> IOs (as well as possible IOs in the io-threads queue), and that we have
>>> 256K writes in the case mentioned. Put together, in my head this forms a
>>> good RCA that we write more than needed due to the in-flight IOs on
>>> the brick. We need to control the in-flight IOs from the application as
>>> a resolution for this.
>>>
>>> In terms of actual proof, we would need to instrument the code and
>>> check. When you say it does not fail for you, does the file stop growing
>>> once quota is reached, or does it end up at some random size greater
>>> than the quota? That itself may explain or point to the RCA.
>>>
>>> The basic things needed from an application are:
>>> - Sync IOs, so that there aren't too many in-flight IOs and the
>>> application waits for each IO to complete
>>> - Based on the tests below, if we keep the dd block size low and use
>>> oflag=sync we can achieve this; with higher block sizes we cannot
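(For illustration, the well-behaved shape of the dd command described above
would be something like the following; the block size, count, and file name
are only placeholders:)

    # oflag=sync opens the file with O_SYNC, so each 256k write completes
    # (and is committed over NFS) before the next one is issued
    TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_1 bs=256k count=40 oflag=sync
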
>>>
>>> Test results:
>>> 1) noac:
>>> - NFS sends a COMMIT (internally translated to a flush) after each IO
>>> request (NFS WRITEs are still sent with the UNSTABLE flag)
>>> - Ensures the prior IO is complete before the next IO request is sent
>>> (due to waiting on the COMMIT)
>>> - Fails if the IO size is large, i.e. in the test case being discussed
>>> I changed the failing dd line to "TEST ! dd if=/dev/zero
>>> of=$N0/$mydir/newfile_2 *bs=10M* count=1 conv=fdatasync" and this
>>> still fails at times, as the writes here are sent as 256k chunks to the
>>> server and we still see the same behavior
>>> - noac + performance.nfs.flush-behind: off +
>>> performance.flush-behind: off + performance.nfs.strict-write-ordering:
>>> on + performance.strict-write-ordering: on +
>>> performance.nfs.write-behind: off + performance.write-behind: off
>>> - Still see similar failures, i.e. at times the 10MB file is created
>>> successfully by the modified dd command above
>>>
>>> Overall, the switch works, but not always. If we are to use this
>>> variant then we need to mandate that quota tests using dd do not try
>>> to go beyond the configured quota limit in a single dd IO.
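(For reference, the option combination tried in (1) corresponds roughly to
the following commands, assuming the standard $CLI/$V0/$H0/$N0 test
variables; the mount line is only a sketch:)

    TEST $CLI volume set $V0 performance.nfs.flush-behind off
    TEST $CLI volume set $V0 performance.flush-behind off
    TEST $CLI volume set $V0 performance.nfs.strict-write-ordering on
    TEST $CLI volume set $V0 performance.strict-write-ordering on
    TEST $CLI volume set $V0 performance.nfs.write-behind off
    TEST $CLI volume set $V0 performance.write-behind off
    # NFS mount with attribute caching disabled (noac)
    TEST mount -t nfs -o vers=3,nolock,noac $H0:/$V0 $N0
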
>>>
>>> 2) oflag=sync:
>>> - Exactly the same behavior as above.
>>>
>>> 3) Added all of the above (and possibly the kitchen sink) to the test
>>> case, as attached, and still see failures:
>>> - Yes, I have made the test fail intentionally (of sorts) by using
>>> 3M per dd IO and 2 IOs to go beyond the quota limit.
>>> - The intention is to demonstrate that we still get parallel IOs
>>> from NFS client
>>> - The test would work if we reduce the block size per IO (reliability
>>> is borderline here, and we would need specific rules such as block size
>>> and how many blocks before we state that quota is exceeded, etc.)
>>> - The test would work if we first go beyond the quota, and then check
>>> that a separate dd instance is *not* able to exceed the quota. Which
>>> is why I put up that patch.
>>>
>>> What next?
>>>
>> Hi Shyam,
>>
>> I tried running the test with the dd option 'oflag=append' and didn't
>> see the issue. Can you please try this option and see if it works?
>
> Did that (in the attached script that I sent) and it still failed.
>
> Please note:
> - This dd command passes (or fails with EDQUOT)
> - dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=512 count=10240
> oflag=append oflag=sync conv=fdatasync
> - We can even drop append and fdatasync, as sync sends a commit per
> block written, which is better for the test and for quota enforcement,
> whereas fdatasync issues one only at the end and sometimes fails (with
> larger block sizes, say 1M)
> - We can change bs to [512 - 256k]
>
Here you are trying to write 5M of data, which is always written, so the
test will fail.
> - This dd command fails (or writes all the data)
> - dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=3M count=2 oflag=append
> oflag=sync conv=fdatasync
>
Here you are trying to write 6M of data (exceeding the quota limit by only
1M), so the test can fail. With count=3, the test passes.
> The reasoning is that when we write with a larger block size, NFS sends
> the write as multiple 256k chunks and only sends the commit before the
> next block. As a result, if we exceed quota in the *last block* that we
> are writing, we *may* fail. If we exceed quota in the last-but-one
> block, we will pass.
>
> Hope this shorter version explains it better.
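(Putting illustrative numbers on the above, with the 5M limit used in this
test:)

    # A 3M dd block is sent by the NFS client as 12 x 256k WRITEs followed
    # by one COMMIT, so quota is effectively enforced only at dd block
    # boundaries; with bs=3M count=2 the whole 6M can land on the brick.
    echo $(( (3 * 1024 * 1024) / (256 * 1024) ))    # 12 WRITEs per 3M block
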
>
> (VijayM is educating me on quota (over IM), and it looks like the
> quota update happens as a synctask in the background, so post the
> flush (NFS commit) we may still have a race)
>
> Post-education solution:
> - Quota updates the on-disk xattr as a synctask; as a result, if we
> exceeded quota in the (n-1)th block there is no guarantee that the nth
> block would fail, as the synctask may not have completed
>
> So I think we need to do the following for the quota based tests
> (expanding on the provided patch, http://review.gluster.org/#/c/10811/ )
> - First, a dd that exceeds quota (with either oflag=sync or
> conv=fdatasync, so that we do not see any flush-behind or write-behind
> effects), done without checks
> - Next, check in an EXPECT_WITHIN that quota is exceeded (maybe add
> checks on the just created/appended file w.r.t. the minimum size that
> would make it exceed the quota)
> - Then do a further dd to a new file or append to an existing file to
> get the EDQUOT error
> - Proceed with whatever the test case needs to do next
>
> Suggestions?
>
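A minimal sketch of how the above pattern could look in a .t file (the
quota_exceeded helper, its parsing of the quota list output, the timeout and
the sizes are all placeholders, not final values):

    # Hypothetical helper: report whether the hard limit on /$mydir is
    # exceeded, based on the last column of 'volume quota ... list'
    function quota_exceeded () {
            $CLI volume quota $V0 list /$mydir | awk '{print $NF}' | tail -1
    }

    # 1. Exceed the quota, without asserting on the dd exit status
    dd if=/dev/zero of=$N0/$mydir/newfile_1 bs=256k count=40 oflag=sync

    # 2. Wait for quota accounting (the synctask) to catch up
    EXPECT_WITHIN 20 "Yes" quota_exceeded

    # 3. Only now assert that a further write fails with EDQUOT
    TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=256k count=1 oflag=sync
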
>>
>> Thanks,
>> Vijay
>>
>>>>
>>>> regards,
>>>> Raghavendra.
>>>>
>>>> On Tue, May 19, 2015 at 5:27 PM, Raghavendra G
>>>> <raghavendra at gluster.com
>>>> <mailto:raghavendra at gluster.com>> wrote:
>>>>
>>>>
>>>>
>>>> On Tue, May 19, 2015 at 4:26 PM, Jeff Darcy <jdarcy at redhat.com
>>>> <mailto:jdarcy at redhat.com>> wrote:
>>>>
>>>> > No, my suggestion was aimed at not having parallel writes. In this
>>>> > case quota won't even fail the writes with EDQUOT because of reasons
>>>> > explained above. Yes, we need to disable flush-behind along with this
>>>> > so that errors are delivered to the application.
>>>>
>>>> Would conv=sync help here? That should prevent any kind of
>>>> write parallelism.
>>>>
>>>>
>>>> An strace of dd shows that
>>>>
>>>> * fdatasync is issued only once at the end of all writes when
>>>> conv=fdatasync
>>>> * no fsync or fdatasync is issued at all when conv=sync (conv=sync
>>>> only pads input blocks with NULs; it does not make writes synchronous)
>>>>
>>>> So, using conv=fdatasync in the test cannot prevent write-parallelism
>>>> induced by write-behind. Parallelism would have been prevented only if
>>>> dd had issued fdatasync after each write or opened the file with
>>>> O_SYNC.
>>>>
>>>> If it doesn't, I'd say that's a true test failure somewhere in our
>>>> stack. A similar possibility would be to invoke dd multiple times
>>>> with oflag=append.
>>>>
>>>>
>>>> Yes, appending writes curb parallelism (at least in glusterfs; I am
>>>> not sure how the nfs client behaves) and hence can be used as an
>>>> alternative solution.
>>>>
>>>> On a slightly unrelated note, flush-behind is immaterial in this
>>>> test since fdatasync is anyway acting as a barrier.
>>>>
>>>> --
>>>> Raghavendra G
>>>
>>>
>>