[Gluster-devel] Moratorium on new patch acceptance

Shyam srangana at redhat.com
Tue May 19 15:06:07 UTC 2015


On 05/19/2015 08:10 AM, Raghavendra G wrote:
> After discussion with Vijaykumar Mallikarjuna and other inputs in this
> thread, we are proposing that all quota tests comply with the following
> criteria:
>
> * use dd always with oflag=append (to make sure there are no parallel
> writes) and conv=fdatasync (to make sure errors, if any, are delivered
> to the application; turning off flush-behind is optional since
> fdatasync acts as a barrier)
>
> OR
>
> * turn off write-behind in the NFS client and the glusterfs server.
>
> What do you think is the better test scenario?
>
> Also, we don't have confirmation on the RCA that parallel writes are
> indeed the culprit. We are trying to reproduce the issue locally.
> @Shyam, it would be helpful if you could confirm the hypothesis :).

Ummm... I thought we had acknowledged that quota checks are done during 
the WIND and the accounting is updated during the UNWIND, that io-threads 
has IOs in flight (as well as, possibly, IOs queued), and that the writes 
in the case mentioned are 256K each. Put together, in my head this forms 
a good RCA: all the in-flight IOs on the brick pass the quota check 
before any of them updates the accounting, so we write more than needed 
(e.g. with 16 such 256K writes in flight, usage can overshoot the limit 
by up to 4MB). The resolution is for the application to control the 
number of in-flight IOs.

In terms of actual proof, we would need to instrument the code and 
check. When you say it does not fail for you, does the file stop growing 
once the quota is reached, or does it end up at some random size greater 
than the quota? That by itself may explain or point to the RCA.

The basic thing needed from an application is:
- Sync IOs, so that there aren't too many IOs in flight and the 
application waits for each IO to complete
- Based on the tests below, if we keep the dd block size low and use 
oflag=sync we can achieve this; with higher block sizes we cannot (see 
the sketch after this list)
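
For reference, the two dd variants discussed in this thread would look 
something like the following (an untested sketch; the file name, sizes 
and counts are placeholders, with $N0/$mydir taken from the test case 
quoted below):

  # Variant 1: small blocks + O_SYNC, so each 128K write completes
  # before the next one is issued (no parallel writes in flight).
  dd if=/dev/zero of=$N0/$mydir/newfile_1 bs=128k count=80 oflag=sync

  # Variant 2: appending writes (which also curb parallelism) plus a
  # trailing fdatasync, so any EDQUOT is delivered to dd rather than
  # swallowed by flush-behind. Note that conv=fdatasync issues a
  # single fdatasync only after all the writes (see the strace
  # observation further down), so it does not curb parallelism by
  # itself.
  dd if=/dev/zero of=$N0/$mydir/newfile_1 bs=128k count=80 oflag=append conv=fdatasync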

Test results:
1) noac:
   - NFS sends a COMMIT (which internally translates to a flush) after 
each IO request (NFS WRITEs are still sent with the UNSTABLE flag)
   - Ensures the prior IO is complete before the next IO request is sent 
(due to waiting on the COMMIT)
   - Fails if the IO size is large, i.e. in the test case being 
discussed I changed the failing dd line to "TEST ! dd if=/dev/zero 
of=$N0/$mydir/newfile_2 *bs=10M* count=1 conv=fdatasync", and this fails 
at times, since the writes here are still sent as 256K chunks to the 
server and we see the same behavior
   - noac + performance.nfs.flush-behind: off + 
performance.flush-behind: off + performance.nfs.strict-write-ordering: 
on + performance.strict-write-ordering: on + 
performance.nfs.write-behind: off + performance.write-behind: off (see 
the commands after this list)
     - Still see similar failures, i.e. at times the 10MB file is 
created successfully by the modified dd command above
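
For anyone reproducing this, that option combination translates to 
something like the following (a sketch; $V0, $H0 and $N0 are the usual 
test-harness volume, host and NFS mount-point variables):

  # Disable write aggregation on both the NFS and the FUSE paths,
  # and force strict write ordering (option names as listed above).
  gluster volume set $V0 performance.nfs.flush-behind off
  gluster volume set $V0 performance.flush-behind off
  gluster volume set $V0 performance.nfs.strict-write-ordering on
  gluster volume set $V0 performance.strict-write-ordering on
  gluster volume set $V0 performance.nfs.write-behind off
  gluster volume set $V0 performance.write-behind off

  # Mount over NFSv3 with attribute caching disabled (noac), which is
  # what makes the client send a COMMIT after each IO request.
  mount -t nfs -o vers=3,nolock,noac $H0:/$V0 $N0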

Overall, the switch works, but not always. If we are to use this 
variant, then we need to announce that all quota tests using dd must not 
try to go beyond the quota limit in a single dd IO.

2) oflag=sync:
   - Exactly the same behavior as above.

3) Added all of the above (and possibly the kitchen sink) to the test 
case, as attached, and still see failures:
   - Yes, I have made the test fail intentionally (of sorts) by using 3M 
per dd IO and 2 IOs to go beyond the quota limit.
   - The intention is to demonstrate that we still get parallel IOs from 
the NFS client.
   - The test would work if we reduce the block size per IO (reliability 
is a borderline condition here, and we would need specific rules, such 
as the block size and how many blocks to write before we declare the 
quota exceeded).
   - The test would work if we first go beyond the quota, and then check 
that a separate dd instance is *not* able to exceed it (see the sketch 
below). Which is why I put up that patch.
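
A minimal sketch of that two-step variant, in the style of our .t tests 
(the sizes and file names are illustrative, not the exact patch):

  # Step 1: cross the quota limit; this dd may overshoot due to
  # in-flight IOs and is expected to succeed, so it is not asserted.
  dd if=/dev/zero of=$N0/$mydir/overshoot bs=1M count=12 oflag=sync

  # Step 2: with usage already over the limit, a fresh dd must fail
  # with EDQUOT regardless of any write parallelism.
  TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=1M count=1 conv=fdatasync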

What next?

>
> regards,
> Raghavendra.
>
> On Tue, May 19, 2015 at 5:27 PM, Raghavendra G <raghavendra at gluster.com
> <mailto:raghavendra at gluster.com>> wrote:
>
>
>
>     On Tue, May 19, 2015 at 4:26 PM, Jeff Darcy <jdarcy at redhat.com
>     <mailto:jdarcy at redhat.com>> wrote:
>
>         > No, my suggestion was aimed at not having parallel writes. In this case quota
>         > won't even fail the writes with EDQUOT, for the reasons explained above.
>         > Yes, we need to disable flush-behind along with this so that errors are
>         > delivered to the application.
>
>         Would conv=sync help here?  That should prevent any kind of
>         write parallelism.
>
>
>     An strace of dd shows that
>
>     * fdatasync is issued only once at the end of all writes when
>     conv=fdatasync
>     * for some strange reason, no fsync or fdatasync is issued at all
>     when conv=sync (note: dd's conv=sync means "pad each input block
>     with NULs up to the block size", not "sync to disk")
>
>     So, using conv=fdatasync in the test cannot prevent
>     write-parallelism induced by write-behind. Parallelism would've been
>     prevented only if dd had issued fdatasync after each write or opened
>     the file with O_SYNC.
>
>         If it doesn't, I'd say that's a true test failure somewhere in
>         our stack.  A
>         similar possibility would be to invoke dd multiple times with
>         oflag=append.
>
>
>     Yes, appending writes curb parallelism (at least in glusterfs,
>     though I am not sure how the NFS client behaves) and hence can be
>     used as an alternative solution.
>
>     On a slightly unrelated note, flush-behind is immaterial in this
>     test since fdatasync acts as a barrier anyway.
>
>
> --
> Raghavendra G
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bug-1161156.t
Type: application/x-perl
Size: 1661 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-devel/attachments/20150519/2231b29a/attachment-0001.pl>

