[Gluster-devel] libgfapi compound operations - multiple writes

Thu Dec 10 03:51:42 UTC 2015

----- Original Message -----
> From: "Jeff Darcy" <jdarcy at redhat.com>
> To: "Raghavendra Gowdappa" <rgowdapp at redhat.com>, "Poornima Gurusiddaiah" <pgurusid at redhat.com>
> Cc: "Gluster Devel" <gluster-devel at gluster.org>
> Sent: Wednesday, December 9, 2015 10:36:43 PM
> Subject: Re: [Gluster-devel] libgfapi compound operations - multiple writes
> 
> On December 9, 2015 at 10:31:03 AM, Raghavendra Gowdappa
> (rgowdapp at redhat.com) wrote:
> > forking off since it muddles the original conversation. I've some
> > questions:
> >  
> > 1. Why do multiple writes need to be compounded together?
> > 2. If the reason is aggregation, can't we tune write-behind to do the same?
> 
> I think compounding (as we’ve been discussing it) is only necessary when
> there’s a dependency between operations.  For example, if the first
> creates a value (e.g. file descriptor) used by the second, or if the
> second should not proceed unless the first (e.g. a lock) succeeded.  If
> multiple operations are completely independent of one another, as is the
> case for writes without fsync, then I think we should rely on
> write-behind or something similar instead.  Compounding is likely to be
> the wrong solution here for two reasons:
> 
>  * Correctness: if the writes are independent, there’s no reason why
>    failure of the first should cause the second not to be issued (as
>    would be the case with compounding).
> 
>  * Performance: compounding would keep the writes separate, whereas
>    write-behind can reduce overhead even more by coalescing them into a
>    single request.

Yes. I had similar thoughts while asking the question. Thanks for elaborating.
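
To make the two points above concrete, here is a minimal toy sketch in Python. It is not Gluster code: run_compound and coalesce are hypothetical names, and the model assumes a compound stops at the first failing operation while write-behind merges writes that are contiguous in issue order.

```python
# Toy model only; neither function exists in Gluster or libgfapi.

def run_compound(ops):
    """Run callables strictly in order and stop at the first OSError.

    Returns the results collected so far; a failure is recorded as the
    exception object, and later operations are never issued.
    """
    results = []
    for op in ops:
        try:
            results.append(op())
        except OSError as exc:
            results.append(exc)
            break
    return results


def coalesce(writes):
    """Merge (offset, data) writes that are contiguous in issue order.

    Two independent writes to adjacent regions become one request,
    which a compound (one request per write) would not achieve.
    """
    merged = []
    for offset, data in writes:
        if merged and merged[-1][0] + len(merged[-1][1]) == offset:
            prev_offset, prev_data = merged[-1]
            merged[-1] = (prev_offset, prev_data + data)
        else:
            merged.append((offset, data))
    return merged
```

With independent writes, the compound model would drop the second write just because the first failed, while coalescing turns adjacent writes into a single request.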

> 
> There is, however, one case where compounding would be the right answer:
> when there really is a dependency between the writes.  There’s no way to
> specify this through the POSIX/VFS interface (more’s the pity), but it’s
> easy to imagine GFAPI or internal use cases where a second write should
> not overtake or continue without the first - e.g.  a key/value store
> that writes new data followed by an index update pointing to that data.
> The strictly-sequential behavior of a compound operation might be just
> the right match for such cases.

We already have one such use case: O_APPEND writes. In fact, write-behind has enough logic to handle dependencies such as conflicting writes, and reads, stats, etc. on just-written regions. (Of course, we would lose some of the performance gains, since write-behind still winds calls across the network for dependent ops. But if the write-behind cache is large enough, the application does not see this latency.) So I am wondering whether we can pass these dependency requirements down the stack and let write-behind handle them.
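
As a rough illustration of the kind of dependency tracking I mean, here is a toy sketch in Python. ToyWriteBehind is a hypothetical class, not the real translator, and its behaviour (acknowledge writes immediately, flush pending writes only when a read overlaps a pending region) is an assumption made for the example.

```python
# Toy model only; the real write-behind translator is not reproduced here.

class ToyWriteBehind:
    def __init__(self, backend: bytearray):
        self.backend = backend   # stands in for the brick / network side
        self.pending = []        # queued (offset, data) writes
        self.flushes = 0         # how many times we went to the backend

    def write(self, offset, data):
        # Acknowledge immediately; the data has not reached the backend yet.
        self.pending.append((offset, bytes(data)))
        return len(data)

    def _overlaps(self, offset, length):
        end = offset + length
        return any(o < end and offset < o + len(d) for o, d in self.pending)

    def flush(self):
        # Apply queued writes in issue order, growing the backend if needed.
        for o, d in self.pending:
            end = o + len(d)
            if end > len(self.backend):
                self.backend.extend(b"\0" * (end - len(self.backend)))
            self.backend[o:end] = d
        if self.pending:
            self.flushes += 1
        self.pending.clear()

    def read(self, offset, length):
        # A read of a just-written region depends on the pending write,
        # so flush first; unrelated reads go straight through.
        if self._overlaps(offset, length):
            self.flush()
        return bytes(self.backend[offset:offset + length])
```

A read that does not touch a pending region costs no flush, while a read of a just-written region forces the queued writes out first. An explicit dependency passed down the stack could hook into the same check.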

@Poornima and others,

Did you have any such use cases in mind when you proposed compounding?

regards,
Raghavendra