[Gluster-devel] relative ordering of writes to same file from two different fds

Raghavendra Talur rtalur at redhat.com
Mon Sep 26 10:26:02 UTC 2016

On Mon, Sep 26, 2016 at 1:05 PM, Ryan Ding <ryan.ding at open-fs.com> wrote:

> > On Sep 23, 2016, at 8:59 PM, Jeff Darcy <jdarcy at redhat.com> wrote:
> >
> >>> write-behind: implement causal ordering and other cleanup
> >>
> >>> Rules of causal ordering implemented:
> >>
> >>> - If request A arrives after the acknowledgement (to the app,
> >>
> >>> i.e, STACK_UNWIND) of another request B, then request B is
> >>
> >>> said to have 'caused' request A.
> >>
> >>
> >> With the above principle, for two write requests (p1 and p2 in the
> >> example above) issued by _two different threads/processes_ there need
> >> _not always_ be a 'causal' relationship (whether there is a causal
> >> relationship depends purely on the "chance" that write-behind chose to
> >> ack one or both of them and on their timing of arrival).
> >
> > I think this is an issue of terminology.  While it's not *certain* that B
> > (or p1) caused A (or p2), it's *possible*.  Contrast with the case where
> > they overlap, which could not possibly happen if the application were
> > trying to ensure order.  In the distributed-system literature, this is
> > often referred to as a causal relationship even though it's really just
> > the possibility of one, because in most cases even the possibility means
> > that reordering would be unacceptable.
> >
> >> So, current write-behind is agnostic to the
> >> ordering of p1 and p2 (when done by two threads).
> >>
> >> However, if p1 and p2 are issued by the same thread, there is _always_
> >> a causal relationship (p2 being caused by p1).
> >
> > See above.  If we feel bound to respect causal relationships, we have to
> > be pessimistic and assume that wherever such a relationship *could* exist
> > it *does* exist.  However, as I explained in my previous message, I don't
> > think it's practical to provide such a guarantee across multiple clients,
> > and if we don't provide it across multiple clients then it's not worth
> > much to provide it on a single client.  Applications that require such
> > strict ordering shouldn't use write-behind, or should explicitly flush
> > between writes.  Otherwise they'll break unexpectedly when parts are
> > distributed across multiple nodes.  Assuming that everything runs on one
> > node is the same mistake POSIX makes.  That assumption was appropriate
> > for an earlier era, but it has not been valid for a decade or more.
> We can separate this into two questions:
> 1. Should there be a causal relationship in a local application?
> 2. Should there be a causal relationship in a distributed application?
> I think the answer to #2 is 'NO'. This is an issue the distributed
> application should resolve itself, either with the distributed locks we
> provide or in its own way (fsync is required in such conditions).
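
The fsync discipline mentioned here can be sketched as follows. This is a
generic POSIX example, not GlusterFS code; the path and the p1/p2 payloads
are hypothetical, chosen to match the p1/p2 naming used earlier in the
thread:

```c
#include <fcntl.h>
#include <unistd.h>

/* Issue two dependent writes with an explicit barrier between them.
 * Returns 0 on success, -1 on any failure.  Because p2 is not issued
 * until fsync() has returned for p1, no caching layer (write-behind
 * included) can complete p2 ahead of p1. */
static int ordered_writes(const char *path)
{
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
    if (fd < 0)
        return -1;

    /* p1: the write that must land first. */
    if (write(fd, "p1", 2) != 2)
        goto fail;

    /* Explicit flush: p1 is forced to stable storage here. */
    if (fsync(fd) != 0)
        goto fail;

    /* p2: issued only after p1 is durable. */
    if (write(fd, "p2", 2) != 2)
        goto fail;

    return close(fd);

fail:
    close(fd);
    return -1;
}
```

The cost is one round trip to storage per barrier, which is exactly the
trade-off being discussed: applications that need the ordering pay for it
explicitly instead of every application paying implicitly.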

> I think the answer to #1 is 'YES', because buffered I/O should not
> introduce data-consistency problems beyond those of unbuffered I/O. It is
> very common for a local application to assume the underlying file system
> behaves that way. Furthermore, staying compatible with Linux page-cache
> semantics will always be the better practice, because many local
> applications already rely on those semantics.

I agree. If my understanding is correct, this is the same model that
write-behind uses today, if we don't consider the proposed patch.
Write-behind orders all causal operations on the inode (file object)
irrespective of the FD used. This particular patch makes a small
modification: it lets the FSYNC and FLUSH FOPs bypass that ordering as long
as they are not on the same FD as the pending WRITE FOP.
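
To make the rule concrete, here is a minimal model of the ordering decision
as described in this thread. The struct and function names are illustrative
only, not write-behind's actual data structures:

```c
#include <stdbool.h>

/* Illustrative model: a request is "caused by" another if it arrived
 * after the other was acknowledged (STACK_UNWIND) to the application. */
struct req {
    int fd;          /* FD the request was issued on            */
    long arrived;    /* logical time the request arrived        */
    long acked;      /* logical time its ack was unwound        */
    bool is_sync;    /* FSYNC/FLUSH rather than WRITE           */
};

/* Under the proposed patch: must request 'a' be ordered after the
 * pending (already-acked) write 'w'? */
static bool must_order_after(const struct req *a, const struct req *w)
{
    /* No causal relationship: 'a' arrived before 'w' was acked. */
    if (a->arrived <= w->acked)
        return false;

    /* FSYNC/FLUSH may bypass the ordering, unless issued on the
     * same FD as the pending WRITE. */
    if (a->is_sync && a->fd != w->fd)
        return false;

    return true;
}
```

In this model, a later WRITE on any FD is still ordered behind the pending
WRITE, while an FSYNC or FLUSH skips the queue only when it targets a
different FD.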

Raghavendra Talur

> Thanks,
> Ryan