[Gluster-devel] relative ordering of writes to same file from two different fds

Fri Sep 23 12:59:41 UTC 2016

> > write-behind: implement causal ordering and other cleanup
> >
> > Rules of causal ordering implemented:
> >
> > - If request A arrives after the acknowledgement (to the app,
> >   i.e, STACK_UNWIND) of another request B, then request B is
> >   said to have 'caused' request A.
>
> With the above principle, for two write requests (p1 and p2 in the example
> above) issued by _two different threads/processes_, there need _not always_
> be a
> 'causal' relationship (whether there is a causal relationship is purely
> based on the "chance" that write-behind chose to ack one/both of them and
> their timing of arrival).

I think this is an issue of terminology.  While it's not *certain* that B
(or p1) caused A (or p2), it's *possible*.  Contrast with the case where
they overlap, which could not possibly happen if the application were
trying to ensure order.  In the distributed-system literature, this is
often referred to as a causal relationship even though it's really just
the possibility of one, because in most cases even the possibility means
that reordering would be unacceptable.
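
To make the pessimistic reading concrete, here is a minimal sketch of the
rule as a predicate.  The struct, its fields, and the function names are
hypothetical and are not taken from the write-behind source; the timestamps
are assumed to come from a single logical clock on the client.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical request record; not write-behind's actual data structure. */
struct wb_req_sketch {
    uint64_t arrival; /* logical time the request reached write-behind */
    uint64_t acked;   /* logical time it was acked to the app; 0 = not yet */
};

/*
 * True if b *may* have caused a: a arrived after b was acknowledged.
 * We cannot know that the application really depended on b, so the
 * possibility alone is treated as a causal relationship.
 */
static bool may_have_caused(const struct wb_req_sketch *b,
                            const struct wb_req_sketch *a)
{
    return b->acked != 0 && a->arrival > b->acked;
}

/* Two requests may be reordered only if neither could have caused the other. */
static bool may_reorder(const struct wb_req_sketch *x,
                        const struct wb_req_sketch *y)
{
    return !may_have_caused(x, y) && !may_have_caused(y, x);
}
```

Two requests that overlap (each arrived before the other was acknowledged)
fail both checks, so the sketch treats them as freely reorderable.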

> So, current write-behind is agnostic to the
> ordering of p1 and p2 (when done by two threads).
>
> However if p1 and p2 are issued by same thread there is _always_ a causal
> relationship (p2 being caused by p1).

See above.  If we feel bound to respect causal relationships, we have to
be pessimistic and assume that wherever such a relationship *could* exist
it *does* exist.  However, as I explained in my previous message, I don't
think it's practical to provide such a guarantee across multiple clients,
and if we don't provide it across multiple clients then it's not worth
much to provide it on a single client.  Applications that require such
strict ordering shouldn't use write-behind, or should explicitly flush
between writes.  Otherwise they'll break unexpectedly when parts are
distributed across multiple nodes.  Assuming that everything runs on one
node is the same mistake POSIX makes.  The assumption was appropriate
for an earlier era, but it has not been for a decade or more.