[Gluster-devel] relative ordering of writes to same file from two different fds

Mon Sep 26 07:35:38 UTC 2016

> On Sep 23, 2016, at 8:59 PM, Jeff Darcy <jdarcy at redhat.com> wrote:
> 
>>> write-behind: implement causal ordering and other cleanup
>>> 
>>> Rules of causal ordering implemented:
>>> 
>>> - If request A arrives after the acknowledgement (to the app,
>>>   i.e, STACK_UNWIND) of another request B, then request B is
>>>   said to have 'caused' request A.
>> 
>> With the above principle, two write requests (p1 and p2 in example above)
>> issued by _two different threads/processes_ there need _not always_ be a
>> 'causal' relationship (whether there is a causal relationship is purely
>> based on the "chance" that write-behind chose to ack one/both of them and
>> their timing of arrival).
> 
> I think this is an issue of terminology.  While it's not *certain* that B
> (or p1) caused A (or p2), it's *possible*.  Contrast with the case where
> they overlap, which could not possibly happen if the application were
> trying to ensure order.  In the distributed-system literature, this is
> often referred to as a causal relationship even though it's really just
> the possibility of one, because in most cases even the possibility means
> that reordering would be unacceptable.
> 
>> So, current write-behind is agnostic to the
>> ordering of p1 and p2 (when done by two threads).
>> 
>> However if p1 and p2 are issued by same thread there is _always_ a causal
>> relationship (p2 being caused by p1).
> 
> See above.  If we feel bound to respect causal relationships, we have to
> be pessimistic and assume that wherever such a relationship *could* exist
> it *does* exist.  However, as I explained in my previous message, I don't
> think it's practical to provide such a guarantee across multiple clients,
> and if we don't provide it across multiple clients then it's not worth
> much to provide it on a single client.  Applications that require such
> strict ordering shouldn't use write-behind, or should explicitly flush
> between writes.  Otherwise they'll break unexpectedly when parts are
> distributed across multiple nodes.  Assuming that everything runs on one
> node is the same mistake POSIX makes.  The assumption was appropriate
> for an earlier era, but not now for a decade or more.
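
To make the quoted rule concrete, here is a minimal sketch of it as a predicate. It is only a toy model, not the write-behind code: the `Request` class, its timestamp fields, and both function names are hypothetical, invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    """Hypothetical stand-in for a write request as seen by write-behind."""
    name: str
    arrived_at: float                 # when the request reached write-behind
    acked_at: Optional[float] = None  # when it was acked to the app (STACK_UNWIND)


def may_have_caused(b: Request, a: Request) -> bool:
    """True if A arrived after B was acknowledged, i.e. B may have 'caused' A."""
    return b.acked_at is not None and a.arrived_at > b.acked_at


def must_preserve_order(x: Request, y: Request) -> bool:
    """Pessimistic reading: a possible causal link in either direction forbids reordering."""
    return may_have_caused(x, y) or may_have_caused(y, x)
```

Under this model, two requests whose lifetimes overlap (neither arrived after the other was acked) have no causal link and may be reordered, regardless of which thread issued them.
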

We can separate this into two questions:
1. Should there be a causal relationship within a local application?
2. Should there be a causal relationship across a distributed application?
I think the answer to #2 is 'NO'. This is an issue that the distributed application itself should resolve, either by using the distributed locks we provide or in its own way (fsync is required in that case).
I think the answer to #1 is 'YES', because buffered I/O should not introduce data consistency problems that unbuffered I/O does not have. It is very common for a local application to assume that the underlying file system behaves this way.
Furthermore, staying compatible with the Linux page cache will always be the better practice, because many local applications already rely on its semantics.
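
As a rough illustration of the "fsync between writes" approach mentioned in this thread, the sketch below issues two overlapping writes to the same file through two different fds and flushes the first before issuing the second. The function names, the payloads, and the use of a temporary directory are my own assumptions for the example; on a real setup the path would be on the mounted volume.

```python
import os
import tempfile


def ordered_overlapping_writes(path: str) -> bytes:
    """Write p1 via fd1, flush it, then write an overlapping p2 via fd2."""
    fd1 = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    fd2 = os.open(path, os.O_WRONLY)
    try:
        os.pwrite(fd1, b"AAAA", 0)  # p1: bytes 0..3
        os.fsync(fd1)               # explicit flush so p1 cannot be held back behind p2
        os.pwrite(fd2, b"BB", 1)    # p2: bytes 1..2, overlapping p1
        os.fsync(fd2)
    finally:
        os.close(fd1)
        os.close(fd2)
    with open(path, "rb") as f:
        return f.read()


def demo() -> bytes:
    """Run the two writes against a scratch file (assumption: any POSIX file system)."""
    with tempfile.TemporaryDirectory() as d:
        return ordered_overlapping_writes(os.path.join(d, "testfile"))
```

With the flush in place the expected content is `b"ABBA"`, i.e. p2 applied on top of p1. The question in this thread is whether a local application should get the same result without the explicit `fsync` between the two writes.
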

Thanks,
Ryan