[Gluster-devel] patch for "limited performance for disperse volumes"

Milind Changire mchangir at redhat.com
Fri Feb 10 18:04:56 UTC 2017

Here's a quote from the paper "Non-blocking Writes to Files":

Ordering of Page Updates.
Non-blocking writes may alter the sequence in which patches to
different pages get applied since the page fetches may complete
out-of-order. Non-blocking writes only replace writes that are
to memory that are not guaranteed to be reflected to persistent
storage in any particular sequence. Thus, ordering violations in
updates of in-memory pages are crash-safe.

Page Persistence and Syncs.
If an application would like explicit disk ordering for memory
page updates, it would execute a blocking flush operation
(e.g., fsync) subsequent to each operation. The flush operation
causes the OS to force the fetch of any page indexed as NBW even
if it has not been allocated yet. The OS then obtains the page
lock, waits for the page fetch, and applies any outstanding
patches, before flushing the page and returning control to the
application. Ordering of disk writes is thus preserved with
non-blocking writes.
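
The application-side half of that discipline (issue a write, block on a flush, only then issue the next write) can be sketched in a few lines. This is a minimal illustration assuming a local POSIX file and Python's os.pwrite/os.fsync; it does not model the kernel-side NBW patching the paper describes, and the helper names write_with_ordered_flush and demo are hypothetical.

```python
import os
import tempfile


def write_with_ordered_flush(path, updates):
    """Apply (offset, data) updates in order, calling fsync after each one.

    Hypothetical helper: update N+1 is not issued until update N has been
    flushed, which is the "blocking flush after each operation" pattern.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        for offset, data in updates:
            written = os.pwrite(fd, data, offset)
            if written != len(data):
                raise OSError(f"short write at offset {offset}")
            # Block until this update is on stable storage before the next.
            os.fsync(fd)
    finally:
        os.close(fd)


def demo():
    """Two overlapping writes at offset 0, then one at offset 4."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data")
        write_with_ordered_flush(path, [(0, b"AAAA"), (0, b"BBBB"), (4, b"CC")])
        with open(path, "rb") as f:
            return f.read()
```

Here demo() returns b"BBBBCC": the second write to offset 0 wins because it is not issued until the first one has been flushed. The cost is one synchronous round trip to storage per write.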


On 02/10/2017 01:37 PM, Xavier Hernandez wrote:
> Hi Raghavendra,
> On 10/02/17 04:51, Raghavendra Gowdappa wrote:
>> +gluster-devel
>> ----- Original Message -----
>>> From: "Milind Changire" <mchangir at redhat.com>
>>> To: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
>>> Cc: "rhs-zteam" <rhs-zteam at redhat.com>
>>> Sent: Thursday, February 9, 2017 11:00:18 PM
>>> Subject: patch for "limited performance for disperse volumes"
>>> My first comment was:
>>> Looks like the patch for "limited performance for disperse volume" [1] is
>>> going to be helpful for all other types of volumes as well; but how do we
>>> guarantee ordering for writes over the same fd for the same offset and
>>> length in the file?
>>> Then, after thinking it over a bit (in case you missed my comment on IRC):
>>> I was thinking about network multi-pathing, where RPC requests (two
>>> writes) are routed through different interfaces to the Gluster nodes.
>>> This might lead to a transaction ID sequence that is no longer
>>> increasing, and hence to an incorrect final value if the older write is
>>> committed last to the same offset+length.
>>> Then it dawned on me that for blocking operations the write() call
>>> won't return until the data is safe on disk across the network, or
>>> the intermediate translators have cached it appropriately to be
>>> written behind.
>>> So would the patch work for two non-blocking writes originating from the
>>> same fd in the same thread for the same offset+length, routed over
>>> multi-pathing, with write #2 getting routed quicker than write #1?
>> To be honest, I hadn't considered the case of asynchronous writes from
>> the application until now. What ordering guarantee do the OS/filesystems
>> provide for two async writes? For example, if there are two writes w1
>> and w2, when is w2 issued?
>> * after the cbk of w1 is called, or
>> * in parallel, just after async_write (w1) returns (cbk of w1 not
>>   yet invoked)?
>> What do POSIX or other standards (or OS expectations) say about
>> ordering in case 2 above?
> I'm not an expert on POSIX. But I've found this [1]:
>     2.9.7 Thread Interactions with Regular File Operations
>     All of the following functions shall be atomic with respect to
>     each other in the effects specified in POSIX.1-2008 when they
>     operate on regular files or symbolic links: [...] write [...]
>     If two threads each call one of these functions, each call shall
>     either see all of the specified effects of the other call, or none
>     of them. The requirement on the close() function shall also apply
>     whenever a file descriptor is successfully closed, however caused
>     (for example, as a consequence of calling close(), calling dup2(),
>     or of process termination).
> Not sure if this also applies to write requests issued asynchronously
> from the same thread, but this would be the worst case (if the OS
> already orders it, we won't have any problem).
> As I see it, this is already satisfied by EC because it doesn't allow
> two writes to execute concurrently. They can be reordered if the second
> one arrives before the first one, but they are executed atomically, as
> POSIX requires. Not sure if AFR also satisfies this condition, but I
> think so.
> From the point of view of EC it's irrelevant if the write comes from the
> same thread or from different processes on different clients. They are
> handled in the same way.
> However, one thing to be aware of (from the man page of write):
>     [...] among the effects that should be atomic across threads (and
>     processes) are updates of the file offset. However, on Linux before
>     version 3.14, this was not the case: if two processes that share an
>     open file description (see open(2)) perform a write() (or
>     writev(2)) at the same time, then the I/O operations were not atomic
>     with respect to updating the file offset, with the result that the
>     blocks of data output by the two processes might (incorrectly)
>     overlap. This problem was fixed in Linux 3.14.
> Xavi
> [1]
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_07
>> [1] https://review.gluster.org/15036
>>> just thinking aloud
>>> --
>>> Milind
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-devel
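
The scenario in the thread (two writes to the same offset+length, the second routed over a faster path) and Raghavendra's two issue disciplines can be illustrated with a toy model. This is a sketch under stated assumptions, not GlusterFS code: FakeBrick, send, pipelined and serialized are hypothetical names, and network latency is simulated with time.sleep.

```python
import threading
import time


class FakeBrick:
    """Hypothetical stand-in for the server-side copy of a file region."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.lock = threading.Lock()

    def apply(self, offset, payload):
        with self.lock:
            self.data[offset:offset + len(payload)] = payload


def send(brick, offset, payload, path_delay):
    """Simulate one write travelling over a network path with some latency."""
    time.sleep(path_delay)
    brick.apply(offset, payload)


def pipelined(delays):
    """Issue w1 and w2 back to back, without waiting for w1's completion."""
    brick = FakeBrick(4)
    writes = [(0, b"1111", delays[0]), (0, b"2222", delays[1])]
    threads = [threading.Thread(target=send, args=(brick,) + w) for w in writes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bytes(brick.data)


def serialized(delays):
    """Issue w2 only after w1's completion (its cbk) has been observed."""
    brick = FakeBrick(4)
    for (offset, payload), delay in zip([(0, b"1111"), (0, b"2222")], delays):
        send(brick, offset, payload, delay)  # returns only once applied
    return bytes(brick.data)
```

With w1 on the slower path, pipelined((0.2, 0.0)) ends with w1's stale bytes b"1111", while serialized((0.2, 0.0)) ends with b"2222" regardless of path latency. In this model, ordering for same-offset writes holds only if the issuer waits for completion or the receiver reorders by some sequence number; whether the patch under review relies on either is not stated in the thread.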

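
Xavi's point that EC meets the POSIX "all of the effects or none of them" requirement by not letting two writes run concurrently can be sketched with a lock held for the whole write. This is an illustrative model only, assuming a write that is copied in several chunks; Region, run_writers and is_single_write are hypothetical names, and this is not how EC actually implements its locking.

```python
import threading
import time


class Region:
    """Hypothetical file region whose writes are copied in small chunks."""

    def __init__(self, size, serialize):
        self.data = bytearray(size)
        self.serialize = serialize
        self.lock = threading.Lock()

    def _copy(self, payload, chunk):
        for i in range(0, len(payload), chunk):
            self.data[i:i + chunk] = payload[i:i + chunk]
            time.sleep(0)  # yield, giving another writer a chance to interleave

    def write(self, payload, chunk=16):
        if self.serialize:
            # Holding the lock for the whole write makes it atomic with
            # respect to other writes: each one is seen entirely or not at all.
            with self.lock:
                self._copy(payload, chunk)
        else:
            self._copy(payload, chunk)


def run_writers(serialize, writers=8, size=4096):
    """Start several writers at once, each filling the region with one byte."""
    region = Region(size, serialize)
    barrier = threading.Barrier(writers)

    def worker(tag):
        barrier.wait()
        region.write(bytes([tag]) * size)

    threads = [threading.Thread(target=worker, args=(65 + n,)) for n in range(writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bytes(region.data)


def is_single_write(contents):
    """True if the region holds the bytes of exactly one writer."""
    return len(set(contents)) == 1
```

With serialize=True the final region always comes from a single writer, although which writer wins depends on lock acquisition order, matching "they can be reordered, but they are executed atomically". With serialize=False the result may mix bytes from several writers, though that is not guaranteed on any given run.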