[Gluster-devel] patch for "limited performance for disperse volumes"

Fri Feb 10 03:51:14 UTC 2017

+gluster-devel

----- Original Message -----
> From: "Milind Changire" <mchangir at redhat.com>
> To: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> Cc: "rhs-zteam" <rhs-zteam at redhat.com>
> Sent: Thursday, February 9, 2017 11:00:18 PM
> Subject: patch for "limited performance for disperse volumes"
> 
> My first comment was:
> It looks like the patch for "limited performance for disperse volumes" [1]
> is going to be helpful for all other types of volumes as well; but how do
> we guarantee ordering for writes over the same fd for the same offset and
> length in the file?
> 
> Then, after thinking it over a bit, and in case you missed my comment on IRC:
> I was thinking about network multi-pathing and RPC requests (two writes)
> being routed through different interfaces to gluster nodes, which might
> lead to a non-increasing transaction ID sequence and hence to an
> incorrect final value if the older write is committed last to the same
> offset+length.
> 
> Then it dawned on me that for blocking operations the write() call
> won't return until the data is safe on the disk across the network or
> the intermediate translators have cached it appropriately to be
> written behind.
> 
> So would the patch work for two non-blocking writes issued on the
> same fd from the same thread for the same offset+length, with both
> routed over multi-pathing and write #2 arriving sooner than
> write #1?
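
The reordering scenario described above can be sketched as a small simulation. This is only a toy model under stated assumptions, not how the patch or any Gluster translator works: the `Write` record, the `txn_id` field, and both `apply_*` functions are hypothetical names made up for illustration. It shows that applying writes in arrival order lets the older write win when it arrives last, while a check against the last applied transaction ID keeps the newer data.

```python
from dataclasses import dataclass


@dataclass
class Write:
    txn_id: int   # hypothetical: increasing ID assigned by the client
    offset: int
    data: bytes


def apply_in_arrival_order(size, arrivals):
    """Apply every write as it arrives; the last arrival wins."""
    buf = bytearray(size)
    for w in arrivals:
        buf[w.offset:w.offset + len(w.data)] = w.data
    return bytes(buf)


def apply_newest_txn_wins(size, arrivals):
    """Skip bytes already written by a write with a higher txn_id."""
    buf = bytearray(size)
    last_txn = [0] * size  # per-byte ID of the last applied write
    for w in arrivals:
        for i, byte in enumerate(w.data):
            pos = w.offset + i
            if w.txn_id > last_txn[pos]:
                buf[pos] = byte
                last_txn[pos] = w.txn_id
    return bytes(buf)


# w1 is issued first, w2 second, both for the same offset+length.
w1 = Write(txn_id=1, offset=0, data=b"AAAA")
w2 = Write(txn_id=2, offset=0, data=b"BBBB")

# Multi-pathing delivers w2 before w1.
reordered = [w2, w1]
```

With `reordered` as input, `apply_in_arrival_order(4, reordered)` ends with the data of the older write w1, whereas `apply_newest_txn_wins(4, reordered)` keeps the data of w2.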

To be honest, I have not considered the case of asynchronous writes from the application until now. What ordering guarantee do the OS and filesystems provide for two async writes? For example, if there are two writes w1 and w2, when is w2 issued?
* After the cbk of w1 is called, or
* in parallel, right after async_write (w1) returns (the cbk of w1 has not been invoked yet)?

What do POSIX or other standards (or the expectations from the OS) say about ordering in the second case above?
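
To make the two cases concrete, here is a minimal sketch that models them with a thread pool and `os.pwrite` on a local file. It is an assumption-laden stand-in: it uses neither POSIX AIO nor Gluster, the function names are made up for illustration, and waiting on `Future.result()` plays the role of the cbk. It does not answer what the standards guarantee; it only shows where the ordering question arises.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def issue_after_completion(fd, w1, w2):
    """Case 1: w2 is issued only after w1 has completed."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        pool.submit(os.pwrite, fd, w1, 0).result()  # wait for w1's "cbk"
        pool.submit(os.pwrite, fd, w2, 0).result()
    return os.pread(fd, len(w2), 0)


def issue_in_parallel(fd, w1, w2):
    """Case 2: w2 is issued before w1's completion is observed."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        f1 = pool.submit(os.pwrite, fd, w1, 0)
        f2 = pool.submit(os.pwrite, fd, w2, 0)
        f1.result()
        f2.result()
    return os.pread(fd, len(w2), 0)
```

In the first function the final contents are always those of w2, because w2 is not submitted until w1 has finished. In the second, nothing in the code orders the two writes, so the final contents depend on which one the pool and the kernel happen to run last.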

[1] https://review.gluster.org/15036

> 
> just thinking aloud
> 
> --
> Milind
>