[Gluster-devel] relative ordering of writes to same file from two different fds

Jeff Darcy jdarcy at redhat.com
Wed Sep 21 17:58:07 UTC 2016


> However, my understanding is that filesystems need not maintain the relative
> order of writes (as received from the vfs/kernel) on two different fds. Also,
> if we have to maintain the order it might come with increased latency. The
> increased latency comes from "newer" writes having to wait on "older"
> ones. This wait can fill up the write-behind buffer and eventually leave
> the write-behind cache full, so that it can no longer "write-back" newer writes.
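To make the hazard in the quoted paragraph concrete, here is a toy model in Python. It is a hypothetical sketch, not Gluster's write-behind translator: it assumes each fd has its own FIFO of acknowledged-but-unflushed writes, and that the FIFOs are drained independently. The class name `WriteBehindModel` and the helper `run` are made up for illustration.

```python
from collections import deque


class WriteBehindModel:
    """Hypothetical toy model: each fd buffers writes in its own FIFO,
    and each FIFO is flushed to the shared backing store independently."""

    def __init__(self, size: int) -> None:
        self.store = bytearray(size)  # stands in for the file's on-disk contents
        self.pending: dict = {}       # fd -> deque of (offset, data)

    def write(self, fd: int, offset: int, data: bytes) -> None:
        # "Acknowledge" immediately; the data only reaches the store on flush().
        self.pending.setdefault(fd, deque()).append((offset, bytes(data)))

    def flush(self, fd: int) -> None:
        # Drain only this fd's queue, in FIFO order; other fds are not consulted.
        queue = self.pending.get(fd, deque())
        while queue:
            offset, data = queue.popleft()
            self.store[offset:offset + len(data)] = data


def run(flush_order):
    m = WriteBehindModel(4)
    m.write(1, 0, b"old!")  # issued first, on fd 1
    m.write(2, 0, b"new!")  # issued second, on fd 2, same byte range
    for fd in flush_order:
        m.flush(fd)
    return bytes(m.store)


print(run([1, 2]))  # b'new!'  (flushed in issue order)
print(run([2, 1]))  # b'old!'  (fd 2 flushed first, so the older data lands last)
```

Keeping the newer data on top in this model means fd 2's flush has to wait until fd 1's overlapping write has landed, which is exactly the cross-fd coupling (and the resulting buffer pressure) the quoted paragraph is worried about.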

IEEE 1003.1, 2013 edition
http://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html

> After a write() to a regular file has successfully returned:
> 
> Any successful read() from each byte position in the file that was
> modified by that write shall return the data specified by the write()
> for that position until such byte positions are again modified.
>
> Any subsequent successful write() to the same byte position in the
> file shall overwrite that file data.

Note that the reference is to a *file*, not to a file *descriptor*.
It's an application of the general POSIX assumption that time is
simple, locking is cheap (if it's even necessary), and therefore
time-based requirements like linearizability (which is what this is)
are easy to satisfy.  I know that's not very realistic nowadays, but
the text is pretty clear: according to the standard as it's still
written, P2's later write *is* required to overwrite P1's earlier one.
Whether the writes come through the same or different fds, or from
the same or different processes/threads, doesn't even come into play.
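To make the per-file (rather than per-fd) requirement concrete, here is a minimal sketch in Python, assuming a Unix system where `os.pwrite` and `os.pread` are available. The helper name `overwrite_via_two_fds` and the `P1P1`/`P2P2` payloads are made up for illustration. It opens the same file twice for writing, writes the same byte positions through each descriptor in turn, and reads them back through a third descriptor.

```python
import os
import tempfile


def overwrite_via_two_fds(path: str) -> bytes:
    """Hypothetical helper: write the same 4 bytes through two separately
    opened descriptors, then read them back through a third, without
    closing anything first."""
    fd1 = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    fd2 = os.open(path, os.O_RDWR)    # a second, independent open of the same file
    fd3 = os.open(path, os.O_RDONLY)  # a reader sharing no fd with either writer
    try:
        os.pwrite(fd1, b"P1P1", 0)    # "P1's" write, which returns first
        os.pwrite(fd2, b"P2P2", 0)    # "P2's" later write to the same byte positions
        return os.pread(fd3, 4, 0)    # per the quoted text, must return P2's data
    finally:
        for fd in (fd1, fd2, fd3):
            os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(overwrite_via_two_fds(os.path.join(d, "testfile")))  # b'P2P2' on a local filesystem
```

Reading through a third descriptor before any close() keeps the check tied to "after a write() ... has successfully returned" rather than to close-time flushing. Pointing `path` at a file on a network or FUSE mount is where the result becomes interesting.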

Just for fun, I'll point out that the standard snippet above
doesn't say anything about *non-overlapping* writes.  Does POSIX
allow the following?

   write A
   write B
   read B, get new value
   read A, get *old* value

This is a non-linearizable result, which would surely violate
some people's (notably POSIX authors') expectations, but good
luck finding anything in that standard which actually precludes
it.
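One way to see why that history is non-linearizable: if each operation completes before the next one starts, linearizability reduces to checking every read against a single-copy model of the file, in which a read returns whatever the most recent completed write put at that location. The sketch below is a hypothetical checker in Python; the name `first_stale_read` and the string locations "A" and "B" (standing in for two non-overlapping byte ranges) are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Each op is (kind, location, value), listed in real-time order, and
# assumed not to overlap in time (each returns before the next begins).
Op = Tuple[str, str, str]


def first_stale_read(history: List[Op], initial: Dict[str, str]) -> Optional[int]:
    """Return the index of the first read that does not see the latest
    completed write to its location, or None if the history matches a
    single-copy (sequential) file."""
    current = dict(initial)
    for i, (kind, loc, value) in enumerate(history):
        if kind == "write":
            current[loc] = value
        elif kind == "read":
            if current.get(loc) != value:
                return i
        else:
            raise ValueError(f"unknown op kind: {kind}")
    return None


# The history from the example: write A, write B, read B (new), read A (old).
history = [
    ("write", "A", "new"),
    ("write", "B", "new"),
    ("read", "B", "new"),
    ("read", "A", "old"),
]
print(first_stale_read(history, {"A": "old", "B": "old"}))  # 3
```

The checker encodes the linearizable expectation; whether POSIX's wording actually forbids the flagged read is the open question above.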
