[Gluster-devel] relative ordering of writes to same file from two different fds

Raghavendra Gowdappa rgowdapp at redhat.com
Wed Sep 21 05:06:40 UTC 2016


Hi all,

This mail is to figure out the expected behavior of writes to the same file from two different fds. As Ryan says in one of the comments on [1]:

<comment>

I think it’s not safe. in this case:
1. P1 write to F1 use FD1
2. after P1 write finish, P2 write to the same place use FD2
since they are not conflict with each other now, the order the 2 writes send to underlying fs is not determined. so the final data may be P1’s or P2’s.
this semantics is not the same with linux buffer io. linux buffer io will make the second write cover the first one, this is to say the final data is P2’s.
you can see it from linux NFS (as we are all network filesystem) fs/nfs/file.c:nfs_write_begin(), nfs will flush ‘incompatible’ request first before another write begin. the way 2 request is determine to be ‘incompatible’ is that they are from 2 different open fds.
I think write-behind behaviour should keep the same with linux page cache.

</comment>
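
To make the scenario in the comment concrete, here is a minimal sketch in C. It assumes a POSIX system, and it uses a single process with two independent open() calls to stand in for P1 and P2. The function name two_fd_overwrite and the test path are made up for illustration. It only shows the sequence of calls (first write completes, then the second write hits the same range through another fd); it says nothing by itself about what write-behind does.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write `first` through one fd, then `second` through a
 * second fd opened separately on the same file. Each open() yields its own
 * file offset starting at 0, so both writes target the same byte range.
 * The final contents are read back into `out` (at least `len` bytes).
 * Returns 0 on success, -1 on any error. */
int two_fd_overwrite(const char *path, const char *first, const char *second,
                     size_t len, char *out)
{
    int rc = -1;
    int rfd = -1;
    int fd1 = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
    int fd2 = open(path, O_WRONLY);

    if (fd1 < 0 || fd2 < 0)
        goto done;

    /* Step 1: "P1" writes via FD1 and the call returns. */
    if (write(fd1, first, len) != (ssize_t)len)
        goto done;

    /* Step 2: only after that, "P2" writes the same range via FD2. */
    if (write(fd2, second, len) != (ssize_t)len)
        goto done;

    /* Read back what the file now holds. */
    rfd = open(path, O_RDONLY);
    if (rfd < 0 || read(rfd, out, len) != (ssize_t)len)
        goto done;

    rc = 0;
done:
    if (rfd >= 0)
        close(rfd);
    if (fd2 >= 0)
        close(fd2);
    if (fd1 >= 0)
        close(fd1);
    return rc;
}
```

On a local Linux filesystem with buffered I/O, the read-back is expected to return the second buffer, which is the behavior the comment asks write-behind to match.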

However, my understanding is that filesystems need not maintain the relative order of writes (as received from the VFS/kernel) on two different fds. Also, if we have to maintain the order, it might come with increased latency, because "newer" writes would have to wait on "older" ones. These waiting writes can fill up the write-behind buffer, and once the cache is full, write-behind can no longer "write-back" newer writes.

* What does POSIX say about it?
* How do other filesystems behave in this scenario?


Also, the current write-behind implementation has the concept of "generation numbers". To quote the comment from the source:

<write-behind src>

        uint64_t     gen;    /* Liability generation number. Represents
                                the current 'state' of liability. Every
                                new addition to the liability list bumps
                                the generation number.

                                a newly arrived request is only required
                                to perform causal checks against the entries
                                in the liability list which were present
                                at the time of its addition. the generation
                                number at the time of its addition is stored
                                in the request and used during checks.

                                the liability list can grow while the request
                                waits in the todo list waiting for its
                                dependent operations to complete. however
                                it is not of the request's concern to depend
                                itself on those new entries which arrived
                                after it arrived (i.e, those that have a
                                liability generation higher than itself)
                             */
</src>
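
The rule described in that comment can be sketched roughly as below. This is a simplified, hypothetical model and not the actual write-behind code: the struct and function names (wb_model_req, wb_model_inode, wb_model_arrive, and so on), the fixed-size array, and the byte-range overlap test are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WB_MODEL_MAX 16 /* arbitrary capacity for this sketch */

/* Hypothetical request: a byte range plus a generation number.
 * For a liability entry, gen is the generation assigned when it was added.
 * For a waiting request, gen is the generation observed when it arrived. */
struct wb_model_req {
    uint64_t offset;
    uint64_t size;
    uint64_t gen;
};

/* Hypothetical per-inode state: current generation and the liability list. */
struct wb_model_inode {
    uint64_t gen;
    struct wb_model_req liability[WB_MODEL_MAX];
    size_t n_liability;
};

/* Every new addition to the liability list bumps the generation number. */
bool wb_model_add_liability(struct wb_model_inode *ino,
                            const struct wb_model_req *req)
{
    if (ino->n_liability == WB_MODEL_MAX)
        return false;
    ino->liability[ino->n_liability] = *req;
    ino->liability[ino->n_liability].gen = ++ino->gen;
    ino->n_liability++;
    return true;
}

/* A newly arrived request stores the generation current at its arrival. */
void wb_model_arrive(const struct wb_model_inode *ino, struct wb_model_req *req)
{
    req->gen = ino->gen;
}

static bool wb_model_overlap(const struct wb_model_req *a,
                             const struct wb_model_req *b)
{
    return a->offset < b->offset + b->size && b->offset < a->offset + a->size;
}

/* A request depends only on overlapping liabilities that were already
 * present when it arrived; entries with a higher generation are ignored. */
bool wb_model_must_wait(const struct wb_model_inode *ino,
                        const struct wb_model_req *req)
{
    for (size_t i = 0; i < ino->n_liability; i++) {
        const struct wb_model_req *each = &ino->liability[i];
        if (each->gen > req->gen)
            continue; /* arrived after this request: not its concern */
        if (wb_model_overlap(each, req))
            return true;
    }
    return false;
}
```

In this model, a request that arrives before an overlapping liability is added never waits on it, while a request arriving afterwards does. Note that the check is keyed on arrival order and byte range only; nothing in it refers to the fd.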

So, if a single thread is doing writes on two different fds, generation numbers are sufficient to enforce the relative ordering. If writes are from two different threads/processes, I think write-behind is not obligated to maintain their order. Comments?

[1] http://review.gluster.org/#/c/15380/

regards,
Raghavendra

