[Gluster-devel] Eager-lock and nfs graph generation
Pranith Kumar K
pkarampu at redhat.com
Tue Feb 19 11:59:40 UTC 2013
On 02/19/2013 11:26 AM, Anand Avati wrote:
>
> Thinking over this, looks like there is a problem!
>
> Write-behind's guarantee: a second write request arriving after the
> acknowledgement of a first overlapping request (whether
> written-behind or otherwise) is fulfilled in the backend in the same
> order (i.e., the second overlapping request is "serialized" behind
> the first one in the fulfillment process).
>
> Eager-lock requirement: That write-behind will send no two write
> requests on an overlapping region at the same time.
>
> The requirement-set and guarantee-set have a big overlap, but the
> requirement-set is not a subset of the guarantee-set.
>
> This is because of O_SYNC writes. Write-behind performs
> write-serialization at fulfillment only for written-behind requests
> (which are covered by the conflict-detection code during liability
> fulfillment). However, if two threads (or apps) issue overlapping
> O_SYNC writes to the same region at approximately the same time,
> write-behind will let both of them through, without any kind of
> serialization, into eager-lock, violating the assumption!
>
> I'm wondering if it is a safer idea to implement overlap checks within
> eager-lock code itself rather than depend on write-behind :|
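>
> (The check itself is cheap; a minimal sketch in C, with illustrative
> names rather than the actual eager-lock code:
>
>     /* Two byte ranges [off1, off1+len1) and [off2, off2+len2)
>        overlap iff each one starts before the other ends. */
>     static inline int
>     writes_overlap (off_t off1, size_t len1, off_t off2, size_t len2)
>     {
>             return (off1 < (off_t) (off2 + len2)) &&
>                    (off2 < (off_t) (off1 + len1));
>     }
>
> The real cost is tracking in-flight writes per inode and queueing a
> conflicting write until the earlier one unwinds.)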
>
> Avati
>
>
> On Mon, Feb 11, 2013 at 10:07 PM, Anand Avati
> <anand.avati at gmail.com> wrote:
>
>
>
> On Mon, Feb 11, 2013 at 9:32 PM, Pranith Kumar K
> <pkarampu at redhat.com> wrote:
>
> hi,
> Please note that this is a theoretical case and I have not run
> into such a situation, but I feel it is important to address it.
> A configuration with "eager-lock on" and "write-behind off"
> should not be allowed, as it leads to lock synchronization
> problems which in turn lead to data inconsistency among replicas
> in NFS. Let's say bricks b1 and b2 are in replication.
> The Gluster NFS server uses one anonymous fd to perform all
> write fops. If eager-lock is enabled in AFR, the fd's address is
> used as the lock-owner, which is the same for all write fops, so
> there will never be any inodelk contention. If write-behind is
> disabled, there can be writes that overlap. (Does NFS make sure
> that the ranges don't overlap?)
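>
> (A rough sketch of the lk-owner point; this is illustrative, not
> the exact afr/nfs code:
>
>     /* every write on the same anonymous fd carries the same
>        lk-owner, so inodelk never sees contention between them */
>     set_lk_owner_from_ptr (&frame->root->lk_owner, fd);
>
> Since Gluster NFS funnels all writes through one anonymous fd,
> every write fop ends up with an identical lk-owner.)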
>
> Now imagine the following scenario:
> Let's say w1 and w2 are two write fops on the same offset and
> length: w1 with all '0's and w2 with all '1's. If these two write
> fops are executed in two different threads, the order of arrival
> of the write fops on b1 can be w1, w2 whereas on b2 it is w2, w1,
> leading to data inconsistency between the two replicas. Lock
> contention will not happen, as both the lk-owner and the
> transport are the same for these two fops.
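>
> (A toy illustration of that outcome, outside gluster entirely:
>
>     char b1[8], b2[8];
>     memset (b1, '0', 8);  memset (b1, '1', 8);  /* b1: w1 then w2 */
>     memset (b2, '1', 8);  memset (b2, '0', 8);  /* b2: w2 then w1 */
>     /* b1 is now all '1's, b2 all '0's - the replicas disagree */
>
> Both bricks applied the same two writes, only in different orders.)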
>
>
> Write-behind has two functions - a) performing operations in the
> background and b) serializing overlapping operations.
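>
> (For (b), roughly: a new request stalls while any in-flight request
> conflicts with it - a sketch with made-up names, not write-behind's
> actual structures:
>
>     list_for_each_entry (req, &inflight_writes, list) {
>             if (writes_overlap (req->offset, req->size,
>                                 new->offset, new->size)) {
>                     new->stalled = 1;  /* fulfil only after 'req' */
>                     break;
>             }
>     }
>
> )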
>
> While the problem does exist, the specifics are different from
> what you describe. Since all writes coming in from NFS always
> use the same anonymous FD, two near-in-time overlapping writes
> will never contend with inodelk(); instead, the second write
> will inherit the lock and changelog from the first. In either
> case, it is a problem.
>
> We can add a check in glusterd for volume set to disallow such
> a configuration, BUT by default write-behind is off in the NFS
> graph and by default eager-lock is on. So we should either turn
> on write-behind for NFS or turn off eager-lock by default.
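>
> For concreteness, the combination that would have to be rejected
> is (option names as used by volume set):
>
>     gluster volume set <VOLNAME> cluster.eager-lock on
>     gluster volume set <VOLNAME> performance.write-behind off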
>
> Could you please suggest how to proceed with this, if you agree
> that I have not missed any important detail that invalidates
> this theory.
>
>
> Loading the write-behind xlator in the NFS graph seems like the
> simpler solution; eager-locking is crucial for replicated NFS
> write performance.
>
> Avati
>
>
Shall we disable eager-lock for files opened with O_SYNC, for now?
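Something along these lines, perhaps (a rough sketch; the flag check
and field names are guesses, not the actual afr code):

    /* skip eager-lock for fds opened with O_SYNC, so overlapping
       synchronous writes still contend on inodelk and serialize */
    if (priv->eager_lock && !(fd->flags & O_SYNC))
            local->transaction.eager_lock_on = _gf_true;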
Pranith