[Gluster-devel] Eager-lock and nfs graph generation
anand.avati at gmail.com
Tue Feb 19 05:56:13 UTC 2013
Thinking over this, looks like there is a problem!
Write-behind guarantees: That a second write request arriving after the
acknowledgement of a first overlapping request (whether written-behind or
otherwise) will be fulfilled in the backend in the same order (i.e., the
second overlapping request will be "serialized" behind the first one in
the fulfillment process).
Eager-lock requirement: That write-behind will send no two write requests
on an overlapping region at the same time.
The requirement-set and guarantee-set have a big overlap, but the
requirement-set is not a subset of the guarantee-set.
This is because of O_SYNC writes. Write-behind performs write serialization
at fulfillment only for written-behind requests (which get covered under
the conflict detection code during liability fulfillment). However, if two
threads (or apps) issue overlapping O_SYNC writes to the same region at
approximately the same time, then write-behind will let both of them go by
without any kind of serialization, into eager-lock, violating its assumptions!
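
To make the consequence concrete, here is a minimal sketch (plain Python,
not GlusterFS code) of two replicas applying the same two overlapping writes
in opposite orders. The helper apply_writes and the 4-byte buffers are
illustrative assumptions; the w1/w2 contents follow Pranith's example quoted
further down.

```python
def apply_writes(size, writes):
    """Apply (offset, data) writes, in the given order, to a zero-filled buffer."""
    buf = bytearray(size)
    for offset, data in writes:
        buf[offset:offset + len(data)] = data
    return bytes(buf)


# Two writes to the same offset and length, as in the quoted scenario.
w1 = (0, b"0" * 4)  # all '0's
w2 = (0, b"1" * 4)  # all '1's

# With no serialization, each brick may see a different arrival order.
b1 = apply_writes(4, [w1, w2])  # brick b1: w1 then w2
b2 = apply_writes(4, [w2, w1])  # brick b2: w2 then w1

replicas_diverged = b1 != b2
```

The last write wins on each brick, so b1 ends up holding w2's data while b2
holds w1's, and the replicas disagree.
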
I'm wondering if it is a safer idea to implement overlap checks within
eager-lock code itself rather than depend on write-behind :|
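
A rough sketch of what such an overlap check could look like, again in plain
Python rather than actual xlator code. OverlapSerializer, submit() and
complete() are hypothetical names, and the sketch ignores locking, error
paths and the real afr transaction machinery. The idea is only that a write
is held back while any earlier overlapping write is still in flight or
queued, so overlapping writes leave in arrival order.

```python
from collections import deque


def overlaps(a_off, a_len, b_off, b_len):
    """True if byte ranges [a_off, a_off+a_len) and [b_off, b_off+b_len) intersect."""
    return a_off < b_off + b_len and b_off < a_off + a_len


class OverlapSerializer:
    """Hypothetical helper: never lets two overlapping writes be in flight at once."""

    def __init__(self):
        self.in_flight = []     # (offset, length, tag) of writes already sent down
        self.waiting = deque()  # held-back writes, in arrival order

    def submit(self, offset, length, tag):
        """Return True if the write may be sent now, False if it was queued."""
        earlier = self.in_flight + list(self.waiting)
        if any(overlaps(offset, length, o, l) for o, l, _ in earlier):
            self.waiting.append((offset, length, tag))
            return False
        self.in_flight.append((offset, length, tag))
        return True

    def complete(self, tag):
        """Mark a write as done; return tags of queued writes released by it."""
        self.in_flight = [w for w in self.in_flight if w[2] != tag]
        released = []
        still_waiting = deque()
        for w in self.waiting:
            # A queued write stays blocked by in-flight writes and by any
            # earlier queued write it overlaps, which preserves arrival order.
            blockers = self.in_flight + list(still_waiting)
            if any(overlaps(w[0], w[1], o, l) for o, l, _ in blockers):
                still_waiting.append(w)
            else:
                self.in_flight.append(w)
                released.append(w[2])
        self.waiting = still_waiting
        return released
```

With this, submitting w1 and then an overlapping w2 sends only w1; w2 is
released when complete("w1") is called, while a non-overlapping write in
between goes through immediately.
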
On Mon, Feb 11, 2013 at 10:07 PM, Anand Avati <anand.avati at gmail.com> wrote:
> On Mon, Feb 11, 2013 at 9:32 PM, Pranith Kumar K <pkarampu at redhat.com> wrote:
>> Please note that this is a case in theory and I did not run into such a
>> situation, but I feel it is important to address this.
>> A configuration with "eager-lock on" and "write-behind off" should not be
>> allowed, as it leads to lock synchronization problems which in turn lead
>> to data inconsistency among replicas in NFS.
>> Let's say bricks b1 and b2 are in replication.
>> The Gluster NFS server uses one anonymous fd to perform all write fops. If
>> eager-lock is enabled in afr, the fd's address is used as the lock-owner,
>> which will be the same for all write fops, so there will never be any
>> inodelk contention. If write-behind is disabled, there can be writes that
>> overlap. (Does NFS make sure that the ranges don't overlap?)
>> Now imagine the following scenario:
>> Let's say w1 and w2 are two write fops on the same offset and length, w1
>> with all '0's and w2 with all '1's. If these two write fops are executed in
>> two different threads, the order of arrival of the write fops on b1 can be
>> w1, w2 whereas on b2 it is w2, w1, leading to data inconsistency between
>> the two replicas. The lock contention will not happen, as both lk-owner
>> and transport are the same for these two fops.
> Write-behind has two functions - a) performing operations in the background
> and b) serializing overlapping operations.
> While the problem does exist, the specifics are different from what you
> describe. Since all writes coming in from NFS will always use the same
> anonymous FD, two near-in-time/overlapping writes will never contend on
> inodelk(); instead, the second write will inherit the lock and changelog
> from the first. In either case, it is a problem.
>> We can add a check in glusterd for volume set to disallow such
>> configuration, BUT by default write-behind is off in nfs graph and by
>> default eager-lock is on. So we should either turn on write-behind for nfs
>> or turn off eager-lock by default.
>> Could you please suggest how to proceed with this if you agree that I did
>> not miss any important detail that makes this theory invalid.
> Loading the write-behind xlator in the NFS graph looks like a simpler
> solution. Eager-locking is crucial for replicated NFS write performance.