[Gluster-devel] Non-blocking lock for renames
vbellur at redhat.com
Fri Feb 5 04:36:07 UTC 2016
On 02/04/2016 12:58 AM, Raghavendra Gowdappa wrote:
> ----- Original Message -----
>> From: "Vijay Bellur" <vbellur at redhat.com>
>> To: "Shyamsundar Ranganathan" <srangana at redhat.com>, "Raghavendra Gowdappa" <rgowdapp at redhat.com>
>> Cc: "Gluster Devel" <gluster-devel at gluster.org>
>> Sent: Thursday, February 4, 2016 9:55:04 AM
>> Subject: Non-blocking lock for renames
>> DHT developers,
>> With 3.6, we introduced a non-blocking lock prior to a rename operation
>> in dht, and we fail the rename if the lock acquisition is not successful.
>> I ran into a user on IRC yesterday who is affected by this behavior change:
>> "We're seeing a behavior in Gluster 3.7.x that we did not see in 3.4.x
>> and we're not sure how to fix it. When multiple processes are attempting
>> to rename a file to the same destination at once, we're now seeing
>> "Device or resource busy" and "Stale file handle" errors. Here's the
>> command to replicate it: cd /mnt/glustermount; while true; do
>> FILE=$RANDOM; touch $FILE; mv $FILE file-fv; done. The above command
>> would be run on two or three servers within the same gluster cluster. In
>> the output, one would always be successful in the rename, while the
>> other two would fail with the above errors."
>> The use case for concurrent renames was described as:
>> "we generate files and push them to the gluster cluster. Some are
>> generated multiple times and end up being pushed to the cluster at the
>> same time by different data generators; resulting in the 'rename
>> collision'. We also use cluster.extra-hash-regex to make sure the
>> data is written in place. And this does the rename."
>> Is a non-blocking lock essential? Can we not use a blocking lock instead
>> of a non-blocking lock, or fall back to a blocking lock if the original
>> non-blocking lock acquisition fails?
> This lock synchronizes:
> 1. a rename from the application with file migration by the rebalance process.
> 2. multiple renames from the application on the same file.
> I think the lock is still required for 1. However, since migration can potentially take a long time, we chose a non-blocking lock to make sure the application is not blocked for a long period.
Since rebalance already involves reduced performance, and if
performance/latency is the only reason why we have non-blocking locks, I
would prefer that we block a rename during rebalance and preserve
application continuity.
> Case 2 is what is causing the issue mentioned in this thread. We did see some files being removed with parallel renames on the same file. But by the time we had identified that it is a bug in 'mv' (mv issues an unlink on src if src and dst happen to be hardlinks, but the hardlink check and the unlink are not atomic; DHT breaks a rename into a series of links and unlinks), we had already introduced synchronization between renames. So, we have two options:
> 1. Use different domains for use cases 1 and 2 above. With different domains, use-case 2 above can be changed to use blocking locks. It might not be advisable to use blocking locks for use-case 1.
> 2. Since we identified the issue is with mv (I couldn't find another bug we filed on mv, but  is close to it), probably we don't need locking in 2 at all.
I would still preserve locking for 2, as the mv fixes are unlikely to
hit all releases of all distributions. If we change the rename lock to
be blocking, I feel that we would be covering both 1 and 2 while
preserving application continuity.
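
To make the "fall back to a blocking lock" idea concrete, here is a minimal
sketch. It is only a local analogy: it uses advisory flock(2) locks on a
single Linux/Unix machine through Python's fcntl module, not the inodelk
calls that DHT actually issues. The function name and the lock file path
are hypothetical, chosen purely for illustration.

```python
import fcntl
import os


def rename_with_lock_fallback(lock_path, src, dst):
    """Rename src to dst while holding an exclusive advisory lock.

    Hypothetical helper, not Gluster code. Tries a non-blocking lock
    first; if the lock is already held, waits on a blocking lock instead
    of failing the rename. Returns True if it had to wait.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    waited = False
    try:
        try:
            # Fast path: take the lock only if nobody else holds it.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            # Contended: block until the holder releases the lock.
            waited = True
            fcntl.flock(fd, fcntl.LOCK_EX)
        os.rename(src, dst)
    finally:
        # Release the lock explicitly, then close the descriptor.
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
    return waited
```

With this shape, a concurrent renamer is delayed rather than handed an
error, which is the application-continuity trade-off argued for above. The
cost is that the caller can wait as long as the current holder keeps the
lock, which is the concern raised for the rebalance case.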