[Gluster-devel] Non-blocking lock for renames

Thu Feb 4 10:23:22 UTC 2016

----- Original Message -----
> From: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> To: "Vijay Bellur" <vbellur at redhat.com>
> Cc: "Gluster Devel" <gluster-devel at gluster.org>
> Sent: Thursday, February 4, 2016 6:58:29 AM
> Subject: Re: [Gluster-devel] Non-blocking lock for renames
> 
> 
> 
> ----- Original Message -----
> > From: "Vijay Bellur" <vbellur at redhat.com>
> > To: "Shyamsundar Ranganathan" <srangana at redhat.com>, "Raghavendra Gowdappa"
> > <rgowdapp at redhat.com>
> > Cc: "Gluster Devel" <gluster-devel at gluster.org>
> > Sent: Thursday, February 4, 2016 9:55:04 AM
> > Subject: Non-blocking lock for renames
> > 
> > DHT developers,
> > 
> > Since 3.6, DHT takes a non-blocking lock before a rename operation and
> > fails the rename if the lock cannot be acquired. I ran into a user on
> > IRC yesterday who is affected by this behavior change:
> > 
> > "We're seeing a behavior in Gluster 3.7.x that we did not see in 3.4.x
> > and we're not sure how to fix it. When multiple processes are attempting
> > to rename a file to the same destination at once, we're now seeing
> > "Device or resource busy" and "Stale file handle" errors. Here's the
> > command to replicate it: cd /mnt/glustermount; while true; do
> > FILE=$RANDOM; touch $FILE; mv $FILE file-fv; done. The above command
> > would be run on two or three servers within the same gluster cluster. In
> > the output, one would always be successful in the rename, while the 2
> > other ones would fail with the above error."
> > 
> > The use case for concurrent renames was described as:
> > 
> > "we generate files and push them to the gluster cluster. Some are
> > generated multiple times and end up being pushed to the cluster at the
> > same time by different data generators; resulting in the 'rename
> > collision'. We also use the cluster.extra-hash-regex to make sure the
> > data is written in place. And this does the rename."
> > 
> > Is a non-blocking lock essential? Can we not use a blocking lock instead,
> > or fall back to a blocking lock if the initial non-blocking lock
> > acquisition fails?
> 
> This lock synchronizes:
> 1. a rename from the application with file migration by the rebalance
> process [1].
> 2. multiple renames from the application on the same file.

Hi,

We've seen this behavior very recently when we had multiple instances of object servers on different nodes, each with its own FUSE mount. During our tests, we often see many object PUTs fail because rename() returns EBUSY or ESTALE (which we don't catch as of today). I'm certain that there was no rebalance happening during that time, and we don't use the "mv" command for rename. The object server does a series of mkdir() calls, followed by unique temp file creation and finally rename(). In our particular test, the final file path was also unique, so these are not multiple renames on the "same file". I'll try to reproduce this later and provide logs.
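
Until the cause is understood, a caller could at least catch these two errnos and retry. The following is only a minimal sketch of that idea, assuming a retry is acceptable for the application; `rename_with_retry`, the attempt count, and the delay are hypothetical and not part of any existing object server code.

```python
import errno
import os
import time

# The two errnos reported in this thread for rename() on the FUSE mount.
RETRYABLE = {errno.EBUSY, errno.ESTALE}


def rename_with_retry(src, dst, attempts=5, delay=0.05, rename=os.rename):
    """Rename src to dst, retrying on EBUSY/ESTALE.

    Hypothetical helper: the attempt count and delay are arbitrary, and
    retrying is an assumption, not a confirmed fix for the DHT behavior.
    Returns the number of attempts that were needed.
    """
    for attempt in range(1, attempts + 1):
        try:
            rename(src, dst)
            return attempt
        except OSError as exc:
            # Give up on any other error, or once the attempts are used up.
            if exc.errno not in RETRYABLE or attempt == attempts:
                raise
            time.sleep(delay * attempt)  # simple linear backoff
```

The `rename` parameter exists only so the helper can be exercised without a Gluster mount. A retry after a failed rename may itself fail with a different error if the first attempt partially succeeded, so this is a workaround to evaluate, not a replacement for fixing the locking.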

> 
> I think the lock is still required for 1. However, since migration can
> potentially take a long time, we chose a non-blocking lock to make sure
> the application is not blocked for a long period.
> 
> Case 2 is what is causing the issue mentioned in this thread. We did see
> some files being removed with parallel renames on the same file. But, by the
> time we had identified that it's a bug in 'mv' (mv issues an unlink on src if
> src and dst happen to be hardlinks [2], but the hardlink check and the
> unlink are not atomic; DHT breaks rename into a series of links and
> unlinks), we had already introduced synchronization b/w renames. So, we have
> two options:
> 
> 1. Use different domains for use cases 1 and 2 above. With different domains,
> use-case 2 above can be changed to use blocking locks. It might not be
> advisable to use blocking locks for use-case 1.
> 2. Since we identified that the issue is with mv (I couldn't find another
> bug we filed on mv, but [2] is close to it), we probably don't need locking
> for case 2 at all.
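
Option 1 can be illustrated with two independent locks. This is only an in-process model using Python threading locks, not DHT's inodelk code; the lock names, the `Busy` exception, and the acquisition order are assumptions made for the sketch.

```python
import threading

# Two independent lock "domains", modelled as plain in-process locks.
# In DHT these would be separate lock domains; the names are hypothetical.
migration_lock = threading.Lock()  # use case 1: rename vs. rebalance migration
rename_lock = threading.Lock()     # use case 2: rename vs. rename


class Busy(Exception):
    """Stands in for failing the rename with EBUSY."""


def rename_under_locks(do_rename):
    # Use case 2: blocking, so concurrent renames queue up instead of failing.
    with rename_lock:
        # Use case 1: non-blocking, so a long migration never stalls the caller.
        if not migration_lock.acquire(blocking=False):
            raise Busy("file is being migrated")
        try:
            return do_rename()
        finally:
            migration_lock.release()
```

In this model the rename domain is taken first, so the only possible contender for the migration lock is the migration itself; two application renames never fail each other, they just serialize.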
> 
> Suggestions?
> 
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=969298#c8
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=438076
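
The check-then-unlink sequence attributed to mv above can be modelled as follows. This is a simplified illustration of where the race window sits, not GNU mv's actual code; `mv_like` is a hypothetical name.

```python
import os


def mv_like(src, dst):
    """Simplified model of the mv behavior described above (not GNU mv's code)."""
    if os.path.exists(dst) and os.path.samefile(src, dst):
        # src and dst are hardlinks to the same inode, so the source name is
        # removed explicitly instead of being renamed.
        # Race window: another client can change what dst points to between
        # the samefile() check above and the unlink() below.
        os.unlink(src)
        return "unlinked-src"
    os.rename(src, dst)
    return "renamed"
```

If a rename is implemented as a series of links and unlinks, as described for DHT, a concurrent client can observe src and dst as hardlinks in the middle of another client's rename, which is how the check above can end up removing a name it should not.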
> 
> regards,
> Raghavendra
> > 
> > Thanks,
> > Vijay
> > 
> > 
> > 
> > 
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>