[Bugs] [Bug 1425421] New: Moving multiple temporary files to the same destination concurrently causes ESTALE error

bugzilla at redhat.com
Tue Feb 21 12:10:09 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1425421

            Bug ID: 1425421
           Summary: Moving multiple temporary files to the same
                    destination concurrently causes ESTALE error
           Product: Red Hat Gluster Storage
           Version: 3.2
         Component: distribute
          Keywords: Triaged
          Severity: high
          Assignee: nbalacha at redhat.com
          Reporter: tdesala at redhat.com
        QA Contact: tdesala at redhat.com
                CC: bugs at gluster.org, khiremat at redhat.com,
                    nbalacha at redhat.com, pkarampu at redhat.com,
                    rbhat at redhat.com, rgowdapp at redhat.com,
                    rhs-bugs at redhat.com,
                    simon.turcotte-langevin at ubisoft.com,
                    storage-qa-internal at redhat.com
        Depends On: 1378550



+++ This bug was initially created as a clone of Bug #1378550 +++

Description of problem:
We have an application that relies on POSIX atomic move semantics. We therefore
allow the same file to be uploaded multiple times, since each upload can be
committed atomically to the file system. However, when multiple clients try to
upload the same file concurrently, some of them get an ESTALE error on the move
operation.
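
For readers unfamiliar with the pattern, the upload scheme described above can
be sketched roughly as follows. This is a minimal illustration under
assumptions, not the reporter's actual application: the function name
publish_atomically and the ".upload.XXXXXX" temporary-file prefix are
hypothetical. The temporary file is created in the destination directory so
that mv can use a plain rename rather than a copy.

```shell
# Hypothetical helper: write stdin to a temporary file next to the
# destination, then rename it into place in a single step.
publish_atomically() {
    dest=$1
    dir=$(dirname -- "$dest")
    # Same directory as the destination, so mv stays within one file system.
    tmp=$(mktemp "$dir/.upload.XXXXXX") || return 1
    if cat > "$tmp" && mv -f -- "$tmp" "$dest"; then
        return 0
    fi
    # The write or the rename failed: do not leave the temporary file behind.
    rm -f -- "$tmp"
    return 1
}
```

Usage would look like: echo "some data" | publish_atomically ./test
On a POSIX file system, readers of ./test are expected to see either the old
or the new content, never a partial write.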

Version-Release number of selected component (if applicable):
3.7.5, 3.8.4

How reproducible:
It can be reproduced by concurrently creating many temporary files on multiple
machines and trying to move them all to the same final location.

Steps to Reproduce:
1. Log on to multiple machines.
2. On each machine, execute:
   while true; do uuid="`uuidgen`"; echo "some data" > "test$uuid"; mv "test$uuid" "test" -f; done &
3. Wait until the move command fails.

Actual results:
mv: cannot move ‘test5f4c981f-efcb-4ba8-b017-cf4acb76abcc’ to ‘test’: No such file or directory
mv: cannot move ‘test7cf00867-4982-4206-abcf-e5e836460eda’ to ‘test’: No such file or directory
mv: cannot move ‘testcacb6c40-c164-435f-b118-7a14687bf4bd’ to ‘test’: No such file or directory
mv: cannot move ‘test956ff19d-0a16-49bd-a877-df18311570dc’ to ‘test’: No such file or directory
mv: cannot move ‘test6e36eb01-9e54-4b50-8de8-cebb063554ba’ to ‘test’: Structure needs cleaning

Expected results:
No output, because no move should fail.

Additional info:

--- Additional comment from Pranith Kumar K on 2016-10-17 05:22:19 EDT ---

Du, Nitya,
       Based on my debugging, the inodelk keeps failing with ESTALE. In
dht_rename(), the inodelk is taken on both the source and the destination
inodes. Because the test above lets the other 'while ()...' process delete the
file we are trying to lock (by renaming over it), the inodelk fails with ESTALE.
When I changed the test to rename to independent filenames, everything worked
as expected.
On mount1:
while true; do uuid="`uuidgen`"; echo "some data" > "test$uuid"; mv "test$uuid" "test" -f; done

On mount2:
while true; do uuid="`uuidgen`"; echo "some data" > "test$uuid"; mv "test$uuid" "test2" -f; done

Not sure how to fix this in DHT though. For now re-assigning the bug to DHT.
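
The underlying effect, that a rename onto an existing name replaces the file
living at that name, can be observed on any local POSIX file system. The sketch
below is only a local analogy: Gluster identifies files by GFID rather than by
local inode number, and the variable and helper names here are made up for the
illustration. It shows that whatever a client looked up as "test" before the
other client's mv is no longer the object at that name afterwards.

```shell
# Local illustration only; this does not call any Gluster API.
workdir=$(mktemp -d)

inode_of() {
    # POSIX "ls -i" prints "<inode> <name>"; keep only the inode number.
    ls -i -- "$1" | awk '{print $1}'
}

echo "some data" > "$workdir/test"       # destination as one client sees it
old_dest_inode=$(inode_of "$workdir/test")

echo "some data" > "$workdir/testnew"    # another client's temporary file
new_src_inode=$(inode_of "$workdir/testnew")
mv -f "$workdir/testnew" "$workdir/test" # the rename replaces the destination

new_dest_inode=$(inode_of "$workdir/test")
echo "destination inode before: $old_dest_inode"
echo "destination inode after:  $new_dest_inode"
```

The two printed numbers differ, and the second one equals the inode of the
temporary file, so a lock request that still refers to the old destination
object has nothing left to lock.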

--- Additional comment from Raghavendra G on 2016-10-17 07:09:14 EDT ---

(In reply to Pranith Kumar K from comment #1)

Locking in dht_rename() has two purposes:
1. Serialize, and ensure the atomicity of, each rename when two parallel
renames are done on the same file.
2. Serialize a rename with file migration during rebalance.

The current use case falls into category 1. I think using entrylk instead of
inodelk would solve the problem, but I need to think more about this.
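
To make the idea concrete, here is a rough local analogy of locking on the
destination name instead of on the object currently stored under it. It assumes
that entrylk locks a name within its parent directory while inodelk locks an
inode. It is not DHT code and says nothing about how the eventual fix is
implemented: the helper rename_locked_by_name and the "<dest>.lock" directory
convention are hypothetical, and the fractional sleep is accepted by common
sleep implementations but is not required by POSIX.

```shell
# Hypothetical helper: serialize renames onto the same destination NAME.
# The lock is keyed on the path "$dest", not on whichever file currently
# lives there, so it stays valid even when the destination is replaced.
rename_locked_by_name() {
    src=$1
    dest=$2
    lock="$dest.lock"    # assumed naming convention for this sketch
    # mkdir is atomic: exactly one caller succeeds while the lock exists.
    until mkdir -- "$lock" 2>/dev/null; do
        sleep 0.1
    done
    mv -f -- "$src" "$dest"
    status=$?
    rmdir -- "$lock"
    return "$status"
}
```

Each loop iteration of the reproducer would then call
rename_locked_by_name "test$uuid" "test" instead of mv. Because the lock never
refers to the destination file itself, a concurrent rename cannot invalidate it.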

Assigning bug to Kotresh as he is working on synchronization issues.

--- Additional comment from Pranith Kumar K on 2016-10-17 08:10:22 EDT ---

(In reply to Raghavendra G from comment #2)

Just a word of caution: it is important to do this in a backward-compatible
way.


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1378550
[Bug 1378550] Moving multiple temporary files to the same destination
concurrently causes ESTALE error