[Bugs] [Bug 1191100] New: Data corruption due to lack of cache revalidation on open

bugzilla at redhat.com bugzilla at redhat.com
Tue Feb 10 12:56:22 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1191100

            Bug ID: 1191100
           Summary: Data corruption due to lack of cache revalidation on
                    open
           Product: GlusterFS
           Version: mainline
         Component: core
          Keywords: Patch, Triaged
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: ndevos at redhat.com
                CC: bugs at gluster.org, gluster-bugs at redhat.com,
                    kparthas at redhat.com, lmohanty at redhat.com,
                    ndevos at redhat.com, pspencer at fields.utoronto.ca
            Blocks: 1158120



+++ This bug was initially created as a clone of Bug #1158120 +++
+++                                                           +++
+++ Use this bug to post the patch for mainline.              +++

We've started to see data corruption happening when processes on different
hosts are accessing the same SQLite database, coordinating their efforts
through proper locking.

We eventually found the following simple scenario that reproduces the problem.
Here $D is a directory mounted via a native glusterfs mount, and $OTH is
another host that also has $D mounted and can be ssh'd to without a password
and reachable quickly (in < 0.3 seconds or so).

$ echo init > $D/f; cat $D/f; ssh $OTHER_HOST "echo second > $D/$f"; cat $D/f
init
init

If one does a directory listing before the final cat, then one gets the
expected result:

$ echo init > $D/f; cat $D/f; ssh $OTHER_HOST "echo second > $D/$f"; ls $D >
/dev/null; cat $D/f
init
second

Also, if one omits the first cat, or does another modification to the file
after it, then one also gets the expected result:

$ echo init > $D/f; ssh $OTHER_HOST "echo second > $D/$f"; cat $D/f
second

There seem to be several different bugs contributing to this problem and I will
post separate bug reports and patches for each of them.

This bug report concerns the case when the default volume options are used on
mount. Then fopen-keep-cache is used by default, and fuse does not flush its
cache from the first "cat $D/f" before doing the second "cat $D/f", and nothing
else has told fuse to invalidate its cache. That is why the incorrect data is
displayed.

When one does an "ls $D" prior to the second cat, then code in md-cache.c
invalidates the cache for every file within the directory whose contents have
changed. This then gets passed up to fuse-bridge to tell fuse to flush its page
cache.

However, no such check is done when opening a file, and I think it should be,
and also should be done when obtaining a lock on part of a file that is already
open.

The attached patch is what we are now using to avoid this problem. It may not
be implemented in the best way, as I was not sure how to access the iatt data
for the file on an open, so the patch, after a successful open callback, winds
an fstat fop down the stack, and, on return, checks the iatt values from that.
Better would be for the open fop to return the iatt value directly, the way
other fops do, since I think the storage backend does a stat anyway when
responding to an open call, and if it were to return that stat information
there would be no performance hit in using it. As it stands, the patch does
incur a slight performance penalty.

Even with this patch, the behaviour is still not always entirely correct due to
other bugs which I will report separately.

The patch does a similar thing for lock calls.

The patch is against 3.6.0beta3.

--- Additional comment from Philip Spencer on 2014-10-28 21:36:24 CET ---

Oops ... pretty fundamental typo in the first patch that forgot the "op_errno
!=0" to skip doing the iatt check if op_errno was nonzero! Would result in
locks appearing to succeed (returning 0) when in fact they failed!

Revised patch is attached.

--- Additional comment from krishnan parthasarathi on 2014-11-04 13:44:49 CET
---

Philip,

Thanks for opening this issue and posting a patch.  We use Gerrit
(review.gluster.org) as our collaborative code review tool. It would be helpful
if you could submit your patches there (Ref:
http://www.gluster.org/community/documentation/index.php/Simplified_dev_workflow).


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1158120
[Bug 1158120] Data corruption due to lack of cache revalidation on open
-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.


More information about the Bugs mailing list