[Bugs] [Bug 1220347] Read operation on a file which is in split-brain condition is successful

Mon May 11 12:44:51 UTC 2015

https://bugzilla.redhat.com/show_bug.cgi?id=1220347

Ravishankar N <ravishankar at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ravishankar at redhat.com

--- Comment #1 from Ravishankar N <ravishankar at redhat.com> ---
Observations from debugging the setup.

When debugging the mount process with gdb, it was observed that
afr_lookup_done() calls afr_inode_read_subvol_reset(); consequently, when
afr_read_txn() reaches afr_read_txn_refresh_done(), we bail out because there
are no read subvols, and the client gets EIO.
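The failing check can be pictured with a small C model. It is hypothetical: `struct model_inode_ctx`, `model_read_subvol_reset()` and `model_read_txn_pick_subvol()` are illustrative names, not GlusterFS functions, and the real AFR code keeps more per-inode state and applies a read policy instead of simply taking the first readable brick.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model, not GlusterFS source: readable[i] mirrors whether
 * AFR currently trusts child (brick) i as a read source for the inode. */
#define MODEL_MAX_CHILDREN 8

struct model_inode_ctx {
    int child_count;
    bool readable[MODEL_MAX_CHILDREN];
};

/* Stands in for afr_inode_read_subvol_reset(): once lookup has seen the
 * split-brain, no brick is trusted for reads. */
void model_read_subvol_reset(struct model_inode_ctx *ctx)
{
    assert(ctx->child_count <= MODEL_MAX_CHILDREN);
    for (int i = 0; i < ctx->child_count; i++)
        ctx->readable[i] = false;
}

/* Stands in for the check made on the read path: return a readable
 * child (here simply the first one), or -EIO when there is none. */
int model_read_txn_pick_subvol(const struct model_inode_ctx *ctx)
{
    assert(ctx->child_count <= MODEL_MAX_CHILDREN);
    for (int i = 0; i < ctx->child_count; i++)
        if (ctx->readable[i])
            return i;
    return -EIO;
}
```

With two readable bricks the model picks child 0; after the reset it returns -EIO, which is what the client saw while gdb was attached.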

When no gdb was attached, the client again began reading stale data. On further
examination, it was observed that fuse sends the following FOPs when 'cat' is
performed on the mount:

1) fuse_fop_resume-->fuse_lookup_resume
2) fuse_fop_resume-->fuse_open_resume
3) fuse_fop_resume-->fuse_getattr_resume-->afr_fstat-->afr_read_txn-->bail out with EIO
4) fuse_fop_resume-->fuse_flush_resume

However, when 'cat' was done in rapid succession, (3) was not called, i.e. only
fuse_lookup_resume, fuse_open_resume and fuse_flush_resume were called. Since
fuse did not send the getattr, the client never got the EIO, and the data was
served from the kernel cache. The data returned was always the content written
to the latest brick, "World" in this case.
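The FOP list and the rapid-succession case can be captured in a small, hypothetical C model. `model_cat()` records which resume functions one 'cat' triggers; `attrs_cached` stands in for whatever kernel state lets FUSE skip the getattr, and LOOKUP, OPEN and FLUSH are assumed to succeed, so only step (3) can surface the EIO. All names are illustrative, not FUSE or GlusterFS APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model, not FUSE or GlusterFS code. */
#define MODEL_MAX_FOPS 4

struct model_cat_trace {
    const char *fops[MODEL_MAX_FOPS]; /* resume functions, in order */
    int nfops;
    int result; /* 0: data returned to 'cat', -EIO: read failed */
};

static void model_push(struct model_cat_trace *t, const char *fop)
{
    assert(t->nfops < MODEL_MAX_FOPS);
    t->fops[t->nfops++] = fop;
}

/* attrs_cached: the kernel still holds unexpired attributes for the
 * file, as assumed for a 'cat' repeated in rapid succession. */
struct model_cat_trace model_cat(bool attrs_cached, bool split_brain)
{
    struct model_cat_trace t = { .nfops = 0, .result = 0 };

    model_push(&t, "fuse_lookup_resume");
    model_push(&t, "fuse_open_resume");
    if (!attrs_cached) {
        /* fuse_getattr_resume -> afr_fstat -> afr_read_txn, which bails
         * out with EIO when there is no readable subvolume. */
        model_push(&t, "fuse_getattr_resume");
        if (split_brain)
            t.result = -EIO;
    }
    /* Without the getattr, nothing reaches AFR's read path and the
     * kernel serves the cached data, so result stays 0. */
    model_push(&t, "fuse_flush_resume");
    return t;
}

/* True if 'fop' appears in the trace. */
bool model_trace_has(const struct model_cat_trace *t, const char *fop)
{
    for (int i = 0; i < t->nfops; i++)
        if (strcmp(t->fops[i], fop) == 0)
            return true;
    return false;
}
```

A slow 'cat' yields four FOPs and -EIO; a rapid one yields three FOPs, no getattr, and a "successful" read of stale data.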

I don't think we should hit the issue if we 1) drop caches on the existing
mount, 2) remount, or 3) mount with the attribute-timeout and entry-timeout
options set to zero to begin with.
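Why each suggestion should help can be sketched under one assumption: the stale read requires the kernel to skip the getattr because it still holds unexpired attributes. The model below is hypothetical (`struct model_mount` and the `model_*` functions are illustrative names) and covers only attribute-timeout; entry-timeout governs how long the kernel trusts cached dentries, i.e. whether LOOKUP is re-sent, and is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not FUSE or GlusterFS code. */
struct model_mount {
    double attr_timeout;     /* attribute-timeout mount option, seconds */
    bool attrs_cached;       /* kernel holds attributes for the file */
    double attrs_fetched_at; /* when those attributes were fetched */
};

/* True if a 'cat' at time 'now' would skip the getattr, i.e. could be
 * served stale data instead of getting EIO. With attr_timeout == 0 this
 * is never true (option 3). */
bool model_getattr_skipped(const struct model_mount *m, double now)
{
    assert(m->attr_timeout >= 0);
    return m->attrs_cached && now - m->attrs_fetched_at < m->attr_timeout;
}

/* Option 1: dropping caches (echo 3 > /proc/sys/vm/drop_caches) frees
 * reclaimable dentries and inodes; modeled as forgetting the attributes. */
void model_drop_caches(struct model_mount *m)
{
    m->attrs_cached = false;
}

/* Option 2: a fresh mount starts with nothing cached. */
struct model_mount model_remount(double attr_timeout)
{
    struct model_mount m = { .attr_timeout = attr_timeout,
                             .attrs_cached = false,
                             .attrs_fetched_at = 0.0 };
    return m;
}
```

With an assumed non-zero timeout of 1 second, a 'cat' 0.5 seconds after the attributes were fetched skips the getattr; after any of the three suggestions it does not.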
