[Bugs] [Bug 1202669] New: Perf: readdirp in replicated volumes causes performance degrade

Tue Mar 17 07:51:55 UTC 2015

https://bugzilla.redhat.com/show_bug.cgi?id=1202669

            Bug ID: 1202669
           Summary: Perf:  readdirp in replicated volumes causes
                    performance degrade
           Product: GlusterFS
           Version: mainline
         Component: replicate
          Severity: urgent
          Priority: urgent
          Assignee: bugs at gluster.org
          Reporter: kdhananj at redhat.com
                CC: bturner at redhat.com, bugs at gluster.org,
                    gluster-bugs at redhat.com, kdhananj at redhat.com,
                    rcyriac at redhat.com, vagarwal at redhat.com
        Depends On: 1202541

+++ This bug was initially created as a clone of Bug #1202541 +++

Description of problem:

ls -l performance has been greatly reduced on replicated volumes.  I do not see
this perf hit on distributed volumes.

Version-Release number of selected component (if applicable):

How reproducible:

Every time.

Steps to Reproduce:
1.  Create 10k files
2.  Clear buffers and cache on both servers and clients
3.  Run ls -l on files
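
The steps above can be approximated with a short script. This is only a
sketch: the file names and the helper names create_files and
timed_long_listing are made up for illustration, and the directory should be
replaced with a path on the mounted replicated volume. Step 2 is not automated
here; on Linux, caches are commonly dropped as root by writing 3 to
/proc/sys/vm/drop_caches on each machine.

```python
import os
import tempfile
import time


def create_files(directory, count):
    """Create `count` empty files named file-00000, file-00001, ... (step 1)."""
    for i in range(count):
        with open(os.path.join(directory, f"file-{i:05d}"), "w"):
            pass


def timed_long_listing(directory):
    """Read the directory and stat every entry, roughly what `ls -l` does (step 3).

    Returns (number_of_entries, elapsed_seconds).
    """
    start = time.perf_counter()
    count = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            entry.stat(follow_symlinks=False)
            count += 1
    return count, time.perf_counter() - start


if __name__ == "__main__":
    # Assumption: swap the temporary directory for a directory on the
    # GlusterFS mount point to measure the volume rather than local disk.
    with tempfile.TemporaryDirectory() as target:
        create_files(target, 10_000)
        # Step 2 (clearing buffers and cache) must be done here, outside
        # this script, on both servers and clients.
        entries, seconds = timed_long_listing(target)
        print(f"{entries} entries listed in {seconds:.3f}s")
```

Running it once against a replicated volume and once against a distributed
volume gives two timings to compare.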

Actual results:

Expected results:

Additional info:

--- Additional comment from Ben Turner on 2015-03-16 18:51:55 EDT ---
The commit that causes the performance degradation has been identified as the
following:

<commit details>
Author: Krutika Dhananjay <kdhananj at redhat.com>
Date:   Thu Jan 22 17:02:20 2015 +0530

     cluster/afr: When parent and entry read subvols are different, set
     entry->inode to NULL
</commit details>

--- Additional comment from Krutika Dhananjay on 2015-03-17 03:19:56 EDT ---

Let me briefly explain what the problem with AFR was:
In AFR, every file has a "read child" associated with it. Read operations on a
file (readv in the data read category; getxattr, stat, etc. in the metadata
read category; readdirp in the entry read category) are always served from the
designated read child of the file/dir, unless that child holds a bad copy of
the file (i.e., one in need of a self-heal). I can think of at least two
reasons why this is useful:
a. It is sufficient to serve reads from only one of the copies of a file, since
all copies are identical under normal circumstances.
b. Certain attributes like mtime/ctime/atime might differ across copies of a
file on different bricks due to clock skew across the servers. In these cases,
it is good to always return the same values across consecutive requests for
these attributes, because we do not want the application to mistakenly think
that the file changed just because AFR returned different timestamps across
different calls.
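
To make the read child idea concrete, here is a toy model in Python. It is not
AFR code: the Copy class, pick_read_child and stat_mtime are hypothetical
names, and falling back to the first good copy is an assumption made only for
this sketch, since the comment above does not say how a replacement read child
is chosen.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    """One replica of a file on one brick (toy model, not an AFR structure)."""
    mtime: float
    needs_heal: bool = False


def pick_read_child(copies, preferred):
    """Return the index of the copy that reads should be served from.

    The designated read child is used as long as its copy is good.
    Assumption: otherwise fall back to the first good copy, or None if
    every copy needs a self-heal.
    """
    if not copies[preferred].needs_heal:
        return preferred
    for index, copy in enumerate(copies):
        if not copy.needs_heal:
            return index
    return None


def stat_mtime(copies, preferred):
    """Serve mtime from the chosen read child only."""
    child = pick_read_child(copies, preferred)
    if child is None:
        raise OSError("no good copy available")
    return copies[child].mtime


# Two bricks whose clocks are skewed by 0.7 seconds.
copies = [Copy(mtime=100.0), Copy(mtime=100.7)]
first = stat_mtime(copies, preferred=1)
second = stat_mtime(copies, preferred=1)
print(first == second)  # True: the same copy answers both calls
```

Because both calls go to the same copy, the application never sees the skew
between the bricks, which is point (b) above.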

The problem case is readdirp. Readdirp also fetches the attributes of the
entries it reads. A directory on which readdirp is performed can have read
child x while some of the entries read have read child y.
This means that readdirp may violate (b) above by returning timestamps from a
copy of the entry that is not on the entry's own read child. The other problem
with this behavior is that, even if the directory's read child had a bad copy
of some of the entries, this would not be detected. My patches fixed these
issues with readdirp by forcing a lookup on those entries whose read child did
not match that of the parent. This would have led to some extra lookups.
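
The cost of the fix can be illustrated with a small model. Again, this is a
sketch and not the AFR implementation: readdirp and count_forced_lookups here
are toy functions, the check_read_child flag is a hypothetical switch, and a
False value in the result stands in for setting entry->inode to NULL so that a
separate lookup is needed for that entry.

```python
def readdirp(parent_read_child, entry_read_children, check_read_child=True):
    """Toy model of readdirp on a replicated directory.

    entry_read_children maps each entry name to that entry's read child.
    Returns a list of (name, attrs_usable) pairs. attrs_usable is False when
    the attributes fetched from the parent's read child are discarded because
    the entry's own read child is a different subvolume.
    """
    result = []
    for name, read_child in entry_read_children.items():
        mismatch = check_read_child and read_child != parent_read_child
        result.append((name, not mismatch))
    return result


def count_forced_lookups(listing):
    """Each entry whose attributes were discarded costs one separate lookup."""
    return sum(1 for _, attrs_usable in listing if not attrs_usable)


# The directory is read from subvolume 0; "b" and "c" have read child 1.
entries = {"a": 0, "b": 1, "c": 1, "d": 0}

patched = readdirp(0, entries)
print(count_forced_lookups(patched))    # 2

unpatched = readdirp(0, entries, check_read_child=False)
print(count_forced_lookups(unpatched))  # 0
```

With 10k files, the number of forced lookups grows with the number of entries
whose read child differs from the directory's.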

Both the BZs above were manifestations of the same bug in readdirp just
described. 

[UPDATE]
It was decided in the meeting that the behavior introduced by the AFR patch
will be made optional and will be "off" by default.

Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1202541
[Bug 1202541] Perf:  Metadata operation performance on replicated volumes
regressed on 3.0.4 builds.