[Gluster-devel] Inputs for 4.0 Release notes on Performance

Wed Feb 21 04:11:46 UTC 2018

From 'git log release-3.13..release-4.0' I see the following patches that
might have an impact on performance:

commit a32ff73c06e1e14589817b1701c1c8d0f05aaa04
Author: Atin Mukherjee <amukherj at redhat.com>
Date:   Mon Jan 29 10:23:52 2018 +0530

    glusterd: optimize glusterd import volumes code path

    In case there's a version mismatch detected for one of the volumes,
    glusterd was ending up updating all the volumes, which is
    overkill.

    >mainline patch : https://review.gluster.org/#/c/19358/

    Change-Id: I6df792db391ce3a1697cfa9260f7dbc3f59aa62d
    BUG: 1540554
    Signed-off-by: Atin Mukherjee <amukherj at redhat.com>
    (cherry picked from commit bb34b07fd2ec5e6c3eed4fe0cdf33479dbf5127b)

commit ea972d9f5c9b318429c228108c21a334b4acd95c
Author: Sakshi Bansal <sabansal at redhat.com>
Date:   Mon Jan 22 14:38:17 2018 +0530

    dentry fop serializer: added new server side xlator for dentry fop serialization

    Problems addressed by this xlator :

    [1]. To prevent races between parallel mkdirs, mkdir and lookup, etc.

    Fops like mkdir/create, lookup, rename, unlink, link that happen on a
    particular dentry must be serialized to ensure atomicity.

    Another possible case can be a fresh lookup to find the existence of a path
    whose gfid is not set yet. Further, storage/posix employs a ctime based
    heuristic 'is_fresh_file' (interval time is less than 1 second of
    current time) to check freshness of a file. With serialization of these
    two fops (lookup & mkdir), we eliminate the race altogether.

    [2]. Staleness of dentries

    This causes an exponential increase in traversal time for any inode in the
    subtree of the directory pointed to by the stale dentry.

    Cause :  A stale dentry is created because of the following two operations:

          a. dentry creation due to inode_link, done during operations like
             lookup, mkdir, create, mknod, symlink, and
          b. dentry unlinking due to various operations like rmdir, rename,
             unlink.

           The reason is that __inode_link uses __is_dentry_cyclic, which
           explores all possible paths to avoid cyclic link formation during
           inode linkage. __is_dentry_cyclic explores the stale dentry(ies) and
           all their ancestors, which increases traversal time exponentially.

    Implementation : To achieve this, all fops on a dentry must take entry locks
    before they proceed; once they have acquired the locks, they perform the fop
    and then release the locks.

    Some documentation from email conversation:
    [1] http://www.gluster.org/pipermail/gluster-devel/2015-December/047314.html
    [2] http://www.gluster.org/pipermail/gluster-devel/2015-August/046428.html

    With this patch, the feature is optional, enable it by running:

     `gluster volume set $volname features.sdfs enable`

    Change-Id: I6e80ba3cabfa6facd5dda63bd482b9bf18b6b79b
    Fixes: #397
    Signed-off-by: Sakshi Bansal <sabansal at redhat.com>
    Signed-off-by: Amar Tumballi <amarts at redhat.com>
    Signed-off-by: Sunny Kumar <sunkumar at redhat.com>
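
As a rough illustration of the entry-lock scheme described in the commit
above (not the sdfs xlator code itself), the sketch below serializes
operations on the same (parent, basename) pair with one lock per dentry.
The names `DentrySerializer` and `run_fop`, and the in-memory `namespace`
set, are assumptions made up for this example.

```python
import threading
from collections import defaultdict


class DentrySerializer:
    """Hypothetical helper: one lock per (parent, basename) dentry."""

    def __init__(self):
        self._table_lock = threading.Lock()  # protects the lock table
        self._entry_locks = defaultdict(threading.Lock)

    def _lock_for(self, parent, basename):
        with self._table_lock:
            return self._entry_locks[(parent, basename)]

    def run_fop(self, parent, basename, fop):
        # Take the entry lock, perform the fop, then release the lock.
        with self._lock_for(parent, basename):
            return fop()


serializer = DentrySerializer()
namespace = set()  # stands in for the directory entries on the brick
results = []


def mkdir(parent, basename):
    def fop():
        # Check-then-create is safe only because the entry lock is held.
        if (parent, basename) in namespace:
            return "EEXIST"
        namespace.add((parent, basename))
        return "OK"

    results.append(serializer.run_fop(parent, basename, fop))


threads = [
    threading.Thread(target=mkdir, args=("root-gfid", "dir1"))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Count how many parallel mkdirs on the same dentry created it.
print(results.count("OK"), results.count("EEXIST"))
```

Fops on different dentries use different locks in this sketch, so they are
not serialized against each other.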

commit 24bf7715140586675f8d2036f4d589bc255c16dc
Author: Poornima G <pgurusid at redhat.com>
Date:   Tue Jan 9 17:26:44 2018 +0530

    md-cache: Implement dynamic configuration of xattr list for caching

    Currently, the list of xattrs that md-cache can cache is hard coded
    in the md-cache.c file. This necessitates a code change and rebuild
    every time a new xattr needs to be added to the md-cache xattr cache
    list.

    With this patch, the user will be able to configure a comma
    separated list of xattrs to be cached by md-cache.

    Updates #297

    Change-Id: Ie35ed607d17182d53f6bb6e6c6563ac52bc3132e
    Signed-off-by: Poornima G <pgurusid at redhat.com>

commit efc30e60e233164bd4fe7fc903a7c5f718b0448b
Author: Poornima G <pgurusid at redhat.com>
Date:   Tue Jan 9 10:32:16 2018 +0530

    upcall: Allow md-cache to specify invalidations on xattr with wildcard

    Currently, md-cache sends a list of xattrs it is interested in receiving
    invalidations for. But it cannot specify any wildcard in the xattr names.
    Eg: user.* - invalidate on updating any xattr with user. prefix.

    This patch enables upcall to honor wildcards in the xattr key names.

    Updates: #297

    Change-Id: I98caf0ed72f11ef10770bf2067d4428880e0a03a
    Signed-off-by: Poornima G <pgurusid at redhat.com>

commit 8fc9c6a8fc7c73b2b4c65a8ddbe988bca10e89b6
Author: Poornima G <pgurusid at redhat.com>
Date:   Thu Jan 4 19:38:05 2018 +0530

    posix: In getxattr, honor the wildcard '*'

    Currently, posix_xattr_fill performs a sys_getxattr
    on all the keys requested. There are requirements where
    the keys could contain a wildcard, in which case sys_getxattr
    would return ENODATA. Eg: if the xattr requested is user.*,
    all the xattrs with prefix user. should be returned, with their
    values.

    This patch changes posix_xattr_fill to honor wildcards in the keys
    requested.

    Updates #297

    Change-Id: I3d52da2957ac386fca3c156e26ff4cdf0b2c79a9
    Signed-off-by: Poornima G <pgurusid at redhat.com>
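
To make the wildcard behaviour described in the commit above concrete, here
is a small sketch of the matching idea. It is not the posix xlator code
(which is C); `fill_xattrs` and the sample xattr names are hypothetical, and
Python's `fnmatchcase` is used as a stand-in matcher whose rules may differ
from what posix_xattr_fill actually implements.

```python
from fnmatch import fnmatchcase


def fill_xattrs(stored, requested_keys):
    """Hypothetical stand-in: return the stored xattrs matching each key.

    Keys without '*' are looked up exactly; keys with '*' select every
    stored xattr whose name matches the pattern.
    """
    result = {}
    for key in requested_keys:
        if "*" in key:
            for name, value in stored.items():
                if fnmatchcase(name, key):
                    result[name] = value
        elif key in stored:
            result[key] = stored[key]
    return result


stored = {
    "user.color": b"blue",
    "user.owner": b"alice",
    "trusted.gfid": b"\x01\x02",
}

# "user.*" expands to every user.-prefixed xattr; the missing key is skipped.
matched = fill_xattrs(stored, ["user.*", "trusted.gfid", "security.selinux"])
print(sorted(matched))
```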

commit 84c5c540b26c8f3dcb9845344dd48df063e57845
Author: karthik-us <ksubrahm at redhat.com>
Date:   Wed Jan 17 17:30:06 2018 +0530

    cluster/afr: Adding option to take full file lock

    Problem:
    In replica 3 volumes there is a possibility of ending up in a split
    brain scenario when multiple clients write data to the same file
    at non-overlapping regions in parallel.

    Scenario:
    - Initially all the copies are good and all the clients get the value
      of data readables as all good.
    - Client C0 performs write W1 which fails on brick B0 and succeeds on
      the other two bricks.
    - C1 performs write W2 which fails on B1 and succeeds on the other two
      bricks.
    - C2 performs write W3 which fails on B2 and succeeds on the other two
      bricks.
    - All the 3 writes above happen in parallel and fall on different ranges,
      so afr takes granular locks and all the writes are performed in
      parallel. Since each client had data-readables as good, it does not see
      the file going into split-brain in the in_flight_split_brain check,
      hence performs the post-op marking the pending xattrs. Now all the
      bricks are being blamed by each other, ending up in split-brain.

    Fix:
    Have an option to take either a full lock or a range lock on files while
    doing data transactions, to prevent the possibility of ending up in
    split brain. With this change, by default the files will take a full
    lock while doing IO. If you want to make use of the old range lock,
    change the value of "cluster.full-lock" to "no".

    Change-Id: I7893fa33005328ed63daa2f7c35eeed7c5218962
    BUG: 1535438
    Signed-off-by: karthik-us <ksubrahm at redhat.com>
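
The scenario in the commit above can be replayed with a toy model of the
pending markers. This is a simplified sketch, not AFR's actual accounting:
the `pending` matrix and the `post_op` helper are assumptions invented for
the illustration.

```python
BRICKS = ["B0", "B1", "B2"]

# pending[x][y] > 0 means brick x holds a pending marker blaming brick y.
pending = {x: {y: 0 for y in BRICKS} for x in BRICKS}


def post_op(failed_brick):
    """Bricks where the write succeeded mark the failed brick as pending."""
    for brick in BRICKS:
        if brick != failed_brick:
            pending[brick][failed_brick] += 1


# W1 fails on B0, W2 fails on B1, W3 fails on B2. The writes touch
# non-overlapping ranges, so range locks do not serialize them and each
# client still believes all copies are readable when it runs its post-op.
for failed in ["B0", "B1", "B2"]:
    post_op(failed)

blamed = {y for x in BRICKS for y in BRICKS if pending[x][y] > 0}
sources = [b for b in BRICKS if b not in blamed]

# A brick can act as a heal source only if nobody blames it.
print(sources)
```

With a full file lock the three transactions would run one after another,
so a later client could notice the earlier failure before its own post-op.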

commit 2db7872d5251d98d47c262ff269776bfae2d4fb9
Author: Poornima G <pgurusid at redhat.com>
Date:   Mon Aug 7 11:24:46 2017 +0530

    md-cache: Serve nameless lookup from cache

    Updates #232
    Change-Id: I97e92312a53a50c2d1660bf8d657201fc05a76eb
    Signed-off-by: Poornima G <pgurusid at redhat.com>

commit 78d67da17356b48cf1d5a6595764650d5b200ba7
Author: Sunil Kumar Acharya <sheggodu at redhat.com>
Date:   Thu Mar 23 12:50:41 2017 +0530

    cluster/ec: OpenFD heal implementation for EC

    Existing EC code doesn't try to heal open FDs, even though doing so
    would avoid unnecessary healing of the data later.

    Fix implements the healing of open FDs before
    carrying out file operations on them by making an
    attempt to open the FDs on required up nodes.

    BUG: 1431955
    Change-Id: Ib696f59c41ffd8d5678a484b23a00bb02764ed15
    Signed-off-by: Sunil Kumar Acharya <sheggodu at redhat.com>

commit 14dbd5da1cae64e6d4d2c69966e19844d090ce98
Author: Niklas Hambüchen <mail at nh2.me>
Date:   Fri Dec 29 15:49:13 2017 +0100

    glusterfind: Speed up gfid lookup 100x by using an SQL index

    Fixes #1529883.

    This fixes some bits of `glusterfind`'s horrible performance,
    making it 100x faster.

    Until now, glusterfind was, for each line in each CHANGELOG.* file,
    linearly reading the entire contents of the sqlite database in
    4096-bytes-sized pread64() syscalls when executing the

      SELECT COUNT(1) FROM %s WHERE 1=1 AND gfid = ?

    query through the code path:

      get_changes()
        parse_changelog_to_db()
          when_data_meta()
            gfidpath_exists()
              _exists()

    In a quick benchmark on my laptop, doing one such `SELECT` query
    took ~75ms on a 10MB-sized sqlite DB, while doing the same query
    with an index took < 1ms.

    Change-Id: I8e7fe60f1f45a06c102f56b54d2ead9e0377794e
    BUG: 1529883
    Signed-off-by: Niklas Hambüchen <mail at nh2.me>
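
The effect of the index can be reproduced with Python's sqlite3 module. In
the sketch below the table name `gfidpath`, the index name `gfid_index` and
the sample rows are assumptions (the commit message only shows the query
with a `%s` placeholder for the table); the point is only that the query
plan changes from a full table scan to an index search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gfidpath (gfid TEXT, path TEXT)")
conn.executemany(
    "INSERT INTO gfidpath (gfid, path) VALUES (?, ?)",
    ((f"gfid-{i:06d}", f"/dir/file-{i}") for i in range(10000)),
)

query = "SELECT COUNT(1) FROM gfidpath WHERE 1=1 AND gfid = ?"


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("gfid-000042",))
    return " ".join(row[-1] for row in rows)


plan_before = plan(query)  # linear scan over the whole table
conn.execute("CREATE INDEX gfid_index ON gfidpath (gfid)")
plan_after = plan(query)   # lookup through the new index

count = conn.execute(query, ("gfid-000042",)).fetchone()[0]
print(plan_before)
print(plan_after)
print(count)
```

The exact wording of the plan text varies between SQLite versions, so only
the scan-versus-index distinction should be relied on.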

commit c96a1338fe8139d07a0aa1bc40f0843d033f0324
Author: Pranith Kumar K <pkarampu at redhat.com>
Date:   Wed Dec 6 07:59:53 2017 +0530

    cluster/ec: Change [f]getxattr to parallel-dispatch-one

    At the moment in EC, [f]getxattr operations wait to acquire a lock
    while other operations are in progress even when it is in the same
    mount with a lock on the file/directory. This happens because [f]getxattr
    operations follow the model where the operation is wound on 'k' of the
    bricks and the replies are matched to make sure the data returned is the
    same on all of them. This consistency check requires that no other
    operations are on-going while [f]getxattr operations are wound to the
    bricks. We can perform [f]getxattr in another way as well, where we find
    the good_mask from the lock that is already granted and wind the operation
    on any one of the good bricks and unwind the answer after adjusting
    size/blocks to the parent xlator. Since we are taking into account
    good_mask, the reply we get will either be before or after a possible
    on-going operation. Using this method, the operation doesn't need to
    depend on completion of on-going operations which could be taking a long
    time (in case of some slow disks and writes in progress, etc.). Thus we
    reduce the time to serve [f]getxattr requests.

    I changed [f]getxattr to dispatch-one and added extra logic in
    ec_link_has_lock_conflict() to not have any conflicts for fops with
    EC_MINIMUM_ONE as fop->minimum to achieve the effect described above.
    Modified scripts to make sure READ fop is received in EC to trigger heals.

    Updates gluster/glusterfs#368
    Change-Id: I3b4ebf89181c336b7b8d5471b0454f016cdaf296
    Signed-off-by: Pranith Kumar K <pkarampu at redhat.com>

commit e255385ae4f4c8a883b3fb96baceba4b143828da
Author: Csaba Henk <csaba at redhat.com>
Date:   Fri Nov 10 20:33:20 2017 +0100

    write-behind: Allow trickling-writes to be configurable

    This is the undisputed/trivial part of Shreyas' patch
    he attached to https://bugzilla.redhat.com/1364740 (of
    which the current bug is a clone).

    We need more evaluation for the page_size and window_size
    bits before taking them on.

    Change-Id: Iaa0b9a69d35e522b77a52a09acef47460e8ae3e9
    BUG: 1428060
    Co-authored-by: Shreyas Siravara <sshreyas at fb.com>
    Signed-off-by: Csaba Henk <csaba at redhat.com>

commit c26cadd31dfa128c4ec6883f69d654813f351018
Author: Poornima G <pgurusid at redhat.com>
Date:   Fri Jun 30 12:52:21 2017 +0530

    quick-read: Integrate quick read with upcall and increase cache time

    Fixes : #261
    Co-author: Subha sree Mohankumar <smohanku at redhat.com>
    Change-Id: Ie9dd94e86459123663b9b200d92940625ef68eab
    Signed-off-by: Poornima G <pgurusid at redhat.com>

commit d95db5505a9cb923e61ccd23d28b45ceb07b716f
Author: Shreyas Siravara <sshreyas at fb.com>
Date:   Thu Sep 7 15:34:58 2017 -0700

    md-cache: Cache statfs calls

    Summary:
    - This allows md-cache to cache statfs calls
    - You can turn it on or off via 'gluster vol set groot performance.md-cache-statfs <on|off>'

    Change-Id: I664579e3c19fb9a6cd9d7b3a0eae061f70f4def4
    BUG: 1523295
    Signature: t1:4652632:1488581841:111cc01efe83c71f1e98d075abb10589c4574705
    Reviewed-on: https://review.gluster.org/18228
    Reviewed-by: Shreyas Siravara <sshreyas at fb.com>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
    Smoke: Gluster Build System <jenkins at build.gluster.org>
    Signed-off-by: Shreyas Siravara <sshreyas at fb.com>

commit 430484c92ab5a6234958d1143e0bb14aeb0cd1c0
Author: Mohit Agrawal <moagrawa at redhat.com>
Date:   Fri Oct 20 12:39:29 2017 +0530

    glusterfs: Use gcc builtin ATOMIC operator to increase/decrease refcount.

    Problem: In glusterfs code base we call mutex_lock/unlock to take
             reference/dereference for an object. Sometimes it could be
             a reason for lock contention also.

    Solution: There is no need to use a mutex to increase/decrease the ref
              counter; instead of using a mutex, use gcc builtin ATOMIC
              operations.

    Test:   I have not yet observed how much performance is gained after
            applying this patch specific to glusterfs, but I have tested the
            same with a small program (mutex and atomic both) and
            got a good difference.

   Change-Id: Ie5030a52ea264875e002e108dd4b207b15ab7cc7
   Signed-off-by: Mohit Agrawal <moagrawa at redhat.com>

commit f9b6174a7f5eb6475ca9780b062bfb3ff1132b2d
Author: Shreyas Siravara <sshreyas at fb.com>
Date:   Mon Apr 10 12:36:21 2017 -0700

    posix: Add option to disable nftw() based deletes when purging the landfill directory

    Summary:
    - We may have found an issue where certain directories were being moved into .landfill and then being quickly purged via nftw().
    - We would like to have an emergency option to disable these purges.

    > Reviewed-on: https://review.gluster.org/18253
    > Reviewed-by: Shreyas Siravara <sshreyas at fb.com>

    Fixes #371

    Change-Id: I90b54c535930c1ca2925a928728199b6b80eadd9
    Signed-off-by: Amar Tumballi <amarts at redhat.com>

commit 59d1cc720f52357f7a6f20bb630febc6a622c99c
Author: Raghavendra G <rgowdapp at redhat.com>
Date:   Tue Sep 19 09:44:55 2017 +0530

    cluster/dht: populate inode in dentry for single subvolume dht

    ... in readdirp response if dentry points to a directory inode. This
    is a special case where the entire layout is stored in one single
    subvolume and hence no need for lookup to construct the layout

    Change-Id: I44fd951e2393ec9dac2af120469be47081a32185
    BUG: 1492625
    Signed-off-by: Raghavendra G <rgowdapp at redhat.com>

commit e785faead91f74dce7c832848f2e8f3f43bd0be5
Author: Raghavendra G <rgowdapp at redhat.com>
Date:   Mon Sep 18 16:01:34 2017 +0530

    cluster/dht: don't overfill the buffer in readdir(p)

    Superfluous dentries that cannot fit in the buffer size provided by
    the kernel are thrown away by fuse-bridge. This means,

    * the next readdir(p) seen by readdir-ahead would have an offset of a
    dentry returned in a previous readdir(p) response. When readdir-ahead
    detects a non-monotonic offset it turns itself off, which can result in
    poor readdir performance.

    * readdirp can be cpu-intensive on the brick and there is no point in
     reading all those dentries just to have them thrown away by fuse-bridge.

    So, the best strategy would be to fill the buffer optimally - neither
    overfill nor underfill.

    Change-Id: Idb3d85dd4c08fdc4526b2df801d49e69e439ba84
    BUG: 1492625
    Signed-off-by: Raghavendra G <rgowdapp at redhat.com>
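
A minimal sketch of the "neither overfill nor underfill" idea from the
commit above, under stated assumptions: `fill_readdir_buffer`, the
`entry_size` estimate and the 24-byte header are all hypothetical and do not
reflect the real dirent sizes used by DHT or fuse-bridge.

```python
def fill_readdir_buffer(entries, buf_size, entry_size):
    """Return (names that fit, offset of the last entry returned).

    `entries` is a list of (offset, name) pairs in directory order and
    `entry_size(name)` is a hypothetical per-entry size estimate.
    """
    filled = []
    used = 0
    resume_offset = None
    for offset, name in entries:
        size = entry_size(name)
        if used + size > buf_size:
            break  # stop here instead of overfilling the buffer
        filled.append(name)
        used += size
        resume_offset = offset  # last offset actually handed back
    return filled, resume_offset


def entry_size(name):
    # Assumed fixed header of 24 bytes plus the name; real sizes differ.
    return 24 + len(name)


entries = [(i + 1, f"file-{i}") for i in range(10)]
filled, resume_offset = fill_readdir_buffer(
    entries, buf_size=100, entry_size=entry_size
)
print(filled, resume_offset)
```

Because nothing past the buffer size is returned, no entry has to be
discarded, and the next request can continue from `resume_offset` with
offsets that keep increasing.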

commit 4ad64ffe8664cc0b964586af6efcf53cc619b68a
Author: Pranith Kumar K <pkarampu at redhat.com>
Date:   Fri Nov 17 07:20:21 2017 +0530

    ec: Use tiebreaker_inodelk where necessary

    When there are big directories or files that need to be healed,
    other shds are stuck on getting lock on self-heal domain for these
    directories/files. If there is a tie-breaker logic, other shds
    can heal some other files/directories while 1 of the shds is healing
    the big file/directory.

    Before this patch:
    96.67  4890.64 us 12.89 us 646115887.30us 340869 INODELK
    After this patch:
    40.76  42.35 us   15.09 us 6546.50us 438478 INODELK

    Fixes gluster/glusterfs#354
    Change-Id: Ia995b5576b44f770c064090705c78459e543cc64
    Signed-off-by: Pranith Kumar K <pkarampu at redhat.com>

commit 3f8d118e48f11f448f35aca0c48ad40e0fd34f5b
Author: Xavier Hernandez <jahernan at redhat.com>
Date:   Tue Nov 7 13:45:03 2017 +0100

    libglusterfs/atomic: Improved atomic support

    This patch solves a detection problem in configure.ac that prevented
    the build from detecting builtin __atomic or __sync functions.

    It also adds more atomic types and support for other atomic functions.

    A special case has been added to support 64-bit atomics on 32-bit
    systems. The solution is to fall back to the mutex solution only for
    64-bit atomics, but smaller atomic types will still take advantage
    of builtins if available.

    Change-Id: I6b9afc7cd6e66b28a33278715583552872278801
    BUG: 1510397
    Signed-off-by: Xavier Hernandez <jahernan at redhat.com>

commit 0dcd5b2feeeec7c29bd2454d6ad950d094d02b0f
Author: Xavier Hernandez <jahernan at redhat.com>
Date:   Mon Oct 16 13:57:59 2017 +0200

    cluster/ec: create eager-lock option for non-regular files

    A new option is added to allow independent configuration of eager
    locking for regular files and non-regular files.

    Change-Id: I8f80e46d36d8551011132b15c0fac549b7fb1c60
    BUG: 1502610
    Signed-off-by: Xavier Hernandez <jahernan at redhat.com>

Apart from these commits there are also some patches which aid concurrency
in the code. I've left them out since their performance benefits have not
been measured and they don't affect users directly. If you feel these have to
be added please let me know. Some changes are:
* Patches from Zhang Huan <zhanghuan at open-fs.com> aimed at reducing lock
contention in the rpc layer and while accessing fdtable,
* Patches from Milind Changire <mchangir at redhat.com> to reduce lock
contention while accessing programs in rpcsvc.

From the commits listed above, I see that the following components are
affected, and I've listed owners for updating a short summary of changes
along with the component:
* glusterd: optimize glusterd import volumes code path - Atin
* md-cache - Shreyas and Poornima
* EC - Xavi and Pranith (I see that Pranith already sent an update. So I
guess this is covered)
* Improvements to consumption of Atomic Builtins - Xavi and Mohit
* Improvements to glusterfind - Niklas Hambüchen, Milind and Aravinda V K
* Modification of Quick-read to consume upcall notifications - Poornima
* Exposing trickling-writes in write-behind - Csaba and Shreyas
* Changes to Purging landfill directory in storage/posix - Shreyas
* Adding option to take full file lock in afr - Karthik Subrahmanya
* readdirplus enhancements in DHT - Raghavendra Gowdappa
* Dentry Fop Serializer - Raghavendra Gowdappa and Amar

Please send out patches updating the "Performance" section of the release
notes. If you think your patch need not be mentioned in the release notes,
please send an explicit nack so that we'll know.

If I've left out any fixes, please point them out. If not, only a subset of
the changes listed above will have a mention in the "performance" section of
the release notes.

On Tue, Feb 20, 2018 at 7:59 AM, Raghavendra Gowdappa <rgowdapp at redhat.com>
wrote:

> +gluster-devel.
>
>
> On Tue, Feb 20, 2018 at 7:35 AM, Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:
>
>> All,
>>
>> I am trying to come up with content for release notes for 4.0 summarizing
>> performance impact. Can you point me to patches/documentation/issues/bugs
>> that could impact performance in 4.0? Better still, if you can give me a
>> summary of changes having performance impact, it would be really helpful.
>>
>> I see that Pranith had responded with this link:
>> https://review.gluster.org/#/c/19535/3/doc/release-notes/4.0.0.md
>>
>> regards,
>> Raghavendra
>>
>
>