[Gluster-devel] Inputs for 4.0 Release notes on Performance

Wed Feb 21 04:18:29 UTC 2018

On Wed, Feb 21, 2018 at 9:47 AM, Amye Scavarda <amye at redhat.com> wrote:

> It may be more effective to email the direct parties, I know that I filter
> out mailing lists and don't always see this in time.
> Given that this is somewhat time critical and we'll need to get release
> notes out shortly, I suggest taking it to direct emails.
>

I added the individual owners to the CC list. For some reason, they are not
reflected in the CC list, but I guess they would have received direct mails.

> - amye
>
> On Tue, Feb 20, 2018 at 8:11 PM, Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:
>
>> From 'git log release-3.13..release-4.0' I see the following patches that
>> might have an impact on performance:
>>
>> commit a32ff73c06e1e14589817b1701c1c8d0f05aaa04
>> Author: Atin Mukherjee <amukherj at redhat.com>
>> Date:   Mon Jan 29 10:23:52 2018 +0530
>>
>>     glusterd: optimize glusterd import volumes code path
>>
>>     In case there's a version mismatch detected for one of the volumes,
>>     glusterd was ending up updating all the volumes, which is
>>     overkill.
>>
>>     >mainline patch : https://review.gluster.org/#/c/19358/
>>
>>     Change-Id: I6df792db391ce3a1697cfa9260f7dbc3f59aa62d
>>     BUG: 1540554
>>     Signed-off-by: Atin Mukherjee <amukherj at redhat.com>
>>     (cherry picked from commit bb34b07fd2ec5e6c3eed4fe0cdf33479dbf5127b)
>>
>> commit ea972d9f5c9b318429c228108c21a334b4acd95c
>> Author: Sakshi Bansal <sabansal at redhat.com>
>> Date:   Mon Jan 22 14:38:17 2018 +0530
>>
>>     dentry fop serializer: added new server side xlator for dentry fop serialization
>>
>>     Problems addressed by this xlator :
>>
>>     [1]. To prevent race between parallel mkdir,mkdir and lookup etc.
>>
>>     Fops like mkdir/create, lookup, rename, unlink, link that happen on a
>>     particular dentry must be serialized to ensure atomicity.
>>
>>     Another possible case can be a fresh lookup to find existence of a path
>>     whose gfid is not set yet. Further, storage/posix employs a ctime based
>>     heuristic 'is_fresh_file' (interval time is less than 1 second of current
>>     time) to check fresh-ness of file. With serialization of these two fops
>>     (lookup & mkdir), we eliminate the race altogether.
>>
>>     [2]. Staleness of dentries
>>
>>     This causes exponential increase in traversal time for any inode in the
>>     subtree of the directory pointed by stale dentry.
>>
>>     Cause :  Stale dentry is created because of following two operations:
>>
>>           a. dentry creation due to inode_link, done during operations like
>>              lookup, mkdir, create, mknod, symlink, and
>>           b. dentry unlinking due to various operations like rmdir, rename,
>>              unlink.
>>
>>            The reason is __inode_link uses __is_dentry_cyclic, which explores
>>            all possible paths to avoid cyclic link formation during inode
>>            linkage. __is_dentry_cyclic explores stale dentries and all their
>>            ancestors, which increases traversal time exponentially.
>>
>>     Implementation : To achieve this, all fops on a dentry must take entry locks
>>     before they proceed. Once they have acquired the locks, they perform the fop
>>     and then release the lock.
>>
>>     Some documentation from email conversation:
>>     [1] http://www.gluster.org/pipermail/gluster-devel/2015-December/047314.html
>>
>>     [2] http://www.gluster.org/pipermail/gluster-devel/2015-August/046428.html
>>
>>     With this patch, the feature is optional, enable it by running:
>>
>>      `gluster volume set $volname features.sdfs enable`
>>
>>     Change-Id: I6e80ba3cabfa6facd5dda63bd482b9bf18b6b79b
>>     Fixes: #397
>>     Signed-off-by: Sakshi Bansal <sabansal at redhat.com>
>>     Signed-off-by: Amar Tumballi <amarts at redhat.com>
>>     Signed-off-by: Sunny Kumar <sunkumar at redhat.com>
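The "take entry lock, perform the fop, release the lock" scheme described in the commit above can be sketched roughly as follows. This is only an illustration in Python, not the xlator's C code: `DentryLocks`, `serialized_fop`, `mkdir` and `race_mkdir` are hypothetical names, and a Python set stands in for the directory namespace.

```python
import threading
from collections import defaultdict


class DentryLocks:
    """Hands out one lock per (parent gfid, basename) pair (hypothetical helper)."""

    def __init__(self):
        self._guard = threading.Lock()             # protects the lock table itself
        self._locks = defaultdict(threading.Lock)  # one entry lock per dentry

    def lock_for(self, parent_gfid, basename):
        with self._guard:
            return self._locks[(parent_gfid, basename)]


def serialized_fop(locks, parent_gfid, basename, fop):
    """Take the entry lock, perform the fop, then release the lock."""
    with locks.lock_for(parent_gfid, basename):
        return fop()


def mkdir(namespace, locks, parent_gfid, basename):
    """Check-then-create; safe only because it runs under the entry lock."""
    def fop():
        if (parent_gfid, basename) in namespace:
            return False  # stands in for EEXIST
        namespace.add((parent_gfid, basename))
        return True
    return serialized_fop(locks, parent_gfid, basename, fop)


def race_mkdir(workers=8):
    """Run several mkdir calls on the same dentry and count how many succeed."""
    namespace, locks, results = set(), DentryLocks(), []
    threads = [
        threading.Thread(
            target=lambda: results.append(mkdir(namespace, locks, "root-gfid", "dir1"))
        )
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```

Because every fop on the same (parent, name) pair goes through the same lock, exactly one of the parallel mkdir calls creates the entry and the rest see it as already existing.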
>>
>>
>> commit 24bf7715140586675f8d2036f4d589bc255c16dc
>> Author: Poornima G <pgurusid at redhat.com>
>> Date:   Tue Jan 9 17:26:44 2018 +0530
>>
>>     md-cache: Implement dynamic configuration of xattr list for caching
>>
>>     Currently, the list of xattrs that md-cache can cache is hard coded
>>     in the md-cache.c file, this necessitates code change and rebuild
>>     every time a new xattr needs to be added to md-cache xattr cache
>>     list.
>>
>>     With this patch, the user will be able to configure a comma
>>     separated list of xattrs to be cached by md-cache
>>
>>     Updates #297
>>
>>     Change-Id: Ie35ed607d17182d53f6bb6e6c6563ac52bc3132e
>>     Signed-off-by: Poornima G <pgurusid at redhat.com>
>>
>> commit efc30e60e233164bd4fe7fc903a7c5f718b0448b
>> Author: Poornima G <pgurusid at redhat.com>
>> Date:   Tue Jan 9 10:32:16 2018 +0530
>>
>>     upcall: Allow md-cache to specify invalidations on xattr with wildcard
>>
>>     Currently, md-cache sends a list of xattrs it is interested in receiving
>>     invalidations for. But it cannot specify any wildcard in the xattr names.
>>     Eg: user.* - invalidate on updating any xattr with user. prefix.
>>
>>     This patch enables upcall to honor wildcards in the xattr key names
>>
>>     Updates: #297
>>
>>     Change-Id: I98caf0ed72f11ef10770bf2067d4428880e0a03a
>>     Signed-off-by: Poornima G <pgurusid at redhat.com>
>>
>> commit 8fc9c6a8fc7c73b2b4c65a8ddbe988bca10e89b6
>> Author: Poornima G <pgurusid at redhat.com>
>> Date:   Thu Jan 4 19:38:05 2018 +0530
>>
>>     posix: In getxattr, honor the wildcard '*'
>>
>>     Currently, posix_xattr_fill performs a sys_getxattr
>>     on all the keys requested. There are requirements where
>>     the keys could contain a wildcard, in which case sys_getxattr
>>     would return ENODATA, eg: if the xattr requested is user.*
>>     all the xattrs with prefix user. should be returned, with their
>>     values.
>>
>>     This patch changes posix_xattr_fill to honor wildcards in the keys
>>     requested.
>>
>>     Updates #297
>>
>>     Change-Id: I3d52da2957ac386fca3c156e26ff4cdf0b2c79a9
>>     Signed-off-by: Poornima G <pgurusid at redhat.com>
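To make the wildcard behaviour described above concrete, here is a small sketch in Python. It is not the C code of posix_xattr_fill: `fill_xattrs` is a hypothetical name, a dict stands in for the xattrs stored on a file, and using `fnmatchcase` for matching is an assumption of this sketch (it also treats `?` and `[...]` as special, which the commit message does not mention).

```python
from fnmatch import fnmatchcase


def fill_xattrs(requested_keys, stored):
    """Return the xattrs matching the requested keys.

    A key without '*' is looked up directly; a key containing '*' returns
    every stored xattr whose name matches the pattern, with its value.
    """
    result = {}
    for key in requested_keys:
        if "*" in key:
            for name, value in stored.items():
                if fnmatchcase(name, key):
                    result[name] = value
        elif key in stored:
            result[key] = stored[key]
        # A missing exact key is simply skipped here; a real getxattr
        # call would report ENODATA for it.
    return result
```

For example, requesting `user.*` against a file carrying `user.a`, `user.b` and `trusted.gfid` returns only the two `user.` entries.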
>>
>> commit 84c5c540b26c8f3dcb9845344dd48df063e57845
>> Author: karthik-us <ksubrahm at redhat.com>
>> Date:   Wed Jan 17 17:30:06 2018 +0530
>>
>>     cluster/afr: Adding option to take full file lock
>>
>>     Problem:
>>     In replica 3 volumes there is a possibility of ending up in a split
>>     brain scenario when multiple clients write data to the same file
>>     at non-overlapping regions in parallel.
>>
>>     Scenario:
>>     - Initially all the copies are good and all the clients get the value
>>       of data readables as all good.
>>     - Client C0 performs write W1 which fails on brick B0 and succeeds on
>>       other two bricks.
>>     - C1 performs write W2 which fails on B1 and succeeds on other two bricks.
>>     - C2 performs write W3 which fails on B2 and succeeds on other two bricks.
>>     - All the 3 writes above happen in parallel and fall on different ranges
>>       so afr takes granular locks and all the writes are performed in parallel.
>>       Since each client had data-readables as good, it does not see
>>       file going into split-brain in the in_flight_split_brain check, hence
>>       performs the post-op marking the pending xattrs. Now all the bricks
>>       are being blamed by each other, ending up in split-brain.
>>
>>     Fix:
>>     Have an option to take either full lock or range lock on files while
>>     doing data transactions, to prevent the possibility of ending up in
>>     split brains. With this change, by default the files will take full
>>     lock while doing IO. If you want to make use of the old range lock
>>     change the value of "cluster.full-lock" to "no".
>>
>>     Change-Id: I7893fa33005328ed63daa2f7c35eeed7c5218962
>>     BUG: 1535438
>>     Signed-off-by: karthik-us <ksubrahm at redhat.com>
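The scenario in the commit above can be traced with a simplified blame model. This sketch is an assumption-laden illustration, not AFR's implementation: it records blame as plain sets instead of the pending xattr counters AFR keeps, and `blame_matrix` and `sources` are hypothetical names.

```python
def blame_matrix(bricks, failed_on):
    """For each write that failed on one brick, every other brick blames it.

    `failed_on` lists, per write, the brick on which that write failed.
    Returns {brick: set of bricks it blames}.
    """
    blames = {brick: set() for brick in bricks}
    for failed in failed_on:
        for brick in bricks:
            if brick != failed:
                blames[brick].add(failed)
    return blames


def sources(blames):
    """Bricks that nobody blames; an empty list models split-brain."""
    blamed = set().union(*blames.values())
    return [brick for brick in blames if brick not in blamed]
```

With W1, W2 and W3 failing on B0, B1 and B2 respectively, every brick ends up blamed by the other two and no unblamed source remains, which is the split-brain the commit describes. If only W1 had failed, B1 and B2 would still be usable as sources.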
>>
>> commit 2db7872d5251d98d47c262ff269776bfae2d4fb9
>> Author: Poornima G <pgurusid at redhat.com>
>> Date:   Mon Aug 7 11:24:46 2017 +0530
>>
>>     md-cache: Serve nameless lookup from cache
>>
>>     Updates #232
>>     Change-Id: I97e92312a53a50c2d1660bf8d657201fc05a76eb
>>     Signed-off-by: Poornima G <pgurusid at redhat.com>
>>
>> commit 78d67da17356b48cf1d5a6595764650d5b200ba7
>> Author: Sunil Kumar Acharya <sheggodu at redhat.com>
>> Date:   Thu Mar 23 12:50:41 2017 +0530
>>
>>     cluster/ec: OpenFD heal implementation for EC
>>
>>     Existing EC code doesn't try to heal the OpenFD to
>>     avoid unnecessary healing of the data later.
>>
>>     Fix implements the healing of open FDs before
>>     carrying out file operations on them by making an
>>     attempt to open the FDs on required up nodes.
>>
>>     BUG: 1431955
>>     Change-Id: Ib696f59c41ffd8d5678a484b23a00bb02764ed15
>>     Signed-off-by: Sunil Kumar Acharya <sheggodu at redhat.com>
>>
>> commit 14dbd5da1cae64e6d4d2c69966e19844d090ce98
>> Author: Niklas Hambüchen <mail at nh2.me>
>> Date:   Fri Dec 29 15:49:13 2017 +0100
>>
>>     glusterfind: Speed up gfid lookup 100x by using an SQL index
>>
>>     Fixes #1529883.
>>
>>     This fixes some bits of `glusterfind`'s horrible performance,
>>     making it 100x faster.
>>
>>     Until now, glusterfind was, for each line in each CHANGELOG.* file,
>>     linearly reading the entire contents of the sqlite database in
>>     4096-bytes-sized pread64() syscalls when executing the
>>
>>       SELECT COUNT(1) FROM %s WHERE 1=1 AND gfid = ?
>>
>>     query through the code path:
>>
>>       get_changes()
>>         parse_changelog_to_db()
>>           when_data_meta()
>>             gfidpath_exists()
>>               _exists()
>>
>>     In a quick benchmark on my laptop, doing one such `SELECT` query
>>     took ~75ms on a 10MB-sized sqlite DB, while doing the same query
>>     with an index took < 1ms.
>>
>>     Change-Id: I8e7fe60f1f45a06c102f56b54d2ead9e0377794e
>>     BUG: 1529883
>>     Signed-off-by: Niklas Hambüchen <mail at nh2.me>
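The effect of the index on the query quoted above can be reproduced with Python's sqlite3 module. This is a sketch under assumptions: the table name `gfidpath`, its columns and the index name `gfid_index` are illustrative choices here, not taken from the glusterfind patch, and timings will differ from the laptop benchmark in the commit message.

```python
import sqlite3


def build_db(with_index, rows=50_000):
    """Create an in-memory table of gfids, optionally indexed on gfid."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema, for illustration only.
    conn.execute("CREATE TABLE gfidpath (gfid TEXT, pgfid TEXT, bn TEXT)")
    conn.executemany(
        "INSERT INTO gfidpath VALUES (?, ?, ?)",
        ((f"gfid-{i:08d}", "parent", f"file-{i}") for i in range(rows)),
    )
    if with_index:
        conn.execute("CREATE INDEX gfid_index ON gfidpath (gfid)")
    conn.commit()
    return conn


def gfid_exists(conn, gfid):
    """Same shape of query as in the commit message, with the table name filled in."""
    row = conn.execute(
        "SELECT COUNT(1) FROM gfidpath WHERE 1=1 AND gfid = ?", (gfid,)
    ).fetchone()
    return row[0] > 0


def query_plan(conn):
    """Return SQLite's description of how it would run the lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT COUNT(1) FROM gfidpath WHERE 1=1 AND gfid = ?",
        ("gfid-00000042",),
    ).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text
```

Printing `query_plan(build_db(False))` and `query_plan(build_db(True))` side by side shows a full table scan in the first case and a search using `gfid_index` in the second, which is where the per-line speedup comes from.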
>>
>> commit c96a1338fe8139d07a0aa1bc40f0843d033f0324
>> Author: Pranith Kumar K <pkarampu at redhat.com>
>> Date:   Wed Dec 6 07:59:53 2017 +0530
>>
>>     cluster/ec: Change [f]getxattr to parallel-dispatch-one
>>
>>     At the moment in EC, [f]getxattr operations wait to acquire a lock
>>     while other operations are in progress even when it is in the same mount with a
>>     lock on the file/directory. This happens because [f]getxattr operations
>>     follow the model where the operation is wound on 'k' of the bricks and are
>>     matched to make sure the data returned is same on all of them. This consistency
>>     check requires that no other operations are on-going while [f]getxattr
>>     operations are wound to the bricks. We can perform [f]getxattr in
>>     another way as well, where we find the good_mask from the lock that is already
>>     granted and wind the operation on any one of the good bricks and unwind the
>>     answer after adjusting size/blocks to the parent xlator. Since we are taking
>>     into account good_mask, the reply we get will either be before or after a
>>     possible on-going operation. Using this method, the operation doesn't need to
>>     depend on completion of on-going operations which could be taking long time (In
>>     case of some slow disks and writes are in progress etc). Thus we reduce the
>>     time to serve [f]getxattr requests.
>>
>>     I changed [f]getxattr to dispatch-one and added extra logic in
>>     ec_link_has_lock_conflict() to not have any conflicts for fops with
>>     EC_MINIMUM_ONE as fop->minimum to achieve the effect described above.
>>     Modified scripts to make sure READ fop is received in EC to trigger heals.
>>
>>     Updates gluster/glusterfs#368
>>     Change-Id: I3b4ebf89181c336b7b8d5471b0454f016cdaf296
>>     Signed-off-by: Pranith Kumar K <pkarampu at redhat.com>
>>
>> commit e255385ae4f4c8a883b3fb96baceba4b143828da
>> Author: Csaba Henk <csaba at redhat.com>
>> Date:   Fri Nov 10 20:33:20 2017 +0100
>>
>>     write-behind: Allow trickling-writes to be configurable
>>
>>     This is the undisputed/trivial part of Shreyas' patch
>>     he attached to https://bugzilla.redhat.com/1364740 (of
>>     which the current bug is a clone).
>>
>>     We need more evaluation for the page_size and window_size
>>     bits before taking them on.
>>
>>     Change-Id: Iaa0b9a69d35e522b77a52a09acef47460e8ae3e9
>>     BUG: 1428060
>>     Co-authored-by: Shreyas Siravara <sshreyas at fb.com>
>>     Signed-off-by: Csaba Henk <csaba at redhat.com>
>>
>>
>> commit c26cadd31dfa128c4ec6883f69d654813f351018
>> Author: Poornima G <pgurusid at redhat.com>
>> Date:   Fri Jun 30 12:52:21 2017 +0530
>>
>>     quick-read: Integrate quick read with upcall and increase cache time
>>
>>     Fixes : #261
>>     Co-author: Subha sree Mohankumar <smohanku at redhat.com>
>>     Change-Id: Ie9dd94e86459123663b9b200d92940625ef68eab
>>     Signed-off-by: Poornima G <pgurusid at redhat.com>
>>
>> commit d95db5505a9cb923e61ccd23d28b45ceb07b716f
>> Author: Shreyas Siravara <sshreyas at fb.com>
>> Date:   Thu Sep 7 15:34:58 2017 -0700
>>
>>     md-cache: Cache statfs calls
>>
>>     Summary:
>>     - This allows md-cache to cache statfs calls
>>     - You can turn it on or off via 'gluster vol set groot performance.md-cache-statfs <on|off>'
>>
>>     Change-Id: I664579e3c19fb9a6cd9d7b3a0eae061f70f4def4
>>     BUG: 1523295
>>     Signature: t1:4652632:1488581841:111cc01efe83c71f1e98d075abb10589c4574705
>>     Reviewed-on: https://review.gluster.org/18228
>>     Reviewed-by: Shreyas Siravara <sshreyas at fb.com>
>>     CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
>>     Smoke: Gluster Build System <jenkins at build.gluster.org>
>>     Signed-off-by: Shreyas Siravara <sshreyas at fb.com>
>>
>> commit 430484c92ab5a6234958d1143e0bb14aeb0cd1c0
>> Author: Mohit Agrawal <moagrawa at redhat.com>
>> Date:   Fri Oct 20 12:39:29 2017 +0530
>>
>>     glusterfs: Use gcc builtin ATOMIC operator to increase/decrease refcount.
>>
>>     Problem: In glusterfs code base we call mutex_lock/unlock to take
>>              reference/dereference for an object. Sometimes this could be
>>              a reason for lock contention as well.
>>
>>     Solution: There is no need to use a mutex to increase/decrease the ref
>>               counter; instead of a mutex, use the gcc builtin ATOMIC
>>               operation.
>>
>>     Test:   I have not yet observed how much performance gain this patch
>>             gives specific to glusterfs, but I have tested the same
>>             with the below small program (mutex and atomic both) and
>>             got a good difference.
>>
>>    Change-Id: Ie5030a52ea264875e002e108dd4b207b15ab7cc7
>>    Signed-off-by: Mohit Agrawal <moagrawa at redhat.com>
>>
>> commit f9b6174a7f5eb6475ca9780b062bfb3ff1132b2d
>> Author: Shreyas Siravara <sshreyas at fb.com>
>> Date:   Mon Apr 10 12:36:21 2017 -0700
>>
>>     posix: Add option to disable nftw() based deletes when purging the landfill directory
>>
>>     Summary:
>>     - We may have found an issue where certain directories were being moved into .landfill and then being quickly purged via nftw().
>>     - We would like to have an emergency option to disable these purges.
>>
>>     > Reviewed-on: https://review.gluster.org/18253
>>     > Reviewed-by: Shreyas Siravara <sshreyas at fb.com>
>>
>>     Fixes #371
>>
>>     Change-Id: I90b54c535930c1ca2925a928728199b6b80eadd9
>>     Signed-off-by: Amar Tumballi <amarts at redhat.com>
>>
>> commit 59d1cc720f52357f7a6f20bb630febc6a622c99c
>> Author: Raghavendra G <rgowdapp at redhat.com>
>> Date:   Tue Sep 19 09:44:55 2017 +0530
>>
>>     cluster/dht: populate inode in dentry for single subvolume dht
>>
>>     ... in readdirp response if dentry points to a directory inode. This
>>     is a special case where the entire layout is stored in one single
>>     subvolume and hence no need for lookup to construct the layout
>>
>>     Change-Id: I44fd951e2393ec9dac2af120469be47081a32185
>>     BUG: 1492625
>>     Signed-off-by: Raghavendra G <rgowdapp at redhat.com>
>>
>> commit e785faead91f74dce7c832848f2e8f3f43bd0be5
>> Author: Raghavendra G <rgowdapp at redhat.com>
>> Date:   Mon Sep 18 16:01:34 2017 +0530
>>
>>     cluster/dht: don't overfill the buffer in readdir(p)
>>
>>     Superfluous dentries that cannot fit in the buffer size provided by
>>     kernel are thrown away by fuse-bridge. This means,
>>
>>     * the next readdir(p) seen by readdir-ahead would have an offset of a
>>     dentry returned in a previous readdir(p) response. When readdir-ahead
>>     detects non-monotonic offset it turns itself off which can result in
>>     poor readdir performance.
>>
>>     * readdirp can be cpu-intensive on brick and there is no point to read
>>      all those dentries just to be thrown away by fuse-bridge.
>>
>>     So, the best strategy would be to fill the buffer optimally - neither
>>     overfill nor underfill.
>>
>>     Change-Id: Idb3d85dd4c08fdc4526b2df801d49e69e439ba84
>>     BUG: 1492625
>>     Signed-off-by: Raghavendra G <rgowdapp at redhat.com>
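The "neither overfill nor underfill" strategy from the commit above amounts to stopping just before the first dentry that would not fit. A rough sketch in Python follows, not DHT's C code: `dirent_size` and `fill_readdir_buffer` are hypothetical names, and the 24-byte header with 8-byte alignment is an assumed per-entry cost chosen for illustration.

```python
def dirent_size(name, header=24, align=8):
    """Assumed on-wire size of one dentry: fixed header plus name, padded."""
    raw = header + len(name.encode())
    return (raw + align - 1) // align * align


def fill_readdir_buffer(entries, buf_size):
    """Pick the longest prefix of (offset, name) entries that fits in buf_size.

    Returns (chosen_entries, bytes_used). Nothing beyond the buffer is
    returned, so the caller's next readdir offset follows the last entry
    actually delivered and stays monotonic.
    """
    chosen, used = [], 0
    for offset, name in entries:
        need = dirent_size(name)
        if used + need > buf_size:
            break  # the next entry would be thrown away, so stop here
        chosen.append((offset, name))
        used += need
    return chosen, used
```

With three one-character names (32 bytes each under the assumed sizing) and a 70-byte buffer, only the first two entries are returned and 64 bytes are used; the third is left for the next call instead of being read and discarded.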
>>
>> commit 4ad64ffe8664cc0b964586af6efcf53cc619b68a
>> Author: Pranith Kumar K <pkarampu at redhat.com>
>> Date:   Fri Nov 17 07:20:21 2017 +0530
>>
>>     ec: Use tiebreaker_inodelk where necessary
>>
>>     When there are big directories or files that need to be healed,
>>     other shds are stuck on getting lock on self-heal domain for these
>>     directories/files. If there is a tie-breaker logic, other shds
>>     can heal some other files/directories while 1 of the shds is healing
>>     the big file/directory.
>>
>>     Before this patch:
>>     96.67  4890.64 us 12.89 us 646115887.30us 340869 INODELK
>>     After this patch:
>>     40.76  42.35 us   15.09 us 6546.50us 438478 INODELK
>>
>>     Fixes gluster/glusterfs#354
>>     Change-Id: Ia995b5576b44f770c064090705c78459e543cc64
>>     Signed-off-by: Pranith Kumar K <pkarampu at redhat.com>
>>
>> commit 3f8d118e48f11f448f35aca0c48ad40e0fd34f5b
>> Author: Xavier Hernandez <jahernan at redhat.com>
>> Date:   Tue Nov 7 13:45:03 2017 +0100
>>
>>     libglusterfs/atomic: Improved atomic support
>>
>>     This patch solves a detection problem in configure.ac that prevented
>>     the build from detecting builtin __atomic or __sync functions.
>>
>>     It also adds more atomic types and support for other atomic functions.
>>
>>     A special case has been added to support 64-bit atomics on 32-bit
>>     systems. The solution is to fall back to the mutex solution only for
>>     64-bit atomics, but smaller atomic types will still take advantage
>>     of builtins if available.
>>
>>     Change-Id: I6b9afc7cd6e66b28a33278715583552872278801
>>     BUG: 1510397
>>     Signed-off-by: Xavier Hernandez <jahernan at redhat.com>
>>
>> commit 0dcd5b2feeeec7c29bd2454d6ad950d094d02b0f
>> Author: Xavier Hernandez <jahernan at redhat.com>
>> Date:   Mon Oct 16 13:57:59 2017 +0200
>>
>>     cluster/ec: create eager-lock option for non-regular files
>>
>>     A new option is added to allow independent configuration of eager
>>     locking for regular files and non-regular files.
>>
>>     Change-Id: I8f80e46d36d8551011132b15c0fac549b7fb1c60
>>     BUG: 1502610
>>     Signed-off-by: Xavier Hernandez <jahernan at redhat.com>
>>
>> Apart from these commits there are also some patches which aid
>> concurrency in the code. I've left them out since their performance benefits
>> are not measured and they don't affect users directly. If you feel these have
>> to be added please let me know. Some changes are:
>> * Patches from Zhang Huan <zhanghuan at open-fs.com> aimed to reduce lock
>> contention in rpc layer and while accessing fdtable,
>> * Patches from Milind Changire <mchangir at redhat.com> while accessing
>> programs in rpcsvc.
>>
>> From the commits listed above, I see that the following components are
>> affected, and I've listed the owners who should provide a short summary of
>> changes for each component:
>> * glusterd: optimize glusterd import volumes code path - Atin
>> * md-cache - Shreyas and Poornima
>> * EC - Xavi and Pranith (I see that pranith already sent an update. So I
>> guess this is covered)
>> * Improvements to consumption of Atomic Builtins - Xavi and Mohit
>> * Improvements to glusterfind - Niklas Hambüchen, Milind and Aravinda V K
>> * Modification of Quick-read to consume upcall notifications - Poornima
>> * Exposing trickling-writes in write-behind - Csaba and Shreyas
>> * Changes to Purging landfill directory in storage/posix - Shreyas
>> * Adding option to full file lock in afr - Karthick Subramanya
>> * readdirplus enhancements in DHT - Raghavendra Gowdappa
>> * Dentry Fop Serializer - Raghavendra Gowdappa and Amar
>>
>> Please send out patches updating the "Performance" section of the release notes.
>> If you think your patch need not be mentioned in the release notes, please
>> send an explicit nack so that we'll know.
>>
>> If I've left out any fixes, please point them out. If not, only a subset of
>> the changes listed above will have a mention in the "performance" section of
>> the release notes.
>>
>> On Tue, Feb 20, 2018 at 7:59 AM, Raghavendra Gowdappa <
>> rgowdapp at redhat.com> wrote:
>>
>>> +gluster-devel.
>>>
>>>
>>> On Tue, Feb 20, 2018 at 7:35 AM, Raghavendra Gowdappa <
>>> rgowdapp at redhat.com> wrote:
>>>
>>>> All,
>>>>
>>>> I am trying to come up with content for release notes for 4.0
>>>> summarizing performance impact. Can you point me to
>>>> patches/documentation/issues/bugs that could impact performance in
>>>> 4.0? Better still, if you can give me a summary of changes having
>>>> performance impact, it would be really helpful.
>>>>
>>>> I see that Pranith had responded with this link:
>>>> https://review.gluster.org/#/c/19535/3/doc/release-notes/4.0.0.md
>>>>
>>>> regards,
>>>> Raghavendra
>>>>
>>>
>>>
>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>
>
>
> --
> Amye Scavarda | amye at redhat.com | Gluster Community Lead
>