[Gluster-Maintainers] [Gluster-devel] Inputs for 4.0 Release notes on Performance
Amye Scavarda
amye at redhat.com
Wed Feb 21 04:17:02 UTC 2018
It may be more effective to email the direct parties; I know that I filter
out mailing lists and don't always see these in time.
Given that this is somewhat time-critical and we'll need to get release notes
out shortly, I suggest taking it to direct emails.
- amye
On Tue, Feb 20, 2018 at 8:11 PM, Raghavendra Gowdappa <rgowdapp at redhat.com>
wrote:
> From 'git log release-3.13..release-4.0' I see the following patches that
> might have an impact on performance:
>
> commit a32ff73c06e1e14589817b1701c1c8d0f05aaa04
> Author: Atin Mukherjee <amukherj at redhat.com>
> Date: Mon Jan 29 10:23:52 2018 +0530
>
> glusterd: optimize glusterd import volumes code path
>
> In case a version mismatch is detected for one of the volumes,
> glusterd ended up updating all the volumes, which is overkill.
>
> >mainline patch : https://review.gluster.org/#/c/19358/
>
> Change-Id: I6df792db391ce3a1697cfa9260f7dbc3f59aa62d
> BUG: 1540554
> Signed-off-by: Atin Mukherjee <amukherj at redhat.com>
> (cherry picked from commit bb34b07fd2ec5e6c3eed4fe0cdf33479dbf5127b)
>
> commit ea972d9f5c9b318429c228108c21a334b4acd95c
> Author: Sakshi Bansal <sabansal at redhat.com>
> Date: Mon Jan 22 14:38:17 2018 +0530
>
> dentry fop serializer: added new server side xlator for dentry fop
> serialization
>
> Problems addressed by this xlator :
>
> [1]. To prevent races between parallel mkdir/mkdir, mkdir/lookup, etc.
>
> Fops like mkdir/create, lookup, rename, unlink, link that happen on a
> particular dentry must be serialized to ensure atomicity.
>
> Another possible case is a fresh lookup to find the existence of a path
> whose gfid is not yet set. Further, storage/posix employs a ctime-based
> heuristic 'is_fresh_file' (the interval is less than 1 second of the
> current time) to check the freshness of a file. With serialization of
> these two fops (lookup & mkdir), we eliminate the race altogether.
>
> [2]. Staleness of dentries
>
> Stale dentries cause an exponential increase in traversal time for any
> inode in the subtree of the directory pointed to by the stale dentry.
>
> Cause : A stale dentry is created by the following two operations:
>
> a. dentry creation due to inode_link, done during operations like
> lookup, mkdir, create, mknod, symlink, and
> b. dentry unlinking due to various operations like rmdir, rename,
> unlink.
>
> The reason is that __inode_link uses __is_dentry_cyclic, which
> explores all possible paths to avoid cyclic link formation during
> inode linkage. __is_dentry_cyclic explores stale dentry(ies) and
> all their ancestors, which increases traversal time exponentially.
>
> Implementation : To achieve this, all fops on a dentry must take entry
> locks before they proceed; once they have acquired the locks, they
> perform the fop and then release the lock.
>
> Some documentation from email conversation:
> [1] http://www.gluster.org/pipermail/gluster-devel/2015-December/047314.html
>
> [2] http://www.gluster.org/pipermail/gluster-devel/2015-August/046428.html
>
> With this patch, the feature is optional, enable it by running:
>
> `gluster volume set $volname features.sdfs enable`
>
> Change-Id: I6e80ba3cabfa6facd5dda63bd482b9bf18b6b79b
> Fixes: #397
> Signed-off-by: Sakshi Bansal <sabansal at redhat.com>
> Signed-off-by: Amar Tumballi <amarts at redhat.com>
> Signed-off-by: Sunny Kumar <sunkumar at redhat.com>
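
[Editorial note: the serialization described above can be sketched conceptually. This is an illustrative Python sketch of the per-dentry entry-lock idea only, not the actual sdfs xlator code; the names and the in-memory "filesystem" below are invented.]

```python
import threading
from collections import defaultdict

# One lock per (parent, basename) dentry: every fop on the same dentry
# must hold this lock, so racing mkdir/mkdir or mkdir/lookup serialize.
_entry_locks = defaultdict(threading.Lock)

def serialized_fop(parent_gfid, basename, fop):
    """Run `fop` while holding the entry lock for this dentry."""
    with _entry_locks[(parent_gfid, basename)]:
        return fop()

# Toy "filesystem": two racing mkdirs on the same name now collapse
# into one winner and one EEXIST instead of a race.
created = set()

def mkdir(name):
    def _do():
        if name in created:
            return "EEXIST"
        created.add(name)
        return "OK"
    return serialized_fop("root-gfid", name, _do)

results = [mkdir("dir1"), mkdir("dir1")]
```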
>
>
> commit 24bf7715140586675f8d2036f4d589bc255c16dc
> Author: Poornima G <pgurusid at redhat.com>
> Date: Tue Jan 9 17:26:44 2018 +0530
>
> md-cache: Implement dynamic configuration of xattr list for caching
>
> Currently, the list of xattrs that md-cache can cache is hard-coded
> in the md-cache.c file; this necessitates a code change and rebuild
> every time a new xattr needs to be added to the md-cache xattr cache
> list.
>
> With this patch, the user can configure a comma-separated list of
> xattrs to be cached by md-cache.
>
> Updates #297
>
> Change-Id: Ie35ed607d17182d53f6bb6e6c6563ac52bc3132e
> Signed-off-by: Poornima G <pgurusid at redhat.com>
>
> commit efc30e60e233164bd4fe7fc903a7c5f718b0448b
> Author: Poornima G <pgurusid at redhat.com>
> Date: Tue Jan 9 10:32:16 2018 +0530
>
> upcall: Allow md-cache to specify invalidations on xattr with wildcard
>
> Currently, md-cache sends a list of xattrs it is interested in
> receiving invalidations for, but it cannot specify any wildcard in
> the xattr names. E.g. user.* - invalidate on updating any xattr with
> the user. prefix.
>
> This patch enables upcall to honor wildcards in the xattr key names.
>
> Updates: #297
>
> Change-Id: I98caf0ed72f11ef10770bf2067d4428880e0a03a
> Signed-off-by: Poornima G <pgurusid at redhat.com>
>
> commit 8fc9c6a8fc7c73b2b4c65a8ddbe988bca10e89b6
> Author: Poornima G <pgurusid at redhat.com>
> Date: Thu Jan 4 19:38:05 2018 +0530
>
> posix: In getxattr, honor the wildcard '*'
>
> Currently, posix_xattr_fill performs a sys_getxattr
> on all the keys requested. There are requirements where
> the keys could contain a wildcard, in which case sys_getxattr
> would return ENODATA; e.g. if the xattr requested is user.*,
> all the xattrs with the prefix user. should be returned, with
> their values.
>
> This patch changes posix_xattr_fill to honor wildcards in the keys
> requested.
>
> Updates #297
>
> Change-Id: I3d52da2957ac386fca3c156e26ff4cdf0b2c79a9
> Signed-off-by: Poornima G <pgurusid at redhat.com>
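
[Editorial note: the wildcard behaviour these two patches describe is essentially glob-style matching of xattr names. A toy Python sketch of the idea; the xattr names below are invented and this is not the posix xlator's actual code.]

```python
from fnmatch import fnmatch

# Hypothetical xattrs present on some file.
xattrs = {
    "user.comment": b"hello",
    "user.origin": b"test",
    "trusted.gfid": b"\x00" * 16,
}

def fill_xattrs(requested_key):
    """Return all xattrs whose names match the requested key,
    which may contain a wildcard such as 'user.*'."""
    return {k: v for k, v in xattrs.items() if fnmatch(k, requested_key)}

matched = fill_xattrs("user.*")        # everything with the user. prefix
exact = fill_xattrs("trusted.gfid")    # plain keys still match exactly
```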
>
> commit 84c5c540b26c8f3dcb9845344dd48df063e57845
> Author: karthik-us <ksubrahm at redhat.com>
> Date: Wed Jan 17 17:30:06 2018 +0530
>
> cluster/afr: Adding option to take full file lock
>
> Problem:
> In replica 3 volumes there is a possibility of ending up in a
> split-brain scenario when multiple clients write data to the same
> file at non-overlapping regions in parallel.
>
> Scenario:
> - Initially all the copies are good and all the clients get the value
> of data-readables as all good.
> - Client C0 performs write W1 which fails on brick B0 and succeeds on
> other two bricks.
> - C1 performs write W2 which fails on B1 and succeeds on other two
> bricks.
> - C2 performs write W3 which fails on B2 and succeeds on other two
> bricks.
> - All the 3 writes above happen in parallel and fall on different
> ranges
> so afr takes granular locks and all the writes are performed in
> parallel.
> Since each client had data-readables as good, it does not see the
> file going into split-brain in the in_flight_split_brain check, and
> hence performs the post-op marking the pending xattrs. Now all the
> bricks are blamed by each other, ending up in split-brain.
>
> Fix:
> Have an option to take either a full lock or a range lock on files
> while doing data transactions, to prevent the possibility of ending
> up in split-brain. With this change, by default the files will take a
> full lock while doing IO. If you want to use the old range-lock
> behaviour, change the value of "cluster.full-lock" to "no".
>
> Change-Id: I7893fa33005328ed63daa2f7c35eeed7c5218962
> BUG: 1535438
> Signed-off-by: karthik-us <ksubrahm at redhat.com>
>
> commit 2db7872d5251d98d47c262ff269776bfae2d4fb9
> Author: Poornima G <pgurusid at redhat.com>
> Date: Mon Aug 7 11:24:46 2017 +0530
>
> md-cache: Serve nameless lookup from cache
>
> Updates #232
> Change-Id: I97e92312a53a50c2d1660bf8d657201fc05a76eb
> Signed-off-by: Poornima G <pgurusid at redhat.com>
>
> commit 78d67da17356b48cf1d5a6595764650d5b200ba7
> Author: Sunil Kumar Acharya <sheggodu at redhat.com>
> Date: Thu Mar 23 12:50:41 2017 +0530
>
> cluster/ec: OpenFD heal implementation for EC
>
> Existing EC code doesn't try to heal open FDs to
> avoid unnecessary healing of the data later.
>
> The fix implements the healing of open FDs before
> carrying out file operations on them, by making an
> attempt to open the FDs on the required up nodes.
>
> BUG: 1431955
> Change-Id: Ib696f59c41ffd8d5678a484b23a00bb02764ed15
> Signed-off-by: Sunil Kumar Acharya <sheggodu at redhat.com>
>
> commit 14dbd5da1cae64e6d4d2c69966e19844d090ce98
> Author: Niklas Hambüchen <mail at nh2.me>
> Date: Fri Dec 29 15:49:13 2017 +0100
>
> glusterfind: Speed up gfid lookup 100x by using an SQL index
>
> Fixes #1529883.
>
> This fixes some bits of `glusterfind`'s horrible performance,
> making it 100x faster.
>
> Until now, glusterfind was, for each line in each CHANGELOG.* file,
> linearly reading the entire contents of the sqlite database in
> 4096-bytes-sized pread64() syscalls when executing the
>
> SELECT COUNT(1) FROM %s WHERE 1=1 AND gfid = ?
>
> query through the code path:
>
> get_changes()
> parse_changelog_to_db()
> when_data_meta()
> gfidpath_exists()
> _exists()
>
> In a quick benchmark on my laptop, doing one such `SELECT` query
> took ~75ms on a 10MB-sized sqlite DB, while doing the same query
> with an index took < 1ms.
>
> Change-Id: I8e7fe60f1f45a06c102f56b54d2ead9e0377794e
> BUG: 1529883
> Signed-off-by: Niklas Hambüchen <mail at nh2.me>
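
[Editorial note: the effect of the index is easy to reproduce with sqlite alone. A self-contained sketch; the table/column names below are guessed from the gfidpath_exists() helper mentioned in the commit, and the schema is a simplification.]

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE gfidpath (gfid TEXT, path TEXT)")
db.executemany("INSERT INTO gfidpath VALUES (?, ?)",
               [(f"gfid-{i}", f"/path/{i}") for i in range(1000)])

# Without an index, the COUNT(1) ... WHERE gfid = ? query from the
# commit message is a full table scan.
plan_before = db.execute(
    "EXPLAIN QUERY PLAN SELECT COUNT(1) FROM gfidpath WHERE gfid = ?",
    ("gfid-42",)).fetchone()[-1]

# The essence of the fix: one index on the gfid column.
db.execute("CREATE INDEX gfid_idx ON gfidpath (gfid)")

# Now the same query is an index search instead of a scan.
plan_after = db.execute(
    "EXPLAIN QUERY PLAN SELECT COUNT(1) FROM gfidpath WHERE gfid = ?",
    ("gfid-42",)).fetchone()[-1]
```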
>
> commit c96a1338fe8139d07a0aa1bc40f0843d033f0324
> Author: Pranith Kumar K <pkarampu at redhat.com>
> Date: Wed Dec 6 07:59:53 2017 +0530
>
> cluster/ec: Change [f]getxattr to parallel-dispatch-one
>
> At the moment in EC, [f]getxattr operations wait to acquire a lock
> while other operations are in progress, even when it is in the same
> mount with a lock on the file/directory. This happens because
> [f]getxattr operations follow the model where the operation is wound
> on 'k' of the bricks and the replies are matched to make sure the
> data returned is the same on all of them. This consistency check
> requires that no other operations are on-going while [f]getxattr
> operations are wound to the bricks. We can perform [f]getxattr in
> another way as well, where we find the good_mask from the lock that
> is already granted, wind the operation on any one of the good bricks,
> and unwind the answer after adjusting size/blocks to the parent
> xlator. Since we take good_mask into account, the reply we get will
> either be before or after a possible on-going operation. Using this
> method, the operation doesn't need to depend on the completion of
> on-going operations, which could take a long time (in case of some
> slow disks while writes are in progress, etc.). Thus we reduce the
> time to serve [f]getxattr requests.
>
> I changed [f]getxattr to dispatch-one and added extra logic in
> ec_link_has_lock_conflict() to not have any conflicts for fops with
> EC_MINIMUM_ONE as fop->minimum to achieve the effect described above.
> Modified scripts to make sure READ fop is received in EC to trigger
> heals.
>
> Updates gluster/glusterfs#368
> Change-Id: I3b4ebf89181c336b7b8d5471b0454f016cdaf296
> Signed-off-by: Pranith Kumar K <pkarampu at redhat.com>
>
> commit e255385ae4f4c8a883b3fb96baceba4b143828da
> Author: Csaba Henk <csaba at redhat.com>
> Date: Fri Nov 10 20:33:20 2017 +0100
>
> write-behind: Allow trickling-writes to be configurable
>
> This is the undisputed/trivial part of Shreyas' patch
> he attached to https://bugzilla.redhat.com/1364740 (of
> which the current bug is a clone).
>
> We need more evaluation for the page_size and window_size
> bits before taking them on.
>
> Change-Id: Iaa0b9a69d35e522b77a52a09acef47460e8ae3e9
> BUG: 1428060
> Co-authored-by: Shreyas Siravara <sshreyas at fb.com>
> Signed-off-by: Csaba Henk <csaba at redhat.com>
>
>
> commit c26cadd31dfa128c4ec6883f69d654813f351018
> Author: Poornima G <pgurusid at redhat.com>
> Date: Fri Jun 30 12:52:21 2017 +0530
>
> quick-read: Integrate quick read with upcall and increase cache time
>
> Fixes : #261
> Co-author: Subha sree Mohankumar <smohanku at redhat.com>
> Change-Id: Ie9dd94e86459123663b9b200d92940625ef68eab
> Signed-off-by: Poornima G <pgurusid at redhat.com>
>
> commit d95db5505a9cb923e61ccd23d28b45ceb07b716f
> Author: Shreyas Siravara <sshreyas at fb.com>
> Date: Thu Sep 7 15:34:58 2017 -0700
>
> md-cache: Cache statfs calls
>
> Summary:
> - This allows md-cache to cache statfs calls
> - You can turn it on or off via 'gluster vol set <volname>
> performance.md-cache-statfs <on|off>'
>
> Change-Id: I664579e3c19fb9a6cd9d7b3a0eae061f70f4def4
> BUG: 1523295
> Signature: t1:4652632:1488581841:111cc01efe83c71f1e98d075abb10589c4574705
> Reviewed-on: https://review.gluster.org/18228
> Reviewed-by: Shreyas Siravara <sshreyas at fb.com>
> CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
> Smoke: Gluster Build System <jenkins at build.gluster.org>
> Signed-off-by: Shreyas Siravara <sshreyas at fb.com>
>
> commit 430484c92ab5a6234958d1143e0bb14aeb0cd1c0
> Author: Mohit Agrawal <moagrawa at redhat.com>
> Date: Fri Oct 20 12:39:29 2017 +0530
>
> glusterfs: Use gcc builtin ATOMIC operators to increase/decrease
> refcount.
>
> Problem: In the glusterfs code base we call mutex_lock/unlock to take
> a reference/dereference on an object. Sometimes this can
> also be a reason for lock contention.
>
> Solution: There is no need to use a mutex to increase/decrease a ref
> counter; instead of a mutex, use gcc builtin ATOMIC
> operations.
>
> Test: I have not yet measured how much performance glusterfs itself
> gains after applying this patch, but I have tested the same
> with a small program (mutex and atomic, both) and got a
> good difference.
>
> Change-Id: Ie5030a52ea264875e002e108dd4b207b15ab7cc7
> Signed-off-by: Mohit Agrawal <moagrawa at redhat.com>
>
> commit f9b6174a7f5eb6475ca9780b062bfb3ff1132b2d
> Author: Shreyas Siravara <sshreyas at fb.com>
> Date: Mon Apr 10 12:36:21 2017 -0700
>
> posix: Add option to disable nftw()-based deletes when purging the
> landfill directory
>
> Summary:
> - We may have found an issue where certain directories were being
> moved into .landfill and then being quickly purged via nftw().
> - We would like to have an emergency option to disable these purges.
>
> > Reviewed-on: https://review.gluster.org/18253
> > Reviewed-by: Shreyas Siravara <sshreyas at fb.com>
>
> Fixes #371
>
> Change-Id: I90b54c535930c1ca2925a928728199b6b80eadd9
> Signed-off-by: Amar Tumballi <amarts at redhat.com>
>
> commit 59d1cc720f52357f7a6f20bb630febc6a622c99c
> Author: Raghavendra G <rgowdapp at redhat.com>
> Date: Tue Sep 19 09:44:55 2017 +0530
>
> cluster/dht: populate inode in dentry for single subvolume dht
>
> ... in the readdirp response if the dentry points to a directory
> inode. This is a special case where the entire layout is stored in
> one single subvolume, and hence there is no need for a lookup to
> construct the layout.
>
> Change-Id: I44fd951e2393ec9dac2af120469be47081a32185
> BUG: 1492625
> Signed-off-by: Raghavendra G <rgowdapp at redhat.com>
>
> commit e785faead91f74dce7c832848f2e8f3f43bd0be5
> Author: Raghavendra G <rgowdapp at redhat.com>
> Date: Mon Sep 18 16:01:34 2017 +0530
>
> cluster/dht: don't overfill the buffer in readdir(p)
>
> Superfluous dentries that cannot fit in the buffer size provided by
> the kernel are thrown away by fuse-bridge. This means,
>
> * the next readdir(p) seen by readdir-ahead would have an offset of a
> dentry returned in a previous readdir(p) response. When readdir-ahead
> detects non-monotonic offset it turns itself off which can result in
> poor readdir performance.
>
> * readdirp can be cpu-intensive on the brick, and there is no point
> in reading all those dentries just to be thrown away by fuse-bridge.
>
> So, the best strategy would be to fill the buffer optimally - neither
> overfill nor underfill.
>
> Change-Id: Idb3d85dd4c08fdc4526b2df801d49e69e439ba84
> BUG: 1492625
> Signed-off-by: Raghavendra G <rgowdapp at redhat.com>
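
[Editorial note: the "fill optimally" strategy amounts to packing dentries until the next one would overflow the kernel-supplied buffer. An illustrative Python sketch; the entry-size formula below is invented.]

```python
def fill_readdir_buffer(entries, entry_size, buf_size):
    """Return the prefix of `entries` that fits in `buf_size` bytes:
    neither overfill (extra entries get discarded by fuse-bridge)
    nor underfill (wasted round trips)."""
    reply, used = [], 0
    for name in entries:
        need = entry_size(name)
        if used + need > buf_size:
            break  # stop before overfilling the kernel buffer
        reply.append(name)
        used += need
    return reply

# Hypothetical sizing: a fixed per-entry header plus the name bytes.
size = lambda name: 24 + len(name)
got = fill_readdir_buffer(["a", "bb", "ccc", "dddd"], size, 55)
```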
>
> commit 4ad64ffe8664cc0b964586af6efcf53cc619b68a
> Author: Pranith Kumar K <pkarampu at redhat.com>
> Date: Fri Nov 17 07:20:21 2017 +0530
>
> ec: Use tiebreaker_inodelk where necessary
>
> When there are big directories or files that need to be healed,
> other shds are stuck getting the lock on the self-heal domain for
> these directories/files. With tie-breaker logic, other shds
> can heal some other files/directories while one of the shds is
> healing the big file/directory.
>
> Before this patch:
> 96.67 4890.64 us 12.89 us 646115887.30 us 340869 INODELK
> After this patch:
> 40.76 42.35 us 15.09 us 6546.50 us 438478 INODELK
> (columns: %-latency, avg-latency, min-latency, max-latency, calls, fop)
>
> Fixes gluster/glusterfs#354
> Change-Id: Ia995b5576b44f770c064090705c78459e543cc64
> Signed-off-by: Pranith Kumar K <pkarampu at redhat.com>
>
> commit 3f8d118e48f11f448f35aca0c48ad40e0fd34f5b
> Author: Xavier Hernandez <jahernan at redhat.com>
> Date: Tue Nov 7 13:45:03 2017 +0100
>
> libglusterfs/atomic: Improved atomic support
>
> This patch solves a detection problem in configure.ac that prevented
> compilation from detecting builtin __atomic or __sync functions.
>
> It also adds more atomic types and support for other atomic functions.
>
> A special case has been added to support 64-bit atomics on 32-bit
> systems. The solution is to fall back to the mutex-based solution
> only for 64-bit atomics; smaller atomic types will still take
> advantage of builtins if available.
>
> Change-Id: I6b9afc7cd6e66b28a33278715583552872278801
> BUG: 1510397
> Signed-off-by: Xavier Hernandez <jahernan at redhat.com>
>
> commit 0dcd5b2feeeec7c29bd2454d6ad950d094d02b0f
> Author: Xavier Hernandez <jahernan at redhat.com>
> Date: Mon Oct 16 13:57:59 2017 +0200
>
> cluster/ec: create eager-lock option for non-regular files
>
> A new option is added to allow independent configuration of eager
> locking for regular files and non-regular files.
>
> Change-Id: I8f80e46d36d8551011132b15c0fac549b7fb1c60
> BUG: 1502610
> Signed-off-by: Xavier Hernandez <jahernan at redhat.com>
>
> Apart from these commits there are also some patches which aid concurrency
> in the code. I've left them out since their performance benefits were not
> measured and they don't affect users directly. If you feel these have to
> be added, please let me know. Some changes are:
> * Patches from Zhang Huan <zhanghuan at open-fs.com> aimed to reduce lock
> contention in rpc layer and while accessing fdtable,
> * Patches from Milind Changire <mchangir at redhat.com> while accessing
> programs in rpcsvc.
>
> From the commits listed above, I see that the following components are
> affected, and I've listed owners for updating a short summary of changes
> along with the component:
> * glusterd: optimize glusterd import volumes code path - Atin
> * md-cache - Shreyas and Poornima
> * EC - Xavi and Pranith (I see that pranith already sent an update. So I
> guess this is covered)
> * Improvements to consumption of Atomic Builtins - Xavi and Mohit
> * Improvements to glusterfind - Niklas Hambüchen, Milind and Aravinda V K
> * Modification of Quick-read to consume upcall notifications - Poornima
> * Exposing trickling-writes in write-behind - Csaba and Shreyas
> * Changes to Purging landfill directory in storage/posix - Shreyas
> * Adding option to full file lock in afr - Karthick Subramanya
> * readdirplus enhancements in DHT - Raghavendra Gowdappa
> * Dentry Fop Serializer - Raghavendra Gowdappa and Amar
>
> Please send out patches updating the "Performance" section of the release
> notes. If you think your patch need not be mentioned in the release notes,
> please send an explicit nack so that we'll know.
>
> If I've left out any fixes, please point them out. If not, only the subset
> of changes listed above will have a mention in the "Performance" section
> of the release notes.
>
> On Tue, Feb 20, 2018 at 7:59 AM, Raghavendra Gowdappa <rgowdapp at redhat.com
> > wrote:
>
>> +gluster-devel.
>>
>>
>> On Tue, Feb 20, 2018 at 7:35 AM, Raghavendra Gowdappa <
>> rgowdapp at redhat.com> wrote:
>>
>>> All,
>>>
>>> I am trying to come up with content for release notes for 4.0
>>> summarizing performance impact. Can you point me to
>>> patches/documentation/issues/bugs that could impact performance in 4.0?
>>> Better still, if you can give me a summary of changes having performance
>>> impact, it would be really be helpful.
>>>
>>> I see that Pranith had responded with this link:
>>> https://review.gluster.org/#/c/19535/3/doc/release-notes/4.0.0.md
>>>
>>> regards,
>>> Raghavendra
>>>
>>
>>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel
>
--
Amye Scavarda | amye at redhat.com | Gluster Community Lead