[Gluster-devel] Inputs for 4.0 Release notes on Performance
Raghavendra Gowdappa
rgowdapp at redhat.com
Wed Feb 21 04:18:29 UTC 2018
On Wed, Feb 21, 2018 at 9:47 AM, Amye Scavarda <amye at redhat.com> wrote:
> It may be more effective to email the direct parties; I know that I filter
> out mailing lists and don't always see this in time.
> Given that this is somewhat time-critical and we'll need to get release
> notes out shortly, I suggest taking it to direct emails.
>
I added the individual owners to the CC list. For some reason, they are not
reflected in the CC list, but I guess they would have received the mails directly.
> - amye
>
> On Tue, Feb 20, 2018 at 8:11 PM, Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:
>
>> From 'git log release-3.13..release-4.0' I see the following patches that
>> might have an impact on performance:
>>
>> commit a32ff73c06e1e14589817b1701c1c8d0f05aaa04
>> Author: Atin Mukherjee <amukherj at redhat.com>
>> Date: Mon Jan 29 10:23:52 2018 +0530
>>
>> glusterd: optimize glusterd import volumes code path
>>
>> When a version mismatch was detected for one of the volumes, glusterd
>> ended up updating all the volumes, which is overkill.
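The idea behind the fix (re-import only the volume whose version differs, not every volume) can be illustrated with a minimal sketch. This is not glusterd's code: `struct vol` and `import_mismatched()` are hypothetical, and a single integer version per volume is assumed to stand in for whatever glusterd actually compares.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical volume record: a name plus a monotonically increasing version. */
struct vol {
    char name[64];
    unsigned version;
};

/* Update only the local volumes whose peer copy is newer, instead of
 * re-importing all of them. Returns the number of volumes imported. */
size_t import_mismatched(struct vol *local, size_t nlocal,
                         const struct vol *peer, size_t npeer)
{
    size_t imported = 0;
    for (size_t i = 0; i < npeer; i++) {
        for (size_t j = 0; j < nlocal; j++) {
            if (strcmp(local[j].name, peer[i].name) != 0)
                continue;
            if (peer[i].version > local[j].version) {
                local[j] = peer[i]; /* import just this volume */
                imported++;
            }
            break; /* names are unique; stop searching */
        }
    }
    return imported;
}
```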
>>
>> >mainline patch : https://review.gluster.org/#/c/19358/
>>
>> Change-Id: I6df792db391ce3a1697cfa9260f7dbc3f59aa62d
>> BUG: 1540554
>> Signed-off-by: Atin Mukherjee <amukherj at redhat.com>
>> (cherry picked from commit bb34b07fd2ec5e6c3eed4fe0cdf33479dbf5127b)
>>
>> commit ea972d9f5c9b318429c228108c21a334b4acd95c
>> Author: Sakshi Bansal <sabansal at redhat.com>
>> Date: Mon Jan 22 14:38:17 2018 +0530
>>
>> dentry fop serializer: added new server side xlator for dentry fop
>> serialization
>>
>> Problems addressed by this xlator:
>>
>> [1]. To prevent races between parallel mkdir/mkdir, mkdir/lookup, etc.
>>
>> Fops like mkdir/create, lookup, rename, unlink, link that happen on a
>> particular dentry must be serialized to ensure atomicity.
>>
>> Another possible case is a fresh lookup to find out whether a path
>> whose gfid is not set yet exists. Further, storage/posix employs a
>> ctime-based heuristic, 'is_fresh_file' (the ctime is less than 1
>> second before the current time), to check the freshness of a file.
>> By serializing these two fops (lookup and mkdir), we eliminate the
>> race altogether.
>>
>> [2]. Staleness of dentries
>>
>> This causes an exponential increase in traversal time for any inode in
>> the subtree of the directory pointed to by a stale dentry.
>>
>> Cause: A stale dentry is created because of the following two operations:
>>
>> a. dentry creation due to inode_link, done during operations like
>> lookup, mkdir, create, mknod, and symlink, and
>> b. dentry unlinking due to various operations like rmdir, rename,
>> and unlink.
>>
>> The reason is that __inode_link uses __is_dentry_cyclic, which explores
>> all possible paths to avoid forming a cyclic link during inode
>> linkage. __is_dentry_cyclic explores stale dentries and all their
>> ancestors, which increases traversal time exponentially.
>>
>> Implementation: To achieve this, all fops on a dentry must take entry
>> locks before they proceed. Once they have acquired the locks, they
>> perform the fop and then release the lock.
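The serialization described above can be sketched as an in-process analogue. This is only an illustration under stated assumptions: the real xlator takes entry locks on the bricks, whereas here a striped table of pthread mutexes keyed by a hash of (parent, name) plays that role. `toy_mkdir()`, `toy_lookup()` and the striping scheme are hypothetical. Because mkdir and lookup on the same dentry always take the same mutex, a lookup sees the entry either fully created or absent, and two racing mkdirs cannot both succeed.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NSTRIPES 64   /* number of lock stripes (illustrative) */
#define MAXENT   128  /* entries per stripe in this toy namespace */

/* Each stripe owns the dentries that hash to it; its mutex serializes
 * every fop on those dentries. */
struct stripe {
    pthread_mutex_t lock;
    int n;
    struct { uint64_t parent; char name[32]; } ent[MAXENT];
};

static struct stripe table[NSTRIPES];
static pthread_once_t table_once = PTHREAD_ONCE_INIT;

static void table_init(void)
{
    for (int i = 0; i < NSTRIPES; i++)
        pthread_mutex_init(&table[i].lock, NULL);
}

/* FNV-1a over (parent gfid stand-in, name): same dentry -> same stripe. */
static struct stripe *stripe_of(uint64_t parent, const char *name)
{
    uint64_t h = 14695981039346656037ULL;
    for (int i = 0; i < 8; i++) {
        h ^= (parent >> (8 * i)) & 0xff;
        h *= 1099511628211ULL;
    }
    for (; *name; name++) {
        h ^= (unsigned char)*name;
        h *= 1099511628211ULL;
    }
    pthread_once(&table_once, table_init);
    return &table[h % NSTRIPES];
}

/* Caller must hold s->lock. */
static int find(struct stripe *s, uint64_t parent, const char *name)
{
    for (int i = 0; i < s->n; i++)
        if (s->ent[i].parent == parent && strcmp(s->ent[i].name, name) == 0)
            return i;
    return -1;
}

/* Check-and-create runs under the dentry's lock, so racing mkdirs
 * (or a racing lookup) can never interleave with it. */
int toy_mkdir(uint64_t parent, const char *name)
{
    if (strlen(name) >= sizeof(table[0].ent[0].name))
        return -ENAMETOOLONG;
    struct stripe *s = stripe_of(parent, name);
    int ret = 0;
    pthread_mutex_lock(&s->lock);
    if (find(s, parent, name) >= 0) {
        ret = -EEXIST;
    } else if (s->n == MAXENT) {
        ret = -ENOSPC;
    } else {
        s->ent[s->n].parent = parent;
        strcpy(s->ent[s->n].name, name);
        s->n++;
    }
    pthread_mutex_unlock(&s->lock);
    return ret;
}

/* lookup takes the same lock as mkdir for this dentry. */
bool toy_lookup(uint64_t parent, const char *name)
{
    struct stripe *s = stripe_of(parent, name);
    pthread_mutex_lock(&s->lock);
    bool found = find(s, parent, name) >= 0;
    pthread_mutex_unlock(&s->lock);
    return found;
}
```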
>>
>> Some documentation from email conversation:
>> [1] http://www.gluster.org/pipermail/gluster-devel/2015-December/047314.html
>>
>> [2] http://www.gluster.org/pipermail/gluster-devel/2015-August/046428.html
>>
>> With this patch, the feature is optional; enable it by running:
>>
>> `gluster volume set $volname features.sdfs enable`
>>
>> Change-Id: I6e80ba3cabfa6facd5dda63bd482b9bf18b6b79b
>> Fixes: #397
>> Signed-off-by: Sakshi Bansal <sabansal at redhat.com>
>> Signed-off-by: Amar Tumballi <amarts at redhat.com>
>> Signed-off-by: Sunny Kumar <sunkumar at redhat.com>
>>
>>
>> commit 24bf7715140586675f8d2036f4d589bc255c16dc
>> Author: Poornima G <pgurusid at redhat.com>
>> Date: Tue Jan 9 17:26:44 2018 +0530
>>
>> md-cache: Implement dynamic configuration of xattr list for caching
>>
>> Currently, the list of xattrs that md-cache can cache is hard-coded
>> in the md-cache.c file; this necessitates a code change and rebuild
>> every time a new xattr needs to be added to the md-cache xattr cache
>> list.
>>
>> With this patch, the user will be able to configure a comma-separated
>> list of xattrs to be cached by md-cache.
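Turning such a comma-separated option value into a list of keys might look like the sketch below. It is a hypothetical helper, not md-cache's parser: `parse_xattr_list()` and `free_xattr_list()` are illustrative names, and trimming surrounding whitespace and skipping empty items are assumptions about how a user-supplied value should be treated.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Strip leading/trailing spaces and tabs in place. */
static char *trim(char *s)
{
    while (*s == ' ' || *s == '\t')
        s++;
    char *end = s + strlen(s);
    while (end > s && (end[-1] == ' ' || end[-1] == '\t'))
        *--end = '\0';
    return s;
}

void free_xattr_list(char **list)
{
    if (!list)
        return;
    for (char **p = list; *p; p++)
        free(*p);
    free(list);
}

/* Split e.g. "user.foo, security.bar" into a NULL-terminated array of
 * trimmed keys. Returns NULL on allocation failure. */
char **parse_xattr_list(const char *value)
{
    char *copy = strdup(value);
    if (!copy)
        return NULL;
    size_t cap = 1;
    for (const char *c = value; *c; c++)
        if (*c == ',')
            cap++;
    char **list = calloc(cap + 1, sizeof(*list)); /* +1 for the NULL end */
    if (!list) {
        free(copy);
        return NULL;
    }
    size_t n = 0;
    char *save = NULL;
    for (char *tok = strtok_r(copy, ",", &save); tok;
         tok = strtok_r(NULL, ",", &save)) {
        tok = trim(tok);
        if (*tok == '\0')
            continue; /* ignore empty items */
        list[n] = strdup(tok);
        if (!list[n]) {
            free(copy);
            free_xattr_list(list);
            return NULL;
        }
        n++;
    }
    free(copy);
    return list;
}
```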
>>
>> Updates #297
>>
>> Change-Id: Ie35ed607d17182d53f6bb6e6c6563ac52bc3132e
>> Signed-off-by: Poornima G <pgurusid at redhat.com>
>>
>> commit efc30e60e233164bd4fe7fc903a7c5f718b0448b
>> Author: Poornima G <pgurusid at redhat.com>
>> Date: Tue Jan 9 10:32:16 2018 +0530
>>
>> upcall: Allow md-cache to specify invalidations on xattr with wildcard
>>
>> Currently, md-cache sends a list of xattrs it is interested in
>> receiving invalidations for. But it cannot specify any wildcard in
>> the xattr names, e.g. user.* - invalidate on updating any xattr with
>> the user. prefix.
>>
>> This patch enables upcall to honor wildcards in the xattr key names.
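On the upcall side, deciding whether a modified xattr should trigger an invalidation could be sketched with POSIX `fnmatch(3)` as the matcher. Using `fnmatch` and the name `xattr_wants_invalidation()` are assumptions for illustration; the commit does not say how the matching is implemented.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if the modified xattr key matches any pattern the client
 * registered interest in. Patterns may contain shell-style wildcards,
 * e.g. "user.*" matches every key with the "user." prefix. */
bool xattr_wants_invalidation(const char *key,
                              const char *const *patterns, size_t npatterns)
{
    for (size_t i = 0; i < npatterns; i++)
        if (fnmatch(patterns[i], key, 0) == 0)
            return true;
    return false;
}
```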
>>
>> Updates: #297
>>
>> Change-Id: I98caf0ed72f11ef10770bf2067d4428880e0a03a
>> Signed-off-by: Poornima G <pgurusid at redhat.com>
>>
>> commit 8fc9c6a8fc7c73b2b4c65a8ddbe988bca10e89b6
>> Author: Poornima G <pgurusid at redhat.com>
>> Date: Thu Jan 4 19:38:05 2018 +0530
>>
>> posix: In getxattr, honor the wildcard '*'
>>
>> Currently, posix_xattr_fill performs a sys_getxattr
>> on all the keys requested. There are requirements where
>> the keys could contain a wildcard, in which case sys_getxattr
>> would return ENODATA. E.g., if the xattr requested is user.*,
>> all the xattrs with the prefix user. should be returned, with their
>> values.
>>
>> This patch changes posix_xattr_fill to honor wildcards in the keys
>> requested.
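A wildcard key cannot be passed to getxattr directly, so one way to honor it is to list the file's xattr names first and keep the ones that match. The sketch below filters a buffer in the format `listxattr(2)` returns (consecutive NUL-terminated names); a caller would then fetch each matched value with getxattr. `expand_xattr_key()` is hypothetical, and supporting only a trailing `*` is a simplifying assumption.

```c
#include <stddef.h>
#include <string.h>

/* names/len: a buffer of consecutive NUL-terminated xattr names, as
 * filled by listxattr(2). A key ending in '*' (e.g. "user.*") matches
 * every name with that prefix; any other key must match exactly.
 * Calls cb (if non-NULL) for each match and returns the match count. */
size_t expand_xattr_key(const char *names, size_t len, const char *key,
                        void (*cb)(const char *name, void *arg), void *arg)
{
    size_t klen = strlen(key);
    int wildcard = klen > 0 && key[klen - 1] == '*';
    size_t plen = wildcard ? klen - 1 : klen;
    size_t count = 0;

    for (const char *p = names; p < names + len; p += strlen(p) + 1) {
        int match = wildcard ? strncmp(p, key, plen) == 0
                             : strcmp(p, key) == 0;
        if (match) {
            if (cb)
                cb(p, arg); /* e.g. getxattr(path, p, ...) here */
            count++;
        }
    }
    return count;
}
```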
>>
>> Updates #297
>>
>> Change-Id: I3d52da2957ac386fca3c156e26ff4cdf0b2c79a9
>> Signed-off-by: Poornima G <pgurusid at redhat.com>
>>
>> commit 84c5c540b26c8f3dcb9845344dd48df063e57845
>> Author: karthik-us <ksubrahm at redhat.com>
>> Date: Wed Jan 17 17:30:06 2018 +0530
>>
>> cluster/afr: Adding option to take full file lock
>>
>> Problem:
>> In replica 3 volumes there is a possibility of ending up in a split-brain
>> scenario when multiple clients write data to the same file at
>> non-overlapping regions in parallel.
>>
>> Scenario:
>> - Initially all the copies are good and all the clients get the value
>> of data-readables as all good.
>> - Client C0 performs write W1, which fails on brick B0 and succeeds on
>> the other two bricks.
>> - C1 performs write W2, which fails on B1 and succeeds on the other two
>> bricks.
>> - C2 performs write W3, which fails on B2 and succeeds on the other two
>> bricks.
>> - All 3 writes above happen in parallel and fall on different ranges,
>> so afr takes granular locks and all the writes are performed in
>> parallel. Since each client had data-readables as good, it does not
>> see the file going into split-brain in the in_flight_split_brain
>> check, and hence performs the post-op marking the pending xattrs. Now
>> all the bricks are blamed by each other, ending up in split-brain.
>>
>> Fix:
>> Have an option to take either a full lock or a range lock on files
>> while doing data transactions, to prevent the possibility of ending up
>> in split-brain. With this change, files take a full lock by default
>> while doing IO. If you want to use the old range lock, change the
>> value of "cluster.full-lock" to "no".
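The difference between the two locking modes can be illustrated with POSIX byte-range locks. This is an analogy only: AFR takes inodelks through the locks xlator, not `fcntl()` locks, and `data_txn_lock()` is a hypothetical helper. In POSIX, `l_start = 0` with `l_len = 0` locks the whole file, so writes from different clients are serialized even when their ranges do not overlap.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Build a write-lock request for a data transaction.
 * full_lock: lock the whole file (l_len == 0 means "to EOF and beyond").
 * otherwise: lock only the byte range [off, off + len). */
struct flock data_txn_lock(bool full_lock, off_t off, off_t len)
{
    struct flock fl;
    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = full_lock ? 0 : off;
    fl.l_len = full_lock ? 0 : len;
    return fl;
}
```

A caller would apply the request with `fcntl(fd, F_SETLKW, &fl)` before the write and release it with `l_type = F_UNLCK` afterwards.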
>>
>> Change-Id: I7893fa33005328ed63daa2f7c35eeed7c5218962
>> BUG: 1535438
>> Signed-off-by: karthik-us <ksubrahm at redhat.com>
>>
>> commit 2db7872d5251d98d47c262ff269776bfae2d4fb9
>> Author: Poornima G <pgurusid at redhat.com>
>> Date: Mon Aug 7 11:24:46 2017 +0530
>>
>> md-cache: Serve nameless lookup from cache
>>
>> Updates #232
>> Change-Id: I97e92312a53a50c2d1660bf8d657201fc05a76eb
>> Signed-off-by: Poornima G <pgurusid at redhat.com>
>>
>> commit 78d67da17356b48cf1d5a6595764650d5b200ba7
>> Author: Sunil Kumar Acharya <sheggodu at redhat.com>
>> Date: Thu Mar 23 12:50:41 2017 +0530
>>
>> cluster/ec: OpenFD heal implementation for EC
>>
>> The existing EC code doesn't try to heal open FDs, which would
>> avoid unnecessary healing of the data later.
>>
>> The fix heals open FDs before carrying out file operations on
>> them, by attempting to open the FDs on the required nodes that
>> are up.
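Deciding where an fd must be reopened reduces to bitmask arithmetic, if bricks are represented as bits, which is an assumption of this sketch. `ec_fds_to_reopen()` and `ec_reopen_count()` are hypothetical helpers; the actual open calls are not shown.

```c
#include <stdint.h>

/* Bit i set means brick i. An fd needs reopening on every brick that is
 * up but where the fd is not open (e.g. the brick was down at open time). */
uint32_t ec_fds_to_reopen(uint32_t up_mask, uint32_t fd_open_mask)
{
    return up_mask & ~fd_open_mask;
}

/* Number of bricks that need an open before the fop proceeds. */
int ec_reopen_count(uint32_t up_mask, uint32_t fd_open_mask)
{
    return __builtin_popcount(ec_fds_to_reopen(up_mask, fd_open_mask));
}
```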
>>
>> BUG: 1431955
>> Change-Id: Ib696f59c41ffd8d5678a484b23a00bb02764ed15
>> Signed-off-by: Sunil Kumar Acharya <sheggodu at redhat.com>
>>
>> commit 14dbd5da1cae64e6d4d2c69966e19844d090ce98
>> Author: Niklas Hambüchen <mail at nh2.me>
>> Date: Fri Dec 29 15:49:13 2017 +0100
>>
>> glusterfind: Speed up gfid lookup 100x by using an SQL index
>>
>> Fixes #1529883.
>>
>> This fixes some bits of `glusterfind`'s horrible performance,
>> making it 100x faster.
>>
>> Until now, glusterfind was, for each line in each CHANGELOG.* file,
>> linearly reading the entire contents of the sqlite database in
>> 4096-byte pread64() syscalls when executing the
>>
>> SELECT COUNT(1) FROM %s WHERE 1=1 AND gfid = ?
>>
>> query through the code path:
>>
>> get_changes()
>> parse_changelog_to_db()
>> when_data_meta()
>> gfidpath_exists()
>> _exists()
>>
>> In a quick benchmark on my laptop, doing one such `SELECT` query
>> took ~75ms on a 10MB-sized sqlite DB, while doing the same query
>> with an index took < 1ms.
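Why an index helps can be shown with an in-memory analogy. SQLite indexes are B-trees; here, a sorted array searched with binary search stands in for one. Without an index each `COUNT(1) ... WHERE gfid = ?` scans every row (O(n)). With rows kept ordered by gfid, the same count is the distance between the lower and upper bounds (O(log n)). All names below are hypothetical, and this is not glusterfind's Python code.

```c
#include <stddef.h>
#include <string.h>

/* No index: every query reads every row, O(n). */
size_t count_gfid_linear(const char *const *rows, size_t n, const char *gfid)
{
    size_t c = 0;
    for (size_t i = 0; i < n; i++)
        if (strcmp(rows[i], gfid) == 0)
            c++;
    return c;
}

/* First index whose row is >= gfid (or > gfid when upper is set). */
static size_t bound(const char *const *rows, size_t n, const char *gfid,
                    int upper)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(rows[mid], gfid);
        if (cmp < 0 || (upper && cmp == 0))
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* "Indexed": rows are sorted by gfid, so the count is O(log n). */
size_t count_gfid_indexed(const char *const *sorted, size_t n, const char *gfid)
{
    return bound(sorted, n, gfid, 1) - bound(sorted, n, gfid, 0);
}
```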
>>
>> Change-Id: I8e7fe60f1f45a06c102f56b54d2ead9e0377794e
>> BUG: 1529883
>> Signed-off-by: Niklas Hambüchen <mail at nh2.me>
>>
>> commit c96a1338fe8139d07a0aa1bc40f0843d033f0324
>> Author: Pranith Kumar K <pkarampu at redhat.com>
>> Date: Wed Dec 6 07:59:53 2017 +0530
>>
>> cluster/ec: Change [f]getxattr to parallel-dispatch-one
>>
>> At the moment in EC, [f]getxattr operations wait to acquire a lock
>> while other operations are in progress, even when they come from the
>> same mount that already holds a lock on the file/directory. This
>> happens because [f]getxattr operations follow the model where the
>> operation is wound on 'k' of the bricks and the replies are matched to
>> make sure the data returned is the same on all of them. This
>> consistency check requires that no other operations are ongoing while
>> [f]getxattr operations are wound to the bricks. We can perform
>> [f]getxattr in another way as well: find the good_mask from the lock
>> that is already granted, wind the operation on any one of the good
>> bricks, and unwind the answer to the parent xlator after adjusting
>> size/blocks. Since we take good_mask into account, the reply we get
>> will reflect the state either before or after a possible ongoing
>> operation. Using this method, the operation doesn't need to depend on
>> the completion of ongoing operations, which could take a long time
>> (in case of slow disks, writes in progress, etc.). Thus we reduce the
>> time to serve [f]getxattr requests.
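Choosing the single brick for a dispatch-one [f]getxattr can be sketched as picking any brick that is both in the granted lock's good_mask and currently up. This sketch assumes bricks are bits in a 32-bit mask and uses the GCC/Clang builtin `__builtin_ctz` to take the lowest-numbered candidate. `ec_pick_one_good()` is hypothetical, and the size/blocks adjustment mentioned above is not shown.

```c
#include <stdint.h>

/* Return the index of a brick that is both good (per the granted lock)
 * and up, or -1 if there is none. */
int ec_pick_one_good(uint32_t good_mask, uint32_t up_mask)
{
    uint32_t candidates = good_mask & up_mask;
    if (candidates == 0)
        return -1;
    return __builtin_ctz(candidates); /* lowest-numbered eligible brick */
}
```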
>>
>> I changed [f]getxattr to dispatch-one and added extra logic in
>> ec_link_has_lock_conflict() to not have any conflicts for fops with
>> EC_MINIMUM_ONE as fop->minimum to achieve the effect described above.
>> Modified scripts to make sure a READ fop is received by EC to trigger
>> heals.
>>
>> Updates gluster/glusterfs#368
>> Change-Id: I3b4ebf89181c336b7b8d5471b0454f016cdaf296
>> Signed-off-by: Pranith Kumar K <pkarampu at redhat.com>
>>
>> commit e255385ae4f4c8a883b3fb96baceba4b143828da
>> Author: Csaba Henk <csaba at redhat.com>
>> Date: Fri Nov 10 20:33:20 2017 +0100
>>
>> write-behind: Allow trickling-writes to be configurable
>>
>> This is the undisputed/trivial part of the patch Shreyas
>> attached to https://bugzilla.redhat.com/1364740 (of
>> which the current bug is a clone).
>>
>> We need more evaluation for the page_size and window_size
>> bits before taking them on.
>>
>> Change-Id: Iaa0b9a69d35e522b77a52a09acef47460e8ae3e9
>> BUG: 1428060
>> Co-authored-by: Shreyas Siravara <sshreyas at fb.com>
>> Signed-off-by: Csaba Henk <csaba at redhat.com>
>>
>>
>> commit c26cadd31dfa128c4ec6883f69d654813f351018
>> Author: Poornima G <pgurusid at redhat.com>
>> Date: Fri Jun 30 12:52:21 2017 +0530
>>
>> quick-read: Integrate quick read with upcall and increase cache time
>>
>> Fixes : #261
>> Co-author: Subha sree Mohankumar <smohanku at redhat.com>
>> Change-Id: Ie9dd94e86459123663b9b200d92940625ef68eab
>> Signed-off-by: Poornima G <pgurusid at redhat.com>
>>
>> commit d95db5505a9cb923e61ccd23d28b45ceb07b716f
>> Author: Shreyas Siravara <sshreyas at fb.com>
>> Date: Thu Sep 7 15:34:58 2017 -0700
>>
>> md-cache: Cache statfs calls
>>
>> Summary:
>> - This allows md-cache to cache statfs calls
>> - You can turn it on or off via 'gluster vol set groot
>> performance.md-cache-statfs <on|off>'
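A statfs cache of this kind can be sketched as a single cached `statvfs()` result with a timeout and an on/off switch. The struct, its fields and `cached_statvfs()` are hypothetical. The timeout-based expiry is an assumption modeled on how md-cache generally ages cached metadata, not a description of this patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/statvfs.h>
#include <time.h>

/* Hypothetical single-entry cache; use one instance per mount path. */
struct statfs_cache {
    bool enabled;           /* on/off switch, like the volume option */
    time_t timeout;         /* seconds a cached result stays valid */
    bool valid;
    struct timespec fetched;
    struct statvfs buf;
    unsigned long misses;   /* how often we went to the filesystem */
};

int cached_statvfs(struct statfs_cache *c, const char *path,
                   struct statvfs *out)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    if (c->enabled && c->valid && now.tv_sec - c->fetched.tv_sec < c->timeout) {
        *out = c->buf; /* served from cache, no syscall */
        return 0;
    }
    if (statvfs(path, &c->buf) != 0)
        return -1;
    c->misses++;
    c->valid = true;
    c->fetched = now;
    *out = c->buf;
    return 0;
}
```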
>>
>> Change-Id: I664579e3c19fb9a6cd9d7b3a0eae061f70f4def4
>> BUG: 1523295
>> Signature: t1:4652632:1488581841:111cc01e
>> fe83c71f1e98d075abb10589c4574705
>> Reviewed-on: https://review.gluster.org/18228
>> Reviewed-by: Shreyas Siravara <sshreyas at fb.com>
>> CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
>> Smoke: Gluster Build System <jenkins at build.gluster.org>
>> Signed-off-by: Shreyas Siravara <sshreyas at fb.com>
>>
>> commit 430484c92ab5a6234958d1143e0bb14aeb0cd1c0
>> Author: Mohit Agrawal <moagrawa at redhat.com>
>> Date: Fri Oct 20 12:39:29 2017 +0530
>>
>> glusterfs: Use gcc builtin ATOMIC operator to increase/decrease
>> refcount.
>>
>> Problem: In the glusterfs code base we call mutex_lock/unlock to take
>> and drop a reference on an object. Sometimes this can also be a
>> source of lock contention.
>>
>> Solution: There is no need to use a mutex to increase/decrease a ref
>> counter; use the gcc builtin ATOMIC operations instead.
>>
>> Test: I have not yet measured how much performance this patch gains
>> in glusterfs specifically, but I tested the same approach (mutex vs.
>> atomic) with a small program and saw a good difference.
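Below is a minimal sketch of mutex-free reference counting with the GCC `__atomic` builtins. It is not the test program mentioned in the commit, and `struct obj`, `obj_ref()`, `obj_unref()` and `concurrent_refs()` are hypothetical. The increment can be relaxed. The decrement uses acquire-release ordering so that whichever thread drops the last reference observes all prior writes before freeing the object.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical refcounted object; no mutex guards the counter. */
struct obj {
    uint32_t refcount;
};

void obj_ref(struct obj *o)
{
    __atomic_add_fetch(&o->refcount, 1, __ATOMIC_RELAXED);
}

/* Returns the new count; the caller frees the object when it hits 0. */
uint32_t obj_unref(struct obj *o)
{
    return __atomic_sub_fetch(&o->refcount, 1, __ATOMIC_ACQ_REL);
}

/* Demo: several threads take references concurrently without a lock. */
static void *ref_many(void *arg)
{
    for (int i = 0; i < 100000; i++)
        obj_ref(arg);
    return NULL;
}

uint32_t concurrent_refs(int nthreads)
{
    struct obj o = { .refcount = 0 };
    pthread_t t[16];
    int started = 0;
    if (nthreads > 16)
        nthreads = 16;
    for (; started < nthreads; started++)
        if (pthread_create(&t[started], NULL, ref_many, &o) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    return __atomic_load_n(&o.refcount, __ATOMIC_RELAXED);
}
```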
>>
>> Change-Id: Ie5030a52ea264875e002e108dd4b207b15ab7cc7
>> Signed-off-by: Mohit Agrawal <moagrawa at redhat.com>
>>
>> commit f9b6174a7f5eb6475ca9780b062bfb3ff1132b2d
>> Author: Shreyas Siravara <sshreyas at fb.com>
>> Date: Mon Apr 10 12:36:21 2017 -0700
>>
>> posix: Add option to disable nftw() based deletes when purging the
>> landfill directory
>>
>> Summary:
>> - We may have found an issue where certain directories were being
>> moved into .landfill and then being quickly purged via nftw().
>> - We would like to have an emergency option to disable these purges.
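An nftw()-based purge with an emergency switch might look like the sketch below. Everything here is an assumption: `landfill_purge_enabled` stands in for the real option, and `purge_landfill()` is hypothetical. `FTW_DEPTH` visits children before their directory, so directories are empty when removed. `FTW_PHYS` avoids following symlinks out of the landfill. The landfill directory itself (level 0) is kept.

```c
#define _XOPEN_SOURCE 700
#include <ftw.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical knob: when false, the landfill is left untouched. */
static bool landfill_purge_enabled = true;

static int unlink_cb(const char *path, const struct stat *sb, int type,
                     struct FTW *ftw)
{
    (void)sb;
    (void)type;
    if (ftw->level == 0)
        return 0;        /* keep the landfill directory itself */
    return remove(path); /* files, symlinks and (now empty) directories */
}

/* Returns 0 on success or when purging is disabled, non-zero on error. */
int purge_landfill(const char *landfill)
{
    if (!landfill_purge_enabled)
        return 0;
    return nftw(landfill, unlink_cb, 16, FTW_DEPTH | FTW_PHYS);
}
```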
>>
>> > Reviewed-on: https://review.gluster.org/18253
>> > Reviewed-by: Shreyas Siravara <sshreyas at fb.com>
>>
>> Fixes #371
>>
>> Change-Id: I90b54c535930c1ca2925a928728199b6b80eadd9
>> Signed-off-by: Amar Tumballi <amarts at redhat.com>
>>
>> commit 59d1cc720f52357f7a6f20bb630febc6a622c99c
>> Author: Raghavendra G <rgowdapp at redhat.com>
>> Date: Tue Sep 19 09:44:55 2017 +0530
>>
>> cluster/dht: populate inode in dentry for single subvolume dht
>>
>> ... in the readdirp response if the dentry points to a directory inode.
>> This is a special case where the entire layout is stored in a single
>> subvolume, and hence there is no need for a lookup to construct the layout.
>>
>> Change-Id: I44fd951e2393ec9dac2af120469be47081a32185
>> BUG: 1492625
>> Signed-off-by: Raghavendra G <rgowdapp at redhat.com>
>>
>> commit e785faead91f74dce7c832848f2e8f3f43bd0be5
>> Author: Raghavendra G <rgowdapp at redhat.com>
>> Date: Mon Sep 18 16:01:34 2017 +0530
>>
>> cluster/dht: don't overfill the buffer in readdir(p)
>>
>> Superfluous dentries that cannot fit in the buffer size provided by
>> the kernel are thrown away by fuse-bridge. This means:
>>
>> * the next readdir(p) seen by readdir-ahead would have an offset of a
>> dentry returned in a previous readdir(p) response. When readdir-ahead
>> detects a non-monotonic offset, it turns itself off, which can result in
>> poor readdir performance.
>>
>> * readdirp can be cpu-intensive on the brick, and there is no point in
>> reading all those dentries only for them to be thrown away by fuse-bridge.
>>
>> So, the best strategy would be to fill the buffer optimally - neither
>> overfill nor underfill.
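Filling the buffer "neither overfilled nor underfilled" amounts to packing entries while the next one still fits, then resuming from exactly the first entry that did not fit. The record size here is an assumption (a FUSE-like 24-byte header plus the name, padded to 8 bytes), and `fill_readdir_buf()` is hypothetical. Because the next call starts where this one stopped, offsets stay monotonic and no entry is read only to be discarded.

```c
#include <stddef.h>
#include <string.h>

/* Assumed size of one entry: 24-byte header + name, rounded up to 8. */
static size_t dirent_size(const char *name)
{
    size_t raw = 24 + strlen(name);
    return (raw + 7) & ~(size_t)7;
}

/* Take entries starting at 'start' while they fit in 'bufsize' bytes.
 * Returns the number taken; *next is where the following readdir resumes. */
size_t fill_readdir_buf(const char *const *names, size_t n, size_t start,
                        size_t bufsize, size_t *next)
{
    size_t used = 0, i = start;
    for (; i < n; i++) {
        size_t sz = dirent_size(names[i]);
        if (used + sz > bufsize)
            break; /* stop before overfilling */
        used += sz;
    }
    *next = i;
    return i - start;
}
```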
>>
>> Change-Id: Idb3d85dd4c08fdc4526b2df801d49e69e439ba84
>> BUG: 1492625
>> Signed-off-by: Raghavendra G <rgowdapp at redhat.com>
>>
>> commit 4ad64ffe8664cc0b964586af6efcf53cc619b68a
>> Author: Pranith Kumar K <pkarampu at redhat.com>
>> Date: Fri Nov 17 07:20:21 2017 +0530
>>
>> ec: Use tiebreaker_inodelk where necessary
>>
>> When there are big directories or files that need to be healed,
>> other shds get stuck acquiring the lock on the self-heal domain for
>> these directories/files. With tie-breaker logic, other shds can heal
>> some other files/directories while one of the shds is healing the big
>> file/directory.
>>
>> Before this patch:
>>   %-latency  Avg-latency  Min-Latency  Max-Latency      No. of calls  Fop
>>   96.67      4890.64 us   12.89 us     646115887.30 us  340869        INODELK
>> After this patch:
>>   40.76      42.35 us     15.09 us     6546.50 us       438478        INODELK
>>
>> Fixes gluster/glusterfs#354
>> Change-Id: Ia995b5576b44f770c064090705c78459e543cc64
>> Signed-off-by: Pranith Kumar K <pkarampu at redhat.com>
>>
>> commit 3f8d118e48f11f448f35aca0c48ad40e0fd34f5b
>> Author: Xavier Hernandez <jahernan at redhat.com>
>> Date: Tue Nov 7 13:45:03 2017 +0100
>>
>> libglusterfs/atomic: Improved atomic support
>>
>> This patch solves a detection problem in configure.ac that prevented
>> the build from detecting the builtin __atomic or __sync functions.
>>
>> It also adds more atomic types and support for other atomic functions.
>>
>> A special case has been added to support 64-bit atomics on 32-bit
>> systems. The solution is to fall back to the mutex solution only for
>> 64-bit atomics; smaller atomic types still take advantage of the
>> builtins if available.
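The fallback strategy can be sketched as choosing, at compile time, between lock-free builtins and a mutex for 64-bit values only. This sketch assumes GCC's predefined `__GCC_ATOMIC_LLONG_LOCK_FREE` macro (value 2 means always lock-free) as the detection mechanism, in place of whatever configure.ac checks the patch uses. `struct counter64`, `counter64_add()` and `counter32_add()` are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>

#if defined(__GCC_ATOMIC_LLONG_LOCK_FREE) && __GCC_ATOMIC_LLONG_LOCK_FREE == 2
#define COUNTER64_LOCK_FREE 1
#else
#define COUNTER64_LOCK_FREE 0 /* e.g. some 32-bit targets */
#endif

struct counter64 {
    uint64_t value;
    pthread_mutex_t lock; /* used only on the fallback path */
};

#define COUNTER64_INIT { 0, PTHREAD_MUTEX_INITIALIZER }

uint64_t counter64_add(struct counter64 *c, uint64_t delta)
{
#if COUNTER64_LOCK_FREE
    return __atomic_add_fetch(&c->value, delta, __ATOMIC_SEQ_CST);
#else
    pthread_mutex_lock(&c->lock);
    uint64_t v = (c->value += delta);
    pthread_mutex_unlock(&c->lock);
    return v;
#endif
}

/* Smaller types keep using the builtins directly. */
uint32_t counter32_add(uint32_t *p, uint32_t delta)
{
    return __atomic_add_fetch(p, delta, __ATOMIC_SEQ_CST);
}
```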
>>
>> Change-Id: I6b9afc7cd6e66b28a33278715583552872278801
>> BUG: 1510397
>> Signed-off-by: Xavier Hernandez <jahernan at redhat.com>
>>
>> commit 0dcd5b2feeeec7c29bd2454d6ad950d094d02b0f
>> Author: Xavier Hernandez <jahernan at redhat.com>
>> Date: Mon Oct 16 13:57:59 2017 +0200
>>
>> cluster/ec: create eager-lock option for non-regular files
>>
>> A new option is added to allow independent configuration of eager
>> locking for regular files and non-regular files.
>>
>> Change-Id: I8f80e46d36d8551011132b15c0fac549b7fb1c60
>> BUG: 1502610
>> Signed-off-by: Xavier Hernandez <jahernan at redhat.com>
>>
>> Apart from these commits, there are also some patches which aid
>> concurrency in the code. I've left them out since their performance
>> benefits have not been measured and they don't affect users directly.
>> If you feel these should be added, please let me know. Some of these
>> changes are:
>> * Patches from Zhang Huan <zhanghuan at open-fs.com> aimed at reducing lock
>> contention in the rpc layer and while accessing the fdtable,
>> * Patches from Milind Changire <mchangir at redhat.com> while accessing
>> programs in rpcsvc.
>>
>> From the commits listed above, I see that the following components are
>> affected. Next to each component, I've listed the owners who should
>> provide a short summary of the changes:
>> * glusterd: optimize glusterd import volumes code path - Atin
>> * md-cache - Shreyas and Poornima
>> * EC - Xavi and Pranith (I see that Pranith already sent an update, so I
>> guess this is covered)
>> * Improvements to consumption of Atomic Builtins - Xavi and Mohit
>> * Improvements to glusterfind - Niklas Hambüchen, Milind and Aravinda V K
>> * Modification of Quick-read to consume upcall notifications - Poornima
>> * Exposing trickling-writes in write-behind - Csaba and Shreyas
>> * Changes to Purging landfill directory in storage/posix - Shreyas
>> * Option to take a full-file lock in afr - Karthick Subramanya
>> * readdirplus enhancements in DHT - Raghavendra Gowdappa
>> * Dentry Fop Serializer - Raghavendra Gowdappa and Amar
>>
>> Please send out patches updating the "Performance" section of the release
>> notes. If you think your patch need not be mentioned in the release notes,
>> please send an explicit NACK so that we know.
>>
>> If I've left out any fixes, please point them out. Otherwise, only a
>> subset of the changes listed above will be mentioned in the "Performance"
>> section of the release notes.
>>
>> On Tue, Feb 20, 2018 at 7:59 AM, Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:
>>
>>> +gluster-devel.
>>>
>>>
>>> On Tue, Feb 20, 2018 at 7:35 AM, Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:
>>>
>>>> All,
>>>>
>>>> I am trying to come up with content for the 4.0 release notes
>>>> summarizing the performance impact. Can you point me to
>>>> patches/documentation/issues/bugs that could impact performance in
>>>> 4.0? Better still, if you can give me a summary of changes having a
>>>> performance impact, it would be really helpful.
>>>>
>>>> I see that Pranith had responded with this link:
>>>> https://review.gluster.org/#/c/19535/3/doc/release-notes/4.0.0.md
>>>>
>>>> regards,
>>>> Raghavendra
>>>>
>>>
>>>
>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>
>
>
> --
> Amye Scavarda | amye at redhat.com | Gluster Community Lead
>