[Gluster-devel] Inputs for 4.0 Release notes on Performance
Raghavendra Gowdappa
rgowdapp at redhat.com
Wed Feb 21 04:11:46 UTC 2018
From 'git log release-3.13..release-4.0' I see the following patches that
might have an impact on performance:
commit a32ff73c06e1e14589817b1701c1c8d0f05aaa04
Author: Atin Mukherjee <amukherj at redhat.com>
Date: Mon Jan 29 10:23:52 2018 +0530
glusterd: optimize glusterd import volumes code path
In case there's a version mismatch detected for one of the volumes,
glusterd ended up updating all the volumes, which is overkill.
>mainline patch : https://review.gluster.org/#/c/19358/
Change-Id: I6df792db391ce3a1697cfa9260f7dbc3f59aa62d
BUG: 1540554
Signed-off-by: Atin Mukherjee <amukherj at redhat.com>
(cherry picked from commit bb34b07fd2ec5e6c3eed4fe0cdf33479dbf5127b)
commit ea972d9f5c9b318429c228108c21a334b4acd95c
Author: Sakshi Bansal <sabansal at redhat.com>
Date: Mon Jan 22 14:38:17 2018 +0530
dentry fop serializer: added new server side xlator for dentry fop
serialization
Problems addressed by this xlator :
[1]. To prevent race between parallel mkdir,mkdir and lookup etc.
Fops like mkdir/create, lookup, rename, unlink, link that happen on a
particular dentry must be serialized to ensure atomicity.
Another possible case can be a fresh lookup to find existence of a path
whose gfid is not set yet. Further, storage/posix employs a ctime based
heuristic 'is_fresh_file' (interval time is less than 1 second of current
time) to check freshness of a file. With serialization of these two fops
(lookup & mkdir), we eliminate the race altogether.
[2]. Staleness of dentries
This causes an exponential increase in traversal time for any inode in the
subtree of the directory pointed to by the stale dentry.
Cause : A stale dentry is created because of the following two operations:
a. dentry creation due to inode_link, done during operations like
lookup, mkdir, create, mknod, symlink, and
b. dentry unlinking due to various operations like rmdir, rename,
unlink.
The reason is that __inode_link uses __is_dentry_cyclic, which explores
all possible paths to avoid cyclic link formation during inode
linkage. __is_dentry_cyclic explores the stale dentry(ies) and all their
ancestors, which increases traversal time exponentially.
Implementation : To achieve this, all fops on a dentry must take entry
locks before they proceed. Once they have acquired the locks, they perform
the fop and then release the lock.
Some documentation from email conversation:
[1] http://www.gluster.org/pipermail/gluster-devel/2015-December/047314.html
[2] http://www.gluster.org/pipermail/gluster-devel/2015-August/046428.html
With this patch, the feature is optional, enable it by running:
`gluster volume set $volname features.sdfs enable`
Change-Id: I6e80ba3cabfa6facd5dda63bd482b9bf18b6b79b
Fixes: #397
Signed-off-by: Sakshi Bansal <sabansal at redhat.com>
Signed-off-by: Amar Tumballi <amarts at redhat.com>
Signed-off-by: Sunny Kumar <sunkumar at redhat.com>
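To make the "take entry lock, perform fop, release lock" idea from the dentry fop serializer commit concrete, here is a minimal sketch in Python. It is not the sdfs code: the `DentryLockTable` class, the `serialized_fop` helper and the toy in-memory namespace are all hypothetical names made up for illustration, and a real xlator would take entry locks through the locks translator rather than a thread lock.

```python
import threading
from collections import defaultdict


class DentryLockTable:
    """Hypothetical lock table: one lock per (parent gfid, basename)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = defaultdict(threading.Lock)

    def lock_for(self, parent_gfid, basename):
        # The guard protects the table itself while the entry lock is looked up.
        with self._guard:
            return self._locks[(parent_gfid, basename)]


def serialized_fop(table, parent_gfid, basename, fop):
    # Take the entry lock, perform the fop, then release the lock.
    with table.lock_for(parent_gfid, basename):
        return fop()


# Toy namespace (an assumption, stands in for the brick's directory tree).
namespace = set()
results = []
table = DentryLockTable()


def mkdir(parent, name):
    def fop():
        # The check and the insert happen under the same entry lock,
        # so two mkdirs on the same dentry cannot both succeed.
        if (parent, name) in namespace:
            return "EEXIST"
        namespace.add((parent, name))
        return "OK"

    results.append(serialized_fop(table, parent, name, fop))


threads = [threading.Thread(target=mkdir, args=("root-gfid", "dir1"))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In this sketch the lock table only grows; a real implementation would need to drop entries once nobody holds or waits on them.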
commit 24bf7715140586675f8d2036f4d589bc255c16dc
Author: Poornima G <pgurusid at redhat.com>
Date: Tue Jan 9 17:26:44 2018 +0530
md-cache: Implement dynamic configuration of xattr list for caching
Currently, the list of xattrs that md-cache can cache is hard coded
in the md-cache.c file. This necessitates a code change and rebuild
every time a new xattr needs to be added to the md-cache xattr cache
list.
With this patch, the user will be able to configure a comma
separated list of xattrs to be cached by md-cache.
Updates #297
Change-Id: Ie35ed607d17182d53f6bb6e6c6563ac52bc3132e
Signed-off-by: Poornima G <pgurusid at redhat.com>
commit efc30e60e233164bd4fe7fc903a7c5f718b0448b
Author: Poornima G <pgurusid at redhat.com>
Date: Tue Jan 9 10:32:16 2018 +0530
upcall: Allow md-cache to specify invalidations on xattr with wildcard
Currently, md-cache sends a list of xattrs it is interested in receiving
invalidations for. But it cannot specify any wildcard in the xattr names.
Eg: user.* - invalidate on updating any xattr with user. prefix.
This patch enables upcall to honor wildcards in the xattr key names.
Updates: #297
Change-Id: I98caf0ed72f11ef10770bf2067d4428880e0a03a
Signed-off-by: Poornima G <pgurusid at redhat.com>
commit 8fc9c6a8fc7c73b2b4c65a8ddbe988bca10e89b6
Author: Poornima G <pgurusid at redhat.com>
Date: Thu Jan 4 19:38:05 2018 +0530
posix: In getxattr, honor the wildcard '*'
Currently, posix_xattr_fill performs a sys_getxattr
on all the keys requested. There are requirements where
the keys could contain a wildcard, in which case sys_getxattr
would return ENODATA. Eg: if the xattr requested is user.*
all the xattrs with prefix user. should be returned, with their
values.
This patch changes posix_xattr_fill to honor wildcards in the keys
requested.
Updates #297
Change-Id: I3d52da2957ac386fca3c156e26ff4cdf0b2c79a9
Signed-off-by: Poornima G <pgurusid at redhat.com>
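The wildcard behaviour described in the upcall and posix commits above can be pictured with a small Python sketch. This is only an illustration of "user.* returns every xattr with the user. prefix"; the `fill_xattrs` function and the sample xattr names are hypothetical, and `fnmatchcase` also honors `?` and `[...]`, which the actual posix_xattr_fill change may not.

```python
from fnmatch import fnmatchcase


def fill_xattrs(stored, requested_keys):
    """Return the stored xattrs selected by requested_keys.

    A key containing '*' is treated as a pattern and expands to every
    matching stored xattr; any other key is an exact lookup.
    """
    result = {}
    for key in requested_keys:
        if "*" in key:
            for name, value in stored.items():
                if fnmatchcase(name, key):
                    result[name] = value
        elif key in stored:
            result[key] = stored[key]
        # An exact key that is not stored is simply absent (ENODATA case).
    return result


# Sample xattrs on a file (made-up values).
stored = {
    "user.color": b"blue",
    "user.owner": b"alice",
    "trusted.gfid": b"\x01",
}
matched = fill_xattrs(stored, ["user.*", "trusted.gfid", "user.missing"])
```

The same matching rule would apply on the invalidation side: an update to any xattr that matches a registered pattern such as user.* would trigger an upcall.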
commit 84c5c540b26c8f3dcb9845344dd48df063e57845
Author: karthik-us <ksubrahm at redhat.com>
Date: Wed Jan 17 17:30:06 2018 +0530
cluster/afr: Adding option to take full file lock
Problem:
In replica 3 volumes there is a possibility of ending up in a split
brain scenario when multiple clients write data to the same file
at non-overlapping regions in parallel.
Scenario:
- Initially all the copies are good and all the clients get the value
of data readables as all good.
- Client C0 performs write W1 which fails on brick B0 and succeeds on
other two bricks.
- C1 performs write W2 which fails on B1 and succeeds on other two
bricks.
- C2 performs write W3 which fails on B2 and succeeds on other two
bricks.
- All the 3 writes above happen in parallel and fall on different ranges
so afr takes granular locks and all the writes are performed in
parallel.
Since each client had data-readables as good, it does not see
file going into split-brain in the in_flight_split_brain check, hence
performs the post-op marking the pending xattrs. Now all the bricks
are being blamed by each other, ending up in split-brain.
Fix:
Have an option to take either full lock or range lock on files while
doing data transactions, to prevent the possibility of ending up in
split brains. With this change, by default the files will take full
lock while doing IO. If you want to make use of the old range lock
change the value of "cluster.full-lock" to "no".
Change-Id: I7893fa33005328ed63daa2f7c35eeed7c5218962
BUG: 1535438
Signed-off-by: karthik-us <ksubrahm at redhat.com>
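The scenario in the afr commit above is easier to follow as a blame matrix. The Python sketch below is a toy model, not afr code: `run_writes` and `is_split_brain` are hypothetical helpers, and the pending xattrs are reduced to simple counters. It only shows why three parallel writes that each fail on a different brick leave every brick blamed by the others.

```python
BRICKS = ["B0", "B1", "B2"]


def run_writes(failures):
    """failures maps a write name to the brick it failed on.

    Returns pending[x][y]: how many writes brick x blames brick y for.
    Every write here is assumed to have started with all bricks readable,
    as in the scenario where the writes run in parallel.
    """
    pending = {b: {o: 0 for o in BRICKS} for b in BRICKS}
    for _write, failed in failures.items():
        for good in BRICKS:
            if good != failed:
                # Post-op: bricks where the write succeeded blame the failed one.
                pending[good][failed] += 1
    return pending


def is_split_brain(pending):
    # No brick can act as a source if every brick is blamed by some other brick.
    return all(
        any(pending[other][brick] for other in BRICKS if other != brick)
        for brick in BRICKS
    )


parallel = run_writes({"W1": "B0", "W2": "B1", "W3": "B2"})
single_failure = run_writes({"W1": "B0"})
```

With a single failed write, B1 and B2 remain unblamed and can heal B0; with the three parallel failures, no unblamed brick is left.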
commit 2db7872d5251d98d47c262ff269776bfae2d4fb9
Author: Poornima G <pgurusid at redhat.com>
Date: Mon Aug 7 11:24:46 2017 +0530
md-cache: Serve nameless lookup from cache
Updates #232
Change-Id: I97e92312a53a50c2d1660bf8d657201fc05a76eb
Signed-off-by: Poornima G <pgurusid at redhat.com>
commit 78d67da17356b48cf1d5a6595764650d5b200ba7
Author: Sunil Kumar Acharya <sheggodu at redhat.com>
Date: Thu Mar 23 12:50:41 2017 +0530
cluster/ec: OpenFD heal implementation for EC
Existing EC code doesn't try to heal the OpenFD to
avoid unnecessary healing of the data later.
Fix implements the healing of open FDs before
carrying out file operations on them by making an
attempt to open the FDs on required up nodes.
BUG: 1431955
Change-Id: Ib696f59c41ffd8d5678a484b23a00bb02764ed15
Signed-off-by: Sunil Kumar Acharya <sheggodu at redhat.com>
commit 14dbd5da1cae64e6d4d2c69966e19844d090ce98
Author: Niklas Hambüchen <mail at nh2.me>
Date: Fri Dec 29 15:49:13 2017 +0100
glusterfind: Speed up gfid lookup 100x by using an SQL index
Fixes #1529883.
This fixes some bits of `glusterfind`'s horrible performance,
making it 100x faster.
Until now, glusterfind was, for each line in each CHANGELOG.* file,
linearly reading the entire contents of the sqlite database in
4096-bytes-sized pread64() syscalls when executing the
SELECT COUNT(1) FROM %s WHERE 1=1 AND gfid = ?
query through the code path:
get_changes()
parse_changelog_to_db()
when_data_meta()
gfidpath_exists()
_exists()
In a quick benchmark on my laptop, doing one such `SELECT` query
took ~75ms on a 10MB-sized sqlite DB, while doing the same query
with an index took < 1ms.
Change-Id: I8e7fe60f1f45a06c102f56b54d2ead9e0377794e
BUG: 1529883
Signed-off-by: Niklas Hambüchen <mail at nh2.me>
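For the glusterfind commit above, the effect of the index can be reproduced with Python's sqlite3 module. This is a minimal sketch under assumptions: the table name `gfidpath`, the index name `gfid_index` and the generated rows are placeholders (the query in the commit message uses `%s` for the table), and it looks at the query plan instead of timing anything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the one glusterfind queries.
conn.execute("CREATE TABLE gfidpath (gfid TEXT, path TEXT)")
conn.executemany(
    "INSERT INTO gfidpath VALUES (?, ?)",
    ((f"gfid-{i:06d}", f"/dir/file-{i}") for i in range(10000)),
)

# Same shape as the query quoted in the commit message.
query = "SELECT COUNT(1) FROM gfidpath WHERE 1=1 AND gfid = ?"


def plan(sql):
    # The last column of EXPLAIN QUERY PLAN rows is the readable detail text.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("gfid-000042",)).fetchall()
    return " ".join(row[-1] for row in rows)


plan_before = plan(query)  # without an index sqlite has to scan the table
conn.execute("CREATE INDEX gfid_index ON gfidpath(gfid)")
plan_after = plan(query)   # with the index sqlite can search by gfid

count = conn.execute(query, ("gfid-000042",)).fetchone()[0]
```

Printing `plan_before` and `plan_after` shows the switch from a full scan to an index search, which is where the per-query saving reported in the commit comes from.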
commit c96a1338fe8139d07a0aa1bc40f0843d033f0324
Author: Pranith Kumar K <pkarampu at redhat.com>
Date: Wed Dec 6 07:59:53 2017 +0530
cluster/ec: Change [f]getxattr to parallel-dispatch-one
At the moment in EC, [f]getxattr operations wait to acquire a lock
while other operations are in progress even when it is in the same mount
with a lock on the file/directory. This happens because [f]getxattr
operations follow the model where the operation is wound on 'k' of the
bricks and are matched to make sure the data returned is same on all of
them. This consistency check requires that no other operations are
on-going while [f]getxattr operations are wound to the bricks. We can
perform [f]getxattr in another way as well, where we find the good_mask
from the lock that is already granted and wind the operation on any one
of the good bricks and unwind the answer after adjusting size/blocks to
the parent xlator. Since we are taking into account good_mask, the reply
we get will either be before or after a possible on-going operation.
Using this method, the operation doesn't need to depend on completion of
on-going operations which could be taking long time (In case of some slow
disks and writes are in progress etc). Thus we reduce the time to serve
[f]getxattr requests.
I changed [f]getxattr to dispatch-one and added extra logic in
ec_link_has_lock_conflict() to not have any conflicts for fops with
EC_MINIMUM_ONE as fop->minimum to achieve the effect described above.
Modified scripts to make sure READ fop is received in EC to trigger
heals.
Updates gluster/glusterfs#368
Change-Id: I3b4ebf89181c336b7b8d5471b0454f016cdaf296
Signed-off-by: Pranith Kumar K <pkarampu at redhat.com>
commit e255385ae4f4c8a883b3fb96baceba4b143828da
Author: Csaba Henk <csaba at redhat.com>
Date: Fri Nov 10 20:33:20 2017 +0100
write-behind: Allow trickling-writes to be configurable
This is the undisputed/trivial part of Shreyas' patch
he attached to https://bugzilla.redhat.com/1364740 (of
which the current bug is a clone).
We need more evaluation for the page_size and window_size
bits before taking them on.
Change-Id: Iaa0b9a69d35e522b77a52a09acef47460e8ae3e9
BUG: 1428060
Co-authored-by: Shreyas Siravara <sshreyas at fb.com>
Signed-off-by: Csaba Henk <csaba at redhat.com>
commit c26cadd31dfa128c4ec6883f69d654813f351018
Author: Poornima G <pgurusid at redhat.com>
Date: Fri Jun 30 12:52:21 2017 +0530
quick-read: Integrate quick read with upcall and increase cache time
Fixes : #261
Co-author: Subha sree Mohankumar <smohanku at redhat.com>
Change-Id: Ie9dd94e86459123663b9b200d92940625ef68eab
Signed-off-by: Poornima G <pgurusid at redhat.com>
commit d95db5505a9cb923e61ccd23d28b45ceb07b716f
Author: Shreyas Siravara <sshreyas at fb.com>
Date: Thu Sep 7 15:34:58 2017 -0700
md-cache: Cache statfs calls
Summary:
- This allows md-cache to cache statfs calls
- You can turn it on or off via
'gluster vol set groot performance.md-cache-statfs <on|off>'
Change-Id: I664579e3c19fb9a6cd9d7b3a0eae061f70f4def4
BUG: 1523295
Signature:
t1:4652632:1488581841:111cc01efe83c71f1e98d075abb10589c4574705
Reviewed-on: https://review.gluster.org/18228
Reviewed-by: Shreyas Siravara <sshreyas at fb.com>
CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
Smoke: Gluster Build System <jenkins at build.gluster.org>
Signed-off-by: Shreyas Siravara <sshreyas at fb.com>
commit 430484c92ab5a6234958d1143e0bb14aeb0cd1c0
Author: Mohit Agrawal <moagrawa at redhat.com>
Date: Fri Oct 20 12:39:29 2017 +0530
glusterfs: Use gcc builtin ATOMIC operator to increase/decrease
refcount.
Problem: In the glusterfs code base we call mutex_lock/unlock to take
or drop a reference on an object. Sometimes this can be a
source of lock contention.
Solution: There is no need to use a mutex to increase/decrease the ref
counter; use the gcc builtin ATOMIC operation instead.
Test: I have not yet measured how much performance is gained by
applying this patch to glusterfs itself, but I have tested the same
with a small program (both mutex and atomic variants) and
saw a good difference.
Change-Id: Ie5030a52ea264875e002e108dd4b207b15ab7cc7
Signed-off-by: Mohit Agrawal <moagrawa at redhat.com>
commit f9b6174a7f5eb6475ca9780b062bfb3ff1132b2d
Author: Shreyas Siravara <sshreyas at fb.com>
Date: Mon Apr 10 12:36:21 2017 -0700
posix: Add option to disable nftw() based deletes when purging the
landfill directory
Summary:
- We may have found an issue where certain directories were being moved
into .landfill and then being quickly purged via nftw().
- We would like to have an emergency option to disable these purges.
> Reviewed-on: https://review.gluster.org/18253
> Reviewed-by: Shreyas Siravara <sshreyas at fb.com>
Fixes #371
Change-Id: I90b54c535930c1ca2925a928728199b6b80eadd9
Signed-off-by: Amar Tumballi <amarts at redhat.com>
commit 59d1cc720f52357f7a6f20bb630febc6a622c99c
Author: Raghavendra G <rgowdapp at redhat.com>
Date: Tue Sep 19 09:44:55 2017 +0530
cluster/dht: populate inode in dentry for single subvolume dht
... in readdirp response if dentry points to a directory inode. This
is a special case where the entire layout is stored in one single
subvolume and hence no need for lookup to construct the layout
Change-Id: I44fd951e2393ec9dac2af120469be47081a32185
BUG: 1492625
Signed-off-by: Raghavendra G <rgowdapp at redhat.com>
commit e785faead91f74dce7c832848f2e8f3f43bd0be5
Author: Raghavendra G <rgowdapp at redhat.com>
Date: Mon Sep 18 16:01:34 2017 +0530
cluster/dht: don't overfill the buffer in readdir(p)
Superfluous dentries that cannot fit in the buffer size provided by
the kernel are thrown away by fuse-bridge. This means,
* the next readdir(p) seen by readdir-ahead would have an offset of a
dentry returned in a previous readdir(p) response. When readdir-ahead
detects a non-monotonic offset it turns itself off, which can result in
poor readdir performance.
* readdirp can be cpu-intensive on the brick and there is no point in
reading all those dentries just to have them thrown away by fuse-bridge.
So, the best strategy would be to fill the buffer optimally - neither
overfill nor underfill.
Change-Id: Idb3d85dd4c08fdc4526b2df801d49e69e439ba84
BUG: 1492625
Signed-off-by: Raghavendra G <rgowdapp at redhat.com>
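The "neither overfill nor underfill" strategy from the dht readdir(p) commit above can be sketched in a few lines of Python. The function name, the `toy_entry_size` cost model (24 bytes of fixed overhead plus the name length) and the 100-byte buffer are all assumptions for illustration; the real code works with the dirent sizes that fuse expects.

```python
def fill_readdir_buffer(dentries, buf_size, entry_size):
    """Take dentries in order until the next one would not fit.

    Returns (entries_to_return, index_to_resume_from), so the next
    readdir(p) continues right after the last entry actually returned.
    """
    used = 0
    out = []
    for index, name in enumerate(dentries):
        need = entry_size(name)
        if used + need > buf_size:
            # Stop here instead of returning entries the caller would drop.
            return out, index
        out.append(name)
        used += need
    return out, len(dentries)


def toy_entry_size(name):
    # Hypothetical cost model: fixed header plus the name itself.
    return 24 + len(name)


names = [f"file-{i}" for i in range(10)]
batch, next_offset = fill_readdir_buffer(names, 100, toy_entry_size)
```

Because nothing returned is discarded downstream, the offset of the next request follows the last entry returned, which keeps the offsets seen by readdir-ahead monotonic.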
commit 4ad64ffe8664cc0b964586af6efcf53cc619b68a
Author: Pranith Kumar K <pkarampu at redhat.com>
Date: Fri Nov 17 07:20:21 2017 +0530
ec: Use tiebreaker_inodelk where necessary
When there are big directories or files that need to be healed,
other shds are stuck on getting lock on self-heal domain for these
directories/files. If there is a tie-breaker logic, other shds
can heal some other files/directories while 1 of the shds is healing
the big file/directory.
Before this patch:
96.67 4890.64 us 12.89 us 646115887.30us 340869 INODELK
After this patch:
40.76 42.35 us 15.09 us 6546.50us 438478 INODELK
Fixes gluster/glusterfs#354
Change-Id: Ia995b5576b44f770c064090705c78459e543cc64
Signed-off-by: Pranith Kumar K <pkarampu at redhat.com>
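The idea behind the ec tiebreaker commit above, that a self-heal daemon should move on to other files instead of waiting on one that is already being healed, can be approximated with a non-blocking lock attempt. The Python sketch below is a deliberate simplification and does not reproduce the actual tiebreaker_inodelk logic; `heal_pass` and the file names are hypothetical.

```python
import threading


def heal_pass(files, locks, heal):
    """Heal every file whose lock is free; return the files skipped as busy."""
    skipped = []
    for name in files:
        lock = locks[name]
        # Non-blocking attempt: do not wait behind another healer.
        if not lock.acquire(blocking=False):
            skipped.append(name)
            continue
        try:
            heal(name)
        finally:
            lock.release()
    return skipped


locks = {name: threading.Lock() for name in ("big-file", "small-1", "small-2")}

# Simulate another shd that is still busy healing the big file.
locks["big-file"].acquire()

healed = []
skipped = heal_pass(list(locks), locks, healed.append)

locks["big-file"].release()
```

The skipped files can be retried in a later pass, so a single large heal no longer stalls the other healers.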
commit 3f8d118e48f11f448f35aca0c48ad40e0fd34f5b
Author: Xavier Hernandez <jahernan at redhat.com>
Date: Tue Nov 7 13:45:03 2017 +0100
libglusterfs/atomic: Improved atomic support
This patch solves a detection problem in configure.ac that prevented
the build from detecting builtin __atomic or __sync functions.
It also adds more atomic types and support for other atomic functions.
A special case has been added to support 64-bit atomics on 32-bit
systems. The solution is to fall back to the mutex solution only for
64-bit atomics, but smaller atomic types will still take advantage
of builtins if available.
Change-Id: I6b9afc7cd6e66b28a33278715583552872278801
BUG: 1510397
Signed-off-by: Xavier Hernandez <jahernan at redhat.com>
commit 0dcd5b2feeeec7c29bd2454d6ad950d094d02b0f
Author: Xavier Hernandez <jahernan at redhat.com>
Date: Mon Oct 16 13:57:59 2017 +0200
cluster/ec: create eager-lock option for non-regular files
A new option is added to allow independent configuration of eager
locking for regular files and non-regular files.
Change-Id: I8f80e46d36d8551011132b15c0fac549b7fb1c60
BUG: 1502610
Signed-off-by: Xavier Hernandez <jahernan at redhat.com>
Apart from these commits there are also some patches which aid concurrency
in the code. I've left them out since their performance benefits have not
been measured and they don't affect users directly. If you feel these have
to be added please let me know. Some changes are:
* Patches from Zhang Huan <zhanghuan at open-fs.com> aimed to reduce lock
contention in rpc layer and while accessing fdtable,
* Patches from Milind Changire <mchangir at redhat.com> while accessing
programs in rpcsvc.
From the commits listed above, I see that the following components are
affected. I've listed owners for updating a short summary of changes along
with each component:
* glusterd: optimize glusterd import volumes code path - Atin
* md-cache - Shreyas and Poornima
* EC - Xavi and Pranith (I see that Pranith already sent an update. So I
guess this is covered)
* Improvements to consumption of Atomic Builtins - Xavi and Mohit
* Improvements to glusterfind - Niklas Hambüchen, Milind and Aravinda V K
* Modification of Quick-read to consume upcall notifications - Poornima
* Exposing trickling-writes in write-behind - Csaba and Shreyas
* Changes to Purging landfill directory in storage/posix - Shreyas
* Adding option to take full file lock in afr - Karthick Subramanya
* readdirplus enhancements in DHT - Raghavendra Gowdappa
* Dentry Fop Serializer - Raghavendra Gowdappa and Amar
Please send out patches updating the "Performance" section of the release
notes. If you think your patch need not be mentioned in the release notes,
please send an explicit nack so that we'll know.
If I've left out any fixes, please point them out. If not, only a subset of
the changes listed above will be mentioned in the "performance" section of
the release notes.
On Tue, Feb 20, 2018 at 7:59 AM, Raghavendra Gowdappa <rgowdapp at redhat.com>
wrote:
> +gluster-devel.
>
>
> On Tue, Feb 20, 2018 at 7:35 AM, Raghavendra Gowdappa <rgowdapp at redhat.com
> > wrote:
>
>> All,
>>
>> I am trying to come up with content for release notes for 4.0 summarizing
>> performance impact. Can you point me to patches/documentation/issues/bugs
>> that could impact performance in 4.0? Better still, if you can give me a
>> summary of changes having performance impact, it would really be helpful.
>>
>> I see that Pranith had responded with this link:
>> https://review.gluster.org/#/c/19535/3/doc/release-notes/4.0.0.md
>>
>> regards,
>> Raghavendra
>>
>
>