[Gluster-devel] Gluster performance updates
Xavi Hernandez
xhernandez at redhat.com
Mon Oct 1 11:45:28 UTC 2018
Hi,
this is an update on the work done on performance and consistency during the
last few weeks. We'll try to build a complete list of all known issues and
track them through this email thread. Please let me know of any performance
issue not included in this email so that we can build and track a complete
list.
*New improvements*
While testing performance on Red Hat products, we identified a problem in the
way eager-locking worked on replicate volumes in some scenarios
(virtualization and database workloads were affected). It caused an
unnecessary number of finodelk and fxattrop requests, which increased the
latency of write operations.
This has already been fixed with patches [1] and [2].
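As a quick way to see whether a given workload is affected, the built-in
profiler reports per-FOP call counts and latencies, so the amount of
finodelk/fxattrop traffic relative to writes can be compared before and after
the patches. A minimal sketch, with <VOLNAME> as a placeholder:

    gluster volume profile <VOLNAME> start
    # run the virtualization/database workload on a client mount, then:
    gluster volume profile <VOLNAME> info
    gluster volume profile <VOLNAME> stop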
We have also identified some additional settings that provide better
performance for database workloads. A patch [3] to update the default
database profile with the new settings has been merged.
Combining all these changes (the AFR fix and the new settings), pgbench
performance has improved by ~300% on bare metal using NVMe, and a random I/O
fio test running on a VM has also improved by more than 300%.
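For anyone who wants to try the new settings, predefined profiles are applied
through the volume group mechanism. The group name below is an assumption
based on [3]; check the group files shipped under /var/lib/glusterd/groups/
on your servers:

    # apply the updated database workload profile (group name assumed)
    gluster volume set <VOLNAME> group db-workload
    # verify one of the options involved, e.g. AFR's eager-lock setting
    gluster volume get <VOLNAME> cluster.eager-lock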
*Known issues*
We have identified two issues in fuse mounts:
   - Because of SELinux on the client machine, fuse sends a getxattr request
before each write request. Although this adds some latency, the request is
currently answered directly by the fuse xlator when SELinux support is not
enabled in gluster (the default setting).
   - When *fopen-keep-cache* is enabled (the default setting), kernel fuse
sends a stat request before each read. Even with fopen-keep-cache disabled,
fuse still sends half of those stat requests. This has been tracked down to
the atime update; however, mounting the volume with noatime doesn't solve the
issue because kernel fuse doesn't handle the noatime setting correctly (see
the mount sketch after this list).
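For those who want to experiment with these client-side behaviours, the mount
options involved are sketched below. The exact option names and syntax
accepted by mount.glusterfs may differ between versions, so treat this as an
assumption and check 'man mount.glusterfs' first; as explained above, neither
option removes all the extra stat requests:

    # disable fopen-keep-cache (only removes about half of the stat requests)
    mount -t glusterfs -o fopen-keep-cache=off server1:/myvol /mnt/gluster
    # noatime is not honoured by kernel fuse yet, so it doesn't help for now
    mount -t glusterfs -o noatime server1:/myvol /mnt/gluster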
Some other issues have been detected:
   - Write-behind performs poorly when stats and writes to the same file are
mixed. Right now, when a stat is received, all previously cached writes are
flushed before the new request is processed; the same happens for a read that
overlaps a previously cached write. This makes write-behind useless in this
scenario (see the sketch after this list).
*Note*: fuse currently sends a stat request before each read (see the
previous known issue), which makes reads almost as problematic as stat
requests.
   - Self-heal seems to be slow. It's still being investigated, but there are
indications of a considerable amount of contention in io-threads. This
contention could also be the cause of some other performance issues, but
we'll need to investigate this further. There is already some work [4] trying
to reduce it.
   - 'ls' performance is not good in some cases. When the volume has many
bricks, 'ls' performance tends to degrade. We are still investigating the
cause, but one important factor is that DHT sends readdir(p) requests to all
of its subvolumes, which means that 'ls' runs at the speed of the slowest
brick. If any brick has an issue, or even a transitory spike in load, 'ls'
performance suffers. This can be alleviated by enabling the parallel-readdir
and readdir-ahead options (see the example commands after this list).
*Note*: There have been reports that enabling parallel-readdir causes some
entries to apparently disappear after some time (though they are still
present on the bricks). I'm not aware of the root cause yet.
- The number of threads in a server is quite high when multiple bricks
are present, even if brick-mux is used. There are some efforts [5] trying
to reduce this number.
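To make the write-behind issue above more concrete, the sketch below
reproduces the kind of write/stat interleaving, typical of databases, that
defeats the cache: every fstat() forces the writes cached so far to be
flushed before it is answered. This is only an illustration of the access
pattern against a hypothetical mount point, not gluster code:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        /* Any file on a fuse mount of the volume; the path is an example. */
        int fd = open("/mnt/gluster/wb-test.dat",
                      O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        char buf[4096];
        memset(buf, 'x', sizeof(buf));

        for (int i = 0; i < 1000; i++) {
            /* These writes could be cached and aggregated by write-behind... */
            if (write(fd, buf, sizeof(buf)) < 0) {
                perror("write");
                break;
            }

            /* ...but each interleaved stat forces the cached writes to be
               flushed before it is answered, so nothing gets aggregated. */
            struct stat st;
            if (fstat(fd, &st) < 0) {
                perror("fstat");
                break;
            }
        }

        close(fd);
        return 0;
    }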
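And these are the two volume options referenced in the 'ls' item (the volume
name is a placeholder); keep in mind the note above about entries apparently
disappearing when parallel-readdir is enabled:

    gluster volume set <VOLNAME> performance.parallel-readdir on
    gluster volume set <VOLNAME> performance.readdir-ahead on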
*New features*
We have recently started the design [6] of a new caching infrastructure that
should provide much better performance, especially for small-file and
metadata-intensive workloads. It should also provide a safe way to keep
cached information consistent on all clients.
This framework will make caching features available, in an easy and safe way,
to any xlator that needs them.
The current thinking is that the existing caching xlators (mostly md-cache,
io-cache and write-behind) will probably be reworked into a single, complete
caching xlator, since this makes things easier.
Any feedback or ideas will be highly appreciated.
Xavi
[1] https://review.gluster.org/21107
[2] https://review.gluster.org/21210
[3] https://review.gluster.org/21247
[4] https://review.gluster.org/21039
[5] https://review.gluster.org/20859
[6] https://docs.google.com/document/d/1elX-WZfPWjfTdJxXhgwq37CytRehPO4D23aaVowtiE8/edit?usp=sharing