[Gluster-users] Optimizing write performance to a few large files in a small cluster

Carlos Capriotti capriotti.carlos at gmail.com
Mon Mar 10 21:43:59 UTC 2014


Alexander:

Performance is quite a vague concept. Relative, even. I don't mean to start
some philosophy or anything, but it is true.

To begin with, how are you connecting to the Gluster volumes? NFS? FUSE
(native GlusterFS)?

What volume type are you using? Striped? Distributed?

How is your network set up? Jumbo frames?

From the details you provided, you are not a first-timer. Sounds like
you've been doing a lot of research. Did you happen to test the
performance with other services, for instance native NFS, or even good old
FTP?

Is network performance OK?

I was fighting some read and write performance issues a couple of weeks ago
on my test servers, and it turned out to be the buffers on my NFS client.
After tweaking those, large-file copies saturated 1 Gbps.

In the process I collected an interesting number of Gluster and
sysctl hacks that seemed to improve performance as well.

Use at your own risk, for this affects memory usage on your server:

For sysctl.conf:

# Maximum socket send/receive buffer sizes (bytes)
net.core.wmem_max = 12582912
net.core.rmem_max = 12582912
# TCP read/write buffers: min, default, max (bytes)
net.ipv4.tcp_rmem = 10240 87380 12582912
net.ipv4.tcp_wmem = 10240 87380 12582912
# Standard TCP performance options
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 1
# Prefer keeping pages in RAM; start background writeback early
vm.swappiness = 10
vm.dirty_background_ratio = 1
# Larger neighbour (ARP) cache and input packet backlog
net.ipv4.neigh.default.gc_thresh2 = 2048
net.ipv4.neigh.default.gc_thresh3 = 4096
net.core.netdev_max_backlog = 2500
# TCP memory pressure thresholds (pages)
net.ipv4.tcp_mem = 12582912 12582912 12582912
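
To apply these without rebooting, you can reload the file with sysctl
(a minimal sketch; adjust the path if you drop the settings into
/etc/sysctl.d/ instead of /etc/sysctl.conf):

# re-read /etc/sysctl.conf and apply the values on each node
sysctl -p

# or set a single value on the fly while testing
sysctl -w net.core.rmem_max=12582912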



If using an NFS client, use the following mount options:


-o rw,async,vers=3,rsize=65536,wsize=65536
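
For reference, a complete mount line would look something like the
following (the server name "gluster1", volume "BigVol" and mount point
are placeholders; substitute your own):

# mount the gluster volume through its built-in NFSv3 server
mount -t nfs -o rw,async,vers=3,rsize=65536,wsize=65536 gluster1:/BigVol /mnt/bigvol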




Gluster options I am currently using:


network.remote-dio: on
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
network.ping-timeout: 20
nfs.nlm: off
nfs.addr-namelookup: off
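
These are applied per volume with "gluster volume set"; for example
(the volume name BigVol is just an example):

gluster volume set BigVol network.remote-dio on
gluster volume set BigVol network.ping-timeout 20

You can confirm what is currently applied with "gluster volume info BigVol",
which lists the reconfigured options at the bottom of its output.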



Other Gluster options I found elsewhere that are worth a try:

gluster volume set BigVol diagnostics.brick-log-level WARNING
gluster volume set BigVol diagnostics.client-log-level WARNING
gluster volume set BigVol nfs.enable-ino32 on

gluster volume set BigVol performance.cache-max-file-size 2MB
gluster volume set BigVol performance.cache-refresh-timeout 4
gluster volume set BigVol performance.cache-size 256MB
gluster volume set BigVol performance.write-behind-window-size 4MB
gluster volume set BigVol performance.io-thread-count 32
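
If one of these turns out to hurt your workload, the option can be put back
to its default with "gluster volume reset" (again, BigVol is only the
example volume name):

gluster volume reset BigVol performance.io-thread-count
gluster volume info BigVol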

Now, DO keep in mind: mine is a TEST environment, while yours is a
real-life situation.

Cheers,

Carlos





On Mon, Mar 10, 2014 at 7:06 PM, Alexander Valys <avalys at avalys.net> wrote:

> A quick performance question.
>
> I have a small cluster of 4 machines, 64 cores in total.  I am running a
> scientific simulation on them, which writes at between 0.1 and 10 MB/s
> (total) to roughly 64 HDF5 files.  Each HDF5 file is written by only one
> process.  The writes are not continuous, but consist of writing roughly 1
> MB of data to each file every few seconds.
>
> Writing to HDF5 involves a lot of reading the file metadata and random
> seeking within the file,  since we are actually writing to about 30
> datasets inside each file.  I am hosting the output on a distributed
> gluster volume (one brick local to each machine) to provide a unified
> namespace for the (very rare) case when each process needs to read the
> other's files.
>
> I am seeing somewhat lower performance than I expected, i.e. a factor of
> approximately 4 less throughput than each node writing locally to the bare
> drives.  I expected the write-behind cache to buffer each write, but it
> seems that the writes are being quickly flushed across the network
> regardless of what write-behind cache size I use (32 MB currently), and the
> simulation stalls while waiting for the I/O operation to finish.  Anyone
> have any suggestions as to what to look at?  I am using gluster 3.4.2 on
> ubuntu 12.04.  I have flush-behind turned on, and have mounted the volume
> with direct-io-mode=disable, and have the cache size set to 256M.
>
> The nodes are connected via a dedicated gigabit ethernet network, carrying
> only gluster traffic (no simulation traffic).
>
> (sorry if this message comes through twice, I sent it yesterday but was
> not subscribed)
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
