[Gluster-users] AFR write performance
Michael McCallister
mikemc-gluster at terabytemedia.com
Fri Nov 20 22:28:18 UTC 2009
Greetings,
I am having what I perceive to be AFR performance problems. Before I
get to that, I will briefly describe the setup...
/** Setup **/
I have glusterfs up and running using the gluster optimized fuse module
on the latest Centos 5 kernel (2.6.18-164.6.1.el5 #1 SMP) running on two
machines (server and client volume configurations are below). Both the
server and client run on both machines. Both servers are connected by a
single CAT6 cable running directly into the Gigabit NICs dedicated to
this task (no switch is used). My goal is simply to mirror files
across both servers. As far as the files themselves, it is mixed, but
there are many and about 90% of them are under 50K. Each server runs a
Quad core Q6600 processor with 8GB of RAM. The disks are quite speedy -
running 15K RPM SAS drives hooked to a 3ware controller (RAID 5 with a
512MB cache). The filesystem is ext3 mounted with noatime. Writing
directly to the ext3 partition with dd if=/dev/zero of=/sites/disktest
bs=1M count=2048 yields 2147483648 bytes (2.1 GB) copied, 4.68686
seconds, 458 MB/s. Kernel optimizations on both servers outside of a
stock CentOS 5 setup include:
3ware controller-specific tweaks to avoid iowait latency under load:
echo 64 > /sys/block/sda/queue/max_sectors_kb
/sbin/blockdev --setra 8192 /dev/sda
echo 128 > /sys/block/sda/queue/nr_requests
echo 64 > /sys/block/sda/device/queue_depth
echo 10 > /proc/sys/vm/swappiness
echo 16 > /proc/sys/vm/page-cluster
echo 2 > /proc/sys/vm/dirty_background_ratio
echo 40 > /proc/sys/vm/dirty_ratio
Tweaks for better network performance (sysctl.conf):
net/core/rmem_max = 8738000
net/core/wmem_max = 8738000
net/ipv4/tcp_rmem = 8192 873800 8738000
net/ipv4/tcp_wmem = 4096 873800 8738000
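These are picked up with "sysctl -p" (or at the next boot). The vm
tweaks above could equally be expressed as sysctl.conf entries in the
same notation, e.g.:
vm/swappiness = 10
vm/page-cluster = 16
vm/dirty_background_ratio = 2
vm/dirty_ratio = 40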
/** Gluster Results **/
It should be noted that I did not see high CPU or iowait times during
any of the tests below, and no other active processes were running on
either server. Doing a simple write test using "dd if=/dev/zero
of=/sites/sites/glustertest bs=1M count=2048" I am seeing:
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB) copied, 27.8451 seconds, 77.1 MB/s
That is acceptable for my purposes; I expected around 80 MB/s, with the
gigabit NICs being the obvious bottleneck. So for a more real-world
test using the actual files to be clustered, I took a small subset of
the files (22016 of them - 440M in total) and extracted them from a
tarball onto the /sites/sites (mount point for the tests) replicated
cluster. Here are the timings for those 22016 files:
Extract from tarball - gluster mount: 17m28.972s, ext3: 0m5.102s
Unlink all files     - gluster mount: 0m28.428s,  ext3: 0m0.456s
Read all files       - gluster mount: 0m19.871s
You can note from my config that these times are with "option
flush-behind on" for write-behind. During the tar extraction I
monitored NIC stats on the receiving server to see how much the link
was utilized - its peak was 4.80Mb - so the NIC was not the bottleneck
either. I just cannot find the holdup; the network, disks, and CPU are
not loaded during that write test.
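In case it helps, the timings above were gathered along these lines
(the tarball name is illustrative, and reading back with tar to
/dev/null is just one convenient way to touch every file):
time tar -xf sites-subset.tar -C /sites/sites      # extract the 22016 files onto the gluster mount
time tar -cf /dev/null /sites/sites/sites-subset   # read every file back
time rm -rf /sites/sites/sites-subset              # unlink them all
with the same commands run against a plain ext3 directory for the
comparison numbers.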
So the biggest issue seems to be AFR write performance. Is this normal
or is there something specific to my setup causing these problems?
Obviously I am new to glusterfs so I do not know what to expect, but I
think I must be doing something wrong.
Any help/advice/direction is greatly appreciated. I have googled and
googled and found no advice that has yielded real results. Sorry if I
missed something obvious that was documented.
Michael
Volume files (same on each server) were first created using:
/usr/bin/glusterfs-volgen --raid 1 --cache-size 512MB \
    --export-directory /sites_gfs --name sites1 172.16.0.1 172.16.0.2
/** Server - adapted from generated to add one other directory **/
volume posix_sites
  type storage/posix
  option directory /sites_gfs
end-volume

volume posix_phplib
  type storage/posix
  option directory /usr/local/lib/php_gfs
end-volume

volume locks_sites
  type features/locks
  subvolumes posix_sites
end-volume

volume locks_phplib
  type features/locks
  subvolumes posix_phplib
end-volume

volume brick_sites
  type performance/io-threads
  option thread-count 8
  subvolumes locks_sites
end-volume

volume brick_phplib
  type performance/io-threads
  option thread-count 8
  subvolumes locks_phplib
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.brick_sites.allow *
  option auth.addr.brick_phplib.allow *
  option listen-port 6996
  subvolumes brick_sites brick_phplib
end-volume
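For completeness, each box loads the server volfile above with the
glusterfsd daemon - started by hand that is roughly (the volfile path
is just where I would keep it):
glusterfsd -f /etc/glusterfs/glusterfsd.vol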
/** Client - adapted from generated to try and fix write issues - to no
avail **/

volume 172.16.0.1
  type protocol/client
  option transport-type tcp
  option remote-host 172.16.0.1
  option remote-port 6996
  option remote-subvolume brick_sites
end-volume

volume 172.16.0.2
  type protocol/client
  option transport-type tcp
  option remote-host 172.16.0.2
  option remote-port 6996
  option remote-subvolume brick_sites
end-volume

volume mirror-0
  type cluster/replicate
  subvolumes 172.16.0.1 172.16.0.2
end-volume

volume writebehind
  type performance/write-behind
  option cache-size 1MB
  option flush-behind on
  subvolumes mirror-0
end-volume

volume io-cache
  type performance/io-cache
  option cache-size 64MB
  subvolumes writebehind
end-volume
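The client volfile is what gets mounted on /sites/sites via the fuse
client, roughly (again, the volfile path is illustrative):
glusterfs -f /etc/glusterfs/glusterfs-client.vol /sites/sites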