[Gluster-devel] how to set the default read/write block size for all transactions for optimal performance (e.g. anything similar to rsize, wsize nfs options?)
Sabuj Pattanayek
sabujp at gmail.com
Sun Jul 20 16:17:21 UTC 2008
Hi,
I've tested several clustered file systems (OCFS2, Xsan, GFS2) and I
really like the simplicity (unify/stripe translators) and portability
of Gluster. However, I'm getting poor performance compared to NFS when
I use bs=4k with dd; with bs=128k the performance is comparable to
gigE NFS. Neither is anywhere near the speed of going directly to the
storage, but that's OK because everything goes over TCP/IP when using
Gluster. Here's the test on the storage itself (15-disk RAID5 on an
Infortrend EonStor A16F-G2221, 2Gb FC <-> 4Gb FC QLogic switch <->
4Gb FC QLogic HBA on the server "porpoise", running XFS on the RAID5):
90 porpoise:/export/eon0/tmp% time dd if=/dev/zero of=testFile bs=4k count=500000
2048000000 bytes (2.0 GB) copied, 9.37949 s, 218 MB/s
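For what it's worth, dd issues one write() per block, so bs is exactly
the write size the filesystem sees; a quick strace (just a sanity
check, not one of the timed runs) confirms it:
strace -e trace=write dd if=/dev/zero of=testFile bs=4k count=4
Every write to the output file shows up as write(fd, ..., 4096).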
Here's the NFS mount going over gigE (server and client are the same):
porpoise-san:/export/eon0 on /mnt/eon0 type nfs (rw,addr=10.2.179.3)
Here's the test:
93 porpoise:/mnt/eon0/tmp% time dd if=/dev/zero of=testFile bs=4k count=500000
2048000000 bytes (2.0 GB) copied, 25.7614 s, 79.5 MB/s
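For reference, on the NFS side the transfer size can be pinned at
mount time with rsize/wsize, e.g. (values illustrative, not what I
used):
mount -t nfs -o rsize=32768,wsize=32768 porpoise-san:/export/eon0 /mnt/eon0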
Basically, I'm looking for something comparable to the NFS test above
with Gluster. Here's the mount (from df -h):
glusterfs 5.1T 3.6G 5.1T 1% /export/glfs
Here's the test:
88 porpoise:/export/glfs/tmp% time dd if=/dev/zero of=testFile bs=4k count=50000
204800000 bytes (205 MB) copied, 17.7291 s, 11.6 MB/s
0.106u 0.678s 0:17.73 4.3% 0+0k 0+0io 0pf+0w
The data size was reduced for the GlusterFS test because I didn't want
to wait :). But if I increase the bs, the speed improves:
99 porpoise:/export/glfs/tmp% time dd if=/dev/zero of=testFile bs=64k count=27500
1802240000 bytes (1.8 GB) copied, 26.4466 s, 68.1 MB/s
If I increase to bs=128k the performance is even better:
100 porpoise:/export/glfs/tmp% time dd if=/dev/zero of=testFile bs=128k count=13750
1802240000 bytes (1.8 GB) copied, 21.2332 s, 84.9 MB/s
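For anyone who wants to reproduce this quickly, here's a small sweep
script that writes the same 256 MB total at increasing block sizes
(just a convenience wrapper around the dd tests above):
#!/bin/sh
# write the same 256 MB total with increasing block sizes and
# print only dd's throughput line for each run
for args in "bs=4k count=65536" "bs=64k count=4096" \
"bs=128k count=2048" "bs=1M count=256"; do
echo "== $args =="
rm -f testFile
dd if=/dev/zero of=testFile $args 2>&1 | tail -1
done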
How can I tell the Gluster server or client to use a default
read/write block size of 128k or more? With NFS there are the rsize
and wsize options, which I believe accomplish the same thing. Here's
my setup; I've included the non-relevant bricks as well:
#### glusterfs-server.vol ####
volume eon0
type storage/posix
option thread-count 8
option cache-size 1024MB
option directory /export/eon0
end-volume
volume eon1
type storage/posix
option directory /export/eon1
end-volume
volume eon2
type storage/posix
option directory /export/eon2
end-volume
volume glfs-ns
type storage/posix
option directory /export/glfs-ns
end-volume
# (volume names must be unique, hence the per-brick suffixes)
volume writebehind-eon0
type performance/write-behind
#option aggregate-size 131072 # in bytes
option aggregate-size 1MB # default is 0 bytes
option flush-behind on # default is 'off'
subvolumes eon0
end-volume
volume writebehind-eon1
type performance/write-behind
option aggregate-size 131072 # in bytes
subvolumes eon1
end-volume
volume writebehind-eon2
type performance/write-behind
option aggregate-size 131072 # in bytes
subvolumes eon2
end-volume
volume writebehind-ns
type performance/write-behind
option aggregate-size 131072 # in bytes
subvolumes glfs-ns
end-volume
volume readahead-eon0
type performance/read-ahead
option page-size 65536 ### in bytes
option page-count 16 ### memory cache size is page-count x page-size per file
subvolumes eon0
end-volume
volume readahead-eon1
type performance/read-ahead
option page-size 65536 ### in bytes
option page-count 16 ### memory cache size is page-count x page-size per file
subvolumes eon1
end-volume
volume readahead-eon2
type performance/read-ahead
option page-size 65536 ### in bytes
option page-count 16 ### memory cache size is page-count x page-size per file
subvolumes eon2
end-volume
volume readahead-ns
type performance/read-ahead
option page-size 65536 ### in bytes
option page-count 16 ### memory cache size is page-count x page-size per file
subvolumes glfs-ns
end-volume
volume iothreads-eon0
type performance/io-threads
option thread-count 4 # default is 1
option cache-size 64MB
subvolumes eon0
end-volume
volume iothreads-eon1
type performance/io-threads
option thread-count 4 # default is 1
option cache-size 64MB
subvolumes eon1
end-volume
volume iothreads-eon2
type performance/io-threads
option thread-count 4 # default is 1
option cache-size 64MB
subvolumes eon2
end-volume
volume iothreads-ns
type performance/io-threads
option thread-count 4 # default is 1
option cache-size 64MB
subvolumes glfs-ns
end-volume
volume server
type protocol/server
option transport-type tcp/server
option auth.ip.eon0.allow 10.2.179.*
option auth.ip.eon1.allow 10.2.179.*
option auth.ip.eon2.allow 10.2.179.*
option auth.ip.glfs-ns.allow 10.2.179.*
subvolumes eon0 eon1 eon2 glfs-ns
end-volume
####
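(Side note on the server volfile above: as written, protocol/server
exports the bare posix volumes, so I don't think the write-behind,
read-ahead, or io-threads volumes are actually in the export path at
all. Should each brick be chained and the top of the chain exported
instead? A sketch of what I mean, for eon0 only, with hypothetical
volume names:)
volume eon0-posix
type storage/posix
option directory /export/eon0
end-volume
volume eon0-iot
type performance/io-threads
option thread-count 4
subvolumes eon0-posix
end-volume
volume eon0
type performance/write-behind
option aggregate-size 1MB
subvolumes eon0-iot # write-behind sits on top of io-threads on posix
end-volume
volume server
type protocol/server
option transport-type tcp/server
option auth.ip.eon0.allow 10.2.179.*
subvolumes eon0 # export the top of the stack, still named eon0
end-volume
Since the exported volume is still called eon0, the client side
wouldn't need to change.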
#### glusterfs-client.vol ####
volume eon0
type protocol/client
option transport-type tcp/client
option remote-host porpoise-san
option remote-subvolume eon0
end-volume
volume eon1
type protocol/client
option transport-type tcp/client
option remote-host porpoise-san
option remote-subvolume eon1
end-volume
volume eon2
type protocol/client
option transport-type tcp/client
option remote-host porpoise-san
option remote-subvolume eon2
end-volume
volume glfs-ns
type protocol/client
option transport-type tcp/client
option remote-host porpoise-san
option remote-subvolume glfs-ns
end-volume
# (again, unique volume names per brick)
volume writebehind-eon0
type performance/write-behind
#option aggregate-size 131072 # in bytes
option aggregate-size 1MB # default is 0 bytes
option flush-behind on # default is 'off'
subvolumes eon0
end-volume
volume writebehind-eon1
type performance/write-behind
option aggregate-size 131072 # in bytes
subvolumes eon1
end-volume
volume writebehind-eon2
type performance/write-behind
option aggregate-size 131072 # in bytes
subvolumes eon2
end-volume
volume writebehind-ns
type performance/write-behind
option aggregate-size 131072 # in bytes
subvolumes glfs-ns
end-volume
volume readahead-eon0
type performance/read-ahead
option page-size 1MB
option page-count 2
#option page-size 65536 ### in bytes
#option page-count 16 ### memory cache size is page-count x page-size per file
subvolumes eon0
end-volume
volume readahead-eon1
type performance/read-ahead
option page-size 65536 ### in bytes
option page-count 16 ### memory cache size is page-count x page-size per file
subvolumes eon1
end-volume
volume readahead-eon2
type performance/read-ahead
option page-size 65536 ### in bytes
option page-count 16 ### memory cache size is page-count x page-size per file
subvolumes eon2
end-volume
volume readahead-ns
type performance/read-ahead
option page-size 65536 ### in bytes
option page-count 16 ### memory cache size is page-count x page-size per file
subvolumes glfs-ns
end-volume
volume io-cache
type performance/io-cache
option cache-size 64MB # default is 32MB
option page-size 1MB # default is 128KB
#option priority *.h:3,*.html:2,*:1 # default is '*:0'
option priority *:0
option force-revalidate-timeout 2 # default is 1
subvolumes eon0
end-volume
#volume unify0
# type cluster/unify
# option scheduler rr # round robin
# option namespace glfs-ns
#subvolumes eon0 eon1 eon2
# subvolumes eon0
#end-volume
#volume stripe0
# type cluster/stripe
# option block-size *:1MB
# subvolumes eon0 eon1 eon2
#end-volume
####
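To make the rsize/wsize analogy concrete, the closest equivalents I
can find are write-behind's aggregate-size (batching small writes
into bigger network writes, like wsize) and read-ahead's page-size
(the network read size, like rsize). A minimal single-brick client
stack might look like this (just a sketch; the volume names wb/ra and
the option values are illustrative):
volume eon0
type protocol/client
option transport-type tcp/client
option remote-host porpoise-san
option remote-subvolume eon0
end-volume
volume wb
type performance/write-behind
option aggregate-size 1MB # batch small writes, roughly like wsize
subvolumes eon0
end-volume
volume ra
type performance/read-ahead
option page-size 131072 # read request size in bytes, roughly like rsize
option page-count 4
subvolumes wb
end-volume
If I understand correctly, the top-most volume (ra here) is what gets
mounted, but even with settings like these the application still sees
the 4k-vs-128k gap shown above, which is why I'm asking.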
I've tried everything from a very basic server/client setup with no
translators to the setup above, and almost everything in between, to
try to improve the performance. The server/client system is an Apple
Xserve G5 running Gentoo PPC64:
Linux porpoise 2.6.24.4 #6 Sun Jul 20 00:16:04 CDT 2008 ppc64
PPC970FX, altivec supported RackMac3,1 GNU/Linux
% cat /proc/cpuinfo
processor : 0
cpu : PPC970FX, altivec supported
clock : 2000.000000MHz
revision : 3.0 (pvr 003c 0300)
timebase : 33333333
platform : PowerMac
machine : RackMac3,1
motherboard : RackMac3,1 MacRISC4 Power Macintosh
detected as : 339 (XServe G5)
pmac flags : 00000000
L2 cache : 512K unified
pmac-generation : NewWorld
% cat /proc/meminfo
MemTotal: 2006988 kB
MemFree: 107864 kB
Buffers: 676 kB
Cached: 1775800 kB
SwapCached: 0 kB
Active: 46672 kB
Inactive: 1762528 kB
SwapTotal: 3583928 kB
SwapFree: 3583624 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 32744 kB
Mapped: 10292 kB
Slab: 65704 kB
SReclaimable: 50620 kB
SUnreclaim: 15084 kB
PageTables: 1180 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 4587420 kB
Committed_AS: 261212 kB
VmallocTotal: 8589934592 kB
VmallocUsed: 6352 kB
VmallocChunk: 8589928088 kB
Here's what's under the hood:
# lspci
0000:f0:0b.0 Host bridge: Apple Computer Inc. U3H AGP Bridge
0001:00:01.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
0001:00:02.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
0001:00:03.0 PCI bridge: Apple Computer Inc. K2 HT-PCI Bridge
0001:00:04.0 PCI bridge: Apple Computer Inc. K2 HT-PCI Bridge
0001:00:05.0 PCI bridge: Apple Computer Inc. K2 HT-PCI Bridge
0001:00:06.0 PCI bridge: Apple Computer Inc. K2 HT-PCI Bridge
0001:00:07.0 PCI bridge: Apple Computer Inc. K2 HT-PCI Bridge
0001:01:07.0 Class ff00: Apple Computer Inc. K2 KeyLargo Mac/IO (rev 60)
0001:02:0b.0 USB Controller: NEC Corporation USB (rev 43)
0001:02:0b.1 USB Controller: NEC Corporation USB (rev 43)
0001:02:0b.2 USB Controller: NEC Corporation USB 2.0 (rev 04)
0001:03:0d.0 Class ff00: Apple Computer Inc. K2 ATA/100
0001:03:0e.0 FireWire (IEEE 1394): Apple Computer Inc. K2 FireWire
0001:05:0c.0 IDE interface: Broadcom K2 SATA
0001:06:02.0 VGA compatible controller: ATI Technologies Inc Radeon RV100 QY [Radeon 7000/VE]
0001:06:03.0 Fibre Channel: QLogic Corp. ISP2422-based 4Gb Fibre Channel to PCI-X HBA (rev 02)
0001:06:03.1 Fibre Channel: QLogic Corp. ISP2422-based 4Gb Fibre Channel to PCI-X HBA (rev 02)
0001:07:04.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
0001:07:04.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
This is with glusterfs-1.3.10 and fuse-2.7.3glfs10 compiled from
source. Any help would be greatly appreciated.
Thanks,
Sabuj Pattanayek
Senior SysAdmin
http://structbio.vanderbilt.edu