[Gluster-users] Low (<0.2ms) latency reads, is it possible at all?
Willem
gwillem at gmail.com
Thu Apr 18 18:28:46 UTC 2013
I'm testing GlusterFS viability for a typical PHP webapp (i.e. lots
of small files). I don't care so much about the C in the CAP theorem, as I
have very few writes. I could live with a write propagation delay of 5
minutes (or dirty caches for up to 5 minutes).
So I'm optimizing for low-latency reads of small files. My test setup is a
2-node replicated volume. Each node is both server and Gluster client, and both
are in sync. I stop glusterfs-server on node2. On node1, I run a simple
benchmark: repeatedly (to prime the cache) open and close 1000 small files. I
have enabled the client-side io-cache and quick-read translators (see below for
the config).
The results are consistently 2 ms per open (O_RDONLY) call, which is
unfortunately too slow, as I need < 0.2 ms.
Running the same test against a local Gluster server over an NFS mount, I get
somewhat better performance, but still 0.6 ms.
Running the same test against a Linux NFS server (v3) with a local mount, I get
0.12 ms per open.
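To put that in perspective with an assumed workload (the per-request file count
here is purely illustrative, not something I measured): if a single PHP request
opens on the order of 100 include files, 2 ms per open means roughly 200 ms
spent in open() alone, whereas the 0.2 ms target would keep that around 20 ms.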
I can't explain the lag when using Gluster, because I can't see any traffic
being sent to node2. I would expect that, with the io-cache translator and
local-only operation, performance would approach that of the kernel FS cache.
Is this assumption correct? If yes, how would I profile the client subsystem to
find the bottleneck (a rough timing sketch follows below)?
If no, then I have to accept that 0.8 ms open calls are the best I can squeeze
out of this system, and I'll probably look into AFS, userspace async
replication, or a Gluster NFS mount with cachefilesd. Which would you recommend?
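As a first stab at that profiling question, here is a rough application-level
sketch (my own, not a Gluster tool) that times each call individually, so it
should be visible whether the ~2 ms is a constant per-call overhead or is
dominated by outliers. The Gluster-side equivalent would presumably be switching
latency-measurement on in the io-stats translator, which is currently off in
the volfiles below.

#!/usr/bin/env python
# Rough application-level profiling sketch (illustrative only).
# Assumes the same test_000..test_999 files under /mnt/glusterfs as below.
import os
import time

MOUNT = "/mnt/glusterfs"   # assumed client mount point
FILES = ["%s/test_%03d" % (MOUNT, i) for i in range(1000)]

def per_call_ms(fn, files):
    samples = []
    for name in files:
        t1 = time.time()
        fn(name)
        samples.append((time.time() - t1) * 1000.0)
    samples.sort()
    return samples[0], samples[len(samples) / 2], samples[-1]

def do_open(name):
    open(name).close()

for label, fn in [('open', do_open), ('stat', os.stat)]:
    print '%-5s min/median/max: %.3f / %.3f / %.3f ms' % \
        ((label,) + per_call_ms(fn, FILES))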
Thanks a lot!
BTW I like Gluster a lot, and hope that it is also suitable for this small
files use case ;)
//Willem
PS: I am testing with kernel 3.5.0-17-generic (64-bit) and Gluster 3.2.5-1ubuntu1.
Client volfile:
volume testvol-client-0
    type protocol/client
    option remote-host g1
    option remote-subvolume /data
    option transport-type tcp
end-volume

volume testvol-client-1
    type protocol/client
    option remote-host g2
    option remote-subvolume /data
    option transport-type tcp
end-volume

volume testvol-replicate-0
    type cluster/replicate
    subvolumes testvol-client-0 testvol-client-1
end-volume

volume testvol-write-behind
    type performance/write-behind
    option flush-behind on
    subvolumes testvol-replicate-0
end-volume

volume testvol-io-cache
    type performance/io-cache
    option max-file-size 256KB
    option cache-timeout 60
    option priority *.php:3,*:0
    option cache-size 256MB
    subvolumes testvol-write-behind
end-volume

volume testvol-quick-read
    type performance/quick-read
    option cache-size 256MB
    subvolumes testvol-io-cache
end-volume

volume testvol
    type debug/io-stats
    option latency-measurement off
    option count-fop-hits off
    subvolumes testvol-quick-read
end-volume
Server volfile:
volume testvol-posix
    type storage/posix
    option directory /data
end-volume

volume testvol-access-control
    type features/access-control
    subvolumes testvol-posix
end-volume

volume testvol-locks
    type features/locks
    subvolumes testvol-access-control
end-volume

volume testvol-io-threads
    type performance/io-threads
    subvolumes testvol-locks
end-volume

volume testvol-marker
    type features/marker
    option volume-uuid bc89684f-569c-48b0-bc67-09bfd30ba253
    option timestamp-file /etc/glusterd/vols/testvol/marker.tstamp
    option xtime off
    option quota off
    subvolumes testvol-io-threads
end-volume

volume /data
    type debug/io-stats
    option latency-measurement off
    option count-fop-hits off
    subvolumes testvol-marker
end-volume

volume testvol-server
    type protocol/server
    option transport-type tcp
    option auth.addr./data.allow *
    subvolumes /data
end-volume
My benchmark to simulate PHP webapp i/o:
#!/usr/bin/env python
import sys
import os
import time
import optparse

def print_timing(func):
    def wrapper(*arg):
        t1 = time.time()
        res = func(*arg)
        t2 = time.time()
        print '%-15.15s %6d ms' % (func.func_name, int((t2 - t1) * 1000.0))
        return res
    return wrapper

def parse_options():
    parser = optparse.OptionParser()
    parser.add_option("--path", '-p', default="/mnt/glusterfs",
        help="Base directory for running tests (default: /mnt/glusterfs)",
    )
    parser.add_option("--num", '-n', type="int", default=100,
        help="Number of files per test (default: 100)",
    )
    (options, args) = parser.parse_args()
    return options

class FSBench():

    def __init__(self, path="/tmp", num=100):
        self.path = path
        self.num = num

    @print_timing
    def test_open_read(self):
        for filename in self.get_files():
            f = open(filename)
            data = f.read()
            f.close()

    def get_files(self):
        for i in range(self.num):
            filename = self.path + "/test_%03d" % i
            yield filename

    @print_timing
    def test_stat(self):
        for filename in self.get_files():
            os.stat(filename)

    @print_timing
    def test_stat_nonexist(self):
        for filename in self.get_files():
            try:
                os.stat(filename + "blkdsflskdf")
            except OSError:
                pass

    @print_timing
    def test_write(self):
        for filename in self.get_files():
            f = open(filename, 'w')
            f.write('hi there\n')
            f.close()

    @print_timing
    def test_delete(self):
        for filename in self.get_files():
            os.unlink(filename)

if __name__ == '__main__':
    options = parse_options()
    bench = FSBench(path=options.path, num=options.num)
    bench.test_write()
    bench.test_open_read()
    bench.test_stat()
    bench.test_stat_nonexist()
    bench.test_delete()
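For reference, each run above used 1000 files, i.e. an invocation along these
lines (the script name is mine for illustration; the path points at whichever
mount is under test, /mnt/glusterfs being the script's default):

    ./fsbench.py --path /mnt/glusterfs --num 1000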