[Gluster-devel] [Questions] About small files performance

Thu Jul 20 10:06:04 UTC 2017

Dear all

Recently I did some work to test small-file performance over the gnfsv3 
transport. My setup is as follows.

#####environment#####
==2 cluster nodes(nodeA/nodeB)==
each is equipped with 2 x E5-2650, 128G memory and 2 x 10Gb network cards
nodeA: 10.254.3.77  10.128.3.77
nodeB: 10.254.3.78  10.128.3.78

==2 stress nodes(clientA/clientB)==
each is equipped with 2 x E5-2650, 128G memory and 2 x 10Gb network cards
clientA: 10.254.3.75
clientB: 10.254.3.76

1) 10.254.3.* is the test segment; 10.128.3.* is for cluster-internal 
communication.

#####vdbench setup#####
hd=default,vdbench=/root/vdbench/,user=root,shell=ssh
#hd=hd1,system=10.254.3.xx
#hd=hd2,system=10.254.3.xx

fsd=fsd1,anchor=/mnt/smalltest1/smalltest/,depth=2,width=100,openflags=o_direct,files=100,size=64k,shared=yes

fwd=format,threads=256,xfersize=xxx
fwd=default,xfersize=xxx,fileio=random,fileselect=random,rdpct=60,threads=256
#fwd=fwd1,fsd=fsd1,host=hd1
#fwd=fwd2,fsd=fsd1,host=hd2

rd=rd1,fwd=fwd*,fwdrate=max,format=restart,elapsed=600,interval=1

1) Use *o_direct* to bypass the cache.
2) More than 256 threads has no effect in this test.
3) 100 million 64k files in total.
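
As a sanity check on the dataset size, the fsd1 line can be turned into a file count. This is only a rough sketch: it assumes vdbench's documented layout rule that files are created only in the lowest-level directories, so the count is width^depth * files, and that "64k" means 64 KiB. The function name is purely illustrative.

```python
def fsd_totals(depth, width, files, size_bytes):
    """Return (leaf_dirs, total_files, total_bytes) for a vdbench fsd.

    Assumes files are created only in the lowest-level directories.
    """
    leaf_dirs = width ** depth
    total_files = leaf_dirs * files
    return leaf_dirs, total_files, total_files * size_bytes


# Values from the fsd1 line: depth=2,width=100,files=100,size=64k
leaf_dirs, total_files, total_bytes = fsd_totals(2, 100, 100, 64 * 1024)
print(leaf_dirs, total_files, round(total_bytes / 2**30, 1))
```

Under that assumption the fsd describes 1,000,000 files (about 61 GiB), so the total quoted in note 3 may be worth double-checking against what vdbench actually formatted.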

#####volume info#####
Volume Name: ttt
Type: Replicate
Volume ID: cf23b1fe-d430-4ede-b33b-b54a2c04d080
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.128.3.77:/gluster/brick-mm
Brick2: 10.128.3.78:/gluster/brick-mm
Options Reconfigured:
performance.nfs.stat-prefetch: off
performance.nfs.quick-read: off
performance.nfs.io-threads: on
client.event-threads: 32
server.event-threads: 32
features.shard: off
nfs.trusted-sync: on
performance.cache-size: 4000MB
performance.io-thread-count: 64
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: off
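
For anyone who wants to reproduce this configuration, the reconfigured options can be replayed with the standard `gluster volume set <VOLNAME> <KEY> <VALUE>` command. The helper below is a hypothetical convenience, not part of gluster; it only prints the commands, using the volume name and values listed above.

```python
import shlex

VOLUME = "ttt"

# Options copied from the "Options Reconfigured" list above.
OPTIONS = {
    "performance.nfs.stat-prefetch": "off",
    "performance.nfs.quick-read": "off",
    "performance.nfs.io-threads": "on",
    "client.event-threads": "32",
    "server.event-threads": "32",
    "features.shard": "off",
    "nfs.trusted-sync": "on",
    "performance.cache-size": "4000MB",
    "performance.io-thread-count": "64",
    "transport.address-family": "inet",
    "performance.readdir-ahead": "on",
    "nfs.disable": "off",
}


def volume_set_commands(volume, options):
    """Build one 'gluster volume set' command line per option."""
    return [
        " ".join(
            shlex.quote(part)
            for part in ("gluster", "volume", "set", volume, key, value)
        )
        for key, value in options.items()
    ]


for cmd in volume_set_commands(VOLUME, OPTIONS):
    print(cmd)
```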

Note:
1) I put 10.128.3.*:/gluster/brick-mm on tmpfs, so we can ignore disk I/O latency.
2) The option values are based on my experience of what gives the best performance.
3) The mount.nfs options are the defaults, because 1M 'rsize/wsize' and 
'async' are the best choice. I also tried other options, but saw no 
significant performance difference.
4) I also tried setting performance.cache-size to 30GB, but it made no difference.
5) The network bandwidth is not saturated in any of the tests.
6) I've tried 'nfs.mem-factor' and 'rpc.outstanding-rpc-limit', but gained 
nothing.
7) The gluster version is 3.8.4.

First I collected some data with kernel NFS for comparison; the export dir 
(rw,async,no_root_squash,no_all_squash) is also on tmpfs:
[testA]
nfs.client: clientA
nfs.server: nodeA
xfersize=32k
25000ops

[testB]
nfs.client: clientA
nfs.server: nodeA
xfersize=4k
100000ops

Then I did the gnfsv3 tests:
[testC]
gnfs.client: clientA(mount nodeA)
gnfs.server: nodeA nodeB
xfersize=32k
10000ops

[testD]
gnfs.client: clientA(mount nodeA) clientB(mount nodeB)
gnfs.server: nodeA nodeB
xfersize=32k
10000ops

Comparing testA and testB, a smaller xfersize achieves far more ops, and I 
saw the same trend with gnfs.

For testC vs testD, it seems that there is a *bottleneck* in the 
cluster: 10000ops looks like the upper limit. Am I right? I've also added 
more stress nodes and more threads, but with little effect.
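
One way to read the 10000ops ceiling is through Little's law (concurrency = throughput x latency). This is a back-of-envelope sketch, assuming the ops figures are per second and that each of the 256 vdbench threads on a client keeps exactly one request in flight; the function name is illustrative.

```python
def mean_latency_ms(threads_in_flight, ops_per_second):
    """Little's law: mean time per operation, in milliseconds."""
    return threads_in_flight * 1000.0 / ops_per_second


# 256 threads on clientA, throughput as reported in testC and testA
print("testC (gnfs):      ", mean_latency_ms(256, 10000), "ms")
print("testA (kernel nfs):", mean_latency_ms(256, 25000), "ms")
```

Under those assumptions each operation spends roughly 25.6 ms in the gnfs path versus about 10.2 ms with kernel NFS. If throughput stays flat as threads are added, the extra threads mostly add queueing time, which would be consistent with a limit on the server side rather than too little client load.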

We can also learn something from testA and testC. Even if gnfs is as 
efficient as kernel NFS, gluster loses 60% of the ops!
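
The 60% figure and the implied data rates can be checked with a few lines of arithmetic. The sketch assumes the ops values are per second and that xfersize "k" means KiB; the test labels and numbers are the ones reported above.

```python
KIB = 1024

# (label, xfersize in KiB, reported ops) from the test results above
RESULTS = [
    ("testA", 32, 25000),
    ("testB", 4, 100000),
    ("testC", 32, 10000),
    ("testD", 32, 10000),
]


def mib_per_second(xfer_kib, ops):
    """Data rate implied by an op rate at a fixed transfer size."""
    return xfer_kib * KIB * ops / 2**20


def relative_drop(baseline_ops, ops):
    """Fraction of the baseline op rate that was lost."""
    return 1 - ops / baseline_ops


for label, xfer_kib, ops in RESULTS:
    print(f"{label}: {mib_per_second(xfer_kib, ops):.1f} MiB/s")

print(f"testC vs testA: {relative_drop(25000, 10000):.0%} fewer ops")
```

This gives about 781 MiB/s for testA, 391 MiB/s for testB and 313 MiB/s for testC and testD, so the gnfs runs move well under half of what kernel NFS moved at the same xfersize.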

Although it's known that gluster is designed for large files, I'm a 
little greedy and would like to ask whether there is any way to improve 
small-file performance.

Any ideas about, or challenges to, these tests would be appreciated. Thanks 
in advance ;)

-- 
Thanks
     -Xie