[Gluster-users] striping - only for big files and how to tune

P.Gotwalt p.gotwalt at uci.ru.nl
Fri Jan 21 11:42:53 UTC 2011


Using glusterfs 3.1.1 with a 4 node striped volume:
# gluster volume info
Volume Name: testvol
Type: Stripe
Status: Started
Number of Bricks: 4
Transport-type: tcp
Brick1: node20.storage.xx.nl:/data1
Brick2: node30.storage.xx.nl:/data1
Brick3: node40.storage.xx.nl:/data1
Brick4: node50.storage.xx.nl:/data1

To do some performance testing, I copied /usr to the gluster volume:
[root at drbd10.storage ~]# time rsync -avzx --quiet  /usr /gluster
real    5m54.453s
user    2m1.026s
sys     0m9.979s
[root at drbd10.storage ~]#

To see whether this operation was successful, I checked the number of files and the used blocks on the storage bricks. I expected these to be the same on all bricks, since this is a striped configuration. The results are:

Number of files seen on the client:
[root at drbd10.storage ~]# find /gluster/usr -ls| wc -l

Number of files seen on the storage bricks:
# mpssh -f s2345.txt 'find /data1/usr -ls | wc -l'                                 
  [*] read (4) hosts from the list
  [*] executing "find /data1/usr -ls | wc -l" as user "root" on each
  [*] spawning 4 parallel ssh sessions
node20 -> 57517
node30 -> 55875
node40 -> 55875
node50 -> 55875

Why does node20 have all the files, while the others seem to be missing quite a lot?
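To see exactly which names node20 has that the other bricks lack, one could diff sorted file lists from two bricks. A sketch (the host names and paths are the ones from the output above; capturing the lists over ssh is assumed to work the same way mpssh does):

```shell
# First capture a sorted file list from each brick, e.g.:
#   ssh node20 'find /data1/usr | sort' > node20.list
#   ssh node30 'find /data1/usr | sort' > node30.list
# Then print the paths unique to node20 (comm -23 keeps lines
# that appear only in the first file):
comm -23 node20.list node30.list | head
```

The first few lines of that output should show whether the "extra" entries on node20 are real files or some kind of internal artifact.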

The same, but now for the storage blocks actually used:
On the client:
[root at drbd10.storage ~]# du -sk /gluster/usr                                                                      
1229448 /gluster/usr

On the storage bricks:
# mpssh -f s2345.txt 'du -sk /data1/usr'                                           
  [*] read (4) hosts from the list
  [*] executing "du -sk /data1/usr" as user "root" on each
  [*] spawning 4 parallel ssh sessions
node20 -> 1067784       /data1/usr
node30 -> 535124        /data1/usr
node40 -> 437896        /data1/usr
node50 -> 405920        /data1/usr

In total: 2446724 KB
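For reference, the overhead those numbers imply, measured as the share of raw brick blocks not accounted for by the logical data, works out to just under 50%:

```shell
# Overhead = (blocks used on bricks - logical size on client) / blocks on bricks
# Numbers in KB, taken from the du output above.
awk 'BEGIN { client = 1229448; bricks = 2446724
             printf "%.1f%%\n", 100 * (bricks - client) / bricks }'
# prints 49.8%
```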

My conclusions:
- all data is written to the first brick. If a file is smaller than the chunk size, there is nothing left to stripe, so the first storage brick fills up with all the small files. Question: does the filesystem stop working once the first brick is full?

- when using striping, the overhead seems to be almost 50%. This can only get worse as the first node fills up. Question: what is the size of a stripe chunk, and can it be tuned to the average size of the files?
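If I read the documentation right, the stripe chunk size is a per-volume option that can be changed with the gluster CLI. A sketch, with the caveat that I have not verified the option name or its default against 3.1.1:

```shell
# Hypothetical: cluster.stripe-block-size is the option name given in the
# gluster docs; the value here (2MB) is only an example.
gluster volume set testvol cluster.stripe-block-size 2MB
gluster volume info testvol   # confirm the option took effect
```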

All in all, glusterfs striping seems better suited to "big" files. Is there an average file size above which a striped volume becomes the better choice?
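To answer that for a given data set, one could measure its file-size distribution before choosing a volume type. A sketch over the same /usr tree used in the test above (GNU find's -printf is assumed):

```shell
# Print file count, average size, and median size for a tree.
find /usr -type f -printf '%s\n' | sort -n | awk '
    { sizes[NR] = $1; sum += $1 }
    END {
        if (NR == 0) exit
        printf "files: %d  avg: %d bytes  median: %d bytes\n",
               NR, sum / NR, sizes[int((NR + 1) / 2)]
    }'
```

Comparing the median against the stripe chunk size would show how many files are too small to be striped at all.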


Peter Gotwalt
