[Gluster-devel] writebehind slowness

Sebastien LELIEVRE slelievre at tbs-internet.com
Thu Aug 2 12:48:45 UTC 2007


Hi everyone,

It's my turn to give you a summary of the latest test results.

First of all, a little reminder:

We're using FUSE 2.6.5

Our GlusterFS is patched up to tla patch 403.


Our configuration basically works like this:

http://users.info.unicaen.fr/~slelievr/tbs-gluster.jpg

We are using a custom benchmark script like this:
#!/bin/bash
# create and delete MAX files, each COUNT blocks of SIZE bytes
MAX=1000
SIZE=1024
COUNT=10
[ -n "$1" ] && COUNT=$1

id=0

mkdir bench

while [ $id -lt $MAX ] ; do
        (( id+=1 ))
        echo "File id : $id"
        dd if=/boot/vmlinuz-2.6.16.31-17tbs-smp of=bench/$id bs=$SIZE count=$COUNT
        rm -f bench/$id
done
rmdir bench
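
A note on the argument: the runs below all call the script with 0, so
COUNT=0 and dd copies zero blocks. The benchmark therefore creates and
deletes 1000 empty files, i.e. it measures pure create/delete (metadata)
performance rather than write throughput. For comparison, hypothetical
invocations:

~# time /tmp/bench.sh 0       # COUNT=0    : 1000 empty files, metadata only
~# time /tmp/bench.sh 1024    # COUNT=1024 : 1000 files of 1024*1024 bytes (1MB) each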

Tests:
* op1 + or2 + or3:

~# time ls /mnt/gluster/home/http2

real    0m50.964s
user    0m0.000s
sys     0m0.004s

~# time /tmp/bench.sh 0

real    0m41.734s
user    0m0.736s
sys     0m1.496s


* or2 + or3:

~# time ls /mnt/gluster/home/http2

real    0m6.303s
user    0m0.004s
sys     0m0.008s

~# time /tmp/bench.sh 0

real    0m14.557s
user    0m0.684s
sys     0m1.332s


Avati told us to try without the namespace on op1; here is how it went:

* 1 replicated namespace on or2 and or3, and 1 AFR between or2 and or3

~# time /tmp/bench.sh 0

real    0m14.557s
user    0m0.684s
sys     0m1.332s


* 1 namespace on or2 only, and 1 AFR between or2 and or3

~# time /tmp/bench.sh 0

real    0m10.264s
user    0m0.644s
sys     0m1.328s

* op1 comes back up (still 1 namespace on or2):

~# time /tmp/bench.sh 0

real    0m32.790s
user    0m0.704s
sys     0m1.292s

"--direct-io-mode=write-only" option applied on glusterfs mount doesn't
affect those numbers.
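
For reference, this is roughly how the client is mounted (a sketch; the
spec file path is just our local convention):

~# glusterfs --direct-io-mode=write-only -f /etc/glusterfs/client.vol /mnt/gluster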

For the tests above, the op1 definition was left in the client spec file;
we just shut the op1 glusterfsd server down.

If we delete the op1 server from the client specification, performance
gets even better! Here it is:

* WITH op1 in client specs, op1 down:

* 1 namespace on or2 and 1 afr between or2 and or3

~# time /tmp/bench.sh 0

real    0m10.264s
user    0m0.644s
sys     0m1.328s


* WITHOUT op1 in the specs:

~# time /tmp/bench.sh 0

real    0m5.743s
user    0m0.684s
sys     0m1.188s

Avati gave me some hints about improving performance for such an
architecture.

First, we're going to try the FUSE version tuned by the Gluster team:

http://ftp.zresearch.com/pub/gluster/glusterfs/fuse/fuse-2.7.0-glfs1.tar.gz

It increases the FUSE/GlusterFS channel size, makes the VM read-ahead
more aggressive, and sets a default block size better suited to the
increased channel size.
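
A sketch of how we plan to build and install it (assuming the tarball
follows the usual autotools layout; the extracted directory name may
differ):

~# wget http://ftp.zresearch.com/pub/gluster/glusterfs/fuse/fuse-2.7.0-glfs1.tar.gz
~# tar xzf fuse-2.7.0-glfs1.tar.gz
~# cd fuse-2.7.0-glfs1
~# ./configure && make && make install
~# modprobe fuse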

In our shell script, we are using bs=1024. That is VERY BAD (Avati said
:>) for a network file system. We're going to retry with something like
128KB or 1MB.
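
For example, keeping the total bytes per file constant while raising the
block size (hypothetical counts, sized for a 1MB file):

        # 1024 writes of 1KB each: many small network round trips
        dd if=/boot/vmlinuz-2.6.16.31-17tbs-smp of=bench/$id bs=1024 count=1024
        # the same 1MB total in 8 writes of 128KB: far fewer calls
        dd if=/boot/vmlinuz-2.6.16.31-17tbs-smp of=bench/$id bs=128k count=8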

Having a namespace across 2 distant datacenters is not recommended over a
"slow" connection like 100 Mbit/s. "Keep the namespace near you," they said!

In AFR, we made sure the furthest subvolume is listed last, so that it
gets the least preference for reads.
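
In spec-file terms it looks something like this (a sketch with
hypothetical volume names, assuming op1 is the distant subvolume):

volume afr0
  type cluster/afr
  # furthest subvolume (op1-client) last: it gets the least read preference
  subvolumes or2-client or3-client op1-client
end-volume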

Finally, we should try the new feature that allows putting hostnames in
the 'remote-host' option instead of IPs, using round-robin DNS.
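
Something like this (a sketch; 'storage.tbs-internet.com' is a
hypothetical round-robin name and 'brick' a hypothetical remote volume):

volume or-client
  type protocol/client
  option transport-type tcp/client
  # round-robin DNS name resolving to or2 and or3
  option remote-host storage.tbs-internet.com
  option remote-subvolume brick
end-volume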

We're trying FUSE 2.7.0 right now, but Harris said on the channel that it
didn't change a thing in his configuration.

I'll keep you posted, but in the meantime, do you have any advice?

Regards,

Sebastien.

