[Gluster-users] Poor performance with AFR

Wed Sep 17 11:04:45 UTC 2008

I have just taken my first steps with glusterfs. Getting a basic version
of what I wanted running (including installation from source) was
astonishingly easy; however, the performance is extremely poor. I'd
appreciate comments and suggestions on what to try next.

* Operating system is Ubuntu 8.04.1 (32bit on servers, 64bit on client)
* glusterfs is 1.3.12, compiled from source
* transport is 100MBit Ethernet
* on the client, I tried both the distribution-provided fuse kernel module
  and the one built from fuse-2.7.3glfs10.tar.gz (<rant>which is a pain
  in the neck, as one has to patch the source (*) to make it compile and include
  the new module in the initramfs; for some reason, the fuse module is one
  of the first modules loaded by ubuntu</rant>)

(*) see http://www.nabble.com/Compiling-fuse-2.7-on-Ubuntu-Hardy-td18590177.html
Is there any chance of getting your patches included upstream?

At the moment I have two dedicated servers and one test client.  I
have set up client side AFR, following the example in:

http://www.gluster.org/docs/index.php/Setting_up_AFR_on_two_servers_with_client_side_replication

I include the exact volume specs with which the benchmarks were run at the
end of the mail.

In principle, this worked right away, and after finally enabling
extended attributes and Posix ACLs(**) on the underlying filesystems on
the servers, self-healing seems to work as well (at least in the little
testing I had time for).

(**) not clear from the documentation that they are needed, but without
-o acl I get tons of errors/warnings in the server logs!
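
A quick way to see whether a backend directory accepts extended attributes at all is a small probe like the one below. This is only a minimal sketch: it tests an attribute in the user.* namespace (the attribute name "user.probe" is made up for the test), whereas glusterfs may keep its own bookkeeping in a different namespace that needs root, and it says nothing about Posix ACLs. It relies on os.setxattr, which Python provides on Linux only.

```python
import os
import tempfile

def supports_user_xattr(directory):
    """Return True if a scratch file in `directory` accepts a user.* xattr."""
    if not hasattr(os, "setxattr"):      # os.*xattr functions exist on Linux only
        return False
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        # "user.probe" is a hypothetical attribute name used only for this test
        os.setxattr(path, "user.probe", b"1")
        return os.getxattr(path, "user.probe") == b"1"
    except OSError:
        # e.g. filesystem mounted without extended attribute support
        return False
    finally:
        os.unlink(path)

if __name__ == "__main__":
    import sys
    for d in sys.argv[1:]:
        print(d, "xattr ok" if supports_user_xattr(d) else "no user xattrs")
```

Run on each server with the export directory as argument (e.g. /data/export from the server spec at the end of this mail).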

The setup I have described is a first test for the eventual migration
of NFS-mounted home dirs to glusterfs in order to enhance data safety
and availability (hence AFR). My problem is that for two typical use
scenarios, the performance is not acceptable. My two test cases are:

* "CP-A": a cp -a of a source tree (including its git repository) of
  approx. 160MB from the local disk to the glusterfs filesystem
* "MAKE": a make in this source tree (on the remote glusterfs
  filesystem) when everything is up to date; i.e., after make has
  checked all 800 or so source files, it tells you that there is
  nothing to do

I am afraid my timings speak for themselves:

                 CP-A             MAKE
local disk       < 5sec           < 0.3sec
NFS (100MBit)    55sec +- 2sec    < 2sec
glusterfs (I)    4m29sec          17sec
glusterfs (II)   4m05sec          18sec

(I) unpatched fuse.ko, (II) patched fuse.ko, both over the same network
as NFS. Note that in (II) the modified kernel module was used
with the default (distribution-provided) libfuse library.
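
As a rough sanity check, the table above can be turned into slowdown factors, effective throughput and per-file cost. The sketch below uses only the figures quoted in this mail (approx. 160MB, about 800 files); the "<" entries are treated as upper bounds, so the MAKE slowdowns are lower bounds. It also assumes that with client side AFR the client sends every write to both servers, so its 100MBit link carries the data twice.

```python
# Timings from the table above, in seconds ("<" entries taken as upper bounds).
TREE_MB = 160                 # approximate size of the copied source tree
N_FILES = 800                 # approximate number of files make has to check
LINK_MB_PER_S = 100 / 8       # 100MBit Ethernet, ignoring protocol overhead

cp_a = {"NFS": 55, "glusterfs (I)": 4 * 60 + 29, "glusterfs (II)": 4 * 60 + 5}
make = {"NFS": 2, "glusterfs (I)": 17, "glusterfs (II)": 18}

def slowdown(timings, baseline="NFS"):
    """Factor by which each setup is slower than the baseline."""
    return {name: t / timings[baseline] for name, t in timings.items()}

def throughput_mb_per_s(timings, size_mb=TREE_MB):
    """Effective copy throughput in MB/s for each setup."""
    return {name: size_mb / t for name, t in timings.items()}

def per_file_ms(timings, n_files=N_FILES):
    """Average time spent per file, in milliseconds."""
    return {name: 1000 * t / n_files for name, t in timings.items()}

if __name__ == "__main__":
    # Assumption: client side AFR writes every byte twice over the client link.
    print("AFR write ceiling: %.2f MB/s" % (LINK_MB_PER_S / 2))
    for name in cp_a:
        print("%-15s CP-A x%.1f, %.2f MB/s; MAKE x%.1f, %.2f ms/file" % (
            name,
            slowdown(cp_a)[name], throughput_mb_per_s(cp_a)[name],
            slowdown(make)[name], per_file_ms(make)[name]))
```

In other words, CP-A over glusterfs is about 4.5 to 4.9 times slower than over NFS and moves roughly 0.6 MB/s, far below even a halved link ceiling of 6.25 MB/s, while MAKE costs about 21 to 23 ms per file compared with under 2.5 ms over NFS. Both cases look latency-bound rather than bandwidth-bound.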

Measurements are fairly reproducible, i.e., variations are at most +-
a few seconds. As you can see from the volume specs below, I tried
some of the performance translators (now commented out), but the timings,
if anything, got worse. Based on the above timings, I also don't think
that the patched fuse.ko is worth the pain.
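
Since the MAKE case is dominated by metadata lookups rather than data transfer, it may help to measure per-file stat latency directly on each filesystem. The following is a minimal sketch that only approximates what make does (walk the tree and stat every file, without reading any contents); client-side caching can make repeated runs look faster than the first one.

```python
import os
import time

def stat_walk(root):
    """Walk `root`, lstat every file, return (file_count, elapsed_seconds)."""
    count = 0
    start = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            os.lstat(os.path.join(dirpath, name))
            count += 1
    return count, time.perf_counter() - start

def report(root):
    """Format file count, total time and average time per file for `root`."""
    count, elapsed = stat_walk(root)
    per_file_ms = 1000 * elapsed / count if count else 0.0
    return "%s: %d files, %.3fs total, %.2f ms/file" % (
        root, count, elapsed, per_file_ms)

if __name__ == "__main__":
    import sys
    # Pass the same source tree on local disk, NFS and glusterfs as arguments.
    for root in sys.argv[1:]:
        print(report(root))
```

Comparing the ms/file figure across the three mounts separates per-operation latency from raw throughput.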

Some questions:

(1) Would server side AFR improve things? The servers can/could talk over
dedicated GBit Ethernet with jumbo frames. Judging from the howtos, server
side AFR is much more problematic concerning (redundant) availability of the
service, but I first *have* to get performance somewhere near the NFS levels
before there is any sense in continuing.

(2) What about this libboost thing? Since the documentation makes it sound
highly experimental, I didn't even try.

(3) I admit that the plethora of performance xlators has me utterly confused;
as mentioned, my ad hoc experiments didn't help at all. So, any hints
would be very much appreciated.

I should maybe add that the single client scenario is not realistic; the
tests just reflect typical activities of my users and myself. The setup
should eventually serve 8-10 users with homedirs of 10-50GB; some users
often move GBs of data through their homedirs. This has not been without
pain using NFS, but performance has been acceptable so far. So even if I
get my test scenario up to speed, would glusterfs be up to the real task?

Thanks in advance for any help, suggestions, pointers. 

Stefan Boresch

---------------------------------------------------------------------
cat /etc/glusterfs/glusterfs-server.vol
# the physical data space
##volume brick # watch out
volume gfs
  type storage/posix
  option directory /data/export
end-volume

## the actual exported volume
#volume gfs
#  type performance/io-threads
#  option thread-count 8
#  option cache-size 64MB
#  subvolumes brick
#end-volume

# server declaration
volume server
  type protocol/server
  subvolumes gfs
  option transport-type tcp/server # For TCP/IP transport
  option auth.ip.gfs.allow *
end-volume

=========

cat /etc/glusterfs/glusterfs-client.vol
volume brick1
    type protocol/client
    option transport-type tcp/client # for TCP/IP transport
    option remote-host x.y.z.a # IP address of a
    option remote-subvolume gfs    # name of the remote volume on omega
end-volume

volume brick2
    type protocol/client
    option transport-type tcp/client # for TCP/IP transport
    option remote-host x.y.z.b # IP address of b
    option remote-subvolume gfs    # name of the remote volume on sigma
end-volume

volume afr
   type cluster/afr
   subvolumes brick1 brick2
end-volume

## performance block for cluster                   # optional!
#volume writeback
#  type performance/write-behind
#  option aggregate-size 131072
#  subvolumes afr
#end-volume

## performance block for cluster                   # optional!
#volume readahead
#  type performance/read-ahead
#  option page-size 65536
#  option page-count 16
#  subvolumes writeback
#end-volume
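
When enabling or disabling the commented-out blocks above, it is easy to leave a "subvolumes" line pointing at a volume name that no longer exists (hence the "watch out" remark in the server spec). The sketch below is a hypothetical helper, not part of glusterfs: it does a naive line-based scan of a volume spec and reports such dangling references. It does not check duplicate volume names, and it cannot check "remote-subvolume", which refers to a name defined on the server.

```python
def undefined_subvolumes(volfile_text):
    """Return {volume_name: [missing subvolume names]} for a volume spec."""
    defined = set()
    wanted = {}
    current = None
    for raw in volfile_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        words = line.split()
        if words[0] == "volume" and len(words) > 1:
            current = words[1]
            defined.add(current)
        elif words[0] == "subvolumes" and current is not None:
            wanted[current] = words[1:]
        elif words[0] == "end-volume":
            current = None
    missing = {}
    for vol, subs in wanted.items():
        absent = [s for s in subs if s not in defined]
        if absent:
            missing[vol] = absent
    return missing

if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        with open(path) as fh:
            print(path, undefined_subvolumes(fh.read()) or "ok")
```

For example, uncommenting the io-threads block in the server spec without renaming the storage/posix volume to "brick" would be reported as a missing "brick".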

-- 
Stefan Boresch
Institute for Computational Biological Chemistry
University of Vienna, Waehringerstr. 17       A-1090 Vienna, Austria
Phone: -43-1-427752715                        Fax:   -43-1-427752790