[Gluster-devel] Performance report and some issues
Amar S. Tumballi
amar at zresearch.com
Thu Mar 6 14:19:29 UTC 2008
Hi Jordi,
I see no performance translators on the client side. You can load write-behind
and read-ahead/io-cache on the client side. Without write-behind loaded, write
performance will be *very* poor.
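For example, in your client spec file you could stack the performance
translators on top of the "ultim" unify volume, along these lines. This is
only a rough sketch: the volume names wb/ra/ioc and the sizes are purely
illustrative, so please check the exact option names against your GlusterFS
version.

volume wb
type performance/write-behind
option aggregate-size 128KB        # illustrative value
subvolumes ultim
end-volume

volume ra
type performance/read-ahead
option page-size 128KB             # illustrative value
option page-count 4                # illustrative value
subvolumes wb
end-volume

volume ioc
type performance/io-cache
option cache-size 64MB             # illustrative value
subvolumes ra
end-volume

Whichever volume ends up topmost (ioc here) is the one the mount point will use.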
Regards,
Amar
On Thu, Mar 6, 2008 at 4:58 AM, Jordi Moles <jordi at cdmon.com> wrote:
> Hi,
>
> I want to report back the performance issues I've had so far with
> GlusterFS mainline 2.5, patch 690 and fuse-2.7.2glfs8.
>
> I'm setting up a mail system, which is all run by Xen 3.2.0, and every
> "actual" piece of the mail system is a Xen virtual machine.
>
> Anyway... the virtual machines accessing GlusterFS are 6 Dovecot and 4
> Postfix servers. There are also 6 nodes, each of which shares its own disk to
> the Gluster filesystem. Two of the nodes share 2 disks each: one for
> GlusterFS, and the other for the namespace.
>
> These are the conf files:
>
> ****nodes with namespace****
>
> volume esp
> type storage/posix
> option directory /mnt/compartit
> end-volume
>
> volume espa
> type features/posix-locks
> subvolumes esp
> end-volume
>
> volume espai
> type performance/io-threads
> option thread-count 15
> option cache-size 512MB
> subvolumes espa
> end-volume
>
> volume nm
> type storage/posix
> option directory /mnt/namespace
> end-volume
>
> volume ultim
> type protocol/server
> subvolumes espai nm
> option transport-type tcp/server
> option auth.ip.espai.allow *
> option auth.ip.nm.allow *
> end-volume
>
> *************
>
>
> ***nodes without namespace*****
>
> volume esp
> type storage/posix
> option directory /mnt/compartit
> end-volume
>
> volume espa
> type features/posix-locks
> subvolumes esp
> end-volume
>
> volume espai
> type performance/io-threads
> option thread-count 15
> option cache-size 512MB
> subvolumes espa
> end-volume
>
> volume ultim
> type protocol/server
> subvolumes espai
> option transport-type tcp/server
> option auth.ip.espai.allow *
> end-volume
>
> *****************************
>
>
> ***clients****
>
> volume espai1
> type protocol/client
> option transport-type tcp/client
> option remote-host 192.168.1.204
> option remote-subvolume espai
> end-volume
>
> volume espai2
> type protocol/client
> option transport-type tcp/client
> option remote-host 192.168.1.205
> option remote-subvolume espai
> end-volume
>
> volume espai3
> type protocol/client
> option transport-type tcp/client
> option remote-host 192.168.1.206
> option remote-subvolume espai
> end-volume
>
> volume espai4
> type protocol/client
> option transport-type tcp/client
> option remote-host 192.168.1.207
> option remote-subvolume espai
> end-volume
>
> volume espai5
> type protocol/client
> option transport-type tcp/client
> option remote-host 192.168.1.213
> option remote-subvolume espai
> end-volume
>
> volume espai6
> type protocol/client
> option transport-type tcp/client
> option remote-host 192.168.1.214
> option remote-subvolume espai
> end-volume
>
> volume namespace1
> type protocol/client
> option transport-type tcp/client
> option remote-host 192.168.1.204
> option remote-subvolume nm
> end-volume
>
> volume namespace2
> type protocol/client
> option transport-type tcp/client
> option remote-host 192.168.1.205
> option remote-subvolume nm
> end-volume
>
> volume grup1
> type cluster/afr
> subvolumes espai1 espai2
> end-volume
>
> volume grup2
> type cluster/afr
> subvolumes espai3 espai4
> end-volume
>
> volume grup3
> type cluster/afr
> subvolumes espai5 espai6
> end-volume
>
> volume nm
> type cluster/afr
> subvolumes namespace1 namespace2
> end-volume
>
> volume ultim
> type cluster/unify
> subvolumes grup1 grup2 grup3
> option scheduler rr
> option namespace nm
> end-volume
>
> ************
>
> The thing is that with earlier patches, the whole system used to hang,
> with many different error messages.
>
> Right now, it's been up for days without any hang at all, but I'm facing
> serious performance issues.
>
> Just running an "ls" command can take around 3 seconds to show
> anything when the system is "under load". It doesn't happen at all when
> there's no activity, so I don't think it has anything to do with Xen. Well,
> actually, "under load" can mean 3 mails arriving per second. I'm
> monitoring everything, and no virtual machine is using more than about 20% of
> CPU.
>
> At first I had the log level on both nodes and clients set to DEBUG, but now
> it is just WARNING, and I've restarted everything many times.
>
> I was advised to use "type performance/io-threads" on the node side.
> It actually helped: before that, it wasn't 3 seconds, but 5 or more.
> I've tried different values for "thread-count" and also for
> "cache-size".
>
> The system is supposed to handle a large amount of traffic, far more than
> 3 mails a second.
>
> What do you think about the whole setup? Should I keep using a namespace?
> Should I use dedicated nodes for the namespace? Should I use different values
> for io-threads?
>
> One last thing... I'm using ReiserFS on the "storage devices" that the nodes
> share. Should I be using XFS or something else?
>
> The logs don't show any kind of error now... I don't have a clue about what
> is failing.
>
> I would appreciate any ideas you could give.
>
> Thank you.
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>
--
Amar Tumballi
Gluster/GlusterFS Hacker
[bulde on #gluster/irc.gnu.org]
http://www.zresearch.com - Commoditizing Supercomputing and Superstorage!