[Gluster-devel] Performance report and some issues
Amar S. Tumballi
amar at zresearch.com
Thu Mar 6 14:19:29 UTC 2008
Hi Jordi,
I see no performance translators on the client side. You can load write-behind
and read-ahead/io-cache on the client side. Without write-behind loaded, write
performance will be *very* poor.
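For example, in your client spec file you could stack the performance
translators on top of the "ultim" unify volume, along these lines. This is
only a rough sketch: the volume names wb/ra/ioc and the sizes are purely
illustrative, so please check the exact option names against your GlusterFS
version.

volume wb
type performance/write-behind
option aggregate-size 128KB        # illustrative value
subvolumes ultim
end-volume

volume ra
type performance/read-ahead
option page-size 128KB             # illustrative value
option page-count 4                # illustrative value
subvolumes wb
end-volume

volume ioc
type performance/io-cache
option cache-size 64MB             # illustrative value
subvolumes ra
end-volume

Whichever volume ends up topmost (ioc here) is the one the mount point will use.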
Regards,
Amar
On Thu, Mar 6, 2008 at 4:58 AM, Jordi Moles <jordi at cdmon.com> wrote:
> Hi,
>
> I want to report back the performance issues I've had so far with
> GlusterFS mainline 2.5, patch 690 and fuse-2.7.2glfs8.
>
> I'm setting up a mail system, which is all run by Xen 3.2.0, and every
> "actual" piece of the mail system is a Xen virtual machine.
>
> Anyway... the virtual machines accessing GlusterFS are 6 Dovecot and 4
> Postfix servers. There are also 6 nodes, each of which shares its own disk to
> the Gluster filesystem. Two of the nodes share 2 disks each: one for
> GlusterFS, and the other for the namespace.
>
> These are the conf files:
>
> ****nodes with namespace****
>
> volume esp
> type storage/posix
> option directory /mnt/compartit
> end-volume
>
> volume espa
> type features/posix-locks
> subvolumes esp
> end-volume
>
> volume espai
> type performance/io-threads
> option thread-count 15
> option cache-size 512MB
> subvolumes espa
> end-volume
>
> volume nm
> type storage/posix
> option directory /mnt/namespace
> end-volume
>
> volume ultim
> type protocol/server
> subvolumes espai nm
> option transport-type tcp/server
> option auth.ip.espai.allow *
> option auth.ip.nm.allow *
> end-volume
>
> *************
>
>
> ***nodes without namespace*****
>
> volume esp
> type storage/posix
> option directory /mnt/compartit
> end-volume
>
> volume espa
> type features/posix-locks
> subvolumes esp
> end-volume
>
> volume espai
> type performance/io-threads
> option thread-count 15
> option cache-size 512MB
> subvolumes espa
> end-volume
>
> volume ultim
> type protocol/server
> subvolumes espai
> option transport-type tcp/server
> option auth.ip.espai.allow *
> end-volume
>
> *****************************
>
>
> ***clients****
>
> volume espai1
> type protocol/client
> option transport-type tcp/client
> option remote-host 192.168.1.204
> option remote-subvolume espai
> end-volume
>
> volume espai2
> type protocol/client
> option transport-type tcp/client
> option remote-host 192.168.1.205
> option remote-subvolume espai
> end-volume
>
> volume espai3
> type protocol/client
> option transport-type tcp/client
> option remote-host 192.168.1.206
> option remote-subvolume espai
> end-volume
>
> volume espai4
> type protocol/client
> option transport-type tcp/client
> option remote-host 192.168.1.207
> option remote-subvolume espai
> end-volume
>
> volume espai5
> type protocol/client
> option transport-type tcp/client
> option remote-host 192.168.1.213
> option remote-subvolume espai
> end-volume
>
> volume espai6
> type protocol/client
> option transport-type tcp/client
> option remote-host 192.168.1.214
> option remote-subvolume espai
> end-volume
>
> volume namespace1
> type protocol/client
> option transport-type tcp/client
> option remote-host 192.168.1.204
> option remote-subvolume nm
> end-volume
>
> volume namespace2
> type protocol/client
> option transport-type tcp/client
> option remote-host 192.168.1.205
> option remote-subvolume nm
> end-volume
>
> volume grup1
> type cluster/afr
> subvolumes espai1 espai2
> end-volume
>
> volume grup2
> type cluster/afr
> subvolumes espai3 espai4
> end-volume
>
> volume grup3
> type cluster/afr
> subvolumes espai5 espai6
> end-volume
>
> volume nm
> type cluster/afr
> subvolumes namespace1 namespace2
> end-volume
>
> volume ultim
> type cluster/unify
> subvolumes grup1 grup2 grup3
> option scheduler rr
> option namespace nm
> end-volume
>
> ************
>
> The thing is that with earlier patches, the whole system used to hang,
> with many different error messages.
>
> Right now, it's been up for days without any hang at all, but I'm facing
> serious performance issues.
>
> Just running an "ls" command can take around 3 seconds to show
> anything when the system is "under load". It doesn't happen at all when
> there's no activity, so I don't think it has anything to do with Xen. Well,
> actually, "under load" can mean 3 mails arriving per second. I'm
> monitoring everything, and no virtual machine is using more than about 20% of
> CPU.
>
> At first I had the log level on both nodes and clients set to DEBUG, but now
> it is just WARNING, and I've restarted everything many times.
>
> I was advised to use "type performance/io-threads" on the node side.
> It actually helped: before that, it wasn't 3 seconds, but 5 or more.
> I've tried different values for "thread-count" and also for
> "cache-size".
>
> The system is supposed to handle a large amount of traffic, far more than
> 3 mails a second.
>
> What do you think about the whole setup? Should I keep using a namespace?
> Should I use dedicated nodes for the namespace? Should I use different values
> for io-threads?
>
> One last thing... I'm using ReiserFS on the "storage devices" that the nodes
> share. Should I be using XFS or something else?
>
> The logs don't show any kind of error now... I don't have a clue about what
> is failing.
>
> I would appreciate any ideas you could give.
>
> Thank you.
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>
--
Amar Tumballi
Gluster/GlusterFS Hacker
[bulde on #gluster/irc.gnu.org]
http://www.zresearch.com - Commoditizing Supercomputing and Superstorage!