[Gluster-devel] Performance question.

Chris Johnson johnson at nmr.mgh.harvard.edu
Wed Nov 21 17:07:44 UTC 2007


On Wed, 21 Nov 2007, Chris Johnson wrote:

OK, io-cache and write-behind have been moved to the client side.  There is
some improvement:

                                                             random  random    bkwd  record  stride
               KB  reclen   write rewrite    read    reread    read   write    read rewrite    read   fwrite frewrite   fread  freread
           131072      32     312     312      361      363    1453     322     677     320     753      312      312     369      363

but as you can see the gain is marginal.  (With -N, iozone reports
microseconds per operation, so lower is better.)  Is this typical, i.e. being
an order of magnitude slower than NFS?
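
For reference, here is a sketch of roughly what the revised client volfile
looks like with write-behind and io-cache moved over.  The stacking order
shown (write-behind directly above protocol/client, io-cache on top, and the
topmost volume being what gets mounted) is just one plausible arrangement,
and the 256MB cache-size is simply the figure avati suggested below, not a
tuned value:

============  client config with client-side caching (sketch)

volume client
   type protocol/client
   option transport-type tcp/client
   option remote-host xxx.xxx.xxx.xxx
   option remote-subvolume readahead
end-volume

volume writebehind
   type performance/write-behind
   option aggregate-size 131072 # in bytes
   subvolumes client
end-volume

volume iocache
   type performance/io-cache
   option cache-size 256MB # avati's suggested figure; size to spare RAM
   subvolumes writebehind
end-volume

(The volume names above are illustrative; the mount simply uses whatever the
topmost volume is.)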

> On Wed, 21 Nov 2007, Anand Avati wrote:
>
>     See, I asked if there was a philosophy about how to build a stack.
> Never got a response until now.
>
>     I don't believe caching will help in the real application.
> Mostly it's read, crunch, write.  If I'm wrong here please let me
> know, although I don't believe it will hurt.  I'll try moving
> write-behind and io-cache to the client and see what happens.  Does it
> matter how they're stacked, i.e. which comes first?
>
>> You should also be loading io-cache on the client side with a decent
>> cache-size (like 256MB? depends on how much RAM you have to spare).  This
>> will improve re-read performance a lot.
>> 
>> avati
>> 
>> 2007/11/21, Anand Avati <avati at zresearch.com>:
>>> 
>>> Chris,
>>>  you should really be loading write-behind on the client side; that is what
>>> improves write performance the most.  Do let us know the results with
>>> write-behind on the client side.
>>> 
>>> avati
>>> 
>>> 2007/11/21, Chris Johnson <johnson at nmr.mgh.harvard.edu>:
>>>> 
>>>>       Hi, again,
>>>> 
>>>>       I asked about stack-building philosophy.  Apparently there isn't
>>>> one.  So I tried a few things.  The configs are down at the end here.
>>>> 
>>>>       Two systems, both CentOS 5, running the Gluster-enhanced
>>>> fuse-devel-2.7.0-1 and glusterfs-1.3.5-2.  Both have gigabit Ethernet;
>>>> the server runs a SATABeast.  Currently I get the following from iozone.
>>>> 
>>>> iozone -aN -r 32k -s 131072k -f /mnt/glusterfs/sdm1/junknstuff
>>>> 
>>>> 
>>>>                                                              random  random    bkwd  record  stride
>>>>                KB  reclen   write rewrite    read    reread    read   write    read rewrite    read   fwrite frewrite   fread  freread
>>>>            131072      32     589     587      345      343     818     621     757     624     845      592      591     346      366
>>>> 
>>>> Now, a similar test using NFS on a CentOS 4.4 system running a 3ware
>>>> RAID card gives this:
>>>> 
>>>> iozone -aN -r 32k -s 131072k -f /space/sake/5/admin/junknstuff
>>>> 
>>>> 
>>>>                                                              random  random    bkwd  record  stride
>>>>                KB  reclen   write rewrite    read    reread    read   write    read rewrite    read   fwrite frewrite   fread  freread
>>>>            131072      32      27      26     292      11      11      24     542       9     539       30       28     295      11
>>>> 
>>>> As you can see, the NFS system is faster.  Is this because of the
>>>> hardware 3ware RAID, or is NFS really that much faster here?  Is there
>>>> a better way to stack this that would improve things?  I also tried with
>>>> and without striping; no noticeable difference in Gluster performance.
>>>> 
>>>>       Help appreciated.
>>>> 
>>>> ============  server config
>>>> 
>>>> volume brick1
>>>>    type storage/posix
>>>>    option directory /home/sdm1
>>>> end-volume
>>>> 
>>>> volume brick2
>>>>    type storage/posix
>>>>    option directory /home/sdl1
>>>> end-volume
>>>> 
>>>> volume brick3
>>>>    type storage/posix
>>>>    option directory /home/sdk1
>>>> end-volume
>>>> 
>>>> # NOTE: brick4 and ns-brick below reuse brick3's /home/sdk1 directory --
>>>> # almost certainly a copy-paste slip; each brick, and especially the
>>>> # unify namespace, should get its own directory.
>>>> volume brick4
>>>>    type storage/posix
>>>>    option directory /home/sdk1
>>>> end-volume
>>>> 
>>>> volume ns-brick
>>>>    type storage/posix
>>>>    option directory /home/sdk1
>>>> end-volume
>>>> 
>>>> volume stripe1
>>>>   type cluster/stripe
>>>>   subvolumes brick1 brick2
>>>> # option block-size *:10KB,
>>>> end-volume
>>>> 
>>>> volume stripe2
>>>>   type cluster/stripe
>>>>   subvolumes brick3 brick4
>>>> # option block-size *:10KB,
>>>> end-volume
>>>> 
>>>> volume unify0
>>>>   type cluster/unify
>>>>   subvolumes stripe1 stripe2
>>>>   option namespace ns-brick
>>>>   option scheduler rr
>>>> # option rr.limits.min-disk-free 5
>>>> end-volume
>>>> 
>>>> volume iot
>>>>   type performance/io-threads
>>>>   subvolumes unify0
>>>>   option thread-count 8
>>>> end-volume
>>>> 
>>>> volume writebehind
>>>>    type performance/write-behind
>>>>    option aggregate-size 131072 # in bytes
>>>>    subvolumes iot
>>>> end-volume
>>>> 
>>>> volume readahead
>>>>    type performance/read-ahead
>>>> #  option page-size 65536 ### in bytes
>>>>    option page-size 128kb
>>>> #  option page-count 16
>>>>    option page-count 2 ### memory cache size is page-count x page-size per file
>>>>    subvolumes writebehind
>>>> end-volume
>>>> 
>>>> volume server
>>>>    type protocol/server
>>>>    subvolumes readahead
>>>>    option transport-type tcp/server     # For TCP/IP transport
>>>> #  option client-volume-filename /etc/glusterfs/glusterfs-client.vol
>>>>    option auth.ip.readahead.allow *
>>>> end-volume
>>>> 
>>>> 
>>>> ============  client config
>>>> 
>>>> volume client
>>>>    type protocol/client
>>>>    option transport-type tcp/client
>>>>    option remote-host xxx.xxx.xxx.xxx
>>>>    option remote-subvolume readahead
>>>> end-volume
>>>> 
>>
>> -- 
>> It always takes longer than you expect, even when you take into account
>> Hofstadter's Law.
>> 
>> -- Hofstadter's Law
>> 
>

-------------------------------------------------------------------------------
Chris Johnson               |Internet: johnson at nmr.mgh.harvard.edu
Systems Administrator       |Web:      http://www.nmr.mgh.harvard.edu/~johnson
NMR Center                  |Voice:    617.726.0949
Mass. General Hospital      |FAX:      617.726.7422
149 (2301) 13th Street      |Fifty percent of all doctors graduated in the
Charlestown, MA., 02129 USA |lower half of the class.  Observation
-------------------------------------------------------------------------------