[Gluster-devel] Performance question.
Chris Johnson
johnson at nmr.mgh.harvard.edu
Wed Nov 21 16:29:26 UTC 2007
On Wed, 21 Nov 2007, Anand Avati wrote:
See, I asked if there was a philosophy about how to build a stack.
Never got a response until now.
Caching won't help in the real application, I don't believe.
Mostly it's read, crunch, write. If I'm wrong here please let me
know. I don't believe it will hurt, though. I'll try moving
write-behind and io-cache to the client and see what happens. Does it
matter how they're stacked, i.e., which comes first?
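
For concreteness, here's roughly the client-side stack I have in mind
(a sketch only; the volume names and the remote-subvolume are guesses,
and the sizes are the ones mentioned in this thread):

volume client
  type protocol/client
  option transport-type tcp/client
  option remote-host xxx.xxx.xxx.xxx   # the server, as in the client config below
  option remote-subvolume iot          # assumes the server exports its io-threads volume
end-volume

volume writebehind
  type performance/write-behind
  option aggregate-size 131072         # bytes, same value as the server config below
  subvolumes client
end-volume

volume iocache
  type performance/io-cache
  option cache-size 256MB              # the size suggested below
  subvolumes writebehind
end-volume

As I understand it, the topmost volume (iocache here) is the one that
gets mounted, so reads hit io-cache first and write-behind sits just
above the network.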
> You should also be loading io-cache on the client side with a decent
> cache-size (like 256MB? depends on how much RAM you have to spare). This
> will improve re-read performance a lot.
>
> avati
>
> 2007/11/21, Anand Avati <avati at zresearch.com>:
>>
>> Chris,
>> you should really be loading write-behind on the client side; that is what
>> improves write performance the most. Do let us know the results with
>> write-behind on the client side.
>>
>> avati
>>
>> 2007/11/21, Chris Johnson <johnson at nmr.mgh.harvard.edu>:
>>>
>>> Hi, again,
>>>
>>> I asked about stack-building philosophy. Apparently there isn't
>>> one. So I tried a few things. The configs are down at the end here.
>>>
>>> Two systems, both CentOS5, running the gluster-enhanced
>>> fuse-devel-2.7.0-1 and glusterfs-1.3.5-2. Both have gigabit ethernet;
>>> the server runs a SATABeast. Currently I get the following from
>>> iozone (note that -N reports results in microseconds per operation,
>>> so lower is better):
>>>
>>> iozone -aN -r 32k -s 131072k -f /mnt/glusterfs/sdm1/junknstuff
>>>
>>>
>>>                                                       random   random     bkwd   record   stride
>>>      KB  reclen    write  rewrite     read   reread     read    write     read  rewrite     read   fwrite frewrite    fread  freread
>>>  131072      32      589      587      345      343      818      621      757      624      845      592      591      346      366
>>>
>>> Now, a similar test using NFS on a CentOS4.4 system running a 3ware
>>> RAID card gives this:
>>>
>>> iozone -aN -r 32k -s 131072k -f /space/sake/5/admin/junknstuff
>>>
>>>
>>>                                                       random   random     bkwd   record   stride
>>>      KB  reclen    write  rewrite     read   reread     read    write     read  rewrite     read   fwrite frewrite    fread  freread
>>>  131072      32       27       26      292       11       11       24      542        9      539       30       28      295       11
>>>
>>> And you can see that the NFS system is faster. Is this because of
>>> the hardware 3ware RAID, or is NFS really that much faster here? Is
>>> there a better way to stack this that would improve things? I also
>>> tried with and without striping; no noticeable difference in gluster
>>> performance.
>>>
>>> Help appreciated.
>>>
>>> ============ server config
>>>
>>> volume brick1
>>> type storage/posix
>>> option directory /home/sdm1
>>> end-volume
>>>
>>> volume brick2
>>> type storage/posix
>>> option directory /home/sdl1
>>> end-volume
>>>
>>> volume brick3
>>> type storage/posix
>>> option directory /home/sdk1
>>> end-volume
>>>
>>> volume brick4
>>> type storage/posix
>>> option directory /home/sdk1
>>> end-volume
>>>
>>> volume ns-brick
>>> type storage/posix
>>> option directory /home/sdk1
>>> end-volume
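>>>
>>> # (note that brick3, brick4, and ns-brick above all share /home/sdk1;
>>> # the unify namespace at least should probably get its own directory)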
>>>
>>> volume stripe1
>>> type cluster/stripe
>>> subvolumes brick1 brick2
>>> # option block-size *:10KB,
>>> end-volume
>>>
>>> volume stripe2
>>> type cluster/stripe
>>> subvolumes brick3 brick4
>>> # option block-size *:10KB,
>>> end-volume
>>>
>>> volume unify0
>>> type cluster/unify
>>> subvolumes stripe1 stripe2
>>> option namespace ns-brick
>>> option scheduler rr
>>> # option rr.limits.min-disk-free 5
>>> end-volume
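>>>
>>> # (so new files round-robin between stripe1 and stripe2, and each
>>> # file's data is then striped across that pair of bricks)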
>>>
>>> volume iot
>>> type performance/io-threads
>>> subvolumes unify0
>>> option thread-count 8
>>> end-volume
>>>
>>> volume writebehind
>>> type performance/write-behind
>>> option aggregate-size 131072 # in bytes
>>> subvolumes iot
>>> end-volume
>>>
>>> volume readahead
>>> type performance/read-ahead
>>> # option page-size 65536   ### in bytes
>>> option page-size 128kb     ### in bytes
>>> # option page-count 16     ### memory cache is page-count x page-size per file
>>> option page-count 2        ### memory cache is page-count x page-size per file
>>> subvolumes writebehind
>>> end-volume
>>>
>>> volume server
>>> type protocol/server
>>> subvolumes readahead
>>> option transport-type tcp/server # For TCP/IP transport
>>> # option client-volume-filename /etc/glusterfs/glusterfs-client.vol
>>> option auth.ip.readahead.allow *
>>> end-volume
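>>>
>>> If the performance translators really belong on the client, I assume
>>> the server side could shrink to export iot directly, something like
>>> this (untested sketch; the auth line just follows the
>>> auth.ip.<volume>.allow pattern used above):
>>>
>>> volume server
>>>   type protocol/server
>>>   subvolumes iot
>>>   option transport-type tcp/server
>>>   option auth.ip.iot.allow *
>>> end-volume
>>>
>>> The client's remote-subvolume would then be "iot" instead of
>>> "readahead".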
>>>
>>>
>>> ============ client config
>>>
>>> volume client
>>> type protocol/client
>>> option transport-type tcp/client
>>> option remote-host xxx.xxx.xxx.xxx
>>> option remote-subvolume readahead
>>> end-volume
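>>>
>>> (For reference, the volume is mounted with the usual client command;
>>> the spec-file path here is a guess:
>>>
>>> glusterfs -f /etc/glusterfs/glusterfs-client.vol /mnt/glusterfs
>>>
>>> and iozone then runs against that mount point.)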
>>>
>>
>>
>>
>> --
>> It always takes longer than you expect, even when you take into account
>> Hofstadter's Law.
>>
>> -- Hofstadter's Law
-------------------------------------------------------------------------------
Chris Johnson |Internet: johnson at nmr.mgh.harvard.edu
Systems Administrator |Web: http://www.nmr.mgh.harvard.edu/~johnson
NMR Center |Voice: 617.726.0949
Mass. General Hospital |FAX: 617.726.7422
149 (2301) 13th Street |For all sad words of tongue or pen, the saddest
Charlestown, MA., 02129 USA |are these: "It might have been". John G. Whittier
-------------------------------------------------------------------------------