[Gluster-users] Performance

Mohit Anchlia mohitanchlia at gmail.com
Wed Apr 20 18:54:18 UTC 2011


Thanks again! But what I don't understand is that I have 3 x 2 servers,
so I would expect at least 20 x 3 = 60 MB/s in total. My load is getting
spread across the 3 x 2 servers in a distributed replica. If I were using
just one gluster server I would understand, but with 6 it makes no
sense...
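Unless that 20 x 3 math only holds in aggregate, with several files in
flight at once, since a single file in a distributed replica is served by
just one replica pair? A quick way to test that from the client side
(the /mnt/gluster path below is only a stand-in for the actual fuse mount
point; drop oflag=direct if the mount rejects it):

# write three files in parallel so they can hash to different replica pairs
for i in 1 2 3; do
  dd if=/dev/zero of=/mnt/gluster/big.file.$i bs=128k count=20k oflag=direct &
done
wait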

On Wed, Apr 20, 2011 at 11:47 AM, Joe Landman
<landman at scalableinformatics.com> wrote:
> On 04/20/2011 02:29 PM, Mohit Anchlia wrote:
>>
>> Please find
>>
>>
>> [root at dsdb1 ~]# cat /proc/sys/vm/drop_caches
>> 3
>> [root at dsdb1 ~]# dd if=/dev/zero of=/data/big.file bs=128k count=80k
>> oflag=direct
>>
>> 81920+0 records in
>> 81920+0 records out
>> 10737418240 bytes (11 GB) copied, 521.553 seconds, 20.6 MB/s
>
> Suddenly this makes a great deal more sense.
>
>> [root at dsdb1 ~]#
>> [root at dsdb1 ~]# dd if=/dev/zero of=/data/big.file bs=128k count=80k
>> iflag=direct
>> dd: opening `/dev/zero': Invalid argument
>> [root at dsdb1 ~]# dd of=/dev/null if=/data/big.file bs=128k iflag=direct
>> 81920+0 records in
>> 81920+0 records out
>> 10737418240 bytes (11 GB) copied, 37.854 seconds, 284 MB/s
>> [root at dsdb1 ~]#
>
> About what I expected.
>
> Ok.  Uncached OS writes get you 20 MB/s, which is about what you are seeing
> with the fuse mount and a dd.  So I think we understand the write side.
>
> The read side is about where I expected (lower actually, but not by enough
> that I am concerned).
>
> You can try changing bs=2M count=6k on both to see the effect of larger
> blocks.  You should get some improvement.
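
Spelled out for the same paths as above (count=6k makes the file about
13 GB), the larger-block runs would look roughly like:

dd if=/dev/zero of=/data/big.file bs=2M count=6k oflag=direct
dd of=/dev/null if=/data/big.file bs=2M iflag=direct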
>
> I think we need to dig into the details of that RAID0 construction now.
>  This might be something better done offlist (unless everyone wants to see
> the gory details of digging into the hardware side).
>
> My current thought is that this is a hardware issue, and not a gluster issue
> per se, but that there are possibilities for improving performance on the
> gluster side of the equation.
>
> Short version:  PERC is not fast (never has been), and it is often a bad
> choice for high performance.  You are often better off building an MD RAID
> using the software tools in Linux; it will be faster.  Think of PERC as an
> HBA with some modicum of built-in RAID capability.  You don't really want to
> use that RAID capability if you can avoid it, but you do want to use the HBA.
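
As a rough illustration of the MD route (device names and chunk size are
placeholders, and the PERC would first have to present the disks
individually, e.g. as single-drive volumes or JBOD):

mdadm --create /dev/md0 --level=0 --raid-devices=4 --chunk=256 /dev/sd[b-e]
mkfs.xfs /dev/md0      # or whatever filesystem the bricks use today
mount /dev/md0 /data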
>
> Longer version:  Likely a striping issue, or a caching issue (need to see
> battery state, cache size, etc.), not to mention the slow chip.  Are the
> disk write caches off or on?  (Guessing off, which is the right thing to do
> for some workloads, but it does impact performance.)  Also, the RAID CPU in
> PERC (it's a rebadged LSI) is very low performance in general, and
> specifically not terribly good even at RAID0.  These are direct writes,
> skipping the OS cache.  They let you see how fast the underlying hardware
> is, and whether it can handle the amount of data you want to shove onto the
> disks.
>
> Here is my desktop:
>
> root at metal:/local2/home/landman# dd if=/dev/zero of=/local2/big.file bs=128k
> count=80k oflag=direct
> 81920+0 records in
> 81920+0 records out
> 10737418240 bytes (11 GB) copied, 64.7407 s, 166 MB/s
>
> root at metal:/local2/home/landman# dd if=/dev/zero of=/local2/big.file bs=2M
> count=6k oflag=direct
> 6144+0 records in
> 6144+0 records out
> 12884901888 bytes (13 GB) copied, 86.0184 s, 150 MB/s
>
>
>
> and a server in the lab
>
> [root at jr5-1 ~]# dd if=/dev/zero of=/data/big.file bs=128k count=80k
> oflag=direct
> 81920+0 records in
> 81920+0 records out
> 10737418240 bytes (11 GB) copied, 11.0948 seconds, 968 MB/s
>
> [root at jr5-1 ~]# dd if=/dev/zero of=/data/big.file bs=2M count=6k
> oflag=direct
> 6144+0 records in
> 6144+0 records out
> 12884901888 bytes (13 GB) copied, 5.11935 seconds, 2.5 GB/s
>
>
> Gluster will not be faster than the bare metal (silicon).  It may hide some
> of the issues with caching, but it is bounded by how fast you can push bits
> to, or pull bits from, the media.
>
> In an "optimal" config, the 4x SAS 10k RPM drives should be able to sustain
> ~600 MB/s write.  Reality will be less than this, guessing 250-400 MB/s in
> most cases.  This is still pretty low in performance.
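
(Presumably the math there is four spindles times roughly 150 MB/s of
sequential bandwidth per 10k RPM SAS drive: 4 x ~150 MB/s ≈ 600 MB/s for an
ideal RAID0 stripe.)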
>
> --
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics Inc.
> email: landman at scalableinformatics.com
> web  : http://scalableinformatics.com
>       http://scalableinformatics.com/sicluster
> phone: +1 734 786 8423 x121
> fax  : +1 866 888 3112
> cell : +1 734 612 4615
>
