[Gluster-users] Performance

Tue Apr 26 23:31:26 UTC 2011

On 04/26/2011 05:48 PM, Mohit Anchlia wrote:
> I am not sure how valid this performance url is
>
> http://www.gluster.com/community/documentation/index.php/Guide_to_Optimizing_GlusterFS
>
> Does it make sense to separate out the journal and create mkfs -I 256?
>
> Also, if I already have a file system on a different partition can I
> still use it to store journal from other partition without corrupting
> the file system?

Journals are small and write-heavy.  You really want a raw device for
them.  You do not want file system caching underneath them.

A raw partition for an external journal is best.  Also, understand that
ext* suffers badly under intense parallel loads.  Keep that in mind as
you make your file system choice.
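
A minimal sketch of the raw-partition approach with ext4. The device names (/dev/sdX1 for the journal, /dev/sdY1 for the data partition) are hypothetical placeholders, not from this thread, and the 4096-byte block size is an assumption; the journal device and the file system have to use the same block size. The -I 256 inode size is the value from the question above. The script only prints the commands, since running them for real wipes both partitions.

```shell
#!/bin/sh
# Hypothetical device names; substitute your own. Both are wiped if the
# printed commands are run for real.
JOURNAL_DEV="${JOURNAL_DEV:-/dev/sdX1}"   # small raw partition, no file system on it
DATA_DEV="${DATA_DEV:-/dev/sdY1}"         # partition that will hold the data
BLOCK_SIZE=4096                           # assumed; must match on both devices

# Print the two commands instead of running them (dry run).
journal_cmds() {
    # 1. Turn the raw partition into a dedicated external journal device.
    echo "mke2fs -O journal_dev -b $BLOCK_SIZE $1"
    # 2. Create the data file system with 256-byte inodes, journaling to it.
    echo "mkfs.ext4 -b $BLOCK_SIZE -I 256 -J device=$1 $2"
}

journal_cmds "$JOURNAL_DEV" "$DATA_DEV"
```

Because the first command reformats the journal partition, a partition that already holds a file system cannot double as the journal device without losing that file system.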

>
> On Thu, Apr 21, 2011 at 7:23 PM, Joe Landman
> <landman at scalableinformatics.com>  wrote:
>> On 04/21/2011 08:49 PM, Mohit Anchlia wrote:
>>>
>>> After a lot of digging today I finally figured out that it's not really
>>> using PERC controller but some Fusion MPT. Then it wasn't clear which
>>
>> PERC is a rebadged LSI based on the 1068E chip.
>>
>>> tool it supports. Finally I installed lsiutil and was able to change
>>> the cache size.
>>>
>>> [root@dsdb1 ~]# lspci|grep LSI
>>> 02:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 08)
>>
>>   This looks like PERC.  These are roughly equivalent to the LSI 3081 series.
>>   These are not fast units.  There is a variant of this that does RAID6; it's
>> usually available as a software update or plugin module (button?) to this.
>>   I might be thinking of the 1078 chip though.
>>
>>   Regardless, these are fairly old designs.
>>
>>
>>> [root@dsdb1 ~]# dd if=/dev/zero of=/data/big.file bs=128k count=40k oflag=direct
>>> 1024+0 records in
>>> 1024+0 records out
>>> 134217728 bytes (134 MB) copied, 0.742517 seconds, 181 MB/s
>>>
>>> I compared this with SW RAID mdadm that I created yesterday on one of
>>> the servers and I get around 300MB/s. I will test out first with what
>>> we have before destroying and testing with mdadm.
>>
>> So the software RAID is giving you 300 MB/s and the hardware 'RAID' is
>> giving you ~181 MB/s?  Seems a pretty simple choice :)
>>
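
As a quick sanity check on the dd figures quoted above, the arithmetic can be redone in the shell. All three input values are copied from the quoted output; nothing else is assumed. Note that 134217728 bytes at bs=128k is 1024 records, not the 40960 records that count=40k would write, so the output shown appears to come from a shorter run than the command line as pasted.

```shell
#!/bin/sh
# Figures copied from the dd output quoted above.
BYTES=134217728
SECONDS_TAKEN=0.742517
BLOCK=$((128 * 1024))          # bs=128k

# Number of 128 KiB records actually written.
records=$((BYTES / BLOCK))

# Throughput in decimal MB/s, the unit dd reports.
mbps=$(awk -v b="$BYTES" -v s="$SECONDS_TAKEN" 'BEGIN { printf "%.0f", b / s / 1e6 }')

echo "records written: $records"
echo "throughput: $mbps MB/s"
```

A 128 MiB direct-I/O write finishes in under a second, so a longer run is likely to give a steadier number to compare against the 300 MB/s software RAID result.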
>> BTW: The 300MB/s could also be a limitation of the PCIe channel interconnect
>> (or worse, if they hung the chip off a PCIx bridge).  The motherboard
>> vendors are generally loath to put more than a few PCIe lanes for handling
>> SATA, Networking, etc.  So typically you wind up with very low powered
>> 'RAID' and 'SATA/SAS' on the motherboard, connected by PCIe x2 or x4 at
>> most.  A number of motherboards have NICs that are served by a single PCIe
>> x1 link.
>>
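
To put the lane counts above into numbers, here is a rough sketch of nominal link bandwidth. The per-lane rates (250 MB/s for PCIe 1.x, 500 MB/s for PCIe 2.0, per direction after 8b/10b encoding) are the standard nominal figures and are not from this thread; which generation a given motherboard uses is also an assumption. Real throughput is lower because of protocol overhead.

```shell
#!/bin/sh
# Nominal per-lane, per-direction rates in MB/s (assumed figures, not from
# the thread): PCIe 1.x = 250, PCIe 2.0 = 500.
lane_rate() {
    case "$1" in
        1) echo 250 ;;
        2) echo 500 ;;
        *) echo "unknown PCIe generation: $1" >&2; return 1 ;;
    esac
}

# link_mbps GEN LANES -> nominal link bandwidth in MB/s
link_mbps() {
    rate=$(lane_rate "$1") || return 1
    echo $((rate * $2))
}

for gen in 1 2; do
    for lanes in 1 2 4; do
        echo "PCIe gen$gen x$lanes: $(link_mbps "$gen" "$lanes") MB/s nominal"
    done
done
```

On these nominal numbers a gen1 x1 link could not carry 300 MB/s at all, and a gen1 x2 link (500 MB/s before overhead) would not sit far above it.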
>>> Thanks for your help that led me to this path. Another question I had
>>> was when creating mdadm RAID does it make sense to use multipathing?
>>
>> Well, for a shared backend over a fabric, I'd say possibly.  For an internally
>> connected set, I'd say no.  Given what you are doing with Gluster, I'd say
>> that the additional expense/pain of setting up a multipath scenario probably
>> isn't worth it.
>>
>> Gluster lets you get many of these benefits at a higher level in the stack,
>> which to a degree, and in some use cases, obviates the need for
>> multipathing at a lower level.  I'd still suggest real RAID at the lower
>> level (RAID6, and sometimes RAID10 make the most sense) for the backing
>> store.
>>
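
A small sketch of what the RAID6 versus RAID10 choice costs in usable capacity, together with the matching mdadm invocation. The disk names and the count of six are hypothetical, the RAID10 figure assumes an even number of disks with two copies of every block, and the mdadm commands are only printed, not run.

```shell
#!/bin/sh
# usable_disks LEVEL N -> how many disks' worth of capacity remain usable.
# Assumes N >= 4 and, for RAID10, an even N with two copies of each block.
usable_disks() {
    lvl=$1
    n=$2
    if [ "$n" -lt 4 ]; then
        echo "need at least 4 disks for this sketch" >&2
        return 1
    fi
    case "$lvl" in
        6)  echo $((n - 2)) ;;   # two disks' worth of parity
        10) echo $((n / 2)) ;;   # every block stored twice
        *)  echo "unsupported level: $lvl" >&2; return 1 ;;
    esac
}

# Hypothetical member disks; substitute your own.
DISKS="/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg"
N=6

for level in 6 10; do
    echo "RAID$level over $N disks: $(usable_disks "$level" "$N") disks usable"
    echo "  mdadm --create /dev/md0 --level=$level --raid-devices=$N $DISKS"
done
```

With six disks RAID6 keeps four disks of capacity and RAID10 keeps three, which is the usual trade against RAID10's cheaper small writes.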

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615