[Gluster-users] Performance

Liam Slusser lslusser at gmail.com
Fri Aug 14 01:04:08 UTC 2009


XFS has been around since 1994 - originally written by SGI - and is one of
the oldest journaling filesystems.  It has been in the Linux source tree
since 2.4 and is very stable.  It supports a max volume size of 16 exabytes,
whereas ext3/4 runs out at 8 TB, I believe.  I've never had one of my XFS
filesystems need recovering, and I use it on a bunch of larger arrays that
are too large for ext3.
Just make sure you're using 64-bit Linux and mount the filesystem with the
inode64 option so you don't run out of inodes.
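
For example, an /etc/fstab entry along these lines would do it (the device
and mount point below are just placeholder names for whatever your array
actually is):

    # mount the XFS array with 64-bit inode numbers
    # /dev/sdb1 and /export/brick1 are example names
    /dev/sdb1  /export/brick1  xfs  inode64,noatime  0  0

The noatime option is optional, but it usually helps on big arrays like
these.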

liam

On Thu, Aug 13, 2009 at 1:31 AM, Hiren Joshi <josh at moonfruit.com> wrote:

> What are the advantages of XFS over ext3 (which I'm currently using)? My
> fear with XFS when selecting a filesystem was that it's not as actively
> developed or as well supported as ext3 - and if things go wrong, how easy
> would it be to recover?
>
> I have 6 x 1TB disks in a hardware raid 6 with battery backup and UPS; it's
> now just the performance I need to get sorted...
>
>  ------------------------------
> From: Liam Slusser [mailto:lslusser at gmail.com]
> Sent: 12 August 2009 20:35
> To: Mark Mielke
> Cc: Hiren Joshi; gluster-users at gluster.org
> Subject: Re: [Gluster-users] Performance
>
>
> I had a similar situation.  My larger gluster cluster has two nodes, but
> each node has 72 1.5TB hard drives.  I ended up creating three 30TB, 24-drive
> raid6 arrays, formatted them with XFS and 64-bit inodes, and then exported
> three bricks with gluster.  I would recommend using a hardware raid
> controller with battery backup, UPS power, and a journaled filesystem, and I
> think you'll be fine.
>
> I'm exporting the three bricks on each of my two nodes; the clients use
> replicate to mirror each of the three bricks across the two servers, and
> then distribute to tie it all together.
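>
> Roughly, the client-side volfile for that layout looks like the sketch
> below.  Host, brick, and volume names are placeholders, and only the first
> of the three replicate pairs is spelled out:
>
>     volume node1-brick1
>       type protocol/client
>       option transport-type tcp
>       # example hostname
>       option remote-host node1
>       option remote-subvolume brick1
>     end-volume
>
>     volume node2-brick1
>       type protocol/client
>       option transport-type tcp
>       option remote-host node2
>       option remote-subvolume brick1
>     end-volume
>
>     # mirror brick1 across both nodes
>     volume rep1
>       type cluster/replicate
>       subvolumes node1-brick1 node2-brick1
>     end-volume
>
>     # rep2 and rep3 are built the same way from bricks 2 and 3
>
>     # hash files across the three mirrored pairs
>     volume dist
>       type cluster/distribute
>       subvolumes rep1 rep2 rep3
>     end-volume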
>
> liam
>
>
> On Wed, Aug 12, 2009 at 10:51 AM, Mark Mielke <mark at mark.mielke.cc> wrote:
>
>> On 08/12/2009 01:24 PM, Hiren Joshi wrote:
>>
>>>> 36 partitions on each server - the word "partition" is ambiguous. Are
>>>> they 36 separate drives? Or multiple partitions on the same drive? If
>>>> multiple partitions on the same drive, this would be a bad idea, as it
>>>> would require the disk head to move back and forth between the
>>>> partitions, significantly increasing the latency, and therefore
>>>> significantly reducing the performance. If each partition is on its own
>>>> drive, you still won't see benefit unless you have many clients
>>>> concurrently changing many different files. In your above case, it's
>>>> touching a single file in sequence, and having a cluster is costing you
>>>> rather than benefiting you.
>>>>
>>>
>>> We went with 36 partitions (on a single raid 6 array) in case we got file
>>> system corruption; it would take less time to fsck a 100G partition than
>>> a 3.6TB one. Would a 3.6TB single disk be better?
>>>
>>
>> Putting 3.6 TB on a single disk sounds like a lot of eggs in one basket.
>> :-)
>>
>> If you are worried about fsck, I would definitely do as the other poster
>> suggested and use a journalled file system. This nearly eliminates the fsck
>> time for most situations, whether you are using 100G partitions or 3.6T
>> partitions. In fact, there are very few reasons not to use a journalled
>> file system these days.
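>>
>> Since you're on ext3 you should already have a journal; a quick way to
>> check (the device name below is just an example) is:
>>
>>     # "has_journal" in the feature list means the journal is present
>>     tune2fs -l /dev/sdb1 | grep has_journal
>>
>>     # and if a partition is still plain ext2, this adds a journal in place
>>     tune2fs -j /dev/sdb1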
>>
>> As for how to deal with data on this partition - the file system is going
>> to have a better chance of placing files close to each other than setting
>> up 36 partitions and having Gluster scatter the files across all of them
>> based on a hash. Personally, I would choose 4 x 1 Tbyte drives over 1 x 3.6
>> Tbyte drive, as this nearly quadruples my bandwidth and, for highly
>> concurrent loads, nearly divides by four the average latency to access
>> files.
>>
>> But, if you already have the 3.6 Tbyte drive, I think the only
>> performance-friendly use would be to partition it based upon access
>> requirements, rather than a hash (random). That is, files that are accessed
>> frequently should be clustered together at the front of a disk, files
>> accessed less frequently could be in the middle, and files accessed
>> infrequently could be at the end. This would be a three-partition disk.
>> Gluster does not do this tiering automatically (as far as I can
>> tell), so it would probably require a software solution on your end. For
>> example, I believe dovecot (IMAP server) allows an "alternative storage"
>> location to be defined, so that infrequently read files can be moved to
>> another disk, and it knows to check the primary storage first, and fall back
>> to the alternative storage after.
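>>
>> As a rough sketch, carving the disk up by access frequency might look like
>> this (the device name and split points are only examples):
>>
>>     # /dev/sdb is an example device; pick boundaries to fit your data
>>     parted /dev/sdb mklabel gpt
>>     # hot files go at the front of the disk, on the faster outer tracks
>>     parted /dev/sdb mkpart hot 0% 30%
>>     parted /dev/sdb mkpart warm 30% 70%
>>     parted /dev/sdb mkpart cold 70% 100%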
>>
>> If you can't break up your storage by access patterns, then I think a 3.6
>> Tbyte file system might still be the next best option - it's still better
>> than 36 partitions. But make sure you have a good file system on it, one
>> that scales well to this size.
>>
>>
>> Cheers,
>> mark
>>
>> --
>> Mark Mielke <mark at mielke.cc>
>>
>>
>

