[Gluster-users] Performance

Mark Mielke mark at mark.mielke.cc
Wed Aug 12 17:51:12 UTC 2009


On 08/12/2009 01:24 PM, Hiren Joshi wrote:
>> 36 partitions on each server - the word "partition" is ambiguous. Are
>> they 36 separate drives? Or multiple partitions on the same drive? If
>> multiple partitions on the same drive, this would be a bad idea, as it
>> would require the disk head to move back and forth between the
>> partitions, significantly increasing the latency, and therefore
>> significantly reducing the performance. If each partition is on its own
>> drive, you still won't see benefit unless you have many clients
>> concurrently changing many different files. In your above case, it's
>> touching a single file in sequence, and having a cluster is costing you
>> rather than benefitting you.
>>
>
> We went with 36 partitions (on a single RAID 6 drive) in case we got file
> system corruption; it would take less time to fsck a 100G partition than
> a 3.6TB one. Would a 3.6TB single disk be better?

Putting 3.6 TB on a single disk sounds like a lot of eggs in one basket. :-)

If you are worried about fsck, I would definitely do as the other poster
suggested and use a journalled file system. This nearly eliminates the
fsck time for most situations, whether you use 100G partitions or 3.6T
partitions. In fact, there are very few reasons not to use a journalled
file system these days.

As for how to deal with data on this partition - a single file system
has a better chance of placing related files close to each other than 36
partitions with Gluster scattering the files across all of them based on
a hash. Personally, I would choose 4 x 1 Tbyte drives over 1 x 3.6 Tbyte
drive, as this nearly quadruples my bandwidth and, for highly concurrent
loads, cuts the average latency to access files to nearly a quarter.
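
To make the scattering concrete, here is a minimal sketch of hash-based
placement. It is an illustration only, not Gluster's actual code: the
CRC32 hash, the place() helper and the file names are all assumptions
chosen for the example.

```python
import zlib
from collections import Counter

PARTITIONS = 36  # the 36 partitions discussed in this thread


def place(filename: str, partitions: int = PARTITIONS) -> int:
    """Pick a partition index from a hash of the file name.

    CRC32 is a stand-in hash for this sketch, not the hash Gluster uses.
    """
    return zlib.crc32(filename.encode("utf-8")) % partitions


# 100 hypothetical files that all live in the same directory.
files = [f"mail/inbox/msg{i:03d}.eml" for i in range(100)]

# Count how many of those files each partition receives.
spread = Counter(place(name) for name in files)

print(f"{len(files)} files from one directory landed on "
      f"{len(spread)} of {PARTITIONS} partitions")
```

Files that belong together end up on many different partitions, so when
all the partitions share one physical drive, reading the directory means
seeking back and forth across the disk.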

But, if you already have the 3.6 Tbyte drive, I think the only 
performance-friendly use would be to partition it based upon access 
requirements, rather than a hash (random). That is, files that are 
accessed frequently should be clustered together at the front of a disk, 
files accessed less frequently could be in the middle, and files 
accessed infrequently could be at the end. This would be a three 
partition disk. Gluster does not have a file system that does this 
automatically (that I can tell), so it would probably require a software 
solution on your end. For example, I believe dovecot (IMAP server) 
allows an "alternative storage" location to be defined, so that 
infrequently read files can be moved to another disk, and it knows to 
check the primary storage first, and fall back to the alternative 
storage after.
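
The "check primary first, fall back to alternative" idea could be
sketched roughly as below. This is not how dovecot implements it; the
find_file() and demote() helpers and the directory names are
hypothetical, and the two directories stand in for a fast and a slow
partition.

```python
import shutil
import tempfile
from pathlib import Path


def find_file(name: str, primary: Path, alternate: Path) -> Path:
    """Return the path of `name`, checking primary storage first."""
    for root in (primary, alternate):
        candidate = root / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(name)


def demote(name: str, primary: Path, alternate: Path) -> Path:
    """Move an infrequently read file from primary to alternate storage."""
    src = primary / name
    dst = alternate / name
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dst))
    return dst


# Demo with throwaway directories standing in for the two partitions.
base = Path(tempfile.mkdtemp())
primary, alternate = base / "primary", base / "alternate"
primary.mkdir()
(primary / "old.eml").write_text("rarely read")

demote("old.eml", primary, alternate)
found = find_file("old.eml", primary, alternate)
print(found)  # the copy under the alternate directory
```

Deciding which files count as "infrequently read" (for example by last
access time) is left out here; that policy would be the real work.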

If you can't break up your storage by access patterns, then I think a
3.6 Tbyte file system might still be the next best option - it's still
better than 36 partitions. But make sure you have a good file system on
it, one that scales well to this size.

Cheers,
mark

-- 
Mark Mielke <mark at mielke.cc>


