[Gluster-users] Recommended Stripe Width

Keith Freedman freedman at FreeFormIT.com
Sat Feb 21 07:52:45 UTC 2009


At 11:02 PM 2/20/2009, Jordan Mendler wrote:
>I am prototyping GlusterFS with ~50-60TB of raw disk space across 
>non-raided disks in ~30 compute nodes. I initially separated the 
>nodes into groups of two, and did a replicate across each set of 
>single drives in a pair of servers. Next I did a stripe across the 
>33 resulting AFR groups, with a block size of 1MB and later with the 
>default block size. With these configurations I am only seeing 
>throughput of about 15-25 MB/s, despite a full Gig-E network.
>
>What is generally the recommended configuration in a large striped 
>environment? I am wondering if the number of nodes in the stripe is 
>causing too much overhead, or if the bottleneck is likely somewhere 
>else. In addition, I saw a thread on the list that indicates it is 
>better to replicate across stripes rather than stripe across 
>replicates. Does anyone have any comments or opinion regarding this?

I think that's all guesswork; I'm not sure anyone's done a thorough 
test with Gluster 2.0 on those choices.
Personally, from a data management perspective, I'd rather replicate, 
then stripe, so that I know that each node in a replica pair has 
exactly the same data.  With striping, then replicating, I imagine 
it's possible for the data that's on one node in one stripe set to 
end up spread across 2 nodes in the other stripe set, and this causes 
a problem if you have to take it apart or deal with it later.
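
To make that concern concrete, here is a rough sketch of block 
placement under a simple round-robin model. It is only an 
illustration, not how GlusterFS itself lays out data; the function 
names and node names (a1, x1, y1, ...) are made up for the example. 
In this model the spreading only shows up when the two stripe sets 
have different widths.

```python
from collections import defaultdict


def replicate_then_stripe(num_blocks, pairs):
    """Stripe blocks round-robin across mirrored pairs.

    Both nodes of the chosen pair receive the block.
    """
    placement = defaultdict(set)
    for block in range(num_blocks):
        for node in pairs[block % len(pairs)]:
            placement[node].add(block)
    return placement


def stripe_then_replicate(num_blocks, stripe_sets):
    """Write every block into each stripe set, round-robin within a set."""
    placement = defaultdict(set)
    for block in range(num_blocks):
        for nodes in stripe_sets:
            placement[nodes[block % len(nodes)]].add(block)
    return placement


def overlapping_peers(placement, node):
    """Other nodes that hold at least one block also held by `node`."""
    return sorted(
        other
        for other, blocks in placement.items()
        if other != node and blocks & placement[node]
    )


# Replicate, then stripe: three mirrored pairs (hypothetical names).
rs = replicate_then_stripe(12, [("a1", "a2"), ("b1", "b2"), ("c1", "c2")])
print(overlapping_peers(rs, "a1"))  # ['a2']
print(rs["a1"] == rs["a2"])         # True: the pair is an exact mirror

# Stripe, then replicate: two stripe sets of unequal width (3 and 2).
sr = stripe_then_replicate(12, [("x1", "x2", "x3"), ("y1", "y2")])
print(overlapping_peers(sr, "x1"))  # ['y1', 'y2']
```

In the first layout a1's data lives on exactly one other node; in the 
second, x1's blocks are split between y1 and y2, which is the kind of 
thing that is awkward to untangle later.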

However, if you have the time, it'd be great to see the results of 
your testing with a 15 node stripe and a 10 node stripe, to see how 
those numbers rate vs. the 30 node stripe you have now.
Then flip the replication and do the same tests again.
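
For comparing the configurations, something along these lines could 
be used to get a rough sequential-write number on each mount. It is 
a minimal sketch, not a proper benchmark: the function name, the 
STRIPE_TEST_DIR environment variable and the file name are all 
assumptions for illustration, and it falls back to the system temp 
directory if the variable isn't set.

```python
import os
import tempfile
import time


def write_throughput_mb_s(path, total_mb=256, chunk_mb=1):
    """Write about total_mb of zeros to path and return MB/s.

    The timing includes an fsync so that data still sitting in the
    local page cache is not counted as already written.
    """
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    chunks = max(1, total_mb // chunk_mb)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(chunks):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    return (chunks * chunk_mb) / elapsed


# Hypothetical: point STRIPE_TEST_DIR at the GlusterFS mount under test.
test_dir = os.environ.get("STRIPE_TEST_DIR", tempfile.gettempdir())
test_file = os.path.join(test_dir, "stripe_throughput_test.bin")
try:
    rate = write_throughput_mb_s(test_file, total_mb=8)
    print(f"{test_dir}: {rate:.1f} MB/s")
finally:
    if os.path.exists(test_file):
        os.remove(test_file)
```

Run it with a much larger total_mb against each layout (30, 15 and 
10 node stripes, then the flipped replication) and compare the 
numbers; Gig-E tops out at roughly 125 MB/s on the wire, so that is 
the ceiling for a single client.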

Keith
