[Gluster-users] Recommended Stripe Width
freedman at FreeFormIT.com
Sat Feb 21 07:52:45 UTC 2009
At 11:02 PM 2/20/2009, Jordan Mendler wrote:
>I am prototyping GlusterFS with ~50-60TB of raw disk space across
>non-raided disks in ~30 compute nodes. I initially separated the
>nodes into groups of two, and did a replicate across each set of
>single drives in a pair of servers. Next I did a stripe across the
>33 resulting AFR groups, with a block size of 1MB and later with the
>default block size. With these configurations I am only seeing
>throughput of about 15-25 MB/s, despite a full Gig-E network.
>What is generally the recommended configuration in a large striped
>environment? I am wondering if the number of nodes in the stripe is
>causing too much overhead, or if the bottleneck is likely somewhere
>else. In addition, I saw a thread on the list that indicates it is
>better to replicate across stripes rather than stripe across
>replicates. Does anyone have any comments or opinion regarding this?
I think that's all guesswork; I'm not sure anyone's done a thorough
test with gluster 2.0 on those choices.
Personally, from a data management perspective, I'd rather replicate
first and then stripe, so that I know each node in a replica pair holds
exactly the same data. With striping first and then replicating, I
imagine it's possible for data that sits on one node in one stripe set
to end up spread across 2 nodes in the other stripe set, which causes
problems if you have to take the setup apart or deal with it later.
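To make that concern concrete, here is a toy model of the two layouts. It assumes blocks are placed round-robin (block index modulo the number of subvolumes), which is a simplification and not taken from the GlusterFS source. The node names and function names are made up for illustration.

```python
from collections import defaultdict


def replicate_then_stripe(pairs, n_blocks):
    """Stripe blocks round-robin over replica pairs.

    Both nodes of a pair receive every block assigned to that pair.
    """
    placement = defaultdict(set)
    for block in range(n_blocks):
        for node in pairs[block % len(pairs)]:
            placement[node].add(block)
    return placement


def stripe_then_replicate(stripe_sets, n_blocks):
    """Give each stripe set a full copy of the file.

    Each copy is striped round-robin over that set's own nodes.
    """
    placement = defaultdict(set)
    for nodes in stripe_sets:
        for block in range(n_blocks):
            placement[nodes[block % len(nodes)]].add(block)
    return placement


# Hypothetical node names, not from the original post.
a = replicate_then_stripe([("n1", "n2"), ("n3", "n4")], n_blocks=12)
print("pair holds identical blocks:", a["n1"] == a["n2"])

# Two stripe sets of different width (2 nodes vs. 3 nodes).
b = stripe_then_replicate([["n1", "n2"], ["n3", "n4", "n5"]], n_blocks=12)
overlap = sorted(n for n in ("n3", "n4", "n5") if b["n1"] & b[n])
print("nodes in the other set sharing blocks with n1:", overlap)
```

In this model the mismatch only shows up when the two stripe sets have different widths. With equal widths, each node in one set lines up with exactly one node in the other.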
However, if you have the time, it'd be great to see the results of
testing with a 15-node stripe and a 10-node stripe, to see how those
numbers compare with the 30-node stripe you have now.
Then flip the replication and run the same tests again.
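If it helps, here is a minimal sketch of how each configuration could be timed from a client. It is only a sequential-write test, and the function name, file size and chunk size are my own assumptions; a tool such as dd or iozone would do the job just as well.

```python
import os
import sys
import time


def write_throughput_mb_s(path, total_mb=256, chunk_mb=1):
    """Write total_mb of data to path in chunk_mb pieces; return MB/s."""
    # Random bytes, reused for every chunk, so the payload is not all zeros.
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    n_chunks = total_mb // chunk_mb
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_chunks):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # include the time to push data out of the cache
    elapsed = time.perf_counter() - start
    return (n_chunks * chunk_mb) / elapsed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass a file path on the mounted volume as the first argument.
    target = sys.argv[1]
    print("%.1f MB/s" % write_throughput_mb_s(target))
    os.remove(target)
```

Run it once per layout (30-, 15- and 10-node stripes, then again with the replication flipped) against a file on the mounted volume, and compare the numbers. For reference, Gig-E tops out at roughly 125 MB/s of raw bandwidth, so that is the ceiling for a single client.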