[Gluster-users] Hi guys.... got few doubts reg the config and volz

Thu Jul 8 13:55:15 UTC 2010

On 07/07/2010 02:35 PM, sandeep dude wrote:
> I am Sandeep from an Animation and Visual effects studio called as 
> GoldenEye.  We have a very unstructured data which needs some data 
> replication but in reality the requirement is scaling up and at the
> same time replication is somewhat tougher so we are using the
> "Allway Sync" software to sync the folders from time to time worrying
> about the data server failure.. now when I saw Gluster I just got an
> idea about the Isilon storage.  I have few doubts which will make me
> to use gluster at my studio.
> 
> 1. If I got a storage cluster with 6 storage servers with mirror
> option enabled and now what is the performance? 6x? or 2x?

Probably something a bit less than 3x for write and close to 6x for
read.  Unless you're very paranoid about data loss, you'll probably want
simple two-way replication.  Each replica pair will offer a bit less
than 1x performance for write because of replication overhead, and close
to 2x for read because reads can be split across the two nodes.  Then
you take those three replica pairs and distribute across them to get 3x/6x.

Of course, the natural question is: 2x/3x/6x of what?  You can't just
multiply the local-disk performance, because there is communication
overhead to consider.  Others can probably provide more concrete
performance data for this type of configuration.  For planning purposes,
you should generally count on no more than half of the "obvious"
numbers.
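
To make that arithmetic concrete, here is a rough sketch in Python.  The
function names and the 100MB/s per-server figure are mine, purely for
illustration; they are not measurements or anything from Gluster itself.
It just encodes the rule of thumb above: writes scale with the number of
replica sets, reads with the number of servers, and for planning you
halve the result.

```python
def ideal_multipliers(servers, replica=2):
    """Upper-bound write/read multipliers for distribute-over-replicate.

    Each replica set gives at most 1x for write (every write goes to all
    replicas) and up to `replica`x for read (reads split across replicas).
    """
    if servers % replica != 0:
        raise ValueError("server count must be a multiple of replica count")
    replica_sets = servers // replica
    write_x = replica_sets            # a bit less than this in practice
    read_x = replica_sets * replica   # close to this in practice
    return write_x, read_x


def planning_estimate(servers, per_server_mb_s, replica=2, derate=0.5):
    """Apply the 'count on no more than half' planning rule."""
    write_x, read_x = ideal_multipliers(servers, replica)
    return (write_x * per_server_mb_s * derate,
            read_x * per_server_mb_s * derate)


# Six servers, two-way replication: the 3x/6x from the text.
print(ideal_multipliers(6))
# Hypothetical 100MB/s per server, halved for planning purposes.
print(planning_estimate(6, 100.0))
```

Treat the output as a ceiling for planning, not a prediction; real
numbers depend on the workload and the network.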

> 2. and what if a node fails? does the data reside on the other one?

If a node fails, its replication partner should still have the data, so
you're protected against any single node failure.

> 3. using stripe on 6 servers means usable space is just 1 server
> space? and performance is 6x ?

No, stripe doesn't store data redundantly, so you'll still have the
entire 6x space.  The flip side is that stripe by itself gives you no
protection if one of those servers fails.
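
A quick way to compare usable space for the two layouts, as a sketch
only.  The function name is made up, and the 8TB per server is borrowed
from the configuration described later in this message as an
illustrative value.  A replica count of 1 stands for plain stripe or
distribute, where nothing is stored twice.

```python
def usable_capacity_tb(servers, per_server_tb, replica=1):
    """Usable space when every file (or stripe chunk) is stored `replica` times."""
    if replica < 1 or servers % replica != 0:
        raise ValueError("server count must be a positive multiple of replica count")
    return servers * per_server_tb / replica


# Stripe across 6 servers: no redundancy, so all 6x of the space is usable.
print(usable_capacity_tb(6, 8))
# Two-way replication across the same 6 servers: half the raw space.
print(usable_capacity_tb(6, 8, replica=2))
```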

> I have too much of data which will be sucking by the renderfarm and
> my desktop users always complain that the servers are slower but
> still we have Hitachi 2TB deskstar 7200rpm which sends 130MB/sec per
> disk but now I have such a requirement where I want 400MB/sec for one
> editing machine and 600MB/sec for the renderfarm and another
> 400MB/sec for the desktop users all simultaneously and under the
> same name space...huh...

1.4GB/s is a pretty tall order.  I've done 30x that, but that was on
very large and specialized systems.  Are all of those needs truly
concurrent, or is it more like 400MB/s then 600MB/s then 400MB/s again
for three separate work phases?  Also, are those peak or sustained
numbers, and for what kind of workload in terms of thread counts and I/O
sizes?

> I believe gluster can help me in doing this...
> 
> I have setup gluster with onboard Gigabit Lan for Asus M2n68am
> motherboard but its unable to connect... I was very very impressed
> with gluster technology but unable to test it...
> 
> But I have few questions...
> 
> what to do in order to get that speed on my servers?
> 
> 1. I want to use 4 machines with 4 1Gbps Intel dual port two cards
> per system and 4 Hitachi 130MB/sec hard disk 2TB each drive ( 8TB in
> total per system )

You can't practically get more than 100MB/s per 1Gbps NIC, even with good
equipment and lots of tuning.  If you want 1.4GB/s you'll need at least
14 NICs and therefore, at two NICs per machine, 7 machines . . . *at
least*.  With the configuration you describe, the four disks per machine
(about 520MB/s combined) will far outrun the two NICs (about 200MB/s),
and cramming more NICs per machine might not help due to PCIe contention
or network-stack limitations.
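
The sizing above can be sketched as a few lines of Python.  The function
names are hypothetical; the inputs (100MB/s per NIC, 130MB/s per disk,
1.4GB/s total) are the figures from this thread, and two NICs per
machine is the assumption I am making about your setup.

```python
import math


def nics_needed(target_mb_s, per_nic_mb_s=100):
    """Minimum NIC count if each NIC delivers at most per_nic_mb_s."""
    return math.ceil(target_mb_s / per_nic_mb_s)


def machines_needed(target_mb_s, nics_per_machine=2, per_nic_mb_s=100):
    """Minimum machine count, ignoring replication and protocol overhead."""
    return math.ceil(nics_needed(target_mb_s, per_nic_mb_s) / nics_per_machine)


def machine_bottleneck(disks, disk_mb_s, nics, per_nic_mb_s=100):
    """Return which side limits a single machine, and at what rate (MB/s)."""
    disk_bw = disks * disk_mb_s
    net_bw = nics * per_nic_mb_s
    if net_bw < disk_bw:
        return ("network", net_bw)
    return ("disk", disk_bw)


target = 400 + 600 + 400              # editing + renderfarm + desktops, MB/s
print(nics_needed(target))            # NICs for 1.4GB/s
print(machines_needed(target))        # machines at two NICs each
print(machine_bottleneck(4, 130, 2))  # four disks vs. two NICs
```

These are lower bounds.  Replication traffic and communication overhead
only push the real requirement higher.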

> my question is do I get 16Gbps of throughput for one file (at least
> according to the calculation)? do I need to use stripe?

If you want very high throughput for large I/O requests (at least 128KB
* stripe width) then stripe might help.  In other situations it can
actually hurt.
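
As a rough check of that threshold, here is a small Python sketch.  The
names are invented for illustration, and it is only the rule of thumb
above written out (128KB times the stripe width), not a performance
model.

```python
def min_request_kb(stripe_width, block_kb=128):
    """Smallest I/O request that touches every server in the stripe."""
    return stripe_width * block_kb


def stripe_might_help(request_kb, stripe_width, block_kb=128):
    """True when a request is large enough to span the whole stripe."""
    return request_kb >= min_request_kb(stripe_width, block_kb)


print(min_request_kb(6))           # threshold in KB for a 6-wide stripe
print(stripe_might_help(1024, 6))  # 1MB requests
print(stripe_might_help(64, 6))    # 64KB requests
```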