[Gluster-devel] Dispersed volume: Initial results

Thu Feb 13 13:18:43 UTC 2014

> I've been able to run the first performance comparison of a dispersed
> volume. The translators still have known problems that will be solved as
> soon as possible, but at least they support simple tests. I used a very
> simple configuration and compared it to a replicated volume with the
> same level of fault tolerance.
> 
> The test environment consisted of 4 servers, each with an Intel Atom
> D525 (1.8 GHz, dual core), 4 GB of RAM and a 1 TB disk. They are
> interconnected through a dedicated 1 Gbit switch (this is the
> environment I have regularly available for my tests). Three servers
> were used as bricks and the fourth as the client.
> 
> The dispersed volume is composed of 3 bricks, one of which is used for
> redundancy. It has been compared to a standard replica-2 volume with
> 2 bricks.
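To illustrate what "3 bricks, one of them for redundancy" means, here is a minimal sketch assuming the simplest possible 2+1 erasure code: each block is split into two halves, and a third fragment holds their XOR. This is a swapped-in technique for illustration only, not the encoding the ida translator actually uses. It shows the key property, though: any two of the three fragments rebuild the block, and each brick stores only half the block (1.5x raw storage instead of 2x for replica-2).

```python
from __future__ import annotations


def encode(block: bytes) -> list[bytes]:
    """Split a block into two halves plus an XOR parity fragment (2+1)."""
    if len(block) % 2:
        raise ValueError("block length must be even (aligned to the stripe)")
    half = len(block) // 2
    a, b = block[:half], block[half:]
    parity = bytes(x ^ y for x, y in zip(a, b))
    return [a, b, parity]  # one fragment per brick


def decode(fragments: list[bytes | None]) -> bytes:
    """Rebuild the block from any two of the three fragments (None = lost)."""
    if sum(f is None for f in fragments) > 1:
        raise ValueError("more than one fragment lost; data unrecoverable")
    a, b, p = fragments
    if a is None:  # recover the first half from the second half and parity
        a = bytes(x ^ y for x, y in zip(b, p))
    elif b is None:  # recover the second half from the first half and parity
        b = bytes(x ^ y for x, y in zip(a, p))
    return a + b
```

For example, `frags = encode(b"abcdefgh"); frags[0] = None; decode(frags)` returns `b"abcdefgh"` even though the first brick's fragment is gone.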
> 
> The tests consist of sequentially writing a 1 GB file using different
> block sizes (all writes aligned to the ida block size). Each test was
> executed 10 times and the results were averaged. IOPS represents the
> average number of user-side write operations of the given block size
> that can be done per second.
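As a rough sketch of how that IOPS figure relates to a run's duration: writing the file with block size B takes file_size / B operations, so IOPS is that count divided by the elapsed time, and throughput is IOPS times B. The sketch assumes "1 GB" means 2^30 bytes and that the average is taken over the per-run IOPS values; the mail does not say which, and the function names are hypothetical.

```python
from __future__ import annotations

from statistics import mean

FILE_SIZE = 1 << 30  # assumption: "1 GB" read as 2^30 bytes


def iops(block_size: int, elapsed_seconds: float, file_size: int = FILE_SIZE) -> float:
    """User-side write operations per second for one sequential run."""
    if file_size % block_size:
        raise ValueError("block size must divide the file size (aligned writes)")
    operations = file_size // block_size
    return operations / elapsed_seconds


def average_iops(block_size: int, run_times: list[float]) -> float:
    """Average of the per-run IOPS values (the tests used 10 runs)."""
    return mean(iops(block_size, t) for t in run_times)
```

With 1 MiB blocks, a run taking 16 seconds issues 1024 writes, i.e. 64 IOPS (64 MiB/s).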
> 
> The results are attached as an ODS document. They are very preliminary
> and may change, so use them with caution.
> 
> As you will see, the network becomes a bottleneck in both tests when
> the block size is big enough.
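A back-of-the-envelope bound for where that bottleneck sits, assuming the client sends every replica or fragment over its own single 1 Gbit/s link and ignoring protocol overhead: replica-2 puts 2 bytes on the wire per user byte, while a 2+1 dispersed volume puts 3 half-size fragments, i.e. 1.5 bytes. The helper names below are hypothetical.

```python
LINK_BYTES_PER_SEC = 1e9 / 8  # 1 Gbit/s, ignoring protocol overhead


def replica_amplification(copies: int) -> float:
    """Bytes sent per user byte when the client writes every replica."""
    return float(copies)


def disperse_amplification(bricks: int, redundancy: int) -> float:
    """Bytes sent per user byte: `bricks` fragments, each 1/(bricks-redundancy) of the data."""
    return bricks / (bricks - redundancy)


def max_user_throughput(amplification: float, link: float = LINK_BYTES_PER_SEC) -> float:
    """Upper bound on user-visible write throughput, in bytes per second."""
    return link / amplification


replica2 = max_user_throughput(replica_amplification(2))
disperse_2p1 = max_user_throughput(disperse_amplification(3, 1))
print(f"replica-2: {replica2 / 1e6:.1f} MB/s, disperse 2+1: {disperse_2p1 / 1e6:.1f} MB/s")
```

Under these assumptions the ceilings are about 62.5 MB/s for replica-2 and 83.3 MB/s for the dispersed volume.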
> 
> As an additional comparison, a scenario with 6 bricks of 1 TB would
> have 3 TB of logical space in a distributed-replicated volume, but
> 4 TB in a distributed-dispersed volume: 33% more space with the same
> level of fault tolerance.
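The capacity arithmetic, written out as a sketch: 6 bricks form three replica-2 subvolumes (3 TB usable) or two 2+1 dispersed subvolumes, each storing 2 TB of data (4 TB usable), and 4/3 is the 33% gain. The function names are hypothetical.

```python
def replicated_capacity(bricks: int, brick_tb: float, replica: int) -> float:
    """Usable space of a distributed-replicated volume."""
    if bricks % replica:
        raise ValueError("brick count must be a multiple of the replica count")
    return bricks // replica * brick_tb


def dispersed_capacity(bricks: int, brick_tb: float, disperse: int, redundancy: int) -> float:
    """Usable space of a distributed-dispersed volume."""
    if bricks % disperse:
        raise ValueError("brick count must be a multiple of the disperse count")
    subvolumes = bricks // disperse
    return subvolumes * (disperse - redundancy) * brick_tb


rep = replicated_capacity(6, 1.0, replica=2)                # 3.0 TB
disp = dispersed_capacity(6, 1.0, disperse=3, redundancy=1)  # 4.0 TB
print(f"{rep} TB vs {disp} TB: {disp / rep - 1:.0%} more space")
```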

Those numbers are very promising.  The fact that a very new translator
can even approach, let alone exceed, the performance of tuned-for-years
AFR is impressive.  Is this with the full coordination for avoiding
read-modify-write races on partial blocks, or is that still a work in
progress?  Given the limitations of your hosts and network, it might be
good to get some numbers on more powerful hardware either at Red Hat or
in the cloud (it's not that expensive any more to rent SSD-equipped
machines with 10GbE for a while).  That way we could get some read and
random I/O numbers as well as sequential write, with more hosts/threads.
I'd be eager to do it myself if you think the code is sufficiently
stable, but probably not until after FAST.
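For context on the read-modify-write question: a write that covers only part of a stripe cannot be encoded on its own. The old stripe has to be read and decoded, merged with the new bytes, and re-encoded, and two concurrent writes into the same stripe can race in between. The benchmark's aligned writes never hit this path. The sketch below identifies which stripes a write touches only partially, assuming a hypothetical stripe of 2 data fragments of 512 bytes each; the actual stripe size of the translator is not stated here.

```python
STRIPE_SIZE = 2 * 512  # hypothetical: 2 data fragments x 512-byte chunks


def partial_stripes(offset: int, length: int, stripe: int = STRIPE_SIZE) -> list:
    """Indices of stripes that a write covers only partially.

    These are the stripes that need read-modify-write (read and decode the
    old stripe, merge in the new bytes, re-encode) and therefore need
    coordination against concurrent writers.
    """
    if length <= 0:
        return []
    end = offset + length
    first, last = offset // stripe, (end - 1) // stripe
    partial = []
    if offset % stripe:  # write starts in the middle of a stripe
        partial.append(first)
    if end % stripe and last not in partial:  # write ends in the middle of a stripe
        partial.append(last)
    return partial
```

For instance, `partial_stripes(0, 2048)` is empty (two whole stripes), while `partial_stripes(1000, 100)` returns `[0, 1]` because the write straddles a stripe boundary without covering either stripe completely.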