[Gluster-devel] Dispersed volume: Initial results

Thu Feb 13 14:20:57 UTC 2014

Hi Jeff,

On 13/02/14 14:18, Jeff Darcy wrote:
>> I've been able to do the first performance comparison of a dispersed
>> volume. The translators still have known problems that will be solved as
>> soon as possible, but at least they support simple tests. I used a very
>> simple configuration and I compared it to a replicated volume with the
>> same level of fault tolerance.
>>
>> The test environment consisted of 4 servers, each based on an Intel
>> Atom D525 1.8 GHz dual core with 4 GB of RAM and a 1 TB disk. They are
>> interconnected through a dedicated 1 Gbit switch (this is the
>> environment that I regularly have available for my tests). 3 servers
>> were used as bricks, and the fourth one was the client.
>>
>> The dispersed volume is composed of 3 bricks, one of which provides
>> redundancy. This was compared to a standard replica-2 volume with
>> 2 bricks.
>>
>> The tests consist of the sequential write of a 1 GB file using
>> different block sizes (all writes aligned to the ida block size). All
>> tests were executed 10 times and the average was taken. IOPS
>> represents the average number of user-side write operations of the
>> current block size that can be done per second.
>>
>> The results are attached in an ods document. They are very preliminary
>> and may change, so use them with caution.
>>
>> As you will see, the network becomes a bottleneck for both tests when
>> the block size is big enough.
>>
>> As an additional comparison, an example scenario with 6 bricks of 1 TB
>> would have 3 TB of logical space in a distributed-replicated volume, but
>> 4 TB in a distributed-dispersed volume: 33% more space, but with the
>> same level of fault tolerance.
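
A quick check of the space comparison quoted above, as a minimal sketch.
It assumes the 6 bricks are grouped either into replica-2 pairs or into
3-brick dispersed sets with 1 brick of redundancy each (the same layout
as the test volume). The function names are made up for illustration and
are not part of GlusterFS.

```python
def replicated_capacity(bricks, brick_tb, replica):
    """Usable space (TB) when every file is stored `replica` times."""
    return bricks * brick_tb // replica


def dispersed_capacity(bricks, brick_tb, set_size, redundancy):
    """Usable space (TB) when each set of `set_size` bricks gives up
    `redundancy` bricks' worth of space to redundancy data."""
    sets = bricks // set_size
    return sets * (set_size - redundancy) * brick_tb


# Assumed layout: 6 bricks of 1 TB each.
rep = replicated_capacity(bricks=6, brick_tb=1, replica=2)
disp = dispersed_capacity(bricks=6, brick_tb=1, set_size=3, redundancy=1)
gain_percent = (disp - rep) / rep * 100

print(f"replicated: {rep} TB, dispersed: {disp} TB, gain: {gain_percent:.0f}%")
```

Both layouts tolerate the loss of one brick per replica pair or dispersed
set, which is the sense in which the fault tolerance is the same.
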
> Those numbers are very promising.  The fact that a very new translator
> can even approach, let alone exceed, the performance of tuned-for-years
> AFR is impressive.  Is this with the full coordination for avoiding
> read-modify-write races on partial blocks, or is that still a work in
> progress?
This is with all pieces in place (except for self-heal, which I won't
activate until the other components are stable enough). The DFC
(responsible for coordinating distributed operations) is active and
seems to be working well.

> Given the limitations of your hosts and network, it might be
> good to get some numbers on more powerful hardware either at Red Hat or
> in the cloud (it's not that expensive any more to rent SSD-equipped
> machines with 10GbE for a while).  That way we could get some read and
> random I/O numbers as well as sequential write, with more hosts/threads.
We'll see if it's possible to do tests on better hardware.

> I'd be eager to do it myself if you think the code is sufficiently
> stable, but probably not until after FAST.
Well, the version I'm using is almost the same as the one on Gluster
Forge. I try to keep them synchronized on a daily basis. However, there
are a few known problems that still prevent generic tests from running
successfully. It depends on what you want to try... behavior under
failure conditions has barely been tested yet, so it may become very
unstable under those circumstances.

Xavi