[Gluster-devel] Wrong assumptions about disperse

Vijay Bellur vbellur at redhat.com
Fri Jun 17 14:28:45 UTC 2016


On Fri, Jun 17, 2016 at 4:59 AM, Xavier Hernandez <xhernandez at datalab.es> wrote:
> Hi all,
>
> I've seen in many places the belief that disperse, or erasure coding in
> general, is slow because of the complex or costly math involved. It's true
> that there's an overhead compared to the simple copy that replica does, but
> this overhead is much smaller than many people think.
>
> The math used by disperse, if tested alone outside gluster, is much faster
> than it seems. AFAIK the real problem of EC is the communications layer: it
> adds a lot of latency, and having to communicate with and coordinate 6 or
> more bricks simultaneously has a big impact.
>
> Erasure coding also suffers from partial writes, which require a
> read-modify-write cycle. However, this is completely avoided in many
> situations where the volume is optimally configured and writes are aligned
> and in multiples of 4096 bytes (typical for VMs, databases and many other
> workloads). It could even be avoided in other situations by taking advantage
> of the write-behind xlator (not done yet).
>
> I've used a single core of two machines to test the raw math: one quite
> limited (Atom D525 1.8 GHz) and another more powerful but not a top CPU
> (Xeon E5-2630L 2.0 GHz).
>
> Common parameters:
>
> * non-systematic Vandermonde matrix (the same one used by ec)
> * algorithm slightly slower than the one used by ec (I haven't implemented
> some of the optimizations in the test program, but I think the difference
> should be very small)
> * buffer size: 128 KiB
> * number of iterations: 16384
> * total size processed: 2 GiB
> * results in MiB/s for a single core
>
> Config   Atom   Xeon
>   2+1     633   1856
>   4+1     405   1203
>   4+2     324    984
>   4+3     275    807
>   8+2     227    611
>   8+3     202    545
>   8+4     182    501
>  16+3     116    303
>  16+4     111    295
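
For anyone wondering what the "raw math" above actually consists of, the
following is a deliberately simple, unoptimized sketch of a Reed-Solomon
encode kernel over GF(2^8) (my own illustration, not the code from ec or
from the test program; the reduction polynomial is an assumption):

#include <stddef.h>
#include <stdint.h>

/* Bit-by-bit GF(2^8) multiply; real implementations use lookup tables or
 * SIMD.  0x1d encodes reduction by x^8+x^4+x^3+x^2+1, a polynomial commonly
 * used for Reed-Solomon codes. */
static uint8_t gf8_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;

    while (b) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
        b >>= 1;
    }

    return p;
}

/* Encode one stripe: for a k+m configuration, all k+m output fragments are
 * GF(2^8) linear combinations of the k input fragments, with coefficients
 * taken from a (k+m) x k encoding matrix (non-systematic case). */
void rs_encode(int k, int rows, size_t frag_size,
               const uint8_t *matrix,   /* rows x k coefficients */
               uint8_t **data,          /* k fragments of frag_size bytes */
               uint8_t **out)           /* rows fragments of frag_size bytes */
{
    for (int r = 0; r < rows; r++) {
        for (size_t i = 0; i < frag_size; i++) {
            uint8_t acc = 0;

            for (int c = 0; c < k; c++)
                acc ^= gf8_mul(matrix[r * k + c], data[c][i]);
            out[r][i] = acc;
        }
    }
}
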
>
> The same tests using Intel SSE2 extensions (not present in EC yet, but the
> patch is in review):
>
> Config   Atom   Xeon
>   2+1     821   3047
>   4+1     767   2246
>   4+2     629   1887
>   4+3     535   1632
>   8+2     466   1237
>   8+3     423   1104
>   8+4     388   1044
>  16+3     289    675
>  16+4     271    637
>
> With AVX2 it should be even faster, but my machines don't support it.
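
The patch itself isn't shown in this thread, but as a rough idea of why SIMD
helps so much: one common technique (used by libraries such as Intel ISA-L,
and shown here purely as an illustration; strictly speaking it relies on
SSSE3's PSHUFB rather than plain SSE2) multiplies 16 bytes by a GF(2^8)
constant at once using two 16-entry nibble lookup tables:

#include <tmmintrin.h>   /* _mm_shuffle_epi8 (PSHUFB) is an SSSE3 intrinsic */

/* Multiply 16 bytes by one GF(2^8) constant using nibble lookup tables.
 * lo_tbl/hi_tbl hold the products of the constant with the 16 possible low
 * and high nibbles; GF multiplication is linear over XOR, so the two partial
 * products are simply XORed together.  Illustration only, not the code from
 * the patch under review. */
static __m128i gf8_mul16(__m128i data, __m128i lo_tbl, __m128i hi_tbl)
{
    const __m128i mask = _mm_set1_epi8(0x0f);
    __m128i lo = _mm_and_si128(data, mask);
    __m128i hi = _mm_and_si128(_mm_srli_epi64(data, 4), mask);

    return _mm_xor_si128(_mm_shuffle_epi8(lo_tbl, lo),
                         _mm_shuffle_epi8(hi_tbl, hi));
}
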
>
> Encoding is much faster still when a systematic matrix is used. For example,
> a 16+4 configuration using SSE on a Xeon core can encode at 3865 MiB/s.
> However, this won't make a big difference inside gluster.
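
The reason a systematic matrix is so much faster is that its first k rows are
the identity, so the k data fragments are plain copies and only the m
redundancy fragments need the GF(2^8) math. A sketch, building on the
rs_encode()/gf8_mul() illustration above:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* From the earlier sketch. */
void rs_encode(int k, int rows, size_t frag_size, const uint8_t *matrix,
               uint8_t **data, uint8_t **out);

/* Systematic encode: identity rows become straight copies, and only the m
 * parity rows pay for the GF(2^8) multiplications. */
void rs_encode_systematic(int k, int m, size_t frag_size,
                          const uint8_t *parity_matrix,  /* m x k */
                          uint8_t **data, uint8_t **out)
{
    for (int r = 0; r < k; r++)
        memcpy(out[r], data[r], frag_size);

    rs_encode(k, m, frag_size, parity_matrix, data, out + k);
}
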
>
> Currently, EC encoding/decoding is not the bottleneck of disperse for
> small/medium configurations. For big configurations on slow machines it
> could have some impact (I don't have the resources to test those big
> configurations properly).


Agree here. In the performance results that I have observed, EC
outperforms afr when multi-threaded large sequential read and write
workloads are involved.

-Vijay

