[Gluster-devel] Wrong assumptions about disperse
Xavier Hernandez
xhernandez at datalab.es
Fri Jun 17 08:59:15 UTC 2016
Hi all,
I've seen in many places the belief that disperse, or erasure coding in
general, is slow because of the complex or costly math involved. It's
true that there's an overhead compared to the simple copy that replica
does, but this overhead is much smaller than many people think.
The math used by disperse, if tested alone outside gluster, is much
faster than it seems. AFAIK the real problem of EC is the communications
layer: it adds a lot of latency, and having to talk to and coordinate 6
or more bricks simultaneously has a big impact.
Erasure coding also suffers from partial writes, which require a
read-modify-write cycle. However, this is completely avoided in many
situations where the volume is optimally configured and writes are
aligned and sized in multiples of 4096 bytes (typical for VMs, databases
and many other workloads). It could even be avoided in other situations
by taking advantage of the write-behind xlator (not done yet).
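To make that concrete, here is a minimal sketch of the alignment check
involved (a hypothetical helper, not the actual ec code). It assumes
each fragment is handled in 512-byte blocks, so a full stripe covers
k * 512 bytes:

    /* Hypothetical helper: decide whether a write needs the
     * read-modify-write cycle. Assumes a 512-byte per-fragment block
     * (the real value depends on the implementation), so a stripe is
     * k * 512 bytes, e.g. 2048 bytes for a 4+2 volume. */
    #include <stdbool.h>
    #include <stdint.h>

    #define FRAGMENT_BLOCK 512 /* assumed block size per brick */

    static bool
    write_needs_rmw(uint64_t offset, uint64_t size, uint32_t k)
    {
        uint64_t stripe = (uint64_t)k * FRAGMENT_BLOCK;

        /* A misaligned start or end leaves a partial stripe that must
         * be read and re-encoded together with the new data. */
        return (offset % stripe != 0) || (size % stripe != 0);
    }

Under this assumption, an 8+2 or 8+3 volume has a 4096-byte stripe, so
4096-aligned writes in 4096-byte multiples never trigger the read.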
I've used a single core of two machines to test the raw math: one quite
limited (Atom D525 1.8 GHz) and another more powerful, though not a
top-end CPU (Xeon E5-2630L 2.0 GHz).
Common parameters:
* non-systematic Vandermonde matrix (the same used by ec)
* algorithm slightly slower than the one used by ec (I haven't
implemented some optimizations in the test program, but I think the
difference should be very small)
* buffer size: 128 KiB
* number of iterations: 16384
* total size processed: 2 GiB
* results in MiB/s for a single core
Config    Atom    Xeon
  2+1      633    1856
  4+1      405    1203
  4+2      324     984
  4+3      275     807
  8+2      227     611
  8+3      202     545
  8+4      182     501
 16+3      116     303
 16+4      111     295
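For context, the inner loop these numbers measure looks roughly like
this (a simplified scalar sketch with my own names, not the actual ec
implementation): every output fragment is a linear combination of the
k data fragments over GF(2^8).

    /* Simplified sketch of matrix-based erasure coding over GF(2^8)
     * (illustration only, not the ec code). gf_mul() multiplies two
     * field elements using the 0x11D reduction polynomial. */
    #include <stddef.h>
    #include <stdint.h>

    static uint8_t
    gf_mul(uint8_t a, uint8_t b)
    {
        uint8_t r = 0;

        while (b) {
            if (b & 1)
                r ^= a;
            b >>= 1;
            /* Multiply a by x, reducing modulo the field polynomial. */
            a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1D : 0x00));
        }
        return r;
    }

    /* out[i][b] = XOR over j of matrix[i][j] * data[j][b]: n coded
     * fragments computed from k data fragments, one byte at a time. */
    static void
    ec_encode(uint8_t **out, uint8_t *const *data, const uint8_t *matrix,
              unsigned n, unsigned k, size_t len)
    {
        for (unsigned i = 0; i < n; i++) {
            for (size_t b = 0; b < len; b++) {
                uint8_t acc = 0;

                for (unsigned j = 0; j < k; j++)
                    acc ^= gf_mul(matrix[i * k + j], data[j][b]);
                out[i][b] = acc;
            }
        }
    }

The real implementations use much faster variants of the multiplication;
the sketch only shows the shape of the computation.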
The same tests using Intel SSE2 extensions (not present in EC yet, but
the patch is in review):
Config    Atom    Xeon
  2+1      821    3047
  4+1      767    2246
  4+2      629    1887
  4+3      535    1632
  8+2      466    1237
  8+3      423    1104
  8+4      388    1044
 16+3      289     675
 16+4      271     637
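Part of why SIMD helps so much: the field multiplications decompose into
long runs of XORs over the data buffers, and SSE2 XORs 16 bytes per
instruction instead of 1 or 8. A simplified sketch of that primitive
(not the actual patch under review):

    /* XOR src into dst 16 bytes at a time using SSE2 intrinsics.
     * Simplified illustration: len is assumed to be a multiple of 16. */
    #include <emmintrin.h>
    #include <stddef.h>
    #include <stdint.h>

    static void
    xor_into_sse2(uint8_t *dst, const uint8_t *src, size_t len)
    {
        for (size_t i = 0; i < len; i += 16) {
            __m128i a = _mm_loadu_si128((const __m128i *)(dst + i));
            __m128i b = _mm_loadu_si128((const __m128i *)(src + i));

            _mm_storeu_si128((__m128i *)(dst + i), _mm_xor_si128(a, b));
        }
    }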
With AVX2 it should be even faster, but my machines don't support it.
Encoding is even much faster when a systematic matrix is used, since the
data fragments are plain copies of the input and only the redundancy
fragments need any math. For example, a 16+4 configuration using SSE on
a Xeon core can encode at 3865 MiB/s. However, this won't make a big
difference inside gluster.
Currently EC encoding/decoding for small/medium configurations is not
the bottleneck of disperse. Maybe for big configurations on slow
machines it could have some impact (I don't have the resources to test
those big configurations properly).
Regards,
Xavi