[Gluster-devel] Hello, I have a question about the erasure code translator, hope someone give me some advice, thank you!

Xavi Hernandez jahernan at redhat.com
Mon Apr 8 08:02:00 UTC 2019


On Mon, Apr 8, 2019 at 8:50 AM PSC <1173701037 at qq.com> wrote:

> Hi, I am a storage software coder who is interested in Gluster. I am
> trying to improve its read/write performance.
> I noticed that Gluster uses a Vandermonde matrix in the erasure code
> encoding and decoding process. However, it is quite complicated to generate
> the inverse of a Vandermonde matrix, which is necessary for decoding. The
> cost is O(n³).

That's not true, actually. A Vandermonde matrix can be inverted in O(n^2),
as the code currently does (look at ec_method_matrix_inverse() in
ec-method.c). Additionally, the current code caches inverted matrices,
so in normal circumstances there shouldn't be many inverse computations.
A new inverted matrix is only needed when something changes (a brick dies
or comes online).
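
As an illustration of the O(n^2) claim, here is a minimal sketch in Python.
It is not the code from ec-method.c: it works over exact rationals rather
than the finite-field arithmetic an erasure code needs, and the name
vandermonde_inverse is made up for this example. It relies on the fact that
column j of the inverse holds the coefficients of the j-th Lagrange basis
polynomial, so one master polynomial plus one synthetic division per column
is enough.

```python
from fractions import Fraction


def vandermonde_inverse(xs):
    """Invert V[i][k] = xs[i] ** k in O(n^2) operations (xs must be distinct)."""
    n = len(xs)
    xs = [Fraction(x) for x in xs]

    # Master polynomial P(x) = prod (x - x_m), coefficients low to high: O(n^2).
    master = [Fraction(1)]
    for x in xs:
        nxt = [Fraction(0)] * (len(master) + 1)
        for k, c in enumerate(master):
            nxt[k + 1] += c      # c * x^k times x
            nxt[k] -= x * c      # c * x^k times (-x_m)
        master = nxt

    inv = [[Fraction(0)] * n for _ in range(n)]
    for j, xj in enumerate(xs):
        # Synthetic division Q(x) = P(x) / (x - x_j): O(n) per column.
        q = [Fraction(0)] * n
        carry = Fraction(0)
        for k in range(n, 0, -1):
            carry = master[k] + xj * carry
            q[k - 1] = carry
        # Q(x_j) = prod_{m != j} (x_j - x_m), evaluated with Horner's rule.
        denom = Fraction(0)
        for c in reversed(q):
            denom = denom * xj + c
        # Column j of the inverse is the Lagrange basis polynomial Q / Q(x_j).
        for k in range(n):
            inv[k][j] = q[k] / denom
    return inv
```

Multiplying the original matrix by the result should give the identity,
which is an easy way to check the sketch for small inputs such as
xs = [1, 2, 3, 5].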

> Using a Cauchy matrix can greatly cut down the cost of finding the
> inverse matrix, to O(n²).
> I used the Intel Intelligent Storage Acceleration Library (ISA-L) to
> replace the original EC encode/decode part of Gluster, and it reduced the
> encode and decode time to about 50% of the original.

How did you test that? I also did some tests long ago and I didn't observe
that difference.

Doing a raw test of encoding/decoding performance of the current code using
Intel AVX2 extensions, it's able to process 7.6 GiB/s on a single core of
an Intel Xeon Silver 4114 when L1 cache is used. Without relying on
internal cache, it performs at 3.9 GiB/s. Does ISA-L provide better
performance for a matrix of the same size (4+2 non-systematic matrix)?

> However, when I tested the whole system, the read/write performance was
> almost the same as with the original Gluster.

Yes, there are many more things involved in the read and write operations
in gluster. For the particular case of EC, having to deal with many bricks
simultaneously (6 in this case) means that it's very sensitive to network
latency and communication delays, and this is probably one of the biggest
contributors. There are also some other small latencies added by other
xlators.

> I tested it on three machines acting as servers. Each one had two bricks,
> all of them on SSDs, so the total number of bricks is 6. Two of them are
> used as coding bricks, which gives a 4+2 disperse volume configuration.
> The network card capacity is 10000 Mbps. Theoretically it can support
> reads and writes at more than 1000 MB/s.
> The actual read performance is about 492 MB/s.
> The actual write performance is about 336 MB/s.
> The original code reads at 461 MB/s and writes at 322 MB/s.
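
To put the theoretical figure in context, here is a rough sketch of the
client-side upper bounds for this layout. It assumes a single 10000 Mbps
client link is the only limit, uses decimal megabytes, and ignores protocol
overhead and latency; the function name ec_client_bounds is made up for
this example. With 4 data fragments and 2 redundancy fragments, every write
sends 6/4 = 1.5 times the user data over the network, while a read only
needs 4 fragments.

```python
def ec_client_bounds(link_mbps, data_bricks, redundancy_bricks):
    """Rough client-side throughput bounds for a dispersed volume.

    Assumes the client NIC is the only bottleneck (no protocol overhead).
    """
    link_mb_s = link_mbps / 8  # megabits/s -> megabytes/s
    total_bricks = data_bricks + redundancy_bricks
    # Each write is split into data_bricks fragments and sent to all bricks.
    write_amplification = total_bricks / data_bricks
    return {
        "link_mb_s": link_mb_s,
        "write_amplification": write_amplification,
        "max_write_mb_s": link_mb_s / write_amplification,
        # A read only needs data_bricks fragments, i.e. 1x the user data.
        "max_read_mb_s": link_mb_s,
    }


bounds = ec_client_bounds(10000, 4, 2)
print(bounds)
```

Under these assumptions the write bound is roughly 833 MB/s rather than
1250 MB/s, so the measured 336 MB/s is still well below the link limit,
but the gap is smaller than the raw link speed suggests.
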
> Can someone give me some advice about how to improve its
> performance? Which part is the critical bottleneck if it's
> not the EC translator?
> I timed the translators. It showed that the EC translator takes just 7%
> of the whole read/write process, although I know that some translators
> run asynchronously, so the real percentage may be somewhat larger than
> that.
> Thank you sincerely for your patience in reading my question!
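
The numbers quoted above can be checked against Amdahl's law. The sketch
below is only an illustration: the helper names are made up, and it assumes
the EC share of total time is the only part that gets faster. If EC takes
7% of the time and is made twice as fast, the whole operation can improve
by at most about 3.6%.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the time gets `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


def implied_fraction(observed_speedup, factor):
    """Fraction of time that must have been sped up to explain a speedup."""
    return (1.0 - 1.0 / observed_speedup) / (1.0 - 1.0 / factor)


# Best case if EC is 7% of the time and becomes 2x faster.
predicted = amdahl_speedup(0.07, 2.0)

# Gains reported in the quoted message (MB/s, new vs. original).
read_gain = 492 / 461
write_gain = 336 / 322

print(f"predicted speedup:   {predicted:.3f}")
print(f"observed read gain:  {read_gain:.3f}")
print(f"observed write gain: {write_gain:.3f}")
print(f"EC share implied by reads:  {implied_fraction(read_gain, 2.0):.3f}")
print(f"EC share implied by writes: {implied_fraction(write_gain, 2.0):.3f}")
```

The observed gains (about 6.7% for reads and 4.3% for writes) would
correspond to an EC share of roughly 8% to 13% under this simple model,
which fits the remark that the real percentage may be somewhat larger
than 7%.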