[Gluster-devel] Weird full heal on Distributed-Disperse volume with sharding

Wed Sep 30 07:21:04 UTC 2020

On 9/30/20 8:58 AM, Xavi Hernandez wrote:

> This is normal. A dispersed volume writes encoded fragments of each block in each brick. In this case it's a 2+1 configuration, so each block is divided into 2 fragments.
> A third fragment is generated for redundancy and stored on the third brick.

OK. But for a Distributed-Replicate 2 x 3 setup with 64K shards, a 4M file should be split into (4096 / 64) * 3 = 192 shards, not 189. So why 189?

And if all bricks are considered equal and have enough free space, the shard distribution {24, 24, 24, 39, 39, 39} looks suboptimal.
Why not {31, 32, 31, 32, 31, 32}? Isn't it a bug?
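
To make the arithmetic explicit, here is a small sketch of the numbers above. The per-brick counts are the ones quoted in this thread; the grouping of bricks 0-2 and 3-5 into two replica sets is my assumption about how the list is ordered.

```python
# Shard-count arithmetic for a 4M file with 64K shards on a 2 x 3 volume.
FILE_SIZE_KIB = 4096
SHARD_SIZE_KIB = 64
REPLICA_COUNT = 3

pieces_per_copy = FILE_SIZE_KIB // SHARD_SIZE_KIB   # pieces in one copy of the file
expected_total = pieces_per_copy * REPLICA_COUNT    # what I expected to find on the bricks

observed = [24, 24, 24, 39, 39, 39]                 # per-brick counts reported in this thread
observed_total = sum(observed)

# How many pieces are "missing" from each copy of the file.
missing_per_copy = (expected_total - observed_total) // REPLICA_COUNT

# Assumption: bricks 0-2 form one replica set and bricks 3-5 the other.
per_subvolume = [observed[0], observed[3]]

print(expected_total, observed_total, missing_per_copy, sum(per_subvolume))
```

So the observed total is exactly one piece per copy short of what I expected (63 instead of 64), and the two replica sets hold 24 and 39 pieces respectively.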

> This is not right. A disperse 2+1 configuration only supports a single failure. Wiping 2 fragments from the same file makes the file unrecoverable.
> Disperse works using the Reed-Solomon erasure code, which requires at least 2 healthy fragments to recover the data (in a 2+1 configuration).

It seems that I missed the point that all bricks are considered equal, regardless of the physical host they're attached to.

So, for a Distributed-Disperse 2 x (2 + 1) setup with 3 hosts, 2 bricks each, and two files, A and B, it's possible to have
the following layout:

Host0:                  Host1:                  Host2:
|- Brick0: A0 B0        |- Brick0: A1           |- Brick0: A2
|- Brick1: B1           |- Brick1: B2           |- Brick1:

This setup can tolerate a single brick failure but not a single host failure: if Host0 is down, two fragments of B are lost
and B becomes unrecoverable (while A remains recoverable).
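
A minimal sketch of that reasoning, using a toy model of the diagram above. The `layout` dictionary and the `surviving_files` helper are hypothetical names of mine, not anything from Gluster; the only rule encoded is that a 2+1 file needs any 2 of its 3 fragments.

```python
from collections import Counter

DATA_FRAGMENTS = 2  # a 2+1 disperse file is recoverable from any 2 of its 3 fragments

# (host, brick) -> fragments stored there, copied from the diagram above
layout = {
    ("Host0", "Brick0"): ["A0", "B0"],
    ("Host0", "Brick1"): ["B1"],
    ("Host1", "Brick0"): ["A1"],
    ("Host1", "Brick1"): ["B2"],
    ("Host2", "Brick0"): ["A2"],
    ("Host2", "Brick1"): [],
}

def surviving_files(layout, failed_host):
    """Return the files that still have enough fragments when failed_host is down."""
    counts = Counter()
    for (host, _brick), fragments in layout.items():
        if host == failed_host:
            continue
        for fragment in fragments:
            counts[fragment[0]] += 1  # first character is the file name (A or B)
    return {name for name, n in counts.items() if n >= DATA_FRAGMENTS}

for host in ("Host0", "Host1", "Host2"):
    print(host, "down ->", sorted(surviving_files(layout, host)))
```

With Host0 down only A survives; with Host1 or Host2 down both files survive.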

If this is so, is it possible (and how hard would it be) to enforce 'one fragment per *host*' behavior? If we can guarantee the following:

Host0:                  Host1:                  Host2:
|- Brick0: A0           |- Brick0: A1           |- Brick0: A2
|- Brick1: B1           |- Brick1: B2           |- Brick1: B0

this setup can tolerate both single brick and single host failures.
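
The property I'm asking about can be stated as a simple check. This is only an illustration of the constraint on the toy layout above; `one_fragment_per_host` and `hosts_per_file` are hypothetical helpers and say nothing about how Gluster actually places fragments.

```python
from collections import defaultdict

# host -> brick -> fragments, copied from the second diagram
layout = {
    "Host0": {"Brick0": ["A0"], "Brick1": ["B1"]},
    "Host1": {"Brick0": ["A1"], "Brick1": ["B2"]},
    "Host2": {"Brick0": ["A2"], "Brick1": ["B0"]},
}

def hosts_per_file(layout):
    """Map each file name to the list of hosts holding one of its fragments."""
    placement = defaultdict(list)
    for host, bricks in layout.items():
        for fragments in bricks.values():
            for fragment in fragments:
                placement[fragment[0]].append(host)  # first character is the file name
    return placement

def one_fragment_per_host(layout):
    """True if no host holds more than one fragment of the same file."""
    return all(len(hosts) == len(set(hosts))
               for hosts in hosts_per_file(layout).values())

print(one_fragment_per_host(layout))
```

If the check holds, losing any one host removes at most one fragment of each file, which is exactly what a 2+1 configuration can survive.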

Dmitry