[Gluster-devel] Weird full heal on Distributed-Disperse volume with sharding

Xavi Hernandez xhernandez at redhat.com
Wed Sep 30 05:58:05 UTC 2020


Hi Dmitry,

My comments are inline below.

On Tue, Sep 29, 2020 at 11:19 AM Dmitry Antipov <dmantipov at yandex.ru> wrote:

> For testing purposes, I've created a localhost-only setup with 6x16M
> ramdisks (formatted as ext4) mounted (with '-o user_xattr') at
> /tmp/ram/{0,1,2,3,4,5} and SHARD_MIN_BLOCK_SIZE lowered to 4K. The
> resulting volume is:
>
> Volume Name: test
> Type: Distributed-Replicate
> Volume ID: 241d6679-7cd7-48b4-bdc5-8bc1c9940ac3
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 2 x 3 = 6
> Transport-type: tcp
> Bricks:
> Brick1: [local-ip]:/tmp/ram/0
> Brick2: [local-ip]:/tmp/ram/1
> Brick3: [local-ip]:/tmp/ram/2
> Brick4: [local-ip]:/tmp/ram/3
> Brick5: [local-ip]:/tmp/ram/4
> Brick6: [local-ip]:/tmp/ram/5
> Options Reconfigured:
> features.shard-block-size: 64KB
> features.shard: on
> storage.fips-mode-rchecksum: on
> transport.address-family: inet
> nfs.disable: on
> performance.client-io-threads: off
>
> Then I mount it under /mnt/test:
>
> # mount -t glusterfs [local-ip]:/test /mnt/test
>
> and create a 4M file on it:
>
> # dd if=/dev/random of=/mnt/test/file0 bs=1M count=4
>
> This creates 189 shards of 64K each, in /tmp/ram/?/.shard:
>
> /tmp/ram/0/.shard: 24
> /tmp/ram/1/.shard: 24
> /tmp/ram/2/.shard: 24
> /tmp/ram/3/.shard: 39
> /tmp/ram/4/.shard: 39
> /tmp/ram/5/.shard: 39
>
> To simulate data loss I just remove 2 arbitrary .shard directories,
> for example:
>
> # rm -rfv /tmp/ram/0/.shard /tmp/ram/5/.shard
>
> Finally, I do a full heal:
>
> # gluster volume heal test full
>
> and successfully got all shards under /tmp/ram/{0,5}/.shard back.
>
> But things seem to go wrong for the following volume:
>
> Volume Name: test
> Type: Distributed-Disperse
> Volume ID: aa621c7e-1693-427a-9fd5-d7b38c27035e
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 2 x (2 + 1) = 6
> Transport-type: tcp
> Bricks:
> Brick1: [local-ip]:/tmp/ram/0
> Brick2: [local-ip]:/tmp/ram/1
> Brick3: [local-ip]:/tmp/ram/2
> Brick4: [local-ip]:/tmp/ram/3
> Brick5: [local-ip]:/tmp/ram/4
> Brick6: [local-ip]:/tmp/ram/5
> Options Reconfigured:
> features.shard: on
> features.shard-block-size: 64KB
> storage.fips-mode-rchecksum: on
> transport.address-family: inet
> nfs.disable: on
>
> After creating a 4M file as before, I got the same 189 shards,
> but 32K each.


This is normal. A dispersed volume stores an encoded fragment of each block
on every brick of the subvolume. In a 2+1 configuration each block is split
into 2 fragments, so a 64KB shard becomes 32KB per brick. A third fragment
of the same size is generated for redundancy and stored on the third brick.
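
As a rough illustration of that arithmetic (a Python sketch with hypothetical
function names, not actual Gluster code; it assumes the block size divides
evenly by the number of data fragments and ignores any metadata):

```python
def fragment_size(block_size: int, data: int) -> int:
    """Size of the fragment stored on each brick for one block.

    Assumes block_size is a multiple of the number of data fragments.
    """
    if block_size % data != 0:
        raise ValueError("block size must be divisible by the data count")
    return block_size // data


def raw_usage(block_size: int, data: int, redundancy: int) -> int:
    """Total bytes stored across all bricks of one subvolume for one block."""
    return fragment_size(block_size, data) * (data + redundancy)


shard = 64 * 1024  # features.shard-block-size: 64KB
print(fragment_size(shard, 2))  # 32768: per-brick fragment in a 2+1 volume
print(raw_usage(shard, 2, 1))   # 98304: bytes used across the 3 bricks
```

This matches the 32768-byte files you see in each .shard directory.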


> After deleting /tmp/ram/{0,5}/.shard and running a full heal,
> I was able to get all shards back. But after deleting
> /tmp/ram/{3,4}/.shard and running a full heal, I ended up with the following:
>

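This test is not valid. A disperse 2+1 configuration tolerates only a single
failure per subvolume. Bricks 3 and 4 belong to the same subvolume (bricks
3, 4 and 5), so wiping both removes 2 of the 3 fragments of every file stored
there and makes those files unrecoverable. Bricks 0 and 5 belong to different
subvolumes, which is why your first test healed. Disperse uses a Reed-Solomon
erasure code, which requires at least 2 healthy fragments to recover the data
(in a 2+1 configuration).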

If you want to be able to recover from 2 disk failures, you need to create
a 4+2 configuration.

To make it clearer: a 2+1 configuration is like a traditional RAID5 with
3 disks. If you lose 2 disks, the data is lost. A 4+2 is similar to a RAID6.
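
The difference between the two tests can be sketched like this (Python, with
hypothetical helper names, not Gluster code; it assumes bricks are grouped
into disperse subvolumes in the order they are listed, so bricks 0-2 form one
subvolume and bricks 3-5 the other):

```python
from __future__ import annotations


def subvolumes(brick_count: int, data: int, redundancy: int) -> list[list[int]]:
    """Group brick indices into consecutive disperse subvolumes."""
    width = data + redundancy
    if brick_count % width != 0:
        raise ValueError("brick count must be a multiple of data + redundancy")
    return [list(range(i, i + width)) for i in range(0, brick_count, width)]


def recoverable(brick_count: int, data: int, redundancy: int,
                wiped: set[int]) -> bool:
    """True if every subvolume still has at least `data` healthy bricks."""
    for bricks in subvolumes(brick_count, data, redundancy):
        healthy = [b for b in bricks if b not in wiped]
        if len(healthy) < data:
            return False
    return True


# 2 x (2 + 1) = 6 bricks, as in the volume above
print(recoverable(6, 2, 1, {0, 5}))  # True: one loss in each subvolume
print(recoverable(6, 2, 1, {3, 4}))  # False: two losses in the same subvolume
# A single 4 + 2 subvolume on the same 6 bricks tolerates any two losses
print(recoverable(6, 4, 2, {3, 4}))  # True
```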

Regards,

Xavi


> /tmp/ram/0/.shard:
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.10
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.11
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.12
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.13
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.14
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.15
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.16
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.17
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.2
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.22
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.23
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.27
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.28
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.3
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.31
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.34
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.35
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.37
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.39
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.4
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.40
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.44
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.45
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.46
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.47
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.53
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.54
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.55
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.57
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.58
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.6
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.63
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.7
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.9
>
> /tmp/ram/1/.shard:
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.10
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.11
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.12
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.13
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.14
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.15
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.16
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.17
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.2
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.22
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.23
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.27
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.28
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.3
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.31
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.34
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.35
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.37
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.39
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.4
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.40
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.44
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.45
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.46
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.47
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.53
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.54
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.55
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.57
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.58
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.6
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.63
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.7
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.9
>
> /tmp/ram/2/.shard:
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.10
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.11
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.12
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.13
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.14
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.15
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.16
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.17
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.2
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.22
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.23
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.27
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.28
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.3
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.31
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.34
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.35
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.37
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.39
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.4
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.40
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.44
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.45
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.46
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.47
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.53
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.54
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.55
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.57
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.58
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.6
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.63
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.7
> -rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.9
>
> So, /tmp/ram/{3,4}/.shard was not recovered. Even worse, /tmp/ram/5/.shard
> has disappeared completely. And of course this breaks all I/O on
> /mnt/test/file0, for example:
>
> # dd if=/dev/random of=/mnt/test/file0 bs=1M count=4
> dd: error writing '/mnt/test/file0': No such file or directory
> dd: closing output file '/mnt/test/file0': No such file or directory
>
> Any ideas on what's going on here?


> Dmitry