[Gluster-users] File Corruption when adding bricks to live replica volumes

Lindsay Mathieson lindsay.mathieson at gmail.com
Tue Jan 19 11:24:07 UTC 2016


gluster 3.7.6

I seem to be able to reliably reproduce this. I have a replica 2 volume 
with 1 test VM image. While the VM is running with heavy disk 
reads/writes (a disk benchmark), I add a 3rd brick to go to replica 3:

gluster volume add-brick datastore1 replica 3 vng.proxmox.softlog:/vmdata/datastore1

I pretty much immediately get this:

    gluster volume heal datastore1 info
    Brick vna.proxmox.softlog:/vmdata/datastore1
    /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.20
    /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.22
    /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.55 - Possibly
    undergoing heal

    /images/301/vm-301-disk-1.qcow2 - Possibly undergoing heal

    Number of entries: 4

    Brick vnb.proxmox.softlog:/vmdata/datastore1
    /images/301/vm-301-disk-1.qcow2 - Possibly undergoing heal

    /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.55 - Possibly
    undergoing heal

    /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.20
    /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.22
    Number of entries: 4

    Brick vng.proxmox.softlog:/vmdata/datastore1
    /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.16
    /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.28
    /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.1
    /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.22
    /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.77
    /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.9
    /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.5
    /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.2
    /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.26
    /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.15
    /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.13
    /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.3
    /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.18
    Number of entries: 13


The brick on vng is the new, empty brick, yet it shows 13 shards being 
healed back to vna & vnb. That can't be right, and if I leave it running 
the VM becomes hopelessly corrupted. Also, the file has 81 shards in 
total; they should all be queued for healing to vng, not just 13.
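
The shard count can be verified directly on a brick with something along 
these lines (the gfid prefix is taken from the heal output above; add one 
for the base file, which lives outside .shard):

    # count this image's shards on the local brick
    ls /vmdata/datastore1/.shard/ | grep -c 'd6aad699-d71d-4b35-b021-d35e5ff297c4'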

Additionally, I get read errors when I run a qemu-img check on the VM 
image. If I remove the vng brick, the problems are resolved.
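
For reference, the check and the rollback are roughly the following (the 
fuse mount point /mnt/pve/datastore1 is only an example - adjust to your 
setup):

    # check the image via the gluster mount (example mount point)
    qemu-img check /mnt/pve/datastore1/images/301/vm-301-disk-1.qcow2

    # drop back to replica 2 by removing the new brick
    gluster volume remove-brick datastore1 replica 2 vng.proxmox.softlog:/vmdata/datastore1 force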


If I do the same process while the VM is not running - i.e. no files are 
being accessed - everything proceeds as expected. All shards on vna & vnb 
are healed to vng.
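
In that case heal progress can be watched with something like this until 
all bricks report zero entries:

    watch -n 5 'gluster volume heal datastore1 info | grep "Number of entries"'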

-- 
Lindsay Mathieson
