[Bugs] [Bug 1565623] glusterfs disperse volume input output error

bugzilla at redhat.com bugzilla at redhat.com
Wed Apr 11 11:31:41 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1565623



--- Comment #12 from Xavi Hernandez <jahernan at redhat.com> ---
From this data, we can clearly see that nodes glfs-node13.avp.ru and
glfs-node25.avp.ru have incomplete fragments (their size is smaller than the
others') because they were in the middle of a heal operation.
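For reference, the per-brick evidence this comparison is based on can be
collected with a small script run on each node. This is only a sketch:
`getfattr` comes from the attr package, and reading the trusted.* xattrs
requires root.

```shell
# Sketch: print the size, mtime and EC xattrs of one fragment on this brick,
# so the outputs from all nodes can be compared side by side.
inspect_fragment() {
    f="$1"
    echo "=== $(hostname): $f"
    stat -c 'size=%s mtime=%y' "$f"
    # trusted.* xattrs are only readable by root; errors are ignored
    getfattr -d -m 'trusted\.ec\.' -e hex "$f" 2>/dev/null || true
}
# e.g. on each node:
# inspect_fragment /data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
```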

So the best option is to consider the fragment on glfs-node19.avp.ru as
good. It has the correct size, but its modification time differs by one second
from the others, so any change written during that second may now contain
garbage data.

Does this qcow image correspond to a machine with heavy disk activity?

We can proceed with the recovery of the fragment on glfs-node19.avp.ru and see
what happens, or wait and try to recover the file with a dedicated tool,
though without guarantees. Note that since two nodes only have fragments of
little more than 2 GB, we can only repair errors in glfs-node19.avp.ru below
that size; any errors above it are unrecoverable.

You can also make a manual copy of all fragments (directly from bricks to
somewhere else) before attempting to recover the fragment on
glfs-node19.avp.ru, just to be able to try other approaches if the first one
doesn't work.
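A minimal sketch of such a backup, assuming GNU coreutils on the nodes and a
hypothetical backup directory (run as root so the trusted.* xattrs can be
read and preserved; the fragment path is the one from this bug):

```shell
# Sketch only: copy one fragment out of the brick before any repair attempt.
# The backup directory is hypothetical; adjust per node.
backup_fragment() {
    src="$1"; dest_dir="$2"
    mkdir -p "$dest_dir"
    # GNU cp; preserving xattrs keeps the trusted.ec.* attributes,
    # which requires root since trusted.* xattrs are root-only
    cp --preserve=mode,timestamps,xattr "$src" "$dest_dir/"
    # verify the copy is bit-identical before relying on it
    a=$(sha256sum "$src" | cut -d' ' -f1)
    b=$(sha256sum "$dest_dir/$(basename "$src")" | cut -d' ' -f1)
    if [ "$a" = "$b" ]; then echo "OK: $src"; else echo "MISMATCH: $src"; fi
}
# e.g. on each node:
# backup_fragment /data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2 /root/ec-backup
```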

Once we recover the fragment on glfs-node19.avp.ru, we cannot attempt a manual
repair unless a manual copy of all fragments has been done previously.

If you want to proceed with the recovery, you can do this on node
glfs-node19.avp.ru:

    setfattr -n trusted.ec.size -v 0x0000000b83f30000 /data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
    setfattr -n trusted.ec.version -v 0x0000000004e1910e0000000004e19112 /data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
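As a sanity check before writing these values, they can be decoded by hand.
My reading of the EC xattrs (worth double-checking against the disperse
translator sources for your GlusterFS release) is that trusted.ec.size is the
file size in bytes and trusted.ec.version packs two 64-bit counters, the data
and metadata versions:

```shell
# Decode the values from the setfattr commands above.
# The interpretation of the fields is an assumption; verify it against
# the disperse translator sources for your GlusterFS release.
size=$((0x0000000b83f30000))
echo "trusted.ec.size    = $size bytes (~$((size / 1024 / 1024 / 1024)) GiB)"

ver_hex=0000000004e1910e0000000004e19112
data_ver=$((0x$(echo "$ver_hex" | cut -c1-16)))
meta_ver=$((0x$(echo "$ver_hex" | cut -c17-32)))
echo "trusted.ec.version = data $data_ver / metadata $meta_ver"
```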

This should fix the Input/Output error, and a heal should be triggered shortly
afterwards to repair the remaining fragments, but there is no guarantee that
the virtual machine will work correctly. If it doesn't, you can restore from
backup or we can try to recover the file manually (the lower 2 GB at most).
