[Bugs] [Bug 1565623] glusterfs disperse volume input output error

bugzilla at redhat.com bugzilla at redhat.com
Wed Apr 11 13:17:48 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1565623



--- Comment #13 from Alexey Shcherbakov <alexey.shcherbakov at kaspersky.com> ---
(In reply to Xavi Hernandez from comment #12)
> From this data, we can clearly see that nodes glfs-node13.avp.ru and
> glfs-node25.avp.ru have incomplete fragments (their size is smaller than the
> others) because they were in the middle of a heal operation.
> 
> So the best possibility is to consider the fragment on glfs-node19.avp.ru as
> good. It has the correct size, but its modification time differs by one
> second compared to the others. It's possible that any change made during
> that second now contains garbage data.
> 
> Does this qcow image correspond to a machine with heavy disk activity?
> 

Yes, the machine is heavily loaded.

> We can proceed with the recovery of the fragment on glfs-node19.avp.ru and
> see what happens, or wait to see if we can recover the file using a specific
> tool, though without guarantees (note that since we have two nodes with a
> fragment size of little more than 2 GB, we can only recover errors in
> glfs-node19.avp.ru below this size. Any errors above this size are
> unrecoverable).
> 
> You can also make a manual copy of all fragments (directly from bricks to
> somewhere else) before attempting to recover the fragment on
> glfs-node19.avp.ru, just to be able to try other approaches if the first one
> doesn't work.
> 
> Once we recover the fragment on glfs-node19.avp.ru, we cannot attempt a
> manual repair unless a manual copy of all fragments has been done previously.
> 
> If you want to proceed with the recovery, you can do this on node
> glfs-node19.avp.ru:
> 
>     setfattr -n trusted.ec.size -v 0x0000000b83f30000
> /data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
>     setfattr -n trusted.ec.version -v 0x0000000004e1910e0000000004e19112
> /data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
> 
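
(For reference, before changing the xattrs each fragment can also be copied
directly from its brick, as suggested above; a rough sketch, with
/root/ec-backup as an example destination on each node:)

    # on every node: back up the fragment straight from the brick;
    # -a keeps timestamps/xattrs where possible, --sparse=always keeps the copy sparse
    mkdir -p /root/ec-backup
    cp -a --sparse=always \
        /data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2 /root/ec-backup/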

I ran these commands on glfs-node19.avp.ru, and the virtual machine started
with the qcow disk and is working correctly, without errors.
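
(In case it's useful for others: the applied values can be double-checked
directly on the brick with getfattr, e.g.:)

    # print the current xattr values in hex on the brick
    getfattr -n trusted.ec.size -e hex \
        /data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2
    getfattr -n trusted.ec.version -e hex \
        /data1/bricks/brick1/vmfs/slake-test-bck-m1-d1.qcow2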

Thank you so much!!!

> This should fix the Input/Output error, and a heal should be triggered
> shortly after to fix the other remaining fragments, but there is no guarantee
> that the virtual machine will work correctly. If it doesn't work, you can
> recover from backup, or we can try to manually recover the file (the lower
> 2 GB at most).

However, the heal info still shows the previous state:

# gluster volume heal vol1 info
Brick glfs-node11.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2 
Status: Connected
Number of entries: 1

Brick glfs-node12.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2 
Status: Connected
Number of entries: 1

Brick glfs-node13.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2 
Status: Connected
Number of entries: 1

Brick glfs-node14.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2 
Status: Connected
Number of entries: 1

Brick glfs-node15.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2 
Status: Connected
Number of entries: 1

Brick glfs-node16.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2 
Status: Connected
Number of entries: 1

Brick glfs-node17.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2 
Status: Connected
Number of entries: 1

Brick glfs-node18.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2 
Status: Connected
Number of entries: 1

Brick glfs-node19.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2 
Status: Connected
Number of entries: 1

Brick glfs-node20.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2 
Status: Connected
Number of entries: 1

Brick glfs-node21.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2 
Status: Connected
Number of entries: 1

Brick glfs-node22.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2 
Status: Connected
Number of entries: 1

Brick glfs-node23.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2 
Status: Connected
Number of entries: 1

Brick glfs-node24.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2 
Status: Connected
Number of entries: 1

Brick glfs-node25.local:/data1/bricks/brick1
/vmfs/slake-test-bck-m1-d1.qcow2 
Status: Connected
Number of entries: 1


What next steps should be taken?
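
For example, should I just trigger the heal manually and re-check the pending
entries (assuming the standard heal commands are the right approach here)?

    # start an index heal on the volume and watch the pending entries
    gluster volume heal vol1
    gluster volume heal vol1 info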
