[Bugs] [Bug 1565623] glusterfs disperse volume input output error

bugzilla at redhat.com bugzilla at redhat.com
Tue Apr 10 19:49:17 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1565623



--- Comment #8 from Alexey Shcherbakov <alexey.shcherbakov at kaspersky.com> ---
(In reply to Xavi Hernandez from comment #5)
> Which node did you reboot before seeing those errors ? at what time was the
> node rebooted ?
> 

Only node glfs-node19.avp.ru was rebooted; the rest worked normally. The last
time (UTC) the qcow disk image still worked:

-rwxrwx--- 1  107  107  47G Apr  6 06:37 slake-test-bck-m1-d1.qcow2

and the log from the virtual machine whose disk was disconnected at this time:

[2018-04-06 06:39:04.177631] E [MSGID: 114031]
[client-rpc-fops.c:1550:client3_3_inodelk_cbk] 0-vol1-client-8: remote
operation failed [Transport endpoint is not connected]
[2018-04-06 06:39:04.189701] E [rpc-clnt.c:365:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fe0801186fb] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fe08b20b79e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fe08b20b8ae] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7fe08b20d004] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x110)[0x7fe08b20d8d0] )))))
0-vol1-client-8: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at
2018-04-06 06:38:22.143778 (xid=0xc908c31a)
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
[2018-04-06 06:39:04.334167] E [MSGID: 122034]
[ec-common.c:461:ec_child_select] 0-vol1-disperse-0: Insufficient available
children for this request (have 0, need 13)
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)



> What I see from the data you posted is that glfs-node13.avp.ru and
> glfs-node25.avp.ru were doing a heal at some point (probably at the time of
> reboot, but I'm not completely sure yet). This is ok because your
> configuration allows 2 bad bricks, but we have another node with mismatching
> data: glfs-node19.avp.ru.
> 
> This is what is causing the EIO error, since we have 3 failures but a
> maximum of 2 are allowed.
> 
> We can try to determine if one of the mismatching versions is good enough to
> be considered as good and recover the file.
> 
> I still need to check the logs to see if there's more information. Meantime,
> knowing which node was rebooted and at what time will be very useful to
> analyze the logs.
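
For context, the failure arithmetic described above can be sketched as follows. This is only an illustration, assuming a 13+2 disperse configuration (13 data bricks + redundancy 2), which matches the "need 13" in the log and the "2 bad bricks allowed" statement; the helper name is hypothetical:

```python
# Illustrative sketch of the disperse-volume quorum math
# (assumed 13+2 layout: 13 data fragments, 2 redundancy fragments).
DATA_BRICKS = 13
REDUNDANCY = 2
TOTAL_BRICKS = DATA_BRICKS + REDUNDANCY  # 15 bricks total

def can_serve(healthy_bricks: int) -> bool:
    """A request succeeds only if at least DATA_BRICKS fragments agree."""
    return healthy_bricks >= DATA_BRICKS

# Two bricks healing (glfs-node13, glfs-node25) is still tolerated:
print(can_serve(TOTAL_BRICKS - 2))  # True

# A third mismatching brick (glfs-node19) drops below quorum,
# producing the EIO / "Insufficient available children" error:
print(can_serve(TOTAL_BRICKS - 3))  # False
```

In other words, with redundancy 2 the volume survives any two simultaneous brick failures, but the third mismatching copy pushes the request below the 13-fragment quorum.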

I collected and attached more logs from all nodes.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.

