[Bugs] [Bug 1250704] Random errors when reading multiple files in parallel on disperse volume

bugzilla at redhat.com
Thu Aug 20 18:48:09 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1250704



--- Comment #11 from Xavier Hernandez <xhernandez at datalab.es> ---
That's interesting. These XDR errors are definitely the cause of the I/O errors
you are seeing. Your volume can tolerate up to 2 brick failures. Each XDR error
causes ec to consider the corresponding brick bad or inconsistent and to ignore
it for future requests.

As can be seen in the logs, the first two XDR errors do not cause any
failure. However, the third error causes the read to return an I/O error.

To recover from this situation, self-heal is started to repair the brick or
bricks that are reporting errors. However, if bricks fail faster than self-heal
can repair them, and more than 2 bricks are considered bad at the same time, ec
cannot return any valid data.
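
To make the behaviour described above concrete, here is a minimal toy model in
Python. It is only a sketch of the reasoning, not GlusterFS code: the class
DisperseVolumeModel and its methods are hypothetical names, and the only value
taken from this bug is the redundancy of 2.

```python
# Toy model only: this is NOT how ec is implemented, just the counting logic
# described in the comment. All names here are hypothetical.
class DisperseVolumeModel:
    def __init__(self, redundancy):
        # Number of bad bricks the volume can tolerate (2 for this volume).
        self.redundancy = redundancy
        # Bricks currently considered bad or inconsistent.
        self.bad = set()

    def read(self, failing=()):
        """Simulate one read; `failing` lists bricks whose answer hit an XDR error."""
        # A brick that returns an error is ignored for future requests.
        self.bad.update(failing)
        # With more bad bricks than the redundancy, no valid data can be returned.
        if len(self.bad) > self.redundancy:
            return "EIO"
        return "OK"

    def heal(self, brick):
        """Simulate self-heal finishing the repair of one brick."""
        self.bad.discard(brick)


vol = DisperseVolumeModel(redundancy=2)

# Three XDR errors on three different bricks, with no heal in between.
results = [vol.read(failing=[b]) for b in (0, 1, 2)]
print(results)  # ['OK', 'OK', 'EIO']

# Once self-heal repairs one brick, reads succeed again.
vol.heal(0)
print(vol.read())  # OK
```

In this model the first two errors are absorbed by the redundancy, and only the
third one, arriving before any brick has been healed, makes the read fail.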

This XDR problem is very weird. Do you see anything relevant in the brick
logs?

Can you double-check that all bricks and clients are using exactly the same
version, and that there isn't more than one installed copy of the executables
(for example, one installed from a repository and another compiled by hand)?
