[Gluster-devel] Error coalesce for erasure code xlator
Xavier Hernandez
xhernandez at datalab.es
Tue Jul 1 10:59:20 UTC 2014
Hi,
while the erasure code (ec) xlator is under review, I've been thinking about how
to handle certain kinds of errors.
Under normal circumstances all bricks give the same answer to the same request.
However, after some brick failures, underlying file system corruption or other
factors, bricks may give different answers to the same request.
For example, an 'unlink' request could succeed on some bricks and fail on
others. Currently, the most "common" answer is taken as the good one only if it
reaches the minimum quorum; if it doesn't, the request fails with EIO.
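A minimal sketch of the current selection rule as described above, assuming each brick's answer can be reduced to a return code and an errno. The names `brick_answer_t`, `answers_equal` and `select_answer` are hypothetical and not taken from the ec sources; a real implementation would also compare the data returned by each brick, not only the return codes.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical representation of one brick's answer. */
typedef struct {
    int op_ret;   /* 0 = success, -1 = failure */
    int op_errno; /* meaningful only when op_ret == -1 */
} brick_answer_t;

/* Two answers are "the same" if both succeeded, or both failed with
 * the same errno. */
static int answers_equal(const brick_answer_t *a, const brick_answer_t *b)
{
    if (a->op_ret != b->op_ret)
        return 0;
    return a->op_ret == 0 || a->op_errno == b->op_errno;
}

/* Picks the most common answer among n bricks. Returns 0 and stores the
 * chosen answer in *out if its group has at least `quorum` members;
 * otherwise stores an EIO failure in *out and returns -1. */
static int select_answer(const brick_answer_t *answers, size_t n,
                         size_t quorum, brick_answer_t *out)
{
    size_t best = 0, best_count = 0;

    for (size_t i = 0; i < n; i++) {
        size_t count = 0;
        for (size_t j = 0; j < n; j++)
            if (answers_equal(&answers[i], &answers[j]))
                count++;
        if (count > best_count) {
            best_count = count;
            best = i;
        }
    }

    if (n > 0 && best_count >= quorum) {
        *out = answers[best];
        return 0;
    }
    out->op_ret = -1;
    out->op_errno = EIO;
    return -1;
}
```

With six bricks and an assumed quorum of four, three successes, two ENOENT and one EIO give a largest group of three, so `select_answer` reports EIO.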
Not reaching the quorum means that more than R (the redundancy) bricks have
failed simultaneously (or have failed while another brick was alive but not yet
recovered), which is outside the defined working conditions.
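As a rough illustration, assume a volume of N bricks with redundancy R, where the minimum quorum is the N - R bricks needed to rebuild the data (an assumption for this sketch; the actual threshold used by ec may differ per operation), and let f be the number of failed bricks. The condition above then works out as:

```latex
q_{\min} = N - R, \qquad
f > R \;\Longrightarrow\; N - f \le N - R - 1 < q_{\min}
```

For example, with N = 6 and R = 2 the quorum is 4; if 3 bricks fail, at most 3 bricks can return the correct answer, so the request fails with EIO.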
However, in some circumstances this could be improved.
Suppose that the unlink operation fails on some bricks with ENOENT. We could
consider this answer a success and combine it with the other successful
answers, giving a better chance of reaching the minimum quorum. Of course, this
depends on the operation: if it were an open instead of an unlink, this
combination wouldn't be possible.
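A possible shape for such a per-operation rule, sketched under the assumption that the decision depends only on the fop and the errno. `fop_t`, `brick_answer_t`, `error_counts_as_success` and `count_successes` are hypothetical names used only for illustration, not part of the ec xlator.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fop identifiers; the real xlator uses its own. */
typedef enum { FOP_UNLINK, FOP_OPEN } fop_t;

typedef struct {
    int op_ret;   /* 0 = success, -1 = failure */
    int op_errno; /* meaningful only when op_ret == -1 */
} brick_answer_t;

/* Returns true if, for this fop, a failure with `err` can be counted as
 * a success. Only unlink + ENOENT is treated this way here. */
static bool error_counts_as_success(fop_t fop, int err)
{
    switch (fop) {
    case FOP_UNLINK:
        return err == ENOENT; /* file already gone: the desired end state */
    case FOP_OPEN:
    default:
        return false;         /* opening a missing file is a real error */
    }
}

/* Counts successful answers plus the failures that
 * error_counts_as_success() allows to be coalesced with them. */
static size_t count_successes(fop_t fop, const brick_answer_t *answers,
                              size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (answers[i].op_ret == 0 ||
            error_counts_as_success(fop, answers[i].op_errno))
            count++;
    return count;
}
```

With three successes, two ENOENT and one EIO, and an assumed quorum of four, `count_successes(FOP_UNLINK, ...)` returns 5 and reaches the quorum, while `count_successes(FOP_OPEN, ...)` returns 3 and the request still ends in EIO.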
The same idea can also be applied between error codes. In the same case, ENOENT
and ENOTDIR errors could be combined, because they basically mean the same
thing with respect to the file in question. Even for an open, these two answers
could be combined to return a more specific error instead of EIO.
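The error-to-error case could be sketched by mapping each errno to an equivalence class before counting. `brick_answer_t`, `error_class` and `coalesce_errors` are hypothetical; folding ENOTDIR into ENOENT follows the example above, and reporting ENOENT for the merged class is an arbitrary choice made for this sketch.

```c
#include <errno.h>
#include <stddef.h>

typedef struct {
    int op_ret;   /* 0 = success, -1 = failure */
    int op_errno; /* meaningful only when op_ret == -1 */
} brick_answer_t;

/* Maps an errno to an equivalence class. ENOTDIR is folded into ENOENT
 * because, for the file being accessed, both mean it does not exist at
 * that path. Other errors stay in their own class. */
static int error_class(int err)
{
    return err == ENOTDIR ? ENOENT : err;
}

/* Among failed answers, finds the largest group of equivalent errors.
 * Returns that group's representative errno if the group reaches
 * `quorum`, or EIO otherwise. */
static int coalesce_errors(const brick_answer_t *answers, size_t n,
                           size_t quorum)
{
    size_t best_count = 0;
    int best_err = EIO;

    for (size_t i = 0; i < n; i++) {
        if (answers[i].op_ret == 0)
            continue;
        int cls = error_class(answers[i].op_errno);
        size_t count = 0;
        for (size_t j = 0; j < n; j++)
            if (answers[j].op_ret != 0 &&
                error_class(answers[j].op_errno) == cls)
                count++;
        if (count > best_count) {
            best_count = count;
            best_err = cls;
        }
    }
    return best_count >= quorum ? best_err : EIO;
}
```

For an open where two bricks return ENOENT, two return ENOTDIR and two succeed, no single answer reaches an assumed quorum of four, but the merged ENOENT/ENOTDIR class does, so ENOENT is returned instead of EIO.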
The only possible combinations I see are:
* Coalesce an error answer with a success answer
* Coalesce two different error answers
I don't see any case where two different success answers could be combined.
Would this be interesting to have for ec?
Any thoughts/ideas/feedback will be welcome.
Xavi