[Gluster-devel] Error coalesce for erasure code xlator

Xavier Hernandez xhernandez at datalab.es
Tue Jul 1 10:59:20 UTC 2014


Hi,

While the erasure code xlator is being reviewed, I've been thinking about how 
to handle some kinds of errors.

In normal circumstances all bricks will give the same answers to the same 
requests. However, after some brick failures, underlying file system 
corruption or other factors, it's possible for bricks to give different 
answers to the same request.

For example, an 'unlink' request could succeed on some bricks and fail on 
others. Currently, the most common answer is taken as the good one only if 
enough bricks agree on it to reach quorum. If quorum is not reached, the 
request fails with EIO.
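
As a minimal sketch of the rule just described (not the actual ec code, which 
is written in C), assume each brick's answer is reduced to 0 for success or 
an errno value for failure. The function name combine_answers and the quorum 
parameter are hypothetical, chosen only for illustration:

```python
import errno
from collections import Counter


def combine_answers(answers, quorum):
    """Return the most common answer if at least `quorum` bricks gave it.

    Each entry in `answers` is 0 (success) or an errno value (failure).
    If no answer reaches quorum, the request fails with EIO.
    """
    if not answers:
        return errno.EIO
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count >= quorum else errno.EIO
```

With six bricks and an assumed quorum of four, four successes plus two 
ENOENT answers give success, while three successes, two ENOENT and one EIO 
give EIO.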

Not having enough quorum means that more than R (redundancy) bricks have 
failed simultaneously (or have failed while another brick was alive but not 
yet recovered), which is outside the defined operating conditions. 
However, in some circumstances this could be improved.

Suppose that the reason for the failure of the unlink operation on some brick 
is ENOENT. We could consider this answer a success and combine it with the 
other successful answers, giving more chances to reach quorum. Of course, 
this will depend on the operation: if the operation were an open instead of 
an unlink, this combination wouldn't be possible.
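
A rough sketch of this idea, under a few assumptions of mine: answers are 0 
for success or an errno value, the per-operation table SUCCESS_EQUIVALENT is 
hypothetical, and coalescing is tried only when the plain majority rule fails 
and at least one brick really succeeded (so that ENOENT from every brick is 
still reported as ENOENT):

```python
import errno
from collections import Counter

# Hypothetical table: errors that may be counted as success for an operation.
SUCCESS_EQUIVALENT = {
    "unlink": frozenset({errno.ENOENT}),  # entry already gone on that brick
    "open": frozenset(),                  # nothing can replace a real open
}


def combine_for_op(op, answers, quorum):
    """Combine brick answers, coalescing some errors with success.

    `answers` holds 0 (success) or an errno value per brick.
    """
    if not answers:
        return errno.EIO

    # First try the plain rule: most common answer with enough quorum.
    answer, count = Counter(answers).most_common(1)[0]
    if count >= quorum:
        return answer

    # Otherwise count success-equivalent errors together with successes,
    # but only if at least one brick actually succeeded.
    equivalent = SUCCESS_EQUIVALENT.get(op, frozenset())
    successes = sum(1 for a in answers if a == 0)
    coalesced = successes + sum(1 for a in answers if a in equivalent)
    if successes > 0 and coalesced >= quorum:
        return 0
    return errno.EIO
```

With three successes, two ENOENT and one EIO and an assumed quorum of four, 
"unlink" is reported as a success while "open" still fails with EIO.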

This can also be applied to error codes. In the same case, ENOENT and ENOTDIR 
errors could be combined, because they basically mean the same thing 
(relative to the file in question). Even in an open operation these two 
answers could be combined to give a more detailed error instead of EIO.
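
Coalescing two error codes could look roughly like this. The mapping 
CANONICAL_ERROR and the choice of ENOENT as the code reported for the 
combined group are assumptions for illustration; which code should represent 
the group is an open question:

```python
import errno
from collections import Counter

# Hypothetical mapping: errors on the left are counted as the error on the
# right when looking for quorum.
CANONICAL_ERROR = {errno.ENOTDIR: errno.ENOENT}


def combine_errors(answers, quorum):
    """Combine brick answers after folding equivalent errors together.

    `answers` holds 0 (success) or an errno value per brick.
    """
    if not answers:
        return errno.EIO
    canonical = [CANONICAL_ERROR.get(a, a) for a in answers]
    answer, count = Counter(canonical).most_common(1)[0]
    return answer if count >= quorum else errno.EIO
```

With an assumed quorum of four, two ENOENT and two ENOTDIR answers (plus one 
success and one EIO) would be reported as ENOENT instead of EIO.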

The only possible combinations I see are:

* Coalesce an error answer with a success answer
* Coalesce two different error answers

I don't see any case where two different success answers could be combined.

Would this be interesting to have for ec?

Any thoughts/ideas/feedback will be welcome.

Xavi


