[Gluster-devel] Error coalesce for erasure code xlator

Jeff Darcy jdarcy at redhat.com
Tue Jul 1 12:18:46 UTC 2014


> Not having enough quorum means that more than R (redundancy) bricks have
> failed simultaneously (or have failed while another brick was alive but not
> recovered yet), which means that it's outside of the defined work conditions.
> However in some circumstances this could be improved.

I think it could be worse than that.  Consider the following series of
operations:

(1) R bricks are down - let's say A and B with R=2.

(2) A modifying operation is done on file/directory X.

(3) A and B come back up but on-demand recovery is not yet done for X.

(4) A *different* R bricks go down - let's say C and D.

(5) Somebody tries to read X.

At this point the read fails.  Even though we have quorum, we still don't
have enough bricks to satisfy the request, because A and B are up but still
hold stale fragments of X.  One could argue that the failures
on C and D are simultaneous with those on A and B, in the sense that the
failure of A and B persists until recovery is complete, and thus it violates
our operating assumption.  I've tried to explain that to users many times on
many projects, and they seem to have very little patience for that answer.
I've come to believe that they're right, and that minimizing that recovery
time is critical.
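
To make the sequence above concrete, here is a minimal sketch in Python.  It
is a toy model, not the xlator's logic: the six-brick layout, the per-brick
version counter, and the helper names has_quorum and can_read are all
assumptions made for illustration.

```python
# Hypothetical layout: 6 bricks with redundancy R = 2, so any 4 bricks
# holding the same (current) version of X are needed to decode it.
BRICKS = ["A", "B", "C", "D", "E", "F"]
R = 2
K = len(BRICKS) - R  # fragments required for a read


def has_quorum(up):
    """Quorum in this model only counts live bricks, not their contents."""
    return len(up) >= K


def can_read(up, version):
    """True if at least K live bricks hold the newest version seen among them."""
    if not up:
        return False
    newest = max(version[b] for b in up)
    usable = [b for b in up if version[b] == newest]
    return len(usable) >= K


version = {b: 1 for b in BRICKS}  # every brick starts with version 1 of X
up = set(BRICKS)

up -= {"A", "B"}        # (1) A and B go down
for b in up:            # (2) X is modified on the bricks that are still up
    version[b] = 2
up |= {"A", "B"}        # (3) A and B come back, X not yet recovered
up -= {"C", "D"}        # (4) C and D go down

quorum = has_quorum(up)            # four bricks are up
readable = can_read(up, version)   # (5) but only E and F hold version 2

# Once recovery brings A and B up to date, the same read can succeed.
version["A"] = version["B"] = 2
readable_after_heal = can_read(up, version)
```

In this model quorum holds at step (5) but the read does not, and the only
thing that changes the outcome is finishing recovery on A and B.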

> Suppose that the reason for the failure of the unlink operation on some brick is
> ENOENT. We could consider this answer as a success and combine it with the
> other successful answers, giving more chances to reach the quorum minimum. Of
> course this will depend on the operation. If the operation were an open
> instead of an unlink, this combination won't be possible.

Yes, I think it's valid to treat ENOENT on an unlink as success.  Likewise
for EEXIST on a create/mkdir/link generally, though there are cases where it's
necessary to ensure the same GFID and owner/mode/etc.
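
A rough sketch of that idea in Python, using the standard errno module.  The
BENIGN table and the function names are hypothetical, and the GFID and
owner/mode check mentioned above is deliberately left out, so the EEXIST
entries are only safe where that check isn't needed.

```python
import errno

# Hypothetical table: errors that mean "the brick is already in the state
# this operation was trying to reach".  The EEXIST entries assume the
# existing object has already been checked for matching GFID/owner/mode.
BENIGN = {
    "unlink": {errno.ENOENT},
    "create": {errno.EEXIST},
    "mkdir": {errno.EEXIST},
    "link": {errno.EEXIST},
}


def count_successes(op, answers):
    """Count per-brick answers that are success (0) or benign for this op."""
    benign = BENIGN.get(op, set())
    return sum(1 for err in answers if err == 0 or err in benign)


def reaches_quorum(op, answers, needed):
    """True if enough bricks agree once benign errors count as success."""
    return count_successes(op, answers) >= needed


# Six bricks, four answers needed: three real successes plus one ENOENT.
answers = [0, 0, 0, errno.ENOENT, errno.EIO, errno.EIO]
unlink_ok = reaches_quorum("unlink", answers, 4)
open_ok = reaches_quorum("open", answers, 4)
```

With these answers the unlink reaches the minimum of four, while an open
with the same per-brick results does not, since ENOENT is not benign there.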

> This can also be applied to error codes. In the same case, ENOENT and ENOTDIR
> errors could be combined, because they basically mean the same thing (relative to
> the file in question). Even in an open operation these two answers could be
> combined to give a more detailed error instead of EIO.

Agreed.  There are already places in the code where we give certain errors
priority over others, because some errors are more "interesting" than others.
EIO is the least interesting, and should probably be overridden by any other
error code if we have one.
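
As a sketch of what such a priority rule might look like (Python, standard
errno module only): combine_errors is a hypothetical name, and folding
ENOTDIR into ENOENT and picking the most frequent remaining error are
assumptions for illustration, not what the existing code does.

```python
import errno
from collections import Counter


def combine_errors(errors):
    """Pick a single errno to report from a list of per-brick failures."""
    # Assumption: ENOTDIR and ENOENT say the same thing about the file in
    # question, so count them together and report ENOENT.
    normalized = [errno.ENOENT if e == errno.ENOTDIR else e for e in errors]

    # EIO is the least interesting error: drop it if anything else exists.
    specific = [e for e in normalized if e != errno.EIO]
    if not specific:
        return errno.EIO

    # Assumption: among the remaining errors, report the most frequent one
    # (ties go to the one seen first).
    return Counter(specific).most_common(1)[0][0]


mixed = combine_errors([errno.EIO, errno.ENOENT, errno.ENOTDIR])
only_eio = combine_errors([errno.EIO, errno.EIO])
```

Here the mixed case reports ENOENT rather than EIO, and EIO is returned
only when no brick gave anything more specific.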

