[Gluster-devel] Regression failure of tests/basic/afr/data-self-heal.t
Ravishankar N
ravishankar at redhat.com
Wed May 6 04:26:09 UTC 2015
TL;DR: Need to come up with a fix for AFR data self-heal from clients
(mounts).
/data-self-heal.t/ creates a 1x2 volume, sets AFR changelog xattrs
directly on the files on the backend bricks, and then runs a full heal
to heal the files.
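
For reference, a rough sketch of that flow using the plain CLI rather
than the .t test helpers; the volume name, brick paths and xattr value
below are made up for illustration and are not copied from the script:

    # Create and mount a 1x2 (replica 2) volume.
    gluster volume create testvol replica 2 \
        $HOST:/bricks/b0 $HOST:/bricks/b1 force
    gluster volume start testvol
    mount -t glusterfs $HOST:/testvol /mnt/testvol
    echo hello > /mnt/testvol/file1

    # Pretend brick b1 missed a write: bump the data part of the AFR
    # changelog xattr on brick b0 against client-1, directly on the
    # backend (representative xattr name/value, not the test's exact ones).
    setfattr -n trusted.afr.testvol-client-1 \
             -v 0x000000010000000000000000 /bricks/b0/file1

    # Ask the self-heal machinery for a full heal and check the result.
    gluster volume heal testvol full
    gluster volume heal testvol info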
The test fails intermittently when run in a loop because data
self-heal attempts non-blocking locks before healing, and the two heal
threads (one per brick) may try to acquire the locks at the same time
and both may fail (the sketch below illustrates the effect).
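
To make the race concrete, here is a small analogy in shell using
flock(1) on two lock files; it is not the AFR inodelk code, and the
opposite acquisition order below merely stands in for the parallel,
non-blocking lock requests the two heal threads send to both bricks:

    # Two "healers" each need both per-brick locks and take them
    # non-blockingly. If the attempts interleave so that each wins one
    # lock, both back off and neither heals the file.
    (   # healer A: brick0 lock first, then brick1
        flock -n 8 || { echo "healer A backs off"; exit 1; }
        sleep 1     # widen the race window
        flock -n 9 || { echo "healer A backs off"; exit 1; }
        echo "healer A heals the file"
    ) 8>/tmp/brick0.lock 9>/tmp/brick1.lock &

    (   # healer B: brick1 lock first, then brick0
        flock -n 9 || { echo "healer B backs off"; exit 1; }
        sleep 1
        flock -n 8 || { echo "healer B backs off"; exit 1; }
        echo "healer B heals the file"
    ) 8>/tmp/brick0.lock 9>/tmp/brick1.lock &

    wait   # typically both back off and the file stays unhealed; when
           # one healer happens to win both locks the heal goes through,
           # which is why the .t only fails intermittently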
In afr-v1, only one heal thread gets spawned if both bricks are on the
same node. We cannot do this in afr-v2 because, unlike v1, it has no
conservative merge in afr_opendir_cbk(). We are not sure that adding a
conservative merge to v2 is a good idea, because it involves (multiple)
readdirs on both bricks and computing a checksum on the entries to
detect mismatches, which can be a costly operation when done from
clients. Making the locks blocking could cause one heal thread to block
on a file held by the other thread instead of moving on to heal other
files. One approach is to do what ec does: use a virtual xattr, handled
in the getxattr FOP, to trigger data heals from clients (sketched
below). More thought needs to be given to this.
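
For comparison, this is roughly what the ec-style client trigger looks
like from a mount; the ec virtual xattr name is quoted from memory, and
the AFR xattr in the second command does not exist -- it is only meant
to show the shape such an interface could take:

    # ec today: a getxattr on a virtual xattr from the mount makes the
    # disperse xlator heal that file (name approximate, from memory).
    getfattr -n trusted.ec.heal /mnt/dispersevol/file1

    # Hypothetical AFR counterpart (not implemented): AFR would catch
    # this getxattr FOP on the client and launch a properly locked data
    # heal for the file, instead of relying on per-brick heal threads.
    getfattr -n trusted.afr.heal /mnt/testvol/file1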
Regards,
Ravi