[Gluster-devel] healing of bad objects (marked by scrubber)
ravishankar at redhat.com
Wed Jul 8 06:36:37 UTC 2015
On 07/08/2015 11:42 AM, Raghavendra Bhat wrote:
> Adding the correct gluster-devel id.
> Raghavendra Bhat
> On 07/08/2015 11:38 AM, Raghavendra Bhat wrote:
>> In the bit-rot feature, the scrubber marks corrupted objects (objects
>> whose data has gone bad) as bad objects via an extended attribute. If
>> the volume is a replicate volume and an object in one of the replicas
>> goes bad, the client can still see the data via the good copy present
>> in the other replica. But as of now, self-heal does not heal bad
>> objects. So the way to heal a bad object is to remove it directly
>> from the backend and let self-heal take care of healing it from the
>> good copy.
>> The above method has a problem. The bit-rot-stub xlator sitting in
>> the brick graph remembers an object as bad in its inode context
>> (either when the object is marked bad by the scrubber, or during the
>> first lookup of the object if it was already marked bad).
>> Bit-rot-stub uses that information to block any read/write operations
>> on such bad objects, so it also blocks every operation self-heal
>> attempts in order to correct the object (the object was deleted
>> directly in the backend, so the in-memory inode is still present and
>> considered valid).
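A minimal C sketch of the blocking behavior described above, under a much-simplified model. `struct model_inode`, `stub_mark_bad`, `stub_check_io` and `backend_unlink` are hypothetical names, not bit-rot-stub's actual code, and returning `-EIO` for a blocked fop is an assumption for illustration. The point it shows is that the bad flag lives in the in-memory inode, which a direct backend deletion never touches.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a brick-side in-memory inode.
 * bit-rot-stub keeps this state in its inode context; the names here
 * are illustrative only. */
struct model_inode {
    unsigned long id; /* placeholder identity for the object */
    bool bad;         /* set by the scrubber, or on the first lookup of
                         an object already marked bad on disk */
};

/* Record the object as bad in the in-memory context. */
static void stub_mark_bad(struct model_inode *inode)
{
    inode->bad = true;
}

/* Gate applied to read/write fops: 0 lets the fop through, -EIO
 * (an assumed error code) refuses I/O on a known-bad object. */
static int stub_check_io(const struct model_inode *inode)
{
    return inode->bad ? -EIO : 0;
}

/* Deleting the file directly on the backend does not touch the
 * in-memory inode, so this is deliberately a no-op on the model:
 * the bad flag survives and self-heal I/O is still refused. */
static void backend_unlink(struct model_inode *inode)
{
    (void)inode;
}
```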
>> There are 2 methods that I think can solve the issue.
>> 1) In server_lookup_cbk, if the lookup of an object fails with
>> ENOENT *AND* the lookup is a revalidate lookup, then forget the
>> inode associated with that object (not just unlink the dentry, but
>> forget the inode as well, if and only if no more dentries are
>> associated with the inode). At least this way the inode would be
>> forgotten, and when self-heal later wants to correct the object, it
>> has to create a new object (the object was removed directly from the
>> backend), which happens with the creation of a new in-memory inode,
>> so read/write operations by the self-heal daemon will not be blocked.
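Method 1 can be modeled in a few lines of C. This is a sketch under stated assumptions, not the patch under review: the hypothetical `lookup_cbk_model` stands in for the relevant part of `server_lookup_cbk`, each name holds one counted link on its inode, and `free()` stands in for the inode table's forget path (real Gluster inodes are reference counted). Once the last dentry is gone and the inode is forgotten, the hypothetical `heal_create` gets a fresh inode with no bad flag.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct model_inode {
    int  dentry_count; /* number of names still linked to this inode */
    bool bad;          /* bit-rot-stub's in-memory "bad object" flag */
};

/* One cached name -> inode mapping (a single dentry, for brevity). */
struct model_dentry {
    struct model_inode *inode; /* NULL once the dentry is unlinked */
};

/* Approach 1: on a *revalidate* lookup that fails with ENOENT, unlink
 * the dentry and, if that was the inode's last dentry, forget (here:
 * free) the inode too. Returns true if the inode was forgotten. */
static bool lookup_cbk_model(struct model_dentry *d, int op_errno,
                             bool is_revalidate)
{
    struct model_inode *inode = d->inode;

    if (op_errno != ENOENT || !is_revalidate || inode == NULL)
        return false;            /* fresh lookup or success: keep inode */
    d->inode = NULL;             /* unlink the dentry */
    if (--inode->dentry_count > 0)
        return false;            /* other names still use the inode */
    free(inode);                 /* no dentries left: forget the inode */
    return true;
}

/* Self-heal re-creates the file: a brand-new in-memory inode is
 * allocated with the bad flag cleared, so heal I/O is not blocked. */
static struct model_inode *heal_create(struct model_dentry *d)
{
    struct model_inode *inode = calloc(1, sizeof(*inode));

    if (inode == NULL)
        return NULL;
    inode->dentry_count = 1;
    d->inode = inode;
    return inode;
}
```

Restricting the forget to revalidate lookups matters: a first-time lookup that returns ENOENT has no cached inode to clean up.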
>> I have sent a patch for review for the above method:
>> 2) Do not block write operations on the bad object if they come from
>> self-heal; allow self-heal to heal the file completely, and once
>> healing is done, remove the bad-object information from the inode
>> context.
>> Requests coming from the self-heal daemon can be identified by
>> checking their pid (the daemon runs with a negative pid). But if
>> self-heal happens from the glusterfs client itself, I am not sure
>> whether it runs with a negative pid for the frame or with the same
>> pid as the frame of the original fop that triggered the self-heal.
>> Pranith, can you clarify this?
For afr-v2, heals that happen via the client run in a synctask with
the same negative pid (GF_CLIENT_PID_AFR_SELF_HEALD) as the self-heal
daemon.
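Method 2 relies on the pid check discussed above. A hedged C sketch, assuming (as the proposal does) that a negative frame pid means the request comes from self-heal; `stub_check_write`, `from_self_heal` and `stub_heal_done` are hypothetical names. A plain `pid < 0` test may also admit other internal clients that run with negative pids, so comparing against the specific self-heal pid (`GF_CLIENT_PID_AFR_SELF_HEALD`) would be stricter.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

struct model_inode {
    bool bad; /* bit-rot-stub's in-memory "bad object" flag */
};

/* Assumption: internal gluster processes such as self-heal run fops
 * with a negative frame pid, so any negative pid is treated as heal. */
static bool from_self_heal(pid_t frame_pid)
{
    return frame_pid < 0;
}

/* Approach 2: writes to a bad object are refused (-EIO, an assumed
 * error code) unless they come from self-heal. */
static int stub_check_write(const struct model_inode *inode,
                            pid_t frame_pid)
{
    if (inode->bad && !from_self_heal(frame_pid))
        return -EIO;
    return 0;
}

/* Called once the heal of the object completes: drop the bad-object
 * marker from the in-memory context so normal I/O resumes. */
static void stub_heal_done(struct model_inode *inode)
{
    inode->bad = false;
}
```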
I think approach 1 is better, as it is independent of who does the heal
(I am not sure what the pid behavior is for disperse volume heals), and
it makes sense to forget the inode when the corresponding file is no
longer present in the back-end.
>> Please provide feedback.
>> Raghavendra Bhat
> Gluster-devel mailing list
> Gluster-devel at gluster.org