[Gluster-devel] healing of bad objects (marked by scrubber)

Wed Jul 8 06:36:37 UTC 2015

On 07/08/2015 11:42 AM, Raghavendra Bhat wrote:
> Adding the correct gluster-devel id.
>
> Regards,
> Raghavendra Bhat
>
> On 07/08/2015 11:38 AM, Raghavendra Bhat wrote:
>>
>> Hi,
>>
>> In the bit-rot feature, the scrubber marks corrupted objects (objects
>> whose data has gone bad) as bad objects (via an extended attribute).
>> If the volume is a replicate volume and an object in one of the
>> replicas goes bad, the client is still able to see the data via the
>> good copy present in the other replica. But as of now, self-heal does
>> not heal bad objects. So the method to heal a bad object is to
>> remove it directly from the backend and let self-heal take care of
>> healing it from the good copy.
>>
>> The above method has a problem. The bit-rot-stub xlator, which sits in
>> the brick graph, remembers an object as bad in its inode context
>> (either when the object is marked bad by the scrubber, or during
>> the first lookup of the object if it was already marked bad).
>> Bit-rot-stub uses that info to block any read/write operations on
>> such bad objects. So it also blocks any operation that self-heal
>> attempts in order to correct the object (the object was deleted
>> directly in the backend, so the in-memory inode is still present
>> and considered valid).
>>
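
To make the failure mode above concrete, here is a minimal toy model in Python. It is not GlusterFS code: the `Brick` and `Inode` classes, the `bad` flag, and the use of EIO as the error are all assumptions made for illustration only.

```python
import errno


class Inode:
    """In-memory inode; `bad` stands in for the bit-rot-stub inode context."""

    def __init__(self, gfid):
        self.gfid = gfid
        self.bad = False


class Brick:
    """Toy brick: `backend` models on-disk data, `inodes` the inode table."""

    def __init__(self):
        self.backend = {}  # gfid -> bytes
        self.inodes = {}   # gfid -> Inode (survives backend deletion)

    def lookup(self, gfid):
        # Reuse the cached inode if one exists, otherwise create it.
        return self.inodes.setdefault(gfid, Inode(gfid))

    def mark_bad(self, gfid):
        # What the scrubber's marking amounts to in this model.
        self.lookup(gfid).bad = True

    def write(self, gfid, data):
        # The stub refuses writes on inodes it remembers as bad.
        if self.lookup(gfid).bad:
            return -errno.EIO  # assumed error code
        self.backend[gfid] = data
        return len(data)


brick = Brick()
brick.write("gfid-1", b"original")
brick.mark_bad("gfid-1")
del brick.backend["gfid-1"]  # bad copy removed directly from the backend
heal_result = brick.write("gfid-1", b"good copy")  # self-heal's attempt
```

In this model `heal_result` is a negative errno: the file is gone from the backend, but the cached inode still carries the bad flag, so the heal write is rejected.
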
>> There are 2 methods that I think can solve the issue.
>>
>> 1) In server_lookup_cbk, if the lookup of an object fails with
>> ENOENT  *AND*  the lookup is a revalidate lookup, then forget the
>> inode associated with that object (not just unlink the dentry;
>> forget the inode as well iff there are no more dentries associated
>> with the inode). At least this way, the inode is forgotten, and
>> later when self-heal wants to correct the object, it has to create a
>> new object (the object was removed directly from the backend). That
>> creates a new in-memory inode, so read/write operations by the
>> self-heal daemon will not be blocked.
>> I have sent a patch for review for the above method:
>> http://review.gluster.org/#/c/11489/
>>
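
A rough sketch of the rule in method 1, again as a toy Python model rather than the actual patch. `InodeTable`, `link`, and this `lookup_cbk` signature are hypothetical names; only the decision logic (ENOENT plus revalidate, then forget the inode once no dentries remain) is taken from the description above.

```python
import errno


class Inode:
    def __init__(self, gfid):
        self.gfid = gfid
        self.bad = False       # stand-in for the bit-rot-stub context
        self.dentries = set()  # names currently linked to this inode


class InodeTable:
    def __init__(self):
        self.inodes = {}  # gfid -> Inode

    def link(self, gfid, name):
        """Create or reuse an inode and attach a dentry to it."""
        inode = self.inodes.setdefault(gfid, Inode(gfid))
        inode.dentries.add(name)
        return inode

    def lookup_cbk(self, gfid, name, op_errno, revalidate):
        """Return the inode that stays cached, or None if it was forgotten."""
        inode = self.inodes.get(gfid)
        if inode is None or op_errno != errno.ENOENT or not revalidate:
            return inode
        inode.dentries.discard(name)  # unlink the stale dentry
        if not inode.dentries:        # forget only when no dentries remain
            del self.inodes[gfid]
            return None
        return inode
```

Once the inode has been forgotten, the next `link` for the same gfid builds a fresh inode without the bad flag, which is the property that lets the heal proceed.
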
>> OR
>>
>> 2) Do not block write operations on the bad object if the
>> operation comes from self-heal; allow it to completely heal
>> the file and, once healing is done, remove the bad-object information
>> from the inode context.
>> Requests coming from the self-heal daemon can be identified by
>> checking their pid (it is negative). But if the self-heal
>> happens from the glusterfs client itself, I am not sure whether
>> it runs with a negative pid for the frame or with the same pid as
>> the frame of the original fop which triggered the self-heal.
>> Pranith, can you clarify this?
>>
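
Method 2 could be modelled roughly as below. This is only a sketch under assumptions: `BadObjectGate`, `fop`, and `heal_done` are made-up names, `SELF_HEAL_PID` is a hypothetical stand-in for the self-heal daemon's negative pid with an illustrative value, and EIO is an assumed error code.

```python
import errno

# Hypothetical stand-in for the self-heal daemon's negative pid.
# The numeric value is illustrative only.
SELF_HEAL_PID = -6


class BadObjectGate:
    """Toy model of a stub that blocks fops on bad objects."""

    def __init__(self):
        self.bad = set()  # gfids remembered as bad

    def mark_bad(self, gfid):
        self.bad.add(gfid)

    def fop(self, gfid, op, pid):
        """Return 0 if the fop is allowed, a negative errno otherwise."""
        if gfid not in self.bad:
            return 0
        # Only writes from self-heal are let through on a bad object.
        if op == "write" and pid == SELF_HEAL_PID:
            return 0
        return -errno.EIO  # assumed error code

    def heal_done(self, gfid):
        # Healing finished: drop the bad-object information.
        self.bad.discard(gfid)
```

The sketch compares against one specific pid rather than testing for any negative value; whether a plain sign check would be enough depends on which other internal clients also use negative pids.
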

For afr-v2, heals that happen via the client run in a synctask
with the same negative pid (GF_CLIENT_PID_AFR_SELF_HEALD) as the
self-heal daemon.
I think approach 1 is better, as it is independent of who does the heal
(I am not sure what the pid behavior is for disperse volume heals), and
it makes sense to forget the inode when the corresponding file is no
longer present in the back-end.

Thanks,
Ravi

>> Please provide feedback.
>>

>> Regards,
>> Raghavendra Bhat