[Gluster-devel] healing of bad objects (marked by scrubber)

Venky Shankar vshankar at redhat.com
Wed Jul 8 08:59:13 UTC 2015

On 07/08/2015 12:06 PM, Ravishankar N wrote:
> On 07/08/2015 11:42 AM, Raghavendra Bhat wrote:
>> Adding the correct gluster-devel id.
>> Regards,
>> Raghavendra Bhat
>> On 07/08/2015 11:38 AM, Raghavendra Bhat wrote:
>>> Hi,
>>> In the bit-rot feature, the scrubber marks corrupted objects (objects 
>>> whose data has gone bad) as bad objects via an extended attribute. If 
>>> the volume is a replicate volume and an object in one of the replicas 
>>> goes bad, the client can still see the data via the good copy present 
>>> in the other replica. But as of now, self-heal does not heal bad 
>>> objects. So the method to heal a bad object is to remove it directly 
>>> from the backend and let self-heal take care of healing it from the 
>>> good copy.
>>> The above method has a problem. The bit-rot-stub xlator sitting in 
>>> the brick graph remembers an object as bad in its inode context 
>>> (either when the object is marked bad by the scrubber, or during the 
>>> first lookup of the object if it was already marked bad). 
>>> Bit-rot-stub uses that information to block any read/write operations 
>>> on such bad objects. So it also blocks any operation attempted by 
>>> self-heal to correct the object (the object was deleted directly in 
>>> the backend, so the in-memory inode will still be present and 
>>> considered valid).
>>> There are 2 methods that I think can solve the issue.
>>> 1) In server_lookup_cbk, if the lookup of an object fails with 
>>> ENOENT *AND* the lookup is a revalidate lookup, then forget the 
>>> inode associated with that object (not just unlinking the dentry, 
>>> but forgetting the inode as well, iff there are no more dentries 
>>> associated with the inode). At least this way the inode would be 
>>> forgotten, and later when self-heal wants to correct the object, it 
>>> has to create a new object (the old one was removed directly from 
>>> the backend), which happens with the creation of a new in-memory 
>>> inode, so read/write operations by the self-heal daemon will not be 
>>> blocked.
>>> I have sent a patch for review for the above method:
>>> http://review.gluster.org/#/c/11489/
>>> OR
>>> 2) Do not block write operations on a bad object if the operation is 
>>> coming from self-heal; allow it to completely heal the file, and 
>>> once healing is done, remove the bad-object information from the 
>>> inode context.
>>> Requests coming from the self-heal daemon can be identified by 
>>> checking their pid (it is negative). But if the self-heal happens 
>>> from the glusterfs client itself, I am not sure whether it happens 
>>> with a negative pid for the frame, or with the same pid as the frame 
>>> of the original fop that triggered the self-heal. Pranith? Can you 
>>> clarify this?
> For afr-v2, the heals that happen via the client happen in a synctask 
> with the same negative pid (GF_CLIENT_PID_AFR_SELF_HEALD) as the 
> self-heal daemon.
> I think approach 1 is better, as it is independent of who does the 
> heal (not sure what the pid behavior is with disperse volume heals), 
> and it makes sense to forget the inode when the corresponding file is 
> no longer present in the back-end.

+1 for approach #1. It might also benefit other xlators that want to 
freshly initialize the inode context in such scenarios.
> Thanks,
> Ravi
>>> Please provide feedback.
>>> Regards,
>>> Raghavendra Bhat
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-devel
