[Gluster-devel] healing of bad objects (marked by scrubber)

Wed Jul 8 06:12:41 UTC 2015

Adding the correct gluster-devel id.

Regards,
Raghavendra Bhat

On 07/08/2015 11:38 AM, Raghavendra Bhat wrote:
>
> Hi,
>
> In the bit-rot feature, the scrubber marks corrupted objects (objects
> whose data has gone bad) as bad objects via an extended attribute. If
> the volume is a replicate volume and an object in one of the replicas
> goes bad, the client can still see the data via the good copy present
> in the other replica. But as of now, self-heal does not heal bad
> objects. So the method to heal a bad object is to remove it directly
> from the backend and let self-heal take care of healing it from the
> good copy.
>
> The above method has a problem. The bit-rot-stub xlator, sitting in the
> brick graph, remembers an object as bad in its inode context (either
> when the object was being marked bad by the scrubber, or during the
> first lookup of the object if it was already marked bad). Bit-rot-stub
> uses that info to block any read/write operations on such bad objects.
> So it also blocks any operation attempted by self-heal to correct the
> object (the object was deleted directly in the backend, so the
> in-memory inode is still present and considered valid).
>
> There are 2 methods that I think can solve the issue.
>
> 1) In server_lookup_cbk, if the lookup of an object fails with
> ENOENT *AND* the lookup is a revalidate lookup, then forget the
> inode associated with that object (not just unlink the dentry, but
> forget the inode as well iff there are no more dentries associated
> with the inode). At least this way the inode would be forgotten, and
> later, when self-heal wants to correct the object, it has to create a
> new object (the object was removed directly from the backend). That
> creation happens with a new in-memory inode, so read/write operations
> by the self-heal daemon will not be blocked.
> I have sent a patch for review for the above method:
> http://review.gluster.org/#/c/11489/
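
The decision described in method 1 can be sketched roughly as below. This
is only a simplified model and not the code in the patch: the struct and
the function sketch_lookup_cbk are hypothetical stand-ins, and the real
inode table, dentry handling and inode context in glusterfs are more
involved.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an in-memory inode. */
struct sketch_inode {
    bool in_table;     /* still cached in the inode table */
    int  dentry_count; /* names still pointing at this inode */
    bool bad_object;   /* "bad" flag cached in the inode context */
};

/*
 * Model of the proposed handling in the lookup callback.
 * Returns true if the inode was forgotten, false otherwise.
 */
bool
sketch_lookup_cbk(struct sketch_inode *inode, int op_ret, int op_errno,
                  bool is_revalidate)
{
    assert(inode != NULL);

    /* Act only on a failed revalidate lookup that returned ENOENT. */
    if (op_ret >= 0 || op_errno != ENOENT || !is_revalidate)
        return false;

    /* Unlink the dentry that was looked up. */
    if (inode->dentry_count > 0)
        inode->dentry_count--;

    /* Forget the inode only if no other dentry refers to it. */
    if (inode->dentry_count == 0) {
        inode->in_table = false;
        inode->bad_object = false; /* context goes away with the inode */
        return true;
    }

    return false;
}
```

Once the inode is forgotten, the cached bad-object flag goes with it, so
the object that self-heal re-creates starts with a fresh inode. An inode
that still has another dentry is kept, and so is its flag.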
>
> OR
>
> 2) Do not block write operations on the bad object if the operation
> comes from self-heal; allow it to completely heal the file and, once
> healing is done, remove the bad-object information from the inode
> context.
> Requests coming from the self-heal daemon can be identified by
> checking its pid (it has a -ve pid). But if the self-heal happens from
> the glusterfs client itself, I am not sure whether it runs with a -ve
> pid for the frame or with the same pid as the frame of the original
> fop which triggered the self-heal. Pranith, can you clarify this?
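
A rough sketch of the check in method 2, under stated assumptions: the
names sketch_bad_ctx, sketch_check_write and sketch_heal_done are
hypothetical, the EIO return value is an assumption, and a negative frame
pid alone is treated as "this is self-heal", which the real code may need
to narrow down.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical stand-in for the bad-object info in the inode context. */
struct sketch_bad_ctx {
    bool bad_object;
};

/*
 * Returns 0 if the write may proceed, -EIO if it is blocked.
 * Assumption: a negative frame pid identifies the self-heal daemon.
 */
int
sketch_check_write(const struct sketch_bad_ctx *ctx, pid_t frame_pid)
{
    if (!ctx->bad_object)
        return 0;    /* healthy object, nothing to block */

    if (frame_pid < 0)
        return 0;    /* let self-heal rewrite the bad object */

    return -EIO;     /* ordinary clients stay blocked */
}

/* Called once healing has completed: drop the bad-object marker. */
void
sketch_heal_done(struct sketch_bad_ctx *ctx)
{
    ctx->bad_object = false;
}
```

With this check, a heal triggered from the glusterfs client would still
be blocked if its frame carries the pid of the original fop, which is the
open question above.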
>
> Please provide feedback.
>
> Regards,
> Raghavendra Bhat