[Gluster-devel] healing of bad objects (marked by scrubber)
Venky Shankar
vshankar at redhat.com
Wed Jul 8 08:59:13 UTC 2015
On 07/08/2015 12:06 PM, Ravishankar N wrote:
>
>
> On 07/08/2015 11:42 AM, Raghavendra Bhat wrote:
>> Adding the correct gluster-devel id.
>>
>> Regards,
>> Raghavendra Bhat
>>
>> On 07/08/2015 11:38 AM, Raghavendra Bhat wrote:
>>>
>>> Hi,
>>>
>>> In the bit-rot feature, the scrubber marks corrupted objects (objects
>>> whose data has gone bad) as bad objects via an extended attribute. If
>>> the volume is a replicate volume and an object in one of the replicas
>>> goes bad, the client can still read the data via the good copy on the
>>> other replica. But as of now, self-heal does not heal bad objects. So
>>> the way to heal a bad object is to remove it directly from the backend
>>> and let self-heal recreate it from the good copy.
>>>
>>> The above method has a problem. The bit-rot-stub xlator, sitting in
>>> the brick graph, remembers an object as bad in its inode context
>>> (either when the object is marked bad by the scrubber, or during the
>>> first lookup of the object if it was already marked bad).
>>> Bit-rot-stub uses that information to block any read/write operations
>>> on such bad objects. As a result it also blocks the operations
>>> self-heal attempts in order to correct the object: the object was
>>> deleted directly in the backend, so the in-memory inode is still
>>> present and considered valid.
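To make that blocking concrete, it boils down to a check of the inode
context on the brick's I/O path, roughly like the sketch below. The names
here (stub_writev, BAD_OBJECT_BIT) are illustrative, not the actual
bit-rot-stub symbols; only inode_ctx_get() and the standard fop/STACK
macros are real xlator API:

#include <errno.h>

#include "glusterfs.h"
#include "xlator.h"
#include "defaults.h"

/* Illustrative flag for "this inode was marked bad"; bit-rot-stub keeps
 * its own context structure, this is only for the sketch. */
#define BAD_OBJECT_BIT 0x1

int32_t
stub_writev (call_frame_t *frame, xlator_t *this, fd_t *fd,
             struct iovec *vector, int32_t count, off_t offset,
             uint32_t flags, struct iobref *iobref, dict_t *xdata)
{
        uint64_t ctx = 0;

        /* If the object was marked bad (by the scrubber or on first
         * lookup), fail the write with EIO instead of winding it down. */
        if (inode_ctx_get (fd->inode, this, &ctx) == 0 &&
            (ctx & BAD_OBJECT_BIT)) {
                STACK_UNWIND_STRICT (writev, frame, -1, EIO,
                                     NULL, NULL, NULL);
                return 0;
        }

        STACK_WIND (frame, default_writev_cbk, FIRST_CHILD (this),
                    FIRST_CHILD (this)->fops->writev, fd, vector, count,
                    offset, flags, iobref, xdata);
        return 0;
}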
>>>
>>> There are 2 methods that I think can solve the issue.
>>>
>>> 1) In server_lookup_cbk, if the lookup of an object fails with
>>> ENOENT *AND* the lookup is a revalidate lookup, then forget the
>>> inode associated with that object (not just unlinking the dentry,
>>> but forgetting the inode as well, iff there are no more dentries
>>> associated with it). At least this way the inode is forgotten, and
>>> later, when self-heal wants to correct the object, it has to create
>>> a new object (the old one was removed directly from the backend),
>>> which means a new in-memory inode is created and the read/write
>>> operations from the self-heal daemon are not blocked.
>>> I have sent a patch for review for the above method:
>>> http://review.gluster.org/#/c/11489/
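For reference, approach (1) roughly amounts to a helper like the sketch
below, called from server_lookup_cbk when a revalidate lookup fails with
ENOENT. The helper name and the inode_has_dentry() check are used here for
illustration only; the exact change is in the patch above:

#include "inode.h"
#include "server.h"

/* Sketch only: drop the dentry of the missing object and forget the
 * inode iff no other dentry still refers to it, so that the next
 * lookup (e.g. from self-heal) builds a fresh in-memory inode with a
 * clean inode context. */
static void
server_forget_stale_inode (server_state_t *state)
{
        if (!state->loc.inode || !state->loc.parent || !state->loc.name)
                return;

        inode_unlink (state->loc.inode, state->loc.parent,
                      state->loc.name);

        /* inode_has_dentry() stands in for "no more dentries left". */
        if (!inode_has_dentry (state->loc.inode))
                inode_forget (state->loc.inode, 0);
}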
>>>
>>> OR
>>>
>>> 2) Do not block write operations on a bad object if the operation
>>> comes from self-heal; allow it to completely heal the file and, once
>>> healing is done, remove the bad-object information from the inode
>>> context.
>>> Requests coming from the self-heal daemon can be identified by
>>> checking their pid (it is negative). But if the self-heal happens
>>> from the glusterfs client itself, I am not sure whether it runs with
>>> a negative pid for the frame or with the same pid as the frame of
>>> the original fop that triggered the self-heal.
>>> Pranith? Can you clarify this?
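For approach (2), the gating would look something like the sketch below;
whether the negative-pid assumption holds for client-side heals is exactly
the question above. The function name is illustrative:

#include "glusterfs.h"
#include "stack.h"

/* Sketch only: decide whether a fop on a (possibly bad) object should
 * be allowed through.  Requests from the self-heal daemon carry a
 * negative frame->root->pid; whether client-side heals do too is the
 * open question. */
static gf_boolean_t
stub_allow_fop (call_frame_t *frame, gf_boolean_t object_is_bad)
{
        if (!object_is_bad)
                return _gf_true;    /* clean object: always allow */

        if (frame->root->pid < 0)
                return _gf_true;    /* presumably self-heal: let it heal */

        return _gf_false;           /* block regular I/O on a bad object */
}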
>>>
>
>
> For afr-v2, the heals that happen via the client run in a synctask
> with the same negative pid (GF_CLIENT_PID_AFR_SELF_HEALD) as the
> self-heal daemon.
> I think approach 1 is better, as it is independent of who does the
> heal (I am not sure what the pid behaviour is with disperse volume
> heals), and it makes sense to forget the inode when the corresponding
> file is no longer present in the back-end.
+1 for approach #1. It might also benefit other xlators that want to
freshly initialize the inode context in such scenarios.
>
> Thanks,
> Ravi
>
>>> Please provide feedback.
>>>
>
>>> Regards,
>>> Raghavendra Bhat
>>