[Gluster-devel] bad file access (bit-rot + AFR)
Venky Shankar
vshankar at redhat.com
Sat Jun 27 09:58:01 UTC 2015
On 06/27/2015 02:32 PM, Raghavendra Bhat wrote:
> Hi,
>
> There is a patch that is submitted for review to deny access to
> objects which are marked as bad by scrubber (i.e. the data of the
> object might have been corrupted in the backend).
>
> http://review.gluster.org/#/c/11126/10
> http://review.gluster.org/#/c/11389/4
>
> The above 2 patch sets solve the problem of denying access to the bad
> objects (they have passed regression and received a +1 from venky).
> But in our testing we found that there is a race window (depending
> upon the scrubber frequency the race window can be larger) where there
> is a possibility of self-heal daemon healing the contents of the bad
> file before scrubber can mark it as bad.
>
> I am not sure whether there is a chance of hitting this issue when the
> data truly gets corrupted in the backend. But in our testing, to simulate
> backend corruption we modify the contents of the file directly in the
> backend. Now in this case, before the scrubber can mark the object as
> bad, the self-heal daemon kicks in and heals the contents of the bad
> file to the good copy. Or before the scrubber marks the file as bad,
> if the client accesses it AFR finds that there is a mismatch in
> metadata (since we modified the contents of the file in the backend)
> and does data and metadata self-healing, thus copying the contents of
> the bad copy to the good copy. From then on, clients accessing
> that object always get bad data.
I understand from Ravi (ranaraya@) that AFR-v2 would choose the "biggest"
file as the source, provided that the AFR xattrs are "clean" (AFR-v1 would
return EIO). If a file is modified directly on the brick but its size is
left unchanged, contents can be served from either copy. For
self-heal to detect such anomalies, there needs to be verification
(checksum/signature) at each stage of its operation. But this might be
too heavy on the I/O side. We could still cache mtime [but update it on
client I/O] after the pre-check, but this still would not catch bit flips
(unless a filesystem scrub is done).
Thoughts?
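
To make the size-versus-checksum point concrete, here is a rough sketch in
Python. It is not AFR or bit-rot code: Replica, pick_source_by_size and
pick_verified_source are hypothetical names, SHA-256 stands in for whatever
signature the signer actually stores, and each copy is modelled as an
in-memory byte string. It only illustrates that a same-size modification
passes a size-based choice but fails a signature pre-check.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Replica:
    name: str        # hypothetical brick label
    data: bytes      # contents currently on the brick
    signature: str   # hex digest recorded when the object was last signed


def checksum(data: bytes) -> str:
    # SHA-256 is an assumption; any strong digest would serve the sketch.
    return hashlib.sha256(data).hexdigest()


def pick_source_by_size(replicas):
    # Size-only choice: the largest copy wins, ties go to the first one seen.
    return max(replicas, key=lambda r: len(r.data))


def pick_verified_source(replicas):
    # Pre-check: only copies whose contents still match their signature
    # are eligible to act as a heal source.
    good = [r for r in replicas if checksum(r.data) == r.signature]
    if not good:
        return None  # no trustworthy copy; a caller would fail the heal
    return max(good, key=lambda r: len(r.data))


original = b"hello gluster"
sig = checksum(original)
corrupted = b"hellO gluster"  # same size, one byte changed on the brick

replicas = [
    Replica("brick-0", corrupted, sig),
    Replica("brick-1", original, sig),
]

print(pick_source_by_size(replicas).name)   # size cannot tell the copies apart
print(pick_verified_source(replicas).name)  # only the intact copy qualifies
```

The cost concern above shows up directly: pick_verified_source has to read
and hash every copy in full before choosing, whereas the size-based choice
needs only a stat.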
>
> Pranith? Do you have any solution for this? Venky and I are trying to
> come up with a solution for this.
>
> But does this issue block the above patches in any way? (Those 2
> patches are still needed to deny access to objects once they are
> marked as bad by scrubber).
>
>
> Regards,
> Raghavendra Bhat