[Gluster-devel] BitRot notes

Venky Shankar yknev.shankar at gmail.com
Fri Dec 5 18:01:52 UTC 2014


On Fri, Nov 28, 2014 at 10:00 PM, Vijay Bellur <vbellur at redhat.com> wrote:
> On 11/28/2014 08:30 AM, Venky Shankar wrote:
>>
>> [snip]
>>>
>>>
>>> 1. Can the bitd be one per node like self-heal-daemon and other "global"
>>> services? I worry about creating 2 * N processes for N bricks in a node.
>>> Maybe we can consider having one thread per volume/brick etc. in a single
>>> bitd process to make it perform better.
>>
>>
>> Absolutely.
>> There would be one bitrot daemon per node, per volume.
>>
>
> Do you foresee any problems in having one daemon per node for all volumes?

Not technically :). Probably that's a nice thing to do.

>
>>
>>>
>>> 3. I think the algorithm for checksum computation can vary within the
>>> volume. I see a reference to "Hashtype is persisted along side the
>>> checksum and can be tuned per file type." Is this correct? If so:
>>>
>>> a) How will the policy be exposed to the user?
>>
>>
>> Bitrot daemon would have a configuration file that can be configured
>> via Gluster CLI. Tuning hash types could be based on file types or
>> file name patterns (regexes) [which is a bit tricky as bitrot would
>> work on GFIDs rather than filenames, but this can be solved by a level
>> of indirection].
>>
>>>
>>> b) It would be nice to have the algorithm for computing checksums be
>>> pluggable. Are there any thoughts on pluggability?
>>
>>
>> Do you mean the default hash algorithm be configurable? If yes, then
>> that's planned.
>
>
> Sounds good.
>
>>
>>>
>>> c) What are the steps involved in changing the hashtype/algorithm for a
>>> file?
>>
>>
>> Policy changes for file {types, patterns} are lazy, i.e., they take
>> effect during the next recompute. For objects that are never modified
>> (after the initial checksum compute), scrubbing can recompute the checksum
>> using the new hash _after_ verifying the integrity of the file with the
>> old hash.
>
>
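To make the lazy switch quoted above a bit more concrete, here is a rough
Python sketch. It assumes the hash type is stored next to the checksum and
that scrub verifies with the old hash before re-signing with the new one.
All names (POLICY, SignedObject, scrub_and_migrate) and the suffix-based
policy are hypothetical illustrations, not actual bitd code.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical policy table: file-name suffix -> hash type ("" is the default).
POLICY = {"": "sha256"}


@dataclass
class SignedObject:
    name: str
    data: bytes
    hashtype: str  # persisted alongside the checksum
    checksum: str


def digest(hashtype: str, data: bytes) -> str:
    return hashlib.new(hashtype, data).hexdigest()


def wanted_hashtype(name: str) -> str:
    """Return the hash type the current policy asks for."""
    for suffix, hashtype in POLICY.items():
        if suffix and name.endswith(suffix):
            return hashtype
    return POLICY[""]


def sign(name: str, data: bytes) -> SignedObject:
    hashtype = wanted_hashtype(name)
    return SignedObject(name, data, hashtype, digest(hashtype, data))


def scrub_and_migrate(obj: SignedObject) -> str:
    # Always verify with the OLD (persisted) hash type first.
    if digest(obj.hashtype, obj.data) != obj.checksum:
        return "corrupt"
    # Only a verified object is re-signed with the new hash type.
    new_type = wanted_hashtype(obj.name)
    if new_type != obj.hashtype:
        obj.hashtype = new_type
        obj.checksum = digest(new_type, obj.data)
        return "migrated"
    return "ok"
```

A corrupt object keeps its old hash type and checksum, so a policy change
never papers over rot that was already there.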
>>
>>>
>>> 4. Is the fop on which change detection gets triggered configurable?
>>
>>
>> As of now all data modification fops trigger checksum calculation.
>>
>
> I wish I had been clearer on this in my OP. Is the fop on which checksum
> verification/bitrot detection happens configurable? The feature page talks
> about "open" being a trigger point for this. Users might want to trigger
> detection on a "read" operation and not on open. It would be good to provide
> this flexibility.

Ah! ok. As of now it's mostly open() and read(). Inline verification
would be "off" by default due to obvious reasons.
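
As a rough illustration (Python, purely hypothetical names, not the real
option or translator code), a configurable trigger could look like the
sketch below. It assumes a set of fops that is empty by default, which
corresponds to inline verification being "off".

```python
import hashlib

# Hypothetical option: fops on which inline verification is triggered.
# Empty set means inline verification is "off" (the assumed default).
verify_on = set()


class BitRotError(Exception):
    pass


def handle_fop(fop: str, data: bytes, stored_checksum: str) -> bytes:
    """Serve `data` for `fop`, verifying first if the fop is a trigger."""
    if fop in verify_on:
        if hashlib.sha256(data).hexdigest() != stored_checksum:
            raise BitRotError("checksum mismatch detected on " + fop)
    return data
```

With the set empty, a corrupted object is served silently. After
verify_on.add("read"), the same read raises BitRotError while open() is
still not checked.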

>
>>
>>>
>>> 6. Any thoughts on integrating the bitrot repair framework with
>>> self-heal?
>>
>>
>> There are some thoughts on integration with self-heal daemon and EC.
>> I'm coming up with a doc which covers those [reason for delay in
>> replying to your questions ;)]. Expect the doc on gluster-devel@
>> soon.
>
>
> Will look forward to this.
>
>>
>>>
>>> 7. How does detection figure out that a lazy update is still pending and
>>> not raise a false positive?
>>
>>
>> That's one of the things that Rachana and I discussed yesterday.
>> Should scrubbing *wait* while checksum updating is still in progress, or
>> is it expected that scrubbing happens only when there are no active I/O
>> operations on the volume? (Both imply that the bitrot daemon needs
>> to know when it's done its job.)
>>
>> If both scrub and checksum updating go in parallel, then there needs
>> to be a way to synchronize those operations. Maybe compute the checksum
>> on priority, using a hint provided by the scrub process (that still
>> leaves a small window for rot, though)?
>>
>> Any thoughts?
>
>
> Waiting for no active I/O in the volume might be a difficult condition to
> reach in some deployments.
>
> Some form of waiting is necessary to prevent false positives. One
> possibility might be to mark an object as dirty till the checksum update is
> complete. Verification/scrub can then be skipped for dirty objects.

Makes sense. Thanks!
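
A minimal single-threaded Python sketch of the dirty-marking idea above,
just to pin down the intended behaviour. The names (ObjectState, write,
sign, scrub) are made up for illustration and SHA-256 is an arbitrary
choice; the real thing would have to persist the flag and handle writes
that race with signing.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectState:
    data: bytes = b""
    checksum: Optional[str] = None
    dirty: bool = False  # True while a checksum update is pending


def write(obj: ObjectState, data: bytes) -> None:
    # Mark dirty *before* modifying, so scrub never sees new data
    # paired with a stale checksum and a clean flag.
    obj.dirty = True
    obj.data = data


def sign(obj: ObjectState) -> None:
    # Lazy checksum update: recompute, then clear the dirty flag.
    obj.checksum = hashlib.sha256(obj.data).hexdigest()
    obj.dirty = False


def scrub(obj: ObjectState) -> str:
    # Dirty (or never signed) objects are skipped, not reported.
    if obj.dirty or obj.checksum is None:
        return "skipped"
    if hashlib.sha256(obj.data).hexdigest() != obj.checksum:
        return "corrupt"
    return "ok"
```

An object that was written but not yet signed is reported as "skipped"
rather than as a false positive; only a signed, clean object can come back
"corrupt".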

>
> -Vijay
>

