[Gluster-devel] BitRot notes

Mon Nov 24 10:49:23 UTC 2014

On 10/31/2014 04:09 PM, Venky Shankar wrote:
> Hey folks,
>
> Myself and Raghavendra (@rabhat) have been discussing about BitRot[1]
> and came up with a list of high level tasks (breakup items) captured
> here[2]. The pad will be updated on an ongoing basis reflecting the
> current status/items that are being worked on. As always, contributions
> in any form (design, code, doc, etc..) are more than welcome (just make
> sure you're heard on the pad/email :))
>
> [1]:
> http://www.gluster.org/community/documentation/index.php/Features/BitRot
> [2]: https://public.pad.fsfe.org/p/glusterfs-bitrot-notes
>

Thanks for this, Venky. This looks like a good start. A few questions 
and thoughts:

1. Can the bitd be one per node like self-heal-daemon and other "global" 
services? I worry about creating 2 * N processes for N bricks in a node. 
Maybe we can consider having one thread per volume/brick etc. in a 
single bitd process to make it perform better.

2. It would be good to consider throttling the filesystem scan and the
checksum updates. That way we can avoid overwhelming the system after
enabling bitrot on pre-existing data.

3. I think the algorithm for checksum computation can vary within the 
volume. I see a reference to "Hashtype is persisted along side the 
checksum and can be tuned per file type." Is this correct? If so:

a) How will the policy be exposed to the user?

b) It would be nice to have the algorithm for computing checksums be 
pluggable. Are there any thoughts on pluggability?

c) What are the steps involved in changing the hashtype/algorithm for a 
file?

4. Is the fop on which change detection gets triggered configurable?

5. It would be good to make the storage and retrieval of checksums
modular so that we can choose an alternate backend in the future (apart
from extended attributes) if necessary.

6. Any thoughts on integrating the bitrot repair framework with self-heal?

7. How does detection figure out that a lazy checksum update is still
pending, so that it does not raise a false positive?
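
To make points 3(b), 5 and 7 a bit more concrete, here is a rough
sketch in Python. It is only an illustration of the shape I have in
mind, not a proposal for the actual implementation: every name in it
(HASHERS, SignatureStore, DictStore, sign, verify) is made up for this
mail and none of it is existing GlusterFS code. It assumes the hash
type is stored next to the checksum, that the storage backend sits
behind a small interface, and that a "dirty" marker is set while a
lazy checksum update is outstanding.

```python
import hashlib
from typing import Callable, Dict, Optional, Protocol, Set, Tuple

# Hypothetical plug-in registry: hash type name -> digest function.
# Adding an algorithm means adding one entry here.
HASHERS: Dict[str, Callable[[bytes], str]] = {
    "sha256": lambda data: hashlib.sha256(data).hexdigest(),
    "sha512": lambda data: hashlib.sha512(data).hexdigest(),
}


class SignatureStore(Protocol):
    """Storage interface; extended attributes would be one backend."""

    def put(self, path: str, hashtype: str, checksum: str) -> None: ...
    def get(self, path: str) -> Optional[Tuple[str, str]]: ...
    def mark_dirty(self, path: str) -> None: ...
    def is_dirty(self, path: str) -> bool: ...


class DictStore:
    """In-memory stand-in backend, for illustration only."""

    def __init__(self) -> None:
        self._sigs: Dict[str, Tuple[str, str]] = {}
        self._dirty: Set[str] = set()

    def put(self, path: str, hashtype: str, checksum: str) -> None:
        self._sigs[path] = (hashtype, checksum)
        # A fresh signature clears the "update pending" marker.
        self._dirty.discard(path)

    def get(self, path: str) -> Optional[Tuple[str, str]]:
        return self._sigs.get(path)

    def mark_dirty(self, path: str) -> None:
        # Would be set when a modifying fop is seen on the object.
        self._dirty.add(path)

    def is_dirty(self, path: str) -> bool:
        return path in self._dirty


def sign(store: SignatureStore, path: str, data: bytes,
         hashtype: str = "sha256") -> None:
    """Compute a checksum and persist it together with its hash type."""
    store.put(path, hashtype, HASHERS[hashtype](data))


def verify(store: SignatureStore, path: str, data: bytes) -> str:
    """Return 'pending', 'unsigned', 'ok' or 'corrupt' for an object."""
    if store.is_dirty(path):
        # Stored checksum is known to be stale: skip, do not flag.
        return "pending"
    sig = store.get(path)
    if sig is None:
        return "unsigned"
    hashtype, checksum = sig
    # Verify with whatever algorithm the object was signed with.
    return "ok" if HASHERS[hashtype](data) == checksum else "corrupt"
```

With something like this, changing the algorithm for a file (3c) would
amount to calling sign() again with a different hashtype, and the
scrubber would never compare against a checksum that is known to be
out of date. Whether a per-object marker like this is how the design
intends to track pending updates is exactly what I am asking in (7).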

Regards,
Vijay