[Gluster-devel] bit rot support for glusterfs

Thu Jan 2 14:00:40 UTC 2014

> I will be starting to work on bit rot detection for glusterfs.

That's magnificent, and not only because it's a valuable feature.  It's great to see you still exhibiting such initiative.  :)

> 1. Depend on change-log to recompute checksum. This eliminates
> periodic crawl of brick/volume to update the checksum.

Absolutely.  This was always my biggest objection to Doug's design.  Crawling simply doesn't scale well.  Even for local replication we're moving to a more log-based approach.  The design you cite also mentions AFR-specific artifacts like outcast, which need to be avoided both as a matter of general practice and because those artifacts might no longer be relevant a year from now.
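
To make the contrast with crawling concrete, here is a minimal sketch of log-driven recomputation. It is in Python purely for brevity, and it assumes the change log can be reduced to a plain list of modified paths. That format, and the names checksum() and apply_changelog(), are made up for illustration and are not the actual changelog interface. The point is only that the work is proportional to what changed, not to the size of the brick.

```python
import hashlib
import os


def checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def apply_changelog(changed_paths, checksums):
    """Recompute checksums only for paths named in the log.

    changed_paths: iterable of file paths (hypothetical log format).
    checksums: dict mapping path -> hex digest, updated in place.
    Returns the number of files actually rehashed.
    """
    rehashed = 0
    # Drop duplicate entries while keeping the first-seen order, so a
    # file written many times is still hashed only once per batch.
    for path in dict.fromkeys(changed_paths):
        if os.path.exists(path):
            checksums[path] = checksum(path)
            rehashed += 1
        else:
            # The file was removed after the log entry was written.
            checksums.pop(path, None)
    return rehashed
```

A real implementation would keep the checksum in an xattr rather than a dict, but the shape of the loop would be much the same.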

> 2. Policy to determine when checksum is to be recomputed. If a file is
> undergoing active I/O, then compute checksum only after a delay

This might play into what we are (or at least I am) planning with respect to tiering, HSM, data classification, or whatever you want to call it.  In fact, it might be something that you don't need to worry about at all.  If a volume/pool is divided into a "live" part geared toward high performance and an "archival" part geared toward longevity and storage efficiency - slower drives, erasure codes and/or bit rot detection instead of AFR/NSR - with transparent migration between them, then you might just be able to say that anything placed under your purview will have bit rot detection applied.  To put it another way, the policy would be applied elsewhere in the system.
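
If the delay policy does end up living in the bit rot code, it could be as small as the sketch below (Python again, just for brevity). The class name, its methods and the explicit "now" arguments are all hypothetical, chosen for illustration. Nothing here is an existing GlusterFS interface. A file becomes eligible for checksumming only once it has seen no writes for a full quiet period.

```python
class QuietPeriodPolicy:
    """Defer checksumming until a file has been idle for `delay` seconds."""

    def __init__(self, delay):
        self.delay = delay
        self.last_write = {}  # path -> timestamp of most recent write

    def record_write(self, path, now):
        # Every write restarts the quiet period for that file.
        self.last_write[path] = now

    def due(self, now):
        """Return (sorted) the files idle for at least `delay`, and forget them."""
        ready = [p for p, t in self.last_write.items()
                 if now - t >= self.delay]
        for p in ready:
            del self.last_write[p]
        return sorted(ready)
```

With tiering in place, the same check would simply move to the migration code: a file that has been quiet long enough gets demoted to the archival part, and everything that lands there gets a checksum.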

> 3. Ability to turn off/on bit rot detection in volumes.
> 
> 4. If bit rot is turned on for a volume, a crawl would be necessary in
> this case to compute checksum.